Feat/#26 mock jobposting entity#94
Conversation
📝 WalkthroughWalkthroughIntroduces a full corpus administration pipeline: three new JPA entities ( ChangesCorpus Admin Pipeline
Sequence Diagram(s)sequenceDiagram
participant Client
participant CorpusAdminController
participant CorpusImportService
participant CorpusClassificationResolver
participant DetailClassificationRepository
participant MockJobPostingCorpusRepository
Client->>CorpusAdminController: POST /api/admin/corpus/import (CorpusImportRequest)
CorpusAdminController->>CorpusImportService: importFromXlsx(path)
loop Each XLSX row
CorpusImportService->>CorpusClassificationResolver: resolve(jobGroupL1, jobFamilyL2, roleL3)
CorpusClassificationResolver->>DetailClassificationRepository: findByHierarchyNames / countByDetailName
DetailClassificationRepository-->>CorpusClassificationResolver: Optional<DetailClassification>
CorpusClassificationResolver-->>CorpusImportService: Optional<DetailClassification>
CorpusImportService->>MockJobPostingCorpusRepository: findBySourceAnalysisId / save
end
CorpusImportService-->>CorpusAdminController: CorpusImportResult
CorpusAdminController-->>Client: ApiResponse<CorpusImportResult>
sequenceDiagram
participant Client
participant CorpusAdminController
participant CorpusEmbeddingSyncService
participant CohereCorpusEmbeddingClient
participant CohereAPI
participant PostgreSQL
Client->>CorpusAdminController: POST /api/admin/corpus/embeddings/sync (limit)
CorpusAdminController->>CorpusEmbeddingSyncService: syncAll(limit)
CorpusEmbeddingSyncService->>CorpusEmbeddingSyncService: fetch eligible corpus rows, partition batches
loop Each batch
CorpusEmbeddingSyncService->>CohereCorpusEmbeddingClient: embed(texts)
CohereCorpusEmbeddingClient->>CohereAPI: POST /v2/embed
CohereAPI-->>CohereCorpusEmbeddingClient: float embeddings
CohereCorpusEmbeddingClient-->>CorpusEmbeddingSyncService: List<float[]>
CorpusEmbeddingSyncService->>PostgreSQL: JDBC upsert PGvector rows
end
CorpusEmbeddingSyncService-->>CorpusAdminController: CorpusEmbeddingSyncResponse
CorpusAdminController-->>Client: ApiResponse<CorpusEmbeddingSyncResponse>
Estimated code review effort🎯 4 (Complex) | ⏱️ ~60 minutes Poem
🚥 Pre-merge checks | ✅ 2 | ❌ 3❌ Failed checks (2 warnings, 1 inconclusive)
✅ Passed checks (2 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches📝 Generate docstrings
🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
Actionable comments posted: 10
🧹 Nitpick comments (6)
src/main/java/com/jobdri/jobdri_api/domain/user/entity/User.java (1)
110-112: ⚡ Quick winConsider restricting visibility and adding audit logging.
The
promoteToAdmin()method is public and performs no internal authorization checks, allowing any code with aUserreference to grant admin privileges. While the current caller (BootstrapAdminService) checks the role first, direct invocations bypass that safeguard. Additionally, admin role changes typically require audit logging for security and compliance.🔒 Recommended improvements
- Reduce visibility to package-private if callers are within the same package:
-public void promoteToAdmin() { +void promoteToAdmin() {
- Add audit logging:
+@Slf4j // at class level + public void promoteToAdmin() { this.role = UserRole.ADMIN; + log.warn("User promoted to ADMIN role: email={}, id={}", this.email, this.id); }🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@src/main/java/com/jobdri/jobdri_api/domain/user/entity/User.java` around lines 110 - 112, The promoteToAdmin() method in the User class is currently public with no authorization checks or audit logging, creating a security risk since any code with a User reference can grant admin privileges. Change the method visibility from public to package-private (remove the public modifier) since the only caller BootstrapAdminService is in the same package, and add audit logging to the promoteToAdmin() method to record when admin role changes occur for compliance and security tracking purposes.src/main/java/com/jobdri/jobdri_api/domain/corpus/service/BootstrapAdminService.java (1)
33-35: ⚡ Quick winConsider logging when configured emails are not found.
The current implementation silently skips emails that don't match any user (Line 34 uses
ifPresent). For operational visibility, log a warning when a configured bootstrap email has no corresponding user record, as this might indicate a configuration error or a missing account.📋 Recommended logging improvement
for (String email : emails) { - userRepository.findByEmail(email).ifPresent(this::promoteIfNeeded); + var userOpt = userRepository.findByEmail(email); + if (userOpt.isPresent()) { + promoteIfNeeded(userOpt.get()); + } else { + log.warn("Bootstrap admin email not found in database: {}", email); + } }🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@src/main/java/com/jobdri/jobdri_api/domain/corpus/service/BootstrapAdminService.java` around lines 33 - 35, In the BootstrapAdminService class, modify the email processing loop to add logging visibility when a configured bootstrap email has no corresponding user. Currently, the userRepository.findByEmail(email).ifPresent(this::promoteIfNeeded) pattern silently skips missing emails. Replace this with either ifPresentOrElse() or add an additional condition to log a warning when the Optional is empty, ensuring you capture the email address in the log message so operators can identify which configured emails lack corresponding user records.src/main/java/com/jobdri/jobdri_api/domain/corpus/repository/MockQuestionCorpusRepository.java (1)
16-16: 💤 Low valueReconsider the
IsNotNullpredicate ifembeddingTextis truly non-nullable.Similar to
MockJobPostingCorpusRepository, the entity mapping showsembeddingTextwithnullable = false, yet the query includesAndEmbeddingTextIsNotNull. If the column constraint guarantees non-null values, this clause is redundant. Retain it only if rows can exist with nullembeddingTextduring a multi-step import workflow.♻️ Simplified query if embeddingText is guaranteed non-null
-List<MockQuestionCorpus> findAllByValidForEmbeddingTrueAndEmbeddingTextIsNotNullOrderByIdAsc(); +List<MockQuestionCorpus> findAllByValidForEmbeddingTrueOrderByIdAsc();🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@src/main/java/com/jobdri/jobdri_api/domain/corpus/repository/MockQuestionCorpusRepository.java` at line 16, The method findAllByValidForEmbeddingTrueAndEmbeddingTextIsNotNullOrderByIdAsc() in MockQuestionCorpusRepository includes an EmbeddingTextIsNotNull predicate, but the entity mapping declares embeddingText with nullable=false, making the predicate redundant. Determine if embeddingText can actually be null during import workflows: if it cannot be null (as the entity constraint suggests), remove the AndEmbeddingTextIsNotNull clause from the method name to simplify it to findAllByValidForEmbeddingTrueOrderByIdAsc(); if rows can temporarily have null embeddingText during a multi-step import, instead update the entity mapping to nullable=true to align with the query logic.src/main/java/com/jobdri/jobdri_api/domain/corpus/repository/MockJobPostingCorpusRepository.java (1)
14-14: 💤 Low valueReconsider the
IsNotNullpredicate ifembeddingTextis truly non-nullable.The entity mapping shows
embeddingTextwithnullable = false, yet the query includesAndEmbeddingTextIsNotNull. If the column constraint guarantees non-null values at the database level, theIsNotNullclause is redundant. However, if rows can be inserted beforeembeddingTextis populated (e.g., in a multi-step workflow), the predicate is a necessary safeguard.♻️ Simplified query if embeddingText is guaranteed non-null
-List<MockJobPostingCorpus> findAllByValidForEmbeddingTrueAndEmbeddingTextIsNotNullOrderByIdAsc(); +List<MockJobPostingCorpus> findAllByValidForEmbeddingTrueOrderByIdAsc();🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@src/main/java/com/jobdri/jobdri_api/domain/corpus/repository/MockJobPostingCorpusRepository.java` at line 14, Review the entity mapping for the `embeddingText` field in MockJobPostingCorpus to determine if it is truly non-nullable at the database level. If the column has a `nullable = false` constraint and there is no multi-step workflow where rows are inserted before `embeddingText` is populated, remove the `IsNotNull` predicate from the method name `findAllByValidForEmbeddingTrueAndEmbeddingTextIsNotNullOrderByIdAsc` since the database constraint already guarantees non-null values. If the field can be null during intermediate stages of a workflow, keep the predicate as a safeguard.src/main/java/com/jobdri/jobdri_api/domain/corpus/service/CorpusImportService.java (1)
222-231: ⚡ Quick winCache company resolution within a workbook import.
resolveCompanyhits the repository per row. For large sheets with repeated company names, this creates avoidable N+1 lookups.🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@src/main/java/com/jobdri/jobdri_api/domain/corpus/service/CorpusImportService.java` around lines 222 - 231, The resolveCompany method performs a repository lookup for every row in an import without caching results, causing N+1 query problems when company names are repeated across rows. Add a local cache (Map) to store resolved companies by their normalized name within the scope of a single import operation. Modify resolveCompany to check this cache first before querying the repository, and populate it when creating or retrieving companies. Initialize the cache at the beginning of the import method that calls resolveCompany, and pass it as a parameter to resolveCompany so that repeated company name lookups within the same import reuse cached results instead of hitting the repository multiple times.src/main/java/com/jobdri/jobdri_api/domain/corpus/service/CorpusEmbeddingSyncService.java (1)
67-69: ⚡ Quick winApply
limitat query level instead of loading full corpora first.Line 67–69 and Line 75–77 fetch all eligible rows, then Line 129–134 trims in memory; this scales poorly as corpus size grows.
Also applies to: 75-77, 129-134
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@src/main/java/com/jobdri/jobdri_api/domain/corpus/service/CorpusEmbeddingSyncService.java` around lines 67 - 69, The current implementation fetches all eligible corpora from the database and then applies the limit in memory using the applyLimit method, which is inefficient at scale. Modify the repository query methods at lines 67-69 and 75-77 to accept a limit parameter and apply pagination/limiting at the query level rather than loading all results first (consider using Spring Data's Pageable interface or creating new query methods with limit parameters such as findAllByValidForEmbeddingTrueAndEmbeddingTextIsNotNullOrderByIdAscWithLimit). Then remove or simplify the applyLimit method at lines 129-134 since the limiting will now be handled at the database query level.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In
`@src/main/java/com/jobdri/jobdri_api/domain/classification/repository/DetailClassificationRepository.java`:
- Line 15: The methods countByDetailName and findByDetailName in
DetailClassificationRepository are case-sensitive derived queries, which causes
the fallback resolution in CorpusClassificationResolver to miss matches when
only the casing differs (e.g., searching for "software engineer" when "Software
Engineer" is stored). Replace these case-sensitive derived query methods with
case-insensitive variants by using the appropriate JPA query syntax that
performs case-insensitive comparisons, ensuring that detail names are matched
regardless of letter casing.
In
`@src/main/java/com/jobdri/jobdri_api/domain/corpus/controller/CorpusAdminController.java`:
- Around line 33-39: The importCorpus method in CorpusAdminController accepts a
raw filesystem path directly from the request parameter request.filePath()
without validation, creating a path traversal vulnerability where admin tokens
could access arbitrary files on the server. Fix this by normalizing the provided
file path using Path.of().normalize() or similar canonicalization, then validate
that the normalized path is within an approved configured import root directory
(using a whitelist approach such as startsWith checks against the base
directory). Only proceed with corpusImportService.importFromXlsx() if the path
validation succeeds; otherwise, reject the request with an appropriate error
response.
In
`@src/main/java/com/jobdri/jobdri_api/domain/corpus/service/CohereCorpusEmbeddingClient.java`:
- Around line 40-56: The RestClient instantiation in the
CohereCorpusEmbeddingClient for building the Cohere API client lacks explicit
HTTP timeouts, which can cause request threads to block indefinitely if the API
hangs or responds slowly. Create a SimpleClientHttpRequestFactory instance
configured with a 5-second connect timeout and 10-second read timeout, then pass
this factory to the restClientBuilder via the .requestFactory() method before
calling .build(), following the same pattern used in TossPaymentClient elsewhere
in the codebase.
In
`@src/main/java/com/jobdri/jobdri_api/domain/corpus/service/CorpusAdminRunner.java`:
- Around line 35-42: The optional startup tasks in the if blocks (corpus import
and embedding sync) lack local exception handling, which means any failure will
bubble up and stop the application from starting. Wrap each optional task (the
importFromXlsx call and the syncAll call) in try-catch blocks to handle
exceptions locally. Log the error details when exceptions occur, then allow the
application to continue starting. This ensures that failures in these optional
startup operations do not cause the entire application to fail to boot.
In
`@src/main/java/com/jobdri/jobdri_api/domain/corpus/service/CorpusClassificationResolver.java`:
- Around line 71-86: The registerMapping method normalizes the three job
parameters (jobGroupL1, jobFamilyL2, roleL3) but does not validate that the
normalize() calls produce non-null results, allowing null values to reach the
database and cause constraint violations instead of controlled validation
errors. Add input validation immediately after the three normalize() calls to
verify that normalizedJobGroup, normalizedJobFamily, and normalizedRole are all
non-null, and throw an appropriate validation exception if any of them are null
before proceeding to the
mappingRepository.findBySourceJobGroupL1AndSourceJobFamilyL2AndSourceRoleL3
call.
In
`@src/main/java/com/jobdri/jobdri_api/domain/corpus/service/CorpusEmbeddingSyncService.java`:
- Around line 58-63: Remove the readOnly = true parameter from the
`@Transactional` annotation on the three methods syncAll (line 58),
syncJobPostingEmbeddings (line 65), and syncQuestionEmbeddings (line 73). These
methods perform database write operations through calls to upsertVectors, and
marking them as read-only transactions prevents these writes from executing
properly. Either remove the readOnly parameter entirely (to default to readOnly
= false) or explicitly set readOnly = false on the `@Transactional` annotations
for all three methods.
- Around line 110-123: The upsertVectors() method performs batch write
operations using a connection obtained directly from dataSource.getConnection(),
which bypasses Spring's transaction management. Replace
dataSource.getConnection() with DataSourceUtils.getConnection(dataSource) to
ensure the batch operations participate in the active Spring transaction, and
properly release the connection using DataSourceUtils.releaseConnection() in a
finally block or try-with-resources equivalent. Additionally, either remove the
readOnly=true constraint from the public methods that call upsertVectors() and
mark upsertVectors() itself as `@Transactional`, or alternatively create a new
public `@Transactional` (non-readOnly) method that wraps the upsertVectors() call
to provide proper transactional boundaries for the write operations.
In
`@src/main/java/com/jobdri/jobdri_api/domain/corpus/service/CorpusImportService.java`:
- Around line 261-273: The getInteger method at line 272 calls
Integer.parseInt(value) directly, which throws NumberFormatException when the
DataFormatter returns Excel-formatted numbers like "1,000" or "120.0", causing
the entire import to fail. Add error handling around the Integer.parseInt call
in the getInteger method to catch NumberFormatException, and before parsing,
clean the value string by removing any formatting characters (such as commas)
and handling decimal places appropriately (either by stripping decimals or
rounding). Return null or a sensible default if parsing fails after cleanup,
allowing the import to continue gracefully instead of aborting.
- Around line 254-257: The getString method in CorpusImportService currently
returns null when a required header like analysis_id or question_id is missing,
causing rows to be silently skipped and producing a false-success import. Add a
validateRequiredHeaders method that takes the headerMap and a variable number of
required column names as parameters, then iterates through each required column
name and throws an IllegalArgumentException with a descriptive message if any
required column is not found in the headerMap. Call this validation method
immediately after the header map is populated from the sheet (before any data
processing occurs) to fail fast and prevent silent skipping of required fields.
This ensures that missing required headers are caught upfront rather than
silently producing empty imports.
In `@src/main/resources/schema.sql`:
- Around line 4-11: The mock_job_posting_embeddings table in schema.sql defines
foreign key constraints referencing mock_job_posting_corpus, which is not
defined in schema.sql but instead is created by Hibernate as a JPA `@Entity`
(MockJobPostingCorpus). Since the foreign key constraint requires the parent
table to exist first, schema.sql will fail if it executes before Hibernate
creates the JPA entity tables. Resolve this by either disabling
hibernate.ddl-auto in application properties and managing all DDL through a
migration tool like Flyway or Liquibase, or ensuring schema.sql is executed
after JPA table generation by adjusting your initialization configuration.
Verify that all referenced parent tables (mock_job_posting_corpus and
mock_question_corpus) are consistently managed through the same DDL mechanism.
---
Nitpick comments:
In
`@src/main/java/com/jobdri/jobdri_api/domain/corpus/repository/MockJobPostingCorpusRepository.java`:
- Line 14: Review the entity mapping for the `embeddingText` field in
MockJobPostingCorpus to determine if it is truly non-nullable at the database
level. If the column has a `nullable = false` constraint and there is no
multi-step workflow where rows are inserted before `embeddingText` is populated,
remove the `IsNotNull` predicate from the method name
`findAllByValidForEmbeddingTrueAndEmbeddingTextIsNotNullOrderByIdAsc` since the
database constraint already guarantees non-null values. If the field can be null
during intermediate stages of a workflow, keep the predicate as a safeguard.
In
`@src/main/java/com/jobdri/jobdri_api/domain/corpus/repository/MockQuestionCorpusRepository.java`:
- Line 16: The method
findAllByValidForEmbeddingTrueAndEmbeddingTextIsNotNullOrderByIdAsc() in
MockQuestionCorpusRepository includes an EmbeddingTextIsNotNull predicate, but
the entity mapping declares embeddingText with nullable=false, making the
predicate redundant. Determine if embeddingText can actually be null during
import workflows: if it cannot be null (as the entity constraint suggests),
remove the AndEmbeddingTextIsNotNull clause from the method name to simplify it
to findAllByValidForEmbeddingTrueOrderByIdAsc(); if rows can temporarily have
null embeddingText during a multi-step import, instead update the entity mapping
to nullable=true to align with the query logic.
In
`@src/main/java/com/jobdri/jobdri_api/domain/corpus/service/BootstrapAdminService.java`:
- Around line 33-35: In the BootstrapAdminService class, modify the email
processing loop to add logging visibility when a configured bootstrap email has
no corresponding user. Currently, the
userRepository.findByEmail(email).ifPresent(this::promoteIfNeeded) pattern
silently skips missing emails. Replace this with either ifPresentOrElse() or add
an additional condition to log a warning when the Optional is empty, ensuring
you capture the email address in the log message so operators can identify which
configured emails lack corresponding user records.
In
`@src/main/java/com/jobdri/jobdri_api/domain/corpus/service/CorpusEmbeddingSyncService.java`:
- Around line 67-69: The current implementation fetches all eligible corpora
from the database and then applies the limit in memory using the applyLimit
method, which is inefficient at scale. Modify the repository query methods at
lines 67-69 and 75-77 to accept a limit parameter and apply pagination/limiting
at the query level rather than loading all results first (consider using Spring
Data's Pageable interface or creating new query methods with limit parameters
such as
findAllByValidForEmbeddingTrueAndEmbeddingTextIsNotNullOrderByIdAscWithLimit).
Then remove or simplify the applyLimit method at lines 129-134 since the
limiting will now be handled at the database query level.
In
`@src/main/java/com/jobdri/jobdri_api/domain/corpus/service/CorpusImportService.java`:
- Around line 222-231: The resolveCompany method performs a repository lookup
for every row in an import without caching results, causing N+1 query problems
when company names are repeated across rows. Add a local cache (Map) to store
resolved companies by their normalized name within the scope of a single import
operation. Modify resolveCompany to check this cache first before querying the
repository, and populate it when creating or retrieving companies. Initialize
the cache at the beginning of the import method that calls resolveCompany, and
pass it as a parameter to resolveCompany so that repeated company name lookups
within the same import reuse cached results instead of hitting the repository
multiple times.
In `@src/main/java/com/jobdri/jobdri_api/domain/user/entity/User.java`:
- Around line 110-112: The promoteToAdmin() method in the User class is
currently public with no authorization checks or audit logging, creating a
security risk since any code with a User reference can grant admin privileges.
Change the method visibility from public to package-private (remove the public
modifier) since the only caller BootstrapAdminService is in the same package,
and add audit logging to the promoteToAdmin() method to record when admin role
changes occur for compliance and security tracking purposes.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro Plus
Run ID: 85bc6e99-fe4d-44c6-bff1-040a4cdc5187
📒 Files selected for processing (25)
build.gradlesrc/main/java/com/jobdri/jobdri_api/domain/classification/repository/DetailClassificationRepository.javasrc/main/java/com/jobdri/jobdri_api/domain/corpus/controller/CorpusAdminController.javasrc/main/java/com/jobdri/jobdri_api/domain/corpus/dto/request/CorpusEmbeddingSyncRequest.javasrc/main/java/com/jobdri/jobdri_api/domain/corpus/dto/request/CorpusImportRequest.javasrc/main/java/com/jobdri/jobdri_api/domain/corpus/dto/response/CorpusEmbeddingSyncResponse.javasrc/main/java/com/jobdri/jobdri_api/domain/corpus/entity/CorpusClassificationMapping.javasrc/main/java/com/jobdri/jobdri_api/domain/corpus/entity/MockJobPostingCorpus.javasrc/main/java/com/jobdri/jobdri_api/domain/corpus/entity/MockQuestionCorpus.javasrc/main/java/com/jobdri/jobdri_api/domain/corpus/repository/CorpusClassificationMappingRepository.javasrc/main/java/com/jobdri/jobdri_api/domain/corpus/repository/MockJobPostingCorpusRepository.javasrc/main/java/com/jobdri/jobdri_api/domain/corpus/repository/MockQuestionCorpusRepository.javasrc/main/java/com/jobdri/jobdri_api/domain/corpus/service/BootstrapAdminService.javasrc/main/java/com/jobdri/jobdri_api/domain/corpus/service/CohereCorpusEmbeddingClient.javasrc/main/java/com/jobdri/jobdri_api/domain/corpus/service/CorpusAdminRunner.javasrc/main/java/com/jobdri/jobdri_api/domain/corpus/service/CorpusClassificationResolver.javasrc/main/java/com/jobdri/jobdri_api/domain/corpus/service/CorpusEmbeddingClient.javasrc/main/java/com/jobdri/jobdri_api/domain/corpus/service/CorpusEmbeddingSyncService.javasrc/main/java/com/jobdri/jobdri_api/domain/corpus/service/CorpusImportResult.javasrc/main/java/com/jobdri/jobdri_api/domain/corpus/service/CorpusImportService.javasrc/main/java/com/jobdri/jobdri_api/domain/user/entity/User.javasrc/main/java/com/jobdri/jobdri_api/global/config/SecurityConfig.javasrc/main/resources/application-dev.yamlsrc/main/resources/application-prod.yamlsrc/main/resources/schema.sql
| public interface DetailClassificationRepository extends JpaRepository<DetailClassification, Long> { | ||
| List<DetailClassification> findAllByMiddleClassificationId(Long middleClassificationId); | ||
| Optional<DetailClassification> findByDetailName(String detailName); | ||
| long countByDetailName(String detailName); |
There was a problem hiding this comment.
🧩 Analysis chain
🏁 Script executed:
find . -type f -name "DetailClassificationRepository.java"Repository: JobDri-Developer/BackEnd
Length of output: 175
🏁 Script executed:
find . -type f -name "CorpusClassificationResolver.java"Repository: JobDri-Developer/BackEnd
Length of output: 162
🏁 Script executed:
find . -type f -name "*.java" | grep -i "classification" | head -20Repository: JobDri-Developer/BackEnd
Length of output: 1908
🏁 Script executed:
cat -n src/main/java/com/jobdri/jobdri_api/domain/classification/repository/DetailClassificationRepository.javaRepository: JobDri-Developer/BackEnd
Length of output: 2841
🏁 Script executed:
cat -n src/main/java/com/jobdri/jobdri_api/domain/corpus/service/CorpusClassificationResolver.javaRepository: JobDri-Developer/BackEnd
Length of output: 4267
Use case-insensitive detail-name methods for fallback resolution.
countByDetailName at line 15 and findByDetailName at line 14 are case-sensitive derived queries. In CorpusClassificationResolver (lines 57-59), this causes the fallback resolution to incorrectly skip a unique match when only casing differs (e.g., Software Engineer stored but software engineer searched). The normalize() method only trims whitespace, not case.
Replace with case-insensitive variants:
Suggested fix
--- a/src/main/java/com/jobdri/jobdri_api/domain/classification/repository/DetailClassificationRepository.java
+++ b/src/main/java/com/jobdri/jobdri_api/domain/classification/repository/DetailClassificationRepository.java
@@
- Optional<DetailClassification> findByDetailName(String detailName);
- long countByDetailName(String detailName);
+ Optional<DetailClassification> findByDetailNameIgnoreCase(String detailName);
+ long countByDetailNameIgnoreCase(String detailName);--- a/src/main/java/com/jobdri/jobdri_api/domain/corpus/service/CorpusClassificationResolver.java
+++ b/src/main/java/com/jobdri/jobdri_api/domain/corpus/service/CorpusClassificationResolver.java
@@
- if (detailClassificationRepository.countByDetailName(normalizedRole) == 1) {
- return detailClassificationRepository.findByDetailName(normalizedRole);
+ if (detailClassificationRepository.countByDetailNameIgnoreCase(normalizedRole) == 1) {
+ return detailClassificationRepository.findByDetailNameIgnoreCase(normalizedRole);
}🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In
`@src/main/java/com/jobdri/jobdri_api/domain/classification/repository/DetailClassificationRepository.java`
at line 15, The methods countByDetailName and findByDetailName in
DetailClassificationRepository are case-sensitive derived queries, which causes
the fallback resolution in CorpusClassificationResolver to miss matches when
only the casing differs (e.g., searching for "software engineer" when "Software
Engineer" is stored). Replace these case-sensitive derived query methods with
case-insensitive variants by using the appropriate JPA query syntax that
performs case-insensitive comparisons, ensuring that detail names are matched
regardless of letter casing.
| public ApiResponse<CorpusImportResult> importCorpus( | ||
| @Valid @RequestBody CorpusImportRequest request | ||
| ) throws IOException { | ||
| return ApiResponse.onSuccess( | ||
| "corpus 엑셀 적재에 성공했습니다.", | ||
| corpusImportService.importFromXlsx(Path.of(request.filePath())) | ||
| ); |
There was a problem hiding this comment.
Constrain import file paths to an approved base directory.
Line 38 accepts a raw server filesystem path from the request; this creates an arbitrary local file access surface for admin tokens. Enforce normalization and allowlist a configured import root.
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In
`@src/main/java/com/jobdri/jobdri_api/domain/corpus/controller/CorpusAdminController.java`
around lines 33 - 39, The importCorpus method in CorpusAdminController accepts a
raw filesystem path directly from the request parameter request.filePath()
without validation, creating a path traversal vulnerability where admin tokens
could access arbitrary files on the server. Fix this by normalizing the provided
file path using Path.of().normalize() or similar canonicalization, then validate
that the normalized path is within an approved configured import root directory
(using a whitelist approach such as startsWith checks against the base
directory). Only proceed with corpusImportService.importFromXlsx() if the path
validation succeeds; otherwise, reject the request with an appropriate error
response.
| RestClient client = restClientBuilder | ||
| .baseUrl("https://api.cohere.com") | ||
| .defaultHeader(HttpHeaders.AUTHORIZATION, "Bearer " + cohereApiKey) | ||
| .defaultHeader(HttpHeaders.CONTENT_TYPE, MediaType.APPLICATION_JSON_VALUE) | ||
| .build(); | ||
|
|
||
| EmbedResponse response = client.post() | ||
| .uri("/v2/embed") | ||
| .body(new EmbedRequest( | ||
| texts, | ||
| embeddingModel, | ||
| documentInputType, | ||
| outputDimension, | ||
| List.of("float") | ||
| )) | ||
| .retrieve() | ||
| .body(EmbedResponse.class); |
There was a problem hiding this comment.
🧩 Analysis chain
🏁 Script executed:
#!/bin/bash
# Verify whether RestClient timeout settings are configured anywhere in the repository
rg -n -C3 "RestClient\\.Builder|ClientHttpRequestFactory|setConnectTimeout|setReadTimeout|JdkClientHttpRequestFactory|HttpComponentsClientHttpRequestFactory" src/main/java src/main/resourcesRepository: JobDri-Developer/BackEnd
Length of output: 3892
🏁 Script executed:
cat -n src/main/java/com/jobdri/jobdri_api/domain/corpus/service/CohereCorpusEmbeddingClient.javaRepository: JobDri-Developer/BackEnd
Length of output: 3681
🏁 Script executed:
# Search for RestClientBuilder or RestClient bean configuration in Spring configs
rg -n "RestClient\.Builder|`@Bean`.*RestClient|restClientBuilder" src/main/java --type java -A 5Repository: JobDri-Developer/BackEnd
Length of output: 3026
Add explicit HTTP timeouts to the Cohere client.
Lines 40–56 make a blocking outbound call without connect or read timeouts. If the Cohere API hangs or is slow to respond, this can block request threads indefinitely. Set timeouts using SimpleClientHttpRequestFactory and pass it via .requestFactory(), similar to TossPaymentClient in the same codebase (which configures 5s connect and 10s read timeouts).
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In
`@src/main/java/com/jobdri/jobdri_api/domain/corpus/service/CohereCorpusEmbeddingClient.java`
around lines 40 - 56, The RestClient instantiation in the
CohereCorpusEmbeddingClient for building the Cohere API client lacks explicit
HTTP timeouts, which can cause request threads to block indefinitely if the API
hangs or responds slowly. Create a SimpleClientHttpRequestFactory instance
configured with a 5-second connect timeout and 10-second read timeout, then pass
this factory to the restClientBuilder via the .requestFactory() method before
calling .build(), following the same pattern used in TossPaymentClient elsewhere
in the codebase.
| if (runImportOnStartup && StringUtils.hasText(importXlsxPath)) { | ||
| CorpusImportResult result = corpusImportService.importFromXlsx(Path.of(importXlsxPath)); | ||
| log.info("corpus import 완료: {}", result); | ||
| } | ||
|
|
||
| if (syncEmbeddingsOnStartup) { | ||
| log.info("corpus embedding sync 완료: {}", corpusEmbeddingSyncService.syncAll(null)); | ||
| } |
There was a problem hiding this comment.
Isolate optional startup tasks from application boot failure.
Lines 35–42 run optional import/sync without local exception handling. A single file/path/API failure will bubble up through run(...) and can stop the whole app from starting.
Proposed fix
if (runImportOnStartup && StringUtils.hasText(importXlsxPath)) {
- CorpusImportResult result = corpusImportService.importFromXlsx(Path.of(importXlsxPath));
- log.info("corpus import 완료: {}", result);
+ try {
+ CorpusImportResult result = corpusImportService.importFromXlsx(Path.of(importXlsxPath));
+ log.info("corpus import 완료: {}", result);
+ } catch (Exception e) {
+ log.error("corpus import startup task failed", e);
+ }
}
if (syncEmbeddingsOnStartup) {
- log.info("corpus embedding sync 완료: {}", corpusEmbeddingSyncService.syncAll(null));
+ try {
+ log.info("corpus embedding sync 완료: {}", corpusEmbeddingSyncService.syncAll(null));
+ } catch (Exception e) {
+ log.error("corpus embedding sync startup task failed", e);
+ }
}📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| if (runImportOnStartup && StringUtils.hasText(importXlsxPath)) { | |
| CorpusImportResult result = corpusImportService.importFromXlsx(Path.of(importXlsxPath)); | |
| log.info("corpus import 완료: {}", result); | |
| } | |
| if (syncEmbeddingsOnStartup) { | |
| log.info("corpus embedding sync 완료: {}", corpusEmbeddingSyncService.syncAll(null)); | |
| } | |
| if (runImportOnStartup && StringUtils.hasText(importXlsxPath)) { | |
| try { | |
| CorpusImportResult result = corpusImportService.importFromXlsx(Path.of(importXlsxPath)); | |
| log.info("corpus import 완료: {}", result); | |
| } catch (Exception e) { | |
| log.error("corpus import startup task failed", e); | |
| } | |
| } | |
| if (syncEmbeddingsOnStartup) { | |
| try { | |
| log.info("corpus embedding sync 완료: {}", corpusEmbeddingSyncService.syncAll(null)); | |
| } catch (Exception e) { | |
| log.error("corpus embedding sync startup task failed", e); | |
| } | |
| } |
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In
`@src/main/java/com/jobdri/jobdri_api/domain/corpus/service/CorpusAdminRunner.java`
around lines 35 - 42, The optional startup tasks in the if blocks (corpus import
and embedding sync) lack local exception handling, which means any failure will
bubble up and stop the application from starting. Wrap each optional task (the
importFromXlsx call and the syncAll call) in try-catch blocks to handle
exceptions locally. Log the error details when exceptions occur, then allow the
application to continue starting. This ensures that failures in these optional
startup operations do not cause the entire application to fail to boot.
| String normalizedJobGroup = normalize(jobGroupL1); | ||
| String normalizedJobFamily = normalize(jobFamilyL2); | ||
| String normalizedRole = normalize(roleL3); | ||
|
|
||
| return mappingRepository | ||
| .findBySourceJobGroupL1AndSourceJobFamilyL2AndSourceRoleL3( | ||
| normalizedJobGroup, | ||
| normalizedJobFamily, | ||
| normalizedRole | ||
| ) | ||
| .orElseGet(() -> mappingRepository.save(CorpusClassificationMapping.create( | ||
| normalizedJobGroup, | ||
| normalizedJobFamily, | ||
| normalizedRole, | ||
| detailClassification | ||
| ))); |
There was a problem hiding this comment.
🧩 Analysis chain
🏁 Script executed:
find . -type f -name "CorpusClassificationResolver.java" | head -5Repository: JobDri-Developer/BackEnd
Length of output: 162
🏁 Script executed:
find . -type f -name "CorpusClassificationMapping.java" | head -5Repository: JobDri-Developer/BackEnd
Length of output: 160
🏁 Script executed:
cat -n src/main/java/com/jobdri/jobdri_api/domain/corpus/service/CorpusClassificationResolver.java | sed -n '60,100p'Repository: JobDri-Developer/BackEnd
Length of output: 1469
🏁 Script executed:
cat -n src/main/java/com/jobdri/jobdri_api/domain/corpus/entity/CorpusClassificationMapping.java | sed -n '1,50p'Repository: JobDri-Developer/BackEnd
Length of output: 2098
🏁 Script executed:
cat -n src/main/java/com/jobdri/jobdri_api/domain/corpus/entity/CorpusClassificationMapping.java | sed -n '45,60p'Repository: JobDri-Developer/BackEnd
Length of output: 742
🏁 Script executed:
rg "registerMapping" src/main/java/ -A 3 -B 3 --type javaRepository: JobDri-Developer/BackEnd
Length of output: 884
Add input validation to prevent null values from breaching NOT NULL constraints.
The registerMapping method normalizes the job triplet parameters, but normalize() returns null for blank inputs (line 90-91). Without validation, null values can be passed to save() at line 81, causing a database constraint violation instead of a controlled validation error. The mapped entity requires non-null source_job_group_l1, source_job_family_l2, and source_role_l3 (lines 29, 32, 35 of CorpusClassificationMapping.java).
Suggested fix
--- a/src/main/java/com/jobdri/jobdri_api/domain/corpus/service/CorpusClassificationResolver.java
+++ b/src/main/java/com/jobdri/jobdri_api/domain/corpus/service/CorpusClassificationResolver.java
@@
public CorpusClassificationMapping registerMapping(
String jobGroupL1,
String jobFamilyL2,
String roleL3,
DetailClassification detailClassification
) {
String normalizedJobGroup = normalize(jobGroupL1);
String normalizedJobFamily = normalize(jobFamilyL2);
String normalizedRole = normalize(roleL3);
+
+ if (!StringUtils.hasText(normalizedJobGroup)
+ || !StringUtils.hasText(normalizedJobFamily)
+ || !StringUtils.hasText(normalizedRole)) {
+ throw new IllegalArgumentException("jobGroupL1, jobFamilyL2, and roleL3 are required.");
+ }
+ if (detailClassification == null) {
+ throw new IllegalArgumentException("detailClassification is required.");
+ }
return mappingRepository
.findBySourceJobGroupL1AndSourceJobFamilyL2AndSourceRoleL3(
normalizedJobGroup,
normalizedJobFamily,
normalizedRole
)🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In
`@src/main/java/com/jobdri/jobdri_api/domain/corpus/service/CorpusClassificationResolver.java`
around lines 71 - 86, The registerMapping method normalizes the three job
parameters (jobGroupL1, jobFamilyL2, roleL3) but does not validate that the
normalize() calls produce non-null results, allowing null values to reach the
database and cause constraint violations instead of controlled validation
errors. Add input validation immediately after the three normalize() calls to
verify that normalizedJobGroup, normalizedJobFamily, and normalizedRole are all
non-null, and throw an appropriate validation exception if any of them are null
before proceeding to the
mappingRepository.findBySourceJobGroupL1AndSourceJobFamilyL2AndSourceRoleL3
call.
| @Transactional(readOnly = true) | ||
| public CorpusEmbeddingSyncResponse syncAll(Integer limit) { | ||
| int jobPostingCount = syncJobPostingEmbeddings(limit); | ||
| int questionCount = syncQuestionEmbeddings(limit); | ||
| return new CorpusEmbeddingSyncResponse(jobPostingCount, questionCount, embeddingModel); | ||
| } |
There was a problem hiding this comment.
🧩 Analysis chain
🏁 Script executed:
git ls-files src/main/java/com/jobdri/jobdri_api/domain/corpus/service/ | head -20Repository: JobDri-Developer/BackEnd
Length of output: 757
🏁 Script executed:
cat -n src/main/java/com/jobdri/jobdri_api/domain/corpus/service/CorpusEmbeddingSyncService.java | head -150Repository: JobDri-Developer/BackEnd
Length of output: 7370
Remove readOnly = true from write sync paths.
Methods at lines 58, 65, and 73 (syncAll, syncJobPostingEmbeddings, syncQuestionEmbeddings) are marked read-only but execute database writes via upsertVectors (line 105). Self-invocation from syncAll means the read-only transaction context is inherited during write operations, which can prevent writes depending on database and transaction manager configuration.
Suggested fix
- `@Transactional`(readOnly = true)
+ `@Transactional`
public CorpusEmbeddingSyncResponse syncAll(Integer limit) {
int jobPostingCount = syncJobPostingEmbeddings(limit);
int questionCount = syncQuestionEmbeddings(limit);
return new CorpusEmbeddingSyncResponse(jobPostingCount, questionCount, embeddingModel);
}
- `@Transactional`(readOnly = true)
+ `@Transactional`
public int syncJobPostingEmbeddings(Integer limit) {
...
}
- `@Transactional`(readOnly = true)
+ `@Transactional`
public int syncQuestionEmbeddings(Integer limit) {
...
}Also applies to: 65-79
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In
`@src/main/java/com/jobdri/jobdri_api/domain/corpus/service/CorpusEmbeddingSyncService.java`
around lines 58 - 63, Remove the readOnly = true parameter from the
`@Transactional` annotation on the three methods syncAll (line 58),
syncJobPostingEmbeddings (line 65), and syncQuestionEmbeddings (line 73). These
methods perform database write operations through calls to upsertVectors, and
marking them as read-only transactions prevents these writes from executing
properly. Either remove the readOnly parameter entirely (to default to readOnly
= false) or explicitly set readOnly = false on the `@Transactional` annotations
for all three methods.
| try (Connection connection = dataSource.getConnection()) { | ||
| PGvector.registerTypes(connection); | ||
| try (PreparedStatement statement = connection.prepareStatement(sql)) { | ||
| Timestamp now = Timestamp.valueOf(LocalDateTime.now()); | ||
| for (int i = 0; i < ids.size(); i++) { | ||
| statement.setLong(1, ids.get(i)); | ||
| statement.setString(2, embeddingModel); | ||
| statement.setObject(3, new PGvector(embeddings.get(i))); | ||
| statement.setTimestamp(4, now); | ||
| statement.setTimestamp(5, now); | ||
| statement.addBatch(); | ||
| } | ||
| statement.executeBatch(); | ||
| } |
There was a problem hiding this comment.
🧩 Analysis chain
🏁 Script executed:
# First, locate and examine the CorpusEmbeddingSyncService file
find . -name "CorpusEmbeddingSyncService.java" -type fRepository: JobDri-Developer/BackEnd
Length of output: 160
🏁 Script executed:
# If found, read the entire file to understand the context and method signature
cat -n "src/main/java/com/jobdri/jobdri_api/domain/corpus/service/CorpusEmbeddingSyncService.java"Repository: JobDri-Developer/BackEnd
Length of output: 7370
🏁 Script executed:
# Search for any Spring annotations (`@Transactional`, `@Service`, etc.) in the file
rg -n "`@Transactional`|`@Service`|`@Component`" "src/main/java/com/jobdri/jobdri_api/domain/corpus/service/CorpusEmbeddingSyncService.java"Repository: JobDri-Developer/BackEnd
Length of output: 197
🌐 Web query:
Spring DataSourceUtils.getConnection() vs dataSource.getConnection() transaction management
💡 Result:
The primary difference between DataSourceUtils.getConnection(DataSource) and DataSource.getConnection() is how they handle Spring-managed transactions and exception translation. DataSourceUtils.getConnection(dataSource) is designed to be aware of Spring's transaction synchronization [1][2][3]. If a transaction is active (e.g., managed by DataSourceTransactionManager or JtaTransactionManager), it returns the connection bound to the current thread instead of a brand-new connection [1][4][2]. This ensures that your code participates in the existing transaction rather than accidentally creating a separate, non-transactional connection [3]. If no transaction is active, it behaves exactly like dataSource.getConnection() [1][4]. In contrast, dataSource.getConnection() always attempts to retrieve a new connection from the underlying data source or connection pool, ignoring any ongoing Spring transactions [4]. Key technical differences include: Transaction Awareness: DataSourceUtils detects thread-bound connections linked to an active Spring transaction [1][2][3]. Exception Handling: DataSourceUtils translates standard SQLExceptions into Spring's hierarchy of unchecked DataAccessExceptions (e.g., CannotGetJdbcConnectionException), which simplifies error handling and provides more meaningful information [1][2][3]. Framework Integration: Spring's internal classes, such as JdbcTemplate, use DataSourceUtils implicitly to ensure seamless participation in transaction management [1][5][3]. When writing manual JDBC code within a Spring application, you should use DataSourceUtils.getConnection(dataSource) to ensure consistent transaction participation [1][3]. You must also use DataSourceUtils.releaseConnection(connection, dataSource) to ensure the connection is closed only if it is not managed externally by an active transaction [2][6]. If you are using higher-level abstractions like JdbcTemplate, this management is handled for you automatically [1][3].
Citations:
- 1: https://docs.spring.io/spring-framework/reference/data-access/jdbc/connections.html
- 2: https://docs.spring.io/spring-framework/docs/current/javadoc-api/org/springframework/jdbc/datasource/DataSourceUtils.html
- 3: https://docs.spring.io/spring-framework/reference/6.2/data-access/transaction/tx-resource-synchronization.html
- 4: https://stackoverflow.com/questions/9642643/datasourceutils-getconnection-vs-datasource-getconnection
- 5: https://docs.spring.io/spring/reference/data-access/jdbc/connections.html
- 6: https://docs.spring.io/spring-framework/docs/6.2.11/javadoc-api/org/springframework/jdbc/datasource/DataSourceUtils.html
🌐 Web query:
Spring @transactional on private methods behavior transaction proxy
💡 Result:
In Spring’s default proxy-based AOP mode, the @Transactional annotation is not supported on private methods [1][2]. The reasons for this behavior are rooted in the mechanics of how Spring implements transaction management: 1. Proxy Limitations: Spring creates proxies for beans either by wrapping them in a JDK dynamic proxy (for interfaces) or by generating a CGLIB subclass (for classes) [2][3]. Because private methods cannot be overridden or accessed by subclasses or proxy objects, the proxy cannot intercept calls to these methods [2][4]. Consequently, the transactional advice is never applied, and the annotation is silently ignored [1][2]. 2. Visibility Requirements: Historically, Spring required methods to be public for transactional advice to work [5][6]. As of Spring 6.0, protected and package-visible methods are supported for class-based proxies (CGLIB) by default [5][2]. However, private methods remain unsupported in standard proxy-based configurations because they are not visible or overridable by the generated proxy [2][4]. 3. Proxy vs. AspectJ: If transactional behavior is required on private methods, developers typically switch from the default proxy-based AOP to AspectJ-based weaving [1][7]. AspectJ functions by modifying the bytecode directly rather than relying on proxy objects, allowing it to intercept private method calls [2]. Important Note on Self-Invocation: Even if a method is public and properly annotated, transactional behavior will be bypassed if the method is called from within the same class (self-invocation) [5][8]. This occurs because the call is made via the 'this' reference, bypassing the proxy entirely and therefore missing the transactional interceptor [1][4]. To resolve self-invocation issues, developers often move the transactional method to a separate bean or use AopContext to access the current proxy [8][7][9].
Citations:
- 1: https://stackoverflow.com/questions/4396284/does-spring-transactional-attribute-work-on-a-private-method
- 2: https://www.baeldung.com/spring-transactional-annotation-private-method
- 3: https://docs.spring.vmware.com/spring-framework/reference/6.1-SNAPSHOT/core/aop/proxying.html
- 4: https://docs.spring.io/spring/reference/core/aop/proxying.html
- 5: https://docs.spring.io/spring-framework/reference/data-access/transaction/declarative/annotations.html
- 6: https://docs.spring.io/spring-framework/docs/2.0.7/javadoc-api/org/springframework/transaction/annotation/AnnotationTransactionAttributeSource.html
- 7: https://stackoverflow.com/questions/3423972/spring-transaction-method-call-by-the-method-within-the-same-class-does-not-wo
- 8: https://docs.spring.io/spring-framework/reference/6.2-SNAPSHOT/data-access/transaction/declarative/annotations.html
- 9: https://docs.spring.io/spring/reference/6.2/core/aop/proxying.html
Transaction management missing for batch write operations.
The upsertVectors() method is called from @Transactional(readOnly=true) public methods but executes writes (INSERT/UPDATE) outside any Spring-managed transaction due to:
- Direct
dataSource.getConnection()at line 110 bypasses Spring's transaction synchronization upsertVectors()is private and non-transactional (Spring's proxy-based AOP cannot intercept private methods)- Batch writes execute without transactional guarantees, risking partial commits on failures
Use DataSourceUtils.getConnection(dataSource) with proper release to participate in active transactions. Additionally, either remove readOnly=true from calling methods or move write operations to a separate @Transactional method to ensure these operations execute within a transaction context.
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In
`@src/main/java/com/jobdri/jobdri_api/domain/corpus/service/CorpusEmbeddingSyncService.java`
around lines 110 - 123, The upsertVectors() method performs batch write
operations using a connection obtained directly from dataSource.getConnection(),
which bypasses Spring's transaction management. Replace
dataSource.getConnection() with DataSourceUtils.getConnection(dataSource) to
ensure the batch operations participate in the active Spring transaction, and
properly release the connection using DataSourceUtils.releaseConnection() in a
finally block or try-with-resources equivalent. Additionally, either remove the
readOnly=true constraint from the public methods that call upsertVectors() and
mark upsertVectors() itself as `@Transactional`, or alternatively create a new
public `@Transactional` (non-readOnly) method that wraps the upsertVectors() call
to provide proper transactional boundaries for the write operations.
| Integer index = headerMap.get(columnName); | ||
| if (index == null) { | ||
| return null; | ||
| } |
There was a problem hiding this comment.
Fail fast when required headers are missing.
getString returns null when a header is absent, which makes required keys like analysis_id/question_id behave as “empty rows” and silently skip the sheet. This can produce a false-success import with zero records.
Proposed fix
Map<String, Integer> headerMap = readHeaderMap(rows.next(), formatter, evaluator);
+validateRequiredHeaders(headerMap, "analysis_id");private void validateRequiredHeaders(Map<String, Integer> headerMap, String... required) {
for (String column : required) {
if (!headerMap.containsKey(column)) {
throw new IllegalArgumentException("Missing required column: " + column);
}
}
}🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In
`@src/main/java/com/jobdri/jobdri_api/domain/corpus/service/CorpusImportService.java`
around lines 254 - 257, The getString method in CorpusImportService currently
returns null when a required header like analysis_id or question_id is missing,
causing rows to be silently skipped and producing a false-success import. Add a
validateRequiredHeaders method that takes the headerMap and a variable number of
required column names as parameters, then iterates through each required column
name and throws an IllegalArgumentException with a descriptive message if any
required column is not found in the headerMap. Call this validation method
immediately after the header map is populated from the sheet (before any data
processing occurs) to fail fast and prevent silent skipping of required fields.
This ensures that missing required headers are caught upfront rather than
silently producing empty imports.
| private Integer getInteger( | ||
| Row row, | ||
| Map<String, Integer> headerMap, | ||
| String columnName, | ||
| DataFormatter formatter, | ||
| FormulaEvaluator evaluator | ||
| ) { | ||
| String value = getString(row, headerMap, columnName, formatter, evaluator); | ||
| if (!StringUtils.hasText(value)) { | ||
| return null; | ||
| } | ||
| return Integer.parseInt(value); | ||
| } |
There was a problem hiding this comment.
Harden integer parsing for Excel-formatted numeric cells.
Line 272 directly calls Integer.parseInt(value). With POI DataFormatter, cells like 1,000 or 120.0 can trigger NumberFormatException, aborting and rolling back the full import.
Proposed fix
private Integer getInteger(
@@
) {
String value = getString(row, headerMap, columnName, formatter, evaluator);
if (!StringUtils.hasText(value)) {
return null;
}
- return Integer.parseInt(value);
+ String normalized = value.replace(",", "");
+ try {
+ if (normalized.endsWith(".0")) {
+ normalized = normalized.substring(0, normalized.length() - 2);
+ }
+ return Integer.parseInt(normalized);
+ } catch (NumberFormatException ex) {
+ throw new IllegalArgumentException("Invalid integer in column '" + columnName + "': " + value, ex);
+ }
}📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| private Integer getInteger( | |
| Row row, | |
| Map<String, Integer> headerMap, | |
| String columnName, | |
| DataFormatter formatter, | |
| FormulaEvaluator evaluator | |
| ) { | |
| String value = getString(row, headerMap, columnName, formatter, evaluator); | |
| if (!StringUtils.hasText(value)) { | |
| return null; | |
| } | |
| return Integer.parseInt(value); | |
| } | |
| private Integer getInteger( | |
| Row row, | |
| Map<String, Integer> headerMap, | |
| String columnName, | |
| DataFormatter formatter, | |
| FormulaEvaluator evaluator | |
| ) { | |
| String value = getString(row, headerMap, columnName, formatter, evaluator); | |
| if (!StringUtils.hasText(value)) { | |
| return null; | |
| } | |
| String normalized = value.replace(",", ""); | |
| try { | |
| if (normalized.endsWith(".0")) { | |
| normalized = normalized.substring(0, normalized.length() - 2); | |
| } | |
| return Integer.parseInt(normalized); | |
| } catch (NumberFormatException ex) { | |
| throw new IllegalArgumentException("Invalid integer in column '" + columnName + "': " + value, ex); | |
| } | |
| } |
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In
`@src/main/java/com/jobdri/jobdri_api/domain/corpus/service/CorpusImportService.java`
around lines 261 - 273, The getInteger method at line 272 calls
Integer.parseInt(value) directly, which throws NumberFormatException when the
DataFormatter returns Excel-formatted numbers like "1,000" or "120.0", causing
the entire import to fail. Add error handling around the Integer.parseInt call
in the getInteger method to catch NumberFormatException, and before parsing,
clean the value string by removing any formatting characters (such as commas)
and handling decimal places appropriately (either by stripping decimals or
rounding). Return null or a sensible default if parsing fails after cleanup,
allowing the import to continue gracefully instead of aborting.
| CREATE TABLE IF NOT EXISTS mock_job_posting_embeddings ( | ||
| id BIGSERIAL PRIMARY KEY, | ||
| corpus_id BIGINT NOT NULL UNIQUE REFERENCES mock_job_posting_corpus(id) ON DELETE CASCADE, | ||
| embedding_model VARCHAR(100) NOT NULL, | ||
| embedding vector NOT NULL, | ||
| created_at TIMESTAMP NOT NULL DEFAULT CURRENT_TIMESTAMP, | ||
| updated_at TIMESTAMP NOT NULL DEFAULT CURRENT_TIMESTAMP | ||
| ); |
There was a problem hiding this comment.
🧩 Analysis chain
🏁 Script executed:
#!/bin/bash
# Description: Search for CREATE TABLE statements for parent corpus tables.
rg -n --type=sql -C3 'CREATE TABLE.*mock_job_posting_corpus'
rg -n --type=sql -C3 'CREATE TABLE.*mock_question_corpus'Repository: JobDri-Developer/BackEnd
Length of output: 50
🏁 Script executed:
# First, let's read the entire schema.sql file to see all tables
git ls-files | grep -i schemaRepository: JobDri-Developer/BackEnd
Length of output: 98
🏁 Script executed:
# Search for the parent table names anywhere in SQL files
rg -n 'mock_job_posting_corpus|mock_question_corpus' --type=sql -B2 -A2Repository: JobDri-Developer/BackEnd
Length of output: 902
🏁 Script executed:
# Check the structure and directory layout for SQL files
find . -name "*.sql" -type f | head -20Repository: JobDri-Developer/BackEnd
Length of output: 100
🏁 Script executed:
# Read the complete schema.sql file to see all table definitions
cat -n src/main/resources/schema.sqlRepository: JobDri-Developer/BackEnd
Length of output: 1679
🏁 Script executed:
# Search entire repository for parent table definitions
rg -i 'CREATE TABLE.*mock_job_posting_corpus|CREATE TABLE.*mock_question_corpus' --type=sqlRepository: JobDri-Developer/BackEnd
Length of output: 50
🏁 Script executed:
# Also search in Java/Kotlin files for any entity or migration definitions
rg -i 'mock_job_posting_corpus|mock_question_corpus' --type=java --type=kt -lRepository: JobDri-Developer/BackEnd
Length of output: 95
🏁 Script executed:
# Check for migration directories that might contain table definitions
find . -type d \( -name "*migration*" -o -name "*flyway*" -o -name "*liquibase*" \) 2>/dev/nullRepository: JobDri-Developer/BackEnd
Length of output: 50
🏁 Script executed:
# Search for parent tables in all files (no type restriction)
rg 'mock_job_posting_corpus|mock_question_corpus' -g '!.git' | head -20Repository: JobDri-Developer/BackEnd
Length of output: 1912
🏁 Script executed:
# Check if parent tables are defined in Java entity classes
find . -name "*.java" -type f -exec grep -l "mock_job_posting_corpus\|mock_question_corpus" {} \;Repository: JobDri-Developer/BackEnd
Length of output: 236
🏁 Script executed:
# Search for any reference to "corpus" table in the codebase
rg 'CREATE TABLE.*corpus' --type=sql -iRepository: JobDri-Developer/BackEnd
Length of output: 50
🏁 Script executed:
# Read the MockJobPostingCorpus entity to understand table generation config
cat -n src/main/java/com/jobdri/jobdri_api/domain/corpus/entity/MockJobPostingCorpus.java | head -30Repository: JobDri-Developer/BackEnd
Length of output: 1377
🏁 Script executed:
# Read the MockQuestionCorpus entity to understand table generation config
cat -n src/main/java/com/jobdri/jobdri_api/domain/corpus/entity/MockQuestionCorpus.java | head -30Repository: JobDri-Developer/BackEnd
Length of output: 1436
Parent corpus tables are defined in JPA entities, not in schema.sql—verify DDL execution order prevents FK constraint failures.
The schema creates foreign key constraints referencing mock_job_posting_corpus(id) and mock_question_corpus(id), but these tables are not defined in schema.sql. They are defined as JPA @Entity classes (MockJobPostingCorpus and MockQuestionCorpus), which means their tables are created by Hibernate's DDL generation, not by this SQL file.
Since foreign key constraints require the parent table to exist at the time of the CREATE TABLE statement, schema.sql will fail if it executes before Hibernate creates these tables. Ensure proper initialization order: either disable hibernate.ddl-auto and manually manage DDL through Flyway/Liquibase, or verify schema.sql runs after JPA table generation.
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@src/main/resources/schema.sql` around lines 4 - 11, The
mock_job_posting_embeddings table in schema.sql defines foreign key constraints
referencing mock_job_posting_corpus, which is not defined in schema.sql but
instead is created by Hibernate as a JPA `@Entity` (MockJobPostingCorpus). Since
the foreign key constraint requires the parent table to exist first, schema.sql
will fail if it executes before Hibernate creates the JPA entity tables. Resolve
this by either disabling hibernate.ddl-auto in application properties and
managing all DDL through a migration tool like Flyway or Liquibase, or ensuring
schema.sql is executed after JPA table generation by adjusting your
initialization configuration. Verify that all referenced parent tables
(mock_job_posting_corpus and mock_question_corpus) are consistently managed
through the same DDL mechanism.
✨ 어떤 이유로 PR를 하셨나요?
📋 세부 내용 - 왜 해당 PR이 필요한지 작업 내용을 자세하게 설명해주세요
📸 작업 화면 스크린샷
🚨 관련 이슈 번호 [#26]
Summary by CodeRabbit
Release Notes
New Features
Dependencies