Skip to content

Feat/#26 mock jobposting entity#94

Merged
shinae1023 merged 3 commits into
mainfrom
feat/#26-mock-jobposting-entity
Jun 15, 2026
Merged

Feat/#26 mock jobposting entity#94
shinae1023 merged 3 commits into
mainfrom
feat/#26-mock-jobposting-entity

Conversation

@shinae1023

@shinae1023 shinae1023 commented Jun 15, 2026

Copy link
Copy Markdown
Member

✨ 어떤 이유로 PR를 하셨나요?

  • feature 병합
  • 버그 수정(아래에 issue #를 남겨주세요)
  • 코드 개선
  • 코드 수정
  • 배포
  • 기타(아래에 자세한 내용 기입해주세요)

📋 세부 내용 - 왜 해당 PR이 필요한지 작업 내용을 자세하게 설명해주세요

📸 작업 화면 스크린샷

⚠️ PR하기 전에 확인해주세요

  • 로컬테스트를 진행하셨나요?
  • 머지할 브랜치를 확인하셨나요?
  • 관련 label을 선택하셨나요?

🚨 관련 이슈 번호 [#26]

Summary by CodeRabbit

Release Notes

  • New Features

    • Added admin API endpoints to import job postings and questions from Excel files.
    • Added embedding synchronization capability for corpus data.
    • Added support for vector storage and retrieval in the database.
    • Added configurable admin user bootstrap on application startup.
  • Dependencies

    • Added Apache POI for Excel file processing.
    • Added pgvector for PostgreSQL vector operations.

@shinae1023 shinae1023 self-assigned this Jun 15, 2026
@shinae1023 shinae1023 added the ✨ feat New feature or request label Jun 15, 2026
@coderabbitai

coderabbitai Bot commented Jun 15, 2026

Copy link
Copy Markdown

Review Change Stack

📝 Walkthrough

Walkthrough

Introduces a full corpus administration pipeline: three new JPA entities (MockJobPostingCorpus, MockQuestionCorpus, CorpusClassificationMapping) backed by new DB tables and embedding storage, an XLSX import service (Apache POI), a Cohere embedding sync service (pgvector via JDBC), a startup admin bootstrap service, a CorpusAdminController REST API, and supporting security, config, and schema changes.

Changes

Corpus Admin Pipeline

Layer / File(s) Summary
Schema, entities, and repositories
src/main/resources/schema.sql, build.gradle, src/.../corpus/entity/MockJobPostingCorpus.java, src/.../corpus/entity/MockQuestionCorpus.java, src/.../corpus/entity/CorpusClassificationMapping.java, src/.../corpus/repository/Mock*Repository.java, src/.../corpus/repository/CorpusClassificationMappingRepository.java
Adds vector extension, two embedding tables (mock_job_posting_embeddings, mock_question_embeddings) with cascade-delete FK and index, three JPA entities with factory methods and mutators, and their Spring Data repositories. Adds poi-ooxml, pgvector, and spring-boot-starter-jdbc dependencies.
Admin role and security bootstrap
src/.../user/entity/User.java, src/.../global/config/SecurityConfig.java, src/.../corpus/service/BootstrapAdminService.java
Adds User.promoteToAdmin(), restricts /api/admin/** to ADMIN role in security config, and adds BootstrapAdminService that reads configured emails and promotes matching users on startup.
Classification resolution
src/.../classification/repository/DetailClassificationRepository.java, src/.../corpus/service/CorpusClassificationResolver.java
Extends DetailClassificationRepository with countByDetailName and a JPQL hierarchy-join query. Adds CorpusClassificationResolver implementing a three-tier resolution (cached mapping → hierarchical name match → unique name fallback) and a registerMapping write path.
XLSX corpus import service
src/.../corpus/service/CorpusImportResult.java, src/.../corpus/service/CorpusImportService.java, src/.../corpus/dto/request/CorpusImportRequest.java
Adds CorpusImportResult record and CorpusImportService that parses jd_embed_corpus and question_embed_corpus sheets via Apache POI, resolves Company and DetailClassification per row, upserts corpus entities, and returns aggregated counters.
Cohere embedding client and sync service
src/.../corpus/service/CorpusEmbeddingClient.java, src/.../corpus/service/CohereCorpusEmbeddingClient.java, src/.../corpus/service/CorpusEmbeddingSyncService.java, src/.../corpus/dto/response/CorpusEmbeddingSyncResponse.java, src/.../corpus/dto/request/CorpusEmbeddingSyncRequest.java
Introduces CorpusEmbeddingClient interface and its Cohere implementation (REST to /v2/embed, Doublefloat[] conversion). CorpusEmbeddingSyncService batches eligible corpus rows, generates embeddings, and upserts them via JDBC PGvector using SQL upsert templates keyed by corpus_id.
Admin controller, startup runner, and configuration
src/.../corpus/controller/CorpusAdminController.java, src/.../corpus/service/CorpusAdminRunner.java, src/main/resources/application-dev.yaml, src/main/resources/application-prod.yaml
Adds CorpusAdminController with POST /import and /embeddings/sync endpoints. Adds CorpusAdminRunner that orchestrates admin bootstrap, optional XLSX import, and optional embedding sync at startup. Dev/prod YAML configs add Cohere key, admin emails, corpus import/embedding parameters, and defer-datasource-initialization.

Sequence Diagram(s)

sequenceDiagram
    participant Client
    participant CorpusAdminController
    participant CorpusImportService
    participant CorpusClassificationResolver
    participant DetailClassificationRepository
    participant MockJobPostingCorpusRepository

    Client->>CorpusAdminController: POST /api/admin/corpus/import (CorpusImportRequest)
    CorpusAdminController->>CorpusImportService: importFromXlsx(path)
    loop Each XLSX row
        CorpusImportService->>CorpusClassificationResolver: resolve(jobGroupL1, jobFamilyL2, roleL3)
        CorpusClassificationResolver->>DetailClassificationRepository: findByHierarchyNames / countByDetailName
        DetailClassificationRepository-->>CorpusClassificationResolver: Optional<DetailClassification>
        CorpusClassificationResolver-->>CorpusImportService: Optional<DetailClassification>
        CorpusImportService->>MockJobPostingCorpusRepository: findBySourceAnalysisId / save
    end
    CorpusImportService-->>CorpusAdminController: CorpusImportResult
    CorpusAdminController-->>Client: ApiResponse<CorpusImportResult>
Loading
sequenceDiagram
    participant Client
    participant CorpusAdminController
    participant CorpusEmbeddingSyncService
    participant CohereCorpusEmbeddingClient
    participant CohereAPI
    participant PostgreSQL

    Client->>CorpusAdminController: POST /api/admin/corpus/embeddings/sync (limit)
    CorpusAdminController->>CorpusEmbeddingSyncService: syncAll(limit)
    CorpusEmbeddingSyncService->>CorpusEmbeddingSyncService: fetch eligible corpus rows, partition batches
    loop Each batch
        CorpusEmbeddingSyncService->>CohereCorpusEmbeddingClient: embed(texts)
        CohereCorpusEmbeddingClient->>CohereAPI: POST /v2/embed
        CohereAPI-->>CohereCorpusEmbeddingClient: float embeddings
        CohereCorpusEmbeddingClient-->>CorpusEmbeddingSyncService: List<float[]>
        CorpusEmbeddingSyncService->>PostgreSQL: JDBC upsert PGvector rows
    end
    CorpusEmbeddingSyncService-->>CorpusAdminController: CorpusEmbeddingSyncResponse
    CorpusAdminController-->>Client: ApiResponse<CorpusEmbeddingSyncResponse>
Loading

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~60 minutes

Poem

🐇 Hop, hop — the corpus grows today,
Excel sheets parsed, embeddings on their way!
Cohere whispers vectors, pgvector stores,
The admin bootstraps, then unlocks new doors.
Classification resolved in tiers of three —
A tidy data pipeline, neat as can be! ✨

🚥 Pre-merge checks | ✅ 2 | ❌ 3

❌ Failed checks (2 warnings, 1 inconclusive)

Check name Status Explanation Resolution
Description check ⚠️ Warning The PR description uses the provided template but is almost entirely empty—only the issue number (#26) is filled in; all required sections (reason, details, screenshots, checks) are left blank or unchecked. Complete all required sections: explain why the PR is needed, provide detailed work description, confirm testing and branch verification, and describe any relevant implementation details.
Docstring Coverage ⚠️ Warning Docstring coverage is 0.00% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
Title check ❓ Inconclusive The title is partially related to the changeset; it refers to the mock job posting entity, but the PR also includes mock question corpus, admin API, embedding synchronization, and classification resolution—making it incomplete. Expand the title to reflect the full scope: e.g., 'Feat/#26: Add corpus import/embedding with mock job posting and question entities' or use a more comprehensive descriptor.
✅ Passed checks (2 passed)
Check name Status Explanation
Linked Issues check ✅ Passed Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check ✅ Passed Check skipped because no linked issues were found for this pull request.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
📝 Generate docstrings
  • Create stacked PR
  • Commit on current branch
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Commit unit tests in branch feat/#26-mock-jobposting-entity

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 10

🧹 Nitpick comments (6)
src/main/java/com/jobdri/jobdri_api/domain/user/entity/User.java (1)

110-112: ⚡ Quick win

Consider restricting visibility and adding audit logging.

The promoteToAdmin() method is public and performs no internal authorization checks, allowing any code with a User reference to grant admin privileges. While the current caller (BootstrapAdminService) checks the role first, direct invocations bypass that safeguard. Additionally, admin role changes typically require audit logging for security and compliance.

🔒 Recommended improvements
  1. Reduce visibility to package-private if callers are within the same package:
-public void promoteToAdmin() {
+void promoteToAdmin() {
  1. Add audit logging:
+@Slf4j  // at class level
+
 public void promoteToAdmin() {
     this.role = UserRole.ADMIN;
+    log.warn("User promoted to ADMIN role: email={}, id={}", this.email, this.id);
 }
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@src/main/java/com/jobdri/jobdri_api/domain/user/entity/User.java` around
lines 110 - 112, The promoteToAdmin() method in the User class is currently
public with no authorization checks or audit logging, creating a security risk
since any code with a User reference can grant admin privileges. Change the
method visibility from public to package-private (remove the public modifier)
since the only caller BootstrapAdminService is in the same package, and add
audit logging to the promoteToAdmin() method to record when admin role changes
occur for compliance and security tracking purposes.
src/main/java/com/jobdri/jobdri_api/domain/corpus/service/BootstrapAdminService.java (1)

33-35: ⚡ Quick win

Consider logging when configured emails are not found.

The current implementation silently skips emails that don't match any user (Line 34 uses ifPresent). For operational visibility, log a warning when a configured bootstrap email has no corresponding user record, as this might indicate a configuration error or a missing account.

📋 Recommended logging improvement
 for (String email : emails) {
-    userRepository.findByEmail(email).ifPresent(this::promoteIfNeeded);
+    var userOpt = userRepository.findByEmail(email);
+    if (userOpt.isPresent()) {
+        promoteIfNeeded(userOpt.get());
+    } else {
+        log.warn("Bootstrap admin email not found in database: {}", email);
+    }
 }
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In
`@src/main/java/com/jobdri/jobdri_api/domain/corpus/service/BootstrapAdminService.java`
around lines 33 - 35, In the BootstrapAdminService class, modify the email
processing loop to add logging visibility when a configured bootstrap email has
no corresponding user. Currently, the
userRepository.findByEmail(email).ifPresent(this::promoteIfNeeded) pattern
silently skips missing emails. Replace this with either ifPresentOrElse() or add
an additional condition to log a warning when the Optional is empty, ensuring
you capture the email address in the log message so operators can identify which
configured emails lack corresponding user records.
src/main/java/com/jobdri/jobdri_api/domain/corpus/repository/MockQuestionCorpusRepository.java (1)

16-16: 💤 Low value

Reconsider the IsNotNull predicate if embeddingText is truly non-nullable.

Similar to MockJobPostingCorpusRepository, the entity mapping shows embeddingText with nullable = false, yet the query includes AndEmbeddingTextIsNotNull. If the column constraint guarantees non-null values, this clause is redundant. Retain it only if rows can exist with null embeddingText during a multi-step import workflow.

♻️ Simplified query if embeddingText is guaranteed non-null
-List<MockQuestionCorpus> findAllByValidForEmbeddingTrueAndEmbeddingTextIsNotNullOrderByIdAsc();
+List<MockQuestionCorpus> findAllByValidForEmbeddingTrueOrderByIdAsc();
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In
`@src/main/java/com/jobdri/jobdri_api/domain/corpus/repository/MockQuestionCorpusRepository.java`
at line 16, The method
findAllByValidForEmbeddingTrueAndEmbeddingTextIsNotNullOrderByIdAsc() in
MockQuestionCorpusRepository includes an EmbeddingTextIsNotNull predicate, but
the entity mapping declares embeddingText with nullable=false, making the
predicate redundant. Determine if embeddingText can actually be null during
import workflows: if it cannot be null (as the entity constraint suggests),
remove the AndEmbeddingTextIsNotNull clause from the method name to simplify it
to findAllByValidForEmbeddingTrueOrderByIdAsc(); if rows can temporarily have
null embeddingText during a multi-step import, instead update the entity mapping
to nullable=true to align with the query logic.
src/main/java/com/jobdri/jobdri_api/domain/corpus/repository/MockJobPostingCorpusRepository.java (1)

14-14: 💤 Low value

Reconsider the IsNotNull predicate if embeddingText is truly non-nullable.

The entity mapping shows embeddingText with nullable = false, yet the query includes AndEmbeddingTextIsNotNull. If the column constraint guarantees non-null values at the database level, the IsNotNull clause is redundant. However, if rows can be inserted before embeddingText is populated (e.g., in a multi-step workflow), the predicate is a necessary safeguard.

♻️ Simplified query if embeddingText is guaranteed non-null
-List<MockJobPostingCorpus> findAllByValidForEmbeddingTrueAndEmbeddingTextIsNotNullOrderByIdAsc();
+List<MockJobPostingCorpus> findAllByValidForEmbeddingTrueOrderByIdAsc();
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In
`@src/main/java/com/jobdri/jobdri_api/domain/corpus/repository/MockJobPostingCorpusRepository.java`
at line 14, Review the entity mapping for the `embeddingText` field in
MockJobPostingCorpus to determine if it is truly non-nullable at the database
level. If the column has a `nullable = false` constraint and there is no
multi-step workflow where rows are inserted before `embeddingText` is populated,
remove the `IsNotNull` predicate from the method name
`findAllByValidForEmbeddingTrueAndEmbeddingTextIsNotNullOrderByIdAsc` since the
database constraint already guarantees non-null values. If the field can be null
during intermediate stages of a workflow, keep the predicate as a safeguard.
src/main/java/com/jobdri/jobdri_api/domain/corpus/service/CorpusImportService.java (1)

222-231: ⚡ Quick win

Cache company resolution within a workbook import.

resolveCompany hits the repository per row. For large sheets with repeated company names, this creates avoidable N+1 lookups.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In
`@src/main/java/com/jobdri/jobdri_api/domain/corpus/service/CorpusImportService.java`
around lines 222 - 231, The resolveCompany method performs a repository lookup
for every row in an import without caching results, causing N+1 query problems
when company names are repeated across rows. Add a local cache (Map) to store
resolved companies by their normalized name within the scope of a single import
operation. Modify resolveCompany to check this cache first before querying the
repository, and populate it when creating or retrieving companies. Initialize
the cache at the beginning of the import method that calls resolveCompany, and
pass it as a parameter to resolveCompany so that repeated company name lookups
within the same import reuse cached results instead of hitting the repository
multiple times.
src/main/java/com/jobdri/jobdri_api/domain/corpus/service/CorpusEmbeddingSyncService.java (1)

67-69: ⚡ Quick win

Apply limit at query level instead of loading full corpora first.

Line 67–69 and Line 75–77 fetch all eligible rows, then Line 129–134 trims in memory; this scales poorly as corpus size grows.

Also applies to: 75-77, 129-134

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In
`@src/main/java/com/jobdri/jobdri_api/domain/corpus/service/CorpusEmbeddingSyncService.java`
around lines 67 - 69, The current implementation fetches all eligible corpora
from the database and then applies the limit in memory using the applyLimit
method, which is inefficient at scale. Modify the repository query methods at
lines 67-69 and 75-77 to accept a limit parameter and apply pagination/limiting
at the query level rather than loading all results first (consider using Spring
Data's Pageable interface or creating new query methods with limit parameters
such as
findAllByValidForEmbeddingTrueAndEmbeddingTextIsNotNullOrderByIdAscWithLimit).
Then remove or simplify the applyLimit method at lines 129-134 since the
limiting will now be handled at the database query level.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In
`@src/main/java/com/jobdri/jobdri_api/domain/classification/repository/DetailClassificationRepository.java`:
- Line 15: The methods countByDetailName and findByDetailName in
DetailClassificationRepository are case-sensitive derived queries, which causes
the fallback resolution in CorpusClassificationResolver to miss matches when
only the casing differs (e.g., searching for "software engineer" when "Software
Engineer" is stored). Replace these case-sensitive derived query methods with
case-insensitive variants by using the appropriate JPA query syntax that
performs case-insensitive comparisons, ensuring that detail names are matched
regardless of letter casing.

In
`@src/main/java/com/jobdri/jobdri_api/domain/corpus/controller/CorpusAdminController.java`:
- Around line 33-39: The importCorpus method in CorpusAdminController accepts a
raw filesystem path directly from the request parameter request.filePath()
without validation, creating a path traversal vulnerability where admin tokens
could access arbitrary files on the server. Fix this by normalizing the provided
file path using Path.of().normalize() or similar canonicalization, then validate
that the normalized path is within an approved configured import root directory
(using a whitelist approach such as startsWith checks against the base
directory). Only proceed with corpusImportService.importFromXlsx() if the path
validation succeeds; otherwise, reject the request with an appropriate error
response.

In
`@src/main/java/com/jobdri/jobdri_api/domain/corpus/service/CohereCorpusEmbeddingClient.java`:
- Around line 40-56: The RestClient instantiation in the
CohereCorpusEmbeddingClient for building the Cohere API client lacks explicit
HTTP timeouts, which can cause request threads to block indefinitely if the API
hangs or responds slowly. Create a SimpleClientHttpRequestFactory instance
configured with a 5-second connect timeout and 10-second read timeout, then pass
this factory to the restClientBuilder via the .requestFactory() method before
calling .build(), following the same pattern used in TossPaymentClient elsewhere
in the codebase.

In
`@src/main/java/com/jobdri/jobdri_api/domain/corpus/service/CorpusAdminRunner.java`:
- Around line 35-42: The optional startup tasks in the if blocks (corpus import
and embedding sync) lack local exception handling, which means any failure will
bubble up and stop the application from starting. Wrap each optional task (the
importFromXlsx call and the syncAll call) in try-catch blocks to handle
exceptions locally. Log the error details when exceptions occur, then allow the
application to continue starting. This ensures that failures in these optional
startup operations do not cause the entire application to fail to boot.

In
`@src/main/java/com/jobdri/jobdri_api/domain/corpus/service/CorpusClassificationResolver.java`:
- Around line 71-86: The registerMapping method normalizes the three job
parameters (jobGroupL1, jobFamilyL2, roleL3) but does not validate that the
normalize() calls produce non-null results, allowing null values to reach the
database and cause constraint violations instead of controlled validation
errors. Add input validation immediately after the three normalize() calls to
verify that normalizedJobGroup, normalizedJobFamily, and normalizedRole are all
non-null, and throw an appropriate validation exception if any of them are null
before proceeding to the
mappingRepository.findBySourceJobGroupL1AndSourceJobFamilyL2AndSourceRoleL3
call.

In
`@src/main/java/com/jobdri/jobdri_api/domain/corpus/service/CorpusEmbeddingSyncService.java`:
- Around line 58-63: Remove the readOnly = true parameter from the
`@Transactional` annotation on the three methods syncAll (line 58),
syncJobPostingEmbeddings (line 65), and syncQuestionEmbeddings (line 73). These
methods perform database write operations through calls to upsertVectors, and
marking them as read-only transactions prevents these writes from executing
properly. Either remove the readOnly parameter entirely (to default to readOnly
= false) or explicitly set readOnly = false on the `@Transactional` annotations
for all three methods.
- Around line 110-123: The upsertVectors() method performs batch write
operations using a connection obtained directly from dataSource.getConnection(),
which bypasses Spring's transaction management. Replace
dataSource.getConnection() with DataSourceUtils.getConnection(dataSource) to
ensure the batch operations participate in the active Spring transaction, and
properly release the connection using DataSourceUtils.releaseConnection() in a
finally block or try-with-resources equivalent. Additionally, either remove the
readOnly=true constraint from the public methods that call upsertVectors() and
mark upsertVectors() itself as `@Transactional`, or alternatively create a new
public `@Transactional` (non-readOnly) method that wraps the upsertVectors() call
to provide proper transactional boundaries for the write operations.

In
`@src/main/java/com/jobdri/jobdri_api/domain/corpus/service/CorpusImportService.java`:
- Around line 261-273: The getInteger method at line 272 calls
Integer.parseInt(value) directly, which throws NumberFormatException when the
DataFormatter returns Excel-formatted numbers like "1,000" or "120.0", causing
the entire import to fail. Add error handling around the Integer.parseInt call
in the getInteger method to catch NumberFormatException, and before parsing,
clean the value string by removing any formatting characters (such as commas)
and handling decimal places appropriately (either by stripping decimals or
rounding). Return null or a sensible default if parsing fails after cleanup,
allowing the import to continue gracefully instead of aborting.
- Around line 254-257: The getString method in CorpusImportService currently
returns null when a required header like analysis_id or question_id is missing,
causing rows to be silently skipped and producing a false-success import. Add a
validateRequiredHeaders method that takes the headerMap and a variable number of
required column names as parameters, then iterates through each required column
name and throws an IllegalArgumentException with a descriptive message if any
required column is not found in the headerMap. Call this validation method
immediately after the header map is populated from the sheet (before any data
processing occurs) to fail fast and prevent silent skipping of required fields.
This ensures that missing required headers are caught upfront rather than
silently producing empty imports.

In `@src/main/resources/schema.sql`:
- Around line 4-11: The mock_job_posting_embeddings table in schema.sql defines
foreign key constraints referencing mock_job_posting_corpus, which is not
defined in schema.sql but instead is created by Hibernate as a JPA `@Entity`
(MockJobPostingCorpus). Since the foreign key constraint requires the parent
table to exist first, schema.sql will fail if it executes before Hibernate
creates the JPA entity tables. Resolve this by either disabling
hibernate.ddl-auto in application properties and managing all DDL through a
migration tool like Flyway or Liquibase, or ensuring schema.sql is executed
after JPA table generation by adjusting your initialization configuration.
Verify that all referenced parent tables (mock_job_posting_corpus and
mock_question_corpus) are consistently managed through the same DDL mechanism.

---

Nitpick comments:
In
`@src/main/java/com/jobdri/jobdri_api/domain/corpus/repository/MockJobPostingCorpusRepository.java`:
- Line 14: Review the entity mapping for the `embeddingText` field in
MockJobPostingCorpus to determine if it is truly non-nullable at the database
level. If the column has a `nullable = false` constraint and there is no
multi-step workflow where rows are inserted before `embeddingText` is populated,
remove the `IsNotNull` predicate from the method name
`findAllByValidForEmbeddingTrueAndEmbeddingTextIsNotNullOrderByIdAsc` since the
database constraint already guarantees non-null values. If the field can be null
during intermediate stages of a workflow, keep the predicate as a safeguard.

In
`@src/main/java/com/jobdri/jobdri_api/domain/corpus/repository/MockQuestionCorpusRepository.java`:
- Line 16: The method
findAllByValidForEmbeddingTrueAndEmbeddingTextIsNotNullOrderByIdAsc() in
MockQuestionCorpusRepository includes an EmbeddingTextIsNotNull predicate, but
the entity mapping declares embeddingText with nullable=false, making the
predicate redundant. Determine if embeddingText can actually be null during
import workflows: if it cannot be null (as the entity constraint suggests),
remove the AndEmbeddingTextIsNotNull clause from the method name to simplify it
to findAllByValidForEmbeddingTrueOrderByIdAsc(); if rows can temporarily have
null embeddingText during a multi-step import, instead update the entity mapping
to nullable=true to align with the query logic.

In
`@src/main/java/com/jobdri/jobdri_api/domain/corpus/service/BootstrapAdminService.java`:
- Around line 33-35: In the BootstrapAdminService class, modify the email
processing loop to add logging visibility when a configured bootstrap email has
no corresponding user. Currently, the
userRepository.findByEmail(email).ifPresent(this::promoteIfNeeded) pattern
silently skips missing emails. Replace this with either ifPresentOrElse() or add
an additional condition to log a warning when the Optional is empty, ensuring
you capture the email address in the log message so operators can identify which
configured emails lack corresponding user records.

In
`@src/main/java/com/jobdri/jobdri_api/domain/corpus/service/CorpusEmbeddingSyncService.java`:
- Around line 67-69: The current implementation fetches all eligible corpora
from the database and then applies the limit in memory using the applyLimit
method, which is inefficient at scale. Modify the repository query methods at
lines 67-69 and 75-77 to accept a limit parameter and apply pagination/limiting
at the query level rather than loading all results first (consider using Spring
Data's Pageable interface or creating new query methods with limit parameters
such as
findAllByValidForEmbeddingTrueAndEmbeddingTextIsNotNullOrderByIdAscWithLimit).
Then remove or simplify the applyLimit method at lines 129-134 since the
limiting will now be handled at the database query level.

In
`@src/main/java/com/jobdri/jobdri_api/domain/corpus/service/CorpusImportService.java`:
- Around line 222-231: The resolveCompany method performs a repository lookup
for every row in an import without caching results, causing N+1 query problems
when company names are repeated across rows. Add a local cache (Map) to store
resolved companies by their normalized name within the scope of a single import
operation. Modify resolveCompany to check this cache first before querying the
repository, and populate it when creating or retrieving companies. Initialize
the cache at the beginning of the import method that calls resolveCompany, and
pass it as a parameter to resolveCompany so that repeated company name lookups
within the same import reuse cached results instead of hitting the repository
multiple times.

In `@src/main/java/com/jobdri/jobdri_api/domain/user/entity/User.java`:
- Around line 110-112: The promoteToAdmin() method in the User class is
currently public with no authorization checks or audit logging, creating a
security risk since any code with a User reference can grant admin privileges.
Change the method visibility from public to package-private (remove the public
modifier) since the only caller BootstrapAdminService is in the same package,
and add audit logging to the promoteToAdmin() method to record when admin role
changes occur for compliance and security tracking purposes.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro Plus

Run ID: 85bc6e99-fe4d-44c6-bff1-040a4cdc5187

📥 Commits

Reviewing files that changed from the base of the PR and between c41f1b1 and 7762d59.

📒 Files selected for processing (25)
  • build.gradle
  • src/main/java/com/jobdri/jobdri_api/domain/classification/repository/DetailClassificationRepository.java
  • src/main/java/com/jobdri/jobdri_api/domain/corpus/controller/CorpusAdminController.java
  • src/main/java/com/jobdri/jobdri_api/domain/corpus/dto/request/CorpusEmbeddingSyncRequest.java
  • src/main/java/com/jobdri/jobdri_api/domain/corpus/dto/request/CorpusImportRequest.java
  • src/main/java/com/jobdri/jobdri_api/domain/corpus/dto/response/CorpusEmbeddingSyncResponse.java
  • src/main/java/com/jobdri/jobdri_api/domain/corpus/entity/CorpusClassificationMapping.java
  • src/main/java/com/jobdri/jobdri_api/domain/corpus/entity/MockJobPostingCorpus.java
  • src/main/java/com/jobdri/jobdri_api/domain/corpus/entity/MockQuestionCorpus.java
  • src/main/java/com/jobdri/jobdri_api/domain/corpus/repository/CorpusClassificationMappingRepository.java
  • src/main/java/com/jobdri/jobdri_api/domain/corpus/repository/MockJobPostingCorpusRepository.java
  • src/main/java/com/jobdri/jobdri_api/domain/corpus/repository/MockQuestionCorpusRepository.java
  • src/main/java/com/jobdri/jobdri_api/domain/corpus/service/BootstrapAdminService.java
  • src/main/java/com/jobdri/jobdri_api/domain/corpus/service/CohereCorpusEmbeddingClient.java
  • src/main/java/com/jobdri/jobdri_api/domain/corpus/service/CorpusAdminRunner.java
  • src/main/java/com/jobdri/jobdri_api/domain/corpus/service/CorpusClassificationResolver.java
  • src/main/java/com/jobdri/jobdri_api/domain/corpus/service/CorpusEmbeddingClient.java
  • src/main/java/com/jobdri/jobdri_api/domain/corpus/service/CorpusEmbeddingSyncService.java
  • src/main/java/com/jobdri/jobdri_api/domain/corpus/service/CorpusImportResult.java
  • src/main/java/com/jobdri/jobdri_api/domain/corpus/service/CorpusImportService.java
  • src/main/java/com/jobdri/jobdri_api/domain/user/entity/User.java
  • src/main/java/com/jobdri/jobdri_api/global/config/SecurityConfig.java
  • src/main/resources/application-dev.yaml
  • src/main/resources/application-prod.yaml
  • src/main/resources/schema.sql

public interface DetailClassificationRepository extends JpaRepository<DetailClassification, Long> {
List<DetailClassification> findAllByMiddleClassificationId(Long middleClassificationId);
Optional<DetailClassification> findByDetailName(String detailName);
long countByDetailName(String detailName);

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major

🧩 Analysis chain

🏁 Script executed:

find . -type f -name "DetailClassificationRepository.java"

Repository: JobDri-Developer/BackEnd

Length of output: 175


🏁 Script executed:

find . -type f -name "CorpusClassificationResolver.java"

Repository: JobDri-Developer/BackEnd

Length of output: 162


🏁 Script executed:

find . -type f -name "*.java" | grep -i "classification" | head -20

Repository: JobDri-Developer/BackEnd

Length of output: 1908


🏁 Script executed:

cat -n src/main/java/com/jobdri/jobdri_api/domain/classification/repository/DetailClassificationRepository.java

Repository: JobDri-Developer/BackEnd

Length of output: 2841


🏁 Script executed:

cat -n src/main/java/com/jobdri/jobdri_api/domain/corpus/service/CorpusClassificationResolver.java

Repository: JobDri-Developer/BackEnd

Length of output: 4267


Use case-insensitive detail-name methods for fallback resolution.

countByDetailName at line 15 and findByDetailName at line 14 are case-sensitive derived queries. In CorpusClassificationResolver (lines 57-59), this causes the fallback resolution to incorrectly skip a unique match when only casing differs (e.g., Software Engineer stored but software engineer searched). The normalize() method only trims whitespace, not case.

Replace with case-insensitive variants:

Suggested fix
--- a/src/main/java/com/jobdri/jobdri_api/domain/classification/repository/DetailClassificationRepository.java
+++ b/src/main/java/com/jobdri/jobdri_api/domain/classification/repository/DetailClassificationRepository.java
@@
-    Optional<DetailClassification> findByDetailName(String detailName);
-    long countByDetailName(String detailName);
+    Optional<DetailClassification> findByDetailNameIgnoreCase(String detailName);
+    long countByDetailNameIgnoreCase(String detailName);
--- a/src/main/java/com/jobdri/jobdri_api/domain/corpus/service/CorpusClassificationResolver.java
+++ b/src/main/java/com/jobdri/jobdri_api/domain/corpus/service/CorpusClassificationResolver.java
@@
-        if (detailClassificationRepository.countByDetailName(normalizedRole) == 1) {
-            return detailClassificationRepository.findByDetailName(normalizedRole);
+        if (detailClassificationRepository.countByDetailNameIgnoreCase(normalizedRole) == 1) {
+            return detailClassificationRepository.findByDetailNameIgnoreCase(normalizedRole);
         }
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In
`@src/main/java/com/jobdri/jobdri_api/domain/classification/repository/DetailClassificationRepository.java`
at line 15, The methods countByDetailName and findByDetailName in
DetailClassificationRepository are case-sensitive derived queries, which causes
the fallback resolution in CorpusClassificationResolver to miss matches when
only the casing differs (e.g., searching for "software engineer" when "Software
Engineer" is stored). Replace these case-sensitive derived query methods with
case-insensitive variants by using the appropriate JPA query syntax that
performs case-insensitive comparisons, ensuring that detail names are matched
regardless of letter casing.

Comment on lines +33 to +39
public ApiResponse<CorpusImportResult> importCorpus(
@Valid @RequestBody CorpusImportRequest request
) throws IOException {
return ApiResponse.onSuccess(
"corpus 엑셀 적재에 성공했습니다.",
corpusImportService.importFromXlsx(Path.of(request.filePath()))
);

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major | 🏗️ Heavy lift

Constrain import file paths to an approved base directory.

Line 38 accepts a raw server filesystem path from the request; this creates an arbitrary local file access surface for admin tokens. Enforce normalization and allowlist a configured import root.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In
`@src/main/java/com/jobdri/jobdri_api/domain/corpus/controller/CorpusAdminController.java`
around lines 33 - 39, The importCorpus method in CorpusAdminController accepts a
raw filesystem path directly from the request parameter request.filePath()
without validation, creating a path traversal vulnerability where admin tokens
could access arbitrary files on the server. Fix this by normalizing the provided
file path using Path.of().normalize() or similar canonicalization, then validate
that the normalized path is within an approved configured import root directory
(using a whitelist approach such as startsWith checks against the base
directory). Only proceed with corpusImportService.importFromXlsx() if the path
validation succeeds; otherwise, reject the request with an appropriate error
response.

Comment on lines +40 to +56
RestClient client = restClientBuilder
.baseUrl("https://api.cohere.com")
.defaultHeader(HttpHeaders.AUTHORIZATION, "Bearer " + cohereApiKey)
.defaultHeader(HttpHeaders.CONTENT_TYPE, MediaType.APPLICATION_JSON_VALUE)
.build();

EmbedResponse response = client.post()
.uri("/v2/embed")
.body(new EmbedRequest(
texts,
embeddingModel,
documentInputType,
outputDimension,
List.of("float")
))
.retrieve()
.body(EmbedResponse.class);

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
# Verify whether RestClient timeout settings are configured anywhere in the repository
rg -n -C3 "RestClient\\.Builder|ClientHttpRequestFactory|setConnectTimeout|setReadTimeout|JdkClientHttpRequestFactory|HttpComponentsClientHttpRequestFactory" src/main/java src/main/resources

Repository: JobDri-Developer/BackEnd

Length of output: 3892


🏁 Script executed:

cat -n src/main/java/com/jobdri/jobdri_api/domain/corpus/service/CohereCorpusEmbeddingClient.java

Repository: JobDri-Developer/BackEnd

Length of output: 3681


🏁 Script executed:

# Search for RestClientBuilder or RestClient bean configuration in Spring configs
rg -n "RestClient\.Builder|`@Bean`.*RestClient|restClientBuilder" src/main/java --type java -A 5

Repository: JobDri-Developer/BackEnd

Length of output: 3026


Add explicit HTTP timeouts to the Cohere client.

Lines 40–56 make a blocking outbound call without connect or read timeouts. If the Cohere API hangs or is slow to respond, this can block request threads indefinitely. Set timeouts using SimpleClientHttpRequestFactory and pass it via .requestFactory(), similar to TossPaymentClient in the same codebase (which configures 5s connect and 10s read timeouts).

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In
`@src/main/java/com/jobdri/jobdri_api/domain/corpus/service/CohereCorpusEmbeddingClient.java`
around lines 40 - 56, The RestClient instantiation in the
CohereCorpusEmbeddingClient for building the Cohere API client lacks explicit
HTTP timeouts, which can cause request threads to block indefinitely if the API
hangs or responds slowly. Create a SimpleClientHttpRequestFactory instance
configured with a 5-second connect timeout and 10-second read timeout, then pass
this factory to the restClientBuilder via the .requestFactory() method before
calling .build(), following the same pattern used in TossPaymentClient elsewhere
in the codebase.

Comment on lines +35 to +42
if (runImportOnStartup && StringUtils.hasText(importXlsxPath)) {
CorpusImportResult result = corpusImportService.importFromXlsx(Path.of(importXlsxPath));
log.info("corpus import 완료: {}", result);
}

if (syncEmbeddingsOnStartup) {
log.info("corpus embedding sync 완료: {}", corpusEmbeddingSyncService.syncAll(null));
}

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Isolate optional startup tasks from application boot failure.

Lines 35–42 run optional import/sync without local exception handling. A single file/path/API failure will bubble up through run(...) and can stop the whole app from starting.

Proposed fix
 if (runImportOnStartup && StringUtils.hasText(importXlsxPath)) {
-    CorpusImportResult result = corpusImportService.importFromXlsx(Path.of(importXlsxPath));
-    log.info("corpus import 완료: {}", result);
+    try {
+        CorpusImportResult result = corpusImportService.importFromXlsx(Path.of(importXlsxPath));
+        log.info("corpus import 완료: {}", result);
+    } catch (Exception e) {
+        log.error("corpus import startup task failed", e);
+    }
 }
 
 if (syncEmbeddingsOnStartup) {
-    log.info("corpus embedding sync 완료: {}", corpusEmbeddingSyncService.syncAll(null));
+    try {
+        log.info("corpus embedding sync 완료: {}", corpusEmbeddingSyncService.syncAll(null));
+    } catch (Exception e) {
+        log.error("corpus embedding sync startup task failed", e);
+    }
 }
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
if (runImportOnStartup && StringUtils.hasText(importXlsxPath)) {
CorpusImportResult result = corpusImportService.importFromXlsx(Path.of(importXlsxPath));
log.info("corpus import 완료: {}", result);
}
if (syncEmbeddingsOnStartup) {
log.info("corpus embedding sync 완료: {}", corpusEmbeddingSyncService.syncAll(null));
}
if (runImportOnStartup && StringUtils.hasText(importXlsxPath)) {
try {
CorpusImportResult result = corpusImportService.importFromXlsx(Path.of(importXlsxPath));
log.info("corpus import 완료: {}", result);
} catch (Exception e) {
log.error("corpus import startup task failed", e);
}
}
if (syncEmbeddingsOnStartup) {
try {
log.info("corpus embedding sync 완료: {}", corpusEmbeddingSyncService.syncAll(null));
} catch (Exception e) {
log.error("corpus embedding sync startup task failed", e);
}
}
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In
`@src/main/java/com/jobdri/jobdri_api/domain/corpus/service/CorpusAdminRunner.java`
around lines 35 - 42, The optional startup tasks in the if blocks (corpus import
and embedding sync) lack local exception handling, which means any failure will
bubble up and stop the application from starting. Wrap each optional task (the
importFromXlsx call and the syncAll call) in try-catch blocks to handle
exceptions locally. Log the error details when exceptions occur, then allow the
application to continue starting. This ensures that failures in these optional
startup operations do not cause the entire application to fail to boot.

Comment on lines +71 to +86
String normalizedJobGroup = normalize(jobGroupL1);
String normalizedJobFamily = normalize(jobFamilyL2);
String normalizedRole = normalize(roleL3);

return mappingRepository
.findBySourceJobGroupL1AndSourceJobFamilyL2AndSourceRoleL3(
normalizedJobGroup,
normalizedJobFamily,
normalizedRole
)
.orElseGet(() -> mappingRepository.save(CorpusClassificationMapping.create(
normalizedJobGroup,
normalizedJobFamily,
normalizedRole,
detailClassification
)));

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major

🧩 Analysis chain

🏁 Script executed:

find . -type f -name "CorpusClassificationResolver.java" | head -5

Repository: JobDri-Developer/BackEnd

Length of output: 162


🏁 Script executed:

find . -type f -name "CorpusClassificationMapping.java" | head -5

Repository: JobDri-Developer/BackEnd

Length of output: 160


🏁 Script executed:

cat -n src/main/java/com/jobdri/jobdri_api/domain/corpus/service/CorpusClassificationResolver.java | sed -n '60,100p'

Repository: JobDri-Developer/BackEnd

Length of output: 1469


🏁 Script executed:

cat -n src/main/java/com/jobdri/jobdri_api/domain/corpus/entity/CorpusClassificationMapping.java | sed -n '1,50p'

Repository: JobDri-Developer/BackEnd

Length of output: 2098


🏁 Script executed:

cat -n src/main/java/com/jobdri/jobdri_api/domain/corpus/entity/CorpusClassificationMapping.java | sed -n '45,60p'

Repository: JobDri-Developer/BackEnd

Length of output: 742


🏁 Script executed:

rg "registerMapping" src/main/java/ -A 3 -B 3 --type java

Repository: JobDri-Developer/BackEnd

Length of output: 884


Add input validation to prevent null values from breaching NOT NULL constraints.

The registerMapping method normalizes the job triplet parameters, but normalize() returns null for blank inputs (line 90-91). Without validation, null values can be passed to save() at line 81, causing a database constraint violation instead of a controlled validation error. The mapped entity requires non-null source_job_group_l1, source_job_family_l2, and source_role_l3 (lines 29, 32, 35 of CorpusClassificationMapping.java).

Suggested fix
--- a/src/main/java/com/jobdri/jobdri_api/domain/corpus/service/CorpusClassificationResolver.java
+++ b/src/main/java/com/jobdri/jobdri_api/domain/corpus/service/CorpusClassificationResolver.java
@@
     public CorpusClassificationMapping registerMapping(
             String jobGroupL1,
             String jobFamilyL2,
             String roleL3,
             DetailClassification detailClassification
     ) {
         String normalizedJobGroup = normalize(jobGroupL1);
         String normalizedJobFamily = normalize(jobFamilyL2);
         String normalizedRole = normalize(roleL3);
+
+        if (!StringUtils.hasText(normalizedJobGroup)
+                || !StringUtils.hasText(normalizedJobFamily)
+                || !StringUtils.hasText(normalizedRole)) {
+            throw new IllegalArgumentException("jobGroupL1, jobFamilyL2, and roleL3 are required.");
+        }
+        if (detailClassification == null) {
+            throw new IllegalArgumentException("detailClassification is required.");
+        }
 
         return mappingRepository
                 .findBySourceJobGroupL1AndSourceJobFamilyL2AndSourceRoleL3(
                         normalizedJobGroup,
                         normalizedJobFamily,
                         normalizedRole
                 )
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In
`@src/main/java/com/jobdri/jobdri_api/domain/corpus/service/CorpusClassificationResolver.java`
around lines 71 - 86, The registerMapping method normalizes the three job
parameters (jobGroupL1, jobFamilyL2, roleL3) but does not validate that the
normalize() calls produce non-null results, allowing null values to reach the
database and cause constraint violations instead of controlled validation
errors. Add input validation immediately after the three normalize() calls to
verify that normalizedJobGroup, normalizedJobFamily, and normalizedRole are all
non-null, and throw an appropriate validation exception if any of them are null
before proceeding to the
mappingRepository.findBySourceJobGroupL1AndSourceJobFamilyL2AndSourceRoleL3
call.

Comment on lines +58 to +63
@Transactional(readOnly = true)
public CorpusEmbeddingSyncResponse syncAll(Integer limit) {
int jobPostingCount = syncJobPostingEmbeddings(limit);
int questionCount = syncQuestionEmbeddings(limit);
return new CorpusEmbeddingSyncResponse(jobPostingCount, questionCount, embeddingModel);
}

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major

🧩 Analysis chain

🏁 Script executed:

git ls-files src/main/java/com/jobdri/jobdri_api/domain/corpus/service/ | head -20

Repository: JobDri-Developer/BackEnd

Length of output: 757


🏁 Script executed:

cat -n src/main/java/com/jobdri/jobdri_api/domain/corpus/service/CorpusEmbeddingSyncService.java | head -150

Repository: JobDri-Developer/BackEnd

Length of output: 7370


Remove readOnly = true from write sync paths.

Methods at lines 58, 65, and 73 (syncAll, syncJobPostingEmbeddings, syncQuestionEmbeddings) are marked read-only but execute database writes via upsertVectors (line 105). Self-invocation from syncAll means the read-only transaction context is inherited during write operations, which can prevent writes depending on database and transaction manager configuration.

Suggested fix
-    `@Transactional`(readOnly = true)
+    `@Transactional`
     public CorpusEmbeddingSyncResponse syncAll(Integer limit) {
         int jobPostingCount = syncJobPostingEmbeddings(limit);
         int questionCount = syncQuestionEmbeddings(limit);
         return new CorpusEmbeddingSyncResponse(jobPostingCount, questionCount, embeddingModel);
     }

-    `@Transactional`(readOnly = true)
+    `@Transactional`
     public int syncJobPostingEmbeddings(Integer limit) {
         ...
     }

-    `@Transactional`(readOnly = true)
+    `@Transactional`
     public int syncQuestionEmbeddings(Integer limit) {
         ...
     }

Also applies to: 65-79

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In
`@src/main/java/com/jobdri/jobdri_api/domain/corpus/service/CorpusEmbeddingSyncService.java`
around lines 58 - 63, Remove the readOnly = true parameter from the
`@Transactional` annotation on the three methods syncAll (line 58),
syncJobPostingEmbeddings (line 65), and syncQuestionEmbeddings (line 73). These
methods perform database write operations through calls to upsertVectors, and
marking them as read-only transactions prevents these writes from executing
properly. Either remove the readOnly parameter entirely (to default to readOnly
= false) or explicitly set readOnly = false on the `@Transactional` annotations
for all three methods.

Comment on lines +110 to +123
try (Connection connection = dataSource.getConnection()) {
PGvector.registerTypes(connection);
try (PreparedStatement statement = connection.prepareStatement(sql)) {
Timestamp now = Timestamp.valueOf(LocalDateTime.now());
for (int i = 0; i < ids.size(); i++) {
statement.setLong(1, ids.get(i));
statement.setString(2, embeddingModel);
statement.setObject(3, new PGvector(embeddings.get(i)));
statement.setTimestamp(4, now);
statement.setTimestamp(5, now);
statement.addBatch();
}
statement.executeBatch();
}

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major

🧩 Analysis chain

🏁 Script executed:

# First, locate and examine the CorpusEmbeddingSyncService file
find . -name "CorpusEmbeddingSyncService.java" -type f

Repository: JobDri-Developer/BackEnd

Length of output: 160


🏁 Script executed:

# If found, read the entire file to understand the context and method signature
cat -n "src/main/java/com/jobdri/jobdri_api/domain/corpus/service/CorpusEmbeddingSyncService.java"

Repository: JobDri-Developer/BackEnd

Length of output: 7370


🏁 Script executed:

# Search for any Spring annotations (`@Transactional`, `@Service`, etc.) in the file
rg -n "`@Transactional`|`@Service`|`@Component`" "src/main/java/com/jobdri/jobdri_api/domain/corpus/service/CorpusEmbeddingSyncService.java"

Repository: JobDri-Developer/BackEnd

Length of output: 197


🌐 Web query:

Spring DataSourceUtils.getConnection() vs dataSource.getConnection() transaction management

💡 Result:

The primary difference between DataSourceUtils.getConnection(DataSource) and DataSource.getConnection() is how they handle Spring-managed transactions and exception translation. DataSourceUtils.getConnection(dataSource) is designed to be aware of Spring's transaction synchronization [1][2][3]. If a transaction is active (e.g., managed by DataSourceTransactionManager or JtaTransactionManager), it returns the connection bound to the current thread instead of a brand-new connection [1][4][2]. This ensures that your code participates in the existing transaction rather than accidentally creating a separate, non-transactional connection [3]. If no transaction is active, it behaves exactly like dataSource.getConnection() [1][4]. In contrast, dataSource.getConnection() always attempts to retrieve a new connection from the underlying data source or connection pool, ignoring any ongoing Spring transactions [4]. Key technical differences include: Transaction Awareness: DataSourceUtils detects thread-bound connections linked to an active Spring transaction [1][2][3]. Exception Handling: DataSourceUtils translates standard SQLExceptions into Spring's hierarchy of unchecked DataAccessExceptions (e.g., CannotGetJdbcConnectionException), which simplifies error handling and provides more meaningful information [1][2][3]. Framework Integration: Spring's internal classes, such as JdbcTemplate, use DataSourceUtils implicitly to ensure seamless participation in transaction management [1][5][3]. When writing manual JDBC code within a Spring application, you should use DataSourceUtils.getConnection(dataSource) to ensure consistent transaction participation [1][3]. You must also use DataSourceUtils.releaseConnection(connection, dataSource) to ensure the connection is closed only if it is not managed externally by an active transaction [2][6]. If you are using higher-level abstractions like JdbcTemplate, this management is handled for you automatically [1][3].

Citations:


🌐 Web query:

Spring @transactional on private methods behavior transaction proxy

💡 Result:

In Spring’s default proxy-based AOP mode, the @Transactional annotation is not supported on private methods [1][2]. The reasons for this behavior are rooted in the mechanics of how Spring implements transaction management: 1. Proxy Limitations: Spring creates proxies for beans either by wrapping them in a JDK dynamic proxy (for interfaces) or by generating a CGLIB subclass (for classes) [2][3]. Because private methods cannot be overridden or accessed by subclasses or proxy objects, the proxy cannot intercept calls to these methods [2][4]. Consequently, the transactional advice is never applied, and the annotation is silently ignored [1][2]. 2. Visibility Requirements: Historically, Spring required methods to be public for transactional advice to work [5][6]. As of Spring 6.0, protected and package-visible methods are supported for class-based proxies (CGLIB) by default [5][2]. However, private methods remain unsupported in standard proxy-based configurations because they are not visible or overridable by the generated proxy [2][4]. 3. Proxy vs. AspectJ: If transactional behavior is required on private methods, developers typically switch from the default proxy-based AOP to AspectJ-based weaving [1][7]. AspectJ functions by modifying the bytecode directly rather than relying on proxy objects, allowing it to intercept private method calls [2]. Important Note on Self-Invocation: Even if a method is public and properly annotated, transactional behavior will be bypassed if the method is called from within the same class (self-invocation) [5][8]. This occurs because the call is made via the 'this' reference, bypassing the proxy entirely and therefore missing the transactional interceptor [1][4]. To resolve self-invocation issues, developers often move the transactional method to a separate bean or use AopContext to access the current proxy [8][7][9].

Citations:


Transaction management missing for batch write operations.

The upsertVectors() method is called from @Transactional(readOnly=true) public methods but executes writes (INSERT/UPDATE) outside any Spring-managed transaction due to:

  1. Direct dataSource.getConnection() at line 110 bypasses Spring's transaction synchronization
  2. upsertVectors() is private and non-transactional (Spring's proxy-based AOP cannot intercept private methods)
  3. Batch writes execute without transactional guarantees, risking partial commits on failures

Use DataSourceUtils.getConnection(dataSource) with proper release to participate in active transactions. Additionally, either remove readOnly=true from calling methods or move write operations to a separate @Transactional method to ensure these operations execute within a transaction context.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In
`@src/main/java/com/jobdri/jobdri_api/domain/corpus/service/CorpusEmbeddingSyncService.java`
around lines 110 - 123, The upsertVectors() method performs batch write
operations using a connection obtained directly from dataSource.getConnection(),
which bypasses Spring's transaction management. Replace
dataSource.getConnection() with DataSourceUtils.getConnection(dataSource) to
ensure the batch operations participate in the active Spring transaction, and
properly release the connection using DataSourceUtils.releaseConnection() in a
finally block or try-with-resources equivalent. Additionally, either remove the
readOnly=true constraint from the public methods that call upsertVectors() and
mark upsertVectors() itself as `@Transactional`, or alternatively create a new
public `@Transactional` (non-readOnly) method that wraps the upsertVectors() call
to provide proper transactional boundaries for the write operations.

Comment on lines +254 to +257
Integer index = headerMap.get(columnName);
if (index == null) {
return null;
}

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Fail fast when required headers are missing.

getString returns null when a header is absent, which makes required keys like analysis_id/question_id behave as “empty rows” and silently skip the sheet. This can produce a false-success import with zero records.

Proposed fix
 Map<String, Integer> headerMap = readHeaderMap(rows.next(), formatter, evaluator);
+validateRequiredHeaders(headerMap, "analysis_id");
private void validateRequiredHeaders(Map<String, Integer> headerMap, String... required) {
    for (String column : required) {
        if (!headerMap.containsKey(column)) {
            throw new IllegalArgumentException("Missing required column: " + column);
        }
    }
}
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In
`@src/main/java/com/jobdri/jobdri_api/domain/corpus/service/CorpusImportService.java`
around lines 254 - 257, The getString method in CorpusImportService currently
returns null when a required header like analysis_id or question_id is missing,
causing rows to be silently skipped and producing a false-success import. Add a
validateRequiredHeaders method that takes the headerMap and a variable number of
required column names as parameters, then iterates through each required column
name and throws an IllegalArgumentException with a descriptive message if any
required column is not found in the headerMap. Call this validation method
immediately after the header map is populated from the sheet (before any data
processing occurs) to fail fast and prevent silent skipping of required fields.
This ensures that missing required headers are caught upfront rather than
silently producing empty imports.

Comment on lines +261 to +273
private Integer getInteger(
Row row,
Map<String, Integer> headerMap,
String columnName,
DataFormatter formatter,
FormulaEvaluator evaluator
) {
String value = getString(row, headerMap, columnName, formatter, evaluator);
if (!StringUtils.hasText(value)) {
return null;
}
return Integer.parseInt(value);
}

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Harden integer parsing for Excel-formatted numeric cells.

Line 272 directly calls Integer.parseInt(value). With POI DataFormatter, cells like 1,000 or 120.0 can trigger NumberFormatException, aborting and rolling back the full import.

Proposed fix
 private Integer getInteger(
@@
 ) {
     String value = getString(row, headerMap, columnName, formatter, evaluator);
     if (!StringUtils.hasText(value)) {
         return null;
     }
-    return Integer.parseInt(value);
+    String normalized = value.replace(",", "");
+    try {
+        if (normalized.endsWith(".0")) {
+            normalized = normalized.substring(0, normalized.length() - 2);
+        }
+        return Integer.parseInt(normalized);
+    } catch (NumberFormatException ex) {
+        throw new IllegalArgumentException("Invalid integer in column '" + columnName + "': " + value, ex);
+    }
 }
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
private Integer getInteger(
Row row,
Map<String, Integer> headerMap,
String columnName,
DataFormatter formatter,
FormulaEvaluator evaluator
) {
String value = getString(row, headerMap, columnName, formatter, evaluator);
if (!StringUtils.hasText(value)) {
return null;
}
return Integer.parseInt(value);
}
private Integer getInteger(
Row row,
Map<String, Integer> headerMap,
String columnName,
DataFormatter formatter,
FormulaEvaluator evaluator
) {
String value = getString(row, headerMap, columnName, formatter, evaluator);
if (!StringUtils.hasText(value)) {
return null;
}
String normalized = value.replace(",", "");
try {
if (normalized.endsWith(".0")) {
normalized = normalized.substring(0, normalized.length() - 2);
}
return Integer.parseInt(normalized);
} catch (NumberFormatException ex) {
throw new IllegalArgumentException("Invalid integer in column '" + columnName + "': " + value, ex);
}
}
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In
`@src/main/java/com/jobdri/jobdri_api/domain/corpus/service/CorpusImportService.java`
around lines 261 - 273, The getInteger method at line 272 calls
Integer.parseInt(value) directly, which throws NumberFormatException when the
DataFormatter returns Excel-formatted numbers like "1,000" or "120.0", causing
the entire import to fail. Add error handling around the Integer.parseInt call
in the getInteger method to catch NumberFormatException, and before parsing,
clean the value string by removing any formatting characters (such as commas)
and handling decimal places appropriately (either by stripping decimals or
rounding). Return null or a sensible default if parsing fails after cleanup,
allowing the import to continue gracefully instead of aborting.

Comment on lines +4 to +11
CREATE TABLE IF NOT EXISTS mock_job_posting_embeddings (
id BIGSERIAL PRIMARY KEY,
corpus_id BIGINT NOT NULL UNIQUE REFERENCES mock_job_posting_corpus(id) ON DELETE CASCADE,
embedding_model VARCHAR(100) NOT NULL,
embedding vector NOT NULL,
created_at TIMESTAMP NOT NULL DEFAULT CURRENT_TIMESTAMP,
updated_at TIMESTAMP NOT NULL DEFAULT CURRENT_TIMESTAMP
);

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🔴 Critical

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
# Description: Search for CREATE TABLE statements for parent corpus tables.

rg -n --type=sql -C3 'CREATE TABLE.*mock_job_posting_corpus' 
rg -n --type=sql -C3 'CREATE TABLE.*mock_question_corpus'

Repository: JobDri-Developer/BackEnd

Length of output: 50


🏁 Script executed:

# First, let's read the entire schema.sql file to see all tables
git ls-files | grep -i schema

Repository: JobDri-Developer/BackEnd

Length of output: 98


🏁 Script executed:

# Search for the parent table names anywhere in SQL files
rg -n 'mock_job_posting_corpus|mock_question_corpus' --type=sql -B2 -A2

Repository: JobDri-Developer/BackEnd

Length of output: 902


🏁 Script executed:

# Check the structure and directory layout for SQL files
find . -name "*.sql" -type f | head -20

Repository: JobDri-Developer/BackEnd

Length of output: 100


🏁 Script executed:

# Read the complete schema.sql file to see all table definitions
cat -n src/main/resources/schema.sql

Repository: JobDri-Developer/BackEnd

Length of output: 1679


🏁 Script executed:

# Search entire repository for parent table definitions
rg -i 'CREATE TABLE.*mock_job_posting_corpus|CREATE TABLE.*mock_question_corpus' --type=sql

Repository: JobDri-Developer/BackEnd

Length of output: 50


🏁 Script executed:

# Also search in Java/Kotlin files for any entity or migration definitions
rg -i 'mock_job_posting_corpus|mock_question_corpus' --type=java --type=kt -l

Repository: JobDri-Developer/BackEnd

Length of output: 95


🏁 Script executed:

# Check for migration directories that might contain table definitions
find . -type d \( -name "*migration*" -o -name "*flyway*" -o -name "*liquibase*" \) 2>/dev/null

Repository: JobDri-Developer/BackEnd

Length of output: 50


🏁 Script executed:

# Search for parent tables in all files (no type restriction)
rg 'mock_job_posting_corpus|mock_question_corpus' -g '!.git' | head -20

Repository: JobDri-Developer/BackEnd

Length of output: 1912


🏁 Script executed:

# Check if parent tables are defined in Java entity classes
find . -name "*.java" -type f -exec grep -l "mock_job_posting_corpus\|mock_question_corpus" {} \;

Repository: JobDri-Developer/BackEnd

Length of output: 236


🏁 Script executed:

# Search for any reference to "corpus" table in the codebase
rg 'CREATE TABLE.*corpus' --type=sql -i

Repository: JobDri-Developer/BackEnd

Length of output: 50


🏁 Script executed:

# Read the MockJobPostingCorpus entity to understand table generation config
cat -n src/main/java/com/jobdri/jobdri_api/domain/corpus/entity/MockJobPostingCorpus.java | head -30

Repository: JobDri-Developer/BackEnd

Length of output: 1377


🏁 Script executed:

# Read the MockQuestionCorpus entity to understand table generation config
cat -n src/main/java/com/jobdri/jobdri_api/domain/corpus/entity/MockQuestionCorpus.java | head -30

Repository: JobDri-Developer/BackEnd

Length of output: 1436


Parent corpus tables are defined in JPA entities, not in schema.sql—verify DDL execution order prevents FK constraint failures.

The schema creates foreign key constraints referencing mock_job_posting_corpus(id) and mock_question_corpus(id), but these tables are not defined in schema.sql. They are defined as JPA @Entity classes (MockJobPostingCorpus and MockQuestionCorpus), which means their tables are created by Hibernate's DDL generation, not by this SQL file.

Since foreign key constraints require the parent table to exist at the time of the CREATE TABLE statement, schema.sql will fail if it executes before Hibernate creates these tables. Ensure proper initialization order: either disable hibernate.ddl-auto and manually manage DDL through Flyway/Liquibase, or verify schema.sql runs after JPA table generation.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@src/main/resources/schema.sql` around lines 4 - 11, The
mock_job_posting_embeddings table in schema.sql defines foreign key constraints
referencing mock_job_posting_corpus, which is not defined in schema.sql but
instead is created by Hibernate as a JPA `@Entity` (MockJobPostingCorpus). Since
the foreign key constraint requires the parent table to exist first, schema.sql
will fail if it executes before Hibernate creates the JPA entity tables. Resolve
this by either disabling hibernate.ddl-auto in application properties and
managing all DDL through a migration tool like Flyway or Liquibase, or ensuring
schema.sql is executed after JPA table generation by adjusting your
initialization configuration. Verify that all referenced parent tables
(mock_job_posting_corpus and mock_question_corpus) are consistently managed
through the same DDL mechanism.

@shinae1023 shinae1023 merged commit fe5cd60 into main Jun 15, 2026
3 checks passed
@coderabbitai coderabbitai Bot mentioned this pull request Jun 15, 2026
23 tasks
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

✨ feat New feature or request

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant