feat(sparkctl): add PDF extraction contract fixture #5

Open
ProfRandom92 wants to merge 9 commits into main from spark-pdf-structured-data-adapter-v1

@ProfRandom92
Owner

Summary

Adds a lightweight PDF-to-structured-data adapter contract for SPARK-like administrative workflows.

This PR defines a deterministic PDF-EXTRACTION-V1 JSON schema, a synthetic/manual fixture, local runtime fixture-contract validation, focused tests, use-case documentation, and a dedicated local Agent Skill for PDF extraction contracts.

What Changed

  • Adds schemas/spark/pdf_extraction_v1.schema.json for the structured extraction contract.
  • Adds examples/spark/pdf_extraction_fixture.json as a synthetic/manual administrative planning fixture.
  • Adds docs/use-cases/PDF_TO_EVIDENCE_PACKET.md describing how extraction JSON can feed Context Pack / Evidence Packet workflows.
  • Adds runtime validation via validate_pdf_extraction_contract_value with deterministic canonical JSON hashing.
  • Adds positive and negative Rust tests for fixture contract validation.
  • Adds .agents/skills/pdf-extraction-contracts/SKILL.md for future local agent guidance.
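
The deterministic canonical JSON hashing mentioned above can be sketched roughly as follows. This is a minimal illustration and not the PR's implementation: the `Value` enum, `canonical_json`, and `canonical_hash` are hypothetical names, and FNV-1a is used only as a dependency-free stand-in for whatever digest the real validator computes. The idea is that object keys are emitted in sorted order with no insignificant whitespace, so semantically equal documents serialize to identical bytes and therefore hash identically.

```rust
use std::collections::BTreeMap;

/// Minimal JSON-like value (hypothetical; a real validator would likely use a JSON library type).
#[derive(Debug, Clone, PartialEq)]
pub enum Value {
    Null,
    Bool(bool),
    Int(i64),
    Str(String),
    Array(Vec<Value>),
    // BTreeMap iterates keys in sorted order, which gives a stable key order.
    Object(BTreeMap<String, Value>),
}

/// Serializes with sorted object keys and no insignificant whitespace.
pub fn canonical_json(value: &Value) -> String {
    let mut out = String::new();
    write_canonical(value, &mut out);
    out
}

fn write_canonical(value: &Value, out: &mut String) {
    match value {
        Value::Null => out.push_str("null"),
        Value::Bool(b) => out.push_str(if *b { "true" } else { "false" }),
        Value::Int(n) => out.push_str(&n.to_string()),
        Value::Str(s) => write_string(s, out),
        Value::Array(items) => {
            out.push('[');
            for (i, item) in items.iter().enumerate() {
                if i > 0 {
                    out.push(',');
                }
                write_canonical(item, out);
            }
            out.push(']');
        }
        Value::Object(map) => {
            out.push('{');
            for (i, (key, item)) in map.iter().enumerate() {
                if i > 0 {
                    out.push(',');
                }
                write_string(key, out);
                out.push(':');
                write_canonical(item, out);
            }
            out.push('}');
        }
    }
}

/// Writes a JSON string literal, escaping quotes, backslashes, and control characters.
fn write_string(s: &str, out: &mut String) {
    out.push('"');
    for c in s.chars() {
        match c {
            '"' => out.push_str("\\\""),
            '\\' => out.push_str("\\\\"),
            '\n' => out.push_str("\\n"),
            '\r' => out.push_str("\\r"),
            '\t' => out.push_str("\\t"),
            c if (c as u32) < 0x20 => out.push_str(&format!("\\u{:04x}", c as u32)),
            c => out.push(c),
        }
    }
    out.push('"');
}

/// FNV-1a 64-bit: an assumption used here as a stand-in digest, not a cryptographic hash.
pub fn fnv1a_64(bytes: &[u8]) -> u64 {
    let mut hash: u64 = 0xcbf29ce484222325;
    for b in bytes {
        hash ^= u64::from(*b);
        hash = hash.wrapping_mul(0x100000001b3);
    }
    hash
}

/// Hashes the canonical serialization and renders it as fixed-width hex.
pub fn canonical_hash(value: &Value) -> String {
    format!("{:016x}", fnv1a_64(canonical_json(value).as_bytes()))
}
```

Two objects built with the same keys in a different insertion order produce the same `canonical_json` string and the same `canonical_hash`, which is the property a fixture contract needs to make hashes reproducible across runs.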

Boundaries

  • No OCR implementation.
  • No PDF parser.
  • No external downloads.
  • No provider calls.
  • No dependency updates.
  • No Codex plugin bundle, plugin manifest, hooks, commands, or MCP server.
  • No real SPARK, Daimler, ePA, medical, or protected personal data.
  • No production, compliance, legal, forensic, official-SPARK, official-OpenAI-plugin, autonomous-approval, or guaranteed-correctness claim.
  • Provider output remains untrusted.
  • Human review remains required.

Validation

Local validation completed before push:

  • cargo fmt --all --check
  • cargo test
  • cargo clippy --all-targets --all-features -- -D warnings
  • Claim-boundary scan completed; hits were boundary/forbidden wording only.
  • Generated report churn was excluded before commit.

Reviewer Checklist

  • Confirm this is a contract/fixture/runtime-validation PR, not OCR or PDF parsing.
  • Confirm changed files are limited to schema, fixture, use-case doc, local skill, runtime validator, and tests.
  • Confirm review_required: false is rejected.
  • Confirm negative tests cover unknown fields, invalid metadata, blank fields, empty decision points, and review-required boundaries.
  • Confirm no unsafe public claims are introduced.
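
For reviewers, the boundary checks in this checklist might look roughly like the sketch below. It is an illustration only: the `Extraction` struct, the `ContractError` enum, and every field name other than `review_required` are assumptions, and the real validator operates on a JSON value rather than a typed struct.

```rust
/// Hypothetical, simplified view of an extraction record.
/// Field names other than `review_required` are assumptions for illustration.
#[derive(Debug, Clone)]
pub struct Extraction {
    pub title: String,
    pub decision_points: Vec<String>,
    pub review_required: bool,
}

/// Hypothetical error type naming the boundary that was violated.
#[derive(Debug, PartialEq, Eq)]
pub enum ContractError {
    BlankField(&'static str),
    EmptyDecisionPoints,
    ReviewNotRequired,
}

/// Returns the first contract violation found, or Ok(()) if none.
pub fn validate(e: &Extraction) -> Result<(), ContractError> {
    // Whitespace-only strings count as blank.
    if e.title.trim().is_empty() {
        return Err(ContractError::BlankField("title"));
    }
    if e.decision_points.is_empty() {
        return Err(ContractError::EmptyDecisionPoints);
    }
    if e.decision_points.iter().any(|d| d.trim().is_empty()) {
        return Err(ContractError::BlankField("decision_points"));
    }
    // Human review is a hard boundary: `false` is never accepted.
    if !e.review_required {
        return Err(ContractError::ReviewNotRequired);
    }
    Ok(())
}
```

A negative test in this style constructs one otherwise-valid record per boundary, flips a single field, and asserts on the specific error, so that a regression in one check cannot be masked by another.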

@gemini-code-assist gemini-code-assist Bot left a comment

Code Review

This pull request introduces the PDF-EXTRACTION-V1 adapter contract, including its JSON schema, documentation, a synthetic fixture, and a Rust validator with associated tests. The reviewer feedback highlights several discrepancies between the Rust validator implementation and the JSON schema. Specifically, the validator incorrectly rejects empty tables and empty warnings, which are permitted by the schema, and fails to validate the contains_personal_data_risk field. The reviewer provides actionable code suggestions to align the Rust validation logic with the schema and recommends adding corresponding unit tests to verify these cases.
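
A rough sketch of the alignment the review asks for is shown below. It is not the reviewer's suggested patch, which is not visible in this thread. The `Value` enum and `check_reviewed_fields` are hypothetical names. Treating the three fields as required top-level keys is also an assumption, since the summary only says that empty `tables` and `warnings` are permitted and that `contains_personal_data_risk` should be validated.

```rust
use std::collections::BTreeMap;

/// Tiny JSON-like value, a hypothetical stand-in for a real JSON library type.
#[derive(Debug, Clone, PartialEq)]
pub enum Value {
    Bool(bool),
    Str(String),
    Array(Vec<Value>),
    Object(BTreeMap<String, Value>),
}

/// Checks only the three fields named in the review summary.
/// Assumption: they are required top-level keys of the document.
pub fn check_reviewed_fields(doc: &BTreeMap<String, Value>) -> Result<(), String> {
    // Both keys must hold arrays, but an empty array is accepted.
    for key in ["tables", "warnings"] {
        match doc.get(key) {
            Some(Value::Array(_)) => {}
            Some(_) => return Err(format!("{} must be an array", key)),
            None => return Err(format!("{} is missing", key)),
        }
    }
    // The risk flag must be present and must be a boolean.
    match doc.get("contains_personal_data_risk") {
        Some(Value::Bool(_)) => Ok(()),
        Some(_) => Err("contains_personal_data_risk must be a boolean".to_string()),
        None => Err("contains_personal_data_risk is missing".to_string()),
    }
}
```

The matching unit tests would assert that a document with `tables: []` and `warnings: []` passes, and that a non-boolean or absent `contains_personal_data_risk` fails.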

Comment threads

  • agy7rust/src/codec/package.rs: 4 threads (3 outdated)
  • agy7rust/tests/spark_pdf_extraction_contract.rs: 3 threads