Fix: Error handling, BAD_PDF fallback, and aspect ratio skip #48

Open: Nishit24113 wants to merge 4 commits into main from fix/error-handling-and-logging

Nishit24113 (Collaborator) commented Jun 9, 2026

Summary

  • Bedrock aspect ratio fix: Form PDFs containing thin lines or borders (images with an aspect ratio above 20:1) no longer crash the alt-text step. Images exceeding Bedrock's limit are skipped and given the alt text "Decorative element".
  • Env var passing fix: The alt-text ECS task now reads s3_bucket/s3_key directly from the Map iterator input instead of indexing into ContainerOverrides (which GuardDuty sidecar injection can reorder).
  • Zero-silent-failure error handling: A new failure-handler Lambda is wired to a Step Functions Catch. On any failure, it writes result/FAILED_<name>.json with the reason category and the failing chunk/page range, so the frontend stops polling and shows the user what happened.
  • BAD_PDF fallback: When the Adobe API rejects a PDF as damaged or too complex (BAD_PDF 400), the container now bypasses autotag/extract, delivers the viewer-preferences PDF, and writes an empty image DB so alt-text completes cleanly. The user gets a partially remediated result instead of nothing.
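
The aspect-ratio skip in the first bullet can be sketched as follows. This is a minimal illustration in Python, not the code in alt_text_generator.js (which is JavaScript). The function names and the `describe` callback are hypothetical, and treating the 20:1 limit as symmetric (wide or tall) is an assumption.

```python
MAX_ASPECT_RATIO = 20.0  # limit quoted in this PR
DECORATIVE_ALT_TEXT = "Decorative element"


def exceeds_aspect_ratio(width: int, height: int, limit: float = MAX_ASPECT_RATIO) -> bool:
    """Return True when the longer side is more than `limit` times the shorter side."""
    if width <= 0 or height <= 0:
        raise ValueError("image dimensions must be positive")
    return max(width, height) / min(width, height) > limit


def alt_text_for(width: int, height: int, describe) -> str:
    """Skip the model call for extreme aspect ratios; otherwise delegate to `describe`.

    `describe` is a hypothetical zero-argument callable standing in for the Bedrock request.
    """
    if exceeds_aspect_ratio(width, height):
        return DECORATIVE_ALT_TEXT
    return describe()
```

Checking the ratio before the request means a thin rule or border never reaches the model, so one oversized image cannot fail the whole chunk.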

Files changed

  • adobe-autotag-container/adobe_autotag_processor.py — BAD_PDF fallback + station error reporting
  • alt-text-generator-container/alt_text_generator.js — aspect ratio validation + skip logic
  • app.py — failure-handler Lambda, Step Functions Catch, env var fix
  • lambda/failure-handler/main.py — new Lambda (aggregates errors, writes FAILED marker)
  • lambda/pdf-splitter-lambda/main.py — pre-Step-Function failure reporting

Test plan

  • Upload a form PDF with thin horizontal lines — passes (aspect ratio images skipped as decorative)
  • Upload a complex/damaged PDF that Adobe rejects — passes (BAD_PDF fallback delivers result)
  • Upload 24 diverse real-world PDFs — all pass
  • Verify failure markers written correctly when actual unrecoverable failures occur
  • Reviewer to verify and merge when ready

…ixes

Bug fixes (from PR #46 and #45):
- Fix env var passing in RunAltTextGenerationTask: read s3_bucket/s3_key
  directly from the Map iterator input instead of indexing into the ECS
  ContainerOverrides array, which GuardDuty sidecar injection can reorder
- Fix Bedrock aspect-ratio rejection for form PDFs: skip images exceeding
  the 20:1 limit and assign "Decorative element" alt text instead of crashing
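
As a rough sketch of the env var fix, the ECS RunTask parameters can reference the Map iterator input through JSONPath (`Value.$`) so that nothing depends on array positions. The container name and the environment variable names S3_BUCKET and S3_KEY are assumptions for illustration; only s3_bucket and s3_key come from this PR.

```python
import json


def alt_text_task_parameters(container_name: str) -> dict:
    """Build the Overrides part of an ECS RunTask state's Parameters.

    Values are read from the Map iterator input ($.s3_bucket, $.s3_key),
    so the order of entries in ContainerOverrides no longer matters.
    """
    return {
        "Overrides": {
            "ContainerOverrides": [
                {
                    "Name": container_name,  # hypothetical container name
                    "Environment": [
                        {"Name": "S3_BUCKET", "Value.$": "$.s3_bucket"},
                        {"Name": "S3_KEY", "Value.$": "$.s3_key"},
                    ],
                }
            ]
        }
    }


if __name__ == "__main__":
    print(json.dumps(alt_text_task_parameters("alt-text-generator"), indent=2))
```

A path that indexes into ContainerOverrides by position resolves to a different container once a sidecar is injected ahead of the intended entry, which is the failure this change avoids.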

Error handling (no failure can go unreported):
- New failure-handler Lambda wired to a Step Functions Catch (States.ALL).
  On any failure it aggregates per-station detail and writes
  result/FAILED_<name>.json where the frontend already polls, carrying the
  reason category and failing chunk/page range
- Instrument all 5 stations to write temp/<name>/_errors/<station>.json plus
  a structured CloudWatch line (station, reason, chunk, page range)
- Splitter writes the marker directly (it runs before the state machine)
- Fix two silent-success paths: the title Lambda returned 500 dicts and the
  Java merger returned an error string, both treated as success by Step
  Functions; they now report and raise so the Catch fires
- Add docs/ERROR_HANDLING.md describing the frontend FAILED_ marker contract
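
The report-and-raise pattern described above might look roughly like this. Step Functions treats any returned value, including a dict with statusCode 500, as success, so the station has to raise for the Catch to fire. The field names and the `write_object(key, body)` callback (a stand-in for an S3 put) are hypothetical.

```python
import json
import logging

logger = logging.getLogger("station")


class StationError(RuntimeError):
    """Raised so the Step Functions task fails instead of returning a 500 dict."""


def report_and_raise(station, name, reason, chunk, page_range, write_object):
    """Record the failure, then raise so the state machine's Catch runs."""
    record = {
        "station": station,
        "reason": reason,
        "chunk": chunk,
        "page_range": page_range,
    }
    body = json.dumps(record)
    logger.error(body)  # structured line for CloudWatch
    write_object(f"temp/{name}/_errors/{station}.json", body)
    raise StationError(f"{station} failed: {reason}")
```
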

Pass the Step Functions execution ARN ($$.Execution.Id) into the failure-handler
payload and record it in result/FAILED_<name>.json so support can trace the exact
failed execution. Verified: markers now include the execution ARN.
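
A minimal sketch of how the failure handler could assemble the marker, assuming the per-station error records have already been read from temp/<name>/_errors/. The JSON field names here are illustrative; docs/ERROR_HANDLING.md defines the actual contract.

```python
import json


def build_failure_marker(name, execution_arn, station_errors):
    """Return (S3 key, JSON body) for the FAILED marker the frontend polls for.

    `station_errors` is a list of dicts, one per station that reported a failure.
    The first entry is used as the headline reason; this choice is an assumption.
    """
    primary = station_errors[0] if station_errors else {}
    marker = {
        "status": "FAILED",
        "reason": primary.get("reason", "UNKNOWN"),
        "chunk": primary.get("chunk"),
        "page_range": primary.get("page_range"),
        "execution_arn": execution_arn,  # from $$.Execution.Id
        "stations": station_errors,
    }
    return f"result/FAILED_{name}.json", json.dumps(marker)
```

Falling back to an UNKNOWN reason when no station file exists keeps the marker written even for failures that occur before any station could report.
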
…jects a PDF

When Adobe API returns BAD_PDF (400), the container now bypasses autotag/extract,
uploads the viewer-preferences PDF as output, and writes an empty image DB so the
alt-text step completes cleanly. The user gets a partially-remediated result instead
of a silent failure.
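
The fallback flow can be sketched as below. `BadPdfError`, the `autotag` callable, the output file names, and the use of SQLite with an `images` table are all assumptions made for illustration. The PR only states that the viewer-preferences PDF is delivered and an empty image DB is written.

```python
import shutil
import sqlite3
from contextlib import closing
from pathlib import Path


class BadPdfError(Exception):
    """Hypothetical stand-in for the Adobe BAD_PDF (400) rejection."""


def process_pdf(viewer_prefs_pdf: Path, out_dir: Path, autotag) -> str:
    """Run autotag/extract; on BAD_PDF, deliver the input PDF and an empty image DB."""
    out_dir.mkdir(parents=True, exist_ok=True)
    output_pdf = out_dir / "output.pdf"
    image_db = out_dir / "images.db"
    try:
        autotag(viewer_prefs_pdf, output_pdf, image_db)
        return "tagged"
    except BadPdfError:
        # Bypass autotag/extract: the viewer-preferences PDF becomes the output.
        shutil.copyfile(viewer_prefs_pdf, output_pdf)
        # Empty image DB so the alt-text step finds zero images and completes.
        with closing(sqlite3.connect(str(image_db))) as conn:
            conn.execute(
                "CREATE TABLE IF NOT EXISTS images "
                "(id INTEGER PRIMARY KEY, path TEXT, alt_text TEXT)"
            )
            conn.commit()
        return "fallback"
```

Only the BAD_PDF case is caught here; any other exception still propagates, so unrelated failures continue to reach the Catch and produce a FAILED marker.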