This document helps AI security scanners and vulnerability researchers understand the security model of Apache PDFBox and avoid false positives.
The canonical, human-authored version of this information is published at https://pdfbox.apache.org/security.html.
Apache PDFBox is a Java library (pdfbox.jar) for creating, manipulating,
and extracting content from PDF documents. Other Java applications use it
through its public Java API. It also ships command-line utilities
built on top of that API.
PDFBox supports processing of untrusted (potentially malicious or malformed) PDF files to a limited degree:
- In scope: Remote code execution, privilege escalation, unauthorized data access, escape from an embedding application's sandbox or security boundary, or disproportionate resource amplification (i.e. small attacker-controlled inputs triggering catastrophic memory or CPU consumption) caused by processing an untrusted PDF document. These are genuine vulnerabilities and should be reported privately.
- Known limitations (not vulnerabilities): Malformed PDFs may cause unchecked exceptions and errors such as NullPointerException or StackOverflowError, or may consume significant memory or CPU relative to document size. General parser and resource exhaustion issues are treated as robustness limitations rather than security vulnerabilities. Pure resource exhaustion findings without privilege escalation, unauthorized access, or disproportionate amplification (e.g. tenant starvation, indexing-service disruption from a tiny crafted input) are generally out of scope.
Applications embedding PDFBox are expected to apply appropriate operational controls — such as timeouts, memory limits, process isolation, and sandboxing — when processing untrusted documents at scale or in multi-tenant environments. Resource exhaustion risks in those contexts are the responsibility of the embedding application, not of PDFBox itself.
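As one illustration of the operational controls described above, an embedding application might bound the wall-clock time spent on each untrusted document. The following is a minimal sketch that uses only java.util.concurrent; the class BoundedTask and the method runWithTimeout are hypothetical names, not part of PDFBox, and the Callable stands in for whatever PDFBox call the application actually makes (for example, loading a document and extracting its text). The 1,000 ms and 100 ms budgets are arbitrary. Note that Future.cancel(true) only requests interruption: CPU-bound parsing code may never check for it, so a timed-out worker thread can keep running. Hard CPU and memory limits generally require running the work in a separate process with its own heap limit (-Xmx).

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

/** Hypothetical helper: runs one unit of untrusted-document work under a wall-clock limit. */
public final class BoundedTask {

    private BoundedTask() {
    }

    public static <T> T runWithTimeout(Callable<T> task, long timeoutMillis) throws Exception {
        // One short-lived daemon worker per task, so a hung worker cannot keep the JVM alive.
        ExecutorService executor = Executors.newSingleThreadExecutor(r -> {
            Thread t = new Thread(r, "untrusted-pdf-worker");
            t.setDaemon(true);
            return t;
        });
        try {
            Future<T> future = executor.submit(task);
            try {
                return future.get(timeoutMillis, TimeUnit.MILLISECONDS);
            } catch (TimeoutException e) {
                // Only requests interruption; CPU-bound parsing may not check for it.
                future.cancel(true);
                throw e;
            } catch (ExecutionException e) {
                // Rethrow checked and unchecked exceptions (e.g. IOException, NullPointerException) as-is.
                Throwable cause = e.getCause();
                if (cause instanceof Exception) {
                    throw (Exception) cause;
                }
                // Errors such as StackOverflowError stay wrapped in the ExecutionException.
                throw e;
            }
        } finally {
            executor.shutdownNow();
        }
    }

    public static void main(String[] args) throws Exception {
        // Stand-in for a PDFBox call such as loading a document and extracting its text.
        String text = runWithTimeout(() -> "extracted text", 1_000);
        System.out.println(text); // prints "extracted text"

        try {
            // Simulates a document that takes far longer than the budget allows.
            runWithTimeout(() -> {
                Thread.sleep(5_000);
                return "never returned";
            }, 100);
        } catch (TimeoutException e) {
            System.out.println("timed out"); // prints "timed out"
        }
    }
}
```

Exceptions thrown by the task (including unchecked ones such as NullPointerException) are rethrown directly, while Errors such as StackOverflowError arrive wrapped in an ExecutionException. Either way, the caller can record a robustness failure for that one document instead of letting it take down the whole service.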
PDFBox uses the Java Cryptography Architecture (JCA) and the Bouncy Castle libraries to implement PDF encryption (RC4, AES-128, AES-256) and digital signatures. Vulnerabilities in these dependencies should be reported to their respective projects; PDFBox will incorporate fixes in its releases as needed. Issues in how PDFBox uses those libraries (e.g. incorrect key derivation, bypass of access permissions) are in scope.
PDFBox is a library. The embedding application is assumed to control its own JVM environment, including classpath, JVM arguments, installed security providers, classloaders, and dependency resolution. Attacks that require the attacker to influence any of these are out of scope.
The command-line tools (PDFToText, PDFDebugger, etc.) are convenience
wrappers around the library API. Their security scope follows the library
scope above.
For a full list of disclosed CVEs, see https://pdfbox.apache.org/security.html.
Scanners should check there before reporting a finding to avoid duplicate reports.
Do not open a public JIRA issue for an undisclosed vulnerability.
Report undisclosed vulnerabilities by sending a plain-text email to:
security@apache.org
Send one email per vulnerability. The PDFBox security team will work with you privately to confirm and resolve the issue before public disclosure.
The typical handling process is:
- Reporter sends details to security@apache.org.
- The PDFBox security team acknowledges receipt and works privately with the reporter to validate and fix the issue.
- A new release is prepared that includes the fix.
- The vulnerability and its fix are publicly announced on the blog and in the CVE database.
For more detail on the Apache vulnerability handling process, see https://www.apache.org/security/committers.html.