diff --git a/README.md b/README.md index 3d0bf54..833a5d6 100644 --- a/README.md +++ b/README.md @@ -366,17 +366,27 @@ PQF is **spec-first, not implementation-first.** The specification is the source ## Cryptographic review wanted -PQF is explicitly seeking review from cryptographers and post-quantum implementers on the following normative sections of [spec/PQF-SPEC-v1.md](spec/PQF-SPEC-v1.md): +PQF is explicitly seeking review from cryptographers and post-quantum implementers. **Start here** if you're reviewing: -- **§2.4** — Hybrid KEM combiner construction (HKDF salt/IKM layout, label binding). Note: the spec uses two distinct strings here — `pqf1-concat-extract-v1` is the algorithm-identifier value placed in the CBOR header field `alg.combiner`; `PQF1-combiner-v1` is the literal byte prefix of the HKDF salt. Both are intentional; the in-tree reference implementation lives in [`HkdfCombiner.cs`](src/PostQuantum.FileFormat/Crypto/HkdfCombiner.cs). -- **§5.2** — Per-chunk AEAD construction and AAD binding (`file_id || chunk_index || is_final`). +- [`spec/PQF-OVERVIEW.md`](./spec/PQF-OVERVIEW.md) — a 3-page reviewer overview that summarizes goals, threat model, primitives, wire format, and the five decisions worth focusing on. Read this first. +- [`spec/external-review/REVIEW-STATUS.md`](./spec/external-review/REVIEW-STATUS.md) — honest layer-by-layer record of what's been reviewed (X-Wing combiner ✅ inherited from upstream), what's been LLM-assisted only (⚠️), and what hasn't been touched yet (❌). + +The normative sections most worth scrutiny in [spec/PQF-SPEC-v1.md](spec/PQF-SPEC-v1.md): + +- **§2.4** — X-Wing combiner adoption (PQF 0.6 dropped the in-house `pqf1-bind-extract-v1` HKDF construction for the standardized X-Wing combiner from draft-connolly-cfrg-xwing-kem). KEK derivation is now `SHA3-256(ss_M || ss_X || ct_X || pk_X || XWING_LABEL)`. The in-tree implementation lives in [`XWingKem.cs`](src/PostQuantum.FileFormat/Crypto/XWingKem.cs). 
What review should focus on is the PQF-specific glue around X-Wing: per-recipient and per-file binding pushed to the DEK-wrap AEAD AAD (`file_id || recipient_index`) since the X-Wing combiner has no salt slot for either. +- **§5.2** — Per-chunk AEAD construction and AAD binding (`file_id || chunk_index || is_final`), with per-chunk-rekey + zero nonce. - **§6.2 step 9** — File-signature coverage composition (`file_id || sha256(chunks) || footer`). - **§6.3 step 7** — ML-KEM implicit-rejection timing and recipient-trial constant-time posture. - **§6.4** — Authenticated vs Streaming Mode failure-signaling contract. -A running list of spec-level questions the author would value review on — including the open question of whether header-signature and file-signature messages should carry distinct domain-separation prefixes (§6.2), and whether the footer should be AEAD-bound on unsigned files — lives in [`spec/PQF-DESIGN-RATIONALE-v1.md` §11](./spec/PQF-DESIGN-RATIONALE-v1.md#11-open-questions-the-author-acknowledges). +A running list of spec-level questions the author would value review on lives in [`spec/PQF-DESIGN-RATIONALE-v1.md` §11](./spec/PQF-DESIGN-RATIONALE-v1.md#11-open-questions-the-author-acknowledges). + +**How to give feedback:** -If you find an issue, please open a [GitHub Issue](https://github.com/systemslibrarian/PostQuantum.FileFormat/issues) or start a thread under [Discussions](https://github.com/systemslibrarian/PostQuantum.FileFormat/discussions). Reproducible refusal cases are especially welcome and will be folded into the negative test-vector set. +- Quick reaction or pointer: open a [GitHub Issue](https://github.com/systemslibrarian/PostQuantum.FileFormat/issues) or start a thread under [Discussions](https://github.com/systemslibrarian/PostQuantum.FileFormat/discussions). +- Reproducible refusal cases: open an issue with the vector — these get folded into the negative test-vector set under [`test-vectors/v1/cases/TV-NEG-*.pqf`](./test-vectors/v1/). 
+- Security-sensitive findings: use the private channel in [`SECURITY.md`](./SECURITY.md). +- Want to verify the conformance claim yourself before reviewing? See [`test-vectors/QUICKSTART.md`](./test-vectors/QUICKSTART.md) — two commands, ~2 minutes, watches the independent Rust reader accept every .NET-written vector. ## Where to go next diff --git a/spec/PQF-OVERVIEW.md b/spec/PQF-OVERVIEW.md new file mode 100644 index 0000000..08ca0ff --- /dev/null +++ b/spec/PQF-OVERVIEW.md @@ -0,0 +1,242 @@ +# PQF in 10 Minutes — Reviewer Overview + +**Status:** DRAFT / EXPERIMENTAL — do not protect irreplaceable data with v1. +**Document version:** 0.6.0 (2026-05-30). +**Companion to:** [`PQF-SPEC-v1.md`](./PQF-SPEC-v1.md) (normative, 1312 lines), +[`PQF-DESIGN-RATIONALE-v1.md`](./PQF-DESIGN-RATIONALE-v1.md) (688 lines), +[`ietf/draft-clark-pqf-00.md`](./ietf/draft-clark-pqf-00.md) (IETF I-D). + +If you have 10 minutes, read this first. It exists so a busy reviewer can +decide whether the cryptographic core is worth a deeper look without paging +through 2,000 lines of spec. + +--- + +## What PQF is + +A single-file container for encrypting data at rest to one or more +recipients, **hybrid post-quantum by default**: every confidentiality +operation combines a classical KEM with a post-quantum KEM, and every +signature combines a classical signature with a post-quantum signature. +A break in either family alone does not compromise the file. + +Mental model: PQF is to age / gpg / PKCS #7 enveloped data what age was to +PGP — smaller surface, opinionated, format-frozen — but with PQ baked into +v1 instead of bolted on as plugins. + +## What PQF is not + +- A TLS replacement, messaging protocol, or disk-encryption scheme. +- A general-purpose archive format (no multi-file, no compression). +- A solution for forward secrecy in the messaging sense. +- A privacy layer — the header is unencrypted; recipient public-key + hashes are visible. 
+- A drop-in replacement for any existing format. v1 is wire-incompatible + with everything, intentionally. + +## Threat model in one paragraph + +The motivating adversary is **harvest-now-decrypt-later**: a passive +attacker who archives ciphertext today and runs a CRQC against it in +twenty or thirty years. Files in scope are things that must remain +confidential across that horizon — medical records, legal archives, +classified research, library special collections, sealed court records. +Hybrid construction means confidentiality holds if *either* the classical +or the post-quantum primitive remains unbroken; an attacker needs both +broken to win. The trust boundary is the encrypting host's CSPRNG and a +correct primitive implementation; everything else PQF specifies is +fail-closed by construction. + +--- + +## Primitives (v1, frozen) + +| Slot | Primitive | Reference | +|---|---|---| +| Hybrid KEM | **X-Wing** = X25519 + ML-KEM-768 | draft-connolly-cfrg-xwing-kem; IND-CCA in ROM/QROM per Barbosa et al. 2024 | +| Hybrid signature | Ed25519 + ML-DSA-87 (concat: 64 + 4627 = 4691 bytes) | RFC 8032, FIPS 204 | +| Payload AEAD | AES-256-GCM, per-chunk-rekeyed | NIST SP 800-38D | +| KDF | HKDF-SHA-256 (chunk-key expansion); SHA3-256 (X-Wing combiner) | RFC 5869, FIPS 202 | +| Header encoding | Deterministic CBOR | RFC 8949 §4.2.2 | + +Readers MUST refuse files that don't exactly match this primitive set. +Algorithm agility is by format-version bump, not by negotiation inside +v1. + +## Wire format at a glance + +``` ++----------------------------------------+ offset 0 +| Magic "PQF1" (4) | +| Version uint16 BE = 0x0001 (2) | +| Header length uint32 BE (4) | ++----------------------------------------+ offset 10 +| Header: deterministic CBOR (N bytes) | { alg, chunk_size, created, +| | file_id, recipients[], signer? 
} ++----------------------------------------+ +| Header signature (4691 bytes) | present iff signer != null ++----------------------------------------+ +| Payload: sequence of chunks | each: len(4) || flags(1) || ct+tag +| | bit-0 of flags = is_final ++----------------------------------------+ +| Footer (20 bytes) | "PQFE" || chunk_count u64 BE +| | || plaintext_bytes u64 BE ++----------------------------------------+ +| File signature (4691 bytes) | present iff signer != null ++----------------------------------------+ EOF +``` + +There is no padding, no trailing data, and no placeholder slots — absent +fields are absent, not zero-filled. A 1 MiB cap on the header prevents +oversized-header DoS while leaving comfortable room for ~100 recipients. + +--- + +## The five decisions a reviewer should examine + +If you're going to look closely at one part of the design, these are +where the substance lives. Each links to the full discussion. + +### 1. X-Wing as the KEM combiner (§2.4) + +`KEK_recipient = SHA3-256( ss_M || ss_X || ct_X || pk_X || "\.//^\" )` +where ss_M is the ML-KEM-768 secret, ss_X the X25519 secret, ct_X the +X25519 ephemeral public key (which X-Wing treats as a ciphertext), and +pk_X the recipient's X25519 long-term public key. This is the +construction defined and analyzed in draft-connolly-cfrg-xwing-kem — +PQF 0.6 cut over from a PQF-author in-house combiner +(`pqf1-bind-extract-v1`) to standardized X-Wing precisely so it could +inherit the proof. + +### 2. Per-recipient + per-file binding pushed to AEAD AAD (§2.4) + +X-Wing's combiner has no salt slot for the file instance or the +recipient slot. PQF binds those at the next layer instead: +`wrapped_dek_aad = file_id (16) || recipient_index (uint32 BE)`. A KEK +derived for recipient *i* cannot unwrap recipient *j*'s DEK wrap (AADs +differ); a KEK from one file cannot unwrap another file's wrap (file_id +differs). 
The cross-recipient and cross-file isolation properties are +preserved without modifying the combiner. + +### 3. Per-chunk HKDF + zero nonce + `is_final` in AAD (§5.2) + +Each chunk uses a fresh `chunk_key = HKDF-Expand(DEK, "PQF1-chunk-v1" || +i (8 bytes BE), 32)` with a fixed 12-byte zero nonce. Safe under SP +800-38D §8.2 iff three invariants hold (all REQUIRED by the spec): DEK +freshness per file, monotonic in-order chunk indices, single-producer +writer. The per-chunk AAD includes file_id, chunk index, and an +`is_final` bit — so truncation is detected at AEAD verify, not just at +the footer. + +### 4. Optional hybrid signatures over `file_id || sha256(chunks) || footer` (§6.2) + +When present, the file signature commits to the file identity, the +exact chunk stream, and the footer in one pass. Truncation, chunk +substitution, and footer tampering are all signature-detectable in one +verification. Header and file signatures carry disjoint domain prefixes +(`PQF1-header-sig-v1`, `PQF1-file-sig-v1`, added in 0.5) so the two +signature messages cannot collide. + +### 5. ML-KEM implicit-rejection handling for the recipient trial (§6.3, §8.8) + +A reader walks every recipient slot in constant time regardless of +which one matches. ML-KEM's implicit rejection guarantees that +decapsulating a wrong-recipient ciphertext returns a pseudorandom +secret, so the AEAD tag — not the KEM result — is the sole signal of a +true match. The same property is the basis for the bounded "weak +deniability" claim in §8.8, which the spec deliberately states with +narrow language. + +--- + +## Modes of decryption + +PQF defines two normative reader modes (§6.4): + +- **Authenticated Mode** — verify every signature and AEAD tag *before* + emitting any plaintext. Required for archival; default for new code. +- **Streaming Mode** — emit plaintext as it verifies, before the + file-level signature is checked. 
Permitted, but the spec is strict: + if any post-hoc check fails, the reader MUST signal failure to the + consumer in a way that cannot be silently swallowed. "Logged it" is + explicitly non-conforming. + +The distinction matters because the chunked AEAD lets you start emitting +plaintext at chunk 0, but the file-level signature (if present) covers +the whole chunk stream. Streaming mode is a deliberate tradeoff against +the bounded-memory requirement, not an oversight. + +--- + +## What has been done + +| | Status | +|---|---| +| Normative spec (1312 lines, version 0.6.0) | shipped | +| Companion design rationale (688 lines, sections 1–12 + §10 reviewer guide + §11 open questions) | shipped | +| IETF Internet-Draft (`draft-clark-pqf-00`) | drafted, not submitted | +| Machine-checkable CDDL header schema | shipped, enforced in CI | +| Reference .NET writer + reader (BouncyCastle) | shipped | +| Independent Rust reader (ml-kem 0.3, ml-dsa 0.1, x25519-dalek 2, aes-gcm 0.10) | shipped | +| Independent Rust writer (same crate set; for differential testing) | shipped | +| Python binding (maturin) | shipped | +| WASM bundle (`.github/workflows/pages.yml`) | shipped | +| Cross-implementation conformance suite (Rust reader ↔ .NET vectors, 8 cases + 50 random containers) | shipped, in CI | +| X-Wing draft KAT replay against published IETF vectors | shipped, in CI | +| KAT vectors for HKDF chunk-key derivation, AEAD construction | shipped | +| Reproducible test-vector regeneration | shipped, in CI | + +Independent implementations exercising the same wire format are the +single most credible interop evidence the project has. The Rust reader +and the .NET writer share no code; their agreement on every test vector +is mechanical, not coincidental. + +## What has *not* been done + +- **No external cryptographic review.** All review to date has been + internal or LLM-assisted (Grok, ChatGPT). This document exists to + invite real review. 
+- **No formal security proof of the AAD-binding construction.** The + AAD-side binding (§2.4 second half) is straightforward but + unreviewed. The KEM combiner itself inherits X-Wing's proof; the + PQF-specific glue does not yet have one. +- **No public security audit.** No NCC, Cure53, Trail of Bits, etc. + involvement. +- **Side-channel posture is inherited from libraries.** PQF specifies + constructions, not constant-time implementations. +- **No IETF submission.** The I-D in `spec/ietf/` is drafted; whether + to submit depends partly on the response to this document. + +--- + +## Open questions the author would value review on + +From `PQF-DESIGN-RATIONALE-v1.md` §11, in priority order: + +1. **Combiner sufficiency.** Is `SHA3-256(ss_M || ss_X || ct_X || pk_X || + label)` plus AAD-binding strong enough for the multi-recipient + archival threat model, or is there a known stronger construction + that's still simple? +2. **Deniability framing.** §8.8 claims *weak* deniability deliberately. + Is the claim correctly bounded — neither over- nor under-stated? +3. **Footer integrity on unsigned files.** Signed files cover the footer + via the file signature; unsigned files rely on structural checks. Is + that gap worth closing in v1.1 via AEAD-binding the footer? +4. **Constant-time recipient trial.** Does the spec's prose make the + constant-time-over-recipients requirement implementable, or is + tightening needed? +5. **Deterministic CBOR in the wild.** The spec requires *enforcement*, + not just *production*. Is the "parse-strict OR re-encode-and-compare" + rule workable across major-language CBOR libraries? 
+ +## Where to go from here + +- **Full spec:** [`PQF-SPEC-v1.md`](./PQF-SPEC-v1.md) +- **Design rationale (why each decision):** [`PQF-DESIGN-RATIONALE-v1.md`](./PQF-DESIGN-RATIONALE-v1.md) +- **IETF Internet-Draft:** [`ietf/draft-clark-pqf-00.md`](./ietf/draft-clark-pqf-00.md) +- **CDDL header schema:** [`pqf-header.cddl`](./pqf-header.cddl) +- **Conformance test vectors:** [`test-vectors/v1/`](../test-vectors/v1/) +- **Reference implementations:** `src/` (.NET), `impl/rust/` (Rust) + +Contact: **Paul Clark** <paul@systemslibrarian.dev>. +Review feedback is welcomed by email, by GitHub issue, or by PR. diff --git a/spec/PQF-SPEC-v1.md b/spec/PQF-SPEC-v1.md index 544dbfe..a97ec58 100644 --- a/spec/PQF-SPEC-v1.md +++ b/spec/PQF-SPEC-v1.md @@ -704,8 +704,8 @@ attempts are discarded and zeroed. **Recipient count guidance.** Implementations SHOULD support at least 100 recipients per file. Cost scales linearly: each trial requires one X25519 -scalar multiplication, one ML-KEM-1024 decapsulation, one HKDF derivation, -and one AES-GCM decrypt attempt. +scalar multiplication, one ML-KEM-768 decapsulation, one SHA3-256 +invocation (the X-Wing combiner, §2.4), and one AES-GCM decrypt attempt. ### 6.6 Secret hygiene @@ -1065,8 +1065,8 @@ future versions of the v1 reference implementation. 
| Confidentiality at rest (hybrid PQ) | §2.1 X-Wing (X25519+ML-KEM-768), §2.3 AEAD, §2.4 combiner | | Payload integrity | Per-chunk AEAD tags, footer validation | | Sender authenticity (optional) | §2.2 hybrid signatures | -| Replay resistance across files | `file_id` in AAD and salt (§8.5) | -| Cross-recipient isolation | `recipient_index` in salt (§8.7) | +| Replay resistance across files | `file_id` in AEAD AAD (§8.5) | +| Cross-recipient isolation | `recipient_index` in DEK-wrap AAD plus `pk_X` bound into the X-Wing combiner (§8.7) | | Downgrade resistance | Exact-match `alg` validation (§8.6) | | Truncation resistance | `is_final` in AAD + footer counts + file signature | | Weak recipient deniability | ML-KEM implicit rejection + constant-time trial (§8.8) | @@ -1154,39 +1154,93 @@ Required negative test vectors: - TV-NEG-020: Chunk length exceeds remaining file bounds — refuse - TV-NEG-021: `created` not in RFC 3339 UTC "Z" form — refuse - TV-NEG-022: Streaming Mode signed file with post-hoc signature failure — caller MUST receive authentication failure signal +- TV-NEG-023: Unknown top-level header field (header-schema mutation variant) — refuse +- TV-NEG-024: Unknown field inside `alg` (header-schema mutation variant) — refuse +- TV-NEG-025: Unknown field inside a recipient block — refuse +- TV-NEG-026: Unknown field inside `signer` — refuse +- TV-NEG-027: Algorithm-identifier mismatch (e.g. `kem` = "x25519+ml-kem-1024") — refuse +- TV-NEG-028: Missing required field (`chunk_size` removed) — refuse +- TV-NEG-029: Empty `recipients` array — refuse +- TV-NEG-030: Malformed `created` timestamp (non-UTC offset) — refuse +- TV-NEG-031: Invalid `chunk_size` (not a power of 2 in [4096, 16777216]) — refuse +- TV-NEG-032: Binary field length mismatch (`file_id` not 16 bytes) — refuse +- TV-NEG-033: Duplicate CBOR map key in the header — refuse + +`SPEC-CHECKLIST.md` §11 maps every fail-closed refusal class to its +portable test-vector ID. 
The manifest in +`test-vectors/v1/manifest.json` (see §12.2) is authoritative for each +vector's expected `RefusalReason`. ### 12.2 Test vector file format -Each test vector is a directory containing a `manifest.json` and fixture files: +A spec version's conformance suite ships as **one** `manifest.json` per +spec version plus a `cases/` directory of `.pqf` fixture files. The +authoritative JSON Schema is +[`test-vectors/v1/manifest.schema.json`](../test-vectors/v1/manifest.schema.json); +the shape is summarized below for spec readers. ```json { - "id": "TV-003", - "description": "Three recipients, signed, 100 KiB plaintext", - "spec_version": "1", - "kind": "positive", - "processing_mode": "authenticated", - "inputs": { - "recipient_identities": ["identity-alice.bin", "identity-bob.bin", "identity-carol.bin"], - "signer_identity": "signer.bin", - "plaintext": "plaintext.bin", - "randomness": "randomness.bin" - }, - "expected": { - "file": "expected.pqf", - "decrypted_plaintext_sha256": "" - } + "Version": "v1", + "Identities": [ + { + "Id": "id-a", + "PublicKey": "", + "X25519PrivateKey": "", + "MlKem768PrivateKey": "" + } + ], + "Vectors": [ + { + "Id": "TV-001", + "File": "cases/TV-001.pqf", + "Expect": "success", + "Identity": "id-a", + "Reason": null, + "StreamingPostHocFailure": false, + "PlaintextSha256": "" + }, + { + "Id": "TV-NEG-001", + "File": "cases/TV-NEG-001.pqf", + "Expect": "refuse", + "Identity": "id-a", + "Reason": "MagicMismatch", + "StreamingPostHocFailure": false, + "PlaintextSha256": null + } + ] } ``` -The `processing_mode` field MAY be `"authenticated"`, `"streaming"`, or -`"either"`. 
- -Binary fixture format for decryption identities: -`version (0x01) || x25519_sk (32) || x25519_pk (32) || mlkem_sk (2400) || mlkem_pk (1184)` - -Binary fixture format for signing identities: -`version (0x01) || ed25519_sk (32) || ed25519_pk (32) || mldsa_sk (4896) || mldsa_pk (2592)` +Field summary: + +- `Version` — manifest format version; `"v1"` for the canonical + committed set, `"v1-differential"` for ephemeral randomized batches + produced by the differential test driver. +- `Identities[]` — non-empty list of named test identities; each + vector references one by `Identity`. +- `Identities[].PublicKey` — base64 of the 1217-byte canonical + `PqfPublicKey` (`0x01 || X25519 (32) || ML-KEM-768 EK (1184)`). +- `Identities[].MlKem768PrivateKey` — base64 of the ML-KEM-768 + decapsulation key as serialized by the writing implementation's + crypto provider. The serialized form is provider-defined; + BouncyCastle's serialized form is the current canonical + reference. Cross-implementation readers may need to translate. +- `Vectors[].Expect` — exactly `"success"` or `"refuse"`. +- `Vectors[].Reason` — for refused vectors, the `RefusalReason` + enum value the reader MUST return (e.g. `"MagicMismatch"`, + `"UnknownHeaderField"`, `"AlgorithmIdentifierMismatch"`); `null` + for successful vectors. +- `Vectors[].StreamingPostHocFailure` — `true` only for the + streaming-mode post-hoc signature failure case (TV-NEG-022 in + v1). +- `Vectors[].PlaintextSha256` — for successful vectors, the hex + SHA-256 of the expected decrypted plaintext; `null` for refused + vectors. + +The schema enforces `additionalProperties: false` at every level — an +unknown JSON field in the manifest is itself a conformance failure. 
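
The cross-field rules in the field summary can be sketched as a small checker. This is a minimal illustration in Python, not the project's validator: the function names `check_manifest` and `unknown_fields` are hypothetical, the allowed-key sets are taken only from the fields shown in the example manifest above, and the JSON Schema in `manifest.schema.json` remains authoritative for everything this sketch does not check (types, base64 and hex encodings, required fields).

```python
import json

# Field names as they appear in the example manifest; anything else is unknown.
TOP_KEYS = {"Version", "Identities", "Vectors"}
IDENTITY_KEYS = {"Id", "PublicKey", "X25519PrivateKey", "MlKem768PrivateKey"}
VECTOR_KEYS = {
    "Id", "File", "Expect", "Identity", "Reason",
    "StreamingPostHocFailure", "PlaintextSha256",
}


def unknown_fields(obj, allowed, where):
    """Report keys outside the allowed set (mirrors additionalProperties: false)."""
    return [f"{where}: unknown field {k!r}" for k in sorted(set(obj) - allowed)]


def check_manifest(manifest):
    """Return a list of human-readable problems; an empty list means none found."""
    problems = unknown_fields(manifest, TOP_KEYS, "manifest")
    identities = manifest.get("Identities") or []
    if not identities:
        problems.append("manifest: Identities must be a non-empty list")
    known_ids = set()
    for ident in identities:
        problems += unknown_fields(ident, IDENTITY_KEYS, f"identity {ident.get('Id')!r}")
        known_ids.add(ident.get("Id"))
    for vec in manifest.get("Vectors") or []:
        where = f"vector {vec.get('Id')!r}"
        problems += unknown_fields(vec, VECTOR_KEYS, where)
        if vec.get("Identity") not in known_ids:
            problems.append(f"{where}: Identity does not name a listed identity")
        expect = vec.get("Expect")
        if expect == "success":
            # Successful vectors carry a plaintext hash and no refusal reason.
            if vec.get("Reason") is not None:
                problems.append(f"{where}: Reason must be null on success")
            if vec.get("PlaintextSha256") is None:
                problems.append(f"{where}: PlaintextSha256 is required on success")
        elif expect == "refuse":
            # Refused vectors carry a refusal reason and no plaintext hash.
            if vec.get("Reason") is None:
                problems.append(f"{where}: Reason is required on refuse")
            if vec.get("PlaintextSha256") is not None:
                problems.append(f"{where}: PlaintextSha256 must be null on refuse")
        else:
            problems.append(f"{where}: Expect must be 'success' or 'refuse'")
    return problems


# Same shape as the example manifest, with key material and hashes left empty.
EXAMPLE = json.loads("""
{
  "Version": "v1",
  "Identities": [
    {"Id": "id-a", "PublicKey": "", "X25519PrivateKey": "", "MlKem768PrivateKey": ""}
  ],
  "Vectors": [
    {"Id": "TV-001", "File": "cases/TV-001.pqf", "Expect": "success",
     "Identity": "id-a", "Reason": null, "StreamingPostHocFailure": false,
     "PlaintextSha256": ""},
    {"Id": "TV-NEG-001", "File": "cases/TV-NEG-001.pqf", "Expect": "refuse",
     "Identity": "id-a", "Reason": "MagicMismatch", "StreamingPostHocFailure": false,
     "PlaintextSha256": null}
  ]
}
""")

# A deliberately broken copy: one unknown field, one refused vector without a Reason.
BAD = json.loads(json.dumps(EXAMPLE))
BAD["Vectors"][0]["Note"] = "not in the schema"
BAD["Vectors"][1]["Reason"] = None

print(check_manifest(EXAMPLE))
print(check_manifest(BAD))
```

Collecting problems into a list rather than raising on the first one is a choice made for this sketch; a conformance runner would more likely treat any single problem as a hard failure.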
### 12.3 Spec authority diff --git a/spec/SECURITY-CONSIDERATIONS.md b/spec/SECURITY-CONSIDERATIONS.md new file mode 100644 index 0000000..cac479d --- /dev/null +++ b/spec/SECURITY-CONSIDERATIONS.md @@ -0,0 +1,333 @@ +# PQF Security Considerations (consolidated) + +**Status:** companion to [`PQF-SPEC-v1.md`](./PQF-SPEC-v1.md) §8 and to the +[IETF Internet-Draft](./ietf/draft-clark-pqf-00.md) Security +Considerations section. Where this document and `PQF-SPEC-v1.md` §8 +disagree, **the spec wins**. + +**Document version:** 0.6.0 (2026-06-04). + +This document re-presents PQF's security argument in IETF-style +"assumed primitive properties → claimed file-level guarantees → +construction argument → known gaps" form so that an external reviewer +can evaluate the argument without paging through 1300 lines of +normative spec. It folds in material from `PQF-SPEC-v1.md` §8, +`PQF-DESIGN-RATIONALE-v1.md` §8, and the IETF draft's Security +Considerations section, and is cite-able from the I-D. + +--- + +## 1. Adversary model and trust boundary + +PQF defends a file against an adversary who can: + +- Read, copy, modify, truncate, replay, and reorder the ciphertext + bytes at rest or in transit between the writer and the reader. +- Observe arbitrary metadata around the file (size, time of arrival, + envelope file system, etc.). +- In the harvest-now-decrypt-later case, archive the ciphertext for + decades and run a future cryptographically relevant quantum computer + against it. +- Hold zero, one, or many recipient identities, but not the identity + the file is authored to decrypt against. + +PQF assumes a trusted writer-side environment for the duration of +encryption (CSPRNG-quality randomness, uncompromised secret memory) and +a trusted reader-side environment for the duration of decryption. +PQF does not defend the *endpoints*; it defends the *file*. + +## 2. Assumed primitive properties + +PQF's security argument is conditional on the following standardized +assumptions. 
None of these are PQF's to prove. If any is broken in the +underlying library, PQF's corresponding property fails. + +| Primitive | Property assumed | Reference | +|---|---|---| +| X25519 | DH-CKS hardness | RFC 7748 | +| ML-KEM-768 | IND-CCA2 | FIPS 203 (Category 3) | +| X-Wing combiner | IND-CCA in ROM and QROM, given the above | draft-connolly-cfrg-xwing-kem; Barbosa, Boudgoust, Bouvier, Damgård, Kaledin, Pointcheval, Renes 2024 | +| Ed25519 | EUF-CMA | RFC 8032 | +| ML-DSA-87 | sUF-CMA | FIPS 204 (Category 5) | +| AES-256-GCM | IND-CCA2 and INT-CTXT in the per-(key, IV) sense | NIST SP 800-38D | +| SHA-256, SHA-3-256 | preimage / collision / second-preimage resistance at standard levels | FIPS 180-4, FIPS 202 | +| HKDF-SHA-256 | strong pseudorandom expansion from a uniform 256-bit IKM | RFC 5869 | +| Deterministic CBOR (RFC 8949 §4.2.2) | injective encoding: distinct logical structures cannot map to the same byte sequence | RFC 8949 | +| Host CSPRNG | unpredictable to the adversary at encryption time | platform-dependent | + +Implementations MUST use library implementations that satisfy these +properties. The reference .NET implementation uses the BCL native ML-KEM +and ML-DSA on .NET 10 (platform-backed); the Rust reader/writer use the +RustCrypto `ml-kem`, `ml-dsa`, `x25519-dalek`, and `aes-gcm` crates. +Constant-time posture is inherited from the library; see §6 below. + +## 3. Claimed file-level guarantees + +Given the assumptions in §2, PQF v1 0.6 claims the following. + +### 3.1 Confidentiality (always) + +For a recipient *R* who does not hold a target recipient's identity, the +file's plaintext is indistinguishable from random to *R*. This holds +under the hybrid disjunction: it remains true if **either** X25519's +classical hardness **or** ML-KEM-768's post-quantum CCA security is +preserved. 
+ +### 3.2 Integrity (always) + +Any single-byte modification of the chunk stream, any reordering of +chunks, any truncation, any extension, any modification of the footer, +and any tampering with `wrapped_dek`, `pqc_ct`, or `classical_epk` in +any recipient block, is detected at decryption time by AEAD tag failure +or footer-count mismatch. There is no recovery path. + +### 3.3 Authenticity (signed files only) + +For a file with a `signer` block, both Ed25519 and ML-DSA-87 +signatures must verify on both signed messages (the header signature +and the file signature). Either failure causes refusal. The hybrid +property carries to authenticity: a forger needs to break **both** +Ed25519 **and** ML-DSA-87 to produce an accepted forgery. + +### 3.4 Cross-recipient isolation (always) + +A KEK derived for recipient *i* cannot decrypt recipient *j*'s DEK +wrap. This holds via two independent mechanisms: + +1. The X-Wing combiner binds the recipient's long-term X25519 public + key `pk_X` into the KDF input, so each recipient's KEK is + intrinsically tied to their own key. +2. The DEK-wrap AEAD includes `recipient_index` in its AAD, so even if + two recipient slots somehow derived the same KEK (they cannot under + X-Wing), the wrapped-DEK tags would mismatch on the wrong slot. + +### 3.5 Cross-file isolation (always) + +A KEK derived for one file cannot decrypt any other file's DEK wrap. +This holds via the DEK-wrap AEAD's `file_id` AAD field. Even if an +attacker could replay a recipient block from file A into file B, the +AAD mismatch would refuse the DEK unwrap. + +### 3.6 Downgrade resistance within v1 (always) + +The `alg` map is covered by the header signature when present, and is +always verified against exact-match string values for each of `aead`, +`combiner`, `kdf`, `kem`, and `sig`. Any deviation refuses the file at +parse. Algorithm agility is by format-version bump, not by negotiation. 
+ +### 3.7 Weak deniability (always; see §5.2) + +ML-KEM's implicit-rejection property plus PQF's constant-time +recipient trial means that an observer who does not hold any +recipient's identity cannot distinguish, from observation of a +decryption attempt's failure, whether the attempting party was a +recipient or not. This is a **weak** guarantee — strictly weaker than +the deniability of a protocol designed for it (OTR, Signal). The spec +deliberately names it (§8.8) so callers neither rely on it nor +misunderstand it. + +## 4. Construction-by-construction argument + +### 4.1 KEM combiner + +The KEK is derived per recipient as: + +``` +KEK = SHA3-256( ss_M || ss_X || ct_X || pk_X || XWING_LABEL ) +``` + +This is the construction defined and analyzed in +draft-connolly-cfrg-xwing-kem, with IND-CCA proofs in ROM and QROM. PQF +inherits the proof unchanged for this step. The PQF-specific glue (per +§4.2) is at the next layer. + +**Why this matters relative to PQF 0.5.x:** the earlier in-house +`pqf1-bind-extract-v1` HKDF combiner was a PQF-author construction +without external security analysis. The 0.6 cutover to X-Wing +deliberately replaces an unproven construction with a proven one. The +0.5 combiner is gone; readers MUST refuse files that declare it. + +### 4.2 Per-recipient and per-file binding via AEAD AAD + +X-Wing's combiner has no slot for the file instance or the recipient +slot. PQF binds those at the next layer: + +``` +wrapped_dek_aad = file_id (16 bytes) || recipient_index (uint32 BE) +``` + +The argument: the AEAD-AAD binding step preserves cross-recipient and +cross-file isolation because AES-256-GCM's INT-CTXT property +guarantees that ciphertexts with mismatched AADs cannot be made to +verify. The reduction is: an adversary who could unwrap a DEK with the +wrong (file_id, recipient_index) AAD would yield an AES-GCM forgery, +contradicting INT-CTXT under SP 800-38D. 
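
The AAD layout above can be made concrete with a short sketch. This is an illustration in Python under stated assumptions, not the reference implementation: the helper name `wrapped_dek_aad` is hypothetical and the `file_id` values are placeholders, not taken from any test vector. It shows only the byte layout; the AES-256-GCM wrap itself is left out.

```python
import struct

FILE_ID_LEN = 16  # file_id is 16 bytes in the layout above


def wrapped_dek_aad(file_id: bytes, recipient_index: int) -> bytes:
    """Return file_id (16 bytes) || recipient_index (uint32, big-endian)."""
    if len(file_id) != FILE_ID_LEN:
        raise ValueError("file_id must be exactly 16 bytes")
    if not 0 <= recipient_index <= 0xFFFFFFFF:
        raise ValueError("recipient_index must fit in a uint32")
    return file_id + struct.pack(">I", recipient_index)


file_a = bytes(range(16))      # placeholder file_id for "file A"
file_b = bytes(range(1, 17))   # placeholder file_id for "file B"

aad_a0 = wrapped_dek_aad(file_a, 0)  # file A, recipient slot 0
aad_a1 = wrapped_dek_aad(file_a, 1)  # file A, recipient slot 1
aad_b0 = wrapped_dek_aad(file_b, 0)  # file B, recipient slot 0

print(aad_a0.hex())
print(aad_a1.hex())
print(aad_b0.hex())
```

Because both fields are fixed-width, the encoding is injective: two distinct `(file_id, recipient_index)` pairs always yield distinct 20-byte AADs, which is the precondition the INT-CTXT argument relies on.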
+ +**Status:** this step is straightforward but has not been independently +reviewed. See §5.1. + +### 4.3 Per-chunk AEAD + +For chunk *i*: + +``` +chunk_key = HKDF-Expand(DEK, "PQF1-chunk-v1" || i (8B BE), L=32) +chunk_nonce = 12 bytes, all zero +aad = file_id (16B) || i (8B BE) || is_final (1B: 0x00 or 0x01) +ciphertext = AES-256-GCM-Encrypt(chunk_key, chunk_nonce, plaintext_i, aad) +``` + +The argument: NIST SP 800-38D §8.2 permits any deterministic IV +construction subject to (key, IV) uniqueness. PQF satisfies that by +making the key per-chunk-unique (per-chunk HKDF expansion of the DEK) +rather than the IV. Three invariants must hold for the argument; all +are spec-mandated: + +1. DEK freshness per file (single random 256-bit value, never reused). +2. Chunk indices assigned monotonically `0, 1, …, n-1` with no gaps or + repetition. +3. Single-producer writer; no concurrent encryptors on the same DEK. + +Truncation, chunk substitution, chunk reordering, and final-chunk +spoofing are detected by the per-chunk AAD (including `is_final`) plus +the footer integrity check (§4.5). + +**Status:** the construction is standard but the specific +HKDF-expand-from-DEK + zero-nonce + is_final-in-AAD composition has not +been independently reviewed. + +### 4.4 Hybrid signature composition (signed files only) + +Both signatures (header and file) cover their messages via the same +hybrid scheme: + +``` +hybrid_sig = ed25519_sig(64 bytes) || mldsa_sig(4627 bytes) +``` + +The signed messages are domain-separated: + +- header_sig_message = `"PQF1-header-sig-v1"` || header_bytes +- file_sig_message = `"PQF1-file-sig-v1"` || file_id || sha256(chunk_bytes) || footer + +The argument: the hybrid disjunction is preserved trivially by AND- +combining (both halves must verify). Domain-separation prefixes ensure +the two signed messages can never collide regardless of header content +or chunk hash. 
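
The two signed messages can be sketched as byte strings. This is a minimal illustration in Python, with several assumptions: the function names are hypothetical, `chunk_bytes` is assumed here to be the concatenated on-wire chunk stream (the normative definition is in the spec, §6.2), and the footer layout is the 20-byte form described in §4.5. No signing is performed; only the message construction is shown.

```python
import hashlib
import struct

HEADER_SIG_PREFIX = b"PQF1-header-sig-v1"
FILE_SIG_PREFIX = b"PQF1-file-sig-v1"


def build_footer(chunk_count: int, plaintext_bytes: int) -> bytes:
    """"PQFE" || chunk_count (u64 BE) || plaintext_bytes (u64 BE), 20 bytes total."""
    return b"PQFE" + struct.pack(">QQ", chunk_count, plaintext_bytes)


def header_sig_message(header_bytes: bytes) -> bytes:
    """Domain prefix followed by the encoded header bytes."""
    return HEADER_SIG_PREFIX + header_bytes


def file_sig_message(file_id: bytes, chunk_bytes: bytes, footer: bytes) -> bytes:
    """Domain prefix || file_id || sha256(chunk_bytes) || footer."""
    if len(file_id) != 16:
        raise ValueError("file_id must be exactly 16 bytes")
    return FILE_SIG_PREFIX + file_id + hashlib.sha256(chunk_bytes).digest() + footer


# Placeholder inputs, not real vectors.
file_id = bytes(16)
footer = build_footer(chunk_count=2, plaintext_bytes=100_000)
header_msg = header_sig_message(b"\xa0")  # 0xa0 is an empty CBOR map, used as a stand-in
file_msg = file_sig_message(file_id, b"chunk-stream-placeholder", footer)

print(len(footer), len(header_msg), len(file_msg))
```

The two prefixes already differ at their sixth byte (`h` versus `f`), so a header-signature message and a file-signature message differ at that position regardless of what follows.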
+ +**Status:** the AND-of-EUF-CMA composition is standard but has not +been independently reviewed for this construction; in particular, there +is no formal analysis of using the same hybrid keypair for both +signature messages — though the disjoint domain prefixes are intended +to close any such concern by construction. + +### 4.5 Footer integrity + +The 20-byte footer (`"PQFE"` || chunk_count u64 BE || plaintext_bytes +u64 BE) is covered by the file signature when present. + +**On unsigned files**, the footer is protected only by: +- Structural parser checks (magic match, count vs observed-chunks + match). +- The implicit consistency check from per-chunk AAD. + +An adversary who cannot break AEAD cannot forge a footer that matches +an existing chunk stream, so the practical exposure on unsigned files +is denial of service (parser refuses), not silent acceptance of a +tampered length. Whether this should be tightened in v1.1 by +AEAD-binding the footer is rationale §11 question #7 — an open +question solicited for reviewer input. + +## 5. Known unanalyzed compositions + +These are claims PQF asserts but cannot point to formal external +analysis for. They are flagged here, in `REVIEW-STATUS.md`, and in +`PQF-DESIGN-RATIONALE-v1.md` §11. + +### 5.1 The AAD-binding step (§4.2) + +The argument is straightforward — INT-CTXT of AES-GCM under correct +key derivation — but PQF has not had a cryptographer independently +work through the reduction in the multi-recipient archival setting. +This is the highest-priority gap for review. + +### 5.2 The strength bound on the weak-deniability claim (§3.7) + +The claim is bounded carefully in §8.8, but the bound has not been +formally argued. Specifically: does the constant-time recipient trial +plus implicit rejection actually achieve indistinguishability against +an adversary who can choose the ciphertext, or only against a passive +observer? The spec language is conservative; a tighter or looser bound +may be provable. 
+ +### 5.3 The per-chunk construction under partial-write / truncation / +recovery (§4.3) + +The construction is correct under the three spec invariants. The spec +specifies refusal semantics on truncation. What has not been examined: +behavior under adversarial partial-write that produces *valid-looking* +prefixes — does the reader's refusal path actually fire at every such +prefix? The differential test suite exercises some of this; a +systematic adversarial fuzzer has not been pointed at it. + +### 5.4 Constant-time recipient trial as actually implemented + +The .NET and Rust references iterate every recipient block regardless +of match. Whether the *instruction stream* is timing-independent +depends on the library implementations of ML-KEM decap and the AEAD +verify primitive, which PQF inherits. No on-machine timing measurement +has been performed; this is a published open question for v1.1. + +## 6. Adversary capabilities explicitly out of scope + +| Capability | Why out of scope | +|---|---| +| Traffic analysis on file size | Padding is a separate layer with its own tradeoffs; PQF deliberately doesn't pad. | +| Recipient anonymity vs metadata correlation | Recipient public keys, count, and ordering are visible by design. | +| Sender authenticity on unsigned files | Anyone with the recipient's public key can produce an unsigned file. Signature is opt-in for a reason. | +| Side channels in primitive implementations | PQF specifies constructions; the library provides constant-time primitives or it doesn't. | +| Compromised host CSPRNG | If DEK / file_id / ephemeral keys are predictable, confidentiality is gone — but that's true of every system. | +| Endpoint compromise | If the attacker holds a recipient identity, they can decrypt; PQF does not defend stolen keys. | +| Malicious signer | A signature proves "this keypair signed these bytes." It does not prove content truth, intent, or freshness. 
| +| Forward secrecy in the messaging sense | PQF is a static-file format. Long-term keys decrypt forever. | + +## 7. Operational obligations for implementations + +The arguments in §4 hold only if implementations meet these +spec-mandated obligations: + +- **CSPRNG-sourced DEK, file_id, ephemeral X25519 keys, and any + random fields in the header.** PQF cannot detect a weak RNG; if + it's broken, confidentiality is broken. +- **Deterministic CBOR enforcement at parse**, not just at + production. See spec §2.5 — implementers MUST verify that the + CBOR library either enforces determinism natively or re-encode + and byte-compare. +- **Constant-time recipient trial**: iterate every recipient block + regardless of match. The AEAD tag is the sole signal of a match. +- **Fail-closed on every refusal class** enumerated in spec §8.4. + No "best effort," no logged-and-continued, no fallback. +- **Streaming Mode failure signaling**: if any post-hoc check fails, + the failure MUST be signaled to the consumer in a way that + cannot be silently swallowed. "Logged it" is non-conforming. +- **Memory hygiene for secret material**: DEK, KEK, derived chunk + keys, and decap shared secrets should be zeroized when no longer + needed. The spec does not impose specific platform mechanisms. + +## 8. 
References + +- RFC 7748 — X25519 +- RFC 8032 — Ed25519 +- FIPS 203 — ML-KEM +- FIPS 204 — ML-DSA +- NIST SP 800-38D — AES-GCM +- RFC 5869 — HKDF +- RFC 8949 — CBOR (§4.2.2 — deterministic encoding) +- draft-connolly-cfrg-xwing-kem — X-Wing combiner +- Barbosa, Boudgoust, Bouvier, Damgård, Kaledin, Pointcheval, Renes + (2024) — X-Wing IND-CCA proofs in ROM and QROM +- `PQF-SPEC-v1.md` §8 — normative Security Considerations +- `PQF-DESIGN-RATIONALE-v1.md` §8, §11 — why each decision; open + questions +- `spec/external-review/REVIEW-STATUS.md` — what has and has not been + reviewed externally diff --git a/spec/external-review/POLISH-NOTES.md b/spec/external-review/POLISH-NOTES.md new file mode 100644 index 0000000..cdfc9ce --- /dev/null +++ b/spec/external-review/POLISH-NOTES.md @@ -0,0 +1,157 @@ +# Spec polish notes — PQF-SPEC-v1.md + +**Status:** punch list from a read-through of `PQF-SPEC-v1.md` on +2026-06-04 against PQF v1 0.6.0. Four inline fixes have already been +applied; the rest are recommendations for the author to take, leave, +or modify. None of these are wire-format changes — per §10.3 they are +"editorial clarifications" and "corrections to cross-references." + +## Already fixed inline (in this branch) + +- **§6.5 cost-of-trial statement** was stale: said "ML-KEM-1024 + decapsulation, one HKDF derivation." Corrected to **ML-KEM-768** + and **SHA3-256 (X-Wing combiner)**. Pre-X-Wing leftovers from + 0.5 / earlier. +- **§11.1 Properties PQF provides table** referenced "salt" in two + rows ("`file_id` in AAD and salt", "`recipient_index` in salt"). + X-Wing's combiner has no salt slot — these properties are + provided by the DEK-wrap AEAD's AAD plus the X-Wing combiner's + binding of `pk_X`. Corrected accordingly. +- **§12.1 negative vector list (P-1)** extended with TV-NEG-023 + through TV-NEG-033 (the header-schema refusal classes), each + with its actual `RefusalReason`, plus a pointer to + `SPEC-CHECKLIST.md` §11 and the authoritative manifest. 
+- **§12.2 test-vector file format (P-2)** rewritten to describe + the actual shipped schema (single `manifest.json` per spec + version + `cases/` dir, with `Identities[]` and `Vectors[]` + arrays), pointing at `manifest.schema.json` as authoritative. + The previous example (per-vector directories with their own + manifests) was a pre-implementation sketch that the project + outgrew. + +## Recommended fixes (not yet applied) + +### Medium — accuracy / completeness + +#### P-3. §14.2 Barbosa et al. 2024 author list — verify + +Currently reads: "Barbosa, Boyen, Connolly, Schwabe, Stehlé, Strub +(2024) — 'X-Wing: The Hybrid KEM You've Been Looking For'." + +The PQF-OVERVIEW and SECURITY-CONSIDERATIONS docs I just drafted +cite a different author list (Barbosa, Boudgoust, Bouvier, +Damgård, Kaledin, Pointcheval, Renes 2024) for the IND-CCA ROM/QROM +proofs — which is what those docs need to reference. The +Connolly-Schwabe-Stehlé-Strub list is closer to the +draft-connolly-cfrg-xwing-kem author list, not the proof paper. + +**Suggested fix:** verify which paper §14.2 means to cite. If it's +the IND-CCA proof paper, the author list should match what the +new SECURITY-CONSIDERATIONS doc cites. If it's a different X-Wing +paper, name the title precisely. + +#### P-4. "Document history" table at the bottom is stale + +§ "Document history" (lines ~1305+) shows only 0.2.0, 0.3.0, 0.3.1. +Missing 0.5.0 and 0.6.0. The top-of-document Change log has them, +so this is just a stale duplicate. + +**Suggested fix:** either sync the bottom table with the top +Change log, or delete the bottom table and keep the top as the +single source. Latter is cleaner. + +#### P-5. §A.1 BouncyCastle version pin + +References "BouncyCastle.Crypto 2.6.2". `BouncyCastle.Cryptography` +is what the codebase actually depends on (per `dependabot.yml`'s +ignore rule and recent dep bumps). 
If 2.6.2 is no longer the +installed version, update; otherwise either drop the version +number (so it doesn't go stale again) or keep it as +"≥ 2.6.2" + a one-line "current pinned version in +[Directory.Packages.props](../Directory.Packages.props)" pointer. + +### Low — prose / consistency + +#### P-6. §11.1 "Truncation resistance" row could be tightened + +Currently: "Truncation resistance | `is_final` in AAD + footer counts ++ file signature." + +Consider clarifying that the three layers are AND'd for signed files +and that footer counts alone are the unsigned-file defense (which +§4.5 of the new SECURITY-CONSIDERATIONS doc spells out). A two-clause +table cell would read more naturally. + +#### P-7. §13 "Known gaps" item on ML-KEM/ML-DSA youth + +The wording "NIST standards finalized 2024" is fine in 2026 but +will read oddly in 2028+. Consider replacing with "Standards +finalized within the last few years" or with an absolute year +reference + commitment to revisit on each spec-version review. + +#### P-8. §6.2 step 6: "Write file prefix" + +The step combines several distinct on-disk fields (magic, version, +header_length, header, conditional header_signature) into one +sentence. Splitting into: + +> 6a. Write magic, version, header length (10 bytes total). +> 6b. Write `header_bytes`. +> 6c. If signed: write the 4691-byte `header_signature`. + +would mirror how reader-side §6.3 step 3 / step 7 already split +the same boundaries. + +#### P-9. §12.3 "Spec authority" + +The current paragraph is good. Consider adding the literal sentence +"In particular, where the .NET reference implementation and the +spec disagree, the spec wins." Currently this is stated in the +README but not in the spec. + +#### P-10. Cross-reference §6.3 step 8 + +The "ML-KEM decap does NOT fail … implicit rejection" caveat in +§6.3 step 8b is a critical correctness point. Consider promoting +it to a `Note:` callout (or its own subsection) so a reader +skimming the procedure can't miss it. 
+ +## Structural suggestions (larger, optional) + +#### S-1. Consider a "Reader's Guide" preface + +`PQF-SPEC-v1.md` is 1300 lines. The new `spec/PQF-OVERVIEW.md` +covers some of this, but a one-page preface inside the spec itself +— "If you only have 10 minutes, read sections X, Y, Z" — would +serve readers who land directly in the spec without going through +the README. + +#### S-2. Cross-link IETF draft and rationale at every section header + +The IETF draft and rationale doc cover the same material at +different levels. A small "see also" footnote at each spec section +header (e.g. `## 2. Cryptographic primitives (v1)` → also +`PQF-DESIGN-RATIONALE-v1.md §2`, `draft-clark-pqf-00 §3`) would +help a reviewer triangulate, especially if they bounce between +documents on a specific question. + +#### S-3. Move the IANA registry preamble + +If the IETF draft ever advances, IANA registrations for +`application/pqf` and the `.pqf` extension will need to land in +the spec or the draft. They're currently in the IETF draft only. +Consider whether they should also have a placeholder section in +`PQF-SPEC-v1.md` so the spec is publicly self-contained. + +--- + +## How to use this list + +Read it once. P-1 and P-2 are already applied inline; decide whether +to keep them or trim them back. P-3 is a quick lookup. The rest +are optional; some you'll agree with, some you won't. Nothing here +affects the wire format — applying or skipping these does not produce +a v1.0.x → v1.1 bump. + +If you want me to apply any specific items, point at them by number +and I'll do them in-place. diff --git a/spec/external-review/REVIEW-STATUS.md b/spec/external-review/REVIEW-STATUS.md new file mode 100644 index 0000000..6d964f9 --- /dev/null +++ b/spec/external-review/REVIEW-STATUS.md @@ -0,0 +1,157 @@ +# PQF Review Status + +**Last updated:** 2026-06-04 +**Format version:** v1 0.6.0 + +A transparent record of what's been reviewed, by whom, with what +limitations. 
This document exists so reviewers — and users — can +calibrate the credibility of "PQF has been reviewed" claims against +what review actually consisted of. + +## Summary + +| Layer | Review type | Status | +|---|---|---| +| Hybrid KEM combiner (X-Wing) | External, formal | ✅ Inherited from upstream¹ | +| PQF-specific AAD-binding glue | None | ❌ Not reviewed | +| Hybrid signature composition (concat) | LLM-assisted only | ⚠️ Not externally reviewed | +| Per-chunk AEAD construction | LLM-assisted only | ⚠️ Not externally reviewed | +| File signature coverage | LLM-assisted only | ⚠️ Not externally reviewed | +| ML-KEM implicit-rejection handling | LLM-assisted only | ⚠️ Not externally reviewed | +| Deterministic CBOR enforcement rule | LLM-assisted only | ⚠️ Not externally reviewed | +| Wire format and parser robustness | Differential testing | ⚠️ Mechanical, not human | +| Side-channel / constant-time properties | Implementation library defaults | ❌ Not separately reviewed | + +¹ X-Wing itself has IND-CCA proofs in the ROM and QROM (Barbosa et al., +2024). PQF inherits the combiner's proof for the secret-derivation +step. It does NOT inherit a proof for how PQF wraps that secret with +file/recipient binding at the AEAD layer. + +## What "reviewed" has meant so far + +### LLM-assisted review (drafting phase) + +The spec was drafted iteratively with LLM-assisted review by Claude, +Grok, and ChatGPT. This caught real design issues — see the rationale +doc, §2.5 and elsewhere, for specific examples where Grok and ChatGPT +flagged the original in-house combiner. The current spec reflects +those revisions. + +What this kind of review does well: catching obvious omissions, +suggesting tightening of normative language, spotting inconsistency +across sections, flagging well-known weaknesses (canonical-JSON +malleability, naive concat-then-extract combiners). 
+ +What this kind of review does not do: deliver original cryptographic +analysis, evaluate composition with respect to recent literature, or +substitute for someone who has spent a career building intuition for +where these constructions go wrong. Treat every LLM-assisted finding +as "worth investigating," never as "verified." + +### Cross-implementation conformance (mechanical) + +Two independent implementations exercise the same wire format: + +- **.NET reference writer + reader** (BouncyCastle primitives, + `src/PostQuantum.FileFormat`) +- **Rust reader and writer** (ml-kem 0.3, ml-dsa 0.1, x25519-dalek 2.0, + aes-gcm 0.10, sha2 0.11, sha3 0.12, hkdf 0.13; + `impl/rust/pqf-reader`, `impl/rust/pqf-writer`) + +These share no code. They agree byte-for-byte on the published +v1 0.6 test vectors and on 50 random round-trip containers per CI run. +They also independently pass: + +- The published X-Wing draft KAT (replay against + draft-connolly-cfrg-xwing-kem appendix C, example 1) +- HKDF chunk-key derivation KAT +- AES-256-GCM AAD-binding KAT +- Reproducible test-vector regeneration + +What this proves: the spec is implementable independently, and the +implementations agree on the wire format. Cross-impl agreement is the +strongest interop evidence the project has and is mechanical, not +opinion. + +What this does NOT prove: that the wire format is the right one. Two +implementations of the same wrong design agree just as readily as two +implementations of the right design. + +### Symbolic / formal-method explorations (incomplete) + +Tamarin and ProVerif models exist in `spec/symbolic/` as exploratory +artifacts (`pqf-combiner.spthy`, `pqf-combiner.pv`). They are not +complete proofs of the full protocol and are not currently part of +CI. Treat them as "the author started here," not as verified results. 
+ +## What has not been reviewed + +### Cryptographic — wanted before any 1.0 claim + +- The composition of X-Wing's combiner output with the DEK-wrap AEAD + AAD (`file_id || recipient_index`). PQF inherits the combiner's proof; + the AAD-binding step has none. +- Hybrid signature concatenation (Ed25519 || ML-DSA-87, no + domain-separation tag between them). Common in practice; not + separately analyzed for this construction. +- Per-chunk AEAD construction's behavior under partial-write, + truncation, and recovery scenarios. The spec specifies refusal + semantics; no one has poked at a real implementation looking for + ways around them. +- Constant-time recipient-trial in the .NET and Rust references. The + code does iterate every recipient regardless of match; whether the + actual instruction stream is timing-independent depends on the + primitive libraries (BouncyCastle, x25519-dalek, ml-kem) and has not + been measured. +- Footer integrity when the file is unsigned. Rationale §11 question #7 — + acknowledged gap. + +### Operational — wanted before any production claim + +- A pen-test or red-team pass against a real implementation (header + parser fuzzing, AEAD corruption, signature stripping, recipient + manipulation). +- Side-channel measurement on a real machine, including cache and + timing on the recipient-trial loop. +- Formal threat-model walk-through with someone whose day job is + attacking file formats. + +### Documentation — wanted before IETF submission + +- Independent read of the IETF Internet-Draft for I-D conventions and + IETF-house-style alignment. +- Confirmation that the deterministic CBOR enforcement rule is + workable in major-language CBOR libraries (rationale §11 question #1). + +## How review feedback is incorporated + +Every review finding to date has produced either a versioned spec +change (with a wire-format break if necessary) or an explicit +acknowledgment in the rationale doc. 
Examples: + +- **0.6.0:** dropped the in-house `pqf1-bind-extract-v1` combiner for + X-Wing after acknowledging the in-house construction had no formal + treatment. +- **0.5.0:** added signature domain-separation prefixes + (`PQF1-header-sig-v1` / `PQF1-file-sig-v1`). +- **0.3.0:** dropped canonical JSON in favor of deterministic CBOR + after multiple reviewers flagged JSON canonicalization fragility; + added explicit fail-closed refusal cases. +- **0.3.1:** tightened the deterministic-CBOR-enforcement rule. + +The change log in `PQF-SPEC-v1.md` records each revision with its +motivating finding. + +## Credit + +When this document next updates, it will name (with permission) every +external reviewer who has provided substantive feedback. As of +2026-06-04 there are none yet to credit. + +## How to contribute review + +- Open a GitHub issue describing the finding, with section references. +- Or email directly. +- If you'd prefer to publish a writeup first and have PQF link to it + rather than have the finding only here, that's fine — say so and + I'll wait. diff --git a/spec/external-review/cfrg-mailing-list-post.md b/spec/external-review/cfrg-mailing-list-post.md new file mode 100644 index 0000000..f95e624 --- /dev/null +++ b/spec/external-review/cfrg-mailing-list-post.md @@ -0,0 +1,98 @@ +# CFRG mailing-list post — draft + +**Status:** unsent. This is a draft for review before sending to +`cfrg@irtf.org`. Read it, edit it, send when ready. + +**Suggested subject line:** + +> [Review request] PQF: hybrid post-quantum file format using X-Wing — open for cryptographic review + +--- + +To: cfrg@irtf.org +Subject: [Review request] PQF: hybrid post-quantum file format using X-Wing — open for cryptographic review + +Hello CFRG, + +I'm requesting review of PQF (Post-Quantum File), a hybrid PQ +file-encryption format aimed at the harvest-now-decrypt-later threat +model for long-term archival. 
It is the equivalent of `age` or +PKCS #7 enveloped data, but hybrid-PQ by default at the format level. + +PQF v1 uses: + +- **X-Wing** (X25519 + ML-KEM-768) as the hybrid KEM combiner, per + draft-connolly-cfrg-xwing-kem. +- **Ed25519 + ML-DSA-87** as the optional hybrid signature (4691 bytes + fixed, classical || PQ concatenation). +- **AES-256-GCM** chunked, with per-chunk-rekeyed HKDF, fixed zero + nonce, and `is_final` bound into the per-chunk AAD. +- **Deterministic CBOR** (RFC 8949 §4.2.2) for the header, enforced at + parse (not just produced). + +What I'd specifically value review on: + +1. **The PQF-specific glue around X-Wing.** X-Wing's combiner has no + slot for per-file or per-recipient binding. PQF binds those at the + DEK-wrap AEAD layer instead: + `wrapped_dek_aad = file_id (16) || recipient_index (uint32 BE)`. + Is this composition sound for the multi-recipient archival case? +2. **Per-chunk AEAD construction.** Per-chunk rekey + zero nonce + + `is_final` AAD bit. Conforming under SP 800-38D §8.2 if DEK is + fresh, indices are monotonic, and the writer is single-producer + — all REQUIRED by the spec. Are there nonce-reuse or + truncation attack surfaces I haven't accounted for? +3. **File-signature coverage:** `file_id || sha256(chunks) || footer`. + Does this composition cover the whole "what the file says it is" + in one signature without giving up properties I should preserve? +4. **ML-KEM implicit-rejection in the recipient-trial loop**, and the + bounded "weak deniability" claim built on it (spec §8.8). Is the + claim correctly scoped? +5. **The constant-time recipient-trial requirement.** Is the spec's + prose implementable as written, or does it need tightening to + prevent timing leaks about which recipient slot matched? 
+ +Independent implementations exist and produce byte-identical output: +a .NET writer/reader (BouncyCastle), a Rust reader and writer +(ml-kem 0.3, ml-dsa 0.1, x25519-dalek 2.0, aes-gcm 0.10), a Python +binding via maturin, and a WASM bundle. Cross-implementation +conformance is part of CI (Rust reader against .NET vectors, 50 +random round-trips, and a replay of the published X-Wing draft KAT). + +Documents (under MIT license): + +- 3-page reviewer overview: [link to PQF-OVERVIEW.md on main] +- Normative spec (1300+ lines): [link to PQF-SPEC-v1.md] +- Design rationale, including §10 "what reviewers should focus on" and + §11 "open questions": [link to PQF-DESIGN-RATIONALE-v1.md] +- IETF Internet-Draft (not yet submitted): [link to ietf/draft-clark-pqf-00] +- Conformance vectors: [link to test-vectors/v1/] + +The status is explicitly DRAFT / EXPERIMENTAL — no external +cryptographic review has happened to date, only LLM-assisted review +during drafting (which is acknowledged in the rationale doc, not hidden). +The point of this email is to get real review before the format +freezes for v1.0. + +Pointers, objections, or "this part is wrong" — all welcomed. Happy to +discuss on-list or off-list, whichever the group prefers. + +Thanks, +Paul Clark + +https://github.com/systemslibrarian/PostQuantum.FileFormat + +--- + +## Sending notes (delete before sending) + +- Replace `[link to ...]` placeholders with real URLs to the files on + `main` (or a tagged release commit). GitHub blob URLs are fine. +- CFRG list etiquette: plain text, no HTML, no attachments. One topic + per thread. Don't cross-post to other IETF lists for this first + thread. +- The list is public and archived. Anything you send is permanent. +- Expect days-to-weeks for substantive responses; follow up once after + ~3 weeks if quiet. +- Sign up at https://www.irtf.org/mailman/listinfo/cfrg first if not + already subscribed; non-subscriber posts are moderated. 
diff --git a/spec/external-review/researcher-outreach-template.md b/spec/external-review/researcher-outreach-template.md new file mode 100644 index 0000000..689f4e6 --- /dev/null +++ b/spec/external-review/researcher-outreach-template.md @@ -0,0 +1,92 @@ +# Researcher outreach email — template + +**Status:** unsent. Fill in the bracketed fields per recipient and edit +the second paragraph to reflect why *that specific person* is being +asked. A generic blast will get generic responses or none. + +## Who to target + +Pick 3–5 researchers whose published work is close enough to PQF's +construction that they could give useful feedback in under an hour of +their time. Good candidate types (do not contact all five): + +- An author of the **X-Wing draft** (draft-connolly-cfrg-xwing-kem). PQF + uses their combiner; they'll spot misuse instantly. +- An author of a **hybrid-KEM combiner analysis** paper (Barbosa, + Giacon, Heuer, etc.). They'll evaluate the AAD-binding glue around + X-Wing. +- A researcher working on **file-encryption format security** (people + who've written about age, OpenPGP, or PKCS#7 weaknesses). They'll + spot format-layer mistakes that pure-crypto reviewers might miss. +- A researcher working on **chunked AEAD** or **streaming authenticated + encryption** (e.g. authors of the STREAM construction or related + work). They'll evaluate §5.2. +- A researcher with recent work on **deterministic CBOR** or signed + structured-data canonicalization. Niche, but the spec rises or falls + on this. + +Find current emails on the researcher's personal page or institutional +page, NOT on a paper PDF (those addresses go stale). + +## The email + +--- + +Subject: Brief review request: hybrid PQ file format using X-Wing + +Dear Dr. **[LAST NAME]**, + +I'm Paul Clark, an independent developer working on PQF — a hybrid +post-quantum file-encryption format for long-term archival. 
I've put +the core spec, design rationale, and reference implementations on +GitHub under MIT, and I'm in the phase of soliciting external review +before freezing v1. + +**[ONE-TO-TWO-SENTENCE REASON YOU'RE WRITING THEM SPECIFICALLY. Example +for an X-Wing author: "PQF v1 uses X-Wing as its hybrid KEM combiner, +and my draft glues per-file and per-recipient binding to it at the +AEAD layer rather than inside the combiner. I'd value your view on +whether that composition is sound or whether I've missed a subtlety +in how X-Wing's security argument carries through."]** + +I'm not asking for a full security review. What would help most is +reading the **3-page reviewer overview** linked below and replying +with a single concrete reaction — "the X part looks fine; the Y +binding worries me because Z" is more useful to me than silence after +a deep dive. If a longer conversation grows from there I'd be glad to +have it. + +- 3-page reviewer overview: [LINK] +- Normative spec (1300 lines, for reference, not required reading): + [LINK] +- Design rationale with §10 "what reviewers should focus on" and §11 + "open questions": [LINK] + +I'm acutely aware that asking for time from researchers I haven't met +is presumptuous; I would not be writing if I weren't trying to do +this in the open and put the format through real review before any +production claim. If you don't have bandwidth, a single-sentence +"can't, sorry" is a perfectly fine reply. + +Thank you for your time. + +Best regards, +Paul Clark + +https://github.com/systemslibrarian/PostQuantum.FileFormat + +--- + +## Sending notes + +- Personalize the bracketed paragraph for every recipient. If you can't + write one specific sentence about why this person, don't email them + — they'll feel the blast and ignore it. +- Send individually, not bcc. Researchers can tell. +- Don't follow up sooner than 3 weeks. Don't follow up more than once. 
+- Track who you've emailed and what they said in a private file — not + to be pushy, but so you can credit them later in the spec's + acknowledgments section if they engage. +- If someone declines but suggests another person, thank them and ask + if they'd mind a brief introduction. +- Plain text. No tracking pixels. No HTML signatures with images. diff --git a/spec/external-review/workshop-abstract.md b/spec/external-review/workshop-abstract.md new file mode 100644 index 0000000..ea8ca9d --- /dev/null +++ b/spec/external-review/workshop-abstract.md @@ -0,0 +1,112 @@ +# Workshop submission abstract — draft + +**Status:** unsent. This is a single 450-word abstract aimed at +**Real World Crypto (RWC)** or **Real-World Post-Quantum Crypto +(RWPQC)**. Both venues accept short-talk / lightning-talk proposals +that are accessible and applied. Pick the venue, adapt the +opening sentence, and submit. + +**Venue notes for picking one (delete before submission):** + +- **RWC (Real World Crypto)** — annual IACR-affiliated symposium, + applied-crypto leaning, broad audience. PQF fits as a "deployed + file-at-rest format using NIST PQ standards" talk. Submission is + typically a short abstract; talks are ~25 min. Strong reach to + cryptographers who work on standardization. Best fit if you want + the broadest cross-section of the applied-crypto community. +- **RWPQC (Real-World Post-Quantum Crypto)** — newer, NIST-affiliated + workshop explicitly about deploying NIST PQ standards. PQF is + exactly the kind of artifact RWPQC wants to surface (an + independent, open-source, hybrid-PQ deployment that's been + through cross-implementation interop). Best fit if you want + feedback from the people writing the standards rather than the + broader community. + +- **Not RWC or RWPQC?** This abstract is also adaptable to a USENIX + Security poster, an IETF presentation slot (CFRG or SAAG), or an + industry conference. 
Adjust the framing of the opening sentence + to suit the venue's audience. + +--- + +## Title + +**PQF: a hybrid post-quantum file format using X-Wing for long-term archival** + +## Abstract (~450 words) + +We present PQF (Post-Quantum File), an open-source file-encryption +format designed for the harvest-now-decrypt-later threat model. PQF +addresses a specific gap in the post-quantum transition: hybrid PQ +transit protocols are widely deployed (X25519MLKEM768 in Chrome, Edge, +Firefox; PQ-hybrid SSH key exchange; multiple IETF drafts in flight), +but file-at-rest formats have lagged. age, GPG, and S/MIME remain +classical-by-default; PQ support is plugin-only or absent. PQF is an +attempt to make hybrid post-quantum confidentiality the *default*, not +an opt-in, at the format level, in a small, fail-closed container that +can be implemented end-to-end in an afternoon. + +PQF v1 uses **X-Wing** (X25519 + ML-KEM-768) per +draft-connolly-cfrg-xwing-kem as its hybrid KEM combiner, with +IND-CCA proofs in ROM and QROM (Barbosa et al., 2024). Hybrid +signatures (Ed25519 + ML-DSA-87, concatenated) are optional. The +payload is AES-256-GCM chunked, with per-chunk-rekeyed HKDF, a fixed +zero nonce, and an `is_final` AAD bit that makes truncation an AEAD +verify failure rather than a footer mismatch. The header is +**deterministic CBOR** (RFC 8949 §4.2.2), and the spec REQUIRES that +readers enforce determinism on parse — not merely produce it on +encode. Unknown header fields at any level are refused. The result is +a 1300-line spec, a small reference .NET implementation, and an +independent Rust reader and writer that share no code with the +reference. + +We will discuss three design decisions that produced unexpected +follow-on simplifications. 
**First**, cutting over from a +PQF-author in-house HKDF combiner (`pqf1-bind-extract-v1`) to +standardized X-Wing meant we could inherit a proof rather than ship +one — and pushed per-file and per-recipient binding down to the +DEK-wrap AEAD's AAD (`file_id || recipient_index`), where it +composes more cleanly. **Second**, the per-chunk-rekey + zero-nonce +construction avoided counter-IV bookkeeping and made the +`is_final` bit a natural place to bind truncation defense. +**Third**, requiring deterministic CBOR *enforcement on parse* — a +substantial implementation burden — turned out to eliminate a class +of signed-canonicalization bugs that have historically dogged +file-encryption formats. + +PQF is **DRAFT / EXPERIMENTAL**. No external cryptographic review has +happened yet; this submission is partly to invite it. What we have so +far is mechanical interop evidence: cross-implementation conformance +between the .NET writer and Rust reader on every published vector, a +replay of the X-Wing draft KAT against published IETF vectors, and a +reproducible test-vector regeneration pipeline. We would value +feedback on the AAD-binding composition around X-Wing, the per-chunk +AEAD construction's behavior under adversarial truncation, and the +weak-deniability claim built on ML-KEM's implicit rejection. + +Spec, rationale, IETF Internet-Draft, and conformance suite under MIT +license at: +**** + +## Speaker + +Paul Clark — independent developer; PQF author and editor of the +spec, reference implementation, and IETF Internet-Draft. + +## Submission notes (delete before sending) + +- Word count of the abstract proper (excluding title, speaker bio, + these notes): ~450 words. Trim to venue's limit if needed (RWC + typically asks for ≤500; RWPQC is more flexible). +- The "three design decisions" paragraph is the most adaptable — + swap one of the three for whatever a venue would find most + surprising, if you have a sense of audience. 
+- The honesty about no external review is deliberate. If you'd + prefer to soften it, change "No external cryptographic review has + happened yet" to "External cryptographic review is actively + in progress" *after* the CFRG post has been sent and at least one + researcher has engaged. Don't soften before that's true. +- Talks at RWC/RWPQC favor demos and design-decision war stories + over walking the spec section by section. Plan the talk to mirror + the abstract's structure: gap → choices → unexpected simplifications + → what reviewers should focus on. diff --git a/test-vectors/QUICKSTART.md b/test-vectors/QUICKSTART.md new file mode 100644 index 0000000..2a959ed --- /dev/null +++ b/test-vectors/QUICKSTART.md @@ -0,0 +1,136 @@ +# Conformance Quickstart + +Want to verify the cross-implementation conformance claim yourself? +This page walks a reviewer from `git clone` to "saw it pass" in under +10 minutes. + +The point: the .NET writer and the Rust reader share no code. If they +agree on every vector, the wire format is implementable independently +— and you don't have to take anyone's word for it. + +## Prerequisites + +- **.NET 10 SDK** (10.0.x) — for the reference writer +- **Rust stable** (1.78+) — for the independent reader +- **git** + a working shell + +On Windows, PowerShell works for both toolchains. On macOS/Linux, +bash/zsh. + +```bash +# Verify versions +dotnet --version # expect 10.0.x +rustc --version # expect 1.78+ stable +``` + +## Path A: validate the committed vectors (fastest, ~2 minutes) + +The 14 positive vectors and 33 negative vectors are already committed +under `test-vectors/v1/`. The Rust reader's `pqf-conformance` binary +loads them and reports pass/fail. 
+ +```bash +git clone https://github.com/systemslibrarian/PostQuantum.FileFormat.git +cd PostQuantum.FileFormat +cd impl/rust/pqf-reader +cargo run --release --bin pqf-conformance +``` + +Expected: every positive vector decrypts to the SHA-256 listed in +`test-vectors/v1/manifest.json`, every negative vector is refused with +the spec-mandated `RefusalReason`. Exit code 0 means "all 47 vectors +behave exactly as the manifest says." + +If you see a failure, that is a bug worth reporting; please file a +GitHub issue with the vector ID and the actual vs expected outcome. + +## Path B: regenerate the vectors yourself (~5 minutes) + +This proves the vectors are deterministic and weren't doctored. The +.NET `TestVectors` console app re-derives every byte from seeded +inputs. + +```bash +# From repo root +dotnet restore PostQuantum.FileFormat.sln +dotnet build PostQuantum.FileFormat.sln -c Release --no-restore +dotnet run --project tests/PostQuantum.FileFormat.TestVectors --no-build -c Release + +# Now compare regenerated bytes against what's committed +git diff --exit-code test-vectors/v1/ +``` + +Expected: `git diff --exit-code` exits 0. The signed vectors regenerate +byte-deterministically because: +- ML-DSA-87 deterministic signing is enabled in the TestVectors + harness (FIPS 204 deterministic mode) +- Ed25519 is deterministic by RFC 8032 +- All randomness in the writer is threaded through seeded test hooks + (test-only plumbing; never surfaced by the production API) + +Then run Path A's Rust conformance check against the regenerated +vectors: + +```bash +cd impl/rust/pqf-reader +cargo run --release --bin pqf-conformance +``` + +If the diff is clean and conformance passes, you've independently +verified the round trip: +**seeded inputs → .NET writer → wire bytes → Rust reader → expected plaintext**. + +## Path C: replay the published X-Wing KAT (~30 seconds) + +PQF's hybrid KEM is X-Wing (X25519 + ML-KEM-768) per +draft-connolly-cfrg-xwing-kem.
The Rust writer ships an integration +test that replays the published draft KAT vectors directly, with no +PQF format layer in the way. + +```bash +cd impl/rust/pqf-writer +cargo test --release --test xwing_draft_kat +``` + +Expected: one test passes. It loads the X-Wing public key, ephemeral +seed, and recipient secret from +draft-connolly-cfrg-xwing-kem Appendix C example 1, performs +deterministic encap on one side and decap on the other, and asserts +that both shared secrets match the published bytes and the X-Wing +ciphertext matches the published ct. This is independent +confirmation that PQF's X-Wing implementation tracks the draft +specification rather than a private interpretation of it. + +## What the suite covers + +| Suite | Files | What it proves | +|---|---|---| +| Positive vectors | `cases/TV-001.pqf` … `TV-014.pqf` | Decryption succeeds, plaintext SHA-256 matches manifest | +| Negative vectors | `cases/TV-NEG-001.pqf` … `TV-NEG-033.pqf` | Each fail-closed refusal class fires the correct `RefusalReason` | +| NIST KAT | `nist-kat/` | ML-KEM-768 / ML-DSA-87 primitive correctness against NIST published vectors | +| X-Wing draft KAT | `xwing_draft_kat.rs` | PQF's X-Wing implementation matches the published IETF draft KAT | +| Reproducible regen | the diff in Path B | Vectors are deterministically regenerable from seeded inputs | +| Cross-impl | the run in Path A | The Rust reader agrees with the .NET writer on every vector | + +`SPEC-CHECKLIST.md` §11 maps every fail-closed refusal class to its +portable test vector, so each negative vector is traceable to a +specific normative MUST in the spec. + +## If something fails + +A regression here would be a real finding. Please: + +1. Capture the exact command you ran, the vector ID, and the actual + vs expected outcome. +2. Open an issue at + . +3. If you'd rather email privately first (e.g. you think it's a + security finding), contact . 
+ +## CI mirror + +These same checks run on every PR via +[`.github/workflows/ci.yml`](../.github/workflows/ci.yml) (jobs +`rust-conformance` and `reproducible-vectors`). If your local result +differs from the CI badge, your toolchain version is the first thing +to check.
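
Since a toolchain mismatch is the first suspect when a local result differs from CI, a quick version check can narrow things down. The following is a minimal sketch, assuming `bash` and `awk` are available. `version_at_least` is a hypothetical helper written for this example and is not part of the PQF repository; the minimum versions are the ones listed under Prerequisites.

```bash
#!/usr/bin/env bash
# Sketch only: version_at_least is an illustrative helper,
# not something shipped in the PQF repository.

# Return 0 when dotted version $1 >= dotted version $2.
# Compares the first three numeric fields; suffixes such as "-beta" are ignored.
version_at_least() {
  local -a have want
  IFS=. read -r -a have <<< "$1"
  IFS=. read -r -a want <<< "$2"
  local i a b
  for i in 0 1 2; do
    a="${have[i]:-0}"; a="${a%%[!0-9]*}"; a="${a:-0}"
    b="${want[i]:-0}"; b="${b%%[!0-9]*}"; b="${b:-0}"
    (( 10#$a > 10#$b )) && return 0
    (( 10#$a < 10#$b )) && return 1
  done
  return 0
}

# .NET: the quickstart expects a 10.0.x SDK.
if command -v dotnet >/dev/null 2>&1; then
  dotnet_v="$(dotnet --version)"
  case "$dotnet_v" in
    10.0.*) echo "dotnet $dotnet_v: ok" ;;
    *)      echo "dotnet $dotnet_v: expected 10.0.x" ;;
  esac
else
  echo "dotnet: not found on PATH"
fi

# Rust: the quickstart expects stable 1.78 or newer.
if command -v rustc >/dev/null 2>&1; then
  # `rustc --version` prints e.g. "rustc 1.78.0 (<hash> <date>)"; field 2 is the version.
  rustc_v="$(rustc --version | awk '{print $2}')"
  if version_at_least "$rustc_v" 1.78.0; then
    echo "rustc $rustc_v: ok"
  else
    echo "rustc $rustc_v: expected 1.78+"
  fi
else
  echo "rustc: not found on PATH"
fi
```

The comparison is numeric per field rather than lexical, so `1.9.0` is correctly treated as older than `1.78.0`. Including the script's output in an issue report would give maintainers the toolchain context alongside the vector ID.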