hack-ink · yvette-carlisle · Jun 11, 2026 · Jun 11, 2026
diff --git a/Cargo.lock b/Cargo.lock
diff --git a/README.md b/README.md
@@ -149,19 +149,20 @@ provider-backed ELF evidence was required.
   mem0, OpenViking, and claude-mem remained typed non-pass states. OpenViking now
   reaches its pinned Docker local embedding path and is reported as `wrong_result`
   when same-corpus evidence terms are missed; setup failures remain `incomplete`.
-- Real-world agent memory aggregate after the P1 benchmark batch: 38 fixture-backed
-  jobs across 11 suites, 36 pass, 0 incomplete, 2 blocked, 0 wrong-result,
+- Real-world agent memory aggregate after the P1 benchmark batch: 40 fixture-backed
+  jobs across 11 suites, 38 pass, 0 incomplete, 2 blocked, 0 wrong-result,
   0 not-encoded, and 0 unsupported-claim results. The remaining non-pass jobs are
   production-ops operator boundaries, not hidden benchmark wins.
 - Full-suite live real-world adapter sweep after XY-899: ELF and qmd emit
-  Docker-isolated `live_real_world` records for all 38 encoded jobs across 11 suites
+  Docker-isolated `live_real_world` records for all 40 encoded jobs across 11 suites
   through `cargo make real-world-memory-live-adapters`. Both keep the original
   targeted `work_resume`, `retrieval`, and `project_decisions` slice passing, but the
-  full sweep is not a full-suite pass. The fresh ELF sweep reports 18 pass,
-  5 wrong_result, 2 blocked, and 13 not_encoded jobs. The fresh qmd sweep reports
-  17 pass, 6 wrong_result, 2 blocked, and 13 not_encoded jobs. The difference is the
-  delete/TTL tombstone case; qmd remains the local retrieval-debug UX reference, and
-  no broad ELF-over-qmd claim is allowed.
+  full sweep is not a full-suite pass. The fresh ELF sweep reports 22 pass,
+  5 wrong_result, 2 blocked, and 11 not_encoded jobs. The fresh qmd sweep reports
+  17 pass, 6 wrong_result, 2 blocked, and 15 not_encoded jobs. The differences are
+  the delete/TTL tombstone case plus ELF-only capture/write-policy live self-checks;
+  qmd remains the local retrieval-debug UX reference, and no broad ELF-over-qmd claim
+  is allowed.
 - Live operator-debugging slice after XY-932: `cargo make
   real-world-job-operator-ux-live-adapters` emits narrow Docker-isolated
   `live_real_world` records for ELF and qmd over the operator-debugging fixtures.
@@ -194,6 +195,12 @@ provider-backed ELF evidence was required.
   for local SDK export-style parity, `blocked` for OpenMemory UI/export, and
   `non_goal` for hosted Platform export and optional graph memory in the local OSS
   lane.
+- Capture/write-policy live follow-up after XY-933: ELF now passes 4/4 live
+  `capture_integration` jobs with zero redaction leaks, source ids preserved in
+  source refs, write-policy redaction audit counts, evidence binding, and no secret
+  leakage. qmd remains `not_encoded` for this suite. agentmemory capture comparison is
+  blocked by mocked/in-memory storage, and claude-mem hook/viewer capture remains
+  untested, so no broad capture-breadth superiority claim is allowed.
 - The benchmark runner and report publisher are checked in and Docker-isolated:
   `cargo make baseline-live-docker`, `cargo make baseline-backfill-docker`,
   `cargo make baseline-production-private-addendum`,
@@ -216,6 +223,7 @@ Detailed evidence and interpretation:
 - [ELF/qmd Trace Replay Diagnostics Report - June 11, 2026](docs/guide/benchmarking/2026-06-11-elf-qmd-trace-replay-diagnostics-report.md)
 - [Graph/RAG Scored Smoke Adapter Report - June 11, 2026](docs/guide/benchmarking/2026-06-11-graph-rag-scored-smoke-adapter-report.md)
 - [mem0/OpenMemory History and UI Export Report - June 11, 2026](docs/guide/benchmarking/2026-06-11-mem0-openmemory-history-ui-export-report.md)
+- [Capture/Write-Policy Live Report - June 11, 2026](docs/guide/benchmarking/2026-06-11-capture-write-policy-live-report.md)
 - [Live Baseline Benchmark Runbook](docs/guide/benchmarking/live_baseline_benchmark.md)
 - [Single-User Production Runbook](docs/guide/single_user_production.md)
 - Benchmark contract:
@@ -238,7 +246,8 @@ Evidence-backed position after the June 11 real-world reports:
   typed non-pass states, while ELF has the stronger service and provenance contract.
 - ELF is still behind or not yet proven on full-suite live real-world pass parity,
   private-corpus production quality, credentialed production-ops gates,
-  qmd-style local debug knobs, agentmemory/claude-mem/OpenMemory-style continuity UX,
+  qmd-style local debug knobs, agentmemory/claude-mem/OpenMemory-style capture and
+  continuity UX,
   OpenViking-style context trajectory, and hosted managed memory.
 
 Quick comparison snapshot (objective/high-level).
@@ -292,6 +301,7 @@ Detailed comparison, mechanism-level analysis, and source map:
 - [ELF/qmd Trace Replay Diagnostics Report - June 11, 2026](docs/guide/benchmarking/2026-06-11-elf-qmd-trace-replay-diagnostics-report.md)
 - [Graph/RAG Scored Smoke Adapter Report - June 11, 2026](docs/guide/benchmarking/2026-06-11-graph-rag-scored-smoke-adapter-report.md)
 - [mem0/OpenMemory History and UI Export Report - June 11, 2026](docs/guide/benchmarking/2026-06-11-mem0-openmemory-history-ui-export-report.md)
+- [Capture/Write-Policy Live Report - June 11, 2026](docs/guide/benchmarking/2026-06-11-capture-write-policy-live-report.md)
 - [Live Baseline Benchmark Runbook](docs/guide/benchmarking/live_baseline_benchmark.md)
 - [Real-World Agent Memory Benchmark](docs/guide/benchmarking/real_world_agent_memory_benchmark.md)
 - [External Memory Improvement Plan](docs/guide/research/external_memory_improvement_plan.md)
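
Reviewer note: the revised tallies in the README hunks above can be checked with simple arithmetic. The sketch below is illustrative only and is not part of the patch. The `Tally` type and constant names are hypothetical, and the `unsupported_claim` count for the two live sweeps is assumed to be 0 because the README does not state it for those sweeps.

```rust
/// Hypothetical tally of typed job states for one benchmark run.
#[derive(Debug, Clone, Copy)]
struct Tally {
    pass: u32,
    wrong_result: u32,
    incomplete: u32,
    blocked: u32,
    not_encoded: u32,
    unsupported_claim: u32,
}

impl Tally {
    /// Sum of every typed state; this should equal the number of encoded jobs.
    const fn total(&self) -> u32 {
        self.pass
            + self.wrong_result
            + self.incomplete
            + self.blocked
            + self.not_encoded
            + self.unsupported_claim
    }
}

/// Fixture-backed aggregate quoted in the README after this change.
const FIXTURE: Tally = Tally {
    pass: 38, wrong_result: 0, incomplete: 0, blocked: 2, not_encoded: 0, unsupported_claim: 0,
};

/// Fresh ELF live sweep (unsupported_claim assumed 0; not stated in the README).
const ELF_LIVE: Tally = Tally {
    pass: 22, wrong_result: 5, incomplete: 0, blocked: 2, not_encoded: 11, unsupported_claim: 0,
};

/// Fresh qmd live sweep (unsupported_claim assumed 0; not stated in the README).
const QMD_LIVE: Tally = Tally {
    pass: 17, wrong_result: 6, incomplete: 0, blocked: 2, not_encoded: 15, unsupported_claim: 0,
};

fn main() {
    // Each run should account for all 40 encoded jobs.
    for (name, tally) in [("fixture", FIXTURE), ("elf live", ELF_LIVE), ("qmd live", QMD_LIVE)] {
        assert_eq!(tally.total(), 40, "{name} tally does not cover all jobs");
        println!("{name}: {} jobs accounted for", tally.total());
    }
}
```

Under those assumptions all three tallies sum to 40, which matches the job count the patch states. The same check against the removed lines gives 38 for each run.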

diff --git a/apps/elf-eval/Cargo.toml b/apps/elf-eval/Cargo.toml
@@ -22,6 +22,7 @@ uuid               = { workspace = true }
 elf-chunking = { workspace = true }
 elf-cli      = { workspace = true }
 elf-config   = { workspace = true }
+elf-domain   = { workspace = true }
 elf-service  = { workspace = true }
 elf-storage  = { workspace = true }
 elf-testkit  = { workspace = true }

diff --git a/apps/elf-eval/fixtures/real_world_external_adapters/memory_projects_manifest.json b/apps/elf-eval/fixtures/real_world_external_adapters/memory_projects_manifest.json
@@ -29,7 +29,7 @@
       },
       "run": {
         "status": "blocked",
-        "evidence": "The current fixture set reports 38 jobs, 36 pass, 0 incomplete, 2 blocked, 0 wrong_result, 0 not_encoded, and 0 unsupported_claim.",
+        "evidence": "The current fixture set reports 40 jobs, 38 pass, 0 incomplete, 2 blocked, 0 wrong_result, 0 not_encoded, and 0 unsupported_claim.",
         "command": "cargo make real-world-memory",
         "artifact": "tmp/real-world-memory/real-world-memory-report.json"
       },
@@ -99,7 +99,7 @@
         {
           "suite_id": "capture_integration",
           "status": "pass",
-          "evidence": "The redaction and capture-boundary fixture is encoded and passing."
+          "evidence": "Four redaction, exclusion, source-id, evidence-binding, and capture-boundary fixtures are encoded and passing."
         },
         {
           "suite_id": "production_ops",
@@ -146,13 +146,13 @@
       },
       "run": {
         "status": "wrong_result",
-        "evidence": "ELF materializes 38 real_world_job adapter_response objects through ElfService, worker indexing, and search_raw before scoring; the full sweep includes typed wrong_result, blocked, and not_encoded job records.",
+        "evidence": "ELF materializes 40 real_world_job adapter_response objects through ElfService, worker indexing, search_raw, and live capture/write-policy ingestion before scoring; the full sweep includes typed wrong_result, blocked, and not_encoded job records.",
         "command": "cargo make real-world-memory-live-adapters",
         "artifact": "tmp/real-world-memory/live-adapters/elf-report.json"
       },
       "result": {
         "status": "wrong_result",
-        "evidence": "The fresh full live sweep scores 38 jobs across all 11 encoded suites: 18 pass, 5 wrong_result, 0 incomplete, 2 blocked, and 13 not_encoded. This is not a full-suite live pass.",
+        "evidence": "The fresh full live sweep scores 40 jobs across all 11 encoded suites: 22 pass, 5 wrong_result, 0 incomplete, 2 blocked, and 11 not_encoded. This is not a full-suite live pass.",
         "command": "cargo make real-world-memory-live-adapters",
         "artifact": "tmp/real-world-memory/live-adapters/elf-report.md"
       },
@@ -175,7 +175,7 @@
         {
           "capability": "full_suite_live_sweep",
           "status": "wrong_result",
-          "evidence": "The runner now emits per-job and per-suite live records for all 38 encoded jobs, but memory_evolution is wrong_result and several non-answer-generation suites remain typed non-pass."
+          "evidence": "The runner now emits per-job and per-suite live records for all 40 encoded jobs, but memory_evolution is wrong_result and several non-answer-generation suites remain typed non-pass."
         },
         {
           "capability": "full_suite_live_pass",
@@ -231,8 +231,8 @@
         },
         {
           "suite_id": "capture_integration",
-          "status": "not_encoded",
-          "evidence": "The live adapter sweep does not exercise capture integrations or write-policy redaction boundaries."
+          "status": "pass",
+          "evidence": "The live adapter passes 4/4 capture_integration jobs through Docker-local ELF ingestion, including capture-boundary classification, excluded evidence ids, source ids in source_ref, write_policy redaction audit counts, evidence binding, and zero secret leakage."
         },
         {
           "suite_id": "production_ops",
@@ -245,6 +245,18 @@
           "evidence": "The live adapter retrieved the scoped preference evidence and passed the personalization job."
         }
       ],
+      "scenarios": [
+        {
+          "scenario_id": "live_capture_write_policy",
+          "suite_id": "capture_integration",
+          "status": "pass",
+          "elf_position": "ties",
+          "comparison_outcome": "tie",
+          "evidence": "ELF live capture/write-policy jobs pass for redaction, exclusions, source ids, evidence binding, and no secret leakage. This is an ELF self-check, not a win over external hook systems.",
+          "command": "cargo make real-world-memory-live-adapters",
+          "artifact": "tmp/real-world-memory/live-adapters/elf-materialization.json"
+        }
+      ],
       "evidence": [
         {
           "kind": "fixture_dir",
@@ -359,13 +371,13 @@
       },
       "run": {
         "status": "wrong_result",
-        "evidence": "qmd materializes 38 real_world_job adapter_response objects through collection add, update, embed, and query --json before scoring; the full sweep includes typed wrong_result, blocked, and not_encoded job records.",
+        "evidence": "qmd materializes 40 real_world_job adapter_response objects through collection add, update, embed, and query --json before scoring; the full sweep includes typed wrong_result, blocked, and not_encoded job records.",
         "command": "cargo make real-world-memory-live-adapters",
         "artifact": "tmp/real-world-memory/live-adapters/qmd-report.json"
       },
       "result": {
         "status": "wrong_result",
-        "evidence": "The fresh full qmd live sweep scores 38 jobs across all 11 encoded suites: 17 pass, 6 wrong_result, 0 incomplete, 2 blocked, and 13 not_encoded. This is not a full-suite live pass.",
+        "evidence": "The fresh full qmd live sweep scores 40 jobs across all 11 encoded suites: 17 pass, 6 wrong_result, 0 incomplete, 2 blocked, and 15 not_encoded. This is not a full-suite live pass.",
         "command": "cargo make real-world-memory-live-adapters",
         "artifact": "tmp/real-world-memory/live-adapters/qmd-report.md"
       },
@@ -388,7 +400,7 @@
         {
           "capability": "full_suite_live_sweep",
           "status": "wrong_result",
-          "evidence": "The runner now emits per-job and per-suite live records for all 38 encoded jobs, but memory_evolution is wrong_result and several non-answer-generation suites remain typed non-pass."
+          "evidence": "The runner now emits per-job and per-suite live records for all 40 encoded jobs, but memory_evolution is wrong_result and several non-answer-generation suites remain typed non-pass."
         },
         {
           "capability": "full_suite_live_pass",
@@ -445,7 +457,7 @@
         {
           "suite_id": "capture_integration",
           "status": "not_encoded",
-          "evidence": "The qmd live adapter sweep does not exercise capture integrations or write-policy redaction boundaries."
+          "evidence": "The qmd live adapter sweep does not exercise capture integrations or write-policy redaction boundaries; all capture_integration jobs remain typed not_encoded for qmd."
         },
         {
           "suite_id": "production_ops",
@@ -838,6 +850,15 @@
           "elf_position": "untested",
           "evidence": "agentmemory's relevant strength is durable coding-agent continuity and capture, but the Docker harness has not proven a persistent session/capture path. Keep work_resume and capture claims blocked until a durable local adapter path exists.",
           "artifact": "apps/elf-eval/fixtures/real_world_external_adapters/memory_projects_manifest.json"
+        },
+        {
+          "scenario_id": "capture_write_policy_hooks",
+          "suite_id": "capture_integration",
+          "status": "blocked",
+          "elf_position": "untested",
+          "comparison_outcome": "blocked",
+          "evidence": "agentmemory capture breadth is blocked for comparison because the current Docker baseline uses a process-local StateKV Map and in-memory index; no durable local session/capture path stores source ids, exclusions, write-policy audit, or evidence-bound capture output.",
+          "artifact": "apps/elf-eval/fixtures/real_world_external_adapters/memory_projects_manifest.json"
         }
       ],
       "evidence": [
@@ -1353,7 +1374,7 @@
           "suite_id": "capture_integration",
           "status": "not_encoded",
           "elf_position": "untested",
-          "evidence": "The Docker baseline uses repository classes only. claude-mem hooks, viewer, timeline, and observation workflows are not executed by the runner.",
+          "evidence": "The Docker baseline uses repository classes only. claude-mem hooks, timeline, observations, viewer capture, and automatic capture review workflows are not executed by the runner, so capture breadth remains untested rather than an ELF win/loss.",
           "artifact": "apps/elf-eval/fixtures/real_world_external_adapters/memory_projects_manifest.json"
         }
       ],
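
Reviewer note: the `capture` blocks that this change adds to `redaction_exclusion.json` tag each corpus item with an `action` of `store` or `exclude`, and the fixture notes say live ELF scoring must not store or retrieve the `private-excluded-text` evidence id. The sketch below shows one way such a filter could look. It is a sketch under assumptions, not the adapter's actual code: the `CaptureItem` type and the `storable` function are hypothetical, and only the ids and actions are copied from the fixture.

```rust
/// Capture decision attached to a fixture item (values taken from the fixture).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum CaptureAction {
    Store,
    Exclude,
}

/// Hypothetical in-memory view of one fixture item's capture metadata.
#[derive(Debug, Clone, Copy)]
struct CaptureItem {
    evidence_id: &'static str,
    source_id: &'static str,
    action: CaptureAction,
}

/// The three items as encoded in redaction_exclusion.json after this change.
fn fixture_items() -> Vec<CaptureItem> {
    vec![
        CaptureItem {
            evidence_id: "public-captured-decision",
            source_id: "capture:linear-comment-933",
            action: CaptureAction::Store,
        },
        CaptureItem {
            evidence_id: "write-policy-audit",
            source_id: "capture:write-policy-audit-933",
            action: CaptureAction::Store,
        },
        CaptureItem {
            evidence_id: "private-excluded-text",
            source_id: "capture:excluded-private-span-933",
            action: CaptureAction::Exclude,
        },
    ]
}

/// Keep only items marked `store`, returning (evidence_id, source_id) pairs
/// so the source id stays bound to the evidence it came from.
fn storable(items: &[CaptureItem]) -> Vec<(&'static str, &'static str)> {
    items
        .iter()
        .filter(|item| item.action == CaptureAction::Store)
        .map(|item| (item.evidence_id, item.source_id))
        .collect()
}

fn main() {
    let stored = storable(&fixture_items());
    // The negative trap must never reach storage.
    assert!(stored.iter().all(|(id, _)| *id != "private-excluded-text"));
    println!("{stored:?}");
}
```

Filtering before ingestion, rather than after retrieval, is the design choice assumed here; it keeps the excluded span out of storage entirely instead of relying on a later redaction pass.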

diff --git a/apps/elf-eval/fixtures/real_world_memory/capture_integration/redaction_exclusion.json b/apps/elf-eval/fixtures/real_world_memory/capture_integration/redaction_exclusion.json
@@ -6,11 +6,34 @@
   "corpus": {
     "corpus_id": "real-world-memory-capture-2026-06-09",
     "profile": "synthetic",
+    "capture_behaviors": {
+      "real": [
+        "ELF live add_note capture can persist public evidence with source ids and skip excluded evidence ids through the Docker live adapter."
+      ],
+      "fixture_backed": [
+        "The fixture encodes public capture, write-policy audit evidence, and a private excluded span as a negative trap."
+      ],
+      "blocked": [
+        "agentmemory hook breadth remains blocked until a durable local session/capture adapter replaces the in-memory mock.",
+        "claude-mem hook and viewer capture remain not encoded because the Docker baseline does not execute hooks, timeline, or viewer workflows."
+      ],
+      "not_encoded": [
+        "Host-global capture hooks, Slack/browser capture, and credentialed tool capture are outside this Docker-scoped benchmark."
+      ],
+      "notes": [
+        "Live ELF scoring must not store or retrieve the private-excluded-text evidence id."
+      ]
+    },
     "items": [
       {
         "evidence_id": "public-captured-decision",
         "kind": "message",
         "text": "Public memory candidate: benchmark reports should include evidence coverage, unsupported claim count, stale retrieval count, scope correctness, and redaction leak count.",
+        "capture": {
+          "action": "store",
+          "source_id": "capture:linear-comment-933",
+          "evidence_binding": "source_ref"
+        },
         "source_ref": {
           "schema": "source_ref/v1",
           "resolver": "real_world_job_fixture/v1",
@@ -28,6 +51,11 @@
         "evidence_id": "write-policy-audit",
         "kind": "trace",
         "text": "Write policy audit: private spans were excluded before evidence binding, and the stored note only retained the public benchmark-report requirement.",
+        "capture": {
+          "action": "store",
+          "source_id": "capture:write-policy-audit-933",
+          "evidence_binding": "source_ref"
+        },
         "source_ref": {
           "schema": "source_ref/v1",
           "resolver": "real_world_job_fixture/v1",
@@ -45,6 +73,11 @@
         "evidence_id": "private-excluded-text",
         "kind": "message",
         "text": "Private excluded text: operator's home server label is cobalt-ridge.",
+        "capture": {
+          "action": "exclude",
+          "source_id": "capture:excluded-private-span-933",
+          "evidence_binding": "negative_trap"
+        },
         "source_ref": {
           "schema": "source_ref/v1",
           "resolver": "real_world_job_fixture/v1",