AdaWorldAPI · Jul 2, 2026
diff --git a/.claude/board/EPIPHANIES.md b/.claude/board/EPIPHANIES.md
@@ -1,3 +1,23 @@
+## 2026-07-02 — E-1BRC-GRIDLAKE-SWEETSPOT-1: the 64×64 gridlake SoA is the measured sweet spot — the batch pipeline at tile scale equals the best streamed topology while carrying the double-WAL
+**Status:** FINDING (measured, onebrc-probe lane J t7; closes the operator's four follow-up questions and the t4→t7 kanban-update arc)
+
+Lane J parameterized lane I with the operator's questions as knobs (grid × sink-lanes × registry). t7 @4 cores, same-session refs G(1)=46.3 / H=40.5 / F=70.1: **J(gridlake 4096, 1 lane, no registry) = 46.2–46.3 Mrows/s — equal to the best streamed ownership topology, above the lazy-owner orchestrator, with the strongest witness (double-WAL both ends, 312 msgs).** The four answers: (1) t6's ~20 vs H's 39.4 decomposes into registry RESIDENCY (knob-isolated: ON halves steady state net of spawn — t6's CONJECTURE promoted to FINDING) + the L2-busting 64K-cell table/memo working set; (2) the cache WAS wrong — matching the batch SoA to the 64×64 gridlake tile (4096 cells ≈ 80 KB integer; the literal 4096×BF16=16 KB pair is ndarray #227's proven VDPBF16PS tier) recovers it completely; (3) 1 sink lane suffices, 8 free, 64 over-lanes — lane count scales with per-batch APPLY work, never with data or address space; (4) orchestration-with-occupancy is NOT the sweet spot when ownership can live as the index-aligned guarantee table — H's lazy mechanisms stay right only when fine-grained ownership must be actors. Composed recipe: **64×64 gridlake batch SoA + codebook CAM addressing + 1–8 sink lane pairs + whole-table double-cast + flush cache; ownership = index-aligned guarantee table, no standing per-cell registry; BF16 planes per #227 when tile-GEMM lands.** Tables: crates/onebrc-probe/README.md §5.7.
+## 2026-07-02 — E-1BRC-BATCH-PIPELINE-1: the operator's batch-pipeline spec measured — 312 messages total, double-WAL on both ends, flush cache interleaves; the remaining cost is residency, not architecture
+**Status:** FINDING (measured, onebrc-probe lane I t6; completes the t4→t5→t6 kanban-update arc; operator spec implemented verbatim)
+
+The spec: all 65536 mailboxes UPFRONT; two fixed aligned indices (mailbox idx == SoA row idx — ownership = the `row_owner[i]==i` binding + write-on-behalf, never a message path); codebook-minted identity → direct CAM addressing (no probe/compare in the hot loop; ~400 global mint locks per run, total); whole-table DOUBLE-CASTS (one Arc per 64K-row batch to BOTH the mailbox-ownership-guarantee sink and the Lance row-address sink); flush cache so flushing and reindex-next interleave. t6 measured: **312 messages total** (156 batches × 2 ends — vs ~63K flat t4a, ~2.6K orchestrated t5: message count tracks BATCHES, independent of occupancy AND address-space size); flush_cache peak 2–3 tables/worker (the worker never waits for a flush); ownership==lance==156 journals + 156 version ticks asserted (the double-WAL: replayable from either end — the full W1b batch-writer shape, stronger than G/H's ownership-only witness); 64K spawn = 1.1–2.7 s one-time standing infrastructure (17–40 µs/actor); steady state ~20–22 Mrows/s ≈ ½ of the 1-owner streamed G(1) 43 — attribution CONJECTURE: resident-64K-actor memory footprint on a 4-core container + two serialized sinks. RULING: **the batch pipeline wins the messaging war outright and carries the strongest witness story; the standing 64K registry is affordable infrastructure; the remaining optimization surface is residency footprint, not architecture.** Composition across the arc: H's lazy activation (registry need not pre-exist) and I's whole-table double-cast + flush cache (when it does) are complementary — the shared invariant is that producers NEVER address fine-grained owners directly. Tables: crates/onebrc-probe/README.md §5.6.
+## 2026-07-02 — E-1BRC-ORCHESTRATION-SWEETSPOT-1: the sweet spot is the orchestration tier itself — lazy activation + ahead-firing batching flatten the ownership curve (23× recovery at the 64K end)
+**Status:** FINDING (measured, onebrc-probe lane H t5; completes the E-1BRC-KANBAN-UPDATE-1 / E-1BRC-OWNER-GRANULARITY-1 arc; operator: "the 65536 mailboxes had no Orchestration at all — find the sweet spot")
+
+t4a's 20× cliff was the FLAT topology: 64K eager spawns, ~63K owner-addressed casts, producers addressing owners directly. Lane H interposes the planner/kanban-executor domain's own two mechanisms — **lazy activation** (router tier spawns an owner only on first traffic: live mailboxes track OCCUPANCY ~413, never the 64K address space) and the **ahead-firing batch writer** (routers buffer per-owner entries, fire batched Applys at batch_k) — over lane G's UNCHANGED one-mailbox-per-SoA substrate, witness discipline intact (owner journals == router casts). t5 medians @4 cores: H(16) 42.2 / H(256) 36.8 / H(4096) 40.2 / **H(65536) 39.4 vs same-session flat 1.7 — 23× recovery, within ~9% of the best coarse topology (G(1) 43.2; F reference 81.7)**. The ruling that closes the arc: **orchestration FLATTENS the granularity curve — ownership granularity becomes a semantic choice (per-tile addressability, per-owner WAL), not a performance gamble. Fine-grained mailbox-as-owner is viable IF AND ONLY IF producers never address owners directly: the router/delegation tier is a LOAD-BEARING part of the kanban-update architecture, and flat fan-out to fine owners is the measured 20× anti-pattern.** graph-flow (rs-graph-llm) remains the OUTER loop by design — task-granularity persisted-cursor orchestration; per-morsel it would measure storage latency, and its in-container build is blocked by the pre-existing burn 403 (W3b). Tables: crates/onebrc-probe/README.md §5.5.
+## 2026-07-02 — E-1BRC-OWNER-GRANULARITY-1: one mailbox per SoA (operator correction) — the ownership curve is a plateau then a 20× cliff; Morton tile GROUPING is what makes mailbox-as-owner viable
+**Status:** FINDING (measured, onebrc-probe lane G t4a; corrects the framing of E-1BRC-KANBAN-UPDATE-1 — the numbers there stand, the topology language was inverted)
+
+Operator: "I thought we spawn one ractor mailbox per SoA?" — ratified: that IS the canon, and lane G's "sharding the 64K SoA" framing was an ownership inversion (the code's owners were always independent, but each allocated a full 64K-slot table, making the fine-grained end unrunnable). Reworked: each owner's actor State is its OWN `OwnerSoa` sized to its tile span — one mailbox = one SoA, verbatim — unlocking the full granularity sweep including the literal 64K-concurrent-SoAs end. Medians @4 cores: G(1) 43.4 / G(16) 30.3 / G(256) 35.9 / G(4096) 18.3 / **G(65536 = one mailbox per tile) 2.1 Mrows/s — a 20× collapse** vs one owner (64K spawns paid in-run; cast fragmentation ~150→~63K messages as each morsel's ~413 stations scatter to ~413 owners; 64K mailbox tasks on 4 cores). The completed ruling: **the ownership-granularity curve is a plateau (1–256 owners, ~30–43 Mrows/s, topology noise-dominated) then a cliff; one-mailbox-per-semantic-cell is architecturally clean and measurably catastrophic at OLAP arrival rates. Morton tile GROUPING is not an optimization detail — it is the mechanism that makes mailbox-as-owner viable: the mailbox is the OWNER boundary, the tile is the ADDRESS boundary, and they must never be conflated 1:1 under load.** Owners' memory now ∝ span (the collapse is scheduling+messaging, not memory). Tables: crates/onebrc-probe/README.md §5.4a.
+## 2026-07-02 — E-1BRC-KANBAN-UPDATE-1: the kanban-update write path measured — 0.54× at morsel granularity, the tax is all boundary, and ownership must not shard below contention
+**Status:** FINDING (measured, onebrc-probe lane G t4, same recipe corpus as E-1BRC-ADDRESSING-1; tables `crates/onebrc-probe/README.md` §5.4; operator-requested follow-up "compare morton and the kanban vs without / 64k concurrent SoA vs Morton tile ... when using kanban update")
+
+Lane G holds lane F's Morton-tile 64K SoA as OWNED state behind shard mailbox actors: workers pre-reduce 64K-row morsels (#227's morsel size; ndarray rebased onto master to sit on its merged Morton/morsel probe), cast dirty entries prefix-routed to owners, every applied batch witnessed with a KanbanMove (journal==casts asserted). t4 medians @4 cores: **F 79.5 (private merge, no witness) / G 43.0 @1 shard / 39.9 @4 / 36.0 @16 (one thrash collapse to 11.7); workers=3 strictly worse.** Three rulings for the architecture: (1) **kanban update costs ~0.54×** at morsel granularity and the tax decomposes entirely into boundary costs (Arc corpus copy, blocking+async oversubscription, per-morsel messaging) — the witness itself is ~free (lane E) — buying live bounded-staleness state, witnessed replayable writes, single-writer safety, bounded worker memory; (2) **do not shard ownership below contention** — at ~400 groups ONE mailbox absorbs all apply work and every extra shard is pure scheduling overhead; shard count scales with owner WORK, never with rows; (3) **the Morton prefix ROUTE is free as a mechanism** (G@4 within ~7% of G@1 before thrash) — tile-sharding stays the right tool, its trigger is owner-side contention (high cardinality / heavy per-entry work). W2d consequence: private-merge when the product is one final answer; pay the ~2× only when the product IS the live/witnessed/owned state — and the 550 ms Libet budget is untouched either way.
 ## 2026-07-02 — E-1BRC-ADDRESSING-1: addressing-is-aggregation measured — route-and-write is 3× the classic map; the Morton dress costs ~10%
 **Status:** FINDING (measured, onebrc-probe t0–t3, recipe corpus rows=10000000 seed=42 sha256=f1853caa…5691, 4-core container; tables in `crates/onebrc-probe/README.md` §5–5.3)
 

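Reviewer sketch (not part of the patch): the whole-table double-cast in the E-1BRC-BATCH-PIPELINE-1 entry above can be illustrated in miniature. This is a minimal sketch under stated assumptions, not the lane I code: `BatchTable`, `spawn_sink` and `run_pipeline` are hypothetical names, plain `std::sync::mpsc` channels and threads stand in for the ractor mailboxes, and the tables carry no real data. It shows only the accounting the entry describes: one `Arc` per batch is sent to both sinks, each sink journals the batch, and the message count is 2 × batches regardless of how many cells the table has.

```rust
use std::sync::mpsc;
use std::sync::Arc;
use std::thread;

/// Hypothetical batch table: rows are index-aligned with their owners.
struct BatchTable {
    batch_id: u64,
    row_owner: Vec<u32>, // invariant: row_owner[i] == i
    sums: Vec<i64>,      // placeholder payload, one accumulator per cell
}

/// A sink journals every batch it receives and returns the journal on shutdown.
fn spawn_sink(rx: mpsc::Receiver<Arc<BatchTable>>) -> thread::JoinHandle<Vec<u64>> {
    thread::spawn(move || {
        let mut journal = Vec::new();
        for table in rx {
            // Ownership is checked as an index binding, not delivered as a message.
            assert!(table.row_owner.iter().enumerate().all(|(i, &o)| o as usize == i));
            assert_eq!(table.sums.len(), table.row_owner.len());
            journal.push(table.batch_id);
        }
        journal
    })
}

/// Returns (messages sent, ownership journal, row-address journal).
fn run_pipeline(batches: u64, cells: usize) -> (u64, Vec<u64>, Vec<u64>) {
    let (own_tx, own_rx) = mpsc::channel();
    let (lance_tx, lance_rx) = mpsc::channel();
    let own = spawn_sink(own_rx);
    let lance = spawn_sink(lance_rx);

    let mut messages = 0u64;
    for batch_id in 0..batches {
        let table = Arc::new(BatchTable {
            batch_id,
            row_owner: (0..cells as u32).collect(),
            sums: vec![0; cells],
        });
        // Double-cast: the same Arc goes to both ends, so no table is copied.
        own_tx.send(Arc::clone(&table)).expect("ownership sink alive");
        lance_tx.send(table).expect("row-address sink alive");
        messages += 2;
    }
    // Closing the channels lets both sinks drain and return their journals.
    drop(own_tx);
    drop(lance_tx);
    (messages, own.join().unwrap(), lance.join().unwrap())
}

fn main() {
    // 156 batches is the count reported in the entry; 4096 cells is the gridlake tile.
    let (messages, own, lance) = run_pipeline(156, 4096);
    assert_eq!(messages, 312);
    assert_eq!(own, lance); // both ends can replay the same batch sequence
    println!("messages={} journals={}/{}", messages, own.len(), lance.len());
}
```

Changing `cells` from 4096 to 65536 leaves `messages` unchanged, which is the sense in which the entry says message count tracks batches rather than occupancy or address-space size.
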
diff --git a/.claude/v3/INTEGRATION-PLAN.md b/.claude/v3/INTEGRATION-PLAN.md
@@ -591,3 +591,120 @@ where the win lives (B was 1.06×). All six lanes A–F + R now measured
 on one regenerable recipe corpus. Board: E-1BRC-ADDRESSING-1. The probe
 is COMPLETE; follow-ups (100M container-scale run, high-cardinality
 corpus, SWAR parse, mmap) are priced and parked in README §1/§5.3.
+
+#### Addendum-13 status update (2026-07-02, t4 — lane G, operator follow-up)
+
+Operator: "compare morton and the kanban vs without — if 64k concurrent
+SoA vs Morton tile can help us understand the pros and cons of our
+architecture when using kanban update." Lane G SHIPPED (feature
+`lane-g`): the lane-F Morton-tile 64K SoA as OWNED state behind shard
+mailbox actors — prefix-routed morsel casts (64K rows, #227's morsel
+size, clear-by-undo extraction), every applied batch witnessed with a
+KanbanMove, journal==casts asserted. ndarray checkout rebased onto
+master (#227 merged — its Morton scatter/morsel probe is the sibling
+reference). t4 medians: F 79.5 / G(1 shard) 43.0 / G(4) 39.9 /
+G(16) 36.0 (one thrash collapse 11.7) / G(workers=3) strictly worse.
+**Ledger: kanban update = 0.54× at morsel granularity, and the tax is
+all boundary (corpus copy + oversubscription + messaging), not the
+witness (lane E: journal ~free). It buys live bounded-staleness state,
+witnessed replayable writes, single-writer safety, bounded worker
+memory. Do NOT shard ownership below contention — at ~400 groups one
+mailbox absorbs everything; shards scale with owner WORK, never with
+rows; the Morton prefix route itself is free (G(4)≈G(1)).** Tables +
+full readings: crates/onebrc-probe/README.md §5.4. Board follow-up
+appended to E-1BRC-ADDRESSING-1 thread as E-1BRC-KANBAN-UPDATE-1.
+
+#### Addendum-13 status update (2026-07-02, t4a — topology corrected, curve completed)
+
+Operator correction ratified: **one ractor mailbox per SoA** (canon).
+Lane G reworked — each owner's State is its OWN `OwnerSoa` sized to its
+tile span (no full-64K tables per owner, no "sharded one SoA" framing);
+flush grouping made sort-based (no dense per-shard vecs at 64K owners);
+parity test extended to 4096 mailboxes. Full ownership-granularity curve
+@4 workers, medians: G(1) 43.4 / G(16) 30.3 / G(256) 35.9 / G(4096) 18.3
+/ **G(65536, one mailbox per tile) 2.1 — a 20× collapse** (spawn ×64K +
+cast fragmentation ~150→~63K + 64K tasks on 4 cores). **Ruling: the
+ownership plateau spans 1–256 owners; Morton tile GROUPING is what makes
+mailbox-as-owner viable — mailbox = OWNER boundary, tile = ADDRESS
+boundary, never conflate 1:1 under load.** README §5.4a; board
+E-1BRC-KANBAN-UPDATE-1 correction appended as E-1BRC-OWNER-GRANULARITY-1.
+
+#### Addendum-13 status update (2026-07-02, t5 — orchestration sweet spot, operator follow-up)
+
+Operator: "the 65536 mailboxes had no Orchestration at all — find the
+sweet spot with rs-graph-llm or lance-graph-planner + kanban update."
+Lane H SHIPPED (feature `lane-h`): router tier with LAZY owner
+activation (live mailboxes track occupancy ~413, never the 64K address
+space) + AHEAD-FIRING batched delivery (batch_k=64) over lane G's
+unchanged one-mailbox-per-SoA substrate; witness discipline preserved
+(owner journals == router casts asserted). graph-flow stays the OUTER
+loop (task-granularity cursor; burn-submodule 403 blocks in-container
+builds anyway) — the in-loop mechanisms are the planner/kanban-executor
+domain's own. t5 medians @4 cores: H(16) 42.2 / H(256) 36.8 / H(4096)
+40.2 / **H(65536) 39.4 vs flat 1.7 same-session — 23× recovery, within
+~9% of G(1)=43.2; F=81.7.** RULING: orchestration FLATTENS the
+granularity curve — the sweet spot is not a shard count, it is the
+orchestration tier itself; fine-grained mailbox-as-owner is viable iff
+producers never address owners directly (the ahead-firing batch-writer
+is load-bearing, not an optimization; flat fan-out = the measured 20×
+anti-pattern). README §5.5; board E-1BRC-ORCHESTRATION-SWEETSPOT-1.
+
+#### Addendum-13 status update (2026-07-02, t6 — lane I, operator batch-pipeline spec)
+
+Operator spec implemented verbatim as lane I (feature `lane-i`): all
+65536 mailboxes UPFRONT (standing ownership registry; spawn measured
+separately: 1.1–2.7 s, 17–40 µs/actor); two fixed aligned indices
+(mailbox idx == SoA row idx — ownership guarantee is the
+`row_owner[i]==i` binding + write-on-behalf, never a message path);
+codebook-minted identity → direct CAM addressing (no probe in the hot
+loop; worker-local memo, ~400 global mint locks total); whole-table
+DOUBLE-CASTS (one Arc per 64K-row batch to BOTH the ownership-guarantee
+sink and the Lance row-address sink — 312 messages total vs 63K flat /
+2.6K orchestrated); flush cache interleaving flush and refill (peak 2–3
+tables/worker, worker never waits). Both ends journal every batch
+(ownership==lance==156 asserted) + one DatasetVersion tick per batch —
+the full double-WAL the W1b batch writer needs. t6: steady state ~20–22
+Mrows/s (≈½ of G(1) 43 — residency-footprint attribution CONJECTURE);
+total incl. spawn 3.2–6.1. RULING: the batch pipeline wins the
+messaging war outright (messages ∝ batches, independent of occupancy
+AND address space); the standing 64K registry is affordable
+infrastructure; the remaining surface is residency, not architecture.
+README §5.6; board E-1BRC-BATCH-PIPELINE-1.
+
+#### Addendum-13 status update (2026-07-02, t7 — lane J knob matrix, PROBE ARC COMPLETE)
+
+Lane J (feature `lane-j`) parameterizes lane I with the operator's four
+follow-up questions as knobs: grid (4096 gridlake vs 65536), sink lanes
+(1/8/64), registry (on/off). t7 @4 cores, same-session refs G(1)=46.3 /
+H=40.5 / F=70.1: **J(gridlake 4096, 1 lane, no registry) = 46.2–46.3 —
+the measured sweet spot: equals the best streamed topology while
+carrying the double-WAL.** Registry ON halves steady state net of spawn
+(t6 residency CONJECTURE → FINDING); grid 65536 → 40 (L2-busting
+table+memo); lanes 1≈8, 64 over-lanes (apply work is O(dirty) —
+lanes scale with APPLY work, never data). The composed recipe: 64×64
+gridlake batch SoA + codebook CAM + 1–8 lane pairs + whole-table
+double-cast + flush cache; ownership as the index-aligned guarantee
+table, NOT a standing per-cell actor registry; BF16 planes per ndarray
+#227's proven VDPBF16PS tier as the tile-GEMM continuation. README
+§5.7; board E-1BRC-GRIDLAKE-SWEETSPOT-1.
+
+#### Addendum-13 status update (2026-07-02, consolidation — findings/commentary split, 8 presets, simd_ops wiring)
+
+Operator-requested consolidation SHIPPED: (1) `crates/onebrc-probe/FINDINGS.md`
+— the AGNOSTIC record (environment, methods, all t0–t7 tables, all 11
+invariants WITH their code, reproduction commands; zero interpretation);
+(2) `crates/onebrc-probe/COMMENTARY.md` — this session's prime stored
+SEPARATELY (readings, rulings executed, composed recipe, flagged
+uncertainty, suggested lab sweeps) so another session can analyze the
+findings from its own angle; (3) `src/presets.rs` (feature `presets`) —
+the 8 batching methods frozen as named presets (map-private-merge /
+grid-private-merge / stream-single-owner / orchestrated-lazy-owners /
+batch-64k-registry / gridlake / gridlake-8-lanes / batch-64k-no-registry)
+sharing one signature + one parity harness (`all_presets_agree_with_lane_a`
+— every preset byte-identical to lane A); (4) honest answer to the simd
+question: lane B had used ONLY `U8x32::cmpeq_mask`; NOW also routes the
+stride walk through `ndarray::simd::array_chunks` (simd_ops.rs, the
+non-overlapping walker; `array_windows` is the overlapping GEMM sibling,
+deliberately unused); `simd_soa.rs::SoaBytes` remains an OPEN follow-up
+(natural carrier for vectorized sink sweeps + batch tables). Note: probe
+target/debug purged mid-round (disk full at 100%); gates re-run green.
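
Reviewer sketch (not part of the patch): the two lane H mechanisms from the t5 update above, lazy owner activation and ahead-firing batched delivery, can be illustrated with a single-threaded toy. This is a sketch under stated assumptions rather than the lane H implementation: `Router`, `Owner` and `simulate` are hypothetical names, a `HashMap` stands in for the actor registry, and a method call stands in for a cast. It shows that live owners track occupancy rather than the address space, that casts are counted per batch rather than per entry, and that owner journals equal router casts.

```rust
use std::collections::HashMap;

/// Hypothetical owner state: a running sum plus a journal of applied batch sizes.
#[derive(Default)]
struct Owner {
    sum: i64,
    journal: Vec<usize>,
}

struct Router {
    batch_k: usize,
    owners: HashMap<u32, Owner>,     // populated lazily, on first delivered traffic
    pending: HashMap<u32, Vec<i64>>, // per-owner buffers held by the router
    casts: usize,
}

impl Router {
    fn new(batch_k: usize) -> Self {
        Router { batch_k, owners: HashMap::new(), pending: HashMap::new(), casts: 0 }
    }

    /// Producers call the router; they never address an owner directly.
    fn route(&mut self, owner_id: u32, value: i64) {
        let buf = self.pending.entry(owner_id).or_default();
        buf.push(value);
        if buf.len() >= self.batch_k {
            // Ahead-firing: a full buffer is delivered as one batched apply.
            let batch = std::mem::take(buf);
            self.apply(owner_id, batch);
        }
    }

    /// Deliver whatever is still buffered at end of input.
    fn flush(&mut self) {
        for (owner_id, batch) in std::mem::take(&mut self.pending) {
            if !batch.is_empty() {
                self.apply(owner_id, batch);
            }
        }
    }

    fn apply(&mut self, owner_id: u32, batch: Vec<i64>) {
        // Lazy activation: the owner comes into existence on its first batch.
        let owner = self.owners.entry(owner_id).or_default();
        owner.sum += batch.iter().sum::<i64>();
        owner.journal.push(batch.len());
        self.casts += 1;
    }
}

/// Returns (live owners, router casts, journal entries, total applied sum).
fn simulate(entries: usize, occupancy: u32, batch_k: usize) -> (usize, usize, usize, i64) {
    let mut router = Router::new(batch_k);
    for i in 0..entries {
        // Spread the occupied ids thinly across a 64K address space.
        let owner_id = (i as u32 % occupancy) * (65536 / occupancy);
        router.route(owner_id, 1);
    }
    router.flush();
    let journaled = router.owners.values().map(|o| o.journal.len()).sum();
    let total = router.owners.values().map(|o| o.sum).sum();
    (router.owners.len(), router.casts, journaled, total)
}

fn main() {
    let (live, casts, journaled, total) = simulate(100_000, 413, 64);
    assert_eq!(live, 413);        // owners track occupancy, not 65536
    assert_eq!(casts, journaled); // witness discipline: journals == casts
    assert_eq!(total, 100_000);   // no entry lost in buffering
    println!("live={live} casts={casts}");
}
```

Setting `batch_k` to 1 makes every entry its own cast, which approximates the flat fan-out that the t4a measurements identify as the anti-pattern.
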