Skip to content

probe(1brc): lanes G–J — kanban-update arc: ownership, orchestration, batch pipeline, and the 64×64 gridlake sweet spot#635

Merged
AdaWorldAPI merged 7 commits into
mainfrom
claude/v3-substrate-migration-review-o0yoxv
Jul 2, 2026
Merged

probe(1brc): lanes G–J — kanban-update arc: ownership, orchestration, batch pipeline, and the 64×64 gridlake sweet spot#635
AdaWorldAPI merged 7 commits into
mainfrom
claude/v3-substrate-migration-review-o0yoxv

Conversation

@AdaWorldAPI

@AdaWorldAPI AdaWorldAPI commented Jul 2, 2026

Copy link
Copy Markdown
Owner

The full operator arc on the Addendum-13 probe, five follow-ups: kanban vs without (G) → one mailbox per SoA (topology corrected) → orchestration sweet spot (H) → the batch-pipeline spec: 64K upfront + aligned indices + codebook CAM + whole-table double-cast + flush cache (I) → the knob matrix: grid × lanes × registry (J). ndarray rebased onto master (#227 — sibling Morton/morsel probe; its VDPBF16PS tier is the BF16 continuation).

The final ledger (same recipe corpus, 4 cores; same-session refs G(1)=46.3, H=40.5, F=70.1)

Topology Mrows/s messages witness
G flat 64K owners 1.7–2.1 ~63K ownership WAL
H orchestrated (lazy + ahead-firing) 39.4–40.5 ~2.6K ownership WAL
I batch pipeline, 64K registry + 64K grid ~20 steady 312 double-WAL + version ticks
J batch pipeline, 64×64 gridlake, 1 lane, no registry 46.2–46.3 312 double-WAL + version ticks

The operator's questions, answered by knobs (README §5.7)

  1. "With Orchestrator it was 39.4?" — t6's ~20 decomposes into two now-ISOLATED costs: the standing 64K actor registry (knob: ON halves steady state net of spawn — t6's residency CONJECTURE promoted to FINDING) and the L2-busting 64K-cell table+memo working set (~2.5 MB/worker). Remove both → the batch pipeline beats the orchestrator.
  2. "Different cache?" — yes: matching the batch SoA to the 64×64 gridlake tile (4096 cells ≈ 80 KB integer-exact; the literal 4096×BF16=16 KB pair is ndarray epiphanies: E-ORIG-1..7 + E-MEMB-1..13 + AGI Grundlagenforschung synthesis #227's proven VDPBF16PS tier) recovers the loss completely (4096 → 46 vs 65536 → 40).
  3. "8 or 64 lanes?" — 1 lane suffices, 8 is free headroom, 64 over-lanes. Sink-lane count scales with per-batch APPLY work, never with data or address space.
  4. "Orchestration with 400 the sweet spot?" — no: the gridlake batch pipeline (46.2, double witness, 312 msgs) beats lazy-owner orchestration (40.5, single witness, ~2.6K msgs). H's mechanisms stay right only when fine-grained ownership must live as actors; when ownership can be the index-aligned guarantee table, the batch pipeline dominates.
  5. "64×64 gridlake SoA?"yes: the measured sweet spot. Equals the best streamed topology while carrying the double-WAL; the kanban-update tax at gridlake scale is ~0 vs the best owned topology.

The composed recipe: 64×64 gridlake batch SoA + codebook CAM addressing + 1–8 sink lane pairs + whole-table Arc double-cast + flush cache; ownership as the index-aligned guarantee table (no standing per-cell actor registry); BF16 planes per #227's tier when tile-GEMM lands.

Gates

All-features test matrix green (incl. lane J 4-corner knob parity, lane I recycled-flush-cache parity, codebook idempotence, double-cast completeness asserts); clippy -D warnings exit 0; fmt clean. Board: E-1BRC-KANBAN-UPDATE-1, E-1BRC-OWNER-GRANULARITY-1, E-1BRC-ORCHESTRATION-SWEETSPOT-1, E-1BRC-BATCH-PIPELINE-1, E-1BRC-GRIDLAKE-SWEETSPOT-1; plan Addendum-13 t4–t7; README §5.4–§5.7.

🤖 Generated with Claude Code

https://claude.ai/code/session_01MLBnPuScZy6w9di2QEjsXM

Summary by CodeRabbit

  • New Features

    • Added new benchmark lanes for kanban-style, orchestrated, batch-pipeline, and grid-based processing.
    • Introduced a presets menu of 8 ready-to-run benchmarking configurations.
    • Expanded the CLI to support additional lane choices and shard/owner settings.
  • Documentation

    • Added a clearer reading order plus new findings and commentary notes.
    • Expanded the README with lane descriptions, usage details, and updated measurement results.
  • Tests

    • Added coverage to verify new benchmark modes still match the baseline output.

… merge, tax is all boundary, never shard ownership below contention

Operator follow-up: 'compare morton and the kanban vs without — if 64k
concurrent SoA vs Morton tile can help us understand the pros and cons
of our architecture when using kanban update.' ndarray checkout rebased
onto master (#227 merged — sibling Morton scatter/morsel probe).

Lane G (feature lane-g): lane F's Morton-tile 64K SoA held as OWNED
state by shard mailbox actors (mailbox-as-owner over contiguous Morton
tile ranges). Workers pre-reduce 64K-row morsels in private tables
(identical hot loop to F), extract dirty slots clear-by-undo, cast
entries prefix-routed to the owning shard; every applied batch is
witnessed with a KanbanMove on the owner WAL (journal==casts asserted;
CognitiveWork->Evaluation legal edge, storage_kanban journaling
precedent). Scan workers on the blocking pool so owners never starve.

t4 (recipe corpus re-verified, 4 cores, 3 passes, medians): F 79.5 /
G(1 shard) 43.0 / G(4) 39.9 / G(16) 36.0 with one thrash collapse to
11.7; workers=3 variants strictly worse. Ledger: kanban update = 0.54x
at morsel granularity, the tax is all boundary (corpus copy +
oversubscription + messaging; the witness itself ~free per lane E);
it buys live bounded-staleness state, witnessed replayable writes,
single-writer safety, bounded worker memory. Do NOT shard ownership
below contention — one mailbox absorbs ~400-group apply work; shards
scale with owner WORK, never rows; the Morton prefix route mechanism is
free (G@4 within ~7% of G@1). README §5.4 tables; plan Addendum-13 t4;
board E-1BRC-KANBAN-UPDATE-1.

Gates: 14 std / 16 lane-g / 20 all-features tests green; clippy -D
warnings exit 0; fmt clean.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_01MLBnPuScZy6w9di2QEjsXM
@coderabbitai

coderabbitai Bot commented Jul 2, 2026

Copy link
Copy Markdown

Review Change Stack

📝 Walkthrough

Walkthrough

This PR adds probe lanes G, H, I, and J, wires them behind features and CLI entrypoints, exports preset-based runners, and updates the measurement records and session documentation for the new lane behaviors.

Changes

Probe lanes and supporting docs

Layer / File(s) Summary
Shared SoA helpers exposed
crates/onebrc-probe/src/lane_f.rs
Constants, Morton helpers, SoaTable, observe, and table_to_map are made crate-visible, and observe now returns the resolved slot index.
Lane G kanban update path
crates/onebrc-probe/src/lane_g.rs
Adds the shard-owner write path, Morton-prefix routing, worker flushing, public lane G entrypoints, and routing/tests support.
Lane H orchestration tier
crates/onebrc-probe/src/lane_h.rs
Adds router buffering, lazy owner activation, ahead-firing delivery, public lane H entrypoints, and tests.
Lane I and Lane J batch pipelines
crates/onebrc-probe/src/lane_i.rs, crates/onebrc-probe/src/lane_j.rs
Adds the batch-pipeline codebook/sinks/flush cache for lane I, then the grid-based lane J pipeline with registry and knob-matrix coverage.
Feature flags, exports, and CLI
crates/onebrc-probe/Cargo.toml, crates/onebrc-probe/src/lib.rs, crates/onebrc-probe/src/main.rs
Adds lane features, module declarations, public re-exports, and CLI dispatch/output handling for the new lanes.
Preset catalog and runner
crates/onebrc-probe/src/presets.rs
Defines the preset table, dispatcher, and parity test for the frozen lane configurations.
Measurement records and session docs
crates/onebrc-probe/README.md, crates/onebrc-probe/FINDINGS.md, crates/onebrc-probe/COMMENTARY.md, .claude/board/EPIPHANIES.md, .claude/v3/INTEGRATION-PLAN.md
Updates the probe docs, findings, commentary, epiphany board, and integration plan with lane coverage, measurements, and session notes.

Estimated code review effort: 5 (Critical) | ~120 minutes

Poem

I hopped through lanes both wide and keen,
Through mailboxes and grids agleam.
With presets packed and rows aligned,
The batchy burrow feels designed.
Hooray for hops, and lucky streams! 🐇

🚥 Pre-merge checks | ✅ 5
✅ Passed checks (5 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The title accurately captures the main changes: new G–J lanes, the kanban-update arc, and the 64×64 gridlake sweet spot.
Docstring Coverage ✅ Passed No functions found in the changed files to evaluate docstring coverage. Skipping docstring coverage check.
Linked Issues check ✅ Passed Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check ✅ Passed Check skipped because no linked issues were found for this pull request.
✨ Finishing Touches
📝 Generate docstrings
  • Create stacked PR
  • Commit on current branch

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands.

… full ownership-granularity curve to 64K mailboxes

Operator correction: 'I thought we spawn one ractor mailbox per SoA?' —
ratified. Each owner's actor State is now its OWN OwnerSoa sized to its
tile span (span-clamped capacity, local probe, full-name verification;
panic-on-full, never silent) — one mailbox = one SoA verbatim, no
full-64K tables per owner, no 'sharded one SoA' framing. Flush grouping
sort-based (no dense per-shard vec allocation at 64K owners). Parity
test extended to 4096 mailboxes; shards clamped to [1, 65536].

t4a curve (recipe corpus re-verified, 4 cores, medians of 3): G(1) 43.4
/ G(16) 30.3 / G(256) 35.9 / G(4096) 18.3 / G(65536 = one mailbox per
tile) 2.1 Mrows/s — a 20x collapse at the literal 64K-concurrent-SoAs
end (64K spawns in-run, cast fragmentation ~150 -> ~63K, 64K tasks on 4
cores). Ruling: ownership plateau 1-256 owners, then the cliff; Morton
tile GROUPING is what makes mailbox-as-owner viable — mailbox = OWNER
boundary, tile = ADDRESS boundary, never 1:1 under load. README §5.4a;
plan Addendum-13 t4a; board E-1BRC-OWNER-GRANULARITY-1.

Gates: 16 lane-g / 20 all-features tests green; clippy -D warnings exit
0; fmt clean.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_01MLBnPuScZy6w9di2QEjsXM
@AdaWorldAPI AdaWorldAPI changed the title probe(1brc): lane G — the kanban-update write path measured (Morton-tile SoA as owned mailbox state vs private merge) probe(1brc): lane G — kanban-update write path, one mailbox per SoA, ownership-granularity curve to 64K mailboxes Jul 2, 2026
…tion + ahead-firing batching flatten the ownership curve (23x at the 64K end)

Operator: 'the 65536 mailboxes had no Orchestration at all — check with
rs-graph-llm or lance-graph-planner + kanban update to find the sweet
spot.' Lane H (feature lane-h) interposes the planner/kanban-executor
domain's two in-loop mechanisms over lane G's unchanged
one-mailbox-per-SoA substrate: LAZY activation (router tier spawns an
owner only on first traffic — live mailboxes track occupancy ~413,
never the 64K address space) and AHEAD-FIRING batched delivery (routers
buffer per-owner entries, fire batched Applys at batch_k=64, drain
flushes remainders). Witness discipline preserved: owner journals ==
router casts asserted. graph-flow (rs-graph-llm) stays the OUTER loop
by design: task-granularity persisted cursor (M25); per-morsel it would
put a session save on the hot path, and its in-container build is
blocked by the pre-existing burn-submodule 403 (W3b).

t5 (recipe corpus re-verified, 4 cores, medians of 3): H(16) 42.2 /
H(256) 36.8 / H(4096) 40.2 / H(65536) 39.4 Mrows/s vs same-session
flat G(65536) 1.7 — a 23x recovery, within ~9% of G(1) 43.2 (F 81.7).
Ruling: orchestration FLATTENS the granularity curve; the sweet spot is
not a shard count, it is the orchestration tier itself. Fine-grained
mailbox-as-owner is viable iff producers never address owners directly
— the router/delegation (ahead-firing batch-writer) tier is
load-bearing, not an optimization; flat fan-out to fine owners is the
measured 20x anti-pattern. README §5.5; plan Addendum-13 t5; board
E-1BRC-ORCHESTRATION-SWEETSPOT-1.

Gates: 14 std / 17 lane-h / 21 all-features tests green; clippy -D
warnings exit 0; fmt clean.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_01MLBnPuScZy6w9di2QEjsXM
@AdaWorldAPI AdaWorldAPI changed the title probe(1brc): lane G — kanban-update write path, one mailbox per SoA, ownership-granularity curve to 64K mailboxes probe(1brc): lanes G+H — kanban-update write path, one mailbox per SoA, and the orchestration sweet spot (23× at the 64K end) Jul 2, 2026

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 2

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (2)
crates/onebrc-probe/src/main.rs (2)

135-164: 🎯 Functional Correctness | 🟡 Minor | ⚡ Quick win

Fallback "unknown lane" message doesn't mention new lanes g/h.

Adding the new "g"/"h" match arms leaves the other fallback's expected-lane list stale (Line 161 still lists only a, b, c, d, e, f, r), which will mislead users who mistype a lane or check the error text for valid options.

🩹 Proposed fix
         other => {
-            eprintln!("unknown lane '{other}' (expected 'a', 'b', 'c', 'd', 'e', 'f', or 'r')");
+            eprintln!("unknown lane '{other}' (expected 'a', 'b', 'c', 'd', 'e', 'f', 'g', 'h', or 'r')");
             std::process::exit(2);
         }
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@crates/onebrc-probe/src/main.rs` around lines 135 - 164, The fallback error
message in the `other` match arm is stale and still lists only the old lane
options. Update the unknown-lane message in `main` to include the new valid
lanes `g` and `h` alongside `a` through `f` and `r`, so the `lane` dispatch and
the user-facing validation text stay in sync.

172-184: 🎯 Functional Correctness | 🟡 Minor | ⚡ Quick win

Lane h's owner-granularity parameter isn't printed.

Lane g gets a dedicated print branch showing shards={shards} (176-179), but lane h (which also consumes shards as the nominal owner count) falls through to the generic branch and never prints it. This is the exact sweep parameter used to produce README §5.5's ownership-granularity table, so runs can't be distinguished from their output alone.

🩹 Proposed fix
     if lane == "e" {
         println!(
             "lane={lane} rows={rows} workers={workers} batches={batches} elapsed_ms={elapsed_ms:.3} throughput_mrows_s={throughput_mrows_s:.3}"
         );
-    } else if lane == "g" {
+    } else if lane == "g" || lane == "h" {
         println!(
             "lane={lane} rows={rows} workers={workers} shards={shards} elapsed_ms={elapsed_ms:.3} throughput_mrows_s={throughput_mrows_s:.3}"
         );
     } else {
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@crates/onebrc-probe/src/main.rs` around lines 172 - 184, The lane-specific
summary in main does not print the owner-granularity value for lane h, so runs
using that lane cannot be distinguished from the output. Update the conditional
printing logic in main to give lane h its own branch, similar to the existing
lane g branch, and include shards={shards} in the formatted output while keeping
the other lanes unchanged.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@crates/onebrc-probe/README.md`:
- Around line 96-101: The CLI documentation in README.md is missing lane h and
its usage details. Update the onebrc-probe run syntax and lane selector to
include h, and revise the feature-gate note to mention that lane-g and lane-h
both require their corresponding features and are enforced by main.rs. Also
document that lane h uses the same 4th positional argument semantics as lane g,
so readers can find the correct behavior from the onebrc-probe CLI docs.

In `@crates/onebrc-probe/src/lane_h.rs`:
- Around line 158-172: The routing logic in RouteMsg::Morsel can spawn the same
owner_idx in multiple routers when routers_n and owners_nominal partition Morton
slots differently, inflating live-owner counts. Update the router-to-owner
mapping so each owner_idx is owned by exactly one router, either by snapping
router boundaries to owner boundaries or by routing based directly on owner_idx
instead of morton_slot-derived router partitions. Keep the buffer/fire flow in
the RouteMsg::Morsel path and the setup around routers_n and owners_nominal
consistent with that single-owner ownership model.

---

Outside diff comments:
In `@crates/onebrc-probe/src/main.rs`:
- Around line 135-164: The fallback error message in the `other` match arm is
stale and still lists only the old lane options. Update the unknown-lane message
in `main` to include the new valid lanes `g` and `h` alongside `a` through `f`
and `r`, so the `lane` dispatch and the user-facing validation text stay in
sync.
- Around line 172-184: The lane-specific summary in main does not print the
owner-granularity value for lane h, so runs using that lane cannot be
distinguished from the output. Update the conditional printing logic in main to
give lane h its own branch, similar to the existing lane g branch, and include
shards={shards} in the formatted output while keeping the other lanes unchanged.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro Plus

Run ID: 0bde42de-a647-44c7-bb1c-d5c3128a3672

📥 Commits

Reviewing files that changed from the base of the PR and between a575c7c and bff6c86.

📒 Files selected for processing (9)
  • .claude/board/EPIPHANIES.md
  • .claude/v3/INTEGRATION-PLAN.md
  • crates/onebrc-probe/Cargo.toml
  • crates/onebrc-probe/README.md
  • crates/onebrc-probe/src/lane_f.rs
  • crates/onebrc-probe/src/lane_g.rs
  • crates/onebrc-probe/src/lane_h.rs
  • crates/onebrc-probe/src/lib.rs
  • crates/onebrc-probe/src/main.rs

Comment thread crates/onebrc-probe/README.md Outdated
Comment thread crates/onebrc-probe/src/lane_h.rs
… codebook CAM addressing, whole-table double-casts, flush cache

Operator spec implemented verbatim: (1) all 65536 mailboxes spawned
UPFRONT — standing ownership registry, spawn measured separately
(1.1-2.7 s, 17-40 us/actor); (2) two fixed aligned indices — mailbox
idx == SoA row idx; ownership guarantee = the row_owner[i]==i binding
plus write-on-behalf at the sink, never a message path; (3) codebook
index — identity minted once (Morton place + probe + name verify at
mint only), worker-local memo, hot loop = direct CAM index with no
probe and no compare; (4) whole-table DOUBLE-CASTS — full 65536-row SoA
batch tables frozen into one Arc cast to BOTH the mailbox-ownership-
guarantee sink and the Lance row-address sink (156 batches -> 312
messages total vs ~63K flat / ~2.6K orchestrated; messages track
batches, independent of occupancy and address space); (5) flush cache —
refcount-gated table pool, peak 2-3 tables/worker, flushing and
reindex-next interleave, worker never waits.

Double-WAL asserted: ownership journal == lance journal == 156 batches
+ one DatasetVersion tick per batch — replayable from either end (the
full W1b batch-writer shape). t6: steady state ~20-22 Mrows/s (~1/2 of
G(1) 43 — residency-footprint attribution flagged CONJECTURE); total
incl. one-time spawn 3.2-6.1. Ruling: the batch pipeline wins the
messaging war outright and carries the strongest witness story; the
standing 64K registry is affordable infrastructure; the remaining
surface is residency, not architecture. README §5.6 + lane table; plan
Addendum-13 t6; board E-1BRC-BATCH-PIPELINE-1.

Gates: 14 std / 16 lane-i / 23 all-features tests green (incl. lane I
parity with recycled flush cache + codebook mint idempotence); clippy
-D warnings exit 0; fmt clean.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_01MLBnPuScZy6w9di2QEjsXM
@AdaWorldAPI AdaWorldAPI changed the title probe(1brc): lanes G+H — kanban-update write path, one mailbox per SoA, and the orchestration sweet spot (23× at the 64K end) probe(1brc): lanes G+H+I — kanban-update write path, orchestration sweet spot, and the batch-pipeline spec (64K upfront + double-cast + flush cache) Jul 2, 2026
… spot (46 Mrows/s, equals best streamed topology, double-WAL)

One parameterized lane answers the operator's four follow-ups as knobs:
grid (4096 gridlake vs 65536), sink lanes (1/8/64), registry (on/off),
over lane I's unchanged batch pipeline (codebook CAM, whole-table Arc
double-cast, flush cache; per-batch witness on every lane asserted).

t7 (recipe corpus, 4 cores; same-session refs G(1)=46.3 / H=40.5 /
F=70.1): J(4096, 1 lane, no registry) = 46.2-46.3 — the measured sweet
spot. Registry ON halves steady state net of spawn (t6 residency
CONJECTURE -> FINDING). Grid 65536 -> 40 (L2-busting table+memo
working set ~2.5 MB/worker). Lanes: 1 suffices, 8 free, 64 over-lanes
— lane count scales with per-batch APPLY work, never data. Composed
recipe: 64x64 gridlake batch SoA + codebook CAM + 1-8 lane pairs +
whole-table double-cast + flush cache; ownership as the index-aligned
guarantee table (no standing per-cell actor registry); BF16 planes per
ndarray #227's proven VDPBF16PS tier as the tile-GEMM continuation.

Gates: 25 all-features tests green (incl. lane J 4-corner knob-matrix
parity); clippy -D warnings exit 0; fmt clean. README §5.7; plan
Addendum-13 t7; board E-1BRC-GRIDLAKE-SWEETSPOT-1.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_01MLBnPuScZy6w9di2QEjsXM
@AdaWorldAPI AdaWorldAPI changed the title probe(1brc): lanes G+H+I — kanban-update write path, orchestration sweet spot, and the batch-pipeline spec (64K upfront + double-cast + flush cache) probe(1brc): lanes G–J — kanban-update arc: ownership, orchestration, batch pipeline, and the 64×64 gridlake sweet spot Jul 2, 2026
claude added 2 commits July 2, 2026 19:12
…rray_chunks wiring

Consolidation per operator request: (1) FINDINGS.md — the agnostic
record: environment, methods, all t0-t7 measurement tables, and all 11
asserted invariants WITH their code (merge algebra, cross-method parity,
corpus reproducibility, witness completeness single- and double-sink,
aligned-index ownership guarantee, codebook mint idempotence,
flush-cache refcount exclusivity, bounded-probe panic, lifecycle
legality, SIMD provenance); no interpretation. (2) COMMENTARY.md — this
session's prime stored separately (readings, composed recipe, flagged
uncertainty, suggested lab sweeps) so another session can read the
findings from its own angle. (3) src/presets.rs (feature presets) — the
8 batching methods frozen as named reproducible presets with one shared
signature and one parity harness: all_presets_agree_with_lane_a asserts
every preset byte-identical to lane A, so lab sweeps inherit
correctness for free. (4) lane B stride walk now routes through
ndarray::simd::array_chunks (simd_ops.rs non-overlapping walker;
array_windows is the overlapping GEMM-style sibling, deliberately
unused for delimiter scanning); simd_soa.rs SoaBytes noted as the open
follow-up for vectorized sink sweeps.

Gates: 14 std / 16 lane-b / 21 presets tests green; clippy -D warnings
exit 0; fmt clean. (Probe target/debug purged mid-round — container
disk hit 100%; all gates re-run clean after.)

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_01MLBnPuScZy6w9di2QEjsXM
CodeRabbit #635 r3515365083 (open, Functional Correctness / Minor):
lane H partitioned the 65536 Morton-slot space TWICE independently —
`routers_n` (worker grouping) and `owners_nominal` (router-internal
owner_idx). When the two grids don't nest (e.g. 3 routers vs 16 owners:
21845-wide router ranges vs 4096-wide owner ranges), one owner_idx range
straddled a router boundary and was lazily spawned as a separate
ShardOwner in two routers — duplicating MailboxIds and inflating the
live-owner count that lane H exists to measure ("activates only
~occupancy owners"). Final aggregates were already correct (each station
lands in one router→one owner, disjoint maps merged); the defect was
measurement integrity.

Fix (CodeRabbit option 2 — route owners→routers via owner_idx):
- `worker_scan_grouped` now takes a `group_of: impl Fn(u64) -> usize`
  closure instead of the hardwired `slot * groups >> 16`, so the caller
  chooses the bucket mapping.
- lane H adds `owner_of(h, owners_nominal)` (the fine, stable global tile
  grain) and `router_of_owner(owner_idx, owners_nominal, routers_n)`
  (coarse grain derived FROM owner_idx by integer division). The worker
  routes each entry to `router_of_owner(owner_of(h))`, a pure function of
  owner_idx, so every owner_idx maps to exactly one router by
  construction. The router computes the same `owner_of` — identical
  formula, single source.
- `RouteMsg::Drain` now also returns each router's activated owner_idxs;
  the coordinator asserts their union is duplicate-free
  (`unique.len() == all.len()`), so a straddle regression fails the run.

The existing lane_h test already exercises the misaligned 16-owner /
3-router case; its doc now names it as the straddle guard. All 21
onebrc-probe tests pass (incl. presets parity); clippy -D warnings clean;
fmt clean.

Thread 1 (README omits lane h) was already resolved in 60bc1e6.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_01MLBnPuScZy6w9di2QEjsXM
@AdaWorldAPI AdaWorldAPI merged commit e1279cf into main Jul 2, 2026
5 of 6 checks passed

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 2

🧹 Nitpick comments (5)
crates/onebrc-probe/src/lane_i.rs (1)

93-110: 🩺 Stability & Availability | 🔵 Trivial | 💤 Low value

Same unbounded-probe pattern as lane_j.rs::GridCodebook::mint, lower risk here.

Codebook::mint has no capacity guard either, but SLOTS (65536) comfortably exceeds the 1BRC spec's ~10,000-station ceiling, so this is much less likely to hang in practice than lane J's 4096-cell default. Still, the same defensive assert!(self.len < SLOTS, ...) would make this explicit rather than implicit.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@crates/onebrc-probe/src/lane_i.rs` around lines 93 - 110, `Codebook::mint`
can still probe forever when the table fills, even though the risk is lower than
in `lane_j.rs::GridCodebook::mint`. Add a defensive capacity check in `mint`
before entering the probe loop, using the existing `self.len` and `SLOTS`
symbols, so the code explicitly fails once all slots are consumed instead of
relying on the spec’s station limit.
crates/onebrc-probe/src/lane_b.rs (2)

71-74: 📐 Maintainability & Code Quality | 🔵 Trivial | 💤 Low value

Prefer U8x32::LANES over the literal 32 in array_chunks::<u8, 32>.

aligned_end is derived from U8x32::LANES, but the chunk size passed to array_chunks is a separate hardcoded literal. If LANES ever changes, these two would silently diverge.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@crates/onebrc-probe/src/lane_b.rs` around lines 71 - 74, The chunking logic
in lane_b uses a hardcoded 32 in array_chunks::<u8, 32> while aligned_end and
pos already rely on U8x32::LANES, so update the array_chunks call to use the
same U8x32::LANES constant. Keep the chunk size, aligned_end calculation, and
U8x32::from_slice path consistent so the code stays correct if the lane count
changes.

46-46: 🗄️ Data Integrity & Integration | 🔵 Trivial | 💤 Low value

Use U8x32::LANES here instead of the hardcoded 32. The loop already derives its stride from U8x32::LANES, so this keeps the chunk size and offset math aligned.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@crates/onebrc-probe/src/lane_b.rs` at line 46, The loop in lane_b is using a
hardcoded chunk size of 32 instead of the SIMD lane count, which can drift from
the existing stride logic. Update the chunking/offset math in the lane_b code to
use U8x32::LANES consistently, matching the stride already derived from that
symbol so the array_chunks-based processing stays aligned.
crates/onebrc-probe/src/lane_j.rs (1)

56-181: 📐 Maintainability & Code Quality | 🔵 Trivial | ⚖️ Poor tradeoff

Significant duplication with lane_i.rs's Codebook/SlotMemo/BatchTable/FlushCache.

GridCodebook, GridMemo, GridBatch, and GridFlushCache are near-identical to their lane_i counterparts, generalized only by a runtime grid size. Given lane J is documented as "lane I's shape with knobs," lane I's fixed-SLOTS types could be expressed as lane J with grid = SLOTS, removing ~150 lines of parallel logic to maintain.

Also applies to: 361-398

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@crates/onebrc-probe/src/lane_j.rs` around lines 56 - 181, `GridCodebook`,
`GridMemo`, `GridBatch`, and `GridFlushCache` are duplicating the same
slot-memo/batch logic already present in lane I, differing mainly by a
configurable grid size. Refactor the lane J implementation to reuse the shared
abstractions instead of maintaining parallel copies: make the common
codebook/memo/batch/cache logic parameterized by grid and have lane I
instantiate it with `SLOTS`, keeping the lane J-specific behavior only where it
truly differs. Preserve the existing behavior of `mint`, `resolve`, `observe`,
and `reset`, but consolidate the near-identical implementations into a single
path to reduce maintenance overhead.
crates/onebrc-probe/Cargo.toml (1)

61-61: 📐 Maintainability & Code Quality | 🔵 Trivial | ⚡ Quick win

Consider pinning the ractor git dependency to a rev/tag.

The git source has no rev/tag/branch, so it floats on the fork's default branch — a rebuild months later could silently pick up breaking changes.

♻️ Suggested pin
-ractor = { git = "https://github.com/AdaWorldAPI/ractor", optional = true, default-features = false, features = ["tokio_runtime"] }
+ractor = { git = "https://github.com/AdaWorldAPI/ractor", rev = "<pin-me>", optional = true, default-features = false, features = ["tokio_runtime"] }
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@crates/onebrc-probe/Cargo.toml` at line 61, The ractor git dependency is
floating on the fork’s default branch because it has no pinned source, which can
cause rebuilds to silently pick up different code. Update the ractor entry in
Cargo.toml to pin it to a specific rev, tag, or branch so the dependency is
reproducible and stable over time.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In @.claude/v3/INTEGRATION-PLAN.md:
- Line 688: The Markdown in INTEGRATION-PLAN.md is triggering MD018 because a
wrapped line starts with an inline `#227-style` reference that looks like a
heading marker. Update the affected prose so the line no longer begins with a
bare # reference, or reflow the surrounding text in the section containing the
`#227` reference to keep the hash from appearing at the start of a line. Use the
nearby markdown content in the integration plan to locate and adjust the wrapped
sentence.

In `@crates/onebrc-probe/src/lane_j.rs`:
- Around line 76-91: GridCodebook::mint can loop forever when the table is full
and no empty slot remains; update the probing logic in mint to detect exhaustion
instead of continuing indefinitely. Use the existing GridCodebook state (for
example len, mask, used, tags, names) to cap the probe count or check fullness
before/while probing, and return a failure result or otherwise fail fast once
every slot has been examined. Keep the existing matching behavior for identical
(h, name) entries, but ensure mint always terminates even when the number of
unique station identities exceeds the grid size.

---

Nitpick comments:
In `@crates/onebrc-probe/Cargo.toml`:
- Line 61: The ractor git dependency is floating on the fork’s default branch
because it has no pinned source, which can cause rebuilds to silently pick up
different code. Update the ractor entry in Cargo.toml to pin it to a specific
rev, tag, or branch so the dependency is reproducible and stable over time.

In `@crates/onebrc-probe/src/lane_b.rs`:
- Around line 71-74: The chunking logic in lane_b uses a hardcoded 32 in
array_chunks::<u8, 32> while aligned_end and pos already rely on U8x32::LANES,
so update the array_chunks call to use the same U8x32::LANES constant. Keep the
chunk size, aligned_end calculation, and U8x32::from_slice path consistent so
the code stays correct if the lane count changes.
- Line 46: The loop in lane_b is using a hardcoded chunk size of 32 instead of
the SIMD lane count, which can drift from the existing stride logic. Update the
chunking/offset math in the lane_b code to use U8x32::LANES consistently,
matching the stride already derived from that symbol so the array_chunks-based
processing stays aligned.

In `@crates/onebrc-probe/src/lane_i.rs`:
- Around line 93-110: `Codebook::mint` can still probe forever when the table
fills, even though the risk is lower than in `lane_j.rs::GridCodebook::mint`.
Add a defensive capacity check in `mint` before entering the probe loop, using
the existing `self.len` and `SLOTS` symbols, so the code explicitly fails once
all slots are consumed instead of relying on the spec’s station limit.

In `@crates/onebrc-probe/src/lane_j.rs`:
- Around line 56-181: `GridCodebook`, `GridMemo`, `GridBatch`, and
`GridFlushCache` are duplicating the same slot-memo/batch logic already present
in lane I, differing mainly by a configurable grid size. Refactor the lane J
implementation to reuse the shared abstractions instead of maintaining parallel
copies: make the common codebook/memo/batch/cache logic parameterized by grid
and have lane I instantiate it with `SLOTS`, keeping the lane J-specific
behavior only where it truly differs. Preserve the existing behavior of `mint`,
`resolve`, `observe`, and `reset`, but consolidate the near-identical
implementations into a single path to reduce maintenance overhead.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro Plus

Run ID: bfc9f672-ce62-4c29-b1a8-0ea5761cdd59

📥 Commits

Reviewing files that changed from the base of the PR and between bff6c86 and f2411bc.

📒 Files selected for processing (14)
  • .claude/board/EPIPHANIES.md
  • .claude/v3/INTEGRATION-PLAN.md
  • crates/onebrc-probe/COMMENTARY.md
  • crates/onebrc-probe/Cargo.toml
  • crates/onebrc-probe/FINDINGS.md
  • crates/onebrc-probe/README.md
  • crates/onebrc-probe/src/lane_b.rs
  • crates/onebrc-probe/src/lane_g.rs
  • crates/onebrc-probe/src/lane_h.rs
  • crates/onebrc-probe/src/lane_i.rs
  • crates/onebrc-probe/src/lane_j.rs
  • crates/onebrc-probe/src/lib.rs
  • crates/onebrc-probe/src/main.rs
  • crates/onebrc-probe/src/presets.rs
✅ Files skipped from review due to trivial changes (3)
  • crates/onebrc-probe/FINDINGS.md
  • .claude/board/EPIPHANIES.md
  • crates/onebrc-probe/README.md
🚧 Files skipped from review as they are similar to previous changes (2)
  • crates/onebrc-probe/src/main.rs
  • crates/onebrc-probe/src/lane_g.rs

gridlake batch SoA + codebook CAM + 1–8 lane pairs + whole-table
double-cast + flush cache; ownership as the index-aligned guarantee
table, NOT a standing per-cell actor registry; BF16 planes per ndarray
#227's proven VDPBF16PS tier as the tile-GEMM continuation. README

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

📐 Maintainability & Code Quality | 🟡 Minor | ⚡ Quick win

Markdownlint MD018: missing space after #.

Likely triggered by an inline #227-style reference sitting at the start of a wrapped line, which markdown parses as an ATX heading attempt.

🧰 Tools
🪛 markdownlint-cli2 (0.22.1)

[warning] 688-688: No space after hash on atx style heading

(MD018, no-missing-space-atx)

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In @.claude/v3/INTEGRATION-PLAN.md at line 688, The Markdown in
INTEGRATION-PLAN.md is triggering MD018 because a wrapped line starts with an
inline `#227-style` reference that looks like a heading marker. Update the
affected prose so the line no longer begins with a bare # reference, or reflow
the surrounding text in the section containing the `#227` reference to keep the
hash from appearing at the start of a line. Use the nearby markdown content in
the integration plan to locate and adjust the wrapped sentence.

Source: Linters/SAST tools

Comment on lines +76 to +91
fn mint(&mut self, h: u64, name: &[u8]) -> u16 {
let mut s = morton_slot(h) as usize & self.mask;
loop {
if !self.used[s] {
self.used[s] = true;
self.tags[s] = h;
self.names[s] = name.to_vec();
self.len += 1;
return s as u16;
}
if self.tags[s] == h && self.names[s] == name {
return s as u16;
}
s = (s + 1) & self.mask;
}
}

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🩺 Stability & Availability | 🔴 Critical | ⚡ Quick win

GridCodebook::mint can spin forever once the grid fills up.

With grid as low as 4096 (the default "gridlake" preset), any corpus with more unique station identities than grid cells will cause mint to loop indefinitely — no empty slot is ever found and no matching (h, name) exists, so the loop never terminates. The classic 1BRC dataset spec allows up to 10,000 distinct stations, well above the 4096-cell default, so this is reachable with realistic data even though the probe's own synthetic corpora stay under it.

🛡️ Proposed fix: fail fast instead of hanging
     fn mint(&mut self, h: u64, name: &[u8]) -> u16 {
+        assert!(
+            self.len < self.tags.len(),
+            "GridCodebook capacity ({}) exceeded — increase `grid`",
+            self.tags.len()
+        );
         let mut s = morton_slot(h) as usize & self.mask;
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
fn mint(&mut self, h: u64, name: &[u8]) -> u16 {
let mut s = morton_slot(h) as usize & self.mask;
loop {
if !self.used[s] {
self.used[s] = true;
self.tags[s] = h;
self.names[s] = name.to_vec();
self.len += 1;
return s as u16;
}
if self.tags[s] == h && self.names[s] == name {
return s as u16;
}
s = (s + 1) & self.mask;
}
}
fn mint(&mut self, h: u64, name: &[u8]) -> u16 {
assert!(
self.len < self.tags.len(),
"GridCodebook capacity ({}) exceeded — increase `grid`",
self.tags.len()
);
let mut s = morton_slot(h) as usize & self.mask;
loop {
if !self.used[s] {
self.used[s] = true;
self.tags[s] = h;
self.names[s] = name.to_vec();
self.len += 1;
return s as u16;
}
if self.tags[s] == h && self.names[s] == name {
return s as u16;
}
s = (s + 1) & self.mask;
}
}
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@crates/onebrc-probe/src/lane_j.rs` around lines 76 - 91, GridCodebook::mint
can loop forever when the table is full and no empty slot remains; update the
probing logic in mint to detect exhaustion instead of continuing indefinitely.
Use the existing GridCodebook state (for example len, mask, used, tags, names)
to cap the probe count or check fullness before/while probing, and return a
failure result or otherwise fail fast once every slot has been examined. Keep
the existing matching behavior for identical (h, name) entries, but ensure mint
always terminates even when the number of unique station identities exceeds the
grid size.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants