fix(onebrc lane B): dispatch SIMD width (U8x64 zmm / U8x32 ymm) instead of hardcoding 32 by AdaWorldAPI · Pull Request #636 · AdaWorldAPI/lance-graph

AdaWorldAPI · 2026-07-02T20:21:30Z

Follow-up on merged #635. The lane-B delimiter scan hardcoded array_chunks::<u8, 32> + U8x32 throughout, pinning the walk to 32-byte ymm (AVX2) regardless of target-cpu — so under target-cpu=x86-64-v4/native it never strided the 64-byte zmm the AVX-512 build provides. (The probe's .cargo/config.toml v3 pin stays — it's a deliberate CI-parity choice; this only makes lane B honor native/v4 when a run opts in.)

Change

SimdByte = compile-time width alias: U8x64 under cfg(target_feature = "avx512f"), U8x32 otherwise. Both are ndarray::simd types (the "all SIMD from ndarray::simd" iron rule — no raw intrinsic). cmpeq_mask returns u64/u32 respectively; the set-bit walk was already generic over the mask width, so the body is unchanged apart from the alias.
array_chunks::<u8, { SimdByte::LANES }> — the const-generic tracks the dispatched width; aligned_end, pos, needles, and from_slice all key off SimdByte::LANES. No literal stride remains.
Module + fn docs rewritten to describe the dispatch (64-byte zmm avx512 / 32-byte ymm avx2) instead of asserting a fixed 32.
Test ..._straddle_32_byte_block_boundaries → ..._straddle_block_boundaries, now asserting the crossing at the dispatched lane_b::SIMD_LANES (test-gated const) instead of / 32. The 68-byte corpus straddles a boundary at both widths (long_name @32, Vv @64), so cross-block-carry coverage holds either way.
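A minimal sketch of the dispatch pattern described above, assuming a scalar stand-in for the vector type. `ByteBlock`, `find_all`, and `find_all_dispatched` are hypothetical names for illustration. The PR itself aliases `ndarray::simd::U8x64` / `U8x32`, whose exact API is not reproduced here. The point is only that a mask-based set-bit walk can be written once and reused at either width:

```rust
/// Hypothetical scalar model of an N-lane byte vector (no intrinsics).
/// It mimics the shape the PR relies on: LANES, from_slice, cmpeq_mask.
pub struct ByteBlock<const N: usize>([u8; N]);

impl<const N: usize> ByteBlock<N> {
    pub const LANES: usize = N;

    /// Copies the first N bytes of `s` into the block.
    pub fn from_slice(s: &[u8]) -> Self {
        let mut a = [0u8; N];
        a.copy_from_slice(&s[..N]);
        ByteBlock(a)
    }

    /// Bit i is set when lane i equals `needle`. A u64 holds up to 64 lanes,
    /// so this model assumes N <= 64.
    pub fn cmpeq_mask(&self, needle: u8) -> u64 {
        let mut m = 0u64;
        for (i, &b) in self.0.iter().enumerate() {
            if b == needle {
                m |= 1u64 << i;
            }
        }
        m
    }
}

// Compile-time width alias, keyed on the same cfg the PR description names.
#[cfg(target_feature = "avx512f")]
pub type SimdByte = ByteBlock<64>;
#[cfg(not(target_feature = "avx512f"))]
pub type SimdByte = ByteBlock<32>;

/// Returns every offset of `needle` in `hay`, walking N bytes per block.
pub fn find_all<const N: usize>(hay: &[u8], needle: u8) -> Vec<usize> {
    let mut out = Vec::new();
    let aligned_end = hay.len() - hay.len() % N;
    let mut pos = 0;
    while pos < aligned_end {
        let block = ByteBlock::<N>::from_slice(&hay[pos..pos + N]);
        let mut mask = block.cmpeq_mask(needle);
        // Set-bit walk: identical for any mask width.
        while mask != 0 {
            out.push(pos + mask.trailing_zeros() as usize);
            mask &= mask - 1; // clear the lowest set bit
        }
        pos += N;
    }
    // Scalar tail for the bytes past the last full block.
    for (i, &b) in hay[aligned_end..].iter().enumerate() {
        if b == needle {
            out.push(aligned_end + i);
        }
    }
    out
}

/// Same scan at whatever width the build selected; no literal stride.
pub fn find_all_dispatched(hay: &[u8], needle: u8) -> Vec<usize> {
    find_all::<{ SimdByte::LANES }>(hay, needle)
}
```

Because the stride comes from `SimdByte::LANES`, results should be identical at both widths; only the number of bytes consumed per iteration changes.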

Verification

Both arms, from the crate dir (onebrc-probe builds standalone):

v3 default (U8x32, 32-byte ymm): 16/16 lane-b tests byte-parity with lane A; clippy -D warnings clean (lib + all-targets); fmt clean.
RUSTFLAGS=-Ctarget-cpu=native (U8x64, 64-byte zmm on an avx512f host): 16/16; clippy clean (all-targets).

README/FINDINGS narrative on the v3-pin correction is intentionally left to the parallel session's §5.5 to avoid clobbering its in-flight edits.

🤖 Generated with Claude Code

https://claude.ai/code/session_01MLBnPuScZy6w9di2QEjsXM

… stride

The scan hardcoded `array_chunks::<u8, 32>` + `U8x32` throughout, pinning the delimiter walk to 32-byte `ymm` (AVX2) regardless of target-cpu, so under `target-cpu=x86-64-v4`/`native` it strided `ymm`, never the 64-byte `zmm` the AVX-512 build provides. (The probe's `.cargo/config.toml` v3 pin is a deliberate CI-parity choice; this is about honoring native/v4 when a run opts into it: "here v4 or native is a must".)

- `SimdByte` = compile-time width alias: `U8x64` under `cfg(target_feature = "avx512f")`, `U8x32` otherwise. Both are `ndarray::simd` types (iron rule; no raw intrinsic). `cmpeq_mask` returns `u64`/`u32` respectively; the set-bit walk was already generic over the mask width, so the body is unchanged apart from the alias.
- `array_chunks::<u8, { SimdByte::LANES }>`: the const-generic tracks the dispatched width; `aligned_end`, `pos`, needles, and `from_slice` all key off `SimdByte::LANES`. No literal stride remains.
- Module + fn docs rewritten to describe the dispatch (64-byte zmm avx512 / 32-byte ymm avx2) instead of asserting a fixed 32.
- Test `..._straddle_32_byte_block_boundaries` → `..._straddle_block_boundaries`, now asserts crossing at the dispatched `lane_b::SIMD_LANES` (test-gated const) instead of a literal `/ 32`; the 68-byte corpus straddles a boundary at BOTH widths (`long_name` @32, `Vv` @64), so coverage holds either way.

Verified both arms: v3 default (U8x32, 32B) and `RUSTFLAGS=-Ctarget-cpu=native` (U8x64, 64B zmm on this avx512f host). 16/16 lane-b tests byte-parity with lane A, clippy `-D warnings` clean (lib + all-targets) on both, fmt clean.

README/FINDINGS narrative on the v3-pin correction is deferred to the parallel session's §5.5 to avoid clobbering its in-flight README edits.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_01MLBnPuScZy6w9di2QEjsXM

coderabbitai · 2026-07-02T20:21:38Z

Warning

Review limit reached

@AdaWorldAPI, you've reached your PR review limit, so we couldn't start this review.

Review details

⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro Plus

Run ID: a0fdf7b9-53f8-4036-89f4-5c3c78b048f0

📥 Commits

Reviewing files that changed from the base of the PR and between e1279cf and 45623c2.

📒 Files selected for processing (2)

crates/onebrc-probe/src/lane_b.rs
crates/onebrc-probe/src/lib.rs

chatgpt-codex-connector

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 45623c2993

chatgpt-codex-connector · 2026-07-02T20:26:14Z

+#[cfg(target_feature = "avx512f")]
+use ndarray::simd::U8x64 as SimdByte;


Require AVX512BW before selecting U8x64

On targets that advertise avx512f without avx512bw (for example -Ctarget-cpu=knl or a manual -Ctarget-feature=+avx512f build), this alias selects U8x64, and the scan later calls byte cmpeq_mask. In the ndarray fork that method is implemented with the AVX-512 byte-compare intrinsic, which needs AVX512BW, so Lane B can execute an unsupported instruction instead of falling back to the 32-byte path. Please gate the 64-byte alias on both avx512f and avx512bw.
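One way the suggested gate could look, sketched with a plain lane-count constant instead of the `ndarray::simd` alias so that it compiles on its own. `SIMD_LANES` here and the helper `expected_lanes` are illustrative; in the actual crate the same `cfg(all(...))` predicate would presumably sit on the `use ndarray::simd::U8x64 as SimdByte;` line and its `not(...)` counterpart:

```rust
/// 64-byte path only when BOTH features are enabled at compile time.
/// AVX-512 byte compares need AVX512BW, which avx512f alone does not imply.
#[cfg(all(target_feature = "avx512f", target_feature = "avx512bw"))]
pub const SIMD_LANES: usize = 64;

/// Everything else, including avx512f-without-bw targets, takes the 32-byte path.
#[cfg(not(all(target_feature = "avx512f", target_feature = "avx512bw")))]
pub const SIMD_LANES: usize = 32;

/// Hypothetical helper: recomputes the expected width from the same predicate,
/// handy for asserting that the two cfg arms stay in sync.
pub fn expected_lanes() -> usize {
    if cfg!(all(target_feature = "avx512f", target_feature = "avx512bw")) {
        64
    } else {
        32
    }
}
```

With this predicate, a `-Ctarget-feature=+avx512f` build without `+avx512bw` falls back to 32 lanes instead of selecting a byte compare the target cannot execute.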


AdaWorldAPI merged commit e4bea83 into main Jul 2, 2026
5 checks passed

AdaWorldAPI merged 1 commit into main from claude/v3-substrate-migration-review-o0yoxv
