fix(onebrc lane B): dispatch SIMD width (U8x64 zmm / U8x32 ymm) instead of hardcoding 32 #636

Merged
AdaWorldAPI merged 1 commit into main from claude/v3-substrate-migration-review-o0yoxv
Jul 2, 2026
@AdaWorldAPI

Owner

Follow-up on merged #635. The lane-B delimiter scan hardcoded array_chunks::<u8, 32> + U8x32 throughout, pinning the walk to 32-byte ymm (AVX2) regardless of target-cpu — so under target-cpu=x86-64-v4/native it never strided the 64-byte zmm the AVX-512 build provides. (The probe's .cargo/config.toml v3 pin stays — it's a deliberate CI-parity choice; this only makes lane B honor native/v4 when a run opts in.)

Change

  • SimdByte = compile-time width alias: U8x64 under cfg(target_feature = "avx512f"), U8x32 otherwise. Both are ndarray::simd types (the "all SIMD from ndarray::simd" iron rule — no raw intrinsic). cmpeq_mask returns u64/u32 respectively; the set-bit walk was already generic over the mask width, so the body is unchanged apart from the alias.
  • array_chunks::<u8, { SimdByte::LANES }> — the const-generic tracks the dispatched width; aligned_end, pos, needles, and from_slice all key off SimdByte::LANES. No literal stride remains.
  • Module + fn docs rewritten to describe the dispatch (64-byte zmm avx512 / 32-byte ymm avx2) instead of asserting a fixed 32.
  • Test ..._straddle_32_byte_block_boundaries → ..._straddle_block_boundaries, now asserting the crossing at the dispatched lane_b::SIMD_LANES (test-gated const) instead of a literal / 32. The 68-byte corpus straddles a boundary at both widths (long_name @32, Vv @64), so cross-block-carry coverage holds either way.
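The width dispatch described in this list can be sketched without the `ndarray::simd` types. The block below is a scalar stand-in under stated assumptions: `LANES`, `cmpeq_mask`, and `find_delims` are hypothetical names for illustration, the mask is modelled as a plain `u64` rather than the `u64`/`u32` pair the real types return, and stable `chunks_exact` replaces the `array_chunks` call named above. It only illustrates how one body can serve both widths once nothing in it mentions a literal stride.

```rust
// Hypothetical stand-in for the cfg-selected `SimdByte` alias: the lane count
// is fixed at compile time from the enabled target features.
#[cfg(target_feature = "avx512f")]
pub const LANES: usize = 64;
#[cfg(not(target_feature = "avx512f"))]
pub const LANES: usize = 32;

/// Scalar model of a byte-compare mask: bit `i` is set when `block[i] == needle`.
/// A `u64` is wide enough for either the 32-lane or the 64-lane block.
fn cmpeq_mask(block: &[u8], needle: u8) -> u64 {
    debug_assert!(block.len() <= 64);
    block
        .iter()
        .enumerate()
        .fold(0u64, |m, (i, &b)| m | (u64::from(b == needle) << i))
}

/// Returns the offset of every `needle` byte in `data`, walking whole
/// `LANES`-sized blocks first and finishing the remainder byte by byte.
pub fn find_delims(data: &[u8], needle: u8) -> Vec<usize> {
    let mut out = Vec::new();
    let aligned_end = data.len() - data.len() % LANES;

    for (idx, block) in data[..aligned_end].chunks_exact(LANES).enumerate() {
        let base = idx * LANES;
        let mut mask = cmpeq_mask(block, needle);
        // Set-bit walk: the same loop works for any mask width.
        while mask != 0 {
            out.push(base + mask.trailing_zeros() as usize);
            mask &= mask - 1; // clear the lowest set bit
        }
    }

    // Scalar tail for the bytes that do not fill a whole block.
    for (i, &b) in data[aligned_end..].iter().enumerate() {
        if b == needle {
            out.push(aligned_end + i);
        }
    }
    out
}
```

Calling `find_delims(input, b';')` yields the same offsets under either `LANES` value, which is the property the lane-A/lane-B parity tests rely on.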

Verification

Both arms, from the crate dir (onebrc-probe builds standalone):

  • v3 default (U8x32, 32-byte ymm): 16/16 lane-b tests byte-parity with lane A; clippy -D warnings clean (lib + all-targets); fmt clean.
  • RUSTFLAGS=-Ctarget-cpu=native (U8x64, 64-byte zmm on an avx512f host): 16/16; clippy clean (all-targets).

README/FINDINGS narrative on the v3-pin correction is intentionally left to the parallel session's §5.5 to avoid clobbering its in-flight edits.

🤖 Generated with Claude Code

https://claude.ai/code/session_01MLBnPuScZy6w9di2QEjsXM


… stride

The scan hardcoded `array_chunks::<u8, 32>` + `U8x32` throughout, pinning
the delimiter walk to 32-byte `ymm` (AVX2) regardless of target-cpu — so
under `target-cpu=x86-64-v4`/`native` it strided `ymm`, never the 64-byte
`zmm` the AVX-512 build provides. (The probe's `.cargo/config.toml` v3 pin
is a deliberate CI-parity choice; this is about honoring native/v4 when a
run opts into it — "here v4 or native is a must".)

- `SimdByte` = compile-time width alias: `U8x64` under
  `cfg(target_feature = "avx512f")`, `U8x32` otherwise. Both are
  `ndarray::simd` types (iron rule; no raw intrinsic). `cmpeq_mask`
  returns `u64`/`u32` respectively; the set-bit walk was already generic
  over the mask width, so the body is unchanged apart from the alias.
- `array_chunks::<u8, { SimdByte::LANES }>` — the const-generic tracks the
  dispatched width; `aligned_end`, `pos`, needles, and `from_slice` all key
  off `SimdByte::LANES`. No literal stride remains.
- Module + fn docs rewritten to describe the dispatch (64-byte zmm avx512
  / 32-byte ymm avx2) instead of asserting a fixed 32.
- Test `..._straddle_32_byte_block_boundaries` → `..._straddle_block_boundaries`,
  now asserts crossing at the dispatched `lane_b::SIMD_LANES` (test-gated
  const) instead of a literal `/ 32`; the 68-byte corpus straddles a
  boundary at BOTH widths (`long_name` @32, `Vv` @64), so coverage holds
  either way.
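The boundary-straddle claim in the last bullet can be checked mechanically. The sketch below is an assumption-laden illustration, not the crate's test: `field_ranges`, `straddles`, and `has_straddling_field` are hypothetical helpers, and any corpus passed to them is the caller's own, since the actual 68-byte corpus is not shown here.

```rust
/// Splits `data` into half-open byte ranges, one per field, treating `;` and
/// `\n` as delimiters. Empty fields are skipped.
pub fn field_ranges(data: &[u8]) -> Vec<(usize, usize)> {
    let mut out = Vec::new();
    let mut start = 0;
    for (i, &b) in data.iter().enumerate() {
        if b == b';' || b == b'\n' {
            if i > start {
                out.push((start, i));
            }
            start = i + 1;
        }
    }
    if start < data.len() {
        out.push((start, data.len()));
    }
    out
}

/// True when the non-empty range `[start, end)` crosses a multiple of `lanes`,
/// i.e. its first and last bytes fall in different blocks.
pub fn straddles(start: usize, end: usize, lanes: usize) -> bool {
    end > start && start / lanes != (end - 1) / lanes
}

/// True when at least one field of `data` crosses a block boundary at the
/// given lane width.
pub fn has_straddling_field(data: &[u8], lanes: usize) -> bool {
    field_ranges(data)
        .iter()
        .any(|&(s, e)| straddles(s, e, lanes))
}
```

A corpus gives cross-block coverage at both widths when `has_straddling_field(corpus, 32)` and `has_straddling_field(corpus, 64)` both return `true`.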

Verified both arms: v3 default (U8x32, 32B) and `RUSTFLAGS=-Ctarget-cpu=native`
(U8x64, 64B zmm on this avx512f host) — 16/16 lane-b tests byte-parity with
lane A, clippy `-D warnings` clean (lib + all-targets) on both, fmt clean.
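When verifying both arms, it can help to confirm which arm a given build actually selected. The following is a minimal sketch with hypothetical helper names (`compiled_lanes`, `host_has_avx512f`); it uses only `cfg!` and the standard library's runtime feature detection, and is not part of the probe crate.

```rust
/// Lane width chosen at compile time, mirroring the cfg condition quoted in
/// the commit message (`target_feature = "avx512f"`).
pub fn compiled_lanes() -> usize {
    if cfg!(target_feature = "avx512f") {
        64
    } else {
        32
    }
}

/// Runtime check of the host CPU, independent of how the binary was built.
#[cfg(any(target_arch = "x86", target_arch = "x86_64"))]
pub fn host_has_avx512f() -> bool {
    std::arch::is_x86_feature_detected!("avx512f")
}

/// Non-x86 targets never report AVX-512.
#[cfg(not(any(target_arch = "x86", target_arch = "x86_64")))]
pub fn host_has_avx512f() -> bool {
    false
}

/// One-line summary suitable for a test log or a bench header.
pub fn describe_arm() -> String {
    format!(
        "compiled lanes: {}, host avx512f: {}",
        compiled_lanes(),
        host_has_avx512f()
    )
}
```

A build with the v3 pin on an AVX-512 host would report 32 compiled lanes with host support present, which is the case the opt-in `target-cpu=native` run is meant to change.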

README/FINDINGS narrative on the v3-pin correction is deferred to the
parallel session's §5.5 to avoid clobbering its in-flight README edits.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_01MLBnPuScZy6w9di2QEjsXM

coderabbitai Bot commented Jul 2, 2026


Warning

Review limit reached

@AdaWorldAPI, you've reached your PR review limit, so we couldn't start this review.


Review details
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro Plus

Run ID: a0fdf7b9-53f8-4036-89f4-5c3c78b048f0

📥 Commits

Reviewing files that changed from the base of the PR and between e1279cf and 45623c2.

📒 Files selected for processing (2)
  • crates/onebrc-probe/src/lane_b.rs
  • crates/onebrc-probe/src/lib.rs


@AdaWorldAPI AdaWorldAPI merged commit e4bea83 into main Jul 2, 2026
5 checks passed

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment


💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 45623c2993


Comment on lines +66 to +67
#[cfg(target_feature = "avx512f")]
use ndarray::simd::U8x64 as SimdByte;


P2: Require AVX512BW before selecting U8x64

On targets that advertise avx512f without avx512bw (for example -Ctarget-cpu=knl or a manual -Ctarget-feature=+avx512f build), this alias selects U8x64, and the scan later calls byte cmpeq_mask. In the ndarray fork that method is implemented with the AVX-512 byte-compare intrinsic, which needs AVX512BW, so Lane B can execute an unsupported instruction instead of falling back to the 32-byte path. Please gate the 64-byte alias on both avx512f and avx512bw.
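The gating this comment asks for could look roughly like the sketch below. It is an assumption-based illustration: `SIMD_LANES` here is a local stand-in constant rather than the crate's test-gated const, and in the real module the same `cfg` predicates would presumably sit on the two `use ndarray::simd::... as SimdByte;` lines instead.

```rust
// Select the 64-byte path only when BOTH features are enabled at compile time.
// A target with avx512f but no avx512bw (the comment's `-Ctarget-cpu=knl`
// example) then falls through to the 32-byte path.
#[cfg(all(target_feature = "avx512f", target_feature = "avx512bw"))]
pub const SIMD_LANES: usize = 64;

#[cfg(not(all(target_feature = "avx512f", target_feature = "avx512bw")))]
pub const SIMD_LANES: usize = 32;

/// True when this build took the wide (64-byte) arm.
pub fn wide_path_selected() -> bool {
    SIMD_LANES == 64
}
```

Using `all(...)` with its exact negation keeps the two arms mutually exclusive, so exactly one definition is compiled for any feature combination.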
