Summary
Track follow-up performance work for Vector::dot and Vector::norm2_sq (squared norm) in v0.4.4.
Recent vs_linalg benchmark results suggest these vector kernels are significantly slower than comparable nalgebra and faer rows, unlike the stronger LDLT/Cholesky solve signal. This should be investigated separately from Matrix::inf_norm so each kernel can be profiled, changed, and reviewed independently.
Current State
bench-vs-linalg includes comparable dot and norm2_sq rows across D=2-64.
The current implementation appears to lag nalgebra/faer in those rows.
Local saved-baseline workflows are documented in docs/BENCHMARKING.md for tight optimization loops.
Proposed Changes
Re-run focused vs_linalg measurements for dot and norm2_sq.
Profile the current implementations and identify whether the gap is loop shape, bounds checks, missed vectorization/FMA, Criterion setup, or another effect.
Consider allocation-free and const-generic-friendly implementation changes that preserve the public API and numerical behavior.
Add or adjust focused benchmarks/tests only where they improve confidence.
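As a starting point for the profiling step, the loop shapes under discussion can be sketched as follows. This is a minimal illustration, not the crate's actual code: the function names (dot_naive, dot_unrolled4, dot_fma, norm2_sq_via_dot) are hypothetical, and plain [f64; D] arrays stand in for whatever storage Vector really uses. It shows a single-accumulator loop, a four-accumulator variant that breaks the serial add dependency, and an explicit fused multiply-add variant.

```rust
/// Baseline: one accumulator, strict left-to-right summation.
/// The serial dependency on `acc` can limit throughput, and the compiler
/// may not reorder the adds because that would change rounding.
pub fn dot_naive<const D: usize>(a: &[f64; D], b: &[f64; D]) -> f64 {
    let mut acc = 0.0_f64;
    for (x, y) in a.iter().zip(b.iter()) {
        acc += x * y;
    }
    acc
}

/// Four independent accumulators over chunks of four elements.
/// NOTE: this changes the summation order, so results can differ from
/// `dot_naive` in the last bits for general inputs.
pub fn dot_unrolled4<const D: usize>(a: &[f64; D], b: &[f64; D]) -> f64 {
    let ca = a.chunks_exact(4);
    let cb = b.chunks_exact(4);
    // The remainders are fixed at construction, so grab them before zipping.
    let (ra, rb) = (ca.remainder(), cb.remainder());

    let mut acc = [0.0_f64; 4];
    for (xa, xb) in ca.zip(cb) {
        for lane in 0..4 {
            acc[lane] += xa[lane] * xb[lane];
        }
    }

    // Handle the D % 4 trailing elements with a scalar loop.
    let mut tail = 0.0_f64;
    for (x, y) in ra.iter().zip(rb.iter()) {
        tail += x * y;
    }
    (acc[0] + acc[1]) + (acc[2] + acc[3]) + tail
}

/// Explicit fused multiply-add. `f64::mul_add` rounds once per step, so it
/// can also differ from `dot_naive`; without hardware FMA enabled for the
/// target it may fall back to a slow software routine.
pub fn dot_fma<const D: usize>(a: &[f64; D], b: &[f64; D]) -> f64 {
    let mut acc = 0.0_f64;
    for (x, y) in a.iter().zip(b.iter()) {
        acc = x.mul_add(*y, acc);
    }
    acc
}

/// Squared Euclidean norm expressed through the dot kernel (hypothetical helper).
pub fn norm2_sq_via_dot<const D: usize>(a: &[f64; D]) -> f64 {
    dot_unrolled4(a, a)
}
```

All three variants are allocation-free and const-generic over D, but only dot_naive keeps the original summation order. If "preserve numerical behavior" is meant bit-for-bit, the reassociated and fused variants would need an explicit decision and tolerance-based tests rather than exact equality.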
Benefits
Improves small fixed-dimension vector-kernel performance.
Keeps v0.4.4 performance work scoped alongside the inf_norm investigation.
Makes README and release performance claims easier to reason about.
Implementation Notes
just performance-local-vs-linalg <current-tag> <baseline-tag> for release-signal checks.
just bench-save-baseline <name> vs_linalg, just bench-vs-linalg-la-stack, and just bench-compare <name> vs_linalg all-benches for local iteration.