gh-150424: Widen specialized int fast paths to full int64 range#150425
gh-150424: Widen specialized int fast paths to
full int64 range#150425KRRT7 wants to merge 6 commits into
Conversation
Replace the 30-bit compact-only range check (is_medium_int) with __builtin_add_overflow / sub_overflow / mul_overflow, widening the JIT arithmetic fast path from ±2^30 to ±2^62. Relax operand guards from _PyLong_CheckExactAndCompact to PyLong_CheckExact so non-compact inputs also stay in the JIT trace. Add inline _PyLong_AsInt64 for fast digit extraction from non-compact PyLongs (avoids calling the heavy PyLong_AsLongLongAndOverflow). Results (pyperf, ARM64, rigorous): intermediate_overflow 2.70x faster double_add 2.73x faster accumulate 1.65x faster geometric mean 1.52x faster Includes pyperf-based microbenchmark (Tools/scripts/jit_int_benchmark_pyperf.py) and a simpler timeit-based version (jit_int_benchmark.py).
|
Most changes to Python require a NEWS entry. Add one using the blurb_it web app or the blurb command-line tool. If this change has little impact on Python users, wait for a maintainer to apply the |
|
The following commit authors need to sign the Contributor License Agreement: |
Do we actually support such builds? |
|
Most changes to Python require a NEWS entry. Add one using the blurb_it web app or the blurb command-line tool. If this change has little impact on Python users, wait for a maintainer to apply the |
|
I suspect this has been entirely generated by an LLM (at least the summary has been generated by that). It's unclear what the issue is so please first wait for some feedbcak on the issue before opening the PR. This is the common process as highlighted in our devguide. |
Hi, that's incorrect, I'm talking to a core dev directly on the the PR and as noted by the status, it's still a draft and WIP |
15-bit builds are still supported: --enable-big-digits=15 is a documented configure option, and CPython still has both 15-bit and 30-bit PyLong layouts. I called it out because widening the JIT path to full int64_t exposed a real correctness issue on that https://github.com/python/cpython/blob/main/Doc/using/configure.rst#L229-L237 |
Who's the core dev? Ok for the 15-bit but please, first share a reproducer on the issue before jumping onto PRs. I don't know if you want to address correctness or performance. Those are different. And the described scope is not helpful as there are too many points on the issue. |
I'm addressing performance, the performance changes surfaced correctness issues on 15-bit by the test suite.
I'm working on updating the title and bodies, as I said, this is a draft / WIP, I still haven't settled the final body as I'm doing some cleanup |
|
Before doing any work, we should first understand the issue. It doesn't make sense to make PRs and update the issue accordingly. This is not how our worklow is. So please, first discuss the change on the issue by including a reproducer. |
I don't think what I'm doing is an "issue", though arguably performance bottlenecks are issues (which is something that I personally believe).
it is however, Github native workflow (opening PRs as a draft, a rough PR where as a draft it is iterated and improved upon), where cpython happens to be hosted in. |
We have a policy written on the devguide. Contributors need to follow that policy. The purpose is to avoid back-and-forth so knowing first what the issue is about is the first step. PRs come after. Nothing forbids you to open something on your fork though. But any change you make in a PR against CPython takes CI resources which affects other contributors. |
|
Most changes to Python require a NEWS entry. Add one using the blurb_it web app or the blurb command-line tool. If this change has little impact on Python users, wait for a maintainer to apply the |
1 similar comment
|
Most changes to Python require a NEWS entry. Add one using the blurb_it web app or the blurb command-line tool. If this change has little impact on Python users, wait for a maintainer to apply the |
|
Considering @Fidget-Spinner's stance, I'm reopening it. But please follow what the devguide says. We have too many AI-generated PRs that take both triagers and reviewers time. When it comes to performance PRs, we really want to see benchmarks before PRs even if in a draft state. This gives more intution on whether the change is worth or not. |
Proposed PR title
gh-150424: Widen specialized int fast paths to full int64 rangeProposed PR body
Fixes gh-150424.
This widens the interpreter’s specialized
intfast paths (tier-2 / uop execution) from the compact-only range to the fullint64_trange, and fixes follow-up correctness issues resulting from this for non-compact exact ints and 15-bit builds.Changes:
int64_trangeint64_tPyLong_FromInt64()so 15-bit builds do not narrow throughstwodigitsTests run:
./python.exe -m unittest test.test_capi.test_opt.TestUopsOptimization./python.exe -m test test_generated_casesAdditional validation:
--with-pydebug --enable-experimental-jit --enable-big-digits=15Comparison: main (1310d2c) vs this branch (160d3cd).
Build: PGO + full LTO, installed binary, macOS arm64, Clang 22,
PYTHON_JIT=0(interpreter only).Microbenchmarks target non-compact int arithmetic directly (values that exceed
2**30but fit inint64_t).pyperformance benchmarks are general workloads not specific to this change — included to confirm no regression.
pyperformance
Microbenchmarks (
Tools/scripts/jit_int_benchmark_pyperf.py)