prio-queue: use cascade-down sift for faster extract-min by spkrka · Pull Request #2132 · gitgitgadget/git

spkrka · 2026-05-30T17:07:59Z

This is a small optimization to prio_queue_get() that reduces the
number of comparisons per extract-min from 2d to d (where d is the
sift distance).

The standard extract places the last array element at the root and
sifts it down, comparing against both children at each level. The
new sift_up_rebalance() instead promotes the smaller child at each
level (one comparison and one copy) leaving a vacancy that sinks
to a leaf. The last element is placed there and sifted up, which
in practice rarely moves more than a level or two.
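
The two phases described above can be sketched on a bare int min-heap. This is a minimal illustration, not the patch itself: the name cascade_extract_min() and the plain int keys are assumptions made for the example, and git's struct prio_queue, compare callback and insertion counter are left out. The sift-up here moves a hole with single copies, which is one possible way to write it.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Illustrative only: a min-heap of plain ints stored in heap[0..*nr-1].
 * Removes and returns the minimum, shrinking *nr by one.
 */
int cascade_extract_min(int *heap, size_t *nr)
{
	size_t ix, child, n;
	int result, last;

	assert(*nr > 0);
	result = heap[0];
	n = --*nr;		/* heap[n] is the old last element */
	if (!n)
		return result;
	last = heap[n];

	/* Phase 1: promote the smaller child; the vacancy sinks to a leaf. */
	for (ix = 0; (child = ix * 2 + 1) < n; ix = child) {
		if (child + 1 < n && heap[child] >= heap[child + 1])
			child++;	/* use right child */
		heap[ix] = heap[child];
	}

	/* Phase 2: sift the old last element up from the vacancy. */
	while (ix) {
		size_t parent = (ix - 1) / 2;

		if (heap[parent] <= last)
			break;
		heap[ix] = heap[parent];
		ix = parent;
	}
	heap[ix] = last;
	return result;
}
```

Calling it repeatedly on the array {1, 3, 2, 7, 4} with nr = 5 drains the keys in ascending order. Phase 1 costs one comparison per level; phase 2 costs extra comparisons only when the old last element is smaller than ancestors of the leaf it lands in.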

The improvement shows clearly in synthetic benchmarks (up to 1.38x
for ascending keys at queue width 100) but is modest end-to-end
since sift_down_root is only a fraction of total runtime. On the
linux kernel repo, rev-list --count v5.0..v6.0 improves by ~2%.
The effect scales with DAG width.

Changes since v1:

- Kept sift_down_root() and prio_queue_replace() completely
  unchanged, preserving René's optimization that avoids the
  get+put overhead for replace. The cascade approach now only
  applies to prio_queue_get().
- Extracted the new logic into a separate sift_up_rebalance()
  function rather than inlining it in prio_queue_get().
- Updated benchmark numbers for ascending, descending and
  random insertion ordering. No regressions in any scenario.

spkrka · 2026-05-31T17:56:22Z

/submit

gitgitgadget · 2026-05-31T17:57:27Z

Submitted as pull.2132.git.1780250236304.gitgitgadget@gmail.com

To fetch this version into FETCH_HEAD:

git fetch https://github.com/gitgitgadget/git/ pr-2132/spkrka/cascade-sift-down-v1

To fetch this version to local tag pr-2132/spkrka/cascade-sift-down-v1:

git fetch --no-tags https://github.com/gitgitgadget/git/ tag pr-2132/spkrka/cascade-sift-down-v1

gitgitgadget · 2026-06-01T01:56:26Z

Junio C Hamano wrote on the Git mailing list (how to reply to this email):

"Kristofer Karlsson via GitGitGadget" <gitgitgadget@gmail.com>
writes:

I'll add René to the recipients, as _replace() was added by him as
an optimization, so "this new one is functionally equivalent to the
original" somewhat misses the point, even though we may all agree
that the change is a very good one overall in the end when we look
at the entire picture.

> From: Kristofer Karlsson <krka@spotify.com>
>
> Replace the standard sift-down in prio_queue_get() with a
> cascade-down approach.
>
> The standard approach places the last array element at the root,
> then sifts it down.  At each level this requires two comparisons
> (left vs right child, then element vs winner) and, when the
> element is larger, a swap (three 16-byte copies).
>
> The cascade approach instead promotes the smaller child into the
> vacant root slot at each level — one comparison and one copy.
> The vacancy sinks to a leaf, where the last array element is
> placed and sifted up if needed — typically zero levels since the
> last array element tends to be large.
>
> In the common case, work per extract drops from 2d comparisons
> + 3d copies to d comparisons + d copies: roughly half the
> comparisons and a third of the data movement.  The sift-up phase
> can add work when the last element is smaller than ancestors of
> the leaf vacancy, but this is rare in practice.
>
> Simplify prio_queue_replace() to a plain get+put sequence.  This
> is semantically equivalent: the old implementation wrote to slot 0
> and sifted down, which has the same observable effect as removing
> the root and inserting a new element.  No caller observes queue
> state between the two operations.  The previous implementation
> shared sift_down_root() with get, but the cascade approach no
> longer accommodates that cleanly since sift_down_root() now
> expects the element to reinsert at queue->array[queue->nr], left
> there by prio_queue_get() after decrementing nr.  This is fine in
> practice: replace is only called from pop_most_recent_commit()
> (fetch-pack, object-name, walker) and show-branch — none of
> which appear in any hot path.
>
> A synthetic benchmark (10 rounds of 10M put+get cycles, ascending
> integer keys, CPU-pinned, median of 3 runs, same compiler and
> Makefile flags) shows consistent improvement across all queue
> sizes, with no regressions:
>
>     queue width       baseline    cascade    speedup
>     ------------------------------------------------
>              10        4.32s      3.97s      1.09x
>             100        7.95s      6.49s      1.23x
>           1,000       11.30s      9.66s      1.17x
>          10,000       16.34s     14.15s      1.16x
>         100,000       21.43s     18.66s      1.15x
>
> With descending keys (worst case — the last element always sinks
> to a leaf in both approaches) the cascade still wins slightly
> (1-4%) by replacing swaps with copies, and never regresses.
>
> In end-to-end git commands the improvement is modest because
> sift_down_root is only ~8% of total runtime.  Profiling
> rev-list --count on a 2.5M-commit monorepo shows sift_down_root
> dropping from 8.2% to 0.4% of total runtime.  The improvement
> scales with DAG width: wider DAGs produce larger priority queues,
> amplifying the per-level savings.  In small or narrow repos the
> queues stay shallow and the effect is negligible.
>
> Signed-off-by: Kristofer Karlsson <krka@spotify.com>
> ---
>     prio-queue: use cascade-down sift for faster extract-min
>     
>     Hi, I am not sure whether this is just noise, but I thought it was
>     at least interesting.
>     
>     I looked into the internals of prio_queue and found it was technically
>     doing too much work and could be simplified/optimized. I found I could
>     optimize it by ~20% for the common case (adding commits that would
>     typically end up far back in the queue) but only ~1% for the reverse
>     case (adding things to the front of the prio queue). The average speedup
>     is somewhere in between I suppose. That said, this is not really the
>     bottleneck so the overall boost seems to be around ~3-4% improvement for
>     repos with wide DAGs.
>     
>     I would normally classify this as not urgent or important, but I think
>     the advantage is that the change is very small and simple and it already
>     has good unit tests (t/unit-tests/u-prio-queue.c).
>     
>     With that said, here are the details:
>     
>     The prio_queue_get impl is based on removing the root entry, then moving
>     the very last element into the root slot, then sifting it down into the
>     right place. This uses both comparisons between sibling elements in the
>     heap as well as comparisons between the element to add and one of the
>     siblings. Then it uses swap operations to move things correctly.
>     
>     This patch instead promotes the smaller child upward at each level,
>     leaving a vacancy that sinks to a leaf, then places the removed element
>     there with a short sift-up to keep the heap balanced.
>     
>     We can analytically compare this - for a sift-distance of d we can
>     reason about the number of operations to execute.
>     
>     Before: 2d comparisons + 3d copies
>     After:   d comparisons +  d copies
>     
>     
>     After changing sift_down in this way, the replace operation can't simply
>     depend on it anymore, so I reimplemented it as a sequence of get + put.
>     This is technically correct but maybe not as efficient. However, I am
>     not sure that it matters, since I couldn't see any usage of the replace
>     operation in any hot path.
>     
>     Performance: Profiling git rev-list --count on a 2.5M-commit monorepo
>     shows sift_down_root dropping from 8.2% to 0.4% of total runtime,
>     effectively eliminated as significant overhead.
>     
>     Synthetic benchmark: 10 rounds of 10M put+get cycles, CPU-pinned,
>     median of 3 runs, same compiler and Makefile flags.
>     
>     Ascending keys (git's typical pattern -- parents have lower priority
>     than children):
>     
>     queue width  baseline  patched  speedup
>              10     4.32s    3.97s    1.09x
>             100     7.95s    6.49s    1.23x
>           1,000    11.30s    9.66s    1.17x
>          10,000    16.34s   14.15s    1.16x
>         100,000    21.43s   18.66s    1.15x
>     
>     
>     Descending keys (worst case — last element always sinks to leaf in both
>     approaches):
>     
>     queue width  baseline  patched  speedup
>              10     4.84s    4.78s    1.01x
>             100     9.43s    9.20s    1.03x
>           1,000    15.28s   14.71s    1.04x
>          10,000    23.61s   23.49s    1.01x
>         100,000    29.16s   28.22s    1.03x
>     
>     
>     No regressions in any scenario.
>     
>     End-to-end benchmarks
>     
>     All benchmarks use a benchmark setup of 1 warmup run followed by 10
>     timed runs. Each configuration is built from the same source tree and
>     tested on the same repo in alternating order.
>     
>     linux kernel (1.4M commits) — range v5.0..v6.0 (311K commits):
>     
>     Command                      baseline  patched  speedup
>     rev-list --count v5.0..v6.0     455ms    440ms    1.04x
>     
>     
>     I also ran it on git.git but did not see any performance diff at all,
>     due to the size and narrow DAG.
>     
>     The improvement scales with DAG width: wider DAGs produce larger
>     priority queues, amplifying the per-level savings. In small or narrow
>     repositories the priority queues stay shallow and the sift-down cost is
>     already negligible, so the change is not noticeable.
>
> Published-As: https://github.com/gitgitgadget/git/releases/tag/pr-2132%2Fspkrka%2Fcascade-sift-down-v1
> Fetch-It-Via: git fetch https://github.com/gitgitgadget/git pr-2132/spkrka/cascade-sift-down-v1
> Pull-Request: https://github.com/gitgitgadget/git/pull/2132
>
>  prio-queue.c | 22 ++++++++++++----------
>  1 file changed, 12 insertions(+), 10 deletions(-)
>
> diff --git a/prio-queue.c b/prio-queue.c
> index 9748528ce6..18005c43c4 100644
> --- a/prio-queue.c
> +++ b/prio-queue.c
> @@ -62,17 +62,21 @@ static void sift_down_root(struct prio_queue *queue)
>  {
>  	size_t ix, child;
>  
> -	/* Push down the one at the root */
> -	for (ix = 0; ix * 2 + 1 < queue->nr; ix = child) {
> -		child = ix * 2 + 1; /* left */
> +	for (ix = 0; (child = ix * 2 + 1) < queue->nr; ix = child) {
>  		if (child + 1 < queue->nr &&
>  		    compare(queue, child, child + 1) >= 0)
>  			child++; /* use right child */
> +		queue->array[ix] = queue->array[child];
> +	}
>  
> -		if (compare(queue, ix, child) <= 0)
> +	/* Place queue->array[queue->nr] (left by caller) and sift up. */
> +	queue->array[ix] = queue->array[queue->nr];
> +	while (ix) {
> +		size_t parent = (ix - 1) / 2;
> +		if (compare(queue, parent, ix) <= 0)
>  			break;
> -
> -		swap(queue, child, ix);
> +		swap(queue, parent, ix);
> +		ix = parent;
>  	}
>  }
>  
> @@ -89,7 +93,6 @@ void *prio_queue_get(struct prio_queue *queue)
>  	if (!--queue->nr)
>  		return result;
>  
> -	queue->array[0] = queue->array[queue->nr];
>  	sift_down_root(queue);
>  	return result;
>  }
> @@ -111,8 +114,7 @@ void prio_queue_replace(struct prio_queue *queue, void *thing)
>  		queue->array[queue->nr - 1].ctr = queue->insertion_ctr++;
>  		queue->array[queue->nr - 1].data = thing;
>  	} else {
> -		queue->array[0].ctr = queue->insertion_ctr++;
> -		queue->array[0].data = thing;
> -		sift_down_root(queue);
> +		prio_queue_get(queue);
> +		prio_queue_put(queue, thing);
>  	}
>  }
>
> base-commit: c69baaf57ba26cf117c2b6793802877f19738b0d
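
The comparison counts claimed in the quoted commit message can be explored with a small instrumented experiment. The sketch below is an assumption-laden illustration, not code from the patch: it uses plain int keys, and cmp(), cmp_count, standard_extract(), cascade_extract() and drain_count() are hypothetical names introduced here. The actual counts depend on the input, so treat any result as a data point rather than a proof.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical instrumentation: number of key comparisons so far. */
size_t cmp_count;

static int cmp(int a, int b)
{
	cmp_count++;
	return (a > b) - (a < b);
}

/* Textbook extract-min: move the last element to the root, sift down. */
int standard_extract(int *heap, size_t *nr)
{
	size_t ix, child, n;
	int result;

	assert(*nr > 0);
	result = heap[0];
	n = --*nr;
	if (!n)
		return result;
	heap[0] = heap[n];
	for (ix = 0; (child = ix * 2 + 1) < n; ix = child) {
		int tmp;

		if (child + 1 < n && cmp(heap[child], heap[child + 1]) >= 0)
			child++;	/* use right child */
		if (cmp(heap[ix], heap[child]) <= 0)
			break;
		tmp = heap[ix];		/* swap: three copies */
		heap[ix] = heap[child];
		heap[child] = tmp;
	}
	return result;
}

/* Cascade variant: sink a vacancy, then sift the last element up. */
int cascade_extract(int *heap, size_t *nr)
{
	size_t ix, child, n;
	int result, last;

	assert(*nr > 0);
	result = heap[0];
	n = --*nr;
	if (!n)
		return result;
	last = heap[n];
	for (ix = 0; (child = ix * 2 + 1) < n; ix = child) {
		if (child + 1 < n && cmp(heap[child], heap[child + 1]) >= 0)
			child++;	/* use right child */
		heap[ix] = heap[child];	/* promote: one copy */
	}
	while (ix) {
		size_t parent = (ix - 1) / 2;

		if (cmp(heap[parent], last) <= 0)
			break;
		heap[ix] = heap[parent];
		ix = parent;
	}
	heap[ix] = last;
	return result;
}

/* Drain a heap of n keys with `extract`; return comparisons used. */
size_t drain_count(int (*extract)(int *, size_t *), int *heap, size_t n)
{
	cmp_count = 0;
	while (n)
		extract(heap, &n);
	return cmp_count;
}
```

Running drain_count() with each extractor on identical copies of the same heap gives a direct comparison count for that input. The gap is expected to be small on tiny heaps, where the sift distance d is only one or two levels.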

gitgitgadget · 2026-06-01T06:57:42Z

Junio C Hamano wrote on the Git mailing list (how to reply to this email):

"Kristofer Karlsson via GitGitGadget" <gitgitgadget@gmail.com>
writes:

> diff --git a/prio-queue.c b/prio-queue.c
> index 9748528ce6..18005c43c4 100644
> --- a/prio-queue.c
> +++ b/prio-queue.c
> @@ -62,17 +62,21 @@ static void sift_down_root(struct prio_queue *queue)
>  {
>  	size_t ix, child;
>  
> -	/* Push down the one at the root */
> -	for (ix = 0; ix * 2 + 1 < queue->nr; ix = child) {
> -		child = ix * 2 + 1; /* left */
> +	for (ix = 0; (child = ix * 2 + 1) < queue->nr; ix = child) {
>  		if (child + 1 < queue->nr &&
>  		    compare(queue, child, child + 1) >= 0)
>  			child++; /* use right child */
> +		queue->array[ix] = queue->array[child];
> +	}
>  
> -		if (compare(queue, ix, child) <= 0)
> +	/* Place queue->array[queue->nr] (left by caller) and sift up. */
> +	queue->array[ix] = queue->array[queue->nr];

Here we always sift/bubble up the last element.

I am wondering if it makes sense to teach sift_down_root to take an
extra argument, "struct prio_queue_entry entry" (passed by value)
and sift/bubble it up, not always queue->array[queue->nr], and ...

> +	while (ix) {
> +		size_t parent = (ix - 1) / 2;
> +		if (compare(queue, parent, ix) <= 0)
>  			break;
> -
> -		swap(queue, child, ix);
> +		swap(queue, parent, ix);
> +		ix = parent;
>  	}
>  }
>  
> @@ -89,7 +93,6 @@ void *prio_queue_get(struct prio_queue *queue)
>  	if (!--queue->nr)
>  		return result;
>  
> -	queue->array[0] = queue->array[queue->nr];
>  	sift_down_root(queue);
>  	return result;
>  }
> @@ -111,8 +114,7 @@ void prio_queue_replace(struct prio_queue *queue, void *thing)
>  		queue->array[queue->nr - 1].ctr = queue->insertion_ctr++;
>  		queue->array[queue->nr - 1].data = thing;
>  	} else {
> -		queue->array[0].ctr = queue->insertion_ctr++;
> -		queue->array[0].data = thing;
> -		sift_down_root(queue);
> +		prio_queue_get(queue);
> +		prio_queue_put(queue, thing);

... update this part in the else clause to do something like

		struct prio_queue_entry entry;
		entry.ctr = queue->insertion_ctr++;
		entry.data = thing;
		sift_down_root(queue, entry);

to retain the optimization?  It would perform a single cascade-down
sift, followed by a single sift-up, so it would save a comparison, a
copy, and a swap in the worst case compared to the get+put sequence?

Of course, the original sift_down_root() caller (i.e. prio_queue_get())
needs to pass queue->array[queue->nr] as the second parameter to match.

>  	}
>  }
>
> base-commit: c69baaf57ba26cf117c2b6793802877f19738b0d
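
The review suggestion above can be sketched as follows. This is an illustration under simplifying assumptions, not the code that was eventually submitted: struct entry, struct heap, cascade_place(), heap_get() and heap_replace() are hypothetical stand-ins with plain int keys, and git's insertion counter, compare callback and LIFO special case are omitted.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for struct prio_queue_entry: an int key only. */
struct entry {
	int key;
};

/* Hypothetical stand-in for struct prio_queue. */
struct heap {
	struct entry *array;
	size_t nr;
};

/*
 * Sink a vacancy from the root to a leaf among the first h->nr slots,
 * then place `e` (passed by value, as in the suggestion) and sift it up.
 */
void cascade_place(struct heap *h, struct entry e)
{
	size_t ix, child;

	for (ix = 0; (child = ix * 2 + 1) < h->nr; ix = child) {
		if (child + 1 < h->nr &&
		    h->array[child].key >= h->array[child + 1].key)
			child++;	/* use right child */
		h->array[ix] = h->array[child];
	}
	while (ix) {
		size_t parent = (ix - 1) / 2;

		if (h->array[parent].key <= e.key)
			break;
		h->array[ix] = h->array[parent];
		ix = parent;
	}
	h->array[ix] = e;
}

/* get: the entry to re-place is the old last element. */
struct entry heap_get(struct heap *h)
{
	struct entry result;

	assert(h->nr > 0);
	result = h->array[0];
	if (--h->nr)
		cascade_place(h, h->array[h->nr]);
	return result;
}

/* replace: the entry to place is the caller's new element. */
void heap_replace(struct heap *h, struct entry e)
{
	assert(h->nr > 0);
	cascade_place(h, e);	/* the old root is overwritten */
}
```

Both callers share one cascade-down pass followed by one sift-up, which is the shape the suggestion describes. One thing to keep in mind with this design: in the replace case the new entry is not necessarily large, so the sift-up may travel further than it typically does for get.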

gitgitgadget · 2026-06-01T06:57:45Z

Kristofer Karlsson wrote on the Git mailing list (how to reply to this email):

Thanks for the quick and very valid feedback! I already started
investigating - I think I was too quick (and wrong) when I reasoned
about the replace operation. I will rework it a bit and come back with
a patch version 2 soon that ensures that neither get nor replace has
regressed in any way.

- Kristofer

On Mon, 1 Jun 2026 at 08:16, Junio C Hamano <gitster@pobox.com> wrote:
>
> "Kristofer Karlsson via GitGitGadget" <gitgitgadget@gmail.com>
> writes:
>
> > diff --git a/prio-queue.c b/prio-queue.c
> > index 9748528ce6..18005c43c4 100644
> > --- a/prio-queue.c
> > +++ b/prio-queue.c
> > @@ -62,17 +62,21 @@ static void sift_down_root(struct prio_queue *queue)
> >  {
> >       size_t ix, child;
> >
> > -     /* Push down the one at the root */
> > -     for (ix = 0; ix * 2 + 1 < queue->nr; ix = child) {
> > -             child = ix * 2 + 1; /* left */
> > +     for (ix = 0; (child = ix * 2 + 1) < queue->nr; ix = child) {
> >               if (child + 1 < queue->nr &&
> >                   compare(queue, child, child + 1) >= 0)
> >                       child++; /* use right child */
> > +             queue->array[ix] = queue->array[child];
> > +     }
> >
> > -             if (compare(queue, ix, child) <= 0)
> > +     /* Place queue->array[queue->nr] (left by caller) and sift up. */
> > +     queue->array[ix] = queue->array[queue->nr];
>
> Here we always sift/bubble up the last element.
>
> I am wondering if it makes sense to teach sift_down_root to take an
> extra argument, "struct prio_queue_entry entry" (passed by value)
> and sift/bubble it up, not always queue->array[queue->nr], and ...
>
> > +     while (ix) {
> > +             size_t parent = (ix - 1) / 2;
> > +             if (compare(queue, parent, ix) <= 0)
> >                       break;
> > -
> > -             swap(queue, child, ix);
> > +             swap(queue, parent, ix);
> > +             ix = parent;
> >       }
> >  }
> >
> > @@ -89,7 +93,6 @@ void *prio_queue_get(struct prio_queue *queue)
> >       if (!--queue->nr)
> >               return result;
> >
> > -     queue->array[0] = queue->array[queue->nr];
> >       sift_down_root(queue);
> >       return result;
> >  }
> > @@ -111,8 +114,7 @@ void prio_queue_replace(struct prio_queue *queue, void *thing)
> >               queue->array[queue->nr - 1].ctr = queue->insertion_ctr++;
> >               queue->array[queue->nr - 1].data = thing;
> >       } else {
> > -             queue->array[0].ctr = queue->insertion_ctr++;
> > -             queue->array[0].data = thing;
> > -             sift_down_root(queue);
> > +             prio_queue_get(queue);
> > +             prio_queue_put(queue, thing);
>
> ... update this part in the else clause to do something like
>
>                 struct prio_queue_entry entry;
>                 entry.ctr = queue->insertion_ctr++;
>                 entry.data = thing;
>                 sift_down_root(queue, entry);
>
> to retain the optimization?  It would perform a single cascade-down
> sift, followed by a single sift-up, so it would save a comparison, a
> copy, and a swap in the worst case compared to the get+put sequence?
>
> Of course, the original sift_down_root() caller (i.e. prio_queue_get())
> needs to pass queue->array[queue->nr] as the second parameter to match.
>
> >       }
> >  }
> >
> > base-commit: c69baaf57ba26cf117c2b6793802877f19738b0d

gitgitgadget · 2026-06-01T06:57:47Z

User Kristofer Karlsson <krka@spotify.com> has been added to the cc: list.

Add sift_up_rebalance(), an alternative to sift_down_root() that
halves the number of comparisons per extract-min.

The standard extract places the last array element at the root and
sifts it down.  At each level this requires two comparisons (left vs
right child, then element vs winner) and a swap.

sift_up_rebalance() instead promotes the smaller child into the root
slot at each level (one comparison and one copy) until the vacancy
reaches a leaf.  The last array element is placed at the vacancy and
sifted up to restore heap order.  In practice the sift-up rarely moves
more than a level or two because the last array element tends to be
large.

Work per extract drops from 2d comparisons + d swaps to d comparisons
+ d copies + a short sift-up.

prio_queue_get() now calls sift_up_rebalance() instead of placing the
last element at root and calling sift_down_root().  sift_down_root()
and prio_queue_replace() are left unchanged.

Synthetic benchmark (10 rounds of 10M put+get cycles, CPU-pinned, same
compiler and Makefile flags):

Ascending keys (git's typical pattern -- parents have lower priority
than children):

    queue width  baseline  patched  speedup
             10     4.39s    3.91s    1.12x
            100     9.10s    6.61s    1.38x
          1,000    11.84s    9.25s    1.28x
         10,000    17.50s   13.92s    1.26x
        100,000    23.97s   20.19s    1.19x

Descending keys (worst case -- last element always sinks to leaf):

    queue width  baseline  patched  speedup
             10     4.94s    4.95s    1.00x
            100     9.75s    9.42s    1.03x
          1,000    15.01s   15.29s    0.98x
         10,000    24.79s   23.88s    1.04x
        100,000    29.69s   28.24s    1.05x

Random keys:

    queue width  baseline  patched  speedup
             10     5.05s    4.99s    1.01x
            100     9.90s    9.50s    1.04x
          1,000    15.35s   14.77s    1.04x
         10,000    25.35s   24.21s    1.05x
        100,000    65.71s   63.38s    1.04x

No regressions in any scenario.

End-to-end benchmark on the linux kernel repo (1.4M commits, range
v5.0..v6.0, 311K commits, 20 interleaved runs, 1 warmup):

    Command                      baseline  patched  speedup
    rev-list --count v5.0..v6.0     484ms    474ms    1.02x

The improvement scales with DAG width: wider DAGs produce larger
priority queues, amplifying the per-level savings.  In small or narrow
repositories the queues stay shallow and the sift-down cost is already
negligible.

Signed-off-by: Kristofer Karlsson <krka@spotify.com>
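
For readers who want to reproduce the shape of the synthetic workload, the following is a rough sketch of a "put+get cycle at a fixed queue width" with ascending keys. The author's actual harness is not shown in this thread, so everything here is an assumption: bench_put(), bench_get() and run_cycles() are hypothetical names, keys are plain ints, and timing is left to the caller.

```c
#include <assert.h>
#include <stddef.h>

/* Plain binary-heap insert: append at the end and sift up. */
void bench_put(int *heap, size_t *nr, int key)
{
	size_t ix = (*nr)++;

	while (ix) {
		size_t parent = (ix - 1) / 2;

		if (heap[parent] <= key)
			break;
		heap[ix] = heap[parent];
		ix = parent;
	}
	heap[ix] = key;
}

/* Cascade-style extract-min: sink a vacancy, then sift the last up. */
int bench_get(int *heap, size_t *nr)
{
	size_t ix, child, n;
	int result, last;

	assert(*nr > 0);
	result = heap[0];
	n = --*nr;
	if (!n)
		return result;
	last = heap[n];
	for (ix = 0; (child = ix * 2 + 1) < n; ix = child) {
		if (child + 1 < n && heap[child] >= heap[child + 1])
			child++;	/* use right child */
		heap[ix] = heap[child];
	}
	while (ix) {
		size_t parent = (ix - 1) / 2;

		if (heap[parent] <= last)
			break;
		heap[ix] = heap[parent];
		ix = parent;
	}
	heap[ix] = last;
	return result;
}

/*
 * Hypothetical workload: queue `width` ascending keys, then run
 * `cycles` put+get pairs with ever-increasing keys.  `heap` must have
 * room for width + 1 ints.  Returns the sum of the extracted keys so
 * the work cannot be optimized away.
 */
long run_cycles(int *heap, size_t width, long cycles)
{
	size_t nr = 0;
	long sum = 0, i;
	int next = 0;

	while (nr < width)
		bench_put(heap, &nr, next++);
	for (i = 0; i < cycles; i++) {
		bench_put(heap, &nr, next++);
		sum += bench_get(heap, &nr);
	}
	return sum;
}
```

With ascending keys every newly inserted key is larger than everything already queued, so the element moved during extraction tends to stay near the leaves. That is the case in which the short sift-up is expected to pay off.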

spkrka · 2026-06-01T08:16:31Z

/submit

gitgitgadget · 2026-06-01T08:17:46Z

Submitted as pull.2132.v2.git.1780301856444.gitgitgadget@gmail.com

To fetch this version into FETCH_HEAD:

git fetch https://github.com/gitgitgadget/git/ pr-2132/spkrka/cascade-sift-down-v2

To fetch this version to local tag pr-2132/spkrka/cascade-sift-down-v2:

git fetch --no-tags https://github.com/gitgitgadget/git/ tag pr-2132/spkrka/cascade-sift-down-v2

spkrka force-pushed the cascade-sift-down branch 9 times, most recently from 0a3a2b0 to 9ca2fab Compare May 31, 2026 08:25

spkrka marked this pull request as ready for review May 31, 2026 17:56

spkrka force-pushed the cascade-sift-down branch from 9ca2fab to 6051d44 Compare June 1, 2026 07:37

spkrka wants to merge 1 commit into gitgitgadget:master from spkrka:cascade-sift-down
