Implement topological_sort_levels by joemalle · Pull Request #501 · boostorg/graph

joemalle · 2026-06-07T16:23:23Z

This phab implements issue #240 . Please note that I'm not a frequent contributor, and I may have made some noob mistakes. Also, I used Claude to write this. LMK if that's a foul.

Before submitting

This PR targets the develop branch.
I searched for an existing PR or issue covering the same change.
My contribution is licensed under the Boost Software License 1.0.

Type of change

Does this PR introduce a breaking change?

Yes (describe migration impact below)
No

What this PR does

Adds functions to toposort into levels. There's a lower level function that uses a property map to output the vertex levels and a vector shorthand.

Motivation

Fixes issue #240

Testing

I added new tests. Here's what I ran

$ (cd libs/graph/test && ../../../b2 topological_sort_levels_test)
Performing configuration checks

    - default address-model    : 64-bit (cached) [1]
    - default architecture     : x86 (cached) [1]
    - symlinks supported       : yes (cached)

[1] gcc-10
...patience...
...patience...
...found 2402 targets...
...updating 4 targets...
gcc.compile.c++ ../../../bin.v2/libs/graph/test/topological_sort_levels_test.test/gcc-10/debug/x86_64/threading-multi/visibility-hidden/topological_sort_levels_test.o
gcc.link ../../../bin.v2/libs/graph/test/topological_sort_levels_test.test/gcc-10/debug/x86_64/threading-multi/visibility-hidden/topological_sort_levels_test
testing.capture-output ../../../bin.v2/libs/graph/test/topological_sort_levels_test.test/gcc-10/debug/x86_64/threading-multi/visibility-hidden/topological_sort_levels_test.run
**passed** ../../../bin.v2/libs/graph/test/topological_sort_levels_test.test/gcc-10/debug/x86_64/threading-multi/visibility-hidden/topological_sort_levels_test.test

...updated 4 targets...

Checklist

Existing tests pass (b2 in the test/ directory).
New behavior is covered by a test, or this is a docs / build / refactor change.
Documentation was updated if user-facing behavior changed.
No new compiler warnings on the platforms I built against.

jeremy-murphy · 2026-06-08T02:57:45Z

Also, I used Claude to write this. LMK if that's a foul.

It's not a foul, and it is important to note, so thank you.

jeremy-murphy · 2026-06-08T03:03:50Z

Could you please go into more detail on:

why this algorithm is useful (what problems does it solve, references to literature, etc)
why you choose this implementation, how it relates to the existing topological sort algorithm and why, for example, is it not an enhancement of the existing algorithm?

Thanks, cheers.

Becheler · 2026-06-08T05:40:44Z

Thank you @joemalle for the PR and @jeremy-murphy for the comment. Feel free to edit your own PR first message 😄
Thanks again for shipping this PR on a Sunday so quickly after I reached out 🚀

github-actions · 2026-06-08T07:46:41Z

Compiler-warning counts vs develop (auto-generated).
PR run 27098118816 vs develop run 27120109824 (724d8f3c9a).

Job	Baseline	After	Delta
macos (clang, 14)	75	75	0
macos (clang, 17)	70	70	0
macos (clang, 20)	70	70	0
ubuntu (clang-19, 14)	75	75	0
ubuntu (clang-19, 17)	75	75	0
ubuntu (clang-19, 20)	75	75	0
ubuntu (clang-19, 23)	75	75	0
ubuntu (gcc-14, 14)	51	51	0
ubuntu (gcc-14, 17)	51	51	0
ubuntu (gcc-14, 20)	51	51	0
ubuntu (gcc-14, 23)	51	51	0
windows_msvc_14_3 (msvc-14.3)	1226	1232	+6

Becheler · 2026-06-08T08:45:26Z

Note 1 : the +6 warnings on msvc are not an error of the test or header, it's due to Boost.PropertyMap lib forcing a size_t -> unsigned -> size_t. I opened a PR on Boost.PropertyMap repo that should fix this warning (and others) : boostorg/property_map#37

Note 2 Boost.PropertyMap was community maitained, I jumped in but the CI is on fire, we should not condition the present topological_sort_levels on those 6 warnings that are not @joemalle responsibility 🥲

joemalle · 2026-06-12T23:09:33Z

Hi, I'm happy to contribute! And thank you for investigating/fixing those warnings.

Here are some answers. I can edit the main description when we have a finalized design.

why this algorithm is useful (what problems does it solve, references to literature, etc)

Sorting into "levels" or "generations" is a common issue in scheduling. You find yourself with a DAG of jobs, and you want to know which ones can run concurrently.

I faced a similar scheduling-related issue when I posted this three years ago. I wanted to find sets of instructions that could be reordered/vectorized without affecting correctness.

While writing this answer, I learned that toposorting into levels can also be useful when drawing layered graphs e.g. when rendering DOT/graphviz files.

why you choose this implementation, how it relates to the existing topological sort algorithm and why, for example, is it not an enhancement of the existing algorithm?

We absolutely could enhance the existing DFS implementation. I'm happy to switch to that if you prefer. However, we'd have to add more branches and a small amount of extra state to track the depth. This BFS algorithm is, in my opinion, more natural since it naturally iterates through levels.

This is mentioned in the commit, but just in case, this is Kahn's algorithm (https://en.wikipedia.org/wiki/Topological_sorting#Kahn's_algorithm).

jeremy-murphy · 2026-06-16T22:23:11Z

+  order of vertices within a level is unspecified.
+</blockquote>
+
+<h3>Named Parameters</h3>


@Becheler , correct me if I'm wrong, but I think we're going to eschew named parameters now, right?

jeremy-murphy · 2026-06-16T23:04:11Z

Cool, thanks for answering my questions, I think this will be a valuable addition to the algorithms.

This is mentioned in the commit, but just in case, this is Kahn's algorithm (https://en.wikipedia.org/wiki/Topological_sorting#Kahn's_algorithm).

Ah, now I understand. In that case, I think this should be implemented as a breadth_first_search visitor.
The visitor has a level state that starts at zero. The first thing the algorithm has to do is find all the vertices with no incoming edges to provide as input to breadth_first_search as sources.
When discover_vertex(v, g) is called, the visitor 'associates' (exactly what that means should probably be left to the user but with a sensible default) the vertex with the level by calling something like set(v, level). The trick is then that when examine_vertex(u, g) is called, the visitor sets the level to get(u) + 1.

I think it's as simple as that, but maybe I've misunderstood. :)

(Oh, and it should throw an exception if it detects a cycle.)

PS. It's also possible to detect the transition from one level to the next as a specific event and mark it, which I think would permit more dynamic interaction. What I mean is, since the levels are found sequentially by the visitor, a scheduler using this algorithm could start scheduling one level of tasks without waiting for the whole graph to be processed. If the result is stored in containers then it seems pretty awkward for the scheduler to detect the level change; as in, if we use the vector<vector<vertex_descriptor>> design, it would have to watch and respond to a push_back on the outer vector. What I think is better, and this is an important consideration of generic programming design, is that the algortihm basically outputs a stream of vertices (vertex_descriptors) punctuated with level changes. E.g.: [1, 4, 2, LEVEL, 3, 5, 0, LEVEL, 7, 8, 9, LEVEL, 6] (for instance, for a graph of size 10), so that the user is not committed to allocating any memory. (They might just print the values to the console.) We don't support dynamically created graphs, so we don't need a termination punctuator, since we know exactly how many vertices to expect. Graphs have a default invalid_vertex value, so that would presumably be the natural level punctuator. I think that's the ideal interface. What do you think?

Becheler · 2026-06-17T07:41:23Z

@jeremy-murphy Unless I'm mistaken I think Kahn and BFS do different things?
For example, if you take the following graph, you expect scheduling levels to be A:0, B:1, C:2

   A
  / \
 v   v
 B-->C

With BFS you'd get A:0, B:1, C:1 because it doesn't look at the long path A>B>C the scheduler needs:

Start at A (level 0). A's out-edges: A -> B and A -> C.
B is new -> give it level 1. C is new -> give it level 1.
Process B, look at it's out-edges: C was already touched, don't touch it again.

With Kahn, you'd get A:0, B:1, C:2 because it looks at how many edges still point into each vertex, and only releases a vertex when that count hits zero, so it looks at the long path (waiting for all predecessors)

in-degrees: A:0, B:1, C:2
Wave 0: in-degree 0 -> {A}. Level 0. Remove A, decrement its targets: B:0, C:1.
Wave 1: in-degree 0 -> {B}. Level 1. Remove B, decrement its targets: C:0.
Wave 2: in-degree 0 -> {C}. Level 2.

thoughts ?

@jeremy-murphy we (surprisingly) do support lazy/dynamic/infinite constructed graphs, see e.g. #134 but Kahn needs access to the whole vertex set so it does not sound very useful here.

Note: I am also very sorry I prepared a more thorough review but somehow it disappeared ? I was mostly question the vector allocations inside the algorithm and the return type. I am not sure the bucket double-vector (although a convenience wrapper) is idiomatic for BGL.

jeremy-murphy · 2026-06-17T09:09:48Z

Yup, looks like I oversimplified because I thought it would make such an elegant solution. 😂

jeremy-murphy · 2026-06-17T11:59:38Z

Ok, so I was wrong about the algorithm details, very wrong 😂, but it looks like we should be able to get the online behaviour I was talking about, emitting the vertices in level order with punctuation to delimit the levels. I think it's basically Kahn's algorithm but using two queues alternately to track the level change, and a slightly more detailed property map. Hmm, of course I could be completely wrong again. 😅
PS. I looked more closely and realized the current code is actually using two queues, so just ignore me.

Becheler · 2026-06-17T13:01:28Z

Ahaha I think we are all figuring things out. You thought about property map, but could a visitor answer your vision ? Something like:


struct my_visitor {
    void discover_vertex(Vertex v, const Graph&) {
        std::cout << v << " "; // a task in the current wave
    }
    void finish_level(const Graph&) {
        std::cout << "| "; // a wave just ended
    }
};

topological_sort_levels(g, my_visitor{});
// prints 0 1 | 2 3 4 | 5 |

Then populating vectors or property maps or whatever become a thin wrapper on top of that ?

jeremy-murphy · 2026-06-17T22:12:43Z

Ahaha I think we are all figuring things out. You thought about property map, but could a visitor answer your vision ?

Interesting. It does feel like the Kahn algorithm could actually be used as a fundamental traversal algorithm with a visitor underlying this. Let's park that idea for now as it would take some time to work out the details, which don't have to hold up this particular algorithm.

jeremy-murphy · 2026-06-17T22:33:20Z

+IN: <tt>vertex_index_map(VertexIndexMap i_map)</tt>
+<blockquote>
+  Maps each vertex to an integer in <tt>[0, num_vertices(g))</tt>, used
+  internally to store running in-degrees. The type must be a model of
+  <a href="../../property_map/doc/ReadablePropertyMap.html">Readable Property
+  Map</a> whose key type is the graph's vertex descriptor and whose value
+  type is an integer.<br>
+  <b>Default:</b> <tt>get(vertex_index, g)</tt>. If you use this default,
+  make sure your graph has an internal <tt>vertex_index</tt> property
+  (e.g. <tt>adjacency_list&lt;…, vecS, …&gt;</tt>).
+</blockquote>


Why is this necessary? Why not just map each vertex directly to the in-degree?

jeremy-murphy · 2026-06-17T22:49:07Z

+        typename VertexIndexMap >
+    typename graph_traits< VertexListGraph >::vertices_size_type
+    topological_sort_levels_impl(
+        const VertexListGraph& g, LevelMap level, VertexIndexMap index_map)


So, in a library like the BGL, this function should actually be the first-class public algorithm, not an implementation detail. Adding convenience wrappers around it is fine, but they should be considered as optional extras. This function is what matters.

Instead of the LevelMap level parameter, the last parameter should be VertexOutputIterator result (or something like that), where VertexOutputIterator is a template parameter.
Then, replace put(level, v, level_count); with *result++ = v; and do *result++ = boost::graph_traits<VertexListGraph>::null_vertex(); on line 88 (after next_level.clear();).

Finally, return the result iterator, not level_count.

For testing just use back_inserter onto a vector.

The point is that users could actually process the vertices coming out interactively without waiting for the algorithm to finish, using boost::function_output_iterator, etc.

Implement topological_sort_levels

724d8f3

joemalle marked this pull request as ready for review June 7, 2026 16:24

joemalle requested review from Becheler and jeremy-murphy as code owners June 7, 2026 16:24

joemalle mentioned this pull request Jun 7, 2026

request: toposort into levels #240

Open

jeremy-murphy self-assigned this Jun 8, 2026

jeremy-murphy reviewed Jun 16, 2026

View reviewed changes

jeremy-murphy reviewed Jun 17, 2026

View reviewed changes

jeremy-murphy requested changes Jun 17, 2026

View reviewed changes

Conversation

joemalle commented Jun 7, 2026

Before submitting

Type of change

Does this PR introduce a breaking change?

What this PR does

Motivation

Testing

Checklist

Uh oh!

jeremy-murphy commented Jun 8, 2026

Uh oh!

jeremy-murphy commented Jun 8, 2026

Uh oh!

Becheler commented Jun 8, 2026

Uh oh!

github-actions Bot commented Jun 8, 2026

Uh oh!

Becheler commented Jun 8, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

joemalle commented Jun 12, 2026

Uh oh!

jeremy-murphy Jun 16, 2026

Choose a reason for hiding this comment

Uh oh!

jeremy-murphy commented Jun 16, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

Becheler commented Jun 17, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

jeremy-murphy commented Jun 17, 2026

Uh oh!

jeremy-murphy commented Jun 17, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

Becheler commented Jun 17, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

jeremy-murphy commented Jun 17, 2026

Uh oh!

jeremy-murphy Jun 17, 2026

Choose a reason for hiding this comment

Uh oh!

jeremy-murphy Jun 17, 2026

Choose a reason for hiding this comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

Becheler commented Jun 8, 2026 •

edited

Loading

jeremy-murphy commented Jun 16, 2026 •

edited

Loading

Becheler commented Jun 17, 2026 •

edited

Loading

jeremy-murphy commented Jun 17, 2026 •

edited

Loading

Becheler commented Jun 17, 2026 •

edited

Loading