Skip to content

Implement topological_sort_levels#501

Open
joemalle wants to merge 1 commit into
boostorg:developfrom
joemalle:feature/topological_sort_levels
Open

Implement topological_sort_levels#501
joemalle wants to merge 1 commit into
boostorg:developfrom
joemalle:feature/topological_sort_levels

Conversation

@joemalle

@joemalle joemalle commented Jun 7, 2026

Copy link
Copy Markdown

This phab implements issue #240 . Please note that I'm not a frequent contributor, and I may have made some noob mistakes. Also, I used Claude to write this. LMK if that's a foul.

Before submitting

  • This PR targets the develop branch.
  • I searched for an existing PR or issue covering the same change.
  • My contribution is licensed under the Boost Software License 1.0.

Type of change

  • Bug fix
  • New feature or API addition
  • Refactor (no behavior change)
  • Documentation
  • Build, CI, or tooling
  • Other (specify below)

Does this PR introduce a breaking change?

  • Yes (describe migration impact below)
  • No

What this PR does

Adds functions to toposort into levels. There's a lower level function that uses a property map to output the vertex levels and a vector shorthand.

Motivation

Fixes issue #240

Testing

I added new tests. Here's what I ran

$ (cd libs/graph/test && ../../../b2 topological_sort_levels_test)
Performing configuration checks

    - default address-model    : 64-bit (cached) [1]
    - default architecture     : x86 (cached) [1]
    - symlinks supported       : yes (cached)

[1] gcc-10
...patience...
...patience...
...found 2402 targets...
...updating 4 targets...
gcc.compile.c++ ../../../bin.v2/libs/graph/test/topological_sort_levels_test.test/gcc-10/debug/x86_64/threading-multi/visibility-hidden/topological_sort_levels_test.o
gcc.link ../../../bin.v2/libs/graph/test/topological_sort_levels_test.test/gcc-10/debug/x86_64/threading-multi/visibility-hidden/topological_sort_levels_test
testing.capture-output ../../../bin.v2/libs/graph/test/topological_sort_levels_test.test/gcc-10/debug/x86_64/threading-multi/visibility-hidden/topological_sort_levels_test.run
**passed** ../../../bin.v2/libs/graph/test/topological_sort_levels_test.test/gcc-10/debug/x86_64/threading-multi/visibility-hidden/topological_sort_levels_test.test

...updated 4 targets...

Checklist

  • Existing tests pass (b2 in the test/ directory).
  • New behavior is covered by a test, or this is a docs / build / refactor change.
  • Documentation was updated if user-facing behavior changed.
  • No new compiler warnings on the platforms I built against.

@joemalle joemalle marked this pull request as ready for review June 7, 2026 16:24
@jeremy-murphy

Copy link
Copy Markdown
Collaborator

Also, I used Claude to write this. LMK if that's a foul.

It's not a foul, and it is important to note, so thank you.

@jeremy-murphy jeremy-murphy self-assigned this Jun 8, 2026
@jeremy-murphy

Copy link
Copy Markdown
Collaborator

Could you please go into more detail on:

  • why this algorithm is useful (what problems does it solve, references to literature, etc)
  • why you choose this implementation, how it relates to the existing topological sort algorithm and why, for example, is it not an enhancement of the existing algorithm?

Thanks, cheers.

@Becheler

Becheler commented Jun 8, 2026

Copy link
Copy Markdown
Collaborator

Thank you @joemalle for the PR and @jeremy-murphy for the comment. Feel free to edit your own PR first message 😄
Thanks again for shipping this PR on a Sunday so quickly after I reached out 🚀

@github-actions

github-actions Bot commented Jun 8, 2026

Copy link
Copy Markdown

Compiler-warning counts vs develop (auto-generated).
PR run 27098118816 vs develop run 27120109824 (724d8f3c9a).

Job Baseline After Delta
macos (clang, 14) 75 75 0
macos (clang, 17) 70 70 0
macos (clang, 20) 70 70 0
ubuntu (clang-19, 14) 75 75 0
ubuntu (clang-19, 17) 75 75 0
ubuntu (clang-19, 20) 75 75 0
ubuntu (clang-19, 23) 75 75 0
ubuntu (gcc-14, 14) 51 51 0
ubuntu (gcc-14, 17) 51 51 0
ubuntu (gcc-14, 20) 51 51 0
ubuntu (gcc-14, 23) 51 51 0
windows_msvc_14_3 (msvc-14.3) 1226 1232 +6

@Becheler

Becheler commented Jun 8, 2026

Copy link
Copy Markdown
Collaborator

Note 1 : the +6 warnings on msvc are not an error of the test or header, it's due to Boost.PropertyMap lib forcing a size_t -> unsigned -> size_t. I opened a PR on Boost.PropertyMap repo that should fix this warning (and others) : boostorg/property_map#37

Note 2 Boost.PropertyMap was community maitained, I jumped in but the CI is on fire, we should not condition the present topological_sort_levels on those 6 warnings that are not @joemalle responsibility 🥲

@joemalle

Copy link
Copy Markdown
Author

Hi, I'm happy to contribute! And thank you for investigating/fixing those warnings.

Here are some answers. I can edit the main description when we have a finalized design.

why this algorithm is useful (what problems does it solve, references to literature, etc)

Sorting into "levels" or "generations" is a common issue in scheduling. You find yourself with a DAG of jobs, and you want to know which ones can run concurrently.

I faced a similar scheduling-related issue when I posted this three years ago. I wanted to find sets of instructions that could be reordered/vectorized without affecting correctness.

While writing this answer, I learned that toposorting into levels can also be useful when drawing layered graphs e.g. when rendering DOT/graphviz files.

why you choose this implementation, how it relates to the existing topological sort algorithm and why, for example, is it not an enhancement of the existing algorithm?

We absolutely could enhance the existing DFS implementation. I'm happy to switch to that if you prefer. However, we'd have to add more branches and a small amount of extra state to track the depth. This BFS algorithm is, in my opinion, more natural since it naturally iterates through levels.

This is mentioned in the commit, but just in case, this is Kahn's algorithm (https://en.wikipedia.org/wiki/Topological_sorting#Kahn's_algorithm).

order of vertices within a level is unspecified.
</blockquote>

<h3>Named Parameters</h3>

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@Becheler , correct me if I'm wrong, but I think we're going to eschew named parameters now, right?

@jeremy-murphy

jeremy-murphy commented Jun 16, 2026

Copy link
Copy Markdown
Collaborator

Cool, thanks for answering my questions, I think this will be a valuable addition to the algorithms.

This is mentioned in the commit, but just in case, this is Kahn's algorithm (https://en.wikipedia.org/wiki/Topological_sorting#Kahn's_algorithm).

Ah, now I understand. In that case, I think this should be implemented as a breadth_first_search visitor.
The visitor has a level state that starts at zero. The first thing the algorithm has to do is find all the vertices with no incoming edges to provide as input to breadth_first_search as sources.
When discover_vertex(v, g) is called, the visitor 'associates' (exactly what that means should probably be left to the user but with a sensible default) the vertex with the level by calling something like set(v, level). The trick is then that when examine_vertex(u, g) is called, the visitor sets the level to get(u) + 1.

I think it's as simple as that, but maybe I've misunderstood. :)

(Oh, and it should throw an exception if it detects a cycle.)

PS. It's also possible to detect the transition from one level to the next as a specific event and mark it, which I think would permit more dynamic interaction. What I mean is, since the levels are found sequentially by the visitor, a scheduler using this algorithm could start scheduling one level of tasks without waiting for the whole graph to be processed. If the result is stored in containers then it seems pretty awkward for the scheduler to detect the level change; as in, if we use the vector<vector<vertex_descriptor>> design, it would have to watch and respond to a push_back on the outer vector. What I think is better, and this is an important consideration of generic programming design, is that the algortihm basically outputs a stream of vertices (vertex_descriptors) punctuated with level changes. E.g.: [1, 4, 2, LEVEL, 3, 5, 0, LEVEL, 7, 8, 9, LEVEL, 6] (for instance, for a graph of size 10), so that the user is not committed to allocating any memory. (They might just print the values to the console.) We don't support dynamically created graphs, so we don't need a termination punctuator, since we know exactly how many vertices to expect. Graphs have a default invalid_vertex value, so that would presumably be the natural level punctuator. I think that's the ideal interface. What do you think?

@Becheler

Becheler commented Jun 17, 2026

Copy link
Copy Markdown
Collaborator

@jeremy-murphy Unless I'm mistaken I think Kahn and BFS do different things?
For example, if you take the following graph, you expect scheduling levels to be A:0, B:1, C:2

   A
  / \
 v   v
 B-->C

With BFS you'd get A:0, B:1, C:1 because it doesn't look at the long path A>B>C the scheduler needs:

  1. Start at A (level 0). A's out-edges: A -> B and A -> C.
  2. B is new -> give it level 1. C is new -> give it level 1.
  3. Process B, look at it's out-edges: C was already touched, don't touch it again.

With Kahn, you'd get A:0, B:1, C:2 because it looks at how many edges still point into each vertex, and only releases a vertex when that count hits zero, so it looks at the long path (waiting for all predecessors)

  1. in-degrees: A:0, B:1, C:2
  2. Wave 0: in-degree 0 -> {A}. Level 0. Remove A, decrement its targets: B:0, C:1.
  3. Wave 1: in-degree 0 -> {B}. Level 1. Remove B, decrement its targets: C:0.
  4. Wave 2: in-degree 0 -> {C}. Level 2.

thoughts ?

@jeremy-murphy we (surprisingly) do support lazy/dynamic/infinite constructed graphs, see e.g. #134 but Kahn needs access to the whole vertex set so it does not sound very useful here.

Note: I am also very sorry I prepared a more thorough review but somehow it disappeared ? I was mostly question the vector allocations inside the algorithm and the return type. I am not sure the bucket double-vector (although a convenience wrapper) is idiomatic for BGL.

@jeremy-murphy

Copy link
Copy Markdown
Collaborator

Yup, looks like I oversimplified because I thought it would make such an elegant solution. 😂

@jeremy-murphy

jeremy-murphy commented Jun 17, 2026

Copy link
Copy Markdown
Collaborator

Ok, so I was wrong about the algorithm details, very wrong 😂, but it looks like we should be able to get the online behaviour I was talking about, emitting the vertices in level order with punctuation to delimit the levels. I think it's basically Kahn's algorithm but using two queues alternately to track the level change, and a slightly more detailed property map. Hmm, of course I could be completely wrong again. 😅
PS. I looked more closely and realized the current code is actually using two queues, so just ignore me.

@Becheler

Becheler commented Jun 17, 2026

Copy link
Copy Markdown
Collaborator

Ahaha I think we are all figuring things out. You thought about property map, but could a visitor answer your vision ? Something like:


struct my_visitor {
    void discover_vertex(Vertex v, const Graph&) {
        std::cout << v << " "; // a task in the current wave
    }
    void finish_level(const Graph&) {
        std::cout << "| "; // a wave just ended
    }
};

topological_sort_levels(g, my_visitor{});
// prints 0 1 | 2 3 4 | 5 |

Then populating vectors or property maps or whatever become a thin wrapper on top of that ?

@jeremy-murphy

Copy link
Copy Markdown
Collaborator

Ahaha I think we are all figuring things out. You thought about property map, but could a visitor answer your vision ?

Interesting. It does feel like the Kahn algorithm could actually be used as a fundamental traversal algorithm with a visitor underlying this. Let's park that idea for now as it would take some time to work out the details, which don't have to hold up this particular algorithm.

Comment on lines +110 to +120
IN: <tt>vertex_index_map(VertexIndexMap i_map)</tt>
<blockquote>
Maps each vertex to an integer in <tt>[0, num_vertices(g))</tt>, used
internally to store running in-degrees. The type must be a model of
<a href="../../property_map/doc/ReadablePropertyMap.html">Readable Property
Map</a> whose key type is the graph's vertex descriptor and whose value
type is an integer.<br>
<b>Default:</b> <tt>get(vertex_index, g)</tt>. If you use this default,
make sure your graph has an internal <tt>vertex_index</tt> property
(e.g. <tt>adjacency_list&lt;…, vecS, …&gt;</tt>).
</blockquote>

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why is this necessary? Why not just map each vertex directly to the in-degree?

typename VertexIndexMap >
typename graph_traits< VertexListGraph >::vertices_size_type
topological_sort_levels_impl(
const VertexListGraph& g, LevelMap level, VertexIndexMap index_map)

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So, in a library like the BGL, this function should actually be the first-class public algorithm, not an implementation detail. Adding convenience wrappers around it is fine, but they should be considered as optional extras. This function is what matters.

Instead of the LevelMap level parameter, the last parameter should be VertexOutputIterator result (or something like that), where VertexOutputIterator is a template parameter.
Then, replace put(level, v, level_count); with *result++ = v; and do *result++ = boost::graph_traits<VertexListGraph>::null_vertex(); on line 88 (after next_level.clear();).

Finally, return the result iterator, not level_count.

For testing just use back_inserter onto a vector.

The point is that users could actually process the vertices coming out interactively without waiting for the algorithm to finish, using boost::function_output_iterator, etc.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants