Implement topological_sort_levels #501
It's not a foul, and it's important to note, so thank you.
Could you please go into more detail on:
Thanks, cheers.
Thank you @joemalle for the PR and @jeremy-murphy for the comment. Feel free to edit the first message of your PR 😄
Compiler-warning counts vs
Note 1: the +6 warnings on MSVC are not caused by the test or the header; they are due to the Boost.PropertyMap library forcing a …
Note 2: Boost.PropertyMap was community-maintained; I jumped in, but the CI is on fire, so we should not condition the present …
Hi, I'm happy to contribute! And thank you for investigating/fixing those warnings. Here are some answers. I can edit the main description when we have a finalized design.
Sorting into "levels" or "generations" is a common problem in scheduling. You find yourself with a DAG of jobs, and you want to know which ones can run concurrently. I faced a similar scheduling-related issue when I posted this three years ago: I wanted to find sets of instructions that could be reordered or vectorized without affecting correctness. While writing this answer, I learned that toposorting into levels can also be useful when drawing layered graphs, e.g. when rendering DOT/Graphviz files.
We could certainly enhance the existing DFS implementation, and I'm happy to switch to that if you prefer. However, we'd have to add more branches and a small amount of extra state to track the depth. In my opinion, this BFS-style algorithm is more natural, since it iterates through the levels directly. This is mentioned in the commit, but just in case: this is Kahn's algorithm (https://en.wikipedia.org/wiki/Topological_sorting#Kahn's_algorithm).
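For readers less familiar with Kahn's algorithm, here is a minimal standalone sketch of the level-by-level variant. It is not the PR's BGL code: it assumes a plain `std::vector<std::vector<int>>` adjacency list instead of a BGL graph, uses a hypothetical function name `kahn_levels`, and reports a cycle with `std::invalid_argument` (BGL's own `topological_sort` throws `boost::not_a_dag` instead).

```cpp
#include <cstddef>
#include <stdexcept>
#include <vector>

// Assumed representation: adj[u] holds the targets of u's out-edges.
using Graph = std::vector<std::vector<int>>;

// Hypothetical helper (not part of BGL): Kahn's algorithm processed one
// frontier at a time. Writes level[v] for every vertex and returns the
// number of levels. Throws std::invalid_argument if the graph has a cycle.
std::size_t kahn_levels(const Graph& adj, std::vector<std::size_t>& level) {
    const std::size_t n = adj.size();
    std::vector<std::size_t> in_degree(n, 0);
    for (const auto& targets : adj)
        for (int v : targets) ++in_degree[v];

    // Level 0 is every vertex with no incoming edges.
    std::vector<int> current, next;
    for (std::size_t v = 0; v < n; ++v)
        if (in_degree[v] == 0) current.push_back(static_cast<int>(v));

    level.assign(n, 0);
    std::size_t level_count = 0, processed = 0;
    while (!current.empty()) {
        for (int u : current) {
            level[u] = level_count;
            ++processed;
            // A target is released only once all its predecessors are placed.
            for (int v : adj[u])
                if (--in_degree[v] == 0) next.push_back(v);
        }
        current.swap(next);
        next.clear();
        ++level_count;
    }
    // Vertices on a cycle never reach in-degree zero.
    if (processed != n) throw std::invalid_argument("graph has a cycle");
    return level_count;
}
```

With edges 0→1, 0→2 and 1→2 this assigns levels 0, 1, 2 and returns 3. Every vertex and edge is handled a constant number of times, so the sketch runs in O(V + E).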
order of vertices within a level is unspecified.
</blockquote>
<h3>Named Parameters</h3>
@Becheler, correct me if I'm wrong, but I think we're going to eschew named parameters now, right?
Cool, thanks for answering my questions, I think this will be a valuable addition to the algorithms.
Ah, now I understand. In that case, I think this should be implemented as a … I think it's as simple as that, but maybe I've misunderstood. :) (Oh, and it should throw an exception if it detects a cycle.)
PS. It's also possible to treat the transition from one level to the next as a specific event and mark it, which I think would permit more dynamic interaction. What I mean is: since the levels are found sequentially by the visitor, a scheduler using this algorithm could start scheduling one level of tasks without waiting for the whole graph to be processed. If the result is stored in containers, it seems pretty awkward for the scheduler to detect the level change; as in, if we use the …
@jeremy-murphy Unless I'm mistaken, I think Kahn and BFS do different things? For a graph with edges A→B, B→C and A→C, BFS would give A:0, B:1, C:1, because it doesn't look at the long path A→B→C that the scheduler needs:
With Kahn, you'd get A:0, B:1, C:2, because it tracks how many edges still point into each vertex and only releases a vertex once that count reaches zero. In other words, it waits for all predecessors, so it follows the long path.
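To make the difference concrete, here is a small standalone comparison on that three-vertex graph, with A, B, C numbered 0, 1, 2. Both functions (`bfs_depth`, `kahn_level`) are hypothetical illustrations over a `std::vector` adjacency list, not BGL code; the Kahn variant here uses a single queue and records the longest path to each vertex, rather than processing whole frontiers.

```cpp
#include <algorithm>
#include <cstddef>
#include <queue>
#include <vector>

using Graph = std::vector<std::vector<int>>;  // adj[u] = targets of u

// Hypothetical: plain BFS from all sources. A vertex's depth is fixed on
// first visit, i.e. the length of the *shortest* path from a source.
std::vector<int> bfs_depth(const Graph& adj) {
    const std::size_t n = adj.size();
    std::vector<int> in_degree(n, 0), depth(n, -1);
    for (const auto& targets : adj)
        for (int v : targets) ++in_degree[v];
    std::queue<int> q;
    for (std::size_t v = 0; v < n; ++v)
        if (in_degree[v] == 0) { depth[v] = 0; q.push(static_cast<int>(v)); }
    while (!q.empty()) {
        int u = q.front(); q.pop();
        for (int v : adj[u])
            if (depth[v] == -1) { depth[v] = depth[u] + 1; q.push(v); }
    }
    return depth;
}

// Hypothetical: Kahn's order. A vertex is enqueued only when its remaining
// in-degree hits zero, and its level is the *longest* path from a source.
std::vector<int> kahn_level(const Graph& adj) {
    const std::size_t n = adj.size();
    std::vector<int> in_degree(n, 0), level(n, 0);
    for (const auto& targets : adj)
        for (int v : targets) ++in_degree[v];
    std::queue<int> q;
    for (std::size_t v = 0; v < n; ++v)
        if (in_degree[v] == 0) q.push(static_cast<int>(v));
    while (!q.empty()) {
        int u = q.front(); q.pop();
        for (int v : adj[u]) {
            level[v] = std::max(level[v], level[u] + 1);  // slowest predecessor wins
            if (--in_degree[v] == 0) q.push(v);
        }
    }
    return level;
}
```

For `{{1, 2}, {2}, {}}` (A→B, A→C, B→C), `bfs_depth` returns {0, 1, 1} and `kahn_level` returns {0, 1, 2}, matching the two results described above.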
Thoughts? @jeremy-murphy, we (surprisingly) do support lazily or dynamically constructed, even infinite, graphs (see e.g. #134), but Kahn needs access to the whole vertex set to compute the initial in-degrees, so it does not sound very useful here. Note: I'm also very sorry; I prepared a more thorough review, but somehow it disappeared. I was mostly questioning the vector allocations inside the algorithm and the return type. I am not sure the bucket double-vector (although a convenient wrapper) is idiomatic for BGL.
Yup, looks like I oversimplified because I thought it would make such an elegant solution. 😂
OK, so I was wrong about the algorithm details, very wrong 😂, but it looks like we should be able to get the online behaviour I was talking about: emitting the vertices in level order, with punctuation to delimit the levels. I think it's basically Kahn's algorithm, but using two queues alternately to track the level change, and a slightly more detailed property map. Hmm, of course I could be completely wrong again. 😅
Ahaha, I think we are all figuring things out. You thought about a property map, but could a visitor achieve what you have in mind? Something like: … Then populating vectors, property maps, or whatever else becomes a thin wrapper on top of that?
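A minimal sketch of what such a visitor could look like, assuming hypothetical hook names `examine_vertex` and `finish_level` (these are not an existing BGL visitor concept) and a `std::vector` adjacency list instead of a BGL graph. It walks the graph with two frontier vectors used alternately, in the spirit of the two-queue idea, so the level change surfaces as its own event; collecting the levels into a vector of vectors is then a thin wrapper.

```cpp
#include <cstddef>
#include <stdexcept>
#include <utility>
#include <vector>

using Graph = std::vector<std::vector<int>>;  // adj[u] = targets of u

// Hypothetical visitor-driven Kahn traversal. The Visitor must provide:
//   examine_vertex(int v):        v belongs to the current level
//   finish_level(std::size_t k):  level k is complete
template <class Visitor>
void kahn_level_traversal(const Graph& adj, Visitor& vis) {
    const std::size_t n = adj.size();
    std::vector<std::size_t> in_degree(n, 0);
    for (const auto& targets : adj)
        for (int v : targets) ++in_degree[v];

    std::vector<int> current, next;  // two frontiers, used alternately
    for (std::size_t v = 0; v < n; ++v)
        if (in_degree[v] == 0) current.push_back(static_cast<int>(v));

    std::size_t level = 0, processed = 0;
    while (!current.empty()) {
        for (int u : current) {
            vis.examine_vertex(u);
            ++processed;
            for (int v : adj[u])
                if (--in_degree[v] == 0) next.push_back(v);
        }
        vis.finish_level(level++);  // the level-change event
        current.swap(next);
        next.clear();
    }
    if (processed != n) throw std::invalid_argument("graph has a cycle");
}

// Hypothetical thin wrapper: gathers the levels into a vector of vectors.
struct collect_levels {
    std::vector<std::vector<int>> levels;
    std::vector<int> pending;
    void examine_vertex(int v) { pending.push_back(v); }
    void finish_level(std::size_t) {
        levels.push_back(std::move(pending));
        pending.clear();  // reset the moved-from buffer for the next level
    }
};
```

A scheduler could instead dispatch work from `finish_level`, starting one level's tasks before later levels have been computed.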
Interesting. It does feel like Kahn's algorithm could be used as a fundamental traversal algorithm with an underlying visitor. Let's park that idea for now, as it would take some time to work out the details, and they don't have to hold up this particular algorithm.
IN: <tt>vertex_index_map(VertexIndexMap i_map)</tt>
<blockquote>
Maps each vertex to an integer in <tt>[0, num_vertices(g))</tt>, used
internally to store running in-degrees. The type must be a model of
<a href="../../property_map/doc/ReadablePropertyMap.html">Readable Property
Map</a> whose key type is the graph's vertex descriptor and whose value
type is an integer.<br>
<b>Default:</b> <tt>get(vertex_index, g)</tt>. If you use this default,
make sure your graph has an internal <tt>vertex_index</tt> property
(e.g. <tt>adjacency_list<…, vecS, …></tt>).
</blockquote>
Why is this necessary? Why not just map each vertex directly to the in-degree?
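For comparison, a sketch of the alternative the question suggests: keying the running in-degrees directly by vertex descriptor, here with `std::unordered_map` over a hypothetical string-keyed graph. This removes the need for a `vertex_index` map at the cost of hashing on every update, whereas with `vecS` storage the index map lets the in-degrees live in a contiguous `std::vector`. The function name `levels_by_descriptor` is illustrative only.

```cpp
#include <cstddef>
#include <stdexcept>
#include <string>
#include <unordered_map>
#include <vector>

// Hypothetical graph whose vertex descriptors are strings, not indices.
// Assumption: every vertex, including sinks, appears as a key.
using Vertex = std::string;
using Graph = std::unordered_map<Vertex, std::vector<Vertex>>;

// Kahn's algorithm with in-degrees keyed directly by vertex descriptor,
// so no separate index map is needed. Returns each vertex's level.
std::unordered_map<Vertex, std::size_t> levels_by_descriptor(const Graph& g) {
    std::unordered_map<Vertex, std::size_t> in_degree, level;
    for (const auto& entry : g) in_degree.emplace(entry.first, 0);
    for (const auto& entry : g)
        for (const Vertex& v : entry.second) ++in_degree.at(v);

    std::vector<Vertex> current, next;
    for (const auto& entry : in_degree)
        if (entry.second == 0) current.push_back(entry.first);

    for (std::size_t k = 0; !current.empty(); ++k) {
        for (const Vertex& u : current) {
            level[u] = k;
            for (const Vertex& v : g.at(u))
                if (--in_degree.at(v) == 0) next.push_back(v);
        }
        current.swap(next);
        next.clear();
    }
    // Vertices on a cycle never receive a level.
    if (level.size() != g.size()) throw std::invalid_argument("graph has a cycle");
    return level;
}
```

In BGL terms, a similar effect could presumably be had by passing an associative property map (e.g. `boost::associative_property_map`) for the in-degrees; the trade-off is flexibility versus the cache-friendly vector that a vertex index enables.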
typename VertexIndexMap >
typename graph_traits< VertexListGraph >::vertices_size_type
topological_sort_levels_impl(
    const VertexListGraph& g, LevelMap level, VertexIndexMap index_map)
So, in a library like the BGL, this function should actually be the first-class public algorithm, not an implementation detail. Adding convenience wrappers around it is fine, but they should be considered optional extras. This function is what matters.
Instead of the LevelMap level parameter, the last parameter should be VertexOutputIterator result (or something like that), where VertexOutputIterator is a template parameter.
Then, replace put(level, v, level_count); with *result++ = v; and do *result++ = boost::graph_traits<VertexListGraph>::null_vertex(); on line 88 (after next_level.clear();).
Finally, return the result iterator, not level_count.
For testing just use back_inserter onto a vector.
The point is that users could actually process the vertices coming out interactively without waiting for the algorithm to finish, using boost::function_output_iterator, etc.
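A standalone sketch of the suggested signature, assuming a `std::vector<std::vector<int>>` adjacency list and the constant `-1` as a stand-in for `graph_traits<G>::null_vertex()`. The function name mirrors the PR, but this signature is only an illustration of the review suggestion, not the merged API. Vertices are written as soon as they are released, and a delimiter follows each complete level, written right after the next frontier is prepared.

```cpp
#include <cstddef>
#include <iterator>
#include <stdexcept>
#include <vector>

using Graph = std::vector<std::vector<int>>;  // adj[u] = targets of u

// Stand-in for graph_traits<G>::null_vertex() in this int-indexed sketch.
constexpr int null_vertex = -1;

// Illustrative only: emits vertices level by level into `result`, writes
// `null_vertex` after each complete level, and returns the advanced iterator.
template <class OutputIterator>
OutputIterator topological_sort_levels(const Graph& adj, OutputIterator result) {
    const std::size_t n = adj.size();
    std::vector<std::size_t> in_degree(n, 0);
    for (const auto& targets : adj)
        for (int v : targets) ++in_degree[v];

    std::vector<int> current, next;
    for (std::size_t v = 0; v < n; ++v)
        if (in_degree[v] == 0) current.push_back(static_cast<int>(v));

    std::size_t emitted = 0;
    while (!current.empty()) {
        for (int u : current) {
            *result++ = u;  // the consumer can act on u immediately
            ++emitted;
            for (int v : adj[u])
                if (--in_degree[v] == 0) next.push_back(v);
        }
        current.swap(next);
        next.clear();
        *result++ = null_vertex;  // level delimiter
    }
    if (emitted != n) throw std::invalid_argument("graph has a cycle");
    return result;
}
```

With the diamond `{{1, 2}, {3}, {3}, {}}` and `std::back_inserter`, the output is `0, -1, 1, 2, -1, 3, -1`. One consequence of streaming is that a cycle is only detected after the acyclic prefix has already been written to `result`.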
This PR implements issue #240. Please note that I'm not a frequent contributor, and I may have made some noob mistakes. Also, I used Claude to write this. LMK if that's a foul.
Before submitting
… develop branch.
Type of change
Does this PR introduce a breaking change?
What this PR does
Adds functions to topologically sort a DAG into levels: a lower-level function that writes each vertex's level to a property map, and a shorthand that returns the levels as vectors.
Motivation
Fixes issue #240
Testing
I added new tests. Here's what I ran
Checklist
… b2 in the test/ directory).