cuda.core: add GraphBuilder.graph_definition property#2026

Open: Andy-Jost wants to merge 9 commits into NVIDIA:main from Andy-Jost:graph-step-3

@Andy-Jost
Contributor

Summary

Completes step 3 of #1330 by exposing the captured graph as an explicit GraphDefinition view that shares ownership of the same graph the builder is producing. The handle-layer plumbing landed in #2008; this PR wires up the user-facing surface and the state-guard rules.

The new property unlocks two hybrid flows:

  • Inspect or augment a captured graph through the explicit API after end_building(), then re-complete() to pick up the changes.
  • Populate a conditional body (returned by if_then / if_else / while_loop / switch) entirely through the explicit API without ever calling begin_building() on it.

API addition

GraphBuilder.graph_definition: GraphDefinition (read-only property)

Availability rules:

| Builder kind | Before begin_building | While building | After end_building |
| --- | --- | --- | --- |
| Primary | RuntimeError | RuntimeError | OK |
| Conditional body | OK | RuntimeError | OK |
| Forked | RuntimeError | RuntimeError | RuntimeError |
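
The availability rules above can be sketched as a small pure-Python state check. This is only an illustrative model, not the code in this PR: the enum names mirror the `_BuilderKind` and `_CaptureState` values mentioned in the commit messages, while `check_graph_definition_access`, `is_available`, and the error messages are hypothetical.

```python
from enum import Enum, auto


class BuilderKind(Enum):
    PRIMARY = auto()
    FORKED = auto()
    CONDITIONAL_BODY = auto()


class CaptureState(Enum):
    NOT_STARTED = auto()  # before begin_building()
    CAPTURING = auto()    # between begin_building() and end_building()
    ENDED = auto()        # after end_building()


def check_graph_definition_access(kind, state):
    """Raise RuntimeError wherever the table rejects access (hypothetical helper)."""
    if kind is BuilderKind.FORKED:
        # Forked builders share the primary's graph, so access is never allowed.
        raise RuntimeError("graph_definition is not available on a forked builder")
    if state is CaptureState.CAPTURING:
        # No builder kind exposes the graph while capture is in progress.
        raise RuntimeError("graph_definition is not available while building")
    if kind is BuilderKind.PRIMARY and state is CaptureState.NOT_STARTED:
        # A primary builder has no graph until capture has begun.
        raise RuntimeError("graph_definition is not available before begin_building()")


def is_available(kind, state):
    """Return True where the table says OK, False where it says RuntimeError."""
    try:
        check_graph_definition_access(kind, state)
    except RuntimeError:
        return False
    return True
```

Only the conditional-body row differs from the primary row, and only in the "before `begin_building`" column.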

The returned GraphDefinition is a view, not an owning wrapper: nodes added through it appear in subsequent complete() and debug_dot_print() calls on the builder.
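
A toy model of the view semantics, assuming nothing about the real cuda.core classes: `GraphModel`, `GraphDefinitionView`, `BuilderModel`, and `add_node` are all hypothetical stand-ins. The point is only that the builder and the view hold references to the same underlying graph object, so edits through the view are visible to the builder, and the view stays usable after the builder is closed.

```python
class GraphModel:
    """Stand-in for the underlying graph shared by builder and view."""

    def __init__(self):
        self.nodes = []


class GraphDefinitionView:
    """Hypothetical view: holds a reference to the graph, not a copy."""

    def __init__(self, graph):
        self._graph = graph

    def add_node(self, name):
        self._graph.nodes.append(name)

    @property
    def nodes(self):
        return tuple(self._graph.nodes)


class BuilderModel:
    """Hypothetical builder that has already finished capturing one node."""

    def __init__(self):
        self._graph = GraphModel()
        self._graph.nodes.append("captured_kernel")

    @property
    def graph_definition(self):
        # Each call hands out a view onto the same shared graph.
        return GraphDefinitionView(self._graph)

    def complete(self):
        # Snapshot of whatever the graph contains at this moment.
        return tuple(self._graph.nodes)

    def close(self):
        # Drop the builder's reference; existing views keep the graph alive.
        self._graph = None


builder = BuilderModel()
view = builder.graph_definition
view.add_node("explicit_memset")
snapshot = builder.complete()   # includes the node added through the view
builder.close()
view.add_node("added_after_close")
```

In this model `snapshot` contains both the captured node and the node added through the view, and `view.nodes` still works after `builder.close()`.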

Test plan

Nine new tests in test_graph_builder.py:

  • Happy path: graph_definition returns a GraphDefinition after end_building() and reflects the captured nodes.
  • Three error paths: forked builder, primary pre-begin_building, any builder mid-capture.
  • Shared validity: the returned GraphDefinition keeps working after the builder is closed.
  • Round-trip: mutate via the explicit API after capture, complete(), and run end-to-end on a stream.
  • Conditional-body hybrid flow: populate the body entirely through the explicit API and run.
  • Conditional-body augmentation flow: capture into the body, then add extra nodes through the explicit view, and run.
  • Conditional-body during-capture rejection.

Local pre-commit clean. Local test run on a 2-GPU machine: all cuda_core/tests/graph pass.

Related

Made with Cursor

Andy-Jost and others added 8 commits May 1, 2026 14:52
…hine

Refactor GraphBuilder from a Python class using _MembersNeededForFinalize
to a cdef class with explicit _BuilderKind (PRIMARY/FORKED/CONDITIONAL_BODY)
and _CaptureState (NOT_STARTED/CAPTURING/ENDED) tracking. Cleanup moves
into __dealloc__/close, and the builder now uses GraphHandle/StreamHandle
from _resource_handles instead of holding raw driver objects. Drop the
is_stream_owner flag now that StreamHandle owns the lifetime.

End-capture paths in __dealloc__ and close guard on _h_stream so cleanup
is safe even if _init* fails before completing assignment.

Made-with: Cursor

Add a GraphExecHandle to the resource-handle layer (parallel to
GraphHandle) wrapping CUgraphExec with RAII cleanup via
cuGraphExecDestroy on shared_ptr release. Convert Graph from a Python
class using _MembersNeededForFinalize to a cdef class holding a typed
_h_graph_exec attribute, dropping the weakref.finalize machinery.
update/upload/launch move to nogil cydriver paths consistent with the
GraphBuilder rewrite.

Also drop quoted forward-reference annotations on create_graph_builder
and _instantiate_graph/complete now that GraphBuilder is cimported in
_device.pyx and _stream.pyx and Cython accepts the in-module forward
reference to Graph. Clears the related "Strings should no longer be
used for type declarations" warnings.

Made-with: Cursor

The cdef-class member declarations live in the .pxd, so the .pyx does
not need to re-cimport GraphExecHandle, GraphHandle, or StreamHandle.

Made-with: Cursor

… cycle

cimport-ing GraphBuilder at the top of _stream.pyx and _device.pyx made
Cython emit a Python-level import of cuda.core.graph._graph_builder
during _stream module init. That triggered the chain
graph -> _graph_node -> _kernel_arg_handler -> _memory._buffer
-> _device, which then re-entered the still-initializing _stream module
via "from cuda.core._stream import IsStreamT", failing with
ImportError: cannot import name IsStreamT.

Restore the original lazy "import GraphBuilder" inside
create_graph_builder (Stream and Device) and Stream_accept. The return
annotations stay as bare names; "from __future__ import annotations" in
both files defers their evaluation, so they need not resolve at
function-definition time.
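
The lazy-import pattern described above can be illustrated in plain Python. This is an analogy only: it uses the standard-library `fractions` module rather than cuda.core, and `make_fraction` is a hypothetical function. It shows that with `from __future__ import annotations` a bare-name return annotation is stored as a string and never evaluated at definition time, so the name only has to be importable inside the function body.

```python
from __future__ import annotations


def make_fraction(numerator: int, denominator: int) -> Fraction:
    # Lazy import: resolved on first call, not at module import time, so
    # importing this module never triggers an import of `fractions`.
    from fractions import Fraction

    return Fraction(numerator, denominator)


# `Fraction` is not defined at module level, yet the definition above
# succeeded because the annotation is kept as the string "Fraction".
return_annotation = make_fraction.__annotations__["return"]
half = make_fraction(1, 2)
```

The trade-off is a small per-call lookup cost in exchange for breaking the module-initialization cycle.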

Made-with: Cursor

The previous import-cycle fix changed _stream/_device.create_graph_builder
to a lazy Python "import GraphBuilder" instead of a module-level cimport.
With _init declared as @staticmethod cdef, Python attribute lookup
cannot find it, so every test that builds a graph failed with
"AttributeError: type object 'GraphBuilder' has no attribute '_init'"
at _device.pyx:1376 / _stream.pyx:376.

Convert _init from @staticmethod cdef to @staticmethod def (matches the
Stream._init pattern) and drop the cdef declaration from the .pxd.
_init runs once per builder creation, so the loss of cdef-level
dispatch is irrelevant. Graph._init stays cdef; it is only called
intra-module.

Made-with: Cursor

Every graph-builder test failed with CUDA_ERROR_INVALID_VALUE on the
new ``GraphBuilder.begin_building`` path. The driver rejects
``cuStreamGetCaptureInfo`` when ``captureStatus_out`` is NULL, but the
new ``_get_capture_info`` helper accepted a NULL status pointer and
``begin_building`` was calling it that way (it just wanted the freshly
captured graph handle and assumed the status was implied by the
preceding ``cuStreamBeginCapture``).

Pass a stack-local ``CUstreamCaptureStatus`` and document the helper's
requirement that ``status`` be non-NULL. ``graph`` is still allowed to
be NULL (``is_building`` calls it that way and the driver accepts it).

Co-authored-by: Cursor <cursoragent@cursor.com>
Andy-Jost added this to the cuda.core v1.0.0 milestone May 5, 2026
Andy-Jost added the enhancement (Any code-related improvements), P1 (Medium priority - Should do), and cuda.core (Everything related to the cuda.core module) labels May 5, 2026
Andy-Jost self-assigned this May 5, 2026
@Andy-Jost
Contributor Author

This builds on #2008. Commits unique to this PR begin at de85b92.

Completes step 3 of NVIDIA#1330 by exposing the captured graph as an explicit
`GraphDefinition` view that shares ownership of the underlying `CUgraph`.
The handle-layer plumbing landed in PR NVIDIA#2008; this commit wires up the
user-facing surface and locks in the state-guard rules.

State semantics:

- PRIMARY builder: only valid after `end_building()`. Before
  `begin_building()` no graph exists; during capture the driver is the
  sole writer, so explicit access is unsafe.
- CONDITIONAL_BODY builder: valid both before `begin_building()` (the
  body graph is allocated at conditional-node creation time) and after
  `end_building()`. This enables a hybrid flow where a conditional body
  is populated entirely via the explicit API, with no capture at all.
- FORKED builder: never valid. Forked builders share the primary's
  graph; access through the primary instead.

Tests cover the happy path, both hybrid flows on conditional bodies
(populate-via-explicit-API and capture-then-augment), the three error
states (forked, capturing, primary pre-capture), and the
shared-ownership guarantee (the `GraphDefinition` survives the
builder's `close()`).

Co-authored-by: Cursor <cursoragent@cursor.com>
@github-actions

github-actions Bot commented May 5, 2026

Andy-Jost requested review from leofang and rwgk May 28, 2026 16:47