feat(openai-agents): Support span streaming#6404
feat(openai-agents): Support span streaming#6404alexander-alderman-webb wants to merge 15 commits into
Conversation
Codecov Results 📊✅ 282 passed | Total: 282 | Pass Rate: 100% | Execution Time: 55.30s 📊 Comparison with Base Branch
✨ No test changes detected All tests are passing successfully. ❌ Patch coverage is 6.48%. Project has 15110 uncovered lines. Files with missing lines (9)
Coverage diff@@ Coverage Diff @@
## main #PR +/-##
==========================================
- Coverage 34.91% 33.53% -1.38%
==========================================
Files 190 190 —
Lines 22689 22732 +43
Branches 7742 7770 +28
==========================================
+ Hits 7920 7622 -298
- Misses 14769 15110 +341
- Partials 832 807 -25Generated by Codecov Action |
ericapisani
left a comment
There was a problem hiding this comment.
Overall looking good but there's a few things around the potential for some bugs with a couple of the conditionals that were updated, and the inconsistent way that we're processing spans in tests (some aren't asserting the total amount of spans and the order in which they appear, others are).
| span is None | ||
| or isinstance(span, StreamedSpan) | ||
| and span.end_timestamp is not None | ||
| or not isinstance(span, StreamedSpan) | ||
| and span.timestamp is not None |
There was a problem hiding this comment.
This is a bit difficult to read and I think could lead to an accidental bug because of orders or precedence with the and and or keywords (the former taking evaluation precedence over the latter).
Reading this, I'm not sure if the intention was to have the statement evaluate as isinstance(span, StreamedSpan) and span.end_timestamp is not None rather than (span is None or isinstance(span, StreamedSpan)) and ...
There was a problem hiding this comment.
Fair point about lack of intentionality.
The intention is for the condition to evaluate to true if the end timestamp has not yet been set, indicating that the span is still active. The complex conditional is used because Span.timestamp and StreamedSpan.end_timestamp represent the end timestamp (i.e., because the instance variable was renamed).
I've added brackets in e243b86
| name=f"chat {model_name}", | ||
| origin=SPAN_ORIGIN, | ||
| ) | ||
| # TODO-anton: remove hardcoded stuff and replace something that also works for embedding and so on |
There was a problem hiding this comment.
I'm not sure how useful this comment is anymore - any chance we can remove it?
There was a problem hiding this comment.
I think this is still a bug 😬.
Created #6417 to track.
| span_streaming = has_span_streaming_enabled(sentry_sdk.get_client().options) | ||
| if span_streaming: | ||
| with sentry_sdk.traces.start_span( | ||
| name=f"handoff from {from_agent.name} to {to_agent_name}", |
There was a problem hiding this comment.
We have a gen_ai.agent.name attribute and I'm thinking this should maybe be added as an attribute with a value of from_agent.name (since it's the originator of the span in terms of the action being taken)
There was a problem hiding this comment.
I'd have the same instinct to add an attribute to avoid users matching on the name.
However, we're removing the workflow span entirely per the RFC.
| assert result.final_output == "Hello, how can I help you?" | ||
|
|
||
| sentry_sdk.flush() | ||
| spans = [item.payload for item in items] |
There was a problem hiding this comment.
Even if we're only interested in the invoke agent span and ai client span, we should still assert the number of expected spans here since this should be stable. If there are more/less spans, we'd want this test to fail
There was a problem hiding this comment.
I'd consider this out of scope for this PR because the existing test did not assert the total number of spans the goal of the PR is to port the integration to support span streaming (and make the same test guarantees as before).
There was a problem hiding this comment.
I'm not sure if it's always been the intention to open another pull request to introduce the stricter checks, but if so, please add that in the pull request descriptions going forward so it's clear if it's an intentional decision or an accidental oversight.
As we've been introducing these stricter checks as part of other streamed span integration migrations and code review comments have been left when it's not been done, it appears to me that these stricter checks are part of the migration process, not out of scope.
In the case of this change set it's especially noticeable because it's being applied in some places and not others.
There was a problem hiding this comment.
I was neither intending to open another PR nor was this an accidental oversight.
Going by the migration guide in #5386 (comment) I'm adding test code equivalent to the tests for the transaction based traces. (emphasis mine below).
Parametrize all tests that deal with tracing to execute both in legacy and in span streaming mode. Add equivalent assertions in streaming mode. You can use the capture_items fixture for unwrapping envelope item items automatically.
The two instances this has been brought up in reviews that I'm aware of (#6206 (comment) and #6123 (comment)) were specifically pointing out looser assertions compared to the legacy path.
Based on the information I have seen, broader changes to the tests are not part of the span first migration. If they are let's update the integration guide.
100% agree that the change set is a pain to read (and it's frustrating that code written so recently has so much tech debt).
There was a problem hiding this comment.
With regards to porting tests, if the existing tests don't assert on the number/order of spans currently, fine to also not do that in scope of the span first conversion.
What I want to avoid during the migration is losing assertions compared to the legacy path. If they weren't there in the first place, we don't need to add them. (It's def nice to have though, and can be done low-effort with a clanker.)
| invoke_agent_span = next( | ||
| span | ||
| for span in spans | ||
| if span["attributes"]["sentry.op"] == OP.GEN_AI_INVOKE_AGENT | ||
| ) | ||
| ai_client_span = next( | ||
| span for span in spans if span["attributes"]["sentry.op"] == OP.GEN_AI_CHAT | ||
| ) |
There was a problem hiding this comment.
As I'm reviewing, I'm noticing some test cases with this pattern, and other test cases with this pattern.
Is there a reason why some are implemented one way vs the other? Is it to avoid assigning spans to variables that won't be asserted on?
There was a problem hiding this comment.
I have no idea what the authors of the integration were thinking. I have not made a conscious decision in favor of one or the other style.
There was a problem hiding this comment.
Cursor Bugbot has reviewed your changes and found 1 potential issue.
❌ Bugbot Autofix is OFF. To automatically fix reported issues with cloud agents, enable autofix in the Cursor dashboard.
Reviewed by Cursor Bugbot for commit e1d6ebe. Configure here.

Description
Use
sentry_sdk.traces.start_span,replaceSpan.set_data()withStreamedSpan.set_attribute()andSpan.set_status(SPANSTATUS.INTERNAL_ERROR)withStreamedSpan.status = SpanStatus.ERRORwhen in span streaming mode.Parametrize tests on the trace lifecycle option.
Issues
Closes #6041
Reminders
tox -e linters.feat:,fix:,ref:,meta:)