fix(a2a): Promote RemoteA2aAgent response to workflow node output #5852

Open
Harineko0 wants to merge 2 commits into google:main from Harineko0:fix/remote-a2a-agent-workflow-output

Harineko0 commented May 26, 2026

Link to Issue or Description of Change

Problem

When a RemoteA2aAgent is used as a static node in a Workflow graph that feeds into a JoinNode, the joined output contains None for every RemoteA2aAgent predecessor.

Reproducer (simplified from a real coordinator graph):

parallel_investigation_join = JoinNode(name="parallel_investigation_join")
Workflow(
    edges=[
        ("START", account_context_agent, parallel_investigation_join),  # RemoteA2aAgent
        ("START", ticket_history_agent, parallel_investigation_join),   # RemoteA2aAgent
        ("START", diagnostics_agent, parallel_investigation_join),      # RemoteA2aAgent
        ...
    ]
)

Observed JoinNode input:

parallel_investigation_join:
  account_context_agent: null
  ticket_history_agent: null
  diagnostics_agent: null

Root cause: RemoteA2aAgent inherits the default BaseAgent._run_impl, which iterates run_async and yields events without ever setting event.output or event.node_info.message_as_output. As a result, NodeRunner._track_event_in_context leaves ctx.output as None, and Workflow._handle_completion never records an entry in loop_state.node_outputs for that predecessor. JoinNode then sees None for it.
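The failure mode can be illustrated with a toy model. This is only a sketch: `Event`, `run_node`, and `join` are hypothetical stand-ins written for this example, not ADK classes, and they assume nothing more than "a node's output is whatever an event puts in `output`".

```python
from dataclasses import dataclass
from typing import Any, Optional


@dataclass
class Event:
    """Hypothetical stand-in for an ADK event; only the fields needed here."""
    author: str
    text: str
    output: Optional[Any] = None


def run_node(events: list[Event]) -> Optional[Any]:
    """Toy node runner: the node output is taken from event.output only."""
    node_output = None
    for event in events:
        if event.output is not None:
            node_output = event.output
    return node_output


def join(predecessors: dict[str, list[Event]]) -> dict[str, Optional[Any]]:
    """Toy join: collects each predecessor's recorded output by node name."""
    return {name: run_node(events) for name, events in predecessors.items()}


# The remote agent replied with text, but nothing copied it into .output.
unpromoted = {
    "account_context_agent": [Event("account_context_agent", "plan: pro")]
}
# With promotion, the same text is also exposed as the node output.
promoted = {
    "account_context_agent": [
        Event("account_context_agent", "plan: pro", output="plan: pro")
    ]
}

joined_before = join(unpromoted)
joined_after = join(promoted)
```

In this model the reply text is present in both cases; the join only sees it once it has been copied into `output`.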

LlmAgent already solves the equivalent problem by overriding _run_impl and promoting the model's text reply to event.output (via process_llm_agent_output in _llm_agent_wrapper.py). RemoteA2aAgent had no equivalent hook.

Solution

Add a workflow-only override of _run_impl on RemoteA2aAgent that mirrors LlmAgent's behavior. For each event yielded by BaseAgent._run_impl, a new _promote_response_to_output helper joins the text of all parts that are not thoughts, function calls, or function responses, assigns the result to event.output, and sets event.node_info.message_as_output = True. The flag is consistent with LlmAgent and prevents NodeRunner._flush_output_and_deltas from emitting a duplicate trailing output event.
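The part filtering described above might look roughly like the following. This is a sketch under assumptions: `Part` is a hypothetical dataclass standing in for the real content part type, `extract_output_text` is an illustrative name rather than the helper in this PR, and joining with an empty separator is an assumption.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Part:
    """Hypothetical stand-in for a content part."""
    text: Optional[str] = None
    thought: bool = False
    function_call: Optional[dict] = None
    function_response: Optional[dict] = None


def extract_output_text(parts: list[Part]) -> Optional[str]:
    """Join text from parts that are not thoughts or function calls/responses.

    Returns None when no part qualifies, so the caller can skip promotion.
    """
    texts = [
        part.text
        for part in parts
        if part.text
        and not part.thought
        and part.function_call is None
        and part.function_response is None
    ]
    # Assumption: parts are concatenated without a separator.
    return "".join(texts) if texts else None


mixed = [
    Part(text="thinking...", thought=True),
    Part(text="Account is on the pro plan."),
    Part(function_call={"name": "lookup"}),
]
only_thoughts = [Part(text="working", thought=True)]
```

Returning `None` for thought-only or function-call-only content keeps those events from being treated as node output.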

The helper skips:

  • partial events (streaming chunks)
  • events not authored by this agent
  • events whose event.output is already set
  • events whose content carries only thoughts (the streaming working / submitted task statuses that the legacy _handle_a2a_response marks with thought=True)
  • events whose content carries only function calls (the input_required / auth_required mock function call inserted by _create_mock_function_call_for_required_user_input; those should remain interrupts, not outputs)
  • events whose A2A task state is non-final (submitted, working, input-required, auth-required, unknown). The v2 integration path (_handle_a2a_response_v2) delegates to converters that do not mark streaming working text as thought=True, so the thought filter alone is not enough. Without this guard, a working text event and the subsequent completed text event would each try to set event.output, so NodeRunner would raise ValueError: Output already set on the second event and abort the run before the real final answer surfaced. The state is read from event.custom_metadata['a2a:response']['status']['state'], which _run_async_impl already stamps before yielding the event. Plain A2AMessage responses (no status field) and terminal task states (completed, failed, canceled, rejected) are still promoted.
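The task-state guard in the last bullet can be sketched as a small predicate over the metadata dict. The function name and the `NON_FINAL_STATES` constant are hypothetical, chosen for this example; only the metadata path and the state names come from the description above.

```python
from typing import Any, Optional

# State names as listed in the description above.
NON_FINAL_STATES = frozenset(
    {"submitted", "working", "input-required", "auth-required", "unknown"}
)


def has_non_final_task_state(custom_metadata: Optional[dict[str, Any]]) -> bool:
    """Return True when the stamped A2A response reports a non-final state."""
    response = (custom_metadata or {}).get("a2a:response")
    if not isinstance(response, dict):
        return False
    status = response.get("status")
    if not isinstance(status, dict):
        # Plain message responses carry no status field and still promote.
        return False
    return status.get("state") in NON_FINAL_STATES


working = {"a2a:response": {"status": {"state": "working"}}}
completed = {"a2a:response": {"status": {"state": "completed"}}}
plain_message = {"a2a:response": {"role": "agent"}}
```

Treating a missing `status` as promotable matches the stated behavior for plain A2AMessage responses; a stricter variant could refuse to promote when the metadata is absent.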

In addition, _run_impl short-circuits after the first successful promotion. This protects against a server emitting multiple terminal-state events for one run (e.g. a completed status update followed by trailing artifact updates on the same already-completed task): only the first terminal event becomes the node's output, and subsequent ones pass through untouched.
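A minimal sketch of the short-circuit, assuming promotion is expressed as a callback that returns True when it set the output. `Event`, `promote_first_only`, and `promote_text` are hypothetical names for this example and do not reflect the actual _run_impl signature.

```python
import asyncio
from dataclasses import dataclass
from typing import AsyncIterator, Callable, Optional


@dataclass
class Event:
    """Hypothetical stand-in for an ADK event."""
    text: str
    output: Optional[str] = None


async def promote_first_only(
    events: AsyncIterator[Event],
    try_promote: Callable[[Event], bool],
) -> AsyncIterator[Event]:
    """Yield every event, but stop attempting promotion after one succeeds."""
    promoted = False
    async for event in events:
        if not promoted:
            promoted = try_promote(event)
        yield event


def promote_text(event: Event) -> bool:
    """Toy promotion: copy the text into output and report success."""
    event.output = event.text
    return True


async def _demo() -> list[Event]:
    async def source() -> AsyncIterator[Event]:
        yield Event("final answer")
        yield Event("trailing artifact")

    return [event async for event in promote_first_only(source(), promote_text)]


events = asyncio.run(_demo())
```

Both events are still yielded downstream; only the first one carries an output, so a single-assignment check on the node output is not tripped.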

Scope is intentionally narrow: only the agent boundary is touched. to_adk_event.py and the workflow scheduler are unchanged, since the same workaround (promoting content → output at the agent layer) is what LlmAgent does and what keeps the fix local.

Testing Plan

Unit Tests

  • I have added or updated unit tests for my change.
  • All unit tests pass locally.

Added TestRemoteA2aAgentWorkflowOutput in tests/unittests/agents/test_remote_a2a_agent.py (20 cases).

Manual End-to-End (E2E) Tests

I ran the failing workflow described in the Problem section against a real ADK app: a Workflow graph whose START fans out into multiple RemoteA2aAgent nodes that all feed into a single JoinNode. Each remote specialist runs as its own A2A server; the coordinator runs the workflow and forwards the joined dict to a downstream synthesis step.

Before the fix:

[Image: Screenshot 2026-05-26 at 15 32 06]

After the fix:

[Image: Screenshot 2026-05-26 at 15 30 25]

Checklist

  • I have read the CONTRIBUTING.md document.
  • I have performed a self-review of my own code.
  • I have commented my code, particularly in hard-to-understand areas.
  • I have added tests that prove my fix is effective or that my feature works.
  • New and existing unit tests pass locally with my changes.
  • I have manually tested my changes end-to-end.
  • Any dependent changes have been merged and published in downstream modules.

Additional context

The fix mirrors the pattern LlmAgent already uses (process_llm_agent_output in src/google/adk/workflow/_llm_agent_wrapper.py), keeping output-promotion at the agent boundary rather than touching the workflow scheduler or the A2A converters. This minimizes blast radius and avoids regressions in non-workflow usages of RemoteA2aAgent, where the _run_impl path is not exercised.

Harineko0 added 2 commits May 26, 2026 15:15
RemoteA2aAgent inherits BaseAgent._run_impl, which never sets
event.output or message_as_output, so NodeRunner leaves ctx.output as
None for A2A agent nodes. When a JoinNode aggregates parallel
RemoteA2aAgent predecessors, every value in the joined dict comes back
as None.

Override _run_impl on RemoteA2aAgent to mirror LlmAgent: join the
non-thought, non-function-call/response text parts of each yielded
event into event.output and set message_as_output=True. Partial,
foreign-author, and input-required (mock function call) events are
skipped.

The v2 A2A response handler delegates to converters that do not mark
streaming working-state text as thought=True, so the prior fix promoted
every non-partial text event to event.output. NodeRunner sets ctx.output
from the first one and raises "Output already set" on the next,
breaking streaming RemoteA2aAgent workflow nodes before they reach the
real final answer.

Skip events whose A2A task state is submitted, working, input-required,
auth-required, or unknown (read from custom_metadata['a2a:response']),
and short-circuit further promotion in _run_impl after the first
terminal event so trailing artifact updates on a completed task don't
trigger the double-set.
Harineko0 changed the title from "Fix/remote a2a agent workflow output" to "fix(a2a): Promote RemoteA2aAgent response to workflow node output" May 26, 2026
adk-bot added the "core" label ([Component] This issue is related to the core interface and implementation) May 26, 2026
rohityan self-assigned this May 26, 2026
rohityan added the "request clarification" label ([Status] The maintainer need clarification or more information from the author) May 26, 2026