[codex] Prevent MCP tool metadata hangs on malformed responses #110

Open
zoeshawwang wants to merge 1 commit into main from codex/mcp-malformed-error-timeout

Conversation

@zoeshawwang
Collaborator

Summary

Fixes an AgentRun SDK hang where ToolResource MCP metadata loading can wait indefinitely when the MCP Python transport logs a malformed JSON-RPC response, for example an error payload with error.message = null.

Aone: https://project.aone.alibaba-inc.com/v2/project/2139638/req/82638110

Root Cause

The MCP Python streamable HTTP transport can surface malformed JSON-RPC response parsing as an Exception on the read stream. The default ClientSession handler does not route that exception back to the pending initialize or list_tools request, so SDK callers can keep awaiting forever.
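
The failure mode can be modelled with plain asyncio. This is a minimal sketch, not the MCP client's actual code: `pending_response`, `read_stream`, and the drop-on-exception handler are hypothetical stand-ins for the pending request, the transport read stream, and the default handler described above. The short `wait_for` exists only so the demo terminates.

```python
import asyncio


async def demo_unrouted_error() -> str:
    """Model a pending request whose error is delivered on a side channel only."""
    loop = asyncio.get_running_loop()

    # Hypothetical stand-in for the pending initialize/list_tools request.
    pending_response: asyncio.Future = loop.create_future()

    # Hypothetical stand-in for the transport's read stream.
    read_stream: asyncio.Queue = asyncio.Queue()

    # The transport cannot parse the response, so it surfaces an Exception
    # on the read stream instead of a response message.
    await read_stream.put(ValueError("malformed JSON-RPC error: message is null"))

    # Assumed handler behaviour: the exception is consumed (e.g. logged) but
    # never routed to pending_response, so the future is never resolved.
    item = await read_stream.get()
    if isinstance(item, Exception):
        pass

    # Without a bound, awaiting pending_response would never return.
    try:
        await asyncio.wait_for(pending_response, timeout=0.1)
        return "resolved"
    except asyncio.TimeoutError:
        return "still pending"
```

Running `asyncio.run(demo_unrouted_error())` shows that the request is still unresolved after the error has already been consumed, which is the hang SDK callers observe.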

Changes

  • Bound MCP metadata operations (initialize and list_tools) with a 30s timeout so agent creation cannot hang indefinitely on malformed or silent MCP responses.
  • Bound MCP tool invocation with Config.timeout so tool calls also fail instead of waiting forever.
  • Added unit coverage for metadata timeout and tool-call timeout behavior.
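
The bounding described in the list above can be sketched with `asyncio.wait_for`. This is an illustration under stated assumptions rather than the code in `agentrun/tool/api/mcp.py`: `FakeSilentSession`, `load_tool_metadata`, and `METADATA_TIMEOUT_SECONDS` are hypothetical names, and applying the bound to each call separately (rather than once across both) is an assumption.

```python
import asyncio

# Assumed constant name; the 30s value comes from the change description.
METADATA_TIMEOUT_SECONDS = 30.0


class FakeSilentSession:
    """Hypothetical stand-in for an MCP session whose server never answers."""

    async def initialize(self) -> None:
        # The event is never set, so this await never completes on its own.
        await asyncio.Event().wait()

    async def list_tools(self) -> list:
        return []


async def load_tool_metadata(session, timeout: float = METADATA_TIMEOUT_SECONDS) -> list:
    """Run initialize, then list_tools, each bounded by `timeout` seconds."""
    try:
        await asyncio.wait_for(session.initialize(), timeout=timeout)
        return await asyncio.wait_for(session.list_tools(), timeout=timeout)
    except asyncio.TimeoutError as exc:
        # Re-raise as the builtin TimeoutError so callers see one exception
        # type regardless of Python version.
        raise TimeoutError(f"MCP metadata operation exceeded {timeout}s") from exc
```

A unit test in the same spirit passes a very small `timeout` with a session that never responds and asserts that `TimeoutError` is raised instead of the call hanging. Tool invocation can follow the same pattern with the configured timeout in place of the fixed 30s.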

Validation

  • uv run ruff check agentrun/tool/api/mcp.py tests/unittests/tool/test_mcp.py
  • uv run pytest tests/unittests/tool/test_mcp.py -q
  • uv run pytest tests/unittests/tool -q
  • git diff --check

Notes

The MCP service should still be fixed to return a valid JSON-RPC error with a string error.message; this SDK change prevents the client-side hang while preserving that server-side requirement.
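
For the server-side requirement, a minimal shape check is sketched below. It assumes only the JSON-RPC 2.0 rules that an error object carries an integer `code` and a string `message`; the helper name `is_valid_jsonrpc_error` is hypothetical and is not part of the SDK or the MCP library.

```python
from typing import Any


def is_valid_jsonrpc_error(response: Any) -> bool:
    """Check the minimal JSON-RPC 2.0 error-response shape."""
    if not isinstance(response, dict) or response.get("jsonrpc") != "2.0":
        return False

    error = response.get("error")
    if not isinstance(error, dict):
        return False

    code = error.get("code")
    # bool is a subclass of int in Python, so exclude it explicitly.
    if not isinstance(code, int) or isinstance(code, bool):
        return False

    # A null message (None after JSON decoding) is the malformed case
    # described in the summary.
    return isinstance(error.get("message"), str)
```

With this check, a payload such as `{"jsonrpc": "2.0", "id": 1, "error": {"code": -32603, "message": None}}` is rejected, while the same payload with a string message is accepted.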

Constraint: MCP Python client can log malformed JSON-RPC errors without waking pending initialize/list_tools awaits.

Rejected: Template-side timeout only | leaves SDK callers exposed to the same hang.

Confidence: high

Scope-risk: narrow

Directive: Keep MCP metadata operations bounded so agent creation cannot wait indefinitely on malformed server responses.

Tested: uv run ruff check agentrun/tool/api/mcp.py tests/unittests/tool/test_mcp.py; uv run pytest tests/unittests/tool/test_mcp.py -q; uv run pytest tests/unittests/tool -q; git diff --check

Not-tested: live MCP server returning malformed JSON-RPC error

Closes: coop#82638110

Change-Id: I20569d10af7ba44c140ab19e446d7fc35870f7ec

2 participants