Fix proxy container UID/GID mismatch causing MCP log write failures#35069
Fix proxy container UID/GID mismatch causing MCP log write failures#35069Copilot wants to merge 3 commits into
Conversation
Co-authored-by: lpcox <15877973+lpcox@users.noreply.github.com>
Co-authored-by: lpcox <15877973+lpcox@users.noreply.github.com>
There was a problem hiding this comment.
Pull request overview
Aligns DIFC and CLI proxy container UIDs/GIDs with the host runner so the shared /tmp/gh-aw/mcp-logs/rpc-messages.jsonl log file isn't root-owned, preventing MCP Gateway permission failures and zero-byte telemetry artifacts.
Changes:
- Add
--user "$(id -u):$(id -g)"todocker runinstart_difc_proxy.sh. - Add
--user "$(id -u):$(id -g)"todocker runinstart_cli_proxy.sh. - Minor gofmt-style realignment of struct field values in
codex_engine.go.
Show a summary per file
| File | Description |
|---|---|
| actions/setup/sh/start_difc_proxy.sh | Run DIFC proxy container as runner UID/GID. |
| actions/setup/sh/start_cli_proxy.sh | Run CLI proxy container as runner UID/GID. |
| pkg/workflow/codex_engine.go | Formatting-only alignment changes for CODEX_HOME / RUST_LOG. |
Copilot's findings
Tip
Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
- Files reviewed: 2/3 changed files
- Comments generated: 0
|
🧪 Test Quality Sentinel completed test quality analysis. No test files were added or modified in this PR (PR #35069). Changed files: actions/setup/sh/start_cli_proxy.sh, actions/setup/sh/start_difc_proxy.sh, pkg/workflow/codex_engine.go. Test Quality Sentinel skipped. |
|
🧠 Matt Pocock Skills Reviewer has completed the skills-based review. ✅ |
|
✅ Design Decision Gate 🏗️ completed the design decision gate check. No ADR enforcement needed: PR #35069 does not have the 'implementation' label and has only 2 new lines of code in default business logic directories (threshold: 100). |
|
✅ PR Code Quality Reviewer completed the code quality review. |
|
✨ The prophecy is fulfilled... Smoke Codex has completed its mystical journey. The stars align. 🌟 |
|
📰 BREAKING: Smoke Copilot is now investigating this pull request. Sources say the story is developing... |
|
🚀 Smoke Pi MISSION COMPLETE! Pi delivered. 🥧 |
|
🚀 Smoke Antigravity MISSION COMPLETE! Antigravity has spoken. ✨ |
|
🚀 Smoke Gemini MISSION COMPLETE! Gemini has spoken. ✨ |
|
🎬 THE END — Smoke Claude MISSION: ACCOMPLISHED! The hero saves the day! ✨ |
Agent Container Tool Check
Result: 11/12 tools available — FAIL (dotnet missing)
|
Smoke Test Results
Overall Status: FAIL Warning Firewall blocked 1 domainThe following domain was blocked by the firewall during workflow execution:
network:
allowed:
- defaults
- "localhost"See Network Configuration for more information.
|
There was a problem hiding this comment.
The core fix is correct and sound for the primary use case (GitHub-hosted runners running as a non-root runner user). One non-blocking concern worth resolving before this ships to self-hosted runners.
### Findings
Arbitrary-UID container compatibility (medium) — --user $(id -u):$(id -g) passes the host UID directly into the container. This is safe only if both proxy images are built to tolerate UIDs that have no /etc/passwd entry. Processes that call getpwuid() (TLS libs, Rust stdlib, Node) fail with ENOENT when the UID is missing, which could silently break TLS cert generation and cause the post-start health check to time out. See inline comment for mitigations.
codex_engine.go change — whitespace alignment swap only; no logic concern.
🔎 Code quality review by PR Code Quality Reviewer · sonnet46 1.2M
| fi | ||
|
|
||
| docker run -d --name awmg-cli-proxy --network host \ | ||
| --user "$(id -u):$(id -g)" \ |
There was a problem hiding this comment.
Host UID may not exist in the container image, risking getpwuid() failures during TLS initialization.
💡 Details and suggested mitigation
Passing --user $(id -u):$(id -g) injects the host runner UID/GID into the container. If that UID is absent from the container image's /etc/passwd, processes that call getpwuid() — including TLS libraries (OpenSSL, rustls) and many Rust/Node runtimes — receive ENOENT and can silently fail or crash during startup. The health-check loop that follows relies on the TLS cert being written (--tls --tls-dir); a crash before cert generation would cause the job to time out with a misleading error.
This pattern is only safe if the container image is explicitly built to tolerate arbitrary UIDs (e.g. home directory and write paths are world-writable, as recommended by the [OpenShift arbitrary-UID guidance]((docs.openshift.com/redacted)
Suggested mitigations (pick one):
- Confirm and document that both proxy images support arbitrary UIDs.
- If not, use a fixed non-root UID defined in the image Dockerfile rather than passing the host UID.
- Add a post-start check:
docker exec awmg-cli-proxy idto surface this failure early.
Note: identical concern applies to start_difc_proxy.sh line 43.
There was a problem hiding this comment.
Skills-Based Review 🧠
Applied /diagnose and /zoom-out — approving with minor observations.
📋 Key Themes & Highlights
Key Themes
- Root cause properly addressed: Adding
--user "$(id -u):$(id -g)"to both proxydocker runinvocations is the correct fix. File ownership now aligns with the MCP Gateway runner UID/GID. - Missing regression test: No test guards against this flag being dropped in future; the
start_mcp_gateway_test.shpattern shows a precedent for pairing proxy scripts with shell tests. - Unrelated formatting change: The
codex_engine.gohunk is a pure alignment swap — harmless but adds diff noise to an otherwise tightly-scoped fix.
Positive Highlights
- ✅ Minimal, surgical fix — exactly one line added per affected script
- ✅ Consistent treatment: both
start_difc_proxy.shandstart_cli_proxy.shpatched symmetrically - ✅ PR description is clear and includes a concrete
docker runexample - ✅ No logic changes; low regression risk
🧠 Reviewed using Matt Pocock's skills by Matt Pocock Skills Reviewer · sonnet46 1.4M
| echo "Starting DIFC proxy container: $CONTAINER_IMAGE" | ||
|
|
||
| docker run -d --name awmg-proxy --network host \ | ||
| --user "$(id -u):$(id -g)" \ |
There was a problem hiding this comment.
[/diagnose] No regression test accompanies this fix. The failure mode (root-owned rpc-messages.jsonl → permission-denied on subsequent writes) is concrete and reproducible.
💡 Consider a shell-level smoke test
Other scripts in this directory (e.g., start_mcp_gateway_test.sh) show a pattern of pairing start_*.sh with a *_test.sh. A lightweight start_difc_proxy_test.sh could assert the --user flag is present in the script, guarding against accidental removal in future refactors.
| // Keep Codex runtime state in /tmp/gh-aw because ${RUNNER_TEMP}/gh-aw is | ||
| // mounted read-only inside the AWF chroot sandbox. | ||
| "CODEX_HOME": "/tmp/gh-aw/mcp-config", | ||
| "CODEX_HOME": "/tmp/gh-aw/mcp-config", |
There was a problem hiding this comment.
[/zoom-out] This alignment change is unrelated to the stated bug fix and adds noise to the diff. The PR title and body describe a proxy UID/GID fix — a formatting-only change to codex_engine.go is unexpected here.
Consider reverting this hunk or splitting it into a separate formatting PR so the diff stays focused on the permission fix.
|
Smoke test complete: FAIL
Summary issue created for the run. Warning Firewall blocked 6 domainsThe following domains were blocked by the firewall during workflow execution:
network:
allowed:
- defaults
- "accounts.google.com"
- "android.clients.google.com"
- "clients2.google.com"
- "contentautofill.googleapis.com"
- "safebrowsingohttpgateway.googleapis.com"
- "www.google.com"See Network Configuration for more information.
|
Comment MemoryNote This comment is managed by comment memory.It stores persistent context for this thread in the code block at the top of this comment. Warning Firewall blocked 6 domainsThe following domains were blocked by the firewall during workflow execution:
network:
allowed:
- defaults
- "accounts.google.com"
- "android.clients.google.com"
- "clients2.google.com"
- "contentautofill.googleapis.com"
- "safebrowsingohttpgateway.googleapis.com"
- "www.google.com"See Network Configuration for more information.
|
Smoke Test: Claude — Run §26482687149Overall: ✅ PARTIAL (1 skipped)
Warning Firewall blocked 6 domainsThe following domains were blocked by the firewall during workflow execution:
network:
allowed:
- defaults
- "accounts.google.com"
- "android.clients.google.com"
- "clients2.google.com"
- "contentautofill.googleapis.com"
- "safebrowsingohttpgateway.googleapis.com"
- "www.google.com"See Network Configuration for more information.
|
There was a problem hiding this comment.
💥 Automated smoke test review - all systems nominal!
Warning
Firewall blocked 6 domains
The following domains were blocked by the firewall during workflow execution:
accounts.google.comandroid.clients.google.comclients2.google.comcontentautofill.googleapis.comsafebrowsingohttpgateway.googleapis.comwww.google.com
To allow these domains, add them to the
network.allowedlist in your workflow frontmatter:
network:
allowed:
- defaults
- "accounts.google.com"
- "android.clients.google.com"
- "clients2.google.com"
- "contentautofill.googleapis.com"
- "safebrowsingohttpgateway.googleapis.com"
- "www.google.com"See Network Configuration for more information.
💥 [THE END] — Illustrated by Smoke Claude · opus47 4.9M
| fi | ||
|
|
||
| docker run -d --name awmg-cli-proxy --network host \ | ||
| --user "$(id -u):$(id -g)" \ |
There was a problem hiding this comment.
Nice fix — running the proxy container as the runner UID/GID should keep MCP log file permissions consistent with the host runner. Consider also documenting this in a short comment above the flag so the intent survives future refactors.
| echo "Starting DIFC proxy container: $CONTAINER_IMAGE" | ||
|
|
||
| docker run -d --name awmg-proxy --network host \ | ||
| --user "$(id -u):$(id -g)" \ |
There was a problem hiding this comment.
Good symmetry with start_cli_proxy.sh. Worth a follow-up: factor the --user "$(id -u):$(id -g)" flag into a shared helper to avoid drift between the two proxy launch scripts.
|
Smoke Copilot 26482687087: FAIL Warning Firewall blocked 6 domainsThe following domains were blocked by the firewall during workflow execution:
network:
allowed:
- defaults
- "accounts.google.com"
- "android.clients.google.com"
- "clients2.google.com"
- "contentautofill.googleapis.com"
- "safebrowsingohttpgateway.googleapis.com"
- "www.google.com"See Network Configuration for more information.
|
There was a problem hiding this comment.
Smoke review completed: inline comments note the proxy UID/GID ownership alignment.
Warning
Firewall blocked 6 domains
The following domains were blocked by the firewall during workflow execution:
accounts.google.comandroid.clients.google.comclients2.google.comcontentautofill.googleapis.comsafebrowsingohttpgateway.googleapis.comwww.google.com
To allow these domains, add them to the
network.allowedlist in your workflow frontmatter:
network:
allowed:
- defaults
- "accounts.google.com"
- "android.clients.google.com"
- "clients2.google.com"
- "contentautofill.googleapis.com"
- "safebrowsingohttpgateway.googleapis.com"
- "www.google.com"See Network Configuration for more information.
📰 BREAKING: Report filed by Smoke Copilot · gpt55 4.4M
|
|
||
| docker run -d --name awmg-cli-proxy --network host \ | ||
| --user "$(id -u):$(id -g)" \ | ||
| -e GH_TOKEN \ |
There was a problem hiding this comment.
Smoke review: running the CLI proxy container as the runner UID/GID should keep shared MCP log ownership aligned with the gateway process.
|
|
||
| docker run -d --name awmg-proxy --network host \ | ||
| --user "$(id -u):$(id -g)" \ | ||
| -e GH_TOKEN \ |
There was a problem hiding this comment.
Smoke review: this mirrors the ownership fix for the DIFC proxy path, which helps prevent root-owned files in the shared log directory.
|
📰 VERDICT: Smoke Copilot has concluded. All systems operational. This is a developing story. 🎤 |
When DIFC/CLI proxy mode is enabled, proxy containers were started as root and created
/tmp/gh-aw/mcp-logs/rpc-messages.jsonlwith root ownership. MCP Gateway runs as the runner UID/GID, so log initialization could fail with permission denied and produce zero-byte telemetry files that fail post-step parsing.What changed
actions/setup/sh/start_difc_proxy.sh)awmg-proxycontainer as the current runner user.actions/setup/sh/start_cli_proxy.sh)awmg-cli-proxycontainer as the current runner user.Effect
/tmp/gh-aw/mcp-logsartifacts across proxy and MCP Gateway containers.rpc-messages.jsonlfrom breaking MCP telemetry parsing.docker run -d --name awmg-proxy --network host \ --user "$(id -u):$(id -g)" \ -e GH_TOKEN \ ...✨ PR Review Safe Output Test - Run 26482687149
Warning
Firewall blocked 6 domains
The following domains were blocked by the firewall during workflow execution:
accounts.google.comandroid.clients.google.comclients2.google.comcontentautofill.googleapis.comsafebrowsingohttpgateway.googleapis.comwww.google.comSee Network Configuration for more information.