Fix proxy container UID/GID mismatch causing MCP log write failures #35069

Open
Copilot wants to merge 3 commits into main from copilot/fix-difc-cli-proxy-permissions

Conversation

Copilot AI commented May 26, 2026

When DIFC/CLI proxy mode is enabled, proxy containers were started as root and created /tmp/gh-aw/mcp-logs/rpc-messages.jsonl with root ownership. MCP Gateway runs as the runner UID/GID, so log initialization could fail with permission denied and produce zero-byte telemetry files that fail post-step parsing.

  • What changed

    • DIFC proxy startup (actions/setup/sh/start_difc_proxy.sh)
      • Run awmg-proxy container as the current runner user.
    • CLI proxy startup (actions/setup/sh/start_cli_proxy.sh)
      • Run awmg-cli-proxy container as the current runner user.
  • Effect

    • Aligns file ownership for shared /tmp/gh-aw/mcp-logs artifacts across proxy and MCP Gateway containers.
    • Prevents root-owned rpc-messages.jsonl from breaking MCP telemetry parsing.
docker run -d --name awmg-proxy --network host \
  --user "$(id -u):$(id -g)" \
  -e GH_TOKEN \
  ...
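
A quick local check of the effect described above could test whether the shared log file is owned by the user that later appends to it. This is a minimal sketch, not code from the PR: the helper name `check_log_ownership` is hypothetical, and it assumes a shell whose `test` supports `-O` (bash and dash both do).

```shell
# Hypothetical helper (not part of this PR): report whether a log file is
# owned by the current effective user, which is what a process running as
# the runner UID needs in order to append to it.
check_log_ownership() {
  local path="$1"
  if [ ! -e "$path" ]; then
    echo "missing: $path"
    return 2
  fi
  # -O is true when the file exists and is owned by the effective user ID.
  if [ -O "$path" ]; then
    echo "ok: $path is owned by uid $(id -u)"
    return 0
  fi
  echo "mismatch: $path is not owned by uid $(id -u)"
  return 1
}
```

Run as the runner user, `check_log_ownership /tmp/gh-aw/mcp-logs/rpc-messages.jsonl` would be expected to report `ok` once both proxies start with `--user`, and `mismatch` if a root-owned file is still present.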


✨ PR Review Safe Output Test - Run 26482687149

Warning

Firewall blocked 6 domains

The following domains were blocked by the firewall during workflow execution:

  • accounts.google.com
  • android.clients.google.com
  • clients2.google.com
  • contentautofill.googleapis.com
  • safebrowsingohttpgateway.googleapis.com
  • www.google.com

To allow these domains, add them to the network.allowed list in your workflow frontmatter:

network:
  allowed:
    - defaults
    - "accounts.google.com"
    - "android.clients.google.com"
    - "clients2.google.com"
    - "contentautofill.googleapis.com"
    - "safebrowsingohttpgateway.googleapis.com"
    - "www.google.com"

See Network Configuration for more information.

💥 [THE END] — Illustrated by Smoke Claude · opus47 4.9M ·

Copilot AI and others added 2 commits May 26, 2026 22:22
Co-authored-by: lpcox <15877973+lpcox@users.noreply.github.com>
Co-authored-by: lpcox <15877973+lpcox@users.noreply.github.com>
Copilot AI changed the title from "[WIP] Fix DIFC/CLI proxy containers running as root causing permission denied" to "Fix proxy container UID/GID mismatch causing MCP log write failures" May 26, 2026
Copilot finished work on behalf of lpcox May 26, 2026 22:24
Copilot AI requested a review from lpcox May 26, 2026 22:24
@lpcox lpcox marked this pull request as ready for review May 26, 2026 23:57
Copilot AI review requested due to automatic review settings May 26, 2026 23:57
Copilot AI left a comment

Pull request overview

Aligns DIFC and CLI proxy container UIDs/GIDs with the host runner so the shared /tmp/gh-aw/mcp-logs/rpc-messages.jsonl log file isn't root-owned, preventing MCP Gateway permission failures and zero-byte telemetry artifacts.

Changes:

  • Add --user "$(id -u):$(id -g)" to docker run in start_difc_proxy.sh.
  • Add --user "$(id -u):$(id -g)" to docker run in start_cli_proxy.sh.
  • Minor gofmt-style realignment of struct field values in codex_engine.go.

Show a summary per file

| File | Description |
| --- | --- |
| actions/setup/sh/start_difc_proxy.sh | Run DIFC proxy container as runner UID/GID. |
| actions/setup/sh/start_cli_proxy.sh | Run CLI proxy container as runner UID/GID. |
| pkg/workflow/codex_engine.go | Formatting-only alignment changes for CODEX_HOME / RUST_LOG. |

Copilot's findings

  • Files reviewed: 2/3 changed files
  • Comments generated: 0

@lpcox lpcox added the smoke label May 27, 2026

github-actions Bot commented May 27, 2026

🧪 Test Quality Sentinel completed test quality analysis.

No test files were added or modified in this PR (PR #35069). Changed files: actions/setup/sh/start_cli_proxy.sh, actions/setup/sh/start_difc_proxy.sh, pkg/workflow/codex_engine.go. Test Quality Sentinel skipped.

github-actions Bot commented May 27, 2026

🧠 Matt Pocock Skills Reviewer has completed the skills-based review. ✅

github-actions Bot commented May 27, 2026

Design Decision Gate 🏗️ completed the design decision gate check.

No ADR enforcement needed: PR #35069 does not have the 'implementation' label and has only 2 new lines of code in default business logic directories (threshold: 100).

github-actions Bot commented May 27, 2026

PR Code Quality Reviewer completed the code quality review.

github-actions Bot commented May 27, 2026

✨ The prophecy is fulfilled... Smoke Codex has completed its mystical journey. The stars align. 🌟

@github-actions
📰 BREAKING: Smoke Copilot is now investigating this pull request. Sources say the story is developing...

@github-actions github-actions Bot removed the smoke label May 27, 2026

github-actions Bot commented May 27, 2026

🚀 Smoke Pi MISSION COMPLETE! Pi delivered. 🥧

github-actions Bot commented May 27, 2026

🚀 Smoke Antigravity MISSION COMPLETE! Antigravity has spoken. ✨

github-actions Bot commented May 27, 2026

🚀 Smoke Gemini MISSION COMPLETE! Gemini has spoken. ✨

github-actions Bot commented May 27, 2026

🎬 THE END: Smoke Claude MISSION: ACCOMPLISHED! The hero saves the day! ✨

@github-actions github-actions Bot mentioned this pull request May 27, 2026
@github-actions
Agent Container Tool Check

| Tool | Status / Version |
| --- | --- |
| bash | 5.2.21 |
| sh | available |
| git | 2.54.0 |
| jq | 1.7 |
| yq | 4.53.2 |
| curl | 8.5.0 |
| gh | 2.92.0 |
| node | 22.22.3 |
| python3 | 3.14.5 |
| go | 1.24.13 |
| java | 10.0.300 |
| dotnet | missing |

Result: 11/12 tools available — FAIL (dotnet missing)

🔧 Tool validation by Agent Container Smoke Test · sonnet46 505K ·

@github-actions
Smoke Test Results

  • GitHub MCP Testing: ✅
  • Web Fetch Testing: ✅
  • File Writing Testing: ✅
  • Bash Tool Testing: ✅
  • Build gh-aw: ❌

Overall Status: FAIL

Warning

Firewall blocked 1 domain

The following domain was blocked by the firewall during workflow execution:

  • localhost

To allow this domain, add it to the network.allowed list in your workflow frontmatter:

network:
  allowed:
    - defaults
    - "localhost"

See Network Configuration for more information.

Smoke Gemini — Powered by Gemini ·

@github-actions github-actions Bot left a comment

The core fix is correct and sound for the primary use case (GitHub-hosted runners running as a non-root runner user). One non-blocking concern worth resolving before this ships to self-hosted runners.

### Findings

Arbitrary-UID container compatibility (medium): --user $(id -u):$(id -g) passes the host UID directly into the container. This is safe only if both proxy images are built to tolerate UIDs that have no /etc/passwd entry. Processes that call getpwuid() (TLS libs, Rust stdlib, Node) fail with ENOENT when the UID is missing, which could silently break TLS cert generation and cause the post-start health check to time out. See inline comment for mitigations.

codex_engine.go change — whitespace alignment swap only; no logic concern.

🔎 Code quality review by PR Code Quality Reviewer · sonnet46 1.2M

fi

docker run -d --name awmg-cli-proxy --network host \
--user "$(id -u):$(id -g)" \

Host UID may not exist in the container image, risking getpwuid() failures during TLS initialization.

💡 Details and suggested mitigation

Passing --user $(id -u):$(id -g) injects the host runner UID/GID into the container. If that UID is absent from the container image's /etc/passwd, processes that call getpwuid() — including TLS libraries (OpenSSL, rustls) and many Rust/Node runtimes — receive ENOENT and can silently fail or crash during startup. The health-check loop that follows relies on the TLS cert being written (--tls --tls-dir); a crash before cert generation would cause the job to time out with a misleading error.

This pattern is only safe if the container image is explicitly built to tolerate arbitrary UIDs (e.g. home directory and write paths are world-writable, as recommended by the [OpenShift arbitrary-UID guidance](docs.openshift.com/redacted)).

Suggested mitigations (pick one):

  1. Confirm and document that both proxy images support arbitrary UIDs.
  2. If not, use a fixed non-root UID defined in the image Dockerfile rather than passing the host UID.
  3. Add a post-start check: docker exec awmg-cli-proxy id to surface this failure early.

Note: identical concern applies to start_difc_proxy.sh line 43.
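
Mitigation 1 could be backed by a check that the runner UID actually has a passwd entry in the image. The sketch below only inspects a passwd-format file; `uid_has_passwd_entry` is a hypothetical name, and how the image's /etc/passwd gets copied out is left open because it depends on which tools the proxy images ship.

```shell
# Hypothetical helper (not part of this PR): succeed if the given UID has an
# entry in a passwd-format file (colon-separated fields, UID in field 3).
uid_has_passwd_entry() {
  local uid="$1"
  local passwd_file="${2:-/etc/passwd}"
  # Exit status 0 when a matching line was seen, 1 otherwise.
  awk -F: -v uid="$uid" '$3 == uid { found = 1 } END { exit !found }' "$passwd_file"
}
```

For example, `uid_has_passwd_entry "$(id -u)" ./proxy-passwd` returns non-zero when the runner UID is absent from a copied passwd file (the `./proxy-passwd` path is illustrative). A missing entry does not prove the proxy will fail; it only flags that the UID cannot be resolved to a user name or home directory inside the container.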

@github-actions github-actions Bot left a comment

Skills-Based Review 🧠

Applied /diagnose and /zoom-out — approving with minor observations.

📋 Key Themes & Highlights

Key Themes

  • Root cause properly addressed: Adding --user "$(id -u):$(id -g)" to both proxy docker run invocations is the correct fix. File ownership now aligns with the MCP Gateway runner UID/GID.
  • Missing regression test: No test guards against this flag being dropped in future; the start_mcp_gateway_test.sh pattern shows a precedent for pairing proxy scripts with shell tests.
  • Unrelated formatting change: The codex_engine.go hunk is a pure alignment swap — harmless but adds diff noise to an otherwise tightly-scoped fix.

Positive Highlights

  • ✅ Minimal, surgical fix — exactly one line added per affected script
  • ✅ Consistent treatment: both start_difc_proxy.sh and start_cli_proxy.sh patched symmetrically
  • ✅ PR description is clear and includes a concrete docker run example
  • ✅ No logic changes; low regression risk

🧠 Reviewed using Matt Pocock's skills by Matt Pocock Skills Reviewer · sonnet46 1.4M

echo "Starting DIFC proxy container: $CONTAINER_IMAGE"

docker run -d --name awmg-proxy --network host \
--user "$(id -u):$(id -g)" \

[/diagnose] No regression test accompanies this fix. The failure mode (root-owned rpc-messages.jsonl → permission-denied on subsequent writes) is concrete and reproducible.

💡 Consider a shell-level smoke test

Other scripts in this directory (e.g., start_mcp_gateway_test.sh) show a pattern of pairing start_*.sh with a *_test.sh. A lightweight start_difc_proxy_test.sh could assert the --user flag is present in the script, guarding against accidental removal in future refactors.
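
A rough sketch of such a guard, assuming a plain text match on the script is enough: `assert_user_flag` is a hypothetical helper, not an existing function in this repository, and it checks only that the flag appears in the file, not that docker honors it.

```shell
# Hypothetical test helper (not part of this PR): fail if a proxy start script
# no longer passes the runner UID/GID to docker run.
assert_user_flag() {
  local script="$1"
  # -F matches the flag as a fixed string, so $( ) and quotes stay literal.
  # -e keeps grep from parsing the leading "--" as one of its own options.
  if grep -qF -e '--user "$(id -u):$(id -g)"' "$script"; then
    echo "PASS: $script passes --user"
    return 0
  fi
  echo "FAIL: $script is missing --user \"\$(id -u):\$(id -g)\""
  return 1
}
```

A `start_difc_proxy_test.sh` along these lines might call `assert_user_flag actions/setup/sh/start_difc_proxy.sh` and the same for `start_cli_proxy.sh`, exiting non-zero if either call fails.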

// Keep Codex runtime state in /tmp/gh-aw because ${RUNNER_TEMP}/gh-aw is
// mounted read-only inside the AWF chroot sandbox.
"CODEX_HOME": "/tmp/gh-aw/mcp-config",
"CODEX_HOME": "/tmp/gh-aw/mcp-config",

[/zoom-out] This alignment change is unrelated to the stated bug fix and adds noise to the diff. The PR title and body describe a proxy UID/GID fix — a formatting-only change to codex_engine.go is unexpected here.

Consider reverting this hunk or splitting it into a separate formatting PR so the diff stays focused on the permission fix.

@github-actions
Smoke test complete: FAIL

  • GitHub MCP PR lookup: ✅
  • Serena symbol lookup: ✅
  • Playwright GitHub title check: ✅
  • Web-fetch MCP: ❌ unavailable
  • Build: ✅

Summary issue created for the run.

🔮 The oracle has spoken through Smoke Codex · gpt54 8.7M ·

@github-actions
Comment Memory

Quiet checks converge
Build, browser, and memory
Signals hold steady

Note

This comment is managed by comment memory.

It stores persistent context for this thread in the code block at the top of this comment.
Edit only the text inside the backtick fences; workflow metadata and the footer are regenerated automatically.

Learn more about comment memory

🔮 The oracle has spoken through Smoke Codex · gpt54 8.7M ·

@github-actions
Smoke Test: Claude — Run §26482687149

Overall: ✅ PARTIAL (1 skipped)

💥 [THE END] — Illustrated by Smoke Claude · opus47 4.9M ·

@github-actions github-actions Bot left a comment

💥 Automated smoke test review - all systems nominal!

💥 [THE END] — Illustrated by Smoke Claude · opus47 4.9M

fi

docker run -d --name awmg-cli-proxy --network host \
--user "$(id -u):$(id -g)" \

Nice fix — running the proxy container as the runner UID/GID should keep MCP log file permissions consistent with the host runner. Consider also documenting this in a short comment above the flag so the intent survives future refactors.

echo "Starting DIFC proxy container: $CONTAINER_IMAGE"

docker run -d --name awmg-proxy --network host \
--user "$(id -u):$(id -g)" \

Good symmetry with start_cli_proxy.sh. Worth a follow-up: factor the --user "$(id -u):$(id -g)" flag into a shared helper to avoid drift between the two proxy launch scripts.
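
One possible shape for that helper, as a sketch only: `runner_uid_gid` is a hypothetical name, and where it would live (for example a small file sourced by both scripts) is an assumption rather than something this PR defines.

```shell
# Hypothetical shared helper (not part of this PR): print the current user's
# numeric UID and GID in the "uid:gid" form that docker run --user accepts.
runner_uid_gid() {
  printf '%s:%s\n' "$(id -u)" "$(id -g)"
}
```

Each launch script would then pass `--user "$(runner_uid_gid)"`, so a later change to how the identity is derived happens in one place.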

@github-actions
Smoke Copilot 26482687087: FAIL
PRs: Clarify .lock.yml purpose and edit model in Quick Start Step 2; Reduce BenchmarkValidation latency by caching permission-scope validation
Results: ✅ Serena, Playwright, file/bash, discussion comment, build/artifact, dispatch, review, sub-agent, check run; ❌ GitHub MCP, mcpscripts-gh, web-fetch, discussion label/temp-id, comment-memory
Author: app/copilot-swe-agent
Assignees: lpcox, Copilot

📰 BREAKING: Report filed by Smoke Copilot · gpt55 4.4M ·

@github-actions github-actions Bot left a comment

Smoke review completed: inline comments note the proxy UID/GID ownership alignment.

📰 BREAKING: Report filed by Smoke Copilot · gpt55 4.4M


docker run -d --name awmg-cli-proxy --network host \
--user "$(id -u):$(id -g)" \
-e GH_TOKEN \

Smoke review: running the CLI proxy container as the runner UID/GID should keep shared MCP log ownership aligned with the gateway process.


docker run -d --name awmg-proxy --network host \
--user "$(id -u):$(id -g)" \
-e GH_TOKEN \

Smoke review: this mirrors the ownership fix for the DIFC proxy path, which helps prevent root-owned files in the shared log directory.

@github-actions
📰 VERDICT: Smoke Copilot has concluded. All systems operational. This is a developing story. 🎤

Development

Successfully merging this pull request may close these issues.

fix: DIFC/CLI proxy containers run as root, causing permission denied on shared rpc-messages.jsonl

3 participants