chore: experiment with Bazel Remote Execution (BRE) on Namespace #10579

Draft
basvandijk wants to merge 57 commits into master from basvandijk/namespace-bazel-remote-execution

basvandijk (Collaborator) commented Jun 26, 2026

Overview

Experiment with Namespace Bazel Remote Execution (BRE): run bazel test on Namespace runners, with actions executed on remote Namespace workers booted from a custom worker image (a mirror of ic-build).

Changes

Extended: .github/workflows/container-autobuild.yml

  • New bre-worker-image job (on the Namespace runner, where nsc is pre-authenticated): mirrors the freshly built ic-build image, by digest, into the Namespace tenant registry as $NSC_CONTAINER_REGISTRY/ic-build-worker (i.e. nscr.io/<tenant>/ic-build-worker) via nsc base-image upload, resolves the pushed digest, and optimizes the image for BRE via nsc base-image optimize. Fork PRs are excluded by the same guard as the test workflow.
  • It is treated as a required job, like ic-build-image: if it fails, that's a bug to fix.
  • The existing update-image-references job now also pins the worker image. It needs bre-worker-image, so its commit/push only happens once the optimize completes; the push re-triggers the workflow and would otherwise cancel the in-flight optimize via cancel-in-progress. It rewrites the pinned ref in bre-namespace-test.yml in the same commit as the ic-build/ic-dev ref and TAG updates. The pin sed is tenant-agnostic, so it self-corrects if the Namespace tenant ever changes.
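
As a rough illustration of the tenant-agnostic pin, the sketch below rewrites any nscr.io/<tenant>/ic-build-worker@sha256:<digest> reference to a new ref. The function name pin_worker_image and the exact pattern are assumptions made for illustration; the actual sed expression in container-autobuild.yml may differ.

```shell
# Hypothetical helper: reads text on stdin and replaces every pinned
# ic-build-worker reference, whatever its tenant, with the ref given in $1.
pin_worker_image() {
  local new_ref="$1"
  # [^/]+ matches any tenant segment, so a ref under an old tenant is
  # overwritten as well.
  sed -E "s#nscr\.io/[^/]+/ic-build-worker@sha256:[0-9a-f]{64}#${new_ref}#g"
}

# Example shape (placeholders left as-is):
#   pin_worker_image "nscr.io/<tenant>/ic-build-worker@sha256:<digest>" < bre-namespace-test.yml
```

Because the tenant segment is matched by a wildcard rather than hardcoded, a ref pinned under a previous tenant would be replaced by the ref from the current one on the next run.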

New workflow: .github/workflows/bre-namespace-test.yml

  • Runs bazel test on a Namespace runner (namespace-profile-amd64-linux-32x64) using BRE. Targets default to //... and are overridable via a workflow_dispatch input.
  • Provisions the RBE cluster with nsc bazel execution setup (writes a bazelrc with the remote executor, cache and credentials — deliberately not printed, since it contains short-lived credentials).
  • Routes actions to the custom worker image via --remote_default_exec_properties=container-image=....
  • Bypasses the DFINITY-internal Bazel cache/RE config (--noworkspace_rc + explicit .bazelrc.build + the Namespace RBE bazelrc), mirroring the existing bazel-test-arm64 job.
  • Excludes long/nightly/fuzz/large-system tests via --test_tag_filters and runs with --keep_going.
  • Opt-in while experimental: workflow_dispatch, pushes to dev-gh-*, or non-fork PRs labeled CI_BRE. Restricted to dfinity/ic; fork PRs are excluded because the job runs on a privileged Namespace runner with pre-authenticated nsc.
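
The bullets above could translate into a command line roughly like the one assembled below. This is a sketch under stated assumptions, not the workflow's actual step: the helper name bre_bazel_args, the location of the Namespace-written bazelrc, the worker-image value, and the tag-filter list are all passed in as parameters because the description does not spell them out. Only the flag names and bazel/conf/.bazelrc.build come from the PR text.

```shell
# Hypothetical helper: prints, one per line, the arguments for a BRE test run.
#   $1 = path of the bazelrc written by the Namespace setup step (assumed)
#   $2 = worker image value for container-image (assumed format)
#   $3 = value for --test_tag_filters (the real exclusions are not listed here)
#   remaining arguments = targets, defaulting to //...
bre_bazel_args() {
  local nsc_bazelrc="$1" worker_image="$2" tag_filters="$3"
  shift 3
  if [ "$#" -eq 0 ]; then
    set -- //...
  fi
  printf '%s\n' \
    --noworkspace_rc \
    "--bazelrc=bazel/conf/.bazelrc.build" \
    "--bazelrc=${nsc_bazelrc}" \
    test \
    "--remote_default_exec_properties=container-image=${worker_image}" \
    "--test_tag_filters=${tag_filters}" \
    --keep_going \
    "$@"
}
```

The startup options (--noworkspace_rc and both --bazelrc flags) come before the test command, so the workspace .bazelrc is skipped and only the two explicitly named rc files are loaded.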

Notes / follow-ups

  • The worker-image ref in the test workflow starts as a placeholder digest; it is populated on the next ic-build rebuild (bump ci/container/TAG).
  • Because bre-worker-image is required, a Namespace/BRE outage would block the production ic-build/ic-dev reference bump — a conscious trade-off (both must succeed).
  • Expect rough edges running //... under BRE (ic-os local-strategy targets, privileged/system tests); the broad --test_tag_filters exclusions and --keep_going reduce noise while iterating.

basvandijk requested a review from Copilot on June 26, 2026 12:28
basvandijk changed the title from "Experiment with Bazel Remote Execution (BRE) on Namespace" to "chore: experiment with Bazel Remote Execution (BRE) on Namespace" on Jun 26, 2026
github-actions Bot added the chore label on Jun 26, 2026

Copilot AI left a comment

Pull request overview

This PR introduces an experimental GitHub Actions path to run bazel test using Namespace Bazel Remote Execution (BRE), including automation to build/mirror/optimize a worker image and keep the workflow pinned to an immutable digest.

Changes:

  • Adds a new experimental workflow to run bazel test on Namespace runners with remote execution enabled.
  • Extends the container autobuild workflow with jobs to mirror ic-build into nscr.io, optimize it for BRE, and automatically update the pinned worker-image digest used by the new workflow.

Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 1 comment.

| File | Description |
| --- | --- |
| .github/workflows/container-autobuild.yml | Adds jobs to create an optimized Namespace BRE worker image and auto-update the pinned digest reference in workflows. |
| .github/workflows/bre-namespace-test.yml | New opt-in workflow to run bazel test using Namespace remote execution with a pinned worker image. |

Comment thread .github/workflows/bre-namespace-test.yml Outdated

Copilot AI left a comment

Pull request overview

Copilot reviewed 2 out of 2 changed files in this pull request and generated 4 comments.

Comment thread .github/workflows/container-autobuild.yml Outdated
Comment thread .github/workflows/container-autobuild.yml Outdated
Comment thread .github/workflows/container-autobuild.yml Outdated
Comment thread .github/workflows/bre-namespace-test.yml Outdated

Copilot AI left a comment

Pull request overview

Copilot reviewed 2 out of 2 changed files in this pull request and generated no new comments.

Add a new 'bre-namespace-test.yml' workflow that runs 'bazel test' on Namespace runners using Bazel Remote Execution (BRE). Actions execute on Namespace workers booted from a custom worker image (a mirror of ic-build).

Extend 'container-autobuild.yml' to mirror the freshly built ic-build image into nscr.io, optimize it for BRE, and pin the resulting digest. These jobs are decoupled from the production image-reference update so an early-access BRE failure can never block it.
Addresses Copilot review: 'nsc bazel execution setup' writes short-lived credentials into the bazelrc, so 'cat'-ing it could leak auth material into the Actions logs.
…update job

- Use the same fully-qualified nscr.io ref for the upload destination and the digest lookup, so the inspected tag is guaranteed to exist.

- Guard bre-worker-image and bre-namespace-test against fork PRs via head.repo.full_name == github.repository (matching ci-kickoff.yml), since both run on privileged Namespace runners with pre-authenticated nsc.

- Drop update-image-references from update-worker-reference's needs for true decoupling; the existing 'git pull --rebase' absorbs any concurrent push.

basvandijk force-pushed the basvandijk/namespace-bazel-remote-execution branch from 27f3f10 to 95ee442 on June 28, 2026 12:15
…ac895bdd550cac7bacb9dad553bae

ic-build: sha256:f4c6c7e0e16da470cba7ebceb0145f588d5fd4859c04acfa607bee475ecfa914

ic-dev:   sha256:2f98d344d708a1ae70938d5e777a1f141f7f2a9545687653f407a405eb1a27ea

github-actions Bot (Contributor) commented Jun 28, 2026

Run URL: https://github.com/dfinity/ic/actions/runs/28442091939

New container images with tag: 22378bb2ad2621b518f4000afdb1ebbe793826b789c9ea988d61e863e46d4d95
ic-build: sha256:e9f95a42acbb5dd96f36d53037129842e16f2ec628ea38f09c9d2404cba2fdff
ic-dev: sha256:cb0b750d7254a4fa280b2f0d0a62ab05649fa9b8c24eafb59bb2dc040fd8dac2
ic-build-worker: nscr.io/c9ptjuknd7oc6/ic-build-worker@sha256:b3209ba49237175d9f4339daa4b0828f1eee0cf9bd8ccd193af7be4a9663d919

update-image-references now depends on bre-worker-image (default needs semantics) and pins the worker digest unconditionally in the same commit, dropping the cancelled()-guard and the separate update-worker-reference job. Waiting on bre-worker-image before committing/pushing avoids the concurrency cancellation seen earlier.

Copilot AI left a comment

Pull request overview

Copilot reviewed 19 out of 19 changed files in this pull request and generated 3 comments.

Comment thread .github/workflows/bre-namespace-test.yml Outdated
Comment thread .github/workflows/bre-namespace-test.yml Outdated
Comment thread .github/workflows/container-autobuild.yml
nsc base-image upload prepends $NSC_CONTAINER_REGISTRY (nscr.io/<tenant>) to a relative name. Passing the fully-qualified nscr.io/dfinity ref caused a double-prefixed push (nscr.io/<tenant>/nscr.io/dfinity/...) and a 401 on the digest lookup. Derive the registry from $NSC_CONTAINER_REGISTRY, upload a relative name, pin the resulting full ref, and match any tenant in the pin sed. Also fixes a stale comment referencing the removed update-worker-reference job.
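
The double prefix can be reproduced with a toy model of the behaviour described above. This is an illustration only, not nsc's code: qualify_image is a hypothetical stand-in, nscr.io/example-tenant is a made-up value for $NSC_CONTAINER_REGISTRY, and the image name after nscr.io/dfinity/ (elided in the commit message) is assumed to be ic-build-worker.

```shell
# Toy model (an assumption, not nsc's implementation): a name is qualified by
# prepending the registry, with no check for an already-present host.
qualify_image() {
  local registry="$1" name="$2"
  printf '%s/%s\n' "$registry" "$name"
}

# Hypothetical tenant registry, standing in for $NSC_CONTAINER_REGISTRY.
registry="nscr.io/example-tenant"

# Buggy call shape: an already fully-qualified ref is prefixed a second time.
qualify_image "$registry" "nscr.io/dfinity/ic-build-worker"
# prints nscr.io/example-tenant/nscr.io/dfinity/ic-build-worker

# Fixed call shape: only the relative name is passed.
qualify_image "$registry" "ic-build-worker"
# prints nscr.io/example-tenant/ic-build-worker
```

Deriving the full ref from the same registry variable for both the upload and the digest lookup keeps the two in agreement, which is what the fix relies on.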
bazel_skylib's copy_file emits a CopyFile spawn tagged no-remote/no-cache
(COPY_EXECUTION_REQUIREMENTS). Under remote execution (Namespace BRE), where
the only available spawn strategy is remote, these actions have no eligible
strategy and fail with:

  CopyFile spawn cannot be executed with any of the available strategies: [remote]

Force the CopyFile/CopyDirectory mnemonics to a local strategy. A per-mnemonic
--strategy overrides --spawn_strategy, so it works regardless of the remote
execution config, and is a no-op for non-remote-execution builds where these
copies already run locally.
The Namespace remote-execution bazelrc makes 'remote' the only spawn strategy,
so spawns that forbid remote execution have no eligible strategy and fail with:

  <Mnemonic> spawn cannot be executed with any of the available strategies: [remote]

This affects e.g. bazel_skylib's copy_file (tagged no-remote) and rules_python's
compile_pip_requirements .test target (tagged no-remote-exec, requires-network).

Pass --spawn_strategy=remote,local on the BRE 'bazel test' command line so Bazel
runs exactly those spawns locally on the runner while everything else still runs
remotely. A command-line flag overrides the remote-only --spawn_strategy from the
Namespace bazelrc.

This supersedes the earlier per-mnemonic --strategy=CopyFile=local workaround, so
revert it from the shared bazelrc.build (it had affected non-BRE builds too).
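
The fallback can be pictured with a small model of strategy selection: walk the comma-separated list in priority order and take the first strategy the spawn is allowed to use. This is a simplified sketch, not Bazel's implementation. pick_strategy is a hypothetical name, only the no-remote and no-remote-exec tags mentioned above are modelled, and restrictions on local execution are ignored.

```shell
# Toy model: $1 is a strategy list such as "remote,local", $2 is the spawn's
# comma-separated tags. Prints the first eligible strategy, or fails.
pick_strategy() {
  local strategies="$1" tags="$2" s
  local IFS=','
  for s in $strategies; do
    case "$s" in
      remote)
        # Spawns tagged no-remote or no-remote-exec may not run remotely.
        case ",$tags," in
          *,no-remote,*|*,no-remote-exec,*) continue ;;
        esac
        ;;
    esac
    echo "$s"
    return 0
  done
  echo "spawn cannot be executed with any of the available strategies: [$strategies]" >&2
  return 1
}

# pick_strategy "remote"       "no-remote"   fails, as in the error above
# pick_strategy "remote,local" "no-remote"   prints local
# pick_strategy "remote,local" ""            prints remote
```

In this model, adding local to the end of the list changes nothing for untagged spawns, which still pick remote first.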
…373e886e00994723ae3de5e53b107

ic-build: sha256:cb929d45e83f893f4b03fde1d596dd1acc3211d367f0b8b0195c13c72ee329de

ic-dev:   sha256:3e9bd73664f66ea9feb41c414731b66d71e4dc9b8febed19f46e01fe098caf60

ic-build-worker: nscr.io/c9ptjuknd7oc6/ic-build-worker@sha256:584ad7548763df6bf44fd8aa320091f8c67664c4e09111bd80d285b8c5fa8154
Genrules resolve their spawn strategy via --genrule_strategy / --strategy=Genrule,
not --spawn_strategy, so the earlier --spawn_strategy=remote,local fallback did not
reach them. Under the remote-only Namespace BRE config, genrules marked
'local = True' (e.g. //rs/tests:libvirtd and //rs/tests:dnsmasq, which copy host
binaries) thus failed with:

  Genrule spawn cannot be executed with any of the available strategies: [remote]

Pass --strategy=Genrule=remote,local on the 'bazel test' command line. Per Bazel,
--strategy=<mnemonic> overrides both --spawn_strategy and --genrule_strategy, and a
command-line flag overrides the Namespace bazelrc, so local-only genrules fall back
to local while everything else still runs remotely.
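
The precedence described in this commit message can be sketched as a lookup. This is a toy model built only from the statements above, not from Bazel's source. effective_strategy is a hypothetical name, and the assumption that the Namespace bazelrc leaves genrules with a remote-only list is inferred from the error, not confirmed.

```shell
# Toy model of the precedence described above.
#   $1 = mnemonic, $2 = --spawn_strategy value, $3 = --genrule_strategy value
#   (may be empty), remaining args = per-mnemonic overrides "Mnemonic=list".
effective_strategy() {
  local mnemonic="$1" spawn="$2" genrule="$3" override chosen=""
  shift 3
  for override in "$@"; do
    if [ "${override%%=*}" = "$mnemonic" ]; then
      chosen="${override#*=}"   # a later override replaces an earlier one
    fi
  done
  if [ -n "$chosen" ]; then
    echo "$chosen"              # --strategy=<mnemonic>=... wins outright
  elif [ "$mnemonic" = "Genrule" ] && [ -n "$genrule" ]; then
    echo "$genrule"             # genrules consult --genrule_strategy next
  else
    echo "$spawn"               # everything else uses --spawn_strategy
  fi
}

# effective_strategy Genrule "remote,local" "remote"                         prints remote
# effective_strategy Genrule "remote,local" "remote" "Genrule=remote,local"  prints remote,local
```

Under these assumptions, the per-mnemonic flag reaches genrules without changing the list used by any other mnemonic.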
The update-image-references sed rewrites ghcr.io/dfinity/ic-build(:|@)... across .github/workflow*/* (which includes container-autobuild.yml), clobbering the dynamic src back into a hardcoded digest each run. Build src from an ic_build_repo variable so the literal ghcr.io/dfinity/ic-build@ never appears on the line and the sed no longer matches it.
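
A minimal reproduction of the clobbering, and of why the variable indirection avoids it, might look like this. The pattern in pin_refs is a stand-in (the real expression is elided in the commit message), NEWDIGEST is a placeholder, and the sample lines are hypothetical; only the ic_build_repo variable name and the ghcr.io/dfinity/ic-build(:|@) prefix come from the text above.

```shell
# Stand-in for the reference-updating substitution: rewrites anything that
# looks like ghcr.io/dfinity/ic-build:<tag> or ghcr.io/dfinity/ic-build@<digest>.
pin_refs() {
  sed -E 's#ghcr\.io/dfinity/ic-build(:|@)[^"[:space:]]+#ghcr.io/dfinity/ic-build@sha256:NEWDIGEST#'
}

# Hypothetical workflow lines (single-quoted so ${...} stays literal text).
literal='src="ghcr.io/dfinity/ic-build@${digest}"'
repo_var='ic_build_repo="ghcr.io/dfinity/ic-build"'
indirect='src="${ic_build_repo}@${digest}"'

printf '%s\n' "$literal"  | pin_refs   # rewritten: the dynamic digest is lost
printf '%s\n' "$repo_var" | pin_refs   # unchanged: no ':' or '@' after the name
printf '%s\n' "$indirect" | pin_refs   # unchanged: the literal prefix never appears
```

Splitting the repository name from the separator means neither line contains the literal ghcr.io/dfinity/ic-build@, so the substitution has nothing to match.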
bazel/conf/.bazelrc.build sets --noexperimental_inmemory_dotd_files (forcing C++
.d dependency files to disk) to work around a DFINITY remote-cache bug
(bazelbuild/bazel#22387). Under Namespace remote execution with
build-without-the-bytes, intermediate .d files are not downloaded locally, so
reading them from disk fails with:

  error while parsing .d file: .../infogetty.d (No such file or directory)

Pass --experimental_inmemory_dotd_files on the 'bazel test' command line so .d
files are streamed in memory from the remote nodes. This overrides the bazelrc
setting (command-line flags win); the cache bug it guarded against does not apply
here since this job bypasses the DFINITY cache and uses Namespace's executor.
basvandijk and others added 4 commits June 30, 2026 11:47
ic-os image builds run rootless 'podman build' and are forced to run locally (in
this job's container) by --strategy_regexp=ic-os[:/].*=local. Rootless podman
must create a user namespace, which an unprivileged container blocks:

  cannot clone: Operation not permitted
  Error: cannot re-exec process

Add --privileged (grants the capabilities podman needs) and a tmpfs at
/tmp/containers (backs podman's --root/--runroot, off the container's overlay
rootfs), mirroring the privileged container options used by the jobs in
ci-main.yml. --cgroupns host is intentionally omitted for now; it is not
implicated by this failure and can be re-added if a cgroup error appears.
…826b789c9ea988d61e863e46d4d95

ic-build: sha256:e9f95a42acbb5dd96f36d53037129842e16f2ec628ea38f09c9d2404cba2fdff

ic-dev:   sha256:cb0b750d7254a4fa280b2f0d0a62ab05649fa9b8c24eafb59bb2dc040fd8dac2

ic-build-worker: nscr.io/c9ptjuknd7oc6/ic-build-worker@sha256:b3209ba49237175d9f4339daa4b0828f1eee0cf9bd8ccd193af7be4a9663d919
…ions

rules_rust's `_symlink_sysroot_tree` iterated `target.files` instead of its `target_files` argument, so the linker target's runfiles (rust-lld's bundled `gcc-ld/*` self-contained linker wrappers, e.g. `gcc-ld/ld.lld`) were never symlinked into the generated sysroot and thus never became declared inputs of the Rustc actions.

rustc defaults to lld on x86_64-unknown-linux-gnu (it links via `-fuse-ld=lld -B<sysroot>/lib/rustlib/<target>/bin/gcc-ld`). Local (non-sandboxed) builds still found gcc-ld on disk, but Bazel Remote Execution (Namespace BRE) ships only an action's declared inputs, so every Rustc link action (starting with the bootstrap process_wrapper) failed with: "the self-contained linker was requested, but it wasn't found in the target's sysroot, or in rustc's sysroot".
Commit 7cde29f added --experimental_inmemory_dotd_files to counteract --noexperimental_inmemory_dotd_files in bazel/conf/.bazelrc.build. Commit 91ae530 stopped setting --noexperimental_inmemory_dotd_files, so the flag (and its comment) are no longer needed.
The previous version of this patch switched _symlink_sysroot_tree to iterate
only the linker target's runfiles (rust-lld's gcc-ld/* wrappers), which dropped
the rust-lld binary at lib/rustlib/<target>/bin/rust-lld that rustc invokes
directly to link wasm32 canisters, causing 'linker `rust-lld` not found' under
remote execution.

Symlink the union of the linker target's files (the rust-lld binary, used
directly for wasm32) and its runfiles (the gcc-ld/* wrappers, used by rustc's
default lld on x86_64-unknown-linux-gnu) so both link under remote execution.
Labels

chore
CI_BRE: Trigger the bazel-test-bre job to run bazel test via Remote Execution @ Namespace

2 participants