Skip to content

fix(sandbox): restore GPU filesystem baseline#1522

Open
elezar wants to merge 1 commit into
mainfrom
fix/1486-gpu-sandbox-filesystem-policy/elezar
Open

fix(sandbox): restore GPU filesystem baseline#1522
elezar wants to merge 1 commit into
mainfrom
fix/1486-gpu-sandbox-filesystem-policy/elezar

Conversation

@elezar
Copy link
Copy Markdown
Member

@elezar elezar commented May 22, 2026

Summary

Restore CUDA GPU filesystem access for Docker-backed GPU sandboxes without promoting all of /proc to full read-write policy access.

This keeps the GPU device-node baseline from #1524, but handles CUDA procfs thread-name writes with a GPU-only Landlock WriteFile exception on procfs. Non-GPU sandboxes do not receive this exception.

Related Issue

Fixes #1486

Builds on the GPU no-network enrichment fix merged in #1524.
The no-network enrichment regression is handled in #1524 and was introduced by #158. This PR addresses the follow-up GPU procfs baseline regression introduced by #910, where explicit default read-only paths prevented GPU-required baseline handling.
The GPU workload test images used for validation come from #1484.

Changes

  • Keep injected NVIDIA/WSL GPU device nodes in the GPU read-write baseline.
  • Stop promoting /proc into filesystem_policy.read_write; /proc can remain read-only in the policy.
  • Add a Linux Landlock runtime exception that grants only AccessFs::WriteFile under /proc, and only when GPU devices are present in the sandbox.
  • Cover descendant CUDA processes, such as a shell workload script that later starts deviceQuery.
  • Preserve custom-policy conflicts for injected GPU device nodes that are incorrectly kept read-only.
  • Update GPU sandbox policy documentation to describe the narrower procfs behavior.

Testing

  • /home/elezar/.local/bin/mise run pre-commit
  • /home/elezar/.cargo/bin/cargo test -p openshell-sandbox --lib baseline_tests -- --nocapture
  • /home/elezar/.cargo/bin/cargo test -p openshell-sandbox --lib landlock::tests -- --nocapture
  • /home/elezar/.cargo/bin/cargo clippy -p openshell-sandbox --lib --tests -- -D warnings
  • Plain Docker control: docker run --rm --device nvidia.com/gpu=all localhost/openshell/gpu-workload-cuda-basic:bdaa08fb-dirty passed with OPENSHELL_GPU_WORKLOAD_SUCCESS cuda-basic
  • Docker-backed OpenShell sandbox: openshell sandbox create --no-keep --from localhost/openshell/gpu-workload-cuda-basic:bdaa08fb-dirty --gpu -- /usr/local/bin/openshell-gpu-workload passed with OPENSHELL_GPU_WORKLOAD_SUCCESS cuda-basic
  • A narrower /proc/self/task prototype failed the same sandbox workload with cudaGetDeviceCount returned 304, confirming the need to cover descendant CUDA processes.

Checklist

  • Follows Conventional Commits
  • Commits are signed off (DCO)
  • Architecture/docs updated (if applicable)

@elezar elezar requested review from a team, derekwaynecarr, maxamillion and mrunalp as code owners May 22, 2026 13:47
@github-actions
Copy link
Copy Markdown

@elezar elezar changed the base branch from main to fix/1486-gpu-enrichment-no-network/elezar May 22, 2026 14:06
Base automatically changed from fix/1486-gpu-enrichment-no-network/elezar to main May 27, 2026 08:20
@elezar elezar force-pushed the fix/1486-gpu-sandbox-filesystem-policy/elezar branch from 96a1caa to 59e399a Compare May 27, 2026 09:02
@elezar elezar force-pushed the fix/1486-gpu-sandbox-filesystem-policy/elezar branch from 12bde4d to d73e6de Compare May 28, 2026 19:22
Copy link
Copy Markdown
Collaborator

@pimlock pimlock left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM with a few nits and questions.

Comment on lines +26 to +28
and writable work paths needed by the proxy and shell environment. GPU runtimes
add the NVIDIA or WSL2 device nodes exposed inside the sandbox and promote
`/proc` to read-write for default-like policies because CUDA initialization
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this sentence still accurate?

promote /proc to read-write for default-like policies because CUDA initialization

Right now, the write access comes from the allow_cuda_procfs_writes patch, right? But the policy doesn't get /proc as read-write?

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No. This should have been removed. Let me rework this again for accuracy.

fn allow_cuda_procfs_writes_allows_descendant_comm_write() {
if std::env::var_os(PROCFS_WRITE_HELPER_ENV).is_some() {
return;
}
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

From what I can tell the allow_cuda_procfs_writes_allows_descendant_comm_write launches the current test binary with PROCFS_WRITE_HELPER_ENV, and allow_cuda_procfs_writes_helper only runs real helper logic in that child process.

Could we add a short comment or rename the helper test to make that harness pattern clearer? I assume a subprocess is needed because Landlock restrictions are irreversible for the current process.

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm going to remove the implicit broadening of the permissions from this PR and defer it to #1628. We can look at improving the readability of the tests there.

Comment thread crates/openshell-sandbox/src/lib.rs Outdated
Comment on lines +1465 to +1467
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum BaselinePolicySource {
DefaultLike,
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could be worth documenting what this means, from what I can tell it's "the source is either the hardcoded default policy or the policy baked into the sandbox image", is this right?

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, running through this again, I think the naming is not what it should be. The differences we are trying to capture is a policy that CAN be normalized (i.e. /proc elevated to read-write if GPUs are requested) such as gateway-provided baseline policies, and users-supplied policies where /proc is explicitly specified where we must not increas priviledges.

I will make a pass with some renames / clarifying comments.

modified |= fs.read_only.len() != before;
}
continue;
}
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reading this out loud:

  • we are enriching with a read_write path and that path already exists in the policy
  • if the policy is default-like and the enrichment comes from the GPU baseline, then remove that path from read_only paths.

This is what happens here, right? What scenario would this happen in?

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Default -> implies a policy that was not explicitly specified for this sandbox by the user (e.g. gateway imposed policy) or the default fallback. Since /proc is included in PROXY_BASELINE_READ_ONLY we need to ensure that -- if /proc was not specified explicitly (i.e. is "default-like" in the current set of changes) AND a gpu is requested for the sandbox, we need to remove it from the read_only set and then add it to the read_write set.

Let me see if I can come up with a more concrete scenario.

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK. I think I misunderstood some things here -- or made assumptions that didn't hold. The only case when we would allow a promotion for this PR was if NO other policy was specified by the user and the default defined in code was applied. (This was more clear in later iterations of this PR).

I don't know whether that's the way we want to go since allowing control of this requires an admin to remove /proc from the policies they specify OR set /proc to read-write for ALL sandboxes. I have created #1629 which would allow an admin to opt-out of promotion to read-write for a set of paths (or path patterns). That may be a better solution than this PR.

} else {
BaselinePolicySource::Custom
}
}
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I found this part a bit confusing -> the policy coming from the sandbox image is considered DefaultLike based on https://github.com/NVIDIA/OpenShell/pull/1522/changes#diff-84c5d46e0c1e203a6a93f4a5e29986d5dc6787f11230c2e101d820fca482c1b4R2296

But it's considered Custom based on this check.
Also a DefaultLike policy that is enriched is also then considered Custom. Maybe there are some other names for this to make it clearer?

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looking at this again, I think that the policies coming from the sandbox image need more scrutiny. I'll make sure that it's consistent/clear here and we can create an internal issue to discuss.

Comment on lines 2017 to 2325
// TLS handshakes.
grpc_retry("Policy discovery sync", || {
grpc_client::discover_and_sync_policy(endpoint, id, sandbox, &discovered)
})
.await?
};

// Ensure baseline filesystem paths are present for proxy-mode
// sandboxes. If the policy was enriched, sync the updated version
// back to the gateway so users can see the effective policy.
let enriched = enrich_proto_baseline_paths(&mut proto_policy);
let enriched = enrich_proto_baseline_paths(&mut proto_policy)?;
if enriched
&& let Some(sandbox_name) = sandbox.as_deref()
&& let Err(e) = grpc_client::sync_policy(endpoint, sandbox_name, &proto_policy).await
{
warn!(
error = %e,
"Failed to sync enriched policy back to gateway (non-fatal)"
);
}
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is outside of changes here, but the comment in the BaselinePolicySource got me here (with how the enrich_proto_baseline_paths uses Custom).

I think the flow in this function could be improved:

  • line 2278, where we got policy from the server: lines 2313-2325 should likely live there, because that part is only applicable to when we get the policy from the gateway, as if we got the policy as default/from the image, we already enriched it, so there is no need to do that again?
  • unless there is a situation in which this is still useful, even after the policy was initially enriched? Maybe safer to leave it, but maybe worth determining the source in this function, e.g. having the if in 2277 to capture both the policy and source?

let result = ruleset
.add_rule(PathBeneath::new(
path_fd,
make_bitflags!(AccessFs::{WriteFile}),
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How much narrower is this WriteFile from a standard read_write access? I'm wondering mostly because with /proc being in added with read_write to the policy, the user analyzing what's allowed sees it in the policy. It's more risky, but explicit.

With this, the /proc is in the policy as read-only, but some narrower scope is allowed, but without any mention in the policy.

So it's between "wider, but explicit" vs "narrower, but implicit". I'm not sure what are the risks of the former, but I think that may be a better option. WDYT?

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The read-write permissions that OpenShell applies are taken from AccessFs::from_all: https://docs.rs/landlock/0.4.4/src/landlock/fs.rs.html#104

This includes the from_read: https://docs.rs/landlock/0.4.4/src/landlock/fs.rs.html#116-118
And from_write: https://docs.rs/landlock/0.4.4/src/landlock/fs.rs.html#130-143 (which is ABI version dependent).

Assuming V1 this means that the additional permissions that promoting to read-write provide are:

  • WriteFile
  • RemoveDir
  • RemoveFile
  • MakeChar
  • MakeDir
  • MakeReg
  • MakeSock
  • MakeFifo
  • MakeBlock
  • MakeSym

I think that in the case of OpenShell, where auditing what is allowed/accessed is important, explicit behaviour should be favoured over implicit behaviour. Does this maybe mean that we should keep the broader behaviour for now (the first commit) and then look at how we can better surface the permissions that we are adding? Perhaps adding a finer granularity for the policies makes sense?

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have created #1628 to track looking into this properly instead of adding this as a "quick" fix on top of this PR.

@elezar elezar force-pushed the fix/1486-gpu-sandbox-filesystem-policy/elezar branch from d73e6de to 2f3b5b2 Compare May 29, 2026 14:31
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

bug: GPU sandboxes miss filesystem access for CUDA workloads

2 participants