
feat(providers): add Google Vertex AI inference provider #1568

Open
maxamillion wants to merge 4 commits into NVIDIA:main from maxamillion:vertex-provider

Conversation

Collaborator

@maxamillion maxamillion commented May 26, 2026

Summary

Add Google Vertex AI as a first-class inference provider, supporting both service account (JWT) and gcloud ADC (OAuth2 refresh token) credential flows. Routes Anthropic models through Vertex AI rawPredict and all other models (Gemini, Llama, Mistral, etc.) through the Vertex OpenAI-compatible endpoint. Includes a seccomp policy relaxation for NETLINK_ROUTE sockets required by Vertex client tooling.

Related Issue

Changes

Provider profile & discovery

  • New providers/google-vertex-ai.yaml with three credential entries: raw service account key (gateway-only, never injected into sandboxes), service account JWT-minted token, and gcloud ADC OAuth2-refreshed token.
  • ProviderTypeProfile::allows_gateway_refresh_bootstrap() and CredentialRefreshProfile::is_gateway_mintable() replace inline gateway-refresh logic in server and CLI.
  • normalize_inference_provider_type() in openshell-core is now the single source of truth for provider alias resolution (vertex, vertex-ai, google-vertex → google-vertex-ai).
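
A minimal sketch of the alias resolution described in the last bullet. The function name `normalize_provider_type`, its signature, and the trim/lowercase step are assumptions made for illustration; only the three aliases and the canonical name come from this PR.

```rust
/// Canonical provider type name that the aliases resolve to.
const GOOGLE_VERTEX_AI: &str = "google-vertex-ai";

/// Hypothetical helper: maps the Vertex aliases to the canonical name.
/// Unknown names are returned trimmed and lowercased (an assumption).
fn normalize_provider_type(raw: &str) -> String {
    let lowered = raw.trim().to_ascii_lowercase();
    match lowered.as_str() {
        "vertex" | "vertex-ai" | "google-vertex" | GOOGLE_VERTEX_AI => {
            GOOGLE_VERTEX_AI.to_string()
        }
        _ => lowered,
    }
}
```

Keeping the mapping in one function means the server and CLI cannot drift apart on which spellings they accept.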

Inference routing (server)

  • resolve_vertex_ai_route() dispatches by publisher: Anthropic models get rawPredict URLs with model_in_path=true; all others get the OpenAI-compatible /chat/completions endpoint.
  • infer_vertex_publisher() maps model prefixes to publishers (6 families: Anthropic, Google, Meta, Mistral, AI21, DeepSeek).
  • Region-to-host mapping: regional → {region}-aiplatform.googleapis.com, global → aiplatform.googleapis.com, us/eu → aiplatform.{region}.rep.googleapis.com.
  • Base URL override escape hatch with strict validation (HTTPS, official Vertex hostname, no IP literals, no userinfo, no query/fragment, port 443 only; rejected outright for Anthropic models).
  • Model ID validation rejects path separators, URL delimiters, percent escapes, traversal segments, whitespace, and control characters.
  • CredentialLookup enum (PreferredOnly vs PreferredThenAny) prevents raw SA JSON from being picked up as a bearer token.
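
The region-to-host mapping and model ID validation above can be sketched roughly as follows. The function names are hypothetical, and the exact set of rejected characters is an assumption; the PR only names the categories (path separators, URL delimiters, percent escapes, traversal segments, whitespace, control characters).

```rust
/// Maps a Vertex region to its API host, following the three cases listed above.
fn vertex_host(region: &str) -> String {
    match region {
        "global" => "aiplatform.googleapis.com".to_string(),
        // Multi-region locations use the `rep` host form.
        "us" | "eu" => format!("aiplatform.{region}.rep.googleapis.com"),
        other => format!("{other}-aiplatform.googleapis.com"),
    }
}

/// Returns true when a model ID looks safe to place in a URL path.
/// The concrete character set below is an assumption, not the PR's list.
fn is_valid_model_id(id: &str) -> bool {
    // With separators rejected, a traversal segment can only be the whole ID.
    if id.is_empty() || id == "." || id == ".." {
        return false;
    }
    !id.chars().any(|c| {
        matches!(c, '/' | '\\' | '?' | '#' | '%') || c.is_whitespace() || c.is_control()
    })
}
```

Validating the model ID before it is interpolated into a path matters most for the Anthropic route, where the model name becomes part of the URL.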

Router backend

  • build_provider_url() handles four URL construction cases via model_in_path × request_path_override matrix. Streaming upgrades :rawPredict → :streamRawPredict.
  • For Vertex Anthropic rawPredict: strips model from request body (Vertex encodes it in path), injects anthropic_version: "vertex-2023-10-16", and strips anthropic-beta header (Vertex rejects unknown beta values).
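
One way to picture the model_in_path × request_path_override matrix is the sketch below. It is not the PR's build_provider_url(): the `Route` struct, the `default_path` parameter, and the way the four cases combine are assumptions; only the two flag names and the streaming upgrade come from the description above.

```rust
/// Hypothetical route shape. Only `model_in_path` and `request_path_override`
/// are named in the PR; the other fields are assumed for illustration.
struct Route<'a> {
    base_url: &'a str,
    model: &'a str,
    model_in_path: bool,
    request_path_override: Option<&'a str>,
}

/// Builds the upstream URL for one request (assumed behavior).
fn build_url(route: &Route, default_path: &str, streaming: bool) -> String {
    let base = route.base_url.trim_end_matches('/');
    // An override replaces the client-facing request path.
    let suffix = route.request_path_override.unwrap_or(default_path);
    // Streaming requests upgrade the rawPredict verb.
    let suffix = if streaming && suffix == ":rawPredict" {
        ":streamRawPredict"
    } else {
        suffix
    };
    if route.model_in_path {
        let model = route.model;
        format!("{base}/{model}{suffix}")
    } else {
        format!("{base}{suffix}")
    }
}
```

In this sketch the Anthropic route sets both flags, so the model lands in the path followed by the rawPredict verb, while the OpenAI-compatible route sets neither and keeps the default request path.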

Provider gRPC (server)

  • is_non_injectable_provider_credential() prevents raw service account JSON from reaching sandboxes.
  • Agent config env var injection for Vertex providers: injects ANTHROPIC_VERTEX_PROJECT_ID, GCP_PROJECT_ID, CLOUD_ML_REGION, GCP_LOCATION, GOOSE_PROVIDER=gcp_vertex_ai, etc. so Claude Code, Goose, and OpenCode work inside sandboxes. Explicit credential values take precedence.
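
The "explicit credential values take precedence" rule can be illustrated with a small sketch. The function `inject_vertex_env` and its use of a `BTreeMap` are hypothetical; the variable names are the ones listed above (the PR's "etc." implies there are more).

```rust
use std::collections::BTreeMap;

/// Hypothetical helper: fills in Vertex agent variables without
/// overwriting anything the user already set explicitly.
fn inject_vertex_env(env: &mut BTreeMap<String, String>, project_id: &str, region: &str) {
    let defaults = [
        ("ANTHROPIC_VERTEX_PROJECT_ID", project_id),
        ("GCP_PROJECT_ID", project_id),
        ("CLOUD_ML_REGION", region),
        ("GCP_LOCATION", region),
        ("GOOSE_PROVIDER", "gcp_vertex_ai"),
    ];
    for (key, value) in defaults {
        // `or_insert_with` leaves an existing (explicit) value untouched.
        env.entry(key.to_string()).or_insert_with(|| value.to_string());
    }
}
```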

Protobuf

  • ResolvedRoute gains model_in_path (field 8) and request_path_override (field 9).

CLI

  • --from-gcloud-adc flag on provider create (mutually exclusive with --from-existing and --credential). Reads gcloud ADC from GOOGLE_APPLICATION_CREDENTIALS, $CLOUDSDK_CONFIG/application_default_credentials.json, or ~/.config/gcloud/application_default_credentials.json; validates authorized_user type; configures OAuth2 refresh and mints the first token.
  • Rollback on failure: deletes orphaned provider, or warns with manual cleanup instructions if deletion also fails.
  • Vertex-specific config env var discovery (VERTEX_AI_PROJECT_ID, VERTEX_AI_REGION, base URL, publisher).
  • SandboxUploadPlan refactor consolidates upload existence-check + git-aware planning.
  • scrub_git_env() prevents inherited git env vars from breaking subprocess git calls.
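
The ADC lookup order used by --from-gcloud-adc can be sketched as a pure function. The name `adc_candidates`, the injected `get_env` closure, and the use of `HOME` to expand `~` are assumptions; whether the CLI picks the first variable that is set or the first file that exists is not stated in this PR, so the sketch only returns the candidates in order.

```rust
use std::path::PathBuf;

const ADC_FILE: &str = "application_default_credentials.json";

/// Returns candidate ADC paths in the documented lookup order.
/// `get_env` is injected so the order can be tested without touching
/// the real process environment (an illustrative design choice).
fn adc_candidates(get_env: impl Fn(&str) -> Option<String>) -> Vec<PathBuf> {
    let mut candidates = Vec::new();
    if let Some(path) = get_env("GOOGLE_APPLICATION_CREDENTIALS") {
        candidates.push(PathBuf::from(path));
    }
    if let Some(dir) = get_env("CLOUDSDK_CONFIG") {
        candidates.push(PathBuf::from(dir).join(ADC_FILE));
    }
    if let Some(home) = get_env("HOME") {
        candidates.push(PathBuf::from(home).join(".config/gcloud").join(ADC_FILE));
    }
    candidates
}
```

In real use the closure would be something like `|key| std::env::var(key).ok()`, and the caller would still need to check that the chosen file has the authorized_user type.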

Sandbox

  • NETLINK_ROUTE (protocol 0) now allowed through seccomp; all other netlink protocols remain blocked. Required because getifaddrs(3) on Linux uses NETLINK_ROUTE and is called by Node.js, Python, Go, and most HTTP/gRPC client libraries. Security is maintained by CAP_NET_ADMIN absence, network namespace isolation, and nftables rules.
  • Bundle-to-route conversion populates model_in_path and request_path_override.
  • enrich_sandbox_baseline_paths() refactored with injectable path_exists closure for testability.

Documentation

  • New docs/providers/google-vertex-ai.mdx: full provider setup guide covering both auth flows, configuration keys, region/host selection, supported models, sandbox usage with Claude Code and OpenCode, and policy proposals guidance.
  • Updated inference-routing.mdx, manage-providers.mdx, providers-v2.mdx, supported-agents.mdx, best-practices.mdx for Vertex references.
  • New architecture/gateway.md Inference Resolution section documenting bundle resolution, Vertex host selection, route shaping, header passthrough, and security model.

Testing

  • mise run pre-commit passes (lint, format, license headers)
  • Unit tests added/updated:
    • ~35 server inference tests (publisher inference, route resolution for all regions/overrides/validation, model ID validation, bundle integration)
    • ~15 router backend tests (URL construction, body rewriting, header stripping, wiremock integration for buffered + streaming)
    • ~8 provider gRPC tests (credential injection, agent config injection, SA key filtering, refresh bootstrap)
    • ~15 CLI integration tests (ADC happy path, SA rejection, missing file, wrong provider type, configure/rotate rollback, rollback-delete failure, config keys, mutual exclusion)
    • 3 seccomp tests (rule conditionality, behavioral NETLINK_ROUTE allowed, behavioral NETLINK_SOCK_DIAG blocked)
    • 3 router integration tests (full proxy for Vertex Gemini, Vertex Anthropic buffered, Vertex Anthropic streaming)
    • Provider profile, core inference, sandbox bundle, and config tests
  • E2E tests added/updated (requires live Vertex AI credentials; not run in CI without secrets)

Checklist

@maxamillion maxamillion requested review from a team, derekwaynecarr and mrunalp as code owners May 26, 2026 16:15

copy-pr-bot Bot commented May 26, 2026

This pull request requires additional validation before any workflows can run on NVIDIA's runners.

Pull request vetters can view their responsibilities here.

Contributors can view more details about this message here.

@maxamillion maxamillion marked this pull request as draft May 26, 2026 16:21
Comment thread crates/openshell-providers/src/providers/vertex.rs Outdated
Comment thread docs/providers/google-vertex-ai.mdx
@maxamillion maxamillion marked this pull request as ready for review May 27, 2026 02:52
@maxamillion maxamillion marked this pull request as draft May 27, 2026 03:28
Adds Vertex AI provider profiles, routing, credential refresh plumbing, CLI support, docs, and regression coverage. Keeps the related NETLINK_ROUTE seccomp allowance needed by Vertex client tooling that calls getifaddrs.
Cover the full end-to-end setup for running Claude Code and OpenCode
inside an OpenShell sandbox via inference.local with a Vertex AI backend:

- google-vertex-ai.mdx: add 'Use from a Sandbox' section with tabbed
  examples for Claude Code (--bare flag, no /v1 suffix) and OpenCode
  (/v1 suffix required). Add providers_v2_enabled prerequisite and
  --no-verify note for global region. Document policy proposals table
  covering metadata.google.internal (always blocked), downloads.claude.ai,
  and storage.googleapis.com.

- inference-routing.mdx: expand 'Use the Local Endpoint' section with
  tabbed examples for Claude Code, OpenCode, Python OpenAI SDK, and
  Python Anthropic SDK. Add notes explaining the /v1 path suffix
  difference between clients.

- supported-agents.mdx: update Claude Code and OpenCode rows to mention
  inference.local support and correct base URL requirements.
@maxamillion maxamillion marked this pull request as ready for review May 28, 2026 20:04
@TaylorMutch
Collaborator

/ok to test 09ddf58

On arm64 under heavy CI load, the /proc fd scan in
find_socket_inode_owners can transiently miss the parent process's
socket fd entry, returning only the child as an owner. This causes
resolve_process_identity to return Ok (single owner, no ambiguity
check fires) instead of the expected ambiguous-ownership Err.

Extend the retry loop to also handle unexpected Ok results, mirroring
the existing retry for transient Err results. 10 retries at 50ms gives
a 500ms settling window, which is sufficient for procfs to stabilize
on loaded arm64 runners.
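
A generic sketch of the retry pattern this commit describes, in which any unsettled result (an unexpected Ok or a transient Err) is re-probed after a short delay. The helper `retry_until_settled` and its signature are hypothetical and do not reproduce the test code in this PR.

```rust
use std::{thread, time::Duration};

/// Re-runs `probe` until `is_settled` accepts the result or the retry
/// budget is exhausted, then returns the last observed result.
fn retry_until_settled<T, E>(
    max_retries: u32,
    delay: Duration,
    mut probe: impl FnMut() -> Result<T, E>,
    is_settled: impl Fn(&Result<T, E>) -> bool,
) -> Result<T, E> {
    let mut outcome = probe();
    for _ in 0..max_retries {
        if is_settled(&outcome) {
            break;
        }
        // Give procfs (or any eventually consistent source) time to settle.
        thread::sleep(delay);
        outcome = probe();
    }
    outcome
}
```

With `max_retries = 10` and a 50 ms delay, the helper waits at most 500 ms in total, matching the settling window stated above.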


3 participants