feat(providers): add Google Vertex AI inference provider #1568
Open
maxamillion wants to merge 4 commits into
johntmyers
reviewed
May 26, 2026
Adds Vertex AI provider profiles, routing, credential refresh plumbing, CLI support, docs, and regression coverage. Keeps the related NETLINK_ROUTE seccomp allowance needed by Vertex client tooling that calls getifaddrs.
Cover the full end-to-end setup for running Claude Code and OpenCode inside an OpenShell sandbox via inference.local with a Vertex AI backend:
- google-vertex-ai.mdx: add 'Use from a Sandbox' section with tabbed examples for Claude Code (--bare flag, no /v1 suffix) and OpenCode (/v1 suffix required). Add providers_v2_enabled prerequisite and --no-verify note for global region. Document policy proposals table covering metadata.google.internal (always blocked), downloads.claude.ai, and storage.googleapis.com.
- inference-routing.mdx: expand 'Use the Local Endpoint' section with tabbed examples for Claude Code, OpenCode, Python OpenAI SDK, and Python Anthropic SDK. Add notes explaining the /v1 path suffix difference between clients.
- supported-agents.mdx: update Claude Code and OpenCode rows to mention inference.local support and correct base URL requirements.
Collaborator
/ok to test 09ddf58
On arm64 under heavy CI load, the /proc fd scan in find_socket_inode_owners can transiently miss the parent process's socket fd entry, returning only the child as an owner. This causes resolve_process_identity to return Ok (single owner, no ambiguity check fires) instead of the expected ambiguous-ownership Err. Extend the retry loop to also handle unexpected Ok results, mirroring the existing retry for transient Err results. 10 retries at 50ms gives a 500ms settling window, which is sufficient for procfs to stabilize on loaded arm64 runners.
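The retry described in this commit message can be sketched as a small generic helper that re-runs a probe until the result is the expected one or the attempt budget runs out. This is only an illustration: `retry_until`, `probe`, and `accept` are hypothetical names, not functions from this PR. Only the budget of 10 attempts at 50 ms comes from the commit message.

```rust
use std::thread::sleep;
use std::time::Duration;

/// Hypothetical helper (not from the PR): re-run `probe` until `accept`
/// approves the result or `max_attempts` probes have been made.
/// The probe always runs at least once; the last result is returned.
fn retry_until<T, E>(
    max_attempts: u32,
    delay: Duration,
    mut probe: impl FnMut() -> Result<T, E>,
    accept: impl Fn(&Result<T, E>) -> bool,
) -> Result<T, E> {
    let mut last = probe();
    let mut attempts = 1;
    // Both unexpected Ok and unexpected Err values are retried, because
    // `accept` sees the whole Result rather than only the error side.
    while !accept(&last) && attempts < max_attempts {
        sleep(delay);
        last = probe();
        attempts += 1;
    }
    last
}
```

In the test described above, `probe` would presumably wrap the call to resolve_process_identity and `accept` would check for the ambiguous-ownership Err, with `max_attempts` of 10 and a delay of `Duration::from_millis(50)`.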
Summary
Add Google Vertex AI as a first-class inference provider, supporting both service account (JWT) and gcloud ADC (OAuth2 refresh token) credential flows. Routes Anthropic models through Vertex AI rawPredict and all other models (Gemini, Llama, Mistral, etc.) through the Vertex OpenAI-compatible endpoint. Includes a seccomp policy relaxation for NETLINK_ROUTE sockets required by Vertex client tooling.
Related Issue
Changes
Provider profile & discovery
- providers/google-vertex-ai.yaml with three credential entries: raw service account key (gateway-only, never injected into sandboxes), service account JWT-minted token, and gcloud ADC OAuth2-refreshed token.
- ProviderTypeProfile::allows_gateway_refresh_bootstrap() and CredentialRefreshProfile::is_gateway_mintable() replace inline gateway-refresh logic in server and CLI.
- normalize_inference_provider_type() in openshell-core is now the single source of truth for provider alias resolution (vertex, vertex-ai, google-vertex → google-vertex-ai).
Inference routing (server)
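As a rough illustration of the alias resolution listed under Provider profile & discovery, a single normalizing function might look like the sketch below. The function name `normalize_alias_sketch` is hypothetical, and the trimming, lowercasing, and pass-through of unknown names are assumptions. Only the three Vertex aliases and the canonical name come from the PR description.

```rust
/// Hypothetical sketch of provider alias resolution.
/// Grounded in the PR text: vertex, vertex-ai, google-vertex -> google-vertex-ai.
/// Assumed: input is trimmed and lowercased before matching.
fn normalize_alias_sketch(input: &str) -> String {
    let lowered = input.trim().to_ascii_lowercase();
    match lowered.as_str() {
        "vertex" | "vertex-ai" | "google-vertex" | "google-vertex-ai" => {
            "google-vertex-ai".to_string()
        }
        // Assumption: names that are not known aliases pass through unchanged.
        _ => lowered,
    }
}
```

Keeping the mapping in one function means the server and the CLI cannot drift apart on which aliases they accept.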
- resolve_vertex_ai_route() dispatches by publisher: Anthropic models get rawPredict URLs with model_in_path=true; all others get the OpenAI-compatible /chat/completions endpoint.
- infer_vertex_publisher() maps model prefixes to publishers (6 families: Anthropic, Google, Meta, Mistral, AI21, DeepSeek).
- Host selection: regional → {region}-aiplatform.googleapis.com, global → aiplatform.googleapis.com, us/eu → aiplatform.{region}.rep.googleapis.com.
- CredentialLookup enum (PreferredOnly vs PreferredThenAny) prevents raw SA JSON from being picked up as a bearer token.
Router backend
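The host selection and publisher dispatch rules from the Inference routing (server) notes can be sketched as two small functions. The names `vertex_host_sketch`, `VertexEndpoint`, and `endpoint_for_publisher`, and the publisher string `"anthropic"`, are assumptions for illustration. The host patterns and the rawPredict versus /chat/completions split are taken from the PR description.

```rust
/// Hypothetical sketch of Vertex host selection by region.
fn vertex_host_sketch(region: &str) -> String {
    match region {
        // global uses the bare service host.
        "global" => "aiplatform.googleapis.com".to_string(),
        // us and eu multi-regions use the .rep. host form.
        "us" | "eu" => format!("aiplatform.{}.rep.googleapis.com", region),
        // Any other value is treated as a regional location.
        regional => format!("{}-aiplatform.googleapis.com", regional),
    }
}

/// Which Vertex endpoint style a route uses (hypothetical type).
#[derive(Debug, PartialEq)]
enum VertexEndpoint {
    RawPredict,
    OpenAiChatCompletions,
}

/// Anthropic models go to rawPredict; every other publisher goes to the
/// OpenAI-compatible /chat/completions endpoint.
fn endpoint_for_publisher(publisher: &str) -> VertexEndpoint {
    if publisher.eq_ignore_ascii_case("anthropic") {
        VertexEndpoint::RawPredict
    } else {
        VertexEndpoint::OpenAiChatCompletions
    }
}
```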
- build_provider_url() handles four URL construction cases via model_in_path × request_path_override matrix. Streaming upgrades :rawPredict → :streamRawPredict.
- Removes model from the request body (Vertex encodes it in the path), injects anthropic_version: "vertex-2023-10-16", and strips the anthropic-beta header (Vertex rejects unknown beta values).
Provider gRPC (server)
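Two of the Router backend behaviors, the streaming upgrade and the header stripping, are simple enough to sketch without a JSON or HTTP library. Both function names below are hypothetical, the header list is modeled as plain string pairs, and the case-insensitive header match is an assumption. The :rawPredict to :streamRawPredict rewrite and the anthropic-beta header name come from the PR description.

```rust
/// Hypothetical sketch: when the request is streaming, a URL ending in
/// :rawPredict is rewritten to end in :streamRawPredict. Other URLs and
/// non-streaming requests are returned unchanged.
fn upgrade_for_streaming(url: &str, stream: bool) -> String {
    match url.strip_suffix(":rawPredict") {
        Some(prefix) if stream => format!("{}:streamRawPredict", prefix),
        _ => url.to_string(),
    }
}

/// Hypothetical sketch: drop the anthropic-beta header before forwarding.
/// Header names are compared case-insensitively (an assumption here).
fn strip_beta_header(headers: Vec<(String, String)>) -> Vec<(String, String)> {
    headers
        .into_iter()
        .filter(|(name, _)| !name.eq_ignore_ascii_case("anthropic-beta"))
        .collect()
}
```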
- is_non_injectable_provider_credential() prevents raw service account JSON from reaching sandboxes.
- Injects ANTHROPIC_VERTEX_PROJECT_ID, GCP_PROJECT_ID, CLOUD_ML_REGION, GCP_LOCATION, GOOSE_PROVIDER=gcp_vertex_ai, etc. so Claude Code, Goose, and OpenCode work inside sandboxes. Explicit credential values take precedence.
Protobuf
- ResolvedRoute gains model_in_path (field 8) and request_path_override (field 9).
CLI
- --from-gcloud-adc flag on provider create (mutually exclusive with --from-existing and --credential). Reads gcloud ADC from GOOGLE_APPLICATION_CREDENTIALS, $CLOUDSDK_CONFIG/application_default_credentials.json, or ~/.config/gcloud/application_default_credentials.json; validates authorized_user type; configures OAuth2 refresh and mints the first token.
- VERTEX_AI_PROJECT_ID, VERTEX_AI_REGION, base URL, publisher).
- SandboxUploadPlan refactor consolidates upload existence-check + git-aware planning.
- scrub_git_env() prevents inherited git env vars from breaking subprocess git calls.
Sandbox
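The gcloud ADC lookup described in the CLI notes can be sketched as a function that lists candidate paths. This assumes the three locations are tried in the order they are listed and that `~` resolves through the HOME variable. `adc_candidates` is a hypothetical name, and the environment lookup is passed in as a closure so the logic can be tested without touching the real environment.

```rust
use std::path::PathBuf;

/// Hypothetical sketch: candidate gcloud ADC file locations, in the order
/// listed in the PR description (assumed to be the lookup order).
/// `get_env` is an injected lookup, e.g. `|k| std::env::var(k).ok()`.
fn adc_candidates(get_env: impl Fn(&str) -> Option<String>) -> Vec<PathBuf> {
    const FILE: &str = "application_default_credentials.json";
    let mut candidates = Vec::new();
    // 1. Explicit path to the credentials file.
    if let Some(path) = get_env("GOOGLE_APPLICATION_CREDENTIALS") {
        candidates.push(PathBuf::from(path));
    }
    // 2. gcloud config directory override.
    if let Some(dir) = get_env("CLOUDSDK_CONFIG") {
        candidates.push(PathBuf::from(dir).join(FILE));
    }
    // 3. Default location under the home directory (HOME is an assumption).
    if let Some(home) = get_env("HOME") {
        candidates.push(PathBuf::from(home).join(".config").join("gcloud").join(FILE));
    }
    candidates
}
```

A caller would then pick the first candidate that exists and check that its type is authorized_user before configuring the OAuth2 refresh.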
- NETLINK_ROUTE (protocol 0) now allowed through seccomp; all other netlink protocols remain blocked. Required because getifaddrs(3) on Linux uses NETLINK_ROUTE and is called by Node.js, Python, Go, and most HTTP/gRPC client libraries. Security is maintained by CAP_NET_ADMIN absence, network namespace isolation, and nftables rules.
- model_in_path and request_path_override.
- enrich_sandbox_baseline_paths() refactored with injectable path_exists closure for testability.
Documentation
- docs/providers/google-vertex-ai.mdx: full provider setup guide covering both auth flows, configuration keys, region/host selection, supported models, sandbox usage with Claude Code and OpenCode, and policy proposals guidance.
- inference-routing.mdx, manage-providers.mdx, providers-v2.mdx, supported-agents.mdx, best-practices.mdx for Vertex references.
- architecture/gateway.md Inference Resolution section documenting bundle resolution, Vertex host selection, route shaping, header passthrough, and security model.
Testing
- mise run pre-commit passes (lint, format, license headers)
Checklist