feat(cli): af install/run for agent nodes — encrypted secrets, env prompting, node-to-node deps#692
AbirAbbas wants to merge 7 commits into
…n for agent nodes
Adds the foundation for making 'af install'/'af run' usable for real agent
nodes (which start via 'python -m pkg.app' and have no top-level main.py):
- internal/packages/secrets.go: encrypted at-rest secret store. KeyfileProvider
keeps a random 32-byte key at ~/.agentfield/keyring/master.key (0600);
SecretStore encrypts global.enc + <node>.enc via AES-256-GCM, with node scope
overriding global so shared keys (API tokens) are entered once.
- internal/packages/env_resolver.go: resolves declared env vars in order
process-env -> node store -> global store -> manifest default -> prompt
(hidden for type:secret), persisting prompted secrets encrypted. Injected only
into the child process; never written to disk in plaintext.
- installer.go: manifest gains entrypoint{start,healthcheck}, dependencies.nodes,
and per-var scope. Validation accepts entrypoint.start instead of requiring
main.py; package copy excludes .git/venv/.env/__pycache__.
- runner.go: launches via manifest entrypoint, exports AGENTFIELD_SERVER (the
var the SDK actually reads) alongside legacy AGENTFIELD_SERVER_URL, honors the
manifest healthcheck path, and resolves env via the secret store.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
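For reviewers, a minimal sketch of the AES-256-GCM sealing that the secrets.go bullet describes. This is not the PR's code: the names `seal` and `open`, and the nonce-prefixed byte layout, are assumptions made for illustration, and the real on-disk format of global.enc / <node>.enc may differ.

```go
package main

import (
	"crypto/aes"
	"crypto/cipher"
	"crypto/rand"
	"errors"
	"fmt"
)

// newGCM builds an AES-256-GCM AEAD from a 32-byte key.
func newGCM(key []byte) (cipher.AEAD, error) {
	if len(key) != 32 {
		return nil, errors.New("key must be 32 bytes for AES-256")
	}
	block, err := aes.NewCipher(key)
	if err != nil {
		return nil, err
	}
	return cipher.NewGCM(block)
}

// seal encrypts plaintext and returns nonce||ciphertext||tag.
// Storing the nonce as a prefix is an assumption of this sketch.
func seal(key, plaintext []byte) ([]byte, error) {
	gcm, err := newGCM(key)
	if err != nil {
		return nil, err
	}
	nonce := make([]byte, gcm.NonceSize())
	if _, err := rand.Read(nonce); err != nil {
		return nil, err
	}
	// Passing nonce as dst makes Seal append the ciphertext after it.
	return gcm.Seal(nonce, nonce, plaintext, nil), nil
}

// open reverses seal and fails if the data was tampered with.
func open(key, sealed []byte) ([]byte, error) {
	gcm, err := newGCM(key)
	if err != nil {
		return nil, err
	}
	n := gcm.NonceSize()
	if len(sealed) < n {
		return nil, errors.New("sealed data too short")
	}
	return gcm.Open(nil, sealed[:n], sealed[n:], nil)
}

func main() {
	key := make([]byte, 32) // stand-in for the contents of master.key
	if _, err := rand.Read(key); err != nil {
		panic(err)
	}
	sealed, err := seal(key, []byte("API_TOKEN=example"))
	if err != nil {
		panic(err)
	}
	plain, err := open(key, sealed)
	if err != nil {
		panic(err)
	}
	fmt.Println(string(plain))
}
```

GCM authenticates as well as encrypts, so a corrupted or edited .enc file would fail to open rather than yield garbage values.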
- af secrets set/ls/rm manages the encrypted store (hidden input for set, masked listing, global + --node scopes).
- install resolves dependencies.nodes recursively (af://registry/<name> -> github.com/Agent-Field/<name>, or git URLs), skipping already-installed nodes to break cycles.
- af run brings up a node's installed node-dependencies first, in dependency order, with cycle protection.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
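The reference mapping and the skip-already-installed rule from this commit can be sketched roughly as below. The function names (`resolveRef`, `nodeName`, `installWithDeps`) and the in-memory `manifests` map are hypothetical stand-ins, not the PR's actual signatures; only the af://registry/<name> -> github.com/Agent-Field/<name> mapping comes from the commit message.

```go
package main

import (
	"fmt"
	"path"
	"strings"
)

const registryPrefix = "af://registry/"

// resolveRef maps a dependencies.nodes entry to an install source.
// Anything that is not a registry reference is assumed to be a git URL.
func resolveRef(ref string) (string, error) {
	if strings.HasPrefix(ref, registryPrefix) {
		name := strings.TrimPrefix(ref, registryPrefix)
		if name == "" || strings.Contains(name, "/") {
			return "", fmt.Errorf("invalid registry reference %q", ref)
		}
		return "github.com/Agent-Field/" + name, nil
	}
	return ref, nil
}

// nodeName derives a node name from the last path segment of a source.
func nodeName(source string) string {
	return strings.TrimSuffix(path.Base(source), ".git")
}

// installWithDeps "installs" ref and then its dependencies.
// manifests maps node name -> its dependencies.nodes entries (a fake registry).
// Marking a node installed before recursing is what breaks cycles here.
func installWithDeps(ref string, manifests map[string][]string, installed map[string]bool, order *[]string) error {
	src, err := resolveRef(ref)
	if err != nil {
		return err
	}
	name := nodeName(src)
	if installed[name] {
		return nil // already installed: skip, which also ends any cycle
	}
	installed[name] = true
	*order = append(*order, name)
	for _, dep := range manifests[name] {
		if err := installWithDeps(dep, manifests, installed, order); err != nil {
			return err
		}
	}
	return nil
}

func main() {
	manifests := map[string][]string{
		"greeter-node": {"af://registry/echo-node"},
		"echo-node":    {"af://registry/greeter-node"}, // deliberate cycle
	}
	var order []string
	if err := installWithDeps("af://registry/greeter-node", manifests, map[string]bool{}, &order); err != nil {
		panic(err)
	}
	fmt.Println(order)
}
```

Even with the deliberate cycle, each node is visited once, because the second visit finds it already marked as installed.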
- docs/installing-agent-nodes.md: full guide to af install/run, the agentfield-package.yaml manifest (entrypoint, node deps, user_environment), the encrypted runtime-only secrets model, and af secrets.
- cli-toolkit.md reference: document af install, af run, af secrets (+ embedded skill_data copy synced).
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Local end-to-end verification revealed that the CLI's install/run path goes through internal/core/services (DefaultPackageService/DefaultAgentService), which duplicated (and so bypassed) the fixes previously made in internal/packages. 'af install' on an entrypoint-only node still failed with 'main.py not found', and 'af run' still exported only AGENTFIELD_SERVER_URL and loaded plaintext .env.
- package_service: validate/parse/copy now delegate to the shared packages.ValidatePackage / ParsePackageMetadata / ShouldSkipCopy (entrypoint accepted, junk excluded). Install guidance points at 'af secrets set'.
- agent_service: buildProcessConfig launches via the manifest entrypoint, exports AGENTFIELD_SERVER, resolves env via the encrypted secret store (prompting for missing required), honors the manifest healthcheck path, and drops the plaintext .env loader. RunAgent starts node deps first with a threaded cycle guard.
- packages: export ValidatePackage + ShouldSkipCopy as the single source of truth.
- tests updated to the new contract (entrypoint validation, store-based env injection instead of .env).
Verified end-to-end: install entrypoint-only node -> missing-secret errors cleanly -> af secrets set -> af run injects AGENTFIELD_SERVER + the stored secret + manifest default into the process (confirmed via the node's env dump).
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
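The env resolution that agent_service now delegates to follows the order stated for env_resolver.go: process env, node store, global store, manifest default, then prompt. A rough sketch of that lookup chain follows; the `EnvVar` and `Sources` types and the `resolve` function are hypothetical, the stores are plain maps instead of encrypted files, and persisting prompted secrets is left out.

```go
package main

import "fmt"

// EnvVar is a hypothetical view of one declared manifest variable.
type EnvVar struct {
	Name     string
	Default  string
	Required bool
	Secret   bool // a real prompt would hide input for secrets
}

// Sources holds the lookup layers; Prompt is nil in non-interactive sessions.
type Sources struct {
	ProcessEnv  map[string]string
	NodeStore   map[string]string
	GlobalStore map[string]string
	Prompt      func(v EnvVar) (string, bool)
}

// resolve returns the value and the layer it came from.
// The first layer that has the variable wins, so node scope overrides global.
func resolve(v EnvVar, s Sources) (value, source string, err error) {
	if val, ok := s.ProcessEnv[v.Name]; ok {
		return val, "process-env", nil
	}
	if val, ok := s.NodeStore[v.Name]; ok {
		return val, "node-store", nil
	}
	if val, ok := s.GlobalStore[v.Name]; ok {
		return val, "global-store", nil
	}
	if v.Default != "" {
		return v.Default, "manifest-default", nil
	}
	if s.Prompt != nil {
		if val, ok := s.Prompt(v); ok {
			return val, "prompt", nil
		}
	}
	if v.Required {
		// Fail cleanly instead of blocking on input that can never arrive.
		return "", "", fmt.Errorf("missing required variable %s", v.Name)
	}
	return "", "unset", nil
}

func main() {
	s := Sources{
		NodeStore:   map[string]string{"API_TOKEN": "node-value"},
		GlobalStore: map[string]string{"API_TOKEN": "global-value"},
	}
	val, src, err := resolve(EnvVar{Name: "API_TOKEN", Required: true, Secret: true}, s)
	fmt.Println(val, src, err)
}
```

In this sketch the resolved values would only ever be placed in the child process environment, matching the runtime-only model the PR describes.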
Local multi-agent verification showed a port collision: dependencies were started after the parent allocated its port, so the parent's port (not yet bound) was handed out again to a dependency, which then failed to bind. Move dependency startup ahead of port allocation so each dependency fully binds its own port first.
Verified end-to-end against a live local control plane: 'af run greeter-node' auto-starts its dependency echo-node (distinct ports 8002/8003), both register, both reasoners execute through the control plane, and an already-running dependency is left untouched (same PID).
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
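The ordering fix can be illustrated with a small simulation: dependencies start (and bind) before the parent asks for a port, so a free-port scan never hands out a port that is reserved but not yet bound. Everything here is hypothetical scaffolding (the `cluster` type, the scan starting at 8002, returning an error on a cycle); the PR's own cycle guard may skip rather than fail.

```go
package main

import "fmt"

// cluster simulates node processes and the ports they have bound.
type cluster struct {
	deps    map[string][]string // node -> node dependencies
	bound   map[int]string      // port -> node, set only once a node has bound
	running map[string]int      // node -> port
}

// freePort returns the lowest port from 8002 upward that nobody has bound.
func (c *cluster) freePort() int {
	for p := 8002; ; p++ {
		if _, taken := c.bound[p]; !taken {
			return p
		}
	}
}

// start brings up name after its dependencies.
// visiting tracks the current dependency path to detect cycles.
func (c *cluster) start(name string, visiting map[string]bool) error {
	if _, ok := c.running[name]; ok {
		return nil // already running: leave it untouched
	}
	if visiting[name] {
		return fmt.Errorf("dependency cycle at %s", name)
	}
	visiting[name] = true
	defer delete(visiting, name)

	// Dependencies first, so each one binds its port before we allocate ours.
	for _, d := range c.deps[name] {
		if err := c.start(d, visiting); err != nil {
			return err
		}
	}
	port := c.freePort()
	c.bound[port] = name
	c.running[name] = port
	return nil
}

func main() {
	c := &cluster{
		deps:    map[string][]string{"greeter-node": {"echo-node"}},
		bound:   map[int]string{},
		running: map[string]int{},
	}
	if err := c.start("greeter-node", map[string]bool{}); err != nil {
		panic(err)
	}
	fmt.Println(c.running["echo-node"], c.running["greeter-node"])
}
```

If the port were allocated before the loop over dependencies, both nodes in this simulation would receive the same port, which is the collision the commit describes.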
Unit tests for resolveNodeRef, installedNames, installNodeDependencies (skip-already-installed), and startNodeDependencies (not-installed warning + already-running skip) in both the service and packages layers, covering the new patch lines and pinning the behaviors verified end-to-end.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
📊 Coverage gate
Thresholds from
✅ Gate passed
No surface regressed past the allowed threshold and the aggregate stayed above the floor.
📐 Patch coverage gate
Threshold: 80% on lines this PR touches vs
❌ Patch gate failed
End-to-end install testing against the published node repos surfaced two gaps:
1. The git and GitHub install paths (git.go/github.go findPackageRoot) were a third and fourth copy of the 'main.py required' check, so 'af install <github-url>' failed for entrypoint-only nodes (no top-level main.py) such as SWE-AF and cloudsecurity-af. Both now delegate to the shared ValidatePackage (accepts a manifest entrypoint.start).
2. Dependency install only ran for requirements.txt projects, so pyproject-only nodes (pr-af, sec-af, cloudsecurity-af) installed with no venv and no deps. Dependency install is now a single shared InstallPythonDependencies that also runs 'pip install .' for pyproject.toml/setup.py projects.
Verified: all five published node repos now install from their GitHub URLs; a pyproject node (sec-af) builds its venv and 'pip install .' succeeds, with sec_af + agentfield importable from the node's venv. (Nodes that declare requires-python >=3.11 need a matching interpreter on PATH; pip reports this clearly.)
Tests updated for the new validation contract; new unit tests cover the pyproject branch and entrypoint-accepting findPackageRoot.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
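The two rules in this commit (accept a manifest entrypoint.start in place of main.py, and run 'pip install .' for pyproject.toml/setup.py projects) can be sketched as two small pure functions. The names `validate` and `pipSteps` and their inputs are hypothetical; the shared ValidatePackage and InstallPythonDependencies in the PR may take different arguments and perform more checks.

```go
package main

import (
	"errors"
	"fmt"
	"strings"
)

// validate accepts a package that has either a manifest entrypoint.start
// or a top-level main.py (the legacy layout).
func validate(hasMainPy bool, entrypointStart string) error {
	if strings.TrimSpace(entrypointStart) == "" && !hasMainPy {
		return errors.New("package needs entrypoint.start in the manifest or a top-level main.py")
	}
	return nil
}

// pipSteps lists the pip invocations for the files found in the package root.
// Both steps run when a project ships requirements.txt and pyproject.toml.
func pipSteps(files map[string]bool) [][]string {
	var steps [][]string
	if files["requirements.txt"] {
		steps = append(steps, []string{"pip", "install", "-r", "requirements.txt"})
	}
	if files["pyproject.toml"] || files["setup.py"] {
		steps = append(steps, []string{"pip", "install", "."})
	}
	return steps
}

func main() {
	// A pyproject-only, entrypoint-only node, similar in shape to sec-af.
	files := map[string]bool{"pyproject.toml": true}
	if err := validate(files["main.py"], "python -m sec_af.app"); err != nil {
		panic(err)
	}
	for _, step := range pipSteps(files) {
		fmt.Println(strings.Join(step, " "))
	}
}
```

Keeping both decisions in one shared place is the point of the commit: the git, GitHub, and local install paths would then agree on what counts as a valid package.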
@AbirAbbas before this goes in, can you test it with our swe/pr-af etc.?
Rechecked this today. The main blocker is still the required |
Summary
The af install / af run agent-node scaffolding has existed since the very first commit but was effectively unusable for real nodes: it required a top-level main.py (real nodes start via 'python -m pkg.app'), hardcoded 'python main.py' at launch, exported AGENTFIELD_SERVER_URL while the SDK reads AGENTFIELD_SERVER, and stored secrets as plaintext .env. This PR makes the flow work end-to-end and adds the pieces needed for day-to-day use.
What's new
- agentfield-package.yaml manifest with an entrypoint.start (e.g. 'python -m pr_af.app'); no main.py required. The runner launches via the manifest entrypoint, honors the manifest healthcheck path, and exports AGENTFIELD_SERVER (+ legacy AGENTFIELD_SERVER_URL).
- Encrypted secret store under ~/.agentfield/secrets/ with a random 32-byte key in ~/.agentfield/keyring/master.key (0600). Secrets are decrypted only into the child process's environment at start time and are never written back to disk in plaintext. Global scope is shared across nodes; node scope overrides it.
- On af run, required variables resolve in order: process env → node store → global store → manifest default → prompt (hidden for type: secret), persisting prompted secrets encrypted. Missing required vars in a non-interactive session produce a clean error instead of hanging.
- af secrets command: set / ls (values masked) / rm, with --node scoping.
- Node dependencies declared in dependencies.nodes (e.g. af://registry/swe-planner → github.com/Agent-Field/<name>, or a git URL). af install pulls them in recursively (skipping already-installed nodes, which breaks cycles); af run starts a node's dependencies first, in order, before allocating its own port, and leaves already-running dependencies untouched.
Notable fixes uncovered while verifying locally
- The CLI's install/run path goes through internal/core/services, a duplicate of the internal/packages logic; fixes are now applied there (the two layers share one ValidatePackage / ParsePackageMetadata / ShouldSkipCopy).
Verification
Verified end-to-end against a live local control plane with two no-LLM nodes built on the real SDK:
- af install an entrypoint-only node (no main.py) → registers → 'af call node.reasoner' returns a real result.
- After af secrets set, af run injects AGENTFIELD_SERVER + the stored secret + manifest defaults into the process (confirmed via the node's own env dump; no plaintext on disk, files 0600).
- 'af run greeter-node' auto-starts its dependency echo-node first (distinct ports), both register and execute; an already-running dependency is not restarted.
Docs
- docs/installing-agent-nodes.md: full guide to install/run, the manifest schema, the encrypted secrets model, and af secrets.
- cli-toolkit.md reference updated (+ embedded skill copy synced).
Test plan
- 'go build ./...' clean
- 'go test ./...' for control-plane green (47 packages)
Follow-ups (not in this PR)
- agentfield-package.yaml manifests in the public node repos (SWE-AF, pr-af, sec-af, cloudsecurity-af, af-template).
- af dev (a third copy of the launch logic) still hardcodes main.py; collapsing the duplicated install/run implementations into one is a good follow-up.
🤖 Generated with Claude Code