Multi-challenge Bittensor subnet platform with master/validator orchestration
Miner Guide • Validator Guide • Foundation Master Guide • Architecture • Challenges • Security • Website
BASE is a multi-challenge Bittensor subnet platform. It lets independent challenge subnets run under one validator network, routes miner traffic to the right challenge, collects raw challenge weights, normalizes emissions, maps miner hotkeys to Bittensor UIDs, and publishes the final vector for validators to submit on-chain.
Each challenge lives in its own repository and owns its submissions, scoring logic, state, and public miner experience. BASE provides the orchestration layer that makes those challenges run together as one subnet.
BASE runs as a single Docker Swarm: a master (manager) node hosts the platform API (a single
proxy that also serves the /v1/registry and /v1/weights/latest reads plus the token-gated admin
routes), broker, supervisor, and the challenge services, while manually enrolled worker nodes run
short-lived CPU/GPU evaluation jobs. There is no Kubernetes and no runtime.backend selector; the
only backend is Swarm.
- One BASE master (Swarm manager) controls the central registry and orchestration.
- One repository and image per challenge, isolated from other challenges.
- Challenges expose a standard internal weight contract to BASE.
- Public challenge APIs are proxied through BASE without exposing internal control routes.
- The shared control-plane PostgreSQL is private to the master process.
- Each challenge keeps its own SQLite database on its `/data` Swarm volume.
- Challenge state remains owned by each challenge.
- The master node runs all active challenge services; the on-chain submitter only submits weights.
Docs are grouped by audience.
Miners
- Miner guide: choose a challenge, submit through the proxy, and track leaderboards.
Validators / operators
- Validator guide: install the submit-only on-chain weight submitter.
- Validator operations: submitter plus manager-service runbook.
- Swarm deployment: node `daemon.json` variants, worker enrollment, networking, prune policy, and the supervisor unit.
- Foundation master guide: Cortex Foundation master bring-up (foundation-only).
- Reward semantics: reference spec for how the Terminal-Bench (harbor) scorer maps verifier output to a reward and submission status.
- Deploy from scratch: the end-to-end bring-up quickstart below.
Developers / challenge integrators
- Architecture: control-plane versus worker topology and the Swarm broker contract.
- Challenges: the challenge model.
- Challenge integration guide: the API contract a challenge must expose.
- Security model: trust boundaries and secret handling.
- Versioning: SemVer, Git tag, and GHCR tag policy.
```mermaid
flowchart LR
    BT[Bittensor] --> M[Master]
    M --> PG[(Control-plane Postgres)]
    M --> C1[Challenge A]
    M --> C2[Challenge B]
    C1 --> D1[(Challenge A /data SQLite)]
    C2 --> D2[(Challenge B /data SQLite)]
    M --> B[Broker]
    B --> J1[CPU job]
    B --> J2[GPU job]
    U[Miners] --> P[Proxy]
    P --> C1
    P --> C2
    M --> W[Weights API]
    SUB[Submitter] --> W
    SUB --> BT
```
```mermaid
sequenceDiagram
    participant E as Epoch
    participant M as Master
    participant A as Challenge A
    participant B as Challenge B
    participant G as Aggregator
    participant V as Submitter
    participant BT as Bittensor
    E->>M: trigger
    M->>A: collect challenge weights
    M->>B: collect challenge weights
    A-->>M: hotkey -> weight
    B-->>M: hotkey -> weight
    M->>G: normalize + emissions
    G-->>M: uid weights
    M-->>V: latest uid weights
    V->>BT: set_weights
```
BASE coordinates the full lifecycle of a multi-challenge subnet:
- The master tracks active challenges and their emission shares.
- The master (manager) node runs the active challenge services from the registry.
- Challenge services run isolated from the control plane and from each other, each on its own `/data` Swarm volume.
- Miners interact with the relevant challenge through BASE's public proxy.
- Each challenge calculates raw hotkey weights from its own scoring rules.
- BASE normalizes challenge outputs, applies configured emissions, and maps hotkeys to UIDs.
- The on-chain submitter fetches the master's final vector and submits weights to Bittensor at epoch boundaries.
If a challenge fails, BASE can isolate that challenge's contribution without taking down the entire subnet.
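The weight pipeline described above (collect raw hotkey weights per challenge, normalize each challenge, apply emission shares, map hotkeys to UIDs) can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not BASE's aggregator: the function names and input shapes are hypothetical, and both "drop hotkeys without a UID" and "renormalize the remaining shares when a challenge fails" are choices made for this sketch.

```python
from typing import Mapping, Optional


def normalize(raw: Mapping[str, float]) -> dict[str, float]:
    """Scale one challenge's positive raw hotkey weights to sum to 1 (empty if none)."""
    positive = {hk: w for hk, w in raw.items() if w > 0}
    total = sum(positive.values())
    return {hk: w / total for hk, w in positive.items()} if total > 0 else {}


def aggregate(
    challenge_weights: Mapping[str, Optional[Mapping[str, float]]],
    emission_shares: Mapping[str, float],
    hotkey_to_uid: Mapping[str, int],
) -> dict[int, float]:
    """Combine per-challenge hotkey weights into one UID weight vector summing to 1.

    None marks a challenge whose weights could not be collected; it is skipped
    and the remaining emission shares are renormalized (a sketch assumption).
    """
    live: dict[str, dict[str, float]] = {}
    for challenge, raw in challenge_weights.items():
        weights = normalize(raw) if raw is not None else {}
        if weights and emission_shares.get(challenge, 0.0) > 0:
            live[challenge] = weights
    share_total = sum(emission_shares[c] for c in live)
    uid_weights: dict[int, float] = {}
    for challenge, weights in live.items():
        share = emission_shares[challenge] / share_total
        for hotkey, w in weights.items():
            uid = hotkey_to_uid.get(hotkey)
            if uid is not None:  # hotkeys not registered on the subnet are dropped
                uid_weights[uid] = uid_weights.get(uid, 0.0) + share * w
    total = sum(uid_weights.values())
    return {uid: w / total for uid, w in uid_weights.items()} if total > 0 else {}
```

Passing `None` for a challenge keeps one failed collection from zeroing the whole vector, which is one way to realize the isolation property described above.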
Miners choose a challenge, follow that challenge's submission rules, and monitor challenge-specific leaderboards through BASE.
Challenge owners maintain independent repositories, images, scoring logic, public documentation, and weight contracts.
Validators run the on-chain submitter: they fetch the master's final normalized vector from the weights API and submit it to Bittensor. Challenge services are run by the master (manager) node, not by the submitter.
```text
platform/
  src/base/   # CLI, APIs, orchestration, Bittensor wrappers
  alembic/    # PostgreSQL migrations
  config/     # YAML example configs
  docker/     # Dockerfiles and OCI image assets
  docs/       # Project, miner, validator, and challenge docs
  plan/       # Detailed design plan
  tests/      # Unit/runtime validation tests
```
BASE uses a Docker Swarm-only first-party deployment path and keeps Dockerfiles for the OCI images that Swarm runs:
- `deploy/swarm/install-swarm.sh` brings up the single-node Swarm manager (master proxy, broker, and challenge services on encrypted overlay networks). It is dry-run by default, mutates only with `--apply`, and keeps every destructive step behind its own explicit flag (`--restart-dockerd`, `--single-node-placement`, `--static-challenges`).
- `deploy/swarm/base-supervisor.service` installs the manager-only systemd supervisor: broker-health, timeout-reaper, image-updater, challenge-image-updater, config-sync, and self-update loops.
- The master (manager) node runs the challenge services with the placement constraint `node.role==manager`. The broker dispatches CPU jobs to `node.labels.base.workload==cpu` workers and GPU jobs (`gpu_count>0`) to `node.labels.base.workload==gpu` workers with `--generic-resource NVIDIA-GPU=<N>`.
- Worker nodes are enrolled manually with Swarm join tokens (no SSH) via the `base master worker` CLI group: `worker token [--cpu|--gpu]` prints `docker swarm join --token <TOKEN> <MANAGER_IP>:2377`; after the operator installs the matching `daemon.json` and joins, `worker label <node> --workload cpu|gpu` sets the scheduling label. The group also has `worker list`, `worker drain`, `worker rm`, and `worker inspect`.
- Control-plane state is a single shared PostgreSQL supplied via `BASE_DATABASE_URL` or a Docker secret; SQLite is rejected for control-plane state. Each challenge keeps its own SQLite database on its `/data` Swarm volume; there is no Postgres server per challenge.
- The on-chain submitter (`deploy/swarm/submitter/`) is a systemd service that reads `/v1/weights/latest` and submits on-chain; it runs no challenge orchestration.
- The supervisor image-updater and challenge-image-updater resolve the digest behind a public GHCR tag (for example `ghcr.io/baseintelligence/base-master:latest`) and roll the Swarm service to `tag@sha256:<digest>` only when that digest changes; no GHCR pull secret is required for public packages.
- Pinned production mode uses a tag plus a `sha256` digest, for example `ghcr.io/baseintelligence/demo:1.2.3@sha256:<64-hex-digest>` for releases or `ghcr.io/baseintelligence/demo:latest@sha256:<64-hex-digest>` for the autonomous update channel, and disables mutable auto-update. Production rejects untagged images, missing digests, and non-SemVer non-`latest` tags. BASE release versioning starts at `3.0.0`; see `docs/versioning.md` for the SemVer, Git tag, mutable `latest`/`main`, and GHCR tag policy.
- Swarm networking uses encrypted overlay networks at MTU 1450. Required inter-node ports: `2377/tcp` (management), `7946/tcp+udp` (gossip), `4789/udp` (VXLAN data plane), and IP protocol 50 (ESP) for the encrypted overlay.
- Swarm services map CPU and memory to `--limit-cpu` and `--limit-memory`, and PID ceilings to `--limit-pids`. `docker service create` does not support `--memory-swap` or `--security-opt`, so swap limits are not emitted and `no-new-privileges` is enforced daemon-wide via `daemon.json`.
- Broker image allowlists should stay scoped to `ghcr.io/baseintelligence/` unless a deployment explicitly adds another trusted registry namespace.
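The pinned-production image rules in the list above can be expressed as a small validator. This is a hedged sketch, not BASE's actual check: the function name, the simplified SemVer regex, and treating the broker allowlist as a plain prefix match are assumptions made for illustration.

```python
import re

# Simplified SemVer (MAJOR.MINOR.PATCH with optional pre-release/build suffixes).
SEMVER = re.compile(
    r"^(0|[1-9]\d*)\.(0|[1-9]\d*)\.(0|[1-9]\d*)(?:-[0-9A-Za-z.-]+)?(?:\+[0-9A-Za-z.-]+)?$"
)
DIGEST = re.compile(r"^sha256:[0-9a-f]{64}$")
ALLOWED_PREFIXES = ("ghcr.io/baseintelligence/",)  # hypothetical allowlist shape


def validate_pinned_image(ref: str) -> None:
    """Raise ValueError unless ref is <allowed-repo>:<semver|latest>@sha256:<64 hex>."""
    if not ref.startswith(ALLOWED_PREFIXES):
        raise ValueError(f"registry namespace not allowlisted: {ref}")
    name, sep, digest = ref.partition("@")
    if not sep or not DIGEST.match(digest):
        raise ValueError(f"missing or malformed sha256 digest: {ref}")
    _repo, colon, tag = name.rpartition(":")
    if not colon or "/" in tag:  # no tag at all, or a registry port mistaken for a tag
        raise ValueError(f"untagged image: {ref}")
    if tag != "latest" and not SEMVER.match(tag):
        raise ValueError(f"tag must be SemVer or 'latest': {ref}")
```

Checking the digest before the tag keeps the error message specific: a reference with a good tag but no digest is reported as a missing digest rather than as an untagged image.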
See deploy/swarm/ for the installer, supervisor unit, submitter, and daemon.json templates that define the production deployment.
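To make the placement and limit rules concrete, here is a sketch of the flags a broker could pass to `docker service create` for one eval job. The `JobLimits` type and `job_service_flags` function are hypothetical; only the Docker flags and the node-label constraints come from the text above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class JobLimits:
    """Hypothetical resource request for one broker eval job."""

    cpus: float
    memory: str  # Docker size string, e.g. "8g"
    pids: int
    gpu_count: int = 0


def job_service_flags(limits: JobLimits) -> list[str]:
    """Build `docker service create` flags for a CPU or GPU eval job.

    Swap limits and --security-opt are deliberately absent: the service API does
    not accept them, so no-new-privileges lives in daemon.json instead.
    """
    workload = "gpu" if limits.gpu_count > 0 else "cpu"
    flags = [
        "--constraint", f"node.labels.base.workload=={workload}",
        "--limit-cpu", str(limits.cpus),
        "--limit-memory", limits.memory,
        "--limit-pids", str(limits.pids),
    ]
    if limits.gpu_count > 0:
        # GPU workers advertise NVIDIA-GPU as a Swarm generic resource.
        flags += ["--generic-resource", f"NVIDIA-GPU={limits.gpu_count}"]
    return flags
```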
PRISM GPU evals re-execute the miner's training loop on locked FineWeb-Edu data under a forced random init. The broker delivers that locked data to the eval container through a per-slug read-only mount mechanism (`SwarmBrokerConfig.eval_readonly_mounts_by_slug` in `master/swarm_backend.py`, setting `docker.broker_eval_readonly_mounts_by_slug`, wired in `cli_app/main.py`) that is decoupled from the Docker-socket allowlist, so the prism eval job receives the data without the (root-equivalent) host Docker socket.
- Every prism GPU eval job bind-mounts the locked FineWeb-Edu train volume (`prism_fineweb_edu_train` → `/data/fineweb-edu/train`) and the offline reference tokenizers (`prism_reference_tokenizers` → `/opt/prism/reference-tokenizers`) read-only, via the built-in `DEFAULT_PRISM_EVAL_READONLY_MOUNTS` (no `master.yaml` entry required).
- Only the `train` split is exposed; the secret `val`/`test` held-out splits are never mounted into the eval container, which runs `network=none` on an internal overlay and carries no OpenRouter secret.
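The per-slug mount idea above can be sketched as a table from challenge slug to read-only volume mounts, rendered as `--mount` flags. The dictionary shape and the `readonly_mount_flags` helper are assumptions for illustration; the real `SwarmBrokerConfig.eval_readonly_mounts_by_slug` may be structured differently. Note that the sketch only emits volume mounts and never a Docker socket bind.

```python
# Hypothetical shape: slug -> list of (volume, container path) pairs.
EVAL_READONLY_MOUNTS_BY_SLUG: dict[str, list[tuple[str, str]]] = {
    "prism": [
        ("prism_fineweb_edu_train", "/data/fineweb-edu/train"),
        ("prism_reference_tokenizers", "/opt/prism/reference-tokenizers"),
    ],
}
# Held-out splits that must never reach an eval container.
SECRET_VOLUMES = {"prism_fineweb_edu_val", "prism_fineweb_edu_test"}


def readonly_mount_flags(slug: str) -> list[str]:
    """Return read-only `--mount` flags for a slug's eval job, refusing secret volumes."""
    flags: list[str] = []
    for volume, target in EVAL_READONLY_MOUNTS_BY_SLUG.get(slug, []):
        if volume in SECRET_VOLUMES:
            raise ValueError(f"held-out volume must never reach an eval job: {volume}")
        flags += ["--mount", f"type=volume,source={volume},target={target},readonly"]
    return flags
```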
deploy/swarm/install-swarm.sh canonicalizes the PRISM v2 eval-plane deploy wiring on the challenge service:
- Augmented evaluator image: `IMAGE_PRISM_EVALUATOR` defaults to `ghcr.io/baseintelligence/prism-evaluator:augmented` (bundles `sentencepiece` plus the offline tiktoken cache for the locked pipeline) and is passed as `PRISM_BASE_EVAL_IMAGE`; the registry `:latest` evaluator is stale and must not be used.
- Host-side held-out: the manager-pinned prism scorer (not the `network=none` eval container) mounts the SECRET val split read-only (`prism_fineweb_edu_val` → `/secret/val`) and reads it via `PRISM_BASE_EVAL_VAL_DATA_DIR=/secret/val` for the held-out delta; the held-out is gracefully skipped if val is absent.
- OpenRouter LLM hard gate: `PRISM_LLM_REVIEW_ENABLED=true`; the key is mounted on the challenge service ONLY at `/run/secrets/openrouter_api_key` (from the `base_openrouter_api_key` Docker secret), never on the eval container.
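The scorer-side behaviour in the list above (read the OpenRouter key only from its Docker secret file, and skip the held-out delta when the val split is absent) might look like the following. This is a sketch under assumptions: the helper names are hypothetical, and treating an empty directory as "absent" is this sketch's own choice, since an unpopulated named volume still mounts successfully.

```python
import os
from pathlib import Path
from typing import Mapping, Optional


def read_secret(path: str = "/run/secrets/openrouter_api_key") -> Optional[str]:
    """Return a Docker secret's contents, or None when the secret is not mounted."""
    p = Path(path)
    return p.read_text().strip() if p.is_file() else None


def held_out_dir(env: Mapping[str, str] = os.environ) -> Optional[Path]:
    """Return the held-out val directory, or None so scoring can skip the delta."""
    raw = env.get("PRISM_BASE_EVAL_VAL_DATA_DIR")
    if not raw:
        return None
    p = Path(raw)
    return p if p.is_dir() and any(p.iterdir()) else None
```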
See deploy/swarm/README.md for the full broker mount mechanism and deploy details.
Run these commands from the repository root when validating the platform locally. The live Swarm checks require Docker. If a tool is missing, record the bounded blocker rather than claiming that surface was tested.
```bash
uv sync --extra dev --extra master
uv run ruff check .
uv run ruff format --check .
uv run mypy src tests
uv run pytest --cov=base --cov-report=term-missing --cov-fail-under=80
bash -n deploy/swarm/install-swarm.sh
./deploy/swarm/install-swarm.sh  # dry-run: prints the planned docker swarm commands, changes nothing
```

For a live single-node check (mutating; run only on a disposable host):

```bash
docker swarm init
docker network create --driver overlay --opt encrypted \
  --opt com.docker.network.driver.mtu=1450 base_challenges
docker service ls
docker swarm leave --force
```

Evidence for local validation should live in a local, gitignored evidence directory and must not contain tokens, credentialed database URLs, private registry credentials, bearer secrets, or private keys.
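One way to enforce the evidence rule above is to scan the evidence directory before sharing it. This is a minimal sketch, not a BASE tool: the patterns are illustrative assumptions that catch common shapes (a `user:password@` URL, a `Bearer` header, a PEM private-key header) and will not find every secret format.

```python
import re
from pathlib import Path

# Illustrative patterns only; extend them for your own secret formats.
SECRET_PATTERNS = {
    "credentialed URL": re.compile(r"[a-z][a-z0-9+.-]*://[^/\s:@]+:[^/\s@]+@"),
    "bearer token": re.compile(r"(?i)\bbearer\s+[a-z0-9._~+/=-]{16,}"),
    "private key": re.compile(r"-----BEGIN [A-Z ]*PRIVATE KEY-----"),
}


def scan_text(text: str) -> list[str]:
    """Return the names of secret patterns found in text."""
    return [name for name, pat in SECRET_PATTERNS.items() if pat.search(text)]


def scan_evidence(root: Path) -> dict[str, list[str]]:
    """Map each evidence file under root to the secret patterns it contains."""
    hits: dict[str, list[str]] = {}
    for path in sorted(root.rglob("*")):
        if path.is_file():
            found = scan_text(path.read_text(errors="replace"))
            if found:
                hits[str(path.relative_to(root))] = found
    return hits
```

A plain `http://127.0.0.1:18080/health` URL does not match the credentialed-URL pattern, so ordinary curl transcripts pass.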
This is the end-to-end path to stand up the full subnet on a fresh Docker Swarm: a manager node
(control plane plus the long-lived challenge services) and one or more CPU/GPU workers (short-lived
broker eval jobs). It ties together image builds, image publishing/staging, volume provisioning,
install-swarm.sh --apply, worker enrollment, and the on-chain submitter. Weights are always
computed dry-run; the on-chain submitter is a separate, optional step. Run install-swarm.sh
dry-run first (no flags) and only --apply on a host you own.
The three backend repositories are sibling checkouts under a common parent
(platform/, agent-challenge/, prism/); the frontend deploys separately to Vercel.
| Node | Swarm role | Runs |
|---|---|---|
| Manager (also the validator / hotkey node) | `node.role==manager` | Control plane (proxy / broker / supervisor) and the challenge services (agent-challenge, PRISM) |
| CPU worker | `node.labels.base.workload==cpu` | Short-lived CPU broker jobs |
| GPU worker | `node.labels.base.workload==gpu` | Short-lived GPU broker jobs; advertises `NVIDIA-GPU` as a Swarm generic resource |
The manager control-plane services are published on fixed host ports by
install-swarm.sh --apply (overridable via the MASTER_PROXY_PORT /
MASTER_BROKER_PORT env vars; the defaults below match the live box):
| Manager service (host-published) | Host port |
|---|---|
| `base-master-proxy` (single public API; serves `/v1/registry`, `/v1/weights/latest`, `/health`, and routes `/challenges/*`) | 18080 |
| `base-master-broker` | 18082 |
/v1/registry and /v1/weights/latest are served by the proxy on 18080; there
is no separate admin service or port.
The challenge services and the Postgres backing stores are overlay-internal
(no host publish): clients reach the challenges through the proxy over the
base_challenges overlay (e.g. http://127.0.0.1:18080/challenges/prism/...),
and the master reaches Postgres by service name. They listen on their container
ports only:
| Overlay-internal service (reached via the proxy / by service name) | Container port |
|---|---|
| challenge-agent-challenge (plus worker sidecar) | 8000 |
| challenge-prism (SQLite-backed) | 8080 |
| base-master-postgres (control plane) | 5432 |
| challenge-agent-challenge-postgres | 5432 |
| challenge-prism-postgres | 5432 |
The live box additionally exposes some of these on the host for direct debugging only (not the canonical client path): prism on `18002`, agent-challenge on `18001`, and the Postgres stores on `15432`/`15433`/`15434`. Production clients always go through the proxy.
GPU eval jobs are dispatched by the broker to a GPU worker via the constraint
node.labels.base.workload==gpu plus --generic-resource NVIDIA-GPU=<N>.
Build from each repo's Dockerfile. <tag> is your release tag (a SemVer such as 3.0.0, or
latest for the mutable channel).
```bash
# base-master (this repo): proxy (single public API) + broker + supervisor
docker build -f docker/Dockerfile.master -t ghcr.io/baseintelligence/base-master:<tag> .
# prism API + GPU evaluator (from ../prism)
docker build --target service -t ghcr.io/baseintelligence/prism:<tag> ../prism
docker build --target evaluator -t ghcr.io/baseintelligence/prism-evaluator:augmented ../prism
# agent-challenge API + own_runner eval-job image (from ../agent-challenge)
docker build --target runtime -t ghcr.io/baseintelligence/agent-challenge:<tag> ../agent-challenge
docker build --target terminal-bench-runner -t ghcr.io/baseintelligence/agent-challenge-terminal-bench-runner:<tag> ../agent-challenge
```

The prism evaluator must be the `:augmented` image: it bundles `sentencepiece` and the offline
tiktoken cache the locked FineWeb-Edu pipeline needs. The registry `:latest` evaluator is stale; do
not use it.
Build-order coupling: prism pins its `base` dependency by git (`base @ git+https://github.com/BaseIntelligence/base.git`, public HEAD), so a fresh `prism` build bundles whatever is on the pushed platform HEAD. Push the platform commits that the prism/broker images depend on before building `prism`/`prism-evaluator`.
- GHCR publish (preferred): `docker push` each tag to `ghcr.io/baseintelligence/*`. Public packages need no pull secret; the supervisor image-updaters then track digests automatically.
- Local-only staging (no `write:packages`): build each image on the node that runs it (manager for the services, GPU/CPU workers for the eval images), and deploy with `docker service update --no-resolve-image` so a non-registry tag resolves to the node-local image. Pre-pull/stage the prism evaluator and the agent-challenge runner on the worker nodes so the broker resolves them locally.
On the GPU worker, stage the locked PRISM data and reference tokenizers as read-only volumes (produced by prism's FineWeb-Edu prep job; see the prism repo):
- `prism_fineweb_edu_train` → `/data/fineweb-edu/train` (miner-visible, read-only)
- `prism_fineweb_edu_val`, `prism_fineweb_edu_test`: secret held-out, scorer-only (never mounted in the `network=none` eval container)
- `prism_reference_tokenizers` → `/opt/prism/reference-tokenizers`
On the manager, provision the agent-challenge read-only task cache and golden volumes:
```bash
deploy/swarm/acquire-agent-challenge-cache.sh  # populates agent_challenge_task_cache + agent_challenge_golden
```

Verify each volume is both present and populated (a Docker named volume is auto-created empty on
first mount, so an empty mount succeeds silently). Provide the OpenRouter key for the PRISM/agent
LLM gate as the Docker secret consumed at /run/secrets/openrouter_api_key (the installer creates
base_openrouter_api_key from $OPENROUTER_API_KEY); the eval containers never carry it.
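The "present and populated" check described above can be sketched as two separate tests, because anything that mounts a missing named volume silently creates it. The helper names and the injectable runner (which lets the check be exercised without Docker) are assumptions; `docker volume inspect` itself is a read-only command that exits non-zero for an unknown volume.

```python
import subprocess
from pathlib import Path
from typing import Callable

Runner = Callable[[list[str]], int]


def _docker(argv: list[str]) -> int:
    """Run a docker CLI command quietly and return its exit code."""
    return subprocess.run(argv, stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL).returncode


def volume_exists(name: str, run: Runner = _docker) -> bool:
    """Check existence without creating the volume (`docker volume inspect` is read-only)."""
    return run(["docker", "volume", "inspect", name]) == 0


def is_populated(mountpoint: Path) -> bool:
    """True when a mounted volume directory contains at least one entry."""
    return mountpoint.is_dir() and any(mountpoint.iterdir())
```

Call `volume_exists` first, and only then mount the volume (for example into a throwaway container) to run `is_populated` on its mount path.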
deploy/swarm/install-swarm.sh is the canonical entry point. It is dry-run by default and
mutates only with --apply; every destructive step is behind its own flag. Point the image tags at
what you built/published via the IMAGE_* environment overrides:
```bash
export IMAGE_MASTER=ghcr.io/baseintelligence/base-master:<tag>
export IMAGE_PRISM=ghcr.io/baseintelligence/prism:<tag>
export IMAGE_PRISM_EVALUATOR=ghcr.io/baseintelligence/prism-evaluator:augmented
export IMAGE_AGENT_CHALLENGE=ghcr.io/baseintelligence/agent-challenge:<tag>
export AGENT_CHALLENGE_RUNNER_IMAGE=ghcr.io/baseintelligence/agent-challenge-terminal-bench-runner:<tag>
./deploy/swarm/install-swarm.sh                            # dry-run: prints the planned docker commands
./deploy/swarm/install-swarm.sh --apply                    # apply on a disposable / owned host
./deploy/swarm/install-swarm.sh --apply --restart-dockerd  # also write /etc/docker/daemon.json + restart dockerd (fresh nodes)
```

The installer initializes the Swarm, creates the encrypted overlay networks (base_challenges
and base_jobs_internal, MTU 1450), creates the value-bearing Docker secrets via stdin (never
argv), and creates the master proxy/broker plus both challenge services (the broker pinned to
node.role==manager). The PRISM eval read-only data mounts are supplied by the broker's built-in
DEFAULT_PRISM_EVAL_READONLY_MOUNTS, so no master.yaml entry is required.
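The "via stdin, never argv" rule for Docker secrets can be illustrated with `docker secret create NAME -`, where `-` tells Docker to read the secret value from standard input, so the value never appears in the process list or shell history. The `create_secret` helper is hypothetical and is not the installer's code; the `run` parameter exists only so the sketch can be tested without Docker.

```python
import subprocess
from typing import Callable

Run = Callable[..., object]


def create_secret(name: str, value: str, run: Run = subprocess.run) -> None:
    """Create a Docker secret, passing the value on stdin so it never appears in argv."""
    argv = ["docker", "secret", "create", name, "-"]  # "-" makes docker read stdin
    run(argv, input=value.encode(), check=True)
```

For example, `create_secret("base_openrouter_api_key", os.environ["OPENROUTER_API_KEY"])` follows the same pattern the installer is described as using.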
Workers are added manually with a Swarm join token (no SSH). From the manager:
```bash
base master worker token --cpu  # or --gpu; prints the docker swarm join command
```

On the worker, install the matching `daemon.json` and join (the GPU `daemon.worker.json` advertises
`NVIDIA-GPU` and registers the NVIDIA runtime):

```bash
JOIN_TOKEN=<TOKEN> scripts/install-worker.sh --manager-addr <MANAGER_IP>:2377 --workload cpu  # dry-run
JOIN_TOKEN=<TOKEN> scripts/install-worker.sh --manager-addr <MANAGER_IP>:2377 --workload cpu --restart-dockerd --apply
```

Back on the manager, label the node so jobs schedule onto it:

```bash
docker node ls
base master worker label <node> --workload cpu  # or gpu
```

See `deploy/swarm/README.md` for `daemon.json` details, networking ports,
and the prune policy.
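For orientation, here are the plain Docker commands that correspond to this enrollment flow. It is an assumption that `base master worker label` wraps `docker node update --label-add`; the helpers below are hypothetical and only build argv lists that you could review before running.

```python
ALLOWED_WORKLOADS = {"cpu", "gpu"}


def join_command(token: str, manager_ip: str) -> list[str]:
    """Command a worker runs to join the Swarm on the management port 2377."""
    return ["docker", "swarm", "join", "--token", token, f"{manager_ip}:2377"]


def label_command(node: str, workload: str) -> list[str]:
    """Manager-side command that sets the base.workload scheduling label."""
    if workload not in ALLOWED_WORKLOADS:
        raise ValueError(f"workload must be one of {sorted(ALLOWED_WORKLOADS)}")
    return ["docker", "node", "update", "--label-add", f"base.workload={workload}", node]
```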
The submitter is a single systemd-managed process that reads /v1/weights/latest from the master
and submits on-chain. It runs no challenge orchestration and needs only the validator hotkey.
```bash
cp deploy/swarm/submitter/run_submitter.py /var/lib/base/submitter/
cp deploy/swarm/submitter/submitter.yaml /etc/base/submitter.yaml
cp deploy/swarm/submitter/base-submitter.service /etc/systemd/system/
systemctl daemon-reload
systemctl enable --now base-submitter.service
```

Full submitter configuration and the operator FAQ are in the Validator guide; manager-service runbooks are in Validator operations.
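The submitter's read side (fetch `/v1/weights/latest`, then sanity-check it before touching the chain) can be sketched with the standard library. This is not `run_submitter.py`: the `{"uids": [...], "weights": [...]}` payload shape is a hypothetical placeholder, so check the real response format before adapting it, and the on-chain call is deliberately left out.

```python
import json
import urllib.request
from typing import Any


def parse_weights(payload: dict[str, Any]) -> tuple[list[int], list[float]]:
    """Validate a hypothetical {"uids": [...], "weights": [...]} payload."""
    uids = [int(u) for u in payload["uids"]]
    weights = [float(w) for w in payload["weights"]]
    if len(uids) != len(weights):
        raise ValueError("uids and weights differ in length")
    if len(set(uids)) != len(uids) or any(w < 0 for w in weights):
        raise ValueError("duplicate uid or negative weight")
    return uids, weights


def fetch_latest(base_url: str = "http://127.0.0.1:18080",
                 timeout: float = 10.0) -> tuple[list[int], list[float]]:
    """GET /v1/weights/latest from the master proxy and validate it."""
    with urllib.request.urlopen(f"{base_url}/v1/weights/latest", timeout=timeout) as resp:
        return parse_weights(json.load(resp))
```

Rejecting a malformed vector before submission keeps a bad master response from ever reaching `set_weights`.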
The two manager services answer /health on their published host ports. The
challenges are overlay-internal, so verify them through the proxy (the
canonical client path) rather than on a host port:
```bash
docker service ls
curl -sf http://127.0.0.1:18080/health  # proxy
curl -sf http://127.0.0.1:18082/health  # broker
curl -sf http://127.0.0.1:18080/v1/registry  # registry (served by the proxy)
curl -sf http://127.0.0.1:18080/v1/weights/latest  # weights (served by the proxy)
curl -sf http://127.0.0.1:18080/challenges/prism/leaderboard  # prism, via the proxy
curl -sf http://127.0.0.1:18080/challenges/agent-challenge/leaderboard  # agent-challenge, via the proxy
```

A GPU eval job lands on a GPU worker via node.labels.base.workload==gpu plus
--generic-resource NVIDIA-GPU=<N>; the long-lived challenge services stay on the manager.
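The curl checks above can also be scripted, for example from a monitoring cron job. This is a hedged sketch using only the standard library: the `failing` helper and the choice to treat any non-200 answer (or a connection error, reported as `0`) as a failure are assumptions that roughly mirror `curl -sf`. The `status` parameter is injectable so the logic can be tested offline.

```python
import urllib.error
import urllib.request
from typing import Callable

PROXY = "http://127.0.0.1:18080"
BROKER = "http://127.0.0.1:18082"
CHECKS = [
    f"{PROXY}/health",
    f"{BROKER}/health",
    f"{PROXY}/v1/registry",
    f"{PROXY}/v1/weights/latest",
    f"{PROXY}/challenges/prism/leaderboard",
    f"{PROXY}/challenges/agent-challenge/leaderboard",
]


def http_status(url: str, timeout: float = 5.0) -> int:
    """Return the HTTP status for url, or 0 when the connection fails."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status
    except urllib.error.HTTPError as err:  # 4xx/5xx still carry a status code
        return err.code
    except (urllib.error.URLError, OSError):
        return 0


def failing(urls: list[str], status: Callable[[str], int] = http_status) -> list[str]:
    """Return the URLs that did not answer 200."""
    return [u for u in urls if status(u) != 200]
```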
The single platform API listens on 127.0.0.1:18080. To expose it publicly as
https://chain.joinbase.ai, front it with a Cloudflare tunnel using one catch-all ingress
rule (chain.joinbase.ai -> http://127.0.0.1:18080) with no /v1 path-split, because
the one port already serves /health, /v1/registry, /v1/weights/latest, /challenges/*, and the
token-gated admin/control-plane routes (which stay private on the same app). No edge-level path
filtering is required.
Public edge is LIVE.
`https://chain.joinbase.ai` is served entirely from this box via its cloudflared tunnel, using the single catch-all ingress rule above (`chain.joinbase.ai -> http://127.0.0.1:18080`). All public read routes return `200`: `/health`, `/v1/registry`, `/v1/weights/latest`, `/challenges/prism/leaderboard`, and `/challenges/agent-challenge/leaderboard`. Admin-write/control-plane and management routes stay private (they return `401`/`405`), and `/internal/*` (plus `/version`) return `404` at the edge. Public responses are field-identical to the local proxy `http://127.0.0.1:18080` (see Step 7); the PRISM CURRENT epoch may legitimately be empty, which is real data, not an unavailable state.
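The "field-identical" claim above can be checked by comparing the key structure of the edge response with the local proxy's. This sketch is an assumption about how to define "field-identical": it compares dotted key paths rather than values (so live scores may differ between the two requests), and an empty list contributes no child paths, which lets a legitimately empty CURRENT epoch compare equal.

```python
from typing import Any


def field_paths(obj: Any, prefix: str = "") -> set[str]:
    """Collect dotted key paths of a JSON value; list items share a '[]' segment."""
    paths: set[str] = set()
    if isinstance(obj, dict):
        for key, value in obj.items():
            path = f"{prefix}.{key}" if prefix else str(key)
            paths.add(path)
            paths |= field_paths(value, path)
    elif isinstance(obj, list):
        for item in obj:
            paths |= field_paths(item, f"{prefix}[]")
    return paths


def same_fields(local: Any, public: Any) -> bool:
    """True when the edge response exposes exactly the local proxy's fields."""
    return field_paths(local) == field_paths(public)
```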
Apache-2.0
