MOBIUS INFINITY — ERO (Entitlement–Reflection Orchestrator)

The synthesis capstone of the Möbius program: a thin sequential router that composes two existing systems —

MMV (answer entitlement): "is answering warranted?"
RQA (reflective questioning + memory-echo governance): "not yet — what deeper question advances this?"

MMV evaluates first on its unmodified stack and its route is authoritative; on the ask branch ERO hands the turn to RQA. They are composed in sequence, never layered — layering Essentials-like content onto MMV's structural governance is documented to collapse restraint (MMV Condition I), so ERO must not do it. Context flows one way: MMV → RQA.
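
A minimal sketch of that sequencing, not the actual ero/orchestrator.py. The two protocols, their method names (evaluate, reflect), the framing string, and the answered_by values are hypothetical stand-ins. It shows only the shape: the gate decides first, its route is final, and the reflector runs only on the ask branch, with context flowing from the gate side to the reflector and never back.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Optional, Protocol, Tuple


class EntitlementGate(Protocol):
    """Hypothetical MMV-like surface: returns (route, answer_text_or_None)."""

    def evaluate(self, prompt: str) -> Tuple[str, Optional[str]]: ...


class Reflector(Protocol):
    """Hypothetical RQA-like surface: returns a clarifying question."""

    def reflect(self, framed: str) -> str: ...


@dataclass
class Result:
    route: str
    answered_by: str
    text: str


def handle(prompt: str, gate: EntitlementGate, reflector: Reflector) -> Result:
    # Step 1: the gate evaluates first; its route is never overridden.
    route, text = gate.evaluate(prompt)
    if route == "ask":
        # Step 2: one-way handoff. The reflector sees gate-side context only.
        question = reflector.reflect(f"User asked: {prompt}")
        return Result(route, "rqa", question)
    return Result(route, "mmv", text or "")


# Duck-typed stubs: no model backend is needed to exercise the routing.
class StubGate:
    def evaluate(self, prompt):
        if len(prompt.split()) < 4:  # toy rule standing in for real entitlement
            return "ask", None
        return "answer", f"(answer to: {prompt})"


class StubReflector:
    def reflect(self, framed):
        return "What are you comparing, and by which criterion?"
```

With these stubs, handle("Which is better?", StubGate(), StubReflector()) takes the ask branch and returns the reflector's question, while a longer prompt is answered on the gate side without the reflector ever being called.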

Spec: SPEC_v0_2.md (current authority; supersedes v0.1).

Quickstart — use it like a local LLM

Prereq: Ollama installed. Then one of:

Turnkey (clones deps, installs, pulls the model, preflights):

make install      # = ./bootstrap.sh  (clones mmv+rqa into ./deps, installs, pulls gemma4:12b)
make serve        # OpenAI-compatible API, fully local, no key

Full stack in Docker (Ollama + ERO together):

docker compose up --build      # first run pulls the model (large); then:
#   -> http://localhost:8000/v1

Manual:

ollama pull gemma4:12b
pip install -e ".[serve]"          # installs the `mobius-infinity` command + FastAPI
export MMV_ROOT=/path/to/mmv RQA_ROOT=/path/to/rqa   # the two sibling repos
mobius-infinity preflight          # checks Ollama / model / repos before you start
mobius-infinity serve              # -> http://127.0.0.1:8000/v1  (point any OpenAI client here)

Production exposure: set an API key (--api-key / $ERO_API_KEY → requires Authorization: Bearer … on /v1/*) and a --timeout; see SECURITY.md.
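
For illustration, a client could attach the key as sketched here, using only the Python standard library. The base URL and model name come from the examples in this README; build_chat_request and send are hypothetical helpers, not part of the package.

```python
import json
import urllib.request


def build_chat_request(base_url, content, api_key=None):
    """Build a POST to {base_url}/chat/completions, with a Bearer key if given."""
    headers = {"Content-Type": "application/json"}
    if api_key:
        headers["Authorization"] = f"Bearer {api_key}"
    body = json.dumps({
        "model": "mobius-infinity",
        "messages": [{"role": "user", "content": content}],
    }).encode("utf-8")
    return urllib.request.Request(
        f"{base_url}/chat/completions", data=body, headers=headers, method="POST"
    )


def send(req, timeout=120):
    """Send the request; needs a running server, so nothing calls it here."""
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))
```

A caller would typically read the key from the environment, for example build_chat_request("http://127.0.0.1:8000/v1", "Which is better?", api_key=os.environ.get("ERO_API_KEY")), and pick a client-side timeout comfortably above the ask-route latency.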

serve defaults to --profile fast: fully local, no cloud key (ask route ~15–20s on a local 12B, masked by streaming; see the latency note below). The reflection tiers (latencies measured on a local gemma4:12b host; the local evaluator is the dominant cost):

`--profile`	budget	judge	latency	notes
turbo	k1/d1	none	~15–20s	single candidate, no selection
fast (default)	k3/d1	none	~15–20s	interactive, fully local
balanced	k4/d2	none	~25s	more candidates, local
quality	k6/d3	local	~150s	paper-style selection, slow
cloud	k6/d3	Groq	~30s	paper-faithful; needs `GROQ_API_KEY`

About latency (honest). On a local gemma4:12b, the ask route runs RQA's multi-step reflection, so it takes ~15–20s — and that floor is set by the model's tokens/sec and the number of internal LLM calls, not by k, num_ctx, or the evaluator (measured: tuning those barely moves it). So:

the wait is masked by responsive streaming (immediate role chunk + keepalive heartbeats + typewriter) — it feels active, not frozen;
the clarifying question cannot token-stream earlier (it's chosen at the end of a selection pipeline);
for genuinely lower latency, use a smaller/faster --model, a faster endpoint (--ollama-url to vLLM/a GPU host, or --cloud), or better hardware.

--local / --cloud are aliases for fast / cloud. Multi-turn history is honored by default (the message array is threaded into MMV's session); --no-multiturn routes only on the latest turn.

Why fast is the default: a blind 3-judge depth comparison (n=12, eval/REPORT_profile_compare.md) found fast is not majority-shallower than the full pipeline — depth means 3.1–4.3, no length confound — while running ~4× faster. The evaluator buys a small depth edge; cloud gets it back at low latency for key-holders.

By design, this is not a vanilla answerer. When a request is under-specified ("Fix this", "Which is better?"), it asks a clarifying question or abstains instead of guessing. That restraint, together with the depth of the question, is the point; see paper/.

Status

v0.1 vertical slice: routing + framing + audit. The orchestrator imports neither MMV nor RQA — it depends on two injected, duck-typed surfaces, so it is fully testable without model backends.

ero/orchestrator.py   # Orchestrator + EROResult (pure routing logic)
ero/framing.py        # ask-route -> RQA input framing
ero/audit.py          # append-only audit (memory / JSONL)
ero/openai_api.py     # OpenAI-compatible HTTP surface (/v1/chat/completions)
ero/wiring.py         # HOST-ONLY: wires the real MMV + RQA backends (needs Ollama)
tests/                # network-free acceptance tests (54 across 8 suites)
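
To make the append-only JSONL idea concrete, here is a small sketch. It is not the contents of ero/audit.py: the class name, the record fields (ts, route, answered_by), and the method names are assumptions chosen for illustration.

```python
import json
import time
from pathlib import Path


class JsonlAudit:
    """Hypothetical append-only audit sink: one JSON object per line."""

    def __init__(self, path):
        self.path = Path(path)
        # Create the parent directory (e.g. state/) if it does not exist yet.
        self.path.parent.mkdir(parents=True, exist_ok=True)

    def record(self, route, answered_by, **extra):
        entry = {"ts": time.time(), "route": route,
                 "answered_by": answered_by, **extra}
        # Mode "a" only ever appends; earlier entries are never rewritten.
        with self.path.open("a", encoding="utf-8") as fh:
            fh.write(json.dumps(entry, ensure_ascii=False) + "\n")
        return entry

    def read_all(self):
        if not self.path.exists():
            return []
        with self.path.open(encoding="utf-8") as fh:
            return [json.loads(line) for line in fh if line.strip()]
```

One line per decision keeps the log greppable and lets a crashed process lose at most the entry it was writing.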

Test

python3 tests/test_orchestrator.py     # no pytest needed
# or: python3 -m pytest tests/

Run against real backends (host with Ollama + sibling repos)

from ero.wiring import build_orchestrator, new_mmv_session

ero = build_orchestrator(audit_path="state/ero_audit.jsonl")  # warm, once per process
state = new_mmv_session()                                     # one per conversation
res = ero.handle("compare these", session_state=state)
print(res.route, res.answered_by, res.text or res.result)

build_orchestrator reuses each system's own factory (src/app/cli._build_engine for MMV, Controller(Config()) for RQA). Override repo locations with the MMV_ROOT / RQA_ROOT env vars. The wiring path needs a running Ollama and is not covered by the network-free suite — verify on the host.

Serve an OpenAI-compatible API (drop-in for the OpenAI ecosystem)

Any OpenAI client (OpenWebUI, LangChain, the openai SDK, …) can point its base_url at ERO and transparently get answer-entitlement routing plus reflective questioning. Governance is surfaced without breaking compatibility: a non-standard ero object on the JSON body and x-ero-* response headers.

pip install -r requirements-serve.txt          # fastapi + uvicorn
python3 -m ero.openai_api --host 127.0.0.1 --port 8000 --audit state/ero_audit.jsonl

curl http://127.0.0.1:8000/v1/chat/completions -H 'content-type: application/json' \
  -d '{"model":"mobius-infinity","messages":[{"role":"user","content":"Which is better?"}]}'
# -> assistant message IS the RQA clarifying question; body.ero.route == "ask"

Route → response: ask returns the RQA question; answer and verify return the synthesized answer; abstain returns a refusal (finish_reason: content_filter). A downed backend returns HTTP 503 with an OpenAI-style error. The request/response mapping is pure and tested without network access (tests/test_openai_api.py); FastAPI/uvicorn are imported lazily, so the package needs no web dependencies unless you serve.
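
The route-to-response mapping might look like the pure function sketched here. It is illustrative, not ero/openai_api.py: the function name, the x-ero-route header name, the fields inside the ero object, and the use of finish_reason "stop" for non-abstain routes are assumptions.

```python
import time
import uuid


def to_chat_completion(route, text, model="mobius-infinity"):
    """Map an orchestrator outcome to an OpenAI-style body plus extra headers."""
    # Only the abstain -> content_filter pairing is stated in this README;
    # "stop" for the other routes is an assumption.
    finish = "content_filter" if route == "abstain" else "stop"
    body = {
        "id": f"chatcmpl-{uuid.uuid4().hex}",
        "object": "chat.completion",
        "created": int(time.time()),
        "model": model,
        "choices": [{
            "index": 0,
            "message": {"role": "assistant", "content": text},
            "finish_reason": finish,
        }],
        # Non-standard governance object; standard clients ignore unknown keys.
        "ero": {"route": route},
    }
    headers = {"x-ero-route": route}  # hypothetical member of the x-ero-* family
    return body, headers
```

Because the function touches no sockets and no model, it can be asserted on directly in a unit test, which is the property that keeps a mapping like this testable without a backend.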

Streaming (stream: true) is built for perceived responsiveness: the assistant role chunk is sent immediately, SSE keepalive heartbeats flow every 2s while the ask path reflects, then the text streams incrementally (typewriter). This makes the ~15–20s ask wait feel active rather than frozen — it masks, not reduces, the latency (the clarifying question is produced by a selection pipeline, so it cannot token-stream earlier).
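
The three phases (immediate role chunk, heartbeats while the slow path runs, then typewriter output) can be sketched with asyncio. This is an assumption-laden illustration rather than the shipped implementation: stream_reply, sse, and the chunk size are hypothetical, and the heartbeat is written as an SSE comment line, which clients ignore.

```python
import asyncio
import json


def sse(data):
    """Format one Server-Sent Events data frame."""
    return f"data: {json.dumps(data)}\n\n"


async def stream_reply(produce_text, heartbeat_s=2.0, chunk_size=8):
    # Phase 1: send the role chunk before any slow work starts.
    yield sse({"choices": [{"index": 0, "delta": {"role": "assistant"}}]})

    task = asyncio.ensure_future(produce_text())
    try:
        # Phase 2: emit a comment-line heartbeat each time the wait times out.
        while True:
            done, _ = await asyncio.wait({task}, timeout=heartbeat_s)
            if done:
                break
            yield ": keepalive\n\n"

        # Phase 3: the full text exists only now, so slice it into small deltas.
        text = task.result()
        for i in range(0, len(text), chunk_size):
            piece = text[i:i + chunk_size]
            yield sse({"choices": [{"index": 0, "delta": {"content": piece}}]})
        yield sse({"choices": [{"index": 0, "delta": {}, "finish_reason": "stop"}]})
        yield "data: [DONE]\n\n"
    finally:
        # If the client disconnects early, do not leave the work running.
        if not task.done():
            task.cancel()
```

Here produce_text stands in for the slow reflective path. Since the text arrives only when that coroutine finishes, the typewriter phase changes how the wait feels, not how long it takes.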

Pluggable generation backend (consume any OpenAI-compatible model)

MMV's routing/entitlement decision is local heuristics (model-independent); only answer synthesis uses a backend. BackendConfig selects it without modifying MMV:

from ero.wiring import build_orchestrator, BackendConfig

# default: local Ollama gemma4:12b. Or target any OpenAI /v1 server:
ero = build_orchestrator(backend=BackendConfig(
    kind="openai_compatible", model="my-model", base_url="http://localhost:8001"))

(Keyless local servers such as vLLM, llama.cpp, LiteLLM, and Ollama's /v1 work today; an Authorization-header passthrough for authenticated providers is a small planned upstream change to MMV's VllmAdapter.)

Not in v0.1 (descoped)

Shared/unified memory store and a unified provenance ontology → v0.3 (MMV and RQA keep their separate stores for now).
A generalized text-citation verifier for MMV answers (RQA's sanitize_memory_refs is schema-coupled to RQA and runs inside RQA only).

License

Code: AGPL-3.0-or-later (LICENSE). Documentation and the paper/: CC BY-NC-SA 4.0. Repository-wide licensing, the commercial-license option, and rights administration: LICENSE_NOTICE.md. Composed dependencies (MMV, RQA) and attribution: NOTICE. The underlying methods are patent pending (7 filings); the AGPL §11 grant and its scope: PATENTS.md.

Related — the Möbius program

Part of the MOBIUS program — local-first, AGPL:

mmv — answer-entitlement runtime: decides whether answering is warranted
rqa — reflective questioning adapter: deepens the question when it is not
rcgov — reflective context governor: governs what a model may read
infinity — composite capstone (MMV × RQA) with an OpenAI-compatible API
