Container status API: compute live status instead of static DB column#352
Open
runleveldev wants to merge 15 commits into
Open
Container status API: compute live status instead of static DB column#352runleveldev wants to merge 15 commits into
runleveldev wants to merge 15 commits into
Conversation
runleveldev
commented
Jun 16, 2026
Contributor
There was a problem hiding this comment.
Pull request overview
This PR removes the persisted Containers.status column and replaces it with a live, computed container status derived from Proxmox state, job state, and config drift, while keeping status present on existing container API payloads. It also updates the UI and docs to reflect the expanded status set.
Changes:
- Add a shared status resolver (
utils/container-status.js) and wire it into container list/show/create responses. - Drop
Containers.statusfrom the Sequelize model + DB schema; remove status writes from job scripts/import paths. - Update templates generation filters and frontend types/rendering to support the new live status values.
Reviewed changes
Copilot reviewed 10 out of 13 changed files in this pull request and generated 6 comments.
Show a summary per file
| File | Description |
|---|---|
| create-a-container/utils/container-status.js | New live status resolver (Proxmox snapshot + job lookup + drift detection). |
| create-a-container/routers/templates.js | Switch template container selection from status='running' to a provisioned/proxy predicate. |
| create-a-container/routers/api/v1/nodes.js | Stop persisting status during node/container import. |
| create-a-container/routers/api/v1/containers.js | Embed computed status in list/show; return creating on create response; stop persisting status updates. |
| create-a-container/README.md | Remove status from ER diagram and document the live status field and enum values. |
| create-a-container/openapi.v1.yaml | Add ContainerStatus enum schema and reference it from Container.status. |
| create-a-container/models/container.js | Remove status attribute from the Container model definition. |
| create-a-container/migrations/20260615000000-remove-container-status.js | Migration to drop the status column (with rollback restoring a default). |
| create-a-container/client/src/pages/containers/ContainersListPage.tsx | Render typed live status via StatusBadge and map new states to badge variants/labels. |
| create-a-container/client/src/lib/types.ts | Introduce ContainerStatus union and type container/status-bearing payloads accordingly. |
| create-a-container/bin/reconfigure-container.js | Stop updating container status; keep MAC/IP updates and adjust logs. |
| create-a-container/bin/json-to-sql.js | Stop setting/updating container status during JSON import/update. |
| create-a-container/bin/create-container.js | Replace the old “pending” guard with a VMID presence guard; stop persisting creating/running/failed status. |
ccc31d4 to
793ba81
Compare
Admin-defined default environment variables were merged into a container's DB record when the container was created. This froze the defaults to their creation-time values and exposed them in the edit UI. Now the container record stores only user-defined env vars. System defaults (from Settings) and NVIDIA GPU defaults are merged in at configure-time, just before the Proxmox API call, with user-defined values taking precedence. - models/container.js: buildLxcEnvConfig() now merges defaults < NVIDIA defaults < user vars; add getSystemDefaultEnvVars(), parseEnvironmentVars(), and nvidiaDefaultEnvVars() helpers. - bin/create-container.js: merge system defaults at configure-time and stop writing merged env/entrypoint back into the DB record. - bin/reconfigure-container.js: merge system defaults on edit so newly added defaults also reach existing containers. - routers/api/v1/containers.js: stop baking NVIDIA defaults into the saved record on create. Closes #349
Move all environment-variable merge logic into Container#buildLxcEnvConfig so there is exactly one place that determines the env to deploy. The method now loads the admin-defined system defaults itself (it became async) and composes them with the NVIDIA and user-defined values: system defaults < NVIDIA defaults < user-defined values Callers in create-container.js and reconfigure-container.js are reduced to a single `await container.buildLxcEnvConfig()` call and no longer load or pass defaults themselves. getSystemDefaultEnvVars, parseEnvironmentVars and nvidiaDefaultEnvVars are now internal helpers of that method.
… the Proxmox env string Addresses PR review: env var keys were only trimmed, so a key containing '=' or NUL (or a non-string/array value) could corrupt the NUL-separated KEY=value blob sent to Proxmox (splitting one var into many, breaking parsing, or stringifying objects to "[object Object]"). Add Container.normalizeEnvVars() as the single definition of a valid env var: input must be a plain object; keys are trimmed and must match a conventional env-var name (no '=', NUL, or whitespace); values are coerced to strings, with null/undefined, objects/arrays, and NUL-containing values dropped. Apply it at every ingest/load point: - getSystemDefaultEnvVars() and parseEnvironmentVars() normalize their output. - API create/update routes serialize user env vars via a shared serializeUserEnvVars() helper that normalizes before persisting. buildLxcEnvConfig can now trust its inputs, so the per-pair guards are dropped.
buildLxcEnvConfig always added env/entrypoint to Proxmox's `delete` list when the container record had no value. On create the container is freshly cloned from a template that may carry its own env/entrypoint, so emitting `delete` actively wiped those template-provided defaults. Add a deleteMissing option (default false). Proxmox's config PUT is a partial update, so omitting a key leaves it untouched: - create-container.js uses the default (false): empty env/entrypoint are omitted, preserving whatever the cloned template provides. - reconfigure-container.js passes deleteMissing:true so clearing the last env var or removing a custom entrypoint still unsets it on the existing container. The reconfigure restart-detection already ignores the `delete` key, so a pure-delete config does not, by itself, trigger a restart.
Reconfigure uses deleteMissing, so any env/entrypoint that exists only on the cloned template (and was never stored on the container) would be unset on the first reconfigure — surprising the user. We can't re-query the template later: templates are mutable Docker refs that may change or disappear, and we don't cache them. Compromise: at creation, after the template is cloned, read its LXC config and fold its env/entrypoint into the container record as if the user had supplied them (precedence: template < user). This guarantees those values persist across future reconfigures without changing API client expectations. System and NVIDIA defaults are deliberately NOT persisted — they remain configure-time-only (merged by buildLxcEnvConfig) so they aren't frozen into the record or exposed as user values in the edit UI. - models/container.js: add Container.parseLxcEnvString() and Container#persistTemplateDefaults(). - bin/create-container.js: call persistTemplateDefaults() with the cloned template's config before building/applying the LXC env config.
Resolves #350. Container status was a static column updated only by the create/reconfigure job scripts, so it drifted from reality whenever Proxmox changed out of band or a job died mid-run. This replaces it with status computed on demand. New utils/container-status.js resolves status from three sources: - Proxmox: does the LXC exist, and is it running/stopped? - Jobs: active create/reconfigure job, or did the last create job fail? - Config: does the live LXC config match what the API server expects? Resolved statuses: running, offline, creating, restarting, failed, missing, out-of-sync (plus unknown when Proxmox is unreachable). New endpoint: GET /api/v1/sites/:siteId/containers/:id/status -> { data: { status } } Non-breaking: the same live status is still embedded on the list/show/create payloads, so existing consumers keep working. The static Containers.status column is dropped (migration + model). All former writers were updated to stop persisting status; templates.js now selects provisioned containers by VMID + IPv4 instead of status='running'. The list page queries the per-container status endpoint (seeded by the list value for instant paint, polling while creating/restarting). OpenAPI and README updated.
Live status is already embedded in the existing list/show/create container payloads, so the dedicated GET /containers/:id/status endpoint is redundant. Per-row polling + hydration on the index page is not needed at this time, so revert the ContainerStatusBadge to a simple presentational badge that renders the live status from the list response. Removes the endpoint, the client query/key/result type, the OpenAPI path, and the README endpoint section (the status-value reference is retained as a field description). The ContainerStatus enum and the live resolver remain.
- out-of-sync is a warning, not a danger, badge variant - templates.js routes containers by ipv4Address only (a VMID isn't required to route)
- container-status.js: match --container-id as a whole argument when looking up
create/reconfigure jobs, so container 12 no longer matches container 123's
jobs (which could misclassify status). LIKE narrows at the DB level; a regex
confirms the id is terminated by a space or end-of-string.
- resource-requests.js: the applyResourceToExistingContainers query filtered on
the now-removed Containers.status column ("Unknown column 'status'" after the
migration). Use containerId != null (a VMID is required to reconfigure anyway).
- README: note the create endpoint also returns the live status field.
The calling convention for create/reconfigure job commands is stable and deterministic, so match the command string exactly in the query rather than fetching LIKE candidates and regex-filtering in JS: - create: exact match on `node bin/create-container.js --container-id=<id>` (create commands never carry trailing args). - reconfigure: exact match OR `<base> %` (exact base followed by a space), which covers the optional ` --<resource>=<value>` flags while still preventing id 12 from matching id 123.
Resolving the container list status was O(N) DB queries because every container ran its own create/reconfigure job lookup (up to 2 queries each). Add prefetchLatestJobs() which fetches the latest create + reconfigure job for the whole set of containers in a single query (bounded WHERE clauses, independent of N), buckets them per container, and feeds the result into computeContainerStatus via an optional `jobs` param. computeContainerStatuses now issues one job query total instead of ~2N. The single-container show path is unchanged (still queries per container when no prefetched jobs are passed).
…ng, out-of-sync)
Resolving the container list was still slow because each container did per-item
work: a command-string job lookup for 'restarting', and a Proxmox lxcConfig call
for 'out-of-sync'. Both are inherently O(N) (Proxmox has no bulk config endpoint),
which dominated index latency.
Drop both statuses:
- 'restarting': not used by known API clients (launchpad only checks == running),
and reconfigure jobs have no FK so detecting it required command-string queries.
- 'out-of-sync': required one lxcConfig call per in-Proxmox container; no bulk
Proxmox config API exists to batch it.
Status now derives only from:
- Proxmox run-state, via one clusterResources('lxc') snapshot per node, and
- the create job, resolved from the eager-loaded creationJob association (or the
creationJobId FK) — no command-string queries.
Result: the index makes one Proxmox call per node and zero per-container Proxmox
or DB job queries (verified at 30 containers). Remaining statuses: running,
offline, creating, failed, missing, unknown. PUT still enqueues a reconfigure job
and returns its jobId; only the 'restarting' status label is removed.
Updates the resolver, containers router (eager-load creationJob on the index),
client types/badges, OpenAPI enum, and README.
91a3ddf to
e790cfb
Compare
runleveldev
added a commit
that referenced
this pull request
Jun 19, 2026
… + DummyApi
Run the Manager standalone on localhost (SQLite, `make dev`) and create
containers without Proxmox, keeping the real provisioning code path.
- NodeApi abstraction: Node.api() dispatches on a new `nodeType` column.
`proxmox` (default) returns ProxmoxApi; `dummy` returns a new DummyApi that
implements the same surface and simulates Proxmox calls. Creation still runs
POST /containers -> Job -> job-runner -> bin/create-container.js -> node.api().
- DummyApi.clusterResources('lxc') derives the running set from the DB so the
live container-status resolver (#352) reports dummy containers correctly;
dev-bootstrap gives the dummy node placeholder creds so nodeHasCreds() passes.
- Provider-agnostic node selection that only excludes unprovisionable nodes
(dummy, or proxmox with apiUrl/tokenId/secret); DELETE routes teardown through
node.api(). Node.api() validates apiUrl alongside credentials.
- Postgres-safe migration adding `nodeType` (existing rows default to proxmox).
- Fix the erroneous UNIQUE on Containers.nodeId via patch-package (v6 backport of
sequelize/sequelize#17583); replaces the SQLite-specific migration.
- bin/dev-bootstrap.js seeds a localhost site + dummy node + external domain,
no-op if any Site exists, so it never clobbers the docker-compose bootstrap.
- Make idempotent seeders (groups, push-notification) so re-running them in
`make dev` doesn't violate unique constraints.
- Gate Sequelize SQL query logging behind LOG_LEVEL=trace; off otherwise.
- Replace the unmaintained sqlite3 with @vscode/sqlite3 (dev-only, builds from
source via dialectModule) so there is no prebuilt-binary glibc mismatch; prod
(Postgres) never installs or loads it.
- `make dev` lives in create-a-container/Makefile (root delegates): self-contained
install (--no-package-lock, so it never rewrites the lockfile), migrate, seed,
then run server (nodemon) + job-runner + client watch together.
- Docs: "Run the Manager Locally (make dev)" section; README + release-pipeline
pointers; LOG_LEVEL documented in example.env.
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Resolves #350. Merge #351 first.
Summary
Container status was a static
Containers.statuscolumn updated only by the create/reconfigure job scripts, so it drifted from reality whenever Proxmox changed out of band or a job died mid-run. This replaces it with status computed live on demand, embedded directly on the existing container API payloads. The static column is dropped.Live status resolver —
create-a-container/utils/container-status.jsStatus is derived from two cheap sources of truth:
clusterResources('lxc')snapshot per node.creationJobIdforeign key, so this is resolved from the eager-loadedcreationJobassociation (zero extra queries) or a single PK lookup.Resolved statuses:
runningofflinecreatingfailedmissingunknownWhere status is returned
The live
statusis embedded on the existing list, show, and create container payloads — no new endpoint.Performance (why the scope was trimmed)
Resolving the list must stay cheap. Two things were inherently O(N):
restarting) — reconfigure jobs have no FK, so detecting them needed per-container command-string queries.out-of-sync) — Proxmox has no bulk config endpoint, so this needed onelxcConfigcall per in-Proxmox container.Both were dropped. The index now issues one Proxmox call per node and zero per-container Proxmox or DB job queries (verified at 30 containers). The create job comes free from the eager-loaded association.
Compatibility
statusremains present on the same endpoints, so existing consumers keep working. Checkedmieweb/launchpad(an external API client): it only checksstatus == "running"and never usesrestarting/out-of-sync, so dropping them is safe there.maincould emit (maincould returnrestarting), so it's not a strict superset anymore — but no known client depends on the removed values.Removed the static column
20260615000000-remove-container-status.jsdropsContainers.status; removed from the model.create-container.js,reconfigure-container.js,nodes.jsimport,json-to-sql.js). Failure is now captured by the job'sfailurestatus.status !== 'pending'guard became a Proxmox-VMID-presence guard.routers/templates.js(nginx/dnsmasq config) previously filteredwhere: { status: 'running' }; it now selects routable containers byipv4Address != null.routers/api/v1/resource-requests.js(applyResourceToExistingContainers) previously filteredwhere: { ..., status: 'running' }— that would have thrown "Unknown column 'status'" after the migration. It now filters oncontainerId != null(a VMID is required to reconfigure anyway).Frontend
ContainerStatusunion; the list page renders the live status from the list response via a small presentationalStatusBadge(friendly labels + status → badge-variant mapping). No per-row polling.Docs
ContainerStatusenum, referenced from theContainerschema.status) and a container-status field description.Testing
No test framework exists in the repo, so logic was validated with standalone Node assertion scripts (not committed):
node --check; OpenAPI YAML parses; clienttype-checkandbuildpass.Notes / follow-ups
out-of-sync) can be revisited later if Proxmox gains a bulk config endpoint or if drift is computed lazily on the detail view / cached.prove/test harness if the team wants committed tests.