Start with "Use TrainCheck" if you want the full workflow. This page covers the two checker commands, traincheck-onlinecheck and traincheck-check.
TrainCheck has two checking modes:
traincheck-onlinecheck checks traces while traincheck-collect is still writing them.
traincheck-check checks completed trace files after collection finishes.
Use online checking when you want violations during a running job. Use offline checking when you want the easiest path or a reproducible local workflow.
Start trace collection for the target run:
traincheck-collect \
--pyscript target.py \
--models-to-track model \
--invariants invariants.json \
--output-dir target_trace

In another terminal, start the online checker:
traincheck-onlinecheck -f target_trace -i invariants.json

The online checker watches target_trace/ and updates its report as new traces arrive.
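If both commands are launched from a script rather than by hand, the checker may start before the collector has created the trace folder. The sketch below is one way to guard against that. It assumes the checker expects the folder to exist at startup, which this page does not state, and `wait_for_dir` is a hypothetical helper, not part of TrainCheck.

```shell
# wait_for_dir DIR [TIMEOUT_SECONDS]
# Hypothetical helper (not a TrainCheck command): poll once per second until
# DIR exists, and give up after TIMEOUT_SECONDS (default 60).
wait_for_dir() {
  wfd_dir=$1
  wfd_timeout=${2:-60}
  wfd_waited=0
  while [ ! -d "$wfd_dir" ]; do
    if [ "$wfd_waited" -ge "$wfd_timeout" ]; then
      echo "timed out waiting for $wfd_dir" >&2
      return 1
    fi
    sleep 1
    wfd_waited=$((wfd_waited + 1))
  done
  return 0
}

# Usage (left as a comment so the snippet can be sourced without side effects):
# wait_for_dir target_trace 120 && \
#   traincheck-onlinecheck -f target_trace -i invariants.json
```

The helper only checks that the folder exists. It does not wait for the first trace file to appear.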
If the command fails with a missing watchdog package, install it in the same environment:
pip install watchdog

Control the report refresh interval with:
traincheck-onlinecheck \
-f target_trace \
-i invariants.json \
--report-interval-seconds 30

The offline path is simpler. First let traincheck-collect finish, then run:
traincheck-check -f target_trace -i invariants.json

Offline checking reads the completed trace folder and writes a results directory.
Sampling is configured during trace collection:
traincheck-collect \
--pyscript target.py \
--models-to-track model \
--invariants invariants.json \
--sampling-interval 10 \
--warm-up-steps 10 \
--output-dir target_trace

Then run either checker normally:
traincheck-onlinecheck -f target_trace -i invariants.json
traincheck-check -f target_trace -i invariants.json

The checker does not decide which steps were traced. It checks the trace files that collection produced.
Both checkers write:
failed.log: violated invariants.
passed.log: triggered invariants that passed.
not_triggered.log: invariants that never ran on the trace.
violations_summary.json: compact violation summaries.
report.html: browser-readable summary.
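In a scripted run it can help to turn these files into a pass/fail signal. The following is a minimal sketch, not a TrainCheck feature: `summarize_results` is a hypothetical helper, and it assumes that an empty or absent failed.log means no violations were recorded, which should be confirmed against real checker output before relying on it.

```shell
# summarize_results DIR
# Hypothetical helper: report which of the documented result files exist in
# DIR, then return non-zero if failed.log is present and non-empty.
# Assumption: an empty or absent failed.log means no violations were recorded.
summarize_results() {
  sr_dir=$1
  for sr_name in failed.log passed.log not_triggered.log \
                 violations_summary.json report.html; do
    if [ -e "$sr_dir/$sr_name" ]; then
      echo "found   $sr_name"
    else
      echo "missing $sr_name"
    fi
  done
  if [ -s "$sr_dir/failed.log" ]; then
    echo "violations recorded in $sr_dir/failed.log"
    return 1
  fi
  return 0
}

# Usage (check_results is whatever path was passed to --output-dir):
# summarize_results check_results || echo "inspect report.html for details"
```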
The default output directory is timestamped. Use -o or --output-dir to choose a path:
traincheck-check \
-f target_trace \
-i invariants.json \
--output-dir check_results

Log checker results to Weights & Biases:
traincheck-check \
-f target_trace \
-i invariants.json \
--report-wandb \
--wandb-project traincheck

Attach offline checker metrics to an existing W&B run:
traincheck-check \
-f target_trace \
-i invariants.json \
--report-wandb \
--wandb-run-id <run-id>

Log checker results to MLflow:
traincheck-check \
-f target_trace \
-i invariants.json \
--report-mlflow \
--mlflow-experiment traincheck

The online checker supports the same W&B and MLflow reporting flags.
-f, --trace-folders: trace directories produced by traincheck-collect.
-t, --traces: individual trace files.
-i, --invariants: invariant files produced by traincheck-infer.
-o, --output-dir: results directory.
--no-html-report: skip report.html.
--report-wandb: log summary metrics and the HTML report to W&B.
--report-mlflow: log summary metrics and the HTML report to MLflow.
--report-interval-seconds: online checker report refresh interval.
Run the command help for the complete option list:
traincheck-check --help
traincheck-onlinecheck --help
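If a help command reports that the program cannot be found, the entry points are probably not installed in the active environment. The sketch below checks for the commands named on this page with the standard `command -v` lookup. `missing_commands` is a hypothetical helper, not something TrainCheck ships.

```shell
# missing_commands NAME...
# Hypothetical helper: print every NAME that is not found on PATH and
# return non-zero if at least one is missing.
missing_commands() {
  mc_status=0
  for mc_name in "$@"; do
    if ! command -v "$mc_name" >/dev/null 2>&1; then
      echo "$mc_name"
      mc_status=1
    fi
  done
  return "$mc_status"
}

# Usage:
# missing_commands traincheck-collect traincheck-check traincheck-onlinecheck
```

An empty result with a zero exit status means all listed commands were found.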