test(train): add GPU bitwise-reproducibility test and document determinism by Chouffe · Pull Request #57 · pyronear/temporal-model

Chouffe · 2026-06-11T16:34:25Z

Closes #36.

Summary

Exploration for #36 found that training is already fully deterministic: L.seed_everything(seed, workers=True) plus Trainer(deterministic=True) cover everything, including on GPU (Lightning itself sets CUBLAS_WORKSPACE_CONFIG=:4096:8 and enables strict torch.use_deterministic_algorithms). This PR locks the property in:

GPU twin of the bitwise-reproducibility test (skipif no CUDA: skipped on ubuntu-latest CI, runs on dev GPU machines). Strict deterministic mode means any future nondeterministic op (e.g. one introduced by mixed precision or an attention backend change) raises instead of silently diverging; the test catches the rest.
README Determinism section stating the guarantee and its scope.
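For readers who want to see roughly what those two Lightning calls switch on, here is a plain-PyTorch approximation. It is a sketch, not Lightning's implementation: `enable_determinism` is a hypothetical helper, and the `cudnn.benchmark = False` line is an extra precaution assumed here, not something this PR says Lightning does. Lightning's `seed_everything` also seeds NumPy, and `workers=True` additionally derives per-worker DataLoader seeds, both omitted below.

```python
import os
import random

import torch


def enable_determinism(seed: int) -> None:
    """Hypothetical, hand-written approximation of seed_everything + deterministic=True."""
    # cuBLAS needs a fixed workspace config for reproducible GEMMs; it must be
    # in place before the first CUDA matmul of the process.
    os.environ.setdefault("CUBLAS_WORKSPACE_CONFIG", ":4096:8")
    random.seed(seed)
    torch.manual_seed(seed)  # seeds the CPU generator and every CUDA device
    # Strict mode: ops lacking a deterministic implementation raise RuntimeError
    # instead of silently producing run-to-run differences.
    torch.use_deterministic_algorithms(True)
    # Assumed extra precaution: cuDNN autotuning may pick different kernels per run.
    torch.backends.cudnn.benchmark = False


# Usage: re-seeding reproduces the same random draws.
enable_determinism(42)
first = torch.rand(3)
enable_determinism(42)
second = torch.rand(3)
```

Because the environment variable is only read when cuBLAS is first used, calling a helper like this at the very start of a training script matters more than where in the script the model is built.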

Verification (RTX 4070 Ti SUPER)

Real config (pretrained vit_small_patch14_dinov2.lvd142m, finetune last block, dropout 0.1, full augment pipeline), same seed, two GPU fits: 0/202 weight tensors differ.
Real train.py CLI on a 32+8-sequence subset of the real data, two runs: checkpoints bitwise identical — weights, optimizer state, best epoch, and best val/f1 equal to the last float bit.
Scope caveat (documented): CPU vs GPU with the same seed diverge (~3e-5 max delta after 2 epochs) — inherent floating-point kernel differences, not fixable by any flag.
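Tallies like "0/202 weight tensors differ" and "~3e-5 max delta" can be computed with a helper along these lines. This is a hedged sketch: `compare_state_dicts` is a hypothetical name, not necessarily what the PR's test or verification script used, and it assumes both state dicts come from the same architecture (same keys and shapes).

```python
import torch


def compare_state_dicts(
    a: dict[str, torch.Tensor], b: dict[str, torch.Tensor]
) -> tuple[int, int, float]:
    """Return (differing tensors, total tensors, max absolute delta)."""
    if a.keys() != b.keys():
        raise ValueError("state dicts have different keys")
    n_diff, max_delta = 0, 0.0
    for name in a:
        # Move to CPU so tensors from a CPU run and a GPU run can be compared.
        x, y = a[name].detach().cpu(), b[name].detach().cpu()
        if torch.equal(x, y):  # same shape and exactly equal elements
            continue
        n_diff += 1
        if x.is_floating_point():
            # Upcast before subtracting so the reported delta is not itself
            # rounded to float32 precision.
            delta = (x.double() - y.double()).abs().max().item()
            max_delta = max(max_delta, delta)
    return n_diff, len(a), max_delta
```

For Lightning checkpoints, the weights sit under the checkpoint's `"state_dict"` key; loading both files with `torch.load(path, map_location="cpu")` keeps everything on one device before comparing.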

Test plan

uv run pytest tests/test_reproducibility.py -v → 2 passed (GPU machine)
CUDA_VISIBLE_DEVICES="" uv run pytest tests/test_reproducibility.py -v → 1 passed, 1 skipped (CI behavior)
make lint → clean

Same-seed training is bitwise reproducible on GPU as well as CPU: Trainer(deterministic=True) enables strict use_deterministic_algorithms and sets CUBLAS_WORKSPACE_CONFIG. The GPU twin of the reproducibility test guards this (skipped where CUDA is unavailable, e.g. CI). README documents the guarantee and its scope: same seed + same device type + same torch/CUDA versions; CPU vs GPU (or different GPU models) diverge by floating-point rounding, which is inherent. Closes #36.
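Why this rounding divergence is inherent: floating-point addition is not associative, so kernels that accumulate the same values in a different order (as CPU and GPU kernels, or two GPU models, generally do) can round differently. A stdlib-only illustration in float64, with deliberately extreme values; real per-step differences are tiny, and such differences can grow over many training steps into deltas like the ~3e-5 reported in the verification notes. `accumulate` is a hypothetical helper.

```python
import math


def accumulate(xs: list[float]) -> float:
    """Plain left-to-right addition, like a naive reduction kernel.

    An explicit loop is used because the built-in sum() applies compensated
    summation to floats on Python 3.12+, which would hide the effect.
    """
    total = 0.0
    for x in xs:
        total += x
    return total


# Same three values, two summation orders.
order_a = [1e16, 1.0, -1e16]  # 1e16 + 1.0 rounds back to 1e16, losing the 1.0
order_b = [1e16, -1e16, 1.0]  # the large terms cancel first, so the 1.0 survives

print(accumulate(order_a))  # 0.0
print(accumulate(order_b))  # 1.0
print(math.fsum(order_a))   # 1.0 (correctly rounded reference sum)
```

No determinism flag can fix this across devices: each device is individually deterministic, they just implement different (equally valid) summation orders.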

Review follow-ups: the README guarantee now says same GPU model (not same device type, which two different GPUs share) and scopes the test to what it actually asserts, namely same-seed weight reproducibility; optimizer state and best-epoch selection were verified end-to-end, not by the test. The GPU test drops the different-seed negative control (different seeds already diverge at initialization, which runs on CPU, so the control proves nothing GPU-specific), and _fit_once_transformer's accelerator parameter is now required.
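The reasoning behind dropping the different-seed control can be checked in a few lines, assuming (as is typical with Lightning) that the model is constructed on CPU and only moved to the GPU afterwards: the seed already changes the initial weights before any GPU kernel runs. `initial_weights` is a hypothetical helper for illustration.

```python
import torch
from torch import nn


def initial_weights(seed: int) -> torch.Tensor:
    """Build a layer from a fresh seed and return a copy of its initial weights."""
    torch.manual_seed(seed)
    layer = nn.Linear(4, 2)  # parameters are allocated and initialized on CPU
    return layer.weight.detach().clone()


# Same seed: identical init. Different seed: different init, with no GPU involved.
print(torch.equal(initial_weights(0), initial_weights(0)))  # True
print(torch.equal(initial_weights(0), initial_weights(1)))  # False
```

So a different-seed run diverging on GPU would pass even if the GPU path were nondeterministic, which is why it adds nothing to the GPU test.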

Chouffe requested a review from MateoLostanlen June 11, 2026 16:43

Chouffe added 2 commits June 11, 2026 18:45

style(train): apply ruff format to reproducibility test

3d43cdb

Chouffe wants to merge 3 commits into main from arthur/deterministic-training
