test(train): add GPU bitwise-reproducibility test and document determinism #57

Open. Chouffe wants to merge 3 commits into main from arthur/deterministic-training

Chouffe (Collaborator) commented Jun 11, 2026

Closes #36.

Summary

Exploration for #36 found that training is already fully deterministic: L.seed_everything(seed, workers=True) + Trainer(deterministic=True) cover everything, including on GPU (Lightning sets CUBLAS_WORKSPACE_CONFIG=:4096:8 and enables strict torch.use_deterministic_algorithms itself). This PR locks the property in:

  • GPU twin of the bitwise-reproducibility test (skipif no CUDA — skips on ubuntu-latest CI, runs on dev GPU machines). Strict deterministic mode means any future nondeterministic op (e.g. mixed precision, attention backend changes) raises instead of silently diverging; the test catches the rest.
  • README Determinism section stating the guarantee and its scope.
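
The core assertion of a bitwise-reproducibility test can be sketched without Lightning. This is an illustration only, not the test from this PR: `differing_tensors` is a hypothetical helper, and `array.array` buffers stand in for weight tensors. With real state dicts one would typically compare each pair with `torch.equal`, which compares values rather than raw bytes.

```python
from array import array


def differing_tensors(state_a: dict, state_b: dict) -> list:
    """Return the parameter names whose raw bytes differ between two weight dicts."""
    if state_a.keys() != state_b.keys():
        raise ValueError("state dicts have different parameter names")
    return [
        name
        for name in state_a
        if state_a[name].tobytes() != state_b[name].tobytes()
    ]


# Stand-in float32 "weights": two same-seed runs, plus one run that drifted slightly.
run_1 = {"head.weight": array("f", [0.25, -1.5]), "head.bias": array("f", [0.125])}
run_2 = {"head.weight": array("f", [0.25, -1.5]), "head.bias": array("f", [0.125])}
drifted = {"head.weight": array("f", [0.25, -1.5]), "head.bias": array("f", [0.125 + 3e-5])}

print(differing_tensors(run_1, run_2))    # no names: every buffer is byte-identical
print(differing_tensors(run_1, drifted))  # only the perturbed bias is reported
```

An exact comparison is deliberate here: a tolerance-based check would accept small drift, which is precisely what a bitwise guarantee is meant to rule out.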

Verification (RTX 4070 Ti SUPER)

  • Real config (pretrained vit_small_patch14_dinov2.lvd142m, finetune last block, dropout 0.1, full augment pipeline), same seed, two GPU fits: 0/202 weight tensors differ.
  • Real train.py CLI on a 32+8-sequence subset of the real data, two runs: checkpoints bitwise identical — weights, optimizer state, best epoch, and best val/f1 equal to the last float bit.
  • Scope caveat (documented): CPU vs GPU with the same seed diverge (~3e-5 max delta after 2 epochs) — inherent floating-point kernel differences, not fixable by any flag.
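
Why the CPU vs GPU divergence is inherent can be illustrated in plain Python: floating-point addition is not associative, so two kernels that reduce the same values in a different order may legitimately round differently. This is a generic illustration, not code from the PR, and it makes no claim about the actual reduction orders used by CPU or CUDA kernels.

```python
import math
import struct


def bits(x: float) -> str:
    """Hex dump of the IEEE 754 double representation of x."""
    return struct.pack(">d", x).hex()


# The same three values, summed in two different orders.
left_to_right = (0.1 + 0.2) + 0.3
right_to_left = 0.1 + (0.2 + 0.3)

# The results are close but not bitwise equal.
print(left_to_right == right_to_left)
print(bits(left_to_right), bits(right_to_left))
print(math.isclose(left_to_right, right_to_left, rel_tol=1e-9))
```

Both sums are valid roundings of the same mathematical result, which is why no determinism flag can make different hardware agree to the last bit.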

Test plan

  • uv run pytest tests/test_reproducibility.py -v → 2 passed (GPU machine)
  • CUDA_VISIBLE_DEVICES="" uv run pytest tests/test_reproducibility.py -v → 1 passed, 1 skipped (CI behavior)
  • make lint clean

…inism

Same-seed training is bitwise reproducible on GPU as well as CPU:
Trainer(deterministic=True) enables strict use_deterministic_algorithms
and sets CUBLAS_WORKSPACE_CONFIG. The GPU twin of the reproducibility
test guards this (skipped where CUDA is unavailable, e.g. CI). README
documents the guarantee and its scope: same seed + same device type +
same torch/CUDA versions; CPU vs GPU (or different GPU models) diverge
by floating-point rounding, which is inherent.

Closes #36
@Chouffe Chouffe requested a review from MateoLostanlen June 11, 2026 16:43
Chouffe added 2 commits June 11, 2026 18:45
Review follow-ups: the README guarantee now says same GPU model (not
device type, which two different GPUs share) and scopes the test to what
it actually asserts — same-seed weight reproducibility; optimizer state
and best-epoch selection were verified end-to-end, not by the test. The
GPU test drops the different-seed negative control (seeds diverge init
on CPU, so it proves nothing GPU-specific) and _fit_once_transformer's
accelerator parameter is now required.
Development

Successfully merging this pull request may close these issues.

Make training fully deterministic
