docs: correct stale R-comparison-tests TODO row (R not in CI) by igerber · Pull Request #535 · igerber/diff-diff

igerber · 2026-06-07T18:03:24Z

Summary

Correct a stale TODO row (Testing/Docs, Fix TWFE within-transformation bug and add methodology review #139): "R comparison tests spawn separate Rscript per test (slow CI)". The premise is false — no CI workflow installs R (no setup-r / r-lib/actions / fixest / r-base anywhere in .github/workflows/), so every R-parity test skips in CI behind a per-file availability gate (fixest_available, _check_r_contdid(), require_r/r_available). Consolidating Rscript spawns therefore yields zero CI speedup.
The originally-cited file (tests/test_methodology_twfe.py) already session-caches its R fits via @pytest.fixture(scope="session") (r_twfe_results / r_twfe_results_with_covariate).
Rewrite the row to document the corrected scope: the only residual is a local-dev-only micro-optimization (continuous_did.py / callaway.py re-spawn library(...) per call with no session cache). Retained as a low-value local-dev note rather than silently dropped, so it isn't re-raised as a CI-speed item.
Drop the already-shipped "doc-deps integrity CI" (PR chore: add CI integrity check for docs/doc-deps.yaml #519) from the dangling Tier-D parenthetical — its dedicated row was removed when chore: add CI integrity check for docs/doc-deps.yaml #519 merged but the mention was left behind.

Methodology references

N/A — documentation-only change (TODO.md). No estimator, identification, weighting, or variance/SE behavior is touched.

Validation

No test changes. Claims were verified against the repo before writing: grep confirms no R in .github/workflows/; the twfe session-scoped fixtures, the continuous_did / callaway call sites, and the removed chore: add CI integrity check for docs/doc-deps.yaml #519 doc-deps row/test (tests/test_doc_deps_integrity.py exists) were each confirmed.
Local AI review (codex, gpt-5.5, full standard, --force-fresh): ✅ clean — no P0/P1/P2; one P3-informational note confirming the reframing is correct.

Security / privacy

Confirm no secrets/PII in this PR: Yes

🤖 Generated with Claude Code

github-actions · 2026-06-07T18:05:09Z

✅ Looks good

Executive Summary

Documentation-only PR touching TODO.md:L177 and TODO.md:L224; no estimator, weighting, identification, variance/SE, or default behavior changes.
Methodology registry/source-paper cross-check is not triggered by the diff surface.
The “no CI impact” reframing is supported by workflow searches: no R setup/install workflow was found.
TWFE R parity results are already session-cached as stated.
One P3 informational documentation-completeness note: the rewritten local-dev caching row omits another Callaway R helper that also respawns Rscript.

Methodology

No findings.

Severity: P3
Impact: No methodology behavior is changed. The diff only edits deferred-work prose in TODO.md:L177 and backlog wording in TODO.md:L224, so there is no estimator/paper mismatch to evaluate against docs/methodology/REGISTRY.md.
Concrete fix: None required.

Code Quality

No findings.

Severity: P3
Impact: No production code changed.
Concrete fix: None required.

Performance

Finding: Callaway local-dev R caching note is slightly incomplete.

Severity: P3
Impact: TODO.md:L177 correctly reframes the item as local-dev-only, but it names _run_r_estimation in tests/test_methodology_callaway.py:L337-L409 and its three callers while omitting _get_r_mpdta_and_results, which also runs Rscript at tests/test_methodology_callaway.py:L1595-L1688 and is called by multiple MPDTA R-parity tests. This does not affect CI or correctness, but a future cleanup could miss part of the same local-dev performance debt.
Concrete fix: Extend the TODO row to also mention _get_r_mpdta_and_results / MPDTA strict benchmark tests as another local-dev session-cache candidate.

Maintainability

No blocking findings.

Severity: P3
Impact: The rewritten row is long, but it is acceptable for a debt ledger and materially more precise than the stale CI-speed claim.
Concrete fix: Optional: split the row into shorter clauses or mention the helper names more compactly.

Tech Debt

No blocking findings.

Severity: P3
Impact: The PR keeps the item tracked under “Tech Debt from Code Reviews,” and removes the stale Tier-D doc-deps integrity CI mention at TODO.md:L224. This is compatible with the project’s deferred-work policy.
Concrete fix: None required.

Security

No findings.

Severity: P3
Impact: No executable workflow or code paths changed; no secrets/PII introduced in the diff.
Concrete fix: None required.

Documentation/Tests

No blocking findings.

Severity: P3
Impact: No tests are needed for this documentation-only correction. Repository inspection supports the main claims: TWFE has session-scoped R fixtures at tests/test_methodology_twfe.py:L419-L434, R availability gates exist in tests/test_methodology_twfe.py:L37-L68, tests/test_methodology_continuous_did.py:L243-L260, and tests/conftest.py:L30-L90, and tests/test_doc_deps_integrity.py exists.
Concrete fix: None required beyond the optional Callaway helper wording above.

…c-deps mention The "R comparison tests spawn separate Rscript per test (slow CI)" row's premise is false: no CI workflow installs R (no setup-r / r-lib/actions / fixest / r-base in .github/workflows/), so every R-parity test skips in CI behind a per-file availability gate. The originally-cited test_methodology_twfe.py already session-caches its R fits via scope="session" fixtures. Rewrite the row to document the correction (the only residual is a local-dev-only micro-optimization in continuous_did / callaway, which re-spawn library() per call with no session cache) and drop the already-shipped "doc-deps integrity CI" (#519) from the dangling Tier D parenthetical. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

github-actions · 2026-06-07T18:08:48Z

🔁 AI review rerun (requested by @igerber)

Head SHA: 6290ca1c6de4568d63c0b590f11dd42cd66735b2

✅ Looks Good

Executive Summary

Re-review scope: the prior P3 note is addressed; TODO.md:L177 now includes _get_r_mpdta_and_results.
This PR only changes TODO.md; no estimator, weighting, variance/SE, inference, assumptions, or defaults changed.
Methodology cross-check against docs/methodology/REGISTRY.md is not triggered by the diff surface.
The corrected “no CI impact” framing is supported by workflow grep and R availability gates.
The stale doc-deps integrity CI Tier-D mention is removed at TODO.md:L224.

Methodology

Finding: None.

Severity: P3
Impact: No methodology behavior changed. The diff edits only deferred-work wording in TODO.md:L177 and TODO.md:L224.
Concrete fix: None required.

Code Quality

Finding: None.

Severity: P3
Impact: No production or test code changed.
Concrete fix: None required.

Performance

Finding: Previous P3 resolved.

Severity: P3
Impact: The row now correctly scopes the residual to local-dev-only R parity caching and names both Callaway helpers: _run_r_estimation at tests/test_methodology_callaway.py:L337 and _get_r_mpdta_and_results at tests/test_methodology_callaway.py:L1595.
Concrete fix: None required.

Maintainability

Finding: None blocking.

Severity: P3
Impact: The TODO row is long, but it is precise and removes a misleading CI-speed premise.
Concrete fix: Optional only: split the row for readability if the table becomes hard to maintain.

Tech Debt

Finding: None.

Severity: P3
Impact: The PR keeps the residual work tracked under “Tech Debt from Code Reviews” and corrects the Tier-D rollup wording at TODO.md:L224.
Concrete fix: None required.

Security

Finding: None.

Severity: P3
Impact: Documentation-only change; no executable workflow/code paths or secrets introduced.
Concrete fix: None required.

Documentation/Tests

Finding: No blocking documentation or test issues.

Severity: P3
Impact: Repository inspection supports the updated wording: no R install/setup pattern appears in .github/workflows/; TWFE already uses session-scoped fixtures at tests/test_methodology_twfe.py:L419-L434; R gates exist at tests/test_methodology_twfe.py:L37-L68, tests/test_methodology_continuous_did.py:L243-L260, and tests/conftest.py:L30-L92. The current GitHub Ubuntu 24.04 runner image inventory also lists common runtimes but not R, supporting the “no CI impact” claim. (github.com)
Concrete fix: None required.

igerber force-pushed the perf/rscript-consolidation branch from bbb7cf5 to 6290ca1 Compare June 7, 2026 18:06

igerber added the ready-for-ci Triggers CI test workflows label Jun 8, 2026

igerber merged commit ea5d405 into main Jun 8, 2026
5 of 6 checks passed

igerber deleted the perf/rscript-consolidation branch June 8, 2026 11:28

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

docs: correct stale R-comparison-tests TODO row (R not in CI)#535

docs: correct stale R-comparison-tests TODO row (R not in CI)#535
igerber merged 1 commit into
mainfrom
perf/rscript-consolidation

igerber commented Jun 7, 2026

Uh oh!

github-actions Bot commented Jun 7, 2026

Uh oh!

github-actions Bot commented Jun 7, 2026

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Conversation

igerber commented Jun 7, 2026

Summary

Methodology references

Validation

Security / privacy

Uh oh!

github-actions Bot commented Jun 7, 2026

✅ Looks good

Executive Summary

Methodology

Code Quality

Performance

Maintainability

Tech Debt

Security

Documentation/Tests

Uh oh!

github-actions Bot commented Jun 7, 2026

✅ Looks Good

Executive Summary

Methodology

Code Quality

Performance

Maintainability

Tech Debt

Security

Documentation/Tests

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant