From 0f245591b80ec2b918559dea50482938d541de00 Mon Sep 17 00:00:00 2001
From: igerber <isaac.gerber@gmail.com>
Date: Sat, 6 Jun 2026 13:55:46 -0400
Subject: [PATCH] refactor: rename SyntheticDiDResults.placebo_effects ->
 variance_effects
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

The field held method-specific contents — placebo treatment effects
("placebo"), per-draw bootstrap ATT estimates ("bootstrap"), or
leave-one-out estimates ("jackknife"); `variance_method` disambiguates —
so `placebo_effects` was misleading. Rename it to `variance_effects`
(aligns with the sibling `variance_method` field) and keep
`placebo_effects` as a deprecated read-only @property alias, to be
removed in v4.0.0. This follows the lambda_reg/zeta deprecation
convention, although a deprecated property alias on a Results object is
a new pattern for this codebase.

- results.py: rename the dataclass field; add a `placebo_effects`
  @property that emits DeprecationWarning and returns variance_effects
  (read-only; it is a property, not a field, so dataclasses.replace and
  dataclasses.asdict only see variance_effects). Migrate the internal
  get_loo_effects_df() reads to self.variance_effects so normal use
  never routes through the alias. Add __setstate__ to migrate legacy
  pickled state (<= 3.5.x: has `placebo_effects`, lacks
  `variance_effects`) onto variance_effects, so the draws in old pickles
  survive unpickling and stay reachable under both names.
- synthetic_did.py: rename the fit() local placebo_effects/_n ->
  variance_effects/_n and the Results constructor kwarg. The
  _placebo_variance_se* helper docstrings keep "placebo_effects" (they
  genuinely return placebo effects).
- tests: migrate 16 attribute reads; add
  test_placebo_effects_deprecated_alias (warns + identity + read-only),
  test_variance_effects_access_emits_no_warning (internal reads don't
  trip the alias), and test_legacy_pickle_state_maps_placebo_effects.
  Migration completeness verified with a
  -W error::DeprecationWarning sweep.
- docs: autosummary RST (add variance_effects, keep placebo_effects for the
  deprecation release), REGISTRY, tutorial 03 (attribute access only;
  placebo prose kept), llms-full.txt (prose pointer + Attributes-table row).
- CHANGELOG Changed + Deprecated; TODO rows removed.
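
For reference, the alias behavior described under results.py can be
sketched in isolation. This is a minimal illustration, not the code in
this patch: `DemoResults` is a hypothetical stand-in for
SyntheticDiDResults, and the warning text is shortened.

```python
import dataclasses
import warnings
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class DemoResults:
    """Hypothetical stand-in for SyntheticDiDResults (illustration only)."""

    att: float
    variance_method: str = "placebo"
    variance_effects: Optional[List[float]] = field(default=None)

    @property
    def placebo_effects(self) -> Optional[List[float]]:
        # Read-only alias: warn, then forward to the renamed field.
        warnings.warn(
            "placebo_effects is deprecated; use variance_effects instead.",
            DeprecationWarning,
            stacklevel=2,
        )
        return self.variance_effects


res = DemoResults(att=1.5, variance_effects=[0.1, -0.2, 0.3])

with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    old = res.placebo_effects   # goes through the alias and warns
    new = res.variance_effects  # plain field read, no warning

alias_is_same_object = old is new
n_warnings = len(caught)
asdict_keys = sorted(dataclasses.asdict(res))

try:
    res.placebo_effects = [9.9]  # the property defines no setter
    assignment_blocked = False
except AttributeError:
    assignment_blocked = True

try:
    dataclasses.replace(res, placebo_effects=[9.9])  # not an __init__ kwarg
    replace_blocked = False
except TypeError:
    replace_blocked = True
```

Because the alias is not listed by dataclasses.fields(), only
`variance_effects` round-trips through asdict() and replace().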

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
---
 CHANGELOG.md                                  |  17 +++
 TODO.md                                       |   4 -
 diff_diff/guides/llms-full.txt                |   3 +-
 diff_diff/results.py                          |  55 +++++++--
 diff_diff/synthetic_did.py                    |  22 ++--
 .../diff_diff.SyntheticDiDResults.rst         |   1 +
 docs/methodology/REGISTRY.md                  |   4 +-
 docs/tutorials/03_synthetic_did.ipynb         |   4 +-
 tests/test_estimators.py                      |   4 +-
 tests/test_methodology_sdid.py                | 104 ++++++++++++++----
 tests/test_survey_phase5.py                   |   4 +-
 11 files changed, 168 insertions(+), 54 deletions(-)
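
Note: the legacy-state migration can be exercised without an old pickle
file. The following is a minimal sketch, assuming a hypothetical
`DemoResults` class in place of SyntheticDiDResults. It calls
__setstate__ directly, which is the hook pickle invokes with the stored
state on load.

```python
from typing import Any, Dict, List, Optional


class DemoResults:
    """Hypothetical stand-in for SyntheticDiDResults (illustration only)."""

    def __init__(self, variance_effects: Optional[List[float]] = None) -> None:
        self.variance_effects = variance_effects

    def __setstate__(self, state: Dict[str, Any]) -> None:
        # Legacy state carries the old key and lacks the new one: move it.
        if "placebo_effects" in state and "variance_effects" not in state:
            state = dict(state)  # copy so the caller's dict is not mutated
            state["variance_effects"] = state.pop("placebo_effects")
        self.__dict__.update(state)


# State shaped like a pre-rename pickle (old key only).
old_state = {"att": 1.5, "placebo_effects": [0.1, -0.2]}
legacy = DemoResults.__new__(DemoResults)
legacy.__setstate__(old_state)

# State shaped like a post-rename pickle passes through unchanged.
new_state = {"att": 1.5, "variance_effects": [0.3, 0.4]}
current = DemoResults.__new__(DemoResults)
current.__setstate__(new_state)
```

After migration the instance dict holds only `variance_effects`, so a
property alias named `placebo_effects` on the class is not shadowed by a
stale instance attribute.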

diff --git a/CHANGELOG.md b/CHANGELOG.md
index b0b990933..d2c5f86f7 100644
--- a/CHANGELOG.md
+++ b/CHANGELOG.md
@@ -10,6 +10,23 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0
 ### Added
 - **`SyntheticControl` conformal inference (Chernozhukov, Wüthrich & Zhu 2021, *JASA* 116(536)).** Three opt-in `SyntheticControlResults` methods give valid p-values for the post-period effect trajectory and pointwise confidence intervals — what the in-space placebo / Firpo-Possebom test-inversion paths cannot. Unlike the Firpo path (which re-ranks the cross-unit placebo gaps), the conformal layer fits its **own** time-permutation-invariant constrained-LS synthetic-control proxy (CWZ §2.3 eqs 3–4 — simplex weights on raw outcomes over **all** periods under the null, no `V`-matrix, no intercept) and permutes residuals **over time** for the single treated unit (CWZ's exactness theory requires a time-symmetric proxy, which the headline ADH `V`-matrix fit is not). **`conformal_test(effect, q=1, scheme="moving_block", n_iid=10000, seed=None)`** computes the joint sharp-null permutation p-value (eqs 1–2) of `S_q(û) = ((1/√T*)·Σ_{t>T0}|û_t|^q)^{1/q}` (`q ∈ {1, 2, ∞}`); the proxy is fit once and only residuals are permuted (footnote 7). **`conformal_confidence_intervals(alpha=0.1, scheme="moving_block", bounds=None, n_grid=100, seed=None)`** returns pointwise per-period CIs by test inversion (Algorithm 1 — each period `t` uses `Z = (pre-periods, t)` with the other post-periods dropped, a clean `T*=1` test). **`conformal_average_effect(alpha=0.1, scheme="moving_block", bounds=None, n_grid=200, seed=None)`** returns a CI for the average post-period effect by collapsing the panel into non-overlapping `T*`-blocks and permuting the block residuals (Appendix A.1). Permutation schemes: `"moving_block"` (`Π_→` cyclic shifts, valid under serial dependence — the default) and `"iid"` (`Π_all`, sampled, finer p-values); both include the identity so the p-value floor is `1/|Π|` (no extra `+1`). Fail-closed handling for `<1` donor / unpickled result / non-finite panel / non-converged grid points (treated as indeterminate, not rejected) / grid-limited / empty / unbounded sets; a single donor and `T*≥T0` warn. Surfaced under `conformal_inference` / `get_conformal_grid_df()` and `DiagnosticReport`'s `estimator_native_diagnostics`; the analytical `se`/`t_stat`/`p_value`/`conf_int`/`is_significant` stay NaN throughout. Core in the new `diff_diff/conformal.py` (reuses the Frank-Wolfe simplex solver). *Deferred:* one-sided variants (§7), covariates folded into the proxy, and the AR/innovation-permutation path (Lemmas 5–7).
 
+### Changed
+- **`SyntheticDiDResults.placebo_effects` renamed to `variance_effects`.** The
+  array's contents are method-specific — placebo treatment effects
+  (`variance_method="placebo"`), per-draw bootstrap ATT estimates
+  (`"bootstrap"`), or leave-one-out estimates (`"jackknife"`) — so the old name
+  was misleading; the `variance_method` field disambiguates the contents. Read
+  `result.variance_effects` going forward.
+
+### Deprecated
+- **`SyntheticDiDResults.placebo_effects`** is now a read-only alias for
+  `variance_effects` that emits a `DeprecationWarning` on access; it will be
+  removed in v4.0.0. The alias is a property, not a dataclass field, so
+  assignment raises `AttributeError`,
+  `dataclasses.replace(result, placebo_effects=...)` no longer works, and
+  `dataclasses.asdict(result)` now emits the `variance_effects` key. Use
+  `variance_effects` instead.
+
 ## [3.5.1] - 2026-06-02
 
 ### Added
diff --git a/TODO.md b/TODO.md
index df635e941..6043288dd 100644
--- a/TODO.md
+++ b/TODO.md
@@ -175,7 +175,6 @@ Deferred items from PR reviews that were not addressed before merge.
 | R comparison tests spawn separate `Rscript` per test (slow CI) | `tests/test_methodology_twfe.py:294` | #139 | Low |
 | CS R helpers hard-code `xformla = ~ 1`; no covariate-adjusted R benchmark for IRLS path | `tests/test_methodology_callaway.py` | #202 | Low |
 | Validating the `.txt` AI guides (`diff_diff/guides/llms-full.txt`, `llms-practitioner.txt`) as executable snippets is **not low-lift** (re-scoped 2026-06-01): of their ~112 fenced Python blocks only ~20% are standalone-runnable — the rest are API-signature references (`Foo(param: type = default)` pseudo-signatures that are `SyntaxError` by design), context fragments (e.g. `results.att` on an undefined `results`), or dataset-shape-specific blocks. The guides are reference documentation, not runnable examples; a real implementation needs signature-block detection + a context/data skip-allowlist + per-snippet fixtures (multi-round curation), unlike the curated `.rst` files the existing smoke test covers. | `tests/test_doc_snippets.py` | #239 | Low |
-| SyntheticDiD: rename internal `placebo_effects` variable to `variance_effects` (or `resampled_effects`). Misleading name across the placebo/bootstrap/jackknife dispatch paths — holds three different contents depending on variance method. Low-risk refactor; user-facing field rename should preserve `placebo_effects` as a deprecated alias for one release. | `synthetic_did.py`, `results.py` | follow-up | Medium |
 | `TestWorkflowDoesNotExecutePRHeadCode` (CodeQL #14 dismissal guard) does not model: `bash <script>` / `sh <script>` / `./<script>` / `source <script>` / `. <script>` direct shell-script execution; multi-line `python3 -c` bodies (line-by-line shlex can't reassemble across newlines — the workflow's 5 sanitizer bodies are exempt by invisibility); shell-variable-expansion indirection (`SCRIPT="$X"; python3 "$SCRIPT"`); `eval`; `find -exec`; `xargs -I {}`. Each represents a path by which PR-head bytes COULD execute without the test failing. The guard catches accidental regressions of common forms (16 tests covering pip/npm/cargo/maturin/etc. installs, python file exec, bash -c indirection with compound flags, env-var prefixes, line continuations, subshells/brace groups, single-line python -c, write-overwrites of allowlisted /tmp paths). Closing the residuals would require multi-line shell parsing with command-substitution awareness + script-execution allowlists — significant work for diminishing return given the dismissal's primary defense is the documented threat model on the alert and in `.github/workflows/ai_pr_review.yml` comment block. | `tests/test_openai_review.py`, `.github/workflows/ai_pr_review.yml` | #436 | Low |
 | Render `docs/methodology/REPORTING.md` and `docs/methodology/REGISTRY.md` as in-site Sphinx pages so cross-references can use `:doc:` instead of off-site GitHub `blob/main` URLs. Current state (#410 fix-audit-r2) restores navigable links via `blob/main`, but stable-docs readers can land on a different revision than the package version they are reading. Two viable paths: (a) add `myst-parser` to `docs/conf.py` extensions + docs extras and link with `:doc:`, or (b) convert both files to `.rst`. | `docs/conf.py`, `docs/api/business_report.rst`, `docs/api/diagnostic_report.rst`, `docs/tutorials/18_geo_experiments.ipynb`, `docs/tutorials/19_dcdh_marketing_pulse.ipynb` | follow-up | Low |
 | ImputationDiD methodology validation (PR-B): add `tests/test_methodology_imputation.py` with paper-equation-numbered Verified Components (Theorems 1-3, eqs. 5-9, Props. 5/9) and an R `didimputation` parity fixture (none on file). Flips the METHODOLOGY_REVIEW.md row to Complete. | `tests/test_methodology_imputation.py` | imputation-validation (PR-B) | Medium |
@@ -190,11 +189,8 @@ Ordered paydown view across the tables above. Tier A → D is by effort × risk,
 
 _(No active items. The sole prior entry — the WooldridgeDiD method/outcome efficiency hint — has shipped; see CHANGELOG `## [Unreleased]` and REGISTRY §WooldridgeDiD "Nonlinear extensions".)_
 
-(SyntheticDiD `placebo_effects` → `variance_effects` rename moved to Tier B — the user-facing field rename + one-release deprecation alias is too large for ≤1 day / ≤3 CI rounds.)
-
 #### Tier B — Mid-size methodology (5-10 CI rounds expected, per memory cascade priors)
 
-- SyntheticDiD: rename internal `placebo_effects` → `variance_effects` AND public `placebo_effects` field with deprecation alias retained for one release (`synthetic_did.py`, `results.py`)
 - StaggeredTripleDifference R parity: commit CSV fixtures + add covariate-adjusted scenarios + aggregation-SE assertions (`tests/test_methodology_staggered_triple_diff.py`, `benchmarks/R/benchmark_staggered_triplediff.R`)
 - StaggeredTripleDifference: per-cohort group-effect SE WIF override for exact R `triplediff` match (`staggered_triple_diff.py`)
 - WooldridgeDiD: QMLE Stata-parity `qmle` weight type + Stata golden values (`wooldridge.py`, `linalg.py`, `tests/test_wooldridge.py`)
diff --git a/diff_diff/guides/llms-full.txt b/diff_diff/guides/llms-full.txt
index f558c461e..14b342d3e 100644
--- a/diff_diff/guides/llms-full.txt
+++ b/diff_diff/guides/llms-full.txt
@@ -1263,6 +1263,7 @@ Returned by `SyntheticDiD.fit()`.
 | `pre_periods` | `list` | Pre-treatment periods |
 | `post_periods` | `list` | Post-treatment periods |
 | `variance_method` | `str` | "bootstrap", "jackknife", or "placebo" |
+| `variance_effects` | `np.ndarray` | Per-iteration draws (placebo effects, bootstrap ATT draws, or jackknife LOO estimates per `variance_method`); deprecated alias `placebo_effects` (removed v4.0.0) |
 | `noise_level` | `float` | Estimated noise level |
 | `zeta_omega` | `float` | Unit weight regularization |
 | `zeta_lambda` | `float` | Time weight regularization |
@@ -1272,7 +1273,7 @@ Returned by `SyntheticDiD.fit()`.
 
 **Validation diagnostics** (call after `fit()`):
 - `get_weight_concentration(top_k=5)` - effective N and top-k weight share; flags fragile synthetic controls dominated by a few donor units
-- `get_loo_effects_df()` - per-unit leave-one-out influence from the jackknife pass (DataFrame includes both control and treated rows). Requires `variance_method="jackknife"` with unit-level LOO granularity: available on non-survey and pweight-only jackknife fits; raises `NotImplementedError` on full-design survey jackknife (PSU-level LOO, see `result.placebo_effects` for raw PSU-level replicates) and `ValueError` when LOO is unavailable (single treated unit, only one control with nonzero effective weight, etc.)
+- `get_loo_effects_df()` - per-unit leave-one-out influence from the jackknife pass (DataFrame includes both control and treated rows). Requires `variance_method="jackknife"` with unit-level LOO granularity: available on non-survey and pweight-only jackknife fits; raises `NotImplementedError` on full-design survey jackknife (PSU-level LOO, see `result.variance_effects` for raw PSU-level replicates) and `ValueError` when LOO is unavailable (single treated unit, only one control with nonzero effective weight, etc.)
 - `in_time_placebo()` - re-estimate on shifted fake treatment dates in the pre-period; near-zero placebo ATTs indicate a credible design
 - `sensitivity_to_zeta_omega()` - re-estimate across a grid of unit-weight regularization values; checks ATT robustness to the auto-selected zeta_omega
 
diff --git a/diff_diff/results.py b/diff_diff/results.py
index 6a2838362..2cef99d22 100644
--- a/diff_diff/results.py
+++ b/diff_diff/results.py
@@ -4,6 +4,7 @@
 Provides statsmodels-style output with a more Pythonic interface.
 """
 
+import warnings
 from dataclasses import dataclass, field
 from typing import Any, Dict, List, Optional, Tuple
 
@@ -1079,12 +1080,13 @@ class SyntheticDiDResults:
         Arkhangelsky et al. 2021 Algorithm 2 step 2, and R's default
         ``synthdid::vcov(method="bootstrap")``), ``"jackknife"``, or
         ``"placebo"``.
-    placebo_effects : np.ndarray, optional
+    variance_effects : np.ndarray, optional
         Method-specific per-iteration estimates: placebo treatment effects
         (for ``"placebo"``), bootstrap ATT estimates with re-estimated
         weights per draw (for ``"bootstrap"``), or leave-one-out estimates
         (for ``"jackknife"``). The ``variance_method`` field disambiguates
-        the contents.
+        the contents. (The deprecated read-only alias ``placebo_effects``
+        returns this array and will be removed in v4.0.0.)
     synthetic_pre_trajectory : np.ndarray, optional
         Synthetic control trajectory in pre-treatment periods, shape
         ``(n_pre,)``. Equal to ``Y_pre_control @ omega_eff`` where
@@ -1122,7 +1124,7 @@ class SyntheticDiDResults:
     zeta_omega: Optional[float] = field(default=None)
     zeta_lambda: Optional[float] = field(default=None)
     pre_treatment_fit: Optional[float] = field(default=None)
-    placebo_effects: Optional[np.ndarray] = field(default=None)
+    variance_effects: Optional[np.ndarray] = field(default=None)
     n_bootstrap: Optional[int] = field(default=None)
     # Survey design metadata (SurveyMetadata instance from diff_diff.survey)
     survey_metadata: Optional[Any] = field(default=None)
@@ -1145,7 +1147,7 @@ def __post_init__(self):
         # Plain attributes rather than dataclass fields so asdict()-style
         # recursion cannot serialize internal panel state.
         self._loo_unit_ids: Optional[List[Any]] = None
-        # Granularity of the `placebo_effects` LOO array: "unit" (non-
+        # Granularity of the `variance_effects` LOO array: "unit" (non-
         # survey + pweight-only jackknife), "psu" (full-design survey
         # jackknife), or None (non-jackknife variance methods). Governs
         # which accessors are well-defined. Set by `fit()` at result
@@ -1180,6 +1182,20 @@ def __getstate__(self) -> Dict[str, Any]:
         state["_fit_snapshot"] = None
         return state
 
+    def __setstate__(self, state: Dict[str, Any]) -> None:
+        """Restore from pickle, migrating the legacy field name.
+
+        Results pickled before the ``placebo_effects`` → ``variance_effects``
+        rename (<= 3.5.x) carry the old key in their state; map it so the
+        stored variance draws survive and remain reachable through both
+        ``variance_effects`` and the deprecated ``placebo_effects`` alias.
+        Remove together with the alias in v4.0.0.
+        """
+        if "placebo_effects" in state and "variance_effects" not in state:
+            state = dict(state)
+            state["variance_effects"] = state.pop("placebo_effects")
+        self.__dict__.update(state)
+
     @property
     def coef_var(self) -> float:
         """Coefficient of variation: SE / abs(ATT). NaN when ATT is 0 or SE non-finite."""
@@ -1189,6 +1205,27 @@ def coef_var(self) -> float:
             return np.nan
         return self.se / abs(self.att)
 
+    @property
+    def placebo_effects(self) -> Optional[np.ndarray]:
+        """Deprecated alias for :attr:`variance_effects` (removed in v4.0.0).
+
+        .. deprecated:: 3.6.0
+            Renamed to ``variance_effects`` because the array's contents are
+            method-specific (placebo effects, bootstrap ATT draws, or
+            leave-one-out estimates depending on ``variance_method``).
+        """
+        # `3.6.0` is the assumed next-minor (current is 3.5.1); confirm/resolve
+        # at bump-version time. The v4.0.0 removal target is fixed.
+        warnings.warn(
+            "SyntheticDiDResults.placebo_effects is deprecated; use "
+            "variance_effects instead. The array holds placebo effects, "
+            "bootstrap ATT draws, or leave-one-out estimates depending on "
+            "variance_method. Will be removed in v4.0.0.",
+            DeprecationWarning,
+            stacklevel=2,
+        )
+        return self.variance_effects
+
     def summary(self, alpha: Optional[float] = None) -> str:
         """
         Generate a formatted summary of the estimation results.
@@ -1388,7 +1425,7 @@ def get_loo_effects_df(self) -> pd.DataFrame:
         * full-design survey jackknife fits (strata / PSU / FPC set in
           ``SurveyDesign``) - the underlying replicates are PSU-level
           ``τ̂_{(h,j)}`` (Rust & Rao 1996), not unit-level. See
-          ``result.placebo_effects`` for the raw PSU-level replicate
+          ``result.variance_effects`` for the raw PSU-level replicate
           array and REGISTRY §SyntheticDiD "Note (survey + jackknife
           composition)" for the aggregation formula.
 
@@ -1424,7 +1461,7 @@ def get_loo_effects_df(self) -> pd.DataFrame:
             )
         # Survey-jackknife fits use PSU-level LOO (Rust & Rao 1996) with
         # stratum aggregation rather than unit-level LOO. The returned
-        # ``placebo_effects`` array in that path is a flat list of
+        # ``variance_effects`` array in that path is a flat list of
         # PSU-level τ̂_{(h,j)} replicates (variable length, ordered by
         # stratum then PSU), not a length-N unit-indexed array. Mapping
         # these onto the fit-time unit IDs would mislabel PSU replicates
@@ -1441,19 +1478,19 @@ def get_loo_effects_df(self) -> pd.DataFrame:
                 "stratum aggregation, Rust & Rao 1996); the underlying "
                 "replicates are PSU-level, not unit-level, so joining them "
                 "back to fit-time unit IDs is not well-defined. See "
-                "``result.placebo_effects`` for the raw PSU-level replicate "
+                "``result.variance_effects`` for the raw PSU-level replicate "
                 "array and ``docs/methodology/REGISTRY.md`` §SyntheticDiD "
                 '"Note (survey + jackknife composition)" for the '
                 "aggregation formula."
             )
-        if self._loo_unit_ids is None or self._loo_roles is None or self.placebo_effects is None:
+        if self._loo_unit_ids is None or self._loo_roles is None or self.variance_effects is None:
             raise ValueError(
                 "Leave-one-out estimates are unavailable (jackknife returned "
                 "NaN or an empty array). See prior warnings from fit() for the "
                 "cause (e.g., single treated unit, all weight on one control)."
             )
 
-        att_loo = np.asarray(self.placebo_effects, dtype=float)
+        att_loo = np.asarray(self.variance_effects, dtype=float)
         delta = att_loo - self.att
         df = pd.DataFrame(
             {
diff --git a/diff_diff/synthetic_did.py b/diff_diff/synthetic_did.py
index bfc413117..c56f87f2d 100644
--- a/diff_diff/synthetic_did.py
+++ b/diff_diff/synthetic_did.py
@@ -1080,7 +1080,7 @@ def fit(  # type: ignore[override]
                 min_decrease=min_decrease,
             )
             se = se_n * Y_scale
-            placebo_effects = np.asarray(bootstrap_estimates_n) * Y_scale
+            variance_effects = np.asarray(bootstrap_estimates_n) * Y_scale
             inference_method = "bootstrap"
         elif self.variance_method == "jackknife":
             if _jackknife_use_survey_path:
@@ -1142,7 +1142,7 @@ def fit(  # type: ignore[override]
                     w_control=w_control,
                 )
             se = se_n * Y_scale
-            placebo_effects = np.asarray(jackknife_estimates_n) * Y_scale
+            variance_effects = np.asarray(jackknife_estimates_n) * Y_scale
             inference_method = "jackknife"
         else:
             # Use placebo-based variance (R's synthdid Algorithm 4).
@@ -1155,7 +1155,7 @@ def fit(  # type: ignore[override]
                 # permutation degenerate to a global within-stratum
                 # permutation dispatched through the weighted-FW path.
                 assert w_control is not None
-                se_n, placebo_effects_n = self._placebo_variance_se_survey(
+                se_n, variance_effects_n = self._placebo_variance_se_survey(
                     Y_pre_control_n,
                     Y_post_control_n,
                     Y_pre_treated_mean_n,
@@ -1169,7 +1169,7 @@ def fit(  # type: ignore[override]
                     w_control=w_control,
                 )
             else:
-                se_n, placebo_effects_n = self._placebo_variance_se(
+                se_n, variance_effects_n = self._placebo_variance_se(
                     Y_pre_control_n,
                     Y_post_control_n,
                     Y_pre_treated_mean_n,
@@ -1184,7 +1184,7 @@ def fit(  # type: ignore[override]
                     init_lambda=time_weights,
                 )
             se = se_n * Y_scale
-            placebo_effects = np.asarray(placebo_effects_n) * Y_scale
+            variance_effects = np.asarray(variance_effects_n) * Y_scale
             inference_method = "placebo"
 
         # Compute test statistics
@@ -1195,10 +1195,10 @@ def fit(  # type: ignore[override]
         # (sampling distribution, not null), and jackknife pseudo-values are not
         # null-distribution draws either. Both use the analytical p-value from
         # the bootstrap/jackknife SE.
-        if inference_method == "placebo" and len(placebo_effects) > 0 and np.isfinite(t_stat):
+        if inference_method == "placebo" and len(variance_effects) > 0 and np.isfinite(t_stat):
             p_value = max(
-                np.mean(np.abs(placebo_effects) >= np.abs(att)),
-                1.0 / (len(placebo_effects) + 1),
+                np.mean(np.abs(variance_effects) >= np.abs(att)),
+                1.0 / (len(variance_effects) + 1),
             )
         else:
             p_value = p_value_analytical
@@ -1209,12 +1209,12 @@ def fit(  # type: ignore[override]
         unit_weights_dict = {unit_id: w for unit_id, w in zip(control_units, omega_eff)}
         time_weights_dict = {period: w for period, w in zip(pre_periods, time_weights)}
 
-        # Jackknife LOO ID/role arrays parallel to placebo_effects positions
+        # Jackknife LOO ID/role arrays parallel to variance_effects positions
         # (first n_control entries are control-LOO, next n_treated are treated-LOO;
         # see _jackknife_se docstring).
         loo_unit_ids: Optional[List[Any]]
         loo_roles: Optional[List[str]]
-        if inference_method == "jackknife" and len(placebo_effects) > 0:
+        if inference_method == "jackknife" and len(variance_effects) > 0:
             loo_unit_ids = list(control_units) + list(treated_units)
             loo_roles = ["control"] * len(control_units) + ["treated"] * len(treated_units)
         else:
@@ -1271,7 +1271,7 @@ def fit(  # type: ignore[override]
             zeta_omega=zeta_omega,
             zeta_lambda=zeta_lambda,
             pre_treatment_fit=pre_fit_rmse,
-            placebo_effects=placebo_effects if len(placebo_effects) > 0 else None,
+            variance_effects=variance_effects if len(variance_effects) > 0 else None,
             n_bootstrap=self.n_bootstrap if inference_method == "bootstrap" else None,
             survey_metadata=survey_metadata,
             synthetic_pre_trajectory=synthetic_pre_trajectory,
diff --git a/docs/api/_autosummary/diff_diff.SyntheticDiDResults.rst b/docs/api/_autosummary/diff_diff.SyntheticDiDResults.rst
index 57a1c2a53..16b1d35d4 100644
--- a/docs/api/_autosummary/diff_diff.SyntheticDiDResults.rst
+++ b/docs/api/_autosummary/diff_diff.SyntheticDiDResults.rst
@@ -44,6 +44,7 @@
       ~SyntheticDiDResults.time_weights_array
       ~SyntheticDiDResults.treated_post_trajectory
       ~SyntheticDiDResults.treated_pre_trajectory
+      ~SyntheticDiDResults.variance_effects
       ~SyntheticDiDResults.variance_method
       ~SyntheticDiDResults.zeta_lambda
       ~SyntheticDiDResults.zeta_omega
diff --git a/docs/methodology/REGISTRY.md b/docs/methodology/REGISTRY.md
index c3a098abe..537228fb6 100644
--- a/docs/methodology/REGISTRY.md
+++ b/docs/methodology/REGISTRY.md
@@ -1909,7 +1909,7 @@ Convergence criterion: stop when objective decrease < min_decrease² (default mi
 
   **Validation:** (a) hand-computed 2-stratum FPC magnitude regression (`test_jackknife_full_design_fpc_reduces_se_magnitude` — asserts `SE_fpc == SE_nofpc · sqrt(1 - f)` at `rtol=1e-10`), (b) self-consistency between the returned SE and the stratum-aggregation formula applied to the returned LOO estimates, (c) single-PSU-stratum skip, (d) all-strata-skipped UserWarning + NaN, (e) unstratified single-PSU short-circuit, (f) deterministic-dispatch regression.
 
-- **Note:** P-value computation is variance-method dependent. Placebo (Algorithm 4) uses the empirical null formula `max(mean(|placebo_effects| ≥ |att|), 1/(r+1))` because permuting control indices generates draws from the null distribution (centered on 0). Bootstrap (Algorithm 2) and jackknife (Algorithm 3) use the analytical p-value from `safe_inference(att, se)` (normal-theory): bootstrap draws are centered on `τ̂` (sampling distribution of the estimator) and jackknife pseudo-values are not null draws, so the empirical null formula is invalid for them. This matches R's `synthdid::vcov()` convention, where variance is returned and inference is normal-theory from the SE.
+- **Note:** P-value computation is variance-method dependent. Placebo (Algorithm 4) uses the empirical null formula `max(mean(|variance_effects| ≥ |att|), 1/(r+1))` because permuting control indices generates draws from the null distribution (centered on 0). Bootstrap (Algorithm 2) and jackknife (Algorithm 3) use the analytical p-value from `safe_inference(att, se)` (normal-theory): bootstrap draws are centered on `τ̂` (sampling distribution of the estimator) and jackknife pseudo-values are not null draws, so the empirical null formula is invalid for them. This matches R's `synthdid::vcov()` convention, where variance is returned and inference is normal-theory from the SE.
 - **Note (coverage Monte Carlo calibration):** `benchmarks/data/sdid_coverage.json` carries empirical rejection rates across the three variance methods on 4 representative null-panel DGPs (500 seeds × B=200, regenerable via `benchmarks/python/coverage_sdid.py`). The fourth DGP (`stratified_survey`, added in PR #355) validates the survey-bootstrap calibration; jackknife is also reported with a documented anti-conservatism caveat; placebo is N/A on this DGP because its cohort packs into a single stratum with 0 never-treated units (stratified-permutation allocator is structurally infeasible — see `test_placebo_full_design_raises_on_zero_control_stratum` / `_undersupplied_stratum` for the enforced behavior). Under H0 the nominal rejection rate at each α equals α; rates substantially above α indicate anti-conservatism, rates below indicate over-coverage.
 
     | DGP                                                       | method     | α=0.01 | α=0.05 | α=0.10 | mean SE / true SD |
@@ -1942,7 +1942,7 @@ Convergence criterion: stop when objective decrease < min_decrease² (default mi
 *Validation diagnostics (post-fit methods on `SyntheticDiDResults`):*
 
 - **Trajectories** (`synthetic_pre_trajectory`, `synthetic_post_trajectory`, `treated_pre_trajectory`, `treated_post_trajectory`): retained on results to support plotting and custom fit metrics. `synthetic_pre_trajectory = Y_pre_control @ ω_eff`; `treated_pre_trajectory` is the survey-weighted treated mean (matches the Frank-Wolfe target). `pre_treatment_fit` is recoverable as `RMSE(treated_pre_trajectory, synthetic_pre_trajectory)`.
-- **`get_loo_effects_df()`**: user-facing join of the jackknife leave-one-out pseudo-values (stored in `placebo_effects`) to the underlying unit identities. **Unit-level LOO only** — available on the non-survey and pweight-only jackknife paths (classical Algorithm 3: one LOO per unit, first `n_control` positions map to `control_unit_ids`, next `n_treated` to `treated_unit_ids`; `att_loo` is NaN when the zero-sum composed-weight guard fired for that unit; `delta_from_full = att_loo - att`). Under the full-design survey jackknife path (PSU-level LOO with stratum aggregation, Rust & Rao 1996), the underlying replicates are PSU-level rather than unit-level — the accessor raises `NotImplementedError` pointing to `result.placebo_effects` for the raw PSU-level replicate array. Dispatch is gated by an explicit `_loo_granularity` flag set at fit-time (`"unit"` vs `"psu"`). Requires `variance_method='jackknife'`; raises `ValueError` otherwise.
+- **`get_loo_effects_df()`**: user-facing join of the jackknife leave-one-out pseudo-values (stored in `variance_effects`) to the underlying unit identities. **Unit-level LOO only** — available on the non-survey and pweight-only jackknife paths (classical Algorithm 3: one LOO per unit, first `n_control` positions map to `control_unit_ids`, next `n_treated` to `treated_unit_ids`; `att_loo` is NaN when the zero-sum composed-weight guard fired for that unit; `delta_from_full = att_loo - att`). Under the full-design survey jackknife path (PSU-level LOO with stratum aggregation, Rust & Rao 1996), the underlying replicates are PSU-level rather than unit-level — the accessor raises `NotImplementedError` pointing to `result.variance_effects` for the raw PSU-level replicate array. Dispatch is gated by an explicit `_loo_granularity` flag set at fit-time (`"unit"` vs `"psu"`). Requires `variance_method='jackknife'`; raises `ValueError` otherwise.
 - **`get_weight_concentration(top_k=5)`**: returns `effective_n = 1/Σω²` (inverse Herfindahl), `herfindahl`, `top_k_share`, `top_k`. Operates on `self.unit_weights` which stores the composed `ω_eff`; for survey-weighted fits the metrics reflect the population-weighted concentration, not the raw Frank-Wolfe solution.
 - **`in_time_placebo(fake_treatment_periods=None, zeta_omega_override=None, zeta_lambda_override=None)`**: re-slices the pre-window at each fake treatment period and re-fits both ω and λ via Frank-Wolfe. Default sweeps every feasible pre-period (position index `i ≥ 2` so ≥2 pre-fake periods remain for weight estimation, `i ≤ n_pre - 1` so ≥1 post-fake period exists). Credible designs produce near-zero placebo ATTs; departures indicate pre-treatment dynamics the estimator is picking up.
   - **Note:** Regularization reuses `self.zeta_omega` / `self.zeta_lambda` from the original fit (matches R `synthdid` convention of treating regularization as a property of the fit). `*_override` re-fits with new values.
diff --git a/docs/tutorials/03_synthetic_did.ipynb b/docs/tutorials/03_synthetic_did.ipynb
index fd74871ce..eca4cb2f0 100644
--- a/docs/tutorials/03_synthetic_did.ipynb
+++ b/docs/tutorials/03_synthetic_did.ipynb
@@ -425,7 +425,7 @@
     "print(\"Placebo-based inference:\")\n",
     "print(f\"ATT: {results_placebo.att:.4f}\")\n",
     "print(f\"SE: {results_placebo.se:.4f}\")\n",
-    "print(f\"Number of placebo effects: {len(results_placebo.placebo_effects)}\")"
+    "print(f\"Number of placebo effects: {len(results_placebo.variance_effects)}\")"
    ]
   },
   {
@@ -438,7 +438,7 @@
     "    # Visualize placebo distribution\n",
     "    fig, ax = plt.subplots(figsize=(10, 6))\n",
     "    \n",
-    "    ax.hist(results_placebo.placebo_effects, bins=20, alpha=0.7, \n",
+    "    ax.hist(results_placebo.variance_effects, bins=20, alpha=0.7, \n",
     "            edgecolor='black', label='Placebo effects')\n",
     "    ax.axvline(x=results_placebo.att, color='red', linewidth=2, \n",
     "               linestyle='--', label=f'Actual ATT = {results_placebo.att:.2f}')\n",
diff --git a/tests/test_estimators.py b/tests/test_estimators.py
index 7de586d44..5448a347f 100644
--- a/tests/test_estimators.py
+++ b/tests/test_estimators.py
@@ -2553,8 +2553,8 @@ def test_placebo_inference(self, sdid_panel_data):
         )
 
         assert results.variance_method == "placebo"
-        assert results.placebo_effects is not None
-        assert len(results.placebo_effects) > 0
+        assert results.variance_effects is not None
+        assert len(results.variance_effects) > 0
         assert results.se > 0
 
     def test_bootstrap_inference(self, sdid_panel_data, ci_params):
diff --git a/tests/test_methodology_sdid.py b/tests/test_methodology_sdid.py
index 4ec97d32b..26cdaa8fc 100644
--- a/tests/test_methodology_sdid.py
+++ b/tests/test_methodology_sdid.py
@@ -486,9 +486,9 @@ def test_placebo_se_formula(self):
         assert results.variance_method == "placebo"
 
         # Verify the formula: se = sqrt((r-1)/r) * sd(placebo_estimates)
-        if results.placebo_effects is not None:
-            r = len(results.placebo_effects)
-            expected_se = np.sqrt((r - 1) / r) * np.std(results.placebo_effects, ddof=1)
+        if results.variance_effects is not None:
+            r = len(results.variance_effects)
+            expected_se = np.sqrt((r - 1) / r) * np.std(results.variance_effects, ddof=1)
             assert abs(results.se - expected_se) < 1e-10
 
 
@@ -1245,8 +1245,8 @@ def test_jackknife_se_formula(self):
             unit="unit", time="period",
             post_periods=list(range(5, 8)),
         )
-        assert results.placebo_effects is not None
-        u = results.placebo_effects
+        assert results.variance_effects is not None
+        u = results.variance_effects
         n = len(u)
         u_bar = np.mean(u)
         expected_se = np.sqrt((n - 1) / n * np.sum((u - u_bar) ** 2))
@@ -1262,8 +1262,8 @@ def test_jackknife_n_iterations(self):
             unit="unit", time="period",
             post_periods=list(range(5, 8)),
         )
-        assert results.placebo_effects is not None
-        assert len(results.placebo_effects) == n_co + n_tr
+        assert results.variance_effects is not None
+        assert len(results.variance_effects) == n_co + n_tr
 
     def test_jackknife_single_treated_nan(self):
         """Single treated unit -> NaN SE (matches R's NA)."""
@@ -1621,7 +1621,7 @@ def capture_then_call(*args, **kwargs):
             f"Python placebo SE {py_se} != R {r_se} (delta {py_se - r_se})"
         )
         # Per-draw τ regression: equal-SE doesn't imply equal sample, and
-        # the placebo τ vector is user-visible through ``placebo_effects``
+        # the placebo τ vector is user-visible through ``variance_effects``
         # and feeds the empirical placebo p-value (synthetic_did.py
         # around L1164-L1170). Compare elementwise so a permutation that
         # diverged at a single draw — but happened to leave sd() unchanged
@@ -1937,6 +1937,51 @@ def test_default_variance_method_is_placebo(self):
         assert sdid.variance_method == "placebo"
 
 
+class TestVarianceEffectsRename:
+    """`placebo_effects` was renamed to `variance_effects`; the old name is a
+    deprecated read-only alias scheduled for removal in v4.0.0."""
+
+    def test_placebo_effects_deprecated_alias(self):
+        """result.placebo_effects warns and returns the same array as
+        result.variance_effects; the alias is read-only."""
+        df = _make_panel(seed=42)
+        res = SyntheticDiD(variance_method="placebo", n_bootstrap=50, seed=1).fit(
+            df,
+            outcome="outcome",
+            treatment="treated",
+            unit="unit",
+            time="period",
+            post_periods=[5, 6, 7],
+        )
+        with pytest.warns(DeprecationWarning, match="placebo_effects is deprecated"):
+            aliased = res.placebo_effects
+        # Alias returns the identical object held by the renamed field.
+        assert aliased is res.variance_effects
+        # Read-only: assignment to the deprecated property raises.
+        with pytest.raises(AttributeError):
+            res.placebo_effects = aliased
+
+    def test_variance_effects_access_emits_no_warning(self):
+        """Normal use must not route through the deprecated alias: reading
+        variance_effects and calling get_loo_effects_df() (which reads the
+        field internally) must emit no DeprecationWarning."""
+        df = _make_panel(n_control=15, n_treated=3, seed=42)
+        res = SyntheticDiD(variance_method="jackknife", seed=42).fit(
+            df,
+            outcome="outcome",
+            treatment="treated",
+            unit="unit",
+            time="period",
+            post_periods=[5, 6, 7],
+        )
+        with warnings.catch_warnings():
+            warnings.simplefilter("error", DeprecationWarning)
+            assert res.variance_effects is not None
+            # get_loo_effects_df() reads the field internally; if it routed
+            # through the alias this would raise the escalated warning.
+            res.get_loo_effects_df()
+
+
 class TestNoiseLevelEdgeCases:
     """Edge case tests for _compute_noise_level_numpy."""
 
@@ -2529,18 +2574,18 @@ def test_bootstrap_raises_value_error(self):
         with pytest.raises(ValueError, match="variance_method='jackknife'"):
             res.get_loo_effects_df()
 
-    def test_positional_mapping_matches_placebo_effects(self):
-        """First n_control positions in placebo_effects map to control_units,
+    def test_positional_mapping_matches_variance_effects(self):
+        """First n_control positions in variance_effects map to control_units,
         next n_treated map to treated_units."""
         res = self._fit_jackknife()
-        pe = res.placebo_effects
+        pe = res.variance_effects
         ids = res._loo_unit_ids
         roles = res._loo_roles
         assert list(ids[: res.n_control]) == res._fit_snapshot.control_unit_ids
         assert list(ids[res.n_control :]) == res._fit_snapshot.treated_unit_ids
         assert roles[: res.n_control].count("control") == res.n_control
         assert roles[res.n_control :].count("treated") == res.n_treated
-        # The DataFrame values equal placebo_effects values (up to row permutation)
+        # The DataFrame values equal variance_effects values (up to row permutation)
         loo = res.get_loo_effects_df()
         assert np.allclose(
             sorted(loo["att_loo"].dropna().to_numpy()),
@@ -2919,6 +2964,24 @@ def test_snapshot_dropped_on_pickle(self):
             restored.synthetic_pre_trajectory, res.synthetic_pre_trajectory
         )
 
+    def test_legacy_pickle_state_maps_placebo_effects(self):
+        """A result pickled before the placebo_effects → variance_effects
+        rename (state carries the old key, no variance_effects) loads with the
+        draws under variance_effects and the deprecated alias still surfaces
+        them."""
+        res = self._fit(seed=109)  # jackknife → variance_effects populated
+        effects = np.asarray(res.variance_effects)
+        assert effects.size > 0
+        # Simulate a <=3.5.x pickle state: old field name, no variance_effects.
+        legacy_state = res.__getstate__()
+        legacy_state["placebo_effects"] = legacy_state.pop("variance_effects")
+        assert "variance_effects" not in legacy_state
+        restored = type(res).__new__(type(res))
+        restored.__setstate__(legacy_state)
+        assert np.allclose(np.asarray(restored.variance_effects), effects)
+        with pytest.warns(DeprecationWarning, match="placebo_effects is deprecated"):
+            assert np.allclose(np.asarray(restored.placebo_effects), effects)
+
     def test_in_time_placebo_raises_after_pickle(self):
         import pickle
 
@@ -3073,7 +3136,7 @@ def test_baseline_parity_small_scale(self, variance_method):
         se_rel = 1e-7 if variance_method == "placebo" else 1e-14
         assert r.se == pytest.approx(se0, rel=se_rel)
         assert r.p_value == pytest.approx(p0, rel=1e-14)
-        assert len(r.placebo_effects) == n0
+        assert len(r.variance_effects) == n0
 
     @pytest.mark.parametrize("variance_method", ["placebo", "bootstrap", "jackknife"])
     def test_scale_equivariance(self, variance_method, ci_params):
@@ -3088,7 +3151,7 @@ def test_scale_equivariance(self, variance_method, ci_params):
             warnings.simplefilter("ignore", UserWarning)
             r0 = self._fit(data, variance_method, n_bootstrap=nb)
         att0, se0, p0 = r0.att, r0.se, r0.p_value
-        n0 = len(r0.placebo_effects)
+        n0 = len(r0.variance_effects)
         noise0 = r0.noise_level
         zeta_omega0 = r0.zeta_omega
 
@@ -3099,9 +3162,8 @@ def test_scale_equivariance(self, variance_method, ci_params):
                 r = self._fit(scaled, variance_method, n_bootstrap=nb)
             # Variance-method success count must be identical; divergence
             # would shift the empirical p-value floor 1/(n+1).
-            assert len(r.placebo_effects) == n0, (
-                f"(a={a}, b={b}) yielded {len(r.placebo_effects)} effects, "
-                f"baseline had {n0}"
+            assert len(r.variance_effects) == n0, (
+                f"(a={a}, b={b}) yielded {len(r.variance_effects)} effects, baseline had {n0}"
             )
             assert r.att / a == pytest.approx(att0, rel=1e-8), f"att failed at a={a}, b={b}"
             assert r.se / abs(a) == pytest.approx(se0, rel=1e-6), f"se failed at a={a}, b={b}"
@@ -3198,7 +3260,7 @@ def test_bootstrap_p_value_matches_analytical(self, ci_params):
     def test_placebo_p_value_uses_empirical_formula(self, ci_params):
         """Placebo p-value must equal max(mean(|draws| >= |att|), 1/(r+1))."""
         # Self-consistency check (reported p vs the empirical formula on the reported
-        # placebo_effects) — independent of the draw count, so ci_params scaling is safe.
+        # variance_effects) — independent of the draw count, so ci_params scaling is safe.
         df = _make_panel(seed=42)
         with warnings.catch_warnings():
             warnings.simplefilter("ignore", UserWarning)
@@ -3209,9 +3271,9 @@ def test_placebo_p_value_uses_empirical_formula(self, ci_params):
                 unit="unit", time="period",
                 post_periods=[5, 6, 7],
             )
-        placebo_effects = np.asarray(r.placebo_effects)
-        empirical_p = float(np.mean(np.abs(placebo_effects) >= np.abs(r.att)))
-        expected_p = max(empirical_p, 1.0 / (len(placebo_effects) + 1))
+        variance_effects = np.asarray(r.variance_effects)
+        empirical_p = float(np.mean(np.abs(variance_effects) >= np.abs(r.att)))
+        expected_p = max(empirical_p, 1.0 / (len(variance_effects) + 1))
         assert abs(r.p_value - expected_p) < 1e-12, (
             f"placebo p_value={r.p_value} != empirical {expected_p}"
         )
diff --git a/tests/test_survey_phase5.py b/tests/test_survey_phase5.py
index 113b77bf1..9f601161a 100644
--- a/tests/test_survey_phase5.py
+++ b/tests/test_survey_phase5.py
@@ -1373,7 +1373,7 @@ def test_jackknife_full_design_stratum_aggregation_formula_magnitude(
         # has PSUs {3, 4, 5} (n_h=3). No FPC → f_h=0 for both. Every
         # PSU-LOO is well-defined, so tau_loo_all has 3 + 3 = 6 entries
         # ordered as [s0 PSU 0, s0 PSU 1, s0 PSU 2, s1 PSU 3, s1 PSU 4, s1 PSU 5].
-        taus = np.asarray(result.placebo_effects, dtype=float)
+        taus = np.asarray(result.variance_effects, dtype=float)
         assert len(taus) == 6
         # Apply the Rust & Rao formula by hand. Y_scale rescaling is
         # applied uniformly to tau_loo_all inside fit(), so the formula
@@ -1520,7 +1520,7 @@ def test_get_loo_effects_df_works_on_pweight_only_jackknife(
         assert getattr(result, "_loo_granularity", None) == "unit"
         # Accessor returns a unit-indexed DataFrame with the expected
         # schema; positional join is well-defined on the pweight-only
-        # path because ``placebo_effects`` has length n_control + n_treated.
+        # path because ``variance_effects`` has length n_control + n_treated.
         df = result.get_loo_effects_df()
         assert len(df) == result.n_control + result.n_treated
         assert set(df.columns) == {"unit", "role", "att_loo", "delta_from_full"}