EconML Tutorial 12: Panel And Longitudinal Extensions
Most causal machine-learning examples treat each row as an independent unit observed once. Many real systems do not look like that. The same user, account, product, store, market, or patient can be observed repeatedly over time. Treatment can happen at several points, and earlier treatment can change later state, later treatment, and the final outcome.
This notebook introduces the panel and longitudinal version of the CATE problem. Instead of asking only “what is the effect of treatment now?”, we ask:
What is the effect of treatment in period 0 on the final outcome?
What is the effect of treatment in period 1 on the final outcome?
Do later treatments matter more than earlier treatments?
How do we avoid splitting the same unit across train and test folds?
When is a cross-sectional shortcut answering the wrong question?
EconML includes DynamicDML, a panel estimator for sequential treatment settings with balanced panels. We will use synthetic data with known period-specific effects so we can compare the dynamic estimates with the truth.
Learning Goals
By the end of this notebook, you should be able to:
recognize when repeated observations violate ordinary row-level independence assumptions;
distinguish a snapshot treatment effect from a dynamic treatment-history effect;
construct a balanced long-format panel for DynamicDML;
use groups so cross-fitting keeps each unit together;
define baseline heterogeneity features X and time-varying controls W;
estimate period-specific effects of treatment on a final outcome;
compare dynamic estimates with cross-sectional shortcuts;
report panel-specific diagnostics and limitations clearly.
Tutorial Flow
We will first define the panel causal problem and the assumptions that make dynamic treatment-effect estimation credible. Then we will simulate a balanced four-period panel, where each unit receives a continuous treatment intensity in each period and has one final outcome.
After the EDA and support checks, we compare two shortcut analyses with a proper dynamic panel workflow:
A pooled row-level shortcut that ignores repeated observations.
A last-period snapshot DML analysis.
A DynamicDML model that estimates one final-outcome effect for each treatment period.
The notebook ends with segment summaries, cumulative treatment-history effects, and a reporting checklist.
Setup
This cell imports the notebook dependencies, creates output folders, and sets plotting defaults. The warning filters remove harmless display and pandas-to-NumPy conversion messages so the saved tutorial stays clean.
from pathlib import Path
import os
import warnings

# Suppress optional widget warnings that can appear while importing EconML in headless notebook runs.
warnings.filterwarnings("ignore", message="IProgress not found.*")
# Keep Matplotlib cache files in a writable location during notebook execution.
os.environ.setdefault("MPLCONFIGDIR", "/tmp/matplotlib")

import econml
import matplotlib.pyplot as plt
from matplotlib.ticker import PercentFormatter
import numpy as np
import pandas as pd
import seaborn as sns
from IPython.display import display
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_absolute_error, mean_squared_error
from sklearn.model_selection import train_test_split

from econml.dml import LinearDML
from econml.panel.dml import DynamicDML

warnings.filterwarnings("ignore", message="X does not have valid feature names.*", category=UserWarning)
warnings.filterwarnings("ignore", message="Not all column names are strings.*", category=UserWarning)
warnings.filterwarnings("ignore", message="Co-variance matrix is underdetermined.*", category=UserWarning)
warnings.filterwarnings("ignore", category=FutureWarning)

sns.set_theme(style="whitegrid", context="notebook")
plt.rcParams["figure.figsize"] = (10, 6)
plt.rcParams["axes.titleweight"] = "bold"
plt.rcParams["axes.labelsize"] = 11


def find_project_root(start=None):
    """Find the repository root from either the repo or a nested notebook folder."""
    start = Path.cwd() if start is None else Path(start)
    for candidate in [start, *start.parents]:
        if (candidate / "pyproject.toml").exists() and (candidate / "notebooks").exists():
            return candidate
    return Path.cwd()


PROJECT_ROOT = find_project_root()
NOTEBOOK_DIR = PROJECT_ROOT / "notebooks" / "tutorials" / "econml"
OUTPUT_DIR = NOTEBOOK_DIR / "outputs"
FIGURE_DIR = OUTPUT_DIR / "figures"
TABLE_DIR = OUTPUT_DIR / "tables"
FIGURE_DIR.mkdir(parents=True, exist_ok=True)
TABLE_DIR.mkdir(parents=True, exist_ok=True)

rng = np.random.default_rng(202612)

print(f"Project root: {PROJECT_ROOT}")
print(f"EconML version: {econml.__version__}")
print(f"Figures will be saved to: {FIGURE_DIR.relative_to(PROJECT_ROOT)}")
print(f"Tables will be saved to: {TABLE_DIR.relative_to(PROJECT_ROOT)}")
Project root: /home/apex/Documents/ranking_sys
EconML version: 0.16.0
Figures will be saved to: notebooks/tutorials/econml/outputs/figures
Tables will be saved to: notebooks/tutorials/econml/outputs/tables
The environment is ready. All saved artifacts from this notebook use the 12_ prefix.
Panel Vocabulary
Panel and longitudinal settings have their own vocabulary. This table defines the terms used throughout the notebook.
panel_vocabulary = pd.DataFrame(
    [
        {
            "term": "Unit",
            "meaning": "The entity observed repeatedly over time.",
            "notebook_example": "A synthetic account observed for four periods.",
        },
        {
            "term": "Period",
            "meaning": "A time step within a unit's history.",
            "notebook_example": "Periods 0, 1, 2, and 3.",
        },
        {
            "term": "Treatment history",
            "meaning": "The sequence of treatment intensities assigned over time.",
            "notebook_example": "Four treatment intensity values per unit.",
        },
        {
            "term": "Final outcome",
            "meaning": "The outcome measured after the treatment history has unfolded.",
            "notebook_example": "A final value metric observed after period 3.",
        },
        {
            "term": "Baseline features X",
            "meaning": "Features used to describe treatment-effect heterogeneity.",
            "notebook_example": "Need, quality, friction, price sensitivity, and region risk.",
        },
        {
            "term": "Time-varying controls W",
            "meaning": "Observed state variables that change over time and affect treatment assignment or outcome.",
            "notebook_example": "State before treatment, seasonality, and lagged treatment.",
        },
        {
            "term": "Group split",
            "meaning": "A train-test or cross-fitting split that keeps all rows from a unit together.",
            "notebook_example": "All four periods of an account stay in the same split.",
        },
    ]
)
panel_vocabulary.to_csv(TABLE_DIR / "12_panel_vocabulary.csv", index=False)
display(panel_vocabulary)
The most important shift is from row-level thinking to unit-history thinking. A panel row is not an independent unit; it is one moment inside a unit’s trajectory.
DynamicDML Capability Map
DynamicDML has a specific input structure. This table summarizes what it expects and how that differs from earlier cross-sectional notebooks.
dynamic_dml_map = pd.DataFrame(
    [
        {
            "component": "Y",
            "expected_shape": "n_units * n_periods rows",
            "role": "Outcome array. DynamicDML targets the final-period outcome while using the long panel structure.",
        },
        {
            "component": "T",
            "expected_shape": "n_units * n_periods rows",
            "role": "Treatment assigned in each period. Effects are returned separately for each period.",
        },
        {
            "component": "X",
            "expected_shape": "n_units * n_periods rows by d_x columns",
            "role": "Heterogeneity features. First-period X is used for CATE heterogeneity; later X values act like controls.",
        },
        {
            "component": "W",
            "expected_shape": "n_units * n_periods rows by d_w columns",
            "role": "Time-varying controls that help adjust treatment and outcome nuisance models.",
        },
        {
            "component": "groups",
            "expected_shape": "n_units * n_periods rows",
            "role": "Unit identifier. Required so each unit's periods stay together during cross-fitting.",
        },
        {
            "component": "balanced panel",
            "expected_shape": "same number of periods per unit",
            "role": "DynamicDML expects equal-length histories in the installed implementation.",
        },
    ]
)
dynamic_dml_map.to_csv(TABLE_DIR / "12_dynamic_dml_input_map.csv", index=False)
display(dynamic_dml_map)
This input map is the practical contract for the notebook. The synthetic data will be deliberately balanced and sorted by unit-period order.
Longitudinal Assumptions
Dynamic causal analysis needs more than a model. The table below lists the main assumptions and the checks we can partially support with observed data.
longitudinal_assumptions = pd.DataFrame(
    [
        {
            "assumption": "Sequential ignorability",
            "plain_language": "At each period, treatment is as-if random after conditioning on observed history.",
            "practical_check": "Include relevant lagged state and treatment history in W; inspect treatment predictability.",
        },
        {
            "assumption": "No future leakage",
            "plain_language": "Features used for period t treatment cannot include information observed after treatment t.",
            "practical_check": "Build features from baseline or pre-period state only.",
        },
        {
            "assumption": "Positivity over treatment histories",
            "plain_language": "Comparable units have enough variation in treatment at each period.",
            "practical_check": "Check treatment distributions by period and segment.",
        },
        {
            "assumption": "Stable unit histories",
            "plain_language": "Each unit's history is internally ordered and not mixed with other units.",
            "practical_check": "Validate one row per unit-period and group-contiguous ordering.",
        },
        {
            "assumption": "Correct estimand timing",
            "plain_language": "The outcome window starts after the treatment history being evaluated.",
            "practical_check": "Define treatment periods and final outcome explicitly before modeling.",
        },
    ]
)
longitudinal_assumptions.to_csv(TABLE_DIR / "12_longitudinal_assumptions.csv", index=False)
display(longitudinal_assumptions)
The assumptions are timing-heavy. In longitudinal work, a model can be mathematically valid but still answer the wrong question if the time ordering is wrong.
Teaching Data Design
The next cell creates a balanced panel with four periods per unit. Each unit has baseline features, a time-varying state before treatment, a treatment intensity in each period, and one final outcome.
The true treatment effect varies by period and by baseline unit characteristics. Later periods are designed to have larger direct effects on the final outcome, but earlier periods still matter.
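The simulation cell itself is not reproduced here, but the structure it describes can be sketched as follows. This is a minimal stand-in generator, not the notebook's exact data-generating process: the coefficients, noise scales, and the 200-unit size are illustrative choices, and only a few of the notebook's columns are included.

```python
import numpy as np
import pandas as pd

# Illustrative generator: coefficients and column subset are assumptions, not the notebook's DGP.
rng = np.random.default_rng(0)
n_units, n_periods = 200, 4
true_period_effects = np.array([0.05, 0.10, 0.20, 0.40])  # later periods matter more

rows = []
for unit in range(n_units):
    baseline_need = rng.normal()
    state = rng.normal()
    lagged_treatment = 0.0
    treatments = []
    for period in range(n_periods):
        # Treatment depends on pre-treatment state and treatment history:
        # this is the time-varying confounding the estimator must adjust for.
        treatment = 0.5 * state + 0.3 * lagged_treatment + rng.normal(scale=0.5)
        treatments.append(treatment)
        rows.append({
            "unit_id": unit,
            "period": period,
            "baseline_need": baseline_need,
            "state_before_treatment": state,
            "lagged_treatment": lagged_treatment,
            "treatment_intensity": treatment,
        })
        # Treatment feeds into next period's state, so later state is post-treatment.
        state = 0.6 * state + 0.2 * treatment + rng.normal(scale=0.3)
        lagged_treatment = treatment
    # One final outcome per unit, measured after all four periods and
    # repeated across that unit's rows in the long-format panel.
    final_outcome = baseline_need + float(np.dot(true_period_effects, treatments)) + rng.normal(scale=0.1)
    for row in rows[-n_periods:]:
        row["final_outcome"] = final_outcome

demo_panel = pd.DataFrame(rows)
```

The key design choices mirror the notebook's description: treatment responds to pre-period state, state responds to past treatment, and the final outcome is a single per-unit value repeated across period rows.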
The first eight rows show two units over four periods each. The final outcome is repeated across a unit’s rows because the target is one final outcome after the full treatment history.
Field Dictionary
Panel notebooks get confusing quickly if the role of each column is not explicit. This table separates baseline features, time-varying controls, treatment, outcome, and teaching-only truth columns.
panel_field_dictionary = pd.DataFrame(
    [
        ("unit_id", "Panel identifier", "Unique identifier for the repeated unit."),
        ("period", "Time index", "Period number inside the unit history."),
        ("baseline_need", "Baseline feature X", "Pre-history need or demand signal."),
        ("user_quality", "Baseline feature X", "Stable quality or fit signal."),
        ("friction_score", "Baseline feature X", "Stable friction signal."),
        ("price_sensitivity", "Baseline feature X", "Stable sensitivity to cost or effort."),
        ("region_risk", "Baseline feature X", "Binary marker for lower baseline outcome regions."),
        ("high_need_segment", "Baseline feature X", "Binary segment derived from baseline need."),
        ("state_before_treatment", "Time-varying control W", "Observed state immediately before period treatment."),
        ("seasonality", "Time-varying control W", "Period-specific seasonality signal."),
        ("lagged_treatment", "Time-varying control W", "Treatment intensity from the previous period."),
        ("treatment_intensity", "Treatment T", "Continuous treatment intensity assigned in each period."),
        ("final_outcome", "Outcome Y", "Outcome measured after all four treatment periods."),
        ("true_period_effect", "Teaching-only truth", "True final-outcome effect of one treatment unit in this period."),
        ("true_cumulative_effect", "Teaching-only truth", "Sum of true period effects for one-unit increases in all periods."),
    ],
    columns=["field", "role", "description"],
)
panel_field_dictionary.to_csv(TABLE_DIR / "12_panel_field_dictionary.csv", index=False)
display(panel_field_dictionary)
The time-varying controls are observed before each period’s treatment. That timing is essential; controls measured after treatment would create leakage or block part of the treatment effect.
Balanced Panel Checks
DynamicDML expects a balanced panel in this implementation. This cell checks that every unit has exactly four periods and that there is one row for each unit-period pair.
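The validation cell is not shown above; a minimal sketch of the checks it describes might look like this. The helper name `check_balanced_panel` is hypothetical, and the demo frame stands in for the notebook's `panel_df`.

```python
import pandas as pd

def check_balanced_panel(df, unit_col="unit_id", period_col="period", n_periods=4):
    """Return diagnostics for a long-format panel: balance, duplicates, ordering."""
    periods_per_unit = df.groupby(unit_col)[period_col].nunique()
    observed = df[[unit_col, period_col]].reset_index(drop=True)
    expected = df.sort_values([unit_col, period_col])[[unit_col, period_col]].reset_index(drop=True)
    return {
        # Every unit must have exactly n_periods distinct periods.
        "all_units_have_n_periods": bool((periods_per_unit == n_periods).all()),
        # Exactly one row per unit-period pair.
        "no_duplicate_unit_period": not df.duplicated([unit_col, period_col]).any(),
        # Rows must be grouped by unit and ordered by period within unit.
        "sorted_by_unit_then_period": observed.equals(expected),
    }

# Tiny stand-in panel: two units, four periods each, already sorted.
demo = pd.DataFrame({"unit_id": [0, 0, 0, 0, 1, 1, 1, 1],
                     "period":  [0, 1, 2, 3, 0, 1, 2, 3]})
checks = check_balanced_panel(demo)
```

All three diagnostics should be True before handing the panel to DynamicDML; the ordering check matters because the estimator relies on group-contiguous rows.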
The panel is balanced and clean. This validation step is not glamorous, but it prevents a large class of panel estimator failures.
Basic Panel Summary
This cell summarizes the number of units, treatment variation, outcome variation, and true effects. We report both row-level and unit-level quantities because a panel dataset contains repeated rows per unit.
The summary confirms there is treatment variation and effect heterogeneity. The cumulative effect is larger than any single period effect because it sums the impact of treatment across the history.
Treatment And State By Period
A panel analysis should inspect how treatment and state evolve over time. This table summarizes treatment intensity and pre-treatment state by period.
Treatment and state vary across periods. The true effect also changes by period, which is exactly why a single pooled treatment effect is not the right estimand.
Treatment Distribution By Period
The next plot shows treatment-intensity distributions separately by period. Similar support across periods makes period-specific effect estimation more stable.
The distributions overlap across periods, so the dynamic estimator has treatment variation to work with in each period. If one period had almost no treatment variation, its effect would be harder to estimate.
True Period Effect Distribution
Because this is synthetic data, we can inspect the true effect for each period. Later periods are designed to matter more on average.
The true effect is not constant over time. This is the main reason DynamicDML is more appropriate than collapsing the panel into one generic treatment row.
Time-Varying Confounding Check
Treatment assignment depends on current state and lagged treatment. This cell quantifies those relationships by period so the adjustment problem is explicit.
def safe_corr(left, right):
    """Return a correlation, or NaN when one side is constant."""
    left = np.asarray(left)
    right = np.asarray(right)
    if np.std(left) < 1e-12 or np.std(right) < 1e-12:
        return np.nan
    return np.corrcoef(left, right)[0, 1]


time_confounding_rows = []
for period, period_df in panel_df.groupby("period"):
    time_confounding_rows.append(
        {
            "period": period,
            "corr_treatment_state_before": safe_corr(period_df["treatment_intensity"], period_df["state_before_treatment"]),
            "corr_treatment_lagged_treatment": safe_corr(period_df["treatment_intensity"], period_df["lagged_treatment"]),
            "corr_treatment_baseline_need": safe_corr(period_df["treatment_intensity"], period_df["baseline_need"]),
        }
    )
time_varying_confounding = pd.DataFrame(time_confounding_rows)
time_varying_confounding.to_csv(TABLE_DIR / "12_time_varying_confounding_checks.csv", index=False)
display(time_varying_confounding)
   period  corr_treatment_state_before  corr_treatment_lagged_treatment  corr_treatment_baseline_need
0       0                     0.517404                              NaN                      0.570621
1       1                     0.579897                         0.602723                      0.606140
2       2                     0.618740                         0.648185                      0.602884
3       3                     0.645019                         0.672060                      0.611827
The treatment is related to pre-treatment state and treatment history. A snapshot model that ignores this timing can easily answer a distorted question.
Unit-Level Train-Test Split
Panel data must be split by unit, not by row. If the same unit appears in both train and test data, the evaluation leaks information across time from the same unit.
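The splitting cell is not shown above; the pattern it describes can be sketched like this. The 10-unit demo panel and the 20-percent test fraction are illustrative assumptions.

```python
import numpy as np
import pandas as pd

# Stand-in balanced panel: 10 units observed for 4 periods each.
panel = pd.DataFrame({
    "unit_id": np.repeat(np.arange(10), 4),
    "period": np.tile(np.arange(4), 10),
})

# Split by UNIT, not by row: sample unit IDs, then take all of each unit's rows.
rng = np.random.default_rng(0)
unit_ids = panel["unit_id"].unique()
shuffled = rng.permutation(unit_ids)
n_test = int(0.2 * len(unit_ids))
test_units = set(shuffled[:n_test])

panel_test = panel[panel["unit_id"].isin(test_units)].reset_index(drop=True)
panel_train = panel[~panel["unit_id"].isin(test_units)].reset_index(drop=True)

# The correctness condition: no unit appears on both sides.
overlap = set(panel_train["unit_id"]) & set(panel_test["unit_id"])
```

scikit-learn's `GroupShuffleSplit` implements the same idea; the manual version above just makes the unit-level sampling explicit.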
No unit appears in both splits. This is the correct evaluation pattern for panel data because all periods from a unit travel together.
Model Matrices For DynamicDML
This cell creates the long-format objects passed to DynamicDML. Baseline features go into X, time-varying controls go into W, treatment goes into T, and unit IDs go into groups.
x_cols = [
    "baseline_need",
    "user_quality",
    "friction_score",
    "price_sensitivity",
    "region_risk",
    "high_need_segment",
]
w_cols = ["state_before_treatment", "seasonality", "lagged_treatment"]

Y_train = panel_train["final_outcome"].to_numpy()
T_train = panel_train["treatment_intensity"].to_numpy()
X_train = panel_train[x_cols].copy()
W_train = panel_train[w_cols].copy()
groups_train = panel_train["unit_id"].to_numpy()

Y_test = panel_test["final_outcome"].to_numpy()
T_test = panel_test["treatment_intensity"].to_numpy()
X_test_long = panel_test[x_cols].copy()
W_test = panel_test[w_cols].copy()
groups_test = panel_test["unit_id"].to_numpy()

first_period_test = panel_test[panel_test["period"] == 0].reset_index(drop=True)
X_test_first = first_period_test[x_cols].copy()

model_matrix_summary = pd.DataFrame(
    {
        "object": ["Y_train", "T_train", "X_train", "W_train", "groups_train", "X_test_first"],
        "shape_or_length": [len(Y_train), len(T_train), X_train.shape, W_train.shape, len(groups_train), X_test_first.shape],
        "role": [
            "Final outcome repeated in long panel format.",
            "Treatment intensity in each period.",
            "Baseline heterogeneity features in long format.",
            "Time-varying controls in long format.",
            "Unit IDs for grouped cross-fitting.",
            "First-period X for unit-level effect prediction.",
        ],
    }
)
model_matrix_summary.to_csv(TABLE_DIR / "12_dynamic_dml_model_matrix_summary.csv", index=False)
display(model_matrix_summary)
         object shape_or_length                                              role
0       Y_train            4160      Final outcome repeated in long panel format.
1       T_train            4160               Treatment intensity in each period.
2       X_train       (4160, 6)   Baseline heterogeneity features in long format.
3       W_train       (4160, 3)             Time-varying controls in long format.
4  groups_train            4160               Unit IDs for grouped cross-fitting.
5  X_test_first        (560, 6)  First-period X for unit-level effect prediction.
The training arrays have one row per unit-period. The effect prediction matrix has one row per held-out unit because treatment-history effects are predicted at the unit level.
Shortcut 1: Pooled Row-Level DML
The first shortcut treats each unit-period row as if it were an independent observation. This is a common mistake. It estimates one generic treatment effect, ignoring the fact that treatment period matters and that rows from the same unit share a final outcome.
The pooled shortcut produces a number, but the estimand is muddled. It mixes period effects and treats repeated final outcomes as separate independent rows.
Shortcut 2: Last-Period Snapshot DML
A less severe shortcut uses only the final period and estimates the effect of final-period treatment on the final outcome. This is a coherent snapshot question, but it does not estimate the effects of earlier treatments.
The snapshot model is easier to interpret than the pooled shortcut, but it only answers a final-period question. It says nothing about treatment in periods 0, 1, or 2.
Fit DynamicDML
Now we fit the dynamic panel estimator. We use linear nuisance models because the synthetic treatment and outcome relationships are mostly linear and the purpose here is the panel structure. groups tells EconML to keep each unit’s history together during cross-fitting.
DynamicDML effect matrix shape: (560, 4)
First two held-out units, estimated period effects:
[[ 0.08906566 0.00553587 0.02922312 0.1746227 ]
[-0.00459736 0.2661314 0.27216845 0.42836797]]
The output has one row per held-out unit and one column per treatment period. This is the key dynamic object: a period-specific effect profile for each unit.
True Effect Matrix
To evaluate the estimator, we reshape the synthetic truth into the same unit-by-period matrix as the DynamicDML output.
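The reshaping cell is not shown above; the core move is a pivot from long format to a unit-by-period matrix, taking care to impose the same unit ordering as the estimate matrix. The four-row demo frame is a stand-in for the held-out panel.

```python
import numpy as np
import pandas as pd

# Long-format truth: one row per unit-period.
demo = pd.DataFrame({
    "unit_id": [7, 7, 3, 3],
    "period": [0, 1, 0, 1],
    "true_period_effect": [0.1, 0.2, 0.3, 0.4],
})

# The estimate matrix uses units in their order of appearance, so record
# that order explicitly; pivot() would otherwise sort unit_id for us.
unit_order = demo["unit_id"].drop_duplicates().to_list()
true_matrix = (
    demo.pivot(index="unit_id", columns="period", values="true_period_effect")
    .loc[unit_order]          # reorder rows to match the estimated matrix
    .to_numpy()
)
```

Without the explicit `.loc[unit_order]` step, pandas' sorted pivot index would silently misalign units against the estimator output, which is exactly the kind of reshaping error the text warns about.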
The true matrix uses the same order as the estimated matrix. This alignment matters because panel reshaping errors are easy to make and hard to notice after modeling.
DynamicDML Recovery By Period
This cell compares estimated and true effects period by period. The estimator is trying to recover four different CATE functions, not one generic treatment effect.
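The comparison cell is not shown above; a sketch of per-period recovery metrics follows. The random matrices stand in for the notebook's true and estimated effect matrices, and the column names mirror the plotting cell later in the notebook.

```python
import numpy as np
import pandas as pd
from sklearn.metrics import mean_absolute_error

rng = np.random.default_rng(0)
true_matrix = rng.normal(size=(50, 4))                       # stand-in truth
est_matrix = true_matrix + rng.normal(scale=0.1, size=(50, 4))  # stand-in estimates

# One row of metrics per treatment period: four separate CATE functions,
# not one pooled score.
rows = []
for period in range(true_matrix.shape[1]):
    t, e = true_matrix[:, period], est_matrix[:, period]
    rows.append({
        "period": period,
        "true_mean_effect": t.mean(),
        "estimated_mean_effect": e.mean(),
        "mae": mean_absolute_error(t, e),
        "corr_true_est": np.corrcoef(t, e)[0, 1],
    })
period_recovery = pd.DataFrame(rows)
```

Reporting MAE and correlation separately by period makes it visible when, say, period 0 heterogeneity is recovered worse than period 3 heterogeneity.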
The table shows how recovery differs by period. This is more informative than a single global score because each treatment period has its own effect function.
Period Mean Effect Plot
The next figure compares true and estimated mean effects by period. It should show the increasing effect pattern over time.
period_mean_plot = period_recovery.melt(
    id_vars="period",
    value_vars=["true_mean_effect", "estimated_mean_effect"],
    var_name="quantity",
    value_name="effect",
)
period_mean_plot["quantity"] = period_mean_plot["quantity"].map(
    {"true_mean_effect": "true mean effect", "estimated_mean_effect": "estimated mean effect"}
)

fig, ax = plt.subplots(figsize=(10, 5))
sns.lineplot(data=period_mean_plot, x="period", y="effect", hue="quantity", marker="o", linewidth=2, ax=ax)
ax.axhline(0, color="#111827", linestyle=":", linewidth=1)
ax.set_title("Mean Dynamic Treatment Effect By Period")
ax.set_xlabel("Treatment period")
ax.set_ylabel("Effect on final outcome")
plt.tight_layout()
fig.savefig(FIGURE_DIR / "12_dynamic_period_mean_effects.png", dpi=160, bbox_inches="tight")
plt.show()
The plot is the simplest dynamic story: treatment timing matters. Later treatment periods have larger effects in this teaching setup.
Period-Specific CATE Recovery Plot
Mean effects can look good even when individual heterogeneity is poorly recovered. This plot shows true versus estimated CATE for each treatment period.
The panels show where heterogeneity is easier or harder to recover. Dynamic CATE estimation is more demanding than a single-period model because it estimates a separate profile for each treatment time.
Cumulative Treatment-History Effect
A useful summary is the effect of increasing treatment intensity by one unit in every period. For a linear treatment-history model, that cumulative effect is the sum of the period effects.
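That additivity claim can be shown directly on a small effect matrix. The numbers below are made up for illustration.

```python
import numpy as np

# Hypothetical (n_units, n_periods) matrix of period-specific effects.
effects = np.array([[0.1, 0.0, 0.2, 0.4],
                    [0.0, 0.3, 0.3, 0.4]])

# In a linear treatment-history model, raising treatment by one unit in
# EVERY period adds the period effects together: the cumulative effect
# is just the row sum of the effect profile.
cumulative = effects.sum(axis=1)
```

In EconML the same quantity can also be requested as a direct effect call with treatment histories of all zeros versus all ones; agreement between the two is the shape sanity check the next paragraph mentions.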
The cumulative effect provides a unit-level summary of the full treatment history. The direct effect call and the sum of period effects agree, which is a useful shape sanity check.
Cumulative Effect Plot
This plot compares true and estimated cumulative effects for held-out units.
The cumulative plot turns four period effects into one unit-level summary. It is useful for ranking units by total treatment-history responsiveness.
Period Treatment Contrast Checks
DynamicDML.effect can evaluate custom treatment-history contrasts. This cell verifies that a one-unit increase in a single period matches the corresponding period-specific marginal effect.
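The arithmetic behind that verification can be sketched without the fitted model. Under a linear effect model, a history contrast is the inner product of the treatment change with the per-period effect profile, so a one-unit bump in a single period must reproduce that period's marginal effect. The profile values below are made up.

```python
import numpy as np

# Hypothetical period-effect profile for one unit: (1, n_periods).
effects = np.array([[0.1, 0.0, 0.2, 0.4]])

# Custom treatment-history contrast: +1 unit of treatment in period 2 only.
T0 = np.zeros((1, 4))
T1 = T0.copy()
T1[0, 2] = 1.0

# Linear-in-treatment model: the contrast is (T1 - T0) dotted with the profile,
# which collapses to the period-2 marginal effect.
contrast = ((T1 - T0) * effects).sum(axis=1)
```

The notebook's cell performs the same check through the fitted estimator, passing the T0/T1 histories to `DynamicDML.effect` and comparing against the period-specific column of the effect matrix.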
The custom contrasts match the period-specific marginal effects. This helps students see how to ask counterfactual treatment-history questions with the fitted model.
Coefficient Table
DynamicDML uses a linear final CATE model. The coefficient table shows how baseline features modify each period’s treatment effect.
The coefficients are a compact way to see effect modification. Positive baseline-need coefficients in later periods mean higher-need units have larger estimated treatment effects later in the history.
Coefficient Heatmap
The heatmap makes period-by-feature coefficient patterns easier to scan than a long table.
The heatmap gives a clean story: timing matters, and the time pattern differs by baseline need. This is the kind of chart that often belongs in a final report.
Policy-Style Treatment-History Ranking
A dynamic effect model can rank units by the estimated value of increasing treatment across the whole history. Here we compare top-20-percent targeting by cumulative dynamic effect against the synthetic truth.
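The targeting cell is not shown above; the selection logic can be sketched with stand-in effect vectors. The signal and noise scales here are illustrative assumptions, not the notebook's values.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 500
true_cumulative = rng.normal(loc=1.0, scale=0.5, size=n)          # stand-in truth
est_cumulative = true_cumulative + rng.normal(scale=0.3, size=n)  # stand-in estimates

# Top-20-percent targeting: rank by ESTIMATED cumulative effect,
# then evaluate the selected units on the TRUE cumulative effect.
k = int(0.2 * n)
selected = np.argsort(est_cumulative)[::-1][:k]
oracle = np.argsort(true_cumulative)[::-1][:k]   # synthetic upper benchmark

summary = {
    "all_units_mean_true_effect": true_cumulative.mean(),
    "dynamic_ranking_mean_true_effect": true_cumulative[selected].mean(),
    "oracle_mean_true_effect": true_cumulative[oracle].mean(),
}
```

The three rows bracket the ranking's value: the all-units mean is the no-targeting baseline, and the oracle row is the best any ranking of this budget could achieve.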
The dynamic ranking selects units with higher true cumulative effects than the overall held-out population. The oracle row gives the synthetic upper benchmark.
Policy Ranking Plot
The final policy plot compares true cumulative effects among selected units.
The plot translates the dynamic CATE matrix into a prioritization decision. It also keeps the comparison honest by showing the all-unit baseline and oracle benchmark.
What The Shortcuts Miss
This table summarizes the difference between the pooled shortcut, the last-period snapshot model, and the dynamic panel model.
workflow_comparison = pd.DataFrame(
    [
        {
            "workflow": "pooled row-level DML",
            "question_answered": "What is one generic treatment effect if every unit-period row is treated as independent?",
            "main_problem": "Mixes treatment periods and repeats the same final outcome across rows.",
            "when_it_can_be_useful": "Quick exploratory baseline, not a final panel causal design.",
        },
        {
            "workflow": "last-period snapshot DML",
            "question_answered": "What is the effect of final-period treatment on the final outcome?",
            "main_problem": "Ignores earlier treatment effects and treatment-history questions.",
            "when_it_can_be_useful": "When the causal question is explicitly about one period only.",
        },
        {
            "workflow": "DynamicDML",
            "question_answered": "What is the effect of treatment in each period on the final outcome?",
            "main_problem": "Requires balanced histories and credible observed treatment history controls.",
            "when_it_can_be_useful": "When treatment timing and treatment history are central to the decision.",
        },
    ]
)
workflow_comparison.to_csv(TABLE_DIR / "12_panel_workflow_comparison.csv", index=False)
display(workflow_comparison)
The dynamic model is not automatically better for every question. It is better matched to treatment-history questions, while a snapshot model is appropriate only for a snapshot estimand.
Reporting Checklist
The final checklist turns the notebook into a reusable panel-analysis guide.
panel_reporting_checklist = pd.DataFrame(
    [
        {
            "topic": "Unit and period definition",
            "what_to_report": "Define the repeated unit and the treatment periods.",
            "why_it_matters": "Panel causal questions are built around histories, not isolated rows.",
        },
        {
            "topic": "Outcome timing",
            "what_to_report": "State when the final outcome is measured relative to treatment periods.",
            "why_it_matters": "Features and treatments must precede the outcome window.",
        },
        {
            "topic": "Balance and ordering",
            "what_to_report": "Check periods per unit, duplicate unit-period rows, and sorting.",
            "why_it_matters": "DynamicDML expects balanced histories in this implementation.",
        },
        {
            "topic": "Group splitting",
            "what_to_report": "Keep all periods from a unit together in train/test and cross-fitting splits.",
            "why_it_matters": "Row-level splitting leaks information across time for the same unit.",
        },
        {
            "topic": "History controls",
            "what_to_report": "List the time-varying controls included in W and verify they are pre-treatment.",
            "why_it_matters": "Sequential ignorability depends on observed treatment history and state.",
        },
        {
            "topic": "Period effects",
            "what_to_report": "Report effects separately by treatment period before collapsing to a cumulative effect.",
            "why_it_matters": "A single average effect can hide important timing differences.",
        },
        {
            "topic": "Shortcut caveats",
            "what_to_report": "Explain what pooled or snapshot analyses do and do not answer.",
            "why_it_matters": "Shortcut models often answer narrower or muddier questions than intended.",
        },
    ]
)
panel_reporting_checklist.to_csv(TABLE_DIR / "12_panel_reporting_checklist.csv", index=False)
display(panel_reporting_checklist)
The checklist emphasizes timing, grouping, and estimand clarity. Those are the places where longitudinal analyses most often go wrong.
Summary
This notebook showed how to move from cross-sectional treatment effects to dynamic treatment-history effects.
The main lessons are:
Repeated unit-period rows are not independent cross-sectional observations.
Panel train-test splits should be done by unit, not by row.
Time-varying controls must be measured before each period’s treatment.
A pooled row-level model mixes period effects and repeats final outcomes.
A last-period snapshot model can be coherent but answers only a final-period question.
DynamicDML estimates one final-outcome effect for each treatment period.
Cumulative treatment-history effects can be formed by summing period effects or by using custom treatment-history contrasts.
Panel reporting should always state the unit, periods, outcome timing, history controls, and shortcut limitations.
The next tutorial compares estimator families on the same simulated ground truth.