DoubleML Tutorial 05: Interactive Regression Model (IRM)
This notebook is a full tutorial on DoubleMLIRM, the interactive regression model for binary treatments.
The previous notebooks handled continuous treatments. Here the treatment is binary: a unit either receives the intervention or does not. That changes the estimand, the nuisance functions, and the diagnostics. Instead of residualizing a continuous treatment, IRM combines outcome regressions and a propensity score to estimate ATE and ATTE-style targets.
The goal is to understand the full workflow: potential-outcome estimands, identification assumptions, propensity overlap, manual AIPW estimation, DoubleML fitting, nuisance diagnostics, and honest reporting.
Learning Goals
By the end of this notebook, you should be able to:
Explain the IRM setup for binary treatments.
Define ATE and ATTE in potential-outcome notation.
State the identification assumptions: consistency, unconfoundedness, overlap, and no interference.
Explain the roles of ml_g and ml_m in DoubleMLIRM.
Manually compute cross-fitted AIPW estimates for ATE and ATT-style targets.
Fit DoubleMLIRM for score="ATE" and score="ATTE".
Diagnose propensity overlap, inverse-propensity weights, nuisance losses, and score contributions.
Understand how weak overlap can inflate uncertainty.
Binary Treatments And Potential Outcomes
For a binary treatment, each unit has two potential outcomes:
\[
Y(1) \quad \text{and} \quad Y(0).
\]
Y(1) is the outcome the unit would have under treatment. Y(0) is the outcome the same unit would have without treatment. We observe only one of them:
\[
Y = D Y(1) + (1-D)Y(0),
\]
where D = 1 means treated and D = 0 means untreated.
Two common targets are:
\[
ATE = E[Y(1) - Y(0)]
\]
and
\[
ATT = E[Y(1) - Y(0) \mid D=1].
\]
DoubleML names the treated-target score ATTE. In this notebook we use ATTE when referring to the DoubleML score option and ATT-style language when explaining the estimand.
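To make the distinction concrete, here is a tiny numeric sketch with made-up effects. When treatment is more likely where the effect is larger, the treated-group average exceeds the population average:

```python
import numpy as np

# Hypothetical individual treatment effects tau_i and treatment indicators d_i.
# Treatment is deliberately more common where the effect is larger, so the
# treated-group average exceeds the population average.
tau = np.array([0.2, 0.4, 0.6, 1.2, 1.6])
d = np.array([0, 0, 1, 1, 1])

ate = tau.mean()          # population target: E[tau]
att = tau[d == 1].mean()  # treated target: E[tau | D = 1]

print(f"ATE = {ate:.3f}, ATT = {att:.3f}")  # ATE = 0.800, ATT = 1.133
```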
Identification Assumptions
IRM is not an instrumental-variables design or a randomized experiment. It estimates causal effects from observational data under assumptions.
The standard observed-control identification assumptions are:
Consistency: the observed outcome equals the potential outcome under the treatment actually received.
Conditional unconfoundedness: after controlling for X, treatment assignment is independent of potential outcomes.
Overlap or positivity: every covariate profile has a nonzero probability of receiving each treatment state.
No interference: one unit’s treatment does not change another unit’s potential outcomes.
DoubleML helps with flexible nuisance estimation and orthogonal scores. It does not make these assumptions true.
IRM Nuisance Functions
DoubleMLIRM uses two main nuisance components:
\[
g_0(0, X) = E[Y \mid D=0, X]
\]
\[
g_0(1, X) = E[Y \mid D=1, X]
\]
and
\[
m_0(X) = P(D=1 \mid X).
\]
In the API:
ml_g estimates the outcome regression. DoubleML internally learns both treated and untreated outcome predictions.
ml_m estimates the propensity score, so it must be a classifier with predict_proba().
This is different from PLR and PLIV. In IRM, the treatment is binary and the propensity score is central.
The AIPW Score For ATE
The ATE score combines regression adjustment and inverse-propensity weighting:
\[
\psi(W; \eta) = g_0(1, X) - g_0(0, X) + \frac{D\,(Y - g_0(1, X))}{m_0(X)} - \frac{(1 - D)\,(Y - g_0(0, X))}{1 - m_0(X)}.
\]
The estimate is the average of this score. This is often called an augmented inverse-propensity weighted estimator.
The score is doubly robust in the usual sense: under suitable conditions, the estimate can remain consistent if either the propensity model or the outcome model is correctly specified. DoubleML adds cross-fitting and orthogonalization to make flexible machine-learning nuisance estimation more reliable.
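A minimal sketch of the AIPW point estimate, assuming nuisance predictions are already in hand. The function name and the toy oracle check below are illustrative, not part of the tutorial's code:

```python
import numpy as np

def aipw_ate(y, d, g0_hat, g1_hat, m_hat):
    """AIPW / doubly robust estimate of the ATE from nuisance predictions."""
    psi = (g1_hat - g0_hat
           + d * (y - g1_hat) / m_hat
           - (1 - d) * (y - g0_hat) / (1 - m_hat))
    return psi.mean()

# Oracle check on a toy design with true effect 1.0: plugging in the true
# nuisance functions should give an estimate close to 1.0.
rng = np.random.default_rng(42)
n = 20_000
x = rng.normal(size=n)
m = 1 / (1 + np.exp(-x))                 # true propensity
d = rng.binomial(1, m).astype(float)
mu0, mu1 = x, x + 1.0                    # true outcome regressions
y = np.where(d == 1, mu1, mu0) + rng.normal(size=n)

theta_hat = aipw_ate(y, d, mu0, mu1, m)
print(theta_hat)  # close to 1.0
```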
Runtime Note
This notebook fits cross-fitted binary-treatment models, two DoubleML scores, and a repeated-splitting check. On a typical laptop, the full notebook should take roughly two to four minutes.
The examples use a synthetic dataset large enough to show overlap and heterogeneity, but small enough to rerun comfortably.
Setup
This cell prepares the notebook environment. It creates output folders, makes matplotlib cache writes local to the tutorial folder, imports scientific Python libraries, and records package versions.
The path logic supports running from the repository root or directly from the tutorial folder.
The package versions are part of the reproducibility record. Cross-fitting, tree learners, and classifier calibration can shift slightly across package versions.
Helper Functions
The helper functions below keep the analysis cells focused on the causal workflow.
They handle table saving, sigmoid transformation, baseline summaries, AIPW calculations, cross-fitted nuisance predictions, effective sample size, and DoubleML prediction extraction.
The manual AIPW formulas are included so the notebook does not feel like magic. DoubleML automates these ideas with careful score construction, sample splitting, repeated splitting, bootstrap tools, and consistent model objects.
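One of those helpers, the effective sample size, is simple enough to sketch directly. This hypothetical version uses Kish's formula:

```python
import numpy as np

def effective_sample_size(weights):
    """Kish effective sample size: (sum w)^2 / sum(w^2)."""
    w = np.asarray(weights, dtype=float)
    return w.sum() ** 2 / (w ** 2).sum()

# Equal weights recover n; a few huge weights shrink the ESS sharply.
ess_equal = effective_sample_size(np.ones(100))
ess_skewed = effective_sample_size([1.0] * 99 + [50.0])
print(ess_equal)   # 100.0
print(ess_skewed)  # far below 100
```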
Draw The IRM Design
The diagram below shows the observed-control design.
Observed covariates X affect both treatment assignment and potential outcomes. The binary treatment D affects the observed outcome Y. Under unconfoundedness, adjusting for X is enough to compare treated and untreated potential outcomes.
The dashed box reminds us that unobserved confounding is a design threat. If a hidden variable affects both treatment and outcome after conditioning on X, IRM is not enough.
The diagram has no instrument. It is an observed-control design. That means the covariates must be rich enough to make treated and untreated units comparable after adjustment.
Create A Teaching Dataset
We now simulate a binary-treatment dataset.
The treatment probability depends on observed controls, so the treated and untreated groups are not directly comparable. However, the treatment assignment is generated from observed controls only, so unconfoundedness holds by construction.
The treatment effect is heterogeneous: some units benefit more than others. This makes the ATE and ATT-style targets different.
Saved synthetic IRM data with shape (4000, 16)
Treatment rate: 0.459
True ATE: 0.996
True ATT-style target: 1.133
| | engagement_score | need_intensity | content_fit | recent_activity | price_sensitivity | tenure_signal | novelty_appetite | seasonality_signal | feature_exposure | weekly_value | true_propensity | true_mu0 | true_mu1 | true_tau | potential_outcome_0 | potential_outcome_1 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | -0.793122 | 0.240571 | -1.896326 | 1.395772 | 0.638295 | -0.292047 | -0.311949 | 0.303835 | 0 | -1.008348 | 0.201385 | -1.465574 | -0.662037 | 0.803537 | -1.008348 | -0.204811 |
| 1 | -0.267660 | -0.225909 | 0.720068 | 0.514705 | -0.064128 | -0.085477 | 0.160916 | -0.614018 | 0 | -0.733904 | 0.495185 | 0.220266 | 1.217037 | 0.996771 | -0.733904 | 0.262868 |
| 2 | -0.403750 | 0.548260 | -0.130483 | -1.374426 | -0.477279 | 0.656622 | -0.232283 | -0.148733 | 0 | -0.257925 | 0.534182 | 0.425260 | 1.575338 | 1.150078 | -0.257925 | 0.892153 |
| 3 | 0.641837 | 1.824610 | -0.713189 | 1.348207 | -1.230013 | 0.174978 | -1.169530 | 1.351458 | 1 | 3.904644 | 0.692068 | 2.592242 | 4.364937 | 1.772695 | 2.131949 | 3.904644 |
| 4 | 0.833923 | 1.137717 | -0.885533 | 0.684555 | -0.519013 | -0.457385 | 0.506537 | 0.876718 | 1 | 1.742986 | 0.690805 | 1.517669 | 3.106801 | 1.589132 | 0.153854 | 1.742986 |
The true ATE and ATT-style target differ because treatment is more likely for units with different expected treatment effects. That is common in observational settings: who gets treated matters.
Field Dictionary
This table documents the role of the important columns. In real work, this step prevents a common mistake: placing outcome-derived, treatment-derived, or post-treatment columns into the controls.
```python
field_dictionary = pd.DataFrame(
    [
        {"column": "weekly_value", "role": "outcome", "description": "Observed outcome Y."},
        {"column": "feature_exposure", "role": "binary treatment", "description": "Observed binary treatment D."},
        *[
            {"column": col, "role": "control", "description": "Observed pre-treatment control X."}
            for col in feature_cols
        ],
        {"column": "true_propensity", "role": "hidden teaching column", "description": "True treatment probability P(D=1 | X)."},
        {"column": "true_mu0", "role": "hidden teaching column", "description": "True E[Y(0) | X]."},
        {"column": "true_mu1", "role": "hidden teaching column", "description": "True E[Y(1) | X]."},
        {"column": "true_tau", "role": "hidden teaching column", "description": "True individual treatment effect mu1(X) - mu0(X)."},
        {"column": "potential_outcome_0", "role": "hidden teaching column", "description": "Simulated Y(0), not observed in real data for treated units."},
        {"column": "potential_outcome_1", "role": "hidden teaching column", "description": "Simulated Y(1), not observed in real data for untreated units."},
    ]
)
save_table(field_dictionary, "field_dictionary")
display(field_dictionary)
```
| | column | role | description |
|---|---|---|---|
| 0 | weekly_value | outcome | Observed outcome Y. |
| 1 | feature_exposure | binary treatment | Observed binary treatment D. |
| 2 | engagement_score | control | Observed pre-treatment control X. |
| 3 | need_intensity | control | Observed pre-treatment control X. |
| 4 | content_fit | control | Observed pre-treatment control X. |
| 5 | recent_activity | control | Observed pre-treatment control X. |
| 6 | price_sensitivity | control | Observed pre-treatment control X. |
| 7 | tenure_signal | control | Observed pre-treatment control X. |
| 8 | novelty_appetite | control | Observed pre-treatment control X. |
| 9 | seasonality_signal | control | Observed pre-treatment control X. |
| 10 | true_propensity | hidden teaching column | True treatment probability P(D=1 \| X). |
| 11 | true_mu0 | hidden teaching column | True E[Y(0) \| X]. |
| 12 | true_mu1 | hidden teaching column | True E[Y(1) \| X]. |
| 13 | true_tau | hidden teaching column | True individual treatment effect mu1(X) - mu0(X). |
| 14 | potential_outcome_0 | hidden teaching column | Simulated Y(0), not observed in real data for treated units. |
| 15 | potential_outcome_1 | hidden teaching column | Simulated Y(1), not observed in real data for untreated units. |
The controls are all pre-treatment by construction. The true potential outcome columns are excluded from every model and used only to check synthetic truth.
Basic Data Audit
Before estimating effects, we check missingness, scale, treatment support, and the true targets in the synthetic data.
In binary-treatment problems, the treated and untreated sample sizes matter. If one group is tiny, nuisance learning and overlap diagnostics become fragile.
The treatment rate is comfortably away from zero and one. That does not guarantee overlap everywhere, but it tells us both treatment groups have enough rows for the tutorial.
Treatment Assignment Is Confounded By X
This figure shows that the true propensity score varies with observed controls. Treated units are not a random subset of all units.
The right panel shows the distribution of the true treatment effect. Since treatment effects vary across units, the ATE and ATT-style target need not match.
The treated group is shifted toward higher propensities and somewhat different effects. That is exactly why a raw mean difference can miss the ATE.
Correlation Screen
A correlation matrix is only a rough diagnostic, especially with nonlinear relationships. Still, it helps us see how treatment, outcome, true propensity, true effects, and controls relate in the synthetic data.
The raw mean difference is far from the ATE because treatment assignment depends on covariates. Adjustment moves the estimate toward the causal target, and the oracle rows show what happens when the nuisance functions are known.
Baseline Estimate Plot
This figure shows the baseline estimates. The blue dashed line marks the true ATE. The red dashed line marks the true ATT-style target.
The different reference lines matter because ATE and ATT-style targets are different in this dataset.
The plot makes the estimand distinction visible. A raw difference can be biased, and an ATT-style target can legitimately differ from the ATE when treatment effects vary.
Nuisance Learners
We will compare two nuisance specifications:
regularized linear/logistic learners;
histogram gradient boosting learners.
ml_g is a regressor because the outcome is continuous. ml_m is a classifier because it estimates the propensity score.
The nuisance learners are not the final estimand. They are tools for estimating the outcome regressions and propensity score used in the orthogonal score.
Manual Cross-Fitted AIPW
Before fitting DoubleML, we manually compute cross-fitted AIPW estimates.
The manual procedure estimates three nuisance quantities out of fold:
g0_hat(X): predicted outcome if untreated;
g1_hat(X): predicted outcome if treated;
m_hat(X): predicted probability of treatment.
Each row’s nuisance predictions come from models that did not train on that row.
The AIPW estimate uses both outcome regression and propensity weighting. In this synthetic design, it is much closer to the target than the raw mean difference.
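The manual procedure might look roughly like this sketch (learner choices, variable names, and the toy data are illustrative): each fold's models predict only for held-out rows, and the treated and untreated outcome models train on their own subgroups.

```python
import numpy as np
from sklearn.linear_model import LinearRegression, LogisticRegression
from sklearn.model_selection import KFold

# Toy confounded dataset; the true treatment effect is 1.0.
rng = np.random.default_rng(1)
n = 4000
x = rng.normal(size=(n, 3))
m_true = 1 / (1 + np.exp(-x[:, 0]))
d = rng.binomial(1, m_true)
y = x @ np.array([1.0, 0.5, 0.0]) + d * 1.0 + rng.normal(size=n)

# Cross-fitted nuisances: every row is predicted by models that never saw it.
g0_hat, g1_hat, m_hat = np.empty(n), np.empty(n), np.empty(n)
for train, test in KFold(5, shuffle=True, random_state=0).split(x):
    tr0 = train[d[train] == 0]  # untreated training rows for g0
    tr1 = train[d[train] == 1]  # treated training rows for g1
    g0_hat[test] = LinearRegression().fit(x[tr0], y[tr0]).predict(x[test])
    g1_hat[test] = LinearRegression().fit(x[tr1], y[tr1]).predict(x[test])
    m_hat[test] = LogisticRegression().fit(x[train], d[train]).predict_proba(x[test])[:, 1]

m_hat = np.clip(m_hat, 0.01, 0.99)  # guard against extreme weights
aipw_estimate = np.mean(
    g1_hat - g0_hat
    + d * (y - g1_hat) / m_hat
    - (1 - d) * (y - g0_hat) / (1 - m_hat)
)
print(aipw_estimate)  # close to the true effect of 1.0
```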
Manual Nuisance Quality
Because the data are synthetic, we can compare the cross-fitted nuisance predictions to the true nuisance functions.
In real data, the true nuisance functions are unknown. We would instead inspect out-of-fold errors, propensity calibration, overlap, and domain plausibility.
The nuisance models recover the broad structure but are not perfect. AIPW is designed to tolerate nuisance error better than a simple plug-in estimator, especially with cross-fitting.
Propensity Overlap Diagnostics
Overlap means treated and untreated units exist across the relevant covariate space. Propensity scores near zero or one create large inverse-propensity weights and unstable estimates.
This cell summarizes the manual cross-fitted propensity predictions and the implied weights.
The predicted propensities stay away from zero and one. That is a healthy sign for this teaching dataset. In real data, this table is one of the first places to look for overlap trouble.
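A hypothetical version of such a summary, using only NumPy and pandas (the labels and toy propensities are illustrative):

```python
import numpy as np
import pandas as pd

def overlap_summary(m_hat, d):
    """Propensity quantiles plus implied inverse-propensity weight extremes."""
    m_hat = np.asarray(m_hat, dtype=float)
    d = np.asarray(d)
    w = np.where(d == 1, 1 / m_hat, 1 / (1 - m_hat))  # IPW weights
    return pd.Series({
        "min m_hat": m_hat.min(),
        "1% m_hat": np.quantile(m_hat, 0.01),
        "99% m_hat": np.quantile(m_hat, 0.99),
        "max m_hat": m_hat.max(),
        "max weight": w.max(),
        "share outside [0.05, 0.95]": np.mean((m_hat < 0.05) | (m_hat > 0.95)),
    })

# Well-behaved propensities imply modest weights and no mass near 0 or 1.
rng = np.random.default_rng(0)
m_hat = rng.uniform(0.2, 0.8, size=1000)
d = rng.binomial(1, m_hat)
summary = overlap_summary(m_hat, d)
print(summary)
```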
Propensity And Weight Plots
The left panel shows propensity overlap by treatment status. The right panel shows inverse-propensity weights on a log scale.
A small number of extreme weights can dominate an estimate, so weight diagnostics should be routine in IRM workflows.
The overlap is not perfect, but both treatment groups have support over much of the propensity range. The weight plot does not show a severe tail problem.
Outcome Nuisance Diagnostics
The next figure compares the manually cross-fitted outcome nuisance predictions against the true outcome nuisance functions.
The treated outcome model is trained only on treated rows, and the untreated outcome model is trained only on untreated rows. Cross-fitting keeps the displayed predictions out of fold.
The design here is intentionally simple: one outcome, one binary treatment, and observed controls. The real work is deciding whether those controls support an observed-control causal design.
Fit DoubleMLIRM For ATE And ATTE
We now fit four DoubleML models:
linear/logistic nuisance ATE;
linear/logistic nuisance ATTE;
gradient-boosted nuisance ATE;
gradient-boosted nuisance ATTE.
The score option changes the target. score="ATE" targets the population average effect. score="ATTE" targets the effect among treated units.
Finished: Linear nuisance IRM ATE
Finished: Linear nuisance IRM ATTE
Finished: Gradient boosting nuisance IRM ATE
Finished: Gradient boosting nuisance IRM ATTE
| | estimator | score | treatment | theta_hat | std_error | t_stat | p_value | ci_95_lower | ci_95_upper | true_target | bias_vs_target |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | Linear nuisance IRM ATE | ATE | feature_exposure | 1.027968 | 0.056428 | 18.217240 | 3.766794e-74 | 0.917370 | 1.138565 | 0.996435 | 0.031533 |
| 1 | Linear nuisance IRM ATTE | ATTE | feature_exposure | 1.162320 | 0.057914 | 20.069872 | 1.353534e-89 | 1.048812 | 1.275829 | 1.132978 | 0.029342 |
| 2 | Gradient boosting nuisance IRM ATE | ATE | feature_exposure | 0.985272 | 0.053888 | 18.283853 | 1.112793e-74 | 0.879654 | 1.090890 | 0.996435 | -0.011163 |
| 3 | Gradient boosting nuisance IRM ATTE | ATTE | feature_exposure | 1.048266 | 0.069878 | 15.001366 | 7.192421e-51 | 0.911308 | 1.185224 | 1.132978 | -0.084712 |
The gradient-boosted nuisance specification is expected to do well because the synthetic treatment assignment and outcome functions are nonlinear. The linear/logistic specification remains useful as a transparent baseline.
Compare All Estimators
This table combines simple baselines, manual cross-fitted estimates, and DoubleML estimates.
When reading the table, compare each row to the right target. ATE rows should be compared with the true ATE. ATT-style and ATTE rows should be compared with the true treated target.
The comparison shows the point of IRM: use outcome and propensity nuisance models to adjust for the observed differences between treated and untreated units.
Estimate Comparison Plot
The next figure compares estimates with confidence intervals. The vertical reference lines mark the true ATE and true ATT-style target from the synthetic data.
The plot shows both bias reduction and estimand differences. ATE and ATT-style estimates should not be forced to agree when treatment effects are heterogeneous.
DoubleML Nuisance Losses
evaluate_learners() reports out-of-fold nuisance losses. For IRM, the keys are:
ml_g0: untreated outcome nuisance;
ml_g1: treated outcome nuisance;
ml_m: propensity score nuisance.
The loss values are diagnostics, not causal proof. A good propensity model cannot fix unobserved confounding.
```python
loss_tables = []
for model_name, model in irm_models.items():
    loss_tables.append(learner_loss_table(model, model_name))
nuisance_losses = pd.concat(loss_tables, ignore_index=True)
save_table(nuisance_losses, "doubleml_nuisance_losses")
display(nuisance_losses)
```
| | model | learner | mean_loss | min_loss | max_loss |
|---|---|---|---|---|---|
| 0 | Linear nuisance IRM ATE | ml_g0 | 1.314451 | 1.314451 | 1.314451 |
| 1 | Linear nuisance IRM ATE | ml_g1 | 1.310666 | 1.310666 | 1.310666 |
| 2 | Linear nuisance IRM ATE | ml_m | 0.456163 | 0.456163 | 0.456163 |
| 3 | Linear nuisance IRM ATTE | ml_g0 | 1.312939 | 1.312939 | 1.312939 |
| 4 | Linear nuisance IRM ATTE | ml_g1 | 1.309140 | 1.309140 | 1.309140 |
| 5 | Linear nuisance IRM ATTE | ml_m | 0.455825 | 0.455825 | 0.455825 |
| 6 | Gradient boosting nuisance IRM ATE | ml_g0 | 1.086616 | 1.086616 | 1.086616 |
| 7 | Gradient boosting nuisance IRM ATE | ml_g1 | 1.063938 | 1.063938 | 1.063938 |
| 8 | Gradient boosting nuisance IRM ATE | ml_m | 0.470521 | 0.470521 | 0.470521 |
| 9 | Gradient boosting nuisance IRM ATTE | ml_g0 | 1.086821 | 1.086821 | 1.086821 |
| 10 | Gradient boosting nuisance IRM ATTE | ml_g1 | 1.070723 | 1.070723 | 1.070723 |
| 11 | Gradient boosting nuisance IRM ATTE | ml_m | 0.469567 | 0.469567 | 0.469567 |
The loss table helps compare nuisance specifications. Lower nuisance loss is useful, but the final causal estimate also depends on overlap and identification assumptions.
Extract DoubleML Predictions
We now pull the nuisance predictions from the gradient-boosted ATE model. These predictions are useful for diagnostics and for connecting DoubleML output back to the manual AIPW formula.
The DoubleML nuisance predictions have similar quality to the manual cross-fitted nuisance predictions. That is expected because they use the same learner family and the same core cross-fitting idea.
DoubleML Propensity Diagnostics
This cell summarizes the propensity predictions from the main DoubleML ATE model and the corresponding weights.
The propensity distribution is well behaved here. In real data, high maximum weights or a low effective sample size would lead to trimming, redesign, or a more limited target population.
DoubleML Nuisance Plots
This figure checks the gradient-boosted DoubleML nuisance predictions against true nuisance functions from the simulation.
The third panel compares predicted and true propensities. This is a useful calibration-style check in synthetic data.
The nuisance plots show useful but imperfect recovery. That is normal: orthogonal scores are built for realistic nuisance error, not perfect prediction.
Score Contributions
DoubleML stores orthogonal score contributions in psi. At the fitted estimate, the average score should be close to zero.
Large tails in the score distribution can point to extreme weights, poor overlap, or influential observations.
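A manual analogue on toy data, assuming oracle nuisances: subtracting the point estimate from the AIPW contributions gives score-like terms whose average is zero by construction, and whose tails flag influential observations.

```python
import numpy as np

# Toy data with oracle nuisances; true effect is 1.0.
rng = np.random.default_rng(7)
n = 5000
x = rng.normal(size=n)
m = 1 / (1 + np.exp(-x))
d = rng.binomial(1, m)
y = x + d * 1.0 + rng.normal(size=n)

# AIPW contributions with oracle g and m (mu0 = x, mu1 = x + 1),
# then center at the point estimate.
contrib = 1.0 + d * (y - x - 1.0) / m - (1 - d) * (y - x) / (1 - m)
theta_hat = contrib.mean()
psi = contrib - theta_hat                     # score contributions at theta_hat

print(abs(psi.mean()))                        # essentially zero by construction
print(np.quantile(np.abs(psi), [0.5, 0.99])) # tail check for influence
```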
The bootstrap intervals are another uncertainty summary around the same observed-control design. They should be reported alongside overlap and nuisance diagnostics.
Repeated Sample Splitting
Cross-fitted estimates can vary slightly with the fold split. Repeated sample splitting checks whether the result is numerically stable across several random splits.
This cell uses a lighter gradient-boosting configuration to keep the check fast.
The estimates are stable across split repetitions in this synthetic setting. That supports numerical reliability, not causal identification by itself.
Overlap Stress Test
Now we isolate the overlap problem. We keep the same potential outcomes but make the treatment assignment rule more or less extreme.
This stress test uses the true nuisance functions so the only thing changing is propensity overlap. As propensities move toward zero or one, weights become more variable and uncertainty increases.
As the treatment rule becomes more extreme, the effective sample size falls and the maximum weight rises. This is the overlap problem in one table.
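The mechanics can be sketched with a logistic assignment rule whose sharpness k is varied (k values and the sample here are illustrative): as k grows, maximum weights explode and the Kish effective sample size collapses.

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(size=5000)

results = []
for k in [0.5, 1.0, 2.0, 4.0]:                 # larger k = more extreme assignment
    m = 1 / (1 + np.exp(-k * x))                # true propensity with sharpness k
    d = rng.binomial(1, m)
    w = np.where(d == 1, 1 / m, 1 / (1 - m))    # inverse-propensity weights
    ess = w.sum() ** 2 / (w ** 2).sum()         # Kish effective sample size
    results.append((k, w.max(), ess))
    print(f"k={k}: max weight {w.max():10.1f}, ESS {ess:8.1f}")
```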
Overlap Stress Plot
The left panel shows uncertainty in the oracle AIPW estimate as overlap worsens. The right panel shows the effective sample size implied by inverse-propensity weights.
Poor overlap is not solved by a fancier model. It changes the support of the causal question. Sometimes the honest fix is to narrow the target population.
Propensity Clipping Sensitivity
In practice, analysts often clip or trim extreme propensity scores. Clipping can reduce variance from extreme weights, but it also changes the exact estimating equation.
This cell recomputes manual AIPW estimates under several clipping thresholds.
The clipping table is stable here because the propensity distribution is already well behaved. In weaker-overlap data, clipping sensitivity can be much larger.
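A sketch of that sensitivity check on toy data with oracle outcome regressions (the thresholds and data-generating choices are illustrative): clip the propensities, recompute the AIPW estimate, and compare across thresholds.

```python
import numpy as np

rng = np.random.default_rng(5)
n = 10_000
x = rng.normal(size=n)
m = 1 / (1 + np.exp(-2 * x))             # moderately sharp true propensity
d = rng.binomial(1, m)
y = x + d * 1.0 + rng.normal(size=n)     # true ATE is 1.0
mu0, mu1 = x, x + 1.0                    # oracle outcome regressions

estimates = []
for eps in [0.0, 0.01, 0.025, 0.05]:
    mc = np.clip(m, eps, 1 - eps)        # eps = 0 leaves propensities unclipped
    theta = np.mean(mu1 - mu0 + d * (y - mu1) / mc - (1 - d) * (y - mu0) / (1 - mc))
    estimates.append(theta)
    print(f"clip at {eps:5.3f}: ATE estimate {theta:.4f}")
```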
When IRM Is The Right Or Wrong Tool
IRM is a good fit when:
the treatment is binary;
observed covariates plausibly adjust for treatment assignment;
there is overlap between treated and untreated units;
the target is ATE or ATT-style.
IRM is not enough when:
there is important unobserved confounding;
the treatment is continuous;
the treatment is assigned by an instrument or cutoff design;
overlap is so weak that treated and untreated units are not comparable;
interference between units is central to the question.
Reporting Checklist
This checklist turns the notebook into a reusable binary-treatment workflow.
A credible IRM report should include the target estimand, the treatment definition, control timing, propensity overlap, nuisance diagnostics, uncertainty, and limitations.
```python
reporting_checklist = pd.DataFrame(
    [
        {"item": "Causal question", "status": "Estimate effect of feature_exposure on weekly_value."},
        {"item": "Treatment type", "status": "Binary treatment, so IRM is appropriate for ATE and ATTE-style targets."},
        {"item": "Estimands", "status": "Both ATE and ATT-style targets are shown because treatment effects are heterogeneous."},
        {"item": "Control timing", "status": "All controls are pre-treatment by construction in this synthetic dataset."},
        {"item": "Unconfoundedness", "status": "Holds by construction here; requires design evidence in real data."},
        {"item": "Overlap", "status": "Propensity and weight diagnostics reported."},
        {"item": "Nuisance learners", "status": "Compared linear/logistic and gradient-boosted nuisance specifications."},
        {"item": "Cross-fitting", "status": "Manual AIPW and DoubleMLIRM use cross-fitted nuisance predictions."},
        {"item": "Uncertainty", "status": "Standard errors, confidence intervals, and bootstrap intervals reported."},
        {"item": "Stability", "status": "Repeated sample splitting and clipping sensitivity included."},
        {"item": "Main limitation", "status": "IRM does not solve hidden confounding or support violations."},
    ]
)
save_table(reporting_checklist, "irm_reporting_checklist")
display(reporting_checklist)
```
| | item | status |
|---|---|---|
| 0 | Causal question | Estimate effect of feature_exposure on weekly_value. |
| 1 | Treatment type | Binary treatment, so IRM is appropriate for ATE and ATTE-style targets. |
| 2 | Estimands | Both ATE and ATT-style targets are shown because treatment effects are heterogeneous. |
| 3 | Control timing | All controls are pre-treatment by construction in this synthetic dataset. |
| 4 | Unconfoundedness | Holds by construction here; requires design evidence in real data. |
| 5 | Overlap | Propensity and weight diagnostics reported. |
| 6 | Nuisance learners | Compared linear/logistic and gradient-boosted nuisance specifications. |
| 7 | Cross-fitting | Manual AIPW and DoubleMLIRM use cross-fitted nuisance predictions. |
| 8 | Uncertainty | Standard errors, confidence intervals, and bootstrap intervals reported. |
| 9 | Stability | Repeated sample splitting and clipping sensitivity included. |
| 10 | Main limitation | IRM does not solve hidden confounding or support violations. |
The checklist forces the analysis to stay causal, not merely predictive. A polished propensity model is not a substitute for a credible assignment story.
Report Template
The next cell writes a short markdown report template using the main gradient-boosted IRM estimates.
This template is meant to be adapted. In real analyses, the synthetic truth checks would be replaced by design evidence, sensitivity analysis, and robustness checks.
```python
main_ate = irm_summary.loc[
    irm_summary["estimator"] == "Gradient boosting nuisance IRM ATE"
].iloc[0]
main_atte = irm_summary.loc[
    irm_summary["estimator"] == "Gradient boosting nuisance IRM ATTE"
].iloc[0]
ess_value = dml_propensity_diagnostics.loc[
    dml_propensity_diagnostics["diagnostic"] == "IPW effective sample size", "value"
].iloc[0]

report_text = f"""# IRM Effect Estimate Report Template

## Causal Question
Estimate the effect of `feature_exposure` on `weekly_value` using observed pre-treatment controls.

## Estimands
Two targets are reported:

- ATE: average effect over the full analysis population.
- ATTE: effect for the treated units under the DoubleML treated-target score.

## Main Estimates
ATE estimate:

- Estimated effect: {main_ate['theta_hat']:.4f}
- Standard error: {main_ate['std_error']:.4f}
- 95 percent confidence interval: [{main_ate['ci_95_lower']:.4f}, {main_ate['ci_95_upper']:.4f}]

ATTE estimate:

- Estimated effect: {main_atte['theta_hat']:.4f}
- Standard error: {main_atte['std_error']:.4f}
- 95 percent confidence interval: [{main_atte['ci_95_lower']:.4f}, {main_atte['ci_95_upper']:.4f}]

## Estimator
The main estimator is `DoubleMLIRM` with five-fold cross-fitting, histogram gradient-boosted outcome models, and a histogram gradient-boosted propensity classifier.

## Diagnostics Included
- Difference-in-means, OLS adjustment, manual IPW, manual AIPW, and DoubleML comparisons.
- Propensity overlap and inverse-propensity weight diagnostics.
- Effective sample size from IPW weights: {ess_value:.1f}.
- Outcome nuisance and propensity nuisance quality checks.
- Orthogonal score contribution checks.
- Bootstrap confidence intervals.
- Repeated sample-splitting stability.
- Propensity clipping sensitivity.
- Overlap stress test.

## Required Assumptions
The estimates rely on consistency, conditional unconfoundedness given the selected controls, overlap, and no interference.
DoubleML improves estimation under these assumptions but does not establish them.
""".strip()

report_path = REPORT_DIR / f"{NOTEBOOK_PREFIX}_irm_report_template.md"
report_path.write_text(report_text)
print(report_text)
```
# IRM Effect Estimate Report Template
## Causal Question
Estimate the effect of `feature_exposure` on `weekly_value` using observed pre-treatment controls.
## Estimands
Two targets are reported:
- ATE: average effect over the full analysis population.
- ATTE: effect for the treated units under the DoubleML treated-target score.
## Main Estimates
ATE estimate:
- Estimated effect: 0.9853
- Standard error: 0.0539
- 95 percent confidence interval: [0.8797, 1.0909]
ATTE estimate:
- Estimated effect: 1.0483
- Standard error: 0.0699
- 95 percent confidence interval: [0.9113, 1.1852]
## Estimator
The main estimator is `DoubleMLIRM` with five-fold cross-fitting, histogram gradient-boosted outcome models, and a histogram gradient-boosted propensity classifier.
## Diagnostics Included
- Difference-in-means, OLS adjustment, manual IPW, manual AIPW, and DoubleML comparisons.
- Propensity overlap and inverse-propensity weight diagnostics.
- Effective sample size from IPW weights: 2225.4.
- Outcome nuisance and propensity nuisance quality checks.
- Orthogonal score contribution checks.
- Bootstrap confidence intervals.
- Repeated sample-splitting stability.
- Propensity clipping sensitivity.
- Overlap stress test.
## Required Assumptions
The estimates rely on consistency, conditional unconfoundedness given the selected controls, overlap, and no interference. DoubleML improves estimation under these assumptions but does not establish them.
The report template keeps the estimands, assumptions, overlap diagnostics, and uncertainty attached to the numeric estimates. That is the right shape for binary-treatment causal reporting.
Artifact Manifest
The final cell lists every artifact produced by this notebook so the outputs are easy to find later.
The IRM notebook is complete. The next natural topic is the interactive IV model for binary treatments with instruments and compliance-style reasoning.