This notebook focuses on two closely related EconML estimators:
LinearDML, which estimates a linear CATE model over chosen effect modifiers;
SparseLinearDML, which is designed for wider effect-modifier spaces where only some features are expected to matter.
The teaching question is:
Which pre-treatment features appear to change the size of the treatment effect, and can we estimate those effect drivers after adjusting for confounding?
This is the first estimator-specific notebook in the EconML tutorial sequence. The previous notebook built DML from scratch. Here we lean into the library API and focus on practical modeling decisions: feature roles, coefficient reading, sparse selection behavior, CATE recovery, and treatment targeting.
Learning Goals
By the end of this notebook, you should be able to:
fit LinearDML with observed confounders and effect modifiers;
fit SparseLinearDML on a wider set of candidate effect modifiers;
explain the difference between nuisance models and the final CATE model;
read final-stage CATE coefficients without confusing them for outcome-model coefficients;
compare dense and sparse CATE estimates against known truth in a simulation;
decide when a linear CATE model is a reasonable first estimator.
Why LinearDML Comes First
LinearDML is often the easiest serious EconML estimator to explain. It combines two useful properties:
flexible nuisance models can estimate the baseline outcome and treatment assignment process;
the final treatment-effect model remains linear and readable.
That means the model can adjust for confounding with machine learning while still producing coefficient-style CATE drivers. For example, a positive coefficient on baseline_need means the estimated treatment effect is larger for higher values of baseline_need, after the DML adjustment process.
SparseLinearDML keeps the same broad structure but adds sparse regularization in the final CATE stage. That is useful when we have many possible effect modifiers and want the final model to concentrate on the strongest signals.
LinearDML Versus SparseLinearDML
The next table summarizes how to think about the two estimators before writing code.
```python
from pathlib import Path
import os
import warnings
import importlib.metadata as importlib_metadata

# Keep Matplotlib cache files in a writable location during notebook execution.
os.environ.setdefault("MPLCONFIGDIR", "/tmp/matplotlib-ranking-sys")

warnings.filterwarnings("default")
warnings.filterwarnings("ignore", category=DeprecationWarning)
warnings.filterwarnings("ignore", category=PendingDeprecationWarning)
warnings.filterwarnings("ignore", category=FutureWarning)
warnings.filterwarnings("ignore", message=".*IProgress not found.*")
warnings.filterwarnings("ignore", message=".*X does not have valid feature names.*")
warnings.filterwarnings("ignore", message=".*The final model has a nonzero intercept.*")
warnings.filterwarnings("ignore", message=".*Co-variance matrix is underdetermined.*")
warnings.filterwarnings("ignore", module="sklearn.linear_model._logistic")

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
from IPython.display import display
from sklearn.ensemble import RandomForestClassifier, RandomForestRegressor
from sklearn.metrics import brier_score_loss, log_loss, mean_squared_error, roc_auc_score
from sklearn.model_selection import KFold, StratifiedKFold, cross_val_predict, train_test_split

try:
    import econml
    from econml.dml import LinearDML, SparseLinearDML

    ECONML_AVAILABLE = True
    ECONML_VERSION = getattr(econml, "__version__", "unknown")
except Exception as exc:
    ECONML_AVAILABLE = False
    ECONML_VERSION = f"import failed: {type(exc).__name__}: {exc}"

RANDOM_SEED = 2026
rng = np.random.default_rng(RANDOM_SEED)

OUTPUT_DIR = Path("outputs")
FIGURE_DIR = OUTPUT_DIR / "figures"
TABLE_DIR = OUTPUT_DIR / "tables"
FIGURE_DIR.mkdir(parents=True, exist_ok=True)
TABLE_DIR.mkdir(parents=True, exist_ok=True)

sns.set_theme(style="whitegrid", context="notebook")
pd.set_option("display.max_columns", 140)
pd.set_option("display.float_format", lambda value: f"{value:,.4f}")

estimator_map = pd.DataFrame(
    [
        {
            "estimator": "LinearDML",
            "final CATE model": "Dense linear model over X",
            "best first use": "A moderate number of meaningful effect modifiers",
            "main strength": "Readable coefficient-style effect drivers",
            "main risk": "Can assign visible coefficients to weak or noisy modifiers",
        },
        {
            "estimator": "SparseLinearDML",
            "final CATE model": "Sparse regularized linear model over X",
            "best first use": "A wider candidate feature set with only some true effect drivers",
            "main strength": "Shrinks weak CATE drivers and can highlight a smaller feature set",
            "main risk": "Regularization can shrink real but subtle signals too much",
        },
    ]
)
estimator_map.to_csv(TABLE_DIR / "03_estimator_map.csv", index=False)

print(f"EconML available: {ECONML_AVAILABLE}")
print(f"EconML version: {ECONML_VERSION}")
print(f"Figures will be saved to: {FIGURE_DIR.resolve()}")
print(f"Tables will be saved to: {TABLE_DIR.resolve()}")
display(estimator_map)
```
EconML available: True
EconML version: 0.16.0
Figures will be saved to: /home/apex/Documents/ranking_sys/notebooks/tutorials/econml/outputs/figures
Tables will be saved to: /home/apex/Documents/ranking_sys/notebooks/tutorials/econml/outputs/tables
| | estimator | final CATE model | best first use | main strength | main risk |
|---|---|---|---|---|---|
| 0 | LinearDML | Dense linear model over X | A moderate number of meaningful effect modifiers | Readable coefficient-style effect drivers | Can assign visible coefficients to weak or noisy modifiers |
| 1 | SparseLinearDML | Sparse regularized linear model over X | A wider candidate feature set with only some true effect drivers | Shrinks weak CATE drivers and can highlight a smaller feature set | Regularization can shrink real but subtle signals too much |
What this shows: both estimators use DML, but they differ in the final CATE stage. The nuisance models handle adjustment; the final stage decides how treatment-effect heterogeneity is represented.
Data-Generating Design
The synthetic dataset below is designed to teach a specific modeling problem:
treatment is confounded by observed pre-treatment features;
the outcome has a flexible baseline component;
the true treatment effect is sparse and linear in a subset of candidate effect modifiers;
several noise features are included as tempting but irrelevant effect modifiers.
This gives LinearDML and SparseLinearDML something meaningful to compare. A dense linear model can estimate every candidate coefficient, while a sparse model should concentrate more of the final-stage weight on the real effect drivers.
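To make this design concrete, here is a minimal, self-contained sketch of the same kind of data-generating process. The column names and coefficients below are illustrative stand-ins, not the notebook's actual generator: treatment assignment depends on `baseline_need` (confounding), the true CATE is linear in two real drivers, and `noise_modifier` is a deliberately irrelevant candidate.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
n = 5_000

# Pre-treatment features: two true effect drivers and one pure-noise modifier.
baseline_need = rng.normal(size=n)
friction_score = rng.normal(size=n)
noise_modifier = rng.normal(size=n)

# Confounded assignment: treatment probability rises with baseline_need.
propensity = 1.0 / (1.0 + np.exp(-(0.8 * baseline_need - 0.2)))
treatment = rng.binomial(1, propensity)

# Sparse linear CATE: only the two real drivers enter.
true_cate = 0.42 + 0.30 * baseline_need - 0.24 * friction_score

# Outcome = flexible (nonlinear) baseline + treatment * CATE + noise.
baseline_outcome_mean = np.sin(baseline_need) + 0.5 * friction_score**2
outcome = baseline_outcome_mean + treatment * true_cate + rng.normal(scale=0.5, size=n)

toy = pd.DataFrame({
    "baseline_need": baseline_need,
    "friction_score": friction_score,
    "noise_modifier": noise_modifier,
    "treatment": treatment,
    "outcome": outcome,
    "true_cate": true_cate,      # oracle column, kept only for grading
    "propensity": propensity,    # oracle column
})
print(toy.shape, round(toy["treatment"].mean(), 3))
```

Because assignment depends on `baseline_need`, treated rows in this sketch have systematically higher `baseline_need` than control rows, which is exactly the confounding pattern the estimators must remove.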
What this shows: the table has both real modeling fields and oracle fields. In the model-fitting steps we will use only pre-treatment observed features, treatment, and outcome. true_cate, propensity, and baseline_outcome_mean are kept only for teaching checks.
Field Dictionary
A field dictionary is especially useful in estimator tutorials because it separates three ideas that are easy to mix up:
features used to explain treatment-effect heterogeneity;
controls used to remove confounding;
oracle fields available only because this is a simulation.
```python
signal_modifier_cols = [
    "baseline_need",
    "prior_engagement",
    "friction_score",
    "price_sensitivity",
    "content_affinity",
    "region_risk",
    "high_need_segment",
]
weak_or_null_modifier_cols = ["trust_score", "recency_gap"] + list(noise_features.keys())
effect_modifier_cols = signal_modifier_cols + weak_or_null_modifier_cols
control_cols = ["account_tenure", "seasonality_index", "device_stability"]
all_observed_covariates = effect_modifier_cols + control_cols

field_rows = []
for col in effect_modifier_cols:
    field_rows.append(
        {
            "column": col,
            "model_role": "X candidate effect modifier",
            "observed_in_real_analysis": "yes",
            "description": "Pre-treatment feature allowed to modify the treatment effect.",
            "true_cate_driver": "yes" if col in signal_modifier_cols else "no",
        }
    )
for col in control_cols:
    field_rows.append(
        {
            "column": col,
            "model_role": "W control",
            "observed_in_real_analysis": "yes",
            "description": "Pre-treatment feature used for adjustment but not used for CATE reporting.",
            "true_cate_driver": "no",
        }
    )
for col, role, description in [
    ("treatment", "treatment", "Binary intervention indicator."),
    ("outcome", "outcome", "Observed post-treatment outcome."),
    ("propensity", "oracle", "True treatment probability from the simulated assignment process."),
    ("true_cate", "oracle", "Known individual treatment effect used for tutorial grading."),
    ("baseline_outcome_mean", "oracle", "Mean untreated outcome component before random noise."),
]:
    field_rows.append(
        {
            "column": col,
            "model_role": role,
            "observed_in_real_analysis": "yes" if role in ["treatment", "outcome"] else "no",
            "description": description,
            "true_cate_driver": "not applicable",
        }
    )

field_dictionary = pd.DataFrame(field_rows)
field_dictionary.to_csv(TABLE_DIR / "03_field_dictionary.csv", index=False)
display(field_dictionary.head(30))
```
| | column | model_role | observed_in_real_analysis | description | true_cate_driver |
|---|---|---|---|---|---|
| 0 | baseline_need | X candidate effect modifier | yes | Pre-treatment feature allowed to modify the treatment effect. | yes |
| 1 | prior_engagement | X candidate effect modifier | yes | Pre-treatment feature allowed to modify the treatment effect. | yes |
| 2 | friction_score | X candidate effect modifier | yes | Pre-treatment feature allowed to modify the treatment effect. | yes |
| 3 | price_sensitivity | X candidate effect modifier | yes | Pre-treatment feature allowed to modify the treatment effect. | yes |
| 4 | content_affinity | X candidate effect modifier | yes | Pre-treatment feature allowed to modify the treatment effect. | yes |
| 5 | region_risk | X candidate effect modifier | yes | Pre-treatment feature allowed to modify the treatment effect. | yes |
| 6 | high_need_segment | X candidate effect modifier | yes | Pre-treatment feature allowed to modify the treatment effect. | yes |
| 7 | trust_score | X candidate effect modifier | yes | Pre-treatment feature allowed to modify the treatment effect. | no |
| 8 | recency_gap | X candidate effect modifier | yes | Pre-treatment feature allowed to modify the treatment effect. | no |
| 9 | noise_modifier_01 | X candidate effect modifier | yes | Pre-treatment feature allowed to modify the treatment effect. | no |
| 10 | noise_modifier_02 | X candidate effect modifier | yes | Pre-treatment feature allowed to modify the treatment effect. | no |
| 11 | noise_modifier_03 | X candidate effect modifier | yes | Pre-treatment feature allowed to modify the treatment effect. | no |
| 12 | noise_modifier_04 | X candidate effect modifier | yes | Pre-treatment feature allowed to modify the treatment effect. | no |
| 13 | noise_modifier_05 | X candidate effect modifier | yes | Pre-treatment feature allowed to modify the treatment effect. | no |
| 14 | noise_modifier_06 | X candidate effect modifier | yes | Pre-treatment feature allowed to modify the treatment effect. | no |
| 15 | noise_modifier_07 | X candidate effect modifier | yes | Pre-treatment feature allowed to modify the treatment effect. | no |
| 16 | noise_modifier_08 | X candidate effect modifier | yes | Pre-treatment feature allowed to modify the treatment effect. | no |
| 17 | noise_modifier_09 | X candidate effect modifier | yes | Pre-treatment feature allowed to modify the treatment effect. | no |
| 18 | noise_modifier_10 | X candidate effect modifier | yes | Pre-treatment feature allowed to modify the treatment effect. | no |
| 19 | noise_modifier_11 | X candidate effect modifier | yes | Pre-treatment feature allowed to modify the treatment effect. | no |
| 20 | noise_modifier_12 | X candidate effect modifier | yes | Pre-treatment feature allowed to modify the treatment effect. | no |
| 21 | account_tenure | W control | yes | Pre-treatment feature used for adjustment but not used for CATE reporting. | no |
| 22 | seasonality_index | W control | yes | Pre-treatment feature used for adjustment but not used for CATE reporting. | no |
| 23 | device_stability | W control | yes | Pre-treatment feature used for adjustment but not used for CATE reporting. | no |
| 24 | treatment | treatment | yes | Binary intervention indicator. | not applicable |
| 25 | outcome | outcome | yes | Observed post-treatment outcome. | not applicable |
| 26 | propensity | oracle | no | True treatment probability from the simulated assignment process. | not applicable |
| 27 | true_cate | oracle | no | Known individual treatment effect used for tutorial grading. | not applicable |
| 28 | baseline_outcome_mean | oracle | no | Mean untreated outcome component before random noise. | not applicable |
What this shows: some X columns are true CATE drivers and others are deliberately irrelevant. That is the point of this lesson: wide candidate effect-modifier sets are common, and sparse final-stage models can help control clutter.
True CATE Equation
Since this is a simulation, we can show the real treatment-effect equation. The true CATE is sparse: only seven candidate effect modifiers have nonzero coefficients.
In real work, this table does not exist. We would use domain logic, robustness checks, and validation strategies rather than ground truth.
```python
true_coefficient_map = {
    "cate_intercept": 0.42,
    "baseline_need": 0.30,
    "prior_engagement": 0.22,
    "friction_score": -0.24,
    "price_sensitivity": -0.18,
    "content_affinity": 0.16,
    "region_risk": -0.12,
    "high_need_segment": 0.24,
}
true_coef_table = pd.DataFrame(
    [
        {
            "term": "cate_intercept",
            "true_cate_coefficient": true_coefficient_map["cate_intercept"],
            "is_true_driver": True,
        }
    ]
    + [
        {
            "term": col,
            "true_cate_coefficient": true_coefficient_map.get(col, 0.0),
            "is_true_driver": col in true_coefficient_map,
        }
        for col in effect_modifier_cols
    ]
)
true_coef_table.to_csv(TABLE_DIR / "03_true_cate_coefficients.csv", index=False)
display(true_coef_table)
```
| | term | true_cate_coefficient | is_true_driver |
|---|---|---|---|
| 0 | cate_intercept | 0.4200 | True |
| 1 | baseline_need | 0.3000 | True |
| 2 | prior_engagement | 0.2200 | True |
| 3 | friction_score | -0.2400 | True |
| 4 | price_sensitivity | -0.1800 | True |
| 5 | content_affinity | 0.1600 | True |
| 6 | region_risk | -0.1200 | True |
| 7 | high_need_segment | 0.2400 | True |
| 8 | trust_score | 0.0000 | False |
| 9 | recency_gap | 0.0000 | False |
| 10 | noise_modifier_01 | 0.0000 | False |
| 11 | noise_modifier_02 | 0.0000 | False |
| 12 | noise_modifier_03 | 0.0000 | False |
| 13 | noise_modifier_04 | 0.0000 | False |
| 14 | noise_modifier_05 | 0.0000 | False |
| 15 | noise_modifier_06 | 0.0000 | False |
| 16 | noise_modifier_07 | 0.0000 | False |
| 17 | noise_modifier_08 | 0.0000 | False |
| 18 | noise_modifier_09 | 0.0000 | False |
| 19 | noise_modifier_10 | 0.0000 | False |
| 20 | noise_modifier_11 | 0.0000 | False |
| 21 | noise_modifier_12 | 0.0000 | False |
What this shows: LinearDML and SparseLinearDML are being asked to estimate this treatment-effect pattern from observational data. The nuisance models must remove confounding first; the final CATE model then tries to recover these coefficients.
Basic Shape And True Effects
Before fitting anything, we summarize the sample size, treatment rate, outcome level, and true effect distribution. This establishes the scale of the problem.
What this shows: the tutorial has enough rows for cross-fitting and enough candidate modifiers to make sparse modeling relevant. The true CATE standard deviation confirms that there is meaningful heterogeneity to estimate.
Raw Treated-Versus-Control Difference
A raw difference in observed outcomes is not a DML estimate. It ignores the treatment assignment process and will usually be biased when treatment is confounded.
The next cell compares the raw outcome difference with the true ATE.
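As a standalone illustration of why the raw contrast is biased, the sketch below uses a toy confounded dataset with an illustrative `need` variable that raises both the chance of treatment and the untreated outcome. The variable names and coefficients are assumptions made for this example only.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 20_000

# "need" drives both treatment assignment and the baseline outcome,
# so treated rows start from a higher outcome level before any effect.
need = rng.normal(size=n)
treatment = rng.binomial(1, 1 / (1 + np.exp(-1.5 * need)))
true_ate = 0.5                                  # constant true treatment effect
outcome = 2.0 * need + true_ate * treatment + rng.normal(size=n)

raw_diff = outcome[treatment == 1].mean() - outcome[treatment == 0].mean()
print(f"raw treated-vs-control difference: {raw_diff:.2f}")
print(f"true ATE: {true_ate:.2f}")
```

The raw difference lands far above the true effect of 0.5 because it bundles the baseline advantage of high-`need` rows into the treatment contrast.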
What this shows: treated rows have different baseline covariate profiles and different average true CATE. This is why the estimator needs both nuisance adjustment and a treatment-effect model.
Covariate Balance Table
The standardized mean difference measures how different treated and untreated groups are before adjustment. Large values are a sign that treatment assignment is strongly related to covariates.
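The standardized mean difference (SMD) itself is simple to compute: the treated-minus-control mean gap divided by the pooled standard deviation. The sketch below shows the calculation on toy data with assumed column names; a confounder picked up by treatment assignment gets a large SMD, while an independent covariate stays near zero.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(2)
n = 4_000

# Toy data: one confounder that drives treatment, one independent covariate.
df = pd.DataFrame({
    "baseline_need": rng.normal(size=n),
    "device_stability": rng.normal(size=n),
})
df["treatment"] = rng.binomial(1, 1 / (1 + np.exp(-1.2 * df["baseline_need"])))

def standardized_mean_difference(x: pd.Series, t: pd.Series) -> float:
    """SMD = (mean_treated - mean_control) / pooled standard deviation."""
    x1, x0 = x[t == 1], x[t == 0]
    pooled_sd = np.sqrt(0.5 * (x1.var(ddof=1) + x0.var(ddof=1)))
    return float((x1.mean() - x0.mean()) / pooled_sd)

smd = {
    col: standardized_mean_difference(df[col], df["treatment"])
    for col in ["baseline_need", "device_stability"]
}
print(smd)
```

A common rule of thumb treats |SMD| above roughly 0.1 as notable imbalance, though the cutoff is a convention rather than a law.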
What this shows: the top imbalance features are exactly the kind of variables that nuisance models need to account for. DML does not erase the need for careful adjustment design; it operationalizes that design with cross-fitted prediction models.
Covariate Balance Plot
The plot below focuses on the most imbalanced features so the confounding pattern is easy to scan.
What this shows: the treatment group is systematically different before treatment. The DML estimators should be judged against this context, not against a fictional randomized design.
Propensity Overlap
Overlap checks whether treated and untreated rows exist across similar treatment-probability regions. Weak overlap makes treatment-effect estimation more extrapolative.
We know the true propensity in this simulation. In real data, we would estimate it.
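A bucket table like the one described above can be built with `pd.cut` plus a groupby. The sketch below uses a toy propensity rather than the notebook's; the point is the mechanics of checking that every propensity region contains rows and that the treatment rate rises across buckets.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(3)
n = 6_000

need = rng.normal(size=n)
propensity = 1 / (1 + np.exp(-need))          # toy true propensity
treatment = rng.binomial(1, propensity)

overlap = pd.DataFrame({"propensity": propensity, "treatment": treatment})
overlap["bucket"] = pd.cut(overlap["propensity"], bins=np.linspace(0, 1, 6))

bucket_table = (
    overlap.groupby("bucket", observed=True)
    .agg(rows=("treatment", "size"), treatment_rate=("treatment", "mean"))
    .reset_index()
)
print(bucket_table)
```

Empty or near-empty buckets on either end would signal weak overlap and warn that effect estimates there lean on extrapolation.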
What this shows: most propensity buckets contain observed data, which is good for teaching. The treatment rate rises with the propensity bucket, confirming that the assignment mechanism is not random.
Propensity Overlap Plot
The histogram gives a visual check of how much treated and untreated rows overlap in propensity space.
What this shows: overlap is imperfect but usable. That is a realistic teaching setting for DML: enough support to estimate effects, but enough confounding that naive comparisons are misleading.
X And W Roles
For these estimators:
X is the feature set used in the final CATE model;
W is the control set used for adjustment in nuisance models.
The same observed feature can sometimes be defensibly placed in either set. The key question is whether you want the reported treatment effect to vary along that feature.
```python
role_table = pd.DataFrame(
    [
        {
            "feature": col,
            "econml_role": "X",
            "true_cate_driver": col in signal_modifier_cols,
            "reason": "Candidate effect modifier used by the final CATE model.",
        }
        for col in effect_modifier_cols
    ]
    + [
        {
            "feature": col,
            "econml_role": "W",
            "true_cate_driver": False,
            "reason": "Adjustment control used by nuisance models, not by the final CATE model.",
        }
        for col in control_cols
    ]
)
role_table.to_csv(TABLE_DIR / "03_x_w_role_table.csv", index=False)
display(role_table)
```
| | feature | econml_role | true_cate_driver | reason |
|---|---|---|---|---|
| 0 | baseline_need | X | True | Candidate effect modifier used by the final CATE model. |
| 1 | prior_engagement | X | True | Candidate effect modifier used by the final CATE model. |
| 2 | friction_score | X | True | Candidate effect modifier used by the final CATE model. |
| 3 | price_sensitivity | X | True | Candidate effect modifier used by the final CATE model. |
| 4 | content_affinity | X | True | Candidate effect modifier used by the final CATE model. |
| 5 | region_risk | X | True | Candidate effect modifier used by the final CATE model. |
| 6 | high_need_segment | X | True | Candidate effect modifier used by the final CATE model. |
| 7 | trust_score | X | False | Candidate effect modifier used by the final CATE model. |
| 8 | recency_gap | X | False | Candidate effect modifier used by the final CATE model. |
| 9 | noise_modifier_01 | X | False | Candidate effect modifier used by the final CATE model. |
| 10 | noise_modifier_02 | X | False | Candidate effect modifier used by the final CATE model. |
| 11 | noise_modifier_03 | X | False | Candidate effect modifier used by the final CATE model. |
| 12 | noise_modifier_04 | X | False | Candidate effect modifier used by the final CATE model. |
| 13 | noise_modifier_05 | X | False | Candidate effect modifier used by the final CATE model. |
| 14 | noise_modifier_06 | X | False | Candidate effect modifier used by the final CATE model. |
| 15 | noise_modifier_07 | X | False | Candidate effect modifier used by the final CATE model. |
| 16 | noise_modifier_08 | X | False | Candidate effect modifier used by the final CATE model. |
| 17 | noise_modifier_09 | X | False | Candidate effect modifier used by the final CATE model. |
| 18 | noise_modifier_10 | X | False | Candidate effect modifier used by the final CATE model. |
| 19 | noise_modifier_11 | X | False | Candidate effect modifier used by the final CATE model. |
| 20 | noise_modifier_12 | X | False | Candidate effect modifier used by the final CATE model. |
| 21 | account_tenure | W | False | Adjustment control used by nuisance models, not by the final CATE model. |
| 22 | seasonality_index | W | False | Adjustment control used by nuisance models, not by the final CATE model. |
| 23 | device_stability | W | False | Adjustment control used by nuisance models, not by the final CATE model. |
What this shows: SparseLinearDML does not decide which variables are pre-treatment or causally admissible. It only regularizes among the X features we provide. The analyst still has to define a valid feature set.
Train And Test Split
The train set is used to fit nuisance models and CATE estimators. The test set is reserved for truth-known evaluation of CATE accuracy and ranking behavior.
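A split that preserves treatment balance can be done with scikit-learn's `train_test_split` and its `stratify` argument. The sketch below uses toy data with assumed column names; the key detail is stratifying on the treatment indicator so the train and test treatment rates stay close.

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(4)
n = 3_000
df = pd.DataFrame({
    "baseline_need": rng.normal(size=n),
    "treatment": rng.binomial(1, 0.35, size=n),
})
df["outcome"] = df["baseline_need"] + 0.5 * df["treatment"] + rng.normal(size=n)

# Stratify on treatment so the train/test treatment rates match closely.
train_df, test_df = train_test_split(
    df, test_size=0.3, stratify=df["treatment"], random_state=0
)
print(round(train_df["treatment"].mean(), 3), round(test_df["treatment"].mean(), 3))
```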
What this shows: the split preserves treatment balance and keeps the true ATE similar across train and test. That makes estimator comparisons easier to read.
Modeling Matrices
This cell prepares the arrays and data frames used by EconML. The most important modeling constraint is that oracle fields never enter X, W, or nuisance features.
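A minimal version of that preparation step is sketched below with assumed column lists. The explicit disjointness check on oracle columns is the part worth copying: it turns "oracle fields never enter X or W" from a convention into an assertion.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(5)
n = 100
df = pd.DataFrame({
    "baseline_need": rng.normal(size=n),
    "noise_modifier_01": rng.normal(size=n),
    "account_tenure": rng.normal(size=n),
    "treatment": rng.binomial(1, 0.5, size=n),
    "outcome": rng.normal(size=n),
    "true_cate": rng.normal(size=n),  # oracle: must never enter X or W
})

effect_modifier_cols = ["baseline_need", "noise_modifier_01"]
control_cols = ["account_tenure"]
oracle_cols = {"true_cate"}

# Guard: no oracle column may leak into the modeling matrices.
assert oracle_cols.isdisjoint(effect_modifier_cols + control_cols)

Y = df["outcome"].to_numpy()
T = df["treatment"].to_numpy()
X = df[effect_modifier_cols].to_numpy()
W = df[control_cols].to_numpy()
print(Y.shape, T.shape, X.shape, W.shape)
```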
What this shows: X is intentionally wider than the true CATE equation. That is what lets us see the difference between dense and sparse final-stage behavior.
Separate Nuisance Diagnostics
EconML fits nuisance models internally, but separate out-of-fold diagnostics are helpful for teaching. They show whether the observed covariates can predict outcome and treatment assignment before we call the causal estimator.
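Out-of-fold nuisance diagnostics can be produced with `cross_val_predict`, which mimics the cross-fitting DML uses internally. The sketch below uses toy confounded data with assumed variable names: AUC well above 0.5 for the treatment model confirms confounding is learnable, and an outcome MSE below the raw outcome variance confirms the covariates explain baseline response.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier, RandomForestRegressor
from sklearn.metrics import mean_squared_error, roc_auc_score
from sklearn.model_selection import cross_val_predict

rng = np.random.default_rng(6)
n = 2_000
need = rng.normal(size=n)
tenure = rng.normal(size=n)
covariates = np.column_stack([need, tenure])

treatment = rng.binomial(1, 1 / (1 + np.exp(-need)))
outcome = need + 0.5 * tenure + 0.4 * treatment + rng.normal(size=n)

# Out-of-fold predictions: each row is scored by a model that never saw it.
t_hat = cross_val_predict(
    RandomForestClassifier(n_estimators=100, random_state=0),
    covariates, treatment, cv=5, method="predict_proba",
)[:, 1]
y_hat = cross_val_predict(
    RandomForestRegressor(n_estimators=100, random_state=0),
    covariates, outcome, cv=5,
)

auc = roc_auc_score(treatment, t_hat)
mse = mean_squared_error(outcome, y_hat)
print(f"treatment AUC: {auc:.3f}  outcome MSE: {mse:.3f}")
```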
What this shows: the treatment model can predict assignment better than chance, confirming confounding. The outcome model captures baseline response structure, which helps DML remove predictable outcome variation before estimating treatment effects.
Fit LinearDML
Now we fit LinearDML with random-forest nuisance models. The random forests are used for nuisance adjustment, while the final CATE model remains linear in X.
This distinction is important: flexible nuisance models do not make the final treatment-effect model nonlinear. They only help residualize treatment and outcome.
What this shows: LinearDML returns both an average effect over the test population and unit-level CATE estimates. The CATE correlation checks whether the model is recovering effect ranking, not just the mean.
Fit SparseLinearDML
SparseLinearDML uses the same X and W roles, but its final stage is regularized. This is useful when we include many candidate effect modifiers and expect only some of them to matter.
The sparse model is not a substitute for causal design. It can shrink noisy CATE drivers, but it cannot fix post-treatment controls, omitted confounders, or poor overlap.
What this shows: SparseLinearDML produces the same kind of object as LinearDML: unit-level CATE estimates. The difference is in how the final-stage treatment-effect coefficients are regularized.
Compare Estimator-Level Metrics
The next table compares raw observational difference, LinearDML, and SparseLinearDML against the known truth on the test population.
What this shows: DML estimators can be judged on both average-effect error and CATE recovery. The raw difference cannot be evaluated as a CATE model because it returns only one overall contrast.
Extract Final-Stage Coefficients
For both estimators, the final-stage coefficients describe treatment-effect heterogeneity. They are not baseline outcome coefficients and they are not propensity-model coefficients.
The next cell combines the known true coefficients with the estimated coefficients from both estimators.
What this shows: coefficient tables are where linear final-stage estimators shine. They turn CATE estimation into a readable statement about which features increase or decrease treatment benefit.
Coefficient Ranking
A wide coefficient table can still be hard to scan. The next cell ranks candidate effect modifiers by absolute sparse coefficient and shows whether each one is a true CATE driver in the simulation.
What this shows: a sparse model is most useful when the top coefficients are mostly real effect drivers and the irrelevant modifiers are pushed toward small values. The exact threshold for practical selection is an analyst choice, not a universal law.
Coefficient Concentration Summary
Instead of reading every coefficient, we can summarize how much absolute coefficient mass falls on true drivers versus noise features.
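One way to express that summary is the share of total absolute coefficient mass assigned to true drivers. The sketch below uses small hypothetical coefficient vectors (invented for illustration; in the notebook these would come from each estimator's `coef_`) to show the calculation.

```python
import numpy as np
import pandas as pd

# Hypothetical coefficient vectors; in the notebook these come from est.coef_.
terms = ["baseline_need", "friction_score", "noise_modifier_01", "noise_modifier_02"]
is_true_driver = np.array([True, True, False, False])
dense = np.array([0.27, -0.20, 0.06, -0.04])
sparse = np.array([0.29, -0.22, 0.00, 0.00])

def driver_mass_share(coefs: np.ndarray, driver_mask: np.ndarray) -> float:
    """Share of total |coefficient| mass assigned to true CATE drivers."""
    return float(np.abs(coefs[driver_mask]).sum() / np.abs(coefs).sum())

summary = pd.DataFrame({
    "estimator": ["dense", "sparse"],
    "true_driver_mass_share": [
        driver_mass_share(dense, is_true_driver),
        driver_mass_share(sparse, is_true_driver),
    ],
})
print(summary)
```

A sparse final stage that zeroes noise terms pushes this share toward 1.0, which is the concentration effect the section describes.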
What this shows: sparse estimation is not only about individual coefficients. It is also about concentrating attention on a smaller part of the candidate modifier set.
Coefficient Plot
The plot below compares true coefficients, dense LinearDML coefficients, and sparse SparseLinearDML coefficients for the most important terms. This makes sign recovery and shrinkage easier to see.
What this shows: coefficient plots help communicate direction and relative size. They are most credible when paired with diagnostics about overlap, nuisance modeling, and robustness.
CATE Recovery Scatter
Because this is simulated data, we can compare estimated CATE to true CATE on the test set. The dashed line is perfect recovery.
What this shows: coefficient recovery and CATE recovery are related but not identical. A model can have imperfect coefficients yet still rank treatment effects reasonably well, and ranking quality is often what matters for targeting.
Segment-Level CATE Recovery
Segment summaries translate unit-level CATE estimates into a form that is easier to communicate. Here we summarize by high-need segment and region risk.
What this shows: segment summaries can reveal whether an estimator is useful for broad population comparisons. They also prevent us from relying only on an overall average effect.
Segment Recovery Plot
This plot compares true and estimated segment-level CATE values. Each point is a segment average.
What this shows: the sparse and dense estimators should tell a similar segment story if the main heterogeneity signals are stable. Large disagreements would be a reason to inspect features, overlap, and nuisance models more carefully.
CATE Decile Calibration
For treatment targeting, ranking is often as important as exact effect magnitude. The next table groups test rows by predicted CATE decile and compares estimated versus true average CATE.
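The decile construction itself is a `pd.qcut` plus a groupby. The sketch below stands in a noisy-but-informative synthetic CATE estimate for real model output (both arrays are invented for illustration) and checks whether true effects rise across predicted deciles.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(9)
n = 5_000
true_cate = rng.normal(loc=0.4, scale=0.3, size=n)
# A noisy but informative estimate stands in for model output.
estimated_cate = true_cate + rng.normal(scale=0.2, size=n)

calib = pd.DataFrame({"true_cate": true_cate, "estimated_cate": estimated_cate})
calib["decile"] = pd.qcut(calib["estimated_cate"], q=10, labels=False) + 1

decile_table = (
    calib.groupby("decile")
    .agg(mean_estimated=("estimated_cate", "mean"), mean_true=("true_cate", "mean"))
    .reset_index()
)
print(decile_table)
```

Because the estimate here correlates strongly with the truth, mean true CATE climbs steadily from decile 1 to decile 10; a flat profile would indicate the ranking carries no targeting value.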
What this shows: a useful CATE model should produce higher true effects in higher predicted-effect deciles. In real data, we would need policy evaluation or experimental follow-up because true CATE would not be available.
CATE Decile Calibration Plot
The lines below show whether predicted CATE ranking is aligned with known true CATE ranking.
What this shows: the decile view turns CATE estimation into a ranking check. Smoothly increasing true CATE across predicted deciles means the model is learning useful prioritization structure.
Simple Targeting Comparison
A common CATE use case is targeting the top fraction of units by estimated effect. The next cell compares four rules on the test set:
random 20% targeting;
top 20% by LinearDML estimated CATE;
top 20% by SparseLinearDML estimated CATE;
oracle top 20% by true CATE, which is only available in simulation.
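The comparison logic can be sketched on synthetic values (both arrays below are invented stand-ins for true and model-estimated CATE; this is not the notebook's actual cell). Each rule selects 20% of rows and is scored by the average true effect among its selections.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(10)
n = 5_000
true_cate = rng.normal(loc=0.4, scale=0.3, size=n)
estimated_cate = true_cate + rng.normal(scale=0.25, size=n)  # noisy model output

top_fraction = 0.20
k = int(n * top_fraction)

random_pick = rng.choice(n, size=k, replace=False)
model_pick = np.argsort(estimated_cate)[-k:]   # top 20% by estimated CATE
oracle_pick = np.argsort(true_cate)[-k:]       # only available in simulation

targeting_summary = pd.DataFrame({
    "targeting_rule": ["random 20%", "model top 20%", "oracle top 20%"],
    "average_true_cate_in_selected_group": [
        true_cate[random_pick].mean(),
        true_cate[model_pick].mean(),
        true_cate[oracle_pick].mean(),
    ],
})
print(targeting_summary)
```

The expected ordering is oracle above model above random: the model captures much of the achievable lift, and the oracle row bounds what any ranking could do.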
What this shows: CATE models are often evaluated by how much better their targeted group is than random selection. The oracle row is an upper benchmark, not an achievable real-world policy.
Targeting Plot
The bar plot makes the targeting comparison easy to read: higher average true CATE in the selected group means the model is better at finding high-benefit units.
```python
fig, ax = plt.subplots(figsize=(11, 5))
sns.barplot(
    data=targeting_summary,
    x="average_true_cate_in_selected_group",
    y="targeting_rule",
    color="#34d399",
    ax=ax,
)
ax.set_title("True Benefit Among Targeted Test Rows")
ax.set_xlabel("Average True CATE In Selected Group")
ax.set_ylabel("Targeting Rule")
plt.tight_layout()
fig.savefig(FIGURE_DIR / "03_targeting_summary.png", dpi=160, bbox_inches="tight")
plt.show()
```
What this shows: targeting is where CATE estimation becomes operational. Even when exact CATE values are noisy, a model can still be valuable if it ranks high-benefit units well.
Omitted Modifier Stress Test
A readable model can be too simple if it omits important effect modifiers from X. To make that risk concrete, the next cell fits a restricted LinearDML that excludes two true CATE drivers from the final stage.
The omitted variables can still be included in W for adjustment, but the final CATE model is no longer allowed to vary along them.
What this shows: X controls what heterogeneity the final CATE model can express. Moving a true effect modifier out of X may still allow adjustment, but it prevents the reported effect from varying along that feature.
Practical Guidance Table
This table turns the notebook into a quick modeling guide for choosing between dense and sparse linear DML.
```python
practical_guidance = pd.DataFrame(
    [
        {
            "situation": "Small number of carefully chosen effect modifiers",
            "first estimator to try": "LinearDML",
            "reason": "The final CATE coefficients are easy to read and unlikely to be cluttered.",
        },
        {
            "situation": "Many candidate modifiers, only some expected to matter",
            "first estimator to try": "SparseLinearDML",
            "reason": "Regularization can shrink weaker CATE drivers and focus the coefficient table.",
        },
        {
            "situation": "Strong nonlinear treatment-effect heterogeneity is expected",
            "first estimator to try": "CausalForestDML or another flexible estimator",
            "reason": "A linear final stage may be too restrictive even if nuisance models are flexible.",
        },
        {
            "situation": "Primary goal is a transparent segment narrative",
            "first estimator to try": "LinearDML plus segment summaries",
            "reason": "Dense coefficients and segment-level checks are usually easier to explain.",
        },
        {
            "situation": "Primary goal is treatment targeting",
            "first estimator to try": "Compare LinearDML, SparseLinearDML, and flexible CATE models",
            "reason": "Ranking quality matters; a more flexible model may rank better even if coefficients are less readable.",
        },
    ]
)
practical_guidance.to_csv(TABLE_DIR / "03_practical_guidance.csv", index=False)
display(practical_guidance)
```
| | situation | first estimator to try | reason |
|---|---|---|---|
| 0 | Small number of carefully chosen effect modifiers | LinearDML | The final CATE coefficients are easy to read and unlikely to be cluttered. |
| 1 | Many candidate modifiers, only some expected to matter | SparseLinearDML | Regularization can shrink weaker CATE drivers and focus the coefficient table. |
| 2 | Strong nonlinear treatment-effect heterogeneity is expected | CausalForestDML or another flexible estimator | A linear final stage may be too restrictive even if nuisance models are flexible. |
| 3 | Primary goal is a transparent segment narrative | LinearDML plus segment summaries | Dense coefficients and segment-level checks are usually easier to explain. |
| 4 | Primary goal is treatment targeting | Compare LinearDML, SparseLinearDML, and flexible CATE models | Ranking quality matters; a more flexible model may rank better even if coefficients are less readable. |
What this shows: there is no universally best estimator. LinearDML and SparseLinearDML are excellent first choices when a linear CATE story is plausible, but the modeling choice should follow the causal question and expected heterogeneity pattern.
Linear DML Checklist
Before presenting a LinearDML or SparseLinearDML result, it is worth checking the items below.
```python
linear_dml_checklist = pd.DataFrame(
    [
        {"check": "Treatment and outcome are clearly defined",
         "why_it_matters": "The estimator needs a precise intervention and post-treatment response."},
        {"check": "All X and W features are pre-treatment",
         "why_it_matters": "Post-treatment controls can distort the causal estimand."},
        {"check": "X contains the heterogeneity dimensions you want to report",
         "why_it_matters": "The final CATE model can vary only along X."},
        {"check": "W contains important adjustment controls",
         "why_it_matters": "Nuisance models need enough information to reduce confounding."},
        {"check": "Overlap is adequate",
         "why_it_matters": "Unsupported regions force extrapolation."},
        {"check": "Nuisance models have reasonable out-of-fold diagnostics",
         "why_it_matters": "Poor nuisance models leave confounding in the residuals."},
        {"check": "Coefficient signs match domain expectations where possible",
         "why_it_matters": "Unexpected signs can flag leakage, coding errors, or misspecification."},
        {"check": "Dense and sparse results are compared when X is wide",
         "why_it_matters": "Sparse shrinkage helps reveal whether the coefficient story is stable."},
        {"check": "CATE estimates are evaluated as rankings, not only point values",
         "why_it_matters": "Treatment targeting depends heavily on ranking quality."},
    ]
)
linear_dml_checklist.to_csv(TABLE_DIR / "03_linear_dml_checklist.csv", index=False)
display(linear_dml_checklist)
```
| | check | why_it_matters |
|---|---|---|
| 0 | Treatment and outcome are clearly defined | The estimator needs a precise intervention and post-treatment response. |
| 1 | All X and W features are pre-treatment | Post-treatment controls can distort the causal estimand. |
| 2 | X contains the heterogeneity dimensions you want to report | The final CATE model can vary only along X. |
| 3 | W contains important adjustment controls | Nuisance models need enough information to reduce confounding. |
| 4 | Overlap is adequate | Unsupported regions force extrapolation. |
| 5 | Nuisance models have reasonable out-of-fold diagnostics | Poor nuisance models leave confounding in the residuals. |
| 6 | Coefficient signs match domain expectations where possible | Unexpected signs can flag leakage, coding errors, or misspecification. |
| 7 | Dense and sparse results are compared when X is wide | Sparse shrinkage helps reveal whether the coefficient story is stable. |
| 8 | CATE estimates are evaluated as rankings, not only point values | Treatment targeting depends heavily on ranking quality. |
What this shows: estimator output is only part of the work. A credible linear DML analysis also needs design checks, feature-role clarity, and sanity checks on the resulting CATE story.
Summary
This notebook showed how to use two readable EconML estimators for heterogeneous treatment effects.
The main takeaways are:
LinearDML uses flexible nuisance models but keeps a linear final CATE model;
SparseLinearDML is useful when the candidate X set is wide and only some modifiers are expected to matter;
final-stage coefficients describe treatment-effect heterogeneity, not baseline outcome prediction;
coefficient tables should be paired with overlap checks, nuisance diagnostics, and CATE recovery checks;
sparse shrinkage can reduce coefficient clutter, but it can also shrink subtle real effects;
CATE models should be evaluated both for average-effect accuracy and ranking usefulness;
choosing X and W is a causal design decision, not just a modeling convenience.
The next tutorial can move from linear final-stage models to CausalForestDML, where the CATE surface is allowed to be nonlinear and feature importance replaces simple coefficient reading.