EconML Tutorial 05: DRLearner And Doubly Robust Estimation
This notebook introduces DRLearner, EconML’s learner built around doubly robust pseudo-outcomes.
The previous notebooks focused on DML residualization and causal forests. DRLearner approaches the same CATE goal from a different angle. It estimates nuisance functions for:
treatment assignment, usually called the propensity model;
potential outcomes under treatment and control, usually called outcome regression models.
Then it combines those nuisance estimates into a doubly robust pseudo-outcome. A final model learns CATE by predicting that pseudo-outcome from effect modifiers.
The core causal question stays the same:
For each unit, how much would the outcome change if treatment were applied instead of not applied?
The new lesson is how the doubly robust construction uses both propensity and outcome information to make the estimate less fragile than relying on either one alone.
Learning Goals
By the end of this notebook, you should be able to:
explain the doubly robust idea in practical language;
distinguish propensity nuisance models from outcome-regression nuisance models;
write and inspect a binary-treatment doubly robust pseudo-outcome;
understand why propensity clipping is used;
fit manual DR-style CATE models for teaching;
fit EconML’s DRLearner with linear and flexible final models;
compare DRLearner with a pure outcome-model T-learner baseline;
evaluate CATE recovery, segment summaries, decile calibration, and targeting behavior;
describe when DRLearner is a good estimator to try.
The Doubly Robust Idea
A binary-treatment doubly robust pseudo-outcome combines three nuisance estimates:
e(X, W): estimated probability of treatment;
m1(X, W): estimated outcome if treated;
m0(X, W): estimated outcome if untreated.
For observed outcome Y and treatment T, the pseudo-outcome is:

pseudo_outcome = m1(X, W) - m0(X, W) + T * (Y - m1(X, W)) / e(X, W) - (1 - T) * (Y - m0(X, W)) / (1 - e(X, W))
The first part, m1 - m0, is the outcome-model estimate of the treatment effect. The two correction terms use residuals from the observed treatment arm, scaled by inverse propensity. If the outcome models are good, the corrections are small on average. If the propensity model is good, the corrections can repair certain outcome-model errors.
That is the practical meaning of doubly robust: the estimator has two routes to credibility. It is not invincible; it still needs causal identification, overlap, pre-treatment covariates, and reasonable nuisance models.
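To make the construction concrete, here is a minimal numpy sketch of the pseudo-outcome, including the propensity clipping named in the learning goals. The helper name and the clip level are illustrative choices; the same helper is reused in the manual cross-fitting section later.

import numpy as np

def dr_pseudo_outcome(y, t, e, m0, m1, clip=0.01):
    """Binary-treatment doubly robust pseudo-outcome with propensity clipping."""
    # Clipping keeps the inverse-propensity weights 1/e and 1/(1 - e) bounded.
    e = np.clip(e, clip, 1.0 - clip)
    return m1 - m0 + t * (y - m1) / e - (1 - t) * (y - m0) / (1.0 - e)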
Tutorial Flow
The notebook proceeds in four layers:
Build a confounded binary-treatment dataset with known CATE.
Demonstrate doubly robust ATE logic using oracle and intentionally weak nuisance functions.
Manually construct cross-fitted DR pseudo-outcomes and fit a final CATE model.
Fit EconML DRLearner and compare it with baselines.
The manual layer is included because it makes the EconML estimator much easier to understand. The EconML layer is what you would normally use in practice.
Setup
This cell imports the packages used in the lesson, creates output folders, fixes a random seed, and checks that EconML is available. The warning filters keep the notebook readable while allowing real execution failures to appear.
from pathlib import Path
import os
import warnings
import importlib.metadata as importlib_metadata

# Keep Matplotlib cache files in a writable location during notebook execution.
os.environ.setdefault("MPLCONFIGDIR", "/tmp/matplotlib-ranking-sys")

warnings.filterwarnings("default")
warnings.filterwarnings("ignore", category=DeprecationWarning)
warnings.filterwarnings("ignore", category=PendingDeprecationWarning)
warnings.filterwarnings("ignore", category=FutureWarning)
warnings.filterwarnings("ignore", message=".*IProgress not found.*")
warnings.filterwarnings("ignore", message=".*X does not have valid feature names.*")
warnings.filterwarnings("ignore", message=".*The final model has a nonzero intercept.*")
warnings.filterwarnings("ignore", message=".*Co-variance matrix is underdetermined.*")
warnings.filterwarnings("ignore", module="sklearn.linear_model._logistic")

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
from IPython.display import display
from sklearn.base import clone
from sklearn.ensemble import (
    RandomForestClassifier,
    RandomForestRegressor,
    GradientBoostingClassifier,
    GradientBoostingRegressor,
)
from sklearn.linear_model import LinearRegression, LogisticRegression
from sklearn.metrics import brier_score_loss, log_loss, mean_squared_error, roc_auc_score
from sklearn.model_selection import KFold, StratifiedKFold, cross_val_predict, train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

try:
    import econml
    from econml.dr import DRLearner
    from econml.dml import LinearDML

    ECONML_AVAILABLE = True
    ECONML_VERSION = getattr(econml, "__version__", "unknown")
except Exception as exc:
    ECONML_AVAILABLE = False
    ECONML_VERSION = f"import failed: {type(exc).__name__}: {exc}"

RANDOM_SEED = 2026
rng = np.random.default_rng(RANDOM_SEED)

OUTPUT_DIR = Path("outputs")
FIGURE_DIR = OUTPUT_DIR / "figures"
TABLE_DIR = OUTPUT_DIR / "tables"
FIGURE_DIR.mkdir(parents=True, exist_ok=True)
TABLE_DIR.mkdir(parents=True, exist_ok=True)

sns.set_theme(style="whitegrid", context="notebook")
pd.set_option("display.max_columns", 140)
pd.set_option("display.float_format", lambda value: f"{value:,.4f}")

print(f"EconML available: {ECONML_AVAILABLE}")
print(f"EconML version: {ECONML_VERSION}")
print(f"Figures will be saved to: {FIGURE_DIR.resolve()}")
print(f"Tables will be saved to: {TABLE_DIR.resolve()}")
EconML available: True
EconML version: 0.16.0
Figures will be saved to: /home/apex/Documents/ranking_sys/notebooks/tutorials/econml/outputs/figures
Tables will be saved to: /home/apex/Documents/ranking_sys/notebooks/tutorials/econml/outputs/tables
What this shows: the environment is ready if EconML imports successfully. The output folders are shared across this tutorial series, and this notebook writes files with the 05_ prefix.
Where DRLearner Fits
The next table positions DRLearner against the estimators already covered. This helps explain why we need another learner after DML and causal forests.
estimator_map = pd.DataFrame(
    [
        {
            "estimator": "LinearDML",
            "main construction": "Residualize outcome and treatment, then fit a final CATE model",
            "best use": "Readable linear CATE models after DML adjustment",
            "main diagnostic focus": "Residualized signal, coefficients, CATE recovery",
        },
        {
            "estimator": "CausalForestDML",
            "main construction": "DML adjustment with a forest final CATE model",
            "best use": "Nonlinear heterogeneity and treatment targeting",
            "main diagnostic focus": "Feature importance, intervals, slices, support",
        },
        {
            "estimator": "DRLearner",
            "main construction": "Doubly robust pseudo-outcome from outcome and propensity models",
            "best use": "Binary treatment CATE with explicit outcome and propensity nuisance modeling",
            "main diagnostic focus": "Propensity overlap, outcome models by arm, pseudo-outcome noise, final CATE model",
        },
    ]
)
estimator_map.to_csv(TABLE_DIR / "05_estimator_map.csv", index=False)
display(estimator_map)
| estimator | main construction | best use | main diagnostic focus |
|---|---|---|---|
| LinearDML | Residualize outcome and treatment, then fit a final CATE model | Readable linear CATE models after DML adjustment | Residualized signal, coefficients, CATE recovery |
| CausalForestDML | DML adjustment with a forest final CATE model | Nonlinear heterogeneity and treatment targeting | Feature importance, intervals, slices, support |
| DRLearner | Doubly robust pseudo-outcome from outcome and propensity models | Binary treatment CATE with explicit outcome and propensity nuisance modeling | Propensity overlap, outcome models by arm, pseudo-outcome noise, final CATE model |
What this shows: DRLearner is not just another black-box CATE model. Its key object is a pseudo-outcome that combines outcome-regression and inverse-propensity correction.
Synthetic Teaching Data
The next cell creates a binary-treatment observational dataset. The true treatment effect is nonlinear but not wildly complex, so a flexible final model can learn it while a linear final model still provides a useful baseline.
The simulation includes:
observed confounding through treatment assignment;
separate potential-outcome means under treatment and control;
true CATE for teaching evaluation;
oracle propensity and potential-outcome means for a short double-robustness demonstration.
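The sketch below shows one way such a dataset could be generated. The column names match the field dictionary that follows; the sample size, coefficients, and functional forms are all illustrative assumptions rather than the canonical simulation.

# Illustrative simulation sketch: coefficients and functional forms are assumptions.
n = 12_000
df = pd.DataFrame(
    {
        "baseline_need": rng.normal(0.0, 1.0, n),
        "prior_engagement": rng.normal(0.0, 1.0, n),
        "friction_score": rng.normal(0.0, 1.0, n),
        "content_affinity": rng.normal(0.0, 1.0, n),
        "price_sensitivity": rng.normal(0.0, 1.0, n),
        "trust_score": rng.normal(0.0, 1.0, n),
        "recency_gap": rng.normal(0.0, 1.0, n),
        "region_risk": rng.integers(0, 3, n).astype(float),
        "high_need_segment": rng.binomial(1, 0.3, n).astype(float),
        "account_tenure": rng.normal(0.0, 1.0, n),
        "seasonality_index": rng.normal(0.0, 1.0, n),
        "device_stability": rng.normal(0.0, 1.0, n),
        "traffic_intensity": rng.normal(0.0, 1.0, n),
    }
)

# Observed confounding: assignment probability depends on pre-treatment covariates.
assignment_logit = (
    0.8 * df["baseline_need"]
    + 0.5 * df["prior_engagement"]
    - 0.4 * df["friction_score"]
    + 0.3 * df["account_tenure"]
)
df["propensity"] = 1.0 / (1.0 + np.exp(-assignment_logit))
df["treatment"] = rng.binomial(1, df["propensity"])

# Nonlinear but not wildly complex true CATE driven by a subset of modifiers.
df["true_cate"] = (
    1.0
    + 0.9 * np.tanh(df["baseline_need"])
    + 0.6 * df["content_affinity"] * (df["friction_score"] < 0)
    + 0.4 * df["prior_engagement"]
    - 0.5 * df["price_sensitivity"]
    - 0.3 * df["region_risk"]
    + 0.8 * df["high_need_segment"]
)

# Potential-outcome means and the observed outcome with noise.
df["mu0"] = (
    2.0 * df["baseline_need"]
    + 1.0 * df["prior_engagement"]
    + 0.5 * df["seasonality_index"]
    + 0.5 * df["device_stability"]
)
df["mu1"] = df["mu0"] + df["true_cate"]
df["outcome"] = np.where(df["treatment"] == 1, df["mu1"], df["mu0"]) + rng.normal(0.0, 1.0, n)
display(df.head())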
What this shows: treatment and outcome are the fields we would observe in real data. propensity, mu0, mu1, and true_cate are oracle fields used only for teaching and evaluation.
Field Dictionary
This table names the model role of each column. The most important point is that oracle fields are never allowed into fitted models.
effect_modifier_cols = [
    "baseline_need",
    "prior_engagement",
    "friction_score",
    "content_affinity",
    "price_sensitivity",
    "trust_score",
    "recency_gap",
    "region_risk",
    "high_need_segment",
]
control_cols = ["account_tenure", "seasonality_index", "device_stability", "traffic_intensity"]
all_observed_covariates = effect_modifier_cols + control_cols
true_driver_cols = [
    "baseline_need",
    "prior_engagement",
    "friction_score",
    "content_affinity",
    "price_sensitivity",
    "region_risk",
    "high_need_segment",
]

field_rows = []
for col in effect_modifier_cols:
    field_rows.append(
        {
            "column": col,
            "role": "X effect modifier",
            "observed_in_real_analysis": "yes",
            "description": "Pre-treatment feature used by final CATE models.",
            "true_cate_driver": "yes" if col in true_driver_cols else "no",
        }
    )
for col in control_cols:
    field_rows.append(
        {
            "column": col,
            "role": "W control",
            "observed_in_real_analysis": "yes",
            "description": "Pre-treatment adjustment feature used by nuisance models.",
            "true_cate_driver": "no",
        }
    )
for col, role, description in [
    ("treatment", "treatment", "Binary intervention indicator."),
    ("outcome", "outcome", "Observed post-treatment outcome."),
    ("propensity", "oracle", "True treatment probability from the simulated assignment process."),
    ("mu0", "oracle", "True conditional mean outcome under control."),
    ("mu1", "oracle", "True conditional mean outcome under treatment."),
    ("true_cate", "oracle", "Known individual treatment effect used only for tutorial grading."),
]:
    field_rows.append(
        {
            "column": col,
            "role": role,
            "observed_in_real_analysis": "yes" if role in ["treatment", "outcome"] else "no",
            "description": description,
            "true_cate_driver": "not applicable",
        }
    )

field_dictionary = pd.DataFrame(field_rows)
field_dictionary.to_csv(TABLE_DIR / "05_field_dictionary.csv", index=False)
display(field_dictionary)
| column | role | observed_in_real_analysis | description | true_cate_driver |
|---|---|---|---|---|
| baseline_need | X effect modifier | yes | Pre-treatment feature used by final CATE models. | yes |
| prior_engagement | X effect modifier | yes | Pre-treatment feature used by final CATE models. | yes |
| friction_score | X effect modifier | yes | Pre-treatment feature used by final CATE models. | yes |
| content_affinity | X effect modifier | yes | Pre-treatment feature used by final CATE models. | yes |
| price_sensitivity | X effect modifier | yes | Pre-treatment feature used by final CATE models. | yes |
| trust_score | X effect modifier | yes | Pre-treatment feature used by final CATE models. | no |
| recency_gap | X effect modifier | yes | Pre-treatment feature used by final CATE models. | no |
| region_risk | X effect modifier | yes | Pre-treatment feature used by final CATE models. | yes |
| high_need_segment | X effect modifier | yes | Pre-treatment feature used by final CATE models. | yes |
| account_tenure | W control | yes | Pre-treatment adjustment feature used by nuisance models. | no |
| seasonality_index | W control | yes | Pre-treatment adjustment feature used by nuisance models. | no |
| device_stability | W control | yes | Pre-treatment adjustment feature used by nuisance models. | no |
| traffic_intensity | W control | yes | Pre-treatment adjustment feature used by nuisance models. | no |
| treatment | treatment | yes | Binary intervention indicator. | not applicable |
| outcome | outcome | yes | Observed post-treatment outcome. | not applicable |
| propensity | oracle | no | True treatment probability from the simulated assignment process. | not applicable |
| mu0 | oracle | no | True conditional mean outcome under control. | not applicable |
| mu1 | oracle | no | True conditional mean outcome under treatment. | not applicable |
| true_cate | oracle | no | Known individual treatment effect used only for tutorial grading. | not applicable |
What this shows: DRLearner needs both X and W roles. The final CATE model uses X, while nuisance models use both X and W to estimate outcomes and propensity.
Basic Shape And True Effect Scale
This summary checks the sample size, treatment rate, and true CATE distribution before any estimation.
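A summary cell along these lines, assuming the simulated frame df from the sketch above:

shape_summary = pd.DataFrame(
    {
        "n_rows": [len(df)],
        "treatment_rate": [df["treatment"].mean()],
        "true_ate": [df["true_cate"].mean()],
        "true_cate_std": [df["true_cate"].std()],
        "true_cate_p10": [df["true_cate"].quantile(0.10)],
        "true_cate_p90": [df["true_cate"].quantile(0.90)],
    }
)
display(shape_summary)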
What this shows: the treatment rate is usable and the CATE distribution has real spread. That makes the dataset appropriate for learning heterogeneous effects rather than only one average effect.
True CATE Distribution
This plot shows the effect heterogeneity that the learners are trying to recover. In real data, this plot would not be available.
What this shows: average treatment effect is only one slice of the problem. The distribution shows why a CATE learner can be useful for ranking and targeting.
Raw Treated-Versus-Control Difference
A raw outcome difference ignores confounding. This cell shows the raw contrast and compares it with the true ATE.
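A sketch of that comparison, again assuming df:

raw_diff = (
    df.loc[df["treatment"] == 1, "outcome"].mean()
    - df.loc[df["treatment"] == 0, "outcome"].mean()
)
true_ate = df["true_cate"].mean()
print(f"Raw treated-minus-control difference: {raw_diff:,.4f}")
print(f"True ATE:                             {true_ate:,.4f}")
print(f"Confounding gap:                      {raw_diff - true_ate:,.4f}")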
What this shows: the raw comparison is contaminated by who gets treated. DRLearner addresses this by using propensity and outcome nuisance models rather than relying on raw group means.
Covariate Balance Table
Standardized mean differences show which pre-treatment covariates differ between treated and untreated rows.
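A minimal standardized-mean-difference computation, using the covariate lists from the field-dictionary cell above:

def standardized_mean_difference(frame, col):
    treated = frame.loc[frame["treatment"] == 1, col]
    control = frame.loc[frame["treatment"] == 0, col]
    pooled_sd = np.sqrt(0.5 * (treated.var() + control.var()))
    return (treated.mean() - control.mean()) / pooled_sd

balance_table = pd.DataFrame(
    {
        "covariate": all_observed_covariates,
        "smd": [standardized_mean_difference(df, col) for col in all_observed_covariates],
    }
).sort_values("smd", key=np.abs, ascending=False)
display(balance_table)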
What this shows: treatment assignment is clearly related to observed covariates. The doubly robust workflow needs these covariates in the nuisance models to reduce confounding.
Covariate Balance Plot
The plot highlights the most imbalanced features. These are the variables most visibly tied to treatment assignment.
What this shows: imbalance is not a failure of the tutorial data; it is the reason we need causal adjustment. The model should not be trusted without this kind of design check.
Propensity Overlap
Doubly robust estimators use inverse propensity terms, so overlap matters a lot. If propensities are close to zero or one, correction terms can become unstable.
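One way to check this is to bucket the oracle propensities; the bucket edges below are an illustrative choice:

propensity_buckets = pd.cut(df["propensity"], bins=[0.0, 0.1, 0.3, 0.5, 0.7, 0.9, 1.0])
overlap_table = (
    df.groupby(propensity_buckets, observed=True)["treatment"]
    .agg(n_rows="size", treated_share="mean")
    .reset_index()
    .rename(columns={"propensity": "propensity_bucket"})
)
display(overlap_table)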
What this shows: treatment probability varies substantially, but most rows remain away from the most extreme buckets. That makes the inverse-propensity correction usable for teaching.
Propensity Overlap Plot
The histogram shows true propensity by treatment group. In real data, this plot would use estimated propensity scores.
What this shows: the treated and untreated distributions overlap, but they are shifted. This is exactly the kind of setting where propensity-aware correction is useful.
Oracle Doubly Robust ATE Demonstration
Before fitting models, we use oracle simulation fields to show the algebraic idea behind double robustness.
The four nuisance settings below are artificial:
oracle outcome models and oracle propensity;
oracle outcome models with weak propensity;
weak outcome models with oracle propensity;
weak outcome models with weak propensity.
The point is not that real analyses have oracle nuisance functions. The point is to see why using both outcome and propensity information can be safer than relying on only one route.
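A sketch of that demonstration, reusing the dr_pseudo_outcome helper from earlier. The "weak" nuisances here are deliberately crude stand-ins: a constant propensity and a single pooled-mean outcome model for both arms.

y_all = df["outcome"].to_numpy()
t_all = df["treatment"].to_numpy()
oracle_e = df["propensity"].to_numpy()
oracle_m0 = df["mu0"].to_numpy()
oracle_m1 = df["mu1"].to_numpy()

weak_e = np.full_like(oracle_e, t_all.mean())   # ignores all covariates
weak_m = np.full_like(oracle_m0, y_all.mean())  # one pooled mean for both arms

nuisance_settings = {
    "oracle outcomes + oracle propensity": (oracle_e, oracle_m0, oracle_m1),
    "oracle outcomes + weak propensity": (weak_e, oracle_m0, oracle_m1),
    "weak outcomes + oracle propensity": (oracle_e, weak_m, weak_m),
    "weak outcomes + weak propensity": (weak_e, weak_m, weak_m),
}
demo_rows = []
for setting, (e, m0, m1) in nuisance_settings.items():
    dr_ate = dr_pseudo_outcome(y_all, t_all, e, m0, m1).mean()
    demo_rows.append(
        {
            "nuisance_setting": setting,
            "dr_ate_estimate": dr_ate,
            "error_vs_true_ate": dr_ate - df["true_cate"].mean(),
        }
    )
# With both routes weak, the DR estimate collapses toward the raw difference.
display(pd.DataFrame(demo_rows))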
What this shows: when either the outcome route or the propensity route is strong, the doubly robust estimate can stay close to the truth. When both routes are weak, the estimate can fail badly.
Train And Test Split
The train set is used for nuisance and CATE model fitting. The test set is held out for evaluating CATE recovery against known truth.
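A sketch of the split and the matrices used downstream; names such as train_df and X_train are conventions carried through the rest of the manual workflow:

train_df, test_df = train_test_split(
    df, test_size=0.3, random_state=RANDOM_SEED, stratify=df["treatment"]
)
X_train = train_df[effect_modifier_cols].to_numpy()
W_train = train_df[control_cols].to_numpy()
XW_train = train_df[all_observed_covariates].to_numpy()
y_train = train_df["outcome"].to_numpy()
t_train = train_df["treatment"].to_numpy()
X_test = test_df[effect_modifier_cols].to_numpy()
true_cate_test = test_df["true_cate"].to_numpy()
print(f"Train rows: {len(train_df)}, test rows: {len(test_df)}")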
What this shows: DRLearner separates the CATE reporting dimensions from the adjustment controls. The nuisance models use both X and W; the final model explains CATE using X alone.
Cross-Fitted Nuisance Models For Manual DR
This cell manually estimates the nuisance functions needed for the DR pseudo-outcome:
propensity model e(X, W);
outcome model under control m0(X, W);
outcome model under treatment m1(X, W).
The predictions are out-of-fold, meaning every row receives nuisance predictions from models that did not train on that row.
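A manual cross-fitting sketch, assuming the split cell above and the dr_pseudo_outcome helper from earlier; the gradient-boosting nuisance models and the five folds are illustrative choices:

n_train = len(train_df)
e_hat = np.zeros(n_train)
m0_hat = np.zeros(n_train)
m1_hat = np.zeros(n_train)

splitter = StratifiedKFold(n_splits=5, shuffle=True, random_state=RANDOM_SEED)
for fit_idx, pred_idx in splitter.split(XW_train, t_train):
    # Propensity model trained on the fitting folds, predicted out-of-fold.
    propensity_model = GradientBoostingClassifier(random_state=RANDOM_SEED)
    propensity_model.fit(XW_train[fit_idx], t_train[fit_idx])
    e_hat[pred_idx] = propensity_model.predict_proba(XW_train[pred_idx])[:, 1]

    # Arm-specific outcome models, each predicting every held-out row.
    for arm, target in [(0, m0_hat), (1, m1_hat)]:
        arm_rows = fit_idx[t_train[fit_idx] == arm]
        outcome_model = GradientBoostingRegressor(random_state=RANDOM_SEED)
        outcome_model.fit(XW_train[arm_rows], y_train[arm_rows])
        target[pred_idx] = outcome_model.predict(XW_train[pred_idx])

# Clipped DR pseudo-outcome built from the out-of-fold nuisance predictions.
pseudo_outcome = dr_pseudo_outcome(y_train, t_train, e_hat, m0_hat, m1_hat, clip=0.01)
print(f"Pseudo-outcome mean: {pseudo_outcome.mean():,.4f}")
print(f"Pseudo-outcome std:  {pseudo_outcome.std():,.4f}")
print(f"True CATE std:       {df['true_cate'].std():,.4f}")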
What this shows: the pseudo-outcome is much noisier than the true CATE, but its average and ranking signal are useful. The final CATE model will smooth this noisy pseudo-outcome over X.
Pseudo-Outcome Distribution
The DR pseudo-outcome can have wider tails than the true CATE because it contains inverse-propensity weighted residual corrections. This is normal and is one reason final-stage smoothing matters.
What this shows: pseudo-outcomes are not the same as individual true effects. They are noisy training targets whose conditional expectation is used to learn CATE.
Manual DR Final CATE Model
Now we fit a final model that predicts the manual DR pseudo-outcome from X. This mirrors the DRLearner structure in a transparent way.
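A sketch with a random-forest final stage; the hyperparameters are illustrative:

manual_dr_model = RandomForestRegressor(
    n_estimators=400, min_samples_leaf=20, random_state=RANDOM_SEED
)
manual_dr_model.fit(X_train, pseudo_outcome)
manual_dr_pred = manual_dr_model.predict(X_test)

manual_rmse = np.sqrt(mean_squared_error(true_cate_test, manual_dr_pred))
manual_corr = np.corrcoef(true_cate_test, manual_dr_pred)[0, 1]
print(f"Manual DR CATE RMSE vs truth:        {manual_rmse:,.4f}")
print(f"Manual DR CATE correlation vs truth: {manual_corr:,.4f}")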
What this shows: smoothing the noisy DR pseudo-outcome over X creates useful CATE estimates. EconML DRLearner automates this workflow and handles the nuisance/final-stage plumbing.
T-Learner Baseline
A T-learner fits separate outcome models for treated and untreated rows, then subtracts predicted control outcome from predicted treated outcome. It uses outcome regression but does not add the inverse-propensity correction term used by DR pseudo-outcomes.
This makes it a useful baseline for understanding what DRLearner adds.
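A minimal T-learner sketch on the same training data, with the same illustrative gradient-boosting learners:

t1_model = GradientBoostingRegressor(random_state=RANDOM_SEED)
t0_model = GradientBoostingRegressor(random_state=RANDOM_SEED)
t1_model.fit(XW_train[t_train == 1], y_train[t_train == 1])
t0_model.fit(XW_train[t_train == 0], y_train[t_train == 0])

XW_test = test_df[all_observed_covariates].to_numpy()
t_learner_pred = t1_model.predict(XW_test) - t0_model.predict(XW_test)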
What this shows: the T-learner is a strong outcome-regression baseline. DRLearner should be compared against it because doubly robust estimation is most meaningful when we see what the outcome-only route can already do.
Fit EconML DRLearner With A Linear Final Model
This first DRLearner uses flexible nuisance models but a linear final CATE model. That makes the final CATE surface easier to explain, but less flexible.
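A sketch of such a fit; the nuisance model choices, the min_propensity level, and five-fold cross-fitting are illustrative:

dr_linear = DRLearner(
    model_propensity=GradientBoostingClassifier(random_state=RANDOM_SEED),
    model_regression=GradientBoostingRegressor(random_state=RANDOM_SEED),
    model_final=LinearRegression(),
    min_propensity=0.01,
    cv=5,
    random_state=RANDOM_SEED,
)
dr_linear.fit(y_train, t_train, X=X_train, W=W_train)
dr_linear_pred = dr_linear.effect(X_test)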
What this shows: DRLearner can use a simple final model when the CATE story needs to be compact. The tradeoff is that nonlinear effect patterns may be compressed.
Fit EconML DRLearner With A Flexible Final Model
This second DRLearner uses the same doubly robust nuisance logic but a random-forest final model. That final model can learn nonlinear CATE patterns from the pseudo-outcome.
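A sketch of the forest-final variant; the name dr_forest is the estimator inspected in the feature-importance cell later, and the forest hyperparameters are illustrative:

dr_forest = DRLearner(
    model_propensity=GradientBoostingClassifier(random_state=RANDOM_SEED),
    model_regression=GradientBoostingRegressor(random_state=RANDOM_SEED),
    model_final=RandomForestRegressor(
        n_estimators=400, min_samples_leaf=20, random_state=RANDOM_SEED
    ),
    min_propensity=0.01,
    cv=5,
    random_state=RANDOM_SEED,
)
dr_forest.fit(y_train, t_train, X=X_train, W=W_train)
dr_forest_pred = dr_forest.effect(X_test)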
What this shows: the flexible final model is closer to the usual reason for trying DRLearner on heterogeneous effects. The pseudo-outcome supplies the target, and the final model supplies the CATE shape.
Optional LinearDML Baseline
A LinearDML baseline gives us a bridge back to the earlier DML notebooks. It is not the star of this lesson, but it helps compare residualization-based and pseudo-outcome-based workflows.
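A minimal baseline sketch, with the same illustrative nuisance learners:

dml_baseline = LinearDML(
    model_y=GradientBoostingRegressor(random_state=RANDOM_SEED),
    model_t=GradientBoostingClassifier(random_state=RANDOM_SEED),
    discrete_treatment=True,
    cv=5,
    random_state=RANDOM_SEED,
)
dml_baseline.fit(y_train, t_train, X=X_train, W=W_train)
dml_pred = dml_baseline.effect(X_test)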
What this shows: DML and DR workflows can both estimate CATE, but they construct the final learning target differently. Comparing them helps build estimator intuition.
Estimator Comparison
The next table compares raw difference, T-learner, manual DR, EconML DRLearner variants, and LinearDML on the same test set.
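A sketch of the comparison, assuming the predictions computed above; the metric set and output file name are illustrative:

def recovery_row(name, pred):
    return {
        "estimator": name,
        "estimated_ate": float(np.mean(pred)),
        "rmse_vs_true_cate": float(np.sqrt(mean_squared_error(true_cate_test, pred))),
        "spearman_rank_corr": pd.Series(pred).corr(pd.Series(true_cate_test), method="spearman"),
    }

estimator_comparison = pd.DataFrame(
    [
        # The constant raw-difference row has no ranking signal, so its rank correlation is NaN.
        recovery_row("raw difference (constant)", np.full(len(test_df), raw_diff)),
        recovery_row("T-learner", t_learner_pred),
        recovery_row("manual DR", manual_dr_pred),
        recovery_row("DRLearner (linear final)", dr_linear_pred),
        recovery_row("DRLearner (forest final)", dr_forest_pred),
        recovery_row("LinearDML", dml_pred),
    ]
)
estimator_comparison.to_csv(TABLE_DIR / "05_estimator_comparison.csv", index=False)
display(estimator_comparison)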
What this shows: the DRLearner variants should be evaluated on both average-effect error and CATE ranking quality. The best estimator depends on whether the task needs an average, a ranking, or a readable segment story.
CATE Recovery Scatter
The scatter plot compares estimated CATE values with known true CATE values on the test set.
What this shows: each estimator makes a different bias-variance tradeoff. DRLearner’s final model should smooth a noisy pseudo-outcome without erasing important CATE structure.
DRLearner Final-Model Feature Importance
For the forest-final DRLearner, we can inspect the fitted final random forest’s feature importance. This tells us which X features were most useful for predicting the DR pseudo-outcome.
dr_final_model = dr_forest.ortho_learner_model_final_.models_cate[0]
if not hasattr(dr_final_model, "feature_importances_"):
    raise AttributeError("The fitted DRLearner final model does not expose feature_importances_.")

dr_feature_importance = pd.DataFrame(
    {
        "feature": effect_modifier_cols,
        "importance": dr_final_model.feature_importances_,
        "true_cate_driver": [col in true_driver_cols for col in effect_modifier_cols],
    }
).sort_values("importance", ascending=False)
dr_feature_importance.to_csv(TABLE_DIR / "05_drlearner_final_feature_importance.csv", index=False)
display(dr_feature_importance)
| feature | importance | true_cate_driver |
|---|---|---|
| friction_score | 0.1791 | True |
| baseline_need | 0.1675 | True |
| content_affinity | 0.1670 | True |
| prior_engagement | 0.1260 | True |
| trust_score | 0.1247 | False |
| price_sensitivity | 0.1213 | True |
| recency_gap | 0.1056 | False |
| region_risk | 0.0080 | True |
| high_need_segment | 0.0007 | True |
What this shows: feature importance explains the fitted final CATE model, not the causal design. A feature can be important in the final model only if it is included in X.
Feature Importance Plot
The plot makes the final-model importance ranking easier to scan.
What this shows: the feature-importance view turns the final DRLearner model into a readable diagnostic. It should be paired with CATE recovery, calibration, and segment checks.
CATE Decile Calibration
CATE models are often used for ranking units. The next table groups test rows by predicted CATE decile and compares estimated versus true average CATE.
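A decile-calibration sketch for the forest-final DRLearner predictions:

predicted_decile = pd.qcut(dr_forest_pred, q=10, labels=False, duplicates="drop")
decile_calibration = (
    pd.DataFrame(
        {
            "decile": predicted_decile,
            "estimated_cate": dr_forest_pred,
            "true_cate": true_cate_test,
        }
    )
    .groupby("decile")
    .agg(
        n_rows=("true_cate", "size"),
        mean_estimated_cate=("estimated_cate", "mean"),
        mean_true_cate=("true_cate", "mean"),
    )
    .reset_index()
)
display(decile_calibration)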
What this shows: higher predicted deciles should have higher true CATE if the model ranks well. This is one of the most useful truth-known checks for treatment targeting.
CATE Decile Calibration Plot
The plot compares estimated and true average CATE by predicted-effect decile for selected estimators.
What this shows: decile calibration connects estimation to action. A model can be useful for targeting even if individual CATE point estimates are noisy.
Segment-Level Recovery
Segment summaries make CATE estimates easier to communicate. Here we summarize by high-need segment and region risk.
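A segment-summary sketch, assuming the test-set predictions above:

segment_recovery = (
    test_df.assign(
        dr_forest_pred=dr_forest_pred,
        t_learner_pred=t_learner_pred,
    )
    .groupby(["high_need_segment", "region_risk"])
    .agg(
        n_rows=("true_cate", "size"),
        true_segment_cate=("true_cate", "mean"),
        dr_forest_segment_cate=("dr_forest_pred", "mean"),
        t_learner_segment_cate=("t_learner_pred", "mean"),
    )
    .reset_index()
)
display(segment_recovery)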
What this shows: segment summaries reveal whether the learner recovers broad subgroup patterns, not just overall ranking. This is often the most readable way to explain heterogeneous effects.
Segment Recovery Plot
This plot compares true and estimated segment-level effects.
What this shows: segment plots are useful because DR pseudo-outcomes themselves can be noisy. Segment aggregation makes the treatment-effect story easier to compare across methods.
Propensity Weight Diagnostics
The DR correction terms include 1 / e for treated rows and 1 / (1 - e) for untreated rows. Large weights can create noisy pseudo-outcomes.
The next cell summarizes manual cross-fitted correction weights.
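A sketch of that summary, using the cross-fitted propensities before clipping:

# Each row's correction weight: 1/e if treated, 1/(1 - e) if untreated.
correction_weights = np.where(t_train == 1, 1.0 / e_hat, 1.0 / (1.0 - e_hat))
weight_summary = pd.Series(correction_weights).describe(percentiles=[0.5, 0.9, 0.99])
print(weight_summary)
print(f"Share of rows with weight > 10: {(correction_weights > 10).mean():.4f}")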
What this shows: weight diagnostics help explain pseudo-outcome noise. Propensity clipping is a practical guardrail against extremely large correction terms.
Weight Distribution Plot
This plot shows the distribution of inverse-propensity correction weights used in the manual pseudo-outcome.
What this shows: most correction weights are moderate, so the pseudo-outcome is not dominated by a handful of extreme-weight rows. That supports the stability of this teaching example.
Targeting Comparison
A common use for CATE estimates is selecting the top fraction of units for treatment. The next cell compares random targeting, T-learner targeting, manual DR targeting, DRLearner targeting, and an oracle benchmark.
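A targeting sketch; the top fraction and output file name are illustrative, and the column names match the plot cell below:

top_fraction = 0.20
n_selected = int(top_fraction * len(test_df))

def selected_mean(scores):
    # Average true CATE among the n_selected rows ranked highest by the scores.
    top_idx = np.argsort(scores)[::-1][:n_selected]
    return true_cate_test[top_idx].mean()

targeting_summary = pd.DataFrame(
    [
        {
            "targeting_rule": "random",
            "average_true_cate_in_selected_group": true_cate_test[
                rng.choice(len(test_df), n_selected, replace=False)
            ].mean(),
        },
        {"targeting_rule": "T-learner", "average_true_cate_in_selected_group": selected_mean(t_learner_pred)},
        {"targeting_rule": "manual DR", "average_true_cate_in_selected_group": selected_mean(manual_dr_pred)},
        {"targeting_rule": "DRLearner (forest final)", "average_true_cate_in_selected_group": selected_mean(dr_forest_pred)},
        {"targeting_rule": "oracle", "average_true_cate_in_selected_group": selected_mean(true_cate_test)},
    ]
)
targeting_summary.to_csv(TABLE_DIR / "05_targeting_summary.csv", index=False)
display(targeting_summary)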
What this shows: CATE models matter operationally when they improve the selected group’s true benefit over random targeting. The oracle row is an unattainable upper benchmark available only in simulation.
Targeting Plot
This plot compares true benefit among selected test rows under each targeting rule.
fig, ax = plt.subplots(figsize=(11, 5))
sns.barplot(
    data=targeting_summary,
    x="average_true_cate_in_selected_group",
    y="targeting_rule",
    color="#34d399",
    ax=ax,
)
ax.set_title("True Benefit Among Targeted Test Rows")
ax.set_xlabel("Average True CATE In Selected Group")
ax.set_ylabel("Targeting Rule")
plt.tight_layout()
fig.savefig(FIGURE_DIR / "05_targeting_summary.png", dpi=160, bbox_inches="tight")
plt.show()
What this shows: model comparison should be tied to the decision. A model with slightly worse RMSE can still be useful if it ranks the highest-benefit units well.
Practical DRLearner Guidance
This table summarizes when DRLearner is a good estimator to try and what to watch.
practical_guidance = pd.DataFrame(
    [
        {
            "situation": "Binary treatment with observed confounding",
            "why DRLearner helps": "It combines outcome regression and propensity correction in a CATE pseudo-outcome.",
            "watchout": "Both nuisance routes still need credible pre-treatment covariates and overlap.",
        },
        {
            "situation": "Outcome models are strong but treatment assignment is uneven",
            "why DRLearner helps": "Outcome regression carries much of the signal while propensity correction reduces assignment bias.",
            "watchout": "Extreme propensity values can still inflate pseudo-outcome noise.",
        },
        {
            "situation": "Treatment targeting is the main goal",
            "why DRLearner helps": "A flexible final model can learn rankings from DR pseudo-outcomes.",
            "watchout": "Evaluate ranking and policy value, not only average effect.",
        },
        {
            "situation": "A very readable CATE story is required",
            "why DRLearner helps": "A linear final model can make the final CATE surface compact.",
            "watchout": "The final model may miss nonlinear heterogeneity.",
        },
    ]
)
practical_guidance.to_csv(TABLE_DIR / "05_practical_guidance.csv", index=False)
display(practical_guidance)
| situation | why DRLearner helps | watchout |
|---|---|---|
| Binary treatment with observed confounding | It combines outcome regression and propensity correction in a CATE pseudo-outcome. | Both nuisance routes still need credible pre-treatment covariates and overlap. |
| Outcome models are strong but treatment assignment is uneven | Outcome regression carries much of the signal while propensity correction reduces assignment bias. | Extreme propensity values can still inflate pseudo-outcome noise. |
| Treatment targeting is the main goal | A flexible final model can learn rankings from DR pseudo-outcomes. | Evaluate ranking and policy value, not only average effect. |
| A very readable CATE story is required | A linear final model can make the final CATE surface compact. | The final model may miss nonlinear heterogeneity. |
What this shows: DRLearner is a flexible framework. The analyst chooses nuisance models and the final CATE model based on the causal question, data support, and reporting needs.
DRLearner Checklist
Before presenting DRLearner results, it is worth checking the items below.
dr_checklist = pd.DataFrame(
    [
        {"check": "Treatment and outcome are clearly defined", "why_it_matters": "DRLearner estimates the effect of a specific intervention."},
        {"check": "All X and W variables are pre-treatment", "why_it_matters": "Post-treatment variables can distort the estimand."},
        {"check": "Important confounders are included", "why_it_matters": "Double robustness does not fix omitted confounding."},
        {"check": "Overlap is adequate", "why_it_matters": "Inverse-propensity corrections become unstable near zero or one."},
        {"check": "Propensity model quality is inspected", "why_it_matters": "Bad propensity estimates can create poor correction terms."},
        {"check": "Outcome models by treatment arm are inspected", "why_it_matters": "The pseudo-outcome relies heavily on potential-outcome predictions."},
        {"check": "Pseudo-outcome tails and weights are checked", "why_it_matters": "Large correction weights can dominate the final model."},
        {"check": "Final CATE model is appropriate", "why_it_matters": "A linear final model and a forest final model tell different stories."},
        {"check": "CATE ranking is validated where possible", "why_it_matters": "Treatment targeting depends on ranking, not only ATE accuracy."},
    ]
)
dr_checklist.to_csv(TABLE_DIR / "05_drlearner_checklist.csv", index=False)
display(dr_checklist)
| check | why_it_matters |
|---|---|
| Treatment and outcome are clearly defined | DRLearner estimates the effect of a specific intervention. |
| All X and W variables are pre-treatment | Post-treatment variables can distort the estimand. |
| Important confounders are included | Double robustness does not fix omitted confounding. |
| Overlap is adequate | Inverse-propensity corrections become unstable near zero or one. |
| Propensity model quality is inspected | Bad propensity estimates can create poor correction terms. |
| Outcome models by treatment arm are inspected | The pseudo-outcome relies heavily on potential-outcome predictions. |
| Pseudo-outcome tails and weights are checked | Large correction weights can dominate the final model. |
| Final CATE model is appropriate | A linear final model and a forest final model tell different stories. |
| CATE ranking is validated where possible | Treatment targeting depends on ranking, not only ATE accuracy. |
What this shows: DRLearner is not a shortcut around causal design. It is a powerful estimator once the treatment, outcome, covariates, and support are all defensible.
Summary
This notebook introduced DRLearner and doubly robust CATE estimation.
The main takeaways are:
DRLearner uses outcome nuisance models, a propensity nuisance model, and a final CATE model;
the doubly robust pseudo-outcome combines m1 - m0 with inverse-propensity residual corrections;
double robustness means the estimator has two routes to credibility, but it still needs identification and overlap;
pseudo-outcomes are noisy training targets, so final-stage smoothing matters;
propensity clipping and weight diagnostics are practical stability checks;
DRLearner can use a readable linear final model or a flexible forest final model;
CATE estimates should be evaluated through average error, ranking, segment summaries, and targeting performance.
The next tutorial covers the S-learner, T-learner, X-learner, and related meta-learners, which provide a broader family of outcome-model-based CATE strategies.