EconML Tutorial 06: Meta-Learners: S, T, And X Learners
This notebook introduces EconML’s classic meta-learners for heterogeneous treatment effects:
SLearner: one outcome model with treatment included as a feature;
TLearner: separate outcome models for treated and untreated units;
XLearner: two outcome models plus imputed treatment-effect models, designed to help when treatment groups are imbalanced.
The causal question is the same as before:
How much would the outcome change for each unit if treatment were applied instead of not applied?
The modeling style is different. Meta-learners are outcome-model-based strategies. They are easier to explain than DML in some settings, but they rely heavily on the quality of outcome models and the overlap between treated and untreated units.
Learning Goals
By the end of this notebook, you should be able to:
explain the S-learner, T-learner, and X-learner in plain language;
understand why EconML meta-learners use one X matrix rather than separate X and W arguments;
fit SLearner, TLearner, and XLearner on a truth-known teaching dataset;
compare outcome-model quality by treatment arm;
evaluate CATE recovery, ranking, segment effects, and targeting performance;
use model-agnostic permutation diagnostics to understand which features drive CATE predictions;
decide when each meta-learner is a reasonable first choice.
The Three Meta-Learner Ideas
Meta-learners turn supervised learning models into treatment-effect estimators.
S-learner:
Fits one model: Y ~ X + T.
Estimates CATE by predicting twice for each row: once with T=1, once with T=0, then subtracting.
Strength: simple and uses all data in one model.
Risk: the model may underuse the treatment feature, especially when treatment effects are subtle.
T-learner:
Fits two models: one on treated rows and one on untreated rows.
Estimates CATE as predicted Y under treatment - predicted Y under control.
Strength: lets treated and untreated response surfaces differ freely.
Risk: one arm can be poorly learned if treatment groups are imbalanced.
X-learner:
Starts like a T-learner, then imputes treatment effects for each group.
Fits CATE models to those imputed effects.
Combines the two CATE models using propensity weights.
Strength: often helpful when one treatment arm is smaller than the other.
Risk: more moving pieces, more dependence on nuisance quality.
Tutorial Flow
This lesson follows a comparison-first structure:
Create an imbalanced binary-treatment dataset with known CATE.
Check raw bias, covariate imbalance, and propensity overlap.
Explain why EconML meta-learners use a single feature matrix.
Fit S-, T-, and X-learners.
Compare with manual S- and T-learner calculations.
Evaluate CATE recovery, decile calibration, and segment summaries.
Inspect model-agnostic CATE feature sensitivity.
Compare treatment targeting rules.
Summarize when each learner is likely to work well or fail.
Setup
This cell imports the packages used in the lesson, creates output folders, fixes a random seed, and checks that EconML is available. The warning filters keep output readable without hiding execution errors.
from pathlib import Path
import os
import warnings
import importlib.metadata as importlib_metadata

# Keep Matplotlib cache files in a writable location during notebook execution.
os.environ.setdefault("MPLCONFIGDIR", "/tmp/matplotlib-ranking-sys")

warnings.filterwarnings("default")
warnings.filterwarnings("ignore", category=DeprecationWarning)
warnings.filterwarnings("ignore", category=PendingDeprecationWarning)
warnings.filterwarnings("ignore", category=FutureWarning)
warnings.filterwarnings("ignore", message=".*IProgress not found.*")
warnings.filterwarnings("ignore", message=".*X does not have valid feature names.*")
warnings.filterwarnings("ignore", message=".*The final model has a nonzero intercept.*")
warnings.filterwarnings("ignore", message=".*Co-variance matrix is underdetermined.*")
warnings.filterwarnings("ignore", module="sklearn.linear_model._logistic")

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
from IPython.display import display
from sklearn.base import clone
from sklearn.ensemble import RandomForestClassifier, RandomForestRegressor, GradientBoostingRegressor
from sklearn.linear_model import LinearRegression, LogisticRegression
from sklearn.metrics import brier_score_loss, log_loss, mean_squared_error, roc_auc_score
from sklearn.model_selection import KFold, StratifiedKFold, cross_val_predict, train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

try:
    import econml
    from econml.metalearners import SLearner, TLearner, XLearner

    ECONML_AVAILABLE = True
    ECONML_VERSION = getattr(econml, "__version__", "unknown")
except Exception as exc:
    ECONML_AVAILABLE = False
    ECONML_VERSION = f"import failed: {type(exc).__name__}: {exc}"

RANDOM_SEED = 2026
rng = np.random.default_rng(RANDOM_SEED)

OUTPUT_DIR = Path("outputs")
FIGURE_DIR = OUTPUT_DIR / "figures"
TABLE_DIR = OUTPUT_DIR / "tables"
FIGURE_DIR.mkdir(parents=True, exist_ok=True)
TABLE_DIR.mkdir(parents=True, exist_ok=True)

sns.set_theme(style="whitegrid", context="notebook")
pd.set_option("display.max_columns", 140)
pd.set_option("display.float_format", lambda value: f"{value:,.4f}")

print(f"EconML available: {ECONML_AVAILABLE}")
print(f"EconML version: {ECONML_VERSION}")
print(f"Figures will be saved to: {FIGURE_DIR.resolve()}")
print(f"Tables will be saved to: {TABLE_DIR.resolve()}")
EconML available: True
EconML version: 0.16.0
Figures will be saved to: /home/apex/Documents/ranking_sys/notebooks/tutorials/econml/outputs/figures
Tables will be saved to: /home/apex/Documents/ranking_sys/notebooks/tutorials/econml/outputs/tables
What this shows: the notebook is ready if EconML imports successfully. All saved artifacts use the 06_ prefix so they are easy to find in the shared tutorial output folder.
Meta-Learner Map
The next table gives a compact comparison of S-, T-, and X-learners before we start fitting models.
meta_learner_map = pd.DataFrame(
    [
        {
            "learner": "S-learner",
            "models fitted": "One outcome model",
            "effect estimate": "Predict each row with T=1 and T=0, then subtract",
            "strength": "Simple and data-efficient",
            "main risk": "Treatment signal can be washed out if the outcome model ignores T",
        },
        {
            "learner": "T-learner",
            "models fitted": "Two outcome models, one per treatment arm",
            "effect estimate": "Predicted treated outcome minus predicted control outcome",
            "strength": "Allows very different response surfaces by arm",
            "main risk": "Small treatment arm can have a weak outcome model",
        },
        {
            "learner": "X-learner",
            "models fitted": "Two outcome models, two imputed-effect models, and a propensity model",
            "effect estimate": "Propensity-weighted combination of imputed-effect models",
            "strength": "Often useful with treatment imbalance",
            "main risk": "More nuisance-model dependence and more tuning choices",
        },
    ]
)
meta_learner_map.to_csv(TABLE_DIR / "06_meta_learner_map.csv", index=False)
display(meta_learner_map)
learner | models fitted | effect estimate | strength | main risk
--- | --- | --- | --- | ---
S-learner | One outcome model | Predict each row with T=1 and T=0, then subtract | Simple and data-efficient | Treatment signal can be washed out if the outcome model ignores T
T-learner | Two outcome models, one per treatment arm | Predicted treated outcome minus predicted control outcome | Allows very different response surfaces by arm | Small treatment arm can have a weak outcome model
X-learner | Two outcome models, two imputed-effect models, and a propensity model | Propensity-weighted combination of imputed-effect models | Often useful with treatment imbalance | More nuisance-model dependence and more tuning choices
What this shows: all three learners estimate CATE through supervised prediction, but they organize the prediction problem differently. The best choice depends on treatment balance, response-surface complexity, and reporting needs.
Synthetic Teaching Data
The dataset below intentionally has an imbalanced treatment rate. That makes the X-learner worth discussing, because X-learners were designed partly for settings where one treatment arm has fewer observations.
The true CATE is nonlinear and depends on several pre-treatment features. The outcome is continuous, treatment is binary, and assignment is confounded by observed covariates.
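The generating cell itself is not reproduced in this section. As a rough orientation only, a fully hypothetical sketch of this kind of process is shown below; the column names match the field dictionary that follows, but every functional form and coefficient here is invented for illustration and is not the notebook's actual simulation.

# Fully hypothetical sketch of a confounded, imbalanced data-generating process.
# The notebook's real generating cell is not shown; every coefficient is invented.
def make_teaching_data(n=3200, seed=RANDOM_SEED):
    gen = np.random.default_rng(seed)
    cols = [
        "baseline_need", "prior_engagement", "friction_score", "content_affinity",
        "price_sensitivity", "region_risk", "trust_score", "recency_gap",
        "account_tenure", "seasonality_index", "device_stability", "traffic_intensity",
    ]
    df = pd.DataFrame({col: gen.normal(size=n) for col in cols})
    df["high_need_segment"] = (df["baseline_need"] > 0.6).astype(int)
    # Confounded assignment with a below-50-percent treatment rate.
    logit = -1.0 + 0.8 * df["baseline_need"] + 0.5 * df["trust_score"] - 0.4 * df["friction_score"]
    df["propensity"] = 1.0 / (1.0 + np.exp(-logit))
    df["treatment"] = gen.binomial(1, df["propensity"])
    # Nonlinear CATE driven by several pre-treatment features.
    df["true_cate"] = (
        0.4
        + 0.5 * np.tanh(df["baseline_need"])
        + 0.3 * df["high_need_segment"]
        + 0.2 * df["content_affinity"]
        + 0.15 * df["prior_engagement"]
        - 0.15 * df["friction_score"]
        - 0.2 * df["price_sensitivity"] * df["region_risk"]
    )
    df["mu0"] = 1.0 + 0.6 * df["prior_engagement"] - 0.3 * df["friction_score"] + 0.2 * df["recency_gap"]
    df["mu1"] = df["mu0"] + df["true_cate"]
    df["outcome"] = np.where(df["treatment"] == 1, df["mu1"], df["mu0"]) + gen.normal(scale=1.0, size=n)
    return df

df = make_teaching_data()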
What this shows: treatment is observational and imbalanced, not randomized. The oracle columns make it possible to grade the meta-learners, but those fields will not be used as model inputs.
Field Dictionary
Meta-learners in EconML use one X matrix. That X should include all pre-treatment covariates needed to model potential outcomes and treatment-effect heterogeneity.
This differs from the DML notebooks where we separated X and W. Here we still track conceptual roles, but the fitted meta-learners receive one combined pre-treatment feature table.
effect_modifier_cols = [
    "baseline_need",
    "prior_engagement",
    "friction_score",
    "content_affinity",
    "price_sensitivity",
    "region_risk",
    "high_need_segment",
]
extra_adjustment_cols = [
    "trust_score",
    "recency_gap",
    "account_tenure",
    "seasonality_index",
    "device_stability",
    "traffic_intensity",
]
all_feature_cols = effect_modifier_cols + extra_adjustment_cols
true_driver_cols = [
    "baseline_need",
    "prior_engagement",
    "friction_score",
    "content_affinity",
    "price_sensitivity",
    "region_risk",
    "high_need_segment",
]

field_rows = []
for col in effect_modifier_cols:
    field_rows.append(
        {
            "column": col,
            "conceptual_role": "effect modifier and pre-treatment covariate",
            "included_in_econml_X": "yes",
            "observed_in_real_analysis": "yes",
            "description": "Feature expected to help explain treatment-effect heterogeneity.",
            "true_cate_driver": "yes" if col in true_driver_cols else "no",
        }
    )
for col in extra_adjustment_cols:
    field_rows.append(
        {
            "column": col,
            "conceptual_role": "pre-treatment outcome or assignment predictor",
            "included_in_econml_X": "yes",
            "observed_in_real_analysis": "yes",
            "description": "Feature included because meta-learners use one X table for outcome modeling and heterogeneity.",
            "true_cate_driver": "no",
        }
    )
for col, role, description in [
    ("treatment", "treatment", "Binary intervention indicator."),
    ("outcome", "outcome", "Observed post-treatment outcome."),
    ("propensity", "oracle", "True treatment probability from the simulated assignment process."),
    ("mu0", "oracle", "True conditional mean outcome under control."),
    ("mu1", "oracle", "True conditional mean outcome under treatment."),
    ("true_cate", "oracle", "Known individual treatment effect used only for tutorial evaluation."),
]:
    field_rows.append(
        {
            "column": col,
            "conceptual_role": role,
            "included_in_econml_X": "no",
            "observed_in_real_analysis": "yes" if role in ["treatment", "outcome"] else "no",
            "description": description,
            "true_cate_driver": "not applicable",
        }
    )

field_dictionary = pd.DataFrame(field_rows)
field_dictionary.to_csv(TABLE_DIR / "06_field_dictionary.csv", index=False)
display(field_dictionary)
column | conceptual_role | included_in_econml_X | observed_in_real_analysis | description | true_cate_driver
--- | --- | --- | --- | --- | ---
baseline_need | effect modifier and pre-treatment covariate | yes | yes | Feature expected to help explain treatment-effect heterogeneity. | yes
prior_engagement | effect modifier and pre-treatment covariate | yes | yes | Feature expected to help explain treatment-effect heterogeneity. | yes
friction_score | effect modifier and pre-treatment covariate | yes | yes | Feature expected to help explain treatment-effect heterogeneity. | yes
content_affinity | effect modifier and pre-treatment covariate | yes | yes | Feature expected to help explain treatment-effect heterogeneity. | yes
price_sensitivity | effect modifier and pre-treatment covariate | yes | yes | Feature expected to help explain treatment-effect heterogeneity. | yes
region_risk | effect modifier and pre-treatment covariate | yes | yes | Feature expected to help explain treatment-effect heterogeneity. | yes
high_need_segment | effect modifier and pre-treatment covariate | yes | yes | Feature expected to help explain treatment-effect heterogeneity. | yes
trust_score | pre-treatment outcome or assignment predictor | yes | yes | Feature included because meta-learners use one X table for outcome modeling and heterogeneity. | no
recency_gap | pre-treatment outcome or assignment predictor | yes | yes | Feature included because meta-learners use one X table for outcome modeling and heterogeneity. | no
account_tenure | pre-treatment outcome or assignment predictor | yes | yes | Feature included because meta-learners use one X table for outcome modeling and heterogeneity. | no
seasonality_index | pre-treatment outcome or assignment predictor | yes | yes | Feature included because meta-learners use one X table for outcome modeling and heterogeneity. | no
device_stability | pre-treatment outcome or assignment predictor | yes | yes | Feature included because meta-learners use one X table for outcome modeling and heterogeneity. | no
traffic_intensity | pre-treatment outcome or assignment predictor | yes | yes | Feature included because meta-learners use one X table for outcome modeling and heterogeneity. | no
treatment | treatment | no | yes | Binary intervention indicator. | not applicable
outcome | outcome | no | yes | Observed post-treatment outcome. | not applicable
propensity | oracle | no | no | True treatment probability from the simulated assignment process. | not applicable
mu0 | oracle | no | no | True conditional mean outcome under control. | not applicable
mu1 | oracle | no | no | True conditional mean outcome under treatment. | not applicable
true_cate | oracle | no | no | Known individual treatment effect used only for tutorial evaluation. | not applicable
What this shows: because the meta-learner API uses one X, feature selection has to be done carefully. We include all valid pre-treatment predictors, but we still remember which features are true effect drivers in the simulation.
Basic Shape And Treatment Imbalance
This summary checks the sample size, number of features, treatment rate, and true effect scale. The treatment rate is intentionally below 50 percent to create a useful X-learner teaching case.
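A minimal sketch of this summary, assuming the simulated frame df from the data section and the all_feature_cols list from the field-dictionary cell:

# Minimal shape-and-imbalance sketch, assuming df and all_feature_cols from above.
print(f"rows: {len(df):,}; features in X: {len(all_feature_cols)}")
print(f"treatment rate: {df['treatment'].mean():.3f}")
print(f"true ATE: {df['true_cate'].mean():.3f}; true CATE std: {df['true_cate'].std():.3f}")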
What this shows: the smaller treated group means the treated outcome surface is harder to learn than the control outcome surface. That imbalance is exactly why the T-learner and X-learner may behave differently.
True CATE Distribution
The true CATE distribution shows the heterogeneity that the learners are trying to recover. This is available only because the data is simulated.
What this shows: the true treatment effect varies meaningfully across units. A meta-learner should be judged on how well it recovers this variation, not only on the overall average.
Raw Treated-Versus-Control Difference
A raw outcome difference ignores both confounding and treatment imbalance. It is useful as a baseline warning, not as a credible causal estimate.
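A minimal sketch of the raw contrast, assuming df from the data section:

# Minimal sketch: naive treated-minus-control outcome gap versus the true ATE.
raw_diff = (
    df.loc[df["treatment"] == 1, "outcome"].mean()
    - df.loc[df["treatment"] == 0, "outcome"].mean()
)
print(f"raw treated-minus-control difference: {raw_diff:.3f}")
print(f"true ATE for comparison: {df['true_cate'].mean():.3f}")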
What this shows: treated rows differ from control rows before treatment. Meta-learners can adjust only through the observed features included in X, so the feature table has to be causally defensible.
Covariate Balance Table
Standardized mean differences show how different treated and untreated groups are before modeling. Large absolute values indicate observed confounding.
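A minimal SMD sketch, assuming df and all_feature_cols from above; it uses the pooled-standard-deviation convention:

# Minimal sketch: standardized mean differences between treated and control rows.
treated = df.loc[df["treatment"] == 1, all_feature_cols]
control = df.loc[df["treatment"] == 0, all_feature_cols]
pooled_std = np.sqrt((treated.var() + control.var()) / 2)
smd = (treated.mean() - control.mean()) / pooled_std
display(smd.abs().sort_values(ascending=False).to_frame("abs_smd"))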
What this shows: the treatment group is not directly comparable to the control group. Outcome-model meta-learners must use these pre-treatment features to model potential outcomes credibly.
Covariate Balance Plot
The plot highlights the most imbalanced features, which are the visible sources of confounding in the teaching data.
What this shows: several features tied to treatment assignment are also tied to outcome and effect heterogeneity. This is why raw comparisons are misleading.
Propensity Overlap
Meta-learners do not always use propensities directly, but overlap still matters. If treated and untreated rows occupy different feature regions, outcome models must extrapolate potential outcomes.
What this shows: treatment is less common overall, but there are still treated and untreated observations across useful propensity regions. That makes this suitable for meta-learner comparison.
Propensity Overlap Plot
The histogram shows true propensity by observed treatment group. In real data, this would use an estimated propensity model.
What this shows: the treated distribution is shifted toward higher propensity, but it is not completely separated from controls. The overlap is usable, though not perfect.
Train And Test Split
The train set is used to fit the learners. The test set is held out for truth-known evaluation of CATE recovery and targeting behavior.
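A minimal sketch, assuming df from above; the 35 percent test fraction is an assumption chosen to match the test-set row count reported later:

# Minimal sketch: stratified split that preserves the treatment imbalance.
train_df, test_df = train_test_split(
    df, test_size=0.35, stratify=df["treatment"], random_state=RANDOM_SEED
)
print(f"train treatment rate: {train_df['treatment'].mean():.3f}")
print(f"test treatment rate: {test_df['treatment'].mean():.3f}")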
What this shows: stratification preserves the treatment imbalance in both splits. This lets the learner comparison focus on modeling strategy rather than a strange train/test split.
Modeling Matrices
EconML’s meta-learners take a single X. Here that matrix contains all valid pre-treatment features, including both effect modifiers and adjustment predictors.
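A minimal sketch of the matrices, assuming the train_df/test_df split above; the variable names match those used by the manual X-learner cell later in the notebook:

# Minimal sketch: one pre-treatment feature matrix per split, plus outcome and treatment arrays.
X_train = train_df[all_feature_cols]
X_test = test_df[all_feature_cols]
Y_train = train_df["outcome"].to_numpy()
T_train = train_df["treatment"].to_numpy()
Y_test = test_df["outcome"].to_numpy()
T_test = test_df["treatment"].to_numpy()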
What this shows: unlike DML estimators, these meta-learners do not receive a separate control matrix. If a pre-treatment feature is needed for adjustment or outcome prediction, it must be included in X.
Outcome Model Diagnostics By Arm
The T- and X-learners rely heavily on separate outcome models for treated and untreated rows. Because treatment is imbalanced, we should check how much data each arm has and how well arm-specific outcome models predict out of fold.
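A minimal out-of-fold diagnostic sketch, assuming the training matrices above; the random-forest settings are assumptions:

# Minimal sketch: out-of-fold outcome RMSE per treatment arm.
for arm, mask in [("treated", T_train == 1), ("control", T_train == 0)]:
    model = RandomForestRegressor(n_estimators=180, min_samples_leaf=20, random_state=RANDOM_SEED, n_jobs=-1)
    oof = cross_val_predict(model, X_train.loc[mask], Y_train[mask], cv=KFold(5, shuffle=True, random_state=RANDOM_SEED))
    rmse = mean_squared_error(Y_train[mask], oof) ** 0.5
    print(f"{arm}: rows={mask.sum()}, out-of-fold RMSE={rmse:.3f}")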
What this shows: the treated arm has fewer rows, so its outcome surface is harder to estimate. This is the T-learner’s main vulnerability and the X-learner’s motivation.
Propensity Model Diagnostic
The X-learner uses a propensity model to combine imputed-effect models. This diagnostic checks whether treatment assignment is predictable from X.
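A minimal sketch of this diagnostic, assuming the training matrices above:

# Minimal sketch: how predictable is treatment assignment from X, out of fold?
prop_clf = RandomForestClassifier(n_estimators=180, min_samples_leaf=20, random_state=RANDOM_SEED, n_jobs=-1)
oof_prop = cross_val_predict(
    prop_clf, X_train, T_train,
    cv=StratifiedKFold(5, shuffle=True, random_state=RANDOM_SEED),
    method="predict_proba",
)[:, 1]
print(f"treatment AUC: {roc_auc_score(T_train, oof_prop):.3f}")
print(f"Brier score: {brier_score_loss(T_train, oof_prop):.3f}")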
What this shows: treatment assignment is predictable from the feature matrix. That is expected because the simulation is observational, and it makes the propensity-weighted X-learner combination meaningful.
Manual S-Learner
Before fitting EconML’s SLearner, we implement the S-learner idea manually:
Add treatment as a feature to the training matrix.
Fit one outcome model.
For each test row, predict twice: once with treatment set to 1 and once with treatment set to 0.
Subtract the two predictions.
This manual version makes the EconML class easier to understand.
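A minimal sketch of those four steps, assuming the matrices above; the gradient-boosting base model is an assumption:

# Minimal manual S-learner sketch: one outcome model with treatment as a feature.
XT_train = X_train.assign(treatment=T_train)
s_model = GradientBoostingRegressor(random_state=RANDOM_SEED)
s_model.fit(XT_train, Y_train)
# Predict each test row twice, toggling treatment, then subtract.
manual_s_cate_test = (
    s_model.predict(X_test.assign(treatment=1)) - s_model.predict(X_test.assign(treatment=0))
)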
What this shows: the S-learner is simple, but its CATE estimates depend on the model learning a useful treatment interaction. If the model treats treatment as a weak predictor, heterogeneity can be muted.
Manual T-Learner
The T-learner fits separate response surfaces for treated and untreated rows. It then subtracts predicted control outcome from predicted treated outcome.
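A minimal sketch, assuming the matrices above; the model settings are assumptions, and the fitted t_model_treated and t_model_control are reused by the manual X-learner cell below:

# Minimal manual T-learner sketch: one outcome model per treatment arm.
t_model_treated = RandomForestRegressor(n_estimators=180, min_samples_leaf=20, random_state=RANDOM_SEED + 1, n_jobs=-1)
t_model_control = RandomForestRegressor(n_estimators=180, min_samples_leaf=20, random_state=RANDOM_SEED + 2, n_jobs=-1)
t_model_treated.fit(X_train.loc[T_train == 1], Y_train[T_train == 1])
t_model_control.fit(X_train.loc[T_train == 0], Y_train[T_train == 0])
manual_t_cate_test = t_model_treated.predict(X_test) - t_model_control.predict(X_test)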
What this shows: the T-learner can represent very different treated and control surfaces, but the smaller treated arm can make the treated model noisier.
Manual X-Learner Components
The X-learner starts from T-learner outcome models, then imputes treatment effects:
for treated rows: observed treated outcome - predicted control outcome;
for control rows: predicted treated outcome - observed control outcome.
Then it fits CATE models to those imputed effects and combines them using propensity weights.
# Impute treatment effects on each observed arm using the manual T-learner outcome models.
treated_train_mask = T_train == 1
control_train_mask = T_train == 0
imputed_effect_treated = Y_train[treated_train_mask] - t_model_control.predict(X_train.loc[treated_train_mask])
imputed_effect_control = t_model_treated.predict(X_train.loc[control_train_mask]) - Y_train[control_train_mask]

x_cate_model_treated = RandomForestRegressor(n_estimators=180, min_samples_leaf=20, random_state=RANDOM_SEED + 3, n_jobs=-1)
x_cate_model_control = RandomForestRegressor(n_estimators=180, min_samples_leaf=20, random_state=RANDOM_SEED + 4, n_jobs=-1)
x_cate_model_treated.fit(X_train.loc[treated_train_mask], imputed_effect_treated)
x_cate_model_control.fit(X_train.loc[control_train_mask], imputed_effect_control)

propensity_model = RandomForestClassifier(n_estimators=180, min_samples_leaf=20, random_state=RANDOM_SEED + 5, n_jobs=-1)
propensity_model.fit(X_train, T_train)
propensity_test = np.clip(propensity_model.predict_proba(X_test)[:, 1], 0.025, 0.975)

# Following the common X-learner weighting: lean more on the CATE model trained from the larger opposite arm.
manual_x_cate_test = (
    propensity_test * x_cate_model_control.predict(X_test)
    + (1 - propensity_test) * x_cate_model_treated.predict(X_test)
)

manual_x_components = pd.DataFrame(
    [
        {"component": "treated-arm imputed effects", "rows": len(imputed_effect_treated), "mean": imputed_effect_treated.mean(), "std": imputed_effect_treated.std()},
        {"component": "control-arm imputed effects", "rows": len(imputed_effect_control), "mean": imputed_effect_control.mean(), "std": imputed_effect_control.std()},
        {"component": "test propensity weights", "rows": len(propensity_test), "mean": propensity_test.mean(), "std": propensity_test.std()},
    ]
)
manual_x_components.to_csv(TABLE_DIR / "06_manual_x_learner_components.csv", index=False)
display(manual_x_components)
component | rows | mean | std
--- | --- | --- | ---
treated-arm imputed effects | 703 | 0.5328 | 1.2822
control-arm imputed effects | 1377 | 0.2607 | 1.1767
test propensity weights | 1120 | 0.3382 | 0.2028
What this shows: the X-learner turns the problem into two imputed-effect regressions. Its appeal is clearest when one arm has many more observations than the other.
Manual X-Learner Summary
Now we evaluate the manual X-learner’s test-set CATE estimates.
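A minimal evaluation sketch, assuming manual_x_cate_test and test_df from the cells above:

# Minimal sketch: grade the manual X-learner against the oracle CATE.
true_cate_test = test_df["true_cate"].to_numpy()
rmse = mean_squared_error(true_cate_test, manual_x_cate_test) ** 0.5
corr = np.corrcoef(true_cate_test, manual_x_cate_test)[0, 1]
print(f"manual X-learner: RMSE vs true CATE = {rmse:.3f}, correlation = {corr:.3f}")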
What this shows: the manual X-learner gives us a transparent benchmark before using EconML’s implementation. The exact result depends heavily on the outcome and imputed-effect models.
Fit EconML SLearner
Now we fit EconML’s SLearner. It implements the same basic idea as the manual S-learner: one outcome model, treatment included internally, and effect estimates from treatment contrasts.
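A minimal fit sketch, assuming the training matrices above; the base model choice is an assumption, while SLearner, fit, and effect are the EconML API:

# Minimal sketch: EconML S-learner with one overall outcome model.
s_learner = SLearner(overall_model=GradientBoostingRegressor(random_state=RANDOM_SEED))
s_learner.fit(Y_train, T_train, X=X_train)
econml_s_cate_test = s_learner.effect(X_test)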
What this shows: EconML’s S-learner returns unit-level CATE estimates from one fitted outcome model. It is usually the simplest meta-learner to implement and explain.
Fit EconML TLearner
The EconML TLearner fits separate models by treatment arm. We pass a single base model template and EconML clones it for each treatment category.
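A minimal fit sketch, assuming the training matrices above; the template model settings are assumptions:

# Minimal sketch: EconML T-learner clones the template model per treatment arm.
t_learner = TLearner(models=RandomForestRegressor(n_estimators=180, min_samples_leaf=20, random_state=RANDOM_SEED, n_jobs=-1))
t_learner.fit(Y_train, T_train, X=X_train)
econml_t_cate_test = t_learner.effect(X_test)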
What this shows: EconML’s T-learner fits arm-specific response surfaces from one cloned template model, so its estimates track the manual T-learner while handling the bookkeeping internally.
Fit EconML XLearner
The EconML XLearner mirrors the manual construction above: arm-specific outcome models, imputed-effect CATE models, and a propensity model that blends them.
What this shows: the X-learner gives another CATE estimate based on imputed-effect learning. It is especially worth comparing when treatment arms are imbalanced.
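For reference, a minimal XLearner fit sketch under the same assumptions; in EconML, cate_models defaults to the outcome models and propensity_model defaults to logistic regression when omitted:

# Minimal sketch: EconML X-learner with explicit outcome and propensity models.
x_learner = XLearner(
    models=RandomForestRegressor(n_estimators=180, min_samples_leaf=20, random_state=RANDOM_SEED, n_jobs=-1),
    propensity_model=RandomForestClassifier(n_estimators=180, min_samples_leaf=20, random_state=RANDOM_SEED, n_jobs=-1),
)
x_learner.fit(Y_train, T_train, X=X_train)
econml_x_cate_test = x_learner.effect(X_test)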
Estimator Comparison
The next table compares raw difference, manual meta-learners, and EconML meta-learners on the same test set.
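A minimal sketch of such a comparison, assuming the CATE arrays from the fitting sketches above and true_cate_test from the manual X-learner summary:

# Minimal sketch: compare each estimator's test-set CATE against the oracle.
rows = []
for name, est in {
    "manual S-learner": manual_s_cate_test,
    "manual T-learner": manual_t_cate_test,
    "manual X-learner": manual_x_cate_test,
    "EconML SLearner": econml_s_cate_test,
    "EconML TLearner": econml_t_cate_test,
    "EconML XLearner": econml_x_cate_test,
}.items():
    rows.append({
        "estimator": name,
        "rmse_vs_true_cate": mean_squared_error(true_cate_test, est) ** 0.5,
        "corr_with_true_cate": np.corrcoef(true_cate_test, est)[0, 1],
        "estimated_ate": est.mean(),
    })
display(pd.DataFrame(rows))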
What this shows: meta-learners should be compared on both average-effect accuracy and CATE ranking quality. The raw difference cannot be evaluated as a CATE model because it returns only one contrast.
CATE Recovery Scatter
This scatter plot compares estimated CATE with known true CATE for the three EconML meta-learners.
What this shows: each learner creates a different shape of CATE estimate. The best visual is not always the smoothest one; the goal is good recovery of true heterogeneity and useful ranking.
Model-Agnostic CATE Sensitivity
EconML meta-learners do not all expose comparable feature importance objects. Instead, we can use a simple model-agnostic diagnostic: permute one feature at a time in the test set and measure how much the CATE predictions change.
This is not causal proof. It is a fitted-model sensitivity diagnostic.
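A minimal sketch of the permutation diagnostic, assuming X_test and a fitted learner from above; the helper name is hypothetical:

# Minimal sketch: mean absolute change in CATE predictions after permuting one feature.
def cate_permutation_sensitivity(predict_fn, X, seed=RANDOM_SEED):
    base = predict_fn(X)
    gen = np.random.default_rng(seed)
    scores = {}
    for col in X.columns:
        X_perm = X.copy()
        X_perm[col] = gen.permutation(X_perm[col].to_numpy())
        scores[col] = float(np.mean(np.abs(predict_fn(X_perm) - base)))
    return pd.Series(scores).sort_values(ascending=False)

display(cate_permutation_sensitivity(x_learner.effect, X_test).head(7))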
What this shows: permutation sensitivity tells us which features the fitted CATE functions rely on most. It is a useful way to compare learners when coefficients are not available.
CATE Sensitivity Plot
The plot compares the top sensitivity features for each learner.
What this shows: different meta-learners may rely on different features even when trained on the same data. That is a reason to compare methods rather than treating one learner as automatically correct.
CATE Decile Calibration
CATE models are often used to rank units by expected benefit. The next table groups test rows by predicted CATE decile for each EconML learner.
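A minimal decile sketch for one learner, assuming the test-set arrays above:

# Minimal sketch: mean estimated versus mean true CATE by predicted-effect decile.
deciles = pd.qcut(econml_x_cate_test, q=10, labels=False, duplicates="drop")
calibration = (
    pd.DataFrame({"decile": deciles, "estimated": econml_x_cate_test, "true": true_cate_test})
    .groupby("decile")[["estimated", "true"]]
    .mean()
)
display(calibration)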
What this shows: if a learner ranks well, higher predicted-effect deciles should have higher average true CATE. This check is central for treatment targeting.
CATE Decile Calibration Plot
The plot compares estimated and true average CATE by predicted-effect decile.
What this shows: decile calibration turns CATE estimation into a ranking check. Even when point estimates are noisy, useful rankings can still support policy decisions.
Segment-Level Recovery
Segment summaries make heterogeneous effects easier to communicate. Here we summarize by high-need segment and region risk.
What this shows: segment summaries are often easier to explain than unit-level scatter plots. They also show whether treatment imbalance is hurting particular groups.
Treatment Arm Support By Segment
Because meta-learners rely on outcome models by treatment arm, segment support matters. This table counts treated and control rows in each important segment.
What this shows: segments with few treated rows are harder for treated-arm outcome models. This support view explains why T-learner estimates can be noisy in some regions.
Targeting Comparison
A common use for CATE estimates is selecting the top fraction of units for treatment. The next cell compares random targeting, each EconML meta-learner, and an oracle benchmark.
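A minimal sketch of one such comparison, assuming the test-set arrays above; the 20 percent budget is a hypothetical choice:

# Minimal sketch: average true benefit among the top-k rows under each targeting rule.
target_fraction = 0.20  # hypothetical treatment budget
k = int(len(true_cate_test) * target_fraction)
rules = {
    "random": rng.permutation(len(true_cate_test))[:k],
    "EconML XLearner": np.argsort(econml_x_cate_test)[::-1][:k],
    "oracle": np.argsort(true_cate_test)[::-1][:k],
}
for name, idx in rules.items():
    print(f"{name}: average true CATE in selected group = {true_cate_test[idx].mean():.3f}")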
What this shows: treatment targeting translates CATE ranking into an operational decision. The oracle benchmark is not achievable in real data, but it tells us how much room remains.
Targeting Plot
This plot compares true average benefit among the selected rows under each rule.
fig, ax = plt.subplots(figsize=(11, 5))
sns.barplot(
    data=targeting_summary,
    x="average_true_cate_in_selected_group",
    y="targeting_rule",
    color="#34d399",
    ax=ax,
)
ax.set_title("True Benefit Among Targeted Test Rows")
ax.set_xlabel("Average True CATE In Selected Group")
ax.set_ylabel("Targeting Rule")
plt.tight_layout()
fig.savefig(FIGURE_DIR / "06_targeting_summary.png", dpi=160, bbox_inches="tight")
plt.show()
What this shows: the best meta-learner depends on what the final use requires. For targeting, ranking quality often matters more than exact point-estimate calibration.
Learner Selection Guidance
This table summarizes when each meta-learner is a reasonable first choice.
selection_guidance = pd.DataFrame(
    [
        {
            "situation": "You need a simple baseline quickly",
            "learner_to_try": "S-learner",
            "why": "One model is easy to fit and explain.",
            "watchout": "It may understate heterogeneity if treatment is not used strongly by the outcome model.",
        },
        {
            "situation": "Treatment and control groups are both large",
            "learner_to_try": "T-learner",
            "why": "Separate response surfaces can capture different outcome patterns by arm.",
            "watchout": "If one arm is small, that arm's model may be unstable.",
        },
        {
            "situation": "Treatment groups are imbalanced",
            "learner_to_try": "X-learner",
            "why": "Imputed-effect models can borrow strength from the larger arm.",
            "watchout": "It has more nuisance components and can be sensitive to propensity quality.",
        },
        {
            "situation": "The goal is robust causal adjustment with explicit nuisance modeling",
            "learner_to_try": "DRLearner or DML estimator",
            "why": "Meta-learners are outcome-model based and may not be enough for complex confounding.",
            "watchout": "Choose the estimator based on the design, not only prediction performance.",
        },
    ]
)
selection_guidance.to_csv(TABLE_DIR / "06_selection_guidance.csv", index=False)
display(selection_guidance)
situation | learner_to_try | why | watchout
--- | --- | --- | ---
You need a simple baseline quickly | S-learner | One model is easy to fit and explain. | It may understate heterogeneity if treatment is not used strongly by the outcome model.
Treatment and control groups are both large | T-learner | Separate response surfaces can capture different outcome patterns by arm. | If one arm is small, that arm's model may be unstable.
Treatment groups are imbalanced | X-learner | Imputed-effect models can borrow strength from the larger arm. | It has more nuisance components and can be sensitive to propensity quality.
The goal is robust causal adjustment with explicit nuisance modeling | DRLearner or DML estimator | Meta-learners are outcome-model based and may not be enough for complex confounding. | Choose the estimator based on the design, not only prediction performance.
What this shows: meta-learners are practical baselines and sometimes strong performers, but they are not automatically superior to DML or DR methods. The data shape and decision goal matter.
Meta-Learner Checklist
Before presenting meta-learner CATE estimates, it is worth checking the items below.
meta_learner_checklist = pd.DataFrame(
    [
        {"check": "Treatment and outcome are clearly defined", "why_it_matters": "Meta-learners estimate a specific treatment contrast."},
        {"check": "All X features are pre-treatment", "why_it_matters": "Post-treatment features can contaminate potential outcome modeling."},
        {"check": "Important confounders are included in X", "why_it_matters": "The EconML meta-learner API has no separate W argument."},
        {"check": "Treatment-arm sample sizes are inspected", "why_it_matters": "T- and X-learners depend on arm-specific outcome models."},
        {"check": "Overlap is inspected", "why_it_matters": "Outcome models extrapolate when treated and control feature support differs."},
        {"check": "Outcome model quality is checked by arm", "why_it_matters": "Poor potential-outcome models produce poor CATE estimates."},
        {"check": "Multiple learners are compared", "why_it_matters": "Different meta-learners can tell different CATE stories."},
        {"check": "CATE rankings are evaluated", "why_it_matters": "Targeting depends on ranking, not only average effect."},
        {"check": "Segment summaries are reported", "why_it_matters": "Segments make unit-level estimates easier to audit and explain."},
    ]
)
meta_learner_checklist.to_csv(TABLE_DIR / "06_meta_learner_checklist.csv", index=False)
display(meta_learner_checklist)
check | why_it_matters
--- | ---
Treatment and outcome are clearly defined | Meta-learners estimate a specific treatment contrast.
All X features are pre-treatment | Post-treatment features can contaminate potential outcome modeling.
Important confounders are included in X | The EconML meta-learner API has no separate W argument.
Treatment-arm sample sizes are inspected | T- and X-learners depend on arm-specific outcome models.
Overlap is inspected | Outcome models extrapolate when treated and control feature support differs.
Outcome model quality is checked by arm | Poor potential-outcome models produce poor CATE estimates.
Multiple learners are compared | Different meta-learners can tell different CATE stories.
CATE rankings are evaluated | Targeting depends on ranking, not only average effect.
Segment summaries are reported | Segments make unit-level estimates easier to audit and explain.
What this shows: meta-learners are easy to fit, but credible use still requires causal design checks, outcome-model diagnostics, and support-aware reporting.
Summary
This notebook compared S-, T-, and X-learners for heterogeneous treatment effects.
The main takeaways are:
S-learners fit one outcome model and estimate effects by toggling treatment;
T-learners fit separate outcome models for treated and untreated rows;
X-learners impute treatment effects and can help when treatment arms are imbalanced;
EconML meta-learners use one X matrix, so valid pre-treatment adjustment features must be included there;
treatment imbalance makes arm-specific diagnostics important;
CATE recovery should be evaluated through RMSE, correlation, decile calibration, segment summaries, and targeting performance;
meta-learners are excellent baselines and teaching tools, but DML and DR learners may be better when adjustment structure is central.
The next tutorial can move from estimating CATE to using CATE estimates for policy learning and treatment targeting.