This notebook introduces CausalForestDML, EconML’s forest-based estimator for heterogeneous treatment effects.
The earlier notebooks focused on linear final-stage CATE models. A linear CATE model is readable, but it can miss nonlinear patterns such as thresholds, saturation, and feature interactions. Causal forests are useful when the causal question is still the same but the heterogeneity surface is too complex for a linear final stage:
For each unit, how much would the outcome change under treatment, and how does that effect vary across pre-treatment features in a nonlinear way?
This lesson uses simulated data with known nonlinear treatment effects. That gives us a clean teaching loop: fit a causal forest, estimate CATE values and intervals, inspect feature importance, summarize segments, and check whether the forest learns the true effect ranking.
Learning Goals
By the end of this notebook, you should be able to:
explain when CausalForestDML is preferable to a linear CATE model;
define X, W, treatment, and outcome for a causal forest workflow;
fit CausalForestDML with flexible nuisance models;
compute CATE estimates, ATE estimates, and uncertainty intervals;
inspect forest feature importance without treating it as a causal assumption check;
compare causal-forest CATE recovery against a linear DML baseline;
summarize heterogeneous effects by segments, deciles, and targeted groups;
diagnose overlap, interval width, support, and ranking quality.
Why Causal Forests Are Different
CausalForestDML still follows the DML idea: nuisance models adjust for baseline outcome and treatment assignment, then a final treatment-effect model estimates heterogeneity.
The key difference is the final CATE model. Instead of estimating one linear equation over X, a causal forest estimates local treatment effects by building many honest trees. These trees split the feature space to find regions where treatment effects differ.
Important consequences:
CATE estimates can be nonlinear in X.
Interactions can be learned without manually writing interaction terms.
Feature importance replaces a simple coefficient table.
Estimates can be noisier in small or weak-overlap regions.
The model is less compact to explain than LinearDML, so diagnostics and segment summaries become more important.
Tutorial Flow
The notebook follows this sequence:
Create a nonlinear, confounded, truth-known dataset.
Check raw bias, covariate imbalance, and propensity overlap.
Define X and W roles for the forest.
Fit a linear DML baseline for comparison.
Fit CausalForestDML with inference enabled.
Compare CATE recovery, ATE error, and ranking quality.
Inspect feature importance and nonlinear effect slices.
Study uncertainty intervals and interval width drivers.
Summarize segment effects and targeting behavior.
Close with a practical causal-forest checklist.
Setup
This cell imports the packages used in the lesson, creates output folders, fixes a random seed, and checks that EconML is available. The warning filters keep the notebook readable while still allowing real execution errors to surface.
from pathlib import Path
import os
import warnings
import importlib.metadata as importlib_metadata

# Keep Matplotlib cache files in a writable location during notebook execution.
os.environ.setdefault("MPLCONFIGDIR", "/tmp/matplotlib-ranking-sys")

warnings.filterwarnings("default")
warnings.filterwarnings("ignore", category=DeprecationWarning)
warnings.filterwarnings("ignore", category=PendingDeprecationWarning)
warnings.filterwarnings("ignore", category=FutureWarning)
warnings.filterwarnings("ignore", message=".*IProgress not found.*")
warnings.filterwarnings("ignore", message=".*X does not have valid feature names.*")
warnings.filterwarnings("ignore", message=".*The final model has a nonzero intercept.*")
warnings.filterwarnings("ignore", message=".*Co-variance matrix is underdetermined.*")
warnings.filterwarnings("ignore", module="sklearn.linear_model._logistic")

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
from IPython.display import display
from sklearn.ensemble import RandomForestClassifier, RandomForestRegressor
from sklearn.metrics import brier_score_loss, log_loss, mean_squared_error, roc_auc_score
from sklearn.model_selection import KFold, StratifiedKFold, cross_val_predict, train_test_split

try:
    import econml
    from econml.dml import CausalForestDML, LinearDML

    ECONML_AVAILABLE = True
    ECONML_VERSION = getattr(econml, "__version__", "unknown")
except Exception as exc:
    ECONML_AVAILABLE = False
    ECONML_VERSION = f"import failed: {type(exc).__name__}: {exc}"

RANDOM_SEED = 2026
rng = np.random.default_rng(RANDOM_SEED)

OUTPUT_DIR = Path("outputs")
FIGURE_DIR = OUTPUT_DIR / "figures"
TABLE_DIR = OUTPUT_DIR / "tables"
FIGURE_DIR.mkdir(parents=True, exist_ok=True)
TABLE_DIR.mkdir(parents=True, exist_ok=True)

sns.set_theme(style="whitegrid", context="notebook")
pd.set_option("display.max_columns", 140)
pd.set_option("display.float_format", lambda value: f"{value:,.4f}")

print(f"EconML available: {ECONML_AVAILABLE}")
print(f"EconML version: {ECONML_VERSION}")
print(f"Figures will be saved to: {FIGURE_DIR.resolve()}")
print(f"Tables will be saved to: {TABLE_DIR.resolve()}")
EconML available: True
EconML version: 0.16.0
Figures will be saved to: /home/apex/Documents/ranking_sys/notebooks/tutorials/econml/outputs/figures
Tables will be saved to: /home/apex/Documents/ranking_sys/notebooks/tutorials/econml/outputs/tables
What this shows: the notebook is ready if EconML imports successfully. The output folders are shared across the EconML tutorial series, with filenames prefixed by 04_ for this lesson.
Estimator Map
Before generating data, it helps to state what the causal forest is meant to add beyond the previous linear lessons.
estimator_map = pd.DataFrame(
    [
        {
            "estimator": "LinearDML",
            "final CATE model": "Linear function of X",
            "best fit": "Readable effect drivers when heterogeneity is roughly linear",
            "output style": "Coefficients plus unit-level CATE estimates",
            "main limitation": "Can miss thresholds, saturation, and interactions unless manually engineered",
        },
        {
            "estimator": "CausalForestDML",
            "final CATE model": "Forest-based local treatment-effect model",
            "best fit": "Nonlinear heterogeneity, interactions, and segment discovery",
            "output style": "CATE estimates, intervals, feature importance, segment summaries",
            "main limitation": "Less compact than a coefficient table and more sensitive to support in small regions",
        },
    ]
)
estimator_map.to_csv(TABLE_DIR / "04_estimator_map.csv", index=False)
display(estimator_map)
  estimator        | final CATE model                          | best fit                                                     | output style                                                     | main limitation
0 LinearDML        | Linear function of X                      | Readable effect drivers when heterogeneity is roughly linear | Coefficients plus unit-level CATE estimates                      | Can miss thresholds, saturation, and interactions unless manually engineered
1 CausalForestDML  | Forest-based local treatment-effect model | Nonlinear heterogeneity, interactions, and segment discovery | CATE estimates, intervals, feature importance, segment summaries | Less compact than a coefficient table and more sensitive to support in small regions
What this shows: the causal forest is not a replacement for causal design. It is a more flexible final-stage CATE model inside the same broad DML workflow.
Nonlinear Teaching Data
The next cell creates a dataset with observed confounding and a nonlinear true treatment effect. The true CATE includes:
a threshold effect for high baseline need;
a smooth nonlinear effect of prior engagement;
a friction penalty that becomes stronger at high friction;
an interaction between novelty affinity and baseline need;
a binary segment penalty for region risk.
These patterns are intentionally hard for a simple linear CATE model to capture fully.
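The generator itself is not shown in this excerpt, so here is a minimal self-contained sketch of how a dataset with those five patterns could be built. The function name `simulate_teaching_data`, the coefficients, and the exact functional forms are illustrative assumptions, not the notebook's actual generator; the column names follow the field dictionary below.

```python
import numpy as np
import pandas as pd

def simulate_teaching_data(n=5000, seed=2026):
    """Sketch of a confounded dataset with a nonlinear true CATE (illustrative only)."""
    rng = np.random.default_rng(seed)
    df = pd.DataFrame({
        "baseline_need": rng.uniform(0, 1, n),
        "prior_engagement": rng.uniform(0, 1, n),
        "friction_score": rng.uniform(0, 1, n),
        "novelty_affinity": rng.uniform(0, 1, n),
        "region_risk": rng.integers(0, 2, n),
    })
    df["high_need_segment"] = (df["baseline_need"] > 0.7).astype(int)
    # True CATE combines the five patterns listed above (coefficients are assumptions).
    df["true_cate"] = (
        0.8 * df["high_need_segment"]                                # threshold effect for high baseline need
        + 0.5 * np.sin(np.pi * df["prior_engagement"])               # smooth nonlinear engagement term
        - 0.9 * np.clip(df["friction_score"] - 0.5, 0, None) ** 2    # penalty that strengthens at high friction
        + 0.4 * df["novelty_affinity"] * df["baseline_need"]         # novelty-by-need interaction
        - 0.3 * df["region_risk"]                                    # binary segment penalty
    )
    # Confounded assignment: treatment probability depends on observed covariates.
    logits = 1.5 * df["baseline_need"] - 1.0 * df["friction_score"] - 0.2
    df["propensity"] = 1 / (1 + np.exp(-logits))
    df["treatment"] = rng.binomial(1, df["propensity"])
    df["outcome"] = (
        2.0 * df["baseline_need"]
        + df["treatment"] * df["true_cate"]
        + rng.normal(0, 0.5, n)
    )
    return df

teaching_df = simulate_teaching_data()
```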
What this shows: we now have a CATE surface with thresholds, smooth nonlinear terms, and interactions. A causal forest should have an advantage over a purely linear final-stage model in this setting.
Field Dictionary
A data dictionary prevents leakage and clarifies feature roles. The oracle columns are included only because this is a simulation; they must not be used as model inputs.
effect_modifier_cols = [
    "baseline_need",
    "prior_engagement",
    "friction_score",
    "novelty_affinity",
    "price_sensitivity",
    "content_depth",
    "recency_gap",
    "region_risk",
    "high_need_segment",
]
control_cols = ["account_tenure", "seasonality_index", "device_stability", "traffic_intensity"]
all_observed_covariates = effect_modifier_cols + control_cols
true_driver_cols = [
    "baseline_need",
    "prior_engagement",
    "friction_score",
    "novelty_affinity",
    "price_sensitivity",
    "content_depth",
    "region_risk",
    "high_need_segment",
]

field_rows = []
for col in effect_modifier_cols:
    field_rows.append(
        {
            "column": col,
            "role": "X effect modifier",
            "observed_in_real_analysis": "yes",
            "description": "Pre-treatment feature allowed to shape the forest CATE function.",
            "true_cate_driver": "yes" if col in true_driver_cols else "no",
        }
    )
for col in control_cols:
    field_rows.append(
        {
            "column": col,
            "role": "W control",
            "observed_in_real_analysis": "yes",
            "description": "Pre-treatment adjustment feature used in nuisance models.",
            "true_cate_driver": "no",
        }
    )
for col, role, description in [
    ("treatment", "treatment", "Binary intervention indicator."),
    ("outcome", "outcome", "Observed post-treatment outcome."),
    ("propensity", "oracle", "True treatment probability from the simulated assignment process."),
    ("true_cate", "oracle", "Known individual treatment effect used only for tutorial evaluation."),
    ("baseline_outcome_mean", "oracle", "Mean untreated outcome component before random noise."),
]:
    field_rows.append(
        {
            "column": col,
            "role": role,
            "observed_in_real_analysis": "yes" if role in ["treatment", "outcome"] else "no",
            "description": description,
            "true_cate_driver": "not applicable",
        }
    )

field_dictionary = pd.DataFrame(field_rows)
field_dictionary.to_csv(TABLE_DIR / "04_field_dictionary.csv", index=False)
display(field_dictionary)
   column                | role              | observed_in_real_analysis | description                                                          | true_cate_driver
0  baseline_need         | X effect modifier | yes                       | Pre-treatment feature allowed to shape the forest CATE function.     | yes
1  prior_engagement      | X effect modifier | yes                       | Pre-treatment feature allowed to shape the forest CATE function.     | yes
2  friction_score        | X effect modifier | yes                       | Pre-treatment feature allowed to shape the forest CATE function.     | yes
3  novelty_affinity      | X effect modifier | yes                       | Pre-treatment feature allowed to shape the forest CATE function.     | yes
4  price_sensitivity     | X effect modifier | yes                       | Pre-treatment feature allowed to shape the forest CATE function.     | yes
5  content_depth         | X effect modifier | yes                       | Pre-treatment feature allowed to shape the forest CATE function.     | yes
6  recency_gap           | X effect modifier | yes                       | Pre-treatment feature allowed to shape the forest CATE function.     | no
7  region_risk           | X effect modifier | yes                       | Pre-treatment feature allowed to shape the forest CATE function.     | yes
8  high_need_segment     | X effect modifier | yes                       | Pre-treatment feature allowed to shape the forest CATE function.     | yes
9  account_tenure        | W control         | yes                       | Pre-treatment adjustment feature used in nuisance models.            | no
10 seasonality_index     | W control         | yes                       | Pre-treatment adjustment feature used in nuisance models.            | no
11 device_stability      | W control         | yes                       | Pre-treatment adjustment feature used in nuisance models.            | no
12 traffic_intensity     | W control         | yes                       | Pre-treatment adjustment feature used in nuisance models.            | no
13 treatment             | treatment         | yes                       | Binary intervention indicator.                                       | not applicable
14 outcome               | outcome           | yes                       | Observed post-treatment outcome.                                     | not applicable
15 propensity            | oracle            | no                        | True treatment probability from the simulated assignment process.    | not applicable
16 true_cate             | oracle            | no                        | Known individual treatment effect used only for tutorial evaluation. | not applicable
17 baseline_outcome_mean | oracle            | no                        | Mean untreated outcome component before random noise.                | not applicable
What this shows: the forest gets one intentionally irrelevant effect modifier, recency_gap, plus several true drivers. Feature importance later should help separate the stronger CATE drivers from weaker or irrelevant dimensions.
Basic Shape And True Effect Scale
This summary tells us how large the dataset is, how common treatment is, and how much true treatment-effect variation exists.
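The summary cell is not shown in this excerpt; a minimal stand-in sketch is below. It assumes only a frame with `treatment` and `true_cate` columns (the toy frame here is a placeholder for the notebook's `teaching_df`).

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the notebook's teaching_df; any frame with
# "treatment" and "true_cate" columns supports the same summary.
rng = np.random.default_rng(0)
df = pd.DataFrame({
    "treatment": rng.binomial(1, 0.4, 1000),
    "true_cate": rng.normal(0.5, 0.3, 1000),
})

shape_summary = pd.Series({
    "rows": len(df),
    "treated_share": df["treatment"].mean(),        # how common treatment is
    "true_ate": df["true_cate"].mean(),             # average true effect
    "true_cate_sd": df["true_cate"].std(),          # how much effect variation exists
    "true_cate_p10": df["true_cate"].quantile(0.10),
    "true_cate_p90": df["true_cate"].quantile(0.90),
})
print(shape_summary)
```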
What this shows: there is meaningful CATE variation and the treatment rate is not extreme. That makes the dataset appropriate for a causal-forest teaching example.
True CATE Distribution
Before fitting a model, we visualize the true treatment-effect distribution. In real data this plot is impossible, but it is useful here because the simulation lets us see what the model is trying to recover.
fig, ax = plt.subplots(figsize=(10, 5))
sns.histplot(teaching_df["true_cate"], bins=45, kde=True, color="#2563eb", ax=ax)
ax.axvline(teaching_df["true_cate"].mean(), color="#dc2626", linewidth=2, label="true ATE")
ax.set_title("True CATE Distribution In The Teaching Data")
ax.set_xlabel("True CATE")
ax.set_ylabel("Rows")
ax.legend()
plt.tight_layout()
fig.savefig(FIGURE_DIR / "04_true_cate_distribution.png", dpi=160, bbox_inches="tight")
plt.show()
What this shows: the effect distribution includes both high-benefit and lower-benefit units. A useful CATE model should rank those units well, not merely estimate one average effect.
Raw Treated-Versus-Control Difference
A raw outcome difference is a useful warning label. It shows what we would get if we ignored confounding and heterogeneity.
What this shows: treated and untreated rows differ in observed covariates and in average true CATE. This is why a flexible CATE model still needs careful nuisance adjustment.
Covariate Balance Table
The standardized mean difference measures pre-treatment imbalance. Large absolute values mean treated and untreated rows differ before treatment.
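The balance cell is omitted from this excerpt, so here is a self-contained sketch of the standardized mean difference computation on toy data; the helper name `standardized_mean_difference` and the toy columns are assumptions, but the formula (difference in group means over the pooled standard deviation) is the standard one.

```python
import numpy as np
import pandas as pd

# Toy frame standing in for the notebook's teaching data: assignment
# depends on baseline_need but not on friction_score.
rng = np.random.default_rng(1)
df = pd.DataFrame({
    "baseline_need": rng.uniform(0, 1, 2000),
    "friction_score": rng.uniform(0, 1, 2000),
})
df["treatment"] = rng.binomial(1, 0.2 + 0.5 * df["baseline_need"])

def standardized_mean_difference(frame, col, treat_col="treatment"):
    """(treated mean - control mean) / pooled standard deviation."""
    treated = frame.loc[frame[treat_col] == 1, col]
    control = frame.loc[frame[treat_col] == 0, col]
    pooled_sd = np.sqrt((treated.var() + control.var()) / 2)
    return (treated.mean() - control.mean()) / pooled_sd

smd_table = pd.DataFrame({
    "feature": ["baseline_need", "friction_score"],
    "smd": [standardized_mean_difference(df, c) for c in ["baseline_need", "friction_score"]],
})
print(smd_table)
```

A feature that drives assignment shows a large absolute SMD, while an irrelevant one sits near zero.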
What this shows: the treatment process is observably confounded. A causal forest estimates heterogeneous effects after DML adjustment; it is not just a predictive forest on raw outcomes.
Covariate Balance Plot
This plot highlights the most imbalanced pre-treatment features. It is a quick visual diagnostic of why adjustment is necessary.
What this shows: several CATE-relevant features are also treatment-assignment predictors. That makes the combination of DML adjustment and flexible heterogeneity modeling useful.
Propensity Overlap
Overlap means comparable units have some chance of being treated and some chance of being untreated. Causal forests can become unstable in regions with weak overlap because the model has little local contrast to learn from.
What this shows: most observations are in non-extreme propensity regions, which is helpful. The bucket summary also shows that propensity regions differ in average true effect, which makes naive comparisons especially risky.
Propensity Overlap Plot
The histogram below shows the true propensity distribution by observed treatment group. In real data, this would be based on an estimated propensity model.
What this shows: the groups overlap but are shifted. That is a good teaching case: enough support for estimation, but enough confounding that raw comparisons are not credible.
X And W Roles
For CausalForestDML, X contains the features that define the CATE surface. The forest splits over X to find regions with different treatment effects.
W contains additional pre-treatment controls used in nuisance models. These controls help adjust for confounding but are not used as the main axes of the final forest CATE surface.
role_table = pd.DataFrame(
    [
        {
            "feature": col,
            "econml_role": "X",
            "true_cate_driver": col in true_driver_cols,
            "reason": "Candidate effect modifier used by the causal forest CATE model.",
        }
        for col in effect_modifier_cols
    ]
    + [
        {
            "feature": col,
            "econml_role": "W",
            "true_cate_driver": False,
            "reason": "Adjustment control used by nuisance models.",
        }
        for col in control_cols
    ]
)
role_table.to_csv(TABLE_DIR / "04_x_w_role_table.csv", index=False)
display(role_table)
   feature           | econml_role | true_cate_driver | reason
0  baseline_need     | X           | True             | Candidate effect modifier used by the causal forest CATE model.
1  prior_engagement  | X           | True             | Candidate effect modifier used by the causal forest CATE model.
2  friction_score    | X           | True             | Candidate effect modifier used by the causal forest CATE model.
3  novelty_affinity  | X           | True             | Candidate effect modifier used by the causal forest CATE model.
4  price_sensitivity | X           | True             | Candidate effect modifier used by the causal forest CATE model.
5  content_depth     | X           | True             | Candidate effect modifier used by the causal forest CATE model.
6  recency_gap       | X           | False            | Candidate effect modifier used by the causal forest CATE model.
7  region_risk       | X           | True             | Candidate effect modifier used by the causal forest CATE model.
8  high_need_segment | X           | True             | Candidate effect modifier used by the causal forest CATE model.
9  account_tenure    | W           | False            | Adjustment control used by nuisance models.
10 seasonality_index | W           | False            | Adjustment control used by nuisance models.
11 device_stability  | W           | False            | Adjustment control used by nuisance models.
12 traffic_intensity | W           | False            | Adjustment control used by nuisance models.
What this shows: choosing X is a substantive decision. The forest cannot discover heterogeneity along a feature that is excluded from X, even if that feature is included in W for adjustment.
Train And Test Split
The train set is used for model fitting. The test set is used for truth-known checks of CATE recovery, ranking, and interval behavior.
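The split cell is not shown here; a minimal sketch on toy data is below. Stratifying on treatment (an assumption about the notebook's setup, but standard practice) keeps the treated share similar across splits.

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split

# Hypothetical frame; in the notebook the split would apply to teaching_df.
rng = np.random.default_rng(2)
df = pd.DataFrame({
    "x": rng.normal(size=1000),
    "treatment": rng.binomial(1, 0.4, 1000),
})

# Stratify on treatment so train and test keep a similar treated share.
train_df, test_df = train_test_split(
    df, test_size=0.3, stratify=df["treatment"], random_state=2026
)
print(len(train_df), len(test_df))
print(train_df["treatment"].mean().round(3), test_df["treatment"].mean().round(3))
```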
What this shows: the forest receives a compact set of effect modifiers. The nuisance models still receive enough pre-treatment information to adjust for treatment assignment and baseline outcome structure.
Separate Nuisance Diagnostics
EconML fits nuisance models internally, but a separate out-of-fold diagnostic pass helps us understand the assignment and outcome prediction problem.
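The diagnostic cell is omitted from this excerpt; the sketch below shows the out-of-fold pattern on toy confounded data. The model choices and thresholds are illustrative assumptions: a propensity AUC well above 0.5 signals observed confounding, and an outcome RMSE below the mean-only baseline shows structure worth removing.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier, RandomForestRegressor
from sklearn.metrics import mean_squared_error, roc_auc_score
from sklearn.model_selection import KFold, StratifiedKFold, cross_val_predict

# Toy confounded data standing in for the notebook's train split.
rng = np.random.default_rng(3)
X = rng.normal(size=(1500, 4))
T = rng.binomial(1, 1 / (1 + np.exp(-1.5 * X[:, 0])))        # assignment depends on X[:, 0]
Y = 2.0 * X[:, 0] + 0.5 * T + rng.normal(0, 0.5, size=1500)  # outcome shares that structure

# Out-of-fold propensity predictions: AUC well above 0.5 confirms observed confounding.
prop_model = RandomForestClassifier(n_estimators=100, min_samples_leaf=20, random_state=0)
oof_prop = cross_val_predict(
    prop_model, X, T,
    cv=StratifiedKFold(5, shuffle=True, random_state=0),
    method="predict_proba",
)[:, 1]
propensity_auc = roc_auc_score(T, oof_prop)

# Out-of-fold outcome predictions: beating a mean-only baseline shows predictive structure.
outcome_model = RandomForestRegressor(n_estimators=100, min_samples_leaf=20, random_state=0)
oof_outcome = cross_val_predict(outcome_model, X, Y, cv=KFold(5, shuffle=True, random_state=0))
outcome_rmse = mean_squared_error(Y, oof_outcome) ** 0.5

print(f"propensity AUC: {propensity_auc:.3f}")
print(f"outcome RMSE: {outcome_rmse:.3f} (mean-only baseline: {Y.std():.3f})")
```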
What this shows: treatment is predictable from covariates, which confirms observed confounding. The outcome nuisance model also has meaningful predictive structure to remove before the final CATE stage.
Fit A LinearDML Baseline
A baseline model makes the causal forest easier to judge. Since the true CATE is nonlinear, a linear final-stage model should be useful but limited.
What this shows: the linear baseline gives us a reference point. If the causal forest is useful here, it should improve CATE recovery or ranking because the true effect surface is nonlinear.
Fit CausalForestDML
Now we fit the causal forest. Key parameters in this teaching setup:
n_estimators: number of trees; more trees reduce Monte Carlo noise.
min_samples_leaf: minimum local sample size in leaves; larger values smooth estimates.
max_samples: subsample fraction for honest forests; must stay below 0.5 when inference is enabled.
honest=True: separates splitting and estimation samples within trees.
What this shows: CausalForestDML returns unit-level CATE estimates and uncertainty intervals. The ATE from the estimator and the mean of unit-level CATE estimates should be close but are reported separately for clarity.
Compare Linear And Forest Metrics
The next table compares the raw difference, the linear DML baseline, and the causal forest on the test set.
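The comparison cell is not shown in this excerpt; the sketch below computes the three headline metrics (ATE absolute error, CATE RMSE, Spearman rank correlation) on illustrative vectors. The toy vectors simulate a baseline that compresses heterogeneity versus a forest that tracks it with extra noise; in the notebook the inputs would come from the fitted models and the oracle `true_cate` column.

```python
import numpy as np
import pandas as pd

# Illustrative CATE vectors (assumptions, not model output).
rng = np.random.default_rng(6)
true_cate = rng.normal(1.0, 0.5, 800)
linear_cate = true_cate.mean() + 0.4 * (true_cate - true_cate.mean()) + rng.normal(0, 0.2, 800)
forest_cate = true_cate + rng.normal(0, 0.2, 800)

def cate_metrics(name, estimate, truth):
    """Average-effect error, unit-level recovery error, and ranking quality."""
    return {
        "model": name,
        "ate_abs_error": abs(estimate.mean() - truth.mean()),
        "cate_rmse": float(np.sqrt(np.mean((estimate - truth) ** 2))),
        "rank_corr": float(pd.Series(estimate).corr(pd.Series(truth), method="spearman")),
    }

metrics = pd.DataFrame([
    cate_metrics("LinearDML", linear_cate, true_cate),
    cate_metrics("CausalForestDML", forest_cate, true_cate),
])
print(metrics)
```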
What this shows: the forest should be judged by CATE recovery and ranking, not only by average-effect error. A flexible model is most valuable when it improves heterogeneity estimates.
CATE Recovery Scatter
The scatter plot compares estimated CATE with true CATE. The dashed diagonal line marks perfect recovery.
What this shows: the forest has room to learn curved patterns that the linear baseline compresses. The scatter is still noisy because treatment-effect estimation is harder than outcome prediction.
Causal Forest Feature Importance
feature_importances_ summarizes which X features the forest uses most when splitting the CATE surface.
This is not proof of causality. Feature importance is a model diagnostic: it helps explain what the fitted forest relied on, assuming the causal design is already reasonable.
forest_importance = pd.DataFrame(
    {
        "feature": effect_modifier_cols,
        "importance": np.ravel(causal_forest.feature_importances_),
        "true_cate_driver": [col in true_driver_cols for col in effect_modifier_cols],
    }
).sort_values("importance", ascending=False)
forest_importance.to_csv(TABLE_DIR / "04_causal_forest_feature_importance.csv", index=False)
display(forest_importance)
  feature           | importance | true_cate_driver
2 friction_score    |     0.4603 | True
0 baseline_need     |     0.1864 | True
3 novelty_affinity  |     0.1345 | True
5 content_depth     |     0.1089 | True
4 price_sensitivity |     0.0516 | True
6 recency_gap       |     0.0287 | False
1 prior_engagement  |     0.0219 | True
7 region_risk       |     0.0066 | True
8 high_need_segment |     0.0011 | True
What this shows: high-importance features are the dimensions the forest used most to partition treatment effects. A low-importance feature may still matter in a narrow region, but the table is a useful first summary.
Feature Importance Plot
The plot makes the importance ranking easier to scan and highlights whether the forest is emphasizing true CATE drivers in the simulation.
What this shows: feature importance helps turn a flexible forest into a readable summary. It should be paired with segment and calibration checks rather than read alone.
CATE Decile Calibration
CATE models are often used to rank units. The next table groups test rows into predicted CATE deciles and compares estimated and true average CATE.
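The decile cell is omitted from this excerpt; the grouping logic can be sketched as below. The toy `estimated_cate` column is an assumption standing in for the forest's test-set predictions.

```python
import numpy as np
import pandas as pd

# Toy truth and a noisy estimate standing in for forest output on the test set.
rng = np.random.default_rng(7)
df = pd.DataFrame({"true_cate": rng.normal(1.0, 0.5, 1000)})
df["estimated_cate"] = df["true_cate"] + rng.normal(0, 0.3, 1000)

# Bin rows into predicted-effect deciles (1 = lowest predicted CATE, 10 = highest).
df["cate_decile"] = pd.qcut(df["estimated_cate"], 10, labels=False) + 1

decile_table = df.groupby("cate_decile").agg(
    mean_estimated_cate=("estimated_cate", "mean"),
    mean_true_cate=("true_cate", "mean"),
    rows=("true_cate", "size"),
).reset_index()
print(decile_table)
```

If the ranking is useful, `mean_true_cate` should rise with the decile number, not just `mean_estimated_cate`.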
What this shows: if predicted CATE ranking is useful, higher predicted deciles should also have higher true CATE in this simulation. This is one of the most practical checks for treatment targeting.
CATE Decile Calibration Plot
The plot compares estimated and true average CATE by predicted-effect decile for both models.
What this shows: decile calibration connects model output to decision-making. The forest is valuable if its ranking separates higher-benefit and lower-benefit rows more clearly.
Segment-Level CATE Recovery
Segment summaries are useful when a forest is too flexible to explain with coefficients. Here we summarize by high-need segment and region risk.
What this shows: segment summaries make flexible CATE estimates more communicable. The interval-width column also shows whether some segments are estimated with more uncertainty than others.
Segment Recovery Plot
This plot compares true and estimated segment-level effects.
What this shows: the forest can be summarized at the segment level even though it estimates effects continuously over X. This is often the easiest way to explain results to non-modeling audiences.
Nonlinear Effect Slice: Baseline Need
A causal forest can learn nonlinear effect variation. To visualize that, we group rows by quantiles of baseline_need and compare true and estimated average CATE.
What this shows: slicing by an important modifier lets us inspect the shape of heterogeneity. Because the true CATE has a threshold component in baseline need, a forest should track a nonlinear bend better than a simple linear final stage.
Baseline Need Slice Plot
This plot shows how average treatment effect changes across baseline-need buckets.
What this shows: the estimated averages should bend upward once baseline need crosses the threshold region, rather than rising along a straight line.
Nonlinear Effect Slice: Friction
The same slicing idea applies to friction_score: group rows by friction quantiles and compare true and estimated average CATE. This table checks whether estimated effects drop as friction increases. The forest is expected to capture this shape more naturally than a linear model.
Friction Slice Plot
The friction plot shows estimated and true average CATE across friction buckets.
What this shows: the vertical zero line marks where the true friction penalty begins to behave differently. This is the kind of shape a forest can learn without manual feature engineering.
Uncertainty Interval Summary
Causal forests can produce unit-level effect intervals. The next cell summarizes interval width and simple truth-known coverage.
Coverage is available only because this is simulated data. In real data, interval width is still useful as a rough uncertainty diagnostic.
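The summary cell is not shown in this excerpt; the sketch below computes width, zero-crossing share, and truth-known coverage from illustrative interval bounds. The constant half-width is an assumption; in the notebook the bounds would come from `effect_interval`.

```python
import numpy as np
import pandas as pd

# Illustrative estimates and 90% intervals (half-width is an assumption;
# a fitted forest reports unit-specific bounds instead).
rng = np.random.default_rng(8)
true_cate = rng.normal(1.0, 0.5, 1000)
estimate = true_cate + rng.normal(0, 0.3, 1000)
half_width = 1.645 * 0.3
lower, upper = estimate - half_width, estimate + half_width

interval_summary = pd.Series({
    "mean_width": float(np.mean(upper - lower)),
    "share_crossing_zero": float(np.mean((lower < 0) & (upper > 0))),
    # Coverage is only computable here because the simulation exposes true_cate.
    "truth_coverage": float(np.mean((true_cate >= lower) & (true_cate <= upper))),
})
print(interval_summary)
```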
What this shows: point estimates and uncertainty intervals answer different questions. A positive point estimate may still have an interval that crosses zero, especially in weak-support or noisy regions.
Interval Width Drivers
Intervals tend to widen when the model has less local information. The next cell relates interval width to propensity, treatment status, and key features.
What this shows: interval width can be used as a support diagnostic. Stronger width in extreme propensity or feature regions suggests the forest is less certain where comparable examples are thinner.
Interval Width Plot
The scatter plot shows how interval width changes with propensity extremeness. Extreme propensity values often mean fewer comparable treated and untreated observations.
What this shows: uncertainty is not evenly distributed. When using CATE estimates for decisions, high predicted benefit should be weighed against uncertainty and support.
Targeting Comparison
A common use for CATE estimates is selecting a top fraction of units for treatment. The next cell compares random targeting, linear-DML targeting, causal-forest targeting, and an oracle benchmark.
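The targeting cell is omitted from this excerpt; the selection logic can be sketched as below on toy vectors. The score vectors, interval widths, and the 20% fraction are illustrative assumptions; the oracle rule ranks by the true effect, which is only possible in simulation.

```python
import numpy as np
import pandas as pd

# Toy truth, a noisy forest-style estimate, and a variable-width lower bound.
rng = np.random.default_rng(9)
n = 2000
true_cate = rng.normal(1.0, 0.5, n)
forest_cate = true_cate + rng.normal(0, 0.25, n)
lower_bound = forest_cate - rng.uniform(0.2, 1.0, n)  # penalizes uncertain units

top_fraction = 0.2
k = int(top_fraction * n)

def selected_true_benefit(scores):
    """Mean true CATE among the top-k units ranked by the given score."""
    chosen = np.argsort(scores)[-k:]
    return float(true_cate[chosen].mean())

targeting_summary = pd.DataFrame({
    "targeting_rule": ["random", "forest point estimate", "forest lower interval", "oracle"],
    "average_true_cate_in_selected_group": [
        float(true_cate[rng.choice(n, k, replace=False)].mean()),
        selected_true_benefit(forest_cate),
        selected_true_benefit(lower_bound),
        selected_true_benefit(true_cate),
    ],
})
print(targeting_summary)
```

The oracle row is an upper bound by construction: ranking by the true effect maximizes the selected group's average true benefit.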
What this shows: targeting by point estimate and targeting by lower interval can choose different units. The lower-interval rule is more conservative because it rewards high estimated benefit and lower uncertainty.
Targeting Plot
The plot compares true average benefit among selected rows under each targeting rule.
fig, ax = plt.subplots(figsize=(11, 5))
sns.barplot(
    data=targeting_summary,
    x="average_true_cate_in_selected_group",
    y="targeting_rule",
    color="#34d399",
    ax=ax,
)
ax.set_title("True Benefit Among Targeted Test Rows")
ax.set_xlabel("Average True CATE In Selected Group")
ax.set_ylabel("Targeting Rule")
plt.tight_layout()
fig.savefig(FIGURE_DIR / "04_targeting_summary.png", dpi=160, bbox_inches="tight")
plt.show()
What this shows: model quality becomes operational in targeting. A good CATE model identifies a selected group with higher true benefit than random selection.
Support-Aware CATE Table
The next table bins rows by estimated CATE and interval width. This helps separate high estimated benefit from high estimated benefit with high uncertainty.
What this shows: a high point estimate is more persuasive when the interval is not extremely wide. This support-aware view is useful when deciding whether to act on CATE estimates.
Practical Causal Forest Guidance
This table summarizes when a causal forest is a good choice and what to watch carefully.
practical_guidance = pd.DataFrame(
    [
        {
            "situation": "Expected heterogeneity is nonlinear or interaction-heavy",
            "why CausalForestDML helps": "The forest can split on feature regions without manually specifying every interaction.",
            "watchout": "Use slices and segment summaries to keep the result explainable.",
        },
        {
            "situation": "The main goal is treatment targeting",
            "why CausalForestDML helps": "Ranking quality can improve when the true CATE surface is nonlinear.",
            "watchout": "Evaluate targeting with policy checks, experiments, or simulation truth when available.",
        },
        {
            "situation": "There are weak-overlap regions",
            "why CausalForestDML helps": "Intervals and local estimates can reveal uncertainty.",
            "watchout": "Avoid overusing estimates in unsupported regions.",
        },
        {
            "situation": "A simple coefficient narrative is required",
            "why CausalForestDML helps": "Feature importance and slices can still explain broad patterns.",
            "watchout": "A linear DML model may be easier to communicate if performance is similar.",
        },
    ]
)
practical_guidance.to_csv(TABLE_DIR / "04_practical_guidance.csv", index=False)
display(practical_guidance)
  situation                                                | why CausalForestDML helps                                                               | watchout
0 Expected heterogeneity is nonlinear or interaction-heavy | The forest can split on feature regions without manually specifying every interaction. | Use slices and segment summaries to keep the result explainable.
1 The main goal is treatment targeting                     | Ranking quality can improve when the true CATE surface is nonlinear.                   | Evaluate targeting with policy checks, experiments, or simulation truth when available.
2 There are weak-overlap regions                           | Intervals and local estimates can reveal uncertainty.                                  | Avoid overusing estimates in unsupported regions.
3 A simple coefficient narrative is required               | Feature importance and slices can still explain broad patterns.                        | A linear DML model may be easier to communicate if performance is similar.
What this shows: CausalForestDML is most compelling when flexibility improves the decision problem. If a simpler model performs similarly, the simpler model may be preferable.
Causal Forest Checklist
Before presenting a causal forest estimate, it is worth checking the items below.
forest_checklist = pd.DataFrame(
    [
        {"check": "Treatment and outcome are clearly defined", "why_it_matters": "The forest estimates the effect of a specific intervention."},
        {"check": "All X and W features are pre-treatment", "why_it_matters": "Post-treatment controls can distort the causal estimand."},
        {"check": "X contains meaningful heterogeneity dimensions", "why_it_matters": "The forest splits over X to estimate CATE variation."},
        {"check": "W contains important confounding controls", "why_it_matters": "Nuisance models need enough information to adjust treatment and outcome structure."},
        {"check": "Overlap is adequate", "why_it_matters": "Local treatment-effect estimates need comparable treated and untreated units."},
        {"check": "Nuisance models are reasonable", "why_it_matters": "Poor nuisance models leave confounding in the final CATE stage."},
        {"check": "Feature importance and slices make sense", "why_it_matters": "Flexible models need readable summaries."},
        {"check": "Interval widths are inspected", "why_it_matters": "Wide intervals flag uncertain regions."},
        {"check": "Targeting is evaluated with uncertainty in mind", "why_it_matters": "High point estimates can be fragile when support is weak."},
    ]
)
forest_checklist.to_csv(TABLE_DIR / "04_causal_forest_checklist.csv", index=False)
display(forest_checklist)
  check                                           | why_it_matters
0 Treatment and outcome are clearly defined       | The forest estimates the effect of a specific intervention.
1 All X and W features are pre-treatment          | Post-treatment controls can distort the causal estimand.
2 X contains meaningful heterogeneity dimensions  | The forest splits over X to estimate CATE variation.
3 W contains important confounding controls      | Nuisance models need enough information to adjust treatment and outcome structure.
4 Overlap is adequate                             | Local treatment-effect estimates need comparable treated and untreated units.
5 Nuisance models are reasonable                  | Poor nuisance models leave confounding in the final CATE stage.
6 Feature importance and slices make sense        | Flexible models need readable summaries.
7 Interval widths are inspected                   | Wide intervals flag uncertain regions.
8 Targeting is evaluated with uncertainty in mind | High point estimates can be fragile when support is weak.
What this shows: causal forests are powerful, but the analysis is only credible when model output is paired with design checks, support checks, and uncertainty-aware reporting.
Summary
This notebook introduced CausalForestDML as a flexible DML estimator for nonlinear heterogeneous treatment effects.
The main takeaways are:
causal forests estimate a flexible CATE surface over X;
nuisance models still handle outcome and treatment adjustment;
feature importance helps explain what the fitted forest used;
effect intervals help distinguish high estimated benefit from high confidence;
segment summaries and effect slices make forest estimates easier to communicate;
CATE deciles and targeting tables connect estimation to action;
causal forests are most useful when they improve heterogeneity recovery or targeting over simpler linear models.
The next tutorial moves to DRLearner, where the focus shifts from forest-style local treatment effects to doubly robust pseudo-outcomes for binary treatment settings.