EconML Tutorial 10: Multiple Treatments And Continuous Treatments
Most introductory causal examples use a binary treatment: treated or not treated. Real decision systems are often richer than that. A product team might choose among several intervention types, a pricing team might choose a discount amount, and an operations team might choose how much support or attention to allocate.
This notebook extends the EconML tutorial sequence beyond binary treatment in two directions:
multiple discrete treatments, where the decision is one of several mutually exclusive arms;
continuous treatments, where the decision is a dose, intensity, price, time, or quantity.
These settings require a careful shift in language. Instead of asking only “what is the effect of treatment versus control?”, we ask:
What is the effect of arm 1 versus control?
What is the effect of arm 2 versus control?
Which arm has the largest expected gain for a given covariate profile?
How much does the outcome change when the continuous treatment increases by one unit?
Does the estimated marginal effect vary across segments?
We will use synthetic teaching data with known ground truth, so we can check whether the estimators recover the right arm rankings and dose-response slopes.
Learning Goals
By the end of this notebook, you should be able to:
define estimands for multi-arm and continuous-treatment settings;
simulate confounded multi-arm treatment assignment and continuous treatment intensity;
use DRLearner for multi-arm heterogeneous treatment effects;
estimate treatment effects for each arm relative to a control arm;
translate arm-specific CATE estimates into an estimated best-arm policy;
use LinearDML for continuous treatment effects;
interpret marginal effects and finite dose contrasts;
diagnose overlap, dose support, and segment-level treatment-effect recovery.
Tutorial Flow
The notebook has two teaching case studies.
First, we build a three-arm discrete treatment dataset. We fit an EconML DRLearner, estimate effects for each active arm versus control, and evaluate an estimated best-arm policy against the known synthetic truth.
Second, we build a continuous-dose dataset. We fit LinearDML, estimate heterogeneous marginal dose effects, compare them with the true slope, and inspect where the estimated dose effect is strongest.
The two parts use different estimators because the treatment structure is different. That is the main lesson: the causal question determines the estimator interface.
Setup
This cell imports the notebook dependencies, creates output folders, and sets plotting defaults. The warning filters remove harmless display and pandas-to-NumPy conversion messages so the saved notebook stays readable.
from pathlib import Path
import os
import warnings

# Suppress optional widget warnings that can appear while importing EconML in headless notebook runs.
warnings.filterwarnings("ignore", message="IProgress not found.*")

# Keep Matplotlib cache files in a writable location during notebook execution.
os.environ.setdefault("MPLCONFIGDIR", "/tmp/matplotlib")

import econml
import matplotlib.pyplot as plt
from matplotlib.ticker import PercentFormatter
import numpy as np
import pandas as pd
import seaborn as sns
from IPython.display import display
from scipy.special import softmax
from sklearn.ensemble import RandomForestClassifier, RandomForestRegressor
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, log_loss, mean_absolute_error, mean_squared_error, roc_auc_score
from sklearn.model_selection import train_test_split

from econml.dr import DRLearner
from econml.dml import LinearDML

warnings.filterwarnings("ignore", message="X does not have valid feature names.*", category=UserWarning)
warnings.filterwarnings("ignore", message="Not all column names are strings.*", category=UserWarning)
warnings.filterwarnings("ignore", category=FutureWarning)

sns.set_theme(style="whitegrid", context="notebook")
plt.rcParams["figure.figsize"] = (10, 6)
plt.rcParams["axes.titleweight"] = "bold"
plt.rcParams["axes.labelsize"] = 11


def find_project_root(start=None):
    """Find the repository root from either the repo or a nested notebook folder."""
    start = Path.cwd() if start is None else Path(start)
    for candidate in [start, *start.parents]:
        if (candidate / "pyproject.toml").exists() and (candidate / "notebooks").exists():
            return candidate
    return Path.cwd()


PROJECT_ROOT = find_project_root()
NOTEBOOK_DIR = PROJECT_ROOT / "notebooks" / "tutorials" / "econml"
OUTPUT_DIR = NOTEBOOK_DIR / "outputs"
FIGURE_DIR = OUTPUT_DIR / "figures"
TABLE_DIR = OUTPUT_DIR / "tables"
FIGURE_DIR.mkdir(parents=True, exist_ok=True)
TABLE_DIR.mkdir(parents=True, exist_ok=True)

rng = np.random.default_rng(202610)

print(f"Project root: {PROJECT_ROOT}")
print(f"EconML version: {econml.__version__}")
print(f"Figures will be saved to: {FIGURE_DIR.relative_to(PROJECT_ROOT)}")
print(f"Tables will be saved to: {TABLE_DIR.relative_to(PROJECT_ROOT)}")
Project root: /home/apex/Documents/ranking_sys
EconML version: 0.16.0
Figures will be saved to: notebooks/tutorials/econml/outputs/figures
Tables will be saved to: notebooks/tutorials/econml/outputs/tables
The environment is ready. We will save all tables and plots with the 10_ prefix so they remain easy to separate from earlier tutorial outputs.
Estimand Map
Multi-arm and continuous-treatment problems use different estimands. This table gives the vocabulary we will use before the code starts creating data.
estimand_map = pd.DataFrame(
    [
        {
            "setting": "Binary treatment",
            "treatment_example": "0 or 1",
            "estimand": "Effect of treatment 1 versus treatment 0",
            "econml_call_pattern": "est.effect(X)",
            "decision_question": "Should this unit receive the intervention?",
        },
        {
            "setting": "Multiple discrete treatments",
            "treatment_example": "0, 1, or 2",
            "estimand": "Effect of each active arm versus a baseline arm",
            "econml_call_pattern": "est.effect(X, T0=0, T1=arm)",
            "decision_question": "Which arm should this unit receive?",
        },
        {
            "setting": "Continuous treatment",
            "treatment_example": "Dose, intensity, discount, exposure time",
            "estimand": "Marginal effect of increasing the treatment by one unit",
            "econml_call_pattern": "est.const_marginal_effect(X) or est.effect(X, T0=a, T1=b)",
            "decision_question": "How much should the treatment intensity change?",
        },
    ]
)
estimand_map.to_csv(TABLE_DIR / "10_estimand_map.csv", index=False)
display(estimand_map)
setting | treatment_example | estimand | econml_call_pattern | decision_question
Binary treatment | 0 or 1 | Effect of treatment 1 versus treatment 0 | est.effect(X) | Should this unit receive the intervention?
Multiple discrete treatments | 0, 1, or 2 | Effect of each active arm versus a baseline arm | est.effect(X, T0=0, T1=arm) | Which arm should this unit receive?
Continuous treatment | Dose, intensity, discount, exposure time | Marginal effect of increasing the treatment by one unit | est.const_marginal_effect(X) or est.effect(X, T0=a, T1=b) | How much should the treatment intensity change?
The key distinction is the contrast. Multi-arm treatment effects are relative to another arm, while continuous-treatment effects are usually marginal slopes or finite dose changes.
Part A: Multiple Discrete Treatments
In this section, the treatment has three possible arms. Arm 0 is the baseline experience. Arm 1 is a guided personalization intervention. Arm 2 is an exploration-oriented intervention. Each active arm helps different kinds of rows.
Multi-Arm Treatment Definitions
A multi-arm analysis should name each treatment arm clearly before modeling. This avoids a common mistake: interpreting arm-specific CATE estimates without remembering which baseline they are relative to.
arm_dictionary = pd.DataFrame(
    [
        {
            "arm": 0,
            "name": "baseline experience",
            "description": "No active intervention beyond the standard experience.",
            "role": "Reference arm for treatment-effect contrasts.",
        },
        {
            "arm": 1,
            "name": "guided personalization",
            "description": "A focused intervention intended to help high-need, already-engaged rows.",
            "role": "Active arm compared against baseline.",
        },
        {
            "arm": 2,
            "name": "exploration boost",
            "description": "A broader discovery-oriented intervention intended to help novelty-seeking rows.",
            "role": "Active arm compared against baseline.",
        },
    ]
)
arm_dictionary.to_csv(TABLE_DIR / "10_multi_arm_dictionary.csv", index=False)
display(arm_dictionary)
arm | name | description | role
0 | baseline experience | No active intervention beyond the standard experience. | Reference arm for treatment-effect contrasts.
1 | guided personalization | A focused intervention intended to help high-need, already-engaged rows. | Active arm compared against baseline.
2 | exploration boost | A broader discovery-oriented intervention intended to help novelty-seeking rows. | Active arm compared against baseline.
The reference arm is arm 0. Every arm-specific effect we estimate in this section will be read as “switching from baseline to this arm,” unless explicitly stated otherwise.
Multi-Arm Teaching Data
This cell creates a confounded three-arm observational dataset. Treatment assignment depends on observed covariates, and each active arm has its own heterogeneous treatment effect. Because we know the true arm effects, we can evaluate whether the model learns the right contrasts.
The first rows include observed covariates, the assigned treatment arm, the outcome, and teaching-only truth columns. In a real dataset we would observe the assigned arm and outcome, but not the true arm-specific effects.
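The generating cell itself is not reproduced above, so the sketch below is a minimal illustration of how confounded multi-arm assignment can be simulated, not the notebook's actual simulator: the covariate names echo the field dictionary, while the coefficients, sample size, and seed are assumptions for this example only.

```python
import numpy as np
from scipy.special import softmax

rng = np.random.default_rng(0)
n = 5_000

# Two observed covariates that drive both arm assignment and outcomes.
baseline_need = rng.normal(size=n)
novelty_preference = rng.normal(size=n)

# Confounded assignment: each arm's score depends on covariates, so the arm
# mix differs across covariate profiles. softmax maps row-wise scores to
# probabilities that sum to one.
scores = np.column_stack([
    np.zeros(n),               # arm 0: baseline reference score
    0.8 * baseline_need,       # arm 1 favored for high-need rows
    0.8 * novelty_preference,  # arm 2 favored for novelty-seeking rows
])
probs = softmax(scores, axis=1)

# Draw one arm per row from its own multinomial probabilities.
treatment_arm = np.array([rng.choice(3, p=row) for row in probs])
```

Because arm 1 is drawn more often when `baseline_need` is high, a raw arm-1-versus-arm-0 outcome gap would mix the treatment effect with that selection.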
Multi-Arm Field Dictionary
The synthetic data has more moving pieces than a binary example, so a field dictionary is helpful. It separates observed inputs from treatment, outcome, and teaching-only ground truth.
multi_field_dictionary = pd.DataFrame(
    [
        ("baseline_need", "Observed covariate", "Pre-treatment need or demand signal."),
        ("prior_engagement", "Observed covariate", "Historical engagement before arm assignment."),
        ("friction_score", "Observed covariate", "Higher values mean more user or process friction."),
        ("content_affinity", "Observed covariate", "Match between row and content or offer."),
        ("novelty_preference", "Observed covariate", "Preference for exploratory or new experiences."),
        ("price_sensitivity", "Observed covariate", "Sensitivity to cost, effort, or inconvenience."),
        ("account_tenure", "Observed covariate", "Age of the account or relationship in weeks."),
        ("region_risk", "Observed covariate", "Binary marker for lower baseline outcome regions."),
        ("high_need_segment", "Observed covariate", "Binary segment derived from baseline need."),
        ("low_friction_segment", "Observed covariate", "Binary segment derived from friction score."),
        ("treatment_arm", "Treatment", "Observed arm assignment: 0, 1, or 2."),
        ("outcome", "Outcome", "Observed post-treatment outcome."),
        ("true_tau_arm_1", "Teaching-only truth", "True effect of arm 1 versus baseline."),
        ("true_tau_arm_2", "Teaching-only truth", "True effect of arm 2 versus baseline."),
        ("true_best_arm", "Teaching-only truth", "Arm with the largest true expected gain."),
        ("true_best_gain", "Teaching-only truth", "Largest true gain across arms."),
        ("propensity_arm_0", "Teaching-only truth", "Synthetic probability of receiving arm 0."),
        ("propensity_arm_1", "Teaching-only truth", "Synthetic probability of receiving arm 1."),
        ("propensity_arm_2", "Teaching-only truth", "Synthetic probability of receiving arm 2."),
    ],
    columns=["field", "role", "description"],
)
multi_field_dictionary.to_csv(TABLE_DIR / "10_multi_arm_field_dictionary.csv", index=False)
display(multi_field_dictionary)
field | role | description
baseline_need | Observed covariate | Pre-treatment need or demand signal.
prior_engagement | Observed covariate | Historical engagement before arm assignment.
friction_score | Observed covariate | Higher values mean more user or process friction.
content_affinity | Observed covariate | Match between row and content or offer.
novelty_preference | Observed covariate | Preference for exploratory or new experiences.
price_sensitivity | Observed covariate | Sensitivity to cost, effort, or inconvenience.
account_tenure | Observed covariate | Age of the account or relationship in weeks.
region_risk | Observed covariate | Binary marker for lower baseline outcome regions.
high_need_segment | Observed covariate | Binary segment derived from baseline need.
low_friction_segment | Observed covariate | Binary segment derived from friction score.
treatment_arm | Treatment | Observed arm assignment: 0, 1, or 2.
outcome | Outcome | Observed post-treatment outcome.
true_tau_arm_1 | Teaching-only truth | True effect of arm 1 versus baseline.
true_tau_arm_2 | Teaching-only truth | True effect of arm 2 versus baseline.
true_best_arm | Teaching-only truth | Arm with the largest true expected gain.
true_best_gain | Teaching-only truth | Largest true gain across arms.
propensity_arm_0 | Teaching-only truth | Synthetic probability of receiving arm 0.
propensity_arm_1 | Teaching-only truth | Synthetic probability of receiving arm 1.
propensity_arm_2 | Teaching-only truth | Synthetic probability of receiving arm 2.
The teaching-only columns make this notebook measurable. They let us check arm-specific CATE recovery and best-arm accuracy after fitting the estimator.
Multi-Arm Basic Summary
This cell summarizes sample size, treatment shares, outcome levels, and true effects by assigned arm. The assigned-arm summaries are descriptive; they are not causal effects because assignment is confounded.
The assigned arms have different covariate profiles, which is expected because we intentionally made assignment confounded. That is why arm outcome means should not be read as causal effects.
Treatment Share Plot
A simple treatment-share plot helps confirm that all arms have enough data. Multi-arm causal estimation becomes fragile when one arm is rare, especially if the rare arm is concentrated in a narrow covariate region.
All three arms have material support. The arms are not perfectly balanced, but no arm is so rare that the example becomes dominated by sample-size failure.
True Arm Effect Distributions
Because this is a teaching dataset, we can inspect the true effect distribution for each active arm. The two distributions are different because each arm helps different row types.
The two arms have overlapping but distinct effect distributions. This is the reason multi-arm learning is not just two separate binary questions; the final decision may require comparing both active arms for the same row.
Naive Arm Differences Versus Truth
This cell computes raw outcome differences between each active arm and the baseline arm, then compares those raw differences with the true average arm effects. The gap is a teaching view of selection bias.
The raw differences are not the same as the true average effects. Multi-arm observational data can have selection bias in every active-arm contrast, so each contrast needs adjustment.
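The mechanism behind that gap can be reproduced in a few lines. The toy example below is a deliberately simplified sketch (one confounder, a constant true effect, two arms), not the notebook's data, with all numbers chosen for illustration.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(7)
n = 4_000
need = rng.normal(size=n)

# Confounded assignment: high-need rows receive the active arm more often.
arm = (rng.random(n) < 1 / (1 + np.exp(-need))).astype(int)

true_tau = 0.1  # constant true effect of the active arm versus baseline
y = 0.5 * need + true_tau * arm + rng.normal(scale=0.3, size=n)

df = pd.DataFrame({"arm": arm, "outcome": y})

# The raw outcome gap is inflated: active-arm rows had higher need,
# and need raises the outcome on its own.
naive_diff = df.loc[df.arm == 1, "outcome"].mean() - df.loc[df.arm == 0, "outcome"].mean()
print(round(naive_diff, 3), "versus true effect", true_tau)
```

The raw difference lands well above the true effect, which is exactly the selection bias the adjustment step is there to remove.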
Multi-Arm Train-Test Split
We split the multi-arm dataset before fitting the estimator. Stratifying by treatment arm keeps all arms represented in both train and test splits.
The split keeps treatment shares very similar across train and test. That makes held-out arm-specific diagnostics more stable.
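A stratified split of this kind likely looks like the following sketch; the frame, arm shares, and split fraction here are illustrative stand-ins for the tutorial's multi-arm dataset.

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(1)

# Illustrative frame: an imbalanced three-arm assignment plus an outcome.
df = pd.DataFrame({
    "treatment_arm": rng.choice([0, 1, 2], size=3_000, p=[0.5, 0.3, 0.2]),
    "outcome": rng.normal(size=3_000),
})

# Stratifying on the arm labels keeps each arm's share nearly identical
# across train and test, so arm-specific diagnostics stay comparable.
train, test = train_test_split(
    df, test_size=0.35, stratify=df["treatment_arm"], random_state=202611
)

train_shares = train["treatment_arm"].value_counts(normalize=True).sort_index()
test_shares = test["treatment_arm"].value_counts(normalize=True).sort_index()
print(train_shares.round(3))
print(test_shares.round(3))
```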
Multi-Arm Model Matrices
This cell extracts the covariate matrix, treatment-arm vector, and outcome vector. Multi-arm treatments stay as integer labels because DRLearner can model the discrete arm assignment process directly.
X_multi_train = multi_train[multi_feature_cols].copy()
X_multi_test = multi_test[multi_feature_cols].copy()
T_multi_train = multi_train["treatment_arm"].to_numpy()
T_multi_test = multi_test["treatment_arm"].to_numpy()
y_multi_train = multi_train["outcome"].to_numpy()
y_multi_test = multi_test["outcome"].to_numpy()
true_tau_1_test = multi_test["true_tau_arm_1"].to_numpy()
true_tau_2_test = multi_test["true_tau_arm_2"].to_numpy()

multi_matrix_summary = pd.DataFrame(
    {
        "object": ["X_multi_train", "X_multi_test", "T_multi_train", "y_multi_train"],
        "shape_or_length": [X_multi_train.shape, X_multi_test.shape, len(T_multi_train), len(y_multi_train)],
        "description": [
            "Observed pre-treatment covariates for multi-arm CATE estimation.",
            "Held-out covariates for evaluating arm-specific effects.",
            "Integer arm labels for the training split.",
            "Observed outcomes for the training split.",
        ],
    }
)
multi_matrix_summary.to_csv(TABLE_DIR / "10_multi_arm_model_matrix_summary.csv", index=False)
display(multi_matrix_summary)
object | shape_or_length | description
X_multi_train | (2730, 10) | Observed pre-treatment covariates for multi-arm CATE estimation.
X_multi_test | (1470, 10) | Held-out covariates for evaluating arm-specific effects.
T_multi_train | 2730 | Integer arm labels for the training split.
y_multi_train | 2730 | Observed outcomes for the training split.
The treatment vector has three possible values. The estimator will learn the outcome process and the multinomial treatment assignment process from these objects.
Multi-Arm Nuisance Diagnostics
Before fitting DRLearner, we train diagnostic models for arm assignment and outcome prediction. These are not the exact internal nuisance fits, but they tell us whether the observed covariates carry useful signal.
The arm model has nontrivial predictive signal, which confirms the observational assignment process is confounded. Doubly robust estimation is useful here because it uses both outcome and treatment-assignment nuisance information.
Multi-Arm Propensity Support
For multi-arm treatments, overlap means each arm has a reasonable probability in the regions where we compare arms. This cell summarizes the estimated probability of each arm on held-out rows.
The support summary checks whether any arm has near-zero estimated probability for many rows. When an arm has weak support, its contrast against baseline becomes more fragile.
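A support summary of this kind can be produced with any multinomial propensity model. The sketch below uses a synthetic assignment and illustrative quantile cutoffs rather than the notebook's actual diagnostic cell; low quantiles near zero for an arm flag regions where that arm is almost never observed.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(2)
n = 4_000
X = rng.normal(size=(n, 3))

# Covariate-driven arm assignment (illustrative coefficients).
logits = np.column_stack([np.zeros(n), 1.2 * X[:, 0], 1.2 * X[:, 1]])
p = np.exp(logits) / np.exp(logits).sum(axis=1, keepdims=True)
T = np.array([rng.choice(3, p=row) for row in p])

# Fit a multinomial propensity model and summarize per-arm support.
prop_model = LogisticRegression(max_iter=1_000).fit(X, T)
probs = prop_model.predict_proba(X)
support = pd.DataFrame(probs, columns=["arm_0", "arm_1", "arm_2"]).quantile(
    [0.01, 0.05, 0.50, 0.95]
)
print(support.round(3))
```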
Propensity Support Plot
The table gives exact values, but the distribution plot makes support problems easier to see. Each curve shows the estimated probability of one arm on held-out rows.
The plot shows whether any treatment arm becomes implausible for large parts of the covariate space. Those regions are where arm-specific effect estimates should be handled carefully.
Fit Multi-Arm DRLearner
DRLearner works well for this section because it is designed for discrete treatments and can use both outcome regression and propensity modeling. We ask it to estimate active-arm effects relative to arm 0.
multi_dr = DRLearner(
    model_propensity=LogisticRegression(max_iter=1_000),
    model_regression=RandomForestRegressor(
        n_estimators=220,
        min_samples_leaf=20,
        random_state=202612,
        n_jobs=-1,
    ),
    model_final=RandomForestRegressor(
        n_estimators=260,
        min_samples_leaf=18,
        random_state=202613,
        n_jobs=-1,
    ),
    cv=3,
    random_state=202614,
)
multi_dr.fit(y_multi_train, T_multi_train, X=X_multi_train)

estimated_tau_arm_1 = multi_dr.effect(X_multi_test, T0=0, T1=1)
estimated_tau_arm_2 = multi_dr.effect(X_multi_test, T0=0, T1=2)
estimated_tau_arm_2_vs_1 = estimated_tau_arm_2 - estimated_tau_arm_1

print(f"Mean estimated effect, arm 1 versus baseline: {estimated_tau_arm_1.mean():.4f}")
print(f"Mean estimated effect, arm 2 versus baseline: {estimated_tau_arm_2.mean():.4f}")
print(f"Mean estimated effect, arm 2 versus arm 1: {estimated_tau_arm_2_vs_1.mean():.4f}")
Mean estimated effect, arm 1 versus baseline: 0.1223
Mean estimated effect, arm 2 versus baseline: 0.0906
Mean estimated effect, arm 2 versus arm 1: -0.0317
The estimator returns one contrast at a time. Arm 2 versus arm 1 can be formed by subtracting the two baseline-relative estimates because both effects are measured against the same reference arm.
Multi-Arm CATE Recovery
With synthetic data, we can compare estimated arm-specific CATEs with the true arm-specific effects. This is a teaching diagnostic for whether the estimator learned the right heterogeneity pattern.
The recovery table gives a contrast-by-contrast view. In multi-arm settings, one arm can be estimated well while another is weaker, especially if support differs across arms.
Multi-Arm Recovery Plot
This plot puts the two active-arm recovery patterns side by side. Each point is a held-out row, and the diagonal line marks perfect CATE recovery.
The side-by-side view makes arm-specific strengths and weaknesses visible. A good multi-arm model should recover not only average effects, but also the ordering of arms within rows.
Estimated Best-Arm Policy
Multi-arm CATE estimates become useful when we compare arms for each row. This cell chooses the estimated best arm and compares it with the oracle best arm from the synthetic truth.
The estimated policy can improve over historical assignment if the arm-specific estimates rank arms well. The gap to the oracle is regret: value left on the table because the model does not know the true effects perfectly.
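The best-arm rule itself reduces to an argmax once the baseline-relative estimates exist. A sketch under the convention that the baseline arm's gain over itself is zero, using illustrative stand-in CATE arrays:

```python
import numpy as np

rng = np.random.default_rng(3)
n = 1_000

# Stand-in arm-specific CATE estimates versus baseline (illustrative values).
tau_arm_1 = rng.normal(0.12, 0.1, size=n)
tau_arm_2 = rng.normal(0.09, 0.1, size=n)

# Column 0 is the baseline arm, whose gain over itself is zero by definition.
# The estimated best arm for each row is the argmax across stacked gains.
gains = np.column_stack([np.zeros(n), tau_arm_1, tau_arm_2])
best_arm = gains.argmax(axis=1)

# Rows where neither active arm beats baseline keep the baseline arm.
print(np.bincount(best_arm, minlength=3) / n)
```

Including the zero column is what lets the policy abstain: a row is only assigned an active arm when its estimated gain is positive.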
Best-Arm Policy Plot
The table is precise, but a plot makes the policy comparison easier to communicate. We show both average true gain and how often each policy uses each arm.
fig, axes = plt.subplots(1, 2, figsize=(14, 5))
sns.barplot(data=policy_gain_table, x="mean_true_gain", y="policy", color="#16a34a", ax=axes[0])
axes[0].set_title("Mean True Gain By Policy")
axes[0].set_xlabel("Mean true gain")
axes[0].set_ylabel("Policy")

arm_mix = policy_gain_table.melt(
    id_vars="policy",
    value_vars=["arm_0_rate", "arm_1_rate", "arm_2_rate"],
    var_name="arm",
    value_name="rate",
)
sns.barplot(data=arm_mix, x="rate", y="policy", hue="arm", ax=axes[1])
axes[1].set_title("Arm Mix By Policy")
axes[1].set_xlabel("Share of rows")
axes[1].set_ylabel("")
axes[1].xaxis.set_major_formatter(PercentFormatter(1.0))
axes[1].legend(title="Arm")

plt.tight_layout()
fig.savefig(FIGURE_DIR / "10_multi_arm_policy_comparison.png", dpi=160, bbox_inches="tight")
plt.show()
The policy view is the payoff of multi-arm estimation. We are no longer asking whether treatment works in general; we are asking which treatment works best for which rows.
Segment-Level Arm Effects
A segment summary helps explain the policy in human-readable terms. Here we group by need and novelty preference, then compare true and estimated arm effects within each segment.
The segment table shows why the two arms are different. Guided personalization tends to align with need and engagement, while the exploration arm is designed to align with novelty preference and lower friction.
Segment Arm-Effect Plot
The next plot compares estimated active-arm effects by segment. This is often more digestible than individual CATE scatterplots.
The segment plot makes it clear that multi-arm estimation is about matching rows to arms, not simply finding one universally best intervention.
Part B: Continuous Treatments
Now we move from choosing among discrete arms to estimating the effect of a continuous dose. The treatment might represent intensity, minutes, discount size, number of recommendations, support hours, or any other continuous amount.
Continuous Treatment Estimand
For continuous treatments, the main estimand is often a marginal effect: the expected outcome change for a one-unit increase in treatment intensity, conditional on covariates. We can also ask for a finite contrast such as increasing dose from 0.5 to 1.5.
continuous_estimand_table = pd.DataFrame(
    [
        {
            "quantity": "Marginal dose effect",
            "notation": "d E[Y(t) | X] / dt",
            "plain_language": "Outcome change for a small one-unit increase in dose at a covariate profile.",
            "econml_call": "est.const_marginal_effect(X)",
        },
        {
            "quantity": "Finite dose contrast",
            "notation": "E[Y(t=b) - Y(t=a) | X]",
            "plain_language": "Outcome change when moving from dose a to dose b.",
            "econml_call": "est.effect(X, T0=a, T1=b)",
        },
        {
            "quantity": "Dose support check",
            "notation": "Observed T distribution by segment",
            "plain_language": "Whether the data contain enough dose variation to support the contrast.",
            "econml_call": "Diagnostic outside the estimator",
        },
    ]
)
continuous_estimand_table.to_csv(TABLE_DIR / "10_continuous_estimand_table.csv", index=False)
display(continuous_estimand_table)
quantity | notation | plain_language | econml_call
Marginal dose effect | d E[Y(t) | X] / dt | Outcome change for a small one-unit increase in dose at a covariate profile. | est.const_marginal_effect(X)
Finite dose contrast | E[Y(t=b) - Y(t=a) | X] | Outcome change when moving from dose a to dose b. | est.effect(X, T0=a, T1=b)
Dose support check | Observed T distribution by segment | Whether the data contain enough dose variation to support the contrast. | Diagnostic outside the estimator
Continuous treatments make support especially important. A one-unit contrast is easier to defend when the observed data contain comparable rows with a wide range of treatment intensities.
Continuous-Dose Teaching Data
This cell creates a continuous treatment intensity. The true outcome model is linear in the dose, but the slope varies across covariates. That makes LinearDML a good teaching estimator because it estimates heterogeneous marginal effects.
The treatment is a continuous intensity, not a binary indicator. The truth columns show the marginal effect and finite contrasts that we will use to evaluate the estimator.
Continuous Field Dictionary
The continuous-treatment example uses a separate field dictionary so the dose variable is clearly distinguished from the outcome and effect columns.
continuous_field_dictionary = pd.DataFrame(
    [
        ("baseline_need", "Observed covariate", "Pre-treatment need or demand signal."),
        ("prior_engagement", "Observed covariate", "Historical engagement before treatment intensity is assigned."),
        ("friction_score", "Observed covariate", "Higher values indicate more friction."),
        ("content_affinity", "Observed covariate", "Match between row and content or offer."),
        ("price_sensitivity", "Observed covariate", "Sensitivity to cost or inconvenience."),
        ("capacity_score", "Observed covariate", "Ability to absorb more treatment intensity."),
        ("account_tenure", "Observed covariate", "Age of the relationship in weeks."),
        ("region_risk", "Observed covariate", "Binary marker for lower baseline outcome regions."),
        ("high_capacity_segment", "Observed covariate", "Binary segment derived from capacity score."),
        ("treatment_intensity", "Treatment", "Continuous treatment dose or intensity."),
        ("outcome", "Outcome", "Observed post-treatment outcome."),
        ("true_marginal_effect", "Teaching-only truth", "True one-unit dose effect for each row."),
        ("true_effect_0_to_1", "Teaching-only truth", "True effect of increasing dose from 0 to 1."),
        ("true_effect_0_to_2", "Teaching-only truth", "True effect of increasing dose from 0 to 2."),
    ],
    columns=["field", "role", "description"],
)
continuous_field_dictionary.to_csv(TABLE_DIR / "10_continuous_field_dictionary.csv", index=False)
display(continuous_field_dictionary)
field | role | description
baseline_need | Observed covariate | Pre-treatment need or demand signal.
prior_engagement | Observed covariate | Historical engagement before treatment intensity is assigned.
friction_score | Observed covariate | Higher values indicate more friction.
content_affinity | Observed covariate | Match between row and content or offer.
price_sensitivity | Observed covariate | Sensitivity to cost or inconvenience.
capacity_score | Observed covariate | Ability to absorb more treatment intensity.
account_tenure | Observed covariate | Age of the relationship in weeks.
region_risk | Observed covariate | Binary marker for lower baseline outcome regions.
high_capacity_segment | Observed covariate | Binary segment derived from capacity score.
treatment_intensity | Treatment | Continuous treatment dose or intensity.
outcome | Outcome | Observed post-treatment outcome.
true_marginal_effect | Teaching-only truth | True one-unit dose effect for each row.
true_effect_0_to_1 | Teaching-only truth | True effect of increasing dose from 0 to 1.
true_effect_0_to_2 | Teaching-only truth | True effect of increasing dose from 0 to 2.
The treatment column is numeric and ordered. That changes the modeling goal from estimating arm contrasts to estimating a slope or dose contrast.
Continuous Treatment Summary
This cell checks the distribution of treatment intensity, the outcome, and the true marginal effect. We also look at the correlation between treatment intensity and covariates, because confounded continuous treatments are assigned more intensely to some row types.
The treatment has broad support and the true marginal effect varies across rows. That gives LinearDML a meaningful heterogeneity problem to solve.
Continuous Treatment Distribution
A continuous treatment should be inspected like a numeric exposure. The histogram shows whether the dose has enough variation and whether clipping has created large spikes at the boundaries.
The treatment distribution is continuous with useful spread. Boundary mass would make some dose contrasts less credible, so it is worth checking before model fitting.
True Marginal Effect Distribution
This plot shows the true heterogeneous slope. In real data, this distribution is unknown; here it gives us a target for evaluating the estimated marginal effects.
The marginal effect is positive for many rows but not all rows. A dose-increase policy should therefore be selective rather than assuming more intensity is always better.
Dose Support By Segment
For continuous treatments, support means having dose variation within the segments where we want to estimate effects. This cell summarizes dose distributions by high-capacity segment.
Both segments have variation in treatment intensity. If one segment had almost no dose variation, its marginal-effect estimate would be much harder to trust.
Continuous Train-Test Split
We split the continuous-treatment data before fitting the model. Because the treatment is continuous, we stratify on a binned version of dose to keep dose ranges represented in both train and test splits.
The train and test splits have similar dose and true-effect distributions. That keeps held-out marginal-effect diagnostics meaningful.
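Stratifying on a continuous dose requires binning it first, because stratification needs discrete labels. A sketch with illustrative quartile bins and a stand-in dose distribution:

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(4)
df = pd.DataFrame({"treatment_intensity": rng.gamma(2.0, 0.6, size=3_000)})

# Quartile-bin the dose (purely illustrative) and stratify on the bin label
# so both splits cover the full dose range in similar proportions.
df["dose_bin"] = pd.qcut(df["treatment_intensity"], q=4, labels=False)
train, test = train_test_split(
    df, test_size=0.35, stratify=df["dose_bin"], random_state=202611
)

print(train["treatment_intensity"].describe().round(3))
```

The bin column exists only for splitting; the estimator still receives the raw continuous dose.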
Continuous Model Matrices
This cell extracts the covariates, continuous treatment vector, and outcome vector for LinearDML. Unlike the multi-arm section, T is numeric and continuous.
X_cont_train = continuous_train[continuous_feature_cols].copy()
X_cont_test = continuous_test[continuous_feature_cols].copy()
T_cont_train = continuous_train["treatment_intensity"].to_numpy()
T_cont_test = continuous_test["treatment_intensity"].to_numpy()
y_cont_train = continuous_train["outcome"].to_numpy()
y_cont_test = continuous_test["outcome"].to_numpy()
true_slope_test = continuous_test["true_marginal_effect"].to_numpy()

continuous_matrix_summary = pd.DataFrame(
    {
        "object": ["X_cont_train", "X_cont_test", "T_cont_train", "y_cont_train"],
        "shape_or_length": [X_cont_train.shape, X_cont_test.shape, len(T_cont_train), len(y_cont_train)],
        "description": [
            "Observed pre-treatment covariates.",
            "Held-out covariates for marginal-effect evaluation.",
            "Continuous treatment intensity for training rows.",
            "Observed outcomes for training rows.",
        ],
    }
)
continuous_matrix_summary.to_csv(TABLE_DIR / "10_continuous_model_matrix_summary.csv", index=False)
display(continuous_matrix_summary)
object | shape_or_length | description
X_cont_train | (2470, 9) | Observed pre-treatment covariates.
X_cont_test | (1330, 9) | Held-out covariates for marginal-effect evaluation.
T_cont_train | 2470 | Continuous treatment intensity for training rows.
y_cont_train | 2470 | Observed outcomes for training rows.
The continuous treatment vector has the same length as the outcome vector, but its values are real-valued doses rather than treatment-arm labels.
Continuous Nuisance Diagnostics
LinearDML residualizes both the outcome and the continuous treatment. This diagnostic cell checks whether observed covariates predict the treatment intensity and outcome reasonably well.
The dose model captures confounding in the treatment intensity. That is exactly the structure DML is designed to handle: remove predictable treatment and outcome components, then estimate the residualized treatment effect.
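The partialling-out idea can be demonstrated without EconML. The sketch below is a simplified illustration of the residual-on-residual regression inside DML, using plain linear models, no cross-fitting, and made-up coefficients, so it is not LinearDML's actual implementation.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(5)
n = 5_000
X = rng.normal(size=(n, 3))

# Confounded continuous dose: intensity depends on the covariates.
T = 0.6 * X[:, 0] - 0.4 * X[:, 1] + rng.normal(scale=0.8, size=n)
true_slope = 0.5
y = true_slope * T + 1.0 * X[:, 0] + 0.7 * X[:, 2] + rng.normal(scale=0.5, size=n)

# Partialling out: remove the parts of T and y that the covariates predict,
# then regress residual on residual to recover the dose slope.
T_res = T - LinearRegression().fit(X, T).predict(X)
y_res = y - LinearRegression().fit(X, y).predict(X)
slope_hat = (T_res * y_res).sum() / (T_res**2).sum()

print(round(slope_hat, 3))
```

A naive regression of `y` on `T` alone would absorb the confounded `X[:, 0]` channel into the slope; the residualized estimate does not.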
Fit LinearDML For Continuous Treatment
LinearDML estimates a CATE function that is linear in the treatment but can vary with covariates. In this synthetic data, the outcome is linear in dose with a heterogeneous slope, so the estimator is aligned with the teaching setup.
Mean estimated marginal effect: 0.4192
Mean true marginal effect: 0.2610
For a linear-in-dose treatment model, the one-unit effect from `effect(X, T0=0, T1=1)` should match the marginal effect. The two-unit contrast should be roughly twice the marginal effect.
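The residual-on-residual idea behind LinearDML can be illustrated without EconML at all. The sketch below, on hypothetical synthetic data (not the notebook's frames), partials covariates out of both the dose and the outcome, regresses residual on residual, and shows that under a linear-in-dose model a dose-`k` contrast is exactly `k` times the estimated slope:

```python
import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(1)

# Hypothetical synthetic data: outcome linear in dose with slope 0.3,
# dose confounded by the first covariate.
X = rng.normal(size=(2000, 3))
dose = X[:, 0] + rng.normal(scale=0.7, size=2000)
y = 1.5 * X[:, 1] + 0.3 * dose + rng.normal(scale=0.5, size=2000)

# Partial X out of both dose and outcome, then regress residual on residual.
dose_resid = dose - LinearRegression().fit(X, dose).predict(X)
y_resid = y - LinearRegression().fit(X, y).predict(X)
slope = float(LinearRegression().fit(dose_resid.reshape(-1, 1), y_resid).coef_[0])

# Under a linear-in-dose model, a dose-0 -> dose-k contrast is k * slope.
effect_0_to_1 = slope * 1.0
effect_0_to_2 = slope * 2.0
```

This is a hand-rolled partialling-out sketch, not the EconML API: LinearDML adds cross-fitting and supports flexible nuisance learners, but the core estimand is the same residualized slope.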
Continuous Effect Recovery
This cell compares estimated marginal effects and finite dose contrasts with the synthetic truth. These recovery metrics are teaching diagnostics that would not be available in real observational data.
The recovery table checks whether the estimated slope and finite contrasts move with the truth. The finite contrasts scale with the dose change because the synthetic outcome is linear in treatment intensity.
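A minimal version of such recovery metrics, on hypothetical per-row slopes rather than the notebook's actual estimates, might look like:

```python
import numpy as np

rng = np.random.default_rng(2)

# Hypothetical true and estimated per-row marginal effects.
true_slope = rng.uniform(0.1, 0.5, size=500)
est_slope = true_slope + rng.normal(scale=0.05, size=500)

# Teaching-only recovery metrics: both require synthetic ground truth.
rmse = float(np.sqrt(np.mean((est_slope - true_slope) ** 2)))
corr = float(np.corrcoef(est_slope, true_slope)[0, 1])
print(f"slope RMSE: {rmse:.3f}, slope correlation: {corr:.3f}")
```

Correlation checks whether the estimator preserves the ordering of effects across rows; RMSE checks whether the magnitudes are usable for cost-benefit thresholds.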
Continuous Marginal-Effect Recovery Plot
The next plot compares true and estimated marginal effects for held-out rows. The diagonal line is perfect recovery.
The scatterplot shows how well the estimated marginal effect tracks the true slope. Segment coloring helps reveal whether recovery differs across capacity groups.
Dose Contrast Sanity Check
Because the teaching outcome is linear in dose, the estimated effect from dose 0 to 2 should be about twice the estimated effect from dose 0 to 1. This cell checks that relationship directly.
The ratio and absolute difference confirm the expected linear-dose behavior. If the real causal question involved nonlinear saturation, we would need a treatment featurizer or a different dose-response strategy.
Segment-Level Continuous Effects
This cell summarizes estimated and true marginal effects by capacity and need segments. Segment summaries are often easier to communicate than row-level slopes.
The segment table shows where increased treatment intensity appears most beneficial. It also exposes whether some segments have weaker estimated dose effects despite similar average dose levels.
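A segment summary of this kind is essentially a one-line `groupby`. The sketch below uses hypothetical segment labels and slopes, not the notebook's frames:

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(3)

# Hypothetical segments with segment-specific true slopes plus estimation noise.
seg = rng.choice(["low_capacity", "high_capacity"], size=400)
true_me = np.where(seg == "high_capacity", 0.4, 0.1)
est_me = true_me + rng.normal(scale=0.05, size=400)

df = pd.DataFrame({"segment": seg, "true_me": true_me, "est_me": est_me})

# Mean true and estimated marginal effects per segment.
segment_summary = df.groupby("segment")[["true_me", "est_me"]].mean()
print(segment_summary)
```

In a real analysis only the `est_me` column would exist, so the segment summary becomes the communicated deliverable rather than a recovery check.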
Segment Continuous-Effect Plot
This plot compares true and estimated marginal effects by segment. It is a compact way to explain heterogeneous continuous-dose effects.
The segment plot shows whether the estimator preserves the broad ranking of groups. This is often more important for decision-making than perfect row-level recovery.
Dose-Increase Targeting Rule
A continuous-treatment model can support a policy such as “increase dose for rows whose estimated marginal effect is positive and high enough.” This cell compares a simple estimated rule to an oracle rule using the synthetic truth.
The estimated rule is a realistic version of dose targeting: it uses model-estimated marginal effects and a cost threshold. The oracle row shows the upper benchmark that would be available only if true effects were known.
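The estimated-versus-oracle comparison can be sketched as follows, using hypothetical marginal effects and a per-unit cost threshold (both stand-ins, not the notebook's values):

```python
import numpy as np

rng = np.random.default_rng(4)

# Hypothetical true and estimated marginal effects, plus a per-unit dose cost.
true_me = rng.normal(loc=0.2, scale=0.15, size=1000)
est_me = true_me + rng.normal(scale=0.05, size=1000)
cost_per_unit = 0.15

# Estimated rule: increase dose where the estimated effect clears the cost.
increase_est = est_me > cost_per_unit
# Oracle rule: the same threshold on the (normally unknowable) true effects.
increase_oracle = true_me > cost_per_unit

# True net gain under each rule, evaluated against synthetic truth.
gain_est = float(np.sum((true_me - cost_per_unit)[increase_est]))
gain_oracle = float(np.sum((true_me - cost_per_unit)[increase_oracle]))
print(f"estimated-rule gain: {gain_est:.1f}, oracle gain: {gain_oracle:.1f}")
```

The oracle rule selects exactly the rows with positive net gain, so its total is an upper bound; the gap between the two totals is the regret from estimation error.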
Dose-Increase Policy Plot
The final policy plot compares increase rates and true net gains under the estimated and oracle dose-increase rules.
The policy plot translates marginal effects into a decision. Even in a continuous-dose setting, the model often becomes useful through a clear rule that says where to increase, decrease, or hold intensity.
Practical Checklist
The final checklist summarizes what to verify before using multiple-treatment or continuous-treatment estimates in a real analysis.
```python
practical_checklist = pd.DataFrame(
    [
        {
            "topic": "Treatment definition",
            "multi_arm_question": "Are arms mutually exclusive and clearly named?",
            "continuous_question": "Is the dose numeric, ordered, and measured before the outcome window?",
        },
        {
            "topic": "Reference contrast",
            "multi_arm_question": "Which arm is the baseline for effect estimates?",
            "continuous_question": "Which finite dose contrast or marginal effect is being reported?",
        },
        {
            "topic": "Overlap and support",
            "multi_arm_question": "Does every arm have support in the covariate regions being compared?",
            "continuous_question": "Is there enough dose variation within important segments?",
        },
        {
            "topic": "Nuisance models",
            "multi_arm_question": "Can the arm assignment and outcome models learn useful signal?",
            "continuous_question": "Can the dose and outcome models learn useful signal?",
        },
        {
            "topic": "Decision rule",
            "multi_arm_question": "Will action use best-arm ranking, budget constraints, or arm-specific thresholds?",
            "continuous_question": "Will action use marginal-effect thresholds, dose caps, or cost-adjusted net effects?",
        },
        {
            "topic": "Reporting",
            "multi_arm_question": "Report arm-specific effects and the arm mix of the learned policy.",
            "continuous_question": "Report marginal effects, dose support, and the dose range where results apply.",
        },
    ]
)
practical_checklist.to_csv(
    TABLE_DIR / "10_multiple_and_continuous_treatment_checklist.csv", index=False
)
display(practical_checklist)
```
| | topic | multi_arm_question | continuous_question |
|---|---|---|---|
| 0 | Treatment definition | Are arms mutually exclusive and clearly named? | Is the dose numeric, ordered, and measured before the outcome window? |
| 1 | Reference contrast | Which arm is the baseline for effect estimates? | Which finite dose contrast or marginal effect is being reported? |
| 2 | Overlap and support | Does every arm have support in the covariate regions being compared? | Is there enough dose variation within important segments? |
| 3 | Nuisance models | Can the arm assignment and outcome models learn useful signal? | Can the dose and outcome models learn useful signal? |
| 4 | Decision rule | Will action use best-arm ranking, budget constraints, or arm-specific thresholds? | Will action use marginal-effect thresholds, dose caps, or cost-adjusted net effects? |
| 5 | Reporting | Report arm-specific effects and the arm mix of the learned policy. | Report marginal effects, dose support, and the dose range where results apply. |
The checklist keeps the final analysis grounded. Multiple treatments and continuous treatments are powerful, but they require careful contrast definitions and support diagnostics.
Summary
This notebook extended the EconML tutorial sequence beyond binary treatments.
The multi-arm section showed how to:
define a baseline arm and active-arm contrasts;
estimate arm-specific CATEs with DRLearner;
compare active arms for the same row;
convert arm effects into an estimated best-arm policy;
evaluate arm-specific recovery and policy regret on synthetic truth.
The continuous-treatment section showed how to:
define marginal effects and finite dose contrasts;
inspect dose support before modeling;
estimate heterogeneous dose slopes with LinearDML;
compare estimated slopes with known synthetic truth;
turn marginal effects into a dose-increase rule.
The next tutorial introduces instrumental-variable estimators, which are useful when treatment is confounded by unobserved factors but a credible instrument is available.