DoubleML Tutorial 15: Policy Learning, Weighted ATEs, Quantiles, And CVaR
This notebook moves from estimation toward decision targets. Earlier notebooks estimated average effects, group effects, conditional-effect summaries, uncertainty, and sensitivity. Those are essential building blocks, but many applied causal questions are not answered by a single ATE.
A decision maker may ask:
What is the effect for a priority population rather than the full population?
Which observations should receive treatment if the treatment is optional, costly, or capacity-limited?
Does treatment help the lower tail of the outcome distribution, not only the mean?
Does treatment reduce downside risk for vulnerable units?
DoubleML contains several tools for this broader decision layer. In this notebook we use:
weights= in DoubleMLIRM for weighted ATEs.
DoubleMLIRM.policy_tree() for a simple policy-learning rule.
DoubleMLQTE for quantile treatment effects.
DoubleMLCVAR for lower-tail potential-outcome risk.
The big conceptual warning is that these targets are still causal targets. A policy tree is not just a prediction tree. A weighted ATE is not just a filtered average. A QTE is not just a histogram comparison. Each result inherits the same identification assumptions as the base causal design: no unmeasured confounding conditional on the controls, overlap, and a stable treatment definition.
Setup
This setup cell prepares the tutorial output folders and imports the libraries used throughout the notebook. The code is visible so a reader can reproduce the environment and see exactly which DoubleML classes are being used.
The warning filters are narrow. They silence notebook-environment noise and a familiar scikit-learn feature-name warning, while leaving model and data problems visible.
```python
from pathlib import Path
import os
import warnings

# Find the repository root from wherever the notebook is executed.
PROJECT_ROOT = Path.cwd().resolve()
while not (PROJECT_ROOT / "pyproject.toml").exists() and PROJECT_ROOT != PROJECT_ROOT.parent:
    PROJECT_ROOT = PROJECT_ROOT.parent

OUTPUT_DIR = PROJECT_ROOT / "notebooks" / "tutorials" / "doubleml" / "outputs"
DATASET_DIR = OUTPUT_DIR / "datasets"
FIGURE_DIR = OUTPUT_DIR / "figures"
TABLE_DIR = OUTPUT_DIR / "tables"
REPORT_DIR = OUTPUT_DIR / "reports"
MATPLOTLIB_CACHE_DIR = OUTPUT_DIR / "matplotlib_cache"
for directory in [DATASET_DIR, FIGURE_DIR, TABLE_DIR, REPORT_DIR, MATPLOTLIB_CACHE_DIR]:
    directory.mkdir(parents=True, exist_ok=True)

# Set Matplotlib's cache before importing pyplot so notebook execution stays quiet.
os.environ.setdefault("MPLCONFIGDIR", str(MATPLOTLIB_CACHE_DIR))
warnings.filterwarnings("ignore", category=FutureWarning)
warnings.filterwarnings("ignore", message="IProgress not found.*")
warnings.filterwarnings("ignore", message="X does not have valid feature names.*")

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
from IPython.display import display
from matplotlib.patches import FancyArrowPatch, FancyBboxPatch
from sklearn.base import clone
from sklearn.ensemble import RandomForestClassifier, RandomForestRegressor
from sklearn.metrics import brier_score_loss, log_loss, mean_squared_error
from sklearn.tree import export_text, plot_tree

import doubleml as dml
from doubleml import DoubleMLCVAR, DoubleMLData, DoubleMLIRM, DoubleMLQTE

NOTEBOOK_PREFIX = "15"
RANDOM_SEED = 150

sns.set_theme(style="whitegrid", context="talk")
pd.set_option("display.max_columns", 80)
pd.set_option("display.float_format", "{:.4f}".format)

print(f"Project root: {PROJECT_ROOT}")
print(f"DoubleML version: {dml.__version__}")
print(f"Outputs will be written to: {OUTPUT_DIR}")
```
Project root: /home/apex/Documents/ranking_sys
DoubleML version: 0.11.2
Outputs will be written to: /home/apex/Documents/ranking_sys/notebooks/tutorials/doubleml/outputs
The setup confirms the installed DoubleML version and the shared output folder. Every file created by this notebook uses prefix 15.
Helper Functions
The helpers below keep the notebook focused on causal ideas rather than file paths and repeated formatting. Two helpers deserve special attention:
lower_tail_cvar() computes an oracle lower-tail average in the synthetic data. This is only possible here because both potential outcomes are stored by the simulator.
fit_weighted_irm() wraps the DoubleMLIRM(weights=...) workflow so the weighted ATE section stays compact.
The helper design makes the target explicit each time the notebook estimates something. That is important because this notebook covers several different estimands that should not be mixed together.
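The helper code itself is hidden in this export, but the oracle tail helper is simple enough to sketch. The version below is a hypothetical reconstruction, assuming `lower_tail_cvar()` averages the values at or below the empirical alpha-quantile; the real helper may differ in details.

```python
import numpy as np


def lower_tail_cvar(values, alpha):
    """Average of the values at or below the empirical alpha-quantile.

    Hypothetical reconstruction of the notebook's oracle helper. It is only
    usable here because the simulator stores both potential outcomes.
    """
    values = np.asarray(values, dtype=float)
    cutoff = np.quantile(values, alpha)
    return values[values <= cutoff].mean()


# The worst 10% of the integers 1..100 is 1..10, whose mean is 5.5.
print(lower_tail_cvar(np.arange(1, 101), 0.10))
```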
Decision-Target Vocabulary
The table below separates four related ideas. The common thread is that all of them use causal identification, but each answers a different decision question.
```python
decision_vocabulary = pd.DataFrame(
    [
        {
            "target": "Weighted ATE",
            "plain_language": "Average effect after assigning more importance to some observations.",
            "decision_question": "What is the effect for the population we care most about?",
            "DoubleML_tool": "DoubleMLIRM(weights=...)",
        },
        {
            "target": "Policy tree",
            "plain_language": "A shallow decision rule that recommends treatment when the orthogonal benefit signal is positive.",
            "decision_question": "Who should receive treatment under a simple, auditable rule?",
            "DoubleML_tool": "DoubleMLIRM.policy_tree()",
        },
        {
            "target": "QTE",
            "plain_language": "Difference between treated and untreated potential-outcome quantiles.",
            "decision_question": "Does treatment shift the lower, middle, or upper part of the outcome distribution?",
            "DoubleML_tool": "DoubleMLQTE",
        },
        {
            "target": "CVaR",
            "plain_language": "Average potential outcome inside a lower tail, such as the worst 10% or 20%.",
            "decision_question": "Does treatment improve downside outcomes for the worst-off cases?",
            "DoubleML_tool": "DoubleMLCVAR",
        },
    ]
)
save_table(decision_vocabulary, f"{NOTEBOOK_PREFIX}_decision_target_vocabulary.csv")
display(decision_vocabulary)
```
| target | plain_language | decision_question | DoubleML_tool |
| --- | --- | --- | --- |
| Weighted ATE | Average effect after assigning more importance to some observations. | What is the effect for the population we care most about? | DoubleMLIRM(weights=...) |
| Policy tree | A shallow decision rule that recommends treatment when the orthogonal benefit signal is positive. | Who should receive treatment under a simple, auditable rule? | DoubleMLIRM.policy_tree() |
| QTE | Difference between treated and untreated potential-outcome quantiles. | Does treatment shift the lower, middle, or upper part of the outcome distribution? | DoubleMLQTE |
| CVaR | Average potential outcome inside a lower tail, such as the worst 10% or 20%. | Does treatment improve downside outcomes for the worst-off cases? | DoubleMLCVAR |
The most common mistake is to describe all four targets as if they were the same effect. They are not. The weighted ATE is still a mean. The QTE is about marginal quantiles of potential outcomes. CVaR is about tail averages. A policy tree is a rule optimized against an orthogonal benefit signal.
Synthetic Decision Dataset
We create a binary-treatment, continuous-outcome dataset with three features that matter for decisions:
need_z: higher values mean the treatment is more useful.
margin_z: higher values also make the treatment more useful.
risk_z: high values can make the treatment less useful, even though treatment also partly protects against downside shocks.
The simulator stores both potential outcomes, y0_oracle and y1_oracle, so the notebook can compare estimates with oracle values. Real observational data would only contain the observed outcome.
The observed outcome is generated from one of the two potential outcomes based on treatment assignment. The oracle columns make the tutorial testable, but the DoubleML estimators only receive the observed treatment, observed outcome, and observed controls.
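The generator code is not shown in this export, but the consistency step it describes is worth seeing once. A minimal sketch, with hypothetical array names, of selecting the observed outcome from the two stored potential outcomes:

```python
import numpy as np

rng = np.random.default_rng(150)
n = 8
y0_oracle = rng.normal(0.0, 1.0, n)      # potential outcome without treatment
y1_oracle = y0_oracle + 0.5              # potential outcome with treatment (toy effect)
treatment = rng.integers(0, 2, n)        # observed binary assignment

# Consistency: the observed outcome is the potential outcome that matches the
# observed treatment; the other potential outcome stays unobserved in real data.
outcome = np.where(treatment == 1, y1_oracle, y0_oracle)

print(outcome)
```

The estimators downstream see only `treatment`, `outcome`, and the controls; the oracle arrays exist purely for diagnostics.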
Field Dictionary
This field dictionary documents the dataset roles. Decision-focused notebooks need this clarity because it is easy to accidentally target on variables that are only available in a simulator.
```python
field_dictionary = pd.DataFrame(
    [
        {"column": "need_z", "role": "Observed control / policy feature", "description": "Standardized need or opportunity signal."},
        {"column": "engagement_z", "role": "Observed control / policy feature", "description": "Standardized prior engagement signal."},
        {"column": "risk_z", "role": "Observed control / policy feature", "description": "Standardized downside-risk signal."},
        {"column": "margin_z", "role": "Observed control / policy feature", "description": "Standardized value or margin signal."},
        {"column": "tenure_z", "role": "Observed control", "description": "Standardized tenure or relationship length signal."},
        {"column": "friction_z", "role": "Observed control", "description": "Standardized friction or support burden signal."},
        {"column": "propensity_true", "role": "Oracle diagnostic", "description": "True treatment probability from the simulator; not used by estimators."},
        {"column": "treatment", "role": "Treatment", "description": "Binary treatment or exposure indicator."},
        {"column": "direct_tau_component", "role": "Oracle diagnostic", "description": "Direct mean-shift component before downside-risk mitigation."},
        {"column": "y0_oracle", "role": "Oracle diagnostic", "description": "Potential outcome under no treatment; unavailable in real data."},
        {"column": "y1_oracle", "role": "Oracle diagnostic", "description": "Potential outcome under treatment; unavailable in real data."},
        {"column": "true_effect", "role": "Oracle diagnostic", "description": "Individual treatment effect y1_oracle - y0_oracle; unavailable in real data."},
        {"column": "outcome", "role": "Outcome", "description": "Observed outcome corresponding to the observed treatment."},
        {"column": "segment", "role": "Reporting group", "description": "Mutually exclusive teaching segment used for diagnostics and summaries."},
    ]
)
save_table(field_dictionary, f"{NOTEBOOK_PREFIX}_field_dictionary.csv")
display(field_dictionary)
```
| column | role | description |
| --- | --- | --- |
| need_z | Observed control / policy feature | Standardized need or opportunity signal. |
| engagement_z | Observed control / policy feature | Standardized prior engagement signal. |
| risk_z | Observed control / policy feature | Standardized downside-risk signal. |
| margin_z | Observed control / policy feature | Standardized value or margin signal. |
| tenure_z | Observed control | Standardized tenure or relationship length signal. |
| friction_z | Observed control | Standardized friction or support burden signal. |
| propensity_true | Oracle diagnostic | True treatment probability from the simulator; not used by estimators. |
| treatment | Treatment | Binary treatment or exposure indicator. |
| direct_tau_component | Oracle diagnostic | Direct mean-shift component before downside-risk mitigation. |
| y0_oracle | Oracle diagnostic | Potential outcome under no treatment; unavailable in real data. |
| y1_oracle | Oracle diagnostic | Potential outcome under treatment; unavailable in real data. |
| true_effect | Oracle diagnostic | Individual treatment effect y1_oracle - y0_oracle; unavailable in real data. |
| outcome | Outcome | Observed outcome corresponding to the observed treatment. |
| segment | Reporting group | Mutually exclusive teaching segment used for diagnostics and summaries. |
The oracle columns are clearly labeled as diagnostics. That prevents a common tutorial mistake: letting simulated information leak into the estimator and then overestimating how easy the problem is.
Data Audit
This audit checks treatment rate, overlap, mean effect, and downside-risk scale. Since this notebook studies decision targets, it also reports the share of observations with positive oracle treatment effects.
The oracle policy increment is larger than the ATE because it treats only observations with positive individual effects. That oracle rule is not available in real data, but it gives a useful upper benchmark for the policy-learning section.
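That benchmark is easy to make concrete. The sketch below uses made-up numbers rather than the notebook's audit cell, but it shows the accounting: the oracle rule treats only positive-effect units, and untreated units contribute zero to the per-capita increment.

```python
import numpy as np

# Hypothetical individual treatment effects (the real cell uses true_effect).
true_effect = np.array([0.8, -0.3, 0.2, -0.1, 0.5])

# Full-population ATE averages every effect, positive or not.
ate = true_effect.mean()

# Oracle policy increment: treat exactly the units with positive effects;
# everyone else contributes zero to the average.
oracle_increment = np.where(true_effect > 0, true_effect, 0.0).mean()

print(ate, oracle_increment)  # 0.22 vs 0.30: the oracle rule beats the ATE
```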
Segment Audit
The segment audit gives a first view of where treatment value is concentrated. Unlike the weighted ATE section below, this cell is descriptive and oracle-based because it uses true_effect.
The synthetic design makes high-need and high-margin observations attractive treatment candidates, while high-risk observations are more mixed. That pattern sets up a meaningful policy-learning example.
Decision Design Diagram
The diagram below is a compact map of the notebook. Observed controls affect treatment assignment and outcomes. DoubleML learns nuisance functions and orthogonal scores. Decision targets then reweight, threshold, or summarize those scores in different ways.
```python
fig, ax = plt.subplots(figsize=(13, 6.6))
ax.set_axis_off()
nodes = {
    "X": {"xy": (0.10, 0.55), "label": "Observed\ncontrols X", "color": "#dbeafe"},
    "D": {"xy": (0.36, 0.34), "label": "Treatment\nD", "color": "#fef3c7"},
    "Y": {"xy": (0.62, 0.34), "label": "Outcome\nY", "color": "#fee2e2"},
    "N": {"xy": (0.38, 0.78), "label": "Nuisance\nmodels", "color": "#ede9fe"},
    "S": {"xy": (0.66, 0.78), "label": "Orthogonal\nsignal", "color": "#dcfce7"},
    "T": {"xy": (0.90, 0.55), "label": "Decision\ntargets", "color": "#e0f2fe"},
}
box_w, box_h = 0.16, 0.12


def anchor(node, side):
    x, y = nodes[node]["xy"]
    offsets = {
        "left": (-box_w / 2, 0),
        "right": (box_w / 2, 0),
        "top": (0, box_h / 2),
        "bottom": (0, -box_h / 2),
        "upper_right": (box_w / 2, box_h * 0.25),
        "lower_right": (box_w / 2, -box_h * 0.25),
        "upper_left": (-box_w / 2, box_h * 0.25),
        "lower_left": (-box_w / 2, -box_h * 0.25),
    }
    dx, dy = offsets[side]
    return np.array([x + dx, y + dy], dtype=float)


# Keep arrowheads visibly outside the boxes instead of tucking them under the patches.
def shorten(start, end, gap=0.040):
    start = np.asarray(start, dtype=float)
    end = np.asarray(end, dtype=float)
    delta = end - start
    length = np.hypot(delta[0], delta[1])
    if length == 0:
        return tuple(start), tuple(end)
    unit = delta / length
    return tuple(start + gap * unit), tuple(end - gap * unit)


def draw_arrow(start, end, color="#334155", style="solid", rad=0.0, linewidth=1.7):
    start, end = shorten(start, end)
    arrow = FancyArrowPatch(
        start,
        end,
        arrowstyle="-|>",
        mutation_scale=18,
        linewidth=linewidth,
        color=color,
        linestyle=style,
        connectionstyle=f"arc3,rad={rad}",
        zorder=2,
        clip_on=False,
    )
    ax.add_patch(arrow)


draw_arrow(anchor("X", "lower_right"), anchor("D", "left"), color="#475569", rad=-0.04)
draw_arrow(anchor("X", "right"), anchor("Y", "left"), color="#475569", rad=-0.10)
draw_arrow(anchor("D", "right"), anchor("Y", "left"), color="#b45309")
draw_arrow(anchor("X", "upper_right"), anchor("N", "left"), color="#7c3aed", rad=0.04)
draw_arrow(anchor("N", "right"), anchor("S", "left"), color="#15803d")
draw_arrow(anchor("S", "right"), anchor("T", "upper_left"), color="#15803d", rad=-0.04)
draw_arrow(anchor("Y", "right"), anchor("T", "lower_left"), color="#475569", rad=0.04)

for spec in nodes.values():
    x, y = spec["xy"]
    rect = FancyBboxPatch(
        (x - box_w / 2, y - box_h / 2),
        box_w,
        box_h,
        boxstyle="round,pad=0.018",
        facecolor=spec["color"],
        edgecolor="#334155",
        linewidth=1.2,
        zorder=3,
    )
    ax.add_patch(rect)
    ax.text(x, y, spec["label"], ha="center", va="center", fontsize=11, fontweight="bold", zorder=4)

ax.text(
    0.50,
    0.08,
    "Weighted effects, policy trees, quantiles, and CVaR are decision targets built on the same causal design.",
    ha="center",
    va="center",
    fontsize=10,
    color="#475569",
)
ax.set_title("Decision Targets Built From A DoubleML Design", pad=18)
plt.tight_layout()
fig.savefig(FIGURE_DIR / f"{NOTEBOOK_PREFIX}_decision_targets_design.png", dpi=160, bbox_inches="tight")
plt.show()
```
The diagram puts policy learning and distributional summaries after the causal design. This ordering matters: decision tools cannot repair weak identification assumptions.
Fit The Base DoubleMLIRM Model
We start with a standard DoubleMLIRM ATE model. This fitted object will later produce the policy tree. It also provides nuisance predictions for overlap checks.
The base ATE estimates the average effect in the full population. This is the reference point for the weighted and policy sections that follow.
Nuisance Diagnostics
The policy tree and weighted ATEs depend on the same nuisance components. This cell checks cross-fitted outcome predictions, propensity predictions, and overlap.
The estimated propensity range stays away from zero and one. That supports, but does not prove, the overlap condition needed for policy and weighted-effect targets.
Propensity Overlap Plot
This visual check shows whether treated and untreated observations share common support in the estimated propensity score.
The distributions overlap over most of the range. If one group lived only near zero or one, policy learning would mostly extrapolate from model structure rather than observed comparisons.
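The numeric side of this check can be sketched with scikit-learn alone. The version below cross-fits a propensity model, mirroring DoubleML's m-hat nuisance, and reports the share of scores inside a common-support band; the 0.05 and 0.95 bounds are illustrative thresholds, not a DoubleML default.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_predict

rng = np.random.default_rng(150)
n = 1000
X = rng.normal(size=(n, 4))
logits = 0.8 * X[:, 0] - 0.5 * X[:, 1]
d = rng.binomial(1, 1 / (1 + np.exp(-logits)))

# Cross-fitted propensity scores: each row is scored by a model that never saw it.
clf = RandomForestClassifier(n_estimators=100, min_samples_leaf=20, random_state=150)
m_hat = cross_val_predict(clf, X, d, cv=5, method="predict_proba")[:, 1]

inside = np.mean((m_hat >= 0.05) & (m_hat <= 0.95))
print(f"min={m_hat.min():.3f} max={m_hat.max():.3f} share in [0.05, 0.95]={inside:.2%}")
```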
Weighted ATEs
A weighted ATE changes the target population. In DoubleML, an observation-level weight vector can be passed through weights= when using DoubleMLIRM(score="ATE").
The weights below are normalized to have mean one. This keeps the scale comparable while changing which rows receive more influence.
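A minimal sketch of that normalization step, using a hypothetical priority flag; the resulting vector is what would be passed through `weights=` to DoubleMLIRM.

```python
import numpy as np

rng = np.random.default_rng(150)
# Hypothetical priority indicator standing in for a high-need flag.
high_need = rng.normal(size=500) > 0.5

# Upweight the priority rows, then normalize so the weights average to one.
raw_weights = np.where(high_need, 3.0, 1.0)
weights = raw_weights / raw_weights.mean()

print(weights.mean())  # 1.0 by construction
# weights would then be supplied via DoubleMLIRM(..., score="ATE", weights=weights).
```

Normalizing to mean one keeps the weighted estimate on the same scale as the unweighted ATE while shifting influence toward the priority rows.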
The weighted estimates answer different questions. The high-need and high-margin targets emphasize populations where the treatment is designed to help more, while the high-risk target emphasizes a population where the effect is more uncertain and can be lower.
Plot Weighted ATEs
The figure below keeps the full-population ATE visible while showing how the target changes under weighting.
The plot shows why weighted ATEs are not just cosmetic. They can change the answer by changing the population being averaged over.
Policy Tree
A policy tree turns the orthogonal treatment-benefit signal into a shallow decision rule. DoubleML fits this as a weighted classification problem: the sign of the orthogonal signal says whether treatment appears beneficial, and the magnitude acts as an importance weight.
The tree is deliberately shallow. In decision settings, a slightly less flexible rule can be easier to audit, explain, and validate.
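That reduction can be made concrete with scikit-learn. The sketch below is not DoubleML's internal code, but it follows the same idea the text describes: classify the sign of a (here, synthetic) orthogonal benefit signal, weighting each observation by the signal's magnitude.

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier, export_text

rng = np.random.default_rng(150)
n = 2000
features = rng.normal(size=(n, 2))  # stand-ins for policy features like need_z, risk_z
# Hypothetical noisy benefit signal: positive when the first feature is high.
signal = 0.6 * features[:, 0] - 0.2 * features[:, 1] + rng.normal(scale=0.5, size=n)

# Sign of the signal gives the label; magnitude gives the importance weight.
tree = DecisionTreeClassifier(max_depth=2, random_state=150)
tree.fit(features, (signal > 0).astype(int), sample_weight=np.abs(signal))

print(export_text(tree, feature_names=["need_z", "risk_z"]))
policy = tree.predict(features)  # 1 = recommend treatment
print(f"Recommended treatment rate: {policy.mean():.2%}")
```

Keeping `max_depth=2` is what makes the printed rules short enough to audit line by line.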
The printed rules are the policy. This is a major advantage of shallow policy trees: the decision logic can be inspected directly instead of only being summarized by a score.
Plot The Policy Tree
The plotted tree is saved as a figure so the learned rule can be reviewed without rerunning the notebook.
The tree should be read as a candidate rule, not as a final deployment artifact. A production rule would need holdout validation, monitoring, guardrails, and an explicit treatment-cost model.
Policy Value Summary
The policy tree is optimized against the orthogonal signal psi_b. The table below reports two quantities:
orthogonal_policy_score: the score the learned rule is trying to improve.
oracle_increment_in_simulation: the mean true treatment effect among treated-by-policy observations, counting untreated-by-policy observations as zero.
The oracle columns exist only because this is a simulation.
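Both quantities can be sketched with plain numpy given a candidate 0/1 policy, the orthogonal signal, and, in simulation only, the true effects. All names below are hypothetical stand-ins for the notebook's columns.

```python
import numpy as np

rng = np.random.default_rng(150)
n = 1000
true_effect = rng.normal(0.3, 0.8, n)                 # oracle, simulation only
signal = true_effect + rng.normal(scale=0.6, size=n)  # noisy orthogonal benefit signal
policy = (signal > 0).astype(int)                     # candidate treatment rule

# Score the learned rule is trying to improve: mean signal among treated-by-policy
# rows, with untreated-by-policy rows contributing zero.
orthogonal_policy_score = np.mean(signal * policy)

# Oracle counterpart: same accounting, but using the true individual effects.
oracle_increment = np.mean(true_effect * policy)

print(orthogonal_policy_score, oracle_increment)
```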
The policy tree improves on treating nobody and can improve on treating everyone when it avoids enough low-effect cases. The oracle positive-effect policy is an upper benchmark, not an attainable rule in observational data.
Policy Assignment Diagnostics
A policy rule should be inspected for who it treats. This cell summarizes treatment recommendations by the teaching segments.
The policy tree tends to treat high-need and high-margin segments more often, while being more selective in the high-risk segment. This is the kind of operational behavior a policy learner should make transparent.
Plot Policy Assignment By Segment
The plot compares the learned policy treatment rate with the observed historical treatment rate. These are different objects: the observed rate is what happened in the data, while the policy rate is what the learned rule recommends.
The difference between historical treatment and recommended treatment is a useful audit. Large differences can be promising, but they also raise implementation and extrapolation questions.
Quantile Treatment Effects
The ATE summarizes the mean. A QTE compares quantiles of the treated and untreated potential-outcome distributions. For example, the 10th-percentile QTE asks whether treatment improves the lower part of the outcome distribution.
A QTE is not the 10th percentile of individual treatment effects. It is the difference between two marginal potential-outcome quantiles.
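The distinction is easy to verify numerically. In the toy simulation below, treatment compresses the outcome distribution and shifts it up, so the difference of marginal 10th percentiles and the 10th percentile of individual effects are very different numbers.

```python
import numpy as np

rng = np.random.default_rng(150)
y0 = rng.normal(size=100_000)
y1 = 0.5 * y0 + 1.0  # treatment compresses the distribution and shifts it up

# QTE: difference of the marginal 10th percentiles of the two potential outcomes.
qte_10 = np.quantile(y1, 0.10) - np.quantile(y0, 0.10)

# Not the same object: the 10th percentile of individual effects y1 - y0.
effect_q10 = np.quantile(y1 - y0, 0.10)

print(f"10th-percentile QTE:        {qte_10:.2f}")   # about 1.64
print(f"10th percentile of effects: {effect_q10:.2f}")  # about 0.36
```

Conflating the two would badly misstate what the treatment does to the lower tail here.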
The QTE table shows whether the treatment effect differs across the outcome distribution. In this synthetic design, distributional effects matter because treatment partly mitigates downside shocks.
Plot Quantile Treatment Effects
The QTE plot makes the distributional pattern easier to scan than a table. The oracle points are available only because this is simulated data.
The curve shows why distribution-aware targets can change the conversation. If lower-tail effects are large, the treatment may be valuable as a risk-reduction tool even when the average effect is only moderate.
CVaR For Potential Outcomes
CVaR is a tail-average target. With an outcome where higher is better, lower-tail CVaR at alpha=0.10 is the average outcome among the worst 10% of the potential-outcome distribution.
DoubleMLCVAR estimates the CVaR of a chosen potential outcome, selected by treatment=0 or treatment=1. We fit both and compare them descriptively.
The potential-outcome CVaR estimates are levels, not treatment effects by themselves. The difference between Y(1) and Y(0) lower-tail CVaR is a tail-risk improvement summary.
CVaR Difference Summary
This cell computes a descriptive treated-minus-control CVaR difference from the two fitted potential-outcome CVaRs. The interval is not reported as a formal joint interval because the two estimates are fitted separately here.
```python
cvar_difference_rows = []
for alpha in cvar_alphas:
    treated = cvar_results.query("alpha == @alpha and potential_outcome == 'Y(1)'").iloc[0]
    control = cvar_results.query("alpha == @alpha and potential_outcome == 'Y(0)'").iloc[0]
    cvar_difference_rows.append(
        {
            "alpha": alpha,
            "estimated_tail_improvement": treated["estimate"] - control["estimate"],
            "oracle_tail_improvement": treated["oracle_cvar_in_simulation"] - control["oracle_cvar_in_simulation"],
            "note": "Difference of separately estimated potential-outcome CVaRs; treat uncertainty descriptively here.",
        }
    )
cvar_difference = pd.DataFrame(cvar_difference_rows)
save_table(cvar_difference, f"{NOTEBOOK_PREFIX}_cvar_tail_improvement.csv")
display(cvar_difference)
```
| alpha | estimated_tail_improvement | oracle_tail_improvement | note |
| --- | --- | --- | --- |
| 0.1000 | 0.3977 | 0.2718 | Difference of separately estimated potential-outcome CVaRs; treat uncertainty descriptively here. |
| 0.2000 | 0.4046 | 0.1947 | Difference of separately estimated potential-outcome CVaRs; treat uncertainty descriptively here. |
The tail-improvement summary is useful when downside outcomes are operationally important. It is not a replacement for the ATE or QTE; it answers a different risk-focused question.
Plot CVaR Potential Outcomes
This plot shows the estimated lower-tail potential-outcome levels for treatment and control at each alpha.
```python
fig, ax = plt.subplots(figsize=(10.5, 5.8))
sns.pointplot(
    data=cvar_results,
    x="alpha",
    y="estimate",
    hue="potential_outcome",
    dodge=0.25,
    errorbar=None,
    markers="o",
    linestyles="-",
    ax=ax,
)
# Add confidence intervals manually so they line up with the dodged points.
alpha_positions = {alpha: idx for idx, alpha in enumerate(sorted(cvar_results["alpha"].unique()))}
offsets = {"Y(0)": -0.10, "Y(1)": 0.10}
for _, row in cvar_results.iterrows():
    x = alpha_positions[row["alpha"]] + offsets[row["potential_outcome"]]
    ax.errorbar(
        x=x,
        y=row["estimate"],
        yerr=[[row["estimate"] - row["ci_95_lower"]], [row["ci_95_upper"] - row["estimate"]]],
        fmt="none",
        color="#111827",
        capsize=4,
        linewidth=1.5,
    )
ax.set_title("Lower-Tail CVaR Of Potential Outcomes")
ax.set_xlabel("Lower-tail alpha")
ax.set_ylabel("CVaR potential outcome")
ax.legend(title="Potential outcome", loc="center left", bbox_to_anchor=(1.02, 0.5))
plt.tight_layout()
fig.savefig(FIGURE_DIR / f"{NOTEBOOK_PREFIX}_cvar_potential_outcomes.png", dpi=160, bbox_inches="tight")
plt.show()
```
The treated potential outcome has a higher lower-tail CVaR in this simulation, which means treatment improves the worst-outcome tail. This is the risk-reduction story that an ATE alone can hide.
Bring The Decision Targets Together
The table below summarizes how each target should be used. This is the kind of table that helps prevent a decision review from collapsing all causal outputs into one number.
```python
decision_target_summary = pd.DataFrame(
    [
        {
            "target": "Overall ATE",
            "main_output": f"{float(base_model.coef[0]):.3f}",
            "use_when": "You need a population-average effect.",
            "main_caution": "Can hide subgroup and distributional variation.",
        },
        {
            "target": "Weighted ATE",
            "main_output": "See weighted ATE table",
            "use_when": "A priority population matters more than the full sample.",
            "main_caution": "Weights define a new estimand and must be justified before analysis.",
        },
        {
            "target": "Policy tree",
            "main_output": f"Treatment rate {tree_policy.mean():.2%}",
            "use_when": "You need a simple candidate targeting rule.",
            "main_caution": "Needs validation, capacity/cost checks, and monitoring before use.",
        },
        {
            "target": "QTE",
            "main_output": "See quantile curve",
            "use_when": "You care about distributional shifts, not just the mean.",
            "main_caution": "QTE is a difference in potential-outcome quantiles, not a quantile of individual effects.",
        },
        {
            "target": "CVaR",
            "main_output": "See lower-tail table",
            "use_when": "Downside-risk outcomes are especially important.",
            "main_caution": "Tail estimates can be noisier and need careful uncertainty handling.",
        },
    ]
)
save_table(decision_target_summary, f"{NOTEBOOK_PREFIX}_decision_target_summary.csv")
display(decision_target_summary)
```
| target | main_output | use_when | main_caution |
| --- | --- | --- | --- |
| Overall ATE | 0.347 | You need a population-average effect. | Can hide subgroup and distributional variation. |
| Weighted ATE | See weighted ATE table | A priority population matters more than the full sample. | Weights define a new estimand and must be justified before analysis. |
| Policy tree | Treatment rate 88.86% | You need a simple candidate targeting rule. | Needs validation, capacity/cost checks, and monitoring before use. |
| QTE | See quantile curve | You care about distributional shifts, not just the mean. | QTE is a difference in potential-outcome quantiles, not a quantile of individual effects. |
| CVaR | See lower-tail table | Downside-risk outcomes are especially important. | Tail estimates can be noisier and need careful uncertainty handling. |
This table is deliberately practical. It says not only what each estimator produces, but also when the output should and should not drive a decision.
Reporting Checklist
The checklist below is reusable across decision-target analyses. It emphasizes pre-specification, target clarity, and validation.
```python
reporting_checklist = pd.DataFrame(
    [
        {"item": "Target definition", "guidance": "State whether the target is ATE, weighted ATE, policy value, QTE, or CVaR."},
        {"item": "Weight justification", "guidance": "If using weights, explain the target population and normalize weights transparently."},
        {"item": "Policy constraints", "guidance": "State treatment costs, capacity limits, fairness constraints, and monitoring needs."},
        {"item": "Distributional framing", "guidance": "For QTE and CVaR, explain why mean effects are not enough."},
        {"item": "Overlap", "guidance": "Show propensity overlap, especially for weighted and tail-focused targets."},
        {"item": "Validation", "guidance": "Validate candidate policies and distributional claims on held-out data or future experiments."},
        {"item": "Decision boundary", "guidance": "Separate evidence for prioritization from evidence for automatic deployment."},
    ]
)
save_table(reporting_checklist, f"{NOTEBOOK_PREFIX}_reporting_checklist.csv")
display(reporting_checklist)
```
| item | guidance |
| --- | --- |
| Target definition | State whether the target is ATE, weighted ATE, policy value, QTE, or CVaR. |
| Weight justification | If using weights, explain the target population and normalize weights transparently. |
| Policy constraints | State treatment costs, capacity limits, fairness constraints, and monitoring needs. |
| Distributional framing | For QTE and CVaR, explain why mean effects are not enough. |
| Overlap | Show propensity overlap, especially for weighted and tail-focused targets. |
| Validation | Validate candidate policies and distributional claims on held-out data or future experiments. |
| Decision boundary | Separate evidence for prioritization from evidence for automatic deployment. |
The checklist keeps the analysis grounded. More advanced targets can be more useful, but they also create more ways to overstate what the data supports.
Write A Reusable Report Template
This Markdown report template captures the structure of the notebook. It can be used as a starting point for a decision-target writeup.
```python
report_template = f"""# Decision-Target Causal Report Template

## Causal Design
State the treatment, outcome, population, controls, and identification assumptions. Explain why observational adjustment is credible or where it is weak.

## Mean Effect
- Estimated ATE: {float(base_model.coef[0]):.4f}
- 95% confidence interval: [{float(base_model.confint().iloc[0, 0]):.4f}, {float(base_model.confint().iloc[0, 1]):.4f}]

## Weighted Effects
List the target populations, weight definitions, and weighted ATE estimates. Explain why each weighting scheme is decision-relevant.

## Policy Learning
Describe the policy features, tree depth, treatment rate, and validation plan. Make clear that the policy tree is a candidate rule, not an automatic launch decision.

## Distributional Effects
Report QTEs across selected quantiles. Explain whether the treatment changes the lower tail, middle, or upper tail of the potential-outcome distribution.

## Tail Risk
Report CVaR levels for treated and untreated potential outcomes. Explain why downside outcomes matter for the decision.

## Recommendation Boundary
Separate what the estimates support now from what needs further validation, experimentation, monitoring, or governance.
"""
report_path = REPORT_DIR / f"{NOTEBOOK_PREFIX}_decision_target_report_template.md"
report_path.write_text(report_template)
print(f"Wrote report template to: {report_path}")
```
Wrote report template to: /home/apex/Documents/ranking_sys/notebooks/tutorials/doubleml/outputs/reports/15_decision_target_report_template.md
The template forces the report to keep target choice and decision choice separate. That separation is one of the best habits in applied causal work.
Artifact Manifest
The manifest lists the main files created by this notebook.
The notebook now produces data, tables, figures, text rules, and a report template. That makes it useful both as a tutorial and as a reusable analysis pattern.
What Comes Next
The next tutorial is about custom scores and the advanced API. This notebook used built-in decision-target tools. The next step is to learn how to extend DoubleML when the estimand or score is not already covered by a standard class.
The main lesson here is simple: better decision targets are not automatically better decisions. Weighted ATEs, policy trees, QTEs, and CVaR are powerful because they make the decision question sharper. They still need credible identification, honest uncertainty, and validation before action.