DoWhy Tutorial 13: Root Cause, Anomaly, And Distribution Change
The previous GCM notebooks used structural causal models for simulation, intervention, and counterfactual questions. This notebook uses the same GCM idea for a different class of problems: why did something look abnormal or shift over time?
We will use DoWhy GCM tools for two related but distinct tasks:
Anomaly attribution: Given a small number of unusual rows, which upstream variables explain why the target outcome is anomalous?
Distribution-change attribution: Given an old dataset and a new dataset, which causal mechanisms or upstream variables explain the change in the target distribution?
These questions are common in monitoring, operations, experimentation diagnostics, product analytics, risk systems, and model-quality investigations. The notebook stays synthetic so the story is easy to follow, but the workflow is general.
Learning Goals
By the end of this notebook, you should be able to:
Explain the difference between row-level anomaly attribution and population-level distribution-change attribution.
Fit an invertible GCM that supports anomaly attribution.
Use gcm.anomaly_scores to score unusual observations node by node.
Use gcm.attribute_anomalies to attribute target anomalies to upstream nodes.
Use gcm.distribution_change to attribute a target distribution shift between old and new data.
Use gcm.distribution_change_robust as a complementary mean-shift attribution method.
Report root-cause results with the assumptions and limitations that make them credible.
Why Root Cause Analysis Is A Causal Problem
A dashboard can tell us that a metric moved. A predictive model can tell us which variables are associated with that movement. A causal graph tries to answer a more operational question:
Which upstream changes would explain the observed downstream anomaly if the graph and mechanisms were correct?
That is a causal question because downstream variables can be symptoms rather than causes. For example, low session depth and low future value may appear together, but low session depth might be caused by an earlier latency spike or catalog-quality problem. A GCM uses the graph to keep those upstream and downstream roles separate.
Setup
This cell imports the libraries, configures output folders, suppresses known non-actionable warnings, and disables DoWhy GCM progress bars. The notebook uses InvertibleStructuralCausalModel because anomaly attribution needs to recover row-level noise for non-root nodes.
from pathlib import Path
import os
import warnings

# Keep Matplotlib cache files in a writable location during notebook execution.
os.environ.setdefault("MPLCONFIGDIR", "/tmp/matplotlib-ranking-sys")

warnings.filterwarnings("default")
warnings.filterwarnings("ignore", category=DeprecationWarning)
warnings.filterwarnings("ignore", category=PendingDeprecationWarning)
warnings.filterwarnings("ignore", category=FutureWarning)
warnings.filterwarnings("ignore", message=".*IProgress not found.*")
warnings.filterwarnings("ignore", message=".*setParseAction.*deprecated.*")
warnings.filterwarnings("ignore", message=".*copy keyword is deprecated.*")
warnings.filterwarnings("ignore", message=".*disp.*iprint.*L-BFGS-B.*")
warnings.filterwarnings("ignore", message=".*variables are assumed unobserved.*")
warnings.filterwarnings("ignore", module="pydot.dot_parser")

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
import networkx as nx
from IPython.display import display

from dowhy import gcm
import dowhy

RANDOM_SEED = 2026
rng = np.random.default_rng(RANDOM_SEED)

OUTPUT_DIR = Path("outputs")
FIGURE_DIR = OUTPUT_DIR / "figures"
TABLE_DIR = OUTPUT_DIR / "tables"
FIGURE_DIR.mkdir(parents=True, exist_ok=True)
TABLE_DIR.mkdir(parents=True, exist_ok=True)

gcm.config.show_progress_bars = False

sns.set_theme(style="whitegrid", context="notebook")
pd.set_option("display.max_columns", 80)
pd.set_option("display.float_format", lambda value: f"{value:,.4f}")

print(f"DoWhy version: {getattr(dowhy, '__version__', 'unknown')}")
print(f"Figures will be saved to: {FIGURE_DIR.resolve()}")
print(f"Tables will be saved to: {TABLE_DIR.resolve()}")
DoWhy version: 0.14
Figures will be saved to: /home/apex/Documents/ranking_sys/notebooks/tutorials/dowhy/outputs/figures
Tables will be saved to: /home/apex/Documents/ranking_sys/notebooks/tutorials/dowhy/outputs/tables
The environment is ready once the DoWhy version and output folders print. Every saved artifact in this notebook uses a 13_ prefix.
Problem Map
Root-cause work can mean several different things. The next table separates the tasks covered in this notebook so the question each tool answers stays precise.
problem_map = pd.DataFrame(
    [
        {
            "task": "node anomaly scoring",
            "DoWhy function": "gcm.anomaly_scores",
            "input": "a fitted GCM and anomalous rows",
            "question answered": "Which nodes look individually unusual, conditional on their parents?",
        },
        {
            "task": "target anomaly attribution",
            "DoWhy function": "gcm.attribute_anomalies",
            "input": "a fitted invertible GCM, target node, and anomalous rows",
            "question answered": "Which upstream nodes contribute to the target anomaly for each row?",
        },
        {
            "task": "distribution-change attribution",
            "DoWhy function": "gcm.distribution_change",
            "input": "old data, new data, target node, and a causal graph",
            "question answered": "Which node-level changes explain the target distribution shift?",
        },
        {
            "task": "robust mean-shift attribution",
            "DoWhy function": "gcm.distribution_change_robust",
            "input": "old data, new data, target node, and a causal graph",
            "question answered": "Which variables explain the change in a selected target functional such as the mean?",
        },
    ]
)
problem_map.to_csv(TABLE_DIR / "13_problem_map.csv", index=False)
display(problem_map)
   task                             DoWhy function                  input                                                     question answered
0  node anomaly scoring             gcm.anomaly_scores              a fitted GCM and anomalous rows                           Which nodes look individually unusual, conditional on their parents?
1  target anomaly attribution       gcm.attribute_anomalies         a fitted invertible GCM, target node, and anomalous rows  Which upstream nodes contribute to the target anomaly for each row?
2  distribution-change attribution  gcm.distribution_change         old data, new data, target node, and a causal graph       Which node-level changes explain the target distribution shift?
3  robust mean-shift attribution    gcm.distribution_change_robust  old data, new data, target node, and a causal graph       Which variables explain the change in a selected target functional such as the mean?
The key distinction is row versus population. Anomaly attribution explains specific unusual rows; distribution-change attribution explains why a target distribution changed between two samples.
Teaching Scenario
We will simulate a service-quality system with six observed variables:
traffic_intent: upstream demand or user intent.
catalog_health: availability and match quality of useful items.
latency_pressure: infrastructure or response-time pressure.
recommendation_quality: quality of the ranked experience.
session_depth: short-term engagement depth.
future_value: later value or retention-like outcome.
The target for root-cause analysis is future_value. In the old period the system is healthy. In the new period, catalog health declines, latency pressure increases, and recommendation quality receives an additional mechanism penalty. This gives us a realistic example where both root distributions and a downstream mechanism change.
field_guide = pd.DataFrame(
    [
        {
            "column": "traffic_intent",
            "role": "root context variable",
            "plain meaning": "Baseline demand or intent entering the system.",
            "root-cause note": "Can make outcomes high or low, but is not caused by other variables in this graph.",
        },
        {
            "column": "catalog_health",
            "role": "root quality variable",
            "plain meaning": "How healthy and relevant the available catalog is.",
            "root-cause note": "A decline can propagate into recommendation quality and future value.",
        },
        {
            "column": "latency_pressure",
            "role": "root reliability variable",
            "plain meaning": "Higher values mean slower or more strained service conditions.",
            "root-cause note": "A spike can harm recommendation quality, session depth, and future value.",
        },
        {
            "column": "recommendation_quality",
            "role": "intermediate mechanism",
            "plain meaning": "Quality of the ranked or selected experience.",
            "root-cause note": "Can be a cause of downstream symptoms or a symptom of upstream problems.",
        },
        {
            "column": "session_depth",
            "role": "mediating outcome",
            "plain meaning": "Short-term engagement depth during the session.",
            "root-cause note": "Often moves with future value, but sits downstream of quality and latency.",
        },
        {
            "column": "future_value",
            "role": "target outcome",
            "plain meaning": "Later value we want to explain when it becomes anomalous or shifts.",
            "root-cause note": "The target is explained by upstream nodes, not used as a root cause of itself except for residual noise.",
        },
    ]
)
field_guide.to_csv(TABLE_DIR / "13_field_guide.csv", index=False)
display(field_guide)
   column                  role                       plain meaning                                                        root-cause note
0  traffic_intent          root context variable      Baseline demand or intent entering the system.                       Can make outcomes high or low, but is not caused by other variables in this graph.
1  catalog_health          root quality variable      How healthy and relevant the available catalog is.                   A decline can propagate into recommendation quality and future value.
2  latency_pressure        root reliability variable  Higher values mean slower or more strained service conditions.       A spike can harm recommendation quality, session depth, and future value.
3  recommendation_quality  intermediate mechanism     Quality of the ranked or selected experience.                        Can be a cause of downstream symptoms or a symptom of upstream problems.
4  session_depth           mediating outcome          Short-term engagement depth during the session.                      Often moves with future value, but sits downstream of quality and latency.
5  future_value            target outcome             Later value we want to explain when it becomes anomalous or shifts.  The target is explained by upstream nodes, not used as a root cause of itself except for residual noise.
The root-cause note column is important. A downstream symptom can be useful evidence, but the graph determines whether it should be treated as a candidate cause or as part of the propagation path.
Simulate Old And New System Periods
The helper below creates data for an old healthy period and a new shifted period. The new period has three changes: lower catalog health, higher latency pressure, and an additional recommendation-quality penalty.
def simulate_service_quality_data(n, period, rng):
    """Generate a service-quality dataset for either the old or new period."""
    if period == "old":
        traffic_intent = rng.normal(0.00, 1.00, size=n)
        catalog_health = rng.normal(0.00, 1.00, size=n)
        latency_pressure = rng.normal(0.00, 1.00, size=n)
        recommendation_penalty = 0.00
    elif period == "new":
        traffic_intent = rng.normal(0.05, 1.00, size=n)
        catalog_health = rng.normal(-0.55, 1.00, size=n)
        latency_pressure = rng.normal(0.65, 1.00, size=n)
        recommendation_penalty = -0.25
    else:
        raise ValueError("period must be either 'old' or 'new'")

    recommendation_quality = (
        0.55 * traffic_intent
        + 0.65 * catalog_health
        - 0.55 * latency_pressure
        + recommendation_penalty
        + rng.normal(0, 0.45, size=n)
    )
    session_depth = (
        0.70 * recommendation_quality
        + 0.35 * traffic_intent
        - 0.35 * latency_pressure
        + rng.normal(0, 0.50, size=n)
    )
    future_value = (
        0.45 * recommendation_quality
        + 0.65 * session_depth
        + 0.25 * traffic_intent
        - 0.25 * latency_pressure
        + rng.normal(0, 0.55, size=n)
    )
    return pd.DataFrame(
        {
            "period": period,
            "traffic_intent": traffic_intent,
            "catalog_health": catalog_health,
            "latency_pressure": latency_pressure,
            "recommendation_quality": recommendation_quality,
            "session_depth": session_depth,
            "future_value": future_value,
        }
    )

old_data = simulate_service_quality_data(2_000, "old", rng)
new_data = simulate_service_quality_data(2_000, "new", rng)
combined_data = pd.concat([old_data, new_data], ignore_index=True)

old_data.to_csv(TABLE_DIR / "13_old_period_data.csv", index=False)
new_data.to_csv(TABLE_DIR / "13_new_period_data.csv", index=False)
combined_data.to_csv(TABLE_DIR / "13_combined_period_data.csv", index=False)
display(combined_data.head())
   period  traffic_intent  catalog_health  latency_pressure  recommendation_quality  session_depth  future_value
0  old            -0.7931         -0.4100            1.4960                 -1.3631        -1.8151       -2.4419
1  old             0.2406         -0.4360           -0.9171                 -0.1403         0.5292       -0.2684
2  old            -1.8963         -0.0217            0.0238                 -1.2923        -1.5030       -2.9105
3  old             1.3958          0.1183           -0.9931                  1.6108         1.2316        2.0909
4  old             0.6383         -0.2030            0.4032                 -0.2672        -0.2622        0.4969
The period column is kept for EDA and saved outputs, but the GCM itself will be fit on the causal variables. The period label is not a causal node in this graph.
Basic Period Comparison
Before using GCM tools, compare old and new summaries. This tells us the visible symptom: what changed in the new period?
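The comparison itself is plain pandas. The sketch below is self-contained rather than the notebook's actual cell: it builds a small stand-in for `combined_data` with two hypothetical columns and shifts chosen to mirror the scenario, then groups by period.

```python
import numpy as np
import pandas as pd

# Toy stand-in for the notebook's combined_data: two metrics over two periods.
rng = np.random.default_rng(7)
combined = pd.DataFrame(
    {
        "period": ["old"] * 500 + ["new"] * 500,
        "latency_pressure": np.r_[rng.normal(0.0, 1.0, 500), rng.normal(0.65, 1.0, 500)],
        "future_value": np.r_[rng.normal(0.0, 1.0, 500), rng.normal(-1.1, 1.0, 500)],
    }
)

# Per-period means and standard deviations, then the new-minus-old gap.
summary = combined.groupby("period").agg(["mean", "std"])
gap = summary.loc["new"] - summary.loc["old"]
print(summary.round(3))
print(gap.round(3))
```

The same `groupby("period").agg(...)` pattern applied to the real `combined_data` produces the period comparison discussed next.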
The new period shows lower catalog health, higher latency pressure, lower recommendation quality, lower session depth, and lower future value. That is the symptom pattern the causal tools will try to decompose.
Plot Old Versus New Distributions
Distribution plots show whether the shift is broad or concentrated in the tails. We focus on the main upstream suspects and the target outcome.
The distribution view makes the shift visible: catalog health moves left, latency pressure moves right, and the downstream target moves left. The GCM sections ask how much each upstream change contributes to the target movement.
Specify The Causal Graph
The graph separates root conditions from downstream mechanisms. traffic_intent, catalog_health, and latency_pressure are root nodes. Recommendation quality, session depth, and future value are generated downstream.
The graph lets latency pressure affect future value both directly and indirectly through recommendation quality and session depth. That is why root-cause attribution can differ from a simple correlation ranking.
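One way to encode this graph (a sketch consistent with the simulator's structural equations; the notebook's own graph cell is not shown above) is a `networkx.DiGraph` built from its edge list:

```python
import networkx as nx

# The ten edges implied by the structural equations in the simulator.
causal_graph = nx.DiGraph(
    [
        ("traffic_intent", "recommendation_quality"),
        ("traffic_intent", "session_depth"),
        ("traffic_intent", "future_value"),
        ("catalog_health", "recommendation_quality"),
        ("latency_pressure", "recommendation_quality"),
        ("latency_pressure", "session_depth"),
        ("latency_pressure", "future_value"),
        ("recommendation_quality", "session_depth"),
        ("recommendation_quality", "future_value"),
        ("session_depth", "future_value"),
    ]
)
print(f"{causal_graph.number_of_nodes()} nodes, {causal_graph.number_of_edges()} edges")
```

Note that `catalog_health` has only one outgoing edge: in this scenario it reaches the target only through recommendation quality.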
Visualize The Root-Cause Graph
The diagram places root variables on the left and downstream outcomes on the right. Arrows are drawn before node labels so they stop visually behind the labeled boxes.
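A minimal sketch of this layered layout (manual positions are illustrative; the notebook's actual drawing cell is not shown) draws the edges first and then the boxed labels on top:

```python
import matplotlib.pyplot as plt
import networkx as nx

graph = nx.DiGraph(
    [
        ("traffic_intent", "recommendation_quality"), ("traffic_intent", "session_depth"),
        ("traffic_intent", "future_value"), ("catalog_health", "recommendation_quality"),
        ("latency_pressure", "recommendation_quality"), ("latency_pressure", "session_depth"),
        ("latency_pressure", "future_value"), ("recommendation_quality", "session_depth"),
        ("recommendation_quality", "future_value"), ("session_depth", "future_value"),
    ]
)

# Manual left-to-right positions: roots on the left, target on the right.
positions = {
    "traffic_intent": (0.0, 2.0), "catalog_health": (0.0, 1.0), "latency_pressure": (0.0, 0.0),
    "recommendation_quality": (1.0, 1.0), "session_depth": (2.0, 1.0), "future_value": (3.0, 1.0),
}

fig, ax = plt.subplots(figsize=(9, 4))
# Edges first so the labeled boxes sit visually on top of the arrowheads.
nx.draw_networkx_edges(graph, positions, ax=ax, arrows=True, node_size=1200)
nx.draw_networkx_labels(
    graph, positions, ax=ax, font_size=9,
    bbox={"boxstyle": "round,pad=0.3", "facecolor": "white", "edgecolor": "#9ca3af"},
)
ax.set_axis_off()
plt.tight_layout()
```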
The graph shows why session_depth can be a symptom rather than the original root cause. It is downstream of recommendation quality and latency pressure.
Build And Fit The Reference GCM
We fit the reference GCM on the old healthy period. Root nodes use empirical distributions. Non-root nodes use additive-noise mechanisms with linear regressors, which match the structural form used in this teaching data and support anomaly attribution.
The reference model represents normal behavior. Later, anomalies and distribution shifts are judged relative to this old-period causal system.
Fit The Reference Mechanisms
The next cell estimates each node mechanism from old-period data. After fitting, the model can generate normal samples, score anomalies, and reconstruct noise for anomalous rows.
The fitted model is now the baseline causal model. The next check asks whether it can reproduce the old-period distribution well enough for attribution examples.
Check Reference Model Samples
A root-cause workflow should not skip model checks. Here we draw samples from the fitted reference model and compare them to old-period data.
The generated sample is close to the old observed data on means and standard deviations. That does not prove the graph is correct, but it gives the attribution examples a reasonable baseline.
Choose Anomalous Rows
For anomaly attribution, we select a handful of new-period rows with the lowest future_value. These are the rows whose downstream abnormality we want to explain using the old-period reference model.
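The selection itself is a one-liner. A toy sketch (with a hypothetical single-column `new_data`; the notebook's cell also carries the other causal columns):

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(7)
new_data = pd.DataFrame({"future_value": rng.normal(-1.1, 1.2, 2_000)})

# The eight most extreme low-outcome rows become the anomaly set.
anomaly_samples = new_data.nsmallest(8, "future_value")
print(anomaly_samples)
```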
These rows are intentionally extreme in the target outcome. The attribution question is not whether they are low; it is which upstream variables explain why they are low under the fitted causal model.
Score Node-Level Anomalies
gcm.anomaly_scores gives node-level anomaly scores for each row. For root nodes, the score is based on the marginal distribution. For non-root nodes, the score is conditional on parents, so it asks whether the node is unusual given its causal inputs.
The scores show which variables look unusual for each anomalous row. A high score on a root node points to an unusual upstream condition; a high score on a child node points to unusual residual behavior after accounting for parents.
Plot Node-Level Anomaly Scores
A heatmap is easier to scan than a wide table. Darker cells mark larger node-level anomaly scores for each selected row.
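The plotting pattern can be sketched with a hypothetical `score_table` (hard-coded values for illustration only):

```python
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

# Hypothetical node-level anomaly scores for four selected rows.
score_table = pd.DataFrame(
    {
        "catalog_health": [0.4, 2.9, 0.6, 0.5],
        "latency_pressure": [3.1, 0.7, 2.4, 0.8],
        "future_value": [1.2, 1.0, 0.5, 2.8],
    },
    index=["row 0", "row 1", "row 2", "row 3"],
)

fig, ax = plt.subplots(figsize=(6, 3))
sns.heatmap(score_table, annot=True, fmt=".1f", cmap="rocket_r", ax=ax)
ax.set_xlabel("node")
ax.set_ylabel("anomalous row")
plt.tight_layout()
```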
The heatmap separates row-specific stories. Some low-value rows are primarily explained by root conditions, while others may include unusual downstream residual behavior.
Attribute Target Anomalies To Upstream Nodes
gcm.attribute_anomalies attributes the anomaly score of a target node to upstream nodes and the target node’s own residual noise. Because the model is invertible, DoWhy can reconstruct noise values for the anomalous rows and estimate Shapley-style contributions.
# Smaller Shapley settings keep the tutorial fast while preserving the qualitative attribution pattern.
shapley_config = gcm.shapley.ShapleyConfig(
    num_permutations=12,
    num_subset_samples=250,
    n_jobs=1,
)
anomaly_attribution_dict = gcm.attribute_anomalies(
    reference_model,
    target_node="future_value",
    anomaly_samples=anomaly_samples,
    num_distribution_samples=600,
    shapley_config=shapley_config,
)
anomaly_attribution_table = pd.DataFrame(anomaly_attribution_dict).reset_index(names="anomaly_row")
anomaly_attribution_table.to_csv(TABLE_DIR / "13_target_anomaly_attributions_by_row.csv", index=False)
display(anomaly_attribution_table)
   anomaly_row  traffic_intent  catalog_health  latency_pressure  recommendation_quality  session_depth  future_value
0            0          1.5248          0.6027            1.8258                  1.2103         0.4257        1.5017
1            1          0.3036          0.8839            3.7897                 -0.0811         0.4138        1.7810
2            2          1.7471          1.5004            2.7149                  0.3395         0.3881        0.4009
3            3          2.4179         -0.2234            2.6530                 -0.0369         0.5496        1.7307
4            4          1.4365          0.3846            2.0046                  1.1893         0.2999        1.7758
5            5          2.5550          0.4034            0.7165                  1.8207         0.1573        1.4381
6            6          2.5062          0.6709            0.7105                  0.7116         0.7830        1.7087
7            7          1.4355          0.6852            4.1516                  0.2626         0.0813        0.4748
Each row’s target anomaly is decomposed across upstream nodes and the target residual. Positive contributions increase the target anomaly score; small or negative values indicate little contribution under this attribution setup.
Summarize Anomaly Attributions
For a small incident review, row-level details matter. For a quick root-cause summary, average absolute and signed contributions across the selected anomalous rows are easier to read.
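The summary step is plain pandas over the per-row attribution table. A toy sketch (hypothetical contribution values for three nodes):

```python
import pandas as pd

# Hypothetical per-row Shapley contributions, one column per node.
attributions = pd.DataFrame(
    {
        "latency_pressure": [2.1, 3.4, 1.8],
        "catalog_health": [0.9, -0.2, 1.1],
        "traffic_intent": [1.5, 0.4, 2.0],
    }
)

# Signed means preserve direction; absolute means rank overall influence.
summary = (
    pd.DataFrame(
        {
            "mean_signed_contribution": attributions.mean(),
            "mean_absolute_contribution": attributions.abs().mean(),
        }
    )
    .sort_values("mean_absolute_contribution", ascending=False)
    .reset_index(names="node")
)
print(summary)
```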
The summary ranks the strongest contributors to the selected low future-value anomalies. In this simulation, latency pressure, low traffic intent, and the target's own residual noise rank highly, with catalog weakness also contributing.
Plot Mean Anomaly Attribution
This chart turns the attribution summary into a ranked root-cause view. The absolute scale is model-specific, so the ranking and relative size are more useful than the raw units.
The plot gives a row-level incident story: among the selected extreme rows, the strongest explanations are the nodes with the largest anomaly-attribution contributions.
Compare Row-Level Values And Attributions
Root-cause results are easier to audit when attributions are shown next to the original anomalous values. This table joins the selected rows with their largest attribution contributor.
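A toy sketch of the join (hypothetical row values and attributions; `idxmax` picks each row's largest contributor):

```python
import pandas as pd

# Hypothetical anomalous rows, indexed by their original row labels.
rows = pd.DataFrame(
    {
        "future_value": [-3.2, -2.9],
        "latency_pressure": [2.4, 0.3],
        "catalog_health": [-0.1, -2.2],
    },
    index=[17, 42],
)

# Hypothetical per-row attribution contributions for the same rows.
attribs = pd.DataFrame(
    {
        "latency_pressure": [3.1, 0.2],
        "catalog_health": [0.4, 2.6],
    },
    index=[17, 42],
)

audit = rows.copy()
audit["top_attributed_node"] = attribs.idxmax(axis=1)  # largest contributor per row
audit["top_attribution"] = attribs.max(axis=1)
print(audit)
```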
This audit view makes it harder to overtrust a black-box ranking. We can inspect whether the top attributed node is plausible given the actual row values.
Distribution-Change Attribution Setup
Anomaly attribution explained a few low-value rows. Now we switch to a population question: why did the distribution of future_value change from the old period to the new period?
We use a signed mean difference function so the attributions are in the same direction as the target mean change.
def signed_mean_difference(old_samples, new_samples):
    """Return new mean minus old mean for one-dimensional sample arrays."""
    return float(np.mean(new_samples) - np.mean(old_samples))

observed_target_shift = signed_mean_difference(
    old_causal_data["future_value"].to_numpy(),
    new_causal_data["future_value"].to_numpy(),
)
print(f"Observed future_value mean shift, new minus old: {observed_target_shift:.4f}")
Observed future_value mean shift, new minus old: -1.0808
The observed mean shift is negative, meaning future value decreased in the new period. Distribution-change attribution will decompose that decrease across nodes in the graph.
Run DoWhy Distribution-Change Attribution
gcm.distribution_change fits old and new versions of the causal mechanisms and estimates how much each node contributes to the target distribution change. We request additional information so we can also see which node mechanisms DoWhy flags as changed.
The signed contributions explain the direction of the target shift. Negative values push the new target mean lower than the old target mean; positive values offset the decline.
Check The Attribution Sum
For a signed mean-change attribution, the sum of node contributions should be close to the estimated target mean shift. Small differences can appear because attribution is estimated with sampling.
attribution_sum = distribution_change_table["signed_contribution"].sum()

sum_check = pd.DataFrame(
    [
        {
            "quantity": "observed new minus old mean shift",
            "value": observed_target_shift,
        },
        {
            "quantity": "sum of distribution-change attributions",
            "value": attribution_sum,
        },
        {
            "quantity": "absolute difference",
            "value": abs(observed_target_shift - attribution_sum),
        },
    ]
)
sum_check.to_csv(TABLE_DIR / "13_distribution_change_sum_check.csv", index=False)
display(sum_check)
   quantity                                   value
0  observed new minus old mean shift        -1.0808
1  sum of distribution-change attributions  -1.0293
2  absolute difference                       0.0515
The attribution sum should be close enough for teaching purposes. In production analysis, we would increase sampling settings and repeat the attribution to check stability.
Plot Distribution-Change Attributions
A signed bar chart makes it clear which nodes push the target down and which nodes push it up. The graph determines how upstream changes propagate to the target.
plot_distribution_table = distribution_change_table.sort_values("signed_contribution")

fig, ax = plt.subplots(figsize=(10, 5.5))
colors = [
    "#dc2626" if value < 0 else "#16a34a"
    for value in plot_distribution_table["signed_contribution"]
]
ax.barh(
    plot_distribution_table["node"],
    plot_distribution_table["signed_contribution"],
    color=colors,
)
ax.axvline(0, color="#111827", linewidth=1)
ax.set_title("Signed Attribution Of Future-Value Mean Shift")
ax.set_xlabel("Contribution To New Minus Old Mean Difference")
ax.set_ylabel("Node")
plt.tight_layout()
fig.savefig(FIGURE_DIR / "13_distribution_change_attribution.png", dpi=160, bbox_inches="tight")
plt.show()
The strongest negative contributors are the causal explanations for the population-level decline. This is a different question from the row-level anomaly attribution earlier.
Inspect Mechanism-Change Flags
DoWhy also returns flags indicating whether each node’s mechanism appears to have changed between old and new data. Root node flags indicate marginal distribution changes; non-root flags indicate changed conditional mechanisms.
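As a model-free complement to DoWhy's flags (this is not the test DoWhy uses internally, and the arrays below are simulated stand-ins for the root marginals), a two-sample Kolmogorov-Smirnov test can screen root nodes for marginal shifts:

```python
import numpy as np
from scipy import stats

# Stand-in root marginals mirroring the scenario: a large latency shift, a tiny traffic shift.
rng = np.random.default_rng(2026)
samples = {
    "latency_pressure": (rng.normal(0.0, 1.0, 2_000), rng.normal(0.65, 1.0, 2_000)),
    "traffic_intent": (rng.normal(0.0, 1.0, 2_000), rng.normal(0.05, 1.0, 2_000)),
}

p_values = {}
for name, (old, new) in samples.items():
    statistic, p_value = stats.ks_2samp(old, new)
    p_values[name] = p_value
    print(f"{name}: KS statistic={statistic:.3f}, p-value={p_value:.2e}")
```

Such a screen only covers root marginals; non-root nodes need a conditional test, which is what DoWhy's mechanism-change flags provide.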
The flags should identify catalog health and latency pressure as root shifts, and recommendation quality as a changed mechanism. That matches the way the new-period data were generated.
Robust Mean-Shift Attribution
gcm.distribution_change_robust provides another way to attribute changes in a target functional. Here we use the target mean and regression mode. This is useful as a complementary check rather than a replacement for the previous attribution.
The robust attribution is another view of the mean shift. We should expect the exact numbers to differ, but the leading contributors should tell a similar story if the signal is strong.
Compare Standard And Robust Distribution Attributions
The next table joins the two distribution-change methods. Agreement in the largest contributors strengthens the diagnostic story; disagreement is a prompt for sensitivity checks.
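The join itself is a small pandas operation. A toy sketch with hypothetical attribution values (the rank check asks whether both methods order the nodes the same way):

```python
import pandas as pd

# Hypothetical signed contributions from the two methods.
standard = pd.Series(
    {"latency_pressure": -0.52, "catalog_health": -0.31, "recommendation_quality": -0.18},
    name="distribution_change",
)
robust = pd.Series(
    {"latency_pressure": -0.48, "catalog_health": -0.35, "recommendation_quality": -0.22},
    name="distribution_change_robust",
)

comparison = pd.concat([standard, robust], axis=1)
comparison["rank_agreement"] = (
    comparison["distribution_change"].rank() == comparison["distribution_change_robust"].rank()
)
print(comparison)
```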
The methods should agree that latency pressure, catalog health, and recommendation quality are central to the target decline. That is the high-level root-cause message.
Row-Level And Population-Level Stories Side By Side
The same node can matter differently for selected anomalies and for the full-period distribution shift. This table compares the anomaly summary with distribution-change attribution.
This comparison prevents overgeneralization. A node that explains the worst individual rows is not always the largest driver of the overall population shift, and vice versa.
Practical Incident Narrative
The next cell turns the analysis into a short, structured narrative. This is the kind of summary that can sit above the figures in a report or investigation notebook.
top_distribution_nodes = distribution_change_table.head(3)["node"].tolist()
top_anomaly_nodes = anomaly_attribution_summary.head(3)["node"].tolist()
changed_nodes = mechanism_change_table.loc[
    mechanism_change_table["flagged_as_changed"], "node"
].tolist()

incident_narrative = pd.DataFrame(
    [
        {
            "section": "Observed symptom",
            "summary": f"Future value changed by {observed_target_shift:.3f} on average in the new period.",
        },
        {
            "section": "Population-level drivers",
            "summary": ", ".join(top_distribution_nodes),
        },
        {
            "section": "Worst-row anomaly drivers",
            "summary": ", ".join(top_anomaly_nodes),
        },
        {
            "section": "Mechanisms flagged as changed",
            "summary": ", ".join(changed_nodes),
        },
        {
            "section": "Recommended next diagnostic step",
            "summary": "Inspect latency, catalog-health, and recommendation-quality pipelines before treating session depth as the original cause.",
        },
    ]
)
incident_narrative.to_csv(TABLE_DIR / "13_incident_narrative.csv", index=False)
display(incident_narrative)
   section                           summary
0  Observed symptom                  Future value changed by -1.081 on average in the new period.
1  Population-level drivers          latency_pressure, catalog_health, recommendation_quality
2  Worst-row anomaly drivers         latency_pressure, traffic_intent, future_value
3  Mechanisms flagged as changed     catalog_health, latency_pressure, recommendation_quality
4  Recommended next diagnostic step  Inspect latency, catalog-health, and recommendation-quality pipelines before treating session depth as the original cause.
This narrative names the metric movement, the population drivers, the row-level anomaly drivers, and the next diagnostic step. That is more useful than a plot without a decision-oriented summary.
Assumption And Failure-Mode Checklist
Root-cause attribution is only as credible as the causal graph and fitted mechanisms. The checklist below captures the main risks to report alongside results.
assumption_checklist = pd.DataFrame(
    [
        {
            "check": "Graph direction is credible",
            "why it matters": "Attribution follows directed paths. A reversed edge can turn a symptom into a false cause.",
            "what to do": "Validate timing and mechanism ownership with domain experts.",
        },
        {
            "check": "Reference period is genuinely normal",
            "why it matters": "Anomaly scoring is relative to the fitted baseline system.",
            "what to do": "Fit on a stable period and rerun on alternative baselines.",
        },
        {
            "check": "Mechanisms fit reasonably well",
            "why it matters": "Poor mechanism fit can create residual anomalies that are model artifacts.",
            "what to do": "Compare generated samples, residuals, and mechanism performance.",
        },
        {
            "check": "Attributions are stable",
            "why it matters": "Shapley estimates use sampling and can vary with settings.",
            "what to do": "Repeat with larger sample settings or multiple seeds for final reporting.",
        },
        {
            "check": "Population and row-level questions are separated",
            "why it matters": "Worst-row root causes and overall shift drivers can differ.",
            "what to do": "Report anomaly attribution and distribution-change attribution separately.",
        },
        {
            "check": "Operational action is testable",
            "why it matters": "Attribution is diagnostic, not proof that a fix will work.",
            "what to do": "Use follow-up experiments, rollback tests, or targeted monitoring.",
        },
    ]
)
assumption_checklist.to_csv(TABLE_DIR / "13_assumption_checklist.csv", index=False)
display(assumption_checklist)
   check                                             why it matters                                                                              what to do
0  Graph direction is credible                       Attribution follows directed paths. A reversed edge can turn a symptom into a false cause.  Validate timing and mechanism ownership with domain experts.
1  Reference period is genuinely normal              Anomaly scoring is relative to the fitted baseline system.                                  Fit on a stable period and rerun on alternative baselines.
2  Mechanisms fit reasonably well                    Poor mechanism fit can create residual anomalies that are model artifacts.                  Compare generated samples, residuals, and mechanism performance.
3  Attributions are stable                           Shapley estimates use sampling and can vary with settings.                                  Repeat with larger sample settings or multiple seeds for final reporting.
4  Population and row-level questions are separated  Worst-row root causes and overall shift drivers can differ.                                 Report anomaly attribution and distribution-change attribution separately.
5  Operational action is testable                    Attribution is diagnostic, not proof that a fix will work.                                  Use follow-up experiments, rollback tests, or targeted monitoring.
The checklist is intentionally practical. Root-cause analysis can sound definitive, so every result should be paired with the checks that would make it actionable.
Final Summary
This notebook used DoWhy GCM tools for causal root-cause analysis:
gcm.anomaly_scores identified which nodes looked unusual for selected low-outcome rows.
gcm.attribute_anomalies attributed target anomalies to upstream nodes and target residual noise.
gcm.distribution_change decomposed the population-level target shift between old and new data.
gcm.distribution_change_robust provided a complementary mean-shift attribution check.
The side-by-side comparison showed why row-level anomaly stories and population-level shift stories should be reported separately.
The next tutorial is an end-to-end observational case study that combines graph design, identification, estimation, diagnostics, refutation, and reporting in one compact workflow.