Robustness and Sensitivity for Discovery Quality Mediation

The previous notebook estimated the main discovery-quality mediation result: high discovery exposure was associated with higher future interaction volume, while the satisfaction-depth mediated pathway was small and slightly negative for the primary interaction-count outcome.

This notebook asks whether that story is fragile.

A good causal portfolio notebook should not stop after one preferred specification. It should ask what happens when reasonable choices change: the treatment threshold, the mediator definition, the outcome, the control set, the model form, and the placebo expectations.

The goal is not to prove the observational design is perfect. The goal is to show what is stable, what is sensitive, and what a careful reader should trust only with caveats.

1. Load Libraries and Paths

This cell imports the libraries used for robustness modeling and plotting. It also defines all inputs from the previous notebooks and all outputs created here.

from pathlib import Path
import os
import warnings

os.environ.setdefault("OMP_NUM_THREADS", "1")
os.environ.setdefault("OPENBLAS_NUM_THREADS", "1")
os.environ.setdefault("MKL_NUM_THREADS", "1")
os.environ.setdefault("NUMEXPR_NUM_THREADS", "1")

import numpy as np
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt
from IPython.display import display
from sklearn.linear_model import LinearRegression, LogisticRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

warnings.filterwarnings("ignore", category=FutureWarning)

sns.set_theme(style="whitegrid", context="notebook")
plt.rcParams["figure.figsize"] = (11, 6)
plt.rcParams["axes.titlesize"] = 13
plt.rcParams["axes.labelsize"] = 11
pd.set_option("display.max_columns", 180)
pd.set_option("display.max_colwidth", 140)

PROJECT_ROOT = Path.cwd().resolve()
while not (PROJECT_ROOT / "data").exists() and PROJECT_ROOT.parent != PROJECT_ROOT:
    PROJECT_ROOT = PROJECT_ROOT.parent

PROCESSED_DIR = PROJECT_ROOT / "data" / "processed"
NOTEBOOK_DIR = PROJECT_ROOT / "notebooks" / "discovery_quality_mediation"
WRITEUP_DIR = NOTEBOOK_DIR / "writeup"
FIGURE_DIR = WRITEUP_DIR / "figures"
TABLE_DIR = WRITEUP_DIR / "tables"

FIGURE_DIR.mkdir(parents=True, exist_ok=True)
TABLE_DIR.mkdir(parents=True, exist_ok=True)

ANALYSIS_PANEL_INPUT = PROCESSED_DIR / "kuairec_discovery_quality_mediation_analysis_panel.parquet"
EFFECT_SUMMARY_INPUT = PROCESSED_DIR / "kuairec_discovery_quality_effect_summary.csv"
ASSUMPTION_CHECKS_INPUT = PROCESSED_DIR / "kuairec_discovery_quality_assumption_checks.csv"

THRESHOLD_OUTPUT = PROCESSED_DIR / "kuairec_discovery_quality_robustness_thresholds.csv"
MEDIATOR_OUTPUT = PROCESSED_DIR / "kuairec_discovery_quality_robustness_mediators.csv"
MODEL_SENSITIVITY_OUTPUT = PROCESSED_DIR / "kuairec_discovery_quality_model_sensitivity.csv"
CONTINUOUS_OUTPUT = PROCESSED_DIR / "kuairec_discovery_quality_continuous_score_sensitivity.csv"
PLACEBO_OUTPUT = PROCESSED_DIR / "kuairec_discovery_quality_placebo_checks.csv"
ROBUSTNESS_SUMMARY_OUTPUT = PROCESSED_DIR / "kuairec_discovery_quality_robustness_summary.csv"
LIMITATIONS_OUTPUT = PROCESSED_DIR / "kuairec_discovery_quality_limitations.csv"

The thread limits keep the notebook predictable on a laptop or shared environment. This notebook runs many small linear models, so it gains little from aggressive parallelism in the underlying math libraries.
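One detail worth knowing: `os.environ.setdefault` only writes a value when the variable is unset, so an operator can still raise the caps from the shell, and the caps must be in place before NumPy is imported because the BLAS backends read them at load time. A minimal illustration of the setdefault behavior:

```python
import os

# setdefault only fills in a value when the variable is not already set,
# so a value exported from the shell wins over the notebook default.
os.environ["OMP_NUM_THREADS"] = "4"          # pretend the shell exported this
os.environ.setdefault("OMP_NUM_THREADS", "1")
print(os.environ["OMP_NUM_THREADS"])  # still "4"
```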

2. Load the Analysis Panel and Main Results

This cell loads the mediation analysis panel and the effect summary from notebook 04. The effect summary is the reference point for every stress test in this notebook.

analysis_panel = pd.read_parquet(ANALYSIS_PANEL_INPUT)
effect_summary = pd.read_csv(EFFECT_SUMMARY_INPUT)
assumption_checks = pd.read_csv(ASSUMPTION_CHECKS_INPUT)

load_summary = pd.DataFrame(
    {
        "artifact": ["analysis_panel", "effect_summary", "assumption_checks"],
        "rows": [len(analysis_panel), len(effect_summary), len(assumption_checks)],
        "columns": [analysis_panel.shape[1], effect_summary.shape[1], assumption_checks.shape[1]],
    }
)

main_primary = effect_summary.query(
    "outcome == 'Y_future_interactions' and estimand in ['gcomp_total_effect', 'natural_direct_effect', 'natural_indirect_effect']"
).copy()

display(load_summary)
display(main_primary[["outcome_label", "estimand", "estimate", "ci_95_lower", "ci_95_upper", "relative_effect"]].round(4))
artifact rows columns
0 analysis_panel 8199 103
1 effect_summary 21 12
2 assumption_checks 6 5
outcome_label estimand estimate ci_95_lower ci_95_upper relative_effect
2 Future 7-day interactions gcomp_total_effect 37.2781 32.2444 42.8135 0.1152
3 Future 7-day interactions natural_direct_effect 38.5858 33.3634 43.7154 0.1192
4 Future 7-day interactions natural_indirect_effect -1.3077 -2.3047 -0.4290 -0.0036

The main result is the anchor: a positive total effect on future interactions, a direct effect of similar size, and a small negative indirect pathway through the satisfaction-depth metric. The rest of this notebook checks how stable that pattern is.
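As a quick sanity check on the anchor numbers, the natural direct and indirect effects should sum to the total effect up to numerical noise. A minimal check using the point estimates copied from the table above:

```python
import numpy as np

# Point estimates copied from the effect-summary table above.
total_effect = 37.2781
direct_effect = 38.5858
indirect_effect = -1.3077

# Natural-effect decomposition: total = direct + indirect.
assert np.isclose(direct_effect + indirect_effect, total_effect, atol=1e-3)

# The mediated share is small and negative.
proportion_mediated = indirect_effect / total_effect
print(round(proportion_mediated, 4))
```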

3. Define Outcomes, Mediators, and Controls

This cell defines the variable sets used across robustness checks. The rich controls mirror earlier notebooks. The simple controls are intentionally smaller and help answer whether the result is only produced by a large adjustment set.

PRIMARY_OUTCOME = "Y_future_interactions"
OUTCOME_COLUMNS = ["Y_future_interactions", "Y_future_active_days", "Y_future_play_hours"]
OUTCOME_LABELS = {
    "Y_future_interactions": "Future 7-day interactions",
    "Y_future_active_days": "Future 7-day active days",
    "Y_future_play_hours": "Future 7-day play hours",
}
BASE_TREATMENT = "A_high_discovery"
BASE_MEDIATOR = "M_satisfaction_depth"
DISCOVERY_SCORE = "discovery_breadth_score"

base_numeric_covariates = [
    "calendar_day_index",
    "lag_1_active_day",
    "prior_3day_active_day",
    "lag_1_interactions",
    "prior_3day_interactions",
    "lag_1_total_play_duration_sec",
    "prior_3day_total_play_duration_sec",
    "lag_1_valid_play_share",
    "prior_3day_valid_play_share",
    "lag_1_high_satisfaction_share",
    "prior_3day_high_satisfaction_share",
    "lag_1_discovery_candidate_share",
    "prior_3day_discovery_candidate_share",
    "recent_activity_score",
    "register_days",
    "follow_user_num",
    "fans_user_num",
    "friend_user_num",
    "is_lowactive_period",
    "is_live_streamer",
    "is_video_author",
]
profile_onehot_covariates = [col for col in analysis_panel.columns if col.startswith("onehot_feat")]
rich_numeric_covariates = [col for col in base_numeric_covariates + profile_onehot_covariates if col in analysis_panel.columns]
rich_categorical_covariates = [
    col
    for col in [
        "user_active_degree",
        "follow_user_num_range",
        "fans_user_num_range",
        "friend_user_num_range",
        "register_days_range",
    ]
    if col in analysis_panel.columns
]

simple_numeric_covariates = [
    col
    for col in [
        "calendar_day_index",
        "lag_1_interactions",
        "prior_3day_interactions",
        "prior_3day_high_satisfaction_share",
        "prior_3day_discovery_candidate_share",
        "register_days",
    ]
    if col in analysis_panel.columns
]
simple_categorical_covariates = ["user_active_degree"] if "user_active_degree" in analysis_panel.columns else []

mediator_candidates = {
    "composite_satisfaction_depth": "M_satisfaction_depth",
    "high_watch_ratio_share": "high_satisfaction_share",
    "valid_play_share": "valid_play_share",
    "average_satisfaction_score": "avg_satisfaction_score",
    "completion_or_rewatch_share": "complete_or_rewatch_share",
}

control_summary = pd.DataFrame(
    {
        "control_set": ["rich", "simple"],
        "numeric_covariates": [len(rich_numeric_covariates), len(simple_numeric_covariates)],
        "categorical_covariates": [len(rich_categorical_covariates), len(simple_categorical_covariates)],
    }
)
mediator_summary = pd.DataFrame(
    {"mediator_label": list(mediator_candidates.keys()), "column": list(mediator_candidates.values())}
)

display(control_summary)
display(mediator_summary)
control_set numeric_covariates categorical_covariates
0 rich 39 5
1 simple 6 1
mediator_label column
0 composite_satisfaction_depth M_satisfaction_depth
1 high_watch_ratio_share high_satisfaction_share
2 valid_play_share valid_play_share
3 average_satisfaction_score avg_satisfaction_score
4 completion_or_rewatch_share complete_or_rewatch_share

The mediator candidates are all same-day quality signals, but they do not measure exactly the same thing. If the indirect pathway changes sign or size across these mediators, that tells us the mediation result is sensitive to how satisfaction is measured.

4. Build Reusable Estimation Helpers

This cell defines the robustness estimator. It follows the same g-computation logic as notebook 04 but adds options for treatment definitions, mediator definitions, control sets, weighting, and treatment-mediator interactions.

def build_covariate_matrix(frame, numeric_cols, categorical_cols):
    numeric = frame[numeric_cols].copy()
    for col in numeric.columns:
        numeric[col] = pd.to_numeric(numeric[col], errors="coerce")
        numeric[col] = numeric[col].fillna(numeric[col].median())

    if categorical_cols:
        categorical = frame[categorical_cols].copy()
        for col in categorical.columns:
            categorical[col] = categorical[col].astype("string").fillna("missing")
        encoded = pd.get_dummies(categorical, drop_first=True, dtype=float)
        design = pd.concat([numeric, encoded], axis=1)
    else:
        design = numeric

    design = design.loc[:, design.nunique(dropna=False) > 1]
    return design.astype(float)


def make_treatment_design(covariates, treatment_values, treatment_col):
    design = covariates.copy()
    design.insert(0, treatment_col, np.asarray(treatment_values, dtype=float))
    return design


def make_outcome_design(covariates, treatment_values, mediator_values, treatment_col, mediator_col, include_interaction=True):
    treatment_array = np.asarray(treatment_values, dtype=float)
    mediator_array = np.asarray(mediator_values, dtype=float)
    design = covariates.copy()
    if include_interaction:
        design.insert(0, "A_x_M", treatment_array * mediator_array)
    design.insert(0, mediator_col, mediator_array)
    design.insert(0, treatment_col, treatment_array)
    return design


def estimate_stabilized_weights(frame, treatment_col, covariates):
    treatment = frame[treatment_col].astype(int).to_numpy()
    treatment_rate = treatment.mean()
    if treatment_rate <= 0 or treatment_rate >= 1:
        return np.ones(len(frame)), pd.DataFrame([{"diagnostic": "degenerate_treatment", "value": 1.0}])

    propensity_model = make_pipeline(
        StandardScaler(),
        LogisticRegression(max_iter=1000, solver="lbfgs"),
    )
    propensity_model.fit(covariates, treatment)
    propensity = propensity_model.predict_proba(covariates)[:, 1].clip(0.02, 0.98)
    weights = np.where(treatment == 1, treatment_rate / propensity, (1 - treatment_rate) / (1 - propensity))
    cap = np.quantile(weights, 0.99)
    weights = np.clip(weights, None, cap)
    ess = weights.sum() ** 2 / np.square(weights).sum()

    diagnostics = pd.DataFrame(
        [
            {"diagnostic": "treatment_rate", "value": treatment_rate},
            {"diagnostic": "propensity_p05", "value": np.quantile(propensity, 0.05)},
            {"diagnostic": "propensity_p95", "value": np.quantile(propensity, 0.95)},
            {"diagnostic": "weight_max", "value": weights.max()},
            {"diagnostic": "effective_sample_size", "value": ess},
        ]
    )
    return weights, diagnostics


def average(values):
    return float(np.mean(np.asarray(values, dtype=float)))


def estimate_binary_mediation_spec(
    frame,
    treatment_col,
    mediator_col,
    outcome_col,
    numeric_cols,
    categorical_cols,
    spec_name,
    use_weights=True,
    include_interaction=True,
):
    working = frame.copy().reset_index(drop=True)
    covariates = build_covariate_matrix(working, numeric_cols, categorical_cols)
    A = working[treatment_col].astype(float).to_numpy()
    M = working[mediator_col].astype(float).to_numpy()
    Y = working[outcome_col].astype(float).to_numpy()

    if use_weights:
        weights, weight_diagnostics = estimate_stabilized_weights(working, treatment_col, covariates)
    else:
        weights = None
        weight_diagnostics = pd.DataFrame(
            [
                {"diagnostic": "treatment_rate", "value": A.mean()},
                {"diagnostic": "effective_sample_size", "value": len(working)},
            ]
        )

    mediator_X = make_treatment_design(covariates, A, treatment_col)
    mediator_model = LinearRegression().fit(mediator_X, M, sample_weight=weights)
    M0 = mediator_model.predict(make_treatment_design(covariates, np.zeros(len(working)), treatment_col)).clip(0, 1)
    M1 = mediator_model.predict(make_treatment_design(covariates, np.ones(len(working)), treatment_col)).clip(0, 1)

    outcome_X = make_outcome_design(covariates, A, M, treatment_col, mediator_col, include_interaction=include_interaction)
    outcome_model = LinearRegression().fit(outcome_X, Y, sample_weight=weights)

    y_0_m0 = outcome_model.predict(make_outcome_design(covariates, np.zeros(len(working)), M0, treatment_col, mediator_col, include_interaction=include_interaction))
    y_1_m0 = outcome_model.predict(make_outcome_design(covariates, np.ones(len(working)), M0, treatment_col, mediator_col, include_interaction=include_interaction))
    y_1_m1 = outcome_model.predict(make_outcome_design(covariates, np.ones(len(working)), M1, treatment_col, mediator_col, include_interaction=include_interaction))

    raw_high = working.loc[working[treatment_col].eq(1), outcome_col].mean()
    raw_low = working.loc[working[treatment_col].eq(0), outcome_col].mean()
    baseline_mean = average(y_0_m0)
    total_effect = average(y_1_m1 - y_0_m0)
    direct_effect = average(y_1_m0 - y_0_m0)
    indirect_effect = average(y_1_m1 - y_1_m0)
    mediator_shift = average(M1 - M0)

    result = {
        "spec_name": spec_name,
        "treatment_col": treatment_col,
        "mediator_col": mediator_col,
        "outcome": outcome_col,
        "outcome_label": OUTCOME_LABELS.get(outcome_col, outcome_col),
        "treatment_rate": A.mean(),
        "raw_high_minus_lower": raw_high - raw_low,
        "gcomp_total_effect": total_effect,
        "natural_direct_effect": direct_effect,
        "natural_indirect_effect": indirect_effect,
        "mediator_shift": mediator_shift,
        "proportion_mediated": indirect_effect / total_effect if not np.isclose(total_effect, 0) else np.nan,
        "reference_mean": baseline_mean,
        "relative_total_effect": total_effect / baseline_mean if not np.isclose(baseline_mean, 0) else np.nan,
        "uses_weights": use_weights,
        "include_interaction": include_interaction,
        "control_columns": covariates.shape[1],
        "mediator_model_r2": mediator_model.score(mediator_X, M, sample_weight=weights),
        "outcome_model_r2": outcome_model.score(outcome_X, Y, sample_weight=weights),
    }
    for _, row in weight_diagnostics.iterrows():
        result[f"weight_{row['diagnostic']}"] = row["value"]
    return result

print("Robustness estimator ready.")
Robustness estimator ready.

The estimator returns one row per specification. That row has the raw contrast, adjusted total effect, direct effect, indirect effect, mediator shift, treatment rate, model fit, and weight diagnostics.

5. Reproduce the Baseline Specification

Before stress-testing, this cell reruns the baseline specification from notebook 04 using the helper above. The goal is to confirm that the robustness machinery lands close to the saved main result.

baseline_spec = pd.DataFrame(
    [
        estimate_binary_mediation_spec(
            analysis_panel,
            treatment_col=BASE_TREATMENT,
            mediator_col=BASE_MEDIATOR,
            outcome_col=PRIMARY_OUTCOME,
            numeric_cols=rich_numeric_covariates,
            categorical_cols=rich_categorical_covariates,
            spec_name="baseline_median_high_discovery",
            use_weights=True,
            include_interaction=True,
        )
    ]
)

baseline_compare = baseline_spec[
    [
        "spec_name",
        "treatment_rate",
        "raw_high_minus_lower",
        "gcomp_total_effect",
        "natural_direct_effect",
        "natural_indirect_effect",
        "mediator_shift",
        "proportion_mediated",
        "relative_total_effect",
    ]
]

display(baseline_compare.round(4))
spec_name treatment_rate raw_high_minus_lower gcomp_total_effect natural_direct_effect natural_indirect_effect mediator_shift proportion_mediated relative_total_effect
0 baseline_median_high_discovery 0.5001 176.4257 37.2781 38.5858 -1.3077 0.0231 -0.0351 0.1152

The reproduced baseline should be very close to the saved notebook 04 estimates. Small differences can appear because this notebook re-estimates propensity weights internally, but the qualitative story should match.
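A lightweight way to formalize "very close" is a relative-tolerance comparison between the saved and reproduced estimates. This is a sketch, not part of the notebook's saved outputs; `close_enough` is a hypothetical helper, and the numbers are copied from the two tables above, where the reproduction happens to match exactly:

```python
def close_enough(saved, reproduced, rel_tol=0.05):
    """True when every reproduced estimate sits within rel_tol of the
    saved estimate, relative to the saved magnitude."""
    return all(
        abs(reproduced[key] - saved[key]) <= rel_tol * max(abs(saved[key]), 1e-8)
        for key in saved
    )

# Point estimates copied from the two tables above.
saved = {"total": 37.2781, "direct": 38.5858, "indirect": -1.3077}
reproduced = {"total": 37.2781, "direct": 38.5858, "indirect": -1.3077}
print(close_enough(saved, reproduced))
```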

6. Robustness to Alternative Discovery Thresholds

The baseline treatment uses the top half of discovery-breadth days. This cell re-runs that 50% baseline and adds stricter cutoffs: top 40%, top 30%, and top 25% of days. A stricter treatment asks whether the effect is still present when high discovery means a more extreme discovery day.

threshold_frame = analysis_panel.copy()
threshold_specs = [
    ("top_50_percent_baseline", 0.50),
    ("top_40_percent_discovery", 0.60),
    ("top_30_percent_discovery", 0.70),
    ("top_25_percent_discovery", 0.75),
]
threshold_rows = []

for spec_name, quantile_cutoff in threshold_specs:
    threshold_value = threshold_frame[DISCOVERY_SCORE].quantile(quantile_cutoff)
    treatment_col = f"A_{spec_name}"
    threshold_frame[treatment_col] = (threshold_frame[DISCOVERY_SCORE] >= threshold_value).astype("int8")
    result = estimate_binary_mediation_spec(
        threshold_frame,
        treatment_col=treatment_col,
        mediator_col=BASE_MEDIATOR,
        outcome_col=PRIMARY_OUTCOME,
        numeric_cols=rich_numeric_covariates,
        categorical_cols=rich_categorical_covariates,
        spec_name=spec_name,
        use_weights=True,
        include_interaction=True,
    )
    result["threshold_quantile"] = quantile_cutoff
    result["threshold_value"] = threshold_value
    threshold_rows.append(result)

threshold_results = pd.DataFrame(threshold_rows)

display(
    threshold_results[
        [
            "spec_name",
            "threshold_value",
            "treatment_rate",
            "gcomp_total_effect",
            "natural_direct_effect",
            "natural_indirect_effect",
            "mediator_shift",
            "relative_total_effect",
            "weight_effective_sample_size",
        ]
    ].round(4)
)
spec_name threshold_value treatment_rate gcomp_total_effect natural_direct_effect natural_indirect_effect mediator_shift relative_total_effect weight_effective_sample_size
0 top_50_percent_baseline 0.3300 0.5001 37.2781 38.5858 -1.3077 0.0231 0.1152 5701.3677
1 top_40_percent_discovery 0.3551 0.4000 40.0555 41.7088 -1.6532 0.0232 0.1227 6334.5853
2 top_30_percent_discovery 0.3814 0.3000 38.8190 40.4679 -1.6489 0.0230 0.1173 7016.2744
3 top_25_percent_discovery 0.3959 0.2500 39.7750 41.3340 -1.5590 0.0239 0.1196 7263.6365

This table checks whether the total effect disappears when the treatment becomes more selective. If effects are similar or larger under stricter definitions, the high-discovery story is not merely an artifact of the median split.
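One way to summarize the table in a single number: how far does the total effect move away from the 50% baseline, as a share of the baseline? A quick calculation on the point estimates copied from the threshold table above:

```python
import numpy as np

# Total effects copied from the threshold table above, 50% baseline first.
total_effects = np.array([37.2781, 40.0555, 38.8190, 39.7750])

# Largest move away from the baseline, as a share of the baseline estimate.
baseline = total_effects[0]
max_rel_deviation = float(np.max(np.abs(total_effects - baseline)) / baseline)
print(round(max_rel_deviation, 3))
```

The deviation stays under ten percent of the baseline, which is the "similar or larger under stricter definitions" pattern the paragraph above describes.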

7. Plot Threshold Sensitivity

This cell visualizes total, direct, and indirect effects across discovery thresholds. The plot makes it easier to see whether the effect decomposition changes smoothly or flips under stricter cutoffs.

threshold_plot = threshold_results.melt(
    id_vars=["spec_name", "threshold_quantile", "treatment_rate"],
    value_vars=["gcomp_total_effect", "natural_direct_effect", "natural_indirect_effect"],
    var_name="effect_type",
    value_name="estimate",
)
threshold_plot["effect_label"] = threshold_plot["effect_type"].map(
    {
        "gcomp_total_effect": "Total",
        "natural_direct_effect": "Direct",
        "natural_indirect_effect": "Indirect",
    }
)
threshold_plot["threshold_label"] = threshold_plot["spec_name"].str.replace("_", " ").str.title()

fig, ax = plt.subplots(figsize=(12, 6))
sns.lineplot(
    data=threshold_plot,
    x="treatment_rate",
    y="estimate",
    hue="effect_label",
    marker="o",
    linewidth=2,
    ax=ax,
)
ax.axhline(0, color="black", linewidth=1)
ax.invert_xaxis()
ax.set_title("Effect Sensitivity to High-Discovery Threshold")
ax.set_xlabel("Treatment rate, stricter thresholds to the right")
ax.set_ylabel("Effect on future 7-day interactions")
plt.tight_layout()
fig.savefig(FIGURE_DIR / "17_threshold_sensitivity.png", dpi=160, bbox_inches="tight")
plt.show()

The threshold plot is a stability check. A jagged pattern would suggest that the result depends heavily on an arbitrary threshold. A smooth pattern is more reassuring.

8. Robustness to Alternative Mediator Definitions

The baseline mediator is a composite satisfaction-depth score. This cell reruns the mediation decomposition with simpler mediator candidates. The purpose is to test whether the indirect pathway is a property of satisfaction broadly or only of one engineered score.

mediator_rows = []
for mediator_label, mediator_col in mediator_candidates.items():
    result = estimate_binary_mediation_spec(
        analysis_panel,
        treatment_col=BASE_TREATMENT,
        mediator_col=mediator_col,
        outcome_col=PRIMARY_OUTCOME,
        numeric_cols=rich_numeric_covariates,
        categorical_cols=rich_categorical_covariates,
        spec_name=mediator_label,
        use_weights=True,
        include_interaction=True,
    )
    result["mediator_label"] = mediator_label
    mediator_rows.append(result)

mediator_results = pd.DataFrame(mediator_rows)

display(
    mediator_results[
        [
            "mediator_label",
            "mediator_col",
            "gcomp_total_effect",
            "natural_direct_effect",
            "natural_indirect_effect",
            "mediator_shift",
            "proportion_mediated",
            "mediator_model_r2",
        ]
    ].round(4)
)
mediator_label mediator_col gcomp_total_effect natural_direct_effect natural_indirect_effect mediator_shift proportion_mediated mediator_model_r2
0 composite_satisfaction_depth M_satisfaction_depth 37.2781 38.5858 -1.3077 0.0231 -0.0351 0.4653
1 high_watch_ratio_share high_satisfaction_share 37.2755 38.9900 -1.7145 0.0424 -0.0460 0.4301
2 valid_play_share valid_play_share 37.3502 37.2221 0.1281 0.0065 0.0034 0.5762
3 average_satisfaction_score avg_satisfaction_score 37.2560 38.5948 -1.3388 0.0174 -0.0359 0.4418
4 completion_or_rewatch_share complete_or_rewatch_share 37.2665 38.7399 -1.4734 0.0289 -0.0395 0.4126

This is one of the most important checks. If all satisfaction mediators show small mediated pathways, the notebook 04 conclusion is more stable. If one mediator dominates, the final report should be precise about which satisfaction definition matters.
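To make "small mediated pathways" concrete, the indirect estimates from the table above can be compared against the roughly 37-point total effect. The numbers below are copied from that table:

```python
import pandas as pd

# Indirect effects copied from the mediator table above.
indirect = pd.Series(
    {
        "composite_satisfaction_depth": -1.3077,
        "high_watch_ratio_share": -1.7145,
        "valid_play_share": 0.1281,
        "average_satisfaction_score": -1.3388,
        "completion_or_rewatch_share": -1.4734,
    }
)

# Every mediated pathway is under 5 percent of the ~37-point total effect,
# but valid_play_share flips sign, so the report should hedge its wording.
largest_share = float(indirect.abs().max() / 37.2781)
print(indirect.idxmin(), indirect.idxmax(), round(largest_share, 3))
```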

9. Plot Mediator Sensitivity

This cell plots the direct and indirect estimates across mediator definitions. It keeps the total effect visible as a reference but focuses attention on the mediated pathway.

mediator_plot = mediator_results.melt(
    id_vars=["mediator_label"],
    value_vars=["gcomp_total_effect", "natural_direct_effect", "natural_indirect_effect"],
    var_name="effect_type",
    value_name="estimate",
)
mediator_plot["effect_label"] = mediator_plot["effect_type"].map(
    {
        "gcomp_total_effect": "Total",
        "natural_direct_effect": "Direct",
        "natural_indirect_effect": "Indirect",
    }
)
mediator_plot["mediator_label_clean"] = mediator_plot["mediator_label"].str.replace("_", " ").str.title()

fig, ax = plt.subplots(figsize=(13, 6))
sns.barplot(
    data=mediator_plot,
    x="estimate",
    y="mediator_label_clean",
    hue="effect_label",
    ax=ax,
)
ax.axvline(0, color="black", linewidth=1)
ax.set_title("Effect Decomposition Across Mediator Definitions")
ax.set_xlabel("Effect on future 7-day interactions")
ax.set_ylabel("Mediator definition")
plt.tight_layout()
fig.savefig(FIGURE_DIR / "18_mediator_sensitivity.png", dpi=160, bbox_inches="tight")
plt.show()

The mediator plot makes the main sensitivity question visible. A consistent direct-effect bar with small indirect bars supports the idea that future interaction gains are not mainly routed through these satisfaction proxies.

10. Robustness Across Outcomes

This cell repeats the baseline decomposition for future interactions, future active days, and future play hours. Notebook 04 already estimated these with bootstrap intervals; here we keep the point-estimate grid alongside the other robustness checks.

outcome_rows = []
for outcome_col in OUTCOME_COLUMNS:
    result = estimate_binary_mediation_spec(
        analysis_panel,
        treatment_col=BASE_TREATMENT,
        mediator_col=BASE_MEDIATOR,
        outcome_col=outcome_col,
        numeric_cols=rich_numeric_covariates,
        categorical_cols=rich_categorical_covariates,
        spec_name=f"baseline_{outcome_col}",
        use_weights=True,
        include_interaction=True,
    )
    outcome_rows.append(result)

outcome_results = pd.DataFrame(outcome_rows)

display(
    outcome_results[
        [
            "outcome_label",
            "gcomp_total_effect",
            "natural_direct_effect",
            "natural_indirect_effect",
            "mediator_shift",
            "relative_total_effect",
            "proportion_mediated",
        ]
    ].round(4)
)
outcome_label gcomp_total_effect natural_direct_effect natural_indirect_effect mediator_shift relative_total_effect proportion_mediated
0 Future 7-day interactions 37.2781 38.5858 -1.3077 0.0231 0.1152 -0.0351
1 Future 7-day active days 0.0074 0.0021 0.0053 0.0231 0.0011 0.7175
2 Future 7-day play hours 0.0784 0.0739 0.0044 0.0231 0.1000 0.0562

Different outcomes answer different product questions. Future interactions measure volume, active days measure return frequency, and play hours measure attention. A result can be meaningful for one and weak for another.
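Because the outcomes live on different scales, the relative total effects are the comparable quantity. Reading them off the table above makes the contrast explicit:

```python
# Relative total effects copied from the outcome table above.
relative_effects = {
    "Future 7-day interactions": 0.1152,
    "Future 7-day active days": 0.0011,
    "Future 7-day play hours": 0.1000,
}

# Interaction volume and play hours move by roughly 10-12 percent of their
# baseline means, while return frequency barely moves at all.
strongest = max(relative_effects, key=relative_effects.get)
weakest = min(relative_effects, key=relative_effects.get)
print(strongest, "|", weakest)
```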

11. Sensitivity to Model Choices

This cell changes the model specification while holding the baseline treatment, mediator, and primary outcome fixed. It compares weighted versus unweighted regression, interaction versus no interaction, and rich versus simple controls.

model_specs = [
    {
        "spec_name": "weighted_rich_with_interaction",
        "use_weights": True,
        "include_interaction": True,
        "numeric_cols": rich_numeric_covariates,
        "categorical_cols": rich_categorical_covariates,
    },
    {
        "spec_name": "unweighted_rich_with_interaction",
        "use_weights": False,
        "include_interaction": True,
        "numeric_cols": rich_numeric_covariates,
        "categorical_cols": rich_categorical_covariates,
    },
    {
        "spec_name": "weighted_rich_no_interaction",
        "use_weights": True,
        "include_interaction": False,
        "numeric_cols": rich_numeric_covariates,
        "categorical_cols": rich_categorical_covariates,
    },
    {
        "spec_name": "weighted_simple_with_interaction",
        "use_weights": True,
        "include_interaction": True,
        "numeric_cols": simple_numeric_covariates,
        "categorical_cols": simple_categorical_covariates,
    },
    {
        "spec_name": "unweighted_simple_no_interaction",
        "use_weights": False,
        "include_interaction": False,
        "numeric_cols": simple_numeric_covariates,
        "categorical_cols": simple_categorical_covariates,
    },
]

model_rows = []
for spec in model_specs:
    result = estimate_binary_mediation_spec(
        analysis_panel,
        treatment_col=BASE_TREATMENT,
        mediator_col=BASE_MEDIATOR,
        outcome_col=PRIMARY_OUTCOME,
        numeric_cols=spec["numeric_cols"],
        categorical_cols=spec["categorical_cols"],
        spec_name=spec["spec_name"],
        use_weights=spec["use_weights"],
        include_interaction=spec["include_interaction"],
    )
    result["control_set"] = "rich" if spec["numeric_cols"] is rich_numeric_covariates else "simple"
    model_rows.append(result)

model_sensitivity = pd.DataFrame(model_rows)

display(
    model_sensitivity[
        [
            "spec_name",
            "uses_weights",
            "include_interaction",
            "control_set",
            "gcomp_total_effect",
            "natural_direct_effect",
            "natural_indirect_effect",
            "mediator_shift",
            "relative_total_effect",
        ]
    ].round(4)
)
spec_name uses_weights include_interaction control_set gcomp_total_effect natural_direct_effect natural_indirect_effect mediator_shift relative_total_effect
0 weighted_rich_with_interaction True True rich 37.2781 38.5858 -1.3077 0.0231 0.1152
1 unweighted_rich_with_interaction False True rich 36.4288 37.9141 -1.4853 0.0229 0.1130
2 weighted_rich_no_interaction True False rich 37.3139 38.4311 -1.1172 0.0231 0.1153
3 weighted_simple_with_interaction True True simple 37.6289 39.1871 -1.5581 0.0347 0.1169
4 unweighted_simple_no_interaction False False simple 34.8085 35.7445 -0.9360 0.0320 0.1077

This table tells us whether the result is mostly a modeling artifact. A stable total effect across these rows is reassuring. A highly variable indirect effect would tell us to be cautious about strong claims about the mediator pathway.
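"Stable across these rows" can be quantified as the range of the total effect relative to its mean over the five specifications. Using the point estimates copied from the model-sensitivity table above:

```python
import numpy as np

# Total effects copied from the model-sensitivity table above.
total_effects = np.array([37.2781, 36.4288, 37.3139, 37.6289, 34.8085])

# Range of the total effect as a share of its mean across the five specs.
spread = float((total_effects.max() - total_effects.min()) / total_effects.mean())
print(round(spread, 3))
```

A spread under ten percent of the mean supports the claim that the total effect is not driven by one narrow modeling choice.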

12. Plot Model-Specification Sensitivity

This cell visualizes the model-sensitivity table. It keeps direct and indirect pieces side by side for each model choice.

model_plot = model_sensitivity.melt(
    id_vars=["spec_name"],
    value_vars=["gcomp_total_effect", "natural_direct_effect", "natural_indirect_effect"],
    var_name="effect_type",
    value_name="estimate",
)
model_plot["effect_label"] = model_plot["effect_type"].map(
    {
        "gcomp_total_effect": "Total",
        "natural_direct_effect": "Direct",
        "natural_indirect_effect": "Indirect",
    }
)
model_plot["spec_label"] = model_plot["spec_name"].str.replace("_", " ").str.title()

fig, ax = plt.subplots(figsize=(13, 7))
sns.barplot(
    data=model_plot,
    x="estimate",
    y="spec_label",
    hue="effect_label",
    ax=ax,
)
ax.axvline(0, color="black", linewidth=1)
ax.set_title("Effect Decomposition Across Model Choices")
ax.set_xlabel("Effect on future 7-day interactions")
ax.set_ylabel("Model choice")
plt.tight_layout()
fig.savefig(FIGURE_DIR / "19_model_sensitivity.png", dpi=160, bbox_inches="tight")
plt.show()

The model-sensitivity plot is a compact way to see which part of the story is stable. In a strong result, the total effect should not depend on one narrow modeling choice.

13. Continuous Discovery Score Check

The binary treatment is easy to explain, but it loses information. This cell treats discovery_breadth_score as continuous and estimates adjusted associations per 0.10 increase in discovery breadth. This is not a natural-effect decomposition; it is a separate dose-style sensitivity check.

def estimate_continuous_association(frame, score_col, target_col, numeric_cols, categorical_cols, scale=0.10):
    """Fit a covariate-adjusted linear model and report the score's slope, rescaled per `scale` units."""
    covariates = build_covariate_matrix(frame, numeric_cols, categorical_cols)
    score = frame[score_col].astype(float).to_numpy()
    target = frame[target_col].astype(float).to_numpy()
    # Put the score in the first column so model.coef_[0] is its adjusted slope.
    design = covariates.copy()
    design.insert(0, score_col, score)
    model = LinearRegression().fit(design, target)
    coef = float(model.coef_[0])
    return {
        "target": target_col,
        "target_label": OUTCOME_LABELS.get(target_col, target_col),
        "score_col": score_col,
        "effect_per_0_10_score_increase": coef * scale,
        "raw_score_target_corr": frame[score_col].corr(frame[target_col], method="spearman"),
        "model_r2": model.score(design, target),
    }

continuous_rows = []
for target_col in OUTCOME_COLUMNS + [BASE_MEDIATOR]:
    continuous_rows.append(
        estimate_continuous_association(
            analysis_panel,
            DISCOVERY_SCORE,
            target_col,
            rich_numeric_covariates,
            rich_categorical_covariates,
        )
    )
continuous_sensitivity = pd.DataFrame(continuous_rows)

display(continuous_sensitivity.round(4))
target target_label score_col effect_per_0_10_score_increase raw_score_target_corr model_r2
0 Y_future_interactions Future 7-day interactions discovery_breadth_score 20.5518 0.5674 0.7220
1 Y_future_active_days Future 7-day active days discovery_breadth_score 0.0966 0.3980 0.5179
2 Y_future_play_hours Future 7-day play hours discovery_breadth_score 0.0447 0.5392 0.6999
3 M_satisfaction_depth M_satisfaction_depth discovery_breadth_score 0.0129 0.0930 0.4612

The continuous check asks whether more discovery breadth is associated with more future value without forcing a high-versus-low split. It should broadly agree with the binary result if the threshold is not doing all the work.
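One quick way to check that agreement is to multiply the continuous slope by the gap in mean discovery score between the high and lower groups and compare the result to the binary total effect. The numbers below are illustrative placeholders (the group means and slope are assumed, not taken from this notebook's tables):

```python
# Back-of-envelope agreement check between the continuous slope and the binary contrast.
# All values below are illustrative placeholders, not estimates from this notebook.
slope_per_unit_score = 205.5   # adjusted slope per 1.0 of discovery_breadth_score (assumed)
mean_score_high = 0.62         # mean score among high-discovery days (assumed)
mean_score_low = 0.41          # mean score among lower-discovery days (assumed)
binary_total_effect = 37.3     # g-computation total effect (assumed)

# If the binary split is not doing all the work, the slope times the score gap
# should land in the same ballpark as the binary estimate.
implied_binary_effect = slope_per_unit_score * (mean_score_high - mean_score_low)
ratio = implied_binary_effect / binary_total_effect
print(f"implied from slope: {implied_binary_effect:.1f}, ratio to binary estimate: {ratio:.2f}")
```

A ratio far from 1 would suggest the threshold choice, not the underlying dose-response, is driving the binary result.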

14. Placebo-Style Checks for Pre-Period Activity and Calendar Time

A dangerous pattern would be high discovery exposure strongly predicting variables that were measured before the treatment day. This cell checks raw and adjusted relationships between high discovery and pre-period activity, prior satisfaction, prior discovery, and calendar day.

placebo_targets = [
    "lag_1_interactions",
    "prior_3day_interactions",
    "prior_3day_high_satisfaction_share",
    "prior_3day_discovery_candidate_share",
    "calendar_day_index",
]

placebo_adjustment_numeric = [
    col
    for col in ["register_days", "follow_user_num", "fans_user_num", "friend_user_num", "is_lowactive_period", "is_live_streamer", "is_video_author"]
    if col in analysis_panel.columns
] + profile_onehot_covariates
placebo_adjustment_categorical = rich_categorical_covariates

placebo_rows = []
for target_col in placebo_targets:
    raw_high = analysis_panel.loc[analysis_panel[BASE_TREATMENT].eq(1), target_col].mean()
    raw_low = analysis_panel.loc[analysis_panel[BASE_TREATMENT].eq(0), target_col].mean()

    # Drop the placebo target from its own adjustment set so we never regress a variable on itself.
    numeric_cols = [col for col in placebo_adjustment_numeric if col != target_col]
    categorical_cols = [col for col in placebo_adjustment_categorical if col != target_col]
    covariates = build_covariate_matrix(analysis_panel, numeric_cols, categorical_cols)
    design = covariates.copy()
    design.insert(0, BASE_TREATMENT, analysis_panel[BASE_TREATMENT].astype(float).to_numpy())
    model = LinearRegression().fit(design, analysis_panel[target_col].astype(float).to_numpy())
    adjusted_difference = float(model.coef_[0])
    target_mean = float(analysis_panel[target_col].mean())

    placebo_rows.append(
        {
            "placebo_target": target_col,
            "raw_high_minus_lower": raw_high - raw_low,
            "adjusted_difference": adjusted_difference,
            "target_mean": target_mean,
            "relative_raw_difference": (raw_high - raw_low) / target_mean if not np.isclose(target_mean, 0) else np.nan,
        }
    )

placebo_checks = pd.DataFrame(placebo_rows)

display(placebo_checks.round(4))
placebo_target raw_high_minus_lower adjusted_difference target_mean relative_raw_difference
0 lag_1_interactions 18.3176 18.3670 51.0316 0.3589
1 prior_3day_interactions 49.0163 49.1383 152.5819 0.3212
2 prior_3day_high_satisfaction_share -0.0321 -0.0363 1.3416 -0.0239
3 prior_3day_discovery_candidate_share 0.3502 0.3493 1.0706 0.3271
4 calendar_day_index -15.6133 -15.6544 30.7969 -0.5070

The placebo-style checks are expected to show some pre-period differences because discovery exposure is not randomized. The point is to make that imbalance visible and connect it back to the need for adjustment and sensitivity analysis.
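A complementary, scale-free way to report that imbalance is the standardized mean difference (SMD), which divides the group gap by the pooled standard deviation so variables on very different scales become comparable. This is a minimal self-contained sketch with simulated data, not a computation on this notebook's panel:

```python
import numpy as np

def standardized_mean_difference(values, treated_mask):
    """Scale-free imbalance metric: mean difference divided by the pooled standard deviation."""
    treated = values[treated_mask]
    control = values[~treated_mask]
    pooled_sd = np.sqrt((treated.var(ddof=1) + control.var(ddof=1)) / 2)
    return (treated.mean() - control.mean()) / pooled_sd

# Toy example: a pre-period variable with a visible treated/control gap.
rng = np.random.default_rng(0)
pre_period = np.concatenate([rng.normal(55, 20, 500), rng.normal(45, 20, 500)])
treated = np.concatenate([np.ones(500, dtype=bool), np.zeros(500, dtype=bool)])
smd = standardized_mean_difference(pre_period, treated)
print(f"SMD: {smd:.3f}")
```

A common rule of thumb flags absolute SMDs above roughly 0.1 as meaningful imbalance, which makes it easy to rank the placebo targets by how worrying they are.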

15. Plot Placebo-Style Checks

This cell plots raw and adjusted pre-period differences. Large pre-period differences are not fatal by themselves, but they are a warning against overclaiming from observational logs.

placebo_plot = placebo_checks.melt(
    id_vars=["placebo_target"],
    value_vars=["raw_high_minus_lower", "adjusted_difference"],
    var_name="contrast_type",
    value_name="difference",
)
placebo_plot["contrast_type"] = placebo_plot["contrast_type"].map(
    {
        "raw_high_minus_lower": "Raw high - lower",
        "adjusted_difference": "Adjusted difference",
    }
)
placebo_plot["target_label"] = placebo_plot["placebo_target"].str.replace("_", " ").str.title()

fig, ax = plt.subplots(figsize=(12, 6))
sns.barplot(data=placebo_plot, x="difference", y="target_label", hue="contrast_type", ax=ax)
ax.axvline(0, color="black", linewidth=1)
ax.set_title("Placebo-Style Checks on Pre-Period Variables")
ax.set_xlabel("High-discovery minus lower-discovery difference")
ax.set_ylabel("Pre-period or calendar variable")
plt.tight_layout()
fig.savefig(FIGURE_DIR / "20_placebo_preperiod_checks.png", dpi=160, bbox_inches="tight")
plt.show()

This plot helps communicate why the earlier balance and propensity diagnostics matter. The treatment is observational, so a final report should emphasize adjusted estimates and limitations rather than raw gaps.

16. Summarize Robustness Patterns

This cell creates a compact summary across robustness families. It focuses on whether the total effect stays positive and whether the indirect effect remains small relative to the total effect.

robustness_families = []

for family_name, frame in [
    ("threshold_sensitivity", threshold_results),
    ("mediator_sensitivity", mediator_results),
    ("model_sensitivity", model_sensitivity),
    ("outcome_sensitivity", outcome_results.query("outcome == @PRIMARY_OUTCOME")),
]:
    robustness_families.append(
        {
            "robustness_family": family_name,
            "specifications": len(frame),
            "total_effect_min": frame["gcomp_total_effect"].min(),
            "total_effect_max": frame["gcomp_total_effect"].max(),
            "total_effect_share_positive": (frame["gcomp_total_effect"] > 0).mean(),
            "indirect_effect_min": frame["natural_indirect_effect"].min(),
            "indirect_effect_max": frame["natural_indirect_effect"].max(),
            "indirect_effect_share_positive": (frame["natural_indirect_effect"] > 0).mean(),
            "median_abs_indirect_to_total_ratio": np.nanmedian(
                np.abs(frame["natural_indirect_effect"] / frame["gcomp_total_effect"].replace(0, np.nan))
            ),
        }
    )

robustness_summary = pd.DataFrame(robustness_families)

display(robustness_summary.round(4))
robustness_family specifications total_effect_min total_effect_max total_effect_share_positive indirect_effect_min indirect_effect_max indirect_effect_share_positive median_abs_indirect_to_total_ratio
0 threshold_sensitivity 4 37.2781 40.0555 1.0 -1.6532 -1.3077 0.0 0.0402
1 mediator_sensitivity 5 37.2560 37.3502 1.0 -1.7145 0.1281 0.2 0.0359
2 model_sensitivity 5 34.8085 37.6289 1.0 -1.5581 -0.9360 0.0 0.0351
3 outcome_sensitivity 1 37.2781 37.2781 1.0 -1.3077 -1.3077 0.0 0.0351

This summary turns many model runs into a few takeaways. The strongest claims are those that remain stable across robustness families; weaker claims move around or depend on a specific metric definition.
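Stability itself can be made concrete with a small helper that reports the share of specifications agreeing with the sign of the median estimate. This is a sketch on toy numbers chosen to illustrate the two patterns; the column names match this notebook's tables but the values are placeholders:

```python
import pandas as pd

def sign_stability(frame, effect_col):
    """Share of specifications where the effect keeps the sign of its median estimate."""
    effects = frame[effect_col].astype(float)
    median_sign = 1 if effects.median() >= 0 else -1
    return float((effects * median_sign > 0).mean())

# Toy values: a stable total effect and an indirect effect whose sign flips once.
demo = pd.DataFrame(
    {
        "gcomp_total_effect": [37.3, 34.8, 40.1, 37.6],
        "natural_indirect_effect": [-1.3, -1.7, 0.1, -0.9],
    }
)
print(sign_stability(demo, "gcomp_total_effect"))       # total effect: same sign everywhere
print(sign_stability(demo, "natural_indirect_effect"))  # indirect effect: one sign flip
```

A sign-stability of 1.0 for the total effect alongside a lower value for the indirect effect is exactly the "stable headline, fragile mechanism" pattern the summary table is meant to surface.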

17. Limitations Table

This cell writes a limitations table for the final report. These are not flaws to hide. They are the boundaries that make the analysis honest and more believable.

limitations = pd.DataFrame(
    [
        {
            "limitation": "observational_recommendation_logs",
            "why_it_matters": "High-discovery exposure was not randomized, so user preferences and recommender selection can confound estimates.",
            "mitigation_in_this_work": "Used pre-treatment history/profile controls, propensity diagnostics, and weighted sensitivity checks.",
            "remaining_risk": "Unobserved ranking-system state and user intent can still bias estimates.",
        },
        {
            "limitation": "mediator_outcome_confounding",
            "why_it_matters": "Satisfaction depth and future engagement may share unobserved causes.",
            "mitigation_in_this_work": "Checked residual mediator-outcome relationships and compared multiple mediator definitions.",
            "remaining_risk": "Natural direct and indirect effects rely on strong mediator assumptions.",
        },
        {
            "limitation": "satisfaction_proxy_measurement",
            "why_it_matters": "Watch ratio and valid play are proxies, not direct surveys of satisfaction.",
            "mitigation_in_this_work": "Compared composite satisfaction, high watch ratio, valid play, average score, and completion-style mediators.",
            "remaining_risk": "The chosen proxies may miss frustration, novelty value, or long-run satisfaction.",
        },
        {
            "limitation": "active_day_sample_selection",
            "why_it_matters": "The analysis conditions on active user-days, so it does not study days with no observed interactions.",
            "mitigation_in_this_work": "Defined the estimand clearly for active days and reported active-day alternate outcomes.",
            "remaining_risk": "Results may not generalize to dormant users or cold-start retention decisions.",
        },
        {
            "limitation": "calendar_and_history_imbalance",
            "why_it_matters": "High discovery days differ from lower discovery days in time and recent behavior.",
            "mitigation_in_this_work": "Included calendar/history controls and showed placebo-style pre-period checks.",
            "remaining_risk": "Residual time-varying confounding can remain after observed adjustment.",
        },
    ]
)

display(limitations)
limitation why_it_matters mitigation_in_this_work remaining_risk
0 observational_recommendation_logs High-discovery exposure was not randomized, so user preferences and recommender selection can confound estimates. Used pre-treatment history/profile controls, propensity diagnostics, and weighted sensitivity checks. Unobserved ranking-system state and user intent can still bias estimates.
1 mediator_outcome_confounding Satisfaction depth and future engagement may share unobserved causes. Checked residual mediator-outcome relationships and compared multiple mediator definitions. Natural direct and indirect effects rely on strong mediator assumptions.
2 satisfaction_proxy_measurement Watch ratio and valid play are proxies, not direct surveys of satisfaction. Compared composite satisfaction, high watch ratio, valid play, average score, and completion-style mediators. The chosen proxies may miss frustration, novelty value, or long-run satisfaction.
3 active_day_sample_selection The analysis conditions on active user-days, so it does not study days with no observed interactions. Defined the estimand clearly for active days and reported active-day alternate outcomes. Results may not generalize to dormant users or cold-start retention decisions.
4 calendar_and_history_imbalance High discovery days differ from lower discovery days in time and recent behavior. Included calendar/history controls and showed placebo-style pre-period checks. Residual time-varying confounding can remain after observed adjustment.

The limitations table gives the final report a professional tone. It shows that the analysis is careful about where the evidence is strong and where it remains assumption-dependent.

18. Save Robustness Outputs

This cell saves all robustness tables to data/processed and mirrors the main tables into the notebook writeup folder.

threshold_results.to_csv(THRESHOLD_OUTPUT, index=False)
mediator_results.to_csv(MEDIATOR_OUTPUT, index=False)
model_sensitivity.to_csv(MODEL_SENSITIVITY_OUTPUT, index=False)
continuous_sensitivity.to_csv(CONTINUOUS_OUTPUT, index=False)
placebo_checks.to_csv(PLACEBO_OUTPUT, index=False)
robustness_summary.to_csv(ROBUSTNESS_SUMMARY_OUTPUT, index=False)
limitations.to_csv(LIMITATIONS_OUTPUT, index=False)

threshold_results.to_csv(TABLE_DIR / "robustness_thresholds.csv", index=False)
mediator_results.to_csv(TABLE_DIR / "robustness_mediators.csv", index=False)
model_sensitivity.to_csv(TABLE_DIR / "model_sensitivity.csv", index=False)
continuous_sensitivity.to_csv(TABLE_DIR / "continuous_score_sensitivity.csv", index=False)
placebo_checks.to_csv(TABLE_DIR / "placebo_checks.csv", index=False)
robustness_summary.to_csv(TABLE_DIR / "robustness_summary.csv", index=False)
limitations.to_csv(TABLE_DIR / "limitations.csv", index=False)

saved_outputs = pd.DataFrame(
    {
        "artifact": [
            "threshold_robustness",
            "mediator_robustness",
            "model_sensitivity",
            "continuous_score_sensitivity",
            "placebo_checks",
            "robustness_summary",
            "limitations",
        ],
        "path": [
            str(THRESHOLD_OUTPUT),
            str(MEDIATOR_OUTPUT),
            str(MODEL_SENSITIVITY_OUTPUT),
            str(CONTINUOUS_OUTPUT),
            str(PLACEBO_OUTPUT),
            str(ROBUSTNESS_SUMMARY_OUTPUT),
            str(LIMITATIONS_OUTPUT),
        ],
    }
)

display(saved_outputs)
artifact path
0 threshold_robustness /home/apex/Documents/ranking_sys/data/processed/kuairec_discovery_quality_robustness_thresholds.csv
1 mediator_robustness /home/apex/Documents/ranking_sys/data/processed/kuairec_discovery_quality_robustness_mediators.csv
2 model_sensitivity /home/apex/Documents/ranking_sys/data/processed/kuairec_discovery_quality_model_sensitivity.csv
3 continuous_score_sensitivity /home/apex/Documents/ranking_sys/data/processed/kuairec_discovery_quality_continuous_score_sensitivity.csv
4 placebo_checks /home/apex/Documents/ranking_sys/data/processed/kuairec_discovery_quality_placebo_checks.csv
5 robustness_summary /home/apex/Documents/ranking_sys/data/processed/kuairec_discovery_quality_robustness_summary.csv
6 limitations /home/apex/Documents/ranking_sys/data/processed/kuairec_discovery_quality_limitations.csv

The saved outputs make this robustness work reusable. A final report can pull directly from these tables without recomputing the sensitivity grid.

19. Notebook Takeaways

This notebook stress-tested the mediation result from several angles:

  • The total future-interaction effect is checked across stricter discovery thresholds.
  • The mediated pathway is checked across several satisfaction proxy definitions.
  • The result is compared across future interactions, active days, and play hours.
  • Model choices are varied through weighting, interactions, and control-set size.
  • Placebo-style checks make pre-period imbalance visible.
  • The limitations table records what the observational design cannot fully solve.

The next notebook should be the final report and figures notebook for this discovery-quality mediation project. It should turn the evidence into a concise portfolio-ready narrative with the key tables, plots, and caveats.