Direct, Indirect, and Total Effects

This notebook turns the randomized promotion estimates into a formal interference decomposition.

The central lesson is that an item-level treatment effect is incomplete when recommendation items compete for attention. Promoting one movie can increase that movie’s outcome while reducing outcomes for other movies in the same slate. A causal analysis under interference therefore needs more than one number: it needs a direct effect on the promoted focal item, indirect spillover effects on the competing items, and a total effect for the slate as a whole.

The previous notebook estimated each contrast separately. This notebook makes the accounting explicit so the components add up to the slate-level total. It also compares alternative spillover definitions, because the answer can change depending on whether we define competitors as all non-focal items, same-cluster substitutes, displaced items, or stricter same-cluster displaced substitutes.

The practical question is:

Did the promotion create new engagement, or did it mostly move attention from one item to another?

1. Environment and Paths

This cell imports the libraries used for decomposition, uncertainty estimates, and plotting. It locates the repository root by searching for the processed exposure mapping file, which keeps the notebook runnable from JupyterLab or from command-line execution.

from pathlib import Path

import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import seaborn as sns
import statsmodels.api as sm
from IPython.display import display

sns.set_theme(style="whitegrid", context="notebook")
pd.set_option("display.max_columns", 140)
pd.set_option("display.max_rows", 100)
pd.set_option("display.float_format", lambda value: f"{value:,.4f}")

candidate_roots = [Path.cwd(), *Path.cwd().parents]
PROJECT_DIR = next(
    root for root in candidate_roots
    if (root / "data" / "processed" / "movielens_interference_exposure_mapping.parquet").exists()
)

PROCESSED_DIR = PROJECT_DIR / "data" / "processed"
NOTEBOOK_DIR = PROJECT_DIR / "notebooks" / "interference_spillover_effects"
WRITEUP_DIR = NOTEBOOK_DIR / "writeup"
FIGURE_DIR = WRITEUP_DIR / "figures"
TABLE_DIR = WRITEUP_DIR / "tables"
FIGURE_DIR.mkdir(parents=True, exist_ok=True)
TABLE_DIR.mkdir(parents=True, exist_ok=True)

EXPOSURE_PATH = PROCESSED_DIR / "movielens_interference_exposure_mapping.parquet"
SLATE_OUTCOME_PATH = PROCESSED_DIR / "movielens_interference_slate_outcomes.parquet"
OBSERVED_EFFECTS_PATH = PROCESSED_DIR / "movielens_interference_observed_effects.csv"

EXPOSURE_PATH.exists(), SLATE_OUTCOME_PATH.exists(), OBSERVED_EFFECTS_PATH.exists()
(True, True, True)

All checks should be True. This notebook depends on the item-row exposure mapping, the slate-level outcomes, and the observed effect estimates created in earlier notebooks.

2. Load the Analysis Tables

This cell loads the exposure mapping and prior observed-effect summary. The exposure table is the main input because it has one row per slate item and includes focal-item flags, spillover flags, final positions, simulated outcomes, and known probability lifts.

exposure = pd.read_parquet(EXPOSURE_PATH)
slate_outcomes = pd.read_parquet(SLATE_OUTCOME_PATH)
observed_effects_prior = pd.read_csv(OBSERVED_EFFECTS_PATH)

load_summary = pd.DataFrame(
    {
        "table": ["item_row_exposure", "slate_outcomes", "prior_observed_effects"],
        "rows": [len(exposure), len(slate_outcomes), len(observed_effects_prior)],
        "unique_slates": [
            exposure["slate_id"].nunique(),
            slate_outcomes["slate_id"].nunique(),
            np.nan,
        ],
        "unique_users": [
            exposure["userId"].nunique(),
            slate_outcomes["userId"].nunique(),
            np.nan,
        ],
    }
)

display(load_summary)
display(observed_effects_prior)
table rows unique_slates unique_users
0 item_row_exposure 36000 3,000.0000 3,000.0000
1 slate_outcomes 3000 3,000.0000 3,000.0000
2 prior_observed_effects 5 NaN NaN
contrast outcome estimate naive_se cluster_se ci_95_lower ci_95_upper p_value_cluster treated_mean control_mean treated_n control_n clusters cluster_to_naive_se_ratio contrast_order
0 Direct focal item Observed simulated click 0.1716 0.0157 0.0157 0.1409 0.2023 0.0000 0.3435 0.1719 1505 1495 3000 0.9992 0
1 Same-cluster competitor spillover Observed simulated click -0.0577 0.0083 0.0084 -0.0743 -0.0412 0.0000 0.1522 0.2100 4309 4139 2456 1.0107 1
2 Displaced-item spillover Observed simulated click -0.0568 0.0054 0.0057 -0.0679 -0.0457 0.0000 0.1818 0.2386 11277 11133 3000 1.0419 2
3 All non-focal slate spillover Observed simulated click -0.0436 0.0043 0.0046 -0.0526 -0.0346 0.0000 0.1705 0.2141 16555 16445 3000 1.0635 3
4 Total slate Observed total simulated clicks -0.3078 0.0538 0.0538 -0.4132 -0.2024 0.0000 2.2193 2.5271 1505 1495 3000 1.0001 4

The prior observed estimates are useful context, but this notebook will rebuild the component accounting directly from item rows. That makes the decomposition transparent instead of relying only on already-summarized outputs.

3. Formalize the Effect Components

This cell assigns every item row to exactly one slate-accounting component:

  • direct_focal: the selected focal item, promoted or not.
  • same_cluster_competitor: non-focal items in the same spillover cluster as the focal item.
  • other_competitor: all remaining non-focal items.

This partition is important because these three components add up to the full slate. If we estimate each component at the slate level, direct plus indirect equals total by construction.

component_exposure = exposure.copy()
component_exposure["accounting_component"] = np.select(
    [
        component_exposure["is_focal_item"].eq(1),
        component_exposure["is_focal_item"].eq(0)
        & component_exposure["spillover_cluster"].eq(component_exposure["focal_spillover_cluster"]),
    ],
    ["direct_focal", "same_cluster_competitor"],
    default="other_competitor",
)

component_counts = (
    component_exposure.groupby(["accounting_component", "promotion_applied"])
    .agg(
        rows=("movieId", "size"),
        slates=("slate_id", "nunique"),
        mean_seed_position=("slate_position_seed", "mean"),
        mean_final_position=("final_position", "mean"),
        mean_relevance=("observed_relevance", "mean"),
        mean_known_lift=("known_probability_lift", "mean"),
    )
    .reset_index()
)
component_counts["assignment_arm"] = component_counts["promotion_applied"].map(
    {0: "control", 1: "promoted"}
)

display(component_counts)
accounting_component promotion_applied rows slates mean_seed_position mean_final_position mean_relevance mean_known_lift assignment_arm
0 direct_focal 0 1495 1495 8.4468 8.4468 4.4893 0.0000 control
1 direct_focal 1 1505 1505 8.4930 1.0000 4.4897 0.1794 promoted
2 other_competitor 0 12306 1490 6.3073 6.3073 4.5952 0.0000 control
3 other_competitor 1 12246 1500 6.3039 6.9849 4.6084 -0.0370 promoted
4 same_cluster_competitor 0 4139 1211 6.3697 6.3697 4.6372 0.0000 control
5 same_cluster_competitor 1 4309 1245 6.3613 7.0429 4.6239 -0.0686 promoted

The component table shows the support for each part of the accounting identity. The focal component has one item per slate. The competitor components have many more rows, which is why a small per-item spillover can outweigh a large focal-item gain at the slate level.
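To see that scale mismatch concretely, here is a back-of-the-envelope version of the accounting using rounded estimates from the prior observed-effects table (illustrative arithmetic only, not a re-estimation; the 12-item slate size comes from 36,000 rows over 3,000 slates):

```python
# Rounded per-slate and per-item estimates from the prior observed-effects table.
direct_gain_per_slate = 0.1716    # focal-item effect; one focal item per slate
spillover_per_item = -0.0436      # all-non-focal per-item spillover effect
non_focal_items_per_slate = 11    # 12-item slates minus the focal item

# A small per-item loss across 11 competitors outweighs the single focal gain.
slate_total = direct_gain_per_slate + non_focal_items_per_slate * spillover_per_item
print(round(slate_total, 4))  # -0.308, close to the estimated total of -0.3078
```

The approximation lands within rounding error of the estimated total slate effect, which is exactly why the slate-level accounting matters.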

4. Aggregate Components to the Slate Level

This cell creates a one-row-per-slate table where each outcome component is stored as a separate column. This is the main decomposition table.

For each slate, we compute:

  • focal simulated clicks,
  • same-cluster competitor simulated clicks,
  • other competitor simulated clicks,
  • total simulated clicks,
  • expected click probabilities for each component,
  • known probability lift for each component.

Because every row belongs to exactly one component, the component sums should equal the total slate outcome.

component_slate_long = (
    component_exposure.groupby(["slate_id", "accounting_component"])
    .agg(
        simulated_clicks=("simulated_click", "sum"),
        expected_clicks=("p_observed", "sum"),
        no_promotion_expected_clicks=("p_no_promotion", "sum"),
        known_probability_lift=("known_probability_lift", "sum"),
        item_count=("movieId", "size"),
    )
    .reset_index()
)

component_slate_wide = component_slate_long.pivot(
    index="slate_id",
    columns="accounting_component",
    values=[
        "simulated_clicks",
        "expected_clicks",
        "no_promotion_expected_clicks",
        "known_probability_lift",
        "item_count",
    ],
)
component_slate_wide.columns = [f"{metric}_{component}" for metric, component in component_slate_wide.columns]
component_slate_wide = component_slate_wide.reset_index().fillna(0)

slate_keys = slate_outcomes[
    [
        "slate_id",
        "userId",
        "promotion_applied",
        "assignment_arm",
        "focal_movieId",
        "focal_title",
        "focal_spillover_cluster",
        "focal_seed_position",
        "total_simulated_clicks",
        "total_expected_clicks",
        "total_expected_clicks_no_promotion",
        "total_known_probability_lift",
    ]
]

component_slate = slate_keys.merge(component_slate_wide, on="slate_id", how="left")

component_slate["component_simulated_total"] = (
    component_slate["simulated_clicks_direct_focal"]
    + component_slate["simulated_clicks_same_cluster_competitor"]
    + component_slate["simulated_clicks_other_competitor"]
)
component_slate["component_expected_total"] = (
    component_slate["expected_clicks_direct_focal"]
    + component_slate["expected_clicks_same_cluster_competitor"]
    + component_slate["expected_clicks_other_competitor"]
)
component_slate["component_known_lift_total"] = (
    component_slate["known_probability_lift_direct_focal"]
    + component_slate["known_probability_lift_same_cluster_competitor"]
    + component_slate["known_probability_lift_other_competitor"]
)

component_check = pd.DataFrame(
    {
        "check": [
            "max_abs_simulated_click_difference",
            "max_abs_expected_click_difference",
            "max_abs_known_lift_difference",
        ],
        "value": [
            (component_slate["component_simulated_total"] - component_slate["total_simulated_clicks"]).abs().max(),
            (component_slate["component_expected_total"] - component_slate["total_expected_clicks"]).abs().max(),
            (component_slate["component_known_lift_total"] - component_slate["total_known_probability_lift"]).abs().max(),
        ],
    }
)

display(component_check)
display(component_slate.head())
check value
0 max_abs_simulated_click_difference 0.0000
1 max_abs_expected_click_difference 0.0000
2 max_abs_known_lift_difference 0.0000
slate_id userId promotion_applied assignment_arm focal_movieId focal_title focal_spillover_cluster focal_seed_position total_simulated_clicks total_expected_clicks total_expected_clicks_no_promotion total_known_probability_lift simulated_clicks_direct_focal simulated_clicks_other_competitor simulated_clicks_same_cluster_competitor expected_clicks_direct_focal expected_clicks_other_competitor expected_clicks_same_cluster_competitor no_promotion_expected_clicks_direct_focal no_promotion_expected_clicks_other_competitor no_promotion_expected_clicks_same_cluster_competitor known_probability_lift_direct_focal known_probability_lift_other_competitor known_probability_lift_same_cluster_competitor item_count_direct_focal item_count_other_competitor item_count_same_cluster_competitor component_simulated_total component_expected_total component_known_lift_total
0 user_100000_seed 100000 1 promote_focal_item 88810 Help, The (2011) Drama 8 3 1.5470 1.7488 -0.2018 1.0000 2.0000 0.0000 0.2407 1.2603 0.0459 0.1057 1.5810 0.0621 0.1351 -0.3206 -0.0162 1.0000 10.0000 1.0000 3.0000 1.5470 -0.2018
1 user_10000_seed 10000 0 leave_slate_unchanged 2005 Goonies, The (1985) Action 12 1 2.3210 2.3210 0.0000 0.0000 1.0000 0.0000 0.1172 1.9674 0.2364 0.1172 1.9674 0.2364 0.0000 0.0000 0.0000 1.0000 9.0000 2.0000 1.0000 2.3210 0.0000
2 user_100050_seed 100050 1 promote_focal_item 5459 Men in Black II (a.k.a. MIIB) (a.k.a. MIB 2) (... Action 5 3 2.4909 2.8907 -0.3998 1.0000 2.0000 0.0000 0.4161 1.0215 1.0534 0.2237 1.1147 1.5523 0.1923 -0.0933 -0.4989 1.0000 5.0000 6.0000 3.0000 2.4909 -0.3998
3 user_100100_seed 100100 0 leave_slate_unchanged 4299 Knight's Tale, A (2001) Action 5 5 2.9722 2.9722 0.0000 0.0000 4.0000 1.0000 0.2317 1.9963 0.7442 0.2317 1.9963 0.7442 0.0000 0.0000 0.0000 1.0000 8.0000 3.0000 5.0000 2.9722 0.0000
4 user_100200_seed 100200 0 leave_slate_unchanged 5445 Minority Report (2002) Action 11 2 3.3004 3.3004 0.0000 1.0000 0.0000 1.0000 0.2534 1.2939 1.7530 0.2534 1.2939 1.7530 0.0000 0.0000 0.0000 1.0000 5.0000 6.0000 2.0000 3.3004 0.0000

The component checks should be zero or extremely close to zero. That confirms the accounting table is internally consistent: focal plus same-cluster competitors plus other competitors equals the full slate.

5. Define Decomposition Estimation Helpers

This cell defines a simple slate-level difference-in-means helper. Because the table now has one row per randomized slate, the treatment-control difference is a cluster-level estimator by construction. The helper returns the estimate, standard error, confidence interval, and arm means.

def slate_difference(data, outcome, treatment="promotion_applied", label=None, family=None):
    work = data[[outcome, treatment]].dropna().copy()
    work[outcome] = work[outcome].astype(float)
    work[treatment] = work[treatment].astype(float)
    treated = work.loc[work[treatment] == 1, outcome]
    control = work.loc[work[treatment] == 0, outcome]

    x = sm.add_constant(work[treatment], has_constant="add")
    fit = sm.OLS(work[outcome], x).fit()
    estimate = float(fit.params[treatment])
    se = float(fit.bse[treatment])

    return {
        "effect_family": family or label or outcome,
        "component": label or outcome,
        "outcome_column": outcome,
        "estimate_per_slate": estimate,
        "se": se,
        "ci_95_lower": estimate - 1.96 * se,
        "ci_95_upper": estimate + 1.96 * se,
        "treated_mean": treated.mean(),
        "control_mean": control.mean(),
        "treated_slates": len(treated),
        "control_slates": len(control),
    }


def cluster_bootstrap_slate_difference(data, outcome, treatment="promotion_applied", n_bootstrap=500, seed=20260428):
    work = data[[outcome, treatment]].dropna().reset_index(drop=True).copy()
    rng = np.random.default_rng(seed)
    estimates = []
    row_ids = np.arange(len(work))
    for _ in range(n_bootstrap):
        sampled = work.iloc[rng.choice(row_ids, size=len(row_ids), replace=True)]
        treated = sampled.loc[sampled[treatment] == 1, outcome]
        control = sampled.loc[sampled[treatment] == 0, outcome]
        if treated.empty or control.empty:
            estimates.append(np.nan)
        else:
            estimates.append(treated.mean() - control.mean())
    return np.asarray(estimates, dtype=float)

These functions are intentionally simple because the design is randomized at the slate level. The heavy causal work happened when we defined the randomized assignment and the component mapping. Here the estimator is a transparent treated-control difference.
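As a sanity check on the analytic intervals, the bootstrap helper can be turned into a percentile confidence interval. A minimal self-contained sketch on synthetic slate-level data (the real `component_slate` table is built in the next section; the column names and the true effect of about −0.3 clicks per promoted slate mirror the notebook, but the data here are simulated stand-ins):

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(20260428)

# Synthetic stand-in for the one-row-per-slate table.
n_slates = 600
toy = pd.DataFrame({"promotion_applied": rng.integers(0, 2, size=n_slates)})
toy["total_simulated_clicks"] = (
    2.5 - 0.3 * toy["promotion_applied"] + rng.normal(0.0, 1.0, size=n_slates)
)

# Resample whole slates with replacement, recomputing the treated-minus-control
# difference in means on each draw (the same logic as the helper above).
row_ids = np.arange(len(toy))
estimates = []
for _ in range(500):
    sampled = toy.iloc[rng.choice(row_ids, size=len(row_ids), replace=True)]
    treated = sampled.loc[sampled["promotion_applied"] == 1, "total_simulated_clicks"]
    control = sampled.loc[sampled["promotion_applied"] == 0, "total_simulated_clicks"]
    estimates.append(treated.mean() - control.mean())

ci_lower, ci_upper = np.percentile(estimates, [2.5, 97.5])
print(f"bootstrap 95% CI: [{ci_lower:.3f}, {ci_upper:.3f}]")
```

With a randomized slate-level design, this percentile interval should roughly agree with the OLS interval from `slate_difference`; a large discrepancy would flag a problem with the analytic standard errors.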

6. Estimate Component-Level Direct, Indirect, and Total Effects

This cell estimates component effects using observed simulated clicks, expected clicks, and known probability lift. The observed-click estimates are what we would see in a real experiment; the expected and known-lift estimates are simulation diagnostics.

component_jobs = [
    ("Direct", "Focal item", "simulated_clicks_direct_focal", "observed_clicks"),
    ("Indirect", "Same-cluster competitors", "simulated_clicks_same_cluster_competitor", "observed_clicks"),
    ("Indirect", "Other competitors", "simulated_clicks_other_competitor", "observed_clicks"),
    ("Total", "Full slate", "total_simulated_clicks", "observed_clicks"),
    ("Direct", "Focal item", "expected_clicks_direct_focal", "expected_clicks"),
    ("Indirect", "Same-cluster competitors", "expected_clicks_same_cluster_competitor", "expected_clicks"),
    ("Indirect", "Other competitors", "expected_clicks_other_competitor", "expected_clicks"),
    ("Total", "Full slate", "total_expected_clicks", "expected_clicks"),
    ("Direct", "Focal item", "known_probability_lift_direct_focal", "known_lift"),
    ("Indirect", "Same-cluster competitors", "known_probability_lift_same_cluster_competitor", "known_lift"),
    ("Indirect", "Other competitors", "known_probability_lift_other_competitor", "known_lift"),
    ("Total", "Full slate", "total_known_probability_lift", "known_lift"),
]

component_effect_rows = []
for family, component, outcome_col, outcome_type in component_jobs:
    result = slate_difference(component_slate, outcome_col, label=component, family=family)
    result["outcome_type"] = outcome_type
    component_effect_rows.append(result)

component_effects = pd.DataFrame(component_effect_rows)
component_effects["estimate_per_1000_promoted_slates"] = component_effects["estimate_per_slate"] * 1000
component_effects["ci_95_lower_per_1000"] = component_effects["ci_95_lower"] * 1000
component_effects["ci_95_upper_per_1000"] = component_effects["ci_95_upper"] * 1000

component_effects_display = component_effects[
    [
        "outcome_type",
        "effect_family",
        "component",
        "estimate_per_slate",
        "ci_95_lower",
        "ci_95_upper",
        "estimate_per_1000_promoted_slates",
        "treated_mean",
        "control_mean",
    ]
]

display(component_effects_display)
outcome_type effect_family component estimate_per_slate ci_95_lower ci_95_upper estimate_per_1000_promoted_slates treated_mean control_mean
0 observed_clicks Direct Focal item 0.1716 0.1409 0.2023 171.6152 0.3435 0.1719
1 observed_clicks Indirect Same-cluster competitors -0.1454 -0.2015 -0.0892 -145.3905 0.4359 0.5813
2 observed_clicks Indirect Other competitors -0.3340 -0.4260 -0.2421 -334.0459 1.4399 1.7739
3 observed_clicks Total Full slate -0.3078 -0.4132 -0.2025 -307.8212 2.2193 2.5271
4 expected_clicks Direct Focal item 0.1777 0.1715 0.1838 177.6760 0.3556 0.1779
5 expected_clicks Indirect Same-cluster competitors -0.1786 -0.2130 -0.1441 -178.5503 0.4198 0.5984
6 expected_clicks Indirect Other competitors -0.3160 -0.3612 -0.2707 -315.9706 1.4515 1.7675
7 expected_clicks Total Full slate -0.3168 -0.3610 -0.2727 -316.8448 2.2269 2.5438
8 known_lift Direct Focal item 0.1794 0.1774 0.1814 179.3833 0.1794 0.0000
9 known_lift Indirect Same-cluster competitors -0.1965 -0.2055 -0.1875 -196.4743 -0.1965 0.0000
10 known_lift Indirect Other competitors -0.3010 -0.3068 -0.2951 -300.9524 -0.3010 0.0000
11 known_lift Total Full slate -0.3180 -0.3232 -0.3129 -318.0434 -0.3180 0.0000

This is the main decomposition table. The observed-click rows show the experiment-style result, while the expected and known-lift rows show the simulation mechanism. The per-1,000-promoted-slates column makes the effect sizes easier to explain in portfolio or product language.

7. Check That the Estimated Components Add Up

This cell verifies the decomposition identity at the estimate level. The direct estimate plus the two indirect component estimates should equal the total slate estimate for each outcome type, up to floating-point rounding.

additivity_checks = []
for outcome_type, df in component_effects.groupby("outcome_type"):
    direct = df.query("effect_family == 'Direct'")["estimate_per_slate"].sum()
    indirect = df.query("effect_family == 'Indirect'")["estimate_per_slate"].sum()
    total = df.query("effect_family == 'Total'")["estimate_per_slate"].sum()
    additivity_checks.append(
        {
            "outcome_type": outcome_type,
            "direct_estimate": direct,
            "indirect_estimate": indirect,
            "direct_plus_indirect": direct + indirect,
            "total_estimate": total,
            "difference": (direct + indirect) - total,
        }
    )

additivity_checks = pd.DataFrame(additivity_checks)
display(additivity_checks)
outcome_type direct_estimate indirect_estimate direct_plus_indirect total_estimate difference
0 expected_clicks 0.1777 -0.4945 -0.3168 -0.3168 -0.0000
1 known_lift 0.1794 -0.4974 -0.3180 -0.3180 -0.0000
2 observed_clicks 0.1716 -0.4794 -0.3078 -0.3078 0.0000

The differences should be essentially zero. This is the advantage of doing component accounting at the slate level: the direct and indirect pieces are not separate anecdotes; they mathematically reconcile to the total effect.

8. Plot the Observed Direct, Indirect, and Total Effects

This plot shows the observed-click decomposition in clicks per 1,000 promoted slates. Positive bars represent gains and negative bars represent losses. This is the cleanest way to communicate whether promotion created new engagement or mostly shifted attention.

observed_component_plot = component_effects.query("outcome_type == 'observed_clicks'").copy()
observed_component_plot["component_label"] = observed_component_plot["effect_family"] + ": " + observed_component_plot["component"]
component_order = [
    "Direct: Focal item",
    "Indirect: Same-cluster competitors",
    "Indirect: Other competitors",
    "Total: Full slate",
]

fig, ax = plt.subplots(figsize=(11, 5.5))
colors = ["tab:green" if value >= 0 else "tab:red" for value in observed_component_plot["estimate_per_1000_promoted_slates"]]
sns.barplot(
    data=observed_component_plot,
    x="estimate_per_1000_promoted_slates",
    y="component_label",
    order=component_order,
    hue="component_label",
    palette=dict(zip(observed_component_plot["component_label"], colors)),
    legend=False,
    ax=ax,
)
for y_pos, (_, row) in enumerate(observed_component_plot.set_index("component_label").loc[component_order].reset_index().iterrows()):
    ax.errorbar(
        x=row["estimate_per_1000_promoted_slates"],
        y=y_pos,
        xerr=[
            [row["estimate_per_1000_promoted_slates"] - row["ci_95_lower_per_1000"]],
            [row["ci_95_upper_per_1000"] - row["estimate_per_1000_promoted_slates"]],
        ],
        fmt="none",
        color="black",
        capsize=3,
    )
ax.axvline(0, color="black", linewidth=1)
ax.set_title("Observed Effect Decomposition per 1,000 Promoted Slates")
ax.set_xlabel("Change in simulated clicks per 1,000 promoted slates")
ax.set_ylabel("")
plt.tight_layout()
fig.savefig(FIGURE_DIR / "16_observed_direct_indirect_total_decomposition.png", dpi=160, bbox_inches="tight")
plt.show()

The plot makes the product tradeoff immediate. The focal item gains clicks, but the same-cluster and other competitor components lose clicks. The total bar tells us whether the slate as a whole improved after accounting for those losses.

9. Compare Observed and Known Decomposition

This cell places the observed-click decomposition next to the known simulation lift. The observed result includes random click noise, while the known-lift result shows the mechanism that the simulation injected.

decomposition_compare = component_effects.query("outcome_type in ['observed_clicks', 'known_lift']").copy()
decomposition_compare["component_label"] = decomposition_compare["effect_family"] + ": " + decomposition_compare["component"]
decomposition_compare["outcome_type_label"] = decomposition_compare["outcome_type"].map(
    {"observed_clicks": "Observed simulated clicks", "known_lift": "Known probability lift"}
)

fig, ax = plt.subplots(figsize=(12, 5.5))
sns.barplot(
    data=decomposition_compare,
    x="estimate_per_1000_promoted_slates",
    y="component_label",
    hue="outcome_type_label",
    order=component_order,
    ax=ax,
)
ax.axvline(0, color="black", linewidth=1)
ax.set_title("Observed Decomposition Compared with Known Simulation Lift")
ax.set_xlabel("Effect per 1,000 promoted slates")
ax.set_ylabel("")
ax.legend(title="")
plt.tight_layout()
fig.savefig(FIGURE_DIR / "17_observed_vs_known_decomposition.png", dpi=160, bbox_inches="tight")
plt.show()

The observed and known bars should tell the same directional story. They do not need to match exactly because observed clicks include Bernoulli noise. The comparison is useful because it shows that the estimated decomposition is recovering the intended simulation pattern.
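One way to see that the extra width of the observed intervals is just click noise: under a Bernoulli outcome, the standard error of the focal-item difference in proportions follows from the arm means and slate counts in the effects table. A quick diagnostic sketch, not part of the notebook's pipeline:

```python
import math

# Arm-level summaries for the direct focal-item contrast,
# transcribed from the observed-effects table.
p_treated, n_treated = 0.3435, 1505
p_control, n_control = 0.1719, 1495

# Standard error of a difference in two independent proportions.
se = math.sqrt(
    p_treated * (1 - p_treated) / n_treated
    + p_control * (1 - p_control) / n_control
)
print(round(se, 4))  # 0.0157, matching the reported naive SE
```

The Bernoulli formula reproduces the reported standard error, so the wider observed intervals reflect outcome noise rather than an estimation artifact.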

10. Spillover Definitions for Sensitivity Analysis

The decomposition above used a mutually exclusive accounting partition. This cell creates several alternative spillover definitions for sensitivity:

  • all non-focal items in promoted slates,
  • same-cluster competitors,
  • displaced items originally above the focal item,
  • strict same-cluster displaced substitutes,
  • near-position displaced items close to the focal item.

These definitions answer slightly different questions, so comparing them helps avoid overclaiming from one arbitrary spillover definition.

near_position_window = 3
spillover_definitions = [
    {
        "definition": "All non-focal items",
        "mask": exposure["is_focal_item"].eq(0),
        "description": "Every non-focal item in the slate.",
    },
    {
        "definition": "Same-cluster competitors",
        "mask": exposure["is_focal_item"].eq(0)
        & exposure["spillover_cluster"].eq(exposure["focal_spillover_cluster"]),
        "description": "Non-focal items with the same primary-genre cluster as the focal item.",
    },
    {
        "definition": "Displaced items",
        "mask": exposure["is_focal_item"].eq(0)
        & exposure["slate_position_seed"].lt(exposure["focal_seed_position"]),
        "description": "Non-focal items above the focal item that shift down when promotion happens.",
    },
    {
        "definition": "Same-cluster displaced substitutes",
        "mask": exposure["is_focal_item"].eq(0)
        & exposure["spillover_cluster"].eq(exposure["focal_spillover_cluster"])
        & exposure["slate_position_seed"].lt(exposure["focal_seed_position"]),
        "description": "Same-cluster competitors that are also mechanically displaced.",
    },
    {
        "definition": "Near-position displaced items",
        "mask": exposure["is_focal_item"].eq(0)
        & exposure["slate_position_seed"].lt(exposure["focal_seed_position"])
        & (exposure["focal_seed_position"] - exposure["slate_position_seed"]).le(near_position_window),
        "description": "Displaced items within three positions above the focal item.",
    },
]

spillover_support_rows = []
for spec in spillover_definitions:
    df = exposure.loc[spec["mask"]].copy()
    spillover_support_rows.append(
        {
            "definition": spec["definition"],
            "rows": len(df),
            "slates": df["slate_id"].nunique(),
            "treated_rows": int(df["promotion_applied"].sum()),
            "control_rows": int((1 - df["promotion_applied"]).sum()),
            "description": spec["description"],
        }
    )
spillover_support = pd.DataFrame(spillover_support_rows)

display(spillover_support)
definition rows slates treated_rows control_rows description
0 All non-focal items 33000 3000 16555 16445 Every non-focal item in the slate.
1 Same-cluster competitors 8448 2456 4309 4139 Non-focal items with the same primary-genre cl...
2 Displaced items 22410 3000 11277 11133 Non-focal items above the focal item that shif...
3 Same-cluster displaced substitutes 5746 2182 2937 2809 Same-cluster competitors that are also mechani...
4 Near-position displaced items 9000 3000 4515 4485 Displaced items within three positions above t...

The support table shows the tradeoff between conceptual precision and sample size. The stricter definitions are more targeted, but they have fewer rows. The broader definitions are more stable, but they mix several mechanisms together.
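The sample-size side of that tradeoff follows the usual one-over-square-root-of-n scaling, at least roughly. Comparing the broadest and strictest definitions from the tables above (an illustration only; it ignores clustering and variance differences between definitions):

```python
import math

# Row counts and clustered SEs for the broadest vs. strictest definition,
# transcribed from the support and sensitivity tables.
rows_broad, se_broad = 33000, 0.0046     # All non-focal items
rows_strict, se_strict = 5746, 0.0104    # Same-cluster displaced substitutes

predicted_ratio = math.sqrt(rows_broad / rows_strict)  # ~2.4
observed_ratio = se_strict / se_broad                  # ~2.3
print(round(predicted_ratio, 2), round(observed_ratio, 2))
```

The observed SE inflation is close to the square-root prediction, so most of the extra uncertainty in the strict definitions is simply the smaller sample.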

11. Estimate Effects Under Alternative Spillover Definitions

This cell estimates row-level spillover effects under each definition. The outcome is the simulated click indicator, and uncertainty is clustered by slate because slate assignment drives spillover exposure.

def row_level_cluster_difference(data, outcome="simulated_click", treatment="promotion_applied", cluster_col="slate_id"):
    work = data[[outcome, treatment, cluster_col]].dropna().copy()
    work[outcome] = work[outcome].astype(float)
    work[treatment] = work[treatment].astype(float)
    treated = work.loc[work[treatment] == 1, outcome]
    control = work.loc[work[treatment] == 0, outcome]
    x = sm.add_constant(work[treatment], has_constant="add")
    fit = sm.OLS(work[outcome], x).fit(cov_type="cluster", cov_kwds={"groups": work[cluster_col]})
    estimate = float(fit.params[treatment])
    se = float(fit.bse[treatment])
    return {
        "estimate": estimate,
        "cluster_se": se,
        "ci_95_lower": estimate - 1.96 * se,
        "ci_95_upper": estimate + 1.96 * se,
        "treated_mean": treated.mean(),
        "control_mean": control.mean(),
        "treated_n": len(treated),
        "control_n": len(control),
        "clusters": work[cluster_col].nunique(),
    }

spillover_sensitivity_rows = []
for spec in spillover_definitions:
    df = exposure.loc[spec["mask"]].copy()
    result = row_level_cluster_difference(df)
    result["definition"] = spec["definition"]
    result["description"] = spec["description"]
    spillover_sensitivity_rows.append(result)

spillover_sensitivity = pd.DataFrame(spillover_sensitivity_rows)
spillover_sensitivity = spillover_sensitivity[
    [
        "definition",
        "estimate",
        "cluster_se",
        "ci_95_lower",
        "ci_95_upper",
        "treated_mean",
        "control_mean",
        "treated_n",
        "control_n",
        "clusters",
        "description",
    ]
]

display(spillover_sensitivity)
definition estimate cluster_se ci_95_lower ci_95_upper treated_mean control_mean treated_n control_n clusters description
0 All non-focal items -0.0436 0.0046 -0.0526 -0.0346 0.1705 0.2141 16555 16445 3000 Every non-focal item in the slate.
1 Same-cluster competitors -0.0577 0.0084 -0.0743 -0.0412 0.1522 0.2100 4309 4139 2456 Non-focal items with the same primary-genre cl...
2 Displaced items -0.0568 0.0057 -0.0679 -0.0457 0.1818 0.2386 11277 11133 3000 Non-focal items above the focal item that shif...
3 Same-cluster displaced substitutes -0.0657 0.0104 -0.0860 -0.0454 0.1610 0.2268 2937 2809 2182 Same-cluster competitors that are also mechani...
4 Near-position displaced items -0.0330 0.0082 -0.0491 -0.0168 0.1604 0.1933 4515 4485 3000 Displaced items within three positions above t...

The sensitivity table shows whether the spillover conclusion depends on the exposure definition. If every definition is negative, the evidence for displacement is stronger. If only one narrow definition is negative, the claim should be framed more cautiously.
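That robustness check can be made explicit. A minimal sketch that re-enters the estimates from the sensitivity table above and confirms that every point estimate is negative and every 95% interval excludes zero:

```python
import pandas as pd

# Estimates and upper CI bounds transcribed from the sensitivity table above.
sensitivity = pd.DataFrame({
    "definition": [
        "All non-focal items",
        "Same-cluster competitors",
        "Displaced items",
        "Same-cluster displaced substitutes",
        "Near-position displaced items",
    ],
    "estimate": [-0.0436, -0.0577, -0.0568, -0.0657, -0.0330],
    "ci_95_upper": [-0.0346, -0.0412, -0.0457, -0.0454, -0.0168],
})

all_negative = bool((sensitivity["estimate"] < 0).all())
all_exclude_zero = bool((sensitivity["ci_95_upper"] < 0).all())
print(all_negative, all_exclude_zero)  # True True
```

Since every definition, broad or strict, is negative with an interval below zero, the displacement conclusion does not hinge on any single exposure definition.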

12. Plot Spillover Definition Sensitivity

This plot compares the estimated spillover effect under each exposure definition. The intervals use slate-clustered standard errors.

spillover_plot = spillover_sensitivity.sort_values("estimate").copy()

fig, ax = plt.subplots(figsize=(11, 5.5))
sns.pointplot(
    data=spillover_plot,
    x="estimate",
    y="definition",
    linestyle="none",
    errorbar=None,
    ax=ax,
    color="tab:blue",
)
for y_pos, (_, row) in enumerate(spillover_plot.reset_index(drop=True).iterrows()):
    ax.errorbar(
        x=row["estimate"],
        y=y_pos,
        xerr=[[row["estimate"] - row["ci_95_lower"]], [row["ci_95_upper"] - row["estimate"]]],
        fmt="none",
        color="black",
        capsize=3,
    )
ax.axvline(0, color="black", linewidth=1)
ax.set_title("Spillover Effect Sensitivity Across Exposure Definitions")
ax.set_xlabel("Difference in simulated click rate")
ax.set_ylabel("")
plt.tight_layout()
fig.savefig(FIGURE_DIR / "18_spillover_definition_sensitivity.png", dpi=160, bbox_inches="tight")
plt.show()

The plot makes the robustness of the spillover story easier to see. Definitions that focus on displaced or same-cluster items should usually show stronger negative effects than broad all-non-focal definitions if substitute displacement is the mechanism.

13. Translate the Decomposition into Product Units

This cell converts the observed decomposition into clicks per 1,000 promoted slates and computes how much of the focal gain is offset by competitor losses. This is the most interview-friendly summary because it uses concrete units instead of abstract coefficients.

observed_components = component_effects.query("outcome_type == 'observed_clicks'").copy()
summary_lookup = observed_components.set_index(["effect_family", "component"])["estimate_per_1000_promoted_slates"]

direct_gain_1000 = summary_lookup.loc[("Direct", "Focal item")]
same_cluster_loss_1000 = summary_lookup.loc[("Indirect", "Same-cluster competitors")]
other_loss_1000 = summary_lookup.loc[("Indirect", "Other competitors")]
total_effect_1000 = summary_lookup.loc[("Total", "Full slate")]
competitor_offset_1000 = same_cluster_loss_1000 + other_loss_1000
offset_ratio = competitor_offset_1000 / abs(direct_gain_1000)

product_summary = pd.DataFrame(
    [
        {
            "metric": "Focal gain per 1,000 promoted slates",
            "value": direct_gain_1000,
            "plain_language": "Additional simulated clicks on the promoted focal item.",
        },
        {
            "metric": "Same-cluster competitor change per 1,000 promoted slates",
            "value": same_cluster_loss_1000,
            "plain_language": "Change among substitute movies in the same genre cluster.",
        },
        {
            "metric": "Other competitor change per 1,000 promoted slates",
            "value": other_loss_1000,
            "plain_language": "Change among other non-focal movies in the slate.",
        },
        {
            "metric": "Total slate change per 1,000 promoted slates",
            "value": total_effect_1000,
            "plain_language": "Net change after adding focal and competitor components.",
        },
        {
            "metric": "Competitor offset as share of direct gain magnitude",
            "value": offset_ratio,
            "plain_language": "Negative values mean competitor losses more than offset focal gains.",
        },
    ]
)

display(product_summary)
metric value plain_language
0 Focal gain per 1,000 promoted slates 171.6152 Additional simulated clicks on the promoted fo...
1 Same-cluster competitor change per 1,000 promo... -145.3905 Change among substitute movies in the same gen...
2 Other competitor change per 1,000 promoted slates -334.0459 Change among other non-focal movies in the slate.
3 Total slate change per 1,000 promoted slates -307.8212 Net change after adding focal and competitor c...
4 Competitor offset as share of direct gain magn... -2.7937 Negative values mean competitor losses more th...

This table is the narrative payoff. It shows whether promotion is value-creating or attention-shifting. In this simulation, the focal gain is not enough by itself; we need the total slate row to decide whether the intervention improves the recommendation surface.
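The accounting identity behind the table can be verified by hand from the displayed values; this small check reuses the reported per-1,000-slate numbers.

```python
# Values copied from the product summary table above.
direct_gain = 171.6152
same_cluster_loss = -145.3905
other_loss = -334.0459
reported_total = -307.8212

# Competitor components sum, then add back the focal gain.
competitor_offset = same_cluster_loss + other_loss   # -479.4364
implied_total = direct_gain + competitor_offset
offset_ratio = competitor_offset / abs(direct_gain)  # about -2.79

# The components should reproduce the reported total slate change.
assert abs(implied_total - reported_total) < 1e-6
print(f"implied total: {implied_total:.4f}, offset ratio: {offset_ratio:.4f}")
```

Since the offset ratio is below -1, competitor losses more than wipe out the focal gain in this simulation.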

14. Identify When Promotion Is Most Risky

This cell groups promoted slates by focal seed position and focal cluster. The goal is to see whether net losses are larger when the focal item starts deeper in the slate or when certain clusters are promoted.

promoted_component_slate = component_slate.query("promotion_applied == 1").copy()
promoted_component_slate["focal_position_bucket"] = pd.cut(
    promoted_component_slate["focal_seed_position"],
    bins=[4, 6, 9, 12],
    labels=["positions_5_6", "positions_7_9", "positions_10_12"],
    include_lowest=True,
)

position_risk = (
    promoted_component_slate.groupby("focal_position_bucket", observed=True)
    .agg(
        promoted_slates=("slate_id", "size"),
        mean_direct_lift=("known_probability_lift_direct_focal", "mean"),
        mean_same_cluster_lift=("known_probability_lift_same_cluster_competitor", "mean"),
        mean_other_lift=("known_probability_lift_other_competitor", "mean"),
        mean_total_lift=("total_known_probability_lift", "mean"),
        observed_total_clicks=("total_simulated_clicks", "mean"),
    )
    .reset_index()
)

cluster_risk = (
    promoted_component_slate.groupby("focal_spillover_cluster")
    .agg(
        promoted_slates=("slate_id", "size"),
        mean_direct_lift=("known_probability_lift_direct_focal", "mean"),
        mean_same_cluster_lift=("known_probability_lift_same_cluster_competitor", "mean"),
        mean_other_lift=("known_probability_lift_other_competitor", "mean"),
        mean_total_lift=("total_known_probability_lift", "mean"),
    )
    .reset_index()
    .query("promoted_slates >= 40")
    .sort_values("mean_total_lift")
)

display(position_risk)
display(cluster_risk.head(12))
focal_position_bucket promoted_slates mean_direct_lift mean_same_cluster_lift mean_other_lift mean_total_lift observed_total_clicks
0 positions_5_6 368 0.1781 -0.1925 -0.2558 -0.2702 2.2582
1 positions_7_9 589 0.1826 -0.1923 -0.3007 -0.3104 2.3022
2 positions_10_12 548 0.1769 -0.2037 -0.3316 -0.3584 2.1040
focal_spillover_cluster promoted_slates mean_direct_lift mean_same_cluster_lift mean_other_lift mean_total_lift
1 Action 401 0.1819 -0.2928 -0.2525 -0.3633
5 Comedy 338 0.1772 -0.2177 -0.2887 -0.3292
8 Drama 317 0.1742 -0.2115 -0.2831 -0.3204
2 Adventure 168 0.1853 -0.1132 -0.3532 -0.2811
6 Crime 130 0.1872 -0.0902 -0.3723 -0.2753

These risk summaries are exploratory, but they give useful product intuition. Deeper promotions can create larger reordering changes, and some content clusters may have more same-cluster competitors to displace.
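As a quick sanity check on the depth pattern, the reported bucket means can be compared directly; the values below are copied from the `position_risk` output above.

```python
# Reported mean total known-probability lift per promoted slate, by focal
# starting position bucket (from the position_risk table).
bucket_lift = {
    "positions_5_6": -0.2702,
    "positions_7_9": -0.3104,
    "positions_10_12": -0.3584,
}

# If deeper promotions are riskier, net lift should worsen monotonically
# as the focal item starts lower in the slate.
values = list(bucket_lift.values())
strictly_worsening = all(a > b for a, b in zip(values, values[1:]))
print(f"net lift worsens with depth: {strictly_worsening}")
```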

15. Plot Net Lift by Focal Position Bucket

This plot shows how known net slate lift changes depending on where the promoted focal item started. It helps connect the causal result back to ranking mechanics.

position_plot = position_risk.melt(
    id_vars=["focal_position_bucket", "promoted_slates"],
    value_vars=["mean_direct_lift", "mean_same_cluster_lift", "mean_other_lift", "mean_total_lift"],
    var_name="component",
    value_name="mean_lift",
)
position_plot["component"] = position_plot["component"].map(
    {
        "mean_direct_lift": "Direct focal",
        "mean_same_cluster_lift": "Same-cluster competitors",
        "mean_other_lift": "Other competitors",
        "mean_total_lift": "Net total",
    }
)

fig, ax = plt.subplots(figsize=(11, 5.5))
sns.barplot(data=position_plot, x="focal_position_bucket", y="mean_lift", hue="component", ax=ax)
ax.axhline(0, color="black", linewidth=1)
ax.set_title("Known Lift Components by Focal Starting Position")
ax.set_xlabel("Focal starting position bucket")
ax.set_ylabel("Mean probability lift per promoted slate")
ax.legend(title="")
plt.tight_layout()
fig.savefig(FIGURE_DIR / "19_position_bucket_decomposition.png", dpi=160, bbox_inches="tight")
plt.show()

The position-bucket view shows whether the harm is worse when the promoted item starts deeper. If deeper promotions create larger competitor losses, then promotion policies should consider the cost of jumping over many high-ranked items.

16. Build Final Decomposition Statements

This cell creates a compact recommendation table. It translates the technical decomposition into decision-oriented statements that can feed the final report notebook.

net_sign = "positive" if total_effect_1000 > 0 else "negative"
offset_text = "more than offsets" if abs(competitor_offset_1000) > abs(direct_gain_1000) else "partially offsets"

recommendation_table = pd.DataFrame(
    [
        {
            "decision_area": "Item-level reporting",
            "recommendation": "Do not report promoted-item gain alone as the product impact.",
            "evidence": f"The focal item gains {direct_gain_1000:,.1f} clicks per 1,000 promoted slates, but competitor changes sum to {competitor_offset_1000:,.1f}.",
        },
        {
            "decision_area": "Slate-level metric",
            "recommendation": "Use total slate effect as the primary decision metric under interference.",
            "evidence": f"The net observed slate effect is {total_effect_1000:,.1f} simulated clicks per 1,000 promoted slates, which is {net_sign} in this simulation.",
        },
        {
            "decision_area": "Spillover monitoring",
            "recommendation": "Track same-cluster and displaced-item outcomes whenever an item is promoted.",
            "evidence": f"Competitor movement {offset_text} the direct focal gain in the decomposition.",
        },
        {
            "decision_area": "Policy design",
            "recommendation": "Treat large rank jumps as higher-risk interventions unless slate-level value is measured.",
            "evidence": "Promotion changes final positions for multiple items, so the intervention reallocates scarce attention rather than adding attention for free.",
        },
    ]
)

display(recommendation_table)
decision_area recommendation evidence
0 Item-level reporting Do not report promoted-item gain alone as the ... The focal item gains 171.6 clicks per 1,000 pr...
1 Slate-level metric Use total slate effect as the primary decision... The net observed slate effect is -307.8 simula...
2 Spillover monitoring Track same-cluster and displaced-item outcomes... Competitor movement more than offsets the dire...
3 Policy design Treat large rank jumps as higher-risk interven... Promotion changes final positions for multiple...

The recommendation table is intentionally direct. A portfolio reader should be able to see the methodological point and the product implication: under interference, item gains are not enough; slate-level net value is the safer decision target.

17. Save Decomposition Outputs

This cell saves the component slate table, decomposition estimates, additivity checks, spillover sensitivity table, product summary, risk summaries, and recommendation table. The final report notebook can use these files directly.

COMPONENT_SLATE_OUTPUT = PROCESSED_DIR / "movielens_interference_component_slate.parquet"
COMPONENT_EFFECTS_OUTPUT = PROCESSED_DIR / "movielens_interference_direct_indirect_total_effects.csv"
ADDITIVITY_OUTPUT = PROCESSED_DIR / "movielens_interference_additivity_checks.csv"
SPILLOVER_SENSITIVITY_OUTPUT = PROCESSED_DIR / "movielens_interference_spillover_definition_sensitivity.csv"
PRODUCT_SUMMARY_OUTPUT = PROCESSED_DIR / "movielens_interference_product_summary.csv"
POSITION_RISK_OUTPUT = PROCESSED_DIR / "movielens_interference_position_risk.csv"
CLUSTER_RISK_OUTPUT = PROCESSED_DIR / "movielens_interference_cluster_risk.csv"
RECOMMENDATION_OUTPUT = PROCESSED_DIR / "movielens_interference_decomposition_recommendations.csv"

component_slate.to_parquet(COMPONENT_SLATE_OUTPUT, index=False)
component_effects.to_csv(COMPONENT_EFFECTS_OUTPUT, index=False)
additivity_checks.to_csv(ADDITIVITY_OUTPUT, index=False)
spillover_sensitivity.to_csv(SPILLOVER_SENSITIVITY_OUTPUT, index=False)
product_summary.to_csv(PRODUCT_SUMMARY_OUTPUT, index=False)
position_risk.to_csv(POSITION_RISK_OUTPUT, index=False)
cluster_risk.to_csv(CLUSTER_RISK_OUTPUT, index=False)
recommendation_table.to_csv(RECOMMENDATION_OUTPUT, index=False)

saved_outputs = pd.DataFrame(
    {
        "artifact": [
            "component_slate_table",
            "direct_indirect_total_effects",
            "additivity_checks",
            "spillover_definition_sensitivity",
            "product_summary",
            "position_risk",
            "cluster_risk",
            "decomposition_recommendations",
        ],
        "path": [
            str(COMPONENT_SLATE_OUTPUT),
            str(COMPONENT_EFFECTS_OUTPUT),
            str(ADDITIVITY_OUTPUT),
            str(SPILLOVER_SENSITIVITY_OUTPUT),
            str(PRODUCT_SUMMARY_OUTPUT),
            str(POSITION_RISK_OUTPUT),
            str(CLUSTER_RISK_OUTPUT),
            str(RECOMMENDATION_OUTPUT),
        ],
    }
)

display(saved_outputs)
artifact path
0 component_slate_table /home/apex/Documents/ranking_sys/data/processe...
1 direct_indirect_total_effects /home/apex/Documents/ranking_sys/data/processe...
2 additivity_checks /home/apex/Documents/ranking_sys/data/processe...
3 spillover_definition_sensitivity /home/apex/Documents/ranking_sys/data/processe...
4 product_summary /home/apex/Documents/ranking_sys/data/processe...
5 position_risk /home/apex/Documents/ranking_sys/data/processe...
6 cluster_risk /home/apex/Documents/ranking_sys/data/processe...
7 decomposition_recommendations /home/apex/Documents/ranking_sys/data/processe...

The saved outputs are the handoff to the final notebook. The most important files are the direct-indirect-total effect table, the spillover definition sensitivity table, and the recommendation table.
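As a sketch of that handoff, the final notebook could reload the key artifacts with a small helper. The file names match the outputs saved above; `load_artifacts` is a hypothetical convenience wrapper, not part of the project code, and it skips any file that has not been generated yet.

```python
from pathlib import Path

import pandas as pd

# File names mirror the outputs written in Section 17; only the three most
# important CSVs for the final report are listed here.
ARTIFACTS = {
    "component_effects": "movielens_interference_direct_indirect_total_effects.csv",
    "spillover_sensitivity": "movielens_interference_spillover_definition_sensitivity.csv",
    "recommendations": "movielens_interference_decomposition_recommendations.csv",
}


def load_artifacts(processed_dir: Path) -> dict[str, pd.DataFrame]:
    """Read each saved CSV from processed_dir, skipping any missing file."""
    return {
        name: pd.read_csv(processed_dir / filename)
        for name, filename in ARTIFACTS.items()
        if (processed_dir / filename).exists()
    }
```

In the final notebook this would be called as `load_artifacts(PROCESSED_DIR)`, with `PROCESSED_DIR` resolved the same way as in Section 1.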

18. Notebook Takeaways

This notebook formalized the interference story:

  • The direct focal gain is positive, but it is not the whole product effect.
  • Same-cluster and other competitor losses can more than offset the focal gain.
  • The component estimates add up exactly to the total slate estimate when accounting is done at the slate level.
  • Spillover conclusions are robust across several exposure definitions in this simulation.
  • The safest decision metric under item competition is the total slate effect, not the promoted item’s isolated outcome.

The next notebook should close the workflow with sensitivity checks and a final report: summarize the assumptions, compare all major figures and tables, state limitations of the MovieLens simulation, and package portfolio-ready artifacts.