Spillover Exposure Mapping

This notebook turns the MovieLens seed slates from the setup notebook into a causal simulation dataset for studying interference and spillovers.

The key idea is simple: recommendation items in the same slate compete for limited attention. If one lower-ranked movie is promoted to the top of a slate, the promoted movie may gain visibility, but other movies can lose visibility or attention. The strongest spillover should usually fall on nearby substitute items, such as movies in the same genre cluster.

This notebook does not estimate causal effects yet. Instead, it defines the experimental structure that later estimators will use:

the item-row unit of analysis,
the slate-level randomized promotion assignment,
direct treatment exposure for promoted focal movies,
spillover exposure for non-promoted movies in the same slate,
stronger same-cluster spillover exposure for substitute movies,
simulated post-promotion outcomes based on relevance, visibility, and competition.

The result is an analysis-ready table with known assignment probabilities. That matters because the later notebooks can estimate direct, indirect, and total effects while clearly explaining what is randomized and what is simulated.

1. Environment and Paths

This cell imports the libraries used in the notebook and finds the repository root by searching upward for the processed MovieLens files. This makes the notebook work whether it is run from the repository root, from JupyterLab, or through nbconvert.

from pathlib import Path

import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import seaborn as sns
from IPython.display import display

sns.set_theme(style="whitegrid", context="notebook")
pd.set_option("display.max_columns", 100)
pd.set_option("display.max_rows", 80)
pd.set_option("display.float_format", lambda value: f"{value:,.4f}")

candidate_roots = [Path.cwd(), *Path.cwd().parents]
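PROJECT_DIR = next(
    (
        root for root in candidate_roots
        if (root / "data" / "processed" / "movielens_interference_slate_seed.parquet").exists()
    ),
    None,
)
if PROJECT_DIR is None:
    # Fail with a readable message instead of a bare StopIteration.
    raise FileNotFoundError(
        "Could not find data/processed/movielens_interference_slate_seed.parquet "
        "in the current directory or any parent; run the setup notebook first."
    )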

PROCESSED_DIR = PROJECT_DIR / "data" / "processed"
NOTEBOOK_DIR = PROJECT_DIR / "notebooks" / "interference_spillover_effects"
WRITEUP_DIR = NOTEBOOK_DIR / "writeup"
FIGURE_DIR = WRITEUP_DIR / "figures"
TABLE_DIR = WRITEUP_DIR / "tables"
FIGURE_DIR.mkdir(parents=True, exist_ok=True)
TABLE_DIR.mkdir(parents=True, exist_ok=True)

SLATE_SEED_PATH = PROCESSED_DIR / "movielens_interference_slate_seed.parquet"
ITEMS_PATH = PROCESSED_DIR / "movielens_interference_items.parquet"
USERS_PATH = PROCESSED_DIR / "movielens_interference_user_features.parquet"

SLATE_SEED_PATH.exists(), ITEMS_PATH.exists(), USERS_PATH.exists()

(True, True, True)

All three checks should return True. These are the processed outputs from the setup notebook: seed slates, item features, and user features. This notebook uses those files as fixed inputs so the exposure mapping is reproducible.

2. Load the Seed Slates and Feature Tables

The seed slate table contains one row per user-slate-movie candidate. It already has a seed position, observed relevance from the user’s rating, and a genre-based spillover cluster. This cell loads the seed table and attaches user and item features that will be useful for balance checks and outcome simulation.

slate_seed = pd.read_parquet(SLATE_SEED_PATH)
items = pd.read_parquet(ITEMS_PATH)
users = pd.read_parquet(USERS_PATH)

item_feature_cols = [
    "movieId",
    "sample_rating_count",
    "sample_mean_rating",
    "sample_liked_rate",
    "popularity_bucket",
]
user_feature_cols = [
    "userId",
    "n_ratings",
    "mean_rating",
    "liked_rate",
    "active_years",
    "unique_primary_genres",
    "activity_span_days",
]

slate_base = (
    slate_seed.merge(items[item_feature_cols], on="movieId", how="left")
    .merge(users[user_feature_cols], on="userId", how="left", suffixes=("", "_user"))
)

load_summary = pd.DataFrame(
    {
        "metric": [
            "slate_item_rows",
            "unique_slates",
            "unique_users",
            "unique_movies",
            "slate_size_min",
            "slate_size_max",
            "spillover_clusters",
        ],
        "value": [
            len(slate_base),
            slate_base["slate_id"].nunique(),
            slate_base["userId"].nunique(),
            slate_base["movieId"].nunique(),
            slate_base.groupby("slate_id").size().min(),
            slate_base.groupby("slate_id").size().max(),
            slate_base["spillover_cluster"].nunique(),
        ],
    }
)

display(load_summary)
display(slate_base.head())

	metric	value
0	slate_item_rows	36000
1	unique_slates	3000
2	unique_users	3000
3	unique_movies	5668
4	slate_size_min	12
5	slate_size_max	12
6	spillover_clusters	19

	slate_id	userId	movieId	title	genres	primary_genre	spillover_cluster	slate_position_seed	observed_relevance	high_relevance	rating_datetime	rating_year	sample_rating_count	sample_mean_rating	sample_liked_rate	popularity_bucket	n_ratings	mean_rating	liked_rate	active_years	unique_primary_genres	activity_span_days
0	user_50_seed	50	4027	O Brother, Where Art Thou? (2000)	Adventure|Comedy|Crime	Adventure	Adventure	1	5.0000	1	2009-12-31 06:58:12	2009	581	3.8563	0.6644	very_high	118	4.2203	0.8051	1	11	1
1	user_50_seed	50	1196	Star Wars: Episode V - The Empire Strikes Back...	Action|Adventure|Sci-Fi	Action	Action	2	5.0000	1	2009-12-29 09:12:16	2009	1446	4.0992	0.7510	very_high	118	4.2203	0.8051	1	11	1
2	user_50_seed	50	47	Seven (a.k.a. Se7en) (1995)	Mystery|Thriller	Mystery	Mystery	3	5.0000	1	2009-12-29 09:11:39	2009	1247	4.0545	0.7257	very_high	118	4.2203	0.8051	1	11	1
3	user_50_seed	50	52435	How the Grinch Stole Christmas! (1966)	Animation|Comedy|Fantasy|Musical	Animation	Animation	4	5.0000	1	2009-12-29 09:10:49	2009	43	3.9070	0.6512	very_high	118	4.2203	0.8051	1	11	1
4	user_50_seed	50	1214	Alien (1979)	Horror|Sci-Fi	Horror	Horror	5	5.0000	1	2009-12-29 09:09:28	2009	832	4.1088	0.7404	very_high	118	4.2203	0.8051	1	11	1

The loaded table should still have complete slates of equal size. Equal slate size keeps the simulation easy to explain: each promotion happens inside a 12-item candidate slate, and every slate has the same amount of attention to allocate.
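A completeness check along these lines can make the equal-slate assumption explicit before any assignment happens. This is a minimal sketch on a toy table: the helper name incomplete_slates and the toy rows are illustrative and not part of the notebook; only the slate_id and slate_position_seed column names come from the loaded table.

```python
import pandas as pd


def incomplete_slates(df, slate_col="slate_id", position_col="slate_position_seed", slate_size=12):
    """Return slate ids that do not hold positions 1..slate_size exactly once."""
    stats = df.groupby(slate_col)[position_col].agg(["size", "nunique", "min", "max"])
    ok = (
        (stats["size"] == slate_size)
        & (stats["nunique"] == slate_size)
        & (stats["min"] == 1)
        & (stats["max"] == slate_size)
    )
    return stats.loc[~ok].index.tolist()


# Toy data (hypothetical): one complete 3-item slate and one with a duplicated position.
toy = pd.DataFrame(
    {
        "slate_id": ["good"] * 3 + ["bad"] * 3,
        "slate_position_seed": [1, 2, 3, 1, 1, 3],
    }
)
print(incomplete_slates(toy, slate_size=3))
```

On the real table the same call with slate_size=12 should return an empty list if every slate is complete.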

3. Define the Randomized Promotion Design

This cell defines the randomized intervention. In each slate, one focal movie is selected from the lower-ranked positions, then the slate is randomized to either promote that focal movie or leave the slate unchanged.

The design has two stages:

Focal selection: choose one eligible lower-position movie from each slate. This creates a candidate item that could be promoted.
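Promotion assignment: draw a promotion flag independently for each slate with probability 0.5. If the slate is assigned to promotion, the focal movie moves to position 1 and every item originally ranked above it shifts down by one position; items below the focal movie keep their positions.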

This design creates a clean comparison between promoted and non-promoted focal movies while also generating spillover exposure for the other items in promoted slates.

RANDOM_SEED = 20260428
PROMOTION_PROBABILITY = 0.50
MIN_PROMOTABLE_POSITION = 5

rng = np.random.default_rng(RANDOM_SEED)

promotable_pool = slate_base.loc[
    slate_base["slate_position_seed"] >= MIN_PROMOTABLE_POSITION,
    [
        "slate_id",
        "userId",
        "movieId",
        "title",
        "spillover_cluster",
        "slate_position_seed",
        "observed_relevance",
        "high_relevance",
    ],
].copy()

focal_items = (
    promotable_pool.groupby("slate_id", group_keys=False)
    .sample(n=1, random_state=RANDOM_SEED)
    .rename(
        columns={
            "movieId": "focal_movieId",
            "title": "focal_title",
            "spillover_cluster": "focal_spillover_cluster",
            "slate_position_seed": "focal_seed_position",
            "observed_relevance": "focal_observed_relevance",
            "high_relevance": "focal_high_relevance",
        }
    )
)

focal_items = focal_items.sort_values("slate_id").reset_index(drop=True)
focal_items["promotion_probability"] = PROMOTION_PROBABILITY
focal_items["promotion_applied"] = rng.binomial(
    n=1,
    p=PROMOTION_PROBABILITY,
    size=len(focal_items),
).astype("int8")
focal_items["assignment_arm"] = np.where(
    focal_items["promotion_applied"] == 1,
    "promote_focal_item",
    "leave_slate_unchanged",
)

assignment_summary = pd.DataFrame(
    {
        "metric": [
            "random_seed",
            "promotion_probability",
            "promotable_positions_start_at",
            "slates_randomized",
            "promoted_slates",
            "control_slates",
            "observed_promotion_rate",
            "mean_focal_seed_position",
        ],
        "value": [
            RANDOM_SEED,
            PROMOTION_PROBABILITY,
            MIN_PROMOTABLE_POSITION,
            len(focal_items),
            int(focal_items["promotion_applied"].sum()),
            int((1 - focal_items["promotion_applied"]).sum()),
            focal_items["promotion_applied"].mean(),
            focal_items["focal_seed_position"].mean(),
        ],
    }
)

display(assignment_summary)
display(focal_items.head())

	metric	value
0	random_seed	20,260,428.0000
1	promotion_probability	0.5000
2	promotable_positions_start_at	5.0000
3	slates_randomized	3,000.0000
4	promoted_slates	1,505.0000
5	control_slates	1,495.0000
6	observed_promotion_rate	0.5017
7	mean_focal_seed_position	8.4700

	slate_id	userId	focal_movieId	focal_title	focal_spillover_cluster	focal_seed_position	focal_observed_relevance	focal_high_relevance	promotion_probability	promotion_applied	assignment_arm
0	user_100000_seed	100000	88810	Help, The (2011)	Drama	8	4.0000	1	0.5000	1	promote_focal_item
1	user_10000_seed	10000	2005	Goonies, The (1985)	Action	12	4.0000	1	0.5000	0	leave_slate_unchanged
2	user_100050_seed	100050	5459	Men in Black II (a.k.a. MIIB) (a.k.a. MIB 2) (...	Action	5	5.0000	1	0.5000	1	promote_focal_item
3	user_100100_seed	100100	4299	Knight's Tale, A (2001)	Action	5	5.0000	1	0.5000	0	leave_slate_unchanged
4	user_100200_seed	100200	5445	Minority Report (2002)	Action	11	5.0000	1	0.5000	0	leave_slate_unchanged

The observed promotion rate should be close to 50 percent. The focal items come from lower positions, which makes the intervention meaningful: moving a movie from position 5 or below into the top position creates a visible change and forces other items to shift.
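The known assignment probabilities implied by this two-stage design can be worked out by hand. The sketch below assumes that every slate has exactly 12 positions and that the focal item is drawn uniformly from positions 5 through 12, which is what an unweighted one-row sample per slate does; the variable names p_focal and p_direct are illustrative.

```python
import numpy as np

SLATE_SIZE = 12                 # from the load summary
MIN_PROMOTABLE_POSITION = 5     # same constant as the design cell
PROMOTION_PROBABILITY = 0.50

# Positions eligible to be chosen as the focal item: 5, 6, ..., 12.
eligible_positions = np.arange(MIN_PROMOTABLE_POSITION, SLATE_SIZE + 1)

# Stage 1: uniform draw of one focal item among the eligible rows (assumption).
p_focal = 1 / len(eligible_positions)

# Stage 2: independent coin flip, so the two stages multiply.
p_direct = p_focal * PROMOTION_PROBABILITY

# Expected focal seed position under uniform selection.
expected_mean_focal_position = eligible_positions.mean()

print(p_focal, p_direct, expected_mean_focal_position)
```

Under these assumptions each eligible item has a 1-in-8 chance of being focal and a 1-in-16 chance of direct treatment, and the expected focal seed position is 8.5, which is consistent with the observed mean of 8.47 in the assignment summary.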

4. Check Randomization Balance for Focal Items

Because promotion is randomized after focal selection, promoted and non-promoted focal items should look similar before treatment. This cell compares focal relevance, focal position, and user/item features across the two assignment arms. Large differences would suggest a coding problem rather than a real causal pattern.

focal_balance = (
    focal_items.merge(
        slate_base[
            [
                "slate_id",
                "movieId",
                "sample_rating_count",
                "sample_mean_rating",
                "sample_liked_rate",
                "n_ratings",
                "mean_rating",
                "liked_rate",
                "unique_primary_genres",
            ]
        ],
        left_on=["slate_id", "focal_movieId"],
        right_on=["slate_id", "movieId"],
        how="left",
    )
)

balance_vars = [
    "focal_seed_position",
    "focal_observed_relevance",
    "focal_high_relevance",
    "sample_rating_count",
    "sample_mean_rating",
    "sample_liked_rate",
    "n_ratings",
    "mean_rating",
    "liked_rate",
    "unique_primary_genres",
]

balance_table = (
    focal_balance.groupby("assignment_arm")[balance_vars]
    .mean()
    .T
    .reset_index()
    .rename(columns={"index": "feature"})
)
balance_table["promoted_minus_control"] = (
    balance_table["promote_focal_item"] - balance_table["leave_slate_unchanged"]
)

display(balance_table)

assignment_arm	feature	leave_slate_unchanged	promote_focal_item	promoted_minus_control
0	focal_seed_position	8.4468	8.4930	0.0462
1	focal_observed_relevance	4.4893	4.4897	0.0004
2	focal_high_relevance	0.9391	0.9522	0.0130
3	sample_rating_count	433.3940	441.0691	7.6751
4	sample_mean_rating	3.7199	3.7231	0.0031
5	sample_liked_rate	0.5765	0.5761	-0.0004
6	n_ratings	193.8522	200.0080	6.1558
7	mean_rating	3.7015	3.6814	-0.0200
8	liked_rate	0.5726	0.5698	-0.0028
9	unique_primary_genres	10.0749	10.2525	0.1776

The balance table is a pre-estimation diagnostic. Since assignment is random, small differences are expected by chance, but systematic differences should be limited. Later effect estimates can therefore lean on the randomized design rather than heavy covariate adjustment.
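Raw mean differences are hard to compare across features with different scales, so a standardized mean difference can be a useful companion diagnostic. The following is a minimal sketch rather than part of the notebook: the function name and the toy rows are hypothetical, and the pooled standard deviation shown here is one common convention among several.

```python
import numpy as np
import pandas as pd


def standardized_mean_differences(df, arm_col, treated_label, control_label, features):
    """Return (treated mean - control mean) / pooled SD for each feature."""
    treated = df.loc[df[arm_col] == treated_label, features]
    control = df.loc[df[arm_col] == control_label, features]
    pooled_sd = np.sqrt((treated.var(ddof=1) + control.var(ddof=1)) / 2)
    smd = (treated.mean() - control.mean()) / pooled_sd
    return smd.rename("smd").reset_index().rename(columns={"index": "feature"})


# Toy data (hypothetical values) using the notebook's arm labels.
toy = pd.DataFrame(
    {
        "assignment_arm": ["promote_focal_item"] * 3 + ["leave_slate_unchanged"] * 3,
        "focal_seed_position": [5.0, 8.0, 11.0, 6.0, 8.0, 10.0],
        "focal_observed_relevance": [3.0, 4.0, 5.0, 4.0, 4.5, 5.0],
    }
)
smd_table = standardized_mean_differences(
    toy,
    arm_col="assignment_arm",
    treated_label="promote_focal_item",
    control_label="leave_slate_unchanged",
    features=["focal_seed_position", "focal_observed_relevance"],
)
print(smd_table)
```

Applied to focal_balance with balance_vars, values near zero would indicate that the two arms are comparable on that feature.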

5. Visualize Focal Position Assignment

The focal item is always chosen from lower seed positions. This plot shows whether focal selection is spread across eligible positions or concentrated in one area of the slate. A spread is useful because it gives the promoted arm a range of position gains, from four places (position 5 to 1) up to eleven places (position 12 to 1).

fig, ax = plt.subplots(figsize=(9, 4.5))
sns.countplot(data=focal_items, x="focal_seed_position", hue="assignment_arm", ax=ax)
ax.set_title("Focal Item Seed Positions by Assignment Arm")
ax.set_xlabel("Seed position before promotion")
ax.set_ylabel("Number of slates")
plt.tight_layout()
fig.savefig(FIGURE_DIR / "06_focal_position_assignment.png", dpi=160, bbox_inches="tight")
plt.show()

The assignment arms should have similar focal-position distributions. This matters because position gain is the mechanism of the direct effect. If one arm had much deeper focal items, promoted and control slates would not be comparable in a clean way.

6. Map Direct and Spillover Exposures

This cell joins the slate-level assignment back to every item row and creates the core exposure variables.

Key variables:

direct_treatment: the row is the focal item and its slate was promoted.
same_slate_spillover: another item in the same slate was promoted.
same_cluster_spillover: a non-focal item shares the promoted focal item’s spillover cluster.
displaced_by_promotion: a non-focal item was above the focal item and shifted down after promotion.
final_position: the row’s post-assignment position.
visibility_gain: change in a simple visibility score after promotion.

This is the central mapping step of the notebook.

assignment_cols = [
    "slate_id",
    "focal_movieId",
    "focal_title",
    "focal_spillover_cluster",
    "focal_seed_position",
    "promotion_probability",
    "promotion_applied",
    "assignment_arm",
]

exposure = slate_base.merge(focal_items[assignment_cols], on="slate_id", how="left")
exposure["is_focal_item"] = (exposure["movieId"] == exposure["focal_movieId"]).astype("int8")
exposure["direct_treatment"] = (
    (exposure["is_focal_item"] == 1) & (exposure["promotion_applied"] == 1)
).astype("int8")
exposure["same_slate_spillover"] = (
    (exposure["is_focal_item"] == 0) & (exposure["promotion_applied"] == 1)
).astype("int8")
exposure["same_cluster_spillover"] = (
    (exposure["same_slate_spillover"] == 1)
    & (exposure["spillover_cluster"] == exposure["focal_spillover_cluster"])
).astype("int8")
exposure["displaced_by_promotion"] = (
    (exposure["same_slate_spillover"] == 1)
    & (exposure["slate_position_seed"] < exposure["focal_seed_position"])
).astype("int8")

exposure["final_position"] = exposure["slate_position_seed"]
promoted_mask = exposure["promotion_applied"] == 1
focal_promoted_mask = promoted_mask & (exposure["is_focal_item"] == 1)
shifted_mask = promoted_mask & (exposure["is_focal_item"] == 0) & (
    exposure["slate_position_seed"] < exposure["focal_seed_position"]
)
exposure.loc[focal_promoted_mask, "final_position"] = 1
exposure.loc[shifted_mask, "final_position"] = exposure.loc[shifted_mask, "slate_position_seed"] + 1

exposure["baseline_visibility"] = 1 / np.log2(exposure["slate_position_seed"] + 1)
exposure["final_visibility"] = 1 / np.log2(exposure["final_position"] + 1)
exposure["visibility_gain"] = exposure["final_visibility"] - exposure["baseline_visibility"]
exposure["position_change"] = exposure["final_position"] - exposure["slate_position_seed"]

exposure["exposure_group"] = np.select(
    [
        exposure["direct_treatment"] == 1,
        (exposure["is_focal_item"] == 1) & (exposure["promotion_applied"] == 0),
        exposure["same_cluster_spillover"] == 1,
        exposure["same_slate_spillover"] == 1,
    ],
    [
        "direct_promoted",
        "focal_control",
        "same_cluster_spillover",
        "other_slate_spillover",
    ],
    default="unchanged_non_focal",
)

position_validity = (
    exposure.groupby("slate_id")["final_position"]
    .agg(unique_positions="nunique", min_position="min", max_position="max")
    .reset_index()
)
invalid_position_slates = position_validity.query(
    "unique_positions != 12 or min_position != 1 or max_position != 12"
)

exposure_summary = (
    exposure.groupby("exposure_group")
    .agg(
        rows=("movieId", "size"),
        slates=("slate_id", "nunique"),
        mean_seed_position=("slate_position_seed", "mean"),
        mean_final_position=("final_position", "mean"),
        mean_visibility_gain=("visibility_gain", "mean"),
        mean_relevance=("observed_relevance", "mean"),
    )
    .reset_index()
    .sort_values("rows", ascending=False)
)

display(exposure_summary)
print(f"Invalid final-position slates: {len(invalid_position_slates)}")

	exposure_group	rows	slates	mean_seed_position	mean_final_position	mean_visibility_gain	mean_relevance
4	unchanged_non_focal	16445	1495	6.3230	6.3230	0.0000	4.6058
2	other_slate_spillover	12246	1500	6.3039	6.9849	-0.0622	4.6084
3	same_cluster_spillover	4309	1245	6.3613	7.0429	-0.0619	4.6239
0	direct_promoted	1505	1505	8.4930	1.0000	0.6837	4.4897
1	focal_control	1495	1495	8.4468	8.4468	0.0000	4.4893

Invalid final-position slates: 0

The exposure groups make the interference structure explicit. The promoted focal item is the direct-treatment unit. Non-focal items in promoted slates are spillover-exposed, and same-cluster non-focal items are the most important substitute group. The final-position validity check confirms that each slate still has positions 1 through 12 after the simulated promotion.
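The position shift and visibility score are easiest to follow on a single small slate. This sketch applies the same rules to a hypothetical 6-item slate whose focal item sits at seed position 4; the item labels and the slate size are illustrative, while the visibility formula 1 / log2(position + 1) is the one used in the mapping cell.

```python
import numpy as np
import pandas as pd

# Toy slate (hypothetical): six items in seed order, focal item at position 4.
slate = pd.DataFrame({"item": list("ABCDEF"), "seed_position": np.arange(1, 7)})
focal_seed_position = 4

is_focal = slate["seed_position"] == focal_seed_position
above_focal = slate["seed_position"] < focal_seed_position

# Promotion: focal item goes to position 1, items above it move down one slot,
# items below it keep their positions.
slate["final_position"] = slate["seed_position"]
slate.loc[is_focal, "final_position"] = 1
slate.loc[above_focal, "final_position"] = slate.loc[above_focal, "seed_position"] + 1

# Visibility score before and after the promotion.
slate["visibility_gain"] = (
    1 / np.log2(slate["final_position"] + 1) - 1 / np.log2(slate["seed_position"] + 1)
)
print(slate)
```

The focal item gains visibility, the three items above it lose some, and the two items below it are unchanged, so the slate still occupies positions 1 through 6 exactly once.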

7. Summarize Exposure Group Shares

This cell converts exposure counts into shares. The counts are useful, but shares make it easier to see how much data is available for each causal contrast. Same-cluster spillover rows are especially important because they represent plausible substitute displacement.

exposure_group_shares = exposure_summary.copy()
exposure_group_shares["row_share"] = exposure_group_shares["rows"] / exposure_group_shares["rows"].sum()
exposure_group_shares["slate_share"] = exposure_group_shares["slates"] / exposure["slate_id"].nunique()

display(exposure_group_shares)

	exposure_group	rows	slates	mean_seed_position	mean_final_position	mean_visibility_gain	mean_relevance	row_share	slate_share
4	unchanged_non_focal	16445	1495	6.3230	6.3230	0.0000	4.6058	0.4568	0.4983
2	other_slate_spillover	12246	1500	6.3039	6.9849	-0.0622	4.6084	0.3402	0.5000
3	same_cluster_spillover	4309	1245	6.3613	7.0429	-0.0619	4.6239	0.1197	0.4150
0	direct_promoted	1505	1505	8.4930	1.0000	0.6837	4.4897	0.0418	0.5017
1	focal_control	1495	1495	8.4468	8.4468	0.0000	4.4893	0.0415	0.4983

The direct-treatment group is small by design because each promoted slate has one promoted focal item. Spillover groups are larger because every other item in a promoted slate can be affected. This asymmetry is exactly why item-level analyses can be misleading if they only count the promoted item’s gain.

8. Plot Position Changes by Exposure Group

Promotion changes the focal item’s position dramatically and shifts some other items down. This plot shows the distribution of position changes. Negative values mean a movie moved upward; positive values mean it moved downward.

plot_order = [
    "direct_promoted",
    "focal_control",
    "same_cluster_spillover",
    "other_slate_spillover",
    "unchanged_non_focal",
]

fig, ax = plt.subplots(figsize=(11, 5))
sns.boxplot(
    data=exposure,
    x="exposure_group",
    y="position_change",
    order=plot_order,
    ax=ax,
    showfliers=False,
)
ax.axhline(0, color="black", linewidth=1)
ax.set_title("Position Change After Simulated Promotion")
ax.set_xlabel("Exposure group")
ax.set_ylabel("Final position minus seed position")
ax.tick_params(axis="x", rotation=25)
plt.tight_layout()
fig.savefig(FIGURE_DIR / "07_position_change_by_exposure_group.png", dpi=160, bbox_inches="tight")
plt.show()

The plot should show a strong upward move for directly promoted focal items and downward movement for displaced non-focal items. This is the mechanical source of interference: promotion does not simply add attention to one item; it reallocates attention within the slate.

9. Same-Cluster Spillover by Genre Cluster

Same-cluster spillover is not evenly distributed across genres. This cell summarizes how often each spillover cluster appears as the promoted focal cluster and how many same-cluster competitors are exposed. This helps identify where later spillover estimates will have enough support.

cluster_exposure = (
    exposure.groupby("focal_spillover_cluster")
    .agg(
        randomized_slates=("slate_id", "nunique"),
        promoted_slates=("promotion_applied", "sum"),
        direct_rows=("direct_treatment", "sum"),
        same_cluster_spillover_rows=("same_cluster_spillover", "sum"),
        same_slate_spillover_rows=("same_slate_spillover", "sum"),
        mean_focal_relevance=("observed_relevance", "mean"),
    )
    .reset_index()
    .rename(columns={"focal_spillover_cluster": "promoted_focal_cluster"})
)
cluster_exposure["same_cluster_share_of_spillover"] = (
    cluster_exposure["same_cluster_spillover_rows"]
    / cluster_exposure["same_slate_spillover_rows"].replace(0, np.nan)
)
cluster_exposure = cluster_exposure.sort_values("same_cluster_spillover_rows", ascending=False)

display(cluster_exposure.head(20))

	promoted_focal_cluster	randomized_slates	promoted_slates	direct_rows	same_cluster_spillover_rows	same_slate_spillover_rows	mean_focal_relevance	same_cluster_share_of_spillover
1	Action	799	4812	401	1692	4411	4.6135	0.3836
5	Comedy	658	4056	338	1092	3718	4.5847	0.2937
8	Drama	636	3804	317	974	3487	4.5962	0.2793
2	Adventure	318	2016	168	282	1848	4.6102	0.1526
6	Crime	253	1560	130	165	1430	4.6332	0.1154
3	Animation	70	384	32	32	352	4.5744	0.0909
11	Horror	65	372	31	29	341	4.4474	0.0850
7	Documentary	42	192	16	20	176	4.6746	0.1136
4	Children	43	240	20	9	220	4.5707	0.0409
13	Mystery	44	180	15	7	165	4.5019	0.0424
9	Fantasy	11	60	5	3	55	4.5720	0.0545
10	Film-Noir	6	48	4	2	44	4.7569	0.0455
16	Thriller	22	132	11	1	121	4.6894	0.0083
17	Western	3	24	2	1	22	4.8333	0.0455
0	(no genres listed)	4	12	1	0	11	4.1875	0.0000
12	Musical	5	36	3	0	33	4.6833	0.0000
15	Sci-Fi	18	108	9	0	99	4.5532	0.0000
14	Romance	3	24	2	0	22	4.7639	0.0000

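Clusters with more same-cluster spillover rows will support more stable indirect-effect estimates. Sparse clusters can still be included in overall estimates, but later segment-level reporting should avoid over-interpreting very small groups. Two columns need care when reading this table: because the aggregation runs over item rows, promoted_slates counts item rows in promoted slates (12 per slate, so the 4812 for Action corresponds to 401 slates), and mean_focal_relevance averages observed_relevance over every item in those slates rather than over focal items only. The direct_rows column gives the actual number of promoted slates per cluster.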
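One simple way to act on this is to flag clusters against a minimum-support threshold before any segment-level reporting. The sketch below is illustrative only: the threshold of 100 rows is an arbitrary assumption, not a value from the notebook, and the toy table copies a handful of counts from the output above.

```python
import pandas as pd

MIN_SPILLOVER_ROWS = 100  # hypothetical threshold; choose to suit the later estimator

# A few rows copied from the cluster exposure table above.
support = pd.DataFrame(
    {
        "promoted_focal_cluster": ["Action", "Comedy", "Crime", "Thriller", "Sci-Fi"],
        "same_cluster_spillover_rows": [1692, 1092, 165, 1, 0],
    }
)

# Flag clusters with enough same-cluster spillover rows to report separately.
support["enough_support"] = support["same_cluster_spillover_rows"] >= MIN_SPILLOVER_ROWS
reportable = support.loc[support["enough_support"], "promoted_focal_cluster"].tolist()
print(reportable)
```

Clusters that fall below the threshold would still contribute to pooled estimates; they would only be excluded from cluster-by-cluster tables.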

10. Plot Same-Cluster Spillover Volume

This plot focuses on the largest promoted focal clusters. It shows where substitute displacement is most observable in the simulated data.

top_cluster_exposure = cluster_exposure.head(12).copy()

fig, ax = plt.subplots(figsize=(10, 5))
sns.barplot(
    data=top_cluster_exposure,
    x="same_cluster_spillover_rows",
    y="promoted_focal_cluster",
    ax=ax,
    color="tab:orange",
)
ax.set_title("Same-Cluster Spillover Rows by Promoted Focal Cluster")
ax.set_xlabel("Same-cluster spillover rows")
ax.set_ylabel("Promoted focal cluster")
plt.tight_layout()
fig.savefig(FIGURE_DIR / "08_same_cluster_spillover_volume.png", dpi=160, bbox_inches="tight")
plt.show()

The largest clusters are the best candidates for detailed spillover analysis. This is also a product-relevant view: substitution is easier to reason about when the promoted item and competing items are in the same content family.

11. Simulate Observed Outcomes Under Competition

The seed ratings tell us user-item relevance, but they are not post-promotion outcomes. This cell creates simulated binary engagement outcomes using a transparent data-generating process.

The simulated click probability depends on:

user-item relevance from the observed rating,
baseline user rating tendency,
item popularity and liked rate,
final visibility after promotion,
a positive direct boost for promoted focal items,
a small generic attention penalty for non-focal items in promoted slates,
an additional, stronger penalty for same-cluster substitutes,
an extra penalty for non-focal items that the promotion pushes down one position.

The exact outcome model is synthetic, but the assumptions are explicit. That is the right setup for this interference notebook because MovieLens does not contain real randomized exposure logs.

def sigmoid(x):
    return 1 / (1 + np.exp(-x))

outcome_rng = np.random.default_rng(RANDOM_SEED + 17)

exposure = exposure.copy()
exposure["log_item_popularity"] = np.log1p(exposure["sample_rating_count"].fillna(0))
log_pop_mean = exposure["log_item_popularity"].mean()
log_pop_std = exposure["log_item_popularity"].std()
exposure["log_item_popularity_z"] = (
    exposure["log_item_popularity"] - log_pop_mean
) / log_pop_std

base_logit = (
    -2.75
    + 0.78 * (exposure["observed_relevance"] - 3.5)
    + 1.15 * exposure["baseline_visibility"]
    + 0.35 * (exposure["liked_rate"] - exposure["liked_rate"].mean())
    + 0.25 * (exposure["sample_liked_rate"].fillna(exposure["sample_liked_rate"].mean()) - exposure["sample_liked_rate"].mean())
    + 0.10 * exposure["log_item_popularity_z"].fillna(0)
)

observed_logit = (
    -2.75
    + 0.78 * (exposure["observed_relevance"] - 3.5)
    + 1.15 * exposure["final_visibility"]
    + 0.35 * (exposure["liked_rate"] - exposure["liked_rate"].mean())
    + 0.25 * (exposure["sample_liked_rate"].fillna(exposure["sample_liked_rate"].mean()) - exposure["sample_liked_rate"].mean())
    + 0.10 * exposure["log_item_popularity_z"].fillna(0)
    + 0.20 * exposure["direct_treatment"]
    - 0.08 * exposure["same_slate_spillover"]
    - 0.24 * exposure["same_cluster_spillover"]
    - 0.10 * exposure["displaced_by_promotion"]
)

exposure["p_no_promotion"] = sigmoid(base_logit).clip(0.01, 0.95)
exposure["p_observed"] = sigmoid(observed_logit).clip(0.01, 0.95)
exposure["known_probability_lift"] = exposure["p_observed"] - exposure["p_no_promotion"]
exposure["simulated_click"] = outcome_rng.binomial(1, exposure["p_observed"]).astype("int8")
exposure["simulated_engagement_score"] = (
    exposure["simulated_click"] * (1 + 0.15 * exposure["observed_relevance"])
).astype("float32")

outcome_summary = (
    exposure.groupby("exposure_group")
    .agg(
        rows=("movieId", "size"),
        mean_probability=("p_observed", "mean"),
        mean_no_promotion_probability=("p_no_promotion", "mean"),
        mean_probability_lift=("known_probability_lift", "mean"),
        simulated_click_rate=("simulated_click", "mean"),
        mean_engagement_score=("simulated_engagement_score", "mean"),
    )
    .reset_index()
    .sort_values("mean_probability_lift", ascending=False)
)

display(outcome_summary)

	exposure_group	rows	mean_probability	mean_no_promotion_probability	mean_probability_lift	simulated_click_rate	mean_engagement_score
0	direct_promoted	1505	0.3556	0.1762	0.1794	0.3435	0.5827
1	focal_control	1495	0.1779	0.1779	0.0000	0.1719	0.2935
4	unchanged_non_focal	16445	0.2151	0.2151	0.0000	0.2141	0.3673
2	other_slate_spillover	12246	0.1784	0.2154	-0.0370	0.1770	0.3033
3	same_cluster_spillover	4309	0.1466	0.2153	-0.0686	0.1522	0.2615

The simulated outcome table should show positive probability lift for directly promoted items and negative lift for both spillover groups, with the larger loss falling on same-cluster substitutes. This is by design: the notebook is creating a controlled environment where later estimators should recover both promoted-item gains and competitor displacement.
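To see how the coefficients translate into lift for a single row, the outcome model can be evaluated by hand. This sketch reuses the coefficients from the simulation cell but, as a simplifying assumption, holds the user and item covariate terms at their means so that they contribute zero; the function name click_probability and the two example items are hypothetical.

```python
import numpy as np


def click_probability(relevance, position, direct=0, same_slate=0, same_cluster=0, displaced=0):
    """Click probability with user and item covariate terms held at their means (assumption)."""
    visibility = 1 / np.log2(position + 1)
    logit = (
        -2.75
        + 0.78 * (relevance - 3.5)
        + 1.15 * visibility
        + 0.20 * direct
        - 0.08 * same_slate
        - 0.24 * same_cluster
        - 0.10 * displaced
    )
    return float(np.clip(1 / (1 + np.exp(-logit)), 0.01, 0.95))


# Hypothetical focal item: rating 4.0, promoted from seed position 8 to position 1.
focal_lift = click_probability(4.0, 1, direct=1) - click_probability(4.0, 8)

# Hypothetical same-cluster substitute: rating 4.0, pushed from position 2 to position 3.
substitute_lift = (
    click_probability(4.0, 3, same_slate=1, same_cluster=1, displaced=1)
    - click_probability(4.0, 2)
)

print(f"focal lift: {focal_lift:+.4f}")
print(f"substitute lift: {substitute_lift:+.4f}")
```

The focal item gains from both the visibility jump and the direct boost, while the substitute loses from lower visibility plus the three stacked penalties, which matches the sign pattern in the outcome summary.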

12. Plot Simulated Outcomes by Exposure Group

This plot compares simulated click rates and known probability lift across exposure groups. It is not a causal estimate yet; it is a sanity check that the simulated data-generating process creates the expected pattern.

outcome_plot_df = outcome_summary.melt(
    id_vars="exposure_group",
    value_vars=["simulated_click_rate", "mean_probability_lift"],
    var_name="metric",
    value_name="value",
)
outcome_plot_df["metric"] = outcome_plot_df["metric"].map(
    {
        "simulated_click_rate": "Observed simulated click rate",
        "mean_probability_lift": "Known probability lift vs no promotion",
    }
)

fig, axes = plt.subplots(1, 2, figsize=(14, 5))
for ax, metric in zip(axes, outcome_plot_df["metric"].unique()):
    sns.barplot(
        data=outcome_plot_df.query("metric == @metric"),
        x="exposure_group",
        y="value",
        order=plot_order,
        ax=ax,
    )
    ax.axhline(0, color="black", linewidth=1)
    ax.set_title(metric)
    ax.set_xlabel("Exposure group")
    ax.set_ylabel("Value")
    ax.tick_params(axis="x", rotation=25)

plt.tight_layout()
fig.savefig(FIGURE_DIR / "09_simulated_outcomes_by_exposure_group.png", dpi=160, bbox_inches="tight")
plt.show()

The left panel reflects both relevance selection and exposure, while the right panel isolates the known probability change induced by the simulation. This distinction is important: raw click-rate differences are not the same as causal effects, even in a randomized simulation, because groups can differ in baseline relevance and position.

13. Build Slate-Level Outcomes

Interference is often best evaluated at the slate level because one item’s gain can be another item’s loss. This cell aggregates item-row outcomes into slate-level totals and separates focal, same-cluster competitor, and other competitor components.

slate_outcomes = (
    exposure.groupby("slate_id")
    .agg(
        userId=("userId", "first"),
        assignment_arm=("assignment_arm", "first"),
        promotion_applied=("promotion_applied", "first"),
        focal_movieId=("focal_movieId", "first"),
        focal_title=("focal_title", "first"),
        focal_spillover_cluster=("focal_spillover_cluster", "first"),
        focal_seed_position=("focal_seed_position", "first"),
        total_simulated_clicks=("simulated_click", "sum"),
        total_expected_clicks=("p_observed", "sum"),
        total_expected_clicks_no_promotion=("p_no_promotion", "sum"),
        total_known_probability_lift=("known_probability_lift", "sum"),
        direct_expected_lift=("known_probability_lift", lambda s: s[exposure.loc[s.index, "direct_treatment"] == 1].sum()),
        same_cluster_spillover_expected_lift=("known_probability_lift", lambda s: s[exposure.loc[s.index, "same_cluster_spillover"] == 1].sum()),
        other_spillover_expected_lift=("known_probability_lift", lambda s: s[(exposure.loc[s.index, "same_slate_spillover"] == 1) & (exposure.loc[s.index, "same_cluster_spillover"] == 0)].sum()),
    )
    .reset_index()
)

slate_outcome_summary = (
    slate_outcomes.groupby("assignment_arm")
    .agg(
        slates=("slate_id", "size"),
        mean_simulated_clicks=("total_simulated_clicks", "mean"),
        mean_expected_clicks=("total_expected_clicks", "mean"),
        mean_expected_clicks_no_promotion=("total_expected_clicks_no_promotion", "mean"),
        mean_total_known_lift=("total_known_probability_lift", "mean"),
        mean_direct_expected_lift=("direct_expected_lift", "mean"),
        mean_same_cluster_spillover_expected_lift=("same_cluster_spillover_expected_lift", "mean"),
        mean_other_spillover_expected_lift=("other_spillover_expected_lift", "mean"),
    )
    .reset_index()
)

display(slate_outcome_summary)
display(slate_outcomes.head())

	assignment_arm	slates	mean_simulated_clicks	mean_expected_clicks	mean_expected_clicks_no_promotion	mean_total_known_lift	mean_direct_expected_lift	mean_same_cluster_spillover_expected_lift	mean_other_spillover_expected_lift
0	leave_slate_unchanged	1495	2.5271	2.5438	2.5438	0.0000	0.0000	0.0000	0.0000
1	promote_focal_item	1505	2.2193	2.2269	2.5450	-0.3180	0.1794	-0.1965	-0.3010

	slate_id	userId	assignment_arm	promotion_applied	focal_movieId	focal_title	focal_spillover_cluster	focal_seed_position	total_simulated_clicks	total_expected_clicks	total_expected_clicks_no_promotion	total_known_probability_lift	direct_expected_lift	same_cluster_spillover_expected_lift	other_spillover_expected_lift
0	user_100000_seed	100000	promote_focal_item	1	88810	Help, The (2011)	Drama	8	3	1.5470	1.7488	-0.2018	0.1351	-0.0162	-0.3206
1	user_10000_seed	10000	leave_slate_unchanged	0	2005	Goonies, The (1985)	Action	12	1	2.3210	2.3210	0.0000	0.0000	0.0000	0.0000
2	user_100050_seed	100050	promote_focal_item	1	5459	Men in Black II (a.k.a. MIIB) (a.k.a. MIB 2) (...	Action	5	3	2.4909	2.8907	-0.3998	0.1923	-0.4989	-0.0933
3	user_100100_seed	100100	leave_slate_unchanged	0	4299	Knight's Tale, A (2001)	Action	5	5	2.9722	2.9722	0.0000	0.0000	0.0000	0.0000
4	user_100200_seed	100200	leave_slate_unchanged	0	5445	Minority Report (2002)	Action	11	2	3.3004	3.3004	0.0000	0.0000	0.0000	0.0000

The slate-level table is where displacement becomes visible. In promoted slates the focal item gains 0.1794 expected clicks on average, while same-cluster competitors lose 0.1965 and other competitors lose 0.3010, for a net expected lift of -0.3180 per slate. The total slate lift is the net quantity that matters for the product.
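The decomposition can be checked against the displayed output. The sketch below re-enters the promoted-arm means and the two promoted rows shown above by hand and confirms that direct, same-cluster, and other-spillover lift sum to the total lift. The printed values are rounded to four decimals, so the comparison uses a small tolerance. The frame name `displayed` and its short column names are illustrative and are not objects from the notebook.

```python
import numpy as np
import pandas as pd

# Values copied by hand from the displayed tables above (rounded to 4 decimals).
displayed = pd.DataFrame(
    {
        "row": ["promote_focal_item arm mean", "user_100000_seed", "user_100050_seed"],
        "total": [-0.3180, -0.2018, -0.3998],
        "direct": [0.1794, 0.1351, 0.1923],
        "same_cluster": [-0.1965, -0.0162, -0.4989],
        "other": [-0.3010, -0.3206, -0.0933],
    }
)

# The three exposure groups should partition the rows of a promoted slate,
# so their lifts should add up to the slate total.
displayed["component_sum"] = displayed[["direct", "same_cluster", "other"]].sum(axis=1)
displayed["gap"] = displayed["component_sum"] - displayed["total"]

# Allow for rounding in the printed values.
components_match = bool(np.isclose(displayed["gap"], 0.0, atol=5e-4).all())

print(displayed)
print("components sum to total:", components_match)
```

A gap larger than the rounding tolerance would suggest that some promoted-slate rows fall outside all three exposure groups, or into more than one.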

14. Plot Slate-Level Net Lift

This plot shows the distribution of known expected total lift at the slate level. Promoted slates should have non-zero lift by construction, while control slates should have a lift of exactly zero because no slate positions changed.

fig, ax = plt.subplots(figsize=(10, 5))
sns.histplot(
    data=slate_outcomes,
    x="total_known_probability_lift",
    hue="assignment_arm",
    bins=60,
    element="step",
    stat="density",
    common_norm=False,
    ax=ax,
)
ax.axvline(0, color="black", linewidth=1)
ax.set_title("Slate-Level Known Probability Lift")
ax.set_xlabel("Sum of item-level probability lift in slate")
ax.set_ylabel("Density")
plt.tight_layout()
fig.savefig(FIGURE_DIR / "10_slate_level_known_lift.png", dpi=160, bbox_inches="tight")
plt.show()

This distribution shows why total effects matter. A promotion can be beneficial for the focal item but still have a muted or negative net slate effect if competitor displacement is large. Later notebooks will estimate this net effect from observed simulated outcomes.
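A numeric companion to the histogram is the share of slates in each arm with negative net lift. The sketch below is a minimal illustration that uses only the five slate rows displayed earlier, re-entered by hand. The helper name `summarize_net_lift` is hypothetical. In the notebook the same function could be applied to `slate_outcomes`, assuming it keeps the `assignment_arm` and `total_known_probability_lift` columns shown above.

```python
import pandas as pd

# The five slate rows displayed earlier, re-entered by hand for illustration only.
sample = pd.DataFrame(
    {
        "assignment_arm": [
            "promote_focal_item",
            "leave_slate_unchanged",
            "promote_focal_item",
            "leave_slate_unchanged",
            "leave_slate_unchanged",
        ],
        "total_known_probability_lift": [-0.2018, 0.0, -0.3998, 0.0, 0.0],
    }
)


def summarize_net_lift(frame: pd.DataFrame) -> pd.DataFrame:
    """Per-arm slate count, mean net lift, and share of slates with negative net lift."""
    lift = frame["total_known_probability_lift"]
    return (
        frame.assign(negative_net_lift=lift < 0)
        .groupby("assignment_arm")
        .agg(
            slates=("total_known_probability_lift", "size"),
            mean_net_lift=("total_known_probability_lift", "mean"),
            share_negative=("negative_net_lift", "mean"),
        )
        .reset_index()
    )


net_lift_summary = summarize_net_lift(sample)
print(net_lift_summary)
```

Five rows are far too few to describe the real distribution. The point is only the shape of the summary that would sit next to the plot.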

15. Create a Compact Causal Design Summary

This cell summarizes the design choices and readiness checks in a compact table. The goal is to make the assumptions visible before any estimator is applied.

readiness_checks = pd.DataFrame(
    [
        {
            "check": "complete_seed_slates_loaded",
            "value": exposure["slate_id"].nunique(),
            "notes": "Each slate contains 12 item rows from the setup notebook.",
        },
        {
            "check": "promotion_probability",
            "value": PROMOTION_PROBABILITY,
            "notes": "Slate-level randomized assignment probability after focal selection.",
        },
        {
            "check": "observed_promotion_rate",
            "value": exposure.drop_duplicates("slate_id")["promotion_applied"].mean(),
            "notes": "Should be close to the design probability.",
        },
        {
            "check": "direct_treatment_rows",
            "value": int(exposure["direct_treatment"].sum()),
            "notes": "One directly treated focal item per promoted slate.",
        },
        {
            "check": "same_slate_spillover_rows",
            "value": int(exposure["same_slate_spillover"].sum()),
            "notes": "Non-focal items in promoted slates.",
        },
        {
            "check": "same_cluster_spillover_rows",
            "value": int(exposure["same_cluster_spillover"].sum()),
            "notes": "Non-focal items sharing the promoted focal item's cluster.",
        },
        {
            "check": "invalid_final_position_slates",
            "value": len(invalid_position_slates),
            "notes": "Should be zero; each slate should keep positions 1 through 12.",
        },
        {
            "check": "mean_promoted_slate_known_lift",
            "value": slate_outcomes.query("promotion_applied == 1")["total_known_probability_lift"].mean(),
            "notes": "Average known net expected lift in promoted slates under the simulation.",
        },
    ]
)

display(readiness_checks)

	check	value	notes
0	complete_seed_slates_loaded	3,000.0000	Each slate contains 12 item rows from the setu...
1	promotion_probability	0.5000	Slate-level randomized assignment probability ...
2	observed_promotion_rate	0.5017	Should be close to the design probability.
3	direct_treatment_rows	1,505.0000	One directly treated focal item per promoted s...
4	same_slate_spillover_rows	16,555.0000	Non-focal items in promoted slates.
5	same_cluster_spillover_rows	4,309.0000	Non-focal items sharing the promoted focal ite...
6	invalid_final_position_slates	0.0000	Should be zero; each slate should keep positio...
7	mean_promoted_slate_known_lift	-0.3180	Average known net expected lift in promoted sl...

The readiness table is a contract for the next notebook. It states how many direct and spillover observations exist, verifies that final positions are valid, and records the known assignment probability. Estimation notebooks should verify these diagnostics before reporting causal results.
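One way a downstream notebook could act on this contract is to turn the diagnostics into explicit checks. The sketch below re-enters the displayed values by hand and tests three things: the observed promotion rate against the design probability, the spillover row count against the slate size, and the average known lift per spillover row. The variable names and the z-score threshold of 2 are assumptions made for illustration and are not part of the notebook.

```python
import math

# Values copied by hand from the readiness table and the slate-level summary above.
design_probability = 0.5
total_slates = 3000
promoted_slates = 1505  # equals direct_treatment_rows: one focal item per promoted slate
same_slate_spillover_rows = 16555
same_cluster_spillover_rows = 4309
invalid_final_position_slates = 0
slate_size = 12

# Check 1: the observed promotion rate should sit within sampling noise of the design.
observed_rate = promoted_slates / total_slates
standard_error = math.sqrt(design_probability * (1 - design_probability) / total_slates)
z_score = (observed_rate - design_probability) / standard_error

# Check 2: each promoted slate should contribute slate_size - 1 spillover rows.
expected_spillover_rows = promoted_slates * (slate_size - 1)

# Check 3: average known lift per spillover row, split by cluster membership.
mean_same_cluster_lift_per_slate = -0.1965
mean_other_lift_per_slate = -0.3010
other_spillover_rows = same_slate_spillover_rows - same_cluster_spillover_rows
same_cluster_lift_per_row = (
    mean_same_cluster_lift_per_slate * promoted_slates / same_cluster_spillover_rows
)
other_lift_per_row = mean_other_lift_per_slate * promoted_slates / other_spillover_rows

assert invalid_final_position_slates == 0
assert abs(z_score) < 2  # illustrative threshold, not a rule from the notebook
assert expected_spillover_rows == same_slate_spillover_rows

print(f"promotion-rate z-score: {z_score:.2f}")
print(f"same-cluster lift per row: {same_cluster_lift_per_row:.4f}")
print(f"other-spillover lift per row: {other_lift_per_row:.4f}")
```

With the displayed values the z-score is about 0.18 and the row counts agree (1,505 × 11 = 16,555). The per-row lift is about -0.069 for same-cluster competitors and about -0.037 for other competitors, which is consistent with the notebook's premise that substitutes in the focal cluster absorb the stronger spillover.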

16. Save the Exposure Mapping Artifacts

This cell saves the item-row exposure mapping, slate-level outcomes, focal assignment table, and diagnostic summaries. Later notebooks can load these files directly and focus on estimation rather than rebuilding the simulation.

EXPOSURE_OUTPUT = PROCESSED_DIR / "movielens_interference_exposure_mapping.parquet"
SLATE_OUTCOME_OUTPUT = PROCESSED_DIR / "movielens_interference_slate_outcomes.parquet"
ASSIGNMENT_OUTPUT = PROCESSED_DIR / "movielens_interference_assignment_table.parquet"
READINESS_OUTPUT = PROCESSED_DIR / "movielens_interference_exposure_readiness.csv"
EXPOSURE_SUMMARY_OUTPUT = PROCESSED_DIR / "movielens_interference_exposure_group_summary.csv"
CLUSTER_EXPOSURE_OUTPUT = PROCESSED_DIR / "movielens_interference_cluster_exposure_summary.csv"

exposure.to_parquet(EXPOSURE_OUTPUT, index=False)
slate_outcomes.to_parquet(SLATE_OUTCOME_OUTPUT, index=False)
focal_items.to_parquet(ASSIGNMENT_OUTPUT, index=False)
readiness_checks.to_csv(READINESS_OUTPUT, index=False)
exposure_group_shares.to_csv(EXPOSURE_SUMMARY_OUTPUT, index=False)
cluster_exposure.to_csv(CLUSTER_EXPOSURE_OUTPUT, index=False)

saved_artifacts = pd.DataFrame(
    {
        "artifact": [
            "item_row_exposure_mapping",
            "slate_level_outcomes",
            "focal_assignment_table",
            "readiness_checks",
            "exposure_group_summary",
            "cluster_exposure_summary",
        ],
        "path": [
            str(EXPOSURE_OUTPUT),
            str(SLATE_OUTCOME_OUTPUT),
            str(ASSIGNMENT_OUTPUT),
            str(READINESS_OUTPUT),
            str(EXPOSURE_SUMMARY_OUTPUT),
            str(CLUSTER_EXPOSURE_OUTPUT),
        ],
    }
)

display(saved_artifacts)

	artifact	path
0	item_row_exposure_mapping	/home/apex/Documents/ranking_sys/data/processe...
1	slate_level_outcomes	/home/apex/Documents/ranking_sys/data/processe...
2	focal_assignment_table	/home/apex/Documents/ranking_sys/data/processe...
3	readiness_checks	/home/apex/Documents/ranking_sys/data/processe...
4	exposure_group_summary	/home/apex/Documents/ranking_sys/data/processe...
5	cluster_exposure_summary	/home/apex/Documents/ranking_sys/data/processe...

The saved exposure mapping is the main output of this notebook. It contains treatment, spillover, position, visibility, simulated outcome, and assignment variables at the item-row level. That table is ready for direct-effect and spillover-effect estimators.
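Before an estimator relies on the saved table, it can help to confirm that the expected columns are present. The sketch below is a minimal check on an in-memory frame. The column list is drawn from names used in this notebook's code, the helper `missing_columns` is hypothetical, and whether a given estimator needs every listed column is an assumption.

```python
import pandas as pd

# Column names taken from this notebook's code; the selection is illustrative.
REQUIRED_ITEM_COLUMNS = [
    "slate_id",
    "promotion_applied",
    "direct_treatment",
    "same_slate_spillover",
    "same_cluster_spillover",
    "simulated_click",
    "p_observed",
    "p_no_promotion",
    "known_probability_lift",
]


def missing_columns(frame: pd.DataFrame, required: list[str]) -> list[str]:
    """Return the required column names that are absent from the frame, in order."""
    return [name for name in required if name not in frame.columns]


# Toy frame that deliberately omits the last required column.
toy_exposure = pd.DataFrame(columns=REQUIRED_ITEM_COLUMNS[:-1])
missing = missing_columns(toy_exposure, REQUIRED_ITEM_COLUMNS)
print("missing columns:", missing)
```

In a downstream notebook the same helper could be called on the frame returned by `pd.read_parquet` for the exposure mapping file, raising an error if the returned list is not empty.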

17. Notebook Takeaways

This notebook converted MovieLens seed slates into a randomized interference simulation:

One eligible lower-ranked focal item was selected in each slate.
Slates were randomized to promote the focal item or leave the slate unchanged.
Direct treatment, same-slate spillover, same-cluster spillover, displacement, and final position were explicitly mapped.
Simulated outcomes were generated from relevance, visibility, and competition assumptions.
Item-level and slate-level outputs were saved for the next estimation notebook.

The next notebook should estimate direct, spillover, and total effects using the randomized assignment. A natural next step is 03_cluster_randomized_estimators.ipynb, which can compare simple difference-in-means estimators, cluster-robust standard errors, and slate-level total-effect estimates.
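As a preview of the simplest estimator mentioned above, the sketch below computes a slate-level difference in means with an unpooled standard error. It is a minimal illustration on synthetic data, not the estimator the next notebook will necessarily use. The function name `slate_difference_in_means`, the Poisson click model, and the sample size are all assumptions. Working with one row per slate keeps the unit of analysis aligned with the unit of randomization.

```python
import numpy as np
import pandas as pd


def slate_difference_in_means(
    frame: pd.DataFrame, outcome: str, arm_col: str, treated_arm: str
) -> dict:
    """Difference in mean slate outcome (treated minus control) with an unpooled SE."""
    is_treated = frame[arm_col] == treated_arm
    treated = frame.loc[is_treated, outcome].to_numpy(dtype=float)
    control = frame.loc[~is_treated, outcome].to_numpy(dtype=float)
    estimate = treated.mean() - control.mean()
    std_error = np.sqrt(
        treated.var(ddof=1) / treated.size + control.var(ddof=1) / control.size
    )
    return {
        "estimate": float(estimate),
        "std_error": float(std_error),
        "n_treated": int(treated.size),
        "n_control": int(control.size),
    }


# Synthetic slates for illustration only; these are not the notebook's data.
rng = np.random.default_rng(0)
n_slates = 2000
promoted = rng.random(n_slates) < 0.5
clicks = rng.poisson(np.where(promoted, 2.2, 2.5))
toy_slates = pd.DataFrame(
    {
        "assignment_arm": np.where(promoted, "promote_focal_item", "leave_slate_unchanged"),
        "total_simulated_clicks": clicks,
    }
)

result = slate_difference_in_means(
    toy_slates, "total_simulated_clicks", "assignment_arm", "promote_focal_item"
)
print(result)
```

Applied to the arm means displayed in the slate-level summary, the same contrast gives 2.2193 - 2.5271 = -0.3078 simulated clicks per slate.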