This notebook adds machine learning to the interference workflow.
The previous notebooks established the causal design and estimated direct, indirect, and total effects with transparent randomized estimators. That is the foundation. This notebook asks a more advanced product question:
Can we use user, item, slate, and spillover features to predict when a promotion is likely to create positive net slate value rather than simply shifting attention?
The notebook uses advanced models for three purposes:
Outcome modeling: predict total simulated slate clicks from pre-promotion features and treatment assignment.
Conditional effect modeling: predict the counterfactual difference between promoting and not promoting the focal item.
Policy targeting: compare random promotion with model-targeted promotion rules.
The important discipline is that ML does not replace the causal design. Treatment was randomized at the slate level, so the simple randomized estimator remains the baseline. The models are used to understand heterogeneity and policy targeting, and their predictions are checked against the known simulation lift available in this synthetic setup.
1. Environment and Paths
This cell imports the modeling libraries. LightGBM and XGBoost are used as flexible tree-based outcome models, while scikit-learn provides splitting, metrics, and cross-fitting utilities. The path logic searches upward for the processed interference files so the notebook works from JupyterLab or command-line execution.
from pathlib import Path
import re
import warnings

import lightgbm as lgb
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import seaborn as sns
import xgboost as xgb
from IPython.display import display
from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score
from sklearn.model_selection import StratifiedKFold, train_test_split

warnings.filterwarnings("ignore", category=UserWarning)
sns.set_theme(style="whitegrid", context="notebook")
pd.set_option("display.max_columns", 140)
pd.set_option("display.max_rows", 100)
pd.set_option("display.float_format", lambda value: f"{value:,.4f}")

candidate_roots = [Path.cwd(), *Path.cwd().parents]
PROJECT_DIR = next(
    root
    for root in candidate_roots
    if (root / "data" / "processed" / "movielens_interference_exposure_mapping.parquet").exists()
)
PROCESSED_DIR = PROJECT_DIR / "data" / "processed"
NOTEBOOK_DIR = PROJECT_DIR / "notebooks" / "interference_spillover_effects"
WRITEUP_DIR = NOTEBOOK_DIR / "writeup"
FIGURE_DIR = WRITEUP_DIR / "figures"
TABLE_DIR = WRITEUP_DIR / "tables"
FIGURE_DIR.mkdir(parents=True, exist_ok=True)
TABLE_DIR.mkdir(parents=True, exist_ok=True)

EXPOSURE_PATH = PROCESSED_DIR / "movielens_interference_exposure_mapping.parquet"
COMPONENT_SLATE_PATH = PROCESSED_DIR / "movielens_interference_component_slate.parquet"
OBSERVED_EFFECTS_PATH = PROCESSED_DIR / "movielens_interference_observed_effects.csv"
PRODUCT_SUMMARY_PATH = PROCESSED_DIR / "movielens_interference_product_summary.csv"

EXPOSURE_PATH.exists(), COMPONENT_SLATE_PATH.exists(), OBSERVED_EFFECTS_PATH.exists(), PRODUCT_SUMMARY_PATH.exists()
(True, True, True, True)
All checks should return True. The advanced notebook builds on the exposure mapping, component slate table, prior randomized estimates, and product-unit decomposition created earlier.
2. Load Prior Outputs
This cell loads the item-row exposure table and the slate-level component table. The item-row table is used to construct pre-promotion slate features and reconstruct the known simulation lift. The slate-level table is used as the modeling unit because promotion was randomized at the slate level.
The product summary reminds us why advanced models are worth adding: the direct focal gain was positive, but competitor losses more than offset it. The advanced models will look for segments where that net effect is less negative or potentially positive.
3. Reconstruct the Known Counterfactual Promotion Lift
Because this is a simulation, we can reconstruct the expected outcome if every slate’s focal item were promoted. Real observational data would not give us this oracle signal, but it is useful here for validating model predictions.
This cell rebuilds the same outcome mechanism used in the exposure-mapping notebook. It computes:
expected click probability under no promotion,
expected click probability if the focal item is promoted,
true expected net lift from promotion for every slate.
This gives us a benchmark for evaluating conditional-effect models and targeted promotion rules.
The validation difference for actually promoted slates should be essentially zero. That confirms the reconstructed oracle lift matches the earlier simulation. The share of positive true lift tells us whether targeted promotion is plausible: if some slates have positive net value, a model might learn to select them.
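The exact click mechanism lives in the exposure-mapping notebook. As a rough illustration of the reconstruction step, the sketch below uses a hypothetical stand-in click model; the column names, the boost and displacement coefficients, and the data are all illustrative assumptions, not the notebook's actual mechanism.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(7)

# Hypothetical item-row table: 200 slates of 5 items, one focal item each.
items = pd.DataFrame({
    "slate_id": np.repeat(np.arange(200), 5),
    "is_focal": np.tile([1, 0, 0, 0, 0], 200),
    "relevance": rng.uniform(0.1, 0.9, 1000),
})

def click_prob(relevance, boost=0.0):
    # Stand-in click model: promotion boosts the focal item while pulling
    # a small amount of attention away from every competitor.
    return np.clip(relevance + boost, 0, 1) * 0.3

items["p_control"] = click_prob(items["relevance"])
items["p_promoted"] = click_prob(
    items["relevance"],
    boost=np.where(items["is_focal"] == 1, 0.2, -0.05),
)

# Expected total slate clicks under each counterfactual, and the net lift.
oracle = items.groupby("slate_id")[["p_control", "p_promoted"]].sum()
oracle["true_lift"] = oracle["p_promoted"] - oracle["p_control"]
share_positive = (oracle["true_lift"] > 0).mean()
```

Because both potential outcomes are computed for every slate, the per-slate `true_lift` column plays the role of the oracle benchmark described above.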
4. Build Slate-Level Modeling Features
This cell creates the modeling table. Only pre-promotion information is used as features: focal item relevance, focal starting position, user history features, slate composition, item popularity, and cluster composition. We intentionally avoid using post-treatment variables, such as final position, visibility gain, the reconstructed known lift, or the simulated click probabilities, as model features.
The modeling table has one row per randomized slate. The treatment flag is included as a feature for outcome modeling, while all other features describe what was known before the promotion was applied. The oracle lift is kept only for validation and policy evaluation.
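A minimal sketch of how such a slate-level table can be assembled from item rows. The column names and synthetic data below are assumptions for illustration, not the notebook's actual schema.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
n_slates, k = 300, 8

# Hypothetical item-row table: each slate has k ranked items.
rows = pd.DataFrame({
    "slate_id": np.repeat(np.arange(n_slates), k),
    "position": np.tile(np.arange(1, k + 1), n_slates),
    "relevance": rng.uniform(0, 1, n_slates * k),
    "cluster": rng.integers(0, 5, n_slates * k),
})
# Exactly one focal item per slate, at a random starting position.
focal_pos = rng.integers(1, k + 1, n_slates)
rows["is_focal"] = rows["position"] == np.repeat(focal_pos, k)

focal = rows[rows["is_focal"]].set_index("slate_id")
features = pd.DataFrame({
    "focal_relevance": focal["relevance"],
    "focal_position": focal["position"],
    "slate_mean_relevance": rows.groupby("slate_id")["relevance"].mean(),
})

# Same-cluster competitors: non-focal items sharing the focal item's cluster.
focal_cluster = focal["cluster"].rename("focal_cluster").reset_index()
merged = rows.merge(focal_cluster, on="slate_id")
same_cluster = (
    merged[(~merged["is_focal"]) & (merged["cluster"] == merged["focal_cluster"])]
    .groupby("slate_id").size()
)
features["same_cluster_count"] = same_cluster.reindex(features.index, fill_value=0)

# Randomized treatment flag, kept as a feature for outcome modeling only.
features["treated"] = rng.integers(0, 2, n_slates)
```

Every feature here is computable before the promotion is applied, which is the discipline the section describes.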
5. Encode Features for Tree Models
Tree models need a numeric feature matrix. This cell one-hot encodes categorical variables such as focal cluster and popularity bucket, fills missing numeric values with medians, and sanitizes feature names so both LightGBM and XGBoost handle them cleanly.
The encoded feature matrix is compact enough for fast model training. The outcome is total simulated clicks per slate, which is the metric that respects interference because it includes both focal gains and competitor losses.
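The encoding step can be sketched as follows, assuming a toy table; the column `focal relevance (0-1)` is made up to show why name sanitization matters for the tree libraries.

```python
import re
import numpy as np
import pandas as pd

# Illustrative modeling table with categoricals and a numeric column that
# contains a missing value; names are assumptions, not the real schema.
table = pd.DataFrame({
    "focal_cluster": ["drama", "comedy", "drama", "action"],
    "popularity_bucket": ["high", "low", "low", "high"],
    "focal relevance (0-1)": [0.8, np.nan, 0.4, 0.6],
})

# One-hot encode categoricals, median-fill numerics, and sanitize names so
# LightGBM and XGBoost accept them (no spaces or special characters).
encoded = pd.get_dummies(table, columns=["focal_cluster", "popularity_bucket"])
numeric_cols = encoded.select_dtypes(include="number").columns
encoded[numeric_cols] = encoded[numeric_cols].fillna(encoded[numeric_cols].median())
encoded.columns = [re.sub(r"[^0-9A-Za-z_]+", "_", c).strip("_") for c in encoded.columns]
```

After this pass, every column is numeric, fully observed, and safely named, which is all the tree models require.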
6. Train LightGBM and XGBoost Outcome Models
This cell trains two gradient-boosted tree models to predict total simulated slate clicks. The treatment indicator is included, so each model can learn how promotion changes outcomes conditional on slate features.
The train/test split is stratified by treatment assignment. That keeps promoted and control slates balanced in both samples.
The metrics show how well flexible models predict slate-level outcomes. The goal is not perfect prediction; clicks are noisy. The useful question is whether the model learns enough structure to identify slates where promotion is more or less harmful.
7. Plot Outcome Model Performance
This plot compares model performance on the held-out test set. RMSE and MAE are in clicks per slate; R-squared shows how much outcome variation the model explains.
The model comparison is a reality check. If the models cannot predict slate outcomes at all, policy targeting will be unreliable. If they predict moderately well, they may still be useful for ranking slates by expected net effect.
8. Inspect Feature Importance
This cell extracts feature importance from LightGBM and XGBoost. The goal is to see whether the models are using sensible variables such as treatment assignment, focal position, slate composition, user history, and substitute counts.
Feature importance does not prove causality, but it helps audit the model. Sensible top features suggest the model is learning from pre-promotion slate structure rather than from artifacts. Treatment assignment and spillover-related features should matter if the model is capturing intervention effects.
9. Plot Top Feature Importance
The table above is dense, so this plot shows the top model features in a compact format. Each model gets its own panel.
The feature plot is useful for communication: it shows which pre-promotion properties drive the model’s predictions. If focal position, relevance, and competitor counts appear, the model is aligned with the interference mechanism we care about.
10. Generate Counterfactual Predictions
This cell uses each trained outcome model to predict two potential outcomes for every slate:
predicted total clicks if the focal item is promoted,
predicted total clicks if the slate is left unchanged.
The difference between those predictions is a model-estimated conditional net effect. Because this is a simulation, we can compare that estimate with the reconstructed true expected promotion lift.
The oracle comparison is only possible because this is a simulation. A good advanced model should at least rank slates in the right direction, even if the exact predicted lift is noisy. The selected model will be used for policy targeting below.
11. Plot Predicted Net Lift Against Oracle Net Lift
This scatter plot shows whether the selected model can identify slates with less harmful or more beneficial promotion effects. The diagonal line represents perfect prediction.
The scatter plot should be read as a targeting diagnostic, not as a replacement for randomized estimation. The key question is whether higher predicted lift corresponds to higher oracle lift. If it does, the model can help prioritize safer promotions.
12. Cross-Fitted Model-Assisted AIPW Estimate
Even though assignment is randomized, we can use outcome models to build a model-assisted estimator. This cell fits separate LightGBM outcome models for treated and control slates using cross-fitting, then computes an augmented inverse probability weighted estimate.
The treatment probability is known: promotion was randomized with probability 0.5. Cross-fitting keeps each slate’s nuisance predictions out of sample.
The AIPW estimator is a robustness check. Because the design is randomized, the simple difference-in-means is already valid. The model-assisted estimate asks whether flexible outcome models produce a similar average answer while also generating useful conditional-effect predictions.
13. Model-Targeted Promotion Policies
This cell compares several targeting rules using the selected model’s predicted net lift. Since this is a simulation, the policies are evaluated using oracle expected promotion lift. In a real setting, this would require an online experiment or a valid off-policy evaluation design.
The table reports both value per selected slate and value per 1,000 eligible slates. The second metric accounts for how many slates the policy chooses to promote.
The targeting table shows whether the model can reduce harm by avoiding slates with strongly negative predicted net lift. The oracle benchmark is not deployable, but it shows the value available if targeting were perfect.
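A minimal sketch of this evaluation logic, with a synthetic oracle lift and a noisy predicted lift standing in for the model output; the policy names and distributions are illustrative.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(4)
n = 5000
oracle_lift = rng.normal(-0.05, 0.1, n)                # true per-slate net lift
predicted_lift = oracle_lift + rng.normal(0, 0.05, n)  # noisy model estimate

def evaluate_policy(name, promote_mask):
    # Oracle value of a rule: sum of true lift over the slates it promotes,
    # reported per selected slate and per 1,000 eligible slates.
    selected = promote_mask.sum()
    total = oracle_lift[promote_mask].sum()
    return {
        "policy": name,
        "share_promoted": selected / n,
        "lift_per_selected_slate": total / max(selected, 1),
        "oracle_lift_per_1000_eligible_slates": 1000 * total / n,
    }

policy_targeting = pd.DataFrame([
    evaluate_policy("Promote all slates", np.ones(n, dtype=bool)),
    evaluate_policy("Promote none", np.zeros(n, dtype=bool)),
    evaluate_policy("Promote only predicted-positive", predicted_lift > 0),
    evaluate_policy("Oracle: promote only true-positive", oracle_lift > 0),
])
```

Reporting both per-selected and per-eligible value matters: a rule that promotes very few slates can look great per selection while delivering little total value.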
14. Plot Policy Targeting Results
This plot compares the net oracle value of each promotion rule. The value is shown per 1,000 eligible slates, so policies that promote fewer slates are penalized for lower coverage.
policy_plot = policy_targeting.sort_values("oracle_lift_per_1000_eligible_slates", ascending=True).copy()
fig, ax = plt.subplots(figsize=(11, 5.5))
colors = [
    "tab:green" if value >= 0 else "tab:red"
    for value in policy_plot["oracle_lift_per_1000_eligible_slates"]
]
sns.barplot(
    data=policy_plot,
    x="oracle_lift_per_1000_eligible_slates",
    y="policy",
    hue="policy",
    palette=dict(zip(policy_plot["policy"], colors)),
    legend=False,
    ax=ax,
)
ax.axvline(0, color="black", linewidth=1)
ax.set_title("Oracle Value of Model-Targeted Promotion Policies")
ax.set_xlabel("Expected click lift per 1,000 eligible slates")
ax.set_ylabel("")
plt.tight_layout()
fig.savefig(FIGURE_DIR / "23_policy_targeting_value.png", dpi=160, bbox_inches="tight")
plt.show()
The policy plot translates advanced modeling into a decision question. If targeted rules are less negative than promoting all slates, the model is useful even if it does not fully solve the problem. The safest policy may still be to avoid promotion unless predicted net value is positive.
15. Heterogeneity Diagnostics
This cell summarizes true and predicted net lift across interpretable segments: focal starting position, substitute count, displaced count, user activity, and focal cluster. The goal is to connect ML predictions back to product reasoning.
The heterogeneity table makes the targeting story interpretable. Segments with less negative or positive true lift are safer promotion candidates. Segments with many substitutes or large displacement exposure are more likely to show negative net effects.
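The segment summary is a plain groupby; a sketch with an assumed segment column and synthetic lifts (in the notebook, several segment types are stacked into one table).

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(5)
n = 2000
# Illustrative slate table: one segment variable plus true and predicted lift.
slates = pd.DataFrame({
    "focal_position_bucket": rng.choice(["top", "middle", "bottom"], n),
    "true_lift": rng.normal(-0.05, 0.1, n),
    "predicted_lift": rng.normal(-0.05, 0.1, n),
})

heterogeneity = (
    slates.groupby("focal_position_bucket")
    .agg(
        n_slates=("true_lift", "size"),
        mean_true_lift=("true_lift", "mean"),
        mean_predicted_lift=("predicted_lift", "mean"),
        share_positive_true_lift=("true_lift", lambda s: (s > 0).mean()),
    )
    .reset_index()
)
```

Comparing `mean_true_lift` and `mean_predicted_lift` within each segment is what connects the model back to product reasoning: segments should agree in sign and rough magnitude.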
16. Plot Interpretable Heterogeneity
This plot focuses on the most actionable segment types: focal starting position and same-cluster competitor count. These are directly tied to the mechanism of item competition.
heterogeneity_plot = heterogeneity_summaries.query(
    "segment_type in ['focal_position_bucket', 'same_cluster_count_bucket', 'displaced_count_bucket']"
).copy()
heterogeneity_plot["segment_label"] = (
    heterogeneity_plot["segment_type"] + ": " + heterogeneity_plot["segment"].astype(str)
)
fig, ax = plt.subplots(figsize=(11, 6))
sns.barplot(
    data=heterogeneity_plot.sort_values("mean_true_lift"),
    x="mean_true_lift",
    y="segment_label",
    hue="segment_type",
    dodge=False,
    ax=ax,
)
ax.axvline(0, color="black", linewidth=1)
ax.set_title("Oracle Net Lift by Interpretable Segments")
ax.set_xlabel("Mean expected lift if promoted")
ax.set_ylabel("")
ax.legend(title="Segment type")
plt.tight_layout()
fig.savefig(FIGURE_DIR / "24_advanced_heterogeneity_segments.png", dpi=160, bbox_inches="tight")
plt.show()
The segment plot connects advanced modeling back to recommendation design. If promotions are most harmful in slates with many displaced or same-cluster competitors, then safer policies should account for local slate competition before promoting an item.
17. Advanced Modeling Takeaways Table
This cell creates a compact summary table for the final report. It records what the advanced models added beyond the transparent randomized estimators.
best_metrics = model_metrics.query("split == 'test' and model == @best_model_name").iloc[0]
best_counterfactual = counterfactual_summary.query("model == @selected_cate_model").iloc[0]
best_policy = policy_targeting.sort_values("oracle_lift_per_1000_eligible_slates", ascending=False).iloc[0]
advanced_takeaways = pd.DataFrame(
    [
        {
            "area": "Outcome modeling",
            "finding": f"{best_model_name} had the best held-out RMSE at {best_metrics['rmse']:.3f} clicks per slate.",
            "why_it_matters": "Flexible models can summarize how slate composition and treatment assignment relate to total outcomes.",
        },
        {
            "area": "Conditional effects",
            "finding": f"{selected_cate_model} had CATE RMSE {best_counterfactual['cate_rmse_vs_oracle']:.3f} versus the oracle simulation lift.",
            "why_it_matters": "The model can be used to rank slates by predicted net harm or benefit, with simulation validation.",
        },
        {
            "area": "Model-assisted estimation",
            "finding": f"Cross-fitted AIPW estimated {aipw_estimate:.3f} total clicks per slate, compared with {simple_difference:.3f} from the randomized difference-in-means.",
            "why_it_matters": "Model-assisted estimates should agree with the randomized baseline before being used for targeting.",
        },
        {
            "area": "Policy targeting",
            "finding": f"The best evaluated policy was '{best_policy['policy']}' with {best_policy['oracle_lift_per_1000_eligible_slates']:.1f} oracle lift per 1,000 eligible slates.",
            "why_it_matters": "Targeting can reduce displacement harm compared with promoting every eligible slate.",
        },
    ]
)
display(advanced_takeaways)
   area                       finding                                            why_it_matters
0  Outcome modeling           XGBoost had the best held-out RMSE at 1.405 cl...  Flexible models can summarize how slate compos...
1  Conditional effects        XGBoost had CATE RMSE 0.161 versus the oracle ...  The model can be used to rank slates by predic...
2  Model-assisted estimation  Cross-fitted AIPW estimated -0.304 total click...  Model-assisted estimates should agree with the...
3  Policy targeting           The best evaluated policy was 'Promote only pr...  Targeting can reduce displacement harm compare...
This table is designed for the final report. It keeps the advanced work concrete: model performance, conditional-effect validation, model-assisted estimation, and policy targeting.
18. Save Advanced Modeling Outputs
This cell saves all advanced modeling artifacts. The final report notebook can load these directly rather than recomputing models.
The saved files complete the advanced-modeling handoff. The key artifacts are the counterfactual predictions, AIPW estimate table, policy targeting table, and advanced takeaways.
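The save step is a straightforward loop over named artifacts; a sketch using a temporary directory and illustrative file names (in the notebook, the real frames are written under PROCESSED_DIR with the project's naming convention).

```python
import tempfile
from pathlib import Path
import pandas as pd

# Illustrative artifacts; names and contents are placeholders.
artifacts = {
    "advanced_takeaways.csv": pd.DataFrame({"area": ["Policy targeting"], "finding": ["..."]}),
    "policy_targeting.csv": pd.DataFrame({"policy": ["Promote none"], "value": [0.0]}),
}

out_dir = Path(tempfile.mkdtemp())
for name, frame in artifacts.items():
    # index=False keeps the files clean for the report notebook to reload.
    frame.to_csv(out_dir / name, index=False)

saved = sorted(p.name for p in out_dir.glob("*.csv"))
```

Writing every table the report needs, rather than recomputing models downstream, is what makes the handoff to the final notebook cheap.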
19. Notebook Takeaways
This notebook added advanced modeling without changing the causal foundation:
LightGBM and XGBoost were trained to predict total slate outcomes from pre-promotion features and treatment assignment.
Counterfactual predictions estimated conditional net promotion effects for each slate.
A cross-fitted model-assisted AIPW estimate was compared with the simple randomized estimator.
Targeted promotion rules were evaluated against the oracle simulation lift.
Heterogeneity summaries showed where promotion is most likely to be safer or more harmful.
The final notebook should now package the full workflow: dataset setup, exposure mapping, randomized estimators, direct/indirect/total decomposition, advanced models, assumptions, limitations, figures, tables, and portfolio-ready writing.