09 - Final Report Figures And Tables

Goal: package the project into portfolio-ready artifacts.

The previous notebooks developed the analysis step by step. This final notebook does not introduce new methodology. Instead, it curates the final story and saves polished figures and tables into notebooks/projects/project_1_ranking/writeup/figures/ and notebooks/projects/project_1_ranking/writeup/tables/.

Executive Summary

This project studies a core recommendation-system question:

Does placing an item higher in the ranking cause more engagement, or are high-ranked items simply more relevant and therefore more likely to be clicked anyway?

Using MIND impression logs, we define:

  • Treatment: an item appears in the top 3 positions.
  • Control: an item appears below position 3.
  • Outcome: the displayed item is clicked.
  • Covariates: user-history, item metadata, slate-size, time, and item-exposure features.

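The final report should make a careful claim: the large naive CTR gap between top-3 and lower positions shrinks to roughly zero after causal adjustment (the doubly robust AIPW estimate's 95% interval includes zero in the rerun below), so most of the raw rank effect appears to reflect selection on relevance rather than position itself. Because this conclusion rests on observational assumptions, it should be validated with online experimentation.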

Notebook Setup

This cell imports the libraries needed to regenerate final figures and tables. It also sets a consistent plotting style and suppresses a harmless LightGBM/sklearn feature-name warning that can appear after preprocessing.

from pathlib import Path
import warnings

import lightgbm as lgb
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import seaborn as sns
from sklearn.base import clone
from sklearn.compose import ColumnTransformer
from sklearn.impute import SimpleImputer
from sklearn.metrics import average_precision_score, brier_score_loss, roc_auc_score
from sklearn.model_selection import StratifiedKFold
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler

sns.set_theme(style="whitegrid", context="talk")
pd.set_option("display.max_columns", 100)
pd.set_option("display.float_format", "{:.4f}".format)

warnings.filterwarnings(
    "ignore",
    message="X does not have valid feature names.*",
    category=UserWarning,
)

This cell prepares the notebook environment for final report tables, figures, and portfolio artifacts. There is no substantive model result yet; the important outcome is that the imports and display settings are ready so the next cells can focus on the data and causal question.

Create Report Output Folders

This cell finds the project root and creates output directories for final artifacts. Figures are saved as PNG files and tables are saved as CSV files. The written files can be referenced from the README, final memo, or portfolio page.

DATA_RELATIVE_PATH = Path("data/processed/mind_small_impressions_train_sample.parquet")
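# Walk up from the working directory until the processed data file is found.
PROJECT_ROOT = next(
    (path for path in [Path.cwd(), *Path.cwd().parents] if (path / DATA_RELATIVE_PATH).exists()),
    None,
)
if PROJECT_ROOT is None:
    raise FileNotFoundError(f"Could not find {DATA_RELATIVE_PATH} in the current directory or any parent.")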

DATA_PATH = PROJECT_ROOT / DATA_RELATIVE_PATH
REPORT_DIR = PROJECT_ROOT / "notebooks/projects/project_1_ranking/writeup"
FIGURE_DIR = REPORT_DIR / "figures"
TABLE_DIR = REPORT_DIR / "tables"

FIGURE_DIR.mkdir(parents=True, exist_ok=True)
TABLE_DIR.mkdir(parents=True, exist_ok=True)

def save_figure(name):
    path = FIGURE_DIR / f"{name}.png"
    plt.savefig(path, dpi=180, bbox_inches="tight")
    return path


def save_table(table, name):
    path = TABLE_DIR / f"{name}.csv"
    table.to_csv(path, index=False)
    return path


FIGURE_DIR, TABLE_DIR
(PosixPath('/home/apex/Documents/ranking_sys/notebooks/projects/project_1_ranking/writeup/figures'),
 PosixPath('/home/apex/Documents/ranking_sys/notebooks/projects/project_1_ranking/writeup/tables'))

The printed paths are a reproducibility checkpoint. Once the notebook can find the input data and output folders, the analysis can run from a clean checkout without manual path edits.

Load The Analysis Table

This cell loads the processed impression-level table. Each row represents one displayed item inside an impression. This table is the source for the report’s descriptive figures and compact causal-estimation rerun.

df = pd.read_parquet(DATA_PATH)

pd.Series(
    {
        "displayed_item_rows": len(df),
        "impressions": df["impression_id"].nunique(),
        "users": df["user_id"].nunique(),
        "news_items": df["news_id"].nunique(),
        "overall_ctr": df["clicked"].mean(),
        "max_rank_position": df["rank_position"].max(),
    }
)
displayed_item_rows   737762.0000
impressions            20000.0000
users                  15427.0000
news_items             12349.0000
overall_ctr                0.0405
max_rank_position        294.0000
dtype: float64

The summary counts confirm that the notebook is using the expected processed dataset: 737,762 displayed-item rows from 20,000 impressions, 15,427 users, and 12,349 news items, with an overall CTR of about 4.1%. This check anchors the rest of the analysis, because all treatment, outcome, and covariate definitions depend on these columns being present.

Figure 1: Naive CTR By Rank

This figure establishes the descriptive fact that motivates the causal analysis: items displayed near the top of the slate (lower rank numbers) have higher click-through rates. The figure is intentionally labeled naive because it does not adjust for relevance, popularity, or ranking-policy selection.

Create And Save Naive CTR Figures

This cell computes click-through rate by exact rank and by rank bucket. It saves a polished line chart for exact ranks 1 through 50 and a table of bucket-level CTR values.

ctr_by_rank = (
    df.groupby("rank_position")
    .agg(clicks=("clicked", "sum"), impressions=("clicked", "size"), ctr=("clicked", "mean"))
    .reset_index()
)

plt.figure(figsize=(10, 5.5))
sns.lineplot(data=ctr_by_rank.query("rank_position <= 50"), x="rank_position", y="ctr", marker="o")
plt.title("Naive Click-Through Rate By Rank Position")
plt.xlabel("Rank position")
plt.ylabel("Click-through rate")
plt.tight_layout()
ctr_figure_path = save_figure("01_naive_ctr_by_rank")

rank_bins = [0, 1, 3, 10, 25, 50, df["rank_position"].max()]
rank_labels = ["1", "2-3", "4-10", "11-25", "26-50", "51+"]
rank_bucket_table = (
    df.assign(
        rank_bucket=pd.cut(
            df["rank_position"],
            bins=rank_bins,
            labels=rank_labels,
            include_lowest=True,
            duplicates="drop",
        )
    )
    .groupby("rank_bucket", observed=True)
    .agg(clicks=("clicked", "sum"), impressions=("clicked", "size"), ctr=("clicked", "mean"))
    .reset_index()
)
rank_bucket_table_path = save_table(rank_bucket_table, "rank_bucket_ctr")

ctr_figure_path, rank_bucket_table_path, rank_bucket_table
(PosixPath('/home/apex/Documents/ranking_sys/notebooks/projects/project_1_ranking/writeup/figures/01_naive_ctr_by_rank.png'),
 PosixPath('/home/apex/Documents/ranking_sys/notebooks/projects/project_1_ranking/writeup/tables/rank_bucket_ctr.csv'),
   rank_bucket  clicks  impressions    ctr
 0           1    2229        20000 0.1114
 1         2-3    3729        38646 0.0965
 2        4-10    6828       114987 0.0594
 3       11-25    7455       181168 0.0411
 4       26-50    5340       177476 0.0301
 5         51+    4313       205485 0.0210)

The bucket table shows naive CTR falling steadily with position, from 11.1% at rank 1 and 9.7% at ranks 2-3 to 2.1% at ranks 51 and below. Rank 1 has exactly 20,000 rows, one per impression, as expected. The saved PNG and CSV can be referenced directly from the writeup.
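Because pd.cut intervals are closed on the right by default, each bin edge belongs to the bucket below it, which is what makes the labels line up (rank 3 lands in "2-3", rank 4 in "4-10"). The sketch below checks that on a handful of hypothetical rank values; the upper edge of 294 is the max_rank_position printed earlier.

```python
import pandas as pd

# Same edges and labels as the rank-bucket cell; 294 is the observed maximum rank.
rank_bins = [0, 1, 3, 10, 25, 50, 294]
rank_labels = ["1", "2-3", "4-10", "11-25", "26-50", "51+"]

# Hypothetical rank positions, including values that sit exactly on bin edges.
ranks = pd.Series([1, 2, 3, 4, 10, 11, 50, 51, 294])
buckets = pd.cut(ranks, bins=rank_bins, labels=rank_labels, include_lowest=True)

print(pd.DataFrame({"rank_position": ranks, "rank_bucket": buckets}))
```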

Final Causal Estimate Rerun

The earlier notebooks contain the detailed methodology. For the final report, we rerun a compact version of the primary adjusted estimate using LightGBM nuisance models and cross-fitted AIPW.

This produces report-ready numbers for:

  • Naive lift.
  • IPW-adjusted lift.
  • Outcome-regression lift.
  • Doubly robust AIPW lift.

The rerun uses a deterministic sample so the notebook stays fast and reproducible.
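For reference, the four estimators can be written out as below, with T_i the top-3 indicator, Y_i the click, e_hat_i the cross-fitted propensity clipped to [0.01, 0.99], and mu1_hat, mu0_hat the cross-fitted outcome predictions with treatment set to 1 and 0. This is a transcription of what the estimation cell computes rather than a new method; in the notebook the IPW weights are additionally capped at their 99th percentile.

```latex
\begin{aligned}
\hat\tau_{\text{naive}} &= \bar Y_{T=1} - \bar Y_{T=0} \\
w_i &= \frac{T_i}{\hat e_i} + \frac{1 - T_i}{1 - \hat e_i} \\
\hat\tau_{\text{IPW}} &= \frac{\sum_i T_i w_i Y_i}{\sum_i T_i w_i} - \frac{\sum_i (1 - T_i) w_i Y_i}{\sum_i (1 - T_i) w_i} \\
\hat\tau_{\text{OR}} &= \frac{1}{n} \sum_i \big( \hat\mu_1(X_i) - \hat\mu_0(X_i) \big) \\
\psi_i &= \hat\mu_1(X_i) - \hat\mu_0(X_i) + \frac{T_i \, (Y_i - \hat\mu_1(X_i))}{\hat e_i} - \frac{(1 - T_i) \, (Y_i - \hat\mu_0(X_i))}{1 - \hat e_i} \\
\hat\tau_{\text{AIPW}} &= \frac{1}{n} \sum_i \psi_i, \qquad \widehat{\mathrm{SE}} = \frac{\operatorname{sd}(\psi)}{\sqrt{n}}
\end{aligned}
```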

Create The Final Modeling Sample

This cell creates the modeling sample and explicit treatment/outcome columns. treatment is top-3 exposure and outcome is click. log_item_exposures is included as a non-click item exposure proxy.

MODEL_SAMPLE_SIZE = 60_000
RANDOM_STATE = 42

model_df = (
    df.sample(n=min(len(df), MODEL_SAMPLE_SIZE), random_state=RANDOM_STATE)
    .reset_index(drop=True)
    .copy()
)

model_df["treatment"] = model_df["is_top_3"].astype(int)
model_df["outcome"] = model_df["clicked"].astype(int)
model_df["log_item_exposures"] = np.log1p(model_df["item_exposures"])
model_df["treatment_label"] = np.where(model_df["treatment"] == 1, "top_3", "rank_4_plus")

pd.Series(
    {
        "rows": len(model_df),
        "treatment_rate_top_3": model_df["treatment"].mean(),
        "click_rate": model_df["outcome"].mean(),
    }
)
rows                   60000.0000
treatment_rate_top_3       0.0800
click_rate                 0.0393
dtype: float64

This cell defines the working analysis sample and standardizes treatment/outcome columns. Fixing this sample early keeps later model comparisons fair because each estimator works on the same rows and target definition.

Define Final Model Features

This cell defines the adjustment features used for the final compact AIPW rerun. They mirror the main analysis: user history, slate size, simple text features, time context, exposure, category, and subcategory.

numeric_features = [
    "history_len",
    "candidate_set_size",
    "title_length",
    "abstract_length",
    "hour",
    "day_of_week",
    "log_item_exposures",
]
categorical_features = ["category", "subcategory"]
propensity_features = numeric_features + categorical_features
outcome_numeric_features = numeric_features + ["treatment"]
outcome_features = outcome_numeric_features + categorical_features

propensity_features, outcome_features
(['history_len',
  'candidate_set_size',
  'title_length',
  'abstract_length',
  'hour',
  'day_of_week',
  'log_item_exposures',
  'category',
  'subcategory'],
 ['history_len',
  'candidate_set_size',
  'title_length',
  'abstract_length',
  'hour',
  'day_of_week',
  'log_item_exposures',
  'treatment',
  'category',
  'subcategory'])

The feature lists define what information is allowed into the adjustment models. These are pre-treatment or contextual variables intended to reduce confounding without using the outcome itself as an input.
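Because treatment is one of the outcome-model features, mu1_hat and mu0_hat come from scoring a single fitted model twice with the treatment column overwritten (the valid_treated / valid_control step in the estimation cell). The sketch below shows that mechanic on hypothetical noise-free linear data, using scikit-learn's LinearRegression in place of the notebook's LightGBM pipeline; the recovered difference matches the simulated 0.02 up to floating-point error. One caveat: a tree ensemble can rarely split on a single binary column such as treatment, which can shrink mu1_hat - mu0_hat toward zero.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Hypothetical noise-free data: y = 0.05 + 0.02 * treatment + 0.01 * x.
rng = np.random.default_rng(0)
x = rng.normal(size=200)
treatment = rng.integers(0, 2, size=200)
y = 0.05 + 0.02 * treatment + 0.01 * x

# One outcome model sees treatment as an ordinary feature.
model = LinearRegression().fit(np.column_stack([x, treatment]), y)

# Score every row twice: once with treatment forced to 1, once forced to 0.
mu1 = model.predict(np.column_stack([x, np.ones_like(x)]))
mu0 = model.predict(np.column_stack([x, np.zeros_like(x)]))
mu_diff = mu1 - mu0
print(mu_diff.mean())
```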

Define LightGBM AIPW Helpers

This cell defines reusable helpers for preprocessing, LightGBM nuisance models, and weighted means; the cross-fitted AIPW loop itself runs in the next cell. The code is compact because this notebook focuses on final outputs rather than teaching every estimator detail again.

def make_preprocessor(numeric_cols, categorical_cols):
    return ColumnTransformer(
        transformers=[
            (
                "num",
                Pipeline(
                    steps=[
                        ("imputer", SimpleImputer(strategy="median")),
                        ("scaler", StandardScaler()),
                    ]
                ),
                numeric_cols,
            ),
            (
                "cat",
                Pipeline(
                    steps=[
                        ("imputer", SimpleImputer(strategy="most_frequent")),
                        (
                            "onehot",
                            OneHotEncoder(
                                handle_unknown="infrequent_if_exist",
                                min_frequency=50,
                                sparse_output=True,
                            ),
                        ),
                    ]
                ),
                categorical_cols,
            ),
        ]
    )


def make_lgbm_classifier():
    return lgb.LGBMClassifier(
        objective="binary",
        n_estimators=160,
        learning_rate=0.05,
        num_leaves=31,
        min_child_samples=100,
        subsample=0.8,  # LightGBM ignores this unless subsample_freq > 0 (default is 0)
        colsample_bytree=0.8,
        reg_lambda=1.0,
        random_state=RANDOM_STATE,
        n_jobs=-1,
        verbose=-1,
    )


def make_lgbm_regressor():
    return lgb.LGBMRegressor(
        objective="regression",
        n_estimators=160,
        learning_rate=0.05,
        num_leaves=31,
        min_child_samples=100,
        subsample=0.8,  # LightGBM ignores this unless subsample_freq > 0 (default is 0)
        colsample_bytree=0.8,
        reg_lambda=1.0,
        random_state=RANDOM_STATE,
        n_jobs=-1,
        verbose=-1,
    )


def make_pipeline(task, numeric_cols, categorical_cols):
    model = make_lgbm_classifier() if task == "classification" else make_lgbm_regressor()
    return Pipeline(
        steps=[
            ("preprocess", make_preprocessor(numeric_cols, categorical_cols)),
            ("model", model),
        ]
    )


def weighted_mean(values, weights):
    values = np.asarray(values, dtype=float)
    weights = np.asarray(weights, dtype=float)
    return np.sum(values * weights) / np.sum(weights)

This cell creates reusable modeling machinery rather than a final result. The value is consistency: the same preprocessing and helper functions can be applied across folds, estimators, and sensitivity checks.
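weighted_mean is a normalized (Hajek-style) weighted average, which is how the IPW estimator uses it within the treated and control groups. As a sketch on hypothetical simulated data with known propensities and a true effect of +0.01, reweighting removes most of the confounding that inflates the naive difference. The 99th-percentile weight cap used in the notebook is omitted here.

```python
import numpy as np


def weighted_mean(values, weights):
    # Same normalized weighted mean as the notebook helper.
    values = np.asarray(values, dtype=float)
    weights = np.asarray(weights, dtype=float)
    return np.sum(values * weights) / np.sum(weights)


# Hypothetical confounded data: x raises both the chance of top-3 exposure
# and the baseline click rate; the true treatment effect is +0.01.
rng = np.random.default_rng(42)
n = 20_000
x = rng.integers(0, 2, size=n)
e_true = np.where(x == 1, 0.6, 0.2)  # known propensity P(T=1 | x)
t = rng.binomial(1, e_true)
y = rng.binomial(1, 0.02 + 0.10 * x + 0.01 * t)

naive_lift = y[t == 1].mean() - y[t == 0].mean()

# Inverse-propensity weights: 1/e for treated rows, 1/(1-e) for control rows.
w = np.where(t == 1, 1 / e_true, 1 / (1 - e_true))
treated = t == 1
ipw_lift = weighted_mean(y[treated], w[treated]) - weighted_mean(y[~treated], w[~treated])

print(f"naive lift: {naive_lift:.4f}, IPW lift: {ipw_lift:.4f}")
```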

Run Cross-Fitted AIPW For The Final Estimate Table

This cell fits cross-fitted propensity and outcome models, computes AIPW scores, and produces the final estimator comparison table. The table is saved for the report.

N_FOLDS = 2
EPS = 0.01

e_hat = np.zeros(len(model_df))
mu1_hat = np.zeros(len(model_df))
mu0_hat = np.zeros(len(model_df))
propensity_metrics = []
outcome_metrics = []

splitter = StratifiedKFold(n_splits=N_FOLDS, shuffle=True, random_state=RANDOM_STATE)
for fold, (train_idx, valid_idx) in enumerate(splitter.split(model_df[propensity_features], model_df["treatment"]), start=1):
    train_df = model_df.iloc[train_idx]
    valid_df = model_df.iloc[valid_idx]

    propensity_model = make_pipeline("classification", numeric_features, categorical_features)
    propensity_model.fit(train_df[propensity_features], train_df["treatment"])
    e_valid = propensity_model.predict_proba(valid_df[propensity_features])[:, 1]
    e_hat[valid_idx] = e_valid

    outcome_model = make_pipeline("regression", outcome_numeric_features, categorical_features)
    outcome_model.fit(train_df[outcome_features], train_df["outcome"])
    y_valid_hat = outcome_model.predict(valid_df[outcome_features])

    valid_treated = valid_df[propensity_features].copy()
    valid_treated["treatment"] = 1
    valid_treated = valid_treated[outcome_features]

    valid_control = valid_df[propensity_features].copy()
    valid_control["treatment"] = 0
    valid_control = valid_control[outcome_features]

    mu1_hat[valid_idx] = outcome_model.predict(valid_treated)
    mu0_hat[valid_idx] = outcome_model.predict(valid_control)

    propensity_metrics.append(
        {
            "fold": fold,
            "roc_auc": roc_auc_score(valid_df["treatment"], e_valid),
            "average_precision": average_precision_score(valid_df["treatment"], e_valid),
            "brier_score": brier_score_loss(valid_df["treatment"], e_valid),
        }
    )
    outcome_metrics.append(
        {
            "fold": fold,
            "roc_auc": roc_auc_score(valid_df["outcome"], y_valid_hat),
            "average_precision": average_precision_score(valid_df["outcome"], y_valid_hat),
            "brier_score": brier_score_loss(valid_df["outcome"], np.clip(y_valid_hat, 0, 1)),
        }
    )

model_df["e_hat"] = e_hat
model_df["mu1_hat"] = mu1_hat
model_df["mu0_hat"] = mu0_hat
model_df["mu_diff_hat"] = model_df["mu1_hat"] - model_df["mu0_hat"]

e = model_df["e_hat"].clip(EPS, 1 - EPS).to_numpy()
t_np = model_df["treatment"].to_numpy()
y_np = model_df["outcome"].to_numpy()
mu1 = model_df["mu1_hat"].to_numpy()
mu0 = model_df["mu0_hat"].to_numpy()

model_df["aipw_score"] = (mu1 - mu0) + t_np * (y_np - mu1) / e - (1 - t_np) * (y_np - mu0) / (1 - e)

ipw_weights = np.where(t_np == 1, 1 / e, 1 / (1 - e))
ipw_weights = np.clip(ipw_weights, None, np.quantile(ipw_weights, 0.99))
treated = t_np == 1
control = ~treated

naive_top3_ctr = model_df.loc[model_df["treatment"] == 1, "outcome"].mean()
naive_lower_ctr = model_df.loc[model_df["treatment"] == 0, "outcome"].mean()
ipw_top3_ctr = weighted_mean(y_np[treated], ipw_weights[treated])
ipw_lower_ctr = weighted_mean(y_np[control], ipw_weights[control])
dr_lift = model_df["aipw_score"].mean()
dr_se = model_df["aipw_score"].std(ddof=1) / np.sqrt(len(model_df))

estimator_comparison = pd.DataFrame(
    [
        {"estimator": "naive", "lift": naive_top3_ctr - naive_lower_ctr, "ci_95_lower": np.nan, "ci_95_upper": np.nan},
        {"estimator": "ipw_99cap", "lift": ipw_top3_ctr - ipw_lower_ctr, "ci_95_lower": np.nan, "ci_95_upper": np.nan},
        {"estimator": "outcome_regression", "lift": model_df["mu_diff_hat"].mean(), "ci_95_lower": np.nan, "ci_95_upper": np.nan},
        {"estimator": "aipw_lgbm", "lift": dr_lift, "ci_95_lower": dr_lift - 1.96 * dr_se, "ci_95_upper": dr_lift + 1.96 * dr_se},
    ]
)

estimator_table_path = save_table(estimator_comparison, "estimator_comparison")
nuisance_metrics = pd.concat(
    [
        pd.DataFrame(propensity_metrics).assign(model="propensity"),
        pd.DataFrame(outcome_metrics).assign(model="outcome"),
    ],
    ignore_index=True,
)
nuisance_metrics_path = save_table(nuisance_metrics, "nuisance_model_metrics")

estimator_table_path, nuisance_metrics_path, estimator_comparison
(PosixPath('/home/apex/Documents/ranking_sys/notebooks/projects/project_1_ranking/writeup/tables/estimator_comparison.csv'),
 PosixPath('/home/apex/Documents/ranking_sys/notebooks/projects/project_1_ranking/writeup/tables/nuisance_model_metrics.csv'),
             estimator    lift  ci_95_lower  ci_95_upper
 0               naive  0.0630          NaN          NaN
 1           ipw_99cap  0.0024          NaN          NaN
 2  outcome_regression  0.0016          NaN          NaN
 3           aipw_lgbm -0.0049      -0.0113       0.0014)

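Cross-fitting produces out-of-sample nuisance predictions for the treatment and outcome models, which reduces overfitting bias in the doubly robust scores. The results tell a consistent story: the naive top-3 lift of 6.3 percentage points shrinks to between -0.5 and +0.2 points under every adjusted estimator, and the AIPW 95% interval of [-1.13, 0.14] points includes zero. In this sample, the adjusted estimates cannot distinguish the top-3 effect from zero.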
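To sanity-check the mechanics of cross-fitted AIPW, the sketch below repeats the same steps on hypothetical simulated data where the true effect is known: two stratified folds, out-of-fold propensity and outcome predictions, the same 0.01 propensity clipping, and the same score formula. It swaps in scikit-learn's LogisticRegression for both nuisance models (correctly specified for this simulation) instead of LightGBM, so it illustrates the estimator rather than reproducing the notebook's numbers.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import StratifiedKFold


def sigmoid(z):
    return 1 / (1 + np.exp(-z))


# Hypothetical data with a known effect: x confounds exposure and clicks.
rng = np.random.default_rng(7)
n = 20_000
x = rng.normal(size=n)
t = rng.binomial(1, sigmoid(-1.0 + 1.0 * x))
y = rng.binomial(1, sigmoid(-2.0 + 0.8 * x + 0.5 * t))
# Sample-average effect implied by the simulation (the target of AIPW).
true_ate = np.mean(sigmoid(-2.0 + 0.8 * x + 0.5) - sigmoid(-2.0 + 0.8 * x))

X = x.reshape(-1, 1)
e_hat, mu1_hat, mu0_hat = np.zeros(n), np.zeros(n), np.zeros(n)
splitter = StratifiedKFold(n_splits=2, shuffle=True, random_state=42)
for train_idx, valid_idx in splitter.split(X, t):
    # Each nuisance model is fit on one fold and scored on the other.
    prop = LogisticRegression().fit(X[train_idx], t[train_idx])
    e_hat[valid_idx] = prop.predict_proba(X[valid_idx])[:, 1]

    outcome = LogisticRegression().fit(np.column_stack([x[train_idx], t[train_idx]]), y[train_idx])
    xv = x[valid_idx]
    mu1_hat[valid_idx] = outcome.predict_proba(np.column_stack([xv, np.ones_like(xv)]))[:, 1]
    mu0_hat[valid_idx] = outcome.predict_proba(np.column_stack([xv, np.zeros_like(xv)]))[:, 1]

e = np.clip(e_hat, 0.01, 0.99)  # same clipping as EPS in the notebook
scores = (mu1_hat - mu0_hat) + t * (y - mu1_hat) / e - (1 - t) * (y - mu0_hat) / (1 - e)
aipw = scores.mean()
se = scores.std(ddof=1) / np.sqrt(n)
naive = y[t == 1].mean() - y[t == 0].mean()
print(f"true {true_ate:.4f}  naive {naive:.4f}  AIPW {aipw:.4f} +/- {1.96 * se:.4f}")
```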

Save The Estimator Comparison Figure

This cell creates a polished estimator comparison plot for the final report. It contrasts the raw descriptive lift with adjusted estimates.

plt.figure(figsize=(9, 5.5))
sns.barplot(data=estimator_comparison, x="estimator", y="lift", color="#4C78A8")
for i, row in estimator_comparison.dropna(subset=["ci_95_lower", "ci_95_upper"]).iterrows():
    plt.errorbar(
        x=i,
        y=row["lift"],
        yerr=[[row["lift"] - row["ci_95_lower"]], [row["ci_95_upper"] - row["lift"]]],
        fmt="none",
        color="black",
        capsize=4,
    )
plt.axhline(0, color="black", linewidth=1)
plt.title("Top-3 Lift: Naive And Adjusted Estimates")
plt.xlabel("Estimator")
plt.ylabel("Lift in click probability")
plt.xticks(rotation=20, ha="right")
plt.tight_layout()
estimator_figure_path = save_figure("02_estimator_comparison")

estimator_figure_path
PosixPath('/home/apex/Documents/ranking_sys/notebooks/projects/project_1_ranking/writeup/figures/02_estimator_comparison.png')

The saved figure makes the gap between the naive and adjusted estimates visible at a glance. Only the AIPW bar carries a 95% error bar, because the other estimators are reported as point estimates.

Final Heterogeneous Effect Summary

A global average effect is useful, but product teams need to know where an intervention is most valuable. This section summarizes AIPW scores by interpretable segments, then saves a table of the top segments and a category-level figure.

Create Segment Columns

This cell creates the segment columns used for final heterogeneity summaries: category, subcategory, history bucket, candidate-set bucket, item-exposure quartile, and time of day.

model_df["history_bucket"] = pd.cut(
    model_df["history_len"],
    bins=[-1, 0, 10, 30, 100, np.inf],
    labels=["0", "1-10", "11-30", "31-100", "101+"],
)
model_df["candidate_set_bucket"] = pd.cut(
    model_df["candidate_set_size"],
    bins=[0, 10, 25, 50, 100, np.inf],
    labels=["1-10", "11-25", "26-50", "51-100", "101+"],
    include_lowest=True,
)
model_df["item_exposure_quartile"] = pd.qcut(
    model_df["item_exposures"].rank(method="first"),
    q=4,
    labels=["Q1 lowest", "Q2", "Q3", "Q4 highest"],
)
model_df["time_of_day"] = pd.cut(
    model_df["hour"],
    bins=[-1, 5, 11, 16, 20, 23],
    labels=["overnight", "morning", "afternoon", "evening", "late_evening"],
)

segment_columns = ["category", "subcategory", "history_bucket", "candidate_set_bucket", "item_exposure_quartile", "time_of_day"]
model_df[segment_columns].head()
category subcategory history_bucket candidate_set_bucket item_exposure_quartile time_of_day
0 news newsworld 31-100 26-50 Q2 afternoon
1 sports football_ncaa 31-100 51-100 Q2 overnight
2 news elections-2020-us 31-100 51-100 Q2 afternoon
3 travel traveltripideas 31-100 51-100 Q3 morning
4 news newsworld 31-100 51-100 Q2 morning

The segment columns translate raw covariates into product-readable groups. This prepares the analysis for heterogeneity and policy simulation, where segment-level effects are easier to act on than row-level scores.
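Two details of the segment cell are easy to get wrong. pd.cut intervals are right-closed, so hour 5 is still "overnight" and hour 11 is still "morning". And pd.qcut on heavily tied counts fails because the quantile edges repeat, which is why item_exposures is ranked with method="first" before cutting; the price is that tied rows are split across quartiles by row order. The sketch below checks both on hypothetical values.

```python
import pandas as pd

# Same hour edges and labels as the segment cell; intervals are right-closed.
hours = pd.Series([0, 5, 6, 11, 12, 16, 17, 20, 21, 23])
time_of_day = pd.cut(
    hours,
    bins=[-1, 5, 11, 16, 20, 23],
    labels=["overnight", "morning", "afternoon", "evening", "late_evening"],
)

# Hypothetical heavily tied exposure counts.
exposures = pd.Series([1, 1, 1, 1, 1, 1, 2, 3])
try:
    pd.qcut(exposures, q=4)  # quantile edges repeat, so pandas raises
    raw_qcut_failed = False
except ValueError:
    raw_qcut_failed = True

# Ranking first breaks ties, giving equal-sized quartiles.
quartile = pd.qcut(exposures.rank(method="first"), q=4, labels=["Q1 lowest", "Q2", "Q3", "Q4 highest"])

print(pd.DataFrame({"hour": hours, "time_of_day": time_of_day}))
print(quartile.value_counts(sort=False))
```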

Estimate Segment-Level AIPW Lift

This cell computes segment-level AIPW lift and confidence intervals. The segment table is used both for the heterogeneity figure and for the policy simulation.

def segment_effects(data, segment_col, min_rows=500, min_treated=30, min_control=250):
    rows = []
    for segment_value, group in data.groupby(segment_col, observed=True, dropna=False):
        n_rows = len(group)
        treated_rows = int(group["treatment"].sum())
        control_rows = n_rows - treated_rows
        if n_rows < min_rows or treated_rows < min_treated or control_rows < min_control:
            continue
        scores = group["aipw_score"].to_numpy()
        lift = scores.mean()
        se = scores.std(ddof=1) / np.sqrt(n_rows)
        naive_lift = group.loc[group["treatment"] == 1, "outcome"].mean() - group.loc[group["treatment"] == 0, "outcome"].mean()
        rows.append(
            {
                "segment_col": segment_col,
                "segment": str(segment_value),
                "rows": n_rows,
                "treated_rows": treated_rows,
                "control_rows": control_rows,
                "naive_lift": naive_lift,
                "aipw_lift": lift,
                "standard_error": se,
                "ci_95_lower": lift - 1.96 * se,
                "ci_95_upper": lift + 1.96 * se,
                "promotion_opportunities": control_rows,
            }
        )
    return pd.DataFrame(rows).sort_values("aipw_lift", ascending=False).reset_index(drop=True)


effect_tables = {
    segment_col: segment_effects(model_df, segment_col)
    for segment_col in segment_columns
}
all_segment_effects = pd.concat(effect_tables.values(), ignore_index=True)
top_segment_effects = all_segment_effects.sort_values("aipw_lift", ascending=False).head(25)
top_segment_table_path = save_table(top_segment_effects, "top_segment_effects")

top_segment_table_path, top_segment_effects.head(10)
(PosixPath('/home/apex/Documents/ranking_sys/notebooks/projects/project_1_ranking/writeup/tables/top_segment_effects.csv'),
              segment_col            segment  rows  treated_rows  control_rows  \
 14           subcategory       baseball_mlb   607            38           569   
 54  candidate_set_bucket               1-10  2350          1224          1126   
 15           subcategory  elections-2020-us   893           111           782   
 16           subcategory         travelnews  1244            81          1163   
 17           subcategory  weathertopstories   843           103           740   
 0               category            weather   843           103           740   
 18           subcategory             tvnews   793            97           696   
 19           subcategory             voices   736            35           701   
 1               category             sports  5993           522          5471   
 2               category             travel  3291           180          3111   
 
     naive_lift  aipw_lift  standard_error  ci_95_lower  ci_95_upper  \
 14      0.1087     0.0969          0.0663      -0.0330       0.2268   
 54      0.1294     0.0337          0.0159       0.0024       0.0649   
 15      0.0518     0.0307          0.0641      -0.0950       0.1564   
 16      0.1163     0.0251          0.0264      -0.0267       0.0768   
 17      0.1245     0.0167          0.0300      -0.0420       0.0755   
 0       0.1245     0.0167          0.0300      -0.0420       0.0755   
 18      0.1502     0.0165          0.0297      -0.0418       0.0748   
 19      0.0201     0.0156          0.0360      -0.0551       0.0862   
 1       0.1175     0.0153          0.0148      -0.0137       0.0444   
 2       0.0629     0.0116          0.0150      -0.0178       0.0410   
 
     promotion_opportunities  
 14                      569  
 54                     1126  
 15                      782  
 16                     1163  
 17                      740  
 0                       740  
 18                      696  
 19                      701  
 1                      5471  
 2                      3111  )

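This helper defines how segment-level effects are computed and filtered. Minimum row, treated, and control counts (500, 30, and 250 by default) keep results from being driven by tiny groups. Even so, most of the top segments have 95% intervals that include zero; among the rows shown, only the 1-10 candidate-set bucket has a lower bound above zero. Ranking segments by point estimate also tends to favor those whose estimates are inflated by noise, and some rows repeat across dimensions: category weather and subcategory weathertopstories contain the same 843 rows.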
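The per-segment arithmetic in segment_effects is the mean of the AIPW scores, its standard error sd / sqrt(n), and a normal 95% interval. A vectorized groupby version is sketched below on hypothetical scores. Like the notebook's helper, it treats rows as independent, so intervals may be too narrow if rows from the same impression or user are correlated.

```python
import numpy as np
import pandas as pd

# Hypothetical AIPW scores for two segments; in the notebook these come from
# model_df["aipw_score"] and the segment columns built above.
scores = pd.DataFrame(
    {
        "segment": ["a", "a", "a", "a", "b", "b", "b", "b"],
        "treatment": [1, 0, 0, 0, 1, 1, 0, 0],
        "aipw_score": [0.4, -0.1, 0.0, 0.1, 0.2, 0.0, 0.0, -0.2],
    }
)

summary = scores.groupby("segment").agg(
    rows=("aipw_score", "size"),
    treated_rows=("treatment", "sum"),
    aipw_lift=("aipw_score", "mean"),
    score_sd=("aipw_score", "std"),  # pandas std uses ddof=1, like the helper
)
summary["standard_error"] = summary["score_sd"] / np.sqrt(summary["rows"])
summary["ci_95_lower"] = summary["aipw_lift"] - 1.96 * summary["standard_error"]
summary["ci_95_upper"] = summary["aipw_lift"] + 1.96 * summary["standard_error"]
# The notebook would then drop segments below its row / treated / control minimums.
print(summary)
```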

Save Category-Level Heterogeneity Figure

This cell saves a category-level AIPW lift figure. Category is a useful final-report view because it is easier to explain than high-cardinality subcategories.

category_effects = effect_tables["category"].sort_values("aipw_lift")
plt.figure(figsize=(10, 6))
y = np.arange(len(category_effects))
plt.errorbar(
    x=category_effects["aipw_lift"],
    y=y,
    xerr=[
        category_effects["aipw_lift"] - category_effects["ci_95_lower"],
        category_effects["ci_95_upper"] - category_effects["aipw_lift"],
    ],
    fmt="o",
    capsize=3,
    color="#4C78A8",
)
plt.axvline(0, color="black", linewidth=1)
plt.yticks(y, category_effects["segment"])
plt.title("AIPW Top-3 Lift By Category")
plt.xlabel("Lift in click probability")
plt.ylabel("Category")
plt.tight_layout()
category_figure_path = save_figure("03_category_heterogeneous_effects")

category_figure_path
PosixPath('/home/apex/Documents/ranking_sys/notebooks/projects/project_1_ranking/writeup/figures/03_category_heterogeneous_effects.png')

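The category plot shows which content groups have the largest estimated ranking lift, together with their uncertainty. For the categories that also appear in the top-segment table (weather, sports, travel), the 95% intervals include zero, so the plot is better read as a ranking of candidates for experimentation than as evidence of category-specific effects.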

Final Policy Simulation Summary

The policy simulation converts segment-level lift into a simple prioritization exercise. If top-3 slots are scarce, which segmentation dimension yields the most expected incremental clicks from a limited promotion budget?

This remains an offline sizing exercise, not a production policy.

Simulate Budgeted Segment Promotion Policies

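This cell sets a budget equal to 5% of the non-top-3 rows in the modeling sample (2,759 promotions) and allocates it greedily within each segmentation dimension, highest-value segments first. It compares optimistic allocation using point estimates with conservative allocation using 95% lower confidence bounds.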

def allocate_budget(effect_df, budget, value_col="aipw_lift", min_value=0.0):
    candidates = effect_df.copy()
    candidates = candidates[candidates[value_col] > min_value].sort_values(value_col, ascending=False)
    remaining = int(budget)
    rows = []
    for _, row in candidates.iterrows():
        if remaining <= 0:
            break
        allocated = min(remaining, int(row["promotion_opportunities"]))
        if allocated <= 0:
            continue
        rows.append(
            {
                "segment_col": row["segment_col"],
                "segment": row["segment"],
                "allocated_promotions": allocated,
                "value_used": row[value_col],
                "expected_incremental_clicks": allocated * row[value_col],
            }
        )
        remaining -= allocated
    return pd.DataFrame(rows)


budget = int((model_df["treatment"] == 0).sum() * 0.05)
policy_rows = []
for segment_col, effect_df in effect_tables.items():
    for value_col in ["aipw_lift", "ci_95_lower"]:
        allocation = allocate_budget(effect_df, budget=budget, value_col=value_col)
        expected_clicks = allocation["expected_incremental_clicks"].sum() if len(allocation) else 0.0
        allocated_promotions = allocation["allocated_promotions"].sum() if len(allocation) else 0
        policy_rows.append(
            {
                "segment_dimension": segment_col,
                "value_rule": value_col,
                "budget": budget,
                "allocated_promotions": allocated_promotions,
                "expected_incremental_clicks": expected_clicks,
                "avg_incremental_click_prob_per_promotion": expected_clicks / allocated_promotions if allocated_promotions else 0.0,
            }
        )

global_baseline_clicks = budget * max(dr_lift, 0)
policy_rows.append(
    {
        "segment_dimension": "global_baseline",
        "value_rule": "global_aipw_lift",
        "budget": budget,
        "allocated_promotions": budget,
        "expected_incremental_clicks": global_baseline_clicks,
        "avg_incremental_click_prob_per_promotion": global_baseline_clicks / budget if budget else 0.0,
    }
)

policy_summary = pd.DataFrame(policy_rows).sort_values("expected_incremental_clicks", ascending=False)
policy_table_path = save_table(policy_summary, "policy_simulation_summary")

policy_table_path, policy_summary.head(12)
(PosixPath('/home/apex/Documents/ranking_sys/notebooks/projects/project_1_ranking/writeup/tables/policy_simulation_summary.csv'),
          segment_dimension   value_rule  budget  allocated_promotions  \
 2              subcategory    aipw_lift    2759                  2759   
 6     candidate_set_bucket    aipw_lift    2759                  2759   
 0                 category    aipw_lift    2759                  2759   
 10             time_of_day    aipw_lift    2759                  2759   
 4           history_bucket    aipw_lift    2759                  1215   
 7     candidate_set_bucket  ci_95_lower    2759                  1126   
 1                 category  ci_95_lower    2759                     0   
 3              subcategory  ci_95_lower    2759                     0   
 5           history_bucket  ci_95_lower    2759                     0   
 8   item_exposure_quartile    aipw_lift    2759                     0   
 9   item_exposure_quartile  ci_95_lower    2759                     0   
 11             time_of_day  ci_95_lower    2759                     0   
 
     expected_incremental_clicks  avg_incremental_click_prob_per_promotion  
 2                      112.4290                                    0.0407  
 6                       47.7708                                    0.0173  
 0                       43.3744                                    0.0157  
 10                      18.8868                                    0.0068  
 4                        9.1421                                    0.0075  
 7                        2.7185                                    0.0024  
 1                        0.0000                                    0.0000  
 3                        0.0000                                    0.0000  
 5                        0.0000                                    0.0000  
 8                        0.0000                                    0.0000  
 9                        0.0000                                    0.0000  
 11                       0.0000                                    0.0000  )

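The optimistic subcategory policy leads with about 112 expected incremental clicks (0.041 per promotion), but it selects and values segments using the same noisy point estimates, so it is likely overstated. Under the conservative lower-bound rule, only the candidate-set dimension allocates any budget (1,126 promotions, about 2.7 expected clicks), and the global baseline contributes zero because the global AIPW lift is negative. The gap between the two rules suggests the targeting signal is too uncertain to act on without an online test.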
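The lower-bound rule is strict: a segment receives budget only if its 95% lower bound is above zero, which is why most ci_95_lower rows in the table allocate nothing. The sketch below replays a trimmed copy of allocate_budget (it drops the zero-allocation guard and extra output columns) on a hypothetical three-segment table to show how the two rules diverge.

```python
import pandas as pd


def allocate_budget(effect_df, budget, value_col="aipw_lift", min_value=0.0):
    # Greedy: fill the highest-value segments first and skip non-positive ones.
    candidates = effect_df[effect_df[value_col] > min_value].sort_values(value_col, ascending=False)
    remaining = int(budget)
    rows = []
    for _, row in candidates.iterrows():
        if remaining <= 0:
            break
        allocated = min(remaining, int(row["promotion_opportunities"]))
        rows.append({"segment": row["segment"], "allocated_promotions": allocated,
                     "expected_incremental_clicks": allocated * row[value_col]})
        remaining -= allocated
    return pd.DataFrame(rows)


# Hypothetical segment table: point estimates vs. 95% lower bounds.
effects = pd.DataFrame({
    "segment": ["A", "B", "C"],
    "aipw_lift": [0.05, 0.02, -0.01],
    "ci_95_lower": [-0.02, 0.005, -0.04],
    "promotion_opportunities": [100, 300, 500],
})
optimistic = allocate_budget(effects, budget=250, value_col="aipw_lift")
conservative = allocate_budget(effects, budget=250, value_col="ci_95_lower")
print(optimistic, conservative, sep="\n\n")
```

Segment A looks best on its point estimate but drops out entirely under the lower-bound rule, which then spends the whole budget on B.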

Save Policy Simulation Figure

This cell saves a bar chart comparing expected incremental clicks across the strongest policy simulations. This is the most product-facing figure in the final report.

# Take the 12 highest-ranked policies and sort descending: seaborn draws the first
# category at the top of a horizontal bar chart, so the largest bar appears first.
policy_plot_df = policy_summary.head(12).sort_values("expected_incremental_clicks", ascending=False)
# Label each bar as "<segment dimension> / <value rule>".
policy_plot_df = policy_plot_df.assign(
    policy_label=policy_plot_df["segment_dimension"] + " / " + policy_plot_df["value_rule"]
)

plt.figure(figsize=(11, 6))
sns.barplot(data=policy_plot_df, x="expected_incremental_clicks", y="policy_label", color="#59A14F")
plt.title("Expected Incremental Clicks By Offline Promotion Policy")
plt.xlabel("Expected incremental clicks in modeling sample")
plt.ylabel("Policy")
plt.tight_layout()
policy_figure_path = save_figure("04_policy_simulation")

policy_figure_path
PosixPath('/home/apex/Documents/ranking_sys/notebooks/projects/project_1_ranking/writeup/figures/04_policy_simulation.png')

The policy plot summarizes the simulated product value of targeted promotion. It helps translate causal lift estimates into the kind of incremental-click story a recommender-system team can evaluate.
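`policy_summary` is built outside this section, so its exact logic is not shown here. The sketch below is a hedged illustration of one way to score a budgeted, segment-level promotion policy. It ranks segments by a value rule, skips segments whose value is not positive, and fills the promotion budget greedily. The `segments` table, its columns (`lift`, `ci_95_lower`, `n_available`), and the function name are all hypothetical. Expected incremental clicks are computed as promoted rows times estimated lift. The per-promotion average divides by the rows actually promoted, so a conservative rule can promote fewer rows than the budget allows.

```python
import pandas as pd


def simulate_budgeted_promotion(segments: pd.DataFrame, value_rule: str, budget: int) -> dict:
    """Hypothetical greedy scoring of a budgeted segment-level promotion policy.

    `segments` needs a `lift` column (estimated incremental click probability per
    promoted row), an `n_available` column (rows that could be promoted), and the
    column named by `value_rule`, which decides eligibility and priority.
    """
    # Only segments with a strictly positive value are eligible; best segments first.
    ranked = segments[segments[value_rule] > 0].sort_values(value_rule, ascending=False)
    remaining = budget
    promoted = 0
    expected_clicks = 0.0
    for _, seg in ranked.iterrows():
        if remaining <= 0:
            break
        take = int(min(remaining, seg["n_available"]))
        # Clicks are valued with the point estimate, even when ranking by a CI bound.
        expected_clicks += take * float(seg["lift"])
        promoted += take
        remaining -= take
    return {
        "value_rule": value_rule,
        "budget": budget,
        "rows_promoted": promoted,
        "expected_incremental_clicks": expected_clicks,
        "avg_incremental_click_prob_per_promotion": expected_clicks / promoted if promoted else 0.0,
    }


# Toy segments with made-up numbers (not MIND results).
segments = pd.DataFrame(
    {
        "segment": ["a", "b", "c"],
        "lift": [0.04, 0.01, -0.02],
        "ci_95_lower": [0.01, -0.005, -0.05],
        "n_available": [3, 3, 2],
    }
)
print(simulate_budgeted_promotion(segments, "lift", budget=4))
print(simulate_budgeted_promotion(segments, "ci_95_lower", budget=4))
```

In the toy run, the `lift` rule fills the budget of 4 rows. The `ci_95_lower` rule promotes only the 3 rows of segment `a`, because it is the only segment with a positive lower bound. This mirrors the pattern in the policy table, where lower-bound rules often yield few or zero incremental clicks.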

Final Sensitivity Summary

The full sensitivity notebook contains more detail. Here we save one compact overlap-trimming table and figure for the final report. This shows whether the adjusted estimate changes when we restrict to rows with stronger propensity overlap.

Create Overlap-Trimming Sensitivity Table And Figure

This cell trims rows to increasingly strict propensity-score ranges and recomputes the mean AIPW score. The resulting table and figure are saved as final-report artifacts.
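For reference, the quantities this cell computes can be written as follows. This assumes `aipw_score` holds the standard AIPW (doubly robust) score. Here $T_i$ is the top-3 indicator, $Y_i$ is the click, $\hat e$ is the propensity stored in `e_hat`, and $\hat\mu_0, \hat\mu_1$ are outcome-model predictions. The outcome-model notation is introduced here and is not a column name from the notebook. The trimming ranges in the code are symmetric, $[a, 1-a]$, and $s_a$ is the sample standard deviation (`ddof=1`) of the retained scores.

```latex
\psi_i = \hat\mu_1(X_i) - \hat\mu_0(X_i)
       + \frac{T_i\,\bigl(Y_i - \hat\mu_1(X_i)\bigr)}{\hat e(X_i)}
       - \frac{(1 - T_i)\,\bigl(Y_i - \hat\mu_0(X_i)\bigr)}{1 - \hat e(X_i)}

S_a = \{\, i : a \le \hat e(X_i) \le 1 - a \,\}, \qquad
\hat\tau_a = \frac{1}{|S_a|} \sum_{i \in S_a} \psi_i, \qquad
\hat\tau_a \pm 1.96\,\frac{s_a}{\sqrt{|S_a|}}
```

Inside $S_a$, the inverse-propensity factors $1/\hat e$ and $1/(1-\hat e)$ are at most $1/a$. That bound is 100, 20, 10, and about 6.7 for $a = 0.01, 0.05, 0.10, 0.15$. Trimming therefore caps how much any single row can move the estimate. The cost is a change of estimand: $\hat\tau_a$ describes the retained rows, not the full modeling sample.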

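overlap_rows = []
# Symmetric propensity ranges, from mild to strict trimming.
for lower, upper in [(0.01, 0.99), (0.05, 0.95), (0.10, 0.90), (0.15, 0.85)]:
    # Keep rows whose estimated propensity e_hat lies inside [lower, upper].
    kept = model_df.query("@lower <= e_hat <= @upper")
    scores = kept["aipw_score"]
    lift = scores.mean()
    # Normal-approximation standard error of the mean AIPW score.
    se = scores.std(ddof=1) / np.sqrt(len(scores))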
    overlap_rows.append(
        {
            "propensity_range": f"[{lower:.2f}, {upper:.2f}]",
            "rows_kept": len(kept),
            "share_kept": len(kept) / len(model_df),
            "aipw_lift": lift,
            "ci_95_lower": lift - 1.96 * se,
            "ci_95_upper": lift + 1.96 * se,
        }
    )

overlap_sensitivity = pd.DataFrame(overlap_rows)
overlap_table_path = save_table(overlap_sensitivity, "overlap_sensitivity")

plt.figure(figsize=(8, 5))
sns.pointplot(data=overlap_sensitivity, x="propensity_range", y="aipw_lift", color="#E15759")
plt.axhline(0, color="black", linewidth=1)
plt.title("AIPW Lift Under Propensity Overlap Trimming")
plt.xlabel("Retained propensity range")
plt.ylabel("AIPW lift")
plt.tight_layout()
overlap_figure_path = save_figure("05_overlap_sensitivity")

overlap_table_path, overlap_figure_path, overlap_sensitivity
(PosixPath('/home/apex/Documents/ranking_sys/notebooks/projects/project_1_ranking/writeup/tables/overlap_sensitivity.csv'),
 PosixPath('/home/apex/Documents/ranking_sys/notebooks/projects/project_1_ranking/writeup/figures/05_overlap_sensitivity.png'),
   propensity_range  rows_kept  share_kept  aipw_lift  ci_95_lower  ci_95_upper
 0     [0.01, 0.99]      58432      0.9739    -0.0042      -0.0107       0.0023
 1     [0.05, 0.95]      27440      0.4573     0.0028      -0.0059       0.0116
 2     [0.10, 0.90]      12220      0.2037     0.0057      -0.0067       0.0182
 3     [0.15, 0.85]       6158      0.1026     0.0063      -0.0103       0.0229)

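Under mild trimming ([0.01, 0.99], 97.4% of rows kept), the AIPW lift is -0.0042. As the retained range tightens to [0.15, 0.85] (10.3% of rows kept), it rises to 0.0063. Every 95% interval includes zero, and the intervals widen as rows are dropped. Only 45.7% of rows fall inside [0.05, 0.95], so overlap is limited. Within the better-supported region, the adjusted estimate is small and not distinguishable from zero.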
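The `aipw_score` column is produced in earlier notebooks and is not recomputed here. The sketch below is a hedged sanity check that assumes `aipw_score` is the standard AIPW score. It uses synthetic data with known nuisance functions (not MIND) to show how the score corrects a confounded naive comparison. A single binary flag `x`, loosely a "relevant item" indicator, raises both the chance of top-3 placement and the baseline click probability. The function names and the simulation design are hypothetical.

```python
import numpy as np


def aipw_scores(y, t, e_hat, mu0_hat, mu1_hat):
    """Per-row AIPW (doubly robust) scores; their mean estimates the average effect."""
    y, t, e_hat, mu0_hat, mu1_hat = (
        np.asarray(a, dtype=float) for a in (y, t, e_hat, mu0_hat, mu1_hat)
    )
    return (
        mu1_hat
        - mu0_hat
        + t * (y - mu1_hat) / e_hat
        - (1 - t) * (y - mu0_hat) / (1 - e_hat)
    )


def mean_with_normal_ci(scores, z=1.96):
    """Mean of the scores with a normal-approximation confidence interval."""
    scores = np.asarray(scores, dtype=float)
    estimate = scores.mean()
    se = scores.std(ddof=1) / np.sqrt(len(scores))
    return estimate, estimate - z * se, estimate + z * se


# Synthetic, hypothetical data (not MIND).
rng = np.random.default_rng(0)
n = 200_000
x = rng.binomial(1, 0.5, size=n)
e_true = np.where(x == 1, 0.8, 0.2)       # propensity of top-3 placement
t = rng.binomial(1, e_true)               # treatment: top-3 exposure
mu0_true = 0.05 + 0.10 * x                # click probability below position 3
mu1_true = mu0_true + 0.02                # true top-3 effect is 0.02 for every row
y = rng.binomial(1, np.where(t == 1, mu1_true, mu0_true))

naive = y[t == 1].mean() - y[t == 0].mean()  # confounded; population value is 0.08
ate, ci_lower, ci_upper = mean_with_normal_ci(aipw_scores(y, t, e_true, mu0_true, mu1_true))
print(f"naive={naive:.4f}  aipw={ate:.4f}  95% CI=[{ci_lower:.4f}, {ci_upper:.4f}]")
```

When relevance drives both placement and clicks, the naive difference (population value 0.08) overstates the true effect of 0.02. The AIPW mean should land close to 0.02. In the notebook, the nuisance functions are cross-fitted model predictions rather than the true functions used in this toy run.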

Limitations Table

A strong portfolio project should state limitations plainly. This table is written in product language so it can be placed directly in the final report.

Save The Final Limitations Table

This cell creates a final limitations table with four columns: the risk, why it matters, what the project did, and what would improve the evidence.

limitations = pd.DataFrame(
    [
        {
            "risk": "Unobserved confounding",
            "why_it_matters": "The logged ranker may use relevance scores, freshness signals, or user intent features that are missing from MIND.",
            "what_we_did": "Adjusted for observed user history, slate size, content metadata, time, and item exposure proxies.",
            "what_would_improve_it": "Use production ranker scores, richer user/item features, or randomized ranking experiments.",
        },
        {
            "risk": "Limited overlap",
            "why_it_matters": "Some rows may have weak counterfactual support because they are almost always top-ranked or lower-ranked.",
            "what_we_did": "Inspected propensity overlap and reported overlap-trimming sensitivity.",
            "what_would_improve_it": "Restrict claims to common-support regions or collect randomized exploration traffic.",
        },
        {
            "risk": "Clicks are short-term",
            "why_it_matters": "A click may not indicate satisfaction, retention, or long-term member value.",
            "what_we_did": "Used clicks because they are the available MIND outcome.",
            "what_would_improve_it": "Use dwell time, satisfaction, retention, or long-term engagement metrics.",
        },
        {
            "risk": "Interference across items",
            "why_it_matters": "Promoting one item changes exposure for other items in the same slate.",
            "what_we_did": "Framed policy simulation as prioritization rather than full re-ranking counterfactuals.",
            "what_would_improve_it": "Evaluate full-slate policies with online experiments or slate-aware causal methods.",
        },
    ]
)

limitations_table_path = save_table(limitations, "limitations")
limitations_table_path, limitations
(PosixPath('/home/apex/Documents/ranking_sys/notebooks/projects/project_1_ranking/writeup/tables/limitations.csv'),
                         risk  \
 0     Unobserved confounding   
 1            Limited overlap   
 2      Clicks are short-term   
 3  Interference across items   
 
                                       why_it_matters  \
 0  The logged ranker may use relevance scores, fr...   
 1  Some rows may have weak counterfactual support...   
 2  A click may not indicate satisfaction, retenti...   
 3  Promoting one item changes exposure for other ...   
 
                                          what_we_did  \
 0  Adjusted for observed user history, slate size...   
 1  Inspected propensity overlap and reported over...   
 2  Used clicks because they are the available MIN...   
 3  Framed policy simulation as prioritization rat...   
 
                                what_would_improve_it  
 0  Use production ranker scores, richer user/item...  
 1  Restrict claims to common-support regions or c...  
 2  Use dwell time, satisfaction, retention, or lo...  
 3  Evaluate full-slate policies with online exper...  )

This cell creates reusable project artifacts for the writeup. Saving figures, tables, limitations, and resume bullets makes the analysis easier to present outside the notebook itself.

Report Text Snippets

This section writes reusable markdown text for the README or final report. The text covers an executive summary, a methodology summary, the key finding, the policy implication, limitations, and next steps. It uses computed values from this notebook so it stays aligned with the generated figures and tables.

Save Final Report Snippets

This cell writes notebooks/projects/project_1_ranking/writeup/final_report_snippets.md. The wording is careful: it describes adjusted observational estimates and avoids claiming that logged data proves a production policy.

naive_lift = estimator_comparison.loc[estimator_comparison["estimator"] == "naive", "lift"].iloc[0]
aipw_lift = estimator_comparison.loc[estimator_comparison["estimator"] == "aipw_lgbm", "lift"].iloc[0]
aipw_lower = estimator_comparison.loc[estimator_comparison["estimator"] == "aipw_lgbm", "ci_95_lower"].iloc[0]
aipw_upper = estimator_comparison.loc[estimator_comparison["estimator"] == "aipw_lgbm", "ci_95_upper"].iloc[0]
best_policy = policy_summary.iloc[0]

# Word the key finding from the AIPW interval rather than hard-coding a direction,
# so the snippet cannot contradict the estimate it reports.
if aipw_lower > 0:
    key_finding = "Top-3 ranking exposure is associated with higher click probability after adjustment for observed confounders."
elif aipw_upper < 0:
    key_finding = "Top-3 ranking exposure is associated with lower click probability after adjustment for observed confounders."
else:
    key_finding = (
        "After adjustment for observed confounders, the 95% interval for the top-3 effect includes zero, "
        "so the average effect of top-3 exposure is not distinguishable from zero in this sample."
    )

report_text = f"""# Final Report Snippets

## Executive Summary

This project estimates the causal effect of ranking position on user clicks using MIND impression logs. The treatment is top-3 exposure, the outcome is click-through, and the adjustment set includes user-history, item metadata, slate-size, time, and exposure features.

The naive top-3 lift in the final modeling sample is {naive_lift:.4f}. After cross-fitted LightGBM AIPW adjustment, the estimated lift is {aipw_lift:.4f} with an approximate 95% interval [{aipw_lower:.4f}, {aipw_upper:.4f}].

## Methodology

The analysis starts with descriptive CTR-by-rank curves, then estimates adjusted treatment effects using propensity modeling, IPW, doubly robust AIPW, ML nuisance models, EconML causal ML estimators, heterogeneous treatment effect summaries, policy simulation, and sensitivity checks.

## Key Finding

{key_finding} Segment-level analysis suggests that the incremental value of top placement is not uniform across content and context segments.

## Policy Implication

The best offline policy simulation in this final notebook allocates a limited promotion budget by `{best_policy['segment_dimension']}` using `{best_policy['value_rule']}` and estimates {best_policy['expected_incremental_clicks']:.2f} incremental clicks in the modeling sample. This should be interpreted as prioritization for experimentation, not a guaranteed production effect.

## Limitations

The estimates are observational and rely on three assumptions: no unobserved confounding, adequate propensity overlap, and reasonably accurate nuisance models. MIND contains clicks, not long-term satisfaction or retention. A production system should validate ranking-policy changes through online experiments or randomized exploration traffic.

## Next Steps

1. Validate candidate policy changes with online experiments.
2. Replace public-data proxies with production ranker scores, item freshness, and richer user features.
3. Extend the outcome from clicks to downstream satisfaction or retention.
4. Use slate-aware methods to account for displacement and interference among items.
"""

report_text_path = REPORT_DIR / "final_report_snippets.md"
report_text_path.write_text(report_text, encoding="utf-8")

print(report_text)
report_text_path
# Final Report Snippets

## Executive Summary

This project estimates the causal effect of ranking position on user clicks using MIND impression logs. The treatment is top-3 exposure, the outcome is click-through, and the adjustment set includes user-history, item metadata, slate-size, time, and exposure features.

The naive top-3 lift in the final modeling sample is 0.0630. After cross-fitted LightGBM AIPW adjustment, the estimated lift is -0.0049 with an approximate 95% interval [-0.0113, 0.0014].

## Methodology

The analysis starts with descriptive CTR-by-rank curves, then estimates adjusted treatment effects using propensity modeling, IPW, doubly robust AIPW, ML nuisance models, EconML causal ML estimators, heterogeneous treatment effect summaries, policy simulation, and sensitivity checks.

## Key Finding

After adjustment for observed confounders, the 95% interval for the top-3 effect includes zero, so the average effect of top-3 exposure is not distinguishable from zero in this sample. Segment-level analysis suggests that the incremental value of top placement is not uniform across content and context segments.

## Policy Implication

The best offline policy simulation in this final notebook allocates a limited promotion budget by `subcategory` using `aipw_lift` and estimates 112.43 incremental clicks in the modeling sample. This should be interpreted as prioritization for experimentation, not a guaranteed production effect.

## Limitations

The estimates are observational and rely on three assumptions: no unobserved confounding, adequate propensity overlap, and reasonably accurate nuisance models. MIND contains clicks, not long-term satisfaction or retention. A production system should validate ranking-policy changes through online experiments or randomized exploration traffic.

## Next Steps

1. Validate candidate policy changes with online experiments.
2. Replace public-data proxies with production ranker scores, item freshness, and richer user features.
3. Extend the outcome from clicks to downstream satisfaction or retention.
4. Use slate-aware methods to account for displacement and interference among items.
PosixPath('/home/apex/Documents/ranking_sys/notebooks/projects/project_1_ranking/writeup/final_report_snippets.md')

Resume Bullets

The final portfolio project should also translate into concise resume bullets. These bullets emphasize the product question, causal methods, ML tooling, and business interpretation.

Save Resume Bullets

This cell writes notebooks/projects/project_1_ranking/writeup/resume_bullets.md. The bullets are intentionally concrete and mention both causal inference and recommendation-system context.

resume_bullets = """# Resume Bullets

- Estimated the causal effect of ranking position on user clicks using MIND impression logs; built an impression-level analysis table with rank, click, user-history, item metadata, slate-size, time, and exposure features.
- Implemented IPW, doubly robust AIPW, LightGBM/XGBoost nuisance models, EconML DRLearner/CausalForestDML, heterogeneous treatment effect analysis, policy simulation, and sensitivity checks for a recommendation ranking use case.
- Translated causal estimates into product recommendations by identifying high-lift content/context segments and simulating budgeted top-3 promotion policies under uncertainty-aware decision rules.
"""

resume_path = REPORT_DIR / "resume_bullets.md"
resume_path.write_text(resume_bullets, encoding="utf-8")

print(resume_bullets)
resume_path
# Resume Bullets

- Estimated the causal effect of ranking position on user clicks using MIND impression logs; built an impression-level analysis table with rank, click, user-history, item metadata, slate-size, time, and exposure features.
- Implemented IPW, doubly robust AIPW, LightGBM/XGBoost nuisance models, EconML DRLearner/CausalForestDML, heterogeneous treatment effect analysis, policy simulation, and sensitivity checks for a recommendation ranking use case.
- Translated causal estimates into product recommendations by identifying high-lift content/context segments and simulating budgeted top-3 promotion policies under uncertainty-aware decision rules.
PosixPath('/home/apex/Documents/ranking_sys/notebooks/projects/project_1_ranking/writeup/resume_bullets.md')

Artifact Index

This final cell lists the files generated by the notebook. These are the main artifacts to reference in the README or final portfolio writeup.

List Generated Report Artifacts

This cell lists generated figures, tables, and markdown snippets. It is a quick check that the final notebook wrote everything expected.
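Listing files shows what exists, but it does not flag what is missing. A minimal sketch of an explicit check follows. The helper name and the `EXPECTED_ARTIFACTS` list are hypothetical. The file names themselves are the ones written by the cells in this section, relative to the writeup directory. In the notebook, the call would be `find_missing_artifacts(REPORT_DIR, EXPECTED_ARTIFACTS)`, and the demo below uses a throwaway directory instead.

```python
import tempfile
from pathlib import Path


def find_missing_artifacts(report_dir: Path, expected: list[str]) -> list[str]:
    """Return the expected relative paths that are not files under report_dir."""
    return sorted(rel for rel in expected if not (report_dir / rel).is_file())


# Files written in this section, relative to the writeup directory.
EXPECTED_ARTIFACTS = [
    "figures/04_policy_simulation.png",
    "figures/05_overlap_sensitivity.png",
    "tables/overlap_sensitivity.csv",
    "tables/limitations.csv",
    "final_report_snippets.md",
    "resume_bullets.md",
]

# Demo: create every expected file except the last one, then check.
with tempfile.TemporaryDirectory() as tmp:
    demo_dir = Path(tmp)
    for rel in EXPECTED_ARTIFACTS[:-1]:
        target = demo_dir / rel
        target.parent.mkdir(parents=True, exist_ok=True)
        target.write_text("placeholder", encoding="utf-8")
    missing = find_missing_artifacts(demo_dir, EXPECTED_ARTIFACTS)

print(missing)  # ['resume_bullets.md']
```

An empty list means every expected artifact was written. Listing names explicitly catches a silently skipped cell, which a directory listing alone would not reveal.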

# Collect every file under the writeup directory (figures, tables, markdown snippets).
artifact_files = [path for path in sorted(REPORT_DIR.rglob("*")) if path.is_file()]
artifact_table = pd.DataFrame(
    {
        "path": [str(path.relative_to(PROJECT_ROOT)) for path in artifact_files],
        "size_kb": [path.stat().st_size / 1024 for path in artifact_files],
    }
)
artifact_table
    path                                                                   size_kb
 0  notebooks/projects/project_1_ranking/writeup/figures/01_naive_ctr...   96.1221
 1  notebooks/projects/project_1_ranking/writeup/figures/02_estimator...   99.3379
 2  notebooks/projects/project_1_ranking/writeup/figures/03_category_...  109.5117
 3  notebooks/projects/project_1_ranking/writeup/figures/04_policy_si...  186.7773
 4  notebooks/projects/project_1_ranking/writeup/figures/05_overlap_s...   88.4268
 5  notebooks/projects/project_1_ranking/writeup/final_report_snippet...    2.0029
 6  notebooks/projects/project_1_ranking/writeup/resume_bullets.md          0.6514
 7  notebooks/projects/project_1_ranking/writeup/tables/estimator_com...    0.2129
 8  notebooks/projects/project_1_ranking/writeup/tables/limitations.csv     1.1436
 9  notebooks/projects/project_1_ranking/writeup/tables/nuisance_mode...    0.3213
10  notebooks/projects/project_1_ranking/writeup/tables/overlap_sensi...    0.4795
11  notebooks/projects/project_1_ranking/writeup/tables/policy_simula...    0.8555
12  notebooks/projects/project_1_ranking/writeup/tables/rank_bucket_c...    0.2363
13  notebooks/projects/project_1_ranking/writeup/tables/top_segment_e...    3.6807

The index lists 14 files: five figures, seven tables, and two markdown snippets (`final_report_snippets.md` and `resume_bullets.md`). These are the artifacts to reference from the README or the final portfolio writeup.