Off-policy evaluation for recommendation systems

Causal Inference
Off-Policy Evaluation
Recommender Systems
Doubly Robust
Causal ML
A causal ML project estimating the counterfactual value of recommendation policies from logged bandit data with inverse propensity scoring (IPS), self-normalized IPS (SNIPS), the direct method, doubly robust off-policy evaluation (OPE), and support diagnostics.
Published

April 27, 2026

Decision Question

Which recommendation policy is credible enough to move from offline analysis into an online A/B test?

Causal Setup

  • Context: user, item, position, and time features available before the recommendation is made.
  • Action: the recommended item.
  • Reward: binary click indicator.
  • Behavior policy: the logged policy that generated the historical data.
  • Target policy: the candidate policy whose value is to be estimated offline.
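
Concretely, each logged interaction reduces to a single record. A minimal sketch of the implied schema, with illustrative field names (the actual columns in the logged dataset may differ):

```python
from dataclasses import dataclass
import numpy as np

@dataclass
class LoggedInteraction:
    """One row of logged bandit feedback; field names are illustrative."""
    context: np.ndarray   # user/item/position/time features known before the recommendation
    action: int           # item chosen by the behavior policy
    reward: float         # observed click (1.0) or no click (0.0)
    propensity: float     # behavior policy's probability of taking the logged action
```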

Methods

  • IPS and self-normalized IPS
  • Direct method reward modeling
  • Doubly robust off-policy evaluation
  • Effective sample size and weight-tail diagnostics
  • Contextual policy learning
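
As a rough sketch of how the first four methods fit together, assuming logged propensities pi_b, target-policy probabilities pi_e for the logged actions, and a fitted reward model (all names are illustrative, not the notebooks' actual code):

```python
import numpy as np

def ips(rewards, pi_e, pi_b, clip=None):
    """Inverse propensity scoring: reweight logged rewards by pi_e / pi_b."""
    w = pi_e / pi_b
    if clip is not None:
        w = np.minimum(w, clip)  # cap heavy-tailed importance weights
    return np.mean(w * rewards)

def snips(rewards, pi_e, pi_b):
    """Self-normalized IPS: divide by the weight sum instead of n."""
    w = pi_e / pi_b
    return np.sum(w * rewards) / np.sum(w)

def direct_method(q_hat_target):
    """Direct method: average the reward model's expected reward under the target policy."""
    return np.mean(q_hat_target)

def doubly_robust(rewards, pi_e, pi_b, q_hat_logged, q_hat_target):
    """DR: direct-method baseline plus an IPS correction on the reward model's residuals."""
    w = pi_e / pi_b
    return np.mean(q_hat_target + w * (rewards - q_hat_logged))

def effective_sample_size(pi_e, pi_b):
    """Kish effective sample size; a small value flags weak overlap between policies."""
    w = pi_e / pi_b
    return np.sum(w) ** 2 / np.sum(w ** 2)
```

Here q_hat_logged is the reward model's prediction for the logged action and q_hat_target its expected reward under the target policy at the same context; the doubly robust estimator stays consistent if either the propensities or the reward model are well specified.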

Portfolio Takeaway

The project shows how to distinguish policies that merely post high point estimates from credible offline candidates by auditing support, clipping sensitivity, and reward-model stability.
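
A clipping-sensitivity audit can be made concrete with a sweep over the weight cap, here on synthetic stand-in data using the hypothetical ips helper sketched above:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 10_000
pi_b = rng.uniform(0.02, 0.5, n)                  # logged propensities (synthetic)
pi_e = rng.uniform(0.02, 0.5, n)                  # target-policy probabilities (synthetic)
rewards = rng.binomial(1, 0.05, n).astype(float)  # clicks (synthetic)

# If the value estimate drops sharply as the cap tightens, the headline
# number is carried by a few extreme weights rather than broad support.
for clip in (None, 50.0, 20.0, 10.0, 5.0, 2.0):
    print(f"clip={clip}: V_hat={ips(rewards, pi_e, pi_b, clip=clip):.4f}")
```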

Selected Figures

01 Fixed-Policy OPE Estimates

02 Contextual Policy Value and Risk

03 Lift vs. Support

Notebook Sequence

The links below open rendered HTML versions of the notebooks. The source .ipynb files are kept in the matching folder under notebooks/projects/.

  1. 01 Open Bandit EDA
  2. 02 Behavior Policy and Propensity Diagnostics
  3. 03 IPS and SNIPS Policy Evaluation
  4. 04 Doubly Robust Off-Policy Evaluation
  5. 05 Policy Comparison and Sensitivity
  6. 06 Contextual Policy Learning
  7. 07 Final Report and Artifacts

Generated Artifacts

Limitations

These are notebook-driven causal analyses, not production guarantees. Each analysis should be read with its identification assumptions, support diagnostics, measurement choices, and sensitivity checks in view.