08. Causal Inference for Reinforcement Learning
Reinforcement Learning · Off-Policy Evaluation · Causal Inference · Lecture Notes
Sequential decisions, logged policies, bandits, off-policy evaluation, offline RL, RLHF, LLM agents, monitoring, and policy improvement.
This course is written for learners who have completed the earlier causal tracks but may be new to reinforcement learning. It builds up RL concepts from a causal viewpoint and then studies logged decisions, policy evaluation, offline RL, RLHF, and agentic systems.
Notebook links open rendered HTML pages generated from the source notebooks under notebooks/lectures/. Code is visible by default; rendering is configured not to execute notebook code, so cells that call local LLMs or need a GPU are not triggered during website builds.
Notebook Sequence
- 01. RL as a Sequential Causal System
- 02. States, Actions, Rewards, and Policies
- 03. Potential Outcomes for Sequential Decisions
- 04. Bandits, MDPs, and Causal Estimands
- 05. Logged Decisions, Propensities, and Support
- 06. Contextual Bandits and Randomized Exploration
- 07. Off-Policy Evaluation with Inverse Propensity Weighting
- 08. Variance, Clipping, and Self-Normalized OPE (a minimal sketch of the estimators from 07 and 08 follows this list)
- 09. Doubly Robust Off-Policy Evaluation
- 10. Model-Based OPE and Simulators
- 11. Fitted Q Evaluation and Value Functions
- 12. Policy Learning from Observational and Logged Data
- 13. Dynamic Treatment Regimes and the G-Formula
- 14. Time-Varying Confounding and Marginal Structural Models
- 15. Causal Graphs for Sequential Decision Systems
- 16. Confounded Bandits and Observational RL
- 17. Instruments, Encouragement, and Policy Variation
- 18. Offline RL Dataset Quality and Extrapolation Risk
- 19. Reward Design, Proxy Outcomes, and Goodhart Effects
- 20. Safe Policy Improvement and Guardrails
- 21. Heterogeneous Policy Effects and Personalization
- 22. Interference, Multi-Agent Systems, and Spillovers
- 23. RLHF, Preference Optimization, and Causal Questions
- 24. RL for LLM Agents, Tool Use, and Sequential Tasks
- 25. Online Experiments, Monitoring, and Policy Decay
- 26. Capstone: Evaluating and Improving a Logged Policy
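As a concrete taste of what notebooks 07 and 08 cover, here is a minimal sketch of inverse propensity scoring (IPS) together with its clipped and self-normalized variants, run on synthetic bandit data. The function name, the toy logging setup, and the numbers are illustrative assumptions, not code taken from the course notebooks.

```python
import numpy as np

def ips_estimates(rewards, logged_probs, target_probs, clip=10.0):
    """Estimate the value of a target policy from logged bandit data.

    rewards      : observed rewards for the logged actions
    logged_probs : propensities pi_0(a_i | x_i) under the logging policy
    target_probs : probabilities pi(a_i | x_i) under the policy we evaluate
    clip         : cap on the importance weights (a simple variance control)
    """
    w = target_probs / logged_probs                    # importance weights
    ips = np.mean(w * rewards)                         # plain IPS
    clipped = np.mean(np.minimum(w, clip) * rewards)   # clipped IPS
    snips = np.sum(w * rewards) / np.sum(w)            # self-normalized IPS
    return ips, clipped, snips

# Tiny synthetic check (illustrative setup): the logging policy is uniform
# over 2 actions; the target policy always picks action 1, whose true mean
# reward is 0.8, so every estimate should land near 0.8.
rng = np.random.default_rng(0)
n = 10_000
actions = rng.integers(0, 2, size=n)
rewards = rng.binomial(1, np.where(actions == 1, 0.8, 0.2)).astype(float)
logged_probs = np.full(n, 0.5)
target_probs = (actions == 1).astype(float)  # deterministic target policy
print(ips_estimates(rewards, logged_probs, target_probs))
```

With uniform logging, the weights here never exceed 2, so clipping is inert; on real logged data with small propensities, the gap between the plain, clipped, and self-normalized estimates is exactly the bias-variance trade-off that notebook 08 examines.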
How To Read This Track
- Work through the notebooks in order if you want the full course arc.
- Treat each notebook as a lecture plus a lab: read the discussion, inspect the code, and rerun it locally when you want to experiment.
- For AI-heavy notebooks, expect some brittleness when live model calls are enabled; that instability is part of the course material rather than something hidden from the reader.
The .ipynb sources remain in the matching folder under notebooks/lectures/.