04. Customer Retention and Churn Interventions

Retention analytics is often presented as a prediction problem:

Which customers are most likely to churn?

That is useful, but it is not the decision problem. The decision problem is:

Which customers should receive an intervention because the intervention changes their behavior enough to justify its cost?

Those are different questions. A customer can be very likely to churn and still be impossible to save. Another customer can have moderate churn risk but be highly responsive to a retention offer. A third customer may be loyal already and annoyed by unnecessary outreach.

This notebook treats churn intervention as a causal targeting problem. We will simulate a randomized retention campaign, compare churn-risk targeting with uplift targeting, estimate treatment effects, build targeting curves, account for customer lifetime value, and write a decision memo.

Learning Goals

By the end of this notebook, you should be able to:

  • Explain why churn prediction is not the same as retention targeting.
  • Define churn-risk, churn-reduction uplift, and net retention value.
  • Estimate an average treatment effect from a randomized retention campaign.
  • Use a two-model uplift approach to estimate heterogeneous retention effects.
  • Compare targeting by churn risk, uplift, customer lifetime value, and net value.
  • Build cumulative value curves for retention policies.
  • Include contact costs, offer costs, capacity constraints, and contact harm.
  • Translate a retention model into an operational decision memo.

1. Setup

We will use pandas, NumPy, SciPy, scikit-learn, seaborn, matplotlib, and Graphviz.

import warnings

warnings.filterwarnings("ignore")

from IPython.display import Markdown, display
from graphviz import Digraph
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import seaborn as sns
from scipy.special import expit
from scipy.stats import norm
from sklearn.compose import ColumnTransformer
from sklearn.ensemble import HistGradientBoostingClassifier
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder


rng = np.random.default_rng(20260430)

sns.set_theme(style="whitegrid", context="notebook")
plt.rcParams["figure.figsize"] = (10, 5)
plt.rcParams["axes.spines.top"] = False
plt.rcParams["axes.spines.right"] = False
plt.rcParams["figure.dpi"] = 130


def dollars(x, digits=0):
    return f"${x:,.{digits}f}"


def pct(x, digits=1):
    return f"{100 * x:.{digits}f}%"


def styled_table(df, money_cols=None, pct_cols=None, num_cols=None):
    money_cols = money_cols or []
    pct_cols = pct_cols or []
    num_cols = num_cols or []
    fmt = {}
    for col in money_cols:
        fmt[col] = lambda v: dollars(v, 2) if abs(v) < 100 else dollars(v, 0)
    for col in pct_cols:
        fmt[col] = lambda v: pct(v, 2)
    for col in num_cols:
        fmt[col] = lambda v: f"{v:,.3f}"
    return df.style.format(fmt)

2. Churn Prediction Is Not Retention Targeting

Let \(Y_i=1\) mean customer \(i\) churns during the measurement window.

A churn-risk model estimates:

\[ r(x) = P(Y=1\mid X=x) \]

A retention intervention model asks a different question. Let:

\[ Y_i(1) = \text{churn outcome if customer } i \text{ receives the intervention} \]

\[ Y_i(0) = \text{churn outcome if customer } i \text{ does not receive the intervention} \]

For churn prevention, define uplift as the expected churn reduction among customers with features \(x\):

\[ \tau(x) = E[Y(0)-Y(1)\mid X=x] \]

Positive \(\tau(x)\) means the intervention reduces churn. Negative \(\tau(x)\) means the intervention increases churn or harms the customer relationship.

If the customer has expected lifetime value \(CLV(x)\) and the intervention costs \(C(x)\), the business estimand is net retention value:

\[ \text{Net Value}(x) = \tau(x)\cdot CLV(x) - C(x) \]

The operational targeting rule is:

\[ \text{Contact customer } i \text{ if } \widehat{\text{Net Value}}(X_i)>0 \]

possibly subject to capacity, fairness, and customer-experience constraints.
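As a worked illustration of the rule above: setting Net Value to zero gives a break-even uplift of \(C(x)/CLV(x)\), so a customer is worth contacting only when the estimated churn reduction exceeds that ratio. The sketch below applies this to three hypothetical customers. The labels and all numbers are assumptions chosen for illustration, not values from this notebook's simulation.

```python
# Hypothetical customers: (label, estimated uplift tau, CLV in dollars, contact cost in dollars).
# All numbers are illustrative assumptions, not outputs of this notebook.
customers = [
    ("high risk, hard to save", 0.01, 400.0, 15.0),
    ("moderate risk, responsive", 0.08, 500.0, 15.0),
    ("loyal, annoyed by contact", -0.02, 900.0, 15.0),
]


def net_value(tau, clv, cost):
    """Net Value(x) = tau(x) * CLV(x) - C(x)."""
    return tau * clv - cost


def break_even_uplift(clv, cost):
    """Uplift at which net value is exactly zero: tau = C / CLV."""
    return cost / clv


decisions = {}
for label, tau, clv, cost in customers:
    value = net_value(tau, clv, cost)
    decisions[label] = value > 0
    print(
        f"{label:28s} net value = {value:7.2f}  "
        f"break-even tau = {break_even_uplift(clv, cost):.3f}  contact = {value > 0}"
    )
```

Only the moderate-risk, responsive customer clears the break-even threshold in this toy setup. The loyal customer has the highest CLV but a negative uplift, so contact destroys value.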

Rzepakowski and Jaroszewicz (2012) make the key distinction in uplift modeling: traditional response models estimate response under treatment, while uplift models estimate the change in behavior caused by the action. Zhang, Li, and Liu (2020) connect uplift modeling with heterogeneous treatment-effect modeling under standard causal assumptions. In churn settings, this distinction is the difference between “likely to leave” and “worth contacting.”

dot = Digraph("retention_decision_graph", format="svg")
dot.attr(rankdir="LR", bgcolor="transparent")
dot.attr("node", shape="box", style="rounded,filled", color="#3B4252", fillcolor="#EEF2F7", fontname="DejaVu Sans")
dot.attr("edge", color="#5E6C84", fontname="DejaVu Sans")

dot.node("features", "Customer features\nusage, support,\npayment, tenure")
dot.node("risk", "Churn risk\nP(churn without action)")
dot.node("uplift", "Causal uplift\nchurn reduction")
dot.node("clv", "Customer value\nCLV, margin")
dot.node("cost", "Intervention cost\nsupport time,\ndiscount, fatigue")
dot.node("net", "Net retention value")
dot.node("policy", "Targeting policy")
dot.node("outcome", "Observed outcome\nretained or churned")

dot.edge("features", "risk")
dot.edge("features", "uplift")
dot.edge("features", "clv")
dot.edge("risk", "net", label="not enough")
dot.edge("uplift", "net")
dot.edge("clv", "net")
dot.edge("cost", "net")
dot.edge("net", "policy")
dot.edge("policy", "outcome", label="causal effect")

dot

Customer lifetime value matters because not every saved customer has the same financial value. Glady, Baesens, and Croux (2009) propose modeling churn using customer lifetime value and emphasize identifying customers for whom a retention action will be profitable. Oskarsdottir, Baesens, and Vanthienen (2018) similarly argue for profit-based retention model selection using individual customer lifetime values.
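To make the role of CLV concrete, here is a minimal sketch of the undiscounted, margin-based CLV formula that the simulation in the next section uses (monthly value times gross margin times expected remaining months). The function name `simple_clv` and the 12-month horizon are assumptions for this example. The $58 monthly value, 0.62 margin, and 1.75 premium multiplier mirror constants in the simulation code.

```python
def simple_clv(monthly_value, gross_margin, expected_remaining_months):
    """Undiscounted CLV: monthly revenue times margin times expected remaining months."""
    return monthly_value * gross_margin * expected_remaining_months


# Illustrative inputs; the 12-month horizon is an assumption for this example.
clv_example = simple_clv(monthly_value=58.0, gross_margin=0.62, expected_remaining_months=12)

# The same 5-point churn reduction is worth more on a higher-value customer.
value_of_uplift_standard = 0.05 * clv_example
value_of_uplift_premium = 0.05 * simple_clv(58.0 * 1.75, 0.62, 12)

print(round(clv_example, 2), round(value_of_uplift_standard, 2), round(value_of_uplift_premium, 2))
```

Because the gross value of a save scales with CLV, two customers with identical uplift can fall on opposite sides of the contact decision once cost is subtracted.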

3. Running Example: Subscription Retention Campaign

Imagine a subscription business with a monthly renewal cycle. The company can offer an intervention to customers at risk:

  • concierge onboarding call,
  • billing support,
  • tailored discount,
  • plan-fit review,
  • proactive troubleshooting.

The intervention has costs:

  • staff time,
  • discount expense,
  • operational load,
  • possible contact fatigue.

We simulate a randomized retention experiment in which each eligible customer is assigned to the intervention with probability 0.5, so roughly half are treated. Because this is a simulation, we know both potential outcomes. In real retention work, we only observe one outcome per customer.

def simulate_retention_experiment(n=80_000, treatment_share=0.50, seed=404):
    local_rng = np.random.default_rng(seed)

    segments = local_rng.choice(
        ["new_user", "price_sensitive", "power_user", "support_heavy", "premium"],
        size=n,
        p=[0.24, 0.28, 0.22, 0.16, 0.10],
    )
    contract_type = local_rng.choice(["monthly", "annual"], size=n, p=[0.72, 0.28])

    tenure_months = np.clip(local_rng.gamma(shape=2.2, scale=8.0, size=n), 1, 72)
    monthly_value = local_rng.lognormal(mean=np.log(58), sigma=0.45, size=n)
    monthly_value *= np.where(segments == "premium", 1.75, 1.0)
    monthly_value *= np.where(segments == "price_sensitive", 0.82, 1.0)

    usage_days_30 = np.clip(
        local_rng.normal(14, 6, size=n)
        + 5 * (segments == "power_user")
        - 4 * (segments == "new_user")
        - 3 * (segments == "support_heavy"),
        0,
        30,
    )
    support_tickets_90 = local_rng.poisson(
        0.6
        + 1.6 * (segments == "support_heavy")
        + 0.4 * (segments == "new_user"),
        size=n,
    )
    payment_failures = local_rng.binomial(
        2,
        np.clip(0.08 + 0.20 * (segments == "price_sensitive") + 0.10 * (contract_type == "monthly"), 0, 0.65),
        size=n,
    )
    satisfaction = np.clip(
        local_rng.normal(0.58, 0.18, size=n)
        + 0.18 * (segments == "power_user")
        + 0.10 * (segments == "premium")
        - 0.18 * (segments == "support_heavy")
        - 0.10 * payment_failures,
        0,
        1,
    )
    competitor_signal = np.clip(
        local_rng.beta(2.0, 5.0, size=n)
        + 0.22 * (segments == "price_sensitive")
        + 0.14 * (support_tickets_90 >= 3),
        0,
        1,
    )
    email_fatigue = np.clip(
        local_rng.beta(1.8, 4.5, size=n)
        + 0.20 * (segments == "premium")
        + 0.10 * (usage_days_30 > 24),
        0,
        1,
    )

    base_logit = (
        -1.55
        + 1.15 * competitor_signal
        + 0.42 * payment_failures
        + 0.18 * support_tickets_90
        - 1.35 * satisfaction
        - 0.055 * usage_days_30
        - 0.015 * tenure_months
        + 0.42 * (contract_type == "monthly")
        + 0.28 * (segments == "new_user")
        + 0.20 * (segments == "price_sensitive")
    )
    p0 = expit(base_logit)

    # Saveability is highest for moderate-risk customers with fixable problems.
    moderate_risk = np.exp(-((p0 - 0.38) ** 2) / 0.055)
    fixable_problem = (
        0.55 * (support_tickets_90 >= 1)
        + 0.45 * (payment_failures >= 1)
        + 0.35 * (competitor_signal > 0.35)
        + 0.20 * (segments == "new_user")
    )
    save_probability = (
        0.012
        + 0.105 * moderate_risk
        + 0.035 * fixable_problem
        + 0.020 * (segments == "premium")
        - 0.040 * (p0 > 0.78)
    )
    contact_harm = (
        0.010
        + 0.035 * email_fatigue * (p0 < 0.20)
        + 0.018 * (segments == "power_user") * (support_tickets_90 == 0)
    )
    p1 = np.clip(p0 - save_probability + contact_harm, 0.002, 0.96)

    shared_draw = local_rng.uniform(size=n)
    churn0 = (shared_draw < p0).astype(int)
    churn1 = (shared_draw < p1).astype(int)

    treatment = local_rng.binomial(1, treatment_share, size=n)
    observed_churn = np.where(treatment == 1, churn1, churn0)

    expected_remaining_months = np.clip(
        7.5
        + 0.15 * tenure_months
        + 5.0 * satisfaction
        + 4.0 * (contract_type == "annual")
        - 3.0 * competitor_signal,
        3,
        30,
    )
    gross_margin = 0.62
    clv = monthly_value * gross_margin * expected_remaining_months

    intervention_cost = (
        8.0
        + 0.05 * monthly_value
        + 12.0 * (payment_failures >= 1)
        + 7.0 * (support_tickets_90 >= 3)
        + 4.0 * (segments == "premium")
    )

    profit0 = (1 - churn0) * clv
    profit1 = (1 - churn1) * clv - intervention_cost
    observed_profit = np.where(treatment == 1, profit1, profit0)

    df = pd.DataFrame(
        {
            "customer_id": np.arange(n),
            "segment": segments,
            "contract_type": contract_type,
            "tenure_months": tenure_months,
            "monthly_value": monthly_value,
            "usage_days_30": usage_days_30,
            "support_tickets_90": support_tickets_90,
            "payment_failures": payment_failures,
            "satisfaction": satisfaction,
            "competitor_signal": competitor_signal,
            "email_fatigue": email_fatigue,
            "treatment": treatment,
            "churn": observed_churn,
            "observed_profit": observed_profit,
            "clv": clv,
            "intervention_cost": intervention_cost,
            "p_churn0": p0,
            "p_churn1": p1,
            "true_churn_uplift": p0 - p1,
            "expected_profit_uplift": (p0 - p1) * clv - intervention_cost,
            "churn0": churn0,
            "churn1": churn1,
            "profit0": profit0,
            "profit1": profit1,
            "realized_profit_uplift": profit1 - profit0,
        }
    )

    df["latent_response_type"] = np.select(
        [
            (df["churn0"] == 1) & (df["churn1"] == 0),
            (df["churn0"] == 1) & (df["churn1"] == 1),
            (df["churn0"] == 0) & (df["churn1"] == 0),
            (df["churn0"] == 0) & (df["churn1"] == 1),
        ],
        [
            "saved by intervention",
            "not saved",
            "would stay anyway",
            "harmed by contact",
        ],
        default="other",
    )
    return df


retention = simulate_retention_experiment()

display(retention.head())
print(f"Rows: {len(retention):,}")
print(f"Treatment share: {retention['treatment'].mean():.1%}")
print(f"Observed churn rate: {retention['churn'].mean():.1%}")
print(f"Mean CLV: {dollars(retention['clv'].mean(), 0)}")
print(f"Mean intervention cost if contacted: {dollars(retention['intervention_cost'].mean(), 0)}")

customer_id segment contract_type tenure_months monthly_value usage_days_30 support_tickets_90 payment_failures satisfaction competitor_signal ... p_churn0 p_churn1 true_churn_uplift expected_profit_uplift churn0 churn1 profit0 profit1 realized_profit_uplift latent_response_type
0 0 price_sensitive annual 14.047458 22.247393 15.689335 1 0 0.390349 0.617586 ... 0.113008 0.062117 0.050892 0.508876 1 1 0.000000 -9.112370 -9.112370 not saved
1 1 price_sensitive monthly 7.139827 31.844243 11.305244 0 2 0.387166 0.478387 ... 0.311865 0.185364 0.126501 1.064794 1 1 0.000000 -21.592212 -21.592212 not saved
2 2 price_sensitive monthly 40.326813 33.588314 14.649203 0 0 0.388226 0.412206 ... 0.083886 0.049810 0.034077 0.435416 0 0 296.826395 287.146979 -9.679416 would stay anyway
3 3 power_user annual 7.051698 87.023121 17.448516 1 0 0.682430 0.440927 ... 0.054699 0.009692 0.045007 23.216791 0 0 790.275768 777.924612 -12.351156 would stay anyway
4 4 price_sensitive monthly 6.429099 104.297991 18.328131 0 0 0.407344 0.581802 ... 0.128384 0.091274 0.037109 7.795775 0 0 566.183759 552.968859 -13.214900 would stay anyway

5 rows × 26 columns

Rows: 80,000
Treatment share: 49.9%
Observed churn rate: 10.2%
Mean CLV: $531
Mean intervention cost if contacted: $17

The simulation creates four latent response types:

  • customers saved by the intervention,
  • customers who would churn even with intervention,
  • customers who would stay anyway,
  • customers harmed by unnecessary contact.

In real data these types are latent because we never observe both potential outcomes. They are included here only to make the targeting problem visible.

type_summary = (
    retention.groupby("latent_response_type")
    .agg(
        customers=("customer_id", "size"),
        share=("customer_id", lambda x: len(x) / len(retention)),
        avg_p_churn0=("p_churn0", "mean"),
        avg_true_churn_uplift=("true_churn_uplift", "mean"),
        avg_expected_profit_uplift=("expected_profit_uplift", "mean"),
        avg_clv=("clv", "mean"),
    )
    .reset_index()
    .sort_values("share", ascending=False)
)

display(
    styled_table(
        type_summary,
        pct_cols=["share", "avg_p_churn0", "avg_true_churn_uplift"],
        money_cols=["avg_expected_profit_uplift", "avg_clv"],
    )
)

   latent_response_type   customers   share  avg_p_churn0  avg_true_churn_uplift  avg_expected_profit_uplift  avg_clv
3  would stay anyway          69628  87.03%        12.09%                  4.94%                       $7.58     $541
1  not saved                   6022   7.53%        19.51%                  7.61%                      $13.00     $451
2  saved by intervention       4295   5.37%        18.81%                  8.18%                      $16.66     $473
0  harmed by contact             55   0.07%         3.23%                 -1.66%                     $-20.65     $570

Already we can see why retention is not a simple classification problem. The intervention can be profitable for some customers, wasteful for others, and harmful for a small subset.
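The latent types also connect directly to the average effect. With binary outcomes, the average churn reduction equals the share of customers saved minus the share harmed, because the other two types contribute zero. A small sketch using the counts from the table above (the variable names are introduced here for illustration):

```python
# Counts copied from the latent response type table above.
type_counts = {
    "would stay anyway": 69_628,
    "not saved": 6_022,
    "saved by intervention": 4_295,
    "harmed by contact": 55,
}
n_customers = sum(type_counts.values())

# E[Y(0) - Y(1)] = P(saved) - P(harmed); the other two types contribute zero.
implied_churn_reduction = (
    type_counts["saved by intervention"] - type_counts["harmed by contact"]
) / n_customers

# Churn without intervention = P(saved) + P(not saved).
implied_control_churn = (
    type_counts["saved by intervention"] + type_counts["not saved"]
) / n_customers

print(f"{implied_churn_reduction:.4f} {implied_control_churn:.4f}")
```

These figures are based on the realized potential-outcome draws, so they can differ slightly from averages of the underlying probabilities `p_churn0` and `p_churn1`.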

4. Experiment Balance and Average Treatment Effect

Because treatment is randomized, pre-treatment features should be balanced across treatment groups. Balance is not proof of correct execution, but imbalance is a warning sign.
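One common way to put a number on balance is the standardized mean difference (SMD): the difference in group means divided by a pooled standard deviation. The sketch below is an illustration on synthetic data, not part of this notebook's pipeline. The function name and the synthetic `tenure` covariate are assumptions, and the often-quoted threshold of about 0.1 for "worth investigating" is a rule of thumb rather than a formal test.

```python
import numpy as np


def standardized_mean_difference(x_treated, x_control):
    """Difference in means divided by the pooled standard deviation."""
    x_treated = np.asarray(x_treated, dtype=float)
    x_control = np.asarray(x_control, dtype=float)
    pooled_sd = np.sqrt((x_treated.var(ddof=1) + x_control.var(ddof=1)) / 2)
    return (x_treated.mean() - x_control.mean()) / pooled_sd


# Synthetic covariate and coin-flip assignment, only to exercise the function.
demo_rng = np.random.default_rng(0)
tenure = demo_rng.gamma(shape=2.2, scale=8.0, size=20_000)
assigned = demo_rng.binomial(1, 0.5, size=20_000).astype(bool)

smd_randomized = standardized_mean_difference(tenure[assigned], tenure[~assigned])

# A deliberately broken assignment that treats only long-tenure customers.
biased = tenure > np.median(tenure)
smd_biased = standardized_mean_difference(tenure[biased], tenure[~biased])

print(f"randomized SMD: {smd_randomized:+.3f}, biased SMD: {smd_biased:+.3f}")
```

Under proper randomization the SMD should sit near zero for every pre-treatment feature. The biased assignment shows what a failure looks like.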

balance = (
    retention.groupby("treatment")
    .agg(
        customers=("customer_id", "size"),
        monthly_value=("monthly_value", "mean"),
        tenure_months=("tenure_months", "mean"),
        usage_days_30=("usage_days_30", "mean"),
        support_tickets_90=("support_tickets_90", "mean"),
        payment_failures=("payment_failures", "mean"),
        satisfaction=("satisfaction", "mean"),
        competitor_signal=("competitor_signal", "mean"),
        clv=("clv", "mean"),
    )
    .rename(index={0: "Control", 1: "Retention intervention"})
    .reset_index(names="group")
)

display(
    styled_table(
        balance,
        money_cols=["monthly_value", "clv"],
        num_cols=["tenure_months", "usage_days_30", "support_tickets_90", "payment_failures", "satisfaction", "competitor_signal"],
    )
)

   group                   customers  monthly_value  tenure_months  usage_days_30  support_tickets_90  payment_failures  satisfaction  competitor_signal   clv
0  Control                     40067         $65.86         17.505         13.672               0.955             0.417         0.559              0.360  $532
1  Retention intervention      39933         $65.43         17.561         13.697               0.950             0.419         0.558              0.360  $529

def ate_binary_reduction(df, outcome_col="churn", treatment_col="treatment"):
    treated = df[treatment_col].eq(1)
    y_t = df.loc[treated, outcome_col]
    y_c = df.loc[~treated, outcome_col]
    # For churn, positive means treatment reduces churn.
    effect = y_c.mean() - y_t.mean()
    se = np.sqrt(y_t.var(ddof=1) / len(y_t) + y_c.var(ddof=1) / len(y_c))
    return effect, effect - 1.96 * se, effect + 1.96 * se


def ate_profit(df, outcome_col="observed_profit", treatment_col="treatment"):
    treated = df[treatment_col].eq(1)
    y_t = df.loc[treated, outcome_col]
    y_c = df.loc[~treated, outcome_col]
    effect = y_t.mean() - y_c.mean()
    se = np.sqrt(y_t.var(ddof=1) / len(y_t) + y_c.var(ddof=1) / len(y_c))
    return effect, effect - 1.96 * se, effect + 1.96 * se


churn_effect, churn_lo, churn_hi = ate_binary_reduction(retention)
profit_effect, profit_lo, profit_hi = ate_profit(retention)

ate_readout = pd.DataFrame(
    {
        "metric": [
            "Control churn rate",
            "Treatment churn rate",
            "Churn reduction ATE",
            "Churn reduction CI low",
            "Churn reduction CI high",
            "Profit ATE per customer",
            "Profit ATE CI low",
            "Profit ATE CI high",
        ],
        "value": [
            retention.loc[retention["treatment"].eq(0), "churn"].mean(),
            retention.loc[retention["treatment"].eq(1), "churn"].mean(),
            churn_effect,
            churn_lo,
            churn_hi,
            profit_effect,
            profit_lo,
            profit_hi,
        ],
    }
)

display(ate_readout)
print(f"True average churn reduction: {retention['true_churn_uplift'].mean():.2%}")
print(f"True average expected net value if everyone is contacted: {dollars(retention['expected_profit_uplift'].mean(), 2)} per customer")

   metric                       value
0  Control churn rate        0.127337
1  Treatment churn rate      0.076428
2  Churn reduction ATE       0.050909
3  Churn reduction CI low    0.046732
4  Churn reduction CI high   0.055085
5  Profit ATE per customer   4.173300
6  Profit ATE CI low        -0.721662
7  Profit ATE CI high        9.068262

True average churn reduction: 5.31%
True average expected net value if everyone is contacted: $8.45 per customer

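The average treatment effect answers whether the intervention works on average. Here the churn reduction is clearly positive (about 5.1 percentage points, with an interval of roughly 4.7 to 5.5 points), but the interval for the profit effect (-$0.72 to $9.07 per customer) includes zero, so the experiment alone does not establish that contacting everyone pays off. The targeting question is more specific: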

Among all eligible customers, who should receive the intervention?

An average effect can be positive while broad targeting is still inefficient.
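A toy calculation makes the point. The ten per-customer net values below are hypothetical numbers chosen for illustration. They average out positive, yet most of the population loses money when contacted.

```python
import numpy as np

# Hypothetical per-customer expected net values (dollars) for a population of 10.
# Illustrative numbers only: a few responsive customers carry the average.
net_values = np.array([60.0, 45.0, 30.0, 5.0, -5.0, -8.0, -10.0, -12.0, -12.0, -13.0])

average_effect = net_values.mean()                  # value per customer if everyone is contacted
contact_all_total = net_values.sum()
targeted_total = net_values[net_values > 0].sum()   # oracle rule: contact only positive-value customers
value_lost_to_overcontact = targeted_total - contact_all_total

print(average_effect, contact_all_total, targeted_total, value_lost_to_overcontact)
```

The oracle rule used here assumes the true net values are known, which is never the case in practice. It only shows how much room a positive average can leave for better targeting.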

5. Churn-Risk Model

A standard retention workflow begins with churn prediction. We will train a model on the randomized control group to estimate churn risk without intervention:

\[ \hat{r}(x) \approx P(Y(0)=1\mid X=x) \]

This is a valid and useful model, but it estimates risk, not saveability.

feature_cols = [
    "segment",
    "contract_type",
    "tenure_months",
    "monthly_value",
    "usage_days_30",
    "support_tickets_90",
    "payment_failures",
    "satisfaction",
    "competitor_signal",
    "email_fatigue",
]
categorical_cols = ["segment", "contract_type"]
numeric_cols = [c for c in feature_cols if c not in categorical_cols]


def make_classifier(seed=1, max_iter=140):
    preprocess = ColumnTransformer(
        transformers=[
            ("cat", OneHotEncoder(handle_unknown="ignore", sparse_output=False), categorical_cols),
            ("num", "passthrough", numeric_cols),
        ]
    )
    model = HistGradientBoostingClassifier(
        max_iter=max_iter,
        learning_rate=0.055,
        max_leaf_nodes=24,
        random_state=seed,
    )
    return Pipeline([("preprocess", preprocess), ("model", model)])


train_df, test_df = train_test_split(
    retention,
    test_size=0.40,
    random_state=17,
    stratify=retention["treatment"],
)

control_train = train_df.query("treatment == 0")
control_test = test_df.query("treatment == 0")

risk_model = make_classifier(seed=11)
risk_model.fit(control_train[feature_cols], control_train["churn"])

test_df = test_df.copy()
test_df["risk_score"] = risk_model.predict_proba(test_df[feature_cols])[:, 1]

auc_control = roc_auc_score(control_test["churn"], risk_model.predict_proba(control_test[feature_cols])[:, 1])
print(f"Control-group churn-risk AUC: {auc_control:.3f}")

Control-group churn-risk AUC: 0.699

fig, axes = plt.subplots(1, 2, figsize=(12, 4.5))

sns.scatterplot(
    data=test_df.sample(2500, random_state=3),
    x="risk_score",
    y="true_churn_uplift",
    hue="segment",
    alpha=0.55,
    ax=axes[0],
)
axes[0].set_title("Churn risk and saveability are related but not identical")
axes[0].set_xlabel("Predicted churn risk without intervention")
axes[0].set_ylabel("True churn reduction")
axes[0].yaxis.set_major_formatter(lambda x, pos: f"{100*x:.0f}%")
axes[0].xaxis.set_major_formatter(lambda x, pos: f"{100*x:.0f}%")
axes[0].legend(fontsize=7, title="")

sns.scatterplot(
    data=test_df.sample(2500, random_state=5),
    x="risk_score",
    y="expected_profit_uplift",
    hue="contract_type",
    alpha=0.55,
    ax=axes[1],
)
axes[1].axhline(0, color="#6B7280", linewidth=1)
axes[1].set_title("High churn risk is not always profitable to target")
axes[1].set_xlabel("Predicted churn risk without intervention")
axes[1].set_ylabel("True expected net value")
axes[1].xaxis.set_major_formatter(lambda x, pos: f"{100*x:.0f}%")

plt.tight_layout()
plt.show()

A model can rank churn risk well and still be suboptimal for targeting. The highest-risk customers may include many customers who are already too far gone, while moderately risky customers with fixable problems may be more responsive.
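The mechanism can be isolated in a stylized sketch. It assumes saveability is a bump centered at moderate risk, reusing the shape of the `moderate_risk` term from the simulation, and spreads customers evenly over a grid of baseline churn probabilities. Both choices are simplifying assumptions that exaggerate the gap relative to the notebook's data.

```python
import numpy as np

# Baseline churn probabilities on an even grid (illustrative, not fitted to data).
p0 = np.linspace(0.01, 0.95, 500)

# Assumed saveability shape: a bump centered at moderate risk, mirroring the
# `moderate_risk` term used in the simulation above.
uplift = 0.012 + 0.105 * np.exp(-((p0 - 0.38) ** 2) / 0.055)

n_top = len(p0) // 10  # top 10% of customers

# Risk ranking picks the highest p0; uplift ranking picks the highest uplift.
top_by_risk = np.argsort(-p0)[:n_top]
top_by_uplift = np.argsort(-uplift)[:n_top]

mean_uplift_risk_ranked = uplift[top_by_risk].mean()
mean_uplift_uplift_ranked = uplift[top_by_uplift].mean()

print(f"{mean_uplift_risk_ranked:.3f} vs {mean_uplift_uplift_ranked:.3f}")
```

A perfect risk ranking selects the customers at the far right of the grid, exactly where the assumed bump has decayed to almost nothing.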

6. Uplift Modeling with a Two-Model Approach

A simple uplift method trains separate outcome models for treated and control customers:

\[ \hat{\mu}_0(x) = \widehat{E}[Y\mid W=0,X=x] \]

\[ \hat{\mu}_1(x) = \widehat{E}[Y\mid W=1,X=x] \]

For churn reduction:

\[ \hat{\tau}(x) = \hat{\mu}_0(x)-\hat{\mu}_1(x) \]

This is often called a two-model or T-learner approach. It is easy to explain to stakeholders and works naturally with randomized retention data.

control_model = make_classifier(seed=21)
treated_model = make_classifier(seed=22)

control_model.fit(train_df.query("treatment == 0")[feature_cols], train_df.query("treatment == 0")["churn"])
treated_model.fit(train_df.query("treatment == 1")[feature_cols], train_df.query("treatment == 1")["churn"])

test_df["p0_hat"] = control_model.predict_proba(test_df[feature_cols])[:, 1]
test_df["p1_hat"] = treated_model.predict_proba(test_df[feature_cols])[:, 1]
test_df["uplift_hat"] = test_df["p0_hat"] - test_df["p1_hat"]
test_df["net_value_hat"] = test_df["uplift_hat"] * test_df["clv"] - test_df["intervention_cost"]

uplift_quality = pd.DataFrame(
    {
        "metric": [
            "Correlation: predicted uplift vs true uplift",
            "Correlation: predicted net value vs true net value",
            "Mean predicted uplift",
            "Mean true uplift",
            "Mean predicted net value",
            "Mean true net value",
        ],
        "value": [
            test_df[["uplift_hat", "true_churn_uplift"]].corr().iloc[0, 1],
            test_df[["net_value_hat", "expected_profit_uplift"]].corr().iloc[0, 1],
            test_df["uplift_hat"].mean(),
            test_df["true_churn_uplift"].mean(),
            test_df["net_value_hat"].mean(),
            test_df["expected_profit_uplift"].mean(),
        ],
    }
)

display(uplift_quality)

   metric                                                value
0  Correlation: predicted uplift vs true uplift       0.617777
1  Correlation: predicted net value vs true net v...  0.501015
2  Mean predicted uplift                              0.049698
3  Mean true uplift                                   0.053243
4  Mean predicted net value                           6.704789
5  Mean true net value                                8.540971

fig, axes = plt.subplots(1, 2, figsize=(12, 4.5))

sns.scatterplot(
    data=test_df.sample(2800, random_state=8),
    x="uplift_hat",
    y="true_churn_uplift",
    hue="segment",
    alpha=0.55,
    ax=axes[0],
)
axes[0].axhline(0, color="#6B7280", linewidth=1)
axes[0].axvline(0, color="#6B7280", linewidth=1)
axes[0].set_title("Predicted versus true churn reduction")
axes[0].set_xlabel("Predicted uplift")
axes[0].set_ylabel("True uplift")
axes[0].xaxis.set_major_formatter(lambda x, pos: f"{100*x:.0f}%")
axes[0].yaxis.set_major_formatter(lambda x, pos: f"{100*x:.0f}%")
axes[0].legend(fontsize=7, title="")

sns.scatterplot(
    data=test_df.sample(2800, random_state=9),
    x="net_value_hat",
    y="expected_profit_uplift",
    hue="contract_type",
    alpha=0.55,
    ax=axes[1],
)
axes[1].axhline(0, color="#6B7280", linewidth=1)
axes[1].axvline(0, color="#6B7280", linewidth=1)
axes[1].set_title("Predicted versus true net retention value")
axes[1].set_xlabel("Predicted net value")
axes[1].set_ylabel("True expected net value")

plt.tight_layout()
plt.show()

The uplift model does not need to be perfect to be useful. It needs to rank customers better than simpler targeting rules under business constraints.
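The correlations above rely on simulated ground truth. With real randomized data, ranking quality is usually checked from observed outcomes instead, for example by comparing control and treated churn rates within bins of predicted uplift. The following is a minimal sketch on synthetic data. The function `uplift_by_bin` and the data-generating assumptions are illustrative and not part of this notebook.

```python
import numpy as np


def uplift_by_bin(score, treatment, churn, n_bins=5):
    """Observed churn reduction (control rate minus treated rate) within score bins.

    Bins are ordered from highest predicted uplift to lowest.
    """
    order = np.argsort(-score)
    rows = []
    for chunk in np.array_split(order, n_bins):
        w, y = treatment[chunk], churn[chunk]
        rows.append(y[w == 0].mean() - y[w == 1].mean())
    return np.array(rows)


# Synthetic randomized data in which the score is informative by construction.
demo_rng = np.random.default_rng(1)
n = 50_000
true_uplift = demo_rng.uniform(0.0, 0.10, size=n)
score = true_uplift + demo_rng.normal(0, 0.02, size=n)   # noisy model prediction
treatment = demo_rng.binomial(1, 0.5, size=n)
p_churn = 0.20 - true_uplift * treatment
churn = (demo_rng.uniform(size=n) < p_churn).astype(int)

observed = uplift_by_bin(score, treatment, churn)
print(np.round(observed, 3))
```

A useful model shows observed churn reduction falling from the first bin to the last. A flat or reversed pattern suggests the ranking adds little beyond random targeting.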

7. Policy Evaluation: Who Should We Contact?

We now compare several targeting policies on the test set:

  • contact nobody,
  • contact everybody,
  • contact the top 25% by churn risk,
  • contact the top 25% by predicted uplift,
  • contact the top 25% by predicted net value,
  • contact customers with positive predicted net value.

Because we simulated the true potential outcomes, we can evaluate each policy using true expected net value. In real life, this comparison would be estimated from randomized validation data or an online experiment.

def evaluate_policy(df, contact_mask, label):
    contacted = df.loc[contact_mask]
    total_net_value = contacted["expected_profit_uplift"].sum()
    saved_probability = contacted["true_churn_uplift"].sum()
    total_cost = contacted["intervention_cost"].sum()
    gross_saved_clv = (contacted["true_churn_uplift"] * contacted["clv"]).sum()
    return {
        "policy": label,
        "customers_contacted": int(contact_mask.sum()),
        "contact_share": contact_mask.mean(),
        "expected_saves": saved_probability,
        "gross_saved_clv": gross_saved_clv,
        "intervention_cost": total_cost,
        "expected_net_value": total_net_value,
        "net_value_per_contact": total_net_value / max(contact_mask.sum(), 1),
    }


def top_share_mask(df, score_col, share=0.25):
    cutoff_n = int(np.ceil(len(df) * share))
    selected_index = df.sort_values(score_col, ascending=False).head(cutoff_n).index
    return df.index.isin(selected_index)


policy_rows = []
policy_rows.append(evaluate_policy(test_df, np.zeros(len(test_df), dtype=bool), "Contact nobody"))
policy_rows.append(evaluate_policy(test_df, np.ones(len(test_df), dtype=bool), "Contact everybody"))
policy_rows.append(evaluate_policy(test_df, top_share_mask(test_df, "risk_score", 0.25), "Top 25% by churn risk"))
policy_rows.append(evaluate_policy(test_df, top_share_mask(test_df, "uplift_hat", 0.25), "Top 25% by uplift"))
policy_rows.append(evaluate_policy(test_df, top_share_mask(test_df, "net_value_hat", 0.25), "Top 25% by net value"))
policy_rows.append(evaluate_policy(test_df, test_df["net_value_hat"].gt(0).to_numpy(), "Predicted positive net value"))

policy_eval = pd.DataFrame(policy_rows)

display(
    styled_table(
        policy_eval,
        pct_cols=["contact_share"],
        money_cols=["gross_saved_clv", "intervention_cost", "expected_net_value", "net_value_per_contact"],
        num_cols=["expected_saves"],
    )
)

   policy                        customers_contacted  contact_share  expected_saves  gross_saved_clv  intervention_cost  expected_net_value  net_value_per_contact
0  Contact nobody                                  0          0.00%           0.000            $0.00              $0.00               $0.00                  $0.00
1  Contact everybody                           32000        100.00%       1,703.762         $808,644           $535,333            $273,311                  $8.54
2  Top 25% by churn risk                        8000         25.00%         832.043         $323,326           $164,143            $159,183                 $19.90
3  Top 25% by uplift                            8000         25.00%         743.733         $301,724           $163,379            $138,345                 $17.29
4  Top 25% by net value                         8000         25.00%         595.412         $354,269           $153,653            $200,615                 $25.08
5  Predicted positive net value                17386         54.33%       1,116.903         $577,097           $299,597            $277,500                 $15.96

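This table is the heart of retention targeting. At a 25% contact share, ranking by predicted net value delivers the most value in this run ($200,615, versus $159,183 for churn-risk ranking and $138,345 for uplift-only ranking), because it combines uplift, CLV, and cost. Churn-risk targeting can contact many customers who look dangerous but are not responsive enough to justify the intervention cost, and uplift-only ranking ignores how much each saved customer is worth.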
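Without simulated ground truth, a policy's value has to be estimated from randomized data. One standard option is inverse-propensity weighting: keep the customers whose random assignment happens to match what the policy would have done, and reweight their observed outcomes by the assignment probability. The sketch below is a generic illustration on synthetic data. The function `ipw_policy_value`, the `responsive` feature, and the dollar effects are assumptions, not this notebook's evaluation method.

```python
import numpy as np


def ipw_policy_value(policy, treatment, outcome, propensity=0.5):
    """Inverse-propensity estimate of the mean outcome if `policy` were deployed.

    Only customers whose randomized assignment matches the policy contribute.
    """
    match = policy == treatment
    prob_of_observed_arm = np.where(treatment == 1, propensity, 1 - propensity)
    return np.mean(match * outcome / prob_of_observed_arm)


# Synthetic randomized experiment (illustrative assumptions throughout).
demo_rng = np.random.default_rng(2)
n = 200_000
responsive = demo_rng.binomial(1, 0.3, size=n)     # an observable feature
treatment = demo_rng.binomial(1, 0.5, size=n)
# Contact is worth +30 for responsive customers and -10 (pure cost) for everyone else.
effect = np.where(responsive == 1, 30.0, -10.0)
profit = 100.0 + effect * treatment + demo_rng.normal(0, 20, size=n)

value_nobody = ipw_policy_value(np.zeros(n, dtype=int), treatment, profit)
value_everybody = ipw_policy_value(np.ones(n, dtype=int), treatment, profit)
value_targeted = ipw_policy_value(responsive, treatment, profit)

print(round(value_nobody, 1), round(value_everybody, 1), round(value_targeted, 1))
```

The estimator needs the true assignment probability, which a randomized experiment supplies by design. Its variance grows as fewer customers match the policy, so confidence intervals matter when comparing close policies.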

def cumulative_value_curve(df, score_col, label):
    ordered = df.sort_values(score_col, ascending=False).copy()
    ordered["cum_net_value"] = ordered["expected_profit_uplift"].cumsum()
    ordered["cum_saves"] = ordered["true_churn_uplift"].cumsum()
    ordered["contact_share"] = np.arange(1, len(ordered) + 1) / len(ordered)
    sample = ordered.iloc[np.linspace(0, len(ordered) - 1, 300).astype(int)].copy()
    sample["policy"] = label
    return sample[["contact_share", "cum_net_value", "cum_saves", "policy"]]


random_df = test_df.copy()
random_df["random_score"] = rng.normal(size=len(random_df))

curves = pd.concat(
    [
        cumulative_value_curve(test_df, "risk_score", "Churn risk ranking"),
        cumulative_value_curve(test_df, "uplift_hat", "Uplift ranking"),
        cumulative_value_curve(test_df, "net_value_hat", "Net value ranking"),
        cumulative_value_curve(random_df, "random_score", "Random targeting"),
    ],
    ignore_index=True,
)

fig, axes = plt.subplots(1, 2, figsize=(13, 4.8))

sns.lineplot(data=curves, x="contact_share", y="cum_net_value", hue="policy", ax=axes[0])
axes[0].axhline(0, color="#6B7280", linewidth=1)
axes[0].set_title("Cumulative expected net value")
axes[0].set_xlabel("Share of customers contacted")
axes[0].set_ylabel("Cumulative net value")
axes[0].xaxis.set_major_formatter(lambda x, pos: f"{100*x:.0f}%")

sns.lineplot(data=curves, x="contact_share", y="cum_saves", hue="policy", ax=axes[1], legend=False)
axes[1].set_title("Cumulative expected churn saves")
axes[1].set_xlabel("Share of customers contacted")
axes[1].set_ylabel("Expected saved customers")
axes[1].xaxis.set_major_formatter(lambda x, pos: f"{100*x:.0f}%")

plt.tight_layout()
plt.show()

The best policy is not necessarily the one that saves the most customers. If it saves low-value customers at high cost, it can be less profitable than a smaller, sharper targeting rule.
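The policy table earlier in this section already shows this split. The sketch below copies the expected saves and expected net value for the three 25% policies from that table and checks which policy wins on each criterion.

```python
# Expected saves and expected net value copied from the policy table above.
policies = {
    "Top 25% by churn risk": {"expected_saves": 832.043, "expected_net_value": 159_183},
    "Top 25% by uplift": {"expected_saves": 743.733, "expected_net_value": 138_345},
    "Top 25% by net value": {"expected_saves": 595.412, "expected_net_value": 200_615},
}

best_by_saves = max(policies, key=lambda name: policies[name]["expected_saves"])
best_by_value = max(policies, key=lambda name: policies[name]["expected_net_value"])

# Net value generated per expected saved customer under each policy.
value_per_save = {
    name: row["expected_net_value"] / row["expected_saves"] for name, row in policies.items()
}

print(best_by_saves, "|", best_by_value)
```

Churn-risk ranking saves the most customers, while net-value ranking saves fewer but more valuable ones and ends up with the highest net value at the same contact share.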

8. Segment-Level Treatment Effects

Stakeholders often want to know where the intervention works. We can estimate treatment effects by segment.

Within a randomized experiment:

\[ \hat{\tau}_g = \bar{Y}_{g,W=0} - \bar{Y}_{g,W=1} \]

for churn reduction in segment \(g\).

segment_rows = []
for segment, g in retention.groupby("segment"):
    effect, lo, hi = ate_binary_reduction(g)
    profit_eff, profit_lo_g, profit_hi_g = ate_profit(g)
    segment_rows.append(
        {
            "segment": segment,
            "customers": len(g),
            "control_churn": g.loc[g["treatment"].eq(0), "churn"].mean(),
            "treatment_churn": g.loc[g["treatment"].eq(1), "churn"].mean(),
            "estimated_churn_reduction": effect,
            "ci_low": lo,
            "ci_high": hi,
            "profit_ate_per_customer": profit_eff,
            "true_mean_net_value": g["expected_profit_uplift"].mean(),
        }
    )

segment_effects = pd.DataFrame(segment_rows).sort_values("estimated_churn_reduction", ascending=False)

display(
    styled_table(
        segment_effects,
        pct_cols=["control_churn", "treatment_churn", "estimated_churn_reduction", "ci_low", "ci_high"],
        money_cols=["profit_ate_per_customer", "true_mean_net_value"],
    )
)
   segment          customers  control_churn  treatment_churn  estimated_churn_reduction  ci_low  ci_high  profit_ate_per_customer  true_mean_net_value
3  price_sensitive      22399         16.25%            9.43%                      6.81%   5.94%    7.69%                    $7.36                $7.76
4  support_heavy        12825         16.91%           10.56%                      6.35%   5.17%    7.54%                    $9.33               $14.01
0  new_user             19158         14.20%            8.61%                      5.60%   4.70%    6.49%                   $15.18               $15.65
2  premium               7971          8.28%            3.87%                      4.41%   3.37%    5.46%                   $-8.59               $18.85
1  power_user           17647          5.58%            3.96%                      1.63%   1.00%    2.26%                  $-12.51               $-7.22

fig, ax = plt.subplots(figsize=(10, 4.8))
plot_segments = segment_effects.sort_values("estimated_churn_reduction")
ax.errorbar(
    plot_segments["estimated_churn_reduction"],
    plot_segments["segment"],
    xerr=[
        plot_segments["estimated_churn_reduction"] - plot_segments["ci_low"],
        plot_segments["ci_high"] - plot_segments["estimated_churn_reduction"],
    ],
    fmt="o",
    color="#2F80ED",
    capsize=4,
)
ax.axvline(0, color="#6B7280", linewidth=1)
ax.set_title("Segment-level churn reduction")
ax.set_xlabel("Control churn minus treatment churn")
ax.xaxis.set_major_formatter(lambda x, pos: f"{100*x:.0f}%")
plt.tight_layout()
plt.show()

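Segment analysis is useful for communication, but it is coarse: every customer in a segment gets the same verdict. It also shows that churn reduction alone does not settle the decision. In the segment table above, power_user has a churn reduction whose interval excludes zero (1.00% to 2.26%) yet a negative net value per customer. A policy can perform better by ranking individual customers on predicted net value, including within a segment.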
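A small sketch of why a customer-level rule can beat a segment-level rule. The segment labels reuse the notebook's names, but the net values are invented for illustration and are assumed to be known exactly.

```python
from collections import defaultdict
from statistics import mean

# (segment, net value of contacting this customer); illustrative numbers only.
customers = [
    ("premium", 40.0), ("premium", 25.0), ("premium", -10.0), ("premium", -15.0),
    ("power_user", 12.0), ("power_user", 5.0), ("power_user", -20.0), ("power_user", -25.0),
]

by_segment = defaultdict(list)
for segment, value in customers:
    by_segment[segment].append(value)

# Segment rule: contact everyone in a segment whose average net value is positive.
segment_rule_value = sum(
    sum(values) for values in by_segment.values() if mean(values) > 0
)

# Customer rule: contact only customers whose own net value is positive.
customer_rule_value = sum(value for _, value in customers if value > 0)

print(segment_rule_value, customer_rule_value)
```

The segment rule pays for the negative-value customers inside a "good" segment and skips the positive-value customers inside a "bad" one. With estimated rather than true net values the gap will be smaller, but the direction of the argument is the same.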

9. Capacity Constraints

Retention teams often have limited operational capacity. Suppose the support team can contact only 6,000 customers this cycle. If capacity is binding, the right policy is not “contact every customer with positive predicted net value.” It is “contact the customers with the highest expected net value, up to the capacity limit.”

capacity = 6_000
capacity_df = test_df.copy()

capacity_policies = {
    "Top capacity by churn risk": "risk_score",
    "Top capacity by uplift": "uplift_hat",
    "Top capacity by net value": "net_value_hat",
}

capacity_rows = []
for label, score_col in capacity_policies.items():
    selected = capacity_df.sort_values(score_col, ascending=False).head(capacity).index
    mask = capacity_df.index.isin(selected)
    capacity_rows.append(evaluate_policy(capacity_df, mask, label))

capacity_eval = pd.DataFrame(capacity_rows)

display(
    styled_table(
        capacity_eval,
        pct_cols=["contact_share"],
        money_cols=["gross_saved_clv", "intervention_cost", "expected_net_value", "net_value_per_contact"],
        num_cols=["expected_saves"],
    )
)
   policy                      customers_contacted  contact_share  expected_saves  gross_saved_clv  intervention_cost  expected_net_value  net_value_per_contact
0  Top capacity by churn risk                 6000         18.75%         671.913         $252,931           $126,281            $126,650                 $21.11
1  Top capacity by uplift                     6000         18.75%         597.626         $236,405           $126,619            $109,785                 $18.30
2  Top capacity by net value                  6000         18.75%         463.766         $289,957           $119,300            $170,658                 $28.44

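This is the operational form of causal targeting. Under the 6,000-contact cap, net-value targeting saves the fewest customers (about 464, versus 672 for churn-risk targeting) but delivers the highest expected net value ($170,658 versus $126,650), because it saves more CLV ($289,957) at lower cost ($119,300). A model is only valuable if it changes the action list under real constraints.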
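One refinement worth considering: when capacity is larger than the number of customers worth contacting, a plain top-N list will include customers with negative predicted net value. The sketch below combines both rules. `select_contacts` and the scores are hypothetical, not part of the notebook.

```python
import heapq


def select_contacts(net_value_by_customer, capacity):
    """Return up to `capacity` customer ids, highest predicted net value first.

    Customers with non-positive predicted net value are never selected,
    even when capacity is left over.
    """
    positive = {cid: v for cid, v in net_value_by_customer.items() if v > 0}
    return heapq.nlargest(capacity, positive, key=positive.get)


# Illustrative predicted net values per customer id.
scores = {"a": 30.0, "b": -5.0, "c": 12.5, "d": 0.0, "e": 44.0}

tight = select_contacts(scores, capacity=2)   # capacity binds
slack = select_contacts(scores, capacity=10)  # capacity does not bind

print(tight)
print(slack)
```

When capacity binds, the list is simply the top of the ranking. When it does not, the positivity filter decides the depth.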

10. Sensitivity to Intervention Cost

Retention offers are not free. If discount size, support time, or incentive cost increases, fewer customers clear the break-even threshold and the optimal contact depth shrinks.

cost_multipliers = np.linspace(0.5, 2.0, 16)
sensitivity_rows = []

for mult in cost_multipliers:
    temp = test_df.copy()
    temp["expected_profit_uplift_cost_scenario"] = (
        temp["true_churn_uplift"] * temp["clv"] - mult * temp["intervention_cost"]
    )
    temp["net_value_hat_cost_scenario"] = temp["uplift_hat"] * temp["clv"] - mult * temp["intervention_cost"]
    selected = temp["net_value_hat_cost_scenario"].gt(0)
    sensitivity_rows.append(
        {
            "cost_multiplier": mult,
            "contact_share": selected.mean(),
            "expected_net_value": temp.loc[selected, "expected_profit_uplift_cost_scenario"].sum(),
            "expected_saves": temp.loc[selected, "true_churn_uplift"].sum(),
        }
    )

sensitivity = pd.DataFrame(sensitivity_rows)

display(styled_table(sensitivity, pct_cols=["contact_share"], money_cols=["expected_net_value"], num_cols=["expected_saves"]))
    cost_multiplier  contact_share  expected_net_value  expected_saves
 0         0.500000         74.41%            $493,239       1,429.249
 1         0.600000         70.32%            $444,203       1,371.409
 2         0.700000         66.33%            $397,831       1,312.538
 3         0.800000         62.38%            $354,687       1,250.150
 4         0.900000         58.27%            $314,558       1,184.324
 5         1.000000         54.33%            $277,500       1,116.903
 6         1.100000         50.62%            $242,743       1,052.824
 7         1.200000         47.11%            $211,139         988.616
 8         1.300000         43.64%            $182,280         925.450
 9         1.400000         40.40%            $156,303         862.636
10         1.500000         37.39%            $134,313         805.215
11         1.600000         34.55%            $113,002         748.778
12         1.700000         31.75%             $94,866         692.812
13         1.800000         29.11%             $79,235         641.010
14         1.900000         26.87%             $64,040         594.311
15         2.000000         24.66%             $51,509         549.150

fig, axes = plt.subplots(1, 2, figsize=(12, 4.5))

sns.lineplot(data=sensitivity, x="cost_multiplier", y="contact_share", marker="o", ax=axes[0], color="#2F80ED")
axes[0].set_title("Higher costs reduce optimal contact depth")
axes[0].set_xlabel("Intervention cost multiplier")
axes[0].set_ylabel("Share contacted")
axes[0].yaxis.set_major_formatter(lambda x, pos: f"{100*x:.0f}%")

sns.lineplot(data=sensitivity, x="cost_multiplier", y="expected_net_value", marker="o", ax=axes[1], color="#C0392B")
axes[1].axhline(0, color="#6B7280", linewidth=1)
axes[1].set_title("Program value is sensitive to offer cost")
axes[1].set_xlabel("Intervention cost multiplier")
axes[1].set_ylabel("Expected net value")

plt.tight_layout()
plt.show()

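Sensitivity analysis is especially important when the treatment includes a discount. The treatment effect may be real, but the offer can still be too expensive. In the sensitivity table above, doubling the cost multiplier from 1.0 to 2.0 cuts the contacted share from 54.33% to 24.66% and expected net value from $277,500 to $51,509.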
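The same sensitivity can be read per customer. Since a customer is worth contacting when uplift times CLV exceeds the scaled cost, each customer has a break-even cost multiplier equal to uplift times CLV divided by base cost. The sketch below uses hypothetical helper names and invented customers, and assumes base cost is strictly positive.

```python
def break_even_multiplier(uplift, clv, base_cost):
    """Largest cost multiplier at which contacting the customer still breaks even."""
    return uplift * clv / base_cost


def contact_share(customers, multiplier):
    """Share of customers whose expected net value stays positive at this multiplier."""
    kept = [c for c in customers if break_even_multiplier(*c) > multiplier]
    return len(kept) / len(customers)


# Illustrative (uplift, clv, base_cost) triples; not the notebook's data.
customers = [
    (0.08, 400.0, 20.0),  # breaks even near 1.6x
    (0.02, 300.0, 20.0),  # breaks even near 0.3x
    (0.05, 500.0, 20.0),  # breaks even near 1.25x
    (0.10, 600.0, 20.0),  # breaks even near 3.0x
]

shares = {m: contact_share(customers, m) for m in (0.5, 1.0, 1.5, 2.0)}
print(shares)
```

Customers with a break-even multiplier just above 1.0 are the ones most exposed to a cost overrun, so they are natural candidates for a cheaper treatment variant.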

11. Common Retention Threats

Retention interventions can fail for statistical and operational reasons.

threats = pd.DataFrame(
    [
        {
            "threat": "Risk targeting mistaken for uplift targeting",
            "symptom": "The team contacts the highest-risk customers only.",
            "diagnostic": "Compare churn-risk, uplift, and net-value targeting curves.",
            "mitigation": "Randomized holdout and uplift modeling.",
        },
        {
            "threat": "Contact fatigue",
            "symptom": "Low-risk customers complain or churn after unnecessary outreach.",
            "diagnostic": "Measure unsubscribe, complaint, and future engagement guardrails.",
            "mitigation": "Suppress low net-value customers and cap contact frequency.",
        },
        {
            "threat": "Offer leakage",
            "symptom": "Customers learn to threaten churn to receive discounts.",
            "diagnostic": "Track repeated offer exposure and strategic downgrade behavior.",
            "mitigation": "Randomized offer rules, eligibility windows, and non-discount interventions.",
        },
        {
            "threat": "Short outcome window",
            "symptom": "The intervention delays churn but does not improve long-term retention.",
            "diagnostic": "Track 30, 60, 90, and 180-day retention.",
            "mitigation": "Use cumulative CLV outcomes and delayed guardrails.",
        },
        {
            "threat": "Capacity bottleneck",
            "symptom": "The model recommends more contacts than the team can handle.",
            "diagnostic": "Evaluate value under contact capacity constraints.",
            "mitigation": "Rank by expected net value, not binary risk.",
        },
        {
            "threat": "Bad CLV estimates",
            "symptom": "Saved customers do not generate the expected value.",
            "diagnostic": "Backtest CLV calibration by segment.",
            "mitigation": "Use conservative CLV, sensitivity analysis, and finance review.",
        },
        {
            "threat": "Unobserved treatment variation",
            "symptom": "Some agents deliver better interventions than others.",
            "diagnostic": "Log intervention type, agent, script, and completion status.",
            "mitigation": "Standardize treatment or model treatment variants separately.",
        },
    ]
)

display(threats)
threat symptom diagnostic mitigation
0 Risk targeting mistaken for uplift targeting The team contacts the highest-risk customers o... Compare churn-risk, uplift, and net-value targ... Randomized holdout and uplift modeling.
1 Contact fatigue Low-risk customers complain or churn after unn... Measure unsubscribe, complaint, and future eng... Suppress low net-value customers and cap conta...
2 Offer leakage Customers learn to threaten churn to receive d... Track repeated offer exposure and strategic do... Randomized offer rules, eligibility windows, a...
3 Short outcome window The intervention delays churn but does not imp... Track 30, 60, 90, and 180-day retention. Use cumulative CLV outcomes and delayed guardr...
4 Capacity bottleneck The model recommends more contacts than the te... Evaluate value under contact capacity constrai... Rank by expected net value, not binary risk.
5 Bad CLV estimates Saved customers do not generate the expected v... Backtest CLV calibration by segment. Use conservative CLV, sensitivity analysis, an...
6 Unobserved treatment variation Some agents deliver better interventions than ... Log intervention type, agent, script, and comp... Standardize treatment or model treatment varia...

12. Decision Memo Template

Below is a compact retention readout for the capacity-constrained targeting policy.

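# Best-performing policy under the capacity constraint (net-value targeting in this run).
best_capacity = capacity_eval.sort_values("expected_net_value", ascending=False).iloc[0]

# Rebuild the recommended contact list: top `capacity` customers by predicted net value.
selected_idx = (
    test_df.sort_values("net_value_hat", ascending=False)
    .head(capacity)
    .index
)
selected_customers = test_df.loc[selected_idx]

# Program totals are computed from the simulation's true uplift columns.
gross_saved_clv = (selected_customers["true_churn_uplift"] * selected_customers["clv"]).sum()
program_cost = selected_customers["intervention_cost"].sum()
program_net = selected_customers["expected_profit_uplift"].sum()
expected_saves = selected_customers["true_churn_uplift"].sum()

# Gross value-to-cost ratio: saved CLV per dollar spent, before subtracting cost.
roi = gross_saved_clv / program_cost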

memo = f'''
### Retention Intervention Readout

**Decision:** Which customers should receive the next-cycle retention intervention.

**Design:** Randomized retention campaign used to estimate churn reduction and net retention value.

**Primary estimand:** Customer-level expected net value: churn reduction multiplied by CLV, minus intervention cost.

**Recommended policy:** Contact the top {capacity:,} customers by predicted net value.

**Expected saved customers:** {expected_saves:,.0f}.

**Gross saved CLV:** {dollars(gross_saved_clv, 0)}.

**Intervention cost:** {dollars(program_cost, 0)}.

**Expected net value:** {dollars(program_net, 0)}.

**Gross value-to-cost ratio:** {roi:,.2f}x.

**Recommendation:** Use net-value targeting for the next campaign cycle, maintain a randomized holdout, and monitor contact fatigue, offer leakage, and delayed retention.

**Caveats:** The policy depends on CLV calibration, intervention consistency, and stable customer behavior across campaign cycles.
'''

display(Markdown(memo))

Retention Intervention Readout

Decision: Which customers should receive the next-cycle retention intervention.

Design: Randomized retention campaign used to estimate churn reduction and net retention value.

Primary estimand: Customer-level expected net value: churn reduction multiplied by CLV, minus intervention cost.

Recommended policy: Contact the top 6,000 customers by predicted net value.

Expected saved customers: 464.

Gross saved CLV: $289,957.

Intervention cost: $119,300.

Expected net value: $170,658.

Gross value-to-cost ratio: 2.43x.

Recommendation: Use net-value targeting for the next campaign cycle, maintain a randomized holdout, and monitor contact fatigue, offer leakage, and delayed retention.

Caveats: The policy depends on CLV calibration, intervention consistency, and stable customer behavior across campaign cycles.

A good retention memo does not say “the churn model has high AUC, so ship it.” It says which action list changes, what value is expected, how uncertainty is handled, and which operational risks remain.
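One detail in the readout is worth spelling out: the gross value-to-cost ratio is not the same as net return on spend. The arithmetic below uses the rounded figures from the memo output above. The variable names are only for illustration.

```python
# Rounded figures copied from the memo output.
gross_saved_clv = 289_957
program_cost = 119_300

# Gross ratio: saved CLV per dollar of intervention cost.
gross_ratio = gross_saved_clv / program_cost

# Net return: value left after paying for the program, per dollar spent.
net_value = gross_saved_clv - program_cost
net_roi = net_value / program_cost

print(f"gross value-to-cost ratio: {gross_ratio:.2f}x")
print(f"net return on spend:       {net_roi:.2f}x")
```

The two numbers always differ by exactly 1.0, so a memo should state which one it reports. The net value computed here from rounded inputs is $170,657, one dollar off the memo's $170,658, presumably because the memo rounds each figure separately.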

13. Practical Workflow

A professional causal retention workflow usually follows this sequence:

  1. Define churn, retention, and value windows.
  2. Randomize a holdout for each intervention type.
  3. Log eligibility, assignment, delivery, completion, cost, and outcome.
  4. Estimate the average treatment effect for program-level accountability.
  5. Estimate heterogeneous uplift for targeting.
  6. Convert uplift into expected net value using CLV and intervention cost.
  7. Evaluate targeting curves under capacity constraints.
  8. Keep a persistent holdout to monitor drift and long-term effects.
  9. Track guardrails: complaints, contact fatigue, offer leakage, downgrades, and delayed churn.
  10. Convert the analysis into an operational contact list and decision memo.
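For the persistent holdout in steps 2 and 8, one common pattern is to derive assignment from a hash of the customer id, so the same customer stays in the same arm across campaign cycles without storing an assignment table. This is a sketch under assumptions: the function name, salt string, and 10% holdout size are all hypothetical choices.

```python
import hashlib


def in_holdout(customer_id, salt="retention-holdout-v1", holdout_pct=10):
    """Deterministically assign a customer to the holdout using a salted hash.

    The same (salt, customer_id) pair always lands in the same bucket.
    Changing the salt reshuffles everyone, e.g. for a new intervention type.
    """
    digest = hashlib.sha256(f"{salt}:{customer_id}".encode("utf-8")).hexdigest()
    bucket = int(digest, 16) % 100  # roughly uniform over 0..99
    return bucket < holdout_pct


# Check stability and the approximate holdout share on synthetic ids.
ids = range(10_000)
share = sum(in_holdout(i) for i in ids) / 10_000

print(in_holdout(42) == in_holdout(42))
print(f"{share:.1%}")
```

Hash-based assignment is only as good as the independence between ids and outcomes. If ids encode signup date or region, a seeded random draw stored alongside the customer record may be a safer design.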

Hands-On Extensions

Try extending this notebook in the following ways:

  • Add multiple retention treatments, such as call, discount, billing support, and plan review.
  • Estimate treatment-specific uplift and choose the best treatment per customer.
  • Add delayed churn outcomes at 30, 60, and 90 days.
  • Use survival analysis for time-to-churn rather than binary churn.
  • Add fairness constraints across customer segments.
  • Backtest CLV calibration and show how targeting changes under conservative CLV.
  • Compare the T-learner with causal forests or doubly robust learners.
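As a starting point for the multi-treatment extension, the sketch below picks the action with the highest expected net value for one customer, with "no contact" as a zero-value baseline. The function name, treatment names, uplifts, and costs are all hypothetical.

```python
def best_action(uplift_by_treatment, cost_by_treatment, clv):
    """Choose the treatment with the highest expected net value, or no contact.

    Expected net value of a treatment = churn-reduction uplift * CLV - its cost.
    """
    options = {"no_contact": 0.0}
    for treatment, uplift in uplift_by_treatment.items():
        options[treatment] = uplift * clv - cost_by_treatment[treatment]
    action = max(options, key=options.get)
    return action, options[action]


# Illustrative treatment-specific uplift estimates and costs.
uplift = {"call": 0.04, "discount": 0.07}
cost = {"call": 8.0, "discount": 30.0}

high_value = best_action(uplift, cost, clv=500.0)
low_value = best_action(uplift, cost, clv=100.0)

print(high_value)
print(low_value)
```

The discount has the larger uplift in this example but is not chosen, because its cost outweighs the extra saves. For the low-CLV customer, neither treatment pays for itself.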

Key Takeaways

  • Churn prediction estimates who is likely to leave; uplift estimates who can be saved.
  • High churn risk is not the same as high retention value.
  • The correct targeting score combines treatment effect, CLV, intervention cost, and operational constraints.
  • A randomized holdout is the cleanest way to estimate retention intervention effects.
  • Uplift models should be evaluated by targeting value curves, not only prediction metrics.
  • Capacity constraints turn causal estimates into ranked contact lists.
  • Retention programs need guardrails for contact fatigue, offer leakage, and delayed churn.

References

Glady, N., Baesens, B., & Croux, C. (2009). Modeling churn using customer lifetime value. European Journal of Operational Research, 197(1), 402-411. https://doi.org/10.1016/j.ejor.2008.06.027

Guelman, L., Guillen, M., & Perez Marin, A. M. (2012). Random forests for uplift modeling: An insurance customer retention case. In Modeling and Simulation in Engineering, Economics and Management (pp. 123-133). Springer. https://doi.org/10.1007/978-3-642-30433-0_13

Hoppner, S., Stripling, E., Baesens, B., Broucke, S. V., & Verdonck, T. (2020). Profit driven decision trees for churn prediction. European Journal of Operational Research, 284(3), 920-933. https://doi.org/10.1016/j.ejor.2018.11.072

Oskarsdottir, M., Baesens, B., & Vanthienen, J. (2018). Profit-based model selection for customer retention using individual customer lifetime values. Big Data, 6(1), 53-65. https://doi.org/10.1089/big.2018.0015

Rombaut, E., & Guerry, M.-A. (2020). The effectiveness of employee retention through an uplift modeling approach. International Journal of Manpower, 41(8), 1199-1220. https://doi.org/10.1108/IJM-04-2019-0184

Rzepakowski, P., & Jaroszewicz, S. (2012). Uplift modeling in direct marketing. Journal of Telecommunications and Information Technology, 2, 43-50. https://doi.org/10.26636/jtit.2012.2.1263

Zhang, W., Li, J., & Liu, L. (2020). A unified survey of treatment effect heterogeneity modeling and uplift modeling. arXiv. https://doi.org/10.48550/arxiv.2007.12769