causal-learn Tutorial 13: Time-Series Causal Discovery

Time-series causal discovery asks a different question from ordinary tabular discovery. In tabular data, rows are usually treated as exchangeable samples from the same joint distribution. In time-series data, row order is part of the data-generating process. A variable at time t-1 may cause another variable at time t, and the direction of time gives us real background knowledge that is not available in a purely cross-sectional dataset.

This notebook introduces the time-series workflow available in causal-learn and related LiNGAM time-series tools shipped with the same package family. We will simulate a small dynamic system with known lagged and contemporaneous causes, then compare several ways to discover the structure: pairwise Granger tests, multivariate Granger-Lasso, VAR-LiNGAM, and a naive i.i.d. PC baseline for contrast.

The examples are intentionally small. A serious time-series causal discovery project should also study stationarity, seasonality, missing time points, irregular sampling, delayed effects, interventions, and domain constraints. For large lagged conditional-independence workflows, a dedicated time-series discovery library such as Tigramite is often the better primary tool. This notebook focuses on what causal-learn can teach directly.

Estimated runtime: about 1-3 minutes on a typical laptop. The repeated stability cells are the slowest part because they refit Granger-Lasso several times.

Learning Goals

By the end of this notebook, you should be able to:

  • explain the difference between contemporaneous, lagged, and auto-regressive edges;
  • reshape a multivariate time series into a lagged supervised-learning table;
  • run causal-learn’s Granger tools and read the coefficient layout correctly;
  • run VAR-LiNGAM and separate instantaneous effects from lagged effects;
  • evaluate discovered lagged edges against a known synthetic truth;
  • understand why ordinary i.i.d. graph discovery is risky on autocorrelated rows.

Notebook Flow

We will move in the same order a real analysis should move:

  1. Set up imports, output paths, and plotting style.
  2. Define the true dynamic data-generating process.
  3. Simulate and inspect a multivariate time series.
  4. Draw the true temporal graph so the estimators have a clear target.
  5. Build lagged design matrices to make the time-series framing concrete.
  6. Run pairwise Granger and multivariate Granger-Lasso.
  7. Run VAR-LiNGAM to recover same-time and lagged effects.
  8. Compare results, tune thresholds, and run simple stability checks.
  9. Show why naive i.i.d. PC can be misleading for time series.
  10. Save a compact reporting checklist and artifact manifest.

Time-Series Causal Discovery Theory

A time-series causal graph usually has nodes like X(t), X(t-1), and X(t-2) rather than only X. That indexing changes the meaning of an edge.

A lagged edge such as match(t-1) -> engagement(t) says earlier match quality helps predict later engagement after accounting for other lagged variables. Lagged edges are often easier to orient because causes must come before effects in time.

A same-time edge such as match(t) -> engagement(t) says there is an instantaneous structural relation inside the same time step. Same-time edges are harder because the timestamp alone does not order the two variables. VAR-LiNGAM uses non-Gaussian residual structure to estimate these same-time directions.

An auto-regressive edge such as engagement(t-1) -> engagement(t) says the process has memory. Auto-regression is not a nuisance detail. If we ignore it, ordinary tabular methods may mistake persistence for causal dependence between variables.
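That last risk can be made concrete with a small standalone sketch, independent of this notebook's simulation: two completely independent AR(1) series still produce sample correlations with a much wider spread than i.i.d. data, which is exactly how persistence gets mistaken for cross-variable dependence.

```python
import numpy as np

rng = np.random.default_rng(0)

def ar1(phi, n, rng):
    """Simulate a zero-mean AR(1) process x[t] = phi * x[t-1] + noise."""
    x = np.zeros(n)
    for t in range(1, n):
        x[t] = phi * x[t - 1] + rng.normal()
    return x

n_steps, n_reps = 200, 300
# Correlations between pairs of INDEPENDENT series, autocorrelated vs i.i.d.
ar_corrs = [np.corrcoef(ar1(0.9, n_steps, rng), ar1(0.9, n_steps, rng))[0, 1]
            for _ in range(n_reps)]
iid_corrs = [np.corrcoef(rng.normal(size=n_steps), rng.normal(size=n_steps))[0, 1]
             for _ in range(n_reps)]

# Both means sit near zero, but the AR(1) correlations spread far wider,
# so naive "is this correlation significant?" reasoning over-rejects.
print(f"iid corr std:   {np.std(iid_corrs):.3f}")
print(f"AR(1) corr std: {np.std(ar_corrs):.3f}")
```

The spread inflation is why the later sections condition on lagged values instead of trusting raw same-time associations.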

Granger Causality In Plain Language

Granger causality is predictive causality over time. Variable X Granger-causes variable Y if past values of X improve prediction of current Y, after using past values of Y and possibly other variables.

That is useful, but it is not magic. Granger causality can be distorted by hidden common causes, omitted variables, synchronized measurement, nonstationarity, and insufficient lag length. In this notebook we use Granger methods as discovery tools, then evaluate them against a known synthetic truth.

VAR-LiNGAM In Plain Language

VAR-LiNGAM combines two ideas:

  • a vector autoregression, which models how lagged variables affect current variables;
  • LiNGAM, which uses linear non-Gaussian residuals to orient same-time causal effects.

The fitted object returns an array of adjacency matrices. Matrix 0 is the same-time matrix. Matrix 1 is lag one, matrix 2 is lag two, and so on. As in the other LiNGAM notebooks, the convention is row = target and column = source.
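A tiny numpy illustration of that convention, using toy numbers rather than a fitted model: the stacked array can be read as adjacency[lag][target, source], so matrix 0 holds same-time effects and matrix 1 holds lag-one effects.

```python
import numpy as np

variables = ["match", "engagement"]
same_time = np.array([[0.00, 0.0],
                      [0.22, 0.0]])   # match(t) -> engagement(t)
lag_one = np.array([[0.28, 0.00],
                    [0.32, 0.30]])    # row = target, column = source
adjacency = np.array([same_time, lag_one])

# Read one coefficient: the effect of match(t-1) on engagement(t).
target = variables.index("engagement")
source = variables.index("match")
print(adjacency[1][target, source])  # 0.32
```

Keeping the row = target, column = source orientation straight matters because transposing it silently reverses every discovered edge.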

Assumptions And Caveats

The cleanest version of the workflow assumes evenly spaced observations, no missing timestamps, approximate stationarity, enough data for the selected lag length, and no important omitted time-varying confounders. It also assumes the effect is captured reasonably well by linear lagged relationships when using Granger-Lasso or VAR-LiNGAM.

When those assumptions are not plausible, do not treat the learned graph as a final causal answer. Treat it as a candidate structure that needs sensitivity checks, domain review, and ideally experimental or quasi-experimental support.

Setup

This first code cell imports the packages used throughout the notebook, creates output folders, and pins plotting defaults. The MPLCONFIGDIR line avoids noisy matplotlib cache warnings in restricted notebook environments.

from pathlib import Path
import os
import time
import warnings

PROJECT_ROOT = Path.cwd().resolve()
if PROJECT_ROOT.name == "causal_learn":
    PROJECT_ROOT = PROJECT_ROOT.parents[2]

OUTPUT_DIR = PROJECT_ROOT / "notebooks" / "tutorials" / "causal_learn" / "outputs"
DATASET_DIR = OUTPUT_DIR / "datasets"
FIGURE_DIR = OUTPUT_DIR / "figures"
TABLE_DIR = OUTPUT_DIR / "tables"
REPORT_DIR = OUTPUT_DIR / "reports"
MATPLOTLIB_CACHE_DIR = OUTPUT_DIR / "matplotlib_cache"

for directory in [DATASET_DIR, FIGURE_DIR, TABLE_DIR, REPORT_DIR, MATPLOTLIB_CACHE_DIR]:
    directory.mkdir(parents=True, exist_ok=True)

os.environ.setdefault("MPLCONFIGDIR", str(MATPLOTLIB_CACHE_DIR))
warnings.filterwarnings("ignore", category=FutureWarning)
warnings.filterwarnings("ignore", message=".*verbose.*deprecated.*")
warnings.filterwarnings("ignore", message="IProgress not found.*")

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
from matplotlib.patches import FancyArrowPatch, FancyBboxPatch
from IPython.display import display
from sklearn.preprocessing import StandardScaler

from causallearn.search.Granger.Granger import Granger
from causallearn.search.FCMBased.lingam.var_lingam import VARLiNGAM
from causallearn.search.ConstraintBased.PC import pc

NOTEBOOK_PREFIX = "13"
RANDOM_SEED = 42
sns.set_theme(style="whitegrid", context="notebook")
plt.rcParams.update({"figure.dpi": 120, "savefig.dpi": 160})

print(f"Project root: {PROJECT_ROOT}")
print(f"Outputs: {OUTPUT_DIR}")
Project root: /home/apex/Documents/ranking_sys
Outputs: /home/apex/Documents/ranking_sys/notebooks/tutorials/causal_learn/outputs

The output confirms that the notebook is writing into the shared tutorial output folder. Every saved artifact in this notebook starts with 13_, which makes it easy to separate these files from earlier tutorial outputs.

Package Versions

Version information is part of reproducible causal discovery. Small implementation details can matter for graph search, regularization, and numerical thresholds, so we save the versions used to execute this notebook.

import causallearn
import sklearn
import statsmodels

version_table = pd.DataFrame(
    [
        {"package": "numpy", "version": np.__version__},
        {"package": "pandas", "version": pd.__version__},
        {"package": "scikit-learn", "version": sklearn.__version__},
        {"package": "statsmodels", "version": statsmodels.__version__},
        {"package": "causal-learn", "version": getattr(causallearn, "__version__", "not exposed")},
    ]
)
version_table.to_csv(TABLE_DIR / f"{NOTEBOOK_PREFIX}_package_versions.csv", index=False)
display(version_table)
package version
0 numpy 2.4.4
1 pandas 3.0.2
2 scikit-learn 1.6.1
3 statsmodels 0.14.6
4 causal-learn not exposed

These versions are not just bookkeeping. Time-series discovery combines statistical tests, regression solvers, and graph conventions, so saving the environment makes later debugging much easier.

Define The Dynamic System

We will simulate six variables with a simple product-analytics flavor:

  • need: underlying user need or demand pressure;
  • intent: short-term intent signal;
  • match: quality of the content or item match;
  • engagement: current depth of use;
  • renewal: future value or retention pressure;
  • support: support burden or friction signal.

The true system has self-memory, cross-lag effects, and a few same-time effects. The self-memory terms make the series autocorrelated. The cross-lag effects are the main Granger-style causal targets. The same-time effects give VAR-LiNGAM something that plain Granger-Lasso cannot fully represent.

VARIABLES = ["need", "intent", "match", "engagement", "renewal", "support"]
VAR_INDEX = {name: i for i, name in enumerate(VARIABLES)}
N_VARIABLES = len(VARIABLES)

SELF_LAG_STRENGTH = {
    "need": 0.35,
    "intent": 0.32,
    "match": 0.28,
    "engagement": 0.30,
    "renewal": 0.34,
    "support": 0.26,
}

CROSS_LAG_EDGES = [
    {"source": "need", "lag": 1, "target": "match", "coefficient": 0.28},
    {"source": "intent", "lag": 1, "target": "match", "coefficient": 0.34},
    {"source": "match", "lag": 1, "target": "engagement", "coefficient": 0.32},
    {"source": "engagement", "lag": 1, "target": "renewal", "coefficient": 0.36},
    {"source": "engagement", "lag": 1, "target": "support", "coefficient": -0.30},
    {"source": "support", "lag": 1, "target": "renewal", "coefficient": -0.24},
    {"source": "need", "lag": 2, "target": "engagement", "coefficient": 0.18},
    {"source": "match", "lag": 2, "target": "renewal", "coefficient": 0.22},
]

CONTEMPORANEOUS_EDGES = [
    {"source": "match", "lag": 0, "target": "engagement", "coefficient": 0.22},
    {"source": "engagement", "lag": 0, "target": "renewal", "coefficient": 0.18},
    {"source": "engagement", "lag": 0, "target": "support", "coefficient": -0.15},
]

true_edge_table = pd.DataFrame(CONTEMPORANEOUS_EDGES + CROSS_LAG_EDGES)
true_edge_table["edge_family"] = np.where(true_edge_table["lag"].eq(0), "same_time", "cross_lag")
true_edge_table.to_csv(TABLE_DIR / f"{NOTEBOOK_PREFIX}_true_temporal_edges.csv", index=False)
display(true_edge_table)
source lag target coefficient edge_family
0 match 0 engagement 0.22 same_time
1 engagement 0 renewal 0.18 same_time
2 engagement 0 support -0.15 same_time
3 need 1 match 0.28 cross_lag
4 intent 1 match 0.34 cross_lag
5 match 1 engagement 0.32 cross_lag
6 engagement 1 renewal 0.36 cross_lag
7 engagement 1 support -0.30 cross_lag
8 support 1 renewal -0.24 cross_lag
9 need 2 engagement 0.18 cross_lag
10 match 2 renewal 0.22 cross_lag

This table is the ground truth used for evaluation. Self-lag memory is real in the simulation, but the main discovery score focuses on cross-variable effects, since those are what analysts usually mean by discovering causal structure across variables.

Simulation Function

This function creates the actual time series. It uses Laplace noise rather than Gaussian noise so VAR-LiNGAM has the non-Gaussian residual structure it needs for same-time orientation.

The code applies same-time effects in a fixed acyclic order: match can affect engagement, and engagement can affect renewal and support during the same timestamp.

def build_temporal_matrices():
    """Return true same-time, lag-one, and lag-two coefficient matrices.

    Matrix convention: row = target variable, column = source variable.
    """
    lag_one = np.zeros((N_VARIABLES, N_VARIABLES))
    lag_two = np.zeros((N_VARIABLES, N_VARIABLES))
    same_time = np.zeros((N_VARIABLES, N_VARIABLES))

    for variable, value in SELF_LAG_STRENGTH.items():
        lag_one[VAR_INDEX[variable], VAR_INDEX[variable]] = value

    for edge in CROSS_LAG_EDGES:
        matrix = lag_one if edge["lag"] == 1 else lag_two
        matrix[VAR_INDEX[edge["target"]], VAR_INDEX[edge["source"]]] = edge["coefficient"]

    for edge in CONTEMPORANEOUS_EDGES:
        same_time[VAR_INDEX[edge["target"]], VAR_INDEX[edge["source"]]] = edge["coefficient"]

    return same_time, lag_one, lag_two


def simulate_time_series(n_steps=1_200, burn_in=120, seed=RANDOM_SEED, noise_scale=0.60):
    """Simulate a stationary linear structural time series with non-Gaussian (Laplace) shocks."""
    rng = np.random.default_rng(seed)
    same_time, lag_one, lag_two = build_temporal_matrices()
    values = np.zeros((n_steps + burn_in, N_VARIABLES))

    for t in range(2, n_steps + burn_in):
        # Lagged structural component plus non-Gaussian shocks.
        base = lag_one @ values[t - 1] + lag_two @ values[t - 2]
        current = base + rng.laplace(loc=0.0, scale=noise_scale, size=N_VARIABLES)

        # Same-time effects are applied in a known acyclic order.
        current[VAR_INDEX["engagement"]] += same_time[VAR_INDEX["engagement"], VAR_INDEX["match"]] * current[VAR_INDEX["match"]]
        current[VAR_INDEX["renewal"]] += same_time[VAR_INDEX["renewal"], VAR_INDEX["engagement"]] * current[VAR_INDEX["engagement"]]
        current[VAR_INDEX["support"]] += same_time[VAR_INDEX["support"], VAR_INDEX["engagement"]] * current[VAR_INDEX["engagement"]]

        values[t] = current

    frame = pd.DataFrame(values[burn_in:], columns=VARIABLES)
    frame.insert(0, "time", np.arange(len(frame)))
    return frame


time_series_df = simulate_time_series()
time_series_df.to_csv(DATASET_DIR / f"{NOTEBOOK_PREFIX}_synthetic_time_series.csv", index=False)
display(time_series_df.head())
time need intent match engagement renewal support
0 0 -0.285133 0.277932 2.887705 0.986429 1.117906 0.885094
1 1 0.413857 -0.507032 0.902884 0.358165 0.804738 1.329916
2 2 0.160326 -0.248611 -1.384983 1.552417 0.050189 -1.493999
3 3 -0.369289 -1.298811 -0.483863 0.010198 0.853096 -2.225829
4 4 -1.029062 -0.573211 -1.947375 -0.248455 -0.050569 -0.846873

The first rows are already past the burn-in period, so the values are sampled from the stable part of the process rather than from the all-zero initialization.

Basic Data Audit

Before running discovery, inspect scale, missingness, and rough distribution shape. A time-series algorithm can fail for mundane reasons like missing timestamps, exploding values, or a variable with almost no variance.

data_audit = pd.DataFrame(
    {
        "mean": time_series_df[VARIABLES].mean(),
        "std": time_series_df[VARIABLES].std(),
        "min": time_series_df[VARIABLES].min(),
        "max": time_series_df[VARIABLES].max(),
        "missing_rate": time_series_df[VARIABLES].isna().mean(),
    }
).reset_index(names="variable")

time_step_check = pd.DataFrame(
    {
        "check": ["rows", "min_time", "max_time", "unique_time_steps", "missing_time_steps"],
        "value": [
            len(time_series_df),
            time_series_df["time"].min(),
            time_series_df["time"].max(),
            time_series_df["time"].nunique(),
            (time_series_df["time"].max() - time_series_df["time"].min() + 1) - time_series_df["time"].nunique(),
        ],
    }
)

data_audit.to_csv(TABLE_DIR / f"{NOTEBOOK_PREFIX}_data_audit.csv", index=False)
time_step_check.to_csv(TABLE_DIR / f"{NOTEBOOK_PREFIX}_time_step_check.csv", index=False)

display(data_audit.round(3))
display(time_step_check)
variable mean std min max missing_rate
0 need -0.014 0.925 -4.033 4.743 0.0
1 intent -0.048 0.869 -3.818 3.915 0.0
2 match -0.072 1.000 -4.453 4.284 0.0
3 engagement -0.074 1.101 -4.948 6.134 0.0
4 renewal -0.035 1.341 -5.553 4.782 0.0
5 support 0.033 1.053 -5.105 4.530 0.0
check value
0 rows 1200
1 min_time 0
2 max_time 1199
3 unique_time_steps 1200
4 missing_time_steps 0

The audit confirms that the simulated data has no missing values or skipped timestamps. The variables have different natural scales, so we will standardize them before regularized regression and VAR-LiNGAM.

Plot A Slice Of The Series

A line plot is not a causal discovery method, but it is still useful. We want to see whether the process looks stable and whether the variables have obvious drift or breaks that would violate stationarity assumptions.

plot_slice = time_series_df.loc[time_series_df["time"].between(0, 220), ["time", *VARIABLES]]
plot_long = plot_slice.melt(id_vars="time", var_name="variable", value_name="value")

fig, ax = plt.subplots(figsize=(13, 6))
sns.lineplot(data=plot_long, x="time", y="value", hue="variable", linewidth=1.3, ax=ax)
ax.set_title("Synthetic Multivariate Time Series: First 220 Steps")
ax.set_xlabel("Time")
ax.set_ylabel("Value")
ax.legend(title="Variable", ncols=3, loc="upper right")
plt.tight_layout()
fig.savefig(FIGURE_DIR / f"{NOTEBOOK_PREFIX}_time_series_slice.png", dpi=160, bbox_inches="tight")
plt.show()

The series fluctuates around a stable range rather than trending away. That is the behavior we want for a first Granger and VAR-LiNGAM teaching example.

Correlation And Autocorrelation Checks

Time-series discovery should look at both cross-variable association and persistence over time. The next cell saves a contemporaneous correlation matrix and simple lag-one autocorrelation values.

correlation_matrix = time_series_df[VARIABLES].corr()
autocorrelation_table = pd.DataFrame(
    {
        "variable": VARIABLES,
        "lag_1_autocorrelation": [time_series_df[var].autocorr(lag=1) for var in VARIABLES],
        "lag_2_autocorrelation": [time_series_df[var].autocorr(lag=2) for var in VARIABLES],
    }
)

correlation_matrix.to_csv(TABLE_DIR / f"{NOTEBOOK_PREFIX}_contemporaneous_correlation.csv")
autocorrelation_table.to_csv(TABLE_DIR / f"{NOTEBOOK_PREFIX}_autocorrelation.csv", index=False)

fig, axes = plt.subplots(1, 2, figsize=(14, 5))
sns.heatmap(correlation_matrix, cmap="vlag", center=0, annot=True, fmt=".2f", ax=axes[0])
axes[0].set_title("Same-Time Correlation")
sns.barplot(data=autocorrelation_table, x="variable", y="lag_1_autocorrelation", color="#64748b", ax=axes[1])
axes[1].set_title("Lag-One Autocorrelation")
axes[1].set_xlabel("Variable")
axes[1].set_ylabel("Autocorrelation")
axes[1].tick_params(axis="x", rotation=30)
plt.tight_layout()
fig.savefig(FIGURE_DIR / f"{NOTEBOOK_PREFIX}_correlation_and_autocorrelation.png", dpi=160, bbox_inches="tight")
plt.show()

display(autocorrelation_table.round(3))

variable lag_1_autocorrelation lag_2_autocorrelation
0 need 0.337 0.130
1 intent 0.304 0.098
2 match 0.323 0.106
3 engagement 0.452 0.173
4 renewal 0.580 0.314
5 support 0.393 0.142

The autocorrelations are clearly nonzero. That is the main warning sign against treating rows as independent observations in an ordinary tabular graph discovery algorithm.

Draw The True Temporal Graph

This figure shows the causal target in a compact temporal layout. Arrows from t-1 or t-2 point into current-time variables. Arrows within the t column are same-time structural effects.

def add_box(ax, xy, label, color, width=0.13, height=0.065):
    x, y = xy
    patch = FancyBboxPatch(
        (x - width / 2, y - height / 2),
        width,
        height,
        boxstyle="round,pad=0.014",
        facecolor=color,
        edgecolor="#1f2937",
        linewidth=1.15,
        zorder=4,
    )
    ax.add_patch(patch)
    ax.text(x, y, label, ha="center", va="center", fontsize=9.5, fontweight="bold", zorder=5)


def box_boundary_points(start, end, width=0.13, height=0.065, gap=0.010):
    """Move an arrow from node centers to rounded-box boundaries."""
    x0, y0 = start
    x1, y1 = end
    dx = x1 - x0
    dy = y1 - y0
    distance = (dx**2 + dy**2) ** 0.5
    if distance == 0:
        return start, end
    ux, uy = dx / distance, dy / distance

    def boundary_offset(ux, uy):
        candidates = []
        if abs(ux) > 1e-9:
            candidates.append((width / 2) / abs(ux))
        if abs(uy) > 1e-9:
            candidates.append((height / 2) / abs(uy))
        return min(candidates) + gap

    offset = boundary_offset(ux, uy)
    return (x0 + ux * offset, y0 + uy * offset), (x1 - ux * offset, y1 - uy * offset)


def add_arrow(ax, start, end, color="#334155", rad=0.0, lw=1.7):
    start_edge, end_edge = box_boundary_points(start, end)
    arrow = FancyArrowPatch(
        start_edge,
        end_edge,
        arrowstyle="-|>",
        mutation_scale=16,
        linewidth=lw,
        color=color,
        connectionstyle=f"arc3,rad={rad}",
        zorder=2,
    )
    ax.add_patch(arrow)


def draw_true_temporal_graph(path):
    fig, ax = plt.subplots(figsize=(15, 7.2))
    ax.set_axis_off()
    ax.set_xlim(0, 1)
    ax.set_ylim(0, 1)

    colors = {
        "need": "#dbeafe",
        "intent": "#dbeafe",
        "match": "#ecfccb",
        "engagement": "#fef3c7",
        "renewal": "#fee2e2",
        "support": "#ede9fe",
    }
    positions = {
        ("need", 2): (0.10, 0.78),
        ("match", 2): (0.10, 0.34),
        ("need", 1): (0.33, 0.82),
        ("intent", 1): (0.33, 0.66),
        ("match", 1): (0.33, 0.50),
        ("engagement", 1): (0.33, 0.34),
        ("support", 1): (0.33, 0.18),
        ("match", 0): (0.62, 0.66),
        ("engagement", 0): (0.62, 0.40),
        ("renewal", 0): (0.88, 0.66),
        ("support", 0): (0.88, 0.28),
    }

    for x, label in [(0.10, "t-2"), (0.33, "t-1"), (0.62, "t"), (0.88, "t")]:
        ax.text(x, 0.92, label, ha="center", va="center", fontsize=12, fontweight="bold", color="#475569")

    # Draw arrows first so boxes remain crisp and readable.
    add_arrow(ax, positions[("need", 1)], positions[("match", 0)], rad=-0.06)
    add_arrow(ax, positions[("intent", 1)], positions[("match", 0)], rad=-0.02)
    add_arrow(ax, positions[("match", 1)], positions[("engagement", 0)], rad=0.01)
    add_arrow(ax, positions[("engagement", 1)], positions[("renewal", 0)], rad=-0.10)
    add_arrow(ax, positions[("engagement", 1)], positions[("support", 0)], rad=0.04)
    add_arrow(ax, positions[("support", 1)], positions[("renewal", 0)], rad=0.18)
    add_arrow(ax, positions[("need", 2)], positions[("engagement", 0)], color="#475569", rad=-0.12, lw=1.35)
    add_arrow(ax, positions[("match", 2)], positions[("renewal", 0)], color="#475569", rad=0.18, lw=1.35)

    add_arrow(ax, positions[("match", 0)], positions[("engagement", 0)], color="#b45309", rad=0.0, lw=1.9)
    add_arrow(ax, positions[("engagement", 0)], positions[("renewal", 0)], color="#b45309", rad=-0.05, lw=1.9)
    add_arrow(ax, positions[("engagement", 0)], positions[("support", 0)], color="#b45309", rad=0.04, lw=1.9)

    for (variable, lag), xy in positions.items():
        suffix = "t" if lag == 0 else f"t-{lag}"
        add_box(ax, xy, f"{variable}\n({suffix})", colors[variable])

    ax.text(
        0.50,
        0.06,
        "Dark arrows are lagged effects; amber arrows are same-time structural effects.",
        ha="center",
        va="center",
        fontsize=10,
        color="#475569",
    )
    ax.set_title("True Temporal Causal Structure", pad=18, fontsize=16, fontweight="bold")
    fig.savefig(path, dpi=160, bbox_inches="tight")
    plt.show()


draw_true_temporal_graph(FIGURE_DIR / f"{NOTEBOOK_PREFIX}_true_temporal_graph.png")

The graph makes the target explicit: Granger methods should recover the dark lagged cross-variable arrows, while VAR-LiNGAM should also recover the amber same-time arrows if the linear non-Gaussian assumptions are helpful enough.

Standardize The Series

Regularized regression and VAR fitting are easier to compare when variables are on similar scales. We standardize the six series but keep the original time column for plotting and alignment.

scaler = StandardScaler()
scaled_values = scaler.fit_transform(time_series_df[VARIABLES])
scaled_df = pd.DataFrame(scaled_values, columns=VARIABLES)
scaled_df.insert(0, "time", time_series_df["time"].to_numpy())
scaled_df.to_csv(DATASET_DIR / f"{NOTEBOOK_PREFIX}_synthetic_time_series_standardized.csv", index=False)

display(scaled_df.head())
display(pd.DataFrame({"variable": VARIABLES, "scaled_mean": scaled_df[VARIABLES].mean(), "scaled_std": scaled_df[VARIABLES].std()}).round(3))
time need intent match engagement renewal support
0 0 -0.292979 0.374857 2.962431 0.963866 0.860149 0.809245
1 1 0.462961 -0.528326 0.975884 0.393050 0.626548 1.231873
2 2 0.188774 -0.230985 -1.313973 1.478100 0.063707 -1.451142
3 3 -0.383993 -1.439352 -0.412069 0.076901 0.662620 -2.146457
4 4 -1.097520 -0.604473 -1.876854 -0.158100 -0.011452 -0.836304
variable scaled_mean scaled_std
need need 0.0 1.0
intent intent 0.0 1.0
match match -0.0 1.0
engagement engagement 0.0 1.0
renewal renewal 0.0 1.0
support support 0.0 1.0

The means are approximately zero and the standard deviations are approximately one. That keeps the coefficient thresholds used later from being dominated by variable scale.

Build A Lagged Design Matrix

A time-series model can be rewritten as a supervised-learning table. For each current time t, we create columns for each variable at t-1 and t-2, then keep the current variables as targets.

This is the core representation behind Granger-style models.

def make_lagged_frame(frame, variables, max_lag=2):
    pieces = [frame[["time", *variables]].copy()]
    for lag in range(1, max_lag + 1):
        shifted = frame[variables].shift(lag).add_suffix(f"__lag_{lag}")
        pieces.append(shifted)
    lagged = pd.concat(pieces, axis=1).dropna().reset_index(drop=True)
    return lagged

lagged_df = make_lagged_frame(scaled_df, VARIABLES, max_lag=2)
lagged_df.to_csv(DATASET_DIR / f"{NOTEBOOK_PREFIX}_lagged_design_matrix.csv", index=False)
display(lagged_df.head())
print(f"Lagged design shape: {lagged_df.shape}")
time need intent match engagement renewal support need__lag_1 intent__lag_1 match__lag_1 engagement__lag_1 renewal__lag_1 support__lag_1 need__lag_2 intent__lag_2 match__lag_2 engagement__lag_2 renewal__lag_2 support__lag_2
0 2 0.188774 -0.230985 -1.313973 1.478100 0.063707 -1.451142 0.462961 -0.528326 0.975884 0.393050 0.626548 1.231873 -0.292979 0.374857 2.962431 0.963866 0.860149 0.809245
1 3 -0.383993 -1.439352 -0.412069 0.076901 0.662620 -2.146457 0.188774 -0.230985 -1.313973 1.478100 0.063707 -1.451142 0.462961 -0.528326 0.975884 0.393050 0.626548 1.231873
2 4 -1.097520 -0.604473 -1.876854 -0.158100 -0.011452 -0.836304 -0.383993 -1.439352 -0.412069 0.076901 0.662620 -2.146457 0.188774 -0.230985 -1.313973 1.478100 0.063707 -1.451142
3 5 -0.528527 -0.281948 -4.384371 -2.331919 0.242976 -3.216966 -1.097520 -0.604473 -1.876854 -0.158100 -0.011452 -0.836304 -0.383993 -1.439352 -0.412069 0.076901 0.662620 -2.146457
4 6 -0.164297 -0.067427 -1.660169 -2.541614 -0.345630 0.956063 -0.528527 -0.281948 -4.384371 -2.331919 0.242976 -3.216966 -1.097520 -0.604473 -1.876854 -0.158100 -0.011452 -0.836304
Lagged design shape: (1198, 19)

The first usable row starts at time 2 because lag-two features require two previous observations. In real work, this lag construction is where you make choices about maximum lag, seasonal lags, calendar effects, and leakage prevention.
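As a hypothetical sketch of one of those choices, a seasonal lag can be added alongside the short lags with the same shift-and-drop pattern. The period of 7 and the column names below are illustrative assumptions, not part of this notebook's simulation.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(3)
frame = pd.DataFrame({"time": np.arange(30), "engagement": rng.normal(size=30)})

for lag in (1, 2, 7):  # two short lags plus one weekly seasonal lag
    frame[f"engagement__lag_{lag}"] = frame["engagement"].shift(lag)

# Dropping rows with any missing lag prevents leakage from the padded start:
# the first usable row is the first time all lags are observed.
lagged = frame.dropna().reset_index(drop=True)
print(lagged.shape)  # (23, 5): 30 rows minus 7 lost to the longest lag
```

The same pattern extends to calendar features, as long as every added column only uses information available strictly before time t.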

Helper Functions For Edge Tables And Metrics

The next helpers keep the rest of the notebook readable. They convert coefficient matrices into edge tables, compare learned edges to truth, and save compact metrics.

Reminder: matrices use row = target and column = source.

def coefficient_array_to_edges(coef_array, threshold, method, include_self=False):
    """Convert a same-time/lagged coefficient array into a tidy edge table."""
    rows = []
    for lag, matrix in enumerate(coef_array):
        for target in VARIABLES:
            for source in VARIABLES:
                if not include_self and source == target:
                    continue
                coefficient = float(matrix[VAR_INDEX[target], VAR_INDEX[source]])
                if abs(coefficient) >= threshold:
                    rows.append(
                        {
                            "method": method,
                            "source": source,
                            "lag": lag,
                            "target": target,
                            "coefficient": coefficient,
                            "abs_coefficient": abs(coefficient),
                            "edge_label": f"{source}(t-{lag}) -> {target}(t)" if lag else f"{source}(t) -> {target}(t)",
                        }
                    )
    if not rows:
        return pd.DataFrame(columns=["method", "source", "lag", "target", "coefficient", "abs_coefficient", "edge_label"])
    return pd.DataFrame(rows).sort_values(["lag", "target", "source"]).reset_index(drop=True)


def granger_coefficients_to_array(coef_matrix, max_lag):
    """Reshape causal-learn Granger output from (target, source-by-lag) columns to a (lag, target, source) array."""
    lagged_matrices = []
    for lag in range(1, max_lag + 1):
        start = (lag - 1) * N_VARIABLES
        stop = lag * N_VARIABLES
        lagged_matrices.append(coef_matrix[:, start:stop])
    zero_same_time = np.zeros((N_VARIABLES, N_VARIABLES))
    return np.array([zero_same_time, *lagged_matrices])


def edge_key_set(edge_table, allowed_lags=None, exclude_self=True):
    if edge_table.empty:
        return set()
    subset = edge_table.copy()
    if allowed_lags is not None:
        subset = subset[subset["lag"].isin(allowed_lags)]
    if exclude_self:
        subset = subset[subset["source"] != subset["target"]]
    return set(zip(subset["source"], subset["lag"], subset["target"]))


TRUE_EDGE_KEYS = edge_key_set(true_edge_table, allowed_lags=[0, 1, 2], exclude_self=True)
TRUE_LAGGED_EDGE_KEYS = edge_key_set(true_edge_table, allowed_lags=[1, 2], exclude_self=True)
TRUE_SAME_TIME_EDGE_KEYS = edge_key_set(true_edge_table, allowed_lags=[0], exclude_self=True)


def summarize_edges(predicted_edges, truth_keys, label):
    predicted_keys = edge_key_set(predicted_edges, allowed_lags=[0, 1, 2], exclude_self=True)
    true_positive = predicted_keys & truth_keys
    false_positive = predicted_keys - truth_keys
    false_negative = truth_keys - predicted_keys
    precision = len(true_positive) / len(predicted_keys) if predicted_keys else np.nan
    recall = len(true_positive) / len(truth_keys) if truth_keys else np.nan
    f1 = 2 * precision * recall / (precision + recall) if precision + recall > 0 else np.nan
    return pd.DataFrame(
        [
            {
                "method": label,
                "predicted_edges": len(predicted_keys),
                "true_edges": len(truth_keys),
                "true_positive_edges": len(true_positive),
                "false_positive_edges": len(false_positive),
                "false_negative_edges": len(false_negative),
                "precision": precision,
                "recall": recall,
                "f1": f1,
                "missed_edges": " | ".join(f"{s}(t-{l}) -> {t}(t)" if l else f"{s}(t) -> {t}(t)" for s, l, t in sorted(false_negative)),
                "extra_edges": " | ".join(f"{s}(t-{l}) -> {t}(t)" if l else f"{s}(t) -> {t}(t)" for s, l, t in sorted(false_positive)),
            }
        ]
    )


def save_metrics(metrics, name):
    metrics.to_csv(TABLE_DIR / f"{NOTEBOOK_PREFIX}_{name}.csv", index=False)
    display(metrics.round(4))

The metric helpers deliberately evaluate exact lag labels. For example, match(t-1) -> engagement(t) is different from match(t-2) -> engagement(t). In time-series discovery, lag placement is part of the causal claim.

Pairwise Granger Test

We start with the simplest causal-learn time-series tool: a two-variable Granger test. This asks whether past match predicts current engagement, and whether past engagement predicts current match.

This is only pairwise. It does not adjust for the full multivariate system, so it is useful as a teaching step, not as the final graph.

pair_variables = ["match", "engagement"]
pair_data = scaled_df[pair_variables].to_numpy()
pair_model = Granger(maxlag=2, significance_level=0.01)
pair_p_values, pair_adjacency = pair_model.granger_test_2d(pair_data)

pair_rows = []
for target_index, target in enumerate(pair_variables):
    for source_index, source in enumerate(pair_variables):
        if source == target:
            continue
        for lag in [1, 2]:
            p_value = pair_p_values[target_index, source_index + len(pair_variables) * (lag - 1)]
            pair_rows.append(
                {
                    "source": source,
                    "lag": lag,
                    "target": target,
                    "p_value": p_value,
                    "significant_at_1pct": p_value < 0.01,
                }
            )

pairwise_granger_table = pd.DataFrame(pair_rows)
pairwise_granger_table.to_csv(TABLE_DIR / f"{NOTEBOOK_PREFIX}_pairwise_granger_match_engagement.csv", index=False)
display(pairwise_granger_table)
source lag target p_value significant_at_1pct
0 engagement 1 match 0.7020 False
1 engagement 2 match 0.5321 False
2 match 1 engagement 0.0000 True
3 match 2 engagement 0.0000 True

The pairwise test correctly finds strong evidence from past match to current engagement. The reverse direction is not significant here, which matches the main synthetic design for this pair.

Multivariate Granger-Lasso

Now we use Granger.granger_lasso, which fits one regularized regression per target variable using all variables at all selected lags as predictors. This is closer to a discovery workflow because each candidate edge competes against the other lagged variables.

The output is a coefficient matrix with shape (number_of_targets, number_of_variables * max_lag). We convert it into one matrix per lag so the edge convention is easier to read.
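The reshape can be sketched as follows. This is a hedged reimplementation of what a helper like `granger_coefficients_to_array` might do, under the assumption that columns are grouped lag-by-lag (all variables at lag 1 first, then all variables at lag 2), which matches the pairwise indexing used earlier in this notebook:

```python
import numpy as np

def coefficients_to_lag_matrices(coefficients, n_variables, max_lag):
    """Split a (n_targets, n_variables * max_lag) coefficient matrix into
    one (n_targets, n_variables) matrix per lag.

    Assumes columns are grouped lag-by-lag: the lag-1 block comes first,
    then the lag-2 block, and so on.
    """
    coefficients = np.asarray(coefficients)
    lag_matrices = {}
    for lag in range(1, max_lag + 1):
        start = n_variables * (lag - 1)
        lag_matrices[lag] = coefficients[:, start:start + n_variables]
    return lag_matrices

# Tiny worked example: 2 variables, 2 lags.
demo = np.array([[0.5, 0.0, 0.1, 0.0],
                 [0.3, 0.4, 0.0, 0.2]])
lagged = coefficients_to_lag_matrices(demo, n_variables=2, max_lag=2)
print(lagged[1])  # lag-1 block: [[0.5, 0.0], [0.3, 0.4]]
print(lagged[2])  # lag-2 block: [[0.1, 0.0], [0.0, 0.2]]
```

Once each lag has its own square matrix, the edge convention is simply "entry (i, j) of the lag-k matrix is the effect of variable j at time t-k on variable i at time t."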

MAX_LAG = 2
GRANGER_THRESHOLD = 0.08

granger_model = Granger(maxlag=MAX_LAG, cv=5)
start_time = time.perf_counter()
granger_raw_coefficients = granger_model.granger_lasso(scaled_df[VARIABLES].to_numpy())
granger_elapsed = time.perf_counter() - start_time

granger_coef_array = granger_coefficients_to_array(granger_raw_coefficients, max_lag=MAX_LAG)
granger_edges = coefficient_array_to_edges(
    granger_coef_array,
    threshold=GRANGER_THRESHOLD,
    method="Granger_Lasso",
    include_self=False,
)
granger_metrics = summarize_edges(granger_edges, TRUE_LAGGED_EDGE_KEYS, "Granger_Lasso_lagged_edges")
granger_metrics["threshold"] = GRANGER_THRESHOLD
granger_metrics["elapsed_seconds"] = granger_elapsed

granger_edges.to_csv(TABLE_DIR / f"{NOTEBOOK_PREFIX}_granger_lasso_edges.csv", index=False)
save_metrics(granger_metrics, "granger_lasso_metrics")
display(granger_edges.round(4))
method predicted_edges true_edges true_positive_edges false_positive_edges false_negative_edges precision recall f1 missed_edges extra_edges threshold elapsed_seconds
0 Granger_Lasso_lagged_edges 8 8 8 0 0 1.0 1.0 1.0 0.08 0.1495
method source lag target coefficient abs_coefficient edge_label
0 Granger_Lasso match 1 engagement 0.3515 0.3515 match(t-1) -> engagement(t)
1 Granger_Lasso intent 1 match 0.2891 0.2891 intent(t-1) -> match(t)
2 Granger_Lasso need 1 match 0.2528 0.2528 need(t-1) -> match(t)
3 Granger_Lasso engagement 1 renewal 0.3489 0.3489 engagement(t-1) -> renewal(t)
4 Granger_Lasso support 1 renewal -0.1748 0.1748 support(t-1) -> renewal(t)
5 Granger_Lasso engagement 1 support -0.3702 0.3702 engagement(t-1) -> support(t)
6 Granger_Lasso need 2 engagement 0.1192 0.1192 need(t-2) -> engagement(t)
7 Granger_Lasso match 2 renewal 0.1280 0.1280 match(t-2) -> renewal(t)

At this threshold, Granger-Lasso recovers the intended cross-lag structure cleanly. The coefficient signs also line up with the data-generating process: support burden has a negative effect on renewal, and engagement has a negative effect on later support burden in this synthetic design.

Visualize Granger-Lasso Lag Matrices

A matrix view is often easier than a long edge table. The next figure shows the estimated lag-one and lag-two coefficients. Rows are targets and columns are sources.

fig, axes = plt.subplots(1, 2, figsize=(14, 5))
for plot_index, lag in enumerate([1, 2]):
    matrix = granger_coef_array[lag]
    sns.heatmap(
        pd.DataFrame(matrix, index=VARIABLES, columns=VARIABLES),
        cmap="vlag",
        center=0,
        vmin=-0.40,
        vmax=0.40,
        annot=True,
        fmt=".2f",
        ax=axes[plot_index],
    )
    axes[plot_index].set_title(f"Granger-Lasso Coefficients: Lag {lag}")
    axes[plot_index].set_xlabel("Source at lag")
    axes[plot_index].set_ylabel("Target at time t")
plt.tight_layout()
fig.savefig(FIGURE_DIR / f"{NOTEBOOK_PREFIX}_granger_lasso_lag_heatmaps.png", dpi=160, bbox_inches="tight")
plt.show()

The strongest off-diagonal cells correspond to the true lagged edges. The diagonal cells show self-memory, which is expected in a time series and should usually be reported separately from cross-variable causal edges.

Threshold Sensitivity For Granger-Lasso

Regularized coefficients need a practical threshold before they become a graph. Instead of choosing one threshold silently, we scan several values and report precision and recall against the synthetic truth.

threshold_rows = []
for threshold in [0.04, 0.06, 0.08, 0.10, 0.12]:
    candidate_edges = coefficient_array_to_edges(
        granger_coef_array,
        threshold=threshold,
        method="Granger_Lasso",
        include_self=False,
    )
    candidate_metrics = summarize_edges(candidate_edges, TRUE_LAGGED_EDGE_KEYS, f"threshold_{threshold:.2f}")
    candidate_metrics["threshold"] = threshold
    threshold_rows.append(candidate_metrics)

granger_threshold_metrics = pd.concat(threshold_rows, ignore_index=True)
granger_threshold_metrics.to_csv(TABLE_DIR / f"{NOTEBOOK_PREFIX}_granger_threshold_sensitivity.csv", index=False)
display(granger_threshold_metrics[["threshold", "predicted_edges", "true_positive_edges", "false_positive_edges", "false_negative_edges", "precision", "recall", "f1"]].round(4))
threshold predicted_edges true_positive_edges false_positive_edges false_negative_edges precision recall f1
0 0.04 13 8 5 0 0.6154 1.000 0.7619
1 0.06 10 8 2 0 0.8000 1.000 0.8889
2 0.08 8 8 0 0 1.0000 1.000 1.0000
3 0.10 8 8 0 0 1.0000 1.000 1.0000
4 0.12 7 7 0 1 1.0000 0.875 0.9333

The sensitivity table shows why the chosen threshold is reasonable for this teaching dataset. Lower thresholds admit extra weak edges; higher thresholds can eventually drop weaker true lag-two effects.

Plot Threshold Sensitivity

The next plot turns the threshold table into a quick diagnostic. A good threshold region should not be a single fragile point.

plot_thresholds = granger_threshold_metrics.melt(
    id_vars="threshold",
    value_vars=["precision", "recall", "f1"],
    var_name="metric",
    value_name="score",
)

fig, ax = plt.subplots(figsize=(9, 5))
sns.lineplot(data=plot_thresholds, x="threshold", y="score", hue="metric", marker="o", ax=ax)
ax.set_title("Granger-Lasso Threshold Sensitivity")
ax.set_xlabel("Absolute Coefficient Threshold")
ax.set_ylabel("Score")
ax.set_ylim(0, 1.05)
plt.tight_layout()
fig.savefig(FIGURE_DIR / f"{NOTEBOOK_PREFIX}_granger_threshold_sensitivity.png", dpi=160, bbox_inches="tight")
plt.show()

The plot makes the threshold tradeoff visible: the useful region is where recall remains high without admitting many false positive edges.

VAR-LiNGAM

Granger-Lasso is designed for lagged predictive effects. VAR-LiNGAM adds a same-time LiNGAM step on the residuals, so it can estimate both instantaneous and lagged coefficients under linear non-Gaussian assumptions.

We fit two lags because that matches the synthetic data-generating process.
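The model behind VAR-LiNGAM is a structural VAR: x_t = B0 x_t + B1 x_{t-1} + ... + e_t, where B0 holds the same-time effects and the shocks e_t are independent and non-Gaussian. The estimator first fits an ordinary reduced-form VAR, then runs LiNGAM on the residuals to recover B0. A minimal numpy sketch of how the structural form maps to the reduced form (the matrices here are illustrative, not this notebook's estimates):

```python
import numpy as np

# Structural form: x_t = B0 @ x_t + B1 @ x_{t-1} + e_t
# Reduced form:    x_t = M1 @ x_{t-1} + u_t, with M1 = inv(I - B0) @ B1
B0 = np.array([[0.0, 0.0],   # same-time effects; strictly lower triangular
               [0.5, 0.0]])  # under an acyclic same-time ordering
B1 = np.array([[0.3, 0.0],   # lag-1 effects
               [0.0, 0.2]])

identity = np.eye(2)
M1 = np.linalg.inv(identity - B0) @ B1

# The reduced-form lag matrix mixes the same-time effect into the second row:
print(M1)  # [[0.3, 0.0], [0.15, 0.2]]
```

After LiNGAM identifies B0 from the residuals, the structural lag matrices can be recovered as B_tau = (I - B0) @ M_tau, which is why VAR-LiNGAM can report same-time and lagged effects separately.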

VARLINGAM_THRESHOLD = 0.07
var_lingam_model = VARLiNGAM(lags=MAX_LAG, criterion=None, prune=False, random_state=RANDOM_SEED)
start_time = time.perf_counter()
var_lingam_model.fit(scaled_df[VARIABLES].to_numpy())
var_lingam_elapsed = time.perf_counter() - start_time

var_lingam_coef_array = np.asarray(var_lingam_model.adjacency_matrices_)
var_lingam_edges = coefficient_array_to_edges(
    var_lingam_coef_array,
    threshold=VARLINGAM_THRESHOLD,
    method="VAR_LiNGAM",
    include_self=False,
)
var_lingam_metrics = summarize_edges(var_lingam_edges, TRUE_EDGE_KEYS, "VAR_LiNGAM_same_time_and_lagged")
var_lingam_metrics["threshold"] = VARLINGAM_THRESHOLD
var_lingam_metrics["elapsed_seconds"] = var_lingam_elapsed
var_lingam_metrics["estimated_causal_order"] = " -> ".join(VARIABLES[i] for i in var_lingam_model.causal_order_)

var_lingam_edges.to_csv(TABLE_DIR / f"{NOTEBOOK_PREFIX}_var_lingam_edges.csv", index=False)
save_metrics(var_lingam_metrics, "var_lingam_metrics")
display(var_lingam_edges.round(4))
method predicted_edges true_edges true_positive_edges false_positive_edges false_negative_edges precision recall f1 missed_edges extra_edges threshold elapsed_seconds estimated_causal_order
0 VAR_LiNGAM_same_time_and_lagged 13 11 11 2 0 0.8462 1.0 0.9167 renewal(t-1) -> match(t) | renewal(t-2) -> nee... 0.07 0.0327 intent -> match -> engagement -> need -> renew...
method source lag target coefficient abs_coefficient edge_label
0 VAR_LiNGAM match 0 engagement 0.2112 0.2112 match(t) -> engagement(t)
1 VAR_LiNGAM engagement 0 renewal 0.1579 0.1579 engagement(t) -> renewal(t)
2 VAR_LiNGAM engagement 0 support -0.0910 0.0910 engagement(t) -> support(t)
3 VAR_LiNGAM match 1 engagement 0.3133 0.3133 match(t-1) -> engagement(t)
4 VAR_LiNGAM intent 1 match 0.3063 0.3063 intent(t-1) -> match(t)
5 VAR_LiNGAM need 1 match 0.2721 0.2721 need(t-1) -> match(t)
6 VAR_LiNGAM renewal 1 match -0.0753 0.0753 renewal(t-1) -> match(t)
7 VAR_LiNGAM engagement 1 renewal 0.3051 0.3051 engagement(t-1) -> renewal(t)
8 VAR_LiNGAM support 1 renewal -0.1862 0.1862 support(t-1) -> renewal(t)
9 VAR_LiNGAM engagement 1 support -0.3545 0.3545 engagement(t-1) -> support(t)
10 VAR_LiNGAM need 2 engagement 0.1300 0.1300 need(t-2) -> engagement(t)
11 VAR_LiNGAM renewal 2 need -0.0912 0.0912 renewal(t-2) -> need(t)
12 VAR_LiNGAM match 2 renewal 0.1424 0.1424 match(t-2) -> renewal(t)

VAR-LiNGAM recovers both lagged effects and same-time effects in this controlled setting. The estimated causal order is specifically the same-time LiNGAM order among residual shocks, not a full time-unrolled causal order.

Visualize VAR-LiNGAM Matrices

The same-time matrix is labeled lag 0. Lag 1 and lag 2 then show delayed effects. This separation is the main conceptual advantage of VAR-LiNGAM over a plain lagged regression graph.

fig, axes = plt.subplots(1, 3, figsize=(18, 5))
for plot_index, lag in enumerate([0, 1, 2]):
    sns.heatmap(
        pd.DataFrame(var_lingam_coef_array[lag], index=VARIABLES, columns=VARIABLES),
        cmap="vlag",
        center=0,
        vmin=-0.40,
        vmax=0.40,
        annot=True,
        fmt=".2f",
        ax=axes[plot_index],
    )
    axes[plot_index].set_title("VAR-LiNGAM Same-Time" if lag == 0 else f"VAR-LiNGAM Lag {lag}")
    axes[plot_index].set_xlabel("Source")
    axes[plot_index].set_ylabel("Target")
plt.tight_layout()
fig.savefig(FIGURE_DIR / f"{NOTEBOOK_PREFIX}_var_lingam_heatmaps.png", dpi=160, bbox_inches="tight")
plt.show()

The heatmaps show why matrix indexing matters. The same-time effects appear in the first matrix, while the Granger-style delayed effects appear in the later matrices.

Compare Granger-Lasso And VAR-LiNGAM

The two methods answer overlapping but not identical questions. Granger-Lasso is focused on lagged predictive effects. VAR-LiNGAM estimates a richer structural VAR that includes same-time effects.

method_comparison_metrics = pd.concat(
    [
        granger_metrics.assign(target_edge_set="lagged_only"),
        var_lingam_metrics.assign(target_edge_set="same_time_plus_lagged"),
    ],
    ignore_index=True,
)
method_comparison_metrics.to_csv(TABLE_DIR / f"{NOTEBOOK_PREFIX}_method_comparison_metrics.csv", index=False)

display(method_comparison_metrics[["method", "target_edge_set", "predicted_edges", "true_positive_edges", "false_positive_edges", "false_negative_edges", "precision", "recall", "f1", "elapsed_seconds"]].round(4))
method target_edge_set predicted_edges true_positive_edges false_positive_edges false_negative_edges precision recall f1 elapsed_seconds
0 Granger_Lasso_lagged_edges lagged_only 8 8 0 0 1.0000 1.0 1.0000 0.1495
1 VAR_LiNGAM_same_time_and_lagged same_time_plus_lagged 13 11 2 0 0.8462 1.0 0.9167 0.0327

The comparison is not a winner-takes-all contest. The right estimator depends on whether the analysis needs only lagged predictive structure or also same-time structural direction.

Plot Method Comparison

A compact bar plot makes the difference in target edge sets visible. Because the methods are evaluated against different targets, the labels remind us what each score means.

comparison_plot = method_comparison_metrics.melt(
    id_vars=["method", "target_edge_set"],
    value_vars=["precision", "recall", "f1"],
    var_name="metric",
    value_name="score",
)
comparison_plot["method_label"] = comparison_plot["method"] + "\n(" + comparison_plot["target_edge_set"] + ")"

fig, ax = plt.subplots(figsize=(11, 5))
sns.barplot(data=comparison_plot, x="metric", y="score", hue="method_label", ax=ax)
ax.set_title("Time-Series Discovery Metrics By Method")
ax.set_xlabel("Metric")
ax.set_ylabel("Score")
ax.set_ylim(0, 1.05)
ax.legend(title="Method and target")
plt.tight_layout()
fig.savefig(FIGURE_DIR / f"{NOTEBOOK_PREFIX}_method_comparison_metrics.png", dpi=160, bbox_inches="tight")
plt.show()

Both methods perform well because the data were generated from assumptions friendly to these estimators. In real data, this plot should be paired with stability checks and domain review rather than read as proof.

Seed Stability For Granger-Lasso

A single synthetic run can look cleaner than reality. The next cell resimulates the system with several random seeds, refits Granger-Lasso, and checks how many true lagged cross-variable edges are recovered.

stability_rows = []
for seed in range(8):
    simulated = simulate_time_series(n_steps=900, burn_in=120, seed=seed)
    simulated_scaled = StandardScaler().fit_transform(simulated[VARIABLES])
    model = Granger(maxlag=MAX_LAG, cv=5)
    start_time = time.perf_counter()
    raw_coefficients = model.granger_lasso(simulated_scaled)
    elapsed = time.perf_counter() - start_time
    coef_array = granger_coefficients_to_array(raw_coefficients, max_lag=MAX_LAG)
    edges = coefficient_array_to_edges(coef_array, threshold=GRANGER_THRESHOLD, method="Granger_Lasso", include_self=False)
    metrics = summarize_edges(edges, TRUE_LAGGED_EDGE_KEYS, f"seed_{seed}")
    metrics["seed"] = seed
    metrics["elapsed_seconds"] = elapsed
    stability_rows.append(metrics)

granger_seed_stability = pd.concat(stability_rows, ignore_index=True)
granger_seed_stability.to_csv(TABLE_DIR / f"{NOTEBOOK_PREFIX}_granger_seed_stability.csv", index=False)
display(granger_seed_stability[["seed", "predicted_edges", "true_positive_edges", "false_positive_edges", "false_negative_edges", "precision", "recall", "f1", "elapsed_seconds"]].round(4))
seed predicted_edges true_positive_edges false_positive_edges false_negative_edges precision recall f1 elapsed_seconds
0 0 9 8 1 0 0.8889 1.0 0.9412 0.1656
1 1 8 8 0 0 1.0000 1.0 1.0000 0.1452
2 2 8 8 0 0 1.0000 1.0 1.0000 0.1461
3 3 9 8 1 0 0.8889 1.0 0.9412 0.1449
4 4 8 8 0 0 1.0000 1.0 1.0000 0.1448
5 5 9 8 1 0 0.8889 1.0 0.9412 0.1471
6 6 8 8 0 0 1.0000 1.0 1.0000 0.1465
7 7 9 8 1 0 0.8889 1.0 0.9412 0.1467

The seed check asks whether the result is a stable property of the data-generating process or a lucky draw. In this controlled setup the lagged graph is stable across repeated simulations.

Plot Seed Stability

The next plot shows precision, recall, and F1 across simulation seeds. Flat lines near one are reassuring in this synthetic setting.

seed_plot = granger_seed_stability.melt(
    id_vars="seed",
    value_vars=["precision", "recall", "f1"],
    var_name="metric",
    value_name="score",
)

fig, ax = plt.subplots(figsize=(10, 5))
sns.lineplot(data=seed_plot, x="seed", y="score", hue="metric", marker="o", ax=ax)
ax.set_title("Granger-Lasso Stability Across Simulated Seeds")
ax.set_xlabel("Simulation Seed")
ax.set_ylabel("Score")
ax.set_ylim(0, 1.05)
plt.tight_layout()
fig.savefig(FIGURE_DIR / f"{NOTEBOOK_PREFIX}_granger_seed_stability.png", dpi=160, bbox_inches="tight")
plt.show()

The plot is intentionally simple: a stable method should not require a particular random simulation to look good. On real data, an analogous check might resample blocks of time or run rolling-origin splits.
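A block-based resample of real data preserves short-range temporal dependence while still varying the sample. A minimal moving-block bootstrap sketch, assuming regularly spaced rows; the block length of 25 is an illustrative tuning choice, not a recommendation:

```python
import numpy as np

def moving_block_bootstrap(values, block_length, rng):
    """Resample a (T, k) array by concatenating randomly chosen contiguous
    blocks, so within-block autocorrelation survives the resample."""
    values = np.asarray(values)
    n_steps = values.shape[0]
    n_blocks = int(np.ceil(n_steps / block_length))
    starts = rng.integers(0, n_steps - block_length + 1, size=n_blocks)
    blocks = [values[s:s + block_length] for s in starts]
    return np.concatenate(blocks, axis=0)[:n_steps]

rng = np.random.default_rng(0)
series = rng.normal(size=(300, 2))
resampled = moving_block_bootstrap(series, block_length=25, rng=rng)
print(resampled.shape)  # (300, 2)
```

Refitting Granger-Lasso on many such resamples, and keeping only edges that appear in most of them, is one way to carry the seed-stability idea over to data you cannot resimulate.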

Sample-Size Sensitivity

Time-series discovery can be data-hungry because each additional candidate lag adds another full set of predictors for every target. We rerun Granger-Lasso with increasing sample sizes to show how recovery changes as the available history grows.

sample_size_rows = []
for n_steps in [250, 400, 700, 1_200]:
    simulated = simulate_time_series(n_steps=n_steps, burn_in=120, seed=RANDOM_SEED + n_steps)
    simulated_scaled = StandardScaler().fit_transform(simulated[VARIABLES])
    model = Granger(maxlag=MAX_LAG, cv=5)
    start_time = time.perf_counter()
    raw_coefficients = model.granger_lasso(simulated_scaled)
    elapsed = time.perf_counter() - start_time
    coef_array = granger_coefficients_to_array(raw_coefficients, max_lag=MAX_LAG)
    edges = coefficient_array_to_edges(coef_array, threshold=GRANGER_THRESHOLD, method="Granger_Lasso", include_self=False)
    metrics = summarize_edges(edges, TRUE_LAGGED_EDGE_KEYS, f"n_{n_steps}")
    metrics["n_steps"] = n_steps
    metrics["elapsed_seconds"] = elapsed
    sample_size_rows.append(metrics)

granger_sample_size = pd.concat(sample_size_rows, ignore_index=True)
granger_sample_size.to_csv(TABLE_DIR / f"{NOTEBOOK_PREFIX}_granger_sample_size_sensitivity.csv", index=False)
display(granger_sample_size[["n_steps", "predicted_edges", "true_positive_edges", "false_positive_edges", "false_negative_edges", "precision", "recall", "f1", "elapsed_seconds"]].round(4))
n_steps predicted_edges true_positive_edges false_positive_edges false_negative_edges precision recall f1 elapsed_seconds
0 250 10 7 3 1 0.7 0.875 0.7778 0.1544
1 400 8 8 0 0 1.0 1.000 1.0000 0.1462
2 700 8 8 0 0 1.0 1.000 1.0000 0.1487
3 1200 8 8 0 0 1.0 1.000 1.0000 0.1673

The smaller histories can miss weaker effects or admit extra edges. As the sample grows, the graph estimate becomes more reliable in this controlled example.

Plot Sample-Size Sensitivity

This plot makes the data requirement visible. A causal graph that changes dramatically with modest changes in history length should be reported cautiously.

sample_plot = granger_sample_size.melt(
    id_vars="n_steps",
    value_vars=["precision", "recall", "f1"],
    var_name="metric",
    value_name="score",
)

fig, ax = plt.subplots(figsize=(10, 5))
sns.lineplot(data=sample_plot, x="n_steps", y="score", hue="metric", marker="o", ax=ax)
ax.set_title("Granger-Lasso Sample-Size Sensitivity")
ax.set_xlabel("Number Of Time Steps")
ax.set_ylabel("Score")
ax.set_ylim(0, 1.05)
plt.tight_layout()
fig.savefig(FIGURE_DIR / f"{NOTEBOOK_PREFIX}_granger_sample_size_sensitivity.png", dpi=160, bbox_inches="tight")
plt.show()

The curve gives a practical lesson: lagged discovery needs enough time points after lag construction. If a project has many variables and few time steps, the graph can be underidentified or unstable.

Naive I.I.D. PC As A Cautionary Example

The next cell intentionally does something risky: it runs PC directly on the same-time rows as if they were independent tabular samples. This is included to show why time-series structure should not be ignored.

The result is not treated as a valid time-series causal graph.

naive_pc_result = pc(
    scaled_df[VARIABLES].to_numpy(),
    alpha=0.01,
    indep_test="fisherz",
    stable=True,
    show_progress=False,
    node_names=VARIABLES,
)
naive_pc_edges = pd.DataFrame({"edge": [str(edge) for edge in naive_pc_result.G.get_graph_edges()]})
naive_pc_edges.to_csv(TABLE_DIR / f"{NOTEBOOK_PREFIX}_naive_iid_pc_edges.csv", index=False)
display(naive_pc_edges)
edge
0 intent --> match
1 engagement --> match
2 engagement --- renewal
3 engagement --- support
4 renewal --- support

The naive PC graph mixes contemporaneous association, lagged persistence, and indirect effects into one same-time graph. This is the main lesson: time ordering is not optional metadata; it is part of the causal problem.
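The underlying statistical problem can be demonstrated directly: two independent AR(1) series tested with a naive Fisher-z correlation test reject the true null far more often than the nominal level, because autocorrelated rows are not independent samples. A small Monte Carlo sketch, with illustrative choices of AR coefficient, sample size, and replication count:

```python
import numpy as np

def ar1(n_steps, phi, rng, burn_in=100):
    """Simulate a stationary AR(1) series with unit-variance innovations."""
    noise = rng.normal(size=n_steps + burn_in)
    x = np.zeros(n_steps + burn_in)
    for t in range(1, n_steps + burn_in):
        x[t] = phi * x[t - 1] + noise[t]
    return x[burn_in:]

rng = np.random.default_rng(0)
n_steps, phi, n_reps = 300, 0.9, 500
rejections = 0
for _ in range(n_reps):
    a = ar1(n_steps, phi, rng)
    b = ar1(n_steps, phi, rng)  # independent of a by construction
    r = np.corrcoef(a, b)[0, 1]
    z = 0.5 * np.log((1 + r) / (1 - r)) * np.sqrt(n_steps - 3)  # Fisher z
    if abs(z) > 1.96:  # naive two-sided test at nominal 5%
        rejections += 1

false_positive_rate = rejections / n_reps
print(false_positive_rate)  # typically far above the nominal 0.05
```

The Fisher-z test assumes i.i.d. rows; strong autocorrelation inflates the variance of the sample correlation, so the same inflation that misleads this pairwise test also feeds the conditional-independence decisions inside naive PC.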

Runtime Sketch

Runtime grows with the number of variables, candidate lags, and repeated model fits. This quick sketch fits Granger-Lasso on growing prefixes of the variable list so we can see the local scaling pattern.

runtime_rows = []
for n_vars in [3, 4, 5, 6]:
    selected = VARIABLES[:n_vars]
    selected_values = scaled_df[selected].to_numpy()
    model = Granger(maxlag=MAX_LAG, cv=5)
    start_time = time.perf_counter()
    model.granger_lasso(selected_values)
    elapsed = time.perf_counter() - start_time
    runtime_rows.append({"n_variables": n_vars, "max_lag": MAX_LAG, "elapsed_seconds": elapsed})

runtime_table = pd.DataFrame(runtime_rows)
runtime_table.to_csv(TABLE_DIR / f"{NOTEBOOK_PREFIX}_granger_runtime_sketch.csv", index=False)
display(runtime_table.round(4))
n_variables max_lag elapsed_seconds
0 3 2 0.0726
1 4 2 0.0968
2 5 2 0.1212
3 6 2 0.1491

This small benchmark is not a universal runtime claim, but it helps set expectations. More variables and more lags mean more candidate predictors for every target variable.

Plot Runtime Sketch

The line plot is a quick sanity check for the local workload. In a real benchmark, you would repeat each point and include more graph sizes.

fig, ax = plt.subplots(figsize=(8, 5))
sns.lineplot(data=runtime_table, x="n_variables", y="elapsed_seconds", marker="o", color="#334155", ax=ax)
ax.set_title("Granger-Lasso Runtime Sketch")
ax.set_xlabel("Number Of Variables")
ax.set_ylabel("Elapsed Seconds")
plt.tight_layout()
fig.savefig(FIGURE_DIR / f"{NOTEBOOK_PREFIX}_granger_runtime_sketch.png", dpi=160, bbox_inches="tight")
plt.show()

The runtime remains small here because the tutorial graph is tiny. The same workflow becomes more expensive when the variable count, maximum lag, or resampling plan grows.

When To Use A Dedicated Time-Series Discovery Library

causal-learn gives us Granger tools and VAR-LiNGAM, which are useful for small and medium teaching workflows. For larger time-series causal discovery, especially conditional-independence search over many lags, a specialized library may be more ergonomic.

The next table summarizes the practical decision boundary.

method_guidance = pd.DataFrame(
    [
        {
            "need": "Quick pairwise lag test",
            "reasonable_tool": "causal-learn Granger.granger_test_2d",
            "main_warning": "Pairwise tests omit other variables and can be confounded.",
        },
        {
            "need": "Small multivariate lagged graph",
            "reasonable_tool": "causal-learn Granger.granger_lasso",
            "main_warning": "Coefficient threshold and lag length drive the final graph.",
        },
        {
            "need": "Same-time plus lagged linear non-Gaussian effects",
            "reasonable_tool": "VAR-LiNGAM",
            "main_warning": "Requires linear non-Gaussian assumptions and careful residual checks.",
        },
        {
            "need": "Large lagged conditional-independence workflow",
            "reasonable_tool": "Dedicated time-series discovery package such as Tigramite",
            "main_warning": "Choose tests, lags, and stationarity checks explicitly.",
        },
        {
            "need": "Cross-sectional DAG from autocorrelated rows",
            "reasonable_tool": "Usually avoid; first decide the temporal graph target",
            "main_warning": "Naive i.i.d. methods can orient or connect the wrong variables.",
        },
    ]
)
method_guidance.to_csv(TABLE_DIR / f"{NOTEBOOK_PREFIX}_time_series_tool_guidance.csv", index=False)
display(method_guidance)
need reasonable_tool main_warning
0 Quick pairwise lag test causal-learn Granger.granger_test_2d Pairwise tests omit other variables and can be...
1 Small multivariate lagged graph causal-learn Granger.granger_lasso Coefficient threshold and lag length drive the...
2 Same-time plus lagged linear non-Gaussian effects VAR-LiNGAM Requires linear non-Gaussian assumptions and c...
3 Large lagged conditional-independence workflow Dedicated time-series discovery package such a... Choose tests, lags, and stationarity checks ex...
4 Cross-sectional DAG from autocorrelated rows Usually avoid; first decide the temporal graph... Naive i.i.d. methods can orient or connect the...

This guidance is deliberately conservative. Time-series discovery is strongest when the question is phrased in temporal terms before any algorithm is run.

Reporting Checklist

A time-series causal discovery report should say exactly what temporal target was estimated. The next table is a reusable checklist for documenting that choice.

reporting_checklist = pd.DataFrame(
    [
        {
            "item": "Sampling unit and cadence",
            "what_to_report": "What one row represents and whether the spacing is regular.",
            "why_it_matters": "Lagged effects only have meaning relative to the measurement interval.",
        },
        {
            "item": "Maximum lag",
            "what_to_report": "The largest lag tested and why it was chosen.",
            "why_it_matters": "Too few lags miss delayed effects; too many lags increase false positives.",
        },
        {
            "item": "Same-time edge policy",
            "what_to_report": "Whether same-time effects are estimated, forbidden, or left unresolved.",
            "why_it_matters": "Timestamp order alone cannot orient variables measured at the same time.",
        },
        {
            "item": "Stationarity checks",
            "what_to_report": "Drift, breaks, seasonality, and distribution changes considered before discovery.",
            "why_it_matters": "Nonstationarity can create spurious lagged dependence.",
        },
        {
            "item": "Threshold or significance rule",
            "what_to_report": "Coefficient threshold, alpha level, or selection rule used to produce edges.",
            "why_it_matters": "The graph is often sensitive to this conversion step.",
        },
        {
            "item": "Stability analysis",
            "what_to_report": "Resampling, rolling windows, or seed checks used to assess graph robustness.",
            "why_it_matters": "Unstable edges should be treated as hypotheses, not findings.",
        },
        {
            "item": "Known omitted variables",
            "what_to_report": "Important unobserved drivers or measurement gaps.",
            "why_it_matters": "Hidden common causes can invalidate Granger and VAR-LiNGAM causal claims.",
        },
    ]
)
reporting_checklist.to_csv(TABLE_DIR / f"{NOTEBOOK_PREFIX}_reporting_checklist.csv", index=False)
display(reporting_checklist)
item what_to_report why_it_matters
0 Sampling unit and cadence What one row represents and whether the spacin... Lagged effects only have meaning relative to t...
1 Maximum lag The largest lag tested and why it was chosen. Too few lags miss delayed effects; too many la...
2 Same-time edge policy Whether same-time effects are estimated, forbi... Timestamp order alone cannot orient variables ...
3 Stationarity checks Drift, breaks, seasonality, and distribution c... Nonstationarity can create spurious lagged dep...
4 Threshold or significance rule Coefficient threshold, alpha level, or selecti... The graph is often sensitive to this conversio...
5 Stability analysis Resampling, rolling windows, or seed checks us... Unstable edges should be treated as hypotheses...
6 Known omitted variables Important unobserved drivers or measurement gaps. Hidden common causes can invalidate Granger an...

This checklist turns the notebook into a reusable analysis template. The estimator output is only one part of the causal claim; the temporal assumptions and reporting choices are equally important.

Artifact Manifest

The final cell lists the datasets, tables, and figures saved by this notebook. This makes it easy to find outputs later without scanning the whole folder.

artifact_paths = sorted(
    list(DATASET_DIR.glob(f"{NOTEBOOK_PREFIX}_*"))
    + list(TABLE_DIR.glob(f"{NOTEBOOK_PREFIX}_*"))
    + list(FIGURE_DIR.glob(f"{NOTEBOOK_PREFIX}_*"))
    + list(REPORT_DIR.glob(f"{NOTEBOOK_PREFIX}_*"))
)
artifact_manifest = pd.DataFrame(
    [
        {
            "artifact_type": path.parent.name,
            "path": str(path.relative_to(OUTPUT_DIR)),
            "size_kb": round(path.stat().st_size / 1024, 2),
        }
        for path in artifact_paths
    ]
)
artifact_manifest.to_csv(TABLE_DIR / f"{NOTEBOOK_PREFIX}_artifact_manifest.csv", index=False)
display(artifact_manifest)
artifact_type path size_kb
0 datasets datasets/13_lagged_design_matrix.csv 419.63
1 datasets datasets/13_synthetic_time_series.csv 143.24
2 datasets datasets/13_synthetic_time_series_standardized... 143.26
3 figures figures/13_correlation_and_autocorrelation.png 145.52
4 figures figures/13_granger_lasso_lag_heatmaps.png 125.66
5 figures figures/13_granger_runtime_sketch.png 61.38
6 figures figures/13_granger_sample_size_sensitivity.png 57.37
7 figures figures/13_granger_seed_stability.png 64.47
8 figures figures/13_granger_threshold_sensitivity.png 68.92
9 figures figures/13_method_comparison_metrics.png 70.14
10 figures figures/13_time_series_slice.png 517.58
11 figures figures/13_true_temporal_graph.png 133.36
12 figures figures/13_var_lingam_heatmaps.png 165.38
13 tables tables/13_artifact_manifest.csv 1.66
14 tables tables/13_autocorrelation.csv 0.32
15 tables tables/13_contemporaneous_correlation.csv 0.71
16 tables tables/13_data_audit.csv 0.56
17 tables tables/13_granger_lasso_edges.csv 0.82
18 tables tables/13_granger_lasso_metrics.csv 0.24
19 tables tables/13_granger_runtime_sketch.csv 0.13
20 tables tables/13_granger_sample_size_sensitivity.csv 0.49
21 tables tables/13_granger_seed_stability.csv 0.79
22 tables tables/13_granger_threshold_sensitivity.csv 0.63
23 tables tables/13_method_comparison_metrics.csv 0.53
24 tables tables/13_naive_iid_pc_edges.csv 0.11
25 tables tables/13_package_versions.csv 0.10
26 tables tables/13_pairwise_granger_match_engagement.csv 0.16
27 tables tables/13_reporting_checklist.csv 1.10
28 tables tables/13_time_series_tool_guidance.csv 0.72
29 tables tables/13_time_step_check.csv 0.09
30 tables tables/13_true_temporal_edges.csv 0.40
31 tables tables/13_var_lingam_edges.csv 1.26
32 tables tables/13_var_lingam_metrics.csv 0.40

The manifest completes the notebook. The core lesson is that time-series causal discovery is about causal timing first and algorithms second: define the lagged target, respect temporal dependence, then use discovery methods as structured hypothesis generators.