causal-learn Tutorial 10: LiNGAM And Linear Non-Gaussian Models
LiNGAM stands for linear non-Gaussian acyclic model. It is a causal discovery family for settings where each variable is generated as a linear function of earlier variables plus an independent non-Gaussian error term. That extra non-Gaussianity assumption is the important part: ordinary constraint-based and score-based methods often identify only a Markov equivalence class, while LiNGAM can sometimes orient directions that would otherwise remain ambiguous.
This notebook treats LiNGAM as a practical modeling tool, not as magic. We will build a small synthetic system where the assumptions are intentionally true, check the distributional assumptions, run DirectLiNGAM and ICA-LiNGAM, inspect coefficient matrices, evaluate graph recovery, compare against PC and GES, and stress-test what happens when the non-Gaussian assumption is weakened.
Expected runtime: usually under 1-2 minutes on a laptop. The bootstrap section is the slowest part, but it uses a small number of resamples so the notebook stays interactive.
Learning Goals
By the end of this notebook, you should be able to:
Explain what LiNGAM assumes and why non-Gaussian noise helps with orientation.
Fit DirectLiNGAM and ICALiNGAM from causal-learn.
Read the LiNGAM adjacency matrix correctly: rows are children and columns are parents.
Turn coefficient matrices into tidy edge tables and graph visualizations.
Compare LiNGAM results with PC and GES on the same data.
Use bootstrap probabilities and total effects as reliability diagnostics.
Recognize when LiNGAM is being used outside its comfort zone.
Notebook Flow
We will follow the same pattern used in the earlier causal-learn tutorials. First we set up the environment and output folders. Then we create a teaching dataset with known causal structure. After that we check whether the data looks plausibly non-Gaussian, fit LiNGAM models, evaluate the learned graph, compare against other discovery families, and close with reporting guidance.
The most important conceptual difference from PC and GES is that LiNGAM estimates a directed weighted structural equation model. The edge output is not just adjacency; it also gives a signed coefficient for each direct parent-child relationship.
LiNGAM Theory
LiNGAM is easiest to understand as a structural equation model. Suppose we observe a vector of variables \(x = (x_1, x_2, \ldots, x_p)\). A linear acyclic causal model writes each variable as a weighted sum of its direct causes plus an error term:
\[
x_i = \sum_{j \ne i} b_{ij} x_j + e_i
\]
Stacking all variables together gives the matrix form:
\[
x = Bx + e
\]
Here, \(B\) is the direct causal coefficient matrix and \(e\) is the vector of disturbance terms. In the causal-learn LiNGAM implementation, the fitted adjacency matrix follows this same child-row, parent-column convention: \(B_{ij}\) is the coefficient from parent variable \(x_j\) into child variable \(x_i\).
This orientation matters. If the entry in row match and column need is 0.75, the statement is need -> match, not match -> need.
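The child-row, parent-column convention is easy to get backwards, so it helps to mechanize the reading. The following is a minimal sketch (the small matrix and variable names are illustrative, not notebook output) that converts a matrix in this convention into plain edge statements:

```python
import numpy as np

# Hypothetical 3-variable example in the causal-learn LiNGAM convention:
# row = child, column = parent.
variables = ["need", "intent", "match"]
B = np.array([
    [0.00, 0.00, 0.00],   # need has no parents
    [0.00, 0.00, 0.00],   # intent has no parents
    [0.75, 0.70, 0.00],   # match <- need (0.75), match <- intent (0.70)
])

edges = []
for child, parent in zip(*np.nonzero(B)):
    # B[child, parent] is the coefficient from the parent into the child.
    edges.append(f"{variables[parent]} -> {variables[child]} ({B[child, parent]:+.2f})")

print(edges)  # ['need -> match (+0.75)', 'intent -> match (+0.70)']
```

Reading the matrix through a helper like this avoids the most common mistake of interpreting rows as sources.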
Why Acyclicity Matters
The word acyclic means that the graph has no directed feedback loops inside the same time slice. If the variables can be ordered so that causes always come before effects, then the coefficient matrix \(B\) can be permuted into a strictly triangular matrix. In plain language: after the right variable ordering is found, every variable only depends on variables earlier in the order.
This is why LiNGAM talks about a causal order. The order is not just a presentation detail; it is part of the model. Once a plausible order is found, the direct coefficients can be estimated by regressing each variable on variables that appear before it.
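The claim that an acyclic \(B\) can be permuted into a strictly triangular matrix can be checked mechanically. This sketch (with an illustrative matrix, not notebook output) permutes rows and columns by a candidate order and tests strict lower-triangularity:

```python
import numpy as np

def is_strictly_lower_triangular(B, order):
    """True when permuting rows and columns of B into `order` leaves all
    entries on or above the diagonal equal to zero, i.e. every variable
    depends only on variables earlier in the order."""
    P = np.asarray(B)[np.ix_(order, order)]  # reorder rows and columns together
    return bool(np.allclose(np.triu(P), 0.0))

# Child-row, parent-column matrix for the graph x0 -> x2 <- x1.
B = np.array([
    [0.00, 0.00, 0.00],
    [0.00, 0.00, 0.00],
    [0.75, 0.70, 0.00],
])
print(is_strictly_lower_triangular(B, [0, 1, 2]))  # True: (x0, x1, x2) is a valid causal order
print(is_strictly_lower_triangular(B, [2, 0, 1]))  # False: x2 cannot come before its parents
```

A cyclic system has no order that passes this check, which is exactly why basic LiNGAM excludes feedback loops.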
Acyclicity is also a limitation. Basic LiNGAM is not the right model for instantaneous feedback systems such as price -> demand -> price within the same measurement window. For feedback, time-lagged models or other structural assumptions are needed.
Why Gaussian Linear Models Cannot Usually Orient Directions
In a two-variable linear Gaussian system, both directions can often explain the same joint distribution. If \(Y = aX + e_Y\) with Gaussian noise, the reverse regression \(X = cY + e_X\) can also have Gaussian residuals with no obvious distributional contradiction. That is why purely Gaussian linear data often leaves direction ambiguous unless the graph has extra structure such as colliders, time order, interventions, or strong background knowledge.
LiNGAM changes the problem by assuming the error terms are independent and non-Gaussian. Under the LiNGAM assumptions, the correct causal direction tends to produce independent residuals, while the wrong direction tends to leave dependence between the residual and the proposed cause. This asymmetry is the source of LiNGAM’s extra orienting power.
The key message is not that non-Gaussian variables automatically imply causality. The key message is narrower: in a linear acyclic system with independent non-Gaussian disturbances and no hidden confounding, the direction can become identifiable from observational data.
ICA-LiNGAM Versus DirectLiNGAM
The original ICA-LiNGAM view starts from the matrix equation:
\[
x = Bx + e \quad \Rightarrow \quad x = (I - B)^{-1}e
\]
This looks like an independent component analysis problem: observed variables are mixtures of independent non-Gaussian components. ICA-LiNGAM estimates the mixing/unmixing structure and then converts it into a causal ordering and coefficient matrix.
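The identity \(x = (I - B)^{-1} e\) can be verified numerically. This sketch (a hypothetical three-variable chain, not notebook output) builds the same data once by recursive substitution and once through the mixing matrix:

```python
import numpy as np

# Verify x = (I - B)^{-1} e for the chain x0 -> x1 -> x2.
rng = np.random.default_rng(0)
B = np.array([
    [0.0, 0.0, 0.0],
    [0.8, 0.0, 0.0],   # x1 = 0.8 * x0 + e1
    [0.0, 0.5, 0.0],   # x2 = 0.5 * x1 + e2
])
e = rng.laplace(size=(3, 1000))          # independent non-Gaussian disturbances

# Construction 1: substitute the structural equations one by one.
x_recursive = np.zeros_like(e)
x_recursive[0] = e[0]
x_recursive[1] = 0.8 * x_recursive[0] + e[1]
x_recursive[2] = 0.5 * x_recursive[1] + e[2]

# Construction 2: apply the ICA-style mixing matrix (I - B)^{-1}.
mixing = np.linalg.inv(np.eye(3) - B)
x_matrix = mixing @ e

print(np.allclose(x_recursive, x_matrix))  # True
```

This equivalence is what lets ICA-LiNGAM treat the observed variables as linear mixtures of independent non-Gaussian sources.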
DirectLiNGAM takes a different route. Instead of solving the whole problem through ICA, it searches for exogenous variables one by one. An exogenous variable has no parents, so it should be independent of residuals from other variables after removing its effect. DirectLiNGAM repeatedly finds such variables, removes their influence, and builds the causal order.
In practice, DirectLiNGAM is often easier to use as a default teaching model because its causal-order logic is direct and stable on small examples. ICA-LiNGAM remains useful historically and conceptually because it shows the connection between non-Gaussian causal discovery and independent component analysis.
What LiNGAM Can And Cannot Claim
When its assumptions are credible, LiNGAM can provide three useful outputs:
A causal ordering of variables.
A directed graph after thresholding small coefficients.
Signed direct-effect coefficients for each parent-child relationship.
Those outputs are still conditional on the model assumptions. LiNGAM does not automatically solve hidden confounding, selection bias, measurement error, nonlinear effects, cycles, or badly misspecified variable sets. A clean LiNGAM graph should be read as a structured causal hypothesis supported by the assumptions and diagnostics, not as a final causal proof.
That is why this notebook pairs LiNGAM estimates with distribution checks, residual independence checks, bootstrap stability, threshold sensitivity, and comparison against PC/GES. The goal is to make the assumptions visible enough that a reader can judge the result rather than simply admire the graph.
Setup
This cell imports the scientific Python stack, the causal-learn algorithms used in this notebook, and a small compatibility wrapper for GES scoring under recent NumPy behavior. The GES wrapper is included only for the comparison section; the LiNGAM models themselves do not depend on it.
The output folders are created up front so every table and figure produced later has a stable path under notebooks/tutorials/causal_learn/outputs/.
from pathlib import Path
from importlib.metadata import PackageNotFoundError, version
import contextlib
import io
import itertools
import os
import time
import warnings

# Resolve paths before importing matplotlib so the cache directory is writable in restricted environments.
CWD = Path.cwd()
if CWD.name == "causal_learn" and (CWD / "outputs").exists():
    NOTEBOOK_DIR = CWD
else:
    NOTEBOOK_DIR = (CWD / "notebooks" / "tutorials" / "causal_learn").resolve()
OUTPUT_DIR = NOTEBOOK_DIR / "outputs"
DATASET_DIR = OUTPUT_DIR / "datasets"
TABLE_DIR = OUTPUT_DIR / "tables"
FIGURE_DIR = OUTPUT_DIR / "figures"
REPORT_DIR = OUTPUT_DIR / "reports"
for directory in [OUTPUT_DIR, DATASET_DIR, TABLE_DIR, FIGURE_DIR, REPORT_DIR, OUTPUT_DIR / "matplotlib_cache"]:
    directory.mkdir(parents=True, exist_ok=True)
os.environ.setdefault("MPLCONFIGDIR", str(OUTPUT_DIR / "matplotlib_cache"))
warnings.filterwarnings("ignore", message="IProgress not found.*")
warnings.filterwarnings("ignore", message=".*pkg_resources is deprecated.*")

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
from IPython.display import display
from matplotlib.patches import FancyArrowPatch, FancyBboxPatch
from scipy import stats
from sklearn.exceptions import ConvergenceWarning
from causallearn.search.ConstraintBased.PC import pc
import causallearn.search.ScoreBased.GES as ges_module
from causallearn.search.ScoreBased.GES import ges
from causallearn.search.FCMBased.lingam import DirectLiNGAM, ICALiNGAM

warnings.filterwarnings("ignore", category=ConvergenceWarning)

NOTEBOOK_PREFIX = "10"
sns.set_theme(style="whitegrid", context="notebook")
plt.rcParams["figure.dpi"] = 120
plt.rcParams["savefig.facecolor"] = "white"


def local_score_BIC_from_cov(Data, i, PAi, parameters=None):
    """Compatibility wrapper for causal-learn GES BIC scoring under recent NumPy scalar behavior."""
    cov, n = Data
    lambda_value = 0.5 if parameters is None else parameters.get("lambda_value", 0.5)
    sigma = float(cov[i, i])
    if len(PAi) > 0:
        yX = cov[np.ix_([i], PAi)]
        XX = cov[np.ix_(PAi, PAi)]
        try:
            XX_inv = np.linalg.inv(XX)
        except np.linalg.LinAlgError:
            XX_inv = np.linalg.pinv(XX)
        sigma = float(cov[i, i] - (yX @ XX_inv @ yX.T).item())
    sigma = max(sigma, 1e-12)
    return -0.5 * n * (1 + np.log(sigma)) - lambda_value * (len(PAi) + 1) * np.log(n)


# Patch only the module object used by GES during this notebook session.
ges_module.local_score_BIC_from_cov = local_score_BIC_from_cov

packages = ["causal-learn", "numpy", "pandas", "scipy", "scikit-learn", "matplotlib", "seaborn"]
version_rows = []
for package in packages:
    try:
        package_version = version(package)
    except PackageNotFoundError:
        package_version = "not installed"
    version_rows.append({"package": package, "version": package_version})
package_versions = pd.DataFrame(version_rows)
package_versions.to_csv(TABLE_DIR / f"{NOTEBOOK_PREFIX}_package_versions.csv", index=False)
display(package_versions)
   package       version
0  causal-learn  0.1.4.5
1  numpy         2.4.4
2  pandas        3.0.2
3  scipy         1.17.1
4  scikit-learn  1.6.1
5  matplotlib    3.10.9
6  seaborn       0.13.2
The version table is useful for reproducibility. Causal discovery libraries can change defaults across releases, so saving package versions next to the notebook outputs makes later reruns easier to explain.
Concept Map
Before fitting models, we make the assumptions explicit. LiNGAM is a strong-assumption method: when those assumptions are reasonable, it can produce more directional information than methods that rely only on conditional independence or decomposable scores. When those assumptions are not reasonable, the extra directionality can become misleading.
concept_map = pd.DataFrame(
    [
        {
            "concept": "linear structural equations",
            "meaning": "Each variable is modeled as a weighted sum of earlier variables plus an error term.",
            "why_it_matters": "The learned adjacency matrix can be read as direct linear effects.",
        },
        {
            "concept": "non-Gaussian independent errors",
            "meaning": "Noise terms are independent and not normally distributed.",
            "why_it_matters": "This breaks some direction symmetries that remain under Gaussian linear models.",
        },
        {
            "concept": "acyclicity",
            "meaning": "The causal graph has no directed feedback loops within the same time slice.",
            "why_it_matters": "LiNGAM estimates a causal order, so cycles are outside the basic model.",
        },
        {
            "concept": "causal order",
            "meaning": "An ordering where parents appear before children.",
            "why_it_matters": "DirectLiNGAM estimates this order before estimating edge coefficients.",
        },
        {
            "concept": "coefficient threshold",
            "meaning": "Small estimated coefficients are often treated as absent edges for reporting.",
            "why_it_matters": "Threshold choice affects precision, recall, and graph readability.",
        },
        {
            "concept": "bootstrap probability",
            "meaning": "The frequency with which an edge appears across resampled datasets.",
            "why_it_matters": "It gives a practical stability check, not a proof of causality.",
        },
    ]
)
concept_map.to_csv(TABLE_DIR / f"{NOTEBOOK_PREFIX}_lingam_concept_map.csv", index=False)
display(concept_map)
   concept                          meaning                                                                       why_it_matters
0  linear structural equations      Each variable is modeled as a weighted sum of earlier variables plus an error term.  The learned adjacency matrix can be read as direct linear effects.
1  non-Gaussian independent errors  Noise terms are independent and not normally distributed.                     This breaks some direction symmetries that remain under Gaussian linear models.
2  acyclicity                       The causal graph has no directed feedback loops within the same time slice.   LiNGAM estimates a causal order, so cycles are outside the basic model.
3  causal order                     An ordering where parents appear before children.                             DirectLiNGAM estimates this order before estimating edge coefficients.
4  coefficient threshold            Small estimated coefficients are often treated as absent edges for reporting.  Threshold choice affects precision, recall, and graph readability.
5  bootstrap probability            The frequency with which an edge appears across resampled datasets.           It gives a practical stability check, not a proof of causality.
The map highlights the tradeoff. LiNGAM can be more informative than PC or GES about direction, but only because it uses more assumptions. The rest of the notebook keeps checking those assumptions instead of treating the directed output as automatic truth.
Teaching Data Design
We will use six variables from the same teaching universe as the earlier causal-learn notebooks. The graph is intentionally simple enough to inspect by eye, while still having multiple parents, a mediated path, and a downstream branch.
The data-generating system is:
need and intent are exogenous drivers.
match depends on need and intent.
engagement depends on match.
renewal depends on intent and engagement.
support depends on engagement.
For the main LiNGAM example, every error term is sampled from a Laplace distribution. Laplace noise is symmetric but heavier-tailed than Gaussian noise, so it is a clean way to create non-Gaussian errors without adding unnecessary complexity.
VARIABLES = ["need", "intent", "match", "engagement", "renewal", "support"]
TRUE_EDGES = [
    ("need", "match", 0.75),
    ("intent", "match", 0.70),
    ("match", "engagement", 0.85),
    ("intent", "renewal", 0.30),
    ("engagement", "renewal", 0.65),
    ("engagement", "support", 0.55),
]
variable_guide = pd.DataFrame(
    [
        {"variable": "need", "role": "root cause", "description": "Baseline level of user need or demand."},
        {"variable": "intent", "role": "root cause", "description": "Goal-directed intent that also affects later renewal value."},
        {"variable": "match", "role": "mediator", "description": "Quality of matching need and intent to available options."},
        {"variable": "engagement", "role": "mediator", "description": "Observed engagement generated by match quality."},
        {"variable": "renewal", "role": "outcome", "description": "Future value that depends on engagement and intent."},
        {"variable": "support", "role": "downstream outcome", "description": "Support burden or service need driven by engagement."},
    ]
)
true_edge_table = pd.DataFrame(TRUE_EDGES, columns=["source", "target", "true_coefficient"])
true_edge_table.insert(0, "run", "truth")
true_edge_table["edge_type"] = "-->"
true_edge_table = true_edge_table[["run", "source", "edge_type", "target", "true_coefficient"]]
true_coefficient_matrix = pd.DataFrame(0.0, index=VARIABLES, columns=VARIABLES)
for source, target, coefficient in TRUE_EDGES:
    true_coefficient_matrix.loc[target, source] = coefficient
variable_guide.to_csv(TABLE_DIR / f"{NOTEBOOK_PREFIX}_variable_guide.csv", index=False)
true_edge_table.to_csv(TABLE_DIR / f"{NOTEBOOK_PREFIX}_true_edges.csv", index=False)
true_coefficient_matrix.to_csv(TABLE_DIR / f"{NOTEBOOK_PREFIX}_true_coefficient_matrix.csv")
display(variable_guide)
display(true_edge_table)
   variable    role                description
0  need        root cause          Baseline level of user need or demand.
1  intent      root cause          Goal-directed intent that also affects later renewal value.
2  match       mediator            Quality of matching need and intent to available options.
3  engagement  mediator            Observed engagement generated by match quality.
4  renewal     outcome             Future value that depends on engagement and intent.
5  support     downstream outcome  Support burden or service need driven by engagement.

   run    source      edge_type  target      true_coefficient
0  truth  need        -->        match       0.75
1  truth  intent      -->        match       0.70
2  truth  match       -->        engagement  0.85
3  truth  intent      -->        renewal     0.30
4  truth  engagement  -->        renewal     0.65
5  truth  engagement  -->        support     0.55
The coefficient matrix uses the same orientation as causal-learn LiNGAM: row equals child, column equals parent. For example, the value in row match and column need is the direct effect of need on match.
Generate Linear Non-Gaussian Data
This cell creates the main dataset and a Gaussian-control version with the same structural coefficients. The non-Gaussian dataset is the one LiNGAM is designed for. The Gaussian version is saved for a later stress test where the structural graph is unchanged but the key LiNGAM distributional assumption is weakened.
def simulate_linear_system(n_samples=1_200, noise="laplace", seed=10):
    """Simulate the six-variable teaching SEM with either Laplace or Gaussian errors."""
    rng = np.random.default_rng(seed)
    if noise == "laplace":
        errors = rng.laplace(loc=0.0, scale=1.0, size=(n_samples, len(VARIABLES)))
    elif noise == "gaussian":
        errors = rng.normal(loc=0.0, scale=1.0, size=(n_samples, len(VARIABLES)))
    else:
        raise ValueError("noise must be either 'laplace' or 'gaussian'")
    data = np.zeros((n_samples, len(VARIABLES)))
    data[:, 0] = errors[:, 0]
    data[:, 1] = errors[:, 1]
    data[:, 2] = 0.75 * data[:, 0] + 0.70 * data[:, 1] + errors[:, 2]
    data[:, 3] = 0.85 * data[:, 2] + errors[:, 3]
    data[:, 4] = 0.30 * data[:, 1] + 0.65 * data[:, 3] + errors[:, 4]
    data[:, 5] = 0.55 * data[:, 3] + errors[:, 5]
    return pd.DataFrame(data, columns=VARIABLES)


nongaussian_df = simulate_linear_system(n_samples=1_200, noise="laplace", seed=10)
gaussian_df = simulate_linear_system(n_samples=1_200, noise="gaussian", seed=11)
nongaussian_df.to_csv(DATASET_DIR / f"{NOTEBOOK_PREFIX}_linear_nongaussian.csv", index=False)
gaussian_df.to_csv(DATASET_DIR / f"{NOTEBOOK_PREFIX}_linear_gaussian_control.csv", index=False)
data_summary = pd.DataFrame(
    [
        {
            "dataset": "linear_nongaussian",
            "rows": len(nongaussian_df),
            "columns": nongaussian_df.shape[1],
            "noise_family": "Laplace",
            "purpose": "Main LiNGAM example where assumptions are intentionally aligned.",
        },
        {
            "dataset": "linear_gaussian_control",
            "rows": len(gaussian_df),
            "columns": gaussian_df.shape[1],
            "noise_family": "Gaussian",
            "purpose": "Stress test where the graph is the same but non-Gaussianity is weakened.",
        },
    ]
)
data_summary.to_csv(TABLE_DIR / f"{NOTEBOOK_PREFIX}_dataset_summary.csv", index=False)
display(data_summary)
display(nongaussian_df.head())
   dataset                  rows  columns  noise_family  purpose
0  linear_nongaussian       1200  6        Laplace       Main LiNGAM example where assumptions are intentionally aligned.
1  linear_gaussian_control  1200  6        Gaussian      Stress test where the graph is the same but non-Gaussianity is weakened.

   need       intent     match      engagement  renewal    support
0  2.430457   -0.878601  2.277526   0.727127    0.234995   -0.902625
1  0.474932   1.150418   1.000170   3.301833    3.543043   1.425083
2  0.164310   0.706443   1.679660   3.444192    1.212747   2.569928
3  -1.277609  1.676954   -0.577906  0.734588    0.490584   3.211796
4  0.036320   -0.438584  -0.850934  -0.485379   -0.851244  0.175115
The generated rows are ordinary tabular observations. In a real discovery task the graph would be unknown, but here we keep the true edges so every algorithm can be evaluated rather than judged only by plausibility.
Distribution Diagnostics
LiNGAM needs independent non-Gaussian errors, not merely non-Gaussian observed variables. Since true errors are usually unavailable in real data, analysts often start with observable diagnostics: skewness, excess kurtosis, histograms, and normality tests. These checks do not prove the LiNGAM assumptions, but they can reveal whether the dataset is obviously incompatible with the story.
The non-Gaussian dataset should show heavier tails than the Gaussian-control dataset, especially through positive excess kurtosis and small Jarque-Bera p-values. This is not a full assumption check, but it confirms that the main example has the distributional signal LiNGAM expects.
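The moment and normality checks described above can be sketched with scipy alone. The columns here are freshly simulated stand-ins for the notebook's dataset columns, so the exact numbers are illustrative:

```python
import numpy as np
from scipy import stats

# Stand-in columns: one Laplace (heavy-tailed) and one Gaussian.
rng = np.random.default_rng(10)
laplace_col = rng.laplace(size=5000)
gaussian_col = rng.normal(size=5000)

for name, col in [("laplace", laplace_col), ("gaussian", gaussian_col)]:
    jb_stat, jb_p = stats.jarque_bera(col)       # joint skewness/kurtosis normality test
    print(
        f"{name}: skew={stats.skew(col):+.2f}, "
        f"excess_kurtosis={stats.kurtosis(col):+.2f}, jarque_bera_p={jb_p:.4f}"
    )
# The Laplace column should show clearly positive excess kurtosis (near 3 in
# the population) and a tiny Jarque-Bera p-value; the Gaussian column should not.
```

Remember the caveat from the text: these are checks on observed variables, not on the unobserved error terms, so they are a plausibility screen rather than a full assumption test.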
Visual Distribution Check
The table above is compact, but plots make the difference easier to see. This cell overlays the observed distributions from the non-Gaussian and Gaussian-control datasets variable by variable.
The curves are not diagnostic by themselves, but they give useful context. If every variable looked perfectly Gaussian and the domain gave no reason to expect non-Gaussian errors, LiNGAM would be a harder sell as the primary discovery method.
Helper Functions
This notebook needs three kinds of helpers: graph parsing, graph scoring, and graph drawing. The most important helper is lingam_adjacency_to_edge_table, which converts a LiNGAM coefficient matrix into a standard source -> target edge table. Remember that the raw LiNGAM matrix is indexed as child rows and parent columns.
def lingam_adjacency_to_edge_table(adjacency_matrix, variables, label, threshold=0.15):
    """Convert a LiNGAM child-row, parent-column matrix into a tidy directed edge table."""
    rows = []
    for child_idx, parent_idx in zip(*np.where(np.abs(adjacency_matrix) >= threshold)):
        rows.append(
            {
                "run": label,
                "source": variables[int(parent_idx)],
                "edge_type": "-->",
                "target": variables[int(child_idx)],
                "coefficient": float(adjacency_matrix[child_idx, parent_idx]),
                "abs_coefficient": float(abs(adjacency_matrix[child_idx, parent_idx])),
            }
        )
    columns = ["run", "source", "edge_type", "target", "coefficient", "abs_coefficient"]
    return pd.DataFrame(rows, columns=columns).sort_values(["source", "target"]).reset_index(drop=True)


def parse_causallearn_edge(edge):
    """Convert a causal-learn edge object into source, endpoint pattern, and target strings."""
    parts = str(edge).strip().split()
    if len(parts) != 3:
        return {"source": str(edge), "edge_type": "unknown", "target": "unknown"}
    return {"source": parts[0], "edge_type": parts[1], "target": parts[2]}


def graph_to_edge_table(graph, label):
    """Return a tidy edge table from a causal-learn graph object."""
    rows = [parse_causallearn_edge(edge) for edge in graph.get_graph_edges()]
    edge_df = pd.DataFrame(rows, columns=["source", "edge_type", "target"])
    if edge_df.empty:
        edge_df = pd.DataFrame(columns=["source", "edge_type", "target"])
    edge_df.insert(0, "run", label)
    return edge_df


def directed_pairs(edge_df):
    """Extract definite directed pairs from an edge table."""
    pairs = set()
    for row in edge_df.itertuples(index=False):
        if row.edge_type == "-->":
            pairs.add((row.source, row.target))
        elif row.edge_type == "<--":
            pairs.add((row.target, row.source))
    return pairs


def skeleton_pairs(edge_df):
    """Extract adjacencies while ignoring direction."""
    pairs = set()
    for row in edge_df.itertuples(index=False):
        if row.target != "unknown":
            pairs.add(frozenset([row.source, row.target]))
    return pairs


def summarize_against_truth(edge_df, truth_df, label):
    """Compute graph-recovery metrics against a directed truth table."""
    true_directed = set(zip(truth_df["source"], truth_df["target"]))
    true_skeleton = {frozenset(edge) for edge in true_directed}
    learned_directed = directed_pairs(edge_df)
    learned_skeleton = skeleton_pairs(edge_df)
    correct_directed = learned_directed & true_directed
    reversed_true = {(src, dst) for src, dst in true_directed if (dst, src) in learned_directed}
    missing_skeleton = true_skeleton - learned_skeleton
    extra_skeleton = learned_skeleton - true_skeleton
    unresolved_true = 0
    for src, dst in true_directed:
        pair = frozenset([src, dst])
        if pair in learned_skeleton and (src, dst) not in learned_directed and (dst, src) not in learned_directed:
            unresolved_true += 1
    directed_count = len(learned_directed)
    return pd.DataFrame(
        [
            {
                "run": label,
                "learned_edges_total": len(edge_df),
                "definite_directed_edges": directed_count,
                "true_edges": len(true_directed),
                "correct_directed_edges": len(correct_directed),
                "directed_precision": len(correct_directed) / directed_count if directed_count else np.nan,
                "directed_recall": len(correct_directed) / len(true_directed) if true_directed else np.nan,
                "reversed_true_edges": len(reversed_true),
                "unresolved_true_adjacencies": unresolved_true,
                "missing_true_adjacencies": len(missing_skeleton),
                "extra_adjacencies": len(extra_skeleton),
            }
        ]
    )


def run_pc_fisherz(data_df, label, alpha=0.01):
    """Run PC quietly with Fisher-Z tests."""
    with warnings.catch_warnings():
        warnings.simplefilter("ignore")
        result = pc(
            data_df.to_numpy(),
            alpha=alpha,
            indep_test="fisherz",
            stable=True,
            verbose=False,
            show_progress=False,
            node_names=list(data_df.columns),
        )
    return result, graph_to_edge_table(result.G, label=label)


def run_ges_bic(data_df, label):
    """Run BIC-GES quietly for comparison."""
    buffer = io.StringIO()
    with contextlib.redirect_stdout(buffer):
        record = ges(
            data_df.to_numpy(),
            score_func="local_score_BIC",
            node_names=list(data_df.columns),
            lambda_value=0.5,
        )
    return record, graph_to_edge_table(record["G"], label=label)


GRAPH_POSITIONS = {
    "need": (0.11, 0.72),
    "intent": (0.11, 0.28),
    "match": (0.39, 0.50),
    "engagement": (0.62, 0.50),
    "renewal": (0.89, 0.72),
    "support": (0.89, 0.28),
}
NODE_LABELS = {name: name.title() for name in GRAPH_POSITIONS}
NODE_COLORS = {
    "need": "#e0f2fe",
    "intent": "#dbeafe",
    "match": "#ecfccb",
    "engagement": "#fef3c7",
    "renewal": "#fee2e2",
    "support": "#f3e8ff",
}


def trim_edge_to_box(start, end, box_w=0.145, box_h=0.095, gap=0.012):
    """Return edge endpoints that stop just outside source and target boxes."""
    x0, y0 = start
    x1, y1 = end
    dx = x1 - x0
    dy = y1 - y0
    length = float(np.hypot(dx, dy))
    if length == 0:
        return start, end
    effective_w = box_w + 0.04
    effective_h = box_h + 0.04
    x_limit = (effective_w / 2) / abs(dx) if dx else np.inf
    y_limit = (effective_h / 2) / abs(dy) if dy else np.inf
    t = min(x_limit, y_limit) + gap / length
    return (x0 + dx * t, y0 + dy * t), (x1 - dx * t, y1 - dy * t)


def draw_box_graph(edge_df, title, path, note=None):
    """Draw a DAG-style graph with rounded boxes and visible arrowheads."""
    fig, ax = plt.subplots(figsize=(12, 6.2))
    ax.set_axis_off()
    ax.set_xlim(-0.02, 1.02)
    ax.set_ylim(0.04, 0.96)
    box_w, box_h = 0.145, 0.095
    for row in edge_df.itertuples(index=False):
        if row.source not in GRAPH_POSITIONS or row.target not in GRAPH_POSITIONS:
            continue
        raw_start = GRAPH_POSITIONS[row.source]
        raw_end = GRAPH_POSITIONS[row.target]
        if row.edge_type == "<--":
            raw_start, raw_end = raw_end, raw_start
        start, end = trim_edge_to_box(raw_start, raw_end, box_w=box_w, box_h=box_h)
        if row.edge_type in {"-->", "<--"}:
            arrowstyle = "-|>"
            mutation_scale = 18
            linewidth = 1.8
            color = "#334155"
        else:
            arrowstyle = "-"
            mutation_scale = 1
            linewidth = 1.5
            color = "#64748b"
        arrow = FancyArrowPatch(
            start,
            end,
            arrowstyle=arrowstyle,
            mutation_scale=mutation_scale,
            linewidth=linewidth,
            color=color,
            connectionstyle="arc3,rad=0.035",
            zorder=2,
        )
        ax.add_patch(arrow)
    for node, (x, y) in GRAPH_POSITIONS.items():
        rect = FancyBboxPatch(
            (x - box_w / 2, y - box_h / 2),
            box_w,
            box_h,
            boxstyle="round,pad=0.018",
            facecolor=NODE_COLORS[node],
            edgecolor="#1f2937",
            linewidth=1.1,
            zorder=5,
        )
        ax.add_patch(rect)
        ax.text(x, y, NODE_LABELS[node], ha="center", va="center", fontsize=10.5, fontweight="bold", zorder=6)
    if note:
        ax.text(0.50, 0.08, note, ha="center", va="center", fontsize=10, color="#475569")
    ax.set_title(title, pad=18, fontsize=14, fontweight="bold")
    fig.savefig(path, dpi=160, bbox_inches="tight")
    plt.show()


def truth_as_edge_table(truth_df, label="truth"):
    """Convert the truth table into a plotting-ready edge table."""
    return truth_df.assign(run=label, edge_type="-->")[["run", "source", "edge_type", "target"]]
These helpers make the rest of the notebook easier to read. The same metric definitions are used for LiNGAM, PC, and GES so the comparison stays apples-to-apples.
Draw The True DAG
Before looking at learned graphs, we draw the true graph used to create the data. This anchors the rest of the notebook: every learned edge table and metric table will be compared back to this DAG.
true_plot_edges = truth_as_edge_table(true_edge_table)
draw_box_graph(
    true_plot_edges,
    title="True Linear Non-Gaussian Teaching DAG",
    path=FIGURE_DIR / f"{NOTEBOOK_PREFIX}_true_lingam_dag.png",
    note="The synthetic data is generated from this weighted acyclic SEM.",
)
The graph has six directed edges and no cycles. LiNGAM will try to recover both this ordering and the direct linear coefficients from the observed data alone.
A Tiny Directionality Demo
A two-variable system is the cleanest way to see why LiNGAM is useful. If cause and effect are merely dependent, PC and GES have no collider or larger graph pattern that forces an orientation. With non-Gaussian linear noise, LiNGAM can use distributional asymmetry to choose a direction.
In this two-variable example, PC and GES usually report an unoriented adjacency because both directions fit the same conditional independence pattern. DirectLiNGAM can orient the edge because the data was generated by a linear model with independent non-Gaussian errors.
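The residual-dependence asymmetry behind this demo can be shown without any discovery library. The sketch below is not the causal-learn API; it is a minimal pairwise check in the LiNGAM spirit, using an ordinary least-squares fit in each direction and correlation of squared values as a crude dependence score:

```python
import numpy as np

# x causes y with independent Laplace noise.
rng = np.random.default_rng(7)
x = rng.laplace(size=5000)
y = 0.8 * x + rng.laplace(size=5000)

def dependence_score(cause, effect):
    """|corr(cause^2, residual^2)| after regressing effect on cause.
    Near zero when the residual is independent of the proposed cause."""
    slope = np.cov(cause, effect)[0, 1] / np.var(cause)
    residual = effect - slope * cause
    return abs(np.corrcoef(cause**2, residual**2)[0, 1])

forward = dependence_score(x, y)   # correct direction: residual looks independent of x
reverse = dependence_score(y, x)   # wrong direction: residual stays entangled with y
print(f"x->y score {forward:.3f} vs y->x score {reverse:.3f}")
print("chosen direction:", "x -> y" if forward < reverse else "y -> x")
```

With Gaussian noise this score loses its power, because uncorrelated Gaussian residuals are also independent in both directions; that is exactly the Gaussian ambiguity discussed earlier.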
Fit DirectLiNGAM And ICA-LiNGAM
Now we fit the two core LiNGAM estimators available in causal-learn. DirectLiNGAM estimates the causal order directly through independence comparisons. ICALiNGAM uses independent component analysis and then searches for a causal ordering compatible with the estimated mixing structure.
We use a coefficient threshold of 0.15 for graph reporting. The raw coefficient matrices are saved separately, so the threshold is only a reporting choice, not a hidden model change.
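The reporting-threshold tradeoff can be sketched on a small illustrative matrix (the values below are hypothetical, not the fitted notebook output):

```python
import numpy as np

# Illustrative estimated matrix in child-row, parent-column form.
estimated = np.array([
    [0.00, 0.00, 0.00],
    [0.74, 0.00, 0.00],
    [0.05, 0.83, 0.00],   # the 0.05 entry is the kind of value likely to be noise
])
for threshold in [0.05, 0.15, 0.80]:
    n_edges = int((np.abs(estimated) >= threshold).sum())
    print(f"threshold {threshold:.2f}: {n_edges} edges reported")
# threshold 0.05 keeps 3 edges, 0.15 keeps 2, 0.80 keeps 1
```

The raw matrix is unchanged in every case; only the reported graph moves, which is why the notebook saves the unthresholded matrices alongside the figures.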
On this aligned synthetic example, both LiNGAM estimators should recover the main direct edges with coefficients close to the data-generating values. Small ordering differences among downstream variables can happen when those variables are not direct ancestors of each other.
Learned DirectLiNGAM Graph
The edge table is precise, but a graph is easier to scan. This figure shows the thresholded DirectLiNGAM graph using the same visual style as the rest of the tutorial series.
direct_lingam_edges = lingam_edge_table[lingam_edge_table["run"] == "DirectLiNGAM"].copy()
draw_box_graph(
    direct_lingam_edges,
    title="DirectLiNGAM Learned Graph",
    path=FIGURE_DIR / f"{NOTEBOOK_PREFIX}_direct_lingam_graph.png",
    note=f"Edges shown when absolute coefficient is at least {EDGE_THRESHOLD}.",
)
The graph should line up closely with the true DAG. The plot is thresholded for readability, while the coefficient matrices below retain the full numerical output.
Compare Estimated Coefficients To True Coefficients
LiNGAM gives signed direct-effect coefficients, which is one reason it is useful in linear systems. Here we stack the true matrix, DirectLiNGAM matrix, and ICA-LiNGAM matrix into one tidy table and a heatmap.
The heatmap is the safest way to read LiNGAM coefficients because it preserves orientation: parent variables are columns and child variables are rows. The strongest learned coefficients should sit in the same cells as the true nonzero coefficients.
Causal Order Check
LiNGAM estimates a causal order as part of the model. Exact order is stricter than edge recovery: two variables that are not ancestors of each other can swap without changing the direct edge graph. This table compares the true order ranks with the estimated order ranks.
true_order = ["need", "intent", "match", "engagement", "renewal", "support"]
order_rows = []
for model_label, model in model_specs:
    estimated_order = [VARIABLES[int(idx)] for idx in model.causal_order_]
    for variable in VARIABLES:
        order_rows.append(
            {
                "model": model_label,
                "variable": variable,
                "true_rank": true_order.index(variable) + 1,
                "estimated_rank": estimated_order.index(variable) + 1,
                "rank_difference": (estimated_order.index(variable) + 1) - (true_order.index(variable) + 1),
            }
        )
order_table = pd.DataFrame(order_rows)
order_table.to_csv(TABLE_DIR / f"{NOTEBOOK_PREFIX}_causal_order_comparison.csv", index=False)
display(order_table)
model         variable     true_rank  estimated_rank  rank_difference
DirectLiNGAM  need         1          1                0
DirectLiNGAM  intent       2          2                0
DirectLiNGAM  match        3          3                0
DirectLiNGAM  engagement   4          4                0
DirectLiNGAM  renewal      5          6                1
DirectLiNGAM  support      6          5               -1
ICALiNGAM     need         1          1                0
ICALiNGAM     intent       2          2                0
ICALiNGAM     match        3          3                0
ICALiNGAM     engagement   4          4                0
ICALiNGAM     renewal      5          5                0
ICALiNGAM     support      6          6                0
The order table should be read with nuance. Getting a root before its descendants matters; swapping two unrelated downstream variables is usually less important than reversing a true parent-child relationship.
Residual Independence Check
After fitting a LiNGAM model, we can compute the residuals implied by the learned structural equations and test whether those residuals look mutually independent. This is closer to the actual LiNGAM assumption than checking observed-variable distributions, although it remains a diagnostic rather than a proof.
Large off-diagonal p-values are more comfortable for the LiNGAM story because they suggest the fitted residuals are not obviously dependent. Isolated small p-values are a cue to inspect model fit, omitted variables, nonlinearities, or threshold choices.
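The mechanics of the residual check can be sketched with plain numpy. This toy system uses an illustrative two-variable coefficient matrix, not the tutorial's fitted model; the key line is `X - X @ B.T`, which recovers the structural residuals implied by x = Bx + e under the child-row, parent-column convention.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 5000

# Toy system in the notebook's convention: B[child, parent].
B = np.array([[0.0, 0.0],
              [0.8, 0.0]])
e_true = rng.laplace(size=(n, 2))

# Solve x = B x + e for each sample row: x = e (I - B)^{-T}.
X = e_true @ np.linalg.inv(np.eye(2) - B).T

# Residuals implied by the structural equations: e_hat = x - B x.
residuals = X - X @ B.T

# First-order diagnostic: off-diagonal residual correlations near zero.
corr = np.corrcoef(residuals, rowvar=False)
```

In real use the correlation check would be replaced by a proper independence test (correlation is necessary but not sufficient), which is exactly why the notebook reports p-values rather than raw correlations.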
Bootstrap Edge Stability
A single fitted graph can look cleaner than the uncertainty behind it. Bootstrap resampling repeatedly refits DirectLiNGAM on resampled rows and records how often each edge appears. This is a practical stability check for the discovered directions.
Edges with high bootstrap probability are more stable under row resampling. Low-probability edges should not be promoted to strong causal claims, even when they appear in one fitted graph.
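The bootstrap loop itself is simple, and a sketch helps demystify it. The estimator below is a deliberate stand-in (least squares with a known direction), not DirectLiNGAM; the resample-refit-count pattern is the part that carries over.

```python
import numpy as np

rng = np.random.default_rng(1)
n, n_boot, threshold = 2000, 50, 0.1

# Toy data: x0 -> x1 with Laplace noise.
x0 = rng.laplace(size=n)
x1 = 0.8 * x0 + rng.laplace(size=n)
X = np.column_stack([x0, x1])

def fit_coefficient(data):
    # Stand-in estimator: regress x1 on x0 and return the slope.
    a, b = data[:, 0], data[:, 1]
    return float(np.dot(a, b) / np.dot(a, a))

hits = 0
for _ in range(n_boot):
    idx = rng.integers(0, n, size=n)   # resample rows with replacement
    if abs(fit_coefficient(X[idx])) >= threshold:
        hits += 1

edge_probability = hits / n_boot
```

With a strong true coefficient and ample data the edge appears in essentially every resample; weak or spurious edges show up as probabilities well below one.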
Bootstrap Probability Heatmap
The bootstrap matrix has the same child-row, parent-column orientation as the coefficient matrix. Bright cells mean that a parent-child direction appeared in many bootstrap runs.
The true direct edges should have high probability in this controlled example. Any nonzero probability on a false edge is useful too: it tells us where the estimator occasionally sees a weak alternative explanation.
Total Effects And Paths
Direct coefficients describe immediate parent-child effects. LiNGAM can also estimate total effects along directed paths. Here we inspect total effects and the paths from need to renewal, where the true main route is need -> match -> engagement -> renewal.
Total effects can be helpful for decision questions, but they inherit every assumption behind the learned graph. In this notebook they are teaching diagnostics; in real work they should be paired with domain review and robustness checks.
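The relationship between direct coefficients and total effects is worth seeing once in closed form. For a linear SEM x = Bx + e, the total-effect matrix is (I - B)^{-1} - I, which is the geometric series B + B² + ... summed over all directed paths. The chain coefficients below are illustrative, not the tutorial's fitted values.

```python
import numpy as np

# Toy chain need -> match -> engagement -> renewal, B[child, parent].
names = ["need", "match", "engagement", "renewal"]
B = np.zeros((4, 4))
B[1, 0] = 0.9   # need -> match
B[2, 1] = 0.7   # match -> engagement
B[3, 2] = 0.6   # engagement -> renewal

# Total effects: (I - B)^{-1} - I = B + B^2 + B^3 + ...
total = np.linalg.inv(np.eye(4) - B) - np.eye(4)
```

On a pure chain the total effect of need on renewal is just the product of the coefficients along the path, 0.9 × 0.7 × 0.6; with multiple paths the matrix form sums the products over every route automatically.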
Compare LiNGAM With PC And GES
PC, GES, and LiNGAM answer related but different discovery questions. PC searches for conditional independence structure. GES searches over graph scores. LiNGAM uses a linear non-Gaussian structural model. Running them side by side helps separate discoveries that are robust across families from discoveries that depend heavily on one assumption set.
On this clean six-variable system, all three methods may do well because the graph has enough structure to orient the key edges. The two-variable demo above is the sharper example of LiNGAM’s directional advantage.
Plot Method-Level Metrics
A compact metric plot makes it easier to compare recovery patterns across algorithm families. The bars focus on directed recovery and adjacency mistakes rather than coefficient accuracy.
metric_plot_df = comparison_metrics.melt(
    id_vars="run",
    value_vars=[
        "correct_directed_edges",
        "reversed_true_edges",
        "unresolved_true_adjacencies",
        "extra_adjacencies",
    ],
    var_name="metric",
    value_name="count",
)
fig, ax = plt.subplots(figsize=(10.5, 5.2))
sns.barplot(data=metric_plot_df, x="metric", y="count", hue="run", ax=ax)
ax.set_title("LiNGAM Versus PC And GES On The Teaching DAG")
ax.set_xlabel("Metric")
ax.set_ylabel("Count")
ax.tick_params(axis="x", rotation=20)
ax.legend(title="Method", bbox_to_anchor=(1.02, 1), loc="upper left")
plt.tight_layout()
fig.savefig(FIGURE_DIR / f"{NOTEBOOK_PREFIX}_method_comparison_metrics.png", dpi=160, bbox_inches="tight")
plt.show()
When several methods agree, the edge is easier to trust as a structural hypothesis. When they disagree, the next step is not to vote blindly; it is to ask which assumptions are most credible for the domain and dataset.
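One lightweight way to operationalize "agreement" is plain set arithmetic on directed edge sets. The edge sets below are hypothetical examples, not the notebook's actual outputs.

```python
# Hypothetical directed-edge sets from three methods on the same data.
lingam_edges = {("need", "match"), ("match", "engagement"), ("engagement", "renewal")}
pc_edges = {("need", "match"), ("engagement", "renewal")}
ges_edges = {("need", "match"), ("engagement", "renewal"), ("support", "renewal")}

# Edges every method orients the same way are the most defensible;
# contested edges deserve assumption-level scrutiny, not a vote.
consensus = lingam_edges & pc_edges & ges_edges
contested = (lingam_edges | pc_edges | ges_edges) - consensus
```

Note that PC and GES return equivalence classes, so in practice an undirected or bidirected edge should be treated as "unresolved" rather than as disagreement about direction.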
Threshold Sensitivity
LiNGAM returns dense numerical coefficient matrices. Reporting requires a threshold, and thresholds can change the edge set. This cell reruns the graph metrics across several absolute-coefficient thresholds for DirectLiNGAM.
A good threshold should remove tiny unstable coefficients without deleting meaningful true edges. In real data, this table should be shown alongside bootstrap probabilities rather than used as a standalone tuning exercise.
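The sweep itself is a one-liner once the coefficient matrix is in hand. The dense matrix below is an illustrative stand-in for raw LiNGAM-style output: two real edges plus small nuisance coefficients.

```python
import numpy as np

# Illustrative dense estimate: true edges 0.90 and 0.70 plus noise.
B_hat = np.array([
    [0.00, 0.02, 0.00],
    [0.90, 0.00, 0.03],
    [0.04, 0.70, 0.00],
])

# Count surviving edges at each absolute-coefficient threshold.
edge_counts = {t: int(np.sum(np.abs(B_hat) >= t)) for t in [0.01, 0.05, 0.2, 0.8]}
```

A long flat region in the edge count (here between 0.05 and 0.2) is a sign that the reported graph is not an artifact of one particular cutoff.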
Plot Threshold Sensitivity
This plot shows how the reported graph changes as the coefficient threshold becomes more conservative.
The curve helps expose brittle conclusions. If one edge appears only at a very permissive threshold and has low bootstrap probability, it should be treated as exploratory rather than central.
Gaussian Control Stress Test
LiNGAM is designed for non-Gaussian errors. To see why that matters, we repeatedly simulate the same structural graph under Laplace and Gaussian errors, fit DirectLiNGAM, and compare recovery metrics. The coefficient structure is unchanged; only the noise family changes.
The stress test is intentionally small, but it makes the assumption visible. If the Gaussian-control results are less stable, that is not a bug in LiNGAM; it is the model telling us that its identifying information has been weakened.
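The identifiability gap can be demonstrated directly, without any fitting library. In the sketch below we regress in the wrong causal direction and measure higher-order dependence between residual and regressor; the correlation-of-squares statistic is a crude stand-in for a proper independence test. With Gaussian noise the wrong direction also yields an independent-looking residual (so direction is unidentifiable), while with Laplace noise the dependence is clearly visible.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 100_000

def wrong_direction_dependence(noise):
    # Simulate x -> y, then regress x on y (the *wrong* direction)
    # and measure dependence left over in the residual.
    x, e = noise(size=n), noise(size=n)
    y = x + e
    b = np.dot(x, y) / np.dot(y, y)
    r = x - b * y
    # r and y are uncorrelated by construction; correlation of their
    # squares exposes dependence beyond second moments.
    return float(np.corrcoef(r**2, y**2)[0, 1])

dep_laplace = wrong_direction_dependence(rng.laplace)
dep_gaussian = wrong_direction_dependence(rng.normal)
```

`dep_gaussian` hovers near zero because jointly Gaussian, uncorrelated variables are fully independent; `dep_laplace` is clearly positive, and that asymmetry is exactly the signal LiNGAM uses to orient edges.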
Plot The Noise-Family Stress Test
The grouped summary above is useful, but a plot shows the run-to-run spread more clearly.
This is the practical habit to build: whenever a method depends on a strong assumption, create a stress test that changes that assumption while holding the structural story fixed. It makes the model’s comfort zone concrete.
Practical Reporting Checklist
A LiNGAM report should be more than a graph screenshot. This checklist records the minimum context needed for a reader to understand what was estimated, why the method was plausible, and where the result is fragile.
reporting_checklist = pd.DataFrame(
    [
        {
            "item": "State the structural assumptions",
            "what_to_report": "Linearity, acyclicity, independent errors, and non-Gaussianity.",
            "why_it_matters": "These assumptions are what allow LiNGAM to orient edges beyond Markov equivalence.",
        },
        {
            "item": "Show distribution diagnostics",
            "what_to_report": "Skewness, excess kurtosis, normality checks, and domain reasons for non-Gaussian errors.",
            "why_it_matters": "Observed diagnostics cannot prove the assumption, but they can reveal obvious mismatch.",
        },
        {
            "item": "Report matrix orientation",
            "what_to_report": "Rows are children and columns are parents in the LiNGAM adjacency matrix.",
            "why_it_matters": "Misreading matrix orientation reverses causal stories.",
        },
        {
            "item": "Save threshold sensitivity",
            "what_to_report": "Edge sets and metrics under multiple coefficient thresholds.",
            "why_it_matters": "Small coefficients should not quietly drive major claims.",
        },
        {
            "item": "Use stability diagnostics",
            "what_to_report": "Bootstrap edge probabilities, path probabilities, and method comparisons.",
            "why_it_matters": "Stable edges are more credible than one-run artifacts.",
        },
        {
            "item": "Compare with other discovery families",
            "what_to_report": "Agreement and disagreement with PC, GES, or domain-constrained alternatives.",
            "why_it_matters": "Disagreement reveals assumption-sensitive conclusions.",
        },
    ]
)
reporting_checklist.to_csv(TABLE_DIR / f"{NOTEBOOK_PREFIX}_lingam_reporting_checklist.csv", index=False)
display(reporting_checklist)
item: State the structural assumptions
what_to_report: Linearity, acyclicity, independent errors, and non-Gaussianity.
why_it_matters: These assumptions are what allow LiNGAM to orient edges beyond Markov equivalence.

item: Show distribution diagnostics
what_to_report: Skewness, excess kurtosis, normality checks, and domain reasons for non-Gaussian errors.
why_it_matters: Observed diagnostics cannot prove the assumption, but they can reveal obvious mismatch.

item: Report matrix orientation
what_to_report: Rows are children and columns are parents in the LiNGAM adjacency matrix.
why_it_matters: Misreading matrix orientation reverses causal stories.

item: Save threshold sensitivity
what_to_report: Edge sets and metrics under multiple coefficient thresholds.
why_it_matters: Small coefficients should not quietly drive major claims.

item: Use stability diagnostics
what_to_report: Bootstrap edge probabilities, path probabilities, and method comparisons.
why_it_matters: Stable edges are more credible than one-run artifacts.

item: Compare with other discovery families
what_to_report: Agreement and disagreement with PC, GES, or domain-constrained alternatives.
why_it_matters: Disagreement reveals assumption-sensitive conclusions.
The checklist is deliberately practical. In a portfolio or production-style analysis, the graph is only one output; the assumption audit and stability checks are what make the graph defensible.
Artifact Manifest
The final cell lists the files produced by this notebook. This makes it easier to find the generated datasets, tables, and figures without searching the output directory manually.
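A manifest cell can be as simple as a recursive directory walk. The folder and file names below are stand-ins created just for this sketch; in the tutorial the roots would be the existing FIGURE_DIR and TABLE_DIR.

```python
import pathlib

# Hypothetical output tree, created only so the sketch is self-contained.
base = pathlib.Path("outputs_demo")
for sub in ("figures", "tables"):
    (base / sub).mkdir(parents=True, exist_ok=True)
(base / "tables" / "example_edge_table.csv").write_text("source,target\n")

# Collect every generated file, relative to the output root.
manifest = sorted(
    p.relative_to(base).as_posix() for p in base.rglob("*") if p.is_file()
)
```

Sorting the paths keeps the manifest stable across reruns, which makes diffs of the notebook output meaningful.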
The notebook now has a complete LiNGAM workflow: assumption framing, synthetic data with known truth, model fitting, graph recovery metrics, coefficient review, bootstrap stability, comparison with PC/GES, assumption stress testing, and reporting guidance.
The next tutorial can move from linear non-Gaussian models to nonlinear functional causal models, where the direction signal comes from asymmetric regression residuals rather than linear ICA structure.