causal-learn Tutorial 14: Hidden Representation Learning With GIN
Many causal discovery examples assume every important variable is observed directly. Real datasets are often messier. We may observe several noisy measurements of a hidden construct, but not the construct itself. For example, a latent state such as user need, satisfaction, health status, or product-market fit may only appear through multiple proxy measurements.
This notebook introduces the causal-learn GIN tools for linear non-Gaussian latent-variable models. GIN stands for Generalized Independent Noise. The practical idea is to use observed proxy variables to recover latent clusters and a causal order among latent constructs under strong structural assumptions.
We will simulate a small dataset with two hidden variables and six observed indicators. Then we will run GIN_MI, compare it with the independence-test version of GIN, evaluate recovered clusters against known truth, and stress-test the method when indicators become noisy or cross-loaded.
Estimated runtime: about 1-2 minutes. The baseline is fast; the sensitivity grid runs several small GIN fits.
Learning Goals
By the end of this notebook, you should be able to:
explain why latent-variable discovery is different from observed-variable DAG discovery;
simulate a linear non-Gaussian latent-variable model with observed indicators;
run causal-learn’s GIN implementation and read the returned clusters;
evaluate recovered latent clusters with cluster purity and adjusted Rand index;
distinguish cluster recovery from latent causal ordering;
recognize failure modes such as weak indicators, noisy indicators, and cross-loadings.
Notebook Flow
We will work in a sequence that mirrors a careful applied workflow:
Set up imports, outputs, and plotting style.
Define a latent-variable data-generating process with known truth.
Audit observed indicators and their correlation structure.
Draw the true latent measurement graph.
Run GIN_MI and inspect the learned latent clusters.
Compare learned clusters with true latent groups.
Compare GIN_MI with the independence-test version of GIN.
Run sensitivity checks for sample size, measurement noise, latent noise shape, and cross-loading.
Save reporting guidance and an artifact manifest.
Why Hidden Representation Learning Matters
Observed-variable graph discovery asks for edges among measured columns. Hidden representation learning asks a different question: which unobserved constructs could explain groups of measured columns, and how might those constructs be causally ordered?
This matters because proxy variables are not interchangeable with the construct they measure. If three indicators all measure the same hidden factor, drawing causal arrows among those indicators can be misleading. A latent-variable view instead says: these observed variables are children of a hidden parent, and the hidden parent is the object we may want to reason about.
GIN In Plain Language
GIN-style methods are designed for linear non-Gaussian latent-variable models. The observed variables are generated from hidden variables plus independent noise. Under the right assumptions, certain linear combinations of observed indicators should be independent of variables outside the indicator group. GIN uses this property to find clusters of indicators and then infer a causal order among the latent variables.
Two details are important:
GIN is not a generic clustering algorithm. It relies on causal and distributional assumptions.
The output latent labels such as L1 and L2 are discovered labels, not the original names from the simulator. We must map them back to truth using their observed indicators.
What GIN Can And Cannot Claim
GIN can suggest that a set of observed variables share a latent parent, and it can propose an order among recovered latent groups. In this notebook, we can score that proposal because the synthetic truth is known.
In real data, the learned latent graph should be treated as a candidate measurement structure. Stronger claims require domain review, alternative specifications, sensitivity checks, and preferably external validation. Cross-loaded indicators, weak proxies, hidden subgroups, nonlinear measurement, and Gaussian noise can all make the output less reliable.
Setup
This cell imports the packages used in the notebook and creates output folders. The matplotlib cache path avoids noisy cache warnings in restricted environments.
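A minimal setup cell consistent with the paths used later might look like the sketch below. The folder names, the `NOTEBOOK_PREFIX` value, and the seaborn guard are assumptions for illustration, not fixed by the tutorial text.

```python
# Setup sketch; folder names and the NOTEBOOK_PREFIX value are assumptions.
import os
from pathlib import Path

import numpy as np
import pandas as pd

# Route the matplotlib cache somewhere writable before importing pyplot,
# which avoids noisy cache warnings in restricted environments.
os.environ.setdefault("MPLCONFIGDIR", str(Path("outputs") / "mpl_cache"))
Path(os.environ["MPLCONFIGDIR"]).mkdir(parents=True, exist_ok=True)

import matplotlib

matplotlib.use("Agg")  # non-interactive backend for headless runs
import matplotlib.pyplot as plt

try:
    import seaborn as sns  # used later for the sensitivity heatmap
except ImportError:  # seaborn is optional for the non-plot cells
    sns = None

NOTEBOOK_PREFIX = "t14_gin"  # hypothetical artifact prefix
DATA_DIR = Path("outputs") / "data"
TABLE_DIR = Path("outputs") / "tables"
FIGURE_DIR = Path("outputs") / "figures"
for folder in (DATA_DIR, TABLE_DIR, FIGURE_DIR):
    folder.mkdir(parents=True, exist_ok=True)

# Record package versions alongside the outputs.
versions = pd.Series({"numpy": np.__version__, "pandas": pd.__version__})
print(versions)
```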
Saving these versions makes it easier to reproduce the exact graph and cluster outputs later.
Define The Latent Measurement Model
The synthetic dataset has two hidden variables:
latent_need: an upstream hidden construct;
latent_value: a downstream hidden construct caused by latent_need.
Each latent variable has three observed indicators. The observed indicators are noisy measurements, not causes of each other. This is exactly the kind of setting where an observed-variable graph can be less natural than a latent measurement graph.
OBSERVED_VARIABLES = [
    "X1_need_search",
    "X2_need_depth",
    "X3_need_variety",
    "X4_value_click",
    "X5_value_watch",
    "X6_value_return",
]
OBSERVED_LABELS = [f"X{i}" for i in range(1, len(OBSERVED_VARIABLES) + 1)]
OBSERVED_NAME_MAP = dict(zip(OBSERVED_LABELS, OBSERVED_VARIABLES))

TRUE_LATENT_GROUPS = {
    "latent_need": [0, 1, 2],
    "latent_value": [3, 4, 5],
}
TRUE_LATENT_ORDER = ["latent_need", "latent_value"]
TRUE_CLUSTER_LABELS = np.array([0, 0, 0, 1, 1, 1])

indicator_metadata = pd.DataFrame(
    [
        {
            "observed_index": i,
            "gin_label": f"X{i + 1}",
            "observed_variable": OBSERVED_VARIABLES[i],
            "true_latent": latent,
        }
        for latent, members in TRUE_LATENT_GROUPS.items()
        for i in members
    ]
).sort_values("observed_index")
indicator_metadata.to_csv(TABLE_DIR / f"{NOTEBOOK_PREFIX}_indicator_metadata.csv", index=False)
display(indicator_metadata)
|   | observed_index | gin_label | observed_variable | true_latent  |
|---|----------------|-----------|-------------------|--------------|
| 0 | 0              | X1        | X1_need_search    | latent_need  |
| 1 | 1              | X2        | X2_need_depth     | latent_need  |
| 2 | 2              | X3        | X3_need_variety   | latent_need  |
| 3 | 3              | X4        | X4_value_click    | latent_value |
| 4 | 4              | X5        | X5_value_watch    | latent_value |
| 5 | 5              | X6        | X6_value_return   | latent_value |
This metadata table is essential because causal-learn labels observed variables as X1, X2, and so on. The table maps those algorithm labels back to meaningful column names.
Simulate Latent Indicator Data
The simulator creates non-Gaussian hidden variables and non-Gaussian measurement noise. The downstream latent variable depends on the upstream latent variable. Each observed indicator is a noisy linear measurement of its own latent parent.
The function also has knobs for later stress tests: higher measurement noise, Gaussian noise, weak indicators, and cross-loading from the wrong latent factor.
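One way to implement such a simulator is sketched below. The function name, argument names, loadings, and the 0.8 latent coefficient are illustrative choices, not values stated in the text; the knobs mirror the ones described above.

```python
import numpy as np
import pandas as pd

def simulate_latent_indicators(
    n_samples=2000,
    measurement_noise=0.3,
    gaussian_noise=False,   # switch latent and measurement noise to Gaussian
    weak_indicator=False,   # shrink one loading to make an indicator weak
    cross_loading=0.0,      # contaminate X4 with the upstream latent factor
    seed=0,
):
    """Simulate two non-Gaussian latents and six observed indicators (sketch)."""
    rng = np.random.default_rng(seed)
    draw = rng.normal if gaussian_noise else rng.laplace

    latent_need = draw(0.0, 1.0, n_samples)
    # The downstream latent depends linearly on the upstream latent plus noise.
    latent_value = 0.8 * latent_need + draw(0.0, 1.0, n_samples)

    loadings = np.array([1.0, 0.9, 0.8, 1.0, 0.9, 0.8])  # illustrative loadings
    if weak_indicator:
        loadings[2] = 0.2  # X3 becomes a weak proxy of latent_need

    parents = np.column_stack([latent_need] * 3 + [latent_value] * 3)
    noise = draw(0.0, measurement_noise, (n_samples, 6))
    observed = parents * loadings + noise
    observed[:, 3] += cross_loading * latent_need  # optional cross-loading on X4

    columns = ["X1_need_search", "X2_need_depth", "X3_need_variety",
               "X4_value_click", "X5_value_watch", "X6_value_return"]
    data = pd.DataFrame(observed, columns=columns)
    # Standardize so later steps focus on structure rather than scale.
    data = (data - data.mean()) / data.std()
    truth = pd.DataFrame({"latent_need": latent_need, "latent_value": latent_value})
    return data, truth
```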
The observed dataset contains only indicators. The latent truth is saved for evaluation, but the GIN algorithm only receives the observed indicator matrix.
Basic Data Audit
Before any latent discovery step, inspect observed indicator scale and missingness. GIN expects a numeric matrix with no missing values.
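The audit itself is a couple of pandas calls. The sketch below runs on a stand-in indicator matrix; the random data is only a placeholder for the simulated indicators.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
# Stand-in for the simulated indicator matrix: six numeric columns.
data = pd.DataFrame(rng.laplace(size=(2000, 6)),
                    columns=[f"X{i}" for i in range(1, 7)])

# Per-column dtype, missingness, and scale summary.
audit = pd.DataFrame({
    "dtype": data.dtypes.astype(str),
    "n_missing": data.isna().sum(),
    "mean": data.mean().round(3),
    "std": data.std().round(3),
})
print(audit)

# GIN expects a complete numeric matrix, so fail fast if anything is off.
assert data.isna().sum().sum() == 0
assert all(np.issubdtype(dt, np.number) for dt in data.dtypes)
```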
The indicators are standardized and complete. That keeps the focus on latent structure rather than preprocessing problems.
Correlation Structure Of Indicators
If the measurement model is clean, indicators of the same latent variable should be strongly correlated with each other. Indicators across different latent variables can also be correlated because the latent variables are causally connected.
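A quick check of that block pattern, on a self-contained simulation matching the structure described above (coefficients and noise scales are illustrative assumptions):

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
n = 5000
need = rng.laplace(size=n)
value = 0.8 * need + rng.laplace(size=n)  # downstream latent caused by need
# Three noisy indicators per latent parent.
cols = {f"X{i + 1}": parent + 0.3 * rng.laplace(size=n)
        for i, parent in enumerate([need, need, need, value, value, value])}
data = pd.DataFrame(cols)

corr = data.corr()
print(corr.round(2))

# Within-block correlation should exceed cross-block correlation,
# and cross-block correlation stays positive because need causes value.
within_need = corr.loc["X1", "X2"]
cross_block = corr.loc["X1", "X4"]
```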
The block pattern is visible: the first three indicators move together, and the last three indicators move together. Cross-block correlation is also present because the upstream latent factor causes the downstream latent factor.
Draw The True Latent Measurement Graph
The true graph has latent variables at the top and observed indicators below. The measured columns are children of hidden parents, not peers in a simple observed-variable DAG.
This is the graph we want the GIN workflow to recover at a high level: two indicator clusters and an upstream-to-downstream latent order.
Helper Functions For GIN Output
GIN_MI returns a graph and a list of observed-index clusters. The helper functions below convert those clusters into readable tables, score cluster quality, and draw a learned measurement graph.
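Two of those helpers can be sketched as follows. The function names are illustrative; the notebook's adjusted Rand index can come from `sklearn.metrics.adjusted_rand_score` after expanding the clusters into a flat label vector.

```python
import numpy as np
import pandas as pd

def clusters_to_table(clusters, name_map):
    """Turn GIN's list of observed-index clusters into a readable table.

    `clusters` is a list of index lists, e.g. [[0, 1, 2], [3, 4, 5]];
    `name_map` maps an observed index to its original column name.
    """
    rows = [
        {"learned_latent": f"L{k + 1}", "observed_index": i,
         "observed_variable": name_map[i]}
        for k, members in enumerate(clusters)
        for i in members
    ]
    return pd.DataFrame(rows)

def cluster_purity(clusters, true_labels):
    """Fraction of clustered indicators matching their cluster's majority truth."""
    true_labels = np.asarray(true_labels, dtype=int)
    correct = total = 0
    for members in clusters:
        counts = np.bincount(true_labels[list(members)])
        correct += counts.max()
        total += len(members)
    return correct / total if total else float("nan")

# Illustrative usage on a perfect two-cluster recovery.
names = {i: f"X{i + 1}" for i in range(6)}
table = clusters_to_table([[0, 1, 2], [3, 4, 5]], names)
purity = cluster_purity([[0, 1, 2], [3, 4, 5]], [0, 0, 0, 1, 1, 1])
```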
The adjusted Rand index scores cluster agreement without caring about the arbitrary latent labels. The order check is separate because a method can recover the right clusters but still reverse the latent order.
Run GIN-MI
GIN_MI is the mutual-information-style variant exposed in causal-learn. It is fast on this small dataset and usually recovers the two indicator groups cleanly.
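The call itself is short. The import path in the sketch below matches the causal-learn documentation but should be verified against your installed version; the GIN_MI variant is exposed from the same `HiddenCausal.GIN` package. The cluster list in the mapping demo is a stand-in for GIN's output on the baseline simulation, shown so the label-to-name mapping is concrete.

```python
import numpy as np
import pandas as pd

def run_gin(data):
    """Fit causal-learn's GIN on an (n_samples, n_indicators) array.

    Import path as in the causal-learn docs; GIN returns a latent graph and
    a list of observed-index clusters in the inferred latent causal order.
    The GIN_MI variant lives in the same HiddenCausal.GIN package; check the
    exact name in your installed release.
    """
    from causallearn.search.HiddenCausal.GIN.GIN import GIN
    return GIN(data)

# Map discovered clusters back to meaningful column names. The cluster list
# below stands in for GIN's output on the baseline simulation.
observed_names = ["X1_need_search", "X2_need_depth", "X3_need_variety",
                  "X4_value_click", "X5_value_watch", "X6_value_return"]
clusters = [[0, 1, 2], [3, 4, 5]]
readable = {f"L{k + 1}": [observed_names[i] for i in members]
            for k, members in enumerate(clusters)}
print(readable)
```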
The baseline run recovers the two indicator clusters and the expected latent order. The discovered latent labels are arbitrary, but the member indicators make the learned constructs readable.
Draw The Learned GIN-MI Measurement Graph
This drawing uses the learned clusters rather than the simulator’s latent labels. The top labels L1, L2, and so on come from the learned order returned by GIN-MI.
The learned graph has the same high-level shape as the true graph: one latent group for the first three indicators and one latent group for the last three indicators.
Compare With The Independence-Test GIN Variant
causal-learn also exposes GIN, which can use hsic or kci independence testing. On small examples, this version can be more conservative about which indicators it assigns to a cluster. We run the hsic option here because it is quick enough for a tutorial notebook.
The HSIC-based run may assign fewer indicators than GIN-MI on this finite sample. That is a useful reminder that algorithm settings and independence tests affect the recovered measurement structure.
Method Comparison Table
The next table compares GIN-MI and HSIC-based GIN using cluster quality, latent order agreement, and runtime.
GIN-MI is the cleaner baseline for this notebook. The HSIC version is still useful to show how a stricter independence-testing route can behave differently.
Sensitivity To Sample Size And Measurement Noise
Latent cluster recovery depends on having enough observations and strong enough indicators. We scan sample size and measurement noise to see when the recovered clusters remain stable.
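The scan is a nested loop over the grid. In the sketch below, a simple correlation-separation score stands in for the GIN fit plus adjusted Rand index so the cell runs without causal-learn; the grid values are illustrative.

```python
import itertools

import numpy as np
import pandas as pd

def fit_and_score(n_samples, measurement_noise, seed=0):
    """Placeholder for one GIN fit plus cluster scoring against the truth.

    A correlation-separation score stands in for GIN here: it measures how
    much more correlated within-block pairs are than cross-block pairs.
    """
    rng = np.random.default_rng(seed)
    need = rng.laplace(size=n_samples)
    value = 0.8 * need + rng.laplace(size=n_samples)
    parents = np.column_stack([need] * 3 + [value] * 3)
    X = parents + measurement_noise * rng.laplace(size=(n_samples, 6))
    corr = np.corrcoef(X, rowvar=False)
    within = np.mean([corr[0, 1], corr[0, 2], corr[3, 4], corr[3, 5]])
    cross = np.mean([corr[0, 3], corr[1, 4], corr[2, 5]])
    return float(within - cross)

grid = list(itertools.product([250, 500, 1000], [0.3, 0.8, 1.5]))
noise_sensitivity = pd.DataFrame(
    [{"n_samples": n, "measurement_noise": s, "score": fit_and_score(n, s)}
     for n, s in grid]
)
print(noise_sensitivity)
```

In the notebook, the `score` column would be the adjusted Rand index of the recovered clusters instead of this stand-in.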
The grid shows the main practical pattern: with more observations, the correct indicator clusters are more robust to noisy measurement. With very small samples and high noise, clusters can mix indicators from different latent parents.
Plot Sample And Noise Sensitivity
The heatmap shows adjusted Rand index across the sample-size and noise grid. Values near one mean the recovered indicator clusters match the true groups.
ari_heatmap = noise_sensitivity.pivot(
    index="measurement_noise", columns="n_samples", values="adjusted_rand_index"
)
fig, ax = plt.subplots(figsize=(8, 5))
sns.heatmap(ari_heatmap, annot=True, fmt=".2f", cmap="viridis", vmin=0, vmax=1, ax=ax)
ax.set_title("GIN-MI Cluster Recovery Across Sample Size And Noise")
ax.set_xlabel("Number Of Samples")
ax.set_ylabel("Measurement Noise Scale")
plt.tight_layout()
fig.savefig(FIGURE_DIR / f"{NOTEBOOK_PREFIX}_sample_noise_sensitivity.png", dpi=160, bbox_inches="tight")
plt.show()
The heatmap makes the failure region easy to see. A latent discovery report should include this kind of stress check whenever the strength of the indicators is uncertain.
Sensitivity To Non-Gaussian Assumptions
GIN is motivated by linear non-Gaussian latent-variable structure. The next cell compares Laplace and Gaussian noise across several random seeds. Cluster recovery can still look good under Gaussian noise in this simple example, but the latent order becomes less stable.
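Non-Gaussianity is also directly checkable before any seed comparison. A quick excess-kurtosis diagnostic (an assumption-checking helper added here for illustration, not one of the notebook's GIN calls) separates the two noise families clearly:

```python
import numpy as np

def excess_kurtosis(x):
    """Sample excess kurtosis: near 0 for Gaussian data, about 3 for Laplace."""
    z = (np.asarray(x, dtype=float) - np.mean(x)) / np.std(x)
    return float(np.mean(z ** 4) - 3.0)

rng = np.random.default_rng(0)
n = 200_000
print("laplace :", round(excess_kurtosis(rng.laplace(size=n)), 2))
print("gaussian:", round(excess_kurtosis(rng.normal(size=n)), 2))
```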
This table separates two outcomes. The indicator clusters can remain correct while the latent order flips across seeds, especially when the distributional assumptions are weakened.
Plot Latent Order Stability
The next plot counts how often the learned majority order matches the true latent order for each noise family.
The plot shows why cluster recovery and latent order recovery should be reported separately. A clean clustering result does not automatically make the direction among latent constructs stable.
Cross-Loading Stress Test
A clean measurement model says each indicator belongs to one latent parent. Real indicators often cross-load: one observed variable partly measures more than one hidden construct. The next test gradually contaminates one downstream indicator with the upstream latent factor.
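The mechanics of the contamination can be sketched directly: as the cross-loading grows, the contaminated X4 becomes more correlated with the need block than with its own value block. The function name and coefficients below are illustrative.

```python
import numpy as np
import pandas as pd

def contaminated_x4_correlations(cross_loading, n=20_000, seed=0):
    """Correlation of contaminated X4 with its own block vs. the need block."""
    rng = np.random.default_rng(seed)
    need = rng.laplace(size=n)
    value = 0.8 * need + rng.laplace(size=n)
    x4 = value + cross_loading * need + 0.3 * rng.laplace(size=n)  # contaminated
    x5 = value + 0.3 * rng.laplace(size=n)                         # clean sibling
    x1 = need + 0.3 * rng.laplace(size=n)                          # need indicator
    return (float(np.corrcoef(x4, x5)[0, 1]),   # correlation with own block
            float(np.corrcoef(x4, x1)[0, 1]))   # correlation with wrong block

rows = []
for lam in [0.0, 0.5, 1.0, 2.0]:
    own_block, wrong_block = contaminated_x4_correlations(lam)
    rows.append({"cross_loading": lam,
                 "corr_with_value_block": own_block,
                 "corr_with_need_block": wrong_block})
print(pd.DataFrame(rows).round(2))
```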
Once the cross-loading becomes large, the contaminated indicator moves into the wrong learned cluster. This is a useful failure mode because it matches a common real-data problem: proxy variables often measure multiple constructs.
Plot Cross-Loading Sensitivity
The plot shows how cluster quality changes as one indicator becomes less clean.
The decline marks the point where the clean single-parent measurement assumption is no longer a good description of the observed indicators.
Runtime Sketch
GIN-MI is fast on six observed variables, but runtime still grows with sample size and the number of observed indicators. This small benchmark varies the sample size for the baseline six-indicator model.
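A timing loop for that benchmark can be sketched as below. The stand-in workload replaces the actual GIN-MI fit so the sketch runs without causal-learn; in the notebook, `stand_in_fit` would be the GIN call.

```python
import time

import numpy as np
import pandas as pd

def timed(fn, *args):
    """Time a single call and return (seconds, result)."""
    start = time.perf_counter()
    result = fn(*args)
    return time.perf_counter() - start, result

def stand_in_fit(n_samples, seed=0):
    """Stand-in workload for one GIN fit: simulate and correlate indicators."""
    rng = np.random.default_rng(seed)
    need = rng.laplace(size=n_samples)
    value = 0.8 * need + rng.laplace(size=n_samples)
    X = np.column_stack([need] * 3 + [value] * 3) + 0.3 * rng.laplace(size=(n_samples, 6))
    return np.corrcoef(X, rowvar=False)

runtimes = pd.DataFrame(
    [{"n_samples": n, "seconds": timed(stand_in_fit, n)[0]}
     for n in [500, 1000, 2000, 4000]]
)
print(runtimes)
```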
The runtime remains manageable here. With many observed variables, the cluster search can become the expensive part.
Practical Reporting Checklist
A GIN analysis should report the measurement assumptions as clearly as the graph output. The next checklist records what a reader needs to know before trusting a latent discovery result.
reporting_checklist = pd.DataFrame(
    [
        {
            "item": "Observed indicator map",
            "what_to_report": "Which measured columns are candidate indicators and how they were selected.",
            "why_it_matters": "GIN learns latent groups from observed indicators; irrelevant columns can distort clusters.",
        },
        {
            "item": "Measurement model assumption",
            "what_to_report": "Whether indicators are expected to have one latent parent or possible cross-loadings.",
            "why_it_matters": "Cross-loaded indicators can move into the wrong learned cluster.",
        },
        {
            "item": "Distributional assumption",
            "what_to_report": "Whether non-Gaussianity is plausible or checked.",
            "why_it_matters": "The method is designed for linear non-Gaussian latent-variable structure.",
        },
        {
            "item": "Cluster quality diagnostics",
            "what_to_report": "Cluster membership, cluster size, stability, and domain meaning.",
            "why_it_matters": "The latent labels are arbitrary until the indicator membership gives them meaning.",
        },
        {
            "item": "Latent order stability",
            "what_to_report": "Whether the learned order changes across seeds, samples, or plausible preprocessing choices.",
            "why_it_matters": "Correct clusters do not guarantee a stable causal order among latent constructs.",
        },
        {
            "item": "Sensitivity checks",
            "what_to_report": "Noise, weak indicators, cross-loading, alternative tests, and sample-size checks.",
            "why_it_matters": "Latent discovery can look clean in one specification and fragile in another.",
        },
        {
            "item": "Claim strength",
            "what_to_report": "Whether the output is used as a candidate measurement graph or a causal conclusion.",
            "why_it_matters": "Latent-variable discovery needs external support before strong claims are made.",
        },
    ]
)
reporting_checklist.to_csv(TABLE_DIR / f"{NOTEBOOK_PREFIX}_reporting_checklist.csv", index=False)
display(reporting_checklist)
|   | item | what_to_report | why_it_matters |
|---|------|----------------|----------------|
| 0 | Observed indicator map | Which measured columns are candidate indicators and how they were selected. | GIN learns latent groups from observed indicators; irrelevant columns can distort clusters. |
| 1 | Measurement model assumption | Whether indicators are expected to have one latent parent or possible cross-loadings. | Cross-loaded indicators can move into the wrong learned cluster. |
| 2 | Distributional assumption | Whether non-Gaussianity is plausible or checked. | The method is designed for linear non-Gaussian latent-variable structure. |
| 3 | Cluster quality diagnostics | Cluster membership, cluster size, stability, and domain meaning. | The latent labels are arbitrary until the indicator membership gives them meaning. |
| 4 | Latent order stability | Whether the learned order changes across seeds, samples, or plausible preprocessing choices. | Correct clusters do not guarantee a stable causal order among latent constructs. |
| 5 | Sensitivity checks | Noise, weak indicators, cross-loading, alternative tests, and sample-size checks. | Latent discovery can look clean in one specification and fragile in another. |
| 6 | Claim strength | Whether the output is used as a candidate measurement graph or a causal conclusion. | Latent-variable discovery needs external support before strong claims are made. |
The checklist is intentionally conservative. GIN can be a powerful way to reason about hidden constructs, but the assumptions must travel with the result.
Artifact Manifest
The final cell lists the datasets, tables, and figures created by this notebook.
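A manifest cell can be as simple as listing files under the output folders. The helper below is a sketch; the demonstration uses a throwaway folder standing in for the notebook's output directories.

```python
import tempfile
from pathlib import Path

import pandas as pd

def build_manifest(*folders):
    """List every file under the given output folders with its size in bytes."""
    rows = [
        {"folder": str(folder), "file": path.name, "bytes": path.stat().st_size}
        for folder in folders
        for path in sorted(Path(folder).glob("*"))
        if path.is_file()
    ]
    return pd.DataFrame(rows, columns=["folder", "file", "bytes"])

# Throwaway folder standing in for the notebook's data/table/figure dirs.
demo = Path(tempfile.mkdtemp())
(demo / "example_table.csv").write_text("a,b\n1,2\n")
manifest = build_manifest(demo)
print(manifest)
```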
The notebook leaves us with a reusable pattern: define candidate indicators, learn latent clusters, score or audit cluster quality, then stress-test the measurement assumptions before treating the latent graph as more than a hypothesis.