causal-learn Tutorial 01: Graphs, DAGs, CPDAGs, PAGs, And Evaluation
This notebook builds the graph vocabulary that the rest of the causal-learn tutorial will rely on. Before running PC, FCI, GES, LiNGAM, or time-series discovery, we need to understand what those algorithms return. A causal discovery result is not just a picture: it is a compact statement about adjacencies, edge directions, ambiguous directions, possible hidden confounding, and what the observed data can or cannot distinguish.
The central question here is: when an algorithm returns a graph, how should we read it, compare it to a known answer, and report it responsibly? We will use small teaching graphs so every edge can be inspected by hand. The same ideas scale to larger graphs where manual inspection is impossible.
By the end, you should be comfortable with four recurring ideas:
A DAG is a fully directed acyclic graph: it says which variables are direct causes of which other variables under the assumed variable set.
A CPDAG represents a Markov equivalence class of DAGs: some directions are compelled, while other edges remain reversible from observational data alone.
A PAG is used when latent confounders or selection effects may exist: circles and bidirected edges communicate uncertainty and hidden-variable risk.
Evaluation metrics such as adjacency precision, adjacency recall, arrow precision, arrow recall, and SHD answer different questions; no single score fully summarizes graph quality.
Notebook Flow
We will move from concepts to code in small steps:
Set up imports, output folders, and package-version checks.
Build a tiny true DAG using causal-learn graph objects.
Render the same graph in a clean teaching style.
Convert a simple DAG to a CPDAG and inspect why directions can disappear.
Compare DAGs, CPDAG-like outputs, and PAG-like outputs using endpoint marks.
Build graph-recovery metrics from first principles.
Cross-check one example with causal-learn’s built-in graph-comparison utilities.
The goal is not to memorize every internal graph method. The goal is to learn what each graph is claiming and how to translate that claim into evidence a reviewer can understand.
Setup
This first code cell prepares the notebook environment. It creates output directories for figures and tables, imports the graph classes we will use from causal-learn, and prints package versions so the notebook is reproducible. pydot (a Graphviz interface) is imported for graph-rendering support, but the teaching figures below are drawn with a fixed matplotlib layout, because layout matters in a tutorial: if the graph is hard to read, the concept becomes harder than it needs to be.
from pathlib import Path
from importlib.metadata import PackageNotFoundError, version
import os
import warnings

# Keep matplotlib cache files inside the project instead of writing to a user-level cache.
os.environ.setdefault("MPLCONFIGDIR", str(Path.cwd() / ".matplotlib_cache"))

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
import pydot
from IPython.display import Image, display

from causallearn.graph.GraphNode import GraphNode
from causallearn.graph.Dag import Dag
from causallearn.graph.GeneralGraph import GeneralGraph
from causallearn.graph.Edge import Edge
from causallearn.graph.Endpoint import Endpoint
from causallearn.utils.DAG2CPDAG import dag2cpdag
from causallearn.graph.SHD import SHD
from causallearn.graph.AdjacencyConfusion import AdjacencyConfusion
from causallearn.graph.ArrowConfusion import ArrowConfusion

warnings.filterwarnings("ignore", category=FutureWarning)
sns.set_theme(style="whitegrid", context="notebook")
pd.set_option("display.max_columns", 80)
pd.set_option("display.max_colwidth", 120)

NOTEBOOK_DIR = Path.cwd()
if NOTEBOOK_DIR.name != "causal_learn":
    NOTEBOOK_DIR = Path("notebooks/tutorials/causal_learn").resolve()
else:
    NOTEBOOK_DIR = NOTEBOOK_DIR.resolve()

OUTPUT_DIR = NOTEBOOK_DIR / "outputs"
FIGURE_DIR = OUTPUT_DIR / "figures"
TABLE_DIR = OUTPUT_DIR / "tables"
for directory in [OUTPUT_DIR, FIGURE_DIR, TABLE_DIR]:
    directory.mkdir(parents=True, exist_ok=True)

NOTEBOOK_PREFIX = "01"


def pkg_version(package_name: str) -> str:
    """Return a package version string without failing the notebook if metadata is unavailable."""
    try:
        return version(package_name)
    except PackageNotFoundError:
        return "not installed"


version_table = pd.DataFrame(
    [
        {"package": "causal-learn", "version": pkg_version("causal-learn")},
        {"package": "pandas", "version": pkg_version("pandas")},
        {"package": "numpy", "version": pkg_version("numpy")},
        {"package": "matplotlib", "version": pkg_version("matplotlib")},
        {"package": "seaborn", "version": pkg_version("seaborn")},
        {"package": "pydot", "version": pkg_version("pydot")},
    ]
)
version_table
package | version
causal-learn | 0.1.4.5
pandas | 3.0.2
numpy | 2.4.4
matplotlib | 3.10.9
seaborn | 0.13.2
pydot | 4.0.1
The version table is a small but useful reproducibility check. Discovery algorithms are sensitive to implementation details, and graph classes can change across library versions. Keeping the versions visible makes it easier to explain later why a result may differ from another machine or tutorial.
Graph Concept Map
This table gives the minimum vocabulary needed before touching algorithms. A common mistake is to treat every returned graph as if it were a fully oriented causal DAG. That is too strong for many discovery methods. PC often returns a CPDAG-like object because observational data cannot always orient every edge. FCI returns a PAG-like object because it explicitly allows hidden confounding. Score-based methods such as GES also work over equivalence classes rather than simply selecting one arbitrary DAG.
Read this table as a translation layer between algorithm output and plain English.
graph_concepts = pd.DataFrame(
    [
        {
            "concept": "DAG",
            "stands_for": "Directed acyclic graph",
            "edge_example": "A -> B",
            "plain_language": "A is represented as a direct cause of B, and cycles are not allowed.",
            "why_it_matters": "This is the cleanest causal story, but observational discovery often cannot justify every direction.",
        },
        {
            "concept": "Skeleton",
            "stands_for": "Adjacency pattern without directions",
            "edge_example": "A - B",
            "plain_language": "A and B are connected somehow, but the direction is ignored.",
            "why_it_matters": "Skeleton quality asks whether the algorithm found the right variable pairs before asking whether directions are right.",
        },
        {
            "concept": "CPDAG",
            "stands_for": "Completed partially directed acyclic graph",
            "edge_example": "A -> B and B - C",
            "plain_language": "Directed edges are compelled across an equivalence class; undirected edges are reversible.",
            "why_it_matters": "It prevents overclaiming directions that the observed conditional independences cannot identify.",
        },
        {
            "concept": "PAG",
            "stands_for": "Partial ancestral graph",
            "edge_example": "A o-> B or A <-> B",
            "plain_language": "Endpoint marks encode uncertainty and possible hidden confounding.",
            "why_it_matters": "It is the right language when unmeasured common causes may be present.",
        },
        {
            "concept": "SHD",
            "stands_for": "Structural Hamming Distance",
            "edge_example": "count of edits",
            "plain_language": "How many edge additions, deletions, or orientation changes are needed to move from one graph to another.",
            "why_it_matters": "It is compact, but it hides which kind of error occurred unless paired with precision/recall metrics.",
        },
    ]
)
graph_concepts.to_csv(TABLE_DIR / f"{NOTEBOOK_PREFIX}_graph_concept_map.csv", index=False)
graph_concepts
concept | stands_for | edge_example | plain_language | why_it_matters
DAG | Directed acyclic graph | A -> B | A is represented as a direct cause of B, and cycles are not allowed. | This is the cleanest causal story, but observational discovery often cannot justify every direction.
Skeleton | Adjacency pattern without directions | A - B | A and B are connected somehow, but the direction is ignored. | Skeleton quality asks whether the algorithm found the right variable pairs before asking whether directions are right.
CPDAG | Completed partially directed acyclic graph | A -> B and B - C | Directed edges are compelled across an equivalence class; undirected edges are reversible. | It prevents overclaiming directions that the observed conditional independences cannot identify.
PAG | Partial ancestral graph | A o-> B or A <-> B | Endpoint marks encode uncertainty and possible hidden confounding. | It is the right language when unmeasured common causes may be present.
SHD | Structural Hamming Distance | count of edits | How many edge additions, deletions, or orientation changes are needed to move from one graph to another. | It is compact, but it hides which kind of error occurred unless paired with precision/recall metrics.
This table sets up the main reporting habit for causal discovery: separate whether two variables are connected from whether the direction is known. A graph can have an excellent skeleton and still be weak on directions. That is not necessarily a failure; sometimes the data genuinely do not contain enough information to orient an edge.
A Small Teaching DAG
Now we create a small true graph. The variables are intentionally generic and product-analytics flavored, but the graph is artificial so we can know the answer. We will use it as a controlled reference for the rest of the notebook.
The story is:
Need and Intent jointly influence whether a user sees a good Match.
Match drives Engagement.
Engagement affects both Renewal and Support demand.
Need also directly affects Renewal, so not all renewal differences are explained by engagement.
This is a DAG because every edge has one direction and there is no directed cycle.
node_labels = {
    "Need": "User\nneed",
    "Intent": "Current\nintent",
    "Match": "Recommendation\nmatch",
    "Engagement": "Engagement\ndepth",
    "Renewal": "Future\nrenewal",
    "Support": "Support\ncontacts",
}

true_edge_table = pd.DataFrame(
    [
        {"source": "Need", "target": "Match", "mark": "-->", "reason": "Need affects what match quality means for the user."},
        {"source": "Intent", "target": "Match", "mark": "-->", "reason": "Current intent affects which recommendation feels relevant."},
        {"source": "Match", "target": "Engagement", "mark": "-->", "reason": "Better matching increases downstream engagement."},
        {"source": "Engagement", "target": "Renewal", "mark": "-->", "reason": "Deeper engagement can raise future value."},
        {"source": "Engagement", "target": "Support", "mark": "-->", "reason": "More engagement can create more opportunities for support contact."},
        {"source": "Need", "target": "Renewal", "mark": "-->", "reason": "Underlying need can directly affect renewal even after engagement."},
    ]
)
true_edge_table.to_csv(TABLE_DIR / f"{NOTEBOOK_PREFIX}_true_dag_edges.csv", index=False)
true_edge_table
source | target | mark | reason
Need | Match | --> | Need affects what match quality means for the user.
Intent | Match | --> | Current intent affects which recommendation feels relevant.
Match | Engagement | --> | Better matching increases downstream engagement.
Engagement | Renewal | --> | Deeper engagement can raise future value.
Engagement | Support | --> | More engagement can create more opportunities for support contact.
Need | Renewal | --> | Underlying need can directly affect renewal even after engagement.
The edge table is deliberately explicit. In larger projects, keeping an edge table next to the graph helps reviewers understand whether an arrow came from domain assumptions, from an algorithm, or from a simulation design. Here it is our ground-truth answer key.
Building The DAG With causal-learn Objects
The previous cell was a plain pandas table. This cell builds the same graph using causal-learn’s graph classes. The important pieces are:
GraphNode: wraps the variable name.
Dag: stores a directed acyclic graph.
Edge: stores two endpoint marks.
Endpoint.TAIL and Endpoint.ARROW: together represent a directed edge from source to target.
In causal-learn notation, A --> B means the endpoint at A is a tail and the endpoint at B is an arrowhead.
def build_causallearn_graph(node_names, edge_table, graph_class=GeneralGraph):
    """Build a causal-learn graph from a table with source, target, and mark columns."""
    nodes = [GraphNode(name) for name in node_names]
    node_map = {node.get_name(): node for node in nodes}
    graph = graph_class(nodes)
    endpoint_map = {
        "-->": (Endpoint.TAIL, Endpoint.ARROW),
        "<--": (Endpoint.ARROW, Endpoint.TAIL),
        "---": (Endpoint.TAIL, Endpoint.TAIL),
        "<->": (Endpoint.ARROW, Endpoint.ARROW),
        "o->": (Endpoint.CIRCLE, Endpoint.ARROW),
        "<-o": (Endpoint.ARROW, Endpoint.CIRCLE),
        "o-o": (Endpoint.CIRCLE, Endpoint.CIRCLE),
    }
    for row in edge_table.itertuples(index=False):
        end1, end2 = endpoint_map[row.mark]
        graph.add_edge(Edge(node_map[row.source], node_map[row.target], end1, end2))
    return graph, node_map


node_order = list(node_labels)
true_dag, true_node_map = build_causallearn_graph(node_order, true_edge_table, graph_class=Dag)

causal_learn_edges = pd.DataFrame(
    {"causal_learn_edge_string": [str(edge) for edge in true_dag.get_graph_edges()]}
)
causal_learn_edges
causal_learn_edge_string
Need --> Match
Need --> Renewal
Intent --> Match
Match --> Engagement
Engagement --> Renewal
Engagement --> Support
The causal-learn edge strings match the edge table: every edge is directed. This is the object form that algorithms and built-in graph metrics expect. The table form is still useful because it is easier to read, save, modify, and render in a teaching notebook.
Rendering The Teaching DAG
The next helper renders edge tables with Graphviz. The visual conventions are the same ones we will use throughout the tutorial:
--> is a directed edge.
--- is an undirected or reversible CPDAG edge.
<-> is a bidirected edge, often used to flag latent common-cause risk in mixed graphs.
o->, <-o, and o-o are PAG-style uncertain endpoints.
The fixed node positions are not part of the causal model. They simply make the picture easier to read and keep the notebook output stable across runs.
def render_edge_table_graph(
    edge_table,
    labels,
    positions,
    title,
    output_path,
    node_colors=None,
    edge_radii=None,
    circle_positions=None,
    edge_color="#334155",
    figsize=(12, 6),
):
    """Render an edge table using the shared tutorial DAG style."""
    node_colors = node_colors or {node: "#e0f2fe" for node in labels}
    edge_radii = edge_radii or {}
    circle_positions = circle_positions or {}
    fig, ax = plt.subplots(figsize=figsize)
    ax.set_xlim(0, 1)
    ax.set_ylim(0, 1)
    ax.set_axis_off()
    endpoint_circle_queue = []

    def endpoint_circle(source, target, near_source=True):
        """Queue a PAG circle endpoint, using explicit positions when provided."""
        side = "source" if near_source else "target"
        key = (source, target, side)
        if key in circle_positions:
            point = np.array(circle_positions[key], dtype=float)
        else:
            source_xy = np.array(positions[source], dtype=float)
            target_xy = np.array(positions[target], dtype=float)
            t = 0.18 if near_source else 0.82
            point = source_xy + t * (target_xy - source_xy)
        endpoint_circle_queue.append(point)

    for row in edge_table.itertuples(index=False):
        source = row.source
        target = row.target
        mark = row.mark
        rad = edge_radii.get((source, target), edge_radii.get((target, source), 0.0))
        linestyle = "-"
        arrowstyle = "-|>"
        xy = positions[target]
        xytext = positions[source]
        if mark == "<--":
            xy = positions[source]
            xytext = positions[target]
        elif mark == "---":
            arrowstyle = "-"
        elif mark == "<->":
            arrowstyle = "<|-|>"
        elif mark == "o->":
            endpoint_circle(source, target, near_source=True)
        elif mark == "<-o":
            xy = positions[source]
            xytext = positions[target]
            endpoint_circle(source, target, near_source=False)
        elif mark == "o-o":
            arrowstyle = "-"
            endpoint_circle(source, target, near_source=True)
            endpoint_circle(source, target, near_source=False)
        ax.annotate(
            "",
            xy=xy,
            xytext=xytext,
            arrowprops=dict(
                arrowstyle=arrowstyle,
                color=edge_color,
                linewidth=1.5,
                mutation_scale=18,
                shrinkA=34,
                shrinkB=46,
                linestyle=linestyle,
                connectionstyle=f"arc3,rad={rad}",
            ),
            zorder=1,
        )
    for point in endpoint_circle_queue:
        ax.scatter(
            point[0],
            point[1],
            s=34,
            facecolors="white",
            edgecolors=edge_color,
            linewidths=1.5,
            zorder=3,
        )
    for node, (x, y) in positions.items():
        ax.text(
            x,
            y,
            labels[node],
            ha="center",
            va="center",
            fontsize=10.5,
            fontweight="bold",
            bbox=dict(
                boxstyle="round,pad=0.45",
                facecolor=node_colors.get(node, "#e0f2fe"),
                edgecolor="#334155",
                linewidth=1.2,
            ),
            zorder=4,
        )
    ax.set_title(title, pad=18)
    output_path = Path(output_path)
    output_path.parent.mkdir(parents=True, exist_ok=True)
    fig.savefig(output_path, dpi=160, bbox_inches="tight")
    plt.show()
    return output_path


teaching_positions = {
    "Need": (0.10, 0.76),
    "Intent": (0.10, 0.24),
    "Match": (0.34, 0.52),
    "Engagement": (0.66, 0.52),
    "Renewal": (0.90, 0.72),
    "Support": (0.90, 0.30),
}
teaching_node_colors = {
    "Need": "#eef2ff",
    "Intent": "#eef2ff",
    "Match": "#e0f2fe",
    "Engagement": "#e0f2fe",
    "Renewal": "#dcfce7",
    "Support": "#dcfce7",
}
teaching_edge_radii = {
    ("Need", "Match"): -0.04,
    ("Intent", "Match"): 0.04,
    ("Match", "Engagement"): 0.00,
    ("Engagement", "Renewal"): -0.04,
    ("Engagement", "Support"): 0.04,
    ("Need", "Renewal"): -0.18,
    ("Intent", "Support"): 0.18,
}

true_dag_path = FIGURE_DIR / f"{NOTEBOOK_PREFIX}_true_teaching_dag.png"
render_edge_table_graph(
    true_edge_table,
    node_labels,
    teaching_positions,
    "True Teaching DAG",
    true_dag_path,
    node_colors=teaching_node_colors,
    edge_radii=teaching_edge_radii,
)
The rendered graph makes two structural features easy to see. First, Need and Intent form a collider at Match: Need -> Match <- Intent. Second, Renewal has two parents, Need and Engagement. These patterns matter because v-structures are often what allow observational discovery algorithms to orient some edges.
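To make that concrete, the quick check below reads the parent sets straight from the true edge table and confirms that Need and Intent both point into Match without being adjacent to each other, which is exactly the unshielded-collider pattern. This is a minimal sanity check on the edge table, not a causal-learn utility.

# Read parent sets directly from the ground-truth edge table.
match_parents = sorted(true_edge_table.loc[true_edge_table["target"] == "Match", "source"])
renewal_parents = sorted(true_edge_table.loc[true_edge_table["target"] == "Renewal", "source"])

# Two parents form an unshielded collider only if they are not adjacent to each other.
adjacent_pairs = {
    frozenset([row.source, row.target]) for row in true_edge_table.itertuples(index=False)
}

print("Parents of Match:", match_parents)
print("Parents of Renewal:", renewal_parents)
print("Need and Intent adjacent?", frozenset(["Need", "Intent"]) in adjacent_pairs)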
Endpoint Vocabulary
causal-learn graphs are built from endpoint marks, not just from whole-edge labels. This matters because algorithms such as FCI can return edges where one endpoint is known and the other is uncertain.
The whole-edge notation is a compact way to read the two endpoint marks together.
endpoint_vocabulary = pd.DataFrame(
    [
        {
            "whole_edge_mark": "A --> B",
            "endpoint_at_A": "TAIL",
            "endpoint_at_B": "ARROW",
            "typical_graph": "DAG, CPDAG, PAG",
            "meaning": "A is oriented as an ancestor/cause-side endpoint of B.",
        },
        {
            "whole_edge_mark": "A --- B",
            "endpoint_at_A": "TAIL",
            "endpoint_at_B": "TAIL",
            "typical_graph": "CPDAG",
            "meaning": "A and B are adjacent, but this edge is reversible within the equivalence class.",
        },
        {
            "whole_edge_mark": "A o-> B",
            "endpoint_at_A": "CIRCLE",
            "endpoint_at_B": "ARROW",
            "typical_graph": "PAG",
            "meaning": "The endpoint near A is uncertain; B has an arrowhead on this edge.",
        },
        {
            "whole_edge_mark": "A <-> B",
            "endpoint_at_A": "ARROW",
            "endpoint_at_B": "ARROW",
            "typical_graph": "PAG or mixed graph",
            "meaning": "Often read as possible latent common-cause structure between A and B.",
        },
        {
            "whole_edge_mark": "A o-o B",
            "endpoint_at_A": "CIRCLE",
            "endpoint_at_B": "CIRCLE",
            "typical_graph": "PAG",
            "meaning": "The relationship is adjacent, but both endpoint orientations remain unresolved.",
        },
    ]
)
endpoint_vocabulary.to_csv(TABLE_DIR / f"{NOTEBOOK_PREFIX}_endpoint_vocabulary.csv", index=False)
endpoint_vocabulary
whole_edge_mark | endpoint_at_A | endpoint_at_B | typical_graph | meaning
A --> B | TAIL | ARROW | DAG, CPDAG, PAG | A is oriented as an ancestor/cause-side endpoint of B.
A --- B | TAIL | TAIL | CPDAG | A and B are adjacent, but this edge is reversible within the equivalence class.
A o-> B | CIRCLE | ARROW | PAG | The endpoint near A is uncertain; B has an arrowhead on this edge.
A <-> B | ARROW | ARROW | PAG or mixed graph | Often read as possible latent common-cause structure between A and B.
A o-o B | CIRCLE | CIRCLE | PAG | The relationship is adjacent, but both endpoint orientations remain unresolved.
This endpoint view is the cleanest way to avoid overclaiming. A circle is not decorative; it is a statement that the available information has not resolved that endpoint. When writing up a discovery result, those unresolved marks should be preserved rather than silently converted to arrows.
DAG To CPDAG: Why Some Directions Disappear
A CPDAG represents all DAGs that imply the same observational conditional independences. If several DAGs are Markov equivalent, observational data alone cannot choose among them without extra assumptions, interventions, time ordering, or background knowledge.
The classic example is a three-node chain. A -> B -> C and A <- B -> C share the same skeleton and have no unshielded collider, so they are Markov equivalent. In a CPDAG, the reversible edges become undirected.
chain_labels = {"A": "A", "B": "B", "C": "C"}
chain_positions = {"A": (0.18, 0.52), "B": (0.50, 0.52), "C": (0.82, 0.52)}
chain_node_colors = {"A": "#eef2ff", "B": "#e0f2fe", "C": "#dcfce7"}
chain_edges = pd.DataFrame(
    [
        {"source": "A", "target": "B", "mark": "-->", "reason": "First link in a simple chain."},
        {"source": "B", "target": "C", "mark": "-->", "reason": "Second link in a simple chain."},
    ]
)
chain_dag, _ = build_causallearn_graph(list(chain_labels), chain_edges, graph_class=Dag)
chain_cpdag = dag2cpdag(chain_dag)


def causallearn_edges_to_table(graph):
    """Convert causal-learn edges into a small readable endpoint table."""
    records = []
    for edge in graph.get_graph_edges():
        endpoint1 = str(edge.get_endpoint1())
        endpoint2 = str(edge.get_endpoint2())
        source = edge.get_node1().get_name()
        target = edge.get_node2().get_name()
        mark_lookup = {
            ("TAIL", "ARROW"): "-->",
            ("ARROW", "TAIL"): "<--",
            ("TAIL", "TAIL"): "---",
            ("ARROW", "ARROW"): "<->",
            ("CIRCLE", "ARROW"): "o->",
            ("ARROW", "CIRCLE"): "<-o",
            ("CIRCLE", "CIRCLE"): "o-o",
        }
        records.append(
            {
                "source": source,
                "target": target,
                "endpoint_at_source": endpoint1,
                "endpoint_at_target": endpoint2,
                "mark": mark_lookup.get((endpoint1, endpoint2), f"{endpoint1}/{endpoint2}"),
                "causal_learn_string": str(edge),
            }
        )
    return pd.DataFrame(records)


chain_cpdag_table = causallearn_edges_to_table(chain_cpdag)
chain_cpdag_table.to_csv(TABLE_DIR / f"{NOTEBOOK_PREFIX}_chain_cpdag_edges.csv", index=False)

chain_dag_path = FIGURE_DIR / f"{NOTEBOOK_PREFIX}_chain_dag.png"
chain_cpdag_path = FIGURE_DIR / f"{NOTEBOOK_PREFIX}_chain_cpdag.png"
render_edge_table_graph(
    chain_edges,
    chain_labels,
    chain_positions,
    "One DAG In The Equivalence Class",
    chain_dag_path,
    node_colors=chain_node_colors,
)
render_edge_table_graph(
    chain_cpdag_table.assign(reason="reversible edge in the CPDAG"),
    chain_labels,
    chain_positions,
    "CPDAG Returned By dag2cpdag",
    chain_cpdag_path,
    node_colors=chain_node_colors,
    edge_color="#334155",
)
chain_cpdag_table
source | target | endpoint_at_source | endpoint_at_target | mark | causal_learn_string
A | B | TAIL | TAIL | --- | A --- B
B | C | TAIL | TAIL | --- | B --- C
The CPDAG table shows undirected --- edges. That does not mean there is no causal relationship. It means the equivalence class does not force a single direction for those adjacencies. This is one of the most important habits in causal discovery reporting: uncertainty about direction should stay visible.
Markov Equivalence And V-Structures
Two DAGs are Markov equivalent when they have the same skeleton and the same unshielded colliders, also called v-structures. The next cell compares three small graphs:
A chain: A -> B -> C.
A fork: A <- B -> C.
A collider: A -> B <- C.
The chain and fork have the same skeleton and no collider. The collider has the same skeleton but a different v-structure, so it implies different conditional independence behavior.
def skeleton_from_edges(edge_pairs):
    """Return undirected adjacencies as sorted frozensets."""
    return {frozenset([source, target]) for source, target in edge_pairs}


def v_structures_from_edges(edge_pairs):
    """Find unshielded colliders A -> B <- C in a directed edge list."""
    parents = {}
    all_edges = set(edge_pairs)
    for source, target in edge_pairs:
        parents.setdefault(target, set()).add(source)
    v_structures = []
    for child, parent_set in parents.items():
        sorted_parents = sorted(parent_set)
        for i, left_parent in enumerate(sorted_parents):
            for right_parent in sorted_parents[i + 1 :]:
                parents_adjacent = (
                    (left_parent, right_parent) in all_edges
                    or (right_parent, left_parent) in all_edges
                )
                if not parents_adjacent:
                    v_structures.append(f"{left_parent} -> {child} <- {right_parent}")
    return sorted(v_structures)


small_graphs = {
    "chain_A_to_B_to_C": [("A", "B"), ("B", "C")],
    "fork_B_to_A_and_C": [("B", "A"), ("B", "C")],
    "collider_A_and_C_to_B": [("A", "B"), ("C", "B")],
}
equivalence_rows = []
for graph_name, edges in small_graphs.items():
    equivalence_rows.append(
        {
            "graph": graph_name,
            "directed_edges": ", ".join([f"{a}->{b}" for a, b in edges]),
            "skeleton": ", ".join(sorted(["-".join(sorted(edge)) for edge in skeleton_from_edges(edges)])),
            "unshielded_colliders": "; ".join(v_structures_from_edges(edges)) or "none",
        }
    )
equivalence_table = pd.DataFrame(equivalence_rows)
equivalence_table.to_csv(TABLE_DIR / f"{NOTEBOOK_PREFIX}_markov_equivalence_examples.csv", index=False)
equivalence_table
graph | directed_edges | skeleton | unshielded_colliders
chain_A_to_B_to_C | A->B, B->C | A-B, B-C | none
fork_B_to_A_and_C | B->A, B->C | A-B, B-C | none
collider_A_and_C_to_B | A->B, C->B | A-B, B-C | A -> B <- C
The chain and fork rows share the same skeleton and both list none for unshielded colliders. The collider row has the same adjacencies but a different collider pattern. This is why v-structures are so valuable: they are one of the few orientation patterns observational conditional independence tests can often identify.
PAG-Style Edge Marks
A PAG is the graph language used by algorithms such as FCI when latent confounders or selection effects are allowed. In practice, this means the algorithm is less willing to pretend that every relevant cause has been measured.
The graph below is not the output of an algorithm. It is a hand-made teaching graph showing the edge marks you will later see in FCI-style notebooks.
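Because the node set and exact marks are only for illustration, the cell below assembles a small assumed edge table (nodes A, M, Y, and U, with marks chosen to cover the PAG vocabulary) and reuses the rendering helper from earlier. Treat the specific edges as placeholders for the notation, not as claims about any dataset.

# Assumed, illustrative PAG-style edges: the endpoint marks are the point, not the variables.
pag_labels = {"A": "A", "M": "M", "Y": "Y", "U": "U"}
pag_positions = {"A": (0.15, 0.65), "M": (0.45, 0.65), "Y": (0.75, 0.65), "U": (0.75, 0.25)}
pag_edges = pd.DataFrame(
    [
        {"source": "A", "target": "M", "mark": "o->", "reason": "Arrowhead at M; the endpoint near A is unresolved."},
        {"source": "M", "target": "Y", "mark": "<->", "reason": "Possible latent common cause of M and Y."},
        {"source": "Y", "target": "U", "mark": "o-o", "reason": "Adjacent, but both endpoint orientations remain unresolved."},
    ]
)
pag_style_path = FIGURE_DIR / f"{NOTEBOOK_PREFIX}_pag_style_teaching_marks.png"
render_edge_table_graph(
    pag_edges,
    pag_labels,
    pag_positions,
    "Hand-Made PAG-Style Edge Marks",
    pag_style_path,
)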
The Y o-o U edge is the fully unresolved case: both endpoint orientations remain open for that illustrative relationship.
The key habit is to read each endpoint separately. For example, A o-> M is weaker than A -> M: it says there is an arrowhead at M, while the endpoint near A remains uncertain. This is exactly the sort of nuance that disappears if a PAG is redrawn as a simple DAG.
Graph Evaluation Metrics From First Principles
When a true graph is known, usually through simulation or a benchmark, we can evaluate learned graphs. It is useful to split evaluation into two layers:
Adjacency evaluation asks whether the right pairs of variables were connected, ignoring direction.
Direction evaluation asks whether the directed arrows match the true directions.
A conservative algorithm may recover the right adjacencies but leave many edges undirected. That should score well on skeleton recall but not necessarily on arrow recall. This distinction is more informative than one overall score.
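To ground these two layers, the sketch below computes adjacency and arrow precision/recall directly from edge tables. The two candidate graphs it scores are illustrative assumptions constructed by hand for this comparison, not algorithm output: conservative_cpdag_like keeps every true adjacency but leaves the two Engagement edges undirected, and missing_and_reversed drops the Intent edge and flips one direction.

def adjacency_set(edge_table):
    """Undirected adjacencies, ignoring endpoint marks."""
    return {frozenset([row.source, row.target]) for row in edge_table.itertuples(index=False)}


def directed_edge_set(edge_table):
    """Directed claims only: --> and <-- rows, normalized to (cause, effect) pairs."""
    directed = set()
    for row in edge_table.itertuples(index=False):
        if row.mark == "-->":
            directed.add((row.source, row.target))
        elif row.mark == "<--":
            directed.add((row.target, row.source))
    return directed


def recovery_metrics(true_table, estimated_table, name):
    """Adjacency and arrow precision/recall of an estimated edge table against the truth."""
    true_adj, est_adj = adjacency_set(true_table), adjacency_set(estimated_table)
    true_arr, est_arr = directed_edge_set(true_table), directed_edge_set(estimated_table)
    return {
        "candidate": name,
        "adjacency_precision": len(true_adj & est_adj) / max(len(est_adj), 1),
        "adjacency_recall": len(true_adj & est_adj) / max(len(true_adj), 1),
        "arrow_precision": len(true_arr & est_arr) / max(len(est_arr), 1),
        "arrow_recall": len(true_arr & est_arr) / max(len(true_arr), 1),
    }


# Illustrative candidate 1: correct skeleton, with the two Engagement edges left undirected.
conservative_candidate = true_edge_table.copy()
conservative_candidate.loc[conservative_candidate["source"] == "Engagement", "mark"] = "---"

# Illustrative candidate 2: the Intent -> Match adjacency is missing and Engagement -> Renewal is reversed.
missing_and_reversed = true_edge_table.copy()
missing_and_reversed = missing_and_reversed[
    ~((missing_and_reversed["source"] == "Intent") & (missing_and_reversed["target"] == "Match"))
]
missing_and_reversed.loc[
    (missing_and_reversed["source"] == "Engagement") & (missing_and_reversed["target"] == "Renewal"),
    "mark",
] = "<--"

metric_table = pd.DataFrame(
    [
        recovery_metrics(true_edge_table, conservative_candidate, "conservative_cpdag_like"),
        recovery_metrics(true_edge_table, missing_and_reversed, "missing_and_reversed"),
    ]
)
metric_table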
The metric table shows why multiple scores are needed. The conservative CPDAG-like candidate has perfect skeleton precision and recall because it connects the right variable pairs, but its arrow recall is lower because it refuses to orient two edges. The missing-and-reversed candidate loses both an adjacency and a direction, which is a more serious structural error.
Visual Comparison Of Candidate Graphs
Tables are precise, but visual comparison helps diagnose the type of mistake. The next cell renders each candidate graph using the same node positions as the true DAG. Keeping the layout fixed makes differences easier to spot: an extra edge, a missing edge, a reversed edge, or an unresolved edge mark becomes visually obvious.
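Continuing with the illustrative candidates from the sketch above, the short cell below reuses the same rendering helper, node labels, and positions, so the only visual differences between the figures are the edges themselves.

candidate_tables = {
    "conservative_cpdag_like": conservative_candidate,
    "missing_and_reversed": missing_and_reversed,
}
for name, candidate_table in candidate_tables.items():
    render_edge_table_graph(
        candidate_table,
        node_labels,
        teaching_positions,
        f"Candidate Graph: {name}",
        FIGURE_DIR / f"{NOTEBOOK_PREFIX}_candidate_{name}.png",
        node_colors=teaching_node_colors,
        edge_radii=teaching_edge_radii,
    )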
The fixed-layout views make the reporting tradeoff tangible. A conservative graph may look less decisive, but it can be more honest. A graph with extra arrows may look impressive, yet it can encode false causal claims. In causal discovery, being explicit about uncertainty is usually better than making the graph look complete.
Cross-Checking With causal-learn Built-In Metrics
causal-learn includes graph-comparison utilities for common benchmark metrics. The next cell compares the true DAG to the missing_and_reversed candidate. We keep this as a cross-check instead of the only evaluation because built-in metric definitions can be compact and library-specific. For a report, it is useful to show both: the library score and a plain-language table that says what kind of mistake happened.
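A minimal version of that cross-check is sketched below, assuming the getter names that recent causal-learn releases expose on SHD, AdjacencyConfusion, and ArrowConfusion (get_shd, get_adj_precision, get_adj_recall, get_arrows_precision, get_arrows_recall); if your installed version differs, inspect those objects with dir().

# Rebuild the illustrative candidate as a causal-learn graph so the built-in
# comparison utilities can consume it alongside the true DAG.
candidate_graph, _ = build_causallearn_graph(node_order, missing_and_reversed)

adj_confusion = AdjacencyConfusion(true_dag, candidate_graph)
arrow_confusion = ArrowConfusion(true_dag, candidate_graph)

builtin_metrics = pd.DataFrame(
    [
        {"metric": "shd", "value": SHD(true_dag, candidate_graph).get_shd()},
        {"metric": "adjacency_precision", "value": adj_confusion.get_adj_precision()},
        {"metric": "adjacency_recall", "value": adj_confusion.get_adj_recall()},
        {"metric": "arrow_precision", "value": arrow_confusion.get_arrows_precision()},
        {"metric": "arrow_recall", "value": arrow_confusion.get_arrows_recall()},
    ]
)
builtin_metrics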
The built-in metrics agree with the qualitative diagnosis: this candidate recovered most adjacencies but made direction mistakes. The important workflow lesson is to pair numeric metrics with an edge-level explanation. A single SHD value is useful for benchmarking, but it does not tell a stakeholder which causal claims changed.
Metric Selection Guide
This final guide summarizes which metric to emphasize for different discovery questions. The best metric depends on the decision the graph will support. If a graph will be used only to reduce a modeling feature set, adjacency recall may matter more than arrow precision. If a graph will support causal claims or intervention planning, arrow errors become much more consequential.
metric_selection_guide = pd.DataFrame(
    [
        {
            "question": "Did we find the right connected variable pairs?",
            "metric_to_start_with": "Adjacency precision and adjacency recall",
            "watch_out_for": "High adjacency recall can still hide many false directions.",
        },
        {
            "question": "Did we orient arrows correctly?",
            "metric_to_start_with": "Arrow precision and arrow recall",
            "watch_out_for": "A conservative CPDAG/PAG may leave valid uncertainty instead of making wrong arrow claims.",
        },
        {
            "question": "How far is the whole graph from the reference graph?",
            "metric_to_start_with": "SHD",
            "watch_out_for": "SHD is compact but should be decomposed into missing, extra, and reversed edge errors.",
        },
        {
            "question": "Can we use this graph for causal decisions?",
            "metric_to_start_with": "Edge-level audit plus domain review",
            "watch_out_for": "Graph recovery metrics do not validate causal assumptions by themselves.",
        },
    ]
)
metric_selection_guide.to_csv(TABLE_DIR / f"{NOTEBOOK_PREFIX}_metric_selection_guide.csv", index=False)
metric_selection_guide
question | metric_to_start_with | watch_out_for
Did we find the right connected variable pairs? | Adjacency precision and adjacency recall | High adjacency recall can still hide many false directions.
Did we orient arrows correctly? | Arrow precision and arrow recall | A conservative CPDAG/PAG may leave valid uncertainty instead of making wrong arrow claims.
How far is the whole graph from the reference graph? | SHD | SHD is compact but should be decomposed into missing, extra, and reversed edge errors.
Can we use this graph for causal decisions? | Edge-level audit plus domain review | Graph recovery metrics do not validate causal assumptions by themselves.
This guide turns the notebook into a reusable checklist. Before reporting a discovered graph, decide whether the work needs adjacency recovery, direction recovery, equivalence-class reporting, or a decision-ready causal story. Different goals require different evidence.
Generated Artifacts
The last cell lists the files created by this notebook. Keeping outputs organized makes the tutorial easier to audit and makes it simple to reuse figures in a writeup or presentation.
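One simple way to produce that listing, using only pathlib on the output directory created in the setup cell:

# Walk the output tree and summarize every file this notebook wrote.
artifact_rows = [
    {
        "artifact": str(path.relative_to(OUTPUT_DIR)),
        "size_kb": round(path.stat().st_size / 1024, 1),
    }
    for path in sorted(OUTPUT_DIR.rglob("*"))
    if path.is_file()
]
pd.DataFrame(artifact_rows)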
The notebook now has the graph vocabulary needed for the rest of the causal-learn sequence. The next tutorial can start generating synthetic data because we now know exactly what the true graph means, how an algorithm might return a partially oriented version of it, and how to evaluate that output without flattening away uncertainty.