causal-learn Tutorial 12: Permutation-Based Methods: GRaSP And BOSS

Permutation-based causal discovery searches over variable orderings. Once an order is proposed, each variable can only choose parents that appear earlier in that order. This turns the graph search problem into an order-search problem: find a good ordering, then infer a sparse DAG consistent with that ordering.

This notebook introduces two causal-learn permutation-based methods:

  • GRaSP (Greedy Relaxations of the Sparsest Permutation), which improves an order through covered-edge relaxations.
  • BOSS (Best Order Score Search), which greedily moves variables to better positions in an order.

Both methods use local scores and order mutations to search for a graph. They are a useful bridge between exact score search and scalable greedy graph search: more structured than arbitrary edge edits, but much cheaper than enumerating all DAGs.

Expected runtime: usually under 1 minute on this teaching dataset. The stability loops run multiple random starts, but each run is small and fast.

Learning Goals

By the end of this notebook, you should be able to:

  • Explain why variable orderings are useful search objects for DAG discovery.
  • Understand how a fixed order restricts possible parent sets.
  • Run BOSS and GRaSP from causal-learn.
  • Compare permutation-based output with GES and exact search on a small graph.
  • Check random-start stability for order-based methods.
  • Study how GRaSP depth and sample size affect graph recovery.
  • Report permutation-based results without overstating what the search guarantees.

Notebook Flow

We start with order-search theory, then set up the environment and load the same linear Gaussian teaching dataset used by the GES and exact-search notebooks. After that we show what a fixed order implies, run BOSS and GRaSP, compare them with GES and exact search, and stress-test stability across seeds, GRaSP depths, and sample sizes.

The notebook uses the linear Gaussian dataset because the permutation methods here use BIC-style local scores. That keeps the score assumptions aligned with the data-generating mechanism.

Permutation-Based Discovery Theory

A DAG always has at least one topological order: an ordering of variables where every parent appears before its children. If we know the correct order, the graph search becomes much easier. For each variable, we only need to decide which earlier variables are its parents.

For example, suppose the order is:

need, intent, match, engagement, renewal, support

Then match may choose parents from need and intent, but it cannot choose renewal as a parent because renewal appears later. The order itself forbids backward edges.

Permutation-based algorithms exploit this fact. They search over orders and use local scores to decide which parents are useful within each order.
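The order-to-parents restriction fits in a few lines of plain Python (a toy sketch, not causal-learn code):

```python
# Toy illustration: under a fixed order, each variable's candidate
# parents are exactly the variables that appear earlier in the order.
order = ["need", "intent", "match", "engagement", "renewal", "support"]

def candidate_parents(order):
    """Map each variable to the set of variables that precede it."""
    return {v: set(order[:i]) for i, v in enumerate(order)}

cands = candidate_parents(order)
print(cands["match"])                # the earlier variables: need and intent
print("renewal" in cands["match"])   # False: the order forbids backward edges
```

Searching over orders therefore never has to rule out cycles explicitly; acyclicity comes for free from the order itself.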

From Orders To Parent Sets

For a fixed order, the best parent set for each node can be scored locally. With a decomposable score, the total graph score is:

\[ S(G) = \sum_i S_i(\mathrm{Pa}(X_i)) \]

The local score for X_i only depends on the parent set of X_i. Under an order, the candidate parent set must be a subset of variables that appear before X_i.

This is the key computational move. Instead of exploring arbitrary directed graphs, the algorithm asks: “if this order were true, which earlier variables would each node choose as parents?” Then it mutates the order to see if a better score is possible.
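A minimal sketch of that move for a single node, mirroring the BIC-from-covariance formula used in this notebook's Setup section; the simulated coefficients, seed, and variable indices here are illustrative:

```python
import itertools
import numpy as np

# Simulate x -> y with an unrelated bystander z, then score every candidate
# parent subset of y drawn from the variables earlier in the order (x, z).
rng = np.random.default_rng(0)
n = 1000
x = rng.normal(size=n)
z = rng.normal(size=n)
y = 0.8 * x + rng.normal(size=n)
data = np.column_stack([x, z, y])   # column order: x=0, z=1, y=2
cov = np.cov(data.T, ddof=0)

def local_bic(cov, n, child, parents, lambda_value=0.5):
    """Gaussian BIC-style local score computed from a covariance matrix."""
    sigma = float(cov[child, child])
    if parents:
        yX = cov[np.ix_([child], parents)]
        XX = cov[np.ix_(parents, parents)]
        sigma = float(cov[child, child] - (yX @ np.linalg.inv(XX) @ yX.T).item())
    return -0.5 * n * (1 + np.log(max(sigma, 1e-12))) - lambda_value * (len(parents) + 1) * np.log(n)

# y sits last in the order, so its candidate parents are x (0) and z (1).
subsets = [list(s) for size in range(3) for s in itertools.combinations([0, 1], size)]
best = max(subsets, key=lambda s: local_bic(cov, n, 2, s))
print(best)   # x is selected; the penalty typically keeps the bystander z out
```

The penalty term is what keeps the selected parent sets sparse: adding z would raise the likelihood slightly, but not enough to pay the per-parameter cost.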

Sparsest Permutation Intuition

The sparsest permutation idea says that the right causal order should allow a sparse graph that explains the conditional dependence structure. If the order is wrong, the algorithm may need extra edges or awkward orientations to explain the same data.

This does not mean the sparsest graph is always the true graph. Finite samples, weak effects, hidden variables, and score mismatch can all distort sparsity. But the idea is powerful: a good order should make parent selection simple.

GRaSP searches for better permutations by relaxing and mutating orders. BOSS searches by moving variables through an order when doing so improves the score.
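To make the sparsest-permutation idea concrete, here is a brute-force order search on a three-variable chain. This is a teaching sketch with an invented simulated chain; real methods such as BOSS and GRaSP mutate orders rather than enumerating all of them:

```python
import itertools
import numpy as np

# Simulate a chain x -> y -> z, then score every order of the three variables:
# for each order, each node greedily picks its best-scoring earlier parents.
rng = np.random.default_rng(1)
n = 2000
data = {}
data["x"] = rng.normal(size=n)
data["y"] = 0.9 * data["x"] + rng.normal(size=n)
data["z"] = 0.9 * data["y"] + rng.normal(size=n)

def local_bic(child, parents, lambda_value=0.5):
    """BIC-style local score via least-squares residual variance."""
    resid = data[child]
    if parents:
        X = np.column_stack([data[p] for p in parents])
        beta, *_ = np.linalg.lstsq(X, data[child], rcond=None)
        resid = data[child] - X @ beta
    sigma = max(float(np.mean(resid ** 2)), 1e-12)
    return -0.5 * n * (1 + np.log(sigma)) - lambda_value * (len(parents) + 1) * np.log(n)

def order_score(order):
    total = 0.0
    for i, child in enumerate(order):
        earlier = order[:i]
        subsets = [s for size in range(len(earlier) + 1) for s in itertools.combinations(earlier, size)]
        total += max(local_bic(child, list(s)) for s in subsets)
    return total

best = max(itertools.permutations(["x", "y", "z"]), key=order_score)
print(best)
```

For this chain, any order consistent with a DAG in the chain's Markov equivalence class (for example x, y, z or z, y, x) ties at the top score, while orders that force an extra edge pay an additional BIC penalty. That tie is the sparsest-permutation idea in miniature: good orders are the ones that admit a sparse explanation.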

BOSS Versus GRaSP

BOSS and GRaSP are both order-based, but they move through order space differently.

BOSS repeatedly considers moving variables to better positions in the order. After an order is updated, parent sets are recomputed from local scores. It is a greedy best-order search.

GRaSP starts with an order and uses covered-edge style relaxations, called tucks in the implementation, to improve the sparsest permutation. The depth argument controls how deep the relaxation search can go. Larger depth can escape some shallow local traps, but it can also cost more time.

Both methods can depend on random starts or random traversal choices. That is why stability across seeds is part of this notebook.

What Permutation Methods Can And Cannot Claim

Permutation methods can find high-scoring DAGs and useful causal orders under score assumptions. They are especially helpful when order structure is easier to search than arbitrary graph structure.

They cannot prove causal truth from score optimization alone. They still depend on observed variables, score choice, sample size, model class, and absence of major hidden confounding. A high-scoring order is evidence under a model, not a final causal proof.

A careful report should include the score function, penalty, random seeds or restarts, depth settings for GRaSP, runtime, stability, and comparison to other methods.
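As a concrete template, a run-metadata record for such a report might look like this; the field names are illustrative, not a causal-learn API:

```python
# Illustrative reporting record for a permutation-based run.
report = {
    "method": "GRaSP",
    "score_function": "local_score_BIC_from_cov",
    "penalty_lambda": 0.5,
    "grasp_depth": 4,
    "random_seeds": list(range(10)),
    "runtime_seconds": None,             # fill in from measured wall time
    "unique_graphs_across_seeds": None,  # fill in from the stability loop
    "compared_against": ["GES", "exact BIC search"],
}
print(sorted(report))
```

Keeping a record like this alongside the learned graph makes the later stability and sensitivity sections auditable.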

Setup

This cell imports the scientific Python stack, BOSS, GRaSP, GES, exact BIC search, and plotting utilities. It also applies a compatibility wrapper for the BIC score so that it behaves correctly under recent NumPy scalar-conversion rules. The wrapper preserves the same scoring formula while safely converting one-element matrix results into Python scalars.

from pathlib import Path
from importlib.metadata import PackageNotFoundError, version
import contextlib
import io
import itertools
import os
import random
import time
import warnings

# Resolve paths before importing matplotlib so the cache directory stays writable.
CWD = Path.cwd()
if CWD.name == "causal_learn" and (CWD / "outputs").exists():
    NOTEBOOK_DIR = CWD
else:
    NOTEBOOK_DIR = (CWD / "notebooks" / "tutorials" / "causal_learn").resolve()

OUTPUT_DIR = NOTEBOOK_DIR / "outputs"
DATASET_DIR = OUTPUT_DIR / "datasets"
TABLE_DIR = OUTPUT_DIR / "tables"
FIGURE_DIR = OUTPUT_DIR / "figures"
REPORT_DIR = OUTPUT_DIR / "reports"
for directory in [OUTPUT_DIR, DATASET_DIR, TABLE_DIR, FIGURE_DIR, REPORT_DIR, OUTPUT_DIR / "matplotlib_cache"]:
    directory.mkdir(parents=True, exist_ok=True)

os.environ.setdefault("MPLCONFIGDIR", str(OUTPUT_DIR / "matplotlib_cache"))
warnings.filterwarnings("ignore", category=FutureWarning)
warnings.filterwarnings("ignore", message="Using 'local_score_BIC_from_cov' instead for efficiency")

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
from IPython.display import display
from matplotlib.patches import FancyArrowPatch, FancyBboxPatch

import causallearn.search.PermutationBased.BOSS as boss_module
import causallearn.search.PermutationBased.GRaSP as grasp_module
import causallearn.search.ScoreBased.GES as ges_module
from causallearn.search.PermutationBased.BOSS import boss
from causallearn.search.PermutationBased.GRaSP import grasp
from causallearn.search.ScoreBased.GES import ges
from causallearn.search.ScoreBased.ExactSearch import bic_exact_search

NOTEBOOK_PREFIX = "12"
sns.set_theme(style="whitegrid", context="notebook")
plt.rcParams["figure.dpi"] = 120
plt.rcParams["savefig.facecolor"] = "white"


def local_score_BIC_from_cov(Data, i, PAi, parameters=None):
    """Compatibility wrapper for causal-learn BIC-from-cov scoring under recent NumPy scalar behavior."""
    cov, n = Data
    lambda_value = 0.5 if parameters is None else parameters.get("lambda_value", 0.5)
    sigma = float(cov[i, i])
    if len(PAi) > 0:
        yX = cov[np.ix_([i], PAi)]
        XX = cov[np.ix_(PAi, PAi)]
        try:
            XX_inv = np.linalg.inv(XX)
        except np.linalg.LinAlgError:
            XX_inv = np.linalg.pinv(XX)
        sigma = float(cov[i, i] - (yX @ XX_inv @ yX.T).item())
    sigma = max(sigma, 1e-12)
    return -0.5 * n * (1 + np.log(sigma)) - lambda_value * (len(PAi) + 1) * np.log(n)

# Patch the modules that build LocalScoreClass objects during this notebook session.
boss_module.local_score_BIC_from_cov = local_score_BIC_from_cov
grasp_module.local_score_BIC_from_cov = local_score_BIC_from_cov
ges_module.local_score_BIC_from_cov = local_score_BIC_from_cov

packages = ["causal-learn", "numpy", "pandas", "matplotlib", "seaborn"]
version_rows = []
for package in packages:
    try:
        package_version = version(package)
    except PackageNotFoundError:
        package_version = "not installed"
    version_rows.append({"package": package, "version": package_version})

package_versions = pd.DataFrame(version_rows)
package_versions.to_csv(TABLE_DIR / f"{NOTEBOOK_PREFIX}_package_versions.csv", index=False)
display(package_versions)
package version
0 causal-learn 0.1.4.5
1 numpy 2.4.4
2 pandas 3.0.2
3 matplotlib 3.10.9
4 seaborn 0.13.2

The setup confirms the environment and patches the local score function before any permutation method is run. BOSS, GRaSP, and GES will all use the same safe BIC wrapper in this notebook.

Load The Linear Gaussian Teaching Data

We use the linear Gaussian dataset from the synthetic data tutorial because it matches the BIC score assumptions well. The known true graph lets us measure recovery rather than judging graphs by appearance alone.

linear_path = DATASET_DIR / "02_linear_gaussian.csv"
truth_path = TABLE_DIR / "02_base_true_dag_edges.csv"

linear_df = pd.read_csv(linear_path)
true_edges = pd.read_csv(truth_path)
VARIABLES = list(linear_df.columns)

loaded_summary = pd.DataFrame(
    [
        {
            "artifact": "linear Gaussian data",
            "path": str(linear_path.relative_to(NOTEBOOK_DIR)),
            "rows": len(linear_df),
            "columns": linear_df.shape[1],
        },
        {
            "artifact": "true edge table",
            "path": str(truth_path.relative_to(NOTEBOOK_DIR)),
            "rows": len(true_edges),
            "columns": true_edges.shape[1],
        },
    ]
)
loaded_summary.to_csv(TABLE_DIR / f"{NOTEBOOK_PREFIX}_loaded_dataset_summary.csv", index=False)

display(loaded_summary)
display(linear_df.head())
display(true_edges)
artifact path rows columns
0 linear Gaussian data outputs/datasets/02_linear_gaussian.csv 2500 6
1 true edge table outputs/tables/02_base_true_dag_edges.csv 6 4
need intent match engagement renewal support
0 0.249820 -0.372094 0.060245 0.667197 0.252766 -0.280058
1 0.683671 -0.210471 0.904969 1.004727 0.320095 -0.332215
2 -0.579752 -1.202671 -0.578579 -0.235444 -0.732431 0.594102
3 -0.902823 -0.077309 -0.771219 -0.531128 -0.105721 -1.503551
4 -1.985745 0.087297 -0.691315 -1.281731 -0.797906 -0.328219
source target edge_type mechanism
0 need match directed Need changes what a good match means.
1 intent match directed Current intent changes recommendation relevance.
2 match engagement directed Better matching increases engagement depth.
3 intent renewal directed Intent directly affects later value.
4 engagement renewal directed Engagement contributes to renewal value.
5 engagement support directed Engagement creates more chances for support co...

The data has six observed variables. That is small enough to compare BOSS, GRaSP, GES, and exact search directly without turning the notebook into a runtime exercise.
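To see why six variables are comfortable here, compare the size of the order space with the size of the DAG space, using Robinson's recurrence for counting labeled DAGs:

```python
from math import comb, factorial

# Robinson's recurrence for the number of labeled DAGs on n nodes:
# a(n) = sum_{k=1..n} (-1)^(k+1) * C(n,k) * 2^(k*(n-k)) * a(n-k), a(0) = 1.
def count_dags(n):
    a = [1]
    for m in range(1, n + 1):
        a.append(sum((-1) ** (k + 1) * comb(m, k) * 2 ** (k * (m - k)) * a[m - k]
                     for k in range(1, m + 1)))
    return a[n]

print(factorial(6))    # 720 variable orders
print(count_dags(6))   # 3781503 labeled DAGs on six nodes
```

Order search explores a space of 720 permutations instead of millions of DAGs, which is part of why these methods stay fast on small problems.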

Helper Functions

These helpers convert causal-learn graph objects into tidy edge tables, compute recovery metrics, score fixed orders, and draw graphs in the same visual style used throughout the tutorial.

def parse_causallearn_edge(edge):
    """Convert a causal-learn edge object into source, endpoint pattern, and target strings."""
    parts = str(edge).strip().split()
    if len(parts) != 3:
        return {"source": str(edge), "edge_type": "unknown", "target": "unknown"}
    return {"source": parts[0], "edge_type": parts[1], "target": parts[2]}


def graph_to_edge_table(graph, label):
    """Return a tidy edge table from a causal-learn graph object."""
    rows = [parse_causallearn_edge(edge) for edge in graph.get_graph_edges()]
    edge_df = pd.DataFrame(rows, columns=["source", "edge_type", "target"])
    if edge_df.empty:
        edge_df = pd.DataFrame(columns=["source", "edge_type", "target"])
    edge_df.insert(0, "run", label)
    return edge_df


def adjacency_to_edge_table(adjacency, variables, label):
    """Convert a parent-row, child-column adjacency matrix into a tidy directed edge table."""
    rows = []
    for parent_idx, child_idx in zip(*np.where(adjacency > 0)):
        rows.append(
            {
                "run": label,
                "source": variables[int(parent_idx)],
                "edge_type": "-->",
                "target": variables[int(child_idx)],
            }
        )
    return pd.DataFrame(rows, columns=["run", "source", "edge_type", "target"])


def directed_pairs(edge_df):
    """Extract definite directed pairs from an edge table."""
    pairs = set()
    for row in edge_df.itertuples(index=False):
        if row.edge_type == "-->":
            pairs.add((row.source, row.target))
        elif row.edge_type == "<--":
            pairs.add((row.target, row.source))
    return pairs


def skeleton_pairs(edge_df):
    """Extract adjacencies while ignoring direction."""
    pairs = set()
    for row in edge_df.itertuples(index=False):
        if row.target != "unknown":
            pairs.add(frozenset([row.source, row.target]))
    return pairs


def summarize_against_truth(edge_df, truth_df, label):
    """Compute graph-recovery metrics against a directed truth table."""
    true_directed = set(zip(truth_df["source"], truth_df["target"]))
    true_skeleton = {frozenset(edge) for edge in true_directed}
    learned_directed = directed_pairs(edge_df)
    learned_skeleton = skeleton_pairs(edge_df)

    correct_directed = learned_directed & true_directed
    reversed_true = {(src, dst) for src, dst in true_directed if (dst, src) in learned_directed}
    missing_skeleton = true_skeleton - learned_skeleton
    extra_skeleton = learned_skeleton - true_skeleton

    unresolved_true = 0
    for src, dst in true_directed:
        pair = frozenset([src, dst])
        if pair in learned_skeleton and (src, dst) not in learned_directed and (dst, src) not in learned_directed:
            unresolved_true += 1

    directed_count = len(learned_directed)
    return pd.DataFrame(
        [
            {
                "run": label,
                "learned_edges_total": len(edge_df),
                "definite_directed_edges": directed_count,
                "true_edges": len(true_directed),
                "correct_directed_edges": len(correct_directed),
                "directed_precision": len(correct_directed) / directed_count if directed_count else np.nan,
                "directed_recall": len(correct_directed) / len(true_directed) if true_directed else np.nan,
                "reversed_true_edges": len(reversed_true),
                "unresolved_true_adjacencies": unresolved_true,
                "missing_true_adjacencies": len(missing_skeleton),
                "extra_adjacencies": len(extra_skeleton),
            }
        ]
    )


def run_boss(data_df, label, seed=1, lambda_value=0.5):
    """Run BOSS with a controlled random seed."""
    random.seed(seed)
    np.random.seed(seed)
    start = time.perf_counter()
    buffer = io.StringIO()
    with contextlib.redirect_stdout(buffer):
        graph = boss(
            data_df.to_numpy(),
            score_func="local_score_BIC_from_cov",
            parameters={"lambda_value": lambda_value},
            verbose=False,
            node_names=list(data_df.columns),
        )
    elapsed = time.perf_counter() - start
    edge_table = graph_to_edge_table(graph, label=label)
    return graph, edge_table, elapsed


def run_grasp(data_df, label, seed=1, depth=4, lambda_value=0.5):
    """Run GRaSP with a controlled random seed and chosen depth."""
    random.seed(seed)
    np.random.seed(seed)
    start = time.perf_counter()
    buffer = io.StringIO()
    with contextlib.redirect_stdout(buffer):
        graph = grasp(
            data_df.to_numpy(),
            score_func="local_score_BIC_from_cov",
            depth=depth,
            parameters={"lambda_value": lambda_value},
            verbose=False,
            node_names=list(data_df.columns),
        )
    elapsed = time.perf_counter() - start
    edge_table = graph_to_edge_table(graph, label=label)
    return graph, edge_table, elapsed


def run_ges_bic(data_df, label, lambda_value=0.5):
    """Run BIC-GES quietly for comparison."""
    start = time.perf_counter()
    buffer = io.StringIO()
    with contextlib.redirect_stdout(buffer):
        record = ges(
            data_df.to_numpy(),
            score_func="local_score_BIC",
            node_names=list(data_df.columns),
            lambda_value=lambda_value,
        )
    elapsed = time.perf_counter() - start
    edge_table = graph_to_edge_table(record["G"], label=label)
    return record, edge_table, elapsed


def run_exact_bic(data_df, label):
    """Run exact BIC search with A* for the small teaching graph."""
    start = time.perf_counter()
    adjacency, stats = bic_exact_search(data_df.to_numpy(), search_method="astar", verbose=False)
    elapsed = time.perf_counter() - start
    edge_table = adjacency_to_edge_table(adjacency, list(data_df.columns), label=label)
    return adjacency, stats, edge_table, elapsed


def best_parent_sets_for_order(data_df, order, lambda_value=0.5, max_parents=None):
    """Exhaustively choose the best parent set for each node given a fixed order."""
    data = data_df.to_numpy()
    cov = np.cov(data.T, ddof=0)
    n = data.shape[0]
    index = {name: i for i, name in enumerate(data_df.columns)}
    rows = []
    edge_rows = []
    total_score = 0.0

    for position, child in enumerate(order):
        child_idx = index[child]
        candidate_parents = list(order[:position])
        parent_limit = len(candidate_parents) if max_parents is None else min(max_parents, len(candidate_parents))
        best_score = -np.inf
        best_parents = []
        for size in range(parent_limit + 1):
            for parent_names in itertools.combinations(candidate_parents, size):
                parent_indices = [index[name] for name in parent_names]
                score = local_score_BIC_from_cov((cov, n), child_idx, parent_indices, {"lambda_value": lambda_value})
                if score > best_score:
                    best_score = score
                    best_parents = list(parent_names)
        total_score += best_score
        rows.append(
            {
                "order_label": " -> ".join(order),
                "child": child,
                "position": position + 1,
                "allowed_parent_count": len(candidate_parents),
                "selected_parents": ", ".join(best_parents) if best_parents else "none",
                "local_score": best_score,
            }
        )
        for parent in best_parents:
            edge_rows.append({"run": "fixed_order", "source": parent, "edge_type": "-->", "target": child})

    parent_table = pd.DataFrame(rows)
    edge_table = pd.DataFrame(edge_rows, columns=["run", "source", "edge_type", "target"])
    return parent_table, edge_table, total_score


GRAPH_POSITIONS = {
    "need": (0.11, 0.72),
    "intent": (0.11, 0.28),
    "match": (0.39, 0.50),
    "engagement": (0.62, 0.50),
    "renewal": (0.89, 0.72),
    "support": (0.89, 0.28),
}
NODE_LABELS = {name: name.title() for name in GRAPH_POSITIONS}
NODE_COLORS = {
    "need": "#e0f2fe",
    "intent": "#dbeafe",
    "match": "#ecfccb",
    "engagement": "#fef3c7",
    "renewal": "#fee2e2",
    "support": "#f3e8ff",
}


def trim_edge_to_box(start, end, box_w=0.145, box_h=0.095, gap=0.012):
    """Return edge endpoints that stop just outside source and target boxes."""
    x0, y0 = start
    x1, y1 = end
    dx = x1 - x0
    dy = y1 - y0
    length = float(np.hypot(dx, dy))
    if length == 0:
        return start, end
    effective_w = box_w + 0.04
    effective_h = box_h + 0.04
    x_limit = (effective_w / 2) / abs(dx) if dx else np.inf
    y_limit = (effective_h / 2) / abs(dy) if dy else np.inf
    t = min(x_limit, y_limit) + gap / length
    return (x0 + dx * t, y0 + dy * t), (x1 - dx * t, y1 - dy * t)


def draw_box_graph(edge_df, title, path, note=None):
    """Draw a DAG-style graph with rounded boxes and visible arrowheads."""
    fig, ax = plt.subplots(figsize=(12, 6.2))
    ax.set_axis_off()
    ax.set_xlim(-0.02, 1.02)
    ax.set_ylim(0.04, 0.96)
    box_w, box_h = 0.145, 0.095

    for row in edge_df.itertuples(index=False):
        if row.source not in GRAPH_POSITIONS or row.target not in GRAPH_POSITIONS:
            continue
        raw_start = GRAPH_POSITIONS[row.source]
        raw_end = GRAPH_POSITIONS[row.target]
        if row.edge_type == "<--":
            raw_start, raw_end = raw_end, raw_start
        start, end = trim_edge_to_box(raw_start, raw_end, box_w=box_w, box_h=box_h)
        if row.edge_type in {"-->", "<--"}:
            arrowstyle = "-|>"
            mutation_scale = 18
            linewidth = 1.8
            color = "#334155"
        else:
            arrowstyle = "-"
            mutation_scale = 1
            linewidth = 1.5
            color = "#64748b"
        arrow = FancyArrowPatch(
            start,
            end,
            arrowstyle=arrowstyle,
            mutation_scale=mutation_scale,
            linewidth=linewidth,
            color=color,
            connectionstyle="arc3,rad=0.035",
            zorder=2,
        )
        ax.add_patch(arrow)

    for node, (x, y) in GRAPH_POSITIONS.items():
        rect = FancyBboxPatch(
            (x - box_w / 2, y - box_h / 2),
            box_w,
            box_h,
            boxstyle="round,pad=0.018",
            facecolor=NODE_COLORS[node],
            edgecolor="#1f2937",
            linewidth=1.1,
            zorder=5,
        )
        ax.add_patch(rect)
        ax.text(x, y, NODE_LABELS[node], ha="center", va="center", fontsize=10.5, fontweight="bold", zorder=6)

    if note:
        ax.text(0.50, 0.08, note, ha="center", va="center", fontsize=10, color="#475569")
    ax.set_title(title, pad=18, fontsize=14, fontweight="bold")
    fig.savefig(path, dpi=160, bbox_inches="tight")
    plt.show()


def truth_as_edge_table(truth_df, label="truth"):
    """Convert a truth table into a plotting-ready edge table."""
    return truth_df.assign(run=label, edge_type="-->")[["run", "source", "edge_type", "target"]]

The helper set includes a small fixed-order scorer. That scorer is not a replacement for BOSS or GRaSP; it is a teaching device for seeing how a proposed order restricts parent choices.

Draw The True DAG

Before running order-based methods, we draw the known data-generating DAG. This is the target used in the recovery metrics.

true_plot_edges = truth_as_edge_table(true_edges)
draw_box_graph(
    true_plot_edges,
    title="True DAG For The Permutation-Based Tutorial",
    path=FIGURE_DIR / f"{NOTEBOOK_PREFIX}_true_dag.png",
    note="The teaching data is generated from this six-edge linear Gaussian DAG.",
)

The true graph has two root variables, a mediator chain, and two downstream outcomes. A good order should place causes before effects so parent selection can recover these edges.
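That requirement is easy to check mechanically. A small sketch using the six true edges from the table above:

```python
# Verify that a proposed causal order is a topological order of the true DAG:
# every true edge must point from an earlier position to a later one.
true_edge_list = [
    ("need", "match"), ("intent", "match"), ("match", "engagement"),
    ("intent", "renewal"), ("engagement", "renewal"), ("engagement", "support"),
]
order = ["need", "intent", "match", "engagement", "renewal", "support"]
position = {name: i for i, name in enumerate(order)}

is_topological = all(position[src] < position[dst] for src, dst in true_edge_list)
print(is_topological)   # True: every cause precedes its effects in this order
```

A DAG usually has several valid topological orders; any of them would let the fixed-order scorer below recover the true parent sets.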

Fixed-Order Parent Selection Demo

This cell shows why order matters. We compare a plausible causal order with a deliberately reversed order. For each order, the code finds the best local BIC parent set for every node using only variables that appear earlier in the order.

causal_order = ["need", "intent", "match", "engagement", "renewal", "support"]
reversed_order = list(reversed(causal_order))

fixed_order_rows = []
fixed_order_edge_tables = []
for order_name, order in [("causal_order", causal_order), ("reversed_order", reversed_order)]:
    parent_table, edge_table, total_score = best_parent_sets_for_order(
        linear_df,
        order,
        lambda_value=0.5,
        max_parents=3,
    )
    parent_table["order_name"] = order_name
    parent_table["total_order_score"] = total_score
    edge_table["run"] = order_name
    fixed_order_rows.append(parent_table)
    fixed_order_edge_tables.append(edge_table)

fixed_order_parent_sets = pd.concat(fixed_order_rows, ignore_index=True)
fixed_order_edges = pd.concat(fixed_order_edge_tables, ignore_index=True)
fixed_order_metrics = pd.concat(
    [
        summarize_against_truth(fixed_order_edges[fixed_order_edges["run"] == name], true_edges, name)
        for name in ["causal_order", "reversed_order"]
    ],
    ignore_index=True,
)
order_score_summary = fixed_order_parent_sets.groupby("order_name", as_index=False).agg(
    total_order_score=("total_order_score", "first"),
    selected_parent_links=("selected_parents", lambda values: sum(value != "none" for value in values)),
)

fixed_order_parent_sets.to_csv(TABLE_DIR / f"{NOTEBOOK_PREFIX}_fixed_order_parent_sets.csv", index=False)
fixed_order_edges.to_csv(TABLE_DIR / f"{NOTEBOOK_PREFIX}_fixed_order_edges.csv", index=False)
fixed_order_metrics.to_csv(TABLE_DIR / f"{NOTEBOOK_PREFIX}_fixed_order_metrics.csv", index=False)
order_score_summary.to_csv(TABLE_DIR / f"{NOTEBOOK_PREFIX}_fixed_order_score_summary.csv", index=False)

display(order_score_summary.round(4))
display(fixed_order_parent_sets[["order_name", "child", "position", "selected_parents", "local_score"]].round(4))
display(fixed_order_metrics.round(4))
order_name total_order_score selected_parent_links
0 causal_order -2547.9957 4
1 reversed_order -2562.6663 5
order_name child position selected_parents local_score
0 causal_order need 1 none -1253.9120
1 causal_order intent 2 none -1253.9120
2 causal_order match 3 need, intent 504.3279
3 causal_order engagement 4 match 142.9078
4 causal_order renewal 5 intent, engagement 53.2493
5 causal_order support 6 engagement -740.6566
6 reversed_order support 1 none -1253.9120
7 reversed_order renewal 2 support -1058.9669
8 reversed_order engagement 3 support, renewal -197.2864
9 reversed_order match 4 renewal, engagement 234.5766
10 reversed_order intent 5 renewal, engagement, match -137.2946
11 reversed_order need 6 match, intent -149.7830
run learned_edges_total definite_directed_edges true_edges correct_directed_edges directed_precision directed_recall reversed_true_edges unresolved_true_adjacencies missing_true_adjacencies extra_adjacencies
0 causal_order 6 6 6 6 1.0 1.0 0 0 0 0
1 reversed_order 10 10 6 0 0.0 0.0 6 0 0 4

As the tables show, the causal order scores better and recovers the directed structure exactly. The reversed order can still fit the same dependencies, but only by reversing every true edge and adding extra adjacencies, which costs parameters and lowers the total score.

Run BOSS And GRaSP

Now we run the actual permutation-based algorithms. The baseline uses the same BIC penalty as the GES comparison later and fixes the random seed so the notebook is reproducible.

BASELINE_SEED = 1
LAMBDA_VALUE = 0.5
GRASP_DEPTH = 4

boss_graph, boss_edges, boss_elapsed = run_boss(
    linear_df,
    label="BOSS",
    seed=BASELINE_SEED,
    lambda_value=LAMBDA_VALUE,
)
grasp_graph, grasp_edges, grasp_elapsed = run_grasp(
    linear_df,
    label="GRaSP_depth_4",
    seed=BASELINE_SEED,
    depth=GRASP_DEPTH,
    lambda_value=LAMBDA_VALUE,
)

permutation_edges = pd.concat([boss_edges, grasp_edges], ignore_index=True)
permutation_metrics = pd.concat(
    [
        summarize_against_truth(boss_edges, true_edges, "BOSS"),
        summarize_against_truth(grasp_edges, true_edges, "GRaSP_depth_4"),
    ],
    ignore_index=True,
)
permutation_metrics["elapsed_seconds"] = [boss_elapsed, grasp_elapsed]
permutation_metrics["seed"] = BASELINE_SEED
permutation_metrics["lambda_value"] = LAMBDA_VALUE

permutation_edges.to_csv(TABLE_DIR / f"{NOTEBOOK_PREFIX}_boss_grasp_edges.csv", index=False)
permutation_metrics.to_csv(TABLE_DIR / f"{NOTEBOOK_PREFIX}_boss_grasp_metrics.csv", index=False)

display(permutation_edges)
display(permutation_metrics.round(4))
run source edge_type target
0 BOSS need --> match
1 BOSS intent --> match
2 BOSS intent --> renewal
3 BOSS match --> engagement
4 BOSS engagement --> renewal
5 BOSS engagement --> support
6 GRaSP_depth_4 need --> match
7 GRaSP_depth_4 intent --> match
8 GRaSP_depth_4 intent --> renewal
9 GRaSP_depth_4 match --> engagement
10 GRaSP_depth_4 engagement --> renewal
11 GRaSP_depth_4 engagement --> support
run learned_edges_total definite_directed_edges true_edges correct_directed_edges directed_precision directed_recall reversed_true_edges unresolved_true_adjacencies missing_true_adjacencies extra_adjacencies elapsed_seconds seed lambda_value
0 BOSS 6 6 6 6 1.0 1.0 0 0 0 0 0.0099 1 0.5
1 GRaSP_depth_4 6 6 6 6 1.0 1.0 0 0 0 0 0.0071 1 0.5

On this friendly six-variable dataset, both methods should recover the same core graph. That is reassuring, but the stability sections below are still necessary because order-search algorithms can depend on random starts and search settings.

Draw The BOSS Graph

The BOSS graph is drawn first because BOSS is a clean baseline for best-order score search in this notebook.

draw_box_graph(
    boss_edges,
    title="BOSS Learned Graph",
    path=FIGURE_DIR / f"{NOTEBOOK_PREFIX}_boss_graph.png",
    note="BOSS searches over variable orders and derives parent sets consistent with the selected order.",
)

The graph layout matches the true DAG layout so differences are easy to see. For this baseline seed and dataset, the learned graph should be visually aligned with the true structure.

Plot Method Comparison

To put the permutation methods in context, this cell also runs BIC-GES and exact BIC search on the same data, combines the recovery metrics into a single method_metrics table, and plots directed correctness, reversed edges, unresolved true adjacencies, and extra adjacencies.

ges_record, ges_edges, ges_elapsed = run_ges_bic(linear_df, label="GES_BIC", lambda_value=LAMBDA_VALUE)
exact_adjacency, exact_stats, exact_edges, exact_elapsed = run_exact_bic(linear_df, label="Exact_BIC")

method_metrics = pd.concat(
    [
        permutation_metrics.assign(search_family="permutation"),
        summarize_against_truth(ges_edges, true_edges, "GES_BIC").assign(search_family="greedy graph"),
        summarize_against_truth(exact_edges, true_edges, "Exact_BIC").assign(search_family="exact"),
    ],
    ignore_index=True,
)
method_metrics.to_csv(TABLE_DIR / f"{NOTEBOOK_PREFIX}_method_comparison_metrics.csv", index=False)

method_plot_df = method_metrics.melt(
    id_vars=["run", "search_family"],
    value_vars=["correct_directed_edges", "reversed_true_edges", "unresolved_true_adjacencies", "extra_adjacencies"],
    var_name="metric",
    value_name="count",
)

fig, ax = plt.subplots(figsize=(11, 5.4))
sns.barplot(data=method_plot_df, x="metric", y="count", hue="run", ax=ax)
ax.set_title("Permutation Methods Compared With GES And Exact Search")
ax.set_xlabel("Metric")
ax.set_ylabel("Count")
ax.tick_params(axis="x", rotation=20)
ax.legend(title="Method", bbox_to_anchor=(1.02, 1), loc="upper left")
plt.tight_layout()
fig.savefig(FIGURE_DIR / f"{NOTEBOOK_PREFIX}_method_comparison_metrics.png", dpi=160, bbox_inches="tight")
plt.show()

The plot should show that all methods perform well in this controlled setting. The later sensitivity sections are where differences become more visible.

Random-Seed Stability

BOSS and GRaSP use random order traversal or random starting choices. This cell repeats each method across several seeds and records whether the recovered graph changes.

stability_rows = []
stability_edges = []
for seed in range(10):
    for method_name in ["BOSS", "GRaSP_depth_4"]:
        if method_name == "BOSS":
            _, edge_table, elapsed = run_boss(linear_df, label=f"BOSS_seed_{seed}", seed=seed, lambda_value=LAMBDA_VALUE)
        else:
            _, edge_table, elapsed = run_grasp(linear_df, label=f"GRaSP_seed_{seed}", seed=seed, depth=GRASP_DEPTH, lambda_value=LAMBDA_VALUE)
        metrics = summarize_against_truth(edge_table, true_edges, label=f"{method_name}_seed_{seed}")
        metrics["method"] = method_name
        metrics["seed"] = seed
        metrics["elapsed_seconds"] = elapsed
        edge_signature = " | ".join(sorted(f"{row.source} {row.edge_type} {row.target}" for row in edge_table.itertuples(index=False)))
        metrics["edge_signature"] = edge_signature
        stability_rows.append(metrics)
        edge_table["method"] = method_name
        edge_table["seed"] = seed
        stability_edges.append(edge_table)

seed_stability_metrics = pd.concat(stability_rows, ignore_index=True)
seed_stability_edges = pd.concat(stability_edges, ignore_index=True)
seed_stability_summary = seed_stability_metrics.groupby("method", as_index=False).agg(
    runs=("seed", "count"),
    unique_graphs=("edge_signature", "nunique"),
    mean_correct_directed_edges=("correct_directed_edges", "mean"),
    min_correct_directed_edges=("correct_directed_edges", "min"),
    max_extra_adjacencies=("extra_adjacencies", "max"),
    median_elapsed_seconds=("elapsed_seconds", "median"),
)

seed_stability_metrics.to_csv(TABLE_DIR / f"{NOTEBOOK_PREFIX}_seed_stability_metrics.csv", index=False)
seed_stability_edges.to_csv(TABLE_DIR / f"{NOTEBOOK_PREFIX}_seed_stability_edges.csv", index=False)
seed_stability_summary.to_csv(TABLE_DIR / f"{NOTEBOOK_PREFIX}_seed_stability_summary.csv", index=False)

display(seed_stability_summary.round(4))
display(seed_stability_metrics[["method", "seed", "correct_directed_edges", "unresolved_true_adjacencies", "extra_adjacencies", "elapsed_seconds"]].round(4))
method runs unique_graphs mean_correct_directed_edges min_correct_directed_edges max_extra_adjacencies median_elapsed_seconds
0 BOSS 10 1 6.0 6 0 0.0088
1 GRaSP_depth_4 10 1 6.0 6 0 0.0076
method seed correct_directed_edges unresolved_true_adjacencies extra_adjacencies elapsed_seconds
0 BOSS 0 6 0 0 0.0087
1 GRaSP_depth_4 0 6 0 0 0.0085
2 BOSS 1 6 0 0 0.0094
3 GRaSP_depth_4 1 6 0 0 0.0077
4 BOSS 2 6 0 0 0.0099
5 GRaSP_depth_4 2 6 0 0 0.0073
6 BOSS 3 6 0 0 0.0087
7 GRaSP_depth_4 3 6 0 0 0.0058
8 BOSS 4 6 0 0 0.0088
9 GRaSP_depth_4 4 6 0 0 0.0091
10 BOSS 5 6 0 0 0.0088
11 GRaSP_depth_4 5 6 0 0 0.0059
12 BOSS 6 6 0 0 0.0083
13 GRaSP_depth_4 6 6 0 0 0.0076
14 BOSS 7 6 0 0 0.0086
15 GRaSP_depth_4 7 6 0 0 0.0093
16 BOSS 8 6 0 0 0.0139
17 GRaSP_depth_4 8 6 0 0 0.0090
18 BOSS 9 6 0 0 0.0088
19 GRaSP_depth_4 9 6 0 0 0.0066

The stability summary is the practical guardrail. If a method returns several distinct graphs across seeds, the report should mention that sensitivity rather than presenting one seed as definitive.
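That guardrail can be automated with a small check: flag any method whose runs produced more than one distinct edge signature. The helper below is an illustrative sketch, not part of causal-learn; it assumes a summary frame with the same `method` and `unique_graphs` columns computed above, demonstrated here on a toy frame.

```python
import pandas as pd

def flag_unstable_methods(summary: pd.DataFrame) -> list:
    """Return method names whose runs produced more than one distinct graph."""
    unstable = summary.loc[summary["unique_graphs"] > 1, "method"]
    return sorted(unstable)

# Toy frame mimicking the relevant columns of seed_stability_summary.
toy_summary = pd.DataFrame(
    {"method": ["BOSS", "GRaSP_depth_4"], "unique_graphs": [1, 3]}
)
print(flag_unstable_methods(toy_summary))  # only the method with 3 distinct graphs is flagged
```

Applying it to `seed_stability_summary` would return an empty list here, which is exactly the condition that justifies reporting a single graph per method.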

Plot Seed Stability

This plot shows recovery variation across random seeds. For a stable method and an easy dataset, the points should cluster tightly.

fig, ax = plt.subplots(figsize=(9, 5))
sns.stripplot(
    data=seed_stability_metrics,
    x="method",
    y="correct_directed_edges",
    color="#334155",
    size=7,
    alpha=0.75,
    ax=ax,
)
ax.set_title("Random-Seed Stability For BOSS And GRaSP")
ax.set_xlabel("Method")
ax.set_ylabel("Correct Directed Edges")
ax.set_ylim(-0.2, len(true_edges) + 0.5)
plt.tight_layout()
fig.savefig(FIGURE_DIR / f"{NOTEBOOK_PREFIX}_seed_stability.png", dpi=160, bbox_inches="tight")
plt.show()

Seed sensitivity is not automatically bad, but it changes how results should be communicated. Stable relationships across many starts are more credible than relationships that appear in one lucky order.

GRaSP Depth Sensitivity

GRaSP has a depth parameter that controls how deep its recursive permutation moves can go. Larger depth can help the search escape local optima that shallower moves cannot fix, but it can also increase runtime. This cell compares several depths across a few seeds.

depth_rows = []
for depth in [1, 2, 3, 4]:
    for seed in range(5):
        _, edge_table, elapsed = run_grasp(
            linear_df,
            label=f"GRaSP_depth_{depth}_seed_{seed}",
            seed=seed,
            depth=depth,
            lambda_value=LAMBDA_VALUE,
        )
        metrics = summarize_against_truth(edge_table, true_edges, label=f"GRaSP_depth_{depth}_seed_{seed}")
        metrics["depth"] = depth
        metrics["seed"] = seed
        metrics["elapsed_seconds"] = elapsed
        depth_rows.append(metrics)

grasp_depth_metrics = pd.concat(depth_rows, ignore_index=True)
grasp_depth_summary = grasp_depth_metrics.groupby("depth", as_index=False).agg(
    runs=("seed", "count"),
    mean_correct_directed_edges=("correct_directed_edges", "mean"),
    min_correct_directed_edges=("correct_directed_edges", "min"),
    mean_extra_adjacencies=("extra_adjacencies", "mean"),
    median_elapsed_seconds=("elapsed_seconds", "median"),
)

grasp_depth_metrics.to_csv(TABLE_DIR / f"{NOTEBOOK_PREFIX}_grasp_depth_metrics.csv", index=False)
grasp_depth_summary.to_csv(TABLE_DIR / f"{NOTEBOOK_PREFIX}_grasp_depth_summary.csv", index=False)

display(grasp_depth_summary.round(4))
display(grasp_depth_metrics[["depth", "seed", "correct_directed_edges", "unresolved_true_adjacencies", "extra_adjacencies", "elapsed_seconds"]].round(4))
depth runs mean_correct_directed_edges min_correct_directed_edges mean_extra_adjacencies median_elapsed_seconds
0 1 5 4.4 2 0.4 0.0070
1 2 5 3.6 2 0.6 0.0075
2 3 5 4.4 2 0.4 0.0078
3 4 5 6.0 6 0.0 0.0073
depth seed correct_directed_edges unresolved_true_adjacencies extra_adjacencies elapsed_seconds
0 1 0 2 4 1 0.0102
1 1 1 6 0 0 0.0068
2 1 2 6 0 0 0.0070
3 1 3 6 0 0 0.0070
4 1 4 2 4 1 0.0080
5 2 0 2 4 1 0.0083
6 2 1 2 4 1 0.0062
7 2 2 6 0 0 0.0075
8 2 3 6 0 0 0.0058
9 2 4 2 4 1 0.0081
10 3 0 6 0 0 0.0085
11 3 1 2 4 1 0.0062
12 3 2 6 0 0 0.0078
13 3 3 6 0 0 0.0055
14 3 4 2 4 1 0.0084
15 4 0 6 0 0 0.0085
16 4 1 6 0 0 0.0069
17 4 2 6 0 0 0.0073
18 4 3 6 0 0 0.0059
19 4 4 6 0 0 0.0093

Depth sensitivity shows that a search setting can change the result even when the score and data are fixed. That is exactly why algorithm settings belong in the report.
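One hedged rule of thumb for turning a sensitivity table like the one above into a depth choice: take the smallest depth whose worst-seed recovery matches the best worst-seed recovery any depth achieved. The helper below is a sketch with column names mirroring `grasp_depth_summary`; it is not a causal-learn API.

```python
import pandas as pd

def smallest_stable_depth(summary: pd.DataFrame) -> int:
    """Smallest depth whose worst seed matches the best worst-seed recovery."""
    best_floor = summary["min_correct_directed_edges"].max()
    stable = summary.loc[summary["min_correct_directed_edges"] == best_floor, "depth"]
    return int(stable.min())

# Toy table shaped like grasp_depth_summary.
toy = pd.DataFrame(
    {"depth": [1, 2, 3, 4], "min_correct_directed_edges": [2, 2, 2, 6]}
)
print(smallest_stable_depth(toy))  # 4: the only depth whose worst seed reaches 6
```

On a real problem this rule should be balanced against runtime, since the cheapest depth that is stable across seeds is usually the right default.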

Plot GRaSP Depth Sensitivity

This plot summarizes how correct directed recovery changes with depth across seeds.

fig, ax = plt.subplots(figsize=(9, 5))
sns.lineplot(
    data=grasp_depth_metrics,
    x="depth",
    y="correct_directed_edges",
    marker="o",
    errorbar="sd",
    ax=ax,
)
ax.set_title("GRaSP Depth Sensitivity")
ax.set_xlabel("GRaSP Depth")
ax.set_ylabel("Correct Directed Edges")
ax.set_ylim(-0.2, len(true_edges) + 0.5)
plt.tight_layout()
fig.savefig(FIGURE_DIR / f"{NOTEBOOK_PREFIX}_grasp_depth_sensitivity.png", dpi=160, bbox_inches="tight")
plt.show()

On this dataset, the deepest setting recovers the true graph on every seed, while shallower settings sometimes settle in local optima. On larger datasets, depth trades accuracy against runtime, so it is worth tuning deliberately and reporting rather than leaving it unexplained.

Sample-Size Sensitivity

Order-based methods still rely on local scores estimated from data. With fewer rows, local scores are noisier, and the preferred order or parent sets can change. This cell runs BOSS and GRaSP on increasing sample sizes.

sample_rows = []
for sample_size in [100, 200, 400, len(linear_df)]:
    sample_df = linear_df.sample(n=sample_size, replace=False, random_state=sample_size).reset_index(drop=True)
    for method_name in ["BOSS", "GRaSP_depth_4"]:
        if method_name == "BOSS":
            _, edge_table, elapsed = run_boss(sample_df, label=f"BOSS_n_{sample_size}", seed=BASELINE_SEED, lambda_value=LAMBDA_VALUE)
        else:
            _, edge_table, elapsed = run_grasp(sample_df, label=f"GRaSP_n_{sample_size}", seed=BASELINE_SEED, depth=GRASP_DEPTH, lambda_value=LAMBDA_VALUE)
        metrics = summarize_against_truth(edge_table, true_edges, label=f"{method_name}_n_{sample_size}")
        metrics["method"] = method_name
        metrics["sample_size"] = sample_size
        metrics["elapsed_seconds"] = elapsed
        sample_rows.append(metrics)

sample_size_metrics = pd.concat(sample_rows, ignore_index=True)
sample_size_metrics.to_csv(TABLE_DIR / f"{NOTEBOOK_PREFIX}_sample_size_metrics.csv", index=False)
display(sample_size_metrics.round(4))
run learned_edges_total definite_directed_edges true_edges correct_directed_edges directed_precision directed_recall reversed_true_edges unresolved_true_adjacencies missing_true_adjacencies extra_adjacencies method sample_size elapsed_seconds
0 BOSS_n_100 8 6 6 5 0.8333 0.8333 1 0 0 2 BOSS 100 0.0108
1 GRaSP_depth_4_n_100 9 6 6 1 0.1667 0.1667 2 3 0 3 GRaSP_depth_4 100 0.0054
2 BOSS_n_200 6 6 6 6 1.0000 1.0000 0 0 0 0 BOSS 200 0.0087
3 GRaSP_depth_4_n_200 6 6 6 6 1.0000 1.0000 0 0 0 0 GRaSP_depth_4 200 0.0072
4 BOSS_n_400 6 6 6 6 1.0000 1.0000 0 0 0 0 BOSS 400 0.0089
5 GRaSP_depth_4_n_400 7 4 6 1 0.2500 0.1667 2 3 0 1 GRaSP_depth_4 400 0.0097
6 BOSS_n_2500 6 6 6 6 1.0000 1.0000 0 0 0 0 BOSS 2500 0.0087
7 GRaSP_depth_4_n_2500 6 6 6 6 1.0000 1.0000 0 0 0 0 GRaSP_depth_4 2500 0.0071

Sample-size sensitivity is about evidence quality, not just runtime. Note that recovery need not improve monotonically: in this run GRaSP stumbles at 400 rows even though it recovers the full graph at 200 and at the full sample. A graph that appears only at one sample size deserves more caution than a graph that stabilizes as the sample grows.
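A simple stabilization check makes that caution concrete: for each method, compare the recovery metric at the two largest sample sizes and flag methods whose answer is still moving. The sketch below assumes column names matching `sample_size_metrics` and runs on a toy frame.

```python
import pandas as pd

def changed_at_largest_sizes(metrics: pd.DataFrame, column: str = "correct_directed_edges") -> list:
    """Flag methods whose metric differs between the two largest sample sizes."""
    flagged = []
    for method, group in metrics.groupby("method"):
        top_two = group.nlargest(2, "sample_size")
        if top_two[column].nunique() > 1:
            flagged.append(method)
    return sorted(flagged)

# Toy frame shaped like sample_size_metrics.
toy = pd.DataFrame(
    {
        "method": ["A", "A", "A", "B", "B", "B"],
        "sample_size": [100, 400, 2500] * 2,
        "correct_directed_edges": [5, 6, 6, 4, 1, 6],
    }
)
print(changed_at_largest_sizes(toy))  # only "B" still changes between n=400 and n=2500
```

A flagged method is not wrong, but its graph should be presented as sample-size sensitive rather than settled.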

Plot Sample-Size Sensitivity

This plot tracks correct directed recovery as the number of rows increases.

fig, ax = plt.subplots(figsize=(9, 5))
sns.lineplot(data=sample_size_metrics, x="sample_size", y="correct_directed_edges", hue="method", marker="o", ax=ax)
ax.set_title("Permutation Method Sample-Size Sensitivity")
ax.set_xlabel("Rows Used")
ax.set_ylabel("Correct Directed Edges")
ax.set_ylim(-0.2, len(true_edges) + 0.5)
ax.legend(title="Method")
plt.tight_layout()
fig.savefig(FIGURE_DIR / f"{NOTEBOOK_PREFIX}_sample_size_sensitivity.png", dpi=160, bbox_inches="tight")
plt.show()

The plot should be read as a stability diagnostic. It does not prove population recovery, but it shows whether the method becomes more reliable as score estimates get more data.

Runtime And Scaling Sketch

Permutation methods scale with the number of variables, the size of candidate parent sets, and the number of score evaluations. This quick sketch runs prefixes of the variable list to show how runtime changes with variable count on the same dataset.

scaling_rows = []
for variable_count in [3, 4, 5, 6]:
    cols = VARIABLES[:variable_count]
    subset_df = linear_df[cols].copy()
    for method_name in ["BOSS", "GRaSP_depth_4"]:
        if method_name == "BOSS":
            _, edge_table, elapsed = run_boss(subset_df, label=f"BOSS_p_{variable_count}", seed=BASELINE_SEED, lambda_value=LAMBDA_VALUE)
        else:
            _, edge_table, elapsed = run_grasp(subset_df, label=f"GRaSP_p_{variable_count}", seed=BASELINE_SEED, depth=GRASP_DEPTH, lambda_value=LAMBDA_VALUE)
        scaling_rows.append(
            {
                "method": method_name,
                "variable_count": variable_count,
                "elapsed_seconds": elapsed,
                "learned_edges": len(edge_table),
            }
        )

scaling_table = pd.DataFrame(scaling_rows)
scaling_table.to_csv(TABLE_DIR / f"{NOTEBOOK_PREFIX}_runtime_scaling_sketch.csv", index=False)
display(scaling_table.round(5))
method variable_count elapsed_seconds learned_edges
0 BOSS 3 0.00113 2
1 GRaSP_depth_4 3 0.00091 2
2 BOSS 4 0.00209 3
3 GRaSP_depth_4 4 0.00211 3
4 BOSS 5 0.00437 5
5 GRaSP_depth_4 5 0.00339 5
6 BOSS 6 0.00882 6
7 GRaSP_depth_4 6 0.00700 6

The absolute times are tiny here because the graph is tiny. The purpose is to show the workflow for tracking runtime as variable count grows, not to claim that all real permutation searches will be this fast.
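One way to turn a runtime sketch into a reportable number is an empirical scaling exponent: fit log(elapsed) against log(p) and read off the slope. The sketch below uses `numpy.polyfit` on synthetic timings that grow exactly like p squared; applying it to `scaling_table` would just swap in the `variable_count` and `elapsed_seconds` columns per method.

```python
import numpy as np

def empirical_scaling_exponent(variable_counts, elapsed_seconds) -> float:
    """Slope of log(elapsed) vs log(p): roughly k if runtime grows like p**k."""
    slope, _intercept = np.polyfit(np.log(variable_counts), np.log(elapsed_seconds), 1)
    return float(slope)

# Synthetic timings constructed to grow exactly like p**2.
p = np.array([3.0, 4.0, 5.0, 6.0])
t = 1e-4 * p ** 2
print(round(empirical_scaling_exponent(p, t), 2))  # 2.0 by construction
```

With only four points the fitted exponent is a rough diagnostic, not a precise complexity claim, so it should be reported as such.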

Plot Runtime Scaling Sketch

This plot shows elapsed time by variable count for the tiny prefix runs.

fig, ax = plt.subplots(figsize=(8.5, 5))
sns.lineplot(data=scaling_table, x="variable_count", y="elapsed_seconds", hue="method", marker="o", ax=ax)
ax.set_title("Runtime Sketch For Permutation-Based Methods")
ax.set_xlabel("Number Of Variables")
ax.set_ylabel("Elapsed Seconds")
ax.legend(title="Method")
plt.tight_layout()
fig.savefig(FIGURE_DIR / f"{NOTEBOOK_PREFIX}_runtime_scaling_sketch.png", dpi=160, bbox_inches="tight")
plt.show()

For larger datasets, runtime should be measured and reported directly. Search depth, score type, parent-set complexity, and number of restarts can all change compute cost.
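Measuring runtime directly can be as simple as a small context manager around each search call. This is a generic standard-library sketch; wrapping calls such as `run_boss` or `run_grasp` with it would record wall-clock seconds per configuration.

```python
import time
from contextlib import contextmanager

@contextmanager
def timed(record: dict, key: str):
    """Store wall-clock seconds for the wrapped block under record[key]."""
    start = time.perf_counter()
    try:
        yield
    finally:
        record[key] = time.perf_counter() - start

timings = {}
with timed(timings, "demo_sleep"):
    time.sleep(0.01)  # stand-in for a search call
print(sorted(timings))
```

Because the timing is written in a `finally` block, a failing search still leaves a timing entry behind, which helps when auditing partially completed sweeps.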

Reporting Checklist

Permutation-based results should be reported as score-based search outputs with order-search settings. This checklist records the details that make BOSS and GRaSP analyses easier to audit.

reporting_checklist = pd.DataFrame(
    [
        {
            "item": "State the score function",
            "what_to_report": "BIC, BDeu, CV score, or another local score plus penalty settings.",
            "why_it_matters": "The selected order and graph optimize this score, not a score-free causal criterion.",
        },
        {
            "item": "Report random seeds or restarts",
            "what_to_report": "Seeds used, number of starts, and whether graphs changed across starts.",
            "why_it_matters": "Permutation search can depend on random traversal or initial order.",
        },
        {
            "item": "Document GRaSP depth",
            "what_to_report": "Depth setting and sensitivity results when GRaSP is used.",
            "why_it_matters": "Depth changes the relaxation search and can change recovery.",
        },
        {
            "item": "Compare with another method",
            "what_to_report": "GES, PC, exact search on a small subproblem, or domain-constrained alternatives.",
            "why_it_matters": "Agreement across search families is more reassuring than one graph alone.",
        },
        {
            "item": "Track runtime and graph size",
            "what_to_report": "Elapsed time, variable count, learned edge count, and any parent limits.",
            "why_it_matters": "Order search can become expensive as graph size and score complexity grow.",
        },
        {
            "item": "Separate optimization from causality",
            "what_to_report": "What was optimized and which causal assumptions are still required.",
            "why_it_matters": "A high-scoring permutation is not causal proof by itself.",
        },
    ]
)
reporting_checklist.to_csv(TABLE_DIR / f"{NOTEBOOK_PREFIX}_permutation_reporting_checklist.csv", index=False)
display(reporting_checklist)
item what_to_report why_it_matters
0 State the score function BIC, BDeu, CV score, or another local score pl... The selected order and graph optimize this sco...
1 Report random seeds or restarts Seeds used, number of starts, and whether grap... Permutation search can depend on random traver...
2 Document GRaSP depth Depth setting and sensitivity results when GRa... Depth changes the relaxation search and can ch...
3 Compare with another method GES, PC, exact search on a small subproblem, o... Agreement across search families is more reass...
4 Track runtime and graph size Elapsed time, variable count, learned edge cou... Order search can become expensive as graph siz...
5 Separate optimization from causality What was optimized and which causal assumption... A high-scoring permutation is not causal proof...

The checklist is deliberately practical. It keeps the final graph connected to the score, search settings, stability checks, and assumptions that produced it.

Artifact Manifest

The final cell lists every dataset, table, and figure generated by this notebook. This makes it easy to review outputs without searching the folder manually.

artifact_rows = []
for folder, artifact_type in [(DATASET_DIR, "dataset"), (TABLE_DIR, "table"), (FIGURE_DIR, "figure")]:
    for artifact_path in sorted(folder.glob(f"{NOTEBOOK_PREFIX}_*")):
        artifact_rows.append(
            {
                "artifact_type": artifact_type,
                "path": str(artifact_path.relative_to(NOTEBOOK_DIR)),
                "size_kb": round(artifact_path.stat().st_size / 1024, 2),
            }
        )

artifact_manifest = pd.DataFrame(artifact_rows)
artifact_manifest.to_csv(TABLE_DIR / f"{NOTEBOOK_PREFIX}_artifact_manifest.csv", index=False)
display(artifact_manifest)
artifact_type path size_kb
0 table outputs/tables/12_boss_grasp_edges.csv 0.37
1 table outputs/tables/12_boss_grasp_metrics.csv 0.35
2 table outputs/tables/12_exact_search_stats.csv 0.08
3 table outputs/tables/12_fixed_order_edges.csv 0.55
4 table outputs/tables/12_fixed_order_metrics.csv 0.28
5 table outputs/tables/12_fixed_order_parent_sets.csv 1.72
6 table outputs/tables/12_fixed_order_score_summary.csv 0.12
7 table outputs/tables/12_grasp_depth_metrics.csv 1.70
8 table outputs/tables/12_grasp_depth_summary.csv 0.25
9 table outputs/tables/12_loaded_dataset_summary.csv 0.15
10 table outputs/tables/12_method_comparison_edges.csv 0.74
11 table outputs/tables/12_method_comparison_metrics.csv 0.50
12 table outputs/tables/12_package_versions.csv 0.09
13 table outputs/tables/12_permutation_reporting_checkl... 1.03
14 table outputs/tables/12_runtime_scaling_sketch.csv 0.33
15 table outputs/tables/12_sample_size_metrics.csv 0.90
16 table outputs/tables/12_seed_stability_edges.csv 5.16
17 table outputs/tables/12_seed_stability_metrics.csv 4.17
18 table outputs/tables/12_seed_stability_summary.csv 0.21
19 figure outputs/figures/12_boss_graph.png 66.89
20 figure outputs/figures/12_grasp_depth_sensitivity.png 69.43
21 figure outputs/figures/12_method_comparison_metrics.png 94.79
22 figure outputs/figures/12_runtime_scaling_sketch.png 88.72
23 figure outputs/figures/12_sample_size_sensitivity.png 77.80
24 figure outputs/figures/12_seed_stability.png 41.85
25 figure outputs/figures/12_true_dag.png 67.02

This notebook now gives a complete permutation-based discovery workflow: order-search theory, fixed-order parent selection, BOSS and GRaSP runs, comparison with GES and exact search, seed stability, GRaSP depth sensitivity, sample-size sensitivity, runtime sketching, and reporting guidance.

The next tutorial can move to time-series causal discovery, where lag structure provides another source of direction information.