Knowledge Graph — Structure and Update Guide

Purpose: maintain a compressed knowledge graph of the book, simulator, methods, definitions/equations, proofs, and code links. It is human‑readable and LLM‑friendly, and supports forward/backward hooks to reference not‑yet‑written parts.

Primary file: docs/knowledge_graph/graph.yaml
Schema (fields + enums): docs/knowledge_graph/schema.yaml
Optional: per‑chapter stubs can be added later and merged into graph.yaml.

ID Conventions

Chapters: CH-<num> (e.g., CH-1, CH-11)
Equations: EQ-<chapter>.<num>[suffix] (e.g., EQ-1.2, EQ-1.2-prime)
Remarks/Defs/Theorems: REM-<id>, DF-<id>, TH-<id>
Concepts: CN-<slug> (e.g., CN-CMDP, CN-OPE)
Modules: MOD-<package.path> (e.g., MOD-zoosim.reward)
Tests: TEST-<path>
Plans: PLN-<slug> (e.g., PLN-Frontier-Phase)
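
The prefixes above lend themselves to a mechanical check. The following is a minimal sketch, assuming the patterns can be read off the examples in this list. The regular expressions and the `classify_id` helper are illustrative assumptions and are not part of kg_tools.py or schema.yaml.

```python
import re
from typing import Optional

# Patterns inferred from the examples above. The exact character classes
# (for instance what a <slug> or <id> may contain) are assumptions.
ID_PATTERNS = {
    "chapter": re.compile(r"^CH-\d+$"),
    "equation": re.compile(r"^EQ-\d+\.\d+(-[A-Za-z]+)?$"),
    "remark": re.compile(r"^REM-[\w.\-]+$"),
    "definition": re.compile(r"^DF-[\w.\-]+$"),
    "theorem": re.compile(r"^TH-[\w.\-]+$"),
    "concept": re.compile(r"^CN-[\w\-]+$"),
    "module": re.compile(r"^MOD-[A-Za-z_][\w.]*$"),
    "test": re.compile(r"^TEST-\S+$"),
    "plan": re.compile(r"^PLN-[\w\-]+$"),
}


def classify_id(node_id: str) -> Optional[str]:
    """Return the kind whose pattern matches node_id, or None if none does."""
    for kind, pattern in ID_PATTERNS.items():
        if pattern.match(node_id):
            return kind
    return None


for example in ["CH-11", "EQ-1.2-prime", "CN-CMDP", "MOD-zoosim.reward", "XX-1"]:
    print(example, "->", classify_id(example))
```

A check like this could run before a new node is added, so that a mistyped prefix is caught early.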

Status

planned → placeholder exists; content pending
in_progress → partial content/code exists; iterating
complete → content/code implemented and referenced
archived → superseded by newer nodes

Relations (Edge types)

defines (chapter/section → definition/equation/concept)
proves (chapter/section → theorem)
uses (chapter/section/module → node)
implements (module → equation/concept/algorithm)
tested_by (module/node → test)
depends_on (node → node)
refers_to_future (node → planned node) [forward hook]
superseded_by (archived → node)
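
The endpoint constraints in parentheses can be written down as data. This is a sketch under assumptions: the kind strings (`chapter`, `section`, `definition`, and so on) are guesses at the enum values, and schema.yaml remains the authoritative source. `edge_is_well_typed` is a hypothetical helper, not a kg_tools.py function.

```python
# Allowed (source kinds, target kinds) per relation, transcribed from the
# list above. None means "any kind". The kind strings are assumptions.
RELATION_RULES = {
    "defines": ({"chapter", "section"}, {"definition", "equation", "concept"}),
    "proves": ({"chapter", "section"}, {"theorem"}),
    "uses": ({"chapter", "section", "module"}, None),
    "implements": ({"module"}, {"equation", "concept", "algorithm"}),
    "tested_by": (None, {"test"}),
    "depends_on": (None, None),
    # The two relations below constrain status rather than kind:
    # refers_to_future targets a planned node, superseded_by starts
    # from an archived node. Status is not checked in this sketch.
    "refers_to_future": (None, None),
    "superseded_by": (None, None),
}


def edge_is_well_typed(rel: str, src_kind: str, dst_kind: str) -> bool:
    """Check an edge's endpoint kinds against RELATION_RULES."""
    if rel not in RELATION_RULES:
        return False
    allowed_src, allowed_dst = RELATION_RULES[rel]
    src_ok = allowed_src is None or src_kind in allowed_src
    dst_ok = allowed_dst is None or dst_kind in allowed_dst
    return src_ok and dst_ok


print(edge_is_well_typed("implements", "module", "equation"))   # True
print(edge_is_well_typed("implements", "chapter", "equation"))  # False
```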

How to Update

Add or edit nodes/edges in graph.yaml.
Keep each node's summary to one concrete line; include file and anchor when applicable.
For forward hooks, add a refers_to_future edge from current node to a planned node.
Prefer referencing anchors over line numbers for stability.

Anchor Format Conventions

Equations: Use Pandoc-style labels: {#EQ-1.2} immediately after \tag{1.2} in markdown
Theorems/Definitions: Use Pandoc labels: {#THM-2.3.1}, {#DEF-1.1.1}
Sections/Headings: Use standard markdown anchors: #section-title (auto-generated by MkDocs)
In KG nodes: Store anchors with braces: anchor: "{#EQ-1.2}" for consistency with Pandoc processing
Validation: The validator checks for substring presence; both {#EQ-1.2} and #EQ-1.2 will match

Example:

$$
R = \alpha \cdot \text{GMV} + \beta \cdot \text{CM2}
\tag{1.2}
$$
{#EQ-1.2}

In graph.yaml:

anchor: "{#EQ-1.2}"
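
To make the substring rule concrete, here is a small sketch of what such a check amounts to. The `anchor_present` helper is hypothetical and only imitates the behaviour described above; the real check lives in validate_kg.py. It also shows one reason to prefer the braced form: a bare anchor is a prefix of longer equation numbers.

```python
markdown_source = r"""
$$
R = \alpha \cdot \text{GMV} + \beta \cdot \text{CM2}
\tag{1.2}
$$
{#EQ-1.2}
"""


def anchor_present(anchor: str, text: str) -> bool:
    """Plain substring test, as described for the validator."""
    return anchor in text


# Both spellings are found in a file that declares {#EQ-1.2}.
print(anchor_present("{#EQ-1.2}", markdown_source))  # True
print(anchor_present("#EQ-1.2", markdown_source))    # True

# A file that only declares equation 1.20 still satisfies the bare form,
# while the braced form correctly fails.
other_source = "{#EQ-1.20}"
print(anchor_present("#EQ-1.2", other_source))       # True (false positive)
print(anchor_present("{#EQ-1.2}", other_source))     # False
```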

Forward Hooks Workflow

When a draft chapter cites content that doesn't exist yet:

Create a planned node for the future content:

```yaml
id: CH-11
kind: chapter
status: planned
file: docs/book/drafts/syllabus.md  # Points to planning doc initially
summary: Multi-episode inter-session MDP
```

Add a refers_to_future edge from the current chapter:

```yaml
{src: CH-1, dst: CH-11, rel: refers_to_future, status: complete}
```

Update the node when drafting begins:

```yaml
id: CH-11
status: in_progress
file: docs/book/drafts/ch11_multi_episode.md  # Now points to actual draft
```

In the draft text, add a cross-reference box:

```markdown
!!! note "Forward Reference — Chapter 11"
    The multi-episode formulation (Chapter 11) operationalizes this with retention hazard models. See [EQ-1.2-prime] for the principled objective.
```
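
The first three steps of this workflow can also be sketched as operations on the loaded YAML data. This is an illustration only: it assumes graph.yaml parses to a dict with top-level `nodes` and `edges` lists, and `add_forward_hook` and `begin_drafting` are hypothetical helpers, not kg_tools.py functions. The status given to CH-1 is made up for the example.

```python
def add_forward_hook(graph, src_id, planned_node):
    """Register planned_node if it is absent, then link src_id to it."""
    known_ids = {node["id"] for node in graph["nodes"]}
    if planned_node["id"] not in known_ids:
        graph["nodes"].append({**planned_node, "status": "planned"})
    edge = {
        "src": src_id,
        "dst": planned_node["id"],
        "rel": "refers_to_future",
        "status": "complete",
    }
    if edge not in graph["edges"]:
        graph["edges"].append(edge)
    return graph


def begin_drafting(graph, node_id, draft_file):
    """Move a planned node to in_progress and point it at the real draft."""
    for node in graph["nodes"]:
        if node["id"] == node_id:
            node["status"] = "in_progress"
            node["file"] = draft_file
            return node
    raise KeyError(node_id)


# Assumed in-memory shape of graph.yaml.
graph = {
    "nodes": [{"id": "CH-1", "kind": "chapter", "status": "in_progress"}],
    "edges": [],
}
add_forward_hook(graph, "CH-1", {
    "id": "CH-11",
    "kind": "chapter",
    "file": "docs/book/drafts/syllabus.md",
    "summary": "Multi-episode inter-session MDP",
})
begin_drafting(graph, "CH-11", "docs/book/drafts/ch11_multi_episode.md")
print(graph["nodes"][1]["status"], len(graph["edges"]))
```

Any change made this way would still need to be written back to graph.yaml, since the YAML file is the source of truth.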

Example (abbreviated)

Node: EQ-1.2 (equation) defined in Chapter 1; implemented by zoosim/dynamics/reward.py.
Node: CH-11 (chapter) implements EQ-1.2-prime; depends on MOD-zoosim.session_env.
Edge: MOD-zoosim.session_env implements EQ-1.2-prime.
Edge: CH-1 refers_to_future CH-11 (forward hook).

See graph.yaml for a working set seeded from current content.

NetworkX Query Tools

The knowledge graph ships with NetworkX-based Python utilities for querying, validating, and visualizing the graph.

Tools Overview

kg_tools.py — Core graph query library
validate_kg.py — Comprehensive validation and consistency checking
visualize_kg.py — Dependency graph visualization

Installation

# Install dependencies (already in project environment)
pip install networkx pyyaml matplotlib

Quick Start

Load the graph:

from kg_tools import KnowledgeGraph

kg = KnowledgeGraph("docs/knowledge_graph/graph.yaml")

Find untested equations:

untested = kg.find_untested_equations()
print(f"Untested equations: {untested}")

Get transitive dependencies:

deps = kg.transitive_dependencies("CH-11")
print(f"CH-11 depends on: {deps}")
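
For readers who want to see what a dependency closure means, here is a standard-library sketch that does not use kg_tools.py or NetworkX. It assumes that only `depends_on` edges count as dependencies, which may differ from the real implementation. The `CN-CMDP` edge is hypothetical.

```python
from collections import deque


def transitive_dependencies_sketch(edges, start):
    """Collect every node reachable from start via depends_on edges."""
    adjacency = {}
    for edge in edges:
        if edge["rel"] == "depends_on":
            adjacency.setdefault(edge["src"], []).append(edge["dst"])

    seen = set()
    queue = deque([start])
    while queue:
        current = queue.popleft()
        for dep in adjacency.get(current, []):
            if dep not in seen and dep != start:
                seen.add(dep)
                queue.append(dep)
    return seen


edges = [
    {"src": "CH-11", "dst": "MOD-zoosim.session_env", "rel": "depends_on"},
    {"src": "MOD-zoosim.session_env", "dst": "EQ-1.2-prime", "rel": "implements"},
    # Hypothetical edge, added only to show a second hop.
    {"src": "MOD-zoosim.session_env", "dst": "CN-CMDP", "rel": "depends_on"},
]
print(sorted(transitive_dependencies_sketch(edges, "CH-11")))
# ['CN-CMDP', 'MOD-zoosim.session_env']
```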

Find what's blocking progress:

blockers = kg.find_blockers("CH-11")
for blocker_id, status, title in blockers:
    print(f"  {blocker_id} ({status}): {title}")
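
One plausible reading of "blocker" is a transitive dependency whose status is not yet `complete`. The sketch below implements that reading with plain dicts. The `title` field, the treatment of unknown nodes as `missing`, and the sample statuses are assumptions rather than documented behaviour of `find_blockers`.

```python
def find_blockers_sketch(nodes, edges, node_id):
    """Return (id, status, title) for each incomplete transitive dependency."""
    by_id = {node["id"]: node for node in nodes}
    deps = {}
    for edge in edges:
        if edge["rel"] == "depends_on":
            deps.setdefault(edge["src"], []).append(edge["dst"])

    blockers, seen, stack = [], {node_id}, [node_id]
    while stack:
        for dep in deps.get(stack.pop(), []):
            if dep in seen:
                continue
            seen.add(dep)
            stack.append(dep)
            node = by_id.get(dep, {})
            status = node.get("status", "missing")
            if status != "complete":
                blockers.append((dep, status, node.get("title", "")))
    return sorted(blockers)


# Sample data; statuses and titles are invented for illustration.
nodes = [
    {"id": "CH-11", "status": "planned", "title": "Multi-episode MDP"},
    {"id": "MOD-zoosim.session_env", "status": "in_progress", "title": "Session environment"},
    {"id": "CN-CMDP", "status": "complete", "title": "Constrained MDP"},
]
edges = [
    {"src": "CH-11", "dst": "MOD-zoosim.session_env", "rel": "depends_on"},
    {"src": "MOD-zoosim.session_env", "dst": "CN-CMDP", "rel": "depends_on"},
]
print(find_blockers_sketch(nodes, edges, "CH-11"))
# [('MOD-zoosim.session_env', 'in_progress', 'Session environment')]
```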

Generate coverage report:

coverage = kg.coverage_report()
for kind, stats in coverage.items():
    print(f"{kind}: {stats['tested']}/{stats['total']} ({stats['coverage_pct']}%)")
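
The report keys used above (`tested`, `total`, `coverage_pct`) suggest a per-kind tally. A minimal sketch of such a tally follows, assuming a node counts as tested only when it is the source of a `tested_by` edge. The real method may also credit equations whose implementing module is tested. The test path is hypothetical.

```python
def coverage_report_sketch(nodes, edges, kinds=("equation", "module")):
    """Count nodes per kind that have at least one outgoing tested_by edge."""
    tested_ids = {edge["src"] for edge in edges if edge["rel"] == "tested_by"}
    report = {}
    for kind in kinds:
        ids = [node["id"] for node in nodes if node["kind"] == kind]
        tested = sum(1 for node_id in ids if node_id in tested_ids)
        pct = round(100.0 * tested / len(ids), 1) if ids else 0.0
        report[kind] = {"total": len(ids), "tested": tested, "coverage_pct": pct}
    return report


nodes = [
    {"id": "EQ-1.2", "kind": "equation"},
    {"id": "EQ-1.2-prime", "kind": "equation"},
    {"id": "MOD-zoosim.reward", "kind": "module"},
]
edges = [
    # Hypothetical test node ID.
    {"src": "EQ-1.2", "dst": "TEST-tests/test_reward.py", "rel": "tested_by"},
]
for kind, stats in coverage_report_sketch(nodes, edges).items():
    print(f"{kind}: {stats['tested']}/{stats['total']} ({stats['coverage_pct']}%)")
```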

Get chapter summary:

summary = kg.chapter_summary("CH-1")
print(f"Chapter 1 defines {summary['defines_count']} items")
print(f"Forward refs: {[r['id'] for r in summary['forward_refs']]}")

Command-Line Usage

Run statistics:

python docs/knowledge_graph/kg_tools.py

Validate the graph:

python docs/knowledge_graph/validate_kg.py

Visualize a chapter:

python docs/knowledge_graph/visualize_kg.py --chapter CH-1 --output ch01.png

Visualize dependencies:

python docs/knowledge_graph/visualize_kg.py --deps CH-11 --output ch11_deps.png

Generate implementation map:

python docs/knowledge_graph/visualize_kg.py --impl-map --output impl_map.png

Advanced Queries

Find unimplemented equations:

unimpl = kg.find_unimplemented_equations()

Find circular dependencies:

cycles = kg.find_cycles()
if cycles:
    print(f"Found {len(cycles)} circular dependencies!")
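
As background on what a cycle check does, here is a depth-first sketch that returns one cycle if any exists. It is not the kg_tools.py implementation; NetworkX itself provides `nx.simple_cycles` for enumerating cycles in a directed graph. The node IDs below are made up.

```python
def find_cycle_sketch(adjacency):
    """Return one cycle as a list of node IDs, or None if the graph is acyclic."""
    WHITE, GRAY, BLACK = 0, 1, 2  # unvisited, on the current path, finished
    color = {}
    path = []

    def visit(node):
        color[node] = GRAY
        path.append(node)
        for nxt in adjacency.get(node, []):
            state = color.get(nxt, WHITE)
            if state == GRAY:
                # nxt is already on the current path, so the path loops back.
                return path[path.index(nxt):] + [nxt]
            if state == WHITE:
                found = visit(nxt)
                if found:
                    return found
        path.pop()
        color[node] = BLACK
        return None

    for node in list(adjacency):
        if color.get(node, WHITE) == WHITE:
            found = visit(node)
            if found:
                return found
    return None


# Hypothetical depends_on adjacency with a three-node loop.
cyclic = {"CN-A": ["CN-B"], "CN-B": ["CN-C"], "CN-C": ["CN-A"]}
acyclic = {"CN-A": ["CN-B"], "CN-B": []}
print(find_cycle_sketch(cyclic))   # ['CN-A', 'CN-B', 'CN-C', 'CN-A']
print(find_cycle_sketch(acyclic))  # None
```

The recursive form is fine for a graph of this size; a very deep dependency chain would call for an explicit stack.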

Get all nodes by kind:

equations = kg.nodes_by_kind('equation')
modules = kg.nodes_by_kind('module')

Get all nodes by status:

planned = kg.nodes_by_status('planned')
in_progress = kg.nodes_by_status('in_progress')

Check implementation status:

impl_status = kg.implementation_status()
print(f"Implemented: {len(impl_status['implemented'])}")
print(f"Unimplemented: {len(impl_status['unimplemented'])}")

Export comprehensive stats:

stats = kg.export_stats()
print(f"Total nodes: {stats['total_nodes']}")
print(f"Total edges: {stats['total_edges']}")
print(f"Orphan nodes: {stats['orphan_nodes']}")
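
An orphan, in the sense used here, is a node that appears in no edge. The sketch below shows that definition on plain data. The field names `src` and `dst` follow the edge example earlier in this guide, and the `PLN-` node is hypothetical.

```python
def find_orphan_nodes_sketch(nodes, edges):
    """Return the IDs of nodes that are neither a source nor a target of any edge."""
    connected = set()
    for edge in edges:
        connected.add(edge["src"])
        connected.add(edge["dst"])
    return sorted(node["id"] for node in nodes if node["id"] not in connected)


nodes = [
    {"id": "CH-1"},
    {"id": "CH-11"},
    {"id": "PLN-Frontier-Phase"},  # no edges touch this node
]
edges = [{"src": "CH-1", "dst": "CH-11", "rel": "refers_to_future"}]
print(find_orphan_nodes_sketch(nodes, edges))  # ['PLN-Frontier-Phase']
```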

Validation Checks

The validator (validate_kg.py) performs:

✅ Referential integrity — No dangling references to non-existent nodes
✅ File existence — All referenced files exist in repository
✅ Anchor presence — Declared anchors exist in files
✅ Status consistency — Complete nodes don't depend on planned nodes
✅ Circular dependencies — No cycles in dependency graph
✅ Orphan detection — Nodes with no connections
✅ Coverage gaps — Equations/modules without tests
✅ Schema compliance — Valid node kinds, statuses, and edge types

Run validation before committing changes to graph.yaml:

python docs/knowledge_graph/validate_kg.py
# Exit code 0 if passed, 1 if failed
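
Two of the checks listed above, referential integrity and status consistency, are simple enough to sketch on plain data. This is an illustration of the idea, not the code in validate_kg.py. It assumes status consistency applies to `depends_on` edges only, and the sample nodes are invented.

```python
def check_graph_sketch(nodes, edges):
    """Return a list of human-readable problems found in the graph data."""
    errors = []
    status = {node["id"]: node["status"] for node in nodes}

    for edge in edges:
        # Referential integrity: both endpoints must be declared nodes.
        for end in ("src", "dst"):
            if edge[end] not in status:
                errors.append(f"dangling reference: {edge[end]}")
        # Status consistency: a complete node must not depend on a planned one.
        if (
            edge["rel"] == "depends_on"
            and status.get(edge["src"]) == "complete"
            and status.get(edge["dst"]) == "planned"
        ):
            errors.append(f"complete node {edge['src']} depends on planned node {edge['dst']}")
    return errors


nodes = [
    {"id": "CH-1", "status": "complete"},
    {"id": "CH-11", "status": "planned"},
]
edges = [
    {"src": "CH-1", "dst": "CH-11", "rel": "depends_on"},
    {"src": "CH-1", "dst": "EQ-9.9", "rel": "defines"},  # EQ-9.9 is not declared
]
errors = check_graph_sketch(nodes, edges)
for message in errors:
    print(message)
exit_code = 1 if errors else 0  # mirrors the 0/1 convention described above
```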

Visualization

The visualizer generates:

Chapter dependency graphs — What each chapter defines and depends on
Dependency trees — Transitive dependencies for any node
Implementation maps — Which modules implement which equations/algorithms

Color scheme:
- Blue: Chapters
- Orange: Equations
- Red: Theorems
- Purple: Definitions
- Green: Modules
- Yellow: Tests
- Teal: Algorithms
- Brown: Concepts

Status markers:
- ○ Circle: Planned
- □ Square: In Progress
- ◇ Diamond: Complete
- × X: Archived

Example visualizations:

# Chapter 1 with depth 2
python docs/knowledge_graph/visualize_kg.py --chapter CH-1 --depth 2 --output ch01.png

# Show all dependencies for CH-11
python docs/knowledge_graph/visualize_kg.py --deps CH-11 --output ch11_deps.png

# Implementation map showing module → equation links
python docs/knowledge_graph/visualize_kg.py --impl-map --output impl.png

Integration with Workflow

After writing a new chapter:

Update graph.yaml with new nodes/edges
Run validator: python docs/knowledge_graph/validate_kg.py
Fix any errors/warnings
Generate visualization: python docs/knowledge_graph/visualize_kg.py --chapter CH-X
Commit both graph.yaml and visualization PNGs

When planning new content:

Use kg.find_blockers("CH-X") to see what needs to be written first
Use kg.transitive_dependencies("CH-X") to understand full dependency chain
Add forward hooks for future chapters/equations

For code reviews:

Run kg.coverage_report() to check test coverage
Run kg.implementation_status() to verify equations are implemented
Check kg.find_missing_refs() for dangling references

Best Practices

Keep YAML as source of truth — All edits go to graph.yaml, not the NetworkX graph
Run validation frequently — Catch issues early
Visualize complex dependencies — Better than manually tracing edges
Use queries for reports — Don't manually count nodes
Commit visualizations — Helps reviewers understand structure

API Reference

See docstrings in kg_tools.py for complete API documentation:

help(KnowledgeGraph)

Key methods:
- nodes_by_kind(kind) — Get all nodes of a type
- nodes_by_status(status) — Get nodes by status
- get_node_data(node_id) — Get all attributes for a node
- find_untested_equations() — Find equations without tests
- find_unimplemented_equations() — Find equations without implementations
- transitive_dependencies(node_id) — Get dependency closure
- transitive_dependents(node_id) — Get dependent closure
- find_blockers(node_id) — Find incomplete dependencies
- find_missing_refs() — Find dangling references
- find_orphan_nodes() — Find isolated nodes
- find_cycles() — Detect circular dependencies
- coverage_report() — Test coverage by kind
- status_summary() — Nodes by status
- chapter_summary(chapter_id) — Chapter statistics
- implementation_status() — Implementation coverage
- export_stats() — Comprehensive statistics
