Knowledge Graph — Structure and Update Guide

Purpose: maintain a compressed knowledge graph of the book, simulator, methods, definitions/equations, proofs, and code links. It is human‑readable and LLM‑friendly, and supports forward/backward hooks to reference not‑yet‑written parts.

  • Primary file: docs/knowledge_graph/graph.yaml
  • Schema (fields + enums): docs/knowledge_graph/schema.yaml
  • Optional: per‑chapter stubs can be added later and merged into graph.yaml.

ID Conventions

  • Chapters: CH-<num> (e.g., CH-1, CH-11)
  • Equations: EQ-<chapter>.<num>[suffix] (e.g., EQ-1.2, EQ-1.2-prime)
  • Remarks/Defs/Theorems: REM-<id>, DF-<id>, TH-<id>
  • Concepts: CN-<slug> (e.g., CN-CMDP, CN-OPE)
  • Modules: MOD-<package.path> (e.g., MOD-zoosim.reward)
  • Tests: TEST-<path>
  • Plans: PLN-<slug> (e.g., PLN-Frontier-Phase)
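As an illustration, the conventions above can be written as regular expressions. This is a minimal sketch. The exact character sets are assumptions, including that an equation suffix is a lowercase word such as -prime, and schema.yaml may define the patterns differently.

```python
import re
from typing import Optional

# Assumed patterns, one per ID convention listed above.
ID_PATTERNS = {
    "chapter": r"CH-\d+",
    "equation": r"EQ-\d+\.\d+(?:-[a-z]+)?",  # optional suffix such as -prime
    "remark": r"REM-[\w.-]+",
    "definition": r"DF-[\w.-]+",
    "theorem": r"TH-[\w.-]+",
    "concept": r"CN-[\w-]+",
    "module": r"MOD-[\w.]+",  # dotted package path
    "test": r"TEST-[\w./-]+",
    "plan": r"PLN-[\w-]+",
}


def classify_id(node_id: str) -> Optional[str]:
    """Return the kind implied by an ID's prefix, or None if nothing matches."""
    for kind, pattern in ID_PATTERNS.items():
        if re.fullmatch(pattern, node_id):
            return kind
    return None
```

For example, classify_id("EQ-1.2-prime") returns "equation", while a malformed ID such as "EQ1.2" returns None.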

Status

  • planned → placeholder exists; content pending
  • in_progress → partial content/code exists; iterating
  • complete → content/code implemented and referenced
  • archived → superseded by newer nodes

Relations (Edge types)

  • defines (chapter/section → definition/equation/concept)
  • proves (chapter/section → theorem)
  • uses (chapter/section/module → node)
  • implements (module → equation/concept/algorithm)
  • tested_by (module/node → test)
  • depends_on (node → node)
  • refers_to_future (node → planned node) [forward hook]
  • superseded_by (archived → node)
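The source and target kinds in parentheses can be checked mechanically. The sketch below assumes graph.yaml has been parsed into a dict of nodes keyed by ID plus a list of {src, dst, rel} edges, as in the forward-hook example. It enforces only two of the direction rules and is not the logic of validate_kg.py.

```python
from __future__ import annotations

# The eight relation types listed above; anything else is a schema error.
EDGE_TYPES = {
    "defines", "proves", "uses", "implements",
    "tested_by", "depends_on", "refers_to_future", "superseded_by",
}


def check_edges(nodes: dict[str, dict], edges: list[dict]) -> list[str]:
    """Report edges with an unknown type or a source kind/status the rules forbid."""
    problems = []
    for e in edges:
        src, dst, rel = e["src"], e["dst"], e["rel"]
        if rel not in EDGE_TYPES:
            problems.append(f"{src} -> {dst}: unknown edge type {rel!r}")
            continue
        src_node = nodes.get(src, {})
        # implements: module -> equation/concept/algorithm
        if rel == "implements" and src_node.get("kind") != "module":
            problems.append(f"{src} -> {dst}: only modules may implement")
        # superseded_by: archived -> node
        if rel == "superseded_by" and src_node.get("status") != "archived":
            problems.append(f"{src} -> {dst}: superseded_by source must be archived")
    return problems
```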

How to Update

  1. Add or edit nodes/edges in graph.yaml.
  2. Keep each summary to one concrete line; include the file and anchor when applicable.
  3. For forward hooks, add a refers_to_future edge from the current node to a planned node.
  4. Prefer referencing anchors over line numbers for stability.

Anchor Format Conventions

  • Equations: Use Pandoc-style labels: put {#EQ-1.2} on the line immediately after the closing $$ of the display that contains \tag{1.2}
  • Theorems/Definitions: Use Pandoc labels: {#THM-2.3.1}, {#DEF-1.1.1}
  • Sections/Headings: Use standard markdown anchors: #section-title (auto-generated by MkDocs)
  • In KG nodes: Store anchors with braces: anchor: "{#EQ-1.2}" for consistency with Pandoc processing
  • Validation: The validator checks for substring presence; both {#EQ-1.2} and #EQ-1.2 will match
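A minimal sketch of such a substring check follows. It assumes the stored anchor is normalized by dropping its braces, which is one way to make both spellings match; validate_kg.py may normalize differently.

```python
def anchor_present(anchor: str, text: str) -> bool:
    """Return True if the anchor occurs anywhere in the file text.

    Dropping the braces turns "{#EQ-1.2}" into "#EQ-1.2", which is a
    substring of both "{#EQ-1.2}" and a bare "#EQ-1.2" in the file.
    """
    needle = anchor.strip().strip("{}")
    return needle in text
```

Because it is a plain substring test, #EQ-1.2 is also found inside #EQ-1.20; a stricter check could use a regular expression that rejects a trailing digit.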

Example:

$$
R = \alpha \cdot \text{GMV} + \beta \cdot \text{CM2}
\tag{1.2}
$$
{#EQ-1.2}

In graph.yaml:

anchor: "{#EQ-1.2}"

Forward Hooks Workflow

When a draft chapter cites content that doesn't exist yet:

  1. Create a planned node for the future content:

     ```yaml
     - id: CH-11
       kind: chapter
       status: planned
       file: docs/book/drafts/syllabus.md  # Points to planning doc initially
       summary: Multi-episode inter-session MDP
     ```

  2. Add a refers_to_future edge from the current chapter:

     ```yaml
     - {src: CH-1, dst: CH-11, rel: refers_to_future, status: complete}
     ```

  3. Update the node when drafting begins:

     ```yaml
     - id: CH-11
       status: in_progress
       file: docs/book/drafts/ch11_multi_episode.md  # Now points to actual draft
     ```

  4. In the draft text, add a cross-reference box:

     ```markdown
     !!! note "Forward Reference — Chapter 11"
         The multi-episode formulation (Chapter 11) operationalizes this with retention hazard models. See [EQ-1.2-prime] for the principled objective.
     ```
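Before updating a planned node, it helps to know which forward hooks still point at unfinished content. A sketch, assuming nodes are indexed by ID and edges are {src, dst, rel} dicts; open_forward_hooks is a hypothetical helper, not part of kg_tools.py.

```python
from __future__ import annotations


def open_forward_hooks(nodes: dict[str, dict], edges: list[dict]) -> list[tuple[str, str, str]]:
    """Return (src, dst, dst_status) for refers_to_future edges whose target is not complete."""
    hooks = []
    for e in edges:
        if e["rel"] != "refers_to_future":
            continue
        # A target missing from the node list is reported as "missing".
        status = nodes.get(e["dst"], {}).get("status", "missing")
        if status != "complete":
            hooks.append((e["src"], e["dst"], status))
    return hooks
```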

Example (abbreviated)

  • Node: EQ-1.2 (equation) defined in Chapter 1; implemented by zoosim/dynamics/reward.py.
  • Node: CH-11 (chapter) uses EQ-1.2-prime; depends on MOD-zoosim.session_env.
  • Edge: MOD-zoosim.session_env implements EQ-1.2-prime.
  • Edge: CH-1 refers_to_future CH-11 (forward hook).

See graph.yaml for a working set seeded from current content.


NetworkX Query Tools

The knowledge graph ships with NetworkX-based Python utilities for querying, validating, and visualizing it.

Tools Overview

  1. kg_tools.py — Core graph query library
  2. validate_kg.py — Comprehensive validation and consistency checking
  3. visualize_kg.py — Dependency graph visualization

Installation

# Install dependencies (already in project environment)
pip install networkx pyyaml matplotlib

Quick Start

Load the graph:

from kg_tools import KnowledgeGraph

kg = KnowledgeGraph("docs/knowledge_graph/graph.yaml")
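To make the queries below concrete without NetworkX, the sketches in this section work on plain Python data. They assume graph.yaml parses (for example with yaml.safe_load) into a mapping with top-level nodes and edges lists; that layout is an assumption about the file, and index_graph is a hypothetical helper.

```python
from __future__ import annotations


def index_graph(doc: dict) -> tuple[dict[str, dict], list[dict]]:
    """Split a parsed graph document into an ID -> node map and an edge list."""
    nodes: dict[str, dict] = {}
    for node in doc.get("nodes", []):
        if node["id"] in nodes:
            # Duplicate IDs would silently overwrite each other in a dict.
            raise ValueError(f"duplicate node id {node['id']!r}")
        nodes[node["id"]] = node
    return nodes, list(doc.get("edges", []))


# Stand-in for the parsed contents of docs/knowledge_graph/graph.yaml.
doc = {
    "nodes": [
        {"id": "CH-1", "kind": "chapter", "status": "complete"},
        {"id": "EQ-1.2", "kind": "equation", "status": "complete", "anchor": "{#EQ-1.2}"},
    ],
    "edges": [{"src": "CH-1", "dst": "EQ-1.2", "rel": "defines"}],
}
nodes, edges = index_graph(doc)
```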

Find untested equations:

untested = kg.find_untested_equations()
print(f"Untested equations: {untested}")
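One plausible reading of "untested" is an equation node with no outgoing tested_by edge. A sketch of that rule, assuming nodes keyed by ID and {src, dst, rel} edges. Whether kg_tools.py also credits tests reached through an implementing module is not stated, so this version does not.

```python
def find_untested_equations(nodes, edges):
    """Return IDs of equation nodes that have no tested_by edge of their own."""
    tested = {e["src"] for e in edges if e["rel"] == "tested_by"}
    return sorted(
        node_id
        for node_id, attrs in nodes.items()
        if attrs.get("kind") == "equation" and node_id not in tested
    )
```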

Get transitive dependencies:

deps = kg.transitive_dependencies("CH-11")
print(f"CH-11 depends on: {deps}")
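The dependency closure is a reachability question over depends_on edges. In NetworkX it corresponds to nx.descendants on the depends_on subgraph; the dependency-free sketch below does the same with an explicit stack. Following only depends_on edges, and not uses, is an assumption about kg_tools.py.

```python
from collections import defaultdict


def transitive_dependencies(edges, node_id):
    """Return every node reachable from node_id via depends_on edges."""
    graph = defaultdict(set)
    for e in edges:
        if e["rel"] == "depends_on":
            graph[e["src"]].add(e["dst"])
    seen, stack = set(), [node_id]
    while stack:
        for dep in graph[stack.pop()]:
            if dep not in seen:
                seen.add(dep)
                stack.append(dep)
    seen.discard(node_id)  # like nx.descendants, exclude the start node even on a cycle
    return seen
```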

Find what's blocking progress:

blockers = kg.find_blockers("CH-11")
for blocker_id, status, title in blockers:
    print(f"  {blocker_id} ({status}): {title}")
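A blocker can be read as any transitive dependency that is not yet complete. This sketch returns (id, status, title) tuples of the shape the loop above unpacks; using the node's summary as the title, and treating every non-complete status as blocking, are assumptions.

```python
from collections import defaultdict


def find_blockers(nodes, edges, node_id):
    """Return (id, status, title) for every non-complete transitive dependency."""
    graph = defaultdict(set)
    for e in edges:
        if e["rel"] == "depends_on":
            graph[e["src"]].add(e["dst"])
    seen, stack = set(), [node_id]
    while stack:
        for dep in graph[stack.pop()]:
            if dep not in seen:
                seen.add(dep)
                stack.append(dep)
    seen.discard(node_id)

    blockers = []
    for dep in sorted(seen):
        attrs = nodes.get(dep, {})
        status = attrs.get("status", "missing")  # dangling references also block
        if status != "complete":
            blockers.append((dep, status, attrs.get("summary", "")))
    return blockers
```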

Generate coverage report:

coverage = kg.coverage_report()
for kind, stats in coverage.items():
    print(f"{kind}: {stats['tested']}/{stats['total']} ({stats['coverage_pct']}%)")
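The report shape used above (tested, total, and coverage_pct per kind) can be reproduced with one pass over tested_by edges. A sketch, assuming coverage is computed for equations and modules and that a node counts as tested only when it has its own tested_by edge.

```python
def coverage_report(nodes, edges, kinds=("equation", "module")):
    """Return {kind: {"tested", "total", "coverage_pct"}} for the given kinds."""
    tested = {e["src"] for e in edges if e["rel"] == "tested_by"}
    report = {}
    for kind in kinds:
        ids = [nid for nid, attrs in nodes.items() if attrs.get("kind") == kind]
        hit = sum(1 for nid in ids if nid in tested)
        pct = round(100 * hit / len(ids), 1) if ids else 0.0  # avoid dividing by zero
        report[kind] = {"tested": hit, "total": len(ids), "coverage_pct": pct}
    return report
```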

Get chapter summary:

summary = kg.chapter_summary("CH-1")
print(f"Chapter 1 defines {summary['defines_count']} items")
print(f"Forward refs: {[r['id'] for r in summary['forward_refs']]}")
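The two fields used above can be derived from a chapter's outgoing edges. A sketch under the usual assumptions (nodes keyed by ID, {src, dst, rel} edges); it reproduces only defines_count and forward_refs, not whatever else chapter_summary() returns.

```python
def chapter_summary(nodes, edges, chapter_id):
    """Count defines edges and list forward references leaving a chapter."""
    outgoing = [e for e in edges if e["src"] == chapter_id]
    return {
        "defines_count": sum(1 for e in outgoing if e["rel"] == "defines"),
        "forward_refs": [
            {"id": e["dst"], "status": nodes.get(e["dst"], {}).get("status", "missing")}
            for e in outgoing
            if e["rel"] == "refers_to_future"
        ],
    }
```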

Command-Line Usage

Run statistics:

python docs/knowledge_graph/kg_tools.py

Validate the graph:

python docs/knowledge_graph/validate_kg.py

Visualize a chapter:

python docs/knowledge_graph/visualize_kg.py --chapter CH-1 --output ch01.png

Visualize dependencies:

python docs/knowledge_graph/visualize_kg.py --deps CH-11 --output ch11_deps.png

Generate implementation map:

python docs/knowledge_graph/visualize_kg.py --impl-map --output impl_map.png

Advanced Queries

Find unimplemented equations:

unimpl = kg.find_unimplemented_equations()

Find circular dependencies:

cycles = kg.find_cycles()
if cycles:
    print(f"Found {len(cycles)} circular dependencies!")
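A depends_on cycle can be detected with a depth-first search that tracks the current path (white/gray/black coloring). NetworkX provides nx.simple_cycles to enumerate every elementary cycle; the lighter sketch below reports one cycle per back edge it meets, which is enough to flag a problem. Restricting the search to depends_on edges is an assumption.

```python
from collections import defaultdict


def find_cycles(edges):
    """Return cycles in the depends_on graph as node lists that end where they start."""
    graph = defaultdict(list)
    for e in edges:
        if e["rel"] == "depends_on":
            graph[e["src"]].append(e["dst"])
    WHITE, GRAY, BLACK = 0, 1, 2  # unvisited, on current path, finished
    color = defaultdict(int)
    path, cycles = [], []

    def visit(u):
        color[u] = GRAY
        path.append(u)
        for v in graph[u]:
            if color[v] == GRAY:  # back edge: v is on the current path
                cycles.append(path[path.index(v):] + [v])
            elif color[v] == WHITE:
                visit(v)
        path.pop()
        color[u] = BLACK

    for node in list(graph):
        if color[node] == WHITE:
            visit(node)
    return cycles
```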

Get all nodes by kind:

equations = kg.nodes_by_kind('equation')
modules = kg.nodes_by_kind('module')

Get all nodes by status:

planned = kg.nodes_by_status('planned')
in_progress = kg.nodes_by_status('in_progress')

Check implementation status:

impl_status = kg.implementation_status()
print(f"Implemented: {len(impl_status['implemented'])}")
print(f"Unimplemented: {len(impl_status['unimplemented'])}")

Export comprehensive stats:

stats = kg.export_stats()
print(f"Total nodes: {stats['total_nodes']}")
print(f"Total edges: {stats['total_edges']}")
print(f"Orphan nodes: {stats['orphan_nodes']}")
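Orphans are nodes that no edge touches in either direction. A sketch of that definition; whether export_stats() stores a count or a list under orphan_nodes is not specified, so this returns the sorted IDs.

```python
def find_orphan_nodes(nodes, edges):
    """Return IDs of nodes that appear in no edge as source or target."""
    touched = {e["src"] for e in edges} | {e["dst"] for e in edges}
    return sorted(set(nodes) - touched)
```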

Validation Checks

The validator (validate_kg.py) performs:

  • Referential integrity — No dangling references to non-existent nodes
  • File existence — All referenced files exist in repository
  • Anchor presence — Declared anchors exist in files
  • Status consistency — Complete nodes don't depend on planned nodes
  • Circular dependencies — No cycles in dependency graph
  • Orphan detection — Nodes with no connections
  • Coverage gaps — Equations/modules without tests
  • Schema compliance — Valid node kinds, statuses, and edge types
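Two of these checks, referential integrity and status consistency, fit in a few lines. This sketch applies status consistency to depends_on edges only, since a refers_to_future edge from a complete chapter to a planned one is exactly the forward-hook case and should pass. It then maps the result to the validator's exit-code convention (0 passed, 1 failed); the function names are hypothetical.

```python
def validate(nodes, edges):
    """Return error messages for dangling references and complete-on-planned dependencies."""
    errors = []
    for e in edges:
        for end in ("src", "dst"):
            if e[end] not in nodes:
                errors.append(f"dangling {end} {e[end]!r} in {e['rel']} edge")
    for e in edges:
        if e["rel"] != "depends_on":
            continue
        src, dst = nodes.get(e["src"]), nodes.get(e["dst"])
        if src and dst and src.get("status") == "complete" and dst.get("status") == "planned":
            errors.append(f"{e['src']} is complete but depends on planned {e['dst']}")
    return errors


def exit_code(errors):
    """0 if validation passed, 1 if it failed."""
    return 1 if errors else 0
```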

Run validation before committing changes to graph.yaml:

python docs/knowledge_graph/validate_kg.py
# Exit code 0 if passed, 1 if failed

Visualization

The visualizer generates:

  1. Chapter dependency graphs — What each chapter defines and depends on
  2. Dependency trees — Transitive dependencies for any node
  3. Implementation maps — Which modules implement which equations/algorithms

Color scheme:

  • Blue: Chapters
  • Orange: Equations
  • Red: Theorems
  • Purple: Definitions
  • Green: Modules
  • Yellow: Tests
  • Teal: Algorithms
  • Brown: Concepts

Status markers:

  • ○ Circle: Planned
  • □ Square: In Progress
  • ◇ Diamond: Complete
  • × X: Archived
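The legend maps naturally onto two lookup tables. A sketch assuming matplotlib color names and marker codes ("o" circle, "s" square, "D" diamond, "x" cross); visualize_kg.py may use different shades, and the kind strings other than chapter, equation, and module are assumptions.

```python
# Assumed node-kind strings mapped to the legend colors.
KIND_COLORS = {
    "chapter": "blue",
    "equation": "orange",
    "theorem": "red",
    "definition": "purple",
    "module": "green",
    "test": "yellow",
    "algorithm": "teal",
    "concept": "brown",
}

# Status -> matplotlib marker code matching the legend shapes.
STATUS_MARKERS = {
    "planned": "o",      # circle
    "in_progress": "s",  # square
    "complete": "D",     # diamond
    "archived": "x",     # X
}


def node_style(kind, status, default_color="gray", default_marker="."):
    """Return a (color, marker) pair, falling back to defaults for unknown values."""
    return KIND_COLORS.get(kind, default_color), STATUS_MARKERS.get(status, default_marker)
```

The pair can be passed as the color= and marker= arguments of matplotlib's scatter.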

Example visualizations:

# Chapter 1 with depth 2, excluding tests
python docs/knowledge_graph/visualize_kg.py --chapter CH-1 --depth 2 --output ch01.png

# Show all dependencies for CH-11
python docs/knowledge_graph/visualize_kg.py --deps CH-11 --output ch11_deps.png

# Implementation map showing module → equation links
python docs/knowledge_graph/visualize_kg.py --impl-map --output impl.png

Integration with Workflow

After writing a new chapter:

  1. Update graph.yaml with new nodes/edges
  2. Run validator: python docs/knowledge_graph/validate_kg.py
  3. Fix any errors/warnings
  4. Generate visualization: python docs/knowledge_graph/visualize_kg.py --chapter CH-X
  5. Commit both graph.yaml and visualization PNGs

When planning new content:

  1. Use kg.find_blockers("CH-X") to see what needs to be written first
  2. Use kg.transitive_dependencies("CH-X") to understand full dependency chain
  3. Add forward hooks for future chapters/equations

For code reviews:

  1. Run kg.coverage_report() to check test coverage
  2. Run kg.implementation_status() to verify equations are implemented
  3. Check kg.find_missing_refs() for dangling references

Best Practices

  1. Keep YAML as source of truth — All edits go to graph.yaml, not the NetworkX graph
  2. Run validation frequently — Catch issues early
  3. Visualize complex dependencies — Better than manually tracing edges
  4. Use queries for reports — Don't manually count nodes
  5. Commit visualizations — Helps reviewers understand structure

API Reference

See docstrings in kg_tools.py for complete API documentation:

help(KnowledgeGraph)

Key methods:

  • nodes_by_kind(kind) — Get all nodes of a type
  • nodes_by_status(status) — Get nodes by status
  • get_node_data(node_id) — Get all attributes for a node
  • find_untested_equations() — Find equations without tests
  • find_unimplemented_equations() — Find equations without implementations
  • transitive_dependencies(node_id) — Get dependency closure
  • transitive_dependents(node_id) — Get dependent closure
  • find_blockers(node_id) — Find incomplete dependencies
  • find_missing_refs() — Find dangling references
  • find_orphan_nodes() — Find isolated nodes
  • find_cycles() — Detect circular dependencies
  • coverage_report() — Test coverage by kind
  • status_summary() — Nodes by status
  • chapter_summary(chapter_id) — Chapter statistics
  • implementation_status() — Implementation coverage
  • export_stats() — Comprehensive statistics