# Knowledge Graph — Structure and Update Guide
Purpose: maintain a compressed knowledge graph of the book, simulator, methods, definitions/equations, proofs, and code links. It is human‑readable and LLM‑friendly, and supports forward/backward hooks to reference not‑yet‑written parts.
- Primary file: `docs/knowledge_graph/graph.yaml`
- Schema (fields + enums): `docs/knowledge_graph/schema.yaml`
- Optional: per-chapter stubs can be added later and merged into `graph.yaml`.
## ID Conventions

- Chapters: `CH-<num>` (e.g., `CH-1`, `CH-11`)
- Equations: `EQ-<chapter>.<num>[suffix]` (e.g., `EQ-1.2`, `EQ-1.2-prime`)
- Remarks/Defs/Theorems: `REM-<id>`, `DF-<id>`, `TH-<id>`
- Concepts: `CN-<slug>` (e.g., `CN-CMDP`, `CN-OPE`)
- Modules: `MOD-<package.path>` (e.g., `MOD-zoosim.reward`)
- Tests: `TEST-<path>`
- Plans: `PLN-<slug>` (e.g., `PLN-Frontier-Phase`)
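The ID conventions are regular enough to check mechanically. A minimal sketch, assuming that `<id>`, `<slug>`, `<path>`, and `[suffix]` use only word characters, dots, hyphens (and slashes for test paths); the `ID_PATTERNS` table and `id_kind` helper are hypothetical, not part of the project's tooling.

```python
import re
from typing import Optional

# Hypothetical patterns derived from the ID conventions above.
# The character classes allowed in <id>, <slug>, <path>, and [suffix] are assumptions.
ID_PATTERNS = {
    "chapter": r"CH-\d+",
    "equation": r"EQ-\d+\.\d+(-[A-Za-z0-9]+)?",
    "remark": r"REM-[\w.-]+",
    "definition": r"DF-[\w.-]+",
    "theorem": r"TH-[\w.-]+",
    "concept": r"CN-[\w-]+",
    "module": r"MOD-[A-Za-z_]\w*(\.[A-Za-z_]\w*)*",
    "test": r"TEST-[\w./-]+",
    "plan": r"PLN-[\w-]+",
}

def id_kind(node_id: str) -> Optional[str]:
    """Return the kind whose pattern fully matches node_id, or None if none does."""
    for kind, pattern in ID_PATTERNS.items():
        if re.fullmatch(pattern, node_id):
            return kind
    return None
```

For example, `id_kind("EQ-1.2-prime")` returns `"equation"`, while a markdown anchor label such as `THM-2.3.1` matches no node-ID pattern, since node IDs use the `TH-` prefix.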
## Status

- `planned` → placeholder exists; content pending
- `in_progress` → partial content/code exists; iterating
- `complete` → content/code implemented and referenced
- `archived` → superseded by newer nodes
## Relations (Edge types)

- `defines` (chapter/section → definition/equation/concept)
- `proves` (chapter/section → theorem)
- `uses` (chapter/section/module → node)
- `implements` (module → equation/concept/algorithm)
- `tested_by` (module/node → test)
- `depends_on` (node → node)
- `refers_to_future` (node → planned node) [forward hook]
- `superseded_by` (archived → node)
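Because several relations constrain their endpoints, a per-edge check can catch misuse early. This minimal sketch assumes nodes are dicts keyed by `id` with a `status` field and edges are dicts with `src`, `dst`, and `rel`, mirroring the YAML entries in the Forward Hooks Workflow section; `check_edge` is hypothetical and enforces only the relation enum, referential integrity, and the two status constraints stated in the list (`refers_to_future` targets a `planned` node, `superseded_by` starts at an `archived` node).

```python
from typing import Dict, List

# Relation names from the edge-type list.
ALLOWED_RELS = {
    "defines", "proves", "uses", "implements",
    "tested_by", "depends_on", "refers_to_future", "superseded_by",
}

def check_edge(edge: Dict[str, str], nodes: Dict[str, Dict[str, str]]) -> List[str]:
    """Hypothetical check: return problems with one edge; an empty list means it looks valid."""
    problems = []
    src, dst, rel = edge["src"], edge["dst"], edge["rel"]
    if rel not in ALLOWED_RELS:
        problems.append(f"unknown relation {rel!r}")
    for end in (src, dst):
        if end not in nodes:
            problems.append(f"dangling reference {end!r}")
    if problems:
        return problems  # skip status checks once the edge is already known to be malformed
    if rel == "refers_to_future" and nodes[dst]["status"] != "planned":
        problems.append(f"forward hook target {dst} is {nodes[dst]['status']}, not planned")
    if rel == "superseded_by" and nodes[src]["status"] != "archived":
        problems.append(f"superseded_by source {src} is not archived")
    return problems
```

Read strictly, the `(node → planned node)` constraint also flags a hook whose target has since moved to `in_progress`, as happens when drafting of that chapter begins; such a result is better treated as a prompt to review the hook than as a hard error.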
## How to Update

- Add or edit nodes/edges in `graph.yaml`.
- Keep `summary` one-line and concrete; include `file` and `anchor` when applicable.
- For forward hooks, add a `refers_to_future` edge from the current node to a `planned` node.
- Prefer referencing anchors over line numbers for stability.
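Some of these rules can be checked automatically. A minimal per-node lint, assuming node dicts with `kind`, `summary`, `file`, and `anchor` keys as in the examples in this guide; `lint_node`, the decision to exempt chapter nodes from the anchor rule, and the line-number heuristic are all hypothetical choices.

```python
import re
from typing import Dict, List

def lint_node(node: Dict[str, str]) -> List[str]:
    """Hypothetical lint mirroring the update rules; returns human-readable warnings."""
    issues: List[str] = []
    summary = node.get("summary", "").strip()
    if not summary:
        issues.append("missing summary")
    elif "\n" in summary:
        issues.append("summary spans multiple lines")
    # Assumption: chapter nodes point at a whole file, so only other kinds need an anchor.
    if node.get("file") and not node.get("anchor") and node.get("kind") != "chapter":
        issues.append("file given without anchor")
    anchor = node.get("anchor", "")
    # Heuristic: an anchor that is just a line number ("120" or "L120") breaks when the file changes.
    if anchor and re.fullmatch(r"L?\d+", anchor):
        issues.append("anchor looks like a line number; prefer a label such as {#EQ-1.2}")
    return issues
```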
## Anchor Format Conventions

- Equations: Use Pandoc-style labels: `{#EQ-1.2}` on the line immediately after the closing `$$` of the display block that contains `\tag{1.2}`
- Theorems/Definitions: Use Pandoc labels: `{#THM-2.3.1}`, `{#DEF-1.1.1}`
- Sections/Headings: Use standard markdown anchors: `#section-title` (auto-generated by MkDocs)
- In KG nodes: Store anchors with braces (`anchor: "{#EQ-1.2}"`) for consistency with Pandoc processing
- Validation: The validator checks for substring presence, so an anchor stored as either `{#EQ-1.2}` or `#EQ-1.2` will match a file containing `{#EQ-1.2}`
Example:

```markdown
$$
R = \alpha \cdot \text{GMV} + \beta \cdot \text{CM2}
\tag{1.2}
$$
{#EQ-1.2}
```
In `graph.yaml`:

```yaml
anchor: "{#EQ-1.2}"
```
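The substring rule from the Validation convention can be reproduced in a few lines. This is a sketch of the behavior as described, not the validator's code; `anchor_present` is a hypothetical name, and stripping the braces is an assumption about how both stored forms end up matching.

```python
def anchor_present(file_text: str, anchor: str) -> bool:
    """Hypothetical substring check: '{#EQ-1.2}' and '#EQ-1.2' both reduce to '#EQ-1.2'."""
    core = anchor.strip().strip("{}")  # drop surrounding braces, keep the '#LABEL' core
    return core in file_text
```

A pure substring test has one side effect worth knowing: `#EQ-1.2` is also found inside `{#EQ-1.20}`, so a missing label can pass validation if another label starts with the same characters.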
## Forward Hooks Workflow

When a draft chapter cites content that doesn't exist yet:

- Create a planned node for the future content:

    ```yaml
    - id: CH-11
      kind: chapter
      status: planned
      file: docs/book/drafts/syllabus.md  # Points to planning doc initially
      summary: Multi-episode inter-session MDP
    ```

- Add a `refers_to_future` edge from the current chapter:

    ```yaml
    - {src: CH-1, dst: CH-11, rel: refers_to_future, status: complete}
    ```

- Update the node when drafting begins:

    ```yaml
    - id: CH-11
      status: in_progress
      file: docs/book/drafts/ch11_multi_episode.md  # Now points to actual draft
    ```

- In the draft text, add a cross-reference box:

    ```markdown
    !!! note "Forward Reference — Chapter 11"
        The multi-episode formulation (Chapter 11) operationalizes this with retention hazard models. See [EQ-1.2-prime] for the principled objective.
    ```
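The cross-reference box cites `[EQ-1.2-prime]` by node ID, so a draft can be scanned for bracketed IDs that have no node in the graph yet. A minimal sketch, assuming references always appear as `[ID]` with one of the prefixes from the ID conventions; the regex and `unresolved_refs` are hypothetical and not part of `kg_tools.py`.

```python
import re
from typing import Iterable, List

# Hypothetical pattern: bracketed IDs such as [EQ-1.2-prime] or [CH-11],
# restricted to the prefixes from the ID conventions.
REF_RE = re.compile(r"\[((?:CH|EQ|REM|DF|TH|CN|MOD|TEST|PLN)-[^\]\s]+)\]")

def unresolved_refs(draft_text: str, node_ids: Iterable[str]) -> List[str]:
    """Return bracketed IDs in the draft that have no node, in first-seen order."""
    known = set(node_ids)
    missing: List[str] = []
    for ref in REF_RE.findall(draft_text):
        if ref not in known and ref not in missing:
            missing.append(ref)
    return missing
```

Any ID it reports is a candidate for a new `planned` node plus a `refers_to_future` edge, as in the first two steps of the workflow.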
## Example (abbreviated)

- Node: `EQ-1.2` (equation) defined in Chapter 1; implemented by `zoosim/dynamics/reward.py`.
- Node: `CH-11` (chapter) implements `EQ-1.2-prime`; depends on `MOD-zoosim.session_env`.
- Edge: `MOD-zoosim.session_env implements EQ-1.2-prime`.
- Edge: `CH-1 refers_to_future CH-11` (forward hook).

See `graph.yaml` for a working set seeded from current content.
## NetworkX Query Tools

The knowledge graph includes NetworkX-based Python utilities for queries, validation, and visualization.
### Tools Overview

- `kg_tools.py` — Core graph query library
- `validate_kg.py` — Comprehensive validation and consistency checking
- `visualize_kg.py` — Dependency graph visualization
### Installation

```bash
# Install dependencies (already in project environment)
pip install networkx pyyaml matplotlib
```
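### Quick Start

Load the graph:

```python
# Run from the repository root with PYTHONPATH=docs/knowledge_graph so kg_tools is importable
from kg_tools import KnowledgeGraph

kg = KnowledgeGraph("docs/knowledge_graph/graph.yaml")
```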
Find untested equations:

```python
untested = kg.find_untested_equations()
print(f"Untested equations: {untested}")
```
Get transitive dependencies:

```python
deps = kg.transitive_dependencies("CH-11")
print(f"CH-11 depends on: {deps}")
```
Find what's blocking progress:

```python
blockers = kg.find_blockers("CH-11")
for blocker_id, status, title in blockers:
    print(f"  {blocker_id} ({status}): {title}")
```
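Both queries reduce to graph reachability. The sketch below shows what they compute using only the standard library; it assumes dependencies follow `depends_on` edges only and that a blocker is any reachable node whose status is not `complete`. The real `kg_tools.py` may follow other edge types or rely on NetworkX (where `networkx.descendants(G, node)` returns the same closure on a directed graph), so `transitive_deps` and `blockers` are hypothetical names.

```python
from collections import deque
from typing import Dict, List, Set, Tuple

def transitive_deps(node_id: str, edges: List[Dict[str, str]]) -> Set[str]:
    """All nodes reachable from node_id along depends_on edges (breadth-first search)."""
    adjacency: Dict[str, List[str]] = {}
    for e in edges:
        if e["rel"] == "depends_on":
            adjacency.setdefault(e["src"], []).append(e["dst"])
    seen: Set[str] = set()
    queue = deque([node_id])
    while queue:
        for nxt in adjacency.get(queue.popleft(), []):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    seen.discard(node_id)  # exclude the start node, as networkx.descendants does
    return seen

def blockers(node_id: str, nodes: Dict[str, Dict[str, str]],
             edges: List[Dict[str, str]]) -> List[Tuple[str, str, str]]:
    """(id, status, summary) for each transitive dependency that is not complete.

    Assumes every edge endpoint exists in nodes (referential integrity holds).
    """
    return sorted(
        (d, nodes[d]["status"], nodes[d].get("summary", ""))
        for d in transitive_deps(node_id, edges)
        if nodes[d]["status"] != "complete"
    )
```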
Generate coverage report:

```python
coverage = kg.coverage_report()
for kind, stats in coverage.items():
    print(f"{kind}: {stats['tested']}/{stats['total']} ({stats['coverage_pct']}%)")
```
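The coverage figures come from counting `tested_by` edges per node kind. A sketch of that arithmetic, assuming a node counts as tested when it is the source of at least one `tested_by` edge and that `coverage_pct` is rounded to one decimal place; the real report's rounding and the kinds it includes may differ, and `coverage` is a hypothetical helper.

```python
from collections import Counter
from typing import Dict, List

def coverage(nodes: Dict[str, Dict[str, str]], edges: List[Dict[str, str]]) -> Dict[str, dict]:
    """Per-kind {'tested', 'total', 'coverage_pct'}, using the same keys as the report above."""
    tested_ids = {e["src"] for e in edges if e["rel"] == "tested_by"}
    totals = Counter(n["kind"] for n in nodes.values())
    tested = Counter(n["kind"] for nid, n in nodes.items() if nid in tested_ids)
    return {
        kind: {
            "tested": tested[kind],
            "total": total,
            "coverage_pct": round(100 * tested[kind] / total, 1),  # rounding is an assumption
        }
        for kind, total in totals.items()
    }
```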
Get chapter summary:

```python
summary = kg.chapter_summary("CH-1")
print(f"Chapter 1 defines {summary['defines_count']} items")
print(f"Forward refs: {[r['id'] for r in summary['forward_refs']]}")
```
### Command-Line Usage

Run statistics:

```bash
python docs/knowledge_graph/kg_tools.py
```

Validate the graph:

```bash
python docs/knowledge_graph/validate_kg.py
```

Visualize a chapter:

```bash
python docs/knowledge_graph/visualize_kg.py --chapter CH-1 --output ch01.png
```

Visualize dependencies:

```bash
python docs/knowledge_graph/visualize_kg.py --deps CH-11 --output ch11_deps.png
```

Generate implementation map:

```bash
python docs/knowledge_graph/visualize_kg.py --impl-map --output impl_map.png
```
### Advanced Queries

Find unimplemented equations:

```python
unimpl = kg.find_unimplemented_equations()
```
Find circular dependencies:

```python
cycles = kg.find_cycles()
if cycles:
    print(f"Found {len(cycles)} circular dependencies!")
```
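Cycle detection is a depth-first search that looks for an edge leading back into the current path. A stdlib sketch over `depends_on` edges, assuming that is the relation `find_cycles()` inspects; with NetworkX the equivalents are `networkx.simple_cycles(G)` (all cycles) or `networkx.find_cycle(G)` (one cycle). `has_cycle` is hypothetical and only reports whether a cycle exists.

```python
from typing import Dict, List

def has_cycle(edges: List[Dict[str, str]]) -> bool:
    """True if the depends_on edges contain a directed cycle (iterative three-colour DFS)."""
    graph: Dict[str, List[str]] = {}
    for e in edges:
        if e["rel"] == "depends_on":
            graph.setdefault(e["src"], []).append(e["dst"])
            graph.setdefault(e["dst"], [])
    WHITE, GREY, BLACK = 0, 1, 2  # unvisited, on the current DFS path, finished
    colour = {n: WHITE for n in graph}
    for start in graph:
        if colour[start] != WHITE:
            continue
        colour[start] = GREY
        stack = [(start, iter(graph[start]))]
        while stack:
            node, children = stack[-1]
            child = next(children, None)
            if child is None:  # all children explored
                colour[node] = BLACK
                stack.pop()
            elif colour[child] == GREY:
                return True  # back edge: child is already on the current path
            elif colour[child] == WHITE:
                colour[child] = GREY
                stack.append((child, iter(graph[child])))
    return False
```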
Get all nodes by kind:

```python
equations = kg.nodes_by_kind('equation')
modules = kg.nodes_by_kind('module')
```

Get all nodes by status:

```python
planned = kg.nodes_by_status('planned')
in_progress = kg.nodes_by_status('in_progress')
```

Check implementation status:

```python
impl_status = kg.implementation_status()
print(f"Implemented: {len(impl_status['implemented'])}")
print(f"Unimplemented: {len(impl_status['unimplemented'])}")
```

Export comprehensive stats:

```python
stats = kg.export_stats()
print(f"Total nodes: {stats['total_nodes']}")
print(f"Total edges: {stats['total_edges']}")
print(f"Orphan nodes: {stats['orphan_nodes']}")
```
### Validation Checks

The validator (`validate_kg.py`) performs:

- ✅ Referential integrity — No dangling references to non-existent nodes
- ✅ File existence — All referenced files exist in the repository
- ✅ Anchor presence — Declared anchors exist in their files
- ✅ Status consistency — Complete nodes don't depend on planned nodes
- ✅ Circular dependencies — No cycles in the dependency graph
- ✅ Orphan detection — Nodes with no connections
- ✅ Coverage gaps — Equations/modules without tests
- ✅ Schema compliance — Valid node kinds, statuses, and edge types
Run validation before committing changes to `graph.yaml`:

```bash
python docs/knowledge_graph/validate_kg.py
# Exit code 0 if passed, 1 if failed
```
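Two of these checks, plus the exit-code convention, are simple enough to sketch. This illustration assumes node and edge dicts shaped like the forward-hooks YAML, `depends_on` as the dependency relation, orphans meaning nodes that appear in no edge, and orphans being warnings rather than failures; `status_errors`, `orphans`, and `main` are hypothetical and not the code in `validate_kg.py`.

```python
from typing import Dict, List

def status_errors(nodes: Dict[str, Dict[str, str]], edges: List[Dict[str, str]]) -> List[str]:
    """Complete nodes must not depend on planned nodes."""
    return [
        f"{e['src']} is complete but depends on planned {e['dst']}"
        for e in edges
        if e["rel"] == "depends_on"
        and nodes.get(e["src"], {}).get("status") == "complete"
        and nodes.get(e["dst"], {}).get("status") == "planned"
    ]

def orphans(nodes: Dict[str, Dict[str, str]], edges: List[Dict[str, str]]) -> List[str]:
    """Nodes that appear in no edge at all."""
    linked = {e["src"] for e in edges} | {e["dst"] for e in edges}
    return sorted(set(nodes) - linked)

def main(nodes: Dict[str, Dict[str, str]], edges: List[Dict[str, str]]) -> int:
    errors = status_errors(nodes, edges)
    for msg in errors:
        print("ERROR:", msg)
    for nid in orphans(nodes, edges):
        print("WARNING: orphan node", nid)  # assumption: reported, but does not fail the run
    return 1 if errors else 0  # exit code: 0 if passed, 1 if failed
```

A script would end with `sys.exit(main(nodes, edges))` after loading `graph.yaml`. Note that a complete chapter may still carry a `refers_to_future` edge to a planned node; only `depends_on` edges trip the status check.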
### Visualization

The visualizer generates:

- Chapter dependency graphs — What each chapter defines and depends on
- Dependency trees — Transitive dependencies for any node
- Implementation maps — Which modules implement which equations/algorithms

Color scheme:

- Blue: Chapters
- Orange: Equations
- Red: Theorems
- Purple: Definitions
- Green: Modules
- Yellow: Tests
- Teal: Algorithms
- Brown: Concepts

Status markers:

- ○ Circle: Planned
- □ Square: In Progress
- ◇ Diamond: Complete
- × X: Archived
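The legend amounts to two lookup tables. A sketch of how they might map onto Matplotlib styling, assuming Matplotlib named colours and marker codes (`'o'` circle, `'s'` square, `'D'` diamond, `'x'` cross); kind strings other than `chapter`, `equation`, and `module` are guesses, and `style_for` is hypothetical rather than how `visualize_kg.py` is necessarily written.

```python
from typing import Tuple

# Hypothetical kind -> Matplotlib named colour table, following the colour scheme.
KIND_COLOURS = {
    "chapter": "blue",
    "equation": "orange",
    "theorem": "red",
    "definition": "purple",
    "module": "green",
    "test": "yellow",
    "algorithm": "teal",
    "concept": "brown",
}
# Hypothetical status -> Matplotlib marker code table, following the status markers.
STATUS_MARKERS = {"planned": "o", "in_progress": "s", "complete": "D", "archived": "x"}

def style_for(kind: str, status: str) -> Tuple[str, str]:
    """(colour, marker) for a node; unknown kinds/statuses fall back to a grey point marker."""
    return KIND_COLOURS.get(kind, "grey"), STATUS_MARKERS.get(status, ".")
```

The pair can be passed straight to Matplotlib, for example `ax.scatter(x, y, c=colour, marker=marker)` for a single node.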
Example visualizations:

```bash
# Chapter 1 with depth 2, excluding tests
python docs/knowledge_graph/visualize_kg.py --chapter CH-1 --depth 2 --output ch01.png

# Show all dependencies for CH-11
python docs/knowledge_graph/visualize_kg.py --deps CH-11 --output ch11_deps.png

# Implementation map showing module → equation links
python docs/knowledge_graph/visualize_kg.py --impl-map --output impl.png
```
### Integration with Workflow

After writing a new chapter:

- Update `graph.yaml` with new nodes/edges
- Run validator: `python docs/knowledge_graph/validate_kg.py`
- Fix any errors/warnings
- Generate visualization: `python docs/knowledge_graph/visualize_kg.py --chapter CH-X`
- Commit both `graph.yaml` and visualization PNGs

When planning new content:

- Use `kg.find_blockers("CH-X")` to see what needs to be written first
- Use `kg.transitive_dependencies("CH-X")` to understand the full dependency chain
- Add forward hooks for future chapters/equations

For code reviews:

- Run `kg.coverage_report()` to check test coverage
- Run `kg.implementation_status()` to verify equations are implemented
- Check `kg.find_missing_refs()` for dangling references
### Best Practices

- Keep YAML as source of truth — All edits go to `graph.yaml`, not the NetworkX graph
- Run validation frequently — Catch issues early
- Visualize complex dependencies — Better than manually tracing edges
- Use queries for reports — Don't manually count nodes
- Commit visualizations — Helps reviewers understand structure
### API Reference

See docstrings in `kg_tools.py` for complete API documentation:

```python
from kg_tools import KnowledgeGraph

help(KnowledgeGraph)
```

Key methods:

- `nodes_by_kind(kind)` — Get all nodes of a type
- `nodes_by_status(status)` — Get nodes by status
- `get_node_data(node_id)` — Get all attributes for a node
- `find_untested_equations()` — Find equations without tests
- `find_unimplemented_equations()` — Find equations without implementations
- `transitive_dependencies(node_id)` — Get dependency closure
- `transitive_dependents(node_id)` — Get dependent closure
- `find_blockers(node_id)` — Find incomplete dependencies
- `find_missing_refs()` — Find dangling references
- `find_orphan_nodes()` — Find isolated nodes
- `find_cycles()` — Detect circular dependencies
- `coverage_report()` — Test coverage by kind
- `status_summary()` — Nodes by status
- `chapter_summary(chapter_id)` — Chapter statistics
- `implementation_status()` — Implementation coverage
- `export_stats()` — Comprehensive statistics