# Muse VCS: Architecture Reference

> **Version:** v1.0 (Phases 1–4 complete)
> **See also:** [Plugin Authoring Guide](../guide/plugin-authoring-guide.md) · [CRDT Reference](../guide/crdt-reference.md) · [E2E Walkthrough](muse-e2e-demo.md) · [Plugin Protocol](../protocol/muse-protocol.md) · [Domain Concepts](../protocol/muse-domain-concepts.md) · [Type Contracts](../reference/type-contracts.md)

---

## What Muse Is

Muse is a **domain-agnostic version control system for multidimensional state**. It provides
a complete DAG engine (content-addressed objects, commits, branches, three-way merge, drift
detection, time-travel checkout, and a full log graph) with one deliberate gap: it does not
know what "state" is.

That gap is the plugin slot. A `MuseDomainPlugin` tells Muse how to interpret your domain's
data. Everything else (the DAG, object store, branching, lineage walking, log, and merge state
machine) is provided by the core engine and shared across all domains.

Muse v1.0 adds **four layers of semantic richness** on top of that base. Phases 1 and 2 are
part of the required `MuseDomainPlugin` protocol; Phases 3 and 4 are optional protocol
extensions that plugins can adopt without breaking anything:

| Phase | Protocol | What you gain |
|-------|----------|---------------|
| 1: Typed Delta Algebra | `MuseDomainPlugin` (required) | Rich, typed operation lists instead of opaque file diffs |
| 2: Domain Schema | `MuseDomainPlugin.schema()` (required) | Algorithm selection driven by declared data structure |
| 3: OT Merge Engine | `StructuredMergePlugin` (optional) | Sub-file auto-merge using Operational Transformation |
| 4: CRDT Semantics | `CRDTPlugin` (optional) | Convergent join; conflicts cannot occur |

---

## The Seven Invariants

```
State    = a serializable, content-addressed snapshot of any multidimensional space
Commit   = a named delta from a parent state, recorded in a DAG
Branch   = a divergent line of intent forked from a shared ancestor
Merge    = three-way reconciliation of two divergent state lines against a common base
Drift    = the gap between committed state and live state
Checkout = deterministic reconstruction of any historical state from the DAG
Lineage  = the causal chain from root to any commit
```

None of those definitions contain the word "music."

---

## Repository Structure on Disk

Every Muse repository is anchored by a `.muse/` directory:

```
.muse/
  repo.json              # repository ID, domain name, creation metadata
  HEAD                   # ref pointer, e.g. refs/heads/main
  config.toml            # optional local config (auth token, remotes)
  refs/
    heads/
      main               # SHA-256 commit ID of branch HEAD
      feature/…          # additional branch HEADs
  objects/
    <sha2>/              # shard directory (first 2 hex chars)
      <sha62>            # raw content-addressed blob (remaining 62 hex chars)
  commits/
    <commit_id>.json     # CommitRecord (includes structured_delta since Phase 1)
  snapshots/
    <snapshot_id>.json   # SnapshotRecord (manifest: {path → object_id})
  tags/
    <tag_id>.json        # TagRecord
  MERGE_STATE.json       # present only during an active merge conflict
muse-work/               # the working tree (domain files live here)
.museattributes          # optional: per-path merge strategy overrides
.museignore              # optional: paths excluded from snapshots
```

The object store mirrors Git's loose-object layout: sharding by the first two hex characters
of each SHA-256 digest keeps any single directory from accumulating too many entries as the
repository grows.

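The sharded layout described above can be sketched in a few lines. This is a minimal illustration, not the code in `object_store.py`: the helper names `object_path` and `write_object` are hypothetical, and the sketch assumes blobs are stored uncompressed under the first two and remaining 62 hex characters of their SHA-256 digest. Skipping the write when the file already exists is one simple way to make repeated writes of the same content idempotent.

```python
import hashlib
import pathlib
import tempfile


def object_path(objects_dir: pathlib.Path, object_id: str) -> pathlib.Path:
    """Map a 64-char hex digest to <objects_dir>/<first 2 chars>/<remaining 62 chars>."""
    return objects_dir / object_id[:2] / object_id[2:]


def write_object(objects_dir: pathlib.Path, data: bytes) -> str:
    """Store a blob under its SHA-256 digest; an existing blob is left untouched."""
    object_id = hashlib.sha256(data).hexdigest()
    path = object_path(objects_dir, object_id)
    if not path.exists():
        path.parent.mkdir(parents=True, exist_ok=True)
        path.write_bytes(data)
    return object_id


with tempfile.TemporaryDirectory() as tmp:
    store = pathlib.Path(tmp) / ".muse" / "objects"
    first = write_object(store, b"hello muse")
    second = write_object(store, b"hello muse")  # same content, same ID, no rewrite
    stored = object_path(store, first)
    shard_name, blob_name = stored.parent.name, stored.name
    round_trip = stored.read_bytes()
```
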
---

## Core Engine Modules

```
muse/
  domain.py              # all protocol definitions and shared type aliases
  core/
    store.py             # file-based commit / snapshot / tag CRUD
    repo.py              # repository detection (MUSE_REPO_ROOT or directory walk)
    snapshot.py          # content-addressed snapshot and commit ID derivation
    object_store.py      # SHA-256 blob storage under .muse/objects/
    merge_engine.py      # three-way merge + CRDT join entry points
    op_transform.py      # Operational Transformation (Phase 3)
    schema.py            # DomainSchema TypedDicts (Phase 2)
    diff_algorithms/     # LCS, tree-edit, numerical, set diff (Phase 2)
    crdts/               # VectorClock, LWWRegister, ORSet, RGA, AWMap, GCounter (Phase 4)
    errors.py            # ExitCode enum
    attributes.py        # .museattributes loading and strategy resolution
  plugins/
    registry.py          # domain name → MuseDomainPlugin instance
    music/
      plugin.py          # MusicPlugin: reference implementation of all protocols
      midi_diff.py       # note-level MIDI diff and MIDI reconstruction
    scaffold/
      plugin.py          # copy-paste template for new domain plugins
  cli/
    app.py               # Typer application root, command registration
    commands/            # one file per subcommand (14 commands + domains)
```

---

## Deterministic ID Derivation

All IDs are SHA-256 digests, so the DAG is fully content-addressed:

```
object_id   = sha256(raw_file_bytes)
snapshot_id = sha256(sorted("path:object_id\n" pairs))
commit_id   = sha256(sorted_parent_ids | snapshot_id | message | timestamp_iso)
```

The same snapshot always produces the same ID. Two commits that point to identical state share
a `snapshot_id`. Objects are never overwritten, so writes are always idempotent.

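The three formulas above can be sketched as follows. The exact byte layout is not specified here, so this sketch makes several assumptions that may differ from `snapshot.py`: strings are UTF-8 encoded, `|` is a literal field separator, and parent IDs are joined with commas. The function names are hypothetical.

```python
import hashlib


def derive_object_id(raw: bytes) -> str:
    return hashlib.sha256(raw).hexdigest()


def derive_snapshot_id(manifest: dict[str, str]) -> str:
    # Sorting the "path:object_id\n" lines makes the digest independent of dict order.
    lines = sorted(f"{path}:{oid}\n" for path, oid in manifest.items())
    return hashlib.sha256("".join(lines).encode("utf-8")).hexdigest()


def derive_commit_id(
    parent_ids: list[str], snapshot_id: str, message: str, timestamp_iso: str
) -> str:
    # Assumption: fields joined with "|", sorted parent IDs joined with ",".
    payload = "|".join([",".join(sorted(parent_ids)), snapshot_id, message, timestamp_iso])
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


bass = derive_object_id(b"bass notes")
lead = derive_object_id(b"lead notes")

manifest_a = {"tracks/bass.mid": bass, "tracks/lead.mid": lead}
manifest_b = {"tracks/lead.mid": lead, "tracks/bass.mid": bass}  # same state, other order

snap_a = derive_snapshot_id(manifest_a)
snap_b = derive_snapshot_id(manifest_b)

# A merge commit with two parents: parent order must not change the ID.
merge_1 = derive_commit_id(["p1", "p2"], snap_a, "merge", "2024-01-01T00:00:00+00:00")
merge_2 = derive_commit_id(["p2", "p1"], snap_a, "merge", "2024-01-01T00:00:00+00:00")
```
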
---

## Phase 1: Typed Delta Algebra

Every commit now carries a `structured_delta: StructuredDelta` alongside the snapshot
manifest. A `StructuredDelta` is a list of typed `DomainOp` entries:

| Op type | Meaning |
|---------|---------|
| `InsertOp` | An element was added at a position |
| `DeleteOp` | An element was removed |
| `MoveOp` | An element was repositioned |
| `ReplaceOp` | An element's value changed (before/after content hashes) |
| `PatchOp` | A container was internally modified (carries child ops recursively) |

This replaces the old opaque `{added, removed, modified}` path lists entirely. Every operation
carries a `content_id` (SHA-256 hash of the element), an `address` (domain-specific location),
and a `content_summary` (human-readable description for `muse show`).

`muse show <commit>` and `muse diff` display note-level diffs for MIDI files: not just "file
changed" but "3 notes added at bar 4, 1 note removed from bar 7."

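To make the shape of a typed delta concrete, here is a minimal sketch of three of the five op types as `TypedDict`s, plus a helper that walks nested `PatchOp`s to produce the kind of counts `muse show` could summarize. Only `content_id`, `address`, and `content_summary` come from the description above; the `op` tag, the `child_ops` field name, the address strings, and the helper `count_leaf_ops` are assumptions made for illustration and may not match the definitions in `muse/domain.py`.

```python
import hashlib
from collections import Counter
from typing import Literal, TypedDict, Union


class InsertOp(TypedDict):
    op: Literal["insert"]
    address: str
    content_id: str
    content_summary: str


class DeleteOp(TypedDict):
    op: Literal["delete"]
    address: str
    content_id: str
    content_summary: str


class PatchOp(TypedDict):
    op: Literal["patch"]
    address: str
    content_id: str
    content_summary: str
    child_ops: list["DomainOp"]  # hypothetical field name for the recursive child ops


DomainOp = Union[InsertOp, DeleteOp, PatchOp]


def cid(text: str) -> str:
    """Placeholder content hash for the example."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


def count_leaf_ops(ops: list[DomainOp]) -> Counter[str]:
    """Count non-patch ops by kind, descending into PatchOp children."""
    counts: Counter[str] = Counter()
    for op in ops:
        if op["op"] == "patch":
            counts.update(count_leaf_ops(op["child_ops"]))
        else:
            counts[op["op"]] += 1
    return counts


notes: list[DomainOp] = [
    {"op": "insert", "address": "bar:4", "content_id": cid(n), "content_summary": f"note {n} added"}
    for n in ("C4", "E4", "G4")
]
notes.append(
    {"op": "delete", "address": "bar:7", "content_id": cid("A3"), "content_summary": "note A3 removed"}
)
delta: list[DomainOp] = [
    {
        "op": "patch",
        "address": "tracks/lead.mid",
        "content_id": cid("lead.mid"),
        "content_summary": "lead track modified",
        "child_ops": notes,
    }
]
summary = count_leaf_ops(delta)
```
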
---

## Phase 2: Domain Schema & Diff Algorithm Library

Plugins implement `schema() -> DomainSchema` to declare the structural shape of their data.
The schema drives algorithm selection in `diff_by_schema()`:

| Schema kind | Diff algorithm | Use when… |
|-------------|---------------|-----------|
| `"sequence"` | Myers LCS | Ordered lists (note events, DNA sequences) |
| `"tree"` | LCS-based tree edit | Hierarchical structures (scene graphs, XML) |
| `"tensor"` | Epsilon-tolerant numerical | N-dimensional arrays (simulation grids) |
| `"set"` | Hash-set algebra | Unordered collections (annotation sets) |
| `"map"` | Per-key comparison | Key-value maps (manifests, configs) |

`DomainSchema.merge_mode` controls which merge path the core engine takes:
- `"three_way"`: classic three-way merge (Phases 1–3)
- `"crdt"`: convergent CRDT join (Phase 4)

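Schema-driven algorithm selection amounts to a dispatch on the declared kind. The sketch below is a simplified stand-in, not the real `diff_by_schema()`: it covers only three kinds, the names `ALGORITHMS` and `diff_by_kind` are hypothetical, and the sequence case uses the standard library's `difflib.SequenceMatcher` (an LCS-like matcher) in place of a Myers implementation.

```python
import difflib
from typing import Any, Callable


def diff_sequence(old: list[Any], new: list[Any]) -> list[tuple[str, list[Any], list[Any]]]:
    """Ordered data: keep only the non-equal opcodes reported by difflib."""
    matcher = difflib.SequenceMatcher(a=old, b=new, autojunk=False)
    return [
        (tag, old[i1:i2], new[j1:j2])
        for tag, i1, i2, j1, j2 in matcher.get_opcodes()
        if tag != "equal"
    ]


def diff_set(old: set[Any], new: set[Any]) -> dict[str, set[Any]]:
    """Unordered data: plain set algebra."""
    return {"added": new - old, "removed": old - new}


def diff_map(old: dict[str, Any], new: dict[str, Any]) -> dict[str, list[str]]:
    """Keyed data: compare key by key."""
    return {
        "added": sorted(new.keys() - old.keys()),
        "removed": sorted(old.keys() - new.keys()),
        "changed": sorted(k for k in old.keys() & new.keys() if old[k] != new[k]),
    }


# Hypothetical dispatch table keyed by the schema kind a plugin declares.
ALGORITHMS: dict[str, Callable[[Any, Any], Any]] = {
    "sequence": diff_sequence,
    "set": diff_set,
    "map": diff_map,
}


def diff_by_kind(kind: str, old: Any, new: Any) -> Any:
    if kind not in ALGORITHMS:
        raise ValueError(f"no diff algorithm registered for schema kind {kind!r}")
    return ALGORITHMS[kind](old, new)


seq_delta = diff_by_kind("sequence", ["C4", "E4", "G4"], ["C4", "F4", "G4"])
set_delta = diff_by_kind("set", {"intro", "verse"}, {"verse", "chorus"})
map_delta = diff_by_kind("map", {"tempo": 120, "key": "C"}, {"tempo": 128, "key": "C", "swing": 0.2})
```
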
---

## Phase 3: Operation-Level Merge Engine

Plugins that implement `StructuredMergePlugin` gain sub-file auto-merge:

```python
import pathlib
from typing import Protocol, runtime_checkable

# MuseDomainPlugin, StateSnapshot, DomainOp, and MergeResult live in muse/domain.py.
@runtime_checkable
class StructuredMergePlugin(MuseDomainPlugin, Protocol):
    def merge_ops(
        self,
        base: StateSnapshot,
        ours_snap: StateSnapshot,
        theirs_snap: StateSnapshot,
        ours_ops: list[DomainOp],
        theirs_ops: list[DomainOp],
        *,
        repo_root: pathlib.Path | None = None,
    ) -> MergeResult: ...
```

The core merge engine detects this with `isinstance(plugin, StructuredMergePlugin)` and calls
`merge_ops()` when both branches carry a `StructuredDelta`. Plugins that do not implement the
protocol fall back to file-level `merge()` automatically.

### Operational Transformation (`muse/core/op_transform.py`)

| Function | Purpose |
|----------|---------|
| `ops_commute(a, b)` | Returns `True` when two ops can be applied in either order |
| `transform(a, b)` | Adjusts positions so the diamond property holds |
| `merge_op_lists(base, ours, theirs)` | Three-way OT merge; returns `MergeOpsResult` |
| `merge_structured(base_delta, ours_delta, theirs_delta)` | Wrapper for `StructuredDelta` inputs |

**Commutativity rules (all 25 op-pair combinations covered):**
- Different addresses → always commute
- `InsertOp` + `InsertOp` at same position → conflict
- `DeleteOp` + `DeleteOp` with the same `content_id` → idempotent (not a conflict)
- `PatchOp` + `PatchOp` → recursive check on child ops
- Cross-type pairs → generally commute (structural independence)

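The commutativity rules listed above can be read as a small decision procedure. The sketch below is an illustration under stated assumptions, not the `ops_commute()` in `op_transform.py`: the `Op` dataclass and its fields are hypothetical, cross-type pairs are treated as always commuting (the rules say only "generally"), and same-address cases the rules do not mention (two deletes of different content, move/move, replace/replace) are conservatively treated as conflicts.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Op:
    kind: str  # "insert" | "delete" | "move" | "replace" | "patch"
    address: str
    position: int | None = None
    content_id: str | None = None
    children: tuple[Op, ...] = ()


def commutes(a: Op, b: Op) -> bool:
    if a.address != b.address:
        return True  # different addresses always commute
    if a.kind != b.kind:
        return True  # cross-type pairs: simplified to "always" in this sketch
    if a.kind == "insert":
        return a.position != b.position  # same position is a conflict
    if a.kind == "delete":
        return a.content_id == b.content_id  # deleting the same element twice is idempotent
    if a.kind == "patch":
        # Recurse: every pair of child ops must commute.
        return all(commutes(x, y) for x in a.children for y in b.children)
    return False  # assumption: same-address move/move and replace/replace conflict


ins_a = Op("insert", "lead.mid", position=4, content_id="n1")
ins_b = Op("insert", "lead.mid", position=4, content_id="n2")
ins_other_file = Op("insert", "bass.mid", position=4, content_id="n3")
del_a = Op("delete", "lead.mid", content_id="n9")
del_b = Op("delete", "lead.mid", content_id="n9")

patch_ours = Op("patch", "song", children=(Op("insert", "bar:4", position=1),))
patch_theirs_ok = Op("patch", "song", children=(Op("insert", "bar:7", position=1),))
patch_theirs_clash = Op("patch", "song", children=(Op("insert", "bar:4", position=1),))
```
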
---

## Phase 4: CRDT Semantics

Plugins that implement `CRDTPlugin` replace three-way merge with a mathematical `join` on a
lattice. **`join` always succeeds, so no conflict state ever exists.**

```python
from typing import Protocol, runtime_checkable

@runtime_checkable
class CRDTPlugin(MuseDomainPlugin, Protocol):
    def crdt_schema(self) -> list[CRDTDimensionSpec]: ...
    def join(self, a: CRDTSnapshotManifest, b: CRDTSnapshotManifest) -> CRDTSnapshotManifest: ...
    def to_crdt_state(self, snapshot: StateSnapshot) -> CRDTSnapshotManifest: ...
    def from_crdt_state(self, crdt: CRDTSnapshotManifest) -> StateSnapshot: ...
```

Entry point: `crdt_join_snapshots()` in `merge_engine.py`.

### CRDT Primitive Library (`muse/core/crdts/`)

| Primitive | File | Best for |
|-----------|------|---------|
| `VectorClock` | `vclock.py` | Causal ordering between agents |
| `LWWRegister` | `lww_register.py` | Scalar values; last write wins |
| `ORSet` | `or_set.py` | Unordered sets; adds always win |
| `RGA` | `rga.py` | Ordered sequences (collaborative editing) |
| `AWMap` | `aw_map.py` | Key-value maps; adds win |
| `GCounter` | `g_counter.py` | Monotonically increasing counters |

All six satisfy commutativity, associativity, and idempotency: the three lattice laws that
guarantee convergence regardless of message delivery order.

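The three laws are easiest to see on the simplest primitive. The sketch below is a textbook grow-only counter whose `join` takes the per-agent maximum; it is written for illustration and is not the `GCounter` in `g_counter.py`, whose API may differ. Because `max` is commutative, associative, and idempotent, so is the join, and replicas converge regardless of the order in which states are exchanged.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass(frozen=True)
class GCounter:
    counts: dict[str, int] = field(default_factory=dict)

    def increment(self, agent: str, by: int = 1) -> GCounter:
        if by < 0:
            raise ValueError("a grow-only counter cannot decrease")
        updated = dict(self.counts)
        updated[agent] = updated.get(agent, 0) + by
        return GCounter(updated)

    def join(self, other: GCounter) -> GCounter:
        # Per-agent maximum: the least upper bound of the two states.
        agents = self.counts.keys() | other.counts.keys()
        return GCounter(
            {k: max(self.counts.get(k, 0), other.counts.get(k, 0)) for k in agents}
        )

    @property
    def value(self) -> int:
        return sum(self.counts.values())


a = GCounter().increment("agent-a", 2)
b = GCounter().increment("agent-b")
c = a.increment("agent-a", 3).increment("agent-c")

commutative = a.join(b) == b.join(a)
associative = a.join(b).join(c) == a.join(b.join(c))
idempotent = a.join(a) == a
merged = a.join(b).join(c)
```
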
### When to use CRDT mode

| Scenario | Recommendation |
|----------|----------------|
| Human-paced commits (once per hour/day) | Three-way merge (Phases 1–3) |
| Many agents writing concurrently (sub-second) | CRDT mode |
| Shared annotation sets (many simultaneous contributors) | CRDT `ORSet` |
| Collaborative score editing (DAW-style) | CRDT `RGA` |
| Per-dimension mix | Set `merge_mode="crdt"` per `CRDTDimensionSpec` |

---

## The Full Plugin Protocol Stack

```
MuseDomainPlugin              ← required by every domain plugin
├── schema()                  ← Phase 2: declare data structure
├── snapshot()                ← capture current live state
├── diff()                    ← compute typed StructuredDelta
├── drift()                   ← detect uncommitted changes
├── apply()                   ← apply delta to working tree
└── merge()                   ← three-way merge (fallback)

StructuredMergePlugin         ← optional Phase 3 extension
└── merge_ops()               ← operation-level OT merge

CRDTPlugin                    ← optional Phase 4 extension
├── crdt_schema()             ← declare per-dimension CRDT types
├── join()                    ← convergent lattice join
├── to_crdt_state()           ← lift plain snapshot to CRDT state
└── from_crdt_state()         ← materialise CRDT state back to snapshot
```

The core engine detects capabilities at runtime via `isinstance`:

```python
if isinstance(plugin, CRDTPlugin) and schema["merge_mode"] == "crdt":
    return crdt_join_snapshots(plugin, ...)
elif isinstance(plugin, StructuredMergePlugin):
    return plugin.merge_ops(base, ours_snap, theirs_snap, ours_ops, theirs_ops)
else:
    return plugin.merge(base, left, right)
```

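Capability detection of this kind relies on `typing.runtime_checkable`, which lets `isinstance` test whether an object has the methods a protocol names. The sketch below reproduces the dispatch order with stand-in protocols and plugins; every class and function name in it is hypothetical, and the signatures are simplified. One caveat worth knowing: a runtime `isinstance` check against a protocol only verifies that the members exist, not that their signatures match.

```python
from typing import Protocol, runtime_checkable


@runtime_checkable
class SupportsMergeOps(Protocol):
    def merge_ops(self, ours_ops: list[str], theirs_ops: list[str]) -> str: ...


@runtime_checkable
class SupportsJoin(Protocol):
    def join(self, a: dict[str, int], b: dict[str, int]) -> dict[str, int]: ...


class FileLevelPlugin:
    """Baseline plugin: only the fallback merge is available."""

    def merge(self, base: str, left: str, right: str) -> str:
        return "file-level merge"


class OTPlugin(FileLevelPlugin):
    def merge_ops(self, ours_ops: list[str], theirs_ops: list[str]) -> str:
        return "op-level merge"


class CRDTCapablePlugin(FileLevelPlugin):
    def join(self, a: dict[str, int], b: dict[str, int]) -> dict[str, int]:
        return {k: max(a.get(k, 0), b.get(k, 0)) for k in a.keys() | b.keys()}


def select_merge_path(plugin: FileLevelPlugin, merge_mode: str) -> str:
    """Return the name of the method the engine would call, in priority order."""
    if isinstance(plugin, SupportsJoin) and merge_mode == "crdt":
        return "join"
    if isinstance(plugin, SupportsMergeOps):
        return "merge_ops"
    return "merge"


paths = {
    "plain": select_merge_path(FileLevelPlugin(), "three_way"),
    "ot": select_merge_path(OTPlugin(), "three_way"),
    "crdt": select_merge_path(CRDTCapablePlugin(), "crdt"),
    "crdt_but_three_way": select_merge_path(CRDTCapablePlugin(), "three_way"),
}
```
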
---

## How CLI Commands Use the Plugin

| Command | Plugin method(s) called |
|---------|------------------------|
| `muse commit` | `snapshot()`, `diff()` (for structured_delta) |
| `muse status` | `drift()` |
| `muse diff` | `diff()` |
| `muse show` | reads stored `structured_delta` |
| `muse merge` | `merge_ops()` or `merge()` (capability detection) |
| `muse cherry-pick` | `merge()` |
| `muse stash` | `snapshot()` |
| `muse checkout` | `diff()` + `apply()` |
| `muse domains` | `schema()`, capability introspection |

---

## Adding a New Domain: Quick Reference

1. Copy `muse/plugins/scaffold/plugin.py` → `muse/plugins/<domain>/plugin.py`
2. Implement all methods (every `raise NotImplementedError` must be replaced)
3. Register in `muse/plugins/registry.py`
4. Run `muse init --domain <domain>` in any project directory
5. All existing CLI commands work immediately

See the full [Plugin Authoring Guide](../guide/plugin-authoring-guide.md) for a step-by-step
walkthrough covering Phases 1–4 with examples.

---

## CLI Command Reference

### Core VCS (all domains)

| Command | Description |
|---------|-------------|
| `muse init [--domain <name>]` | Initialize a repository |
| `muse commit -m <msg>` | Snapshot live state and record a commit |
| `muse status` | Show drift between HEAD and working tree |
| `muse diff [<base>] [<target>]` | Show delta between commits or vs. working tree |
| `muse log [--oneline] [--graph] [--stat]` | Display commit history |
| `muse show [<ref>] [--json] [--stat]` | Inspect a single commit with operation-level detail |
| `muse branch [<name>] [-d <name>]` | Create or delete branches |
| `muse checkout <branch\|commit> [-b]` | Switch branches or restore historical state |
| `muse merge <branch>` | Three-way merge (or CRDT join, capability-detected) |
| `muse cherry-pick <commit>` | Apply a specific commit's delta on top of HEAD |
| `muse revert <commit>` | Create a new commit undoing a prior commit |
| `muse reset <commit> [--hard]` | Move branch pointer |
| `muse stash` / `pop` / `list` / `drop` | Temporarily shelve uncommitted changes |
| `muse tag add <tag> [<ref>]` | Tag a commit |
| `muse tag list [<ref>]` | List tags |
| `muse domains` | Show domain dashboard: registered domains, capabilities, schema |

### Music-Domain Extras (music plugin only)

| Command | Description |
|---------|-------------|
| `muse commit --section <name> --track <name>` | Commit with music metadata |
| `muse log --section <s> --track <t>` | Filter log by music metadata |

---

## Testing & Verification

```bash
# Full test suite (691 tests)
.venv/bin/pytest tests/ -v

# Type checking (zero errors required)
mypy muse/

# Typing audit (zero Any violations required)
python tools/typing_audit.py --dirs muse/ tests/ --max-any 0
```

CI runs all three gates on every PR to `dev` and on every `dev → main` merge.

---

## Key Design Decisions

**Why no `async`?** The CLI is synchronous by design. All algorithms are CPU-bound and
complete in bounded time. If a domain's data is too large to diff synchronously, the plugin
should chunk it; this is a domain concern, not a core concern.

**Why TypedDicts over Pydantic?** Zero external dependencies. All types are JSON-serialisable
by construction. `mypy --strict` verifies them without runtime overhead.

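The "JSON-serialisable by construction" point follows from how `TypedDict` works: at runtime a `TypedDict` value is an ordinary `dict`, so the standard `json` module handles it with no encoder or model layer. A minimal sketch, using a hypothetical `ExampleRecord` whose fields are invented for illustration and are not the real `CommitRecord` schema:

```python
import json
from typing import TypedDict


class ExampleRecord(TypedDict):
    commit_id: str
    parent_ids: list[str]
    message: str


record: ExampleRecord = {"commit_id": "abc123", "parent_ids": [], "message": "initial"}

# The annotation exists only for the type checker; the value itself is a plain dict.
encoded = json.dumps(record, sort_keys=True)
decoded: ExampleRecord = json.loads(encoded)
```
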
**Why content-addressed storage?** Objects are never overwritten. Checkout, revert, and
cherry-pick add no new bytes to the object store when the target objects already exist. The
object store scales to millions of fine-grained sub-elements (individual notes, nucleotides,
mesh vertices) without format changes.

**Why four phases?** Each phase is independently useful. A plugin that only implements
Phase 1 gets rich operation-level `muse show` output. Phase 2 adds algorithm selection.
Phase 3 adds sub-file auto-merge. Phase 4 adds convergent multi-agent semantics. Adoption
is incremental and backward-compatible.