# Muse VCS — Architecture Reference

> **Version:** v0.1.1
> **See also:** [Plugin Authoring Guide](../guide/plugin-authoring-guide.md) · [CRDT Reference](../guide/crdt-reference.md) · [E2E Walkthrough](muse-e2e-demo.md) · [Plugin Protocol](../protocol/muse-protocol.md) · [Domain Concepts](../protocol/muse-domain-concepts.md) · [Type Contracts](../reference/type-contracts.md)

---

## What Muse Is

Muse is a **domain-agnostic version control system for multidimensional state**. It provides
a complete DAG engine — content-addressed objects, commits, branches, three-way merge, drift
detection, time-travel checkout, and a full log graph — with one deliberate gap: it does not
know what "state" is.

That gap is the plugin slot. A `MuseDomainPlugin` tells Muse how to interpret your domain's
data. Everything else — the DAG, object store, branching, lineage walking, log, merge state
machine — is provided by the core engine and shared across all domains.

Muse v1.0 adds **four layers of semantic richness** on top of that base, each implemented as
an optional protocol extension that plugins can adopt without breaking anything:

| Phase | Protocol | What you gain |
|-------|----------|---------------|
| 1 — Typed Delta Algebra | `MuseDomainPlugin` (required) | Rich, typed operation lists instead of opaque file diffs |
| 2 — Domain Schema | `MuseDomainPlugin.schema()` (required) | Algorithm selection driven by declared data structure |
| 3 — OT Merge Engine | `StructuredMergePlugin` (optional) | Sub-file auto-merge using Operational Transformation |
| 4 — CRDT Semantics | `CRDTPlugin` (optional) | Convergent join — no conflicts ever possible |

---

## The Seven Invariants

```
State    = a serializable, content-addressed snapshot of any multidimensional space
Commit   = a named delta from a parent state, recorded in a DAG
Branch   = a divergent line of intent forked from a shared ancestor
Merge    = three-way reconciliation of two divergent state lines against a common base
Drift    = the gap between committed state and live state
Checkout = deterministic reconstruction of any historical state from the DAG
Lineage  = the causal chain from root to any commit
```

None of those definitions contain the word "music."

---

## Repository Structure on Disk

Every Muse repository is a `.muse/` directory:

```
.muse/
  repo.json              — repository ID, domain name, creation metadata
  HEAD                   — ref pointer, e.g. refs/heads/main
  config.toml            — optional local config (auth token, remotes)
  refs/
    heads/
      main               — SHA-256 commit ID of branch HEAD
      feature/…          — additional branch HEADs
  objects/
    <sha2>/              — shard directory (first 2 hex chars)
      <sha62>            — raw content-addressed blob
  commits/
    <commit_id>.json     — CommitRecord (includes structured_delta since Phase 1)
  snapshots/
    <snapshot_id>.json   — SnapshotRecord (manifest: {path → object_id})
  tags/
    <tag_id>.json        — TagRecord
  MERGE_STATE.json       — present only during an active merge conflict
muse-work/               — the working tree (domain files live here)
.museattributes          — optional: per-path merge strategy overrides
.museignore              — optional: paths excluded from snapshots
```

The object store mirrors Git's loose-object layout: sharding by the first two hex characters
of each SHA-256 digest prevents filesystem degradation as the repository grows.
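
The sharded layout can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the code in `object_store.py`: the helper names `object_path` and `write_object` are hypothetical, and the only facts taken from this document are the `.muse/objects/<sha2>/<sha62>` layout and the rule that an existing object is never overwritten.

```python
import hashlib
import pathlib
import tempfile


def object_path(repo_root: pathlib.Path, object_id: str) -> pathlib.Path:
    """Map a 64-char hex digest to .muse/objects/<first 2 chars>/<remaining 62>."""
    return repo_root / ".muse" / "objects" / object_id[:2] / object_id[2:]


def write_object(repo_root: pathlib.Path, raw: bytes) -> str:
    """Store raw bytes under their SHA-256 digest; writing the same bytes again is a no-op."""
    object_id = hashlib.sha256(raw).hexdigest()
    path = object_path(repo_root, object_id)
    if not path.exists():  # never overwrite an existing object
        path.parent.mkdir(parents=True, exist_ok=True)
        path.write_bytes(raw)
    return object_id


with tempfile.TemporaryDirectory() as tmp:
    root = pathlib.Path(tmp)
    first = write_object(root, b"note-on C4")
    second = write_object(root, b"note-on C4")  # same content, same ID, no rewrite
    stored = object_path(root, first)
    shard_name, blob_name = stored.parent.name, stored.name
    round_trip = stored.read_bytes()
```

Because the path is derived entirely from the digest, two writers storing the same bytes land on the same file, which is why a repeated write needs no coordination.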

---

## Core Engine Modules

```
muse/
  domain.py              — all protocol definitions and shared type aliases
  core/
    store.py             — file-based commit / snapshot / tag CRUD
    repo.py              — repository detection (MUSE_REPO_ROOT or directory walk)
    snapshot.py          — content-addressed snapshot and commit ID derivation
    object_store.py      — SHA-256 blob storage under .muse/objects/
    merge_engine.py      — three-way merge + CRDT join entry points
    op_transform.py      — Operational Transformation (Phase 3)
    schema.py            — DomainSchema TypedDicts (Phase 2)
    diff_algorithms/     — LCS, tree-edit, numerical, set diff (Phase 2)
    crdts/               — VectorClock, LWWRegister, ORSet, RGA, AWMap, GCounter (Phase 4)
    errors.py            — ExitCode enum
    attributes.py        — .museattributes loading and strategy resolution
  plugins/
    registry.py          — domain name → MuseDomainPlugin instance
    music/
      plugin.py          — MidiPlugin: reference implementation of all protocols
      midi_diff.py       — note-level MIDI diff and MIDI reconstruction
    scaffold/
      plugin.py          — copy-paste template for new domain plugins
  cli/
    app.py               — Typer application root, command registration
    commands/            — one file per subcommand (14 commands + domains)
```

---

## Deterministic ID Derivation

All IDs are SHA-256 digests — the DAG is fully content-addressed:

```
object_id   = sha256(raw_file_bytes)
snapshot_id = sha256(sorted("path:object_id\n" pairs))
commit_id   = sha256(sorted_parent_ids | snapshot_id | message | timestamp_iso)
```

The same snapshot always produces the same ID. Two commits that point to identical state share
a `snapshot_id`. Objects are never overwritten — write is always idempotent.
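
A rough Python rendering of the three formulas, to show why the IDs are deterministic. The exact byte layout is an assumption: joining fields with `|`, joining parent IDs with `,`, and UTF-8 encoding are illustrative choices, and the real derivation in `snapshot.py` may serialize differently.

```python
import hashlib


def object_id(raw_file_bytes: bytes) -> str:
    return hashlib.sha256(raw_file_bytes).hexdigest()


def snapshot_id(manifest: dict[str, str]) -> str:
    # Sort the "path:object_id\n" pairs so dict insertion order never affects the ID.
    pairs = sorted(f"{path}:{oid}\n" for path, oid in manifest.items())
    return hashlib.sha256("".join(pairs).encode("utf-8")).hexdigest()


def commit_id(parent_ids: list[str], snap_id: str, message: str, timestamp_iso: str) -> str:
    # Assumed layout: sorted parents joined by ",", then fields joined by "|".
    payload = "|".join([",".join(sorted(parent_ids)), snap_id, message, timestamp_iso])
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


snap_a = snapshot_id({"a.mid": object_id(b"x"), "b.mid": object_id(b"y")})
snap_b = snapshot_id({"b.mid": object_id(b"y"), "a.mid": object_id(b"x")})
commit_1 = commit_id(["p2", "p1"], snap_a, "add intro", "2024-01-01T00:00:00+00:00")
commit_2 = commit_id(["p1", "p2"], snap_b, "add intro", "2024-01-01T00:00:00+00:00")
commit_3 = commit_id(["p1", "p2"], snap_b, "add outro", "2024-01-01T00:00:00+00:00")
```

Here `snap_a` and `snap_b` are equal even though the manifests were built in a different order, and `commit_1` equals `commit_2` because parents are sorted before hashing; changing only the message yields a different `commit_3`.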

---

## Phase 1 — Typed Delta Algebra

Every commit now carries a `structured_delta: StructuredDelta` alongside the snapshot
manifest. A `StructuredDelta` is a list of typed `DomainOp` entries:

| Op type | Meaning |
|---------|---------|
| `InsertOp` | An element was added at a position |
| `DeleteOp` | An element was removed |
| `MoveOp` | An element was repositioned |
| `ReplaceOp` | An element's value changed (before/after content hashes) |
| `PatchOp` | A container was internally modified (carries child ops recursively) |

This replaces the old opaque `{added, removed, modified}` path lists entirely. Every operation
carries a `content_id` (SHA-256 hash of the element), an `address` (domain-specific location),
and a `content_summary` (human-readable description for `muse show`).

`muse show <commit>` and `muse diff` display note-level diffs for MIDI files — not just "file
changed" but "3 notes added at bar 4, 1 note removed from bar 7."
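
To make the shape of a typed delta concrete, here is a small sketch that builds a list of operations and groups them the way the note-level summary above does. The `OpSketch` type, its `kind` field, and the `make_op` and `summarize` helpers are hypothetical; only `content_id`, `address`, and `content_summary` come from the description above, and the real `DomainOp` types in `muse/domain.py` may be shaped differently.

```python
import hashlib
from collections import Counter
from typing import Literal, TypedDict


class OpSketch(TypedDict):
    kind: Literal["insert", "delete", "move", "replace", "patch"]  # hypothetical tag field
    address: str            # domain-specific location
    content_id: str         # SHA-256 hash of the element
    content_summary: str    # human-readable description


def make_op(kind: str, address: str, element: bytes, summary: str) -> OpSketch:
    return {
        "kind": kind,
        "address": address,
        "content_id": hashlib.sha256(element).hexdigest(),
        "content_summary": summary,
    }


def summarize(delta: list[OpSketch]) -> str:
    # Count ops per (kind, address); Counter keeps first-seen order.
    counts = Counter((op["kind"], op["address"]) for op in delta)
    return ", ".join(f"{n} {kind} at {address}" for (kind, address), n in counts.items())


delta = [
    make_op("insert", "bar 4", b"C4", "note C4 added"),
    make_op("insert", "bar 4", b"E4", "note E4 added"),
    make_op("insert", "bar 4", b"G4", "note G4 added"),
    make_op("delete", "bar 7", b"A3", "note A3 removed"),
]
summary = summarize(delta)
```

With the four sample ops, `summary` is `"3 insert at bar 4, 1 delete at bar 7"`, which is the kind of statement a path-level `{added, removed, modified}` list cannot express.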

---

## Phase 2 — Domain Schema & Diff Algorithm Library

Plugins implement `schema() -> DomainSchema` to declare the structural shape of their data.
The schema drives algorithm selection in `diff_by_schema()`:

| Schema kind | Diff algorithm | Use when… |
|-------------|---------------|-----------|
| `"sequence"` | Myers LCS | Ordered lists (note events, DNA sequences) |
| `"tree"` | LCS-based tree edit | Hierarchical structures (scene graphs, XML) |
| `"tensor"` | Epsilon-tolerant numerical | N-dimensional arrays (simulation grids) |
| `"set"` | Hash-set algebra | Unordered collections (annotation sets) |
| `"map"` | Per-key comparison | Key-value maps (manifests, configs) |

`DomainSchema.merge_mode` controls which merge path the core engine takes:
- `"three_way"` — classic three-way merge (Phases 1–3)
- `"crdt"` — convergent CRDT join (Phase 4)
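
Schema-driven selection amounts to a lookup from schema kind to diff routine. The sketch below shows that idea for the two simplest kinds, `"set"` and `"map"`. It is not the real `diff_by_schema()`: the function `diff_by_kind`, the `DIFFERS` table, and the result dictionaries are all hypothetical, and the sequence, tree, and tensor algorithms are left out.

```python
from typing import Callable


def diff_set(old: set[str], new: set[str]) -> dict[str, list[str]]:
    """Hash-set algebra: membership differences only."""
    return {"added": sorted(new - old), "removed": sorted(old - new)}


def diff_map(old: dict[str, str], new: dict[str, str]) -> dict[str, list[str]]:
    """Per-key comparison: added, removed, and changed keys."""
    return {
        "added": sorted(new.keys() - old.keys()),
        "removed": sorted(old.keys() - new.keys()),
        "changed": sorted(k for k in old.keys() & new.keys() if old[k] != new[k]),
    }


# Hypothetical dispatch table keyed by schema kind.
DIFFERS: dict[str, Callable[..., dict[str, list[str]]]] = {"set": diff_set, "map": diff_map}


def diff_by_kind(kind: str, old, new) -> dict[str, list[str]]:
    if kind not in DIFFERS:
        raise ValueError(f"no diff algorithm registered for schema kind {kind!r}")
    return DIFFERS[kind](old, new)


set_result = diff_by_kind("set", {"a", "b"}, {"b", "c"})
map_result = diff_by_kind(
    "map",
    {"tempo": "120", "key": "C"},
    {"tempo": "90", "key": "C", "meter": "4/4"},
)
```

The plugin only declares the kind; which algorithm runs is decided by the table, so supporting a new kind means adding one entry rather than touching every plugin.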

---

## Phase 3 — Operation-Level Merge Engine

Plugins that implement `StructuredMergePlugin` gain sub-file auto-merge:

```python
import pathlib
from typing import Protocol, runtime_checkable

# MuseDomainPlugin, StateSnapshot, DomainOp, and MergeResult are defined in muse/domain.py.

@runtime_checkable
class StructuredMergePlugin(MuseDomainPlugin, Protocol):
    def merge_ops(
        self,
        base: StateSnapshot,
        ours_snap: StateSnapshot,
        theirs_snap: StateSnapshot,
        ours_ops: list[DomainOp],
        theirs_ops: list[DomainOp],
        *,
        repo_root: pathlib.Path | None = None,
    ) -> MergeResult: ...
```

The core merge engine detects this with `isinstance(plugin, StructuredMergePlugin)` and calls
`merge_ops()` when both branches have `StructuredDelta`. Non-supporting plugins fall back to
file-level `merge()` automatically.

### Operational Transformation (`muse/core/op_transform.py`)

| Function | Purpose |
|----------|---------|
| `ops_commute(a, b)` | Returns `True` when two ops can be applied in either order |
| `transform(a, b)` | Adjusts positions so the diamond property holds |
| `merge_op_lists(base, ours, theirs)` | Three-way OT merge; returns `MergeOpsResult` |
| `merge_structured(base_delta, ours_delta, theirs_delta)` | Wrapper for `StructuredDelta` inputs |

**Commutativity rules (all 25 op-pair combinations covered):**
- Different addresses → always commute
- `InsertOp` + `InsertOp` at same position → conflict
- `DeleteOp` + `DeleteOp` same content_id → idempotent (not a conflict)
- `PatchOp` + `PatchOp` → recursive check on child ops
- Cross-type pairs → generally commute (structural independence)
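
The rules above can be read as a small decision procedure. The sketch below encodes them for a simplified, hypothetical `Op` record; it is not the real `ops_commute()`. Treating `address` as the container and `position` as the index inside it is an assumption, and same-type pairs the list does not mention (for example two `MoveOp`s at one address) are conservatively reported as non-commuting.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Op:
    kind: str                        # "insert", "delete", "move", "replace", or "patch"
    address: str                     # assumed: the container the op touches
    position: Optional[int] = None   # assumed: index inside that container
    content_id: str = ""
    children: tuple["Op", ...] = ()  # child ops, used by "patch" only


def commute_sketch(a: Op, b: Op) -> bool:
    if a.address != b.address:
        return True                               # different addresses always commute
    if a.kind != b.kind:
        return True                               # cross-type pairs generally commute
    if a.kind == "insert":
        return a.position != b.position           # same position is a conflict
    if a.kind == "delete":
        return a.content_id == b.content_id       # deleting the same element twice is idempotent
    if a.kind == "patch":
        # Recurse: every pair of child ops must commute.
        return all(commute_sketch(x, y) for x in a.children for y in b.children)
    return False                                  # unlisted same-type pairs: be conservative


ins_a = Op("insert", "track 1", position=4, content_id="n1")
ins_b = Op("insert", "track 1", position=4, content_id="n2")
ins_other_track = Op("insert", "track 2", position=4, content_id="n3")
del_a = Op("delete", "track 1", content_id="n9")
patch_a = Op("patch", "song", children=(ins_a,))
patch_b = Op("patch", "song", children=(ins_b,))
patch_c = Op("patch", "song", children=(ins_other_track,))
```

Two patches on the same container commute only if their children do, so `patch_a` and `patch_b` conflict (both insert at position 4 of `track 1`) while `patch_a` and `patch_c` do not.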

---

## Phase 4 — CRDT Semantics

Plugins that implement `CRDTPlugin` replace three-way merge with a mathematical `join` on a
lattice. **`join` always succeeds — no conflict state ever exists.**

```python
from typing import Protocol, runtime_checkable

@runtime_checkable
class CRDTPlugin(MuseDomainPlugin, Protocol):
    def crdt_schema(self) -> list[CRDTDimensionSpec]: ...
    def join(self, a: CRDTSnapshotManifest, b: CRDTSnapshotManifest) -> CRDTSnapshotManifest: ...
    def to_crdt_state(self, snapshot: StateSnapshot) -> CRDTSnapshotManifest: ...
    def from_crdt_state(self, crdt: CRDTSnapshotManifest) -> StateSnapshot: ...
```

Entry point: `crdt_join_snapshots()` in `merge_engine.py`.

### CRDT Primitive Library (`muse/core/crdts/`)

| Primitive | File | Best for |
|-----------|------|---------|
| `VectorClock` | `vclock.py` | Causal ordering between agents |
| `LWWRegister` | `lww_register.py` | Scalar values; last write wins |
| `ORSet` | `or_set.py` | Unordered sets; adds always win |
| `RGA` | `rga.py` | Ordered sequences (collaborative editing) |
| `AWMap` | `aw_map.py` | Key-value maps; adds win |
| `GCounter` | `g_counter.py` | Monotonically increasing counters |

All six satisfy: commutativity, associativity, idempotency — the three lattice laws that
guarantee convergence regardless of message delivery order.
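
The three laws are easy to check on the simplest primitive. The sketch below is a textbook grow-only counter written as plain dictionaries, not the `GCounter` class in `g_counter.py`: each agent owns one slot, and `join` takes the per-agent maximum. The agent names and the `join` and `value` helpers are illustrative.

```python
def join(a: dict[str, int], b: dict[str, int]) -> dict[str, int]:
    """Lattice join of two grow-only counters: per-agent maximum."""
    return {agent: max(a.get(agent, 0), b.get(agent, 0)) for agent in a.keys() | b.keys()}


def value(counter: dict[str, int]) -> int:
    """The counter's value is the sum of every agent's slot."""
    return sum(counter.values())


x = {"agent-1": 3}
y = {"agent-1": 1, "agent-2": 4}
z = {"agent-3": 2}

commutative = join(x, y) == join(y, x)
associative = join(join(x, y), z) == join(x, join(y, z))
idempotent = join(x, x) == x
merged_total = value(join(join(x, y), z))  # slots 3, 4, 2
```

Because `max` is itself commutative, associative, and idempotent, replicas that receive the same updates in any order, or more than once, end in the same state; here every replica converges on a total of 9.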

### When to use CRDT mode

| Scenario | Recommendation |
|----------|----------------|
| Human-paced commits (once per hour/day) | Three-way merge (Phases 1–3) |
| Many agents writing concurrently (sub-second) | CRDT mode |
| Shared annotation sets (many simultaneous contributors) | CRDT `ORSet` |
| Collaborative score editing (DAW-style) | CRDT `RGA` |
| Per-dimension mix | Set `merge_mode="crdt"` per `CRDTDimensionSpec` |

---

## The Full Plugin Protocol Stack

```
MuseDomainPlugin            ← required by every domain plugin
├── schema()                ← Phase 2: declare data structure
├── snapshot()              ← capture current live state
├── diff()                  ← compute typed StructuredDelta
├── drift()                 ← detect uncommitted changes
├── apply()                 ← apply delta to working tree
└── merge()                 ← three-way merge (fallback)

StructuredMergePlugin       ← optional Phase 3 extension
└── merge_ops()             ← operation-level OT merge

CRDTPlugin                  ← optional Phase 4 extension
├── crdt_schema()           ← declare per-dimension CRDT types
├── join()                  ← convergent lattice join
├── to_crdt_state()         ← lift plain snapshot to CRDT state
└── from_crdt_state()       ← materialise CRDT state back to snapshot
```

The core engine detects capabilities at runtime via `isinstance`:

```python
if isinstance(plugin, CRDTPlugin) and schema["merge_mode"] == "crdt":
    return crdt_join_snapshots(plugin, ...)
elif isinstance(plugin, StructuredMergePlugin):
    return plugin.merge_ops(base, ours_snap, theirs_snap, ours_ops, theirs_ops)
else:
    return plugin.merge(base, ours_snap, theirs_snap)
```
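
The `isinstance` checks work because the protocols are decorated with `@runtime_checkable`. The self-contained sketch below shows the mechanism with two toy protocols; `BasePlugin`, `OpMergePlugin`, `FileOnly`, `OpAware`, and `run_merge` are hypothetical stand-ins, not Muse types. Note that a runtime protocol check only verifies that the named methods exist, not that their signatures match.

```python
from typing import Protocol, runtime_checkable


@runtime_checkable
class BasePlugin(Protocol):
    def merge(self, base: str, ours: str, theirs: str) -> str: ...


@runtime_checkable
class OpMergePlugin(BasePlugin, Protocol):
    def merge_ops(self, base: str, ours: str, theirs: str) -> str: ...


class FileOnly:
    """Implements only the required method; never inherits from the protocols."""

    def merge(self, base: str, ours: str, theirs: str) -> str:
        return "file-level"


class OpAware(FileOnly):
    """Adds the optional method, so it also satisfies OpMergePlugin."""

    def merge_ops(self, base: str, ours: str, theirs: str) -> str:
        return "op-level"


def run_merge(plugin: BasePlugin) -> str:
    # Prefer the richer capability when the plugin structurally provides it.
    if isinstance(plugin, OpMergePlugin):
        return plugin.merge_ops("base", "ours", "theirs")
    return plugin.merge("base", "ours", "theirs")


file_only_result = run_merge(FileOnly())
op_aware_result = run_merge(OpAware())
```

Neither plugin class subclasses a protocol; satisfying the method set is enough, which is what lets a plugin opt into a later phase by adding methods.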

---

## How CLI Commands Use the Plugin

| Command | Plugin method(s) called |
|---------|------------------------|
| `muse commit` | `snapshot()`, `diff()` (for structured_delta) |
| `muse status` | `drift()` |
| `muse diff` | `diff()` |
| `muse show` | reads stored `structured_delta` |
| `muse merge` | `merge_ops()` or `merge()` (capability detection) |
| `muse cherry-pick` | `merge()` |
| `muse stash` | `snapshot()` |
| `muse checkout` | `diff()` + `apply()` |
| `muse domains` | `schema()`, capability introspection |

---

## Adding a New Domain — Quick Reference

1. Copy `muse/plugins/scaffold/plugin.py` → `muse/plugins/<domain>/plugin.py`
2. Implement all methods (every `raise NotImplementedError` must be replaced)
3. Register in `muse/plugins/registry.py`
4. Run `muse init --domain <domain>` in any project directory
5. All existing CLI commands work immediately

See the full [Plugin Authoring Guide](../guide/plugin-authoring-guide.md) for a step-by-step
walkthrough covering Phases 1–4 with examples.

---

## CLI Command Reference

### Core VCS (all domains)

| Command | Description |
|---------|-------------|
| `muse init [--domain <name>]` | Initialize a repository |
| `muse commit -m <msg>` | Snapshot live state and record a commit |
| `muse status` | Show drift between HEAD and working tree |
| `muse diff [<base>] [<target>]` | Show delta between commits or vs. working tree |
| `muse log [--oneline] [--graph] [--stat]` | Display commit history |
| `muse show [<ref>] [--json] [--stat]` | Inspect a single commit with operation-level detail |
| `muse branch [<name>] [-d <name>]` | Create or delete branches |
| `muse checkout <branch\|commit> [-b]` | Switch branches or restore historical state |
| `muse merge <branch>` | Three-way merge (or CRDT join, capability-detected) |
| `muse cherry-pick <commit>` | Apply a specific commit's delta on top of HEAD |
| `muse revert <commit>` | Create a new commit undoing a prior commit |
| `muse reset <commit> [--hard]` | Move branch pointer |
| `muse stash` / `pop` / `list` / `drop` | Temporarily shelve uncommitted changes |
| `muse tag add <tag> [<ref>]` | Tag a commit |
| `muse tag list [<ref>]` | List tags |
| `muse domains` | Show domain dashboard — registered domains, capabilities, schema |

### MIDI-Domain Extras (MIDI plugin only)

| Command | Description |
|---------|-------------|
| `muse commit --section <name> --track <name>` | Commit with music metadata |
| `muse log --section <s> --track <t>` | Filter log by music metadata |

---

## Testing & Verification

```bash
# Full test suite (691 tests)
.venv/bin/pytest tests/ -v

# Type checking (zero errors required)
mypy muse/

# Typing audit (zero Any violations required)
python tools/typing_audit.py --dirs muse/ tests/ --max-any 0
```

CI runs all three gates on every PR to `dev` and on every `dev → main` merge.

---

## Key Design Decisions

**Why no `async`?** The CLI is synchronous by design. All algorithms are CPU-bound and
complete in bounded time. If a domain's data is too large to diff synchronously, the plugin
should chunk it — this is a domain concern, not a core concern.

**Why TypedDicts over Pydantic?** Zero external dependencies. All types are JSON-serialisable
by construction. `mypy --strict` verifies them without runtime overhead.

**Why content-addressed storage?** Objects are never overwritten. Checkout, revert, and
cherry-pick cost zero bytes when the target objects already exist. The object store scales to
millions of fine-grained sub-elements (individual notes, nucleotides, mesh vertices) without
format changes.

**Why four phases?** Each phase is independently useful. A plugin that only implements
Phase 1 gets rich operation-level `muse show` output. Phase 2 adds algorithm selection.
Phase 3 adds sub-file auto-merge. Phase 4 adds convergent multi-agent semantics. Adoption
is incremental: a plugin can stop at any phase, and adopting a later one does not break
anything.