# Entity Identity in Muse

## The problem with content hashes as identity

Muse uses SHA-256 content hashes to address every object in its store. Two
blobs with identical bytes have the same hash, which gives content equality.
This is correct for immutable storage but wrong for *entity identity*.

When a musician changes a note's velocity from 80 to 100, the note has the
same identity from the musician's perspective. But the content hash changes,
so the old diff model produces a `DeleteOp + InsertOp` pair: the note
appears to have been removed and a completely different note inserted. All
lineage, provenance, and causal history are lost.

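The effect can be reproduced in a few lines. This is a minimal sketch, not Muse's actual serialization: the `content_id` helper and the canonical-JSON encoding are assumptions made for illustration. Only the use of SHA-256 comes from the text above.

```python
import hashlib
import json


def content_id(note: dict) -> str:
    """Hypothetical helper: SHA-256 over a canonical JSON encoding of a note."""
    payload = json.dumps(note, sort_keys=True, separators=(",", ":")).encode("utf-8")
    return hashlib.sha256(payload).hexdigest()


before = {"pitch": 60, "velocity": 80, "start_tick": 0,
          "duration_ticks": 480, "channel": 0}
after = {**before, "velocity": 100}  # the musician edits one field

# The two content addresses differ, so a hash-only diff cannot tell
# "same note, louder" apart from "one note deleted, another inserted".
same_address = content_id(before) == content_id(after)
print(same_address)  # False
```
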
## The solution: stable entity IDs

A `NoteEntity` in `muse/plugins/midi/entity.py` extends the five `NoteKey`
fields with an optional `entity_id`: a UUID4 that is assigned at first
insertion and **never changes**, regardless of how the note's fields are
mutated later.

```
NoteKey:    (pitch, velocity, start_tick, duration_ticks, channel)
            ↑ content equality

NoteEntity: NoteKey + entity_id (UUID4)
            ↑ stable identity across mutations
```

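One way to picture the relationship is with two frozen dataclasses. The field names follow the diagram above, but the class layout, the `key()` helper, and the use of `dataclasses.replace` are illustrative assumptions; the real definitions in `muse/plugins/midi/entity.py` may be shaped differently.

```python
import uuid
from dataclasses import dataclass, replace
from typing import Optional


@dataclass(frozen=True)
class NoteKey:
    pitch: int
    velocity: int
    start_tick: int
    duration_ticks: int
    channel: int


@dataclass(frozen=True)
class NoteEntity(NoteKey):
    # Optional: notes that have not been indexed yet carry no ID.
    entity_id: Optional[str] = None

    def key(self) -> NoteKey:
        """Hypothetical helper: the five content fields without the ID."""
        return NoteKey(self.pitch, self.velocity, self.start_tick,
                       self.duration_ticks, self.channel)


note = NoteEntity(60, 80, 0, 480, 0, entity_id=str(uuid.uuid4()))
louder = replace(note, velocity=100)  # mutate a field, keep the ID

print(louder.key() == note.key())          # False: content changed
print(louder.entity_id == note.entity_id)  # True: identity preserved
```
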
## Entity assignment heuristic

`assign_entity_ids()` maps a new note list onto entity IDs from the prior
commit using a three-tier matching strategy:

1. **Exact content match**: all five fields identical → same entity, no mutation.
2. **Fuzzy match**: same pitch + channel, `|Δtick| ≤ threshold` (default 10),
   and `|Δvelocity| ≤ threshold` (default 20) → same entity, emit `MutateOp`.
3. **No match** → new entity, fresh UUID4, emit `InsertOp`.

Any note in the prior index that no new note matched is emitted as a `DeleteOp`.

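The three tiers can be sketched as follows. This is not the real `assign_entity_ids()`: the function name `assign_ids_sketch`, its signature, and the `(kind, entity_id)` tuples standing in for ops are all hypothetical. It also assumes that `Δtick` refers to `start_tick`, that duration is unconstrained in the fuzzy tier, and that the first fuzzy candidate wins; the actual implementation may break ties differently.

```python
import uuid
from typing import NamedTuple


class Note(NamedTuple):
    pitch: int
    velocity: int
    start_tick: int
    duration_ticks: int
    channel: int


def assign_ids_sketch(prior, new_notes, tick_threshold=10, velocity_threshold=20):
    """prior: {entity_id: Note}. Returns (ids parallel to new_notes, ops)."""
    unmatched = dict(prior)          # prior entities nobody has claimed yet
    ids = [None] * len(new_notes)
    ops = []

    # Tier 1: exact content match, same entity, no op emitted.
    for i, note in enumerate(new_notes):
        eid = next((e for e, old in unmatched.items() if old == note), None)
        if eid is not None:
            ids[i] = eid
            del unmatched[eid]

    # Tier 2: fuzzy match on pitch + channel within both thresholds.
    for i, note in enumerate(new_notes):
        if ids[i] is not None:
            continue
        eid = next((e for e, old in unmatched.items()
                    if old.pitch == note.pitch and old.channel == note.channel
                    and abs(old.start_tick - note.start_tick) <= tick_threshold
                    and abs(old.velocity - note.velocity) <= velocity_threshold),
                   None)
        if eid is not None:
            ids[i] = eid
            del unmatched[eid]
            ops.append(("mutate", eid))

    # Tier 3: anything still unassigned is a new entity with a fresh UUID4.
    for i in range(len(new_notes)):
        if ids[i] is None:
            ids[i] = str(uuid.uuid4())
            ops.append(("insert", ids[i]))

    # Prior entities that matched nothing are deletions.
    ops.extend(("delete", eid) for eid in unmatched)
    return ids, ops


prior = {"e1": Note(60, 80, 0, 480, 0),
         "e2": Note(64, 90, 480, 480, 0),
         "e3": Note(67, 70, 960, 480, 0)}
new = [Note(60, 80, 0, 480, 0),     # unchanged
       Note(64, 100, 485, 480, 0),  # velocity +10, start_tick +5
       Note(72, 90, 0, 240, 0)]     # no counterpart in prior
ids, ops = assign_ids_sketch(prior, new)
print([kind for kind, _ in ops])  # ['mutate', 'insert', 'delete']
```

Running the exact tier to completion before the fuzzy tier keeps an unchanged note from losing its ID to a nearby edited note.
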
## MutateOp vs. DeleteOp + InsertOp

The `MutateOp` in `muse/domain.py` carries:

| Field | Description |
|-------|-------------|
| `entity_id` | Stable entity ID |
| `old_content_id` | SHA-256 of the note before the mutation |
| `new_content_id` | SHA-256 of the note after the mutation |
| `fields` | `dict[field_name, FieldMutation(old, new)]` |
| `old_summary` / `new_summary` | Human-readable before/after strings |

Because a mutation is recorded as a single operation against a stable
`entity_id`, rather than as an unrelated delete and insert, queries like
"show me all velocity edits to the cello part" can run across the full
commit history.

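A query of that kind might look like the sketch below. The dataclasses mirror the field table above but are stand-ins, not the definitions in `muse/domain.py`. The `velocity_edits` helper is hypothetical, the content IDs are placeholder strings, and the ops are assumed to have already been scoped to a single track.

```python
from dataclasses import dataclass


@dataclass
class FieldMutation:
    old: object
    new: object


@dataclass
class MutateOp:
    entity_id: str
    old_content_id: str
    new_content_id: str
    fields: dict  # field name -> FieldMutation
    old_summary: str = ""
    new_summary: str = ""


def velocity_edits(ops):
    """Hypothetical query: (entity_id, old, new) for every velocity change."""
    return [(op.entity_id, op.fields["velocity"].old, op.fields["velocity"].new)
            for op in ops
            if isinstance(op, MutateOp) and "velocity" in op.fields]


track_ops = [
    MutateOp("e2", "<old sha256>", "<new sha256>",
             {"velocity": FieldMutation(90, 100),
              "start_tick": FieldMutation(480, 485)}),
    MutateOp("e5", "<old sha256>", "<new sha256>",
             {"duration_ticks": FieldMutation(480, 240)}),
]
print(velocity_edits(track_ops))  # [('e2', 90, 100)]
```
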
## Entity index storage

Entity indexes live under `.muse/entity_index/` as derived artifacts:

```
.muse/entity_index/
  <commit_id[:16]>/
    <track_safe_name>_<hash[:8]>.json
```

They are fully rebuildable from commit history and should be added to
`.museignore` in CI to avoid accidental commits.

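The layout above translates into a small path helper. This is only a sketch: `entity_index_path` is a hypothetical name, and because the text does not say what the `hash` component is computed from, it is taken here as an opaque hex string supplied by the caller.

```python
from pathlib import PurePosixPath


def entity_index_path(commit_id: str, track_safe_name: str, hash_hex: str) -> PurePosixPath:
    """Build .muse/entity_index/<commit_id[:16]>/<track_safe_name>_<hash[:8]>.json."""
    return (PurePosixPath(".muse/entity_index")
            / commit_id[:16]
            / f"{track_safe_name}_{hash_hex[:8]}.json")


# Placeholder values for illustration only.
example_commit = "0123456789abcdef" * 4
example_path = entity_index_path(example_commit, "cello", "deadbeefcafebabe")
print(example_path)  # .muse/entity_index/0123456789abcdef/cello_deadbeef.json
```
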
## Independence from core

Entity identity is purely a music-plugin concern. The core engine
(`muse/core/`) never imports from `muse/plugins/`. The `MutateOp` and
`FieldMutation` types in `muse/domain.py` are domain-agnostic: a genomics
plugin can use the same types to track mutations in a nucleotide sequence.

## Related files

| File | Role |
|------|------|
| `muse/domain.py` | `MutateOp`, `FieldMutation`, `EntityProvenance` |
| `muse/plugins/midi/entity.py` | `NoteEntity`, `EntityIndex`, `assign_entity_ids`, `diff_with_entity_ids` |
| `muse/plugins/midi/midi_diff.py` | `diff_midi_notes_with_entities()` |
| `tests/test_entity.py` | Unit tests |