1 # Muse / Variation Specification — End-to-End UX + Technical Contract (Stori)
2
3 > **Status:** Implementation Specification (v1)
4 > **Date:** February 2026
5 > **Target:** Stori DAW (Swift/SwiftUI) + Maestro/Intent Engine (Python)
6 > **Goal:** Ship a *demo-grade* implementation inside Stori that proves the "Cursor of DAWs" paradigm: **reviewable, audible, non-destructive AI changes**.
7
8 > **Canonical Time Unit:** All Muse and Variation data structures use **beats** as the canonical time unit. Seconds are a derived, playback-only representation. Muse reasons musically, not in wall-clock time.
9
10 > **Canonical Backend References:**
11 > For backend wire contract, state machine, and terminology, these docs are authoritative:
12 > - [variation_api.md](variation_api.md) — Wire contract, endpoints, SSE events, error codes
13 > - [terminology.md](terminology.md) — Canonical vocabulary (normative)
14 > - [muse_vcs.md](../architecture/muse_vcs.md) — Muse VCS architecture (persistent history, checkout, merge, log graph)
15
16 ---
17
18 ## What Is Muse?
19
20 **Muse** is Stori's change-proposal system for music.
21
22 Just as Git is a system for proposing, reviewing, and applying changes to source code, Muse is a system for proposing, reviewing, and applying changes to musical material.
23
24 Muse does not edit music directly.
25
26 Muse computes **Variations** — structured, reviewable descriptions of how one musical state differs from another — and presents them for human evaluation.
27
28 ---
29
30 ### Muse's Role in the System
31
32 Muse sits between **intent** and **mutation**.
33
34
35 ## 0) Canonical Terms (Do Not Drift)
36
37 This vocabulary is **normative**. Use these exact words in code, UI, docs, and agent prompts.
38
39 | Software analogy | Stori term | Definition |
40 |---|---|---|
41 | Git | **Muse** | The creative intelligence / system that proposes musical ideas |
42 | Diff | **Variation** | A proposed musical interpretation expressed as a semantic, audible change set |
43 | Hunk | **Phrase** | An independently reviewable/applicable musical phrase (bars/region slice) |
44 | Commit | **Accept Variation** | Apply selected phrases to canonical state; creates a single undo boundary |
45 | Reject | **Discard Variation** | Close the proposal without mutating canonical state |
46 | Revert | **Undo Variation** | Uses DAW undo/redo; engine-aware and audio-safe |
47 | Branch (future) | Alternate Interpretation | Parallel musical directions |
48 | Merge (future) | Blend Variations | Combine harmony from A + rhythm from B + etc. |
49
50 > **Key concept:** A diff is read. A Variation is **heard**.
51 > **Time unit:** Muse reasons in **beats**, not seconds. Time is a playback concern.
52
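Because beats are canonical and seconds are derived only for playback, the conversion can live in one small helper. The sketch below assumes a constant tempo in BPM; the function name is hypothetical, and tempo maps or tempo automation are out of scope here.

```python
def beats_to_seconds(beats: float, tempo_bpm: float) -> float:
    """Convert a beat count to seconds at a constant tempo (playback only)."""
    if tempo_bpm <= 0:
        raise ValueError("tempo_bpm must be positive")
    # One beat lasts 60 / tempo_bpm seconds.
    return beats * 60.0 / tempo_bpm
```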
53 ---
54
55 ## 1) When Variations Appear (Execution Mode Policy)
56
57 The backend enforces execution mode based on intent classification. The frontend does not choose the mode — it reacts to the `state` SSE event emitted at the start of every compose stream.
58
59 ### 1.1 Core Rule — COMPOSING Always Produces a Variation
60
61 | Intent state | `execution_mode` | Behavior |
62 |---|---|---|
63 | **COMPOSING** | `variation` (forced by backend) | All tool calls produce a Variation for human review |
64 | **EDITING** | `apply` (forced by backend) | Structural ops (add track, set tempo, mute, etc.) apply immediately |
65 | **REASONING** | n/a | Chat only, no tools |
66
67 **Every COMPOSING request produces a Variation** — including purely additive ones (first-time MIDI generation, creating a new song from scratch). This mirrors the "Cursor of DAWs" paradigm: AI-generated musical content always requires human approval before becoming canonical state.
68
69 **Examples (Variation Review UI — COMPOSING):**
70 - "Create a new song in the style of Phish" — additive, but COMPOSING -> Variation
71 - "Make a chill lo-fi beat at 85 BPM" — additive, COMPOSING -> Variation
72 - "Make that minor" (transforms pitches) — COMPOSING -> Variation
73 - "Simplify the melody" (removals/modifications) — COMPOSING -> Variation
74 - "Change the bassline to be more syncopated" (re-writes notes) — COMPOSING -> Variation
75
76 **Examples (direct apply, no Variation — EDITING):**
77 - "Add a drum track" — structural, EDITING -> apply
78 - "Set the tempo to 120 BPM" — structural, EDITING -> apply
79 - "Mute the bass" — structural, EDITING -> apply
80
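The core rule in the table above can be sketched as a single lookup. This is an illustration only: `IntentState` and `execution_mode_for` are hypothetical names, and the real decision is made inside `orchestrate()` on the backend.

```python
from enum import Enum
from typing import Optional


class IntentState(Enum):
    COMPOSING = "composing"
    EDITING = "editing"
    REASONING = "reasoning"


def execution_mode_for(state: IntentState) -> Optional[str]:
    """Return the backend-forced execution mode, or None when no tools run."""
    if state is IntentState.COMPOSING:
        return "variation"  # always reviewed by a human before it becomes canonical
    if state is IntentState.EDITING:
        return "apply"      # structural ops apply immediately
    return None             # REASONING: chat only, no tools
```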
81 ### 1.2 "Create a new song in the style of ..." (Multi-step Tool Flow)
82
83 When the user asks to create a song from scratch, the backend classifies this as COMPOSING and the entire plan (tracks + regions + notes + FX) is proposed as a **single Variation** for review.
84
85 **Behavior:**
86 1. The planner generates a full plan (create tracks -> add regions -> generate MIDI -> add FX).
87 2. The executor simulates the plan without mutation and computes a Variation with Phrases.
88 3. The SSE stream emits `meta` -> `phrase*` -> `done` events.
89 4. The frontend enters **Variation Review Mode** showing the proposed changes.
90 5. The user reviews, auditions (A/B), and accepts or discards.
91
92 This ensures the user always has agency over AI-generated content, even during initial creation. The UX is a single review step at the end of generation — not repeated pop-ups per tool call.
93
94 ### 1.3 User Trust Overrides
95 Always show Variation UI when:
96 - The change is **destructive** (deletes/overwrites notes/regions)
97 - The target material is **user-edited** (has `userTouched=true`) or "pinned/locked"
98 - The change is **large-scope** (multi-track rewrite)
99 - The model's confidence is low OR the engine produced a best-effort fallback
100
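The four override conditions can be read as one predicate. The sketch below is an assumption about how they might be combined: `ChangeSummary` and its field names are hypothetical, and "large-scope" is approximated as "touches more than one track".

```python
from dataclasses import dataclass


@dataclass
class ChangeSummary:
    deletes_or_overwrites: bool = False  # destructive change
    touches_user_edited: bool = False    # e.g. userTouched=true, pinned, or locked
    track_count: int = 1                 # number of tracks rewritten
    low_confidence: bool = False
    used_fallback: bool = False          # engine produced a best-effort fallback


def must_show_variation_ui(change: ChangeSummary) -> bool:
    """True when any trust override from section 1.3 applies."""
    return (
        change.deletes_or_overwrites
        or change.touches_user_edited
        or change.track_count > 1
        or change.low_confidence
        or change.used_fallback
    )
```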
101 ### 1.4 Quick Setting (future)
102 Add a user preference (later):
103 - **Muse Review Mode:** `Always` | `Smart (default)` | `Never (power users)`
104
105 When implemented, this preference will be stored server-side and consulted in `orchestrate()`. Even in `Never` mode, destructive changes should warn.
106
107 ---
108
109 ## 2) System Model
110
111 ### 2.1 Canonical vs Proposed State
112 - **Canonical State**: the DAW's real project state (undoable, playable, saved).
113 - **Proposed State**: an ephemeral, derived state computed by backend to propose a Variation.
114
115 **Important:** The backend does **not** mutate canonical state during proposal.
116
117 ### 2.2 Variation Lifecycle
118
119 1. **Propose**: Muse generates a Variation from intent.
2. **Stream**: Phrases stream to the frontend as soon as they are computed.
121 3. **Review**: FE enters Variation Review Mode (overlay + A/B audition).
122 4. **Accept**: FE sends accepted phrase IDs; BE applies them transactionally.
123 5. **Discard**: FE discards; no mutation.
124
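The lifecycle above corresponds to the status values named in the implementation checklist (CREATED, STREAMING, READY, COMMITTED, DISCARDED, FAILED, EXPIRED). A minimal sketch of a transition guard follows. The transition table is an assumption, since the spec only states the happy path and the terminal states, and this `assert_transition` is illustrative rather than the backend's actual function.

```python
from enum import Enum


class VariationStatus(Enum):
    CREATED = "created"
    STREAMING = "streaming"
    READY = "ready"
    COMMITTED = "committed"
    DISCARDED = "discarded"
    FAILED = "failed"
    EXPIRED = "expired"


S = VariationStatus
# Assumed table: any non-terminal state may be discarded, fail, or expire.
_ALLOWED = {
    S.CREATED: {S.STREAMING, S.DISCARDED, S.FAILED, S.EXPIRED},
    S.STREAMING: {S.READY, S.DISCARDED, S.FAILED, S.EXPIRED},
    S.READY: {S.COMMITTED, S.DISCARDED, S.FAILED, S.EXPIRED},
}


def assert_transition(current: VariationStatus, target: VariationStatus) -> None:
    """Raise ValueError if moving from current to target is not allowed."""
    if target not in _ALLOWED.get(current, set()):
        raise ValueError(f"invalid transition {current.name} -> {target.name}")
```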
125 ---
126
127 ## 3) API Contract (Backend <-> Frontend)
128
This spec assumes HTTP + **SSE** (server-sent events) for streaming. WebSockets are also acceptable, but SSE is simpler for v1.
130
131 ### 3.1 Identifiers & Concurrency
132 All Variation operations must carry:
133 - `project_id`
- `base_state_id` (project version token, e.g., a monotonically increasing int or an opaque UUID)
135 - `variation_id`
136 - Optional `request_id` for idempotency
137
138 Backend must reject commits if `base_state_id` mismatches (optimistic concurrency) unless FE explicitly requests rebase.
139
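The optimistic concurrency rule amounts to an equality check before any commit is applied. The sketch below uses hypothetical names (`StaleBaseStateError`, `check_base_state`) and leaves out the rebase path, which the spec treats as a future option.

```python
class StaleBaseStateError(Exception):
    """The project changed after the Variation was proposed."""


def check_base_state(current_state_id: str, request_base_state_id: str) -> None:
    """Reject a commit whose base_state_id no longer matches the project."""
    if current_state_id != request_base_state_id:
        raise StaleBaseStateError(
            f"project is at {current_state_id!r}, "
            f"variation was based on {request_base_state_id!r}"
        )
```

A caller would run this check first and only then apply the accepted phrases, so a mismatch never leaves a partial mutation behind.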
140 ### 3.2 Endpoints
141
142 #### (A) Propose Variation
143 `POST /variation/propose`
144
145 **Request**
```json
{
  "project_id": "uuid",
  "base_state_id": "uuid-or-int",
  "intent": "make that minor",
  "scope": {
    "track_ids": ["uuid"],
    "region_ids": ["uuid"],
    "beat_range": [4.0, 8.0]
  },
  "options": {
    "phrase_grouping": "bars",
    "bar_size": 4,
    "stream": true
  },
  "request_id": "uuid"
}
```
164
165 **Immediate Response (fast)**
```json
{
  "variation_id": "uuid",
  "project_id": "uuid",
  "base_state_id": "uuid-or-int",
  "intent": "make that minor",
  "ai_explanation": null,
  "stream_url": "/variation/stream?variation_id=uuid"
}
```
176
#### (B) Stream Variation (phrases)
178 `GET /variation/stream?variation_id=...` (SSE)
179
180 All events are wrapped in a transport-agnostic `EventEnvelope`:
```json
{
  "type": "meta|phrase|done|error|heartbeat",
  "sequence": 1,
  "variation_id": "uuid",
  "project_id": "uuid",
  "base_state_id": "uuid-or-int",
  "timestamp_ms": 1700000000000,
  "payload": { }
}
```
192 `sequence` is strictly increasing per variation (meta=1, then phrases, then done last).
193 The event-specific data lives in `payload`; outer fields provide routing and ordering context.
194
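A client can check the ordering guarantees on the envelopes it receives. The sketch below is one possible validator, not part of the wire contract. It assumes envelopes are already parsed into dictionaries, and it assumes heartbeats carry no ordering significance and can be skipped.

```python
from typing import Iterable


def validate_event_order(envelopes: Iterable[dict]) -> None:
    """Raise ValueError unless events follow meta, phrase*, done (or error)."""
    last_seq = 0
    seen_meta = False
    finished = False
    for env in envelopes:
        kind = env["type"]
        if kind == "heartbeat":
            continue  # assumption: keepalives are ignored for ordering
        if finished:
            raise ValueError("event received after a terminal event")
        seq = env["sequence"]
        if seq <= last_seq:
            raise ValueError(f"sequence {seq} does not exceed {last_seq}")
        last_seq = seq
        if kind == "meta":
            if seen_meta:
                raise ValueError("duplicate meta event")
            seen_meta = True
        elif kind in ("phrase", "done"):
            if not seen_meta:
                raise ValueError(f"{kind} received before meta")
            finished = kind == "done"
        elif kind == "error":
            finished = True  # error is terminal at any point
        else:
            raise ValueError(f"unknown event type {kind!r}")
```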
195 **SSE Events**
196 - `meta` — overall summary + UX copy + counts
197 - `phrase` — one musical phrase at a time
198 - `done` — end of stream
199 - `error` — terminal
200 - `heartbeat` — keepalive (no payload significance)
201
202 > `progress` events are not yet implemented.
203
204 **Example: `meta`** (this is the `payload` field inside the `EventEnvelope`; `variation_id`, `project_id`, `base_state_id`, and `sequence` are in the outer envelope)
```json
{
  "intent": "make that minor",
  "ai_explanation": "Lowered scale degrees 3 and 7",
  "affected_tracks": ["uuid"],
  "affected_regions": ["uuid"],
  "note_counts": { "added": 12, "removed": 4, "modified": 8 }
}
```
214
215 **Example: `phrase`**
```json
{
  "phrase_id": "uuid",
  "track_id": "uuid",
  "region_id": "uuid",
  "start_beat": 16.0,
  "end_beat": 32.0,
  "label": "Bars 5-8",
  "tags": ["harmonyChange", "scaleChange"],
  "explanation": "Converted major 3rds to minor 3rds",
  "note_changes": [
    {
      "note_id": "uuid",
      "change_type": "modified",
      "before": { "pitch": 64, "start_beat": 0.0, "duration_beats": 0.5, "velocity": 90 },
      "after": { "pitch": 63, "start_beat": 0.0, "duration_beats": 0.5, "velocity": 90 }
    }
  ],
  "controller_changes": [
    { "kind": "cc", "cc": 64, "beat": 0.0, "value": 127 },
    { "kind": "pitch_bend", "beat": 1.5, "value": 4096 },
    { "kind": "aftertouch", "beat": 2.0, "value": 80 }
  ]
}
```
241
242 > **Beat semantics:** `phrase.start_beat` / `phrase.end_beat` are **absolute project positions**. Note `start_beat` values inside `note_changes` are **region-relative** (offset from the region's start beat). This matches how DAWs universally store MIDI data within regions.
243
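To place a region-relative note on the project timeline, add the region's start beat. The sketch below assumes the caller already knows the region's absolute start from project state (the phrase payload does not carry it) and treats the phrase range as half-open; both function names are hypothetical.

```python
def absolute_note_start(region_start_beat: float, note: dict) -> float:
    """Convert a region-relative note start to an absolute project beat."""
    return region_start_beat + note["start_beat"]


def note_falls_in_phrase(phrase: dict, region_start_beat: float, note: dict) -> bool:
    """True when the note starts inside [phrase.start_beat, phrase.end_beat)."""
    start = absolute_note_start(region_start_beat, note)
    return phrase["start_beat"] <= start < phrase["end_beat"]
```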
244 **Example: `done`**
245
246 The `variation_id` is carried in the outer `EventEnvelope` wrapper (not repeated in the payload).
```json
{ "status": "ready", "phrase_count": 3 }
```
250
251 #### (C) Commit (Accept Variation)
252 `POST /variation/commit`
253
254 **Request**
```json
{
  "project_id": "uuid",
  "base_state_id": "uuid-or-int",
  "variation_id": "uuid",
  "accepted_phrase_ids": ["uuid", "uuid"],
  "request_id": "uuid"
}
```
264
265 **Response**
```json
{
  "project_id": "uuid",
  "new_state_id": "uuid-or-int",
  "applied_phrase_ids": ["uuid", "uuid"],
  "undo_label": "Accept Variation: make that minor",
  "updated_regions": [
    {
      "region_id": "uuid",
      "track_id": "uuid",
      "notes": [
        { "pitch": 60, "start_beat": 0.0, "duration_beats": 1.0, "velocity": 100, "channel": 0 }
      ],
      "cc_events": [
        { "cc": 64, "beat": 0.0, "value": 127 }
      ],
      "pitch_bends": [],
      "aftertouch": []
    }
  ]
}
```
288
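Commit applies only the phrases the user accepted. A minimal sketch of that step, assuming the notes of a single region are held in a dictionary keyed by `note_id` (a hypothetical in-memory shape, not the backend's actual store):

```python
def apply_accepted_phrases(
    notes_by_id: dict, phrases: list[dict], accepted_phrase_ids: list[str]
) -> dict:
    """Return a new note map with only the accepted phrases applied."""
    result = dict(notes_by_id)  # copy, so the input map is never mutated
    accepted = set(accepted_phrase_ids)
    for phrase in phrases:
        if phrase["phrase_id"] not in accepted:
            continue  # phrases that were not accepted leave no trace
        for change in phrase["note_changes"]:
            note_id = change["note_id"]
            if change["change_type"] == "removed":
                result.pop(note_id, None)
            else:  # "added" or "modified": the proposed note wins
                result[note_id] = change["after"]
    return result
```

Returning a fresh map keeps the operation easy to wrap in a single undo boundary: the caller swaps the old map for the new one in one step.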
289 #### (D) Poll Variation Status
290 `GET /variation/{variation_id}`
291
292 Returns the current status and accumulated phrases for a variation. Useful for
293 reconnect flows and clients that can't maintain a long-lived SSE connection.
294
295 **Response**
```json
{
  "variation_id": "uuid",
  "status": "ready",
  "intent": "make that minor",
  "phrases": []
}
```
304
305 #### (E) Discard Variation
306 `POST /variation/discard`
307
```json
{
  "project_id": "uuid",
  "variation_id": "uuid",
  "request_id": "uuid"
}
```
315
316 Returns `{ "ok": true }`.
317
318 ---
319
320 ## 4) Variation Data Shapes (Canonical JSON)
321
322 ### 4.1 Variation (meta)
```json
{
  "variation_id": "uuid",
  "intent": "string",
  "ai_explanation": "string|null",
  "affected_tracks": ["uuid"],
  "affected_regions": ["uuid"],
  "beat_range": [0.0, 16.0],
  "note_counts": { "added": 0, "removed": 0, "modified": 0 }
}
```
334
335 ### 4.2 Phrase
```json
{
  "phrase_id": "uuid",
  "track_id": "uuid",
  "region_id": "uuid",
  "start_beat": 0.0,
  "end_beat": 16.0,
  "label": "Bars 1-4",
  "tags": [],
  "explanation": "string|null",
  "note_changes": [],
  "controller_changes": []
}
```
350
351 ### 4.3 NoteChange
```json
{
  "note_id": "uuid",
  "change_type": "added|removed|modified",
  "before": { "pitch": 60, "start_beat": 0.0, "duration_beats": 1.0, "velocity": 90 },
  "after": { "pitch": 60, "start_beat": 0.0, "duration_beats": 1.0, "velocity": 90 }
}
```
360
361 Rules:
362 - `added` -> `before` must be null (enforced by backend)
363 - `removed` -> `after` must be null (enforced by backend)
364 - `modified` -> both `before` and `after` must be present
365 - All positions in **beats** (not seconds)
366 - `start_beat` within `before`/`after` is **region-relative** (offset from the region's start)
367
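The first three rules can be checked mechanically. The sketch below is a hypothetical validator, not the backend's enforcement code. It additionally assumes that `added` requires a non-null `after` and `removed` requires a non-null `before`, which the rules imply but do not state.

```python
def validate_note_change(change: dict) -> None:
    """Raise ValueError if before/after do not match change_type."""
    kind = change.get("change_type")
    before, after = change.get("before"), change.get("after")
    if kind == "added":
        ok = before is None and after is not None
    elif kind == "removed":
        ok = after is None and before is not None
    elif kind == "modified":
        ok = before is not None and after is not None
    else:
        raise ValueError(f"unknown change_type {kind!r}")
    if not ok:
        raise ValueError(f"before/after inconsistent with change_type {kind!r}")
```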
368 ### 4.4 Controller Changes (Expressive MIDI)
369
370 Phrases carry `controller_changes` — expressive MIDI data beyond notes. The
371 pipeline supports the **complete** set of musically relevant MIDI messages:
372
| `kind` | Fields | MIDI message (status byte) | Coverage |
|--------|--------|-----------|----------|
| `cc` | `cc`, `beat`, `value` | Control Change (0xBn) | All 128 CC numbers: sustain (64), expression (11), modulation (1), volume (7), pan (10), filter cutoff (74), resonance (71), reverb send (91), chorus send (93), attack (73), release (72), soft pedal (67), sostenuto (66), legato (68), breath (2), etc. |
| `pitch_bend` | `beat`, `value` | Pitch Bend (0xEn) | 14-bit signed (−8192 to 8191) |
| `aftertouch` | `beat`, `value` | Channel Pressure (0xDn) | No `pitch` field → channel-wide pressure |
| `aftertouch` | `beat`, `value`, `pitch` | Poly Key Pressure (0xAn) | `pitch` present → per-note pressure |
379
380 Program Change is handled at track level (`stori_set_midi_program`).
381 Track-level automation curves (volume, pan, FX params) are handled by
382 `stori_add_automation`.
383
384 After commit, the full expressive state is materialized in `updated_regions`
385 as three separate arrays: `cc_events`, `pitch_bends`, `aftertouch`.
386
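The table above can be turned into MIDI bytes with two small helpers. This is a sketch under stated assumptions: the function names are hypothetical, the channel is supplied by the caller, and the signed pitch-bend value is offset by 8192 to obtain the unsigned 14-bit wire value, as the MIDI 1.0 pitch-bend message expects.

```python
def status_byte(change: dict, channel: int = 0) -> int:
    """Map a controller change to its MIDI status byte for a channel (0-15)."""
    kind = change["kind"]
    if kind == "cc":
        base = 0xB0
    elif kind == "pitch_bend":
        base = 0xE0
    elif kind == "aftertouch":
        # A pitch field selects per-note (poly) pressure; otherwise channel pressure.
        base = 0xA0 if change.get("pitch") is not None else 0xD0
    else:
        raise ValueError(f"unknown controller kind {kind!r}")
    return base | (channel & 0x0F)


def pitch_bend_data_bytes(value: int) -> tuple[int, int]:
    """Encode a signed bend (-8192..8191) as (LSB, MSB) 7-bit data bytes."""
    if not -8192 <= value <= 8191:
        raise ValueError("pitch bend out of 14-bit signed range")
    unsigned = value + 8192  # 8192 is the centre (no bend)
    return unsigned & 0x7F, (unsigned >> 7) & 0x7F
```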
387 ---
388
389 ## 5) Backend Implementation Guidance
390
391 ### 5.1 Execution Mode Policy (Backend-Owned)
392
393 The backend determines `execution_mode` based on intent classification. The frontend's `execution_mode` field is deprecated and ignored.
394
395 - **COMPOSING** -> `execution_mode="variation"` -> Variation proposal (no mutation)
396 - **EDITING** -> `execution_mode="apply"` -> Immediate tool call execution
397 - **REASONING** -> no tools
398
399 This is enforced in `orchestrate()` (`app/core/maestro_handlers.py`). The frontend knows which mode is active from the `state` SSE event (`"composing"` / `"editing"` / `"reasoning"`) emitted at the start of every stream.
400
401 ### 5.2 Proposed State Construction
402 Avoid copying whole projects:
403 - Identify affected regions/tracks
404 - Clone only those regions (notes + essential metadata)
405 - Apply existing transform functions onto the clones
406
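A minimal sketch of the clone-only-what-changes approach, assuming canonical regions live in a dictionary keyed by region ID (a hypothetical shape) and that a deep copy is cheap enough at region granularity:

```python
import copy


def build_proposed_regions(canonical_regions: dict, affected_region_ids: list[str]) -> dict:
    """Deep-copy only the affected regions; canonical state is left untouched."""
    return {
        region_id: copy.deepcopy(canonical_regions[region_id])
        for region_id in affected_region_ids
    }
```

Transforms then run against the returned clones, so nothing in the canonical project is mutated during proposal.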
407 ### 5.3 Diffing / Matching Notes
408 Start simple:
- Match by `(pitch, start_beat)` proximity with a tolerance (e.g., a 1/16 note, which is 0.25 beats when the beat is a quarter note)
- If ambiguous, prefer the same pitch, then the closest start beat
- Emit `modified` rather than `removed` + `added` when a single note clearly moved
412
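One way to realize these matching rules is a two-pass greedy match: same-pitch pairs first, then leftovers by closest start and pitch. The sketch below is an illustration, not the backend's matcher. The default tolerance of 0.25 beats and the function name are assumptions, and `note_id` assignment is omitted.

```python
def diff_notes(before: list[dict], after: list[dict], tolerance_beats: float = 0.25) -> list[dict]:
    """Greedy note diff yielding removed / modified / added changes."""
    def gap(a: dict, b: dict) -> float:
        return abs(a["start_beat"] - b["start_beat"])

    changes, unmatched, leftovers = [], list(after), []
    # Pass 1: pair notes of the same pitch whose starts are within tolerance.
    for old in before:
        same = [n for n in unmatched
                if n["pitch"] == old["pitch"] and gap(n, old) <= tolerance_beats]
        if not same:
            leftovers.append(old)
            continue
        best = min(same, key=lambda n: gap(n, old))
        unmatched.remove(best)
        if best != old:  # identical notes produce no change
            changes.append({"change_type": "modified", "before": old, "after": best})
    # Pass 2: pair what is left by closest start, then closest pitch.
    for old in leftovers:
        near = [n for n in unmatched if gap(n, old) <= tolerance_beats]
        if not near:
            changes.append({"change_type": "removed", "before": old, "after": None})
            continue
        best = min(near, key=lambda n: (gap(n, old), abs(n["pitch"] - old["pitch"])))
        unmatched.remove(best)
        changes.append({"change_type": "modified", "before": old, "after": best})
    # Anything still unmatched exists only in the proposed state.
    changes.extend({"change_type": "added", "before": None, "after": n} for n in unmatched)
    return changes
```

Running the same-pitch pass first keeps an unchanged chord tone from being paired with a neighbour that merely shares its start beat.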
413 ### 5.4 Phrase Grouping (MVP)
414 - Group changes by **bar windows** (e.g., 4 bars per phrase)
415 - Or by region boundaries if the region already stores bar markers
416
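Bar-window grouping can be sketched as integer division on the absolute beat position. The code below assumes 4 beats per bar, 4 bars per phrase, and that a change is bucketed by its proposed (`after`) position when one exists; the function and parameter names are hypothetical.

```python
from collections import defaultdict


def group_by_bar_window(
    note_changes: list[dict],
    region_start_beat: float = 0.0,
    beats_per_bar: int = 4,
    bars_per_phrase: int = 4,
) -> list[dict]:
    """Bucket note changes into fixed bar windows, one phrase per window."""
    window = beats_per_bar * bars_per_phrase
    buckets = defaultdict(list)
    for change in note_changes:
        note = change["after"] or change["before"]
        absolute = region_start_beat + note["start_beat"]  # note starts are region-relative
        buckets[int(absolute // window)].append(change)
    phrases = []
    for index in sorted(buckets):
        first_bar = index * bars_per_phrase + 1
        phrases.append({
            "start_beat": float(index * window),
            "end_beat": float((index + 1) * window),
            "label": f"Bars {first_bar}-{first_bar + bars_per_phrase - 1}",
            "note_changes": buckets[index],
        })
    return phrases
```

With the defaults, a change at absolute beat 17 lands in the window from beat 16 to beat 32, labelled "Bars 5-8", which agrees with the `phrase` example in section 3.2.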
417 ### 5.5 Streaming
Compute phrases incrementally and stream each one as soon as it is available:
- `meta` ASAP
- then `phrase` events
- then `done` to close the stream
- `progress` events are optional and not yet implemented
422
423 Streaming is what makes the UI feel alive and Cursor-like.
424
425 ---
426
427 ## 6) Frontend UX Spec (Variation Review Mode)
428
429 ### 6.1 Entry
430 Variation Review Mode enters when the compose stream emits a `state` event with `state: "composing"`, followed by `meta` and `phrase` events. The frontend must:
431 1. Detect `state: "composing"` -> prepare for Variation Review Mode
432 2. Receive `meta` event -> show banner with intent, explanation, counts
433 3. Receive `phrase` events -> accumulate phrases for review
434 4. Receive `done` event -> enable Accept/Discard controls
435
436 For `state: "editing"`, the frontend applies `toolCall` events directly. The backend also emits `plan` and `planStepUpdate` events to render a step-by-step checklist. See [api.md](../reference/api.md) for the full event reference.
437
438 ### 6.2 Chrome (always visible while reviewing)
439 Banner containing:
440 - Intent text
441 - AI explanation (optional)
442 - Counts: +added / -removed / ~modified
443 - Controls: **A/B**, **Delta Solo**, **Accept**, **Discard**, **Review Phrases**
444
445 ### 6.3 Visual Language (Piano Roll + Score)
446 - Added: green
447 - Removed: red ghost
448 - Modified: connector + highlighted proposed note
449 - Unchanged: normal
450
451 ### 6.4 Audition
452 Required:
453 - Play Original (A)
454 - Play Variation (B)
455 - Delta Solo (changes only)
456 - Loop selected phrase
457
458 MVP audio strategy:
459 - Rebuild MIDI regions in-memory for audition modes and switch at beat boundary.
460 - If switching causes glitches, pause -> swap -> resume at same transport time (acceptable for MVP).
461
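For Delta Solo, the audition region only needs the notes that the Variation introduces or alters. A minimal sketch, assuming removed notes are simply silent in this mode (the spec does not say how removals should sound) and using a hypothetical function name:

```python
def delta_solo_notes(note_changes: list[dict]) -> list[dict]:
    """Collect the proposed notes for added and modified changes only."""
    return [
        change["after"]
        for change in note_changes
        if change["change_type"] in ("added", "modified")
    ]
```

The resulting list can be used to rebuild an in-memory MIDI region for the Delta Solo mode, alongside the original (A) and proposed (B) regions.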
462 ### 6.5 Partial Acceptance
463 In the "Review Phrases" sheet/list:
464 - Each phrase row shows summary `+ / - / ~`
465 - Accept / reject per phrase
466 - "Apply Selected" commits accepted phrase IDs
467
468 ### 6.6 Exit
469 - Accept -> applies to project, pushes one undo group, exits review mode
470 - Discard -> exits review mode without changes
471
472 ---
473
474 ## 7) Failure Modes & UX Rules
475
476 ### 7.1 If streaming fails mid-way
- Keep received phrases
478 - Show a "Retry stream" button
479 - Allow Discard
480
481 ### 7.2 If commit fails due to `base_state_id` mismatch
482 - Offer: "Rebase Variation" (future)
483 - MVP: show message: "Project changed while reviewing; regenerate variation."
484
485 ### 7.3 If the user edits while reviewing
486 MVP rule:
487 - Block destructive edits to affected regions, or
488 - Allow edits but invalidate Variation (recommended: invalidate with clear toast)
489
490 ---
491
492 ## 8) MVP Cut (What to Ship First)
493
1. **Variation propose + stream phrases (SSE)**
495 2. **Piano roll overlay rendering**
496 3. **A/B audition (pause/swap/resume acceptable)**
497 4. **Accept all / Discard**
498 5. **Per-phrase accept (optional but high value)**
499
500 Score view diff + controller diffs can come after the demo.
501
502 ---
503
504 ## 9) Demo Script (Suggested)
505
506 1. Generate a major piano riff.
507 2. Ask: "Make that minor and more mysterious."
508 3. Variation Review appears:
509 - green/red note overlay
510 - A/B toggle + Delta Solo
511 4. Accept only bars 5-8, discard rest.
512 5. Undo to prove it's safe.
513
514 ---
515
516 ## 10) Appendix: Implementation Checklist
517
518 ### Backend
519
520 **Core (Implemented & Tested):**
521 - [x] `POST /variation/propose` returns `variation_id` + `stream_url`
522 - [x] `POST /variation/commit` accepts `accepted_phrase_ids`
523 - [x] `POST /variation/discard` returns `{"ok": true}`
524 - [x] SSE stream emits `meta`, `phrase*`, `done` (via `/maestro/stream`)
525 - [x] Phrase grouping by bars (4 bars per phrase default)
526 - [x] Commit applies accepted phrases only, returns `new_state_id`
527 - [x] No mutation in variation mode
528 - [x] All data uses beats as canonical unit (not seconds/milliseconds)
529 - [x] Optimistic concurrency via `base_state_id` checks
530 - [x] Zero Git terminology — pristine musical language
531 - [x] `VariationService` computes variations (not "diffs")
532 - [x] `Phrase` model for independently reviewable changes
533 - [x] `NoteChange` model for note transformations
534 - [x] Beat-based fields: `start_beat`, `duration_beats`, `beat_range`
535
536 **v1 Infrastructure (State Machine + Envelope + Store):**
537 - [x] `VariationStatus` enum: CREATED -> STREAMING -> READY -> COMMITTED/DISCARDED/FAILED/EXPIRED
538 - [x] `assert_transition()` enforces valid state machine transitions
539 - [x] `EventEnvelope` with type, sequence, variation_id, project_id, base_state_id, payload
540 - [x] `SequenceCounter` for per-variation monotonic sequence numbers
541 - [x] `VariationStore` (in-memory) for variation records + phrase storage
542 - [x] `SSEBroadcaster` with publish, subscribe, replay, late-join support
543 - [x] Builder helpers: `build_meta_envelope`, `build_phrase_envelope`, `build_done_envelope`, `build_error_envelope`
544
545 **v1 Supercharge (Complete):**
546 - [x] Wired infrastructure into endpoints (propose/commit/discard)
547 - [x] `GET /variation/stream` — real SSE with envelopes, replay, heartbeat
548 - [x] `GET /variation/{variation_id}` — status polling + reconnect
549 - [x] Note removals implemented in commit engine
550 - [x] Background generation task (async propose via `asyncio.create_task`)
551 - [x] Discard cancels in-flight generation
552 - [x] `stream_router.py` — single publish entry point (WS-ready)
553 - [x] Commit loads variation from store
554
555 **Execution Mode Policy (New):**
556 - [x] Backend forces `execution_mode="variation"` for all COMPOSING intents
557 - [x] Backend forces `execution_mode="apply"` for all EDITING intents
558 - [x] Frontend reacts to `state` SSE event; backend determines mode from intent
559
560 ### Frontend (Not Yet Started)
561 - [ ] Detect `state: "composing"` SSE event and enter Variation Review Mode
562 - [ ] Detect `state: "editing"` SSE event and apply tool calls directly (existing behavior)
563 - [ ] Parse and accumulate `meta`, `phrase`, `done` events during COMPOSING
564 - [ ] Variation Review Mode overlay chrome (banner, counts, intent)
565 - [ ] Render note states (added/removed/modified) in piano roll
566 - [ ] Phrase list UI with accept/reject per phrase
567 - [ ] A/B + Delta Solo audition
568 - [ ] Commit/discard flows with state-id checks
569 - [ ] Convert beats to audio time for playback only
570
571 ---
572
573 ## North-Star Reminder
574
575 > **Muse proposes Variations organized as Phrases.**
576 > **Humans choose the music.**
577 > **Everything is measured in beats.**
578
579 If this sticks, it becomes a new creative primitive for the entire industry.