fix(security): full surface hardening — validation, path containment, parser guards, CLI bounds

Introduces muse/core/validation.py as the central trust-boundary module (validate_object_id/ref_id/branch_name/domain_name, contain_path, sanitize_display, MAX_FILE_BYTES/RESPONSE_BYTES), then applies it across the entire codebase:

Group 1 — ID validation
- object_store: SHA-256 integrity check, atomic write via os.replace, per-read size cap, validate_object_id at object_path/restore_object.
- store: glob-safe prefix scan (sanitize_glob_prefix), runtime type checks in CommitRecord.from_dict, branch/repo_id validation, trust-boundary validation in resolve_commit_ref + store_pulled_commit.
- merge_engine: validate IDs in read_merge_state/apply_resolution, 50 000-ancestor cap in find_merge_base/_all_ancestors.
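The Group 1 checks can be sketched in Python as follows. This is a minimal illustration, not the code in muse/core. The on-disk layout (two-character fan-out directories), the helpers write_object, read_object and all_ancestors, the MAX_FILE_BYTES value, and the choice to raise (rather than truncate) at the ancestor cap are all assumptions made for the example.

```python
import hashlib
import os
import re
import tempfile
from collections import deque
from pathlib import Path
from typing import Callable, Iterable

# Assumed values for illustration; the real constants live in muse/core/validation.py.
MAX_FILE_BYTES = 256 * 1024 * 1024
MAX_ANCESTORS = 50_000

_SHA256_HEX = re.compile(r"[0-9a-f]{64}")


def validate_object_id(object_id: str) -> str:
    """Accept only a 64-character lowercase hex SHA-256 digest."""
    if not isinstance(object_id, str) or not _SHA256_HEX.fullmatch(object_id):
        raise ValueError(f"invalid object id: {object_id!r}")
    return object_id


def object_path(root: Path, object_id: str) -> Path:
    # Validate before the ID is used to build a path, so "../x" never reaches the filesystem.
    oid = validate_object_id(object_id)
    return root / oid[:2] / oid[2:]


def write_object(root: Path, data: bytes) -> str:
    object_id = hashlib.sha256(data).hexdigest()
    dest = object_path(root, object_id)
    dest.parent.mkdir(parents=True, exist_ok=True)
    # Write to a temp file in the same directory, then rename it into place
    # atomically, so readers never observe a half-written object.
    fd, tmp = tempfile.mkstemp(dir=dest.parent)
    try:
        with os.fdopen(fd, "wb") as fh:
            fh.write(data)
        os.replace(tmp, dest)
    except BaseException:
        if os.path.exists(tmp):
            os.unlink(tmp)
        raise
    return object_id


def read_object(root: Path, object_id: str) -> bytes:
    path = object_path(root, object_id)
    with path.open("rb") as fh:
        # Read at most one byte past the cap so oversized objects are detected
        # without loading them fully into memory.
        data = fh.read(MAX_FILE_BYTES + 1)
    if len(data) > MAX_FILE_BYTES:
        raise ValueError(f"object {object_id} exceeds {MAX_FILE_BYTES} bytes")
    # Re-hash on every read so on-disk corruption or tampering is caught.
    if hashlib.sha256(data).hexdigest() != object_id:
        raise ValueError(f"content-integrity failure for object {object_id}")
    return data


def all_ancestors(
    start: str,
    parents_of: Callable[[str], Iterable[str]],
    limit: int = MAX_ANCESTORS,
) -> set[str]:
    """Breadth-first ancestor walk (including start) that refuses unbounded histories."""
    seen = {start}
    queue = deque([start])
    while queue:
        for parent in parents_of(queue.popleft()):
            if parent in seen:
                continue
            if len(seen) >= limit:
                raise ValueError(f"history exceeds {limit} ancestors")
            seen.add(parent)
            queue.append(parent)
    return seen
```

Creating the temp file in the destination directory matters: os.replace is only atomic when source and target sit on the same filesystem.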

Group 2 — Path containment
- All 6 restore commands (checkout, merge, reset, revert, cherry_pick, stash): contain_path wraps every object restore.
- branch/init/commit: validate_branch_name + validate_domain_name at entry points; HEAD-poisoning guard in commit._read_branch.
- 7 semantic write commands + midi_shard: contain_path on all output paths; shard rebased_tick clamped to >= 0.
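A minimal sketch of what a contain_path check might look like, assuming it takes the repository root plus an untrusted relative path (for example, one read from a snapshot manifest) and returns the resolved destination. Only the name contain_path comes from the commit; the signature and the ValueError are assumptions.

```python
from pathlib import Path


def contain_path(root: Path, relative: str) -> Path:
    """Resolve `relative` beneath `root` and refuse anything that escapes it.

    resolve() collapses ".." segments and follows existing symlinks, so both
    "../../etc/passwd" and a symlink pointing outside the tree are caught.
    """
    base = root.resolve()
    candidate = (base / relative).resolve()
    # is_relative_to (Python 3.9+) compares whole path components, so a sibling
    # such as "/repo-evil" is not mistaken for a child of "/repo".
    if not candidate.is_relative_to(base):
        raise ValueError(f"path escapes repository root: {relative!r}")
    return candidate
```

A restore would then write through the returned path, e.g. contain_path(workdir, rel).write_bytes(data), so a manifest entry like ../../.bashrc fails before any bytes reach disk.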

Group 3 — Snapshot integrity
- compute_snapshot_id / compute_commit_id: null-byte separators replace | and : to eliminate separator-injection attacks.
- walk_workdir: symlinks skipped; hidden files/directories excluded.
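The following sketch shows why null-byte separators help and how the walk_workdir filtering might look. It assumes a snapshot is a mapping from relative path to hex object ID; legacy_snapshot_key, the exact hashing layout, and the returned list format are hypothetical.

```python
import hashlib
import os
from pathlib import Path


def legacy_snapshot_key(manifest: dict[str, str]) -> str:
    # Separator-joined encoding: ambiguous whenever a path contains "|" or ":".
    return "|".join(f"{path}:{manifest[path]}" for path in sorted(manifest))


def compute_snapshot_id(manifest: dict[str, str]) -> str:
    """Hash a path -> object-ID manifest using NUL terminators.

    Filesystem paths cannot contain NUL and object IDs are hex, so every
    field boundary is unambiguous.
    """
    h = hashlib.sha256()
    for path in sorted(manifest):
        h.update(path.encode("utf-8") + b"\x00")
        h.update(manifest[path].encode("ascii") + b"\x00")
    return h.hexdigest()


def walk_workdir(root: Path) -> list[str]:
    """List candidate files: no symlinks, no hidden files or directories."""
    found: list[str] = []
    for dirpath, dirnames, filenames in os.walk(root, followlinks=False):
        # Prune in place so os.walk never descends into hidden directories.
        dirnames[:] = [d for d in dirnames if not d.startswith(".")]
        for name in filenames:
            full = Path(dirpath) / name
            if name.startswith(".") or full.is_symlink():
                continue
            found.append(full.relative_to(root).as_posix())
    return sorted(found)
```

With the legacy encoding, {"a": H1, "b": H2} and the single-entry manifest {"a:H1|b": H2} serialize to the same string and therefore hash identically; the NUL-terminated form keeps them distinct.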

Group 4 — Parser / transport hardening
- midi_parser: defusedxml via muse/core/xml_safe.py (SafeET), file-size cap, tempo=0 guard, Inf/NaN BPM guard, negative-divisions guard, root-is-None check.
- midi_merge: sysex payload truncated to 64 KiB (MAX_SYSEX_BYTES).
- transport: urlparse scheme check, redirect refusal via _STRICT_OPENER + _open_url helper (_HttpResponse Protocol for clean typing), Content-Length cap, streaming read cap, _assert_json_content guard on all three parse helpers.

Group 5 — CLI bounds + display sanitization
- log: --max-count enforced >= 1; history walking bounded by limit.
- find_phrase / agent_map: --depth capped 1–10 000; --min-score 0–1.
- humanize: --timing <= 1.0 beat, --velocity <= 127.
- invert: --pivot validated 0–127.
- All typer.echo paths: sanitize_display strips ANSI/control chars across log, tag, branch, checkout, merge, reset, revert, cherry_pick, commit, find_phrase, agent_map.
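Here is a sketch of sanitize_display and of the range checks behind the Group 5 flags. The escape-sequence regex, the choice to drop every Unicode Cc/Cf character (which also removes newlines and tabs), and the hypothetical require_range helper are assumptions rather than Muse's code. With Typer, the same bounds could also be declared on the option itself via typer.Option(..., min=..., max=...).

```python
import re
import unicodedata
from typing import Optional

_ANSI_ESCAPE = re.compile(
    r"\x1b\[[0-?]*[ -/]*[@-~]"  # CSI: colours, cursor movement, screen clears
    r"|\x1b\][^\x07\x1b]*(?:\x07|\x1b\\)"  # OSC: window titles, hyperlinks
    r"|\x1b[@-Z\\-_]"  # other two-byte ESC sequences
)


def sanitize_display(text: str) -> str:
    """Strip terminal escape sequences, then any remaining control/format characters."""
    without_escapes = _ANSI_ESCAPE.sub("", text)
    # Cc covers C0/C1 controls (including NUL and newline); Cf covers bidi overrides such as U+202E.
    return "".join(
        ch for ch in without_escapes if unicodedata.category(ch) not in ("Cc", "Cf")
    )


def require_range(name: str, value: float, lo: float, hi: Optional[float] = None) -> float:
    """Reject a CLI value outside [lo, hi] (or below lo when hi is None)."""
    if value < lo or (hi is not None and value > hi):
        bound = f">= {lo}" if hi is None else f"in [{lo}, {hi}]"
        raise ValueError(f"{name} must be {bound}, got {value}")
    return value
```

For example, a branch named "main\x1b]0;pwned\x07" would otherwise retitle the user's terminal window when echoed; after sanitizing it prints as plain "main".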

Tests updated to use real SHA-256 hashes for object IDs, 64-char hex strings for commit/snapshot refs, and correct expectations for the new security behaviors (TransportError instead of JSONDecodeError, ValueError for content-integrity failures, etc.).

All checks pass: mypy (0 errors), typing_audit (0 violations), pytest (1967/1967 green).

Co-authored-by: Gabriel Cardona <gabriel@tellurstori.com>

Gabriel Cardona <cgcardona@gmail.com> · Mar 20, 2026 · 8d5137ed · parent 80353726

Snapshot diff: 272 files in tree (+2 added, ~47 modified)