perf(chroma): cache backfill watermarks in JSON to skip per-restart Chroma scans

Every worker restart ran a full Chroma metadata scan for each project to work
out which sqlite ids were already embedded. With 253 projects and ~92k
embeddings, this pegged chroma-mcp at 100-422% CPU on every spawn.

Replace the scan with ~/.claude-mem/chroma-sync-state.json, which stores the
highest synced sqlite_id per project as a watermark for each of observations,
summaries, and prompts. Backfill switches from "id NOT IN (huge list)" to
"id > watermark", live syncs bump the watermark on success, and a one-time
bootstrap derives the initial watermarks from a single Chroma scan if the state
file is missing.
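
The watermark logic described above can be sketched roughly as follows. This is
an illustration only, not the code in this commit: the state shape and the names
SyncState, bootstrapWatermark, rowsToBackfill, and bumpWatermark are all
hypothetical, and file I/O is left out so the sketch stays self-contained.

```typescript
// Hypothetical shape of the state file; the real schema is not shown here.
type Kind = "observations" | "summaries" | "prompts";
type Watermarks = Record<Kind, number>;
type SyncState = Record<string, Watermarks>; // keyed by project name

const emptyWatermarks = (): Watermarks => ({
  observations: 0,
  summaries: 0,
  prompts: 0,
});

// One-time bootstrap: derive a watermark from the sqlite ids found in a
// single Chroma scan (0 when nothing has been embedded yet).
function bootstrapWatermark(embeddedIds: number[]): number {
  return embeddedIds.reduce((max, id) => Math.max(max, id), 0);
}

// Backfill selection: "id > watermark" instead of "id NOT IN (huge list)".
function rowsToBackfill<T extends { id: number }>(
  rows: T[],
  watermark: number,
): T[] {
  return rows.filter((row) => row.id > watermark);
}

// Live sync: after a successful embed, raise the watermark (never lower it).
// Returns a new state object rather than mutating the input.
function bumpWatermark(
  state: SyncState,
  project: string,
  kind: Kind,
  syncedId: number,
): SyncState {
  const current = state[project] ?? emptyWatermarks();
  return {
    ...state,
    [project]: { ...current, [kind]: Math.max(current[kind], syncedId) },
  };
}
```

A watermark records only the highest synced id, so this sketch assumes every
row at or below it has already been embedded. The state object serializes
directly with JSON.stringify, which is what makes a plain JSON file sufficient.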

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Alex Newman
2026-04-25 14:05:48 -07:00
parent d7c7eccd7f
commit 5769f00827
3 changed files with 430 additions and 222 deletions