Upstream Claude Code bug (anthropics/claude-code#24529) leaves
CLAUDE_PLUGIN_ROOT unset for Stop hooks on macOS and ALL hooks
on Linux. Two-layer defense:
1. Shell-level: hooks.json commands now use inline fallback
_R="${CLAUDE_PLUGIN_ROOT}"; [ -z "$_R" ] && _R="$HOME/...";
falling back to the known marketplace install path.
2. Script-level: bun-runner.js self-resolves plugin root from
its own filesystem location via import.meta.url, and fixes
broken /scripts/... paths that result from empty expansion.
Added test to verify all hook commands include the fallback path.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Three compounding bugs caused hook failures:
1. Missing break statements in worker-service.ts switch — if async
code threw before process.exit(), execution fell through to
subsequent cases. Added break to all 7 cases missing them.
2. Unhandled promise rejection on main() — added .catch() that logs
the error and exits 0 (per project exit code strategy: don't block
Claude Code or leave Windows Terminal tabs open).
3. Redundant start commands in hooks.json — PostToolUse,
UserPromptSubmit, and Stop groups each had a standalone start
command that was redundant (the hook case already calls
ensureWorkerStarted internally). The redundant start also caused
5s latency via bun-runner.js collectStdin() timeout since Claude
Code never closes stdin.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
* feat: add terminal output control for SessionStart context
Add CLAUDE_MEM_CONTEXT_SHOW_TERMINAL_OUTPUT setting to control whether
context is displayed in the terminal at SessionStart.
When set to "false", the terminal remains clean at startup while
context is still injected into Claude's system prompt. This allows
users who find the context output verbose to disable it without
losing the automatic context injection.
Defaults to "true" for backward compatibility.
Changes:
- Add CLAUDE_MEM_CONTEXT_SHOW_TERMINAL_OUTPUT to SettingsDefaultsManager
- Check setting in context handler before setting systemMessage
- Update settings file format to include new option
Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
* fix: use USER_SETTINGS_PATH and skip color fetch when disabled
Address PR feedback from automated review:
1. Use shared USER_SETTINGS_PATH constant instead of hardcoded path
- Respects custom CLAUDE_MEM_DATA_DIR override
- Consistent with other handlers (session-init, observation)
2. Skip color fetch when terminal output disabled
- Check setting before making HTTP requests
- Saves network round-trip on every session start
Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
---------
Co-authored-by: Alan Dong <adong@Alans-MacBook-Pro.local>
Co-authored-by: Claude Sonnet 4.5 <noreply@anthropic.com>
* MAESTRO: fix ChromaDB core issues — Python pinning, Windows paths, disable toggle, metadata sanitization, transport errors
- Add --python version pinning to uvx args in both local and remote mode (fixes#1196, #1206, #1208)
- Convert backslash paths to forward slashes for --data-dir on Windows (fixes#1199)
- Add CLAUDE_MEM_CHROMA_ENABLED setting for SQLite-only fallback mode (fixes#707)
- Sanitize metadata in addDocuments() to filter null/undefined/empty values (fixes#1183, #1188)
- Wrap callTool() in try/catch for transport errors with auto-reconnect (fixes#1162)
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
* MAESTRO: fix data integrity — content-hash deduplication, project name collision, empty project guard, stuck isProcessing
- Add SHA-256 content-hash deduplication to observations INSERT (store.ts, transactions.ts, SessionStore.ts)
- Add content_hash column via migration 22 with backfill and index
- Fix project name collision: getCurrentProjectName() now returns parent/basename
- Guard against empty project string with cwd-derived fallback
- Fix stuck isProcessing: hasAnyPendingWork() resets processing messages older than 5 minutes
- Add 12 new tests covering all four fixes
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
* MAESTRO: fix hook lifecycle — stderr suppression, output isolation, conversation pollution prevention
- Suppress process.stderr.write in hookCommand() to prevent Claude Code showing diagnostic
output as error UI (#1181). Restores stderr in finally block for worker-continues case.
- Convert console.error() to logger.warn()/error() in hook-command.ts and handlers/index.ts
so all diagnostics route to log file instead of stderr.
- Verified all 7 handlers return suppressOutput: true (prevents conversation pollution #598, #784).
- Verified session-complete is a recognized event type (fixes#984).
- Verified unknown event types return no-op handler with exit 0 (graceful degradation).
- Added 10 new tests in tests/hook-lifecycle.test.ts covering event dispatch, adapter defaults,
stderr suppression, and standard response constants.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
* MAESTRO: fix worker lifecycle — restart loop coordination, stale transport retry, ENOENT shutdown race
- Add PID file mtime guard to prevent concurrent restart storms (#1145):
isPidFileRecent() + touchPidFile() coordinate across sessions
- Add transparent retry in ChromaMcpManager.callTool() on transport
error — reconnects and retries once instead of failing (#1131)
- Wrap getInstalledPluginVersion() with ENOENT/EBUSY handling (#1042)
- Verified ChromaMcpManager.stop() already called on all shutdown paths
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
* MAESTRO: fix Windows platform support — uvx.cmd spawn, PowerShell $_ elimination, windowsHide, FTS5 fallback
- Route uvx spawn through cmd.exe /c on Windows since MCP SDK lacks shell:true (#1190, #1192, #1199)
- Replace all PowerShell Where-Object {$_} pipelines with WQL -Filter server-side filtering (#1024, #1062)
- Add windowsHide: true to all exec/spawn calls missing it to prevent console popups (#1048)
- Add FTS5 runtime probe with graceful fallback when unavailable on Windows (#791)
- Guard FTS5 table creation in migrations, SessionSearch, and SessionStore with try/catch
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
* MAESTRO: fix skills/ distribution — build-time verification and regression tests (#1187)
Add post-build verification in build-hooks.js that fails if critical
distribution files (skills, hooks, plugin manifest) are missing. Add
10 regression tests covering skill file presence, YAML frontmatter,
hooks.json integrity, and package.json files field.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
* MAESTRO: fix MigrationRunner schema initialization (#979) — version conflict between parallel migration systems
Root cause: old DatabaseManager migrations 1-7 shared schema_versions table with
MigrationRunner's 4-22, causing version number collisions (5=drop tables vs add column,
6=FTS5 vs prompt tracking, 7=discovery_tokens vs remove UNIQUE). initializeSchema()
was gated behind maxApplied===0, so core tables were never created when old versions
were present.
Fixes:
- initializeSchema() always creates core tables via CREATE TABLE IF NOT EXISTS
- Migrations 5-7 check actual DB state (columns/constraints) not just version tracking
- Crash-safe temp table rebuilds (DROP IF EXISTS _new before CREATE)
- Added missing migration 21 (ON UPDATE CASCADE) to MigrationRunner
- Added ON UPDATE CASCADE to FK definitions in initializeSchema()
- All changes applied to both runner.ts and SessionStore.ts
Tests: 13 new tests in migration-runner.test.ts covering fresh DB, idempotency,
version conflicts, crash recovery, FK constraints, and data integrity.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
* MAESTRO: fix 21 test failures — stale mocks, outdated assertions, missing OpenClaw guards
Server tests (12): Added missing workerPath and getAiStatus to ServerOptions
mocks after interface expansion. ChromaSync tests (3): Updated to verify
transport cleanup in ChromaMcpManager after architecture refactor. OpenClaw (2):
Added memory_ tool skipping and response truncation to prevent recursive loops
and oversized payloads. MarkdownFormatter (2): Updated assertions to match
current output. SettingsDefaultsManager (1): Used correct default key for
getBool test. Logger standards (1): Excluded CLI transcript command from
background service check.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
* MAESTRO: fix Codex CLI compatibility (#744) — session_id fallbacks, unknown platform tolerance, undefined guard
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
* MAESTRO: fix Cursor IDE integration (#838, #1049) — adapter field fallbacks, tolerant session-init validation
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
* MAESTRO: fix /api/logs OOM (#1203) — tail-read replaces full-file readFileSync
Replace readFileSync (loads entire file into memory) with readLastLines()
that reads only from the end of the file in expanding chunks (64KB → 10MB cap).
Prevents OOM on large log files while preserving the same API response shape.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
* MAESTRO: fix Settings CORS error (#1029) — explicit methods and allowedHeaders in CORS config
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
* MAESTRO: add session custom_title for agent attribution (#1213) — migration 23, endpoint + store support
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
* MAESTRO: prevent CLAUDE.md/AGENTS.md writes inside .git/ directories (#1165)
Add .git path guard to all 4 write sites to prevent ref corruption when
paths resolve inside .git internals.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
* MAESTRO: fix plugin disabled state not respected (#781) — early exit check in all hook entry points
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
* MAESTRO: fix UserPromptSubmit context re-injection on every turn (#1079) — contextInjected session flag
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
* MAESTRO: fix stale AbortController queue stall (#1099) — lastGeneratorActivity tracking + 30s timeout
Three-layer fix:
1. Added lastGeneratorActivity timestamp to ActiveSession, updated by
processAgentResponse (all agents), getMessageIterator (queue yields),
and startGeneratorWithProvider (generator launch)
2. Added stale generator detection in ensureGeneratorRunning — if no
activity for >30s, aborts stale controller, resets state, restarts
3. Added AbortSignal.timeout(30000) in deleteSession to prevent
indefinite hang when awaiting a stuck generator promise
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
---------
Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
src/services/transcripts/cli.ts is a CLI command with user-visible
console output, not a background service — console.log is intentional.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Replace hardcoded marketplace path in plugin/scripts/smart-install.js with
resolveRoot() that uses CLAUDE_PLUGIN_ROOT env var (set by Claude Code for
all hooks), with fallback to script location and legacy paths. Fixes#1128,
#1166 where cache installs couldn't find or install node_modules.
Also fixes installCLI() path (ROOT/plugin/scripts/ → ROOT/scripts/) and adds
verifyCriticalModules() post-install check with npm fallback on failure.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Add string-to-array coercion for ids and memorySessionIds in DataRoutes.ts
batch endpoints so MCP clients sending "[1,2,3]" or "1,2,3" instead of
native arrays no longer get 400 errors. Wrap observation storage path in
SessionRoutes.ts with try/catch returning 200 on recoverable errors instead
of 500, preventing hook breakage.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
* fix: rename save_memory to save_observation and fix MCP search instructions
Stop the primary agent from proactively saving memories by renaming
save_memory to save_observation with a neutral description. Remove
"Saving Memories" section from SKILL.md. Update context formatters
and output styles to reference the mem-search skill instead of raw
MCP tool names.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
* fix: split SessionStart hooks so smart-install failure doesn't block worker start
smart-install.js and worker-start were in the same hook group, so if
smart-install exited non-zero the worker never started. Split into
separate hook groups so they run independently.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
* fix: worker startup waits for readiness before hooks fire
Move initializationCompleteFlag to set after DB/search init (not MCP),
add waitForReadiness() polling /api/readiness, and extract shared
pollEndpointUntilOk helper to DRY up health/readiness checks.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
---------
Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
Copied compiled installer to install/public/installer.js so Vercel
serves it at install.cmem.ai/installer.js. Updated install.sh to
fetch from same domain instead of raw.githubusercontent.com.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
The bootstrap script (install.sh) fetches installer/dist/index.js from
main, but it was never committed due to the global dist/ gitignore rule.
Added negation rule and the compiled installer bundle.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
* fix: prevent duplicate worker daemons and zombie processes
Three root causes of chroma-mcp timeouts:
1. HTTP shutdown (POST /api/admin/shutdown) closed resources but never
called process.exit(). Zombie workers stayed alive, background tasks
reconnected to chroma-mcp, spawning duplicate subprocesses that all
contended for the same persistent data directory.
2. No guard against concurrent daemon startup. When hooks fired
simultaneously, multiple daemons started before either wrote a PID
file. The loser got EADDRINUSE but stayed alive because signal
handlers registered in the constructor prevented exit.
3. Corrupt 147GB HNSW index file caused all chroma queries to timeout
(MCP error -32001). Data fix: deleted corrupt collection, backfill
rebuilds from SQLite.
Code fixes:
- Add PID-based guard in daemon startup: exit if PID file process alive
- Add port-based guard in daemon startup: exit if port already bound
(runs before WorkerService constructor registers keepalive handlers)
- Add process.exit(0) after HTTP shutdown/restart completes
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
* fix: aggressive startup cleanup and one-time chroma wipe for upgrade
Kill orphaned worker-service.cjs and chroma-mcp processes immediately
at startup (no age gate) while keeping 30-min threshold for mcp-server.
Wipe corrupt chroma data once on upgrade from pre-v10.3 versions —
backfill rebuilds from SQLite automatically.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
* fix: wrap shutdown handlers in try/finally to guarantee process.exit
If onShutdown() or onRestart() threw, process.exit(0) was never reached,
leaving the daemon alive as a zombie. Also removed redundant require('fs')
calls in process-manager tests where ESM imports already existed.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
---------
Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
* feat: replace WASM embeddings with persistent chroma-mcp MCP connection
Replace ChromaServerManager (npx chroma run + chromadb npm + ONNX/WASM)
with ChromaMcpManager, a singleton stdio MCP client that communicates with
chroma-mcp via uvx. This eliminates native binary issues, segfaults, and
WASM embedding failures that plagued cross-platform installs.
Key changes:
- Add ChromaMcpManager: singleton MCP client with lazy connect, auto-reconnect,
connection lock, and Zscaler SSL cert support
- Rewrite ChromaSync to use MCP tool calls instead of chromadb npm client
- Handle chroma-mcp's non-JSON responses (plain text success/error messages)
- Treat "collection already exists" as idempotent success
- Wire ChromaMcpManager into GracefulShutdown for clean subprocess teardown
- Delete ChromaServerManager (no longer needed)
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
* fix: address PR review — connection guard leak, timer leak, async reset
- Clear connecting guard in finally block to prevent permanent reconnection block
- Clear timeout after successful connection to prevent timer leak
- Make reset() async to await stop() before nullifying instance
- Delete obsolete chroma-server-manager test (imports deleted class)
- Update graceful-shutdown test to use chromaMcpManager property name
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
* fix: prevent chroma-mcp spawn storm — zombie cleanup, stale onclose guard, reconnect backoff
Three bugs caused chroma-mcp processes to accumulate (92+ observed):
1. Zombie on timeout: failed connections left subprocess alive because
only the timer was cleared, not the transport. Now catch block
explicitly closes transport+client before rethrowing.
2. Stale onclose race: old transport's onclose handler captured `this`
and overwrote the current connection reference after reconnect,
orphaning the new subprocess. Now guarded with reference check.
3. No backoff: every failure triggered immediate reconnect. With
backfill doing hundreds of MCP calls, this created rapid-fire
spawning. Added 10s backoff on both connection failure and
unexpected process death.
Also includes ChromaSync fixes from PR review:
- queryChroma deduplication now preserves index-aligned arrays
- SQL injection guard on backfill ID exclusion lists
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
---------
Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
Two changes fix the observer process resource leak:
1. Add ensureProcessExit to generator finally blocks in SessionRoutes and
worker-service, matching the pattern already working in SDKAgent.
2. Add stale session reaper (every 2m) that removes sessions with no active
generator and no pending work after 15m idle. This unblocks the orphan
reaper which previously skipped processes for "active" sessions.
Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
* fix: self-healing claimNextMessage prevents stuck processing messages
claimAndDelete → claimNextMessage with atomic self-healing: resets stale
processing messages (>60s) back to pending before claiming. Eliminates
stuck messages from generator crashes without external timers. Removes
redundant idle-timeout reset in worker-service.ts. Adds QUEUE to logger
Component type.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
* fix: update stale comments in SessionQueueProcessor to reflect claim-confirm pattern
Comments still referenced the old claim-and-delete pattern after the
claimNextMessage rename. Updated to accurately describe the current
lifecycle where messages are marked as processing and stay in DB until
confirmProcessed() is called.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
* fix: move Date.now() inside transaction and extract stale threshold constant
- Move Date.now() inside claimNextMessage transaction closure so timestamp
is fresh if WAL contention causes retry
- Extract STALE_PROCESSING_THRESHOLD_MS to module-level constant
- Add comment clarifying strict < boundary semantics
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
---------
Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
* fix: backfill all Chroma projects on worker startup
ChromaSync.ensureBackfilled() existed but was never called. After
v10.2.2's bun cache clear destroyed the ONNX model cache, Chroma only
had ~2 days of embeddings while SQLite had 49k+ observations.
- Add static backfillAllProjects() to ChromaSync — iterates all projects
in SQLite, creates temporary ChromaSync per project, runs smart diff
- Call backfillAllProjects() fire-and-forget on worker startup
- Add 'CHROMA_SYNC' to logger Component type (pre-existing gap)
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
* fix: sanitize project names for Chroma collection naming
Replace characters outside [a-zA-Z0-9._-] with underscores so projects
like "YC Stuff" map to collection "cm__YC_Stuff" instead of failing
Chroma's collection name validation.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
* fix: route backfill to shared cm__claude-mem collection, harden sanitization
- Use single ChromaSync('claude-mem') in backfillAllProjects() instead of
per-project instances, matching how DatabaseManager and SearchManager
operate — fixes critical bug where backfilled data landed in orphaned
collections that no search path reads from
- Strip trailing non-alphanumeric chars from sanitized collection names
to satisfy Chroma's end-character constraint
- Guard backfill behind Chroma server readiness to avoid N spurious error
logs when Chroma failed to start
- Use CHROMA_SYNC log component consistently for backfill messages
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
* refactor: pass project as parameter to ensureBackfilled instead of mutating instance state
Eliminates shared mutable state in backfillAllProjects() loop. Project
scoping is now passed explicitly via parameter to both ensureBackfilled()
and getExistingChromaIds(), keeping a single Chroma connection while
avoiding fragile instance property mutation across iterations.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
---------
Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
Remove nuclear `bun pm cache rm` from smart-install.js and
sync-marketplace.cjs (only needed for removed sharp dependency).
Add `bun install` in cache version directory after sync so worker
can resolve dependencies. Move HuggingFace model cache to
~/.claude-mem/models/ so reinstalls don't corrupt it. Add self-healing
retry for Protobuf parsing failures.
Fixes recurring issues #1104, #1105, #1110.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Sharp was an explicit dependency but nothing in the codebase imports it.
Chroma embeddings use ONNX Runtime via @chroma-core/default-embed, not sharp.
Sharp's native binary has a persistent Bun node_modules layout bug where
@img/sharp-libvips-* isn't placed alongside @img/sharp-darwin-* causing
ERR_DLOPEN_FAILED on every install.
- Remove sharp, @img/sharp-libvips-darwin-arm64, node-gyp from deps
- Remove node-addon-api from devDeps
- Remove @img cache clearing hacks from smart-install.js and sync-marketplace.cjs
- Replace with simple `bun pm cache rm` before install as general cache hygiene
Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
* fix: use bun install in sync, add node-addon-api for sharp, consolidate PendingMessageStore
- Switch sync-marketplace from npm to bun install
- Add node-addon-api as dev dep so sharp builds under bun
- Consolidate duplicate PendingMessageStore instantiation in worker-service finally block
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
* build assets
---------
Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
* fix: add gemini-3-flash to validModels array
The model was defined in the type union and RPM limits but missing from
the runtime validModels array, causing silent fallback to gemini-2.5-flash.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
* fix: skip processing when Gemini returns empty observation response
Empty responses were silently consuming messages from the queue via
processAgentResponse. Now skips processing on empty content, leaving
the message in processing status for stale recovery.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
* fix: prevent idle timeout from triggering infinite restart loop
When a session hits the 3-minute idle timeout, the finally block was
seeing stale processing messages and restarting the generator endlessly.
Now tracks idle timeout as a distinct exit reason via session flag,
resets stale messages, and skips restart.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
* fix: clear stale Bun native module cache on update
Bun's global cache retains sharp/libvips native binaries with broken
dylib references after version upgrades. Clear ~/.bun/install/cache/@img/
before install in both the end-user (smart-install) and dev (sync-marketplace)
paths to prevent ERR_DLOPEN_FAILED errors in Chroma sync.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
* fix: address PR review feedback (empty summary response, session-scoped reset, shell injection)
- Apply same empty-response guard to summary path as observation path in GeminiAgent
- Add optional sessionDbId param to resetStaleProcessingMessages for session-scoped resets
- Use JSON.stringify for gitignore pattern escaping, filter negation patterns
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
---------
Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>