claude-mem

Author	SHA1	Message	Date
Alex Newman	7966c6cba9	fix: rename save_memory and fix MCP search instructions + startup hook (#1210 ) * fix: rename save_memory to save_observation and fix MCP search instructions Stop the primary agent from proactively saving memories by renaming save_memory to save_observation with a neutral description. Remove "Saving Memories" section from SKILL.md. Update context formatters and output styles to reference the mem-search skill instead of raw MCP tool names. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> * fix: split SessionStart hooks so smart-install failure doesn't block worker start smart-install.js and worker-start were in the same hook group, so if smart-install exited non-zero the worker never started. Split into separate hook groups so they run independently. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> * fix: worker startup waits for readiness before hooks fire Move initializationCompleteFlag to set after DB/search init (not MCP), add waitForReadiness() polling /api/readiness, and extract shared pollEndpointUntilOk helper to DRY up health/readiness checks. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> --------- Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>	2026-02-23 03:30:31 -05:00
Alex Newman	e788fd3676	fix: prevent duplicate worker daemons and zombie processes (#1178) * fix: prevent duplicate worker daemons and zombie processes Three root causes of chroma-mcp timeouts: 1. HTTP shutdown (POST /api/admin/shutdown) closed resources but never called process.exit(). Zombie workers stayed alive, background tasks reconnected to chroma-mcp, spawning duplicate subprocesses that all contended for the same persistent data directory. 2. No guard against concurrent daemon startup. When hooks fired simultaneously, multiple daemons started before either wrote a PID file. The loser got EADDRINUSE but stayed alive because signal handlers registered in the constructor prevented exit. 3. Corrupt 147GB HNSW index file caused all chroma queries to timeout (MCP error -32001). Data fix: deleted corrupt collection, backfill rebuilds from SQLite. Code fixes: - Add PID-based guard in daemon startup: exit if PID file process alive - Add port-based guard in daemon startup: exit if port already bound (runs before WorkerService constructor registers keepalive handlers) - Add process.exit(0) after HTTP shutdown/restart completes Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> * fix: aggressive startup cleanup and one-time chroma wipe for upgrade Kill orphaned worker-service.cjs and chroma-mcp processes immediately at startup (no age gate) while keeping 30-min threshold for mcp-server. Wipe corrupt chroma data once on upgrade from pre-v10.3 versions; backfill rebuilds from SQLite automatically. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> * fix: wrap shutdown handlers in try/finally to guarantee process.exit If onShutdown() or onRestart() threw, process.exit(0) was never reached, leaving the daemon alive as a zombie. Also removed redundant require('fs') calls in process-manager tests where ESM imports already existed. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> --------- Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>	2026-02-18 20:10:28 -05:00
Alex Newman	40daf8f3fa	feat: replace WASM embeddings with persistent chroma-mcp MCP connection (#1176) * feat: replace WASM embeddings with persistent chroma-mcp MCP connection Replace ChromaServerManager (npx chroma run + chromadb npm + ONNX/WASM) with ChromaMcpManager, a singleton stdio MCP client that communicates with chroma-mcp via uvx. This eliminates native binary issues, segfaults, and WASM embedding failures that plagued cross-platform installs. Key changes: - Add ChromaMcpManager: singleton MCP client with lazy connect, auto-reconnect, connection lock, and Zscaler SSL cert support - Rewrite ChromaSync to use MCP tool calls instead of chromadb npm client - Handle chroma-mcp's non-JSON responses (plain text success/error messages) - Treat "collection already exists" as idempotent success - Wire ChromaMcpManager into GracefulShutdown for clean subprocess teardown - Delete ChromaServerManager (no longer needed) Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> * fix: address PR review - connection guard leak, timer leak, async reset - Clear connecting guard in finally block to prevent permanent reconnection block - Clear timeout after successful connection to prevent timer leak - Make reset() async to await stop() before nullifying instance - Delete obsolete chroma-server-manager test (imports deleted class) - Update graceful-shutdown test to use chromaMcpManager property name Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> * fix: prevent chroma-mcp spawn storm - zombie cleanup, stale onclose guard, reconnect backoff Three bugs caused chroma-mcp processes to accumulate (92+ observed): 1. Zombie on timeout: failed connections left subprocess alive because only the timer was cleared, not the transport. Now catch block explicitly closes transport+client before rethrowing. 2. Stale onclose race: old transport's onclose handler captured `this` and overwrote the current connection reference after reconnect, orphaning the new subprocess. Now guarded with reference check. 3. No backoff: every failure triggered immediate reconnect. With backfill doing hundreds of MCP calls, this created rapid-fire spawning. Added 10s backoff on both connection failure and unexpected process death. Also includes ChromaSync fixes from PR review: - queryChroma deduplication now preserves index-aligned arrays - SQL injection guard on backfill ID exclusion lists Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> --------- Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>	2026-02-18 18:32:38 -05:00
Alex Newman	5d79bb7a7a	fix: prevent zombie process accumulation by verifying subprocess exit (#1168) (#1175) Two changes fix the observer process resource leak: 1. Add ensureProcessExit to generator finally blocks in SessionRoutes and worker-service, matching the pattern already working in SDKAgent. 2. Add stale session reaper (every 2m) that removes sessions with no active generator and no pending work after 15m idle. This unblocks the orphan reaper which previously skipped processes for "active" sessions. Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>	2026-02-18 16:33:23 -05:00
Alex Newman	b88251bc8b	fix: self-healing claimNextMessage prevents stuck processing messages (#1159 ) * fix: self-healing claimNextMessage prevents stuck processing messages claimAndDelete → claimNextMessage with atomic self-healing: resets stale processing messages (>60s) back to pending before claiming. Eliminates stuck messages from generator crashes without external timers. Removes redundant idle-timeout reset in worker-service.ts. Adds QUEUE to logger Component type. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> * fix: update stale comments in SessionQueueProcessor to reflect claim-confirm pattern Comments still referenced the old claim-and-delete pattern after the claimNextMessage rename. Updated to accurately describe the current lifecycle where messages are marked as processing and stay in DB until confirmProcessed() is called. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> * fix: move Date.now() inside transaction and extract stale threshold constant - Move Date.now() inside claimNextMessage transaction closure so timestamp is fresh if WAL contention causes retry - Extract STALE_PROCESSING_THRESHOLD_MS to module-level constant - Add comment clarifying strict < boundary semantics Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> --------- Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>	2026-02-17 23:15:46 -05:00
Alex Newman	ca8421611c	fix: backfill Chroma vector DB for all projects on startup (#1154) * fix: backfill all Chroma projects on worker startup ChromaSync.ensureBackfilled() existed but was never called. After v10.2.2's bun cache clear destroyed the ONNX model cache, Chroma only had ~2 days of embeddings while SQLite had 49k+ observations. - Add static backfillAllProjects() to ChromaSync: iterates all projects in SQLite, creates temporary ChromaSync per project, runs smart diff - Call backfillAllProjects() fire-and-forget on worker startup - Add 'CHROMA_SYNC' to logger Component type (pre-existing gap) Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> * fix: sanitize project names for Chroma collection naming Replace characters outside [a-zA-Z0-9._-] with underscores so projects like "YC Stuff" map to collection "cm__YC_Stuff" instead of failing Chroma's collection name validation. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> * fix: route backfill to shared cm__claude-mem collection, harden sanitization - Use single ChromaSync('claude-mem') in backfillAllProjects() instead of per-project instances, matching how DatabaseManager and SearchManager operate; fixes critical bug where backfilled data landed in orphaned collections that no search path reads from - Strip trailing non-alphanumeric chars from sanitized collection names to satisfy Chroma's end-character constraint - Guard backfill behind Chroma server readiness to avoid N spurious error logs when Chroma failed to start - Use CHROMA_SYNC log component consistently for backfill messages Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> * refactor: pass project as parameter to ensureBackfilled instead of mutating instance state Eliminates shared mutable state in backfillAllProjects() loop. Project scoping is now passed explicitly via parameter to both ensureBackfilled() and getExistingChromaIds(), keeping a single Chroma connection while avoiding fragile instance property mutation across iterations. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> --------- Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>	2026-02-17 22:47:46 -05:00
Alex Newman	224567f980	fix: prevent ONNX model cache corruption from bun cache clears Remove nuclear `bun pm cache rm` from smart-install.js and sync-marketplace.cjs (only needed for removed sharp dependency). Add `bun install` in cache version directory after sync so worker can resolve dependencies. Move HuggingFace model cache to ~/.claude-mem/models/ so reinstalls don't corrupt it. Add self-healing retry for Protobuf parsing failures. Fixes recurring issues #1104, #1105, #1110. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-02-16 19:54:17 -05:00
Alex Newman	f24251118e	fix: bun install, node-addon-api for sharp, consolidate PendingMessageStore (#1140 ) * fix: use bun install in sync, add node-addon-api for sharp, consolidate PendingMessageStore - Switch sync-marketplace from npm to bun install - Add node-addon-api as dev dep so sharp builds under bun - Consolidate duplicate PendingMessageStore instantiation in worker-service finally block Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> * build assets --------- Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>	2026-02-16 18:05:42 -05:00
Alex Newman	d2e926fbf7	fix: post-merge breakage (Gemini, idle timeout, sharp cache) (#1138 ) * fix: add gemini-3-flash to validModels array The model was defined in the type union and RPM limits but missing from the runtime validModels array, causing silent fallback to gemini-2.5-flash. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> * fix: skip processing when Gemini returns empty observation response Empty responses were silently consuming messages from the queue via processAgentResponse. Now skips processing on empty content, leaving the message in processing status for stale recovery. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> * fix: prevent idle timeout from triggering infinite restart loop When a session hits the 3-minute idle timeout, the finally block was seeing stale processing messages and restarting the generator endlessly. Now tracks idle timeout as a distinct exit reason via session flag, resets stale messages, and skips restart. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> * fix: clear stale Bun native module cache on update Bun's global cache retains sharp/libvips native binaries with broken dylib references after version upgrades. Clear ~/.bun/install/cache/@img/ before install in both the end-user (smart-install) and dev (sync-marketplace) paths to prevent ERR_DLOPEN_FAILED errors in Chroma sync. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> * fix: address PR review feedback (empty summary response, session-scoped reset, shell injection) - Apply same empty-response guard to summary path as observation path in GeminiAgent - Add optional sessionDbId param to resetStaleProcessingMessages for session-scoped resets - Use JSON.stringify for gitignore pattern escaping, filter negation patterns Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> --------- Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>	2026-02-16 17:46:30 -05:00
michelhelsdingen	51719d23a4	feat: configurable subprocess pool limit for SDK agents (#995 ) * feat: configurable subprocess pool limit for SDK agents Prevents runaway accumulation of Claude SDK agent subprocesses by enforcing a configurable concurrency limit. - New CLAUDE_MEM_MAX_CONCURRENT_AGENTS setting (default: 2) - Promise-based waitForSlot() in ProcessRegistry (not polling per review feedback on #830) - Waiters are notified via unregisterProcess when a slot frees up - SDKAgent.startSession() waits for a slot before spawning - 60s timeout prevents indefinite waits Generated with [Claude Code](https://claude.ai/code) via [Happy](https://happy.engineering) Co-Authored-By: Claude <noreply@anthropic.com> Co-Authored-By: Happy <yesreply@happy.engineering> * fix: remove unused originalUnregister const and getActiveCount import Cleanup from Greptile review: - Remove dead `originalUnregister` variable in ProcessRegistry - Remove unused `getActiveCount` import in SDKAgent Generated with [Claude Code](https://claude.ai/code) via [Happy](https://happy.engineering) Co-Authored-By: Claude <noreply@anthropic.com> Co-Authored-By: Happy <yesreply@happy.engineering> --------- Co-authored-by: Claude <noreply@anthropic.com> Co-authored-by: Happy <yesreply@happy.engineering>	2026-02-16 00:31:17 -05:00
ixnaswang	81013e1310	fix: SDK Agent fails on Windows when username contains spaces (#1022 ) * fix: SDK Agent fails on Windows when username contains spaces Fixes spawn failure on Windows when the user's path contains spaces (e.g., C:\Users\Anderson Wang\). Root cause: - SDKAgent.ts returns full auto-detected path with spaces - ProcessRegistry.ts cannot execute .cmd files when path contains spaces Solution: - SDKAgent: On Windows, prefer "claude.cmd" via PATH instead of full path - ProcessRegistry: Use cmd.exe /d /c wrapper for .cmd files on Windows This preserves argument boundaries (e.g., empty string values) while properly handling paths with spaces. Fixes #1014 Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com> * docs: add Windows spawn path with spaces fix documentation --------- Co-authored-by: Claude Opus 4.5 <noreply@anthropic.com>	2026-02-16 00:31:11 -05:00
Mark L	6d1f17adee	fix: use Bun runtime for Windows daemon spawn (#1086) Co-authored-by: root <root@localhost.localdomain>	2026-02-16 00:31:04 -05:00
TerrifiedBug	0a40c4c596	fix: include project in ChromaDB where clause for vector search (#1112 ) When searching with a project parameter, the ChromaDB vector query was not filtering by project. It only filtered by doc_type. This caused larger projects to dominate the top-N results returned by ChromaDB, effectively crowding out results from smaller projects before the post-hoc SQLite project filter could take effect. For example, with project A having 19,000 embeddings and project B having 700, a search scoped to project B would return mostly project A results from ChromaDB. After SQLite filtered by project, only 1-3 results from B would survive instead of the expected 20+. The fix adds the project to the ChromaDB where clause using $and when both doc_type and project filters are needed. This is applied in both ChromaSearchStrategy.buildWhereFilter() and SearchManager.search(). Co-authored-by: TARS <tars@openclaw.local>	2026-02-16 00:30:29 -05:00
Kamran Khalid	02f7c3c9d0	fix(security): validate and restrict /api/instructions operation and topic params (CWE-22, CWE-1321) (#986)	2026-02-16 00:29:08 -05:00
Mark L	a94ddc504f	fix(cursor): remove obsolete cursor-hooks directory gate (#1087) Co-authored-by: root <root@localhost.localdomain>	2026-02-16 00:26:37 -05:00
zhaixingzi	454e9c5870	fix: resolve duplicate assistant messages in OpenRouter agent (#1074 ) This commit addresses the issue of duplicate assistant messages appearing in the conversation history by commenting out the lines that were unnecessarily pushing assistant responses to the conversationHistory array. The processAgentResponse function already handles adding assistant messages to the conversation history, so these additional pushes were causing duplicate entries. Changes made: - Commented out session.conversationHistory.push calls for assistant responses in three locations within OpenRouterAgent.ts: 1. In the init response handling (around line 117) 2. In the observation response handling (around line 188) 3. In the summary response handling (around line 230) This ensures that assistant messages are only added once to the conversation history, preventing duplication while maintaining the intended functionality. Co-authored-by: 张坤 <zhangkun@example.com>	2026-02-16 00:26:07 -05:00
SaneApps	2f337dab13	fix: use Gemini v1 API endpoint instead of v1beta (#1082 ) v1beta does not support newer models like gemini-3-flash, causing silent 404 errors that back up the observation queue indefinitely. Users with CLAUDE_MEM_GEMINI_MODEL=gemini-3-flash get zero observations stored, with no visible error — the queue just grows silently. Changes: - Switch API URL from v1beta/models to v1/models (generateContent works identically on both endpoints) - Add gemini-3-flash to GeminiModel type and RPM limits - Update test to match new endpoint Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>	2026-02-16 00:26:01 -05:00
Alex Newman	055888e181	fix: address PR review feedback for subprocess cleanup and binary resolution Wrap SDK query loop in try/finally so subprocess cleanup runs on error paths. Swap Chroma binary check order to try project-level .bin first (common case). Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-02-15 23:24:00 -05:00
Alex Newman	67ba17cc8a	fix: use WASM backend for Chroma embeddings to fix cross-platform issues Chroma requires client-side embeddings — the server is storage only. The previous commit incorrectly removed @chroma-core/default-embed. Uses DefaultEmbeddingFunction({ wasm: true }) which forces the WASM backend instead of native ONNX binaries. Same model (all-MiniLM-L6-v2), same embeddings, but works on all platforms without segfaults or ENOENT errors (#1104, #1105, #1110). Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-02-15 23:14:21 -05:00
Alex Newman	e1ef14dbcc	fix: resolve orphaned subprocesses and Chroma HTTP regressions - Add subprocess cleanup after SDK query loop completes, using existing ProcessRegistry infrastructure (getProcessBySession + ensureProcessExit) - Replace npx-based Chroma binary spawning with absolute path resolution via require.resolve, falling back to npx with explicit cwd (#1120) - Remove @chroma-core/default-embed client-side dependency; let Chroma HTTP server handle embeddings server-side (#1104, #1105, #1110) Closes #1010, #1089, #1090, #1068, #1120, #1104, #1105, #1110 Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-02-15 22:04:52 -05:00
Alex Newman	c27314f896	fix: address PR review comments for chroma server lifecycle	2026-02-13 23:39:30 -05:00
Alex Newman	1b68c55763	fix: resolve SDK spawn failures and sharp native binary crashes - Strip CLAUDECODE env var from SDK subprocesses to prevent "cannot be launched inside another Claude Code session" error (Claude Code 2.1.42+) - Lazy-load @chroma-core/default-embed to avoid eagerly pulling in sharp native binaries at bundle startup (fixes ERR_DLOPEN_FAILED) - Add stderr capture to SDK spawn for diagnosing future process failures - Exclude lockfiles from marketplace rsync and delete stale lockfiles before npm install to prevent native dep version mismatches Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-02-13 22:47:27 -05:00
Alex Newman	ed313db742	Merge main into feat/chroma-http-server Resolve conflicts between Chroma HTTP server PR and main branch changes (folder CLAUDE.md, exclusion settings, Zscaler SSL, transport cleanup). Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-02-13 21:02:54 -05:00
Alex Newman	5de728612e	chore: bump version to 10.0.6 Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-02-13 00:02:37 -05:00
Alex Newman	f05f9ca735	Merge remote-tracking branch 'origin/main' into openclaw-installer # Conflicts: # plugin/scripts/mcp-server.cjs # plugin/scripts/worker-service.cjs	2026-02-12 22:04:03 -05:00
Alex Newman	05e904e613	feat: enhance /api/health with version, uptime, workerPath, and AI status Replace hardcoded TEST-008 build ID with real package version. Add worker filesystem path, uptime counter, and AI provider status (including last interaction success/failure tracking) to the health endpoint response. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-02-12 21:16:22 -05:00
Alex Newman	98d87d7573	chore: bump version to 10.0.4 Reverts v10.0.3 chroma-mcp spawn storm fix (broken release). Restores codebase to v10.0.2 state. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-02-11 21:36:34 -05:00
Alex Newman	3f01baebfe	Merge remote-tracking branch 'origin/main' into fix/chroma-mcp-spawn-storm # Conflicts: # src/services/worker-service.ts # tests/infrastructure/process-manager.test.ts	2026-02-11 15:43:08 -05:00
Alex Newman	0b214a59a1	chore: bump version to 10.0.2 Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-02-11 15:25:50 -05:00
Rod Boev	2c5c99c0c7	fix: use etime-based sorting instead of PID ordering for process guards Addresses Greptile review feedback: - ChromaSync: replace PID-based sort with ps etime column + parseElapsedTime() for reliable age ordering (PIDs wrap and don't guarantee ordering) - ProcessManager: filter out entries with unparseable etime (-1) before sorting to prevent sort corruption in cleanupExcessChromaProcesses()	2026-02-11 07:19:28 -05:00
Rod Boev	a3f9e7f638	fix: prevent chroma-mcp spawn storm with 5-layer defense (641 processes → max 2) During SIGHUP testing with 6+ active sessions, ChromaSync.ensureConnection() had no mutex — concurrent fire-and-forget syncObservation() calls each spawned a chroma-mcp subprocess via StdioClientTransport, creating 641 orphans in ~5min. Error-driven reconnection formed a positive feedback loop amplifying the storm. Defense layers: - Layer 0: Connection mutex via promise memoization (prevents concurrent spawns) - Layer 1: Pre-spawn process count guard using execFileSync('ps') (kills excess) - Layer 2: Hardened close() with try-finally + Unix pkill in GracefulShutdown - Layer 3: Count-based orphan reaper in ProcessManager (not age-based) - Layer 4: Circuit breaker stops retries after 3 consecutive failures for 60s Closes #1063, closes #695 Relates to #1010, #707	2026-02-11 07:19:28 -05:00
Rod Boev	4e67393d27	fix: prevent daemon silent death from SIGHUP + unhandled errors Root cause: registerSignalHandlers() handled SIGTERM/SIGINT but not SIGHUP. When the parent hook process exits, the kernel sends SIGHUP to the daemon, causing immediate termination (default signal action). Belt-and-suspenders fix: 1. SIGHUP handler: ignore in daemon mode, graceful shutdown otherwise 2. setsid: spawn daemon in new session on Linux (prevents SIGHUP delivery) 3. Global unhandledRejection/uncaughtException guards in daemon mode	2026-02-11 00:35:53 -05:00
Alex Newman	af95461a70	Merge branch 'main' into fix/hook-resilience-worker-lifecycle # Conflicts: # plugin/scripts/mcp-server.cjs # plugin/scripts/worker-service.cjs	2026-02-10 23:37:33 -05:00
Rod Boev	418e38ee46	fix: hook resilience and worker lifecycle improvements (#957 , #923 , #984 , #987 , #1042 ) Reduce timeouts to eliminate 10-30s startup delay when worker is dead (common on WSL2 after hibernate). Add stale PID detection, graceful error handling across all handlers, and error classification that distinguishes worker unavailability from handler bugs. - HEALTH_CHECK 30s→3s, new POST_SPAWN_WAIT (5s), PORT_IN_USE_WAIT (3s) - isProcessAlive() with EPERM handling, cleanStalePidFile() - getPluginVersion() try-catch for shutdown race (#1042) - isWorkerUnavailableError: transport+5xx+429→exit 0, 4xx→exit 2 - No-op handler for unknown event types (#984) - Wrap all handler fetch calls in try-catch for graceful degradation - CLAUDE_MEM_HEALTH_TIMEOUT_MS env var override with validation	2026-02-10 15:34:35 -05:00
xingyu	e4e1d3fb92	fix: Windows platform improvements — re-enable Chroma, fix DB race, simplify env isolation 1. ProcessManager: Migrate spawnDaemon() from WMIC to PowerShell Start-Process - WMIC deprecated in Windows 11, PowerShell inherits env vars properly - Use -WindowStyle Hidden to prevent console popups - Fix redundant backslash escaping in PowerShell $_ variables 2. ChromaSync: Re-enable vector search on Windows - Remove overly defensive platform check that disabled all semantic search - Worker daemon starts with -WindowStyle Hidden; child processes inherit - MCP SDK's StdioClientTransport uses shell:false, no new console created 3. worker-service: Unified DB-ready gate middleware - Replace single-endpoint /api/sessions/init wait with global middleware - Hold all DB-dependent requests until database is initialized (30s timeout) - Whitelist static assets, /health, and viewer page for immediate response - Separate dbReadyPromise (DB only) from initializationComplete (full init) - Fixes "Database not initialized" errors on /stream, /summarize, /init 4. EnvManager: Switch from allowlist to blocklist for subprocess env - Only strip ANTHROPIC_API_KEY to prevent Issue #733 billing hijack - Pass through all other vars (ANTHROPIC_AUTH_TOKEN, ANTHROPIC_BASE_URL, etc.) - Simpler, less fragile than maintaining an exhaustive system vars allowlist	2026-02-07 18:30:57 +08:00
Alex Newman	5969d670d0	chore: bump version to 9.1.1 Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-02-07 02:18:44 -05:00
Alex Newman	8dfcb5e612	chore: bump version to 9.1.0 Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-02-07 01:05:38 -05:00
Alex Newman	ff503d08a7	MAESTRO: Merge PR #657 - Add generate/clean CLI commands for CLAUDE.md management Cherry-picked source changes from PR #657 (224 commits behind main). Adds `claude-mem generate` and `claude-mem clean` CLI commands: - New src/cli/claude-md-commands.ts with generateClaudeMd() and cleanClaudeMd() - Worker service generate/clean case handlers with --dry-run support - CLAUDE_MD logger component type - Uses shared isDirectChild from path-utils.ts (DRY improvement over PR original) Skipped from PR: 91 CLAUDE.md file deletions (stale), build artifacts, .claude/plans/ dev artifact, smart-install.js shell alias auto-injection (aggressive profile modification without consent). Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-02-06 05:52:54 -05:00
Alex Newman	98920bd860	MAESTRO: Merge PR #662 - Add save_memory MCP tool for manual memory storage Adds save_memory MCP tool allowing users to manually save observations for semantic search. Source changes cherry-picked from PR #662 by @darconada (build artifact conflicts resolved by direct application). Closes #645. Co-Authored-By: darconadalabarga <darconada@arsys.es> Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-02-06 04:13:44 -05:00
Alex Newman	5dffb1ebb0	MAESTRO: fix(hooks): add session-complete handler to enable orphan reaper cleanup Cherry-picked from PR #844 by @thusdigital. Sessions stayed in active sessions map forever after summarize, causing the orphan reaper to think all processes were still active. Adds session-complete as Stop phase 2 hook that calls POST /api/sessions/complete to remove sessions from the active map, allowing the reaper to correctly identify and clean up orphaned worker processes. Fixes #842. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-02-06 03:23:13 -05:00
Alex Newman	da1d2cd36a	MAESTRO: fix(db): prevent FK constraint failures on worker restart Cherry-picked source changes from PR #889 by @Et9797. Fixes #846. Key changes: - Add ensureMemorySessionIdRegistered() guard in SessionStore.ts - Add ON UPDATE CASCADE migration (schema v21) for observations and session_summaries FK constraints - Change message queue from claim-and-delete to claim-confirm pattern (PendingMessageStore.ts) - Add spawn deduplication and unrecoverable error detection in SessionRoutes.ts and worker-service.ts - Add forceInit flag to SDKAgent for stale session recovery Build artifacts skipped (pre-existing dompurify dep issue). Path fixes (HealthMonitor.ts, worker-utils.ts) already merged via PR #634. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-02-06 03:16:17 -05:00
Alex Newman	8030c44af4	MAESTRO: fix(cursor): use bun runtime and fix hooks directory detection Cherry-picked source changes from PR #721. Fixes two Cursor standalone setup bugs: 1. findCursorHooksDir() now checks for hooks.json (unified CLI mode) in addition to legacy common.sh/common.ps1 scripts 2. installCursorHooks() now uses bun instead of node for hook commands since worker-service.cjs depends on bun:sqlite 3. Added findBunPath() to detect bun executable across platforms Build artifacts skipped (pre-existing dompurify viewer dep issue). Source-only cherry-pick, TypeScript compilation clean for modified file. Co-Authored-By: polux0 <aleksaprosperitylabs@gmail.com> Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-02-06 03:09:26 -05:00
Alex Newman	75a0f2e981	fix: respect CLAUDE_CONFIG_DIR for plugin paths (#626 ) Add MARKETPLACE_ROOT constant to paths.ts and update 5 source files to use centralized path constants instead of hardcoded ~/.claude paths. Preserves backwards compatibility when CLAUDE_CONFIG_DIR is not set. Based on PR #634 by @Kuroakira, cherry-picked onto main due to build artifact merge conflicts (source changes applied cleanly). Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-02-06 03:00:08 -05:00
Alex Newman	91e1d5baad	fix: correct Gemini model name from gemini-3-flash to gemini-3-flash-preview The Gemini API requires the -preview suffix for the Gemini 3 Flash model. gemini-3-flash does not exist - only gemini-3-flash-preview is available. This was causing 404 errors when users selected this model option. Closes #831 Co-Authored-By: Glucksberg <markuscontasul@gmail.com> Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-02-06 02:55:30 -05:00
Alex Newman	3ad53733e8	MAESTRO: Merge PR #884 adding Zscaler SSL certificate support for ChromaDB vector search Adds automatic detection and handling of Zscaler enterprise security certificates on macOS. Combines standard certifi CA certificates with Zscaler certificates into a single bundle, passed via SSL_CERT_FILE/REQUESTS_CA_BUNDLE/CURL_CA_BUNDLE env vars to the chroma-mcp subprocess. Certificate bundle is cached for 24 hours. Falls back gracefully when Zscaler is not present, with no impact on non-Zscaler environments. Co-Authored-By: RClark4958 <rickdclark48@gmail.com> Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-02-06 02:15:19 -05:00
Jenha Poyarkov	9f2a237aaf	fix: close transport on connection error to prevent chroma-mcp zombie processes Fixes #761 Root cause: When connection errors occur (MCP error -32000, Connection closed), the code was resetting \`connected\` and \`client\` but NOT calling \`transport.close()\`, leaving the chroma-mcp subprocess alive. Each reconnection attempt spawned a NEW process while old ones accumulated. Changes: - Close transport before resetting state in ensureCollection() error handler - Close transport before resetting state in queryChroma() error handler - Set transport = null after closing to match close() method behavior - Add regression tests for Issue #761 with source code verification Tested on macOS - no more zombie processes after the fix.	2026-02-06 02:10:18 -05:00
Abdelkarim Mateos Sanchez	9bd56c993c	fix: align IDs with metadatas in ChromaSearchStrategy ChromaSync.queryChroma() returns deduplicated sqlite_ids but the metadatas array contains multiple entries per observation (narrative + facts). The filterByRecency() method was iterating over metadatas and using the index to access ids, causing array out-of-bounds access. The fix builds a Map from sqlite_id to metadata, then iterates over the deduplicated ids array to ensure proper alignment. Symptoms before fix: - Semantic search returning incorrect/empty results - Search only working with near-exact queries - Recent items (same day) not being found Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>	2026-02-06 02:07:03 -05:00
Alex Newman	711f5455df	fix: generate synthetic memorySessionId for stateless providers (PR #615 ) Gemini and OpenRouter are stateless APIs that never return session IDs. Without synthetic IDs, PR #693's defensive memorySessionId checks throw errors on every observation processing call for these providers. Generates provider-prefixed IDs (gemini-/openrouter-{contentSessionId}- {timestamp}) before the first API call, persisted to the database via updateMemorySessionId(). Applied from PR #615 (closed due to staleness). Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-02-06 02:04:49 -05:00
Alex Newman	2d40afe7ef	fix: provider-aware recovery and stale session cleanup (PR #741 ) Applies PR #741 by @licutis onto main, resolving conflicts with recently merged PRs #693, #937, and #627. Adds getActiveAgent() to WorkerService so startup-recovery uses the correct provider instead of hardcoding SDKAgent. Also cleans up sessions stuck 'active' for 6+ hours and their pending messages before processing orphaned queues. Co-Authored-By: licutis <43884712+licutis@users.noreply.github.com> Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-02-06 01:58:00 -05:00
TranslateMe	ea38601564	fix: Reset AbortController before starting generator to prevent infinite abort loop When a generator exits with wasAborted=true, the AbortController remains in aborted state but generatorPromise is set to null. When a new observation arrives, ensureGeneratorRunning() sees generatorPromise=null and tries to start a new generator, but the new generator immediately sees signal.aborted=true and exits, causing an infinite "Generator aborted" loop. This fix resets the AbortController if it's already aborted before starting a new generator, allowing the session to recover from the stuck state. Bug reproduction: 1. Session receives observations 2. Something causes the generator to be aborted 3. generatorPromise = null, but abortController.signal.aborted = true 4. New observation arrives → starts generator → immediately aborted → loop Fix: Check if abortController.signal.aborted before starting generator, and create a new AbortController if needed.	2026-02-06 01:53:17 -05:00
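
Note on ca8421611c (collection name sanitization): the two rules stated in that commit message (replace characters outside [a-zA-Z0-9._-] with underscores, then strip trailing non-alphanumeric characters) can be sketched roughly as below. This is a minimal illustration, not the repository's code: the function name sanitizeCollectionName is hypothetical, and the "cm__" prefix is inferred from the collection names quoted in the log.

```typescript
// Hypothetical helper: the name and exact behavior are assumptions based on
// the commit message, not the actual claude-mem implementation.
function sanitizeCollectionName(project: string): string {
  // Rule 1: anything outside [a-zA-Z0-9._-] becomes an underscore.
  const replaced = project.replace(/[^a-zA-Z0-9._-]/g, "_");
  // Rule 2: drop trailing characters that are not alphanumeric, since the
  // commit message says Chroma constrains the final character of a name.
  const trimmed = replaced.replace(/[^a-zA-Z0-9]+$/, "");
  // Prefix inferred from the "cm__YC_Stuff" and "cm__claude-mem" names in the log.
  return `cm__${trimmed}`;
}

console.log(sanitizeCollectionName("YC Stuff")); // cm__YC_Stuff
```

The sketch does not handle an empty or fully non-alphanumeric project name, nor any length or leading-character limits Chroma may apply; a real implementation would likely need a fallback for those cases.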


332 Commits
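
Note on 0a40c4c596 (project filter in the ChromaDB where clause): that commit message describes combining the doc_type and project filters with $and so that the vector query itself is scoped to one project. A rough sketch of building such a filter follows. The function signature and the metadata key "project" are assumptions for illustration; only doc_type, $and, and the method name buildWhereFilter appear in the log.

```typescript
// Sketch only: the signature and the "project" metadata key are assumed.
type WhereFilter = Record<string, unknown>;

function buildWhereFilter(docType?: string, project?: string): WhereFilter | undefined {
  const clauses: WhereFilter[] = [];
  if (docType) clauses.push({ doc_type: docType });
  if (project) clauses.push({ project });

  if (clauses.length === 0) return undefined; // no filtering requested
  if (clauses.length === 1) return clauses[0]; // a single condition needs no $and
  return { $and: clauses }; // both filters: combine explicitly
}

console.log(JSON.stringify(buildWhereFilter("observation", "B")));
// {"$and":[{"doc_type":"observation"},{"project":"B"}]}
```

Filtering inside the vector query, rather than only afterwards in SQLite, is what keeps a large project from taking all of the top-N slots before the project filter runs, which is the failure the commit message describes.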