* fix: backfill all Chroma projects on worker startup
ChromaSync.ensureBackfilled() existed but was never called. After
v10.2.2's bun cache clear destroyed the ONNX model cache, Chroma only
had ~2 days of embeddings while SQLite had 49k+ observations.
- Add static backfillAllProjects() to ChromaSync — iterates all projects
in SQLite, creates temporary ChromaSync per project, runs smart diff
- Call backfillAllProjects() fire-and-forget on worker startup
- Add 'CHROMA_SYNC' to logger Component type (pre-existing gap)
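A minimal sketch of the smart diff and the fire-and-forget call, assuming each observation maps to one ID that both stores share. `findMissingIds` and `startBackfill` are hypothetical names, not the real ChromaSync API:

```typescript
// Hypothetical helper: IDs present in SQLite but absent from Chroma.
function findMissingIds<T>(sqliteIds: readonly T[], chromaIds: Iterable<T>): T[] {
  const existing = new Set(chromaIds);
  return sqliteIds.filter((id) => !existing.has(id));
}

// Hypothetical startup hook: the backfill is not awaited, so worker startup
// is not delayed, but failures are caught and logged instead of surfacing as
// unhandled rejections. Promise.resolve().then() also catches a synchronous
// throw from backfill().
function startBackfill(backfill: () => Promise<void>, log: (msg: string) => void): void {
  void Promise.resolve()
    .then(backfill)
    .catch((err: unknown) => {
      log(`CHROMA_SYNC backfill failed: ${err instanceof Error ? err.message : String(err)}`);
    });
}
```

Only the missing IDs need embedding, so a store that is already mostly in sync costs little more than two ID scans.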
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
* fix: sanitize project names for Chroma collection naming
Replace characters outside [a-zA-Z0-9._-] with underscores so projects
like "YC Stuff" map to collection "cm__YC_Stuff" instead of failing
Chroma's collection name validation.
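A minimal sketch of that mapping; `toCollectionName` is a hypothetical name, and the `cm__` prefix is taken from the example above:

```typescript
// Replace every character outside [a-zA-Z0-9._-] with "_" and add the
// "cm__" prefix, so "YC Stuff" becomes "cm__YC_Stuff".
function toCollectionName(project: string): string {
  return `cm__${project.replace(/[^a-zA-Z0-9._-]/g, "_")}`;
}
```

The mapping is not injective: "YC Stuff" and "YC_Stuff" both become cm__YC_Stuff.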
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
* fix: route backfill to shared cm__claude-mem collection, harden sanitization
- Use single ChromaSync('claude-mem') in backfillAllProjects() instead of
per-project instances, matching how DatabaseManager and SearchManager
operate — fixes critical bug where backfilled data landed in orphaned
collections that no search path reads from
- Strip trailing non-alphanumeric chars from sanitized collection names
to satisfy Chroma's end-character constraint
- Guard backfill behind Chroma server readiness to avoid one spurious error
log per project when Chroma failed to start
- Use CHROMA_SYNC log component consistently for backfill messages
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
* refactor: pass project as parameter to ensureBackfilled instead of mutating instance state
Eliminates shared mutable state in backfillAllProjects() loop. Project
scoping is now passed explicitly via parameter to both ensureBackfilled()
and getExistingChromaIds(), keeping a single Chroma connection while
avoiding fragile instance property mutation across iterations.
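The resulting loop shape, as a sketch: `BackfillTarget`, `isChromaReady`, the returned count of completed projects, and per-project error isolation are assumptions for illustration, not the real ChromaSync signatures. One instance serves every project, the readiness check runs once up front, and the project travels as an argument:

```typescript
// Minimal stand-in for the part of ChromaSync the loop needs (hypothetical).
interface BackfillTarget {
  ensureBackfilled(project: string): Promise<void>;
}

async function backfillAllProjects(
  projects: readonly string[],
  sync: BackfillTarget, // one shared instance bound to the cm__claude-mem collection
  isChromaReady: () => boolean,
  log: (msg: string) => void,
): Promise<number> {
  if (!isChromaReady()) {
    log("CHROMA_SYNC: Chroma server not ready, skipping backfill");
    return 0;
  }
  let completed = 0;
  for (const project of projects) {
    try {
      // Project scoping is a parameter; no field on `sync` is mutated.
      await sync.ensureBackfilled(project);
      completed++;
    } catch (err) {
      // Assumed policy: one failing project does not stop the rest.
      log(`CHROMA_SYNC: backfill failed for ${project}: ${String(err)}`);
    }
  }
  return completed;
}
```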
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
---------
Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
Remove nuclear `bun pm cache rm` from smart-install.js and
sync-marketplace.cjs (only needed for removed sharp dependency).
Add `bun install` in cache version directory after sync so worker
can resolve dependencies. Move HuggingFace model cache to
~/.claude-mem/models/ so reinstalls don't corrupt it. Add self-healing
retry for Protobuf parsing failures.
Fixes recurring issues #1104, #1105, #1110.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
* fix: use bun install in sync, add node-addon-api for sharp, consolidate PendingMessageStore
- Switch sync-marketplace from npm to bun install
- Add node-addon-api as dev dep so sharp builds under bun
- Consolidate duplicate PendingMessageStore instantiation in worker-service finally block
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
* build assets
---------
Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
* fix: add gemini-3-flash to validModels array
The model was defined in the type union and RPM limits but missing from
the runtime validModels array, causing silent fallback to gemini-2.5-flash.
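One way to keep the type union and the runtime array from drifting apart is to derive the type from the array. This is a suggested pattern, not what the commit does; the two-model list, `DEFAULT_MODEL`, and the warning callback are illustrative assumptions:

```typescript
// Deriving the union from the array means a model cannot be added to the
// type without also being accepted at runtime.
const validModels = ["gemini-2.5-flash", "gemini-3-flash"] as const; // illustrative subset
type GeminiModel = (typeof validModels)[number];
const DEFAULT_MODEL: GeminiModel = "gemini-2.5-flash";

function isGeminiModel(value: string): value is GeminiModel {
  return (validModels as readonly string[]).includes(value);
}

// Fall back to the default, but report it instead of doing it silently.
function resolveModel(configured: string | undefined, warn: (msg: string) => void): GeminiModel {
  if (configured !== undefined && isGeminiModel(configured)) return configured;
  if (configured !== undefined) warn(`Unknown Gemini model "${configured}", using ${DEFAULT_MODEL}`);
  return DEFAULT_MODEL;
}
```

The same trick applies to the RPM table: typing it as `Record<GeminiModel, number>` makes a missing entry a compile error.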
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
* fix: skip processing when Gemini returns empty observation response
Empty responses were silently consuming messages from the queue via
processAgentResponse. Now skips processing on empty content, leaving
the message in processing status for stale recovery.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
* fix: prevent idle timeout from triggering infinite restart loop
When a session hits the 3-minute idle timeout, the finally block was
seeing stale processing messages and restarting the generator endlessly.
Now tracks idle timeout as a distinct exit reason via session flag,
resets stale messages, and skips restart.
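A sketch of the restart decision, assuming a hypothetical `exitReason` field on the session and a count of messages still marked processing (the text only says a session flag is used):

```typescript
type ExitReason = "completed" | "error" | "idle_timeout";

interface SessionState {
  exitReason?: ExitReason; // hypothetical flag set just before the generator returns
  staleProcessingCount: number; // messages still marked "processing"
}

// Called from the finally block: restart only when work was left unfinished
// for a reason other than an intentional idle shutdown.
function shouldRestartGenerator(session: SessionState): boolean {
  if (session.exitReason === "idle_timeout") return false; // stale messages are reset instead
  return session.staleProcessingCount > 0;
}
```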
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
* fix: clear stale Bun native module cache on update
Bun's global cache retains sharp/libvips native binaries with broken
dylib references after version upgrades. Clear ~/.bun/install/cache/@img/
before install in both the end-user (smart-install) and dev (sync-marketplace)
paths to prevent ERR_DLOPEN_FAILED errors in Chroma sync.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
* fix: address PR review feedback (empty summary response, session-scoped reset, shell injection)
- Apply same empty-response guard to summary path as observation path in GeminiAgent
- Add optional sessionDbId param to resetStaleProcessingMessages for session-scoped resets
- Use JSON.stringify for gitignore pattern escaping, filter negation patterns
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
---------
Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
* feat: add parent heartbeat to MCP server to prevent orphaned processes
MCP server now monitors its parent process every 30s. When the parent
dies (ppid changes to 1 on Unix), the server self-exits to prevent
orphaned node processes that accumulate over time.
- Checks ppid every 30s after server start
- Compares against initial ppid (handles reparenting)
- Timer uses unref() to not keep process alive artificially
- Unix-only (ppid=1 detection doesn't apply on Windows)
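A minimal sketch of the heartbeat, assuming Node.js typings and that `process.ppid` reflects the current parent when read; `parentIsGone`, `startParentHeartbeat`, `onOrphaned`, and the injectable `getPpid` are hypothetical:

```typescript
// The original parent is gone if our ppid differs from the one recorded at
// startup (on Unix an orphan is reparented, typically to PID 1).
function parentIsGone(initialPpid: number, currentPpid: number): boolean {
  return currentPpid !== initialPpid;
}

function startParentHeartbeat(
  onOrphaned: () => void,
  intervalMs = 30_000,
  getPpid: () => number = () => process.ppid,
): ReturnType<typeof setInterval> | undefined {
  if (process.platform === "win32") return undefined; // Unix-only check
  const initialPpid = getPpid();
  const timer = setInterval(() => {
    if (parentIsGone(initialPpid, getPpid())) {
      clearInterval(timer);
      onOrphaned(); // e.g. a synchronous cleanup() that exits the process
    }
  }, intervalMs);
  timer.unref(); // the heartbeat alone must not keep the process alive
  return timer;
}
```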
Generated with [Claude Code](https://claude.ai/code)
via [Happy](https://happy.engineering)
Co-Authored-By: Claude <noreply@anthropic.com>
Co-Authored-By: Happy <yesreply@happy.engineering>
* fix: make cleanup() synchronous for consistent shutdown behavior
cleanup() only does synchronous work (clearInterval + process.exit),
so remove async to avoid inconsistent behavior when called from
setInterval callback vs signal handler vs awaited context.
Generated with [Claude Code](https://claude.ai/code)
via [Happy](https://happy.engineering)
Co-Authored-By: Claude <noreply@anthropic.com>
Co-Authored-By: Happy <yesreply@happy.engineering>
---------
Co-authored-by: Claude <noreply@anthropic.com>
Co-authored-by: Happy <yesreply@happy.engineering>
* feat: configurable subprocess pool limit for SDK agents
Prevents runaway accumulation of Claude SDK agent subprocesses by
enforcing a configurable concurrency limit.
- New CLAUDE_MEM_MAX_CONCURRENT_AGENTS setting (default: 2)
- Promise-based waitForSlot() in ProcessRegistry (rather than polling, per
review feedback on #830)
- Waiters are notified via unregisterProcess when a slot frees up
- SDKAgent.startSession() waits for a slot before spawning
- 60s timeout prevents indefinite waits
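A minimal sketch of the slot mechanism. Names are hypothetical (`acquire`/`release` stand in for waitForSlot()/unregisterProcess), and one design choice may differ from the real code: `release()` hands the freed slot directly to the next waiter, so a newcomer cannot grab it between the wake-up and the spawn:

```typescript
class AgentSlotPool {
  private active = 0;
  private readonly waiters: Array<() => void> = [];

  constructor(private readonly maxConcurrent = 2) {}

  get activeCount(): number { return this.active; }
  get waitingCount(): number { return this.waiters.length; }

  // Resolves once a slot is reserved for the caller; rejects after timeoutMs.
  acquire(timeoutMs = 60_000): Promise<void> {
    if (this.active < this.maxConcurrent) {
      this.active++;
      return Promise.resolve();
    }
    return new Promise<void>((resolve, reject) => {
      const waiter = (): void => {
        clearTimeout(timer);
        resolve(); // slot handed over by release(); active count unchanged
      };
      const timer = setTimeout(() => {
        const i = this.waiters.indexOf(waiter);
        if (i !== -1) this.waiters.splice(i, 1);
        reject(new Error(`No agent slot free within ${timeoutMs}ms`));
      }, timeoutMs);
      this.waiters.push(waiter);
    });
  }

  // Called when a subprocess exits: wake the next waiter or free the slot.
  release(): void {
    const next = this.waiters.shift();
    if (next) next();
    else if (this.active > 0) this.active--;
  }
}
```

Waiters are served in FIFO order, and no timer or polling loop runs while the pool has free capacity.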
Generated with [Claude Code](https://claude.ai/code)
via [Happy](https://happy.engineering)
Co-Authored-By: Claude <noreply@anthropic.com>
Co-Authored-By: Happy <yesreply@happy.engineering>
* fix: remove unused originalUnregister const and getActiveCount import
Cleanup from Greptile review:
- Remove dead `originalUnregister` variable in ProcessRegistry
- Remove unused `getActiveCount` import in SDKAgent
Generated with [Claude Code](https://claude.ai/code)
via [Happy](https://happy.engineering)
Co-Authored-By: Claude <noreply@anthropic.com>
Co-Authored-By: Happy <yesreply@happy.engineering>
---------
Co-authored-by: Claude <noreply@anthropic.com>
Co-authored-by: Happy <yesreply@happy.engineering>
* fix: SDK Agent fails on Windows when username contains spaces
Fixes spawn failure on Windows when the user's path contains spaces
(e.g., C:\Users\Anderson Wang\).
Root cause:
- SDKAgent.ts returns full auto-detected path with spaces
- ProcessRegistry.ts cannot execute .cmd files when path contains spaces
Solution:
- SDKAgent: On Windows, prefer "claude.cmd" via PATH instead of full path
- ProcessRegistry: Use cmd.exe /d /c wrapper for .cmd files on Windows
This preserves argument boundaries (e.g., empty string values) while
properly handling paths with spaces.
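A sketch of how the spawn arguments might be assembled; `planClaudeSpawn` and `SpawnPlan` are hypothetical. Recent Node.js releases refuse to spawn `.cmd`/`.bat` files directly without a shell, and `cmd.exe /d /c` runs the shim with AutoRun commands disabled (`/d`) and exits when it finishes (`/c`). Each argument, including an empty string, stays a separate array element; how Node then quotes them for cmd.exe is not addressed here:

```typescript
interface SpawnPlan {
  command: string;
  args: string[];
}

function planClaudeSpawn(executable: string, args: string[], platform: string = process.platform): SpawnPlan {
  if (platform === "win32" && executable.toLowerCase().endsWith(".cmd")) {
    // Run the .cmd shim through cmd.exe instead of spawning it directly.
    return { command: "cmd.exe", args: ["/d", "/c", executable, ...args] };
  }
  return { command: executable, args };
}
```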
Fixes #1014
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
* docs: add Windows spawn path with spaces fix documentation
---------
Co-authored-by: Claude Opus 4.5 <noreply@anthropic.com>
When searching with a project parameter, the ChromaDB vector query was
not filtering by project. It only filtered by doc_type. This caused
larger projects to dominate the top-N results returned by ChromaDB,
effectively crowding out results from smaller projects before the
post-hoc SQLite project filter could take effect.
For example, with project A having 19,000 embeddings and project B
having 700, a search scoped to project B would return mostly project A
results from ChromaDB. After SQLite filtered by project, only 1-3
results from B would survive instead of the expected 20+.
The fix adds the project to the ChromaDB where clause using $and when
both doc_type and project filters are needed. This is applied in both
ChromaSearchStrategy.buildWhereFilter() and SearchManager.search().
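A sketch of the filter construction. Chroma's `where` validation expects a single top-level key, so two conditions go under `$and`. The `doc_type` key comes from the text above; the `project` metadata key name and the function shape are assumptions:

```typescript
type WhereClause = Record<string, unknown>;

// A single condition is used as-is; two are combined under $and.
function buildWhereFilter(docType?: string, project?: string): WhereClause | undefined {
  const conditions: WhereClause[] = [];
  if (docType) conditions.push({ doc_type: docType });
  if (project) conditions.push({ project });
  if (conditions.length === 0) return undefined;
  if (conditions.length === 1) return conditions[0];
  return { $and: conditions };
}
```

With the project in the vector query itself, the top-N results are drawn only from project B's 700 embeddings instead of competing with project A's 19,000.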
Co-authored-by: TARS <tars@openclaw.local>
This commit addresses the issue of duplicate assistant messages appearing
in the conversation history by commenting out the lines that were
unnecessarily pushing assistant responses to the conversationHistory array.
The processAgentResponse function already handles adding assistant messages
to the conversation history, so these additional pushes were causing
duplicate entries.
Changes made:
- Commented out session.conversationHistory.push calls for assistant responses
in three locations within OpenRouterAgent.ts:
1. In the init response handling (around line 117)
2. In the observation response handling (around line 188)
3. In the summary response handling (around line 230)
This ensures that assistant messages are only added once to the conversation
history, preventing duplication while maintaining the intended functionality.
Co-authored-by: 张坤 <zhangkun@example.com>
The v1beta API does not support newer models like gemini-3-flash, causing
silent 404 errors that back up the observation queue indefinitely.
Users with CLAUDE_MEM_GEMINI_MODEL=gemini-3-flash get zero observations
stored, with no visible error — the queue just grows silently.
Changes:
- Switch API URL from v1beta/models to v1/models (generateContent
works identically on both endpoints)
- Add gemini-3-flash to GeminiModel type and RPM limits
- Update test to match new endpoint
Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
The summarize (Stop) and observation (PostToolUse) handlers throw
blocking errors (exit code 2) when optional input fields like
transcriptPath, toolName, or cwd are missing. This causes visible
hook errors on every session stop and after some tool uses.
Replace throws with graceful returns matching the existing pattern
used for worker-unavailable checks.
Fixes #1097
Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
Address PR #1125 review feedback - both fetches now start simultaneously
via Promise.all instead of sequential-then-parallel.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Add systemMessage field to HookResult so SessionStart can display a
colored timeline directly to the user in the CLI. The handler now
parallel-fetches both markdown (for Claude context) and ANSI-colored
(for user display) timelines, appending a viewer URL link.
Also update default settings to hide verbose token columns (read/work
tokens, savings amount) and disable full observation expansion, keeping
the cleaner index-only view by default.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Wrap SDK query loop in try/finally so subprocess cleanup runs on error paths.
Swap Chroma binary check order to try project-level .bin first (common case).
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Chroma requires client-side embeddings — the server is storage only.
The previous commit incorrectly removed @chroma-core/default-embed.
Uses DefaultEmbeddingFunction({ wasm: true }) which forces the WASM
backend instead of native ONNX binaries. Same model (all-MiniLM-L6-v2),
same embeddings, but works on all platforms without segfaults or
ENOENT errors (#1104, #1105, #1110).
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- Strip CLAUDECODE env var from SDK subprocesses to prevent "cannot be
launched inside another Claude Code session" error (Claude Code 2.1.42+)
- Lazy-load @chroma-core/default-embed to avoid eagerly pulling in
sharp native binaries at bundle startup (fixes ERR_DLOPEN_FAILED)
- Add stderr capture to SDK spawn for diagnosing future process failures
- Exclude lockfiles from marketplace rsync and delete stale lockfiles
before npm install to prevent native dep version mismatches
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Resolve conflicts between Chroma HTTP server PR and main branch changes
(folder CLAUDE.md, exclusion settings, Zscaler SSL, transport cleanup).
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Replace hardcoded TEST-008 build ID with real package version. Add worker
filesystem path, uptime counter, and AI provider status (including last
interaction success/failure tracking) to the health endpoint response.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Addresses Greptile review feedback:
- ChromaSync: replace PID-based sort with ps etime column + parseElapsedTime()
for reliable age ordering (PIDs wrap and don't guarantee ordering)
- ProcessManager: filter out entries with unparseable etime (-1) before
sorting to prevent sort corruption in cleanupExcessChromaProcesses()
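A sketch of etime parsing and the filtered sort. `ps -o etime` prints `[[dd-]hh:]mm:ss`; this `parseElapsedTime` is an illustrative reimplementation (the real one may differ) that returns -1 for anything it cannot parse, and `oldestFirst` is hypothetical:

```typescript
// Parse `ps -o etime` output ([[dd-]hh:]mm:ss) into seconds; -1 if unparseable.
function parseElapsedTime(etime: string): number {
  const match = /^(?:(?:(\d+)-)?(\d{1,2}):)?(\d{1,2}):(\d{2})$/.exec(etime.trim());
  if (match === null) return -1;
  const days = Number(match[1] ?? 0);
  const hours = Number(match[2] ?? 0);
  const minutes = Number(match[3]);
  const seconds = Number(match[4]);
  return ((days * 24 + hours) * 60 + minutes) * 60 + seconds;
}

interface ChromaProcess {
  pid: number;
  etime: string;
}

// Oldest first, by elapsed time rather than PID (PIDs wrap around).
function oldestFirst(processes: readonly ChromaProcess[]): ChromaProcess[] {
  return processes
    .map((proc) => ({ proc, age: parseElapsedTime(proc.etime) }))
    .filter((entry) => entry.age >= 0) // drop -1 before sorting
    .sort((a, b) => b.age - a.age)
    .map((entry) => entry.proc);
}
```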
During SIGHUP testing with 6+ active sessions, ChromaSync.ensureConnection()
had no mutex — concurrent fire-and-forget syncObservation() calls each spawned
a chroma-mcp subprocess via StdioClientTransport, creating 641 orphans in ~5min.
Error-driven reconnection formed a positive feedback loop amplifying the storm.
Defense layers:
- Layer 0: Connection mutex via promise memoization (prevents concurrent spawns)
- Layer 1: Pre-spawn process count guard using execFileSync('ps') (kills excess)
- Layer 2: Hardened close() with try-finally + Unix pkill in GracefulShutdown
- Layer 3: Count-based orphan reaper in ProcessManager (not age-based)
- Layer 4: Circuit breaker stops retries after 3 consecutive failures for 60s
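Layers 0 and 4 can be sketched together in one small class. `ConnectionGuard`, its constructor parameters, and `reset()` are hypothetical; the defaults (3 failures, 60s) come from the list above. Concurrent callers share one in-flight promise, and after three consecutive failures the guard fails fast until the cooldown ends:

```typescript
class ConnectionGuard<T> {
  private inFlight: Promise<T> | null = null;
  private consecutiveFailures = 0;
  private openUntil = 0;

  constructor(
    private readonly connect: () => Promise<T>,
    private readonly maxFailures = 3,
    private readonly cooldownMs = 60_000,
    private readonly now: () => number = Date.now,
  ) {}

  ensureConnection(): Promise<T> {
    // Layer 4: while the breaker is open, fail fast instead of spawning.
    if (this.now() < this.openUntil) {
      return Promise.reject(new Error("circuit open: reconnect skipped"));
    }
    // Layer 0: every concurrent caller shares the same in-flight attempt.
    if (this.inFlight === null) {
      this.inFlight = this.connect().then(
        (conn) => {
          this.consecutiveFailures = 0;
          return conn;
        },
        (err: unknown) => {
          this.inFlight = null; // allow a later attempt
          this.consecutiveFailures++;
          if (this.consecutiveFailures >= this.maxFailures) {
            this.openUntil = this.now() + this.cooldownMs;
            this.consecutiveFailures = 0;
          }
          throw err;
        },
      );
    }
    return this.inFlight;
  }

  // Forget a cached connection (e.g. after close) so the next call reconnects.
  reset(): void {
    this.inFlight = null;
  }
}
```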
Closes #1063, closes #695
Relates to #1010, #707
Root cause: registerSignalHandlers() handled SIGTERM/SIGINT but not
SIGHUP. When the parent hook process exits, the kernel sends SIGHUP
to the daemon, causing immediate termination (default signal action).
Belt-and-suspenders fix:
1. SIGHUP handler: ignore in daemon mode, graceful shutdown otherwise
2. setsid: spawn daemon in new session on Linux (prevents SIGHUP delivery)
3. Global unhandledRejection/uncaughtException guards in daemon mode
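A sketch of fixes 1 and 3, assuming Node.js typings; `registerSignalHandlers`, `isDaemon`, `shutdown`, and `log` are hypothetical. For fix 2, Node's `spawn` with `detached: true` makes the child the leader of a new process group and session on non-Windows platforms:

```typescript
function registerSignalHandlers(
  isDaemon: boolean,
  shutdown: (signal: NodeJS.Signals) => void,
  log: (msg: string) => void,
): void {
  for (const signal of ["SIGTERM", "SIGINT"] as const) {
    process.on(signal, () => shutdown(signal));
  }
  // Fix 1: without a listener, SIGHUP's default action terminates the process.
  process.on("SIGHUP", () => {
    if (isDaemon) {
      log("SIGHUP ignored in daemon mode");
      return;
    }
    shutdown("SIGHUP");
  });
  // Fix 3: in daemon mode, log stray errors instead of dying on them.
  if (isDaemon) {
    process.on("unhandledRejection", (reason) => log(`unhandledRejection: ${String(reason)}`));
    process.on("uncaughtException", (err) => log(`uncaughtException: ${err.message}`));
  }
}
```

Node's documentation warns that resuming after `uncaughtException` leaves the process in an undefined state, so logging and continuing is a deliberate trade-off for an unattended daemon.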
Missing return statement and closing brace in the programming errors
check caused a build failure after merging main.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Three fixes to make OpenClaw agent observations work end-to-end:
1. Session init in before_agent_start — the worker's privacy check
requires a stored user prompt; without calling /api/sessions/init,
all observations were skipped as "private"
2. Race condition fix in agent_end — await summarize before sending
complete, preventing session deletion before in-flight observation
POSTs arrive
3. OAuth token pass-through in buildIsolatedEnv — spawned Claude CLI
processes now receive CLAUDE_CODE_OAUTH_TOKEN from the worker's
env when no explicit API key is configured
Also adds agent-specific emoji mapping and dynamic project naming
for the Telegram observation feed.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Reduce timeouts to eliminate 10-30s startup delay when worker is dead
(common on WSL2 after hibernate). Add stale PID detection, graceful
error handling across all handlers, and error classification that
distinguishes worker unavailability from handler bugs.
- HEALTH_CHECK 30s→3s, new POST_SPAWN_WAIT (5s), PORT_IN_USE_WAIT (3s)
- isProcessAlive() with EPERM handling, cleanStalePidFile()
- getPluginVersion() try-catch for shutdown race (#1042)
- isWorkerUnavailableError: transport+5xx+429→exit 0, 4xx→exit 2
- No-op handler for unknown event types (#984)
- Wrap all handler fetch calls in try-catch for graceful degradation
- CLAUDE_MEM_HEALTH_TIMEOUT_MS env var override with validation
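A sketch of what `isWorkerUnavailableError`, `isProcessAlive()`, and the timeout override might check. The `HookError` shape, the set of transport error codes, and treating errors with neither an HTTP status nor a transport code as handler bugs are assumptions; the 3s fallback mirrors HEALTH_CHECK above, and accepting any positive finite number is an assumed validation rule:

```typescript
// Hypothetical error shape: `status` when the worker answered over HTTP,
// `code` when the request never got a response.
interface HookError {
  status?: number;
  code?: string;
}

const TRANSPORT_CODES = new Set(["ECONNREFUSED", "ECONNRESET", "ETIMEDOUT", "EPIPE"]);

function isWorkerUnavailableError(err: HookError): boolean {
  if (err.code !== undefined && TRANSPORT_CODES.has(err.code)) return true;
  return err.status !== undefined && (err.status >= 500 || err.status === 429);
}

// Exit 0 degrades quietly; exit 2 surfaces a blocking hook error.
function exitCodeFor(err: HookError): 0 | 2 {
  return isWorkerUnavailableError(err) ? 0 : 2;
}

// Signal 0 checks for existence without delivering a signal. EPERM means the
// process exists but belongs to another user, so it still counts as alive.
function isProcessAlive(pid: number): boolean {
  try {
    process.kill(pid, 0);
    return true;
  } catch (err) {
    return (err as NodeJS.ErrnoException).code === "EPERM";
  }
}

// Anything that is not a positive finite number falls back to the default.
function healthTimeoutMs(env: NodeJS.ProcessEnv = process.env, fallbackMs = 3_000): number {
  const raw = env.CLAUDE_MEM_HEALTH_TIMEOUT_MS;
  const value = raw === undefined ? Number.NaN : Number(raw);
  return Number.isFinite(value) && value > 0 ? value : fallbackMs;
}
```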
1. ProcessManager: Migrate spawnDaemon() from WMIC to PowerShell Start-Process
- WMIC deprecated in Windows 11, PowerShell inherits env vars properly
- Use -WindowStyle Hidden to prevent console popups
- Fix redundant backslash escaping in PowerShell $_ variables
2. ChromaSync: Re-enable vector search on Windows
- Remove overly defensive platform check that disabled all semantic search
- Worker daemon starts with -WindowStyle Hidden; child processes inherit
- MCP SDK's StdioClientTransport uses shell:false, no new console created
3. worker-service: Unified DB-ready gate middleware
- Replace single-endpoint /api/sessions/init wait with global middleware
- Hold all DB-dependent requests until database is initialized (30s timeout)
- Whitelist static assets, /health, and viewer page for immediate response
- Separate dbReadyPromise (DB only) from initializationComplete (full init)
- Fixes "Database not initialized" errors on /stream, /summarize, /init
4. EnvManager: Switch from allowlist to blocklist for subprocess env
- Only strip ANTHROPIC_API_KEY to prevent Issue #733 billing hijack
- Pass through all other vars (ANTHROPIC_AUTH_TOKEN, ANTHROPIC_BASE_URL, etc.)
- Simpler, less fragile than maintaining an exhaustive system vars allowlist
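A minimal sketch of the blocklist, assuming Node.js typings; `buildSubprocessEnv` is a hypothetical name. The parent environment is copied and only the listed keys are removed, so the caller's `process.env` is never mutated:

```typescript
// Keys stripped before spawning subprocesses; everything else passes through.
const BLOCKED_ENV_KEYS: readonly string[] = ["ANTHROPIC_API_KEY"];

function buildSubprocessEnv(parent: NodeJS.ProcessEnv = process.env): NodeJS.ProcessEnv {
  const env: NodeJS.ProcessEnv = { ...parent }; // copy: never mutate the caller's env
  for (const key of BLOCKED_ENV_KEYS) delete env[key];
  return env;
}
```

Extending the blocklist is a one-line change; the CLAUDECODE variable that this log strips from SDK subprocesses would fit the same list.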
Cherry-picked source changes from PR #657 (224 commits behind main).
Adds `claude-mem generate` and `claude-mem clean` CLI commands:
- New src/cli/claude-md-commands.ts with generateClaudeMd() and cleanClaudeMd()
- Worker service generate/clean case handlers with --dry-run support
- CLAUDE_MD logger component type
- Uses shared isDirectChild from path-utils.ts (DRY improvement over PR original)
Skipped from PR: 91 CLAUDE.md file deletions (stale), build artifacts,
.claude/plans/ dev artifact, smart-install.js shell alias auto-injection
(aggressive profile modification without consent).
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Cherry-picked both PRs to main (both had merge conflicts with current main).
PR #920 (@Spunky84): CLAUDE_MEM_EXCLUDED_PROJECTS setting with glob patterns
to exclude entire projects from memory tracking (privacy/confidentiality).
Early-exit in session-init and observation handlers. 11 unit tests.
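The commit does not spell out the glob dialect, so this is a minimal sketch under an assumed one: `*` matches any run of characters, `?` matches one character, and everything else is literal. `globToRegExp` and `isProjectExcluded` are hypothetical names, and matching against the project name (not its path) is also an assumption:

```typescript
// Convert a simple glob into an anchored regular expression.
function globToRegExp(glob: string): RegExp {
  let source = "";
  for (const ch of glob) {
    if (ch === "*") source += ".*";
    else if (ch === "?") source += ".";
    else source += ch.replace(/[.+^${}()|[\]\\]/g, "\\$&"); // escape regex metacharacters
  }
  return new RegExp(`^${source}$`);
}

function isProjectExcluded(project: string, patterns: readonly string[]): boolean {
  return patterns.some((pattern) => globToRegExp(pattern).test(project));
}
```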
PR #699 (@leepokai): CLAUDE_MEM_FOLDER_MD_EXCLUDE setting with JSON array
of paths to exclude from CLAUDE.md file generation (fixes SwiftUI/Xcode
build conflicts and Drizzle Kit migration failures). Closes #620.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Adds save_memory MCP tool allowing users to manually save observations
for semantic search. Source changes cherry-picked from PR #662 by
@darconada (build artifact conflicts resolved by direct application).
Closes #645.
Co-Authored-By: darconadalabarga <darconada@arsys.es>
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Cherry-picked from PR #844 by @thusdigital. Sessions stayed in the active
sessions map forever after summarize, causing the orphan reaper to think
all processes were still active. Adds session-complete as Stop phase 2
hook that calls POST /api/sessions/complete to remove sessions from the
active map, allowing the reaper to correctly identify and clean up
orphaned worker processes. Fixes #842.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Cherry-picked source changes from PR #889 by @Et9797. Fixes#846.
Key changes:
- Add ensureMemorySessionIdRegistered() guard in SessionStore.ts
- Add ON UPDATE CASCADE migration (schema v21) for observations and session_summaries FK constraints
- Change message queue from claim-and-delete to claim-confirm pattern (PendingMessageStore.ts)
- Add spawn deduplication and unrecoverable error detection in SessionRoutes.ts and worker-service.ts
- Add forceInit flag to SDKAgent for stale session recovery
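An in-memory sketch of the claim-confirm pattern; the real PendingMessageStore is backed by SQLite, and `ClaimConfirmQueue` with its method names is hypothetical. A claimed message stays in the store as `processing` until confirmed, so a crash between claim and confirm leaves it recoverable instead of lost:

```typescript
type MessageStatus = "pending" | "processing";

interface QueuedMessage {
  id: number;
  status: MessageStatus;
  payload: string;
}

class ClaimConfirmQueue {
  private readonly messages = new Map<number, QueuedMessage>();
  private nextId = 1;

  enqueue(payload: string): number {
    const id = this.nextId++;
    this.messages.set(id, { id, status: "pending", payload });
    return id;
  }

  // Oldest pending message first (Map keeps insertion order).
  claim(): QueuedMessage | undefined {
    for (const msg of this.messages.values()) {
      if (msg.status === "pending") {
        msg.status = "processing"; // claimed, but not removed yet
        return msg;
      }
    }
    return undefined;
  }

  // Only a confirmed message is deleted for good.
  confirm(id: number): void {
    this.messages.delete(id);
  }

  // Stale recovery: put abandoned claims back in the queue.
  resetStale(): number {
    let count = 0;
    for (const msg of this.messages.values()) {
      if (msg.status === "processing") {
        msg.status = "pending";
        count++;
      }
    }
    return count;
  }

  get size(): number {
    return this.messages.size;
  }
}
```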
Build artifacts skipped (pre-existing dompurify dep issue). Path fixes (HealthMonitor.ts, worker-utils.ts)
already merged via PR #634.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Cherry-picked source changes from PR #721. Fixes two Cursor standalone
setup bugs:
1. findCursorHooksDir() now checks for hooks.json (unified CLI mode)
in addition to legacy common.sh/common.ps1 scripts
2. installCursorHooks() now uses bun instead of node for hook commands
since worker-service.cjs depends on bun:sqlite
3. Added findBunPath() to detect bun executable across platforms
Build artifacts skipped (pre-existing dompurify viewer dep issue).
Source-only cherry-pick, TypeScript compilation clean for modified file.
Co-Authored-By: polux0 <aleksaprosperitylabs@gmail.com>
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Root cause: Claude Code doesn't close stdin after writing hook input,
so stdin.on('end') never fires.
Previous approach: Timeout-based workaround (wait 5s then parse).
New approach: JSON is self-delimiting. We attempt to parse after each
data chunk. Once we have valid JSON, we resolve immediately without
waiting for EOF. This is the proper fix - hooks now exit in <500ms
instead of waiting for any timeout.
Changes:
- Add tryParseJson() to detect complete JSON
- Parse after each stdin chunk, resolve immediately on success
- Add 50ms parse delay for multi-chunk delivery edge case
- Safety timeout (30s) only for truly malformed input
- Removes dependency on stdin.on('end') which never fires
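A sketch of parse-after-each-chunk reading, assuming Node.js stream typings. `tryParseJson` matches the name above, while `readJsonInput`, its signature, and the absolute 30s safety timer are illustrative (the 50ms multi-chunk delay is omitted). Input is treated as complete the moment the buffer parses, which is safe for a single JSON object because a truncated object never parses:

```typescript
import { Readable } from "node:stream";

type ParseResult = { ok: true; value: unknown } | { ok: false };

// Attempt a parse; an incomplete object simply fails and we wait for more data.
function tryParseJson(text: string): ParseResult {
  try {
    return { ok: true, value: JSON.parse(text) };
  } catch {
    return { ok: false };
  }
}

// Resolve as soon as the buffered input is valid JSON, without waiting for
// 'end' (which never fires when the writer leaves stdin open).
function readJsonInput(input: Readable, safetyTimeoutMs = 30_000): Promise<unknown> {
  return new Promise((resolve, reject) => {
    let buffer = "";
    let settled = false;

    const finish = (settle: () => void): void => {
      if (settled) return;
      settled = true;
      clearTimeout(timer);
      input.off("data", onData);
      input.off("error", onError);
      input.pause(); // stop consuming further input
      settle();
    };
    const onData = (chunk: string): void => {
      buffer += chunk;
      const parsed = tryParseJson(buffer);
      if (parsed.ok) {
        const value = parsed.value;
        finish(() => resolve(value));
      }
    };
    const onError = (err: Error): void => finish(() => reject(err));
    // Safety net for input that never becomes valid JSON.
    const timer = setTimeout(
      () => finish(() => reject(new Error("timed out waiting for complete JSON"))),
      safetyTimeoutMs,
    );

    input.setEncoding("utf8"); // decode multi-byte characters split across chunks
    input.on("data", onData);
    input.on("error", onError);
  });
}
```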
Testing:
- Normal operation: 448ms (was 5000ms+ with timeout approach)
- Stdin stays open: Process exits immediately after JSON complete
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
- Change from absolute timeout to inactivity timeout (reset on each data chunk)
to avoid truncating large/slow payloads
- Fix race condition: add resolved=true before resolving in catch block
- Fix unreliable readable check: just access the property, don't check value
- Add cleanup() call in catch block for consistency
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Fixes #727 (PostToolUse hooks hanging at "1/2 done")
Addresses #646 (Bun stdin EINVAL crash)
Root causes:
1. Bun crashes with EINVAL when Claude Code doesn't provide valid stdin fd
2. stdin.on('end') never fires if Claude Code doesn't close stdin properly
Changes:
- Add isStdinAvailable() to safely check stdin before reading
- Wrap stdin access in try-catch to handle Bun's lazy fstat crash
- Add 5-second timeout to prevent indefinite hangs
- Gracefully return undefined instead of crashing on stdin errors
- Properly clean up event listeners to prevent memory leaks
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>