* feat: configurable subprocess pool limit for SDK agents
Prevents runaway accumulation of Claude SDK agent subprocesses by
enforcing a configurable concurrency limit.
- New CLAUDE_MEM_MAX_CONCURRENT_AGENTS setting (default: 2)
- Promise-based waitForSlot() in ProcessRegistry (not polling per
review feedback on #830)
- Waiters are notified via unregisterProcess when a slot frees up
- SDKAgent.startSession() waits for a slot before spawning
- 60s timeout prevents indefinite waits
Generated with [Claude Code](https://claude.ai/code)
via [Happy](https://happy.engineering)
Co-Authored-By: Claude <noreply@anthropic.com>
Co-Authored-By: Happy <yesreply@happy.engineering>
* fix: remove unused originalUnregister const and getActiveCount import
Cleanup from Greptile review:
- Remove dead `originalUnregister` variable in ProcessRegistry
- Remove unused `getActiveCount` import in SDKAgent
Generated with [Claude Code](https://claude.ai/code)
via [Happy](https://happy.engineering)
Co-Authored-By: Claude <noreply@anthropic.com>
Co-Authored-By: Happy <yesreply@happy.engineering>
---------
Co-authored-by: Claude <noreply@anthropic.com>
Co-authored-by: Happy <yesreply@happy.engineering>
* fix: SDK Agent fails on Windows when username contains spaces
Fixes spawn failure on Windows when the user's path contains spaces
(e.g., C:\Users\Anderson Wang\).
Root cause:
- SDKAgent.ts returns full auto-detected path with spaces
- ProcessRegistry.ts cannot execute .cmd files when path contains spaces
Solution:
- SDKAgent: On Windows, prefer "claude.cmd" via PATH instead of full path
- ProcessRegistry: Use cmd.exe /d /c wrapper for .cmd files on Windows
This preserves argument boundaries (e.g., empty string values) while
properly handling paths with spaces.
Fixes#1014
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
* docs: add Windows spawn path with spaces fix documentation
---------
Co-authored-by: Claude Opus 4.5 <noreply@anthropic.com>
- Strip CLAUDECODE env var from SDK subprocesses to prevent "cannot be
launched inside another Claude Code session" error (Claude Code 2.1.42+)
- Lazy-load @chroma-core/default-embed to avoid eagerly pulling in
sharp native binaries at bundle startup (fixes ERR_DLOPEN_FAILED)
- Add stderr capture to SDK spawn for diagnosing future process failures
- Exclude lockfiles from marketplace rsync and delete stale lockfiles
before npm install to prevent native dep version mismatches
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
During SIGHUP testing with 6+ active sessions, ChromaSync.ensureConnection()
had no mutex — concurrent fire-and-forget syncObservation() calls each spawned
a chroma-mcp subprocess via StdioClientTransport, creating 641 orphans in ~5min.
Error-driven reconnection formed a positive feedback loop amplifying the storm.
Defense layers:
- Layer 0: Connection mutex via promise memoization (prevents concurrent spawns)
- Layer 1: Pre-spawn process count guard using execFileSync('ps') (kills excess)
- Layer 2: Hardened close() with try-finally + Unix pkill in GracefulShutdown
- Layer 3: Count-based orphan reaper in ProcessManager (not age-based)
- Layer 4: Circuit breaker stops retries after 3 consecutive failures for 60s
Closes#1063, closes#695
Relates to #1010, #707
Add killIdleDaemonChildren() to clean up SDK-spawned claude processes
that don't terminate after completing their work.
Problem:
- Worker-service daemon spawns Claude SDK processes
- These processes remain alive after work completes
- They accumulate over time, consuming significant memory
- Existing killSystemOrphans() only handles PPID=1 orphans
Solution:
- Add killIdleDaemonChildren() that finds claude processes where:
- Parent PID = daemon's PID (children of worker-service)
- CPU = 0% (idle, not actively working)
- Running > 2 minutes (completed their work)
- Call it from reapOrphanedProcesses() (runs every 5 minutes)
Testing:
- Verified locally: 15+ zombie processes cleaned up
- Memory saved: ~2GB
- Normal processes (MCP server, Chroma) unaffected
Co-Authored-By: Claude <noreply@anthropic.com>
The previous approach (PR #837) set CLAUDE_CONFIG_DIR to isolate observer
sessions from `claude --resume`. However, this broke authentication because
Claude Code stores credentials in the config directory.
This fix uses the SDK's `cwd` option instead:
- Observer sessions run with cwd=~/.claude-mem/observer-sessions/
- Project name = path.basename(cwd) = "observer-sessions"
- Sessions won't appear when running `claude --resume` from actual projects
- Authentication works because ~/.claude/ config is preserved
Changes:
- ProcessRegistry.ts: Remove CLAUDE_CONFIG_DIR override from spawn
- SDKAgent.ts: Add cwd option to query() pointing to observer dir
- paths.ts: Rename OBSERVER_CONFIG_DIR to OBSERVER_SESSIONS_DIR
Fixes regression from #837
Co-authored-by: Claude Opus 4.5 <noreply@anthropic.com>
Observer sessions created by claude-mem were appearing in the main Claude Code
session picker (`claude --resume`), cluttering the list with internal plugin
sessions that users never intend to resume.
In one user's case: 74 observer sessions out of ~220 total (34% noise).
## Solution
Set `CLAUDE_CONFIG_DIR` to `~/.claude-mem/observer-config/` when spawning
observer Claude processes. This stores observer session files in a separate
location, isolating them from user sessions.
## Changes
1. Added `OBSERVER_CONFIG_DIR` to paths.ts
2. Modified `createPidCapturingSpawn()` in ProcessRegistry.ts to inject
`CLAUDE_CONFIG_DIR` environment variable
Observer sessions now write their `.jsonl` files to:
`~/.claude-mem/observer-config/projects/*/`
Instead of the user's:
`~/.claude/projects/*/`
Fixes#832
Co-authored-by: Claude Opus 4.5 <noreply@anthropic.com>
* Fix zombie process accumulation (Issue #737)
Problem: Claude haiku subprocesses spawned by the SDK weren't terminating
properly, causing zombie process accumulation (user reported 155 processes
consuming 51GB RAM).
Root causes:
1. SDK's SpawnedProcess interface hides subprocess PIDs
2. deleteSession() didn't verify subprocess exit
3. abort() was fire-and-forget with no confirmation
4. No mechanism to track or clean up orphaned processes
Solution:
- Add ProcessRegistry module to track spawned Claude subprocesses
- Use SDK's spawnClaudeCodeProcess option to capture PIDs via custom spawn
- Pass signal parameter to enable AbortController integration
- Wait for subprocess exit in deleteSession() with 5s timeout
- Escalate to SIGKILL if graceful exit fails
- Add orphan reaper running every 5 minutes as safety net
Files changed:
- src/services/worker/ProcessRegistry.ts (new): PID registry and reaper
- src/services/worker/SDKAgent.ts: Use custom spawn to capture PIDs
- src/services/worker/SessionManager.ts: Verify subprocess exit on delete
- src/services/worker-service.ts: Start/stop orphan reaper
Fixes#737
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
* fix: address code review feedback
- Replace busy-wait polling with event-based proc.once('exit')
- Detect and warn about multiple processes per session (race condition)
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
---------
Co-authored-by: bigphoot <bigphoot@gmail.com>
Co-authored-by: Claude Opus 4.5 <noreply@anthropic.com>