During SIGHUP testing with 6+ active sessions, ChromaSync.ensureConnection()
had no mutex — concurrent fire-and-forget syncObservation() calls each spawned
a chroma-mcp subprocess via StdioClientTransport, creating 641 orphans in ~5min.
Error-driven reconnection formed a positive feedback loop amplifying the storm.
Defense layers:
- Layer 0: Connection mutex via promise memoization (prevents concurrent spawns)
- Layer 1: Pre-spawn process count guard using execFileSync('ps') (kills excess)
- Layer 2: Hardened close() with try-finally + Unix pkill in GracefulShutdown
- Layer 3: Count-based orphan reaper in ProcessManager (not age-based)
- Layer 4: Circuit breaker stops retries after 3 consecutive failures for 60s
Closes#1063, closes#695
Relates to #1010, #707
Root cause: registerSignalHandlers() handled SIGTERM/SIGINT but not
SIGHUP. When the parent hook process exits, the kernel sends SIGHUP
to the daemon, causing immediate termination (default signal action).
Belt-and-suspenders fix:
1. SIGHUP handler: ignore in daemon mode, graceful shutdown otherwise
2. setsid: spawn daemon in new session on Linux (prevents SIGHUP delivery)
3. Global unhandledRejection/uncaughtException guards in daemon mode
Reduce timeouts to eliminate 10-30s startup delay when worker is dead
(common on WSL2 after hibernate). Add stale PID detection, graceful
error handling across all handlers, and error classification that
distinguishes worker unavailability from handler bugs.
- HEALTH_CHECK 30s→3s, new POST_SPAWN_WAIT (5s), PORT_IN_USE_WAIT (3s)
- isProcessAlive() with EPERM handling, cleanStalePidFile()
- getPluginVersion() try-catch for shutdown race (#1042)
- isWorkerUnavailableError: transport+5xx+429→exit 0, 4xx→exit 2
- No-op handler for unknown event types (#984)
- Wrap all handler fetch calls in try-catch for graceful degradation
- CLAUDE_MEM_HEALTH_TIMEOUT_MS env var override with validation
1. ProcessManager: Migrate spawnDaemon() from WMIC to PowerShell Start-Process
- WMIC deprecated in Windows 11, PowerShell inherits env vars properly
- Use -WindowStyle Hidden to prevent console popups
- Fix redundant backslash escaping in PowerShell $_ variables
2. ChromaSync: Re-enable vector search on Windows
- Remove overly defensive platform check that disabled all semantic search
- Worker daemon starts with -WindowStyle Hidden; child processes inherit
- MCP SDK's StdioClientTransport uses shell:false, no new console created
3. worker-service: Unified DB-ready gate middleware
- Replace single-endpoint /api/sessions/init wait with global middleware
- Hold all DB-dependent requests until database is initialized (30s timeout)
- Whitelist static assets, /health, and viewer page for immediate response
- Separate dbReadyPromise (DB only) from initializationComplete (full init)
- Fixes "Database not initialized" errors on /stream, /summarize, /init
4. EnvManager: Switch from allowlist to blocklist for subprocess env
- Only strip ANTHROPIC_API_KEY to prevent Issue #733 billing hijack
- Pass through all other vars (ANTHROPIC_AUTH_TOKEN, ANTHROPIC_BASE_URL, etc.)
- Simpler, less fragile than maintaining an exhaustive system vars allowlist
Cherry-picked source changes from PR #657 (224 commits behind main).
Adds `claude-mem generate` and `claude-mem clean` CLI commands:
- New src/cli/claude-md-commands.ts with generateClaudeMd() and cleanClaudeMd()
- Worker service generate/clean case handlers with --dry-run support
- CLAUDE_MD logger component type
- Uses shared isDirectChild from path-utils.ts (DRY improvement over PR original)
Skipped from PR: 91 CLAUDE.md file deletions (stale), build artifacts,
.claude/plans/ dev artifact, smart-install.js shell alias auto-injection
(aggressive profile modification without consent).
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Adds save_memory MCP tool allowing users to manually save observations
for semantic search. Source changes cherry-picked from PR #662 by
@darconada (build artifact conflicts resolved by direct application).
Closes#645.
Co-Authored-By: darconadalabarga <darconada@arsys.es>
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Cherry-picked from PR #844 by @thusdigital. Sessions stayed in active
sessions map forever after summarize, causing the orphan reaper to think
all processes were still active. Adds session-complete as Stop phase 2
hook that calls POST /api/sessions/complete to remove sessions from the
active map, allowing the reaper to correctly identify and clean up
orphaned worker processes. Fixes#842.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Cherry-picked source changes from PR #889 by @Et9797. Fixes#846.
Key changes:
- Add ensureMemorySessionIdRegistered() guard in SessionStore.ts
- Add ON UPDATE CASCADE migration (schema v21) for observations and session_summaries FK constraints
- Change message queue from claim-and-delete to claim-confirm pattern (PendingMessageStore.ts)
- Add spawn deduplication and unrecoverable error detection in SessionRoutes.ts and worker-service.ts
- Add forceInit flag to SDKAgent for stale session recovery
Build artifacts skipped (pre-existing dompurify dep issue). Path fixes (HealthMonitor.ts, worker-utils.ts)
already merged via PR #634.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Cherry-picked source changes from PR #721. Fixes two Cursor standalone
setup bugs:
1. findCursorHooksDir() now checks for hooks.json (unified CLI mode)
in addition to legacy common.sh/common.ps1 scripts
2. installCursorHooks() now uses bun instead of node for hook commands
since worker-service.cjs depends on bun:sqlite
3. Added findBunPath() to detect bun executable across platforms
Build artifacts skipped (pre-existing dompurify viewer dep issue).
Source-only cherry-pick, TypeScript compilation clean for modified file.
Co-Authored-By: polux0 <aleksaprosperitylabs@gmail.com>
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Add MARKETPLACE_ROOT constant to paths.ts and update 5 source files to use
centralized path constants instead of hardcoded ~/.claude paths. Preserves
backwards compatibility when CLAUDE_CONFIG_DIR is not set.
Based on PR #634 by @Kuroakira, cherry-picked onto main due to build artifact
merge conflicts (source changes applied cleanly).
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
The Gemini API requires the -preview suffix for the Gemini 3 Flash model.
gemini-3-flash does not exist - only gemini-3-flash-preview is available.
This was causing 404 errors when users selected this model option.
Closes#831
Co-Authored-By: Glucksberg <markuscontasul@gmail.com>
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Adds automatic detection and handling of Zscaler enterprise security certificates
on macOS. Combines standard certifi CA certificates with Zscaler certificates into
a single bundle, passed via SSL_CERT_FILE/REQUESTS_CA_BUNDLE/CURL_CA_BUNDLE env vars
to the chroma-mcp subprocess. Certificate bundle is cached for 24 hours. Falls back
gracefully when Zscaler is not present, with no impact on non-Zscaler environments.
Co-Authored-By: RClark4958 <rickdclark48@gmail.com>
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Fixes#761
Root cause: When connection errors occur (MCP error -32000, Connection closed),
the code was resetting \`connected\` and \`client\` but NOT calling
\`transport.close()\`, leaving the chroma-mcp subprocess alive. Each
reconnection attempt spawned a NEW process while old ones accumulated.
Changes:
- Close transport before resetting state in ensureCollection() error handler
- Close transport before resetting state in queryChroma() error handler
- Set transport = null after closing to match close() method behavior
- Add regression tests for Issue #761 with source code verification
Tested on macOS - no more zombie processes after the fix.
ChromaSync.queryChroma() returns deduplicated sqlite_ids but the
metadatas array contains multiple entries per observation (narrative +
facts). The filterByRecency() method was iterating over metadatas and
using the index to access ids, causing array out-of-bounds access.
The fix builds a Map from sqlite_id to metadata, then iterates over
the deduplicated ids array to ensure proper alignment.
Symptoms before fix:
- Semantic search returning incorrect/empty results
- Search only working with near-exact queries
- Recent items (same day) not being found
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Gemini and OpenRouter are stateless APIs that never return session IDs.
Without synthetic IDs, PR #693's defensive memorySessionId checks throw
errors on every observation processing call for these providers.
Generates provider-prefixed IDs (gemini-/openrouter-{contentSessionId}-
{timestamp}) before the first API call, persisted to the database via
updateMemorySessionId(). Applied from PR #615 (closed due to staleness).
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Applies PR #741 by @licutis onto main, resolving conflicts with
recently merged PRs #693, #937, and #627. Adds getActiveAgent() to
WorkerService so startup-recovery uses the correct provider instead
of hardcoding SDKAgent. Also cleans up sessions stuck 'active' for
6+ hours and their pending messages before processing orphaned queues.
Co-Authored-By: licutis <43884712+licutis@users.noreply.github.com>
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
When a generator exits with wasAborted=true, the AbortController remains in
aborted state but generatorPromise is set to null. When a new observation
arrives, ensureGeneratorRunning() sees generatorPromise=null and tries to
start a new generator, but the new generator immediately sees
signal.aborted=true and exits, causing an infinite "Generator aborted" loop.
This fix resets the AbortController if it's already aborted before starting
a new generator, allowing the session to recover from the stuck state.
Bug reproduction:
1. Session receives observations
2. Something causes the generator to be aborted
3. generatorPromise = null, but abortController.signal.aborted = true
4. New observation arrives → starts generator → immediately aborted → loop
Fix: Check if abortController.signal.aborted before starting generator,
and create a new AbortController if needed.
PostToolUse hook can create the session before UserPromptSubmit's
session-init sets the project, leaving it empty. Add an UPDATE after
INSERT OR IGNORE to backfill the project when a later hook provides it.
Closes#939
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Add restart limit (max 3 consecutive restarts) with exponential backoff
to prevent infinite generator restart loops. Also add defensive
memorySessionId checks in GeminiAgent and OpenRouterAgent before
expensive LLM calls to fail fast when session ID hasn't been captured.
Based on PR #693 by @ajbmachon (applied to current main).
Co-Authored-By: Andre Machon <ajbmachon2@gmail.com>
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
JSON.parse preserves native types, so boolean true/false stay as
booleans rather than strings. The previous check only handled string
'true', meaning users who set `"ENABLED": true` (boolean) wouldn't
have the feature work.
Now both `getBool()` helper and ResponseProcessor check handle:
- String 'true' → enabled
- Boolean true → enabled
- Any other value → disabled
Tested: Confirmed feature stays disabled with string "false" and
no new CLAUDE.md files are created when accessing directories.
The CLAUDE_MEM_FOLDER_CLAUDEMD_ENABLED setting was documented but never
actually checked in code. The folder CLAUDE.md generation ran
unconditionally whenever files were touched.
Changes:
- Add CLAUDE_MEM_FOLDER_CLAUDEMD_ENABLED to SettingsDefaults interface
- Add default value 'false' to DEFAULTS object
- Check setting in ResponseProcessor before calling updateFolderClaudeMdFiles
Fixes the issue identified in issue-600-documentation-audit-features-not-implemented.md:
"The setting CLAUDE_MEM_FOLDER_CLAUDEMD_ENABLED is never read."
Fixes a race condition where the UserPromptSubmit hook could call
/api/sessions/init before the database is fully initialized, resulting
in a 500 error with "Database not initialized".
Root cause:
- Worker starts HTTP server immediately for fast startup
- Health endpoint (/api/health) returns OK when server is listening
- session-init hook waits for health check, then calls /api/sessions/init
- Database initialization happens in background, creating a race window
Solution:
- Add early handler for /api/sessions/init that waits for initialization
- Uses same pattern as existing /api/context/inject handler
- Returns 503 with retry message if initialization times out
The fix ensures hooks receive proper responses even during the brief
startup window when the server is listening but DB isn't ready yet.
Co-Authored-By: Claude Code <noreply@anthropic.com>
The startup cleanupOrphanedProcesses() only targeted chroma-mcp, leaving
orphaned mcp-server.cjs and worker-service.cjs processes undetected after
daemon crashes. Expanded to target all claude-mem process types with
30-minute age filtering and current PID exclusion. Closes PR #687 (which
had a spawnDaemon regression removing Windows WMIC support).
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Add killIdleDaemonChildren() to clean up SDK-spawned claude processes
that don't terminate after completing their work.
Problem:
- Worker-service daemon spawns Claude SDK processes
- These processes remain alive after work completes
- They accumulate over time, consuming significant memory
- Existing killSystemOrphans() only handles PPID=1 orphans
Solution:
- Add killIdleDaemonChildren() that finds claude processes where:
- Parent PID = daemon's PID (children of worker-service)
- CPU = 0% (idle, not actively working)
- Running > 2 minutes (completed their work)
- Call it from reapOrphanedProcesses() (runs every 5 minutes)
Testing:
- Verified locally: 15+ zombie processes cleaned up
- Memory saved: ~2GB
- Normal processes (MCP server, Chroma) unaffected
Co-Authored-By: Claude <noreply@anthropic.com>
Adds file-based cooldown lock in ensureWorkerStarted() so failed worker
spawn attempts on Windows don't produce repeated bun.exe terminal popups.
Based on the approach from PR #931, integrated into the refactored code.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
The /api/context/inject endpoint previously blocked for up to 5 minutes
(300s timeout) waiting for initializationComplete, which includes MCP
connection setup. On Windows, the MCP connection can hang indefinitely,
causing the context hook to never return and blocking Claude Code startup.
This change makes the endpoint fail open: if the worker hasn't finished
initializing, return empty context immediately instead of blocking. The
hook completes fast, and context becomes available on subsequent prompts
once initialization finishes in the background.
Closes#958
Prevents cross-origin attacks from malicious websites by restricting
CORS to only allow:
- Requests without Origin header (hooks, curl, CLI tools)
- Requests from localhost / 127.0.0.1 origins
Previously, CORS was completely open (cors() without configuration),
allowing any website to access the local API and read session data.
Fixes the "Worker did not become ready within 15 seconds" timeout issue.
Root cause: isWorkerHealthy() and waitForHealth() were checking /api/readiness
which returns 503 until full initialization completes (including MCP connection
which can take 5+ minutes). Hooks only have 15 seconds timeout.
Solution: Use /api/health (liveness check) which returns 200 as soon as the
HTTP server is listening. This is sufficient for hook communication since
the worker can accept requests while background initialization continues.
Changes:
- src/shared/worker-utils.ts: Change /api/readiness to /api/health in isWorkerHealthy()
- src/services/infrastructure/HealthMonitor.ts: Change /api/readiness to /api/health in waitForHealth()
- tests/infrastructure/health-monitor.test.ts: Update test to expect /api/health
Fixes#811, #772, #729
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
This fixes Issue #733 where claude-mem would incorrectly use ANTHROPIC_API_KEY from
random project .env files instead of the user's configured Claude Code CLI subscription.
Root cause: The SDK's `query()` function inherits from `process.env` when no `env`
option is passed. When users work in projects with their own .env files containing
API keys, the SDK would discover and use those keys, billing the wrong account.
Solution: Centralized credential management via ~/.claude-mem/.env
Changes:
- Add EnvManager.ts: Centralized credential storage and isolated env builder
- SDKAgent: Pass isolated env to SDK query() that only includes credentials from
~/.claude-mem/.env, not random keys from process.env inheritance
- GeminiAgent/OpenRouterAgent: Use getCredential() instead of process.env fallback
- SettingsDefaultsManager: Add CLAUDE_MEM_CLAUDE_AUTH_METHOD setting ('cli' | 'api')
How it works:
1. buildIsolatedEnv() creates a clean environment with only essential system vars
(PATH, HOME, etc.) and credentials explicitly configured in ~/.claude-mem/.env
2. SDK subprocess runs with this isolated env, never seeing random API keys
3. If no ANTHROPIC_API_KEY is in ~/.claude-mem/.env, Claude Code CLI billing is used
4. Same pattern applied to Gemini/OpenRouter agents for consistency
This ensures claude-mem always uses the user's intended billing method, regardless
of what .env files exist in their working directory.
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
* fix: add idle timeout to prevent zombie observer processes
Root cause fix for zombie observer accumulation. The SessionQueueProcessor
iterator now exits gracefully after 3 minutes of inactivity instead of
waiting forever for messages.
Changes:
- Add IDLE_TIMEOUT_MS constant (3 minutes)
- waitForMessage() now returns boolean and accepts timeout parameter
- createIterator() tracks lastActivityTime and exits on idle timeout
- Graceful exit via return (not throw) allows SDK to complete cleanly
This addresses the root cause that PR #848 worked around with pattern
matching. Observer processes now self-terminate, preventing accumulation
when session-complete hooks don't fire.
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
* fix: trigger abort on idle timeout to actually kill subprocess
The previous implementation only returned from the iterator on idle timeout,
but this doesn't terminate the Claude subprocess - it just stops yielding
messages. The subprocess stays alive as a zombie because:
1. Returning from createIterator() ends the generator
2. The SDK closes stdin via transport.endInput()
3. But the subprocess may not exit on stdin EOF
4. No abort signal is sent to kill it
Fix: Add onIdleTimeout callback that SessionManager uses to call
session.abortController.abort(). This sends SIGTERM to the subprocess
via the SDK's ProcessTransport abort handler.
Verified by Codex analysis of the SDK internals:
- abort() triggers ProcessTransport abort handler → SIGTERM
- transport.close() sends SIGTERM → escalates to SIGKILL after 5s
- Just closing stdin is NOT sufficient to guarantee subprocess exit
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
* fix: add idle timeout to prevent zombie observer processes
Also cleaned up hooks.json to remove redundant start commands.
The hook command handler now auto-starts the worker if not running,
which is how it should have been since we changed to auto-start.
This maintenance change was done manually.
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
* fix: resolve race condition in session queue idle timeout detection
- Reset timer on spurious wakeup when queue is empty but duration check fails
- Use optional chaining for onIdleTimeout callback
- Include threshold value in idle timeout log message for better diagnostics
- Add comprehensive unit tests for SessionQueueProcessor
Fixes PR #856 review feedback.
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
* feat: migrate installer to Setup hook
- Add plugin/scripts/setup.sh for one-time dependency setup
- Add Setup hook to hooks.json (triggers via claude --init)
- Remove smart-install.js from SessionStart hook
- Keep smart-install.js as manual fallback for Windows/auto-install
Setup hook handles:
- Bun detection with fallback locations
- uv detection (optional, for Chroma)
- Version marker to skip redundant installs
- Clear error messages with install instructions
* feat: add np for one-command npm releases
- Add np as dev dependency
- Add release, release:patch, release:minor, release:major scripts
- Add prepublishOnly hook to run build before publish
- Configure np (no yarn, include all contents, run tests)
* fix: reduce PostToolUse hook timeout to 30s
PostToolUse runs on every tool call, 120s was excessive and could cause
hangs. Reduced to 30s for responsive behavior.
* docs: add PR shipping report
Analyzed 6 PRs for shipping readiness:
- #856: Ready to merge (idle timeout fix)
- #700, #722, #657: Have conflicts, need rebase
- #464: Contributor PR, too large (15K+ lines)
- #863: Needs manual review
Includes shipping strategy and conflict resolution order.
* MAESTRO: Verify PR #856 test suite passes
All 797 tests pass (3 skipped, 0 failures). The 11 SessionQueueProcessor
idle timeout tests all pass with 20 expect() assertions verified.
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
* MAESTRO: Verify PR #856 build passes
- Ran npm run build successfully with no TypeScript errors
- All artifacts generated (worker-service, mcp-server, context-generator, viewer UI)
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
* MAESTRO: Code review PR #856 implementation verified
Verified all requirements in SessionQueueProcessor.ts:
- IDLE_TIMEOUT_MS = 180000ms (3 minutes)
- waitForMessage() accepts timeout parameter
- lastActivityTime reset on spurious wakeup (race condition fix)
- Graceful exit logs include thresholdMs parameter
- 11 comprehensive test cases in SessionQueueProcessor.test.ts
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
---------
Co-authored-by: Claude Opus 4.5 <noreply@anthropic.com>
Co-authored-by: bigph00t <166455923+bigph00t@users.noreply.github.com>
Co-authored-by: root <root@srv1317155.hstgr.cloud>
The previous approach (PR #837) set CLAUDE_CONFIG_DIR to isolate observer
sessions from `claude --resume`. However, this broke authentication because
Claude Code stores credentials in the config directory.
This fix uses the SDK's `cwd` option instead:
- Observer sessions run with cwd=~/.claude-mem/observer-sessions/
- Project name = path.basename(cwd) = "observer-sessions"
- Sessions won't appear when running `claude --resume` from actual projects
- Authentication works because ~/.claude/ config is preserved
Changes:
- ProcessRegistry.ts: Remove CLAUDE_CONFIG_DIR override from spawn
- SDKAgent.ts: Add cwd option to query() pointing to observer dir
- paths.ts: Rename OBSERVER_CONFIG_DIR to OBSERVER_SESSIONS_DIR
Fixes regression from #837
Co-authored-by: Claude Opus 4.5 <noreply@anthropic.com>
Observer sessions created by claude-mem were appearing in the main Claude Code
session picker (`claude --resume`), cluttering the list with internal plugin
sessions that users never intend to resume.
In one user's case: 74 observer sessions out of ~220 total (34% noise).
## Solution
Set `CLAUDE_CONFIG_DIR` to `~/.claude-mem/observer-config/` when spawning
observer Claude processes. This stores observer session files in a separate
location, isolating them from user sessions.
## Changes
1. Added `OBSERVER_CONFIG_DIR` to paths.ts
2. Modified `createPidCapturingSpawn()` in ProcessRegistry.ts to inject
`CLAUDE_CONFIG_DIR` environment variable
Observer sessions now write their `.jsonl` files to:
`~/.claude-mem/observer-config/projects/*/`
Instead of the user's:
`~/.claude/projects/*/`
Fixes#832
Co-authored-by: Claude Opus 4.5 <noreply@anthropic.com>
When the worker restarts, the SDK context is lost but the database still contains
memory_session_id values from the previous worker instance. The existing guard
(lastPromptNumber > 1) doesn't protect against this because lastPromptNumber is
also loaded from the database.
This fix:
- Clears memory_session_id when initializing a session from DB (not from cache)
- Adds warning log when discarding stale session IDs
- Lets SDK agent capture fresh memory_session_id on first response
The key insight: if a session is not in memory, we're in a new worker instance,
and any database memory_session_id is definitely stale.
Fixes#817
Related to #825
Co-authored-by: bigphoot <bigphoot@gmail.com>
Co-authored-by: Claude Opus 4.5 <noreply@anthropic.com>
Wire tenant, database, and API key settings into ChromaSync for
remote/pro mode. In remote mode:
- Passes tenant and database to ChromaClient for data isolation
- Adds Authorization header when API key is configured
- Logs tenant isolation connection details
Local mode unchanged - uses default_tenant without explicit params.
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
- Added @chroma-core/default-embed dependency for local embeddings
- Updated ChromaSync to use DefaultEmbeddingFunction with collections
- Added isServerReachable() async method for reliable server detection
- Fixed start() to detect and reuse existing Chroma servers
- Updated build script to externalize native ONNX binaries
- Added runtime dependency to plugin/package.json
The embedding function uses all-MiniLM-L6-v2 model locally via ONNX,
eliminating need for external embedding API calls.
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
- Updated chromadb from ^1.9.2 to ^3.2.2 (includes CLI binary)
- Changed heartbeat endpoint from /api/v1 to /api/v2
The 1.9.x version did not include the CLI, causing `npx chroma run` to fail.
Version 3.2.2 includes the chroma CLI and uses the v2 API.
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Replace MCP subprocess approach with persistent Chroma HTTP server for
improved performance and reliability. This re-enables Chroma on Windows
by eliminating the subprocess spawning that caused console popups.
Changes:
- NEW: ChromaServerManager.ts - Manages local Chroma server lifecycle
via `npx chroma run`
- REFACTOR: ChromaSync.ts - Uses chromadb npm package's ChromaClient
instead of MCP subprocess (removes Windows disabling)
- UPDATE: worker-service.ts - Starts Chroma server on initialization
- UPDATE: GracefulShutdown.ts - Stops Chroma server on shutdown
- UPDATE: SettingsDefaultsManager.ts - New Chroma configuration options
- UPDATE: build-hooks.js - Mark optional chromadb deps as external
Benefits:
- Eliminates subprocess spawn latency on first query
- Single server process instead of per-operation subprocesses
- No Python/uvx dependency for local mode
- Re-enables Chroma vector search on Windows
- Future-ready for cloud-hosted Chroma (claude-mem pro)
- Cross-platform: Linux, macOS, Windows
Configuration:
CLAUDE_MEM_CHROMA_MODE=local|remote
CLAUDE_MEM_CHROMA_HOST=127.0.0.1
CLAUDE_MEM_CHROMA_PORT=8000
CLAUDE_MEM_CHROMA_SSL=false
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
The isDirectChild() function failed to match files when the API used
absolute paths (/Users/x/project/app/api) but the database stored
relative paths (app/api/router.py). This caused all folder CLAUDE.md
files to incorrectly show "No recent activity".
Changes:
- Create shared path-utils module with proper path normalization
- Implement suffix matching strategy for mixed path formats
- Update SessionSearch.ts to use shared utilities
- Update regenerate-claude-md.ts to use shared utilities (was using
outdated broken logic)
- Prevent spurious directory creation from malformed paths
- Add comprehensive test coverage for path matching edge cases
This is the proper fix for #794, replacing PR #809 which only masked
the bug by skipping file creation when "no activity" was shown.
Co-authored-by: bigphoot <bigphoot@gmail.com>
Co-authored-by: Claude Opus 4.5 <noreply@anthropic.com>
* Fix zombie process accumulation (Issue #737)
Problem: Claude haiku subprocesses spawned by the SDK weren't terminating
properly, causing zombie process accumulation (user reported 155 processes
consuming 51GB RAM).
Root causes:
1. SDK's SpawnedProcess interface hides subprocess PIDs
2. deleteSession() didn't verify subprocess exit
3. abort() was fire-and-forget with no confirmation
4. No mechanism to track or clean up orphaned processes
Solution:
- Add ProcessRegistry module to track spawned Claude subprocesses
- Use SDK's spawnClaudeCodeProcess option to capture PIDs via custom spawn
- Pass signal parameter to enable AbortController integration
- Wait for subprocess exit in deleteSession() with 5s timeout
- Escalate to SIGKILL if graceful exit fails
- Add orphan reaper running every 5 minutes as safety net
Files changed:
- src/services/worker/ProcessRegistry.ts (new): PID registry and reaper
- src/services/worker/SDKAgent.ts: Use custom spawn to capture PIDs
- src/services/worker/SessionManager.ts: Verify subprocess exit on delete
- src/services/worker-service.ts: Start/stop orphan reaper
Fixes#737
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
* fix: address code review feedback
- Replace busy-wait polling with event-based proc.once('exit')
- Detect and warn about multiple processes per session (race condition)
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
---------
Co-authored-by: bigphoot <bigphoot@gmail.com>
Co-authored-by: Claude Opus 4.5 <noreply@anthropic.com>