This commit addresses the issue of duplicate assistant messages appearing
in the conversation history by commenting out the lines that were
unnecessarily pushing assistant responses to the conversationHistory array.
The processAgentResponse function already handles adding assistant messages
to the conversation history, so these additional pushes were causing
duplicate entries.
Changes made:
- Commented out session.conversationHistory.push calls for assistant responses
in three locations within OpenRouterAgent.ts:
1. In the init response handling (around line 117)
2. In the observation response handling (around line 188)
3. In the summary response handling (around line 230)
This ensures that assistant messages are only added once to the conversation
history, preventing duplication while maintaining the intended functionality.
Co-authored-by: 张坤 <zhangkun@example.com>
v1beta does not support newer models like gemini-3-flash, causing
silent 404 errors that back up the observation queue indefinitely.
Users with CLAUDE_MEM_GEMINI_MODEL=gemini-3-flash get zero observations
stored, with no visible error — the queue just grows silently.
Changes:
- Switch API URL from v1beta/models to v1/models (generateContent
works identically on both endpoints)
- Add gemini-3-flash to GeminiModel type and RPM limits
- Update test to match new endpoint
Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
Wrap SDK query loop in try/finally so subprocess cleanup runs on error paths.
Swap Chroma binary check order to try project-level .bin first (common case).
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Chroma requires client-side embeddings — the server is storage only.
The previous commit incorrectly removed @chroma-core/default-embed.
Uses DefaultEmbeddingFunction({ wasm: true }) which forces the WASM
backend instead of native ONNX binaries. Same model (all-MiniLM-L6-v2),
same embeddings, but works on all platforms without segfaults or
ENOENT errors (#1104, #1105, #1110).
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- Strip CLAUDECODE env var from SDK subprocesses to prevent "cannot be
launched inside another Claude Code session" error (Claude Code 2.1.42+)
- Lazy-load @chroma-core/default-embed to avoid eagerly pulling in
sharp native binaries at bundle startup (fixes ERR_DLOPEN_FAILED)
- Add stderr capture to SDK spawn for diagnosing future process failures
- Exclude lockfiles from marketplace rsync and delete stale lockfiles
before npm install to prevent native dep version mismatches
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Resolve conflicts between Chroma HTTP server PR and main branch changes
(folder CLAUDE.md, exclusion settings, Zscaler SSL, transport cleanup).
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Replace hardcoded TEST-008 build ID with real package version. Add worker
filesystem path, uptime counter, and AI provider status (including last
interaction success/failure tracking) to the health endpoint response.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Addresses Greptile review feedback:
- ChromaSync: replace PID-based sort with ps etime column + parseElapsedTime()
for reliable age ordering (PIDs wrap and don't guarantee ordering)
- ProcessManager: filter out entries with unparseable etime (-1) before
sorting to prevent sort corruption in cleanupExcessChromaProcesses()
During SIGHUP testing with 6+ active sessions, ChromaSync.ensureConnection()
had no mutex — concurrent fire-and-forget syncObservation() calls each spawned
a chroma-mcp subprocess via StdioClientTransport, creating 641 orphans in ~5min.
Error-driven reconnection formed a positive feedback loop amplifying the storm.
Defense layers:
- Layer 0: Connection mutex via promise memoization (prevents concurrent spawns)
- Layer 1: Pre-spawn process count guard using execFileSync('ps') (kills excess)
- Layer 2: Hardened close() with try-finally + Unix pkill in GracefulShutdown
- Layer 3: Count-based orphan reaper in ProcessManager (not age-based)
- Layer 4: Circuit breaker stops retries after 3 consecutive failures for 60s
Closes#1063, closes#695
Relates to #1010, #707
Root cause: registerSignalHandlers() handled SIGTERM/SIGINT but not
SIGHUP. When the parent hook process exits, the kernel sends SIGHUP
to the daemon, causing immediate termination (default signal action).
Belt-and-suspenders fix:
1. SIGHUP handler: ignore in daemon mode, graceful shutdown otherwise
2. setsid: spawn daemon in new session on Linux (prevents SIGHUP delivery)
3. Global unhandledRejection/uncaughtException guards in daemon mode
Reduce timeouts to eliminate 10-30s startup delay when worker is dead
(common on WSL2 after hibernate). Add stale PID detection, graceful
error handling across all handlers, and error classification that
distinguishes worker unavailability from handler bugs.
- HEALTH_CHECK 30s→3s, new POST_SPAWN_WAIT (5s), PORT_IN_USE_WAIT (3s)
- isProcessAlive() with EPERM handling, cleanStalePidFile()
- getPluginVersion() try-catch for shutdown race (#1042)
- isWorkerUnavailableError: transport+5xx+429→exit 0, 4xx→exit 2
- No-op handler for unknown event types (#984)
- Wrap all handler fetch calls in try-catch for graceful degradation
- CLAUDE_MEM_HEALTH_TIMEOUT_MS env var override with validation
1. ProcessManager: Migrate spawnDaemon() from WMIC to PowerShell Start-Process
- WMIC deprecated in Windows 11, PowerShell inherits env vars properly
- Use -WindowStyle Hidden to prevent console popups
- Fix redundant backslash escaping in PowerShell $_ variables
2. ChromaSync: Re-enable vector search on Windows
- Remove overly defensive platform check that disabled all semantic search
- Worker daemon starts with -WindowStyle Hidden; child processes inherit
- MCP SDK's StdioClientTransport uses shell:false, no new console created
3. worker-service: Unified DB-ready gate middleware
- Replace single-endpoint /api/sessions/init wait with global middleware
- Hold all DB-dependent requests until database is initialized (30s timeout)
- Whitelist static assets, /health, and viewer page for immediate response
- Separate dbReadyPromise (DB only) from initializationComplete (full init)
- Fixes "Database not initialized" errors on /stream, /summarize, /init
4. EnvManager: Switch from allowlist to blocklist for subprocess env
- Only strip ANTHROPIC_API_KEY to prevent Issue #733 billing hijack
- Pass through all other vars (ANTHROPIC_AUTH_TOKEN, ANTHROPIC_BASE_URL, etc.)
- Simpler, less fragile than maintaining an exhaustive system vars allowlist
Cherry-picked source changes from PR #657 (224 commits behind main).
Adds `claude-mem generate` and `claude-mem clean` CLI commands:
- New src/cli/claude-md-commands.ts with generateClaudeMd() and cleanClaudeMd()
- Worker service generate/clean case handlers with --dry-run support
- CLAUDE_MD logger component type
- Uses shared isDirectChild from path-utils.ts (DRY improvement over PR original)
Skipped from PR: 91 CLAUDE.md file deletions (stale), build artifacts,
.claude/plans/ dev artifact, smart-install.js shell alias auto-injection
(aggressive profile modification without consent).
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Adds save_memory MCP tool allowing users to manually save observations
for semantic search. Source changes cherry-picked from PR #662 by
@darconada (build artifact conflicts resolved by direct application).
Closes#645.
Co-Authored-By: darconadalabarga <darconada@arsys.es>
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Cherry-picked from PR #844 by @thusdigital. Sessions stayed in active
sessions map forever after summarize, causing the orphan reaper to think
all processes were still active. Adds session-complete as Stop phase 2
hook that calls POST /api/sessions/complete to remove sessions from the
active map, allowing the reaper to correctly identify and clean up
orphaned worker processes. Fixes#842.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Cherry-picked source changes from PR #889 by @Et9797. Fixes#846.
Key changes:
- Add ensureMemorySessionIdRegistered() guard in SessionStore.ts
- Add ON UPDATE CASCADE migration (schema v21) for observations and session_summaries FK constraints
- Change message queue from claim-and-delete to claim-confirm pattern (PendingMessageStore.ts)
- Add spawn deduplication and unrecoverable error detection in SessionRoutes.ts and worker-service.ts
- Add forceInit flag to SDKAgent for stale session recovery
Build artifacts skipped (pre-existing dompurify dep issue). Path fixes (HealthMonitor.ts, worker-utils.ts)
already merged via PR #634.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Cherry-picked source changes from PR #721. Fixes two Cursor standalone
setup bugs:
1. findCursorHooksDir() now checks for hooks.json (unified CLI mode)
in addition to legacy common.sh/common.ps1 scripts
2. installCursorHooks() now uses bun instead of node for hook commands
since worker-service.cjs depends on bun:sqlite
3. Added findBunPath() to detect bun executable across platforms
Build artifacts skipped (pre-existing dompurify viewer dep issue).
Source-only cherry-pick, TypeScript compilation clean for modified file.
Co-Authored-By: polux0 <aleksaprosperitylabs@gmail.com>
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Add MARKETPLACE_ROOT constant to paths.ts and update 5 source files to use
centralized path constants instead of hardcoded ~/.claude paths. Preserves
backwards compatibility when CLAUDE_CONFIG_DIR is not set.
Based on PR #634 by @Kuroakira, cherry-picked onto main due to build artifact
merge conflicts (source changes applied cleanly).
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
The Gemini API requires the -preview suffix for the Gemini 3 Flash model.
gemini-3-flash does not exist - only gemini-3-flash-preview is available.
This was causing 404 errors when users selected this model option.
Closes#831
Co-Authored-By: Glucksberg <markuscontasul@gmail.com>
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Adds automatic detection and handling of Zscaler enterprise security certificates
on macOS. Combines standard certifi CA certificates with Zscaler certificates into
a single bundle, passed via SSL_CERT_FILE/REQUESTS_CA_BUNDLE/CURL_CA_BUNDLE env vars
to the chroma-mcp subprocess. Certificate bundle is cached for 24 hours. Falls back
gracefully when Zscaler is not present, with no impact on non-Zscaler environments.
Co-Authored-By: RClark4958 <rickdclark48@gmail.com>
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Fixes#761
Root cause: When connection errors occur (MCP error -32000, Connection closed),
the code was resetting \`connected\` and \`client\` but NOT calling
\`transport.close()\`, leaving the chroma-mcp subprocess alive. Each
reconnection attempt spawned a NEW process while old ones accumulated.
Changes:
- Close transport before resetting state in ensureCollection() error handler
- Close transport before resetting state in queryChroma() error handler
- Set transport = null after closing to match close() method behavior
- Add regression tests for Issue #761 with source code verification
Tested on macOS - no more zombie processes after the fix.
ChromaSync.queryChroma() returns deduplicated sqlite_ids but the
metadatas array contains multiple entries per observation (narrative +
facts). The filterByRecency() method was iterating over metadatas and
using the index to access ids, causing array out-of-bounds access.
The fix builds a Map from sqlite_id to metadata, then iterates over
the deduplicated ids array to ensure proper alignment.
Symptoms before fix:
- Semantic search returning incorrect/empty results
- Search only working with near-exact queries
- Recent items (same day) not being found
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Gemini and OpenRouter are stateless APIs that never return session IDs.
Without synthetic IDs, PR #693's defensive memorySessionId checks throw
errors on every observation processing call for these providers.
Generates provider-prefixed IDs (gemini-/openrouter-{contentSessionId}-
{timestamp}) before the first API call, persisted to the database via
updateMemorySessionId(). Applied from PR #615 (closed due to staleness).
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Applies PR #741 by @licutis onto main, resolving conflicts with
recently merged PRs #693, #937, and #627. Adds getActiveAgent() to
WorkerService so startup-recovery uses the correct provider instead
of hardcoding SDKAgent. Also cleans up sessions stuck 'active' for
6+ hours and their pending messages before processing orphaned queues.
Co-Authored-By: licutis <43884712+licutis@users.noreply.github.com>
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
When a generator exits with wasAborted=true, the AbortController remains in
aborted state but generatorPromise is set to null. When a new observation
arrives, ensureGeneratorRunning() sees generatorPromise=null and tries to
start a new generator, but the new generator immediately sees
signal.aborted=true and exits, causing an infinite "Generator aborted" loop.
This fix resets the AbortController if it's already aborted before starting
a new generator, allowing the session to recover from the stuck state.
Bug reproduction:
1. Session receives observations
2. Something causes the generator to be aborted
3. generatorPromise = null, but abortController.signal.aborted = true
4. New observation arrives → starts generator → immediately aborted → loop
Fix: Check if abortController.signal.aborted before starting generator,
and create a new AbortController if needed.
PostToolUse hook can create the session before UserPromptSubmit's
session-init sets the project, leaving it empty. Add an UPDATE after
INSERT OR IGNORE to backfill the project when a later hook provides it.
Closes#939
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Add restart limit (max 3 consecutive restarts) with exponential backoff
to prevent infinite generator restart loops. Also add defensive
memorySessionId checks in GeminiAgent and OpenRouterAgent before
expensive LLM calls to fail fast when session ID hasn't been captured.
Based on PR #693 by @ajbmachon (applied to current main).
Co-Authored-By: Andre Machon <ajbmachon2@gmail.com>
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
JSON.parse preserves native types, so boolean true/false stay as
booleans rather than strings. The previous check only handled string
'true', meaning users who set `"ENABLED": true` (boolean) wouldn't
have the feature work.
Now both `getBool()` helper and ResponseProcessor check handle:
- String 'true' → enabled
- Boolean true → enabled
- Any other value → disabled
Tested: Confirmed feature stays disabled with string "false" and
no new CLAUDE.md files are created when accessing directories.
The CLAUDE_MEM_FOLDER_CLAUDEMD_ENABLED setting was documented but never
actually checked in code. The folder CLAUDE.md generation ran
unconditionally whenever files were touched.
Changes:
- Add CLAUDE_MEM_FOLDER_CLAUDEMD_ENABLED to SettingsDefaults interface
- Add default value 'false' to DEFAULTS object
- Check setting in ResponseProcessor before calling updateFolderClaudeMdFiles
Fixes the issue identified in issue-600-documentation-audit-features-not-implemented.md:
"The setting CLAUDE_MEM_FOLDER_CLAUDEMD_ENABLED is never read."
Fixes a race condition where the UserPromptSubmit hook could call
/api/sessions/init before the database is fully initialized, resulting
in a 500 error with "Database not initialized".
Root cause:
- Worker starts HTTP server immediately for fast startup
- Health endpoint (/api/health) returns OK when server is listening
- session-init hook waits for health check, then calls /api/sessions/init
- Database initialization happens in background, creating a race window
Solution:
- Add early handler for /api/sessions/init that waits for initialization
- Uses same pattern as existing /api/context/inject handler
- Returns 503 with retry message if initialization times out
The fix ensures hooks receive proper responses even during the brief
startup window when the server is listening but DB isn't ready yet.
Co-Authored-By: Claude Code <noreply@anthropic.com>
The startup cleanupOrphanedProcesses() only targeted chroma-mcp, leaving
orphaned mcp-server.cjs and worker-service.cjs processes undetected after
daemon crashes. Expanded to target all claude-mem process types with
30-minute age filtering and current PID exclusion. Closes PR #687 (which
had a spawnDaemon regression removing Windows WMIC support).
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Add killIdleDaemonChildren() to clean up SDK-spawned claude processes
that don't terminate after completing their work.
Problem:
- Worker-service daemon spawns Claude SDK processes
- These processes remain alive after work completes
- They accumulate over time, consuming significant memory
- Existing killSystemOrphans() only handles PPID=1 orphans
Solution:
- Add killIdleDaemonChildren() that finds claude processes where:
- Parent PID = daemon's PID (children of worker-service)
- CPU = 0% (idle, not actively working)
- Running > 2 minutes (completed their work)
- Call it from reapOrphanedProcesses() (runs every 5 minutes)
Testing:
- Verified locally: 15+ zombie processes cleaned up
- Memory saved: ~2GB
- Normal processes (MCP server, Chroma) unaffected
Co-Authored-By: Claude <noreply@anthropic.com>
Adds file-based cooldown lock in ensureWorkerStarted() so failed worker
spawn attempts on Windows don't produce repeated bun.exe terminal popups.
Based on the approach from PR #931, integrated into the refactored code.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
The /api/context/inject endpoint previously blocked for up to 5 minutes
(300s timeout) waiting for initializationComplete, which includes MCP
connection setup. On Windows, the MCP connection can hang indefinitely,
causing the context hook to never return and blocking Claude Code startup.
This change makes the endpoint fail open: if the worker hasn't finished
initializing, return empty context immediately instead of blocking. The
hook completes fast, and context becomes available on subsequent prompts
once initialization finishes in the background.
Closes#958