Two changes fix the observer process resource leak:
1. Add ensureProcessExit to generator finally blocks in SessionRoutes and
worker-service, matching the pattern already working in SDKAgent.
2. Add stale session reaper (every 2m) that removes sessions with no active
generator and no pending work after 15m idle. This unblocks the orphan
reaper which previously skipped processes for "active" sessions.
Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
* fix: self-healing claimNextMessage prevents stuck processing messages
claimAndDelete → claimNextMessage with atomic self-healing: resets stale
processing messages (>60s) back to pending before claiming. Eliminates
stuck messages from generator crashes without external timers. Removes
redundant idle-timeout reset in worker-service.ts. Adds QUEUE to logger
Component type.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
* fix: update stale comments in SessionQueueProcessor to reflect claim-confirm pattern
Comments still referenced the old claim-and-delete pattern after the
claimNextMessage rename. Updated to accurately describe the current
lifecycle where messages are marked as processing and stay in DB until
confirmProcessed() is called.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
* fix: move Date.now() inside transaction and extract stale threshold constant
- Move Date.now() inside claimNextMessage transaction closure so timestamp
is fresh if WAL contention causes retry
- Extract STALE_PROCESSING_THRESHOLD_MS to module-level constant
- Add comment clarifying strict < boundary semantics
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
---------
Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
* fix: backfill all Chroma projects on worker startup
ChromaSync.ensureBackfilled() existed but was never called. After
v10.2.2's bun cache clear destroyed the ONNX model cache, Chroma only
had ~2 days of embeddings while SQLite had 49k+ observations.
- Add static backfillAllProjects() to ChromaSync — iterates all projects
in SQLite, creates temporary ChromaSync per project, runs smart diff
- Call backfillAllProjects() fire-and-forget on worker startup
- Add 'CHROMA_SYNC' to logger Component type (pre-existing gap)
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
* fix: sanitize project names for Chroma collection naming
Replace characters outside [a-zA-Z0-9._-] with underscores so projects
like "YC Stuff" map to collection "cm__YC_Stuff" instead of failing
Chroma's collection name validation.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
* fix: route backfill to shared cm__claude-mem collection, harden sanitization
- Use single ChromaSync('claude-mem') in backfillAllProjects() instead of
per-project instances, matching how DatabaseManager and SearchManager
operate — fixes critical bug where backfilled data landed in orphaned
collections that no search path reads from
- Strip trailing non-alphanumeric chars from sanitized collection names
to satisfy Chroma's end-character constraint
- Guard backfill behind Chroma server readiness to avoid N spurious error
logs when Chroma failed to start
- Use CHROMA_SYNC log component consistently for backfill messages
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
* refactor: pass project as parameter to ensureBackfilled instead of mutating instance state
Eliminates shared mutable state in backfillAllProjects() loop. Project
scoping is now passed explicitly via parameter to both ensureBackfilled()
and getExistingChromaIds(), keeping a single Chroma connection while
avoiding fragile instance property mutation across iterations.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
---------
Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
Remove nuclear `bun pm cache rm` from smart-install.js and
sync-marketplace.cjs (only needed for removed sharp dependency).
Add `bun install` in cache version directory after sync so worker
can resolve dependencies. Move HuggingFace model cache to
~/.claude-mem/models/ so reinstalls don't corrupt it. Add self-healing
retry for Protobuf parsing failures.
Fixes recurring issues #1104, #1105, #1110.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
* fix: use bun install in sync, add node-addon-api for sharp, consolidate PendingMessageStore
- Switch sync-marketplace from npm to bun install
- Add node-addon-api as dev dep so sharp builds under bun
- Consolidate duplicate PendingMessageStore instantiation in worker-service finally block
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
* build assets
---------
Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
On Linux, Bun's libuv calls fstat() on inherited pipe file descriptors
and crashes with EINVAL when the pipe originates from Claude Code's hook
system. This causes all PostToolUse hooks to fail silently, preventing
observations from being recorded.
The fix reads stdin entirely in the Node.js parent process (bun-runner.js)
before spawning Bun, then writes the buffered data to a fresh pipe created
by Node's child_process.spawn(). Bun receives a standard pipe that it can
fstat() without errors.
Changes:
- Add collectStdin() to buffer piped input in Node.js with 5s safety timeout
- Change stdio from 'inherit' to ['pipe'|'ignore', 'inherit', 'inherit']
- Write buffered stdin to child.stdin then close for proper EOF signaling
- Handle edge cases: TTY stdin, no stdin, read errors
Fixes#646
Co-authored-by: yczc3999 <zxfgds@gmail.com>
Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
Address PR #1125 review feedback - both fetches now start simultaneously
via Promise.all instead of sequential-then-parallel.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Add systemMessage field to HookResult so SessionStart can display a
colored timeline directly to the user in the CLI. The handler now
parallel-fetches both markdown (for Claude context) and ANSI-colored
(for user display) timelines, appending a viewer URL link.
Also update default settings to hide verbose token columns (read/work
tokens, savings amount) and disable full observation expansion, keeping
the cleaner index-only view by default.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Wrap SDK query loop in try/finally so subprocess cleanup runs on error paths.
Swap Chroma binary check order to try project-level .bin first (common case).
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Chroma requires client-side embeddings — the server is storage only.
The previous commit incorrectly removed @chroma-core/default-embed.
Uses DefaultEmbeddingFunction({ wasm: true }) which forces the WASM
backend instead of native ONNX binaries. Same model (all-MiniLM-L6-v2),
same embeddings, but works on all platforms without segfaults or
ENOENT errors (#1104, #1105, #1110).
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- Strip CLAUDECODE env var from SDK subprocesses to prevent "cannot be
launched inside another Claude Code session" error (Claude Code 2.1.42+)
- Lazy-load @chroma-core/default-embed to avoid eagerly pulling in
sharp native binaries at bundle startup (fixes ERR_DLOPEN_FAILED)
- Add stderr capture to SDK spawn for diagnosing future process failures
- Exclude lockfiles from marketplace rsync and delete stale lockfiles
before npm install to prevent native dep version mismatches
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Replace hardcoded TEST-008 build ID with real package version. Add worker
filesystem path, uptime counter, and AI provider status (including last
interaction success/failure tracking) to the health endpoint response.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Missing return statement and closing brace in the programming errors
check caused a build failure after merging main.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Reduce timeouts to eliminate 10-30s startup delay when worker is dead
(common on WSL2 after hibernate). Add stale PID detection, graceful
error handling across all handlers, and error classification that
distinguishes worker unavailability from handler bugs.
- HEALTH_CHECK 30s→3s, new POST_SPAWN_WAIT (5s), PORT_IN_USE_WAIT (3s)
- isProcessAlive() with EPERM handling, cleanStalePidFile()
- getPluginVersion() try-catch for shutdown race (#1042)
- isWorkerUnavailableError: transport+5xx+429→exit 0, 4xx→exit 2
- No-op handler for unknown event types (#984)
- Wrap all handler fetch calls in try-catch for graceful degradation
- CLAUDE_MEM_HEALTH_TIMEOUT_MS env var override with validation
- Check CLAUDE_MEM_DATA_DIR env var before settings.json (Greptile)
- Derive project before DB check for consistent output (Greptile)
- Include project in error fallback output (Greptile)
- Set executable permission for shebang compatibility (Greptile)
Lightweight script for Claude Code statusLineCommand integration.
Returns per-project observation and prompt counts via direct SQLite
read (~15ms, no HTTP, no worker dependency).
Counts are scoped with WHERE project = ? to prevent inflated totals
from cross-project observations. Supports CLAUDE_MEM_DATA_DIR from
settings.json for custom data directory configurations.
Cherry-picked source changes from PR #657 (224 commits behind main).
Adds `claude-mem generate` and `claude-mem clean` CLI commands:
- New src/cli/claude-md-commands.ts with generateClaudeMd() and cleanClaudeMd()
- Worker service generate/clean case handlers with --dry-run support
- CLAUDE_MD logger component type
- Uses shared isDirectChild from path-utils.ts (DRY improvement over PR original)
Skipped from PR: 91 CLAUDE.md file deletions (stale), build artifacts,
.claude/plans/ dev artifact, smart-install.js shell alias auto-injection
(aggressive profile modification without consent).
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Bun 1.1.14+ is required because:
- `.changes` property on SQLite statements (added in 1.1.14)
- Multi-statement SQL support in db.run() (fixed in 1.0.26)
Older versions cause silent failures in database migrations.
Fixes#519🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Adds automatic detection and handling of Zscaler enterprise security certificates
on macOS. Combines standard certifi CA certificates with Zscaler certificates into
a single bundle, passed via SSL_CERT_FILE/REQUESTS_CA_BUNDLE/CURL_CA_BUNDLE env vars
to the chroma-mcp subprocess. Certificate bundle is cached for 24 hours. Falls back
gracefully when Zscaler is not present, with no impact on non-Zscaler environments.
Co-Authored-By: RClark4958 <rickdclark48@gmail.com>
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Gemini and OpenRouter are stateless APIs that never return session IDs.
Without synthetic IDs, PR #693's defensive memorySessionId checks throw
errors on every observation processing call for these providers.
Generates provider-prefixed IDs (gemini-/openrouter-{contentSessionId}-
{timestamp}) before the first API call, persisted to the database via
updateMemorySessionId(). Applied from PR #615 (closed due to staleness).
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Add restart limit (max 3 consecutive restarts) with exponential backoff
to prevent infinite generator restart loops. Also add defensive
memorySessionId checks in GeminiAgent and OpenRouterAgent before
expensive LLM calls to fail fast when session ID hasn't been captured.
Based on PR #693 by @ajbmachon (applied to current main).
Co-Authored-By: Andre Machon <ajbmachon2@gmail.com>
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Folder CLAUDE.md generation is now gated behind the CLAUDE_MEM_FOLDER_CLAUDEMD_ENABLED
setting (default: false). The setting is exposed via the settings API and handles both
string and boolean JSON values.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Add exclusion list for directories where CLAUDE.md generation breaks
toolchains: res/ (Android aapt2), .git/, build/, node_modules/,
__pycache__/. Closes issue #912. Credit to @jayvenn21.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Add hasConsecutiveDuplicateSegments() check to isValidPathForClaudeMd() to reject paths
like frontend/frontend/ or src/src/ that occur when cwd already includes the directory name.
3 new tests added (46 total for claude-md-utils). Fixes#814. Credit to @Glucksberg.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Prevents "file modified since read" errors when Claude Code is actively editing
a CLAUDE.md file by detecting CLAUDE.md paths in observation file lists and
skipping those folders during updates. Closes#859. Credit: @cheapsteak.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
The startup cleanupOrphanedProcesses() only targeted chroma-mcp, leaving
orphaned mcp-server.cjs and worker-service.cjs processes undetected after
daemon crashes. Expanded to target all claude-mem process types with
30-minute age filtering and current PID exclusion. Closes PR #687 (which
had a spawnDaemon regression removing Windows WMIC support).
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
On Windows, when the username contains spaces (e.g., "Tafari Higgs"),
bun-runner.js fails with: Syntax Error at C:\Users\Tafari:1:2
This happens because `shell: IS_WINDOWS` (true on Windows) causes
spawn() to route through cmd.exe, which misinterprets the path at
the first space in the username directory.
Removing `shell: true` on Windows fixes the issue since spawn()
can invoke the Bun binary directly without needing a shell wrapper.
Added `windowsHide: true` to prevent a visible console window from
appearing, matching the previous hidden behavior under cmd.exe.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
PR #896 identified a valid XSS concern in TerminalPreview.tsx but was
broken (missing DOMPurify import and dependency). The existing
escapeXML:true on AnsiToHtml already mitigates the vector, but
DOMPurify adds defense-in-depth sanitization.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>