readGeminiSettings() throws on corrupt JSON since ae6915b, but
checkGeminiCliHooksStatus() called it without catching — violating
its "returns 0 always" contract.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- cpSync now does rmSync before copy to avoid stale file merges
- setupIDEs() returns failed IDE list; install reports partial success
- runSmartInstall() returns boolean status instead of void
- Worker port in next-steps URL reads CLAUDE_MEM_WORKER_PORT env var
- Goose YAML regex stops at column-0 keys (prevents eating sibling sections)
- AGENTS.md uninstall removes header-only stub files
- findBunPath() validated before use in WindsurfHooksInstaller
- Cursor marked unsupported in ide-detection until installer is wired
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- Add shebang banner to NPX CLI esbuild config so npx claude-mem works
- Remove manual backslash pre-escaping in WindsurfHooksInstaller (JSON.stringify handles it)
- Scope cache deletion to claude-mem only, not entire vendor namespace
- Use getWorkerPort() in OpenCodeInstaller instead of hard-coded 37777
- Throw on corrupt JSON in readJsonSafe/readGeminiSettings/Windsurf to prevent data loss
- Fix Cursor install stub to warn instead of silently succeeding
- Fix Gemini uninstall to remove individual hooks within groups, not whole groups
- Update tests for new corrupt-file-throws behavior
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
SessionEnd has a 1.5s hardcoded cap from Claude Code (CLAUDE_CODE_SESSIONEND_HOOKS_TIMEOUT_MS),
making it unsuitable for waiting on async work. Previously, the Stop hook would fire-and-forget
the summarize request, then SessionEnd would immediately call deleteSession — aborting the SDK
agent mid-summary.
Now the Stop hook (120s timeout, no cap) owns the full lifecycle:
1. Queue summarize request
2. Poll new GET /api/sessions/status endpoint until queue drains
3. Call /api/sessions/complete after summary finishes
SessionEnd is now a true fire-and-forget fallback (process.exit(0) immediately).
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Resolve merge conflicts in adapter index, gemini-cli adapter, and rebuilt CJS artifacts.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Fixes CursorHooksInstaller ESM compatibility, updates install command
with improved path resolution, and refreshes built plugin artifacts.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
The MCP server (#!/usr/bin/env node) and context generator run under
Node.js, where import.meta.url throws SyntaxError in CJS mode. Only
the worker-service needs the banner since it runs under Bun.
CJS files under Node.js already have __dirname/__filename natively.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- Add sessionId to summarize.ts warning log for easier triage
- Add APPROVED OVERRIDE annotation to Windows spawn catch block
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
1. Fix esbuild inlining build-machine __dirname as string literal — use
CJS-compatible runtime banner with require("node:url").fileURLToPath
across worker-service, mcp-server, and context-generator builds.
2. Fix isMainModule check missing .cjs extension and Windows backslash
path normalization.
3. Wrap extractLastMessage in try-catch to prevent infinite Stop hook
feedback loop on malformed transcripts (exit 0 instead of exit 2).
4. Replace heavy SessionEnd hook (Node→Bun→1.7MB CJS→HTTP) with
lightweight inline node -e one-liner (~200ms vs >1s).
5. Add 7 Gemini/OpenRouter error patterns to unrecoverablePatterns
circuit breaker to prevent 77K+ retry loops on expired API keys.
6. Preserve CLAUDE_CODE_OAUTH_TOKEN and CLAUDE_CODE_GIT_BASH_PATH in
sanitizeEnv instead of stripping them with the CLAUDE_CODE_ prefix.
7. Use PowerShell -EncodedCommand for spawnDaemon to fix path quoting
when Windows usernames contain spaces.
Closes#1515, #1495, #1475, #1465, #1500, #1513, #1512, #1450, #1460,
#1486, #1449, #1481, #1451, #1480, #1453, #1445
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Removes installer/ directory (16 files) — fully replaced by src/npx-cli/.
Updates install.sh and installer.js to redirect to npx claude-mem.
Adds npx claude-mem as primary install method in docs and README.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Adds esbuild steps for npx-cli (57KB, Node.js ESM) and openclaw plugin
(12KB). Creates .npmignore to exclude node_modules and Bun binary from
npm package, reducing pack size from 146MB to 2MB.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Replaces the old git-clone installer with a direct npm package copy workflow.
Supports 13 IDE auto-detection targets, runtime delegation to Bun worker,
and pure Node.js install path (no Bun required for installation).
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
* fix: stop spinner from spinning forever due to orphaned DB messages
The activity spinner never stopped because isAnySessionProcessing() queried
ALL pending/processing messages in the database, including orphaned messages
from dead sessions that no generator would ever process.
Root cause: isAnySessionProcessing() used hasAnyPendingWork() which is a
global DB scan. Changed it to use getTotalQueueDepth() which only checks
sessions in the active in-memory Map.
Additional fixes:
- Add terminateSession() to enforce restart-or-terminate invariant
- Fix 3 zombie paths in .finally() handler that left sessions alive
- Clean up idle sessions from memory on successful completion
- Remove redundant bare isProcessing:true broadcast
- Replace inline require() with proper accessor
- Add 8 regression tests for session termination invariant
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* fix: address review findings — idle-timeout race, double broadcast, query amplification
- Move pendingCount check before idle-timeout termination to prevent
abandoning fresh messages that arrive between idle abort and .finally()
- Move broadcastProcessingStatus() inside restart branch only — the else
branch already broadcasts via removeSessionImmediate callback
- Compute queueDepth once in broadcastProcessingStatus() and derive
isProcessing from it, eliminating redundant double iteration
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
---------
Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* feat: strip <system_instruction> tags before database storage
Extends the existing tag-stripping mechanism (used for <private> and
<claude-mem-context>) to also filter Conductor-injected system instructions,
preventing them from being persisted in the observation database.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* feat: also strip <system-instruction> (hyphen variant) before DB storage
Conductor uses both <system_instruction> and <system-instruction> tag
formats. This adds the hyphen variant to the same stripping mechanism.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
---------
Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* fix(openclaw): inject context via system prompt instead of overwriting MEMORY.md
The OpenClaw plugin was overwriting each agent's MEMORY.md with a large
auto-generated observation dump (~12-15KB) on every before_agent_start
and tool_result_persist event. This conflicts with OpenClaw's design
where MEMORY.md is agent-curated long-term memory.
Migrate context injection from file-based (writeFile MEMORY.md) to
OpenClaw's native before_prompt_build hook, which returns context via
appendSystemContext. This keeps MEMORY.md under agent control while
still providing cross-session observation context to the LLM.
Changes:
- Add before_prompt_build hook that returns { appendSystemContext }
- Remove writeFile/MEMORY.md sync from before_agent_start
- Remove MEMORY.md sync from tool_result_persist (observations still recorded)
- Add 60s TTL cache to avoid re-fetching context on every LLM turn
- Add syncMemoryFileExclude config for per-agent opt-out
- Remove dead workspaceDirsBySessionKey tracking map
- Rewrite test suite to verify prompt injection instead of file writes
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* fix(ui): align settings defaults with backend and use nullish coalescing
The web UI had two issues causing settings inflation:
1. DEFAULT_SETTINGS in the UI used FULL_COUNT='5' and all token columns
'true', while SettingsDefaultsManager (backend) uses FULL_COUNT='0'
and token columns 'false'. Opening the settings modal and saving
without changes would silently inflate the context.
2. useSettings used || for fallback, which treats '0' and 'false' as
falsy — even when the backend correctly returns these values, the UI
would replace them with inflated defaults. Changed to ?? (nullish
coalescing) so only null/undefined trigger the fallback.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* docs(openclaw): update integration docs for system prompt injection
Reflect the migration from MEMORY.md file writes to before_prompt_build
hook-based context injection:
- Update architecture diagram and overview to show new hook flow
- Replace "MEMORY.md Live Sync" section with "System Prompt Context Injection"
- Update event lifecycle steps (before_agent_start, tool_result_persist)
- Add before_prompt_build step with TTL cache description
- Document new syncMemoryFileExclude config parameter
- Update session tracking to reflect removed workspaceDirsBySessionKey
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* docs: fix terminology and update SKILL.md for system prompt injection
Replace "prompt injection" with "context injection" in docs to avoid
confusion with the OWASP security term. Update openclaw/SKILL.md to
reflect the new before_prompt_build hook and remove stale MEMORY.md
references.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
---------
Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Co-authored-by: Alex Newman <thedotmack@gmail.com>
* feat: add embedded Process Supervisor for unified process lifecycle management
Consolidates scattered process management (ProcessManager, GracefulShutdown,
HealthMonitor, ProcessRegistry) into a unified src/supervisor/ module.
New: ProcessRegistry with JSON persistence, env sanitizer (strips CLAUDECODE_*
vars), graceful shutdown cascade (SIGTERM → 5s wait → SIGKILL with tree-kill
on Windows), PID file liveness validation, and singleton Supervisor API.
Fixes#1352 (worker inherits CLAUDECODE env causing nested sessions)
Fixes#1356 (zombie TCP socket after Windows reboot)
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* feat: add session-scoped process reaping to supervisor
Adds reapSession(sessionId) to ProcessRegistry for killing session-tagged
processes on session end. SessionManager.deleteSession() now triggers reaping.
Tightens orphan reaper interval from 60s to 30s.
Fixes#1351 (MCP server processes leak on session end)
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* feat: add Unix domain socket support for worker communication
Introduces socket-manager.ts for UDS-based worker communication, eliminating
port 37777 collisions between concurrent sessions. Worker listens on
~/.claude-mem/sockets/worker.sock by default with TCP fallback.
All hook handlers, MCP server, health checks, and admin commands updated to
use socket-aware workerHttpRequest(). Backwards compatible — settings can
force TCP mode via CLAUDE_MEM_WORKER_TRANSPORT=tcp.
Fixes#1346 (port 37777 collision across concurrent sessions)
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* fix: remove in-process worker fallback from hook command
Removes the fallback path where hook scripts started WorkerService in-process,
making the worker a grandchild of Claude Code (killed by sandbox). Hooks now
always delegate to ensureWorkerStarted() which spawns a fully detached daemon.
Fixes#1249 (grandchild process killed by sandbox)
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* feat: add health checker and /api/admin/doctor endpoint
Adds 30-second periodic health sweep that prunes dead processes from the
supervisor registry and cleans stale socket files. Adds /api/admin/doctor
endpoint exposing supervisor state, process liveness, and environment health.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* test: add comprehensive supervisor test suite
64 tests covering all supervisor modules: process registry (18 tests),
env sanitizer (8), shutdown cascade (10), socket manager (15), health
checker (5), and supervisor API (6). Includes persistence, isolation,
edge cases, and cross-module integration scenarios.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* fix: revert Unix domain socket transport, restore TCP on port 37777
The socket-manager introduced UDS as default transport, but this broke
the HTTP server's TCP accessibility (viewer UI, curl, external monitoring).
Since there's only ever one worker process handling all sessions, the
port collision rationale for UDS doesn't apply. Reverts to TCP-only,
removing ~900 lines of unnecessary complexity.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* chore: remove dead code found in pre-landing review
Remove unused `acceptingSpawns` field from Supervisor class (written but
never read — assertCanSpawn uses stopPromise instead) and unused
`buildWorkerUrl` import from context handler.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* updated gitignore
* fix: address PR review feedback - downgrade HTTP logging, clean up gitignore, harden supervisor
- Downgrade request/response HTTP logging from info to debug to reduce noise
- Remove unused getWorkerPort imports, use buildWorkerUrl helper
- Export ENV_PREFIXES/ENV_EXACT_MATCHES from env-sanitizer, reuse in Server.ts
- Fix isPidAlive(0) returning true (should be false)
- Add shutdownInitiated flag to prevent signal handler race condition
- Make validateWorkerPidFile testable with pidFilePath option
- Remove unused dataDir from ShutdownCascadeOptions
- Upgrade reapSession log from debug to warn
- Rename zombiePidFiles to deadProcessPids (returns actual PIDs)
- Clean up gitignore: remove duplicate datasets/, stale ~*/ and http*/ patterns
- Fix tests to use temp directories instead of relying on real PID file
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
---------
Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* fix: always pass --ssl flag to chroma-mcp in remote mode
The chroma-mcp CLI defaults to SSL when using --client-type http.
When CLAUDE_MEM_CHROMA_SSL is false (the common case for local
ChromaDB servers), buildCommandArgs() omitted --ssl entirely,
causing chroma-mcp to attempt an SSL connection to a plain HTTP
server and fail with "Could not connect to a Chroma server".
Always pass --ssl with an explicit true/false value so the user's
CLAUDE_MEM_CHROMA_SSL setting is faithfully forwarded.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
* test: add regression tests for ChromaMcpManager SSL flag fix
Adds 4 focused test cases verifying buildCommandArgs() produces correct
--ssl args, covering SSL=false, SSL=true, unset (defaults to false), and
local mode (no --ssl flag). Requested by @xkonjin in PR #1286 review.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
* fix: rebuild checked-in bundles to include SSL flag fix
Rebuild all bundles against upstream/main so the --ssl <true|false>
fix is present in the runtime artifacts that hooks and the marketplace
plugin actually execute.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
---------
Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
The pending-work-restart logic had no retry limit, causing infinite loops
when sessions encountered FOREIGN KEY constraint failures. This led to
2000+ error log entries per minute and eventual worker crash via SIGTERM.
Two fixes:
1. Add 'FOREIGN KEY constraint failed' to unrecoverable error patterns
so it short-circuits immediately instead of falling through to restart
2. Add MAX_PENDING_RESTARTS (3) limit to pending-work-restart path as a
safety net for any future unhandled persistent errors
Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
The SessionStart hook was incorrectly split into two separate matchers
with the same pattern "startup|clear|compact", causing them to run
in parallel per Claude Code's hook execution model. This resulted in
a race condition where both hooks tried to bind to port 37777
simultaneously, causing "port in use" errors on first startup.
This fix consolidates all SessionStart commands into a single
matcher, ensuring they execute sequentially.
Fixes regression introduced in commit d93bde0.
Co-authored-by: yunshu <yunshu@moresec.cn>
Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
storeObservations() and storeObservationsAndMarkComplete() were missing
the content-hash deduplication that storeObservation() (singular) already
had via computeObservationContentHash() and findDuplicateObservation().
This caused the Gemini provider (and potentially others that return
multiple observations per response) to insert 2-10x duplicate rows per
tool use, since the batch methods inserted unconditionally without
checking content_hash.
The fix adds the same dedup pattern from storeObservation() to both
batch methods:
1. Compute content hash via computeObservationContentHash()
2. Check for existing observation within 30s window via findDuplicateObservation()
3. Skip insert and reuse existing ID if duplicate found
4. Include content_hash column in INSERT statement
Fixes#1158 (duplicate observations with Gemini provider)
Co-authored-by: Enzo Ricciulli <e.ricciulli@systhema.ai>
When a claude-mem DB is synced between machines running different versions,
orphaned indexes can reference non-existent columns (e.g. idx_observations_content_hash
referencing content_hash). This causes SQLite to throw "malformed database schema"
on ALL queries, including PRAGMAs, creating a silent 503 failure loop.
The fix detects this on startup, uses Python's sqlite3 module to drop the
orphaned schema objects (bun:sqlite doesn't support writable_schema modifications),
resets migration versions, and lets the idempotent migration system recreate
everything properly.
Fixes#1307
Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
When a project filter was selected in the Web UI, all SSE live data
(observations, summaries, prompts) was completely discarded. Only
paginated API data was shown, meaning new real-time events were
invisible until the user refreshed the page.
Fix: filter SSE data by project before merging with paginated data,
instead of discarding it entirely.
Fixes#1313
Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
When Claude Code runs in a worktree (via Agent tool with isolation: "worktree"),
the transcript path points to the worktree's project directory. After the
worktree is cleaned up, the Stop hook fires but the transcript file no longer
exists, causing extractLastMessage() to throw. This error triggers Claude to
respond, which fires another Stop hook, creating an infinite error loop.
Changed throws to warn-and-return-empty so the summarize hook exits cleanly
with exit 0 instead of cascading errors.
Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>