Revert of #1820 behavior. Each worktree now gets its own bucket:
- In a worktree, primary = `parent/worktree` (e.g. `claude-mem/dar-es-salaam`)
- In a main repo, primary = basename (unchanged)
- allProjects is always `[primary]` — strict isolation at query time
Includes a one-off maintenance script (scripts/worktree-remap.ts) that
retroactively reattributes past sessions to their worktree using path
signals in observations and user prompts. Two-rule inference keeps the
remap high-confidence:
1. The worktree basename in the path matches the session's current
plain project name (pre-#1820 era; trusted).
2. Or all worktree path signals converge on a single (parent, worktree)
across the session.
Ambiguous sessions are skipped.
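The bucket rule above can be sketched as a small pure function. Names here are illustrative, not the actual claude-mem implementation:

```typescript
interface ProjectContext {
  primary: string;
  allProjects: string[];
}

// deriveBucket: given the repo basename and, when inside a worktree,
// the parent repo's basename, produce the session's storage bucket.
function deriveBucket(cwdBasename: string, parentBasename?: string): ProjectContext {
  // In a worktree, primary = "parent/worktree"; in a main repo, the plain basename.
  const primary = parentBasename ? `${parentBasename}/${cwdBasename}` : cwdBasename;
  // Strict isolation: queries only ever see the session's own bucket.
  return { primary, allProjects: [primary] };
}
```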
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Observation capture was failing 100% of the time on Claude Code 2.1.109+ because the Agent
SDK emits ["--setting-sources", ""] when settingSources defaults to [].
The existing Bun-workaround filter stripped the empty string but left
the orphan --setting-sources flag, which then consumed --permission-mode
as its value, crashing the subprocess with:
Error processing --setting-sources:
Invalid setting source: --permission-mode.
Make the filter pair-aware: when an empty arg follows a --flag, drop
both so the SDK default (no setting sources) is preserved by omission.
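A minimal sketch of a pair-aware filter (not the exact claude-mem code): when an empty string immediately follows a `--flag`, both are dropped so the flag is omitted entirely rather than left to consume the next argument.

```typescript
function filterEmptyArgPairs(args: string[]): string[] {
  const out: string[] = [];
  for (let i = 0; i < args.length; i++) {
    if (args[i].startsWith("--") && args[i + 1] === "") {
      i++; // skip both the flag and its empty value
      continue;
    }
    if (args[i] === "") continue; // stray empty args are still stripped
    out.push(args[i]);
  }
  return out;
}
```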
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
A prior Claude instance snuck in a `$CMEM` token branding header
during a context compression refactor (7e072106). This reverts to
the original descriptive format: `# [project] recent context, datetime`
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
The synthetic summary salvage feature created fake summaries from observation
data when the AI returned <observation> instead of <summary> tags. This was
overengineered — missing a summary is preferable to fabricating one from
observation fields that don't map cleanly to summary semantics.
Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* fix: reap stuck generators in reapStaleSessions (fixes #1652)
Sessions whose SDK subprocess hung would stay in the active sessions
map forever because `reapStaleSessions()` unconditionally skipped any
session with a non-null `generatorPromise`. The generator was blocked
on `for await (const msg of queryResult)` inside SDKAgent and could
never unblock itself — the idle-timeout only fires when the generator
is in `waitForMessage()`, and the orphan reaper skips processes whose
session is still in the map.
Add `MAX_GENERATOR_IDLE_MS` (5 min). When `reapStaleSessions()` sees
a session whose `generatorPromise` is set but `lastGeneratorActivity`
has not advanced in over 5 minutes, it now:
1. SIGKILLs the tracked subprocess to unblock the stuck `for await`
2. Calls `session.abortController.abort()` so the generator loop exits
3. Calls `deleteSession()` which waits up to 30 s for the generator to
finish, then cleans up supervisor-tracked children
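The detection step can be sketched as a pure predicate. Field and constant names mirror the commit text, but the real SessionManager differs:

```typescript
const MAX_GENERATOR_IDLE_MS = 5 * 60 * 1000; // 5 minutes

interface SessionLike {
  generatorPromise: Promise<void> | null;
  lastGeneratorActivity: number; // epoch ms
}

// A session is a reap candidate when its generator is running but has not
// reported activity within the idle window (strict >, so exactly-at-threshold
// sessions are not yet reaped).
function isStaleGenerator(session: SessionLike, now: number): boolean {
  return (
    session.generatorPromise !== null &&
    now - session.lastGeneratorActivity > MAX_GENERATOR_IDLE_MS
  );
}
```

On a hit, the reaper runs the three steps above in order so the blocked `for await` unblocks before cleanup waits on the generator.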
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
* fix: freeze time in stale-generator test and import constants from production source
- Export MAX_GENERATOR_IDLE_MS, MAX_SESSION_IDLE_MS, StaleGeneratorCandidate,
StaleGeneratorProcess, and detectStaleGenerator from SessionManager.ts so
tests no longer duplicate production constants or detection logic.
- Use setSystemTime() from bun:test to freeze Date.now() in the
"exactly at threshold" test, eliminating the flaky double-Date.now() race.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
---------
Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
* fix: add circuit breaker to OpenClaw worker client (#1636)
When the claude-mem worker is unreachable, every plugin event (before_agent_start,
before_prompt_build, tool_result_persist, agent_end) triggered a new fetch that
failed and logged a warning, causing CPU-spinning and continuous log spam.
Add a CLOSED/OPEN/HALF_OPEN circuit breaker: after 3 consecutive network errors
the circuit opens, silently drops all worker calls for 30 s, then sends one probe.
Individual failures are only logged while the circuit is still CLOSED; once open
it logs once ("disabling requests for 30s") and goes quiet until recovery.
Generated by Claude Code
Vibe coded by Ousama Ben Younes
Co-Authored-By: Claude <noreply@anthropic.com>
* fix: limit HALF_OPEN to single probe and move circuitOnSuccess after response.ok check
- Add _halfOpenProbeInFlight flag so only one probe is allowed in HALF_OPEN state;
concurrent callers are silently dropped until the probe completes (success or failure)
- Move circuitOnSuccess() to after the response.ok check in workerPost, workerPostFireAndForget,
and workerGetText so non-2xx HTTP responses no longer close the circuit
- Clear _halfOpenProbeInFlight in both circuitOnSuccess and circuitOnFailure, and in circuitReset
- Add regression test covering HALF_OPEN one-probe behavior: non-2xx keeps circuit open,
2xx closes it
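The breaker described across these two commits can be sketched as follows; class and field names are assumptions, not the actual OpenClaw client code:

```typescript
type CircuitState = "CLOSED" | "OPEN" | "HALF_OPEN";

class CircuitBreaker {
  private state: CircuitState = "CLOSED";
  private failures = 0;
  private openedAt = 0;
  private halfOpenProbeInFlight = false;

  constructor(
    private readonly maxFailures = 3,
    private readonly cooldownMs = 30_000,
  ) {}

  // Returns true when a call may proceed; in HALF_OPEN only one probe passes.
  allowRequest(now: number): boolean {
    if (this.state === "OPEN" && now - this.openedAt >= this.cooldownMs) {
      this.state = "HALF_OPEN";
    }
    if (this.state === "OPEN") return false;
    if (this.state === "HALF_OPEN") {
      if (this.halfOpenProbeInFlight) return false; // concurrent callers dropped
      this.halfOpenProbeInFlight = true;
      return true;
    }
    return true; // CLOSED
  }

  // Call only after response.ok, so non-2xx responses never close the circuit.
  onSuccess(): void {
    this.state = "CLOSED";
    this.failures = 0;
    this.halfOpenProbeInFlight = false;
  }

  onFailure(now: number): void {
    this.halfOpenProbeInFlight = false;
    if (this.state === "HALF_OPEN" || ++this.failures >= this.maxFailures) {
      this.state = "OPEN";
      this.openedAt = now;
      this.failures = 0;
    }
  }
}
```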
* chore: trigger CodeRabbit re-review
---------
Co-authored-by: Claude <noreply@anthropic.com>
* fix: resolve Setup hook broken reference and warn on macOS-only binary (#1547)
On Linux ARM64, the plugin silently failed because:
1. The Setup hook called setup.sh which was removed; the hook exited 127
(file not found), causing the plugin to appear uninstalled.
2. The committed plugin/scripts/claude-mem binary is macOS arm64 only;
no warning was shown when it could not execute on other platforms.
Fix the Setup hook to call smart-install.js (the current setup mechanism)
and add checkBinaryPlatformCompatibility() to smart-install.js, which reads
the Mach-O magic bytes from the bundled binary and warns users on non-macOS
platforms that the JS fallback (bun-runner.js + worker-service.cjs) is active.
Generated by Claude Code
Vibe coded by ousamabenyounes
Co-Authored-By: Claude <noreply@anthropic.com>
* fix: close fd in finally block, strengthen smart-install tests to use production function
- Wrap openSync/readSync in checkBinaryPlatformCompatibility with a finally block so the file descriptor is always closed even if readSync throws
- Export checkBinaryPlatformCompatibility with an optional binaryPath param for testability
- Refactor Mach-O detection tests to call the production function directly, mocking process.platform and passing controlled binary paths, eliminating duplicated inline logic
- Strengthen plugin-distribution test to assert at least one command hook exists before checking for smart-install.js, preventing vacuous pass
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
---------
Co-authored-by: Claude <noreply@anthropic.com>
* fix: add session lifecycle guards to prevent runaway API spend (#1590)
Three root causes allowed 30+ subprocesses to accumulate over 36 hours:
1. SIGTERM-killed processes (code 143) triggered crash recovery and
immediately respawned — now detected and treated as intentional
termination (aborts controller so wasAborted=true in .finally).
2. No wall-clock limit: sessions ran for 13+ hours continuously
spending tokens — now refuses new generators after 4 hours and
drains the pending queue to prevent further spawning.
3. Duplicate --resume processes for the same session UUID — now
killed and unregistered before a new spawn is registered.
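Guard #2 can be sketched as a pure check using the persisted createdAt; the strict `>` means exactly 4 hours is still allowed, only strictly older sessions are refused. Names are illustrative:

```typescript
const MAX_SESSION_WALL_CLOCK_MS = 4 * 60 * 60 * 1000; // 4 hours

// Returns false once the session's wall-clock age exceeds the limit, at which
// point the caller refuses new generators and drains the pending queue.
function mayStartGenerator(createdAtEpochMs: number, now: number): boolean {
  return now - createdAtEpochMs <= MAX_SESSION_WALL_CLOCK_MS;
}
```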
Generated by Claude Code
Vibe coded by ousamabenyounes
Co-Authored-By: Claude <noreply@anthropic.com>
* fix: use normalized errorMsg in logger.error payload and annotate SIGTERM override
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
* fix: use persisted createdAt for wall-clock guard and bind abortController locally to prevent stale abort
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
* chore: re-trigger CodeRabbit review after rate limit reset
* fix: defer process unregistration until exit and align boundary test with strict > (#1693)
- ProcessRegistry: don't unregister PID immediately after SIGTERM — let the
existing 'exit' handler clean up when the process actually exits, preventing
tracking loss for still-live processes.
- Test: align wall-clock boundary test with production's strict `>` operator
(exactly 4h is NOT terminated, only >4h is).
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
---------
Co-authored-by: Claude <noreply@anthropic.com>
Three root causes prevented Gemini sessions from persisting prompts,
observations, and summaries:
1. BeforeAgent was mapped to user-message (display-only) instead of
session-init (which initialises the session and starts the SDK agent).
2. The transcript parser expected Claude Code JSONL (type: "assistant")
but Gemini CLI 0.37.0 writes a JSON document with a messages array
where assistant entries carry type: "gemini". extractLastMessage now
detects the format and routes to the correct parser, preserving
full backward compatibility with Claude Code JSONL transcripts.
3. The summarize handler omitted platformSource from the
/api/sessions/summarize request body, causing sessions to be recorded
without the gemini-cli source tag.
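The format sniff for fix #2 can be sketched like this: Claude Code transcripts are JSONL (one object per line), while Gemini CLI 0.37.0 writes a single JSON document with a `messages` array whose assistant entries use type "gemini". Function and field names here are assumptions, not the real parser:

```typescript
function detectTranscriptFormat(raw: string): "claude-jsonl" | "gemini-json" {
  try {
    const doc = JSON.parse(raw);
    if (doc && typeof doc === "object" && Array.isArray(doc.messages)) {
      return "gemini-json";
    }
  } catch {
    // Whole-file JSON.parse fails on multi-line JSONL; fall through.
  }
  return "claude-jsonl";
}

function extractLastGeminiMessage(raw: string): string | null {
  const doc = JSON.parse(raw);
  const assistant = doc.messages.filter((m: { type: string }) => m.type === "gemini");
  return assistant.length > 0 ? assistant[assistant.length - 1].content : null;
}
```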
Co-authored-by: Claude <noreply@anthropic.com>
* fix: replace hardcoded nvm/homebrew PATH with universal login shell resolution
Hook commands previously hardcoded PATH entries for nvm and homebrew,
causing `node: command not found` for users with other Node version
managers (mise, asdf, volta, fnm, Nix, etc.).
Replace with `$($SHELL -lc 'echo $PATH')` which inherits the user's
login shell PATH regardless of how Node was installed. Also adds the
missing PATH export to the PreToolUse hook (#1702).
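The resolution technique can be shown in script form (the actual fix lives in the shell hook commands themselves); the `/bin/sh` fallback for an unset `$SHELL` is an assumption of this sketch:

```typescript
import { execSync } from "node:child_process";

function resolveLoginShellPath(): string {
  const shell = process.env.SHELL || "/bin/sh"; // fallback is an assumption
  // `-l` starts a login shell so nvm/mise/asdf/volta init files run and
  // populate PATH; `-c` then just echoes the result.
  return execSync(`${shell} -lc 'echo $PATH'`, { encoding: "utf8" }).trim();
}
```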
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
* fix: add cache-path fallback to PreToolUse hook
Aligns PreToolUse _R resolution with all other hooks by adding the
cache directory lookup before falling back to the marketplace path.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
---------
Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
* fix: use parent project name for worktree observation writes (#1819)
Observations and sessions from git worktrees were stored under
basename(cwd) instead of the parent repo name because write paths
called getProjectName() (not worktree-aware) instead of
getProjectContext() (worktree-aware). This is the same bug as
#1081, #1317, and #1500 — it regressed because the two functions
coexist and new code reached for the simpler one.
Fix: getProjectContext() now returns parentProjectName as primary
when in a worktree, and all four write-path call sites now use
getProjectContext().primary instead of getProjectName().
Includes regression test that creates a real worktree directory
structure and asserts primary === parentProjectName.
* fix: address review nitpicks — allProjects fallback, JSDoc, write-path test
- ContextBuilder: default projects to context.allProjects for legacy
worktree-labeled record compatibility
- ProjectContext: clarify JSDoc that primary is canonical (parent repo
in worktrees)
- Tests: add write-path regression test mirroring session-init/SessionRoutes
pattern; refactor worktree fixture into beforeAll/afterAll
* refactor(project-name): rename local to cwdProjectName and dedupe allProjects
Addresses final CodeRabbit nitpick: disambiguates the local variable
from the returned `primary` field, and dedupes allProjects via Set
in case parent and cwd resolve to the same name.
---------
Co-authored-by: Ethan Hurst <ethan.hurst@outlook.com.au>
Bun's child_process.spawn() silently drops empty string arguments from
argv, unlike Node which preserves them. When the Agent SDK defaults
settingSources to [] (empty array), [].join(",") produces "" which gets
pushed as ["--setting-sources", ""]. Bun drops the "", causing
--permission-mode to be consumed as the value for --setting-sources:
Error processing --setting-sources: Invalid setting source: --permission-mode
This caused 100% observation failure (exit code 1 on every SDK subprocess
spawn), resulting in 0 observations stored across all sessions.
The fix filters empty string args before passing to spawn(), making the
behavior consistent between Node and Bun runtimes.
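A minimal sketch of the fix this commit describes (the real filter lives where the SDK subprocess argv is assembled): strip empty strings before spawn so Node and Bun produce the same argv.

```typescript
function stripEmptyArgs(args: string[]): string[] {
  return args.filter((arg) => arg !== "");
}
```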
Fixes #1779
Related: #1660
Co-authored-by: bswnth48 <69203760+bswnth48@users.noreply.github.com>
Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* fix(file-context): preserve targeted reads + invalidate on mtime (#1719)
The PreToolUse:Read hook unconditionally rewrote tool input to
{file_path, limit:1}, which interacted with two failure modes:
1. Subagent edits a file → parent's next Read still gets truncated
because the observation snapshot predates the change.
2. Claude requests a different section with offset/limit → the hook
strips them, so the Claude Code harness's read-dedup cache returns
"File unchanged" against the prior 1-line read. The file becomes
unreadable for the rest of the conversation, even though the hook's
own recovery hint says "Read again with offset/limit for the
section you need."
Two complementary fixes:
- **mtime invalidation**: stat the file (we already stat for the size
gate) and compare mtimeMs to the newest observation's created_at_epoch.
If the file is newer, pass the read through unchanged so fresh content
reaches Claude.
- **Targeted-read pass-through**: when toolInput already specifies
offset and/or limit, preserve them in updatedInput instead of
collapsing to {limit:1}. The harness's dedup cache then sees a
distinct input and lets the read proceed.
The unconstrained-read path (no offset, no limit) is unchanged: still
truncated to 1 line plus the observation timeline, so token economics
are preserved for the common case.
Tests cover all three branches: existing truncation, targeted-read
pass-through (offset+limit, limit-only), and mtime-driven bypass.
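The three branches can be sketched as one pure decision function; the input shape matches Claude Code's Read tool, but the helper and its parameters are illustrative, not the actual hook code:

```typescript
interface ReadInput {
  file_path: string;
  offset?: number;
  limit?: number;
}

function rewriteReadInput(
  input: ReadInput,
  fileMtimeMs: number,
  newestObservationEpochMs: number,
): ReadInput {
  // mtime invalidation: file changed after the snapshot, pass through.
  // `>=` covers edits landing in the same millisecond as the observation.
  if (fileMtimeMs >= newestObservationEpochMs) return input;
  // Targeted read: preserve offset/limit so the harness dedup cache sees a
  // distinct input and lets the read proceed.
  if (Number.isFinite(input.offset) || Number.isFinite(input.limit)) return input;
  // Unconstrained read: truncate to 1 line; the observation timeline carries
  // the rest, preserving the token economics of the common case.
  return { file_path: input.file_path, limit: 1 };
}
```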
Fixes #1719
* refactor(file-context): address review findings on #1719 fix
- Add offset-only test case for full targeted-read branch coverage
- Use >= for mtime comparison to handle same-millisecond edge case
- Add Number.isFinite() + bounds guards on offset/limit pass-through
- Trim over-verbose comments to concise single-line summaries
- Remove redundant `as number` casts after typeof narrowing
- Add comment explaining fileMtimeMs=0 sentinel invariant
* fix: exclude primary key index from unique constraint check in migration 7
PRAGMA index_list returns all indexes including those backing PRIMARY KEY
columns (origin='pk'), which always have unique=1. The check was therefore
always true, causing migration 7 to run the full DROP/CREATE/RENAME table
sequence on every worker startup instead of short-circuiting once the
UNIQUE constraint had already been removed.
Fix: filter to non-PK indexes by requiring idx.origin !== 'pk'. The
origin field is already present on the existing IndexInfo interface.
Fixes #1749
* fix: apply pk-origin guard to all three migration code paths
CodeRabbit correctly identified that the origin !== 'pk' fix was only
applied to MigrationRunner.ts but not to the two other active code paths
that run the same removeSessionSummariesUniqueConstraint logic:
- SessionStore.ts:220 — used by DatabaseManager and worker-service
- plugin/scripts/context-generator.cjs — bundled artifact (minified)
All three paths now consistently exclude primary-key indexes when
detecting unique constraints on session_summaries.
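The guard shared by these paths can be sketched as follows; the IndexInfo shape mirrors SQLite's documented index_list columns, the helper name is illustrative:

```typescript
interface IndexInfo {
  name: string;
  unique: number; // 1 when the index enforces uniqueness
  origin: string; // "c" = CREATE INDEX, "u" = UNIQUE constraint, "pk" = PRIMARY KEY
}

// Only a non-PK unique index should trigger the DROP/CREATE/RENAME migration;
// PK-backing indexes always report unique=1, so without the origin filter the
// check is vacuously true on every startup.
function hasNonPkUniqueIndex(indexes: IndexInfo[]): boolean {
  return indexes.some((idx) => idx.unique === 1 && idx.origin !== "pk");
}
```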
* fix: restrict .env file permissions to owner-only (0600)
API keys stored in ~/.claude-mem/.env were created without explicit
permissions, defaulting to umask-dependent mode. On systems with a
permissive umask (e.g. 0022), the file would be world-readable.
- Set directory permissions to 0700 on creation
- Set file permissions to 0600 via writeFileSync mode option
- Call chmodSync after write to fix permissions on pre-existing files
Signed-off-by: Jochen Meyer
* fix: also restrict pre-existing directory permissions to 0700
The initial fix only set directory mode on creation. Pre-existing
~/.claude-mem/ directories from earlier installs remained world-readable.
Add chmodSync for the directory alongside the existing file chmod,
and document the Windows limitation (ACLs, not POSIX permissions).
---------
Signed-off-by: Jochen Meyer
syncAndBroadcastSummary was using the raw ParsedSummary (null when salvaged)
instead of summaryForStore for the SSE broadcast, causing a crash when the
LLM returns <observation> without <summary> tags. Also removes misplaced
tree-sitter docs from mem-search/SKILL.md (belongs in smart-explore).
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
The three SessionStart hooks poll http://localhost:37777/health with an
8-second retry budget and then exit 1 silently (via `curl -sf || exit 1`)
when the worker has not stabilized in time. Claude Code surfaces this as
two `SessionStart:startup hook error - Failed with non-blocking status
code: No stderr output` messages on every first session of the day.
This happens because of a race between the MCP server auto-starting the
worker and the hook's own `worker-service.cjs start` path, which on Linux
respects a live PID file written by an earlier session and waits for the
existing worker to become healthy. 8s is not always enough.
Mitigate in the hook layer without touching worker-service.cjs:
- bump the inner retry loop from 8 to 20 attempts (up to ~20s)
- replace `|| exit 1` with `|| true` so the hook emits its continue JSON
regardless (Claude Code no longer logs a phantom error)
- guard the context-injection hook with a health check: only run
`worker-service.cjs hook claude-code context` if the worker is actually
responding; otherwise skip silently
Trade-off: on cold starts where the worker takes longer than ~20s to
come up, the recent-context preamble is skipped for that first session
instead of surfacing an error. Subsequent sessions in the same day work
normally because the worker is already healthy.
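The bounded health wait can be expressed in script form (the real change is in the shell hooks; the URL and timings come from the commit text, the function is illustrative). A false result means the caller skips context injection instead of surfacing an error:

```typescript
async function waitForWorkerHealthy(attempts = 20, delayMs = 1000): Promise<boolean> {
  for (let i = 0; i < attempts; i++) {
    try {
      const res = await fetch("http://localhost:37777/health");
      if (res.ok) return true;
    } catch {
      // Worker not up yet; retry after the delay.
    }
    await new Promise((resolve) => setTimeout(resolve, delayMs));
  }
  return false;
}
```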
Refs #1447
glob@11.x is deprecated by its maintainer and flagged as containing
widely-publicised security vulnerabilities (including ReDoS risks).
The latest stable version is glob@13.0.6.
Compatibility verified: the codebase uses only `globSync` with
`{ nodir, absolute }` options in two files:
- src/services/transcripts/watcher.ts
- scripts/analyze-transformations-smart.js
The `globSync` function signature and these options are identical
in glob@13. No call-site changes are required.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Fixes Issue #1312: AI sometimes returns <observation> XML tags instead of
<summary> tags during the summarize phase, despite clear instructions in
buildSummaryPrompt() requiring <summary> ONLY output.
When this occurs, parseSummary() returns null and the entire session summary
is lost. This fix detects the condition (summary missing + observations
present) and synthesizes a summary from the observation data, ensuring
session summaries are not completely lost.
The salvage mapping:
- request: observation title
- investigated: observation narrative or facts
- learned: observation facts joined
- completed: title if type is feature/bugfix
- notes: indicates this is a synthetic salvage summary
Observations are stored normally regardless of this fallback.
Co-authored-by: Sisyphus <sisyphus@openclaw>