* fix: add session lifecycle guards to prevent runaway API spend (#1590)
Three root causes allowed 30+ subprocesses to accumulate over 36 hours:
1. SIGTERM-killed processes (code 143) triggered crash recovery and
were immediately respawned — now detected and treated as intentional
termination (aborts controller so wasAborted=true in .finally).
2. No wall-clock limit: sessions ran for 13+ hours continuously
spending tokens — now refuses new generators after 4 hours and
drains the pending queue to prevent further spawning.
3. Duplicate --resume processes for the same session UUID — now
killed and unregistered before a new spawn is registered.
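The wall-clock guard in (2) can be sketched as follows. All names here (`canSpawnGenerator`, `SessionLike`, `pendingQueue`) are illustrative assumptions, not the actual claude-mem internals; note the strict `>` comparison, so a session at exactly 4 hours is still allowed.

```typescript
// Illustrative sketch only; names are assumptions, not claude-mem internals.
const MAX_SESSION_AGE_MS = 4 * 60 * 60 * 1000; // 4-hour wall-clock budget

interface SessionLike {
  createdAt: number;       // persisted epoch ms from session start
  pendingQueue: unknown[]; // queued generator requests
}

// Strict ">": a session at exactly 4h is not terminated, only >4h is.
function canSpawnGenerator(session: SessionLike, now: number = Date.now()): boolean {
  if (now - session.createdAt > MAX_SESSION_AGE_MS) {
    session.pendingQueue.length = 0; // drain the queue to prevent further spawning
    return false;
  }
  return true;
}
```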
Generated by Claude Code
Vibe coded by ousamabenyounes
Co-Authored-By: Claude <noreply@anthropic.com>
* fix: use normalized errorMsg in logger.error payload and annotate SIGTERM override
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
* fix: use persisted createdAt for wall-clock guard and bind abortController locally to prevent stale abort
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
* chore: re-trigger CodeRabbit review after rate limit reset
* fix: defer process unregistration until exit and align boundary test with strict > (#1693)
- ProcessRegistry: don't unregister PID immediately after SIGTERM — let the
existing 'exit' handler clean up when the process actually exits, preventing
tracking loss for still-live processes.
- Test: align wall-clock boundary test with production's strict `>` operator
(exactly 4h is NOT terminated, only >4h is).
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
---------
Co-authored-by: Claude <noreply@anthropic.com>
Three root causes prevented Gemini sessions from persisting prompts,
observations, and summaries:
1. BeforeAgent was mapped to user-message (display-only) instead of
session-init (which initialises the session and starts the SDK agent).
2. The transcript parser expected Claude Code JSONL (type: "assistant")
but Gemini CLI 0.37.0 writes a JSON document with a messages array
where assistant entries carry type: "gemini". extractLastMessage now
detects the format and routes to the correct parser, preserving
full backward compatibility with Claude Code JSONL transcripts.
3. The summarize handler omitted platformSource from the
/api/sessions/summarize request body, causing sessions to be recorded
without the gemini-cli source tag.
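The format detection in (2) might look roughly like this; `detectTranscriptFormat` is a hypothetical helper name (the commit only names `extractLastMessage`):

```typescript
// Hypothetical helper illustrating the routing decision in extractLastMessage.
type TranscriptFormat = "claude-jsonl" | "gemini-json";

function detectTranscriptFormat(raw: string): TranscriptFormat {
  try {
    const doc = JSON.parse(raw);
    // Gemini CLI 0.37.0 writes a single JSON document with a messages array.
    if (doc && Array.isArray(doc.messages)) return "gemini-json";
  } catch {
    // Multi-line JSONL is not a single parseable JSON document.
  }
  // Default: Claude Code JSONL (one JSON object per line).
  return "claude-jsonl";
}
```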
Co-authored-by: Claude <noreply@anthropic.com>
* fix: replace hardcoded nvm/homebrew PATH with universal login shell resolution
Hook commands previously hardcoded PATH entries for nvm and homebrew,
causing `node: command not found` for users with other Node version
managers (mise, asdf, volta, fnm, Nix, etc.).
Replace with `$($SHELL -lc 'echo $PATH')` which inherits the user's
login shell PATH regardless of how Node was installed. Also adds the
missing PATH export to the PreToolUse hook (#1702).
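In TypeScript terms (the hooks themselves are shell), the resolution could be sketched like this; `resolveLoginShellPath` is an assumed name:

```typescript
import { execSync } from "node:child_process";

// Sketch: ask the user's login shell for its PATH, so node is found no matter
// which version manager (nvm, mise, asdf, volta, fnm, Nix, ...) installed it.
function resolveLoginShellPath(): string {
  const shell = process.env.SHELL || "/bin/sh";
  try {
    return execSync(`${shell} -lc 'echo $PATH'`, { encoding: "utf8" }).trim();
  } catch {
    return process.env.PATH ?? ""; // fall back to the inherited PATH
  }
}
```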
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
* fix: add cache-path fallback to PreToolUse hook
Aligns PreToolUse _R resolution with all other hooks by adding the
cache directory lookup before falling back to the marketplace path.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
---------
Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
* fix: use parent project name for worktree observation writes (#1819)
Observations and sessions from git worktrees were stored under
basename(cwd) instead of the parent repo name because write paths
called getProjectName() (not worktree-aware) instead of
getProjectContext() (worktree-aware). This is the same bug as
#1081, #1317, and #1500 — it regressed because the two functions
coexist and new code reached for the simpler one.
Fix: getProjectContext() now returns parentProjectName as primary
when in a worktree, and all four write-path call sites now use
getProjectContext().primary instead of getProjectName().
Includes regression test that creates a real worktree directory
structure and asserts primary === parentProjectName.
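Sketched in TypeScript, with the worktree detection collapsed into a `parentRepo` parameter for brevity (the real code inspects git metadata):

```typescript
import { basename } from "node:path";

interface ProjectContext {
  primary: string;       // canonical: the parent repo name when inside a worktree
  allProjects: string[];
}

// Simplified sketch: parentRepo stands in for real worktree detection.
function getProjectContext(cwd: string, parentRepo: string | null): ProjectContext {
  const cwdProjectName = basename(cwd);
  const primary = parentRepo ?? cwdProjectName;
  // Dedupe via Set in case parent and cwd resolve to the same name.
  return { primary, allProjects: [...new Set([primary, cwdProjectName])] };
}
```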
* fix: address review nitpicks — allProjects fallback, JSDoc, write-path test
- ContextBuilder: default projects to context.allProjects for legacy
worktree-labeled record compatibility
- ProjectContext: clarify JSDoc that primary is canonical (parent repo
in worktrees)
- Tests: add write-path regression test mirroring session-init/SessionRoutes
pattern; refactor worktree fixture into beforeAll/afterAll
* refactor(project-name): rename local to cwdProjectName and dedupe allProjects
Addresses final CodeRabbit nitpick: disambiguates the local variable
from the returned `primary` field, and dedupes allProjects via Set
in case parent and cwd resolve to the same name.
---------
Co-authored-by: Ethan Hurst <ethan.hurst@outlook.com.au>
Bun's child_process.spawn() silently drops empty string arguments from
argv, unlike Node which preserves them. When the Agent SDK defaults
settingSources to [] (empty array), [].join(",") produces "" which gets
pushed as ["--setting-sources", ""]. Bun drops the "", causing
--permission-mode to be consumed as the value for --setting-sources:
Error processing --setting-sources: Invalid setting source: --permission-mode
This caused 100% observation failure (exit code 1 on every SDK subprocess
spawn), resulting in 0 observations stored across all sessions.
The fix filters empty string args before passing to spawn(), making the
behavior consistent between Node and Bun runtimes.
Fixes #1779
Related: #1660
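The argv-building side of the fix can be sketched like this (the function shape is an assumption; the commit only specifies that empty-string args no longer reach spawn()):

```typescript
// Sketch: skip any flag whose value is the empty string, so the subprocess
// argv parses identically under Node (which keeps "") and Bun (which drops it).
function buildArgs(flags: Record<string, string>): string[] {
  const args: string[] = [];
  for (const [flag, value] of Object.entries(flags)) {
    if (value === "") continue; // e.g. settingSources: [].join(",") === ""
    args.push(flag, value);
  }
  return args;
}
```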
Co-authored-by: bswnth48 <69203760+bswnth48@users.noreply.github.com>
Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* fix(file-context): preserve targeted reads + invalidate on mtime (#1719)
The PreToolUse:Read hook unconditionally rewrote tool input to
{file_path, limit:1}, which interacted with two failure modes:
1. Subagent edits a file → parent's next Read still gets truncated
because the observation snapshot predates the change.
2. Claude requests a different section with offset/limit → the hook
strips them, so the Claude Code harness's read-dedup cache returns
"File unchanged" against the prior 1-line read. The file becomes
unreadable for the rest of the conversation, even though the hook's
own recovery hint says "Read again with offset/limit for the
section you need."
Two complementary fixes:
- **mtime invalidation**: stat the file (we already stat for the size
gate) and compare mtimeMs to the newest observation's created_at_epoch.
If the file is newer, pass the read through unchanged so fresh content
reaches Claude.
- **Targeted-read pass-through**: when toolInput already specifies
offset and/or limit, preserve them in updatedInput instead of
collapsing to {limit:1}. The harness's dedup cache then sees a
distinct input and lets the read proceed.
The unconstrained-read path (no offset, no limit) is unchanged: still
truncated to 1 line plus the observation timeline, so token economics
are preserved for the common case.
Tests cover all three branches: existing truncation, targeted-read
pass-through (offset+limit, limit-only), and mtime-driven bypass.
Fixes #1719
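The three branches might be sketched as a single decision function (names and input shapes assumed):

```typescript
// Illustrative shapes; the real hook reads these from the PreToolUse payload.
interface ReadInput { file_path: string; offset?: number; limit?: number }

function decideReadRewrite(
  input: ReadInput,
  fileMtimeMs: number,             // 0 is the "no stat available" sentinel
  newestObservationEpochMs: number
): ReadInput {
  // mtime invalidation: >= handles the same-millisecond edge case.
  if (fileMtimeMs >= newestObservationEpochMs) return input;
  // Targeted read: preserve caller-specified offset/limit (with bounds guards)
  // so the harness's dedup cache sees a distinct input.
  const hasOffset = typeof input.offset === "number" && Number.isFinite(input.offset) && input.offset >= 0;
  const hasLimit = typeof input.limit === "number" && Number.isFinite(input.limit) && input.limit > 0;
  if (hasOffset || hasLimit) return input;
  // Unconstrained read: truncate to 1 line; the observation timeline fills in.
  return { file_path: input.file_path, limit: 1 };
}
```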
* refactor(file-context): address review findings on #1719 fix
- Add offset-only test case for full targeted-read branch coverage
- Use >= for mtime comparison to handle same-millisecond edge case
- Add Number.isFinite() + bounds guards on offset/limit pass-through
- Trim over-verbose comments to concise single-line summaries
- Remove redundant `as number` casts after typeof narrowing
- Add comment explaining fileMtimeMs=0 sentinel invariant
* fix: exclude primary key index from unique constraint check in migration 7
PRAGMA index_list returns all indexes including those backing PRIMARY KEY
columns (origin='pk'), which always have unique=1. The check was therefore
always true, causing migration 7 to run the full DROP/CREATE/RENAME table
sequence on every worker startup instead of short-circuiting once the
UNIQUE constraint had already been removed.
Fix: filter to non-PK indexes by requiring idx.origin !== 'pk'. The
origin field is already present on the existing IndexInfo interface.
Fixes #1749
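The corrected check, sketched against the IndexInfo shape the commit mentions:

```typescript
// origin: 'c' (CREATE INDEX), 'u' (UNIQUE constraint), 'pk' (PRIMARY KEY backing index)
interface IndexInfo { name: string; unique: number; origin: string }

// A UNIQUE constraint still exists only if some non-PK index is unique;
// the PK's backing index always reports unique=1 and must be excluded.
function hasUserUniqueIndex(indexes: IndexInfo[]): boolean {
  return indexes.some((idx) => idx.unique === 1 && idx.origin !== "pk");
}
```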
* fix: apply pk-origin guard to all three migration code paths
CodeRabbit correctly identified that the origin !== 'pk' fix was only
applied to MigrationRunner.ts but not to the two other active code paths
that run the same removeSessionSummariesUniqueConstraint logic:
- SessionStore.ts:220 — used by DatabaseManager and worker-service
- plugin/scripts/context-generator.cjs — bundled artifact (minified)
All three paths now consistently exclude primary-key indexes when
detecting unique constraints on session_summaries.
* fix: restrict .env file permissions to owner-only (0600)
API keys stored in ~/.claude-mem/.env were created without explicit
permissions, defaulting to umask-dependent mode. On systems with a
permissive umask (e.g. 0022), the file would be world-readable.
- Set directory permissions to 0700 on creation
- Set file permissions to 0600 via writeFileSync mode option
- Call chmodSync after write to fix permissions on pre-existing files
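A self-contained sketch of the hardening steps (POSIX only; the function name is an assumption):

```typescript
import { chmodSync, mkdirSync, statSync, writeFileSync } from "node:fs";
import { dirname } from "node:path";

// Create dir and file with restrictive modes, then chmod to override the
// umask and to fix permissions left over from earlier installs.
function writeSecretFile(filePath: string, contents: string): void {
  mkdirSync(dirname(filePath), { recursive: true, mode: 0o700 });
  writeFileSync(filePath, contents, { mode: 0o600 });
  chmodSync(filePath, 0o600);          // covers pre-existing files
  chmodSync(dirname(filePath), 0o700); // covers pre-existing directories
}

// Read back the permission bits (lower 9 bits of st_mode).
function permBits(path: string): number {
  return statSync(path).mode & 0o777;
}
```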
Signed-off-by: Jochen Meyer
* fix: also restrict pre-existing directory permissions to 0700
The initial fix only set directory mode on creation. Pre-existing
~/.claude-mem/ directories from earlier installs remained world-readable.
Add chmodSync for the directory alongside the existing file chmod,
and document the Windows limitation (ACLs, not POSIX permissions).
---------
Signed-off-by: Jochen Meyer
syncAndBroadcastSummary was using the raw ParsedSummary (null when salvaged)
instead of summaryForStore for the SSE broadcast, causing a crash when the
LLM returns <observation> without <summary> tags. Also removes misplaced
tree-sitter docs from mem-search/SKILL.md (belongs in smart-explore).
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
The three SessionStart hooks poll http://localhost:37777/health with an
8-second retry budget and then exit 1 silently (via `curl -sf || exit 1`)
when the worker has not stabilized in time. Claude Code surfaces this as
two `SessionStart:startup hook error - Failed with non-blocking status
code: No stderr output` messages on every first session of the day.
This happens because of a race between the MCP server auto-starting the
worker and the hook's own `worker-service.cjs start` path, which on Linux
respects a live PID file written by an earlier session and waits for the
existing worker to become healthy. 8s is not always enough.
Mitigate in the hook layer without touching worker-service.cjs:
- bump the inner retry loop from 8 to 20 attempts (up to ~20s)
- replace `|| exit 1` with `|| true` so the hook emits its continue JSON
regardless (Claude Code no longer logs a phantom error)
- guard the context-injection hook with a health check - only run
`worker-service.cjs hook claude-code context` if the worker is actually
responding, otherwise skip silently
Trade-off: on cold starts where the worker takes longer than ~20s to
come up, the recent-context preamble is skipped for that first session
instead of surfacing an error. Subsequent sessions in the same day work
normally because the worker is already healthy.
Refs #1447
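The retry budget, sketched in TypeScript rather than the hooks' actual shell-and-curl form:

```typescript
// Poll the worker health endpoint; return false (rather than exiting 1) so
// the hook can emit its continue JSON and skip context injection silently.
async function waitForWorker(
  url = "http://localhost:37777/health",
  attempts = 20,  // bumped from 8, for a roughly 20s budget
  delayMs = 1000
): Promise<boolean> {
  for (let i = 0; i < attempts; i++) {
    try {
      const res = await fetch(url, { signal: AbortSignal.timeout(1000) });
      if (res.ok) return true;
    } catch {
      // worker not up yet; keep polling
    }
    await new Promise((r) => setTimeout(r, delayMs));
  }
  return false;
}
```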
glob@11.x is deprecated by its maintainer and flagged as containing
widely publicised security vulnerabilities (including ReDoS risks).
The latest stable version is glob@13.0.6.
Compatibility verified: the codebase uses only `globSync` with
`{ nodir, absolute }` options in two files:
- src/services/transcripts/watcher.ts
- scripts/analyze-transformations-smart.js
The `globSync` function signature and these options are identical
in glob@13. No call-site changes are required.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Fixes Issue #1312: AI sometimes returns <observation> XML tags instead of
<summary> tags during the summarize phase, despite clear instructions in
buildSummaryPrompt() requiring <summary> ONLY output.
When this occurs, parseSummary() returns null and the entire session summary
is lost. This fix detects the condition (summary missing + observations
present) and synthesizes a summary from the observation data, ensuring
session summaries are not completely lost.
The salvage mapping:
- request: observation title
- investigated: observation narrative or facts
- learned: observation facts joined
- completed: title if type is feature/bugfix
- notes: indicates this is a synthetic salvage summary
Observations are stored normally regardless of this fallback.
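The salvage mapping above, sketched with assumed field shapes:

```typescript
interface Observation { type: string; title: string; narrative?: string; facts: string[] }
interface SalvagedSummary {
  request: string;
  investigated: string;
  learned: string;
  completed: string | null;
  notes: string;
}

// Synthesize a summary from an observation when <summary> tags are missing.
function salvageSummary(obs: Observation): SalvagedSummary {
  return {
    request: obs.title,
    investigated: obs.narrative ?? obs.facts.join("; "),
    learned: obs.facts.join("; "),
    completed: obs.type === "feature" || obs.type === "bugfix" ? obs.title : null,
    notes: "Synthetic summary salvaged from observation data.",
  };
}
```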
Co-authored-by: Sisyphus <sisyphus@openclaw>
GET /api/corpus returned a bare array, which the MCP server wrapper
(callWorkerAPI) forwards directly. MCP's tools/call validation rejects
non-object results with "expected object, received array", so the
list_corpora MCP tool was completely unusable.
Every other corpus endpoint is a POST that already returns the
{content:[...]} shape, so this is a targeted one-file fix.
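The shape fix amounts to wrapping the bare array in the same {content:[...]} envelope the POST endpoints already use (exact item shape assumed here):

```typescript
// Wrap a bare array in an MCP-compatible object result.
function toMcpResult(corpora: unknown[]): { content: { type: "text"; text: string }[] } {
  return { content: [{ type: "text", text: JSON.stringify(corpora) }] };
}
```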
Stop hook polled queueLength===0 as a proxy for summary success, but the queue
empties regardless of whether the LLM produced valid <summary> tags. Added
lastSummaryStored tracking on ActiveSession, surfaced via the /api/sessions/status
endpoint, and emit a logger.warn in the Stop hook when summaryStored===false.
Generated by Claude Code
Vibe coded by ousamabenyounes
Co-Authored-By: Claude <noreply@anthropic.com>
- Add Custom Grammars (.claude-mem.json) section explaining how to register
additional tree-sitter parsers for unsupported file extensions
- Add Markdown Special Support section documenting heading-based outline,
code-fence search, section unfold, and frontmatter extraction behaviors
- Expand bundled language test to cover all 10 documented languages plus
the plain-text fallback sentence to prevent partial doc regressions
Co-Authored-By: Claude <noreply@anthropic.com>
- Add missing context-generator.cjs to the SHEBANG_SCRIPTS list so CRLF
regressions in that script are caught by the test suite
- Replace silent early-returns with expect(existsSync(filePath)).toBe(true)
so the suite fails loudly when expected build artifacts are absent
Co-Authored-By: Claude <noreply@anthropic.com>
When the MCP server and SessionStart hook both spawn a worker daemon
concurrently, one loses the bind race (EADDRINUSE / Bun's port-in-use
error). The loser now checks if the winner is healthy; if so, it logs
INFO and exits cleanly instead of logging a misleading ERROR on every
first session start.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
chroma-mcp uses pydantic-settings, which auto-reads .env/.env.local from
the CWD. When the project directory contains non-chroma variables (e.g.
CELERY_TASK_ALWAYS_EAGER), pydantic rejects them with "Extra inputs are
not permitted", crashing the subprocess and triggering a permanent
backoff loop. Passing cwd: os.homedir() to StdioClientTransport ensures
pydantic never reads project env files.

Generated by Claude Code
Vibe coded by ousamabenyounes
Co-Authored-By: Claude <noreply@anthropic.com>
Without .gitattributes, building on Windows produces plugin scripts with
CRLF line endings. The CRLF on the shebang line causes
"env: node\r: No such file or directory" on macOS/Linux, breaking the
MCP server and all hook scripts. Add text=auto eol=lf as the global
default plus explicit eol=lf rules for plugin/scripts/*.cjs and *.js.
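A sketch of the resulting rules as described (comments added for illustration):

```gitattributes
# Normalize line endings repo-wide; check out LF everywhere.
* text=auto eol=lf

# Shebang scripts must never receive CRLF.
plugin/scripts/*.cjs text eol=lf
plugin/scripts/*.js text eol=lf
```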
Generated by Claude Code
Vibe coded by ousamabenyounes
Co-Authored-By: Claude <noreply@anthropic.com>
Node 22+ emits DEP0190 when spawnSync is called with a separate args
array and shell:true, because the args are only concatenated (not
escaped). Split the findBun() PATH check into platform-specific calls:
Windows uses spawnSync('where bun', { shell: true }) as a single string,
Unix uses spawnSync('which', ['bun']) with no shell option.
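Generalized slightly (the command name is parameterized for illustration), the platform split looks like:

```typescript
import { spawnSync } from "node:child_process";

// Avoid DEP0190 by never combining an args array with shell:true.
function findOnPath(cmd: string): string | null {
  const result = process.platform === "win32"
    ? spawnSync(`where ${cmd}`, { shell: true, encoding: "utf8" }) // single string
    : spawnSync("which", [cmd], { encoding: "utf8" });             // no shell
  if (result.status !== 0 || !result.stdout) return null;
  return result.stdout.trim().split(/\r?\n/)[0] || null;
}
```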
Generated by Claude Code
Vibe coded by ousamabenyounes
Co-Authored-By: Claude <noreply@anthropic.com>
When the LLM is overwhelmed by a large context, it can emit bare
<observation/> blocks (or ones containing only <type>). These are
stored as rows where title, narrative, facts and concepts are all
null/empty, appearing as meaningless "Untitled" entries in the context
window. Add a guard in parseObservations() that skips any observation
where every content field is null/empty before pushing it to the
result array.
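The guard, sketched with assumed field names:

```typescript
interface ParsedObservation {
  type?: string | null;
  title?: string | null;
  narrative?: string | null;
  facts?: string[];
  concepts?: string[];
}

// True only when at least one content field carries data; a bare
// <observation/> (or one holding only <type>) fails this check and is skipped.
function hasContent(obs: ParsedObservation): boolean {
  return Boolean(
    obs.title?.trim() ||
    obs.narrative?.trim() ||
    (obs.facts && obs.facts.length > 0) ||
    (obs.concepts && obs.concepts.length > 0)
  );
}
```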
Generated by Claude Code
Vibe coded by ousamabenyounes
Co-Authored-By: Claude <noreply@anthropic.com>
tree-sitter language docs belonged in smart-explore but were absent;
this adds the Bundled Languages table (10 languages) with correct placement.
Generated by Claude Code
Vibe coded by ousamabenyounes
Co-Authored-By: Claude <noreply@anthropic.com>
Top-level mock.module() in context-reinjection-guard.test.ts permanently stubbed
getProjectName() to 'test-project' for the entire Bun worker process, causing
tests in other files to receive the wrong value. Removed the unnecessary mock
(session-init tests don't assert on project name), added bunfig.toml smol=true
for worker isolation, and added a regression test.
Generated by Claude Code
Vibe coded by ousamabenyounes
Co-Authored-By: Claude <noreply@anthropic.com>
The mcp-server.cjs script requires bun:sqlite, a Bun-specific built-in
that is unavailable in Node.js. When Claude Code spawns the script using
the shebang (#!/usr/bin/env node), the import fails with:
Error: Cannot find module 'bun:sqlite'
Fix: explicitly invoke bun as the command and pass the script as an arg,
so the correct runtime is used regardless of the shebang line.
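A sketch of the spawn-side change (option details assumed):

```typescript
import { spawn, type ChildProcess } from "node:child_process";

// Invoke bun explicitly so bun:sqlite resolves even though the script's
// shebang says node.
function spawnMcpServer(scriptPath: string): ChildProcess {
  return spawn("bun", [scriptPath], { stdio: "inherit" });
}
```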
* feat: add knowledge agent types, store, builder, and renderer
Phase 1 of Knowledge Agents feature. Introduces corpus compilation
pipeline that filters observations from the database into portable
corpus files stored at ~/.claude-mem/corpora/.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* feat: add corpus CRUD HTTP endpoints and wire into worker service
Phase 2 of Knowledge Agents. Adds CorpusRoutes with 5 endpoints
(build, list, get, delete, rebuild) and registers them during
worker background initialization alongside SearchRoutes.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* feat: add KnowledgeAgent with V1 SDK prime/query/reprime
Phase 3 of Knowledge Agents. Uses Agent SDK V1 query() with
resume and disallowedTools for Q&A-only knowledge sessions.
Auto-reprimes on session expiry. Adds prime, query, and reprime
HTTP endpoints to CorpusRoutes.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* feat: add MCP tools and skill for knowledge agents
Phase 4 of Knowledge Agents. Adds build_corpus, list_corpora,
prime_corpus, and query_corpus MCP tools delegating to worker
HTTP endpoints. Includes /knowledge-agent skill with workflow docs.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* fix: handle SDK process exit in KnowledgeAgent, add e2e test
The Agent SDK may throw after yielding all messages when the
Claude process exits with a non-zero code. Now tolerates this
if session_id/answer were already captured. Adds comprehensive
e2e test script (31 assertions) orchestrated via tmux-cli.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* fix: use settings model ID instead of hardcoded model in KnowledgeAgent
Reads CLAUDE_MEM_MODEL from user settings via getModelId(), matching
the existing SDKAgent pattern. No more hardcoded model assumptions.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* feat: improve knowledge agents developer experience
Add public documentation page, rebuild/reprime MCP tools, and actionable
error messages. DX review scored knowledge agents 4/10 — core engineering
works (31/31 e2e) but the feature was invisible. This addresses
discoverability (docs, cross-links), API completeness (missing MCP tools),
and error quality (fix/example fields in error responses).
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* docs: add quick start guide to knowledge agents page
Covers the three main use cases upfront: creating an agent, asking a
single question, and starting a fresh conversation with reprime. Includes
keeping-it-current section for rebuild + reprime workflow.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* fix: address code review issues — path traversal, session safety, prompt injection
- Block path traversal in CorpusStore with alphanumeric name validation and resolved path check
- Harden system prompt against instruction injection from untrusted corpus content
- Validate question field as non-empty string in query endpoint
- Only persist session_id after successful prime (not null on failure)
- Persist refreshed session_id after query execution
- Only auto-reprime on session resume errors, not all query failures
- Add fenced code block language tags to SKILL.md
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* fix: address remaining code review issues — e2e robustness, MCP validation, docs
- Harden e2e curl wrappers with connect-timeout, fallback to HTTP 000 on transport failure
- Use curl_post wrapper consistently for all long-running POST calls
- Add runtime name validation to all corpus MCP tool handlers
- Fix docs: soften hallucination guarantee to probabilistic claim
- Fix architecture diagram: add missing rebuild_corpus and reprime_corpus tools
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* fix: enforce string[] type in safeParseJsonArray for corpus data integrity
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* fix: add blank line before fenced code blocks in SKILL.md maintenance section
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
---------
Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* feat: add custom OpenRouter base URL support
Allow users to configure a custom base URL for OpenRouter API calls
through settings UI and environment management.
Generated with AI
Co-Authored-By: AI Partner
* refactor: remove OpenRouter base URL customization, keep Claude URL changes
Only retain ANTHROPIC_BASE_URL and ANTHROPIC_AUTH_TOKEN support in
EnvManager for custom Claude API endpoint configuration.
Generated with AI
Co-Authored-By: AI Partner
* chore: revert build artifacts to match main
Generated with AI
Co-Authored-By: AI Partner
* fix: remove ANTHROPIC_AUTH_TOKEN, add ANTHROPIC_BASE_URL persistence
- Remove unnecessary ANTHROPIC_AUTH_TOKEN (inherited from parent process)
- Add ANTHROPIC_BASE_URL to saveClaudeMemEnv() to fix config persistence
- Keep only ANTHROPIC_BASE_URL support for custom API endpoint
Generated with AI
Co-Authored-By: AI Partner
Imported observations were invisible to the MCP search tool because the
FTS5 content table was not reliably updated during bulk import. The import
handler now calls rebuildObservationsFTSIndex() after inserting new
observations, ensuring the full-text search index is consistent.
A new SessionStore.rebuildObservationsFTSIndex() method encapsulates the
FTS5 rebuild command and is a no-op when the observations_fts table does
not exist (e.g. FTS5 unavailable on Windows).
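The helper's behavior, sketched against a minimal DB interface (bun:sqlite's actual API differs in details):

```typescript
// Minimal stand-in for the bun:sqlite handle used in SessionStore.
interface Db {
  query(sql: string): { all(): unknown[] };
  run(sql: string): void;
}

function rebuildObservationsFTSIndex(db: Db): void {
  const ftsTable = db.query(
    "SELECT name FROM sqlite_master WHERE type='table' AND name='observations_fts'"
  ).all();
  if (ftsTable.length === 0) return; // no-op when FTS5 is unavailable
  // FTS5's 'rebuild' command repopulates the index from the content table.
  db.run("INSERT INTO observations_fts(observations_fts) VALUES('rebuild')");
}
```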