Commit Graph

660 Commits

Author SHA1 Message Date
Ben Younes 05232ff091 fix: reap stuck generators in reapStaleSessions (fixes #1652) (#1698)
* fix: reap stuck generators in reapStaleSessions (fixes #1652)

Sessions whose SDK subprocess hung would stay in the active sessions
map forever because `reapStaleSessions()` unconditionally skipped any
session with a non-null `generatorPromise`.  The generator was blocked
on `for await (const msg of queryResult)` inside SDKAgent and could
never unblock itself — the idle-timeout only fires when the generator
is in `waitForMessage()`, and the orphan reaper skips processes whose
session is still in the map.

Add `MAX_GENERATOR_IDLE_MS` (5 min).  When `reapStaleSessions()` sees
a session whose `generatorPromise` is set but `lastGeneratorActivity`
has not advanced in over 5 minutes, it now:
1. SIGKILLs the tracked subprocess to unblock the stuck `for await`
2. Calls `session.abortController.abort()` so the generator loop exits
3. Calls `deleteSession()` which waits up to 30 s for the generator to
   finish, then cleans up supervisor-tracked children

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* fix: freeze time in stale-generator test and import constants from production source

- Export MAX_GENERATOR_IDLE_MS, MAX_SESSION_IDLE_MS, StaleGeneratorCandidate,
  StaleGeneratorProcess, and detectStaleGenerator from SessionManager.ts so
  tests no longer duplicate production constants or detection logic.
- Use setSystemTime() from bun:test to freeze Date.now() in the
  "exactly at threshold" test, eliminating the flaky double-Date.now() race.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

---------

Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-15 00:58:35 -07:00
Ben Younes f97c50bfb9 fix: session lifecycle guards to prevent runaway API spend (#1590) (#1693)
* fix: add session lifecycle guards to prevent runaway API spend (#1590)

Three root causes allowed 30+ subprocess accumulation over 36 hours:
1. SIGTERM-killed processes (code 143) triggered crash recovery and
   immediately respawned — now detected and treated as intentional
   termination (aborts controller so wasAborted=true in .finally).
2. No wall-clock limit: sessions ran for 13+ hours continuously
   spending tokens — now refuses new generators after 4 hours and
   drains the pending queue to prevent further spawning.
3. Duplicate --resume processes for the same session UUID — now
   killed and unregistered before a new spawn is registered.

Generated by Claude Code
Vibe coded by ousamabenyounes

Co-Authored-By: Claude <noreply@anthropic.com>

* fix: use normalized errorMsg in logger.error payload and annotate SIGTERM override

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* fix: use persisted createdAt for wall-clock guard and bind abortController locally to prevent stale abort

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* chore: re-trigger CodeRabbit review after rate limit reset

* fix: defer process unregistration until exit and align boundary test with strict > (#1693)

- ProcessRegistry: don't unregister PID immediately after SIGTERM — let the
  existing 'exit' handler clean up when the process actually exits, preventing
  tracking loss for still-live processes.
- Test: align wall-clock boundary test with production's strict `>` operator
  (exactly 4h is NOT terminated, only >4h is).

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: Claude <noreply@anthropic.com>
2026-04-15 00:58:23 -07:00
Ben Younes 983be42998 fix: resolve Gemini CLI 0.37.0 session capture failures (#1664) (#1692)
Three root causes prevented Gemini sessions from persisting prompts,
observations, and summaries:

1. BeforeAgent was mapped to user-message (display-only) instead of
   session-init (which initialises the session and starts the SDK agent).

2. The transcript parser expected Claude Code JSONL (type: "assistant")
   but Gemini CLI 0.37.0 writes a JSON document with a messages array
   where assistant entries carry type: "gemini". extractLastMessage now
   detects the format and routes to the correct parser, preserving
   full backward compatibility with Claude Code JSONL transcripts.

3. The summarize handler omitted platformSource from the
   /api/sessions/summarize request body, causing sessions to be recorded
   without the gemini-cli source tag.

Co-authored-by: Claude <noreply@anthropic.com>
2026-04-15 00:58:20 -07:00
Ethan 16a0737dfc fix: use parent project name for worktree observation writes (#1820)
* fix: use parent project name for worktree observation writes (#1819)

Observations and sessions from git worktrees were stored under
basename(cwd) instead of the parent repo name because write paths
called getProjectName() (not worktree-aware) instead of
getProjectContext() (worktree-aware). This is the same bug as
#1081, #1317, and #1500 — it regressed because the two functions
coexist and new code reached for the simpler one.

Fix: getProjectContext() now returns parentProjectName as primary
when in a worktree, and all four write-path call sites now use
getProjectContext().primary instead of getProjectName().

Includes regression test that creates a real worktree directory
structure and asserts primary === parentProjectName.

* fix: address review nitpicks — allProjects fallback, JSDoc, write-path test

- ContextBuilder: default projects to context.allProjects for legacy
  worktree-labeled record compatibility
- ProjectContext: clarify JSDoc that primary is canonical (parent repo
  in worktrees)
- Tests: add write-path regression test mirroring session-init/SessionRoutes
  pattern; refactor worktree fixture into beforeAll/afterAll

* refactor(project-name): rename local to cwdProjectName and dedupe allProjects

Addresses final CodeRabbit nitpick: disambiguates the local variable
from the returned `primary` field, and dedupes allProjects via Set
in case parent and cwd resolve to the same name.

---------

Co-authored-by: Ethan Hurst <ethan.hurst@outlook.com.au>
2026-04-15 00:58:14 -07:00
biswanath-cmd 3d92684e04 fix: filter empty string args before Bun spawn() to prevent CLI parsing errors (#1781)
Bun's child_process.spawn() silently drops empty string arguments from
argv, unlike Node which preserves them. When the Agent SDK defaults
settingSources to [] (empty array), [].join(",") produces "" which gets
pushed as ["--setting-sources", ""]. Bun drops the "", causing
--permission-mode to be consumed as the value for --setting-sources:

  Error processing --setting-sources: Invalid setting source: --permission-mode

This caused 100% observation failure (exit code 1 on every SDK subprocess
spawn), resulting in 0 observations stored across all sessions.

The fix filters empty string args before passing to spawn(), making the
behavior consistent between Node and Bun runtimes.

Fixes #1779
Related: #1660

Co-authored-by: bswnth48 <69203760+bswnth48@users.noreply.github.com>
Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-15 00:58:11 -07:00
enma998 471e1f62f9 Fix npx search and default Codex context to workspace-local AGENTS (#1780)
* Fix npx search query parameter mismatch

* Use workspace-local Codex AGENTS context by default

---------

Co-authored-by: bnb <bnb>
2026-04-15 00:58:08 -07:00
suyua9 eeb6841033 fix: coerce corpus route filters (#1776)
* fix: coerce corpus route filters

* test: cover unsupported corpus type filters
2026-04-15 00:58:01 -07:00
Tran Quang 2a2008bac2 fix(file-context): preserve targeted reads + invalidate on mtime (#1719) (#1729)
* fix(file-context): preserve targeted reads + invalidate on mtime (#1719)

The PreToolUse:Read hook unconditionally rewrote tool input to
{file_path, limit:1}, which interacted with two failure modes:

1. Subagent edits a file → parent's next Read still gets truncated
   because the observation snapshot predates the change.
2. Claude requests a different section with offset/limit → the hook
   strips them, so the Claude Code harness's read-dedup cache returns
   "File unchanged" against the prior 1-line read. The file becomes
   unreadable for the rest of the conversation, even though the hook's
   own recovery hint says "Read again with offset/limit for the
   section you need."

Two complementary fixes:

- **mtime invalidation**: stat the file (we already stat for the size
  gate) and compare mtimeMs to the newest observation's created_at_epoch.
  If the file is newer, pass the read through unchanged so fresh content
  reaches Claude.

- **Targeted-read pass-through**: when toolInput already specifies
  offset and/or limit, preserve them in updatedInput instead of
  collapsing to {limit:1}. The harness's dedup cache then sees a
  distinct input and lets the read proceed.

The unconstrained-read path (no offset, no limit) is unchanged: still
truncated to 1 line plus the observation timeline, so token economics
are preserved for the common case.

Tests cover all three branches: existing truncation, targeted-read
pass-through (offset+limit, limit-only), and mtime-driven bypass.

Fixes #1719

* refactor(file-context): address review findings on #1719 fix

- Add offset-only test case for full targeted-read branch coverage
- Use >= for mtime comparison to handle same-millisecond edge case
- Add Number.isFinite() + bounds guards on offset/limit pass-through
- Trim over-verbose comments to concise single-line summaries
- Remove redundant `as number` casts after typeof narrowing
- Add comment explaining fileMtimeMs=0 sentinel invariant
2026-04-15 00:57:57 -07:00
Jochen Meyer 59ce0fc553 fix: exclude primary key index from unique constraint check in migration 7 (#1771)
* fix: exclude primary key index from unique constraint check in migration 7

PRAGMA index_list returns all indexes including those backing PRIMARY KEY
columns (origin='pk'), which always have unique=1. The check was therefore
always true, causing migration 7 to run the full DROP/CREATE/RENAME table
sequence on every worker startup instead of short-circuiting once the
UNIQUE constraint had already been removed.

Fix: filter to non-PK indexes by requiring idx.origin !== 'pk'. The
origin field is already present on the existing IndexInfo interface.

Fixes #1749

* fix: apply pk-origin guard to all three migration code paths

CodeRabbit correctly identified that the origin !== 'pk' fix was only
applied to MigrationRunner.ts but not to the two other active code paths
that run the same removeSessionSummariesUniqueConstraint logic:

- SessionStore.ts:220 — used by DatabaseManager and worker-service
- plugin/scripts/context-generator.cjs — bundled artifact (minified)

All three paths now consistently exclude primary-key indexes when
detecting unique constraints on session_summaries.
2026-04-15 00:57:51 -07:00
Jochen Meyer 31ee1024c5 fix: restrict ~/.claude-mem/.env permissions to owner-only (0600) (#1770)
* fix: restrict .env file permissions to owner-only (0600)

API keys stored in ~/.claude-mem/.env were created without explicit
permissions, defaulting to umask-dependent mode. On systems with a
permissive umask (e.g. 0022), the file would be world-readable.

- Set directory permissions to 0700 on creation
- Set file permissions to 0600 via writeFileSync mode option
- Call chmodSync after write to fix permissions on pre-existing files

Signed-off-by: Jochen Meyer

* fix: also restrict pre-existing directory permissions to 0700

The initial fix only set directory mode on creation. Pre-existing
~/.claude-mem/ directories from earlier installs remained world-readable.
Add chmodSync for the directory alongside the existing file chmod,
and document the Windows limitation (ACLs, not POSIX permissions).

---------

Signed-off-by: Jochen Meyer
2026-04-15 00:57:48 -07:00
Alex Newman a390a537c9 fix: broadcast uses summaryForStore to support salvaged summaries (#1718)
syncAndBroadcastSummary was using the raw ParsedSummary (null when salvaged)
instead of summaryForStore for the SSE broadcast, causing a crash when the
LLM returns <observation> without <summary> tags. Also removes misplaced
tree-sitter docs from mem-search/SKILL.md (belongs in smart-explore).

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-14 19:11:48 -07:00
Alex Newman 2357835942 Merge pull request #1686 from ousamabenyounes/fix/issue-1633
fix: expose summaryStored in session status to detect silent summary loss (#1633)
2026-04-14 18:41:58 -07:00
Alex Newman 77a22d30b2 Merge pull request #1555 from ousamabenyounes/fix/issue-1384-mcp-inputschema
fix: declare inputSchema properties for search and timeline MCP tools (#1384 #1413)
2026-04-14 18:41:54 -07:00
Alex Newman 40a25e0225 Merge pull request #1676 from ousamabenyounes/fix/issue-1625
fix: filter ghost observations with no content fields (#1625)
2026-04-14 18:41:51 -07:00
Alex Newman 4c2ab98d90 Merge pull request #1679 from ousamabenyounes/fix/issue-1297
fix: set cwd to homedir when spawning chroma-mcp to prevent pydantic .env.local crash (#1297)
2026-04-14 18:41:48 -07:00
Alex Newman 216d17879d Merge pull request #1680 from ousamabenyounes/fix/issue-1447
fix: suppress false ERROR when duplicate daemon loses port bind race (#1447)
2026-04-14 18:41:25 -07:00
Alex Newman 9a91a1be2b Merge pull request #1701 from ck0park/fix/list-corpora-mcp-result-shape
fix: list_corpora MCP tool — wrap response in CallToolResult shape (fixes #1700)
2026-04-14 18:41:13 -07:00
Sisyphus 🏔️ b7c23ca232 fix(ResponseProcessor): salvage synthetic summary when AI returns <observation> instead of <summary>
Fixes Issue #1312: AI sometimes returns <observation> XML tags instead of
<summary> tags during the summarize phase, despite clear instructions in
buildSummaryPrompt() requiring <summary> ONLY output.

When this occurs, parseSummary() returns null and the entire session summary
is lost. This fix detects the condition (summary missing + observations
present) and synthesizes a summary from the observation data, ensuring
session summaries are not completely lost.

The salvage mapping:
- request: observation title
- investigated: observation narrative or facts
- learned: observation facts joined
- completed: title if type is feature/bugfix
- notes: indicates this is a synthetic salvage summary

Observations are stored normally regardless of this fallback.

Co-authored-by: Sisyphus <sisyphus@openclaw>
2026-04-11 20:29:35 +08:00
Ousama Ben Younes edc8535ac1 fix: skip queueLength===0 completion branch when session returns 404 2026-04-11 08:16:35 +00:00
ck0park ad127bec40 fix: wrap list_corpora response in MCP CallToolResult shape (fixes #1700)
GET /api/corpus returned a bare array, which the MCP server wrapper
(callWorkerAPI) forwards directly. MCP's tools/call validation rejects
non-object results with "expected object, received array", so the
list_corpora MCP tool was completely unusable.

Every other corpus endpoint is a POST that already returns the
{content:[...]} shape, so this is a targeted one-file fix.
2026-04-11 09:57:01 +09:00
Ousama Ben Younes 2f19eab9c2 fix: expose summaryStored in session status to detect silent summary loss (#1633)
Stop hook polled queueLength===0 as a proxy for summary success, but the queue
empties regardless of whether the LLM produced valid <summary> tags. Added
lastSummaryStored tracking on ActiveSession, surfaced via the /api/sessions/status
endpoint, and emit a logger.warn in the Stop hook when summaryStored===false.

Generated by Claude Code
Vibe coded by ousamabenyounes

Co-Authored-By: Claude <noreply@anthropic.com>
2026-04-10 15:06:18 +00:00
Ousama Ben Younes 08cf2ba3bd fix: suppress false ERROR when duplicate daemon loses port bind race (#1447)
When the MCP server and SessionStart hook both spawn a worker daemon
concurrently, one loses the bind race (EADDRINUSE / Bun's port-in-use
error). The loser now checks if the winner is healthy; if so, it logs
INFO and exits cleanly instead of logging a misleading ERROR on every
first session start.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-10 10:01:08 +00:00
Ousama Ben Younes c7c4fd54d6 fix: set cwd to homedir when spawning chroma-mcp to prevent pydantic .env.local crash (#1297)
chroma-mcp uses pydantic-settings which auto-reads .env/.env.local from
the CWD. When the project directory contains non-chroma variables (e.g.
CELERY_TASK_ALWAYS_EAGER), pydantic rejects them with "Extra inputs are
not permitted", crashing the subprocess and triggering a permanent
backoff loop. Passing cwd: os.homedir() to StdioClientTransport ensures
pydantic never reads project env files.

Generated by Claude Code
Vibe coded by ousamabenyounes

Co-Authored-By: Claude <noreply@anthropic.com>
2026-04-10 09:55:02 +00:00
Ousama Ben Younes e398212983 fix: filter ghost observations with no content fields (#1625)
When the LLM is overwhelmed by large context it can emit bare
<observation/> blocks (or ones containing only <type>). These are
stored as rows where title, narrative, facts and concepts are all
null/empty, appearing as meaningless "Untitled" entries in the context
window. Add a guard in parseObservations() that skips any observation
where every content field is null/empty before pushing it to the
result array.

Generated by Claude Code
Vibe coded by ousamabenyounes

Co-Authored-By: Claude <noreply@anthropic.com>
2026-04-10 09:40:40 +00:00
Alex Newman c648d5d8d2 feat: Knowledge Agents — queryable corpora from claude-mem (#1653)
* feat: add knowledge agent types, store, builder, and renderer

Phase 1 of Knowledge Agents feature. Introduces corpus compilation
pipeline that filters observations from the database into portable
corpus files stored at ~/.claude-mem/corpora/.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* feat: add corpus CRUD HTTP endpoints and wire into worker service

Phase 2 of Knowledge Agents. Adds CorpusRoutes with 5 endpoints
(build, list, get, delete, rebuild) and registers them during
worker background initialization alongside SearchRoutes.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* feat: add KnowledgeAgent with V1 SDK prime/query/reprime

Phase 3 of Knowledge Agents. Uses Agent SDK V1 query() with
resume and disallowedTools for Q&A-only knowledge sessions.
Auto-reprimes on session expiry. Adds prime, query, and reprime
HTTP endpoints to CorpusRoutes.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* feat: add MCP tools and skill for knowledge agents

Phase 4 of Knowledge Agents. Adds build_corpus, list_corpora,
prime_corpus, and query_corpus MCP tools delegating to worker
HTTP endpoints. Includes /knowledge-agent skill with workflow docs.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix: handle SDK process exit in KnowledgeAgent, add e2e test

The Agent SDK may throw after yielding all messages when the
Claude process exits with a non-zero code. Now tolerates this
if session_id/answer were already captured. Adds comprehensive
e2e test script (31 assertions) orchestrated via tmux-cli.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix: use settings model ID instead of hardcoded model in KnowledgeAgent

Reads CLAUDE_MEM_MODEL from user settings via getModelId(), matching
the existing SDKAgent pattern. No more hardcoded model assumptions.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* feat: improve knowledge agents developer experience

Add public documentation page, rebuild/reprime MCP tools, and actionable
error messages. DX review scored knowledge agents 4/10 — core engineering
works (31/31 e2e) but the feature was invisible. This addresses
discoverability (docs, cross-links), API completeness (missing MCP tools),
and error quality (fix/example fields in error responses).

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* docs: add quick start guide to knowledge agents page

Covers the three main use cases upfront: creating an agent, asking a
single question, and starting a fresh conversation with reprime. Includes
keeping-it-current section for rebuild + reprime workflow.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix: address code review issues — path traversal, session safety, prompt injection

- Block path traversal in CorpusStore with alphanumeric name validation and resolved path check
- Harden system prompt against instruction injection from untrusted corpus content
- Validate question field as non-empty string in query endpoint
- Only persist session_id after successful prime (not null on failure)
- Persist refreshed session_id after query execution
- Only auto-reprime on session resume errors, not all query failures
- Add fenced code block language tags to SKILL.md

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix: address remaining code review issues — e2e robustness, MCP validation, docs

- Harden e2e curl wrappers with connect-timeout, fallback to HTTP 000 on transport failure
- Use curl_post wrapper consistently for all long-running POST calls
- Add runtime name validation to all corpus MCP tool handlers
- Fix docs: soften hallucination guarantee to probabilistic claim
- Fix architecture diagram: add missing rebuild_corpus and reprime_corpus tools

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix: enforce string[] type in safeParseJsonArray for corpus data integrity

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix: add blank line before fenced code blocks in SKILL.md maintenance section

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-08 17:30:20 -07:00
WuTao 07be61cf91 feat: support ANTHROPIC_BASE_URL in EnvManager (#1627)
* feat: add custom OpenRouter base URL support

Allow users to configure a custom base URL for OpenRouter API calls
through settings UI and environment management.

Generated with AI

Co-Authored-By: AI Partner

* refactor: remove OpenRouter base URL customization, keep Claude URL changes

Only retain ANTHROPIC_BASE_URL and ANTHROPIC_AUTH_TOKEN support in
EnvManager for custom Claude API endpoint configuration.

Generated with AI

Co-Authored-By: AI Partner

* chore: revert build artifacts to match main

Generated with AI

Co-Authored-By: AI Partner

* fix: remove ANTHROPIC_AUTH_TOKEN, add ANTHROPIC_BASE_URL persistence

- Remove unnecessary ANTHROPIC_AUTH_TOKEN (inherited from parent process)
- Add ANTHROPIC_BASE_URL to saveClaudeMemEnv() to fix config persistence
- Keep only ANTHROPIC_BASE_URL support for custom API endpoint

Generated with AI

Co-Authored-By: AI Partner
2026-04-08 16:17:06 -07:00
Octopus f7fd2221c8 fix: rebuild FTS5 index after bulk observation import (fixes #1631) (#1632)
Imported observations were invisible to the MCP search tool because the
FTS5 content table was not reliably updated during bulk import. The import
handler now calls rebuildObservationsFTSIndex() after inserting new
observations, ensuring the full-text search index is consistent.

A new SessionStore.rebuildObservationsFTSIndex() method encapsulates the
FTS5 rebuild command and is a no-op when the observations_fts table does
not exist (e.g. FTS5 unavailable on Windows).
2026-04-08 16:16:55 -07:00
Alex Newman abd55977ca fix(mcp): MCP server crashes with Cannot find module 'bun:sqlite' under Node (#1645)
* fix(mcp): MCP server crashes with Cannot find module 'bun:sqlite' under Node

The MCP server bundle (mcp-server.cjs) ships with `#!/usr/bin/env node` so
it must run under Node, but commit 2b60dd29 added an import of
`ensureWorkerStarted` from worker-service.ts. That import transitively pulls
in DatabaseManager → bun:sqlite, blowing up at top-level require under Node.

The bundle ballooned from ~358KB (v11.0.1) to ~1.96MB (v12.0.0) and crashed
on every spawn, breaking the MCP server entirely for Codex/MCP-only clients
and any flow that boots the MCP tool surface.

Fix:

1. Extract `ensureWorkerStarted` and the Windows spawn-cooldown helpers
   into a new lightweight module `src/services/worker-spawner.ts` that
   only imports from infrastructure/ProcessManager, infrastructure/HealthMonitor,
   shared/*, and utils/logger — no SQLite, no ChromaSync, no DatabaseManager.

2. The new helper takes the worker script path explicitly so callers
   running under Node (mcp-server) can pass `worker-service.cjs` while
   callers already inside the worker (worker-service self-spawn) pass
   `__filename`. worker-service.ts keeps a thin wrapper for back-compat.

3. mcp-server.ts now imports from worker-spawner.js and resolves
   WORKER_SCRIPT_PATH via __dirname so the daemon can be auto-started
   for MCP-only clients without dragging in the entire worker bundle.

4. resolveWorkerRuntimePath() now searches for Bun on every platform
   (not just Windows). worker-service.cjs requires Bun at runtime, so
   when the spawner is invoked from a Node process the Unix branch can
   no longer fall through to process.execPath (= node).

5. spawnDaemon's Unix branch now calls resolveWorkerRuntimePath() instead
   of hardcoding process.execPath, fixing the same Node-spawning-Node bug
   for the actual subprocess launch on Linux/macOS.

After:
- mcp-server.cjs is 384KB again with zero `bun:sqlite` references
- node mcp-server.cjs initializes and serves tools/list + tools/call
  (verified via JSON-RPC against the running worker)
- ProcessManager test suite updated for the new cross-platform Bun
  resolution behavior; full suite has the same pre-existing failures
  as main, no regressions

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix(mcp): address PR #1645 review feedback (round 1)

Per Claude Code Review on PR #1645:

1. mcp-server.ts: log a warning when both __dirname and import.meta.url
   resolution fail. The cwd() fallback is essentially dead code for the
   CJS bundle but if it ever fires it gives the user a breadcrumb instead
   of a silently-wrong WORKER_SCRIPT_PATH.

2. mcp-server.ts: existsSync check on WORKER_SCRIPT_PATH at module load.
   Surfaces a clear "worker-service.cjs not found at expected path" log
   line for partial installs / dev environments instead of letting the
   failure surface as a generic spawnDaemon error later.

3. ProcessManager.ts: explanatory comment on the Windows `return 0`
   sentinel in spawnDaemon. Documents that PowerShell Start-Process
   doesn't return a PID and that callers MUST use `pid === undefined`
   for failure detection — never falsy checks like `if (!pid)`.

Items 4 (no direct unit tests for the worker-spawner Windows cooldown
helpers) and 5 (process-manager.test.ts uses real ~/.claude-mem path)
are deferred — the reviewer flagged the latter as out of scope, and
the former needs an injectable-I/O refactor that isn't appropriate
for a hotfix bugfix PR.

Verified: build clean, mcp-server.cjs still 384KB / zero bun:sqlite,
JSON-RPC tools/list still returns the 7-tool surface, ProcessManager
test suite still 43/43.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix(spawner): mkdir CLAUDE_MEM_DATA_DIR before writing Windows cooldown marker

Per CodeRabbit on PR #1645: on a fresh user profile, the data dir may not
exist yet when markWorkerSpawnAttempted() runs. writeFileSync would throw
ENOENT, the catch would swallow it, and the marker would never be created
— defeating the popup-loop protection this helper exists to provide.

mkdirSync(dir, { recursive: true }) is a no-op when the directory already
exists, so it's safe to call on every spawn attempt.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* docs(spawner): add APPROVED OVERRIDE annotations for cooldown marker catches

Per CodeRabbit on PR #1645: silent catch blocks at spawn-cooldown sites
should carry the APPROVED OVERRIDE annotation that the rest of the
codebase uses (see ProcessManager.ts:689, BaseRouteHandler.ts:82,
ChromaSync.ts:288).

Both catches are intentional best-effort:
- markWorkerSpawnAttempted: if mkdir/writeFileSync fails, the worker
  spawn itself will almost certainly fail too. Surfacing that downstream
  is far more useful than a noisy log line about a lock file.
- clearWorkerSpawnAttempted: a stale marker is harmless. Worst case is
  one suppressed retry within the cooldown window, then self-heals.

No behaviour change. Resolves the second half of CodeRabbit's lines
38-65 comment on worker-spawner.ts.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix(mcp): address PR #1645 review feedback (round 2)

Round 2 of Claude Code Review feedback on PR #1645:

Build guardrail (most important — protects the regression this PR fixes):

- scripts/build-hooks.js: post-build check that fails the build if
  mcp-server.cjs ever contains a `bun:sqlite` reference. This is the
  exact regression PR #1645 fixed; future contributors will get an
  immediate, actionable error if a transitive import re-introduces it.
  Verified the check trips when violated.

Code clarity:

- src/servers/mcp-server.ts: drop dead `_originalLog` capture — it was
  never restored. Less code is fewer bugs.

- src/servers/mcp-server.ts: elevate `cwd()` fallback log from WARN to
  ERROR. Per reviewer: a wrong WORKER_SCRIPT_PATH means worker auto-start
  silently fails, so the breadcrumb should be loud and searchable.

- src/services/worker-service.ts: extended doc comment on the
  `ensureWorkerStartedShared(port, __filename)` wrapper explaining why
  `__filename` is the correct script path here (CJS bundle = compiled
  worker-service.cjs) and why mcp-server.ts can't use the same trick.

- src/services/infrastructure/ProcessManager.ts: inline comment on the
  `env.BUN === 'bun'` bare-command guard explaining why it's reachable
  even though `isBunExecutablePath('bun')` is true (pathExists returns
  false for relative names, so the second branch is what fires).

Coverage:

- src/services/infrastructure/ProcessManager.ts: add `/usr/bin/bun` to
  the Linux candidate paths so apt-installed Bun on Debian/Ubuntu is
  found without falling through to the PATH lookup.

Out-of-scope items (deferred with rationale in PR replies):

- Unit tests for ensureWorkerStarted / Windows cooldown helpers — needs
  injectable-I/O refactor unsuitable for a hotfix.
- Sentinel object for Windows spawnDaemon `0` — broader API change.
- Windows Scoop install path — follow-up for a future PR.
- runOneTimeChromaMigration placement, aggressiveStartupCleanup,
  console.log redirect timing, platform timeout multiplier — all
  pre-existing and unrelated to this regression.

Verified: build clean, guardrail trips on simulated violation,
mcp-server.cjs still 0 bun:sqlite refs, ProcessManager tests 43/43.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix(mcp): address PR #1645 review feedback (round 3)

Round 3 of Claude Code Review feedback on PR #1645:

ProcessManager.ts: improve actionability of "Bun not found" errors

Both Windows and Unix branches of spawnDaemon previously logged a vague
"Failed to locate Bun runtime" message when resolveWorkerRuntimePath()
returned null. Replaced with an actionable message that names the install
URL and explains *why* Bun is required (worker uses bun:sqlite). The
existing null-guard at the call sites already prevents passing null to
child_process.spawn — only the error text changed.

scripts/build-hooks.js: refine bun:sqlite guardrail to match actual
require() calls only

The previous coarse `includes('bun:sqlite')` check tripped on its own
improved error message, which legitimately mentions "bun:sqlite" by name.
Switched to a regex that matches `require("bun:sqlite")` /
`require('bun:sqlite')` (with optional whitespace, handles both quote
styles, handles minified output) so error messages and inline comments
can reference the module name without false positives. Verified the
regex still trips on real violations (both spaced and minified forms)
and correctly ignores string-literal mentions.

Other round-3 items (verified, not changed):

- TOOL_ENDPOINT_MAP: reviewer flagged as dead code, but it IS used at
  lines 250 and 263 by the search and timeline tool handlers. False
  positive — kept as-is.
- if (!pid) callsites: grepped src/, zero offenders. The Windows `0`
  PID sentinel contract is safe; only the in-line documentation comment
  in ProcessManager.ts mentions the anti-pattern.
- callWorkerAPIPost double-wrapping: pre-existing intentional behavior
  (only used by /api/observations/batch which returns raw data, not
  the MCP {content:[...]} shape). Unrelated to this regression.
- Snap path / startParentHeartbeat / main().catch / test for non-
  existent workerScriptPath / etc — pre-existing or out of scope for
  this hotfix, deferred per established disposition.

Verified: build clean, guardrail still trips on real violations,
mcp-server.cjs has 0 require("bun:sqlite") calls, JSON-RPC tools/list
returns the 7-tool surface, ProcessManager tests 43/43.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* test(spawnDaemon): contract test for Windows 0 PID success sentinel

Per CodeRabbit nitpick on PR #1645 commit 7a96b3b9: add a focused test
that documents the spawnDaemon return contract so any future contributor
who introduces `if (!pid)` against a spawnDaemon return value (or its
wrapper) sees a failing assertion explaining why the falsy check is
incorrect.

The test deliberately exercises the JS-level semantics rather than
mocking PowerShell — a true mocked Windows test would require
refactoring spawnDaemon to take an injectable execSync, which is a
larger change than this hotfix should carry. The contract assertions
here catch the same regression class (treating Windows success as
failure) without that refactor.

Verified: bun test tests/infrastructure/process-manager.test.ts now
passes 44/44 (was 43/43).

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix(mcp): address PR #1645 review feedback (round 4)

Round 4 of Claude Code Review feedback on PR #1645 (review of round-3
commit 193286f9):

tests/infrastructure/process-manager.test.ts: replace require('fs')
with the already-imported statSync. Reviewer correctly flagged that
the file uses ESM-style named imports everywhere else and the inline
require() calls would break under strict ESM. Two callsites updated
in the touchPidFile test.

src/services/infrastructure/ProcessManager.ts: hoist
resolveWorkerRuntimePath() and the `Bun runtime not found` error
handling out of both branches in spawnDaemon. Both Windows and Unix
branches need the same Bun lookup, and resolving once before the OS
branch split avoids a duplicate execSync('which bun')/where bun in the
no-well-known-path fallback. The error message is also DRY now —
single source of truth instead of two near-identical strings.

CodeRabbit confirmed in its previous reply that "All actionable items
across all four review rounds are fully resolved" — these two minor
items from claude-review of round 3 are the only remaining cleanup.

Verified: build clean, ProcessManager tests still 44/44.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix(mcp): address PR #1645 review feedback (round 5)

Round 5 of Claude Code Review feedback on PR #1645:

src/services/worker-spawner.ts: drop `export` from internal helpers

`shouldSkipSpawnOnWindows`, `markWorkerSpawnAttempted`, and
`clearWorkerSpawnAttempted` were exported even though they were
private in worker-service.ts and nothing outside this module needs
them. Removing the `export` keyword keeps the public surface to just
`ensureWorkerStarted` and prevents future callers from bypassing the
spawn lifecycle.

scripts/build-hooks.js: broaden guardrail to all bun:* modules

Previously the regex only caught `require("bun:sqlite")`, but every
module in the `bun:` namespace (bun:ffi, bun:test, etc.) is Bun-only
and would crash mcp-server.cjs the same way under Node. Generalized
the regex to `require("bun:[a-z][a-z0-9_-]*")` so a transitive import
of any Bun-only module fails the build instead of shipping a broken
bundle. Verified the new regex still trips on bun:sqlite, bun:ffi,
bun:test, and correctly ignores string-literal mentions in error
messages.

src/servers/mcp-server.ts: attribute root cause when dirname resolution fails

Previously, if `__dirname`/`import.meta.url` resolution failed and we
fell back to `process.cwd()`, the user would see two warnings: an
error about the dirname fallback AND a separate warning about the
missing worker bundle. The second warning hides the root cause —
someone debugging would assume the install is broken when really it's
a dirname-resolution failure. Track the failure with a flag and emit
a single root-cause-attributing log line in the existence-check
branch instead. The dirname fallback paths are still functionally
unreachable in CJS deployment; this just makes the failure mode
unmistakable if it ever does fire.

Out of scope (consistent with prior rounds):
- darwin/linux split for non-Windows candidate paths (benign today)
- Integration test for non-existent workerScriptPath (test coverage
  gap deferred since rounds 1-2)
- Defer existsSync check to first ensureWorkerStarted call (current
  module-init check is the loud signal we want)

Already addressed in earlier rounds:
- resolveWorkerRuntimePath() called twice in spawnDaemon → hoisted in
  round 4 (b2c114b4)
- _originalLog dead code → removed in round 2 (7a96b3b9)

Verified: build clean, broadened guardrail trips on bun:sqlite,
bun:ffi, and bun:test (and ignores string literals), MCP server
serves the 7-tool surface, ProcessManager tests still 44/44.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix(mcp): address PR #1645 review feedback (round 6)

Round 6 of Claude Code Review feedback on PR #1645:

src/services/worker-spawner.ts: validate workerScriptPath at entry

Add an empty-string + existsSync guard at the top of ensureWorkerStarted.
Without this, a partial install or upstream path-resolution regression
just surfaces as a low-signal child_process error from spawnDaemon. The
explicit log line at the entry point makes that class of bug much
easier to diagnose. The mcp-server.ts module-init existsSync check
already covers this for the MCP-server caller, but defending at the
spawner level reinforces the contract for any future caller.

src/services/worker-spawner.ts: document SettingsDefaultsManager
dependency boundary in the module header

The spawner imports from SettingsDefaultsManager, ProcessManager, and
HealthMonitor. None of those currently touch bun:sqlite, but if any
of them ever does, the spawner's SQLite-free contract silently breaks.
The build guardrail in build-hooks.js is the only thing that catches
it. Header comment now flags this so future contributors audit
transitive imports when adding helpers from the shared/infrastructure
layers.

src/services/infrastructure/ProcessManager.ts: add /snap/bin/bun

Ubuntu Snap install path. Now alongside the existing apt path
(/usr/bin/bun) and Homebrew/Linuxbrew paths. The PATH lookup catches
it as fallback, but listing it explicitly avoids paying for an
execSync('which bun') in the common case.

src/servers/mcp-server.ts: elevate missing-bundle log warn → error

A missing worker-service.cjs means EVERY MCP tool call that needs the
worker silently fails. That's a broken-install state, not a transient
condition — match the severity of the dirname-fallback branch above
(which is already ERROR).

Out of scope (consistent with prior rounds, reviewer agrees these are
appropriately deferred):
- Streaming bundle read in build-hooks.js (nit at current 384KB size)
- Unit tests for ensureWorkerStarted / cooldown helpers
- Integration test for non-existent workerScriptPath

Verified: build clean, broadened guardrail still trips on bun:* imports
and ignores string literals, MCP server serves the 7-tool surface,
ProcessManager tests still 44/44.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix(mcp): defer WORKER_SCRIPT_PATH check to first call (round 7)

Round 7 of Claude Code Review feedback on PR #1645:

src/servers/mcp-server.ts: extract module-level existsSync check into
checkWorkerScriptPath() and call it lazily from ensureWorkerConnection()
instead of at module load.

The early-warning intent is preserved (the check still fires before any
actual spawn attempt), but tests/tools that import this module without
booting the MCP server no longer see noisy ERROR-level log lines for a
worker bundle they never intended to start. The check is cheap and
idempotent, so calling it on every auto-start attempt is fine.

The two failure-mode branches (dirname-resolution failure vs simple
missing-bundle) remain unchanged — the function body is identical to
the previous module-level if-block, just hoisted into a function and
called from ensureWorkerConnection().

False positive (no change needed):
- Reviewer flagged `mkdirSync` as a dead import in worker-spawner.ts,
  but it IS used at line 71 in markWorkerSpawnAttempted (the round-1
  ENOENT fix CodeRabbit explicitly asked for).

Out of scope:
- Volta path (~/.volta/bin/bun) — PATH fallback handles it; nit per
  reviewer
- worker-spawner.ts unit tests — needs injectable I/O, deferred
  consistently since round 1

Verified: build clean, tests 44/44, smoke test 7-tool surface.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix(mcp): address PR #1645 review feedback (round 8)

Round 8 of Claude Code Review feedback on PR #1645:

tests/services/worker-spawner.test.ts: NEW FILE — unit tests for the
ensureWorkerStarted entry-point validation guards added in round 6.
Covers the empty-string and non-existent-path cases without requiring
the broader injectable-I/O refactor that the deeper spawn lifecycle
tests would need. 2 new passing tests.

src/services/infrastructure/ProcessManager.ts: memoize
resolveWorkerRuntimePath() for the no-options call site (which is what
spawnDaemon uses). Caches both successful resolutions and the
not-found result so repeated spawn attempts (crash loops, health
thrashing) don't repeatedly hit statSync on candidate paths. Tests
that pass options bypass the cache entirely so existing test cases
remain deterministic. Added resetWorkerRuntimePathCache() exported
for test isolation only.

src/servers/mcp-server.ts: rename checkWorkerScriptPath() →
warnIfWorkerScriptMissing(). Per reviewer: the old name implied a
boolean check but the function returns void and has side effects. New
name is more accurate.

DEFENDED (no change made):
- Reviewer asked to elevate process.cwd() fallback to a synchronous
  throw at module load. This conflicts with round 7 feedback which
  asked to defer the existsSync check to first call to avoid noisy
  test logs. The current lazy approach is the right compromise: it
  fires before any actual spawn attempt, attributes the root cause,
  and doesn't pollute test imports. Throwing at module load would
  crash before stdio is wired up, which is much harder to debug than
  the lazy log line.
- Reviewer asked to grep for `if (!pid)` callsites — already verified
  in round 3, zero offenders in src/.

Out of scope:
- Volta path (~/.volta/bin/bun) — PATH fallback handles it; reviewer
  marked as nit
- Deeper unit tests for ensureWorkerStarted spawn lifecycle (PID file
  cleanup, health checks, etc.) — needs injectable I/O, deferred
  consistently since round 1

Verified: build clean, ProcessManager tests still 44/44, new
worker-spawner tests 2/2, smoke test serves 7 tools.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix(spawner): clear Windows cooldown marker on all healthy paths (round 9)

Round 9 of PR #1645 review feedback.

src/services/worker-spawner.ts: clear stale Windows cooldown marker
on every healthy-return path

Per CodeRabbit (genuine bug):

The .worker-start-attempted marker was previously only cleared after
a spawn initiated by ensureWorkerStarted itself succeeded. If a
previous auto-start failed, then the worker became healthy via
another session or a manual start, the early-return success branches
(existing live PID, fast-path health check, port-in-use waitForHealth)
would leave the stale marker behind. A subsequent genuine outage
inside the 2-minute cooldown window would then be incorrectly
suppressed on Windows.

Now calls clearWorkerSpawnAttempted() on all three healthy success
paths in addition to the existing post-spawn path. The function is
already a no-op on non-Windows, so the change is risk-free for Linux
and macOS callers.

src/servers/mcp-server.ts: more actionable error when auto-start fails

Per claude-review: when ensureWorkerStarted returns false (or throws),
the caller currently logs a generic "Worker auto-start failed" line.
Updated both error sites to explicitly call out which MCP tools will
fail (search/timeline/get_observations) and to point at earlier log
lines for the specific cause. Helps users distinguish "worker is just
not running" from "tools are broken".

DEFENDED (no change):
- Sentinel object for Windows spawnDaemon 0 PID — broader API change,
  out of scope, deferred consistently since round 1
- Spawner lifecycle tests beyond input validation — needs injectable
  I/O, deferred consistently
- Concurrent cooldown marker race on Windows — pre-existing,
  out of scope
- stripHardcodedDirname() regex fragility assertion — pre-existing,
  out of scope

Verified: build clean, ProcessManager tests 44/44, worker-spawner
tests 2/2, smoke test 7-tool surface.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix(spawner): don't cache null Bun-not-found result (round 10)

Round 10 of PR #1645 review feedback.

src/services/infrastructure/ProcessManager.ts: only cache successful
resolveWorkerRuntimePath() results

Genuine bug from claude-review: the round-8 memoization cached BOTH
successful resolutions AND the not-found `null` result. If Bun isn't
on PATH at the moment the MCP server first tries to spawn the worker
— e.g., on a fresh install where the user installs Bun in another
terminal and retries — every subsequent ensureWorkerConnection call
would return the cached `null` and fail with a misleading "Bun not
found" error even though Bun is now available.

The fix is the one-line change the reviewer suggested: only cache
when `result !== null`. Crash loops still get the fast-path memoized
success; recovery from a fresh-install Bun install still works.

src/servers/mcp-server.ts: rename warnIfWorkerScriptMissing →
errorIfWorkerScriptMissing

Per claude-review: the function uses logger.error but the name says
"warn" — name/level mismatch. Renamed to match. The function still
serves the same purpose (defensive lazy check), just with an accurate
name.

DEFENDED (no change):
- Discriminated union for mcpServerDirResolutionFailed flag — current
  approach works, the noise is minimal, and the alternative would
  add type complexity for a path that's functionally unreachable in
  CJS deployment
- macOS /usr/local/bin/bun "missing" — already in the Linux/macOS
  candidate list at line 137 (false positive from reviewer)
- nix store path — out of scope, PATH fallback handles it
- Long build-hooks.js error message — verbosity is intentional, this
  message only fires on a real regression and the diagnostic value is
  worth the line wrap
- Spawner lifecycle test coverage gap — needs injectable I/O,
  deferred consistently

Verified: build clean, ProcessManager tests 44/44, worker-spawner
tests 2/2, smoke test 7-tool surface.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix(mcp): bundle size budget guardrail (round 11)

Round 11 of PR #1645 review feedback.

scripts/build-hooks.js: secondary bundle-size budget guardrail

Per claude-review: the existing `require("bun:*")` regex catches the
specific regression class we already know about, but if esbuild ever
changes how it emits external module specifiers, the regex could
silently miss the regression. A bundle-size budget catches the
structural symptom (worker-service.ts dragged into the bundle blew
the size from ~358KB to ~1.96MB) regardless of how the imports look.

Set the ceiling at 600KB. Current size is ~384KB; the broken v12.0.0
bundle was ~1920KB. Plenty of headroom for legitimate growth without
incentivizing bundle bloat or false positives. Both guardrails fire
independently — one is regex-based, one is size-based — so a
regression has to defeat both to ship.

tests/services/worker-spawner.test.ts: comment about port irrelevance

Per claude-review: the hardcoded port values in the validation-guard
tests are arbitrary because the path validation short-circuits before
any network I/O. Added a comment explaining this so future readers
don't waste time wondering why specific ports were picked.

DEFENDED (no change):

- clearWorkerSpawnAttempted on the unhealthy-live-PID return path:
  reviewer asked to clear the marker here too, but the current
  behavior is correct. The marker tracks "recently attempted a spawn"
  and exists to prevent rapid PowerShell-popup loops. If a wedged
  process is currently using the port, the spawn isn't actually
  happening on this code path (the helper returns false without
  reaching the spawn step). When the wedged process eventually dies
  and a subsequent call hits the spawn path, the marker correctly
  suppresses repeated retry attempts within the 2-minute cooldown.
  Clearing the marker on the unhealthy-return path would defeat
  exactly the popup-loop protection the marker exists to provide.

- execSync in lookupBinaryInPath blocks event loop: pre-existing
  concern, not introduced by this PR. Reviewer notes "fires once,
  result cached". Not in scope for a hotfix.

- Tracking issue for spawner lifecycle test gap: out of scope for
  this PR; the gap is documented in the test file's header comment
  with a back-reference to PR #1645.

Verified: build clean, both guardrails functional (size budget is
under the new ceiling), ProcessManager tests 44/44, worker-spawner
tests 2/2, smoke test 7-tool surface.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix(mcp): eliminate double error log when worker bundle is missing (round 12)

Round 12 of PR #1645 review feedback.

src/servers/mcp-server.ts: errorIfWorkerScriptMissing() now only logs
when the dirname-fallback attribution path is needed

Previously a missing worker-service.cjs would produce two ERROR log
lines on the same code path:
1. errorIfWorkerScriptMissing() in ensureWorkerConnection()
2. The existsSync guard inside ensureWorkerStarted()

The simple "missing bundle" case is fully covered by the spawner's
own existsSync guard. The mcp-server.ts function now ONLY logs when
mcpServerDirResolutionFailed is true — that's the mcp-server-specific
root-cause attribution that the spawner cannot provide on its own.

Net effect: same single error log per bug class, cleaner triage.

DEFENDED (no change):

- mkdirSync error propagation in markWorkerSpawnAttempted: reviewer
  worried that mkdirSync/writeFileSync exceptions could escape, but
  the entire body is already wrapped in try/catch with an APPROVED
  OVERRIDE annotation. False positive.
- clearWorkerSpawnAttempted on healthy paths: reviewer asked a
  clarifying question, not a change request. The behavior is
  intentional — the cooldown marker exists to prevent rapid
  PowerShell-popup loops from a series of failed spawns; a healthy
  worker means the marker has served its purpose and a future
  outage should NOT be suppressed. Will explain in PR reply.
- __filename ESM concern in worker-service.ts wrapper: already
  documented in round 4 with an extended comment about the CJS
  bundle context and why mcp-server.ts can't use the same trick.
- Spawn lifecycle integration tests: deferred consistently since
  round 1; gap is documented in worker-spawner.test.ts header.

Verified: build clean, ProcessManager tests 44/44, worker-spawner
tests 2/2, smoke test 7-tool surface.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* test(spawner): add bare-command BUN env override coverage

Final round of PR #1645 review feedback: while preparing to merge, I
noticed CodeRabbit's round-5 CHANGES_REQUESTED review on commit
3570d2f0 included an unaddressed nitpick — the env-driven bare-command
branch in resolveWorkerRuntimePath() (returning a bare 'bun' unchanged
when BUN or BUN_PATH is set that way) had no test coverage and could
regress without any failing assertion.

Added a focused test that exercises the env: { BUN: 'bun' } branch
specifically. 47/47 tests pass (was 46/46).

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-07 18:08:36 -07:00
Alex Newman a0e895b53b fix: enhance title sanitization per PR #1641 review (round 4)
Collapse multiple whitespace, trim, and increase max length to 160 chars
for observation titles in file-context deny reason.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-07 14:18:22 -07:00
Alex Newman 753a993647 fix: address PR #1641 review comments (round 3)
- Fix migration version conflict: addSessionPlatformSourceColumn now uses v25
- Sanitize observation titles in file-context deny reason (strip newlines, limit length)
- Guard json_each() with LIKE '[%' check for legacy bare-path rows
- Guard /stream SSE endpoint with 503 before DB initialization
- Scope bun-runner signal exit handling to start subcommand only
- Normalize platformSource at route boundary in DataRoutes

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-07 14:16:41 -07:00
Alex Newman d0676aa049 feat: file-read gate allows Edit, add legacy-peer-deps for grammar install
- Change file-read gate from deny to allow with limit:1, injecting the
  observation timeline as additionalContext. Edit now works on gated files
  since the file registers as "read" with near-zero token cost.
- Add updatedInput to HookResult type for PreToolUse hooks.
- Add .npmrc with legacy-peer-deps=true for tree-sitter peer dep conflicts.
- Add --legacy-peer-deps to npm fallback paths in smart-install.js so end
  users without bun can install the 24 grammar packages.
- Rebuild plugin artifacts.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-07 14:06:07 -07:00
Alex Newman 7996dfd5cd Merge branch 'thedotmack/add-lang-parsers' into integration/validation-batch
Adds 24-language support for smart-explore: Kotlin, Swift, Elixir,
Lua, Scala, Bash, Haskell, Zig, CSS, SCSS, TOML, YAML, SQL, Markdown.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-07 13:50:46 -07:00
Alex Newman 95889c7b4e feat: expand smart-explore to 24 languages with markdown support and user-installable grammars
Add 15 new tree-sitter language grammars (Kotlin, Swift, PHP, Elixir, Lua, Scala,
Bash, Haskell, Zig, CSS, SCSS, TOML, YAML, SQL, Markdown) with verified SCM queries.
Add markdown-specific formatting with heading hierarchy, code block detection, and
section-aware unfold. Add user-installable grammar system via .claude-mem.json config
with custom query file support.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-07 13:24:56 -07:00
Alex Newman 25bb93a995 fix: address PR #1641 review comments (round 2)
- Remove duplicate TranscriptWatcher/config imports in worker-service.ts
- Use normalizePlatformSource in handleSessionInitByClaudeId for consistency
- Don't skip DB completion when session not in memory (completeByClaudeId)
- Add try-catch around fetch in useContextPreview refresh callback
- Deduplicate store.getAllProjects() call in DataRoutes
- Fix malformed comment separators in migration runner
- Fix missing closing brace and JSDoc opener (merge artifact) in migration runner

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-07 13:22:58 -07:00
Alex Newman c21e49d9fa fix: address PR review comments and add file read gate docs
Fix indentation bugs flagged in PR review (SettingsDefaultsManager,
MigrationRunner), add current date/time to file read gate timeline
so the model can judge observation recency, and add documentation
for the file read gate feature.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-07 13:09:46 -07:00
Alex Newman cbb68ad9e1 fix: worker startup crash and missing observation columns
Two bugs fixed:

1. SessionCompletionHandler called dbManager.getSessionStore() during
   WorkerService construction, before DB initialization. Changed to
   accept DatabaseManager and defer the call to runtime.

2. migration009 (generated_by_model, relevance_count columns) only ran
   via the deprecated MigrationRunner path, never through SessionStore's
   migration chain. Added addObservationModelColumns() to SessionStore
   constructor. Checks column existence directly since schema_versions
   may have been marked applied without the ALTER TABLE succeeding.

Also removed duplicate transcriptWatcher declaration and shutdown block
(merge artifact).

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-07 12:20:10 -07:00
Alex Newman b8999c1181 Merge branch 'thedotmack/file-read-timeline-inject' into integration/validation-batch 2026-04-07 11:18:58 -07:00
Alex Newman d8947473b8 fix: escape filePath in recovery hints to prevent malformed output
Filenames containing quotes, backslashes, or newlines could produce
malformed smart_outline/smart_unfold examples in the deny message.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-07 11:06:32 -07:00
Ousama Ben Younes 53f98fad67 fix: use null-check instead of falsy-OR for depth defaults to preserve 0
Number(x) || 10 converts 0 to 10 since 0 is falsy, making it impossible
to request zero context depth (anchor only). Replace with explicit null
check in timeline(), getContextTimeline(), getTimelineByQuery().

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-07 17:40:15 +00:00
Ousama Ben Younes 64062ac761 fix: address CodeRabbit review — remove side-effectful test import and normalize timeline depth params
- tests/servers/mcp-tool-schemas.test.ts: remove `import '../../src/servers/mcp-server.js'`
  which triggered server startup side effects; test only needs to read the TS source as text
- src/services/worker/SearchManager.ts: add Number() coercion for depth_before/depth_after
  in timeline(), getContextTimeline(), getTimelineByQuery() — HTTP query strings deliver
  these as strings, coercion ensures they are always numbers before being passed to
  filterByDepth() and getTimelineAround*()

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-07 17:30:59 +00:00
Ousama Ben Younes 8cdabe6315 fix: declare inputSchema properties for search and timeline MCP tools (#1384 #1413)
Both tools had properties:{} which prevents MCP clients from exposing
params to the LLM, causing every call to send {} and get a 500 error
("Either query or filters required for search").

- search: declare query, limit, project, type, obs_type, dateStart, dateEnd, offset, orderBy
- timeline: declare anchor, query, depth_before, depth_after, project
- Add 3 schema regression tests (static source validation)

Closes #1384
Closes #1413

Co-Authored-By: Claude <noreply@anthropic.com>
2026-04-07 17:21:19 +00:00
Alex Newman e3475180cd fix: address PR review — day sort, path canonicalization, dead code cleanup
- Sort within-day observations chronologically (was specificity-ordered)
- Canonicalize relative paths to POSIX format before DB lookup
- Skip projects param when allProjects is empty (prevents cross-project leaks)
- Remove dead stderrMessage field and hook-command block (unused after permissionDecision switch)
- Type permissionDecision as 'allow' | 'deny' union instead of string
- Remove redundant non-null assertions in getObservationsByFilePath
- Add edit guidance to deny message (use sed via Bash with smart tools)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-07 01:59:30 -07:00
Alex Newman ef1b427a2a fix: update timeline deny message to route to smart tools
The deny reason is the routing surface — show all cheaper exits:
semantic priming from the timeline, get_observations for details,
and smart_outline/smart_unfold for current code structure.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-07 00:25:55 -07:00
Alex Newman 455aeaf654 fix: remove per-session gate, use permissionDecision deny for every read
The per-session FileReadGate was never requested and broke the cost
savings loop — subsequent reads in the same session silently bypassed
the timeline, hiding newly created observations.

Now the timeline fires on every read that has observations, using the
hook contract's permissionDecision: "deny" with the timeline as the
reason (exit 0 + JSON) instead of exit code 2 + stderr.

- Delete FileReadGate.ts entirely
- Remove /api/file-context/gate endpoint from DataRoutes
- Switch handler from exit code 2 to permissionDecision: "deny"
- Restore permissionDecision fields to HookResult
- Eliminate one HTTP round-trip per read (no gate check needed)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-06 22:05:40 -07:00
Alex Newman 31910fb265 fix: address PR review feedback — path safety, SQL injection, gate scoping
- Resolve relative filePath against input.cwd before statSync; early-return on ENOENT
- Replace LIKE '%path%' with exact json_each equality to prevent false matches
- Sanitize and parameterize LIMIT to prevent NaN SQL errors
- Fix day-sorting to use earliest epoch in group, not first (specificity-sorted) item
- Use exact path equality in deduplicateObservations instead of substring includes
- Scope FileReadGate by session+cwd to prevent worktree collisions
- Refresh lastAccess TTL on active sessions; throttle prune to every 50 calls
- Type params as (string | number)[] instead of any[]
- Remove unused permissionDecision fields from HookResult

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-06 17:29:59 -07:00
Alex Newman 6250a194dd Merge branch 'pr-1472' into integration/validation-batch
# Conflicts:
#	plugin/scripts/context-generator.cjs
#	plugin/scripts/mcp-server.cjs
#	plugin/scripts/worker-service.cjs
#	plugin/ui/viewer-bundle.js
#	src/cli/handlers/context.ts
#	src/services/sqlite/SessionStore.ts
#	src/services/sqlite/migrations/runner.ts
#	src/services/worker-service.ts
#	src/shared/SettingsDefaultsManager.ts
2026-04-06 14:23:18 -07:00
Alex Newman 58fcd85724 Merge branch 'pr-1578' into integration/validation-batch
# Conflicts:
#	plugin/scripts/context-generator.cjs
#	plugin/scripts/worker-service.cjs
#	src/utils/tag-stripping.ts
2026-04-06 14:21:45 -07:00
Alex Newman d570909bf1 Merge branch 'pr-1491' into integration/validation-batch
# Conflicts:
#	plugin/scripts/mcp-server.cjs
#	plugin/scripts/worker-service.cjs
#	src/shared/hook-constants.ts
2026-04-06 14:20:05 -07:00
Alex Newman 5dd2a6f758 Merge branch 'pr-1553' into integration/validation-batch
# Conflicts:
#	src/services/worker/session/SessionCompletionHandler.ts
2026-04-06 14:19:50 -07:00
Alex Newman c3cb8f81ed Merge branch 'pr-1368' into integration/validation-batch
# Conflicts:
#	plugin/scripts/context-generator.cjs
#	plugin/scripts/mcp-server.cjs
#	plugin/scripts/worker-service.cjs
#	plugin/ui/viewer-bundle.js
2026-04-06 14:19:23 -07:00