Stop hook polled queueLength===0 as a proxy for summary success, but the queue
empties regardless of whether the LLM produced valid <summary> tags. Added
lastSummaryStored tracking on ActiveSession, surfaced via the /api/sessions/status
endpoint, and emit a logger.warn in the Stop hook when summaryStored===false.
Generated by Claude Code
Vibe coded by ousamabenyounes
Co-Authored-By: Claude <noreply@anthropic.com>
* fix(mcp): MCP server crashes with Cannot find module 'bun:sqlite' under Node
The MCP server bundle (mcp-server.cjs) ships with `#!/usr/bin/env node` so
it must run under Node, but commit 2b60dd29 added an import of
`ensureWorkerStarted` from worker-service.ts. That import transitively pulls
in DatabaseManager → bun:sqlite, blowing up at top-level require under Node.
The bundle ballooned from ~358KB (v11.0.1) to ~1.96MB (v12.0.0) and crashed
on every spawn, breaking the MCP server entirely for Codex/MCP-only clients
and any flow that boots the MCP tool surface.
Fix:
1. Extract `ensureWorkerStarted` and the Windows spawn-cooldown helpers
into a new lightweight module `src/services/worker-spawner.ts` that
only imports from infrastructure/ProcessManager, infrastructure/HealthMonitor,
shared/*, and utils/logger — no SQLite, no ChromaSync, no DatabaseManager.
2. The new helper takes the worker script path explicitly so callers
running under Node (mcp-server) can pass `worker-service.cjs` while
callers already inside the worker (worker-service self-spawn) pass
`__filename`. worker-service.ts keeps a thin wrapper for back-compat.
3. mcp-server.ts now imports from worker-spawner.js and resolves
WORKER_SCRIPT_PATH via __dirname so the daemon can be auto-started
for MCP-only clients without dragging in the entire worker bundle.
4. resolveWorkerRuntimePath() now searches for Bun on every platform
(not just Windows). worker-service.cjs requires Bun at runtime, so
when the spawner is invoked from a Node process the Unix branch can
no longer fall through to process.execPath (= node).
5. spawnDaemon's Unix branch now calls resolveWorkerRuntimePath() instead
of hardcoding process.execPath, fixing the same Node-spawning-Node bug
for the actual subprocess launch on Linux/macOS.
After:
- mcp-server.cjs is 384KB again with zero `bun:sqlite` references
- node mcp-server.cjs initializes and serves tools/list + tools/call
(verified via JSON-RPC against the running worker)
- ProcessManager test suite updated for the new cross-platform Bun
resolution behavior; full suite has the same pre-existing failures
as main, no regressions
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* fix(mcp): address PR #1645 review feedback (round 1)
Per Claude Code Review on PR #1645:
1. mcp-server.ts: log a warning when both __dirname and import.meta.url
resolution fail. The cwd() fallback is essentially dead code for the
CJS bundle but if it ever fires it gives the user a breadcrumb instead
of a silently-wrong WORKER_SCRIPT_PATH.
2. mcp-server.ts: existsSync check on WORKER_SCRIPT_PATH at module load.
Surfaces a clear "worker-service.cjs not found at expected path" log
line for partial installs / dev environments instead of letting the
failure surface as a generic spawnDaemon error later.
3. ProcessManager.ts: explanatory comment on the Windows `return 0`
sentinel in spawnDaemon. Documents that PowerShell Start-Process
doesn't return a PID and that callers MUST use `pid === undefined`
for failure detection — never falsy checks like `if (!pid)`.
Items 4 (no direct unit tests for the worker-spawner Windows cooldown
helpers) and 5 (process-manager.test.ts uses real ~/.claude-mem path)
are deferred — the reviewer flagged the latter as out of scope, and
the former needs an injectable-I/O refactor that isn't appropriate
for a hotfix bugfix PR.
Verified: build clean, mcp-server.cjs still 384KB / zero bun:sqlite,
JSON-RPC tools/list still returns the 7-tool surface, ProcessManager
test suite still 43/43.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* fix(spawner): mkdir CLAUDE_MEM_DATA_DIR before writing Windows cooldown marker
Per CodeRabbit on PR #1645: on a fresh user profile, the data dir may not
exist yet when markWorkerSpawnAttempted() runs. writeFileSync would throw
ENOENT, the catch would swallow it, and the marker would never be created
— defeating the popup-loop protection this helper exists to provide.
mkdirSync(dir, { recursive: true }) is a no-op when the directory already
exists, so it's safe to call on every spawn attempt.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* docs(spawner): add APPROVED OVERRIDE annotations for cooldown marker catches
Per CodeRabbit on PR #1645: silent catch blocks at spawn-cooldown sites
should carry the APPROVED OVERRIDE annotation that the rest of the
codebase uses (see ProcessManager.ts:689, BaseRouteHandler.ts:82,
ChromaSync.ts:288).
Both catches are intentional best-effort:
- markWorkerSpawnAttempted: if mkdir/writeFileSync fails, the worker
spawn itself will almost certainly fail too. Surfacing that downstream
is far more useful than a noisy log line about a lock file.
- clearWorkerSpawnAttempted: a stale marker is harmless. Worst case is
one suppressed retry within the cooldown window, then self-heals.
No behaviour change. Resolves the second half of CodeRabbit's lines
38-65 comment on worker-spawner.ts.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* fix(mcp): address PR #1645 review feedback (round 2)
Round 2 of Claude Code Review feedback on PR #1645:
Build guardrail (most important — protects the regression this PR fixes):
- scripts/build-hooks.js: post-build check that fails the build if
mcp-server.cjs ever contains a `bun:sqlite` reference. This is the
exact regression PR #1645 fixed; future contributors will get an
immediate, actionable error if a transitive import re-introduces it.
Verified the check trips when violated.
Code clarity:
- src/servers/mcp-server.ts: drop dead `_originalLog` capture — it was
never restored. Less code is fewer bugs.
- src/servers/mcp-server.ts: elevate `cwd()` fallback log from WARN to
ERROR. Per reviewer: a wrong WORKER_SCRIPT_PATH means worker auto-start
silently fails, so the breadcrumb should be loud and searchable.
- src/services/worker-service.ts: extended doc comment on the
`ensureWorkerStartedShared(port, __filename)` wrapper explaining why
`__filename` is the correct script path here (CJS bundle = compiled
worker-service.cjs) and why mcp-server.ts can't use the same trick.
- src/services/infrastructure/ProcessManager.ts: inline comment on the
`env.BUN === 'bun'` bare-command guard explaining why it's reachable
even though `isBunExecutablePath('bun')` is true (pathExists returns
false for relative names, so the second branch is what fires).
Coverage:
- src/services/infrastructure/ProcessManager.ts: add `/usr/bin/bun` to
the Linux candidate paths so apt-installed Bun on Debian/Ubuntu is
found without falling through to the PATH lookup.
Out-of-scope items (deferred with rationale in PR replies):
- Unit tests for ensureWorkerStarted / Windows cooldown helpers — needs
injectable-I/O refactor unsuitable for a hotfix.
- Sentinel object for Windows spawnDaemon `0` — broader API change.
- Windows Scoop install path — follow-up for a future PR.
- runOneTimeChromaMigration placement, aggressiveStartupCleanup,
console.log redirect timing, platform timeout multiplier — all
pre-existing and unrelated to this regression.
Verified: build clean, guardrail trips on simulated violation,
mcp-server.cjs still 0 bun:sqlite refs, ProcessManager tests 43/43.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* fix(mcp): address PR #1645 review feedback (round 3)
Round 3 of Claude Code Review feedback on PR #1645:
ProcessManager.ts: improve actionability of "Bun not found" errors
Both Windows and Unix branches of spawnDaemon previously logged a vague
"Failed to locate Bun runtime" message when resolveWorkerRuntimePath()
returned null. Replaced with an actionable message that names the install
URL and explains *why* Bun is required (worker uses bun:sqlite). The
existing null-guard at the call sites already prevents passing null to
child_process.spawn — only the error text changed.
scripts/build-hooks.js: refine bun:sqlite guardrail to match actual
require() calls only
The previous coarse `includes('bun:sqlite')` check tripped on its own
improved error message, which legitimately mentions "bun:sqlite" by name.
Switched to a regex that matches `require("bun:sqlite")` /
`require('bun:sqlite')` (with optional whitespace, handles both quote
styles, handles minified output) so error messages and inline comments
can reference the module name without false positives. Verified the
regex still trips on real violations (both spaced and minified forms)
and correctly ignores string-literal mentions.
Other round-3 items (verified, not changed):
- TOOL_ENDPOINT_MAP: reviewer flagged as dead code, but it IS used at
lines 250 and 263 by the search and timeline tool handlers. False
positive — kept as-is.
- if (!pid) callsites: grepped src/, zero offenders. The Windows `0`
PID sentinel contract is safe; only the in-line documentation comment
in ProcessManager.ts mentions the anti-pattern.
- callWorkerAPIPost double-wrapping: pre-existing intentional behavior
(only used by /api/observations/batch which returns raw data, not
the MCP {content:[...]} shape). Unrelated to this regression.
- Snap path / startParentHeartbeat / main().catch / test for non-
existent workerScriptPath / etc — pre-existing or out of scope for
this hotfix, deferred per established disposition.
Verified: build clean, guardrail still trips on real violations,
mcp-server.cjs has 0 require("bun:sqlite") calls, JSON-RPC tools/list
returns the 7-tool surface, ProcessManager tests 43/43.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* test(spawnDaemon): contract test for Windows 0 PID success sentinel
Per CodeRabbit nitpick on PR #1645 commit 7a96b3b9: add a focused test
that documents the spawnDaemon return contract so any future contributor
who introduces `if (!pid)` against a spawnDaemon return value (or its
wrapper) sees a failing assertion explaining why the falsy check is
incorrect.
The test deliberately exercises the JS-level semantics rather than
mocking PowerShell — a true mocked Windows test would require
refactoring spawnDaemon to take an injectable execSync, which is a
larger change than this hotfix should carry. The contract assertions
here catch the same regression class (treating Windows success as
failure) without that refactor.
Verified: bun test tests/infrastructure/process-manager.test.ts now
passes 44/44 (was 43/43).
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* fix(mcp): address PR #1645 review feedback (round 4)
Round 4 of Claude Code Review feedback on PR #1645 (review of round-3
commit 193286f9):
tests/infrastructure/process-manager.test.ts: replace require('fs')
with the already-imported statSync. Reviewer correctly flagged that
the file uses ESM-style named imports everywhere else and the inline
require() calls would break under strict ESM. Two callsites updated
in the touchPidFile test.
src/services/infrastructure/ProcessManager.ts: hoist
resolveWorkerRuntimePath() and the `Bun runtime not found` error
handling out of both branches in spawnDaemon. Both Windows and Unix
branches need the same Bun lookup, and resolving once before the OS
branch split avoids a duplicate execSync('which bun')/where bun in the
no-well-known-path fallback. The error message is also DRY now —
single source of truth instead of two near-identical strings.
CodeRabbit confirmed in its previous reply that "All actionable items
across all four review rounds are fully resolved" — these two minor
items from claude-review of round 3 are the only remaining cleanup.
Verified: build clean, ProcessManager tests still 44/44.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* fix(mcp): address PR #1645 review feedback (round 5)
Round 5 of Claude Code Review feedback on PR #1645:
src/services/worker-spawner.ts: drop `export` from internal helpers
`shouldSkipSpawnOnWindows`, `markWorkerSpawnAttempted`, and
`clearWorkerSpawnAttempted` were exported even though they were
private in worker-service.ts and nothing outside this module needs
them. Removing the `export` keyword keeps the public surface to just
`ensureWorkerStarted` and prevents future callers from bypassing the
spawn lifecycle.
scripts/build-hooks.js: broaden guardrail to all bun:* modules
Previously the regex only caught `require("bun:sqlite")`, but every
module in the `bun:` namespace (bun:ffi, bun:test, etc.) is Bun-only
and would crash mcp-server.cjs the same way under Node. Generalized
the regex to `require("bun:[a-z][a-z0-9_-]*")` so a transitive import
of any Bun-only module fails the build instead of shipping a broken
bundle. Verified the new regex still trips on bun:sqlite, bun:ffi,
bun:test, and correctly ignores string-literal mentions in error
messages.
src/servers/mcp-server.ts: attribute root cause when dirname resolution fails
Previously, if `__dirname`/`import.meta.url` resolution failed and we
fell back to `process.cwd()`, the user would see two warnings: an
error about the dirname fallback AND a separate warning about the
missing worker bundle. The second warning hides the root cause —
someone debugging would assume the install is broken when really it's
a dirname-resolution failure. Track the failure with a flag and emit
a single root-cause-attributing log line in the existence-check
branch instead. The dirname fallback paths are still functionally
unreachable in CJS deployment; this just makes the failure mode
unmistakable if it ever does fire.
Out of scope (consistent with prior rounds):
- darwin/linux split for non-Windows candidate paths (benign today)
- Integration test for non-existent workerScriptPath (test coverage
gap deferred since rounds 1-2)
- Defer existsSync check to first ensureWorkerStarted call (current
module-init check is the loud signal we want)
Already addressed in earlier rounds:
- resolveWorkerRuntimePath() called twice in spawnDaemon → hoisted in
round 4 (b2c114b4)
- _originalLog dead code → removed in round 2 (7a96b3b9)
Verified: build clean, broadened guardrail trips on bun:sqlite,
bun:ffi, and bun:test (and ignores string literals), MCP server
serves the 7-tool surface, ProcessManager tests still 44/44.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* fix(mcp): address PR #1645 review feedback (round 6)
Round 6 of Claude Code Review feedback on PR #1645:
src/services/worker-spawner.ts: validate workerScriptPath at entry
Add an empty-string + existsSync guard at the top of ensureWorkerStarted.
Without this, a partial install or upstream path-resolution regression
just surfaces as a low-signal child_process error from spawnDaemon. The
explicit log line at the entry point makes that class of bug much
easier to diagnose. The mcp-server.ts module-init existsSync check
already covers this for the MCP-server caller, but defending at the
spawner level reinforces the contract for any future caller.
src/services/worker-spawner.ts: document SettingsDefaultsManager
dependency boundary in the module header
The spawner imports from SettingsDefaultsManager, ProcessManager, and
HealthMonitor. None of those currently touch bun:sqlite, but if any
of them ever does, the spawner's SQLite-free contract silently breaks.
The build guardrail in build-hooks.js is the only thing that catches
it. Header comment now flags this so future contributors audit
transitive imports when adding helpers from the shared/infrastructure
layers.
src/services/infrastructure/ProcessManager.ts: add /snap/bin/bun
Ubuntu Snap install path. Now alongside the existing apt path
(/usr/bin/bun) and Homebrew/Linuxbrew paths. The PATH lookup catches
it as fallback, but listing it explicitly avoids paying for an
execSync('which bun') in the common case.
src/servers/mcp-server.ts: elevate missing-bundle log warn → error
A missing worker-service.cjs means EVERY MCP tool call that needs the
worker silently fails. That's a broken-install state, not a transient
condition — match the severity of the dirname-fallback branch above
(which is already ERROR).
Out of scope (consistent with prior rounds, reviewer agrees these are
appropriately deferred):
- Streaming bundle read in build-hooks.js (nit at current 384KB size)
- Unit tests for ensureWorkerStarted / cooldown helpers
- Integration test for non-existent workerScriptPath
Verified: build clean, broadened guardrail still trips on bun:* imports
and ignores string literals, MCP server serves the 7-tool surface,
ProcessManager tests still 44/44.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* fix(mcp): defer WORKER_SCRIPT_PATH check to first call (round 7)
Round 7 of Claude Code Review feedback on PR #1645:
src/servers/mcp-server.ts: extract module-level existsSync check into
checkWorkerScriptPath() and call it lazily from ensureWorkerConnection()
instead of at module load.
The early-warning intent is preserved (the check still fires before any
actual spawn attempt), but tests/tools that import this module without
booting the MCP server no longer see noisy ERROR-level log lines for a
worker bundle they never intended to start. The check is cheap and
idempotent, so calling it on every auto-start attempt is fine.
The two failure-mode branches (dirname-resolution failure vs simple
missing-bundle) remain unchanged — the function body is identical to
the previous module-level if-block, just hoisted into a function and
called from ensureWorkerConnection().
False positive (no change needed):
- Reviewer flagged `mkdirSync` as a dead import in worker-spawner.ts,
but it IS used at line 71 in markWorkerSpawnAttempted (the round-1
ENOENT fix CodeRabbit explicitly asked for).
Out of scope:
- Volta path (~/.volta/bin/bun) — PATH fallback handles it; nit per
reviewer
- worker-spawner.ts unit tests — needs injectable I/O, deferred
consistently since round 1
Verified: build clean, tests 44/44, smoke test 7-tool surface.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* fix(mcp): address PR #1645 review feedback (round 8)
Round 8 of Claude Code Review feedback on PR #1645:
tests/services/worker-spawner.test.ts: NEW FILE — unit tests for the
ensureWorkerStarted entry-point validation guards added in round 6.
Covers the empty-string and non-existent-path cases without requiring
the broader injectable-I/O refactor that the deeper spawn lifecycle
tests would need. 2 new passing tests.
src/services/infrastructure/ProcessManager.ts: memoize
resolveWorkerRuntimePath() for the no-options call site (which is what
spawnDaemon uses). Caches both successful resolutions and the
not-found result so repeated spawn attempts (crash loops, health
thrashing) don't repeatedly hit statSync on candidate paths. Tests
that pass options bypass the cache entirely so existing test cases
remain deterministic. Added resetWorkerRuntimePathCache() exported
for test isolation only.
src/servers/mcp-server.ts: rename checkWorkerScriptPath() →
warnIfWorkerScriptMissing(). Per reviewer: the old name implied a
boolean check but the function returns void and has side effects. New
name is more accurate.
DEFENDED (no change made):
- Reviewer asked to elevate process.cwd() fallback to a synchronous
throw at module load. This conflicts with round 7 feedback which
asked to defer the existsSync check to first call to avoid noisy
test logs. The current lazy approach is the right compromise: it
fires before any actual spawn attempt, attributes the root cause,
and doesn't pollute test imports. Throwing at module load would
crash before stdio is wired up, which is much harder to debug than
the lazy log line.
- Reviewer asked to grep for `if (!pid)` callsites — already verified
in round 3, zero offenders in src/.
Out of scope:
- Volta path (~/.volta/bin/bun) — PATH fallback handles it; reviewer
marked as nit
- Deeper unit tests for ensureWorkerStarted spawn lifecycle (PID file
cleanup, health checks, etc.) — needs injectable I/O, deferred
consistently since round 1
Verified: build clean, ProcessManager tests still 44/44, new
worker-spawner tests 2/2, smoke test serves 7 tools.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* fix(spawner): clear Windows cooldown marker on all healthy paths (round 9)
Round 9 of PR #1645 review feedback.
src/services/worker-spawner.ts: clear stale Windows cooldown marker
on every healthy-return path
Per CodeRabbit (genuine bug):
The .worker-start-attempted marker was previously only cleared after
a spawn initiated by ensureWorkerStarted itself succeeded. If a
previous auto-start failed, then the worker became healthy via
another session or a manual start, the early-return success branches
(existing live PID, fast-path health check, port-in-use waitForHealth)
would leave the stale marker behind. A subsequent genuine outage
inside the 2-minute cooldown window would then be incorrectly
suppressed on Windows.
Now calls clearWorkerSpawnAttempted() on all three healthy success
paths in addition to the existing post-spawn path. The function is
already a no-op on non-Windows, so the change is risk-free for Linux
and macOS callers.
src/servers/mcp-server.ts: more actionable error when auto-start fails
Per claude-review: when ensureWorkerStarted returns false (or throws),
the caller currently logs a generic "Worker auto-start failed" line.
Updated both error sites to explicitly call out which MCP tools will
fail (search/timeline/get_observations) and to point at earlier log
lines for the specific cause. Helps users distinguish "worker is just
not running" from "tools are broken".
DEFENDED (no change):
- Sentinel object for Windows spawnDaemon 0 PID — broader API change,
out of scope, deferred consistently since round 1
- Spawner lifecycle tests beyond input validation — needs injectable
I/O, deferred consistently
- Concurrent cooldown marker race on Windows — pre-existing,
out of scope
- stripHardcodedDirname() regex fragility assertion — pre-existing,
out of scope
Verified: build clean, ProcessManager tests 44/44, worker-spawner
tests 2/2, smoke test 7-tool surface.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* fix(spawner): don't cache null Bun-not-found result (round 10)
Round 10 of PR #1645 review feedback.
src/services/infrastructure/ProcessManager.ts: only cache successful
resolveWorkerRuntimePath() results
Genuine bug from claude-review: the round-8 memoization cached BOTH
successful resolutions AND the not-found `null` result. If Bun isn't
on PATH at the moment the MCP server first tries to spawn the worker
— e.g., on a fresh install where the user installs Bun in another
terminal and retries — every subsequent ensureWorkerConnection call
would return the cached `null` and fail with a misleading "Bun not
found" error even though Bun is now available.
The fix is the one-line change the reviewer suggested: only cache
when `result !== null`. Crash loops still get the fast-path memoized
success; recovery from a fresh-install Bun install still works.
src/servers/mcp-server.ts: rename warnIfWorkerScriptMissing →
errorIfWorkerScriptMissing
Per claude-review: the function uses logger.error but the name says
"warn" — name/level mismatch. Renamed to match. The function still
serves the same purpose (defensive lazy check), just with an accurate
name.
DEFENDED (no change):
- Discriminated union for mcpServerDirResolutionFailed flag — current
approach works, the noise is minimal, and the alternative would
add type complexity for a path that's functionally unreachable in
CJS deployment
- macOS /usr/local/bin/bun "missing" — already in the Linux/macOS
candidate list at line 137 (false positive from reviewer)
- nix store path — out of scope, PATH fallback handles it
- Long build-hooks.js error message — verbosity is intentional, this
message only fires on a real regression and the diagnostic value is
worth the line wrap
- Spawner lifecycle test coverage gap — needs injectable I/O,
deferred consistently
Verified: build clean, ProcessManager tests 44/44, worker-spawner
tests 2/2, smoke test 7-tool surface.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* fix(mcp): bundle size budget guardrail (round 11)
Round 11 of PR #1645 review feedback.
scripts/build-hooks.js: secondary bundle-size budget guardrail
Per claude-review: the existing `require("bun:*")` regex catches the
specific regression class we already know about, but if esbuild ever
changes how it emits external module specifiers, the regex could
silently miss the regression. A bundle-size budget catches the
structural symptom (worker-service.ts dragged into the bundle blew
the size from ~358KB to ~1.96MB) regardless of how the imports look.
Set the ceiling at 600KB. Current size is ~384KB; the broken v12.0.0
bundle was ~1920KB. Plenty of headroom for legitimate growth without
incentivizing bundle bloat or false positives. Both guardrails fire
independently — one is regex-based, one is size-based — so a
regression has to defeat both to ship.
tests/services/worker-spawner.test.ts: comment about port irrelevance
Per claude-review: the hardcoded port values in the validation-guard
tests are arbitrary because the path validation short-circuits before
any network I/O. Added a comment explaining this so future readers
don't waste time wondering why specific ports were picked.
DEFENDED (no change):
- clearWorkerSpawnAttempted on the unhealthy-live-PID return path:
reviewer asked to clear the marker here too, but the current
behavior is correct. The marker tracks "recently attempted a spawn"
and exists to prevent rapid PowerShell-popup loops. If a wedged
process is currently using the port, the spawn isn't actually
happening on this code path (the helper returns false without
reaching the spawn step). When the wedged process eventually dies
and a subsequent call hits the spawn path, the marker correctly
suppresses repeated retry attempts within the 2-minute cooldown.
Clearing the marker on the unhealthy-return path would defeat
exactly the popup-loop protection the marker exists to provide.
- execSync in lookupBinaryInPath blocks event loop: pre-existing
concern, not introduced by this PR. Reviewer notes "fires once,
result cached". Not in scope for a hotfix.
- Tracking issue for spawner lifecycle test gap: out of scope for
this PR; the gap is documented in the test file's header comment
with a back-reference to PR #1645.
Verified: build clean, both guardrails functional (size budget is
under the new ceiling), ProcessManager tests 44/44, worker-spawner
tests 2/2, smoke test 7-tool surface.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* fix(mcp): eliminate double error log when worker bundle is missing (round 12)
Round 12 of PR #1645 review feedback.
src/servers/mcp-server.ts: errorIfWorkerScriptMissing() now only logs
when the dirname-fallback attribution path is needed
Previously a missing worker-service.cjs would produce two ERROR log
lines on the same code path:
1. errorIfWorkerScriptMissing() in ensureWorkerConnection()
2. The existsSync guard inside ensureWorkerStarted()
The simple "missing bundle" case is fully covered by the spawner's
own existsSync guard. The mcp-server.ts function now ONLY logs when
mcpServerDirResolutionFailed is true — that's the mcp-server-specific
root-cause attribution that the spawner cannot provide on its own.
Net effect: same single error log per bug class, cleaner triage.
DEFENDED (no change):
- mkdirSync error propagation in markWorkerSpawnAttempted: reviewer
worried that mkdirSync/writeFileSync exceptions could escape, but
the entire body is already wrapped in try/catch with an APPROVED
OVERRIDE annotation. False positive.
- clearWorkerSpawnAttempted on healthy paths: reviewer asked a
clarifying question, not a change request. The behavior is
intentional — the cooldown marker exists to prevent rapid
PowerShell-popup loops from a series of failed spawns; a healthy
worker means the marker has served its purpose and a future
outage should NOT be suppressed. Will explain in PR reply.
- __filename ESM concern in worker-service.ts wrapper: already
documented in round 4 with an extended comment about the CJS
bundle context and why mcp-server.ts can't use the same trick.
- Spawn lifecycle integration tests: deferred consistently since
round 1; gap is documented in worker-spawner.test.ts header.
Verified: build clean, ProcessManager tests 44/44, worker-spawner
tests 2/2, smoke test 7-tool surface.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* test(spawner): add bare-command BUN env override coverage
Final round of PR #1645 review feedback: while preparing to merge, I
noticed CodeRabbit's round-5 CHANGES_REQUESTED review on commit
3570d2f0 included an unaddressed nitpick — the env-driven bare-command
branch in resolveWorkerRuntimePath() (returning a bare 'bun' unchanged
when BUN or BUN_PATH is set that way) had no test coverage and could
regress without any failing assertion.
Added a focused test that exercises the env: { BUN: 'bun' } branch
specifically. 47/47 tests pass (was 46/46).
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
---------
Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Observer prompt now explicitly requires XML observation blocks or empty
responses — prose explanations like "Skipping" are discarded. ResponseProcessor
logs a warning when non-XML content is received. Recording focus expanded to
include concrete debugging findings.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* fix: resolve 3 upstream bugs in summarize, ChromaSync, and HealthMonitor
1. summarize.ts: Skip summary when transcript has no assistant message.
Prevents error loop where empty transcripts cause repeated failed
summarize attempts (~30 errors/day observed in production).
2. ChromaSync.ts: Fallback to chroma_update_documents when add fails
with "IDs already exist". Handles partial writes after MCP timeout
without waiting for next backfill cycle.
3. HealthMonitor.ts: Replace HTTP-based isPortInUse with atomic socket
bind on Unix. Eliminates TOCTOU race when two sessions start
simultaneously (HTTP check is non-atomic — both see "port free"
before either completes listen()). Updated tests accordingly.
All three bugs are pre-existing in v10.5.5. Confirmed via log analysis
of 543K lines over 17 days of production usage across two servers.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* chore: add CONTRIB_NOTES.md to gitignore
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* fix: address CodeRabbit review on PR #1566
- HealthMonitor: add APPROVED OVERRIDE annotation for Win32 HTTP fallback
- ChromaSync: replace chroma_update_documents with delete+add for proper
upsert (update only modifies existing IDs, silently ignores missing ones)
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
---------
Co-authored-by: Alessandro Costa <alessandro@claudio.dev>
Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- cpSync now does rmSync before copy to avoid stale file merges
- setupIDEs() returns failed IDE list; install reports partial success
- runSmartInstall() returns boolean status instead of void
- Worker port in next-steps URL reads CLAUDE_MEM_WORKER_PORT env var
- Goose YAML regex stops at column-0 keys (prevents eating sibling sections)
- AGENTS.md uninstall removes header-only stub files
- findBunPath() validated before use in WindsurfHooksInstaller
- Cursor marked unsupported in ide-detection until installer is wired
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- Add shebang banner to NPX CLI esbuild config so npx claude-mem works
- Remove manual backslash pre-escaping in WindsurfHooksInstaller (JSON.stringify handles it)
- Scope cache deletion to claude-mem only, not entire vendor namespace
- Use getWorkerPort() in OpenCodeInstaller instead of hard-coded 37777
- Throw on corrupt JSON in readJsonSafe/readGeminiSettings/Windsurf to prevent data loss
- Fix Cursor install stub to warn instead of silently succeeding
- Fix Gemini uninstall to remove individual hooks within groups, not whole groups
- Update tests for new corrupt-file-throws behavior
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
System reminders (CLAUDE.md contents, deferred tool lists) were being
stored in memory observations. Add system-reminder to the tag stripping
pipeline alongside <private> and <system_instruction>, and extract the
duplicated regex into a shared SYSTEM_REMINDER_REGEX constant.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
The fallback path for CLAUDE_PLUGIN_ROOT was pointing to the old
marketplaces install location which no longer exists. Hooks now first
try to find the latest versioned cache directory
(~/.claude/plugins/cache/thedotmack/claude-mem/<version>/) using ls -dt,
with the marketplaces path kept as a final fallback.
This mirrors the self-resolution pattern already used in bun-runner.js
(resolve(__bun_runner_dirname, '..')) but at the shell level, so node
can find bun-runner.js in the first place.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
When an observation response accidentally contains a <summary> tag with
plain text (no <request>/<investigated>/etc. sub-tags), parseSummary was
creating empty SESSION SUMMARY records with all fields as empty strings.
Add an all-null guard AFTER field extraction: if none of the 5 sub-tags
matched, the <summary> match is a false positive and we return null.
This is distinct from the commented-out validation above (which rejected
summaries with SOME missing fields). We only reject when ALL are absent —
real partial summaries are still saved per the maintainer's explicit note.
Closes#1360
Co-Authored-By: Claude <noreply@anthropic.com>
JSON.parse('/path/to/file') throws SyntaxError, crashing the viewer and
any code reading observations with legacy bare-path data in those columns.
- Add parseFileList() helper in observations/files.ts — tries JSON.parse,
falls back to wrapping bare strings in an array
- Replace unsafe JSON.parse calls in files.ts, SessionStore.ts, ChromaSync.ts
- Add 9 unit tests covering null, empty, valid JSON, bare paths, invalid JSON
Closes#1359
Co-Authored-By: Claude <noreply@anthropic.com>
completeByDbId only cleaned up in-memory state, leaving sdk_sessions rows
with status='active' and completed_at=NULL indefinitely. Ghost sessions
accumulated and exhausted the agent pool, causing 60s timeout errors.
- Add SessionStore.markSessionCompleted() to set status/completed_at/completed_at_epoch
- Call it at the start of completeByDbId before in-memory cleanup
- Inject SessionStore into SessionCompletionHandler via constructor
- Add 4 tests covering status, timestamps, isolation, and non-existent IDs
Closes#1532
Co-Authored-By: Claude <noreply@anthropic.com>
CLAUDE_MEM_MODEL defaulted to the deprecated claude-sonnet-4-5 across source,
installer, tests, and documentation. Updated all references to claude-sonnet-4-6.
Closes#1390
Co-Authored-By: Claude <noreply@anthropic.com>
When CLAUDE_MEM_FOLDER_USE_LOCAL_MD is set to 'true' in settings,
claude-mem writes auto-generated context to CLAUDE.local.md instead
of CLAUDE.md. This separates personal machine-generated context from
shared project instructions, aligning with Claude Code's native
CLAUDE.local.md convention where:
- CLAUDE.md = team-shared project instructions (checked into git)
- CLAUDE.local.md = personal/local context (gitignored)
Changes:
- Add CLAUDE_MEM_FOLDER_USE_LOCAL_MD setting (default: false)
- Add getTargetFilename() helper to resolve target based on settings
- Update writeClaudeMdToFolder() to accept optional target filename
- Update active-file detection to skip folders with either CLAUDE.md
or CLAUDE.local.md being actively read/modified (issue #859 compat)
- Add 8 new tests covering filename selection, write behavior,
content preservation, atomic writes, and active-file detection
Closes#632
- Update POST_SPAWN_WAIT test assertion from 5000 to 15000 to match
the constant change in hook-constants.ts
- Remove redundant readPidFile() from aggressiveStartupCleanup() —
start() writes the new PID before this runs, so it always returns
process.pid (already protected)
- Add waitForReadiness() to the reused-worker path in
ensureWorkerStarted() to prevent concurrent hooks from racing
past a cold-starting worker's initialization guard
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Fixes#1478
When a terminal reports cwd as '~' or '~/subpath' instead of the full
path, getProjectName() fell through to the 'unknown-project' fallback
because path.basename('~') returns '~' as-is.
Added expandTilde() helper that resolves leading ~ to os.homedir(),
called in both getProjectName() and getProjectContext() before path
operations and worktree detection.
1. Fix esbuild inlining build-machine __dirname as string literal — use
CJS-compatible runtime banner with require("node:url").fileURLToPath
across worker-service, mcp-server, and context-generator builds.
2. Fix isMainModule check missing .cjs extension and Windows backslash
path normalization.
3. Wrap extractLastMessage in try-catch to prevent infinite Stop hook
feedback loop on malformed transcripts (exit 0 instead of exit 2).
4. Replace heavy SessionEnd hook (Node→Bun→1.7MB CJS→HTTP) with
lightweight inline node -e one-liner (~200ms vs >1s).
5. Add 7 Gemini/OpenRouter error patterns to unrecoverablePatterns
circuit breaker to prevent 77K+ retry loops on expired API keys.
6. Preserve CLAUDE_CODE_OAUTH_TOKEN and CLAUDE_CODE_GIT_BASH_PATH in
sanitizeEnv instead of stripping them with the CLAUDE_CODE_ prefix.
7. Use PowerShell -EncodedCommand for spawnDaemon to fix path quoting
when Windows usernames contain spaces.
Closes#1515, #1495, #1475, #1465, #1500, #1513, #1512, #1450, #1460,
#1486, #1449, #1481, #1451, #1480, #1453, #1445
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Fields concatenated without separators allowed different tuples to produce
identical hashes (e.g. session="ab", title="cd" vs session="abc", title="d").
This could cause legitimate observations to be silently deduplicated.
Join with \x00 so field boundaries are unambiguous.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Tighten platform source persistence so legacy callers cannot silently relabel existing sessions, repair migration 24 when schema_versions drifts from the real schema, and polish the follow-up UI/error-handler review nits.
- only backfill platform_source when it is blank and raise on explicit source conflicts for an existing session
- make migration 24 verify both the sdk_sessions column and its index before treating it as applied
- expose platform_source from the functional session getters and add regression tests for source preservation and schema drift recovery
- add the required APPROVED OVERRIDE annotation for centralized HTTP error translation
- keep mobile source pills on a single horizontal row
GeminiAgent sends the full conversation history with every API call,
causing quadratic token growth per session. A 100-observation session
sends ~30M cumulative input tokens. This ports the proven truncateHistory()
sliding window from OpenRouterAgent to GeminiAgent.
- Add CLAUDE_MEM_GEMINI_MAX_CONTEXT_MESSAGES (default: 20) and
CLAUDE_MEM_GEMINI_MAX_TOKENS (default: 100000) settings
- Add truncateHistory() to GeminiAgent using shared estimateTokens()
- Always preserve at least the newest message to avoid empty API requests
- Add settings validation in SettingsRoutes (1-100 messages, 1K-1M tokens)
- Add regression tests for truncation and oversized single-prompt edge case
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* fix: stop spinner from spinning forever due to orphaned DB messages
The activity spinner never stopped because isAnySessionProcessing() queried
ALL pending/processing messages in the database, including orphaned messages
from dead sessions that no generator would ever process.
Root cause: isAnySessionProcessing() used hasAnyPendingWork() which is a
global DB scan. Changed it to use getTotalQueueDepth() which only checks
sessions in the active in-memory Map.
Additional fixes:
- Add terminateSession() to enforce restart-or-terminate invariant
- Fix 3 zombie paths in .finally() handler that left sessions alive
- Clean up idle sessions from memory on successful completion
- Remove redundant bare isProcessing:true broadcast
- Replace inline require() with proper accessor
- Add 8 regression tests for session termination invariant
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* fix: address review findings — idle-timeout race, double broadcast, query amplification
- Move pendingCount check before idle-timeout termination to prevent
abandoning fresh messages that arrive between idle abort and .finally()
- Move broadcastProcessingStatus() inside restart branch only — the else
branch already broadcasts via removeSessionImmediate callback
- Compute queueDepth once in broadcastProcessingStatus() and derive
isProcessing from it, eliminating redundant double iteration
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
---------
Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
ColorFormatter and MarkdownFormatter names obscured their actual purpose.
The formatters serve two distinct audiences: the AI agent (compressed,
token-efficient context) and the human (rich ANSI-colored terminal output).
- MarkdownFormatter → AgentFormatter (renderMarkdown* → renderAgent*)
- ColorFormatter → HumanFormatter (renderColor* → renderHuman*)
- useColors parameter → forHuman across the pipeline
- Import aliases Color/Markdown → Human/Agent
- API query param `colors=true` unchanged (backward compatible)
Pure rename refactor — no logic or behavior changes.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* feat: strip <system_instruction> tags before database storage
Extends the existing tag-stripping mechanism (used for <private> and
<claude-mem-context>) to also filter Conductor-injected system instructions,
preventing them from being persisted in the observation database.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* feat: also strip <system-instruction> (hyphen variant) before DB storage
Conductor uses both <system_instruction> and <system-instruction> tag
formats. This adds the hyphen variant to the same stripping mechanism.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
---------
Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* feat: add embedded Process Supervisor for unified process lifecycle management
Consolidates scattered process management (ProcessManager, GracefulShutdown,
HealthMonitor, ProcessRegistry) into a unified src/supervisor/ module.
New: ProcessRegistry with JSON persistence, env sanitizer (strips CLAUDECODE_*
vars), graceful shutdown cascade (SIGTERM → 5s wait → SIGKILL with tree-kill
on Windows), PID file liveness validation, and singleton Supervisor API.
Fixes#1352 (worker inherits CLAUDECODE env causing nested sessions)
Fixes#1356 (zombie TCP socket after Windows reboot)
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* feat: add session-scoped process reaping to supervisor
Adds reapSession(sessionId) to ProcessRegistry for killing session-tagged
processes on session end. SessionManager.deleteSession() now triggers reaping.
Tightens orphan reaper interval from 60s to 30s.
Fixes#1351 (MCP server processes leak on session end)
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* feat: add Unix domain socket support for worker communication
Introduces socket-manager.ts for UDS-based worker communication, eliminating
port 37777 collisions between concurrent sessions. Worker listens on
~/.claude-mem/sockets/worker.sock by default with TCP fallback.
All hook handlers, MCP server, health checks, and admin commands updated to
use socket-aware workerHttpRequest(). Backwards compatible — settings can
force TCP mode via CLAUDE_MEM_WORKER_TRANSPORT=tcp.
Fixes#1346 (port 37777 collision across concurrent sessions)
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* fix: remove in-process worker fallback from hook command
Removes the fallback path where hook scripts started WorkerService in-process,
making the worker a grandchild of Claude Code (killed by sandbox). Hooks now
always delegate to ensureWorkerStarted() which spawns a fully detached daemon.
Fixes#1249 (grandchild process killed by sandbox)
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* feat: add health checker and /api/admin/doctor endpoint
Adds 30-second periodic health sweep that prunes dead processes from the
supervisor registry and cleans stale socket files. Adds /api/admin/doctor
endpoint exposing supervisor state, process liveness, and environment health.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* test: add comprehensive supervisor test suite
64 tests covering all supervisor modules: process registry (18 tests),
env sanitizer (8), shutdown cascade (10), socket manager (15), health
checker (5), and supervisor API (6). Includes persistence, isolation,
edge cases, and cross-module integration scenarios.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* fix: revert Unix domain socket transport, restore TCP on port 37777
The socket-manager introduced UDS as default transport, but this broke
the HTTP server's TCP accessibility (viewer UI, curl, external monitoring).
Since there's only ever one worker process handling all sessions, the
port collision rationale for UDS doesn't apply. Reverts to TCP-only,
removing ~900 lines of unnecessary complexity.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* chore: remove dead code found in pre-landing review
Remove unused `acceptingSpawns` field from Supervisor class (written but
never read — assertCanSpawn uses stopPromise instead) and unused
`buildWorkerUrl` import from context handler.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* updated gitignore
* fix: address PR review feedback - downgrade HTTP logging, clean up gitignore, harden supervisor
- Downgrade request/response HTTP logging from info to debug to reduce noise
- Remove unused getWorkerPort imports, use buildWorkerUrl helper
- Export ENV_PREFIXES/ENV_EXACT_MATCHES from env-sanitizer, reuse in Server.ts
- Fix isPidAlive(0) returning true (should be false)
- Add shutdownInitiated flag to prevent signal handler race condition
- Make validateWorkerPidFile testable with pidFilePath option
- Remove unused dataDir from ShutdownCascadeOptions
- Upgrade reapSession log from debug to warn
- Rename zombiePidFiles to deadProcessPids (returns actual PIDs)
- Clean up gitignore: remove duplicate datasets/, stale ~*/ and http*/ patterns
- Fix tests to use temp directories instead of relying on real PID file
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
---------
Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* fix: always pass --ssl flag to chroma-mcp in remote mode
The chroma-mcp CLI defaults to SSL when using --client-type http.
When CLAUDE_MEM_CHROMA_SSL is false (the common case for local
ChromaDB servers), buildCommandArgs() omitted --ssl entirely,
causing chroma-mcp to attempt an SSL connection to a plain HTTP
server and fail with "Could not connect to a Chroma server".
Always pass --ssl with an explicit true/false value so the user's
CLAUDE_MEM_CHROMA_SSL setting is faithfully forwarded.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
* test: add regression tests for ChromaMcpManager SSL flag fix
Adds 4 focused test cases verifying buildCommandArgs() produces correct
--ssl args, covering SSL=false, SSL=true, unset (defaults to false), and
local mode (no --ssl flag). Requested by @xkonjin in PR #1286 review.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
* fix: rebuild checked-in bundles to include SSL flag fix
Rebuild all bundles against upstream/main so the --ssl <true|false>
fix is present in the runtime artifacts that hooks and the marketplace
plugin actually execute.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
---------
Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
When a claude-mem DB is synced between machines running different versions,
orphaned indexes can reference non-existent columns (e.g. idx_observations_content_hash
referencing content_hash). This causes SQLite to throw "malformed database schema"
on ALL queries, including PRAGMAs, creating a silent 503 failure loop.
The fix detects this on startup, uses Python's sqlite3 module to drop the
orphaned schema objects (bun:sqlite doesn't support writable_schema modifications),
resets migration versions, and lets the idempotent migration system recreate
everything properly.
Fixes#1307
Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
* fix: remove unrecognized fields from Claude Code Stop hook output
Claude Code validates Stop hook JSON output against its hook contract
schema which only accepts {decision?, reason?, systemMessage?}. The
formatOutput() function was returning {continue, suppressOutput} which
are not part of the Claude Code hook API, causing "JSON validation
failed" errors on every session stop.
Return an empty object {} for the default case (no hookSpecificOutput),
preserving only systemMessage when present. This is valid for all hook
event types and eliminates the schema validation error.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
* test: add unhappy-path tests for formatOutput per PR review
Add edge case coverage for malformed input (undefined/null), falsy
systemMessage values, non-contract field stripping, and contract key
allowlist. Also add defensive null guard to formatOutput matching
normalizeInput pattern.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
---------
Co-authored-by: Alex Worland <alexworland@users.noreply.github.com>
Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
proc.killed only means Node sent a signal — the process can still be alive.
This caused premature pool slot release, allowing unbounded process spawning.
- ensureProcessExit: remove proc.killed from early-exit checks, only trust exitCode
- Fix 3 call-site guards that skipped cleanup for signaled-but-alive processes
- Add TOTAL_PROCESS_HARD_CAP=10 safety net in waitForSlot()
- After SIGKILL, wait up to 1s via exit event instead of blind 200ms sleep
- Reduce reaper interval from 5min to 1min, idle threshold from 2min to 1min
Closes#1226
Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
smart-install.js used stdio: 'inherit' for execSync calls, leaking
plain-text install output (bun/npm progress) to stdout. Claude Code
expects hook output to start with '{' (valid JSON), so this caused
a confusing "SessionStart:startup hook error" on every session start.
Changes:
- Pipe stdout on all execSync calls to prevent non-JSON stdout leak
- Output {"continue":true,"suppressOutput":true} on both success and
error paths so Claude Code always receives valid JSON
- Add tests verifying no stdio:'inherit' remains and JSON is output
Fixes#1253
Vibe-coded by Ousama Ben Younes
Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
* fix: resolve PostToolUse hook crashes and 5s latency (#1220)
Three compounding bugs caused hook failures:
1. Missing break statements in worker-service.ts switch — if async
code threw before process.exit(), execution fell through to
subsequent cases. Added break to all 7 cases missing them.
2. Unhandled promise rejection on main() — added .catch() that logs
the error and exits 0 (per project exit code strategy: don't block
Claude Code or leave Windows Terminal tabs open).
3. Redundant start commands in hooks.json — PostToolUse,
UserPromptSubmit, and Stop groups each had a standalone start
command that was redundant (the hook case already calls
ensureWorkerStarted internally). The redundant start also caused
5s latency via bun-runner.js collectStdin() timeout since Claude
Code never closes stdin.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
* fix: add CLAUDE_PLUGIN_ROOT fallback for Stop hooks (#1215)
Upstream Claude Code bug (anthropics/claude-code#24529) leaves
CLAUDE_PLUGIN_ROOT unset for Stop hooks on macOS and ALL hooks
on Linux. Two-layer defense:
1. Shell-level: hooks.json commands now use inline fallback
_R="${CLAUDE_PLUGIN_ROOT}"; [ -z "$_R" ] && _R="$HOME/...";
falling back to the known marketplace install path.
2. Script-level: bun-runner.js self-resolves plugin root from
its own filesystem location via import.meta.url, and fixes
broken /scripts/... paths that result from empty expansion.
Added test to verify all hook commands include the fallback path.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
---------
Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
* MAESTRO: fix ChromaDB core issues — Python pinning, Windows paths, disable toggle, metadata sanitization, transport errors
- Add --python version pinning to uvx args in both local and remote mode (fixes#1196, #1206, #1208)
- Convert backslash paths to forward slashes for --data-dir on Windows (fixes#1199)
- Add CLAUDE_MEM_CHROMA_ENABLED setting for SQLite-only fallback mode (fixes#707)
- Sanitize metadata in addDocuments() to filter null/undefined/empty values (fixes#1183, #1188)
- Wrap callTool() in try/catch for transport errors with auto-reconnect (fixes#1162)
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
* MAESTRO: fix data integrity — content-hash deduplication, project name collision, empty project guard, stuck isProcessing
- Add SHA-256 content-hash deduplication to observations INSERT (store.ts, transactions.ts, SessionStore.ts)
- Add content_hash column via migration 22 with backfill and index
- Fix project name collision: getCurrentProjectName() now returns parent/basename
- Guard against empty project string with cwd-derived fallback
- Fix stuck isProcessing: hasAnyPendingWork() resets processing messages older than 5 minutes
- Add 12 new tests covering all four fixes
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
* MAESTRO: fix hook lifecycle — stderr suppression, output isolation, conversation pollution prevention
- Suppress process.stderr.write in hookCommand() to prevent Claude Code showing diagnostic
output as error UI (#1181). Restores stderr in finally block for worker-continues case.
- Convert console.error() to logger.warn()/error() in hook-command.ts and handlers/index.ts
so all diagnostics route to log file instead of stderr.
- Verified all 7 handlers return suppressOutput: true (prevents conversation pollution #598, #784).
- Verified session-complete is a recognized event type (fixes#984).
- Verified unknown event types return no-op handler with exit 0 (graceful degradation).
- Added 10 new tests in tests/hook-lifecycle.test.ts covering event dispatch, adapter defaults,
stderr suppression, and standard response constants.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
* MAESTRO: fix worker lifecycle — restart loop coordination, stale transport retry, ENOENT shutdown race
- Add PID file mtime guard to prevent concurrent restart storms (#1145):
isPidFileRecent() + touchPidFile() coordinate across sessions
- Add transparent retry in ChromaMcpManager.callTool() on transport
error — reconnects and retries once instead of failing (#1131)
- Wrap getInstalledPluginVersion() with ENOENT/EBUSY handling (#1042)
- Verified ChromaMcpManager.stop() already called on all shutdown paths
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
* MAESTRO: fix Windows platform support — uvx.cmd spawn, PowerShell $_ elimination, windowsHide, FTS5 fallback
- Route uvx spawn through cmd.exe /c on Windows since MCP SDK lacks shell:true (#1190, #1192, #1199)
- Replace all PowerShell Where-Object {$_} pipelines with WQL -Filter server-side filtering (#1024, #1062)
- Add windowsHide: true to all exec/spawn calls missing it to prevent console popups (#1048)
- Add FTS5 runtime probe with graceful fallback when unavailable on Windows (#791)
- Guard FTS5 table creation in migrations, SessionSearch, and SessionStore with try/catch
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
* MAESTRO: fix skills/ distribution — build-time verification and regression tests (#1187)
Add post-build verification in build-hooks.js that fails if critical
distribution files (skills, hooks, plugin manifest) are missing. Add
10 regression tests covering skill file presence, YAML frontmatter,
hooks.json integrity, and package.json files field.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
* MAESTRO: fix MigrationRunner schema initialization (#979) — version conflict between parallel migration systems
Root cause: old DatabaseManager migrations 1-7 shared schema_versions table with
MigrationRunner's 4-22, causing version number collisions (5=drop tables vs add column,
6=FTS5 vs prompt tracking, 7=discovery_tokens vs remove UNIQUE). initializeSchema()
was gated behind maxApplied===0, so core tables were never created when old versions
were present.
Fixes:
- initializeSchema() always creates core tables via CREATE TABLE IF NOT EXISTS
- Migrations 5-7 check actual DB state (columns/constraints) not just version tracking
- Crash-safe temp table rebuilds (DROP IF EXISTS _new before CREATE)
- Added missing migration 21 (ON UPDATE CASCADE) to MigrationRunner
- Added ON UPDATE CASCADE to FK definitions in initializeSchema()
- All changes applied to both runner.ts and SessionStore.ts
Tests: 13 new tests in migration-runner.test.ts covering fresh DB, idempotency,
version conflicts, crash recovery, FK constraints, and data integrity.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
* MAESTRO: fix 21 test failures — stale mocks, outdated assertions, missing OpenClaw guards
Server tests (12): Added missing workerPath and getAiStatus to ServerOptions
mocks after interface expansion. ChromaSync tests (3): Updated to verify
transport cleanup in ChromaMcpManager after architecture refactor. OpenClaw (2):
Added memory_ tool skipping and response truncation to prevent recursive loops
and oversized payloads. MarkdownFormatter (2): Updated assertions to match
current output. SettingsDefaultsManager (1): Used correct default key for
getBool test. Logger standards (1): Excluded CLI transcript command from
background service check.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
* MAESTRO: fix Codex CLI compatibility (#744) — session_id fallbacks, unknown platform tolerance, undefined guard
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
* MAESTRO: fix Cursor IDE integration (#838, #1049) — adapter field fallbacks, tolerant session-init validation
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
* MAESTRO: fix /api/logs OOM (#1203) — tail-read replaces full-file readFileSync
Replace readFileSync (loads entire file into memory) with readLastLines()
that reads only from the end of the file in expanding chunks (64KB → 10MB cap).
Prevents OOM on large log files while preserving the same API response shape.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
* MAESTRO: fix Settings CORS error (#1029) — explicit methods and allowedHeaders in CORS config
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
* MAESTRO: add session custom_title for agent attribution (#1213) — migration 23, endpoint + store support
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
* MAESTRO: prevent CLAUDE.md/AGENTS.md writes inside .git/ directories (#1165)
Add .git path guard to all 4 write sites to prevent ref corruption when
paths resolve inside .git internals.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
* MAESTRO: fix plugin disabled state not respected (#781) — early exit check in all hook entry points
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
* MAESTRO: fix UserPromptSubmit context re-injection on every turn (#1079) — contextInjected session flag
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
* MAESTRO: fix stale AbortController queue stall (#1099) — lastGeneratorActivity tracking + 30s timeout
Three-layer fix:
1. Added lastGeneratorActivity timestamp to ActiveSession, updated by
processAgentResponse (all agents), getMessageIterator (queue yields),
and startGeneratorWithProvider (generator launch)
2. Added stale generator detection in ensureGeneratorRunning — if no
activity for >30s, aborts stale controller, resets state, restarts
3. Added AbortSignal.timeout(30000) in deleteSession to prevent
indefinite hang when awaiting a stuck generator promise
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
---------
Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
src/services/transcripts/cli.ts is a CLI command with user-visible
console output, not a background service — console.log is intentional.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Replace hardcoded marketplace path in plugin/scripts/smart-install.js with
resolveRoot() that uses CLAUDE_PLUGIN_ROOT env var (set by Claude Code for
all hooks), with fallback to script location and legacy paths. Fixes#1128,
#1166 where cache installs couldn't find or install node_modules.
Also fixes installCLI() path (ROOT/plugin/scripts/ → ROOT/scripts/) and adds
verifyCriticalModules() post-install check with npm fallback on failure.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Add string-to-array coercion for ids and memorySessionIds in DataRoutes.ts
batch endpoints so MCP clients sending "[1,2,3]" or "1,2,3" instead of
native arrays no longer get 400 errors. Wrap observation storage path in
SessionRoutes.ts with try/catch returning 200 on recoverable errors instead
of 500, preventing hook breakage.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
* fix: prevent duplicate worker daemons and zombie processes
Three root causes of chroma-mcp timeouts:
1. HTTP shutdown (POST /api/admin/shutdown) closed resources but never
called process.exit(). Zombie workers stayed alive, background tasks
reconnected to chroma-mcp, spawning duplicate subprocesses that all
contended for the same persistent data directory.
2. No guard against concurrent daemon startup. When hooks fired
simultaneously, multiple daemons started before either wrote a PID
file. The loser got EADDRINUSE but stayed alive because signal
handlers registered in the constructor prevented exit.
3. Corrupt 147GB HNSW index file caused all chroma queries to timeout
(MCP error -32001). Data fix: deleted corrupt collection, backfill
rebuilds from SQLite.
Code fixes:
- Add PID-based guard in daemon startup: exit if PID file process alive
- Add port-based guard in daemon startup: exit if port already bound
(runs before WorkerService constructor registers keepalive handlers)
- Add process.exit(0) after HTTP shutdown/restart completes
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
* fix: aggressive startup cleanup and one-time chroma wipe for upgrade
Kill orphaned worker-service.cjs and chroma-mcp processes immediately
at startup (no age gate) while keeping 30-min threshold for mcp-server.
Wipe corrupt chroma data once on upgrade from pre-v10.3 versions —
backfill rebuilds from SQLite automatically.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
* fix: wrap shutdown handlers in try/finally to guarantee process.exit
If onShutdown() or onRestart() threw, process.exit(0) was never reached,
leaving the daemon alive as a zombie. Also removed redundant require('fs')
calls in process-manager tests where ESM imports already existed.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
---------
Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>