Commit Graph

695 Commits

Author SHA1 Message Date
Alex Newman f2d361b918 feat: security observation types + Telegram notifier (#2084)
* feat: security observation types + Telegram notifier

Adds two severity-axis security observation types (security_alert, security_note)
to the code mode and a fire-and-forget Telegram notifier that posts when a saved
observation matches configured type or concept triggers. Default trigger fires on
security_alert only; notifier is disabled until BOT_TOKEN and CHAT_ID are set.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* feat(telegram): honor CLAUDE_MEM_TELEGRAM_ENABLED master toggle

Adds an explicit on/off flag (default 'true') so users can disable the
notifier without clearing credentials.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* perf(stop-hook): make summarize handler fire-and-forget

Stop hook previously blocked the Claude Code session for up to 110
seconds while polling the worker for summary completion. The handler
now returns as soon as the enqueue POST is acked.

- summarize.ts: drop the 500ms polling loop and /api/sessions/complete
  call; tighten SUMMARIZE_TIMEOUT_MS from 300s to 5s since the worker
  acks the enqueue synchronously.
- SessionCompletionHandler: extract idempotent finalizeSession() for
  DB mark + orphaned-pending-queue drain + broadcast. completeByDbId
  now delegates so the /api/sessions/complete HTTP route is backward
  compatible.
- SessionRoutes: wire finalizeSession into the SDK-agent generator's
  finally block, gated on lastSummaryStored + empty pending queue so
  only Stop events produce finalize (not every idle tick).
- WorkerService: own the single SessionCompletionHandler instance and
  inject it into SessionRoutes to avoid duplicate construction.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix(pr2084): address reviewer findings

CodeRabbit:
- SessionStore.getSessionById now returns status; without it, the
  finalizeSession idempotency guard always evaluated false and
  re-fired drain/broadcast on every call.
- worker-service.ts: three call sites that remove the in-memory session
  after finalizeSession now do so only on success. On failure the
  session is left in place so the 60s orphan reaper can retry; removing
  it would orphan an 'active' DB row indefinitely under the fire-and-
  forget Stop hook.
- runFallbackForTerminatedSession no longer emits a second
  session_completed event; finalizeSession already broadcasts one.
  The explicit broadcast now runs only on the finalize-failure fallback.

Greptile:
- TelegramNotifier reads via loadFromFile(USER_SETTINGS_PATH) so values
  in ~/.claude-mem/settings.json actually take effect; SettingsDefaultsManager.get()
  alone skipped the file and silently ignored user-configured credentials.
- Emoji is derived from obs.type (security_alert → 🚨, security_note → 🔐,
  fallback 🔔) instead of hardcoded 🚨 for every observation.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix(hooks): worker-port mismatch on Windows and settings.json overrides (#2086)

Hooks computed the health-check port as \$((37700 + id -u % 100)),
ignoring ~/.claude-mem/settings.json. Two failure modes resulted:

1. Users upgrading from pre-per-uid builds kept CLAUDE_MEM_WORKER_PORT
   set to '37777' in settings.json. The worker bound 37777 (settings
   wins), but hooks queried 37701 (uid 501 on macOS), so every
   SessionStart/UserPromptSubmit health check failed.
2. Windows Git Bash/PowerShell returns a real Windows UID for 'id -u'
   (e.g. 209), producing port 37709 while the Node worker fell back
   to 37777 (process.getuid?.() ?? 77). Every prompt hit the 60s hook
   timeout.

hooks.json now resolves the port in this order, matching how the
worker itself resolves it:
  1. sed CLAUDE_MEM_WORKER_PORT from ~/.claude-mem/settings.json
  2. If absent, and uname is MINGW/CYGWIN/MSYS → 37777
  3. Otherwise 37700 + (id -u || 77) % 100

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix(pr2084): sync DatabaseManager.getSessionById return type

CodeRabbit round 2: the DatabaseManager.getSessionById return type
was missing platform_source, custom_title, and status fields that
SessionStore.getSessionById actually returns. Structural typing
hid the mismatch at compile time, but it prevents callers going
through DatabaseManager from seeing the status field that the
idempotency guard in SessionCompletionHandler relies on.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix(pr2084): hooks honor env vars and host; looser port regex (#2086 followup)

CodeRabbit round 3: match the worker's env > file > defaults precedence
and resolve host the same way as port.

- Env: CLAUDE_MEM_WORKER_PORT and CLAUDE_MEM_WORKER_HOST win first.
- File: sed now accepts both quoted ('"37777"') and unquoted (37777)
  JSON values for the port; a separate sed reads CLAUDE_MEM_WORKER_HOST.
- Defaults: port per-uid formula (Windows: 37777), host 127.0.0.1.
- Health-check URL uses the resolved $HOST instead of hardcoded localhost.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-22 16:08:28 -07:00
Alex Newman 99060bac1a fix: detect PID reuse in worker start-guard (container restarts) (#2082)
* fix: detect PID reuse in worker start-guard to survive container restarts

The 'Worker already running' guard checked PID liveness with kill(0), which
false-positives when a persistent PID file outlives the PID namespace (docker
stop / docker start, pm2 graceful reloads). The new worker comes up with the
same low PID (e.g. 11) as the old one, kill(0) says 'alive', and the worker
refuses to start against its own prior incarnation.

Capture a process-start token alongside the PID and verify identity, not just
liveness:
  - Linux: /proc/<pid>/stat field 22 (starttime, jiffies since boot)
  - macOS/POSIX: `ps -p <pid> -o lstart=`
  - Windows: unchanged (returns null, falls back to liveness)

PID files written by older versions are token-less, so verifyPidFileOwnership
falls back to the current liveness-only behavior for backwards compatibility.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* refactor: apply review feedback to PID identity helpers

- Collapse ProcessManager re-export down to a single import/export statement.
- Make verifyPidFileOwnership a type predicate (info is PidInfo) so callers
  don't need non-null assertions on the narrowed value.
- Drop the `!` assertions at the worker-service GUARD 1 call site now that
  the predicate narrows.
- Tighten the captureProcessStartToken platform doc comment to enumerate
  process.platform values explicitly.

No behavior change — esbuild output is byte-identical (type-only edits).
Addresses items 1-3 of the claude-review comment on PR #2082.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix: pin LC_ALL=C for `ps lstart=` in captureProcessStartToken

Without a locale pin, `ps -o lstart=` emits month/weekday names in the
system locale. A bind-mounted PID file written under one locale and read
under another would hash to different tokens and the live worker would
incorrectly appear stale — reintroducing the very bug this helper exists
to prevent.

Flagged by Greptile on PR #2082.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* refactor: address second-round review on PID identity helpers

- verifyPidFileOwnership: log a DEBUG diagnostic when the PID is alive but
  the start-token mismatches. Without it, callers can't distinguish the
  "process dead" path from the "PID reused" path in production logs — the
  exact case this helper exists to catch.
- writePidFile: drop the redundant `?? undefined` coercion. `null` and
  `undefined` are both falsy for the subsequent ternary, so the coercion
  was purely cosmetic noise that suggested an important distinction.
- Add a unit test for the win32 fallback path in captureProcessStartToken
  (mocks process.platform) — previously uncovered in CI.

Addresses items 1, 2, and 5 of the second claude-review on PR #2082.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-20 19:49:03 -07:00
Alex Newman 03748acd6a refactor: remove bearer auth and platform_source context filter (#2081)
* fix: resolve search, database, and docker bugs (#1913, #1916, #1956, #1957, #2048)

- Fix concept/concepts param mismatch in SearchManager.normalizeParams (#1916)
- Add FTS5 keyword fallback when ChromaDB is unavailable (#1913, #2048)
- Add periodic WAL checkpoint and journal_size_limit to prevent unbounded WAL growth (#1956)
- Add periodic clearFailed() to purge stale pending_messages (#1957)
- Fix nounset-safe TTY_ARGS expansion in docker/claude-mem/run.sh

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix: prevent silent data loss on non-XML responses, add queue info to /health (#1867, #1874)

- ResponseProcessor: mark messages as failed (with retry) instead of confirming
  when the LLM returns non-XML garbage (auth errors, rate limits) (#1874)
- Health endpoint: include activeSessions count for queue liveness monitoring (#1867)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix: cache isFts5Available() at construction time

Addresses Greptile review: avoid DDL probe (CREATE + DROP) on every text
query. Result is now cached in _fts5Available at construction.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix: resolve worker stability bugs — pool deadlock, MCP loopback, restart guard (#1868, #1876, #2053)

- Replace flat consecutiveRestarts counter with time-windowed RestartGuard:
  only counts restarts within 60s window (cap=10), decays after 5min of
  success. Prevents stranding pending messages on long-running sessions. (#2053)

- Add idle session eviction to pool slot allocation: when all slots are full,
  evict the idlest session (no pending work, oldest activity) to free a slot
  for new requests, preventing 60s timeout deadlock. (#1868)

- Fix MCP loopback self-check: use process.execPath instead of bare 'node'
  which fails on non-interactive PATH. Fix crash misclassification by removing
  false "Generator exited unexpectedly" error log on normal completion. (#1876)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix: resolve hooks reliability bugs — summarize exit code, session-init health wait (#1896, #1901, #1903, #1907)

- Wrap summarize hook's workerHttpRequest in try/catch to prevent exit
  code 2 (blocking error) on network failures or malformed responses.
  Session exit no longer blocks on worker errors. (#1901)

- Add health-check wait loop to UserPromptSubmit session-init command in
  hooks.json. On Linux/WSL where hook ordering fires UserPromptSubmit
  before SessionStart, session-init now waits up to 10s for worker health
  before proceeding. Also wrap session-init HTTP call in try/catch. (#1907)

- Close #1896 as already-fixed: mtime comparison at file-context.ts:255-267
  bypasses truncation when file is newer than latest observation.

- Close #1903 as no-repro: hooks.json correctly declares all hook events.
  Issue was Claude Code 12.0.1/macOS platform event-dispatch bug.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix: security hardening — bearer auth, path validation, rate limits, per-user port (#1932, #1933, #1934, #1935, #1936)

- Add bearer token auth to all API endpoints: auto-generated 32-byte
  token stored at ~/.claude-mem/worker-auth-token (mode 0600). All hook,
  MCP, viewer, and OpenCode requests include Authorization header.
  Health/readiness endpoints exempt for polling. (#1932, #1933)

- Add path traversal protection: watch.context.path validated against
  project root and ~/.claude-mem/ before write. Rejects ../../../etc
  style attacks. (#1934)

- Reduce JSON body limit from 50MB to 5MB. Add in-memory rate limiter
  (300 req/min/IP) to prevent abuse. (#1935)

- Derive default worker port from UID (37700 + uid%100) to prevent
  cross-user data leakage on multi-user macOS. Windows falls back to
  37777. Shell hooks use same formula via id -u. (#1936)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix: resolve search project filtering and import Chroma sync (#1911, #1912, #1914, #1918)

- Fix per-type search endpoints to pass project filter to Chroma queries
  and SQLite hydration. searchObservations/Sessions/UserPrompts now use
  $or clause matching project + merged_into_project. (#1912)

- Fix timeline/search methods to pass project to Chroma anchor queries.
  Prevents cross-project result leakage when project param omitted. (#1911)

- Sync imported observations to ChromaDB after FTS rebuild. Import
  endpoint now calls chromaSync.syncObservation() for each imported
  row, making them visible to MCP search(). (#1914)

- Fix session-init cwd fallback to match context.ts (process.cwd()).
  Prevents project key mismatch that caused "no previous sessions"
  on fresh sessions. (#1918)

- Fix sync-marketplace restart to include auth token and per-user port.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix: resolve all CodeRabbit and Greptile review comments on PR #2080

- Fix run.sh comment mismatch (no-op flag vs empty array)
- Gate session-init on health check success (prevent running when worker unreachable)
- Fix date_desc ordering ignored in FTS session search
- Age-scope failed message purge (1h retention) instead of clearing all
- Anchor RestartGuard decay to real successes (null init, not Date.now())
- Add recordSuccess() calls in ResponseProcessor and completion path
- Prevent caller headers from overriding bearer auth token
- Add lazy cleanup for rate limiter map to prevent unbounded growth
- Bound post-import Chroma sync with concurrency limit of 8
- Add doc_type:'observation' filter to Chroma queries feeding observation hydration
- Add FTS fallback to all specialized search handlers (observations, sessions, prompts, timeline)
- Add response.ok check and error handling in viewer saveSettings

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix: resolve CodeRabbit round-2 review comments

- Use failure timestamp (COALESCE) instead of created_at_epoch for stale purge
- Downgrade _fts5Available flag when FTS table creation fails
- Escape FTS5 MATCH input by quoting user queries as literal phrases
- Escape LIKE metacharacters (%, _, \) in prompt text search
- Add response.ok check in initial settings load (matches save flow)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix: resolve CodeRabbit round-3 review comments

- Include failed_at_epoch in COALESCE for age-scoped purge
- Re-throw FTS5 errors so callers can distinguish failure from no-results
- Wrap all FTS fallback calls in SearchManager with try/catch

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* refactor: remove bearer auth and platform_source from context inject

Bearer token auth (#1932/#1933) added friction for all localhost API
clients with no benefit — the worker already binds localhost-only (CORS
restriction + host binding). Removed auth-token module, requireAuth
middleware, and Authorization headers from all internal callers.

platform_source filtering from the /api/context/inject path was never
used by any caller and silently filtered out observations. The underlying
platform_source column stays; only the query-time filter and its plumbing
through ContextBuilder, ObservationCompiler, SearchRoutes, context.ts,
and transcripts/processor.ts are removed.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix: resolve CodeRabbit + Greptile + claude-review comments on PR #2081

- middleware.ts: drop 'Authorization' from CORS allowedHeaders (Greptile)
- middleware.ts: rate limiter falls back to req.socket.remoteAddress; add Retry-After on 429 (claude-review)
- SearchRoutes.ts: drop leftover platformSource read+pass in handleContextPreview (Greptile)
- .docker-blowout-data/: stop tracking the empty SQLite placeholder and gitignore the dir (claude-review)

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix: tighten rate limiter — correct boundary + drop dead cleanup branch

- `entry.count >= RATE_LIMIT_MAX_REQUESTS` so the 300th request is the
  first rejected (was 301).
- Removed the `requestCounts.size > 100` lazy-cleanup block — on a
  localhost-only server the map tops out at 1–2 entries, so the branch
  was dead code.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix: rate limiter correctly allows exactly 300 req/min; doc localhost scope

- Check `entry.count >= max` BEFORE incrementing so the cap matches the
  comment: 300 requests pass, the 301st gets 429.
- Added a comment noting the limiter is effectively a global cap on a
  localhost-only worker (all callers share the 127.0.0.1/::1 bucket).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix: normalise IPv4-mapped IPv6 in rate limiter client IP

Strip the `::ffff:` prefix so a localhost caller routed as
`::ffff:127.0.0.1` shares a bucket with `127.0.0.1`.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix: size-guarded prune of rate limiter map for non-localhost deploys

Prune expired entries only when the map exceeds 1000 keys and we're
already doing a window reset, so the cost is zero on the localhost hot
path (1–2 keys) and the map can't grow unbounded if the worker is ever
bound on a non-loopback interface.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-20 13:31:13 -07:00
Alex Newman 8fd3685d6e chore: bump version to 12.3.6
Removes the 300 req/min rate limiter from the worker's HTTP middleware.
The worker is localhost-only (enforced via CORS), so rate limiting was
pointless security theater — but it broke the viewer, which polls logs
and stats frequently enough to trip the limit within seconds.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-20 12:35:52 -07:00
Alex Newman 8d166b47c1 Revert "revert: roll back v12.3.3 (Issue Blowout 2026)"
This reverts commit bfc7de377a.
2026-04-20 12:18:55 -07:00
Alex Newman bfc7de377a revert: roll back v12.3.3 (Issue Blowout 2026)
SessionStart context injection regressed in v12.3.3 — no memory
context is being delivered to new sessions. Rolling back to the
v12.3.2 tree state while the regression is investigated.

Reverts #2080.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-20 11:59:15 -07:00
Alex Newman ba1ef6c42c fix: Issue Blowout 2026 — 25 bugs across worker, hooks, security, and search (#2080)
* fix: resolve search, database, and docker bugs (#1913, #1916, #1956, #1957, #2048)

- Fix concept/concepts param mismatch in SearchManager.normalizeParams (#1916)
- Add FTS5 keyword fallback when ChromaDB is unavailable (#1913, #2048)
- Add periodic WAL checkpoint and journal_size_limit to prevent unbounded WAL growth (#1956)
- Add periodic clearFailed() to purge stale pending_messages (#1957)
- Fix nounset-safe TTY_ARGS expansion in docker/claude-mem/run.sh

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix: prevent silent data loss on non-XML responses, add queue info to /health (#1867, #1874)

- ResponseProcessor: mark messages as failed (with retry) instead of confirming
  when the LLM returns non-XML garbage (auth errors, rate limits) (#1874)
- Health endpoint: include activeSessions count for queue liveness monitoring (#1867)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix: cache isFts5Available() at construction time

Addresses Greptile review: avoid DDL probe (CREATE + DROP) on every text
query. Result is now cached in _fts5Available at construction.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix: resolve worker stability bugs — pool deadlock, MCP loopback, restart guard (#1868, #1876, #2053)

- Replace flat consecutiveRestarts counter with time-windowed RestartGuard:
  only counts restarts within 60s window (cap=10), decays after 5min of
  success. Prevents stranding pending messages on long-running sessions. (#2053)

- Add idle session eviction to pool slot allocation: when all slots are full,
  evict the idlest session (no pending work, oldest activity) to free a slot
  for new requests, preventing 60s timeout deadlock. (#1868)

- Fix MCP loopback self-check: use process.execPath instead of bare 'node'
  which fails on non-interactive PATH. Fix crash misclassification by removing
  false "Generator exited unexpectedly" error log on normal completion. (#1876)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix: resolve hooks reliability bugs — summarize exit code, session-init health wait (#1896, #1901, #1903, #1907)

- Wrap summarize hook's workerHttpRequest in try/catch to prevent exit
  code 2 (blocking error) on network failures or malformed responses.
  Session exit no longer blocks on worker errors. (#1901)

- Add health-check wait loop to UserPromptSubmit session-init command in
  hooks.json. On Linux/WSL where hook ordering fires UserPromptSubmit
  before SessionStart, session-init now waits up to 10s for worker health
  before proceeding. Also wrap session-init HTTP call in try/catch. (#1907)

- Close #1896 as already-fixed: mtime comparison at file-context.ts:255-267
  bypasses truncation when file is newer than latest observation.

- Close #1903 as no-repro: hooks.json correctly declares all hook events.
  Issue was Claude Code 12.0.1/macOS platform event-dispatch bug.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix: security hardening — bearer auth, path validation, rate limits, per-user port (#1932, #1933, #1934, #1935, #1936)

- Add bearer token auth to all API endpoints: auto-generated 32-byte
  token stored at ~/.claude-mem/worker-auth-token (mode 0600). All hook,
  MCP, viewer, and OpenCode requests include Authorization header.
  Health/readiness endpoints exempt for polling. (#1932, #1933)

- Add path traversal protection: watch.context.path validated against
  project root and ~/.claude-mem/ before write. Rejects ../../../etc
  style attacks. (#1934)

- Reduce JSON body limit from 50MB to 5MB. Add in-memory rate limiter
  (300 req/min/IP) to prevent abuse. (#1935)

- Derive default worker port from UID (37700 + uid%100) to prevent
  cross-user data leakage on multi-user macOS. Windows falls back to
  37777. Shell hooks use same formula via id -u. (#1936)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix: resolve search project filtering and import Chroma sync (#1911, #1912, #1914, #1918)

- Fix per-type search endpoints to pass project filter to Chroma queries
  and SQLite hydration. searchObservations/Sessions/UserPrompts now use
  $or clause matching project + merged_into_project. (#1912)

- Fix timeline/search methods to pass project to Chroma anchor queries.
  Prevents cross-project result leakage when project param omitted. (#1911)

- Sync imported observations to ChromaDB after FTS rebuild. Import
  endpoint now calls chromaSync.syncObservation() for each imported
  row, making them visible to MCP search(). (#1914)

- Fix session-init cwd fallback to match context.ts (process.cwd()).
  Prevents project key mismatch that caused "no previous sessions"
  on fresh sessions. (#1918)

- Fix sync-marketplace restart to include auth token and per-user port.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix: resolve all CodeRabbit and Greptile review comments on PR #2080

- Fix run.sh comment mismatch (no-op flag vs empty array)
- Gate session-init on health check success (prevent running when worker unreachable)
- Fix date_desc ordering ignored in FTS session search
- Age-scope failed message purge (1h retention) instead of clearing all
- Anchor RestartGuard decay to real successes (null init, not Date.now())
- Add recordSuccess() calls in ResponseProcessor and completion path
- Prevent caller headers from overriding bearer auth token
- Add lazy cleanup for rate limiter map to prevent unbounded growth
- Bound post-import Chroma sync with concurrency limit of 8
- Add doc_type:'observation' filter to Chroma queries feeding observation hydration
- Add FTS fallback to all specialized search handlers (observations, sessions, prompts, timeline)
- Add response.ok check and error handling in viewer saveSettings

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix: resolve CodeRabbit round-2 review comments

- Use failure timestamp (COALESCE) instead of created_at_epoch for stale purge
- Downgrade _fts5Available flag when FTS table creation fails
- Escape FTS5 MATCH input by quoting user queries as literal phrases
- Escape LIKE metacharacters (%, _, \) in prompt text search
- Add response.ok check in initial settings load (matches save flow)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix: resolve CodeRabbit round-3 review comments

- Include failed_at_epoch in COALESCE for age-scoped purge
- Re-throw FTS5 errors so callers can distinguish failure from no-results
- Wrap all FTS fallback calls in SearchManager with try/catch

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-20 11:42:09 -07:00
Alex Newman be99a5d690 fix: resolve search, database, and docker bugs (#2079)
* fix: resolve search, database, and docker bugs (#1913, #1916, #1956, #1957, #2048)

- Fix concept/concepts param mismatch in SearchManager.normalizeParams (#1916)
- Add FTS5 keyword fallback when ChromaDB is unavailable (#1913, #2048)
- Add periodic WAL checkpoint and journal_size_limit to prevent unbounded WAL growth (#1956)
- Add periodic clearFailed() to purge stale pending_messages (#1957)
- Fix nounset-safe TTY_ARGS expansion in docker/claude-mem/run.sh

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix: prevent silent data loss on non-XML responses, add queue info to /health (#1867, #1874)

- ResponseProcessor: mark messages as failed (with retry) instead of confirming
  when the LLM returns non-XML garbage (auth errors, rate limits) (#1874)
- Health endpoint: include activeSessions count for queue liveness monitoring (#1867)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix: cache isFts5Available() at construction time

Addresses Greptile review: avoid DDL probe (CREATE + DROP) on every text
query. Result is now cached in _fts5Available at construction.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-19 22:19:18 -07:00
Alex Newman b8360cdee1 fix: restore assistant replies to conversationHistory in OpenRouterAgent
The extracted helper methods (handleInitResponse, processObservationMessage,
processSummaryMessage) lost the conversationHistory.push calls for assistant
replies, breaking multi-turn context for queryOpenRouterMultiTurn.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-19 20:20:13 -07:00
Alex Newman d2eb89d27f fix: resolve all CodeRabbit review comments
Critical fixes:
- GeminiCliHooksInstaller: wrap install/uninstall prep in try/catch to maintain
  numeric return contract
- McpIntegrations: move mkdirSync inside try block for Goose installer

Major fixes:
- import-xml-observations: guard invalid dates before toISOString()
- runtime.ts: guard response.json() parsing
- WorktreeAdoption: delay adoptedSqliteIds mutation until SQL succeeds
- CursorHooksInstaller: move mkdirSync inside try block
- McpIntegrations: throw on failed claude-mem block replacement
- OpenClawInstaller: propagate parse failure instead of returning {}
- OpenClawInstaller: move mkdirSync inside try block
- WindsurfHooksInstaller: validate hooks.json shape with optional chaining
- timeline/queries: pass normalized Error directly to logger
- ChromaSync: use composite dedup key (entityType:id) to prevent cross-type collisions
- EnvManager: wrap preflight directory/file ops in try/catch with ENV logging

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-19 20:17:57 -07:00
Alex Newman 30a0ab4ddb fix: resolve Greptile review comments on error handling
- Remove spurious console.error in logger JSON.parse catch (expected control flow)
- Remove debug logging from hot PID cleanup loop (approved override)
- Replace unsafe `error as Error` casts with instanceof checks in ChromaSync, GeminiAgent, OpenRouterAgent
- Wrap non-Error FTS failures with new Error(String()) instead of dropping details

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-19 20:05:27 -07:00
Alex Newman a0dd516cd5 fix: resolve all 301 error handling anti-patterns across codebase
Systematic cleanup of every error handling anti-pattern detected by the
automated scanner. 289 issues fixed via code changes, 12 approved with
specific technical justifications.

Changes across 90 files:
- GENERIC_CATCH (141): Added instanceof Error type discrimination
- LARGE_TRY_BLOCK (82): Extracted helper methods to narrow try scope to ≤10 lines
- NO_LOGGING_IN_CATCH (65): Added logger/console calls for error visibility
- CATCH_AND_CONTINUE_CRITICAL_PATH (10): Added throw/return or approved overrides
- ERROR_STRING_MATCHING (2): Approved with rationale (no typed error classes)
- ERROR_MESSAGE_GUESSING (1): Replaced chained .includes() with documented pattern array
- PROMISE_CATCH_NO_LOGGING (1): Added logging to .catch() handler

Also fixes a detector bug where nested try/catch inside a catch block
corrupted brace-depth tracking, causing false positives.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-19 19:57:00 -07:00
Alex Newman 2337997c48 fix(parser): stop warning on normal observation responses (#2074)
parseSummary runs on every agent response, not just summary turns. When the
turn is a normal observation, the LLM correctly emits <observation> and no
<summary> — but the fallthrough branch from #1345 treated this as prompt
misbehavior and logged "prompt conditioning may need strengthening" every
time. That assumption stopped holding after #1633 refactored the caller to
always invoke parseSummary with a coerceFromObservation flag.

Gate the whole observation-on-summary path on coerceFromObservation. On a
real summary turn, coercion still runs and logs the legitimate "coercion
failed" warning when the response has no usable content. On an observation
turn, parseSummary returns null silently, which is the correct behavior.

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-19 16:30:05 -07:00
Alex Newman 789efe4234 feat: disable subagent summaries, label subagent observations (#2073)
* feat: disable subagent summaries and label subagent observations

Detect Claude Code subagent hook context via `agent_id`/`agent_type` on
stdin, short-circuit the Stop-hook summary path when present, and thread
the subagent identity end-to-end onto observation rows (new `agent_type`
and `agent_id` columns, migration 010 at version 27). Main-session rows
remain NULL; content-hash dedup is unchanged.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix: address PR #2073 review feedback

- Narrow summarize subagent guard to agentId only so --agent-started
  main sessions still own their summary (agentType alone is main-session).
- Remove now-dead agentId/agentType spreads from the summarize POST body.
- Always overwrite pendingAgentId/pendingAgentType in SDK/Gemini/OpenRouter
  agents (clears stale subagent identity on main-session messages after
  a subagent message in the same batch).
- Add idx_observations_agent_id index in migration 010 + the mirror
  migration in SessionStore + the runner.
- Replace console.log in migration010 with logger.debug.
- Update summarize test: agentType alone no longer short-circuits.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix: address CodeRabbit + claude-review iteration 4 feedback

- SessionRoutes.handleSummarizeByClaudeId: narrow worker-side guard to
  agentId only (matches hook-side). agentType alone = --agent main
  session, which still owns its summary.
- ResponseProcessor: wrap storeObservations in try/finally so
  pendingAgentId/Type clear even if storage throws. Prevents stale
  subagent identity from leaking into the next batch on error.
- SessionStore.importObservation + bulk.importObservation: persist
  agent_type/agent_id so backup/import round-trips preserve subagent
  attribution.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* polish: claude-review iteration 5 cleanup

- Use ?? not || for nullable subagent fields in PendingMessageStore
  (prevents treating empty string as null).
- Simplify observation.ts body spread — include fields unconditionally;
  JSON.stringify drops undefined anyway.
- Narrow any[] to Array<{ name: string }> in migration010 column checks.
- Add trailing newline to migrations.ts.
- Document in observations/store.ts why the dedup hash intentionally
  excludes agent fields.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* polish: claude-review iteration 7 feedback

- claude-code adapter: add 128-char safety cap on agent_id/agent_type
  so a malformed Claude Code payload cannot balloon DB rows. Empty
  strings now also treated as absent.
- migration010: state-aware debug log lists only columns actually
  added; idempotent re-runs log "already present; ensured indexes".
- Add 3 adapter tests covering the length cap boundary and empty-string
  rejection.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* perf: skip subagent summary before worker bootstrap

Move the agentId short-circuit above ensureWorkerRunning() so a Stop
hook fired inside a subagent does not trigger worker startup just to
return early. Addresses CodeRabbit nit on summarize.ts:36-47.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-19 14:58:01 -07:00
Copilot 8ec91e7ffa fix: break infinite summary-retry loop (#1633) (#2072)
* Initial plan

* fix: break infinite summary-retry loop (#1633)

Three-part fix:
1. Parser coercion: When LLM returns <observation> tags instead of <summary>,
   coerce observation content into summary fields (root cause fix)
2. Stronger summary prompt: Add clearer tag requirements with warnings
3. Circuit breaker: Track consecutive summary failures per session,
   skip further attempts after 3 failures to prevent unbounded prompt growth

Agent-Logs-Url: https://github.com/thedotmack/claude-mem/sessions/e345e8ec-bc97-4eaa-94bd-6e951fda8f77

Co-authored-by: thedotmack <683968+thedotmack@users.noreply.github.com>

* refactor: extract shared constants for summary mode marker and failure threshold

Addresses code review feedback: SUMMARY_MODE_MARKER and
MAX_CONSECUTIVE_SUMMARY_FAILURES are now defined once in sdk/prompts.ts
and imported by ResponseProcessor and SessionManager.

Agent-Logs-Url: https://github.com/thedotmack/claude-mem/sessions/e345e8ec-bc97-4eaa-94bd-6e951fda8f77

Co-authored-by: thedotmack <683968+thedotmack@users.noreply.github.com>

* fix: guard summary failure counter on summaryExpected (Greptile P1)

The circuit breaker counter previously incremented on any response
containing <observation> or <summary> tags — which matches virtually
every normal observation response. After 3 observations the breaker
would open and permanently block summarization, reproducing the
data-loss scenario #1633 was meant to prevent.

Gate the increment block on summaryExpected (already computed for
parseSummary coercion) so the counter only tracks actual summary
attempts.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* test: cover circuit-breaker + apply review polish

- Use findLast / at(-1) for last-user-message lookup instead of
  filter + index (O(1) common case).
- Drop redundant `|| 0` fallback — field is required and initialized.
- Add comment noting counter is ephemeral by design.
- Add ResponseProcessor tests covering:
  * counter NOT incrementing on normal observation responses
    (regression guard for the Greptile P1)
  * counter incrementing when a summary was expected but missing
  * counter resetting to 0 on successful summary storage

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix: iterate all observation blocks; don't count skip_summary as failure

Addresses CodeRabbit review on #2072:

- coerceObservationToSummary now iterates all <observation> blocks
  with a global regex and returns the first block that has title,
  narrative, or facts. Previously, an empty leading observation
  would short-circuit and discard populated follow-ups.

- Circuit-breaker counter now treats explicit <skip_summary/> as
  neutral — neither a failure nor a success — so a run that happens
  to end on a skip doesn't punish the session or mask a prior bad
  streak. Real failures (no summary, no skip) still increment.

- Tests added for both cases.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* test: reference SUMMARY_MODE_MARKER constant instead of hardcoded string

Addresses CodeRabbit nitpick: tests should pull the marker from the
canonical source so they don't silently drift when the constant is
renamed or edited.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix: also coerce observations when <summary> has empty sub-tags

When the LLM wraps an empty <summary></summary> around real observation
content, the #1360 empty-subtag guard rejects the summary and returns
null — which would lose the observation content and resurrect the
#1633 retry loop. Fall back to coerceObservationToSummary in that
branch too, mirroring the unmatched-<summary> path.

Adds a test covering the empty-summary-wraps-observation case and
a guard test for empty summary with no observation content.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: copilot-swe-agent[bot] <198982749+Copilot@users.noreply.github.com>
Co-authored-by: thedotmack <683968+thedotmack@users.noreply.github.com>
Co-authored-by: Alex Newman <thedotmack@gmail.com>
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-19 12:00:38 -07:00
Alex Newman b8d63d949f fix(worktree): self-heal Chroma metadata on re-run
Addresses unresolved CodeRabbit finding on WorktreeAdoption.ts:296.

Previously, Chroma patch failures stranded rows permanently: adoptedSqliteIds
was built only from rows where merged_into_project IS NULL, so once SQL
committed, reruns couldn't rediscover them for retry.

The Chroma id set is now built from ALL observations whose project matches a
merged worktree — including rows already stamped to this parent. Combined
with the idempotent updateMergedIntoProject, transient Chroma failures
self-heal on the next adoption pass.

SQL writes remain idempotent (UPDATE still guards on merged_into_project IS
NULL), so adoptedObservations / adoptedSummaries continue to count only
newly-adopted rows. chromaUpdates now counts total Chroma writes per pass
(may exceed adoptedObservations when retrying).
2026-04-16 22:01:21 -07:00
Alex Newman 7a66cb310f fix(worktree): address PR review — schema guard, startup adoption, query parity
Addresses six CodeRabbit/Greptile findings on PR #2052:

- Schema guard in adoptMergedWorktrees probes for merged_into_project
  columns before preparing statements; returns early when absent so first
  boot after upgrade (pre-migration) doesn't silently fail.

- Startup adoption now iterates distinct cwds from pending_messages and
  dedupes via resolveMainRepoPath — the worker daemon runs with
  cwd=plugin scripts dir, so process.cwd() fallback was a no-op.

- ObservationCompiler single-project queries (queryObservations /
  querySummaries) OR merged_into_project into WHERE so injected context
  surfaces adopted worktree rows, matching the Multi variants.

- SessionStore constructor now calls ensureMergedIntoProjectColumns so
  bundled artifacts (context-generator.cjs) that embed SessionStore get
  the merged_into_project column on DBs that only went through the
  bundled migration chain.

- OBSERVER_SESSIONS_PROJECT constant is now derived from
  basename(OBSERVER_SESSIONS_DIR) and used across PaginationHelper,
  SessionStore, and timeline queries instead of hardcoded strings.

- Corrected misleading Chroma retry docstring in WorktreeAdoption to
  match actual behavior (no auto-retry once SQL commits).
2026-04-16 21:31:30 -07:00
Alex Newman d1601123fd feat(ui): hide observer-sessions project from UI lists
Observer sessions (internal SDK-driven worker queries) run under a
synthetic project name 'observer-sessions' to keep them out of
claude --resume. They were still surfacing in the viewer project
picker and unfiltered observation/summary/prompt feeds.

Filter them out at every UI-facing query:
- SessionStore.getAllProjects and getProjectCatalog
- timeline/queries.ts getAllProjects
- PaginationHelper observations/summaries/prompts when no project is selected

When a caller explicitly requests project='observer-sessions',
results are still returned (not a hard ban, just hidden by default).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-16 20:05:37 -07:00
Alex Newman f6fda8fff4 fix(worktree): address CodeRabbit PR review feedback
- Document --branch override in npx-cli help text
- Guard ContextBuilder against empty projects[] override; fall back to cwd-derived primary
- Ensure merged_into_project indexes are created even if ALTER ran in a prior partial migration
- Reject adopt --branch/--cwd flags with missing or flag-like values
- Use defined --color-border-primary token for merged badge border

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-16 20:03:27 -07:00
Alex Newman d24f3a7019 fix(worktree): address PR review — test assertion, dry-run sentinel, git timeouts
- Update allProjects test expectation to match [parent, composite] (matches JSDoc + callers in ContextBuilder/context handlers).
- Replace string-matched __DRY_RUN_ROLLBACK__ sentinel with dedicated DryRunRollback class to avoid swallowing unrelated errors.
- Add 5000ms timeout to spawnSync git calls in WorktreeAdoption and ProcessManager so worker startup can't hang on a stuck git process.
- Drop unreachable break after process.exit(0) in adopt case.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-16 19:50:01 -07:00
Alex Newman bce4ce32ec feat(ui): show merged-into-parent badge on adopted observations
ObservationCard renders a secondary "merged → <parent>" chip when
merged_into_project is set, next to the existing project label.
Both are meaningful: project is origin provenance, merged_into_project
is the current home.

Extends PaginationHelper's observations and summaries queries with
OR merged_into_project = ? so the single-project viewer fetch pulls
in adopted rows — the plan's Phase 3 covered multi-project context
injection; this is the single-project UI read path.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-16 19:33:47 -07:00
Alex Newman 5664fabce4 feat(cli): npx claude-mem adopt [--dry-run] [--branch X]
Adds a manual escape hatch for the worktree adoption engine. Covers
squash-merges where git branch --merged HEAD returns nothing, and
lets users re-run adoption on demand.

Wired through worker-service.cjs (same pattern as generate/clean)
so the command runs under Bun with bun:sqlite, keeping npx-cli/
pure Node. --cwd flag passes the user's working directory through
the spawn so the engine resolves the correct parent repo.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-16 19:28:17 -07:00
Alex Newman 0b90495391 feat(worktree): auto-adopt merged worktrees on worker startup
Invokes adoptMergedWorktrees() right after runOneTimeCwdRemap() and
before dbManager.initialize(), wrapped in try/catch so adoption
failures never block startup. Idempotent, so running every startup
is cheap — the SQL UPDATE only touches rows where merged_into_project
IS NULL.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-16 19:24:43 -07:00
Alex Newman 3e770df332 feat(worktree): query plumbing surfaces merged rows under parent project
ObservationCompiler.queryObservationsMulti and querySummariesMulti
WHERE clause extended with OR merged_into_project IN (...), so a
parent-project read pulls in rows originally written under any
child worktree's composite name once merged.

SearchManager wraps the Chroma project filter in \$or so semantic
search behaves identically. ChromaSync baseMetadata now carries
merged_into_project on new embeddings; existing rows are patched
retroactively by the adoption engine.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-16 19:24:10 -07:00
Alex Newman a7c3c4af2d feat(worktree): adoption engine for merged worktrees
Detects merged worktrees via git (worktree list --porcelain +
branch --merged HEAD), then stamps merged_into_project on SQLite
observations/summaries and propagates the same metadata to Chroma
in lockstep. `project` stays immutable; adoption is a virtual
pointer. Idempotent via IS NULL guard on UPDATE and by idempotent
Chroma metadata writes. SQL is source of truth — Chroma failures
are logged but don't roll back SQL.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-16 19:19:02 -07:00
Alex Newman 3d1dfcc26a feat(migration): add merged_into_project column for worktree adoption
Nullable pointer on observations and session_summaries that lets a
worktree's rows surface under the parent project's observation list
without data movement. Self-idempotent via PRAGMA table_info guard;
does not bump schema_versions.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-16 19:12:38 -07:00
Alex Newman dc198d5677 feat(worktree): include parent repo observations in worktree read scope
Worktrees are branches off main; the parent holds the architecture,
decisions, and long-tail history the worktree inherits. Scoping reads
to the worktree alone meant every new worktree started cold on any
question that required prior context.

Expand `allProjects` in a worktree to `[parent, composite]` so reads
pull both. Writes still go through `.primary` (the composite), so
sibling worktrees don't leak into each other.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-16 17:55:17 -07:00
Alex Newman 193e7e0719 feat(worktree): auto-apply cwd-based project remap on worker startup
Ports scripts/cwd-remap.ts into ProcessManager.runOneTimeCwdRemap() and
invokes it in initializeBackground() alongside the existing chroma
migration. Uses pending_messages.cwd as the source of truth to rewrite
pre-worktree bare project names into the parent/worktree composite
format so search and context are consistent.

- Backs up the DB to .bak-cwd-remap-<ts> before any writes.
- Idempotent: marker file .cwd-remap-applied-v1 short-circuits reruns.
- No-ops on fresh installs (no DB, or no pending_messages table).
- On failure, logs and skips the marker so the next restart retries.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-16 17:51:33 -07:00
Alex Newman 9d695f53ed chore: remove auto-generated per-directory CLAUDE.md files
Leftover artifacts from an abandoned context-injection feature. The
project-level CLAUDE.md stays; the directory-level ones were generated
timeline scaffolding that never panned out.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-16 17:51:24 -07:00
Alex Newman a65ab055ca fix(worktree): audit observation fetch/display for composite project names
Three sites didn't account for parent/worktree composite naming:

- PaginationHelper.stripProjectPath: marker used full composite, breaking
  path sanitization for worktrees checked out outside a parent/leaf layout.
  Now extracts the leaf segment.
- observations/store.ts: fallback imported getCurrentProjectName from
  shared/paths.ts (a duplicate impl without worktree detection). Switched
  to getProjectContext().primary so writes key into the same project as
  reads.
- SearchManager.getRecentContext: fallback used basename(cwd) and lost
  the parent prefix, making the MCP tool find nothing in worktrees.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-16 17:38:49 -07:00
Alex Newman 3869b083d0 fix(context): derive project from explicit projects array, not cwd
When a caller (e.g. worker context-inject route) passes a `projects`
array without a matching cwd, the cwd-derived `context.primary` drifted
from the projects being queried — producing an empty-state header for
one project while querying another. Use the last entry of `projects` so
header and query target stay in sync.
2026-04-16 17:16:51 -07:00
Alex Newman 040729beef fix(project-name): use parent/worktree composite so observations don't cross worktrees
Revert of #1820 behavior. Each worktree now gets its own bucket:
- In a worktree, primary = `parent/worktree` (e.g. `claude-mem/dar-es-salaam`)
- In a main repo, primary = basename (unchanged)
- allProjects is always `[primary]` — strict isolation at query time

Includes a one-off maintenance script (scripts/worktree-remap.ts) that
retroactively reattributes past sessions to their worktree using path
signals in observations and user prompts. Two-rule inference keeps the
remap high-confidence:
  1. The worktree basename in the path matches the session's current
     plain project name (pre-#1820 era; trusted).
  2. Or all worktree path signals converge on a single (parent, worktree)
     across the session.
Ambiguous sessions are skipped.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-04-16 15:40:44 -07:00
Alex Newman c76a439491 fix: drop orphan flag when filtering empty-string spawn args (#2049)
Observations were 100% failing on Claude Code 2.1.109+ because the Agent
SDK emits ["--setting-sources", ""] when settingSources defaults to [].
The existing Bun-workaround filter stripped the empty string but left
the orphan --setting-sources flag, which then consumed --permission-mode
as its value, crashing the subprocess with:

  Error processing --setting-sources:
  Invalid setting source: --permission-mode.

Make the filter pair-aware: when an empty arg follows a --flag, drop
both so the SDK default (no setting sources) is preserved by omission.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-16 14:30:54 -07:00
Alex Newman aa7cdb6d9f fix: revert unauthorized $CMEM branding in context header
A prior Claude instance snuck in a `$CMEM` token branding header
during a context compression refactor (7e072106). Reverts back to
the original descriptive format: `# [project] recent context, datetime`

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-15 12:04:27 -07:00
Alex Newman d0fc68c630 revert: remove overengineered summary salvage logic (#1718) (#1850)
The synthetic summary salvage feature created fake summaries from observation
data when the AI returned <observation> instead of <summary> tags. This was
overengineered — missing a summary is preferable to fabricating one from
observation fields that don't map cleanly to summary semantics.

Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-15 04:22:41 -07:00
Ben Younes 05232ff091 fix: reap stuck generators in reapStaleSessions (fixes #1652) (#1698)
* fix: reap stuck generators in reapStaleSessions (fixes #1652)

Sessions whose SDK subprocess hung would stay in the active sessions
map forever because `reapStaleSessions()` unconditionally skipped any
session with a non-null `generatorPromise`.  The generator was blocked
on `for await (const msg of queryResult)` inside SDKAgent and could
never unblock itself — the idle-timeout only fires when the generator
is in `waitForMessage()`, and the orphan reaper skips processes whose
session is still in the map.

Add `MAX_GENERATOR_IDLE_MS` (5 min).  When `reapStaleSessions()` sees
a session whose `generatorPromise` is set but `lastGeneratorActivity`
has not advanced in over 5 minutes, it now:
1. SIGKILLs the tracked subprocess to unblock the stuck `for await`
2. Calls `session.abortController.abort()` so the generator loop exits
3. Calls `deleteSession()` which waits up to 30 s for the generator to
   finish, then cleans up supervisor-tracked children

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* fix: freeze time in stale-generator test and import constants from production source

- Export MAX_GENERATOR_IDLE_MS, MAX_SESSION_IDLE_MS, StaleGeneratorCandidate,
  StaleGeneratorProcess, and detectStaleGenerator from SessionManager.ts so
  tests no longer duplicate production constants or detection logic.
- Use setSystemTime() from bun:test to freeze Date.now() in the
  "exactly at threshold" test, eliminating the flaky double-Date.now() race.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

---------

Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-15 00:58:35 -07:00
Ben Younes f97c50bfb9 fix: session lifecycle guards to prevent runaway API spend (#1590) (#1693)
* fix: add session lifecycle guards to prevent runaway API spend (#1590)

Three root causes allowed 30+ subprocess accumulation over 36 hours:
1. SIGTERM-killed processes (code 143) triggered crash recovery and
   immediately respawned — now detected and treated as intentional
   termination (aborts controller so wasAborted=true in .finally).
2. No wall-clock limit: sessions ran for 13+ hours continuously
   spending tokens — now refuses new generators after 4 hours and
   drains the pending queue to prevent further spawning.
3. Duplicate --resume processes for the same session UUID — now
   killed and unregistered before a new spawn is registered.

Generated by Claude Code
Vibe coded by ousamabenyounes

Co-Authored-By: Claude <noreply@anthropic.com>

* fix: use normalized errorMsg in logger.error payload and annotate SIGTERM override

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* fix: use persisted createdAt for wall-clock guard and bind abortController locally to prevent stale abort

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* chore: re-trigger CodeRabbit review after rate limit reset

* fix: defer process unregistration until exit and align boundary test with strict > (#1693)

- ProcessRegistry: don't unregister PID immediately after SIGTERM — let the
  existing 'exit' handler clean up when the process actually exits, preventing
  tracking loss for still-live processes.
- Test: align wall-clock boundary test with production's strict `>` operator
  (exactly 4h is NOT terminated, only >4h is).

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: Claude <noreply@anthropic.com>
2026-04-15 00:58:23 -07:00
Ben Younes 983be42998 fix: resolve Gemini CLI 0.37.0 session capture failures (#1664) (#1692)
Three root causes prevented Gemini sessions from persisting prompts,
observations, and summaries:

1. BeforeAgent was mapped to user-message (display-only) instead of
   session-init (which initialises the session and starts the SDK agent).

2. The transcript parser expected Claude Code JSONL (type: "assistant")
   but Gemini CLI 0.37.0 writes a JSON document with a messages array
   where assistant entries carry type: "gemini". extractLastMessage now
   detects the format and routes to the correct parser, preserving
   full backward compatibility with Claude Code JSONL transcripts.

3. The summarize handler omitted platformSource from the
   /api/sessions/summarize request body, causing sessions to be recorded
   without the gemini-cli source tag.

Co-authored-by: Claude <noreply@anthropic.com>
2026-04-15 00:58:20 -07:00
Ethan 16a0737dfc fix: use parent project name for worktree observation writes (#1820)
* fix: use parent project name for worktree observation writes (#1819)

Observations and sessions from git worktrees were stored under
basename(cwd) instead of the parent repo name because write paths
called getProjectName() (not worktree-aware) instead of
getProjectContext() (worktree-aware). This is the same bug as
#1081, #1317, and #1500 — it regressed because the two functions
coexist and new code reached for the simpler one.

Fix: getProjectContext() now returns parentProjectName as primary
when in a worktree, and all four write-path call sites now use
getProjectContext().primary instead of getProjectName().

Includes regression test that creates a real worktree directory
structure and asserts primary === parentProjectName.

* fix: address review nitpicks — allProjects fallback, JSDoc, write-path test

- ContextBuilder: default projects to context.allProjects for legacy
  worktree-labeled record compatibility
- ProjectContext: clarify JSDoc that primary is canonical (parent repo
  in worktrees)
- Tests: add write-path regression test mirroring session-init/SessionRoutes
  pattern; refactor worktree fixture into beforeAll/afterAll

* refactor(project-name): rename local to cwdProjectName and dedupe allProjects

Addresses final CodeRabbit nitpick: disambiguates the local variable
from the returned `primary` field, and dedupes allProjects via Set
in case parent and cwd resolve to the same name.

---------

Co-authored-by: Ethan Hurst <ethan.hurst@outlook.com.au>
2026-04-15 00:58:14 -07:00
biswanath-cmd 3d92684e04 fix: filter empty string args before Bun spawn() to prevent CLI parsing errors (#1781)
Bun's child_process.spawn() silently drops empty string arguments from
argv, unlike Node which preserves them. When the Agent SDK defaults
settingSources to [] (empty array), [].join(",") produces "" which gets
pushed as ["--setting-sources", ""]. Bun drops the "", causing
--permission-mode to be consumed as the value for --setting-sources:

  Error processing --setting-sources: Invalid setting source: --permission-mode

This caused 100% observation failure (exit code 1 on every SDK subprocess
spawn), resulting in 0 observations stored across all sessions.

The fix filters empty string args before passing to spawn(), making the
behavior consistent between Node and Bun runtimes.

Fixes #1779
Related: #1660

Co-authored-by: bswnth48 <69203760+bswnth48@users.noreply.github.com>
Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-15 00:58:11 -07:00
enma998 471e1f62f9 Fix npx search and default Codex context to workspace-local AGENTS (#1780)
* Fix npx search query parameter mismatch

* Use workspace-local Codex AGENTS context by default

---------

Co-authored-by: bnb <bnb>
2026-04-15 00:58:08 -07:00
suyua9 eeb6841033 fix: coerce corpus route filters (#1776)
* fix: coerce corpus route filters

* test: cover unsupported corpus type filters
2026-04-15 00:58:01 -07:00
Tran Quang 2a2008bac2 fix(file-context): preserve targeted reads + invalidate on mtime (#1719) (#1729)
* fix(file-context): preserve targeted reads + invalidate on mtime (#1719)

The PreToolUse:Read hook unconditionally rewrote tool input to
{file_path, limit:1}, which interacted with two failure modes:

1. Subagent edits a file → parent's next Read still gets truncated
   because the observation snapshot predates the change.
2. Claude requests a different section with offset/limit → the hook
   strips them, so the Claude Code harness's read-dedup cache returns
   "File unchanged" against the prior 1-line read. The file becomes
   unreadable for the rest of the conversation, even though the hook's
   own recovery hint says "Read again with offset/limit for the
   section you need."

Two complementary fixes:

- **mtime invalidation**: stat the file (we already stat for the size
  gate) and compare mtimeMs to the newest observation's created_at_epoch.
  If the file is newer, pass the read through unchanged so fresh content
  reaches Claude.

- **Targeted-read pass-through**: when toolInput already specifies
  offset and/or limit, preserve them in updatedInput instead of
  collapsing to {limit:1}. The harness's dedup cache then sees a
  distinct input and lets the read proceed.

The unconstrained-read path (no offset, no limit) is unchanged: still
truncated to 1 line plus the observation timeline, so token economics
are preserved for the common case.

Tests cover all three branches: existing truncation, targeted-read
pass-through (offset+limit, limit-only), and mtime-driven bypass.

Fixes #1719

* refactor(file-context): address review findings on #1719 fix

- Add offset-only test case for full targeted-read branch coverage
- Use >= for mtime comparison to handle same-millisecond edge case
- Add Number.isFinite() + bounds guards on offset/limit pass-through
- Trim over-verbose comments to concise single-line summaries
- Remove redundant `as number` casts after typeof narrowing
- Add comment explaining fileMtimeMs=0 sentinel invariant
2026-04-15 00:57:57 -07:00
Jochen Meyer 59ce0fc553 fix: exclude primary key index from unique constraint check in migration 7 (#1771)
* fix: exclude primary key index from unique constraint check in migration 7

PRAGMA index_list returns all indexes including those backing PRIMARY KEY
columns (origin='pk'), which always have unique=1. The check was therefore
always true, causing migration 7 to run the full DROP/CREATE/RENAME table
sequence on every worker startup instead of short-circuiting once the
UNIQUE constraint had already been removed.

Fix: filter to non-PK indexes by requiring idx.origin !== 'pk'. The
origin field is already present on the existing IndexInfo interface.

Fixes #1749

* fix: apply pk-origin guard to all three migration code paths

CodeRabbit correctly identified that the origin !== 'pk' fix was only
applied to MigrationRunner.ts but not to the two other active code paths
that run the same removeSessionSummariesUniqueConstraint logic:

- SessionStore.ts:220 — used by DatabaseManager and worker-service
- plugin/scripts/context-generator.cjs — bundled artifact (minified)

All three paths now consistently exclude primary-key indexes when
detecting unique constraints on session_summaries.
2026-04-15 00:57:51 -07:00
Jochen Meyer 31ee1024c5 fix: restrict ~/.claude-mem/.env permissions to owner-only (0600) (#1770)
* fix: restrict .env file permissions to owner-only (0600)

API keys stored in ~/.claude-mem/.env were created without explicit
permissions, defaulting to umask-dependent mode. On systems with a
permissive umask (e.g. 0022), the file would be world-readable.

- Set directory permissions to 0700 on creation
- Set file permissions to 0600 via writeFileSync mode option
- Call chmodSync after write to fix permissions on pre-existing files

Signed-off-by: Jochen Meyer

* fix: also restrict pre-existing directory permissions to 0700

The initial fix only set directory mode on creation. Pre-existing
~/.claude-mem/ directories from earlier installs remained world-readable.
Add chmodSync for the directory alongside the existing file chmod,
and document the Windows limitation (ACLs, not POSIX permissions).

---------

Signed-off-by: Jochen Meyer
2026-04-15 00:57:48 -07:00
Alex Newman a390a537c9 fix: broadcast uses summaryForStore to support salvaged summaries (#1718)
syncAndBroadcastSummary was using the raw ParsedSummary (null when salvaged)
instead of summaryForStore for the SSE broadcast, causing a crash when the
LLM returns <observation> without <summary> tags. Also removes misplaced
tree-sitter docs from mem-search/SKILL.md (belongs in smart-explore).

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-14 19:11:48 -07:00
Alex Newman 2357835942 Merge pull request #1686 from ousamabenyounes/fix/issue-1633
fix: expose summaryStored in session status to detect silent summary loss (#1633)
2026-04-14 18:41:58 -07:00
Alex Newman 77a22d30b2 Merge pull request #1555 from ousamabenyounes/fix/issue-1384-mcp-inputschema
fix: declare inputSchema properties for search and timeline MCP tools (#1384 #1413)
2026-04-14 18:41:54 -07:00
Alex Newman 40a25e0225 Merge pull request #1676 from ousamabenyounes/fix/issue-1625
fix: filter ghost observations with no content fields (#1625)
2026-04-14 18:41:51 -07:00
Alex Newman 4c2ab98d90 Merge pull request #1679 from ousamabenyounes/fix/issue-1297
fix: set cwd to homedir when spawning chroma-mcp to prevent pydantic .env.local crash (#1297)
2026-04-14 18:41:48 -07:00