fix: bug-batch — 17 issues + 4 foundations (chroma, opencode, parser, OAuth, paths, uptime, classification) (#2282)

* feat: foundations F1-F4 + simple bug fixes

Foundations (no consumer adoption yet):
- F1 spawnHidden wrapper at src/shared/spawn.ts
- F2 paths namespace with 18 accessors + invariant test (tests/shared/paths.test.ts)
- F3 getUptimeSeconds at src/shared/uptime.ts
- F4 ClassifiedProviderError at src/services/worker/provider-errors.ts + 6 tests

Issue fixes (file-isolated, parallel-safe):
- #2231: SECURITY.md at repo root for GitHub Security tab
- #2240: dedupe observationIds before Chroma sync (ResponseProcessor.ts)
- #2247: add task_complete to Codex session-end events
- #2243: rsync excludes scripts/package.json + scripts/node_modules

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix: validate Claude executable with --version and detect desktop app

Extract findClaudeExecutable() into shared utility used by both SDKAgent and KnowledgeAgent (deduplication). Every candidate is now validated with --version (3s timeout). Desktop app executables in AppData/Program Files get an actionable error message directing users to install the CLI via npm.

Closes #2222

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix: use Zod schemas in OpenCode plugin to fix _zod.def crash

OpenCode 1.14.x walks arg._zod.def at plugin registration, which crashes on plain JSON Schema objects like {type: "string"}. Replace with z.string().describe() so the Zod internals are present.

Closes #2226, #2225, #2154

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix: neutralize chroma-mcp CPU storm at the root

Two surgical fixes to the chroma backfill path that together cause the sustained 60–80% CPU + orphan accumulation pattern reported across

1. ChromaMcpManager.getSpawnEnv: cap embedding-thread fanout

ONNX Runtime / OpenBLAS / MKL all default to cpu_count(), so a 12-core machine spins 12 threads burning embeddings concurrently. The user's getSpawnEnv only handled SSL certs; no thread limits at all.
Inject OMP_NUM_THREADS / ONNX_NUM_THREADS / OPENBLAS_NUM_THREADS / MKL_NUM_THREADS defaults of 2 (only if user hasn't pinned them), and ANONYMIZED_TELEMETRY=false to stop background HTTP from the embedding subprocess. Closes the storm at the source.

2. ChromaSync.backfill{Observations,Summaries,Prompts}: per-batch watermark

The bump was in a trailing finally block. SIGKILL / OOM / power loss mid-flight skips finally entirely, so the watermark stayed at 0 and the next worker boot re-embedded the entire history (16K obs in #2220's case), which then pegged CPU forever in combination with (1). Move the bump inside the loop so progress is durable per batch. Closes #2214.

Verification:
- 26/26 chroma tests pass (tests/services/sync, tests/integration/chroma-vector-sync)
- Bundle confirms thread caps and per-batch bumps are present
- Full suite: 1429 pass / 20 fail; pre-existing failures only, no regression vs v12.4.9 baseline (1429 pass / 27 fail)

Closes #2214. Substantially de-amplifies #2220 (the structural Job-Object cleanup is still tracked separately at #2216).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix: kill chroma-mcp process tree and limit backfill concurrency

Three fixes for orphan chroma-mcp processes and resource exhaustion:

1. killProcessTree() in ChromaMcpManager.stop() tears down the full uvx->uv->python->chroma-mcp spawn chain (pkill -P on POSIX, taskkill /T on Windows) before MCP client.close().
2. Register chroma process with pgid for supervisor shutdown cascade.
3. backfillAllProjects() now processes max 3 projects concurrently with a re-entrancy guard to prevent overlapping fire-and-forget runs.
Closes #2216, advances #2220, #2213

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* build: regenerate plugin artifacts after cherry-picks

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* feat: foundation consumers + Cursor/stdin/queue/docs fixes

F1 spawnHidden adoption (#2236):
- 8 spawn → spawnHidden conversions across worker-utils, ProcessManager, npx-cli (install/runtime), supervisor/process-registry

F3 getUptimeSeconds adoption (#2250):
- Server.ts:165 (THE BUG: returned ms)
- Server.ts:270, SessionRoutes.ts:326 (4th ms-bug consumer found), DataRoutes.ts:225 (refactor for consistency)

#2188 stdin '{}' fallback removal:
- Diagnostic logging to <DATA_DIR>/logs/runner-errors.log + CAPTURE_BROKEN marker; exit 0 to preserve Windows Terminal exit-code strategy

#2196 ANTHROPIC_BASE_URL docs:
- New docs/public/configuration/custom-anthropic-backends.mdx
- Note: issue may need separate auto-detect feature; docs document existing plumbing only

#2242 check-pending-queue endpoints:
- Point at /api/processing-status + /api/processing per DataRoutes.ts; honor CLAUDE_MEM_WORKER_PORT env

#2248 Cursor sessions never summarized:
- Pulled reporter wbingli's tested fix (commit 46eaba44)
- Bug A: cursor adapter now derives transcriptPath from cwd+sessionId
- Bug B: parser accepts both line.type and line.role
- Bug C: walk backward, prefer non-empty text, fallback to empty
- Tests: 10-case regression suite + tests/fixtures/cursor-session.jsonl

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* feat: F2 paths namespace adoption (#2237 + #2238)

Replaced 24 hardcoded homedir() + '.claude-mem' sites across 18 source files with paths.<accessor>() calls from src/shared/paths.ts. Accessors used: dataDir, workerPid, settings, database, chroma, combinedCerts, transcriptsConfig, transcriptsState, corpora, supervisorRegistry, envFile, logsDir.
Sites converted (file:area):
- src/cli/claude-md-commands.ts (database)
- src/services/context/ContextConfigLoader.ts (settings)
- src/services/infrastructure/ProcessManager.ts (workerPid)
- src/services/infrastructure/WorktreeAdoption.ts (settings)
- src/services/integrations/CodexCliInstaller.ts (settings)
- src/services/sync/ChromaMcpManager.ts (chroma + combinedCerts)
- src/services/transcripts/config.ts (transcriptsConfig + transcriptsState)
- src/services/worker/ClaudeProvider.ts (envFile)
- src/services/worker/GeminiProvider.ts (envFile + 2 more)
- src/services/worker/http/routes/DataRoutes.ts (dataDir)
- src/services/worker/http/routes/SettingsRoutes.ts (settings + envFile)
- src/services/worker/knowledge/CorpusStore.ts (corpora)
- src/shared/EnvManager.ts (envFile)
- src/supervisor/index.ts (supervisorRegistry)
- src/supervisor/process-registry.ts (supervisorRegistry)
- src/supervisor/shutdown.ts (supervisorRegistry)
- src/utils/claude-md-utils.ts (database)
- src/utils/logger.ts (logsDir + settings, lazy to avoid cycle)

CLAUDE_MEM_DATA_DIR override now flows through 100% of the worker runtime; no per-file env reads needed.

Verification:
- Grep guard: zero homedir+'.claude-mem' sites remain in src/ (excluding paths.ts itself and SettingsDefaultsManager.ts)
- F2 invariant test: 3/3 pass (60 expects)
- Foundation tests: 19/19 pass

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* feat: F4 provider classification + parser fence + OAuth keychain

F4 adoption (#2244 + #2254):
- Per-provider classifiers: classifyClaudeError, classifyGeminiError, classifyOpenRouterError. Each lives in the provider file.
- New retry helper at src/services/worker/retry.ts: withRetry() honors ClassifiedProviderError.kind; retriable=transient/rate_limit (with retryAfterMs); not retriable=unrecoverable/auth_invalid/quota_exhausted. maxRetries=2, perAttemptTimeout=30s, exponential backoff with jitter.
- GeminiProvider + OpenRouterProvider fetch calls wrapped with retry. Best-effort request-id capture (x-goog-request-id, x-request-id, x-openrouter-request-id) for dedup logging.
- Deleted unrecoverablePatterns allowlist at worker-service.ts:540 area; worker dispatches on err.kind instead.
- 28 new classifier tests at tests/worker/provider-classifiers.test.ts: 429-no-Retry-After, 500-with-quota-exceeded, OverloadedError, per-provider auth_invalid signals.

#2233 Part A (parser fence handling):
- src/sdk/prompts.ts: removed 4 fence markers from XML example blocks. Model now sees plain XML, eliminating the failure-mode that drained quota via repeated retries.
- src/sdk/parser.ts: stripCodeFences() at top, called before parseAgentXml. Fence-tolerant regardless of model behavior.
- TODO comment references #2233 Part B (tool-use migration as separate scope).
- 4 fence-tolerance tests added to tests/sdk/parser.test.ts.

#2215 OAuth token keychain:
- New src/shared/oauth-token.ts (~360 LOC): readClaudeOAuthToken() reads from platform-native credential stores at worker spawn-time.
  - macOS: security find-generic-password -s "Claude Code-credentials"
  - Windows: PowerShell wrapper around CredRead (Win32 Advapi32.dll)
  - Linux: secret-tool lookup
  - Fallback: env CLAUDE_CODE_OAUTH_TOKEN with JWT exp claim or sidecar expiresAt validation; refuses stale-token injection.
- EnvManager.buildIsolatedEnvWithFreshOAuth() (async) replaces silent process.env copy. Empty injection on absent; marker write on expired.
- <DATA_DIR>/oauth-stale.marker surfaces "re-login via Claude Desktop" via existing SessionStart additionalContext mechanism (context.ts).
- ClaudeProvider.startSession + KnowledgeAgent.prime/executeQuery now await the async env builder.
- 17 oauth-token tests covering decodeJwtExpMs, marker round-trip, env-fallback expiry detection.
Verification:
- npx tsc --noEmit: only pre-existing bun-types error
- bun test (foundations + new): 70 pass, 0 new fails (8 fails are pre-existing parser.test.ts cases unrelated to fence work)

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* feat: #2234 quota-aware wall-clock guard

New src/services/worker/RateLimitStore.ts (207 LOC): vendor pattern from meridian/rateLimitStore.ts (MIT, copied not depended).

API:
- class RateLimitStore: set/get/getAll/getMostRecentByWindow/size/clear, in-memory last-write-wins keyed by rateLimitType.
- globalRateLimitStore singleton.
- shouldAbortForQuota(authMethod, store, now?) → {abort, reason?, window?}
- isApiKeyAuth(authMethod): matches both verbose getAuthMethodDescription strings and concise "api_key".

Thresholds (auth-type gated):
- api_key: never aborts (user authorized per-call spend).
- cli/oauth/subscription:
  - five_hour utilization >= 0.95 OR resetsAt within 15min (with 0.85 utilization floor to avoid false trip on freshly-reset windows)
  - seven_day_opus >= 0.93
  - seven_day_sonnet >= 0.92
  - seven_day >= 0.93
  - overage >= 0.95

ClaudeProvider integration (line 198, for-await loop):
- Detects message.type === 'system' && subtype === 'rate_limit'
- Records rate_limit_info via globalRateLimitStore.set
- Calls shouldAbortForQuota(authMethod, globalRateLimitStore)
- On abort: session.abortReason = 'quota:<window>', abortController.abort, break out of loop. Worker continues other sessions.

Health endpoint (Server.ts:174):
- New rateLimits field on /api/health from getMostRecentByWindow().
- Field shape: {five_hour?, seven_day?, seven_day_opus?, seven_day_sonnet?, overage?} each carrying utilization, status, resetsAt, observedAt.

Tests (tests/worker/rate-limit-store.test.ts):
- 22 cases covering store CRUD, isApiKeyAuth, abort decision matrix.
- api_key never aborts at any utilization.
- cli aborts at threshold breaches per window.
- Reset-grace buffer with utilization floor.
Verification:
- npx tsc --noEmit: only pre-existing bun error
- bun test tests/worker/rate-limit-store.test.ts: 22/22 pass
- bun test tests/claude-provider-resume.test.ts: 9/9 pass
- bun test tests/server/: 44/44 pass

Plugin artifacts regenerated.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* build: regenerate worker-service.cjs after final build-and-sync

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* test: align test assertions with F4 classification + timeout

Two test fixes for branch-introduced regressions vs main:

1. tests/gemini_provider.test.ts "should throw on other errors": F4's classifyGeminiError replaced upstream Error message with ClassifiedProviderError. Test was pinned to pre-F4 string. Updated assertion to match new "Gemini bad request (status 400)".
2. tests/infrastructure/graceful-shutdown.test.ts: Test pokes real ~/.claude-mem/supervisor.json registry which on a developer machine contains live worker + chroma-mcp PIDs. SIGTERM → wait → SIGKILL cascade takes ~6s end-to-end. Bumped per-test timeout to 15000ms. Underlying shutdown code unchanged. Future cleanup should mock getSupervisor() here.

Result: branch failure count == main (77 pre-existing failures). No new regressions from this branch's work.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* review: address 4 Greptile P1/P2 findings on PR #2282

P1 (real bug): clearStaleMarker silently broken in ESM
- src/shared/oauth-token.ts:14: add unlinkSync to top-level fs import
- src/shared/oauth-token.ts:342: drop inline require('fs'), call unlinkSync directly. ESM has no require, so the previous code threw ReferenceError swallowed by try/catch, making clearStaleMarker a permanent no-op. Stale oauth marker would persist indefinitely after Claude Desktop refreshed the token.
P2 (security): execSync shell-string interpolation
- src/shared/find-claude-executable.ts:39: execSync(`"${candidate}" --version`) → execFileSync(candidate, ['--version']). Path containing ", ;, & (reachable on Windows via crafted CLAUDE_CODE_PATH in settings.json) would otherwise produce a malformed/exploitable command.

P2 (security): PowerShell username injection
- src/shared/oauth-token.ts:119: userInfo().username escaped with PS single-quote convention (' → '') before interpolation into `'Claude Code-credentials:${user}'`. Defensive against future Windows versions or domain-joined machines that may permit ' in usernames.

P2 (style): Unreachable throw lastError post-loop
- src/services/worker/retry.ts:109: explained as the safety net for opts.maxRetries < 0 (pathological input where the loop never executes and lastError is undefined). Annotated with comment + descriptive fallback Error so the dead-looking code is now self-documenting.

Verification:
- npx tsc --noEmit: clean (only pre-existing bun-types error)
- bun test tests/shared/oauth-token.test.ts tests/worker/provider-classifiers.test.ts tests/worker/provider-errors.test.ts: 50 pass / 0 fail

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* review: tighten SECURITY.md data-flow and audit dates

Fixes CodeRabbit comments #3178957249 (Data Storage section overstated "no external transmission"; softened to call out Claude Agent SDK, alternate provider, Chroma MCP, OAuth keychain, and registry fetches) and #3178957250 (Next Scheduled Audit was earlier than Last Updated; bumped Last Updated to 2026-05-03 and audit to 2026-09-16) on PR #2282.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* review: drop inline require('fs') in paths.ts

Fixes CodeRabbit outside-diff comment on src/shared/paths.ts:25-29 from PR #2282 review.
resolveDataDir() ran require('fs') inside an ESM module (this file uses import.meta.url and .js imports), which can break in strict ESM environments. readFileSync now imports at the top alongside existsSync/mkdirSync.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* review: block CLAUDE_CODE_OAUTH_TOKEN from parent env (issue #2215)

Fixes CodeRabbit outside-diff comment on src/shared/EnvManager.ts:14-17 from PR #2282 review.

The OAuth-token leak fix was bypassed because buildIsolatedEnv() copied every parent env var that wasn't in BLOCKED_ENV_VARS, but CLAUDE_CODE_OAUTH_TOKEN was not blocked. A stale parent token therefore still reached isolatedEnv even when the fresh keychain read returned expired/absent, defeating the fix documented inline at lines 178-183.

Adds CLAUDE_CODE_OAUTH_TOKEN to BLOCKED_ENV_VARS and defensively deletes it again at the top of buildIsolatedEnvWithFreshOAuth() so the fresh-spawn-time read is the only path that can populate it.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* review: validate cursor sessionId against path traversal

Fixes CodeRabbit comment #3178957252 on PR #2282.

The Cursor adapter took sessionId straight from stdin and concatenated it into a join(homedir(), '.cursor', 'projects', ..., sessionId, ...) path. A crafted value containing path separators or '..' segments could escape ~/.cursor/projects, and the later transcript read would then probe arbitrary local files.

deriveCursorTranscriptPath() now rejects any sessionId that doesn't match /^[A-Za-z0-9_-]+$/; Cursor's real session ids are UUID-style identifiers, so the safe whitelist is non-disruptive.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* review: scope stripCodeFences() to full-wrapper payloads only

Fixes CodeRabbit comment #3178957253 on PR #2282.
The previous regex greedily removed the first opening and last closing triple-backticks anywhere in the input, which could mangle valid content with internal fenced examples or surrounding prose, and ran before XML parsing so it created false negatives.

stripCodeFences() now only strips when the entire payload is a single fenced block (start-to-end, with optional language tag and surrounding whitespace), capturing the inner content. Adds a regression test that feeds prose with internal triple-backtick markers around a real <observation> block and asserts the inner ``` are preserved.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* review: honor abortSignal during retry backoff sleep

Fixes CodeRabbit comment #3178957263 on PR #2282.

The retry helper used an unconditional `setTimeout` Promise for backoff between attempts, so an external abort that fired during the wait was delayed until the timer completed.

The backoff now races setTimeout against opts.abortSignal: if the signal flips, the timer is cleared and the Promise rejects with 'Aborted' immediately. The abort listener is registered with { once: true } and removed when the timer fires to avoid leaks.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* review: abort immediately on provider-side rejected status

Fixes CodeRabbit comment #3178957261 on PR #2282.

shouldAbortForQuota() only checked utilization thresholds and reset-grace heuristics; a snapshot with status='rejected' (or overageStatus='rejected' on the overage window) but no utilization number could still return { abort: false }, letting the worker keep consuming after the provider had already declared the bucket exhausted.

Provider-side rejection is now checked before utilization. When either rejection signal is present the guard returns abort=true with reason "quota:<window> rejected by provider".
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* review: only bump Chroma watermark on confirmed batch writes

Fixes CodeRabbit comments #3178957259 (watermark advances on swallowed batch failures) and #3178957260 (backfillInProgress can stick true if init throws) on PR #2282.

addDocuments() previously logged and swallowed per-batch failures with a void return type, so all three backfill loops (observations, summaries, prompts) bumped the watermark unconditionally after the call, turning a transient Chroma failure into permanently-skipped records. addDocuments() now returns the count of documents that actually landed (including delete+add reconcile retries), and each loop only advances the watermark when the batch wrote successfully. Failed batches log a debug message and continue so the loop still gets through the rest.

backfillAllProjects() now constructs SessionStore and ChromaSync inside a try block so a constructor throw can't leave the static backfillInProgress guard stuck true and silently skip every later backfill. The finally always clears the guard and best-effort closes each resource.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* review: fall back to pid kill when process group is gone

Fixes CodeRabbit outside-diff comment on src/supervisor/shutdown.ts:118-134 from PR #2282 review.

signalProcess() returned silently when a pgid was present and process.kill(-pgid, signal) threw ESRCH, never attempting the per-pid signal. With the new chroma registration path that records a pgid alongside the pid, an already-collapsed group could turn shutdown into a no-op even though the root pid was still alive.

The POSIX branch now tries -pgid first when present, and on ESRCH falls through to process.kill(pid, signal). Non-ESRCH errors still propagate.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* review: settings path, uptime clamp, fetch timeouts

Fixes three smaller CodeRabbit issues on PR #2282:
- SettingsRoutes (outside-diff #2282 review on lines 65-79): the parse-error response told users to delete ~/.claude-mem/settings.json even when paths.settings() resolved elsewhere. Now uses the resolved settingsPath variable in the message.
- uptime.ts (#3178957264 / lines 2-3): getUptimeSeconds() could return a negative value if startedAtMs was in the future or the system clock moved backward. Clamps with Math.max(0, ...) so health endpoints never see negative seconds.
- check-pending-queue.ts (#3178957248 / lines 27-45): checkWorkerHealth, getProcessingStatus and triggerProcessing all called fetch with no timeout, so the script could block forever if the worker accepted the TCP connection but never responded. Wraps each fetch with an AbortController + 10s timeout that throws a clear timeout message.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* review: walk descendants recursively when killing chroma-mcp tree

Fixes CodeRabbit comment #3178957258 on PR #2282.

The POSIX teardown in ChromaMcpManager.killProcessTree() relied on `pkill -P <pid>`, which only signals direct children. Under uv, chroma-mcp spawns python as a grandchild; when uv exits and python re-parents to init, pkill -P never reaches it and the descendant survives the "tree kill".

killProcessTree() now collects the full descendant set via a recursive `pgrep -P` walk before each signal phase. The walk returns leaves first so signals propagate bottom-up (SIGTERM children before their parents, then again for SIGKILL after the 500ms grace window so any layer that re-parented during teardown still gets cleaned up). pgrep failures (no children, missing binary) return [] so this stays best-effort and falls back to the existing per-pid signal.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* review: tolerate malformed JSONL lines in transcript-parser

Fixes Greptile P1 comment 3178964456 on PR #2282.

extractLastMessageFromJsonl previously called JSON.parse(rawLine) with no guard. A truncated/malformed JSONL line (common when a transcript was crashed mid-write or partially flushed) would throw SyntaxError, crash the summarization pipeline for that session, and silently lose all prior valid messages.

Fix: wrap JSON.parse in try/catch and skip bad lines. The empty-line guard only catches truly empty strings, not malformed fragments.

Regression tests added for two cases:
- Mixed valid + truncated lines: returns last valid match.
- All lines malformed: returns empty string (no throw).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* review: classify FK constraint failures BEFORE provider classifier

Fixes Greptile P1 comment 3178979583 on PR #2282.

The F4 #2244 work introduced a regression: reclassifyAtDispatch always returns a non-null ClassifiedProviderError for known agent types (Claude/Gemini/OpenRouter), so the isFkConstraintFailure branch was dead code. Per-provider classifiers don't recognize "FOREIGN KEY constraint failed", so SQLite FK failures fell through to the default 'transient' kind and would retry indefinitely (restart loop on corrupted session DB state).

Old unrecoverablePatterns explicitly listed FK constraint as unrecoverable; restoring that semantic by checking FK FIRST and only deferring to the classifier when not an FK error.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* review: validate CLAUDE_MEM_WORKER_PORT in check-pending-queue

Parse the env var, range-check (1-65535), and fall back to 37777 with a console.warn on invalid input instead of letting a malformed value flow into the URL builder unchecked (CodeRabbit Minor on PR #2282).
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* review: SIGKILL union of pre-TERM and post-wait descendant sets

When the chroma-mcp root exits during the SIGTERM grace window, its descendants get re-parented to init and drop out of the post-wait pgrep -P scan. Without including the pre-TERM snapshot, those re-parented PIDs would never receive SIGKILL even though they were definitely children before SIGTERM and may still be alive (CodeRabbit Major on PR #2282).

Compute Array.from(new Set([...descendantsBeforeTerm, ...descendantsBeforeKill])) and SIGKILL the union. The two sets typically overlap, so dedupe is required.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* review: enforce addDocuments return-count in direct sync paths

syncObservation/syncSummary/syncUserPrompt now capture the written count from addDocuments() and only bump the watermark when every requested document landed in Chroma. addDocuments() tolerates per-batch failures (returns the actual written count), so the previous unconditional bump was silently marking unsynced rows as synced on transient errors, preventing the next backfill from retrying them (CodeRabbit Major on PR #2282).

A partial write now logs a warn with the (requested, written) pair and preserves retryability on the next pass.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* review: guard backfill watermark against non-contiguous failures

The backfill watermark is a single monotonic id, so it cannot represent sparse success: "synced through 200, gap at 201–250, then 251 onward" would, on restart, skip 201–250 forever because the watermark sat at either 200 or 251; both lose data (CodeRabbit Major on PR #2282).

Add a per-loop hadGap flag to backfillObservations / backfillSummaries / backfillPrompts. Once any batch under-writes, every subsequent batch must also skip the bump, regardless of whether it itself succeeded.
Also tighten the failure check from `writtenInBatch <= 0` to `writtenInBatch < batch.length` so partial-batch writes are caught. The watermark stays at the last contiguously-synced position; the next backfill pass retries from there, eventually closing the gap.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* review: clear oauth-stale marker when token is absent

When an OAuth token disappears entirely (user logs out, keychain cleared), buildIsolatedEnvWithFreshOAuth's absent branch was leaving any prior stale-marker file in place. The session-start hook would then keep surfacing an "expired token, re-login" warning even though the token is no longer expired; it's gone, and re-login was already done elsewhere or not applicable (CodeRabbit Minor on PR #2282).

Call clearStaleMarker() in the absent branch the same way the present branch already does. Add a regression test exercising the full buildIsolatedEnvWithFreshOAuth path: pre-write a marker, force absent via spoofed unsupported platform, assert the marker is gone after.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* review: skip unknown message.content shapes instead of throwing

extractLastMessageFromJsonl already tolerates malformed JSONL lines (JSON.parse failure -> continue), but a valid JSON line whose message.content is an unexpected type (null, number, plain object) was still throwing, contradicting the new tolerance and crashing the entire summary pipeline on a single weird line (CodeRabbit Major + Greptile P1 on PR #2282).

Replace the `throw new Error(...)` with `continue` so a single bad content shape skips that line instead of failing the whole transcript read. Forward compat: future content schemas land harmlessly.

Add regression tests covering null, number, and plain-object content; each must not throw and must fall back to the most recent valid line.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* review: guard null/primitive entries in message.content array

Fixes CodeRabbit comment 3179004190 on PR #2282.

The Array.isArray branch previously did `c.type === 'text'` directly, which throws if `c` is null or a primitive (possible in malformed logs). Tightened the filter with a type guard: requires c to be a non-null object with type === 'text' and a string text field. Same defensive class as the malformed-line and unknown-content-shape tolerances.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-03 22:27:07 -07:00
parent 7fb0745ddb
commit d384d3c595
63 changed files with 4437 additions and 1082 deletions
@@ -0,0 +1,206 @@
+# Security Policy
+
+## Supported Versions
+
+Only the latest released version of `claude-mem` receives security updates. Please upgrade to the latest version before reporting a vulnerability.
+
+| Version | Supported          |
+| ------- | ------------------ |
+| latest  | :white_check_mark: |
+| older   | :x:                |
+
+## Reporting a Vulnerability
+
+If you discover a security vulnerability in `claude-mem`, please report it as follows:
+
+1. **DO NOT** create a public GitHub issue, pull request, or discussion
+2. Email **alex@cmem.ai** with details, OR use GitHub's "Report a vulnerability" button under the Security tab to open a private security advisory
+3. Include steps to reproduce, impact assessment, affected version(s), and suggested fixes if possible
+
+**Scope:** This policy covers the `claude-mem` plugin and its bundled components (hooks, worker service, SQLite/Chroma sync, viewer UI, search/planning skills). Issues in upstream dependencies should be reported to those projects directly, but feel free to flag them to us as well.
+
+We take security seriously, will acknowledge valid reports within 48 hours, and aim to ship a fix in the next release.
+
+## Security Measures
+
+### Command Injection Prevention
+
+`claude-mem` executes system commands for git operations and process management. The following protections guard against command injection:
+
+#### Safe Command Execution
+- **Array-based Arguments:** All commands use array-based arguments to prevent shell interpretation
+- **No Shell Execution:** `shell: false` is explicitly set for all spawn operations involving user input
+- **Input Validation:** All user-controlled parameters are validated before use
+
+#### Example Safe Pattern
+```typescript
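+import { execSync, spawnSync } from 'node:child_process';
+
+// ✅ SAFE: Array-based arguments with validation (isValidBranchName is the project's validator)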
+if (!isValidBranchName(userInput)) {
+  throw new Error('Invalid input');
+}
+spawnSync('git', ['checkout', userInput], { shell: false });
+
+// ❌ UNSAFE: Never do this
+execSync(`git checkout ${userInput}`);
+```
+
+### Input Validation
+
+All user-controlled inputs are validated using whitelists and strict patterns:
+
+- **Branch Names:** Must match `/^[a-zA-Z0-9][a-zA-Z0-9._/-]*$/` and not contain `..`
+- **Port Numbers:** Must be numeric and within range 1024-65535
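+- **File Paths:** Paths are built with `path.join()`. Because `path.join()` normalizes but does not reject `..` segments, untrusted path segments are validated first (for example, Cursor session ids must match `/^[A-Za-z0-9_-]+$/`)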
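
The three rules above could be sketched as follows. Only the name `isValidBranchName` and the two regex/range rules come from this policy. The function bodies and the helpers `isValidPort` and `resolveInside` are assumptions made for illustration, not the project's actual code.

```typescript
import * as path from 'node:path';

// Sketch only: bodies are assumptions, not claude-mem's implementation.
const BRANCH_NAME = /^[a-zA-Z0-9][a-zA-Z0-9._/-]*$/;

// Whitelist pattern plus an explicit ban on ".." sequences.
function isValidBranchName(name: string): boolean {
  return BRANCH_NAME.test(name) && !name.includes('..');
}

// Hypothetical helper: digits only, within the documented 1024-65535 range.
function isValidPort(value: string): boolean {
  if (!/^\d+$/.test(value)) return false;
  const port = Number(value);
  return port >= 1024 && port <= 65535;
}

// Hypothetical helper: resolve `segment` under `baseDir` and refuse any
// result that lands outside it. path.join() alone would not catch this.
function resolveInside(baseDir: string, segment: string): string {
  const base = path.resolve(baseDir);
  const target = path.resolve(base, segment);
  const rel = path.relative(base, target);
  if (rel === '..' || rel.startsWith('..' + path.sep) || path.isAbsolute(rel)) {
    throw new Error('Path escapes base directory');
  }
  return target;
}
```

Checking the resolved path against its base directory is a common way to contain traversal; whether the project uses this exact check is not stated here.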
+
+### Process Management
+
+- **PID File Protection:** Process IDs are stored in the user's data directory (`~/.claude-mem/`)
+- **Port Validation:** Worker port is validated before binding
+- **Health Checks:** Worker health is verified before processing requests
+
+### Privacy Controls
+
+`claude-mem` includes a dual-tag system for content privacy:
+
+- `<private>content</private>` - User-level privacy (prevents storage)
+- `<claude-mem-context>content</claude-mem-context>` - System-level tag (prevents recursive storage)
+
+Tags are stripped at the hook layer before data reaches worker/database.
+
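A minimal sketch of what stripping these two tags could look like, assuming a simple regex pass over non-nested tags. The name `stripPrivacyTags` is hypothetical, and the actual hook-layer implementation may differ.

```typescript
// Hypothetical helper, not the project's implementation.
const PRIVACY_TAGS = ['private', 'claude-mem-context'] as const;

function stripPrivacyTags(text: string): string {
  let result = text;
  for (const tag of PRIVACY_TAGS) {
    // Non-greedy match so that two separate tagged spans are removed
    // individually, rather than everything between the first opening tag
    // and the last closing tag.
    const pattern = new RegExp(`<${tag}>[\\s\\S]*?</${tag}>`, 'g');
    result = result.replace(pattern, '');
  }
  return result;
}

console.log(stripPrivacyTags('keep <private>secret</private> this'));
```

The sketch removes the tags together with their content, which matches the stated goal of preventing storage. Nested or malformed tags are not handled here.
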
+## Security Audit History
+
+### 2025-12-16: Command Injection Vulnerability (Issue #354)
+- **Severity:** CRITICAL
+- **Status:** RESOLVED
+- **Affected Versions:** All versions prior to fix
+- **Fixed In:** Current version
+- **Vulnerabilities Found:** 3
+- **Vulnerabilities Fixed:** 3
+
+**Summary of Fixes:**
+1. Replaced string interpolation with array-based arguments in `BranchManager.ts`
+2. Added `isValidBranchName()` validation function
+3. Removed unnecessary shell usage in `bun-path.ts`
+4. Created comprehensive security test suite
+
+## Security Best Practices for Contributors
+
+### When Adding Command Execution
+
+1. **NEVER use shell with user input:**
+   ```typescript
+   // ❌ NEVER
+   execSync(`command ${userInput}`);
+   spawn('command', [userInput], { shell: true });
+
+   // ✅ ALWAYS
+   spawnSync('command', [userInput], { shell: false });
+   ```
+
+2. **ALWAYS validate user input:**
+   ```typescript
+   if (!isValidInput(userInput)) {
+     throw new Error('Invalid input');
+   }
+   ```
+
+3. **Use array-based arguments:**
+   ```typescript
+   // ❌ NEVER
+   execSync(`git ${command} ${arg}`);
+
+   // ✅ ALWAYS
+   spawnSync('git', [command, arg], { shell: false });
+   ```
+
+4. **Explicitly set shell: false:**
+   ```typescript
+   spawnSync('command', args, { shell: false });
+   ```
+
+### When Adding User Input
+
+1. **Whitelist validation** over blacklist
+2. **Strict regex patterns** for format validation
+3. **Type checking** for expected data types
+4. **Range validation** for numeric inputs
+5. **Length limits** for string inputs
+
+### Code Review Checklist
+
+Before submitting a PR with command execution or user input handling:
+
+- [ ] No `execSync` with string interpolation or template literals
+- [ ] No `shell: true` when user input is involved
+- [ ] All spawn/spawnSync calls use array arguments
+- [ ] Input validation is present for all user-controlled parameters
+- [ ] Security tests are added for new attack vectors
+- [ ] Code follows the safe patterns described above
+
+## Dependencies
+
+We regularly audit dependencies for vulnerabilities:
+
+- **npm audit:** Run before each release
+- **Dependabot:** Enabled for automatic security updates
+- **Manual Review:** Critical dependencies reviewed quarterly
+
+## Data Storage
+
+Claude-mem stores data locally in `~/.claude-mem/`:
+
+- **Database:** SQLite3 at `~/.claude-mem/claude-mem.db`
+- **Vector Store:** Chroma at `~/.claude-mem/chroma/`
+- **Logs:** `~/.claude-mem/logs/`
+- **Settings:** `~/.claude-mem/settings.json`
+
+All claude-mem state files (database, vector store, logs, settings, supervisor and PID files) are written to the local user directory and are not uploaded by claude-mem itself. Claude-mem does not collect telemetry.
+
+However, by design claude-mem invokes upstream model providers and optional integrations to do its work, so observation/transcript/prompt content can leave the machine through those channels:
+
+- **Claude Agent SDK** (default summarization/observation path): sends prompts and transcript context to Anthropic's API.
+- **Alternate providers** (`gemini`, `openrouter`): when configured, send the same context to those providers instead.
+- **Chroma MCP / `chroma-mcp`**: when enabled, computes embeddings via the configured embedding backend, which may be a remote API depending on the user's chroma-mcp configuration.
+- **OAuth / keychain reads**: claude-mem reads the Claude Code OAuth token from the platform-native credential store at spawn time. The token is injected into worker subprocesses but is not transmitted by claude-mem.
+- **GitHub releases / npm registry**: version-check and self-update flows fetch metadata from public registries.
+
+Review your provider/Chroma configuration in `~/.claude-mem/settings.json` and `~/.claude-mem/.env` before sending sensitive content. Use `<private>...</private>` tags to keep specific content out of the local store.
+
+## Permissions
+
+Claude-mem requires:
+
+- **File System:** Read/write to `~/.claude-mem/` and `~/.claude/plugins/`
+- **Network:** HTTP server on localhost (default port 37777)
+- **Process Management:** Spawn worker processes, manage PIDs
+
+No elevated privileges (root/administrator) are required.
+
+## Secure Defaults
+
+- **Worker Host:** Binds to `127.0.0.1` by default (localhost only)
+- **Worker Port:** User-configurable, validates range 1024-65535
+- **Log Level:** INFO by default (no sensitive data in logs)
+- **Privacy Tags:** Auto-strips private content before storage
+
+## Updates
+
+Security patches are released as soon as possible after discovery. Users should:
+
+1. Keep claude-mem updated to the latest version
+2. Monitor GitHub releases for security announcements
+3. Review [CHANGELOG.md](./CHANGELOG.md) for security-related changes
+
+## Questions?
+
+For security-related questions (non-vulnerabilities), please:
+
+1. Review code comments in security-critical files
+2. Open a GitHub Discussion (not an Issue) for general security questions
+3. For sensitive questions, email **alex@cmem.ai**
+
+---
+
+**Last Updated:** 2026-05-03
+**Last Audit:** 2025-12-16 (Issue #354)
+**Next Scheduled Audit:** 2026-09-16
@@ -0,0 +1,112 @@
+---
+title: "Custom Anthropic-Compatible Backends"
+description: "Point claude-mem at bridged or self-hosted Anthropic-compatible API endpoints with ANTHROPIC_BASE_URL"
+---
+
+# Custom Anthropic-Compatible Backends
+
+When you use the `claude` provider, claude-mem talks to the Anthropic API through the Claude Agent SDK. By default, the SDK targets the official Anthropic endpoint, but it honors the standard `ANTHROPIC_BASE_URL` environment variable. That means you can point claude-mem at any Anthropic-protocol-compatible backend (for example a corporate gateway, a regional bridge, or a third-party provider that exposes an Anthropic-shaped API) without changing any claude-mem source code.
+
+<Note>
+This page documents how to **persist a custom base URL** so claude-mem's worker uses it consistently. It does **not** add an OpenAI-compatible provider, and it does **not** auto-detect the bridge configuration from your shell — both of those are tracked separately in [issue #2196](https://github.com/thedotmack/claude-mem/issues/2196). For now, configuration is manual.
+</Note>
+
+## When to Use This
+
+Use `ANTHROPIC_BASE_URL` if you need claude-mem's observation worker to talk to:
+
+- A **corporate Anthropic gateway** (proxy in front of `api.anthropic.com`)
+- A **regional Anthropic deployment** (e.g. AWS Bedrock or GCP Vertex via an Anthropic-compatible shim)
+- A **third-party provider** that bridges its API to the Anthropic protocol
+
+If your provider only speaks the OpenAI chat-completions protocol (DeepSeek native, Ollama, vLLM, LiteLLM), use the [OpenRouter provider](../usage/openrouter-provider) instead — it speaks OpenAI-style chat completions and accepts a base URL via OpenRouter's gateway.
+
+## How the Plumbing Works
+
+The flow is intentionally simple:
+
+1. **You write the credential** to `~/.claude-mem/.env`.
+2. **`EnvManager.loadClaudeMemEnv()`** reads that file (`src/shared/EnvManager.ts:67`).
+3. **`buildIsolatedEnv()`** copies `ANTHROPIC_BASE_URL` into the worker's spawn environment alongside `ANTHROPIC_API_KEY` (`src/shared/EnvManager.ts:164`).
+4. **`ClaudeProvider.startSession()`** spawns the Claude Agent SDK with that isolated env (`src/services/worker/ClaudeProvider.ts:115`). The SDK reads `ANTHROPIC_BASE_URL` natively — claude-mem does not parse or rewrite it.
+
+Because the variable is isolated to the worker process, your interactive Claude Code sessions are unaffected; only the background memory agent uses the override.
+
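The four steps above can be sketched as follows. This is an illustrative sketch, not the code in `EnvManager.ts`: `parseEnvFile`, `buildWorkerEnv`, and the `BLOCKED` set are hypothetical names. It assumes the credentials file is a plain `KEY=VALUE` file and that a shell-provided `ANTHROPIC_API_KEY` is dropped before the file's values are applied.

```typescript
// Illustrative sketch; the names below are hypothetical, not claude-mem's API.
type Env = Record<string, string>;

// Parse KEY=VALUE lines, skipping blank lines and '#' comments.
function parseEnvFile(content: string): Env {
  const env: Env = {};
  for (const rawLine of content.split('\n')) {
    const line = rawLine.trim();
    if (line === '' || line.startsWith('#')) continue;
    const eq = line.indexOf('=');
    if (eq <= 0) continue; // no '=' at all, or an empty key
    env[line.slice(0, eq).trim()] = line.slice(eq + 1).trim();
  }
  return env;
}

// Assumed for illustration: shell-provided values for these keys are dropped.
const BLOCKED = new Set(['ANTHROPIC_API_KEY']);

// Start from the parent environment minus blocked keys, then layer the
// credentials read from the env file on top.
function buildWorkerEnv(parent: Env, credentials: Env): Env {
  const env: Env = {};
  for (const [key, value] of Object.entries(parent)) {
    if (!BLOCKED.has(key)) env[key] = value;
  }
  return { ...env, ...credentials };
}

const credentials = parseEnvFile(
  '# ~/.claude-mem/.env\nANTHROPIC_API_KEY=sk-ant-example\nANTHROPIC_BASE_URL=https://your-gateway.example.com/v1\n',
);
const workerEnv = buildWorkerEnv(
  { PATH: '/usr/bin', ANTHROPIC_API_KEY: 'shell-leaked' },
  credentials,
);
console.log(workerEnv.ANTHROPIC_BASE_URL);
```

In this sketch the worker environment ends up with the key and base URL from the file, while the shell-leaked key never reaches the spawned process.
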
+## Configuration
+
+### Step 1: Edit `~/.claude-mem/.env`
+
+The credentials file is a plain `KEY=VALUE` env file at `~/.claude-mem/.env` (mode `0600`). Add or update the `ANTHROPIC_BASE_URL` line:
+
+```bash
+# ~/.claude-mem/.env
+ANTHROPIC_API_KEY=sk-ant-...
+ANTHROPIC_BASE_URL=https://your-gateway.example.com/v1
+```
+
+If the file does not yet exist, create it. The worker sets the directory permissions to `0700` and the file permissions to `0600` automatically the next time it writes the file.
+
+### Step 2: Pick a Compatible Model
+
+`CLAUDE_MEM_MODEL` (in `~/.claude-mem/settings.json`) is passed straight through to the SDK. The model name **must be one your bridge accepts** — claude-mem does not translate names.
+
+```json
+{
+  "CLAUDE_MEM_MODEL": "claude-haiku-4-5-20251001"
+}
+```
+
+If your bridge expects a non-Anthropic model name (for example, a Bedrock inference profile), set that string here instead.
+
+### Step 3: Restart the Worker
+
+Credentials are loaded when the worker spawns the SDK, so a restart is required after you edit `.env`:
+
+```bash
+npm run worker:restart
+```
+
+## Worked Example: Corporate Gateway
+
+Suppose your team runs `https://anthropic-proxy.internal.example.com` in front of `api.anthropic.com` for audit and rate-limit purposes. The proxy accepts the same protocol and the same model names.
+
+`~/.claude-mem/.env`:
+
+```bash
+ANTHROPIC_API_KEY=sk-corp-...
+ANTHROPIC_BASE_URL=https://anthropic-proxy.internal.example.com
+```
+
+`~/.claude-mem/settings.json`:
+
+```json
+{
+  "CLAUDE_MEM_PROVIDER": "claude",
+  "CLAUDE_MEM_MODEL": "claude-haiku-4-5-20251001"
+}
+```
+
+Restart, and the next observation will be routed through your gateway.
+
+## Verifying
+
+After restarting, watch the worker logs for the next observation flush:
+
+```bash
+npm run worker:logs
+```
+
+A successful request through your gateway shows the standard `SDK Starting SDK query` line followed by `Response received`. If the gateway rejects the request, the SDK error surfaces verbatim in `worker-error.log` — there is no silent fallback to the public Anthropic endpoint.
+
+## Limitations and Gotchas
+
+- **No model-name translation.** If your bridge expects `glm-4.7` and `CLAUDE_MEM_MODEL` is `claude-haiku-4-5-20251001`, the request will fail. Pin `CLAUDE_MEM_MODEL` to a name your bridge recognizes.
+- **`ANTHROPIC_API_KEY` is required even if your gateway uses a different auth header.** The SDK refuses to spawn without it; many gateways either pass the value through or accept any non-empty placeholder. Check your gateway's docs.
+- **Do not rely on a shell-exported `ANTHROPIC_BASE_URL`.** `ANTHROPIC_API_KEY` is in the `BLOCKED_ENV_VARS` list (`src/shared/EnvManager.ts:10`) to prevent accidental billing on a shell-leaked key. `ANTHROPIC_BASE_URL` is not blocked, but it must still be set in `~/.claude-mem/.env` for the worker to pick it up reliably across restarts.
+- **No auto-detection.** If you have already configured `ANTHROPIC_BASE_URL`, `ANTHROPIC_DEFAULT_HAIKU_MODEL`, etc. for Claude Code itself, claude-mem will **not** read those today. Mirror the relevant values into `~/.claude-mem/.env` and `~/.claude-mem/settings.json`. See [issue #2196](https://github.com/thedotmack/claude-mem/issues/2196) for the auto-detect feature request.
+
+## Related
+
+- [Configuration](../configuration) — All claude-mem settings
+- [OpenRouter Provider](../usage/openrouter-provider) — OpenAI-compatible bridge for non-Anthropic protocols
+- [Gemini Provider](../usage/gemini-provider) — Native Gemini API alternative
@@ -80,6 +80,7 @@
        "icon": "gear",
        "pages": [
          "configuration",
+          "configuration/custom-anthropic-backends",
          "modes",
          "development",
          "troubleshooting",
@@ -1,6 +1,6 @@
 #!/usr/bin/env node
 import { spawnSync, spawn } from 'child_process';
-import { existsSync, readFileSync } from 'fs';
+import { existsSync, readFileSync, mkdirSync, appendFileSync, writeFileSync } from 'fs';
 import { join, dirname, resolve } from 'path';
 import { homedir } from 'os';
 import { fileURLToPath } from 'url';
@@ -138,8 +138,58 @@ if (IS_WINDOWS) {
 const child = spawn(spawnCmd, spawnArgs, spawnOptions);

 if (child.stdin) {
-  child.stdin.write(stdinData || '{}');
-  child.stdin.end();
+  if (stdinData && stdinData.length > 0) {
+    child.stdin.write(stdinData);
+    child.stdin.end();
+  } else {
+    // Issue #2188: empty/missing stdin previously masked by `|| '{}'` fallback,
+    // which silently hid WSL bash failures (e.g. hooks invoked under a broken
+    // shell that never piped a payload). Surface the failure mode instead.
+    const dataDir = process.env.CLAUDE_MEM_DATA_DIR || join(homedir(), '.claude-mem');
+    const payloadType = stdinData === null
+      ? 'null (no data event or stream error)'
+      : stdinData === undefined
+        ? 'undefined'
+        : Buffer.isBuffer(stdinData) && stdinData.length === 0
+          ? 'empty Buffer (zero bytes received)'
+          : `unexpected (${typeof stdinData})`;
+    const payloadByteLength = (stdinData && typeof stdinData.length === 'number')
+      ? stdinData.length
+      : 0;
+    const diagnostic = [
+      `[bun-runner] empty stdin payload received — issue #2188`,
+      `  script: ${args[0]}`,
+      `  payload byte length: ${payloadByteLength}`,
+      `  payload type: ${payloadType}`,
+      `  platform: ${process.platform}`,
+      `  shell: ${process.env.SHELL || 'n/a'}`,
+      `  stdin TTY: ${process.stdin.isTTY === true ? 'true' : process.stdin.isTTY === false ? 'false' : 'undefined'}`,
+      `  timestamp: ${new Date().toISOString()}`,
+      `  CLAUDE_PLUGIN_ROOT: ${RESOLVED_PLUGIN_ROOT}`,
+    ].join('\n');
+
+    // Write to stderr so Claude Code surfaces the diagnostic.
+    console.error(diagnostic);
+
+    // Persist diagnostic to the runner-errors log and drop a CAPTURE_BROKEN marker
+    // file so the next session-start hint can surface the failure. We exit 0 to
+    // honor the project's exit-code strategy (worker/hook errors exit 0 to
+    // prevent Windows Terminal tab pileup) — the marker file is the durable
+    // signal that something is wrong, not the exit code.
+    try {
+      const logsDir = join(dataDir, 'logs');
+      mkdirSync(logsDir, { recursive: true });
+      appendFileSync(join(logsDir, 'runner-errors.log'), diagnostic + '\n\n');
+      mkdirSync(dataDir, { recursive: true });
+      writeFileSync(join(dataDir, 'CAPTURE_BROKEN'), diagnostic + '\n');
+    } catch (writeErr) {
+      console.error(`[bun-runner] failed to persist diagnostic: ${writeErr && writeErr.message ? writeErr.message : writeErr}`);
+    }
+
+    try { child.stdin.end(); } catch {}
+    try { child.kill(); } catch {}
+    process.exit(0);
+  }
 }

 child.on('error', (err) => {
@@ -1,76 +1,96 @@
 #!/usr/bin/env bun

-const WORKER_URL = 'http://localhost:37777';
+const DEFAULT_WORKER_PORT = 37777;

-interface QueueMessage {
-  id: number;
-  session_db_id: number;
-  message_type: string;
-  tool_name: string | null;
-  status: 'pending' | 'processing' | 'failed';
-  retry_count: number;
-  created_at_epoch: number;
-  project: string | null;
+function resolveWorkerPort(): number {
+  const raw = process.env.CLAUDE_MEM_WORKER_PORT;
+  if (raw === undefined || raw === '') return DEFAULT_WORKER_PORT;
+  const parsed = parseInt(raw, 10);
+  if (!Number.isInteger(parsed) || parsed < 1 || parsed > 65535) {
+    console.warn(
+      `[check-pending-queue] Invalid CLAUDE_MEM_WORKER_PORT=${JSON.stringify(raw)}; ` +
+        `falling back to ${DEFAULT_WORKER_PORT}`
+    );
+    return DEFAULT_WORKER_PORT;
+  }
+  return parsed;
 }

-interface QueueResponse {
-  queue: {
-    messages: QueueMessage[];
-    totalPending: number;
-    totalProcessing: number;
-    totalFailed: number;
-    stuckCount: number;
-  };
-  recentlyProcessed: QueueMessage[];
-  sessionsWithPendingWork: number[];
+const WORKER_PORT = resolveWorkerPort();
+const WORKER_URL = `http://127.0.0.1:${WORKER_PORT}`;
+const WORKER_FETCH_TIMEOUT_MS = 10_000;
+
+interface ProcessingStatusResponse {
+  isProcessing: boolean;
+  queueDepth: number;
 }

-interface ProcessResponse {
-  success: boolean;
-  totalPendingSessions: number;
-  sessionsStarted: number;
-  sessionsSkipped: number;
-  startedSessionIds: number[];
+interface SetProcessingResponse {
+  status: string;
+  isProcessing: boolean;
+  queueDepth: number;
+  activeSessions: number;
+}
+
+async function fetchWithTimeout(
+  url: string,
+  init: RequestInit | undefined,
+  timeoutMessage: string,
+  timeoutMs: number = WORKER_FETCH_TIMEOUT_MS,
+): Promise<Response> {
+  const controller = new AbortController();
+  const timer = setTimeout(() => controller.abort(), timeoutMs);
+  try {
+    return await fetch(url, { ...init, signal: controller.signal });
+  } catch (err) {
+    if ((err as { name?: string })?.name === 'AbortError') {
+      throw new Error(`${timeoutMessage} (timed out after ${timeoutMs}ms)`);
+    }
+    throw err;
+  } finally {
+    clearTimeout(timer);
+  }
 }

 async function checkWorkerHealth(): Promise<boolean> {
  try {
-    const res = await fetch(`${WORKER_URL}/api/health`);
+    const res = await fetchWithTimeout(
+      `${WORKER_URL}/api/health`,
+      undefined,
+      'Health check did not respond',
+    );
    return res.ok;
  } catch {
    return false;
  }
 }

-async function getQueueStatus(): Promise<QueueResponse> {
-  const res = await fetch(`${WORKER_URL}/api/pending-queue`);
+async function getProcessingStatus(): Promise<ProcessingStatusResponse> {
+  const res = await fetchWithTimeout(
+    `${WORKER_URL}/api/processing-status`,
+    undefined,
+    'Failed to get processing status',
+  );
  if (!res.ok) {
-    throw new Error(`Failed to get queue status: ${res.status}`);
+    throw new Error(`Failed to get processing status: ${res.status}`);
  }
-  return res.json();
+  return res.json() as Promise<ProcessingStatusResponse>;
 }

-async function processQueue(limit: number): Promise<ProcessResponse> {
-  const res = await fetch(`${WORKER_URL}/api/pending-queue/process`, {
-    method: 'POST',
-    headers: { 'Content-Type': 'application/json' },
-    body: JSON.stringify({ sessionLimit: limit })
-  });
+async function triggerProcessing(): Promise<SetProcessingResponse> {
+  const res = await fetchWithTimeout(
+    `${WORKER_URL}/api/processing`,
+    {
+      method: 'POST',
+      headers: { 'Content-Type': 'application/json' },
+      body: JSON.stringify({})
+    },
+    'Failed to trigger processing',
+  );
  if (!res.ok) {
-    throw new Error(`Failed to process queue: ${res.status}`);
+    throw new Error(`Failed to trigger processing: ${res.status}`);
  }
-  return res.json();
-}
-
-function formatAge(epochMs: number): string {
-  const ageMs = Date.now() - epochMs;
-  const minutes = Math.floor(ageMs / 60000);
-  const hours = Math.floor(minutes / 60);
-  const days = Math.floor(hours / 24);
-
-  if (days > 0) return `${days}d ${hours % 24}h ago`;
-  if (hours > 0) return `${hours}h ${minutes % 60}m ago`;
-  return `${minutes}m ago`;
+  return res.json() as Promise<SetProcessingResponse>;
 }

 async function prompt(question: string): Promise<string> {
@@ -97,104 +117,61 @@ async function main() {
    console.log(`
 Claude-Mem Pending Queue Manager

-Check and process pending observation queue backlog.
+Check current processing status and queue depth, and optionally trigger processing.

 Usage:
  bun scripts/check-pending-queue.ts [options]

 Options:
  --help, -h     Show this help message
-  --process      Auto-process without prompting
-  --limit N      Process up to N sessions (default: 10)
+  --process      Trigger processing without prompting
+
+Environment:
+  CLAUDE_MEM_WORKER_PORT  Worker port (default: 37777)

 Examples:
  # Check queue status interactively
  bun scripts/check-pending-queue.ts

-  # Auto-process up to 10 sessions
+  # Trigger processing non-interactively
  bun scripts/check-pending-queue.ts --process

-  # Process up to 5 sessions
-  bun scripts/check-pending-queue.ts --process --limit 5
-
 What is this for?
-  If the claude-mem worker crashes or restarts, pending observations may
-  be left unprocessed. This script shows the backlog and lets you trigger
-  processing. The worker no longer auto-recovers on startup to give you
-  control over when processing happens.
+  If the claude-mem worker has unprocessed observations queued, this script
+  reports the current queue depth and lets you trigger processing.
 `);
    process.exit(0);
  }

  const autoProcess = args.includes('--process');
-  const limitArg = args.find((_, i) => args[i - 1] === '--limit');
-  const limit = limitArg ? parseInt(limitArg, 10) : 10;

  console.log('\n=== Claude-Mem Pending Queue Status ===\n');

  const healthy = await checkWorkerHealth();
  if (!healthy) {
-    console.log('Worker is not running. Start it with:');
+    console.log(`Worker is not running at ${WORKER_URL}. Start it with:`);
    console.log('  cd ~/.claude/plugins/marketplaces/thedotmack && npm run worker:start\n');
    process.exit(1);
  }
-  console.log('Worker status: Running\n');
+  console.log(`Worker status: Running at ${WORKER_URL}\n`);

-  const status = await getQueueStatus();
-  const { queue, sessionsWithPendingWork } = status;
+  const status = await getProcessingStatus();

  console.log('Queue Summary:');
-  console.log(`  Pending:    ${queue.totalPending}`);
-  console.log(`  Processing: ${queue.totalProcessing}`);
-  console.log(`  Failed:     ${queue.totalFailed}`);
-  console.log(`  Stuck:      ${queue.stuckCount} (processing > 5 min)`);
-  console.log(`  Sessions:   ${sessionsWithPendingWork.length} with pending work\n`);
+  console.log(`  Processing: ${status.isProcessing ? 'yes' : 'no'}`);
+  console.log(`  Queue depth: ${status.queueDepth}\n`);

-  const hasBacklog = queue.totalPending > 0 || queue.totalFailed > 0;
-  const hasStuck = queue.stuckCount > 0;
+  const hasBacklog = status.queueDepth > 0;

-  if (!hasBacklog && !hasStuck) {
-    console.log('No backlog detected. Queue is healthy.\n');
-
-    if (status.recentlyProcessed.length > 0) {
-      console.log(`Recently processed: ${status.recentlyProcessed.length} messages in last 30 min\n`);
-    }
+  if (!hasBacklog) {
+    console.log('No backlog detected. Queue is empty.\n');
    process.exit(0);
  }

-  if (queue.messages.length > 0) {
-    console.log('Pending Messages:');
-    console.log('─'.repeat(80));
-
-    const bySession = new Map<number, QueueMessage[]>();
-    for (const msg of queue.messages) {
-      const list = bySession.get(msg.session_db_id) || [];
-      list.push(msg);
-      bySession.set(msg.session_db_id, list);
-    }
-
-    for (const [sessionId, messages] of bySession) {
-      const project = messages[0].project || 'unknown';
-      const oldest = Math.min(...messages.map(m => m.created_at_epoch));
-      const statuses = {
-        pending: messages.filter(m => m.status === 'pending').length,
-        processing: messages.filter(m => m.status === 'processing').length,
-        failed: messages.filter(m => m.status === 'failed').length
-      };
-
-      console.log(`  Session ${sessionId} (${project})`);
-      console.log(`    Messages: ${messages.length} total`);
-      console.log(`    Status:   ${statuses.pending} pending, ${statuses.processing} processing, ${statuses.failed} failed`);
-      console.log(`    Age:      ${formatAge(oldest)}`);
-    }
-    console.log('─'.repeat(80));
-    console.log('');
-  }
-
  if (autoProcess) {
-    console.log(`Auto-processing up to ${limit} sessions...\n`);
+    console.log('Triggering processing...\n');
  } else {
-    const answer = await prompt(`Process pending queue? (up to ${limit} sessions) [y/N]: `);
+    const answer = await prompt(`Trigger processing for ${status.queueDepth} queued items? [y/N]: `);
    if (answer.toLowerCase() !== 'y') {
      console.log('\nSkipped. Run with --process to auto-process.\n');
      process.exit(0);
@@ -202,18 +179,15 @@ What is this for?
    console.log('');
  }

-  const result = await processQueue(limit);
+  const result = await triggerProcessing();

  console.log('Processing Result:');
-  console.log(`  Sessions started: ${result.sessionsStarted}`);
-  console.log(`  Sessions skipped: ${result.sessionsSkipped} (already active)`);
-  console.log(`  Remaining:        ${result.totalPendingSessions - result.sessionsStarted}`);
+  console.log(`  Status:           ${result.status}`);
+  console.log(`  Is processing:    ${result.isProcessing ? 'yes' : 'no'}`);
+  console.log(`  Queue depth:      ${result.queueDepth}`);
+  console.log(`  Active sessions:  ${result.activeSessions}`);

-  if (result.startedSessionIds.length > 0) {
-    console.log(`  Started IDs:      ${result.startedSessionIds.join(', ')}`);
-  }
-
-  console.log('\nProcessing started in background. Check status again in a few minutes.\n');
+  console.log('\nProcessing handled by worker. Check status again in a few minutes.\n');
 }

 main().catch(err => {
@@ -129,7 +129,7 @@ try {
  const gitignoreExcludes = getGitignoreExcludes(rootDir);

  execSync(
-    `rsync -av --delete --exclude=.git --exclude=bun.lock --exclude=package-lock.json ${gitignoreExcludes} ./ ~/.claude/plugins/marketplaces/thedotmack/`,
+    `rsync -av --delete --exclude=.git --exclude=bun.lock --exclude=package-lock.json --exclude=scripts/package.json --exclude=scripts/node_modules ${gitignoreExcludes} ./ ~/.claude/plugins/marketplaces/thedotmack/`,
    { stdio: 'inherit' }
  );

@@ -1,6 +1,32 @@
+import { existsSync } from 'fs';
+import { homedir } from 'os';
+import { join } from 'path';
 import type { PlatformAdapter, NormalizedHookInput, HookResult } from '../types.js';
 import { AdapterRejectedInput, isValidCwd } from './errors.js';

+// Cursor session ids are UUID-style identifiers. Restrict to a safe character
+// set so a malicious sessionId from stdin cannot escape ~/.cursor/projects via
+// path separators, '..' segments, or null bytes (security review on PR #2282).
+const SAFE_SESSION_ID_RE = /^[A-Za-z0-9_-]+$/;
+
+/**
+ * Derive the on-disk path to a Cursor agent transcript JSONL given the
+ * workspace cwd and the conversation id. Cursor stores transcripts at:
+ *
+ *   ~/.cursor/projects/<workspace-slug>/agent-transcripts/<UUID>/<UUID>.jsonl
+ *
+ * where <workspace-slug> is the absolute cwd with the leading slash stripped
+ * and any '/' or '.' replaced with '-' (e.g. /Users/foo.bar/workspaces ->
+ * Users-foo-bar-workspaces). Returns undefined if either argument is missing,
+ * the session id contains unsafe characters, or the file does not exist.
+ */
+export function deriveCursorTranscriptPath(cwd: string | undefined, sessionId: string | undefined): string | undefined {
+  if (!cwd || !sessionId) return undefined;
+  if (!SAFE_SESSION_ID_RE.test(sessionId)) return undefined;
+  const slug = cwd.replace(/^\//, '').replace(/[/.]/g, '-');
+  const candidate = join(homedir(), '.cursor', 'projects', slug, 'agent-transcripts', sessionId, `${sessionId}.jsonl`);
+  return existsSync(candidate) ? candidate : undefined;
+}
+
 export const cursorAdapter: PlatformAdapter = {
  normalizeInput(raw) {
    const r = (raw ?? {}) as any;
@@ -9,14 +35,18 @@ export const cursorAdapter: PlatformAdapter = {
    if (!isValidCwd(cwd)) {
      throw new AdapterRejectedInput('invalid_cwd');
    }
+    const sessionId = r.conversation_id || r.generation_id || r.id;
    return {
-      sessionId: r.conversation_id || r.generation_id || r.id,
+      sessionId,
      cwd,
      prompt: r.prompt ?? r.query ?? r.input ?? r.message,
      toolName: isShellCommand ? 'Bash' : r.tool_name,
      toolInput: isShellCommand ? { command: r.command } : r.tool_input,
      toolResponse: isShellCommand ? { output: r.output } : r.result_json,  // result_json not tool_response
-      transcriptPath: undefined,  // Cursor doesn't provide transcript
+      // Cursor's stop hook does not pass a transcript path on stdin, but it
+      // does write a JSONL transcript to disk under ~/.cursor/projects/...,
+      // so we derive the path from cwd + conversation id.
+      transcriptPath: deriveCursorTranscriptPath(cwd, sessionId),
      filePath: r.file_path,
      edits: r.edits,
    };
@@ -1,7 +1,6 @@

 import { Database } from 'bun:sqlite';
 import path from 'path';
-import os from 'os';
 import {
  existsSync,
  writeFileSync,
@@ -15,9 +14,10 @@ import { SettingsDefaultsManager } from '../shared/SettingsDefaultsManager.js';
 import { formatTime, groupByDate } from '../shared/timeline-formatting.js';
 import { isDirectChild } from '../shared/path-utils.js';
 import { logger } from '../utils/logger.js';
+import { paths } from '../shared/paths.js';

-const DB_PATH = path.join(os.homedir(), '.claude-mem', 'claude-mem.db');
-const SETTINGS_PATH = path.join(os.homedir(), '.claude-mem', 'settings.json');
+const DB_PATH = paths.database();
+const SETTINGS_PATH = paths.settings();

 interface ObservationRow {
  id: number;
@@ -9,6 +9,7 @@ import { getProjectContext } from '../../utils/project-name.js';
 import { HOOK_EXIT_CODES } from '../../shared/hook-constants.js';
 import { logger } from '../../utils/logger.js';
 import { loadFromFileOnce } from '../../shared/hook-settings.js';
+import { readStaleMarker } from '../../shared/oauth-token.js';

 export const contextHandler: EventHandler = {
  async execute(input: NormalizedHookInput): Promise<HookResult> {
@@ -43,6 +44,17 @@ export const contextHandler: EventHandler = {
      return emptyResult;
    }

+    // Issue #2215: surface stale OAuth token marker as a session-start hint.
+    // Marker is written by EnvManager.buildIsolatedEnvWithFreshOAuth() when
+    // a previous worker spawn detected an expired keychain entry.
+    const staleReason = readStaleMarker();
+    if (staleReason) {
+      const hint = `[claude-mem] Claude Desktop OAuth token is stale: ${staleReason}\nPlease re-login via Claude Desktop to refresh the token.`;
+      additionalContext = additionalContext
+        ? `${hint}\n\n${additionalContext}`
+        : hint;
+    }
+
    let coloredTimeline = '';
    if (showTerminalOutput) {
      const colorResult = await executeWithWorkerFallback<string>(colorApiPath, 'GET');
@@ -1,4 +1,4 @@
-
+import { z } from "zod";

 interface OpenCodeProject {
  name?: string;
@@ -249,10 +249,7 @@ export const ClaudeMemPlugin = async (ctx: OpenCodePluginContext) => {
        description:
          "Search claude-mem memory database for past observations, sessions, and context",
        args: {
-          query: {
-            type: "string",
-            description: "Search query for memory observations",
-          },
+          query: z.string().describe("Search query for memory observations"),
        },
        async execute(
          args: Record<string, unknown>,
@@ -1,6 +1,7 @@
 import * as p from '@clack/prompts';
 import pc from 'picocolors';
-import { execSync, spawn } from 'child_process';
+import { execSync } from 'child_process';
+import { spawnHidden } from '../../shared/spawn.js';
 import { cpSync, existsSync, mkdirSync, readFileSync, rmSync, writeFileSync } from 'fs';
 import { homedir } from 'os';
 import { dirname, join } from 'path';
@@ -396,7 +397,7 @@ async function installClaudeCode(): Promise<boolean> {

  return new Promise<boolean>((resolve) => {
    let captured = '';
-    const child = spawn(command, [], {
+    const child = spawnHidden(command, [], {
      shell: IS_WINDOWS ? (process.env.ComSpec ?? 'cmd.exe') : '/bin/bash',
      stdio: spinner ? ['inherit', 'pipe', 'pipe'] : 'inherit',
    });
@@ -1,4 +1,4 @@
-import { spawn } from 'child_process';
+import { spawnHidden } from '../../shared/spawn.js';
 import { existsSync } from 'fs';
 import { join } from 'path';
 import pc from 'picocolors';
@@ -42,7 +42,7 @@ function spawnBunWorkerCommand(command: string, extraArgs: string[] = []): void

  const args = [workerScript, command, ...extraArgs];

-  const child = spawn(bunPath, args, {
+  const child = spawnHidden(bunPath, args, {
    stdio: 'inherit',
    cwd: marketplaceDirectory(),
    env: process.env,
@@ -88,7 +88,7 @@ export function runAdoptCommand(extraArgs: string[] = []): void {
  const userCwd = process.cwd();
  const args = [workerScript, 'adopt', '--cwd', userCwd, ...extraArgs];

-  const child = spawn(bunPath, args, {
+  const child = spawnHidden(bunPath, args, {
    stdio: 'inherit',
    cwd: marketplaceDirectory(),
    env: process.env,
@@ -177,7 +177,7 @@ export function runTranscriptWatchCommand(): void {
    return;
  }

-  const child = spawn(bunPath, [transcriptWatcherPath, 'watch'], {
+  const child = spawnHidden(bunPath, [transcriptWatcherPath, 'watch'], {
    stdio: 'inherit',
    cwd: marketplaceDirectory(),
    env: process.env,
@@ -2,6 +2,16 @@
 import { logger } from '../utils/logger.js';
 import { ModeManager } from '../services/domain/ModeManager.js';

+// TODO(#2233): migrate to Anthropic tool-use API for deterministic JSON output. This text-XML path is the bridge.
+// Only strip fences when the entire payload is a single fenced block. Stripping
+// the first opening + last closing fence anywhere in the string can corrupt
+// content that contains internal fenced examples or surrounding prose
+// (CodeRabbit review on PR #2282).
+function stripCodeFences(text: string): string {
+  const match = text.match(/^\s*```(?:xml)?\s*\n([\s\S]*?)\n```\s*$/i);
+  return match ? match[1] : text;
+}
+
 export interface ParsedObservation {
  type: string;
  title: string | null;
@@ -33,6 +43,8 @@ export function parseAgentXml(raw: string, correlationId?: string | number): Par
    return { valid: false };
  }

+  raw = stripCodeFences(raw);
+
  const skipMatch = /<skip_summary(?:\s+reason="([^"]*)")?\s*\/>/.exec(raw);
  if (skipMatch) {
    return {
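Reviewer note: the fence-stripping rule in the parser hunks above can be checked in isolation. The sketch below rebuilds the same pattern and runs it against two made-up payloads. The `FENCE` constant and the sample strings are assumptions added for illustration, and the regex is assembled from strings only so that the example contains no literal fence.

```typescript
// Sketch only: mirrors the regex from the parser hunk; the sample payloads are made up.
const FENCE = "`".repeat(3); // built at runtime so this example contains no literal fence

function stripCodeFences(text: string): string {
  // Anchored at both ends: matches only when the whole payload is one fenced block.
  const pattern = new RegExp(
    "^\\s*" + FENCE + "(?:xml)?\\s*\\n([\\s\\S]*?)\\n" + FENCE + "\\s*$",
    "i"
  );
  const match = text.match(pattern);
  return match ? match[1] : text;
}

const wrapped = `${FENCE}xml\n<observation/>\n${FENCE}`;
const withProse = `Here is the result:\n${FENCE}xml\n<observation/>\n${FENCE}`;

console.log(stripCodeFences(wrapped));                 // the bare <observation/> element
console.log(stripCodeFences(withProse) === withProse); // true: surrounding prose means no stripping
```

Because the pattern is anchored with `^` and `$`, a payload that merely contains a fenced example somewhere inside it is returned unchanged, which is the behavior the comment in the hunk asks for.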
@@ -39,7 +39,6 @@ ${mode.prompts.skip_guidance}

 ${mode.prompts.output_format_header}

-\`\`\`xml
 <observation>
  <type>[ ${mode.observation_types.map(t => t.id).join(' | ')} ]</type>
  <!--
@@ -72,7 +71,6 @@ ${mode.prompts.output_format_header}
    <file>${mode.prompts.xml_file_placeholder}</file>
  </files_modified>
 </observation>
-\`\`\`
 ${mode.prompts.format_examples}

 ${mode.prompts.footer}
@@ -170,7 +168,6 @@ ${mode.prompts.continuation_instruction}

 ${mode.prompts.output_format_header}

-\`\`\`xml
 <observation>
  <type>[ ${mode.observation_types.map(t => t.id).join(' | ')} ]</type>
  <!--
@@ -203,7 +200,6 @@ ${mode.prompts.output_format_header}
    <file>${mode.prompts.xml_file_placeholder}</file>
  </files_modified>
 </observation>
-\`\`\`
 ${mode.prompts.format_examples}

 ${mode.prompts.footer}
@@ -1,12 +1,11 @@

-import path from 'path';
-import { homedir } from 'os';
 import { SettingsDefaultsManager } from '../../shared/SettingsDefaultsManager.js';
+import { paths } from '../../shared/paths.js';
 import { ModeManager } from '../domain/ModeManager.js';
 import type { ContextConfig } from './types.js';

 export function loadContextConfig(): ContextConfig {
-  const settingsPath = path.join(homedir(), '.claude-mem', 'settings.json');
+  const settingsPath = paths.settings();
  const settings = SettingsDefaultsManager.loadFromFile(settingsPath);

  const mode = ModeManager.getInstance().getActiveMode();
@@ -2,17 +2,19 @@
 import path from 'path';
 import { homedir } from 'os';
 import { existsSync, writeFileSync, readFileSync, unlinkSync, mkdirSync, rmSync, statSync, utimesSync, copyFileSync } from 'fs';
-import { exec, execSync, spawn, spawnSync } from 'child_process';
+import { exec, execSync, spawnSync } from 'child_process';
+import { spawnHidden } from '../../shared/spawn.js';
 import { promisify } from 'util';
 import { logger } from '../../utils/logger.js';
 import { HOOK_TIMEOUTS } from '../../shared/hook-constants.js';
 import { sanitizeEnv } from '../../supervisor/env-sanitizer.js';
 import { getSupervisor, validateWorkerPidFile, type ValidateWorkerPidStatus } from '../../supervisor/index.js';
+import { paths } from '../../shared/paths.js';

 const execAsync = promisify(exec);

-const DATA_DIR = path.join(homedir(), '.claude-mem');
-const PID_FILE = path.join(DATA_DIR, 'worker.pid');
+const DATA_DIR = paths.dataDir();
+const PID_FILE = paths.workerPid();

 interface RuntimeResolverOptions {
  platform?: NodeJS.Platform;
@@ -455,7 +457,7 @@ export function spawnDaemon(
    ? [runtimePath, scriptPath, '--daemon']
    : [scriptPath, '--daemon'];

-  const child = spawn(execPath, args, {
+  const child = spawnHidden(execPath, args, {
    detached: true,
    stdio: 'ignore',
    env
@@ -1,13 +1,13 @@

 import path from 'path';
-import { homedir } from 'os';
 import { existsSync } from 'fs';
 import { spawnSync } from 'child_process';
 import { logger } from '../../utils/logger.js';
 import { getProjectContext } from '../../utils/project-name.js';
 import { ChromaSync } from '../sync/ChromaSync.js';
+import { paths } from '../../shared/paths.js';

-const DEFAULT_DATA_DIR = path.join(homedir(), '.claude-mem');
+const DEFAULT_DATA_DIR = paths.dataDir();

 export interface AdoptionResult {
  repoPath: string;
@@ -9,11 +9,12 @@ import {
  DEFAULT_STATE_PATH,
  SAMPLE_CONFIG,
 } from '../transcripts/config.js';
+import { paths } from '../../shared/paths.js';
 import type { TranscriptWatchConfig, WatchTarget } from '../transcripts/types.js';

 const CODEX_DIR = path.join(homedir(), '.codex');
 const CODEX_AGENTS_MD_PATH = path.join(CODEX_DIR, 'AGENTS.md');
-const CLAUDE_MEM_DIR = path.join(homedir(), '.claude-mem');
+const CLAUDE_MEM_DIR = paths.dataDir();

 const CODEX_WATCH_NAME = 'codex';

@@ -11,6 +11,8 @@ import { getSupervisor } from '../../supervisor/index.js';
 import { isPidAlive } from '../../supervisor/process-registry.js';
 import { ENV_PREFIXES, ENV_EXACT_MATCHES } from '../../supervisor/env-sanitizer.js';
 import { flushResponseThen } from './flushResponseThen.js';
+import { getUptimeSeconds } from '../../shared/uptime.js';
+import { globalRateLimitStore } from '../worker/RateLimitStore.js';

 const INSTRUCTIONS_BASE_DIR: string = path.resolve(__dirname, '../skills/mem-search');
 const INSTRUCTIONS_OPERATIONS_DIR: string = path.join(INSTRUCTIONS_BASE_DIR, 'operations');
@@ -161,7 +163,7 @@ export class Server {
        status: 'ok',
        version: BUILT_IN_VERSION,
        workerPath: this.options.workerPath,
-        uptime: Date.now() - this.startTime,
+        uptime: getUptimeSeconds(this.startTime),
        managed: process.env.CLAUDE_MEM_MANAGED === 'true',
        hasIpc: typeof process.send === 'function',
        platform: process.platform,
@@ -169,6 +171,7 @@ export class Server {
        initialized: this.options.getInitializationComplete(),
        mcpReady: this.options.getMcpReady(),
        ai: this.options.getAiStatus(),
+        rateLimits: globalRateLimitStore.getMostRecentByWindow(),
      });
    });

@@ -266,8 +269,7 @@ export class Server {
        ENV_EXACT_MATCHES.has(key) || ENV_PREFIXES.some(prefix => key.startsWith(prefix))
      );

-      const uptimeMs = Date.now() - this.startTime;
-      const uptimeSeconds = Math.floor(uptimeMs / 1000);
+      const uptimeSeconds = getUptimeSeconds(this.startTime);
      const hours = Math.floor(uptimeSeconds / 3600);
      const minutes = Math.floor((uptimeSeconds % 3600) / 60);
      const formattedUptime = hours > 0 ? `${hours}h ${minutes}m` : `${minutes}m`;
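Reviewer note: `src/shared/uptime.ts` is not part of this excerpt, so the following is only a guess at its shape, based on the two removed lines it replaces (`Date.now() - this.startTime` followed by `Math.floor(uptimeMs / 1000)`). The optional `nowMs` parameter and the `formatUptime` helper are hypothetical and exist only to make the sketch testable. If the helper does return whole seconds, the `uptime` field in the health response above changes from milliseconds to seconds.

```typescript
// Hypothetical sketch: the real src/shared/uptime.ts is not shown in this diff.
// The optional `nowMs` parameter is an assumption added to make the example deterministic.
function getUptimeSeconds(startTimeMs: number, nowMs: number = Date.now()): number {
  return Math.floor((nowMs - startTimeMs) / 1000);
}

// Same formatting arithmetic as the context lines in the hunk above.
function formatUptime(uptimeSeconds: number): string {
  const hours = Math.floor(uptimeSeconds / 3600);
  const minutes = Math.floor((uptimeSeconds % 3600) / 60);
  return hours > 0 ? `${hours}h ${minutes}m` : `${minutes}m`;
}

const start = 1_000_000;
const seconds = getUptimeSeconds(start, start + 3_725_000); // 3725 s = 1h 2m 5s
console.log(seconds, formatUptime(seconds)); // 3725 "1h 2m"
```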
@@ -1,21 +1,24 @@

 import { Client } from '@modelcontextprotocol/sdk/client/index.js';
 import { StdioClientTransport } from '@modelcontextprotocol/sdk/client/stdio.js';
-import { execSync } from 'child_process';
+import { execFile, execSync, type ChildProcess } from 'child_process';
+import { promisify } from 'util';
 import path from 'path';
 import os from 'os';
 import fs from 'fs';
 import { logger } from '../../utils/logger.js';
 import { SettingsDefaultsManager } from '../../shared/SettingsDefaultsManager.js';
-import { USER_SETTINGS_PATH } from '../../shared/paths.js';
+import { USER_SETTINGS_PATH, paths } from '../../shared/paths.js';
 import { sanitizeEnv } from '../../supervisor/env-sanitizer.js';
 import { getSupervisor } from '../../supervisor/index.js';

+const execFileAsync = promisify(execFile);
+
 const CHROMA_MCP_CLIENT_NAME = 'claude-mem-chroma';
 const CHROMA_MCP_CLIENT_VERSION = '1.0.0';
 const MCP_CONNECTION_TIMEOUT_MS = 30_000;
-const RECONNECT_BACKOFF_MS = 10_000; 
-const DEFAULT_CHROMA_DATA_DIR = path.join(os.homedir(), '.claude-mem', 'chroma');
+const RECONNECT_BACKOFF_MS = 10_000;
+const DEFAULT_CHROMA_DATA_DIR = paths.chroma();
 const CHROMA_SUPERVISOR_ID = 'chroma-mcp';

 const CHROMA_MCP_PINNED_VERSION = '0.2.6';
@@ -325,6 +328,18 @@ export class ChromaMcpManager {
    }
  }

+  /**
+   * Gracefully stop the MCP connection and kill the chroma-mcp subprocess tree.
+   *
+   * The MCP SDK's client.close() sends stdin close -> SIGTERM -> SIGKILL to the
+   * direct child (uvx), but the spawn chain (uvx -> uv -> python -> chroma-mcp)
+   * can leave descendants orphaned because MCP SDK does not use process groups.
+   *
+   * Fix: kill the entire process tree rooted at the direct child PID BEFORE
+   * closing the MCP client, ensuring no orphan python/chroma-mcp processes
+   * accumulate across reconnects or worker restarts. Matches the tree-kill
+   * pattern from shutdown.ts (Principle 5: OS-supervised teardown).
+   */
  async stop(): Promise<void> {
    if (!this.client) {
      logger.debug('CHROMA_MCP', 'No active MCP connection to stop');
@@ -333,6 +348,13 @@ export class ChromaMcpManager {

    logger.info('CHROMA_MCP', 'Stopping chroma-mcp MCP connection');

+    // Kill the entire process tree before closing the MCP client so
+    // descendants (uv, python, chroma-mcp) don't become orphans.
+    const chromaProcess = (this.transport as unknown as { _process?: ChildProcess })?._process;
+    if (chromaProcess?.pid) {
+      await ChromaMcpManager.killProcessTree(chromaProcess.pid);
+    }
+
    try {
      await this.client.close();
    } catch (error) {
@@ -352,6 +374,137 @@ export class ChromaMcpManager {
    logger.info('CHROMA_MCP', 'chroma-mcp MCP connection stopped');
  }

+  /**
+   * Kill a process and all its descendants (tree-kill).
+   *
+   * POSIX: Collects descendants via recursive `pgrep -P` walks, then sends SIGTERM
+   * and, after 500 ms, SIGKILL to them and the root. Without pgrep, only the root is signaled.
+   *
+   * Windows: Uses `taskkill /T /F /PID` for full subtree teardown (same
+   * pattern as shutdown.ts).
+   *
+   * Best-effort — swallows ESRCH (already dead) and logs other errors.
+   */
+  private static async killProcessTree(pid: number): Promise<void> {
+    logger.debug('CHROMA_MCP', `Killing process tree rooted at PID ${pid}`);
+
+    if (process.platform === 'win32') {
+      try {
+        await execFileAsync('taskkill', ['/PID', String(pid), '/T', '/F'], {
+          timeout: 5_000,
+          windowsHide: true
+        });
+      } catch (error) {
+        // taskkill exits non-zero when the process is already dead — that's fine.
+        logger.debug('CHROMA_MCP', `taskkill tree-kill finished (may already be dead)`, {
+          pid,
+          error: error instanceof Error ? error.message : String(error)
+        });
+      }
+      return;
+    }
+
+    // POSIX: walk descendants recursively (bottom-up) and signal each.
+    // `pkill -P <pid>` only reaches direct children, so `python` /
+    // `chroma-mcp` under `uv` (grandchildren) get re-parented to init and
+    // survive. We collect the full descendant set via `pgrep -P` walks before
+    // signaling, so the SIGTERM phase reaches every layer
+    // (CodeRabbit review on PR #2282).
+    try {
+      const descendantsBeforeTerm = await ChromaMcpManager.collectDescendantPids(pid);
+      // Signal leaves first, then the root.
+      for (const child of descendantsBeforeTerm) {
+        try {
+          process.kill(child, 'SIGTERM');
+        } catch {
+          // Already gone — fine.
+        }
+      }
+      try {
+        process.kill(pid, 'SIGTERM');
+      } catch (error) {
+        const code = (error as NodeJS.ErrnoException).code;
+        if (code !== 'ESRCH') {
+          logger.debug('CHROMA_MCP', `Failed to SIGTERM PID ${pid}`, { code });
+        }
+      }
+
+      // Brief wait for SIGTERM to propagate, then SIGKILL stragglers.
+      await new Promise(resolve => setTimeout(resolve, 500));
+
+      // Re-collect descendants — some layers may have re-parented during the
+      // SIGTERM grace window.
+      //
+      // SIGKILL targets the UNION of pre-TERM and post-wait descendant sets:
+      // when the root exits between snapshots, children get re-parented to
+      // init and drop out of `pgrep -P <root>`. Without the union, those
+      // re-parented descendants would never receive SIGKILL even though they
+      // were definitely children before SIGTERM (CodeRabbit review on PR
+      // #2282). Dedupe via Set since `descendantsBeforeKill` typically
+      // overlaps with `descendantsBeforeTerm`.
+      const descendantsBeforeKill = await ChromaMcpManager.collectDescendantPids(pid);
+      const killTargets = Array.from(new Set([...descendantsBeforeTerm, ...descendantsBeforeKill]));
+      for (const child of killTargets) {
+        try {
+          process.kill(child, 'SIGKILL');
+        } catch {
+          // Already dead — fine.
+        }
+      }
+      try {
+        process.kill(pid, 'SIGKILL');
+      } catch {
+        // Already dead — fine.
+      }
+    } catch (error) {
+      logger.debug('CHROMA_MCP', `Process tree kill completed (best-effort)`, {
+        pid,
+        error: error instanceof Error ? error.message : String(error)
+      });
+    }
+  }
+
+  /**
+   * Recursively collect all descendant PIDs of `rootPid` using `pgrep -P`.
+   * Returned bottom-up (leaves first) so callers can signal leaves before
+   * their ancestors. Best-effort: missing pgrep / non-zero exits return [].
+   */
+  private static async collectDescendantPids(rootPid: number): Promise<number[]> {
+    const seen = new Set<number>();
+    const collected: number[] = [];
+
+    async function walk(pid: number): Promise<void> {
+      let stdout = '';
+      try {
+        const result = await execFileAsync('pgrep', ['-P', String(pid)], { timeout: 2_000 });
+        stdout = result.stdout;
+      } catch {
+        // pgrep exits 1 when no children match — that's fine, just return.
+        return;
+      }
+      const children = stdout
+        .split('\n')
+        .map(line => line.trim())
+        .filter(line => line.length > 0)
+        .map(line => Number.parseInt(line, 10))
+        .filter(n => Number.isFinite(n) && n > 0 && !seen.has(n));
+
+      for (const child of children) {
+        seen.add(child);
+        await walk(child);
+        // Bottom-up: push after recursion so leaves come first.
+        collected.push(child);
+      }
+    }
+
+    await walk(rootPid);
+    return collected;
+  }
+
+  /**
+   * Reset the singleton instance (for testing).
+   * Awaits stop() to prevent dual subprocesses.
+   */
  static async reset(): Promise<void> {
    if (ChromaMcpManager.instance) {
      await ChromaMcpManager.instance.stop();
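Reviewer note: the leaves-first ordering produced by `collectDescendantPids()` is a post-order tree walk. The sketch below reproduces that ordering over an in-memory map instead of shelling out to `pgrep -P`. The `collectDescendants` name, the `childrenOf` map, and the PIDs are all made up for illustration.

```typescript
// Sketch only: a Map stands in for the `pgrep -P` lookups, and the PIDs are made up.
function collectDescendants(root: number, childrenOf: Map<number, number[]>): number[] {
  const seen = new Set<number>();
  const collected: number[] = [];

  function walk(pid: number): void {
    for (const child of childrenOf.get(pid) ?? []) {
      if (seen.has(child)) continue; // skip PIDs that were already visited
      seen.add(child);
      walk(child);
      collected.push(child); // pushed after recursion, so leaves come first
    }
  }

  walk(root);
  return collected;
}

// uvx (100) -> uv (101) -> python (102) -> chroma-mcp (103)
const tree = new Map<number, number[]>([
  [100, [101]],
  [101, [102]],
  [102, [103]],
]);
console.log(collectDescendants(100, tree)); // [103, 102, 101]
```

The root is deliberately absent from the result, matching the hunk above, where the root PID is signaled separately after its descendants.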
@@ -360,7 +513,7 @@ export class ChromaMcpManager {
  }

  private getCombinedCertPath(): string | undefined {
-    const combinedCertPath = path.join(os.homedir(), '.claude-mem', 'combined_certs.pem');
+    const combinedCertPath = paths.combinedCerts();

    if (fs.existsSync(combinedCertPath)) {
      const stats = fs.statSync(combinedCertPath);
@@ -435,6 +588,19 @@ export class ChromaMcpManager {
      }
    }

+    // Cap embedding-thread fanout. ONNX Runtime / OpenBLAS / MKL all default to
+    // cpu_count(), so a 12-core box runs 12 threads burning embeddings in
+    // parallel — the dominant cause of the chroma-mcp CPU storm on Windows
+    // (#2220). Two threads keeps backfill latency reasonable without saturating
+    // the box. Only set if the user hasn't pinned them explicitly.
+    const threadCap = '2';
+    for (const key of ['OMP_NUM_THREADS', 'ONNX_NUM_THREADS', 'OPENBLAS_NUM_THREADS', 'MKL_NUM_THREADS']) {
+      if (!baseEnv[key]) baseEnv[key] = threadCap;
+    }
+    // Disable Chroma's anonymous telemetry — it issues background HTTP from
+    // the embedding subprocess on every collection touch.
+    if (!baseEnv.ANONYMIZED_TELEMETRY) baseEnv.ANONYMIZED_TELEMETRY = 'false';
+
    const combinedCertPath = this.getCombinedCertPath();
    if (!combinedCertPath) {
      return baseEnv;
@@ -454,15 +620,30 @@ export class ChromaMcpManager {
  }

  private registerManagedProcess(): void {
-    const chromaProcess = (this.transport as unknown as { _process?: import('child_process').ChildProcess })._process;
+    const chromaProcess = (this.transport as unknown as { _process?: ChildProcess })._process;
    if (!chromaProcess?.pid) {
      return;
    }

+    // Register with pgid so the supervisor's shutdown cascade can attempt
+    // process-group signaling (kill(-pgid, signal)) against the spawn
+    // chain (uvx -> uv -> python -> chroma-mcp), matching the SDK
+    // subprocess pattern in process-registry.ts.
+    //
+    // Note: MCP SDK's StdioClientTransport does NOT use detached:true,
+    // so the child shares our process group and is not a group leader.
+    // Recording pgid still enables tree-kill via signalProcess() in
+    // shutdown.ts, which falls back to taskkill /T on Windows when pgid is
+    // present but the group signal fails. On POSIX, stop() does not rely on
+    // this pgid: it calls killProcessTree() on the child PID instead.
    getSupervisor().registerProcess(CHROMA_SUPERVISOR_ID, {
      pid: chromaProcess.pid,
      type: 'chroma',
-      startedAt: new Date().toISOString()
+      startedAt: new Date().toISOString(),
+      // Store pid as pgid — shutdown.ts will attempt kill(-pgid) on POSIX.
+      // If the child isn't actually its own group leader, the ESRCH is caught
+      // and shutdown falls back to single-PID kill (see signalProcess()).
+      pgid: chromaProcess.pid
    }, chromaProcess);

    chromaProcess.once('exit', () => {
@@ -222,15 +222,25 @@ export class ChromaSync {
    return documents;
  }

-  private async addDocuments(documents: ChromaDocument[]): Promise<void> {
+  /**
+   * Write `documents` to Chroma in BATCH_SIZE-sized batches.
+   *
+   * Returns the number of documents that were successfully written (or
+   * confirmed via delete+add reconcile). Per-batch failures are logged and the
+   * loop continues — we never throw — so callers must use the returned count
+   * to advance their watermark, otherwise an interrupted backfill can mark
+   * unsynced records as synced.
+   */
+  private async addDocuments(documents: ChromaDocument[]): Promise<number> {
    if (documents.length === 0) {
-      return;
+      return 0;
    }

    await this.ensureCollectionExists();

    const chromaMcp = ChromaMcpManager.getInstance();

+    let written = 0;
    for (let i = 0; i < documents.length; i += this.BATCH_SIZE) {
      const batch = documents.slice(i, i + this.BATCH_SIZE);

@@ -247,6 +257,7 @@ export class ChromaSync {
          documents: batch.map(d => d.document),
          metadatas: cleanMetadatas
        });
+        written += batch.length;
      } catch (error) {
        const errMsg = error instanceof Error ? error.message : String(error);
        if (errMsg.includes('already exist')) {
@@ -261,20 +272,21 @@ export class ChromaSync {
              documents: batch.map(d => d.document),
              metadatas: cleanMetadatas
            });
+            written += batch.length;
            logger.info('CHROMA_SYNC', 'Batch reconciled via delete+add after duplicate conflict', {
              collection: this.collectionName,
              batchStart: i,
              batchSize: batch.length
            });
          } catch (reconcileError) {
-            logger.error('CHROMA_SYNC', 'Batch reconcile (delete+add) failed', {
+            logger.error('CHROMA_SYNC', 'Batch reconcile (delete+add) failed — watermark will not advance for this batch', {
              collection: this.collectionName,
              batchStart: i,
              batchSize: batch.length
            }, reconcileError as Error);
          }
        } else {
-          logger.error('CHROMA_SYNC', 'Batch add failed, continuing with remaining batches', {
+          logger.error('CHROMA_SYNC', 'Batch add failed — watermark will not advance for this batch, continuing with remaining batches', {
            collection: this.collectionName,
            batchStart: i,
            batchSize: batch.length
@@ -285,8 +297,10 @@ export class ChromaSync {

    logger.debug('CHROMA_SYNC', 'Documents added', {
      collection: this.collectionName,
-      count: documents.length
+      requested: documents.length,
+      written
    });
+    return written;
  }

  async syncObservation(
@@ -326,8 +340,22 @@ export class ChromaSync {
      project
    });

-    await this.addDocuments(documents);
-    ChromaSyncState.bump(project, 'observations', observationId);
+    // Only advance the watermark on a confirmed full write. addDocuments() now
+    // returns a written count and tolerates per-batch failures, so a transient
+    // Chroma error must NOT mark this observation as synced — otherwise the
+    // backfill pass on next boot will skip past it (CodeRabbit review on PR
+    // #2282).
+    const written = await this.addDocuments(documents);
+    if (written === documents.length) {
+      ChromaSyncState.bump(project, 'observations', observationId);
+    } else {
+      logger.warn('CHROMA_SYNC', 'Observation watermark bump skipped — partial write', {
+        observationId,
+        project,
+        requested: documents.length,
+        written
+      });
+    }
  }

  async syncSummary(
@@ -364,8 +392,18 @@ export class ChromaSync {
      project
    });

-    await this.addDocuments(documents);
-    ChromaSyncState.bump(project, 'summaries', summaryId);
+    // Only bump on a confirmed full write — see syncObservation() for rationale.
+    const written = await this.addDocuments(documents);
+    if (written === documents.length) {
+      ChromaSyncState.bump(project, 'summaries', summaryId);
+    } else {
+      logger.warn('CHROMA_SYNC', 'Summary watermark bump skipped — partial write', {
+        summaryId,
+        project,
+        requested: documents.length,
+        written
+      });
+    }
  }

  private formatUserPromptDoc(prompt: StoredUserPrompt): ChromaDocument {
@@ -409,8 +447,17 @@ export class ChromaSync {
      project
    });

-    await this.addDocuments([document]);
-    ChromaSyncState.bump(project, 'prompts', promptId);
+    // Only bump on a confirmed full write — see syncObservation() for rationale.
+    const written = await this.addDocuments([document]);
+    if (written === 1) {
+      ChromaSyncState.bump(project, 'prompts', promptId);
+    } else {
+      logger.warn('CHROMA_SYNC', 'Prompt watermark bump skipped — write failed', {
+        promptId,
+        project,
+        written
+      });
+    }
  }

  private async getExistingChromaIds(projectOverride?: string): Promise<{
@@ -574,34 +621,67 @@ export class ChromaSync {
      obsByDocCount.push({ obs, docs });
    }

+    // Watermark must be durable per-batch: SIGKILL / OOM / reboot mid-flight
+    // skips any trailing finally, so a once-at-end bump leaves the watermark
+    // at zero and the next boot re-embeds everything (#2214, amplifies #2220).
+    //
+    // Non-contiguous failure guard: once any batch under-writes, ALL later
+    // batches must also skip the watermark bump. The watermark is a single
+    // monotonic id, so it cannot represent "synced through 200, then a gap at
+    // 201–250, then 251 onward" — bumping past the gap would silently drop
+    // 201–250 forever (CodeRabbit review on PR #2282).
    let writtenDocs = 0;
    let lastSyncedObsIdx = -1;
-    try {
-      for (let i = 0; i < allDocs.length; i += this.BATCH_SIZE) {
-        const batch = allDocs.slice(i, i + this.BATCH_SIZE);
-        await this.addDocuments(batch);
-        writtenDocs += batch.length;
-
-        let cursor = 0;
-        for (let j = 0; j < obsByDocCount.length; j++) {
-          cursor += obsByDocCount[j].docs.length;
-          if (cursor <= writtenDocs) {
-            lastSyncedObsIdx = j;
-          } else {
-            break;
-          }
-        }
-
-        logger.debug('CHROMA_SYNC', 'Backfill progress', {
+    let hadGap = false;
+    for (let i = 0; i < allDocs.length; i += this.BATCH_SIZE) {
+      const batch = allDocs.slice(i, i + this.BATCH_SIZE);
+      const writtenInBatch = await this.addDocuments(batch);
+      // Only advance the watermark for documents that actually landed in
+      // Chroma. addDocuments() logs and continues on per-batch failures, so a
+      // partial write must not mark unwritten docs as synced.
+      if (writtenInBatch < batch.length) {
+        hadGap = true;
+        logger.debug('CHROMA_SYNC', 'Skipping watermark bump for failed/partial batch', {
          project: backfillProject,
-          progress: `${Math.min(i + this.BATCH_SIZE, allDocs.length)}/${allDocs.length}`
+          batchStart: i,
+          requested: batch.length,
+          written: writtenInBatch
        });
+        continue;
      }
-    } finally {
+      if (hadGap) {
+        // A previous batch left a gap; downstream batches cannot bump the
+        // watermark even if they themselves succeeded.
+        logger.debug('CHROMA_SYNC', 'Skipping watermark bump after prior gap', {
+          project: backfillProject,
+          batchStart: i
+        });
+        continue;
+      }
+      writtenDocs += writtenInBatch;
+
+      let cursor = 0;
+      for (let j = 0; j < obsByDocCount.length; j++) {
+        cursor += obsByDocCount[j].docs.length;
+        if (cursor <= writtenDocs) {
+          lastSyncedObsIdx = j;
+        } else {
+          break;
+        }
+      }
+
      if (lastSyncedObsIdx >= 0) {
-        const highestId = obsByDocCount[lastSyncedObsIdx].obs.id;
-        ChromaSyncState.bump(backfillProject, 'observations', highestId);
+        ChromaSyncState.bump(
+          backfillProject,
+          'observations',
+          obsByDocCount[lastSyncedObsIdx].obs.id
+        );
      }
+
+      logger.debug('CHROMA_SYNC', 'Backfill progress', {
+        project: backfillProject,
+        progress: `${Math.min(i + this.BATCH_SIZE, allDocs.length)}/${allDocs.length}`
+      });
    }

    return allDocs;
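Reviewer note: the non-contiguous failure guard in the hunk above comes down to "the watermark may only cover the longest fully written prefix of batches". The sketch below models only that decision. `BatchResult` and `contiguousWatermark` are hypothetical names and the batch data is invented.

```typescript
// Hypothetical sketch: `BatchResult` and `contiguousWatermark` are illustrative names, not part of the PR.
interface BatchResult {
  lastId: number;    // highest record id covered by the batch
  requested: number; // documents sent
  written: number;   // documents confirmed written
}

// Returns the highest id that is safe to record as "synced through",
// or null if the very first batch under-wrote.
function contiguousWatermark(batches: BatchResult[]): number | null {
  let watermark: number | null = null;
  for (const batch of batches) {
    if (batch.written < batch.requested) {
      break; // first gap: nothing after this point may advance the watermark
    }
    watermark = batch.lastId;
  }
  return watermark;
}

const results: BatchResult[] = [
  { lastId: 100, requested: 100, written: 100 },
  { lastId: 200, requested: 100, written: 100 },
  { lastId: 250, requested: 50, written: 0 },    // failed batch
  { lastId: 350, requested: 100, written: 100 }, // succeeded, but sits after the gap
];
console.log(contiguousWatermark(results)); // 200
```

Unlike this sketch, the loop in the PR keeps writing the later batches to Chroma (it uses `continue`, not `break`) and only skips the watermark bump for them.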
@@ -641,30 +721,53 @@ export class ChromaSync {
      summaryByDocCount.push({ summary, docs });
    }

+    // Non-contiguous failure guard: see backfillObservations() for rationale.
    let writtenDocs = 0;
    let lastSyncedIdx = -1;
-    try {
-      for (let i = 0; i < summaryDocs.length; i += this.BATCH_SIZE) {
-        const batch = summaryDocs.slice(i, i + this.BATCH_SIZE);
-        await this.addDocuments(batch);
-        writtenDocs += batch.length;
-
-        let cursor = 0;
-        for (let j = 0; j < summaryByDocCount.length; j++) {
-          cursor += summaryByDocCount[j].docs.length;
-          if (cursor <= writtenDocs) lastSyncedIdx = j;
-          else break;
-        }
-
-        logger.debug('CHROMA_SYNC', 'Backfill progress', {
+    let hadGap = false;
+    for (let i = 0; i < summaryDocs.length; i += this.BATCH_SIZE) {
+      const batch = summaryDocs.slice(i, i + this.BATCH_SIZE);
+      const writtenInBatch = await this.addDocuments(batch);
+      // Only advance the watermark for documents that actually landed in
+      // Chroma. See the analogous comment in backfillObservations().
+      if (writtenInBatch < batch.length) {
+        hadGap = true;
+        logger.debug('CHROMA_SYNC', 'Skipping watermark bump for failed/partial batch', {
          project: backfillProject,
-          progress: `${Math.min(i + this.BATCH_SIZE, summaryDocs.length)}/${summaryDocs.length}`
+          batchStart: i,
+          requested: batch.length,
+          written: writtenInBatch
        });
+        continue;
      }
-    } finally {
+      if (hadGap) {
+        logger.debug('CHROMA_SYNC', 'Skipping watermark bump after prior gap', {
+          project: backfillProject,
+          batchStart: i
+        });
+        continue;
+      }
+      writtenDocs += writtenInBatch;
+
+      let cursor = 0;
+      for (let j = 0; j < summaryByDocCount.length; j++) {
+        cursor += summaryByDocCount[j].docs.length;
+        if (cursor <= writtenDocs) lastSyncedIdx = j;
+        else break;
+      }
+
      if (lastSyncedIdx >= 0) {
-        ChromaSyncState.bump(backfillProject, 'summaries', summaryByDocCount[lastSyncedIdx].summary.id);
+        ChromaSyncState.bump(
+          backfillProject,
+          'summaries',
+          summaryByDocCount[lastSyncedIdx].summary.id
+        );
      }
+
+      logger.debug('CHROMA_SYNC', 'Backfill progress', {
+        project: backfillProject,
+        progress: `${Math.min(i + this.BATCH_SIZE, summaryDocs.length)}/${summaryDocs.length}`
+      });
    }

    return summaryDocs;
@@ -709,23 +812,41 @@ export class ChromaSync {
      promptDocs.push(this.formatUserPromptDoc(prompt));
    }

-    let lastSyncedPromptId = 0;
-    try {
-      for (let i = 0; i < promptDocs.length; i += this.BATCH_SIZE) {
-        const batch = promptDocs.slice(i, i + this.BATCH_SIZE);
-        await this.addDocuments(batch);
-        const upTo = Math.min(i + this.BATCH_SIZE, prompts.length);
-        lastSyncedPromptId = prompts[upTo - 1].id;
-
-        logger.debug('CHROMA_SYNC', 'Backfill progress', {
+    // Prompts are 1 doc each — bump the watermark per batch so an interrupted
+    // backfill resumes where it left off instead of re-embedding from zero.
+    // Only advance the watermark when the batch actually wrote — partial
+    // writes must not skip the failed prompts on restart.
+    //
+    // Non-contiguous failure guard: see backfillObservations() for rationale.
+    let hadGap = false;
+    for (let i = 0; i < promptDocs.length; i += this.BATCH_SIZE) {
+      const batch = promptDocs.slice(i, i + this.BATCH_SIZE);
+      const writtenInBatch = await this.addDocuments(batch);
+      const upTo = Math.min(i + this.BATCH_SIZE, prompts.length);
+      if (writtenInBatch < batch.length) {
+        hadGap = true;
+        logger.debug('CHROMA_SYNC', 'Skipping prompt watermark bump for failed/partial batch', {
          project: backfillProject,
-          progress: `${upTo}/${promptDocs.length}`
+          batchStart: i,
+          requested: batch.length,
+          written: writtenInBatch
        });
+        continue;
      }
-    } finally {
-      if (lastSyncedPromptId > 0) {
-        ChromaSyncState.bump(backfillProject, 'prompts', lastSyncedPromptId);
+      if (hadGap) {
+        logger.debug('CHROMA_SYNC', 'Skipping prompt watermark bump after prior gap', {
+          project: backfillProject,
+          batchStart: i
+        });
+        continue;
      }
+      const lastSyncedPromptId = prompts[upTo - 1].id;
+      ChromaSyncState.bump(backfillProject, 'prompts', lastSyncedPromptId);
+
+      logger.debug('CHROMA_SYNC', 'Backfill progress', {
+        project: backfillProject,
+        progress: `${upTo}/${promptDocs.length}`
+      });
    }

    return promptDocs;
@@ -814,9 +935,50 @@ export class ChromaSync {
    return { ids, distances, metadatas };
  }

+  /** Maximum number of project backfills to run concurrently. */
+  private static readonly BACKFILL_CONCURRENCY_LIMIT = 3;
+
+  /** Guard flag to prevent overlapping backfill runs from fire-and-forget callers. */
+  private static backfillInProgress = false;
+
+  /**
+   * Backfill all projects that have observations in SQLite but may be missing from Chroma.
+   * Uses a single shared ChromaSync('claude-mem') instance and Chroma connection.
+   * Per-project scoping is passed as a parameter to ensureBackfilled(), avoiding
+   * instance state mutation. All documents land in the cm__claude-mem collection
+   * with project scoped via metadata, matching how DatabaseManager and SearchManager operate.
+   * Designed to be called fire-and-forget on worker startup.
+   *
+   * Concurrency: processes at most BACKFILL_CONCURRENCY_LIMIT projects in parallel
+   * to bound CPU and memory pressure from concurrent Chroma embedding operations.
+   * A re-entrancy guard (backfillInProgress) makes overlapping calls return early instead of stacking up.
+   */
  static async backfillAllProjects(storeOverride?: SessionStore): Promise<void> {
-    const db = storeOverride ?? new SessionStore();
-    const sync = new ChromaSync('claude-mem');
+    if (ChromaSync.backfillInProgress) {
+      logger.info('CHROMA_SYNC', 'Backfill already in progress, skipping duplicate run');
+      return;
+    }
+
+    // Allocate first so a constructor throw cannot leave the guard stuck true
+    // and silently skip every subsequent backfill (CodeRabbit review on PR
+    // #2282). The guard only flips to true after both resources are alive,
+    // and the finally always clears it.
+    let db: SessionStore | undefined;
+    let sync: ChromaSync | undefined;
+    try {
+      db = storeOverride ?? new SessionStore();
+      sync = new ChromaSync('claude-mem');
+    } catch (error) {
+      logger.error('CHROMA_SYNC', 'Failed to initialize backfill resources',
+        {}, error instanceof Error ? error : new Error(String(error)));
+      // Best-effort cleanup if SessionStore allocated but ChromaSync threw.
+      if (db && !storeOverride) {
+        try { db.close(); } catch { /* ignore */ }
+      }
+      throw error;
+    }
+
+    ChromaSync.backfillInProgress = true;
    try {
      const projects = db.db.prepare(
        'SELECT DISTINCT project FROM observations WHERE project IS NOT NULL AND project != ?'
@@ -837,22 +999,45 @@ export class ChromaSync {
        logger.info('CHROMA_SYNC', 'Bootstrap complete — incremental backfills will use watermarks');
      }

-      for (const { project } of projects) {
-        try {
-          await sync.ensureBackfilled(project, db);
-        } catch (error) {
-          if (error instanceof Error) {
-            logger.error('CHROMA_SYNC', `Backfill failed for project: ${project}`, {}, error);
-          } else {
-            logger.error('CHROMA_SYNC', `Backfill failed for project: ${project}`, { error: String(error) });
+      // Process projects in chunks of BACKFILL_CONCURRENCY_LIMIT to bound
+      // CPU/memory pressure from concurrent Chroma embedding operations.
+      // Each chunk runs its projects in parallel; we wait for the entire chunk
+      // before starting the next one. Simple and predictable — no semaphore
+      // overhead, no unbounded fan-out.
+      const concurrency = ChromaSync.BACKFILL_CONCURRENCY_LIMIT;
+      for (let i = 0; i < projects.length; i += concurrency) {
+        const chunk = projects.slice(i, i + concurrency);
+        const chunkResults = await Promise.allSettled(
+          chunk.map(({ project }) => sync!.ensureBackfilled(project, db!))
+        );
+
+        for (let j = 0; j < chunkResults.length; j++) {
+          const result = chunkResults[j];
+          if (result.status === 'rejected') {
+            const project = chunk[j].project;
+            const error = result.reason;
+            if (error instanceof Error) {
+              logger.error('CHROMA_SYNC', `Backfill failed for project: ${project}`, {}, error);
+            } else {
+              logger.error('CHROMA_SYNC', `Backfill failed for project: ${project}`, { error: String(error) });
+            }
+            // A rejected project is only logged; the rest of this chunk and later chunks still run.
          }
-          // Continue to next project — don't let one failure stop others
        }
      }
    } finally {
-      await sync.close();
-      if (!storeOverride) {
-        db.close();
+      ChromaSync.backfillInProgress = false;
+      if (sync) {
+        try { await sync.close(); } catch (closeError) {
+          logger.debug('CHROMA_SYNC', 'sync.close() failed during backfill teardown',
+            {}, closeError instanceof Error ? closeError : new Error(String(closeError)));
+        }
+      }
+      if (!storeOverride && db) {
+        try { db.close(); } catch (closeError) {
+          logger.debug('CHROMA_SYNC', 'db.close() failed during backfill teardown',
+            {}, closeError instanceof Error ? closeError : new Error(String(closeError)));
+        }
      }
    }
  }
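Review note: the chunked fan-out in `backfillAllProjects` can be read in isolation as the sketch below. It is a minimal illustration, not code from this PR: `runInChunks` and its parameters are hypothetical names. It shows the same two properties the hunk relies on: at most `limit` tasks are in flight at once, and a rejected task is reported without aborting its siblings.

```typescript
// Hypothetical helper (not part of the PR): run `worker` over `items`
// in fixed-size chunks and collect one settled result per item.
async function runInChunks<T, R>(
  items: readonly T[],
  limit: number,
  worker: (item: T) => Promise<R>,
): Promise<PromiseSettledResult<R>[]> {
  if (!Number.isInteger(limit) || limit < 1) {
    throw new RangeError('limit must be a positive integer');
  }
  const results: PromiseSettledResult<R>[] = [];
  for (let i = 0; i < items.length; i += limit) {
    const chunk = items.slice(i, i + limit);
    // allSettled never rejects, so one failing item cannot abort its chunk.
    const settled = await Promise.allSettled(chunk.map((item) => worker(item)));
    results.push(...settled);
  }
  return results;
}
```

The trade-off of this shape is that each chunk waits for its slowest member before the next chunk starts; a semaphore would keep all slots busy at the cost of more bookkeeping.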
@@ -1,10 +1,11 @@
 import { existsSync, readFileSync, writeFileSync, mkdirSync } from 'fs';
 import { homedir } from 'os';
 import { join, dirname } from 'path';
+import { paths } from '../../shared/paths.js';
 import type { TranscriptSchema, TranscriptWatchConfig } from './types.js';

-export const DEFAULT_CONFIG_PATH = join(homedir(), '.claude-mem', 'transcript-watch.json');
-export const DEFAULT_STATE_PATH = join(homedir(), '.claude-mem', 'transcript-watch-state.json');
+export const DEFAULT_CONFIG_PATH = paths.transcriptsConfig();
+export const DEFAULT_STATE_PATH = paths.transcriptsState();

 const CODEX_SAMPLE_SCHEMA: TranscriptSchema = {
  name: 'codex',
@@ -78,7 +79,8 @@ const CODEX_SAMPLE_SCHEMA: TranscriptSchema = {
    },
    {
      name: 'session-end',
-      match: { path: 'payload.type', in: ['turn_aborted', 'turn_completed'] },
+      // TODO(#2249): delete watcher when Codex hook lifecycle migration ships
+      match: { path: 'payload.type', in: ['turn_aborted', 'turn_completed', 'task_complete'] },
      action: 'session_end'
    }
  ]
@@ -58,10 +58,11 @@ import {
 import { DatabaseManager } from './worker/DatabaseManager.js';
 import { SessionManager } from './worker/SessionManager.js';
 import { SSEBroadcaster } from './worker/SSEBroadcaster.js';
-import { ClaudeProvider } from './worker/ClaudeProvider.js';
+import { ClaudeProvider, classifyClaudeError } from './worker/ClaudeProvider.js';
 import type { WorkerRef } from './worker/agents/types.js';
-import { GeminiProvider, isGeminiSelected, isGeminiAvailable } from './worker/GeminiProvider.js';
-import { OpenRouterProvider, isOpenRouterSelected, isOpenRouterAvailable } from './worker/OpenRouterProvider.js';
+import { GeminiProvider, classifyGeminiError, isGeminiSelected, isGeminiAvailable } from './worker/GeminiProvider.js';
+import { OpenRouterProvider, classifyOpenRouterError, isOpenRouterSelected, isOpenRouterAvailable } from './worker/OpenRouterProvider.js';
+import { ClassifiedProviderError, isClassified, type ProviderErrorClass } from './worker/provider-errors.js';
 import { PaginationHelper } from './worker/PaginationHelper.js';
 import { SettingsManager } from './worker/SettingsManager.js';
 import { SearchManager } from './worker/SearchManager.js';
@@ -503,6 +504,36 @@ export class WorkerService implements WorkerRef {
    return this.sdkAgent;
  }

+  /**
+   * Re-classify a raw error at the worker-service dispatch site using the
+   * active provider's classifier. Returns null only when the agent is not a
+   * known provider or the classifier throws (caller falls back to defaults).
+   *
+   * Most provider errors should already be classified at the provider
+   * boundary — this is a safety net for errors from inside the SDK that
+   * never round-tripped through fetch (e.g. Anthropic SDK exceptions).
+   */
+  private reclassifyAtDispatch(
+    error: unknown,
+    agent: ClaudeProvider | GeminiProvider | OpenRouterProvider
+  ): ClassifiedProviderError | null {
+    try {
+      if (agent instanceof ClaudeProvider) {
+        return classifyClaudeError(error);
+      }
+      if (agent instanceof GeminiProvider) {
+        // No status or body is available here, so this classifies as a transient network error.
+        return classifyGeminiError({ cause: error });
+      }
+      if (agent instanceof OpenRouterProvider) {
+        return classifyOpenRouterError({ cause: error });
+      }
+    } catch {
+      // If the classifier itself throws, fall back to unclassified.
+    }
+    return null;
+  }
+
  private startSessionProcessor(
    session: ReturnType<typeof this.sessionManager.getSession>,
    source: string
@@ -531,22 +562,26 @@ export class WorkerService implements WorkerRef {
      .catch(async (error: unknown) => {
        const errorMessage = (error as Error)?.message || '';

-        const unrecoverablePatterns = [
-          'Claude executable not found',
-          'CLAUDE_CODE_PATH',
-          'ENOENT',
-          'spawn',
-          'Invalid API key',
-          'API_KEY_INVALID',
-          'API key expired',
-          'API key not valid',
-          'PERMISSION_DENIED',
-          'Gemini API error: 400',
-          'Gemini API error: 401',
-          'Gemini API error: 403',
-          'FOREIGN KEY constraint failed',
-        ];
-        if (unrecoverablePatterns.some(pattern => errorMessage.includes(pattern))) {
+        // Dispatch on F4 ClassifiedProviderError.kind. Replaces the old
+        // string-matching allowlist (#2244). Already-classified errors
+        // propagate kind from the provider boundary; raw errors get
+        // re-classified here using provider-specific helpers based on the
+        // active agent.
+        const classified: ClassifiedProviderError | null = isClassified(error)
+          ? error
+          : this.reclassifyAtDispatch(error, agent);
+
+        // FOREIGN KEY constraint failures from SQLite are unrecoverable but
+        // not provider-specific; check before deferring to the classifier so
+        // FK failures don't get misclassified as transient and retry forever
+        // (per-provider classifiers don't recognize FK errors).
+        const isFkConstraintFailure = errorMessage.includes('FOREIGN KEY constraint failed');
+
+        const dispatchKind: ProviderErrorClass | null = isFkConstraintFailure
+          ? 'unrecoverable'
+          : (classified ? classified.kind : null);
+
+        if (dispatchKind === 'unrecoverable' || dispatchKind === 'auth_invalid' || dispatchKind === 'quota_exhausted') {
          hadUnrecoverableError = true;
          this.lastAiInteraction = {
            timestamp: Date.now(),
@@ -554,9 +589,13 @@ export class WorkerService implements WorkerRef {
            provider: providerName,
            error: errorMessage,
          };
-          logger.error('SDK', 'Unrecoverable generator error - will NOT restart', {
+          const logLabel =
+            dispatchKind === 'auth_invalid' ? 'auth invalid' :
+            dispatchKind === 'quota_exhausted' ? 'quota exhausted' : 'unrecoverable';
+          logger.error('SDK', `Unrecoverable generator error (${logLabel}) - will NOT restart`, {
            sessionId: session.sessionDbId,
            project: session.project,
+            errorKind: dispatchKind,
            errorMessage
          });
          return;
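Review note: the dispatch rule in this hunk reduces to a small decision function, sketched below. `SketchClassifiedError`, `ErrorKind`, and `shouldRestart` are stand-ins invented for illustration; the real `ClassifiedProviderError` and `ProviderErrorClass` live in `provider-errors.ts` and may carry more kinds or fields than are assumed here.

```typescript
// Stand-in types for illustration only; assumed to mirror the kinds used in this diff.
type ErrorKind = 'transient' | 'rate_limit' | 'auth_invalid' | 'quota_exhausted' | 'unrecoverable';

class SketchClassifiedError extends Error {
  readonly kind: ErrorKind;
  constructor(message: string, kind: ErrorKind) {
    super(message);
    // Keep instanceof working even when compiled to an ES5 target.
    Object.setPrototypeOf(this, SketchClassifiedError.prototype);
    this.name = 'SketchClassifiedError';
    this.kind = kind;
  }
}

// Returns true when the session generator may be restarted after `error`.
function shouldRestart(error: unknown): boolean {
  const message = error instanceof Error ? error.message : String(error);
  // SQLite FK failures are checked first: they are not provider errors,
  // so a provider classifier would otherwise default them to transient.
  if (message.includes('FOREIGN KEY constraint failed')) return false;
  if (error instanceof SketchClassifiedError) {
    switch (error.kind) {
      case 'unrecoverable':
      case 'auth_invalid':
      case 'quota_exhausted':
        return false;
      case 'transient':
      case 'rate_limit':
        return true;
    }
  }
  // Unclassified errors keep the old behavior of being retried.
  return true;
}
```

Dispatching on a closed set of kinds rather than on message substrings means a provider can reword its error text without silently changing restart behavior.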
@@ -33,7 +33,7 @@ export interface ActiveSession {
  lastSummaryStored?: boolean;
  pendingAgentId?: string | null;
  pendingAgentType?: string | null;
-  abortReason?: 'idle' | 'shutdown' | 'overflow' | 'restart-guard' | null;
+  abortReason?: 'idle' | 'shutdown' | 'overflow' | 'restart-guard' | 'quota' | (string & {}) | null;
  respawnTimer?: ReturnType<typeof setTimeout>;
 }

@@ -1,14 +1,12 @@

-import { execSync } from 'child_process';
-import { homedir } from 'os';
-import path from 'path';
 import { DatabaseManager } from './DatabaseManager.js';
 import { SessionManager } from './SessionManager.js';
 import { logger } from '../../utils/logger.js';
 import { buildInitPrompt, buildObservationPrompt, buildSummaryPrompt, buildContinuationPrompt } from '../../sdk/prompts.js';
 import { SettingsDefaultsManager } from '../../shared/SettingsDefaultsManager.js';
-import { USER_SETTINGS_PATH, OBSERVER_SESSIONS_DIR, ensureDir } from '../../shared/paths.js';
-import { buildIsolatedEnv, getAuthMethodDescription } from '../../shared/EnvManager.js';
+import { USER_SETTINGS_PATH, OBSERVER_SESSIONS_DIR, ensureDir, paths } from '../../shared/paths.js';
+import { buildIsolatedEnvWithFreshOAuth, getAuthMethodDescription } from '../../shared/EnvManager.js';
+import { findClaudeExecutable } from '../../shared/find-claude-executable.js';
 import type { ActiveSession, SDKUserMessage } from '../worker-types.js';
 import { ModeManager } from '../domain/ModeManager.js';
 import { processAgentResponse, type WorkerRef } from './agents/index.js';
@@ -19,9 +17,86 @@ import {
  waitForSlot,
 } from '../../supervisor/process-registry.js';
 import { sanitizeEnv } from '../../supervisor/env-sanitizer.js';
+import {
+  globalRateLimitStore,
+  shouldAbortForQuota,
+  type RateLimitInfo,
+} from './RateLimitStore.js';

 // @ts-ignore - Agent SDK types may not be available
 import { query } from '@anthropic-ai/claude-agent-sdk';
+import { ClassifiedProviderError } from './provider-errors.js';
+
+/**
+ * Classify a ClaudeProvider error (executable spawn failures, SDK errors,
+ * Anthropic API errors). Provider-specific because it relies on:
+ *   - SDK error class names (e.g. OverloadedError) when present
+ *   - spawn errors (ENOENT) when the Claude executable is missing
+ *   - Anthropic-specific message strings ("Invalid API key", "Prompt is too long")
+ */
+export function classifyClaudeError(err: unknown): ClassifiedProviderError {
+  const message = err instanceof Error ? err.message : String(err);
+  const errAny = err as { name?: string; status?: number; error?: { type?: string } };
+
+  // Executable / spawn issues — unrecoverable, no point retrying.
+  if (
+    message.includes('Claude executable not found') ||
+    message.includes('CLAUDE_CODE_PATH') ||
+    message.includes('ENOENT') ||
+    message.startsWith('spawn ')
+  ) {
+    return new ClassifiedProviderError(message, { kind: 'unrecoverable', cause: err });
+  }
+
+  // Anthropic auth failures.
+  if (
+    errAny.status === 401 ||
+    errAny.status === 403 ||
+    message.includes('Invalid API key') ||
+    message.includes('API_KEY_INVALID') ||
+    message.includes('API key expired') ||
+    message.includes('API key not valid')
+  ) {
+    return new ClassifiedProviderError(message, { kind: 'auth_invalid', cause: err });
+  }
+
+  // SDK-level overloaded — Anthropic emits OverloadedError or 529 with type:'overloaded_error'.
+  if (
+    errAny.name === 'OverloadedError' ||
+    errAny.status === 529 ||
+    errAny.error?.type === 'overloaded_error'
+  ) {
+    return new ClassifiedProviderError(message || 'Anthropic overloaded', { kind: 'transient', cause: err });
+  }
+
+  // Rate limit.
+  if (errAny.status === 429) {
+    return new ClassifiedProviderError(message, { kind: 'rate_limit', cause: err });
+  }
+
+  // Quota.
+  if (message.toLowerCase().includes('quota exceeded')) {
+    return new ClassifiedProviderError(message, { kind: 'quota_exhausted', cause: err });
+  }
+
+  // Context overflow — unrecoverable in this session, requires reset.
+  if (
+    message.includes('Prompt is too long') ||
+    message.includes('prompt is too long') ||
+    message.includes('context window')
+  ) {
+    return new ClassifiedProviderError(message, { kind: 'unrecoverable', cause: err });
+  }
+
+  // Server errors → transient.
+  if (typeof errAny.status === 'number' && errAny.status >= 500 && errAny.status < 600) {
+    return new ClassifiedProviderError(message, { kind: 'transient', cause: err });
+  }
+
+  // Default: treat unknown errors as transient (preserve old behavior of
+  // retrying everything not explicitly marked unrecoverable).
+  return new ClassifiedProviderError(message, { kind: 'transient', cause: err });
+}

 export class ClaudeProvider {
  private dbManager: DatabaseManager;
@@ -41,7 +116,8 @@ export class ClaudeProvider {
  async startSession(session: ActiveSession, worker?: WorkerRef): Promise<void> {
    const cwdTracker = { lastCwd: undefined as string | undefined };

-    const claudePath = this.findClaudeExecutable();
+    // Find and validate Claude executable (shared utility, closes #2222)
+    const claudePath = findClaudeExecutable('SDK');

    const modelId = session.modelOverride || this.getModelId();
    const disallowedTools = [
@@ -76,7 +152,7 @@ export class ClaudeProvider {
    const maxConcurrent = parseInt(settings.CLAUDE_MEM_MAX_CONCURRENT_AGENTS, 10) || 2;
    await waitForSlot(maxConcurrent, 60_000);

-    const isolatedEnv = sanitizeEnv(buildIsolatedEnv());
+    const isolatedEnv = sanitizeEnv(await buildIsolatedEnvWithFreshOAuth());
    const authMethod = getAuthMethodDescription();

    logger.info('SDK', 'Starting SDK query', {
@@ -120,6 +196,36 @@ export class ClaudeProvider {

    try {
      for await (const message of queryResult) {
+        // Quota-aware wall-clock guard (#2234): the SDK pushes `system` events
+        // with subtype `rate_limit` carrying live subscription quota state.
+        // Capture the snapshot, then bail out of the loop before issuing
+        // another request if we've crossed a per-window threshold. API-key
+        // users are exempt — they authorized per-call spend.
+        if (
+          (message as any)?.type === 'system' &&
+          (message as any)?.subtype === 'rate_limit'
+        ) {
+          const info = (message as any).rate_limit_info as RateLimitInfo | undefined;
+          if (info) {
+            globalRateLimitStore.set(info);
+          }
+          const decision = shouldAbortForQuota(authMethod, globalRateLimitStore);
+          if (decision.abort) {
+            logger.warn('SDK', `Aborting session for quota guard: ${decision.reason}`, {
+              sessionDbId: session.sessionDbId,
+              window: decision.window,
+              authMethod,
+            });
+            session.abortReason = `quota:${decision.window ?? 'unknown'}`;
+            try {
+              session.abortController.abort();
+            } catch {
+              // best-effort
+            }
+            break;
+          }
+        }
+
        if (message.session_id && message.session_id !== session.memorySessionId) {
          const previousId = session.memorySessionId;
          session.memorySessionId = message.session_id;
@@ -333,46 +439,8 @@ export class ClaudeProvider {
    }
  }

-  private findClaudeExecutable(): string {
-    const settings = SettingsDefaultsManager.loadFromFile(USER_SETTINGS_PATH);
-
-    if (settings.CLAUDE_CODE_PATH) {
-      const { existsSync } = require('fs');
-      if (!existsSync(settings.CLAUDE_CODE_PATH)) {
-        throw new Error(`CLAUDE_CODE_PATH is set to "${settings.CLAUDE_CODE_PATH}" but the file does not exist.`);
-      }
-      return settings.CLAUDE_CODE_PATH;
-    }
-
-    if (process.platform === 'win32') {
-      try {
-        execSync('where claude.cmd', { encoding: 'utf8', windowsHide: true, stdio: ['ignore', 'pipe', 'ignore'] });
-        return 'claude.cmd'; 
-      } catch {
-        // Fall through to generic error
-      }
-    }
-
-    try {
-      const claudePath = execSync(
-        process.platform === 'win32' ? 'where claude' : 'which claude',
-        { encoding: 'utf8', windowsHide: true, stdio: ['ignore', 'pipe', 'ignore'] }
-      ).trim().split('\n')[0].trim();
-
-      if (claudePath) return claudePath;
-    } catch (error) {
-      if (error instanceof Error) {
-        logger.debug('SDK', 'Claude executable auto-detection failed', {}, error);
-      } else {
-        logger.debug('SDK', 'Claude executable auto-detection failed with non-Error', {}, new Error(String(error)));
-      }
-    }
-
-    throw new Error('Claude executable not found. Please either:\n1. Add "claude" to your system PATH, or\n2. Set CLAUDE_CODE_PATH in ~/.claude-mem/settings.json');
-  }
-
  private getModelId(): string {
-    const settingsPath = path.join(homedir(), '.claude-mem', 'settings.json');
+    const settingsPath = paths.settings();
    const settings = SettingsDefaultsManager.loadFromFile(settingsPath);
    return settings.CLAUDE_MEM_MODEL;
  }
@@ -1,13 +1,11 @@

-import path from 'path';
-import { homedir } from 'os';
 import { DatabaseManager } from './DatabaseManager.js';
 import { SessionManager } from './SessionManager.js';
 import { logger } from '../../utils/logger.js';
 import { buildInitPrompt, buildObservationPrompt, buildSummaryPrompt, buildContinuationPrompt } from '../../sdk/prompts.js';
 import { SettingsDefaultsManager } from '../../shared/SettingsDefaultsManager.js';
 import { getCredential } from '../../shared/EnvManager.js';
-import { USER_SETTINGS_PATH } from '../../shared/paths.js';
+import { USER_SETTINGS_PATH, paths } from '../../shared/paths.js';
 import { estimateTokens } from '../../shared/timeline-formatting.js';
 import type { ActiveSession, ConversationMessage } from '../worker-types.js';
 import { ModeManager } from '../domain/ModeManager.js';
@@ -17,9 +15,105 @@ import {
  isAbortError,
  type WorkerRef
 } from './agents/index.js';
+import { ClassifiedProviderError } from './provider-errors.js';
+import { withRetry } from './retry.js';

 const GEMINI_API_URL = 'https://generativelanguage.googleapis.com/v1/models';

+/**
+ * Parse Retry-After header (seconds or HTTP-date).
+ * Returns ms or undefined.
+ */
+function parseRetryAfterMs(value: string | null): number | undefined {
+  if (!value) return undefined;
+  const seconds = Number(value);
+  if (Number.isFinite(seconds) && seconds >= 0) {
+    return Math.floor(seconds * 1000);
+  }
+  const dateMs = Date.parse(value);
+  if (!Number.isNaN(dateMs)) {
+    const delta = dateMs - Date.now();
+    return delta > 0 ? delta : 0;
+  }
+  return undefined;
+}
+
+/**
+ * Classify a Gemini fetch failure into ClassifiedProviderError. Called at
+ * the boundary right after `fetch()` returns or throws. Provider-specific
+ * because Gemini surfaces auth/quota/rate-limit signals via specific status
+ * codes and body strings (e.g. "quota exceeded", "API key not valid").
+ */
+export function classifyGeminiError(input: {
+  status?: number;
+  bodyText?: string;
+  headers?: Headers | { get(name: string): string | null };
+  cause: unknown;
+  requestId?: string;
+}): ClassifiedProviderError {
+  const status = input.status;
+  const body = input.bodyText ?? '';
+  const lower = body.toLowerCase();
+  const headers = input.headers;
+  const retryAfterMs = headers ? parseRetryAfterMs(headers.get('retry-after')) : undefined;
+
+  // Quota exhaustion is detected from the body marker regardless of status; Gemini can report it even on a 500.
+  if (lower.includes('quota exceeded') || lower.includes('resource_exhausted')) {
+    return new ClassifiedProviderError(
+      `Gemini quota exhausted${status !== undefined ? ` (status ${status})` : ''}`,
+      { kind: 'quota_exhausted', cause: input.cause },
+    );
+  }
+
+  if (status === 429) {
+    return new ClassifiedProviderError(
+      'Gemini rate limit (429)',
+      { kind: 'rate_limit', cause: input.cause, ...(retryAfterMs !== undefined ? { retryAfterMs } : {}) },
+    );
+  }
+
+  if (status === 401 || status === 403) {
+    // API_KEY_INVALID, PERMISSION_DENIED, etc. Both branches below are auth_invalid; only the message differs.
+    if (lower.includes('api key not valid') || lower.includes('api_key_invalid') || lower.includes('api key expired')) {
+      return new ClassifiedProviderError(
+        `Gemini auth invalid (status ${status})`,
+        { kind: 'auth_invalid', cause: input.cause },
+      );
+    }
+    return new ClassifiedProviderError(
+      `Gemini auth error (status ${status})`,
+      { kind: 'auth_invalid', cause: input.cause },
+    );
+  }
+
+  if (status === 400) {
+    return new ClassifiedProviderError(
+      `Gemini bad request (status 400)`,
+      { kind: 'unrecoverable', cause: input.cause },
+    );
+  }
+
+  if (status !== undefined && status >= 500 && status < 600) {
+    return new ClassifiedProviderError(
+      `Gemini upstream error (status ${status})`,
+      { kind: 'transient', cause: input.cause },
+    );
+  }
+
+  // Network errors (no status) — treat as transient.
+  if (status === undefined) {
+    return new ClassifiedProviderError(
+      `Gemini network error: ${input.cause instanceof Error ? input.cause.message : String(input.cause)}`,
+      { kind: 'transient', cause: input.cause },
+    );
+  }
+
+  return new ClassifiedProviderError(
+    `Gemini API error: ${status}${body ? ` - ${body.substring(0, 200)}` : ''}`,
+    { kind: 'unrecoverable', cause: input.cause },
+  );
+}
+
 export type GeminiModel =
  | 'gemini-2.5-flash-lite'
  | 'gemini-2.5-flash'
@@ -346,26 +440,54 @@ export class GeminiProvider {

    await enforceRateLimitForModel(model, rateLimitingEnabled);

-    const response = await fetch(url, {
-      method: 'POST',
-      headers: {
-        'Content-Type': 'application/json',
-      },
-      body: JSON.stringify({
-        contents,
-        generationConfig: {
-          temperature: 0.3,  // Lower temperature for structured extraction
-          maxOutputTokens: 4096,
-        },
-      }),
-    });
+    // Track request-id (best-effort dedup) across retries.
+    let priorRequestId: string | null = null;

-    if (!response.ok) {
-      const error = await response.text();
-      throw new Error(`Gemini API error: ${response.status} - ${error}`);
-    }
+    const data = await withRetry<GeminiResponse>(async (attemptSignal) => {
+      let response: Response;
+      try {
+        response = await fetch(url, {
+          method: 'POST',
+          headers: {
+            'Content-Type': 'application/json',
+            ...(priorRequestId ? { 'x-claude-mem-prior-request-id': priorRequestId } : {}),
+          },
+          body: JSON.stringify({
+            contents,
+            generationConfig: {
+              temperature: 0.3,  // Lower temperature for structured extraction
+              maxOutputTokens: 4096,
+            },
+          }),
+          signal: attemptSignal,
+        });
+      } catch (networkError: unknown) {
+        // Network failures, aborts, DNS, etc.
+        throw classifyGeminiError({
+          cause: networkError,
+        });
+      }

-    const data = await response.json() as GeminiResponse;
+      const requestId = response.headers.get('x-goog-request-id') ?? response.headers.get('x-request-id');
+      if (requestId) {
+        priorRequestId = requestId;
+      } else {
+        logger.debug('SDK', 'Gemini response missing request-id header; retry dedup is best-effort');
+      }
+
+      if (!response.ok) {
+        const errorBody = await response.text();
+        throw classifyGeminiError({
+          status: response.status,
+          bodyText: errorBody,
+          headers: response.headers,
+          cause: new Error(`Gemini API error: ${response.status} - ${errorBody}`),
+          ...(requestId ? { requestId } : {}),
+        });
+      }
+
+      return await response.json() as GeminiResponse;
+    }, { label: `Gemini ${model}` });

    if (!data.candidates?.[0]?.content?.parts?.[0]?.text) {
      logger.error('SDK', 'Empty response from Gemini');
@@ -379,7 +501,7 @@ export class GeminiProvider {
  }

  private getGeminiConfig(): { apiKey: string; model: GeminiModel; rateLimitingEnabled: boolean } {
-    const settingsPath = path.join(homedir(), '.claude-mem', 'settings.json');
+    const settingsPath = paths.settings();
    const settings = SettingsDefaultsManager.loadFromFile(settingsPath);

    const apiKey = settings.CLAUDE_MEM_GEMINI_API_KEY || getCredential('GEMINI_API_KEY') || '';
@@ -414,13 +536,13 @@ export class GeminiProvider {
 }

 export function isGeminiAvailable(): boolean {
-  const settingsPath = path.join(homedir(), '.claude-mem', 'settings.json');
+  const settingsPath = paths.settings();
  const settings = SettingsDefaultsManager.loadFromFile(settingsPath);
  return !!(settings.CLAUDE_MEM_GEMINI_API_KEY || getCredential('GEMINI_API_KEY'));
 }

 export function isGeminiSelected(): boolean {
-  const settingsPath = path.join(homedir(), '.claude-mem', 'settings.json');
+  const settingsPath = paths.settings();
  const settings = SettingsDefaultsManager.loadFromFile(settingsPath);
  return settings.CLAUDE_MEM_PROVIDER === 'gemini';
 }
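Review note: `parseRetryAfterMs` accepts anything `Number()` can coerce, which includes whitespace-only strings and hex or exponent notation. The sketch below is a stricter variant under two assumptions that are not in the PR: the function name `retryAfterToMs` is hypothetical, and the current time is injected as `nowMs` so the HTTP-date branch can be tested deterministically. It accepts only the two forms the header defines: a run of decimal digits (seconds) or a date string.

```typescript
// Hypothetical stricter variant of parseRetryAfterMs (illustration only).
// `nowMs` is injectable so the date branch does not depend on the wall clock.
function retryAfterToMs(value: string | null, nowMs: number = Date.now()): number | undefined {
  if (value === null) return undefined;
  const trimmed = value.trim();
  if (trimmed === '') return undefined;
  // delay-seconds form: non-negative integer written in ASCII digits.
  if (/^\d+$/.test(trimmed)) {
    return Number(trimmed) * 1000;
  }
  // HTTP-date form: clamp dates in the past to zero delay.
  const dateMs = Date.parse(trimmed);
  if (Number.isNaN(dateMs)) return undefined;
  return Math.max(0, dateMs - nowMs);
}
```

For example, `retryAfterToMs('120')` yields `120000`, and a date 30 seconds ahead of `nowMs` yields `30000`.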
@@ -14,9 +14,99 @@ import {
  processAgentResponse,
  type WorkerRef
 } from './agents/index.js';
+import { ClassifiedProviderError } from './provider-errors.js';
+import { withRetry } from './retry.js';

 const OPENROUTER_API_URL = 'https://openrouter.ai/api/v1/chat/completions';

+/**
+ * Parse Retry-After header (seconds or HTTP-date). Returns ms or undefined.
+ */
+function parseRetryAfterMs(value: string | null): number | undefined {
+  if (!value) return undefined;
+  const seconds = Number(value);
+  if (Number.isFinite(seconds) && seconds >= 0) {
+    return Math.floor(seconds * 1000);
+  }
+  const dateMs = Date.parse(value);
+  if (!Number.isNaN(dateMs)) {
+    const delta = dateMs - Date.now();
+    return delta > 0 ? delta : 0;
+  }
+  return undefined;
+}
+
+/**
+ * Classify an OpenRouter fetch failure into ClassifiedProviderError. Called
+ * at the boundary right after `fetch()` returns or throws.
+ */
+export function classifyOpenRouterError(input: {
+  status?: number;
+  bodyText?: string;
+  headers?: Headers | { get(name: string): string | null };
+  cause: unknown;
+  requestId?: string;
+}): ClassifiedProviderError {
+  const status = input.status;
+  const body = input.bodyText ?? '';
+  const lower = body.toLowerCase();
+  const headers = input.headers;
+  const retryAfterMs = headers ? parseRetryAfterMs(headers.get('retry-after')) : undefined;
+
+  // Quota / insufficient credits — body marker takes precedence over status.
+  if (
+    lower.includes('quota exceeded') ||
+    lower.includes('insufficient credits') ||
+    lower.includes('insufficient_quota')
+  ) {
+    return new ClassifiedProviderError(
+      `OpenRouter quota exhausted${status !== undefined ? ` (status ${status})` : ''}`,
+      { kind: 'quota_exhausted', cause: input.cause },
+    );
+  }
+
+  if (status === 429) {
+    return new ClassifiedProviderError(
+      'OpenRouter rate limit (429)',
+      { kind: 'rate_limit', cause: input.cause, ...(retryAfterMs !== undefined ? { retryAfterMs } : {}) },
+    );
+  }
+
+  if (status === 401 || status === 403) {
+    return new ClassifiedProviderError(
+      `OpenRouter auth error (status ${status})`,
+      { kind: 'auth_invalid', cause: input.cause },
+    );
+  }
+
+  if (status === 400 || status === 404) {
+    return new ClassifiedProviderError(
+      `OpenRouter bad request (status ${status})`,
+      { kind: 'unrecoverable', cause: input.cause },
+    );
+  }
+
+  if (status !== undefined && status >= 500 && status < 600) {
+    return new ClassifiedProviderError(
+      `OpenRouter upstream error (status ${status})`,
+      { kind: 'transient', cause: input.cause },
+    );
+  }
+
+  // Network errors (no status) — treat as transient.
+  if (status === undefined) {
+    return new ClassifiedProviderError(
+      `OpenRouter network error: ${input.cause instanceof Error ? input.cause.message : String(input.cause)}`,
+      { kind: 'transient', cause: input.cause },
+    );
+  }
+
+  return new ClassifiedProviderError(
+    `OpenRouter API error: ${status}${body ? ` - ${body.substring(0, 200)}` : ''}`,
+    { kind: 'unrecoverable', cause: input.cause },
+  );
+}
+
 const DEFAULT_MAX_CONTEXT_MESSAGES = 20;  
 const DEFAULT_MAX_ESTIMATED_TOKENS = 100000;  
 const CHARS_PER_TOKEN_ESTIMATE = 4;  
@@ -339,32 +429,64 @@ export class OpenRouterProvider {
      estimatedTokens
    });

-    const response = await fetch(OPENROUTER_API_URL, {
-      method: 'POST',
-      headers: {
-        'Authorization': `Bearer ${apiKey}`,
-        'HTTP-Referer': siteUrl || 'https://github.com/thedotmack/claude-mem',
-        'X-Title': appName || 'claude-mem',
-        'Content-Type': 'application/json',
-      },
-      body: JSON.stringify({
-        model,
-        messages,
-        temperature: 0.3,  // Lower temperature for structured extraction
-        max_tokens: 4096,
-      }),
-    });
+    let priorRequestId: string | null = null;

-    if (!response.ok) {
-      const errorText = await response.text();
-      throw new Error(`OpenRouter API error: ${response.status} - ${errorText}`);
-    }
+    const data = await withRetry<OpenRouterResponse>(async (attemptSignal) => {
+      let response: Response;
+      try {
+        response = await fetch(OPENROUTER_API_URL, {
+          method: 'POST',
+          headers: {
+            'Authorization': `Bearer ${apiKey}`,
+            'HTTP-Referer': siteUrl || 'https://github.com/thedotmack/claude-mem',
+            'X-Title': appName || 'claude-mem',
+            'Content-Type': 'application/json',
+            ...(priorRequestId ? { 'x-claude-mem-prior-request-id': priorRequestId } : {}),
+          },
+          body: JSON.stringify({
+            model,
+            messages,
+            temperature: 0.3,  // Lower temperature for structured extraction
+            max_tokens: 4096,
+          }),
+          signal: attemptSignal,
+        });
+      } catch (networkError: unknown) {
+        throw classifyOpenRouterError({ cause: networkError });
+      }

-    const data = await response.json() as OpenRouterResponse;
+      const requestId = response.headers.get('x-request-id') ?? response.headers.get('x-openrouter-request-id');
+      if (requestId) {
+        priorRequestId = requestId;
+      } else {
+        logger.debug('SDK', 'OpenRouter response missing request-id header; retry dedup is best-effort');
+      }

-    if (data.error) {
-      throw new Error(`OpenRouter API error: ${data.error.code} - ${data.error.message}`);
-    }
+      if (!response.ok) {
+        const errorText = await response.text();
+        throw classifyOpenRouterError({
+          status: response.status,
+          bodyText: errorText,
+          headers: response.headers,
+          cause: new Error(`OpenRouter API error: ${response.status} - ${errorText}`),
+          ...(requestId ? { requestId } : {}),
+        });
+      }
+
+      const responseData = await response.json() as OpenRouterResponse;
+
+      if (responseData.error) {
+        // Per OpenRouter spec, errors can come in 200 responses too.
+        throw classifyOpenRouterError({
+          status: response.status,
+          bodyText: `${responseData.error.code} ${responseData.error.message ?? ''}`,
+          headers: response.headers,
+          cause: new Error(`OpenRouter API error: ${responseData.error.code} - ${responseData.error.message}`),
+        });
+      }
+
+      return responseData;
+    }, { label: `OpenRouter ${model}` });

    if (!data.choices?.[0]?.message?.content) {
      logger.error('SDK', 'Empty response from OpenRouter');
@@ -0,0 +1,223 @@
+/**
+ * Rate limit store — captures `rate_limit` system events emitted by
+ * `@anthropic-ai/claude-agent-sdk`'s `query()` stream.
+ *
+ * The SDK reports the live Claude subscription quota state as `system` events
+ * with subtype `rate_limit`. The payload includes the (currently undocumented)
+ * `rate_limit_info` shape:
+ *
+ *   {
+ *     status: "allowed" | "allowed_warning" | "rejected",
+ *     resetsAt?: number,                              // epoch ms
+ *     rateLimitType?: "five_hour" | "seven_day"
+ *                   | "seven_day_opus" | "seven_day_sonnet"
+ *                   | "overage",
+ *     utilization?: number,                           // 0..1
+ *     overageStatus?: "allowed" | "allowed_warning" | "rejected",
+ *     overageResetsAt?: number,
+ *     isUsingOverage?: boolean,
+ *     surpassedThreshold?: number,
+ *   }
+ *
+ * Pattern adapted from meridian's proxy/rateLimitStore.ts (last-write-wins
+ * per `rateLimitType` bucket, in-memory only). State resets on worker
+ * restart — that's fine, the SDK pushes a fresh event on the next request.
+ *
+ * Quota-aware abort logic gates the worker from continuing to consume a
+ * subscription bucket once it crosses a per-window threshold. API-key
+ * users are exempt because they authorized per-call spend.
+ */
+
+export type RateLimitWindow =
+  | 'five_hour'
+  | 'seven_day'
+  | 'seven_day_opus'
+  | 'seven_day_sonnet'
+  | 'overage';
+
+export interface RateLimitInfo {
+  status?: 'allowed' | 'allowed_warning' | 'rejected';
+  resetsAt?: number;
+  rateLimitType?: RateLimitWindow;
+  utilization?: number;
+  overageStatus?: 'allowed' | 'allowed_warning' | 'rejected';
+  overageResetsAt?: number;
+  isUsingOverage?: boolean;
+  surpassedThreshold?: number;
+}
+
+export interface RateLimitEntry extends RateLimitInfo {
+  observedAt: number;
+}
+
+export type RateLimitBucketKey = RateLimitWindow | 'default';
+
+export class RateLimitStore {
+  private entries = new Map<RateLimitBucketKey, RateLimitEntry>();
+
+  /**
+   * Record a rate-limit info snapshot. Last-write-wins per bucket key.
+   * Expects the inner `rate_limit_info` payload; callers holding a wrapping
+   * event object should unwrap it first. Nullish input is ignored.
+   */
+  set(info: RateLimitInfo | undefined | null): void {
+    if (!info || typeof info !== 'object') return;
+    const key: RateLimitBucketKey = info.rateLimitType ?? 'default';
+    this.entries.set(key, { ...info, observedAt: Date.now() });
+  }
+
+  /** Snapshot a single bucket, or undefined if not yet seen. */
+  get(type: RateLimitWindow | undefined): RateLimitEntry | undefined {
+    if (!type) return this.entries.get('default');
+    return this.entries.get(type);
+  }
+
+  /** All current entries, newest-first by observedAt. */
+  getAll(): RateLimitEntry[] {
+    return Array.from(this.entries.values()).sort(
+      (a, b) => b.observedAt - a.observedAt,
+    );
+  }
+
+  /** Latest snapshot per "interesting" window for health surface. */
+  getMostRecentByWindow(): {
+    five_hour?: RateLimitEntry;
+    seven_day?: RateLimitEntry;
+    seven_day_opus?: RateLimitEntry;
+    seven_day_sonnet?: RateLimitEntry;
+    overage?: RateLimitEntry;
+  } {
+    return {
+      five_hour: this.entries.get('five_hour'),
+      seven_day: this.entries.get('seven_day'),
+      seven_day_opus: this.entries.get('seven_day_opus'),
+      seven_day_sonnet: this.entries.get('seven_day_sonnet'),
+      overage: this.entries.get('overage'),
+    };
+  }
+
+  get size(): number {
+    return this.entries.size;
+  }
+
+  /** Drop all entries — used by tests for isolation. */
+  clear(): void {
+    this.entries.clear();
+  }
+}
+
+/** Process-wide singleton. */
+export const globalRateLimitStore = new RateLimitStore();
+
+/**
+ * Per-window utilization thresholds for subscription users (cli/oauth).
+ * Crossing one of these aborts the SDK loop so we don't burn through the
+ * window on background memory work and starve interactive sessions.
+ */
+const UTILIZATION_THRESHOLDS: Record<RateLimitWindow, number> = {
+  five_hour: 0.95,
+  seven_day_opus: 0.93,
+  seven_day_sonnet: 0.92,
+  seven_day: 0.93,
+  overage: 0.95,
+};
+
+/** Reset-window grace: bail early if a window resets within this many ms. */
+const RESET_GRACE_MS = 15 * 60 * 1000; // 15 minutes
+/** Utilization floor before the reset-grace check kicks in. */
+const RESET_GRACE_UTILIZATION_FLOOR = 0.85;
+
+/**
+ * Decide whether to abort SDK consumption based on the latest rate-limit
+ * snapshot and the active auth method.
+ *
+ * - `api_key` (or any string starting with "API key"): never abort —
+ *   per-call billing means the user already authorized the spend.
+ * - `cli` / OAuth / subscription: per-window utilization thresholds plus a
+ *   reset-grace buffer so we avoid burning the last few percent right
+ *   before a window resets.
+ */
+export function shouldAbortForQuota(
+  authMethod: string,
+  store: RateLimitStore,
+  now: number = Date.now(),
+): { abort: boolean; reason?: string; window?: RateLimitWindow } {
+  // API-key users authorized per-call spend; the quota guard applies to
+  // subscription buckets only.
+  if (isApiKeyAuth(authMethod)) {
+    return { abort: false };
+  }
+
+  const windows: RateLimitWindow[] = [
+    'five_hour',
+    'seven_day_opus',
+    'seven_day_sonnet',
+    'seven_day',
+    'overage',
+  ];
+
+  for (const window of windows) {
+    const entry = store.get(window);
+    if (!entry) continue;
+
+    const util = entry.utilization;
+    const threshold = UTILIZATION_THRESHOLDS[window];
+
+    // Provider-side rejection trumps utilization heuristics. A snapshot with
+    // status='rejected' (or overageStatus='rejected' on the overage window)
+    // means the provider has already declared the bucket exhausted; we must
+    // stop regardless of whether utilization is reported.
+    const isRejected =
+      entry.status === 'rejected' ||
+      (window === 'overage' && entry.overageStatus === 'rejected');
+
+    if (isRejected) {
+      return {
+        abort: true,
+        window,
+        reason: `quota:${window} rejected by provider`,
+      };
+    }
+
+    if (typeof util === 'number' && util >= threshold) {
+      return {
+        abort: true,
+        window,
+        reason: `quota:${window} utilization ${(util * 100).toFixed(1)}% >= ${(threshold * 100).toFixed(0)}%`,
+      };
+    }
+
+    // Reset-grace buffer: only meaningful for the rolling 5h window where
+    // a fresh bucket is imminent. Skip when utilization is low — no point
+    // bailing on a window that just reset to ~0%.
+    if (
+      window === 'five_hour' &&
+      typeof entry.resetsAt === 'number' &&
+      typeof util === 'number' &&
+      util >= RESET_GRACE_UTILIZATION_FLOOR
+    ) {
+      const msUntilReset = entry.resetsAt - now;
+      if (msUntilReset > 0 && msUntilReset <= RESET_GRACE_MS) {
+        return {
+          abort: true,
+          window,
+          reason: `quota:${window} resets in ${Math.round(msUntilReset / 60000)}m (grace buffer ${RESET_GRACE_MS / 60000}m, util ${(util * 100).toFixed(1)}%)`,
+        };
+      }
+    }
+  }
+
+  return { abort: false };
+}
+
+/**
+ * Detects API-key auth from a free-form auth-method label. Matches the
+ * verbose strings produced by `getAuthMethodDescription()` (e.g.
+ * "API key (from ~/.claude-mem/.env)") as well as concise tokens like
+ * "api_key".
+ */
+export function isApiKeyAuth(authMethod: string): boolean {
+  if (!authMethod) return false;
+  const normalized = authMethod.toLowerCase();
+  return normalized.startsWith('api key') || normalized === 'api_key';
+}
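The decision order in `shouldAbortForQuota` (provider rejection first, then the utilization threshold, then the five-hour reset-grace check) can be sketched in isolation. This is a reduced stand-in, not the module above: `Snapshot`, `decide`, and the two-window `THRESHOLDS` table are hypothetical names, and only the threshold values and grace constants are taken from the diff.

```typescript
// Reduced sketch of the quota gate; names here are illustrative, not the real exports.
type QuotaWindow = 'five_hour' | 'seven_day';

interface Snapshot {
  status?: 'allowed' | 'allowed_warning' | 'rejected';
  utilization?: number; // 0..1
  resetsAt?: number;    // epoch ms
}

const THRESHOLDS: Record<QuotaWindow, number> = { five_hour: 0.95, seven_day: 0.93 };
const RESET_GRACE_MS = 15 * 60 * 1000;
const RESET_GRACE_FLOOR = 0.85;

function decide(
  snapshots: Partial<Record<QuotaWindow, Snapshot>>,
  now: number,
): { abort: boolean; window?: QuotaWindow } {
  const order: QuotaWindow[] = ['five_hour', 'seven_day'];
  for (const win of order) {
    const entry = snapshots[win];
    if (!entry) continue;

    // 1. A provider-side rejection wins even when utilization is missing.
    if (entry.status === 'rejected') return { abort: true, window: win };

    // 2. Utilization at or above the per-window threshold.
    const util = entry.utilization;
    if (typeof util === 'number' && util >= THRESHOLDS[win]) {
      return { abort: true, window: win };
    }

    // 3. Reset-grace: five-hour window only, and only when already fairly full.
    if (
      win === 'five_hour' &&
      typeof util === 'number' &&
      typeof entry.resetsAt === 'number' &&
      util >= RESET_GRACE_FLOOR
    ) {
      const msUntilReset = entry.resetsAt - now;
      if (msUntilReset > 0 && msUntilReset <= RESET_GRACE_MS) {
        return { abort: true, window: win };
      }
    }
  }
  return { abort: false };
}
```

Passing `now` explicitly, as the real function does through its defaulted third parameter, keeps the grace-window branch testable without mocking the clock.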
@@ -173,9 +173,15 @@ async function syncAndBroadcastObservations(
  agentName: string,
  projectRoot?: string
 ): Promise<void> {
-  for (let i = 0; i < observations.length; i++) {
-    const obsId = result.observationIds[i];
-    const obs = observations[i];
+  // Dedupe observation IDs before sync/broadcast: storeObservations may collapse
+  // multiple parsed observations onto the same row via content_hash, producing
+  // duplicate IDs. Syncing them 1:1 triggers repeated Chroma "IDs already exist"
+  // reconciles. See issue #2240.
+  const uniqueObservationIds = [...new Set(result.observationIds)];
+
+  for (const obsId of uniqueObservationIds) {
+    const observationIndex = result.observationIds.indexOf(obsId);
+    const obs = observations[observationIndex];
    const chromaStart = Date.now();

    dbManager.getChromaSync()?.syncObservation(
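The loop in this hunk looks up each unique ID with `indexOf`, which rescans `observationIds` once per ID. A possible alternative, assuming the two arrays stay index-aligned as in the diff and that IDs are numeric, records the first index of each ID in a single pass. `pairFirstOccurrences` is a hypothetical helper, not part of the PR.

```typescript
// Hypothetical helper: pair each distinct ID with the observation at its first index.
function pairFirstOccurrences<T>(ids: number[], observations: T[]): Array<[number, T]> {
  const seen = new Set<number>();
  const pairs: Array<[number, T]> = [];
  ids.forEach((id, index) => {
    if (seen.has(id)) return; // later duplicates collapse onto the first row
    seen.add(id);
    const obs = observations[index];
    if (obs === undefined) return; // arrays out of step; skip rather than sync garbage
    pairs.push([id, obs]);
  });
  return pairs;
}

const pairs = pairFirstOccurrences([7, 7, 9], ['a', 'b', 'c']);
console.log(JSON.stringify(pairs)); // [[7,"a"],[9,"c"]]
```

For the small batches a single response produces, the `indexOf` version is unlikely to matter in practice; the single-pass form mainly makes the "first occurrence wins" rule explicit.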
@@ -4,8 +4,7 @@ import { z } from 'zod';
 import path from 'path';
 import { readFileSync, statSync, existsSync } from 'fs';
 import { logger } from '../../../../utils/logger.js';
-import { homedir } from 'os';
-import { getPackageRoot } from '../../../../shared/paths.js';
+import { getPackageRoot, paths } from '../../../../shared/paths.js';
 import { getWorkerPort } from '../../../../shared/worker-utils.js';
 import { PaginationHelper } from '../../PaginationHelper.js';
 import { DatabaseManager } from '../../DatabaseManager.js';
@@ -17,6 +16,7 @@ import { validateBody } from '../middleware/validateBody.js';
 import { normalizePlatformSource } from '../../../../shared/platform-source.js';
 import { getObservationsByFilePath } from '../../../sqlite/observations/get.js';
 import { getFirstObservationCreatedAt } from '../../../sqlite/observations/recent.js';
+import { getUptimeSeconds } from '../../../../shared/uptime.js';

 const integerArrayLike = z.preprocess((value) => {
  if (Array.isArray(value)) return value;
@@ -215,13 +215,13 @@ export class DataRoutes extends BaseRouteHandler {
    const totalSummaries = db.prepare('SELECT COUNT(*) as count FROM session_summaries').get() as { count: number };
    const firstObservationAt = getFirstObservationCreatedAt(db);

-    const dbPath = path.join(homedir(), '.claude-mem', 'claude-mem.db');
+    const dbPath = paths.database();
    let dbSize = 0;
    if (existsSync(dbPath)) {
      dbSize = statSync(dbPath).size;
    }

-    const uptime = Math.floor((Date.now() - this.startTime) / 1000);
+    const uptime = getUptimeSeconds(this.startTime);
    const activeSessions = this.sessionManager.getActiveSessionCount();
    const sseClients = this.sseBroadcaster.getClientCount();

@@ -20,6 +20,7 @@ import { getProjectContext } from '../../../../utils/project-name.js';
 import { normalizePlatformSource } from '../../../../shared/platform-source.js';
 import { handleGeneratorExit } from '../../session/GeneratorExitHandler.js';
 import { SessionCompletionHandler } from '../../session/SessionCompletionHandler.js';
+import { getUptimeSeconds } from '../../../../shared/uptime.js';

 const MAX_USER_PROMPT_BYTES = 256 * 1024;

@@ -322,7 +323,7 @@ export class SessionRoutes extends BaseRouteHandler {
      sessionDbId,
      queueLength,
      summaryStored: session.lastSummaryStored ?? null,
-      uptime: Date.now() - session.startTime
+      uptime: getUptimeSeconds(session.startTime)
    });
  });

@@ -3,8 +3,7 @@ import express, { Request, Response } from 'express';
 import { z } from 'zod';
 import path from 'path';
 import { readFileSync, writeFileSync, existsSync, renameSync, mkdirSync } from 'fs';
-import { homedir } from 'os';
-import { getPackageRoot } from '../../../../shared/paths.js';
+import { getPackageRoot, paths } from '../../../../shared/paths.js';
 import { logger } from '../../../../utils/logger.js';
 import { SettingsManager } from '../../SettingsManager.js';
 import { getBranchInfo, switchBranch, pullUpdates } from '../../BranchManager.js';
@@ -47,7 +46,7 @@ export class SettingsRoutes extends BaseRouteHandler {
  }

  private handleGetSettings = this.wrapHandler((req: Request, res: Response): void => {
-    const settingsPath = path.join(homedir(), '.claude-mem', 'settings.json');
+    const settingsPath = paths.settings();
    this.ensureSettingsFile(settingsPath);
    const settings = SettingsDefaultsManager.loadFromFile(settingsPath);
    res.json(settings);
@@ -63,7 +62,7 @@ export class SettingsRoutes extends BaseRouteHandler {
      return;
    }

-    const settingsPath = path.join(homedir(), '.claude-mem', 'settings.json');
+    const settingsPath = paths.settings();
    this.ensureSettingsFile(settingsPath);
    let settings: any = {};

@@ -76,7 +75,7 @@ export class SettingsRoutes extends BaseRouteHandler {
        logger.error('HTTP', 'Failed to parse settings file', { settingsPath }, normalizedParseError);
        res.status(500).json({
          success: false,
-          error: 'Settings file is corrupted. Delete ~/.claude-mem/settings.json to reset.'
+          error: `Settings file is corrupted. Delete ${settingsPath} to reset.`
        });
        return;
      }
@@ -1,11 +1,11 @@

 import * as fs from 'node:fs';
 import * as path from 'node:path';
-import * as os from 'node:os';
 import { logger } from '../../../utils/logger.js';
+import { paths } from '../../../shared/paths.js';
 import type { CorpusFile, CorpusStats } from './types.js';

-const CORPORA_DIR = path.join(os.homedir(), '.claude-mem', 'corpora');
+const CORPORA_DIR = paths.corpora();

 export class CorpusStore {
  private readonly corporaDir: string;
@@ -1,12 +1,12 @@

-import { execSync } from 'child_process';
 import { CorpusStore } from './CorpusStore.js';
 import { CorpusRenderer } from './CorpusRenderer.js';
 import type { CorpusFile, QueryResult } from './types.js';
 import { logger } from '../../../utils/logger.js';
 import { SettingsDefaultsManager } from '../../../shared/SettingsDefaultsManager.js';
 import { USER_SETTINGS_PATH, OBSERVER_SESSIONS_DIR, ensureDir } from '../../../shared/paths.js';
-import { buildIsolatedEnv } from '../../../shared/EnvManager.js';
+import { buildIsolatedEnvWithFreshOAuth } from '../../../shared/EnvManager.js';
+import { findClaudeExecutable } from '../../../shared/find-claude-executable.js';
 import { sanitizeEnv } from '../../../supervisor/env-sanitizer.js';

 // @ts-ignore - Agent SDK types may not be available
@@ -50,8 +50,8 @@ export class KnowledgeAgent {
    ].join('\n');

    ensureDir(OBSERVER_SESSIONS_DIR);
-    const claudePath = this.findClaudeExecutable();
-    const isolatedEnv = sanitizeEnv(buildIsolatedEnv());
+    const claudePath = findClaudeExecutable('WORKER');
+    const isolatedEnv = sanitizeEnv(await buildIsolatedEnvWithFreshOAuth());

    const queryResult = query({
      prompt: primePrompt,
@@ -145,8 +145,8 @@ export class KnowledgeAgent {

  private async executeQuery(corpus: CorpusFile, question: string): Promise<QueryResult> {
    ensureDir(OBSERVER_SESSIONS_DIR);
-    const claudePath = this.findClaudeExecutable();
-    const isolatedEnv = sanitizeEnv(buildIsolatedEnv());
+    const claudePath = findClaudeExecutable('WORKER');
+    const isolatedEnv = sanitizeEnv(await buildIsolatedEnvWithFreshOAuth());

    const queryResult = query({
      prompt: question,
@@ -196,41 +196,4 @@ export class KnowledgeAgent {
    return settings.CLAUDE_MEM_MODEL;
  }

-  private findClaudeExecutable(): string {
-    const settings = SettingsDefaultsManager.loadFromFile(USER_SETTINGS_PATH);
-
-    if (settings.CLAUDE_CODE_PATH) {
-      const { existsSync } = require('fs');
-      if (!existsSync(settings.CLAUDE_CODE_PATH)) {
-        throw new Error(`CLAUDE_CODE_PATH is set to "${settings.CLAUDE_CODE_PATH}" but the file does not exist.`);
-      }
-      return settings.CLAUDE_CODE_PATH;
-    }
-
-    if (process.platform === 'win32') {
-      try {
-        execSync('where claude.cmd', { encoding: 'utf8', windowsHide: true, stdio: ['ignore', 'pipe', 'ignore'] });
-        return 'claude.cmd';
-      } catch {
-        // Fall through to generic detection
-      }
-    }
-
-    try {
-      const claudePath = execSync(
-        process.platform === 'win32' ? 'where claude' : 'which claude',
-        { encoding: 'utf8', windowsHide: true, stdio: ['ignore', 'pipe', 'ignore'] }
-      ).trim().split('\n')[0].trim();
-
-      if (claudePath) return claudePath;
-    } catch (error) {
-      if (error instanceof Error) {
-        logger.debug('WORKER', 'Claude executable auto-detection failed', {}, error);
-      } else {
-        logger.debug('WORKER', 'Claude executable auto-detection failed (non-Error thrown)', { thrownValue: String(error) });
-      }
-    }
-
-    throw new Error('Claude executable not found. Please either:\n1. Add "claude" to your system PATH, or\n2. Set CLAUDE_CODE_PATH in ~/.claude-mem/settings.json');
-  }
 }
@@ -0,0 +1,32 @@
+// F4 foundation: classified provider errors with extensible kind field.
+export type ProviderErrorClass =
+  | 'transient'
+  | 'unrecoverable'
+  | 'rate_limit'
+  | 'quota_exhausted'
+  | 'auth_invalid'
+  | (string & {}); // open union: providers may emit custom kinds
+
+export class ClassifiedProviderError extends Error {
+  readonly kind: ProviderErrorClass;
+  readonly retryAfterMs?: number;
+  readonly cause: unknown;
+
+  constructor(message: string, opts: {
+    kind: ProviderErrorClass;
+    cause: unknown;
+    retryAfterMs?: number;
+  }) {
+    super(message);
+    this.name = 'ClassifiedProviderError';
+    this.kind = opts.kind;
+    this.cause = opts.cause;
+    if (opts.retryAfterMs !== undefined) {
+      this.retryAfterMs = opts.retryAfterMs;
+    }
+  }
+}
+
+export function isClassified(err: unknown): err is ClassifiedProviderError {
+  return err instanceof ClassifiedProviderError;
+}
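A caller might branch on `kind` roughly as follows, including a provider-specific value admitted by the open union. This is a self-contained sketch: `SketchProviderError` is a trimmed stand-in for `ClassifiedProviderError`, and the `'model_overloaded'` kind and the `describe` helper are invented for illustration.

```typescript
// Trimmed stand-in for the error class in the diff; only `kind` is modelled.
type ErrorKind = 'transient' | 'unrecoverable' | 'rate_limit' | (string & {});

class SketchProviderError extends Error {
  readonly kind: ErrorKind;
  constructor(message: string, kind: ErrorKind) {
    super(message);
    // Keeps instanceof working if the code is compiled to an ES5 target.
    Object.setPrototypeOf(this, new.target.prototype);
    this.name = 'SketchProviderError';
    this.kind = kind;
  }
}

function describe(err: unknown): string {
  if (!(err instanceof SketchProviderError)) return 'unclassified';
  switch (err.kind) {
    case 'transient':
    case 'rate_limit':
      return 'retry';
    case 'unrecoverable':
      return 'give up';
    default:
      // Custom kinds land here; `(string & {})` keeps the literal members
      // visible to autocomplete while still accepting any string.
      return `custom:${err.kind}`;
  }
}

console.log(describe(new SketchProviderError('upstream busy', 'model_overloaded')));
```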
@@ -0,0 +1,130 @@
+/**
+ * Retry helper that consumes ClassifiedProviderError.kind to decide whether to
+ * retry. Pattern adapted from open-agent-sdk's retry.ts (MIT) — exponential
+ * backoff with jitter, but driven by classified error kinds, not raw HTTP
+ * status codes.
+ *
+ * Used by GeminiProvider + OpenRouterProvider for fetch retries. Cap retries
+ * at 2 because POSTs to these APIs aren't strictly idempotent; we honor a
+ * provider-supplied request-id (best-effort) for dedup.
+ */
+
+import { ClassifiedProviderError, isClassified } from './provider-errors.js';
+import { logger } from '../../utils/logger.js';
+
+export interface RetryOptions {
+  /** Maximum retry attempts (in addition to the initial attempt). Cap=2 by default for non-idempotent POSTs. */
+  maxRetries?: number;
+  /** Per-attempt timeout in ms. Default 30s. */
+  perAttemptTimeoutMs?: number;
+  /** Base delay used for exponential backoff. Default 100ms. */
+  baseDelayMs?: number;
+  /** Cap for backoff delay. Default 30s. */
+  maxDelayMs?: number;
+  /** Tag for logging. */
+  label?: string;
+  /** External abort signal. */
+  abortSignal?: AbortSignal;
+}
+
+const DEFAULT_OPTIONS: Required<Omit<RetryOptions, 'label' | 'abortSignal'>> = {
+  maxRetries: 2,
+  perAttemptTimeoutMs: 30_000,
+  baseDelayMs: 100,
+  maxDelayMs: 30_000,
+};
+
+/** Returns true if an error is worth retrying: any unclassified error, or a classified `transient` / `rate_limit` kind. */
+export function isRetryableKind(err: unknown): boolean {
+  if (!isClassified(err)) {
+    // Unclassified errors are treated as transient (preserve old default).
+    return true;
+  }
+  return err.kind === 'transient' || err.kind === 'rate_limit';
+}
+
+/** Compute backoff delay: baseDelayMs * 2^attempt plus up to 50 ms of random jitter, capped at maxDelayMs. */
+export function computeBackoffMs(attempt: number, opts: { baseDelayMs: number; maxDelayMs: number }): number {
+  const exponential = opts.baseDelayMs * Math.pow(2, attempt);
+  const jitter = Math.random() * 50;
+  return Math.min(exponential + jitter, opts.maxDelayMs);
+}
+
+/**
+ * Run `fn` with retry. `fn` receives an AbortSignal scoped to the current
+ * attempt's timeout. The classified error from `fn` (if any) drives the
+ * retry/no-retry decision. Honors `retryAfterMs` for rate_limit kind.
+ */
+export async function withRetry<T>(
+  fn: (attemptSignal: AbortSignal) => Promise<T>,
+  options: RetryOptions = {},
+): Promise<T> {
+  const opts = { ...DEFAULT_OPTIONS, ...options };
+  let lastError: unknown;
+
+  for (let attempt = 0; attempt <= opts.maxRetries; attempt++) {
+    if (options.abortSignal?.aborted) {
+      throw new Error('Aborted');
+    }
+
+    // Per-attempt timeout via AbortController. Forward external aborts too.
+    const attemptController = new AbortController();
+    const timeoutHandle = setTimeout(() => attemptController.abort(), opts.perAttemptTimeoutMs);
+    const onExternalAbort = () => attemptController.abort();
+    options.abortSignal?.addEventListener('abort', onExternalAbort, { once: true });
+
+    try {
+      return await fn(attemptController.signal);
+    } catch (err: unknown) {
+      lastError = err;
+
+      if (!isRetryableKind(err)) {
+        throw err;
+      }
+
+      if (attempt === opts.maxRetries) {
+        throw err;
+      }
+
+      // Honor retryAfterMs from rate_limit errors; otherwise exponential backoff.
+      let delayMs: number;
+      if (isClassified(err) && err.kind === 'rate_limit' && err.retryAfterMs !== undefined) {
+        delayMs = err.retryAfterMs;
+      } else {
+        delayMs = computeBackoffMs(attempt, { baseDelayMs: opts.baseDelayMs, maxDelayMs: opts.maxDelayMs });
+      }
+
+      const errMsg = err instanceof Error ? err.message : String(err);
+      logger.warn('SDK', `Retrying ${opts.label ?? 'fetch'} after ${Math.round(delayMs)}ms (attempt ${attempt + 1}/${opts.maxRetries})`, {
+        kind: isClassified(err) ? err.kind : 'unclassified',
+        message: errMsg.substring(0, 200),
+      });
+      // Abort-aware sleep: an external abort during backoff should exit
+      // immediately instead of waiting out the full delay.
+      await new Promise<void>((resolve, reject) => {
+        const signal = options.abortSignal;
+        if (signal?.aborted) {
+          reject(new Error('Aborted'));
+          return;
+        }
+        const timer = setTimeout(() => {
+          signal?.removeEventListener('abort', onAbort);
+          resolve();
+        }, delayMs);
+        const onAbort = () => {
+          clearTimeout(timer);
+          reject(new Error('Aborted'));
+        };
+        signal?.addEventListener('abort', onAbort, { once: true });
+      });
+    } finally {
+      clearTimeout(timeoutHandle);
+      options.abortSignal?.removeEventListener('abort', onExternalAbort);
+    }
+  }
+
+  // Reachable only if opts.maxRetries < 0 (loop never executed). The success
+  // and exhaustion paths both return/throw inside the loop. This guards
+  // pathological inputs and satisfies TypeScript's return-type exhaustiveness.
+  throw lastError ?? new Error('withRetry exited without an attempt (maxRetries < 0)');
+}
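With the defaults (`baseDelayMs` 100, `maxRetries` 2), the deterministic part of the backoff is 100 ms after the first failure and 200 ms after the second; a third failure is rethrown without sleeping. A small sketch of that arithmetic, where `backoffFloorMs` is a hypothetical helper that mirrors `computeBackoffMs` minus the random term:

```typescript
// Hypothetical helper: computeBackoffMs without the random jitter term.
function backoffFloorMs(attempt: number, baseDelayMs = 100, maxDelayMs = 30_000): number {
  return Math.min(baseDelayMs * Math.pow(2, attempt), maxDelayMs);
}

// Delays slept between attempts when every attempt fails (maxRetries = 2).
const schedule = [0, 1].map((attempt) => backoffFloorMs(attempt));
console.log(schedule); // [ 100, 200 ]
```

The real delay adds up to 50 ms of jitter on top of these values, and a `rate_limit` error carrying `retryAfterMs` bypasses this schedule entirely.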
@@ -1,15 +1,22 @@

 import { existsSync, readFileSync, writeFileSync, mkdirSync, chmodSync } from 'fs';
-import { join, dirname } from 'path';
-import { homedir } from 'os';
 import { logger } from '../utils/logger.js';
+import { paths } from './paths.js';
+import {
+  readClaudeOAuthToken,
+  writeStaleMarker,
+  clearStaleMarker,
+  type OAuthTokenResult,
+} from './oauth-token.js';

-const DATA_DIR = join(homedir(), '.claude-mem');
-export const ENV_FILE_PATH = join(DATA_DIR, '.env');
+export const ENV_FILE_PATH = paths.envFile();

 const BLOCKED_ENV_VARS = [
-  'ANTHROPIC_API_KEY',  // Issue #733: Prevent auto-discovery from project .env files
-  'CLAUDECODE',         // Prevent "cannot be launched inside another Claude Code session" error
+  'ANTHROPIC_API_KEY',       // Issue #733: Prevent auto-discovery from project .env files
+  'CLAUDECODE',              // Prevent "cannot be launched inside another Claude Code session" error
+  'CLAUDE_CODE_OAUTH_TOKEN', // Issue #2215: prevent stale parent-process token from leaking into
+                             // isolated env. The fresh token is read from the keychain at spawn
+                             // time by buildIsolatedEnvWithFreshOAuth().
 ];

 export interface ClaudeMemEnv {
@@ -89,10 +96,10 @@ export function loadClaudeMemEnv(): ClaudeMemEnv {
 export function saveClaudeMemEnv(env: ClaudeMemEnv): void {
  let existing: Record<string, string> = {};
  try {
-    if (!existsSync(DATA_DIR)) {
-      mkdirSync(DATA_DIR, { recursive: true, mode: 0o700 });
+    if (!existsSync(paths.dataDir())) {
+      mkdirSync(paths.dataDir(), { recursive: true, mode: 0o700 });
    }
-    chmodSync(DATA_DIR, 0o700);
+    chmodSync(paths.dataDir(), 0o700);

    existing = existsSync(ENV_FILE_PATH)
      ? parseEnvFile(readFileSync(ENV_FILE_PATH, 'utf-8'))
@@ -171,9 +178,89 @@ export function buildIsolatedEnv(includeCredentials: boolean = true): Record<str
      isolatedEnv.OPENROUTER_API_KEY = credentials.OPENROUTER_API_KEY;
    }

-    if (!isolatedEnv.ANTHROPIC_API_KEY && process.env.CLAUDE_CODE_OAUTH_TOKEN) {
-      isolatedEnv.CLAUDE_CODE_OAUTH_TOKEN = process.env.CLAUDE_CODE_OAUTH_TOKEN;
-    }
+    // Note: CLAUDE_CODE_OAUTH_TOKEN is intentionally NOT copied from
+    // process.env here. OAuth tokens have refresh semantics that this
+    // sync path cannot model — copying a parent-process token captured
+    // at startup means injecting a stale token days later (issue #2215).
+    // Use buildIsolatedEnvWithFreshOAuth() for spawn-time injection.
+  }
+
+  return isolatedEnv;
+}
+
+/**
+ * Async variant of buildIsolatedEnv() that reads the OAuth token from the
+ * platform-native credential store at the moment of spawn. Use this at SDK
+ * spawn-time so the worker subprocess always gets a fresh token.
+ *
+ * Behavior per OAuthTokenResult:
+ *   - present: inject as CLAUDE_CODE_OAUTH_TOKEN env var, clear stale marker.
+ *   - expired: do NOT inject. Log re-login message. Write stale marker so
+ *     the session-start hook can surface the message to the user.
+ *   - absent: proceed without the token. Worker may fall back to
+ *     ANTHROPIC_API_KEY or other auth.
+ *
+ * Issue #2215: this replaces the old "copy CLAUDE_CODE_OAUTH_TOKEN from
+ * process.env" path which silently injected stale tokens.
+ */
+export async function buildIsolatedEnvWithFreshOAuth(
+  includeCredentials: boolean = true,
+): Promise<Record<string, string>> {
+  const isolatedEnv = buildIsolatedEnv(includeCredentials);
+
+  // Defensive: ensure no parent-process OAuth token survives this path even
+  // if BLOCKED_ENV_VARS is bypassed. Issue #2215.
+  delete isolatedEnv.CLAUDE_CODE_OAUTH_TOKEN;
+
+  if (!includeCredentials) return isolatedEnv;
+
+  // If the user already configured an ANTHROPIC_API_KEY in ~/.claude-mem/.env,
+  // honor that and skip OAuth lookup entirely. API key auth is preferred when
+  // explicitly configured because it's stateless and stable.
+  if (isolatedEnv.ANTHROPIC_API_KEY) {
+    clearStaleMarker();
+    return isolatedEnv;
+  }
+
+  let result: OAuthTokenResult;
+  try {
+    result = await readClaudeOAuthToken();
+  } catch (error) {
+    logger.warn(
+      'OAUTH',
+      'OAuth token read failed unexpectedly; proceeding without token',
+      {},
+      error instanceof Error ? error : new Error(String(error)),
+    );
+    return isolatedEnv;
+  }
+
+  switch (result.kind) {
+    case 'present':
+      isolatedEnv.CLAUDE_CODE_OAUTH_TOKEN = result.token;
+      logger.info('OAUTH', 'Injected fresh CLAUDE_CODE_OAUTH_TOKEN at spawn-time', {
+        source: result.source,
+        expiresAt: result.expiresAt,
+      });
+      clearStaleMarker();
+      break;
+    case 'expired':
+      logger.warn(
+        'OAUTH',
+        `Refusing to inject expired CLAUDE_CODE_OAUTH_TOKEN: ${result.reason}. Re-login via Claude Desktop to refresh.`,
+        { expiresAt: result.expiresAt },
+      );
+      writeStaleMarker(result.reason);
+      break;
+    case 'absent':
+      logger.debug('OAUTH', `No OAuth token available: ${result.reason}`);
+      // Token is absent — any prior stale-marker would have been written
+      // when the token was expired, but is no longer accurate now that the
+      // token is gone. Clear it so the session-start hook stops surfacing
+      // a stale "expired token, re-login" warning (CodeRabbit review on PR
+      // #2282).
+      clearStaleMarker();
+      break;
  }

  return isolatedEnv;
@@ -193,8 +280,11 @@ export function getAuthMethodDescription(): string {
  if (hasAnthropicApiKey()) {
    return 'API key (from ~/.claude-mem/.env)';
  }
+  // Note: this is a quick sync hint for logging — the authoritative OAuth
+  // path is buildIsolatedEnvWithFreshOAuth() which reads the keychain at
+  // spawn time. process.env may or may not carry a token here.
  if (process.env.CLAUDE_CODE_OAUTH_TOKEN) {
-    return 'Claude Code OAuth token (from parent process)';
+    return 'Claude Code OAuth token (env, refreshed via keychain at spawn)';
  }
-  return 'Claude Code CLI (subscription billing)';
+  return 'Claude Code OAuth token (read from system keychain at spawn)';
 }
@@ -0,0 +1,166 @@
+/**
+ * Shared Claude executable discovery and validation.
+ *
+ * Used by SDKAgent and KnowledgeAgent to locate a working Claude Code CLI.
+ * Validates candidates with `--version` to distinguish the real CLI from
+ * the desktop-app .exe (which exists on disk but can't run headless).
+ *
+ * Closes #2222.
+ */
+
+import { execSync, execFileSync } from 'child_process';
+import { existsSync } from 'fs';
+import { SettingsDefaultsManager } from './SettingsDefaultsManager.js';
+import { USER_SETTINGS_PATH } from './paths.js';
+import { logger } from '../utils/logger.js';
+
+/** How long to wait for `claude --version` before giving up (ms). */
+const VERSION_CHECK_TIMEOUT_MS = 3_000;
+
+/**
+ * Returns true if the path looks like a Windows desktop-app installation
+ * (AppData or Program Files) rather than a CLI installed via npm/volta/etc.
+ */
+function looksLikeDesktopAppPath(candidatePath: string): boolean {
+  const normalized = candidatePath.replace(/\\/g, '/').toLowerCase();
+  return (
+    normalized.includes('appdata') ||
+    normalized.includes('program files') ||
+    normalized.includes('program files (x86)')
+  );
+}
+
+/**
+ * Run `<candidate> --version` and return the trimmed stdout, or null on failure.
+ * Failures include: timeout, non-zero exit, missing binary, etc.
+ *
+ * Uses execFileSync (not execSync) so the candidate path is passed as a
+ * separate argument and never interpreted by a shell. This prevents shell
+ * injection if the path contains characters like `"`, `;`, `&` — reachable
+ * on Windows via a crafted CLAUDE_CODE_PATH in settings.json.
+ */
+function verifyClaudeVersion(candidate: string): string | null {
+  try {
+    const versionOutput = execFileSync(candidate, ['--version'], {
+      encoding: 'utf8',
+      timeout: VERSION_CHECK_TIMEOUT_MS,
+      windowsHide: true,
+      stdio: ['ignore', 'pipe', 'ignore'],
+    }).trim();
+    return versionOutput || null;
+  } catch {
+    return null;
+  }
+}
+
+/**
+ * Find and validate a Claude Code CLI executable.
+ *
+ * Discovery order:
+ *   1. `CLAUDE_CODE_PATH` from settings.json (explicit user override)
+ *   2. `claude.cmd` via PATH on Windows (avoids spawn issues with spaces)
+ *   3. `which claude` / `where claude` auto-detection
+ *
+ * Every candidate is validated with `--version` (3 s timeout) before being
+ * accepted. A PATH candidate that fails `--version` is skipped with a warning;
+ * a configured CLAUDE_CODE_PATH that fails throws immediately. Desktop-app
+ * executables get an actionable message explaining how to install the CLI.
+ *
+ * @param logComponent  Logger component tag (e.g. 'SDK', 'WORKER')
+ * @throws {Error} when no valid Claude CLI can be found
+ */
+export function findClaudeExecutable(logComponent: string = 'SDK'): string {
+  const settings = SettingsDefaultsManager.loadFromFile(USER_SETTINGS_PATH);
+
+  // --- 1. Explicit configured path ----------------------------------------
+  if (settings.CLAUDE_CODE_PATH) {
+    if (!existsSync(settings.CLAUDE_CODE_PATH)) {
+      throw new Error(
+        `CLAUDE_CODE_PATH is set to "${settings.CLAUDE_CODE_PATH}" but the file does not exist.`
+      );
+    }
+
+    const version = verifyClaudeVersion(settings.CLAUDE_CODE_PATH);
+    if (!version) {
+      const isDesktopApp = looksLikeDesktopAppPath(settings.CLAUDE_CODE_PATH);
+      if (isDesktopApp) {
+        throw new Error(
+          `Found desktop app at "${settings.CLAUDE_CODE_PATH}" but it doesn't support headless mode. ` +
+          `Install Claude Code CLI: npm install -g @anthropic-ai/claude-code`
+        );
+      }
+      throw new Error(
+        `CLAUDE_CODE_PATH is set to "${settings.CLAUDE_CODE_PATH}" but it failed the --version check. ` +
+        `Ensure this is a working Claude Code CLI binary.`
+      );
+    }
+    logger.debug(logComponent, `Using configured CLAUDE_CODE_PATH: ${settings.CLAUDE_CODE_PATH} (${version})`);
+    return settings.CLAUDE_CODE_PATH;
+  }
+
+  // --- 2. Windows: prefer claude.cmd via PATH ------------------------------
+  if (process.platform === 'win32') {
+    try {
+      execSync('where claude.cmd', {
+        encoding: 'utf8',
+        windowsHide: true,
+        stdio: ['ignore', 'pipe', 'ignore'],
+      });
+      // claude.cmd is a wrapper — verify it can actually produce --version
+      const version = verifyClaudeVersion('claude.cmd');
+      if (version) {
+        logger.debug(logComponent, `Using claude.cmd from PATH (${version})`);
+        return 'claude.cmd';
+      }
+      logger.warn(logComponent, 'claude.cmd found in PATH but failed --version check, trying next candidate');
+    } catch {
+      // Fall through to generic detection
+    }
+  }
+
+  // --- 3. Auto-detection via which/where -----------------------------------
+  try {
+    const rawOutput = execSync(
+      process.platform === 'win32' ? 'where claude' : 'which claude',
+      { encoding: 'utf8', windowsHide: true, stdio: ['ignore', 'pipe', 'ignore'] }
+    ).trim();
+
+    // `where` on Windows can return multiple lines; try each candidate
+    const candidates = rawOutput.split('\n').map((line) => line.trim()).filter(Boolean);
+
+    for (const candidate of candidates) {
+      const version = verifyClaudeVersion(candidate);
+      if (version) {
+        logger.debug(logComponent, `Auto-detected Claude CLI: ${candidate} (${version})`);
+        return candidate;
+      }
+
+      // Candidate exists but doesn't respond to --version
+      if (looksLikeDesktopAppPath(candidate)) {
+        logger.warn(
+          logComponent,
+          `Skipping desktop app at "${candidate}" — it doesn't support headless mode. ` +
+          `Install Claude Code CLI: npm install -g @anthropic-ai/claude-code`
+        );
+      } else {
+        logger.warn(
+          logComponent,
+          `Skipping "${candidate}" — failed --version check`
+        );
+      }
+    }
+  } catch (error) {
+    // [ANTI-PATTERN IGNORED]: Fallback behavior — which/where failed, continue to throw clear error
+    if (error instanceof Error) {
+      logger.debug(logComponent, 'Claude executable auto-detection failed', {}, error);
+    } else {
+      logger.debug(logComponent, 'Claude executable auto-detection failed with non-Error', {}, new Error(String(error)));
+    }
+  }
+
+  throw new Error(
+    'Claude executable not found. Please either:\n' +
+    '1. Add "claude" to your system PATH, or\n' +
+    '2. Set CLAUDE_CODE_PATH in ~/.claude-mem/settings.json'
+  );
+}
@@ -0,0 +1,362 @@
+/**
+ * Read Claude Desktop's OAuth token from the platform-native credential store
+ * at worker spawn-time. This avoids the staleness problem of persisting tokens
+ * in EnvManager's allowlist — keychain entries are always current because
+ * Claude Desktop refreshes them in place.
+ *
+ * Issue #2215: do NOT add CLAUDE_CODE_OAUTH_TOKEN to the persisted-key list
+ * without expiry handling. OAuth tokens expire and refresh; stale tokens
+ * injected days later cause 401s.
+ */
+
+import { execFile, type ExecFileException } from 'child_process';
+import { promisify } from 'util';
+import { existsSync, readFileSync, writeFileSync, mkdirSync, unlinkSync } from 'fs';
+import { userInfo } from 'os';
+import { join } from 'path';
+import { paths } from './paths.js';
+import { logger } from '../utils/logger.js';
+
+const execFileAsync = promisify(execFile);
+
+const KEYCHAIN_SERVICE_NAME = 'Claude Code-credentials';
+const READ_TIMEOUT_MS = 5000;
+
+// Grace window: even if expiresAt is in the past by less than this, allow the
+// token through. Claude Desktop typically refreshes shortly before expiry, so
+// a small grace covers clock skew and refresh-in-progress windows.
+const EXPIRY_GRACE_MS = 60_000;
+
+export type OAuthTokenResult =
+  | { kind: 'present'; token: string; source: 'keychain' | 'env-fallback'; expiresAt?: number }
+  | { kind: 'expired'; reason: string; expiresAt?: number }
+  | { kind: 'absent'; reason: string };
+
+interface ClaudeKeychainPayload {
+  claudeAiOauth?: {
+    accessToken?: string;
+    refreshToken?: string;
+    expiresAt?: number;
+    scopes?: string[];
+  };
+}
+
+/**
+ * Decode a JWT's `exp` claim if the token looks like a JWT. Returns
+ * milliseconds since epoch. Returns undefined if the token isn't a JWT or
+ * doesn't carry an `exp` claim.
+ */
+export function decodeJwtExpMs(token: string): number | undefined {
+  const parts = token.split('.');
+  if (parts.length !== 3) return undefined;
+  try {
+    const payloadB64 = parts[1].replace(/-/g, '+').replace(/_/g, '/');
+    const payload = JSON.parse(Buffer.from(payloadB64, 'base64').toString('utf-8'));
+    if (typeof payload.exp === 'number') {
+      // JWT exp is seconds since epoch; normalize to ms.
+      return payload.exp * 1000;
+    }
+  } catch {
+    return undefined;
+  }
+  return undefined;
+}
+
+/**
+ * Determine whether `expiresAtMs` indicates an expired token, allowing for a
+ * small grace window for clock skew and in-flight refreshes.
+ */
+function isExpired(expiresAtMs: number | undefined): boolean {
+  if (expiresAtMs === undefined) return false;
+  return expiresAtMs + EXPIRY_GRACE_MS < Date.now();
+}
+
+/**
+ * macOS: read the JSON blob stored under "Claude Code-credentials" service in
+ * the user's login keychain. The blob looks like:
+ *   {"claudeAiOauth":{"accessToken":"...","refreshToken":"...","expiresAt":<ms>}}
+ */
+async function readMacOsKeychain(): Promise<OAuthTokenResult> {
+  const account = userInfo().username;
+  try {
+    const { stdout } = await execFileAsync(
+      'security',
+      ['find-generic-password', '-s', KEYCHAIN_SERVICE_NAME, '-a', account, '-w'],
+      { timeout: READ_TIMEOUT_MS, windowsHide: true },
+    );
+    const raw = stdout.trim();
+    if (!raw) {
+      return { kind: 'absent', reason: 'macOS keychain returned empty value for "Claude Code-credentials"' };
+    }
+    return parseKeychainPayload(raw);
+  } catch (error) {
+    const err = error as ExecFileException;
+    // `security` exits non-zero when the entry doesn't exist — fail-fast as absent.
+    return {
+      kind: 'absent',
+      reason: `macOS keychain lookup failed for service "${KEYCHAIN_SERVICE_NAME}" (account=${account}): ${err.message ?? String(err)}`,
+    };
+  }
+}
+
+/**
+ * Windows: Credential Manager (DPAPI). Claude Desktop on Windows stores
+ * OAuth credentials under a target like "Claude Code:credentials" via the
+ * Wincred API. We read it from PowerShell by P/Invoking CredRead directly.
+ *
+ * Note: `cmdkey /list` exposes target names but not secrets. Reading the
+ * secret requires PowerShell + the CredentialManager module OR the Win32
+ * CredRead API. We use a PowerShell snippet that calls CredRead for the
+ * common target name patterns Claude Desktop is known to use.
+ */
+async function readWindowsCredentialManager(): Promise<OAuthTokenResult> {
+  // PowerShell snippet enumerates likely target names and prints the JSON blob.
+  // The exact target name on Windows varies by Claude Desktop version: the bare
+  // service ("Claude Code-credentials"), "Claude Code:credentials", or the
+  // `${service}:${account}` form. The script tries each and prints the first hit.
+  // Username is escaped with PowerShell's single-quote convention (' → '') in
+  // case future Windows versions or domain-joined machines permit ' in usernames.
+  const psSafeUsername = userInfo().username.replace(/'/g, "''");
+  const psScript = `
+    $ErrorActionPreference = 'SilentlyContinue'
+    $candidates = @('Claude Code-credentials', 'Claude Code:credentials', 'Claude Code-credentials:${psSafeUsername}')
+    Add-Type -Namespace ClaudeMem -Name CredRead -MemberDefinition @"
+      [DllImport("Advapi32.dll", SetLastError=true, CharSet=CharSet.Unicode)]
+      public static extern bool CredRead(string target, uint type, uint reservedFlag, out IntPtr CredentialPtr);
+      [DllImport("Advapi32.dll", SetLastError=true)]
+      public static extern void CredFree(IntPtr cred);
+      [StructLayout(LayoutKind.Sequential, CharSet=CharSet.Unicode)]
+      public struct CREDENTIAL {
+        public uint Flags; public uint Type; public string TargetName; public string Comment;
+        public System.Runtime.InteropServices.ComTypes.FILETIME LastWritten;
+        public uint CredentialBlobSize; public IntPtr CredentialBlob;
+        public uint Persist; public uint AttributeCount; public IntPtr Attributes;
+        public string TargetAlias; public string UserName;
+      }
+"@ -ErrorAction SilentlyContinue
+    foreach ($t in $candidates) {
+      $ptr = [IntPtr]::Zero
+      $ok = [ClaudeMem.CredRead]::CredRead($t, 1, 0, [ref]$ptr)
+      if ($ok) {
+        $cred = [System.Runtime.InteropServices.Marshal]::PtrToStructure($ptr, [Type][ClaudeMem.CredRead+CREDENTIAL])
+        $bytes = New-Object byte[] $cred.CredentialBlobSize
+        [System.Runtime.InteropServices.Marshal]::Copy($cred.CredentialBlob, $bytes, 0, $cred.CredentialBlobSize)
+        [ClaudeMem.CredRead]::CredFree($ptr) | Out-Null
+        [System.Text.Encoding]::Unicode.GetString($bytes)
+        exit 0
+      }
+    }
+    exit 1
+  `.trim();
+
+  try {
+    const { stdout } = await execFileAsync(
+      'powershell.exe',
+      ['-NoProfile', '-NonInteractive', '-Command', psScript],
+      { timeout: READ_TIMEOUT_MS, windowsHide: true },
+    );
+    const raw = stdout.trim();
+    if (!raw) {
+      return { kind: 'absent', reason: 'Windows Credential Manager has no entry for "Claude Code-credentials"' };
+    }
+    return parseKeychainPayload(raw);
+  } catch (error) {
+    const err = error as ExecFileException;
+    return {
+      kind: 'absent',
+      reason: `Windows Credential Manager read failed: ${err.message ?? String(err)}`,
+    };
+  }
+}
+
+/**
+ * Linux: libsecret via the `secret-tool` CLI. Claude Desktop on Linux stores
+ * the credential under the same service name "Claude Code-credentials" with
+ * the account attribute set to the OS username.
+ */
+async function readLinuxLibsecret(): Promise<OAuthTokenResult> {
+  const account = userInfo().username;
+  try {
+    const { stdout } = await execFileAsync(
+      'secret-tool',
+      ['lookup', 'service', KEYCHAIN_SERVICE_NAME, 'account', account],
+      { timeout: READ_TIMEOUT_MS, windowsHide: true },
+    );
+    const raw = stdout.trim();
+    if (!raw) {
+      return { kind: 'absent', reason: 'Linux libsecret returned empty value for "Claude Code-credentials"' };
+    }
+    return parseKeychainPayload(raw);
+  } catch (error) {
+    const err = error as ExecFileException;
+    return {
+      kind: 'absent',
+      reason: `Linux libsecret lookup failed (is secret-tool installed?): ${err.message ?? String(err)}`,
+    };
+  }
+}
+
+/**
+ * The keychain payload Claude Desktop writes is a JSON blob. Parse it, extract
+ * the access token, and classify based on `expiresAt`.
+ */
+function parseKeychainPayload(raw: string): OAuthTokenResult {
+  let payload: ClaudeKeychainPayload;
+  try {
+    payload = JSON.parse(raw);
+  } catch {
+    // Some Claude Desktop versions might store a bare token instead of JSON.
+    if (raw.startsWith('sk-ant-') || raw.split('.').length === 3) {
+      const expFromJwt = decodeJwtExpMs(raw);
+      if (isExpired(expFromJwt)) {
+        return {
+          kind: 'expired',
+          reason: 'Bare keychain token has expired JWT exp claim',
+          expiresAt: expFromJwt,
+        };
+      }
+      return { kind: 'present', token: raw, source: 'keychain', expiresAt: expFromJwt };
+    }
+    return { kind: 'absent', reason: 'Keychain payload is neither JSON nor a recognized token shape' };
+  }
+
+  const accessToken = payload.claudeAiOauth?.accessToken;
+  const expiresAt = payload.claudeAiOauth?.expiresAt;
+
+  if (!accessToken) {
+    return { kind: 'absent', reason: 'Keychain payload has no claudeAiOauth.accessToken field' };
+  }
+
+  // Prefer the payload's own expiresAt; fall back to JWT exp if present.
+  const effectiveExpiresAt = expiresAt ?? decodeJwtExpMs(accessToken);
+
+  if (isExpired(effectiveExpiresAt)) {
+    return {
+      kind: 'expired',
+      reason: 'Claude Desktop OAuth token has expired — re-login via Claude Desktop to refresh',
+      expiresAt: effectiveExpiresAt,
+    };
+  }
+
+  return { kind: 'present', token: accessToken, source: 'keychain', expiresAt: effectiveExpiresAt };
+}
+
+/**
+ * Sidecar metadata file: when a fallback token is provided via env (CI, headless,
+ * keychain-blocked environments), a sibling JSON file at
+ * `${DATA_DIR}/oauth-token-meta.json` may carry the token's expiresAt timestamp.
+ * This lets us refuse stale-token injection in environments where keychain
+ * access is blocked.
+ */
+function readSidecarExpiresAt(): number | undefined {
+  const sidecarPath = join(paths.dataDir(), 'oauth-token-meta.json');
+  if (!existsSync(sidecarPath)) return undefined;
+  try {
+    const raw = readFileSync(sidecarPath, 'utf-8');
+    const parsed = JSON.parse(raw);
+    if (typeof parsed.expiresAt === 'number') return parsed.expiresAt;
+  } catch {
+    // Malformed sidecar — treat as absent and let fall-through happen.
+  }
+  return undefined;
+}
+
+/**
+ * Read Claude Desktop's OAuth token, preferring the platform-native credential
+ * store. Falls back to the CLAUDE_CODE_OAUTH_TOKEN environment variable only
+ * when the keychain yields no token (missing entry, read failure, or unsupported
+ * platform). The fallback targets CI/headless setups where no keychain exists.
+ */
+export async function readClaudeOAuthToken(): Promise<OAuthTokenResult> {
+  let keychainResult: OAuthTokenResult;
+
+  switch (process.platform) {
+    case 'darwin':
+      keychainResult = await readMacOsKeychain();
+      break;
+    case 'win32':
+      keychainResult = await readWindowsCredentialManager();
+      break;
+    case 'linux':
+      keychainResult = await readLinuxLibsecret();
+      break;
+    default:
+      keychainResult = {
+        kind: 'absent',
+        reason: `Unsupported platform: ${process.platform}`,
+      };
+  }
+
+  // If keychain produced a present or expired result, that's authoritative.
+  // Expired wins over env-fallback: a known-stale keychain entry is a clearer
+  // signal than an env var of unknown freshness.
+  if (keychainResult.kind === 'present' || keychainResult.kind === 'expired') {
+    return keychainResult;
+  }
+
+  // Keychain absent: try env-fallback for CI/headless. Refuse if the sidecar
+  // metadata indicates the env-provided token is stale.
+  const envToken = process.env.CLAUDE_CODE_OAUTH_TOKEN;
+  if (envToken && envToken.trim().length > 0) {
+    const sidecarExpiresAt = readSidecarExpiresAt();
+    const jwtExpiresAt = decodeJwtExpMs(envToken);
+    const effectiveExpiresAt = sidecarExpiresAt ?? jwtExpiresAt;
+
+    if (isExpired(effectiveExpiresAt)) {
+      return {
+        kind: 'expired',
+        reason: 'CLAUDE_CODE_OAUTH_TOKEN env var expired (per sidecar/JWT) — re-login via Claude Desktop',
+        expiresAt: effectiveExpiresAt,
+      };
+    }
+
+    return {
+      kind: 'present',
+      token: envToken,
+      source: 'env-fallback',
+      expiresAt: effectiveExpiresAt,
+    };
+  }
+
+  return keychainResult;
+}
+
+/**
+ * Marker file pattern: when a recent spawn returned `expired`, write a marker
+ * at `${DATA_DIR}/oauth-stale.marker` so the session-start hook can surface a
+ * clear "re-login via Claude Desktop" message to the user. The marker is
+ * cleared once the token is refreshed and a `present` result is observed.
+ */
+export function writeStaleMarker(reason: string): void {
+  try {
+    const dir = paths.dataDir();
+    if (!existsSync(dir)) mkdirSync(dir, { recursive: true, mode: 0o700 });
+    const markerPath = join(dir, 'oauth-stale.marker');
+    writeFileSync(markerPath, reason, { encoding: 'utf-8', mode: 0o600 });
+  } catch (error) {
+    logger.warn('OAUTH', 'Failed to write oauth-stale marker', {}, error instanceof Error ? error : new Error(String(error)));
+  }
+}
+
+export function clearStaleMarker(): void {
+  try {
+    const markerPath = join(paths.dataDir(), 'oauth-stale.marker');
+    if (existsSync(markerPath)) {
+      unlinkSync(markerPath);
+    }
+  } catch {
+    // Best-effort: if we can't clear the marker, the session-start hook will
+    // surface a stale message even though the token is actually fresh. The
+    // next spawn that sees a present or absent token will retry the clear.
+  }
+}
+
+export function readStaleMarker(): string | undefined {
+  try {
+    const markerPath = join(paths.dataDir(), 'oauth-stale.marker');
+    if (!existsSync(markerPath)) return undefined;
+    return readFileSync(markerPath, 'utf-8');
+  } catch {
+    return undefined;
+  }
+}
@@ -1,6 +1,6 @@
 import { join, dirname, basename, sep } from 'path';
 import { homedir } from 'os';
-import { existsSync, mkdirSync } from 'fs';
+import { existsSync, mkdirSync, readFileSync } from 'fs';
 import { execSync } from 'child_process';
 import { fileURLToPath } from 'url';
 import { SettingsDefaultsManager } from './SettingsDefaultsManager.js';
@@ -24,7 +24,6 @@ function resolveDataDir(): string {
  const settingsPath = join(defaultDataDir, 'settings.json');
  try {
    if (existsSync(settingsPath)) {
-      const { readFileSync } = require('fs');
      const raw = JSON.parse(readFileSync(settingsPath, 'utf-8'));
      const settings = raw.env ?? raw; 
      if (settings.CLAUDE_MEM_DATA_DIR) {
@@ -126,3 +125,24 @@ export function createBackupFilename(originalPath: string): string {

  return `${originalPath}.backup.${timestamp}`;
 }
+
+export const paths = {
+  dataDir: () => DATA_DIR,
+  workerPid: () => join(DATA_DIR, 'worker.pid'),
+  settings: () => join(DATA_DIR, 'settings.json'),
+  database: () => join(DATA_DIR, 'claude-mem.db'),
+  chroma: () => join(DATA_DIR, 'chroma'),
+  combinedCerts: () => join(DATA_DIR, 'combined_certs.pem'),
+  transcriptsConfig: () => join(DATA_DIR, 'transcript-watch.json'),
+  transcriptsState: () => join(DATA_DIR, 'transcript-watch-state.json'),
+  corpora: () => join(DATA_DIR, 'corpora'),
+  supervisorRegistry: () => join(DATA_DIR, 'supervisor.json'),
+  envFile: () => join(DATA_DIR, '.env'),
+  logsDir: () => LOGS_DIR,
+  archives: () => ARCHIVES_DIR,
+  trash: () => TRASH_DIR,
+  backups: () => BACKUPS_DIR,
+  modes: () => MODES_DIR,
+  vectorDb: () => VECTOR_DB_DIR,
+  observerSessions: () => OBSERVER_SESSIONS_DIR,
+} as const;
@@ -0,0 +1,12 @@
+// F1 foundation: spawn wrapper that hides child windows on Windows by default. See src/shared/spawn.ts.test.ts for invariant.
+import { spawn, type SpawnOptions, type ChildProcess } from 'node:child_process';
+
+export type SpawnHiddenOptions = SpawnOptions;
+
+export function spawnHidden(
+  command: string,
+  args?: readonly string[],
+  options?: SpawnOptions
+): ChildProcess {
+  return spawn(command, args ?? [], { windowsHide: true, ...options });
+}
@@ -60,47 +60,84 @@ function extractLastMessageFromGeminiTranscript(
  return '';
 }

-function extractLastMessageFromJsonl(
+/**
+ * Extract last message from a JSONL transcript.
+ *
+ * Supports two field conventions for the per-line role marker:
+ * - Claude Code:  `{"type":"assistant",...}`
+ * - Cursor:       `{"role":"assistant",...}`
+ *
+ * The most recent assistant turn is often a pure tool_use block with no text
+ * content (especially in Cursor, where the agent's last action before the
+ * user replies is a tool call). We therefore keep scanning backwards until
+ * we find a turn with non-empty text content, instead of returning early on
+ * the first matching role.
+ */
+export function extractLastMessageFromJsonl(
  content: string,
  role: 'user' | 'assistant',
  stripSystemReminders: boolean
 ): string {
  const lines = content.split('\n');
  let foundMatchingRole = false;
+  let lastEmptyText: string | null = null;

  for (let i = lines.length - 1; i >= 0; i--) {
-    const line = JSON.parse(lines[i]);
-    if (line.type === role) {
-      foundMatchingRole = true;
+    const rawLine = lines[i];
+    if (!rawLine) continue;
+    // Tolerate truncated/malformed JSONL lines (crash mid-write, partial flush).
+    // A bad line shouldn't crash the summarization pipeline — skip and move on.
+    let line: any;
+    try {
+      line = JSON.parse(rawLine);
+    } catch {
+      continue;
+    }
+    const lineRole = line.type ?? line.role;
+    if (lineRole !== role) continue;
+    foundMatchingRole = true;

-      if (line.message?.content) {
-        let text = '';
-        const msgContent = line.message.content;
+    if (!line.message?.content) continue;

-        if (typeof msgContent === 'string') {
-          text = msgContent;
-        } else if (Array.isArray(msgContent)) {
-          text = msgContent
-            .filter((c: any) => c.type === 'text')
-            .map((c: any) => c.text)
-            .join('\n');
-        } else {
-          throw new Error(`Unknown message content format in transcript. Type: ${typeof msgContent}`);
-        }
+    let text = '';
+    const msgContent = line.message.content;
+    if (typeof msgContent === 'string') {
+      text = msgContent;
+    } else if (Array.isArray(msgContent)) {
+      text = msgContent
+        .filter(
+          (c: any): c is { type: 'text'; text: string } =>
+            !!c && typeof c === 'object' && c.type === 'text' && typeof c.text === 'string'
+        )
+        .map((c) => c.text)
+        .join('\n');
+    } else {
+      // Unknown content shape (null, number, plain object, etc.) — skip rather
+      // than throw. A single weird line should not crash the entire summary
+      // pipeline; we already tolerate malformed JSONL via the parse-catch
+      // above, and this is the same class of defensive forward compat
+      // (CodeRabbit / Greptile review on PR #2282).
+      continue;
+    }

-        if (stripSystemReminders) {
-          text = text.replace(SYSTEM_REMINDER_REGEX, '');
-          text = text.replace(/\n{3,}/g, '\n\n').trim();
-        }
+    if (stripSystemReminders) {
+      text = text.replace(SYSTEM_REMINDER_REGEX, '');
+      text = text.replace(/\n{3,}/g, '\n\n').trim();
+    }

-        return text;
-      }
+    if (text && text.trim()) {
+      return text;
+    }
+    // Remember the most recent empty-text turn as a fallback. If every earlier
+    // matching turn is also empty (tool-only turns), that value is returned
+    // below instead of a hard-coded ''.
+    if (lastEmptyText === null) {
+      lastEmptyText = text;
    }
  }

  if (!foundMatchingRole) {
    return '';
  }
-
-  return '';
+  return lastEmptyText ?? '';
 }
@@ -0,0 +1,7 @@
+// F3 foundation: derive uptime in seconds from a start timestamp in ms.
+// Clamps to >= 0 so a future startedAtMs or a non-monotonic clock skew doesn't
+// surface negative uptime to health/status endpoints
+// (CodeRabbit review on PR #2282).
+export function getUptimeSeconds(startedAtMs: number, now: () => number = Date.now): number {
+  return Math.max(0, Math.floor((now() - startedAtMs) / 1000));
+}
@@ -1,6 +1,7 @@
 import path from "path";
 import { readFileSync, existsSync, writeFileSync, renameSync, mkdirSync } from "fs";
-import { spawn, execSync } from "child_process";
+import { execSync } from "child_process";
+import { spawnHidden } from "./spawn.js";
 import { logger } from "../utils/logger.js";
 import { HOOK_TIMEOUTS, HOOK_EXIT_CODES, getTimeout } from "./hook-constants.js";
 import { SettingsDefaultsManager } from "./SettingsDefaultsManager.js";
@@ -241,7 +242,7 @@ export async function ensureWorkerRunning(): Promise<boolean> {
  logger.info('SYSTEM', 'Worker not running — lazy-spawning', { runtimePath, scriptPath });

  try {
-    const proc = spawn(runtimePath, [scriptPath, '--daemon'], {
+    const proc = spawnHidden(runtimePath, [scriptPath, '--daemon'], {
      detached: true,
      stdio: ['ignore', 'ignore', 'ignore'],
    });
@@ -1,6 +1,4 @@
 import { existsSync, readFileSync, rmSync } from 'fs';
-import { homedir } from 'os';
-import path from 'path';
 import { logger } from '../utils/logger.js';
 import {
  getProcessRegistry,
@@ -11,9 +9,9 @@ import {
 } from './process-registry.js';
 import { runShutdownCascade } from './shutdown.js';
 import { startHealthChecker, stopHealthChecker } from './health-checker.js';
+import { paths } from '../shared/paths.js';

-const DATA_DIR = path.join(homedir(), '.claude-mem');
-const PID_FILE = path.join(DATA_DIR, 'worker.pid');
+const PID_FILE = paths.workerPid();

 interface ValidateWorkerPidOptions {
  logAlive?: boolean;
@@ -1,15 +1,15 @@
-import { ChildProcess, spawn, spawnSync } from 'child_process';
+import { ChildProcess, spawnSync } from 'child_process';
+import { spawnHidden } from '../shared/spawn.js';
 import { existsSync, mkdirSync, readFileSync, writeFileSync } from 'fs';
-import { homedir } from 'os';
 import path from 'path';
 import { logger } from '../utils/logger.js';
 import { sanitizeEnv } from './env-sanitizer.js';
+import { paths } from '../shared/paths.js';

 const REAP_SESSION_SIGTERM_TIMEOUT_MS = 5_000;
 const REAP_SESSION_SIGKILL_TIMEOUT_MS = 1_000;

-const DATA_DIR = path.join(homedir(), '.claude-mem');
-const DEFAULT_REGISTRY_PATH = path.join(DATA_DIR, 'supervisor.json');
+const DEFAULT_REGISTRY_PATH = paths.supervisorRegistry();

 export interface ManagedProcessInfo {
  pid: number;
@@ -511,7 +511,7 @@ export function spawnSdkProcess(

  const isWin = process.platform === 'win32';
  const child = useCmdWrapper
-    ? spawn('cmd.exe', ['/d', '/c', options.command, ...filteredArgs], {
+    ? spawnHidden('cmd.exe', ['/d', '/c', options.command, ...filteredArgs], {
        cwd: options.cwd,
        env,
        detached: !isWin,
@@ -519,7 +519,7 @@ export function spawnSdkProcess(
        signal: options.signal,
        windowsHide: true,
      })
-    : spawn(options.command, filteredArgs, {
+    : spawnHidden(options.command, filteredArgs, {
        cwd: options.cwd,
        env,
        detached: !isWin,
@@ -1,15 +1,13 @@
 import { execFile } from 'child_process';
 import { rmSync } from 'fs';
-import { homedir } from 'os';
-import path from 'path';
 import { promisify } from 'util';
 import { logger } from '../utils/logger.js';
 import { HOOK_TIMEOUTS } from '../shared/hook-constants.js';
 import { isPidAlive, type ManagedProcessRecord, type ProcessRegistry } from './process-registry.js';
+import { paths } from '../shared/paths.js';

 const execFileAsync = promisify(execFile);
-const DATA_DIR = path.join(homedir(), '.claude-mem');
-const PID_FILE = path.join(DATA_DIR, 'worker.pid');
+const PID_FILE = paths.workerPid();

 type TreeKillFn = (pid: number, signal?: string, callback?: (error?: Error | null) => void) => void;

@@ -118,20 +116,31 @@ async function signalProcess(record: ManagedProcessRecord, signal: 'SIGTERM' | '
  const { pid, pgid } = record;

  if (process.platform !== 'win32') {
-    try {
-      if (typeof pgid === 'number') {
+    // Try the process group first when we have one — it reaches grandchildren
+    // re-parented to init. If the group is already gone (ESRCH) the actual
+    // root pid may still be alive (e.g. it survived its own group teardown);
+    // fall through to the per-pid signal so shutdown isn't a no-op
+    // (CodeRabbit review on PR #2282).
+    if (typeof pgid === 'number') {
+      try {
        process.kill(-pgid, signal);
-      } else {
-        process.kill(pid, signal);
-      }
-    } catch (error: unknown) {
-      if (error instanceof Error) {
-        const errno = (error as NodeJS.ErrnoException).code;
-        if (errno === 'ESRCH') {
-          return;
+        return;
+      } catch (error: unknown) {
+        const errno = error instanceof Error ? (error as NodeJS.ErrnoException).code : undefined;
+        if (errno !== 'ESRCH') {
+          throw error;
        }
+        // ESRCH on the group — fall through and try the bare pid below.
+      }
+    }
+
+    try {
+      process.kill(pid, signal);
+    } catch (error: unknown) {
+      const errno = error instanceof Error ? (error as NodeJS.ErrnoException).code : undefined;
+      if (errno !== 'ESRCH') {
+        throw error;
      }
-      throw error;
    }
    return;
  }
@@ -1,13 +1,13 @@

 import { existsSync, readFileSync, writeFileSync, renameSync } from 'fs';
 import path from 'path';
-import os from 'os';
 import { logger } from './logger.js';
 import { formatDate, groupByDate } from '../shared/timeline-formatting.js';
 import { SettingsDefaultsManager } from '../shared/SettingsDefaultsManager.js';
 import { workerHttpRequest } from '../shared/worker-utils.js';
+import { paths } from '../shared/paths.js';

-const SETTINGS_PATH = path.join(os.homedir(), '.claude-mem', 'settings.json');
+const SETTINGS_PATH = paths.settings();

 const CLAUDE_MD_FILENAME = 'CLAUDE.md';

@@ -1,7 +1,7 @@

 import { appendFileSync, existsSync, mkdirSync, readFileSync } from 'fs';
 import { join } from 'path';
-import { homedir } from 'os';
+import { paths } from '../shared/paths.js';

 export enum LogLevel {
  DEBUG = 0,
@@ -29,6 +29,7 @@ export type Component =
  | 'HTTP'
  | 'IMPORT'
  | 'INGEST'
+  | 'OAUTH'
  | 'OPENCLAW'
  | 'OPENCODE'
  | 'PARSER'
@@ -55,7 +56,6 @@ interface LogContext {
  [key: string]: any;
 }

-const DEFAULT_DATA_DIR = join(homedir(), '.claude-mem');

 class Logger {
  private level: LogLevel | null = null;
@@ -73,7 +73,7 @@ class Logger {
    this.logFileInitialized = true;

    try {
-      const logsDir = join(DEFAULT_DATA_DIR, 'logs');
+      const logsDir = paths.logsDir();

      if (!existsSync(logsDir)) {
        mkdirSync(logsDir, { recursive: true });
@@ -90,7 +90,7 @@ class Logger {
  private getLevel(): LogLevel {
    if (this.level === null) {
      try {
-        const settingsPath = join(DEFAULT_DATA_DIR, 'settings.json');
+        const settingsPath = paths.settings();
        if (existsSync(settingsPath)) {
          const settingsData = readFileSync(settingsPath, 'utf-8');
          const settings = JSON.parse(settingsData);
@@ -0,0 +1,5 @@
+{"role":"user","message":{"content":[{"type":"text","text":"please list the files in src/"}]}}
+{"role":"assistant","message":{"content":[{"type":"text","text":"I'll list the files now."}]}}
+{"role":"user","message":{"content":[{"type":"text","text":"thanks, also tell me what you found"}]}}
+{"role":"assistant","message":{"content":[{"type":"tool_use","name":"Shell","input":{"command":"ls src/"}}]}}
+{"role":"assistant","message":{"content":[{"type":"text","text":"Here are the files: adapters, handlers, types."}]}}
@@ -275,7 +275,11 @@ describe('GeminiProvider', () => {

    global.fetch = mock(() => Promise.resolve(new Response('Invalid argument', { status: 400 })));

-    await expect(agent.startSession(session)).rejects.toThrow('Gemini API error: 400 - Invalid argument');
+    // F4 classifyGeminiError surfaces 400 as a classified `unrecoverable` error
+    // with a stable message ("Gemini bad request (status 400)") rather than
+    // forwarding the raw upstream body. The original cause is preserved on
+    // `.cause` for diagnostics — see classifyGeminiError in GeminiProvider.ts.
+    await expect(agent.startSession(session)).rejects.toThrow('Gemini bad request (status 400)');
  });

  it('should respect rate limits when rate limiting enabled', async () => {
@@ -51,6 +51,16 @@ describe('GracefulShutdown', () => {
  });

  describe('performGracefulShutdown', () => {
+    // Timeout bumped to 15s. performGracefulShutdown calls
+    // getSupervisor().stop() which runs runShutdownCascade against the real
+    // ~/.claude-mem/supervisor.json registry. If the developer has a live
+    // worker + chroma-mcp registered, the cascade SIGTERMs/SIGKILLs them
+    // and waits up to ~5–6s for them to exit, which sails past the default
+    // 5000ms test timeout. The other shutdown tests below are unaffected
+    // because they don't register an mcpClient/dbManager/chromaMcpManager
+    // mock that exercises the same path. This is test-infrastructure debt
+    // — the test interacts with the production supervisor singleton — not
+    // a code regression in the shutdown flow itself.
    it('should call shutdown steps in correct order', async () => {
      const callOrder: string[] = [];

@@ -115,7 +125,7 @@ describe('GracefulShutdown', () => {
      expect(callOrder.indexOf('mcpClient.close')).toBeLessThan(callOrder.indexOf('dbManager.close'));

      expect(callOrder.indexOf('chromaMcpManager.stop')).toBeLessThan(callOrder.indexOf('dbManager.close'));
-    });
+    }, 15000);

    it('should remove PID file during shutdown', async () => {
      const mockSessionManager: ShutdownableService = {
@@ -154,3 +154,82 @@ describe('parseAgentXml — observations', () => {
    expect(result[0].files_modified).toEqual(['src/utils.ts']);
  });
 });
+
+describe('parseAgentXml — fence tolerance (#2233 Part A)', () => {
+  it('parses plain XML input correctly (no fence)', () => {
+    const xml = `<observation>
+      <type>discovery</type>
+      <title>Plain XML input</title>
+      <narrative>No fence wrapper present.</narrative>
+    </observation>`;
+
+    const result = parseAgentXml(xml);
+    expect(result.valid).toBe(true);
+    if (!result.valid) return;
+    expect(result.observations).toHaveLength(1);
+    expect(result.observations[0].title).toBe('Plain XML input');
+  });
+
+  it('parses fenced XML with language tag (```xml ... ```)', () => {
+    const xml = '```xml\n<observation>\n  <type>discovery</type>\n  <title>Fenced with lang</title>\n  <narrative>Wrapped in xml-tagged code fence.</narrative>\n</observation>\n```';
+
+    const result = parseAgentXml(xml);
+    expect(result.valid).toBe(true);
+    if (!result.valid) return;
+    expect(result.observations).toHaveLength(1);
+    expect(result.observations[0].title).toBe('Fenced with lang');
+    expect(result.observations[0].narrative).toBe('Wrapped in xml-tagged code fence.');
+  });
+
+  it('parses fenced XML without language tag (``` ... ```)', () => {
+    const xml = '```\n<observation>\n  <type>bugfix</type>\n  <title>Bare fence</title>\n  <narrative>Wrapped in language-less fence.</narrative>\n</observation>\n```';
+
+    const result = parseAgentXml(xml);
+    expect(result.valid).toBe(true);
+    if (!result.valid) return;
+    expect(result.observations).toHaveLength(1);
+    expect(result.observations[0].title).toBe('Bare fence');
+    expect(result.observations[0].narrative).toBe('Wrapped in language-less fence.');
+  });
+
+  it('does not falsely strip when XML appears mid-text without fences', () => {
+    const xml = `Some intro prose.
+<observation>
+  <type>refactor</type>
+  <title>Mid-text observation</title>
+  <narrative>No fences anywhere in the input.</narrative>
+</observation>
+Trailing prose.`;
+
+    const result = parseAgentXml(xml);
+    expect(result.valid).toBe(true);
+    if (!result.valid) return;
+    expect(result.observations).toHaveLength(1);
+    expect(result.observations[0].title).toBe('Mid-text observation');
+    expect(result.observations[0].narrative).toBe('No fences anywhere in the input.');
+  });
+
+  it('does not strip inner triple-backtick lines when payload is not a full fenced wrapper', () => {
+    // Regression for CodeRabbit review on PR #2282: stripCodeFences() used to
+    // greedily remove the first ``` and last ``` anywhere in the input, which
+    // could mangle content that contains internal fenced examples or surrounds
+    // the XML with prose. The fence-stripper must only fire when the entire
+    // payload is a single fenced block.
+    const xml = 'Lead-in text with ```inline``` markers.\n' +
+      '<observation>\n' +
+      '  <type>discovery</type>\n' +
+      '  <title>Body with ``` inside narrative</title>\n' +
+      '  <narrative>Snippet: ```\nfoo\n``` end of snippet.</narrative>\n' +
+      '</observation>\n' +
+      'Trailing ``` prose with another ``` mark.';
+
+    const result = parseAgentXml(xml);
+    expect(result.valid).toBe(true);
+    if (!result.valid) return;
+    expect(result.observations).toHaveLength(1);
+    expect(result.observations[0].title).toBe('Body with ``` inside narrative');
+    // Narrative should still contain the inner ``` markers — i.e. the
+    // stripper did not eat them.
+    expect(result.observations[0].narrative).toContain('```');
+  });
+});
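Review note on the fence-tolerance tests above: the rule they pin down is "strip only when the entire payload is one fenced block". A minimal sketch of that rule follows, assuming a line-based check; `stripWholeFence` and `FENCE` are hypothetical names, and this is not the `stripCodeFences()` implementation from the PR, which is not shown in this diff.

```typescript
// Built at runtime so the sketch does not embed a literal fence marker.
const FENCE = '`'.repeat(3);

// Hypothetical helper: returns the inner body when the whole input is a single
// fenced block, and the input unchanged in every other case.
function stripWholeFence(input: string): string {
  const lines = input.trim().split(/\r?\n/);
  if (lines.length < 2) return input;

  const first = lines[0] ?? '';
  const last = lines[lines.length - 1] ?? '';

  // Opening line: fence marker plus an optional language tag, nothing else.
  const opens =
    first.startsWith(FENCE) && /^[A-Za-z0-9_-]*\s*$/.test(first.slice(FENCE.length));
  // Closing line: the fence marker alone.
  const closes = last.trim() === FENCE;

  if (!opens || !closes) return input;
  return lines.slice(1, -1).join('\n');
}
```

Under this rule, prose that merely contains fence markers, or XML surrounded by prose, passes through untouched, which is the behaviour the last test in the hunk asserts.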
@@ -0,0 +1,288 @@
+import { describe, it, expect, beforeEach, afterEach, spyOn } from 'bun:test';
+import * as childProcess from 'child_process';
+import * as fs from 'fs';
+import { join } from 'path';
+import {
+  readClaudeOAuthToken,
+  decodeJwtExpMs,
+  writeStaleMarker,
+  clearStaleMarker,
+  readStaleMarker,
+} from '../../src/shared/oauth-token.js';
+import { paths } from '../../src/shared/paths.js';
+import { buildIsolatedEnvWithFreshOAuth } from '../../src/shared/EnvManager.js';
+
+/**
+ * The implementation uses promisify(execFile), which captures execFile at
+ * module-load time, so replacing the export on `child_process` afterwards
+ * does not intercept those calls. These tests therefore avoid mocking the
+ * keychain and redirect the data dir to a per-test temp dir for marker/sidecar tests.
+ */
+
+const ORIGINAL_EXEC_FILE = childProcess.execFile;
+const ORIGINAL_PLATFORM = process.platform;
+const ORIGINAL_ENV_TOKEN = process.env.CLAUDE_CODE_OAUTH_TOKEN;
+const ORIGINAL_DATA_DIR = process.env.CLAUDE_MEM_DATA_DIR;
+
+let dataDirSpy: ReturnType<typeof spyOn> | undefined;
+let tempDir: string;
+
+function setPlatform(value: NodeJS.Platform): void {
+  Object.defineProperty(process, 'platform', { value, configurable: true });
+}
+
+function restorePlatform(): void {
+  Object.defineProperty(process, 'platform', { value: ORIGINAL_PLATFORM, configurable: true });
+}
+
+/**
+ * Why execFile is not mocked in this file.
+ *
+ * oauth-token.ts calls promisify(execFile) at import time, so the promisified
+ * handle keeps a reference to the original execFile. Bun's spyOn can replace
+ * the property on the `child_process` module object, but that replacement
+ * never reaches the handle already captured inside oauth-token.ts, so the
+ * `security` / `secret-tool` invocations cannot be intercepted post-hoc.
+ *
+ * The suites below work around this in two ways:
+ *   - the parsing layer is covered by calling decodeJwtExpMs directly;
+ *   - the integration shape is covered through the env-fallback path, which
+ *     is reached by spoofing process.platform to an unsupported value so the
+ *     keychain lookup reports "absent".
+ *
+ * The platform-specific suites only assert that readClaudeOAuthToken()
+ * resolves to a valid result kind on the host they run on.
+ */
+
+beforeEach(() => {
+  // Redirect DATA_DIR to a temp directory for marker file tests.
+  tempDir = fs.mkdtempSync(join(fs.realpathSync(require('os').tmpdir()), 'claude-mem-oauth-test-'));
+  dataDirSpy = spyOn(paths, 'dataDir').mockImplementation(() => tempDir);
+});
+
+afterEach(() => {
+  dataDirSpy?.mockRestore();
+  restorePlatform();
+  if (ORIGINAL_ENV_TOKEN === undefined) {
+    delete process.env.CLAUDE_CODE_OAUTH_TOKEN;
+  } else {
+    process.env.CLAUDE_CODE_OAUTH_TOKEN = ORIGINAL_ENV_TOKEN;
+  }
+  if (ORIGINAL_DATA_DIR === undefined) {
+    delete process.env.CLAUDE_MEM_DATA_DIR;
+  } else {
+    process.env.CLAUDE_MEM_DATA_DIR = ORIGINAL_DATA_DIR;
+  }
+  // Clean up temp dir
+  try {
+    fs.rmSync(tempDir, { recursive: true, force: true });
+  } catch {
+    // best effort
+  }
+});
+
+describe('decodeJwtExpMs', () => {
+  it('returns undefined for non-JWT tokens', () => {
+    expect(decodeJwtExpMs('sk-ant-oat01-bare-token')).toBeUndefined();
+    expect(decodeJwtExpMs('not.a.jwt.really')).toBeUndefined();
+    expect(decodeJwtExpMs('')).toBeUndefined();
+  });
+
+  it('extracts exp claim from a real JWT and converts seconds to ms', () => {
+    // header.payload.signature where payload is {"exp": 9999999999}
+    const header = Buffer.from(JSON.stringify({ alg: 'none' })).toString('base64url');
+    const payload = Buffer.from(JSON.stringify({ exp: 9999999999 })).toString('base64url');
+    const signature = 'sig';
+    const jwt = `${header}.${payload}.${signature}`;
+    expect(decodeJwtExpMs(jwt)).toBe(9999999999 * 1000);
+  });
+
+  it('returns undefined when JWT payload has no exp claim', () => {
+    const header = Buffer.from(JSON.stringify({ alg: 'none' })).toString('base64url');
+    const payload = Buffer.from(JSON.stringify({ sub: 'user' })).toString('base64url');
+    const jwt = `${header}.${payload}.sig`;
+    expect(decodeJwtExpMs(jwt)).toBeUndefined();
+  });
+
+  it('returns undefined for malformed JWT', () => {
+    expect(decodeJwtExpMs('not-base64.not-base64.sig')).toBeUndefined();
+  });
+});
+
+describe('marker file scheme', () => {
+  it('writeStaleMarker creates the marker file with the reason', () => {
+    writeStaleMarker('token expired at 2026-01-01');
+    const markerPath = join(tempDir, 'oauth-stale.marker');
+    expect(fs.existsSync(markerPath)).toBe(true);
+    expect(fs.readFileSync(markerPath, 'utf-8')).toBe('token expired at 2026-01-01');
+  });
+
+  it('readStaleMarker returns undefined when no marker exists', () => {
+    expect(readStaleMarker()).toBeUndefined();
+  });
+
+  it('readStaleMarker returns the reason after writeStaleMarker', () => {
+    writeStaleMarker('refresh me');
+    expect(readStaleMarker()).toBe('refresh me');
+  });
+
+  it('clearStaleMarker removes an existing marker', () => {
+    writeStaleMarker('temporary');
+    expect(readStaleMarker()).toBe('temporary');
+    clearStaleMarker();
+    expect(readStaleMarker()).toBeUndefined();
+  });
+
+  it('clearStaleMarker is a no-op when no marker exists', () => {
+    expect(() => clearStaleMarker()).not.toThrow();
+  });
+});
+
+describe('readClaudeOAuthToken — env-fallback branch', () => {
+  // These tests exercise the env-fallback path which is reachable on every
+  // platform when the keychain returns absent. We force absent by spoofing
+  // the platform to an unsupported value.
+  beforeEach(() => {
+    setPlatform('aix' as NodeJS.Platform); // unsupported -> always absent
+  });
+
+  it('returns absent when no env token is set', async () => {
+    delete process.env.CLAUDE_CODE_OAUTH_TOKEN;
+    const result = await readClaudeOAuthToken();
+    expect(result.kind).toBe('absent');
+    if (result.kind === 'absent') {
+      expect(result.reason).toContain('Unsupported platform');
+    }
+  });
+
+  it('returns present (env-fallback) when env token is set and not expired', async () => {
+    // Non-JWT bare token, no sidecar -> no expiresAt detectable -> not expired.
+    process.env.CLAUDE_CODE_OAUTH_TOKEN = 'sk-ant-oat01-fallback';
+    const result = await readClaudeOAuthToken();
+    expect(result.kind).toBe('present');
+    if (result.kind === 'present') {
+      expect(result.token).toBe('sk-ant-oat01-fallback');
+      expect(result.source).toBe('env-fallback');
+    }
+  });
+
+  it('returns expired when env token JWT exp claim is in the past', async () => {
+    // Build a JWT with exp=1 (1970) — definitely expired.
+    const header = Buffer.from(JSON.stringify({ alg: 'none' })).toString('base64url');
+    const payload = Buffer.from(JSON.stringify({ exp: 1 })).toString('base64url');
+    const expiredJwt = `${header}.${payload}.sig`;
+    process.env.CLAUDE_CODE_OAUTH_TOKEN = expiredJwt;
+    const result = await readClaudeOAuthToken();
+    expect(result.kind).toBe('expired');
+    if (result.kind === 'expired') {
+      expect(result.reason).toContain('expired');
+      expect(result.expiresAt).toBe(1000); // 1 sec * 1000
+    }
+  });
+
+  it('returns expired when sidecar metadata indicates env token is stale', async () => {
+    process.env.CLAUDE_CODE_OAUTH_TOKEN = 'sk-ant-oat01-bare';
+    // Write a sidecar with expiresAt in the past (well beyond grace window).
+    const sidecarPath = join(tempDir, 'oauth-token-meta.json');
+    const stalePastMs = Date.now() - 60 * 60 * 1000; // 1 hour ago
+    fs.writeFileSync(sidecarPath, JSON.stringify({ expiresAt: stalePastMs }));
+    const result = await readClaudeOAuthToken();
+    expect(result.kind).toBe('expired');
+    if (result.kind === 'expired') {
+      expect(result.expiresAt).toBe(stalePastMs);
+    }
+  });
+
+  it('returns present when sidecar expiresAt is in the future', async () => {
+    process.env.CLAUDE_CODE_OAUTH_TOKEN = 'sk-ant-oat01-bare';
+    const sidecarPath = join(tempDir, 'oauth-token-meta.json');
+    const futureMs = Date.now() + 60 * 60 * 1000; // 1 hour from now
+    fs.writeFileSync(sidecarPath, JSON.stringify({ expiresAt: futureMs }));
+    const result = await readClaudeOAuthToken();
+    expect(result.kind).toBe('present');
+    if (result.kind === 'present') {
+      expect(result.expiresAt).toBe(futureMs);
+      expect(result.source).toBe('env-fallback');
+    }
+  });
+});
+
+describe('readClaudeOAuthToken — macOS keychain branch', () => {
+  // We can't easily intercept the cached promisified execFile from inside
+  // oauth-token.ts (it captured a reference at module load). Instead we
+  // verify the macOS branch dispatches by checking that on darwin without
+  // a real keychain entry, the fallback path is reached.
+  it('on macOS, falls back to env when keychain access fails or returns nothing', async () => {
+    if (process.platform !== 'darwin') {
+      // Skip on non-macOS — we only run this test where security CLI exists.
+      return;
+    }
+    // Use an env token; if the real keychain has a fresh entry, we get
+    // 'present' with source='keychain'. If no keychain entry, we fall back
+    // to env-fallback. The real keychain entry may also be stale ('expired'),
+    // so the assertions below accept any valid kind rather than pinning one.
+    process.env.CLAUDE_CODE_OAUTH_TOKEN = 'sk-ant-oat01-test-fallback';
+    setPlatform('darwin');
+    const result = await readClaudeOAuthToken();
+    // Whatever the keychain says, the result should be a valid kind.
+    expect(['present', 'expired', 'absent']).toContain(result.kind);
+    if (result.kind === 'present') {
+      expect(result.token.length).toBeGreaterThan(0);
+      expect(['keychain', 'env-fallback']).toContain(result.source);
+    }
+  });
+});
+
+describe('readClaudeOAuthToken — Linux branch', () => {
+  it('on linux without secret-tool, returns absent gracefully', async () => {
+    if (process.platform !== 'linux') return; // skip on non-linux
+    setPlatform('linux');
+    delete process.env.CLAUDE_CODE_OAUTH_TOKEN;
+    const result = await readClaudeOAuthToken();
+    // If secret-tool is not installed or has no entry, returns absent.
+    // If somehow present, we accept that too.
+    expect(['present', 'expired', 'absent']).toContain(result.kind);
+  });
+});
+
+describe('readClaudeOAuthToken — Windows branch', () => {
+  it('on win32 without keychain entry, returns absent or env-fallback', async () => {
+    if (process.platform !== 'win32') return; // skip on non-windows
+    setPlatform('win32');
+    delete process.env.CLAUDE_CODE_OAUTH_TOKEN;
+    const result = await readClaudeOAuthToken();
+    expect(['present', 'expired', 'absent']).toContain(result.kind);
+  });
+});
+
+// CodeRabbit Minor (PR #2282 follow-up): when the OAuth token is absent, any
+// previously-written stale marker is no longer accurate (the token is gone,
+// not expired). buildIsolatedEnvWithFreshOAuth must clear it on the absent
+// branch the same way it does on present.
+describe('buildIsolatedEnvWithFreshOAuth — absent token clears stale marker', () => {
+  beforeEach(() => {
+    setPlatform('aix' as NodeJS.Platform); // unsupported -> always absent
+    delete process.env.CLAUDE_CODE_OAUTH_TOKEN;
+  });
+
+  it('clears a pre-existing stale marker when token is absent', async () => {
+    // Pre-existing marker from an earlier "expired" pass.
+    writeStaleMarker('left over from previous run');
+    expect(readStaleMarker()).toBe('left over from previous run');
+
+    // Force the absent path: ANTHROPIC_API_KEY must NOT be set in either the
+    // env file or the process env, otherwise the early-return branch fires
+    // before we ever reach the OAuth resolution.
+    const origAnthropicKey = process.env.ANTHROPIC_API_KEY;
+    delete process.env.ANTHROPIC_API_KEY;
+    try {
+      await buildIsolatedEnvWithFreshOAuth(true);
+    } finally {
+      if (origAnthropicKey !== undefined) {
+        process.env.ANTHROPIC_API_KEY = origAnthropicKey;
+      }
+    }
+
+    expect(readStaleMarker()).toBeUndefined();
+  });
+});
@@ -0,0 +1,33 @@
+import { describe, it, expect } from 'bun:test';
+import { paths, DATA_DIR } from '../../src/shared/paths.js';
+
+describe('paths namespace', () => {
+  it('exposes at least the known core accessors', () => {
+    const keys = Object.keys(paths);
+    const required = [
+      'dataDir',
+      'workerPid',
+      'settings',
+      'database',
+      'chroma',
+      'transcriptsConfig',
+    ];
+    for (const key of required) {
+      expect(keys).toContain(key);
+    }
+  });
+
+  it('every accessor returns a string starting with DATA_DIR', () => {
+    for (const key of Object.keys(paths) as Array<keyof typeof paths>) {
+      const value = paths[key]();
+      expect(typeof value).toBe('string');
+      expect(value.startsWith(DATA_DIR)).toBe(true);
+    }
+  });
+
+  it('every accessor is a callable function', () => {
+    for (const key of Object.keys(paths) as Array<keyof typeof paths>) {
+      expect(typeof paths[key]).toBe('function');
+    }
+  });
+});
@@ -0,0 +1,225 @@
+/**
+ * Regression tests for issue #2248: Cursor IDE sessions are never summarized.
+ *
+ * Validates the three fixes that make Cursor sessions actually get summarized
+ * end-to-end (previously they were silently skipped):
+ *   A. cursor adapter derives `transcriptPath` from `cwd + conversation_id`,
+ *      since Cursor does not pass a transcript path on stdin.
+ *   B. `extractLastMessageFromJsonl` accepts both `{type:"assistant"}` (Claude
+ *      Code) and `{role:"assistant"}` (Cursor) per-line role markers.
+ *   C. `extractLastMessageFromJsonl` keeps scanning back through assistant
+ *      turns when the most recent one is a pure tool_use (no text content),
+ *      instead of returning an empty string and causing the summary to be
+ *      skipped.
+ */
+import { describe, it, expect, beforeEach, afterEach } from 'bun:test';
+import { readFileSync, writeFileSync, mkdirSync, rmSync, existsSync } from 'fs';
+import { join } from 'path';
+import { tmpdir, homedir } from 'os';
+import { extractLastMessage, extractLastMessageFromJsonl } from '../../src/shared/transcript-parser.js';
+import { cursorAdapter, deriveCursorTranscriptPath } from '../../src/cli/adapters/cursor.js';
+
+const FIXTURE_PATH = join(__dirname, '..', 'fixtures', 'cursor-session.jsonl');
+
+// ---------------------------------------------------------------------------
+// Bug B + C: extractLastMessageFromJsonl on the cursor-session.jsonl fixture
+// ---------------------------------------------------------------------------
+
+describe('cursor-extraction: extractLastMessageFromJsonl on fixture', () => {
+  const fixtureContent = readFileSync(FIXTURE_PATH, 'utf-8').trim();
+
+  it('returns the last user text from the fixture', () => {
+    expect(extractLastMessageFromJsonl(fixtureContent, 'user', false)).toBe(
+      'thanks, also tell me what you found'
+    );
+  });
+
+  it('returns the final assistant text (skipping tool_use-only turn)', () => {
+    expect(extractLastMessageFromJsonl(fixtureContent, 'assistant', false)).toBe(
+      'Here are the files: adapters, handlers, types.'
+    );
+  });
+});
+
+// ---------------------------------------------------------------------------
+// Bug B + C: extractLastMessage with extra inline cases
+// ---------------------------------------------------------------------------
+
+describe('cursor-extraction: extractLastMessage Cursor JSONL compatibility', () => {
+  const tmpDir = join(tmpdir(), `cursor-extraction-test-${Date.now()}`);
+  const transcriptPath = join(tmpDir, 'transcript.jsonl');
+
+  beforeEach(() => {
+    mkdirSync(tmpDir, { recursive: true });
+  });
+  afterEach(() => {
+    rmSync(tmpDir, { recursive: true, force: true });
+  });
+
+  it('reads Cursor JSONL using {"role":"assistant"} (Bug B regression)', () => {
+    const lines = [
+      { role: 'user', message: { content: [{ type: 'text', text: 'hello' }] } },
+      { role: 'assistant', message: { content: [{ type: 'text', text: 'hi from cursor' }] } },
+    ];
+    writeFileSync(transcriptPath, lines.map((l) => JSON.stringify(l)).join('\n'));
+
+    expect(extractLastMessage(transcriptPath, 'assistant')).toBe('hi from cursor');
+  });
+
+  it('skips a tool-only last assistant turn and returns the previous text-bearing one (Bug C regression)', () => {
+    const lines = [
+      { role: 'user', message: { content: [{ type: 'text', text: 'q1' }] } },
+      { role: 'assistant', message: { content: [{ type: 'text', text: 'real answer' }] } },
+      { role: 'user', message: { content: [{ type: 'text', text: 'q2' }] } },
+      { role: 'assistant', message: { content: [{ type: 'tool_use', name: 'Shell', input: { command: 'ls' } }] } },
+    ];
+    writeFileSync(transcriptPath, lines.map((l) => JSON.stringify(l)).join('\n'));
+
+    expect(extractLastMessage(transcriptPath, 'assistant')).toBe('real answer');
+  });
+
+  it('still returns "" when no assistant turn exists at all', () => {
+    const lines = [{ role: 'user', message: { content: [{ type: 'text', text: 'lonely' }] } }];
+    writeFileSync(transcriptPath, lines.map((l) => JSON.stringify(l)).join('\n'));
+
+    expect(extractLastMessage(transcriptPath, 'assistant')).toBe('');
+  });
+
+  it('still works for Claude Code format using {"type":"assistant"}', () => {
+    const lines = [
+      { type: 'user', message: { content: [{ type: 'text', text: 'q' }] } },
+      { type: 'assistant', message: { content: [{ type: 'text', text: 'claude code answer' }] } },
+    ];
+    writeFileSync(transcriptPath, lines.map((l) => JSON.stringify(l)).join('\n'));
+
+    expect(extractLastMessage(transcriptPath, 'assistant')).toBe('claude code answer');
+  });
+});
+
+// ---------------------------------------------------------------------------
+// Bug A: cursor adapter transcript path derivation
+// ---------------------------------------------------------------------------
+
+describe('cursor-extraction: cursorAdapter transcriptPath derivation', () => {
+  const sessionId = `c0ffee${Date.now()}`;
+  const fakeCwd = join(tmpdir(), 'fake.workspace', 'subdir');
+  const slug = fakeCwd.replace(/^\//, '').replace(/[/.]/g, '-');
+  const transcriptDir = join(homedir(), '.cursor', 'projects', slug, 'agent-transcripts', sessionId);
+  const transcriptPath = join(transcriptDir, `${sessionId}.jsonl`);
+
+  beforeEach(() => {
+    mkdirSync(fakeCwd, { recursive: true });
+    mkdirSync(transcriptDir, { recursive: true });
+    writeFileSync(
+      transcriptPath,
+      JSON.stringify({ role: 'assistant', message: { content: [{ type: 'text', text: 'ok' }] } }) + '\n'
+    );
+  });
+
+  afterEach(() => {
+    if (existsSync(transcriptPath)) rmSync(transcriptPath);
+    if (existsSync(transcriptDir)) rmSync(transcriptDir, { recursive: true, force: true });
+    if (existsSync(fakeCwd)) rmSync(fakeCwd, { recursive: true, force: true });
+  });
+
+  it('derives transcriptPath from cwd + conversation_id when the file exists (Bug A regression)', () => {
+    const normalized = cursorAdapter.normalizeInput({
+      cwd: fakeCwd,
+      conversation_id: sessionId,
+    });
+
+    expect(normalized.sessionId).toBe(sessionId);
+    expect(normalized.transcriptPath).toBe(transcriptPath);
+  });
+
+  it('returns transcriptPath: undefined when the file does not exist', () => {
+    rmSync(transcriptPath);
+    const normalized = cursorAdapter.normalizeInput({
+      cwd: fakeCwd,
+      conversation_id: sessionId,
+    });
+
+    expect(normalized.sessionId).toBe(sessionId);
+    expect(normalized.transcriptPath).toBeUndefined();
+  });
+
+  it('returns undefined when sessionId is missing (deriveCursorTranscriptPath direct call)', () => {
+    expect(deriveCursorTranscriptPath(fakeCwd, undefined)).toBeUndefined();
+  });
+
+  it('returns undefined when cwd is missing (deriveCursorTranscriptPath direct call)', () => {
+    expect(deriveCursorTranscriptPath(undefined, sessionId)).toBeUndefined();
+  });
+});
+
+// ---------------------------------------------------------------------------
+// Greptile P1 (PR #2282): malformed JSONL lines must not crash the pipeline
+// ---------------------------------------------------------------------------
+
+describe('cursor-extraction: malformed JSONL tolerance', () => {
+  it('skips truncated/malformed lines and returns the last valid match', () => {
+    const validLine = JSON.stringify({
+      role: 'assistant',
+      message: { content: [{ type: 'text', text: 'recovered text' }] },
+    });
+    const malformed = '{"role":"assistant","message":{"content":[{"type":"tex'; // truncated mid-write
+    const content = [validLine, malformed].join('\n');
+
+    expect(() => extractLastMessageFromJsonl(content, 'assistant', false)).not.toThrow();
+    expect(extractLastMessageFromJsonl(content, 'assistant', false)).toBe('recovered text');
+  });
+
+  it('returns empty string when ALL lines are malformed', () => {
+    const content = ['{partial', 'not even close to json', '}{'].join('\n');
+    expect(extractLastMessageFromJsonl(content, 'assistant', false)).toBe('');
+  });
+
+  // CodeRabbit Major + Greptile P1 (PR #2282 follow-up): a valid JSON line
+  // whose `message.content` is an unexpected type (null, number, plain
+  // object) used to throw. It must now be skipped — same tolerance class as
+  // truncated lines.
+  it('skips a line whose message.content is null and falls back to a valid earlier line', () => {
+    const valid = JSON.stringify({
+      role: 'assistant',
+      message: { content: [{ type: 'text', text: 'kept' }] },
+    });
+    const nullContent = JSON.stringify({
+      role: 'assistant',
+      message: { content: null },
+    });
+    const content = [valid, nullContent].join('\n');
+
+    expect(() => extractLastMessageFromJsonl(content, 'assistant', false)).not.toThrow();
+    expect(extractLastMessageFromJsonl(content, 'assistant', false)).toBe('kept');
+  });
+
+  it('skips a line whose message.content is a number without throwing', () => {
+    const valid = JSON.stringify({
+      role: 'assistant',
+      message: { content: [{ type: 'text', text: 'kept too' }] },
+    });
+    const numericContent = JSON.stringify({
+      role: 'assistant',
+      message: { content: 42 },
+    });
+    const content = [valid, numericContent].join('\n');
+
+    expect(() => extractLastMessageFromJsonl(content, 'assistant', false)).not.toThrow();
+    expect(extractLastMessageFromJsonl(content, 'assistant', false)).toBe('kept too');
+  });
+
+  it('skips a line whose message.content is a plain object without throwing', () => {
+    const valid = JSON.stringify({
+      role: 'assistant',
+      message: { content: [{ type: 'text', text: 'survivor' }] },
+    });
+    const objectContent = JSON.stringify({
+      role: 'assistant',
+      message: { content: { unexpected: 'shape' } },
+    });
+    const content = [valid, objectContent].join('\n');
+
+    expect(() => extractLastMessageFromJsonl(content, 'assistant', false)).not.toThrow();
+    expect(extractLastMessageFromJsonl(content, 'assistant', false)).toBe('survivor');
+  });
+});
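Review note on the cursor-extraction tests above: together they describe a backward scan that accepts both role markers, skips tool-only turns, and tolerates malformed lines or unexpected `message.content` shapes. The sketch below is one way such a scan could look. `lastTextForRole` and `isRecord` are hypothetical names, and the sketch makes no claim about how `extractLastMessageFromJsonl` in `transcript-parser.ts` is actually written; in particular it omits that function's third parameter.

```typescript
type Role = 'user' | 'assistant';

function isRecord(value: unknown): value is Record<string, unknown> {
  return typeof value === 'object' && value !== null && !Array.isArray(value);
}

// Hypothetical helper: scan from the end and return the newest text for `role`.
function lastTextForRole(jsonl: string, role: Role): string {
  const lines = jsonl.split('\n');
  for (let i = lines.length - 1; i >= 0; i--) {
    const raw = (lines[i] ?? '').trim();
    if (raw === '') continue;

    let entry: unknown;
    try {
      entry = JSON.parse(raw);
    } catch {
      continue; // truncated or malformed line: skip instead of throwing
    }
    if (!isRecord(entry)) continue;

    // Claude Code marks the role on `type`, Cursor on `role`.
    if (entry['type'] !== role && entry['role'] !== role) continue;

    const message = entry['message'];
    // null, numbers, and plain objects in `content` are skipped, not fatal.
    if (!isRecord(message) || !Array.isArray(message['content'])) continue;

    const text = (message['content'] as unknown[])
      .filter(
        (block): block is { type: 'text'; text: string } =>
          isRecord(block) && block['type'] === 'text' && typeof block['text'] === 'string'
      )
      .map((block) => block.text)
      .join('\n')
      .trim();

    // A tool_use-only turn yields no text, so keep scanning older turns.
    if (text !== '') return text;
  }
  return '';
}
```

Returning `''` only after the whole transcript has been scanned is what keeps a trailing tool call from suppressing the summary.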
@@ -0,0 +1,238 @@
+import { describe, it, expect } from 'bun:test';
+import {
+  ClassifiedProviderError,
+  isClassified,
+} from '../../src/services/worker/provider-errors.js';
+import { classifyClaudeError } from '../../src/services/worker/ClaudeProvider.js';
+import { classifyGeminiError } from '../../src/services/worker/GeminiProvider.js';
+import { classifyOpenRouterError } from '../../src/services/worker/OpenRouterProvider.js';
+
+// Hard cases per F4 spec — provider-specific classifiers must map raw HTTP
+// shapes / SDK errors to ClassifiedProviderError with the right kind.
+
+describe('classifyGeminiError', () => {
+  it('classifies 429 with no Retry-After as rate_limit with no retryAfterMs', () => {
+    const headers = new Headers(); // no Retry-After
+    const cause = new Error('Gemini API error: 429 - quota');
+    const err = classifyGeminiError({
+      status: 429,
+      bodyText: 'Too Many Requests',
+      headers,
+      cause,
+    });
+    expect(isClassified(err)).toBe(true);
+    expect(err.kind).toBe('rate_limit');
+    expect(err.retryAfterMs).toBeUndefined();
+    expect(err.cause).toBe(cause);
+  });
+
+  it('classifies 429 with Retry-After: 5 as rate_limit with retryAfterMs=5000', () => {
+    const headers = new Headers({ 'Retry-After': '5' });
+    const err = classifyGeminiError({
+      status: 429,
+      bodyText: '',
+      headers,
+      cause: new Error('rate limited'),
+    });
+    expect(err.kind).toBe('rate_limit');
+    expect(err.retryAfterMs).toBe(5000);
+  });
+
+  it('classifies 500 with body containing "quota exceeded" as quota_exhausted', () => {
+    const err = classifyGeminiError({
+      status: 500,
+      bodyText: 'Internal: quota exceeded for model',
+      cause: new Error('500 - quota exceeded'),
+    });
+    expect(err.kind).toBe('quota_exhausted');
+    expect(err.retryAfterMs).toBeUndefined();
+  });
+
+  it('classifies 401 with "API key not valid" body as auth_invalid', () => {
+    const err = classifyGeminiError({
+      status: 401,
+      bodyText: 'API key not valid. Please pass a valid API key.',
+      cause: new Error('401'),
+    });
+    expect(err.kind).toBe('auth_invalid');
+  });
+
+  it('classifies 403 PERMISSION_DENIED as auth_invalid', () => {
+    const err = classifyGeminiError({
+      status: 403,
+      bodyText: 'PERMISSION_DENIED',
+      cause: new Error('403'),
+    });
+    expect(err.kind).toBe('auth_invalid');
+  });
+
+  it('classifies 503 as transient', () => {
+    const err = classifyGeminiError({
+      status: 503,
+      bodyText: 'service unavailable',
+      cause: new Error('503'),
+    });
+    expect(err.kind).toBe('transient');
+  });
+
+  it('classifies network error (no status) as transient', () => {
+    const cause = new Error('fetch failed: ECONNREFUSED');
+    const err = classifyGeminiError({ cause });
+    expect(err.kind).toBe('transient');
+    expect(err.cause).toBe(cause);
+  });
+
+  it('classifies 400 as unrecoverable', () => {
+    const err = classifyGeminiError({
+      status: 400,
+      bodyText: 'INVALID_ARGUMENT',
+      cause: new Error('400'),
+    });
+    expect(err.kind).toBe('unrecoverable');
+  });
+});
+
+describe('classifyOpenRouterError', () => {
+  it('classifies 429 with no Retry-After as rate_limit with no retryAfterMs', () => {
+    const headers = new Headers(); // no Retry-After
+    const err = classifyOpenRouterError({
+      status: 429,
+      bodyText: 'rate limit exceeded',
+      headers,
+      cause: new Error('429'),
+    });
+    expect(err.kind).toBe('rate_limit');
+    expect(err.retryAfterMs).toBeUndefined();
+  });
+
+  it('classifies 429 with Retry-After: 10 as rate_limit with retryAfterMs=10000', () => {
+    const headers = new Headers({ 'retry-after': '10' });
+    const err = classifyOpenRouterError({
+      status: 429,
+      bodyText: '',
+      headers,
+      cause: new Error('429'),
+    });
+    expect(err.kind).toBe('rate_limit');
+    expect(err.retryAfterMs).toBe(10_000);
+  });
+
+  it('classifies 500 with body containing "quota exceeded" as quota_exhausted', () => {
+    const err = classifyOpenRouterError({
+      status: 500,
+      bodyText: 'something quota exceeded',
+      cause: new Error('500'),
+    });
+    expect(err.kind).toBe('quota_exhausted');
+  });
+
+  it('classifies 402 with "insufficient credits" body as quota_exhausted', () => {
+    const err = classifyOpenRouterError({
+      status: 402,
+      bodyText: 'insufficient credits',
+      cause: new Error('402'),
+    });
+    expect(err.kind).toBe('quota_exhausted');
+  });
+
+  it('classifies 401 as auth_invalid', () => {
+    const err = classifyOpenRouterError({
+      status: 401,
+      bodyText: 'unauthorized',
+      cause: new Error('401'),
+    });
+    expect(err.kind).toBe('auth_invalid');
+  });
+
+  it('classifies 502 as transient', () => {
+    const err = classifyOpenRouterError({
+      status: 502,
+      bodyText: 'bad gateway',
+      cause: new Error('502'),
+    });
+    expect(err.kind).toBe('transient');
+  });
+
+  it('classifies network error (no status) as transient', () => {
+    const cause = new Error('ECONNRESET');
+    const err = classifyOpenRouterError({ cause });
+    expect(err.kind).toBe('transient');
+  });
+});
+
+describe('classifyClaudeError', () => {
+  it('classifies SDK-level OverloadedError as transient', () => {
+    class OverloadedError extends Error {
+      constructor() {
+        super('Overloaded');
+        this.name = 'OverloadedError';
+      }
+    }
+    const err = classifyClaudeError(new OverloadedError());
+    expect(isClassified(err)).toBe(true);
+    expect(err.kind).toBe('transient');
+  });
+
+  it('classifies 529 status as transient', () => {
+    const sdkErr = Object.assign(new Error('overloaded'), { status: 529 });
+    const err = classifyClaudeError(sdkErr);
+    expect(err.kind).toBe('transient');
+  });
+
+  it('classifies anthropic error.type=overloaded_error as transient', () => {
+    const sdkErr = Object.assign(new Error('upstream'), {
+      error: { type: 'overloaded_error' },
+    });
+    const err = classifyClaudeError(sdkErr);
+    expect(err.kind).toBe('transient');
+  });
+
+  it('classifies "Invalid API key" message as auth_invalid', () => {
+    const err = classifyClaudeError(new Error('Invalid API key: configure ~/.claude-mem/.env'));
+    expect(err.kind).toBe('auth_invalid');
+  });
+
+  it('classifies status=401 as auth_invalid', () => {
+    const sdkErr = Object.assign(new Error('unauthorized'), { status: 401 });
+    const err = classifyClaudeError(sdkErr);
+    expect(err.kind).toBe('auth_invalid');
+  });
+
+  it('classifies ENOENT spawn error as unrecoverable', () => {
+    const spawnErr = Object.assign(new Error('spawn claude ENOENT'), { code: 'ENOENT' });
+    const err = classifyClaudeError(spawnErr);
+    expect(err.kind).toBe('unrecoverable');
+  });
+
+  it('classifies "Claude executable not found" as unrecoverable', () => {
+    const err = classifyClaudeError(new Error('Claude executable not found at $CLAUDE_CODE_PATH'));
+    expect(err.kind).toBe('unrecoverable');
+  });
+
+  it('classifies prompt-too-long as unrecoverable', () => {
+    const err = classifyClaudeError(new Error('Claude session context overflow: prompt is too long'));
+    expect(err.kind).toBe('unrecoverable');
+  });
+
+  it('classifies status=429 as rate_limit', () => {
+    const sdkErr = Object.assign(new Error('rate limited'), { status: 429 });
+    const err = classifyClaudeError(sdkErr);
+    expect(err.kind).toBe('rate_limit');
+  });
+
+  it('classifies "quota exceeded" message as quota_exhausted', () => {
+    const err = classifyClaudeError(new Error('upstream: quota exceeded'));
+    expect(err.kind).toBe('quota_exhausted');
+  });
+
+  it('classifies status=503 as transient', () => {
+    const sdkErr = Object.assign(new Error('service unavailable'), { status: 503 });
+    const err = classifyClaudeError(sdkErr);
+    expect(err.kind).toBe('transient');
+  });
+
+  it('classifies unknown error as transient (preserve old default)', () => {
+    const err = classifyClaudeError(new Error('something weird happened'));
+    expect(err.kind).toBe('transient');
+  });
+});
@@ -0,0 +1,118 @@
+import { describe, it, expect } from 'bun:test';
+import {
+  ClassifiedProviderError,
+  isClassified,
+  type ProviderErrorClass,
+} from '../../src/services/worker/provider-errors.js';
+
+// These tests exercise the *type system* and *invariants of the class itself*.
+// Per-provider classification helpers (the actual mapping from raw SDK errors
+// to ClassifiedProviderError) are tested separately; here we feed stub inputs
+// representing what those helpers produce.
+
+describe('ClassifiedProviderError', () => {
+  it('classifies a 429-with-no-Retry-After response as rate_limit with no retryAfterMs', () => {
+    const stubRaw = {
+      status: 429,
+      headers: {}, // no Retry-After header
+      body: 'Too Many Requests',
+    };
+
+    const err = new ClassifiedProviderError('rate limited', {
+      kind: 'rate_limit',
+      cause: stubRaw,
+    });
+
+    expect(isClassified(err)).toBe(true);
+    expect(err.kind).toBe('rate_limit');
+    expect(err.retryAfterMs).toBeUndefined();
+    expect(err.cause).toBe(stubRaw);
+    expect(err.message).toBe('rate limited');
+    expect(err.name).toBe('ClassifiedProviderError');
+  });
+
+  it('classifies a 500-with-quota-exceeded body as quota_exhausted', () => {
+    const stubRaw = {
+      status: 500,
+      body: 'Internal error: quota exceeded for project',
+    };
+
+    const err = new ClassifiedProviderError('quota exceeded', {
+      kind: 'quota_exhausted',
+      cause: stubRaw,
+    });
+
+    expect(isClassified(err)).toBe(true);
+    expect(err.kind).toBe('quota_exhausted');
+    expect(err.retryAfterMs).toBeUndefined();
+    expect(err.cause).toBe(stubRaw);
+  });
+
+  it('classifies an SDK-level OverloadedError as transient', () => {
+    // Stand-in for an SDK error class instance (e.g. Anthropic OverloadedError).
+    class OverloadedError extends Error {
+      constructor() {
+        super('Overloaded');
+        this.name = 'OverloadedError';
+      }
+    }
+    const stubRaw = new OverloadedError();
+
+    const err = new ClassifiedProviderError('upstream overloaded', {
+      kind: 'transient',
+      cause: stubRaw,
+      retryAfterMs: 2000,
+    });
+
+    expect(isClassified(err)).toBe(true);
+    expect(err.kind).toBe('transient');
+    expect(err.retryAfterMs).toBe(2000);
+    expect(err.cause).toBe(stubRaw);
+  });
+
+  it('classifies an unknown 4xx as unrecoverable', () => {
+    const stubRaw = {
+      status: 418,
+      body: "I'm a teapot",
+    };
+
+    const err = new ClassifiedProviderError('unrecoverable client error', {
+      kind: 'unrecoverable',
+      cause: stubRaw,
+    });
+
+    expect(isClassified(err)).toBe(true);
+    expect(err.kind).toBe('unrecoverable');
+    expect(err.retryAfterMs).toBeUndefined();
+  });
+
+  it('round-trips a custom string kind through the open-union type system', () => {
+    // The (string & {}) branch in ProviderErrorClass means any string is
+    // assignable, but the named literals still autocomplete. Verify that a
+    // provider-specific kind survives unchanged through construction +
+    // the isClassified guard, and that it satisfies the type.
+    const customKind: ProviderErrorClass = 'flue_specific';
+
+    const err = new ClassifiedProviderError('flue-specific failure', {
+      kind: customKind,
+      cause: { provider: 'flue', code: 'F-42' },
+    });
+
+    expect(isClassified(err)).toBe(true);
+    expect(err.kind).toBe('flue_specific');
+
+    // Narrowing through isClassified preserves the kind field as ProviderErrorClass.
+    if (isClassified(err)) {
+      const k: ProviderErrorClass = err.kind;
+      expect(k).toBe('flue_specific');
+    }
+  });
+
+  it('isClassified rejects non-ClassifiedProviderError values', () => {
+    expect(isClassified(new Error('plain'))).toBe(false);
+    expect(isClassified('rate_limit')).toBe(false);
+    expect(isClassified(null)).toBe(false);
+    expect(isClassified(undefined)).toBe(false);
+    expect(isClassified({ kind: 'rate_limit' })).toBe(false);
+  });
+});
@@ -0,0 +1,217 @@
+import { describe, it, expect, beforeEach } from 'bun:test';
+import {
+  RateLimitStore,
+  shouldAbortForQuota,
+  isApiKeyAuth,
+  type RateLimitInfo,
+} from '../../src/services/worker/RateLimitStore.js';
+
+// Quota-aware wall-clock guard (#2234).
+//
+// Subscription users (cli/oauth) get aborted when they cross per-window
+// utilization thresholds, plus a reset-grace buffer for the rolling 5h
+// window. API-key users are exempt because they authorized per-call spend.
+
+const FIXED_NOW = 1_700_000_000_000; // arbitrary epoch ms anchor
+
+function freshStore(): RateLimitStore {
+  return new RateLimitStore();
+}
+
+describe('RateLimitStore', () => {
+  it('records and retrieves entries by rateLimitType', () => {
+    const store = freshStore();
+    store.set({ rateLimitType: 'five_hour', utilization: 0.5, status: 'allowed' });
+    const got = store.get('five_hour');
+    expect(got?.utilization).toBe(0.5);
+    expect(got?.status).toBe('allowed');
+    expect(typeof got?.observedAt).toBe('number');
+  });
+
+  it('overwrites older entries for the same window (last-write-wins)', () => {
+    const store = freshStore();
+    store.set({ rateLimitType: 'five_hour', utilization: 0.5 });
+    store.set({ rateLimitType: 'five_hour', utilization: 0.9 });
+    expect(store.get('five_hour')?.utilization).toBe(0.9);
+  });
+
+  it('keeps separate buckets per window', () => {
+    const store = freshStore();
+    store.set({ rateLimitType: 'five_hour', utilization: 0.4 });
+    store.set({ rateLimitType: 'seven_day_opus', utilization: 0.7 });
+    expect(store.get('five_hour')?.utilization).toBe(0.4);
+    expect(store.get('seven_day_opus')?.utilization).toBe(0.7);
+    expect(store.size).toBe(2);
+  });
+
+  it('falls back to "default" bucket when rateLimitType is missing', () => {
+    const store = freshStore();
+    store.set({ utilization: 0.6 } as RateLimitInfo);
+    expect(store.get(undefined)?.utilization).toBe(0.6);
+  });
+
+  it('ignores null/undefined input', () => {
+    const store = freshStore();
+    store.set(null as any);
+    store.set(undefined as any);
+    expect(store.size).toBe(0);
+  });
+
+  it('getMostRecentByWindow returns latest snapshots keyed by window', () => {
+    const store = freshStore();
+    store.set({ rateLimitType: 'five_hour', utilization: 0.1 });
+    store.set({ rateLimitType: 'seven_day_sonnet', utilization: 0.2 });
+    store.set({ rateLimitType: 'seven_day_opus', utilization: 0.3 });
+    const snap = store.getMostRecentByWindow();
+    expect(snap.five_hour?.utilization).toBe(0.1);
+    expect(snap.seven_day_sonnet?.utilization).toBe(0.2);
+    expect(snap.seven_day_opus?.utilization).toBe(0.3);
+    expect(snap.seven_day).toBeUndefined();
+  });
+
+  it('clear() drops all entries', () => {
+    const store = freshStore();
+    store.set({ rateLimitType: 'five_hour', utilization: 0.5 });
+    store.clear();
+    expect(store.size).toBe(0);
+    expect(store.get('five_hour')).toBeUndefined();
+  });
+});
+
+describe('isApiKeyAuth', () => {
+  it('matches verbose getAuthMethodDescription() output', () => {
+    expect(isApiKeyAuth('API key (from ~/.claude-mem/.env)')).toBe(true);
+    expect(isApiKeyAuth('Claude Code OAuth token (read from system keychain at spawn)')).toBe(false);
+  });
+
+  it('matches concise tokens', () => {
+    expect(isApiKeyAuth('api_key')).toBe(true);
+    expect(isApiKeyAuth('cli')).toBe(false);
+    expect(isApiKeyAuth('')).toBe(false);
+  });
+});
+
+describe('shouldAbortForQuota — api_key auth', () => {
+  let store: RateLimitStore;
+  beforeEach(() => {
+    store = freshStore();
+  });
+
+  it('never aborts even at five_hour utilization 0.99', () => {
+    store.set({ rateLimitType: 'five_hour', utilization: 0.99, status: 'allowed_warning' });
+    const decision = shouldAbortForQuota('api_key', store, FIXED_NOW);
+    expect(decision.abort).toBe(false);
+  });
+
+  it('never aborts even at seven_day_opus 0.99', () => {
+    store.set({ rateLimitType: 'seven_day_opus', utilization: 0.99 });
+    const decision = shouldAbortForQuota('API key (from ~/.claude-mem/.env)', store, FIXED_NOW);
+    expect(decision.abort).toBe(false);
+  });
+
+  it('never aborts when reset is imminent', () => {
+    store.set({
+      rateLimitType: 'five_hour',
+      utilization: 0.92,
+      resetsAt: FIXED_NOW + 60_000, // 1 min away
+    });
+    const decision = shouldAbortForQuota('api_key', store, FIXED_NOW);
+    expect(decision.abort).toBe(false);
+  });
+});
+
+describe('shouldAbortForQuota — cli/oauth auth', () => {
+  const cliAuth = 'Claude Code OAuth token (read from system keychain at spawn)';
+  let store: RateLimitStore;
+  beforeEach(() => {
+    store = freshStore();
+  });
+
+  it('aborts on five_hour at 0.96 with reason mentioning "five_hour"', () => {
+    store.set({ rateLimitType: 'five_hour', utilization: 0.96 });
+    const decision = shouldAbortForQuota(cliAuth, store, FIXED_NOW);
+    expect(decision.abort).toBe(true);
+    expect(decision.window).toBe('five_hour');
+    expect(decision.reason).toContain('five_hour');
+  });
+
+  it('does not abort on five_hour at 0.94 (below 0.95 threshold, no reset pressure)', () => {
+    store.set({
+      rateLimitType: 'five_hour',
+      utilization: 0.94,
+      resetsAt: FIXED_NOW + 60 * 60 * 1000, // 1h away
+    });
+    const decision = shouldAbortForQuota(cliAuth, store, FIXED_NOW);
+    expect(decision.abort).toBe(false);
+  });
+
+  it('aborts on seven_day_opus at 0.94 (>= 0.93 threshold)', () => {
+    store.set({ rateLimitType: 'seven_day_opus', utilization: 0.94 });
+    const decision = shouldAbortForQuota(cliAuth, store, FIXED_NOW);
+    expect(decision.abort).toBe(true);
+    expect(decision.window).toBe('seven_day_opus');
+  });
+
+  it('aborts on seven_day_sonnet at 0.93 (>= 0.92 threshold)', () => {
+    store.set({ rateLimitType: 'seven_day_sonnet', utilization: 0.93 });
+    const decision = shouldAbortForQuota(cliAuth, store, FIXED_NOW);
+    expect(decision.abort).toBe(true);
+    expect(decision.window).toBe('seven_day_sonnet');
+  });
+
+  it('aborts on five_hour at 0.90 with resetsAt 10 min away (grace buffer)', () => {
+    store.set({
+      rateLimitType: 'five_hour',
+      utilization: 0.90,
+      resetsAt: FIXED_NOW + 10 * 60 * 1000, // 10 min
+    });
+    const decision = shouldAbortForQuota(cliAuth, store, FIXED_NOW);
+    expect(decision.abort).toBe(true);
+    expect(decision.window).toBe('five_hour');
+    expect(decision.reason).toContain('resets');
+  });
+
+  it('does not abort on five_hour at 0.90 with resetsAt 30 min away (outside grace)', () => {
+    store.set({
+      rateLimitType: 'five_hour',
+      utilization: 0.90,
+      resetsAt: FIXED_NOW + 30 * 60 * 1000, // 30 min
+    });
+    const decision = shouldAbortForQuota(cliAuth, store, FIXED_NOW);
+    expect(decision.abort).toBe(false);
+  });
+
+  it('does not abort when all windows are below threshold', () => {
+    store.set({ rateLimitType: 'five_hour', utilization: 0.5 });
+    store.set({ rateLimitType: 'seven_day_opus', utilization: 0.4 });
+    store.set({ rateLimitType: 'seven_day_sonnet', utilization: 0.3 });
+    const decision = shouldAbortForQuota(cliAuth, store, FIXED_NOW);
+    expect(decision.abort).toBe(false);
+  });
+
+  it('skips reset-grace check when utilization is below the floor', () => {
+    // resetsAt is within the grace window, but utilization is well below the
+    // 0.85 floor: no point aborting on a window that just reset.
+    store.set({
+      rateLimitType: 'five_hour',
+      utilization: 0.10,
+      resetsAt: FIXED_NOW + 5 * 60 * 1000,
+    });
+    const decision = shouldAbortForQuota(cliAuth, store, FIXED_NOW);
+    expect(decision.abort).toBe(false);
+  });
+
+  it('reports the first matching window when multiple are over threshold', () => {
+    store.set({ rateLimitType: 'five_hour', utilization: 0.99 });
+    store.set({ rateLimitType: 'seven_day_opus', utilization: 0.99 });
+    const decision = shouldAbortForQuota(cliAuth, store, FIXED_NOW);
+    expect(decision.abort).toBe(true);
+    // five_hour is checked first per the iteration order.
+    expect(decision.window).toBe('five_hour');
+  });
+
+  it('does not abort with empty store', () => {
+    const decision = shouldAbortForQuota(cliAuth, store, FIXED_NOW);
+    expect(decision.abort).toBe(false);
+  });
+});