perf: streamline worker startup and consolidate database connections (#2122)

* docs: pathfinder refactor corpus + Node 20 preflight

Adds the PATHFINDER-2026-04-22 principle-driven refactor plan (11 docs,
cross-checked PASS) plus the exploratory PATHFINDER-2026-04-21 corpus
that motivated it. Bumps engines.node to >=20.0.0 per the ingestion-path
plan preflight (recursive fs.watch). Adds the pathfinder skill.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* refactor: land PATHFINDER Plan 01 — data integrity

Schema, UNIQUE constraints, self-healing claim, Chroma upsert fallback.

- Phase 1: fresh schema.sql regenerated at post-refactor shape.
- Phase 2: migrations 23+24 — rebuild pending_messages without
  started_processing_at_epoch; UNIQUE(session_id, tool_use_id);
  UNIQUE(memory_session_id, content_hash) on observations; dedup
  duplicate rows before adding indexes.
- Phase 3: claimNextMessage rewritten to self-healing query using
  worker_pid NOT IN live_worker_pids; STALE_PROCESSING_THRESHOLD_MS
  and the 60-s stale-reset block deleted.
- Phase 4: DEDUP_WINDOW_MS and findDuplicateObservation deleted;
  observations.insert now uses ON CONFLICT DO NOTHING.
- Phase 5: failed-message purge block deleted from worker-service
  2-min interval; clearFailedOlderThan method deleted.
- Phase 6: repairMalformedSchema and its Python subprocess repair
  path deleted from Database.ts; SQLite errors now propagate.
- Phase 7: Chroma delete-then-add fallback gated behind
  CHROMA_SYNC_FALLBACK_ON_CONFLICT env flag as bridge until
  Chroma MCP ships native upsert.
- Phase 8: migration 19 no-op block absorbed into fresh schema.sql.
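
A minimal in-memory model of the Phase 3 claim predicate (row shape
and function name are illustrative, not the real schema or query):

```typescript
// Hypothetical row shape for pending_messages after migrations 23+24.
interface PendingRow {
  id: number;
  status: "pending" | "processing" | "failed";
  worker_pid: number | null;
}

// Self-healing claim: a row is claimable when it is pending, or when a
// now-dead worker left it in processing. No staleness timer involved.
function claimNextMessage(
  rows: PendingRow[],
  livePids: Set<number>,
): PendingRow | undefined {
  return rows.find(
    (r) =>
      r.status === "pending" ||
      (r.status === "processing" &&
        r.worker_pid !== null &&
        !livePids.has(r.worker_pid)),
  );
}
```

The real implementation expresses the same predicate in SQL
(worker_pid NOT IN live_worker_pids); the model only shows why no
STALE_PROCESSING_THRESHOLD_MS is needed once liveness is checked
directly.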

Verification greps all return 0 matches. bun test tests/sqlite/
passes 63/63. bun run build succeeds.

Plan: PATHFINDER-2026-04-22/01-data-integrity.md

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* refactor: land PATHFINDER Plan 02 — process lifecycle

OS process groups replace hand-rolled reapers. Worker runs until
killed; orphans are prevented by detached spawn + kill(-pgid).

- Phase 1: src/services/worker/ProcessRegistry.ts DELETED. The
  canonical registry at src/supervisor/process-registry.ts is the
  sole survivor; SDK spawn site consolidated into it via new
  createSdkSpawnFactory/spawnSdkProcess/getSdkProcessForSession/
  ensureSdkProcessExit/waitForSlot helpers.
- Phase 2: SDK children spawn with detached:true + stdio:
  ['ignore','pipe','pipe']; pgid recorded on ManagedProcessInfo.
- Phase 3: shutdown.ts signalProcess teardown uses
  process.kill(-pgid, signal) on Unix when pgid is recorded;
  Windows path unchanged (tree-kill/taskkill).
- Phase 4: all reaper intervals deleted — startOrphanReaper call,
  staleSessionReaperInterval setInterval (including the co-located
  WAL checkpoint — SQLite's built-in wal_autocheckpoint handles
  WAL growth without an app-level timer), killIdleDaemonChildren,
  killSystemOrphans, reapOrphanedProcesses, reapStaleSessions, and
  detectStaleGenerator. MAX_GENERATOR_IDLE_MS and MAX_SESSION_IDLE_MS
  constants deleted.
- Phase 5: abandonedTimer — already 0 matches; primary-path cleanup
  via generatorPromise.finally() already lives in worker-service
  startSessionProcessor and SessionRoutes ensureGeneratorRunning.
- Phase 6: evictIdlestSession and its evict callback deleted from
  SessionManager. Pool admission gates backpressure upstream.
- Phase 7: SDK-failure fallback — SessionManager has zero matches
  for fallbackAgent/Gemini/OpenRouter. Failures surface to hooks
  via exit code 2 through SessionRoutes error mapping.
- Phase 8: ensureWorkerRunning in worker-utils.ts rewritten to
  lazy-spawn — consults isWorkerPortAlive (which gates
  captureProcessStartToken for PID-reuse safety via commit
  99060bac), then spawns detached with unref(), then
  waitForWorkerPort({ attempts: 3, backoffMs: 250 }) hand-rolled
  exponential backoff 250→500→1000ms. No respawn npm dep.
- Phase 9: idle self-shutdown — zero matches for
  idleCheck/idleTimeout/IDLE_MAX_MS/idleShutdown. Worker exits
  only on external SIGTERM via supervisor signal handlers.
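
The Phase 2/3 teardown decision can be sketched as a pure helper
(names are illustrative; the real logic lives in shutdown.ts):

```typescript
// Hypothetical shape mirroring the pgid recorded on ManagedProcessInfo.
interface ManagedProcessInfo {
  pid: number;
  pgid?: number;
}

// On Unix, children spawned with detached: true get their own process
// group, so a single kill(-pgid) reaps the whole tree; Windows keeps
// the per-process tree-kill path.
function killTarget(info: ManagedProcessInfo, platform: string): number {
  if (platform !== "win32" && info.pgid !== undefined) {
    return -info.pgid; // negative pid targets the process group
  }
  return info.pid;
}
```

In the real teardown the result feeds process.kill(target, signal);
here only the target selection is modeled.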

Three test files that exercised deleted code removed:
tests/worker/process-registry.test.ts,
tests/worker/session-lifecycle-guard.test.ts,
tests/services/worker/reap-stale-sessions.test.ts.
Pass count: 1451 → 1407 (-44), all attributable to deleted test
files. Zero new failures. 31 pre-existing failures remain
(schema-repair suite, logger-usage-standards, environmental
openclaw / plugin-distribution) — none introduced by Plan 02.

All 10 verification greps return 0. bun run build succeeds.

Plan: PATHFINDER-2026-04-22/02-process-lifecycle.md

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* refactor: land PATHFINDER Plan 04 (narrowed) — search fail-fast

Phases 3, 5, 6 only. Phases 1/2/4/7/8/9 carry plan-doc
inaccuracies and are deferred pending plan reconciliation:
  - Phase 1/2: ObservationRow type doesn't exist; the four
    "formatters" operate on three incompatible types.
  - Phase 4: RECENCY_WINDOW_MS already imported from
    SEARCH_CONSTANTS at every call site.
  - Phase 7: getExistingChromaIds is NOT @deprecated and has an
    active caller in ChromaSync.backfillMissingSyncs.
  - Phase 8: estimateTokens already consolidated.
  - Phase 9: knowledge-corpus rewrite blocked on PG-3
    prompt-caching cost smoke test.

Phase 3 — Delete SearchManager.findByConcept/findByFile/findByType.
SearchRoutes handlers (handleSearchByConcept/File/Type) now call
searchManager.getOrchestrator().findByXxx() directly via new
getter accessors on SearchManager. ~250 LoC deleted.

Phase 5 — Fail-fast Chroma. Created
src/services/worker/search/errors.ts with ChromaUnavailableError
extends AppError(503, 'CHROMA_UNAVAILABLE'). Deleted
SearchOrchestrator.executeWithFallback's Chroma-failed
SQLite-fallback branch; runtime Chroma errors now throw 503.
"Path 3" (chromaSync was null at construction — explicit-
uninitialized config) preserved as legitimate empty-result state
per plan text. ChromaSearchStrategy.search no longer wraps in
try/catch — errors propagate.
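
A sketch of the new error type; the AppError base here is a stand-in
inferred from the commit text, not the real class:

```typescript
// Stand-in base class; the real AppError lives elsewhere in the tree.
class AppError extends Error {
  constructor(
    public readonly statusCode: number,
    public readonly code: string,
    message?: string,
  ) {
    super(message ?? code);
    this.name = new.target.name;
  }
}

// Fail-fast: a runtime Chroma failure surfaces as HTTP 503 instead of
// silently falling back to SQLite.
class ChromaUnavailableError extends AppError {
  constructor() {
    super(503, "CHROMA_UNAVAILABLE", "Chroma vector store unreachable");
  }
}
```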

Phase 6 — Delete HybridSearchStrategy three try/catch silent
fallback blocks (findByConcept, findByType, findByFile) at lines
~82-95, ~120-132, ~161-172. Removed `fellBack` field from
StrategySearchResult type and every return site
(SQLiteSearchStrategy, BaseSearchStrategy.emptyResult,
SearchOrchestrator).

Tests updated (Principle 7 — delete in same PR):
  - search-orchestrator.test.ts: "fall back to SQLite" rewritten
    as "throw ChromaUnavailableError (HTTP 503)".
  - chroma/hybrid/sqlite-search-strategy tests: rewritten to
    rejects.toThrow; removed fellBack assertions.

Verification: SearchManager.findBy → 0; fellBack → 0 in src/.
bun test tests/worker/search/ → 122 pass, 0 fail.
bun test (suite-wide) → 1407 pass, baseline maintained, 0 new
failures. bun run build succeeds.

Plan: PATHFINDER-2026-04-22/04-read-path.md (Phases 3, 5, 6)

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* refactor: land PATHFINDER Plan 03 — ingestion path

Fail-fast parser, direct in-process ingest, recursive fs.watch,
DB-backed tool pairing. Worker-internal HTTP loopback eliminated.

- Phase 0: Created src/services/worker/http/shared.ts exporting
  ingestObservation/ingestPrompt/ingestSummary as direct
  in-process functions plus ingestEventBus (Node EventEmitter,
  reusing existing pattern — no third event bus introduced).
  setIngestContext wires the SessionManager dependency from
  worker-service constructor.
- Phase 1: src/sdk/parser.ts collapsed to one parseAgentXml
  returning { valid:true; kind: 'observation'|'summary'; data }
  | { valid:false; reason: string }. Inspects root element;
  <skip_summary reason="…"/> is a first-class summary case
  with skipped:true. NEVER returns undefined. NEVER coerces.
- Phase 2: ResponseProcessor calls parseAgentXml exactly once,
  branches on the discriminated union. On invalid → markFailed
  + logger.warn(reason). On observation → ingestObservation.
  On summary → ingestSummary then emit summaryStoredEvent
  { sessionId, messageId } (consumed by Plan 05's blocking
  /api/session/end).
- Phase 3: Deleted consecutiveSummaryFailures field
  (ResponseProcessor + SessionManager + worker-types) and
  MAX_CONSECUTIVE_SUMMARY_FAILURES constant. Circuit-breaker
  guards and "tripped" log lines removed.
- Phase 4: coerceObservationToSummary deleted from sdk/parser.ts.
- Phase 5: src/services/transcripts/watcher.ts rescan setInterval
  replaced with fs.watch(transcriptsRoot, { recursive: true,
  persistent: true }) — Node 20+ recursive mode.
- Phase 6: src/services/transcripts/processor.ts pendingTools
  Map deleted. tool_use rows insert with INSERT OR IGNORE on
  UNIQUE(session_id, tool_use_id) (added by Plan 01). New
  pairToolUsesByJoin query in PendingMessageStore for read-time
  pairing (UNIQUE INDEX provides idempotency; explicit consumer
  not yet wired).
- Phase 7: HTTP loopback at processor.ts:252 replaced with
  direct ingestObservation call. maybeParseJson silent-passthrough
  rewritten to fail-fast (throws on malformed JSON).
- Phase 8: src/utils/tag-stripping.ts countTags + stripTagsInternal
  collapsed into one alternation regex, single-pass over input.
- Phase 9: src/utils/transcript-parser.ts (dead TranscriptParser
  class) deleted. The active extractLastMessage at
  src/shared/transcript-parser.ts:41-144 is the sole survivor.
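
The Phase 1 contract in sketch form (a toy parser, only to show the
discriminated-union shape; the real parseAgentXml inspects actual XML):

```typescript
// Result shape per the commit text: never undefined, never coerced.
type ParseResult =
  | { valid: true; kind: "observation" | "summary"; data: { skipped?: boolean } }
  | { valid: false; reason: string };

// Toy stand-in that branches on the root element only.
function parseAgentXmlSketch(xml: string): ParseResult {
  const root = /^<\s*([a-z_]+)/.exec(xml.trim())?.[1];
  if (root === "observation") return { valid: true, kind: "observation", data: {} };
  if (root === "summary") return { valid: true, kind: "summary", data: {} };
  // <skip_summary/> is a first-class summary case, not a coercion.
  if (root === "skip_summary") return { valid: true, kind: "summary", data: { skipped: true } };
  return { valid: false, reason: `unrecognized root element: ${root ?? "none"}` };
}
```

Callers branch on the union exactly once, which is what lets
ResponseProcessor (Phase 2) stay a single switch.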

Tests updated (Principle 7 — same-PR delete):
  - tests/sdk/parser.test.ts + parse-summary.test.ts: rewritten
    to assert discriminated-union shape; coercion-specific
    scenarios collapse into { valid:false } assertions.
  - tests/worker/agents/response-processor.test.ts: circuit-breaker
    describe block skipped; non-XML/empty-response tests assert
    fail-fast markFailed behavior.

Verification: every grep returns 0. transcript-parser.ts deleted.
bun run build succeeds. bun test → 1399 pass / 28 fail / 7 skip
(net -8 pass = the 4 retired circuit-breaker tests + 4 collapsed
parser cases). Zero new failures vs baseline.

Deferred (out of Plan 03 scope, will land in Plan 06): SessionRoutes
HTTP route handlers still call sessionManager.queueObservation
inline rather than the new shared helpers — the helpers are ready,
the route swap is mechanical and belongs with the Zod refactor.

Plan: PATHFINDER-2026-04-22/03-ingestion-path.md

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* refactor: land PATHFINDER Plan 05 — hook surface

Worker-call plumbing collapsed to one helper. Polling replaced by
server-side blocking endpoint. Fail-loud counter surfaces persistent
worker outages via exit code 2.

- Phase 1: plugin/hooks/hooks.json — three 20-iteration `for i in
  1..20; do curl -sf .../health && break; sleep 0.1; done` shell
  retry wrappers deleted. Hook commands invoke their bun entry
  point directly.
- Phase 2: src/shared/worker-utils.ts — added
  executeWithWorkerFallback<T>(url, method, body) returning
  T | { continue: true; reason?: string }. All 8 hook handlers
  (observation, session-init, context, file-context, file-edit,
  summarize, session-complete, user-message) rewritten to use
  it instead of duplicating the ensureWorkerRunning →
  workerHttpRequest → fallback sequence.
- Phase 3: blocking POST /api/session/end in SessionRoutes.ts
  using validateBody + sessionEndSchema (z.object({sessionId})).
  One-shot ingestEventBus.on('summaryStoredEvent') listener,
  30 s timer, req.aborted handler — all share one cleanup so
  the listener cannot leak. summarize.ts polling loop, plus
  MAX_WAIT_FOR_SUMMARY_MS / POLL_INTERVAL_MS constants, deleted.
- Phase 4: src/shared/hook-settings.ts — loadFromFileOnce()
  memoizes SettingsDefaultsManager.loadFromFile per process.
  Per-handler settings reads collapsed.
- Phase 5: src/shared/should-track-project.ts — single exclusion
  check entry; isProjectExcluded no longer referenced from
  src/cli/handlers/.
- Phase 6: cwd validation pushed into adapter normalizeInput
  (6 adapters, among them claude-code, cursor, raw, gemini-cli,
  and windsurf). New AdapterRejectedInput error in
  src/cli/adapters/errors.ts. Handler-level isValidCwd checks
  deleted from file-edit.ts and observation.ts. hook-command.ts
  catches AdapterRejectedInput → graceful fallback.
- Phase 7: session-init.ts conditional initAgent guard deleted;
  initAgent is idempotent. tests/hooks/context-reinjection-guard
  test (validated the deleted conditional) deleted in same PR
  per Principle 7.
- Phase 8: fail-loud counter at
  ~/.claude-mem/state/hook-failures.json. Atomic write via
  .tmp + rename. CLAUDE_MEM_HOOK_FAIL_LOUD_THRESHOLD setting
  (default 3). On consecutive worker-unreachable ≥ N:
  process.exit(2). On success: reset to 0. NOT a retry.
- Phase 9: ensureWorkerAliveOnce() module-scope memoization
  wrapping ensureWorkerRunning. executeWithWorkerFallback calls
  the memoized version.
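
The Phase 3 one-cleanup pattern, sketched against a bare EventEmitter
(names are illustrative; the real endpoint also routes req.aborted
through the same cleanup):

```typescript
import { EventEmitter } from "node:events";

// One listener, one timer, one shared cleanup: nothing can leak.
function waitForSummaryStored(
  bus: EventEmitter,
  sessionId: string,
  timeoutMs: number,
): Promise<{ ok: boolean }> {
  return new Promise((resolve) => {
    const onStored = (evt: { sessionId: string }) => {
      if (evt.sessionId !== sessionId) return;
      cleanup();
      resolve({ ok: true });
    };
    const timer = setTimeout(() => {
      cleanup();
      resolve({ ok: false }); // surfaces as a timeout to the hook
    }, timeoutMs);
    function cleanup() {
      clearTimeout(timer);
      bus.off("summaryStoredEvent", onStored);
    }
    bus.on("summaryStoredEvent", onStored);
  });
}
```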

Minimal validateBody middleware stub at
src/services/worker/http/middleware/validateBody.ts. Plan 06 will
expand with typed inference + error envelope conventions.

Verification: 4/4 grep targets pass. bun run build succeeds.
bun test → 1393 pass / 28 fail / 7 skip; -6 pass attributable
solely to deleted context-reinjection-guard test file. Zero new
failures vs baseline.

Plan: PATHFINDER-2026-04-22/05-hook-surface.md

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* refactor: land PATHFINDER Plan 06 — API surface

One Zod-based validator wrapping every POST/PUT. Rate limiter,
diagnostic endpoints, and shutdown wrappers deleted.
Failure-marking consolidated to one helper.

- Phase 1 (preflight): zod@^3 already installed.
- Phase 2: validateBody middleware confirmed at canonical shape
  in src/services/worker/http/middleware/validateBody.ts —
  safeParse → 400 { error: 'ValidationError', issues: [...] }
  on failure, replaces req.body with parsed value on success.
- Phase 3: Per-route Zod schemas declared at the top of each
  route file. 24 POST endpoints across SessionRoutes,
  CorpusRoutes, DataRoutes, MemoryRoutes, SearchRoutes,
  LogsRoutes, SettingsRoutes now wrap with validateBody().
  /api/session/end (Plan 05) confirmed using same middleware.
- Phase 4: validateRequired() deleted from BaseRouteHandler
  along with every call site. Inline coercion helpers
  (coerceStringArray, coercePositiveInteger) and inline
  if (!req.body...) guards deleted across all route files.
- Phase 5: Rate limiter middleware and its registration deleted
  from src/services/worker/http/middleware.ts. Worker binds
  127.0.0.1:37777 — no untrusted caller.
- Phase 6: viewer.html cached at module init in ViewerRoutes.ts
  via fs.readFileSync; served as Buffer with text/html content
  type. SKILL.md + per-operation .md files cached in
  Server.ts as Map<string, string>; loadInstructionContent
  helper deleted. NO fs.watch, NO TTL — process restart is the
  cache-invalidation event.
- Phase 7: Four diagnostic endpoints deleted from DataRoutes.ts
  — /api/pending-queue (GET), /api/pending-queue/process (POST),
  /api/pending-queue/failed (DELETE), /api/pending-queue/all
  (DELETE). Helper methods that ONLY served them
  (getQueueMessages, getStuckCount, getRecentlyProcessed,
  clearFailed, clearAll) deleted from PendingMessageStore.
  KEPT: /api/processing-status (observability), /health
  (used by ensureWorkerRunning).
- Phase 8: stopSupervisor wrapper deleted from supervisor/index.ts.
  GracefulShutdown now calls getSupervisor().stop() directly.
  Two functions retained with clear roles:
    - performGracefulShutdown — worker-side 6-step shutdown
    - runShutdownCascade — supervisor-side child teardown
      (process.kill(-pgid), Windows tree-kill, PID-file cleanup)
  Each has unique non-trivial logic and a single canonical caller.
- Phase 9: transitionMessagesTo(status, filter) is the sole
  failure-marking path on PendingMessageStore. Old methods
  markSessionMessagesFailed and markAllSessionMessagesAbandoned
  deleted along with all callers (worker-service,
  SessionCompletionHandler, tests/zombie-prevention).

Tests updated (Principle 7 same-PR delete): coercion test files
refactored to chain validateBody → handler. Zombie-prevention
tests rewritten to call transitionMessagesTo.

Verification: all 4 grep targets → 0. bun run build succeeds.
bun test → 1393 pass / 28 fail / 7 skip — exact match to
baseline. Zero new failures.

Plan: PATHFINDER-2026-04-22/06-api-surface.md

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* refactor: land PATHFINDER Plan 07 — dead code sweep

ts-prune-driven sweep across the tree after Plans 01-06 landed.
Deleted unused exports, orphan helpers, and one fully orphaned
file. Earlier-plan deletions verified.

Deleted:
- src/utils/bun-path.ts (entire file — getBunPath, getBunPathOrThrow,
  isBunAvailable: zero importers)
- bun-resolver.getBunVersionString: zero callers
- PendingMessageStore.retryMessage / resetProcessingToPending /
  abortMessage: superseded by transitionMessagesTo (Plan 06 Phase 9)
- EnvManager.MANAGED_CREDENTIAL_KEYS, EnvManager.setCredential:
  zero callers
- CodexCliInstaller.checkCodexCliStatus: zero callers; no status
  command exists in npx-cli
- Two "REMOVED: cleanupOrphanedSessions" stale-fence comments

Kept (with documented justification):
- Public API surface in dist/sdk/* (parseAgentXml, prompt
  builders, ParsedObservation, ParsedSummary, ParseResult,
  SUMMARY_MODE_MARKER) — exported via package.json sdk path.
- generateContext / loadContextConfig / token utilities — used
  via dynamic await import('../../../context-generator.js') in
  worker SearchRoutes.
- MCP_IDE_INSTALLERS, install/uninstall functions for codex/goose
  — used via dynamic await import in npx-cli/install.ts +
  uninstall.ts (ts-prune cannot trace dynamic imports).
- getExistingChromaIds — active caller in
  ChromaSync.backfillMissingSyncs (Plan 04 narrowed scope).
- processPendingQueues / getSessionsWithPendingMessages — active
  orphan-recovery caller in worker-service.ts plus
  zombie-prevention test coverage.
- StoreAndMarkCompleteResult legacy alias — return-type annotation
  in same file.
- All Database.ts barrel re-exports — used downstream.

Earlier-plan verification:
- Plan 03 Phase 9: VERIFIED — src/utils/transcript-parser.ts
  is gone; TranscriptParser has 0 references in src/.
- Plan 01 Phase 8: VERIFIED — migration 19 no-op absorbed.
- SessionStore.ts:52-70 consolidation NOT executed (deferred):
  the methods are not thin wrappers but ~900 LoC of bodies, and
  two methods are documented as intentional mirrors so the
  context-generator.cjs bundle stays schema-consistent without
  pulling MigrationRunner. Deserves its own plan, not a sweep.

Verification: TranscriptParser → 0; transcript-parser.ts → gone;
no commented-out code markers remain. bun run build succeeds.
bun test → 1393 pass / 28 fail / 7 skip — EXACT match to
baseline. Zero regressions.

Plan: PATHFINDER-2026-04-22/07-dead-code.md

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* chore: remove residual ProcessRegistry comment reference

Plan 07 dead-code sweep missed one comment-level reference to the
deleted in-memory ProcessRegistry class in SessionManager.ts:347.
Rewritten to describe the supervisor.json scope without naming the
deleted class, completing the verification grep target.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix: address Greptile review (P1 + 2× P2)

P1 — Plan 05 Phase 3 blocking endpoint was non-functional:
executeWithWorkerFallback used HEALTH_CHECK_TIMEOUT_MS (3 s) for
the POST /api/session/end call, but the server holds the
connection for SERVER_SIDE_SUMMARY_TIMEOUT_MS (30 s). Client
always raced to a "timed out" rejection that isWorkerUnavailable
classified as worker-unreachable, so the hook silently degraded
instead of waiting for summaryStoredEvent.
  - Added optional timeoutMs to executeWithWorkerFallback,
    forwarded to workerHttpRequest.
  - summarize.ts call site now passes 35_000 (5 s above server
    hold window).

P2 — ingestSummary({ kind: 'parsed' }) branch was dead code:
ResponseProcessor emitted summaryStoredEvent directly via the
event bus, bypassing the centralized helper that the comment
claimed was the single source.
  - ResponseProcessor now calls ingestSummary({ kind: 'parsed',
    sessionDbId, messageId, contentSessionId, parsed }) so the
    event-emission path is single-sourced.
  - ingestSummary's requireContext() resolution moved inside the
    'queue' branch (the only branch that needs sessionManager /
    dbManager). 'parsed' is a pure event-bus emission and
    doesn't need worker-internal context — fixes mocked
    ResponseProcessor unit tests that don't call
    setIngestContext.

P2 — isWorkerFallback could false-positive on legitimate API
responses whose schema includes { continue: true, ... }:
  - Added a Symbol.for('claude-mem/worker-fallback') brand to
    WorkerFallback. isWorkerFallback now checks the brand, not
    a duck-typed property name.
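
The brand fix in sketch form (helper names are illustrative):

```typescript
const FALLBACK_BRAND = Symbol.for("claude-mem/worker-fallback");

type WorkerFallback = { continue: true; reason?: string };

function makeWorkerFallback(reason?: string): WorkerFallback {
  // Brand with a symbol key: JSON.parse output can never carry it, so
  // a legitimate API response shaped { continue: true } cannot match.
  return Object.assign({ continue: true as const, reason }, { [FALLBACK_BRAND]: true });
}

function isWorkerFallback(value: unknown): boolean {
  return (
    typeof value === "object" &&
    value !== null &&
    (value as Record<symbol, unknown>)[FALLBACK_BRAND] === true
  );
}
```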

Verification: bun run build succeeds. bun test → 1393 pass /
28 fail / 7 skip — exact baseline match. Zero new failures.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix: address Greptile iteration 2 (P1 + P2)

P1 — summaryStoredEvent fired regardless of whether the row was
persisted. ResponseProcessor's call to ingestSummary({ kind:
'parsed' }) ran for every parsed.kind === 'summary' even when
result.summaryId came back null (e.g. FK violation, null
memory_session_id at commit). The blocking /api/session/end
endpoint then returned { ok: true } and the Stop hook logged
'Summary stored' for a non-existent row.

  - Gate ingestSummary call on (parsed.data.skipped ||
    session.lastSummaryStored). Skipped summaries are an explicit
    no-op bypass and still confirm; real summaries only confirm
    when storage actually wrote a row.
  - Non-skipped + summaryId === null path logs a warn and lets
    the server-side timeout (504) surface to the hook instead of
    a false ok:true.

P2 — PendingMessageStore.enqueue() returns 0 when INSERT OR
IGNORE suppresses a duplicate (the UNIQUE(session_id, tool_use_id)
constraint added by Plan 01 Phase 1). The two callers
(SessionManager.queueObservation and queueSummarize) previously
logged 'ENQUEUED messageId=0' which read like a row was inserted.

  - Branch on messageId === 0 and emit a 'DUP_SUPPRESSED' debug
    log instead of the misleading ENQUEUED line. No behavior
    change — the duplicate is still correctly suppressed by the
    DB (Principle 3); only the log surface is corrected.
  - confirmProcessed is never called with the enqueue() return
    value (it operates on session.processingMessageIds[] from
    claimNextMessage), so no caller is broken; the visibility
    fix prevents future misuse.
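
The corrected log surface is just a branch on the sentinel (function
and log names are illustrative):

```typescript
// Per the description above, enqueue() yields 0 when INSERT OR IGNORE
// suppressed a duplicate, i.e. no row was written.
function enqueueLogLine(messageId: number, sessionId: string): string {
  return messageId === 0
    ? `DUP_SUPPRESSED session=${sessionId}` // collapsed by the UNIQUE constraint
    : `ENQUEUED messageId=${messageId} session=${sessionId}`;
}
```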

Verification: bun run build succeeds. bun test → 1393 pass /
28 fail / 7 skip — exact baseline match. Zero new failures.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix: address Greptile iteration 3 (P1 + 2× P2)

- P1 worker-service.ts: wire ensureGeneratorRunning into the ingest
  context after SessionRoutes is constructed. setIngestContext runs
  before routes exist, so transcript-watcher observations queued via
  ingestObservation() had no way to auto-start the SDK generator.
  Added attachIngestGeneratorStarter() to patch the callback in.
- P2 shared.ts: IngestEventBus now sets maxListeners to 0. Concurrent
  /api/session/end calls register one listener each and clean up on
  completion, so the default limit of 10 would otherwise emit spurious
  MaxListenersExceededWarning noise under normal load.
- P2 SessionRoutes.ts: handleObservationsByClaudeId now delegates to
  ingestObservation() instead of duplicating skip-tool / meta /
  privacy / queue logic. Single helper, matching the Plan 03 goal.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix: address Greptile iteration 4 (P1 tool-pair + P2 parse/path/doc)

- processor.handleToolResult: restore in-memory tool-use→tool-result
  pairing via session.pendingTools for schemas (e.g. Codex) whose
  tool_result events carry only tool_use_id + output. Without this,
  neither handler fired — all tool observations silently dropped.
- processor.maybeParseJson: return raw string on parse failure instead
  of throwing. Previously a single malformed JSON-shaped field caused
  handleLine's outer catch to discard the entire transcript line.
- watcher.deepestNonGlobAncestor: split on / and \\, emit empty string
  for purely-glob inputs so the caller skips the watch instead of
  anchoring fs.watch at the filesystem root. Windows-compatible.
- PendingMessageStore.enqueue: tighten docstring — callers today only
  log on the returned id; the SessionManager branches on id === 0.

* fix: forward tool_use_id through ingestObservation (Greptile iter 5)

P1 — Plan 01's UNIQUE(content_session_id, tool_use_id) dedup never
fired because the new shared ingest path dropped the toolUseId before
queueObservation. SQLite treats NULL values as distinct for UNIQUE,
so every replayed transcript line landed a duplicate row.

- shared.ingestObservation: forward payload.toolUseId to
  queueObservation so INSERT OR IGNORE can actually collapse.
- SessionRoutes.handleObservationsByClaudeId: destructure both
  tool_use_id (HTTP convention) and toolUseId (JS convention) from
  req.body and pass into ingestObservation.
- observationsByClaudeIdSchema: declare both keys explicitly so the
  validator doesn't rely on .passthrough() alone.

* fix: drop dead pairToolUsesByJoin, close session-end listener race

- PendingMessageStore: delete pairToolUsesByJoin. The method was never
  called and its self-join semantics are structurally incompatible
  with UNIQUE(content_session_id, tool_use_id): INSERT OR IGNORE
  collapses any second row with the same pair, so a self-join can
  only ever match a row to itself. In-memory pendingTools in
  processor.ts remains the pairing path for split-event schemas.

- IngestEventBus: retain a short-lived (60s) recentStored map keyed
  by sessionId. Populated on summaryStoredEvent emit, evicted on
  consume or TTL.

- handleSessionEnd: drain the recent-events buffer before attaching
  the listener. Closes the register-after-emit race where the summary
  can persist between the hook's summarize POST and its session/end
  POST — previously that window returned 504 after the 30s timeout.
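
A minimal model of the recent-events buffer (class name illustrative;
the real bus also evicts on a timer):

```typescript
// Emits are remembered for a short TTL so a listener that attaches
// after the event fired can still consume it.
class RecentEventBuffer {
  private recent = new Map<string, number>(); // sessionId -> emit time
  constructor(private readonly ttlMs = 60_000) {}

  record(sessionId: string, now: number): void {
    this.recent.set(sessionId, now);
  }

  // Drain before attaching the session/end listener: true means a
  // summaryStoredEvent already fired within the TTL.
  take(sessionId: string, now: number): boolean {
    const at = this.recent.get(sessionId);
    if (at === undefined || now - at > this.ttlMs) return false;
    this.recent.delete(sessionId); // evicted on consume
    return true;
  }
}
```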

* chore: merge origin/main into vivacious-teeth

Resolves conflicts with 15 commits on main (v12.3.9, security
observation types, Telegram notifier, PID-reuse worker start-guard).

Conflict resolution strategy:
- plugin/hooks/hooks.json, plugin/scripts/*.cjs, plugin/ui/viewer-bundle.js:
  kept ours — PATHFINDER Plan 05 deletes the for-i-in-1-to-20 curl retry
  loops and the built artifacts regenerate on build.
- src/cli/handlers/summarize.ts: kept ours — Plan 05 blocking
  POST /api/session/end supersedes main's fire-and-forget path.
- src/services/worker-service.ts: kept ours — Plan 05 ingest bus +
  summaryStoredEvent supersedes main's SessionCompletionHandler DI
  refactor + orphan-reaper fallback.
- src/services/worker/http/routes/SessionRoutes.ts: kept ours — same
  reason; generator .finally() Stop-hook self-clean is a guard for a
  path our blocking endpoint removes.
- src/services/worker/http/routes/CorpusRoutes.ts: merged — added
  security_alert / security_note to ALLOWED_CORPUS_TYPES (feature from
  #2084) while preserving our Zod validateBody schema.

Typecheck: 294 errors (vs 298 pre-merge). No new errors introduced; all
remaining are pre-existing (Component-enum gaps, DOM lib for viewer,
bun:sqlite types).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix: address Greptile P2 findings

1) SessionRoutes.handleSessionEnd was the only route handler not wrapped
   in wrapHandler — synchronous exceptions would hang the client rather
   than surfacing as 500s. Wrap it like every other handler.

2) processor.handleToolResult only consumed the session.pendingTools
   entry when the tool_result arrived without a toolName. In the
   split-schema path where tool_result carries both toolName and toolId,
   the entry was never deleted and the map grew for the life of the
   session. Consume the entry whenever toolId is present.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix: typing cleanup and viewer tsconfig split for PR feedback

- Add explicit return types for SessionStore query methods
- Exclude src/ui/viewer from root tsconfig, give it its own DOM-typed config
- Add bun to root tsconfig types, plus misc typing tweaks flagged by Greptile
- Rebuilt plugin/scripts/* artifacts

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix: address Greptile P2 findings (iter 2)

- PendingMessageStore.transitionMessagesTo: require sessionDbId (drop
  the unscoped-drain branch that would nuke every pending/processing
  row across all sessions if a future caller omitted the filter).
- IngestEventBus.takeRecentSummaryStored: make idempotent — keep the
  cached event until TTL eviction so a retried Stop hook's second
  /api/session/end returns immediately instead of hanging 30 s.
- TranscriptWatcher fs.watch callback: skip full glob scan for paths
  already tailed (JSONL appends fire on every line; only unknown
  paths warrant a rescan).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix: call finalizeSession in terminal session paths (Greptile iter 3)

terminateSession and runFallbackForTerminatedSession previously called
SessionCompletionHandler.finalizeSession before removeSessionImmediate;
the refactor dropped those calls, leaving sdk_sessions.status='active'
for every session killed by wall-clock limit, unrecoverable error, or
exhausted fallback chain. The deleted reapStaleSessions interval was
the only prior backstop.

Re-wires finalizeSession (idempotent: marks completed, drains pending,
broadcasts) into both paths; no reaper reintroduced.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix: GC failed pending_messages rows at startup (Greptile iter 4)

Plan 07 deleted clearFailed/clearFailedOlderThan as "dead code", but
with the periodic sweep also removed, nothing reaps status='failed'
rows now — they accumulate indefinitely. Since claimNextMessage's
self-healing subquery scans this table, unbounded growth degrades
claim latency over time.

Re-introduces clearFailedOlderThan and calls it once at worker startup
(not a reaper — one-shot, idempotent). 7-day retention keeps enough
history for operator inspection while bounding the table.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix: finalize sessions on normal exit; cleanup hoist; share handler (iter 5)

1. startSessionProcessor success branch now calls completionHandler.
   finalizeSession before removeSessionImmediate. Hooks-disabled installs
   (and any Stop hook that fails before POST /api/sessions/complete) no
   longer leave sdk_sessions rows as status='active' forever. Idempotent
   — a subsequent /api/sessions/complete is a no-op.

2. Hoist SessionRoutes.handleSessionEnd cleanup declaration above the
   closures that reference it (TDZ safety; safe at runtime today but
   fragile if timeout ever shrinks).

3. SessionRoutes now receives WorkerService's shared SessionCompletionHandler
   instead of constructing its own — prevents silent divergence if the
   handler ever becomes stateful.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix: stop runaway crash-recovery loop on dead sessions

Two distinct bugs combined to keep a dead session restarting forever:

Bug 1 (uncaught "The operation was aborted."):
  child_process.spawn emits 'error' asynchronously for ENOENT/EACCES/abort
  signal aborts. spawnSdkProcess() never attached an 'error' listener, so
  any async spawn failure became uncaughtException and escaped to the
  daemon-level handler. Attach an 'error' listener immediately after spawn,
  before the !child.pid early-return, so async spawn errors are logged
  (with errno code) and swallowed locally.
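The listener placement matters more than the listener itself; a minimal sketch (function name from the commit, logging shape assumed):

```typescript
import { spawn, type ChildProcess } from 'node:child_process';

// Sketch of the fix: the 'error' listener is attached immediately, before the
// !child.pid early return, so async spawn failures (ENOENT/EACCES/abort)
// are handled here instead of escaping as uncaughtException.
function spawnSdkProcess(cmd: string, args: string[]): ChildProcess | null {
  const child = spawn(cmd, args);
  child.on('error', (err: NodeJS.ErrnoException) => {
    // Logged with errno code and swallowed locally.
    console.warn(`sdk spawn failed: ${err.code ?? err.message}`);
  });
  if (!child.pid) return null;
  return child;
}
```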

Bug 2 (sliding-window limiter never trips on slow restart cadence):
  RestartGuard tripped only when restartTimestamps.length exceeded
  MAX_WINDOWED_RESTARTS (10) within RESTART_WINDOW_MS (60s). With the 8s
  exponential-backoff cap, only ~7-8 restarts fit in the window, so a dead
  session failing and restarting on 8s cycles would loop forever
  (consecutiveRestarts climbing past 30+ in observed logs). Add a
  consecutiveFailures counter that increments on every restart and resets
  only on recordSuccess(). Trip when consecutive failures exceed
  MAX_CONSECUTIVE_FAILURES (5) — meaning 5 restarts with zero successful
  processing in between proves the session is dead. Both guards now run in
  parallel: tight loops still trip the windowed cap; slow loops trip the
  consecutive-failure cap.
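The dual-guard limiter can be sketched as below — constants and method names follow the commit text, but the class internals are assumed, not the repo's actual implementation:

```typescript
// Sketch of the dual-guard restart limiter described above (details assumed).
const RESTART_WINDOW_MS = 60_000;
const MAX_WINDOWED_RESTARTS = 10;
const MAX_CONSECUTIVE_FAILURES = 5;

class RestartGuard {
  private restartTimestamps: number[] = [];
  private consecutiveFailures = 0;

  recordRestart(now: number): void {
    this.restartTimestamps.push(now);
    this.consecutiveFailures++;
  }

  recordSuccess(): void {
    this.consecutiveFailures = 0; // only real processing resets the counter
  }

  tripped(now: number): boolean {
    this.restartTimestamps = this.restartTimestamps.filter(
      (t) => now - t < RESTART_WINDOW_MS,
    );
    // Tight loops trip the windowed cap; slow 8s-backoff loops trip the
    // consecutive-failure cap. Either one marks the session dead.
    return (
      this.restartTimestamps.length > MAX_WINDOWED_RESTARTS ||
      this.consecutiveFailures > MAX_CONSECUTIVE_FAILURES
    );
  }
}
```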

Also: when the SessionRoutes path trips the guard, drain pending messages
to 'abandoned' so the session does not reappear in
getSessionsWithPendingMessages and trigger another auto-start cycle. The
worker-service.ts path already does this via terminateSession.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* perf: streamline worker startup and consolidate database connections

1. Database Pooling: Modified DatabaseManager, SessionStore, and SessionSearch to share a single bun:sqlite connection, eliminating redundant file descriptors.
2. Non-blocking Startup: Refactored WorktreeAdoption and Chroma backfill to run in the background (fire-and-forget), preventing them from stalling core initialization.
3. Diagnostic Routes: Added /api/chroma/status and bypassed the initialization guard for health/readiness endpoints to allow diagnostics during startup.
4. Robust Search: Implemented reliable SQLite FTS5 fallback in SearchManager for when Chroma (uvx) fails or is unavailable.
5. Code Cleanup: Removed redundant loopback MCP checks and cleaned up mangled initialization logic in WorkerService.
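Point 1 amounts to constructor injection of a single handle. A minimal sketch under assumed shapes — `SqliteHandle` stands in for the bun:sqlite `Database` type, and the class internals are illustrative:

```typescript
// Sketch of connection consolidation: the stores take an injected handle
// instead of each opening the database file, so the process holds one fd.
interface SqliteHandle {
  query(sql: string): unknown;
}

class DatabaseManager {
  constructor(public readonly db: SqliteHandle) {}
}

class SessionStore {
  constructor(public readonly db: SqliteHandle) {}
}

class SessionSearch {
  constructor(public readonly db: SqliteHandle) {}
}

function buildStores(db: SqliteHandle) {
  const manager = new DatabaseManager(db);
  // Both consumers reuse the manager's handle; neither reopens the file.
  return {
    manager,
    store: new SessionStore(manager.db),
    search: new SessionSearch(manager.db),
  };
}
```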

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix: hard-exclude observer-sessions from hooks; bundle migration 29 (#2124)

* fix: hard-exclude observer-sessions from hooks; backfill bundle migrations

Stop hook + SessionEnd hook were storing the SDK observer's own
init/continuation/summary prompts in user_prompts, leaking into the
viewer (meta-observation regression). 25 such rows accumulated.

- shouldTrackProject: hard-reject OBSERVER_SESSIONS_DIR (and its subtree)
  before consulting user-configured exclusion globs.
- summarize.ts (Stop) and session-complete.ts (SessionEnd): early-return
  when shouldTrackProject(cwd) is false, so the observer's own hooks
  cannot bootstrap the worker or queue a summary against the meta-session.
- SessionRoutes: cap user-prompt body at 256 KiB at the session-init
  boundary so a runaway observer prompt cannot blow up storage.
- SessionStore: add migration 29 (UNIQUE(memory_session_id, content_hash)
  on observations) inline so bundled artifacts (worker-service.cjs,
  context-generator.cjs) stay schema-consistent — without it, the
  ON CONFLICT clause in observation inserts throws.
- spawnSdkProcess: stdio[stdin] from 'ignore' to 'pipe' so the
  supervisor can actually feed the observer's stdin.

Also rebuilds plugin/scripts/{worker-service,context-generator}.cjs.
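The migration-29 dependency can be sketched as paired SQL statements. The uniqueness target (memory_session_id, content_hash) comes from the commit; the index name and other column names are illustrative assumptions:

```typescript
// Assumed shape of migration 29 and the insert it unblocks.
const MIGRATION_29 = `
  CREATE UNIQUE INDEX IF NOT EXISTS idx_observations_session_hash
    ON observations (memory_session_id, content_hash)
`;

// Without the index above, SQLite rejects this conflict target when the
// statement is prepared ("ON CONFLICT clause does not match any PRIMARY KEY
// or unique constraint") — the throw the bundled artifacts were hitting.
const INSERT_OBSERVATION = `
  INSERT INTO observations (memory_session_id, content_hash, content)
  VALUES (?, ?, ?)
  ON CONFLICT (memory_session_id, content_hash) DO NOTHING
`;
```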

* fix: walk back to UTF-8 boundary on prompt truncation (Greptile P2)

Plain Buffer.subarray at MAX_USER_PROMPT_BYTES can land mid-codepoint,
which the utf8 decoder silently rewrites to U+FFFD. Walk back over any
continuation bytes (0b10xxxxxx) before decoding so the truncated prompt
ends on a valid sequence boundary instead of a replacement character.
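The walk-back is a few lines; a sketch of the technique (the constant name is from PR #2124, the function name is hypothetical):

```typescript
const MAX_USER_PROMPT_BYTES = 256 * 1024;

// Truncate to at most maxBytes without splitting a UTF-8 codepoint: walk back
// over continuation bytes (0b10xxxxxx) so the cut lands on a lead byte, then
// decode. A mid-codepoint cut would decode to U+FFFD.
function truncateUtf8Safe(prompt: string, maxBytes = MAX_USER_PROMPT_BYTES): string {
  const buf = Buffer.from(prompt, 'utf8');
  if (buf.length <= maxBytes) return prompt;
  let end = maxBytes;
  while (end > 0 && (buf[end] & 0b1100_0000) === 0b1000_0000) end--;
  return buf.subarray(0, end).toString('utf8');
}
```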

* fix: cross-platform observer-dir containment; clarify SDK stdin pipe

claude-review feedback on PR #2124.

- shouldTrackProject: literal `cwd.startsWith(OBSERVER_SESSIONS_DIR + '/')`
  hard-coded a POSIX separator and missed Windows backslash paths plus any
  trailing-slash variance. Switched to a path.relative-based isWithin()
  helper so Windows hook input under observer-sessions\\... is also excluded.
- spawnSdkProcess: added a comment explaining why stdin must be 'pipe' —
  SpawnedSdkProcess.stdin is typed NonNullable and the Claude Agent SDK
  consumes that pipe; 'ignore' would null it and the null-check below
  would tear the child down on every spawn.
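The containment helper can be sketched as below (the `isWithin` name is from the commit; the exact semantics are assumed). `path.relative` normalizes separator style, trailing slashes, and cross-drive Windows paths uniformly:

```typescript
import path from 'node:path';

// A child is inside parent iff the relative path neither climbs out with '..'
// nor lands on another drive (which yields an absolute relative path on Windows).
function isWithin(parent: string, child: string): boolean {
  const rel = path.relative(parent, child);
  if (rel === '') return true; // the directory itself counts as inside
  return (
    rel !== '..' &&
    !rel.startsWith('..' + path.sep) &&
    !path.isAbsolute(rel)
  );
}
```

Note that a naive `startsWith(dir + '/')` also misfires on sibling directories sharing the prefix (`observer-sessions-other`); the relative-path form handles that case too.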

* fix: make Stop hook fire-and-forget; remove dead /api/session/end

The Stop hook was awaiting a 35-second long-poll on /api/session/end,
which the worker held open until the summary-stored event fired (or its
30s server-side timeout elapsed), then awaiting another request to
/api/sessions/complete. Three sequential awaits, the middle one a 30s
hold — not fire-and-forget despite repeated requests.

The Stop hook now does ONE thing: POST /api/sessions/summarize to
queue the summary work and return. The worker drives the rest async.
Session-map cleanup is performed by the SessionEnd handler
(session-complete.ts), not duplicated here.

- summarize.ts: drop the /api/session/end long-poll and the trailing
  /api/sessions/complete await; ~40 lines removed; unused
  SessionEndResponse interface gone; header comment rewritten.
- SessionRoutes: delete handleSessionEnd, sessionEndSchema, the
  SERVER_SIDE_SUMMARY_TIMEOUT_MS constant, and the /api/session/end
  route registration. Drop the now-unused ingestEventBus and
  SummaryStoredEvent imports.
- ResponseProcessor + shared.ts + worker-utils.ts: update stale
  comments that referenced the dead endpoint. The IngestEventBus is
  left in place dormant (no listeners) for follow-up cleanup so this
  PR stays focused on the blocker.

Bundle artifact (worker-service.cjs) rebuilt via build-and-sync.

Verification:
- grep '/api/session/end' plugin/scripts/worker-service.cjs → 0
- grep 'timeoutMs:35' plugin/scripts/worker-service.cjs → 0
- Worker restarted clean, /api/health ok at pid 92368

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* deps: bump all dependencies to latest including majors

Upgrades: React 18→19, Express 4→5, Zod 3→4, TypeScript 5→6,
@types/node 20→25, @anthropic-ai/claude-agent-sdk 0.1→0.2,
@clack/prompts 0.9→1.2, plus minors. Adds Daily Maintenance section
to CLAUDE.md mandating latest-version policy across manifests.

Express 5 surfaced a race in Server.listen() where the 'error' handler
was attached after listen() was invoked; refactored to use
http.createServer with both 'error' and 'listening' handlers attached
before listen(), restoring port-conflict rejection semantics.
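A minimal sketch of that refactor (the wrapper name and exact shape are assumptions; an Express 5 app is still a plain request listener, so it drops in for `handler`):

```typescript
import http from 'node:http';

// Both handlers are attached before listen() is invoked, so an immediate
// EADDRINUSE rejects the promise instead of racing a handler registered
// after the fact.
function listenSafely(handler: http.RequestListener, port: number): Promise<http.Server> {
  return new Promise((resolve, reject) => {
    const server = http.createServer(handler);
    server.once('error', reject);
    server.once('listening', () => {
      server.removeListener('error', reject);
      resolve(server);
    });
    server.listen(port);
  });
}
```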

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix: surface real chroma errors and add deep status probe

Replace the misleading "Vector search failed - semantic search unavailable.
Install uv... restart the worker." string in SearchManager with the actual
exception text from chroma_query_documents. The lying message blamed `uv`
for any failure — even when the real cause was a chroma-mcp transport
timeout, an empty collection, or a dead subprocess.

Also add /api/chroma/status?deep=1 backed by a new
ChromaMcpManager.probeSemanticSearch() that round-trips a real query
(chroma_list_collections + chroma_query_documents) instead of just
checking the stdio handshake. The cheap default path is unchanged.

Includes the diagnostic plan (PLAN-fix-mcp-search.md) and updated test
fixtures for the new structured failure message.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* chore: rebuild worker-service bundle to match merged src

Bundle was stale after the squash merge of #2124 — it still contained
the old "Install uv... semantic search unavailable" string and lacked
probeSemanticSearch. Rebuilt via bun run build-and-sync.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* docs: address coderabbit feedback on PLAN-fix-mcp-search.md

- replace machine-specific /Users/alexnewman absolute paths with portable
  <repo-root> placeholder (MD-style portability)
- add blank lines around the TypeScript fenced block (MD031)
- tag the bare fenced block with `text` (MD040)

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
This commit is contained in:
Alex Newman
2026-04-25 13:37:40 -07:00
committed by GitHub
parent 8ace1d9c84
commit 94d592f212
159 changed files with 18091 additions and 5843 deletions
+8 -1
@@ -1,4 +1,5 @@
import type { PlatformAdapter, NormalizedHookInput, HookResult } from '../types.js';
import { AdapterRejectedInput, isValidCwd } from './errors.js';
// Maps Claude Code stdin format (session_id, cwd, tool_name, etc.)
// SessionStart hooks receive no stdin, so we must handle undefined input gracefully
@@ -12,9 +13,15 @@ const pickAgentField = (v: unknown): string | undefined =>
export const claudeCodeAdapter: PlatformAdapter = {
normalizeInput(raw) {
const r = (raw ?? {}) as any;
// Plan 05 Phase 6 — cwd validation at the adapter boundary (single check,
// not duplicated in handlers). Falls back to process.cwd() when unset.
const cwd = r.cwd ?? process.cwd();
if (!isValidCwd(cwd)) {
throw new AdapterRejectedInput('invalid_cwd');
}
return {
sessionId: r.session_id ?? r.id ?? r.sessionId,
cwd: r.cwd ?? process.cwd(),
cwd,
prompt: r.prompt,
toolName: r.tool_name,
toolInput: r.tool_input,
+7 -1
@@ -1,4 +1,5 @@
import type { PlatformAdapter, NormalizedHookInput, HookResult } from '../types.js';
import { AdapterRejectedInput, isValidCwd } from './errors.js';
// Maps Cursor stdin format - field names differ from Claude Code
// Cursor uses: conversation_id, workspace_roots[], result_json, command/output
@@ -13,9 +14,14 @@ export const cursorAdapter: PlatformAdapter = {
const r = (raw ?? {}) as any;
// Cursor-specific: shell commands come as command/output instead of tool_name/input/response
const isShellCommand = !!r.command && !r.tool_name;
// Plan 05 Phase 6 — cwd validation at the adapter boundary.
const cwd = r.workspace_roots?.[0] ?? r.cwd ?? process.cwd();
if (!isValidCwd(cwd)) {
throw new AdapterRejectedInput('invalid_cwd');
}
return {
sessionId: r.conversation_id || r.generation_id || r.id,
cwd: r.workspace_roots?.[0] ?? r.cwd ?? process.cwd(),
cwd,
prompt: r.prompt ?? r.query ?? r.input ?? r.message,
toolName: isShellCommand ? 'Bash' : r.tool_name,
toolInput: isShellCommand ? { command: r.command } : r.tool_input,
+24
@@ -0,0 +1,24 @@
/**
* Adapter-layer rejection. Plan 05 Phase 6 (PATHFINDER-2026-04-22): cwd
* validation moves from per-handler `if (!cwd) throw …` to the adapter
* boundary. When normalization detects an invalid input, the adapter throws
* `AdapterRejectedInput`; the hook runner translates it into a graceful
* `{ continue: true }` so the user's session is never blocked by a malformed
* hook payload.
*/
export class AdapterRejectedInput extends Error {
constructor(public readonly reason: string) {
super(`adapter rejected input: ${reason}`);
this.name = 'AdapterRejectedInput';
}
}
/**
* A cwd is valid when it is a non-empty string. The adapter normalizers fall
* back to `process.cwd()` when the inbound payload omits cwd, so the only way
* this returns false is when the payload supplies `null`/`''`/non-string.
*/
export function isValidCwd(cwd: unknown): cwd is string {
return typeof cwd === 'string' && cwd.length > 0;
}
+5
@@ -1,4 +1,5 @@
import type { PlatformAdapter } from '../types.js';
import { AdapterRejectedInput, isValidCwd } from './errors.js';
/**
* Gemini CLI Platform Adapter
@@ -39,6 +40,10 @@ export const geminiCliAdapter: PlatformAdapter = {
?? process.env.GEMINI_PROJECT_DIR
?? process.env.CLAUDE_PROJECT_DIR
?? process.cwd();
// Plan 05 Phase 6 — cwd validation at the adapter boundary.
if (!isValidCwd(cwd)) {
throw new AdapterRejectedInput('invalid_cwd');
}
const sessionId = r.session_id
?? process.env.GEMINI_SESSION_ID
+8 -2
@@ -1,12 +1,18 @@
import type { PlatformAdapter, NormalizedHookInput, HookResult } from '../types.js';
import { AdapterRejectedInput, isValidCwd } from './errors.js';
// Raw adapter passes through with minimal transformation - useful for testing
export const rawAdapter: PlatformAdapter = {
normalizeInput(raw) {
const r = raw as any;
const r = (raw ?? {}) as any;
// Plan 05 Phase 6 — cwd validation at the adapter boundary.
const cwd = r.cwd ?? process.cwd();
if (!isValidCwd(cwd)) {
throw new AdapterRejectedInput('invalid_cwd');
}
return {
sessionId: r.sessionId ?? r.session_id ?? 'unknown',
cwd: r.cwd ?? process.cwd(),
cwd,
prompt: r.prompt,
toolName: r.toolName ?? r.tool_name,
toolInput: r.toolInput ?? r.tool_input,
+8 -1
@@ -1,4 +1,5 @@
import type { PlatformAdapter, NormalizedHookInput, HookResult } from '../types.js';
import { AdapterRejectedInput, isValidCwd } from './errors.js';
// Maps Windsurf stdin format — JSON envelope with agent_action_name + tool_info payload
//
@@ -17,9 +18,15 @@ export const windsurfAdapter: PlatformAdapter = {
const toolInfo = r.tool_info ?? {};
const actionName: string = r.agent_action_name ?? '';
// Plan 05 Phase 6 — cwd validation at the adapter boundary.
const cwd = toolInfo.cwd ?? process.cwd();
if (!isValidCwd(cwd)) {
throw new AdapterRejectedInput('invalid_cwd');
}
const base: NormalizedHookInput = {
sessionId: r.trajectory_id ?? r.execution_id,
cwd: toolInfo.cwd ?? process.cwd(),
cwd,
platform: 'windsurf',
};
+28 -40
@@ -6,34 +6,24 @@
*/
import type { EventHandler, NormalizedHookInput, HookResult } from '../types.js';
import { ensureWorkerRunning, getWorkerPort, workerHttpRequest } from '../../shared/worker-utils.js';
import {
executeWithWorkerFallback,
isWorkerFallback,
getWorkerPort,
} from '../../shared/worker-utils.js';
import { getProjectContext } from '../../utils/project-name.js';
import { HOOK_EXIT_CODES } from '../../shared/hook-constants.js';
import { logger } from '../../utils/logger.js';
import { SettingsDefaultsManager } from '../../shared/SettingsDefaultsManager.js';
import { USER_SETTINGS_PATH } from '../../shared/paths.js';
import { loadFromFileOnce } from '../../shared/hook-settings.js';
export const contextHandler: EventHandler = {
async execute(input: NormalizedHookInput): Promise<HookResult> {
// Ensure worker is running before any other logic
const workerReady = await ensureWorkerRunning();
if (!workerReady) {
// Worker not available - return empty context gracefully
return {
hookSpecificOutput: {
hookEventName: 'SessionStart',
additionalContext: ''
},
exitCode: HOOK_EXIT_CODES.SUCCESS
};
}
const cwd = input.cwd ?? process.cwd();
const context = getProjectContext(cwd);
const port = getWorkerPort();
// Check if terminal output should be shown (load settings early)
const settings = SettingsDefaultsManager.loadFromFile(USER_SETTINGS_PATH);
// Plan 05 Phase 4: settings via process-scope cache.
const settings = loadFromFileOnce();
const showTerminalOutput = settings.CLAUDE_MEM_CONTEXT_SHOW_TERMINAL_OUTPUT === 'true';
// Pass all projects (parent + worktree if applicable) for unified timeline
@@ -41,38 +31,36 @@ export const contextHandler: EventHandler = {
const apiPath = `/api/context/inject?projects=${encodeURIComponent(projectsParam)}`;
const colorApiPath = input.platform === 'claude-code' ? `${apiPath}&colors=true` : apiPath;
const emptyResult = {
const emptyResult: HookResult = {
hookSpecificOutput: { hookEventName: 'SessionStart', additionalContext: '' },
exitCode: HOOK_EXIT_CODES.SUCCESS
exitCode: HOOK_EXIT_CODES.SUCCESS,
};
// Note: Removed AbortSignal.timeout due to Windows Bun cleanup issue (libuv assertion)
// Worker service has its own timeouts, so client-side timeout is redundant
let response: Response;
let colorResponse: Response | null;
try {
[response, colorResponse] = await Promise.all([
workerHttpRequest(apiPath),
showTerminalOutput ? workerHttpRequest(colorApiPath).catch(() => null) : Promise.resolve(null)
]);
} catch (error) {
// Worker unreachable — return empty context gracefully
logger.warn('HOOK', 'Context fetch error, returning empty', { error: error instanceof Error ? error.message : String(error) });
// Plan 05 Phase 2: single helper for ensure-worker-alive → request → fallback.
const contextResult = await executeWithWorkerFallback<string>(apiPath, 'GET');
if (isWorkerFallback(contextResult)) {
return emptyResult;
}
if (!response.ok) {
logger.warn('HOOK', 'Context generation failed, returning empty', { status: response.status });
let additionalContext: string;
if (typeof contextResult === 'string') {
additionalContext = contextResult.trim();
} else if (contextResult === undefined) {
additionalContext = '';
} else {
// Unexpected non-string body — log and fall back to empty.
logger.warn('HOOK', 'Context response was not a string', { type: typeof contextResult });
return emptyResult;
}
const [contextResult, colorResult] = await Promise.all([
response.text(),
colorResponse?.ok ? colorResponse.text() : Promise.resolve('')
]);
let coloredTimeline = '';
if (showTerminalOutput) {
const colorResult = await executeWithWorkerFallback<string>(colorApiPath, 'GET');
if (!isWorkerFallback(colorResult) && typeof colorResult === 'string') {
coloredTimeline = colorResult.trim();
}
}
const additionalContext = contextResult.trim();
const coloredTimeline = colorResult.trim();
const platform = input.platform;
// Use colored timeline for display if available, otherwise fall back to
+15 -27
@@ -6,14 +6,12 @@
*/
import type { EventHandler, NormalizedHookInput, HookResult } from '../types.js';
import { ensureWorkerRunning, workerHttpRequest } from '../../shared/worker-utils.js';
import { executeWithWorkerFallback, isWorkerFallback } from '../../shared/worker-utils.js';
import { logger } from '../../utils/logger.js';
import { parseJsonArray } from '../../shared/timeline-formatting.js';
import { statSync } from 'fs';
import path from 'path';
import { isProjectExcluded } from '../../utils/project-filter.js';
import { SettingsDefaultsManager } from '../../shared/SettingsDefaultsManager.js';
import { USER_SETTINGS_PATH } from '../../shared/paths.js';
import { shouldTrackProject } from '../../shared/should-track-project.js';
import { getProjectContext } from '../../utils/project-name.js';
/** Skip the gate for files smaller than this — timeline overhead exceeds file read cost. */
@@ -207,19 +205,12 @@ export const fileContextHandler: EventHandler = {
logger.debug('HOOK', 'File stat failed, proceeding with gate', { error: err instanceof Error ? err.message : String(err) });
}
// Check if project is excluded from tracking
const settings = SettingsDefaultsManager.loadFromFile(USER_SETTINGS_PATH);
if (input.cwd && isProjectExcluded(input.cwd, settings.CLAUDE_MEM_EXCLUDED_PROJECTS)) {
// Plan 05 Phase 5: project exclusion via single helper.
if (input.cwd && !shouldTrackProject(input.cwd)) {
logger.debug('HOOK', 'Project excluded from tracking, skipping file context', { cwd: input.cwd });
return { continue: true, suppressOutput: true };
}
// Ensure worker is running
const workerReady = await ensureWorkerRunning();
if (!workerReady) {
return { continue: true, suppressOutput: true };
}
// Query worker for observations related to this file
const context = getProjectContext(input.cwd);
const cwd = input.cwd || process.cwd();
@@ -232,22 +223,19 @@ export const fileContextHandler: EventHandler = {
}
queryParams.set('limit', String(FETCH_LOOKAHEAD_LIMIT));
let data: { observations: ObservationRow[]; count: number };
try {
const response = await workerHttpRequest(`/api/observations/by-file?${queryParams.toString()}`, { method: 'GET' });
if (!response.ok) {
logger.warn('HOOK', 'File context query failed, skipping', { status: response.status, filePath });
return { continue: true, suppressOutput: true };
}
data = await response.json() as { observations: ObservationRow[]; count: number };
} catch (error) {
logger.warn('HOOK', 'File context fetch error, skipping', {
error: error instanceof Error ? error.message : String(error),
});
// Plan 05 Phase 2: single helper for ensure-worker-alive → request → fallback.
const result = await executeWithWorkerFallback<{ observations: ObservationRow[]; count: number }>(
`/api/observations/by-file?${queryParams.toString()}`,
'GET',
);
if (isWorkerFallback(result)) {
return { continue: true, suppressOutput: true };
}
if (!result || !Array.isArray((result as any).observations)) {
logger.warn('HOOK', 'File context query returned malformed body, skipping', { filePath });
return { continue: true, suppressOutput: true };
}
const data = result;
if (!data.observations || data.observations.length === 0) {
return { continue: true, suppressOutput: true };
+19 -40
@@ -6,35 +6,13 @@
*/
import type { EventHandler, NormalizedHookInput, HookResult } from '../types.js';
import { ensureWorkerRunning, workerHttpRequest } from '../../shared/worker-utils.js';
import { executeWithWorkerFallback, isWorkerFallback } from '../../shared/worker-utils.js';
import { logger } from '../../utils/logger.js';
import { HOOK_EXIT_CODES } from '../../shared/hook-constants.js';
import { normalizePlatformSource } from '../../shared/platform-source.js';
async function sendFileEditObservation(requestBody: string, filePath: string): Promise<void> {
const response = await workerHttpRequest('/api/sessions/observations', {
method: 'POST',
headers: { 'Content-Type': 'application/json' },
body: requestBody
});
if (!response.ok) {
logger.warn('HOOK', 'File edit observation storage failed, skipping', { status: response.status, filePath });
return;
}
logger.debug('HOOK', 'File edit observation sent successfully', { filePath });
}
export const fileEditHandler: EventHandler = {
async execute(input: NormalizedHookInput): Promise<HookResult> {
// Ensure worker is running before any other logic
const workerReady = await ensureWorkerRunning();
if (!workerReady) {
// Worker not available - skip file edit observation gracefully
return { continue: true, suppressOutput: true, exitCode: HOOK_EXIT_CODES.SUCCESS };
}
const { sessionId, cwd, filePath, edits } = input;
const platformSource = normalizePlatformSource(input.platform);
@@ -46,30 +24,31 @@ export const fileEditHandler: EventHandler = {
editCount: edits?.length ?? 0
});
// Validate required fields before sending to worker
// Plan 05 Phase 6: cwd is validated at the adapter boundary; this is a
// belt-and-suspenders type guard so TypeScript narrows.
if (!cwd) {
throw new Error(`Missing cwd in FileEdit hook input for session ${sessionId}, file ${filePath}`);
}
// Send to worker as an observation with file edit metadata
// The observation handler on the worker will process this appropriately
const requestBody = JSON.stringify({
contentSessionId: sessionId,
platformSource,
tool_name: 'write_file',
tool_input: { filePath, edits },
tool_response: { success: true },
cwd
});
// Plan 05 Phase 2: single helper for ensure-worker-alive → request → fallback.
const result = await executeWithWorkerFallback<{ status?: string }>(
'/api/sessions/observations',
'POST',
{
contentSessionId: sessionId,
platformSource,
tool_name: 'write_file',
tool_input: { filePath, edits },
tool_response: { success: true },
cwd,
},
);
try {
await sendFileEditObservation(requestBody, filePath);
} catch (error) {
// Worker unreachable — skip file edit observation gracefully
logger.warn('HOOK', 'File edit observation fetch error, skipping', { error: error instanceof Error ? error.message : String(error) });
if (isWorkerFallback(result)) {
return { continue: true, suppressOutput: true, exitCode: HOOK_EXIT_CODES.SUCCESS };
}
logger.debug('HOOK', 'File edit observation sent successfully', { filePath });
return { continue: true, suppressOutput: true };
}
},
};
+28 -47
@@ -5,38 +5,14 @@
*/
import type { EventHandler, NormalizedHookInput, HookResult } from '../types.js';
import { ensureWorkerRunning, workerHttpRequest } from '../../shared/worker-utils.js';
import { executeWithWorkerFallback, isWorkerFallback } from '../../shared/worker-utils.js';
import { logger } from '../../utils/logger.js';
import { HOOK_EXIT_CODES } from '../../shared/hook-constants.js';
import { isProjectExcluded } from '../../utils/project-filter.js';
import { SettingsDefaultsManager } from '../../shared/SettingsDefaultsManager.js';
import { USER_SETTINGS_PATH } from '../../shared/paths.js';
import { shouldTrackProject } from '../../shared/should-track-project.js';
import { normalizePlatformSource } from '../../shared/platform-source.js';
async function sendObservationToWorker(requestBody: string, toolName: string): Promise<void> {
const response = await workerHttpRequest('/api/sessions/observations', {
method: 'POST',
headers: { 'Content-Type': 'application/json' },
body: requestBody
});
if (!response.ok) {
logger.warn('HOOK', 'Observation storage failed, skipping', { status: response.status, toolName });
return;
}
logger.debug('HOOK', 'Observation sent successfully', { toolName });
}
export const observationHandler: EventHandler = {
async execute(input: NormalizedHookInput): Promise<HookResult> {
// Ensure worker is running before any other logic
const workerReady = await ensureWorkerRunning();
if (!workerReady) {
// Worker not available - skip observation gracefully
return { continue: true, suppressOutput: true, exitCode: HOOK_EXIT_CODES.SUCCESS };
}
const { sessionId, cwd, toolName, toolInput, toolResponse } = input;
const platformSource = normalizePlatformSource(input.platform);
@@ -49,38 +25,43 @@ export const observationHandler: EventHandler = {
logger.dataIn('HOOK', `PostToolUse: ${toolStr}`, {});
// Validate required fields before sending to worker
// Plan 05 Phase 6: cwd is validated at the adapter boundary; the adapter
// rejects empty cwd before reaching the handler. We still type-narrow for
// TypeScript and as a belt-and-suspenders guard.
if (!cwd) {
throw new Error(`Missing cwd in PostToolUse hook input for session ${sessionId}, tool ${toolName}`);
}
// Check if project is excluded from tracking
const settings = SettingsDefaultsManager.loadFromFile(USER_SETTINGS_PATH);
if (isProjectExcluded(cwd, settings.CLAUDE_MEM_EXCLUDED_PROJECTS)) {
// Plan 05 Phase 5: project exclusion via single helper.
if (!shouldTrackProject(cwd)) {
logger.debug('HOOK', 'Project excluded from tracking, skipping observation', { cwd, toolName });
return { continue: true, suppressOutput: true };
}
// Send to worker - worker handles privacy check and database operations
const requestBody = JSON.stringify({
contentSessionId: sessionId,
platformSource,
tool_name: toolName,
tool_input: toolInput,
tool_response: toolResponse,
cwd,
agentId: input.agentId,
agentType: input.agentType
});
// Plan 05 Phase 2: single helper for ensure-worker-alive → request → fallback.
const result = await executeWithWorkerFallback<{ status?: string }>(
'/api/sessions/observations',
'POST',
{
contentSessionId: sessionId,
platformSource,
tool_name: toolName,
tool_input: toolInput,
tool_response: toolResponse,
cwd,
agentId: input.agentId,
agentType: input.agentType,
},
);
try {
await sendObservationToWorker(requestBody, toolName);
} catch (error) {
// Worker unreachable — skip observation gracefully
logger.warn('HOOK', 'Observation fetch error, skipping', { error: error instanceof Error ? error.message : String(error) });
if (isWorkerFallback(result)) {
// Worker unreachable — fail-loud counter has already been incremented
// and may have escalated to exit 2. If we got here, threshold not yet
// reached, so degrade gracefully.
return { continue: true, suppressOutput: true, exitCode: HOOK_EXIT_CODES.SUCCESS };
}
logger.debug('HOOK', 'Observation sent successfully', { toolName });
return { continue: true, suppressOutput: true };
}
},
};
+20 -33
@@ -10,56 +10,43 @@
*/
import type { EventHandler, NormalizedHookInput, HookResult } from '../types.js';
import { ensureWorkerRunning, workerHttpRequest } from '../../shared/worker-utils.js';
import { executeWithWorkerFallback, isWorkerFallback } from '../../shared/worker-utils.js';
import { logger } from '../../utils/logger.js';
import { normalizePlatformSource } from '../../shared/platform-source.js';
async function sendSessionCompleteRequest(sessionId: string, platformSource: string): Promise<void> {
const response = await workerHttpRequest('/api/sessions/complete', {
method: 'POST',
headers: { 'Content-Type': 'application/json' },
body: JSON.stringify({ contentSessionId: sessionId, platformSource })
});
if (!response.ok) {
const text = await response.text();
logger.warn('HOOK', 'session-complete: Failed to complete session', { status: response.status, body: text });
} else {
logger.info('HOOK', 'Session completed successfully', { contentSessionId: sessionId });
}
}
import { shouldTrackProject } from '../../shared/should-track-project.js';
export const sessionCompleteHandler: EventHandler = {
async execute(input: NormalizedHookInput): Promise<HookResult> {
// Ensure worker is running
const workerReady = await ensureWorkerRunning();
if (!workerReady) {
// Worker not available — skip session completion gracefully
return { continue: true, suppressOutput: true };
}
const { sessionId } = input;
const platformSource = normalizePlatformSource(input.platform);
// Same OBSERVER_SESSIONS_DIR exclusion as the rest of the hook surface —
// the observer's child Claude Code must never call /api/sessions/complete.
if (input.cwd && !shouldTrackProject(input.cwd)) {
return { continue: true, suppressOutput: true };
}
if (!sessionId) {
logger.warn('HOOK', 'session-complete: Missing sessionId, skipping');
return { continue: true, suppressOutput: true };
}
logger.info('HOOK', '→ session-complete: Removing session from active map', {
contentSessionId: sessionId
contentSessionId: sessionId,
});
try {
await sendSessionCompleteRequest(sessionId, platformSource);
} catch (error) {
// Log but don't fail - session may already be gone
const errorMessage = error instanceof Error ? error.message : String(error);
logger.warn('HOOK', 'session-complete: Error completing session', {
error: errorMessage
});
// Plan 05 Phase 2: single helper for ensure-worker-alive → request → fallback.
const result = await executeWithWorkerFallback<{ status?: string }>(
'/api/sessions/complete',
'POST',
{ contentSessionId: sessionId, platformSource },
);
if (isWorkerFallback(result)) {
return { continue: true, suppressOutput: true };
}
logger.info('HOOK', 'Session completed successfully', { contentSessionId: sessionId });
return { continue: true, suppressOutput: true };
}
},
};
+54 -91
@@ -5,45 +5,29 @@
*/
import type { EventHandler, NormalizedHookInput, HookResult } from '../types.js';
import { ensureWorkerRunning, workerHttpRequest } from '../../shared/worker-utils.js';
import { executeWithWorkerFallback, isWorkerFallback } from '../../shared/worker-utils.js';
import { getProjectContext } from '../../utils/project-name.js';
import { logger } from '../../utils/logger.js';
import { HOOK_EXIT_CODES } from '../../shared/hook-constants.js';
import { isProjectExcluded } from '../../utils/project-filter.js';
import { SettingsDefaultsManager } from '../../shared/SettingsDefaultsManager.js';
import { USER_SETTINGS_PATH } from '../../shared/paths.js';
import { shouldTrackProject } from '../../shared/should-track-project.js';
import { loadFromFileOnce } from '../../shared/hook-settings.js';
import { normalizePlatformSource } from '../../shared/platform-source.js';
async function fetchSemanticContext(
prompt: string,
project: string,
limit: string,
sessionDbId: number
): Promise<string> {
const semanticRes = await workerHttpRequest('/api/context/semantic', {
method: 'POST',
headers: { 'Content-Type': 'application/json' },
body: JSON.stringify({ q: prompt, project, limit })
});
if (semanticRes.ok) {
const data = await semanticRes.json() as { context: string; count: number };
if (data.context) {
logger.debug('HOOK', `Semantic injection: ${data.count} observations for prompt`, { sessionId: sessionDbId, count: data.count });
return data.context;
}
}
return '';
interface SessionInitResponse {
sessionDbId: number;
promptNumber: number;
skipped?: boolean;
reason?: string;
contextInjected?: boolean;
}
interface SemanticContextResponse {
context: string;
count: number;
}
export const sessionInitHandler: EventHandler = {
async execute(input: NormalizedHookInput): Promise<HookResult> {
// Ensure worker is running before any other logic
const workerReady = await ensureWorkerRunning();
if (!workerReady) {
// Worker not available - skip session init gracefully
return { continue: true, suppressOutput: true, exitCode: HOOK_EXIT_CODES.SUCCESS };
}
const { sessionId, prompt: rawPrompt } = input;
const cwd = input.cwd ?? process.cwd(); // Match context.ts fallback (#1918)
@@ -53,9 +37,8 @@ export const sessionInitHandler: EventHandler = {
return { continue: true, suppressOutput: true, exitCode: HOOK_EXIT_CODES.SUCCESS };
}
// Check if project is excluded from tracking
const settings = SettingsDefaultsManager.loadFromFile(USER_SETTINGS_PATH);
if (cwd && isProjectExcluded(cwd, settings.CLAUDE_MEM_EXCLUDED_PROJECTS)) {
// Plan 05 Phase 5: project exclusion via single helper.
if (!shouldTrackProject(cwd)) {
logger.info('HOOK', 'Project excluded from tracking', { cwd });
return { continue: true, suppressOutput: true };
}
@@ -69,38 +52,28 @@ export const sessionInitHandler: EventHandler = {
logger.debug('HOOK', 'session-init: Calling /api/sessions/init', { contentSessionId: sessionId, project });
// Initialize session via HTTP - handles DB operations and privacy checks
let initResponse: Response;
try {
initResponse = await workerHttpRequest('/api/sessions/init', {
method: 'POST',
headers: { 'Content-Type': 'application/json' },
body: JSON.stringify({
contentSessionId: sessionId,
project,
prompt,
platformSource
})
});
} catch (err) {
// Worker unreachable — on Linux/WSL, hook may fire before worker is healthy (#1907)
logger.warn('HOOK', `session-init: worker request failed: ${err instanceof Error ? err.message : err}`);
// Plan 05 Phase 2: single helper for ensure-worker-alive → request → fallback.
const initResult = await executeWithWorkerFallback<SessionInitResponse>(
'/api/sessions/init',
'POST',
{
contentSessionId: sessionId,
project,
prompt,
platformSource,
},
);
if (isWorkerFallback(initResult)) {
return { continue: true, suppressOutput: true, exitCode: HOOK_EXIT_CODES.SUCCESS };
}
if (!initResponse.ok) {
// Log but don't throw - a worker 500 should not block the user's prompt
logger.failure('HOOK', `Session initialization failed: ${initResponse.status}`, { contentSessionId: sessionId, project });
// Worker may have returned a non-2xx body (parsed but missing fields). Fail-soft.
if (typeof initResult?.sessionDbId !== 'number') {
logger.failure('HOOK', 'Session initialization returned malformed response', { contentSessionId: sessionId, project });
return { continue: true, suppressOutput: true, exitCode: HOOK_EXIT_CODES.SUCCESS };
}
const initResult = await initResponse.json() as {
sessionDbId: number;
promptNumber: number;
skipped?: boolean;
reason?: string;
contextInjected?: boolean;
};
const sessionDbId = initResult.sessionDbId;
const promptNumber = initResult.promptNumber;
@@ -117,57 +90,47 @@ export const sessionInitHandler: EventHandler = {
return { continue: true, suppressOutput: true };
}
// Skip SDK agent re-initialization if context was already injected for this session (#1079)
// The prompt was already saved to the database by /api/sessions/init above —
// no need to re-start the SDK agent on every turn.
// Note: we do NOT return here — semantic injection below must run on every prompt.
const skipAgentInit = Boolean(initResult.contextInjected);
if (skipAgentInit) {
logger.info('HOOK', `INIT_COMPLETE | sessionDbId=${sessionDbId} | promptNumber=${promptNumber} | skipped_agent_init=true | reason=context_already_injected`, {
sessionId: sessionDbId
});
}
// Only initialize SDK agent for Claude Code (not Cursor)
// Cursor doesn't use the SDK agent - it only needs session/observation storage
if (!skipAgentInit && input.platform !== 'cursor' && sessionDbId) {
// Plan 05 Phase 7: agent init is idempotent — call unconditionally for
// every Claude Code session. Cursor still skipped (no SDK agent).
if (input.platform !== 'cursor' && sessionDbId) {
// Strip leading slash from commands for memory agent
// /review 101 -> review 101 (more semantic for observations)
const cleanedPrompt = prompt.startsWith('/') ? prompt.substring(1) : prompt;
logger.debug('HOOK', 'session-init: Calling /sessions/{sessionDbId}/init', { sessionDbId, promptNumber });
// Initialize SDK agent session via HTTP (starts the agent!)
const response = await workerHttpRequest(`/sessions/${sessionDbId}/init`, {
method: 'POST',
headers: { 'Content-Type': 'application/json' },
body: JSON.stringify({ userPrompt: cleanedPrompt, promptNumber })
});
if (!response.ok) {
// Log but don't throw - SDK agent failure should not block the user's prompt
logger.failure('HOOK', `SDK agent start failed: ${response.status}`, { sessionDbId, promptNumber });
const agentInitResult = await executeWithWorkerFallback<{ status?: string }>(
`/sessions/${sessionDbId}/init`,
'POST',
{ userPrompt: cleanedPrompt, promptNumber },
);
if (isWorkerFallback(agentInitResult)) {
// Worker became unreachable mid-invocation; fail-loud counter handled it.
return { continue: true, suppressOutput: true, exitCode: HOOK_EXIT_CODES.SUCCESS };
}
} else if (!skipAgentInit && input.platform === 'cursor') {
} else if (input.platform === 'cursor') {
logger.debug('HOOK', 'session-init: Skipping SDK agent init for Cursor platform', { sessionDbId, promptNumber });
}
// Semantic context injection: query Chroma for relevant past observations
// and inject as additionalContext so Claude receives relevant memory each prompt.
// Controlled by CLAUDE_MEM_SEMANTIC_INJECT setting (default: true).
// Plan 05 Phase 4: settings via process-scope cache.
const settings = loadFromFileOnce();
const semanticInject =
String(settings.CLAUDE_MEM_SEMANTIC_INJECT).toLowerCase() === 'true';
let additionalContext = '';
if (semanticInject && prompt && prompt.length >= 20 && prompt !== '[media prompt]') {
const limit = settings.CLAUDE_MEM_SEMANTIC_INJECT_LIMIT || '5';
try {
additionalContext = await fetchSemanticContext(prompt, project, limit, sessionDbId);
} catch (e) {
// Graceful degradation — semantic injection is optional
logger.debug('HOOK', 'Semantic injection unavailable', {
error: e instanceof Error ? e.message : String(e)
});
const semanticResult = await executeWithWorkerFallback<SemanticContextResponse>(
'/api/context/semantic',
'POST',
{ q: prompt, project, limit },
);
if (!isWorkerFallback(semanticResult) && semanticResult?.context) {
logger.debug('HOOK', `Semantic injection: ${semanticResult.count} observations for prompt`, { sessionId: sessionDbId, count: semanticResult.count });
additionalContext = semanticResult.context;
}
}
+30 -37
@@ -1,26 +1,33 @@
/**
* Summarize Handler - Stop
*
* Fire-and-forget: enqueue the summarize request with the worker and return
* immediately so the Stop hook does not block the user's terminal. The worker
* owns completion and session cleanup.
* Fire-and-forget: queue the summarize request and exit. The worker handles
* summary generation, storage, and session cleanup asynchronously. The Stop
* hook does not wait for any of it — Claude Code must exit immediately.
* Session-complete cleanup is performed by the SessionEnd handler.
*/
import type { EventHandler, NormalizedHookInput, HookResult } from '../types.js';
import { ensureWorkerRunning, workerHttpRequest } from '../../shared/worker-utils.js';
import { executeWithWorkerFallback, isWorkerFallback } from '../../shared/worker-utils.js';
import { logger } from '../../utils/logger.js';
import { extractLastMessage } from '../../shared/transcript-parser.js';
import { HOOK_EXIT_CODES } from '../../shared/hook-constants.js';
import { normalizePlatformSource } from '../../shared/platform-source.js';
const SUMMARIZE_TIMEOUT_MS = 5000;
import { shouldTrackProject } from '../../shared/should-track-project.js';
export const summarizeHandler: EventHandler = {
async execute(input: NormalizedHookInput): Promise<HookResult> {
// Skip Stop hook entirely when firing from an excluded project (notably
// OBSERVER_SESSIONS_DIR). Without this, the SDK observer's own Stop hook
// queues summaries against its meta-session and triggers a recovery loop.
if (input.cwd && !shouldTrackProject(input.cwd)) {
return { continue: true, suppressOutput: true, exitCode: HOOK_EXIT_CODES.SUCCESS };
}
// Skip summaries in subagent context — subagents do not own the session summary.
// Gate on agentId only: that field is present exclusively for Task-spawned subagents.
// agentType alone (no agentId) indicates `--agent`-started main sessions, which still
// own their summary. Do this BEFORE ensureWorkerRunning() so a subagent Stop hook
// own their summary. Do this BEFORE the worker call so a subagent Stop hook
// does not bootstrap the worker.
if (input.agentId) {
logger.debug('HOOK', 'Skipping summary: subagent context detected', {
@@ -31,16 +38,13 @@ export const summarizeHandler: EventHandler = {
return { continue: true, suppressOutput: true, exitCode: HOOK_EXIT_CODES.SUCCESS };
}
// Ensure worker is running before any other logic
const workerReady = await ensureWorkerRunning();
if (!workerReady) {
// Worker not available - skip summary gracefully
return { continue: true, suppressOutput: true, exitCode: HOOK_EXIT_CODES.SUCCESS };
}
const { sessionId, transcriptPath } = input;
// Validate required fields before processing
if (!sessionId) {
logger.warn('HOOK', 'summarize: No sessionId provided, skipping');
return { continue: true, suppressOutput: true, exitCode: HOOK_EXIT_CODES.SUCCESS };
}
if (!transcriptPath) {
// No transcript available - skip summary gracefully (not an error)
logger.debug('HOOK', `No transcriptPath in Stop hook input for session ${sessionId} - skipping summary`);
@@ -75,31 +79,20 @@ export const summarizeHandler: EventHandler = {
const platformSource = normalizePlatformSource(input.platform);
// 1. Queue summarize request — worker returns immediately with { status: 'queued' }
let response: Response;
try {
response = await workerHttpRequest('/api/sessions/summarize', {
method: 'POST',
headers: { 'Content-Type': 'application/json' },
body: JSON.stringify({
contentSessionId: sessionId,
last_assistant_message: lastAssistantMessage,
platformSource
}),
timeoutMs: SUMMARIZE_TIMEOUT_MS
});
} catch (err) {
// Network error, worker crash, or timeout — exit gracefully instead of
// bubbling to hook runner which exits code 2 and blocks session exit (#1901)
logger.warn('HOOK', `Stop hook: summarize request failed: ${err instanceof Error ? err.message : err}`);
const queueResult = await executeWithWorkerFallback<{ status?: string }>(
'/api/sessions/summarize',
'POST',
{
contentSessionId: sessionId,
last_assistant_message: lastAssistantMessage,
platformSource,
},
);
if (isWorkerFallback(queueResult)) {
return { continue: true, suppressOutput: true, exitCode: HOOK_EXIT_CODES.SUCCESS };
}
if (!response.ok) {
return { continue: true, suppressOutput: true };
}
logger.debug('HOOK', 'Summary request queued');
logger.debug('HOOK', 'Summary request queued, exiting hook');
return { continue: true, suppressOutput: true };
}
},
};
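The `shouldTrackProject` predicate imported above is also not shown in this diff. A self-contained sketch of its assumed behavior — the real helper in `shared/should-track-project.ts` reads the excluded-projects setting itself; the injectable `excluded` list here is an assumption made to keep the sketch standalone:

```typescript
import { basename } from 'path';

// Hedged sketch of the Plan 05 Phase 5 predicate: one yes/no answer that
// folds the exclusion check into a single helper. The injectable list is
// an assumption; the real helper loads settings internally.
export function shouldTrackProject(cwd: string, excluded: string[] = []): boolean {
  // A project is tracked unless its directory name is excluded
  // (e.g. the SDK observer's own sessions directory).
  return !excluded.includes(basename(cwd));
}
```

Collapsing the check to one call site is what lets the Stop hook gate on exclusion before it would otherwise bootstrap the worker.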
+23 -32
@@ -7,47 +7,38 @@
import { basename } from 'path';
import type { EventHandler, NormalizedHookInput, HookResult } from '../types.js';
import { ensureWorkerRunning, getWorkerPort, workerHttpRequest } from '../../shared/worker-utils.js';
import {
executeWithWorkerFallback,
isWorkerFallback,
getWorkerPort,
} from '../../shared/worker-utils.js';
import { HOOK_EXIT_CODES } from '../../shared/hook-constants.js';
async function fetchAndDisplayContext(project: string, colorsParam: string, port: number): Promise<void> {
const response = await workerHttpRequest(
`/api/context/inject?project=${encodeURIComponent(project)}${colorsParam}`
);
if (!response.ok) {
return;
}
const output = await response.text();
process.stderr.write(
"\n\n" + String.fromCodePoint(0x1F4DD) + " Claude-Mem Context Loaded\n\n" +
output +
"\n\n" + String.fromCodePoint(0x1F4A1) + " Wrap any message with <private> ... </private> to prevent storing sensitive information.\n" +
"\n" + String.fromCodePoint(0x1F4AC) + " Community https://discord.gg/J4wttp9vDu" +
`\n` + String.fromCodePoint(0x1F4FA) + ` Watch live in browser http://localhost:${port}/\n`
);
}
export const userMessageHandler: EventHandler = {
async execute(input: NormalizedHookInput): Promise<HookResult> {
// Ensure worker is running
const workerReady = await ensureWorkerRunning();
if (!workerReady) {
// Worker not available — skip user message gracefully
return { exitCode: HOOK_EXIT_CODES.SUCCESS };
}
const port = getWorkerPort();
const project = basename(input.cwd ?? process.cwd());
const colorsParam = input.platform === 'claude-code' ? '&colors=true' : '';
try {
await fetchAndDisplayContext(project, colorsParam, port);
} catch {
// Worker unreachable — skip user message gracefully
// Plan 05 Phase 2: single helper for ensure-worker-alive → request → fallback.
const result = await executeWithWorkerFallback<string>(
`/api/context/inject?project=${encodeURIComponent(project)}${colorsParam}`,
'GET',
);
if (isWorkerFallback(result)) {
return { exitCode: HOOK_EXIT_CODES.SUCCESS };
}
const output = typeof result === 'string' ? result : '';
process.stderr.write(
"\n\n" + String.fromCodePoint(0x1F4DD) + " Claude-Mem Context Loaded\n\n" +
output +
"\n\n" + String.fromCodePoint(0x1F4A1) + " Wrap any message with <private> ... </private> to prevent storing sensitive information.\n" +
"\n" + String.fromCodePoint(0x1F4AC) + " Community https://discord.gg/J4wttp9vDu" +
`\n` + String.fromCodePoint(0x1F4FA) + ` Watch live in browser http://localhost:${port}/\n`
);
return { exitCode: HOOK_EXIT_CODES.SUCCESS };
}
},
};
+13
@@ -1,5 +1,6 @@
import { readJsonFromStdin } from './stdin-reader.js';
import { getPlatformAdapter } from './adapters/index.js';
import { AdapterRejectedInput } from './adapters/errors.js';
import { getEventHandler } from './handlers/index.js';
import { HOOK_EXIT_CODES } from '../shared/hook-constants.js';
import { logger } from '../utils/logger.js';
@@ -98,6 +99,18 @@ export async function hookCommand(platform: string, event: string, options: Hook
try {
return await executeHookPipeline(adapter, handler, platform, options);
} catch (error) {
// Plan 05 Phase 6 — adapter rejected the input (invalid cwd or other
// boundary-detected payload defect). Treat as graceful: emit a continue
// envelope and exit 0 so the user's session is not blocked by a malformed
// hook payload from the platform.
if (error instanceof AdapterRejectedInput) {
logger.warn('HOOK', `Adapter rejected input (${error.reason}), skipping hook`);
console.log(JSON.stringify({ continue: true, suppressOutput: true }));
if (!options.skipExit) {
process.exit(HOOK_EXIT_CODES.SUCCESS);
}
return HOOK_EXIT_CODES.SUCCESS;
}
if (isWorkerUnavailableError(error)) {
// Worker unavailable — degrade gracefully, don't block the user
// Log to file instead of stderr (#1181)
+4 -2
@@ -351,7 +351,8 @@ function runNpmInstallInMarketplace(): void {
execSync('npm install --production', {
cwd: marketplaceDir,
stdio: 'pipe',
...(IS_WINDOWS ? { shell: true as const } : {}),
encoding: 'utf8',
...(IS_WINDOWS ? { shell: process.env.ComSpec ?? 'cmd.exe' } : {}),
});
}
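Both hunks in this file make the same pair of changes: set `encoding` so `execSync` returns a string, and on Windows pass an explicit shell path (ComSpec, falling back to `cmd.exe`) instead of `shell: true`. A minimal standalone sketch of the pattern — the `run` wrapper is hypothetical, not part of the codebase:

```typescript
import { execSync } from 'child_process';

const IS_WINDOWS = process.platform === 'win32';

// Hypothetical wrapper showing the pattern from both hunks: `encoding`
// makes execSync return utf8 text, and the spread only adds the explicit
// shell path on Windows, leaving the POSIX default (/bin/sh) untouched.
function run(cmd: string): string {
  return execSync(cmd, {
    encoding: 'utf8',
    ...(IS_WINDOWS ? { shell: process.env.ComSpec ?? 'cmd.exe' } : {}),
  });
}
```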
@@ -370,7 +371,8 @@ function runSmartInstall(): boolean {
try {
execSync(`node "${smartInstallPath}"`, {
stdio: 'inherit',
...(IS_WINDOWS ? { shell: true as const } : {}),
encoding: 'utf8',
...(IS_WINDOWS ? { shell: process.env.ComSpec ?? 'cmd.exe' } : {}),
});
return true;
} catch (error: unknown) {
-20
@@ -64,23 +64,3 @@ export function resolveBunBinaryPath(): string | null {
return null;
}
/**
* Get the installed Bun version string (e.g. `"1.2.3"`), or `null`
* if Bun is not available.
*/
export function getBunVersionString(): string | null {
const bunPath = resolveBunBinaryPath();
if (!bunPath) return null;
try {
const result = spawnSync(bunPath, ['--version'], {
encoding: 'utf-8',
stdio: ['pipe', 'pipe', 'pipe'],
shell: IS_WINDOWS,
});
return result.status === 0 ? result.stdout.trim() : null;
} catch (error: unknown) {
console.error('[bun-resolver] Failed to get Bun version:', error instanceof Error ? error.message : String(error));
return null;
}
}
+111 -143
@@ -1,6 +1,13 @@
/**
* XML Parser Module
* Parses observation and summary XML blocks from SDK responses
*
* Single fail-fast entry point for SDK agent XML responses.
*
* Per PATHFINDER-2026-04-22 plan 03 phase 1:
* - One function (`parseAgentXml`) for all agent responses.
* - Discriminated-union return: `{ valid: true, kind, data }` or `{ valid: false, reason }`.
* - No coercion. No silent passthrough. No "lenient mode".
* - `<skip_summary reason="…"/>` is a first-class summary case (skipped: true).
*/
import { logger } from '../utils/logger.js';
@@ -24,23 +31,103 @@ export interface ParsedSummary {
completed: string | null;
next_steps: string | null;
notes: string | null;
/** True when the response was an explicit `<skip_summary reason="…"/>` bypass. */
skipped?: boolean;
/** Non-null when `skipped: true`. */
skip_reason?: string | null;
}
export type ParseResult =
| { valid: true; kind: 'observation'; data: ParsedObservation[] }
| { valid: true; kind: 'summary'; data: ParsedSummary }
| { valid: false; reason: string };
/**
* Parse an SDK agent response. Inspects the first significant XML root element
* and returns a discriminated union. Never coerces. Never returns null/undefined.
*
* Recognised roots:
* <observation> … </observation> → { kind: 'observation', data: ParsedObservation[] }
* <summary> … </summary> → { kind: 'summary', data: ParsedSummary }
* <skip_summary reason="…" /> → { kind: 'summary', data: { skipped: true, … } }
*
* Anything else → { valid: false, reason }. The caller is responsible for
* surfacing the reason (markFailed, log, etc.). No retry coercion.
*/
export function parseAgentXml(raw: string, correlationId?: string | number): ParseResult {
if (typeof raw !== 'string' || !raw.trim()) {
return { valid: false, reason: 'empty: response had no content' };
}
// Skip-summary is recognised even when wrapped in other text, but only as the
// sole structural signal. It outranks <observation> / <summary> matches because
// it is an explicit protocol bypass. `reason` is optional.
const skipMatch = /<skip_summary(?:\s+reason="([^"]*)")?\s*\/>/.exec(raw);
if (skipMatch) {
return {
valid: true,
kind: 'summary',
data: {
request: null,
investigated: null,
learned: null,
completed: null,
next_steps: null,
notes: null,
skipped: true,
skip_reason: skipMatch[1] ?? null,
},
};
}
// Find the first significant element by scanning for the first `<…>` opener
// that is one of the recognised roots. This tolerates leading prose / debug
// output from the model while still failing fast on entirely-non-XML payloads.
const firstRoot = /<(observation|summary)\b/i.exec(raw);
if (!firstRoot) {
const preview = raw.length > 120 ? `${raw.slice(0, 120)}…` : raw;
return {
valid: false,
reason: `unknown root: response contained no <observation>, <summary>, or <skip_summary/> element (preview: ${preview.replace(/\s+/g, ' ')})`,
};
}
const rootName = firstRoot[1].toLowerCase();
if (rootName === 'observation') {
const observations = parseObservationBlocks(raw, correlationId);
if (observations.length === 0) {
return {
valid: false,
reason: '<observation>: no parseable observation block (every block was empty or ghost)',
};
}
return { valid: true, kind: 'observation', data: observations };
}
// rootName === 'summary'
const summary = parseSummaryBlock(raw, correlationId);
if (!summary) {
return {
valid: false,
reason: '<summary>: empty or missing every required sub-tag (request/investigated/learned/completed/next_steps)',
};
}
return { valid: true, kind: 'summary', data: summary };
}
/**
* Parse observation XML blocks from SDK response
* Returns all observations found in the response
* Parse all <observation>…</observation> blocks. Filters out ghost
* observations (every content field empty). Returns the surviving list.
*/
export function parseObservations(text: string, correlationId?: string): ParsedObservation[] {
function parseObservationBlocks(text: string, correlationId?: string | number): ParsedObservation[] {
const observations: ParsedObservation[] = [];
// Match <observation>...</observation> blocks (non-greedy)
const observationRegex = /<observation>([\s\S]*?)<\/observation>/g;
let match;
while ((match = observationRegex.exec(text)) !== null) {
const obsContent = match[1];
// Extract all fields
const type = extractField(obsContent, 'type');
const title = extractField(obsContent, 'title');
const subtitle = extractField(obsContent, 'subtitle');
@@ -50,13 +137,13 @@ export function parseObservations(text: string, correlationId?: string): ParsedO
const files_read = extractArrayElements(obsContent, 'files_read', 'file');
const files_modified = extractArrayElements(obsContent, 'files_modified', 'file');
// All fields except type are nullable in schema.
// If type is missing or invalid, use first type from mode as fallback.
// Determine final type using active mode's valid types
// Type fallback: per existing semantics, missing/invalid type degrades to the
// first type in the active mode. This is parser-internal validation, not
// recovery from a contract violation: every mode's first type is intentionally
// the catch-all bucket.
const mode = ModeManager.getInstance().getActiveMode();
const validTypes = mode.observation_types.map(t => t.id);
const fallbackType = validTypes[0]; // First type in mode's list is the fallback
const fallbackType = validTypes[0];
let finalType = fallbackType;
if (type) {
if (validTypes.includes(type.trim())) {
@@ -68,8 +155,6 @@ export function parseObservations(text: string, correlationId?: string): ParsedO
logger.error('PARSER', `Observation missing type field, using "${fallbackType}"`, { correlationId });
}
// All other fields are optional - save whatever we have
// Filter out type from concepts array (types and concepts are separate dimensions)
const cleanedConcepts = concepts.filter(c => c !== finalType);
@@ -83,10 +168,8 @@ export function parseObservations(text: string, correlationId?: string): ParsedO
}
// Skip ghost observations — records where every content field is null/empty.
// These accumulate when the LLM emits a bare <observation/> (or one with only <type>)
// due to context overflow. They carry no information and pollute the context window.
// (subtitle and file lists are intentionally excluded from this guard: an observation
// with only a subtitle is still too thin to be useful on its own.)
// (subtitle and file lists are intentionally excluded from this guard:
// an observation with only a subtitle is still too thin to be useful.)
if (!title && !narrative && facts.length === 0 && cleanedConcepts.length === 0) {
logger.warn('PARSER', 'Skipping empty observation (all content fields null)', {
correlationId,
@@ -111,96 +194,29 @@ export function parseObservations(text: string, correlationId?: string): ParsedO
}
/**
* Parse summary XML block from SDK response
* Returns null if no valid summary found or if summary was skipped
*
* @param coerceFromObservation - When true, attempts to convert <observation> tags
* into summary fields if no <summary> tags are found. Only set this when the
* response was expected to be a summary (i.e., a summarize message was sent).
* Prevents the infinite retry loop described in #1633.
* Parse a single <summary>…</summary> block. Returns null when the block has
* no usable sub-tags (every required field empty) — the caller maps this to
* a fail-fast `{ valid: false, reason }` result.
*/
export function parseSummary(text: string, sessionId?: number, coerceFromObservation: boolean = false): ParsedSummary | null {
// Check for skip_summary first
const skipRegex = /<skip_summary\s+reason="([^"]+)"\s*\/>/;
const skipMatch = skipRegex.exec(text);
if (skipMatch) {
logger.info('PARSER', 'Summary skipped', {
sessionId,
reason: skipMatch[1]
});
return null;
}
// Match <summary>...</summary> block (non-greedy)
function parseSummaryBlock(text: string, correlationId?: string | number): ParsedSummary | null {
const summaryRegex = /<summary>([\s\S]*?)<\/summary>/;
const summaryMatch = summaryRegex.exec(text);
if (!summaryMatch) {
// When the LLM returns <observation> tags instead of <summary> tags on a
// summary turn, coerce the observation content into summary fields rather
// than discarding it. This breaks the infinite retry loop described in
// #1633: without coercion, the summary is silently dropped, the session
// completes without a summary, a new session is spawned with an ever-growing
// prompt, and the cycle repeats.
//
// parseSummary is called on every response (see ResponseProcessor), not just
// summary turns — so the absence of <summary> in an observation response is
// expected, not a prompt-conditioning failure. Only act when the caller
// actually expected a summary (coerceFromObservation=true).
if (coerceFromObservation && /<observation>/.test(text)) {
const coerced = coerceObservationToSummary(text, sessionId);
if (coerced) {
return coerced;
}
logger.warn('PARSER', 'Summary response contained <observation> tags instead of <summary> — coercion failed, no usable content', { sessionId });
}
return null;
}
if (!summaryMatch) return null;
const summaryContent = summaryMatch[1];
// Extract fields
const request = extractField(summaryContent, 'request');
const investigated = extractField(summaryContent, 'investigated');
const learned = extractField(summaryContent, 'learned');
const completed = extractField(summaryContent, 'completed');
const next_steps = extractField(summaryContent, 'next_steps');
const notes = extractField(summaryContent, 'notes'); // Optional
const notes = extractField(summaryContent, 'notes'); // optional
// NOTE FROM THEDOTMACK: 100% of the time we must SAVE the summary, even if fields are missing. 10/24/2025
// NEVER DO THIS NONSENSE AGAIN.
// Validate required fields are present (notes is optional)
// if (!request || !investigated || !learned || !completed || !next_steps) {
// logger.warn('PARSER', 'Summary missing required fields', {
// sessionId,
// hasRequest: !!request,
// hasInvestigated: !!investigated,
// hasLearned: !!learned,
// hasCompleted: !!completed,
// hasNextSteps: !!next_steps
// });
// return null;
// }
// Guard: if NO sub-tags matched at all, this is a false positive —
// <summary> accidentally appeared inside an <observation> response with no structured content.
// This is NOT the same as missing some fields (which we intentionally allow above).
// Fix for #1360.
// Per maintainer note: a summary with at least one populated sub-tag must be
// saved. Missing sub-tags are tolerated; an entirely empty <summary> block is
// a false-positive (covered the #1360 regression) and is rejected.
if (!request && !investigated && !learned && !completed && !next_steps) {
// If the response also contains <observation> tags with real content, fall
// back to coercion rather than discarding the response entirely — this covers
// the case where the LLM wraps empty <summary></summary> around observation
// content, which would otherwise resurrect the #1633 retry loop.
if (coerceFromObservation && /<observation>/.test(text)) {
const coerced = coerceObservationToSummary(text, sessionId);
if (coerced) {
logger.warn('PARSER', 'Empty <summary> match rejected — coerced from <observation> fallback (#1633)', { sessionId });
return coerced;
}
}
logger.warn('PARSER', 'Summary match has no sub-tags — skipping false positive', { sessionId });
logger.warn('PARSER', 'Summary block has no sub-tags — rejecting false positive', { correlationId });
return null;
}
@@ -210,54 +226,10 @@ export function parseSummary(text: string, sessionId?: number, coerceFromObserva
learned,
completed,
next_steps,
notes
notes,
};
}
/**
* Coerce <observation> response into a ParsedSummary when <summary> tags are missing.
* Maps observation fields to the closest summary equivalents so that a usable
* summary is stored instead of nothing — breaking the retry loop (#1633).
*/
function coerceObservationToSummary(text: string, sessionId?: number): ParsedSummary | null {
// Iterate all <observation> blocks — if the LLM emits multiple and the first is
// empty, we still want to salvage the first one that has usable content.
const obsRegex = /<observation>([\s\S]*?)<\/observation>/g;
let obsMatch: RegExpExecArray | null;
let blockIndex = 0;
while ((obsMatch = obsRegex.exec(text)) !== null) {
const obsContent = obsMatch[1];
const title = extractField(obsContent, 'title');
const subtitle = extractField(obsContent, 'subtitle');
const narrative = extractField(obsContent, 'narrative');
const facts = extractArrayElements(obsContent, 'facts', 'fact');
if (title || narrative || facts.length > 0) {
// Map observation fields → summary fields (best-effort)
const request = title || subtitle || null;
const investigated = narrative || null;
const learned = facts.length > 0 ? facts.join('; ') : null;
const completed = title ? `${title}${subtitle ? ' — ' + subtitle : ''}` : null;
const next_steps = null; // No direct observation equivalent
logger.warn('PARSER', 'Coerced <observation> response into <summary> to prevent retry loop (#1633)', {
sessionId,
blockIndex,
hasTitle: !!title,
hasNarrative: !!narrative,
factCount: facts.length,
});
return { request, investigated, learned, completed, next_steps, notes: null };
}
blockIndex++;
}
return null;
}
/**
* Extract a simple field value from XML content
* Returns null for missing or empty/whitespace-only fields
@@ -265,8 +237,6 @@ function coerceObservationToSummary(text: string, sessionId?: number): ParsedSum
* Uses non-greedy match to handle nested tags and code snippets (Issue #798)
*/
function extractField(content: string, fieldName: string): string | null {
// Use [\s\S]*? to match any character including newlines, non-greedily
// This handles nested XML tags like <item>...</item> inside the field
const regex = new RegExp(`<${fieldName}>([\\s\\S]*?)</${fieldName}>`);
const match = regex.exec(content);
if (!match) return null;
@@ -282,7 +252,6 @@ function extractField(content: string, fieldName: string): string | null {
function extractArrayElements(content: string, arrayName: string, elementName: string): string[] {
const elements: string[] = [];
// Match the array block using [\s\S]*? for nested content
const arrayRegex = new RegExp(`<${arrayName}>([\\s\\S]*?)</${arrayName}>`);
const arrayMatch = arrayRegex.exec(content);
@@ -292,7 +261,6 @@ function extractArrayElements(content: string, arrayName: string, elementName: s
const arrayContent = arrayMatch[1];
// Extract individual elements using [\s\S]*? for nested content
const elementRegex = new RegExp(`<${elementName}>([\\s\\S]*?)</${elementName}>`, 'g');
let elementMatch;
while ((elementMatch = elementRegex.exec(arrayContent)) !== null) {
+5 -10
@@ -7,19 +7,14 @@ import { logger } from '../utils/logger.js';
import type { ModeConfig } from '../services/domain/types.js';
/**
* Marker string embedded in summary prompts — used by ResponseProcessor to detect
* whether the most recent user message was a summary request (enables observation→summary
* coercion for #1633). Keep in sync with buildSummaryPrompt below.
* Marker string embedded in summary prompts — historically used by
* ResponseProcessor to detect summary turns for the (now-deleted) coercion
* fallback. Kept here because `buildSummaryPrompt` still embeds it as the
* mode-switch banner; deleting the constant would require rewriting the
* prompt builder, which is out of scope for plan 03.
*/
export const SUMMARY_MODE_MARKER = 'MODE SWITCH: PROGRESS SUMMARY';
/**
* Maximum consecutive summary failures before the circuit breaker opens.
* After this many failures, SessionManager.queueSummarize will skip further
* summarize requests to prevent the infinite retry loop (#1633).
*/
export const MAX_CONSECUTIVE_SUMMARY_FAILURES = 3;
export interface Observation {
id: number;
tool_name: string;
@@ -10,7 +10,7 @@
import http from 'http';
import { logger } from '../../utils/logger.js';
import { stopSupervisor } from '../../supervisor/index.js';
import { getSupervisor } from '../../supervisor/index.js';
export interface ShutdownableService {
shutdownAll(): Promise<void>;
@@ -80,7 +80,10 @@ export async function performGracefulShutdown(config: GracefulShutdownConfig): P
}
// STEP 6: Supervisor handles tracked child termination, PID cleanup, and stale sockets.
await stopSupervisor();
// Plan 06 Phase 8 — call the supervisor singleton directly; the wrapper
// re-export from supervisor/index.ts was deleted (one wrapper, one caller,
// no value).
await getSupervisor().stop();
logger.info('SYSTEM', 'Worker shutdown complete');
}
@@ -48,7 +48,7 @@ interface WorktreeEntry {
branch: string | null;
}
const GIT_TIMEOUT_MS = 5000;
const GIT_TIMEOUT_MS = 15000;
class DryRunRollback extends Error {
constructor() {
@@ -58,11 +58,31 @@ class DryRunRollback extends Error {
}
function gitCapture(cwd: string, args: string[]): string | null {
const startTime = Date.now();
const r = spawnSync('git', ['-C', cwd, ...args], {
encoding: 'utf8',
timeout: GIT_TIMEOUT_MS
});
if (r.status !== 0) return null;
const duration = Date.now() - startTime;
if (duration > 1000) {
logger.debug('GIT', `Slow git operation: git -C ${cwd} ${args.join(' ')} took ${duration}ms`);
}
if (r.error) {
logger.warn('GIT', `Git operation failed: git -C ${cwd} ${args.join(' ')}`, {
error: r.error.message,
timedOut: (r.error as NodeJS.ErrnoException).code === 'ETIMEDOUT' || (r.status === null && r.signal === 'SIGTERM')
});
return null;
}
if (r.status !== 0) {
logger.debug('GIT', `Git returned non-zero exit code ${r.status}: git -C ${cwd} ${args.join(' ')}`, {
stderr: r.stderr?.toString().trim()
});
return null;
}
return (r.stdout ?? '').trim();
}
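The check order matters: on timeout, `spawnSync` reports the failure through `r.error` (code `ETIMEDOUT`) while `r.status` stays null, so `r.error` must be inspected first. A minimal sketch of that ordering, using `node` itself as the child so it runs anywhere (the helper name `capture` is made up for the example):

```typescript
import { spawnSync } from 'node:child_process';

// Sketch of the capture helper's check order: r.error (spawn failure or
// timeout) before r.status (non-zero exit). A timed-out child has
// r.status === null, so checking status alone would miss it.
function capture(cmd: string, args: string[], timeoutMs: number): string | null {
  const r = spawnSync(cmd, args, { encoding: 'utf8', timeout: timeoutMs });
  if (r.error) return null;        // spawn failed or timed out
  if (r.status !== 0) return null; // child ran but exited non-zero
  return (r.stdout ?? '').trim();
}

console.log(capture(process.execPath, ['-e', 'console.log("ok")'], 5000));
```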
@@ -281,83 +281,3 @@ export function uninstallCodexCli(): number {
return 0;
}
// ---------------------------------------------------------------------------
// Public API: Status Check
// ---------------------------------------------------------------------------
/**
* Check Codex CLI integration status.
*
* @returns 0 always (informational)
*/
export function checkCodexCliStatus(): number {
console.log('\nClaude-Mem Codex CLI Integration Status\n');
// Check transcript-watch.json
if (!existsSync(DEFAULT_CONFIG_PATH)) {
console.log('Status: Not installed');
console.log(` No transcript watch config at ${DEFAULT_CONFIG_PATH}`);
console.log('\nRun: npx claude-mem install --ide codex-cli\n');
return 0;
}
let config: TranscriptWatchConfig;
try {
config = loadExistingTranscriptWatchConfig();
} catch (error) {
if (error instanceof Error) {
logger.error('WORKER', 'Could not parse transcript-watch.json', { path: DEFAULT_CONFIG_PATH }, error);
} else {
logger.error('WORKER', 'Could not parse transcript-watch.json', { path: DEFAULT_CONFIG_PATH }, new Error(String(error)));
}
console.log('Status: Unknown');
console.log(' Could not parse transcript-watch.json.');
console.log('');
return 0;
}
const codexWatch = config.watches.find(
(w: WatchTarget) => w.name === CODEX_WATCH_NAME,
);
const codexSchema = config.schemas?.[CODEX_WATCH_NAME];
if (!codexWatch) {
console.log('Status: Not installed');
console.log(' transcript-watch.json exists but no codex watch configured.');
console.log('\nRun: npx claude-mem install --ide codex-cli\n');
return 0;
}
console.log('Status: Installed');
console.log(` Config: ${DEFAULT_CONFIG_PATH}`);
console.log(` Watch path: ${codexWatch.path}`);
console.log(` Schema: ${codexSchema ? `codex (v${codexSchema.version ?? '?'})` : 'missing'}`);
console.log(` Start at end: ${codexWatch.startAtEnd ?? false}`);
if (codexWatch.context) {
console.log(` Context mode: ${codexWatch.context.mode}`);
console.log(` Context path: ${codexWatch.context.path ?? '<workspace>/AGENTS.md (default)'}`);
console.log(` Context updates on: ${codexWatch.context.updateOn?.join(', ') ?? 'none'}`);
}
if (existsSync(CODEX_AGENTS_MD_PATH)) {
const mdContent = readFileSync(CODEX_AGENTS_MD_PATH, 'utf-8');
if (mdContent.includes('<claude-mem-context>')) {
console.log(` Legacy global context: Present (${CODEX_AGENTS_MD_PATH})`);
} else {
console.log(` Legacy global context: Not active`);
}
} else {
console.log(` Legacy global context: None`);
}
const sessionsDir = path.join(CODEX_DIR, 'sessions');
if (existsSync(sessionsDir)) {
console.log(` Sessions directory: exists`);
} else {
console.log(` Sessions directory: not yet created (use Codex CLI to generate sessions)`);
}
console.log('');
return 0;
}
+80 -33
@@ -21,6 +21,61 @@ import { getSupervisor } from '../../supervisor/index.js';
import { isPidAlive } from '../../supervisor/process-registry.js';
import { ENV_PREFIXES, ENV_EXACT_MATCHES } from '../../supervisor/env-sanitizer.js';
/**
* Plan 06 Phase 6 — instruction content (SKILL.md + ALLOWED_OPERATIONS .md
* files) is read once at module init and held in memory for the lifetime of
* the worker process. Process restart is the cache-invalidation event.
*
* `SKILL.md` is held as the full UTF-8 string so `extractInstructionSection`
* can slice topic windows on every request without re-reading the file.
* Per-operation files are cached as a `Map<operation, content>`. Files that
are missing on disk are simply omitted from the map; the request handler returns
* 404 in that case (preserving legacy behaviour).
*/
const INSTRUCTIONS_BASE_DIR: string = path.resolve(__dirname, '../skills/mem-search');
const INSTRUCTIONS_OPERATIONS_DIR: string = path.join(INSTRUCTIONS_BASE_DIR, 'operations');
const INSTRUCTIONS_SKILL_PATH: string = path.join(INSTRUCTIONS_BASE_DIR, 'SKILL.md');
const cachedSkillMd: string | null = (() => {
try {
const text = fs.readFileSync(INSTRUCTIONS_SKILL_PATH, 'utf-8');
logger.info('SYSTEM', 'Cached SKILL.md at boot', {
path: INSTRUCTIONS_SKILL_PATH,
bytes: Buffer.byteLength(text, 'utf-8'),
});
return text;
} catch (error: unknown) {
logger.debug('SYSTEM', 'SKILL.md not present at boot, /api/instructions will 404 for topic queries', {
path: INSTRUCTIONS_SKILL_PATH,
message: error instanceof Error ? error.message : String(error),
});
return null;
}
})();
const cachedOperationContent: ReadonlyMap<string, string> = (() => {
const map = new Map<string, string>();
for (const operation of ALLOWED_OPERATIONS) {
const operationPath = path.join(INSTRUCTIONS_OPERATIONS_DIR, `${operation}.md`);
try {
map.set(operation, fs.readFileSync(operationPath, 'utf-8'));
} catch (error: unknown) {
// Missing operation files are non-fatal — 404 is returned per request.
logger.debug('SYSTEM', 'Operation instruction file not present at boot', {
path: operationPath,
message: error instanceof Error ? error.message : String(error),
});
}
}
if (map.size > 0) {
logger.info('SYSTEM', 'Cached operation instruction files at boot', {
count: map.size,
operations: Array.from(map.keys()),
});
}
return map;
})();
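The read-once pattern can be sketched outside the worker context; the directory layout and file names below are throwaway examples, not the real skill paths:

```typescript
import * as fs from 'node:fs';
import * as os from 'node:os';
import * as path from 'node:path';

// Sketch: read a set of instruction files once at init, tolerate missing
// files, and expose the result as a ReadonlyMap. Process restart is the
// only cache-invalidation event, matching the worker's behaviour.
function loadInstructionCache(dir: string, names: readonly string[]): ReadonlyMap<string, string> {
  const map = new Map<string, string>();
  for (const name of names) {
    try {
      map.set(name, fs.readFileSync(path.join(dir, `${name}.md`), 'utf-8'));
    } catch {
      // Missing files are non-fatal; the request handler 404s instead.
    }
  }
  return map;
}

// Demo against a temporary directory.
const dir = fs.mkdtempSync(path.join(os.tmpdir(), 'instr-'));
fs.writeFileSync(path.join(dir, 'search.md'), '# search');
const cache = loadInstructionCache(dir, ['search', 'missing-op']);
console.log(cache.has('search'), cache.has('missing-op'));
```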
// Build-time injected version constant (set by esbuild define)
declare const __DEFAULT_PACKAGE_VERSION__: string;
const BUILT_IN_VERSION = typeof __DEFAULT_PACKAGE_VERSION__ !== 'undefined'
@@ -94,11 +149,20 @@ export class Server {
*/
async listen(port: number, host: string): Promise<void> {
return new Promise<void>((resolve, reject) => {
this.server = this.app.listen(port, host, () => {
const server = http.createServer(this.app);
this.server = server;
const onError = (err: Error) => {
server.off('listening', onListening);
reject(err);
};
const onListening = () => {
server.off('error', onError);
logger.info('SYSTEM', 'HTTP server started', { host, port, pid: process.pid });
resolve();
});
this.server.on('error', reject);
};
server.once('error', onError);
server.once('listening', onListening);
server.listen(port, host);
});
}
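The once/off wiring above is what makes the promisification race-free: whichever of `'error'` / `'listening'` fires first settles the promise and detaches the loser, so a late event cannot invoke a stale callback. A sketch with a bare `EventEmitter` standing in for `http.Server` (the helper name `waitForListen` is invented for the example):

```typescript
import { EventEmitter } from 'node:events';

// Whichever event fires first settles the promise; the losing handler is
// removed so it can never fire against an already-settled promise.
function waitForListen(server: EventEmitter): Promise<void> {
  return new Promise<void>((resolve, reject) => {
    const onError = (err: Error) => {
      server.off('listening', onListening);
      reject(err);
    };
    const onListening = () => {
      server.off('error', onError);
      resolve();
    };
    server.once('error', onError);
    server.once('listening', onListening);
  });
}

// Demo: after 'listening' fires, the 'error' handler is already detached.
const fake = new EventEmitter();
const ready = waitForListen(fake);
fake.emit('listening');
ready.then(() => console.log('error handler detached:', fake.listenerCount('error') === 0));
```

Compare the old shape (`this.app.listen(port, host, cb)` plus a bare `.on('error', reject)`): the error listener was never removed, so an error after a successful listen could reject an already-resolved promise and leak the handler.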
@@ -198,8 +262,9 @@ export class Server {
res.status(200).json({ version: BUILT_IN_VERSION });
});
// Instructions endpoint - loads SKILL.md sections on-demand
this.app.get('/api/instructions', async (req: Request, res: Response) => {
// Instructions endpoint — Plan 06 Phase 6 — serves the cached SKILL.md /
// operations content loaded once at module init.
this.app.get('/api/instructions', (req: Request, res: Response) => {
const topic = (req.query.topic as string) || 'all';
const operation = req.query.operation as string | undefined;
@@ -213,24 +278,20 @@ export class Server {
}
if (operation) {
const OPERATIONS_BASE_DIR = path.resolve(__dirname, '../skills/mem-search/operations');
const operationPath = path.resolve(OPERATIONS_BASE_DIR, `${operation}.md`);
if (!operationPath.startsWith(OPERATIONS_BASE_DIR + path.sep)) {
return res.status(400).json({ error: 'Invalid request' });
const cached = cachedOperationContent.get(operation);
if (cached === undefined) {
logger.debug('HTTP', 'Instruction file not cached at boot', { operation });
return res.status(404).json({ error: 'Instruction not found' });
}
return res.json({ content: [{ type: 'text', text: cached }] });
}
try {
const content = await this.loadInstructionContent(operation, topic);
res.json({ content: [{ type: 'text', text: content }] });
} catch (error) {
if (error instanceof Error) {
logger.debug('HTTP', 'Instruction file not found', { topic, operation, message: error.message });
} else {
logger.debug('HTTP', 'Instruction file not found', { topic, operation, error: String(error) });
}
res.status(404).json({ error: 'Instruction not found' });
if (cachedSkillMd === null) {
logger.debug('HTTP', 'SKILL.md not cached at boot', { topic });
return res.status(404).json({ error: 'Instruction not found' });
}
const sectionText = this.extractInstructionSection(cachedSkillMd, topic);
res.json({ content: [{ type: 'text', text: sectionText }] });
});
// Admin endpoints for process management (localhost-only)
@@ -330,20 +391,6 @@ export class Server {
});
}
/**
* Load instruction content from disk for the /api/instructions endpoint.
* Caller must validate operation/topic before calling.
*/
private async loadInstructionContent(operation: string | undefined, topic: string): Promise<string> {
if (operation) {
const operationPath = path.resolve(__dirname, '../skills/mem-search/operations', `${operation}.md`);
return fs.promises.readFile(operationPath, 'utf-8');
}
const skillPath = path.join(__dirname, '../skills/mem-search/SKILL.md');
const fullContent = await fs.promises.readFile(skillPath, 'utf-8');
return this.extractInstructionSection(fullContent, topic);
}
/**
* Extract a specific section from instruction content
*/
-9
@@ -480,15 +480,6 @@ const QUERIES: Record<string, string> = {
(class_definition name: (identifier) @name) @cls
(import_statement) @imp
(import_declaration) @imp
`,
php: `
(function_definition name: (name) @name) @func
(method_declaration name: (name) @name) @method
(class_declaration name: (name) @name) @cls
(interface_declaration name: (name) @name) @iface
(trait_declaration name: (name) @name) @trait_def
(namespace_use_declaration) @imp
`,
};
-125
@@ -1,8 +1,4 @@
import { Database } from 'bun:sqlite';
import { execFileSync } from 'child_process';
import { existsSync, unlinkSync, writeFileSync } from 'fs';
import { tmpdir } from 'os';
import { join } from 'path';
import { DATA_DIR, DB_PATH, ensureDir } from '../../shared/paths.js';
import { logger } from '../../utils/logger.js';
import { MigrationRunner } from './migrations/runner.js';
@@ -19,118 +15,6 @@ export interface Migration {
let dbInstance: Database | null = null;
/**
* Repair malformed database schema before migrations run.
*
* This handles the case where a database is synced between machines running
* different claude-mem versions. A newer version may have added columns and
* indexes that an older version (or even the same version on a fresh install)
* cannot process. SQLite throws "malformed database schema" when it encounters
* an index referencing a non-existent column, which prevents ALL queries —
* including the migrations that would fix the schema.
*
* The fix: use Python's sqlite3 module (which supports writable_schema) to
* drop the orphaned schema objects, then let the migration system recreate
* them properly. bun:sqlite doesn't allow DELETE FROM sqlite_master even
* with writable_schema = ON.
*/
function repairMalformedSchema(db: Database): void {
try {
// Quick test: if we can query sqlite_master, the schema is fine
db.query('SELECT name FROM sqlite_master WHERE type = "table" LIMIT 1').all();
return;
} catch (error: unknown) {
const message = error instanceof Error ? error.message : String(error);
if (!message.includes('malformed database schema')) {
throw error;
}
logger.warn('DB', 'Detected malformed database schema, attempting repair', { error: message });
// Extract the problematic object name from the error message
// Format: "malformed database schema (object_name) - details"
const match = message.match(/malformed database schema \(([^)]+)\)/);
if (!match) {
logger.error('DB', 'Could not parse malformed schema error, cannot auto-repair', { error: message });
throw error;
}
const objectName = match[1];
logger.info('DB', `Dropping malformed schema object: ${objectName}`);
// Get the DB file path. For file-based DBs, we can use Python to repair.
// For in-memory DBs, we can't shell out — just re-throw.
const dbPath = db.filename;
if (!dbPath || dbPath === ':memory:' || dbPath === '') {
logger.error('DB', 'Cannot auto-repair in-memory database');
throw error;
}
// Close the connection so Python can safely modify the file
db.close();
// Use Python's sqlite3 module to drop the orphaned object and reset
// related migration versions so they re-run and recreate things properly.
// bun:sqlite doesn't support DELETE FROM sqlite_master even with writable_schema.
//
// We write a temp script rather than using -c to avoid shell escaping issues
// with paths containing spaces or special characters. execFileSync passes
// args directly without a shell, so dbPath and objectName are safe.
const scriptPath = join(tmpdir(), `claude-mem-repair-${Date.now()}.py`);
try {
writeFileSync(scriptPath, `
import sqlite3, sys
db_path = sys.argv[1]
obj_name = sys.argv[2]
c = sqlite3.connect(db_path)
c.execute('PRAGMA writable_schema = ON')
c.execute('DELETE FROM sqlite_master WHERE name = ?', (obj_name,))
c.execute('PRAGMA writable_schema = OFF')
# Reset migration versions so affected migrations re-run.
# Guard with existence check: schema_versions may not exist on a very fresh DB.
has_sv = c.execute(
"SELECT count(*) FROM sqlite_master WHERE type='table' AND name='schema_versions'"
).fetchone()[0]
if has_sv:
c.execute('DELETE FROM schema_versions')
c.commit()
c.close()
`);
execFileSync('python3', [scriptPath, dbPath, objectName], { timeout: 10000 });
logger.info('DB', `Dropped orphaned schema object "${objectName}" and reset migration versions via Python sqlite3. All migrations will re-run (they are idempotent).`);
} catch (pyError: unknown) {
const pyMessage = pyError instanceof Error ? pyError.message : String(pyError);
logger.error('DB', 'Python sqlite3 repair failed', { error: pyMessage });
throw new Error(`Schema repair failed: ${message}. Python repair error: ${pyMessage}`);
} finally {
if (existsSync(scriptPath)) unlinkSync(scriptPath);
}
}
}
/**
* Wrapper that handles the close/reopen cycle needed for schema repair.
* Returns a (possibly new) Database connection.
*/
function repairMalformedSchemaWithReopen(dbPath: string, db: Database): Database {
try {
db.query('SELECT name FROM sqlite_master WHERE type = "table" LIMIT 1').all();
return db;
} catch (error: unknown) {
const message = error instanceof Error ? error.message : String(error);
if (!message.includes('malformed database schema')) {
throw error;
}
// repairMalformedSchema closes the DB internally for Python access
repairMalformedSchema(db);
// Reopen and check for additional malformed objects
const newDb = new Database(dbPath, { create: true, readwrite: true });
return repairMalformedSchemaWithReopen(dbPath, newDb);
}
}
/**
* ClaudeMemDatabase - New entry point for the sqlite module
*
@@ -154,11 +38,6 @@ export class ClaudeMemDatabase {
// Create database connection
this.db = new Database(dbPath, { create: true, readwrite: true });
// Repair any malformed schema before applying settings or running migrations.
// Must happen first — even PRAGMA calls can fail on a corrupted schema.
// This may close and reopen the connection if repair is needed.
this.db = repairMalformedSchemaWithReopen(dbPath, this.db);
// Apply optimized SQLite settings
this.db.run('PRAGMA journal_mode = WAL');
this.db.run('PRAGMA synchronous = NORMAL');
@@ -218,10 +97,6 @@ export class DatabaseManager {
this.db = new Database(DB_PATH, { create: true, readwrite: true });
// Repair any malformed schema before applying settings or running migrations.
// Must happen first — even PRAGMA calls can fail on a corrupted schema.
this.db = repairMalformedSchemaWithReopen(DB_PATH, this.db);
// Apply optimized SQLite settings
this.db.run('PRAGMA journal_mode = WAL');
this.db.run('PRAGMA synchronous = NORMAL');
+157 -301
@@ -1,9 +1,18 @@
import { Database } from './sqlite-compat.js';
import { Database } from 'bun:sqlite';
import type { PendingMessage } from '../worker-types.js';
import { logger } from '../../utils/logger.js';
/** Messages processing longer than this are considered stale and reset to pending by self-healing */
const STALE_PROCESSING_THRESHOLD_MS = 60_000;
/**
* Provider for the set of currently-live worker PIDs.
*
* The self-healing claim query reclaims any 'processing' row whose
* worker_pid is NOT a live worker (crash recovery without a timer).
*
* Default: a single-worker process supplies just its own PID. Multi-worker
* deployments inject a callback backed by `supervisor/process-registry.ts`
* (`getSupervisor().getRegistry().getAll().filter(r => r.type === 'worker').map(r => r.pid)`).
*/
export type LiveWorkerPidsProvider = () => readonly number[];
/**
* Persistent pending message record from database
@@ -22,8 +31,8 @@ export interface PersistentPendingMessage {
status: 'pending' | 'processing' | 'processed' | 'failed';
retry_count: number;
created_at_epoch: number;
started_processing_at_epoch: number | null;
completed_at_epoch: number | null;
worker_pid: number | null;
// Claude Code subagent identity — NULL for main-session messages.
agent_type: string | null;
agent_id: string | null;
@@ -37,44 +46,76 @@ export interface PersistentPendingMessage {
*
* Lifecycle:
* 1. enqueue() - Message persisted with status 'pending'
* 2. claimNextMessage() - Atomically claims next pending message (marks as 'processing')
* 2. claimNextMessage() - Atomically claims next pending message (marks as 'processing'
* and stamps the live worker's PID). Self-healing: reclaims any 'processing' row
* whose worker_pid is no longer alive (worker crash) in the same UPDATE.
* 3. confirmProcessed() - Deletes message after successful processing
*
* Self-healing:
* - claimNextMessage() resets stale 'processing' messages (>60s) back to 'pending' before claiming
* - This eliminates stuck messages from generator crashes without external timers
*
* Recovery:
* - getSessionsWithPendingMessages() - Find sessions that need recovery on startup
* Self-healing semantics:
* A 'processing' row is reclaimable iff worker_pid IS NULL or worker_pid is
* not present in the live-pids list at claim time. No timer, no
* stale-cutoff timestamp — liveness is the truth.
*/
export class PendingMessageStore {
private db: Database;
private maxRetries: number;
private workerPid: number;
private getLiveWorkerPids: LiveWorkerPidsProvider;
constructor(db: Database, maxRetries: number = 3) {
/**
* @param db SQLite database
* @param maxRetries Per-message retry ceiling for transient SDK failures (default 3)
* @param workerPid PID of the worker that owns this store; stamped into worker_pid on claim.
* Defaults to process.pid so single-process deployments need no extra wiring.
* @param getLiveWorkerPids Provider for the set of all currently-live worker PIDs.
* Defaults to `[workerPid]` — only this worker is alive.
* Multi-worker deployments inject a supervisor-backed provider.
*/
constructor(
db: Database,
maxRetries: number = 3,
workerPid: number = process.pid,
getLiveWorkerPids?: LiveWorkerPidsProvider
) {
this.db = db;
this.maxRetries = maxRetries;
this.workerPid = workerPid;
this.getLiveWorkerPids = getLiveWorkerPids ?? (() => [this.workerPid]);
}
/**
* Enqueue a new message (persist before processing)
* @returns The database ID of the persisted message
* Enqueue a new message (persist before processing).
*
* Uses `INSERT OR IGNORE` so duplicate (content_session_id, tool_use_id)
* pairs collapse to a single row — the UNIQUE INDEX added in plan 01 phase 1
* is the authority on tool-use idempotency. Per principle 3 (UNIQUE
* constraint over dedup window), we don't time-gate duplicates.
*
* @returns The database ID of the persisted message, or 0 when the insert
* was suppressed by ON CONFLICT. Callers MUST guard with `id > 0`
* before threading the value into any subsequent SQL (e.g.
* `confirmProcessed`, `markFailed`, `processingMessageIds`) —
* a zero id would silently target zero rows. The only two call
* sites today (`SessionManager.queueObservation` and
* `queueSummarize`) use the id purely for logging and both
* branch on `messageId === 0`.
*/
enqueue(sessionDbId: number, contentSessionId: string, message: PendingMessage): number {
const now = Date.now();
const stmt = this.db.prepare(`
INSERT INTO pending_messages (
session_db_id, content_session_id, message_type,
INSERT OR IGNORE INTO pending_messages (
session_db_id, content_session_id, tool_use_id, message_type,
tool_name, tool_input, tool_response, cwd,
last_assistant_message,
prompt_number, status, retry_count, created_at_epoch,
agent_type, agent_id
) VALUES (?, ?, ?, ?, ?, ?, ?, ?, ?, 'pending', 0, ?, ?, ?)
) VALUES (?, ?, ?, ?, ?, ?, ?, ?, ?, ?, 'pending', 0, ?, ?, ?)
`);
const result = stmt.run(
sessionDbId,
contentSessionId,
message.toolUseId ?? null,
message.type,
message.tool_name || null,
message.tool_input ? JSON.stringify(message.tool_input) : null,
@@ -90,58 +131,58 @@ export class PendingMessageStore {
return result.lastInsertRowid as number;
}
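The "id 0 means suppressed duplicate" contract can be sketched with an in-memory analogue of the UNIQUE index, no SQLite required; the class and key format are invented for the example:

```typescript
// In-memory analogue of UNIQUE(content_session_id, tool_use_id) plus
// INSERT OR IGNORE: a duplicate pair returns id 0, mirroring the
// "callers MUST guard with id > 0" contract described above. As in SQL,
// NULL tool_use_id rows never conflict (NULLs are distinct in a UNIQUE index).
class EnqueueSketch {
  private seen = new Set<string>();
  private nextId = 1;

  enqueue(contentSessionId: string, toolUseId: string | null): number {
    const key = `${contentSessionId}\u0000${toolUseId ?? ''}`;
    if (toolUseId !== null && this.seen.has(key)) return 0; // conflict suppressed
    this.seen.add(key);
    return this.nextId++;
  }
}

const q = new EnqueueSketch();
console.log(q.enqueue('sess-1', 'toolu_a')); // fresh pair: real row id
console.log(q.enqueue('sess-1', 'toolu_a')); // duplicate pair: collapses to 0
```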
/**
* Atomically claim the next pending message by marking it as 'processing'.
* Self-healing: resets any stale 'processing' messages (>60s) back to 'pending' first.
* Message stays in DB until confirmProcessed() is called.
* Uses a transaction to prevent race conditions.
/**
* Atomically claim the next message for `sessionDbId`.
*
* A row is claimable iff:
* - status = 'pending', OR
* - status = 'processing' AND worker_pid is not in the live-pids set
* (i.e. the previous owner crashed). This is the self-healing branch:
* liveness is checked at claim time, not by a background reaper.
*
* The claim stamps the live worker's PID and flips status to 'processing'
* in a single UPDATE … WHERE id = (subquery).
*/
claimNextMessage(sessionDbId: number): PersistentPendingMessage | null {
const claimTx = this.db.transaction((sessionId: number) => {
// Capture time inside transaction so it's fresh if WAL contention causes retry
const now = Date.now();
// Self-healing: reset stale 'processing' messages back to 'pending'
// This recovers from generator crashes without external timers
// Note: strict < means messages must be OLDER than threshold to be reset
const staleCutoff = now - STALE_PROCESSING_THRESHOLD_MS;
const resetStmt = this.db.prepare(`
UPDATE pending_messages
SET status = 'pending', started_processing_at_epoch = NULL
WHERE session_db_id = ? AND status = 'processing'
AND started_processing_at_epoch < ?
`);
const resetResult = resetStmt.run(sessionId, staleCutoff);
if (resetResult.changes > 0) {
logger.info('QUEUE', `SELF_HEAL | sessionDbId=${sessionId} | recovered ${resetResult.changes} stale processing message(s)`);
}
// Build a parameterized IN-list of live worker PIDs. We always include
// this worker's PID so that an in-flight claim doesn't accidentally
// self-reclaim a row we just stamped (the predicate is "NOT IN live").
const livePids = this.getLivePidsIncludingSelf();
const placeholders = livePids.map(() => '?').join(',');
const peekStmt = this.db.prepare(`
SELECT * FROM pending_messages
WHERE session_db_id = ? AND status = 'pending'
ORDER BY id ASC
LIMIT 1
`);
const msg = peekStmt.get(sessionId) as PersistentPendingMessage | null;
const sql = `
UPDATE pending_messages
SET status = 'processing',
worker_pid = ?
WHERE id = (
SELECT id FROM pending_messages
WHERE session_db_id = ?
AND (
status = 'pending'
OR (status = 'processing' AND (worker_pid IS NULL OR worker_pid NOT IN (${placeholders})))
)
ORDER BY id ASC
LIMIT 1
)
RETURNING *
`;
if (msg) {
// CRITICAL FIX: Mark as 'processing' instead of deleting
// Message will be deleted by confirmProcessed() after successful store
const updateStmt = this.db.prepare(`
UPDATE pending_messages
SET status = 'processing', started_processing_at_epoch = ?
WHERE id = ?
`);
updateStmt.run(now, msg.id);
const stmt = this.db.prepare(sql);
const params: (number | string)[] = [this.workerPid, sessionDbId, ...livePids];
const claimed = stmt.get(...params) as PersistentPendingMessage | null;
// Log claim with minimal info (avoid logging full payload)
logger.info('QUEUE', `CLAIMED | sessionDbId=${sessionId} | messageId=${msg.id} | type=${msg.message_type}`, {
sessionId: sessionId
});
}
return msg;
});
if (claimed) {
logger.info('QUEUE', `CLAIMED | sessionDbId=${sessionDbId} | messageId=${claimed.id} | type=${claimed.message_type} | workerPid=${this.workerPid}`, {
sessionId: sessionDbId
});
}
return claimed;
}
return claimTx(sessionDbId) as PersistentPendingMessage | null;
private getLivePidsIncludingSelf(): number[] {
const pids = this.getLiveWorkerPids();
if (pids.includes(this.workerPid)) return [...pids];
return [...pids, this.workerPid];
}
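The claimability predicate inside that UPDATE subquery is easy to state in plain TypeScript; the row shape below is simplified for illustration:

```typescript
// Simplified pending_messages row for illustrating the claim predicate.
interface Row {
  id: number;
  status: 'pending' | 'processing' | 'processed' | 'failed';
  worker_pid: number | null;
}

// A row is claimable iff it is pending, OR it is marked processing by a
// worker that is no longer alive (crash recovery at claim time, no timers).
function claimable(row: Row, livePids: readonly number[]): boolean {
  if (row.status === 'pending') return true;
  return row.status === 'processing' &&
    (row.worker_pid === null || !livePids.includes(row.worker_pid));
}

const live = [4242];
const rows: Row[] = [
  { id: 1, status: 'processing', worker_pid: 4242 }, // owned by a live worker: skip
  { id: 2, status: 'processing', worker_pid: 1111 }, // owner crashed: reclaim
  { id: 3, status: 'pending', worker_pid: null },    // normal pending work
];
// The SQL claims the lowest claimable id (ORDER BY id ASC LIMIT 1).
console.log(rows.filter(r => claimable(r, live)).map(r => r.id));
```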
/**
@@ -158,34 +199,19 @@ export class PendingMessageStore {
}
/**
* Reset stale 'processing' messages back to 'pending' for retry.
* Called on worker startup and periodically to recover from crashes.
* @param thresholdMs Messages processing longer than this are considered stale (default: 5 minutes)
* @returns Number of messages reset
* Delete `status='failed'` rows older than `thresholdMs`. Called once at
* worker startup so `pending_messages` does not grow unbounded on long-
* running or high-failure-rate installations; `claimNextMessage`'s
* self-healing subquery scans this table, so bounded rows keep claim
* latency predictable. Not a reaper — one-shot, idempotent.
*/
resetStaleProcessingMessages(thresholdMs: number = 5 * 60 * 1000, sessionDbId?: number): number {
clearFailedOlderThan(thresholdMs: number): number {
const cutoff = Date.now() - thresholdMs;
let stmt;
let result;
if (sessionDbId !== undefined) {
stmt = this.db.prepare(`
UPDATE pending_messages
SET status = 'pending', started_processing_at_epoch = NULL
WHERE status = 'processing' AND started_processing_at_epoch < ? AND session_db_id = ?
`);
result = stmt.run(cutoff, sessionDbId);
} else {
stmt = this.db.prepare(`
UPDATE pending_messages
SET status = 'pending', started_processing_at_epoch = NULL
WHERE status = 'processing' AND started_processing_at_epoch < ?
`);
result = stmt.run(cutoff);
}
if (result.changes > 0) {
logger.info('QUEUE', `RESET_STALE | count=${result.changes} | thresholdMs=${thresholdMs}${sessionDbId !== undefined ? ` | sessionDbId=${sessionDbId}` : ''}`);
}
return result.changes;
const stmt = this.db.prepare(`
DELETE FROM pending_messages
WHERE status = 'failed' AND COALESCE(failed_at_epoch, completed_at_epoch, 0) < ?
`);
return stmt.run(cutoff).changes;
}
/**
@@ -201,144 +227,44 @@ export class PendingMessageStore {
}
/**
* Get all queue messages (for UI display)
* Returns pending, processing, and failed messages (not processed - they're deleted)
* Joins with sdk_sessions to get project name
* Transition pending_messages rows to a terminal status — PATHFINDER-2026-04-22
* Plan 06 Phase 9. One SQL UPDATE path, one place to add a new terminal status
* later, zero divergence between call sites.
*
* - `failed` — narrow form: only rows currently `status='processing'`.
* Used during error recovery when a session generator crashes and we want
* to mark its in-flight messages failed without touching rows that never
* left `pending`.
*
* - `abandoned` — wide form: rows in `('pending', 'processing')`.
* Used during session termination or completion drain so the session
* doesn't appear in `getSessionsWithPendingMessages` forever. Both forms
* write the row's `status` column to `'failed'`; `abandoned` is just the
* broader WHERE clause.
*
* Per Principle 6 (one helper, N callers) and Principle 7 (the
* old per-status wrapper methods were deleted in the same PR).
*
* @param status `'failed'` (processing-only) or `'abandoned'` (pending+processing)
* @param filter `{ sessionDbId: number }` — scope to one session's rows.
* Required: no unscoped path exists, to prevent accidental global drain.
* @returns Number of rows updated
*/
getQueueMessages(): (PersistentPendingMessage & { project: string | null })[] {
const stmt = this.db.prepare(`
SELECT pm.*, ss.project
FROM pending_messages pm
LEFT JOIN sdk_sessions ss ON pm.content_session_id = ss.content_session_id
WHERE pm.status IN ('pending', 'processing', 'failed')
ORDER BY
CASE pm.status
WHEN 'failed' THEN 0
WHEN 'processing' THEN 1
WHEN 'pending' THEN 2
END,
pm.created_at_epoch ASC
`);
return stmt.all() as (PersistentPendingMessage & { project: string | null })[];
}
/**
* Get count of stuck messages (processing longer than threshold)
*/
getStuckCount(thresholdMs: number): number {
const cutoff = Date.now() - thresholdMs;
const stmt = this.db.prepare(`
SELECT COUNT(*) as count FROM pending_messages
WHERE status = 'processing' AND started_processing_at_epoch < ?
`);
const result = stmt.get(cutoff) as { count: number };
return result.count;
}
/**
* Retry a specific message (reset to pending)
* Works for pending (re-queue), processing (reset stuck), and failed messages
*/
retryMessage(messageId: number): boolean {
const stmt = this.db.prepare(`
UPDATE pending_messages
SET status = 'pending', started_processing_at_epoch = NULL
WHERE id = ? AND status IN ('pending', 'processing', 'failed')
`);
const result = stmt.run(messageId);
return result.changes > 0;
}
/**
* Reset all processing messages for a session to pending
* Used when force-restarting a stuck session
*/
resetProcessingToPending(sessionDbId: number): number {
const stmt = this.db.prepare(`
UPDATE pending_messages
SET status = 'pending', started_processing_at_epoch = NULL
WHERE session_db_id = ? AND status = 'processing'
`);
const result = stmt.run(sessionDbId);
return result.changes;
}
/**
* Mark all processing messages for a session as failed
* Used in error recovery when session generator crashes
* @returns Number of messages marked failed
*/
markSessionMessagesFailed(sessionDbId: number): number {
transitionMessagesTo(
status: 'failed' | 'abandoned',
filter: { sessionDbId: number }
): number {
const now = Date.now();
const statusClause = status === 'failed'
? `status = 'processing'`
: `status IN ('pending', 'processing')`;
// Atomic update - all processing messages for session → failed
// Note: This bypasses retry logic since generator failures are session-level,
// not message-level. Individual message failures use markFailed() instead.
const stmt = this.db.prepare(`
UPDATE pending_messages
SET status = 'failed', failed_at_epoch = ?
WHERE session_db_id = ? AND status = 'processing'
WHERE session_db_id = ? AND ${statusClause}
`);
const result = stmt.run(now, sessionDbId);
return result.changes;
}
/**
* Abort a specific message (delete from queue)
*/
abortMessage(messageId: number): boolean {
const stmt = this.db.prepare('DELETE FROM pending_messages WHERE id = ?');
const result = stmt.run(messageId);
return result.changes > 0;
}
/**
* Retry all stuck messages at once
*/
retryAllStuck(thresholdMs: number): number {
const cutoff = Date.now() - thresholdMs;
const stmt = this.db.prepare(`
UPDATE pending_messages
SET status = 'pending', started_processing_at_epoch = NULL
WHERE status = 'processing' AND started_processing_at_epoch < ?
`);
const result = stmt.run(cutoff);
return result.changes;
}
/**
* Get recently processed messages (for UI feedback)
* Shows messages completed in the last N minutes so users can see their stuck items were processed
*/
getRecentlyProcessed(limit: number = 10, withinMinutes: number = 30): (PersistentPendingMessage & { project: string | null })[] {
const cutoff = Date.now() - (withinMinutes * 60 * 1000);
const stmt = this.db.prepare(`
SELECT pm.*, ss.project
FROM pending_messages pm
LEFT JOIN sdk_sessions ss ON pm.content_session_id = ss.content_session_id
WHERE pm.status = 'processed' AND pm.completed_at_epoch > ?
ORDER BY pm.completed_at_epoch DESC
LIMIT ?
`);
return stmt.all(cutoff, limit) as (PersistentPendingMessage & { project: string | null })[];
}
/**
@@ -358,7 +284,7 @@ export class PendingMessageStore {
// Move back to pending for retry
const stmt = this.db.prepare(`
UPDATE pending_messages
SET status = 'pending', retry_count = retry_count + 1, worker_pid = NULL
WHERE id = ?
`);
stmt.run(messageId);
@@ -373,24 +299,6 @@ export class PendingMessageStore {
}
}
/**
* Get count of pending messages for a session
*/
@@ -417,27 +325,21 @@ export class PendingMessageStore {
}
/**
* Check if any session has work that could be claimed right now.
*
* Counts a row as work iff it is 'pending' or it is 'processing' under a
* worker_pid that is not currently alive (the same predicate the
* self-healing claim uses). No side effects — no UPDATE, no timer.
*/
hasAnyPendingWork(): boolean {
const livePids = this.getLivePidsIncludingSelf();
const placeholders = livePids.map(() => '?').join(',');
const stmt = this.db.prepare(`
SELECT COUNT(*) as count FROM pending_messages
WHERE status = 'pending'
OR (status = 'processing' AND (worker_pid IS NULL OR worker_pid NOT IN (${placeholders})))
`);
const result = stmt.get(...livePids) as { count: number };
return result.count > 0;
}
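The liveness predicate above is built from a dynamic placeholder list. A sketch of just that construction (function name hypothetical), assuming the caller's PID list always includes the current process and is therefore never empty, since `NOT IN ()` is a syntax error in SQLite:

```typescript
// Build the live-PID predicate used by the side-effect-free pending-work check.
function pendingWorkPredicate(livePids: number[]): string {
  if (livePids.length === 0) {
    throw new Error('livePids must include at least the current process PID');
  }
  // One '?' placeholder per live PID, bound at query time.
  const placeholders = livePids.map(() => '?').join(',');
  return `status = 'pending'
OR (status = 'processing' AND (worker_pid IS NULL OR worker_pid NOT IN (${placeholders})))`;
}
```

The `worker_pid IS NULL` arm matters: `NULL NOT IN (...)` evaluates to NULL (not true) in SQL, so freshly migrated rows would otherwise never count as claimable.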
@@ -464,52 +366,6 @@ export class PendingMessageStore {
return result ? { sessionDbId: result.session_db_id, contentSessionId: result.content_session_id } : null;
}
/**
* Convert a PersistentPendingMessage back to PendingMessage format
*/
@@ -25,13 +25,14 @@ export class SessionSearch {
private static readonly MISSING_SEARCH_INPUT_MESSAGE = 'Either query or filters required for search';
constructor(dbPathOrDb: string | Database = DB_PATH) {
if (dbPathOrDb instanceof Database) {
this.db = dbPathOrDb;
} else {
ensureDir(DATA_DIR);
this.db = new Database(dbPathOrDb);
this.db.run('PRAGMA journal_mode = WAL');
}
// Cache FTS5 availability once at construction (avoids DDL probe on every query)
this._fts5Available = this.isFts5Available();
@@ -1,4 +1,4 @@
import { Database, type SQLQueryBindings } from 'bun:sqlite';
import { DATA_DIR, DB_PATH, ensureDir, OBSERVER_SESSIONS_PROJECT } from '../../shared/paths.js';
import { logger } from '../../utils/logger.js';
import {
@@ -13,7 +13,8 @@ import {
LatestPromptResult
} from '../../types/database.js';
import type { PendingMessageStore } from './PendingMessageStore.js';
import type { ObservationSearchResult, SessionSummarySearchResult } from './types.js';
import { computeObservationContentHash } from './observations/store.js';
import { parseFileList } from './observations/files.js';
import { DEFAULT_PLATFORM_SOURCE, normalizePlatformSource, sortPlatformSources } from '../../shared/platform-source.js';
@@ -34,17 +35,21 @@ function resolveCreateSessionArgs(
export class SessionStore {
public db: Database;
constructor(dbPathOrDb: string | Database = DB_PATH) {
if (dbPathOrDb instanceof Database) {
this.db = dbPathOrDb;
} else {
if (dbPathOrDb !== ':memory:') {
ensureDir(DATA_DIR);
}
this.db = new Database(dbPathOrDb);
// Ensure optimized settings only for new connections
this.db.run('PRAGMA journal_mode = WAL');
this.db.run('PRAGMA synchronous = NORMAL');
this.db.run('PRAGMA foreign_keys = ON');
this.db.run('PRAGMA journal_size_limit = 4194304'); // 4MB WAL cap (#1956)
}
// Initialize schema if needed (fresh database)
this.initializeSchema();
@@ -68,6 +73,7 @@ export class SessionStore {
this.addObservationModelColumns();
this.ensureMergedIntoProjectColumns();
this.addObservationSubagentColumns();
this.addObservationsUniqueContentHashIndex();
}
/**
@@ -565,7 +571,6 @@ export class SessionStore {
status TEXT NOT NULL DEFAULT 'pending' CHECK(status IN ('pending', 'processing', 'processed', 'failed')),
retry_count INTEGER NOT NULL DEFAULT 0,
created_at_epoch INTEGER NOT NULL,
completed_at_epoch INTEGER,
FOREIGN KEY (session_db_id) REFERENCES sdk_sessions(id) ON DELETE CASCADE
)
@@ -661,7 +666,7 @@ export class SessionStore {
/**
* Add failed_at_epoch column to pending_messages (migration 20)
* Used by transitionMessagesTo() for error recovery tracking
*/
private addFailedAtEpochColumn(): void {
const applied = this.db.prepare('SELECT version FROM schema_versions WHERE version = ?').get(20) as SchemaVersion | undefined;
@@ -1033,6 +1038,47 @@ export class SessionStore {
}
}
/**
* Add UNIQUE(memory_session_id, content_hash) on observations (migration 29).
* Mirrors MigrationRunner.addObservationsUniqueContentHashIndex so bundled
* artifacts that embed SessionStore (e.g. worker-service.cjs, context-generator.cjs)
* stay schema-consistent. Without this, INSERT … ON CONFLICT(memory_session_id,
* content_hash) DO NOTHING throws "ON CONFLICT clause does not match any
* PRIMARY KEY or UNIQUE constraint" and every observation insert fails.
*/
private addObservationsUniqueContentHashIndex(): void {
const applied = this.db.prepare('SELECT version FROM schema_versions WHERE version = ?').get(29) as SchemaVersion | undefined;
if (applied) return;
const obsCols = this.db.query('PRAGMA table_info(observations)').all() as TableColumnInfo[];
const hasMem = obsCols.some(c => c.name === 'memory_session_id');
const hasHash = obsCols.some(c => c.name === 'content_hash');
if (!hasMem || !hasHash) {
this.db.prepare('INSERT OR IGNORE INTO schema_versions (version, applied_at) VALUES (?, ?)').run(29, new Date().toISOString());
return;
}
this.db.run('BEGIN TRANSACTION');
try {
this.db.run(`
DELETE FROM observations
WHERE id NOT IN (
SELECT MIN(id) FROM observations
GROUP BY memory_session_id, content_hash
)
`);
this.db.run(`
CREATE UNIQUE INDEX IF NOT EXISTS ux_observations_session_hash
ON observations(memory_session_id, content_hash)
`);
this.db.prepare('INSERT OR IGNORE INTO schema_versions (version, applied_at) VALUES (?, ?)').run(29, new Date().toISOString());
this.db.run('COMMIT');
} catch (error) {
this.db.run('ROLLBACK');
throw error;
}
}
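The DELETE in the migration above keeps only MIN(id) per (memory_session_id, content_hash) pair before the UNIQUE index lands. The same keep-oldest rule, sketched in plain TypeScript (names hypothetical) so the survivor set is easy to reason about:

```typescript
interface ObsRow {
  id: number;
  memorySessionId: string;
  contentHash: string;
}

// Mirror of the migration's DELETE ... WHERE id NOT IN (SELECT MIN(id) ... GROUP BY ...):
// for rows sharing (memory_session_id, content_hash), only the lowest id survives.
function survivors(rows: ObsRow[]): ObsRow[] {
  const keep = new Map<string, ObsRow>();
  for (const row of rows) {
    const key = `${row.memorySessionId}\u0000${row.contentHash}`;
    const current = keep.get(key);
    if (!current || row.id < current.id) keep.set(key, row);
  }
  return [...keep.values()].sort((a, b) => a.id - b.id);
}
```

Keeping the lowest id preserves the oldest copy, which is the row other tables are most likely to reference.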
/**
* Update the memory session ID for a session
* Called by SDKAgent when it captures the session ID from the first SDK message
@@ -1112,7 +1158,18 @@ export class SessionStore {
LIMIT ?
`);
return stmt.all(project, limit) as Array<{
request: string | null;
investigated: string | null;
learned: string | null;
completed: string | null;
next_steps: string | null;
files_read: string | null;
files_edited: string | null;
notes: string | null;
prompt_number: number | null;
created_at: string;
}>;
}
/**
@@ -1137,7 +1194,15 @@ export class SessionStore {
LIMIT ?
`);
return stmt.all(project, limit) as Array<{
memory_session_id: string;
request: string | null;
learned: string | null;
completed: string | null;
next_steps: string | null;
prompt_number: number | null;
created_at: string;
}>;
}
/**
@@ -1157,7 +1222,12 @@ export class SessionStore {
LIMIT ?
`);
return stmt.all(project, limit) as Array<{
type: string;
text: string;
prompt_number: number | null;
created_at: string;
}>;
}
/**
@@ -1193,7 +1263,18 @@ export class SessionStore {
LIMIT ?
`);
return stmt.all(limit) as Array<{
id: number;
type: string;
title: string | null;
subtitle: string | null;
text: string;
project: string;
platform_source: string;
prompt_number: number | null;
created_at: string;
created_at_epoch: number;
}>;
}
/**
@@ -1237,7 +1318,22 @@ export class SessionStore {
LIMIT ?
`);
return stmt.all(limit) as Array<{
id: number;
request: string | null;
investigated: string | null;
learned: string | null;
completed: string | null;
next_steps: string | null;
files_read: string | null;
files_edited: string | null;
notes: string | null;
project: string;
platform_source: string;
prompt_number: number | null;
created_at: string;
created_at_epoch: number;
}>;
}
/**
@@ -1269,7 +1365,16 @@ export class SessionStore {
LIMIT ?
`);
return stmt.all(limit) as Array<{
id: number;
content_session_id: string;
project: string;
platform_source: string;
prompt_number: number;
prompt_text: string;
created_at: string;
created_at_epoch: number;
}>;
}
/**
@@ -1283,7 +1388,7 @@ export class SessionStore {
WHERE project IS NOT NULL AND project != ''
AND project != ?
`;
const params: SQLQueryBindings[] = [OBSERVER_SESSIONS_PROJECT];
if (normalizedPlatformSource) {
query += ' AND COALESCE(platform_source, ?) = ?';
@@ -1404,7 +1509,13 @@ export class SessionStore {
ORDER BY started_at_epoch ASC
`);
return stmt.all(project, limit) as Array<{
memory_session_id: string | null;
status: string;
started_at: string;
user_prompt: string | null;
has_summary: boolean;
}>;
}
/**
@@ -1423,7 +1534,12 @@ export class SessionStore {
ORDER BY created_at_epoch ASC
`);
return stmt.all(memorySessionId) as Array<{
title: string;
subtitle: string;
type: string;
prompt_number: number | null;
}>;
}
/**
@@ -1445,7 +1561,7 @@ export class SessionStore {
getObservationsByIds(
ids: number[],
options: { orderBy?: 'date_desc' | 'date_asc'; limit?: number; project?: string; type?: string | string[]; concepts?: string | string[]; files?: string | string[] } = {}
): ObservationSearchResult[] {
if (ids.length === 0) return [];
const { orderBy = 'date_desc', limit, project, type, concepts, files } = options;
@@ -1509,7 +1625,7 @@ export class SessionStore {
${limitClause}
`);
return stmt.all(...params) as ObservationSearchResult[];
}
/**
@@ -1539,7 +1655,19 @@ export class SessionStore {
LIMIT 1
`);
return (stmt.get(memorySessionId) as {
request: string | null;
investigated: string | null;
learned: string | null;
completed: string | null;
next_steps: string | null;
files_read: string | null;
files_edited: string | null;
notes: string | null;
prompt_number: number | null;
created_at: string;
created_at_epoch: number;
} | null) || null;
}
/**
@@ -1599,7 +1727,16 @@ export class SessionStore {
LIMIT 1
`);
return (stmt.get(id) as {
id: number;
content_session_id: string;
memory_session_id: string | null;
project: string;
platform_source: string;
user_prompt: string;
custom_title: string | null;
status: string;
} | null) || null;
}
/**
@@ -1805,12 +1942,9 @@ export class SessionStore {
const timestampEpoch = overrideTimestampEpoch ?? Date.now();
const timestampIso = new Date(timestampEpoch).toISOString();
// DB-enforced dedup: UNIQUE(memory_session_id, content_hash) +
// ON CONFLICT DO NOTHING (Plan 01 Phase 4).
const contentHash = computeObservationContentHash(memorySessionId, observation.title, observation.narrative);
const stmt = this.db.prepare(`
INSERT INTO observations
@@ -1818,9 +1952,11 @@ export class SessionStore {
files_read, files_modified, prompt_number, discovery_tokens, agent_type, agent_id, content_hash, created_at, created_at_epoch,
generated_by_model)
VALUES (?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?)
ON CONFLICT(memory_session_id, content_hash) DO NOTHING
RETURNING id, created_at_epoch
`);
const inserted = stmt.get(
memorySessionId,
project,
observation.type,
@@ -1839,12 +1975,22 @@ export class SessionStore {
timestampIso,
timestampEpoch,
generatedByModel || null
) as { id: number; created_at_epoch: number } | null;
if (inserted) {
return { id: inserted.id, createdAtEpoch: inserted.created_at_epoch };
}
const existing = this.db.prepare(
'SELECT id, created_at_epoch FROM observations WHERE memory_session_id = ? AND content_hash = ?'
).get(memorySessionId, contentHash) as { id: number; created_at_epoch: number } | null;
if (!existing) {
throw new Error(
`storeObservation: ON CONFLICT without existing row for content_hash=${contentHash}`
);
}
return { id: existing.id, createdAtEpoch: existing.created_at_epoch };
}
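`computeObservationContentHash` lives in `observations/store.ts` and its body is not shown in this hunk. A hypothetical stand-in using `node:crypto` illustrates the identity it hashes, assuming a SHA-256 digest truncated to a short token (the real function may differ in algorithm and length):

```typescript
import { createHash } from 'node:crypto';

// Hypothetical stand-in for computeObservationContentHash: hash the semantic
// identity of an observation, (memory_session_id, title, narrative), into a
// short stable token suitable for the UNIQUE(memory_session_id, content_hash) index.
function contentHashSketch(memorySessionId: string, title: string, narrative: string): string {
  return createHash('sha256')
    .update(`${memorySessionId}\u0000${title}\u0000${narrative}`)
    .digest('hex')
    .slice(0, 16);
}
```

Any stable digest over the same three fields works here; what matters for the ON CONFLICT path is determinism, so identical observations from a retried message collapse onto the same row.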
/**
@@ -1950,25 +2096,25 @@ export class SessionStore {
const storeTx = this.db.transaction(() => {
const observationIds: number[] = [];
// 1. Store all observations.
// DB-enforced dedup via UNIQUE(memory_session_id, content_hash) +
// ON CONFLICT DO NOTHING (Plan 01 Phase 4).
const obsStmt = this.db.prepare(`
INSERT INTO observations
(memory_session_id, project, type, title, subtitle, facts, narrative, concepts,
files_read, files_modified, prompt_number, discovery_tokens, agent_type, agent_id, content_hash, created_at, created_at_epoch,
generated_by_model)
VALUES (?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?)
ON CONFLICT(memory_session_id, content_hash) DO NOTHING
RETURNING id
`);
const lookupExistingStmt = this.db.prepare(
'SELECT id FROM observations WHERE memory_session_id = ? AND content_hash = ?'
);
for (const observation of observations) {
const contentHash = computeObservationContentHash(memorySessionId, observation.title, observation.narrative);
const inserted = obsStmt.get(
memorySessionId,
project,
observation.type,
@@ -1987,8 +2133,20 @@ export class SessionStore {
timestampIso,
timestampEpoch,
generatedByModel || null
) as { id: number } | null;
if (inserted) {
observationIds.push(inserted.id);
continue;
}
const existing = lookupExistingStmt.get(memorySessionId, contentHash) as { id: number } | null;
if (!existing) {
throw new Error(
`storeObservations: ON CONFLICT without existing row for content_hash=${contentHash}`
);
}
observationIds.push(existing.id);
}
// 2. Store summary if provided
@@ -2086,25 +2244,25 @@ export class SessionStore {
const storeAndMarkTx = this.db.transaction(() => {
const observationIds: number[] = [];
// 1. Store all observations.
// DB-enforced dedup via UNIQUE(memory_session_id, content_hash) +
// ON CONFLICT DO NOTHING (Plan 01 Phase 4).
const obsStmt = this.db.prepare(`
INSERT INTO observations
(memory_session_id, project, type, title, subtitle, facts, narrative, concepts,
files_read, files_modified, prompt_number, discovery_tokens, agent_type, agent_id, content_hash, created_at, created_at_epoch,
generated_by_model)
VALUES (?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?)
ON CONFLICT(memory_session_id, content_hash) DO NOTHING
RETURNING id
`);
const lookupExistingStmt = this.db.prepare(
'SELECT id FROM observations WHERE memory_session_id = ? AND content_hash = ?'
);
for (const observation of observations) {
const contentHash = computeObservationContentHash(memorySessionId, observation.title, observation.narrative);
const inserted = obsStmt.get(
memorySessionId,
project,
observation.type,
@@ -2123,8 +2281,20 @@ export class SessionStore {
timestampIso,
timestampEpoch,
generatedByModel || null
) as { id: number } | null;
if (inserted) {
observationIds.push(inserted.id);
continue;
}
const existing = lookupExistingStmt.get(memorySessionId, contentHash) as { id: number } | null;
if (!existing) {
throw new Error(
`storeObservationsAndMarkComplete: ON CONFLICT without existing row for content_hash=${contentHash}`
);
}
observationIds.push(existing.id);
}
// 2. Store summary if provided
@@ -2177,11 +2347,6 @@ export class SessionStore {
// REMOVED: cleanupOrphanedSessions - violates "EVERYTHING SHOULD SAVE ALWAYS"
// There's no such thing as an "orphaned" session. Sessions are created by hooks
// and managed by Claude Code's lifecycle. Worker restarts don't invalidate them.
// Marking all active sessions as 'failed' on startup destroys the user's current work.
/**
* Get session summaries by IDs (for hybrid Chroma search)
* Returns summaries in specified temporal order
@@ -2189,7 +2354,7 @@ export class SessionStore {
getSessionSummariesByIds(
ids: number[],
options: { orderBy?: 'date_desc' | 'date_asc'; limit?: number; project?: string } = {}
): SessionSummarySearchResult[] {
if (ids.length === 0) return [];
const { orderBy = 'date_desc', limit, project } = options;
@@ -2211,7 +2376,7 @@ export class SessionStore {
${limitClause}
`);
return stmt.all(...params) as SessionSummarySearchResult[];
}
/**
@@ -2443,7 +2608,15 @@ export class SessionStore {
LIMIT 1
`);
return (stmt.get(id) as {
id: number;
content_session_id: string;
prompt_number: number;
prompt_text: string;
project: string;
created_at: string;
created_at_epoch: number;
} | null) || null;
}
/**
@@ -2519,7 +2692,18 @@ export class SessionStore {
LIMIT 1
`);
return (stmt.get(id) as {
id: number;
memory_session_id: string | null;
content_session_id: string;
project: string;
user_prompt: string;
request_summary: string | null;
learned_summary: string | null;
status: string;
created_at: string;
created_at_epoch: number;
} | null) || null;
}
/**
@@ -30,7 +30,6 @@ export class MigrationRunner {
this.ensureDiscoveryTokensColumn();
this.createPendingMessagesTable();
this.renameSessionIdColumns();
this.addFailedAtEpochColumn();
this.addOnUpdateCascadeToForeignKeys();
this.addObservationContentHashColumn();
@@ -39,6 +38,8 @@ export class MigrationRunner {
this.addSessionPlatformSourceColumn();
this.ensureMergedIntoProjectColumns();
this.addObservationSubagentColumns();
this.rebuildPendingMessagesForSelfHealingClaim();
this.addObservationsUniqueContentHashIndex();
}
/**
@@ -533,7 +534,6 @@ export class MigrationRunner {
status TEXT NOT NULL DEFAULT 'pending' CHECK(status IN ('pending', 'processing', 'processed', 'failed')),
retry_count INTEGER NOT NULL DEFAULT 0,
created_at_epoch INTEGER NOT NULL,
completed_at_epoch INTEGER,
FOREIGN KEY (session_db_id) REFERENCES sdk_sessions(id) ON DELETE CASCADE
)
@@ -613,23 +613,9 @@ export class MigrationRunner {
}
}
/**
* Add failed_at_epoch column to pending_messages (migration 20)
* Used by transitionMessagesTo() for error recovery tracking
*/
private addFailedAtEpochColumn(): void {
const applied = this.db.prepare('SELECT version FROM schema_versions WHERE version = ?').get(20) as SchemaVersion | undefined;
@@ -1015,4 +1001,207 @@ export class MigrationRunner {
this.db.prepare('INSERT OR IGNORE INTO schema_versions (version, applied_at) VALUES (?, ?)').run(27, new Date().toISOString());
}
}
/**
* Rebuild pending_messages for self-healing claim (migration 28).
*
* PATHFINDER-2026-04-22 Plan 01 Phase 2.
*
* - Drops the legacy stale-reset epoch column (was the input to the
* 60-s stale-reset; replaced by worker-PID liveness at claim time).
* - Adds `worker_pid INTEGER` (set by claimNextMessage to the live
* worker's PID; rows whose worker_pid is no longer alive are
* immediately reclaimable).
* - Adds `tool_use_id TEXT` so ingestion-time pairing of tool_use
* tool_result can be DB-backed instead of an in-memory Map
* (Plan 03 dependency).
* - Dedupes any existing rows that share (content_session_id,
* tool_use_id), then creates a partial UNIQUE index.
*
* Follows the table-rebuild precedent at runner.ts:691 (migration 21):
* disable FKs, BEGIN, recreate, INSERT-SELECT, RENAME, COMMIT, re-enable.
*/
private rebuildPendingMessagesForSelfHealingClaim(): void {
const applied = this.db.prepare('SELECT version FROM schema_versions WHERE version = ?').get(28) as SchemaVersion | undefined;
if (applied) return;
const pendingExists = (this.db.query("SELECT name FROM sqlite_master WHERE type='table' AND name='pending_messages'").all() as TableNameRow[]).length > 0;
if (!pendingExists) {
// pending_messages table never created on this DB — nothing to rebuild.
this.db.prepare('INSERT OR IGNORE INTO schema_versions (version, applied_at) VALUES (?, ?)').run(28, new Date().toISOString());
return;
}
logger.debug('DB', 'Rebuilding pending_messages for self-healing claim (migration 28)');
// PRAGMA foreign_keys must be set outside a transaction.
this.db.run('PRAGMA foreign_keys = OFF');
this.db.run('BEGIN TRANSACTION');
try {
// Source columns may include legacy fields. We build the SELECT explicitly
// using only columns we know are present in the source after migration 27.
const sourceCols = this.db.query('PRAGMA table_info(pending_messages)').all() as TableColumnInfo[];
const colNames = new Set(sourceCols.map(c => c.name));
const has = (name: string) => colNames.has(name);
// Clean up leftover temp from a previously-crashed run.
this.db.run('DROP TABLE IF EXISTS pending_messages_new');
this.db.run(`
CREATE TABLE pending_messages_new (
id INTEGER PRIMARY KEY AUTOINCREMENT,
session_db_id INTEGER NOT NULL,
content_session_id TEXT NOT NULL,
tool_use_id TEXT,
message_type TEXT NOT NULL CHECK(message_type IN ('observation', 'summarize')),
tool_name TEXT,
tool_input TEXT,
tool_response TEXT,
cwd TEXT,
last_user_message TEXT,
last_assistant_message TEXT,
prompt_number INTEGER,
status TEXT NOT NULL DEFAULT 'pending'
CHECK(status IN ('pending', 'processing', 'processed', 'failed')),
retry_count INTEGER NOT NULL DEFAULT 0,
created_at_epoch INTEGER NOT NULL,
failed_at_epoch INTEGER,
completed_at_epoch INTEGER,
worker_pid INTEGER,
agent_type TEXT,
agent_id TEXT,
FOREIGN KEY (session_db_id) REFERENCES sdk_sessions(id) ON DELETE CASCADE
)
`);
// INSERT-SELECT — note that the legacy stale-reset epoch column is
// intentionally omitted. Any 'processing' row is left with worker_pid =
// NULL so that a self-healing claim picks it up immediately on next
// worker boot.
this.db.run(`
INSERT INTO pending_messages_new (
id, session_db_id, content_session_id, tool_use_id, message_type,
tool_name, tool_input, tool_response, cwd, last_user_message,
last_assistant_message, prompt_number, status, retry_count,
created_at_epoch, failed_at_epoch, completed_at_epoch, worker_pid,
agent_type, agent_id
)
SELECT
id,
session_db_id,
content_session_id,
${has('tool_use_id') ? 'tool_use_id' : 'NULL'},
message_type,
tool_name,
tool_input,
tool_response,
cwd,
${has('last_user_message') ? 'last_user_message' : 'NULL'},
${has('last_assistant_message') ? 'last_assistant_message' : 'NULL'},
${has('prompt_number') ? 'prompt_number' : 'NULL'},
status,
retry_count,
created_at_epoch,
${has('failed_at_epoch') ? 'failed_at_epoch' : 'NULL'},
${has('completed_at_epoch') ? 'completed_at_epoch' : 'NULL'},
NULL,
${has('agent_type') ? 'agent_type' : 'NULL'},
${has('agent_id') ? 'agent_id' : 'NULL'}
FROM pending_messages
`);
this.db.run('DROP TABLE pending_messages');
this.db.run('ALTER TABLE pending_messages_new RENAME TO pending_messages');
this.db.run('CREATE INDEX IF NOT EXISTS idx_pending_messages_session ON pending_messages(session_db_id)');
this.db.run('CREATE INDEX IF NOT EXISTS idx_pending_messages_status ON pending_messages(status)');
this.db.run('CREATE INDEX IF NOT EXISTS idx_pending_messages_claude_session ON pending_messages(content_session_id)');
this.db.run('CREATE INDEX IF NOT EXISTS idx_pending_messages_worker_pid ON pending_messages(worker_pid)');
// Dedup any pre-existing duplicate (content_session_id, tool_use_id) pairs
// before adding the UNIQUE index. Keep the lowest id (oldest) per pair.
this.db.run(`
DELETE FROM pending_messages
WHERE tool_use_id IS NOT NULL
AND id NOT IN (
SELECT MIN(id) FROM pending_messages
WHERE tool_use_id IS NOT NULL
GROUP BY content_session_id, tool_use_id
)
`);
this.db.run(`
CREATE UNIQUE INDEX IF NOT EXISTS ux_pending_session_tool
ON pending_messages(content_session_id, tool_use_id)
WHERE tool_use_id IS NOT NULL
`);
this.db.prepare('INSERT OR IGNORE INTO schema_versions (version, applied_at) VALUES (?, ?)').run(28, new Date().toISOString());
this.db.run('COMMIT');
this.db.run('PRAGMA foreign_keys = ON');
logger.debug('DB', 'Rebuilt pending_messages for self-healing claim');
} catch (error) {
this.db.run('ROLLBACK');
this.db.run('PRAGMA foreign_keys = ON');
if (error instanceof Error) {
throw error;
}
throw new Error(`Migration 28 failed: ${String(error)}`);
}
}
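Migration 28's INSERT-SELECT substitutes literal `NULL` for any column the source table lacks, using the `has(name)` check over `PRAGMA table_info`. That fallback rule, extracted as a small helper (a sketch under assumed names, not repo code):

```typescript
// Sketch of the migration's column-fallback rule: select a source column when
// PRAGMA table_info reported it, otherwise select literal NULL so the
// INSERT-SELECT never references a missing column on older databases.
function selectExprs(sourceCols: Set<string>, wanted: string[]): string[] {
  return wanted.map(col => (sourceCols.has(col) ? col : 'NULL'));
}
```

This keeps one migration body valid across every historical schema shape: the target table always receives the full column list, and rows from pre-migration databases simply carry NULL in the columns they never had.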
/**
* Add UNIQUE(memory_session_id, content_hash) on observations (migration 29).
*
* PATHFINDER-2026-04-22 Plan 01 Phase 2 + Phase 4.
*
* - Dedupes existing rows that share (memory_session_id, content_hash),
* keeping the lowest id (oldest) per pair.
* - Creates a UNIQUE index that lets writers use
* INSERT ON CONFLICT(memory_session_id, content_hash) DO NOTHING
* in place of the legacy dedup window scan.
*/
private addObservationsUniqueContentHashIndex(): void {
const applied = this.db.prepare('SELECT version FROM schema_versions WHERE version = ?').get(29) as SchemaVersion | undefined;
if (applied) return;
// Need both columns to exist.
const obsCols = this.db.query('PRAGMA table_info(observations)').all() as TableColumnInfo[];
const hasMem = obsCols.some(c => c.name === 'memory_session_id');
const hasHash = obsCols.some(c => c.name === 'content_hash');
if (!hasMem || !hasHash) {
// Nothing to do; record so we don't keep retrying.
this.db.prepare('INSERT OR IGNORE INTO schema_versions (version, applied_at) VALUES (?, ?)').run(29, new Date().toISOString());
return;
}
this.db.run('BEGIN TRANSACTION');
try {
// Dedup before adding the UNIQUE index — keep the lowest id per pair.
this.db.run(`
DELETE FROM observations
WHERE id NOT IN (
SELECT MIN(id) FROM observations
GROUP BY memory_session_id, content_hash
)
`);
this.db.run(`
CREATE UNIQUE INDEX IF NOT EXISTS ux_observations_session_hash
ON observations(memory_session_id, content_hash)
`);
this.db.prepare('INSERT OR IGNORE INTO schema_versions (version, applied_at) VALUES (?, ?)').run(29, new Date().toISOString());
this.db.run('COMMIT');
logger.debug('DB', 'Added UNIQUE(memory_session_id, content_hash) on observations');
} catch (error) {
this.db.run('ROLLBACK');
if (error instanceof Error) {
throw error;
}
throw new Error(`Migration 29 failed: ${String(error)}`);
}
}
}
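Migration 29's `DELETE … WHERE id NOT IN (SELECT MIN(id) … GROUP BY …)` keeps the oldest row per (memory_session_id, content_hash) pair before the UNIQUE index lands. A minimal in-memory sketch of that keep-lowest-id rule (hypothetical row shape, not the real schema):

```typescript
interface Row { id: number; sessionId: string; hash: string }

// Mirror of: DELETE FROM observations WHERE id NOT IN
//   (SELECT MIN(id) FROM observations GROUP BY memory_session_id, content_hash)
function dedupeKeepOldest(rows: Row[]): Row[] {
  const lowest = new Map<string, number>(); // pair key -> lowest id seen
  for (const r of rows) {
    const key = `${r.sessionId}\u0000${r.hash}`;
    const cur = lowest.get(key);
    if (cur === undefined || r.id < cur) lowest.set(key, r.id);
  }
  // A row survives only if it carries the lowest id for its pair.
  return rows.filter(r => lowest.get(`${r.sessionId}\u0000${r.hash}`) === r.id);
}

const survivors = dedupeKeepOldest([
  { id: 3, sessionId: 's1', hash: 'a' },
  { id: 1, sessionId: 's1', hash: 'a' },
  { id: 2, sessionId: 's1', hash: 'b' },
]);
// ids 1 and 2 survive; the newer duplicate (id 3) is dropped
```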
+29 -34
@@ -9,9 +9,6 @@ import { logger } from '../../../utils/logger.js';
import { getProjectContext } from '../../../utils/project-name.js';
import type { ObservationInput, StoreObservationResult } from './types.js';
/** Deduplication window: observations with the same content hash within this window are skipped */
const DEDUP_WINDOW_MS = 30_000;
/**
* Compute a short content hash for deduplication.
* Uses (memory_session_id, title, narrative) as the semantic identity of an observation.
@@ -30,25 +27,13 @@ export function computeObservationContentHash(
}
/**
* Check if a duplicate observation exists within the dedup window.
* Returns the existing observation's id and timestamp if found, null otherwise.
*/
export function findDuplicateObservation(
db: Database,
contentHash: string,
timestampEpoch: number
): { id: number; created_at_epoch: number } | null {
const windowStart = timestampEpoch - DEDUP_WINDOW_MS;
const stmt = db.prepare(
'SELECT id, created_at_epoch FROM observations WHERE content_hash = ? AND created_at_epoch > ?'
);
return (stmt.get(contentHash, windowStart) as { id: number; created_at_epoch: number } | null);
}
/**
* Store an observation (from SDK parsing)
* Assumes session already exists (created by hook)
* Performs content-hash deduplication: skips INSERT if an identical observation exists within 30s
* Store an observation (from SDK parsing).
*
* Assumes session already exists (created by hook). Deduplication is enforced
* by the database via UNIQUE(memory_session_id, content_hash) (Plan 01 Phase 4):
* INSERT ON CONFLICT DO NOTHING absorbs duplicates silently. The returned id
is the existing row's id when a conflict occurred, otherwise that of the
freshly inserted row.
*/
export function storeObservation(
db: Database,
@@ -66,22 +51,18 @@ export function storeObservation(
// Guard against empty project string (race condition where project isn't set yet)
const resolvedProject = project || getProjectContext(process.cwd()).primary;
// Content-hash deduplication
const contentHash = computeObservationContentHash(memorySessionId, observation.title, observation.narrative);
const existing = findDuplicateObservation(db, contentHash, timestampEpoch);
if (existing) {
logger.debug('DEDUP', `Skipped duplicate observation | contentHash=${contentHash} | existingId=${existing.id}`);
return { id: existing.id, createdAtEpoch: existing.created_at_epoch };
}
const stmt = db.prepare(`
INSERT INTO observations
(memory_session_id, project, type, title, subtitle, facts, narrative, concepts,
files_read, files_modified, prompt_number, discovery_tokens, agent_type, agent_id, content_hash, created_at, created_at_epoch)
VALUES (?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?)
ON CONFLICT(memory_session_id, content_hash) DO NOTHING
RETURNING id, created_at_epoch
`);
const result = stmt.run(
const inserted = stmt.get(
memorySessionId,
resolvedProject,
observation.type,
@@ -99,10 +80,24 @@ export function storeObservation(
contentHash,
timestampIso,
timestampEpoch
);
) as { id: number; created_at_epoch: number } | null;
return {
id: Number(result.lastInsertRowid),
createdAtEpoch: timestampEpoch
};
if (inserted) {
return { id: inserted.id, createdAtEpoch: inserted.created_at_epoch };
}
// Conflict — fetch the existing row's id for the (memory_session_id, content_hash) pair.
const existing = db.prepare(
'SELECT id, created_at_epoch FROM observations WHERE memory_session_id = ? AND content_hash = ?'
).get(memorySessionId, contentHash) as { id: number; created_at_epoch: number } | null;
if (!existing) {
// Unreachable in practice (UNIQUE conflict implies existing row), but be explicit.
throw new Error(
`storeObservation: ON CONFLICT fired but no row exists for (memory_session_id=${memorySessionId}, content_hash=${contentHash})`
);
}
logger.debug('DEDUP', `Skipped duplicate observation | contentHash=${contentHash} | existingId=${existing.id}`);
return { id: existing.id, createdAtEpoch: existing.created_at_epoch };
}
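The ON CONFLICT DO NOTHING + RETURNING shape above reduces to a two-step contract: attempt the insert, and when RETURNING yields no row, look up the existing one. A toy sketch of that contract, with a Map standing in for the UNIQUE(memory_session_id, content_hash) index (hypothetical names, no real database):

```typescript
type Stored = { id: number; createdAtEpoch: number };

class ToyObservationTable {
  private nextId = 1;
  private byKey = new Map<string, Stored>(); // plays the UNIQUE index

  // INSERT ... ON CONFLICT DO NOTHING RETURNING: null when the key exists.
  insertReturning(sessionId: string, hash: string, epoch: number): Stored | null {
    const key = `${sessionId}:${hash}`;
    if (this.byKey.has(key)) return null;
    const row = { id: this.nextId++, createdAtEpoch: epoch };
    this.byKey.set(key, row);
    return row;
  }

  lookup(sessionId: string, hash: string): Stored | null {
    return this.byKey.get(`${sessionId}:${hash}`) ?? null;
  }
}

// storeObservation contract: always returns an id, duplicate or not.
function store(t: ToyObservationTable, sessionId: string, hash: string, epoch: number): Stored {
  const inserted = t.insertReturning(sessionId, hash, epoch);
  if (inserted) return inserted;
  const existing = t.lookup(sessionId, hash);
  if (!existing) throw new Error('conflict without existing row');
  return existing;
}
```

The second `store` call for the same pair returns the first row's id and epoch, matching the "returned id is the existing row's" behavior in `storeObservation`.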
+188
@@ -0,0 +1,188 @@
-- claude-mem SQLite schema
--
-- Authoritative shape of the database after all migrations through
-- runner.ts have been applied (current tip = migration 29). Fresh
-- databases boot directly into this shape; existing databases reach
-- it via the migration runner.
--
-- Source of truth: src/services/sqlite/migrations/runner.ts
-- Regenerated by: PATHFINDER-2026-04-22 Plan 01 (Data Integrity).
--
-- Invariants enforced here (Plan 01):
-- * pending_messages.UNIQUE(content_session_id, tool_use_id) — replaces
-- in-memory pendingTools Map for ingestion pairing (Plan 03 also depends on it).
-- * pending_messages.worker_pid INTEGER — populated by self-healing
-- claim query; replaces the legacy stale-reset epoch column.
-- * observations.UNIQUE(memory_session_id, content_hash) — replaces the
-- legacy dedup window; ON CONFLICT DO NOTHING absorbs duplicates.
CREATE TABLE IF NOT EXISTS schema_versions (
id INTEGER PRIMARY KEY,
version INTEGER UNIQUE NOT NULL,
applied_at TEXT NOT NULL
);
-- ─────────────────────────────────────────────────────────────────────
-- sdk_sessions: one row per Claude/Codex session observed by claude-mem.
-- ─────────────────────────────────────────────────────────────────────
CREATE TABLE IF NOT EXISTS sdk_sessions (
id INTEGER PRIMARY KEY AUTOINCREMENT,
content_session_id TEXT UNIQUE NOT NULL,
memory_session_id TEXT UNIQUE,
project TEXT NOT NULL,
platform_source TEXT NOT NULL DEFAULT 'claude',
user_prompt TEXT,
started_at TEXT NOT NULL,
started_at_epoch INTEGER NOT NULL,
completed_at TEXT,
completed_at_epoch INTEGER,
status TEXT NOT NULL DEFAULT 'active'
CHECK(status IN ('active', 'completed', 'failed')),
worker_port INTEGER,
prompt_counter INTEGER DEFAULT 0,
custom_title TEXT
);
CREATE INDEX IF NOT EXISTS idx_sdk_sessions_claude_id ON sdk_sessions(content_session_id);
CREATE INDEX IF NOT EXISTS idx_sdk_sessions_sdk_id ON sdk_sessions(memory_session_id);
CREATE INDEX IF NOT EXISTS idx_sdk_sessions_project ON sdk_sessions(project);
CREATE INDEX IF NOT EXISTS idx_sdk_sessions_status ON sdk_sessions(status);
CREATE INDEX IF NOT EXISTS idx_sdk_sessions_started ON sdk_sessions(started_at_epoch DESC);
CREATE INDEX IF NOT EXISTS idx_sdk_sessions_platform_source ON sdk_sessions(platform_source);
-- ─────────────────────────────────────────────────────────────────────
-- observations: structured memory rows extracted from SDK output.
-- UNIQUE(memory_session_id, content_hash) replaces the legacy dedup window;
-- writes use INSERT … ON CONFLICT DO NOTHING.
-- ─────────────────────────────────────────────────────────────────────
CREATE TABLE IF NOT EXISTS observations (
id INTEGER PRIMARY KEY AUTOINCREMENT,
memory_session_id TEXT NOT NULL,
project TEXT NOT NULL,
text TEXT,
type TEXT NOT NULL,
title TEXT,
subtitle TEXT,
facts TEXT,
narrative TEXT,
concepts TEXT,
files_read TEXT,
files_modified TEXT,
prompt_number INTEGER,
discovery_tokens INTEGER DEFAULT 0,
content_hash TEXT,
agent_type TEXT,
agent_id TEXT,
merged_into_project TEXT,
generated_by_model TEXT,
created_at TEXT NOT NULL,
created_at_epoch INTEGER NOT NULL,
FOREIGN KEY(memory_session_id) REFERENCES sdk_sessions(memory_session_id)
ON DELETE CASCADE ON UPDATE CASCADE,
UNIQUE(memory_session_id, content_hash)
);
CREATE INDEX IF NOT EXISTS idx_observations_sdk_session ON observations(memory_session_id);
CREATE INDEX IF NOT EXISTS idx_observations_project ON observations(project);
CREATE INDEX IF NOT EXISTS idx_observations_type ON observations(type);
CREATE INDEX IF NOT EXISTS idx_observations_created ON observations(created_at_epoch DESC);
CREATE INDEX IF NOT EXISTS idx_observations_content_hash ON observations(content_hash, created_at_epoch);
CREATE INDEX IF NOT EXISTS idx_observations_agent_type ON observations(agent_type);
CREATE INDEX IF NOT EXISTS idx_observations_agent_id ON observations(agent_id);
CREATE INDEX IF NOT EXISTS idx_observations_merged_into ON observations(merged_into_project);
-- ─────────────────────────────────────────────────────────────────────
-- session_summaries: one summary row per memory session.
-- ─────────────────────────────────────────────────────────────────────
CREATE TABLE IF NOT EXISTS session_summaries (
id INTEGER PRIMARY KEY AUTOINCREMENT,
memory_session_id TEXT NOT NULL,
project TEXT NOT NULL,
request TEXT,
investigated TEXT,
learned TEXT,
completed TEXT,
next_steps TEXT,
files_read TEXT,
files_edited TEXT,
notes TEXT,
prompt_number INTEGER,
discovery_tokens INTEGER DEFAULT 0,
merged_into_project TEXT,
created_at TEXT NOT NULL,
created_at_epoch INTEGER NOT NULL,
FOREIGN KEY(memory_session_id) REFERENCES sdk_sessions(memory_session_id)
ON DELETE CASCADE ON UPDATE CASCADE
);
CREATE INDEX IF NOT EXISTS idx_session_summaries_sdk_session ON session_summaries(memory_session_id);
CREATE INDEX IF NOT EXISTS idx_session_summaries_project ON session_summaries(project);
CREATE INDEX IF NOT EXISTS idx_session_summaries_created ON session_summaries(created_at_epoch DESC);
CREATE INDEX IF NOT EXISTS idx_summaries_merged_into ON session_summaries(merged_into_project);
-- ─────────────────────────────────────────────────────────────────────
-- pending_messages: persistent work queue for SDK messages.
-- worker_pid + UNIQUE(content_session_id, tool_use_id) make the claim
-- query self-healing without any legacy stale-reset epoch column.
-- ─────────────────────────────────────────────────────────────────────
CREATE TABLE IF NOT EXISTS pending_messages (
id INTEGER PRIMARY KEY AUTOINCREMENT,
session_db_id INTEGER NOT NULL,
content_session_id TEXT NOT NULL,
tool_use_id TEXT,
message_type TEXT NOT NULL
CHECK(message_type IN ('observation', 'summarize')),
tool_name TEXT,
tool_input TEXT,
tool_response TEXT,
cwd TEXT,
last_user_message TEXT,
last_assistant_message TEXT,
prompt_number INTEGER,
status TEXT NOT NULL DEFAULT 'pending'
CHECK(status IN ('pending', 'processing', 'processed', 'failed')),
retry_count INTEGER NOT NULL DEFAULT 0,
created_at_epoch INTEGER NOT NULL,
failed_at_epoch INTEGER,
completed_at_epoch INTEGER,
worker_pid INTEGER,
agent_type TEXT,
agent_id TEXT,
FOREIGN KEY (session_db_id) REFERENCES sdk_sessions(id) ON DELETE CASCADE
);
CREATE INDEX IF NOT EXISTS idx_pending_messages_session ON pending_messages(session_db_id);
CREATE INDEX IF NOT EXISTS idx_pending_messages_status ON pending_messages(status);
CREATE INDEX IF NOT EXISTS idx_pending_messages_claude_session ON pending_messages(content_session_id);
CREATE INDEX IF NOT EXISTS idx_pending_messages_worker_pid ON pending_messages(worker_pid);
CREATE UNIQUE INDEX IF NOT EXISTS ux_pending_session_tool
ON pending_messages(content_session_id, tool_use_id)
WHERE tool_use_id IS NOT NULL;
-- ─────────────────────────────────────────────────────────────────────
-- user_prompts: per-prompt history (UI + FTS search).
-- ─────────────────────────────────────────────────────────────────────
CREATE TABLE IF NOT EXISTS user_prompts (
id INTEGER PRIMARY KEY AUTOINCREMENT,
content_session_id TEXT NOT NULL,
prompt_number INTEGER NOT NULL,
prompt_text TEXT NOT NULL,
created_at TEXT NOT NULL,
created_at_epoch INTEGER NOT NULL,
FOREIGN KEY(content_session_id) REFERENCES sdk_sessions(content_session_id) ON DELETE CASCADE
);
CREATE INDEX IF NOT EXISTS idx_user_prompts_claude_session ON user_prompts(content_session_id);
CREATE INDEX IF NOT EXISTS idx_user_prompts_created ON user_prompts(created_at_epoch DESC);
CREATE INDEX IF NOT EXISTS idx_user_prompts_prompt_number ON user_prompts(prompt_number);
CREATE INDEX IF NOT EXISTS idx_user_prompts_lookup ON user_prompts(content_session_id, prompt_number);
-- ─────────────────────────────────────────────────────────────────────
-- observation_feedback: usage-signal tracking for tier routing.
-- ─────────────────────────────────────────────────────────────────────
CREATE TABLE IF NOT EXISTS observation_feedback (
id INTEGER PRIMARY KEY AUTOINCREMENT,
observation_id INTEGER NOT NULL,
signal_type TEXT NOT NULL,
session_db_id INTEGER,
created_at_epoch INTEGER NOT NULL,
metadata TEXT,
FOREIGN KEY (observation_id) REFERENCES observations(id) ON DELETE CASCADE
);
CREATE INDEX IF NOT EXISTS idx_feedback_observation ON observation_feedback(observation_id);
CREATE INDEX IF NOT EXISTS idx_feedback_signal ON observation_feedback(signal_type);
+48 -21
@@ -10,7 +10,7 @@ import { Database } from 'bun:sqlite';
import { logger } from '../../utils/logger.js';
import type { ObservationInput } from './observations/types.js';
import type { SummaryInput } from './summaries/types.js';
import { computeObservationContentHash, findDuplicateObservation } from './observations/store.js';
import { computeObservationContentHash } from './observations/store.js';
/**
* Result from storeObservations / storeObservationsAndMarkComplete transaction
@@ -64,23 +64,25 @@ export function storeObservationsAndMarkComplete(
const storeAndMarkTx = db.transaction(() => {
const observationIds: number[] = [];
// 1. Store all observations (with content-hash deduplication)
// 1. Store all observations.
// UNIQUE(memory_session_id, content_hash) + ON CONFLICT DO NOTHING enforces
// dedup at the DB layer (Plan 01 Phase 4). RETURNING gives us the row id
// when the insert went through; on conflict we look up the existing id.
const obsStmt = db.prepare(`
INSERT INTO observations
(memory_session_id, project, type, title, subtitle, facts, narrative, concepts,
files_read, files_modified, prompt_number, discovery_tokens, agent_type, agent_id, content_hash, created_at, created_at_epoch)
VALUES (?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?)
ON CONFLICT(memory_session_id, content_hash) DO NOTHING
RETURNING id
`);
const lookupExistingStmt = db.prepare(
'SELECT id FROM observations WHERE memory_session_id = ? AND content_hash = ?'
);
for (const observation of observations) {
const contentHash = computeObservationContentHash(memorySessionId, observation.title, observation.narrative);
const existing = findDuplicateObservation(db, contentHash, timestampEpoch);
if (existing) {
observationIds.push(existing.id);
continue;
}
const result = obsStmt.run(
const inserted = obsStmt.get(
memorySessionId,
project,
observation.type,
@@ -98,8 +100,20 @@ export function storeObservationsAndMarkComplete(
contentHash,
timestampIso,
timestampEpoch
);
observationIds.push(Number(result.lastInsertRowid));
) as { id: number } | null;
if (inserted) {
observationIds.push(inserted.id);
continue;
}
const existing = lookupExistingStmt.get(memorySessionId, contentHash) as { id: number } | null;
if (!existing) {
throw new Error(
`storeObservationsAndMarkComplete: ON CONFLICT without existing row for content_hash=${contentHash}`
);
}
observationIds.push(existing.id);
}
// 2. Store summary if provided
@@ -185,23 +199,24 @@ export function storeObservations(
const storeTx = db.transaction(() => {
const observationIds: number[] = [];
// 1. Store all observations (with content-hash deduplication)
// 1. Store all observations.
// UNIQUE(memory_session_id, content_hash) + ON CONFLICT DO NOTHING enforces
// dedup at the DB layer (Plan 01 Phase 4).
const obsStmt = db.prepare(`
INSERT INTO observations
(memory_session_id, project, type, title, subtitle, facts, narrative, concepts,
files_read, files_modified, prompt_number, discovery_tokens, agent_type, agent_id, content_hash, created_at, created_at_epoch)
VALUES (?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?)
ON CONFLICT(memory_session_id, content_hash) DO NOTHING
RETURNING id
`);
const lookupExistingStmt = db.prepare(
'SELECT id FROM observations WHERE memory_session_id = ? AND content_hash = ?'
);
for (const observation of observations) {
const contentHash = computeObservationContentHash(memorySessionId, observation.title, observation.narrative);
const existing = findDuplicateObservation(db, contentHash, timestampEpoch);
if (existing) {
observationIds.push(existing.id);
continue;
}
const result = obsStmt.run(
const inserted = obsStmt.get(
memorySessionId,
project,
observation.type,
@@ -219,8 +234,20 @@ export function storeObservations(
contentHash,
timestampIso,
timestampEpoch
);
observationIds.push(Number(result.lastInsertRowid));
) as { id: number } | null;
if (inserted) {
observationIds.push(inserted.id);
continue;
}
const existing = lookupExistingStmt.get(memorySessionId, contentHash) as { id: number } | null;
if (!existing) {
throw new Error(
`storeObservations: ON CONFLICT without existing row for content_hash=${contentHash}`
);
}
observationIds.push(existing.id);
}
// 2. Store summary if provided
+67
@@ -341,6 +341,73 @@ export class ChromaMcpManager {
}
}
/**
* Deep semantic-search probe verifies the actual query path works,
* not just that the subprocess responds to one tool. Each stage is wrapped
* in its own try/catch so the returned `stage` reflects where it failed.
*
* Stages:
* - 'list' chroma_list_collections (also counts collections)
* - 'query' chroma_query_documents against cm__claude-mem with a trivial
* query and n_results: 1 (measures latency)
* - 'done' both stages succeeded
*/
async probeSemanticSearch(): Promise<{
ok: boolean;
stage: 'connect' | 'list' | 'query' | 'done';
error?: string;
collections?: number;
queryLatencyMs?: number;
}> {
let collections: number | undefined;
// Stage: list — also lazy-connects via callTool
try {
const listResult: any = await this.callTool('chroma_list_collections', { limit: 100 });
if (Array.isArray(listResult)) {
collections = listResult.length;
} else if (listResult && Array.isArray(listResult.collections)) {
collections = listResult.collections.length;
} else if (listResult && typeof listResult === 'object' && 'length' in listResult) {
collections = (listResult as { length: number }).length;
}
} catch (error) {
const message = error instanceof Error ? error.message : String(error);
logger.warn('CHROMA_MCP', 'Deep probe failed at list stage', { error: message });
return { ok: false, stage: 'list', error: message };
}
// Stage: query — round-trip through the embedding/vector path
const queryStartedAt = Date.now();
try {
await this.callTool('chroma_query_documents', {
collection_name: 'cm__claude-mem',
query_texts: ['ping'],
n_results: 1
});
const queryLatencyMs = Date.now() - queryStartedAt;
return { ok: true, stage: 'done', collections, queryLatencyMs };
} catch (error) {
const queryLatencyMs = Date.now() - queryStartedAt;
const rawMessage = error instanceof Error ? error.message : String(error);
const isMissingOrEmpty = /not exist|missing|empty|no such/i.test(rawMessage);
const errorMessage = isMissingOrEmpty
? `collection cm__claude-mem missing or empty (${rawMessage})`
: rawMessage;
logger.warn('CHROMA_MCP', 'Deep probe failed at query stage', {
error: rawMessage,
queryLatencyMs
});
return {
ok: false,
stage: 'query',
error: errorMessage,
collections,
queryLatencyMs
};
}
}
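The query stage classifies errors with a regex over the raw message text before returning. That classification can be restated standalone (a sketch of the same rule, not the class method):

```typescript
// Same heuristic the query stage uses: missing/empty collections get a
// friendlier prefix, everything else passes through untouched.
const MISSING_OR_EMPTY = /not exist|missing|empty|no such/i;

function classifyQueryError(rawMessage: string): string {
  return MISSING_OR_EMPTY.test(rawMessage)
    ? `collection cm__claude-mem missing or empty (${rawMessage})`
    : rawMessage;
}
```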
/**
* Gracefully stop the MCP connection and kill the chroma-mcp subprocess.
* client.close() sends stdin close -> SIGTERM -> SIGKILL to the subprocess.
+14 -7
@@ -549,9 +549,10 @@ export class ChromaSync {
* Reads from SQLite and syncs in batches
* @param projectOverride - If provided, backfill this project instead of this.project.
* Used by backfillAllProjects() to iterate projects without mutating instance state.
* @param storeOverride - If provided, use this SessionStore instead of creating a new one.
* Throws error if backfill fails
*/
async ensureBackfilled(projectOverride?: string): Promise<void> {
async ensureBackfilled(projectOverride?: string, storeOverride?: SessionStore): Promise<void> {
const backfillProject = projectOverride ?? this.project;
logger.info('CHROMA_SYNC', 'Starting smart backfill', { project: backfillProject });
@@ -560,7 +561,7 @@ export class ChromaSync {
// Fetch existing IDs from Chroma (fast, metadata only)
const existing = await this.getExistingChromaIds(backfillProject);
const db = new SessionStore();
const db = storeOverride ?? new SessionStore();
try {
await this.runBackfillPipeline(db, backfillProject, existing);
@@ -568,7 +569,10 @@ export class ChromaSync {
logger.error('CHROMA_SYNC', 'Backfill failed', { project: backfillProject }, error instanceof Error ? error : new Error(String(error)));
throw new Error(`Backfill failed: ${error instanceof Error ? error.message : String(error)}`);
} finally {
db.close();
// Only close if we created it
if (!storeOverride) {
db.close();
}
}
}
@@ -861,8 +865,8 @@ export class ChromaSync {
* with project scoped via metadata, matching how DatabaseManager and SearchManager operate.
* Designed to be called fire-and-forget on worker startup.
*/
static async backfillAllProjects(): Promise<void> {
const db = new SessionStore();
static async backfillAllProjects(storeOverride?: SessionStore): Promise<void> {
const db = storeOverride ?? new SessionStore();
const sync = new ChromaSync('claude-mem');
try {
const projects = db.db.prepare(
@@ -873,7 +877,7 @@ export class ChromaSync {
for (const { project } of projects) {
try {
await sync.ensureBackfilled(project);
await sync.ensureBackfilled(project, db);
} catch (error) {
if (error instanceof Error) {
logger.error('CHROMA_SYNC', `Backfill failed for project: ${project}`, {}, error);
@@ -885,7 +889,10 @@ export class ChromaSync {
}
} finally {
await sync.close();
db.close();
// Only close if we created it
if (!storeOverride) {
db.close();
}
}
}
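The `storeOverride` threading implements the usual ownership rule: close only what you opened. A minimal sketch of the pattern, independent of SessionStore (names hypothetical):

```typescript
class Resource {
  closed = false;
  close(): void { this.closed = true; }
}

// Mirrors ensureBackfilled(projectOverride?, storeOverride?): when the caller
// passes a resource, the callee borrows it and must not close it; when the
// callee creates the resource itself, it owns cleanup.
function withResource<T>(work: (r: Resource) => T, override?: Resource): T {
  const r = override ?? new Resource();
  try {
    return work(r);
  } finally {
    if (!override) r.close(); // only close if we created it
  }
}
```

This lets `backfillAllProjects` open one SessionStore and loop `ensureBackfilled(project, db)` over every project without each iteration opening and closing its own connection.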
+68 -37
@@ -1,6 +1,5 @@
import path from 'path';
import { sessionInitHandler } from '../../cli/handlers/session-init.js';
import { observationHandler } from '../../cli/handlers/observation.js';
import { fileEditHandler } from '../../cli/handlers/file-edit.js';
import { sessionCompleteHandler } from '../../cli/handlers/session-complete.js';
import { ensureWorkerRunning, workerHttpRequest } from '../../shared/worker-utils.js';
@@ -12,6 +11,7 @@ import { resolveFieldSpec, resolveFields, matchesRule } from './field-utils.js';
import { expandHomePath } from './config.js';
import type { TranscriptSchema, WatchTarget, SchemaEvent } from './types.js';
import { normalizePlatformSource } from '../../shared/platform-source.js';
import { ingestObservation } from '../worker/http/shared.js';
interface SessionState {
sessionId: string;
@@ -20,14 +20,10 @@ interface SessionState {
project?: string;
lastUserMessage?: string;
lastAssistantMessage?: string;
pendingTools: Map<string, { name?: string; input?: unknown }>;
}
interface PendingTool {
id?: string;
name?: string;
input?: unknown;
response?: unknown;
// In-memory pairing for transcript schemas (e.g. Codex) where tool_use
// carries toolName + toolInput and tool_result only carries tool_use_id +
// output. Keyed by toolId; consumed and deleted on the matching tool_result.
pendingTools?: Map<string, { toolName: string; toolInput: unknown }>;
}
export class TranscriptEventProcessor {
@@ -56,7 +52,6 @@ export class TranscriptEventProcessor {
session = {
sessionId,
platformSource: normalizePlatformSource(watch.name),
pendingTools: new Map()
};
this.sessions.set(key, session);
}
@@ -129,7 +124,7 @@ export class TranscriptEventProcessor {
const project = this.resolveProject(entry, watch, schema, event, session);
if (project) session.project = project;
const fields = resolveFields(event.fields, entry, { watch, schema, session });
const fields = resolveFields(event.fields, entry, { watch, schema, session: session as unknown as Record<string, unknown> });
switch (event.action) {
case 'session_context':
@@ -196,12 +191,6 @@ export class TranscriptEventProcessor {
const toolInput = this.maybeParseJson(fields.toolInput);
const toolResponse = this.maybeParseJson(fields.toolResponse);
const pending: PendingTool = { id: toolId, name: toolName, input: toolInput, response: toolResponse };
if (toolId) {
session.pendingTools.set(toolId, { name: pending.name, input: pending.input });
}
if (toolName === 'apply_patch' && typeof toolInput === 'string') {
const files = this.parseApplyPatchFiles(toolInput);
for (const filePath of files) {
@@ -212,35 +201,61 @@ export class TranscriptEventProcessor {
}
}
if (toolResponse !== undefined && toolName) {
// Two schema shapes to support:
// 1. Self-contained events (e.g. Claude JSONL): tool_use and tool_result
// both carry toolName; tool_use may already include toolResponse.
// 2. Split events (e.g. Codex): tool_use carries toolName + toolInput,
// tool_result carries only toolUseId + output. Neither side alone
// has both toolName and toolResponse.
//
// For (1) we emit eagerly when toolResponse is present. For (2) we stash
// toolName/toolInput on the session keyed by toolId so handleToolResult
// can join them at tool_result time. The DB's
// UNIQUE(content_session_id, tool_use_id) index collapses any duplicate
// emissions that arise when both events carry a complete record.
if (toolName && toolResponse !== undefined) {
await this.sendObservation(session, {
toolName,
toolInput,
toolResponse
toolResponse,
toolUseId: toolId,
});
} else if (toolName && toolId) {
if (!session.pendingTools) session.pendingTools = new Map();
session.pendingTools.set(toolId, { toolName, toolInput });
}
}
private async handleToolResult(session: SessionState, fields: Record<string, unknown>): Promise<void> {
const toolId = typeof fields.toolId === 'string' ? fields.toolId : undefined;
const toolName = typeof fields.toolName === 'string' ? fields.toolName : undefined;
let toolName = typeof fields.toolName === 'string' ? fields.toolName : undefined;
const toolResponse = this.maybeParseJson(fields.toolResponse);
let toolInput = this.maybeParseJson(fields.toolInput);
let toolInput: unknown = this.maybeParseJson(fields.toolInput);
let name = toolName;
if (toolId && session.pendingTools.has(toolId)) {
const pending = session.pendingTools.get(toolId)!;
toolInput = pending.input ?? toolInput;
name = name ?? pending.name;
session.pendingTools.delete(toolId);
// Consume any pending-tool entry for this toolId regardless of whether the
// tool_result already carries toolName: in the split-schema path the
// result always resolves the pending entry, so leaving it behind would
// grow the map until session end.
if (toolId && session.pendingTools) {
const pending = session.pendingTools.get(toolId);
if (pending) {
if (!toolName) toolName = pending.toolName;
if (toolInput === undefined) toolInput = pending.toolInput;
session.pendingTools.delete(toolId);
}
}
if (name) {
if (toolName) {
await this.sendObservation(session, {
toolName: name,
toolName,
toolInput,
toolResponse
toolResponse,
toolUseId: toolId,
});
} else {
logger.debug('TRANSCRIPT', 'Dropping tool_result with no resolvable toolName', {
sessionId: session.sessionId,
toolId,
});
}
}
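The split-schema pairing above is a join keyed by toolId: tool_use stashes name and input, tool_result consumes them and deletes the entry so the map cannot grow until session end. A standalone sketch of that consume-and-delete join (hypothetical event shapes, not the processor itself):

```typescript
interface PendingTool { toolName: string; toolInput: unknown }
type Emitted = PendingTool & { toolResponse: unknown };

class ToolPairer {
  private pending = new Map<string, PendingTool>();

  // tool_use side: self-contained events (shape 1) emit immediately;
  // split events (shape 2) stash the name+input half keyed by toolId.
  onToolUse(toolId: string | undefined, toolName: string, toolInput: unknown, toolResponse: unknown): Emitted | null {
    if (toolResponse !== undefined) return { toolName, toolInput, toolResponse };
    if (toolId) this.pending.set(toolId, { toolName, toolInput });
    return null;
  }

  // tool_result side: consume the stashed half so the map stays bounded.
  onToolResult(toolId: string | undefined, toolResponse: unknown): Emitted | null {
    if (!toolId) return null;
    const stashed = this.pending.get(toolId);
    if (!stashed) return null;
    this.pending.delete(toolId);
    return { ...stashed, toolResponse };
  }

  size(): number { return this.pending.size; }
}
```

In the real processor the DB's UNIQUE(content_session_id, tool_use_id) index absorbs the case where both shapes emit a complete record for the same toolId.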
@@ -249,14 +264,23 @@ export class TranscriptEventProcessor {
const toolName = typeof fields.toolName === 'string' ? fields.toolName : undefined;
if (!toolName) return;
await observationHandler.execute({
sessionId: session.sessionId,
// PATHFINDER plan 03 phase 7: replace HTTP loopback (worker → its own
// /api/sessions/observations endpoint) with a direct in-process call to
// ingestObservation. Same implementation backs the cross-process HTTP
// route handler (one helper, N callers).
const result = ingestObservation({
contentSessionId: session.sessionId,
cwd: session.cwd ?? process.cwd(),
toolName,
toolInput: this.maybeParseJson(fields.toolInput),
toolResponse: this.maybeParseJson(fields.toolResponse),
platform: session.platformSource
platformSource: session.platformSource,
toolUseId: typeof fields.toolUseId === 'string' ? fields.toolUseId : undefined,
});
if (!result.ok) {
throw new Error(`ingestObservation failed: ${result.reason}`);
}
}
private async sendFileEdit(session: SessionState, fields: Record<string, unknown>): Promise<void> {
@@ -277,10 +301,17 @@ export class TranscriptEventProcessor {
const trimmed = value.trim();
if (!trimmed) return value;
if (!(trimmed.startsWith('{') || trimmed.startsWith('['))) return value;
// Pass through the raw string on parse failure rather than throwing.
// Throwing from this helper propagates to `handleLine`'s outer catch,
// which then silently drops the entire transcript line — including any
// valid sibling fields. A single malformed JSON-shaped field should
// degrade to opaque-string handling, not lose the whole observation.
try {
return JSON.parse(trimmed);
} catch (error: unknown) {
logger.debug('WORKER', 'Failed to parse JSON string', { length: trimmed.length }, error instanceof Error ? error : undefined);
} catch (error) {
logger.debug('TRANSCRIPT', 'Field looked like JSON but did not parse; using raw string', {
preview: trimmed.slice(0, 120),
}, error instanceof Error ? error : undefined);
return value;
}
}
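That lenient-parse rule can be sketched standalone: only strings that look like JSON are attempted, and a failed parse degrades to the raw string instead of throwing and losing the whole line (a sketch of the contract, not the class method):

```typescript
function maybeParseJson(value: unknown): unknown {
  if (typeof value !== 'string') return value;
  const trimmed = value.trim();
  if (!trimmed) return value;
  // Only attempt strings that look like a JSON object or array.
  if (!(trimmed.startsWith('{') || trimmed.startsWith('['))) return value;
  try {
    return JSON.parse(trimmed);
  } catch {
    return value; // malformed JSON-shaped field degrades to an opaque string
  }
}
```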
@@ -314,7 +345,7 @@ export class TranscriptEventProcessor {
platform: session.platformSource
});
await this.updateContext(session, watch);
session.pendingTools.clear();
session.pendingTools?.clear();
const key = this.getSessionKey(watch, session.sessionId);
this.sessions.delete(key);
}
+77 -13
@@ -1,5 +1,5 @@
import { existsSync, statSync, watch as fsWatch, createReadStream } from 'fs';
import { basename, join } from 'path';
import { basename, join, resolve as resolvePath, sep as pathSep } from 'path';
import { globSync } from 'glob';
import { logger } from '../../utils/logger.js';
import { expandHomePath } from './config.js';
@@ -84,7 +84,7 @@ export class TranscriptWatcher {
private processor = new TranscriptEventProcessor();
private tailers = new Map<string, FileTailer>();
private state: TranscriptWatchState;
private rescanTimers: Array<NodeJS.Timeout> = [];
private rootWatchers: Array<ReturnType<typeof fsWatch>> = [];
constructor(private config: TranscriptWatchConfig, private statePath: string) {
this.state = loadWatchState(statePath);
@@ -101,10 +101,10 @@ export class TranscriptWatcher {
tailer.close();
}
this.tailers.clear();
for (const timer of this.rescanTimers) {
clearInterval(timer);
for (const watcher of this.rootWatchers) {
watcher.close();
}
this.rescanTimers = [];
this.rootWatchers = [];
}
private async setupWatch(watch: WatchTarget): Promise<void> {
@@ -121,16 +121,80 @@ export class TranscriptWatcher {
await this.addTailer(filePath, watch, schema, true);
}
const rescanIntervalMs = watch.rescanIntervalMs ?? 5000;
const timer = setInterval(async () => {
const newFiles = this.resolveWatchFiles(resolvedPath);
for (const filePath of newFiles) {
if (!this.tailers.has(filePath)) {
await this.addTailer(filePath, watch, schema, false);
// PATHFINDER plan 03 phase 5: 5-second rescan timer replaced by a
// recursive fs.watch on the configured root. Requires Node 20+ on Linux
// for recursive mode (engines.node >= 20.0.0 — already enforced in
// package.json).
const watchRoot = this.deepestNonGlobAncestor(resolvedPath);
if (!watchRoot || !existsSync(watchRoot)) {
logger.debug('TRANSCRIPT', 'Watch root does not exist, skipping fs.watch', { watch: watch.name, watchRoot });
return;
}
try {
const watcher = fsWatch(watchRoot, { recursive: true, persistent: true }, (event, name) => {
if (!name) return; // some events omit filename
// Skip the glob scan for paths we already tail — JSONL appends fire
// here on every line and a full resolveWatchFiles() per append is
// more expensive than the prior 5-s interval. Only unknown paths
// warrant a rescan (new transcript files surface here first).
const changed = resolvePath(watchRoot, name);
if (this.tailers.has(changed)) return;
const matches = this.resolveWatchFiles(resolvedPath);
for (const filePath of matches) {
if (!this.tailers.has(filePath)) {
void this.addTailer(filePath, watch, schema, false);
}
}
});
this.rootWatchers.push(watcher);
logger.info('TRANSCRIPT', 'Watching transcript root recursively', { watch: watch.name, watchRoot });
} catch (error) {
logger.warn('TRANSCRIPT', 'Failed to start recursive fs.watch on transcript root', {
watch: watch.name,
watchRoot,
}, error instanceof Error ? error : undefined);
}
}
/**
* Return the deepest path component that contains no glob meta-characters.
* Used to anchor `fs.watch(recursive: true)` for both literal directories
* and patterns like `~/.codex/sessions/**\/*.jsonl`.
*
* Handles both `/` and `\` as separators so Windows-native paths
* (e.g. `C:\Users\x\codex\sessions\**\*.jsonl`) resolve correctly. When
* the input is purely glob meta (no literal prefix) we return an empty
* string so the caller skips the watch instead of anchoring at the
* filesystem root.
*/
private deepestNonGlobAncestor(inputPath: string): string {
if (!this.hasGlob(inputPath)) {
// Literal path: if it's a file, return its directory; otherwise return as-is.
if (existsSync(inputPath)) {
try {
const stat = statSync(inputPath);
return stat.isDirectory() ? inputPath : resolvePath(inputPath, '..');
} catch {
return resolvePath(inputPath, '..');
}
}
return inputPath;
}
const segments = inputPath.split(/[/\\]/);
const literalSegments: string[] = [];
for (const segment of segments) {
if (/[*?[\]{}()]/.test(segment)) break;
literalSegments.push(segment);
}
if (literalSegments.length === 0) return '';
if (literalSegments.length === 1 && literalSegments[0] === '') {
// Input started with a separator but the first real segment was a
// glob (e.g. `/**/foo`). Don't silently broaden the watch to `/`.
return '';
}
return literalSegments.join(pathSep);
}
private resolveSchema(watch: WatchTarget): TranscriptSchema | null {
@@ -79,6 +79,7 @@ import { DatabaseManager } from './worker/DatabaseManager.js';
import { SessionManager } from './worker/SessionManager.js';
import { SSEBroadcaster } from './worker/SSEBroadcaster.js';
import { SDKAgent } from './worker/SDKAgent.js';
import type { WorkerRef } from './worker/agents/types.js';
import { GeminiAgent, isGeminiSelected, isGeminiAvailable } from './worker/GeminiAgent.js';
import { OpenRouterAgent, isOpenRouterSelected, isOpenRouterAvailable } from './worker/OpenRouterAgent.js';
import { PaginationHelper } from './worker/PaginationHelper.js';
@@ -88,6 +89,7 @@ import { FormattingService } from './worker/FormattingService.js';
import { TimelineService } from './worker/TimelineService.js';
import { SessionEventBroadcaster } from './worker/events/SessionEventBroadcaster.js';
import { SessionCompletionHandler } from './worker/session/SessionCompletionHandler.js';
import { setIngestContext, attachIngestGeneratorStarter } from './worker/http/shared.js';
import { DEFAULT_CONFIG_PATH, DEFAULT_STATE_PATH, expandHomePath, loadTranscriptWatchConfig, writeSampleConfig } from './transcripts/config.js';
import { TranscriptWatcher } from './transcripts/watcher.js';
@@ -100,14 +102,18 @@ import { SettingsRoutes } from './worker/http/routes/SettingsRoutes.js';
import { LogsRoutes } from './worker/http/routes/LogsRoutes.js';
import { MemoryRoutes } from './worker/http/routes/MemoryRoutes.js';
import { CorpusRoutes } from './worker/http/routes/CorpusRoutes.js';
import { ChromaRoutes } from './worker/http/routes/ChromaRoutes.js';
// Knowledge agent services
import { CorpusStore } from './worker/knowledge/CorpusStore.js';
import { CorpusBuilder } from './worker/knowledge/CorpusBuilder.js';
import { KnowledgeAgent } from './worker/knowledge/KnowledgeAgent.js';
// Primary-path session lifecycle helpers — no reapers, no orphan sweeps.
// The SDK subprocess is spawned in its own POSIX process group via
// createSdkSpawnFactory; teardown via ensureSdkProcessExit kills the whole
// group so no descendants leak (Principle 5).
import { getSdkProcessForSession, ensureSdkProcessExit } from '../supervisor/process-registry.js';
/**
* Build JSON status output for hook framework communication.
@@ -133,7 +139,7 @@ export function buildStatusOutput(status: 'ready' | 'error', message?: string):
};
}
export class WorkerService implements WorkerRef {
private server: Server;
private startTime: number = Date.now();
private mcpClient: Client;
@@ -146,14 +152,14 @@ export class WorkerService {
// Service layer
private dbManager: DatabaseManager;
private sessionManager: SessionManager;
public sseBroadcaster: SSEBroadcaster;
private sdkAgent: SDKAgent;
private geminiAgent: GeminiAgent;
private openRouterAgent: OpenRouterAgent;
private paginationHelper: PaginationHelper;
private settingsManager: SettingsManager;
private sessionEventBroadcaster: SessionEventBroadcaster;
private completionHandler: SessionCompletionHandler;
private corpusStore: CorpusStore;
// Route handlers
@@ -169,12 +175,6 @@ export class WorkerService {
private initializationComplete: Promise<void>;
private resolveInitialization!: () => void;
// AI interaction tracking for health endpoint
private lastAiInteraction: {
timestamp: number;
@@ -200,13 +200,21 @@ export class WorkerService {
this.paginationHelper = new PaginationHelper(this.dbManager);
this.settingsManager = new SettingsManager(this.dbManager);
this.sessionEventBroadcaster = new SessionEventBroadcaster(this.sseBroadcaster, this);
this.completionHandler = new SessionCompletionHandler(
this.sessionManager,
this.sessionEventBroadcaster,
this.dbManager,
);
this.corpusStore = new CorpusStore();
// Wire ingest helpers (plan 03 phase 0). Worker-internal callers use these
// directly instead of HTTP-loopback into our own routes.
setIngestContext({
sessionManager: this.sessionManager,
dbManager: this.dbManager,
eventBroadcaster: this.sessionEventBroadcaster,
});
// Set callback for when sessions are deleted
this.sessionManager.setOnSessionDeleted(() => {
this.broadcastProcessingStatus();
@@ -268,6 +276,9 @@ export class WorkerService {
private registerRoutes(): void {
// IMPORTANT: Middleware must be registered BEFORE routes (Express processes in order)
// Register Chroma routes immediately so they bypass the initialization guard
this.server.registerRoutes(new ChromaRoutes());
// Early handler for /api/context/inject — fail open if not yet initialized
this.server.app.get('/api/context/inject', async (req, res, next) => {
if (!this.initializationCompleteFlag || !this.searchRoutes) {
@@ -281,14 +292,20 @@ export class WorkerService {
// Guard ALL /api/* routes during initialization — wait for DB with timeout
// Exceptions: /api/health, /api/readiness, /api/version (handled by Server.ts core routes)
// and /api/context/inject (handled above with fail-open)
// and /api/chroma/status (diagnostic endpoint)
this.server.app.use('/api', async (req, res, next) => {
// Bypass guard for diagnostic endpoints
if (req.path === '/chroma/status' || req.path === '/health' || req.path === '/readiness' || req.path === '/version') {
next();
return;
}
if (this.initializationCompleteFlag) {
next();
return;
}
const timeoutMs = 120000; // 2 minutes
const timeoutPromise = new Promise<void>((_, reject) =>
setTimeout(() => reject(new Error('Database initialization timeout')), timeoutMs)
);
@@ -312,7 +329,15 @@ export class WorkerService {
// Standard routes (registered AFTER guard middleware)
this.server.registerRoutes(new ViewerRoutes(this.sseBroadcaster, this.dbManager, this.sessionManager));
const sessionRoutes = new SessionRoutes(this.sessionManager, this.dbManager, this.sdkAgent, this.geminiAgent, this.openRouterAgent, this.sessionEventBroadcaster, this, this.completionHandler);
this.server.registerRoutes(sessionRoutes);
// Wire the generator-starter callback now that SessionRoutes exists.
// `setIngestContext` ran in the constructor before routes were
// constructed; transcript-watcher observations depend on this side-effect
// to auto-start the SDK generator after enqueue.
attachIngestGeneratorStarter((sessionDbId, source) =>
sessionRoutes.ensureGeneratorRunning(sessionDbId, source),
);
this.server.registerRoutes(new DataRoutes(this.paginationHelper, this.dbManager, this.sessionManager, this.sseBroadcaster, this, this.startTime));
this.server.registerRoutes(new SettingsRoutes(this.settingsManager));
this.server.registerRoutes(new LogsRoutes());
@@ -359,6 +384,7 @@ export class WorkerService {
*/
private async initializeBackground(): Promise<void> {
try {
logger.info('WORKER', 'Background initialization starting...');
await aggressiveStartupCleanup();
// Load mode configuration
@@ -368,47 +394,39 @@ export class WorkerService {
const settings = SettingsDefaultsManager.loadFromFile(USER_SETTINGS_PATH);
const modeId = settings.CLAUDE_MEM_MODE;
ModeManager.getInstance().loadMode(modeId);
logger.info('SYSTEM', `Mode loaded: ${modeId}`);
// One-time chroma wipe for users upgrading from versions with duplicate worker bugs.
// Only runs in local mode (chroma is local-only). Backfill at line ~414 rebuilds from SQLite.
if (settings.CLAUDE_MEM_MODE === 'local' || !settings.CLAUDE_MEM_MODE) {
logger.info('WORKER', 'Checking for one-time Chroma migration...');
runOneTimeChromaMigration();
}
// One-time remap of pre-worktree project names using pending_messages.cwd.
// Must run before dbManager.initialize() so we don't hold the DB open.
logger.info('WORKER', 'Checking for one-time CWD remap...');
runOneTimeCwdRemap();
// Stamp merged worktrees (non-blocking, fire-and-forget). Runs every startup
// because git state evolves and the adoption engine is fully idempotent. The
// worker daemon is spawned with cwd=marketplace-plugin-dir (not a git repo),
// so parent repos are discovered from recorded pending_messages.cwd values.
logger.info('WORKER', 'Adopting merged worktrees (background)...');
adoptMergedWorktreesForAllKnownRepos({}).then(adoptions => {
if (adoptions) {
for (const adoption of adoptions) {
if (adoption.adoptedObservations > 0 || adoption.adoptedSummaries > 0 || adoption.chromaUpdates > 0) {
logger.info('SYSTEM', 'Merged worktrees adopted in background', adoption);
}
if (adoption.errors.length > 0) {
logger.warn('SYSTEM', 'Worktree adoption had per-branch errors', {
repoPath: adoption.repoPath,
errors: adoption.errors
});
}
}
}
}
}).catch(err => {
logger.error('WORKER', 'Worktree adoption failed (background)', {}, err instanceof Error ? err : new Error(String(err)));
});
// Initialize ChromaMcpManager only if Chroma is enabled
const chromaEnabled = settings.CLAUDE_MEM_CHROMA_ENABLED !== 'false';
@@ -419,21 +437,24 @@ export class WorkerService {
logger.info('SYSTEM', 'Chroma disabled via CLAUDE_MEM_CHROMA_ENABLED=false, skipping ChromaMcpManager');
}
logger.info('WORKER', 'Initializing database manager...');
await this.dbManager.initialize();
// One-shot GC for terminally-failed rows
try {
logger.info('WORKER', 'Running startup GC for pending messages...');
const { PendingMessageStore } = await import('./sqlite/PendingMessageStore.js');
const pendingStore = new PendingMessageStore(this.dbManager.getSessionStore().db, 3);
const cleared = pendingStore.clearFailedOlderThan(7 * 24 * 60 * 60 * 1000);
if (cleared > 0) {
logger.info('QUEUE', 'Startup GC cleared old failed pending_messages rows', { cleared });
}
} catch (err) {
logger.warn('QUEUE', 'Startup GC for failed pending_messages rows failed', {}, err instanceof Error ? err : undefined);
}
// Initialize search services
logger.info('WORKER', 'Initializing search services...');
const formattingService = new FormattingService();
const timelineService = new TimelineService();
const searchManager = new SearchManager(
@@ -464,8 +485,6 @@ export class WorkerService {
logger.info('WORKER', 'CorpusRoutes registered');
// DB and search are ready — mark initialization complete so hooks can proceed.
// MCP connection is tracked separately via mcpReady and is NOT required for
// the worker to serve context/search requests.
this.initializationCompleteFlag = true;
this.resolveInitialization();
logger.info('SYSTEM', 'Core initialization complete (DB + search ready)');
@@ -474,7 +493,7 @@ export class WorkerService {
// Auto-backfill Chroma for all projects if out of sync with SQLite (fire-and-forget)
if (this.chromaMcpManager) {
ChromaSync.backfillAllProjects(this.dbManager.getSessionStore()).then(() => {
logger.info('CHROMA_SYNC', 'Backfill check complete for all projects');
}).catch(error => {
logger.error('CHROMA_SYNC', 'Backfill failed (non-blocking)', {}, error as Error);
@@ -482,134 +501,55 @@ export class WorkerService {
}
// Mark MCP as externally ready once the bundled stdio server binary exists.
// Codex/Claude Desktop connect to this binary directly; the loopback client
// below is only a best-effort self-check and should not mark health false.
const mcpServerPath = path.join(__dirname, 'mcp-server.cjs');
this.mcpReady = existsSync(mcpServerPath);
// Best-effort loopback MCP self-check (non-blocking, fire-and-forget)
this.runMcpSelfCheck(mcpServerPath).catch(err => {
logger.debug('WORKER', 'MCP self-check failed (non-fatal)', { error: err.message });
});
return;
} catch (error) {
// Background initialization failed - log and let worker fail health checks
logger.error('SYSTEM', 'Background initialization failed', {}, error instanceof Error ? error : undefined);
}
}
/**
* Run a best-effort loopback MCP self-check to verify the bundled server can start.
* This is entirely diagnostic and does not block worker availability.
*/
private async runMcpSelfCheck(mcpServerPath: string): Promise<void> {
try {
getSupervisor().assertCanSpawn('mcp server');
const transport = new StdioClientTransport({
command: process.execPath,
args: [mcpServerPath],
env: Object.fromEntries(
Object.entries(sanitizeEnv(process.env)).filter(([, value]) => value !== undefined)
) as Record<string, string>
});
const MCP_INIT_TIMEOUT_MS = 60000; // 1 minute is plenty for a local check
const mcpConnectionPromise = this.mcpClient.connect(transport);
const timeoutPromise = new Promise<never>((_, reject) => {
setTimeout(
() => reject(new Error('MCP connection timeout')),
MCP_INIT_TIMEOUT_MS
);
});
await Promise.race([mcpConnectionPromise, timeoutPromise]);
const mcpProcess = (transport as unknown as { _process?: import('child_process').ChildProcess })._process;
if (mcpProcess?.pid) {
getSupervisor().registerProcess('mcp-server', {
pid: mcpProcess.pid,
type: 'mcp',
startedAt: new Date().toISOString()
}, mcpProcess);
mcpProcess.once('exit', () => {
getSupervisor().unregisterProcess('mcp-server');
});
}
logger.success('WORKER', 'MCP loopback self-check connected');
// Cleanup
await transport.close();
} catch (error) {
logger.warn('WORKER', 'MCP loopback self-check failed', {
error: error instanceof Error ? error.message : String(error)
});
}
}
@@ -787,10 +727,11 @@ export class WorkerService {
throw error;
})
.finally(async () => {
// Primary-path subprocess teardown — process-group kill ensures any
// SDK descendants are reaped too (Principle 5).
const trackedProcess = getSdkProcessForSession(session.sessionDbId);
if (trackedProcess && trackedProcess.process.exitCode === null) {
await ensureSdkProcessExit(trackedProcess, 5000);
}
session.generatorPromise = null;
@@ -833,12 +774,14 @@ export class WorkerService {
session.consecutiveRestarts = (session.consecutiveRestarts || 0) + 1; // Keep for logging
if (!restartAllowed) {
logger.error('SYSTEM', 'Restart guard tripped: session is dead, terminating', {
sessionId: session.sessionDbId,
pendingCount,
restartsInWindow: session.restartGuard.restartsInWindow,
windowMs: session.restartGuard.windowMs,
maxRestarts: session.restartGuard.maxRestarts,
consecutiveFailures: session.restartGuard.consecutiveFailuresSinceSuccess,
maxConsecutiveFailures: session.restartGuard.maxConsecutiveFailures
});
session.consecutiveRestarts = 0;
this.terminateSession(session.sessionDbId, 'max_restarts_exceeded');
@@ -856,26 +799,17 @@ export class WorkerService {
this.startSessionProcessor(session, 'pending-work-restart');
this.broadcastProcessingStatus();
} else {
// Successful completion with no pending work — finalize then drop
// in-memory state. finalizeSession flips sdk_sessions.status to
// 'completed', drains orphaned pendings, broadcasts; idempotent so
// the later POST /api/sessions/complete from the Stop hook is a
// no-op. Without this, hooks-disabled installs (and any session
// whose Stop hook fails before /api/sessions/complete) leave the
// DB row permanently 'active'.
session.restartGuard?.recordSuccess();
session.consecutiveRestarts = 0;
this.completionHandler.finalizeSession(session.sessionDbId);
this.sessionManager.removeSessionImmediate(session.sessionDbId);
}
});
}
@@ -960,34 +894,12 @@ export class WorkerService {
}
}
// No fallback or both failed: mark session completed in DB (drain pending
// + broadcast via finalizeSession, idempotent) then drop in-memory state.
// Without this, sdk_sessions.status stays 'active' forever — the deleted
// reapStaleSessions interval was the only prior backstop.
this.completionHandler.finalizeSession(sessionDbId);
this.sessionManager.removeSessionImmediate(sessionDbId);
}
/**
@@ -1001,34 +913,15 @@ export class WorkerService {
* no? terminateSession()
*/
private terminateSession(sessionDbId: number, reason: string): void {
logger.info('SYSTEM', 'Session terminated', { sessionId: sessionDbId, reason });
// finalizeSession marks sdk_sessions.status='completed', drains pending
// messages, and broadcasts. Idempotent. Without this, wall-clock-limited
// and unrecoverable-error paths leave DB rows as 'active' forever.
this.completionHandler.finalizeSession(sessionDbId);
// removeSessionImmediate fires onSessionDeletedCallback → broadcastProcessingStatus()
this.sessionManager.removeSessionImmediate(sessionDbId);
}
/**
@@ -1154,18 +1047,6 @@ export class WorkerService {
logger.info('TRANSCRIPT', 'Transcript watcher stopped');
}
await performGracefulShutdown({
server: this.server.getHttpServer(),
sessionManager: this.sessionManager,
@@ -48,9 +48,6 @@ export interface ActiveSession {
// Track whether the most recent storage operation persisted a summary record.
// Used by the status endpoint so the Stop hook can detect silent summary loss (#1633).
lastSummaryStored?: boolean;
// Subagent identity carried forward from the most recent claimed pending message.
// When observations are parsed and stored, these fields label the resulting rows
// so subagent work is attributable. NULL / undefined means the batch came from the main session.
@@ -69,6 +66,9 @@ export interface PendingMessage {
// Claude Code subagent identity — present only when the hook fired inside a subagent.
agentId?: string;
agentType?: string;
/** Provider-assigned tool-use id; underpins the
* UNIQUE(content_session_id, tool_use_id) idempotency index added in plan 01. */
toolUseId?: string;
}
/**
@@ -90,6 +90,8 @@ export interface ObservationData {
// Claude Code subagent identity — present only when the hook fired inside a subagent.
agentId?: string;
agentType?: string;
/** Provider-assigned tool-use id (plan 03 phase 6 idempotency key). */
toolUseId?: string;
}
// ============================================================================
@@ -8,15 +8,17 @@
* - ChromaSync integration
*/
import { Database } from 'bun:sqlite';
import { SessionStore } from '../sqlite/SessionStore.js';
import { SessionSearch } from '../sqlite/SessionSearch.js';
import { ChromaSync } from '../sync/ChromaSync.js';
import { SettingsDefaultsManager } from '../../shared/SettingsDefaultsManager.js';
import { USER_SETTINGS_PATH, DB_PATH } from '../../shared/paths.js';
import { logger } from '../../utils/logger.js';
import type { DBSession } from '../worker-types.js';
export class DatabaseManager {
private db: Database | null = null;
private sessionStore: SessionStore | null = null;
private sessionSearch: SessionSearch | null = null;
private chromaSync: ChromaSync | null = null;
@@ -26,8 +28,11 @@ export class DatabaseManager {
*/
async initialize(): Promise<void> {
// Open database connection (ONCE)
this.db = new Database(DB_PATH);
// Shared connection between store and search
this.sessionStore = new SessionStore(this.db);
this.sessionSearch = new SessionSearch(this.db);
// Initialize ChromaSync only if Chroma is enabled (SQLite-only fallback when disabled)
const settings = SettingsDefaultsManager.loadFromFile(USER_SETTINGS_PATH);
@@ -38,7 +43,7 @@ export class DatabaseManager {
logger.info('DB', 'Chroma disabled via CLAUDE_MEM_CHROMA_ENABLED=false, using SQLite-only search');
}
logger.info('DB', 'Database initialized (shared connection)');
}
/**
@@ -51,13 +56,14 @@ export class DatabaseManager {
this.chromaSync = null;
}
// We don't call sessionStore.close() or sessionSearch.close()
// because they share this.db, which we close below.
this.sessionStore = null;
this.sessionSearch = null;
if (this.db) {
this.db.close();
this.db = null;
}
logger.info('DB', 'Database closed');
}
@@ -89,10 +95,6 @@ export class DatabaseManager {
return this.chromaSync;
}
/**
* Get session by ID (throws if not found)
*/
@@ -7,6 +7,7 @@
* - Efficient LIMIT+1 trick to avoid COUNT(*) query
*/
import type { SQLQueryBindings } from 'bun:sqlite';
import { DatabaseManager } from './DatabaseManager.js';
import { logger } from '../../utils/logger.js';
import { OBSERVER_SESSIONS_PROJECT } from '../../shared/paths.js';
@@ -102,7 +103,7 @@ export class PaginationHelper {
FROM observations o
LEFT JOIN sdk_sessions s ON o.memory_session_id = s.memory_session_id
`;
const params: unknown[] = [];
const params: SQLQueryBindings[] = [];
const conditions: string[] = [];
if (project) {
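The LIMIT+1 trick noted in the file's header comment can be sketched as a pure helper (names are illustrative, not the project's API): fetch `limit + 1` rows and let the presence of the extra row signal that more pages exist, avoiding a second COUNT(*) query.

```typescript
// Hypothetical sketch of the LIMIT+1 pagination trick: `rowsPlusOne` is
// assumed to be the result of a query ending in `LIMIT ${limit + 1}`.
// The extra row, if present, is the has-more signal; it is trimmed off
// before the page is returned.
function paginate<T>(rowsPlusOne: T[], limit: number): { page: T[]; hasMore: boolean } {
  const hasMore = rowsPlusOne.length > limit;
  return { page: rowsPlusOne.slice(0, limit), hasMore };
}
```

This trades one extra fetched row per page for skipping a full COUNT(*) table scan.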
-527
@@ -1,527 +0,0 @@
/**
* ProcessRegistry: Track spawned Claude subprocesses
*
* Fixes Issue #737: Claude haiku subprocesses don't terminate properly,
* causing zombie process accumulation (user reported 155 processes / 51GB RAM).
*
* Root causes:
* 1. SDK's SpawnedProcess interface hides subprocess PIDs
* 2. deleteSession() doesn't verify subprocess exit before cleanup
* 3. abort() is fire-and-forget with no confirmation
*
* Solution:
* - Use SDK's spawnClaudeCodeProcess option to capture PIDs
* - Track all spawned processes with session association
* - Verify exit on session deletion with timeout + SIGKILL escalation
* - Safety net orphan reaper runs every 5 minutes
*/
import { spawn, exec, ChildProcess } from 'child_process';
import { promisify } from 'util';
import { logger } from '../../utils/logger.js';
import { sanitizeEnv } from '../../supervisor/env-sanitizer.js';
import { getSupervisor } from '../../supervisor/index.js';
const execAsync = promisify(exec);
interface TrackedProcess {
pid: number;
sessionDbId: number;
spawnedAt: number;
process: ChildProcess;
}
function getTrackedProcesses(): TrackedProcess[] {
return getSupervisor().getRegistry()
.getAll()
.filter(record => record.type === 'sdk')
.map((record) => {
const processRef = getSupervisor().getRegistry().getRuntimeProcess(record.id);
if (!processRef) {
return null;
}
return {
pid: record.pid,
sessionDbId: Number(record.sessionId),
spawnedAt: Date.parse(record.startedAt),
process: processRef
};
})
.filter((value): value is TrackedProcess => value !== null);
}
/**
* Register a spawned process in the registry
*/
export function registerProcess(pid: number, sessionDbId: number, process: ChildProcess): void {
getSupervisor().registerProcess(`sdk:${sessionDbId}:${pid}`, {
pid,
type: 'sdk',
sessionId: sessionDbId,
startedAt: new Date().toISOString()
}, process);
logger.info('PROCESS', `Registered PID ${pid} for session ${sessionDbId}`, { pid, sessionDbId });
}
/**
* Unregister a process from the registry and notify pool waiters
*/
export function unregisterProcess(pid: number): void {
for (const record of getSupervisor().getRegistry().getByPid(pid)) {
if (record.type === 'sdk') {
getSupervisor().unregisterProcess(record.id);
}
}
logger.debug('PROCESS', `Unregistered PID ${pid}`, { pid });
// Notify waiters that a pool slot may be available
notifySlotAvailable();
}
/**
* Get process info by session ID
* Warns if multiple processes found (indicates race condition)
*/
export function getProcessBySession(sessionDbId: number): TrackedProcess | undefined {
const matches = getTrackedProcesses().filter(info => info.sessionDbId === sessionDbId);
if (matches.length > 1) {
logger.warn('PROCESS', `Multiple processes found for session ${sessionDbId}`, {
count: matches.length,
pids: matches.map(m => m.pid)
});
}
return matches[0];
}
/**
* Get count of active processes in the registry
*/
export function getActiveCount(): number {
return getSupervisor().getRegistry().getAll().filter(record => record.type === 'sdk').length;
}
// Waiters for pool slots - resolved when a process exits and frees a slot
const slotWaiters: Array<() => void> = [];
/**
* Notify waiters that a slot has freed up
*/
function notifySlotAvailable(): void {
const waiter = slotWaiters.shift();
if (waiter) waiter();
}
const TOTAL_PROCESS_HARD_CAP = 10;
/**
* Wait for a pool slot to become available (promise-based, not polling)
* @param maxConcurrent Max number of concurrent agents
* @param timeoutMs Max time to wait before giving up
* @param evictIdleSession Optional callback to evict an idle session when all slots are full (#1868)
*/
export async function waitForSlot(
maxConcurrent: number,
timeoutMs: number = 60_000,
evictIdleSession?: () => boolean
): Promise<void> {
// Hard cap: refuse to spawn if too many processes exist regardless of pool accounting
const activeCount = getActiveCount();
if (activeCount >= TOTAL_PROCESS_HARD_CAP) {
throw new Error(`Hard cap exceeded: ${activeCount} processes in registry (cap=${TOTAL_PROCESS_HARD_CAP}). Refusing to spawn more.`);
}
if (activeCount < maxConcurrent) return;
// Try to evict an idle session before waiting (#1868)
// Idle sessions hold pool slots during their 3-min idle timeout, blocking new sessions
// that would time out after 60s. Eviction aborts the idle session asynchronously;
// the freed slot is picked up by the waiter mechanism below.
if (evictIdleSession) {
const evicted = evictIdleSession();
if (evicted) {
logger.info('PROCESS', 'Evicted idle session to free pool slot for waiting request');
}
}
logger.info('PROCESS', `Pool limit reached (${activeCount}/${maxConcurrent}), waiting for slot...`);
return new Promise<void>((resolve, reject) => {
const timeout = setTimeout(() => {
const idx = slotWaiters.indexOf(onSlot);
if (idx >= 0) slotWaiters.splice(idx, 1);
reject(new Error(`Timed out waiting for agent pool slot after ${timeoutMs}ms`));
}, timeoutMs);
const onSlot = () => {
clearTimeout(timeout);
if (getActiveCount() < maxConcurrent) {
resolve();
} else {
// Still full, re-queue
slotWaiters.push(onSlot);
}
};
slotWaiters.push(onSlot);
});
}
/**
* Get all active PIDs (for debugging)
*/
export function getActiveProcesses(): Array<{ pid: number; sessionDbId: number; ageMs: number }> {
const now = Date.now();
return getTrackedProcesses().map(info => ({
pid: info.pid,
sessionDbId: info.sessionDbId,
ageMs: now - info.spawnedAt
}));
}
/**
* Wait for a process to exit with timeout, escalating to SIGKILL if needed
* Uses event-based waiting instead of polling to avoid CPU overhead
*/
export async function ensureProcessExit(tracked: TrackedProcess, timeoutMs: number = 5000): Promise<void> {
const { pid, process: proc } = tracked;
// Already exited? Only trust exitCode, NOT proc.killed
// proc.killed only means Node sent a signal — the process can still be alive
if (proc.exitCode !== null) {
unregisterProcess(pid);
return;
}
// Wait for graceful exit with timeout using event-based approach
const exitPromise = new Promise<void>((resolve) => {
proc.once('exit', () => resolve());
});
const timeoutPromise = new Promise<void>((resolve) => {
setTimeout(resolve, timeoutMs);
});
await Promise.race([exitPromise, timeoutPromise]);
// Check if exited gracefully — only trust exitCode
if (proc.exitCode !== null) {
unregisterProcess(pid);
return;
}
// Timeout: escalate to SIGKILL
logger.warn('PROCESS', `PID ${pid} did not exit after ${timeoutMs}ms, sending SIGKILL`, { pid, timeoutMs });
try {
proc.kill('SIGKILL');
} catch {
// Already dead
}
// Wait for SIGKILL to take effect — use exit event with 1s timeout instead of blind sleep
const sigkillExitPromise = new Promise<void>((resolve) => {
proc.once('exit', () => resolve());
});
const sigkillTimeout = new Promise<void>((resolve) => {
setTimeout(resolve, 1000);
});
await Promise.race([sigkillExitPromise, sigkillTimeout]);
unregisterProcess(pid);
}
/**
* Kill idle daemon children (claude processes spawned by worker-service)
*
* These are SDK-spawned claude processes that completed their work but
* didn't terminate properly. They remain as children of the worker-service
* daemon, consuming memory without doing useful work.
*
* Criteria for cleanup:
* - Process name is "claude"
* - Parent PID is the worker-service daemon (this process)
* - Process has 0% CPU (idle)
* - Process has been running for at least 1 minute
*/
async function killIdleDaemonChildren(): Promise<number> {
if (process.platform === 'win32') {
// Windows: Different process model, skip for now
return 0;
}
const daemonPid = process.pid;
let killed = 0;
try {
const { stdout } = await execAsync(
'ps -eo pid,ppid,%cpu,etime,comm 2>/dev/null | grep "claude$" || true'
);
for (const line of stdout.trim().split('\n')) {
if (!line) continue;
const parts = line.trim().split(/\s+/);
if (parts.length < 5) continue;
const [pidStr, ppidStr, cpuStr, etime] = parts;
const pid = parseInt(pidStr, 10);
const ppid = parseInt(ppidStr, 10);
const cpu = parseFloat(cpuStr);
// Skip if not a child of this daemon
if (ppid !== daemonPid) continue;
// Skip if actively using CPU
if (cpu > 0) continue;
// Parse elapsed time to minutes
// Formats: MM:SS, HH:MM:SS, D-HH:MM:SS
let minutes = 0;
const dayMatch = etime.match(/^(\d+)-(\d+):(\d+):(\d+)$/);
const hourMatch = etime.match(/^(\d+):(\d+):(\d+)$/);
const minMatch = etime.match(/^(\d+):(\d+)$/);
if (dayMatch) {
minutes = parseInt(dayMatch[1], 10) * 24 * 60 +
parseInt(dayMatch[2], 10) * 60 +
parseInt(dayMatch[3], 10);
} else if (hourMatch) {
minutes = parseInt(hourMatch[1], 10) * 60 +
parseInt(hourMatch[2], 10);
} else if (minMatch) {
minutes = parseInt(minMatch[1], 10);
}
// Kill if idle for at least 1 minute
if (minutes >= 1) {
logger.info('PROCESS', `Killing idle daemon child PID ${pid} (idle ${minutes}m)`, { pid, minutes });
try {
process.kill(pid, 'SIGKILL');
killed++;
} catch {
// Already dead or permission denied
}
}
}
} catch {
// No matches or command error
}
return killed;
}
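The elapsed-time parsing above could be factored into a standalone, unit-testable helper; a sketch (the function name is hypothetical):

```typescript
// Parse a `ps` etime value into whole minutes. Supported formats:
// MM:SS, HH:MM:SS, and D-HH:MM:SS. Seconds are deliberately dropped,
// and unrecognized input parses as 0 minutes.
function parseEtimeMinutes(etime: string): number {
  const dayMatch = etime.match(/^(\d+)-(\d+):(\d+):(\d+)$/);
  if (dayMatch) {
    return parseInt(dayMatch[1], 10) * 24 * 60 +
           parseInt(dayMatch[2], 10) * 60 +
           parseInt(dayMatch[3], 10);
  }
  const hourMatch = etime.match(/^(\d+):(\d+):(\d+)$/);
  if (hourMatch) {
    return parseInt(hourMatch[1], 10) * 60 + parseInt(hourMatch[2], 10);
  }
  const minMatch = etime.match(/^(\d+):(\d+)$/);
  return minMatch ? parseInt(minMatch[1], 10) : 0;
}
```

Isolating the parsing also makes the format table in the comment directly checkable against real `ps` output.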
/**
* Kill system-level orphans (ppid=1 on Unix)
* These are Claude processes whose parent died unexpectedly
*/
async function killSystemOrphans(): Promise<number> {
if (process.platform === 'win32') {
return 0; // Windows doesn't have ppid=1 orphan concept
}
try {
const { stdout } = await execAsync(
'ps -eo pid,ppid,args 2>/dev/null | grep -E "claude.*haiku|claude.*output-format" | grep -v grep'
);
let killed = 0;
for (const line of stdout.trim().split('\n')) {
if (!line) continue;
const match = line.trim().match(/^(\d+)\s+(\d+)/);
if (match && parseInt(match[2], 10) === 1) { // ppid=1 = orphan
const orphanPid = parseInt(match[1], 10);
logger.warn('PROCESS', `Killing system orphan PID ${orphanPid}`, { pid: orphanPid });
try {
process.kill(orphanPid, 'SIGKILL');
killed++;
} catch {
// Already dead or permission denied
}
}
}
return killed;
} catch {
return 0; // No matches or error
}
}
/**
* Reap orphaned processes - both registry-tracked and system-level
*/
export async function reapOrphanedProcesses(activeSessionIds: Set<number>): Promise<number> {
let killed = 0;
// Registry-based: kill processes for dead sessions
for (const record of getSupervisor().getRegistry().getAll().filter(entry => entry.type === 'sdk')) {
const pid = record.pid;
const sessionDbId = Number(record.sessionId);
const processRef = getSupervisor().getRegistry().getRuntimeProcess(record.id);
if (activeSessionIds.has(sessionDbId)) continue; // Active = safe
logger.warn('PROCESS', `Killing orphan PID ${pid} (session ${sessionDbId} gone)`, { pid, sessionDbId });
try {
if (processRef) {
processRef.kill('SIGKILL');
} else {
process.kill(pid, 'SIGKILL');
}
killed++;
} catch {
// Already dead
}
getSupervisor().unregisterProcess(record.id);
notifySlotAvailable();
}
// System-level: find ppid=1 orphans
killed += await killSystemOrphans();
// Daemon children: find idle SDK processes that didn't terminate
killed += await killIdleDaemonChildren();
return killed;
}
/**
* Create a custom spawn function for SDK that captures PIDs
*
* The SDK's spawnClaudeCodeProcess option allows us to intercept subprocess
* creation and capture the PID before the SDK hides it.
*
* NOTE: Session isolation is handled via the `cwd` option in SDKAgent.ts,
* NOT via CLAUDE_CONFIG_DIR (which breaks authentication).
*/
export function createPidCapturingSpawn(sessionDbId: number) {
return (spawnOptions: {
command: string;
args: string[];
cwd?: string;
env?: NodeJS.ProcessEnv;
signal?: AbortSignal;
}) => {
// Kill any existing process for this session before spawning a new one.
// Multiple processes sharing the same --resume UUID waste API credits and
// can conflict with each other (Issue #1590).
const existing = getProcessBySession(sessionDbId);
if (existing && existing.process.exitCode === null) {
logger.warn('PROCESS', `Killing duplicate process PID ${existing.pid} before spawning new one for session ${sessionDbId}`, {
existingPid: existing.pid,
sessionDbId
});
let exited = false;
try {
existing.process.kill('SIGTERM');
exited = existing.process.exitCode !== null;
} catch (error: unknown) {
// Already dead — safe to unregister immediately
if (error instanceof Error) {
logger.warn('WORKER', `Failed to kill duplicate process PID ${existing.pid}, likely already dead`, { existingPid: existing.pid, sessionDbId }, error);
}
exited = true;
}
if (exited) {
unregisterProcess(existing.pid);
}
// If still alive, the 'exit' handler (line ~440) will unregister it.
}
getSupervisor().assertCanSpawn('claude sdk');
// On Windows, use cmd.exe wrapper for .cmd files to properly handle paths with spaces
const useCmdWrapper = process.platform === 'win32' && spawnOptions.command.endsWith('.cmd');
const env = sanitizeEnv(spawnOptions.env ?? process.env);
// Filter empty string args AND their preceding flag (Issue #2049).
// The Agent SDK emits ["--setting-sources", ""] when settingSources defaults to [].
// Simply dropping "" leaves an orphan --setting-sources that consumes the next
// flag (e.g. --permission-mode) as its value, crashing Claude Code 2.1.109+ with
// "Invalid setting source: --permission-mode". Drop the flag too so the SDK
// default (no setting sources) is preserved by omission.
const args: string[] = [];
for (const arg of spawnOptions.args) {
if (arg === '') {
if (args.length > 0 && args[args.length - 1].startsWith('--')) {
args.pop();
}
continue;
}
args.push(arg);
}
const child = useCmdWrapper
? spawn('cmd.exe', ['/d', '/c', spawnOptions.command, ...args], {
cwd: spawnOptions.cwd,
env,
stdio: ['pipe', 'pipe', 'pipe'],
signal: spawnOptions.signal,
windowsHide: true
})
: spawn(spawnOptions.command, args, {
cwd: spawnOptions.cwd,
env,
stdio: ['pipe', 'pipe', 'pipe'],
signal: spawnOptions.signal, // CRITICAL: Pass signal for AbortController integration
windowsHide: true
});
// Capture stderr for debugging spawn failures
if (child.stderr) {
child.stderr.on('data', (data: Buffer) => {
logger.debug('SDK_SPAWN', `[session-${sessionDbId}] stderr: ${data.toString().trim()}`);
});
}
// Register PID
if (child.pid) {
registerProcess(child.pid, sessionDbId, child);
// Auto-unregister on exit
child.on('exit', (code: number | null, signal: string | null) => {
if (code !== 0) {
logger.warn('SDK_SPAWN', `[session-${sessionDbId}] Claude process exited`, { code, signal, pid: child.pid });
}
if (child.pid) {
unregisterProcess(child.pid);
}
});
}
// Return SDK-compatible interface
return {
stdin: child.stdin,
stdout: child.stdout,
stderr: child.stderr,
get killed() { return child.killed; },
get exitCode() { return child.exitCode; },
kill: child.kill.bind(child),
on: child.on.bind(child),
once: child.once.bind(child),
off: child.off.bind(child)
};
};
}
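The flag-dropping rule explained in the comment above (Issue #2049) can be expressed as a pure helper, which makes the behavior easy to unit test in isolation; the function name here is hypothetical:

```typescript
// Drop each empty-string argument AND the `--flag` immediately before it.
// Dropping only the "" would leave an orphaned flag that consumes the
// next flag as its value (e.g. --setting-sources swallowing
// --permission-mode), so the flag is removed too.
function filterEmptyFlagArgs(input: string[]): string[] {
  const out: string[] = [];
  for (const arg of input) {
    if (arg === '') {
      if (out.length > 0 && out[out.length - 1].startsWith('--')) {
        out.pop();
      }
      continue;
    }
    out.push(arg);
  }
  return out;
}
```

An empty string after a positional argument is simply skipped, so only flag/value pairs are affected.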
/**
* Start the orphan reaper interval
* Returns cleanup function to stop the interval
*/
export function startOrphanReaper(getActiveSessionIds: () => Set<number>, intervalMs: number = 30 * 1000): () => void {
const interval = setInterval(async () => {
try {
const activeIds = getActiveSessionIds();
const killed = await reapOrphanedProcesses(activeIds);
if (killed > 0) {
logger.info('PROCESS', `Reaper cleaned up ${killed} orphaned processes`, { killed });
}
} catch (error) {
if (error instanceof Error) {
logger.error('WORKER', 'Reaper error', {}, error);
} else {
logger.error('WORKER', 'Reaper error', { rawError: String(error) });
}
}
}, intervalMs);
// Return cleanup function
return () => clearInterval(interval);
}
+34 -2
@@ -3,15 +3,26 @@
* Prevents tight-loop restarts (bug) while allowing legitimate occasional restarts
* over long sessions. Replaces the flat consecutiveRestarts counter that stranded
* pending messages after just 3 restarts over any timeframe (#2053).
*
* TWO INDEPENDENT TRIPS:
* 1. Sliding window: more than MAX_WINDOWED_RESTARTS within RESTART_WINDOW_MS.
* Catches genuinely tight loops (e.g. crash every <6s).
* 2. Consecutive failures: more than MAX_CONSECUTIVE_FAILURES restarts with
* NO successful processing in between. Catches dead sessions that
* fail-restart-fail-restart on a slow exponential backoff cadence
* (e.g. 8s backoff cap + spawn failures = restartsInWindow stays under
* the windowed cap forever, but the session is clearly dead).
*/
const RESTART_WINDOW_MS = 60_000; // Only count restarts within last 60 seconds
const MAX_WINDOWED_RESTARTS = 10; // 10 restarts in 60s = runaway loop
const MAX_CONSECUTIVE_FAILURES = 5; // 5 restarts with no success in between = session is dead
const DECAY_AFTER_SUCCESS_MS = 5 * 60_000; // Clear history after 5min of uninterrupted success
export class RestartGuard {
private restartTimestamps: number[] = [];
private lastSuccessfulProcessing: number | null = null;
private consecutiveFailures: number = 0;
/**
* Record a restart and check if the guard should trip.
@@ -34,16 +45,23 @@ export class RestartGuard {
// Record this restart
this.restartTimestamps.push(now);
this.consecutiveFailures += 1;
// Check if we've exceeded the cap within the window
return this.restartTimestamps.length <= MAX_WINDOWED_RESTARTS;
// Trip if EITHER guard exceeds its limit:
// - Sliding window cap (tight loops)
// - Consecutive failures with no successful work (dead session, e.g. spawn always fails)
const withinWindowedCap = this.restartTimestamps.length <= MAX_WINDOWED_RESTARTS;
const withinConsecutiveCap = this.consecutiveFailures <= MAX_CONSECUTIVE_FAILURES;
return withinWindowedCap && withinConsecutiveCap;
}
/**
* Call when a message is successfully processed to update the success timestamp.
* Resets the consecutive-failure counter (real progress was made).
*/
recordSuccess(): void {
this.lastSuccessfulProcessing = Date.now();
this.consecutiveFailures = 0;
}
/**
@@ -67,4 +85,18 @@ export class RestartGuard {
get maxRestarts(): number {
return MAX_WINDOWED_RESTARTS;
}
/**
* Get consecutive failures since last successful processing (for logging).
*/
get consecutiveFailuresSinceSuccess(): number {
return this.consecutiveFailures;
}
/**
* Get the max allowed consecutive failures (for logging).
*/
get maxConsecutiveFailures(): number {
return MAX_CONSECUTIVE_FAILURES;
}
}
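The two independent trips described in the header comment can be condensed into a minimal sketch (constants inlined, class and method names hypothetical): a restart is allowed only while BOTH caps hold, and only real progress resets the consecutive counter.

```typescript
// Simplified sketch of the two-trip restart guard.
class TwoTripGuard {
  private restartTimestamps: number[] = [];
  private consecutiveFailures = 0;

  // Returns true if the restart is allowed, false if the guard trips.
  recordRestart(now: number): boolean {
    // Trip 1: sliding window. Drop timestamps older than 60s, then cap at 10.
    this.restartTimestamps = this.restartTimestamps.filter(t => now - t < 60_000);
    this.restartTimestamps.push(now);
    // Trip 2: consecutive failures with no successful processing in between.
    this.consecutiveFailures += 1;
    return this.restartTimestamps.length <= 10 && this.consecutiveFailures <= 5;
  }

  recordSuccess(): void {
    // Real progress was made; only the consecutive counter resets.
    this.consecutiveFailures = 0;
  }
}
```

A dead session restarting on an 8s backoff cadence never exceeds the windowed cap (at most ~8 restarts per 60s window), but trips the consecutive cap on its sixth failed restart.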
+19 -11
@@ -21,7 +21,12 @@ import { buildIsolatedEnv, getAuthMethodDescription } from '../../shared/EnvMana
import type { ActiveSession, SDKUserMessage } from '../worker-types.js';
import { ModeManager } from '../domain/ModeManager.js';
import { processAgentResponse, type WorkerRef } from './agents/index.js';
import { createPidCapturingSpawn, getProcessBySession, ensureProcessExit, waitForSlot } from './ProcessRegistry.js';
import {
createSdkSpawnFactory,
getSdkProcessForSession,
ensureSdkProcessExit,
waitForSlot,
} from '../../supervisor/process-registry.js';
import { sanitizeEnv } from '../../supervisor/env-sanitizer.js';
// Import Agent SDK (assumes it's installed)
@@ -90,11 +95,11 @@ export class SDKAgent {
}
// Wait for agent pool slot (configurable via CLAUDE_MEM_MAX_CONCURRENT_AGENTS)
// Pass idle session eviction callback to prevent pool deadlock (#1868):
// idle sessions hold slots during 3-min idle wait, blocking new sessions
// Backpressure only — a full pool waits, never evicts a live session
// (Principle 1: do not kick live work to make room).
const settings = SettingsDefaultsManager.loadFromFile(USER_SETTINGS_PATH);
const maxConcurrent = parseInt(settings.CLAUDE_MEM_MAX_CONCURRENT_AGENTS, 10) || 2;
await waitForSlot(maxConcurrent, 60_000, () => this.sessionManager.evictIdlestSession());
await waitForSlot(maxConcurrent, 60_000);
// Build isolated environment from ~/.claude-mem/.env
// This prevents Issue #733: random ANTHROPIC_API_KEY from project .env files
@@ -105,7 +110,7 @@ export class SDKAgent {
logger.info('SDK', 'Starting SDK query', {
sessionDbId: session.sessionDbId,
contentSessionId: session.contentSessionId,
memorySessionId: session.memorySessionId,
memorySessionId: session.memorySessionId ?? undefined,
hasRealMemorySessionId,
shouldResume,
resume_parameter: shouldResume ? session.memorySessionId : '(none - fresh start)',
@@ -139,12 +144,13 @@ export class SDKAgent {
// instead of polluting user's actual project resume lists
cwd: OBSERVER_SESSIONS_DIR,
// Only resume if shouldResume is true (memorySessionId exists, not first prompt, not forceInit)
...(shouldResume && { resume: session.memorySessionId }),
...(shouldResume && session.memorySessionId ? { resume: session.memorySessionId } : {}),
disallowedTools,
abortController: session.abortController,
pathToClaudeCodeExecutable: claudePath,
// Custom spawn function captures PIDs to fix zombie process accumulation
spawnClaudeCodeProcess: createPidCapturingSpawn(session.sessionDbId),
// Custom spawn factory: spawns the SDK child in its own POSIX process
// group so the worker can tear down the whole subtree on shutdown.
spawnClaudeCodeProcess: createSdkSpawnFactory(session.sessionDbId),
env: isolatedEnv // Use isolated credentials from ~/.claude-mem/.env, not process.env
}
});
@@ -283,10 +289,12 @@ export class SDKAgent {
}
}
} finally {
// Ensure subprocess is terminated after query completes (or on error)
const tracked = getProcessBySession(session.sessionDbId);
// Ensure subprocess is terminated after query completes (or on error).
// Process-group teardown via ensureSdkProcessExit kills any descendants
// the SDK spawned, so no orphan reaper is needed (Principle 5).
const tracked = getSdkProcessForSession(session.sessionDbId);
if (tracked && tracked.process.exitCode === null) {
await ensureProcessExit(tracked, 5000);
await ensureSdkProcessExit(tracked, 5000);
}
}
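The process-group teardown referenced in the comment above can be sketched as follows. This is an illustrative sketch, not the project's actual supervisor code, and it assumes a POSIX platform; the function names are hypothetical.

```typescript
import { spawn, ChildProcess } from 'child_process';

// `detached: true` makes the child the leader of a new POSIX process
// group. Signalling the NEGATIVE pid delivers the signal to every
// process in that group, including grandchildren the child spawned.
function spawnInOwnGroup(command: string, args: string[]): ChildProcess {
  return spawn(command, args, { detached: true, stdio: 'ignore' });
}

function killProcessGroup(leaderPid: number, signal: NodeJS.Signals = 'SIGKILL'): void {
  process.kill(-leaderPid, signal); // negative PID addresses the whole group
}
```

Because the group is the unit of teardown, descendants cannot outlive the leader unnoticed, which is what makes a periodic orphan reaper unnecessary.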
+99 -331
@@ -31,6 +31,8 @@ import {
SEARCH_CONSTANTS
} from './search/index.js';
import type { TimelineData } from './search/index.js';
import { ResultFormatter } from './search/ResultFormatter.js';
import { ChromaUnavailableError } from './search/errors.js';
export class SearchManager {
private orchestrator: SearchOrchestrator;
@@ -52,6 +54,22 @@ export class SearchManager {
this.timelineBuilder = new TimelineBuilder();
}
/**
* Accessor for the underlying orchestrator. Used by HTTP routes that need
* raw StrategySearchResult instead of formatted MCP text output.
*/
getOrchestrator(): SearchOrchestrator {
return this.orchestrator;
}
/**
* Accessor for the formatter. Used by HTTP routes that construct
* text output from raw orchestrator results.
*/
getFormatter(): FormattingService {
return this.formatter;
}
/**
* Query Chroma vector database via ChromaSync
* @deprecated Use orchestrator.search() instead
@@ -166,6 +184,7 @@ export class SearchManager {
let sessions: SessionSummarySearchResult[] = [];
let prompts: UserPromptSearchResult[] = [];
let chromaFailed = false;
let chromaFailureReason: { message: string; isConnectionError: boolean } | null = null;
// Determine which types to query based on type filter
const searchObservations = !type || type === 'observations';
@@ -202,12 +221,6 @@ export class SearchManager {
whereFilter = { doc_type: 'user_prompt' };
}
// Include project in the Chroma where clause to scope vector search.
// Without this, larger projects dominate the top-N results and smaller
// projects get crowded out before the post-hoc SQLite filter.
// Match both native-provenance rows (project) and adopted merged-worktree
// rows (merged_into_project) so a parent-project query surfaces its
// merged children's observations too.
if (options.project) {
const projectFilter = {
$or: [
@@ -220,82 +233,96 @@ export class SearchManager {
: projectFilter;
}
// Step 1: Chroma semantic search with optional type + project filter
const chromaResults = await this.queryChroma(query, 100, whereFilter);
chromaSucceeded = true; // Chroma didn't throw error
logger.debug('SEARCH', 'ChromaDB returned semantic matches', { matchCount: chromaResults.ids.length });
try {
// Step 1: Chroma semantic search with optional type + project filter
const chromaResults = await this.queryChroma(query, 100, whereFilter);
chromaSucceeded = true; // Chroma didn't throw error
logger.debug('SEARCH', 'ChromaDB returned semantic matches', { matchCount: chromaResults.ids.length });
if (chromaResults.ids.length > 0) {
// Step 2: Filter by date range
// Use user-provided dateRange if available, otherwise fall back to 90-day recency window
const { dateRange } = options;
let startEpoch: number | undefined;
let endEpoch: number | undefined;
if (chromaResults.ids.length > 0) {
// Step 2: Filter by date range
const { dateRange } = options;
let startEpoch: number | undefined;
let endEpoch: number | undefined;
if (dateRange) {
if (dateRange.start) {
startEpoch = typeof dateRange.start === 'number'
? dateRange.start
: new Date(dateRange.start).getTime();
if (dateRange) {
if (dateRange.start) {
startEpoch = typeof dateRange.start === 'number'
? dateRange.start
: new Date(dateRange.start).getTime();
}
if (dateRange.end) {
endEpoch = typeof dateRange.end === 'number'
? dateRange.end
: new Date(dateRange.end).getTime();
}
} else {
// Default: 90-day recency window
startEpoch = Date.now() - SEARCH_CONSTANTS.RECENCY_WINDOW_MS;
}
if (dateRange.end) {
endEpoch = typeof dateRange.end === 'number'
? dateRange.end
: new Date(dateRange.end).getTime();
const recentMetadata = chromaResults.metadatas.map((meta, idx) => ({
id: chromaResults.ids[idx],
meta,
isRecent: meta && meta.created_at_epoch != null
&& (!startEpoch || meta.created_at_epoch >= startEpoch)
&& (!endEpoch || meta.created_at_epoch <= endEpoch)
})).filter(item => item.isRecent);
logger.debug('SEARCH', dateRange ? 'Results within user date range' : 'Results within 90-day window', { count: recentMetadata.length });
// Step 3: Categorize IDs by document type
const obsIds: number[] = [];
const sessionIds: number[] = [];
const promptIds: number[] = [];
for (const item of recentMetadata) {
const docType = item.meta?.doc_type;
if (docType === 'observation' && searchObservations) {
obsIds.push(item.id);
} else if (docType === 'session_summary' && searchSessions) {
sessionIds.push(item.id);
} else if (docType === 'user_prompt' && searchPrompts) {
promptIds.push(item.id);
}
}
// Step 4: Hydrate from SQLite with additional filters
if (obsIds.length > 0) {
const obsOptions = { ...options, type: obs_type, concepts, files };
observations = this.sessionStore.getObservationsByIds(obsIds, obsOptions);
}
if (sessionIds.length > 0) {
sessions = this.sessionStore.getSessionSummariesByIds(sessionIds, { orderBy: 'date_desc', limit: options.limit, project: options.project });
}
if (promptIds.length > 0) {
prompts = this.sessionStore.getUserPromptsByIds(promptIds, { orderBy: 'date_desc', limit: options.limit, project: options.project });
}
} else {
// Default: 90-day recency window
startEpoch = Date.now() - SEARCH_CONSTANTS.RECENCY_WINDOW_MS;
logger.debug('SEARCH', 'ChromaDB found no matches (final result, no FTS5 fallback)', {});
}
} catch (chromaError) {
const errorObject = chromaError instanceof Error ? chromaError : new Error(String(chromaError));
chromaFailureReason = {
message: errorObject.message,
isConnectionError: chromaError instanceof ChromaUnavailableError,
};
logger.warn('SEARCH', 'ChromaDB semantic search failed, falling back to FTS5 keyword search', {}, errorObject);
chromaFailed = true;
const recentMetadata = chromaResults.metadatas.map((meta, idx) => ({
id: chromaResults.ids[idx],
meta,
isRecent: meta && meta.created_at_epoch != null
&& (!startEpoch || meta.created_at_epoch >= startEpoch)
&& (!endEpoch || meta.created_at_epoch <= endEpoch)
})).filter(item => item.isRecent);
logger.debug('SEARCH', dateRange ? 'Results within user date range' : 'Results within 90-day window', { count: recentMetadata.length });
// Step 3: Categorize IDs by document type
const obsIds: number[] = [];
const sessionIds: number[] = [];
const promptIds: number[] = [];
for (const item of recentMetadata) {
const docType = item.meta?.doc_type;
if (docType === 'observation' && searchObservations) {
obsIds.push(item.id);
} else if (docType === 'session_summary' && searchSessions) {
sessionIds.push(item.id);
} else if (docType === 'user_prompt' && searchPrompts) {
promptIds.push(item.id);
}
// Fallback to FTS5 path since Chroma failed
if (searchObservations) {
observations = this.sessionSearch.searchObservations(query, { ...options, type: obs_type, concepts, files });
}
logger.debug('SEARCH', 'Categorized results by type', { observations: obsIds.length, sessions: sessionIds.length, prompts: promptIds.length });
// Step 4: Hydrate from SQLite with additional filters
if (obsIds.length > 0) {
// Apply obs_type, concepts, files filters if provided
const obsOptions = { ...options, type: obs_type, concepts, files };
observations = this.sessionStore.getObservationsByIds(obsIds, obsOptions);
if (searchSessions) {
sessions = this.sessionSearch.searchSessions(query, options);
}
if (sessionIds.length > 0) {
sessions = this.sessionStore.getSessionSummariesByIds(sessionIds, { orderBy: 'date_desc', limit: options.limit, project: options.project });
if (searchPrompts) {
prompts = this.sessionSearch.searchUserPrompts(query, options);
}
if (promptIds.length > 0) {
prompts = this.sessionStore.getUserPromptsByIds(promptIds, { orderBy: 'date_desc', limit: options.limit, project: options.project });
}
logger.debug('SEARCH', 'Hydrated results from SQLite', { observations: observations.length, sessions: sessions.length, prompts: prompts.length });
} else {
// Chroma returned 0 results - this is the correct answer, don't fall back to FTS5
logger.debug('SEARCH', 'ChromaDB found no matches (final result, no FTS5 fallback)', {});
}
}
// ChromaDB not initialized - fall back to FTS5 keyword search (#1913, #2048)
// PATH 3: FTS5 KEYWORD SEARCH (Chroma not initialized)
else if (query) {
logger.debug('SEARCH', 'ChromaDB not initialized — falling back to FTS5 keyword search', {});
try {
@@ -329,11 +356,11 @@ export class SearchManager {
}
if (totalResults === 0) {
if (chromaFailed) {
if (chromaFailureReason !== null) {
return {
content: [{
type: 'text' as const,
text: `Vector search failed - semantic search unavailable.\n\nTo enable semantic search:\n1. Install uv: https://docs.astral.sh/uv/getting-started/installation/\n2. Restart the worker: npm run worker:restart\n\nNote: You can still use filter-only searches (date ranges, types, files) without a query term.`
text: ResultFormatter.formatChromaFailureMessage(chromaFailureReason)
}]
};
}
@@ -1203,265 +1230,6 @@ export class SearchManager {
}
/**
* Tool handler: find_by_concept
*/
async findByConcept(args: any): Promise<any> {
const normalized = this.normalizeParams(args);
const { concepts: concept, ...filters } = normalized;
let results: ObservationSearchResult[] = [];
// Metadata-first, semantic-enhanced search
if (this.chromaSync) {
logger.debug('SEARCH', 'Using metadata-first + semantic ranking for concept search', {});
// Step 1: SQLite metadata filter (get all IDs with this concept)
const metadataResults = this.sessionSearch.findByConcept(concept, filters);
logger.debug('SEARCH', 'Found observations with concept', { concept, count: metadataResults.length });
if (metadataResults.length > 0) {
// Step 2: Chroma semantic ranking (rank by relevance to concept)
const ids = metadataResults.map(obs => obs.id);
const chromaResults = await this.queryChroma(concept, Math.min(ids.length, 100));
// Intersect: Keep only IDs that passed metadata filter, in semantic rank order
const rankedIds: number[] = [];
for (const chromaId of chromaResults.ids) {
if (ids.includes(chromaId) && !rankedIds.includes(chromaId)) {
rankedIds.push(chromaId);
}
}
logger.debug('SEARCH', 'Chroma ranked results by semantic relevance', { count: rankedIds.length });
// Step 3: Hydrate in semantic rank order
if (rankedIds.length > 0) {
results = this.sessionStore.getObservationsByIds(rankedIds, { limit: filters.limit || 20 });
// Restore semantic ranking order
results.sort((a, b) => rankedIds.indexOf(a.id) - rankedIds.indexOf(b.id));
}
}
}
// Fall back to SQLite-only if Chroma unavailable or failed
if (results.length === 0) {
logger.debug('SEARCH', 'Using SQLite-only concept search', {});
results = this.sessionSearch.findByConcept(concept, filters);
}
if (results.length === 0) {
return {
content: [{
type: 'text' as const,
text: `No observations found with concept "${concept}"`
}]
};
}
// Format as table
const header = `Found ${results.length} observation(s) with concept "${concept}"\n\n${this.formatter.formatTableHeader()}`;
const formattedResults = results.map((obs, i) => this.formatter.formatObservationIndex(obs, i));
return {
content: [{
type: 'text' as const,
text: header + '\n' + formattedResults.join('\n')
}]
};
}
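The intersect-and-rank step that findByConcept, findByFile, and findByType each repeat can be sketched as one standalone helper. This is an illustrative sketch, not code from the PR: the name `rankBySemanticOrder` is ours, and a `Set` lookup stands in for the `ids.includes` scan in the handlers.

```typescript
// Keep only IDs that passed the SQLite metadata filter, preserving Chroma's
// semantic rank order and dropping duplicates. The Set gives O(1) membership
// checks where the inlined loops above call ids.includes per candidate.
function rankBySemanticOrder(metadataIds: number[], chromaIds: number[]): number[] {
  const allowed = new Set(metadataIds);
  const rankedIds: number[] = [];
  for (const chromaId of chromaIds) {
    if (allowed.has(chromaId) && !rankedIds.includes(chromaId)) {
      rankedIds.push(chromaId);
    }
  }
  return rankedIds;
}
```

The handlers then hydrate rows for `rankedIds` and re-sort by `rankedIds.indexOf(...)` to restore the semantic order, as shown above.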
/**
* Tool handler: find_by_file
*/
async findByFile(args: any): Promise<any> {
const normalized = this.normalizeParams(args);
const { files: rawFilePath, ...filters } = normalized;
// Handle both string and array (normalizeParams may split on comma)
const filePath = Array.isArray(rawFilePath) ? rawFilePath[0] : rawFilePath;
let observations: ObservationSearchResult[] = [];
let sessions: SessionSummarySearchResult[] = [];
// Metadata-first, semantic-enhanced search for observations
if (this.chromaSync) {
logger.debug('SEARCH', 'Using metadata-first + semantic ranking for file search', {});
// Step 1: SQLite metadata filter (get all results with this file)
const metadataResults = this.sessionSearch.findByFile(filePath, filters);
logger.debug('SEARCH', 'Found results for file', { file: filePath, observations: metadataResults.observations.length, sessions: metadataResults.sessions.length });
// Sessions: Keep as-is (already summarized, no semantic ranking needed)
sessions = metadataResults.sessions;
// Observations: Apply semantic ranking
if (metadataResults.observations.length > 0) {
// Step 2: Chroma semantic ranking (rank by relevance to file path)
const ids = metadataResults.observations.map(obs => obs.id);
const chromaResults = await this.queryChroma(filePath, Math.min(ids.length, 100));
// Intersect: Keep only IDs that passed metadata filter, in semantic rank order
const rankedIds: number[] = [];
for (const chromaId of chromaResults.ids) {
if (ids.includes(chromaId) && !rankedIds.includes(chromaId)) {
rankedIds.push(chromaId);
}
}
logger.debug('SEARCH', 'Chroma ranked observations by semantic relevance', { count: rankedIds.length });
// Step 3: Hydrate in semantic rank order
if (rankedIds.length > 0) {
observations = this.sessionStore.getObservationsByIds(rankedIds, { limit: filters.limit || 20 });
// Restore semantic ranking order
observations.sort((a, b) => rankedIds.indexOf(a.id) - rankedIds.indexOf(b.id));
}
}
}
// Fall back to SQLite-only if Chroma unavailable or failed
if (observations.length === 0 && sessions.length === 0) {
logger.debug('SEARCH', 'Using SQLite-only file search', {});
const results = this.sessionSearch.findByFile(filePath, filters);
observations = results.observations;
sessions = results.sessions;
}
const totalResults = observations.length + sessions.length;
if (totalResults === 0) {
return {
content: [{
type: 'text' as const,
text: `No results found for file "${filePath}"`
}]
};
}
// Combine observations and sessions with timestamps for date grouping
const combined: Array<{
type: 'observation' | 'session';
data: ObservationSearchResult | SessionSummarySearchResult;
epoch: number;
created_at: string;
}> = [
...observations.map(obs => ({
type: 'observation' as const,
data: obs,
epoch: obs.created_at_epoch,
created_at: obs.created_at
})),
...sessions.map(sess => ({
type: 'session' as const,
data: sess,
epoch: sess.created_at_epoch,
created_at: sess.created_at
}))
];
// Sort by date (most recent first)
combined.sort((a, b) => b.epoch - a.epoch);
// Group by date for proper timeline rendering
const resultsByDate = groupByDate(combined, item => item.created_at);
// Format with date headers for proper date parsing by folder CLAUDE.md generator
const lines: string[] = [];
lines.push(`Found ${totalResults} result(s) for file "${filePath}"`);
lines.push('');
for (const [day, dayResults] of resultsByDate) {
lines.push(`### ${day}`);
lines.push('');
lines.push(this.formatter.formatTableHeader());
for (const result of dayResults) {
if (result.type === 'observation') {
lines.push(this.formatter.formatObservationIndex(result.data as ObservationSearchResult, 0));
} else {
lines.push(this.formatter.formatSessionIndex(result.data as SessionSummarySearchResult, 0));
}
}
lines.push('');
}
return {
content: [{
type: 'text' as const,
text: lines.join('\n')
}]
};
}
/**
* Tool handler: find_by_type
*/
async findByType(args: any): Promise<any> {
const normalized = this.normalizeParams(args);
const { type, ...filters } = normalized;
const typeStr = Array.isArray(type) ? type.join(', ') : type;
let results: ObservationSearchResult[] = [];
// Metadata-first, semantic-enhanced search
if (this.chromaSync) {
logger.debug('SEARCH', 'Using metadata-first + semantic ranking for type search', {});
// Step 1: SQLite metadata filter (get all IDs with this type)
const metadataResults = this.sessionSearch.findByType(type, filters);
logger.debug('SEARCH', 'Found observations with type', { type: typeStr, count: metadataResults.length });
if (metadataResults.length > 0) {
// Step 2: Chroma semantic ranking (rank by relevance to type)
const ids = metadataResults.map(obs => obs.id);
const chromaResults = await this.queryChroma(typeStr, Math.min(ids.length, 100));
// Intersect: Keep only IDs that passed metadata filter, in semantic rank order
const rankedIds: number[] = [];
for (const chromaId of chromaResults.ids) {
if (ids.includes(chromaId) && !rankedIds.includes(chromaId)) {
rankedIds.push(chromaId);
}
}
logger.debug('SEARCH', 'Chroma ranked results by semantic relevance', { count: rankedIds.length });
// Step 3: Hydrate in semantic rank order
if (rankedIds.length > 0) {
results = this.sessionStore.getObservationsByIds(rankedIds, { limit: filters.limit || 20 });
// Restore semantic ranking order
results.sort((a, b) => rankedIds.indexOf(a.id) - rankedIds.indexOf(b.id));
}
}
}
// Fall back to SQLite-only if Chroma unavailable or failed
if (results.length === 0) {
logger.debug('SEARCH', 'Using SQLite-only type search', {});
results = this.sessionSearch.findByType(type, filters);
}
if (results.length === 0) {
return {
content: [{
type: 'text' as const,
text: `No observations found with type "${typeStr}"`
}]
};
}
// Format as table
const header = `Found ${results.length} observation(s) with type "${typeStr}"\n\n${this.formatter.formatTableHeader()}`;
const formattedResults = results.map((obs, i) => this.formatter.formatObservationIndex(obs, i));
return {
content: [{
type: 'text' as const,
text: header + '\n' + formattedResults.join('\n')
}]
};
}
/**
* Tool handler: get_recent_context
*/
+40 -192
@@ -14,75 +14,10 @@ import { logger } from '../../utils/logger.js';
import type { ActiveSession, PendingMessage, PendingMessageWithId, ObservationData } from '../worker-types.js';
import { PendingMessageStore } from '../sqlite/PendingMessageStore.js';
import { SessionQueueProcessor } from '../queue/SessionQueueProcessor.js';
-import { getProcessBySession, ensureProcessExit } from './ProcessRegistry.js';
+import { getSdkProcessForSession, ensureSdkProcessExit } from '../../supervisor/process-registry.js';
import { getSupervisor } from '../../supervisor/index.js';
-import { MAX_CONSECUTIVE_SUMMARY_FAILURES } from '../../sdk/prompts.js';
import { RestartGuard } from './RestartGuard.js';
/** Idle threshold before a stuck generator (zombie subprocess) is force-killed. */
export const MAX_GENERATOR_IDLE_MS = 5 * 60 * 1000; // 5 minutes
/** Idle threshold before a no-generator session with no pending work is reaped. */
export const MAX_SESSION_IDLE_MS = 15 * 60 * 1000; // 15 minutes
/**
* Minimal process interface used by detectStaleGenerator, compatible with
* both the real Bun.Subprocess / ChildProcess shapes and test mocks.
*/
export interface StaleGeneratorProcess {
exitCode: number | null;
kill(signal?: string): boolean | void;
}
/**
* Minimal session fields required to evaluate stale-generator status.
* This is a subset of ActiveSession, allowing unit tests to pass plain objects.
*/
export interface StaleGeneratorCandidate {
generatorPromise: Promise<void> | null;
lastGeneratorActivity: number;
abortController: AbortController;
}
/**
* Detect whether a session's generator is stuck (zombie subprocess) and, if so,
* SIGKILL the subprocess and abort the controller.
*
* Extracted from reapStaleSessions() so tests can import and exercise the exact
* same logic rather than duplicating it locally. (Issue #1652)
*
* @param session - session to inspect
* @param proc - tracked subprocess (may be undefined if not in ProcessRegistry)
* @param now - current timestamp (defaults to Date.now(); pass explicit value in tests)
* @returns true if the session was marked stale, false otherwise
*/
export function detectStaleGenerator(
session: StaleGeneratorCandidate,
proc: StaleGeneratorProcess | undefined,
now = Date.now()
): boolean {
if (!session.generatorPromise) return false;
const generatorIdleMs = now - session.lastGeneratorActivity;
if (generatorIdleMs <= MAX_GENERATOR_IDLE_MS) return false;
// Kill subprocess to unblock stuck for-await
if (proc && proc.exitCode === null) {
try {
proc.kill('SIGKILL');
} catch (error) {
if (error instanceof Error) {
logger.warn('SESSION', 'Failed to SIGKILL stale generator subprocess', {}, error);
} else {
logger.warn('SESSION', 'Failed to SIGKILL stale generator subprocess with non-Error', {}, new Error(String(error)));
}
}
}
// Signal the SDK agent loop to exit
session.abortController.abort();
return true;
}
export class SessionManager {
private dbManager: DatabaseManager;
private sessions: Map<number, ActiveSession> = new Map();
@@ -229,7 +164,6 @@ export class SessionManager {
restartGuard: new RestartGuard(),
processingMessageIds: [], // CLAIM-CONFIRM: Track message IDs for confirmProcessed()
lastGeneratorActivity: Date.now(), // Initialize for stale detection (Issue #1099)
-consecutiveSummaryFailures: 0, // Circuit breaker for summary retry loop (#1633)
pendingAgentId: null, // Subagent identity carried from the most recent claimed message
pendingAgentType: null // (null for main-session messages)
};
@@ -289,16 +223,28 @@ export class SessionManager {
prompt_number: data.prompt_number,
cwd: data.cwd,
agentId: data.agentId,
-agentType: data.agentType
+agentType: data.agentType,
+toolUseId: data.toolUseId,
};
try {
const messageId = this.getPendingStore().enqueue(sessionDbId, session.contentSessionId, message);
const queueDepth = this.getPendingStore().getPendingCount(sessionDbId);
const toolSummary = logger.formatTool(data.tool_name, data.tool_input);
-logger.info('QUEUE', `ENQUEUED | sessionDbId=${sessionDbId} | messageId=${messageId} | type=observation | tool=${toolSummary} | depth=${queueDepth}`, {
-sessionId: sessionDbId
-});
+// enqueue returns 0 on INSERT OR IGNORE conflict (UNIQUE(session_id, tool_use_id)
+// — Plan 01 Phase 1). The duplicate is correctly suppressed by the DB; surface
+// it visibly so it isn't misread as "messageId=0 was inserted." Per
+// Principle 3 (UNIQUE constraint over dedup window) this is the success path
+// for replayed transcript lines, not an error.
+if (messageId === 0) {
+logger.debug('QUEUE', `DUP_SUPPRESSED | sessionDbId=${sessionDbId} | type=observation | tool=${toolSummary} | toolUseId=${data.toolUseId ?? 'null'} | depth=${queueDepth}`, {
+sessionId: sessionDbId
+});
+} else {
+logger.info('QUEUE', `ENQUEUED | sessionDbId=${sessionDbId} | messageId=${messageId} | type=observation | tool=${toolSummary} | depth=${queueDepth}`, {
+sessionId: sessionDbId
+});
+}
} catch (error) {
if (error instanceof Error) {
logger.error('SESSION', 'Failed to persist observation to DB', {
@@ -333,17 +279,10 @@ export class SessionManager {
session = this.initializeSession(sessionDbId);
}
-// Circuit breaker: skip summarize if too many consecutive failures (#1633).
-// This prevents the infinite loop where each failed summary spawns a new session
-// with an ever-growing prompt. Counter is in-memory per ActiveSession — it resets
-// on worker restart, which is acceptable because session state is already ephemeral.
-if (session.consecutiveSummaryFailures >= MAX_CONSECUTIVE_SUMMARY_FAILURES) {
-logger.warn('SESSION', `Circuit breaker OPEN: skipping summarize after ${session.consecutiveSummaryFailures} consecutive failures (#1633)`, {
-sessionId: sessionDbId,
-contentSessionId: session.contentSessionId
-});
-return;
-}
+// PATHFINDER plan 03 phase 3: summary-failure circuit breaker deleted.
+// Each failed parse is independently marked failed via the retry ladder
+// in PendingMessageStore.markFailed; a storm of bad parses surfaces as
+// retry exhaustion, not as silent suppression of further requests.
// CRITICAL: Persist to database FIRST
const message: PendingMessage = {
@@ -354,9 +293,16 @@ export class SessionManager {
try {
const messageId = this.getPendingStore().enqueue(sessionDbId, session.contentSessionId, message);
const queueDepth = this.getPendingStore().getPendingCount(sessionDbId);
-logger.info('QUEUE', `ENQUEUED | sessionDbId=${sessionDbId} | messageId=${messageId} | type=summarize | depth=${queueDepth}`, {
-sessionId: sessionDbId
-});
+// See queueObservation note: messageId=0 means UNIQUE-suppressed duplicate.
+if (messageId === 0) {
+logger.debug('QUEUE', `DUP_SUPPRESSED | sessionDbId=${sessionDbId} | type=summarize | depth=${queueDepth}`, {
+sessionId: sessionDbId
+});
+} else {
+logger.info('QUEUE', `ENQUEUED | sessionDbId=${sessionDbId} | messageId=${messageId} | type=summarize | depth=${queueDepth}`, {
+sessionId: sessionDbId
+});
+}
} catch (error) {
if (error instanceof Error) {
logger.error('SESSION', 'Failed to persist summarize to DB', {
@@ -402,19 +348,21 @@ export class SessionManager {
});
}
-// 3. Verify subprocess exit with 5s timeout (Issue #737 fix)
-const tracked = getProcessBySession(sessionDbId);
+// 3. Verify subprocess exit with 5s timeout. Process-group teardown is
+// used internally so any SDK descendants are killed too (Principle 5).
+const tracked = getSdkProcessForSession(sessionDbId);
if (tracked && tracked.process.exitCode === null) {
-logger.debug('SESSION', `Waiting for subprocess PID ${tracked.pid} to exit`, {
+logger.debug('SESSION', `Waiting for subprocess PID ${tracked.pid} (pgid ${tracked.pgid}) to exit`, {
sessionId: sessionDbId,
-pid: tracked.pid
+pid: tracked.pid,
+pgid: tracked.pgid
});
-await ensureProcessExit(tracked, 5000);
+await ensureSdkProcessExit(tracked, 5000);
}
// 3b. Reap all supervisor-tracked processes for this session (#1351)
-// This catches MCP servers and other child processes not tracked by the
-// in-memory ProcessRegistry (e.g. processes registered only in supervisor.json).
+// Catches MCP servers and other child processes registered only in
+// supervisor.json that the in-process tracking would not see.
try {
await getSupervisor().getRegistry().reapSession(sessionDbId);
} catch (error) {
@@ -467,106 +415,6 @@ export class SessionManager {
}
}
/**
* Evict the idlest session to free a pool slot (#1868).
* An "idle" session has an active generator but no pending work; it's sitting
* in the 3-min idle wait before subprocess cleanup. Evicting it triggers an abort,
* which kills the subprocess and frees the pool slot for a waiting new session.
* @returns true if a session was evicted, false if no idle sessions found
*/
evictIdlestSession(): boolean {
let idlestSessionId: number | null = null;
let oldestActivity = Infinity;
for (const [sessionDbId, session] of this.sessions) {
if (!session.generatorPromise) continue; // No generator = no slot held
const pendingCount = this.getPendingStore().getPendingCount(sessionDbId);
if (pendingCount > 0) continue; // Has work to do, don't evict
// Pick the session with the oldest lastGeneratorActivity (idlest)
if (session.lastGeneratorActivity < oldestActivity) {
oldestActivity = session.lastGeneratorActivity;
idlestSessionId = sessionDbId;
}
}
if (idlestSessionId === null) return false;
const session = this.sessions.get(idlestSessionId);
if (!session) return false;
logger.info('SESSION', 'Evicting idle session to free pool slot for new request (#1868)', {
sessionDbId: idlestSessionId,
idleDurationMs: Date.now() - oldestActivity
});
session.idleTimedOut = true;
session.abortController.abort();
return true;
}
/**
* Reap sessions with no active generator and no pending work that have been idle too long.
* Also reaps sessions whose generator has been stuck (no lastGeneratorActivity update) for
* longer than MAX_GENERATOR_IDLE_MS; these are zombie subprocesses that will never exit
* on their own because the orphan reaper skips sessions in the active sessions map. (Issue #1652)
*
* This unblocks the orphan reaper which skips processes for "active" sessions. (Issue #1168)
*/
async reapStaleSessions(): Promise<number> {
const now = Date.now();
const staleSessionIds: number[] = [];
for (const [sessionDbId, session] of this.sessions) {
// Sessions with active generators — check for stuck/zombie generators (Issue #1652)
if (session.generatorPromise) {
const generatorIdleMs = now - session.lastGeneratorActivity;
if (generatorIdleMs > MAX_GENERATOR_IDLE_MS) {
logger.warn('SESSION', `Stale generator detected for session ${sessionDbId} (no activity for ${Math.round(generatorIdleMs / 60000)}m) — force-killing subprocess`, {
sessionDbId,
generatorIdleMs
});
// Force-kill the subprocess to unblock the stuck for-await in SDKAgent.
// Without this the generator is blocked on `for await (const msg of queryResult)`
// and will never exit even after abort() is called.
const trackedProcess = getProcessBySession(sessionDbId);
if (trackedProcess && trackedProcess.process.exitCode === null) {
try {
trackedProcess.process.kill('SIGKILL');
} catch (err) {
if (err instanceof Error) {
logger.warn('SESSION', 'Failed to SIGKILL subprocess for stale generator', { sessionDbId }, err);
} else {
logger.warn('SESSION', 'Failed to SIGKILL subprocess for stale generator with non-Error', { sessionDbId }, new Error(String(err)));
}
}
}
// Signal the SDK agent loop to exit after the subprocess dies
session.abortController.abort();
staleSessionIds.push(sessionDbId);
}
continue;
}
// Skip sessions with pending work
const pendingCount = this.getPendingStore().getPendingCount(sessionDbId);
if (pendingCount > 0) continue;
// No generator + no pending work + old enough = stale
const sessionAge = now - session.startTime;
if (sessionAge > MAX_SESSION_IDLE_MS) {
logger.warn('SESSION', `Reaping idle session ${sessionDbId} (no activity for >${Math.round(MAX_SESSION_IDLE_MS / 60000)}m)`, { sessionDbId });
staleSessionIds.push(sessionDbId);
}
}
for (const sessionDbId of staleSessionIds) {
await this.deleteSession(sessionDbId);
}
return staleSessionIds.length;
}
/**
* Shutdown all active sessions
*/
+3 -1
@@ -37,7 +37,9 @@ export class SettingsManager {
for (const row of rows) {
const key = row.key as keyof ViewerSettings;
if (key in settings) {
-settings[key] = JSON.parse(row.value) as ViewerSettings[typeof key];
+// Object.assign narrows correctly across the discriminated union
+// where `settings[key] = value` would collapse to `never`.
+Object.assign(settings, { [key]: JSON.parse(row.value) });
}
}
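The narrowing problem that comment describes is reproducible in isolation. A minimal sketch, assuming a stand-in `ViewerSettings` with the same union-typed-key shape (the real interface has different fields):

```typescript
// With a union-typed key, TypeScript requires a write to settings[key] to be
// valid for every possible key, so the accepted value type is the intersection
// string & number, i.e. `never`, and a direct indexed assignment is rejected.
// Object.assign sidesteps the write-site check with the same runtime effect.
interface ViewerSettings { theme: string; pageSize: number; }

function applySetting(settings: ViewerSettings, key: keyof ViewerSettings, rawJson: string): void {
  // settings[key] = JSON.parse(rawJson) as ViewerSettings[typeof key];
  //   ^ error: Type 'string | number' is not assignable to type 'never'
  Object.assign(settings, { [key]: JSON.parse(rawJson) });
}

const settings: ViewerSettings = { theme: 'light', pageSize: 20 };
applySetting(settings, 'theme', '"dark"');
```

The trade-off is that `Object.assign` also skips the per-key value check, which is acceptable here because the loop already guards with `key in settings`.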
+38 -57
@@ -12,8 +12,8 @@
*/
import { logger } from '../../../utils/logger.js';
-import { parseObservations, parseSummary, type ParsedObservation, type ParsedSummary } from '../../../sdk/parser.js';
-import { SUMMARY_MODE_MARKER, MAX_CONSECUTIVE_SUMMARY_FAILURES } from '../../../sdk/prompts.js';
+import { parseAgentXml, type ParsedObservation, type ParsedSummary } from '../../../sdk/parser.js';
+import { ingestSummary } from '../http/shared.js';
import { updateCursorContextForProject } from '../../integrations/CursorHooksInstaller.js';
import { notifyTelegram } from '../../integrations/TelegramNotifier.js';
import { updateFolderClaudeMdFiles } from '../../../utils/claude-md-utils.js';
@@ -67,39 +67,16 @@ export async function processAgentResponse(
session.conversationHistory.push({ role: 'assistant', content: text });
}
-// Parse observations and summary
-const observations = parseObservations(text, session.contentSessionId);
+// Single fail-fast parse (PATHFINDER plan 03 phase 1+2). On invalid XML,
+// mark each in-flight pending message failed and stop. The PendingMessageStore
+// retry ladder is the legitimate primary-path surface for transient failures;
+// there is no circuit breaker, no coercion.
+const parsed = parseAgentXml(text, session.contentSessionId);
-// Detect whether the most recent prompt was a summary request.
-// If so, enable observation-to-summary coercion to prevent the infinite
-// retry loop described in #1633.
-const lastMessage = session.conversationHistory.at(-1);
-const lastUserMessage = lastMessage?.role === 'user'
-? lastMessage
-: session.conversationHistory.findLast(m => m.role === 'user') ?? null;
-const summaryExpected = lastUserMessage?.content?.includes(SUMMARY_MODE_MARKER) ?? false;
-const summary = parseSummary(text, session.sessionDbId, summaryExpected);
-// Detect non-XML responses (auth errors, rate limits, garbled output).
-// When the response contains no parseable XML and produced no observations,
-// mark the pending messages as failed instead of confirming them — this prevents
-// silent data loss when the LLM returns garbage (#1874).
-const isNonXmlResponse = (
-text.trim() &&
-observations.length === 0 &&
-!summary &&
-!/<observation>|<summary>|<skip_summary\b/.test(text)
-);
-if (isNonXmlResponse) {
-const preview = text.length > 200 ? `${text.slice(0, 200)}...` : text;
-logger.warn('PARSER', `${agentName} returned non-XML response; marking messages as failed for retry (#1874)`, {
+if (!parsed.valid) {
+logger.warn('PARSER', `${agentName} returned unparseable response: ${parsed.reason}`, {
sessionId: session.sessionDbId,
-preview
});
// Mark messages as failed (retry logic in PendingMessageStore handles retries)
const pendingStore = sessionManager.getPendingMessageStore();
for (const messageId of session.processingMessageIds) {
pendingStore.markFailed(messageId);
@@ -108,6 +85,17 @@ export async function processAgentResponse(
return;
}
let observations: ParsedObservation[] = [];
let summary: ParsedSummary | null = null;
if (parsed.kind === 'observation') {
observations = parsed.data;
} else if (!parsed.data.skipped) {
// `<skip_summary/>` is a first-class parser result but carries nothing to
// persist; the summary storage path is skipped entirely so storeObservations
// does not see an empty record.
summary = parsed.data;
}
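The branching above implies a discriminated result shape for `parseAgentXml`. A hypothetical sketch, with field names taken from the handler code and everything else assumed (the real `ParsedObservation`/`ParsedSummary` carry more fields):

```typescript
// Assumed minimal shapes for illustration only.
interface ParsedObservation { type: string; title: string; }
interface ParsedSummary { skipped: boolean; }

type AgentXmlResult =
  | { valid: false; reason: string }
  | { valid: true; kind: 'observation'; data: ParsedObservation[] }
  | { valid: true; kind: 'summary'; data: ParsedSummary };

// Mirrors the handler's branching: invalid yields nothing, observation yields
// the list, and a <skip_summary/> summary yields null so the storage path is
// skipped entirely.
function splitResult(parsed: AgentXmlResult): { observations: ParsedObservation[]; summary: ParsedSummary | null } {
  if (!parsed.valid) return { observations: [], summary: null };
  if (parsed.kind === 'observation') return { observations: parsed.data, summary: null };
  return { observations: [], summary: parsed.data.skipped ? null : parsed.data };
}
```

The discriminant on `valid` and `kind` is what lets TypeScript narrow `parsed.data` without casts in each branch.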
// Convert nullable fields to empty strings for storeSummary (if summary exists)
const summaryForStore = normalizeSummaryForStorage(summary);
@@ -174,30 +162,23 @@ export async function processAgentResponse(
// to the Stop hook for silent-summary-loss detection (#1633)
session.lastSummaryStored = result.summaryId !== null;
-// Circuit breaker: track consecutive summary failures (#1633).
-// Only evaluate when a summary was actually expected (summarize message was sent).
-// Without this guard, the counter would increment on every normal observation
-// response, tripping the breaker after 3 observations and permanently blocking
-// summarization — reproducing the data-loss scenario this fix is meant to prevent.
-if (summaryExpected) {
-const skippedIntentionally = /<skip_summary\b/.test(text);
-if (summaryForStore !== null) {
-// Summary was present in the response — reset the failure counter
-session.consecutiveSummaryFailures = 0;
-} else if (skippedIntentionally) {
-// Explicit <skip_summary/> is a valid protocol response — neither success
-// nor failure. Leave the counter unchanged so we don't mask a bad run that
-// happens to end on a skip, but also don't punish intentional skips.
-} else {
-// Summary was expected but none was stored — count as failure
-session.consecutiveSummaryFailures += 1;
-if (session.consecutiveSummaryFailures >= MAX_CONSECUTIVE_SUMMARY_FAILURES) {
-logger.error('SESSION', `Circuit breaker: ${session.consecutiveSummaryFailures} consecutive summary failures — further summarize requests will be skipped (#1633)`, {
-sessionId: session.sessionDbId,
-contentSessionId: session.contentSessionId
-});
-}
-}
+// Gate ingestSummary({kind:'parsed'}) on real persistence so the event bus
+// only fires for summaries that actually landed in the DB. Skipped summaries
+// (<skip_summary/>) are an explicit bypass and still notify.
+if (parsed.kind === 'summary' && (parsed.data.skipped || session.lastSummaryStored)) {
+const messageId = session.processingMessageIds[0] ?? -1;
+ingestSummary({
+kind: 'parsed',
+sessionDbId: session.sessionDbId,
+messageId,
+contentSessionId: session.contentSessionId,
+parsed: parsed.data,
+});
+} else if (parsed.kind === 'summary') {
+logger.warn('DB', 'summary parsed but no row persisted; suppressing summaryStoredEvent', {
+sessionId: session.sessionDbId,
+memorySessionId: session.memorySessionId,
+});
+}
// CLAIM-CONFIRM: Now that storage succeeded, confirm all processing messages (delete from queue)
@@ -342,7 +323,7 @@ async function syncAndBroadcastObservations(
// Only runs if CLAUDE_MEM_FOLDER_CLAUDEMD_ENABLED is true (default: false)
const settings = SettingsDefaultsManager.loadFromFile(USER_SETTINGS_PATH);
// Handle both string 'true' and boolean true from JSON settings
-const settingValue = settings.CLAUDE_MEM_FOLDER_CLAUDEMD_ENABLED;
+const settingValue: unknown = settings.CLAUDE_MEM_FOLDER_CLAUDEMD_ENABLED;
const folderClaudeMdEnabled = settingValue === 'true' || settingValue === true;
if (folderClaudeMdEnabled) {
@@ -47,20 +47,6 @@ export abstract class BaseRouteHandler {
return value;
}
/**
* Validate required body parameters
* Returns true if all required params present, sends 400 error otherwise
*/
protected validateRequired(req: Request, res: Response, params: string[]): boolean {
for (const param of params) {
if (req.body[param] === undefined || req.body[param] === null) {
this.badRequest(res, `Missing ${param}`);
return false;
}
}
return true;
}
/**
* Send 400 Bad Request response
*/
-36
@@ -42,42 +42,6 @@ export function createMiddleware(
credentials: false
}));
// Simple in-memory rate limiter (#1935).
// Worker binds localhost-only, so in practice this is a global 300 req/min
// cap — every caller shares the 127.0.0.1/::1 bucket.
const requestCounts = new Map<string, { count: number; resetAt: number }>();
const RATE_LIMIT_WINDOW_MS = 60_000;
const RATE_LIMIT_MAX_REQUESTS = 300;
const rateLimiter: RequestHandler = (req, res, next) => {
// Normalise IPv4-mapped IPv6 so 127.0.0.1 and ::ffff:127.0.0.1 share a bucket.
const clientIp = (req.socket.remoteAddress ?? req.ip ?? 'unknown').replace(/^::ffff:/, '');
const now = Date.now();
let entry = requestCounts.get(clientIp);
if (!entry || now >= entry.resetAt) {
// Safety valve in case the worker is ever bound non-localhost.
if (requestCounts.size > 1000) {
for (const [ip, e] of requestCounts) {
if (now >= e.resetAt) requestCounts.delete(ip);
}
}
entry = { count: 0, resetAt: now + RATE_LIMIT_WINDOW_MS };
requestCounts.set(clientIp, entry);
}
if (entry.count >= RATE_LIMIT_MAX_REQUESTS) {
res.set('Retry-After', String(Math.ceil((entry.resetAt - now) / 1000)));
res.status(429).json({ error: 'Rate limit exceeded' });
return;
}
entry.count++;
next();
};
middlewares.push(rateLimiter);
// HTTP request/response logging
middlewares.push((req: Request, res: Response, next: NextFunction) => {
// Skip logging for static assets, health checks, and polling endpoints
@@ -0,0 +1,37 @@
/**
* Zod body-validation middleware: PATHFINDER-2026-04-22 Plan 06 Phase 2.
*
* Canonical signature: given a Zod schema, parse `req.body` with `safeParse`.
* On failure, respond 400 with `{ error: 'ValidationError', issues: [...] }`
* and stop. On success, replace `req.body` with the parsed (typed) value and
* call `next()`.
*
* Principles:
* - Principle 2: Fail-fast over graceful degradation. No try/catch swallow,
*   no coercion, no "best-effort" defaults.
* - Principle 6: One helper, N callers. Every validated POST/PUT
* across `src/services/worker/http/routes/` uses this one middleware
* wrapped around a per-route Zod schema declared at the top of its
* owning route file.
*/
import type { RequestHandler } from 'express';
import type { ZodTypeAny } from 'zod';
export const validateBody = <S extends ZodTypeAny>(schema: S): RequestHandler =>
(req, res, next) => {
const result = schema.safeParse(req.body);
if (!result.success) {
res.status(400).json({
error: 'ValidationError',
issues: result.error.issues.map(i => ({
path: i.path,
message: i.message,
code: i.code,
})),
});
return;
}
req.body = result.data;
next();
};
@@ -0,0 +1,78 @@
/**
* Chroma Routes
*
* Provides diagnostic endpoints for ChromaDB integration.
*/
import express, { Request, Response } from 'express';
import { BaseRouteHandler } from '../BaseRouteHandler.js';
import { ChromaMcpManager } from '../../../sync/ChromaMcpManager.js';
import { logger } from '../../../../utils/logger.js';
import { SettingsDefaultsManager } from '../../../../shared/SettingsDefaultsManager.js';
import { USER_SETTINGS_PATH } from '../../../../shared/paths.js';
export class ChromaRoutes extends BaseRouteHandler {
setupRoutes(app: express.Application): void {
app.get('/api/chroma/status', this.handleGetStatus.bind(this));
}
/**
* GET /api/chroma/status
* Returns current health and connection status of chroma-mcp.
*
* Pass `?deep=1` (or `?deep=true`) to additionally run a real
* semantic-search round-trip via ChromaMcpManager.probeSemanticSearch().
* The cheap path (no `deep`) stays cheap: it only calls isHealthy().
*/
private handleGetStatus = this.wrapHandler(async (req: Request, res: Response): Promise<void> => {
const settings = SettingsDefaultsManager.loadFromFile(USER_SETTINGS_PATH);
const chromaEnabled = settings.CLAUDE_MEM_CHROMA_ENABLED !== 'false';
// Truthy check: any value other than 'false'/'0' enables the deep probe.
// Bare `?deep` (no value) shows up as '' in Express, which we treat as enabled.
const deepRaw = req.query.deep;
const deepEnabled =
deepRaw !== undefined &&
deepRaw !== 'false' &&
deepRaw !== '0';
if (!chromaEnabled) {
res.json({
status: 'disabled',
connected: false,
timestamp: new Date().toISOString(),
details: 'Chroma is disabled via CLAUDE_MEM_CHROMA_ENABLED=false',
deep: deepEnabled
});
return;
}
const chromaMcp = ChromaMcpManager.getInstance();
const isHealthy = await chromaMcp.isHealthy();
if (!deepEnabled) {
res.json({
status: isHealthy ? 'healthy' : 'unhealthy',
connected: isHealthy,
timestamp: new Date().toISOString(),
details: isHealthy ? 'chroma-mcp is responding to tool calls' : 'chroma-mcp health check failed',
deep: false
});
return;
}
const probe = await chromaMcp.probeSemanticSearch();
const status = probe.ok ? 'healthy' : 'unhealthy';
res.json({
status,
connected: isHealthy,
timestamp: new Date().toISOString(),
details: probe.ok
? 'chroma-mcp semantic search round-trip succeeded'
: `chroma-mcp deep probe failed at stage '${probe.stage}'`,
deep: true,
probe
});
});
}
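The `?deep` truthiness rule can be restated as a standalone predicate. Illustrative only: the name `isDeepEnabled` and the array handling for repeated query params are ours; the three comparisons are the handler's own.

```typescript
// Enabled for any value except 'false'/'0'; absent means disabled; a bare
// `?deep` reaches Express as '' and counts as enabled.
function isDeepEnabled(deepRaw: string | string[] | undefined): boolean {
  const value = Array.isArray(deepRaw) ? deepRaw[0] : deepRaw;
  return value !== undefined && value !== 'false' && value !== '0';
}
```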
+64 -86
@@ -6,14 +6,65 @@
*/
import express, { Request, Response } from 'express';
import { z } from 'zod';
import { BaseRouteHandler } from '../BaseRouteHandler.js';
import { logger } from '../../../../utils/logger.js';
import { validateBody } from '../middleware/validateBody.js';
import { CorpusStore } from '../../knowledge/CorpusStore.js';
import { CorpusBuilder } from '../../knowledge/CorpusBuilder.js';
import { KnowledgeAgent } from '../../knowledge/KnowledgeAgent.js';
import type { CorpusFilter } from '../../knowledge/types.js';
-const ALLOWED_CORPUS_TYPES = new Set(['decision', 'bugfix', 'feature', 'refactor', 'discovery', 'change', 'security_alert', 'security_note']);
+const ALLOWED_CORPUS_TYPES = ['decision', 'bugfix', 'feature', 'refactor', 'discovery', 'change', 'security_alert', 'security_note'] as const;
+const ALLOWED_CORPUS_TYPE_SET = new Set<string>(ALLOWED_CORPUS_TYPES);
// Plan 06 Phase 3 — per-route Zod schemas. Coercions match the legacy
// `coerceStringArray` / `coercePositiveInteger` semantics: accept JSON
// strings, comma-separated strings, or native arrays; reject empty fields.
const stringArrayLike = z.preprocess((value) => {
if (value === undefined || value === null || value === '') return undefined;
if (Array.isArray(value)) return value;
if (typeof value === 'string') {
try {
const parsed = JSON.parse(value);
if (Array.isArray(parsed)) return parsed;
} catch {
// not JSON, fall through to comma split
}
return value.split(',').map((part) => part.trim()).filter(Boolean);
}
return value;
}, z.array(z.string().min(1)).optional());
const positiveIntegerLike = z.preprocess((value) => {
if (value === undefined || value === null || value === '') return undefined;
if (typeof value === 'string') {
const parsed = Number(value);
return Number.isNaN(parsed) ? value : parsed;
}
return value;
}, z.number().int().positive().optional());
const buildCorpusSchema = z.object({
name: z.string().min(1),
description: z.string().optional(),
project: z.string().optional(),
types: stringArrayLike.refine(
(arr) => arr === undefined || arr.every((t) => ALLOWED_CORPUS_TYPE_SET.has(t)),
{ message: `types must contain only ${ALLOWED_CORPUS_TYPES.join(', ')}` }
),
concepts: stringArrayLike,
files: stringArrayLike,
query: z.string().optional(),
date_start: z.string().optional(),
date_end: z.string().optional(),
limit: positiveIntegerLike,
}).passthrough();
const queryCorpusSchema = z.object({
question: z.string().trim().min(1),
}).passthrough();
const emptyBodySchema = z.object({}).passthrough();
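The `stringArrayLike` preprocessor above tries JSON first and falls back to a comma split. Its coercion step can be sketched as a standalone function (without the zod wrapper) to show the decision order; this is an illustrative sketch, not the shipped code:

```typescript
// Sketch of the coercion order used by `stringArrayLike`: native arrays
// pass through, JSON-encoded arrays are decoded, anything else falls back
// to a comma split. Empty-ish inputs map to undefined so an .optional()
// schema can treat the field as absent.
function coerceStringArray(value: unknown): unknown {
  if (value === undefined || value === null || value === '') return undefined;
  if (Array.isArray(value)) return value;
  if (typeof value === 'string') {
    try {
      const parsed = JSON.parse(value);
      if (Array.isArray(parsed)) return parsed;
    } catch {
      // not JSON, fall through to comma split
    }
    return value.split(',').map((part) => part.trim()).filter(Boolean);
  }
  return value;
}
```

Note that a JSON-parseable scalar like `"42"` still falls through to the comma split, matching the legacy `coerceStringArray` behaviour.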
export class CorpusRoutes extends BaseRouteHandler {
constructor(
@@ -25,14 +76,14 @@ export class CorpusRoutes extends BaseRouteHandler {
}
setupRoutes(app: express.Application): void {
-app.post('/api/corpus', this.handleBuildCorpus.bind(this));
+app.post('/api/corpus', validateBody(buildCorpusSchema), this.handleBuildCorpus.bind(this));
app.get('/api/corpus', this.handleListCorpora.bind(this));
app.get('/api/corpus/:name', this.handleGetCorpus.bind(this));
app.delete('/api/corpus/:name', this.handleDeleteCorpus.bind(this));
-app.post('/api/corpus/:name/rebuild', this.handleRebuildCorpus.bind(this));
-app.post('/api/corpus/:name/prime', this.handlePrimeCorpus.bind(this));
-app.post('/api/corpus/:name/query', this.handleQueryCorpus.bind(this));
-app.post('/api/corpus/:name/reprime', this.handleReprimeCorpus.bind(this));
+app.post('/api/corpus/:name/rebuild', validateBody(emptyBodySchema), this.handleRebuildCorpus.bind(this));
+app.post('/api/corpus/:name/prime', validateBody(emptyBodySchema), this.handlePrimeCorpus.bind(this));
+app.post('/api/corpus/:name/query', validateBody(queryCorpusSchema), this.handleQueryCorpus.bind(this));
+app.post('/api/corpus/:name/reprime', validateBody(emptyBodySchema), this.handleReprimeCorpus.bind(this));
}
/**
@@ -41,42 +92,18 @@ export class CorpusRoutes extends BaseRouteHandler {
* Body: { name, description?, project?, types?, concepts?, files?, query?, date_start?, date_end?, limit? }
*/
private handleBuildCorpus = this.wrapHandler(async (req: Request, res: Response): Promise<void> => {
-if (!req.body.name) {
-res.status(400).json({
-error: 'Missing required field: name',
-fix: 'Add a "name" field to your request body',
-example: { name: 'my-corpus', query: 'hooks', limit: 100 }
-});
-return;
-}
-const { name, description, project, types, concepts, files, query, date_start, date_end, limit } = req.body;
-const coercedTypes = this.coerceStringArray(types, 'types', res);
-if (coercedTypes === null) return;
-if (coercedTypes && !coercedTypes.every(type => ALLOWED_CORPUS_TYPES.has(type))) {
-this.badRequest(res, 'types must contain valid observation types');
-return;
-}
-const coercedConcepts = this.coerceStringArray(concepts, 'concepts', res);
-if (coercedConcepts === null) return;
-const coercedFiles = this.coerceStringArray(files, 'files', res);
-if (coercedFiles === null) return;
-const coercedLimit = this.coercePositiveInteger(limit, 'limit', res);
-if (coercedLimit === null) return;
+const { name, description, project, types, concepts, files, query, date_start, date_end, limit } =
+req.body as z.infer<typeof buildCorpusSchema>;
const filter: CorpusFilter = {};
if (project) filter.project = project;
-if (coercedTypes && coercedTypes.length > 0) filter.types = coercedTypes as CorpusFilter['types'];
-if (coercedConcepts && coercedConcepts.length > 0) filter.concepts = coercedConcepts;
-if (coercedFiles && coercedFiles.length > 0) filter.files = coercedFiles;
+if (types && types.length > 0) filter.types = types as CorpusFilter['types'];
+if (concepts && concepts.length > 0) filter.concepts = concepts;
+if (files && files.length > 0) filter.files = files;
if (query) filter.query = query;
if (date_start) filter.date_start = date_start;
if (date_end) filter.date_end = date_end;
-if (coercedLimit !== undefined) filter.limit = coercedLimit;
+if (limit !== undefined) filter.limit = limit;
const corpus = await this.corpusBuilder.build(name, description || '', filter);
@@ -85,45 +112,6 @@ export class CorpusRoutes extends BaseRouteHandler {
res.json(metadata);
});
-private coerceStringArray(value: unknown, fieldName: string, res: Response): string[] | null | undefined {
-if (value === undefined || value === null || value === '') {
-return undefined;
-}
-let parsed = value;
-if (typeof value === 'string') {
-try {
-parsed = JSON.parse(value);
-} catch (parseError: unknown) {
-if (parseError instanceof Error) {
-logger.debug('HTTP', `${fieldName} is not valid JSON, treating as comma-separated string`, { value });
-}
-parsed = value.split(',').map(part => part.trim()).filter(Boolean);
-}
-}
-if (!Array.isArray(parsed) || !parsed.every(item => typeof item === 'string')) {
-this.badRequest(res, `${fieldName} must be an array of strings`);
-return null;
-}
-return parsed.map(item => item.trim()).filter(Boolean);
-}
-private coercePositiveInteger(value: unknown, fieldName: string, res: Response): number | null | undefined {
-if (value === undefined || value === null || value === '') {
-return undefined;
-}
-const parsed = typeof value === 'string' ? Number(value) : value;
-if (typeof parsed !== 'number' || !Number.isInteger(parsed) || parsed <= 0) {
-this.badRequest(res, `${fieldName} must be a positive integer`);
-return null;
-}
-return parsed;
-}
/**
* List all corpora with stats
* GET /api/corpus
@@ -234,16 +222,6 @@ export class CorpusRoutes extends BaseRouteHandler {
*/
private handleQueryCorpus = this.wrapHandler(async (req: Request, res: Response): Promise<void> => {
const { name } = req.params;
-if (!req.body.question || typeof req.body.question !== 'string' || req.body.question.trim().length === 0) {
-res.status(400).json({
-error: 'Missing required field: question',
-fix: 'Add a non-empty "question" string to your request body',
-example: { question: 'What architectural decisions were made about hooks?' }
-});
-return;
-}
const corpus = this.corpusStore.read(name);
if (!corpus) {
+61 -130
@@ -6,6 +6,7 @@
*/
import express, { Request, Response } from 'express';
import { z } from 'zod';
import path from 'path';
import { readFileSync, statSync, existsSync } from 'fs';
import { logger } from '../../../../utils/logger.js';
@@ -18,9 +19,63 @@ import { SessionManager } from '../../SessionManager.js';
import { SSEBroadcaster } from '../../SSEBroadcaster.js';
import type { WorkerService } from '../../../worker-service.js';
import { BaseRouteHandler } from '../BaseRouteHandler.js';
import { validateBody } from '../middleware/validateBody.js';
import { normalizePlatformSource } from '../../../../shared/platform-source.js';
import { getObservationsByFilePath } from '../../../sqlite/observations/get.js';
// Plan 06 Phase 3 — per-route Zod schemas. Coercions match the legacy
// behaviour where MCP clients sometimes send arrays as JSON-encoded strings
// or comma-separated strings.
const integerArrayLike = z.preprocess((value) => {
if (Array.isArray(value)) return value;
if (typeof value === 'string') {
try {
const parsed = JSON.parse(value);
if (Array.isArray(parsed)) return parsed;
} catch {
// not JSON, fall through to comma split
}
// Keep NaN values so the inner z.number().int() schema rejects them
// — coercion does not silently drop garbage input.
return value.split(',').map((part) => Number(part.trim()));
}
return value;
}, z.array(z.number().int()));
const stringArrayLike = z.preprocess((value) => {
if (Array.isArray(value)) return value;
if (typeof value === 'string') {
try {
const parsed = JSON.parse(value);
if (Array.isArray(parsed)) return parsed;
} catch {
// not JSON, fall through to comma split
}
return value.split(',').map((part) => part.trim()).filter(Boolean);
}
return value;
}, z.array(z.string()));
const observationsBatchSchema = z.object({
ids: integerArrayLike,
orderBy: z.enum(['date_desc', 'date_asc']).optional(),
limit: z.number().int().positive().optional(),
project: z.string().optional(),
}).passthrough();
const sdkSessionsBatchSchema = z.object({
memorySessionIds: stringArrayLike,
}).passthrough();
const setProcessingSchema = z.object({}).passthrough();
const importSchema = z.object({
sessions: z.array(z.unknown()).optional(),
summaries: z.array(z.unknown()).optional(),
observations: z.array(z.unknown()).optional(),
prompts: z.array(z.unknown()).optional(),
}).passthrough();
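The comment on `integerArrayLike` above is the important design choice: comma-split values go through `Number()` and NaN is kept so the inner `z.array(z.number().int())` rejects garbage rather than silently dropping it. A standalone sketch of that behaviour (plain functions standing in for the zod pipeline):

```typescript
// Sketch of the `integerArrayLike` coercion: NaN values are deliberately
// preserved so the downstream integer check fails loudly on bad input.
function coerceIntegerArray(value: unknown): unknown {
  if (Array.isArray(value)) return value;
  if (typeof value === 'string') {
    try {
      const parsed = JSON.parse(value);
      if (Array.isArray(parsed)) return parsed;
    } catch {
      // not JSON, fall through to comma split
    }
    return value.split(',').map((part) => Number(part.trim()));
  }
  return value;
}

// Stand-in for the z.array(z.number().int()) validation step.
function isIntegerArray(value: unknown): value is number[] {
  return Array.isArray(value) && value.every((n) => typeof n === 'number' && Number.isInteger(n));
}
```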
export class DataRoutes extends BaseRouteHandler {
constructor(
private paginationHelper: PaginationHelper,
@@ -42,9 +97,9 @@ export class DataRoutes extends BaseRouteHandler {
// Fetch by ID endpoints
app.get('/api/observation/:id', this.handleGetObservationById.bind(this));
app.get('/api/observations/by-file', this.handleGetObservationsByFile.bind(this));
-app.post('/api/observations/batch', this.handleGetObservationsByIds.bind(this));
+app.post('/api/observations/batch', validateBody(observationsBatchSchema), this.handleGetObservationsByIds.bind(this));
app.get('/api/session/:id', this.handleGetSessionById.bind(this));
-app.post('/api/sdk-sessions/batch', this.handleGetSdkSessionsByIds.bind(this));
+app.post('/api/sdk-sessions/batch', validateBody(sdkSessionsBatchSchema), this.handleGetSdkSessionsByIds.bind(this));
app.get('/api/prompt/:id', this.handleGetPromptById.bind(this));
// Metadata endpoints
@@ -53,16 +108,10 @@ export class DataRoutes extends BaseRouteHandler {
// Processing status endpoints
app.get('/api/processing-status', this.handleGetProcessingStatus.bind(this));
-app.post('/api/processing', this.handleSetProcessing.bind(this));
-// Pending queue management endpoints
-app.get('/api/pending-queue', this.handleGetPendingQueue.bind(this));
-app.post('/api/pending-queue/process', this.handleProcessPendingQueue.bind(this));
-app.delete('/api/pending-queue/failed', this.handleClearFailedQueue.bind(this));
-app.delete('/api/pending-queue/all', this.handleClearAllQueue.bind(this));
+app.post('/api/processing', validateBody(setProcessingSchema), this.handleSetProcessing.bind(this));
// Import endpoint
-app.post('/api/import', this.handleImport.bind(this));
+app.post('/api/import', validateBody(importSchema), this.handleImport.bind(this));
}
/**
@@ -139,29 +188,13 @@ export class DataRoutes extends BaseRouteHandler {
* Body: { ids: number[], orderBy?: 'date_desc' | 'date_asc', limit?: number, project?: string }
*/
private handleGetObservationsByIds = this.wrapHandler((req: Request, res: Response): void => {
-let { ids, orderBy, limit, project } = req.body;
-// Coerce string-encoded arrays from MCP clients (e.g. "[1,2,3]" or "1,2,3")
-if (typeof ids === 'string') {
-try { ids = JSON.parse(ids); } catch { ids = ids.split(',').map(Number); }
-}
-if (!ids || !Array.isArray(ids)) {
-this.badRequest(res, 'ids must be an array of numbers');
-return;
-}
+const { ids, orderBy, limit, project } = req.body as z.infer<typeof observationsBatchSchema>;
if (ids.length === 0) {
res.json([]);
return;
}
-// Validate all IDs are numbers
-if (!ids.every(id => typeof id === 'number' && Number.isInteger(id))) {
-this.badRequest(res, 'All ids must be integers');
-return;
-}
const store = this.dbManager.getSessionStore();
const observations = store.getObservationsByIds(ids, { orderBy, limit, project });
@@ -193,17 +226,7 @@ export class DataRoutes extends BaseRouteHandler {
* Body: { memorySessionIds: string[] }
*/
private handleGetSdkSessionsByIds = this.wrapHandler((req: Request, res: Response): void => {
-let { memorySessionIds } = req.body;
-// Coerce string-encoded arrays from MCP clients (e.g. '["a","b"]' or "a,b")
-if (typeof memorySessionIds === 'string') {
-try { memorySessionIds = JSON.parse(memorySessionIds); } catch { memorySessionIds = memorySessionIds.split(',').map((s: string) => s.trim()); }
-}
-if (!Array.isArray(memorySessionIds)) {
-this.badRequest(res, 'memorySessionIds must be an array');
-return;
-}
+const { memorySessionIds } = req.body as z.infer<typeof sdkSessionsBatchSchema>;
const store = this.dbManager.getSessionStore();
const sessions = store.getSdkSessionsBySessionIds(memorySessionIds);
@@ -467,96 +490,4 @@ export class DataRoutes extends BaseRouteHandler {
});
});
-/**
-* Get pending queue contents
-* GET /api/pending-queue
-* Returns all pending, processing, and failed messages with optional recently processed
-*/
-private handleGetPendingQueue = this.wrapHandler((req: Request, res: Response): void => {
-const { PendingMessageStore } = require('../../../sqlite/PendingMessageStore.js');
-const pendingStore = new PendingMessageStore(this.dbManager.getSessionStore().db, 3);
-// Get queue contents (pending, processing, failed)
-const queueMessages = pendingStore.getQueueMessages();
-// Get recently processed (last 30 min, up to 20)
-const recentlyProcessed = pendingStore.getRecentlyProcessed(20, 30);
-// Get stuck message count (processing > 5 min)
-const stuckCount = pendingStore.getStuckCount(5 * 60 * 1000);
-// Get sessions with pending work
-const sessionsWithPending = pendingStore.getSessionsWithPendingMessages();
-res.json({
-queue: {
-messages: queueMessages,
-totalPending: queueMessages.filter((m: { status: string }) => m.status === 'pending').length,
-totalProcessing: queueMessages.filter((m: { status: string }) => m.status === 'processing').length,
-totalFailed: queueMessages.filter((m: { status: string }) => m.status === 'failed').length,
-stuckCount
-},
-recentlyProcessed,
-sessionsWithPendingWork: sessionsWithPending
-});
-});
-/**
-* Process pending queue
-* POST /api/pending-queue/process
-* Body: { sessionLimit?: number } - defaults to 10
-* Starts SDK agents for sessions with pending messages
-*/
-private handleProcessPendingQueue = this.wrapHandler(async (req: Request, res: Response): Promise<void> => {
-const sessionLimit = Math.min(
-Math.max(parseInt(req.body.sessionLimit, 10) || 10, 1),
-100 // Max 100 sessions at once
-);
-const result = await this.workerService.processPendingQueues(sessionLimit);
-res.json({
-success: true,
-...result
-});
-});
-/**
-* Clear all failed messages from the queue
-* DELETE /api/pending-queue/failed
-* Returns the number of messages cleared
-*/
-private handleClearFailedQueue = this.wrapHandler((req: Request, res: Response): void => {
-const { PendingMessageStore } = require('../../../sqlite/PendingMessageStore.js');
-const pendingStore = new PendingMessageStore(this.dbManager.getSessionStore().db, 3);
-const clearedCount = pendingStore.clearFailed();
-logger.info('QUEUE', 'Cleared failed queue messages', { clearedCount });
-res.json({
-success: true,
-clearedCount
-});
-});
-/**
-* Clear all messages from the queue (pending, processing, and failed)
-* DELETE /api/pending-queue/all
-* Returns the number of messages cleared
-*/
-private handleClearAllQueue = this.wrapHandler((req: Request, res: Response): void => {
-const { PendingMessageStore } = require('../../../sqlite/PendingMessageStore.js');
-const pendingStore = new PendingMessageStore(this.dbManager.getSessionStore().db, 3);
-const clearedCount = pendingStore.clearAll();
-logger.warn('QUEUE', 'Cleared ALL queue messages (pending, processing, failed)', { clearedCount });
-res.json({
-success: true,
-clearedCount
-});
-});
}
@@ -5,11 +5,16 @@
*/
import express, { Request, Response } from 'express';
import { z } from 'zod';
import { openSync, fstatSync, readSync, closeSync, existsSync, writeFileSync } from 'fs';
import { join } from 'path';
import { logger } from '../../../../utils/logger.js';
import { SettingsDefaultsManager } from '../../../../shared/SettingsDefaultsManager.js';
import { BaseRouteHandler } from '../BaseRouteHandler.js';
import { validateBody } from '../middleware/validateBody.js';
// Plan 06 Phase 3 — per-route Zod schema. The clear-logs endpoint takes no body.
const clearLogsSchema = z.object({}).passthrough();
/**
* Read the last N lines from a file without loading the entire file into memory.
@@ -99,7 +104,7 @@ export class LogsRoutes extends BaseRouteHandler {
setupRoutes(app: express.Application): void {
app.get('/api/logs', this.handleGetLogs.bind(this));
-app.post('/api/logs/clear', this.handleClearLogs.bind(this));
+app.post('/api/logs/clear', validateBody(clearLogsSchema), this.handleClearLogs.bind(this));
}
/**
@@ -6,10 +6,19 @@
*/
import express, { Request, Response } from 'express';
import { z } from 'zod';
import { BaseRouteHandler } from '../BaseRouteHandler.js';
import { validateBody } from '../middleware/validateBody.js';
import { logger } from '../../../../utils/logger.js';
import type { DatabaseManager } from '../../DatabaseManager.js';
// Plan 06 Phase 3 — per-route Zod schema.
const saveMemorySchema = z.object({
text: z.string().trim().min(1),
title: z.string().optional(),
project: z.string().optional(),
}).passthrough();
export class MemoryRoutes extends BaseRouteHandler {
constructor(
private dbManager: DatabaseManager,
@@ -19,7 +28,7 @@ export class MemoryRoutes extends BaseRouteHandler {
}
setupRoutes(app: express.Application): void {
-app.post('/api/memory/save', this.handleSaveMemory.bind(this));
+app.post('/api/memory/save', validateBody(saveMemorySchema), this.handleSaveMemory.bind(this));
}
/**
@@ -27,14 +36,9 @@ export class MemoryRoutes extends BaseRouteHandler {
* Body: { text: string, title?: string, project?: string }
*/
private handleSaveMemory = this.wrapHandler(async (req: Request, res: Response): Promise<void> => {
-const { text, title, project } = req.body;
+const { text, title, project } = req.body as z.infer<typeof saveMemorySchema>;
const targetProject = project || this.defaultProject;
-if (!text || typeof text !== 'string' || text.trim().length === 0) {
-this.badRequest(res, 'text is required and must be non-empty');
-return;
-}
const sessionStore = this.dbManager.getSessionStore();
const chromaSync = this.dbManager.getChromaSync();
@@ -69,6 +73,17 @@ export class MemoryRoutes extends BaseRouteHandler {
});
// 4. Sync to ChromaDB (async, fire-and-forget)
if (!chromaSync) {
logger.debug('CHROMA', 'ChromaDB sync skipped (chromaSync not available)', { id: result.id });
res.json({
success: true,
id: result.id,
title: observation.title,
project: targetProject,
message: `Memory saved as observation #${result.id}`
});
return;
}
chromaSync.syncObservation(
result.id,
memorySessionId,
+147 -7
@@ -6,9 +6,21 @@
*/
import express, { Request, Response } from 'express';
import { z } from 'zod';
import { SearchManager } from '../../SearchManager.js';
import { BaseRouteHandler } from '../BaseRouteHandler.js';
import { validateBody } from '../middleware/validateBody.js';
import { logger } from '../../../../utils/logger.js';
import { groupByDate } from '../../../../shared/timeline-formatting.js';
import type { ObservationSearchResult, SessionSummarySearchResult } from '../../../sqlite/types.js';
// Plan 06 Phase 3 — per-route Zod schema. The semantic-context endpoint
// also accepts query-string fallbacks, so the body itself is fully optional.
const semanticContextSchema = z.object({
q: z.string().optional(),
project: z.string().optional(),
limit: z.union([z.string(), z.number()]).optional(),
}).passthrough();
export class SearchRoutes extends BaseRouteHandler {
constructor(
@@ -38,7 +50,7 @@ export class SearchRoutes extends BaseRouteHandler {
app.get('/api/context/timeline', this.handleGetContextTimeline.bind(this));
app.get('/api/context/preview', this.handleContextPreview.bind(this));
app.get('/api/context/inject', this.handleContextInject.bind(this));
-app.post('/api/context/semantic', this.handleSemanticContext.bind(this));
+app.post('/api/context/semantic', validateBody(semanticContextSchema), this.handleSemanticContext.bind(this));
// Timeline and help endpoints
app.get('/api/timeline/by-query', this.handleGetTimelineByQuery.bind(this));
@@ -120,28 +132,156 @@ export class SearchRoutes extends BaseRouteHandler {
/**
* Search observations by concept
* GET /api/search/by-concept?concept=discovery&limit=5
*
* Chroma errors surface as 503 via ChromaUnavailableError (thrown by orchestrator).
*/
private handleSearchByConcept = this.wrapHandler(async (req: Request, res: Response): Promise<void> => {
-const result = await this.searchManager.findByConcept(req.query);
-res.json(result);
const orchestrator = this.searchManager.getOrchestrator();
const formatter = this.searchManager.getFormatter();
const query = req.query as Record<string, any>;
const rawConcept = query.concepts ?? query.concept;
const concept = Array.isArray(rawConcept) ? rawConcept[0] : rawConcept;
const strategyResult = await orchestrator.findByConcept(concept, query);
const observations = strategyResult.results.observations;
if (observations.length === 0) {
res.json({
content: [{
type: 'text' as const,
text: `No observations found with concept "${concept}"`
}]
});
return;
}
const header = `Found ${observations.length} observation(s) with concept "${concept}"\n\n${formatter.formatTableHeader()}`;
const rows = observations.map((obs: ObservationSearchResult, i: number) => formatter.formatObservationIndex(obs, i));
res.json({
content: [{
type: 'text' as const,
text: header + '\n' + rows.join('\n')
}]
});
});
/**
* Search by file path
* GET /api/search/by-file?filePath=...&limit=10
*
* Chroma errors surface as 503 via ChromaUnavailableError (thrown by orchestrator).
*/
private handleSearchByFile = this.wrapHandler(async (req: Request, res: Response): Promise<void> => {
-const result = await this.searchManager.findByFile(req.query);
-res.json(result);
const orchestrator = this.searchManager.getOrchestrator();
const formatter = this.searchManager.getFormatter();
const query = req.query as Record<string, any>;
// Accept both filePath and files for API compatibility
const rawFilePath = query.filePath ?? query.files;
const filePath = Array.isArray(rawFilePath)
? rawFilePath[0]
: (typeof rawFilePath === 'string' && rawFilePath.includes(','))
? rawFilePath.split(',')[0].trim()
: rawFilePath;
const { observations, sessions } = await orchestrator.findByFile(filePath, query);
const totalResults = observations.length + sessions.length;
if (totalResults === 0) {
res.json({
content: [{
type: 'text' as const,
text: `No results found for file "${filePath}"`
}]
});
return;
}
// Combine observations and sessions with timestamps for date grouping
const combined: Array<{
type: 'observation' | 'session';
data: ObservationSearchResult | SessionSummarySearchResult;
epoch: number;
created_at: string;
}> = [
...observations.map((obs: ObservationSearchResult) => ({
type: 'observation' as const,
data: obs,
epoch: obs.created_at_epoch,
created_at: obs.created_at
})),
...sessions.map((sess: SessionSummarySearchResult) => ({
type: 'session' as const,
data: sess,
epoch: sess.created_at_epoch,
created_at: sess.created_at
}))
];
combined.sort((a, b) => b.epoch - a.epoch);
const resultsByDate = groupByDate(combined, item => item.created_at);
const lines: string[] = [];
lines.push(`Found ${totalResults} result(s) for file "${filePath}"`);
lines.push('');
for (const [day, dayResults] of resultsByDate) {
lines.push(`### ${day}`);
lines.push('');
lines.push(formatter.formatTableHeader());
for (const result of dayResults) {
if (result.type === 'observation') {
lines.push(formatter.formatObservationIndex(result.data as ObservationSearchResult, 0));
} else {
lines.push(formatter.formatSessionIndex(result.data as SessionSummarySearchResult, 0));
}
}
lines.push('');
}
res.json({
content: [{
type: 'text' as const,
text: lines.join('\n')
}]
});
});
/**
* Search observations by type
* GET /api/search/by-type?type=bugfix&limit=10
*
* Chroma errors surface as 503 via ChromaUnavailableError (thrown by orchestrator).
*/
private handleSearchByType = this.wrapHandler(async (req: Request, res: Response): Promise<void> => {
-const result = await this.searchManager.findByType(req.query);
-res.json(result);
const orchestrator = this.searchManager.getOrchestrator();
const formatter = this.searchManager.getFormatter();
const query = req.query as Record<string, any>;
const rawType = query.type;
const type = (typeof rawType === 'string' && rawType.includes(','))
? rawType.split(',').map((s: string) => s.trim()).filter(Boolean)
: rawType;
const typeStr = Array.isArray(type) ? type.join(', ') : type;
const strategyResult = await orchestrator.findByType(type, query);
const observations = strategyResult.results.observations;
if (observations.length === 0) {
res.json({
content: [{
type: 'text' as const,
text: `No observations found with type "${typeStr}"`
}]
});
return;
}
const header = `Found ${observations.length} observation(s) with type "${typeStr}"\n\n${formatter.formatTableHeader()}`;
const rows = observations.map((obs: ObservationSearchResult, i: number) => formatter.formatObservationIndex(obs, i));
res.json({
content: [{
type: 'text' as const,
text: header + '\n' + rows.join('\n')
}]
});
});
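The by-type handler above accepts either a scalar type or a comma-separated list and joins arrays back for the display string. That normalization can be sketched in isolation (an illustrative sketch, not the shipped handler code):

```typescript
// Sketch of the type-param normalization used above: a comma-separated
// string becomes string[], a plain string stays scalar, and the display
// helper joins arrays back for the "Found N observation(s)" message.
function normalizeTypeParam(raw: unknown): string | string[] | undefined {
  if (typeof raw === 'string' && raw.includes(',')) {
    return raw.split(',').map((s) => s.trim()).filter(Boolean);
  }
  return typeof raw === 'string' ? raw : undefined;
}

function displayType(type: string | string[] | undefined): string {
  return Array.isArray(type) ? type.join(', ') : type ?? '';
}
```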
/**
+175 -162
@@ -6,6 +6,9 @@
*/
import express, { Request, Response } from 'express';
import { z } from 'zod';
import { ingestObservation } from '../shared.js';
import { validateBody } from '../middleware/validateBody.js';
import { getWorkerPort } from '../../../../shared/worker-utils.js';
import { logger } from '../../../../utils/logger.js';
import { stripMemoryTagsFromJson, stripMemoryTagsFromPrompt } from '../../../../utils/tag-stripping.js';
@@ -21,13 +24,14 @@ import { SessionCompletionHandler } from '../../session/SessionCompletionHandler
import { PrivacyCheckValidator } from '../../validation/PrivacyCheckValidator.js';
import { SettingsDefaultsManager } from '../../../../shared/SettingsDefaultsManager.js';
import { USER_SETTINGS_PATH } from '../../../../shared/paths.js';
-import { getProcessBySession, ensureProcessExit } from '../../ProcessRegistry.js';
+import { getSdkProcessForSession, ensureSdkProcessExit } from '../../../../supervisor/process-registry.js';
import { getProjectContext } from '../../../../utils/project-name.js';
import { normalizePlatformSource } from '../../../../shared/platform-source.js';
import { RestartGuard } from '../../RestartGuard.js';
const MAX_USER_PROMPT_BYTES = 256 * 1024;
export class SessionRoutes extends BaseRouteHandler {
private completionHandler: SessionCompletionHandler;
private spawnInProgress = new Map<number, boolean>();
private crashRecoveryScheduled = new Set<number>();
@@ -39,13 +43,9 @@ export class SessionRoutes extends BaseRouteHandler {
private openRouterAgent: OpenRouterAgent,
private eventBroadcaster: SessionEventBroadcaster,
private workerService: WorkerService,
-completionHandler: SessionCompletionHandler
+private completionHandler: SessionCompletionHandler,
) {
super();
-// Use the shared completion handler from WorkerService so the SDK-agent
-// completion path and the HTTP fallback route operate on the same instance
-// (avoids duplicate construction; keeps finalize semantics consistent).
-this.completionHandler = completionHandler;
}
/**
@@ -97,7 +97,7 @@ export class SessionRoutes extends BaseRouteHandler {
private static readonly STALE_GENERATOR_THRESHOLD_MS = 30_000; // 30 seconds (#1099)
private static readonly MAX_SESSION_WALL_CLOCK_MS = 4 * 60 * 60 * 1000; // 4 hours (#1590)
-private ensureGeneratorRunning(sessionDbId: number, source: string): void {
+public ensureGeneratorRunning(sessionDbId: number, source: string): void {
const session = this.sessionManager.getSession(sessionDbId);
if (!session) return;
@@ -121,7 +121,7 @@ export class SessionRoutes extends BaseRouteHandler {
session.abortController.abort();
}
const pendingStore = this.sessionManager.getPendingMessageStore();
-pendingStore.markAllSessionMessagesAbandoned(sessionDbId);
+pendingStore.transitionMessagesTo('abandoned', { sessionDbId });
this.sessionManager.removeSessionImmediate(sessionDbId);
return;
}
@@ -253,7 +253,7 @@ export class SessionRoutes extends BaseRouteHandler {
// Mark all processing messages as failed so they can be retried or abandoned
const pendingStore = this.sessionManager.getPendingMessageStore();
try {
-const failedCount = pendingStore.markSessionMessagesFailed(session.sessionDbId);
+const failedCount = pendingStore.transitionMessagesTo('failed', { sessionDbId: session.sessionDbId });
if (failedCount > 0) {
logger.error('SESSION', `Marked messages as failed after generator error`, {
sessionId: session.sessionDbId,
@@ -268,10 +268,11 @@ export class SessionRoutes extends BaseRouteHandler {
}
})
.finally(async () => {
-// CRITICAL: Verify subprocess exit to prevent zombie accumulation (Issue #1168)
-const tracked = getProcessBySession(session.sessionDbId);
+// Primary-path subprocess teardown — process-group kill ensures any
+// SDK descendants are reaped too (Principle 5).
+const tracked = getSdkProcessForSession(session.sessionDbId);
if (tracked && !tracked.process.killed && tracked.process.exitCode === null) {
-await ensureProcessExit(tracked, 5000);
+await ensureSdkProcessExit(tracked, 5000);
}
}
const sessionDbId = session.sessionDbId;
@@ -289,43 +290,6 @@ export class SessionRoutes extends BaseRouteHandler {
session.currentProvider = null;
this.workerService.broadcastProcessingStatus();
-// Stop-hook fire-and-forget (Phase 2): if the generator just processed
-// a summary and no work remains, the Stop hook is done and we should
-// self-clean the session. The summary write is already committed to
-// SQLite synchronously inside processAgentResponse() BEFORE startSession()
-// returns (see ResponseProcessor.ts: storeObservations() is sync, and
-// confirmProcessed() runs right after), so by the time this .finally()
-// runs the summary is durably persisted.
-//
-// We gate on lastSummaryStored so we don't finalize after every idle
-// timeout between tool calls — only when a real Stop event produced
-// a summary record.
-try {
-const pendingStore = this.sessionManager.getPendingMessageStore();
-const pendingNow = pendingStore.getPendingCount(sessionDbId);
-if (session.lastSummaryStored === true && pendingNow === 0) {
-logger.info('SESSION', 'Stop-hook self-clean: summary persisted + queue drained → finalizing', {
-sessionId: sessionDbId
-});
-// finalizeSession is idempotent and does NOT touch the in-memory map —
-// it only marks DB completed, drains any orphaned pending messages,
-// and broadcasts the completion event. sessionManager cleanup is
-// handled below by the existing abort/removeSessionImmediate flow.
-this.completionHandler.finalizeSession(sessionDbId);
-// Clear the flag so a subsequent re-activation of the same session
-// does not fire finalize again without a fresh summary.
-session.lastSummaryStored = false;
-// Ensure the session is removed from the active-sessions map so the
-// Stop-hook path doesn't depend on a later idle-timeout tick.
-this.sessionManager.removeSessionImmediate(sessionDbId);
-return;
-}
-} catch (err) {
-logger.warn('SESSION', 'finalizeSession failed in SessionRoutes generator .finally()', {
-sessionId: sessionDbId
-}, err as Error);
-}
// Crash recovery: If not aborted and still has work, restart (with limit)
if (!wasAborted) {
const pendingStore = this.sessionManager.getPendingMessageStore();
@@ -353,16 +317,34 @@ export class SessionRoutes extends BaseRouteHandler {
session.consecutiveRestarts = (session.consecutiveRestarts || 0) + 1; // Keep for logging
if (!restartAllowed) {
-logger.error('SESSION', `CRITICAL: Restart guard tripped — too many restarts in window, stopping to prevent runaway costs`, {
+logger.error('SESSION', `CRITICAL: Restart guard tripped — session is dead, draining pending messages and terminating`, {
sessionId: sessionDbId,
pendingCount,
restartsInWindow: session.restartGuard.restartsInWindow,
-windowMs: session.restartGuard.windowMs,
-maxRestarts: session.restartGuard.maxRestarts,
-action: 'Generator will NOT restart. Check logs for root cause. Messages remain in pending state.'
+consecutiveFailures: session.restartGuard.consecutiveFailuresSinceSuccess,
+maxConsecutiveFailures: session.restartGuard.maxConsecutiveFailures,
+action: 'Generator will NOT restart. Pending messages drained to abandoned. Check logs for root cause.'
});
// Don't restart - abort to prevent further API calls
// Don't restart - abort to prevent further API calls AND drain pending
// messages so the session doesn't reappear in getSessionsWithPendingMessages
// and trigger another auto-start cycle.
session.abortController.abort();
try {
const drained = pendingStore.transitionMessagesTo('abandoned', { sessionDbId });
if (drained > 0) {
logger.error('SESSION', 'Drained pending messages to abandoned after restart guard trip', {
sessionId: sessionDbId,
drained,
});
}
} catch (drainErr) {
const normalized = drainErr instanceof Error ? drainErr : new Error(String(drainErr));
logger.error('SESSION', 'Failed to drain pending messages after restart guard trip', {
sessionId: sessionDbId,
}, normalized);
}
return;
}
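The guard-trip logging above references a windowed restart budget (`restartsInWindow`, `windowMs`, `maxRestarts`). A minimal sketch of such a sliding-window check, with field names taken from the log payloads but the counting logic an illustrative assumption rather than the project's implementation:

```typescript
// Hypothetical sliding-window restart guard. Field names mirror the log
// payloads above; the eviction/budget logic here is an assumption.
interface RestartGuard {
  windowMs: number;
  maxRestarts: number;
  restartTimestamps: number[];
}

function restartAllowed(guard: RestartGuard, now: number = Date.now()): boolean {
  // Evict restarts older than the window, then compare against the budget.
  guard.restartTimestamps = guard.restartTimestamps.filter(t => now - t <= guard.windowMs);
  return guard.restartTimestamps.length < guard.maxRestarts;
}
```

Because the window slides, a session that stops failing naturally regains its restart budget as old timestamps age out.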
@@ -371,7 +353,9 @@ export class SessionRoutes extends BaseRouteHandler {
pendingCount,
consecutiveRestarts: session.consecutiveRestarts,
restartsInWindow: session.restartGuard!.restartsInWindow,
maxRestarts: session.restartGuard!.maxRestarts
maxRestarts: session.restartGuard!.maxRestarts,
consecutiveFailures: session.restartGuard!.consecutiveFailuresSinceSuccess,
maxConsecutiveFailures: session.restartGuard!.maxConsecutiveFailures
});
// Abort OLD controller before replacing to prevent child process leaks
@@ -411,21 +395,106 @@ export class SessionRoutes extends BaseRouteHandler {
setupRoutes(app: express.Application): void {
// Legacy session endpoints (use sessionDbId)
app.post('/sessions/:sessionDbId/init', this.handleSessionInit.bind(this));
app.post('/sessions/:sessionDbId/observations', this.handleObservations.bind(this));
app.post('/sessions/:sessionDbId/summarize', this.handleSummarize.bind(this));
app.post(
'/sessions/:sessionDbId/init',
validateBody(SessionRoutes.legacySessionInitSchema),
this.handleSessionInit.bind(this)
);
app.post(
'/sessions/:sessionDbId/observations',
validateBody(SessionRoutes.legacyObservationsSchema),
this.handleObservations.bind(this)
);
app.post(
'/sessions/:sessionDbId/summarize',
validateBody(SessionRoutes.legacySummarizeSchema),
this.handleSummarize.bind(this)
);
app.get('/sessions/:sessionDbId/status', this.handleSessionStatus.bind(this));
app.delete('/sessions/:sessionDbId', this.handleSessionDelete.bind(this));
app.post('/sessions/:sessionDbId/complete', this.handleSessionComplete.bind(this));
// New session endpoints (use contentSessionId)
app.post('/api/sessions/init', this.handleSessionInitByClaudeId.bind(this));
app.post('/api/sessions/observations', this.handleObservationsByClaudeId.bind(this));
app.post('/api/sessions/summarize', this.handleSummarizeByClaudeId.bind(this));
app.post('/api/sessions/complete', this.handleCompleteByClaudeId.bind(this));
app.post(
'/api/sessions/init',
validateBody(SessionRoutes.sessionInitByClaudeIdSchema),
this.handleSessionInitByClaudeId.bind(this)
);
app.post(
'/api/sessions/observations',
validateBody(SessionRoutes.observationsByClaudeIdSchema),
this.handleObservationsByClaudeId.bind(this)
);
app.post(
'/api/sessions/summarize',
validateBody(SessionRoutes.summarizeByClaudeIdSchema),
this.handleSummarizeByClaudeId.bind(this)
);
app.post(
'/api/sessions/complete',
validateBody(SessionRoutes.completeByClaudeIdSchema),
this.handleCompleteByClaudeId.bind(this)
);
app.get('/api/sessions/status', this.handleStatusByClaudeId.bind(this));
}
// Plan 06 Phase 3 — per-route Zod schemas. Schemas live at the top of the
// owning route file and gate body validation via `validateBody`.
// `passthrough()` preserves optional/forwarded fields the handlers
// already accept (e.g. cwd, agentId, agentType, platformSource).
private static readonly legacySessionInitSchema = z.object({
userPrompt: z.string().optional(),
promptNumber: z.number().int().optional(),
}).passthrough();
private static readonly legacyObservationsSchema = z.object({
tool_name: z.string().min(1),
tool_input: z.unknown().optional(),
tool_response: z.unknown().optional(),
prompt_number: z.number().int().optional(),
cwd: z.string().optional(),
}).passthrough();
private static readonly legacySummarizeSchema = z.object({
last_assistant_message: z.string().optional(),
}).passthrough();
private static readonly sessionInitByClaudeIdSchema = z.object({
contentSessionId: z.string().min(1),
project: z.string().optional(),
prompt: z.string().optional(),
platformSource: z.string().optional(),
customTitle: z.string().optional(),
}).passthrough();
private static readonly observationsByClaudeIdSchema = z.object({
contentSessionId: z.string().min(1),
tool_name: z.string().min(1),
tool_input: z.unknown().optional(),
tool_response: z.unknown().optional(),
cwd: z.string().optional(),
agentId: z.string().optional(),
agentType: z.string().optional(),
platformSource: z.string().optional(),
// Idempotency key for the UNIQUE(content_session_id, tool_use_id) index
// added in Plan 01 Phase 1. Accept both snake and camel shapes so
// cross-process callers using either convention still deduplicate.
tool_use_id: z.string().optional(),
toolUseId: z.string().optional(),
}).passthrough();
private static readonly summarizeByClaudeIdSchema = z.object({
contentSessionId: z.string().min(1),
last_assistant_message: z.string().optional(),
agentId: z.string().optional(),
platformSource: z.string().optional(),
}).passthrough();
private static readonly completeByClaudeIdSchema = z.object({
contentSessionId: z.string().min(1),
platformSource: z.string().optional(),
}).passthrough();
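The observations schema accepts both `tool_use_id` and `toolUseId` so callers using either convention still deduplicate. The handler's coalescing of the two shapes can be sketched as a small helper (the function name is illustrative, not from the codebase):

```typescript
// Coalesce the two accepted id shapes into one canonical value,
// mirroring the handler's `typeof` checks: snake_case wins, then
// camelCase, else undefined (which SQLite treats as a distinct NULL).
function coalesceToolUseId(body: Record<string, unknown>): string | undefined {
  const snake = body['tool_use_id'];
  if (typeof snake === 'string') return snake;
  const camel = body['toolUseId'];
  return typeof camel === 'string' ? camel : undefined;
}
```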
/**
* Initialize a new session
*/
@@ -600,98 +669,40 @@ export class SessionRoutes extends BaseRouteHandler {
* Body: { contentSessionId, tool_name, tool_input, tool_response, cwd }
*/
private handleObservationsByClaudeId = this.wrapHandler((req: Request, res: Response): void => {
const { contentSessionId, tool_name, tool_input, tool_response, cwd, agentId, agentType } = req.body;
const platformSource = normalizePlatformSource(req.body.platformSource);
const project = typeof cwd === 'string' && cwd.trim() ? getProjectContext(cwd).primary : '';
if (!contentSessionId) {
return this.badRequest(res, 'Missing contentSessionId');
}
// Load skip tools from settings
const settings = SettingsDefaultsManager.loadFromFile(USER_SETTINGS_PATH);
const skipTools = new Set(settings.CLAUDE_MEM_SKIP_TOOLS.split(',').map(t => t.trim()).filter(Boolean));
// Skip low-value or meta tools
if (skipTools.has(tool_name)) {
logger.debug('SESSION', 'Skipping observation for tool', { tool_name });
res.json({ status: 'skipped', reason: 'tool_excluded' });
return;
}
// Skip meta-observations: file operations on session-memory files
const fileOperationTools = new Set(['Edit', 'Write', 'Read', 'NotebookEdit']);
if (fileOperationTools.has(tool_name) && tool_input) {
const filePath = tool_input.file_path || tool_input.notebook_path;
if (filePath && filePath.includes('session-memory')) {
logger.debug('SESSION', 'Skipping meta-observation for session-memory file', {
tool_name,
file_path: filePath
});
res.json({ status: 'skipped', reason: 'session_memory_meta' });
return;
}
}
const store = this.dbManager.getSessionStore();
let sessionDbId: number;
let promptNumber: number;
try {
sessionDbId = store.createSDKSession(contentSessionId, project, '', undefined, platformSource);
promptNumber = store.getPromptNumberFromUserPrompts(contentSessionId);
} catch (error) {
const normalizedError = error instanceof Error ? error : new Error(String(error));
logger.error('HTTP', 'Observation storage failed', { contentSessionId, tool_name }, normalizedError);
res.json({ stored: false, reason: normalizedError.message });
return;
}
// Privacy check: skip if user prompt was entirely private
const userPrompt = PrivacyCheckValidator.checkUserPromptPrivacy(
store,
const {
contentSessionId,
promptNumber,
'observation',
sessionDbId,
{ tool_name }
);
if (!userPrompt) {
res.json({ status: 'skipped', reason: 'private' });
return;
}
// Strip memory tags from tool_input and tool_response
const cleanedToolInput = tool_input !== undefined
? stripMemoryTagsFromJson(JSON.stringify(tool_input))
: '{}';
const cleanedToolResponse = tool_response !== undefined
? stripMemoryTagsFromJson(JSON.stringify(tool_response))
: '{}';
// Queue observation
this.sessionManager.queueObservation(sessionDbId, {
tool_name,
tool_input: cleanedToolInput,
tool_response: cleanedToolResponse,
prompt_number: promptNumber,
cwd: cwd || (() => {
logger.error('SESSION', 'Missing cwd when queueing observation in SessionRoutes', {
sessionId: sessionDbId,
tool_name
});
return '';
})(),
agentId: typeof agentId === 'string' ? agentId : undefined,
agentType: typeof agentType === 'string' ? agentType : undefined,
tool_input,
tool_response,
cwd,
platformSource,
agentId,
agentType,
tool_use_id,
toolUseId,
} = req.body;
const result = ingestObservation({
contentSessionId,
toolName: tool_name,
toolInput: tool_input,
toolResponse: tool_response,
cwd,
platformSource,
agentId,
agentType,
toolUseId: typeof tool_use_id === 'string' ? tool_use_id : (typeof toolUseId === 'string' ? toolUseId : undefined),
});
// Ensure SDK agent is running
this.ensureGeneratorRunning(sessionDbId, 'observation');
if (!result.ok) {
res.status(result.status ?? 500).json({ stored: false, reason: result.reason });
return;
}
// Broadcast observation queued event
this.eventBroadcaster.broadcastObservationQueued(sessionDbId);
if ('status' in result && result.status === 'skipped') {
res.json({ status: 'skipped', reason: result.reason });
return;
}
res.json({ status: 'queued' });
});
@@ -707,10 +718,6 @@ export class SessionRoutes extends BaseRouteHandler {
const { contentSessionId, last_assistant_message, agentId } = req.body;
const platformSource = normalizePlatformSource(req.body.platformSource);
if (!contentSessionId) {
return this.badRequest(res, 'Missing contentSessionId');
}
// Belt-and-suspenders: reject summarize requests from subagent context.
// Gate on agentId only — agentType alone indicates a main session started with
// --agent, which still owns its summary. Mirrors the hook-side guard in summarize.ts.
@@ -802,10 +809,6 @@ export class SessionRoutes extends BaseRouteHandler {
logger.info('HTTP', '→ POST /api/sessions/complete', { contentSessionId });
if (!contentSessionId) {
return this.badRequest(res, 'Missing contentSessionId');
}
const store = this.dbManager.getSessionStore();
// Look up sessionDbId from contentSessionId (createSDKSession is idempotent)
@@ -854,10 +857,25 @@ export class SessionRoutes extends BaseRouteHandler {
// Only contentSessionId is truly required — Cursor and other platforms
// may omit prompt/project in their payload (#838, #1049)
const project = req.body.project || 'unknown';
const prompt = req.body.prompt || '[media prompt]';
let prompt = req.body.prompt || '[media prompt]';
const platformSource = normalizePlatformSource(req.body.platformSource);
const customTitle = req.body.customTitle || undefined;
const promptByteLength = Buffer.byteLength(prompt, 'utf8');
if (promptByteLength > MAX_USER_PROMPT_BYTES) {
logger.warn('HTTP', 'SessionRoutes: oversized prompt truncated at session-init boundary', {
project,
contentSessionId,
promptByteLength,
maxBytes: MAX_USER_PROMPT_BYTES,
preview: prompt.slice(0, 200)
});
const buf = Buffer.from(prompt, 'utf8');
let end = MAX_USER_PROMPT_BYTES;
while (end > 0 && (buf[end] & 0xc0) === 0x80) end--;
prompt = buf.subarray(0, end).toString('utf8');
}
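The byte-boundary walk above can be exercised as a standalone helper. UTF-8 continuation bytes all match the bit pattern `0b10xxxxxx`, so backing up while `(byte & 0xc0) === 0x80` lands the cut on a code-point boundary (helper name is illustrative):

```typescript
// Truncate a string to at most maxBytes of UTF-8 without splitting a
// multi-byte character, matching the boundary walk in the handler above.
function truncateUtf8Bytes(input: string, maxBytes: number): string {
  const buf = Buffer.from(input, 'utf8');
  if (buf.byteLength <= maxBytes) return input;
  let end = maxBytes;
  // Continuation bytes are 0b10xxxxxx; back up until a code-point start.
  while (end > 0 && (buf[end] & 0xc0) === 0x80) end--;
  return buf.subarray(0, end).toString('utf8');
}
```

A naive `prompt.slice(0, MAX_USER_PROMPT_BYTES)` would count UTF-16 code units, not bytes, and could still emit an invalid trailing sequence; the buffer walk avoids both problems.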
logger.info('HTTP', 'SessionRoutes: handleSessionInitByClaudeId called', {
contentSessionId,
project,
@@ -866,11 +884,6 @@ export class SessionRoutes extends BaseRouteHandler {
customTitle
});
// Validate required parameters
if (!this.validateRequired(req, res, ['contentSessionId'])) {
return;
}
const store = this.dbManager.getSessionStore();
// Step 1: Create/get SDK session (idempotent INSERT OR IGNORE)
@@ -6,6 +6,7 @@
*/
import express, { Request, Response } from 'express';
import { z } from 'zod';
import path from 'path';
import { readFileSync, writeFileSync, existsSync, renameSync, mkdirSync } from 'fs';
import { homedir } from 'os';
@@ -13,11 +14,27 @@ import { getPackageRoot } from '../../../../shared/paths.js';
import { logger } from '../../../../utils/logger.js';
import { SettingsManager } from '../../SettingsManager.js';
import { getBranchInfo, switchBranch, pullUpdates } from '../../BranchManager.js';
import { ModeManager } from '../../domain/ModeManager.js';
import { ModeManager } from '../../../domain/ModeManager.js';
import { BaseRouteHandler } from '../BaseRouteHandler.js';
import { validateBody } from '../middleware/validateBody.js';
import { SettingsDefaultsManager } from '../../../../shared/SettingsDefaultsManager.js';
import { clearPortCache } from '../../../../shared/worker-utils.js';
// Plan 06 Phase 3 — per-route Zod schemas. Semantic validation of individual
// CLAUDE_MEM_* keys still happens inside `validateSettings()` because the
// allowed-value rules are richer than what Zod expresses here.
const updateSettingsSchema = z.object({}).passthrough();
const toggleMcpSchema = z.object({
enabled: z.boolean(),
}).passthrough();
const switchBranchSchema = z.object({
branch: z.string().min(1),
}).passthrough();
const updateBranchSchema = z.object({}).passthrough();
export class SettingsRoutes extends BaseRouteHandler {
constructor(
private settingsManager: SettingsManager
@@ -28,16 +45,16 @@ export class SettingsRoutes extends BaseRouteHandler {
setupRoutes(app: express.Application): void {
// Settings endpoints
app.get('/api/settings', this.handleGetSettings.bind(this));
app.post('/api/settings', this.handleUpdateSettings.bind(this));
app.post('/api/settings', validateBody(updateSettingsSchema), this.handleUpdateSettings.bind(this));
// MCP toggle endpoints
app.get('/api/mcp/status', this.handleGetMcpStatus.bind(this));
app.post('/api/mcp/toggle', this.handleToggleMcp.bind(this));
app.post('/api/mcp/toggle', validateBody(toggleMcpSchema), this.handleToggleMcp.bind(this));
// Branch switching endpoints
app.get('/api/branch/status', this.handleGetBranchStatus.bind(this));
app.post('/api/branch/switch', this.handleSwitchBranch.bind(this));
app.post('/api/branch/update', this.handleUpdateBranch.bind(this));
app.post('/api/branch/switch', validateBody(switchBranchSchema), this.handleSwitchBranch.bind(this));
app.post('/api/branch/update', validateBody(updateBranchSchema), this.handleUpdateBranch.bind(this));
}
/**
@@ -156,12 +173,7 @@ export class SettingsRoutes extends BaseRouteHandler {
* Body: { enabled: boolean }
*/
private handleToggleMcp = this.wrapHandler((req: Request, res: Response): void => {
const { enabled } = req.body;
if (typeof enabled !== 'boolean') {
this.badRequest(res, 'enabled must be a boolean');
return;
}
const { enabled } = req.body as z.infer<typeof toggleMcpSchema>;
this.toggleMcp(enabled);
res.json({ success: true, enabled: this.isMcpEnabled() });
@@ -180,12 +192,7 @@ export class SettingsRoutes extends BaseRouteHandler {
* Body: { branch: "main" | "beta/7.0" }
*/
private handleSwitchBranch = this.wrapHandler(async (req: Request, res: Response): Promise<void> => {
const { branch } = req.body;
if (!branch) {
res.status(400).json({ success: false, error: 'Missing branch parameter' });
return;
}
const { branch } = req.body as z.infer<typeof switchBranchSchema>;
// Validate branch name
const allowedBranches = ['main', 'beta/7.0', 'feature/bun-executable'];
@@ -15,6 +15,40 @@ import { DatabaseManager } from '../../DatabaseManager.js';
import { SessionManager } from '../../SessionManager.js';
import { BaseRouteHandler } from '../BaseRouteHandler.js';
/**
 * Plan 06 Phase 6: viewer.html is loaded once at module init and held in
* memory for the lifetime of the worker process. Process restart is the
* cache-invalidation event; no fs.watch, no TTL, no refresh.
*
* We probe the same two on-disk locations the legacy handler did so the
* dev (cache) and installed (marketplace) layouts both keep working.
*/
const VIEWER_HTML_CANDIDATE_PATHS: readonly string[] = (() => {
const packageRoot = getPackageRoot();
return [
path.join(packageRoot, 'ui', 'viewer.html'),
path.join(packageRoot, 'plugin', 'ui', 'viewer.html'),
];
})();
const resolvedViewerHtmlPath: string | null =
VIEWER_HTML_CANDIDATE_PATHS.find((candidate) => existsSync(candidate)) ?? null;
const viewerHtmlBytes: Buffer | null = resolvedViewerHtmlPath
? readFileSync(resolvedViewerHtmlPath)
: null;
if (resolvedViewerHtmlPath) {
logger.info('SYSTEM', 'Cached viewer.html at boot', {
path: resolvedViewerHtmlPath,
bytes: viewerHtmlBytes!.byteLength,
});
} else {
logger.warn('SYSTEM', 'viewer.html not found at any expected location at boot', {
candidates: VIEWER_HTML_CANDIDATE_PATHS,
});
}
export class ViewerRoutes extends BaseRouteHandler {
constructor(
private sseBroadcaster: SSEBroadcaster,
@@ -49,26 +83,15 @@ export class ViewerRoutes extends BaseRouteHandler {
});
/**
* Serve viewer UI
* Serve viewer UI from the in-memory cache populated at module init.
 * Plan 06 Phase 6: single read at boot, no per-request fs hit.
*/
private handleViewerUI = this.wrapHandler((req: Request, res: Response): void => {
const packageRoot = getPackageRoot();
// Try cache structure first (ui/viewer.html), then marketplace structure (plugin/ui/viewer.html)
const viewerPaths = [
path.join(packageRoot, 'ui', 'viewer.html'),
path.join(packageRoot, 'plugin', 'ui', 'viewer.html')
];
const viewerPath = viewerPaths.find(p => existsSync(p));
if (!viewerPath) {
if (!viewerHtmlBytes) {
throw new Error('Viewer UI not found at any expected location');
}
const html = readFileSync(viewerPath, 'utf-8');
res.setHeader('Content-Type', 'text/html');
res.send(html);
res.setHeader('Content-Type', 'text/html; charset=utf-8');
res.send(viewerHtmlBytes);
});
/**
@@ -0,0 +1,406 @@
/**
* Worker HTTP shared ingest helpers.
*
* Per PATHFINDER-2026-04-22 plan 03 phase 0:
* `ingestObservation`, `ingestPrompt`, `ingestSummary` are the single
* in-process implementation of the worker's three ingest paths. The HTTP
* route handlers (cross-process callers) and worker-internal producers
* (transcript processor, ResponseProcessor) BOTH delegate here.
*
* No HTTP loopback. No duplicated insert logic. One helper, N callers.
*
* Wiring: `WorkerService` registers its `sessionManager`, `dbManager`, and
* `sessionEventBroadcaster` once at startup via `setIngestContext`. The
* helpers fail fast if called before registration.
*/
import { logger } from '../../../utils/logger.js';
import type { SessionManager } from '../SessionManager.js';
import type { DatabaseManager } from '../DatabaseManager.js';
import type { SessionEventBroadcaster } from '../events/SessionEventBroadcaster.js';
import type { ParsedSummary } from '../../../sdk/parser.js';
import { stripMemoryTagsFromJson } from '../../../utils/tag-stripping.js';
import { isProjectExcluded } from '../../../utils/project-filter.js';
import { SettingsDefaultsManager } from '../../../shared/SettingsDefaultsManager.js';
import { USER_SETTINGS_PATH } from '../../../shared/paths.js';
import { getProjectContext } from '../../../utils/project-name.js';
import { normalizePlatformSource } from '../../../shared/platform-source.js';
import { PrivacyCheckValidator } from '../validation/PrivacyCheckValidator.js';
import { EventEmitter } from 'events';
// ============================================================================
// Event bus — Phase 2 (`summaryStoredEvent`) consumers attach here.
// ============================================================================
/**
* Event payload emitted exactly once per successful `ingestSummary` call that
* actually stored a summary row. `messageId` is the pending_messages row id
* that produced the summary; `sessionId` is the contentSessionId.
*
 * Currently dormant: the only consumer (the blocking `/api/session/end`
* endpoint) was removed when the Stop hook went fire-and-forget. Kept for
* future internal subscribers; emissions are cheap no-ops with no listeners.
*/
export interface SummaryStoredEvent {
sessionId: string;
messageId: number;
}
class IngestEventBus extends EventEmitter {
/**
* Recent summaryStoredEvent buffer keyed by sessionId. Originally protected
* the register-after-emit race for the blocking `/api/session/end` handler.
* Currently unused (handler removed when Stop hook went fire-and-forget);
* preserved so any future subscriber gets the same race-free contract.
*/
private readonly recentStored = new Map<string, { event: SummaryStoredEvent; at: number }>();
private static readonly RECENT_EVENT_TTL_MS = 60_000;
constructor() {
super();
// Disable the default 10-listener warning. With no current consumers
// this is moot, but kept for parity if future subscribers attach.
this.setMaxListeners(0);
this.on('summaryStoredEvent', (evt: SummaryStoredEvent) => {
this.recentStored.set(evt.sessionId, { event: evt, at: Date.now() });
this.evictExpiredStored();
});
}
/** Read a recently-emitted summaryStoredEvent (idempotent; TTL-evicted). */
takeRecentSummaryStored(sessionId: string): SummaryStoredEvent | undefined {
const entry = this.recentStored.get(sessionId);
if (!entry) return undefined;
if (Date.now() - entry.at > IngestEventBus.RECENT_EVENT_TTL_MS) {
this.recentStored.delete(sessionId);
return undefined;
}
return entry.event;
}
private evictExpiredStored(): void {
const cutoff = Date.now() - IngestEventBus.RECENT_EVENT_TTL_MS;
for (const [key, entry] of this.recentStored) {
if (entry.at < cutoff) this.recentStored.delete(key);
}
}
}
/**
* Process-local event bus for ingestion lifecycle events.
*
 * Single Node EventEmitter: there is no third event bus in the worker.
* `SessionManager` already uses Node EventEmitter for queue notifications
* (`src/services/worker/SessionManager.ts:25`), and
* `SessionQueueProcessor` consumes EventEmitter events
* (`src/services/queue/SessionQueueProcessor.ts:18`); this module follows
* the same pattern at the ingestion layer.
*/
export const ingestEventBus = new IngestEventBus();
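The register-after-emit protection described above can be re-created in miniature: a bus that remembers the last emission per key so a subscriber attaching after the emit still observes it. This is a trimmed sketch under assumed names, not the module's actual class:

```typescript
import { EventEmitter } from 'node:events';

interface StoredEvent { sessionId: string; messageId: number }

// Minimal replay buffer: the bus records the most recent 'stored' event
// per session so late readers can poll it instead of missing the emit.
class ReplayBus extends EventEmitter {
  private readonly recent = new Map<string, { event: StoredEvent; at: number }>();

  constructor(private readonly ttlMs: number = 60_000) {
    super();
    this.on('stored', (evt: StoredEvent) => {
      this.recent.set(evt.sessionId, { event: evt, at: Date.now() });
    });
  }

  // Returns the buffered event if it is still within the TTL.
  takeRecent(sessionId: string): StoredEvent | undefined {
    const entry = this.recent.get(sessionId);
    if (!entry || Date.now() - entry.at > this.ttlMs) return undefined;
    return entry.event;
  }
}
```

Node's `emit` runs listeners synchronously, so the buffer is populated before `emit` returns; that is what makes the poll-after-emit contract race-free.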
// ============================================================================
// Context registration
// ============================================================================
interface IngestContext {
sessionManager: SessionManager;
dbManager: DatabaseManager;
eventBroadcaster: SessionEventBroadcaster;
/** Optional callback to (re)start the SDK generator after enqueue. */
ensureGeneratorRunning?: (sessionDbId: number, source: string) => void;
}
let ctx: IngestContext | null = null;
/**
* Register the worker-scoped services the ingest helpers depend on.
* Called once from `WorkerService` constructor.
*/
export function setIngestContext(next: IngestContext): void {
ctx = next;
}
/**
* Attach the generator-running callback after `SessionRoutes` has been
* constructed. `setIngestContext` is called early in `WorkerService` startup
* (before routes exist), so the callback is wired in as a second step once
* `SessionRoutes.ensureGeneratorRunning` is available.
*
* Without this, transcript-watcher observations queue via
* `ingestObservation()` but the SDK generator never auto-starts to drain
* them.
*/
export function attachIngestGeneratorStarter(
ensureGeneratorRunning: (sessionDbId: number, source: string) => void,
): void {
requireContext().ensureGeneratorRunning = ensureGeneratorRunning;
}
function requireContext(): IngestContext {
if (!ctx) {
throw new Error('ingest helpers used before setIngestContext() — wiring bug');
}
return ctx;
}
// ============================================================================
// Result type
// ============================================================================
export type IngestResult =
| { ok: true; sessionDbId: number; messageId?: number }
| { ok: true; status: 'skipped'; reason: string }
| { ok: false; reason: string; status?: number };
// ============================================================================
// Observation
// ============================================================================
export interface ObservationPayload {
contentSessionId: string;
toolName: string;
toolInput: unknown;
toolResponse: unknown;
cwd?: string;
platformSource?: string;
agentId?: string;
agentType?: string;
toolUseId?: string;
}
/**
* Ingest an observation: resolve session, apply project / skip-tool filters,
* strip privacy tags, persist to pending_messages, ensure the SDK generator
* is running.
*
* Same implementation for cross-process HTTP callers and worker-internal
* callers (transcript processor, ResponseProcessor side-effects).
*/
export function ingestObservation(payload: ObservationPayload): IngestResult {
const { sessionManager, dbManager, eventBroadcaster, ensureGeneratorRunning } = requireContext();
if (!payload.contentSessionId) {
return { ok: false, reason: 'missing contentSessionId', status: 400 };
}
if (!payload.toolName) {
return { ok: false, reason: 'missing toolName', status: 400 };
}
const platformSource = normalizePlatformSource(payload.platformSource);
const cwd = typeof payload.cwd === 'string' ? payload.cwd : '';
const project = cwd.trim() ? getProjectContext(cwd).primary : '';
const settings = SettingsDefaultsManager.loadFromFile(USER_SETTINGS_PATH);
// Project exclusion (the same gate the hook handler applies).
if (cwd && isProjectExcluded(cwd, settings.CLAUDE_MEM_EXCLUDED_PROJECTS)) {
return { ok: true, status: 'skipped', reason: 'project_excluded' };
}
// Skip low-value or meta tools per user settings.
const skipTools = new Set(
settings.CLAUDE_MEM_SKIP_TOOLS.split(',').map(t => t.trim()).filter(Boolean)
);
if (skipTools.has(payload.toolName)) {
return { ok: true, status: 'skipped', reason: 'tool_excluded' };
}
// Skip meta-observations: file operations on session-memory files.
const fileOperationTools = new Set(['Edit', 'Write', 'Read', 'NotebookEdit']);
if (fileOperationTools.has(payload.toolName) && payload.toolInput && typeof payload.toolInput === 'object') {
const input = payload.toolInput as { file_path?: string; notebook_path?: string };
const filePath = input.file_path || input.notebook_path;
if (filePath && filePath.includes('session-memory')) {
return { ok: true, status: 'skipped', reason: 'session_memory_meta' };
}
}
const store = dbManager.getSessionStore();
let sessionDbId: number;
let promptNumber: number;
try {
sessionDbId = store.createSDKSession(payload.contentSessionId, project, '', undefined, platformSource);
promptNumber = store.getPromptNumberFromUserPrompts(payload.contentSessionId);
} catch (error) {
const message = error instanceof Error ? error.message : String(error);
logger.error('INGEST', 'Observation session resolution failed', {
contentSessionId: payload.contentSessionId,
toolName: payload.toolName,
}, error instanceof Error ? error : new Error(message));
return { ok: false, reason: message, status: 500 };
}
// Privacy: skip if user prompt was entirely private.
const userPrompt = PrivacyCheckValidator.checkUserPromptPrivacy(
store,
payload.contentSessionId,
promptNumber,
'observation',
sessionDbId,
{ tool_name: payload.toolName }
);
if (!userPrompt) {
return { ok: true, status: 'skipped', reason: 'private' };
}
const cleanedToolInput = payload.toolInput !== undefined
? stripMemoryTagsFromJson(JSON.stringify(payload.toolInput))
: '{}';
const cleanedToolResponse = payload.toolResponse !== undefined
? stripMemoryTagsFromJson(JSON.stringify(payload.toolResponse))
: '{}';
sessionManager.queueObservation(sessionDbId, {
tool_name: payload.toolName,
tool_input: cleanedToolInput,
tool_response: cleanedToolResponse,
prompt_number: promptNumber,
cwd: cwd || (() => {
logger.error('INGEST', 'Missing cwd when ingesting observation', {
sessionId: sessionDbId,
toolName: payload.toolName,
});
return '';
})(),
agentId: typeof payload.agentId === 'string' ? payload.agentId : undefined,
agentType: typeof payload.agentType === 'string' ? payload.agentType : undefined,
// Forward the provider-assigned tool-use id so the
// UNIQUE(content_session_id, tool_use_id) idempotency index from Plan 01
// can actually collapse replays. SQLite treats NULL tool_use_id values as
// distinct, so dropping it here silently defeats the INSERT OR IGNORE.
toolUseId: typeof payload.toolUseId === 'string' ? payload.toolUseId : undefined,
});
ensureGeneratorRunning?.(sessionDbId, 'observation');
eventBroadcaster.broadcastObservationQueued(sessionDbId);
return { ok: true, sessionDbId };
}
// ============================================================================
// Prompt
// ============================================================================
export interface PromptPayload {
contentSessionId: string;
/** The user prompt text (must not contain stripped tags). */
prompt: string;
cwd?: string;
platformSource?: string;
promptNumber?: number;
}
/**
* Ingest a user prompt. Used by the SessionStart / UserPromptSubmit hooks and
* by transcript-driven session inits. Wraps `SessionStore.appendUserPrompt`
* so cross-process and in-process callers share the same path.
*/
export function ingestPrompt(payload: PromptPayload): IngestResult {
const { dbManager } = requireContext();
if (!payload.contentSessionId) {
return { ok: false, reason: 'missing contentSessionId', status: 400 };
}
if (typeof payload.prompt !== 'string') {
return { ok: false, reason: 'missing prompt text', status: 400 };
}
const platformSource = normalizePlatformSource(payload.platformSource);
const cwd = typeof payload.cwd === 'string' ? payload.cwd : '';
const project = cwd.trim() ? getProjectContext(cwd).primary : '';
try {
const store = dbManager.getSessionStore();
const sessionDbId = store.createSDKSession(payload.contentSessionId, project, payload.prompt, undefined, platformSource);
return { ok: true, sessionDbId };
} catch (error) {
const message = error instanceof Error ? error.message : String(error);
return { ok: false, reason: message, status: 500 };
}
}
// ============================================================================
// Summary
// ============================================================================
/**
* Two shapes of ingest:
* - "queue a summarize request" (cross-process hook trigger): goes via
* `SessionManager.queueSummarize` so the SDK agent will produce the XML
* payload on its next iteration.
* - "the SDK agent already produced the parsed summary": goes via
* `ingestSummary({ parsed, sessionDbId, messageId })`. Stored synchronously,
* emits `summaryStoredEvent` for the blocking endpoint in plan 05.
*/
export type SummaryPayload =
| {
kind: 'queue';
contentSessionId: string;
lastAssistantMessage?: string;
platformSource?: string;
cwd?: string;
}
| {
kind: 'parsed';
sessionDbId: number;
messageId: number;
contentSessionId: string;
parsed: ParsedSummary;
};
export function ingestSummary(payload: SummaryPayload): IngestResult {
// The 'parsed' branch is a pure post-store notification — it only touches
// the module-scope event bus, not the database/session manager. Resolving
// requireContext() before the branch split breaks unit tests that drive
// ResponseProcessor with a mocked sessionManager but no setIngestContext.
// Only the 'queue' branch needs the worker-internal context.
if (payload.kind === 'queue') {
const { sessionManager, dbManager, ensureGeneratorRunning } = requireContext();
if (!payload.contentSessionId) {
return { ok: false, reason: 'missing contentSessionId', status: 400 };
}
const platformSource = normalizePlatformSource(payload.platformSource);
const cwd = typeof payload.cwd === 'string' ? payload.cwd : '';
const project = cwd.trim() ? getProjectContext(cwd).primary : '';
let sessionDbId: number;
try {
sessionDbId = dbManager.getSessionStore().createSDKSession(payload.contentSessionId, project, '', undefined, platformSource);
} catch (error) {
const message = error instanceof Error ? error.message : String(error);
return { ok: false, reason: message, status: 500 };
}
sessionManager.queueSummarize(sessionDbId, payload.lastAssistantMessage);
ensureGeneratorRunning?.(sessionDbId, 'summarize');
return { ok: true, sessionDbId };
}
// kind === 'parsed' — the SDK agent has produced a summary; store via
// session store and emit the summaryStoredEvent for blocking consumers.
// Skipped summaries (`<skip_summary/>`) are recorded as a successful no-op:
// they have no content to persist, but consumers should still be unblocked.
if (payload.parsed.skipped) {
ingestEventBus.emit('summaryStoredEvent', {
sessionId: payload.contentSessionId,
messageId: payload.messageId,
} satisfies SummaryStoredEvent);
return { ok: true, sessionDbId: payload.sessionDbId, messageId: payload.messageId };
}
// The actual storage of the parsed summary remains co-transactional with
// the observation batch in `processAgentResponse`. By the time this branch
// is reached the row is already persisted; this call is the canonical
// post-store notification path so every producer fires the event the same
// way (Plan 03 Phase 2 + greploop fix — sole emitter of summaryStoredEvent).
ingestEventBus.emit('summaryStoredEvent', {
sessionId: payload.contentSessionId,
messageId: payload.messageId,
} satisfies SummaryStoredEvent);
return { ok: true, sessionDbId: payload.sessionDbId, messageId: payload.messageId };
}
+10 -10
@@ -6,7 +6,7 @@
*/
import { logger } from '../../../utils/logger.js';
import type { ObservationRecord } from '../../../types/database.js';
import type { ObservationSearchResult } from '../../sqlite/types.js';
import type { SessionStore } from '../../sqlite/SessionStore.js';
import type { SearchOrchestrator } from '../search/SearchOrchestrator.js';
import { CorpusRenderer } from './CorpusRenderer.js';
@@ -121,19 +121,19 @@ export class CorpusBuilder {
}
/**
* Map a raw ObservationRecord (with JSON string fields) to a CorpusObservation
* Map a raw ObservationSearchResult (with JSON string fields) to a CorpusObservation
*/
private mapObservationToCorpus(row: ObservationRecord): CorpusObservation {
private mapObservationToCorpus(row: ObservationSearchResult): CorpusObservation {
return {
id: row.id,
type: row.type,
title: (row as any).title || '',
subtitle: (row as any).subtitle || null,
narrative: (row as any).narrative || null,
facts: safeParseJsonArray((row as any).facts),
concepts: safeParseJsonArray((row as any).concepts),
files_read: safeParseJsonArray((row as any).files_read),
files_modified: safeParseJsonArray((row as any).files_modified),
title: row.title || '',
subtitle: row.subtitle || null,
narrative: row.narrative || null,
facts: safeParseJsonArray(row.facts),
concepts: safeParseJsonArray(row.concepts),
files_read: safeParseJsonArray(row.files_read),
files_modified: safeParseJsonArray(row.files_modified),
project: row.project,
created_at: row.created_at,
created_at_epoch: row.created_at_epoch,
+18 -10
@@ -33,7 +33,13 @@ export class ResultFormatter {
if (totalResults === 0) {
if (chromaFailed) {
return this.formatChromaFailureMessage();
// Legacy callers route through here without a specific reason; surface a
// generic non-connection failure so users still get the diagnostic pointer
// instead of the old "install uv" lie.
return ResultFormatter.formatChromaFailureMessage({
message: 'unknown error (no reason captured by caller)',
isConnectionError: false,
});
}
return `No results found matching "${query}"`;
}
@@ -270,16 +276,18 @@ export class ResultFormatter {
}
/**
* Format Chroma failure message
* Format Chroma failure message with the real underlying error.
*
* Static so callers (e.g. SearchManager) can format without needing
* an instance. The message intentionally surfaces the raw error text
* and points users at /api/chroma/status?deep=1 for diagnostics,
* never a static "install uv" instruction (which lies about the cause).
*/
private formatChromaFailureMessage(): string {
return `Vector search failed - semantic search unavailable.
To enable semantic search:
1. Install uv: https://docs.astral.sh/uv/getting-started/installation/
2. Restart the worker: npm run worker:restart
Note: You can still use filter-only searches (date ranges, types, files) without a query term.`;
static formatChromaFailureMessage(reason: { message: string; isConnectionError: boolean }): string {
if (reason.isConnectionError) {
return `Semantic search is offline (Chroma MCP unreachable: ${reason.message}). Falling back to keyword search; results may be incomplete. Run \`/api/chroma/status?deep=1\` to diagnose.`;
}
return `Semantic search failed: ${reason.message}. Falling back to keyword search; results may be incomplete. Check \`~/.claude-mem/logs/\` for the CHROMA_SYNC entry. Run \`/api/chroma/status?deep=1\` for a deeper probe.`;
}
/**
@@ -30,6 +30,7 @@ import type {
SearchResults,
ObservationSearchResult
} from './types.js';
import { ChromaUnavailableError } from './errors.js';
import { logger } from '../../../utils/logger.js';
/**
@@ -88,34 +89,27 @@ export class SearchOrchestrator {
}
// PATH 2: CHROMA SEMANTIC SEARCH (query text + Chroma available)
// Fail-fast: if Chroma errors, ChromaSearchStrategy now lets the error
// propagate. We catch it here only to translate into a typed 503.
if (this.chromaStrategy) {
logger.debug('SEARCH', 'Orchestrator: Using Chroma semantic search', {});
const result = await this.chromaStrategy.search(options);
// If Chroma succeeded (even with 0 results), return
if (result.usedChroma) {
return result;
try {
return await this.chromaStrategy.search(options);
} catch (error) {
const errorObj = error instanceof Error ? error : new Error(String(error));
throw new ChromaUnavailableError(
`Chroma query failed: ${errorObj.message}`,
errorObj
);
}
// Chroma failed - fall back to SQLite for filter-only
logger.debug('SEARCH', 'Orchestrator: Chroma failed, falling back to SQLite', {});
const fallbackResult = await this.sqliteStrategy.search({
...options,
query: undefined // Remove query for SQLite fallback
});
return {
...fallbackResult,
fellBack: true
};
}
// PATH 3: No Chroma available
logger.debug('SEARCH', 'Orchestrator: Chroma not available', {});
// PATH 3: Chroma not configured (explicitly uninitialized at construction).
// This is a legitimate config state — return empty results, not an error.
logger.debug('SEARCH', 'Orchestrator: Chroma not configured', {});
return {
results: { observations: [], sessions: [], prompts: [] },
usedChroma: false,
fellBack: false,
strategy: 'sqlite'
};
}
@@ -130,12 +124,11 @@ export class SearchOrchestrator {
return await this.hybridStrategy.findByConcept(concept, options);
}
// Fallback to SQLite
// Chroma not configured: SQLite metadata-only result.
const results = this.sqliteStrategy.findByConcept(concept, options);
return {
results: { observations: results, sessions: [], prompts: [] },
usedChroma: false,
fellBack: false,
strategy: 'sqlite'
};
}
@@ -150,12 +143,11 @@ export class SearchOrchestrator {
return await this.hybridStrategy.findByType(type, options);
}
// Fallback to SQLite
// Chroma not configured: SQLite metadata-only result.
const results = this.sqliteStrategy.findByType(type, options);
return {
results: { observations: results, sessions: [], prompts: [] },
usedChroma: false,
fellBack: false,
strategy: 'sqlite'
};
}
+16
@@ -0,0 +1,16 @@
/**
* Search-related error classes
*/
import { AppError } from '../../server/ErrorHandler.js';
/**
* Thrown when Chroma is expected to be available but failed at query time.
* Maps to HTTP 503 Service Unavailable.
*/
export class ChromaUnavailableError extends AppError {
constructor(message: string, cause?: Error) {
super(message, 503, 'CHROMA_UNAVAILABLE', cause ? { cause: cause.message } : undefined);
this.name = 'ChromaUnavailableError';
}
}
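Since only the `super(message, 503, 'CHROMA_UNAVAILABLE', …)` call is visible here, the sketch below restates the subclass against a stand-in base class to show the 503 mapping the orchestrator relies on. The stand-in `AppError` shape is an assumption inferred from that call, not the actual `server/ErrorHandler` code:

```typescript
// Stand-in AppError modeling the (message, status, code, details) shape
// implied by the super() call above; the real one lives in server/ErrorHandler.
class AppError extends Error {
  constructor(
    message: string,
    public readonly status: number,
    public readonly code: string,
    public readonly details?: Record<string, string>,
  ) {
    super(message);
    this.name = 'AppError';
  }
}

class ChromaUnavailableError extends AppError {
  constructor(message: string, cause?: Error) {
    super(message, 503, 'CHROMA_UNAVAILABLE', cause ? { cause: cause.message } : undefined);
    this.name = 'ChromaUnavailableError';
  }
}

const err = new ChromaUnavailableError(
  'Chroma query failed: ECONNREFUSED',
  new Error('ECONNREFUSED'),
);
```

An HTTP layer that switches on `status`/`code` can then distinguish "Chroma down" (503, retryable) from an ordinary 500 without string-matching error messages.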
@@ -59,31 +59,16 @@ export class ChromaSearchStrategy extends BaseSearchStrategy implements SearchSt
const searchSessions = searchType === 'all' || searchType === 'sessions';
const searchPrompts = searchType === 'all' || searchType === 'prompts';
let observations: ObservationSearchResult[] = [];
let sessions: SessionSummarySearchResult[] = [];
let prompts: UserPromptSearchResult[] = [];
// Build Chroma where filter for doc_type and project
const whereFilter = this.buildWhereFilter(searchType, project);
logger.debug('SEARCH', 'ChromaSearchStrategy: Querying Chroma', { query, searchType });
try {
return await this.executeChromaSearch(query, whereFilter, {
searchObservations, searchSessions, searchPrompts,
obsType, concepts, files, orderBy, limit, project
});
} catch (error) {
const errorObj = error instanceof Error ? error : new Error(String(error));
logger.error('WORKER', 'ChromaSearchStrategy: Search failed', {}, errorObj);
// Return empty result - caller may try fallback strategy
return {
results: { observations: [], sessions: [], prompts: [] },
usedChroma: false,
fellBack: false,
strategy: 'chroma'
};
}
// Fail-fast: errors propagate to orchestrator, which translates to HTTP 503.
return await this.executeChromaSearch(query, whereFilter, {
searchObservations, searchSessions, searchPrompts,
obsType, concepts, files, orderBy, limit, project
});
}
private async executeChromaSearch(
@@ -111,7 +96,6 @@ export class ChromaSearchStrategy extends BaseSearchStrategy implements SearchSt
return {
results: { observations: [], sessions: [], prompts: [] },
usedChroma: true,
fellBack: false,
strategy: 'chroma'
};
}
@@ -123,27 +107,31 @@ export class ChromaSearchStrategy extends BaseSearchStrategy implements SearchSt
let sessions: SessionSummarySearchResult[] = [];
let prompts: UserPromptSearchResult[] = [];
// Chroma already ranks by vector similarity; 'relevance' has no SQL
// equivalent, so drop it before hydrating rows from SessionStore.
const sqlOrderBy: 'date_desc' | 'date_asc' | undefined =
options.orderBy === 'relevance' ? undefined : options.orderBy;
if (categorized.obsIds.length > 0) {
const obsOptions = { type: options.obsType, concepts: options.concepts, files: options.files, orderBy: options.orderBy, limit: options.limit, project: options.project };
const obsOptions = { type: options.obsType, concepts: options.concepts, files: options.files, orderBy: sqlOrderBy, limit: options.limit, project: options.project };
observations = this.sessionStore.getObservationsByIds(categorized.obsIds, obsOptions);
}
if (categorized.sessionIds.length > 0) {
sessions = this.sessionStore.getSessionSummariesByIds(categorized.sessionIds, {
orderBy: options.orderBy, limit: options.limit, project: options.project
orderBy: sqlOrderBy, limit: options.limit, project: options.project
});
}
if (categorized.promptIds.length > 0) {
prompts = this.sessionStore.getUserPromptsByIds(categorized.promptIds, {
orderBy: options.orderBy, limit: options.limit, project: options.project
orderBy: sqlOrderBy, limit: options.limit, project: options.project
});
}
return {
results: { observations, sessions, prompts },
usedChroma: true,
fellBack: false,
strategy: 'chroma'
};
}
@@ -79,20 +79,8 @@ export class HybridSearchStrategy extends BaseSearchStrategy implements SearchSt
const ids = metadataResults.map(obs => obs.id);
try {
return await this.rankAndHydrate(concept, ids, limit);
} catch (error) {
const errorObj = error instanceof Error ? error : new Error(String(error));
logger.error('WORKER', 'HybridSearchStrategy: findByConcept failed', {}, errorObj);
// Fall back to metadata-only results
const results = this.sessionSearch.findByConcept(concept, filterOptions);
return {
results: { observations: results, sessions: [], prompts: [] },
usedChroma: false,
fellBack: true,
strategy: 'hybrid'
};
}
// Fail-fast: Chroma errors propagate to orchestrator (HTTP 503).
return await this.rankAndHydrate(concept, ids, limit);
}
/**
@@ -117,19 +105,8 @@ export class HybridSearchStrategy extends BaseSearchStrategy implements SearchSt
const ids = metadataResults.map(obs => obs.id);
try {
return await this.rankAndHydrate(typeStr, ids, limit);
} catch (error) {
const errorObj = error instanceof Error ? error : new Error(String(error));
logger.error('WORKER', 'HybridSearchStrategy: findByType failed', {}, errorObj);
const results = this.sessionSearch.findByType(type as any, filterOptions);
return {
results: { observations: results, sessions: [], prompts: [] },
usedChroma: false,
fellBack: true,
strategy: 'hybrid'
};
}
// Fail-fast: Chroma errors propagate to orchestrator (HTTP 503).
return await this.rankAndHydrate(typeStr, ids, limit);
}
/**
@@ -158,18 +135,8 @@ export class HybridSearchStrategy extends BaseSearchStrategy implements SearchSt
const ids = metadataResults.observations.map(obs => obs.id);
try {
return await this.rankAndHydrateForFile(filePath, ids, limit, sessions);
} catch (error) {
const errorObj = error instanceof Error ? error : new Error(String(error));
logger.error('WORKER', 'HybridSearchStrategy: findByFile failed', {}, errorObj);
const results = this.sessionSearch.findByFile(filePath, filterOptions);
return {
observations: results.observations,
sessions: results.sessions,
usedChroma: false
};
}
// Fail-fast: Chroma errors propagate to orchestrator (HTTP 503).
return await this.rankAndHydrateForFile(filePath, ids, limit, sessions);
}
private async rankAndHydrate(
@@ -191,7 +158,6 @@ export class HybridSearchStrategy extends BaseSearchStrategy implements SearchSt
return {
results: { observations, sessions: [], prompts: [] },
usedChroma: true,
fellBack: false,
strategy: 'hybrid'
};
}
@@ -98,7 +98,6 @@ export class SQLiteSearchStrategy extends BaseSearchStrategy implements SearchSt
return {
results: { observations, sessions, prompts },
usedChroma: false,
fellBack: false,
strategy: 'sqlite'
};
}
@@ -54,7 +54,6 @@ export abstract class BaseSearchStrategy implements SearchStrategy {
prompts: []
},
usedChroma: strategy === 'chroma' || strategy === 'hybrid',
fellBack: false,
strategy
};
}
-2
@@ -103,8 +103,6 @@ export interface StrategySearchResult {
results: SearchResults;
/** Whether Chroma was used successfully */
usedChroma: boolean;
/** Whether fallback was triggered */
fellBack: boolean;
/** Strategy that produced the results */
strategy: SearchStrategyHint;
}
@@ -57,7 +57,7 @@ export class SessionCompletionHandler {
// completed session would never be picked up again.
try {
const pendingStore = this.sessionManager.getPendingMessageStore();
const drainedCount = pendingStore.markAllSessionMessagesAbandoned(sessionDbId);
const drainedCount = pendingStore.transitionMessagesTo('abandoned', { sessionDbId });
if (drainedCount > 0) {
logger.warn('SESSION', `Drained ${drainedCount} orphaned pending messages on session finalize`, {
sessionId: sessionDbId, drainedCount
-17
@@ -30,13 +30,6 @@ const BLOCKED_ENV_VARS = [
'CLAUDECODE', // Prevent "cannot be launched inside another Claude Code session" error
];
// Credential keys that claude-mem manages
export const MANAGED_CREDENTIAL_KEYS = [
'ANTHROPIC_API_KEY',
'GEMINI_API_KEY',
'OPENROUTER_API_KEY',
];
export interface ClaudeMemEnv {
// Credentials (optional - empty means use CLI billing for Claude)
ANTHROPIC_API_KEY?: string;
@@ -269,16 +262,6 @@ export function getCredential(key: keyof ClaudeMemEnv): string | undefined {
return env[key];
}
/**
* Set a specific credential in claude-mem's .env
* Pass empty string to remove the credential
*/
export function setCredential(key: keyof ClaudeMemEnv, value: string): void {
const env = loadClaudeMemEnv();
env[key] = value || undefined;
saveClaudeMemEnv(env);
}
/**
* Check if claude-mem has an Anthropic API key configured
* If false, it means CLI billing should be used
+3 -1
@@ -56,6 +56,7 @@ export interface SettingsDefaults {
CLAUDE_MEM_TRANSCRIPTS_CONFIG_PATH: string; // Path to transcript watcher config JSON
// Process Management
CLAUDE_MEM_MAX_CONCURRENT_AGENTS: string; // Max concurrent Claude SDK agent subprocesses (default: 2)
CLAUDE_MEM_HOOK_FAIL_LOUD_THRESHOLD: string; // Plan 05 Phase 8 — consecutive hook→worker unreachable failures before exit code 2 (default: 3)
// Exclusion Settings
CLAUDE_MEM_EXCLUDED_PROJECTS: string; // Comma-separated glob patterns for excluded project paths
CLAUDE_MEM_FOLDER_MD_EXCLUDE: string; // JSON array of folder paths to exclude from CLAUDE.md generation
@@ -133,6 +134,7 @@ export class SettingsDefaultsManager {
CLAUDE_MEM_TRANSCRIPTS_CONFIG_PATH: join(homedir(), '.claude-mem', 'transcript-watch.json'),
// Process Management
CLAUDE_MEM_MAX_CONCURRENT_AGENTS: '2', // Max concurrent Claude SDK agent subprocesses
CLAUDE_MEM_HOOK_FAIL_LOUD_THRESHOLD: '3', // Plan 05 Phase 8 — escalate to exit code 2 after N consecutive worker-unreachable hook invocations
// Exclusion Settings
CLAUDE_MEM_EXCLUDED_PROJECTS: '', // Comma-separated glob patterns for excluded project paths
CLAUDE_MEM_FOLDER_MD_EXCLUDE: '[]', // JSON array of folder paths to exclude from CLAUDE.md generation
@@ -193,7 +195,7 @@ export class SettingsDefaultsManager {
* Handles both string 'true' and boolean true from JSON
*/
static getBool(key: keyof SettingsDefaults): boolean {
const value = this.get(key);
const value: unknown = this.get(key);
return value === 'true' || value === true;
}
+35
@@ -0,0 +1,35 @@
/**
* Per-process settings cache for hook handlers.
*
* Plan 05 Phase 4 (PATHFINDER-2026-04-22): each hook process is short-lived,
* but multiple handlers within a single hook invocation independently call
* `SettingsDefaultsManager.loadFromFile(USER_SETTINGS_PATH)` and re-read the
* settings file from disk. Settings cannot mutate during a single hook
* invocation, so we memoize the first read for the lifetime of the process.
*
* One helper, N callers (Principle 6). Every hook handler that needs settings
* imports `loadFromFileOnce()` from here instead of calling
* `SettingsDefaultsManager.loadFromFile` directly.
*/
import {
SettingsDefaultsManager,
type SettingsDefaults,
} from './SettingsDefaultsManager.js';
import { USER_SETTINGS_PATH } from './paths.js';
let cachedSettings: SettingsDefaults | null = null;
/**
* Load settings from disk on first call, return the memoized value thereafter.
*
* Cache lifetime is the process; hooks are short-lived (typically <1s), so a
* settings change made by the user is picked up the next time Claude Code
* spawns a hook process. There is no in-process invalidation API because there
* is no in-process mutation path.
*/
export function loadFromFileOnce(): SettingsDefaults {
if (cachedSettings !== null) return cachedSettings;
cachedSettings = SettingsDefaultsManager.loadFromFile(USER_SETTINGS_PATH);
return cachedSettings;
}
+44
@@ -0,0 +1,44 @@
/**
* Single answer to "should this hook run for this cwd?"
*
* Plan 05 Phase 5 (PATHFINDER-2026-04-22): three handlers (observation,
* session-init, file-context) each duplicated the
* `loadFromFileOnce() → isProjectExcluded(cwd, settings.CLAUDE_MEM_EXCLUDED_PROJECTS)`
* pair. This module is the only entry point for that question; handlers call
* `shouldTrackProject(cwd)` and route through here.
*
* One helper, N callers (Principle 6). After this module lands, no handler
* references `isProjectExcluded` directly; the import lives only here.
*/
import { relative, isAbsolute } from 'path';
import { isProjectExcluded } from '../utils/project-filter.js';
import { loadFromFileOnce } from './hook-settings.js';
import { OBSERVER_SESSIONS_DIR } from './paths.js';
function isWithin(child: string, parent: string): boolean {
if (child === parent) return true;
const rel = relative(parent, child);
return rel.length > 0 && !rel.startsWith('..') && !isAbsolute(rel);
}
/**
* @returns true when the project at `cwd` is NOT excluded from claude-mem
* tracking, i.e., the hook should proceed; false when the project
* matches one of the exclusion globs.
*
* Hard-excludes OBSERVER_SESSIONS_DIR: the SDK agent spawns Claude Code with
* that cwd, and its hooks must never feed the worker; otherwise the observer's
* own init/continuation/summary prompts end up stored as `user_prompts` and
* leak into the viewer (meta-observation).
*/
export function shouldTrackProject(cwd: string): boolean {
if (!cwd) return true;
// path.relative handles separator differences (Windows '\\' vs POSIX '/')
// and trailing-slash variance, which a literal startsWith would miss.
if (isWithin(cwd, OBSERVER_SESSIONS_DIR)) {
return false;
}
const settings = loadFromFileOnce();
return !isProjectExcluded(cwd, settings.CLAUDE_MEM_EXCLUDED_PROJECTS);
}
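The containment helper above is the part a naive `startsWith` gets wrong. A standalone copy of the same logic (the example paths are illustrative, not taken from the codebase) shows the sibling-directory case:

```typescript
import { relative, isAbsolute } from 'path';

// Same containment logic as isWithin above: child is inside parent when
// path.relative(parent, child) neither escapes ('..') nor is absolute.
function isWithinCheck(child: string, parent: string): boolean {
  if (child === parent) return true;
  const rel = relative(parent, child);
  return rel.length > 0 && !rel.startsWith('..') && !isAbsolute(rel);
}

const observerDir = '/home/u/.claude-mem/observer-sessions';

// Nested path: correctly reported as inside.
const nested = isWithinCheck('/home/u/.claude-mem/observer-sessions/s1/x', observerDir);

// Sibling directory sharing the string prefix: a literal startsWith would
// say "inside"; relative() returns '../observer-sessions-archive' → outside.
const sibling = isWithinCheck('/home/u/.claude-mem/observer-sessions-archive', observerDir);

// Identical path: handled by the explicit equality check.
const equal = isWithinCheck(observerDir, observerDir);
```

The prefix-collision sibling is exactly the false positive the comment inside `shouldTrackProject` warns about.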
+395 -21
@@ -1,9 +1,17 @@
import path from "path";
import { readFileSync } from "fs";
import { readFileSync, existsSync, writeFileSync, renameSync, mkdirSync } from "fs";
import { spawn, execSync } from "child_process";
import { logger } from "../utils/logger.js";
import { HOOK_TIMEOUTS, getTimeout } from "./hook-constants.js";
import { HOOK_TIMEOUTS, HOOK_EXIT_CODES, getTimeout } from "./hook-constants.js";
import { SettingsDefaultsManager } from "./SettingsDefaultsManager.js";
import { MARKETPLACE_ROOT } from "./paths.js";
import { MARKETPLACE_ROOT, DATA_DIR } from "./paths.js";
import { loadFromFileOnce } from "./hook-settings.js";
// `validateWorkerPidFile` consults `captureProcessStartToken` at
// `src/supervisor/process-registry.ts` for PID-reuse detection (commit
// 99060bac). The lazy-spawn fast path below uses it to confirm a live port
// is owned by OUR worker incarnation rather than a stale PID squatting on
// the port after container restart.
import { validateWorkerPidFile } from "../supervisor/index.js";
// Named constants for health checks
// Allow env var override for users on slow systems (e.g., CLAUDE_MEM_HEALTH_TIMEOUT_MS=10000)
@@ -214,26 +222,392 @@ async function checkWorkerVersion(): Promise<void> {
/**
* Ensure worker service is running
* Quick health check - returns false if worker not healthy (doesn't block)
* Port might be in use by another process, or worker might not be started yet
* Resolve the absolute path to the worker-service script the hook should
* relaunch as a detached daemon. Hooks live in the plugin's `scripts/`
* directory next to `worker-service.cjs`; production and dev checkouts both
* ship the bundled CJS there. Returns null when no candidate exists on disk
* (partial install, build artifact missing).
*/
export async function ensureWorkerRunning(): Promise<boolean> {
// Quick health check (single attempt, no polling)
try {
if (await isWorkerHealthy()) {
await checkWorkerVersion(); // logs warning on mismatch, doesn't restart
return true; // Worker healthy
}
} catch (e) {
// Not healthy - log for debugging
logger.debug('SYSTEM', 'Worker health check failed', {
error: e instanceof Error ? e.message : String(e)
});
function resolveWorkerScriptPath(): string | null {
const candidates = [
path.join(MARKETPLACE_ROOT, 'plugin', 'scripts', 'worker-service.cjs'),
path.join(process.cwd(), 'plugin', 'scripts', 'worker-service.cjs'),
];
for (const candidate of candidates) {
if (existsSync(candidate)) return candidate;
}
return null;
}
// Port might be in use by something else, or worker not started
// Return false but don't throw - let caller decide how to handle
logger.warn('SYSTEM', 'Worker not healthy, hook will proceed gracefully');
/**
* Resolve the absolute path to the Bun runtime.
*
* Local to worker-utils.ts so the lazy-spawn path does not transitively
* import `services/infrastructure/ProcessManager.ts` that module pulls
* in `bun:sqlite` via `cwd-remap`, and pulling it in would break the NPX
* CLI bundle which must run under plain Node (no Bun). The worker daemon
* itself requires Bun (it uses bun:sqlite directly); this lookup finds
* the Bun binary that the daemon will execute under.
*/
function resolveBunRuntime(): string | null {
if (process.env.BUN && existsSync(process.env.BUN)) return process.env.BUN;
try {
const cmd = process.platform === 'win32' ? 'where bun' : 'which bun';
const output = execSync(cmd, {
stdio: ['ignore', 'pipe', 'ignore'],
encoding: 'utf-8',
windowsHide: true,
});
const firstMatch = output
.split(/\r?\n/)
.map(line => line.trim())
.find(line => line.length > 0);
return firstMatch || null;
} catch {
return null;
}
}
/**
* Wait for the worker port to open, using exponential backoff.
*
* Deliberately hand-rolled: `respawn` or similar npm helpers add a
* supervisor semantic layer we do not want here (Principle 6). The retry
* policy is three attempts with 250ms, 500ms, 1000ms backoff, which is
* enough to cover the worker's start-up (~1-2s on a warm cache, slower on
* Windows) without blocking a hook for long when the spawn outright failed.
*/
async function waitForWorkerPort(options: { attempts: number; backoffMs: number }): Promise<boolean> {
let delayMs = options.backoffMs;
for (let attempt = 1; attempt <= options.attempts; attempt++) {
if (await isWorkerPortAlive()) return true;
if (attempt < options.attempts) {
await new Promise<void>(resolve => setTimeout(resolve, delayMs));
delayMs *= 2;
}
}
return false;
}
/**
* Is the worker port owned by a live worker we recognize?
*
* Two gates:
* 1. HTTP /api/health returns 200, AND
* 2. PID-file start-token check (via `validateWorkerPidFile` and
* `captureProcessStartToken`) confirms the recorded PID has not been
* reused by a different process since the file was written.
*
* When the PID file is missing we accept a healthy HTTP response on its own:
* the file is written by the worker itself after `listen()` succeeds, so
* a brief window exists during which a freshly-spawned worker is reachable
* via HTTP but has not yet persisted its PID record. Treating this as
* "not ours" would cause the hook to double-spawn in a race with the
* worker's own PID-file write.
*
* An 'alive' status that fails identity verification is treated as dead so
* the caller falls through to the spawn path (Phase 8 contract).
*/
async function isWorkerPortAlive(): Promise<boolean> {
let healthy: boolean;
try {
healthy = await isWorkerHealthy();
} catch (error: unknown) {
logger.debug('SYSTEM', 'Worker health check threw', {
error: error instanceof Error ? error.message : String(error),
});
return false;
}
if (!healthy) return false;
const pidStatus = validateWorkerPidFile({ logAlive: false });
if (pidStatus === 'missing') return true; // race: listening before PID file written
if (pidStatus === 'alive') return true; // identity verified via start-token
return false; // 'stale' | 'invalid' — PID reused
}
/**
* Lazy-spawn the worker if it is not already running, then wait for its port.
*
* Flow:
* 1. If the port is alive AND verified as ours, return true (fast path).
* 2. Otherwise, resolve the bun runtime + worker script path.
* 3. Spawn detached, `unref()` so the hook's exit does not take the worker
* down with it (the worker lives as its own independent daemon).
* 4. Wait for the port to come up, up to 3 attempts with exponential
* backoff (250ms, 500ms, 1000ms; ~1.75s total).
*
* PID-reuse safety is inherited from `validateWorkerPidFile` (commit
* 99060bac); see the `isWorkerPortAlive` comment above. There is no
* auto-restart loop; failure is reported via the return value so the hook
* can surface it through exit code 2 (Principle 2 fail-fast).
*/
export async function ensureWorkerRunning(): Promise<boolean> {
if (await isWorkerPortAlive()) {
await checkWorkerVersion();
return true;
}
const runtimePath = resolveBunRuntime();
const scriptPath = resolveWorkerScriptPath();
if (!runtimePath) {
logger.warn('SYSTEM', 'Cannot lazy-spawn worker: Bun runtime not found on PATH');
return false;
}
if (!scriptPath) {
logger.warn('SYSTEM', 'Cannot lazy-spawn worker: worker-service.cjs not found in plugin/scripts');
return false;
}
logger.info('SYSTEM', 'Worker not running — lazy-spawning', { runtimePath, scriptPath });
try {
const proc = spawn(runtimePath, [scriptPath, '--daemon'], {
detached: true,
stdio: ['ignore', 'ignore', 'ignore'],
});
proc.unref();
} catch (error: unknown) {
if (error instanceof Error) {
logger.error('SYSTEM', 'Lazy-spawn of worker failed', { runtimePath, scriptPath }, error);
} else {
logger.error('SYSTEM', 'Lazy-spawn of worker failed (non-Error)', {
runtimePath, scriptPath, error: String(error),
});
}
return false;
}
const alive = await waitForWorkerPort({ attempts: 3, backoffMs: 250 });
if (!alive) {
logger.warn('SYSTEM', 'Worker port did not open after lazy-spawn within 3 attempts');
return false;
}
return true;
}
// ============================================================================
// Plan 05 Phase 9 — single per-process alive cache.
//
// One hook invocation may issue multiple worker requests (session-init issues
// several). The alive-state cannot change mid-invocation without the hook
// process exiting, so memoize the first result. By Principle 6 (one helper,
// N callers), this is the ONLY alive-state cache; all hook→worker call sites
// route through `executeWithWorkerFallback` (Phase 2) which calls this.
// ============================================================================
let aliveCache: boolean | null = null;
export async function ensureWorkerAliveOnce(): Promise<boolean> {
if (aliveCache !== null) return aliveCache;
aliveCache = await ensureWorkerRunning();
return aliveCache;
}
// ============================================================================
// Plan 05 Phase 8 — fail-loud counter.
//
// The counter records how many consecutive hook invocations have seen the
// worker unreachable. After N (default 3) consecutive failures, the next
// hook exits code 2 so Claude Code's hook contract surfaces the outage to
// Claude. Below N, hooks exit 0 to avoid breaking the user's session.
//
// This is NOT a retry. We do not reinvoke `ensureWorkerAliveOnce` or
// reattempt the HTTP request. We record the result of the one primary-path
// attempt and either return (graceful) or escalate (fail-loud).
//
// File: ~/.claude-mem/state/hook-failures.json
// Atomic write: tmp + rename (POSIX atomic within a filesystem).
// ============================================================================
interface HookFailureState {
consecutiveFailures: number;
lastFailureAt: number;
}
const FAIL_LOUD_DEFAULT_THRESHOLD = 3;
function getStateDir(): string {
return path.join(DATA_DIR, 'state');
}
function getHookFailuresPath(): string {
return path.join(getStateDir(), 'hook-failures.json');
}
function readHookFailureState(): HookFailureState {
try {
const raw = readFileSync(getHookFailuresPath(), 'utf-8');
const parsed = JSON.parse(raw) as Partial<HookFailureState>;
return {
consecutiveFailures: typeof parsed.consecutiveFailures === 'number' && Number.isFinite(parsed.consecutiveFailures)
? Math.max(0, Math.floor(parsed.consecutiveFailures))
: 0,
lastFailureAt: typeof parsed.lastFailureAt === 'number' && Number.isFinite(parsed.lastFailureAt)
? parsed.lastFailureAt
: 0,
};
} catch {
// Missing file or corrupt JSON → fresh state.
return { consecutiveFailures: 0, lastFailureAt: 0 };
}
}
function writeHookFailureStateAtomic(state: HookFailureState): void {
const stateDir = getStateDir();
const dest = getHookFailuresPath();
const tmp = `${dest}.tmp`;
try {
if (!existsSync(stateDir)) {
mkdirSync(stateDir, { recursive: true });
}
writeFileSync(tmp, JSON.stringify(state), 'utf-8');
renameSync(tmp, dest);
} catch (error: unknown) {
logger.debug('SYSTEM', 'Failed to persist hook-failure counter', {
error: error instanceof Error ? error.message : String(error),
});
}
}
function getFailLoudThreshold(): number {
try {
const settings = loadFromFileOnce();
const raw = settings.CLAUDE_MEM_HOOK_FAIL_LOUD_THRESHOLD;
const parsed = parseInt(raw, 10);
if (Number.isFinite(parsed) && parsed >= 1) return parsed;
} catch {
// settings unreadable — fall through to default
}
return FAIL_LOUD_DEFAULT_THRESHOLD;
}
/**
* Record a worker-unreachable hook invocation. Returns the new counter value.
* If the counter reaches the threshold, this function writes to stderr and
* exits the process with code 2 (blocking error per Claude Code hook contract).
*
* Not a retry: this function does not reattempt the operation. The caller already ran the
* single primary-path attempt and got `false` from `ensureWorkerAliveOnce`.
*/
function recordWorkerUnreachable(): number {
const state = readHookFailureState();
const next: HookFailureState = {
consecutiveFailures: state.consecutiveFailures + 1,
lastFailureAt: Date.now(),
};
writeHookFailureStateAtomic(next);
const threshold = getFailLoudThreshold();
if (next.consecutiveFailures >= threshold) {
process.stderr.write(
`claude-mem worker unreachable for ${next.consecutiveFailures} consecutive hooks.\n`
);
process.exit(HOOK_EXIT_CODES.BLOCKING_ERROR);
}
return next.consecutiveFailures;
}
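The counter logic is easy to isolate. A hedged sketch of the same increment-and-compare behavior, decoupled from the filesystem and from `process.exit` (names here are illustrative, not the module's exports):

```typescript
interface CounterState { consecutiveFailures: number; lastFailureAt: number; }

// Pure version of the record/reset pair: returns the next state plus a
// flag for "threshold crossed" instead of exiting the process.
function recordFailure(state: CounterState, threshold: number, now: number):
    { next: CounterState; failLoud: boolean } {
  const next = { consecutiveFailures: state.consecutiveFailures + 1, lastFailureAt: now };
  return { next, failLoud: next.consecutiveFailures >= threshold };
}

function resetCounter(): CounterState {
  return { consecutiveFailures: 0, lastFailureAt: 0 };
}
```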
/**
* Reset the consecutive-failure counter. Called when the worker is alive,
* acknowledging that any prior outage has ended. Not a retry: it is a
* success-path acknowledgement.
*/
function resetWorkerFailureCounter(): void {
const state = readHookFailureState();
if (state.consecutiveFailures === 0) return; // skip a no-op write
writeHookFailureStateAtomic({ consecutiveFailures: 0, lastFailureAt: 0 });
}
// ============================================================================
// Plan 05 Phase 2 — `executeWithWorkerFallback(url, method, body)`.
//
// Eight handlers used to duplicate the
// `ensureWorkerRunning() → workerHttpRequest() → if (!ok) return { continue: true }`
// sequence. This helper is the ONE implementation; eight handlers import it.
//
// Behavior:
// 1. ensureWorkerAliveOnce() (Phase 9). If false → fail-loud counter
// (Phase 8). May process.exit(2). Otherwise return graceful fallback.
// 2. workerHttpRequest(url, method, body). Parse JSON.
// 3. On success, reset the fail-loud counter.
//
// No retry inside this helper. No timeout-and-exit-0 swallow. The fail-loud
// counter records consecutive invocation outcomes; it does not reinvoke work.
// ============================================================================
// Branded sentinel so isWorkerFallback cannot false-positive on legitimate
// API responses that happen to carry `continue: true` in their own schema.
const WORKER_FALLBACK_BRAND: unique symbol = Symbol.for('claude-mem/worker-fallback');
export type WorkerFallback =
| { continue: true; [WORKER_FALLBACK_BRAND]: true }
| { continue: true; reason: string; [WORKER_FALLBACK_BRAND]: true };
export type WorkerCallResult<T> = T | WorkerFallback;
export function isWorkerFallback<T>(result: WorkerCallResult<T>): result is WorkerFallback {
return typeof result === 'object'
&& result !== null
&& (result as { [WORKER_FALLBACK_BRAND]?: unknown })[WORKER_FALLBACK_BRAND] === true;
}
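Why a branded symbol rather than a plain `{ continue: true }` check: an API payload may legally carry `continue: true` in its own schema, and duck-typing would misclassify it as a fallback. A small self-contained demonstration of the same technique under the same `Symbol.for` key (types and names here are illustrative):

```typescript
const BRAND: unique symbol = Symbol.for('claude-mem/worker-fallback');

type Fallback = { continue: true; [BRAND]: true };

// The guard checks the symbol-keyed brand, not the `continue` field, so a
// structurally similar API response cannot false-positive.
function isFallback(value: unknown): value is Fallback {
  return typeof value === 'object'
    && value !== null
    && (value as { [BRAND]?: unknown })[BRAND] === true;
}

// A legitimate API response that merely *looks* like a fallback.
const apiResponse = { continue: true, items: [1, 2, 3] };
const fallback: Fallback = { continue: true, [BRAND]: true };
```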
export interface WorkerFallbackOptions {
/**
* Per-call HTTP timeout in ms. Forwarded to workerHttpRequest. Omit to use
* HEALTH_CHECK_TIMEOUT_MS (the default ~3 s suitable for short pings).
* All hook endpoints are fire-and-forget queueing endpoints that return
* `{status: 'queued'}` immediately, so the default suffices.
*/
timeoutMs?: number;
}
export async function executeWithWorkerFallback<T = unknown>(
url: string,
method: 'GET' | 'POST' | 'PUT' | 'DELETE',
body?: unknown,
options: WorkerFallbackOptions = {},
): Promise<WorkerCallResult<T>> {
const alive = await ensureWorkerAliveOnce();
if (!alive) {
// Records and possibly process.exit(2). If we return below, the counter
// is below threshold and the user's session continues uninterrupted.
recordWorkerUnreachable();
return { continue: true, reason: 'worker_unreachable', [WORKER_FALLBACK_BRAND]: true };
}
const init: { method: string; headers?: Record<string, string>; body?: string; timeoutMs?: number } = { method };
if (body !== undefined) {
init.headers = { 'Content-Type': 'application/json' };
init.body = JSON.stringify(body);
}
if (options.timeoutMs !== undefined) {
init.timeoutMs = options.timeoutMs;
}
const response = await workerHttpRequest(url, init);
if (!response.ok) {
// Non-2xx is a real worker response (so the worker IS reachable). Reset
// the consecutive-failures counter; surface the response body to the
// caller as a typed value via T's caller-controlled shape. Callers that
// care about non-2xx must inspect the value (or wrap with their own
// status check); the helper does not silently coerce non-2xx into a
// graceful fallback.
resetWorkerFailureCounter();
const text = await response.text().catch(() => '');
let parsed: unknown = text;
try { parsed = JSON.parse(text); } catch { /* keep raw text */ }
return parsed as T;
}
resetWorkerFailureCounter();
const text = await response.text();
if (text.length === 0) return undefined as unknown as T;
try {
return JSON.parse(text) as T;
} catch {
return text as unknown as T;
}
}
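The tail of the function applies a lenient body-parse rule: empty body yields `undefined`, valid JSON yields the parsed value, and anything else falls back to raw text. Extracted as a standalone helper purely for illustration (the real module inlines this logic):

```typescript
// Lenient parse for worker responses: JSON when possible, raw text
// otherwise, undefined for an empty body. Never throws.
function parseWorkerBody<T = unknown>(text: string): T | undefined {
  if (text.length === 0) return undefined;
  try {
    return JSON.parse(text) as T;
  } catch {
    return text as unknown as T;
  }
}
```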
@@ -146,10 +146,6 @@ export async function startSupervisor(): Promise<void> {
await supervisorSingleton.start();
}
export async function stopSupervisor(): Promise<void> {
await supervisorSingleton.stop();
}
export function getSupervisor(): Supervisor {
return supervisorSingleton;
}
@@ -168,7 +164,7 @@ export function validateWorkerPidFile(options: ValidateWorkerPidOptions = {}): V
let pidInfo: PidInfo | null = null;
try {
pidInfo = JSON.parse(readFileSync(pidFilePath, 'utf-8')) as PidInfo;
pidInfo = JSON.parse(readFileSync(pidFilePath, 'utf-8')) as PidInfo | null;
} catch (error: unknown) {
if (error instanceof Error) {
logger.warn('SYSTEM', 'Failed to parse worker PID file, removing it', { path: pidFilePath }, error);
@@ -182,7 +178,8 @@ export function validateWorkerPidFile(options: ValidateWorkerPidOptions = {}): V
return 'invalid';
}
if (verifyPidFileOwnership(pidInfo)) {
const isAlive = verifyPidFileOwnership(pidInfo);
if (isAlive && pidInfo) {
if (options.logAlive ?? true) {
logger.info('SYSTEM', 'Worker already running (PID alive)', {
existingPid: pidInfo.pid,
@@ -194,9 +191,9 @@ export function validateWorkerPidFile(options: ValidateWorkerPidOptions = {}): V
}
logger.info('SYSTEM', 'Removing stale PID file (worker process is dead or PID has been reused)', {
pid: pidInfo.pid,
port: pidInfo.port,
startedAt: pidInfo.startedAt
pid: pidInfo?.pid,
port: pidInfo?.port,
startedAt: pidInfo?.startedAt
});
rmSync(pidFilePath, { force: true });
return 'stale';
@@ -1,8 +1,9 @@
import { ChildProcess, spawnSync } from 'child_process';
import { ChildProcess, spawn, spawnSync } from 'child_process';
import { existsSync, mkdirSync, readFileSync, writeFileSync } from 'fs';
import { homedir } from 'os';
import path from 'path';
import { logger } from '../utils/logger.js';
import { sanitizeEnv } from './env-sanitizer.js';
const REAP_SESSION_SIGTERM_TIMEOUT_MS = 5_000;
const REAP_SESSION_SIGKILL_TIMEOUT_MS = 1_000;
@@ -15,6 +16,14 @@ export interface ManagedProcessInfo {
type: string;
sessionId?: string | number;
startedAt: string;
// POSIX process group leader PID for group-scoped teardown.
// On Unix, when a child is spawned with `detached: true`, the kernel calls
// setpgid() and the child becomes the leader of its own group — its pgid
// equals its pid. Stored so `process.kill(-pgid, signal)` can tear down
// the child AND every descendant it spawned in one syscall (Principle 5).
// Undefined on Windows (no POSIX groups) and for processes that were not
// spawned with detached: true (e.g. the worker itself, MCP stdio clients).
pgid?: number;
}
export interface ManagedProcessRecord extends ManagedProcessInfo {
@@ -303,22 +312,30 @@ export class ProcessRegistry {
pids: sessionRecords.map(r => r.pid)
});
// Phase 1: SIGTERM all alive processes
// Phase 1: SIGTERM all alive processes — use process-group teardown for
// records that carry pgid so any descendants the SDK spawned are killed
// too (Principle 5).
const aliveRecords = sessionRecords.filter(r => isPidAlive(r.pid));
for (const record of aliveRecords) {
try {
process.kill(record.pid, 'SIGTERM');
if (typeof record.pgid === 'number' && process.platform !== 'win32') {
process.kill(-record.pgid, 'SIGTERM');
} else {
process.kill(record.pid, 'SIGTERM');
}
} catch (error: unknown) {
if (error instanceof Error) {
const code = (error as NodeJS.ErrnoException).code;
if (code !== 'ESRCH') {
logger.debug('SYSTEM', `Failed to SIGTERM session process PID ${record.pid}`, {
pid: record.pid
pid: record.pid,
pgid: record.pgid
}, error);
}
} else {
logger.warn('SYSTEM', `Failed to SIGTERM session process PID ${record.pid} (non-Error)`, {
pid: record.pid,
pgid: record.pgid,
error: String(error)
});
}
@@ -333,26 +350,34 @@ export class ProcessRegistry {
await new Promise(resolve => setTimeout(resolve, 100));
}
// Phase 3: SIGKILL any survivors
// Phase 3: SIGKILL any survivors — process-group teardown when pgid is
// recorded so descendants are killed too.
const survivors = aliveRecords.filter(r => isPidAlive(r.pid));
for (const record of survivors) {
logger.warn('SYSTEM', `Session process PID ${record.pid} did not exit after SIGTERM, sending SIGKILL`, {
pid: record.pid,
pgid: record.pgid,
sessionId: sessionIdNum
});
try {
process.kill(record.pid, 'SIGKILL');
if (typeof record.pgid === 'number' && process.platform !== 'win32') {
process.kill(-record.pgid, 'SIGKILL');
} else {
process.kill(record.pid, 'SIGKILL');
}
} catch (error: unknown) {
if (error instanceof Error) {
const code = (error as NodeJS.ErrnoException).code;
if (code !== 'ESRCH') {
logger.debug('SYSTEM', `Failed to SIGKILL session process PID ${record.pid}`, {
pid: record.pid
pid: record.pid,
pgid: record.pgid
}, error);
}
} else {
logger.warn('SYSTEM', `Failed to SIGKILL session process PID ${record.pid} (non-Error)`, {
pid: record.pid,
pgid: record.pgid,
error: String(error)
});
}
@@ -406,3 +431,401 @@ export function getProcessRegistry(): ProcessRegistry {
export function createProcessRegistry(registryPath: string): ProcessRegistry {
return new ProcessRegistry(registryPath);
}
// ---------------------------------------------------------------------------
// SDK session lookup + exit verification
// ---------------------------------------------------------------------------
export interface TrackedSdkProcess {
pid: number;
pgid: number | undefined;
sessionDbId: number;
process: ChildProcess;
}
/**
* Look up the live SDK subprocess for a given session, if any.
*
* Returns undefined when no SDK record is registered for the session, or
* when the ChildProcess reference has been dropped (process exited and was
* unregistered). Warns on duplicates multiple SDK records per session
* indicate a race in createSdkSpawnFactory's pre-spawn cleanup.
*/
export function getSdkProcessForSession(sessionDbId: number): TrackedSdkProcess | undefined {
const registry = getProcessRegistry();
const matches = registry.getBySession(sessionDbId).filter(r => r.type === 'sdk');
if (matches.length > 1) {
logger.warn('PROCESS', `Multiple SDK processes found for session ${sessionDbId}`, {
count: matches.length,
pids: matches.map(m => m.pid),
});
}
const record = matches[0];
if (!record) return undefined;
const processRef = registry.getRuntimeProcess(record.id);
if (!processRef) return undefined;
return {
pid: record.pid,
pgid: record.pgid,
sessionDbId,
process: processRef,
};
}
/**
* Wait for an SDK subprocess to exit, escalating to SIGKILL on the process
* group if it overstays `timeoutMs`. Fully event-driven: no polling.
*
* This is primary-path cleanup invoked from session-level finally() blocks
* when a session ends; it is NOT a reaper. It runs at most once per session
* deletion. Process-group teardown (`kill(-pgid, SIGKILL)`) ensures any
* descendants the SDK spawned are also killed.
*/
export async function ensureSdkProcessExit(
tracked: TrackedSdkProcess,
timeoutMs: number = 5000
): Promise<void> {
const { pid, pgid, process: proc } = tracked;
// Already exited? Trust exitCode, not proc.killed — proc.killed only means
// Node sent a signal; the process may still be running.
if (proc.exitCode !== null) return;
const exitPromise = new Promise<void>((resolve) => {
proc.once('exit', () => resolve());
});
const timeoutPromise = new Promise<void>((resolve) => {
setTimeout(resolve, timeoutMs);
});
await Promise.race([exitPromise, timeoutPromise]);
if (proc.exitCode !== null) return;
// Timeout: escalate to SIGKILL on the whole process group so any
// descendants the SDK spawned are killed too (Principle 5).
logger.warn('PROCESS', `PID ${pid} did not exit after ${timeoutMs}ms, sending SIGKILL to process group`, {
pid, pgid, timeoutMs,
});
try {
if (typeof pgid === 'number' && process.platform !== 'win32') {
process.kill(-pgid, 'SIGKILL');
} else {
proc.kill('SIGKILL');
}
} catch {
// Already dead — fine.
}
// Wait up to 1s for SIGKILL to take effect (event-driven, not blind sleep).
const sigkillExit = new Promise<void>((resolve) => {
proc.once('exit', () => resolve());
});
const sigkillTimeout = new Promise<void>((resolve) => {
setTimeout(resolve, 1000);
});
await Promise.race([sigkillExit, sigkillTimeout]);
}
// ---------------------------------------------------------------------------
// Pool slot waiters — backpressure without eviction
// ---------------------------------------------------------------------------
//
// waitForSlot is used by SDKAgent to avoid starting more concurrent SDK
// subprocesses than configured. It is event-driven: when a process exits and
// is unregistered, notifySlotAvailable() wakes exactly one waiter. There is
// no polling. There is no idle-session eviction (Principle 1 — do not kick
// live sessions to make room; a full pool must apply backpressure upstream).
const TOTAL_PROCESS_HARD_CAP = 10;
const slotWaiters: Array<() => void> = [];
function getActiveSdkCount(): number {
return getProcessRegistry().getAll().filter(record => record.type === 'sdk').length;
}
function notifySlotAvailable(): void {
const waiter = slotWaiters.shift();
if (waiter) waiter();
}
/**
* Wait until a pool slot is available to spawn another SDK subprocess.
*
* Resolves immediately when active SDK process count is below `maxConcurrent`.
* Otherwise enqueues a waiter that is woken by a subsequent exit handler.
* Rejects with a timeout error if no slot opens within `timeoutMs`.
* Rejects immediately if the registry is already at the hard cap.
*/
export async function waitForSlot(maxConcurrent: number, timeoutMs: number = 60_000): Promise<void> {
const activeCount = getActiveSdkCount();
if (activeCount >= TOTAL_PROCESS_HARD_CAP) {
throw new Error(`Hard cap exceeded: ${activeCount} processes in registry (cap=${TOTAL_PROCESS_HARD_CAP}). Refusing to spawn more.`);
}
if (activeCount < maxConcurrent) return;
logger.info('PROCESS', `Pool limit reached (${activeCount}/${maxConcurrent}), waiting for slot...`);
return new Promise<void>((resolve, reject) => {
const timeout = setTimeout(() => {
const idx = slotWaiters.indexOf(onSlot);
if (idx >= 0) slotWaiters.splice(idx, 1);
reject(new Error(`Timed out waiting for agent pool slot after ${timeoutMs}ms`));
}, timeoutMs);
const onSlot = () => {
clearTimeout(timeout);
if (getActiveSdkCount() < maxConcurrent) {
resolve();
} else {
slotWaiters.push(onSlot);
}
};
slotWaiters.push(onSlot);
});
}
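The wake-one discipline is the core of the waiter queue: each freed slot resolves exactly one pending waiter, in arrival order, never all of them. A minimal synchronous model of the same queue (names are illustrative; the real waiters resolve promises rather than append to a log):

```typescript
// FIFO waiter queue: notifyOne() wakes exactly one waiter, oldest first.
const waiters: Array<() => void> = [];
const log: string[] = [];

function enqueue(name: string): void {
  waiters.push(() => log.push(name));
}

function notifyOne(): void {
  const waiter = waiters.shift(); // FIFO: the oldest waiter wins the slot
  if (waiter) waiter();
}

enqueue('a');
enqueue('b');
notifyOne(); // wakes only 'a'; 'b' stays queued for the next freed slot
```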
// ---------------------------------------------------------------------------
// SDK subprocess spawn
// ---------------------------------------------------------------------------
export interface SpawnedSdkProcess {
stdin: NonNullable<ChildProcess['stdin']>;
stdout: NonNullable<ChildProcess['stdout']>;
stderr: NonNullable<ChildProcess['stderr']>;
readonly killed: boolean;
readonly exitCode: number | null;
kill: ChildProcess['kill'];
on: ChildProcess['on'];
once: ChildProcess['once'];
off: ChildProcess['off'];
}
export interface SpawnSdkOptions {
command: string;
args: string[];
cwd?: string;
env?: NodeJS.ProcessEnv;
signal?: AbortSignal;
}
/**
* Spawn a Claude SDK subprocess in its own POSIX process group.
*
* The spawn uses `detached: true` so the child becomes the leader of a new
* process group (setpgid). The leader's PID equals its pgid on Unix, so we
* store `child.pid` as both pid and pgid on the managed process record.
* Shutdown then signals the group via `process.kill(-pgid, signal)`, tearing
* down the SDK child AND every descendant in one syscall (Principle 5).
*
* Windows caveat: `detached: true` does not create a POSIX group. The
* recorded pgid is still the child PID so Windows teardown at least kills
* the direct child; full subtree teardown on Windows requires Job Objects
* or `taskkill /T /F` (see shutdown.ts).
*
* Node's child_process.spawn is used intentionally: Bun.spawn does NOT
* support `detached: true` (see PATHFINDER-2026-04-22/_reference.md Part 2
* row 3), and this module must work under Bun as well as Node.
*/
export function spawnSdkProcess(
sessionDbId: number,
options: SpawnSdkOptions
): { process: SpawnedSdkProcess; pid: number; pgid: number } | null {
const registry = getProcessRegistry();
// On Windows, use cmd.exe wrapper for .cmd files to properly handle paths with spaces.
const useCmdWrapper = process.platform === 'win32' && options.command.endsWith('.cmd');
const env = sanitizeEnv(options.env ?? process.env);
// Filter empty string args AND their preceding flag (Issue #2049).
// The Agent SDK emits ["--setting-sources", ""] when settingSources defaults to [].
// Simply dropping "" leaves an orphan --setting-sources that consumes the next
// flag as its value, crashing Claude Code 2.1.109+ with
// "Invalid setting source: --permission-mode". Drop the flag too so the SDK
// default (no setting sources) is preserved by omission.
const filteredArgs: string[] = [];
for (const arg of options.args) {
if (arg === '') {
if (filteredArgs.length > 0 && filteredArgs[filteredArgs.length - 1].startsWith('--')) {
filteredArgs.pop();
}
continue;
}
filteredArgs.push(arg);
}
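The flag-aware filter above is worth pinning down with concrete inputs, since dropping only the empty string would leave an orphan flag behind. A standalone version of the same loop (the function name is illustrative):

```typescript
// Drop empty-string args AND the flag immediately before them, so
// ["--setting-sources", ""] disappears entirely instead of leaving an
// orphan --setting-sources that swallows the next flag as its value.
function filterEmptyFlagArgs(args: string[]): string[] {
  const out: string[] = [];
  for (const arg of args) {
    if (arg === '') {
      if (out.length > 0 && out[out.length - 1].startsWith('--')) {
        out.pop(); // remove the flag the empty value belonged to
      }
      continue;
    }
    out.push(arg);
  }
  return out;
}
```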
// Unix: detached:true causes the kernel to setpgid() on the child so the
// child becomes leader of a new process group whose pgid equals its pid.
// Windows: detached:true decouples the child from the parent console; there
// is no POSIX group, but the flag is still safe to pass.
//
// stdin must be 'pipe' (not 'ignore') because SpawnedSdkProcess.stdin is
// typed NonNullable<...> and the Claude Agent SDK consumes that pipe to
// stream prompts in. With 'ignore', child.stdin would be null and the
// null-check below (line ~737) would tear the child down immediately.
const child = useCmdWrapper
? spawn('cmd.exe', ['/d', '/c', options.command, ...filteredArgs], {
cwd: options.cwd,
env,
detached: true,
stdio: ['pipe', 'pipe', 'pipe'],
signal: options.signal,
windowsHide: true,
})
: spawn(options.command, filteredArgs, {
cwd: options.cwd,
env,
detached: true,
stdio: ['pipe', 'pipe', 'pipe'],
signal: options.signal,
windowsHide: true,
});
// ALWAYS attach an 'error' listener BEFORE any other code runs, regardless of
// whether the child has a PID. child_process.spawn emits 'error' asynchronously
// for ENOENT, EACCES, AbortSignal-driven aborts, etc. Without a listener these
// become uncaughtException — the cause of "The operation was aborted." escaping
// to the daemon during crash-recovery loops.
child.on('error', (err: Error) => {
logger.warn('SDK_SPAWN', `[session-${sessionDbId}] child emitted error event`, {
sessionDbId,
pid: child.pid,
errorName: err.name,
errorCode: (err as NodeJS.ErrnoException).code,
}, err);
});
if (!child.pid) {
logger.error('PROCESS', 'Spawn succeeded but produced no PID', { sessionDbId });
return null;
}
const pid = child.pid;
const pgid = pid; // On Unix with detached:true, pgid === pid. On Windows, this is an alias.
// Capture stderr for debugging spawn failures.
if (child.stderr) {
child.stderr.on('data', (data: Buffer) => {
logger.debug('SDK_SPAWN', `[session-${sessionDbId}] stderr: ${data.toString().trim()}`);
});
}
// Register the process in the supervisor registry with pgid recorded so
// the shutdown cascade can signal the whole group.
const recordId = `sdk:${sessionDbId}:${pid}`;
registry.register(recordId, {
pid,
type: 'sdk',
sessionId: sessionDbId,
startedAt: new Date().toISOString(),
pgid,
}, child);
// Auto-unregister on exit. child.on('exit') is the authoritative event-driven
// signal that a process has left — no polling, no sweeper needed (Principle 4).
child.on('exit', (code: number | null, signal: string | null) => {
if (code !== 0) {
logger.warn('SDK_SPAWN', `[session-${sessionDbId}] Claude process exited`, { code, signal, pid });
}
registry.unregister(recordId);
// Wake one pool-slot waiter since a slot just freed up.
notifySlotAvailable();
});
if (!child.stdin || !child.stdout || !child.stderr) {
logger.error('PROCESS', 'Spawned SDK child missing required stdio streams', {
sessionDbId,
pid,
hasStdin: Boolean(child.stdin),
hasStdout: Boolean(child.stdout),
hasStderr: Boolean(child.stderr),
});
try { child.kill('SIGKILL'); } catch { /* already dead */ }
return null;
}
const spawned: SpawnedSdkProcess = {
stdin: child.stdin,
stdout: child.stdout,
stderr: child.stderr,
get killed() { return child.killed; },
get exitCode() { return child.exitCode; },
kill: child.kill.bind(child),
on: child.on.bind(child),
once: child.once.bind(child),
off: child.off.bind(child),
};
return { process: spawned, pid, pgid };
}
/**
* SDK-compatible spawn factory.
*
* The Claude Agent SDK's `spawnClaudeCodeProcess` option calls our factory
* with its own spawn arguments; we forward them into `spawnSdkProcess` which
* creates the child in its own process group and records it in the supervisor
* registry. The returned shape is the minimal subset of ChildProcess that the
* SDK consumes: stdin/stdout/stderr pipes, killed/exitCode getters, and
* kill/on/once/off.
*
* Pre-spawn cleanup: if a previous process for this session is still alive
* (e.g. a crash-recovery attempt that collided with a still-running SDK),
* SIGTERM it. Multiple processes sharing the same --resume UUID waste API
* credits and can conflict with each other (Issue #1590).
*/
export function createSdkSpawnFactory(sessionDbId: number) {
return (spawnOptions: SpawnSdkOptions): SpawnedSdkProcess => {
const registry = getProcessRegistry();
// Kill any existing process for this session before spawning a new one.
const existing = registry.getBySession(sessionDbId).filter(r => r.type === 'sdk');
for (const record of existing) {
if (!isPidAlive(record.pid)) continue;
try {
if (typeof record.pgid === 'number') {
// Signal the whole group — kill the SDK child and any descendants.
if (process.platform !== 'win32') {
process.kill(-record.pgid, 'SIGTERM');
} else {
process.kill(record.pid, 'SIGTERM');
}
} else {
process.kill(record.pid, 'SIGTERM');
}
logger.warn('PROCESS', `Killing duplicate SDK process PID ${record.pid} before spawning new one for session ${sessionDbId}`, {
existingPid: record.pid,
sessionDbId,
});
} catch (error: unknown) {
const code = error instanceof Error ? (error as NodeJS.ErrnoException).code : undefined;
if (code !== 'ESRCH') {
if (error instanceof Error) {
logger.warn('PROCESS', `Failed to SIGTERM duplicate SDK process PID ${record.pid}`, { sessionDbId }, error);
} else {
logger.warn('PROCESS', `Failed to SIGTERM duplicate SDK process PID ${record.pid} (non-Error)`, {
sessionDbId, error: String(error),
});
}
}
}
}
const result = spawnSdkProcess(sessionDbId, spawnOptions);
if (!result) {
// Match the legacy failure mode: the SDK needs a process-like object
// even on spawn failure; throwing here surfaces via exit code 2 to the
// hook layer (Principle 2 — fail-fast).
throw new Error(`Failed to spawn SDK subprocess for session ${sessionDbId}`);
}
return result.process;
};
}
@@ -34,16 +34,18 @@ export async function runShutdownCascade(options: ShutdownCascadeOptions): Promi
}
try {
await signalProcess(record.pid, 'SIGTERM');
await signalProcess(record, 'SIGTERM');
} catch (error: unknown) {
if (error instanceof Error) {
logger.debug('SYSTEM', 'Failed to send SIGTERM to child process', {
pid: record.pid,
pgid: record.pgid,
type: record.type
}, error);
} else {
logger.warn('SYSTEM', 'Failed to send SIGTERM to child process (non-Error)', {
pid: record.pid,
pgid: record.pgid,
type: record.type,
error: String(error)
});
@@ -56,16 +58,18 @@ export async function runShutdownCascade(options: ShutdownCascadeOptions): Promi
const survivors = childRecords.filter(record => isPidAlive(record.pid));
for (const record of survivors) {
try {
await signalProcess(record.pid, 'SIGKILL');
await signalProcess(record, 'SIGKILL');
} catch (error: unknown) {
if (error instanceof Error) {
logger.debug('SYSTEM', 'Failed to force kill child process', {
pid: record.pid,
pgid: record.pgid,
type: record.type
}, error);
} else {
logger.warn('SYSTEM', 'Failed to force kill child process (non-Error)', {
pid: record.pid,
pgid: record.pgid,
type: record.type,
error: String(error)
});
@@ -110,7 +114,38 @@ async function waitForExit(records: ManagedProcessRecord[], timeoutMs: number):
}
}
async function signalProcess(pid: number, signal: 'SIGTERM' | 'SIGKILL'): Promise<void> {
async function signalProcess(record: ManagedProcessRecord, signal: 'SIGTERM' | 'SIGKILL'): Promise<void> {
const { pid, pgid } = record;
// Unix path: when the record carries a pgid (set when the child was spawned
// with detached:true so it became its own group leader), signal the negative
// PID to tear down the whole process group in one syscall — the SDK child
// AND every descendant it spawned. This replaces hand-rolled orphan sweeps
// (Principle 5: OS-supervised process groups over hand-rolled reapers).
//
// Falls back to single-PID kill when pgid is absent (the worker itself,
// MCP stdio clients, anything not spawned with detached:true).
if (process.platform !== 'win32') {
try {
if (typeof pgid === 'number') {
process.kill(-pgid, signal);
} else {
process.kill(pid, signal);
}
} catch (error: unknown) {
if (error instanceof Error) {
const errno = (error as NodeJS.ErrnoException).code;
if (errno === 'ESRCH') {
return;
}
}
throw error;
}
return;
}
// Windows: no POSIX process groups. SIGTERM uses single-PID kill; SIGKILL
// uses tree-kill or taskkill /T to walk the descendant tree.
if (signal === 'SIGTERM') {
try {
process.kill(pid, signal);
@@ -126,50 +161,35 @@ async function signalProcess(pid: number, signal: 'SIGTERM' | 'SIGKILL'): Promis
return;
}
if (process.platform === 'win32') {
const treeKill = await loadTreeKill();
if (treeKill) {
await new Promise<void>((resolve, reject) => {
treeKill(pid, signal, (error) => {
if (!error) {
resolve();
return;
}
const treeKill = await loadTreeKill();
if (treeKill) {
await new Promise<void>((resolve, reject) => {
treeKill(pid, signal, (error) => {
if (!error) {
resolve();
return;
}
const errno = (error as NodeJS.ErrnoException).code;
if (errno === 'ESRCH') {
resolve();
return;
}
reject(error);
});
const errno = (error as NodeJS.ErrnoException).code;
if (errno === 'ESRCH') {
resolve();
return;
}
reject(error);
});
return;
}
const args = ['/PID', String(pid), '/T'];
if (signal === 'SIGKILL') {
args.push('/F');
}
await execFileAsync('taskkill', args, {
timeout: HOOK_TIMEOUTS.POWERSHELL_COMMAND,
windowsHide: true
});
return;
}
try {
process.kill(pid, signal);
} catch (error: unknown) {
if (error instanceof Error) {
const errno = (error as NodeJS.ErrnoException).code;
if (errno === 'ESRCH') {
return;
}
}
throw error;
const args = ['/PID', String(pid), '/T'];
if (signal === 'SIGKILL') {
args.push('/F');
}
await execFileAsync('taskkill', args, {
timeout: HOOK_TIMEOUTS.POWERSHELL_COMMAND,
windowsHide: true
});
}
async function loadTreeKill(): Promise<TreeKillFn | null> {
@@ -15,7 +15,7 @@ type DataItem = Observation | Summary | UserPrompt;
/**
* Generic pagination hook for observations, summaries, and prompts
*/
function usePaginationFor(endpoint: string, dataType: DataType, currentFilter: string, currentSource: string) {
function usePaginationFor<TItem extends DataItem>(endpoint: string, dataType: DataType, currentFilter: string, currentSource: string) {
const [state, setState] = useState<PaginationState>({
isLoading: false,
hasMore: true
@@ -30,7 +30,7 @@ function usePaginationFor(endpoint: string, dataType: DataType, currentFilter: s
* Load more items from the API
* Automatically resets offset to 0 if filter has changed
*/
const loadMore = useCallback(async (): Promise<DataItem[]> => {
const loadMore = useCallback(async (): Promise<TItem[]> => {
// Check if filter changed - if so, reset pagination synchronously
const selectionKey = `${currentSource}::${currentFilter}`;
const filterChanged = lastSelectionRef.current !== selectionKey;
@@ -75,7 +75,7 @@ function usePaginationFor(endpoint: string, dataType: DataType, currentFilter: s
throw new Error(`Failed to load ${dataType}: ${response.statusText}`);
}
const data = await response.json() as { items: DataItem[], hasMore: boolean };
const data = await response.json() as { items: TItem[], hasMore: boolean };
const nextState = {
...stateRef.current,
@@ -106,9 +106,9 @@ function usePaginationFor(endpoint: string, dataType: DataType, currentFilter: s
* Hook for paginating observations
*/
export function usePagination(currentFilter: string, currentSource: string) {
const observations = usePaginationFor(API_ENDPOINTS.OBSERVATIONS, 'observations', currentFilter, currentSource);
const summaries = usePaginationFor(API_ENDPOINTS.SUMMARIES, 'summaries', currentFilter, currentSource);
const prompts = usePaginationFor(API_ENDPOINTS.PROMPTS, 'prompts', currentFilter, currentSource);
const observations = usePaginationFor<Observation>(API_ENDPOINTS.OBSERVATIONS, 'observations', currentFilter, currentSource);
const summaries = usePaginationFor<Summary>(API_ENDPOINTS.SUMMARIES, 'summaries', currentFilter, currentSource);
const prompts = usePaginationFor<UserPrompt>(API_ENDPOINTS.PROMPTS, 'prompts', currentFilter, currentSource);
return {
observations,
@@ -0,0 +1,9 @@
{
"extends": "../../../tsconfig.json",
"compilerOptions": {
"lib": ["ES2022", "DOM", "DOM.Iterable"],
"rootDir": "."
},
"include": ["./**/*"],
"exclude": []
}
@@ -1,80 +0,0 @@
/**
/**
 * Bun Path Utility
 *
 * Resolves the Bun executable path for environments where Bun is not in PATH
 * (e.g., fish shell users where ~/.config/fish/config.fish isn't read by /bin/sh)
 */
import { spawnSync } from 'child_process';
import { existsSync } from 'fs';
import { join } from 'path';
import { homedir } from 'os';
import { logger } from './logger.js';

/**
 * Get the Bun executable path
 * Tries PATH first, then checks common installation locations
 * Returns absolute path if found, null otherwise
 */
export function getBunPath(): string | null {
  const isWindows = process.platform === 'win32';

  // Try PATH first
  try {
    const result = spawnSync('bun', ['--version'], {
      encoding: 'utf-8',
      stdio: ['pipe', 'pipe', 'pipe'],
      shell: false // SECURITY: No need for shell, bun is the executable
    });
    if (result.status === 0) {
      return 'bun'; // Available in PATH
    }
  } catch (e) {
    logger.debug('SYSTEM', 'Bun not found in PATH, checking common installation locations', {
      error: e instanceof Error ? e.message : String(e)
    });
  }

  // Check common installation paths
  const bunPaths = isWindows
    ? [join(homedir(), '.bun', 'bin', 'bun.exe')]
    : [
        join(homedir(), '.bun', 'bin', 'bun'),
        '/usr/local/bin/bun',
        '/opt/homebrew/bin/bun', // Apple Silicon Homebrew
        '/home/linuxbrew/.linuxbrew/bin/bun' // Linux Homebrew
      ];

  for (const bunPath of bunPaths) {
    if (existsSync(bunPath)) {
      return bunPath;
    }
  }

  return null;
}

/**
 * Get the Bun executable path or throw an error
 * Use this when Bun is required for operation
 */
export function getBunPathOrThrow(): string {
  const bunPath = getBunPath();
  if (!bunPath) {
    const isWindows = process.platform === 'win32';
    const installCmd = isWindows
      ? 'powershell -c "irm bun.sh/install.ps1 | iex"'
      : 'curl -fsSL https://bun.sh/install | bash';
    throw new Error(
      `Bun is required but not found. Install it with:\n  ${installCmd}\nThen restart your terminal.`
    );
  }
  return bunPath;
}

/**
 * Check if Bun is available (in PATH or common locations)
 */
export function isBunAvailable(): boolean {
  return getBunPath() !== null;
}
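The fallback lookup order above can be made explicit as a pure function. A minimal, hypothetical sketch: `candidateBunPaths` is not part of the module, and paths are built with template strings rather than `path.join` purely for brevity.

```typescript
// Hypothetical helper mirroring getBunPath's fallback order:
// ~/.bun first, then system-wide and Homebrew locations on POSIX.
function candidateBunPaths(platform: string, home: string): string[] {
  if (platform === 'win32') {
    return [`${home}/.bun/bin/bun.exe`];
  }
  return [
    `${home}/.bun/bin/bun`,
    '/usr/local/bin/bun',
    '/opt/homebrew/bin/bun',              // Apple Silicon Homebrew
    '/home/linuxbrew/.linuxbrew/bin/bun'  // Linux Homebrew
  ];
}

console.log(candidateBunPaths('linux', '/home/u')[0]); // "/home/u/.bun/bin/bun"
```

Keeping the candidate list as data (rather than branching inline) is what lets the real module try PATH first and only then walk the list.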
+38 -3
@@ -15,12 +15,47 @@ export enum LogLevel {
   SILENT = 4
 }

-export type Component = 'HOOK' | 'WORKER' | 'SDK' | 'PARSER' | 'DB' | 'SYSTEM' | 'HTTP' | 'SESSION' | 'CHROMA' | 'CHROMA_MCP' | 'CHROMA_SYNC' | 'FOLDER_INDEX' | 'CLAUDE_MD' | 'QUEUE' | 'TELEGRAM';
+export type Component =
+  | 'AGENTS_MD'
+  | 'BRANCH'
+  | 'CHROMA'
+  | 'CHROMA_MCP'
+  | 'CHROMA_SYNC'
+  | 'CLAUDE_MD'
+  | 'CONFIG'
+  | 'CONSOLE'
+  | 'CURSOR'
+  | 'DB'
+  | 'DEDUP'
+  | 'ENV'
+  | 'FOLDER_INDEX'
+  | 'HOOK'
+  | 'HTTP'
+  | 'IMPORT'
+  | 'INGEST'
+  | 'OPENCLAW'
+  | 'OPENCODE'
+  | 'PARSER'
+  | 'PROCESS'
+  | 'PROJECT_NAME'
+  | 'QUEUE'
+  | 'SDK'
+  | 'SDK_SPAWN'
+  | 'SEARCH'
+  | 'SECURITY'
+  | 'SESSION'
+  | 'SETTINGS'
+  | 'SHUTDOWN'
+  | 'SYSTEM'
+  | 'TELEGRAM'
+  | 'TRANSCRIPT'
+  | 'WINDSURF'
+  | 'WORKER';

 interface LogContext {
-  sessionId?: number;
+  sessionId?: string | number;
   memorySessionId?: string;
-  correlationId?: string;
+  correlationId?: string | number;
   [key: string]: any;
 }
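The widened id fields accept both numeric rowids and UUID-style strings. A small illustrative sketch; the `describeSession` helper below is hypothetical and exists only to show why the union matters for log formatting.

```typescript
interface LogContext {
  sessionId?: string | number;
  memorySessionId?: string;
  correlationId?: string | number;
  [key: string]: any;
}

// Hypothetical: normalize either id shape to a display string for log lines.
function describeSession(ctx: LogContext): string {
  return ctx.sessionId === undefined ? 'unknown' : String(ctx.sessionId);
}

console.log(describeSession({ sessionId: 42 }));        // "42"
console.log(describeSession({ sessionId: 'a1b2-c3' })); // "a1b2-c3"
```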
+60 -45
@@ -10,82 +10,97 @@
  * (should not be persisted to memory)
  * 4. <system-reminder> - Claude Code-injected system reminders
  * (CLAUDE.md contents, deferred tool lists, etc. should not be persisted)
  * 5. <persisted-output> - Persisted-output payload tag
  *
  * EDGE PROCESSING PATTERN: Filter at hook layer before sending to worker/storage.
  * This keeps the worker service simple and follows one-way data stream.
+ *
+ * PATHFINDER plan 03 phase 8: collapsed countTags + stripTagsInternal into a
+ * single alternation regex. One pass over the input. One helper, N callers
+ * (`stripMemoryTagsFromJson` / `stripMemoryTagsFromPrompt` are thin adapters).
  */
 import { logger } from './logger.js';

+/** All tag names this module strips. Single source of truth for the regex. */
+const TAG_NAMES = [
+  'private',
+  'claude-mem-context',
+  'system_instruction',
+  'system-instruction',
+  'persisted-output',
+  'system-reminder',
+] as const;
+
+type TagName = (typeof TAG_NAMES)[number];
+
+/**
+ * Single-pass alternation regex covering every privacy / context tag.
+ * Backreference `\1` ensures a closing tag matches the opening name; tag
+ * attributes (e.g. `<system-reminder data-foo="…">`) are tolerated via
+ * `[^>]*`.
+ */
+const STRIP_REGEX = new RegExp(
+  `<(${TAG_NAMES.join('|')})\\b[^>]*>[\\s\\S]*?</\\1>`,
+  'g'
+);

 /**
  * Regex to match <system-reminder> tags and their content.
  * Exported for use by transcript parsers that strip system-reminder at read-time.
+ *
+ * Kept as a separate single-tag regex because the active transcript parser
+ * (`src/shared/transcript-parser.ts`) consumes only this one tag and would
+ * otherwise need to re-import the multi-tag list.
  */
 export const SYSTEM_REMINDER_REGEX = /<system-reminder>[\s\S]*?<\/system-reminder>/g;

-/**
- * Maximum number of tags allowed in a single content block
- * This protects against ReDoS (Regular Expression Denial of Service) attacks
- * where malicious input with many nested/unclosed tags could cause catastrophic backtracking
- */
+/** Maximum total stripped-tag count before we log a ReDoS-class anomaly. */
 const MAX_TAG_COUNT = 100;

 /**
- * Count total number of opening tags in content
- * Used for ReDoS protection before regex processing
+ * Strip every recognised tag from `input` in a single pass.
+ *
+ * @returns the stripped string (trimmed) and per-tag counts. Counts are
+ *          surfaced to logs for observability but are not used as a control
+ *          signal.
  */
-function countTags(content: string): number {
-  const privateCount = (content.match(/<private>/g) || []).length;
-  const contextCount = (content.match(/<claude-mem-context>/g) || []).length;
-  const systemInstructionCount = (content.match(/<system_instruction>/g) || []).length;
-  const systemInstructionHyphenCount = (content.match(/<system-instruction>/g) || []).length;
-  const persistedOutputCount = (content.match(/<persisted-output>/g) || []).length;
-  const systemReminderCount = (content.match(/<system-reminder>/g) || []).length;
-  return privateCount + contextCount + systemInstructionCount + systemInstructionHyphenCount + persistedOutputCount + systemReminderCount;
-}
+export function stripTags(input: string): { stripped: string; counts: Record<TagName, number> } {
+  const counts: Record<TagName, number> = Object.fromEntries(
+    TAG_NAMES.map(name => [name, 0])
+  ) as Record<TagName, number>;

-/**
- * Internal function to strip memory tags from content
- * Shared logic extracted from both JSON and prompt stripping functions
- */
-function stripTagsInternal(content: string): string {
-  // ReDoS protection: limit tag count before regex processing
-  const tagCount = countTags(content);
-  if (tagCount > MAX_TAG_COUNT) {
+  STRIP_REGEX.lastIndex = 0; // /g state is per-instance — reset before each call.
+  let total = 0;
+  const stripped = input.replace(STRIP_REGEX, (_, name: TagName) => {
+    counts[name] = (counts[name] ?? 0) + 1;
+    total += 1;
+    return '';
+  });
+
+  if (total > MAX_TAG_COUNT) {
     logger.warn('SYSTEM', 'tag count exceeds limit', undefined, {
-      tagCount,
+      tagCount: total,
       maxAllowed: MAX_TAG_COUNT,
-      contentLength: content.length
+      contentLength: input.length,
     });
     // Still process but log the anomaly
   }
-  return content
-    .replace(/<claude-mem-context>[\s\S]*?<\/claude-mem-context>/g, '')
-    .replace(/<private>[\s\S]*?<\/private>/g, '')
-    .replace(/<system_instruction>[\s\S]*?<\/system_instruction>/g, '')
-    .replace(/<system-instruction>[\s\S]*?<\/system-instruction>/g, '')
-    .replace(/<persisted-output>[\s\S]*?<\/persisted-output>/g, '')
-    .replace(SYSTEM_REMINDER_REGEX, '')
-    .trim();
+
+  return { stripped: stripped.trim(), counts };
 }

 /**
- * Strip memory tags from JSON-serialized content (tool inputs/responses)
- *
- * @param content - Stringified JSON content from tool_input or tool_response
- * @returns Cleaned content with tags removed, or '{}' if invalid
+ * Strip memory tags from JSON-serialized content (tool inputs/responses).
+ * Thin adapter around `stripTags`: same regex, same single pass.
  */
 export function stripMemoryTagsFromJson(content: string): string {
-  return stripTagsInternal(content);
+  return stripTags(content).stripped;
 }

 /**
- * Strip memory tags from user prompt content
- *
- * @param content - Raw user prompt text
- * @returns Cleaned content with tags removed
+ * Strip memory tags from user prompt content.
+ * Thin adapter around `stripTags`: same regex, same single pass.
  */
 export function stripMemoryTagsFromPrompt(content: string): string {
-  return stripTagsInternal(content);
+  return stripTags(content).stripped;
 }
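The backreference trick in the alternation regex can be exercised in isolation. A minimal sketch with the same pattern shape, reduced to two tag names; `strip` here is a hypothetical stand-in for the real helper, not the module's export.

```typescript
const TAGS = ['private', 'system-reminder'] as const;

// Same shape as STRIP_REGEX: alternation of tag names, attribute-tolerant
// opener ([^>]*), lazy body, and a \1 backreference so the closing tag's
// name must match the opening tag's name.
const STRIP = new RegExp(`<(${TAGS.join('|')})\\b[^>]*>[\\s\\S]*?</\\1>`, 'g');

function strip(input: string): string {
  STRIP.lastIndex = 0; // /g regexes carry per-instance state
  return input.replace(STRIP, '').trim();
}

console.log(strip('a <private>x</private>b'));                           // "a b"
console.log(strip('<system-reminder data-x="1">y</system-reminder>ok')); // "ok"
// Mismatched open/close names fail the backreference, so nothing is stripped:
console.log(strip('<private>x</system-reminder>'));
```

The third case is the point of `\1`: a naive `</(?:private|system-reminder)>` closer would happily pair `<private>` with `</system-reminder>` and over-strip.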
-266
@@ -1,266 +0,0 @@
/**
* TranscriptParser - Properly parse Claude Code transcript JSONL files
* Handles all transcript entry types based on validated model
*/
import { readFileSync } from 'fs';
import { logger } from './logger.js';
import { SYSTEM_REMINDER_REGEX } from './tag-stripping.js';
import type {
TranscriptEntry,
UserTranscriptEntry,
AssistantTranscriptEntry,
SummaryTranscriptEntry,
SystemTranscriptEntry,
QueueOperationTranscriptEntry,
ContentItem,
TextContent,
} from '../types/transcript.js';
export interface ParseStats {
totalLines: number;
parsedEntries: number;
failedLines: number;
entriesByType: Record<string, number>;
failureRate: number;
}
export class TranscriptParser {
private entries: TranscriptEntry[] = [];
private parseErrors: Array<{ lineNumber: number; error: string }> = [];
constructor(transcriptPath: string) {
this.parseTranscript(transcriptPath);
}
private parseTranscript(transcriptPath: string): void {
const content = readFileSync(transcriptPath, 'utf-8').trim();
if (!content) return;
const lines = content.split('\n');
lines.forEach((line, index) => {
try {
const entry = JSON.parse(line) as TranscriptEntry;
this.entries.push(entry);
} catch (error) {
logger.debug('PARSER', 'Failed to parse transcript line', { lineNumber: index + 1 }, error as Error);
this.parseErrors.push({
lineNumber: index + 1,
error: error instanceof Error ? error.message : String(error),
});
}
});
// Log summary if there were parse errors
if (this.parseErrors.length > 0) {
logger.error('PARSER', `Failed to parse ${this.parseErrors.length} lines`, {
path: transcriptPath,
totalLines: lines.length,
errorCount: this.parseErrors.length
});
}
}
/**
* Get all entries of a specific type
*/
getEntriesByType<T extends TranscriptEntry>(type: T['type']): T[] {
return this.entries.filter((e) => e.type === type) as T[];
}
/**
* Get all user entries
*/
getUserEntries(): UserTranscriptEntry[] {
return this.getEntriesByType<UserTranscriptEntry>('user');
}
/**
* Get all assistant entries
*/
getAssistantEntries(): AssistantTranscriptEntry[] {
return this.getEntriesByType<AssistantTranscriptEntry>('assistant');
}
/**
* Get all summary entries
*/
getSummaryEntries(): SummaryTranscriptEntry[] {
return this.getEntriesByType<SummaryTranscriptEntry>('summary');
}
/**
* Get all system entries
*/
getSystemEntries(): SystemTranscriptEntry[] {
return this.getEntriesByType<SystemTranscriptEntry>('system');
}
/**
* Get all queue operation entries
*/
getQueueOperationEntries(): QueueOperationTranscriptEntry[] {
return this.getEntriesByType<QueueOperationTranscriptEntry>('queue-operation');
}
/**
* Get last entry of a specific type
*/
getLastEntryByType<T extends TranscriptEntry>(type: T['type']): T | null {
const entries = this.getEntriesByType<T>(type);
return entries.length > 0 ? entries[entries.length - 1] : null;
}
/**
* Extract text content from content items
*/
private extractTextFromContent(content: string | ContentItem[]): string {
if (typeof content === 'string') {
return content;
}
if (Array.isArray(content)) {
return content
.filter((item): item is TextContent => item.type === 'text')
.map((item) => item.text)
.join('\n');
}
return '';
}
/**
* Get last user message text (finds last entry with actual text content)
*/
getLastUserMessage(): string {
const userEntries = this.getUserEntries();
// Iterate backward to find the last user message with text content
for (let i = userEntries.length - 1; i >= 0; i--) {
const entry = userEntries[i];
if (!entry?.message?.content) continue;
const text = this.extractTextFromContent(entry.message.content);
if (text) return text;
}
return '';
}
/**
* Get last assistant message text (finds last entry with text content, with optional system-reminder filtering)
*/
getLastAssistantMessage(filterSystemReminders = true): string {
const assistantEntries = this.getAssistantEntries();
// Iterate backward to find the last assistant message with text content
for (let i = assistantEntries.length - 1; i >= 0; i--) {
const entry = assistantEntries[i];
if (!entry?.message?.content) continue;
let text = this.extractTextFromContent(entry.message.content);
if (!text) continue;
if (filterSystemReminders) {
// Filter out system-reminder tags and their content
text = text.replace(SYSTEM_REMINDER_REGEX, '');
// Clean up excessive whitespace
text = text.replace(/\n{3,}/g, '\n\n').trim();
}
if (text) return text;
}
return '';
}
/**
* Get all tool use operations from assistant entries
*/
getToolUseHistory(): Array<{ name: string; timestamp: string; input: any }> {
const toolUses: Array<{ name: string; timestamp: string; input: any }> = [];
for (const entry of this.getAssistantEntries()) {
if (Array.isArray(entry.message.content)) {
for (const item of entry.message.content) {
if (item.type === 'tool_use') {
toolUses.push({
name: item.name,
timestamp: entry.timestamp,
input: item.input,
});
}
}
}
}
return toolUses;
}
/**
* Get total token usage across all assistant messages
*/
getTotalTokenUsage(): {
inputTokens: number;
outputTokens: number;
cacheCreationTokens: number;
cacheReadTokens: number;
} {
const assistantEntries = this.getAssistantEntries();
return assistantEntries.reduce(
(acc, entry) => {
const usage = entry.message.usage;
if (usage) {
acc.inputTokens += usage.input_tokens || 0;
acc.outputTokens += usage.output_tokens || 0;
acc.cacheCreationTokens += usage.cache_creation_input_tokens || 0;
acc.cacheReadTokens += usage.cache_read_input_tokens || 0;
}
return acc;
},
{
inputTokens: 0,
outputTokens: 0,
cacheCreationTokens: 0,
cacheReadTokens: 0,
}
);
}
/**
* Get parse statistics
*/
getParseStats(): ParseStats {
const entriesByType: Record<string, number> = {};
for (const entry of this.entries) {
entriesByType[entry.type] = (entriesByType[entry.type] || 0) + 1;
}
const totalLines = this.entries.length + this.parseErrors.length;
return {
totalLines,
parsedEntries: this.entries.length,
failedLines: this.parseErrors.length,
entriesByType,
failureRate: totalLines > 0 ? this.parseErrors.length / totalLines : 0,
};
}
/**
* Get parse errors
*/
getParseErrors(): Array<{ lineNumber: number; error: string }> {
return this.parseErrors;
}
/**
* Get all entries (raw)
*/
getAllEntries(): TranscriptEntry[] {
return this.entries;
}
}
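The core of the deleted parser's `parseTranscript` loop is easy to sketch standalone: parse each JSONL line, count failures instead of aborting, and derive the same `failureRate` that `getParseStats` reported. A simplified, hypothetical reconstruction (entries are kept untyped; logging is omitted):

```typescript
// Minimal JSONL parse with failure accounting, in the spirit of the
// removed TranscriptParser. Malformed lines are tolerated, not fatal.
function parseJsonl(content: string): {
  entries: unknown[];
  failedLines: number;
  failureRate: number;
} {
  const trimmed = content.trim();
  const lines = trimmed ? trimmed.split('\n') : [];
  const entries: unknown[] = [];
  let failedLines = 0;
  for (const line of lines) {
    try {
      entries.push(JSON.parse(line)); // one JSON object per line
    } catch {
      failedLines += 1; // counted for stats, processing continues
    }
  }
  return {
    entries,
    failedLines,
    failureRate: lines.length > 0 ? failedLines / lines.length : 0,
  };
}

const stats = parseJsonl('{"type":"user"}\nnot json\n{"type":"assistant"}');
console.log(stats.failedLines, stats.entries.length); // 1 2
```

Tolerating bad lines matters for live transcripts: the final line of a JSONL file being appended to is often truncated mid-write, and one torn line should not discard the rest of the session.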