perf: streamline worker startup and consolidate database connections (#2122)
* docs: pathfinder refactor corpus + Node 20 preflight
Adds the PATHFINDER-2026-04-22 principle-driven refactor plan (11 docs,
cross-checked PASS) plus the exploratory PATHFINDER-2026-04-21 corpus
that motivated it. Bumps engines.node to >=20.0.0 per the ingestion-path
plan preflight (recursive fs.watch). Adds the pathfinder skill.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* refactor: land PATHFINDER Plan 01 — data integrity
Schema, UNIQUE constraints, self-healing claim, Chroma upsert fallback.
- Phase 1: fresh schema.sql regenerated at post-refactor shape.
- Phase 2: migrations 23+24 — rebuild pending_messages without
started_processing_at_epoch; UNIQUE(session_id, tool_use_id);
UNIQUE(memory_session_id, content_hash) on observations; dedup
duplicate rows before adding indexes.
- Phase 3: claimNextMessage rewritten to self-healing query using
worker_pid NOT IN live_worker_pids; STALE_PROCESSING_THRESHOLD_MS
and the 60-s stale-reset block deleted.
- Phase 4: DEDUP_WINDOW_MS and findDuplicateObservation deleted;
observations.insert now uses ON CONFLICT DO NOTHING.
- Phase 5: failed-message purge block deleted from worker-service
2-min interval; clearFailedOlderThan method deleted.
- Phase 6: repairMalformedSchema and its Python subprocess repair
path deleted from Database.ts; SQLite errors now propagate.
- Phase 7: Chroma delete-then-add fallback gated behind
CHROMA_SYNC_FALLBACK_ON_CONFLICT env flag as bridge until
Chroma MCP ships native upsert.
- Phase 8: migration 19 no-op block absorbed into fresh schema.sql.
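The self-healing claim in Phase 3 can be sketched as a filter over pending rows, keeping anything unclaimed or claimed by a PID that is no longer alive. This is an illustrative sketch only — row shape, field names, and `claimableIds` are hypothetical stand-ins for the real `claimNextMessage` SQL.

```typescript
// Hypothetical sketch of the Phase 3 self-healing claim: a message is
// claimable if it is unclaimed, or if the worker PID that claimed it is
// no longer in the live-PID set. Names are illustrative, not the schema.
interface PendingRow {
  id: number;
  status: "pending" | "processing";
  workerPid: number | null;
}

function claimableIds(rows: PendingRow[], livePids: Set<number>): number[] {
  return rows
    .filter(
      (r) =>
        r.status === "pending" ||
        (r.status === "processing" &&
          r.workerPid !== null &&
          !livePids.has(r.workerPid)) // stale claim: owning worker died
    )
    .map((r) => r.id);
}
```

This removes the need for a time-based stale threshold: liveness of the claiming PID is the only signal.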
Verification greps all return 0 matches. bun test tests/sqlite/
passes 63/63. bun run build succeeds.
Plan: PATHFINDER-2026-04-22/01-data-integrity.md
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* refactor: land PATHFINDER Plan 02 — process lifecycle
OS process groups replace hand-rolled reapers. Worker runs until
killed; orphans are prevented by detached spawn + kill(-pgid).
- Phase 1: src/services/worker/ProcessRegistry.ts DELETED. The
canonical registry at src/supervisor/process-registry.ts is the
sole survivor; SDK spawn site consolidated into it via new
createSdkSpawnFactory/spawnSdkProcess/getSdkProcessForSession/
ensureSdkProcessExit/waitForSlot helpers.
- Phase 2: SDK children spawn with detached:true + stdio:
['ignore','pipe','pipe']; pgid recorded on ManagedProcessInfo.
- Phase 3: shutdown.ts signalProcess teardown uses
process.kill(-pgid, signal) on Unix when pgid is recorded;
Windows path unchanged (tree-kill/taskkill).
- Phase 4: all reaper intervals deleted — startOrphanReaper call,
staleSessionReaperInterval setInterval (including the co-located
WAL checkpoint — SQLite's built-in wal_autocheckpoint handles
WAL growth without an app-level timer), killIdleDaemonChildren,
killSystemOrphans, reapOrphanedProcesses, reapStaleSessions, and
detectStaleGenerator. MAX_GENERATOR_IDLE_MS and MAX_SESSION_IDLE_MS
constants deleted.
- Phase 5: abandonedTimer — already 0 matches; primary-path cleanup
via generatorPromise.finally() already lives in worker-service
startSessionProcessor and SessionRoutes ensureGeneratorRunning.
- Phase 6: evictIdlestSession and its evict callback deleted from
SessionManager. Pool admission gates backpressure upstream.
- Phase 7: SDK-failure fallback — SessionManager has zero matches
for fallbackAgent/Gemini/OpenRouter. Failures surface to hooks
via exit code 2 through SessionRoutes error mapping.
- Phase 8: ensureWorkerRunning in worker-utils.ts rewritten to
lazy-spawn — consults isWorkerPortAlive (which gates
captureProcessStartToken for PID-reuse safety via commit
99060bac), then spawns detached with unref(), then
waitForWorkerPort({ attempts: 3, backoffMs: 250 }) with hand-rolled
exponential backoff (250→500→1000 ms). No respawn npm dep.
- Phase 9: idle self-shutdown — zero matches for
idleCheck/idleTimeout/IDLE_MAX_MS/idleShutdown. Worker exits
only on external SIGTERM via supervisor signal handlers.
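The Phase 8 delay schedule can be sketched as doubling from the base per attempt; assuming that shape, attempts 0..2 at a 250 ms base yield 250 → 500 → 1000 ms. `backoffDelays` is a hypothetical helper, not the project's `waitForWorkerPort`.

```typescript
// Sketch of the hand-rolled exponential backoff, assuming the delay
// doubles on each attempt from a fixed base (250 -> 500 -> 1000 ms).
function backoffDelays(attempts: number, baseMs: number): number[] {
  return Array.from({ length: attempts }, (_, i) => baseMs * 2 ** i);
}
```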
Three test files that exercised deleted code removed:
tests/worker/process-registry.test.ts,
tests/worker/session-lifecycle-guard.test.ts,
tests/services/worker/reap-stale-sessions.test.ts.
Pass count: 1451 → 1407 (-44), all attributable to deleted test
files. Zero new failures. 31 pre-existing failures remain
(schema-repair suite, logger-usage-standards, environmental
openclaw / plugin-distribution) — none introduced by Plan 02.
All 10 verification greps return 0. bun run build succeeds.
Plan: PATHFINDER-2026-04-22/02-process-lifecycle.md
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* refactor: land PATHFINDER Plan 04 (narrowed) — search fail-fast
Phases 3, 5, 6 only. Plan-doc inaccuracies for phases 1/2/4/7/8/9
deferred for plan reconciliation:
- Phase 1/2: ObservationRow type doesn't exist; the four
"formatters" operate on three incompatible types.
- Phase 4: RECENCY_WINDOW_MS already imported from
SEARCH_CONSTANTS at every call site.
- Phase 7: getExistingChromaIds is NOT @deprecated and has an
active caller in ChromaSync.backfillMissingSyncs.
- Phase 8: estimateTokens already consolidated.
- Phase 9: knowledge-corpus rewrite blocked on PG-3
prompt-caching cost smoke test.
Phase 3 — Delete SearchManager.findByConcept/findByFile/findByType.
SearchRoutes handlers (handleSearchByConcept/File/Type) now call
searchManager.getOrchestrator().findByXxx() directly via new
getter accessors on SearchManager. ~250 LoC deleted.
Phase 5 — Fail-fast Chroma. Created
src/services/worker/search/errors.ts with ChromaUnavailableError
extends AppError(503, 'CHROMA_UNAVAILABLE'). Deleted
SearchOrchestrator.executeWithFallback's Chroma-failed
SQLite-fallback branch; runtime Chroma errors now throw 503.
"Path 3" (chromaSync was null at construction — explicit-
uninitialized config) preserved as legitimate empty-result state
per plan text. ChromaSearchStrategy.search no longer wraps in
try/catch — errors propagate.
Phase 6 — Delete HybridSearchStrategy three try/catch silent
fallback blocks (findByConcept, findByType, findByFile) at lines
~82-95, ~120-132, ~161-172. Removed `fellBack` field from
StrategySearchResult type and every return site
(SQLiteSearchStrategy, BaseSearchStrategy.emptyResult,
SearchOrchestrator).
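The Phase 5 error type can be sketched as follows. The real AppError lives in the project; this stand-in assumes an (httpStatus, code) constructor shape, so treat both classes here as illustrative.

```typescript
// Minimal stand-in for the project's AppError, assuming an
// (httpStatus, code, message?) constructor shape.
class AppError extends Error {
  constructor(
    public httpStatus: number,
    public code: string,
    message?: string
  ) {
    super(message ?? code);
    this.name = new.target.name;
  }
}

// Fail-fast Chroma error: callers no longer fall back to SQLite;
// the 503 propagates to the HTTP layer.
class ChromaUnavailableError extends AppError {
  constructor(message = "Chroma vector store is unreachable") {
    super(503, "CHROMA_UNAVAILABLE", message);
  }
}
```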
Tests updated (Principle 7 — delete in same PR):
- search-orchestrator.test.ts: "fall back to SQLite" rewritten
as "throw ChromaUnavailableError (HTTP 503)".
- chroma/hybrid/sqlite-search-strategy tests: rewritten to
rejects.toThrow; removed fellBack assertions.
Verification: SearchManager.findBy → 0; fellBack → 0 in src/.
bun test tests/worker/search/ → 122 pass, 0 fail.
bun test (suite-wide) → 1407 pass, baseline maintained, 0 new
failures. bun run build succeeds.
Plan: PATHFINDER-2026-04-22/04-read-path.md (Phases 3, 5, 6)
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* refactor: land PATHFINDER Plan 03 — ingestion path
Fail-fast parser, direct in-process ingest, recursive fs.watch,
DB-backed tool pairing. Worker-internal HTTP loopback eliminated.
- Phase 0: Created src/services/worker/http/shared.ts exporting
ingestObservation/ingestPrompt/ingestSummary as direct
in-process functions plus ingestEventBus (Node EventEmitter,
reusing existing pattern — no third event bus introduced).
setIngestContext wires the SessionManager dependency from
worker-service constructor.
- Phase 1: src/sdk/parser.ts collapsed to one parseAgentXml
returning { valid:true; kind: 'observation'|'summary'; data }
| { valid:false; reason: string }. Inspects root element;
<skip_summary reason="…"/> is a first-class summary case
with skipped:true. NEVER returns undefined. NEVER coerces.
- Phase 2: ResponseProcessor calls parseAgentXml exactly once,
branches on the discriminated union. On invalid → markFailed
+ logger.warn(reason). On observation → ingestObservation.
On summary → ingestSummary then emit summaryStoredEvent
{ sessionId, messageId } (consumed by Plan 05's blocking
/api/session/end).
- Phase 3: Deleted consecutiveSummaryFailures field
(ResponseProcessor + SessionManager + worker-types) and
MAX_CONSECUTIVE_SUMMARY_FAILURES constant. Circuit-breaker
guards and "tripped" log lines removed.
- Phase 4: coerceObservationToSummary deleted from sdk/parser.ts.
- Phase 5: src/services/transcripts/watcher.ts rescan setInterval
replaced with fs.watch(transcriptsRoot, { recursive: true,
persistent: true }) — Node 20+ recursive mode.
- Phase 6: src/services/transcripts/processor.ts pendingTools
Map deleted. tool_use rows insert with INSERT OR IGNORE on
UNIQUE(session_id, tool_use_id) (added by Plan 01). New
pairToolUsesByJoin query in PendingMessageStore for read-time
pairing (UNIQUE INDEX provides idempotency; explicit consumer
not yet wired).
- Phase 7: HTTP loopback at processor.ts:252 replaced with
direct ingestObservation call. maybeParseJson silent-passthrough
rewritten to fail-fast (throws on malformed JSON).
- Phase 8: src/utils/tag-stripping.ts countTags + stripTagsInternal
collapsed into one alternation regex, single-pass over input.
- Phase 9: src/utils/transcript-parser.ts (dead TranscriptParser
class) deleted. The active extractLastMessage at
src/shared/transcript-parser.ts:41-144 is the sole survivor.
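The Phase 1 return shape can be sketched with a toy classifier that only inspects the root element name. The real parseAgentXml does full XML parsing; `classifyRoot` and its regex are illustrative, but the discriminated union mirrors the shape described above (never undefined, never coerced).

```typescript
// Toy sketch of the discriminated-union parser result. Only the root
// element name is inspected here; the real parser handles full XML.
type ParseResult =
  | { valid: true; kind: "observation" | "summary"; skipped?: boolean }
  | { valid: false; reason: string };

function classifyRoot(xml: string): ParseResult {
  const m = /^\s*<([a-z_]+)[\s\/>]/.exec(xml);
  if (!m) return { valid: false, reason: "no root element" };
  switch (m[1]) {
    case "observation":
      return { valid: true, kind: "observation" };
    case "summary":
      return { valid: true, kind: "summary" };
    case "skip_summary": // first-class summary case, skipped:true
      return { valid: true, kind: "summary", skipped: true };
    default:
      return { valid: false, reason: `unknown root <${m[1]}>` };
  }
}
```

Callers branch once on `valid` and then on `kind`; there is no path that silently coerces one kind into another.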
Tests updated (Principle 7 — same-PR delete):
- tests/sdk/parser.test.ts + parse-summary.test.ts: rewritten
to assert discriminated-union shape; coercion-specific
scenarios collapse into { valid:false } assertions.
- tests/worker/agents/response-processor.test.ts: circuit-breaker
describe block skipped; non-XML/empty-response tests assert
fail-fast markFailed behavior.
Verification: every grep returns 0. transcript-parser.ts deleted.
bun run build succeeds. bun test → 1399 pass / 28 fail / 7 skip
(net -8 pass = the 4 retired circuit-breaker tests + 4 collapsed
parser cases). Zero new failures vs baseline.
Deferred (out of Plan 03 scope, will land in Plan 06): SessionRoutes
HTTP route handlers still call sessionManager.queueObservation
inline rather than the new shared helpers — the helpers are ready,
the route swap is mechanical and belongs with the Zod refactor.
Plan: PATHFINDER-2026-04-22/03-ingestion-path.md
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* refactor: land PATHFINDER Plan 05 — hook surface
Worker-call plumbing collapsed to one helper. Polling replaced by
server-side blocking endpoint. Fail-loud counter surfaces persistent
worker outages via exit code 2.
- Phase 1: plugin/hooks/hooks.json — three 20-iteration `for i in
1..20; do curl -sf .../health && break; sleep 0.1; done` shell
retry wrappers deleted. Hook commands invoke their bun entry
point directly.
- Phase 2: src/shared/worker-utils.ts — added
executeWithWorkerFallback<T>(url, method, body) returning
T | { continue: true; reason?: string }. All 8 hook handlers
(observation, session-init, context, file-context, file-edit,
summarize, session-complete, user-message) rewritten to use
it instead of duplicating the ensureWorkerRunning →
workerHttpRequest → fallback sequence.
- Phase 3: blocking POST /api/session/end in SessionRoutes.ts
using validateBody + sessionEndSchema (z.object({sessionId})).
One-shot ingestEventBus.on('summaryStoredEvent') listener,
30 s timer, req.aborted handler — all share one cleanup so
the listener cannot leak. summarize.ts polling loop, plus
MAX_WAIT_FOR_SUMMARY_MS / POLL_INTERVAL_MS constants, deleted.
- Phase 4: src/shared/hook-settings.ts — loadFromFileOnce()
memoizes SettingsDefaultsManager.loadFromFile per process.
Per-handler settings reads collapsed.
- Phase 5: src/shared/should-track-project.ts — single exclusion
check entry; isProjectExcluded no longer referenced from
src/cli/handlers/.
- Phase 6: cwd validation pushed into adapter normalizeInput
(all 6 adapters: claude-code, cursor, raw, gemini-cli,
windsurf). New AdapterRejectedInput error in
src/cli/adapters/errors.ts. Handler-level isValidCwd checks
deleted from file-edit.ts and observation.ts. hook-command.ts
catches AdapterRejectedInput → graceful fallback.
- Phase 7: session-init.ts conditional initAgent guard deleted;
initAgent is idempotent. tests/hooks/context-reinjection-guard
test (validated the deleted conditional) deleted in same PR
per Principle 7.
- Phase 8: fail-loud counter at ~/.claude-mem/state/hook-failures.json.
Atomic write via .tmp + rename. CLAUDE_MEM_HOOK_FAIL_LOUD_THRESHOLD
setting (default 3). On consecutive worker-unreachable ≥ N:
process.exit(2). On success: reset to 0. NOT a retry.
- Phase 9: ensureWorkerAliveOnce() module-scope memoization
wrapping ensureWorkerRunning. executeWithWorkerFallback calls
the memoized version.
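The Phase 8 counter's atomic-write pattern can be sketched as below, assuming a { consecutive: n } JSON shape. The function name and file layout are illustrative; the real path is the hook-failures state file under ~/.claude-mem/state/.

```typescript
import {
  existsSync,
  readFileSync,
  renameSync,
  writeFileSync,
} from "node:fs";

// Sketch of the fail-loud counter update, assuming a { consecutive: n }
// JSON shape. Write-then-rename makes the update atomic: readers never
// observe a half-written file.
function bumpFailureCounter(file: string, succeeded: boolean): number {
  let consecutive = 0;
  if (existsSync(file)) {
    consecutive = JSON.parse(readFileSync(file, "utf8")).consecutive ?? 0;
  }
  consecutive = succeeded ? 0 : consecutive + 1;
  const tmp = file + ".tmp"; // sibling tmp file, then rename over target
  writeFileSync(tmp, JSON.stringify({ consecutive }));
  renameSync(tmp, file);
  return consecutive;
}
```

A caller would exit(2) when the returned count reaches the configured threshold; that decision is deliberately outside this helper.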
Minimal validateBody middleware stub at
src/services/worker/http/middleware/validateBody.ts. Plan 06 will
expand with typed inference + error envelope conventions.
Verification: 4/4 grep targets pass. bun run build succeeds.
bun test → 1393 pass / 28 fail / 7 skip; -6 pass attributable
solely to deleted context-reinjection-guard test file. Zero new
failures vs baseline.
Plan: PATHFINDER-2026-04-22/05-hook-surface.md
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* refactor: land PATHFINDER Plan 06 — API surface
One Zod-based validator wrapping every POST/PUT. Rate limiter,
diagnostic endpoints, and shutdown wrappers deleted. Failure-
marking consolidated to one helper.
- Phase 1 (preflight): zod@^3 already installed.
- Phase 2: validateBody middleware confirmed at canonical shape
in src/services/worker/http/middleware/validateBody.ts —
safeParse → 400 { error: 'ValidationError', issues: [...] }
on failure, replaces req.body with parsed value on success.
- Phase 3: Per-route Zod schemas declared at the top of each
route file. 24 POST endpoints across SessionRoutes,
CorpusRoutes, DataRoutes, MemoryRoutes, SearchRoutes,
LogsRoutes, SettingsRoutes now wrap with validateBody().
/api/session/end (Plan 05) confirmed using same middleware.
- Phase 4: validateRequired() deleted from BaseRouteHandler
along with every call site. Inline coercion helpers
(coerceStringArray, coercePositiveInteger) and inline
if (!req.body...) guards deleted across all route files.
- Phase 5: Rate limiter middleware and its registration deleted
from src/services/worker/http/middleware.ts. Worker binds
127.0.0.1:37777 — no untrusted caller.
- Phase 6: viewer.html cached at module init in ViewerRoutes.ts
via fs.readFileSync; served as Buffer with text/html content
type. SKILL.md + per-operation .md files cached in
Server.ts as Map<string, string>; loadInstructionContent
helper deleted. NO fs.watch, NO TTL — process restart is the
cache-invalidation event.
- Phase 7: Four diagnostic endpoints deleted from DataRoutes.ts
— /api/pending-queue (GET), /api/pending-queue/process (POST),
/api/pending-queue/failed (DELETE), /api/pending-queue/all
(DELETE). Helper methods that ONLY served them
(getQueueMessages, getStuckCount, getRecentlyProcessed,
clearFailed, clearAll) deleted from PendingMessageStore.
KEPT: /api/processing-status (observability), /health
(used by ensureWorkerRunning).
- Phase 8: stopSupervisor wrapper deleted from supervisor/index.ts.
GracefulShutdown now calls getSupervisor().stop() directly.
Two functions retained with clear roles:
- performGracefulShutdown — worker-side 6-step shutdown
- runShutdownCascade — supervisor-side child teardown
(process.kill(-pgid), Windows tree-kill, PID-file cleanup)
Each has unique non-trivial logic and a single canonical caller.
- Phase 9: transitionMessagesTo(status, filter) is the sole
failure-marking path on PendingMessageStore. Old methods
markSessionMessagesFailed and markAllSessionMessagesAbandoned
deleted along with all callers (worker-service,
SessionCompletionHandler, tests/zombie-prevention).
Tests updated (Principle 7 same-PR delete): coercion test files
refactored to chain validateBody → handler. Zombie-prevention
tests rewritten to call transitionMessagesTo.
Verification: all 4 grep targets → 0. bun run build succeeds.
bun test → 1393 pass / 28 fail / 7 skip — exact match to
baseline. Zero new failures.
Plan: PATHFINDER-2026-04-22/06-api-surface.md
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* refactor: land PATHFINDER Plan 07 — dead code sweep
ts-prune-driven sweep across the tree after Plans 01-06 landed.
Deleted unused exports, orphan helpers, and one fully orphaned
file. Earlier-plan deletions verified.
Deleted:
- src/utils/bun-path.ts (entire file — getBunPath, getBunPathOrThrow,
isBunAvailable: zero importers)
- bun-resolver.getBunVersionString: zero callers
- PendingMessageStore.retryMessage / resetProcessingToPending /
abortMessage: superseded by transitionMessagesTo (Plan 06 Phase 9)
- EnvManager.MANAGED_CREDENTIAL_KEYS, EnvManager.setCredential:
zero callers
- CodexCliInstaller.checkCodexCliStatus: zero callers; no status
command exists in npx-cli
- Two "REMOVED: cleanupOrphanedSessions" stale-fence comments
Kept (with documented justification):
- Public API surface in dist/sdk/* (parseAgentXml, prompt
builders, ParsedObservation, ParsedSummary, ParseResult,
SUMMARY_MODE_MARKER) — exported via package.json sdk path.
- generateContext / loadContextConfig / token utilities — used
via dynamic await import('../../../context-generator.js') in
worker SearchRoutes.
- MCP_IDE_INSTALLERS, install/uninstall functions for codex/goose
— used via dynamic await import in npx-cli/install.ts +
uninstall.ts (ts-prune cannot trace dynamic imports).
- getExistingChromaIds — active caller in
ChromaSync.backfillMissingSyncs (Plan 04 narrowed scope).
- processPendingQueues / getSessionsWithPendingMessages — active
orphan-recovery caller in worker-service.ts plus
zombie-prevention test coverage.
- StoreAndMarkCompleteResult legacy alias — return-type annotation
in same file.
- All Database.ts barrel re-exports — used downstream.
Earlier-plan verification:
- Plan 03 Phase 9: VERIFIED — src/utils/transcript-parser.ts
is gone; TranscriptParser has 0 references in src/.
- Plan 01 Phase 8: VERIFIED — migration 19 no-op absorbed.
- SessionStore.ts:52-70 consolidation NOT executed (deferred):
the methods are not thin wrappers but ~900 LoC of bodies, and
two methods are documented as intentional mirrors so the
context-generator.cjs bundle stays schema-consistent without
pulling MigrationRunner. Deserves its own plan, not a sweep.
Verification: TranscriptParser → 0; transcript-parser.ts → gone;
no commented-out code markers remain. bun run build succeeds.
bun test → 1393 pass / 28 fail / 7 skip — EXACT match to
baseline. Zero regressions.
Plan: PATHFINDER-2026-04-22/07-dead-code.md
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* chore: remove residual ProcessRegistry comment reference
Plan 07 dead-code sweep missed one comment-level reference to the
deleted in-memory ProcessRegistry class in SessionManager.ts:347.
Rewritten to describe the supervisor.json scope without naming the
deleted class, completing the verification grep target.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* fix: address Greptile review (P1 + 2× P2)
P1 — Plan 05 Phase 3 blocking endpoint was non-functional:
executeWithWorkerFallback used HEALTH_CHECK_TIMEOUT_MS (3 s) for
the POST /api/session/end call, but the server holds the
connection for SERVER_SIDE_SUMMARY_TIMEOUT_MS (30 s). Client
always raced to a "timed out" rejection that isWorkerUnavailable
classified as worker-unreachable, so the hook silently degraded
instead of waiting for summaryStoredEvent.
- Added optional timeoutMs to executeWithWorkerFallback,
forwarded to workerHttpRequest.
- summarize.ts call site now passes 35_000 (5 s above server
hold window).
P2 — ingestSummary({ kind: 'parsed' }) branch was dead code:
ResponseProcessor emitted summaryStoredEvent directly via the
event bus, bypassing the centralized helper that the comment
claimed was the single source.
- ResponseProcessor now calls ingestSummary({ kind: 'parsed',
sessionDbId, messageId, contentSessionId, parsed }) so the
event-emission path is single-sourced.
- ingestSummary's requireContext() resolution moved inside the
'queue' branch (the only branch that needs sessionManager /
dbManager). 'parsed' is a pure event-bus emission and
doesn't need worker-internal context — fixes mocked
ResponseProcessor unit tests that don't call
setIngestContext.
P2 — isWorkerFallback could false-positive on legitimate API
responses whose schema includes { continue: true, ... }:
- Added a Symbol.for('claude-mem/worker-fallback') brand to
WorkerFallback. isWorkerFallback now checks the brand, not
a duck-typed property name.
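The brand check can be sketched as below. Symbol.for returns the same symbol for the same key process-wide, so the brand survives across modules, while a plain JSON API response can never carry a symbol key. Function names here are illustrative sketches of the described fix.

```typescript
// Process-wide brand: Symbol.for dedupes by key in the global registry.
const FALLBACK_BRAND = Symbol.for("claude-mem/worker-fallback");

function makeWorkerFallback(reason?: string) {
  return { [FALLBACK_BRAND]: true, continue: true as const, reason };
}

function isWorkerFallback(value: unknown): boolean {
  return (
    typeof value === "object" &&
    value !== null &&
    (value as Record<symbol, unknown>)[FALLBACK_BRAND] === true
  );
}
```

An API response that happens to include { continue: true } no longer matches, because JSON cannot express symbol-keyed properties.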
Verification: bun run build succeeds. bun test → 1393 pass /
28 fail / 7 skip — exact baseline match. Zero new failures.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* fix: address Greptile iteration 2 (P1 + P2)
P1 — summaryStoredEvent fired regardless of whether the row was
persisted. ResponseProcessor's call to ingestSummary({ kind:
'parsed' }) ran for every parsed.kind === 'summary' even when
result.summaryId came back null (e.g. FK violation, null
memory_session_id at commit). The blocking /api/session/end
endpoint then returned { ok: true } and the Stop hook logged
'Summary stored' for a non-existent row.
- Gate ingestSummary call on (parsed.data.skipped ||
session.lastSummaryStored). Skipped summaries are an explicit
no-op bypass and still confirm; real summaries only confirm
when storage actually wrote a row.
- Non-skipped + summaryId === null path logs a warn and lets
the server-side timeout (504) surface to the hook instead of
a false ok:true.
P2 — PendingMessageStore.enqueue() returns 0 when INSERT OR
IGNORE suppresses a duplicate (the UNIQUE(session_id, tool_use_id)
constraint added by Plan 01 Phase 1). The two callers
(SessionManager.queueObservation and queueSummarize) previously
logged 'ENQUEUED messageId=0' which read like a row was inserted.
- Branch on messageId === 0 and emit a 'DUP_SUPPRESSED' debug
log instead of the misleading ENQUEUED line. No behavior
change — the duplicate is still correctly suppressed by the
DB (Principle 3); only the log surface is corrected.
- confirmProcessed is never called with the enqueue() return
value (it operates on session.processingMessageIds[] from
claimNextMessage), so no caller is broken; the visibility
fix prevents future misuse.
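The log-surface fix can be sketched as a single branch on the returned id. `enqueueLogLine` is a hypothetical helper; the commit only states that enqueue() returns 0 when INSERT OR IGNORE suppresses the duplicate.

```typescript
// Sketch: branch the log line on messageId === 0 (duplicate suppressed
// by the UNIQUE constraint) vs. a real insert. Behavior is unchanged;
// only the log text differs.
function enqueueLogLine(messageId: number): string {
  return messageId === 0
    ? "DUP_SUPPRESSED (unique tool_use_id already queued)"
    : `ENQUEUED messageId=${messageId}`;
}
```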
Verification: bun run build succeeds. bun test → 1393 pass /
28 fail / 7 skip — exact baseline match. Zero new failures.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* fix: address Greptile iteration 3 (P1 + 2× P2)
- P1 worker-service.ts: wire ensureGeneratorRunning into the ingest
context after SessionRoutes is constructed. setIngestContext runs
before routes exist, so transcript-watcher observations queued via
ingestObservation() had no way to auto-start the SDK generator.
Added attachIngestGeneratorStarter() to patch the callback in.
- P2 shared.ts: IngestEventBus now sets maxListeners to 0. Concurrent
/api/session/end calls register one listener each and clean up on
completion, so the default-10 listener warning fired spuriously under
normal load.
- P2 SessionRoutes.ts: handleObservationsByClaudeId now delegates to
ingestObservation() instead of duplicating skip-tool / meta /
privacy / queue logic. Single helper, matching the Plan 03 goal.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* fix: address Greptile iteration 4 (P1 tool-pair + P2 parse/path/doc)
- processor.handleToolResult: restore in-memory tool-use→tool-result
pairing via session.pendingTools for schemas (e.g. Codex) whose
tool_result events carry only tool_use_id + output. Without this,
neither handler fired — all tool observations silently dropped.
- processor.maybeParseJson: return raw string on parse failure instead
of throwing. Previously a single malformed JSON-shaped field caused
handleLine's outer catch to discard the entire transcript line.
- watcher.deepestNonGlobAncestor: split on / and \\, emit empty string
for purely-glob inputs so the caller skips the watch instead of
anchoring fs.watch at the filesystem root. Windows-compatible.
- PendingMessageStore.enqueue: tighten docstring — callers today only
log on the returned id; the SessionManager branches on id === 0.
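The deepestNonGlobAncestor behavior can be sketched as below — a hypothetical reimplementation under the stated contract: split on both separators, keep leading segments up to the first glob token, and return "" for purely-glob inputs so the caller skips the watch instead of anchoring at the filesystem root.

```typescript
// Sketch of deepestNonGlobAncestor: walk path segments (split on both
// "/" and "\") and stop at the first segment containing a glob token.
function deepestNonGlobAncestor(pattern: string): string {
  const parts = pattern.split(/[\\/]/);
  const kept: string[] = [];
  for (const part of parts) {
    if (/[*?[\]{}]/.test(part)) break; // first glob segment ends the walk
    kept.push(part);
  }
  return kept.join("/"); // "" when the very first segment is a glob
}
```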
* fix: forward tool_use_id through ingestObservation (Greptile iter 5)
P1 — Plan 01's UNIQUE(content_session_id, tool_use_id) dedup never
fired because the new shared ingest path dropped the toolUseId before
queueObservation. SQLite treats NULL values as distinct for UNIQUE,
so every replayed transcript line landed a duplicate row.
- shared.ingestObservation: forward payload.toolUseId to
queueObservation so INSERT OR IGNORE can actually collapse.
- SessionRoutes.handleObservationsByClaudeId: destructure both
tool_use_id (HTTP convention) and toolUseId (JS convention) from
req.body and pass into ingestObservation.
- observationsByClaudeIdSchema: declare both keys explicitly so the
validator doesn't rely on .passthrough() alone.
* fix: drop dead pairToolUsesByJoin, close session-end listener race
- PendingMessageStore: delete pairToolUsesByJoin. The method was never
called and its self-join semantics are structurally incompatible
with UNIQUE(content_session_id, tool_use_id): INSERT OR IGNORE
collapses any second row with the same pair, so a self-join can
only ever match a row to itself. In-memory pendingTools in
processor.ts remains the pairing path for split-event schemas.
- IngestEventBus: retain a short-lived (60s) recentStored map keyed
by sessionId. Populated on summaryStoredEvent emit, evicted on
consume or TTL.
- handleSessionEnd: drain the recent-events buffer before attaching
the listener. Closes the register-after-emit race where the summary
can persist between the hook's summarize POST and its session/end
POST — previously that window returned 504 after the 30s timeout.
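The recent-events buffer can be sketched as a TTL map keyed by sessionId, consumed via drain-before-listen: the session/end handler checks the buffer first, then attaches its listener, so an emit in the gap is never lost. Class and method names are illustrative; the injected clock exists only to make the sketch testable.

```typescript
// Sketch of the 60 s recentStored buffer: record on summaryStoredEvent
// emit, evict on consume or TTL expiry.
class RecentSummaryBuffer {
  private stored = new Map<string, number>(); // sessionId -> emit time

  constructor(
    private ttlMs = 60_000,
    private now: () => number = Date.now // injectable clock for tests
  ) {}

  record(sessionId: string): void {
    this.stored.set(sessionId, this.now());
  }

  // Returns true once per recorded emit, within the TTL.
  take(sessionId: string): boolean {
    const at = this.stored.get(sessionId);
    if (at === undefined) return false;
    this.stored.delete(sessionId); // evict on consume or on TTL expiry
    return this.now() - at <= this.ttlMs;
  }
}
```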
* chore: merge origin/main into vivacious-teeth
Resolves conflicts with 15 commits on main (v12.3.9, security
observation types, Telegram notifier, PID-reuse worker start-guard).
Conflict resolution strategy:
- plugin/hooks/hooks.json, plugin/scripts/*.cjs, plugin/ui/viewer-bundle.js:
kept ours — PATHFINDER Plan 05 deletes the for-i-in-1-to-20 curl retry
loops and the built artifacts regenerate on build.
- src/cli/handlers/summarize.ts: kept ours — Plan 05 blocking
POST /api/session/end supersedes main's fire-and-forget path.
- src/services/worker-service.ts: kept ours — Plan 05 ingest bus +
summaryStoredEvent supersedes main's SessionCompletionHandler DI
refactor + orphan-reaper fallback.
- src/services/worker/http/routes/SessionRoutes.ts: kept ours — same
reason; generator .finally() Stop-hook self-clean is a guard for a
path our blocking endpoint removes.
- src/services/worker/http/routes/CorpusRoutes.ts: merged — added
security_alert / security_note to ALLOWED_CORPUS_TYPES (feature from
#2084) while preserving our Zod validateBody schema.
Typecheck: 294 errors (vs 298 pre-merge). No new errors introduced; all
remaining are pre-existing (Component-enum gaps, DOM lib for viewer,
bun:sqlite types).
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* fix: address Greptile P2 findings
1) SessionRoutes.handleSessionEnd was the only route handler not wrapped
in wrapHandler — synchronous exceptions would hang the client rather
than surfacing as 500s. Wrap it like every other handler.
2) processor.handleToolResult only consumed the session.pendingTools
entry when the tool_result arrived without a toolName. In the
split-schema path where tool_result carries both toolName and toolId,
the entry was never deleted and the map grew for the life of the
session. Consume the entry whenever toolId is present.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* fix: typing cleanup and viewer tsconfig split for PR feedback
- Add explicit return types for SessionStore query methods
- Exclude src/ui/viewer from root tsconfig, give it its own DOM-typed config
- Add bun to root tsconfig types, plus misc typing tweaks flagged by Greptile
- Rebuilt plugin/scripts/* artifacts
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* fix: address Greptile P2 findings (iter 2)
- PendingMessageStore.transitionMessagesTo: require sessionDbId (drop
the unscoped-drain branch that would nuke every pending/processing
row across all sessions if a future caller omitted the filter).
- IngestEventBus.takeRecentSummaryStored: make idempotent — keep the
cached event until TTL eviction so a retried Stop hook's second
/api/session/end returns immediately instead of hanging 30 s.
- TranscriptWatcher fs.watch callback: skip full glob scan for paths
already tailed (JSONL appends fire on every line; only unknown
paths warrant a rescan).
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* fix: call finalizeSession in terminal session paths (Greptile iter 3)
terminateSession and runFallbackForTerminatedSession previously called
SessionCompletionHandler.finalizeSession before removeSessionImmediate;
the refactor dropped those calls, leaving sdk_sessions.status='active'
for every session killed by wall-clock limit, unrecoverable error, or
exhausted fallback chain. The deleted reapStaleSessions interval was
the only prior backstop.
Re-wires finalizeSession (idempotent: marks completed, drains pending,
broadcasts) into both paths; no reaper reintroduced.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* fix: GC failed pending_messages rows at startup (Greptile iter 4)
Plan 07 deleted clearFailed/clearFailedOlderThan as "dead code", but
with the periodic sweep also removed, nothing reaps status='failed'
rows now — they accumulate indefinitely. Since claimNextMessage's
self-healing subquery scans this table, unbounded growth degrades
claim latency over time.
Re-introduces clearFailedOlderThan and calls it once at worker startup
(not a reaper — one-shot, idempotent). 7-day retention keeps enough
history for operator inspection while bounding the table.
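The one-shot GC can be sketched as a single DELETE with a retention cutoff. The SQL and column names below are illustrative assumptions about the table shape, not the project's actual schema.

```typescript
// Sketch of the startup GC: one DELETE, 7-day retention, no timer.
const FAILED_RETENTION_MS = 7 * 24 * 60 * 60 * 1000; // 7 days

// Hypothetical SQL; table and column names are illustrative.
const CLEAR_FAILED_SQL =
  "DELETE FROM pending_messages WHERE status = 'failed' AND created_at_epoch < ?";

function failedCutoffEpoch(nowMs: number): number {
  return nowMs - FAILED_RETENTION_MS; // bind as the ? parameter
}
```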
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* fix: finalize sessions on normal exit; cleanup hoist; share handler (iter 5)
1. startSessionProcessor success branch now calls completionHandler.
finalizeSession before removeSessionImmediate. Hooks-disabled installs
(and any Stop hook that fails before POST /api/sessions/complete) no
longer leave sdk_sessions rows as status='active' forever. Idempotent
— a subsequent /api/sessions/complete is a no-op.
2. Hoist SessionRoutes.handleSessionEnd cleanup declaration above the
closures that reference it (TDZ safety; safe at runtime today but
fragile if timeout ever shrinks).
3. SessionRoutes now receives WorkerService's shared SessionCompletionHandler
instead of constructing its own — prevents silent divergence if the
handler ever becomes stateful.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* fix: stop runaway crash-recovery loop on dead sessions
Two distinct bugs combined to keep a dead session restarting forever:
Bug 1 (uncaught "The operation was aborted."):
child_process.spawn emits 'error' asynchronously for ENOENT/EACCES/
AbortSignal aborts. spawnSdkProcess() never attached an 'error' listener, so
any async spawn failure became uncaughtException and escaped to the
daemon-level handler. Attach an 'error' listener immediately after spawn,
before the !child.pid early-return, so async spawn errors are logged
(with errno code) and swallowed locally.
Bug 2 (sliding-window limiter never trips on slow restart cadence):
RestartGuard tripped only when restartTimestamps.length exceeded
MAX_WINDOWED_RESTARTS (10) within RESTART_WINDOW_MS (60s). With the 8s
exponential-backoff cap, only ~7-8 restarts fit in the window, so a dead
session failing and restarting on 8 s cycles would loop forever
(consecutiveRestarts climbing past 30+ in observed logs). Add a
consecutiveFailures counter that increments on every restart and resets
only on recordSuccess(). Trip when consecutive failures exceed
MAX_CONSECUTIVE_FAILURES (5) — meaning 5 restarts with zero successful
processing in between proves the session is dead. Both guards now run in
parallel: tight loops still trip the windowed cap; slow loops trip the
consecutive-failure cap.
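A minimal sketch of the dual-guard idea. Class, method, and constant names follow the commit text, but the exact production shape is assumed; here the consecutive cap trips on the 5th zero-success restart:

```typescript
const RESTART_WINDOW_MS = 60_000;
const MAX_WINDOWED_RESTARTS = 10;
const MAX_CONSECUTIVE_FAILURES = 5;

class RestartGuard {
  private restartTimestamps: number[] = [];
  private consecutiveFailures = 0;

  recordRestart(now = Date.now()): void {
    this.restartTimestamps.push(now);
    // Sliding window: only restarts inside the last 60s count.
    this.restartTimestamps = this.restartTimestamps.filter(
      (t) => now - t <= RESTART_WINDOW_MS,
    );
    this.consecutiveFailures++;
  }

  recordSuccess(): void {
    // Only real processed work resets the consecutive counter.
    this.consecutiveFailures = 0;
  }

  isTripped(): boolean {
    // Both guards run in parallel: tight loops trip the windowed cap,
    // slow 8s-cadence loops trip the consecutive-failure cap.
    return (
      this.restartTimestamps.length > MAX_WINDOWED_RESTARTS ||
      this.consecutiveFailures >= MAX_CONSECUTIVE_FAILURES
    );
  }
}
```

The key design point is that the two counters decay differently: the window forgets old restarts with time, while the consecutive counter forgets them only on proof of life.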
Also: when the SessionRoutes path trips the guard, drain pending messages
to 'abandoned' so the session does not reappear in
getSessionsWithPendingMessages and trigger another auto-start cycle. The
worker-service.ts path already does this via terminateSession.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* perf: streamline worker startup and consolidate database connections
1. Database Pooling: Modified DatabaseManager, SessionStore, and SessionSearch to share a single bun:sqlite connection, eliminating redundant file descriptors.
2. Non-blocking Startup: Refactored WorktreeAdoption and Chroma backfill to run in the background (fire-and-forget), preventing them from stalling core initialization.
3. Diagnostic Routes: Added /api/chroma/status and bypassed the initialization guard for health/readiness endpoints to allow diagnostics during startup.
4. Robust Search: Implemented reliable SQLite FTS5 fallback in SearchManager for when Chroma (uvx) fails or is unavailable.
5. Code Cleanup: Removed redundant loopback MCP checks and mangled initialization logic from WorkerService.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* fix: hard-exclude observer-sessions from hooks; bundle migration 29 (#2124)
* fix: hard-exclude observer-sessions from hooks; backfill bundle migrations
Stop hook + SessionEnd hook were storing the SDK observer's own
init/continuation/summary prompts in user_prompts, leaking into the
viewer (meta-observation regression). 25 such rows accumulated.
- shouldTrackProject: hard-reject OBSERVER_SESSIONS_DIR (and its subtree)
before consulting user-configured exclusion globs.
- summarize.ts (Stop) and session-complete.ts (SessionEnd): early-return
when shouldTrackProject(cwd) is false, so the observer's own hooks
cannot bootstrap the worker or queue a summary against the meta-session.
- SessionRoutes: cap user-prompt body at 256 KiB at the session-init
boundary so a runaway observer prompt cannot blow up storage.
- SessionStore: add migration 29 (UNIQUE(memory_session_id, content_hash)
on observations) inline so bundled artifacts (worker-service.cjs,
context-generator.cjs) stay schema-consistent — without it, the
ON CONFLICT clause in observation inserts throws.
- spawnSdkProcess: stdio[stdin] from 'ignore' to 'pipe' so the
supervisor can actually feed the observer's stdin.
Also rebuilds plugin/scripts/{worker-service,context-generator}.cjs.
* fix: walk back to UTF-8 boundary on prompt truncation (Greptile P2)
Plain Buffer.subarray at MAX_USER_PROMPT_BYTES can land mid-codepoint,
which the utf8 decoder silently rewrites to U+FFFD. Walk back over any
continuation bytes (0b10xxxxxx) before decoding so the truncated prompt
ends on a valid sequence boundary instead of a replacement character.
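The walk-back can be sketched as below; `truncateUtf8` is a hypothetical name, and the byte cap is a parameter rather than the commit's MAX_USER_PROMPT_BYTES constant:

```typescript
// Walk back over UTF-8 continuation bytes (0b10xxxxxx) so the cut lands on
// a codepoint boundary; a plain subarray at maxBytes can split a multi-byte
// sequence, which the utf8 decoder silently rewrites to U+FFFD.
function truncateUtf8(buf: Buffer, maxBytes: number): string {
  if (buf.length <= maxBytes) return buf.toString('utf8');
  let end = maxBytes;
  // A continuation byte has the form 10xxxxxx (0x80-0xBF); the byte at
  // `end` is the first excluded byte, so back up until it is a lead byte.
  while (end > 0 && (buf[end] & 0b1100_0000) === 0b1000_0000) end--;
  return buf.subarray(0, end).toString('utf8');
}
```

The loop runs at most three iterations, since UTF-8 sequences are at most four bytes.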
* fix: cross-platform observer-dir containment; clarify SDK stdin pipe
claude-review feedback on PR #2124.
- shouldTrackProject: literal `cwd.startsWith(OBSERVER_SESSIONS_DIR + '/')`
hard-coded a POSIX separator and missed Windows backslash paths plus any
trailing-slash variance. Switched to a path.relative-based isWithin()
helper so Windows hook input under observer-sessions\\... is also excluded.
- spawnSdkProcess: added a comment explaining why stdin must be 'pipe' —
SpawnedSdkProcess.stdin is typed NonNullable and the Claude Agent SDK
consumes that pipe; 'ignore' would null it and the null-check below
would tear the child down on every spawn.
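A sketch of the path.relative-based containment check; the real isWithin helper may differ in edge-case handling:

```typescript
import path from 'node:path';

// Containment without hard-coding a POSIX '/' separator: path.resolve
// normalizes trailing slashes and path.relative uses the platform
// separator, so Windows paths under observer-sessions\... also match.
function isWithin(parent: string, child: string): boolean {
  const rel = path.relative(path.resolve(parent), path.resolve(child));
  if (rel === '') return true; // child === parent
  return (
    rel !== '..' &&
    !rel.startsWith('..' + path.sep) &&
    !path.isAbsolute(rel) // different drive on Windows
  );
}
```

Unlike a `startsWith(dir + '/')` check, this also rejects sibling directories that merely share a name prefix.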
* fix: make Stop hook fire-and-forget; remove dead /api/session/end
The Stop hook was awaiting a 35-second long-poll on /api/session/end,
which the worker held open until the summary-stored event fired (or its
30s server-side timeout elapsed). Followed by another await on
/api/sessions/complete. Three sequential awaits, the middle one a 30s
hold — not fire-and-forget despite repeated requests.
The Stop hook now does ONE thing: POST /api/sessions/summarize to
queue the summary work and return. The worker drives the rest async.
Session-map cleanup is performed by the SessionEnd handler
(session-complete.ts), not duplicated here.
- summarize.ts: drop the /api/session/end long-poll and the trailing
/api/sessions/complete await; ~40 lines removed; unused
SessionEndResponse interface gone; header comment rewritten.
- SessionRoutes: delete handleSessionEnd, sessionEndSchema, the
SERVER_SIDE_SUMMARY_TIMEOUT_MS constant, and the /api/session/end
route registration. Drop the now-unused ingestEventBus and
SummaryStoredEvent imports.
- ResponseProcessor + shared.ts + worker-utils.ts: update stale
comments that referenced the dead endpoint. The IngestEventBus is
left in place dormant (no listeners) for follow-up cleanup so this
PR stays focused on the blocker.
Bundle artifact (worker-service.cjs) rebuilt via build-and-sync.
Verification:
- grep '/api/session/end' plugin/scripts/worker-service.cjs → 0
- grep 'timeoutMs:35' plugin/scripts/worker-service.cjs → 0
- Worker restarted clean, /api/health ok at pid 92368
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* deps: bump all dependencies to latest including majors
Upgrades: React 18→19, Express 4→5, Zod 3→4, TypeScript 5→6,
@types/node 20→25, @anthropic-ai/claude-agent-sdk 0.1→0.2,
@clack/prompts 0.9→1.2, plus minors. Adds Daily Maintenance section
to CLAUDE.md mandating latest-version policy across manifests.
Express 5 surfaced a race in Server.listen() where the 'error' handler
was attached after listen() was invoked; refactored to use
http.createServer with both 'error' and 'listening' handlers attached
before listen(), restoring port-conflict rejection semantics.
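The attach-before-listen pattern can be sketched as follows; `listenWithRejection` is an illustrative name, not the project's actual Server code:

```typescript
import http from 'node:http';

// Create the server first and attach both 'error' and 'listening' before
// calling listen(), so a port conflict (EADDRINUSE) rejects the promise
// instead of escaping as an unhandled 'error' event.
function listenWithRejection(
  port: number,
  requestHandler?: http.RequestListener,
): Promise<http.Server> {
  return new Promise((resolve, reject) => {
    const server = http.createServer(requestHandler);
    server.once('error', reject); // e.g. EADDRINUSE lands here
    server.once('listening', () => {
      server.removeListener('error', reject);
      resolve(server);
    });
    server.listen(port); // attach-then-listen, never the reverse
  });
}
```

Once listening, the startup-error listener is removed so later runtime errors are not routed into an already-settled promise.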
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* fix: surface real chroma errors and add deep status probe
Replace the misleading "Vector search failed - semantic search unavailable.
Install uv... restart the worker." string in SearchManager with the actual
exception text from chroma_query_documents. The lying message blamed `uv`
for any failure — even when the real cause was a chroma-mcp transport
timeout, an empty collection, or a dead subprocess.
Also add /api/chroma/status?deep=1 backed by a new
ChromaMcpManager.probeSemanticSearch() that round-trips a real query
(chroma_list_collections + chroma_query_documents) instead of just
checking the stdio handshake. The cheap default path is unchanged.
Includes the diagnostic plan (PLAN-fix-mcp-search.md) and updated test
fixtures for the new structured failure message.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
---------
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* chore: rebuild worker-service bundle to match merged src
Bundle was stale after the squash merge of #2124 — it still contained
the old "Install uv... semantic search unavailable" string and lacked
probeSemanticSearch. Rebuilt via bun run build-and-sync.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* docs: address coderabbit feedback on PLAN-fix-mcp-search.md
- replace machine-specific /Users/alexnewman absolute paths with portable
<repo-root> placeholder (MD-style portability)
- add blank lines around the TypeScript fenced block (MD031)
- tag the bare fenced block with `text` (MD040)
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
---------
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
claude-code adapter:
@@ -1,4 +1,5 @@
 import type { PlatformAdapter, NormalizedHookInput, HookResult } from '../types.js';
+import { AdapterRejectedInput, isValidCwd } from './errors.js';

 // Maps Claude Code stdin format (session_id, cwd, tool_name, etc.)
 // SessionStart hooks receive no stdin, so we must handle undefined input gracefully
@@ -12,9 +13,15 @@ const pickAgentField = (v: unknown): string | undefined =>
 export const claudeCodeAdapter: PlatformAdapter = {
   normalizeInput(raw) {
     const r = (raw ?? {}) as any;
+    // Plan 05 Phase 6 — cwd validation at the adapter boundary (single check,
+    // not duplicated in handlers). Falls back to process.cwd() when unset.
+    const cwd = r.cwd ?? process.cwd();
+    if (!isValidCwd(cwd)) {
+      throw new AdapterRejectedInput('invalid_cwd');
+    }
     return {
       sessionId: r.session_id ?? r.id ?? r.sessionId,
-      cwd: r.cwd ?? process.cwd(),
+      cwd,
       prompt: r.prompt,
       toolName: r.tool_name,
       toolInput: r.tool_input,
cursor adapter:
@@ -1,4 +1,5 @@
 import type { PlatformAdapter, NormalizedHookInput, HookResult } from '../types.js';
+import { AdapterRejectedInput, isValidCwd } from './errors.js';

 // Maps Cursor stdin format - field names differ from Claude Code
 // Cursor uses: conversation_id, workspace_roots[], result_json, command/output
@@ -13,9 +14,14 @@ export const cursorAdapter: PlatformAdapter = {
     const r = (raw ?? {}) as any;
     // Cursor-specific: shell commands come as command/output instead of tool_name/input/response
     const isShellCommand = !!r.command && !r.tool_name;
+    // Plan 05 Phase 6 — cwd validation at the adapter boundary.
+    const cwd = r.workspace_roots?.[0] ?? r.cwd ?? process.cwd();
+    if (!isValidCwd(cwd)) {
+      throw new AdapterRejectedInput('invalid_cwd');
+    }
     return {
       sessionId: r.conversation_id || r.generation_id || r.id,
-      cwd: r.workspace_roots?.[0] ?? r.cwd ?? process.cwd(),
+      cwd,
       prompt: r.prompt ?? r.query ?? r.input ?? r.message,
       toolName: isShellCommand ? 'Bash' : r.tool_name,
       toolInput: isShellCommand ? { command: r.command } : r.tool_input,
errors.ts (new file):
@@ -0,0 +1,24 @@
+/**
+ * Adapter-layer rejection. Plan 05 Phase 6 (PATHFINDER-2026-04-22): cwd
+ * validation moves from per-handler `if (!cwd) throw …` to the adapter
+ * boundary. When normalization detects an invalid input, the adapter throws
+ * `AdapterRejectedInput`; the hook runner translates it into a graceful
+ * `{ continue: true }` so the user's session is never blocked by a malformed
+ * hook payload.
+ */
+export class AdapterRejectedInput extends Error {
+  constructor(public readonly reason: string) {
+    super(`adapter rejected input: ${reason}`);
+    this.name = 'AdapterRejectedInput';
+  }
+}
+
+/**
+ * A cwd is valid when it is a non-empty string. The adapter normalizers fall
+ * back to `process.cwd()` when the inbound payload omits cwd, so the only way
+ * this returns false is when the payload supplies `null`/`''`/non-string.
+ */
+export function isValidCwd(cwd: unknown): cwd is string {
+  return typeof cwd === 'string' && cwd.length > 0;
+}
gemini-cli adapter:
@@ -1,4 +1,5 @@
 import type { PlatformAdapter } from '../types.js';
+import { AdapterRejectedInput, isValidCwd } from './errors.js';

 /**
  * Gemini CLI Platform Adapter
@@ -39,6 +40,10 @@ export const geminiCliAdapter: PlatformAdapter = {
       ?? process.env.GEMINI_PROJECT_DIR
       ?? process.env.CLAUDE_PROJECT_DIR
       ?? process.cwd();
+    // Plan 05 Phase 6 — cwd validation at the adapter boundary.
+    if (!isValidCwd(cwd)) {
+      throw new AdapterRejectedInput('invalid_cwd');
+    }

     const sessionId = r.session_id
       ?? process.env.GEMINI_SESSION_ID
raw adapter:
@@ -1,12 +1,18 @@
 import type { PlatformAdapter, NormalizedHookInput, HookResult } from '../types.js';
+import { AdapterRejectedInput, isValidCwd } from './errors.js';

 // Raw adapter passes through with minimal transformation - useful for testing
 export const rawAdapter: PlatformAdapter = {
   normalizeInput(raw) {
-    const r = raw as any;
+    const r = (raw ?? {}) as any;
+    // Plan 05 Phase 6 — cwd validation at the adapter boundary.
+    const cwd = r.cwd ?? process.cwd();
+    if (!isValidCwd(cwd)) {
+      throw new AdapterRejectedInput('invalid_cwd');
+    }
     return {
       sessionId: r.sessionId ?? r.session_id ?? 'unknown',
-      cwd: r.cwd ?? process.cwd(),
+      cwd,
       prompt: r.prompt,
       toolName: r.toolName ?? r.tool_name,
       toolInput: r.toolInput ?? r.tool_input,
windsurf adapter:
@@ -1,4 +1,5 @@
 import type { PlatformAdapter, NormalizedHookInput, HookResult } from '../types.js';
+import { AdapterRejectedInput, isValidCwd } from './errors.js';

 // Maps Windsurf stdin format — JSON envelope with agent_action_name + tool_info payload
 //
@@ -17,9 +18,15 @@ export const windsurfAdapter: PlatformAdapter = {
     const toolInfo = r.tool_info ?? {};
     const actionName: string = r.agent_action_name ?? '';

+    // Plan 05 Phase 6 — cwd validation at the adapter boundary.
+    const cwd = toolInfo.cwd ?? process.cwd();
+    if (!isValidCwd(cwd)) {
+      throw new AdapterRejectedInput('invalid_cwd');
+    }
+
     const base: NormalizedHookInput = {
       sessionId: r.trajectory_id ?? r.execution_id,
-      cwd: toolInfo.cwd ?? process.cwd(),
+      cwd,
       platform: 'windsurf',
     };

SessionStart context handler:
@@ -6,34 +6,24 @@
  */

 import type { EventHandler, NormalizedHookInput, HookResult } from '../types.js';
-import { ensureWorkerRunning, getWorkerPort, workerHttpRequest } from '../../shared/worker-utils.js';
+import {
+  executeWithWorkerFallback,
+  isWorkerFallback,
+  getWorkerPort,
+} from '../../shared/worker-utils.js';
 import { getProjectContext } from '../../utils/project-name.js';
 import { HOOK_EXIT_CODES } from '../../shared/hook-constants.js';
 import { logger } from '../../utils/logger.js';
-import { SettingsDefaultsManager } from '../../shared/SettingsDefaultsManager.js';
-import { USER_SETTINGS_PATH } from '../../shared/paths.js';
+import { loadFromFileOnce } from '../../shared/hook-settings.js';

 export const contextHandler: EventHandler = {
   async execute(input: NormalizedHookInput): Promise<HookResult> {
-    // Ensure worker is running before any other logic
-    const workerReady = await ensureWorkerRunning();
-    if (!workerReady) {
-      // Worker not available - return empty context gracefully
-      return {
-        hookSpecificOutput: {
-          hookEventName: 'SessionStart',
-          additionalContext: ''
-        },
-        exitCode: HOOK_EXIT_CODES.SUCCESS
-      };
-    }
-
     const cwd = input.cwd ?? process.cwd();
     const context = getProjectContext(cwd);
     const port = getWorkerPort();

-    // Check if terminal output should be shown (load settings early)
-    const settings = SettingsDefaultsManager.loadFromFile(USER_SETTINGS_PATH);
+    // Plan 05 Phase 4: settings via process-scope cache.
+    const settings = loadFromFileOnce();
     const showTerminalOutput = settings.CLAUDE_MEM_CONTEXT_SHOW_TERMINAL_OUTPUT === 'true';

     // Pass all projects (parent + worktree if applicable) for unified timeline
@@ -41,38 +31,36 @@ export const contextHandler: EventHandler = {
     const apiPath = `/api/context/inject?projects=${encodeURIComponent(projectsParam)}`;
     const colorApiPath = input.platform === 'claude-code' ? `${apiPath}&colors=true` : apiPath;

-    const emptyResult = {
+    const emptyResult: HookResult = {
       hookSpecificOutput: { hookEventName: 'SessionStart', additionalContext: '' },
-      exitCode: HOOK_EXIT_CODES.SUCCESS
+      exitCode: HOOK_EXIT_CODES.SUCCESS,
     };

-    // Note: Removed AbortSignal.timeout due to Windows Bun cleanup issue (libuv assertion)
-    // Worker service has its own timeouts, so client-side timeout is redundant
-    let response: Response;
-    let colorResponse: Response | null;
-    try {
-      [response, colorResponse] = await Promise.all([
-        workerHttpRequest(apiPath),
-        showTerminalOutput ? workerHttpRequest(colorApiPath).catch(() => null) : Promise.resolve(null)
-      ]);
-    } catch (error) {
-      // Worker unreachable — return empty context gracefully
-      logger.warn('HOOK', 'Context fetch error, returning empty', { error: error instanceof Error ? error.message : String(error) });
+    // Plan 05 Phase 2: single helper for ensure-worker-alive → request → fallback.
+    const contextResult = await executeWithWorkerFallback<string>(apiPath, 'GET');
+    if (isWorkerFallback(contextResult)) {
       return emptyResult;
     }

-    if (!response.ok) {
-      logger.warn('HOOK', 'Context generation failed, returning empty', { status: response.status });
+    let additionalContext: string;
+    if (typeof contextResult === 'string') {
+      additionalContext = contextResult.trim();
+    } else if (contextResult === undefined) {
+      additionalContext = '';
+    } else {
+      // Unexpected non-string body — log and fall back to empty.
+      logger.warn('HOOK', 'Context response was not a string', { type: typeof contextResult });
       return emptyResult;
     }

-    const [contextResult, colorResult] = await Promise.all([
-      response.text(),
-      colorResponse?.ok ? colorResponse.text() : Promise.resolve('')
-    ]);
-
-    const additionalContext = contextResult.trim();
-    const coloredTimeline = colorResult.trim();
+    let coloredTimeline = '';
+    if (showTerminalOutput) {
+      const colorResult = await executeWithWorkerFallback<string>(colorApiPath, 'GET');
+      if (!isWorkerFallback(colorResult) && typeof colorResult === 'string') {
+        coloredTimeline = colorResult.trim();
+      }
+    }
+
     const platform = input.platform;

     // Use colored timeline for display if available, otherwise fall back to
file-context handler:
@@ -6,14 +6,12 @@
  */

 import type { EventHandler, NormalizedHookInput, HookResult } from '../types.js';
-import { ensureWorkerRunning, workerHttpRequest } from '../../shared/worker-utils.js';
+import { executeWithWorkerFallback, isWorkerFallback } from '../../shared/worker-utils.js';
 import { logger } from '../../utils/logger.js';
 import { parseJsonArray } from '../../shared/timeline-formatting.js';
 import { statSync } from 'fs';
 import path from 'path';
-import { isProjectExcluded } from '../../utils/project-filter.js';
-import { SettingsDefaultsManager } from '../../shared/SettingsDefaultsManager.js';
-import { USER_SETTINGS_PATH } from '../../shared/paths.js';
+import { shouldTrackProject } from '../../shared/should-track-project.js';
 import { getProjectContext } from '../../utils/project-name.js';

 /** Skip the gate for files smaller than this — timeline overhead exceeds file read cost. */
@@ -207,19 +205,12 @@ export const fileContextHandler: EventHandler = {
       logger.debug('HOOK', 'File stat failed, proceeding with gate', { error: err instanceof Error ? err.message : String(err) });
     }

-    // Check if project is excluded from tracking
-    const settings = SettingsDefaultsManager.loadFromFile(USER_SETTINGS_PATH);
-    if (input.cwd && isProjectExcluded(input.cwd, settings.CLAUDE_MEM_EXCLUDED_PROJECTS)) {
+    // Plan 05 Phase 5: project exclusion via single helper.
+    if (input.cwd && !shouldTrackProject(input.cwd)) {
       logger.debug('HOOK', 'Project excluded from tracking, skipping file context', { cwd: input.cwd });
       return { continue: true, suppressOutput: true };
     }

-    // Ensure worker is running
-    const workerReady = await ensureWorkerRunning();
-    if (!workerReady) {
-      return { continue: true, suppressOutput: true };
-    }
-
     // Query worker for observations related to this file
     const context = getProjectContext(input.cwd);
     const cwd = input.cwd || process.cwd();
@@ -232,22 +223,19 @@
     }
     queryParams.set('limit', String(FETCH_LOOKAHEAD_LIMIT));

-    let data: { observations: ObservationRow[]; count: number };
-    try {
-      const response = await workerHttpRequest(`/api/observations/by-file?${queryParams.toString()}`, { method: 'GET' });
-
-      if (!response.ok) {
-        logger.warn('HOOK', 'File context query failed, skipping', { status: response.status, filePath });
-        return { continue: true, suppressOutput: true };
-      }
-
-      data = await response.json() as { observations: ObservationRow[]; count: number };
-    } catch (error) {
-      logger.warn('HOOK', 'File context fetch error, skipping', {
-        error: error instanceof Error ? error.message : String(error),
-      });
+    // Plan 05 Phase 2: single helper for ensure-worker-alive → request → fallback.
+    const result = await executeWithWorkerFallback<{ observations: ObservationRow[]; count: number }>(
+      `/api/observations/by-file?${queryParams.toString()}`,
+      'GET',
+    );
+    if (isWorkerFallback(result)) {
       return { continue: true, suppressOutput: true };
     }
+    if (!result || !Array.isArray((result as any).observations)) {
+      logger.warn('HOOK', 'File context query returned malformed body, skipping', { filePath });
+      return { continue: true, suppressOutput: true };
+    }
+    const data = result;

     if (!data.observations || data.observations.length === 0) {
       return { continue: true, suppressOutput: true };
file-edit handler:
@@ -6,35 +6,13 @@
  */

 import type { EventHandler, NormalizedHookInput, HookResult } from '../types.js';
-import { ensureWorkerRunning, workerHttpRequest } from '../../shared/worker-utils.js';
+import { executeWithWorkerFallback, isWorkerFallback } from '../../shared/worker-utils.js';
 import { logger } from '../../utils/logger.js';
 import { HOOK_EXIT_CODES } from '../../shared/hook-constants.js';
 import { normalizePlatformSource } from '../../shared/platform-source.js';

-async function sendFileEditObservation(requestBody: string, filePath: string): Promise<void> {
-  const response = await workerHttpRequest('/api/sessions/observations', {
-    method: 'POST',
-    headers: { 'Content-Type': 'application/json' },
-    body: requestBody
-  });
-
-  if (!response.ok) {
-    logger.warn('HOOK', 'File edit observation storage failed, skipping', { status: response.status, filePath });
-    return;
-  }
-
-  logger.debug('HOOK', 'File edit observation sent successfully', { filePath });
-}
-
 export const fileEditHandler: EventHandler = {
   async execute(input: NormalizedHookInput): Promise<HookResult> {
-    // Ensure worker is running before any other logic
-    const workerReady = await ensureWorkerRunning();
-    if (!workerReady) {
-      // Worker not available - skip file edit observation gracefully
-      return { continue: true, suppressOutput: true, exitCode: HOOK_EXIT_CODES.SUCCESS };
-    }
-
     const { sessionId, cwd, filePath, edits } = input;
     const platformSource = normalizePlatformSource(input.platform);
@@ -46,30 +24,31 @@ export const fileEditHandler: EventHandler = {
       editCount: edits?.length ?? 0
     });

-    // Validate required fields before sending to worker
+    // Plan 05 Phase 6: cwd is validated at the adapter boundary; this is a
+    // belt-and-suspenders type guard so TypeScript narrows.
     if (!cwd) {
       throw new Error(`Missing cwd in FileEdit hook input for session ${sessionId}, file ${filePath}`);
     }

-    // Send to worker as an observation with file edit metadata
-    // The observation handler on the worker will process this appropriately
-    const requestBody = JSON.stringify({
-      contentSessionId: sessionId,
-      platformSource,
-      tool_name: 'write_file',
-      tool_input: { filePath, edits },
-      tool_response: { success: true },
-      cwd
-    });
+    // Plan 05 Phase 2: single helper for ensure-worker-alive → request → fallback.
+    const result = await executeWithWorkerFallback<{ status?: string }>(
+      '/api/sessions/observations',
+      'POST',
+      {
+        contentSessionId: sessionId,
+        platformSource,
+        tool_name: 'write_file',
+        tool_input: { filePath, edits },
+        tool_response: { success: true },
+        cwd,
+      },
+    );

-    try {
-      await sendFileEditObservation(requestBody, filePath);
-    } catch (error) {
-      // Worker unreachable — skip file edit observation gracefully
-      logger.warn('HOOK', 'File edit observation fetch error, skipping', { error: error instanceof Error ? error.message : String(error) });
+    if (isWorkerFallback(result)) {
       return { continue: true, suppressOutput: true, exitCode: HOOK_EXIT_CODES.SUCCESS };
     }

     logger.debug('HOOK', 'File edit observation sent successfully', { filePath });
     return { continue: true, suppressOutput: true };
   },
 };
observation handler (PostToolUse):
@@ -5,38 +5,14 @@
  */

 import type { EventHandler, NormalizedHookInput, HookResult } from '../types.js';
-import { ensureWorkerRunning, workerHttpRequest } from '../../shared/worker-utils.js';
+import { executeWithWorkerFallback, isWorkerFallback } from '../../shared/worker-utils.js';
 import { logger } from '../../utils/logger.js';
 import { HOOK_EXIT_CODES } from '../../shared/hook-constants.js';
-import { isProjectExcluded } from '../../utils/project-filter.js';
-import { SettingsDefaultsManager } from '../../shared/SettingsDefaultsManager.js';
-import { USER_SETTINGS_PATH } from '../../shared/paths.js';
+import { shouldTrackProject } from '../../shared/should-track-project.js';
 import { normalizePlatformSource } from '../../shared/platform-source.js';

-async function sendObservationToWorker(requestBody: string, toolName: string): Promise<void> {
-  const response = await workerHttpRequest('/api/sessions/observations', {
-    method: 'POST',
-    headers: { 'Content-Type': 'application/json' },
-    body: requestBody
-  });
-
-  if (!response.ok) {
-    logger.warn('HOOK', 'Observation storage failed, skipping', { status: response.status, toolName });
-    return;
-  }
-
-  logger.debug('HOOK', 'Observation sent successfully', { toolName });
-}
-
 export const observationHandler: EventHandler = {
   async execute(input: NormalizedHookInput): Promise<HookResult> {
-    // Ensure worker is running before any other logic
-    const workerReady = await ensureWorkerRunning();
-    if (!workerReady) {
-      // Worker not available - skip observation gracefully
-      return { continue: true, suppressOutput: true, exitCode: HOOK_EXIT_CODES.SUCCESS };
-    }
-
     const { sessionId, cwd, toolName, toolInput, toolResponse } = input;
     const platformSource = normalizePlatformSource(input.platform);
@@ -49,38 +25,43 @@ export const observationHandler: EventHandler = {

     logger.dataIn('HOOK', `PostToolUse: ${toolStr}`, {});

-    // Validate required fields before sending to worker
+    // Plan 05 Phase 6: cwd is validated at the adapter boundary; the adapter
+    // rejects empty cwd before reaching the handler. We still type-narrow for
+    // TypeScript and as a belt-and-suspenders guard.
     if (!cwd) {
       throw new Error(`Missing cwd in PostToolUse hook input for session ${sessionId}, tool ${toolName}`);
     }

-    // Check if project is excluded from tracking
-    const settings = SettingsDefaultsManager.loadFromFile(USER_SETTINGS_PATH);
-    if (isProjectExcluded(cwd, settings.CLAUDE_MEM_EXCLUDED_PROJECTS)) {
+    // Plan 05 Phase 5: project exclusion via single helper.
+    if (!shouldTrackProject(cwd)) {
       logger.debug('HOOK', 'Project excluded from tracking, skipping observation', { cwd, toolName });
       return { continue: true, suppressOutput: true };
     }

-    // Send to worker - worker handles privacy check and database operations
-    const requestBody = JSON.stringify({
-      contentSessionId: sessionId,
-      platformSource,
-      tool_name: toolName,
-      tool_input: toolInput,
-      tool_response: toolResponse,
-      cwd,
-      agentId: input.agentId,
-      agentType: input.agentType
-    });
+    // Plan 05 Phase 2: single helper for ensure-worker-alive → request → fallback.
+    const result = await executeWithWorkerFallback<{ status?: string }>(
+      '/api/sessions/observations',
+      'POST',
+      {
+        contentSessionId: sessionId,
+        platformSource,
+        tool_name: toolName,
+        tool_input: toolInput,
+        tool_response: toolResponse,
+        cwd,
+        agentId: input.agentId,
+        agentType: input.agentType,
+      },
+    );

-    try {
-      await sendObservationToWorker(requestBody, toolName);
-    } catch (error) {
-      // Worker unreachable — skip observation gracefully
-      logger.warn('HOOK', 'Observation fetch error, skipping', { error: error instanceof Error ? error.message : String(error) });
+    if (isWorkerFallback(result)) {
+      // Worker unreachable — fail-loud counter has already been incremented
+      // and may have escalated to exit 2. If we got here, threshold not yet
+      // reached, so degrade gracefully.
       return { continue: true, suppressOutput: true, exitCode: HOOK_EXIT_CODES.SUCCESS };
     }

     logger.debug('HOOK', 'Observation sent successfully', { toolName });
     return { continue: true, suppressOutput: true };
   },
 };
@@ -10,56 +10,43 @@
*/

import type { EventHandler, NormalizedHookInput, HookResult } from '../types.js';
import { ensureWorkerRunning, workerHttpRequest } from '../../shared/worker-utils.js';
import { executeWithWorkerFallback, isWorkerFallback } from '../../shared/worker-utils.js';
import { logger } from '../../utils/logger.js';
import { normalizePlatformSource } from '../../shared/platform-source.js';

async function sendSessionCompleteRequest(sessionId: string, platformSource: string): Promise<void> {
const response = await workerHttpRequest('/api/sessions/complete', {
method: 'POST',
headers: { 'Content-Type': 'application/json' },
body: JSON.stringify({ contentSessionId: sessionId, platformSource })
});

if (!response.ok) {
const text = await response.text();
logger.warn('HOOK', 'session-complete: Failed to complete session', { status: response.status, body: text });
} else {
logger.info('HOOK', 'Session completed successfully', { contentSessionId: sessionId });
}
}
import { shouldTrackProject } from '../../shared/should-track-project.js';

export const sessionCompleteHandler: EventHandler = {
async execute(input: NormalizedHookInput): Promise<HookResult> {
// Ensure worker is running
const workerReady = await ensureWorkerRunning();
if (!workerReady) {
// Worker not available — skip session completion gracefully
return { continue: true, suppressOutput: true };
}

const { sessionId } = input;
const platformSource = normalizePlatformSource(input.platform);

// Same OBSERVER_SESSIONS_DIR exclusion as the rest of the hook surface —
// the observer's child Claude Code must never call /api/sessions/complete.
if (input.cwd && !shouldTrackProject(input.cwd)) {
return { continue: true, suppressOutput: true };
}

if (!sessionId) {
logger.warn('HOOK', 'session-complete: Missing sessionId, skipping');
return { continue: true, suppressOutput: true };
}

logger.info('HOOK', '→ session-complete: Removing session from active map', {
contentSessionId: sessionId
contentSessionId: sessionId,
});

try {
await sendSessionCompleteRequest(sessionId, platformSource);
} catch (error) {
// Log but don't fail - session may already be gone
const errorMessage = error instanceof Error ? error.message : String(error);
logger.warn('HOOK', 'session-complete: Error completing session', {
error: errorMessage
});
// Plan 05 Phase 2: single helper for ensure-worker-alive → request → fallback.
const result = await executeWithWorkerFallback<{ status?: string }>(
'/api/sessions/complete',
'POST',
{ contentSessionId: sessionId, platformSource },
);

if (isWorkerFallback(result)) {
return { continue: true, suppressOutput: true };
}

logger.info('HOOK', 'Session completed successfully', { contentSessionId: sessionId });
return { continue: true, suppressOutput: true };
}
},
};

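The `executeWithWorkerFallback` / `isWorkerFallback` pair that the rewritten handlers call lives in `shared/worker-utils.ts`, which is not part of this diff. Below is a minimal sketch of the contract the handlers rely on — the sentinel type, base URL, and internals here are assumptions for illustration, not the real implementation:

```typescript
// Hypothetical sketch — the real helper lives in shared/worker-utils.ts.
// A unique sentinel marks "worker unavailable"; callers branch on it and
// return a graceful continue-envelope instead of throwing.
const WORKER_FALLBACK: unique symbol = Symbol('worker-fallback');
type WorkerFallback = typeof WORKER_FALLBACK;

function isWorkerFallback(result: unknown): result is WorkerFallback {
  return result === WORKER_FALLBACK;
}

async function executeWithWorkerFallback<T>(
  path: string,
  method: 'GET' | 'POST',
  body?: unknown,
): Promise<T | WorkerFallback> {
  try {
    // Base URL is an assumption for this sketch only.
    const res = await fetch(`http://127.0.0.1:37777${path}`, {
      method,
      headers: { 'Content-Type': 'application/json' },
      body: body === undefined ? undefined : JSON.stringify(body),
    });
    if (!res.ok) return WORKER_FALLBACK;
    return (await res.json()) as T;
  } catch {
    // Worker unreachable or ensure-alive failed — degrade, never throw.
    return WORKER_FALLBACK;
  }
}
```

Each call site then reduces to the shape repeated throughout the hunks: request, `isWorkerFallback` check, graceful return.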
@@ -5,45 +5,29 @@
*/

import type { EventHandler, NormalizedHookInput, HookResult } from '../types.js';
import { ensureWorkerRunning, workerHttpRequest } from '../../shared/worker-utils.js';
import { executeWithWorkerFallback, isWorkerFallback } from '../../shared/worker-utils.js';
import { getProjectContext } from '../../utils/project-name.js';
import { logger } from '../../utils/logger.js';
import { HOOK_EXIT_CODES } from '../../shared/hook-constants.js';
import { isProjectExcluded } from '../../utils/project-filter.js';
import { SettingsDefaultsManager } from '../../shared/SettingsDefaultsManager.js';
import { USER_SETTINGS_PATH } from '../../shared/paths.js';
import { shouldTrackProject } from '../../shared/should-track-project.js';
import { loadFromFileOnce } from '../../shared/hook-settings.js';
import { normalizePlatformSource } from '../../shared/platform-source.js';

async function fetchSemanticContext(
prompt: string,
project: string,
limit: string,
sessionDbId: number
): Promise<string> {
const semanticRes = await workerHttpRequest('/api/context/semantic', {
method: 'POST',
headers: { 'Content-Type': 'application/json' },
body: JSON.stringify({ q: prompt, project, limit })
});
if (semanticRes.ok) {
const data = await semanticRes.json() as { context: string; count: number };
if (data.context) {
logger.debug('HOOK', `Semantic injection: ${data.count} observations for prompt`, { sessionId: sessionDbId, count: data.count });
return data.context;
}
}
return '';
interface SessionInitResponse {
sessionDbId: number;
promptNumber: number;
skipped?: boolean;
reason?: string;
contextInjected?: boolean;
}

interface SemanticContextResponse {
context: string;
count: number;
}

export const sessionInitHandler: EventHandler = {
async execute(input: NormalizedHookInput): Promise<HookResult> {
// Ensure worker is running before any other logic
const workerReady = await ensureWorkerRunning();
if (!workerReady) {
// Worker not available - skip session init gracefully
return { continue: true, suppressOutput: true, exitCode: HOOK_EXIT_CODES.SUCCESS };
}

const { sessionId, prompt: rawPrompt } = input;
const cwd = input.cwd ?? process.cwd(); // Match context.ts fallback (#1918)

@@ -53,9 +37,8 @@ export const sessionInitHandler: EventHandler = {
return { continue: true, suppressOutput: true, exitCode: HOOK_EXIT_CODES.SUCCESS };
}

// Check if project is excluded from tracking
const settings = SettingsDefaultsManager.loadFromFile(USER_SETTINGS_PATH);
if (cwd && isProjectExcluded(cwd, settings.CLAUDE_MEM_EXCLUDED_PROJECTS)) {
// Plan 05 Phase 5: project exclusion via single helper.
if (!shouldTrackProject(cwd)) {
logger.info('HOOK', 'Project excluded from tracking', { cwd });
return { continue: true, suppressOutput: true };
}
@@ -69,38 +52,28 @@ export const sessionInitHandler: EventHandler = {

logger.debug('HOOK', 'session-init: Calling /api/sessions/init', { contentSessionId: sessionId, project });

// Initialize session via HTTP - handles DB operations and privacy checks
let initResponse: Response;
try {
initResponse = await workerHttpRequest('/api/sessions/init', {
method: 'POST',
headers: { 'Content-Type': 'application/json' },
body: JSON.stringify({
contentSessionId: sessionId,
project,
prompt,
platformSource
})
});
} catch (err) {
// Worker unreachable — on Linux/WSL, hook may fire before worker is healthy (#1907)
logger.warn('HOOK', `session-init: worker request failed: ${err instanceof Error ? err.message : err}`);
// Plan 05 Phase 2: single helper for ensure-worker-alive → request → fallback.
const initResult = await executeWithWorkerFallback<SessionInitResponse>(
'/api/sessions/init',
'POST',
{
contentSessionId: sessionId,
project,
prompt,
platformSource,
},
);

if (isWorkerFallback(initResult)) {
return { continue: true, suppressOutput: true, exitCode: HOOK_EXIT_CODES.SUCCESS };
}

if (!initResponse.ok) {
// Log but don't throw - a worker 500 should not block the user's prompt
logger.failure('HOOK', `Session initialization failed: ${initResponse.status}`, { contentSessionId: sessionId, project });
// Worker may have returned a non-2xx body (parsed but missing fields). Fail-soft.
if (typeof initResult?.sessionDbId !== 'number') {
logger.failure('HOOK', 'Session initialization returned malformed response', { contentSessionId: sessionId, project });
return { continue: true, suppressOutput: true, exitCode: HOOK_EXIT_CODES.SUCCESS };
}

const initResult = await initResponse.json() as {
sessionDbId: number;
promptNumber: number;
skipped?: boolean;
reason?: string;
contextInjected?: boolean;
};
const sessionDbId = initResult.sessionDbId;
const promptNumber = initResult.promptNumber;

@@ -117,57 +90,47 @@ export const sessionInitHandler: EventHandler = {
return { continue: true, suppressOutput: true };
}

// Skip SDK agent re-initialization if context was already injected for this session (#1079)
// The prompt was already saved to the database by /api/sessions/init above —
// no need to re-start the SDK agent on every turn.
// Note: we do NOT return here — semantic injection below must run on every prompt.
const skipAgentInit = Boolean(initResult.contextInjected);
if (skipAgentInit) {
logger.info('HOOK', `INIT_COMPLETE | sessionDbId=${sessionDbId} | promptNumber=${promptNumber} | skipped_agent_init=true | reason=context_already_injected`, {
sessionId: sessionDbId
});
}

// Only initialize SDK agent for Claude Code (not Cursor)
// Cursor doesn't use the SDK agent - it only needs session/observation storage
if (!skipAgentInit && input.platform !== 'cursor' && sessionDbId) {
// Plan 05 Phase 7: agent init is idempotent — call unconditionally for
// every Claude Code session. Cursor still skipped (no SDK agent).
if (input.platform !== 'cursor' && sessionDbId) {
// Strip leading slash from commands for memory agent
// /review 101 -> review 101 (more semantic for observations)
const cleanedPrompt = prompt.startsWith('/') ? prompt.substring(1) : prompt;

logger.debug('HOOK', 'session-init: Calling /sessions/{sessionDbId}/init', { sessionDbId, promptNumber });

// Initialize SDK agent session via HTTP (starts the agent!)
const response = await workerHttpRequest(`/sessions/${sessionDbId}/init`, {
method: 'POST',
headers: { 'Content-Type': 'application/json' },
body: JSON.stringify({ userPrompt: cleanedPrompt, promptNumber })
});

if (!response.ok) {
// Log but don't throw - SDK agent failure should not block the user's prompt
logger.failure('HOOK', `SDK agent start failed: ${response.status}`, { sessionDbId, promptNumber });
const agentInitResult = await executeWithWorkerFallback<{ status?: string }>(
`/sessions/${sessionDbId}/init`,
'POST',
{ userPrompt: cleanedPrompt, promptNumber },
);
if (isWorkerFallback(agentInitResult)) {
// Worker became unreachable mid-invocation; fail-loud counter handled it.
return { continue: true, suppressOutput: true, exitCode: HOOK_EXIT_CODES.SUCCESS };
}
} else if (!skipAgentInit && input.platform === 'cursor') {
} else if (input.platform === 'cursor') {
logger.debug('HOOK', 'session-init: Skipping SDK agent init for Cursor platform', { sessionDbId, promptNumber });
}

// Semantic context injection: query Chroma for relevant past observations
// and inject as additionalContext so Claude receives relevant memory each prompt.
// Controlled by CLAUDE_MEM_SEMANTIC_INJECT setting (default: true).
// Plan 05 Phase 4: settings via process-scope cache.
const settings = loadFromFileOnce();
const semanticInject =
String(settings.CLAUDE_MEM_SEMANTIC_INJECT).toLowerCase() === 'true';
let additionalContext = '';

if (semanticInject && prompt && prompt.length >= 20 && prompt !== '[media prompt]') {
const limit = settings.CLAUDE_MEM_SEMANTIC_INJECT_LIMIT || '5';
try {
additionalContext = await fetchSemanticContext(prompt, project, limit, sessionDbId);
} catch (e) {
// Graceful degradation — semantic injection is optional
logger.debug('HOOK', 'Semantic injection unavailable', {
error: e instanceof Error ? e.message : String(e)
});
const semanticResult = await executeWithWorkerFallback<SemanticContextResponse>(
'/api/context/semantic',
'POST',
{ q: prompt, project, limit },
);
if (!isWorkerFallback(semanticResult) && semanticResult?.context) {
logger.debug('HOOK', `Semantic injection: ${semanticResult.count} observations for prompt`, { sessionId: sessionDbId, count: semanticResult.count });
additionalContext = semanticResult.context;
}
}

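The semantic-injection gate above (setting enabled, prompt at least 20 characters, not the media placeholder) reads naturally as a pure predicate. A sketch with a hypothetical helper name, mirroring the inline condition:

```typescript
// Hypothetical helper name; mirrors the inline condition in the handler.
function shouldInjectSemanticContext(enabled: boolean, prompt: string): boolean {
  // Short prompts and the '[media prompt]' placeholder carry no query signal.
  return enabled && prompt.length >= 20 && prompt !== '[media prompt]';
}
```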
@@ -1,26 +1,33 @@
/**
* Summarize Handler - Stop
*
* Fire-and-forget: enqueue the summarize request with the worker and return
* immediately so the Stop hook does not block the user's terminal. The worker
* owns completion and session cleanup.
* Fire-and-forget: queue the summarize request and exit. The worker handles
* summary generation, storage, and session cleanup asynchronously. The Stop
* hook does not wait for any of it — Claude Code must exit immediately.
* Session-complete cleanup is performed by the SessionEnd handler.
*/

import type { EventHandler, NormalizedHookInput, HookResult } from '../types.js';
import { ensureWorkerRunning, workerHttpRequest } from '../../shared/worker-utils.js';
import { executeWithWorkerFallback, isWorkerFallback } from '../../shared/worker-utils.js';
import { logger } from '../../utils/logger.js';
import { extractLastMessage } from '../../shared/transcript-parser.js';
import { HOOK_EXIT_CODES } from '../../shared/hook-constants.js';
import { normalizePlatformSource } from '../../shared/platform-source.js';

const SUMMARIZE_TIMEOUT_MS = 5000;
import { shouldTrackProject } from '../../shared/should-track-project.js';

export const summarizeHandler: EventHandler = {
async execute(input: NormalizedHookInput): Promise<HookResult> {
// Skip Stop hook entirely when firing from an excluded project (notably
// OBSERVER_SESSIONS_DIR). Without this, the SDK observer's own Stop hook
// queues summaries against its meta-session and triggers a recovery loop.
if (input.cwd && !shouldTrackProject(input.cwd)) {
return { continue: true, suppressOutput: true, exitCode: HOOK_EXIT_CODES.SUCCESS };
}

// Skip summaries in subagent context — subagents do not own the session summary.
// Gate on agentId only: that field is present exclusively for Task-spawned subagents.
// agentType alone (no agentId) indicates `--agent`-started main sessions, which still
// own their summary. Do this BEFORE ensureWorkerRunning() so a subagent Stop hook
// own their summary. Do this BEFORE the worker call so a subagent Stop hook
// does not bootstrap the worker.
if (input.agentId) {
logger.debug('HOOK', 'Skipping summary: subagent context detected', {
@@ -31,16 +38,13 @@ export const summarizeHandler: EventHandler = {
return { continue: true, suppressOutput: true, exitCode: HOOK_EXIT_CODES.SUCCESS };
}

// Ensure worker is running before any other logic
const workerReady = await ensureWorkerRunning();
if (!workerReady) {
// Worker not available - skip summary gracefully
return { continue: true, suppressOutput: true, exitCode: HOOK_EXIT_CODES.SUCCESS };
}

const { sessionId, transcriptPath } = input;

// Validate required fields before processing
if (!sessionId) {
logger.warn('HOOK', 'summarize: No sessionId provided, skipping');
return { continue: true, suppressOutput: true, exitCode: HOOK_EXIT_CODES.SUCCESS };
}
if (!transcriptPath) {
// No transcript available - skip summary gracefully (not an error)
logger.debug('HOOK', `No transcriptPath in Stop hook input for session ${sessionId} - skipping summary`);
@@ -75,31 +79,20 @@ export const summarizeHandler: EventHandler = {
const platformSource = normalizePlatformSource(input.platform);

// 1. Queue summarize request — worker returns immediately with { status: 'queued' }
let response: Response;
try {
response = await workerHttpRequest('/api/sessions/summarize', {
method: 'POST',
headers: { 'Content-Type': 'application/json' },
body: JSON.stringify({
contentSessionId: sessionId,
last_assistant_message: lastAssistantMessage,
platformSource
}),
timeoutMs: SUMMARIZE_TIMEOUT_MS
});
} catch (err) {
// Network error, worker crash, or timeout — exit gracefully instead of
// bubbling to hook runner which exits code 2 and blocks session exit (#1901)
logger.warn('HOOK', `Stop hook: summarize request failed: ${err instanceof Error ? err.message : err}`);
const queueResult = await executeWithWorkerFallback<{ status?: string }>(
'/api/sessions/summarize',
'POST',
{
contentSessionId: sessionId,
last_assistant_message: lastAssistantMessage,
platformSource,
},
);
if (isWorkerFallback(queueResult)) {
return { continue: true, suppressOutput: true, exitCode: HOOK_EXIT_CODES.SUCCESS };
}

if (!response.ok) {
return { continue: true, suppressOutput: true };
}

logger.debug('HOOK', 'Summary request queued');

logger.debug('HOOK', 'Summary request queued, exiting hook');
return { continue: true, suppressOutput: true };
}
},
};

@@ -7,47 +7,38 @@

import { basename } from 'path';
import type { EventHandler, NormalizedHookInput, HookResult } from '../types.js';
import { ensureWorkerRunning, getWorkerPort, workerHttpRequest } from '../../shared/worker-utils.js';
import {
executeWithWorkerFallback,
isWorkerFallback,
getWorkerPort,
} from '../../shared/worker-utils.js';
import { HOOK_EXIT_CODES } from '../../shared/hook-constants.js';

async function fetchAndDisplayContext(project: string, colorsParam: string, port: number): Promise<void> {
const response = await workerHttpRequest(
`/api/context/inject?project=${encodeURIComponent(project)}${colorsParam}`
);

if (!response.ok) {
return;
}

const output = await response.text();
process.stderr.write(
"\n\n" + String.fromCodePoint(0x1F4DD) + " Claude-Mem Context Loaded\n\n" +
output +
"\n\n" + String.fromCodePoint(0x1F4A1) + " Wrap any message with <private> ... </private> to prevent storing sensitive information.\n" +
"\n" + String.fromCodePoint(0x1F4AC) + " Community https://discord.gg/J4wttp9vDu" +
`\n` + String.fromCodePoint(0x1F4FA) + ` Watch live in browser http://localhost:${port}/\n`
);
}

export const userMessageHandler: EventHandler = {
async execute(input: NormalizedHookInput): Promise<HookResult> {
// Ensure worker is running
const workerReady = await ensureWorkerRunning();
if (!workerReady) {
// Worker not available — skip user message gracefully
return { exitCode: HOOK_EXIT_CODES.SUCCESS };
}

const port = getWorkerPort();
const project = basename(input.cwd ?? process.cwd());
const colorsParam = input.platform === 'claude-code' ? '&colors=true' : '';

try {
await fetchAndDisplayContext(project, colorsParam, port);
} catch {
// Worker unreachable — skip user message gracefully
// Plan 05 Phase 2: single helper for ensure-worker-alive → request → fallback.
const result = await executeWithWorkerFallback<string>(
`/api/context/inject?project=${encodeURIComponent(project)}${colorsParam}`,
'GET',
);

if (isWorkerFallback(result)) {
return { exitCode: HOOK_EXIT_CODES.SUCCESS };
}

const output = typeof result === 'string' ? result : '';
process.stderr.write(
"\n\n" + String.fromCodePoint(0x1F4DD) + " Claude-Mem Context Loaded\n\n" +
output +
"\n\n" + String.fromCodePoint(0x1F4A1) + " Wrap any message with <private> ... </private> to prevent storing sensitive information.\n" +
"\n" + String.fromCodePoint(0x1F4AC) + " Community https://discord.gg/J4wttp9vDu" +
`\n` + String.fromCodePoint(0x1F4FA) + ` Watch live in browser http://localhost:${port}/\n`
);

return { exitCode: HOOK_EXIT_CODES.SUCCESS };
}
},
};

@@ -1,5 +1,6 @@
import { readJsonFromStdin } from './stdin-reader.js';
import { getPlatformAdapter } from './adapters/index.js';
import { AdapterRejectedInput } from './adapters/errors.js';
import { getEventHandler } from './handlers/index.js';
import { HOOK_EXIT_CODES } from '../shared/hook-constants.js';
import { logger } from '../utils/logger.js';
@@ -98,6 +99,18 @@ export async function hookCommand(platform: string, event: string, options: Hook
try {
return await executeHookPipeline(adapter, handler, platform, options);
} catch (error) {
// Plan 05 Phase 6 — adapter rejected the input (invalid cwd or other
// boundary-detected payload defect). Treat as graceful: emit a continue
// envelope and exit 0 so the user's session is not blocked by a malformed
// hook payload from the platform.
if (error instanceof AdapterRejectedInput) {
logger.warn('HOOK', `Adapter rejected input (${error.reason}), skipping hook`);
console.log(JSON.stringify({ continue: true, suppressOutput: true }));
if (!options.skipExit) {
process.exit(HOOK_EXIT_CODES.SUCCESS);
}
return HOOK_EXIT_CODES.SUCCESS;
}
if (isWorkerUnavailableError(error)) {
// Worker unavailable — degrade gracefully, don't block the user
// Log to file instead of stderr (#1181)

@@ -351,7 +351,8 @@ function runNpmInstallInMarketplace(): void {
execSync('npm install --production', {
cwd: marketplaceDir,
stdio: 'pipe',
...(IS_WINDOWS ? { shell: true as const } : {}),
encoding: 'utf8',
...(IS_WINDOWS ? { shell: process.env.ComSpec ?? 'cmd.exe' } : {}),
});
}

@@ -370,7 +371,8 @@ function runSmartInstall(): boolean {
try {
execSync(`node "${smartInstallPath}"`, {
stdio: 'inherit',
...(IS_WINDOWS ? { shell: true as const } : {}),
encoding: 'utf8',
...(IS_WINDOWS ? { shell: process.env.ComSpec ?? 'cmd.exe' } : {}),
});
return true;
} catch (error: unknown) {

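Both hunks above swap `shell: true` for an explicit `ComSpec` path using the same conditional-spread idiom. A standalone sketch of that pattern — the `shell` key only exists in the options object on Windows, so POSIX spawns stay shell-free:

```typescript
// Platform-conditional options via conditional spread. On Windows the shell
// is pinned to ComSpec (falling back to cmd.exe); elsewhere no key is added.
const IS_WINDOWS = process.platform === 'win32';

const execOptions = {
  encoding: 'utf8' as const,
  ...(IS_WINDOWS ? { shell: process.env.ComSpec ?? 'cmd.exe' } : {}),
};
```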
@@ -64,23 +64,3 @@ export function resolveBunBinaryPath(): string | null {
return null;
}

/**
* Get the installed Bun version string (e.g. `"1.2.3"`), or `null`
* if Bun is not available.
*/
export function getBunVersionString(): string | null {
const bunPath = resolveBunBinaryPath();
if (!bunPath) return null;

try {
const result = spawnSync(bunPath, ['--version'], {
encoding: 'utf-8',
stdio: ['pipe', 'pipe', 'pipe'],
shell: IS_WINDOWS,
});
return result.status === 0 ? result.stdout.trim() : null;
} catch (error: unknown) {
console.error('[bun-resolver] Failed to get Bun version:', error instanceof Error ? error.message : String(error));
return null;
}
}

@@ -1,6 +1,13 @@
/**
* XML Parser Module
* Parses observation and summary XML blocks from SDK responses
*
* Single fail-fast entry point for SDK agent XML responses.
*
* Per PATHFINDER-2026-04-22 plan 03 phase 1:
* - One function (`parseAgentXml`) for all agent responses.
* - Discriminated-union return: `{ valid: true, kind, data }` or `{ valid: false, reason }`.
* - No coercion. No silent passthrough. No "lenient mode".
* - `<skip_summary reason="…"/>` is a first-class summary case (skipped: true).
*/

import { logger } from '../utils/logger.js';
@@ -24,23 +31,103 @@ export interface ParsedSummary {
completed: string | null;
next_steps: string | null;
notes: string | null;
/** True when the response was an explicit `<skip_summary reason="…"/>` bypass. */
skipped?: boolean;
/** Non-null when `skipped: true`. */
skip_reason?: string | null;
}

export type ParseResult =
| { valid: true; kind: 'observation'; data: ParsedObservation[] }
| { valid: true; kind: 'summary'; data: ParsedSummary }
| { valid: false; reason: string };

/**
* Parse an SDK agent response. Inspects the first significant XML root element
* and returns a discriminated union. Never coerces. Never returns null/undefined.
*
* Recognised roots:
* <observation> … </observation> → { kind: 'observation', data: ParsedObservation[] }
* <summary> … </summary> → { kind: 'summary', data: ParsedSummary }
* <skip_summary reason="…" /> → { kind: 'summary', data: { skipped: true, … } }
*
* Anything else → { valid: false, reason }. The caller is responsible for
* surfacing the reason (markFailed, log, etc.). No retry coercion.
*/
export function parseAgentXml(raw: string, correlationId?: string | number): ParseResult {
if (typeof raw !== 'string' || !raw.trim()) {
return { valid: false, reason: 'empty: response had no content' };
}

// Skip-summary is recognised even when wrapped in other text, but only as the
// sole structural signal. It outranks <observation> / <summary> matches because
// it is an explicit protocol bypass. `reason` is optional.
const skipMatch = /<skip_summary(?:\s+reason="([^"]*)")?\s*\/>/.exec(raw);
if (skipMatch) {
return {
valid: true,
kind: 'summary',
data: {
request: null,
investigated: null,
learned: null,
completed: null,
next_steps: null,
notes: null,
skipped: true,
skip_reason: skipMatch[1] ?? null,
},
};
}

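The new matcher above makes `reason` optional, where the old `parseSummary` regex required it. A quick standalone check of that behaviour — the regex is copied verbatim from the hunk:

```typescript
// Copied from parseAgentXml: reason attribute optional, self-closing tag,
// matches even when wrapped in surrounding prose.
const skipRe = /<skip_summary(?:\s+reason="([^"]*)")?\s*\/>/;

const withReason = skipRe.exec('noise before <skip_summary reason="trivial session"/> noise after');
const bareTag = skipRe.exec('<skip_summary/>');
```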
// Find the first significant element by scanning for the first `<…>` opener
// that is one of the recognised roots. This tolerates leading prose / debug
// output from the model while still failing fast on entirely-non-XML payloads.
const firstRoot = /<(observation|summary)\b/i.exec(raw);
if (!firstRoot) {
const preview = raw.length > 120 ? `${raw.slice(0, 120)}…` : raw;
return {
valid: false,
reason: `unknown root: response contained no <observation>, <summary>, or <skip_summary/> element (preview: ${preview.replace(/\s+/g, ' ')})`,
};
}

const rootName = firstRoot[1].toLowerCase();
if (rootName === 'observation') {
const observations = parseObservationBlocks(raw, correlationId);
if (observations.length === 0) {
return {
valid: false,
reason: '<observation>: no parseable observation block (every block was empty or ghost)',
};
}
return { valid: true, kind: 'observation', data: observations };
}

// rootName === 'summary'
const summary = parseSummaryBlock(raw, correlationId);
if (!summary) {
return {
valid: false,
reason: '<summary>: empty or missing every required sub-tag (request/investigated/learned/completed/next_steps)',
};
}
return { valid: true, kind: 'summary', data: summary };
}

/**
|
||||
* Parse observation XML blocks from SDK response
|
||||
* Returns all observations found in the response
|
||||
* Parse all <observation>…</observation> blocks. Filters out ghost
|
||||
* observations (every content field empty). Returns the surviving list.
|
||||
*/
|
||||
export function parseObservations(text: string, correlationId?: string): ParsedObservation[] {
|
||||
function parseObservationBlocks(text: string, correlationId?: string | number): ParsedObservation[] {
|
||||
const observations: ParsedObservation[] = [];
|
||||
|
||||
// Match <observation>...</observation> blocks (non-greedy)
|
||||
const observationRegex = /<observation>([\s\S]*?)<\/observation>/g;
|
||||
|
||||
let match;
|
||||
while ((match = observationRegex.exec(text)) !== null) {
|
||||
const obsContent = match[1];
|
||||
|
||||
// Extract all fields
|
||||
const type = extractField(obsContent, 'type');
|
||||
const title = extractField(obsContent, 'title');
|
||||
const subtitle = extractField(obsContent, 'subtitle');
|
||||
@@ -50,13 +137,13 @@ export function parseObservations(text: string, correlationId?: string): ParsedO
|
||||
const files_read = extractArrayElements(obsContent, 'files_read', 'file');
|
||||
const files_modified = extractArrayElements(obsContent, 'files_modified', 'file');
|
||||
|
||||
// All fields except type are nullable in schema.
|
||||
// If type is missing or invalid, use first type from mode as fallback.
|
||||
|
||||
// Determine final type using active mode's valid types
|
||||
// Type fallback: per existing semantics, missing/invalid type degrades to the
|
||||
// first type in the active mode. This is parser-internal validation, not
|
||||
// recovery from a contract violation: every mode's first type is intentionally
|
||||
// the catch-all bucket.
|
||||
const mode = ModeManager.getInstance().getActiveMode();
|
||||
const validTypes = mode.observation_types.map(t => t.id);
|
||||
const fallbackType = validTypes[0]; // First type in mode's list is the fallback
|
||||
const fallbackType = validTypes[0];
|
||||
let finalType = fallbackType;
|
||||
if (type) {
|
||||
if (validTypes.includes(type.trim())) {
|
||||
@@ -68,8 +155,6 @@ export function parseObservations(text: string, correlationId?: string): ParsedO
|
||||
logger.error('PARSER', `Observation missing type field, using "${fallbackType}"`, { correlationId });
|
||||
}
|
||||
|
||||
// All other fields are optional - save whatever we have
|
||||
|
||||
// Filter out type from concepts array (types and concepts are separate dimensions)
|
||||
const cleanedConcepts = concepts.filter(c => c !== finalType);
|
||||
|
||||
@@ -83,10 +168,8 @@ export function parseObservations(text: string, correlationId?: string): ParsedO
|
||||
}
|
||||
|
||||
// Skip ghost observations — records where every content field is null/empty.
|
||||
// These accumulate when the LLM emits a bare <observation/> (or one with only <type>)
|
||||
// due to context overflow. They carry no information and pollute the context window.
|
||||
// (subtitle and file lists are intentionally excluded from this guard: an observation
// with only a subtitle is still too thin to be useful on its own.)
// (subtitle and file lists are intentionally excluded from this guard:
// an observation with only a subtitle is still too thin to be useful.)
if (!title && !narrative && facts.length === 0 && cleanedConcepts.length === 0) {
logger.warn('PARSER', 'Skipping empty observation (all content fields null)', {
correlationId,
@@ -111,96 +194,29 @@ export function parseObservations(text: string, correlationId?: string): ParsedO
}

/**
* Parse summary XML block from SDK response
* Returns null if no valid summary found or if summary was skipped
*
* @param coerceFromObservation - When true, attempts to convert <observation> tags
* into summary fields if no <summary> tags are found. Only set this when the
* response was expected to be a summary (i.e., a summarize message was sent).
* Prevents the infinite retry loop described in #1633.
* Parse a single <summary>…</summary> block. Returns null when the block has
* no usable sub-tags (every required field empty) — the caller maps this to
* a fail-fast `{ valid: false, reason }` result.
*/
export function parseSummary(text: string, sessionId?: number, coerceFromObservation: boolean = false): ParsedSummary | null {
// Check for skip_summary first
const skipRegex = /<skip_summary\s+reason="([^"]+)"\s*\/>/;
const skipMatch = skipRegex.exec(text);

if (skipMatch) {
logger.info('PARSER', 'Summary skipped', {
sessionId,
reason: skipMatch[1]
});
return null;
}

// Match <summary>...</summary> block (non-greedy)
function parseSummaryBlock(text: string, correlationId?: string | number): ParsedSummary | null {
const summaryRegex = /<summary>([\s\S]*?)<\/summary>/;
const summaryMatch = summaryRegex.exec(text);

if (!summaryMatch) {
// When the LLM returns <observation> tags instead of <summary> tags on a
// summary turn, coerce the observation content into summary fields rather
// than discarding it. This breaks the infinite retry loop described in
// #1633: without coercion, the summary is silently dropped, the session
// completes without a summary, a new session is spawned with an ever-growing
// prompt, and the cycle repeats.
//
// parseSummary is called on every response (see ResponseProcessor), not just
// summary turns — so the absence of <summary> in an observation response is
// expected, not a prompt-conditioning failure. Only act when the caller
// actually expected a summary (coerceFromObservation=true).
if (coerceFromObservation && /<observation>/.test(text)) {
const coerced = coerceObservationToSummary(text, sessionId);
if (coerced) {
return coerced;
}
logger.warn('PARSER', 'Summary response contained <observation> tags instead of <summary> — coercion failed, no usable content', { sessionId });
}
return null;
}
if (!summaryMatch) return null;

const summaryContent = summaryMatch[1];

// Extract fields
const request = extractField(summaryContent, 'request');
const investigated = extractField(summaryContent, 'investigated');
const learned = extractField(summaryContent, 'learned');
const completed = extractField(summaryContent, 'completed');
const next_steps = extractField(summaryContent, 'next_steps');
const notes = extractField(summaryContent, 'notes'); // Optional
const notes = extractField(summaryContent, 'notes'); // optional

// NOTE FROM THEDOTMACK: 100% of the time we must SAVE the summary, even if fields are missing. 10/24/2025
// NEVER DO THIS NONSENSE AGAIN.

// Validate required fields are present (notes is optional)
// if (!request || !investigated || !learned || !completed || !next_steps) {
// logger.warn('PARSER', 'Summary missing required fields', {
// sessionId,
// hasRequest: !!request,
// hasInvestigated: !!investigated,
// hasLearned: !!learned,
// hasCompleted: !!completed,
// hasNextSteps: !!next_steps
// });
// return null;
// }

// Guard: if NO sub-tags matched at all, this is a false positive —
// <summary> accidentally appeared inside an <observation> response with no structured content.
// This is NOT the same as missing some fields (which we intentionally allow above).
// Fix for #1360.
// Per maintainer note: a summary with at least one populated sub-tag must be
// saved. Missing sub-tags are tolerated; an entirely empty <summary> block is
// a false positive (covering the #1360 regression) and is rejected.
if (!request && !investigated && !learned && !completed && !next_steps) {
// If the response also contains <observation> tags with real content, fall
// back to coercion rather than discarding the response entirely — this covers
// the case where the LLM wraps empty <summary></summary> around observation
// content, which would otherwise resurrect the #1633 retry loop.
if (coerceFromObservation && /<observation>/.test(text)) {
const coerced = coerceObservationToSummary(text, sessionId);
if (coerced) {
logger.warn('PARSER', 'Empty <summary> match rejected — coerced from <observation> fallback (#1633)', { sessionId });
return coerced;
}
}
logger.warn('PARSER', 'Summary match has no sub-tags — skipping false positive', { sessionId });
logger.warn('PARSER', 'Summary block has no sub-tags — rejecting false positive', { correlationId });
return null;
}

@@ -210,54 +226,10 @@ export function parseSummary(text: string, sessionId?: number, coerceFromObserva
learned,
completed,
next_steps,
notes
notes,
};
}

/**
* Coerce <observation> response into a ParsedSummary when <summary> tags are missing.
* Maps observation fields to the closest summary equivalents so that a usable
* summary is stored instead of nothing — breaking the retry loop (#1633).
*/
function coerceObservationToSummary(text: string, sessionId?: number): ParsedSummary | null {
// Iterate all <observation> blocks — if the LLM emits multiple and the first is
// empty, we still want to salvage the first one that has usable content.
const obsRegex = /<observation>([\s\S]*?)<\/observation>/g;
let obsMatch: RegExpExecArray | null;
let blockIndex = 0;

while ((obsMatch = obsRegex.exec(text)) !== null) {
const obsContent = obsMatch[1];
const title = extractField(obsContent, 'title');
const subtitle = extractField(obsContent, 'subtitle');
const narrative = extractField(obsContent, 'narrative');
const facts = extractArrayElements(obsContent, 'facts', 'fact');

if (title || narrative || facts.length > 0) {
// Map observation fields → summary fields (best-effort)
const request = title || subtitle || null;
const investigated = narrative || null;
const learned = facts.length > 0 ? facts.join('; ') : null;
const completed = title ? `${title}${subtitle ? ' — ' + subtitle : ''}` : null;
const next_steps = null; // No direct observation equivalent

logger.warn('PARSER', 'Coerced <observation> response into <summary> to prevent retry loop (#1633)', {
sessionId,
blockIndex,
hasTitle: !!title,
hasNarrative: !!narrative,
factCount: facts.length,
});

return { request, investigated, learned, completed, next_steps, notes: null };
}

blockIndex++;
}

return null;
}

/**
* Extract a simple field value from XML content
* Returns null for missing or empty/whitespace-only fields
@@ -265,8 +237,6 @@ function coerceObservationToSummary(text: string, sessionId?: number): ParsedSum
* Uses non-greedy match to handle nested tags and code snippets (Issue #798)
*/
function extractField(content: string, fieldName: string): string | null {
// Use [\s\S]*? to match any character including newlines, non-greedily
// This handles nested XML tags like <item>...</item> inside the field
const regex = new RegExp(`<${fieldName}>([\\s\\S]*?)</${fieldName}>`);
const match = regex.exec(content);
if (!match) return null;
@@ -282,7 +252,6 @@ function extractField(content: string, fieldName: string): string | null {
function extractArrayElements(content: string, arrayName: string, elementName: string): string[] {
const elements: string[] = [];

// Match the array block using [\s\S]*? for nested content
const arrayRegex = new RegExp(`<${arrayName}>([\\s\\S]*?)</${arrayName}>`);
const arrayMatch = arrayRegex.exec(content);

@@ -292,7 +261,6 @@ function extractArrayElements(content: string, arrayName: string, elementName: s

const arrayContent = arrayMatch[1];

// Extract individual elements using [\s\S]*? for nested content
const elementRegex = new RegExp(`<${elementName}>([\\s\\S]*?)</${elementName}>`, 'g');
let elementMatch;
while ((elementMatch = elementRegex.exec(arrayContent)) !== null) {

@@ -7,19 +7,14 @@ import { logger } from '../utils/logger.js';
import type { ModeConfig } from '../services/domain/types.js';

/**
* Marker string embedded in summary prompts — used by ResponseProcessor to detect
* whether the most recent user message was a summary request (enables observation→summary
* coercion for #1633). Keep in sync with buildSummaryPrompt below.
* Marker string embedded in summary prompts — historically used by
* ResponseProcessor to detect summary turns for the (now-deleted) coercion
* fallback. Kept here because `buildSummaryPrompt` still embeds it as the
* mode-switch banner; deleting the constant would require rewriting the
* prompt builder, which is out of scope for plan 03.
*/
export const SUMMARY_MODE_MARKER = 'MODE SWITCH: PROGRESS SUMMARY';

/**
* Maximum consecutive summary failures before the circuit breaker opens.
* After this many failures, SessionManager.queueSummarize will skip further
* summarize requests to prevent the infinite retry loop (#1633).
*/
export const MAX_CONSECUTIVE_SUMMARY_FAILURES = 3;
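The circuit-breaker behaviour this constant drives can be sketched as follows. This is a hypothetical shape for illustration only — the real guard lives in SessionManager.queueSummarize and its wiring may differ:

```typescript
// Hypothetical sketch of the summary circuit breaker; the real
// SessionManager.queueSummarize integration may differ.
const MAX_FAILURES = 3; // mirrors MAX_CONSECUTIVE_SUMMARY_FAILURES

class SummaryCircuitBreaker {
  private consecutiveFailures = 0;

  /** False once the breaker is open — callers skip the summarize request. */
  shouldAttempt(): boolean {
    return this.consecutiveFailures < MAX_FAILURES;
  }

  recordFailure(): void {
    this.consecutiveFailures += 1;
  }

  recordSuccess(): void {
    this.consecutiveFailures = 0; // any success closes the breaker
  }
}
```

The key property is that the counter tracks consecutive failures, so one successful summary fully resets it rather than decaying it.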

export interface Observation {
id: number;
tool_name: string;

@@ -10,7 +10,7 @@

import http from 'http';
import { logger } from '../../utils/logger.js';
import { stopSupervisor } from '../../supervisor/index.js';
import { getSupervisor } from '../../supervisor/index.js';

export interface ShutdownableService {
shutdownAll(): Promise<void>;
@@ -80,7 +80,10 @@ export async function performGracefulShutdown(config: GracefulShutdownConfig): P
}

// STEP 6: Supervisor handles tracked child termination, PID cleanup, and stale sockets.
await stopSupervisor();
// Plan 06 Phase 8 — call the supervisor singleton directly; the wrapper
// re-export from supervisor/index.ts was deleted (one wrapper, one caller,
// no value).
await getSupervisor().stop();

logger.info('SYSTEM', 'Worker shutdown complete');
}

@@ -48,7 +48,7 @@ interface WorktreeEntry {
branch: string | null;
}

const GIT_TIMEOUT_MS = 5000;
const GIT_TIMEOUT_MS = 15000;

class DryRunRollback extends Error {
constructor() {
@@ -58,11 +58,31 @@ class DryRunRollback extends Error {
}

function gitCapture(cwd: string, args: string[]): string | null {
const startTime = Date.now();
const r = spawnSync('git', ['-C', cwd, ...args], {
encoding: 'utf8',
timeout: GIT_TIMEOUT_MS
});
if (r.status !== 0) return null;
const duration = Date.now() - startTime;

if (duration > 1000) {
logger.debug('GIT', `Slow git operation: git -C ${cwd} ${args.join(' ')} took ${duration}ms`);
}

if (r.error) {
logger.warn('GIT', `Git operation failed: git -C ${cwd} ${args.join(' ')}`, {
error: r.error.message,
timedOut: r.error.name === 'ETIMEDOUT' || (r.status === null && r.signal === 'SIGTERM')
});
return null;
}

if (r.status !== 0) {
logger.debug('GIT', `Git returned non-zero exit code ${r.status}: git -C ${cwd} ${args.join(' ')}`, {
stderr: r.stderr?.toString().trim()
});
return null;
}
return (r.stdout ?? '').trim();
}

@@ -281,83 +281,3 @@ export function uninstallCodexCli(): number {
return 0;
}

// ---------------------------------------------------------------------------
// Public API: Status Check
// ---------------------------------------------------------------------------

/**
* Check Codex CLI integration status.
*
* @returns 0 always (informational)
*/
export function checkCodexCliStatus(): number {
console.log('\nClaude-Mem Codex CLI Integration Status\n');

// Check transcript-watch.json
if (!existsSync(DEFAULT_CONFIG_PATH)) {
console.log('Status: Not installed');
console.log(` No transcript watch config at ${DEFAULT_CONFIG_PATH}`);
console.log('\nRun: npx claude-mem install --ide codex-cli\n');
return 0;
}

let config: TranscriptWatchConfig;
try {
config = loadExistingTranscriptWatchConfig();
} catch (error) {
if (error instanceof Error) {
logger.error('WORKER', 'Could not parse transcript-watch.json', { path: DEFAULT_CONFIG_PATH }, error);
} else {
logger.error('WORKER', 'Could not parse transcript-watch.json', { path: DEFAULT_CONFIG_PATH }, new Error(String(error)));
}
console.log('Status: Unknown');
console.log(' Could not parse transcript-watch.json.');
console.log('');
return 0;
}

const codexWatch = config.watches.find(
(w: WatchTarget) => w.name === CODEX_WATCH_NAME,
);
const codexSchema = config.schemas?.[CODEX_WATCH_NAME];

if (!codexWatch) {
console.log('Status: Not installed');
console.log(' transcript-watch.json exists but no codex watch configured.');
console.log('\nRun: npx claude-mem install --ide codex-cli\n');
return 0;
}

console.log('Status: Installed');
console.log(` Config: ${DEFAULT_CONFIG_PATH}`);
console.log(` Watch path: ${codexWatch.path}`);
console.log(` Schema: ${codexSchema ? `codex (v${codexSchema.version ?? '?'})` : 'missing'}`);
console.log(` Start at end: ${codexWatch.startAtEnd ?? false}`);

if (codexWatch.context) {
console.log(` Context mode: ${codexWatch.context.mode}`);
console.log(` Context path: ${codexWatch.context.path ?? '<workspace>/AGENTS.md (default)'}`);
console.log(` Context updates on: ${codexWatch.context.updateOn?.join(', ') ?? 'none'}`);
}

if (existsSync(CODEX_AGENTS_MD_PATH)) {
const mdContent = readFileSync(CODEX_AGENTS_MD_PATH, 'utf-8');
if (mdContent.includes('<claude-mem-context>')) {
console.log(` Legacy global context: Present (${CODEX_AGENTS_MD_PATH})`);
} else {
console.log(` Legacy global context: Not active`);
}
} else {
console.log(` Legacy global context: None`);
}

const sessionsDir = path.join(CODEX_DIR, 'sessions');
if (existsSync(sessionsDir)) {
console.log(` Sessions directory: exists`);
} else {
console.log(` Sessions directory: not yet created (use Codex CLI to generate sessions)`);
}

console.log('');
return 0;
}

@@ -21,6 +21,61 @@ import { getSupervisor } from '../../supervisor/index.js';
import { isPidAlive } from '../../supervisor/process-registry.js';
import { ENV_PREFIXES, ENV_EXACT_MATCHES } from '../../supervisor/env-sanitizer.js';

/**
* Plan 06 Phase 6 — instruction content (SKILL.md + ALLOWED_OPERATIONS .md
* files) is read once at module init and held in memory for the lifetime of
* the worker process. Process restart is the cache-invalidation event.
*
* `SKILL.md` is held as the full UTF-8 string so `extractInstructionSection`
* can slice topic windows on every request without re-reading the file.
* Per-operation files are cached as a `Map<operation, content>`. Files that
* are missing on disk are simply omitted from the map; the request handler
* returns 404 in that case (preserving legacy behaviour).
*/
const INSTRUCTIONS_BASE_DIR: string = path.resolve(__dirname, '../skills/mem-search');
const INSTRUCTIONS_OPERATIONS_DIR: string = path.join(INSTRUCTIONS_BASE_DIR, 'operations');
const INSTRUCTIONS_SKILL_PATH: string = path.join(INSTRUCTIONS_BASE_DIR, 'SKILL.md');

const cachedSkillMd: string | null = (() => {
try {
const text = fs.readFileSync(INSTRUCTIONS_SKILL_PATH, 'utf-8');
logger.info('SYSTEM', 'Cached SKILL.md at boot', {
path: INSTRUCTIONS_SKILL_PATH,
bytes: Buffer.byteLength(text, 'utf-8'),
});
return text;
} catch (error: unknown) {
logger.debug('SYSTEM', 'SKILL.md not present at boot, /api/instructions will 404 for topic queries', {
path: INSTRUCTIONS_SKILL_PATH,
message: error instanceof Error ? error.message : String(error),
});
return null;
}
})();

const cachedOperationContent: ReadonlyMap<string, string> = (() => {
const map = new Map<string, string>();
for (const operation of ALLOWED_OPERATIONS) {
const operationPath = path.join(INSTRUCTIONS_OPERATIONS_DIR, `${operation}.md`);
try {
map.set(operation, fs.readFileSync(operationPath, 'utf-8'));
} catch (error: unknown) {
// Missing operation files are non-fatal — 404 is returned per request.
logger.debug('SYSTEM', 'Operation instruction file not present at boot', {
path: operationPath,
message: error instanceof Error ? error.message : String(error),
});
}
}
if (map.size > 0) {
logger.info('SYSTEM', 'Cached operation instruction files at boot', {
count: map.size,
operations: Array.from(map.keys()),
});
}
return map;
})();

// Build-time injected version constant (set by esbuild define)
declare const __DEFAULT_PACKAGE_VERSION__: string;
const BUILT_IN_VERSION = typeof __DEFAULT_PACKAGE_VERSION__ !== 'undefined'
@@ -94,11 +149,20 @@ export class Server {
*/
async listen(port: number, host: string): Promise<void> {
return new Promise<void>((resolve, reject) => {
this.server = this.app.listen(port, host, () => {
const server = http.createServer(this.app);
this.server = server;
const onError = (err: Error) => {
server.off('listening', onListening);
reject(err);
};
const onListening = () => {
server.off('error', onError);
logger.info('SYSTEM', 'HTTP server started', { host, port, pid: process.pid });
resolve();
});
this.server.on('error', reject);
};
server.once('error', onError);
server.once('listening', onListening);
server.listen(port, host);
});
}

@@ -198,8 +262,9 @@ export class Server {
res.status(200).json({ version: BUILT_IN_VERSION });
});

// Instructions endpoint - loads SKILL.md sections on-demand
this.app.get('/api/instructions', async (req: Request, res: Response) => {
// Instructions endpoint — Plan 06 Phase 6 — serves the cached SKILL.md /
// operations content loaded once at module init.
this.app.get('/api/instructions', (req: Request, res: Response) => {
const topic = (req.query.topic as string) || 'all';
const operation = req.query.operation as string | undefined;

@@ -213,24 +278,20 @@ export class Server {
}

if (operation) {
const OPERATIONS_BASE_DIR = path.resolve(__dirname, '../skills/mem-search/operations');
const operationPath = path.resolve(OPERATIONS_BASE_DIR, `${operation}.md`);
if (!operationPath.startsWith(OPERATIONS_BASE_DIR + path.sep)) {
return res.status(400).json({ error: 'Invalid request' });
const cached = cachedOperationContent.get(operation);
if (cached === undefined) {
logger.debug('HTTP', 'Instruction file not cached at boot', { operation });
return res.status(404).json({ error: 'Instruction not found' });
}
return res.json({ content: [{ type: 'text', text: cached }] });
}

try {
const content = await this.loadInstructionContent(operation, topic);
res.json({ content: [{ type: 'text', text: content }] });
} catch (error) {
if (error instanceof Error) {
logger.debug('HTTP', 'Instruction file not found', { topic, operation, message: error.message });
} else {
logger.debug('HTTP', 'Instruction file not found', { topic, operation, error: String(error) });
}
res.status(404).json({ error: 'Instruction not found' });
if (cachedSkillMd === null) {
logger.debug('HTTP', 'SKILL.md not cached at boot', { topic });
return res.status(404).json({ error: 'Instruction not found' });
}
const sectionText = this.extractInstructionSection(cachedSkillMd, topic);
res.json({ content: [{ type: 'text', text: sectionText }] });
});

// Admin endpoints for process management (localhost-only)
@@ -330,20 +391,6 @@ export class Server {
});
}

/**
* Load instruction content from disk for the /api/instructions endpoint.
* Caller must validate operation/topic before calling.
*/
private async loadInstructionContent(operation: string | undefined, topic: string): Promise<string> {
if (operation) {
const operationPath = path.resolve(__dirname, '../skills/mem-search/operations', `${operation}.md`);
return fs.promises.readFile(operationPath, 'utf-8');
}
const skillPath = path.join(__dirname, '../skills/mem-search/SKILL.md');
const fullContent = await fs.promises.readFile(skillPath, 'utf-8');
return this.extractInstructionSection(fullContent, topic);
}

/**
* Extract a specific section from instruction content
*/

@@ -480,15 +480,6 @@ const QUERIES: Record<string, string> = {
(class_definition name: (identifier) @name) @cls
(import_statement) @imp
(import_declaration) @imp
`,

php: `
(function_definition name: (name) @name) @func
(method_declaration name: (name) @name) @method
(class_declaration name: (name) @name) @cls
(interface_declaration name: (name) @name) @iface
(trait_declaration name: (name) @name) @trait_def
(namespace_use_declaration) @imp
`,
};

@@ -1,8 +1,4 @@
import { Database } from 'bun:sqlite';
import { execFileSync } from 'child_process';
import { existsSync, unlinkSync, writeFileSync } from 'fs';
import { tmpdir } from 'os';
import { join } from 'path';
import { DATA_DIR, DB_PATH, ensureDir } from '../../shared/paths.js';
import { logger } from '../../utils/logger.js';
import { MigrationRunner } from './migrations/runner.js';
@@ -19,118 +15,6 @@ export interface Migration {

let dbInstance: Database | null = null;

/**
* Repair malformed database schema before migrations run.
*
* This handles the case where a database is synced between machines running
* different claude-mem versions. A newer version may have added columns and
* indexes that an older version (or even the same version on a fresh install)
* cannot process. SQLite throws "malformed database schema" when it encounters
* an index referencing a non-existent column, which prevents ALL queries —
* including the migrations that would fix the schema.
*
* The fix: use Python's sqlite3 module (which supports writable_schema) to
* drop the orphaned schema objects, then let the migration system recreate
* them properly. bun:sqlite doesn't allow DELETE FROM sqlite_master even
* with writable_schema = ON.
*/
function repairMalformedSchema(db: Database): void {
try {
// Quick test: if we can query sqlite_master, the schema is fine
db.query('SELECT name FROM sqlite_master WHERE type = "table" LIMIT 1').all();
return;
} catch (error: unknown) {
const message = error instanceof Error ? error.message : String(error);
if (!message.includes('malformed database schema')) {
throw error;
}

logger.warn('DB', 'Detected malformed database schema, attempting repair', { error: message });

// Extract the problematic object name from the error message
// Format: "malformed database schema (object_name) - details"
const match = message.match(/malformed database schema \(([^)]+)\)/);
if (!match) {
logger.error('DB', 'Could not parse malformed schema error, cannot auto-repair', { error: message });
throw error;
}

const objectName = match[1];
logger.info('DB', `Dropping malformed schema object: ${objectName}`);

// Get the DB file path. For file-based DBs, we can use Python to repair.
// For in-memory DBs, we can't shell out — just re-throw.
const dbPath = db.filename;
if (!dbPath || dbPath === ':memory:' || dbPath === '') {
logger.error('DB', 'Cannot auto-repair in-memory database');
throw error;
}

// Close the connection so Python can safely modify the file
db.close();

// Use Python's sqlite3 module to drop the orphaned object and reset
// related migration versions so they re-run and recreate things properly.
// bun:sqlite doesn't support DELETE FROM sqlite_master even with writable_schema.
//
// We write a temp script rather than using -c to avoid shell escaping issues
// with paths containing spaces or special characters. execFileSync passes
// args directly without a shell, so dbPath and objectName are safe.
const scriptPath = join(tmpdir(), `claude-mem-repair-${Date.now()}.py`);
try {
writeFileSync(scriptPath, `
import sqlite3, sys
db_path = sys.argv[1]
obj_name = sys.argv[2]
c = sqlite3.connect(db_path)
c.execute('PRAGMA writable_schema = ON')
c.execute('DELETE FROM sqlite_master WHERE name = ?', (obj_name,))
c.execute('PRAGMA writable_schema = OFF')
# Reset migration versions so affected migrations re-run.
# Guard with existence check: schema_versions may not exist on a very fresh DB.
has_sv = c.execute(
"SELECT count(*) FROM sqlite_master WHERE type='table' AND name='schema_versions'"
).fetchone()[0]
if has_sv:
c.execute('DELETE FROM schema_versions')
c.commit()
c.close()
`);
execFileSync('python3', [scriptPath, dbPath, objectName], { timeout: 10000 });
logger.info('DB', `Dropped orphaned schema object "${objectName}" and reset migration versions via Python sqlite3. All migrations will re-run (they are idempotent).`);
} catch (pyError: unknown) {
const pyMessage = pyError instanceof Error ? pyError.message : String(pyError);
logger.error('DB', 'Python sqlite3 repair failed', { error: pyMessage });
throw new Error(`Schema repair failed: ${message}. Python repair error: ${pyMessage}`);
} finally {
if (existsSync(scriptPath)) unlinkSync(scriptPath);
}
}
}

/**
* Wrapper that handles the close/reopen cycle needed for schema repair.
* Returns a (possibly new) Database connection.
*/
function repairMalformedSchemaWithReopen(dbPath: string, db: Database): Database {
try {
db.query('SELECT name FROM sqlite_master WHERE type = "table" LIMIT 1').all();
return db;
} catch (error: unknown) {
const message = error instanceof Error ? error.message : String(error);
if (!message.includes('malformed database schema')) {
throw error;
}

// repairMalformedSchema closes the DB internally for Python access
repairMalformedSchema(db);

// Reopen and check for additional malformed objects
const newDb = new Database(dbPath, { create: true, readwrite: true });
return repairMalformedSchemaWithReopen(dbPath, newDb);
}
}

/**
* ClaudeMemDatabase - New entry point for the sqlite module
*
@@ -154,11 +38,6 @@ export class ClaudeMemDatabase {
// Create database connection
this.db = new Database(dbPath, { create: true, readwrite: true });

// Repair any malformed schema before applying settings or running migrations.
// Must happen first — even PRAGMA calls can fail on a corrupted schema.
// This may close and reopen the connection if repair is needed.
this.db = repairMalformedSchemaWithReopen(dbPath, this.db);

// Apply optimized SQLite settings
this.db.run('PRAGMA journal_mode = WAL');
this.db.run('PRAGMA synchronous = NORMAL');
@@ -218,10 +97,6 @@ export class DatabaseManager {

this.db = new Database(DB_PATH, { create: true, readwrite: true });

// Repair any malformed schema before applying settings or running migrations.
// Must happen first — even PRAGMA calls can fail on a corrupted schema.
this.db = repairMalformedSchemaWithReopen(DB_PATH, this.db);

// Apply optimized SQLite settings
this.db.run('PRAGMA journal_mode = WAL');
this.db.run('PRAGMA synchronous = NORMAL');

@@ -1,9 +1,18 @@
import { Database } from './sqlite-compat.js';
import { Database } from 'bun:sqlite';
import type { PendingMessage } from '../worker-types.js';
import { logger } from '../../utils/logger.js';

/** Messages processing longer than this are considered stale and reset to pending by self-healing */
const STALE_PROCESSING_THRESHOLD_MS = 60_000;
/**
* Provider for the set of currently-live worker PIDs.
*
* The self-healing claim query reclaims any 'processing' row whose
* worker_pid is NOT a live worker (crash recovery without a timer).
*
* Default: a single-worker process supplies just its own PID. Multi-worker
* deployments inject a callback backed by `supervisor/process-registry.ts`
* (`getSupervisor().getRegistry().getAll().filter(r => r.type === 'worker').map(r => r.pid)`).
*/
export type LiveWorkerPidsProvider = () => readonly number[];
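A supervisor-backed provider can be wired roughly like this. The registry entry shape below is an assumption for illustration, not the real `supervisor/process-registry.ts` API:

```typescript
// Hypothetical registry entry shape — the real supervisor/process-registry.ts
// types may differ.
type LiveWorkerPidsProvider = () => readonly number[];

interface RegistryEntry {
  type: 'worker' | 'mcp' | 'watcher';
  pid: number;
}

// Adapt a registry-snapshot function into the provider the store expects:
// filter to worker-type entries, then project out their PIDs.
function supervisorBackedProvider(getAll: () => RegistryEntry[]): LiveWorkerPidsProvider {
  return () => getAll().filter(r => r.type === 'worker').map(r => r.pid);
}
```

Because the callback re-reads the registry on every invocation, the claim query always sees the current worker set rather than a snapshot taken at construction time.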

/**
* Persistent pending message record from database
@@ -22,8 +31,8 @@ export interface PersistentPendingMessage {
status: 'pending' | 'processing' | 'processed' | 'failed';
retry_count: number;
created_at_epoch: number;
started_processing_at_epoch: number | null;
completed_at_epoch: number | null;
worker_pid: number | null;
// Claude Code subagent identity — NULL for main-session messages.
agent_type: string | null;
agent_id: string | null;
@@ -37,44 +46,76 @@ export interface PersistentPendingMessage {
*
* Lifecycle:
* 1. enqueue() - Message persisted with status 'pending'
* 2. claimNextMessage() - Atomically claims next pending message (marks as 'processing')
* 2. claimNextMessage() - Atomically claims next pending message (marks as 'processing'
* and stamps the live worker's PID). Self-healing: reclaims any 'processing' row
* whose worker_pid is no longer alive (worker crash) in the same UPDATE.
* 3. confirmProcessed() - Deletes message after successful processing
*
* Self-healing:
* - claimNextMessage() resets stale 'processing' messages (>60s) back to 'pending' before claiming
* - This eliminates stuck messages from generator crashes without external timers
*
* Recovery:
* - getSessionsWithPendingMessages() - Find sessions that need recovery on startup
* Self-healing semantics:
* A 'processing' row is reclaimable iff worker_pid IS NULL or worker_pid is
* not present in the live-pids list at claim time. No timer, no
* stale-cutoff timestamp — liveness is the truth.
*/
export class PendingMessageStore {
private db: Database;
private maxRetries: number;
private workerPid: number;
private getLiveWorkerPids: LiveWorkerPidsProvider;

constructor(db: Database, maxRetries: number = 3) {
/**
* @param db SQLite database
* @param maxRetries Per-message retry ceiling for transient SDK failures (default 3)
* @param workerPid PID of the worker that owns this store; stamped into worker_pid on claim.
* Defaults to process.pid so single-process deployments need no extra wiring.
* @param getLiveWorkerPids Provider for the set of all currently-live worker PIDs.
* Defaults to `[workerPid]` — only this worker is alive.
* Multi-worker deployments inject a supervisor-backed provider.
*/
constructor(
db: Database,
maxRetries: number = 3,
workerPid: number = process.pid,
getLiveWorkerPids?: LiveWorkerPidsProvider
) {
this.db = db;
this.maxRetries = maxRetries;
this.workerPid = workerPid;
this.getLiveWorkerPids = getLiveWorkerPids ?? (() => [this.workerPid]);
}
|
||||
|
||||
  /**
   * Enqueue a new message (persist before processing)
   * @returns The database ID of the persisted message
   * Enqueue a new message (persist before processing).
   *
   * Uses `INSERT OR IGNORE` so duplicate (content_session_id, tool_use_id)
   * pairs collapse to a single row — the UNIQUE INDEX added in plan 01 phase 1
   * is the authority on tool-use idempotency. Per principle 3 (UNIQUE
   * constraint over dedup window), we don't time-gate duplicates.
   *
   * @returns The database ID of the persisted message, or 0 when the insert
   *          was suppressed by ON CONFLICT. Callers MUST guard with `id > 0`
   *          before threading the value into any subsequent SQL (e.g.
   *          `confirmProcessed`, `markFailed`, `processingMessageIds`) —
   *          a zero id would silently target zero rows. The only two call
   *          sites today (`SessionManager.queueObservation` and
   *          `queueSummarize`) use the id purely for logging and both
   *          branch on `messageId === 0`.
   */
  enqueue(sessionDbId: number, contentSessionId: string, message: PendingMessage): number {
    const now = Date.now();
    const stmt = this.db.prepare(`
      INSERT INTO pending_messages (
        session_db_id, content_session_id, message_type,
      INSERT OR IGNORE INTO pending_messages (
        session_db_id, content_session_id, tool_use_id, message_type,
        tool_name, tool_input, tool_response, cwd,
        last_assistant_message,
        prompt_number, status, retry_count, created_at_epoch,
        agent_type, agent_id
      ) VALUES (?, ?, ?, ?, ?, ?, ?, ?, ?, 'pending', 0, ?, ?, ?)
      ) VALUES (?, ?, ?, ?, ?, ?, ?, ?, ?, ?, 'pending', 0, ?, ?, ?)
    `);

    const result = stmt.run(
      sessionDbId,
      contentSessionId,
      message.toolUseId ?? null,
      message.type,
      message.tool_name || null,
      message.tool_input ? JSON.stringify(message.tool_input) : null,
@@ -90,58 +131,58 @@ export class PendingMessageStore {
    return result.lastInsertRowid as number;
  }

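The `id > 0` contract the doc comment describes can be sketched at a call site. A hypothetical handler, not the real `SessionManager` code:

```typescript
// Sketch (assumed call-site shape): enqueue() returns 0 when INSERT OR IGNORE
// suppressed a duplicate row, so the id must be guarded before any further use.
function handleEnqueueResult(messageId: number, log: (s: string) => void): boolean {
  if (messageId === 0) {
    // Duplicate (content_session_id, tool_use_id): nothing was inserted.
    log('DUPLICATE | suppressed by UNIQUE index');
    return false;
  }
  log(`ENQUEUED | messageId=${messageId}`);
  return true;
}
```

Threading a zero id into later SQL would silently match zero rows, which is why the guard comes first.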
  /**
   * Atomically claim the next pending message by marking it as 'processing'.
   * Self-healing: resets any stale 'processing' messages (>60s) back to 'pending' first.
   * Message stays in DB until confirmProcessed() is called.
   * Uses a transaction to prevent race conditions.
  /**
   * Atomically claim the next message for `sessionDbId`.
   *
   * A row is claimable iff:
   * - status = 'pending', OR
   * - status = 'processing' AND worker_pid is not in the live-pids set
   *   (i.e. the previous owner crashed). This is the self-healing branch:
   *   liveness is checked at claim time, not by a background reaper.
   *
   * The claim stamps the live worker's PID and flips status to 'processing'
   * in a single UPDATE … WHERE id = (subquery).
   */
  claimNextMessage(sessionDbId: number): PersistentPendingMessage | null {
    const claimTx = this.db.transaction((sessionId: number) => {
      // Capture time inside transaction so it's fresh if WAL contention causes retry
      const now = Date.now();
      // Self-healing: reset stale 'processing' messages back to 'pending'
      // This recovers from generator crashes without external timers
      // Note: strict < means messages must be OLDER than threshold to be reset
      const staleCutoff = now - STALE_PROCESSING_THRESHOLD_MS;
      const resetStmt = this.db.prepare(`
        UPDATE pending_messages
        SET status = 'pending', started_processing_at_epoch = NULL
        WHERE session_db_id = ? AND status = 'processing'
        AND started_processing_at_epoch < ?
      `);
      const resetResult = resetStmt.run(sessionId, staleCutoff);
      if (resetResult.changes > 0) {
        logger.info('QUEUE', `SELF_HEAL | sessionDbId=${sessionId} | recovered ${resetResult.changes} stale processing message(s)`);
      }
    // Build a parameterized IN-list of live worker PIDs. We always include
    // this worker's PID so that an in-flight claim doesn't accidentally
    // self-reclaim a row we just stamped (the predicate is "NOT IN live").
    const livePids = this.getLivePidsIncludingSelf();
    const placeholders = livePids.map(() => '?').join(',');

      const peekStmt = this.db.prepare(`
        SELECT * FROM pending_messages
        WHERE session_db_id = ? AND status = 'pending'
        ORDER BY id ASC
        LIMIT 1
      `);
      const msg = peekStmt.get(sessionId) as PersistentPendingMessage | null;
    const sql = `
      UPDATE pending_messages
      SET status = 'processing',
          worker_pid = ?
      WHERE id = (
        SELECT id FROM pending_messages
        WHERE session_db_id = ?
        AND (
          status = 'pending'
          OR (status = 'processing' AND (worker_pid IS NULL OR worker_pid NOT IN (${placeholders})))
        )
        ORDER BY id ASC
        LIMIT 1
      )
      RETURNING *
    `;

      if (msg) {
        // CRITICAL FIX: Mark as 'processing' instead of deleting
        // Message will be deleted by confirmProcessed() after successful store
        const updateStmt = this.db.prepare(`
          UPDATE pending_messages
          SET status = 'processing', started_processing_at_epoch = ?
          WHERE id = ?
        `);
        updateStmt.run(now, msg.id);
    const stmt = this.db.prepare(sql);
    const params: (number | string)[] = [this.workerPid, sessionDbId, ...livePids];
    const claimed = stmt.get(...params) as PersistentPendingMessage | null;

        // Log claim with minimal info (avoid logging full payload)
        logger.info('QUEUE', `CLAIMED | sessionDbId=${sessionId} | messageId=${msg.id} | type=${msg.message_type}`, {
          sessionId: sessionId
        });
      }
      return msg;
    });
    if (claimed) {
      logger.info('QUEUE', `CLAIMED | sessionDbId=${sessionDbId} | messageId=${claimed.id} | type=${claimed.message_type} | workerPid=${this.workerPid}`, {
        sessionId: sessionDbId
      });
    }
    return claimed;
  }

    return claimTx(sessionDbId) as PersistentPendingMessage | null;
  private getLivePidsIncludingSelf(): number[] {
    const pids = this.getLiveWorkerPids();
    if (pids.includes(this.workerPid)) return [...pids];
    return [...pids, this.workerPid];
  }

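The claimability rule the UPDATE subquery encodes can be stated as a pure function. A sketch using field names that mirror the `pending_messages` columns:

```typescript
// Pure-TS sketch of the claimable predicate inside the single-UPDATE claim:
// 'pending' rows are always claimable; 'processing' rows only when their
// owner PID is NULL or no longer in the live-pids set (self-healing branch).
interface ClaimRow {
  status: 'pending' | 'processing' | 'processed' | 'failed';
  worker_pid: number | null;
}

function isClaimable(row: ClaimRow, livePids: readonly number[]): boolean {
  if (row.status === 'pending') return true;
  return row.status === 'processing'
    && (row.worker_pid === null || !livePids.includes(row.worker_pid));
}
```

Because the caller's own PID is always included in `livePids`, a row it just stamped is never self-reclaimed.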
  /**
@@ -158,34 +199,19 @@ export class PendingMessageStore {
  }

  /**
   * Reset stale 'processing' messages back to 'pending' for retry.
   * Called on worker startup and periodically to recover from crashes.
   * @param thresholdMs Messages processing longer than this are considered stale (default: 5 minutes)
   * @returns Number of messages reset
   * Delete `status='failed'` rows older than `thresholdMs`. Called once at
   * worker startup so `pending_messages` does not grow unbounded on long-
   * running or high-failure-rate installations; `claimNextMessage`'s
   * self-healing subquery scans this table, so bounded rows keep claim
   * latency predictable. Not a reaper — one-shot, idempotent.
   */
  resetStaleProcessingMessages(thresholdMs: number = 5 * 60 * 1000, sessionDbId?: number): number {
  clearFailedOlderThan(thresholdMs: number): number {
    const cutoff = Date.now() - thresholdMs;
    let stmt;
    let result;
    if (sessionDbId !== undefined) {
      stmt = this.db.prepare(`
        UPDATE pending_messages
        SET status = 'pending', started_processing_at_epoch = NULL
        WHERE status = 'processing' AND started_processing_at_epoch < ? AND session_db_id = ?
      `);
      result = stmt.run(cutoff, sessionDbId);
    } else {
      stmt = this.db.prepare(`
        UPDATE pending_messages
        SET status = 'pending', started_processing_at_epoch = NULL
        WHERE status = 'processing' AND started_processing_at_epoch < ?
      `);
      result = stmt.run(cutoff);
    }
    if (result.changes > 0) {
      logger.info('QUEUE', `RESET_STALE | count=${result.changes} | thresholdMs=${thresholdMs}${sessionDbId !== undefined ? ` | sessionDbId=${sessionDbId}` : ''}`);
    }
    return result.changes;
    const stmt = this.db.prepare(`
      DELETE FROM pending_messages
      WHERE status = 'failed' AND COALESCE(failed_at_epoch, completed_at_epoch, 0) < ?
    `);
    return stmt.run(cutoff).changes;
  }

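The purge's COALESCE chain picks the first non-NULL timestamp, falling back to 0 so a row with no timestamps always sorts older than any cutoff. A pure-TS sketch of that decision:

```typescript
// Sketch of the COALESCE(failed_at_epoch, completed_at_epoch, 0) fallback:
// prefer the failure timestamp, then the completion timestamp, else 0.
function effectiveFailureEpoch(failedAt: number | null, completedAt: number | null): number {
  return failedAt ?? completedAt ?? 0;
}

function shouldPurge(
  row: { failed_at_epoch: number | null; completed_at_epoch: number | null },
  cutoff: number
): boolean {
  return effectiveFailureEpoch(row.failed_at_epoch, row.completed_at_epoch) < cutoff;
}
```

The `?? 0` branch is what guarantees that legacy failed rows with neither timestamp still get cleaned up.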
  /**
@@ -201,144 +227,44 @@ export class PendingMessageStore {
  }

  /**
   * Get all queue messages (for UI display)
   * Returns pending, processing, and failed messages (not processed - they're deleted)
   * Joins with sdk_sessions to get project name
   * Transition pending_messages rows to a terminal status — PATHFINDER-2026-04-22
   * Plan 06 Phase 9. One SQL UPDATE path, one place to add a new terminal status
   * later, zero divergence between call sites.
   *
   * - `failed` — narrow form: only rows currently `status='processing'`.
   *   Used during error recovery when a session generator crashes and we want
   *   to mark its in-flight messages failed without touching rows that never
   *   left `pending`.
   *
   * - `abandoned` — wide form: rows in `('pending', 'processing')`.
   *   Used during session termination or completion drain so the session
   *   doesn't appear in `getSessionsWithPendingMessages` forever. Both forms
   *   write the row's `status` column to `'failed'`; `abandoned` is just the
   *   broader WHERE clause.
   *
   * Cites Principle 6 (one helper, N callers) and Principle 7 (the
   * old per-status wrapper methods were deleted in the same PR).
   *
   * @param status `'failed'` (processing-only) or `'abandoned'` (pending+processing)
   * @param filter `{ sessionDbId: number }` — scope to one session's rows.
   *        Required: no unscoped path exists, to prevent accidental global drain.
   * @returns Number of rows updated
   */
  getQueueMessages(): (PersistentPendingMessage & { project: string | null })[] {
    const stmt = this.db.prepare(`
      SELECT pm.*, ss.project
      FROM pending_messages pm
      LEFT JOIN sdk_sessions ss ON pm.content_session_id = ss.content_session_id
      WHERE pm.status IN ('pending', 'processing', 'failed')
      ORDER BY
        CASE pm.status
          WHEN 'failed' THEN 0
          WHEN 'processing' THEN 1
          WHEN 'pending' THEN 2
        END,
        pm.created_at_epoch ASC
    `);
    return stmt.all() as (PersistentPendingMessage & { project: string | null })[];
  }

  /**
   * Get count of stuck messages (processing longer than threshold)
   */
  getStuckCount(thresholdMs: number): number {
    const cutoff = Date.now() - thresholdMs;
    const stmt = this.db.prepare(`
      SELECT COUNT(*) as count FROM pending_messages
      WHERE status = 'processing' AND started_processing_at_epoch < ?
    `);
    const result = stmt.get(cutoff) as { count: number };
    return result.count;
  }

  /**
   * Retry a specific message (reset to pending)
   * Works for pending (re-queue), processing (reset stuck), and failed messages
   */
  retryMessage(messageId: number): boolean {
    const stmt = this.db.prepare(`
      UPDATE pending_messages
      SET status = 'pending', started_processing_at_epoch = NULL
      WHERE id = ? AND status IN ('pending', 'processing', 'failed')
    `);
    const result = stmt.run(messageId);
    return result.changes > 0;
  }

  /**
   * Reset all processing messages for a session to pending
   * Used when force-restarting a stuck session
   */
  resetProcessingToPending(sessionDbId: number): number {
    const stmt = this.db.prepare(`
      UPDATE pending_messages
      SET status = 'pending', started_processing_at_epoch = NULL
      WHERE session_db_id = ? AND status = 'processing'
    `);
    const result = stmt.run(sessionDbId);
    return result.changes;
  }

  /**
   * Mark all processing messages for a session as failed
   * Used in error recovery when session generator crashes
   * @returns Number of messages marked failed
   */
  markSessionMessagesFailed(sessionDbId: number): number {
  transitionMessagesTo(
    status: 'failed' | 'abandoned',
    filter: { sessionDbId: number }
  ): number {
    const now = Date.now();
    const statusClause = status === 'failed'
      ? `status = 'processing'`
      : `status IN ('pending', 'processing')`;

    // Atomic update - all processing messages for session → failed
    // Note: This bypasses retry logic since generator failures are session-level,
    // not message-level. Individual message failures use markFailed() instead.
    const stmt = this.db.prepare(`
      UPDATE pending_messages
      SET status = 'failed', failed_at_epoch = ?
      WHERE session_db_id = ? AND status = 'processing'
      WHERE session_db_id = ? AND ${statusClause}
    `);

    const result = stmt.run(now, sessionDbId);
    return result.changes;
  }

  /**
   * Mark all pending and processing messages for a session as failed (abandoned).
   * Used when SDK session is terminated and no fallback agent is available:
   * prevents the session from appearing in getSessionsWithPendingMessages forever.
   * @returns Number of messages marked failed
   */
  markAllSessionMessagesAbandoned(sessionDbId: number): number {
    const now = Date.now();
    const stmt = this.db.prepare(`
      UPDATE pending_messages
      SET status = 'failed', failed_at_epoch = ?
      WHERE session_db_id = ? AND status IN ('pending', 'processing')
    `);
    const result = stmt.run(now, sessionDbId);
    return result.changes;
  }

  /**
   * Abort a specific message (delete from queue)
   */
  abortMessage(messageId: number): boolean {
    const stmt = this.db.prepare('DELETE FROM pending_messages WHERE id = ?');
    const result = stmt.run(messageId);
    return result.changes > 0;
  }

  /**
   * Retry all stuck messages at once
   */
  retryAllStuck(thresholdMs: number): number {
    const cutoff = Date.now() - thresholdMs;
    const stmt = this.db.prepare(`
      UPDATE pending_messages
      SET status = 'pending', started_processing_at_epoch = NULL
      WHERE status = 'processing' AND started_processing_at_epoch < ?
    `);
    const result = stmt.run(cutoff);
    return result.changes;
  }

  /**
   * Get recently processed messages (for UI feedback)
   * Shows messages completed in the last N minutes so users can see their stuck items were processed
   */
  getRecentlyProcessed(limit: number = 10, withinMinutes: number = 30): (PersistentPendingMessage & { project: string | null })[] {
    const cutoff = Date.now() - (withinMinutes * 60 * 1000);
    const stmt = this.db.prepare(`
      SELECT pm.*, ss.project
      FROM pending_messages pm
      LEFT JOIN sdk_sessions ss ON pm.content_session_id = ss.content_session_id
      WHERE pm.status = 'processed' AND pm.completed_at_epoch > ?
      ORDER BY pm.completed_at_epoch DESC
      LIMIT ?
    `);
    return stmt.all(cutoff, limit) as (PersistentPendingMessage & { project: string | null })[];
    return stmt.run(now, filter.sessionDbId).changes;
  }

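The single branch point in `transitionMessagesTo` is the WHERE-clause choice; both forms write `status='failed'`. A sketch of that selection in isolation:

```typescript
// Sketch of transitionMessagesTo's narrow-vs-wide WHERE clause: 'failed'
// touches only in-flight rows, 'abandoned' drains everything still queued.
function statusClauseFor(status: 'failed' | 'abandoned'): string {
  return status === 'failed'
    ? `status = 'processing'`                 // narrow: in-flight rows only
    : `status IN ('pending', 'processing')`;  // wide: pending + in-flight
}
```

Keeping the clause selection in one helper is what lets a future terminal status be added in exactly one place.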
  /**
@@ -358,7 +284,7 @@ export class PendingMessageStore {
    // Move back to pending for retry
    const stmt = this.db.prepare(`
      UPDATE pending_messages
      SET status = 'pending', retry_count = retry_count + 1, started_processing_at_epoch = NULL
      SET status = 'pending', retry_count = retry_count + 1, worker_pid = NULL
      WHERE id = ?
    `);
    stmt.run(messageId);
@@ -373,24 +299,6 @@ export class PendingMessageStore {
    }
  }

  /**
   * Reset stuck messages (processing -> pending if stuck longer than threshold)
   * @param thresholdMs Messages processing longer than this are considered stuck (0 = reset all)
   * @returns Number of messages reset
   */
  resetStuckMessages(thresholdMs: number): number {
    const cutoff = thresholdMs === 0 ? Date.now() : Date.now() - thresholdMs;

    const stmt = this.db.prepare(`
      UPDATE pending_messages
      SET status = 'pending', started_processing_at_epoch = NULL
      WHERE status = 'processing' AND started_processing_at_epoch < ?
    `);

    const result = stmt.run(cutoff);
    return result.changes;
  }

  /**
   * Get count of pending messages for a session
   */
@@ -417,27 +325,21 @@ export class PendingMessageStore {
  }

  /**
   * Check if any session has pending work.
   * Excludes 'processing' messages stuck for >5 minutes (resets them to 'pending' as a side effect).
   * Check if any session has work that could be claimed right now.
   *
   * Counts a row as work iff it is 'pending' or it is 'processing' under a
   * worker_pid that is not currently alive (the same predicate the
   * self-healing claim uses). No side effects — no UPDATE, no timer.
   */
  hasAnyPendingWork(): boolean {
    // Reset stuck 'processing' messages older than 5 minutes before checking
    const stuckCutoff = Date.now() - (5 * 60 * 1000);
    const resetStmt = this.db.prepare(`
      UPDATE pending_messages
      SET status = 'pending', started_processing_at_epoch = NULL
      WHERE status = 'processing' AND started_processing_at_epoch < ?
    `);
    const resetResult = resetStmt.run(stuckCutoff);
    if (resetResult.changes > 0) {
      logger.info('QUEUE', `STUCK_RESET | hasAnyPendingWork reset ${resetResult.changes} stuck processing message(s) older than 5 minutes`);
    }

    const livePids = this.getLivePidsIncludingSelf();
    const placeholders = livePids.map(() => '?').join(',');
    const stmt = this.db.prepare(`
      SELECT COUNT(*) as count FROM pending_messages
      WHERE status IN ('pending', 'processing')
      WHERE status = 'pending'
        OR (status = 'processing' AND (worker_pid IS NULL OR worker_pid NOT IN (${placeholders})))
    `);
    const result = stmt.get() as { count: number };
    const result = stmt.get(...livePids) as { count: number };
    return result.count > 0;
  }

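Both `claimNextMessage` and `hasAnyPendingWork` build the same parameterized IN-list: one `?` per live PID, with the PIDs bound positionally rather than interpolated into the SQL string. A sketch of that helper in isolation:

```typescript
// Sketch of the parameterized IN-list construction: the SQL gets only
// placeholders; the PID values are passed separately as bind parameters.
function buildInList(pids: readonly number[]): { placeholders: string; params: number[] } {
  return {
    placeholders: pids.map(() => '?').join(','), // e.g. '?,?,?' for three PIDs
    params: [...pids],
  };
}
```

Since the caller guarantees at least its own PID is in the list, the IN-list is never empty and the SQL stays valid.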
@@ -464,52 +366,6 @@ export class PendingMessageStore {
    return result ? { sessionDbId: result.session_db_id, contentSessionId: result.content_session_id } : null;
  }

  /**
   * Clear all failed messages from the queue
   * @returns Number of messages deleted
   */
  clearFailed(): number {
    const stmt = this.db.prepare(`
      DELETE FROM pending_messages
      WHERE status = 'failed'
    `);
    const result = stmt.run();
    return result.changes;
  }

  /**
   * Clear failed messages older than the given threshold.
   * Preserves recent failures for inspection and manual retry.
   * @param thresholdMs - Only delete failures older than this many milliseconds
   * @returns Number of messages deleted
   */
  clearFailedOlderThan(thresholdMs: number): number {
    const cutoff = Date.now() - thresholdMs;
    // Use COALESCE to prefer the most recent failure timestamp over creation time.
    // failed_at_epoch is set by session-level failures, completed_at_epoch by markFailed().
    const stmt = this.db.prepare(`
      DELETE FROM pending_messages
      WHERE status = 'failed'
        AND COALESCE(failed_at_epoch, completed_at_epoch, started_processing_at_epoch, created_at_epoch) < ?
    `);
    const result = stmt.run(cutoff);
    return result.changes;
  }

  /**
   * Clear all pending, processing, and failed messages from the queue
   * Keeps only processed messages (for history)
   * @returns Number of messages deleted
   */
  clearAll(): number {
    const stmt = this.db.prepare(`
      DELETE FROM pending_messages
      WHERE status IN ('pending', 'processing', 'failed')
    `);
    const result = stmt.run();
    return result.changes;
  }

  /**
   * Convert a PersistentPendingMessage back to PendingMessage format
   */

@@ -25,13 +25,14 @@ export class SessionSearch {

  private static readonly MISSING_SEARCH_INPUT_MESSAGE = 'Either query or filters required for search';

  constructor(dbPath?: string) {
    if (!dbPath) {
  constructor(dbPathOrDb: string | Database = DB_PATH) {
    if (dbPathOrDb instanceof Database) {
      this.db = dbPathOrDb;
    } else {
      ensureDir(DATA_DIR);
      dbPath = DB_PATH;
      this.db = new Database(dbPathOrDb);
      this.db.run('PRAGMA journal_mode = WAL');
    }
    this.db = new Database(dbPath);
    this.db.run('PRAGMA journal_mode = WAL');

    // Cache FTS5 availability once at construction (avoids DDL probe on every query)
    this._fts5Available = this.isFts5Available();

@@ -1,4 +1,4 @@
import { Database } from 'bun:sqlite';
import { Database, type SQLQueryBindings } from 'bun:sqlite';
import { DATA_DIR, DB_PATH, ensureDir, OBSERVER_SESSIONS_PROJECT } from '../../shared/paths.js';
import { logger } from '../../utils/logger.js';
import {
@@ -13,7 +13,8 @@ import {
  LatestPromptResult
} from '../../types/database.js';
import type { PendingMessageStore } from './PendingMessageStore.js';
import { computeObservationContentHash, findDuplicateObservation } from './observations/store.js';
import type { ObservationSearchResult, SessionSummarySearchResult } from './types.js';
import { computeObservationContentHash } from './observations/store.js';
import { parseFileList } from './observations/files.js';
import { DEFAULT_PLATFORM_SOURCE, normalizePlatformSource, sortPlatformSources } from '../../shared/platform-source.js';

@@ -34,17 +35,21 @@ function resolveCreateSessionArgs(
export class SessionStore {
  public db: Database;

  constructor(dbPath: string = DB_PATH) {
    if (dbPath !== ':memory:') {
      ensureDir(DATA_DIR);
    }
    this.db = new Database(dbPath);
  constructor(dbPathOrDb: string | Database = DB_PATH) {
    if (dbPathOrDb instanceof Database) {
      this.db = dbPathOrDb;
    } else {
      if (dbPathOrDb !== ':memory:') {
        ensureDir(DATA_DIR);
      }
      this.db = new Database(dbPathOrDb);

    // Ensure optimized settings
    this.db.run('PRAGMA journal_mode = WAL');
    this.db.run('PRAGMA synchronous = NORMAL');
    this.db.run('PRAGMA foreign_keys = ON');
    this.db.run('PRAGMA journal_size_limit = 4194304'); // 4MB WAL cap (#1956)
      // Ensure optimized settings only for new connections
      this.db.run('PRAGMA journal_mode = WAL');
      this.db.run('PRAGMA synchronous = NORMAL');
      this.db.run('PRAGMA foreign_keys = ON');
      this.db.run('PRAGMA journal_size_limit = 4194304'); // 4MB WAL cap (#1956)
    }

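The `string | Database` constructor pattern consolidates connections: an injected connection is reused untouched, while a path opens and configures a fresh one. A self-contained sketch where `Conn` is a stand-in for bun:sqlite's `Database`:

```typescript
// Sketch only: Conn is a hypothetical stand-in for bun:sqlite's Database,
// recording PRAGMA calls so the ownership rule is observable.
class Conn {
  pragmas: string[] = [];
  constructor(public path: string) {}
  run(sql: string): void { this.pragmas.push(sql); }
}

function resolveDb(dbPathOrDb: string | Conn): Conn {
  if (dbPathOrDb instanceof Conn) {
    return dbPathOrDb; // shared connection: leave its PRAGMAs alone
  }
  const db = new Conn(dbPathOrDb);
  db.run('PRAGMA journal_mode = WAL'); // configure only connections we own
  return db;
}
```

Applying PRAGMAs only on the owning path is what keeps a shared connection's settings from being re-run on every consumer.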
    // Initialize schema if needed (fresh database)
    this.initializeSchema();
@@ -68,6 +73,7 @@ export class SessionStore {
    this.addObservationModelColumns();
    this.ensureMergedIntoProjectColumns();
    this.addObservationSubagentColumns();
    this.addObservationsUniqueContentHashIndex();
  }

  /**
@@ -565,7 +571,6 @@ export class SessionStore {
        status TEXT NOT NULL DEFAULT 'pending' CHECK(status IN ('pending', 'processing', 'processed', 'failed')),
        retry_count INTEGER NOT NULL DEFAULT 0,
        created_at_epoch INTEGER NOT NULL,
        started_processing_at_epoch INTEGER,
        completed_at_epoch INTEGER,
        FOREIGN KEY (session_db_id) REFERENCES sdk_sessions(id) ON DELETE CASCADE
      )
@@ -661,7 +666,7 @@ export class SessionStore {

  /**
   * Add failed_at_epoch column to pending_messages (migration 20)
   * Used by markSessionMessagesFailed() for error recovery tracking
   * Used by transitionMessagesTo() for error recovery tracking
   */
  private addFailedAtEpochColumn(): void {
    const applied = this.db.prepare('SELECT version FROM schema_versions WHERE version = ?').get(20) as SchemaVersion | undefined;
@@ -1033,6 +1038,47 @@ export class SessionStore {
    }
  }

  /**
   * Add UNIQUE(memory_session_id, content_hash) on observations (migration 29).
   * Mirrors MigrationRunner.addObservationsUniqueContentHashIndex so bundled
   * artifacts that embed SessionStore (e.g. worker-service.cjs, context-generator.cjs)
   * stay schema-consistent. Without this, INSERT … ON CONFLICT(memory_session_id,
   * content_hash) DO NOTHING throws "ON CONFLICT clause does not match any
   * PRIMARY KEY or UNIQUE constraint" and every observation insert fails.
   */
  private addObservationsUniqueContentHashIndex(): void {
    const applied = this.db.prepare('SELECT version FROM schema_versions WHERE version = ?').get(29) as SchemaVersion | undefined;
    if (applied) return;

    const obsCols = this.db.query('PRAGMA table_info(observations)').all() as TableColumnInfo[];
    const hasMem = obsCols.some(c => c.name === 'memory_session_id');
    const hasHash = obsCols.some(c => c.name === 'content_hash');
    if (!hasMem || !hasHash) {
      this.db.prepare('INSERT OR IGNORE INTO schema_versions (version, applied_at) VALUES (?, ?)').run(29, new Date().toISOString());
      return;
    }

    this.db.run('BEGIN TRANSACTION');
    try {
      this.db.run(`
        DELETE FROM observations
        WHERE id NOT IN (
          SELECT MIN(id) FROM observations
          GROUP BY memory_session_id, content_hash
        )
      `);
      this.db.run(`
        CREATE UNIQUE INDEX IF NOT EXISTS ux_observations_session_hash
        ON observations(memory_session_id, content_hash)
      `);
      this.db.prepare('INSERT OR IGNORE INTO schema_versions (version, applied_at) VALUES (?, ?)').run(29, new Date().toISOString());
      this.db.run('COMMIT');
    } catch (error) {
      this.db.run('ROLLBACK');
      throw error;
    }
  }

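The migration's dedup step must run before the UNIQUE index can be created: it keeps the lowest id per (memory_session_id, content_hash) group, which is exactly what the `DELETE ... WHERE id NOT IN (SELECT MIN(id) ... GROUP BY ...)` expresses. A pure-TS sketch of that selection:

```typescript
// Pure-TS sketch of the pre-index dedup: keep MIN(id) per
// (memory_session_id, content_hash) group, drop the rest.
interface Obs { id: number; memory_session_id: string; content_hash: string; }

function dedupKeepOldest(rows: Obs[]): Obs[] {
  const keep = new Map<string, number>();
  for (const r of rows) {
    const key = `${r.memory_session_id}\u0000${r.content_hash}`;
    const cur = keep.get(key);
    if (cur === undefined || r.id < cur) keep.set(key, r.id);
  }
  return rows.filter(r => keep.get(`${r.memory_session_id}\u0000${r.content_hash}`) === r.id);
}
```

Running the dedup and the `CREATE UNIQUE INDEX` in one transaction means a crash mid-migration leaves neither half applied.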
  /**
   * Update the memory session ID for a session
   * Called by SDKAgent when it captures the session ID from the first SDK message
@@ -1112,7 +1158,18 @@ export class SessionStore {
      LIMIT ?
    `);

    return stmt.all(project, limit);
    return stmt.all(project, limit) as Array<{
      request: string | null;
      investigated: string | null;
      learned: string | null;
      completed: string | null;
      next_steps: string | null;
      files_read: string | null;
      files_edited: string | null;
      notes: string | null;
      prompt_number: number | null;
      created_at: string;
    }>;
  }

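The recurring `stmt.all(...) as Array<{ ... }>` pattern in these hunks is a compile-time assertion only; the SELECT column list is what actually guarantees the shape. A sketch with an illustrative `Row` type, not a real exported one:

```typescript
// Sketch of the row-typing pattern: the driver returns untyped rows, so each
// query site asserts the exact column shape its SELECT list produces.
type Row = { request: string | null; prompt_number: number | null; created_at: string };

function typedRows(raw: unknown[]): Row[] {
  // `as` performs no runtime validation; mismatched SELECT lists would
  // surface as undefined fields, not as errors.
  return raw as Row[];
}
```

This is why each method spells out its own shape inline instead of sharing one loose record type.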
/**
|
||||
@@ -1137,7 +1194,15 @@ export class SessionStore {
|
||||
LIMIT ?
|
||||
`);
|
||||
|
||||
return stmt.all(project, limit);
|
||||
return stmt.all(project, limit) as Array<{
|
||||
memory_session_id: string;
|
||||
request: string | null;
|
||||
learned: string | null;
|
||||
completed: string | null;
|
||||
next_steps: string | null;
|
||||
prompt_number: number | null;
|
||||
created_at: string;
|
||||
}>;
|
||||
}
|
||||
|
||||
/**
|
||||
@@ -1157,7 +1222,12 @@ export class SessionStore {
|
||||
LIMIT ?
|
||||
`);
|
||||
|
||||
return stmt.all(project, limit);
|
||||
return stmt.all(project, limit) as Array<{
|
||||
type: string;
|
||||
text: string;
|
||||
prompt_number: number | null;
|
||||
created_at: string;
|
||||
}>;
|
||||
}
|
||||
|
||||
/**
|
||||
@@ -1193,7 +1263,18 @@ export class SessionStore {
|
||||
LIMIT ?
|
||||
`);
|
||||
|
||||
return stmt.all(limit);
|
||||
return stmt.all(limit) as Array<{
|
||||
id: number;
|
||||
type: string;
|
||||
title: string | null;
|
||||
subtitle: string | null;
|
||||
text: string;
|
||||
project: string;
|
||||
platform_source: string;
|
||||
prompt_number: number | null;
|
||||
created_at: string;
|
||||
created_at_epoch: number;
|
||||
}>;
|
||||
}
|
||||
|
||||
/**
|
||||
@@ -1237,7 +1318,22 @@ export class SessionStore {
|
||||
LIMIT ?
|
||||
`);
|
||||
|
||||
return stmt.all(limit);
|
||||
return stmt.all(limit) as Array<{
|
||||
id: number;
|
||||
request: string | null;
|
||||
investigated: string | null;
|
||||
learned: string | null;
|
||||
completed: string | null;
|
||||
next_steps: string | null;
|
||||
files_read: string | null;
|
||||
files_edited: string | null;
|
||||
notes: string | null;
|
||||
project: string;
|
||||
platform_source: string;
|
||||
prompt_number: number | null;
|
||||
created_at: string;
|
||||
created_at_epoch: number;
|
||||
}>;
|
||||
}
|
||||
|
||||
/**
|
||||
@@ -1269,7 +1365,16 @@ export class SessionStore {
|
||||
LIMIT ?
|
||||
`);
|
||||
|
||||
return stmt.all(limit);
|
||||
return stmt.all(limit) as Array<{
|
||||
id: number;
|
||||
content_session_id: string;
|
||||
project: string;
|
||||
platform_source: string;
|
||||
prompt_number: number;
|
||||
prompt_text: string;
|
||||
created_at: string;
|
||||
created_at_epoch: number;
|
||||
}>;
|
||||
}
|
||||
|
||||
/**
|
||||
@@ -1283,7 +1388,7 @@ export class SessionStore {
|
||||
WHERE project IS NOT NULL AND project != ''
|
||||
AND project != ?
|
||||
`;
|
||||
const params: unknown[] = [OBSERVER_SESSIONS_PROJECT];
|
||||
const params: SQLQueryBindings[] = [OBSERVER_SESSIONS_PROJECT];
|
||||
|
||||
if (normalizedPlatformSource) {
|
||||
query += ' AND COALESCE(platform_source, ?) = ?';
|
||||
@@ -1404,7 +1509,13 @@ export class SessionStore {
|
||||
ORDER BY started_at_epoch ASC
|
||||
`);
|
||||
|
||||
return stmt.all(project, limit);
|
||||
return stmt.all(project, limit) as Array<{
|
||||
memory_session_id: string | null;
|
||||
status: string;
|
||||
started_at: string;
|
||||
user_prompt: string | null;
|
||||
has_summary: boolean;
|
||||
}>;
|
||||
}
|
||||
|
||||
/**
|
||||
@@ -1423,7 +1534,12 @@ export class SessionStore {
ORDER BY created_at_epoch ASC
`);

return stmt.all(memorySessionId);
return stmt.all(memorySessionId) as Array<{
title: string;
subtitle: string;
type: string;
prompt_number: number | null;
}>;
}

/**
@@ -1445,7 +1561,7 @@ export class SessionStore {
getObservationsByIds(
ids: number[],
options: { orderBy?: 'date_desc' | 'date_asc'; limit?: number; project?: string; type?: string | string[]; concepts?: string | string[]; files?: string | string[] } = {}
): ObservationRecord[] {
): ObservationSearchResult[] {
if (ids.length === 0) return [];

const { orderBy = 'date_desc', limit, project, type, concepts, files } = options;
@@ -1509,7 +1625,7 @@ export class SessionStore {
${limitClause}
`);

return stmt.all(...params) as ObservationRecord[];
return stmt.all(...params) as ObservationSearchResult[];
}

/**
@@ -1539,7 +1655,19 @@ export class SessionStore {
LIMIT 1
`);

return stmt.get(memorySessionId) || null;
return (stmt.get(memorySessionId) as {
request: string | null;
investigated: string | null;
learned: string | null;
completed: string | null;
next_steps: string | null;
files_read: string | null;
files_edited: string | null;
notes: string | null;
prompt_number: number | null;
created_at: string;
created_at_epoch: number;
} | null) || null;
}

/**
@@ -1599,7 +1727,16 @@ export class SessionStore {
LIMIT 1
`);

return stmt.get(id) || null;
return (stmt.get(id) as {
id: number;
content_session_id: string;
memory_session_id: string | null;
project: string;
platform_source: string;
user_prompt: string;
custom_title: string | null;
status: string;
} | null) || null;
}

/**
@@ -1805,12 +1942,9 @@ export class SessionStore {
const timestampEpoch = overrideTimestampEpoch ?? Date.now();
const timestampIso = new Date(timestampEpoch).toISOString();

// Content-hash deduplication
// DB-enforced dedup: UNIQUE(memory_session_id, content_hash) +
// ON CONFLICT DO NOTHING (Plan 01 Phase 4).
const contentHash = computeObservationContentHash(memorySessionId, observation.title, observation.narrative);
const existing = findDuplicateObservation(this.db, contentHash, timestampEpoch);
if (existing) {
return { id: existing.id, createdAtEpoch: existing.created_at_epoch };
}

const stmt = this.db.prepare(`
INSERT INTO observations
@@ -1818,9 +1952,11 @@ export class SessionStore {
files_read, files_modified, prompt_number, discovery_tokens, agent_type, agent_id, content_hash, created_at, created_at_epoch,
generated_by_model)
VALUES (?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?)
ON CONFLICT(memory_session_id, content_hash) DO NOTHING
RETURNING id, created_at_epoch
`);

const result = stmt.run(
const inserted = stmt.get(
memorySessionId,
project,
observation.type,
@@ -1839,12 +1975,22 @@ export class SessionStore {
timestampIso,
timestampEpoch,
generatedByModel || null
);
) as { id: number; created_at_epoch: number } | null;

return {
id: Number(result.lastInsertRowid),
createdAtEpoch: timestampEpoch
};
if (inserted) {
return { id: inserted.id, createdAtEpoch: inserted.created_at_epoch };
}

const existing = this.db.prepare(
'SELECT id, created_at_epoch FROM observations WHERE memory_session_id = ? AND content_hash = ?'
).get(memorySessionId, contentHash) as { id: number; created_at_epoch: number } | null;

if (!existing) {
throw new Error(
`storeObservation: ON CONFLICT without existing row for content_hash=${contentHash}`
);
}
return { id: existing.id, createdAtEpoch: existing.created_at_epoch };
}

/**
@@ -1950,25 +2096,25 @@ export class SessionStore {
const storeTx = this.db.transaction(() => {
const observationIds: number[] = [];

// 1. Store all observations (with content-hash deduplication)
// 1. Store all observations.
// DB-enforced dedup via UNIQUE(memory_session_id, content_hash) +
// ON CONFLICT DO NOTHING (Plan 01 Phase 4).
const obsStmt = this.db.prepare(`
INSERT INTO observations
(memory_session_id, project, type, title, subtitle, facts, narrative, concepts,
files_read, files_modified, prompt_number, discovery_tokens, agent_type, agent_id, content_hash, created_at, created_at_epoch,
generated_by_model)
VALUES (?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?)
ON CONFLICT(memory_session_id, content_hash) DO NOTHING
RETURNING id
`);
const lookupExistingStmt = this.db.prepare(
'SELECT id FROM observations WHERE memory_session_id = ? AND content_hash = ?'
);

for (const observation of observations) {
// Content-hash deduplication (same logic as storeObservation singular)
const contentHash = computeObservationContentHash(memorySessionId, observation.title, observation.narrative);
const existing = findDuplicateObservation(this.db, contentHash, timestampEpoch);
if (existing) {
observationIds.push(existing.id);
continue;
}

const result = obsStmt.run(
const inserted = obsStmt.get(
memorySessionId,
project,
observation.type,
@@ -1987,8 +2133,20 @@ export class SessionStore {
timestampIso,
timestampEpoch,
generatedByModel || null
);
observationIds.push(Number(result.lastInsertRowid));
) as { id: number } | null;

if (inserted) {
observationIds.push(inserted.id);
continue;
}

const existing = lookupExistingStmt.get(memorySessionId, contentHash) as { id: number } | null;
if (!existing) {
throw new Error(
`storeObservations: ON CONFLICT without existing row for content_hash=${contentHash}`
);
}
observationIds.push(existing.id);
}

// 2. Store summary if provided
@@ -2086,25 +2244,25 @@ export class SessionStore {
const storeAndMarkTx = this.db.transaction(() => {
const observationIds: number[] = [];

// 1. Store all observations (with content-hash deduplication)
// 1. Store all observations.
// DB-enforced dedup via UNIQUE(memory_session_id, content_hash) +
// ON CONFLICT DO NOTHING (Plan 01 Phase 4).
const obsStmt = this.db.prepare(`
INSERT INTO observations
(memory_session_id, project, type, title, subtitle, facts, narrative, concepts,
files_read, files_modified, prompt_number, discovery_tokens, agent_type, agent_id, content_hash, created_at, created_at_epoch,
generated_by_model)
VALUES (?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?)
ON CONFLICT(memory_session_id, content_hash) DO NOTHING
RETURNING id
`);
const lookupExistingStmt = this.db.prepare(
'SELECT id FROM observations WHERE memory_session_id = ? AND content_hash = ?'
);

for (const observation of observations) {
// Content-hash deduplication (same logic as storeObservation singular)
const contentHash = computeObservationContentHash(memorySessionId, observation.title, observation.narrative);
const existing = findDuplicateObservation(this.db, contentHash, timestampEpoch);
if (existing) {
observationIds.push(existing.id);
continue;
}

const result = obsStmt.run(
const inserted = obsStmt.get(
memorySessionId,
project,
observation.type,
@@ -2123,8 +2281,20 @@ export class SessionStore {
timestampIso,
timestampEpoch,
generatedByModel || null
);
observationIds.push(Number(result.lastInsertRowid));
) as { id: number } | null;

if (inserted) {
observationIds.push(inserted.id);
continue;
}

const existing = lookupExistingStmt.get(memorySessionId, contentHash) as { id: number } | null;
if (!existing) {
throw new Error(
`storeObservationsAndMarkComplete: ON CONFLICT without existing row for content_hash=${contentHash}`
);
}
observationIds.push(existing.id);
}

// 2. Store summary if provided
@@ -2177,11 +2347,6 @@ export class SessionStore {



// REMOVED: cleanupOrphanedSessions - violates "EVERYTHING SHOULD SAVE ALWAYS"
// There's no such thing as an "orphaned" session. Sessions are created by hooks
// and managed by Claude Code's lifecycle. Worker restarts don't invalidate them.
// Marking all active sessions as 'failed' on startup destroys the user's current work.

/**
* Get session summaries by IDs (for hybrid Chroma search)
* Returns summaries in specified temporal order
@@ -2189,7 +2354,7 @@ export class SessionStore {
getSessionSummariesByIds(
ids: number[],
options: { orderBy?: 'date_desc' | 'date_asc'; limit?: number; project?: string } = {}
): SessionSummaryRecord[] {
): SessionSummarySearchResult[] {
if (ids.length === 0) return [];

const { orderBy = 'date_desc', limit, project } = options;
@@ -2211,7 +2376,7 @@ export class SessionStore {
${limitClause}
`);

return stmt.all(...params) as SessionSummaryRecord[];
return stmt.all(...params) as SessionSummarySearchResult[];
}

/**
@@ -2443,7 +2608,15 @@ export class SessionStore {
LIMIT 1
`);

return stmt.get(id) || null;
return (stmt.get(id) as {
id: number;
content_session_id: string;
prompt_number: number;
prompt_text: string;
project: string;
created_at: string;
created_at_epoch: number;
} | null) || null;
}

/**
@@ -2519,7 +2692,18 @@ export class SessionStore {
LIMIT 1
`);

return stmt.get(id) || null;
return (stmt.get(id) as {
id: number;
memory_session_id: string | null;
content_session_id: string;
project: string;
user_prompt: string;
request_summary: string | null;
learned_summary: string | null;
status: string;
} | null) || null;
}

/**

@@ -30,7 +30,6 @@ export class MigrationRunner {
this.ensureDiscoveryTokensColumn();
this.createPendingMessagesTable();
this.renameSessionIdColumns();
this.repairSessionIdColumnRename();
this.addFailedAtEpochColumn();
this.addOnUpdateCascadeToForeignKeys();
this.addObservationContentHashColumn();
@@ -39,6 +38,8 @@
this.addSessionPlatformSourceColumn();
this.ensureMergedIntoProjectColumns();
this.addObservationSubagentColumns();
this.rebuildPendingMessagesForSelfHealingClaim();
this.addObservationsUniqueContentHashIndex();
}

/**
@@ -533,7 +534,6 @@
status TEXT NOT NULL DEFAULT 'pending' CHECK(status IN ('pending', 'processing', 'processed', 'failed')),
retry_count INTEGER NOT NULL DEFAULT 0,
created_at_epoch INTEGER NOT NULL,
started_processing_at_epoch INTEGER,
completed_at_epoch INTEGER,
FOREIGN KEY (session_db_id) REFERENCES sdk_sessions(id) ON DELETE CASCADE
)
@@ -613,23 +613,9 @@
}
}

/**
* Repair session ID column renames (migration 19)
* DEPRECATED: Migration 17 is now fully idempotent and handles all cases.
* This migration is kept for backwards compatibility but does nothing.
*/
private repairSessionIdColumnRename(): void {
const applied = this.db.prepare('SELECT version FROM schema_versions WHERE version = ?').get(19) as SchemaVersion | undefined;
if (applied) return;

// Migration 17 now handles all column rename cases idempotently.
// Just record this migration as applied.
this.db.prepare('INSERT OR IGNORE INTO schema_versions (version, applied_at) VALUES (?, ?)').run(19, new Date().toISOString());
}

/**
* Add failed_at_epoch column to pending_messages (migration 20)
* Used by markSessionMessagesFailed() for error recovery tracking
* Used by transitionMessagesTo() for error recovery tracking
*/
private addFailedAtEpochColumn(): void {
const applied = this.db.prepare('SELECT version FROM schema_versions WHERE version = ?').get(20) as SchemaVersion | undefined;
@@ -1015,4 +1001,207 @@
this.db.prepare('INSERT OR IGNORE INTO schema_versions (version, applied_at) VALUES (?, ?)').run(27, new Date().toISOString());
}
}

/**
* Rebuild pending_messages for self-healing claim (migration 28).
*
* PATHFINDER-2026-04-22 Plan 01 Phase 2.
*
* - Drops the legacy stale-reset epoch column (was the input to the
* 60-s stale-reset; replaced by worker-PID liveness at claim time).
* - Adds `worker_pid INTEGER` (set by claimNextMessage to the live
* worker's PID; rows whose worker_pid is no longer alive are
* immediately reclaimable).
* - Adds `tool_use_id TEXT` so ingestion-time pairing of tool_use →
* tool_result can be DB-backed instead of an in-memory Map
* (Plan 03 dependency).
* - Dedupes any existing rows that share (content_session_id,
* tool_use_id), then creates a partial UNIQUE index.
*
* Follows the table-rebuild precedent at runner.ts:691 (migration 21):
* disable FKs, BEGIN, recreate, INSERT-SELECT, RENAME, COMMIT, re-enable.
*/
private rebuildPendingMessagesForSelfHealingClaim(): void {
const applied = this.db.prepare('SELECT version FROM schema_versions WHERE version = ?').get(28) as SchemaVersion | undefined;
if (applied) return;

const pendingExists = (this.db.query("SELECT name FROM sqlite_master WHERE type='table' AND name='pending_messages'").all() as TableNameRow[]).length > 0;
if (!pendingExists) {
// pending_messages table never created on this DB — nothing to rebuild.
this.db.prepare('INSERT OR IGNORE INTO schema_versions (version, applied_at) VALUES (?, ?)').run(28, new Date().toISOString());
return;
}

logger.debug('DB', 'Rebuilding pending_messages for self-healing claim (migration 28)');

// PRAGMA foreign_keys must be set outside a transaction.
this.db.run('PRAGMA foreign_keys = OFF');
this.db.run('BEGIN TRANSACTION');

try {
// Source columns may include legacy fields. We build the SELECT explicitly
// using only columns we know are present in the source after migration 27.
const sourceCols = this.db.query('PRAGMA table_info(pending_messages)').all() as TableColumnInfo[];
const colNames = new Set(sourceCols.map(c => c.name));
const has = (name: string) => colNames.has(name);

// Clean up leftover temp from a previously-crashed run.
this.db.run('DROP TABLE IF EXISTS pending_messages_new');

this.db.run(`
CREATE TABLE pending_messages_new (
id INTEGER PRIMARY KEY AUTOINCREMENT,
session_db_id INTEGER NOT NULL,
content_session_id TEXT NOT NULL,
tool_use_id TEXT,
message_type TEXT NOT NULL CHECK(message_type IN ('observation', 'summarize')),
tool_name TEXT,
tool_input TEXT,
tool_response TEXT,
cwd TEXT,
last_user_message TEXT,
last_assistant_message TEXT,
prompt_number INTEGER,
status TEXT NOT NULL DEFAULT 'pending'
CHECK(status IN ('pending', 'processing', 'processed', 'failed')),
retry_count INTEGER NOT NULL DEFAULT 0,
created_at_epoch INTEGER NOT NULL,
failed_at_epoch INTEGER,
completed_at_epoch INTEGER,
worker_pid INTEGER,
agent_type TEXT,
agent_id TEXT,
FOREIGN KEY (session_db_id) REFERENCES sdk_sessions(id) ON DELETE CASCADE
)
`);

// INSERT-SELECT — note that the legacy stale-reset epoch column is
// intentionally omitted. Any 'processing' row is left with worker_pid =
// NULL so that a self-healing claim picks it up immediately on next
// worker boot.
this.db.run(`
INSERT INTO pending_messages_new (
id, session_db_id, content_session_id, tool_use_id, message_type,
tool_name, tool_input, tool_response, cwd, last_user_message,
last_assistant_message, prompt_number, status, retry_count,
created_at_epoch, failed_at_epoch, completed_at_epoch, worker_pid,
agent_type, agent_id
)
SELECT
id,
session_db_id,
content_session_id,
${has('tool_use_id') ? 'tool_use_id' : 'NULL'},
message_type,
tool_name,
tool_input,
tool_response,
cwd,
${has('last_user_message') ? 'last_user_message' : 'NULL'},
${has('last_assistant_message') ? 'last_assistant_message' : 'NULL'},
${has('prompt_number') ? 'prompt_number' : 'NULL'},
status,
retry_count,
created_at_epoch,
${has('failed_at_epoch') ? 'failed_at_epoch' : 'NULL'},
${has('completed_at_epoch') ? 'completed_at_epoch' : 'NULL'},
NULL,
${has('agent_type') ? 'agent_type' : 'NULL'},
${has('agent_id') ? 'agent_id' : 'NULL'}
FROM pending_messages
`);

this.db.run('DROP TABLE pending_messages');
this.db.run('ALTER TABLE pending_messages_new RENAME TO pending_messages');

this.db.run('CREATE INDEX IF NOT EXISTS idx_pending_messages_session ON pending_messages(session_db_id)');
this.db.run('CREATE INDEX IF NOT EXISTS idx_pending_messages_status ON pending_messages(status)');
this.db.run('CREATE INDEX IF NOT EXISTS idx_pending_messages_claude_session ON pending_messages(content_session_id)');
this.db.run('CREATE INDEX IF NOT EXISTS idx_pending_messages_worker_pid ON pending_messages(worker_pid)');

// Dedup any pre-existing duplicate (content_session_id, tool_use_id) pairs
// before adding the UNIQUE index. Keep the lowest id (oldest) per pair.
this.db.run(`
DELETE FROM pending_messages
WHERE tool_use_id IS NOT NULL
AND id NOT IN (
SELECT MIN(id) FROM pending_messages
WHERE tool_use_id IS NOT NULL
GROUP BY content_session_id, tool_use_id
)
`);

this.db.run(`
CREATE UNIQUE INDEX IF NOT EXISTS ux_pending_session_tool
ON pending_messages(content_session_id, tool_use_id)
WHERE tool_use_id IS NOT NULL
`);

this.db.prepare('INSERT OR IGNORE INTO schema_versions (version, applied_at) VALUES (?, ?)').run(28, new Date().toISOString());
this.db.run('COMMIT');
this.db.run('PRAGMA foreign_keys = ON');

logger.debug('DB', 'Rebuilt pending_messages for self-healing claim');
} catch (error) {
this.db.run('ROLLBACK');
this.db.run('PRAGMA foreign_keys = ON');
if (error instanceof Error) {
throw error;
}
throw new Error(`Migration 28 failed: ${String(error)}`);
}
}

/**
* Add UNIQUE(memory_session_id, content_hash) on observations (migration 29).
*
* PATHFINDER-2026-04-22 Plan 01 Phase 2 + Phase 4.
*
* - Dedupes existing rows that share (memory_session_id, content_hash),
* keeping the lowest id (oldest) per pair.
* - Creates a UNIQUE index that lets writers use
* INSERT … ON CONFLICT(memory_session_id, content_hash) DO NOTHING
* in place of the legacy dedup window scan.
*/
private addObservationsUniqueContentHashIndex(): void {
const applied = this.db.prepare('SELECT version FROM schema_versions WHERE version = ?').get(29) as SchemaVersion | undefined;
if (applied) return;

// Need both columns to exist.
const obsCols = this.db.query('PRAGMA table_info(observations)').all() as TableColumnInfo[];
const hasMem = obsCols.some(c => c.name === 'memory_session_id');
const hasHash = obsCols.some(c => c.name === 'content_hash');
if (!hasMem || !hasHash) {
// Nothing to do; record so we don't keep retrying.
this.db.prepare('INSERT OR IGNORE INTO schema_versions (version, applied_at) VALUES (?, ?)').run(29, new Date().toISOString());
return;
}

this.db.run('BEGIN TRANSACTION');
try {
// Dedup before adding the UNIQUE index — keep the lowest id per pair.
this.db.run(`
DELETE FROM observations
WHERE id NOT IN (
SELECT MIN(id) FROM observations
GROUP BY memory_session_id, content_hash
)
`);

this.db.run(`
CREATE UNIQUE INDEX IF NOT EXISTS ux_observations_session_hash
ON observations(memory_session_id, content_hash)
`);

this.db.prepare('INSERT OR IGNORE INTO schema_versions (version, applied_at) VALUES (?, ?)').run(29, new Date().toISOString());
this.db.run('COMMIT');
logger.debug('DB', 'Added UNIQUE(memory_session_id, content_hash) on observations');
} catch (error) {
this.db.run('ROLLBACK');
if (error instanceof Error) {
throw error;
}
throw new Error(`Migration 29 failed: ${String(error)}`);
}
}
}

@@ -9,9 +9,6 @@ import { logger } from '../../../utils/logger.js';
import { getProjectContext } from '../../../utils/project-name.js';
import type { ObservationInput, StoreObservationResult } from './types.js';

/** Deduplication window: observations with the same content hash within this window are skipped */
const DEDUP_WINDOW_MS = 30_000;

/**
* Compute a short content hash for deduplication.
* Uses (memory_session_id, title, narrative) as the semantic identity of an observation.
@@ -30,25 +27,13 @@ export function computeObservationContentHash(
}

/**
* Check if a duplicate observation exists within the dedup window.
* Returns the existing observation's id and timestamp if found, null otherwise.
*/
export function findDuplicateObservation(
db: Database,
contentHash: string,
timestampEpoch: number
): { id: number; created_at_epoch: number } | null {
const windowStart = timestampEpoch - DEDUP_WINDOW_MS;
const stmt = db.prepare(
'SELECT id, created_at_epoch FROM observations WHERE content_hash = ? AND created_at_epoch > ?'
);
return (stmt.get(contentHash, windowStart) as { id: number; created_at_epoch: number } | null);
}

/**
* Store an observation (from SDK parsing)
* Assumes session already exists (created by hook)
* Performs content-hash deduplication: skips INSERT if an identical observation exists within 30s
* Store an observation (from SDK parsing).
*
* Assumes session already exists (created by hook). Deduplication is enforced
* by the database via UNIQUE(memory_session_id, content_hash) (Plan 01 Phase 4):
* INSERT … ON CONFLICT DO NOTHING absorbs duplicates silently. The returned id
* is the existing row's id when a conflict occurred, otherwise the freshly
* inserted row.
*/
export function storeObservation(
db: Database,
@@ -66,22 +51,18 @@ export function storeObservation(
// Guard against empty project string (race condition where project isn't set yet)
const resolvedProject = project || getProjectContext(process.cwd()).primary;

// Content-hash deduplication
const contentHash = computeObservationContentHash(memorySessionId, observation.title, observation.narrative);
const existing = findDuplicateObservation(db, contentHash, timestampEpoch);
if (existing) {
logger.debug('DEDUP', `Skipped duplicate observation | contentHash=${contentHash} | existingId=${existing.id}`);
return { id: existing.id, createdAtEpoch: existing.created_at_epoch };
}

const stmt = db.prepare(`
INSERT INTO observations
(memory_session_id, project, type, title, subtitle, facts, narrative, concepts,
files_read, files_modified, prompt_number, discovery_tokens, agent_type, agent_id, content_hash, created_at, created_at_epoch)
VALUES (?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?)
ON CONFLICT(memory_session_id, content_hash) DO NOTHING
RETURNING id, created_at_epoch
`);

const result = stmt.run(
const inserted = stmt.get(
memorySessionId,
resolvedProject,
observation.type,
@@ -99,10 +80,24 @@
contentHash,
timestampIso,
timestampEpoch
);
) as { id: number; created_at_epoch: number } | null;

return {
id: Number(result.lastInsertRowid),
createdAtEpoch: timestampEpoch
};
if (inserted) {
return { id: inserted.id, createdAtEpoch: inserted.created_at_epoch };
}

// Conflict — fetch the existing row's id for the (memory_session_id, content_hash) pair.
const existing = db.prepare(
'SELECT id, created_at_epoch FROM observations WHERE memory_session_id = ? AND content_hash = ?'
).get(memorySessionId, contentHash) as { id: number; created_at_epoch: number } | null;

if (!existing) {
// Unreachable in practice (UNIQUE conflict implies existing row), but be explicit.
throw new Error(
`storeObservation: ON CONFLICT fired but no row exists for (memory_session_id=${memorySessionId}, content_hash=${contentHash})`
);
}

logger.debug('DEDUP', `Skipped duplicate observation | contentHash=${contentHash} | existingId=${existing.id}`);
return { id: existing.id, createdAtEpoch: existing.created_at_epoch };
}

@@ -0,0 +1,188 @@
-- claude-mem SQLite schema
--
-- Authoritative shape of the database after all migrations through
-- runner.ts have been applied (current tip = migration 29). Fresh
-- databases boot directly into this shape; existing databases reach
-- it via the migration runner.
--
-- Source of truth: src/services/sqlite/migrations/runner.ts
-- Regenerated by: PATHFINDER-2026-04-22 Plan 01 (Data Integrity).
--
-- Invariants enforced here (Plan 01):
-- * pending_messages.UNIQUE(content_session_id, tool_use_id) — replaces
-- in-memory pendingTools Map for ingestion pairing (Plan 03 also depends).
-- * pending_messages.worker_pid INTEGER — populated by self-healing
-- claim query; replaces the legacy stale-reset epoch column.
-- * observations.UNIQUE(memory_session_id, content_hash) — replaces the
-- legacy dedup window; ON CONFLICT DO NOTHING absorbs duplicates.

CREATE TABLE IF NOT EXISTS schema_versions (
id INTEGER PRIMARY KEY,
version INTEGER UNIQUE NOT NULL,
applied_at TEXT NOT NULL
);

-- ─────────────────────────────────────────────────────────────────────
-- sdk_sessions: one row per Claude/Codex session observed by claude-mem.
-- ─────────────────────────────────────────────────────────────────────
CREATE TABLE IF NOT EXISTS sdk_sessions (
id INTEGER PRIMARY KEY AUTOINCREMENT,
content_session_id TEXT UNIQUE NOT NULL,
memory_session_id TEXT UNIQUE,
project TEXT NOT NULL,
platform_source TEXT NOT NULL DEFAULT 'claude',
user_prompt TEXT,
started_at TEXT NOT NULL,
started_at_epoch INTEGER NOT NULL,
completed_at TEXT,
completed_at_epoch INTEGER,
status TEXT NOT NULL DEFAULT 'active'
CHECK(status IN ('active', 'completed', 'failed')),
worker_port INTEGER,
prompt_counter INTEGER DEFAULT 0,
custom_title TEXT
);
CREATE INDEX IF NOT EXISTS idx_sdk_sessions_claude_id ON sdk_sessions(content_session_id);
CREATE INDEX IF NOT EXISTS idx_sdk_sessions_sdk_id ON sdk_sessions(memory_session_id);
CREATE INDEX IF NOT EXISTS idx_sdk_sessions_project ON sdk_sessions(project);
CREATE INDEX IF NOT EXISTS idx_sdk_sessions_status ON sdk_sessions(status);
CREATE INDEX IF NOT EXISTS idx_sdk_sessions_started ON sdk_sessions(started_at_epoch DESC);
CREATE INDEX IF NOT EXISTS idx_sdk_sessions_platform_source ON sdk_sessions(platform_source);

-- ─────────────────────────────────────────────────────────────────────
|
||||
-- observations: structured memory rows extracted from SDK output.
|
||||
-- UNIQUE(memory_session_id, content_hash) replaces the legacy dedup window;
|
||||
-- writes use INSERT … ON CONFLICT DO NOTHING.
|
||||
-- ─────────────────────────────────────────────────────────────────────
|
||||
CREATE TABLE IF NOT EXISTS observations (
|
||||
id INTEGER PRIMARY KEY AUTOINCREMENT,
|
||||
memory_session_id TEXT NOT NULL,
|
||||
project TEXT NOT NULL,
|
||||
text TEXT,
|
||||
type TEXT NOT NULL,
|
||||
title TEXT,
|
||||
subtitle TEXT,
|
||||
facts TEXT,
|
||||
narrative TEXT,
|
||||
concepts TEXT,
|
||||
files_read TEXT,
|
||||
files_modified TEXT,
|
||||
prompt_number INTEGER,
|
||||
discovery_tokens INTEGER DEFAULT 0,
|
||||
content_hash TEXT,
|
||||
agent_type TEXT,
|
||||
agent_id TEXT,
|
||||
merged_into_project TEXT,
|
||||
generated_by_model TEXT,
|
||||
created_at TEXT NOT NULL,
|
||||
created_at_epoch INTEGER NOT NULL,
|
||||
FOREIGN KEY(memory_session_id) REFERENCES sdk_sessions(memory_session_id)
|
||||
ON DELETE CASCADE ON UPDATE CASCADE,
|
||||
UNIQUE(memory_session_id, content_hash)
|
||||
);
|
||||
CREATE INDEX IF NOT EXISTS idx_observations_sdk_session ON observations(memory_session_id);
|
||||
CREATE INDEX IF NOT EXISTS idx_observations_project ON observations(project);
|
||||
CREATE INDEX IF NOT EXISTS idx_observations_type ON observations(type);
|
||||
CREATE INDEX IF NOT EXISTS idx_observations_created ON observations(created_at_epoch DESC);
|
||||
CREATE INDEX IF NOT EXISTS idx_observations_content_hash ON observations(content_hash, created_at_epoch);
|
||||
CREATE INDEX IF NOT EXISTS idx_observations_agent_type ON observations(agent_type);
|
||||
CREATE INDEX IF NOT EXISTS idx_observations_agent_id ON observations(agent_id);
|
||||
CREATE INDEX IF NOT EXISTS idx_observations_merged_into ON observations(merged_into_project);

-- ─────────────────────────────────────────────────────────────────────
-- session_summaries: one summary row per memory session.
-- ─────────────────────────────────────────────────────────────────────
CREATE TABLE IF NOT EXISTS session_summaries (
  id INTEGER PRIMARY KEY AUTOINCREMENT,
  memory_session_id TEXT NOT NULL,
  project TEXT NOT NULL,
  request TEXT,
  investigated TEXT,
  learned TEXT,
  completed TEXT,
  next_steps TEXT,
  files_read TEXT,
  files_edited TEXT,
  notes TEXT,
  prompt_number INTEGER,
  discovery_tokens INTEGER DEFAULT 0,
  merged_into_project TEXT,
  created_at TEXT NOT NULL,
  created_at_epoch INTEGER NOT NULL,
  FOREIGN KEY(memory_session_id) REFERENCES sdk_sessions(memory_session_id)
    ON DELETE CASCADE ON UPDATE CASCADE
);
CREATE INDEX IF NOT EXISTS idx_session_summaries_sdk_session ON session_summaries(memory_session_id);
CREATE INDEX IF NOT EXISTS idx_session_summaries_project ON session_summaries(project);
CREATE INDEX IF NOT EXISTS idx_session_summaries_created ON session_summaries(created_at_epoch DESC);
CREATE INDEX IF NOT EXISTS idx_summaries_merged_into ON session_summaries(merged_into_project);

-- ─────────────────────────────────────────────────────────────────────
-- pending_messages: persistent work queue for SDK messages.
-- worker_pid + UNIQUE(content_session_id, tool_use_id) make the claim
-- query self-healing without any legacy stale-reset epoch column.
-- ─────────────────────────────────────────────────────────────────────
CREATE TABLE IF NOT EXISTS pending_messages (
  id INTEGER PRIMARY KEY AUTOINCREMENT,
  session_db_id INTEGER NOT NULL,
  content_session_id TEXT NOT NULL,
  tool_use_id TEXT,
  message_type TEXT NOT NULL
    CHECK(message_type IN ('observation', 'summarize')),
  tool_name TEXT,
  tool_input TEXT,
  tool_response TEXT,
  cwd TEXT,
  last_user_message TEXT,
  last_assistant_message TEXT,
  prompt_number INTEGER,
  status TEXT NOT NULL DEFAULT 'pending'
    CHECK(status IN ('pending', 'processing', 'processed', 'failed')),
  retry_count INTEGER NOT NULL DEFAULT 0,
  created_at_epoch INTEGER NOT NULL,
  failed_at_epoch INTEGER,
  completed_at_epoch INTEGER,
  worker_pid INTEGER,
  agent_type TEXT,
  agent_id TEXT,
  FOREIGN KEY (session_db_id) REFERENCES sdk_sessions(id) ON DELETE CASCADE
);
CREATE INDEX IF NOT EXISTS idx_pending_messages_session ON pending_messages(session_db_id);
CREATE INDEX IF NOT EXISTS idx_pending_messages_status ON pending_messages(status);
CREATE INDEX IF NOT EXISTS idx_pending_messages_claude_session ON pending_messages(content_session_id);
CREATE INDEX IF NOT EXISTS idx_pending_messages_worker_pid ON pending_messages(worker_pid);
CREATE UNIQUE INDEX IF NOT EXISTS ux_pending_session_tool
  ON pending_messages(content_session_id, tool_use_id)
  WHERE tool_use_id IS NOT NULL;
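The self-healing claim the comment above describes (worker_pid checked against live workers instead of a stale-reset timer) could take roughly the following shape. This is a sketch, not the actual claimNextMessage SQL; `live_worker_pids` stands in for however the live-PID set is actually bound into the query:

```sql
-- Hypothetical shape of the self-healing claim. A row is claimable when it
-- is 'pending', or when it is 'processing' but its worker_pid is no longer
-- a live worker (the claimant crashed), so no stale-reset timer is needed.
UPDATE pending_messages
SET status = 'processing', worker_pid = :my_pid
WHERE id = (
  SELECT id FROM pending_messages
  WHERE status = 'pending'
     OR (status = 'processing'
         AND worker_pid NOT IN (SELECT pid FROM live_worker_pids))
  ORDER BY created_at_epoch
  LIMIT 1
)
RETURNING *;
```

The point of the shape: liveness is checked at claim time from ground truth (which PIDs exist), so a crashed worker's rows become claimable immediately instead of after a 60-second threshold.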

-- ─────────────────────────────────────────────────────────────────────
-- user_prompts: per-prompt history (UI + FTS search).
-- ─────────────────────────────────────────────────────────────────────
CREATE TABLE IF NOT EXISTS user_prompts (
  id INTEGER PRIMARY KEY AUTOINCREMENT,
  content_session_id TEXT NOT NULL,
  prompt_number INTEGER NOT NULL,
  prompt_text TEXT NOT NULL,
  created_at TEXT NOT NULL,
  created_at_epoch INTEGER NOT NULL,
  FOREIGN KEY(content_session_id) REFERENCES sdk_sessions(content_session_id) ON DELETE CASCADE
);
CREATE INDEX IF NOT EXISTS idx_user_prompts_claude_session ON user_prompts(content_session_id);
CREATE INDEX IF NOT EXISTS idx_user_prompts_created ON user_prompts(created_at_epoch DESC);
CREATE INDEX IF NOT EXISTS idx_user_prompts_prompt_number ON user_prompts(prompt_number);
CREATE INDEX IF NOT EXISTS idx_user_prompts_lookup ON user_prompts(content_session_id, prompt_number);

-- ─────────────────────────────────────────────────────────────────────
-- observation_feedback: usage-signal tracking for tier routing.
-- ─────────────────────────────────────────────────────────────────────
CREATE TABLE IF NOT EXISTS observation_feedback (
  id INTEGER PRIMARY KEY AUTOINCREMENT,
  observation_id INTEGER NOT NULL,
  signal_type TEXT NOT NULL,
  session_db_id INTEGER,
  created_at_epoch INTEGER NOT NULL,
  metadata TEXT,
  FOREIGN KEY (observation_id) REFERENCES observations(id) ON DELETE CASCADE
);
CREATE INDEX IF NOT EXISTS idx_feedback_observation ON observation_feedback(observation_id);
CREATE INDEX IF NOT EXISTS idx_feedback_signal ON observation_feedback(signal_type);
@@ -10,7 +10,7 @@ import { Database } from 'bun:sqlite';
 import { logger } from '../../utils/logger.js';
 import type { ObservationInput } from './observations/types.js';
 import type { SummaryInput } from './summaries/types.js';
-import { computeObservationContentHash, findDuplicateObservation } from './observations/store.js';
+import { computeObservationContentHash } from './observations/store.js';
 
 /**
  * Result from storeObservations / storeObservationsAndMarkComplete transaction
@@ -64,23 +64,25 @@ export function storeObservationsAndMarkComplete(
   const storeAndMarkTx = db.transaction(() => {
     const observationIds: number[] = [];
 
-    // 1. Store all observations (with content-hash deduplication)
+    // 1. Store all observations.
+    // UNIQUE(memory_session_id, content_hash) + ON CONFLICT DO NOTHING enforces
+    // dedup at the DB layer (Plan 01 Phase 4). RETURNING gives us the row id
+    // when the insert went through; on conflict we look up the existing id.
     const obsStmt = db.prepare(`
      INSERT INTO observations
        (memory_session_id, project, type, title, subtitle, facts, narrative, concepts,
         files_read, files_modified, prompt_number, discovery_tokens, agent_type, agent_id, content_hash, created_at, created_at_epoch)
      VALUES (?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?)
+     ON CONFLICT(memory_session_id, content_hash) DO NOTHING
+     RETURNING id
    `);
+    const lookupExistingStmt = db.prepare(
+      'SELECT id FROM observations WHERE memory_session_id = ? AND content_hash = ?'
+    );
 
     for (const observation of observations) {
       const contentHash = computeObservationContentHash(memorySessionId, observation.title, observation.narrative);
-      const existing = findDuplicateObservation(db, contentHash, timestampEpoch);
-      if (existing) {
-        observationIds.push(existing.id);
-        continue;
-      }
-
-      const result = obsStmt.run(
+      const inserted = obsStmt.get(
        memorySessionId,
        project,
        observation.type,
@@ -98,8 +100,20 @@ export function storeObservationsAndMarkComplete(
        contentHash,
        timestampIso,
        timestampEpoch
-      );
-      observationIds.push(Number(result.lastInsertRowid));
+      ) as { id: number } | null;
+
+      if (inserted) {
+        observationIds.push(inserted.id);
+        continue;
+      }
+
+      const existing = lookupExistingStmt.get(memorySessionId, contentHash) as { id: number } | null;
+      if (!existing) {
+        throw new Error(
+          `storeObservationsAndMarkComplete: ON CONFLICT without existing row for content_hash=${contentHash}`
+        );
+      }
+      observationIds.push(existing.id);
     }
 
     // 2. Store summary if provided
@@ -185,23 +199,24 @@ export function storeObservations(
   const storeTx = db.transaction(() => {
     const observationIds: number[] = [];
 
-    // 1. Store all observations (with content-hash deduplication)
+    // 1. Store all observations.
+    // UNIQUE(memory_session_id, content_hash) + ON CONFLICT DO NOTHING enforces
+    // dedup at the DB layer (Plan 01 Phase 4).
     const obsStmt = db.prepare(`
      INSERT INTO observations
        (memory_session_id, project, type, title, subtitle, facts, narrative, concepts,
         files_read, files_modified, prompt_number, discovery_tokens, agent_type, agent_id, content_hash, created_at, created_at_epoch)
      VALUES (?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?)
+     ON CONFLICT(memory_session_id, content_hash) DO NOTHING
+     RETURNING id
    `);
+    const lookupExistingStmt = db.prepare(
+      'SELECT id FROM observations WHERE memory_session_id = ? AND content_hash = ?'
+    );
 
     for (const observation of observations) {
       const contentHash = computeObservationContentHash(memorySessionId, observation.title, observation.narrative);
-      const existing = findDuplicateObservation(db, contentHash, timestampEpoch);
-      if (existing) {
-        observationIds.push(existing.id);
-        continue;
-      }
-
-      const result = obsStmt.run(
+      const inserted = obsStmt.get(
        memorySessionId,
        project,
        observation.type,
@@ -219,8 +234,20 @@ export function storeObservations(
        contentHash,
        timestampIso,
        timestampEpoch
-      );
-      observationIds.push(Number(result.lastInsertRowid));
+      ) as { id: number } | null;
+
+      if (inserted) {
+        observationIds.push(inserted.id);
+        continue;
+      }
+
+      const existing = lookupExistingStmt.get(memorySessionId, contentHash) as { id: number } | null;
+      if (!existing) {
+        throw new Error(
+          `storeObservations: ON CONFLICT without existing row for content_hash=${contentHash}`
+        );
+      }
+      observationIds.push(existing.id);
    }
 
     // 2. Store summary if provided

@@ -341,6 +341,73 @@ export class ChromaMcpManager {
     }
   }
 
+  /**
+   * Deep semantic-search probe — verifies the actual query path works,
+   * not just that the subprocess responds to one tool. Each stage is wrapped
+   * in its own try/catch so the returned `stage` reflects where it failed.
+   *
+   * Stages:
+   * - 'list'  → chroma_list_collections (also counts collections)
+   * - 'query' → chroma_query_documents against cm__claude-mem with a trivial
+   *   query and n_results: 1 (measures latency)
+   * - 'done'  → both stages succeeded
+   */
+  async probeSemanticSearch(): Promise<{
+    ok: boolean;
+    stage: 'connect' | 'list' | 'query' | 'done';
+    error?: string;
+    collections?: number;
+    queryLatencyMs?: number;
+  }> {
+    let collections: number | undefined;
+
+    // Stage: list — also lazy-connects via callTool
+    try {
+      const listResult: any = await this.callTool('chroma_list_collections', { limit: 100 });
+      if (Array.isArray(listResult)) {
+        collections = listResult.length;
+      } else if (listResult && Array.isArray(listResult.collections)) {
+        collections = listResult.collections.length;
+      } else if (listResult && typeof listResult === 'object' && 'length' in listResult) {
+        collections = (listResult as { length: number }).length;
+      }
+    } catch (error) {
+      const message = error instanceof Error ? error.message : String(error);
+      logger.warn('CHROMA_MCP', 'Deep probe failed at list stage', { error: message });
+      return { ok: false, stage: 'list', error: message };
+    }
+
+    // Stage: query — round-trip through the embedding/vector path
+    const queryStartedAt = Date.now();
+    try {
+      await this.callTool('chroma_query_documents', {
+        collection_name: 'cm__claude-mem',
+        query_texts: ['ping'],
+        n_results: 1
+      });
+      const queryLatencyMs = Date.now() - queryStartedAt;
+      return { ok: true, stage: 'done', collections, queryLatencyMs };
+    } catch (error) {
+      const queryLatencyMs = Date.now() - queryStartedAt;
+      const rawMessage = error instanceof Error ? error.message : String(error);
+      const isMissingOrEmpty = /not exist|missing|empty|no such/i.test(rawMessage);
+      const errorMessage = isMissingOrEmpty
+        ? `collection cm__claude-mem missing or empty (${rawMessage})`
+        : rawMessage;
+      logger.warn('CHROMA_MCP', 'Deep probe failed at query stage', {
+        error: rawMessage,
+        queryLatencyMs
+      });
+      return {
+        ok: false,
+        stage: 'query',
+        error: errorMessage,
+        collections,
+        queryLatencyMs
+      };
+    }
+  }
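A caller of probeSemanticSearch only needs the returned stage to produce an actionable message. A hypothetical consumer sketch (the result shape matches the probe above; the formatting helper itself is illustrative, not part of this PR):

```typescript
interface ProbeResult {
  ok: boolean;
  stage: 'connect' | 'list' | 'query' | 'done';
  error?: string;
  collections?: number;
  queryLatencyMs?: number;
}

// Turn a probe result into a one-line health summary: the stage field
// tells the operator which layer broke (transport vs. vector path).
function describeProbe(r: ProbeResult): string {
  if (r.ok) {
    return `semantic search OK (${r.collections ?? '?'} collections, ${r.queryLatencyMs}ms)`;
  }
  return `semantic search FAILED at '${r.stage}' stage: ${r.error ?? 'unknown error'}`;
}
```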
 
   /**
    * Gracefully stop the MCP connection and kill the chroma-mcp subprocess.
    * client.close() sends stdin close -> SIGTERM -> SIGKILL to the subprocess.
@@ -549,9 +549,10 @@ export class ChromaSync {
    * Reads from SQLite and syncs in batches
    * @param projectOverride - If provided, backfill this project instead of this.project.
    *   Used by backfillAllProjects() to iterate projects without mutating instance state.
+   * @param storeOverride - If provided, use this SessionStore instead of creating a new one.
    * Throws error if backfill fails
    */
-  async ensureBackfilled(projectOverride?: string): Promise<void> {
+  async ensureBackfilled(projectOverride?: string, storeOverride?: SessionStore): Promise<void> {
     const backfillProject = projectOverride ?? this.project;
     logger.info('CHROMA_SYNC', 'Starting smart backfill', { project: backfillProject });
 
@@ -560,7 +561,7 @@ export class ChromaSync {
     // Fetch existing IDs from Chroma (fast, metadata only)
     const existing = await this.getExistingChromaIds(backfillProject);
 
-    const db = new SessionStore();
+    const db = storeOverride ?? new SessionStore();
 
     try {
       await this.runBackfillPipeline(db, backfillProject, existing);
@@ -568,7 +569,10 @@ export class ChromaSync {
       logger.error('CHROMA_SYNC', 'Backfill failed', { project: backfillProject }, error instanceof Error ? error : new Error(String(error)));
       throw new Error(`Backfill failed: ${error instanceof Error ? error.message : String(error)}`);
     } finally {
-      db.close();
+      // Only close if we created it
+      if (!storeOverride) {
+        db.close();
+      }
     }
   }
 
@@ -861,8 +865,8 @@ export class ChromaSync {
    * with project scoped via metadata, matching how DatabaseManager and SearchManager operate.
    * Designed to be called fire-and-forget on worker startup.
    */
-  static async backfillAllProjects(): Promise<void> {
-    const db = new SessionStore();
+  static async backfillAllProjects(storeOverride?: SessionStore): Promise<void> {
+    const db = storeOverride ?? new SessionStore();
     const sync = new ChromaSync('claude-mem');
     try {
       const projects = db.db.prepare(
@@ -873,7 +877,7 @@ export class ChromaSync {
 
     for (const { project } of projects) {
       try {
-        await sync.ensureBackfilled(project);
+        await sync.ensureBackfilled(project, db);
       } catch (error) {
         if (error instanceof Error) {
          logger.error('CHROMA_SYNC', `Backfill failed for project: ${project}`, {}, error);
@@ -885,7 +889,10 @@ export class ChromaSync {
      }
    } finally {
      await sync.close();
-      db.close();
+      // Only close if we created it
+      if (!storeOverride) {
+        db.close();
+      }
    }
  }

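The storeOverride threading above is the classic resource-ownership rule: close a handle only if you created it, so a shared connection survives the helper that borrowed it. The same shape reduced to a sketch with a dummy store (names here are illustrative, not the real SessionStore API):

```typescript
class Store {
  closed = false;
  close(): void { this.closed = true; }
}

// Use the caller's store if given, otherwise create one.
// In the finally block, close only the store we created ourselves;
// a caller-supplied store stays open for the caller to reuse.
function withStore<T>(fn: (s: Store) => T, override?: Store): T {
  const store = override ?? new Store();
  try {
    return fn(store);
  } finally {
    if (!override) store.close();
  }
}
```

This is what lets backfillAllProjects open one SessionStore and reuse it across every per-project ensureBackfilled call instead of opening a connection per project.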
@@ -1,6 +1,5 @@
 import path from 'path';
 import { sessionInitHandler } from '../../cli/handlers/session-init.js';
-import { observationHandler } from '../../cli/handlers/observation.js';
 import { fileEditHandler } from '../../cli/handlers/file-edit.js';
 import { sessionCompleteHandler } from '../../cli/handlers/session-complete.js';
 import { ensureWorkerRunning, workerHttpRequest } from '../../shared/worker-utils.js';
@@ -12,6 +11,7 @@ import { resolveFieldSpec, resolveFields, matchesRule } from './field-utils.js';
 import { expandHomePath } from './config.js';
 import type { TranscriptSchema, WatchTarget, SchemaEvent } from './types.js';
 import { normalizePlatformSource } from '../../shared/platform-source.js';
+import { ingestObservation } from '../worker/http/shared.js';
 
 interface SessionState {
   sessionId: string;
@@ -20,14 +20,10 @@ interface SessionState {
   project?: string;
   lastUserMessage?: string;
   lastAssistantMessage?: string;
-  pendingTools: Map<string, { name?: string; input?: unknown }>;
-}
-
-interface PendingTool {
-  id?: string;
-  name?: string;
-  input?: unknown;
-  response?: unknown;
+  // In-memory pairing for transcript schemas (e.g. Codex) where tool_use
+  // carries toolName + toolInput and tool_result only carries tool_use_id +
+  // output. Keyed by toolId; consumed and deleted on the matching tool_result.
+  pendingTools?: Map<string, { toolName: string; toolInput: unknown }>;
 }
 
 export class TranscriptEventProcessor {
@@ -56,7 +52,6 @@ export class TranscriptEventProcessor {
       session = {
         sessionId,
         platformSource: normalizePlatformSource(watch.name),
-        pendingTools: new Map()
       };
       this.sessions.set(key, session);
     }
@@ -129,7 +124,7 @@ export class TranscriptEventProcessor {
     const project = this.resolveProject(entry, watch, schema, event, session);
     if (project) session.project = project;
 
-    const fields = resolveFields(event.fields, entry, { watch, schema, session });
+    const fields = resolveFields(event.fields, entry, { watch, schema, session: session as unknown as Record<string, unknown> });
 
     switch (event.action) {
       case 'session_context':
@@ -196,12 +191,6 @@ export class TranscriptEventProcessor {
     const toolInput = this.maybeParseJson(fields.toolInput);
     const toolResponse = this.maybeParseJson(fields.toolResponse);
 
-    const pending: PendingTool = { id: toolId, name: toolName, input: toolInput, response: toolResponse };
-
-    if (toolId) {
-      session.pendingTools.set(toolId, { name: pending.name, input: pending.input });
-    }
-
     if (toolName === 'apply_patch' && typeof toolInput === 'string') {
       const files = this.parseApplyPatchFiles(toolInput);
       for (const filePath of files) {
@@ -212,35 +201,61 @@ export class TranscriptEventProcessor {
       }
     }
 
-    if (toolResponse !== undefined && toolName) {
+    // Two schema shapes to support:
+    // 1. Self-contained events (e.g. Claude JSONL): tool_use and tool_result
+    //    both carry toolName; tool_use may already include toolResponse.
+    // 2. Split events (e.g. Codex): tool_use carries toolName + toolInput,
+    //    tool_result carries only toolUseId + output. Neither side alone
+    //    has both toolName and toolResponse.
+    //
+    // For (1) we emit eagerly when toolResponse is present. For (2) we stash
+    // toolName/toolInput on the session keyed by toolId so handleToolResult
+    // can join them at tool_result time. The DB's
+    // UNIQUE(content_session_id, tool_use_id) index collapses any duplicate
+    // emissions that arise when both events carry a complete record.
+    if (toolName && toolResponse !== undefined) {
       await this.sendObservation(session, {
         toolName,
         toolInput,
-        toolResponse
+        toolResponse,
+        toolUseId: toolId,
       });
+    } else if (toolName && toolId) {
+      if (!session.pendingTools) session.pendingTools = new Map();
+      session.pendingTools.set(toolId, { toolName, toolInput });
     }
   }
 
   private async handleToolResult(session: SessionState, fields: Record<string, unknown>): Promise<void> {
     const toolId = typeof fields.toolId === 'string' ? fields.toolId : undefined;
-    const toolName = typeof fields.toolName === 'string' ? fields.toolName : undefined;
+    let toolName = typeof fields.toolName === 'string' ? fields.toolName : undefined;
     const toolResponse = this.maybeParseJson(fields.toolResponse);
-    let toolInput = this.maybeParseJson(fields.toolInput);
-
-    let name = toolName;
+    let toolInput: unknown = this.maybeParseJson(fields.toolInput);
 
-    if (toolId && session.pendingTools.has(toolId)) {
-      const pending = session.pendingTools.get(toolId)!;
-      toolInput = pending.input ?? toolInput;
-      name = name ?? pending.name;
-      session.pendingTools.delete(toolId);
+    // Consume any pending-tool entry for this toolId regardless of whether the
+    // tool_result already carries toolName: in the split-schema path the
+    // result always resolves the pending entry, so leaving it behind would
+    // grow the map until session end.
+    if (toolId && session.pendingTools) {
+      const pending = session.pendingTools.get(toolId);
+      if (pending) {
+        if (!toolName) toolName = pending.toolName;
+        if (toolInput === undefined) toolInput = pending.toolInput;
+        session.pendingTools.delete(toolId);
+      }
     }
 
-    if (name) {
+    if (toolName) {
       await this.sendObservation(session, {
-        toolName: name,
+        toolName,
         toolInput,
-        toolResponse
+        toolResponse,
+        toolUseId: toolId,
       });
+    } else {
+      logger.debug('TRANSCRIPT', 'Dropping tool_result with no resolvable toolName', {
+        sessionId: session.sessionId,
+        toolId,
+      });
     }
   }
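The tool_use / tool_result join above can be shown in isolation. A reduced sketch of the pairing map, with the session and handler plumbing stripped away (class and method names here are illustrative, not the processor's API):

```typescript
type PendingEntry = { toolName: string; toolInput: unknown };

class ToolPairer {
  private pending = new Map<string, PendingEntry>();

  // tool_use side of a split schema: stash name + input keyed by tool id.
  onToolUse(toolId: string, toolName: string, toolInput: unknown): void {
    this.pending.set(toolId, { toolName, toolInput });
  }

  // tool_result side: join with the stashed half and always consume the
  // entry, so the map cannot grow for the lifetime of the session.
  onToolResult(
    toolId: string,
    toolResponse: unknown
  ): (PendingEntry & { toolResponse: unknown }) | null {
    const entry = this.pending.get(toolId);
    if (!entry) return null;
    this.pending.delete(toolId);
    return { ...entry, toolResponse };
  }

  get size(): number { return this.pending.size; }
}
```

The null return corresponds to the processor's "no resolvable toolName" branch: a result whose tool_use half was never seen is logged and dropped rather than emitted half-formed.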
@@ -249,14 +264,23 @@ export class TranscriptEventProcessor {
     const toolName = typeof fields.toolName === 'string' ? fields.toolName : undefined;
     if (!toolName) return;
 
-    await observationHandler.execute({
-      sessionId: session.sessionId,
+    // PATHFINDER plan 03 phase 7: replace HTTP loopback (worker → its own
+    // /api/sessions/observations endpoint) with a direct in-process call to
+    // ingestObservation. Same implementation backs the cross-process HTTP
+    // route handler (one helper, N callers).
+    const result = ingestObservation({
+      contentSessionId: session.sessionId,
+      cwd: session.cwd ?? process.cwd(),
       toolName,
       toolInput: this.maybeParseJson(fields.toolInput),
       toolResponse: this.maybeParseJson(fields.toolResponse),
-      platform: session.platformSource
+      platformSource: session.platformSource,
+      toolUseId: typeof fields.toolUseId === 'string' ? fields.toolUseId : undefined,
     });
+
+    if (!result.ok) {
+      throw new Error(`ingestObservation failed: ${result.reason}`);
+    }
   }
 
   private async sendFileEdit(session: SessionState, fields: Record<string, unknown>): Promise<void> {
@@ -277,10 +301,17 @@ export class TranscriptEventProcessor {
     const trimmed = value.trim();
     if (!trimmed) return value;
     if (!(trimmed.startsWith('{') || trimmed.startsWith('['))) return value;
+    // Pass through the raw string on parse failure rather than throwing.
+    // Throwing from this helper propagates to `handleLine`'s outer catch,
+    // which then silently drops the entire transcript line — including any
+    // valid sibling fields. A single malformed JSON-shaped field should
+    // degrade to opaque-string handling, not lose the whole observation.
     try {
       return JSON.parse(trimmed);
-    } catch (error: unknown) {
-      logger.debug('WORKER', 'Failed to parse JSON string', { length: trimmed.length }, error instanceof Error ? error : undefined);
+    } catch (error) {
+      logger.debug('TRANSCRIPT', 'Field looked like JSON but did not parse; using raw string', {
+        preview: trimmed.slice(0, 120),
+      }, error instanceof Error ? error : undefined);
      return value;
    }
  }
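The fall-through contract of maybeParseJson can be captured as a standalone helper. A sketch of the same behaviour (JSON-looking strings parse; everything else, including malformed JSON, passes through unchanged; logging omitted):

```typescript
// Parse only strings that look like JSON objects/arrays. On any parse
// failure, return the original value instead of throwing, so one bad
// field cannot drop the whole transcript line.
function maybeParseJson(value: unknown): unknown {
  if (typeof value !== 'string') return value;
  const trimmed = value.trim();
  if (!trimmed.startsWith('{') && !trimmed.startsWith('[')) return value;
  try {
    return JSON.parse(trimmed);
  } catch {
    return value;
  }
}
```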
@@ -314,7 +345,7 @@ export class TranscriptEventProcessor {
       platform: session.platformSource
     });
     await this.updateContext(session, watch);
-    session.pendingTools.clear();
+    session.pendingTools?.clear();
     const key = this.getSessionKey(watch, session.sessionId);
     this.sessions.delete(key);
   }

@@ -1,5 +1,5 @@
 import { existsSync, statSync, watch as fsWatch, createReadStream } from 'fs';
-import { basename, join } from 'path';
+import { basename, join, resolve as resolvePath, sep as pathSep } from 'path';
 import { globSync } from 'glob';
 import { logger } from '../../utils/logger.js';
 import { expandHomePath } from './config.js';
@@ -84,7 +84,7 @@ export class TranscriptWatcher {
   private processor = new TranscriptEventProcessor();
   private tailers = new Map<string, FileTailer>();
   private state: TranscriptWatchState;
-  private rescanTimers: Array<NodeJS.Timeout> = [];
+  private rootWatchers: Array<ReturnType<typeof fsWatch>> = [];
 
   constructor(private config: TranscriptWatchConfig, private statePath: string) {
     this.state = loadWatchState(statePath);
@@ -101,10 +101,10 @@ export class TranscriptWatcher {
       tailer.close();
     }
     this.tailers.clear();
-    for (const timer of this.rescanTimers) {
-      clearInterval(timer);
+    for (const watcher of this.rootWatchers) {
+      watcher.close();
     }
-    this.rescanTimers = [];
+    this.rootWatchers = [];
   }
 
   private async setupWatch(watch: WatchTarget): Promise<void> {
@@ -121,16 +121,80 @@ export class TranscriptWatcher {
       await this.addTailer(filePath, watch, schema, true);
     }
 
-    const rescanIntervalMs = watch.rescanIntervalMs ?? 5000;
-    const timer = setInterval(async () => {
-      const newFiles = this.resolveWatchFiles(resolvedPath);
-      for (const filePath of newFiles) {
-        if (!this.tailers.has(filePath)) {
-          await this.addTailer(filePath, watch, schema, false);
-        }
-      }
-    }, rescanIntervalMs);
-    this.rescanTimers.push(timer);
+    // PATHFINDER plan 03 phase 5: 5-second rescan timer replaced by a
+    // recursive fs.watch on the configured root. Requires Node 20+ on Linux
+    // for recursive mode (engines.node >= 20.0.0 — already enforced in
+    // package.json).
+    const watchRoot = this.deepestNonGlobAncestor(resolvedPath);
+    if (!watchRoot || !existsSync(watchRoot)) {
+      logger.debug('TRANSCRIPT', 'Watch root does not exist, skipping fs.watch', { watch: watch.name, watchRoot });
+      return;
+    }
+
+    try {
+      const watcher = fsWatch(watchRoot, { recursive: true, persistent: true }, (event, name) => {
+        if (!name) return; // some events omit filename
+        // Skip the glob scan for paths we already tail — JSONL appends fire
+        // here on every line and a full resolveWatchFiles() per append is
+        // more expensive than the prior 5-s interval. Only unknown paths
+        // warrant a rescan (new transcript files surface here first).
+        const changed = resolvePath(watchRoot, name);
+        if (this.tailers.has(changed)) return;
+        const matches = this.resolveWatchFiles(resolvedPath);
+        for (const filePath of matches) {
+          if (!this.tailers.has(filePath)) {
+            void this.addTailer(filePath, watch, schema, false);
+          }
+        }
+      });
+      this.rootWatchers.push(watcher);
+      logger.info('TRANSCRIPT', 'Watching transcript root recursively', { watch: watch.name, watchRoot });
+    } catch (error) {
+      logger.warn('TRANSCRIPT', 'Failed to start recursive fs.watch on transcript root', {
+        watch: watch.name,
+        watchRoot,
+      }, error instanceof Error ? error : undefined);
+    }
   }
 
+  /**
+   * Return the deepest path component that contains no glob meta-characters.
+   * Used to anchor `fs.watch(recursive: true)` for both literal directories
+   * and patterns like `~/.codex/sessions/**\/*.jsonl`.
+   *
+   * Handles both `/` and `\` as separators so Windows-native paths
+   * (e.g. `C:\Users\x\codex\sessions\**\*.jsonl`) resolve correctly. When
+   * the input is purely glob meta (no literal prefix) we return an empty
+   * string so the caller skips the watch instead of anchoring at the
+   * filesystem root.
+   */
+  private deepestNonGlobAncestor(inputPath: string): string {
+    if (!this.hasGlob(inputPath)) {
+      // Literal path: if it's a file, return its directory; otherwise return as-is.
+      if (existsSync(inputPath)) {
+        try {
+          const stat = statSync(inputPath);
+          return stat.isDirectory() ? inputPath : resolvePath(inputPath, '..');
+        } catch {
+          return resolvePath(inputPath, '..');
+        }
+      }
+      return inputPath;
+    }
+
+    const segments = inputPath.split(/[/\\]/);
+    const literalSegments: string[] = [];
+    for (const segment of segments) {
+      if (/[*?[\]{}()]/.test(segment)) break;
+      literalSegments.push(segment);
+    }
+    if (literalSegments.length === 0) return '';
+    if (literalSegments.length === 1 && literalSegments[0] === '') {
+      // Input started with a separator but the first real segment was a
+      // glob (e.g. `/**/foo`). Don't silently broaden the watch to `/`.
+      return '';
+    }
+    return literalSegments.join(pathSep);
+  }
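Stripped of the filesystem checks, the glob-anchor logic above is a pure string function, which makes its edge cases easy to pin down. This sketch mirrors the segment walk (minus the statSync branch for literal paths); the function name is mine, not the class's:

```typescript
import { sep as pathSep } from 'node:path';

// Deepest prefix of a glob pattern that contains no glob meta-characters.
// Returns '' when the pattern starts with glob meta, so a caller can skip
// the watch instead of anchoring at the filesystem root.
function globAnchor(pattern: string): string {
  const segments = pattern.split(/[/\\]/);
  const literal: string[] = [];
  for (const segment of segments) {
    if (/[*?[\]{}()]/.test(segment)) break;
    literal.push(segment);
  }
  if (literal.length === 0) return '';
  if (literal.length === 1 && literal[0] === '') return ''; // e.g. '/**/foo'
  return literal.join(pathSep);
}
```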
 
   private resolveSchema(watch: WatchTarget): TranscriptSchema | null {

@@ -79,6 +79,7 @@ import { DatabaseManager } from './worker/DatabaseManager.js';
 import { SessionManager } from './worker/SessionManager.js';
 import { SSEBroadcaster } from './worker/SSEBroadcaster.js';
 import { SDKAgent } from './worker/SDKAgent.js';
+import type { WorkerRef } from './worker/agents/types.js';
 import { GeminiAgent, isGeminiSelected, isGeminiAvailable } from './worker/GeminiAgent.js';
 import { OpenRouterAgent, isOpenRouterSelected, isOpenRouterAvailable } from './worker/OpenRouterAgent.js';
 import { PaginationHelper } from './worker/PaginationHelper.js';
@@ -88,6 +89,7 @@ import { FormattingService } from './worker/FormattingService.js';
 import { TimelineService } from './worker/TimelineService.js';
 import { SessionEventBroadcaster } from './worker/events/SessionEventBroadcaster.js';
 import { SessionCompletionHandler } from './worker/session/SessionCompletionHandler.js';
+import { setIngestContext, attachIngestGeneratorStarter } from './worker/http/shared.js';
 import { DEFAULT_CONFIG_PATH, DEFAULT_STATE_PATH, expandHomePath, loadTranscriptWatchConfig, writeSampleConfig } from './transcripts/config.js';
 import { TranscriptWatcher } from './transcripts/watcher.js';
 
@@ -100,14 +102,18 @@ import { SettingsRoutes } from './worker/http/routes/SettingsRoutes.js';
 import { LogsRoutes } from './worker/http/routes/LogsRoutes.js';
 import { MemoryRoutes } from './worker/http/routes/MemoryRoutes.js';
 import { CorpusRoutes } from './worker/http/routes/CorpusRoutes.js';
+import { ChromaRoutes } from './worker/http/routes/ChromaRoutes.js';
 
 // Knowledge agent services
 import { CorpusStore } from './worker/knowledge/CorpusStore.js';
 import { CorpusBuilder } from './worker/knowledge/CorpusBuilder.js';
 import { KnowledgeAgent } from './worker/knowledge/KnowledgeAgent.js';
 
-// Process management for zombie cleanup (Issue #737)
-import { startOrphanReaper, reapOrphanedProcesses, getProcessBySession, ensureProcessExit } from './worker/ProcessRegistry.js';
+// Primary-path session lifecycle helpers — no reapers, no orphan sweeps.
+// The SDK subprocess is spawned in its own POSIX process group via
+// createSdkSpawnFactory; teardown via ensureSdkProcessExit kills the whole
+// group so no descendants leak (Principle 5).
+import { getSdkProcessForSession, ensureSdkProcessExit } from '../supervisor/process-registry.js';
|
||||
|
||||
/**
 * Build JSON status output for hook framework communication.
@@ -133,7 +139,7 @@ export function buildStatusOutput(status: 'ready' | 'error', message?: string):
  };
}

export class WorkerService {
export class WorkerService implements WorkerRef {
  private server: Server;
  private startTime: number = Date.now();
  private mcpClient: Client;
@@ -146,14 +152,14 @@
  // Service layer
  private dbManager: DatabaseManager;
  private sessionManager: SessionManager;
  private sseBroadcaster: SSEBroadcaster;
  public sseBroadcaster: SSEBroadcaster;
  private sdkAgent: SDKAgent;
  private geminiAgent: GeminiAgent;
  private openRouterAgent: OpenRouterAgent;
  private paginationHelper: PaginationHelper;
  private settingsManager: SettingsManager;
  private sessionEventBroadcaster: SessionEventBroadcaster;
  private sessionCompletionHandler: SessionCompletionHandler;
  private completionHandler: SessionCompletionHandler;
  private corpusStore: CorpusStore;

  // Route handlers
@@ -169,12 +175,6 @@
  private initializationComplete: Promise<void>;
  private resolveInitialization!: () => void;

  // Orphan reaper cleanup function (Issue #737)
  private stopOrphanReaper: (() => void) | null = null;

  // Stale session reaper interval (Issue #1168)
  private staleSessionReaperInterval: ReturnType<typeof setInterval> | null = null;

  // AI interaction tracking for health endpoint
  private lastAiInteraction: {
    timestamp: number;
@@ -200,13 +200,21 @@
    this.paginationHelper = new PaginationHelper(this.dbManager);
    this.settingsManager = new SettingsManager(this.dbManager);
    this.sessionEventBroadcaster = new SessionEventBroadcaster(this.sseBroadcaster, this);
    this.sessionCompletionHandler = new SessionCompletionHandler(
    this.completionHandler = new SessionCompletionHandler(
      this.sessionManager,
      this.sessionEventBroadcaster,
      this.dbManager
      this.dbManager,
    );
    this.corpusStore = new CorpusStore();

    // Wire ingest helpers (plan 03 phase 0). Worker-internal callers use these
    // directly instead of HTTP-loopback into our own routes.
    setIngestContext({
      sessionManager: this.sessionManager,
      dbManager: this.dbManager,
      eventBroadcaster: this.sessionEventBroadcaster,
    });

    // Set callback for when sessions are deleted
    this.sessionManager.setOnSessionDeleted(() => {
      this.broadcastProcessingStatus();
@@ -268,6 +276,9 @@
  private registerRoutes(): void {
    // IMPORTANT: Middleware must be registered BEFORE routes (Express processes in order)

    // Register Chroma routes immediately so they bypass the initialization guard
    this.server.registerRoutes(new ChromaRoutes());

    // Early handler for /api/context/inject — fail open if not yet initialized
    this.server.app.get('/api/context/inject', async (req, res, next) => {
      if (!this.initializationCompleteFlag || !this.searchRoutes) {
@@ -281,14 +292,20 @@

    // Guard ALL /api/* routes during initialization — wait for DB with timeout
    // Exceptions: /api/health, /api/readiness, /api/version (handled by Server.ts core routes)
    // and /api/context/inject (handled above with fail-open)
    // and /api/chroma/status (diagnostic endpoint)
    this.server.app.use('/api', async (req, res, next) => {
      // Bypass guard for diagnostic endpoints
      if (req.path === '/chroma/status' || req.path === '/health' || req.path === '/readiness' || req.path === '/version') {
        next();
        return;
      }

      if (this.initializationCompleteFlag) {
        next();
        return;
      }

      const timeoutMs = 30000;
      const timeoutMs = 120000; // 2 minutes
      const timeoutPromise = new Promise<void>((_, reject) =>
        setTimeout(() => reject(new Error('Database initialization timeout')), timeoutMs)
      );
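The guard above races pending initialization against a timeout so `/api/*` requests fail fast instead of hanging. A minimal standalone sketch of the same pattern (`waitWithTimeout` is an illustrative helper name, not part of this codebase):

```typescript
// Sketch of the initialization-guard timeout pattern: race a pending
// readiness promise against a rejection timer. Assumed names, not real API.
function waitWithTimeout<T>(pending: Promise<T>, timeoutMs: number): Promise<T> {
  let timer: ReturnType<typeof setTimeout> | undefined;
  const timeout = new Promise<never>((_, reject) => {
    timer = setTimeout(() => reject(new Error('Database initialization timeout')), timeoutMs);
  });
  // Clear the timer on either outcome so it cannot keep the process alive.
  return Promise.race([pending, timeout]).finally(() => clearTimeout(timer));
}
```

If initialization resolves first, the request proceeds; if the timer fires first, the caller gets a rejection it can turn into a 503 instead of an indefinite hang.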
@@ -312,7 +329,15 @@

    // Standard routes (registered AFTER guard middleware)
    this.server.registerRoutes(new ViewerRoutes(this.sseBroadcaster, this.dbManager, this.sessionManager));
    this.server.registerRoutes(new SessionRoutes(this.sessionManager, this.dbManager, this.sdkAgent, this.geminiAgent, this.openRouterAgent, this.sessionEventBroadcaster, this, this.sessionCompletionHandler));
    const sessionRoutes = new SessionRoutes(this.sessionManager, this.dbManager, this.sdkAgent, this.geminiAgent, this.openRouterAgent, this.sessionEventBroadcaster, this, this.completionHandler);
    this.server.registerRoutes(sessionRoutes);
    // Wire the generator-starter callback now that SessionRoutes exists.
    // `setIngestContext` ran in the constructor before routes were
    // constructed; transcript-watcher observations depend on this side-effect
    // to auto-start the SDK generator after enqueue.
    attachIngestGeneratorStarter((sessionDbId, source) =>
      sessionRoutes.ensureGeneratorRunning(sessionDbId, source),
    );
    this.server.registerRoutes(new DataRoutes(this.paginationHelper, this.dbManager, this.sessionManager, this.sseBroadcaster, this, this.startTime));
    this.server.registerRoutes(new SettingsRoutes(this.settingsManager));
    this.server.registerRoutes(new LogsRoutes());
@@ -359,6 +384,7 @@
   */
  private async initializeBackground(): Promise<void> {
    try {
      logger.info('WORKER', 'Background initialization starting...');
      await aggressiveStartupCleanup();

      // Load mode configuration
@@ -368,47 +394,39 @@

      const settings = SettingsDefaultsManager.loadFromFile(USER_SETTINGS_PATH);

      const modeId = settings.CLAUDE_MEM_MODE;
      ModeManager.getInstance().loadMode(modeId);
      logger.info('SYSTEM', `Mode loaded: ${modeId}`);

      // One-time chroma wipe for users upgrading from versions with duplicate worker bugs.
      // Only runs in local mode (chroma is local-only). Backfill at line ~414 rebuilds from SQLite.
      if (settings.CLAUDE_MEM_MODE === 'local' || !settings.CLAUDE_MEM_MODE) {
        logger.info('WORKER', 'Checking for one-time Chroma migration...');
        runOneTimeChromaMigration();
      }

      // One-time remap of pre-worktree project names using pending_messages.cwd.
      // Must run before dbManager.initialize() so we don't hold the DB open.
      logger.info('WORKER', 'Checking for one-time CWD remap...');
      runOneTimeCwdRemap();

      // Stamp merged worktrees so their observations surface under the parent
      // project. Runs every startup (not marker-gated) because git state evolves
      // and the engine is fully idempotent. Must also precede dbManager.initialize().
      //
      // The worker daemon is spawned with cwd=marketplace-plugin-dir (not a git
      // repo), so we can't seed adoption with process.cwd(). Instead, discover
      // parent repos from recorded pending_messages.cwd values.
      let adoptions: Awaited<ReturnType<typeof adoptMergedWorktreesForAllKnownRepos>> | null = null;
      try {
        adoptions = await adoptMergedWorktreesForAllKnownRepos({});
      } catch (err) {
        // [ANTI-PATTERN IGNORED]: Worktree adoption is best-effort on startup; failure must not block worker initialization
        if (err instanceof Error) {
          logger.error('WORKER', 'Worktree adoption failed (non-fatal)', {}, err);
        } else {
          logger.error('WORKER', 'Worktree adoption failed (non-fatal) with non-Error', {}, new Error(String(err)));
        }
      }
      if (adoptions) {
        for (const adoption of adoptions) {
          if (adoption.adoptedObservations > 0 || adoption.adoptedSummaries > 0 || adoption.chromaUpdates > 0) {
            logger.info('SYSTEM', 'Merged worktrees adopted on startup', adoption);
          }
          if (adoption.errors.length > 0) {
            logger.warn('SYSTEM', 'Worktree adoption had per-branch errors', {
              repoPath: adoption.repoPath,
              errors: adoption.errors
            });
      // Stamp merged worktrees (Non-blocking, fire-and-forget)
      logger.info('WORKER', 'Adopting merged worktrees (background)...');
      adoptMergedWorktreesForAllKnownRepos({}).then(adoptions => {
        if (adoptions) {
          for (const adoption of adoptions) {
            if (adoption.adoptedObservations > 0 || adoption.adoptedSummaries > 0 || adoption.chromaUpdates > 0) {
              logger.info('SYSTEM', 'Merged worktrees adopted in background', adoption);
            }
            if (adoption.errors.length > 0) {
              logger.warn('SYSTEM', 'Worktree adoption had per-branch errors', {
                repoPath: adoption.repoPath,
                errors: adoption.errors
              });
            }
          }
        }
      }
      }).catch(err => {
        logger.error('WORKER', 'Worktree adoption failed (background)', {}, err instanceof Error ? err : new Error(String(err)));
      });

      // Initialize ChromaMcpManager only if Chroma is enabled
      const chromaEnabled = settings.CLAUDE_MEM_CHROMA_ENABLED !== 'false';
@@ -419,21 +437,24 @@
        logger.info('SYSTEM', 'Chroma disabled via CLAUDE_MEM_CHROMA_ENABLED=false, skipping ChromaMcpManager');
      }

      const modeId = settings.CLAUDE_MEM_MODE;
      ModeManager.getInstance().loadMode(modeId);
      logger.info('SYSTEM', `Mode loaded: ${modeId}`);

      logger.info('WORKER', 'Initializing database manager...');
      await this.dbManager.initialize();

      // Reset any messages that were processing when worker died
      const { PendingMessageStore } = await import('./sqlite/PendingMessageStore.js');
      const pendingStore = new PendingMessageStore(this.dbManager.getSessionStore().db, 3);
      const resetCount = pendingStore.resetStaleProcessingMessages(0); // 0 = reset ALL processing
      if (resetCount > 0) {
        logger.info('SYSTEM', `Reset ${resetCount} stale processing messages to pending`);
      // One-shot GC for terminally-failed rows
      try {
        logger.info('WORKER', 'Running startup GC for pending messages...');
        const { PendingMessageStore } = await import('./sqlite/PendingMessageStore.js');
        const pendingStore = new PendingMessageStore(this.dbManager.getSessionStore().db, 3);
        const cleared = pendingStore.clearFailedOlderThan(7 * 24 * 60 * 60 * 1000);
        if (cleared > 0) {
          logger.info('QUEUE', 'Startup GC cleared old failed pending_messages rows', { cleared });
        }
      } catch (err) {
        logger.warn('QUEUE', 'Startup GC for failed pending_messages rows failed', {}, err instanceof Error ? err : undefined);
      }

      // Initialize search services
      logger.info('WORKER', 'Initializing search services...');
      const formattingService = new FormattingService();
      const timelineService = new TimelineService();
      const searchManager = new SearchManager(
@@ -464,8 +485,6 @@
      logger.info('WORKER', 'CorpusRoutes registered');

      // DB and search are ready — mark initialization complete so hooks can proceed.
      // MCP connection is tracked separately via mcpReady and is NOT required for
      // the worker to serve context/search requests.
      this.initializationCompleteFlag = true;
      this.resolveInitialization();
      logger.info('SYSTEM', 'Core initialization complete (DB + search ready)');
@@ -474,7 +493,7 @@

      // Auto-backfill Chroma for all projects if out of sync with SQLite (fire-and-forget)
      if (this.chromaMcpManager) {
        ChromaSync.backfillAllProjects().then(() => {
        ChromaSync.backfillAllProjects(this.dbManager.getSessionStore()).then(() => {
          logger.info('CHROMA_SYNC', 'Backfill check complete for all projects');
        }).catch(error => {
          logger.error('CHROMA_SYNC', 'Backfill failed (non-blocking)', {}, error as Error);
@@ -482,134 +501,55 @@
      }

      // Mark MCP as externally ready once the bundled stdio server binary exists.
      // Codex/Claude Desktop connect to this binary directly; the loopback client
      // below is only a best-effort self-check and should not mark health false.
      const mcpServerPath = path.join(__dirname, 'mcp-server.cjs');
      this.mcpReady = existsSync(mcpServerPath);

      // Best-effort loopback MCP self-check
      getSupervisor().assertCanSpawn('mcp server');
      const transport = new StdioClientTransport({
        command: process.execPath, // Use resolved path, not bare 'node' which fails on non-interactive PATH (#1876)
        args: [mcpServerPath],
        env: sanitizeEnv(process.env)
      // Best-effort loopback MCP self-check (non-blocking, fire-and-forget)
      this.runMcpSelfCheck(mcpServerPath).catch(err => {
        logger.debug('WORKER', 'MCP self-check failed (non-fatal)', { error: err.message });
      });

      const MCP_INIT_TIMEOUT_MS = 300000;
      return;
    } catch (error) {
      // Background initialization failed - log and let worker fail health checks
      logger.error('SYSTEM', 'Background initialization failed', {}, error instanceof Error ? error : undefined);
    }
  }

  /**
   * Run a best-effort loopback MCP self-check to verify the bundled server can start.
   * This is entirely diagnostic and does not block worker availability.
   */
  private async runMcpSelfCheck(mcpServerPath: string): Promise<void> {
    try {
      getSupervisor().assertCanSpawn('mcp server');
      const transport = new StdioClientTransport({
        command: process.execPath,
        args: [mcpServerPath],
        env: Object.fromEntries(
          Object.entries(sanitizeEnv(process.env)).filter(([, value]) => value !== undefined)
        ) as Record<string, string>
      });

      const MCP_INIT_TIMEOUT_MS = 60000; // 1 minute is plenty for local check
      const mcpConnectionPromise = this.mcpClient.connect(transport);
      let timeoutId: ReturnType<typeof setTimeout>;

      const timeoutPromise = new Promise<never>((_, reject) => {
        timeoutId = setTimeout(
          () => reject(new Error('MCP connection timeout after 5 minutes')),
          MCP_INIT_TIMEOUT_MS
        setTimeout(
          () => reject(new Error('MCP connection timeout')),
          60000
        );
      });

      try {
        await Promise.race([mcpConnectionPromise, timeoutPromise]);
      } catch (connectionError) {
        clearTimeout(timeoutId!);
        logger.warn('WORKER', 'MCP loopback self-check failed, cleaning up subprocess', {
          error: connectionError instanceof Error ? connectionError.message : String(connectionError)
        });
        try {
          await transport.close();
        } catch (transportCloseError) {
          // [ANTI-PATTERN IGNORED]: transport.close() is best-effort cleanup after MCP connection already failed; supervisor handles orphan processes
          logger.debug('WORKER', 'transport.close() failed during MCP cleanup', {
            error: transportCloseError instanceof Error ? transportCloseError.message : String(transportCloseError)
          });
        }
        logger.info('WORKER', 'Bundled MCP server remains available for external stdio clients', {
          path: mcpServerPath
        });
        return;
      }
      clearTimeout(timeoutId!);
      await Promise.race([mcpConnectionPromise, timeoutPromise]);
      logger.info('WORKER', 'MCP loopback self-check connected successfully');

      const mcpProcess = (transport as unknown as { _process?: import('child_process').ChildProcess })._process;
      if (mcpProcess?.pid) {
        getSupervisor().registerProcess('mcp-server', {
          pid: mcpProcess.pid,
          type: 'mcp',
          startedAt: new Date().toISOString()
        }, mcpProcess);
        mcpProcess.once('exit', () => {
          getSupervisor().unregisterProcess('mcp-server');
        });
      }
      logger.success('WORKER', 'MCP loopback self-check connected');

      // Start orphan reaper to clean up zombie processes (Issue #737)
      this.stopOrphanReaper = startOrphanReaper(() => {
        const activeIds = new Set<number>();
        for (const [id] of this.sessionManager['sessions']) {
          activeIds.add(id);
        }
        return activeIds;
      });
      logger.info('SYSTEM', 'Started orphan reaper (runs every 30 seconds)');

      // Reap stale sessions to unblock orphan process cleanup (Issue #1168)
      this.staleSessionReaperInterval = setInterval(async () => {
        try {
          const reaped = await this.sessionManager.reapStaleSessions();
          if (reaped > 0) {
            logger.info('SYSTEM', `Reaped ${reaped} stale sessions`);
          }
        } catch (e) {
          // [ANTI-PATTERN IGNORED]: setInterval callback cannot throw; reaper retries on next tick (every 2 min)
          if (e instanceof Error) {
            logger.error('WORKER', 'Stale session reaper error', {}, e);
          } else {
            logger.error('WORKER', 'Stale session reaper error with non-Error', {}, new Error(String(e)));
          }
        }

        // Purge stale failed pending messages to prevent unbounded queue growth (#1957)
        // Only remove failures older than 1 hour to preserve recent failures for inspection/retry
        try {
          const pendingStore = this.sessionManager.getPendingMessageStore();
          const FAILED_MESSAGE_RETENTION_MS = 60 * 60 * 1000; // 1 hour
          const purged = pendingStore.clearFailedOlderThan(FAILED_MESSAGE_RETENTION_MS);
          if (purged > 0) {
            logger.info('SYSTEM', `Purged ${purged} stale failed pending messages (older than 1h)`);
          }
        } catch (e) {
          if (e instanceof Error) {
            logger.error('WORKER', 'Failed message purge error', {}, e);
          } else {
            logger.error('WORKER', 'Failed message purge error with non-Error', {}, new Error(String(e)));
          }
        }

        // Periodic WAL checkpoint to prevent unbounded WAL growth (#1956)
        try {
          this.dbManager.getSessionStore().db.run('PRAGMA wal_checkpoint(PASSIVE)');
        } catch (e) {
          if (e instanceof Error) {
            logger.error('WORKER', 'WAL checkpoint error', {}, e);
          } else {
            logger.error('WORKER', 'WAL checkpoint error with non-Error', {}, new Error(String(e)));
          }
        }
      }, 2 * 60 * 1000);

      // Auto-recover orphaned queues (fire-and-forget with error logging)
      this.processPendingQueues(50).then(result => {
        if (result.sessionsStarted > 0) {
          logger.info('SYSTEM', `Auto-recovered ${result.sessionsStarted} sessions with pending work`, {
            totalPending: result.totalPendingSessions,
            started: result.sessionsStarted,
            sessionIds: result.startedSessionIds
          });
        }
      }).catch(error => {
        logger.error('SYSTEM', 'Auto-recovery of pending queues failed', {}, error as Error);
      });
      // Cleanup
      await transport.close();
    } catch (error) {
      logger.error('SYSTEM', 'Background initialization failed', {}, error as Error);
      throw error;
      logger.warn('WORKER', 'MCP loopback self-check failed', {
        error: error instanceof Error ? error.message : String(error)
      });
    }
  }

@@ -787,10 +727,11 @@
        throw error;
      })
      .finally(async () => {
        // CRITICAL: Verify subprocess exit to prevent zombie accumulation (Issue #1168)
        const trackedProcess = getProcessBySession(session.sessionDbId);
        // Primary-path subprocess teardown — process-group kill ensures any
        // SDK descendants are reaped too (Principle 5).
        const trackedProcess = getSdkProcessForSession(session.sessionDbId);
        if (trackedProcess && trackedProcess.process.exitCode === null) {
          await ensureProcessExit(trackedProcess, 5000);
          await ensureSdkProcessExit(trackedProcess, 5000);
        }

        session.generatorPromise = null;
@@ -833,12 +774,14 @@
        session.consecutiveRestarts = (session.consecutiveRestarts || 0) + 1; // Keep for logging

        if (!restartAllowed) {
          logger.error('SYSTEM', 'Restart guard tripped: too many restarts in window, stopping to prevent runaway costs', {
          logger.error('SYSTEM', 'Restart guard tripped: session is dead, terminating', {
            sessionId: session.sessionDbId,
            pendingCount,
            restartsInWindow: session.restartGuard.restartsInWindow,
            windowMs: session.restartGuard.windowMs,
            maxRestarts: session.restartGuard.maxRestarts
            maxRestarts: session.restartGuard.maxRestarts,
            consecutiveFailures: session.restartGuard.consecutiveFailuresSinceSuccess,
            maxConsecutiveFailures: session.restartGuard.maxConsecutiveFailures
          });
          session.consecutiveRestarts = 0;
          this.terminateSession(session.sessionDbId, 'max_restarts_exceeded');
@@ -856,26 +799,17 @@
          this.startSessionProcessor(session, 'pending-work-restart');
          this.broadcastProcessingStatus();
        } else {
          // Successful completion with no pending work — clean up session.
          // Only remove from the in-memory map if finalize succeeds; otherwise
          // leave the session in place so the 60s orphan reaper (or a future
          // retry) can repair the inconsistency. Removing a still-"active" DB
          // row from memory would orphan it indefinitely under the new
          // fire-and-forget Stop hook (no /api/sessions/complete to retry).
          // Successful completion with no pending work — finalize then drop
          // in-memory state. finalizeSession flips sdk_sessions.status to
          // 'completed', drains orphaned pendings, broadcasts; idempotent so
          // the later POST /api/sessions/complete from the Stop hook is a
          // no-op. Without this, hooks-disabled installs (and any session
          // whose Stop hook fails before /api/sessions/complete) leave the
          // DB row permanently 'active'.
          session.restartGuard?.recordSuccess();
          session.consecutiveRestarts = 0;
          let finalized = false;
          try {
            this.sessionCompletionHandler.finalizeSession(session.sessionDbId);
            finalized = true;
          } catch (err) {
            logger.warn('SESSION', 'finalizeSession failed in WorkerService generator .finally()', {
              sessionId: session.sessionDbId
            }, err as Error);
          }
          if (finalized) {
            this.sessionManager.removeSessionImmediate(session.sessionDbId);
          }
          this.completionHandler.finalizeSession(session.sessionDbId);
          this.sessionManager.removeSessionImmediate(session.sessionDbId);
        }
      });
  }
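The restart-guard log fields above reference a windowed restart limit. A minimal sketch of that half of the guard (the class, field, and method names here are illustrative assumptions; the real RestartGuard also tracks consecutive failures since the last success):

```typescript
// Illustrative windowed restart guard: allow at most maxRestarts within
// windowMs; when canRestart() returns false, callers stop restarting and
// terminate the session. This is a sketch, not the real RestartGuard API.
class WindowedRestartGuard {
  private restarts: number[] = []; // timestamps of recent restarts

  constructor(
    private maxRestarts: number,
    private windowMs: number,
    private now: () => number = Date.now, // injectable clock for testing
  ) {}

  canRestart(): boolean {
    const cutoff = this.now() - this.windowMs;
    this.restarts = this.restarts.filter(t => t > cutoff); // drop expired entries
    if (this.restarts.length >= this.maxRestarts) return false;
    this.restarts.push(this.now());
    return true;
  }

  recordSuccess(): void {
    this.restarts = []; // a clean completion resets the window
  }
}
```

The injectable clock keeps the sliding window deterministic under test; expired restarts age out naturally, so a session that settles down regains its restart budget.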
@@ -960,34 +894,12 @@
      }
    }

    // No fallback or both failed: mark messages abandoned and remove session so queue doesn't grow
    const pendingStore = this.sessionManager.getPendingMessageStore();
    const abandoned = pendingStore.markAllSessionMessagesAbandoned(sessionDbId);
    if (abandoned > 0) {
      logger.warn('SDK', 'No fallback available; marked pending messages abandoned', {
        sessionId: sessionDbId,
        abandoned
      });
    }
    // Finalize so DB status + broadcast + pending-drain are consistent on fallback failure.
    // finalizeSession already broadcasts session_completed, so we don't also call
    // broadcastSessionCompleted below. On finalize failure, fall back to the
    // explicit broadcast so the UI still gets the event and leave the session
    // in memory for the orphan reaper to retry.
    let finalized = false;
    try {
      this.sessionCompletionHandler.finalizeSession(sessionDbId);
      finalized = true;
    } catch (err) {
      logger.warn('SESSION', 'finalizeSession failed in runFallbackForTerminatedSession', {
        sessionId: sessionDbId
      }, err as Error);
    }
    if (finalized) {
      this.sessionManager.removeSessionImmediate(sessionDbId);
    } else {
      this.sessionEventBroadcaster.broadcastSessionCompleted(sessionDbId);
    }
    // No fallback or both failed: mark session completed in DB (drain pending
    // + broadcast via finalizeSession, idempotent) then drop in-memory state.
    // Without this, sdk_sessions.status stays 'active' forever — the deleted
    // reapStaleSessions interval was the only prior backstop.
    this.completionHandler.finalizeSession(sessionDbId);
    this.sessionManager.removeSessionImmediate(sessionDbId);
  }

  /**
@@ -1001,34 +913,15 @@
   * no? → terminateSession()
   */
  private terminateSession(sessionDbId: number, reason: string): void {
    const pendingStore = this.sessionManager.getPendingMessageStore();
    const abandoned = pendingStore.markAllSessionMessagesAbandoned(sessionDbId);
    logger.info('SYSTEM', 'Session terminated', { sessionId: sessionDbId, reason });

    logger.info('SYSTEM', 'Session terminated', {
      sessionId: sessionDbId,
      reason,
      abandonedMessages: abandoned
    });
    // finalizeSession marks sdk_sessions.status='completed', drains pending
    // messages, and broadcasts. Idempotent. Without this, wall-clock-limited
    // and unrecoverable-error paths leave DB rows as 'active' forever.
    this.completionHandler.finalizeSession(sessionDbId);

    // Finalize session (mark completed in DB + drain pending + broadcast). Idempotent.
    // This runs AFTER startSession() has returned, which means any summary/observation
    // writes inside processAgentResponse() are already committed to SQLite synchronously.
    // Only remove from the in-memory map if finalize succeeds; otherwise leave the
    // session in place so the 60s orphan reaper can repair the DB inconsistency.
    let finalized = false;
    try {
      this.sessionCompletionHandler.finalizeSession(sessionDbId);
      finalized = true;
    } catch (err) {
      logger.warn('SESSION', 'finalizeSession failed during terminateSession', {
        sessionId: sessionDbId, reason
      }, err as Error);
    }

    if (finalized) {
      // removeSessionImmediate fires onSessionDeletedCallback → broadcastProcessingStatus()
      this.sessionManager.removeSessionImmediate(sessionDbId);
    }
    // removeSessionImmediate fires onSessionDeletedCallback → broadcastProcessingStatus()
    this.sessionManager.removeSessionImmediate(sessionDbId);
  }

  /**
@@ -1154,18 +1047,6 @@
      logger.info('TRANSCRIPT', 'Transcript watcher stopped');
    }

    // Stop orphan reaper before shutdown (Issue #737)
    if (this.stopOrphanReaper) {
      this.stopOrphanReaper();
      this.stopOrphanReaper = null;
    }

    // Stop stale session reaper (Issue #1168)
    if (this.staleSessionReaperInterval) {
      clearInterval(this.staleSessionReaperInterval);
      this.staleSessionReaperInterval = null;
    }

    await performGracefulShutdown({
      server: this.server.getHttpServer(),
      sessionManager: this.sessionManager,

@@ -48,9 +48,6 @@ export interface ActiveSession {
  // Track whether the most recent storage operation persisted a summary record.
  // Used by the status endpoint so the Stop hook can detect silent summary loss (#1633).
  lastSummaryStored?: boolean;
  // Circuit breaker: track consecutive summary failures to prevent infinite retry loops (#1633).
  // When this reaches MAX_CONSECUTIVE_SUMMARY_FAILURES, further summarize requests are skipped.
  consecutiveSummaryFailures: number;
  // Subagent identity carried forward from the most recent claimed pending message.
  // When observations are parsed and stored, these fields label the resulting rows
  // so subagent work is attributable. NULL / undefined means the batch came from the main session.
@@ -69,6 +66,9 @@ export interface PendingMessage {
  // Claude Code subagent identity — present only when the hook fired inside a subagent.
  agentId?: string;
  agentType?: string;
  /** Provider-assigned tool-use id; underpins the
   * UNIQUE(content_session_id, tool_use_id) idempotency index added in plan 01. */
  toolUseId?: string;
}

/**
@@ -90,6 +90,8 @@ export interface ObservationData {
  // Claude Code subagent identity — present only when the hook fired inside a subagent.
  agentId?: string;
  agentType?: string;
  /** Provider-assigned tool-use id (plan 03 phase 6 idempotency key). */
  toolUseId?: string;
}

// ============================================================================

@@ -8,15 +8,17 @@
 * - ChromaSync integration
 */

import { Database } from 'bun:sqlite';
import { SessionStore } from '../sqlite/SessionStore.js';
import { SessionSearch } from '../sqlite/SessionSearch.js';
import { ChromaSync } from '../sync/ChromaSync.js';
import { SettingsDefaultsManager } from '../../shared/SettingsDefaultsManager.js';
import { USER_SETTINGS_PATH } from '../../shared/paths.js';
import { USER_SETTINGS_PATH, DB_PATH } from '../../shared/paths.js';
import { logger } from '../../utils/logger.js';
import type { DBSession } from '../worker-types.js';

export class DatabaseManager {
  private db: Database | null = null;
  private sessionStore: SessionStore | null = null;
  private sessionSearch: SessionSearch | null = null;
  private chromaSync: ChromaSync | null = null;
@@ -26,8 +28,11 @@
   */
  async initialize(): Promise<void> {
    // Open database connection (ONCE)
    this.sessionStore = new SessionStore();
    this.sessionSearch = new SessionSearch();
    this.db = new Database(DB_PATH);

    // Shared connection between store and search
    this.sessionStore = new SessionStore(this.db);
    this.sessionSearch = new SessionSearch(this.db);

    // Initialize ChromaSync only if Chroma is enabled (SQLite-only fallback when disabled)
    const settings = SettingsDefaultsManager.loadFromFile(USER_SETTINGS_PATH);
@@ -38,7 +43,7 @@
      logger.info('DB', 'Chroma disabled via CLAUDE_MEM_CHROMA_ENABLED=false, using SQLite-only search');
    }

    logger.info('DB', 'Database initialized');
    logger.info('DB', 'Database initialized (shared connection)');
  }

  /**
@@ -51,13 +56,14 @@
      this.chromaSync = null;
    }

    if (this.sessionStore) {
      this.sessionStore.close();
      this.sessionStore = null;
    }
    if (this.sessionSearch) {
      this.sessionSearch.close();
      this.sessionSearch = null;
    // We don't call sessionStore.close() or sessionSearch.close()
    // because they share this.db which we close below.
    this.sessionStore = null;
    this.sessionSearch = null;

    if (this.db) {
      this.db.close();
      this.db = null;
    }
    logger.info('DB', 'Database closed');
  }
@@ -89,10 +95,6 @@
    return this.chromaSync;
  }

  // REMOVED: cleanupOrphanedSessions - violates "EVERYTHING SHOULD SAVE ALWAYS"
  // Worker restarts don't make sessions orphaned. Sessions are managed by hooks
  // and exist independently of worker state.

  /**
   * Get session by ID (throws if not found)
   */

@@ -7,6 +7,7 @@
 * - Efficient LIMIT+1 trick to avoid COUNT(*) query
 */

import type { SQLQueryBindings } from 'bun:sqlite';
import { DatabaseManager } from './DatabaseManager.js';
import { logger } from '../../utils/logger.js';
import { OBSERVER_SESSIONS_PROJECT } from '../../shared/paths.js';
@@ -102,7 +103,7 @@ export class PaginationHelper {
      FROM observations o
      LEFT JOIN sdk_sessions s ON o.memory_session_id = s.memory_session_id
    `;
    const params: unknown[] = [];
    const params: SQLQueryBindings[] = [];
    const conditions: string[] = [];

    if (project) {

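The LIMIT+1 trick mentioned in the header comment above can be sketched in isolation like this (an illustrative standalone version, not the project's actual PaginationHelper; `pageOf` is a hypothetical name):

```typescript
// Ask for limit + 1 rows, then use the presence of the extra row as the
// has-more flag instead of running a separate COUNT(*) query. In real code
// the slice would be `... LIMIT ? OFFSET ?` with limit + 1 bound.
function pageOf<T>(rows: T[], offset: number, limit: number): { items: T[]; hasMore: boolean } {
  const fetched = rows.slice(offset, offset + limit + 1); // LIMIT limit+1
  const hasMore = fetched.length > limit;                 // extra row => next page exists
  return { items: fetched.slice(0, limit), hasMore };     // drop the sentinel row
}
```

The caller never sees the sentinel row; it only pays for one indexed query per page instead of a query plus a full-table COUNT.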
@@ -1,527 +0,0 @@
/**
 * ProcessRegistry: Track spawned Claude subprocesses
 *
 * Fixes Issue #737: Claude haiku subprocesses don't terminate properly,
 * causing zombie process accumulation (user reported 155 processes / 51GB RAM).
 *
 * Root causes:
 * 1. SDK's SpawnedProcess interface hides subprocess PIDs
 * 2. deleteSession() doesn't verify subprocess exit before cleanup
 * 3. abort() is fire-and-forget with no confirmation
 *
 * Solution:
 * - Use SDK's spawnClaudeCodeProcess option to capture PIDs
 * - Track all spawned processes with session association
 * - Verify exit on session deletion with timeout + SIGKILL escalation
 * - Safety net orphan reaper runs every 5 minutes
 */

import { spawn, exec, ChildProcess } from 'child_process';
import { promisify } from 'util';
import { logger } from '../../utils/logger.js';
import { sanitizeEnv } from '../../supervisor/env-sanitizer.js';
import { getSupervisor } from '../../supervisor/index.js';

const execAsync = promisify(exec);

interface TrackedProcess {
  pid: number;
  sessionDbId: number;
  spawnedAt: number;
  process: ChildProcess;
}

function getTrackedProcesses(): TrackedProcess[] {
  return getSupervisor().getRegistry()
    .getAll()
    .filter(record => record.type === 'sdk')
    .map((record) => {
      const processRef = getSupervisor().getRegistry().getRuntimeProcess(record.id);
      if (!processRef) {
        return null;
      }

      return {
        pid: record.pid,
        sessionDbId: Number(record.sessionId),
        spawnedAt: Date.parse(record.startedAt),
        process: processRef
      };
    })
    .filter((value): value is TrackedProcess => value !== null);
}

/**
 * Register a spawned process in the registry
 */
export function registerProcess(pid: number, sessionDbId: number, process: ChildProcess): void {
  getSupervisor().registerProcess(`sdk:${sessionDbId}:${pid}`, {
    pid,
    type: 'sdk',
    sessionId: sessionDbId,
    startedAt: new Date().toISOString()
  }, process);
  logger.info('PROCESS', `Registered PID ${pid} for session ${sessionDbId}`, { pid, sessionDbId });
}

/**
 * Unregister a process from the registry and notify pool waiters
 */
export function unregisterProcess(pid: number): void {
  for (const record of getSupervisor().getRegistry().getByPid(pid)) {
    if (record.type === 'sdk') {
      getSupervisor().unregisterProcess(record.id);
    }
  }
  logger.debug('PROCESS', `Unregistered PID ${pid}`, { pid });
  // Notify waiters that a pool slot may be available
  notifySlotAvailable();
}

/**
 * Get process info by session ID
 * Warns if multiple processes found (indicates race condition)
 */
export function getProcessBySession(sessionDbId: number): TrackedProcess | undefined {
  const matches = getTrackedProcesses().filter(info => info.sessionDbId === sessionDbId);
  if (matches.length > 1) {
    logger.warn('PROCESS', `Multiple processes found for session ${sessionDbId}`, {
      count: matches.length,
      pids: matches.map(m => m.pid)
    });
  }
  return matches[0];
}

/**
 * Get count of active processes in the registry
 */
export function getActiveCount(): number {
  return getSupervisor().getRegistry().getAll().filter(record => record.type === 'sdk').length;
}

// Waiters for pool slots - resolved when a process exits and frees a slot
const slotWaiters: Array<() => void> = [];

/**
 * Notify waiters that a slot has freed up
 */
function notifySlotAvailable(): void {
  const waiter = slotWaiters.shift();
  if (waiter) waiter();
}

/**
 * Wait for a pool slot to become available (promise-based, not polling)
 * @param maxConcurrent Max number of concurrent agents
 * @param timeoutMs Max time to wait before giving up
 * @param evictIdleSession Optional callback to evict an idle session when all slots are full (#1868)
 */
const TOTAL_PROCESS_HARD_CAP = 10;

export async function waitForSlot(
  maxConcurrent: number,
  timeoutMs: number = 60_000,
  evictIdleSession?: () => boolean
): Promise<void> {
  // Hard cap: refuse to spawn if too many processes exist regardless of pool accounting
  const activeCount = getActiveCount();
  if (activeCount >= TOTAL_PROCESS_HARD_CAP) {
    throw new Error(`Hard cap exceeded: ${activeCount} processes in registry (cap=${TOTAL_PROCESS_HARD_CAP}). Refusing to spawn more.`);
  }

  if (activeCount < maxConcurrent) return;

  // Try to evict an idle session before waiting (#1868)
  // Idle sessions hold pool slots during their 3-min idle timeout, blocking new sessions
  // that would timeout after 60s. Eviction aborts the idle session asynchronously —
  // the freed slot is picked up by the waiter mechanism below.
  if (evictIdleSession) {
    const evicted = evictIdleSession();
    if (evicted) {
      logger.info('PROCESS', 'Evicted idle session to free pool slot for waiting request');
    }
  }

  logger.info('PROCESS', `Pool limit reached (${activeCount}/${maxConcurrent}), waiting for slot...`);

  return new Promise<void>((resolve, reject) => {
    const timeout = setTimeout(() => {
      const idx = slotWaiters.indexOf(onSlot);
      if (idx >= 0) slotWaiters.splice(idx, 1);
      reject(new Error(`Timed out waiting for agent pool slot after ${timeoutMs}ms`));
    }, timeoutMs);

    const onSlot = () => {
      clearTimeout(timeout);
      if (getActiveCount() < maxConcurrent) {
        resolve();
      } else {
        // Still full, re-queue
        slotWaiters.push(onSlot);
      }
    };

    slotWaiters.push(onSlot);
  });
}

/**
 * Get all active PIDs (for debugging)
 */
export function getActiveProcesses(): Array<{ pid: number; sessionDbId: number; ageMs: number }> {
  const now = Date.now();
  return getTrackedProcesses().map(info => ({
    pid: info.pid,
    sessionDbId: info.sessionDbId,
    ageMs: now - info.spawnedAt
  }));
}

/**
 * Wait for a process to exit with timeout, escalating to SIGKILL if needed
 * Uses event-based waiting instead of polling to avoid CPU overhead
 */
export async function ensureProcessExit(tracked: TrackedProcess, timeoutMs: number = 5000): Promise<void> {
  const { pid, process: proc } = tracked;

  // Already exited? Only trust exitCode, NOT proc.killed
  // proc.killed only means Node sent a signal — the process can still be alive
  if (proc.exitCode !== null) {
    unregisterProcess(pid);
    return;
  }

  // Wait for graceful exit with timeout using event-based approach
  const exitPromise = new Promise<void>((resolve) => {
    proc.once('exit', () => resolve());
  });

  const timeoutPromise = new Promise<void>((resolve) => {
    setTimeout(resolve, timeoutMs);
  });

  await Promise.race([exitPromise, timeoutPromise]);

  // Check if exited gracefully — only trust exitCode
  if (proc.exitCode !== null) {
    unregisterProcess(pid);
    return;
  }

  // Timeout: escalate to SIGKILL
  logger.warn('PROCESS', `PID ${pid} did not exit after ${timeoutMs}ms, sending SIGKILL`, { pid, timeoutMs });
  try {
    proc.kill('SIGKILL');
  } catch {
    // Already dead
  }

  // Wait for SIGKILL to take effect — use exit event with 1s timeout instead of blind sleep
  const sigkillExitPromise = new Promise<void>((resolve) => {
    proc.once('exit', () => resolve());
  });
  const sigkillTimeout = new Promise<void>((resolve) => {
    setTimeout(resolve, 1000);
  });
  await Promise.race([sigkillExitPromise, sigkillTimeout]);
  unregisterProcess(pid);
}

/**
 * Kill idle daemon children (claude processes spawned by worker-service)
 *
 * These are SDK-spawned claude processes that completed their work but
 * didn't terminate properly. They remain as children of the worker-service
 * daemon, consuming memory without doing useful work.
 *
 * Criteria for cleanup:
 * - Process name is "claude"
 * - Parent PID is the worker-service daemon (this process)
 * - Process has 0% CPU (idle)
 * - Process has been running for more than 2 minutes
 */
async function killIdleDaemonChildren(): Promise<number> {
  if (process.platform === 'win32') {
    // Windows: Different process model, skip for now
    return 0;
  }

  const daemonPid = process.pid;
  let killed = 0;

  try {
    const { stdout } = await execAsync(
      'ps -eo pid,ppid,%cpu,etime,comm 2>/dev/null | grep "claude$" || true'
    );

    for (const line of stdout.trim().split('\n')) {
      if (!line) continue;

      const parts = line.trim().split(/\s+/);
      if (parts.length < 5) continue;

      const [pidStr, ppidStr, cpuStr, etime] = parts;
      const pid = parseInt(pidStr, 10);
      const ppid = parseInt(ppidStr, 10);
      const cpu = parseFloat(cpuStr);

      // Skip if not a child of this daemon
      if (ppid !== daemonPid) continue;

      // Skip if actively using CPU
      if (cpu > 0) continue;

      // Parse elapsed time to minutes
      // Formats: MM:SS, HH:MM:SS, D-HH:MM:SS
      let minutes = 0;
      const dayMatch = etime.match(/^(\d+)-(\d+):(\d+):(\d+)$/);
      const hourMatch = etime.match(/^(\d+):(\d+):(\d+)$/);
      const minMatch = etime.match(/^(\d+):(\d+)$/);

      if (dayMatch) {
        minutes = parseInt(dayMatch[1], 10) * 24 * 60 +
                  parseInt(dayMatch[2], 10) * 60 +
                  parseInt(dayMatch[3], 10);
      } else if (hourMatch) {
        minutes = parseInt(hourMatch[1], 10) * 60 +
                  parseInt(hourMatch[2], 10);
      } else if (minMatch) {
        minutes = parseInt(minMatch[1], 10);
      }

      // Kill if idle for more than 1 minute
      if (minutes >= 1) {
        logger.info('PROCESS', `Killing idle daemon child PID ${pid} (idle ${minutes}m)`, { pid, minutes });
        try {
          process.kill(pid, 'SIGKILL');
          killed++;
        } catch {
          // Already dead or permission denied
        }
      }
    }
  } catch {
    // No matches or command error
  }

  return killed;
}

/**
 * Kill system-level orphans (ppid=1 on Unix)
 * These are Claude processes whose parent died unexpectedly
 */
async function killSystemOrphans(): Promise<number> {
  if (process.platform === 'win32') {
    return 0; // Windows doesn't have ppid=1 orphan concept
  }

  try {
    const { stdout } = await execAsync(
      'ps -eo pid,ppid,args 2>/dev/null | grep -E "claude.*haiku|claude.*output-format" | grep -v grep'
    );

    let killed = 0;
    for (const line of stdout.trim().split('\n')) {
      if (!line) continue;
      const match = line.trim().match(/^(\d+)\s+(\d+)/);
      if (match && parseInt(match[2]) === 1) { // ppid=1 = orphan
        const orphanPid = parseInt(match[1]);
        logger.warn('PROCESS', `Killing system orphan PID ${orphanPid}`, { pid: orphanPid });
        try {
          process.kill(orphanPid, 'SIGKILL');
          killed++;
        } catch {
          // Already dead or permission denied
        }
      }
    }
    return killed;
  } catch {
    return 0; // No matches or error
  }
}

/**
 * Reap orphaned processes - both registry-tracked and system-level
 */
export async function reapOrphanedProcesses(activeSessionIds: Set<number>): Promise<number> {
  let killed = 0;

  // Registry-based: kill processes for dead sessions
  for (const record of getSupervisor().getRegistry().getAll().filter(entry => entry.type === 'sdk')) {
    const pid = record.pid;
    const sessionDbId = Number(record.sessionId);
    const processRef = getSupervisor().getRegistry().getRuntimeProcess(record.id);

    if (activeSessionIds.has(sessionDbId)) continue; // Active = safe

    logger.warn('PROCESS', `Killing orphan PID ${pid} (session ${sessionDbId} gone)`, { pid, sessionDbId });
    try {
      if (processRef) {
        processRef.kill('SIGKILL');
      } else {
        process.kill(pid, 'SIGKILL');
      }
      killed++;
    } catch {
      // Already dead
    }
    getSupervisor().unregisterProcess(record.id);
    notifySlotAvailable();
  }

  // System-level: find ppid=1 orphans
  killed += await killSystemOrphans();

  // Daemon children: find idle SDK processes that didn't terminate
  killed += await killIdleDaemonChildren();

  return killed;
}

/**
 * Create a custom spawn function for SDK that captures PIDs
 *
 * The SDK's spawnClaudeCodeProcess option allows us to intercept subprocess
 * creation and capture the PID before the SDK hides it.
 *
 * NOTE: Session isolation is handled via the `cwd` option in SDKAgent.ts,
 * NOT via CLAUDE_CONFIG_DIR (which breaks authentication).
 */
export function createPidCapturingSpawn(sessionDbId: number) {
  return (spawnOptions: {
    command: string;
    args: string[];
    cwd?: string;
    env?: NodeJS.ProcessEnv;
    signal?: AbortSignal;
  }) => {
    // Kill any existing process for this session before spawning a new one.
    // Multiple processes sharing the same --resume UUID waste API credits and
    // can conflict with each other (Issue #1590).
    const existing = getProcessBySession(sessionDbId);
    if (existing && existing.process.exitCode === null) {
      logger.warn('PROCESS', `Killing duplicate process PID ${existing.pid} before spawning new one for session ${sessionDbId}`, {
        existingPid: existing.pid,
        sessionDbId
      });
      let exited = false;
      try {
        existing.process.kill('SIGTERM');
        exited = existing.process.exitCode !== null;
      } catch (error: unknown) {
        // Already dead — safe to unregister immediately
        if (error instanceof Error) {
          logger.warn('WORKER', `Failed to kill duplicate process PID ${existing.pid}, likely already dead`, { existingPid: existing.pid, sessionDbId }, error);
        }
        exited = true;
      }

      if (exited) {
        unregisterProcess(existing.pid);
      }
      // If still alive, the 'exit' handler (line ~440) will unregister it.
    }

    getSupervisor().assertCanSpawn('claude sdk');

    // On Windows, use cmd.exe wrapper for .cmd files to properly handle paths with spaces
    const useCmdWrapper = process.platform === 'win32' && spawnOptions.command.endsWith('.cmd');
    const env = sanitizeEnv(spawnOptions.env ?? process.env);

    // Filter empty string args AND their preceding flag (Issue #2049).
    // The Agent SDK emits ["--setting-sources", ""] when settingSources defaults to [].
    // Simply dropping "" leaves an orphan --setting-sources that consumes the next
    // flag (e.g. --permission-mode) as its value, crashing Claude Code 2.1.109+ with
    // "Invalid setting source: --permission-mode". Drop the flag too so the SDK
    // default (no setting sources) is preserved by omission.
    const args: string[] = [];
    for (const arg of spawnOptions.args) {
      if (arg === '') {
        if (args.length > 0 && args[args.length - 1].startsWith('--')) {
          args.pop();
        }
        continue;
      }
      args.push(arg);
    }

    const child = useCmdWrapper
      ? spawn('cmd.exe', ['/d', '/c', spawnOptions.command, ...args], {
          cwd: spawnOptions.cwd,
          env,
          stdio: ['pipe', 'pipe', 'pipe'],
          signal: spawnOptions.signal,
          windowsHide: true
        })
      : spawn(spawnOptions.command, args, {
          cwd: spawnOptions.cwd,
          env,
          stdio: ['pipe', 'pipe', 'pipe'],
          signal: spawnOptions.signal, // CRITICAL: Pass signal for AbortController integration
          windowsHide: true
        });

    // Capture stderr for debugging spawn failures
    if (child.stderr) {
      child.stderr.on('data', (data: Buffer) => {
        logger.debug('SDK_SPAWN', `[session-${sessionDbId}] stderr: ${data.toString().trim()}`);
      });
    }

    // Register PID
    if (child.pid) {
      registerProcess(child.pid, sessionDbId, child);

      // Auto-unregister on exit
      child.on('exit', (code: number | null, signal: string | null) => {
        if (code !== 0) {
          logger.warn('SDK_SPAWN', `[session-${sessionDbId}] Claude process exited`, { code, signal, pid: child.pid });
        }
        if (child.pid) {
          unregisterProcess(child.pid);
        }
      });
    }

    // Return SDK-compatible interface
    return {
      stdin: child.stdin,
      stdout: child.stdout,
      stderr: child.stderr,
      get killed() { return child.killed; },
      get exitCode() { return child.exitCode; },
      kill: child.kill.bind(child),
      on: child.on.bind(child),
      once: child.once.bind(child),
      off: child.off.bind(child)
    };
  };
}

/**
 * Start the orphan reaper interval
 * Returns cleanup function to stop the interval
 */
export function startOrphanReaper(getActiveSessionIds: () => Set<number>, intervalMs: number = 30 * 1000): () => void {
  const interval = setInterval(async () => {
    try {
      const activeIds = getActiveSessionIds();
      const killed = await reapOrphanedProcesses(activeIds);
      if (killed > 0) {
        logger.info('PROCESS', `Reaper cleaned up ${killed} orphaned processes`, { killed });
      }
    } catch (error) {
      if (error instanceof Error) {
        logger.error('WORKER', 'Reaper error', {}, error);
      } else {
        logger.error('WORKER', 'Reaper error', { rawError: String(error) });
      }
    }
  }, intervalMs);

  // Return cleanup function
  return () => clearInterval(interval);
}
@@ -3,15 +3,26 @@
 * Prevents tight-loop restarts (bug) while allowing legitimate occasional restarts
 * over long sessions. Replaces the flat consecutiveRestarts counter that stranded
 * pending messages after just 3 restarts over any timeframe (#2053).
 *
 * TWO INDEPENDENT TRIPS:
 * 1. Sliding window: more than MAX_WINDOWED_RESTARTS within RESTART_WINDOW_MS.
 *    Catches genuinely tight loops (e.g. crash every <6s).
 * 2. Consecutive failures: more than MAX_CONSECUTIVE_FAILURES restarts with
 *    NO successful processing in between. Catches dead sessions that
 *    fail-restart-fail-restart on a slow exponential backoff cadence
 *    (e.g. 8s backoff cap + spawn failures = restartsInWindow stays under
 *    the windowed cap forever, but the session is clearly dead).
 */

const RESTART_WINDOW_MS = 60_000; // Only count restarts within last 60 seconds
const MAX_WINDOWED_RESTARTS = 10; // 10 restarts in 60s = runaway loop
const MAX_CONSECUTIVE_FAILURES = 5; // 5 restarts with no success in between = session is dead
const DECAY_AFTER_SUCCESS_MS = 5 * 60_000; // Clear history after 5min of uninterrupted success

export class RestartGuard {
  private restartTimestamps: number[] = [];
  private lastSuccessfulProcessing: number | null = null;
  private consecutiveFailures: number = 0;

  /**
   * Record a restart and check if the guard should trip.
@@ -34,16 +45,23 @@ export class RestartGuard {

    // Record this restart
    this.restartTimestamps.push(now);
    this.consecutiveFailures += 1;

    // Check if we've exceeded the cap within the window
    return this.restartTimestamps.length <= MAX_WINDOWED_RESTARTS;
    // Trip if EITHER guard exceeds its limit:
    // - Sliding window cap (tight loops)
    // - Consecutive failures with no successful work (dead session, e.g. spawn always fails)
    const withinWindowedCap = this.restartTimestamps.length <= MAX_WINDOWED_RESTARTS;
    const withinConsecutiveCap = this.consecutiveFailures <= MAX_CONSECUTIVE_FAILURES;
    return withinWindowedCap && withinConsecutiveCap;
  }

  /**
   * Call when a message is successfully processed to update the success timestamp.
   * Resets the consecutive-failure counter (real progress was made).
   */
  recordSuccess(): void {
    this.lastSuccessfulProcessing = Date.now();
    this.consecutiveFailures = 0;
  }

  /**
@@ -67,4 +85,18 @@ export class RestartGuard {
  get maxRestarts(): number {
    return MAX_WINDOWED_RESTARTS;
  }

  /**
   * Get consecutive failures since last successful processing (for logging).
   */
  get consecutiveFailuresSinceSuccess(): number {
    return this.consecutiveFailures;
  }

  /**
   * Get the max allowed consecutive failures (for logging).
   */
  get maxConsecutiveFailures(): number {
    return MAX_CONSECUTIVE_FAILURES;
  }
}

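The two independent trips can be exercised with a minimal self-contained sketch (constants and the `TwoTripGuard` name are illustrative; the real RestartGuard also decays history after sustained success):

```typescript
// Trip 1: too many restarts inside a sliding window (tight loop).
// Trip 2: too many restarts with no successful work in between (dead session).
class TwoTripGuard {
  private restarts: number[] = [];
  private consecutiveFailures = 0;

  // Returns true while restarting is still allowed, false once either guard trips.
  recordRestart(now: number, windowMs = 60_000, maxWindowed = 10, maxConsecutive = 5): boolean {
    this.restarts = this.restarts.filter(t => now - t <= windowMs); // prune outside window
    this.restarts.push(now);
    this.consecutiveFailures += 1;
    return this.restarts.length <= maxWindowed && this.consecutiveFailures <= maxConsecutive;
  }

  // A processed message counts as real progress: reset the consecutive counter.
  recordSuccess(): void {
    this.consecutiveFailures = 0;
  }
}
```

A slow fail-restart cadence never exceeds the windowed cap, but the consecutive counter still trips it; conversely, restarts interleaved with successes only trip when they bunch up inside one window.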
@@ -21,7 +21,12 @@ import { buildIsolatedEnv, getAuthMethodDescription } from '../../shared/EnvMana
import type { ActiveSession, SDKUserMessage } from '../worker-types.js';
import { ModeManager } from '../domain/ModeManager.js';
import { processAgentResponse, type WorkerRef } from './agents/index.js';
import { createPidCapturingSpawn, getProcessBySession, ensureProcessExit, waitForSlot } from './ProcessRegistry.js';
import {
  createSdkSpawnFactory,
  getSdkProcessForSession,
  ensureSdkProcessExit,
  waitForSlot,
} from '../../supervisor/process-registry.js';
import { sanitizeEnv } from '../../supervisor/env-sanitizer.js';

// Import Agent SDK (assumes it's installed)
@@ -90,11 +95,11 @@ export class SDKAgent {
    }

    // Wait for agent pool slot (configurable via CLAUDE_MEM_MAX_CONCURRENT_AGENTS)
    // Pass idle session eviction callback to prevent pool deadlock (#1868):
    // idle sessions hold slots during 3-min idle wait, blocking new sessions
    // Backpressure only — a full pool waits, never evicts a live session
    // (Principle 1: do not kick live work to make room).
    const settings = SettingsDefaultsManager.loadFromFile(USER_SETTINGS_PATH);
    const maxConcurrent = parseInt(settings.CLAUDE_MEM_MAX_CONCURRENT_AGENTS, 10) || 2;
    await waitForSlot(maxConcurrent, 60_000, () => this.sessionManager.evictIdlestSession());
    await waitForSlot(maxConcurrent, 60_000);

    // Build isolated environment from ~/.claude-mem/.env
    // This prevents Issue #733: random ANTHROPIC_API_KEY from project .env files
@@ -105,7 +110,7 @@
    logger.info('SDK', 'Starting SDK query', {
      sessionDbId: session.sessionDbId,
      contentSessionId: session.contentSessionId,
      memorySessionId: session.memorySessionId,
      memorySessionId: session.memorySessionId ?? undefined,
      hasRealMemorySessionId,
      shouldResume,
      resume_parameter: shouldResume ? session.memorySessionId : '(none - fresh start)',
@@ -139,12 +144,13 @@
      // instead of polluting user's actual project resume lists
      cwd: OBSERVER_SESSIONS_DIR,
      // Only resume if shouldResume is true (memorySessionId exists, not first prompt, not forceInit)
      ...(shouldResume && { resume: session.memorySessionId }),
      ...(shouldResume && session.memorySessionId ? { resume: session.memorySessionId } : {}),
      disallowedTools,
      abortController: session.abortController,
      pathToClaudeCodeExecutable: claudePath,
      // Custom spawn function captures PIDs to fix zombie process accumulation
      spawnClaudeCodeProcess: createPidCapturingSpawn(session.sessionDbId),
      // Custom spawn factory: spawns the SDK child in its own POSIX process
      // group so the worker can tear down the whole subtree on shutdown.
      spawnClaudeCodeProcess: createSdkSpawnFactory(session.sessionDbId),
      env: isolatedEnv // Use isolated credentials from ~/.claude-mem/.env, not process.env
    }
  });
@@ -283,10 +289,12 @@
      }
    }
  } finally {
    // Ensure subprocess is terminated after query completes (or on error)
    const tracked = getProcessBySession(session.sessionDbId);
    // Ensure subprocess is terminated after query completes (or on error).
    // Process-group teardown via ensureSdkProcessExit kills any descendants
    // the SDK spawned, so no orphan reaper is needed (Principle 5).
    const tracked = getSdkProcessForSession(session.sessionDbId);
    if (tracked && tracked.process.exitCode === null) {
      await ensureProcessExit(tracked, 5000);
      await ensureSdkProcessExit(tracked, 5000);
    }
  }

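The POSIX process-group idea behind the new spawn factory can be sketched as follows (hypothetical helper names, not the actual `createSdkSpawnFactory`; the mechanism is standard Node.js `child_process` behavior):

```typescript
import { spawn, type ChildProcess } from 'node:child_process';

// `detached: true` places the child in its own POSIX process group, whose
// pgid equals the child's pid. Signalling the NEGATED pid then reaches the
// child and every descendant it spawned, with no tracking or reaping needed.
export function spawnInOwnGroup(command: string, args: string[]): ChildProcess {
  return spawn(command, args, { detached: true, stdio: 'ignore' });
}

export function killGroup(child: ChildProcess, signal: NodeJS.Signals = 'SIGTERM'): void {
  if (child.pid) {
    try {
      process.kill(-child.pid, signal); // negative PID = signal the whole group
    } catch {
      // Group already gone
    }
  }
}
```

Compared to `child.kill()`, which only signals the direct child, `kill(-pgid)` also reaches grandchildren, which is why the hand-rolled orphan reaper above could be deleted.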
@@ -31,6 +31,8 @@ import {
  SEARCH_CONSTANTS
} from './search/index.js';
import type { TimelineData } from './search/index.js';
import { ResultFormatter } from './search/ResultFormatter.js';
import { ChromaUnavailableError } from './search/errors.js';

export class SearchManager {
  private orchestrator: SearchOrchestrator;
@@ -52,6 +54,22 @@ export class SearchManager {
    this.timelineBuilder = new TimelineBuilder();
  }

  /**
   * Accessor for the underlying orchestrator. Used by HTTP routes that need
   * raw StrategySearchResult instead of formatted MCP text output.
   */
  getOrchestrator(): SearchOrchestrator {
    return this.orchestrator;
  }

  /**
   * Accessor for the formatter. Used by HTTP routes that construct
   * text output from raw orchestrator results.
   */
  getFormatter(): FormattingService {
    return this.formatter;
  }

  /**
   * Query Chroma vector database via ChromaSync
   * @deprecated Use orchestrator.search() instead
@@ -166,6 +184,7 @@ export class SearchManager {
    let sessions: SessionSummarySearchResult[] = [];
    let prompts: UserPromptSearchResult[] = [];
    let chromaFailed = false;
    let chromaFailureReason: { message: string; isConnectionError: boolean } | null = null;

    // Determine which types to query based on type filter
    const searchObservations = !type || type === 'observations';
@@ -202,12 +221,6 @@
      whereFilter = { doc_type: 'user_prompt' };
    }

    // Include project in the Chroma where clause to scope vector search.
    // Without this, larger projects dominate the top-N results and smaller
    // projects get crowded out before the post-hoc SQLite filter.
    // Match both native-provenance rows (project) and adopted merged-worktree
    // rows (merged_into_project) so a parent-project query surfaces its
    // merged children's observations too.
    if (options.project) {
      const projectFilter = {
        $or: [
@@ -220,82 +233,96 @@
        : projectFilter;
    }

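The comment above describes combining a doc_type filter with a project `$or` filter before the Chroma query; a minimal hedged sketch of that combination (`buildWhereFilter` is a hypothetical helper, and the metadata keys follow the comment, not the elided hunk):

```typescript
type Where = Record<string, unknown>;

// Combine an optional doc_type filter with an optional project scope.
// The project scope matches native rows (project) OR adopted merged-worktree
// rows (merged_into_project), so parent-project queries surface merged children.
function buildWhereFilter(docType?: string, project?: string): Where | undefined {
  const projectFilter: Where | undefined = project
    ? { $or: [{ project }, { merged_into_project: project }] }
    : undefined;
  const typeFilter: Where | undefined = docType ? { doc_type: docType } : undefined;
  if (typeFilter && projectFilter) return { $and: [typeFilter, projectFilter] }; // both present
  return typeFilter ?? projectFilter; // one or neither present
}
```

Scoping inside the vector query (rather than filtering afterwards in SQLite) is what keeps large projects from crowding small ones out of the top-N results.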
// Step 1: Chroma semantic search with optional type + project filter
|
||||
const chromaResults = await this.queryChroma(query, 100, whereFilter);
|
||||
chromaSucceeded = true; // Chroma didn't throw error
|
||||
    try {
      // Step 1: Chroma semantic search with optional type + project filter
      const chromaResults = await this.queryChroma(query, 100, whereFilter);
      chromaSucceeded = true; // Chroma didn't throw error
      logger.debug('SEARCH', 'ChromaDB returned semantic matches', { matchCount: chromaResults.ids.length });

      if (chromaResults.ids.length > 0) {
        // Step 2: Filter by date range.
        // Use user-provided dateRange if available, otherwise fall back to 90-day recency window
        const { dateRange } = options;
        let startEpoch: number | undefined;
        let endEpoch: number | undefined;

        if (dateRange) {
          if (dateRange.start) {
            startEpoch = typeof dateRange.start === 'number'
              ? dateRange.start
              : new Date(dateRange.start).getTime();
          }
          if (dateRange.end) {
            endEpoch = typeof dateRange.end === 'number'
              ? dateRange.end
              : new Date(dateRange.end).getTime();
          }
        } else {
          // Default: 90-day recency window
          startEpoch = Date.now() - SEARCH_CONSTANTS.RECENCY_WINDOW_MS;
        }

        const recentMetadata = chromaResults.metadatas.map((meta, idx) => ({
          id: chromaResults.ids[idx],
          meta,
          isRecent: meta && meta.created_at_epoch != null
            && (!startEpoch || meta.created_at_epoch >= startEpoch)
            && (!endEpoch || meta.created_at_epoch <= endEpoch)
        })).filter(item => item.isRecent);

        logger.debug('SEARCH', dateRange ? 'Results within user date range' : 'Results within 90-day window', { count: recentMetadata.length });

        // Step 3: Categorize IDs by document type
        const obsIds: number[] = [];
        const sessionIds: number[] = [];
        const promptIds: number[] = [];

        for (const item of recentMetadata) {
          const docType = item.meta?.doc_type;
          if (docType === 'observation' && searchObservations) {
            obsIds.push(item.id);
          } else if (docType === 'session_summary' && searchSessions) {
            sessionIds.push(item.id);
          } else if (docType === 'user_prompt' && searchPrompts) {
            promptIds.push(item.id);
          }
        }

        logger.debug('SEARCH', 'Categorized results by type', { observations: obsIds.length, sessions: sessionIds.length, prompts: promptIds.length });

        // Step 4: Hydrate from SQLite with additional filters
        if (obsIds.length > 0) {
          // Apply obs_type, concepts, files filters if provided
          const obsOptions = { ...options, type: obs_type, concepts, files };
          observations = this.sessionStore.getObservationsByIds(obsIds, obsOptions);
        }
        if (sessionIds.length > 0) {
          sessions = this.sessionStore.getSessionSummariesByIds(sessionIds, { orderBy: 'date_desc', limit: options.limit, project: options.project });
        }
        if (promptIds.length > 0) {
          prompts = this.sessionStore.getUserPromptsByIds(promptIds, { orderBy: 'date_desc', limit: options.limit, project: options.project });
        }

        logger.debug('SEARCH', 'Hydrated results from SQLite', { observations: observations.length, sessions: sessions.length, prompts: prompts.length });
      } else {
        // Chroma returned 0 results - this is the correct answer, don't fall back to FTS5
        logger.debug('SEARCH', 'ChromaDB found no matches (final result, no FTS5 fallback)', {});
      }
    } catch (chromaError) {
      const errorObject = chromaError instanceof Error ? chromaError : new Error(String(chromaError));
      chromaFailureReason = {
        message: errorObject.message,
        isConnectionError: chromaError instanceof ChromaUnavailableError,
      };
      logger.warn('SEARCH', 'ChromaDB semantic search failed, falling back to FTS5 keyword search', {}, errorObject);
      chromaFailed = true;

      // Fallback to FTS5 path since Chroma failed
      if (searchObservations) {
        observations = this.sessionSearch.searchObservations(query, { ...options, type: obs_type, concepts, files });
      }
      if (searchSessions) {
        sessions = this.sessionSearch.searchSessions(query, options);
      }
      if (searchPrompts) {
        prompts = this.sessionSearch.searchUserPrompts(query, options);
      }
    }
    // PATH 3: FTS5 KEYWORD SEARCH - ChromaDB not initialized (#1913, #2048)
    else if (query) {
      logger.debug('SEARCH', 'ChromaDB not initialized — falling back to FTS5 keyword search', {});
      try {
@@ -329,11 +356,11 @@ export class SearchManager {
    }

    if (totalResults === 0) {
      if (chromaFailureReason !== null) {
        return {
          content: [{
            type: 'text' as const,
            text: ResultFormatter.formatChromaFailureMessage(chromaFailureReason)
          }]
        };
      }
@@ -1203,265 +1230,6 @@ export class SearchManager {
  }

  /**
   * Tool handler: find_by_concept
   */
  async findByConcept(args: any): Promise<any> {
    const normalized = this.normalizeParams(args);
    const { concepts: concept, ...filters } = normalized;
    let results: ObservationSearchResult[] = [];

    // Metadata-first, semantic-enhanced search
    if (this.chromaSync) {
      logger.debug('SEARCH', 'Using metadata-first + semantic ranking for concept search', {});

      // Step 1: SQLite metadata filter (get all IDs with this concept)
      const metadataResults = this.sessionSearch.findByConcept(concept, filters);
      logger.debug('SEARCH', 'Found observations with concept', { concept, count: metadataResults.length });

      if (metadataResults.length > 0) {
        // Step 2: Chroma semantic ranking (rank by relevance to concept)
        const ids = metadataResults.map(obs => obs.id);
        const chromaResults = await this.queryChroma(concept, Math.min(ids.length, 100));

        // Intersect: Keep only IDs that passed metadata filter, in semantic rank order
        const rankedIds: number[] = [];
        for (const chromaId of chromaResults.ids) {
          if (ids.includes(chromaId) && !rankedIds.includes(chromaId)) {
            rankedIds.push(chromaId);
          }
        }

        logger.debug('SEARCH', 'Chroma ranked results by semantic relevance', { count: rankedIds.length });

        // Step 3: Hydrate in semantic rank order
        if (rankedIds.length > 0) {
          results = this.sessionStore.getObservationsByIds(rankedIds, { limit: filters.limit || 20 });
          // Restore semantic ranking order
          results.sort((a, b) => rankedIds.indexOf(a.id) - rankedIds.indexOf(b.id));
        }
      }
    }

    // Fall back to SQLite-only if Chroma unavailable or failed
    if (results.length === 0) {
      logger.debug('SEARCH', 'Using SQLite-only concept search', {});
      results = this.sessionSearch.findByConcept(concept, filters);
    }

    if (results.length === 0) {
      return {
        content: [{
          type: 'text' as const,
          text: `No observations found with concept "${concept}"`
        }]
      };
    }

    // Format as table
    const header = `Found ${results.length} observation(s) with concept "${concept}"\n\n${this.formatter.formatTableHeader()}`;
    const formattedResults = results.map((obs, i) => this.formatter.formatObservationIndex(obs, i));

    return {
      content: [{
        type: 'text' as const,
        text: header + '\n' + formattedResults.join('\n')
      }]
    };
  }
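The intersect loop above is quadratic: both `ids.includes` and `rankedIds.includes` scan arrays inside a loop. A sketch of the same rank-preserving intersection using Sets for O(1) membership checks; the helper name `rankByChromaOrder` is illustrative, not from this codebase:

```typescript
// Hypothetical helper: keep only candidate IDs, in Chroma's rank order,
// deduplicated, without nested linear scans.
function rankByChromaOrder(chromaIds: number[], candidateIds: number[]): number[] {
  const candidates = new Set(candidateIds); // O(1) membership checks
  const seen = new Set<number>();
  const ranked: number[] = [];
  for (const id of chromaIds) {
    if (candidates.has(id) && !seen.has(id)) {
      seen.add(id);
      ranked.push(id);
    }
  }
  return ranked;
}

console.log(rankByChromaOrder([7, 3, 7, 9, 1], [1, 3, 5])); // [ 3, 1 ]
```

The behavior is identical to the loop in the handlers; only the lookup cost changes.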

  /**
   * Tool handler: find_by_file
   */
  async findByFile(args: any): Promise<any> {
    const normalized = this.normalizeParams(args);
    const { files: rawFilePath, ...filters } = normalized;
    // Handle both string and array (normalizeParams may split on comma)
    const filePath = Array.isArray(rawFilePath) ? rawFilePath[0] : rawFilePath;
    let observations: ObservationSearchResult[] = [];
    let sessions: SessionSummarySearchResult[] = [];

    // Metadata-first, semantic-enhanced search for observations
    if (this.chromaSync) {
      logger.debug('SEARCH', 'Using metadata-first + semantic ranking for file search', {});

      // Step 1: SQLite metadata filter (get all results with this file)
      const metadataResults = this.sessionSearch.findByFile(filePath, filters);
      logger.debug('SEARCH', 'Found results for file', { file: filePath, observations: metadataResults.observations.length, sessions: metadataResults.sessions.length });

      // Sessions: Keep as-is (already summarized, no semantic ranking needed)
      sessions = metadataResults.sessions;

      // Observations: Apply semantic ranking
      if (metadataResults.observations.length > 0) {
        // Step 2: Chroma semantic ranking (rank by relevance to file path)
        const ids = metadataResults.observations.map(obs => obs.id);
        const chromaResults = await this.queryChroma(filePath, Math.min(ids.length, 100));

        // Intersect: Keep only IDs that passed metadata filter, in semantic rank order
        const rankedIds: number[] = [];
        for (const chromaId of chromaResults.ids) {
          if (ids.includes(chromaId) && !rankedIds.includes(chromaId)) {
            rankedIds.push(chromaId);
          }
        }

        logger.debug('SEARCH', 'Chroma ranked observations by semantic relevance', { count: rankedIds.length });

        // Step 3: Hydrate in semantic rank order
        if (rankedIds.length > 0) {
          observations = this.sessionStore.getObservationsByIds(rankedIds, { limit: filters.limit || 20 });
          // Restore semantic ranking order
          observations.sort((a, b) => rankedIds.indexOf(a.id) - rankedIds.indexOf(b.id));
        }
      }
    }

    // Fall back to SQLite-only if Chroma unavailable or failed
    if (observations.length === 0 && sessions.length === 0) {
      logger.debug('SEARCH', 'Using SQLite-only file search', {});
      const results = this.sessionSearch.findByFile(filePath, filters);
      observations = results.observations;
      sessions = results.sessions;
    }

    const totalResults = observations.length + sessions.length;

    if (totalResults === 0) {
      return {
        content: [{
          type: 'text' as const,
          text: `No results found for file "${filePath}"`
        }]
      };
    }

    // Combine observations and sessions with timestamps for date grouping
    const combined: Array<{
      type: 'observation' | 'session';
      data: ObservationSearchResult | SessionSummarySearchResult;
      epoch: number;
      created_at: string;
    }> = [
      ...observations.map(obs => ({
        type: 'observation' as const,
        data: obs,
        epoch: obs.created_at_epoch,
        created_at: obs.created_at
      })),
      ...sessions.map(sess => ({
        type: 'session' as const,
        data: sess,
        epoch: sess.created_at_epoch,
        created_at: sess.created_at
      }))
    ];

    // Sort by date (most recent first)
    combined.sort((a, b) => b.epoch - a.epoch);

    // Group by date for proper timeline rendering
    const resultsByDate = groupByDate(combined, item => item.created_at);

    // Format with date headers for proper date parsing by the folder CLAUDE.md generator
    const lines: string[] = [];
    lines.push(`Found ${totalResults} result(s) for file "${filePath}"`);
    lines.push('');

    for (const [day, dayResults] of resultsByDate) {
      lines.push(`### ${day}`);
      lines.push('');
      lines.push(this.formatter.formatTableHeader());

      for (const result of dayResults) {
        if (result.type === 'observation') {
          lines.push(this.formatter.formatObservationIndex(result.data as ObservationSearchResult, 0));
        } else {
          lines.push(this.formatter.formatSessionIndex(result.data as SessionSummarySearchResult, 0));
        }
      }
      lines.push('');
    }

    return {
      content: [{
        type: 'text' as const,
        text: lines.join('\n')
      }]
    };
  }
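The combine, sort, group-by-day pipeline above can be sketched standalone. `groupByDate`'s real signature is not shown in this excerpt, so the Map-based `groupByDay` helper here is an assumption about its shape:

```typescript
// Minimal sketch of the combine -> sort -> group-by-day pipeline, assuming a
// Map-based grouping helper keyed by the 'YYYY-MM-DD' prefix of an ISO timestamp.
function groupByDay<T>(items: T[], key: (item: T) => string): Map<string, T[]> {
  const groups = new Map<string, T[]>();
  for (const item of items) {
    const day = key(item).slice(0, 10); // 'YYYY-MM-DD'
    const bucket = groups.get(day) ?? [];
    bucket.push(item);
    groups.set(day, bucket);
  }
  return groups;
}

const combinedSketch = [
  { epoch: 300, created_at: '2026-04-22T10:00:00Z' },
  { epoch: 100, created_at: '2026-04-21T09:00:00Z' },
  { epoch: 200, created_at: '2026-04-22T08:00:00Z' },
];
combinedSketch.sort((a, b) => b.epoch - a.epoch); // most recent first
const byDay = groupByDay(combinedSketch, item => item.created_at);
console.log([...byDay.keys()]); // [ '2026-04-22', '2026-04-21' ]
```

Because Map preserves insertion order, sorting before grouping is what makes the date headers come out newest-first.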

  /**
   * Tool handler: find_by_type
   */
  async findByType(args: any): Promise<any> {
    const normalized = this.normalizeParams(args);
    const { type, ...filters } = normalized;
    const typeStr = Array.isArray(type) ? type.join(', ') : type;
    let results: ObservationSearchResult[] = [];

    // Metadata-first, semantic-enhanced search
    if (this.chromaSync) {
      logger.debug('SEARCH', 'Using metadata-first + semantic ranking for type search', {});

      // Step 1: SQLite metadata filter (get all IDs with this type)
      const metadataResults = this.sessionSearch.findByType(type, filters);
      logger.debug('SEARCH', 'Found observations with type', { type: typeStr, count: metadataResults.length });

      if (metadataResults.length > 0) {
        // Step 2: Chroma semantic ranking (rank by relevance to type)
        const ids = metadataResults.map(obs => obs.id);
        const chromaResults = await this.queryChroma(typeStr, Math.min(ids.length, 100));

        // Intersect: Keep only IDs that passed metadata filter, in semantic rank order
        const rankedIds: number[] = [];
        for (const chromaId of chromaResults.ids) {
          if (ids.includes(chromaId) && !rankedIds.includes(chromaId)) {
            rankedIds.push(chromaId);
          }
        }

        logger.debug('SEARCH', 'Chroma ranked results by semantic relevance', { count: rankedIds.length });

        // Step 3: Hydrate in semantic rank order
        if (rankedIds.length > 0) {
          results = this.sessionStore.getObservationsByIds(rankedIds, { limit: filters.limit || 20 });
          // Restore semantic ranking order
          results.sort((a, b) => rankedIds.indexOf(a.id) - rankedIds.indexOf(b.id));
        }
      }
    }

    // Fall back to SQLite-only if Chroma unavailable or failed
    if (results.length === 0) {
      logger.debug('SEARCH', 'Using SQLite-only type search', {});
      results = this.sessionSearch.findByType(type, filters);
    }

    if (results.length === 0) {
      return {
        content: [{
          type: 'text' as const,
          text: `No observations found with type "${typeStr}"`
        }]
      };
    }

    // Format as table
    const header = `Found ${results.length} observation(s) with type "${typeStr}"\n\n${this.formatter.formatTableHeader()}`;
    const formattedResults = results.map((obs, i) => this.formatter.formatObservationIndex(obs, i));

    return {
      content: [{
        type: 'text' as const,
        text: header + '\n' + formattedResults.join('\n')
      }]
    };
  }

  /**
   * Tool handler: get_recent_context
   */

@@ -14,75 +14,10 @@ import { logger } from '../../utils/logger.js';
import type { ActiveSession, PendingMessage, PendingMessageWithId, ObservationData } from '../worker-types.js';
import { PendingMessageStore } from '../sqlite/PendingMessageStore.js';
import { SessionQueueProcessor } from '../queue/SessionQueueProcessor.js';
import { getProcessBySession, ensureProcessExit } from './ProcessRegistry.js';
import { getSdkProcessForSession, ensureSdkProcessExit } from '../../supervisor/process-registry.js';
import { getSupervisor } from '../../supervisor/index.js';
import { MAX_CONSECUTIVE_SUMMARY_FAILURES } from '../../sdk/prompts.js';
import { RestartGuard } from './RestartGuard.js';

/** Idle threshold before a stuck generator (zombie subprocess) is force-killed. */
export const MAX_GENERATOR_IDLE_MS = 5 * 60 * 1000; // 5 minutes

/** Idle threshold before a no-generator session with no pending work is reaped. */
export const MAX_SESSION_IDLE_MS = 15 * 60 * 1000; // 15 minutes

/**
 * Minimal process interface used by detectStaleGenerator — compatible with
 * both the real Bun.Subprocess / ChildProcess shapes and test mocks.
 */
export interface StaleGeneratorProcess {
  exitCode: number | null;
  kill(signal?: string): boolean | void;
}

/**
 * Minimal session fields required to evaluate stale-generator status.
 * This is a subset of ActiveSession, allowing unit tests to pass plain objects.
 */
export interface StaleGeneratorCandidate {
  generatorPromise: Promise<void> | null;
  lastGeneratorActivity: number;
  abortController: AbortController;
}

/**
 * Detect whether a session's generator is stuck (zombie subprocess) and, if so,
 * SIGKILL the subprocess and abort the controller.
 *
 * Extracted from reapStaleSessions() so tests can import and exercise the exact
 * same logic rather than duplicating it locally. (Issue #1652)
 *
 * @param session - session to inspect
 * @param proc - tracked subprocess (may be undefined if not in ProcessRegistry)
 * @param now - current timestamp (defaults to Date.now(); pass an explicit value in tests)
 * @returns true if the session was marked stale, false otherwise
 */
export function detectStaleGenerator(
  session: StaleGeneratorCandidate,
  proc: StaleGeneratorProcess | undefined,
  now = Date.now()
): boolean {
  if (!session.generatorPromise) return false;

  const generatorIdleMs = now - session.lastGeneratorActivity;
  if (generatorIdleMs <= MAX_GENERATOR_IDLE_MS) return false;

  // Kill subprocess to unblock stuck for-await
  if (proc && proc.exitCode === null) {
    try {
      proc.kill('SIGKILL');
    } catch (error) {
      if (error instanceof Error) {
        logger.warn('SESSION', 'Failed to SIGKILL stale generator subprocess', {}, error);
      } else {
        logger.warn('SESSION', 'Failed to SIGKILL stale generator subprocess with non-Error', {}, new Error(String(error)));
      }
    }
  }
  // Signal the SDK agent loop to exit
  session.abortController.abort();
  return true;
}
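The docstring above says tests can import detectStaleGenerator and drive it with plain objects and an injected `now`. A sketch of such a test; the threshold and check are re-inlined here for self-containment (a real test would import the function rather than redefine it), and `aborted` stands in for the AbortController:

```typescript
// Self-contained mirror of the stale-generator check, for illustration only.
const IDLE_LIMIT_MS = 5 * 60 * 1000; // mirrors MAX_GENERATOR_IDLE_MS

interface MockProc { exitCode: number | null; killed: string | null; kill(sig?: string): void; }
interface MockSession { generatorPromise: object | null; lastGeneratorActivity: number; aborted: boolean; }

function detectStale(session: MockSession, proc: MockProc | undefined, now: number): boolean {
  if (!session.generatorPromise) return false;
  if (now - session.lastGeneratorActivity <= IDLE_LIMIT_MS) return false;
  if (proc && proc.exitCode === null) proc.kill('SIGKILL');
  session.aborted = true; // stands in for abortController.abort()
  return true;
}

const proc: MockProc = { exitCode: null, killed: null, kill(sig) { this.killed = sig ?? null; } };
const session: MockSession = { generatorPromise: {}, lastGeneratorActivity: 0, aborted: false };

// 6 minutes of idle time exceeds the 5-minute threshold.
console.log(detectStale(session, proc, 6 * 60 * 1000)); // true
console.log(proc.killed); // SIGKILL
```

Injecting `now` is what makes the check deterministic in tests: no fake timers needed.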

export class SessionManager {
  private dbManager: DatabaseManager;
  private sessions: Map<number, ActiveSession> = new Map();
@@ -229,7 +164,6 @@ export class SessionManager {
      restartGuard: new RestartGuard(),
      processingMessageIds: [], // CLAIM-CONFIRM: Track message IDs for confirmProcessed()
      lastGeneratorActivity: Date.now(), // Initialize for stale detection (Issue #1099)
      pendingAgentId: null, // Subagent identity carried from the most recent claimed message
      pendingAgentType: null // (null for main-session messages)
    };

@@ -289,16 +223,28 @@ export class SessionManager {
      prompt_number: data.prompt_number,
      cwd: data.cwd,
      agentId: data.agentId,
      agentType: data.agentType,
      toolUseId: data.toolUseId,
    };

    try {
      const messageId = this.getPendingStore().enqueue(sessionDbId, session.contentSessionId, message);
      const queueDepth = this.getPendingStore().getPendingCount(sessionDbId);
      const toolSummary = logger.formatTool(data.tool_name, data.tool_input);
      // enqueue returns 0 on INSERT OR IGNORE conflict (UNIQUE(session_id, tool_use_id)
      // — Plan 01 Phase 1). The duplicate is correctly suppressed by the DB; surface
      // it visibly so it isn't misread as "messageId=0 was inserted." Per
      // Principle 3 (UNIQUE constraint over dedup window) this is the success path
      // for replayed transcript lines, not an error.
      if (messageId === 0) {
        logger.debug('QUEUE', `DUP_SUPPRESSED | sessionDbId=${sessionDbId} | type=observation | tool=${toolSummary} | toolUseId=${data.toolUseId ?? 'null'} | depth=${queueDepth}`, {
          sessionId: sessionDbId
        });
      } else {
        logger.info('QUEUE', `ENQUEUED | sessionDbId=${sessionDbId} | messageId=${messageId} | type=observation | tool=${toolSummary} | depth=${queueDepth}`, {
          sessionId: sessionDbId
        });
      }
    } catch (error) {
      if (error instanceof Error) {
        logger.error('SESSION', 'Failed to persist observation to DB', {
@@ -333,17 +279,10 @@ export class SessionManager {
      session = this.initializeSession(sessionDbId);
    }

    // PATHFINDER plan 03 phase 3: summary-failure circuit breaker deleted.
    // Each failed parse is independently marked failed via the retry ladder
    // in PendingMessageStore.markFailed; a storm of bad parses surfaces as
    // retry exhaustion, not as silent suppression of further requests.

    // CRITICAL: Persist to database FIRST
    const message: PendingMessage = {
@@ -354,9 +293,16 @@ export class SessionManager {
    try {
      const messageId = this.getPendingStore().enqueue(sessionDbId, session.contentSessionId, message);
      const queueDepth = this.getPendingStore().getPendingCount(sessionDbId);
      // See the queueObservation note: messageId=0 means a UNIQUE-suppressed duplicate.
      if (messageId === 0) {
        logger.debug('QUEUE', `DUP_SUPPRESSED | sessionDbId=${sessionDbId} | type=summarize | depth=${queueDepth}`, {
          sessionId: sessionDbId
        });
      } else {
        logger.info('QUEUE', `ENQUEUED | sessionDbId=${sessionDbId} | messageId=${messageId} | type=summarize | depth=${queueDepth}`, {
          sessionId: sessionDbId
        });
      }
    } catch (error) {
      if (error instanceof Error) {
        logger.error('SESSION', 'Failed to persist summarize to DB', {
@@ -402,19 +348,21 @@ export class SessionManager {
      });
    }

    // 3. Verify subprocess exit with 5s timeout. Process-group teardown is
    // used internally so any SDK descendants are killed too (Principle 5).
    const tracked = getSdkProcessForSession(sessionDbId);
    if (tracked && tracked.process.exitCode === null) {
      logger.debug('SESSION', `Waiting for subprocess PID ${tracked.pid} (pgid ${tracked.pgid}) to exit`, {
        sessionId: sessionDbId,
        pid: tracked.pid,
        pgid: tracked.pgid
      });
      await ensureSdkProcessExit(tracked, 5000);
    }
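Per the commit message, Plan 02 prevents orphans with detached spawn plus kill(-pgid). That mechanism can be sketched with Node's child_process; the helper names `spawnInGroup` and `killGroup` are illustrative, and negative-pid kill is POSIX-only:

```typescript
// Sketch of POSIX process-group teardown: spawn detached so the child leads
// its own process group, then signal the whole group with a negative pid.
import { spawn, type ChildProcess } from 'node:child_process';

function spawnInGroup(cmd: string, args: string[]): ChildProcess {
  // detached: true makes the child a process-group leader (pgid === child.pid),
  // so SDK descendants it spawns share its group.
  return spawn(cmd, args, { detached: true, stdio: 'ignore' });
}

function killGroup(child: ChildProcess, signal: NodeJS.Signals = 'SIGKILL'): void {
  if (child.pid == null) return;
  try {
    process.kill(-child.pid, signal); // negative pid targets the whole group
  } catch {
    // ESRCH: group already gone; nothing to do
  }
}

const child = spawnInGroup('sleep', ['60']);
killGroup(child);
```

Killing the group rather than the single pid is what closes the orphan window: descendants die even if the direct child already exited.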

    // 3b. Reap all supervisor-tracked processes for this session (#1351).
    // Catches MCP servers and other child processes registered only in
    // supervisor.json that the in-process tracking would not see.
    try {
      await getSupervisor().getRegistry().reapSession(sessionDbId);
    } catch (error) {
@@ -467,106 +415,6 @@ export class SessionManager {
    }
  }

  /**
   * Evict the idlest session to free a pool slot (#1868).
   * An "idle" session has an active generator but no pending work — it's sitting
   * in the 3-min idle wait before subprocess cleanup. Evicting it triggers abort,
   * which kills the subprocess and frees the pool slot for a waiting new session.
   * @returns true if a session was evicted, false if no idle sessions found
   */
  evictIdlestSession(): boolean {
    let idlestSessionId: number | null = null;
    let oldestActivity = Infinity;

    for (const [sessionDbId, session] of this.sessions) {
      if (!session.generatorPromise) continue; // No generator = no slot held
      const pendingCount = this.getPendingStore().getPendingCount(sessionDbId);
      if (pendingCount > 0) continue; // Has work to do, don't evict

      // Pick the session with the oldest lastGeneratorActivity (idlest)
      if (session.lastGeneratorActivity < oldestActivity) {
        oldestActivity = session.lastGeneratorActivity;
        idlestSessionId = sessionDbId;
      }
    }

    if (idlestSessionId === null) return false;

    const session = this.sessions.get(idlestSessionId);
    if (!session) return false;

    logger.info('SESSION', 'Evicting idle session to free pool slot for new request (#1868)', {
      sessionDbId: idlestSessionId,
      idleDurationMs: Date.now() - oldestActivity
    });

    session.idleTimedOut = true;
    session.abortController.abort();
    return true;
  }
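The eviction scan above is a single-pass minimum search over evictable candidates. The core of it, extracted as a pure function (names here are illustrative, not from the codebase):

```typescript
// Standalone sketch of the "pick the idlest" scan: one pass tracking the
// minimum lastGeneratorActivity among candidates that hold a pool slot.
interface EvictionCandidate { id: number; lastGeneratorActivity: number; evictable: boolean; }

function pickIdlest(candidates: EvictionCandidate[]): number | null {
  let idlestId: number | null = null;
  let oldest = Infinity;
  for (const c of candidates) {
    if (!c.evictable) continue; // mirrors the generator/pending-work guards
    if (c.lastGeneratorActivity < oldest) {
      oldest = c.lastGeneratorActivity;
      idlestId = c.id;
    }
  }
  return idlestId;
}

console.log(pickIdlest([
  { id: 1, lastGeneratorActivity: 500, evictable: true },
  { id: 2, lastGeneratorActivity: 100, evictable: true },
  { id: 3, lastGeneratorActivity: 50, evictable: false },
])); // 2
```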

  /**
   * Reap sessions with no active generator and no pending work that have been idle too long.
   * Also reaps sessions whose generator has been stuck (no lastGeneratorActivity update) for
   * longer than MAX_GENERATOR_IDLE_MS — these are zombie subprocesses that will never exit
   * on their own because the orphan reaper skips sessions in the active sessions map. (Issue #1652)
   *
   * This unblocks the orphan reaper, which skips processes for "active" sessions. (Issue #1168)
   */
  async reapStaleSessions(): Promise<number> {
    const now = Date.now();
    const staleSessionIds: number[] = [];

    for (const [sessionDbId, session] of this.sessions) {
      // Sessions with active generators — check for stuck/zombie generators (Issue #1652)
      if (session.generatorPromise) {
        const generatorIdleMs = now - session.lastGeneratorActivity;
        if (generatorIdleMs > MAX_GENERATOR_IDLE_MS) {
          logger.warn('SESSION', `Stale generator detected for session ${sessionDbId} (no activity for ${Math.round(generatorIdleMs / 60000)}m) — force-killing subprocess`, {
            sessionDbId,
            generatorIdleMs
          });
          // Force-kill the subprocess to unblock the stuck for-await in SDKAgent.
          // Without this the generator is blocked on `for await (const msg of queryResult)`
          // and will never exit even after abort() is called.
          const trackedProcess = getProcessBySession(sessionDbId);
          if (trackedProcess && trackedProcess.process.exitCode === null) {
            try {
              trackedProcess.process.kill('SIGKILL');
            } catch (err) {
              if (err instanceof Error) {
                logger.warn('SESSION', 'Failed to SIGKILL subprocess for stale generator', { sessionDbId }, err);
              } else {
                logger.warn('SESSION', 'Failed to SIGKILL subprocess for stale generator with non-Error', { sessionDbId }, new Error(String(err)));
              }
            }
          }
          // Signal the SDK agent loop to exit after the subprocess dies
          session.abortController.abort();
          staleSessionIds.push(sessionDbId);
        }
        continue;
      }

      // Skip sessions with pending work
      const pendingCount = this.getPendingStore().getPendingCount(sessionDbId);
      if (pendingCount > 0) continue;

      // No generator + no pending work + old enough = stale
      const sessionAge = now - session.startTime;
      if (sessionAge > MAX_SESSION_IDLE_MS) {
        logger.warn('SESSION', `Reaping idle session ${sessionDbId} (no activity for >${Math.round(MAX_SESSION_IDLE_MS / 60000)}m)`, { sessionDbId });
        staleSessionIds.push(sessionDbId);
      }
    }

    for (const sessionDbId of staleSessionIds) {
      await this.deleteSession(sessionDbId);
    }

    return staleSessionIds.length;
  }
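reapStaleSessions collects stale ids in one pass and deletes in a second. The two-phase shape matters because deletion is async: awaiting inside the scan would let the map change under the iterator between awaits. A generic sketch of the pattern (names are illustrative):

```typescript
// Two-phase reap: scan for stale keys first, then delete, so the synchronous
// scan sees a consistent snapshot and async deletion cannot interleave with it.
async function reapStale<K, V>(
  sessions: Map<K, V>,
  isStale: (v: V) => boolean,
  deleteSession: (k: K) => Promise<void>
): Promise<number> {
  const stale: K[] = [];
  for (const [k, v] of sessions) {
    if (isStale(v)) stale.push(k); // no awaits inside the scan
  }
  for (const k of stale) {
    await deleteSession(k);
  }
  return stale.length;
}
```

Sequential awaits in the second loop also keep teardown (subprocess kills, DB writes) from running concurrently across sessions.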

  /**
   * Shutdown all active sessions
   */

@@ -37,7 +37,9 @@ export class SettingsManager {
    for (const row of rows) {
      const key = row.key as keyof ViewerSettings;
      if (key in settings) {
        // Object.assign narrows correctly across the discriminated union
        // where `settings[key] = value` would collapse to `never`.
        Object.assign(settings, { [key]: JSON.parse(row.value) });
      }
    }
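The Object.assign comment above reflects a real TypeScript behavior: writing through an index typed `keyof T` requires the value to be assignable to the intersection of all property types, which collapses to `never` when they are disjoint. A minimal reproduction with a hypothetical settings shape (not the real ViewerSettings):

```typescript
// Minimal reproduction of the indexed-assignment problem the comment describes.
// The Settings shape here is illustrative.
interface DemoSettings { theme: 'light' | 'dark'; fontSize: number; }

function applySetting(settings: DemoSettings, key: keyof DemoSettings, value: DemoSettings[keyof DemoSettings]): void {
  // settings[key] = value;                   // error: not assignable to 'never'
  Object.assign(settings, { [key]: value }); // accepted; same runtime effect
}

const demo: DemoSettings = { theme: 'light', fontSize: 12 };
applySetting(demo, 'fontSize', 14);
console.log(demo.fontSize); // 14
```

Object.assign sidesteps the check because its target parameter is not indexed by `key`; the trade-off is that the per-key value type is no longer verified at compile time.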
|
||||
|
||||
|
||||
@@ -12,8 +12,8 @@
|
||||
*/
|
||||
|
||||
import { logger } from '../../../utils/logger.js';
|
||||
import { parseObservations, parseSummary, type ParsedObservation, type ParsedSummary } from '../../../sdk/parser.js';
|
||||
import { SUMMARY_MODE_MARKER, MAX_CONSECUTIVE_SUMMARY_FAILURES } from '../../../sdk/prompts.js';
|
||||
import { parseAgentXml, type ParsedObservation, type ParsedSummary } from '../../../sdk/parser.js';
|
||||
import { ingestSummary } from '../http/shared.js';
|
||||
import { updateCursorContextForProject } from '../../integrations/CursorHooksInstaller.js';
|
||||
import { notifyTelegram } from '../../integrations/TelegramNotifier.js';
|
||||
import { updateFolderClaudeMdFiles } from '../../../utils/claude-md-utils.js';
|
||||
@@ -67,39 +67,16 @@ export async function processAgentResponse(
    session.conversationHistory.push({ role: 'assistant', content: text });
  }

  // Parse observations and summary
  const observations = parseObservations(text, session.contentSessionId);
  // Single fail-fast parse (PATHFINDER plan 03 phase 1+2). On invalid XML,
  // mark each in-flight pending message failed and stop. The PendingMessageStore
  // retry ladder is the legitimate primary-path surface for transient failures;
  // there is no circuit breaker, no coercion.
  const parsed = parseAgentXml(text, session.contentSessionId);

  // Detect whether the most recent prompt was a summary request.
  // If so, enable observation-to-summary coercion to prevent the infinite
  // retry loop described in #1633.
  const lastMessage = session.conversationHistory.at(-1);
  const lastUserMessage = lastMessage?.role === 'user'
    ? lastMessage
    : session.conversationHistory.findLast(m => m.role === 'user') ?? null;
  const summaryExpected = lastUserMessage?.content?.includes(SUMMARY_MODE_MARKER) ?? false;

  const summary = parseSummary(text, session.sessionDbId, summaryExpected);

  // Detect non-XML responses (auth errors, rate limits, garbled output).
  // When the response contains no parseable XML and produced no observations,
  // mark the pending messages as failed instead of confirming them — this prevents
  // silent data loss when the LLM returns garbage (#1874).
  const isNonXmlResponse = (
    text.trim() &&
    observations.length === 0 &&
    !summary &&
    !/<observation>|<summary>|<skip_summary\b/.test(text)
  );

  if (isNonXmlResponse) {
    const preview = text.length > 200 ? `${text.slice(0, 200)}...` : text;
    logger.warn('PARSER', `${agentName} returned non-XML response; marking messages as failed for retry (#1874)`, {
  if (!parsed.valid) {
    logger.warn('PARSER', `${agentName} returned unparseable response: ${parsed.reason}`, {
      sessionId: session.sessionDbId,
      preview
    });

    // Mark messages as failed (retry logic in PendingMessageStore handles retries)
    const pendingStore = sessionManager.getPendingMessageStore();
    for (const messageId of session.processingMessageIds) {
      pendingStore.markFailed(messageId);
@@ -108,6 +85,17 @@ export async function processAgentResponse(
    return;
  }

  let observations: ParsedObservation[] = [];
  let summary: ParsedSummary | null = null;
  if (parsed.kind === 'observation') {
    observations = parsed.data;
  } else if (!parsed.data.skipped) {
    // `<skip_summary/>` is a first-class parser result but carries nothing to
    // persist; the summary storage path is skipped entirely so storeObservations
    // does not see an empty record.
    summary = parsed.data;
  }

  // Convert nullable fields to empty strings for storeSummary (if summary exists)
  const summaryForStore = normalizeSummaryForStorage(summary);

@@ -174,30 +162,23 @@ export async function processAgentResponse(
  // to the Stop hook for silent-summary-loss detection (#1633)
  session.lastSummaryStored = result.summaryId !== null;

  // Circuit breaker: track consecutive summary failures (#1633).
  // Only evaluate when a summary was actually expected (summarize message was sent).
  // Without this guard, the counter would increment on every normal observation
  // response, tripping the breaker after 3 observations and permanently blocking
  // summarization — reproducing the data-loss scenario this fix is meant to prevent.
  if (summaryExpected) {
    const skippedIntentionally = /<skip_summary\b/.test(text);
    if (summaryForStore !== null) {
      // Summary was present in the response — reset the failure counter
      session.consecutiveSummaryFailures = 0;
    } else if (skippedIntentionally) {
      // Explicit <skip_summary/> is a valid protocol response — neither success
      // nor failure. Leave the counter unchanged so we don't mask a bad run that
      // happens to end on a skip, but also don't punish intentional skips.
    } else {
      // Summary was expected but none was stored — count as failure
      session.consecutiveSummaryFailures += 1;
      if (session.consecutiveSummaryFailures >= MAX_CONSECUTIVE_SUMMARY_FAILURES) {
        logger.error('SESSION', `Circuit breaker: ${session.consecutiveSummaryFailures} consecutive summary failures — further summarize requests will be skipped (#1633)`, {
          sessionId: session.sessionDbId,
          contentSessionId: session.contentSessionId
        });
      }
    }
  // Gate ingestSummary({kind:'parsed'}) on real persistence so the event bus
  // only fires for summaries that actually landed in the DB. Skipped summaries
  // (<skip_summary/>) are an explicit bypass and still notify.
  if (parsed.kind === 'summary' && (parsed.data.skipped || session.lastSummaryStored)) {
    const messageId = session.processingMessageIds[0] ?? -1;
    ingestSummary({
      kind: 'parsed',
      sessionDbId: session.sessionDbId,
      messageId,
      contentSessionId: session.contentSessionId,
      parsed: parsed.data,
    });
  } else if (parsed.kind === 'summary') {
    logger.warn('DB', 'summary parsed but no row persisted; suppressing summaryStoredEvent', {
      sessionId: session.sessionDbId,
      memorySessionId: session.memorySessionId,
    });
  }

  // CLAIM-CONFIRM: Now that storage succeeded, confirm all processing messages (delete from queue)
@@ -342,7 +323,7 @@ async function syncAndBroadcastObservations(
  // Only runs if CLAUDE_MEM_FOLDER_CLAUDEMD_ENABLED is true (default: false)
  const settings = SettingsDefaultsManager.loadFromFile(USER_SETTINGS_PATH);
  // Handle both string 'true' and boolean true from JSON settings
  const settingValue = settings.CLAUDE_MEM_FOLDER_CLAUDEMD_ENABLED;
  const settingValue: unknown = settings.CLAUDE_MEM_FOLDER_CLAUDEMD_ENABLED;
  const folderClaudeMdEnabled = settingValue === 'true' || settingValue === true;

  if (folderClaudeMdEnabled) {

@@ -47,20 +47,6 @@ export abstract class BaseRouteHandler {
    return value;
  }

  /**
   * Validate required body parameters
   * Returns true if all required params present, sends 400 error otherwise
   */
  protected validateRequired(req: Request, res: Response, params: string[]): boolean {
    for (const param of params) {
      if (req.body[param] === undefined || req.body[param] === null) {
        this.badRequest(res, `Missing ${param}`);
        return false;
      }
    }
    return true;
  }

  /**
   * Send 400 Bad Request response
   */

@@ -42,42 +42,6 @@ export function createMiddleware(
    credentials: false
  }));

  // Simple in-memory rate limiter (#1935).
  // Worker binds localhost-only, so in practice this is a global 300 req/min
  // cap — every caller shares the 127.0.0.1/::1 bucket.
  const requestCounts = new Map<string, { count: number; resetAt: number }>();
  const RATE_LIMIT_WINDOW_MS = 60_000;
  const RATE_LIMIT_MAX_REQUESTS = 300;

  const rateLimiter: RequestHandler = (req, res, next) => {
    // Normalise IPv4-mapped IPv6 so 127.0.0.1 and ::ffff:127.0.0.1 share a bucket.
    const clientIp = (req.socket.remoteAddress ?? req.ip ?? 'unknown').replace(/^::ffff:/, '');
    const now = Date.now();
    let entry = requestCounts.get(clientIp);

    if (!entry || now >= entry.resetAt) {
      // Safety valve in case the worker is ever bound non-localhost.
      if (requestCounts.size > 1000) {
        for (const [ip, e] of requestCounts) {
          if (now >= e.resetAt) requestCounts.delete(ip);
        }
      }
      entry = { count: 0, resetAt: now + RATE_LIMIT_WINDOW_MS };
      requestCounts.set(clientIp, entry);
    }

    if (entry.count >= RATE_LIMIT_MAX_REQUESTS) {
      res.set('Retry-After', String(Math.ceil((entry.resetAt - now) / 1000)));
      res.status(429).json({ error: 'Rate limit exceeded' });
      return;
    }
    entry.count++;

    next();
  };

  middlewares.push(rateLimiter);

  // HTTP request/response logging
  middlewares.push((req: Request, res: Response, next: NextFunction) => {
    // Skip logging for static assets, health checks, and polling endpoints

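The fixed-window scheme above can be sketched without Express: one bucket per client key, a shared window length, and a hard per-window cap. A dependency-free sketch with illustrative names (`allowRequest` and the constants are not the middleware's real API):

```typescript
const WINDOW_MS = 60_000;
const MAX_REQUESTS = 300;

interface Bucket { count: number; resetAt: number }
const buckets = new Map<string, Bucket>();

// Returns true when the request is allowed, false when the per-window cap is hit.
function allowRequest(clientKey: string, now: number): boolean {
  let bucket = buckets.get(clientKey);
  if (!bucket || now >= bucket.resetAt) {
    // Window expired (or first request from this key): start a fresh bucket.
    bucket = { count: 0, resetAt: now + WINDOW_MS };
    buckets.set(clientKey, bucket);
  }
  if (bucket.count >= MAX_REQUESTS) return false;
  bucket.count++;
  return true;
}
```

Like the middleware, expired buckets are recycled lazily on the next request rather than by a timer, which is why the real code needs the size-capped sweep as a safety valve.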
@@ -0,0 +1,37 @@
/**
 * Zod body-validation middleware — PATHFINDER-2026-04-22 Plan 06 Phase 2.
 *
 * Canonical signature: given a Zod schema, parse `req.body` with `safeParse`.
 * On failure, respond 400 with `{ error: 'ValidationError', issues: [...] }`
 * and stop. On success, replace `req.body` with the parsed (typed) value and
 * call `next()`.
 *
 * Principles:
 * - Principle 2 — Fail-fast over grace-degrade. No try/catch swallow,
 *   no coercion, no "best-effort" defaults.
 * - Principle 6 — One helper, N callers. Every validated POST/PUT
 *   across `src/services/worker/http/routes/` uses this one middleware
 *   wrapped around a per-route Zod schema declared at the top of its
 *   owning route file.
 */

import type { RequestHandler } from 'express';
import type { ZodTypeAny } from 'zod';

export const validateBody = <S extends ZodTypeAny>(schema: S): RequestHandler =>
  (req, res, next) => {
    const result = schema.safeParse(req.body);
    if (!result.success) {
      res.status(400).json({
        error: 'ValidationError',
        issues: result.error.issues.map(i => ({
          path: i.path,
          message: i.message,
          code: i.code,
        })),
      });
      return;
    }
    req.body = result.data;
    next();
  };
@@ -0,0 +1,78 @@
/**
 * Chroma Routes
 *
 * Provides diagnostic endpoints for ChromaDB integration.
 */

import express, { Request, Response } from 'express';
import { BaseRouteHandler } from '../BaseRouteHandler.js';
import { ChromaMcpManager } from '../../../sync/ChromaMcpManager.js';
import { logger } from '../../../../utils/logger.js';
import { SettingsDefaultsManager } from '../../../../shared/SettingsDefaultsManager.js';
import { USER_SETTINGS_PATH } from '../../../../shared/paths.js';

export class ChromaRoutes extends BaseRouteHandler {
  setupRoutes(app: express.Application): void {
    app.get('/api/chroma/status', this.handleGetStatus.bind(this));
  }

  /**
   * GET /api/chroma/status
   * Returns current health and connection status of chroma-mcp.
   *
   * Pass `?deep=1` (or `?deep=true`) to additionally run a real
   * semantic-search round-trip via ChromaMcpManager.probeSemanticSearch().
   * The cheap path (no `deep`) stays cheap — it only calls isHealthy().
   */
  private handleGetStatus = this.wrapHandler(async (req: Request, res: Response): Promise<void> => {
    const settings = SettingsDefaultsManager.loadFromFile(USER_SETTINGS_PATH);
    const chromaEnabled = settings.CLAUDE_MEM_CHROMA_ENABLED !== 'false';

    // Truthy check: any non-empty, non-"false"/"0" value enables deep probe.
    // Bare `?deep` (no value) shows up as '' in Express, which we treat as enabled.
    const deepRaw = req.query.deep;
    const deepEnabled =
      deepRaw !== undefined &&
      deepRaw !== 'false' &&
      deepRaw !== '0';

    if (!chromaEnabled) {
      res.json({
        status: 'disabled',
        connected: false,
        timestamp: new Date().toISOString(),
        details: 'Chroma is disabled via CLAUDE_MEM_CHROMA_ENABLED=false',
        deep: deepEnabled
      });
      return;
    }

    const chromaMcp = ChromaMcpManager.getInstance();
    const isHealthy = await chromaMcp.isHealthy();

    if (!deepEnabled) {
      res.json({
        status: isHealthy ? 'healthy' : 'unhealthy',
        connected: isHealthy,
        timestamp: new Date().toISOString(),
        details: isHealthy ? 'chroma-mcp is responding to tool calls' : 'chroma-mcp health check failed',
        deep: false
      });
      return;
    }

    const probe = await chromaMcp.probeSemanticSearch();
    const status = probe.ok ? 'healthy' : 'unhealthy';

    res.json({
      status,
      connected: isHealthy,
      timestamp: new Date().toISOString(),
      details: probe.ok
        ? 'chroma-mcp semantic search round-trip succeeded'
        : `chroma-mcp deep probe failed at stage '${probe.stage}'`,
      deep: true,
      probe
    });
  });
}
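The `?deep` query-flag rule reduces to a tiny predicate: the flag is enabled when it is present and not explicitly `"false"`/`"0"`, and a bare `?deep` (which Express delivers as `''`) counts as enabled. A sketch with an illustrative helper name:

```typescript
// deepRaw mirrors req.query.deep for the single-value case: undefined when
// the parameter is absent, '' for a bare ?deep, otherwise the given string.
function deepProbeEnabled(deepRaw: string | undefined): boolean {
  return deepRaw !== undefined && deepRaw !== 'false' && deepRaw !== '0';
}
```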
@@ -6,14 +6,65 @@
 */

import express, { Request, Response } from 'express';
import { z } from 'zod';
import { BaseRouteHandler } from '../BaseRouteHandler.js';
import { logger } from '../../../../utils/logger.js';
import { validateBody } from '../middleware/validateBody.js';
import { CorpusStore } from '../../knowledge/CorpusStore.js';
import { CorpusBuilder } from '../../knowledge/CorpusBuilder.js';
import { KnowledgeAgent } from '../../knowledge/KnowledgeAgent.js';
import type { CorpusFilter } from '../../knowledge/types.js';

const ALLOWED_CORPUS_TYPES = new Set(['decision', 'bugfix', 'feature', 'refactor', 'discovery', 'change', 'security_alert', 'security_note']);
const ALLOWED_CORPUS_TYPES = ['decision', 'bugfix', 'feature', 'refactor', 'discovery', 'change', 'security_alert', 'security_note'] as const;
const ALLOWED_CORPUS_TYPE_SET = new Set<string>(ALLOWED_CORPUS_TYPES);

// Plan 06 Phase 3 — per-route Zod schemas. Coercions match the legacy
// `coerceStringArray` / `coercePositiveInteger` semantics: accept JSON
// strings, comma-separated strings, or native arrays; reject empty fields.
const stringArrayLike = z.preprocess((value) => {
  if (value === undefined || value === null || value === '') return undefined;
  if (Array.isArray(value)) return value;
  if (typeof value === 'string') {
    try {
      const parsed = JSON.parse(value);
      if (Array.isArray(parsed)) return parsed;
    } catch {
      // not JSON, fall through to comma split
    }
    return value.split(',').map((part) => part.trim()).filter(Boolean);
  }
  return value;
}, z.array(z.string().min(1)).optional());

const positiveIntegerLike = z.preprocess((value) => {
  if (value === undefined || value === null || value === '') return undefined;
  if (typeof value === 'string') {
    const parsed = Number(value);
    return Number.isNaN(parsed) ? value : parsed;
  }
  return value;
}, z.number().int().positive().optional());

const buildCorpusSchema = z.object({
  name: z.string().min(1),
  description: z.string().optional(),
  project: z.string().optional(),
  types: stringArrayLike.refine(
    (arr) => arr === undefined || arr.every((t) => ALLOWED_CORPUS_TYPE_SET.has(t)),
    { message: `types must contain only ${ALLOWED_CORPUS_TYPES.join(', ')}` }
  ),
  concepts: stringArrayLike,
  files: stringArrayLike,
  query: z.string().optional(),
  date_start: z.string().optional(),
  date_end: z.string().optional(),
  limit: positiveIntegerLike,
}).passthrough();

const queryCorpusSchema = z.object({
  question: z.string().trim().min(1),
}).passthrough();

const emptyBodySchema = z.object({}).passthrough();

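The coercion semantics these `z.preprocess` wrappers preserve can be stated as plain functions. A sketch with illustrative names (the real schemas return the raw value on a failed coercion so Zod surfaces the rejection; here, for brevity, invalid input collapses to `undefined`):

```typescript
// Accepts a native array, a JSON-encoded array, or a comma-separated string;
// '', null, and undefined all mean "field absent".
function coerceStringArray(value: unknown): string[] | undefined {
  if (value === undefined || value === null || value === '') return undefined;
  if (Array.isArray(value)) return value as string[];
  if (typeof value === 'string') {
    try {
      const parsed = JSON.parse(value);
      if (Array.isArray(parsed)) return parsed;
    } catch {
      // not JSON, fall through to comma split
    }
    return value.split(',').map((p) => p.trim()).filter(Boolean);
  }
  return undefined;
}

// Accepts a number or a numeric string; only positive integers survive.
function coercePositiveInteger(value: unknown): number | undefined {
  if (value === undefined || value === null || value === '') return undefined;
  const n = typeof value === 'string' ? Number(value) : value;
  return typeof n === 'number' && Number.isInteger(n) && n > 0 ? n : undefined;
}
```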
export class CorpusRoutes extends BaseRouteHandler {
  constructor(
@@ -25,14 +76,14 @@ export class CorpusRoutes extends BaseRouteHandler {
  }

  setupRoutes(app: express.Application): void {
    app.post('/api/corpus', this.handleBuildCorpus.bind(this));
    app.post('/api/corpus', validateBody(buildCorpusSchema), this.handleBuildCorpus.bind(this));
    app.get('/api/corpus', this.handleListCorpora.bind(this));
    app.get('/api/corpus/:name', this.handleGetCorpus.bind(this));
    app.delete('/api/corpus/:name', this.handleDeleteCorpus.bind(this));
    app.post('/api/corpus/:name/rebuild', this.handleRebuildCorpus.bind(this));
    app.post('/api/corpus/:name/prime', this.handlePrimeCorpus.bind(this));
    app.post('/api/corpus/:name/query', this.handleQueryCorpus.bind(this));
    app.post('/api/corpus/:name/reprime', this.handleReprimeCorpus.bind(this));
    app.post('/api/corpus/:name/rebuild', validateBody(emptyBodySchema), this.handleRebuildCorpus.bind(this));
    app.post('/api/corpus/:name/prime', validateBody(emptyBodySchema), this.handlePrimeCorpus.bind(this));
    app.post('/api/corpus/:name/query', validateBody(queryCorpusSchema), this.handleQueryCorpus.bind(this));
    app.post('/api/corpus/:name/reprime', validateBody(emptyBodySchema), this.handleReprimeCorpus.bind(this));
  }

  /**
@@ -41,42 +92,18 @@ export class CorpusRoutes extends BaseRouteHandler {
   * Body: { name, description?, project?, types?, concepts?, files?, query?, date_start?, date_end?, limit? }
   */
  private handleBuildCorpus = this.wrapHandler(async (req: Request, res: Response): Promise<void> => {
    if (!req.body.name) {
      res.status(400).json({
        error: 'Missing required field: name',
        fix: 'Add a "name" field to your request body',
        example: { name: 'my-corpus', query: 'hooks', limit: 100 }
      });
      return;
    }

    const { name, description, project, types, concepts, files, query, date_start, date_end, limit } = req.body;

    const coercedTypes = this.coerceStringArray(types, 'types', res);
    if (coercedTypes === null) return;
    if (coercedTypes && !coercedTypes.every(type => ALLOWED_CORPUS_TYPES.has(type))) {
      this.badRequest(res, 'types must contain valid observation types');
      return;
    }

    const coercedConcepts = this.coerceStringArray(concepts, 'concepts', res);
    if (coercedConcepts === null) return;

    const coercedFiles = this.coerceStringArray(files, 'files', res);
    if (coercedFiles === null) return;

    const coercedLimit = this.coercePositiveInteger(limit, 'limit', res);
    if (coercedLimit === null) return;
    const { name, description, project, types, concepts, files, query, date_start, date_end, limit } =
      req.body as z.infer<typeof buildCorpusSchema>;

    const filter: CorpusFilter = {};
    if (project) filter.project = project;
    if (coercedTypes && coercedTypes.length > 0) filter.types = coercedTypes as CorpusFilter['types'];
    if (coercedConcepts && coercedConcepts.length > 0) filter.concepts = coercedConcepts;
    if (coercedFiles && coercedFiles.length > 0) filter.files = coercedFiles;
    if (types && types.length > 0) filter.types = types as CorpusFilter['types'];
    if (concepts && concepts.length > 0) filter.concepts = concepts;
    if (files && files.length > 0) filter.files = files;
    if (query) filter.query = query;
    if (date_start) filter.date_start = date_start;
    if (date_end) filter.date_end = date_end;
    if (coercedLimit !== undefined) filter.limit = coercedLimit;
    if (limit !== undefined) filter.limit = limit;

    const corpus = await this.corpusBuilder.build(name, description || '', filter);

@@ -85,45 +112,6 @@ export class CorpusRoutes extends BaseRouteHandler {
    res.json(metadata);
  });

  private coerceStringArray(value: unknown, fieldName: string, res: Response): string[] | null | undefined {
    if (value === undefined || value === null || value === '') {
      return undefined;
    }

    let parsed = value;
    if (typeof value === 'string') {
      try {
        parsed = JSON.parse(value);
      } catch (parseError: unknown) {
        if (parseError instanceof Error) {
          logger.debug('HTTP', `${fieldName} is not valid JSON, treating as comma-separated string`, { value });
        }
        parsed = value.split(',').map(part => part.trim()).filter(Boolean);
      }
    }

    if (!Array.isArray(parsed) || !parsed.every(item => typeof item === 'string')) {
      this.badRequest(res, `${fieldName} must be an array of strings`);
      return null;
    }

    return parsed.map(item => item.trim()).filter(Boolean);
  }

  private coercePositiveInteger(value: unknown, fieldName: string, res: Response): number | null | undefined {
    if (value === undefined || value === null || value === '') {
      return undefined;
    }

    const parsed = typeof value === 'string' ? Number(value) : value;
    if (typeof parsed !== 'number' || !Number.isInteger(parsed) || parsed <= 0) {
      this.badRequest(res, `${fieldName} must be a positive integer`);
      return null;
    }

    return parsed;
  }

  /**
   * List all corpora with stats
   * GET /api/corpus
@@ -234,16 +222,6 @@ export class CorpusRoutes extends BaseRouteHandler {
   */
  private handleQueryCorpus = this.wrapHandler(async (req: Request, res: Response): Promise<void> => {
    const { name } = req.params;

    if (!req.body.question || typeof req.body.question !== 'string' || req.body.question.trim().length === 0) {
      res.status(400).json({
        error: 'Missing required field: question',
        fix: 'Add a non-empty "question" string to your request body',
        example: { question: 'What architectural decisions were made about hooks?' }
      });
      return;
    }

    const corpus = this.corpusStore.read(name);

    if (!corpus) {

@@ -6,6 +6,7 @@
 */

import express, { Request, Response } from 'express';
import { z } from 'zod';
import path from 'path';
import { readFileSync, statSync, existsSync } from 'fs';
import { logger } from '../../../../utils/logger.js';
@@ -18,9 +19,63 @@ import { SessionManager } from '../../SessionManager.js';
import { SSEBroadcaster } from '../../SSEBroadcaster.js';
import type { WorkerService } from '../../../worker-service.js';
import { BaseRouteHandler } from '../BaseRouteHandler.js';
import { validateBody } from '../middleware/validateBody.js';
import { normalizePlatformSource } from '../../../../shared/platform-source.js';
import { getObservationsByFilePath } from '../../../sqlite/observations/get.js';

// Plan 06 Phase 3 — per-route Zod schemas. Coercions match the legacy
// behaviour where MCP clients sometimes send arrays as JSON-encoded strings
// or comma-separated strings.
const integerArrayLike = z.preprocess((value) => {
  if (Array.isArray(value)) return value;
  if (typeof value === 'string') {
    try {
      const parsed = JSON.parse(value);
      if (Array.isArray(parsed)) return parsed;
    } catch {
      // not JSON, fall through to comma split
    }
    // Keep NaN values so the inner z.number().int() schema rejects them
    // — coercion does not silently drop garbage input.
    return value.split(',').map((part) => Number(part.trim()));
  }
  return value;
}, z.array(z.number().int()));

const stringArrayLike = z.preprocess((value) => {
  if (Array.isArray(value)) return value;
  if (typeof value === 'string') {
    try {
      const parsed = JSON.parse(value);
      if (Array.isArray(parsed)) return parsed;
    } catch {
      // not JSON, fall through to comma split
    }
    return value.split(',').map((part) => part.trim()).filter(Boolean);
  }
  return value;
}, z.array(z.string()));

const observationsBatchSchema = z.object({
  ids: integerArrayLike,
  orderBy: z.enum(['date_desc', 'date_asc']).optional(),
  limit: z.number().int().positive().optional(),
  project: z.string().optional(),
}).passthrough();

const sdkSessionsBatchSchema = z.object({
  memorySessionIds: stringArrayLike,
}).passthrough();

const setProcessingSchema = z.object({}).passthrough();

const importSchema = z.object({
  sessions: z.array(z.unknown()).optional(),
  summaries: z.array(z.unknown()).optional(),
  observations: z.array(z.unknown()).optional(),
  prompts: z.array(z.unknown()).optional(),
}).passthrough();

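The NaN-preserving comma split is the interesting part of `integerArrayLike`: a garbage token becomes `NaN` rather than being filtered out, so downstream validation sees and rejects it instead of silently losing input. A plain-function sketch of that preprocess step (illustrative name, no Zod):

```typescript
// Mirrors the preprocess step only: returns the coerced value unvalidated,
// the way z.preprocess hands its result to the inner schema.
function coerceIntegerArray(value: unknown): unknown {
  if (Array.isArray(value)) return value;
  if (typeof value === 'string') {
    try {
      const parsed = JSON.parse(value);
      if (Array.isArray(parsed)) return parsed;
    } catch {
      // not JSON, fall through to comma split
    }
    // '1,2,x' becomes [1, 2, NaN] — the NaN survives so validation can reject it.
    return value.split(',').map((part) => Number(part.trim()));
  }
  return value;
}
```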
export class DataRoutes extends BaseRouteHandler {
  constructor(
    private paginationHelper: PaginationHelper,
@@ -42,9 +97,9 @@ export class DataRoutes extends BaseRouteHandler {
    // Fetch by ID endpoints
    app.get('/api/observation/:id', this.handleGetObservationById.bind(this));
    app.get('/api/observations/by-file', this.handleGetObservationsByFile.bind(this));
    app.post('/api/observations/batch', this.handleGetObservationsByIds.bind(this));
    app.post('/api/observations/batch', validateBody(observationsBatchSchema), this.handleGetObservationsByIds.bind(this));
    app.get('/api/session/:id', this.handleGetSessionById.bind(this));
    app.post('/api/sdk-sessions/batch', this.handleGetSdkSessionsByIds.bind(this));
    app.post('/api/sdk-sessions/batch', validateBody(sdkSessionsBatchSchema), this.handleGetSdkSessionsByIds.bind(this));
    app.get('/api/prompt/:id', this.handleGetPromptById.bind(this));

    // Metadata endpoints
@@ -53,16 +108,10 @@ export class DataRoutes extends BaseRouteHandler {

    // Processing status endpoints
    app.get('/api/processing-status', this.handleGetProcessingStatus.bind(this));
    app.post('/api/processing', this.handleSetProcessing.bind(this));

    // Pending queue management endpoints
    app.get('/api/pending-queue', this.handleGetPendingQueue.bind(this));
    app.post('/api/pending-queue/process', this.handleProcessPendingQueue.bind(this));
    app.delete('/api/pending-queue/failed', this.handleClearFailedQueue.bind(this));
    app.delete('/api/pending-queue/all', this.handleClearAllQueue.bind(this));
    app.post('/api/processing', validateBody(setProcessingSchema), this.handleSetProcessing.bind(this));

    // Import endpoint
    app.post('/api/import', this.handleImport.bind(this));
    app.post('/api/import', validateBody(importSchema), this.handleImport.bind(this));
  }

  /**
@@ -139,29 +188,13 @@ export class DataRoutes extends BaseRouteHandler {
   * Body: { ids: number[], orderBy?: 'date_desc' | 'date_asc', limit?: number, project?: string }
   */
  private handleGetObservationsByIds = this.wrapHandler((req: Request, res: Response): void => {
    let { ids, orderBy, limit, project } = req.body;

    // Coerce string-encoded arrays from MCP clients (e.g. "[1,2,3]" or "1,2,3")
    if (typeof ids === 'string') {
      try { ids = JSON.parse(ids); } catch { ids = ids.split(',').map(Number); }
    }

    if (!ids || !Array.isArray(ids)) {
      this.badRequest(res, 'ids must be an array of numbers');
      return;
    }
    const { ids, orderBy, limit, project } = req.body as z.infer<typeof observationsBatchSchema>;

    if (ids.length === 0) {
      res.json([]);
      return;
    }

    // Validate all IDs are numbers
    if (!ids.every(id => typeof id === 'number' && Number.isInteger(id))) {
      this.badRequest(res, 'All ids must be integers');
      return;
    }

    const store = this.dbManager.getSessionStore();
    const observations = store.getObservationsByIds(ids, { orderBy, limit, project });

@@ -193,17 +226,7 @@ export class DataRoutes extends BaseRouteHandler {
   * Body: { memorySessionIds: string[] }
   */
  private handleGetSdkSessionsByIds = this.wrapHandler((req: Request, res: Response): void => {
    let { memorySessionIds } = req.body;

    // Coerce string-encoded arrays from MCP clients (e.g. '["a","b"]' or "a,b")
    if (typeof memorySessionIds === 'string') {
      try { memorySessionIds = JSON.parse(memorySessionIds); } catch { memorySessionIds = memorySessionIds.split(',').map((s: string) => s.trim()); }
    }

    if (!Array.isArray(memorySessionIds)) {
      this.badRequest(res, 'memorySessionIds must be an array');
      return;
    }
    const { memorySessionIds } = req.body as z.infer<typeof sdkSessionsBatchSchema>;

    const store = this.dbManager.getSessionStore();
    const sessions = store.getSdkSessionsBySessionIds(memorySessionIds);
@@ -467,96 +490,4 @@ export class DataRoutes extends BaseRouteHandler {
    });
  });

  /**
   * Get pending queue contents
   * GET /api/pending-queue
   * Returns all pending, processing, and failed messages with optional recently processed
   */
  private handleGetPendingQueue = this.wrapHandler((req: Request, res: Response): void => {
    const { PendingMessageStore } = require('../../../sqlite/PendingMessageStore.js');
    const pendingStore = new PendingMessageStore(this.dbManager.getSessionStore().db, 3);

    // Get queue contents (pending, processing, failed)
    const queueMessages = pendingStore.getQueueMessages();

    // Get recently processed (last 30 min, up to 20)
    const recentlyProcessed = pendingStore.getRecentlyProcessed(20, 30);

    // Get stuck message count (processing > 5 min)
    const stuckCount = pendingStore.getStuckCount(5 * 60 * 1000);

    // Get sessions with pending work
    const sessionsWithPending = pendingStore.getSessionsWithPendingMessages();

    res.json({
      queue: {
        messages: queueMessages,
        totalPending: queueMessages.filter((m: { status: string }) => m.status === 'pending').length,
        totalProcessing: queueMessages.filter((m: { status: string }) => m.status === 'processing').length,
        totalFailed: queueMessages.filter((m: { status: string }) => m.status === 'failed').length,
        stuckCount
      },
      recentlyProcessed,
      sessionsWithPendingWork: sessionsWithPending
    });
  });

  /**
   * Process pending queue
   * POST /api/pending-queue/process
   * Body: { sessionLimit?: number } - defaults to 10
   * Starts SDK agents for sessions with pending messages
   */
  private handleProcessPendingQueue = this.wrapHandler(async (req: Request, res: Response): Promise<void> => {
    const sessionLimit = Math.min(
      Math.max(parseInt(req.body.sessionLimit, 10) || 10, 1),
      100 // Max 100 sessions at once
    );

    const result = await this.workerService.processPendingQueues(sessionLimit);

    res.json({
      success: true,
      ...result
    });
  });

  /**
   * Clear all failed messages from the queue
   * DELETE /api/pending-queue/failed
   * Returns the number of messages cleared
   */
  private handleClearFailedQueue = this.wrapHandler((req: Request, res: Response): void => {
    const { PendingMessageStore } = require('../../../sqlite/PendingMessageStore.js');
    const pendingStore = new PendingMessageStore(this.dbManager.getSessionStore().db, 3);

    const clearedCount = pendingStore.clearFailed();

    logger.info('QUEUE', 'Cleared failed queue messages', { clearedCount });

    res.json({
      success: true,
      clearedCount
    });
  });

  /**
   * Clear all messages from the queue (pending, processing, and failed)
   * DELETE /api/pending-queue/all
   * Returns the number of messages cleared
   */
  private handleClearAllQueue = this.wrapHandler((req: Request, res: Response): void => {
    const { PendingMessageStore } = require('../../../sqlite/PendingMessageStore.js');
    const pendingStore = new PendingMessageStore(this.dbManager.getSessionStore().db, 3);

    const clearedCount = pendingStore.clearAll();

    logger.warn('QUEUE', 'Cleared ALL queue messages (pending, processing, failed)', { clearedCount });

    res.json({
      success: true,
      clearedCount
    });
  });

}

@@ -5,11 +5,16 @@
 */

import express, { Request, Response } from 'express';
import { z } from 'zod';
import { openSync, fstatSync, readSync, closeSync, existsSync, writeFileSync } from 'fs';
import { join } from 'path';
import { logger } from '../../../../utils/logger.js';
import { SettingsDefaultsManager } from '../../../../shared/SettingsDefaultsManager.js';
import { BaseRouteHandler } from '../BaseRouteHandler.js';
import { validateBody } from '../middleware/validateBody.js';

// Plan 06 Phase 3 — per-route Zod schema. The clear-logs endpoint takes no body.
const clearLogsSchema = z.object({}).passthrough();

/**
 * Read the last N lines from a file without loading the entire file into memory.

@@ -99,7 +104,7 @@ export class LogsRoutes extends BaseRouteHandler {

  setupRoutes(app: express.Application): void {
    app.get('/api/logs', this.handleGetLogs.bind(this));
    app.post('/api/logs/clear', this.handleClearLogs.bind(this));
    app.post('/api/logs/clear', validateBody(clearLogsSchema), this.handleClearLogs.bind(this));
  }

  /**

@@ -6,10 +6,19 @@
 */

import express, { Request, Response } from 'express';
import { z } from 'zod';
import { BaseRouteHandler } from '../BaseRouteHandler.js';
import { validateBody } from '../middleware/validateBody.js';
import { logger } from '../../../../utils/logger.js';
import type { DatabaseManager } from '../../DatabaseManager.js';

// Plan 06 Phase 3 — per-route Zod schema.
const saveMemorySchema = z.object({
  text: z.string().trim().min(1),
  title: z.string().optional(),
  project: z.string().optional(),
}).passthrough();

export class MemoryRoutes extends BaseRouteHandler {
  constructor(
    private dbManager: DatabaseManager,
@@ -19,7 +28,7 @@ export class MemoryRoutes extends BaseRouteHandler {
  }

  setupRoutes(app: express.Application): void {
    app.post('/api/memory/save', this.handleSaveMemory.bind(this));
    app.post('/api/memory/save', validateBody(saveMemorySchema), this.handleSaveMemory.bind(this));
  }

  /**
@@ -27,14 +36,9 @@ export class MemoryRoutes extends BaseRouteHandler {
   * Body: { text: string, title?: string, project?: string }
   */
  private handleSaveMemory = this.wrapHandler(async (req: Request, res: Response): Promise<void> => {
    const { text, title, project } = req.body;
    const { text, title, project } = req.body as z.infer<typeof saveMemorySchema>;
    const targetProject = project || this.defaultProject;

    if (!text || typeof text !== 'string' || text.trim().length === 0) {
      this.badRequest(res, 'text is required and must be non-empty');
      return;
    }

    const sessionStore = this.dbManager.getSessionStore();
    const chromaSync = this.dbManager.getChromaSync();

@@ -69,6 +73,17 @@ export class MemoryRoutes extends BaseRouteHandler {
    });

    // 4. Sync to ChromaDB (async, fire-and-forget)
    if (!chromaSync) {
      logger.debug('CHROMA', 'ChromaDB sync skipped (chromaSync not available)', { id: result.id });
      res.json({
        success: true,
        id: result.id,
        title: observation.title,
        project: targetProject,
        message: `Memory saved as observation #${result.id}`
      });
      return;
    }
    chromaSync.syncObservation(
      result.id,
      memorySessionId,

@@ -6,9 +6,21 @@
 */

import express, { Request, Response } from 'express';
import { z } from 'zod';
import { SearchManager } from '../../SearchManager.js';
import { BaseRouteHandler } from '../BaseRouteHandler.js';
import { validateBody } from '../middleware/validateBody.js';
import { logger } from '../../../../utils/logger.js';
import { groupByDate } from '../../../../shared/timeline-formatting.js';
import type { ObservationSearchResult, SessionSummarySearchResult } from '../../../sqlite/types.js';

// Plan 06 Phase 3 — per-route Zod schema. The semantic-context endpoint
// also accepts query-string fallbacks, so the body itself is fully optional.
const semanticContextSchema = z.object({
  q: z.string().optional(),
  project: z.string().optional(),
  limit: z.union([z.string(), z.number()]).optional(),
}).passthrough();

export class SearchRoutes extends BaseRouteHandler {
  constructor(
@@ -38,7 +50,7 @@ export class SearchRoutes extends BaseRouteHandler {
    app.get('/api/context/timeline', this.handleGetContextTimeline.bind(this));
    app.get('/api/context/preview', this.handleContextPreview.bind(this));
    app.get('/api/context/inject', this.handleContextInject.bind(this));
    app.post('/api/context/semantic', this.handleSemanticContext.bind(this));
    app.post('/api/context/semantic', validateBody(semanticContextSchema), this.handleSemanticContext.bind(this));

    // Timeline and help endpoints
    app.get('/api/timeline/by-query', this.handleGetTimelineByQuery.bind(this));
@@ -120,28 +132,156 @@ export class SearchRoutes extends BaseRouteHandler {
  /**
   * Search observations by concept
   * GET /api/search/by-concept?concept=discovery&limit=5
   *
   * Chroma errors surface as 503 via ChromaUnavailableError (thrown by orchestrator).
   */
  private handleSearchByConcept = this.wrapHandler(async (req: Request, res: Response): Promise<void> => {
    const result = await this.searchManager.findByConcept(req.query);
    res.json(result);
    const orchestrator = this.searchManager.getOrchestrator();
    const formatter = this.searchManager.getFormatter();
    const query = req.query as Record<string, any>;
    const rawConcept = query.concepts ?? query.concept;
    const concept = Array.isArray(rawConcept) ? rawConcept[0] : rawConcept;
    const strategyResult = await orchestrator.findByConcept(concept, query);
    const observations = strategyResult.results.observations;

    if (observations.length === 0) {
      res.json({
        content: [{
          type: 'text' as const,
          text: `No observations found with concept "${concept}"`
        }]
      });
      return;
    }

    const header = `Found ${observations.length} observation(s) with concept "${concept}"\n\n${formatter.formatTableHeader()}`;
    const rows = observations.map((obs: ObservationSearchResult, i: number) => formatter.formatObservationIndex(obs, i));
    res.json({
      content: [{
        type: 'text' as const,
        text: header + '\n' + rows.join('\n')
      }]
    });
  });

  /**
   * Search by file path
   * GET /api/search/by-file?filePath=...&limit=10
   *
   * Chroma errors surface as 503 via ChromaUnavailableError (thrown by orchestrator).
   */
  private handleSearchByFile = this.wrapHandler(async (req: Request, res: Response): Promise<void> => {
    const result = await this.searchManager.findByFile(req.query);
    res.json(result);
    const orchestrator = this.searchManager.getOrchestrator();
    const formatter = this.searchManager.getFormatter();
    const query = req.query as Record<string, any>;
    // Accept both filePath and files for API compatibility
    const rawFilePath = query.filePath ?? query.files;
    const filePath = Array.isArray(rawFilePath)
      ? rawFilePath[0]
      : (typeof rawFilePath === 'string' && rawFilePath.includes(','))
        ? rawFilePath.split(',')[0].trim()
        : rawFilePath;

    const { observations, sessions } = await orchestrator.findByFile(filePath, query);
    const totalResults = observations.length + sessions.length;

    if (totalResults === 0) {
      res.json({
        content: [{
          type: 'text' as const,
          text: `No results found for file "${filePath}"`
        }]
      });
      return;
    }

    // Combine observations and sessions with timestamps for date grouping
    const combined: Array<{
      type: 'observation' | 'session';
      data: ObservationSearchResult | SessionSummarySearchResult;
      epoch: number;
      created_at: string;
    }> = [
      ...observations.map((obs: ObservationSearchResult) => ({
        type: 'observation' as const,
        data: obs,
        epoch: obs.created_at_epoch,
        created_at: obs.created_at
      })),
      ...sessions.map((sess: SessionSummarySearchResult) => ({
        type: 'session' as const,
        data: sess,
        epoch: sess.created_at_epoch,
        created_at: sess.created_at
      }))
    ];

    combined.sort((a, b) => b.epoch - a.epoch);
    const resultsByDate = groupByDate(combined, item => item.created_at);

    const lines: string[] = [];
    lines.push(`Found ${totalResults} result(s) for file "${filePath}"`);
    lines.push('');

    for (const [day, dayResults] of resultsByDate) {
      lines.push(`### ${day}`);
      lines.push('');
      lines.push(formatter.formatTableHeader());
      for (const result of dayResults) {
        if (result.type === 'observation') {
          lines.push(formatter.formatObservationIndex(result.data as ObservationSearchResult, 0));
        } else {
          lines.push(formatter.formatSessionIndex(result.data as SessionSummarySearchResult, 0));
        }
      }
      lines.push('');
    }

    res.json({
      content: [{
        type: 'text' as const,
        text: lines.join('\n')
      }]
    });
  });

  /**
   * Search observations by type
   * GET /api/search/by-type?type=bugfix&limit=10
   *
   * Chroma errors surface as 503 via ChromaUnavailableError (thrown by orchestrator).
   */
  private handleSearchByType = this.wrapHandler(async (req: Request, res: Response): Promise<void> => {
    const result = await this.searchManager.findByType(req.query);
    res.json(result);
    const orchestrator = this.searchManager.getOrchestrator();
    const formatter = this.searchManager.getFormatter();
    const query = req.query as Record<string, any>;
    const rawType = query.type;
    const type = (typeof rawType === 'string' && rawType.includes(','))
      ? rawType.split(',').map((s: string) => s.trim()).filter(Boolean)
      : rawType;
    const typeStr = Array.isArray(type) ? type.join(', ') : type;

    const strategyResult = await orchestrator.findByType(type, query);
    const observations = strategyResult.results.observations;

    if (observations.length === 0) {
      res.json({
        content: [{
          type: 'text' as const,
          text: `No observations found with type "${typeStr}"`
        }]
      });
      return;
    }

    const header = `Found ${observations.length} observation(s) with type "${typeStr}"\n\n${formatter.formatTableHeader()}`;
    const rows = observations.map((obs: ObservationSearchResult, i: number) => formatter.formatObservationIndex(obs, i));
    res.json({
      content: [{
        type: 'text' as const,
        text: header + '\n' + rows.join('\n')
      }]
    });
  });

  /**

@@ -6,6 +6,9 @@
 */

import express, { Request, Response } from 'express';
import { z } from 'zod';
import { ingestObservation } from '../shared.js';
import { validateBody } from '../middleware/validateBody.js';
import { getWorkerPort } from '../../../../shared/worker-utils.js';
import { logger } from '../../../../utils/logger.js';
import { stripMemoryTagsFromJson, stripMemoryTagsFromPrompt } from '../../../../utils/tag-stripping.js';
@@ -21,13 +24,14 @@ import { SessionCompletionHandler } from '../../session/SessionCompletionHandler
import { PrivacyCheckValidator } from '../../validation/PrivacyCheckValidator.js';
import { SettingsDefaultsManager } from '../../../../shared/SettingsDefaultsManager.js';
import { USER_SETTINGS_PATH } from '../../../../shared/paths.js';
import { getProcessBySession, ensureProcessExit } from '../../ProcessRegistry.js';
import { getSdkProcessForSession, ensureSdkProcessExit } from '../../../../supervisor/process-registry.js';
import { getProjectContext } from '../../../../utils/project-name.js';
import { normalizePlatformSource } from '../../../../shared/platform-source.js';
import { RestartGuard } from '../../RestartGuard.js';

const MAX_USER_PROMPT_BYTES = 256 * 1024;

export class SessionRoutes extends BaseRouteHandler {
  private completionHandler: SessionCompletionHandler;
  private spawnInProgress = new Map<number, boolean>();
  private crashRecoveryScheduled = new Set<number>();

@@ -39,13 +43,9 @@ export class SessionRoutes extends BaseRouteHandler {
    private openRouterAgent: OpenRouterAgent,
    private eventBroadcaster: SessionEventBroadcaster,
    private workerService: WorkerService,
    completionHandler: SessionCompletionHandler
    private completionHandler: SessionCompletionHandler,
  ) {
    super();
    // Use the shared completion handler from WorkerService so the SDK-agent
    // completion path and the HTTP fallback route operate on the same instance
    // (avoids duplicate construction; keeps finalize semantics consistent).
    this.completionHandler = completionHandler;
  }

  /**
@@ -97,7 +97,7 @@ export class SessionRoutes extends BaseRouteHandler {
  private static readonly STALE_GENERATOR_THRESHOLD_MS = 30_000; // 30 seconds (#1099)
  private static readonly MAX_SESSION_WALL_CLOCK_MS = 4 * 60 * 60 * 1000; // 4 hours (#1590)

  private ensureGeneratorRunning(sessionDbId: number, source: string): void {
  public ensureGeneratorRunning(sessionDbId: number, source: string): void {
    const session = this.sessionManager.getSession(sessionDbId);
    if (!session) return;

@@ -121,7 +121,7 @@ export class SessionRoutes extends BaseRouteHandler {
        session.abortController.abort();
      }
      const pendingStore = this.sessionManager.getPendingMessageStore();
      pendingStore.markAllSessionMessagesAbandoned(sessionDbId);
      pendingStore.transitionMessagesTo('abandoned', { sessionDbId });
      this.sessionManager.removeSessionImmediate(sessionDbId);
      return;
    }
@@ -253,7 +253,7 @@ export class SessionRoutes extends BaseRouteHandler {
        // Mark all processing messages as failed so they can be retried or abandoned
        const pendingStore = this.sessionManager.getPendingMessageStore();
        try {
          const failedCount = pendingStore.markSessionMessagesFailed(session.sessionDbId);
          const failedCount = pendingStore.transitionMessagesTo('failed', { sessionDbId: session.sessionDbId });
          if (failedCount > 0) {
            logger.error('SESSION', `Marked messages as failed after generator error`, {
              sessionId: session.sessionDbId,
@@ -268,10 +268,11 @@
        }
      })
      .finally(async () => {
        // CRITICAL: Verify subprocess exit to prevent zombie accumulation (Issue #1168)
        const tracked = getProcessBySession(session.sessionDbId);
        // Primary-path subprocess teardown — process-group kill ensures any
        // SDK descendants are reaped too (Principle 5).
        const tracked = getSdkProcessForSession(session.sessionDbId);
        if (tracked && !tracked.process.killed && tracked.process.exitCode === null) {
          await ensureProcessExit(tracked, 5000);
          await ensureSdkProcessExit(tracked, 5000);
        }

        const sessionDbId = session.sessionDbId;
@@ -289,43 +290,6 @@
        session.currentProvider = null;
        this.workerService.broadcastProcessingStatus();

        // Stop-hook fire-and-forget (Phase 2): if the generator just processed
        // a summary and no work remains, the Stop hook is done and we should
        // self-clean the session. The summary write is already committed to
        // SQLite synchronously inside processAgentResponse() BEFORE startSession()
        // returns (see ResponseProcessor.ts: storeObservations() is sync, and
        // confirmProcessed() runs right after), so by the time this .finally()
        // runs the summary is durably persisted.
        //
        // We gate on lastSummaryStored so we don't finalize after every idle
        // timeout between tool calls — only when a real Stop event produced
        // a summary record.
        try {
          const pendingStore = this.sessionManager.getPendingMessageStore();
          const pendingNow = pendingStore.getPendingCount(sessionDbId);
          if (session.lastSummaryStored === true && pendingNow === 0) {
            logger.info('SESSION', 'Stop-hook self-clean: summary persisted + queue drained → finalizing', {
              sessionId: sessionDbId
            });
            // finalizeSession is idempotent and does NOT touch the in-memory map —
            // it only marks DB completed, drains any orphaned pending messages,
            // and broadcasts the completion event. sessionManager cleanup is
            // handled below by the existing abort/removeSessionImmediate flow.
            this.completionHandler.finalizeSession(sessionDbId);
            // Clear the flag so a subsequent re-activation of the same session
            // does not fire finalize again without a fresh summary.
            session.lastSummaryStored = false;
            // Ensure the session is removed from the active-sessions map so the
            // Stop-hook path doesn't depend on a later idle-timeout tick.
            this.sessionManager.removeSessionImmediate(sessionDbId);
            return;
          }
        } catch (err) {
          logger.warn('SESSION', 'finalizeSession failed in SessionRoutes generator .finally()', {
            sessionId: sessionDbId
          }, err as Error);
        }

        // Crash recovery: If not aborted and still has work, restart (with limit)
        if (!wasAborted) {
          const pendingStore = this.sessionManager.getPendingMessageStore();
@@ -353,16 +317,34 @@
          session.consecutiveRestarts = (session.consecutiveRestarts || 0) + 1; // Keep for logging

          if (!restartAllowed) {
            logger.error('SESSION', `CRITICAL: Restart guard tripped — too many restarts in window, stopping to prevent runaway costs`, {
            logger.error('SESSION', `CRITICAL: Restart guard tripped — session is dead, draining pending messages and terminating`, {
              sessionId: sessionDbId,
              pendingCount,
              restartsInWindow: session.restartGuard.restartsInWindow,
              windowMs: session.restartGuard.windowMs,
              maxRestarts: session.restartGuard.maxRestarts,
              action: 'Generator will NOT restart. Check logs for root cause. Messages remain in pending state.'
              consecutiveFailures: session.restartGuard.consecutiveFailuresSinceSuccess,
              maxConsecutiveFailures: session.restartGuard.maxConsecutiveFailures,
              action: 'Generator will NOT restart. Pending messages drained to abandoned. Check logs for root cause.'
            });
            // Don't restart - abort to prevent further API calls
            // Don't restart - abort to prevent further API calls AND drain pending
            // messages so the session doesn't reappear in getSessionsWithPendingMessages
            // and trigger another auto-start cycle.
            session.abortController.abort();
            try {
              const drained = pendingStore.transitionMessagesTo('abandoned', { sessionDbId });
              if (drained > 0) {
                logger.error('SESSION', 'Drained pending messages to abandoned after restart guard trip', {
                  sessionId: sessionDbId,
                  drained,
                });
              }
            } catch (drainErr) {
              const normalized = drainErr instanceof Error ? drainErr : new Error(String(drainErr));
              logger.error('SESSION', 'Failed to drain pending messages after restart guard trip', {
                sessionId: sessionDbId,
              }, normalized);
            }
            return;
          }

@@ -371,7 +353,9 @@
            pendingCount,
            consecutiveRestarts: session.consecutiveRestarts,
            restartsInWindow: session.restartGuard!.restartsInWindow,
            maxRestarts: session.restartGuard!.maxRestarts
            maxRestarts: session.restartGuard!.maxRestarts,
            consecutiveFailures: session.restartGuard!.consecutiveFailuresSinceSuccess,
            maxConsecutiveFailures: session.restartGuard!.maxConsecutiveFailures
          });

          // Abort OLD controller before replacing to prevent child process leaks
@@ -411,21 +395,106 @@

  setupRoutes(app: express.Application): void {
    // Legacy session endpoints (use sessionDbId)
    app.post('/sessions/:sessionDbId/init', this.handleSessionInit.bind(this));
    app.post('/sessions/:sessionDbId/observations', this.handleObservations.bind(this));
    app.post('/sessions/:sessionDbId/summarize', this.handleSummarize.bind(this));
    app.post(
      '/sessions/:sessionDbId/init',
      validateBody(SessionRoutes.legacySessionInitSchema),
      this.handleSessionInit.bind(this)
    );
    app.post(
      '/sessions/:sessionDbId/observations',
      validateBody(SessionRoutes.legacyObservationsSchema),
      this.handleObservations.bind(this)
    );
    app.post(
      '/sessions/:sessionDbId/summarize',
      validateBody(SessionRoutes.legacySummarizeSchema),
      this.handleSummarize.bind(this)
    );
    app.get('/sessions/:sessionDbId/status', this.handleSessionStatus.bind(this));
    app.delete('/sessions/:sessionDbId', this.handleSessionDelete.bind(this));
    app.post('/sessions/:sessionDbId/complete', this.handleSessionComplete.bind(this));

    // New session endpoints (use contentSessionId)
    app.post('/api/sessions/init', this.handleSessionInitByClaudeId.bind(this));
    app.post('/api/sessions/observations', this.handleObservationsByClaudeId.bind(this));
    app.post('/api/sessions/summarize', this.handleSummarizeByClaudeId.bind(this));
    app.post('/api/sessions/complete', this.handleCompleteByClaudeId.bind(this));
    app.post(
      '/api/sessions/init',
      validateBody(SessionRoutes.sessionInitByClaudeIdSchema),
      this.handleSessionInitByClaudeId.bind(this)
    );
    app.post(
      '/api/sessions/observations',
      validateBody(SessionRoutes.observationsByClaudeIdSchema),
      this.handleObservationsByClaudeId.bind(this)
    );
    app.post(
      '/api/sessions/summarize',
      validateBody(SessionRoutes.summarizeByClaudeIdSchema),
      this.handleSummarizeByClaudeId.bind(this)
    );
    app.post(
      '/api/sessions/complete',
      validateBody(SessionRoutes.completeByClaudeIdSchema),
      this.handleCompleteByClaudeId.bind(this)
    );
    app.get('/api/sessions/status', this.handleStatusByClaudeId.bind(this));
  }

  // Plan 06 Phase 3 — per-route Zod schemas. Schemas live at the top of the
  // owning route file and gate body validation via `validateBody`.
  // `passthrough()` preserves optional/forwarded fields the handlers
  // already accept (e.g. cwd, agentId, agentType, platformSource).
  private static readonly legacySessionInitSchema = z.object({
    userPrompt: z.string().optional(),
    promptNumber: z.number().int().optional(),
  }).passthrough();

  private static readonly legacyObservationsSchema = z.object({
    tool_name: z.string().min(1),
    tool_input: z.unknown().optional(),
    tool_response: z.unknown().optional(),
    prompt_number: z.number().int().optional(),
    cwd: z.string().optional(),
  }).passthrough();

  private static readonly legacySummarizeSchema = z.object({
    last_assistant_message: z.string().optional(),
  }).passthrough();

  private static readonly sessionInitByClaudeIdSchema = z.object({
    contentSessionId: z.string().min(1),
    project: z.string().optional(),
    prompt: z.string().optional(),
    platformSource: z.string().optional(),
    customTitle: z.string().optional(),
  }).passthrough();

  private static readonly observationsByClaudeIdSchema = z.object({
    contentSessionId: z.string().min(1),
    tool_name: z.string().min(1),
    tool_input: z.unknown().optional(),
    tool_response: z.unknown().optional(),
    cwd: z.string().optional(),
    agentId: z.string().optional(),
    agentType: z.string().optional(),
    platformSource: z.string().optional(),
    // Idempotency key for the UNIQUE(content_session_id, tool_use_id) index
    // added in Plan 01 Phase 1. Accept both snake and camel shapes so
    // cross-process callers using either convention still deduplicate.
    tool_use_id: z.string().optional(),
    toolUseId: z.string().optional(),
  }).passthrough();

  private static readonly summarizeByClaudeIdSchema = z.object({
    contentSessionId: z.string().min(1),
    last_assistant_message: z.string().optional(),
    agentId: z.string().optional(),
    platformSource: z.string().optional(),
  }).passthrough();

  private static readonly completeByClaudeIdSchema = z.object({
    contentSessionId: z.string().min(1),
    platformSource: z.string().optional(),
  }).passthrough();

  /**
   * Initialize a new session
   */
@@ -600,98 +669,40 @@ export class SessionRoutes extends BaseRouteHandler {
   * Body: { contentSessionId, tool_name, tool_input, tool_response, cwd }
   */
  private handleObservationsByClaudeId = this.wrapHandler((req: Request, res: Response): void => {
    const { contentSessionId, tool_name, tool_input, tool_response, cwd, agentId, agentType } = req.body;
    const platformSource = normalizePlatformSource(req.body.platformSource);
    const project = typeof cwd === 'string' && cwd.trim() ? getProjectContext(cwd).primary : '';

    if (!contentSessionId) {
      return this.badRequest(res, 'Missing contentSessionId');
    }

    // Load skip tools from settings
    const settings = SettingsDefaultsManager.loadFromFile(USER_SETTINGS_PATH);
    const skipTools = new Set(settings.CLAUDE_MEM_SKIP_TOOLS.split(',').map(t => t.trim()).filter(Boolean));

    // Skip low-value or meta tools
    if (skipTools.has(tool_name)) {
      logger.debug('SESSION', 'Skipping observation for tool', { tool_name });
      res.json({ status: 'skipped', reason: 'tool_excluded' });
      return;
    }

    // Skip meta-observations: file operations on session-memory files
    const fileOperationTools = new Set(['Edit', 'Write', 'Read', 'NotebookEdit']);
    if (fileOperationTools.has(tool_name) && tool_input) {
      const filePath = tool_input.file_path || tool_input.notebook_path;
      if (filePath && filePath.includes('session-memory')) {
        logger.debug('SESSION', 'Skipping meta-observation for session-memory file', {
          tool_name,
          file_path: filePath
        });
        res.json({ status: 'skipped', reason: 'session_memory_meta' });
        return;
      }
    }

    const store = this.dbManager.getSessionStore();

    let sessionDbId: number;
    let promptNumber: number;
    try {
      sessionDbId = store.createSDKSession(contentSessionId, project, '', undefined, platformSource);
      promptNumber = store.getPromptNumberFromUserPrompts(contentSessionId);
    } catch (error) {
      const normalizedError = error instanceof Error ? error : new Error(String(error));
      logger.error('HTTP', 'Observation storage failed', { contentSessionId, tool_name }, normalizedError);
      res.json({ stored: false, reason: normalizedError.message });
      return;
    }

    // Privacy check: skip if user prompt was entirely private
    const userPrompt = PrivacyCheckValidator.checkUserPromptPrivacy(
      store,
    const {
      contentSessionId,
      promptNumber,
      'observation',
      sessionDbId,
      { tool_name }
    );
    if (!userPrompt) {
      res.json({ status: 'skipped', reason: 'private' });
      return;
    }

    // Strip memory tags from tool_input and tool_response
    const cleanedToolInput = tool_input !== undefined
      ? stripMemoryTagsFromJson(JSON.stringify(tool_input))
      : '{}';

    const cleanedToolResponse = tool_response !== undefined
      ? stripMemoryTagsFromJson(JSON.stringify(tool_response))
      : '{}';

    // Queue observation
    this.sessionManager.queueObservation(sessionDbId, {
      tool_name,
      tool_input: cleanedToolInput,
      tool_response: cleanedToolResponse,
      prompt_number: promptNumber,
      cwd: cwd || (() => {
        logger.error('SESSION', 'Missing cwd when queueing observation in SessionRoutes', {
          sessionId: sessionDbId,
          tool_name
        });
        return '';
      })(),
      agentId: typeof agentId === 'string' ? agentId : undefined,
      agentType: typeof agentType === 'string' ? agentType : undefined,
      tool_input,
      tool_response,
      cwd,
      platformSource,
      agentId,
      agentType,
      tool_use_id,
      toolUseId,
    } = req.body;

    const result = ingestObservation({
      contentSessionId,
      toolName: tool_name,
      toolInput: tool_input,
      toolResponse: tool_response,
      cwd,
      platformSource,
      agentId,
      agentType,
      toolUseId: typeof tool_use_id === 'string' ? tool_use_id : (typeof toolUseId === 'string' ? toolUseId : undefined),
    });

    // Ensure SDK agent is running
    this.ensureGeneratorRunning(sessionDbId, 'observation');
    if (!result.ok) {
      res.status(result.status ?? 500).json({ stored: false, reason: result.reason });
      return;
    }

    // Broadcast observation queued event
    this.eventBroadcaster.broadcastObservationQueued(sessionDbId);
    if ('status' in result && result.status === 'skipped') {
      res.json({ status: 'skipped', reason: result.reason });
      return;
    }

    res.json({ status: 'queued' });
  });
@@ -707,10 +718,6 @@ export class SessionRoutes extends BaseRouteHandler {
|
||||
const { contentSessionId, last_assistant_message, agentId } = req.body;
|
||||
const platformSource = normalizePlatformSource(req.body.platformSource);
|
||||
|
||||
if (!contentSessionId) {
|
||||
return this.badRequest(res, 'Missing contentSessionId');
|
||||
}
|
||||
|
||||
// Belt-and-suspenders: reject summarize requests from subagent context.
|
||||
// Gate on agentId only — agentType alone indicates a main session started with
|
||||
// --agent, which still owns its summary. Mirrors the hook-side guard in summarize.ts.
|
||||
@@ -802,10 +809,6 @@ export class SessionRoutes extends BaseRouteHandler {
|
||||
|
||||
logger.info('HTTP', '→ POST /api/sessions/complete', { contentSessionId });
|
||||
|
||||
if (!contentSessionId) {
|
||||
return this.badRequest(res, 'Missing contentSessionId');
|
||||
}
|
||||
|
||||
const store = this.dbManager.getSessionStore();
|
||||
|
||||
// Look up sessionDbId from contentSessionId (createSDKSession is idempotent)
|
||||
@@ -854,10 +857,25 @@ export class SessionRoutes extends BaseRouteHandler {
|
||||
// Only contentSessionId is truly required — Cursor and other platforms
|
||||
// may omit prompt/project in their payload (#838, #1049)
|
||||
const project = req.body.project || 'unknown';
|
||||
const prompt = req.body.prompt || '[media prompt]';
|
||||
let prompt = req.body.prompt || '[media prompt]';
|
||||
const platformSource = normalizePlatformSource(req.body.platformSource);
|
||||
const customTitle = req.body.customTitle || undefined;
|
||||
|
||||
const promptByteLength = Buffer.byteLength(prompt, 'utf8');
|
||||
if (promptByteLength > MAX_USER_PROMPT_BYTES) {
|
||||
logger.warn('HTTP', 'SessionRoutes: oversized prompt truncated at session-init boundary', {
|
||||
project,
|
||||
contentSessionId,
|
||||
promptByteLength,
|
||||
maxBytes: MAX_USER_PROMPT_BYTES,
|
||||
preview: prompt.slice(0, 200)
|
||||
});
|
||||
const buf = Buffer.from(prompt, 'utf8');
|
||||
let end = MAX_USER_PROMPT_BYTES;
|
||||
while (end > 0 && (buf[end] & 0xc0) === 0x80) end--;
|
||||
prompt = buf.subarray(0, end).toString('utf8');
|
||||
}
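The backward walk in the hunk above is the standard UTF-8 boundary fix: continuation bytes always match the bit pattern `10xxxxxx`, so `(buf[end] & 0xc0) === 0x80` detects a cut that would land mid-character. A self-contained sketch of the same logic (the helper name is illustrative, not part of the diff):

```typescript
// Truncate a string to at most maxBytes of UTF-8 without splitting a
// multi-byte character: back up while the cut position sits on a UTF-8
// continuation byte (bit pattern 10xxxxxx).
function truncateUtf8(text: string, maxBytes: number): string {
  const buf = Buffer.from(text, 'utf8');
  if (buf.byteLength <= maxBytes) return text;
  let end = maxBytes;
  while (end > 0 && (buf[end] & 0xc0) === 0x80) end--;
  return buf.subarray(0, end).toString('utf8');
}
```

`truncateUtf8('aé', 2)` returns `'a'` rather than a dangling half of the two-byte `é` sequence.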

      logger.info('HTTP', 'SessionRoutes: handleSessionInitByClaudeId called', {
        contentSessionId,
        project,
@@ -866,11 +884,6 @@ export class SessionRoutes extends BaseRouteHandler {
        customTitle
      });

      // Validate required parameters
      if (!this.validateRequired(req, res, ['contentSessionId'])) {
        return;
      }

      const store = this.dbManager.getSessionStore();

      // Step 1: Create/get SDK session (idempotent INSERT OR IGNORE)

@@ -6,6 +6,7 @@
 */

import express, { Request, Response } from 'express';
import { z } from 'zod';
import path from 'path';
import { readFileSync, writeFileSync, existsSync, renameSync, mkdirSync } from 'fs';
import { homedir } from 'os';
@@ -13,11 +14,27 @@ import { getPackageRoot } from '../../../../shared/paths.js';
import { logger } from '../../../../utils/logger.js';
import { SettingsManager } from '../../SettingsManager.js';
import { getBranchInfo, switchBranch, pullUpdates } from '../../BranchManager.js';
import { ModeManager } from '../../domain/ModeManager.js';
import { ModeManager } from '../../../domain/ModeManager.js';
import { BaseRouteHandler } from '../BaseRouteHandler.js';
import { validateBody } from '../middleware/validateBody.js';
import { SettingsDefaultsManager } from '../../../../shared/SettingsDefaultsManager.js';
import { clearPortCache } from '../../../../shared/worker-utils.js';

// Plan 06 Phase 3 — per-route Zod schemas. Semantic validation of individual
// CLAUDE_MEM_* keys still happens inside `validateSettings()` because the
// allowed-value rules are richer than what Zod expresses here.
const updateSettingsSchema = z.object({}).passthrough();

const toggleMcpSchema = z.object({
  enabled: z.boolean(),
}).passthrough();

const switchBranchSchema = z.object({
  branch: z.string().min(1),
}).passthrough();

const updateBranchSchema = z.object({}).passthrough();

export class SettingsRoutes extends BaseRouteHandler {
  constructor(
    private settingsManager: SettingsManager
@@ -28,16 +45,16 @@ export class SettingsRoutes extends BaseRouteHandler {
  setupRoutes(app: express.Application): void {
    // Settings endpoints
    app.get('/api/settings', this.handleGetSettings.bind(this));
    app.post('/api/settings', this.handleUpdateSettings.bind(this));
    app.post('/api/settings', validateBody(updateSettingsSchema), this.handleUpdateSettings.bind(this));

    // MCP toggle endpoints
    app.get('/api/mcp/status', this.handleGetMcpStatus.bind(this));
    app.post('/api/mcp/toggle', this.handleToggleMcp.bind(this));
    app.post('/api/mcp/toggle', validateBody(toggleMcpSchema), this.handleToggleMcp.bind(this));

    // Branch switching endpoints
    app.get('/api/branch/status', this.handleGetBranchStatus.bind(this));
    app.post('/api/branch/switch', this.handleSwitchBranch.bind(this));
    app.post('/api/branch/update', this.handleUpdateBranch.bind(this));
    app.post('/api/branch/switch', validateBody(switchBranchSchema), this.handleSwitchBranch.bind(this));
    app.post('/api/branch/update', validateBody(updateBranchSchema), this.handleUpdateBranch.bind(this));
  }

  /**
@@ -156,12 +173,7 @@ export class SettingsRoutes extends BaseRouteHandler {
   * Body: { enabled: boolean }
   */
  private handleToggleMcp = this.wrapHandler((req: Request, res: Response): void => {
    const { enabled } = req.body;

    if (typeof enabled !== 'boolean') {
      this.badRequest(res, 'enabled must be a boolean');
      return;
    }
    const { enabled } = req.body as z.infer<typeof toggleMcpSchema>;

    this.toggleMcp(enabled);
    res.json({ success: true, enabled: this.isMcpEnabled() });
@@ -180,12 +192,7 @@ export class SettingsRoutes extends BaseRouteHandler {
   * Body: { branch: "main" | "beta/7.0" }
   */
  private handleSwitchBranch = this.wrapHandler(async (req: Request, res: Response): Promise<void> => {
    const { branch } = req.body;

    if (!branch) {
      res.status(400).json({ success: false, error: 'Missing branch parameter' });
      return;
    }
    const { branch } = req.body as z.infer<typeof switchBranchSchema>;

    // Validate branch name
    const allowedBranches = ['main', 'beta/7.0', 'feature/bun-executable'];

@@ -15,6 +15,40 @@ import { DatabaseManager } from '../../DatabaseManager.js';
import { SessionManager } from '../../SessionManager.js';
import { BaseRouteHandler } from '../BaseRouteHandler.js';

/**
 * Plan 06 Phase 6 — viewer.html is loaded once at module init and held in
 * memory for the lifetime of the worker process. Process restart is the
 * cache-invalidation event; no fs.watch, no TTL, no refresh.
 *
 * We probe the same two on-disk locations the legacy handler did so the
 * dev (cache) and installed (marketplace) layouts both keep working.
 */
const VIEWER_HTML_CANDIDATE_PATHS: readonly string[] = (() => {
  const packageRoot = getPackageRoot();
  return [
    path.join(packageRoot, 'ui', 'viewer.html'),
    path.join(packageRoot, 'plugin', 'ui', 'viewer.html'),
  ];
})();

const resolvedViewerHtmlPath: string | null =
  VIEWER_HTML_CANDIDATE_PATHS.find((candidate) => existsSync(candidate)) ?? null;

const viewerHtmlBytes: Buffer | null = resolvedViewerHtmlPath
  ? readFileSync(resolvedViewerHtmlPath)
  : null;

if (resolvedViewerHtmlPath) {
  logger.info('SYSTEM', 'Cached viewer.html at boot', {
    path: resolvedViewerHtmlPath,
    bytes: viewerHtmlBytes!.byteLength,
  });
} else {
  logger.warn('SYSTEM', 'viewer.html not found at any expected location at boot', {
    candidates: VIEWER_HTML_CANDIDATE_PATHS,
  });
}

export class ViewerRoutes extends BaseRouteHandler {
  constructor(
    private sseBroadcaster: SSEBroadcaster,
@@ -49,26 +83,15 @@ export class ViewerRoutes extends BaseRouteHandler {
  });

  /**
   * Serve viewer UI
   * Serve viewer UI from the in-memory cache populated at module init.
   * Plan 06 Phase 6 — single read at boot, no per-request fs hit.
   */
  private handleViewerUI = this.wrapHandler((req: Request, res: Response): void => {
    const packageRoot = getPackageRoot();

    // Try cache structure first (ui/viewer.html), then marketplace structure (plugin/ui/viewer.html)
    const viewerPaths = [
      path.join(packageRoot, 'ui', 'viewer.html'),
      path.join(packageRoot, 'plugin', 'ui', 'viewer.html')
    ];

    const viewerPath = viewerPaths.find(p => existsSync(p));

    if (!viewerPath) {
    if (!viewerHtmlBytes) {
      throw new Error('Viewer UI not found at any expected location');
    }

    const html = readFileSync(viewerPath, 'utf-8');
    res.setHeader('Content-Type', 'text/html');
    res.send(html);
    res.setHeader('Content-Type', 'text/html; charset=utf-8');
    res.send(viewerHtmlBytes);
  });

  /**

@@ -0,0 +1,406 @@
/**
 * Worker HTTP shared ingest helpers.
 *
 * Per PATHFINDER-2026-04-22 plan 03 phase 0:
 * `ingestObservation`, `ingestPrompt`, `ingestSummary` are the single
 * in-process implementation of the worker's three ingest paths. The HTTP
 * route handlers (cross-process callers) and worker-internal producers
 * (transcript processor, ResponseProcessor) BOTH delegate here.
 *
 * No HTTP loopback. No duplicated insert logic. One helper, N callers.
 *
 * Wiring: `WorkerService` registers its `sessionManager`, `dbManager`, and
 * `sessionEventBroadcaster` once at startup via `setIngestContext`. The
 * helpers fail fast if called before registration.
 */

import { logger } from '../../../utils/logger.js';
import type { SessionManager } from '../SessionManager.js';
import type { DatabaseManager } from '../DatabaseManager.js';
import type { SessionEventBroadcaster } from '../events/SessionEventBroadcaster.js';
import type { ParsedSummary } from '../../../sdk/parser.js';
import { stripMemoryTagsFromJson } from '../../../utils/tag-stripping.js';
import { isProjectExcluded } from '../../../utils/project-filter.js';
import { SettingsDefaultsManager } from '../../../shared/SettingsDefaultsManager.js';
import { USER_SETTINGS_PATH } from '../../../shared/paths.js';
import { getProjectContext } from '../../../utils/project-name.js';
import { normalizePlatformSource } from '../../../shared/platform-source.js';
import { PrivacyCheckValidator } from '../validation/PrivacyCheckValidator.js';
import { EventEmitter } from 'events';

// ============================================================================
// Event bus — Phase 2 (`summaryStoredEvent`) consumers attach here.
// ============================================================================

/**
 * Event payload emitted exactly once per successful `ingestSummary` call that
 * actually stored a summary row. `messageId` is the pending_messages row id
 * that produced the summary; `sessionId` is the contentSessionId.
 *
 * Currently dormant — the only consumer (the blocking `/api/session/end`
 * endpoint) was removed when the Stop hook went fire-and-forget. Kept for
 * future internal subscribers; emissions are cheap no-ops with no listeners.
 */
export interface SummaryStoredEvent {
  sessionId: string;
  messageId: number;
}

class IngestEventBus extends EventEmitter {
  /**
   * Recent summaryStoredEvent buffer keyed by sessionId. Originally protected
   * the register-after-emit race for the blocking `/api/session/end` handler.
   * Currently unused (handler removed when Stop hook went fire-and-forget);
   * preserved so any future subscriber gets the same race-free contract.
   */
  private readonly recentStored = new Map<string, { event: SummaryStoredEvent; at: number }>();
  private static readonly RECENT_EVENT_TTL_MS = 60_000;

  constructor() {
    super();
    // Disable the default 10-listener warning. With no current consumers
    // this is moot, but kept for parity if future subscribers attach.
    this.setMaxListeners(0);
    this.on('summaryStoredEvent', (evt: SummaryStoredEvent) => {
      this.recentStored.set(evt.sessionId, { event: evt, at: Date.now() });
      this.evictExpiredStored();
    });
  }

  /** Read a recently-emitted summaryStoredEvent (idempotent; TTL-evicted). */
  takeRecentSummaryStored(sessionId: string): SummaryStoredEvent | undefined {
    const entry = this.recentStored.get(sessionId);
    if (!entry) return undefined;
    if (Date.now() - entry.at > IngestEventBus.RECENT_EVENT_TTL_MS) {
      this.recentStored.delete(sessionId);
      return undefined;
    }
    return entry.event;
  }

  private evictExpiredStored(): void {
    const cutoff = Date.now() - IngestEventBus.RECENT_EVENT_TTL_MS;
    for (const [key, entry] of this.recentStored) {
      if (entry.at < cutoff) this.recentStored.delete(key);
    }
  }
}

/**
 * Process-local event bus for ingestion lifecycle events.
 *
 * Single Node EventEmitter — there is no third event-bus in the worker.
 * `SessionManager` already uses Node EventEmitter for queue notifications
 * (`src/services/worker/SessionManager.ts:25`), and
 * `SessionQueueProcessor` consumes EventEmitter events
 * (`src/services/queue/SessionQueueProcessor.ts:18`); this module follows
 * the same pattern at the ingestion layer.
 */
export const ingestEventBus = new IngestEventBus();
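The race the `recentStored` buffer closes can be seen in a reduced sketch (the class and event names below are illustrative, not this module's exports): an event emitted before any external subscriber attaches is normally lost by `EventEmitter`, but mirroring each emit into a Map keeps it readable for the TTL window.

```typescript
import { EventEmitter } from 'events';

interface StoredEvent { sessionId: string; messageId: number }

// Same pattern as IngestEventBus above: the constructor installs one internal
// listener that mirrors every emit into a Map, so a consumer that attaches
// (or polls) after the emit can still read the event within the TTL.
class RecentBuffer extends EventEmitter {
  private readonly recent = new Map<string, { event: StoredEvent; at: number }>();
  private static readonly TTL_MS = 60_000;

  constructor() {
    super();
    this.on('stored', (evt: StoredEvent) => {
      this.recent.set(evt.sessionId, { event: evt, at: Date.now() });
    });
  }

  takeRecent(sessionId: string): StoredEvent | undefined {
    const entry = this.recent.get(sessionId);
    if (!entry) return undefined;
    if (Date.now() - entry.at > RecentBuffer.TTL_MS) {
      this.recent.delete(sessionId);
      return undefined;
    }
    return entry.event;
  }
}

const bus = new RecentBuffer();
bus.emit('stored', { sessionId: 's1', messageId: 7 }); // no external subscriber yet
const late = bus.takeRecent('s1'); // still observable within the TTL
```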

// ============================================================================
// Context registration
// ============================================================================

interface IngestContext {
  sessionManager: SessionManager;
  dbManager: DatabaseManager;
  eventBroadcaster: SessionEventBroadcaster;
  /** Optional callback to (re)start the SDK generator after enqueue. */
  ensureGeneratorRunning?: (sessionDbId: number, source: string) => void;
}

let ctx: IngestContext | null = null;

/**
 * Register the worker-scoped services the ingest helpers depend on.
 * Called once from `WorkerService` constructor.
 */
export function setIngestContext(next: IngestContext): void {
  ctx = next;
}

/**
 * Attach the generator-running callback after `SessionRoutes` has been
 * constructed. `setIngestContext` is called early in `WorkerService` startup
 * (before routes exist), so the callback is wired in as a second step once
 * `SessionRoutes.ensureGeneratorRunning` is available.
 *
 * Without this, transcript-watcher observations queue via
 * `ingestObservation()` but the SDK generator never auto-starts to drain
 * them.
 */
export function attachIngestGeneratorStarter(
  ensureGeneratorRunning: (sessionDbId: number, source: string) => void,
): void {
  requireContext().ensureGeneratorRunning = ensureGeneratorRunning;
}

function requireContext(): IngestContext {
  if (!ctx) {
    throw new Error('ingest helpers used before setIngestContext() — wiring bug');
  }
  return ctx;
}

// ============================================================================
// Result type
// ============================================================================

export type IngestResult =
  | { ok: true; sessionDbId: number; messageId?: number }
  | { ok: true; status: 'skipped'; reason: string }
  | { ok: false; reason: string; status?: number };

// ============================================================================
// Observation
// ============================================================================

export interface ObservationPayload {
  contentSessionId: string;
  toolName: string;
  toolInput: unknown;
  toolResponse: unknown;
  cwd?: string;
  platformSource?: string;
  agentId?: string;
  agentType?: string;
  toolUseId?: string;
}

/**
 * Ingest an observation: resolve session, apply project / skip-tool filters,
 * strip privacy tags, persist to pending_messages, ensure the SDK generator
 * is running.
 *
 * Same implementation for cross-process HTTP callers and worker-internal
 * callers (transcript processor, ResponseProcessor side-effects).
 */
export function ingestObservation(payload: ObservationPayload): IngestResult {
  const { sessionManager, dbManager, eventBroadcaster, ensureGeneratorRunning } = requireContext();

  if (!payload.contentSessionId) {
    return { ok: false, reason: 'missing contentSessionId', status: 400 };
  }
  if (!payload.toolName) {
    return { ok: false, reason: 'missing toolName', status: 400 };
  }

  const platformSource = normalizePlatformSource(payload.platformSource);
  const cwd = typeof payload.cwd === 'string' ? payload.cwd : '';
  const project = cwd.trim() ? getProjectContext(cwd).primary : '';

  const settings = SettingsDefaultsManager.loadFromFile(USER_SETTINGS_PATH);

  // Project exclusion (the same gate the hook handler applies).
  if (cwd && isProjectExcluded(cwd, settings.CLAUDE_MEM_EXCLUDED_PROJECTS)) {
    return { ok: true, status: 'skipped', reason: 'project_excluded' };
  }

  // Skip low-value or meta tools per user settings.
  const skipTools = new Set(
    settings.CLAUDE_MEM_SKIP_TOOLS.split(',').map(t => t.trim()).filter(Boolean)
  );
  if (skipTools.has(payload.toolName)) {
    return { ok: true, status: 'skipped', reason: 'tool_excluded' };
  }

  // Skip meta-observations: file operations on session-memory files.
  const fileOperationTools = new Set(['Edit', 'Write', 'Read', 'NotebookEdit']);
  if (fileOperationTools.has(payload.toolName) && payload.toolInput && typeof payload.toolInput === 'object') {
    const input = payload.toolInput as { file_path?: string; notebook_path?: string };
    const filePath = input.file_path || input.notebook_path;
    if (filePath && filePath.includes('session-memory')) {
      return { ok: true, status: 'skipped', reason: 'session_memory_meta' };
    }
  }

  const store = dbManager.getSessionStore();

  let sessionDbId: number;
  let promptNumber: number;
  try {
    sessionDbId = store.createSDKSession(payload.contentSessionId, project, '', undefined, platformSource);
    promptNumber = store.getPromptNumberFromUserPrompts(payload.contentSessionId);
  } catch (error) {
    const message = error instanceof Error ? error.message : String(error);
    logger.error('INGEST', 'Observation session resolution failed', {
      contentSessionId: payload.contentSessionId,
      toolName: payload.toolName,
    }, error instanceof Error ? error : new Error(message));
    return { ok: false, reason: message, status: 500 };
  }

  // Privacy: skip if user prompt was entirely private.
  const userPrompt = PrivacyCheckValidator.checkUserPromptPrivacy(
    store,
    payload.contentSessionId,
    promptNumber,
    'observation',
    sessionDbId,
    { tool_name: payload.toolName }
  );
  if (!userPrompt) {
    return { ok: true, status: 'skipped', reason: 'private' };
  }

  const cleanedToolInput = payload.toolInput !== undefined
    ? stripMemoryTagsFromJson(JSON.stringify(payload.toolInput))
    : '{}';
  const cleanedToolResponse = payload.toolResponse !== undefined
    ? stripMemoryTagsFromJson(JSON.stringify(payload.toolResponse))
    : '{}';

  sessionManager.queueObservation(sessionDbId, {
    tool_name: payload.toolName,
    tool_input: cleanedToolInput,
    tool_response: cleanedToolResponse,
    prompt_number: promptNumber,
    cwd: cwd || (() => {
      logger.error('INGEST', 'Missing cwd when ingesting observation', {
        sessionId: sessionDbId,
        toolName: payload.toolName,
      });
      return '';
    })(),
    agentId: typeof payload.agentId === 'string' ? payload.agentId : undefined,
    agentType: typeof payload.agentType === 'string' ? payload.agentType : undefined,
    // Forward the provider-assigned tool-use id so the
    // UNIQUE(content_session_id, tool_use_id) idempotency index from Plan 01
    // can actually collapse replays. SQLite treats NULL tool_use_id values as
    // distinct, so dropping it here silently defeats the INSERT OR IGNORE.
    toolUseId: typeof payload.toolUseId === 'string' ? payload.toolUseId : undefined,
  });

  ensureGeneratorRunning?.(sessionDbId, 'observation');
  eventBroadcaster.broadcastObservationQueued(sessionDbId);

  return { ok: true, sessionDbId };
}

// ============================================================================
// Prompt (user prompt ingestion — shared by hooks and transcript inits)
// ============================================================================

export interface PromptPayload {
  contentSessionId: string;
  /** The user prompt text (must not contain stripped tags). */
  prompt: string;
  cwd?: string;
  platformSource?: string;
  promptNumber?: number;
}

/**
 * Ingest a user prompt. Used by the SessionStart / UserPromptSubmit hooks and
 * by transcript-driven session inits. Wraps `SessionStore.appendUserPrompt`
 * so cross-process and in-process callers share the same path.
 */
export function ingestPrompt(payload: PromptPayload): IngestResult {
  const { dbManager } = requireContext();

  if (!payload.contentSessionId) {
    return { ok: false, reason: 'missing contentSessionId', status: 400 };
  }
  if (typeof payload.prompt !== 'string') {
    return { ok: false, reason: 'missing prompt text', status: 400 };
  }

  const platformSource = normalizePlatformSource(payload.platformSource);
  const cwd = typeof payload.cwd === 'string' ? payload.cwd : '';
  const project = cwd.trim() ? getProjectContext(cwd).primary : '';

  try {
    const store = dbManager.getSessionStore();
    const sessionDbId = store.createSDKSession(payload.contentSessionId, project, payload.prompt, undefined, platformSource);
    return { ok: true, sessionDbId };
  } catch (error) {
    const message = error instanceof Error ? error.message : String(error);
    return { ok: false, reason: message, status: 500 };
  }
}

// ============================================================================
// Summary
// ============================================================================

/**
 * Two shapes of ingest:
 * - "queue a summarize request" (cross-process hook trigger): goes via
 *   `SessionManager.queueSummarize` so the SDK agent will produce the XML
 *   payload on its next iteration.
 * - "the SDK agent already produced the parsed summary": goes via
 *   `ingestSummary({ parsed, sessionDbId, messageId })`. Stored synchronously,
 *   emits `summaryStoredEvent` for the blocking endpoint in plan 05.
 */
export type SummaryPayload =
  | {
      kind: 'queue';
      contentSessionId: string;
      lastAssistantMessage?: string;
      platformSource?: string;
      cwd?: string;
    }
  | {
      kind: 'parsed';
      sessionDbId: number;
      messageId: number;
      contentSessionId: string;
      parsed: ParsedSummary;
    };

export function ingestSummary(payload: SummaryPayload): IngestResult {
  // The 'parsed' branch is a pure post-store notification — it only touches
  // the module-scope event bus, not the database/session manager. Resolving
  // requireContext() before the branch split breaks unit tests that drive
  // ResponseProcessor with a mocked sessionManager but no setIngestContext.
  // Only the 'queue' branch needs the worker-internal context.
  if (payload.kind === 'queue') {
    const { sessionManager, dbManager, ensureGeneratorRunning } = requireContext();

    if (!payload.contentSessionId) {
      return { ok: false, reason: 'missing contentSessionId', status: 400 };
    }

    const platformSource = normalizePlatformSource(payload.platformSource);
    const cwd = typeof payload.cwd === 'string' ? payload.cwd : '';
    const project = cwd.trim() ? getProjectContext(cwd).primary : '';

    let sessionDbId: number;
    try {
      sessionDbId = dbManager.getSessionStore().createSDKSession(payload.contentSessionId, project, '', undefined, platformSource);
    } catch (error) {
      const message = error instanceof Error ? error.message : String(error);
      return { ok: false, reason: message, status: 500 };
    }

    sessionManager.queueSummarize(sessionDbId, payload.lastAssistantMessage);
    ensureGeneratorRunning?.(sessionDbId, 'summarize');

    return { ok: true, sessionDbId };
  }

  // kind === 'parsed' — the SDK agent has produced a summary; store via
  // session store and emit the summaryStoredEvent for blocking consumers.
  // Skipped summaries (`<skip_summary/>`) are recorded as a successful no-op:
  // they have no content to persist, but consumers should still be unblocked.
  if (payload.parsed.skipped) {
    ingestEventBus.emit('summaryStoredEvent', {
      sessionId: payload.contentSessionId,
      messageId: payload.messageId,
    } satisfies SummaryStoredEvent);
    return { ok: true, sessionDbId: payload.sessionDbId, messageId: payload.messageId };
  }

  // The actual storage of the parsed summary remains co-transactional with
  // the observation batch in `processAgentResponse`. By the time this branch
  // is reached the row is already persisted; this call is the canonical
  // post-store notification path so every producer fires the event the same
  // way (Plan 03 Phase 2 + greploop fix — sole emitter of summaryStoredEvent).
  ingestEventBus.emit('summaryStoredEvent', {
    sessionId: payload.contentSessionId,
    messageId: payload.messageId,
  } satisfies SummaryStoredEvent);

  return { ok: true, sessionDbId: payload.sessionDbId, messageId: payload.messageId };
}
@@ -6,7 +6,7 @@
 */

import { logger } from '../../../utils/logger.js';
import type { ObservationRecord } from '../../../types/database.js';
import type { ObservationSearchResult } from '../../sqlite/types.js';
import type { SessionStore } from '../../sqlite/SessionStore.js';
import type { SearchOrchestrator } from '../search/SearchOrchestrator.js';
import { CorpusRenderer } from './CorpusRenderer.js';
@@ -121,19 +121,19 @@ export class CorpusBuilder {
  }

  /**
   * Map a raw ObservationRecord (with JSON string fields) to a CorpusObservation
   * Map a raw ObservationSearchResult (with JSON string fields) to a CorpusObservation
   */
  private mapObservationToCorpus(row: ObservationRecord): CorpusObservation {
  private mapObservationToCorpus(row: ObservationSearchResult): CorpusObservation {
    return {
      id: row.id,
      type: row.type,
      title: (row as any).title || '',
      subtitle: (row as any).subtitle || null,
      narrative: (row as any).narrative || null,
      facts: safeParseJsonArray((row as any).facts),
      concepts: safeParseJsonArray((row as any).concepts),
      files_read: safeParseJsonArray((row as any).files_read),
      files_modified: safeParseJsonArray((row as any).files_modified),
      title: row.title || '',
      subtitle: row.subtitle || null,
      narrative: row.narrative || null,
      facts: safeParseJsonArray(row.facts),
      concepts: safeParseJsonArray(row.concepts),
      files_read: safeParseJsonArray(row.files_read),
      files_modified: safeParseJsonArray(row.files_modified),
      project: row.project,
      created_at: row.created_at,
      created_at_epoch: row.created_at_epoch,

@@ -33,7 +33,13 @@ export class ResultFormatter {

    if (totalResults === 0) {
      if (chromaFailed) {
        return this.formatChromaFailureMessage();
        // Legacy callers route through here without a specific reason; surface a
        // generic non-connection failure so users still get the diagnostic pointer
        // instead of the old "install uv" lie.
        return ResultFormatter.formatChromaFailureMessage({
          message: 'unknown error (no reason captured by caller)',
          isConnectionError: false,
        });
      }
      return `No results found matching "${query}"`;
    }
@@ -270,16 +276,18 @@ export class ResultFormatter {
  }

  /**
   * Format Chroma failure message
   * Format Chroma failure message with the real underlying error.
   *
   * Static so callers (e.g. SearchManager) can format without needing
   * an instance. The message intentionally surfaces the raw error text
   * and points users at /api/chroma/status?deep=1 for diagnostics —
   * never a static "install uv" instruction (which lies about the cause).
   */
  private formatChromaFailureMessage(): string {
    return `Vector search failed - semantic search unavailable.

To enable semantic search:
1. Install uv: https://docs.astral.sh/uv/getting-started/installation/
2. Restart the worker: npm run worker:restart

Note: You can still use filter-only searches (date ranges, types, files) without a query term.`;
  static formatChromaFailureMessage(reason: { message: string; isConnectionError: boolean }): string {
    if (reason.isConnectionError) {
      return `Semantic search is offline (Chroma MCP unreachable: ${reason.message}). Falling back to keyword search; results may be incomplete. Run \`/api/chroma/status?deep=1\` to diagnose.`;
    }
    return `Semantic search failed: ${reason.message}. Falling back to keyword search; results may be incomplete. Check \`~/.claude-mem/logs/\` for the CHROMA_SYNC entry. Run \`/api/chroma/status?deep=1\` for a deeper probe.`;
  }

  /**

@@ -30,6 +30,7 @@ import type {
  SearchResults,
  ObservationSearchResult
} from './types.js';
import { ChromaUnavailableError } from './errors.js';
import { logger } from '../../../utils/logger.js';

/**
@@ -88,34 +89,27 @@ export class SearchOrchestrator {
    }

    // PATH 2: CHROMA SEMANTIC SEARCH (query text + Chroma available)
    // Fail-fast: if Chroma errors, ChromaSearchStrategy now lets the error
    // propagate. We catch it here only to translate into a typed 503.
    if (this.chromaStrategy) {
      logger.debug('SEARCH', 'Orchestrator: Using Chroma semantic search', {});
      const result = await this.chromaStrategy.search(options);

      // If Chroma succeeded (even with 0 results), return
      if (result.usedChroma) {
        return result;
      try {
        return await this.chromaStrategy.search(options);
      } catch (error) {
        const errorObj = error instanceof Error ? error : new Error(String(error));
        throw new ChromaUnavailableError(
          `Chroma query failed: ${errorObj.message}`,
          errorObj
        );
      }

      // Chroma failed - fall back to SQLite for filter-only
      logger.debug('SEARCH', 'Orchestrator: Chroma failed, falling back to SQLite', {});
      const fallbackResult = await this.sqliteStrategy.search({
        ...options,
        query: undefined // Remove query for SQLite fallback
      });

      return {
        ...fallbackResult,
        fellBack: true
      };
    }

    // PATH 3: No Chroma available
    logger.debug('SEARCH', 'Orchestrator: Chroma not available', {});
    // PATH 3: Chroma not configured (explicitly uninitialized at construction).
    // This is a legitimate config state — return empty results, not an error.
    logger.debug('SEARCH', 'Orchestrator: Chroma not configured', {});
    return {
      results: { observations: [], sessions: [], prompts: [] },
      usedChroma: false,
      fellBack: false,
      strategy: 'sqlite'
    };
  }
@@ -130,12 +124,11 @@ export class SearchOrchestrator {
      return await this.hybridStrategy.findByConcept(concept, options);
    }

    // Fallback to SQLite
    // Chroma not configured: SQLite metadata-only result.
    const results = this.sqliteStrategy.findByConcept(concept, options);
    return {
      results: { observations: results, sessions: [], prompts: [] },
      usedChroma: false,
      fellBack: false,
||||
strategy: 'sqlite'
|
||||
};
|
||||
}
|
||||
@@ -150,12 +143,11 @@ export class SearchOrchestrator {
|
||||
return await this.hybridStrategy.findByType(type, options);
|
||||
}
|
||||
|
||||
// Fallback to SQLite
|
||||
// Chroma not configured: SQLite metadata-only result.
|
||||
const results = this.sqliteStrategy.findByType(type, options);
|
||||
return {
|
||||
results: { observations: results, sessions: [], prompts: [] },
|
||||
usedChroma: false,
|
||||
fellBack: false,
|
||||
strategy: 'sqlite'
|
||||
};
|
||||
}
|
||||
|
||||
@@ -0,0 +1,16 @@
|
||||
/**
|
||||
* Search-related error classes
|
||||
*/
|
||||
|
||||
import { AppError } from '../../server/ErrorHandler.js';
|
||||
|
||||
/**
|
||||
* Thrown when Chroma is expected to be available but failed at query time.
|
||||
* Maps to HTTP 503 Service Unavailable.
|
||||
*/
|
||||
export class ChromaUnavailableError extends AppError {
|
||||
constructor(message: string, cause?: Error) {
|
||||
super(message, 503, 'CHROMA_UNAVAILABLE', cause ? { cause: cause.message } : undefined);
|
||||
this.name = 'ChromaUnavailableError';
|
||||
}
|
||||
}
|
||||
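A typed error class lets the HTTP layer map failures by `instanceof` instead of string-matching messages. A self-contained sketch, assuming (as an illustration, not the project's real `ErrorHandler.ts`) an `AppError` base that carries a status code and machine-readable error code:

```typescript
// Minimal stand-in for the project's AppError base (assumption: it carries
// an HTTP status code, an error code, and optional structured details).
class AppError extends Error {
  constructor(
    message: string,
    public readonly statusCode: number,
    public readonly code: string,
    public readonly details?: Record<string, string>,
  ) {
    super(message);
  }
}

class ChromaUnavailableError extends AppError {
  constructor(message: string, cause?: Error) {
    super(message, 503, 'CHROMA_UNAVAILABLE', cause ? { cause: cause.message } : undefined);
    this.name = 'ChromaUnavailableError';
  }
}

// An error-handling layer can then translate the typed error into an HTTP
// response without inspecting message text.
function toHttpResponse(err: unknown): { status: number; body: string } {
  if (err instanceof AppError) {
    return { status: err.statusCode, body: err.message };
  }
  return { status: 500, body: 'Internal error' };
}
```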
@@ -59,31 +59,16 @@ export class ChromaSearchStrategy extends BaseSearchStrategy implements SearchSt
    const searchSessions = searchType === 'all' || searchType === 'sessions';
    const searchPrompts = searchType === 'all' || searchType === 'prompts';

    let observations: ObservationSearchResult[] = [];
    let sessions: SessionSummarySearchResult[] = [];
    let prompts: UserPromptSearchResult[] = [];

    // Build Chroma where filter for doc_type and project
    const whereFilter = this.buildWhereFilter(searchType, project);

    logger.debug('SEARCH', 'ChromaSearchStrategy: Querying Chroma', { query, searchType });

    try {
      return await this.executeChromaSearch(query, whereFilter, {
        searchObservations, searchSessions, searchPrompts,
        obsType, concepts, files, orderBy, limit, project
      });
    } catch (error) {
      const errorObj = error instanceof Error ? error : new Error(String(error));
      logger.error('WORKER', 'ChromaSearchStrategy: Search failed', {}, errorObj);
      // Return empty result - caller may try fallback strategy
      return {
        results: { observations: [], sessions: [], prompts: [] },
        usedChroma: false,
        fellBack: false,
        strategy: 'chroma'
      };
    }
    // Fail-fast: errors propagate to orchestrator, which translates to HTTP 503.
    return await this.executeChromaSearch(query, whereFilter, {
      searchObservations, searchSessions, searchPrompts,
      obsType, concepts, files, orderBy, limit, project
    });
  }

  private async executeChromaSearch(
@@ -111,7 +96,6 @@ export class ChromaSearchStrategy extends BaseSearchStrategy implements SearchSt
    return {
      results: { observations: [], sessions: [], prompts: [] },
      usedChroma: true,
      fellBack: false,
      strategy: 'chroma'
    };
  }
@@ -123,27 +107,31 @@ export class ChromaSearchStrategy extends BaseSearchStrategy implements SearchSt
    let sessions: SessionSummarySearchResult[] = [];
    let prompts: UserPromptSearchResult[] = [];

    // Chroma already ranks by vector similarity; 'relevance' has no SQL
    // equivalent, so drop it before hydrating rows from SessionStore.
    const sqlOrderBy: 'date_desc' | 'date_asc' | undefined =
      options.orderBy === 'relevance' ? undefined : options.orderBy;

    if (categorized.obsIds.length > 0) {
      const obsOptions = { type: options.obsType, concepts: options.concepts, files: options.files, orderBy: options.orderBy, limit: options.limit, project: options.project };
      const obsOptions = { type: options.obsType, concepts: options.concepts, files: options.files, orderBy: sqlOrderBy, limit: options.limit, project: options.project };
      observations = this.sessionStore.getObservationsByIds(categorized.obsIds, obsOptions);
    }

    if (categorized.sessionIds.length > 0) {
      sessions = this.sessionStore.getSessionSummariesByIds(categorized.sessionIds, {
        orderBy: options.orderBy, limit: options.limit, project: options.project
        orderBy: sqlOrderBy, limit: options.limit, project: options.project
      });
    }

    if (categorized.promptIds.length > 0) {
      prompts = this.sessionStore.getUserPromptsByIds(categorized.promptIds, {
        orderBy: options.orderBy, limit: options.limit, project: options.project
        orderBy: sqlOrderBy, limit: options.limit, project: options.project
      });
    }

    return {
      results: { observations, sessions, prompts },
      usedChroma: true,
      fellBack: false,
      strategy: 'chroma'
    };
  }

@@ -79,20 +79,8 @@ export class HybridSearchStrategy extends BaseSearchStrategy implements SearchSt

    const ids = metadataResults.map(obs => obs.id);

    try {
      return await this.rankAndHydrate(concept, ids, limit);
    } catch (error) {
      const errorObj = error instanceof Error ? error : new Error(String(error));
      logger.error('WORKER', 'HybridSearchStrategy: findByConcept failed', {}, errorObj);
      // Fall back to metadata-only results
      const results = this.sessionSearch.findByConcept(concept, filterOptions);
      return {
        results: { observations: results, sessions: [], prompts: [] },
        usedChroma: false,
        fellBack: true,
        strategy: 'hybrid'
      };
    }
    // Fail-fast: Chroma errors propagate to orchestrator (HTTP 503).
    return await this.rankAndHydrate(concept, ids, limit);
  }

  /**
@@ -117,19 +105,8 @@ export class HybridSearchStrategy extends BaseSearchStrategy implements SearchSt

    const ids = metadataResults.map(obs => obs.id);

    try {
      return await this.rankAndHydrate(typeStr, ids, limit);
    } catch (error) {
      const errorObj = error instanceof Error ? error : new Error(String(error));
      logger.error('WORKER', 'HybridSearchStrategy: findByType failed', {}, errorObj);
      const results = this.sessionSearch.findByType(type as any, filterOptions);
      return {
        results: { observations: results, sessions: [], prompts: [] },
        usedChroma: false,
        fellBack: true,
        strategy: 'hybrid'
      };
    }
    // Fail-fast: Chroma errors propagate to orchestrator (HTTP 503).
    return await this.rankAndHydrate(typeStr, ids, limit);
  }

  /**
@@ -158,18 +135,8 @@ export class HybridSearchStrategy extends BaseSearchStrategy implements SearchSt

    const ids = metadataResults.observations.map(obs => obs.id);

    try {
      return await this.rankAndHydrateForFile(filePath, ids, limit, sessions);
    } catch (error) {
      const errorObj = error instanceof Error ? error : new Error(String(error));
      logger.error('WORKER', 'HybridSearchStrategy: findByFile failed', {}, errorObj);
      const results = this.sessionSearch.findByFile(filePath, filterOptions);
      return {
        observations: results.observations,
        sessions: results.sessions,
        usedChroma: false
      };
    }
    // Fail-fast: Chroma errors propagate to orchestrator (HTTP 503).
    return await this.rankAndHydrateForFile(filePath, ids, limit, sessions);
  }

  private async rankAndHydrate(
@@ -191,7 +158,6 @@ export class HybridSearchStrategy extends BaseSearchStrategy implements SearchSt
    return {
      results: { observations, sessions: [], prompts: [] },
      usedChroma: true,
      fellBack: false,
      strategy: 'hybrid'
    };
  }

@@ -98,7 +98,6 @@ export class SQLiteSearchStrategy extends BaseSearchStrategy implements SearchSt
    return {
      results: { observations, sessions, prompts },
      usedChroma: false,
      fellBack: false,
      strategy: 'sqlite'
    };
  }

@@ -54,7 +54,6 @@ export abstract class BaseSearchStrategy implements SearchStrategy {
        prompts: []
      },
      usedChroma: strategy === 'chroma' || strategy === 'hybrid',
      fellBack: false,
      strategy
    };
  }

@@ -103,8 +103,6 @@ export interface StrategySearchResult {
  results: SearchResults;
  /** Whether Chroma was used successfully */
  usedChroma: boolean;
  /** Whether fallback was triggered */
  fellBack: boolean;
  /** Strategy that produced the results */
  strategy: SearchStrategyHint;
}

@@ -57,7 +57,7 @@ export class SessionCompletionHandler {
    // completed session would never be picked up again.
    try {
      const pendingStore = this.sessionManager.getPendingMessageStore();
      const drainedCount = pendingStore.markAllSessionMessagesAbandoned(sessionDbId);
      const drainedCount = pendingStore.transitionMessagesTo('abandoned', { sessionDbId });
      if (drainedCount > 0) {
        logger.warn('SESSION', `Drained ${drainedCount} orphaned pending messages on session finalize`, {
          sessionId: sessionDbId, drainedCount

@@ -30,13 +30,6 @@ const BLOCKED_ENV_VARS = [
  'CLAUDECODE', // Prevent "cannot be launched inside another Claude Code session" error
];

// Credential keys that claude-mem manages
export const MANAGED_CREDENTIAL_KEYS = [
  'ANTHROPIC_API_KEY',
  'GEMINI_API_KEY',
  'OPENROUTER_API_KEY',
];

export interface ClaudeMemEnv {
  // Credentials (optional - empty means use CLI billing for Claude)
  ANTHROPIC_API_KEY?: string;
@@ -269,16 +262,6 @@ export function getCredential(key: keyof ClaudeMemEnv): string | undefined {
  return env[key];
}

/**
 * Set a specific credential in claude-mem's .env
 * Pass empty string to remove the credential
 */
export function setCredential(key: keyof ClaudeMemEnv, value: string): void {
  const env = loadClaudeMemEnv();
  env[key] = value || undefined;
  saveClaudeMemEnv(env);
}

/**
 * Check if claude-mem has an Anthropic API key configured
 * If false, it means CLI billing should be used

@@ -56,6 +56,7 @@ export interface SettingsDefaults {
  CLAUDE_MEM_TRANSCRIPTS_CONFIG_PATH: string; // Path to transcript watcher config JSON
  // Process Management
  CLAUDE_MEM_MAX_CONCURRENT_AGENTS: string; // Max concurrent Claude SDK agent subprocesses (default: 2)
  CLAUDE_MEM_HOOK_FAIL_LOUD_THRESHOLD: string; // Plan 05 Phase 8 — consecutive hook→worker unreachable failures before exit code 2 (default: 3)
  // Exclusion Settings
  CLAUDE_MEM_EXCLUDED_PROJECTS: string; // Comma-separated glob patterns for excluded project paths
  CLAUDE_MEM_FOLDER_MD_EXCLUDE: string; // JSON array of folder paths to exclude from CLAUDE.md generation
@@ -133,6 +134,7 @@ export class SettingsDefaultsManager {
  CLAUDE_MEM_TRANSCRIPTS_CONFIG_PATH: join(homedir(), '.claude-mem', 'transcript-watch.json'),
  // Process Management
  CLAUDE_MEM_MAX_CONCURRENT_AGENTS: '2', // Max concurrent Claude SDK agent subprocesses
  CLAUDE_MEM_HOOK_FAIL_LOUD_THRESHOLD: '3', // Plan 05 Phase 8 — escalate to exit code 2 after N consecutive worker-unreachable hook invocations
  // Exclusion Settings
  CLAUDE_MEM_EXCLUDED_PROJECTS: '', // Comma-separated glob patterns for excluded project paths
  CLAUDE_MEM_FOLDER_MD_EXCLUDE: '[]', // JSON array of folder paths to exclude from CLAUDE.md generation
@@ -193,7 +195,7 @@ export class SettingsDefaultsManager {
   * Handles both string 'true' and boolean true from JSON
   */
  static getBool(key: keyof SettingsDefaults): boolean {
    const value = this.get(key);
    const value: unknown = this.get(key);
    return value === 'true' || value === true;
  }
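The `value === 'true' || value === true` check matters because settings merged from a JSON file can surface real booleans while env-derived defaults are always strings. A self-contained sketch of just the coercion:

```typescript
// Sketch of getBool's coercion: accept the string 'true' (env-style
// settings) and the boolean true (JSON-style settings); anything else —
// 'false', 1, undefined — is false.
function getBool(value: unknown): boolean {
  return value === 'true' || value === true;
}
```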

@@ -0,0 +1,35 @@
/**
 * Per-process settings cache for hook handlers.
 *
 * Plan 05 Phase 4 (PATHFINDER-2026-04-22): each hook process is short-lived,
 * but multiple handlers within a single hook invocation independently call
 * `SettingsDefaultsManager.loadFromFile(USER_SETTINGS_PATH)` and re-read the
 * settings file from disk. Settings cannot mutate during a single hook
 * invocation, so we memoize the first read for the lifetime of the process.
 *
 * One helper, N callers (Principle 6). Every hook handler that needs settings
 * imports `loadFromFileOnce()` from here instead of calling
 * `SettingsDefaultsManager.loadFromFile` directly.
 */

import {
  SettingsDefaultsManager,
  type SettingsDefaults,
} from './SettingsDefaultsManager.js';
import { USER_SETTINGS_PATH } from './paths.js';

let cachedSettings: SettingsDefaults | null = null;

/**
 * Load settings from disk on first call, return the memoized value thereafter.
 *
 * Cache lifetime is the process — hooks are short-lived (typically <1s), so a
 * settings change made by the user is picked up the next time Claude Code
 * spawns a hook process. There is no in-process invalidation API because there
 * is no in-process mutation path.
 */
export function loadFromFileOnce(): SettingsDefaults {
  if (cachedSettings !== null) return cachedSettings;
  cachedSettings = SettingsDefaultsManager.loadFromFile(USER_SETTINGS_PATH);
  return cachedSettings;
}
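The memoize-first-read pattern used by `loadFromFileOnce()` generalizes to any zero-argument loader. An illustrative generic helper (the module above inlines the pattern rather than using a helper like this):

```typescript
// Generic per-process memoization: the first call runs the loader, every
// later call returns the cached value. The wrapper object distinguishes
// "not loaded yet" from a loader that legitimately returns null/undefined.
function once<T>(load: () => T): () => T {
  let cached: { value: T } | null = null;
  return () => {
    if (cached === null) cached = { value: load() };
    return cached.value;
  };
}
```

Usage mirrors the hook-settings module: `const getSettings = once(() => readSettingsFromDisk());` — the disk read happens at most once per process.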
@@ -0,0 +1,44 @@
/**
 * Single answer to "should this hook run for this cwd?"
 *
 * Plan 05 Phase 5 (PATHFINDER-2026-04-22): three handlers (observation,
 * session-init, file-context) each duplicated the
 * `loadFromFileOnce() → isProjectExcluded(cwd, settings.CLAUDE_MEM_EXCLUDED_PROJECTS)`
 * pair. This module is the only entry point for that question; handlers call
 * `shouldTrackProject(cwd)` and route through here.
 *
 * One helper, N callers (Principle 6). After this module lands, no handler
 * references `isProjectExcluded` directly — the import lives only here.
 */

import { relative, isAbsolute } from 'path';
import { isProjectExcluded } from '../utils/project-filter.js';
import { loadFromFileOnce } from './hook-settings.js';
import { OBSERVER_SESSIONS_DIR } from './paths.js';

function isWithin(child: string, parent: string): boolean {
  if (child === parent) return true;
  const rel = relative(parent, child);
  return rel.length > 0 && !rel.startsWith('..') && !isAbsolute(rel);
}
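The `path.relative`-based containment check above avoids the classic `startsWith` pitfall: `'/a/bc'.startsWith('/a/b')` is true even though `/a/bc` is not inside `/a/b`. The same function, reproduced standalone so the edge cases can be exercised (POSIX paths assumed for the examples):

```typescript
import { relative, isAbsolute } from "node:path";

// Child is inside parent iff the relative path is non-empty, does not
// climb out ('..' prefix), and is not absolute (e.g. a different Windows
// drive). Equal paths count as "within".
function isWithin(child: string, parent: string): boolean {
  if (child === parent) return true;
  const rel = relative(parent, child);
  return rel.length > 0 && !rel.startsWith('..') && !isAbsolute(rel);
}
```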

/**
 * @returns true when the project at `cwd` is NOT excluded from claude-mem
 *          tracking, i.e., the hook should proceed; false when the project
 *          matches one of the exclusion globs.
 *
 * Hard-excludes OBSERVER_SESSIONS_DIR: the SDK agent spawns Claude Code with
 * that cwd, and its hooks must never feed the worker — otherwise the observer's
 * own init/continuation/summary prompts end up stored as `user_prompts` and
 * leak into the viewer (meta-observation).
 */
export function shouldTrackProject(cwd: string): boolean {
  if (!cwd) return true;
  // path.relative handles separator differences (Windows '\\' vs POSIX '/')
  // and trailing-slash variance, which a literal startsWith would miss.
  if (isWithin(cwd, OBSERVER_SESSIONS_DIR)) {
    return false;
  }
  const settings = loadFromFileOnce();
  return !isProjectExcluded(cwd, settings.CLAUDE_MEM_EXCLUDED_PROJECTS);
}

+395 -21
@@ -1,9 +1,17 @@
import path from "path";
import { readFileSync } from "fs";
import { readFileSync, existsSync, writeFileSync, renameSync, mkdirSync } from "fs";
import { spawn, execSync } from "child_process";
import { logger } from "../utils/logger.js";
import { HOOK_TIMEOUTS, getTimeout } from "./hook-constants.js";
import { HOOK_TIMEOUTS, HOOK_EXIT_CODES, getTimeout } from "./hook-constants.js";
import { SettingsDefaultsManager } from "./SettingsDefaultsManager.js";
import { MARKETPLACE_ROOT } from "./paths.js";
import { MARKETPLACE_ROOT, DATA_DIR } from "./paths.js";
import { loadFromFileOnce } from "./hook-settings.js";
// `validateWorkerPidFile` consults `captureProcessStartToken` at
// `src/supervisor/process-registry.ts` for PID-reuse detection (commit
// 99060bac). The lazy-spawn fast path below uses it to confirm a live port
// is owned by OUR worker incarnation rather than a stale PID squatting on
// the port after container restart.
import { validateWorkerPidFile } from "../supervisor/index.js";

// Named constants for health checks
// Allow env var override for users on slow systems (e.g., CLAUDE_MEM_HEALTH_TIMEOUT_MS=10000)
@@ -214,26 +222,392 @@ async function checkWorkerVersion(): Promise<void> {

/**
 * Ensure worker service is running
 * Quick health check - returns false if worker not healthy (doesn't block)
 * Port might be in use by another process, or worker might not be started yet
 * Resolve the absolute path to the worker-service script the hook should
 * relaunch as a detached daemon. Hooks live in the plugin's `scripts/`
 * directory next to `worker-service.cjs`; production and dev checkouts both
 * ship the bundled CJS there. Returns null when no candidate exists on disk
 * (partial install, build artifact missing).
 */
export async function ensureWorkerRunning(): Promise<boolean> {
  // Quick health check (single attempt, no polling)
  try {
    if (await isWorkerHealthy()) {
      await checkWorkerVersion(); // logs warning on mismatch, doesn't restart
      return true; // Worker healthy
    }
  } catch (e) {
    // Not healthy - log for debugging
    logger.debug('SYSTEM', 'Worker health check failed', {
      error: e instanceof Error ? e.message : String(e)
    });
function resolveWorkerScriptPath(): string | null {
  const candidates = [
    path.join(MARKETPLACE_ROOT, 'plugin', 'scripts', 'worker-service.cjs'),
    path.join(process.cwd(), 'plugin', 'scripts', 'worker-service.cjs'),
  ];
  for (const candidate of candidates) {
    if (existsSync(candidate)) return candidate;
  }
  return null;
}

  // Port might be in use by something else, or worker not started
  // Return false but don't throw - let caller decide how to handle
  logger.warn('SYSTEM', 'Worker not healthy, hook will proceed gracefully');
/**
 * Resolve the absolute path to the Bun runtime.
 *
 * Local to worker-utils.ts so the lazy-spawn path does not transitively
 * import `services/infrastructure/ProcessManager.ts` — that module pulls
 * in `bun:sqlite` via `cwd-remap`, and pulling it in would break the NPX
 * CLI bundle which must run under plain Node (no Bun). The worker daemon
 * itself requires Bun (it uses bun:sqlite directly); this lookup finds
 * the Bun binary that the daemon will execute under.
 */
function resolveBunRuntime(): string | null {
  if (process.env.BUN && existsSync(process.env.BUN)) return process.env.BUN;

  try {
    const cmd = process.platform === 'win32' ? 'where bun' : 'which bun';
    const output = execSync(cmd, {
      stdio: ['ignore', 'pipe', 'ignore'],
      encoding: 'utf-8',
      windowsHide: true,
    });
    const firstMatch = output
      .split(/\r?\n/)
      .map(line => line.trim())
      .find(line => line.length > 0);
    return firstMatch || null;
  } catch {
    return null;
  }
}

/**
 * Wait for the worker port to open, using exponential backoff.
 *
 * Deliberately hand-rolled — `respawn` or similar npm helpers add a
 * supervisor semantic layer we do not want here (Principle 6). The retry
 * policy is three attempts with 250ms → 500ms backoff between attempts
 * (~0.75s of sleep total; no sleep follows the final attempt) — enough to
 * cover the worker's start-up (~1-2s on a warm cache, slower on Windows)
 * without blocking a hook for long when the spawn outright failed.
 */
async function waitForWorkerPort(options: { attempts: number; backoffMs: number }): Promise<boolean> {
  let delayMs = options.backoffMs;
  for (let attempt = 1; attempt <= options.attempts; attempt++) {
    if (await isWorkerPortAlive()) return true;
    if (attempt < options.attempts) {
      await new Promise<void>(resolve => setTimeout(resolve, delayMs));
      delayMs *= 2;
    }
  }
  return false;
}
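Derived from the loop above: with `attempts = 3` and `backoffMs = 250`, the sleeps occur only between attempts (the `attempt < options.attempts` guard), so the worst-case sleep is 250 + 500 = 750ms. A small sketch computing that schedule:

```typescript
// Compute the between-attempt sleep schedule of waitForWorkerPort:
// delays double each round, and no delay follows the final failed attempt.
function backoffSchedule(attempts: number, backoffMs: number): number[] {
  const delays: number[] = [];
  let delay = backoffMs;
  for (let attempt = 1; attempt < attempts; attempt++) {
    delays.push(delay);
    delay *= 2;
  }
  return delays;
}
```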

/**
 * Is the worker port owned by a live worker we recognize?
 *
 * Two gates:
 *  1. HTTP /api/health returns 200, AND
 *  2. PID-file start-token check (via `validateWorkerPidFile` →
 *     `captureProcessStartToken`) confirms the recorded PID has not been
 *     reused by a different process since the file was written.
 *
 * When the PID file is missing we accept a healthy HTTP response on its own
 * — the file is written by the worker itself after `listen()` succeeds, so
 * a brief window exists during which a freshly-spawned worker is reachable
 * via HTTP but has not yet persisted its PID record. Treating this as
 * "not ours" would cause the hook to double-spawn in a race with the
 * worker's own PID-file write.
 *
 * An 'alive' status that fails identity verification is treated as dead so
 * the caller falls through to the spawn path (Phase 8 contract).
 */
async function isWorkerPortAlive(): Promise<boolean> {
  let healthy: boolean;
  try {
    healthy = await isWorkerHealthy();
  } catch (error: unknown) {
    logger.debug('SYSTEM', 'Worker health check threw', {
      error: error instanceof Error ? error.message : String(error),
    });
    return false;
  }
  if (!healthy) return false;

  const pidStatus = validateWorkerPidFile({ logAlive: false });
  if (pidStatus === 'missing') return true; // race: listening before PID file written
  if (pidStatus === 'alive') return true;   // identity verified via start-token
  return false; // 'stale' | 'invalid' — PID reused
}

/**
 * Lazy-spawn the worker if it is not already running, then wait for its port.
 *
 * Flow:
 *  1. If the port is alive AND verified as ours, return true (fast path).
 *  2. Otherwise, resolve the bun runtime + worker script path.
 *  3. Spawn detached, `unref()` so the hook's exit does not take the worker
 *     down with it (the worker lives as its own independent daemon).
 *  4. Wait for the port to come up, up to 3 attempts with exponential
 *     backoff between attempts (250ms → 500ms — ~0.75s of sleep total).
 *
 * PID-reuse safety is inherited from `validateWorkerPidFile` (commit
 * 99060bac) — see the `isWorkerPortAlive` comment above. There is no
 * auto-restart loop; failure is reported via the return value so the hook
 * can surface it through exit code 2 (Principle 2 — fail-fast).
 */
export async function ensureWorkerRunning(): Promise<boolean> {
  if (await isWorkerPortAlive()) {
    await checkWorkerVersion();
    return true;
  }

  const runtimePath = resolveBunRuntime();
  const scriptPath = resolveWorkerScriptPath();

  if (!runtimePath) {
    logger.warn('SYSTEM', 'Cannot lazy-spawn worker: Bun runtime not found on PATH');
    return false;
  }
  if (!scriptPath) {
    logger.warn('SYSTEM', 'Cannot lazy-spawn worker: worker-service.cjs not found in plugin/scripts');
    return false;
  }

  logger.info('SYSTEM', 'Worker not running — lazy-spawning', { runtimePath, scriptPath });

  try {
    const proc = spawn(runtimePath, [scriptPath, '--daemon'], {
      detached: true,
      stdio: ['ignore', 'ignore', 'ignore'],
    });
    proc.unref();
  } catch (error: unknown) {
    if (error instanceof Error) {
      logger.error('SYSTEM', 'Lazy-spawn of worker failed', { runtimePath, scriptPath }, error);
    } else {
      logger.error('SYSTEM', 'Lazy-spawn of worker failed (non-Error)', {
        runtimePath, scriptPath, error: String(error),
      });
    }
    return false;
  }

  const alive = await waitForWorkerPort({ attempts: 3, backoffMs: 250 });
  if (!alive) {
    logger.warn('SYSTEM', 'Worker port did not open after lazy-spawn within 3 attempts');
    return false;
  }
  return true;
}

// ============================================================================
// Plan 05 Phase 9 — single per-process alive cache.
//
// One hook invocation may issue multiple worker requests (session-init issues
// several). The alive-state cannot change mid-invocation without the hook
// process exiting, so memoize the first result. By Principle 6 (one helper,
// N callers), this is the ONLY alive-state cache; all hook→worker call sites
// route through `executeWithWorkerFallback` (Phase 2) which calls this.
// ============================================================================

let aliveCache: boolean | null = null;

export async function ensureWorkerAliveOnce(): Promise<boolean> {
  if (aliveCache !== null) return aliveCache;
  aliveCache = await ensureWorkerRunning();
  return aliveCache;
}

// ============================================================================
// Plan 05 Phase 8 — fail-loud counter.
//
// The counter records how many consecutive hook invocations have seen the
// worker unreachable. After N (default 3) consecutive failures, the next
// hook exits code 2 so Claude Code's hook contract surfaces the outage to
// Claude. Below N, hooks exit 0 to avoid breaking the user's session.
//
// This is NOT a retry. We do not reinvoke `ensureWorkerAliveOnce` or
// reattempt the HTTP request. We record the result of the one primary-path
// attempt and either return (graceful) or escalate (fail-loud).
//
// File: ~/.claude-mem/state/hook-failures.json
// Atomic write: tmp + rename (POSIX atomic within a filesystem).
// ============================================================================

interface HookFailureState {
  consecutiveFailures: number;
  lastFailureAt: number;
}

const FAIL_LOUD_DEFAULT_THRESHOLD = 3;

function getStateDir(): string {
  return path.join(DATA_DIR, 'state');
}

function getHookFailuresPath(): string {
  return path.join(getStateDir(), 'hook-failures.json');
}

function readHookFailureState(): HookFailureState {
  try {
    const raw = readFileSync(getHookFailuresPath(), 'utf-8');
    const parsed = JSON.parse(raw) as Partial<HookFailureState>;
    return {
      consecutiveFailures: typeof parsed.consecutiveFailures === 'number' && Number.isFinite(parsed.consecutiveFailures)
        ? Math.max(0, Math.floor(parsed.consecutiveFailures))
        : 0,
      lastFailureAt: typeof parsed.lastFailureAt === 'number' && Number.isFinite(parsed.lastFailureAt)
        ? parsed.lastFailureAt
        : 0,
    };
  } catch {
    // Missing file or corrupt JSON → fresh state.
    return { consecutiveFailures: 0, lastFailureAt: 0 };
  }
}
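The normalization inside `readHookFailureState` is what keeps a corrupt or hand-edited state file from ever producing NaN arithmetic or a negative counter. The same logic, isolated so the edge cases are visible:

```typescript
interface HookFailureState {
  consecutiveFailures: number;
  lastFailureAt: number;
}

// Clamp the counter to a non-negative integer; anything non-numeric or
// non-finite (missing field, NaN, string) collapses to 0. Note NaN passes
// `typeof === 'number'`, which is why Number.isFinite is also required.
function normalizeState(parsed: Partial<HookFailureState>): HookFailureState {
  return {
    consecutiveFailures:
      typeof parsed.consecutiveFailures === 'number' && Number.isFinite(parsed.consecutiveFailures)
        ? Math.max(0, Math.floor(parsed.consecutiveFailures))
        : 0,
    lastFailureAt:
      typeof parsed.lastFailureAt === 'number' && Number.isFinite(parsed.lastFailureAt)
        ? parsed.lastFailureAt
        : 0,
  };
}
```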

function writeHookFailureStateAtomic(state: HookFailureState): void {
  const stateDir = getStateDir();
  const dest = getHookFailuresPath();
  const tmp = `${dest}.tmp`;
  try {
    if (!existsSync(stateDir)) {
      mkdirSync(stateDir, { recursive: true });
    }
    writeFileSync(tmp, JSON.stringify(state), 'utf-8');
    renameSync(tmp, dest);
  } catch (error: unknown) {
    logger.debug('SYSTEM', 'Failed to persist hook-failure counter', {
      error: error instanceof Error ? error.message : String(error),
    });
  }
}
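The tmp + rename pattern above works because `rename(2)` is atomic within a filesystem: a concurrent reader sees either the old file or the fully written new one, never a half-written JSON document. A minimal runnable sketch of the same pattern against a throwaway directory:

```typescript
import { writeFileSync, renameSync, readFileSync, mkdtempSync } from "node:fs";
import { join } from "node:path";
import { tmpdir } from "node:os";

// Write the full payload to a sibling temp file, then rename over the
// destination — the rename is the atomic commit point.
function writeJsonAtomic(dest: string, value: unknown): void {
  const tmp = `${dest}.tmp`;
  writeFileSync(tmp, JSON.stringify(value), 'utf-8');
  renameSync(tmp, dest);
}

// Demo: round-trip a state object through an atomic write.
const dir = mkdtempSync(join(tmpdir(), 'atomic-demo-'));
const dest = join(dir, 'state.json');
writeJsonAtomic(dest, { consecutiveFailures: 0, lastFailureAt: 0 });
const roundTrip = JSON.parse(readFileSync(dest, 'utf-8'));
```

The atomicity guarantee holds only when `tmp` and `dest` are on the same filesystem, which placing the temp file beside the destination ensures.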

function getFailLoudThreshold(): number {
  try {
    const settings = loadFromFileOnce();
    const raw = settings.CLAUDE_MEM_HOOK_FAIL_LOUD_THRESHOLD;
    const parsed = parseInt(raw, 10);
    if (Number.isFinite(parsed) && parsed >= 1) return parsed;
  } catch {
    // settings unreadable — fall through to default
  }
  return FAIL_LOUD_DEFAULT_THRESHOLD;
}

/**
 * Record a worker-unreachable hook invocation. Returns the new counter value.
 * If the counter reaches the threshold, this function writes to stderr and
 * exits the process with code 2 (blocking error per Claude Code hook contract).
 *
 * Not a retry — does not reattempt the operation. The caller already ran the
 * single primary-path attempt and got `false` from `ensureWorkerAliveOnce`.
 */
function recordWorkerUnreachable(): number {
  const state = readHookFailureState();
  const next: HookFailureState = {
    consecutiveFailures: state.consecutiveFailures + 1,
    lastFailureAt: Date.now(),
  };
  writeHookFailureStateAtomic(next);

  const threshold = getFailLoudThreshold();
  if (next.consecutiveFailures >= threshold) {
    process.stderr.write(
      `claude-mem worker unreachable for ${next.consecutiveFailures} consecutive hooks.\n`
    );
    process.exit(HOOK_EXIT_CODES.BLOCKING_ERROR);
  }
  return next.consecutiveFailures;
}

/**
 * Reset the consecutive-failure counter. Called when the worker is alive,
 * acknowledging that any prior outage has ended. Not a retry — it is a
 * success-path acknowledgement.
 */
function resetWorkerFailureCounter(): void {
  const state = readHookFailureState();
  if (state.consecutiveFailures === 0) return; // skip a no-op write
  writeHookFailureStateAtomic({ consecutiveFailures: 0, lastFailureAt: 0 });
}

// ============================================================================
// Plan 05 Phase 2 — `executeWithWorkerFallback(url, method, body)`.
//
// Eight handlers used to duplicate the
// `ensureWorkerRunning() → workerHttpRequest() → if (!ok) return { continue: true }`
// sequence. This helper is the ONE implementation; eight handlers import it.
//
// Behavior:
//   1. ensureWorkerAliveOnce() (Phase 9). If false → fail-loud counter
//      (Phase 8). May process.exit(2). Otherwise return graceful fallback.
//   2. workerHttpRequest(url, method, body). Parse JSON.
//   3. On success, reset the fail-loud counter.
//
// No retry inside this helper. No timeout-and-exit-0 swallow. The fail-loud
// counter records consecutive invocation outcomes; it does not reinvoke work.
// ============================================================================

// Branded sentinel so isWorkerFallback cannot false-positive on legitimate
// API responses that happen to carry `continue: true` in their own schema.
|
||||
const WORKER_FALLBACK_BRAND: unique symbol = Symbol.for('claude-mem/worker-fallback');
|
||||
|
||||
export type WorkerFallback =
|
||||
| { continue: true; [WORKER_FALLBACK_BRAND]: true }
|
||||
| { continue: true; reason: string; [WORKER_FALLBACK_BRAND]: true };
|
||||
|
||||
export type WorkerCallResult<T> = T | WorkerFallback;
|
||||
|
||||
export function isWorkerFallback<T>(result: WorkerCallResult<T>): result is WorkerFallback {
|
||||
return typeof result === 'object'
|
||||
&& result !== null
|
||||
&& (result as { [WORKER_FALLBACK_BRAND]?: unknown })[WORKER_FALLBACK_BRAND] === true;
|
||||
}
|
||||
|
||||
export interface WorkerFallbackOptions {
|
||||
/**
|
||||
* Per-call HTTP timeout in ms. Forwarded to workerHttpRequest. Omit to use
|
||||
* HEALTH_CHECK_TIMEOUT_MS (the default ~3 s suitable for short pings).
|
||||
* All hook endpoints are fire-and-forget queueing endpoints that return
|
||||
* `{status: 'queued'}` immediately, so the default suffices.
|
||||
*/
|
||||
timeoutMs?: number;
|
||||
}
|
||||
|
||||
export async function executeWithWorkerFallback<T = unknown>(
|
||||
url: string,
|
||||
method: 'GET' | 'POST' | 'PUT' | 'DELETE',
|
||||
body?: unknown,
|
||||
options: WorkerFallbackOptions = {},
|
||||
): Promise<WorkerCallResult<T>> {
|
||||
const alive = await ensureWorkerAliveOnce();
|
||||
if (!alive) {
|
||||
// Records and possibly process.exit(2). If we return below, the counter
|
||||
// is below threshold, the user's session continues uninterrupted.
|
||||
recordWorkerUnreachable();
|
||||
return { continue: true, reason: 'worker_unreachable', [WORKER_FALLBACK_BRAND]: true };
|
||||
}
|
||||
|
||||
const init: { method: string; headers?: Record<string, string>; body?: string; timeoutMs?: number } = { method };
|
||||
if (body !== undefined) {
|
||||
init.headers = { 'Content-Type': 'application/json' };
|
||||
init.body = JSON.stringify(body);
|
||||
}
|
||||
if (options.timeoutMs !== undefined) {
|
||||
init.timeoutMs = options.timeoutMs;
|
||||
}
|
||||
|
||||
const response = await workerHttpRequest(url, init);
|
||||
if (!response.ok) {
|
||||
// Non-2xx is a real worker response (so the worker IS reachable). Reset
|
||||
// the consecutive-failures counter; surface the response body to the
|
||||
// caller as a typed value via T's caller-controlled shape. Callers that
|
||||
// care about non-2xx must inspect the value (or wrap with their own
|
||||
// status check); the helper does not silently coerce non-2xx into a
|
||||
// graceful fallback.
|
||||
resetWorkerFailureCounter();
|
||||
const text = await response.text().catch(() => '');
|
||||
let parsed: unknown = text;
|
||||
try { parsed = JSON.parse(text); } catch { /* keep raw text */ }
|
||||
return parsed as T;
|
||||
}
|
||||
|
||||
resetWorkerFailureCounter();
|
||||
const text = await response.text();
|
||||
if (text.length === 0) return undefined as unknown as T;
|
||||
try {
|
||||
return JSON.parse(text) as T;
|
||||
} catch {
|
||||
return text as unknown as T;
|
||||
}
|
||||
}
|
||||
|
||||
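The write-temp-then-rename pattern in `writeHookFailureStateAtomic` can be exercised in isolation. A minimal sketch, assuming only Node's fs primitives (the temp-dir path and state shape here are illustrative, not the project's real state locations):

```typescript
// Sketch of the write-then-rename pattern: rename() atomically replaces the
// destination on POSIX, so a concurrent reader observes either the old JSON
// or the new JSON, never a partially written file.
import { writeFileSync, renameSync, readFileSync, mkdtempSync } from 'fs';
import { tmpdir } from 'os';
import { join } from 'path';

interface HookFailureState { consecutiveFailures: number; lastFailureAt: number }

const dir = mkdtempSync(join(tmpdir(), 'hook-state-')); // illustrative location
const dest = join(dir, 'hook-failures.json');

function writeAtomic(state: HookFailureState): void {
  const tmp = `${dest}.tmp`;
  writeFileSync(tmp, JSON.stringify(state), 'utf-8');
  renameSync(tmp, dest); // atomic swap — no torn reads
}

writeAtomic({ consecutiveFailures: 3, lastFailureAt: Date.now() });
const roundTripped = JSON.parse(readFileSync(dest, 'utf-8')) as HookFailureState;
console.log(roundTripped.consecutiveFailures); // 3
```

The same property is why the fail-loud counter survives crashes mid-write: the destination file is always a complete JSON document.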
@@ -146,10 +146,6 @@ export async function startSupervisor(): Promise<void> {
   await supervisorSingleton.start();
 }
 
 export async function stopSupervisor(): Promise<void> {
   await supervisorSingleton.stop();
 }
 
 export function getSupervisor(): Supervisor {
   return supervisorSingleton;
 }
@@ -168,7 +164,7 @@ export function validateWorkerPidFile(options: ValidateWorkerPidOptions = {}): V
   let pidInfo: PidInfo | null = null;
 
   try {
-    pidInfo = JSON.parse(readFileSync(pidFilePath, 'utf-8')) as PidInfo;
+    pidInfo = JSON.parse(readFileSync(pidFilePath, 'utf-8')) as PidInfo | null;
   } catch (error: unknown) {
     if (error instanceof Error) {
       logger.warn('SYSTEM', 'Failed to parse worker PID file, removing it', { path: pidFilePath }, error);
@@ -182,7 +178,8 @@ export function validateWorkerPidFile(options: ValidateWorkerPidOptions = {}): V
     return 'invalid';
   }
 
-  if (verifyPidFileOwnership(pidInfo)) {
+  const isAlive = verifyPidFileOwnership(pidInfo);
+  if (isAlive && pidInfo) {
     if (options.logAlive ?? true) {
       logger.info('SYSTEM', 'Worker already running (PID alive)', {
         existingPid: pidInfo.pid,
@@ -194,9 +191,9 @@ export function validateWorkerPidFile(options: ValidateWorkerPidOptions = {}): V
   }
 
   logger.info('SYSTEM', 'Removing stale PID file (worker process is dead or PID has been reused)', {
-    pid: pidInfo.pid,
-    port: pidInfo.port,
-    startedAt: pidInfo.startedAt
+    pid: pidInfo?.pid,
+    port: pidInfo?.port,
+    startedAt: pidInfo?.startedAt
   });
   rmSync(pidFilePath, { force: true });
   return 'stale';
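The `as PidInfo | null` change matters because `JSON.parse` can legitimately return `null`. A standalone sketch of the guard (the `PidInfo` shape here is abbreviated from the real one):

```typescript
// JSON.parse('null') returns null, so casting straight to PidInfo hides a
// real case; admitting null in the cast forces the caller to guard before
// dereferencing pidInfo.pid — exactly what the isAlive && pidInfo check does.
interface PidInfo { pid: number; port: number; startedAt: string }

function parsePidFile(raw: string): PidInfo | null {
  try {
    return JSON.parse(raw) as PidInfo | null;
  } catch {
    return null; // corrupt file behaves like no file
  }
}

console.log(parsePidFile('null'));                                     // null
console.log(parsePidFile('not json'));                                 // null
console.log(parsePidFile('{"pid":42,"port":7,"startedAt":"t"}')?.pid); // 42
```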
@@ -1,8 +1,9 @@
-import { ChildProcess, spawnSync } from 'child_process';
+import { ChildProcess, spawn, spawnSync } from 'child_process';
 import { existsSync, mkdirSync, readFileSync, writeFileSync } from 'fs';
 import { homedir } from 'os';
 import path from 'path';
 import { logger } from '../utils/logger.js';
 import { sanitizeEnv } from './env-sanitizer.js';
 
 const REAP_SESSION_SIGTERM_TIMEOUT_MS = 5_000;
 const REAP_SESSION_SIGKILL_TIMEOUT_MS = 1_000;
@@ -15,6 +16,14 @@ export interface ManagedProcessInfo {
   type: string;
   sessionId?: string | number;
   startedAt: string;
+  // POSIX process group leader PID for group-scoped teardown.
+  // On Unix, when a child is spawned with `detached: true`, the kernel calls
+  // setpgid() and the child becomes the leader of its own group — its pgid
+  // equals its pid. Stored so `process.kill(-pgid, signal)` can tear down
+  // the child AND every descendant it spawned in one syscall (Principle 5).
+  // Undefined on Windows (no POSIX groups) and for processes that were not
+  // spawned with detached: true (e.g. the worker itself, MCP stdio clients).
+  pgid?: number;
 }
 
 export interface ManagedProcessRecord extends ManagedProcessInfo {
@@ -303,22 +312,30 @@ export class ProcessRegistry {
       pids: sessionRecords.map(r => r.pid)
     });
 
-    // Phase 1: SIGTERM all alive processes
+    // Phase 1: SIGTERM all alive processes — use process-group teardown for
+    // records that carry pgid so any descendants the SDK spawned are killed
+    // too (Principle 5).
     const aliveRecords = sessionRecords.filter(r => isPidAlive(r.pid));
     for (const record of aliveRecords) {
       try {
-        process.kill(record.pid, 'SIGTERM');
+        if (typeof record.pgid === 'number' && process.platform !== 'win32') {
+          process.kill(-record.pgid, 'SIGTERM');
+        } else {
+          process.kill(record.pid, 'SIGTERM');
+        }
       } catch (error: unknown) {
         if (error instanceof Error) {
           const code = (error as NodeJS.ErrnoException).code;
           if (code !== 'ESRCH') {
             logger.debug('SYSTEM', `Failed to SIGTERM session process PID ${record.pid}`, {
-              pid: record.pid
+              pid: record.pid,
+              pgid: record.pgid
             }, error);
           }
         } else {
           logger.warn('SYSTEM', `Failed to SIGTERM session process PID ${record.pid} (non-Error)`, {
             pid: record.pid,
+            pgid: record.pgid,
             error: String(error)
           });
         }
@@ -333,26 +350,34 @@ export class ProcessRegistry {
       await new Promise(resolve => setTimeout(resolve, 100));
     }
 
-    // Phase 3: SIGKILL any survivors
+    // Phase 3: SIGKILL any survivors — process-group teardown when pgid is
+    // recorded so descendants are killed too.
     const survivors = aliveRecords.filter(r => isPidAlive(r.pid));
     for (const record of survivors) {
       logger.warn('SYSTEM', `Session process PID ${record.pid} did not exit after SIGTERM, sending SIGKILL`, {
         pid: record.pid,
+        pgid: record.pgid,
         sessionId: sessionIdNum
       });
       try {
-        process.kill(record.pid, 'SIGKILL');
+        if (typeof record.pgid === 'number' && process.platform !== 'win32') {
+          process.kill(-record.pgid, 'SIGKILL');
+        } else {
+          process.kill(record.pid, 'SIGKILL');
+        }
       } catch (error: unknown) {
         if (error instanceof Error) {
           const code = (error as NodeJS.ErrnoException).code;
           if (code !== 'ESRCH') {
             logger.debug('SYSTEM', `Failed to SIGKILL session process PID ${record.pid}`, {
-              pid: record.pid
+              pid: record.pid,
+              pgid: record.pgid
            }, error);
          }
        } else {
          logger.warn('SYSTEM', `Failed to SIGKILL session process PID ${record.pid} (non-Error)`, {
            pid: record.pid,
+            pgid: record.pgid,
            error: String(error)
          });
        }
@@ -406,3 +431,401 @@ export function getProcessRegistry(): ProcessRegistry {
 export function createProcessRegistry(registryPath: string): ProcessRegistry {
   return new ProcessRegistry(registryPath);
 }
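The group teardown the diff introduces can be seen end to end with a throwaway child (POSIX path; the child command here is an illustrative stand-in for an SDK subprocess):

```typescript
// With detached:true the child becomes its own process-group leader on Unix,
// so its pgid equals its pid and kill(-pgid) signals the whole group —
// the child and every descendant — in one syscall.
import { spawn } from 'child_process';

const child = spawn(process.execPath, ['-e', 'setInterval(() => {}, 1000)'], {
  detached: true,  // new process group on Unix
  stdio: 'ignore',
});

const pgid = child.pid!; // group leader: pgid === pid

child.on('exit', (_code, signal) => {
  console.log(signal); // SIGTERM
});

if (process.platform !== 'win32') {
  process.kill(-pgid, 'SIGTERM'); // tear down child + descendants at once
} else {
  child.kill('SIGTERM'); // Windows has no POSIX groups; single-PID fallback
}
```

This is the mechanism the registry records `pgid` for: the reapSession phases only need one `kill(-pgid, …)` per record instead of a hand-rolled orphan sweep.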
// ---------------------------------------------------------------------------
// SDK session lookup + exit verification
// ---------------------------------------------------------------------------

export interface TrackedSdkProcess {
  pid: number;
  pgid: number | undefined;
  sessionDbId: number;
  process: ChildProcess;
}

/**
 * Look up the live SDK subprocess for a given session, if any.
 *
 * Returns undefined when no SDK record is registered for the session, or
 * when the ChildProcess reference has been dropped (process exited and was
 * unregistered). Warns on duplicates — multiple SDK records per session
 * indicate a race in createSdkSpawnFactory's pre-spawn cleanup.
 */
export function getSdkProcessForSession(sessionDbId: number): TrackedSdkProcess | undefined {
  const registry = getProcessRegistry();
  const matches = registry.getBySession(sessionDbId).filter(r => r.type === 'sdk');

  if (matches.length > 1) {
    logger.warn('PROCESS', `Multiple SDK processes found for session ${sessionDbId}`, {
      count: matches.length,
      pids: matches.map(m => m.pid),
    });
  }

  const record = matches[0];
  if (!record) return undefined;

  const processRef = registry.getRuntimeProcess(record.id);
  if (!processRef) return undefined;

  return {
    pid: record.pid,
    pgid: record.pgid,
    sessionDbId,
    process: processRef,
  };
}

/**
 * Wait for an SDK subprocess to exit, escalating to SIGKILL on the process
 * group if it overstays `timeoutMs`. Fully event-driven — no polling.
 *
 * This is primary-path cleanup invoked from session-level finally() blocks
 * when a session ends; it is NOT a reaper. It runs at most once per session
 * deletion. Process-group teardown (`kill(-pgid, SIGKILL)`) ensures any
 * descendants the SDK spawned are also killed.
 */
export async function ensureSdkProcessExit(
  tracked: TrackedSdkProcess,
  timeoutMs: number = 5000
): Promise<void> {
  const { pid, pgid, process: proc } = tracked;

  // Already exited? Trust exitCode, not proc.killed — proc.killed only means
  // Node sent a signal; the process may still be running.
  if (proc.exitCode !== null) return;

  const exitPromise = new Promise<void>((resolve) => {
    proc.once('exit', () => resolve());
  });

  const timeoutPromise = new Promise<void>((resolve) => {
    setTimeout(resolve, timeoutMs);
  });

  await Promise.race([exitPromise, timeoutPromise]);

  if (proc.exitCode !== null) return;

  // Timeout: escalate to SIGKILL on the whole process group so any
  // descendants the SDK spawned are killed too (Principle 5).
  logger.warn('PROCESS', `PID ${pid} did not exit after ${timeoutMs}ms, sending SIGKILL to process group`, {
    pid, pgid, timeoutMs,
  });
  try {
    if (typeof pgid === 'number' && process.platform !== 'win32') {
      process.kill(-pgid, 'SIGKILL');
    } else {
      proc.kill('SIGKILL');
    }
  } catch {
    // Already dead — fine.
  }

  // Wait up to 1s for SIGKILL to take effect (event-driven, not blind sleep).
  const sigkillExit = new Promise<void>((resolve) => {
    proc.once('exit', () => resolve());
  });
  const sigkillTimeout = new Promise<void>((resolve) => {
    setTimeout(resolve, 1000);
  });
  await Promise.race([sigkillExit, sigkillTimeout]);
}
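The race pattern in `ensureSdkProcessExit` — resolve on whichever of the 'exit' event or the timer fires first, escalate only on timeout — is small enough to demo standalone (the child here is an immediate-exit stand-in):

```typescript
// Race a child's 'exit' event against a deadline. No polling: the promise
// settles the moment either event fires.
import { spawn } from 'child_process';

function waitForExitOrTimeout(timeoutMs: number): Promise<string> {
  const child = spawn(process.execPath, ['-e', ''], { stdio: 'ignore' }); // exits immediately
  const exited = new Promise<string>(resolve => {
    child.once('exit', () => resolve('exited'));
  });
  const deadline = new Promise<string>(resolve => {
    const t = setTimeout(() => resolve('timeout'), timeoutMs);
    t.unref(); // don't hold the event loop open once the child has exited
  });
  return Promise.race([exited, deadline]);
}

waitForExitOrTimeout(5000).then(outcome => console.log(outcome)); // exited
```

Only the 'timeout' branch would go on to SIGKILL in the real function; the common case resolves without sending any signal at all.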
// ---------------------------------------------------------------------------
// Pool slot waiters — backpressure without eviction
// ---------------------------------------------------------------------------
//
// waitForSlot is used by SDKAgent to avoid starting more concurrent SDK
// subprocesses than configured. It is event-driven: when a process exits and
// is unregistered, notifySlotAvailable() wakes exactly one waiter. There is
// no polling. There is no idle-session eviction (Principle 1 — do not kick
// live sessions to make room; a full pool must apply backpressure upstream).

const TOTAL_PROCESS_HARD_CAP = 10;
const slotWaiters: Array<() => void> = [];

function getActiveSdkCount(): number {
  return getProcessRegistry().getAll().filter(record => record.type === 'sdk').length;
}

function notifySlotAvailable(): void {
  const waiter = slotWaiters.shift();
  if (waiter) waiter();
}

/**
 * Wait until a pool slot is available to spawn another SDK subprocess.
 *
 * Resolves immediately when active SDK process count is below `maxConcurrent`.
 * Otherwise enqueues a waiter that is woken by a subsequent exit handler.
 * Rejects with a timeout error if no slot opens within `timeoutMs`.
 * Rejects immediately if the registry is already at the hard cap.
 */
export async function waitForSlot(maxConcurrent: number, timeoutMs: number = 60_000): Promise<void> {
  const activeCount = getActiveSdkCount();
  if (activeCount >= TOTAL_PROCESS_HARD_CAP) {
    throw new Error(`Hard cap exceeded: ${activeCount} processes in registry (cap=${TOTAL_PROCESS_HARD_CAP}). Refusing to spawn more.`);
  }

  if (activeCount < maxConcurrent) return;

  logger.info('PROCESS', `Pool limit reached (${activeCount}/${maxConcurrent}), waiting for slot...`);

  return new Promise<void>((resolve, reject) => {
    const timeout = setTimeout(() => {
      const idx = slotWaiters.indexOf(onSlot);
      if (idx >= 0) slotWaiters.splice(idx, 1);
      reject(new Error(`Timed out waiting for agent pool slot after ${timeoutMs}ms`));
    }, timeoutMs);

    const onSlot = () => {
      if (getActiveSdkCount() < maxConcurrent) {
        // Only clear the deadline once the slot is actually taken; clearing
        // it before a re-enqueue would let a re-queued waiter hang forever.
        clearTimeout(timeout);
        resolve();
      } else {
        slotWaiters.push(onSlot);
      }
    };

    slotWaiters.push(onSlot);
  });
}
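Stripped of the registry, the no-polling waiter queue reduces to a few lines (the counts and names below are illustrative, not the module's real API):

```typescript
// FIFO slot waiters: each release wakes exactly one queued waiter, in
// arrival order, with no polling loop anywhere.
const waiters: Array<() => void> = [];
let active = 0;
const MAX_CONCURRENT = 2;

function release(): void {
  active -= 1;
  const waiter = waiters.shift(); // wake exactly one
  if (waiter) waiter();
}

function acquire(): Promise<void> {
  if (active < MAX_CONCURRENT) {
    active += 1;
    return Promise.resolve();
  }
  return new Promise<void>(resolve => {
    waiters.push(() => { active += 1; resolve(); });
  });
}

async function demo(): Promise<void> {
  await acquire();
  await acquire();          // pool now full
  const third = acquire();  // queued, not polled
  release();                // frees a slot → wakes the queued waiter
  await third;
  console.log('third acquired');
}
demo();
```

In the real module the `release()` role is played by the child's 'exit' handler calling `notifySlotAvailable()`, so backpressure ends exactly when a subprocess actually leaves.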
// ---------------------------------------------------------------------------
// SDK subprocess spawn
// ---------------------------------------------------------------------------

export interface SpawnedSdkProcess {
  stdin: NonNullable<ChildProcess['stdin']>;
  stdout: NonNullable<ChildProcess['stdout']>;
  stderr: NonNullable<ChildProcess['stderr']>;
  readonly killed: boolean;
  readonly exitCode: number | null;
  kill: ChildProcess['kill'];
  on: ChildProcess['on'];
  once: ChildProcess['once'];
  off: ChildProcess['off'];
}

export interface SpawnSdkOptions {
  command: string;
  args: string[];
  cwd?: string;
  env?: NodeJS.ProcessEnv;
  signal?: AbortSignal;
}

/**
 * Spawn a Claude SDK subprocess in its own POSIX process group.
 *
 * The spawn uses `detached: true` so the child becomes the leader of a new
 * process group (setpgid). The leader's PID equals its pgid on Unix, so we
 * store `child.pid` as both pid and pgid on the managed process record.
 * Shutdown then signals the group via `process.kill(-pgid, signal)`, tearing
 * down the SDK child AND every descendant in one syscall (Principle 5).
 *
 * Windows caveat: `detached: true` does not create a POSIX group. The
 * recorded pgid is still the child PID so Windows teardown at least kills
 * the direct child; full subtree teardown on Windows requires Job Objects
 * or `taskkill /T /F` (see shutdown.ts).
 *
 * Node's child_process.spawn is used intentionally — Bun.spawn does NOT
 * support `detached: true` (see PATHFINDER-2026-04-22/_reference.md Part 2
 * row 3), and this module must work under Bun as well as Node.
 */
export function spawnSdkProcess(
  sessionDbId: number,
  options: SpawnSdkOptions
): { process: SpawnedSdkProcess; pid: number; pgid: number } | null {
  const registry = getProcessRegistry();

  // On Windows, use cmd.exe wrapper for .cmd files to properly handle paths with spaces.
  const useCmdWrapper = process.platform === 'win32' && options.command.endsWith('.cmd');
  const env = sanitizeEnv(options.env ?? process.env);

  // Filter empty string args AND their preceding flag (Issue #2049).
  // The Agent SDK emits ["--setting-sources", ""] when settingSources defaults to [].
  // Simply dropping "" leaves an orphan --setting-sources that consumes the next
  // flag as its value, crashing Claude Code 2.1.109+ with
  // "Invalid setting source: --permission-mode". Drop the flag too so the SDK
  // default (no setting sources) is preserved by omission.
  const filteredArgs: string[] = [];
  for (const arg of options.args) {
    if (arg === '') {
      if (filteredArgs.length > 0 && filteredArgs[filteredArgs.length - 1].startsWith('--')) {
        filteredArgs.pop();
      }
      continue;
    }
    filteredArgs.push(arg);
  }

  // Unix: detached:true causes the kernel to setpgid() on the child so the
  // child becomes leader of a new process group whose pgid equals its pid.
  // Windows: detached:true decouples the child from the parent console; there
  // is no POSIX group, but the flag is still safe to pass.
  //
  // stdin must be 'pipe' (not 'ignore') because SpawnedSdkProcess.stdin is
  // typed NonNullable<...> and the Claude Agent SDK consumes that pipe to
  // stream prompts in. With 'ignore', child.stdin would be null and the
  // null-check below (line ~737) would tear the child down immediately.
  const child = useCmdWrapper
    ? spawn('cmd.exe', ['/d', '/c', options.command, ...filteredArgs], {
        cwd: options.cwd,
        env,
        detached: true,
        stdio: ['pipe', 'pipe', 'pipe'],
        signal: options.signal,
        windowsHide: true,
      })
    : spawn(options.command, filteredArgs, {
        cwd: options.cwd,
        env,
        detached: true,
        stdio: ['pipe', 'pipe', 'pipe'],
        signal: options.signal,
        windowsHide: true,
      });

  // ALWAYS attach an 'error' listener BEFORE any other code runs, regardless of
  // whether the child has a PID. child_process.spawn emits 'error' asynchronously
  // for ENOENT, EACCES, AbortSignal-driven aborts, etc. Without a listener these
  // become uncaughtException — the cause of "The operation was aborted." escaping
  // to the daemon during crash-recovery loops.
  child.on('error', (err: Error) => {
    logger.warn('SDK_SPAWN', `[session-${sessionDbId}] child emitted error event`, {
      sessionDbId,
      pid: child.pid,
      errorName: err.name,
      errorCode: (err as NodeJS.ErrnoException).code,
    }, err);
  });

  if (!child.pid) {
    logger.error('PROCESS', 'Spawn succeeded but produced no PID', { sessionDbId });
    return null;
  }

  const pid = child.pid;
  const pgid = pid; // On Unix with detached:true, pgid === pid. On Windows, this is an alias.

  // Capture stderr for debugging spawn failures.
  if (child.stderr) {
    child.stderr.on('data', (data: Buffer) => {
      logger.debug('SDK_SPAWN', `[session-${sessionDbId}] stderr: ${data.toString().trim()}`);
    });
  }

  // Register the process in the supervisor registry with pgid recorded so
  // the shutdown cascade can signal the whole group.
  const recordId = `sdk:${sessionDbId}:${pid}`;
  registry.register(recordId, {
    pid,
    type: 'sdk',
    sessionId: sessionDbId,
    startedAt: new Date().toISOString(),
    pgid,
  }, child);

  // Auto-unregister on exit. child.on('exit') is the authoritative event-driven
  // signal that a process has left — no polling, no sweeper needed (Principle 4).
  child.on('exit', (code: number | null, signal: string | null) => {
    if (code !== 0) {
      logger.warn('SDK_SPAWN', `[session-${sessionDbId}] Claude process exited`, { code, signal, pid });
    }
    registry.unregister(recordId);
    // Wake one pool-slot waiter since a slot just freed up.
    notifySlotAvailable();
  });

  if (!child.stdin || !child.stdout || !child.stderr) {
    logger.error('PROCESS', 'Spawned SDK child missing required stdio streams', {
      sessionDbId,
      pid,
      hasStdin: Boolean(child.stdin),
      hasStdout: Boolean(child.stdout),
      hasStderr: Boolean(child.stderr),
    });
    try { child.kill('SIGKILL'); } catch { /* already dead */ }
    return null;
  }

  const spawned: SpawnedSdkProcess = {
    stdin: child.stdin,
    stdout: child.stdout,
    stderr: child.stderr,
    get killed() { return child.killed; },
    get exitCode() { return child.exitCode; },
    kill: child.kill.bind(child),
    on: child.on.bind(child),
    once: child.once.bind(child),
    off: child.off.bind(child),
  };

  return { process: spawned, pid, pgid };
}

/**
 * SDK-compatible spawn factory.
 *
 * The Claude Agent SDK's `spawnClaudeCodeProcess` option calls our factory
 * with its own spawn arguments; we forward them into `spawnSdkProcess` which
 * creates the child in its own process group and records it in the supervisor
 * registry. The returned shape is the minimal subset of ChildProcess that the
 * SDK consumes — stdin/stdout/stderr pipes, killed/exitCode getters, and
 * kill/on/once/off.
 *
 * Pre-spawn cleanup: if a previous process for this session is still alive
 * (e.g. a crash-recovery attempt that collided with a still-running SDK),
 * SIGTERM it. Multiple processes sharing the same --resume UUID waste API
 * credits and can conflict with each other (Issue #1590).
 */
export function createSdkSpawnFactory(sessionDbId: number) {
  return (spawnOptions: SpawnSdkOptions): SpawnedSdkProcess => {
    const registry = getProcessRegistry();

    // Kill any existing process for this session before spawning a new one.
    const existing = registry.getBySession(sessionDbId).filter(r => r.type === 'sdk');
    for (const record of existing) {
      if (!isPidAlive(record.pid)) continue;
      try {
        if (typeof record.pgid === 'number') {
          // Signal the whole group — kill the SDK child and any descendants.
          if (process.platform !== 'win32') {
            process.kill(-record.pgid, 'SIGTERM');
          } else {
            process.kill(record.pid, 'SIGTERM');
          }
        } else {
          process.kill(record.pid, 'SIGTERM');
        }
        logger.warn('PROCESS', `Killing duplicate SDK process PID ${record.pid} before spawning new one for session ${sessionDbId}`, {
          existingPid: record.pid,
          sessionDbId,
        });
      } catch (error: unknown) {
        const code = error instanceof Error ? (error as NodeJS.ErrnoException).code : undefined;
        if (code !== 'ESRCH') {
          if (error instanceof Error) {
            logger.warn('PROCESS', `Failed to SIGTERM duplicate SDK process PID ${record.pid}`, { sessionDbId }, error);
          } else {
            logger.warn('PROCESS', `Failed to SIGTERM duplicate SDK process PID ${record.pid} (non-Error)`, {
              sessionDbId, error: String(error),
            });
          }
        }
      }
    }

    const result = spawnSdkProcess(sessionDbId, spawnOptions);
    if (!result) {
      // Match the legacy failure mode: the SDK needs a process-like object
      // even on spawn failure; throwing here surfaces via exit code 2 to the
      // hook layer (Principle 2 — fail-fast).
      throw new Error(`Failed to spawn SDK subprocess for session ${sessionDbId}`);
    }

    return result.process;
  };
}
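The empty-arg filter is the subtlest part of the spawn path; extracted as a pure function it is easy to check in isolation:

```typescript
// Dropping "" alone would leave an orphan --flag that swallows the next
// argument as its value (the Issue #2049 failure mode), so the filter
// drops the preceding flag together with its empty value.
function filterEmptyFlagPairs(args: string[]): string[] {
  const out: string[] = [];
  for (const arg of args) {
    if (arg === '') {
      if (out.length > 0 && out[out.length - 1].startsWith('--')) {
        out.pop(); // remove the flag whose value was the empty string
      }
      continue;
    }
    out.push(arg);
  }
  return out;
}

console.log(filterEmptyFlagPairs(['--setting-sources', '', '--permission-mode', 'plan']));
// [ '--permission-mode', 'plan' ]
```

Without the `pop()`, the surviving `--setting-sources` would consume `--permission-mode` as its value, which is exactly the crash the comment in `spawnSdkProcess` describes.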
|
||||
|
||||
+60
-40
@@ -34,16 +34,18 @@ export async function runShutdownCascade(options: ShutdownCascadeOptions): Promi
|
||||
}
|
||||
|
||||
try {
|
||||
await signalProcess(record.pid, 'SIGTERM');
|
||||
await signalProcess(record, 'SIGTERM');
|
||||
} catch (error: unknown) {
|
||||
if (error instanceof Error) {
|
||||
logger.debug('SYSTEM', 'Failed to send SIGTERM to child process', {
|
||||
pid: record.pid,
|
||||
pgid: record.pgid,
|
||||
type: record.type
|
||||
}, error);
|
||||
} else {
|
||||
logger.warn('SYSTEM', 'Failed to send SIGTERM to child process (non-Error)', {
|
||||
pid: record.pid,
|
||||
pgid: record.pgid,
|
||||
type: record.type,
|
||||
error: String(error)
|
||||
});
|
||||
@@ -56,16 +58,18 @@ export async function runShutdownCascade(options: ShutdownCascadeOptions): Promi
|
||||
const survivors = childRecords.filter(record => isPidAlive(record.pid));
|
||||
for (const record of survivors) {
|
||||
try {
|
||||
await signalProcess(record.pid, 'SIGKILL');
|
||||
await signalProcess(record, 'SIGKILL');
|
||||
} catch (error: unknown) {
|
||||
if (error instanceof Error) {
|
||||
logger.debug('SYSTEM', 'Failed to force kill child process', {
|
||||
pid: record.pid,
|
||||
pgid: record.pgid,
|
||||
type: record.type
|
||||
}, error);
|
||||
} else {
|
||||
logger.warn('SYSTEM', 'Failed to force kill child process (non-Error)', {
|
||||
pid: record.pid,
|
||||
pgid: record.pgid,
|
||||
type: record.type,
|
||||
error: String(error)
|
||||
});
|
||||
@@ -110,7 +114,38 @@ async function waitForExit(records: ManagedProcessRecord[], timeoutMs: number):
|
||||
}
|
||||
}
|
||||
|
||||
async function signalProcess(pid: number, signal: 'SIGTERM' | 'SIGKILL'): Promise<void> {
|
||||
async function signalProcess(record: ManagedProcessRecord, signal: 'SIGTERM' | 'SIGKILL'): Promise<void> {
|
||||
const { pid, pgid } = record;
|
||||
|
||||
// Unix path: when the record carries a pgid (set when the child was spawned
|
||||
// with detached:true so it became its own group leader), signal the negative
|
||||
// PID to tear down the whole process group in one syscall — the SDK child
|
||||
// AND every descendant it spawned. This replaces hand-rolled orphan sweeps
|
||||
// (Principle 5: OS-supervised process groups over hand-rolled reapers).
|
||||
//
|
||||
// Falls back to single-PID kill when pgid is absent (the worker itself,
|
||||
// MCP stdio clients, anything not spawned with detached:true).
|
||||
if (process.platform !== 'win32') {
|
||||
try {
|
||||
if (typeof pgid === 'number') {
|
||||
process.kill(-pgid, signal);
|
||||
} else {
|
||||
process.kill(pid, signal);
|
||||
}
|
||||
} catch (error: unknown) {
|
||||
if (error instanceof Error) {
|
||||
const errno = (error as NodeJS.ErrnoException).code;
|
||||
if (errno === 'ESRCH') {
|
||||
return;
|
||||
}
|
||||
}
|
||||
throw error;
|
||||
}
|
||||
return;
|
||||
}
|
||||
|
||||
// Windows: no POSIX process groups. SIGTERM uses single-PID kill; SIGKILL
|
||||
// uses tree-kill or taskkill /T to walk the descendant tree.
|
||||
if (signal === 'SIGTERM') {
|
||||
try {
|
||||
process.kill(pid, signal);
|
||||
@@ -126,50 +161,35 @@ async function signalProcess(pid: number, signal: 'SIGTERM' | 'SIGKILL'): Promis
    return;
  }

  if (process.platform === 'win32') {
    const treeKill = await loadTreeKill();
    if (treeKill) {
      await new Promise<void>((resolve, reject) => {
        treeKill(pid, signal, (error) => {
          if (!error) {
            resolve();
            return;
          }

          const errno = (error as NodeJS.ErrnoException).code;
          if (errno === 'ESRCH') {
            resolve();
            return;
          }
          reject(error);
        });
      });
      return;
    }

    const args = ['/PID', String(pid), '/T'];
    if (signal === 'SIGKILL') {
      args.push('/F');
    }

    await execFileAsync('taskkill', args, {
      timeout: HOOK_TIMEOUTS.POWERSHELL_COMMAND,
      windowsHide: true
    });
    return;
  }

  try {
    process.kill(pid, signal);
  } catch (error: unknown) {
    if (error instanceof Error) {
      const errno = (error as NodeJS.ErrnoException).code;
      if (errno === 'ESRCH') {
        return;
      }
    }
    throw error;
  }
}

async function loadTreeKill(): Promise<TreeKillFn | null> {
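The commit's claim that orphans are prevented by detached spawn plus `kill(-pgid)` can be sketched standalone. This is a minimal POSIX-only illustration, not the project's worker code; the spawned child command is arbitrary:

```typescript
import { spawn } from 'child_process';

// POSIX sketch: detached:true makes the child the leader of a new process
// group, so its pid doubles as the pgid. Signalling -pgid reaches the child
// and every descendant it spawned, which is what prevents orphans.
const child = spawn(process.execPath, ['-e', 'setTimeout(() => {}, 60000)'], {
  detached: true,   // new process group (and session) on POSIX
  stdio: 'ignore'
});
child.unref();      // do not keep the parent alive for this child

const pgid = child.pid!; // group leader: pgid === pid

if (process.platform !== 'win32') {
  try {
    process.kill(-pgid, 'SIGTERM'); // negative pid targets the whole group
  } catch (error) {
    const errno = (error as NodeJS.ErrnoException).code;
    if (errno !== 'ESRCH') throw error; // ESRCH: group already exited
  }
}
```

On Windows there is no process-group equivalent of `kill(-pgid)`, which is why the diff falls back to `tree-kill` / `taskkill /T` there.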
@@ -15,7 +15,7 @@ type DataItem = Observation | Summary | UserPrompt;
 /**
  * Generic pagination hook for observations, summaries, and prompts
  */
-function usePaginationFor(endpoint: string, dataType: DataType, currentFilter: string, currentSource: string) {
+function usePaginationFor<TItem extends DataItem>(endpoint: string, dataType: DataType, currentFilter: string, currentSource: string) {
   const [state, setState] = useState<PaginationState>({
     isLoading: false,
     hasMore: true
@@ -30,7 +30,7 @@ function usePaginationFor(endpoint: string, dataType: DataType, currentFilter: s
    * Load more items from the API
    * Automatically resets offset to 0 if filter has changed
    */
-  const loadMore = useCallback(async (): Promise<DataItem[]> => {
+  const loadMore = useCallback(async (): Promise<TItem[]> => {
     // Check if filter changed - if so, reset pagination synchronously
     const selectionKey = `${currentSource}::${currentFilter}`;
     const filterChanged = lastSelectionRef.current !== selectionKey;
@@ -75,7 +75,7 @@ function usePaginationFor(endpoint: string, dataType: DataType, currentFilter: s
       throw new Error(`Failed to load ${dataType}: ${response.statusText}`);
     }

-    const data = await response.json() as { items: DataItem[], hasMore: boolean };
+    const data = await response.json() as { items: TItem[], hasMore: boolean };

     const nextState = {
       ...stateRef.current,
@@ -106,9 +106,9 @@ function usePaginationFor(endpoint: string, dataType: DataType, currentFilter: s
  * Hook for paginating observations
  */
 export function usePagination(currentFilter: string, currentSource: string) {
-  const observations = usePaginationFor(API_ENDPOINTS.OBSERVATIONS, 'observations', currentFilter, currentSource);
-  const summaries = usePaginationFor(API_ENDPOINTS.SUMMARIES, 'summaries', currentFilter, currentSource);
-  const prompts = usePaginationFor(API_ENDPOINTS.PROMPTS, 'prompts', currentFilter, currentSource);
+  const observations = usePaginationFor<Observation>(API_ENDPOINTS.OBSERVATIONS, 'observations', currentFilter, currentSource);
+  const summaries = usePaginationFor<Summary>(API_ENDPOINTS.SUMMARIES, 'summaries', currentFilter, currentSource);
+  const prompts = usePaginationFor<UserPrompt>(API_ENDPOINTS.PROMPTS, 'prompts', currentFilter, currentSource);

   return {
     observations,
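The typing change in this hunk — parameterizing the hook so each call site fixes the item type once — can be illustrated without React. A minimal sketch, not the project's actual hook; the names here are hypothetical:

```typescript
// The caller picks TItem once and every page it loads comes back as
// TItem[] instead of a loose union type that needs casting downstream.
interface Observation { id: number; text: string }

type Page<TItem> = { items: TItem[]; hasMore: boolean };

function makeLoader<TItem>(fetchPage: (offset: number) => Page<TItem>) {
  let offset = 0;
  return (): TItem[] => {
    const page = fetchPage(offset);
    offset += page.items.length; // advance pagination cursor
    return page.items;           // typed TItem[], no cast at the call site
  };
}

// Fake one-page data source standing in for the HTTP endpoint:
const loadObservations = makeLoader<Observation>((offset) => ({
  items: offset === 0 ? [{ id: 1, text: 'first' }] : [],
  hasMore: false
}));

const first = loadObservations(); // inferred as Observation[]
```

This is the same reason the diff changes `Promise<DataItem[]>` to `Promise<TItem[]>`: the union collapses to the concrete type chosen at the call site.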
@@ -0,0 +1,9 @@
{
  "extends": "../../../tsconfig.json",
  "compilerOptions": {
    "lib": ["ES2022", "DOM", "DOM.Iterable"],
    "rootDir": "."
  },
  "include": ["./**/*"],
  "exclude": []
}
@@ -1,80 +0,0 @@
/**
 * Bun Path Utility
 *
 * Resolves the Bun executable path for environments where Bun is not in PATH
 * (e.g., fish shell users where ~/.config/fish/config.fish isn't read by /bin/sh)
 */

import { spawnSync } from 'child_process';
import { existsSync } from 'fs';
import { join } from 'path';
import { homedir } from 'os';
import { logger } from './logger.js';

/**
 * Get the Bun executable path
 * Tries PATH first, then checks common installation locations
 * Returns absolute path if found, null otherwise
 */
export function getBunPath(): string | null {
  const isWindows = process.platform === 'win32';

  // Try PATH first
  try {
    const result = spawnSync('bun', ['--version'], {
      encoding: 'utf-8',
      stdio: ['pipe', 'pipe', 'pipe'],
      shell: false // SECURITY: No need for shell, bun is the executable
    });
    if (result.status === 0) {
      return 'bun'; // Available in PATH
    }
  } catch (e) {
    logger.debug('SYSTEM', 'Bun not found in PATH, checking common installation locations', {
      error: e instanceof Error ? e.message : String(e)
    });
  }

  // Check common installation paths
  const bunPaths = isWindows
    ? [join(homedir(), '.bun', 'bin', 'bun.exe')]
    : [
        join(homedir(), '.bun', 'bin', 'bun'),
        '/usr/local/bin/bun',
        '/opt/homebrew/bin/bun', // Apple Silicon Homebrew
        '/home/linuxbrew/.linuxbrew/bin/bun' // Linux Homebrew
      ];

  for (const bunPath of bunPaths) {
    if (existsSync(bunPath)) {
      return bunPath;
    }
  }

  return null;
}

/**
 * Get the Bun executable path or throw an error
 * Use this when Bun is required for operation
 */
export function getBunPathOrThrow(): string {
  const bunPath = getBunPath();
  if (!bunPath) {
    const isWindows = process.platform === 'win32';
    const installCmd = isWindows
      ? 'powershell -c "irm bun.sh/install.ps1 | iex"'
      : 'curl -fsSL https://bun.sh/install | bash';
    throw new Error(
      `Bun is required but not found. Install it with:\n  ${installCmd}\nThen restart your terminal.`
    );
  }
  return bunPath;
}

/**
 * Check if Bun is available (in PATH or common locations)
 */
export function isBunAvailable(): boolean {
  return getBunPath() !== null;
}
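The deleted utility's probe-PATH-then-walk-known-locations pattern is generic and worth keeping in mind. A minimal sketch for an arbitrary executable; `resolveExecutable` and its fallback list are illustrative, not part of the codebase:

```typescript
import { spawnSync } from 'child_process';
import { existsSync } from 'fs';

// Generic resolver: probe PATH first with a cheap --version call, then walk
// a list of well-known install locations. Returns null if nothing matches.
function resolveExecutable(name: string, fallbacks: string[]): string | null {
  const probe = spawnSync(name, ['--version'], { encoding: 'utf-8', shell: false });
  if (probe.status === 0) return name; // resolvable via PATH as-is
  return fallbacks.find((p) => existsSync(p)) ?? null;
}

// 'node' should be resolvable wherever this sketch itself runs:
const nodePath = resolveExecutable('node', ['/usr/local/bin/node']);
```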
+38 -3
@@ -15,12 +15,47 @@ export enum LogLevel {
   SILENT = 4
 }

-export type Component = 'HOOK' | 'WORKER' | 'SDK' | 'PARSER' | 'DB' | 'SYSTEM' | 'HTTP' | 'SESSION' | 'CHROMA' | 'CHROMA_MCP' | 'CHROMA_SYNC' | 'FOLDER_INDEX' | 'CLAUDE_MD' | 'QUEUE' | 'TELEGRAM';
+export type Component =
+  | 'AGENTS_MD'
+  | 'BRANCH'
+  | 'CHROMA'
+  | 'CHROMA_MCP'
+  | 'CHROMA_SYNC'
+  | 'CLAUDE_MD'
+  | 'CONFIG'
+  | 'CONSOLE'
+  | 'CURSOR'
+  | 'DB'
+  | 'DEDUP'
+  | 'ENV'
+  | 'FOLDER_INDEX'
+  | 'HOOK'
+  | 'HTTP'
+  | 'IMPORT'
+  | 'INGEST'
+  | 'OPENCLAW'
+  | 'OPENCODE'
+  | 'PARSER'
+  | 'PROCESS'
+  | 'PROJECT_NAME'
+  | 'QUEUE'
+  | 'SDK'
+  | 'SDK_SPAWN'
+  | 'SEARCH'
+  | 'SECURITY'
+  | 'SESSION'
+  | 'SETTINGS'
+  | 'SHUTDOWN'
+  | 'SYSTEM'
+  | 'TELEGRAM'
+  | 'TRANSCRIPT'
+  | 'WINDSURF'
+  | 'WORKER';

 interface LogContext {
-  sessionId?: number;
+  sessionId?: string | number;
   memorySessionId?: string;
-  correlationId?: string;
+  correlationId?: string | number;
   [key: string]: any;
 }
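Why a string-literal union rather than plain `string` for component tags: the compiler rejects typos at every log call site and can prove a `switch` over the union is exhaustive. A sketch abbreviated to three members (the real union above has 35):

```typescript
// A string-literal union gives compile-time checking of component tags.
type Component = 'DB' | 'WORKER' | 'QUEUE';

function label(c: Component): string {
  switch (c) {
    case 'DB': return 'database';
    case 'WORKER': return 'worker';
    case 'QUEUE': return 'queue';
    // No default needed: the compiler sees the switch is exhaustive,
    // and adding a new member to Component makes this stop compiling
    // until it is handled here.
  }
}

const tag = label('WORKER');
```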
+60 -45
@@ -10,82 +10,97 @@
  * (should not be persisted to memory)
  * 4. <system-reminder> - Claude Code-injected system reminders
  * (CLAUDE.md contents, deferred tool lists, etc. — should not be persisted)
  * 5. <persisted-output> - Persisted-output payload tag
  *
  * EDGE PROCESSING PATTERN: Filter at hook layer before sending to worker/storage.
  * This keeps the worker service simple and follows one-way data stream.
  *
+ * PATHFINDER plan 03 phase 8: collapsed countTags + stripTagsInternal into a
+ * single alternation regex. One pass over the input. One helper, N callers
+ * (`stripMemoryTagsFromJson` / `stripMemoryTagsFromPrompt` are thin adapters).
  */

 import { logger } from './logger.js';

+/** All tag names this module strips. Single source of truth for the regex. */
+const TAG_NAMES = [
+  'private',
+  'claude-mem-context',
+  'system_instruction',
+  'system-instruction',
+  'persisted-output',
+  'system-reminder',
+] as const;
+type TagName = (typeof TAG_NAMES)[number];
+
+/**
+ * Single-pass alternation regex covering every privacy / context tag.
+ * Backreference `\1` ensures a closing tag matches the opening name; tag
+ * attributes (e.g. `<system-reminder data-foo="…">`) are tolerated via
+ * `[^>]*`.
+ */
+const STRIP_REGEX = new RegExp(
+  `<(${TAG_NAMES.join('|')})\\b[^>]*>[\\s\\S]*?</\\1>`,
+  'g'
+);
+
 /**
  * Regex to match <system-reminder> tags and their content.
  * Exported for use by transcript parsers that strip system-reminder at read-time.
  *
+ * Kept as a separate single-tag regex because the active transcript parser
+ * (`src/shared/transcript-parser.ts`) consumes only this one tag and would
+ * otherwise need to re-import the multi-tag list.
  */
 export const SYSTEM_REMINDER_REGEX = /<system-reminder>[\s\S]*?<\/system-reminder>/g;

-/**
- * Maximum number of tags allowed in a single content block
- * This protects against ReDoS (Regular Expression Denial of Service) attacks
- * where malicious input with many nested/unclosed tags could cause catastrophic backtracking
- */
+/** Maximum total stripped-tag count before we log a ReDoS-class anomaly. */
 const MAX_TAG_COUNT = 100;

 /**
- * Count total number of opening tags in content
- * Used for ReDoS protection before regex processing
+ * Strip every recognised tag from `input` in a single pass.
+ *
+ * @returns the stripped string (trimmed) and per-tag counts. Counts are
+ *          surfaced to logs for observability but are not used as a control
+ *          signal.
  */
-function countTags(content: string): number {
-  const privateCount = (content.match(/<private>/g) || []).length;
-  const contextCount = (content.match(/<claude-mem-context>/g) || []).length;
-  const systemInstructionCount = (content.match(/<system_instruction>/g) || []).length;
-  const systemInstructionHyphenCount = (content.match(/<system-instruction>/g) || []).length;
-  const persistedOutputCount = (content.match(/<persisted-output>/g) || []).length;
-  const systemReminderCount = (content.match(/<system-reminder>/g) || []).length;
-  return privateCount + contextCount + systemInstructionCount + systemInstructionHyphenCount + persistedOutputCount + systemReminderCount;
-}
-
-/**
- * Internal function to strip memory tags from content
- * Shared logic extracted from both JSON and prompt stripping functions
- */
-function stripTagsInternal(content: string): string {
-  // ReDoS protection: limit tag count before regex processing
-  const tagCount = countTags(content);
-  if (tagCount > MAX_TAG_COUNT) {
+export function stripTags(input: string): { stripped: string; counts: Record<TagName, number> } {
+  const counts: Record<TagName, number> = Object.fromEntries(
+    TAG_NAMES.map(name => [name, 0])
+  ) as Record<TagName, number>;
+
+  STRIP_REGEX.lastIndex = 0; // /g state is per-instance — reset before each call.
+
+  let total = 0;
+  const stripped = input.replace(STRIP_REGEX, (_, name: TagName) => {
+    counts[name] = (counts[name] ?? 0) + 1;
+    total += 1;
+    return '';
+  });
+
+  if (total > MAX_TAG_COUNT) {
     logger.warn('SYSTEM', 'tag count exceeds limit', undefined, {
-      tagCount,
+      tagCount: total,
       maxAllowed: MAX_TAG_COUNT,
-      contentLength: content.length
+      contentLength: input.length,
     });
-    // Still process but log the anomaly
   }

-  return content
-    .replace(/<claude-mem-context>[\s\S]*?<\/claude-mem-context>/g, '')
-    .replace(/<private>[\s\S]*?<\/private>/g, '')
-    .replace(/<system_instruction>[\s\S]*?<\/system_instruction>/g, '')
-    .replace(/<system-instruction>[\s\S]*?<\/system-instruction>/g, '')
-    .replace(/<persisted-output>[\s\S]*?<\/persisted-output>/g, '')
-    .replace(SYSTEM_REMINDER_REGEX, '')
-    .trim();
+  return { stripped: stripped.trim(), counts };
 }

 /**
- * Strip memory tags from JSON-serialized content (tool inputs/responses)
- *
- * @param content - Stringified JSON content from tool_input or tool_response
- * @returns Cleaned content with tags removed, or '{}' if invalid
+ * Strip memory tags from JSON-serialized content (tool inputs/responses).
+ * Thin adapter around `stripTags` — same regex, same single pass.
  */
 export function stripMemoryTagsFromJson(content: string): string {
-  return stripTagsInternal(content);
+  return stripTags(content).stripped;
 }

 /**
- * Strip memory tags from user prompt content
- *
- * @param content - Raw user prompt text
- * @returns Cleaned content with tags removed
+ * Strip memory tags from user prompt content.
+ * Thin adapter around `stripTags` — same regex, same single pass.
  */
 export function stripMemoryTagsFromPrompt(content: string): string {
-  return stripTagsInternal(content);
+  return stripTags(content).stripped;
 }
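The single-pass alternation idea in this hunk can be demonstrated standalone. This sketch uses the same regex shape with a two-tag list, not the project's module:

```typescript
// Build one alternation regex from a tag list; the \1 backreference forces
// the closing tag to match the opening tag's name, and [^>]* tolerates
// attributes on the opening tag.
const TAGS = ['private', 'system-reminder'] as const;
const STRIP = new RegExp(`<(${TAGS.join('|')})\\b[^>]*>[\\s\\S]*?</\\1>`, 'g');

function strip(input: string): string {
  STRIP.lastIndex = 0; // /g regexes carry lastIndex state between calls
  return input.replace(STRIP, '').trim();
}

const out = strip('keep <private>secret</private> this <system-reminder x="1">noise</system-reminder>');
```

One pass of `String.prototype.replace` visits every match in order, which is why the old chain of six per-tag `.replace` calls collapses into a single traversal of the input.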
@@ -1,266 +0,0 @@
/**
 * TranscriptParser - Properly parse Claude Code transcript JSONL files
 * Handles all transcript entry types based on validated model
 */

import { readFileSync } from 'fs';
import { logger } from './logger.js';
import { SYSTEM_REMINDER_REGEX } from './tag-stripping.js';
import type {
  TranscriptEntry,
  UserTranscriptEntry,
  AssistantTranscriptEntry,
  SummaryTranscriptEntry,
  SystemTranscriptEntry,
  QueueOperationTranscriptEntry,
  ContentItem,
  TextContent,
} from '../types/transcript.js';

export interface ParseStats {
  totalLines: number;
  parsedEntries: number;
  failedLines: number;
  entriesByType: Record<string, number>;
  failureRate: number;
}

export class TranscriptParser {
  private entries: TranscriptEntry[] = [];
  private parseErrors: Array<{ lineNumber: number; error: string }> = [];

  constructor(transcriptPath: string) {
    this.parseTranscript(transcriptPath);
  }

  private parseTranscript(transcriptPath: string): void {
    const content = readFileSync(transcriptPath, 'utf-8').trim();
    if (!content) return;

    const lines = content.split('\n');

    lines.forEach((line, index) => {
      try {
        const entry = JSON.parse(line) as TranscriptEntry;
        this.entries.push(entry);
      } catch (error) {
        logger.debug('PARSER', 'Failed to parse transcript line', { lineNumber: index + 1 }, error as Error);
        this.parseErrors.push({
          lineNumber: index + 1,
          error: error instanceof Error ? error.message : String(error),
        });
      }
    });

    // Log summary if there were parse errors
    if (this.parseErrors.length > 0) {
      logger.error('PARSER', `Failed to parse ${this.parseErrors.length} lines`, {
        path: transcriptPath,
        totalLines: lines.length,
        errorCount: this.parseErrors.length
      });
    }
  }

  /**
   * Get all entries of a specific type
   */
  getEntriesByType<T extends TranscriptEntry>(type: T['type']): T[] {
    return this.entries.filter((e) => e.type === type) as T[];
  }

  /**
   * Get all user entries
   */
  getUserEntries(): UserTranscriptEntry[] {
    return this.getEntriesByType<UserTranscriptEntry>('user');
  }

  /**
   * Get all assistant entries
   */
  getAssistantEntries(): AssistantTranscriptEntry[] {
    return this.getEntriesByType<AssistantTranscriptEntry>('assistant');
  }

  /**
   * Get all summary entries
   */
  getSummaryEntries(): SummaryTranscriptEntry[] {
    return this.getEntriesByType<SummaryTranscriptEntry>('summary');
  }

  /**
   * Get all system entries
   */
  getSystemEntries(): SystemTranscriptEntry[] {
    return this.getEntriesByType<SystemTranscriptEntry>('system');
  }

  /**
   * Get all queue operation entries
   */
  getQueueOperationEntries(): QueueOperationTranscriptEntry[] {
    return this.getEntriesByType<QueueOperationTranscriptEntry>('queue-operation');
  }

  /**
   * Get last entry of a specific type
   */
  getLastEntryByType<T extends TranscriptEntry>(type: T['type']): T | null {
    const entries = this.getEntriesByType<T>(type);
    return entries.length > 0 ? entries[entries.length - 1] : null;
  }

  /**
   * Extract text content from content items
   */
  private extractTextFromContent(content: string | ContentItem[]): string {
    if (typeof content === 'string') {
      return content;
    }

    if (Array.isArray(content)) {
      return content
        .filter((item): item is TextContent => item.type === 'text')
        .map((item) => item.text)
        .join('\n');
    }

    return '';
  }

  /**
   * Get last user message text (finds last entry with actual text content)
   */
  getLastUserMessage(): string {
    const userEntries = this.getUserEntries();

    // Iterate backward to find the last user message with text content
    for (let i = userEntries.length - 1; i >= 0; i--) {
      const entry = userEntries[i];
      if (!entry?.message?.content) continue;

      const text = this.extractTextFromContent(entry.message.content);
      if (text) return text;
    }

    return '';
  }

  /**
   * Get last assistant message text (finds last entry with text content, with optional system-reminder filtering)
   */
  getLastAssistantMessage(filterSystemReminders = true): string {
    const assistantEntries = this.getAssistantEntries();

    // Iterate backward to find the last assistant message with text content
    for (let i = assistantEntries.length - 1; i >= 0; i--) {
      const entry = assistantEntries[i];
      if (!entry?.message?.content) continue;

      let text = this.extractTextFromContent(entry.message.content);
      if (!text) continue;

      if (filterSystemReminders) {
        // Filter out system-reminder tags and their content
        text = text.replace(SYSTEM_REMINDER_REGEX, '');
        // Clean up excessive whitespace
        text = text.replace(/\n{3,}/g, '\n\n').trim();
      }

      if (text) return text;
    }

    return '';
  }

  /**
   * Get all tool use operations from assistant entries
   */
  getToolUseHistory(): Array<{ name: string; timestamp: string; input: any }> {
    const toolUses: Array<{ name: string; timestamp: string; input: any }> = [];

    for (const entry of this.getAssistantEntries()) {
      if (Array.isArray(entry.message.content)) {
        for (const item of entry.message.content) {
          if (item.type === 'tool_use') {
            toolUses.push({
              name: item.name,
              timestamp: entry.timestamp,
              input: item.input,
            });
          }
        }
      }
    }

    return toolUses;
  }

  /**
   * Get total token usage across all assistant messages
   */
  getTotalTokenUsage(): {
    inputTokens: number;
    outputTokens: number;
    cacheCreationTokens: number;
    cacheReadTokens: number;
  } {
    const assistantEntries = this.getAssistantEntries();

    return assistantEntries.reduce(
      (acc, entry) => {
        const usage = entry.message.usage;
        if (usage) {
          acc.inputTokens += usage.input_tokens || 0;
          acc.outputTokens += usage.output_tokens || 0;
          acc.cacheCreationTokens += usage.cache_creation_input_tokens || 0;
          acc.cacheReadTokens += usage.cache_read_input_tokens || 0;
        }
        return acc;
      },
      {
        inputTokens: 0,
        outputTokens: 0,
        cacheCreationTokens: 0,
        cacheReadTokens: 0,
      }
    );
  }

  /**
   * Get parse statistics
   */
  getParseStats(): ParseStats {
    const entriesByType: Record<string, number> = {};

    for (const entry of this.entries) {
      entriesByType[entry.type] = (entriesByType[entry.type] || 0) + 1;
    }

    const totalLines = this.entries.length + this.parseErrors.length;

    return {
      totalLines,
      parsedEntries: this.entries.length,
      failedLines: this.parseErrors.length,
      entriesByType,
      failureRate: totalLines > 0 ? this.parseErrors.length / totalLines : 0,
    };
  }

  /**
   * Get parse errors
   */
  getParseErrors(): Array<{ lineNumber: number; error: string }> {
    return this.parseErrors;
  }

  /**
   * Get all entries (raw)
   */
  getAllEntries(): TranscriptEntry[] {
    return this.entries;
  }
}
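The deleted parser's core move — tolerant line-by-line JSONL parsing with error accounting — is reusable on its own. A self-contained sketch mirroring the `ParseStats` shape above (the `parseJsonl` helper is illustrative, not from the codebase):

```typescript
// Tolerant JSONL parse: bad lines are recorded rather than aborting the
// whole file, and a failure rate is computed for observability.
interface ParseResult<T> {
  entries: T[];
  errors: Array<{ lineNumber: number; error: string }>;
  failureRate: number;
}

function parseJsonl<T>(content: string): ParseResult<T> {
  const entries: T[] = [];
  const errors: Array<{ lineNumber: number; error: string }> = [];
  const lines = content.trim() ? content.trim().split('\n') : [];

  lines.forEach((line, index) => {
    try {
      entries.push(JSON.parse(line) as T);
    } catch (error) {
      errors.push({
        lineNumber: index + 1, // 1-based, matching the parser above
        error: error instanceof Error ? error.message : String(error),
      });
    }
  });

  const total = entries.length + errors.length;
  return { entries, errors, failureRate: total > 0 ? errors.length / total : 0 };
}

const result = parseJsonl<{ type: string }>(
  '{"type":"user"}\nnot json\n{"type":"assistant"}'
);
```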