perf: streamline worker startup and consolidate database connections (#2122)

* docs: pathfinder refactor corpus + Node 20 preflight

Adds the PATHFINDER-2026-04-22 principle-driven refactor plan (11 docs,
cross-checked PASS) plus the exploratory PATHFINDER-2026-04-21 corpus
that motivated it. Bumps engines.node to >=20.0.0 per the ingestion-path
plan preflight (recursive fs.watch). Adds the pathfinder skill.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* refactor: land PATHFINDER Plan 01 — data integrity

Schema, UNIQUE constraints, self-healing claim, Chroma upsert fallback.

- Phase 1: fresh schema.sql regenerated at post-refactor shape.
- Phase 2: migrations 23+24 — rebuild pending_messages without
  started_processing_at_epoch; UNIQUE(session_id, tool_use_id);
  UNIQUE(memory_session_id, content_hash) on observations; dedup
  duplicate rows before adding indexes.
- Phase 3: claimNextMessage rewritten to self-healing query using
  worker_pid NOT IN live_worker_pids; STALE_PROCESSING_THRESHOLD_MS
  and the 60-s stale-reset block deleted.
- Phase 4: DEDUP_WINDOW_MS and findDuplicateObservation deleted;
  observations.insert now uses ON CONFLICT DO NOTHING.
- Phase 5: failed-message purge block deleted from worker-service
  2-min interval; clearFailedOlderThan method deleted.
- Phase 6: repairMalformedSchema and its Python subprocess repair
  path deleted from Database.ts; SQLite errors now propagate.
- Phase 7: Chroma delete-then-add fallback gated behind
  CHROMA_SYNC_FALLBACK_ON_CONFLICT env flag as bridge until
  Chroma MCP ships native upsert.
- Phase 8: migration 19 no-op block absorbed into fresh schema.sql.
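The Phase 3 self-healing claim can be sketched as an in-memory model (the real implementation is a single SQL query using worker_pid NOT IN the live-pid set; the row shape and function names here are illustrative assumptions):

```typescript
// Illustrative model of the self-healing claim: a row is claimable when it
// is pending, or when it is marked processing by a worker that is no longer
// alive (a stale claim left behind by a crashed worker). Note livePids
// includes the calling worker's own pid.
type PendingRow = {
  id: number;
  status: "pending" | "processing";
  workerPid: number | null;
};

function claimNextMessage(
  rows: PendingRow[],
  livePids: Set<number>,
  myPid: number,
): PendingRow | undefined {
  const next = rows.find(
    (r) =>
      r.status === "pending" ||
      (r.status === "processing" &&
        r.workerPid !== null &&
        !livePids.has(r.workerPid)),
  );
  if (next) {
    next.status = "processing";
    next.workerPid = myPid; // re-claiming heals the stale row in the same step
  }
  return next;
}
```

Because stale rows are recovered at claim time, no background stale-reset timer is needed, which is why the 60-s reset block could be deleted.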

Verification greps all return 0 matches. bun test tests/sqlite/
passes 63/63. bun run build succeeds.

Plan: PATHFINDER-2026-04-22/01-data-integrity.md

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* refactor: land PATHFINDER Plan 02 — process lifecycle

OS process groups replace hand-rolled reapers. Worker runs until
killed; orphans are prevented by detached spawn + kill(-pgid).

- Phase 1: src/services/worker/ProcessRegistry.ts DELETED. The
  canonical registry at src/supervisor/process-registry.ts is the
  sole survivor; SDK spawn site consolidated into it via new
  createSdkSpawnFactory/spawnSdkProcess/getSdkProcessForSession/
  ensureSdkProcessExit/waitForSlot helpers.
- Phase 2: SDK children spawn with detached:true + stdio:
  ['ignore','pipe','pipe']; pgid recorded on ManagedProcessInfo.
- Phase 3: shutdown.ts signalProcess teardown uses
  process.kill(-pgid, signal) on Unix when pgid is recorded;
  Windows path unchanged (tree-kill/taskkill).
- Phase 4: all reaper intervals deleted — startOrphanReaper call,
  staleSessionReaperInterval setInterval (including the co-located
  WAL checkpoint — SQLite's built-in wal_autocheckpoint handles
  WAL growth without an app-level timer), killIdleDaemonChildren,
  killSystemOrphans, reapOrphanedProcesses, reapStaleSessions, and
  detectStaleGenerator. MAX_GENERATOR_IDLE_MS and MAX_SESSION_IDLE_MS
  constants deleted.
- Phase 5: abandonedTimer — already 0 matches; primary-path cleanup
  via generatorPromise.finally() already lives in worker-service
  startSessionProcessor and SessionRoutes ensureGeneratorRunning.
- Phase 6: evictIdlestSession and its evict callback deleted from
  SessionManager. Pool admission gates backpressure upstream.
- Phase 7: SDK-failure fallback — SessionManager has zero matches
  for fallbackAgent/Gemini/OpenRouter. Failures surface to hooks
  via exit code 2 through SessionRoutes error mapping.
- Phase 8: ensureWorkerRunning in worker-utils.ts rewritten to
  lazy-spawn — consults isWorkerPortAlive (which gates
  captureProcessStartToken for PID-reuse safety via commit
  99060bac), then spawns detached with unref(), then calls
  waitForWorkerPort({ attempts: 3, backoffMs: 250 }) with
  hand-rolled exponential backoff (250 → 500 → 1000 ms). No
  respawn npm dep.
- Phase 9: idle self-shutdown — zero matches for
  idleCheck/idleTimeout/IDLE_MAX_MS/idleShutdown. Worker exits
  only on external SIGTERM via supervisor signal handlers.
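The Phase 2/3 group-signal selection can be sketched as follows (field and function names are assumptions; on Unix, passing a negative pid to process.kill signals the whole process group):

```typescript
// Teardown target selection: when a pgid was recorded at spawn time
// (detached: true puts the child in its own process group), signal the
// entire group with -pgid; otherwise fall back to the single child pid.
type ManagedProcessInfo = { pid: number; pgid?: number };

function signalTarget(info: ManagedProcessInfo, platform: string): number {
  if (platform !== "win32" && info.pgid !== undefined) {
    return -info.pgid; // process.kill(-pgid, sig) reaches every descendant
  }
  return info.pid; // Windows path stays on tree-kill/taskkill per-pid
}
```

Signaling the group is what makes the hand-rolled reapers unnecessary: orphaned grandchildren die with the group instead of needing periodic discovery.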

Three test files that exercised deleted code removed:
tests/worker/process-registry.test.ts,
tests/worker/session-lifecycle-guard.test.ts,
tests/services/worker/reap-stale-sessions.test.ts.
Pass count: 1451 → 1407 (-44), all attributable to deleted test
files. Zero new failures. 31 pre-existing failures remain
(schema-repair suite, logger-usage-standards, environmental
openclaw / plugin-distribution) — none introduced by Plan 02.

All 10 verification greps return 0. bun run build succeeds.

Plan: PATHFINDER-2026-04-22/02-process-lifecycle.md

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* refactor: land PATHFINDER Plan 04 (narrowed) — search fail-fast

Phases 3, 5, 6 only. Phases 1/2/4/7/8/9 are deferred for plan
reconciliation because of plan-doc inaccuracies:
  - Phase 1/2: ObservationRow type doesn't exist; the four
    "formatters" operate on three incompatible types.
  - Phase 4: RECENCY_WINDOW_MS already imported from
    SEARCH_CONSTANTS at every call site.
  - Phase 7: getExistingChromaIds is NOT @deprecated and has an
    active caller in ChromaSync.backfillMissingSyncs.
  - Phase 8: estimateTokens already consolidated.
  - Phase 9: knowledge-corpus rewrite blocked on PG-3
    prompt-caching cost smoke test.

Phase 3 — Delete SearchManager.findByConcept/findByFile/findByType.
SearchRoutes handlers (handleSearchByConcept/File/Type) now call
searchManager.getOrchestrator().findByXxx() directly via new
getter accessors on SearchManager. ~250 LoC deleted.

Phase 5 — Fail-fast Chroma. Created
src/services/worker/search/errors.ts with ChromaUnavailableError
extends AppError(503, 'CHROMA_UNAVAILABLE'). Deleted
SearchOrchestrator.executeWithFallback's Chroma-failed
SQLite-fallback branch; runtime Chroma errors now throw 503.
"Path 3" (chromaSync was null at construction — explicit-
uninitialized config) preserved as legitimate empty-result state
per plan text. ChromaSearchStrategy.search no longer wraps in
try/catch — errors propagate.
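A minimal sketch of the Phase 5 error type, assuming AppError carries an HTTP status plus a stable machine-readable code (the AppError shape shown here is an assumption):

```typescript
// Assumed base class: status + code ride along with the message so the
// HTTP layer can map the error without string matching.
class AppError extends Error {
  constructor(
    public readonly status: number,
    public readonly code: string,
    message?: string,
  ) {
    super(message ?? code);
    this.name = new.target.name;
  }
}

// Runtime Chroma failures surface as a 503 instead of silently
// falling back to SQLite results.
class ChromaUnavailableError extends AppError {
  constructor(message = "Chroma vector store is unavailable") {
    super(503, "CHROMA_UNAVAILABLE", message);
  }
}
```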

Phase 6 — Delete HybridSearchStrategy three try/catch silent
fallback blocks (findByConcept, findByType, findByFile) at lines
~82-95, ~120-132, ~161-172. Removed `fellBack` field from
StrategySearchResult type and every return site
(SQLiteSearchStrategy, BaseSearchStrategy.emptyResult,
SearchOrchestrator).

Tests updated (Principle 7 — delete in same PR):
  - search-orchestrator.test.ts: "fall back to SQLite" rewritten
    as "throw ChromaUnavailableError (HTTP 503)".
  - chroma/hybrid/sqlite-search-strategy tests: rewritten to
    rejects.toThrow; removed fellBack assertions.

Verification: SearchManager.findBy → 0; fellBack → 0 in src/.
bun test tests/worker/search/ → 122 pass, 0 fail.
bun test (suite-wide) → 1407 pass, baseline maintained, 0 new
failures. bun run build succeeds.

Plan: PATHFINDER-2026-04-22/04-read-path.md (Phases 3, 5, 6)

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* refactor: land PATHFINDER Plan 03 — ingestion path

Fail-fast parser, direct in-process ingest, recursive fs.watch,
DB-backed tool pairing. Worker-internal HTTP loopback eliminated.

- Phase 0: Created src/services/worker/http/shared.ts exporting
  ingestObservation/ingestPrompt/ingestSummary as direct
  in-process functions plus ingestEventBus (Node EventEmitter,
  reusing existing pattern — no third event bus introduced).
  setIngestContext wires the SessionManager dependency from
  worker-service constructor.
- Phase 1: src/sdk/parser.ts collapsed to one parseAgentXml
  returning { valid:true; kind: 'observation'|'summary'; data }
  | { valid:false; reason: string }. Inspects root element;
  <skip_summary reason="…"/> is a first-class summary case
  with skipped:true. NEVER returns undefined. NEVER coerces.
- Phase 2: ResponseProcessor calls parseAgentXml exactly once,
  branches on the discriminated union. On invalid → markFailed
  + logger.warn(reason). On observation → ingestObservation.
  On summary → ingestSummary then emit summaryStoredEvent
  { sessionId, messageId } (consumed by Plan 05's blocking
  /api/session/end).
- Phase 3: Deleted consecutiveSummaryFailures field
  (ResponseProcessor + SessionManager + worker-types) and
  MAX_CONSECUTIVE_SUMMARY_FAILURES constant. Circuit-breaker
  guards and "tripped" log lines removed.
- Phase 4: coerceObservationToSummary deleted from sdk/parser.ts.
- Phase 5: src/services/transcripts/watcher.ts rescan setInterval
  replaced with fs.watch(transcriptsRoot, { recursive: true,
  persistent: true }) — Node 20+ recursive mode.
- Phase 6: src/services/transcripts/processor.ts pendingTools
  Map deleted. tool_use rows insert with INSERT OR IGNORE on
  UNIQUE(session_id, tool_use_id) (added by Plan 01). New
  pairToolUsesByJoin query in PendingMessageStore for read-time
  pairing (UNIQUE INDEX provides idempotency; explicit consumer
  not yet wired).
- Phase 7: HTTP loopback at processor.ts:252 replaced with
  direct ingestObservation call. maybeParseJson silent-passthrough
  rewritten to fail-fast (throws on malformed JSON).
- Phase 8: src/utils/tag-stripping.ts countTags + stripTagsInternal
  collapsed into one alternation regex, single-pass over input.
- Phase 9: src/utils/transcript-parser.ts (dead TranscriptParser
  class) deleted. The active extractLastMessage at
  src/shared/transcript-parser.ts:41-144 is the sole survivor.
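The Phase 1 discriminated union can be sketched as follows. This is a simplification: the real parser is XML-aware, while this sketch only inspects the root element name, as the commit describes, and the data payloads are placeholders:

```typescript
// Discriminated union: callers branch on `valid`, then on `kind`.
// Never returns undefined; never coerces one kind into another.
type ParseResult =
  | { valid: true; kind: "observation" | "summary"; data: { skipped?: boolean } }
  | { valid: false; reason: string };

function parseAgentXml(input: string): ParseResult {
  const root = /^\s*<([a-z_]+)/.exec(input)?.[1];
  if (root === "observation") return { valid: true, kind: "observation", data: {} };
  if (root === "summary") return { valid: true, kind: "summary", data: {} };
  // <skip_summary reason="..."/> is a first-class summary case, not a failure.
  if (root === "skip_summary") {
    return { valid: true, kind: "summary", data: { skipped: true } };
  }
  return { valid: false, reason: `unrecognized root element: ${root ?? "none"}` };
}
```

Because the invalid case carries a reason, the Phase 2 consumer can markFailed + logger.warn(reason) without any guessing.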

Tests updated (Principle 7 — same-PR delete):
  - tests/sdk/parser.test.ts + parse-summary.test.ts: rewritten
    to assert discriminated-union shape; coercion-specific
    scenarios collapse into { valid:false } assertions.
  - tests/worker/agents/response-processor.test.ts: circuit-breaker
    describe block skipped; non-XML/empty-response tests assert
    fail-fast markFailed behavior.

Verification: every grep returns 0. transcript-parser.ts deleted.
bun run build succeeds. bun test → 1399 pass / 28 fail / 7 skip
(net -8 pass = the 4 retired circuit-breaker tests + 4 collapsed
parser cases). Zero new failures vs baseline.

Deferred (out of Plan 03 scope, will land in Plan 06): SessionRoutes
HTTP route handlers still call sessionManager.queueObservation
inline rather than the new shared helpers — the helpers are ready,
the route swap is mechanical and belongs with the Zod refactor.

Plan: PATHFINDER-2026-04-22/03-ingestion-path.md

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* refactor: land PATHFINDER Plan 05 — hook surface

Worker-call plumbing collapsed to one helper. Polling replaced by
server-side blocking endpoint. Fail-loud counter surfaces persistent
worker outages via exit code 2.

- Phase 1: plugin/hooks/hooks.json — three 20-iteration `for i in
  1..20; do curl -sf .../health && break; sleep 0.1; done` shell
  retry wrappers deleted. Hook commands invoke their bun entry
  point directly.
- Phase 2: src/shared/worker-utils.ts — added
  executeWithWorkerFallback<T>(url, method, body) returning
  T | { continue: true; reason?: string }. All 8 hook handlers
  (observation, session-init, context, file-context, file-edit,
  summarize, session-complete, user-message) rewritten to use
  it instead of duplicating the ensureWorkerRunning →
  workerHttpRequest → fallback sequence.
- Phase 3: blocking POST /api/session/end in SessionRoutes.ts
  using validateBody + sessionEndSchema (z.object({sessionId})).
  One-shot ingestEventBus.on('summaryStoredEvent') listener,
  30 s timer, req.aborted handler — all share one cleanup so
  the listener cannot leak. summarize.ts polling loop, plus
  MAX_WAIT_FOR_SUMMARY_MS / POLL_INTERVAL_MS constants, deleted.
- Phase 4: src/shared/hook-settings.ts — loadFromFileOnce()
  memoizes SettingsDefaultsManager.loadFromFile per process.
  Per-handler settings reads collapsed.
- Phase 5: src/shared/should-track-project.ts — single exclusion
  check entry; isProjectExcluded no longer referenced from
  src/cli/handlers/.
- Phase 6: cwd validation pushed into adapter normalizeInput
  (all 6 adapters: claude-code, cursor, raw, gemini-cli,
  windsurf). New AdapterRejectedInput error in
  src/cli/adapters/errors.ts. Handler-level isValidCwd checks
  deleted from file-edit.ts and observation.ts. hook-command.ts
  catches AdapterRejectedInput → graceful fallback.
- Phase 7: session-init.ts conditional initAgent guard deleted;
  initAgent is idempotent. tests/hooks/context-reinjection-guard
  test (validated the deleted conditional) deleted in same PR
  per Principle 7.
- Phase 8: fail-loud counter at
  ~/.claude-mem/state/hook-failures.json. Atomic write via
  .tmp + rename. CLAUDE_MEM_HOOK_FAIL_LOUD_THRESHOLD setting
  (default 3). On consecutive worker-unreachable ≥ N:
  process.exit(2). On success: reset to 0. NOT a retry.
- Phase 9: ensureWorkerAliveOnce() module-scope memoization
  wrapping ensureWorkerRunning. executeWithWorkerFallback calls
  the memoized version.
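The Phase 3 shared-cleanup pattern can be sketched like this (function name and event payload are illustrative assumptions; the real endpoint also wires the request-abort handler into the same cleanup):

```typescript
import { EventEmitter } from "node:events";

type SessionEndResult = { ok: true } | { ok: false; reason: "timeout" };

// One cleanup shared by the listener and the timer: whichever path wins
// detaches the other, so the one-shot listener can never leak.
function waitForSummaryStored(
  bus: EventEmitter,
  sessionId: string,
  timeoutMs: number,
): Promise<SessionEndResult> {
  return new Promise((resolve) => {
    const cleanup = (result: SessionEndResult) => {
      clearTimeout(timer);
      bus.removeListener("summaryStoredEvent", onStored);
      resolve(result);
    };
    const onStored = (event: { sessionId: string }) => {
      if (event.sessionId === sessionId) cleanup({ ok: true });
    };
    const timer = setTimeout(
      () => cleanup({ ok: false, reason: "timeout" }),
      timeoutMs,
    );
    bus.on("summaryStoredEvent", onStored);
  });
}
```

Server-side blocking replaces the deleted client polling loop: the hook makes one request and the worker answers when the summary lands or the timer fires.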

Minimal validateBody middleware stub at
src/services/worker/http/middleware/validateBody.ts. Plan 06 will
expand with typed inference + error envelope conventions.

Verification: 4/4 grep targets pass. bun run build succeeds.
bun test → 1393 pass / 28 fail / 7 skip; -6 pass attributable
solely to deleted context-reinjection-guard test file. Zero new
failures vs baseline.

Plan: PATHFINDER-2026-04-22/05-hook-surface.md

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* refactor: land PATHFINDER Plan 06 — API surface

One Zod-based validator wrapping every POST/PUT. Rate limiter,
diagnostic endpoints, and shutdown wrappers deleted. Failure-
marking consolidated to one helper.

- Phase 1 (preflight): zod@^3 already installed.
- Phase 2: validateBody middleware confirmed at canonical shape
  in src/services/worker/http/middleware/validateBody.ts —
  safeParse → 400 { error: 'ValidationError', issues: [...] }
  on failure, replaces req.body with parsed value on success.
- Phase 3: Per-route Zod schemas declared at the top of each
  route file. 24 POST endpoints across SessionRoutes,
  CorpusRoutes, DataRoutes, MemoryRoutes, SearchRoutes,
  LogsRoutes, SettingsRoutes now wrap with validateBody().
  /api/session/end (Plan 05) confirmed using same middleware.
- Phase 4: validateRequired() deleted from BaseRouteHandler
  along with every call site. Inline coercion helpers
  (coerceStringArray, coercePositiveInteger) and inline
  if (!req.body...) guards deleted across all route files.
- Phase 5: Rate limiter middleware and its registration deleted
  from src/services/worker/http/middleware.ts. Worker binds
  127.0.0.1:37777 — no untrusted caller.
- Phase 6: viewer.html cached at module init in ViewerRoutes.ts
  via fs.readFileSync; served as Buffer with text/html content
  type. SKILL.md + per-operation .md files cached in
  Server.ts as Map<string, string>; loadInstructionContent
  helper deleted. NO fs.watch, NO TTL — process restart is the
  cache-invalidation event.
- Phase 7: Four diagnostic endpoints deleted from DataRoutes.ts
  — /api/pending-queue (GET), /api/pending-queue/process (POST),
  /api/pending-queue/failed (DELETE), /api/pending-queue/all
  (DELETE). Helper methods that ONLY served them
  (getQueueMessages, getStuckCount, getRecentlyProcessed,
  clearFailed, clearAll) deleted from PendingMessageStore.
  KEPT: /api/processing-status (observability), /health
  (used by ensureWorkerRunning).
- Phase 8: stopSupervisor wrapper deleted from supervisor/index.ts.
  GracefulShutdown now calls getSupervisor().stop() directly.
  Two functions retained with clear roles:
    - performGracefulShutdown — worker-side 6-step shutdown
    - runShutdownCascade — supervisor-side child teardown
      (process.kill(-pgid), Windows tree-kill, PID-file cleanup)
  Each has unique non-trivial logic and a single canonical caller.
- Phase 9: transitionMessagesTo(status, filter) is the sole
  failure-marking path on PendingMessageStore. Old methods
  markSessionMessagesFailed and markAllSessionMessagesAbandoned
  deleted along with all callers (worker-service,
  SessionCompletionHandler, tests/zombie-prevention).

Tests updated (Principle 7 same-PR delete): coercion test files
refactored to chain validateBody → handler. Zombie-prevention
tests rewritten to call transitionMessagesTo.

Verification: all 4 grep targets → 0. bun run build succeeds.
bun test → 1393 pass / 28 fail / 7 skip — exact match to
baseline. Zero new failures.

Plan: PATHFINDER-2026-04-22/06-api-surface.md

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* refactor: land PATHFINDER Plan 07 — dead code sweep

ts-prune-driven sweep across the tree after Plans 01-06 landed.
Deleted unused exports, orphan helpers, and one fully orphaned
file. Earlier-plan deletions verified.

Deleted:
- src/utils/bun-path.ts (entire file — getBunPath, getBunPathOrThrow,
  isBunAvailable: zero importers)
- bun-resolver.getBunVersionString: zero callers
- PendingMessageStore.retryMessage / resetProcessingToPending /
  abortMessage: superseded by transitionMessagesTo (Plan 06 Phase 9)
- EnvManager.MANAGED_CREDENTIAL_KEYS, EnvManager.setCredential:
  zero callers
- CodexCliInstaller.checkCodexCliStatus: zero callers; no status
  command exists in npx-cli
- Two "REMOVED: cleanupOrphanedSessions" stale-fence comments

Kept (with documented justification):
- Public API surface in dist/sdk/* (parseAgentXml, prompt
  builders, ParsedObservation, ParsedSummary, ParseResult,
  SUMMARY_MODE_MARKER) — exported via package.json sdk path.
- generateContext / loadContextConfig / token utilities — used
  via dynamic await import('../../../context-generator.js') in
  worker SearchRoutes.
- MCP_IDE_INSTALLERS, install/uninstall functions for codex/goose
  — used via dynamic await import in npx-cli/install.ts +
  uninstall.ts (ts-prune cannot trace dynamic imports).
- getExistingChromaIds — active caller in
  ChromaSync.backfillMissingSyncs (Plan 04 narrowed scope).
- processPendingQueues / getSessionsWithPendingMessages — active
  orphan-recovery caller in worker-service.ts plus
  zombie-prevention test coverage.
- StoreAndMarkCompleteResult legacy alias — return-type annotation
  in same file.
- All Database.ts barrel re-exports — used downstream.

Earlier-plan verification:
- Plan 03 Phase 9: VERIFIED — src/utils/transcript-parser.ts
  is gone; TranscriptParser has 0 references in src/.
- Plan 01 Phase 8: VERIFIED — migration 19 no-op absorbed.
- SessionStore.ts:52-70 consolidation NOT executed (deferred):
  the methods are not thin wrappers but ~900 LoC of bodies, and
  two methods are documented as intentional mirrors so the
  context-generator.cjs bundle stays schema-consistent without
  pulling MigrationRunner. Deserves its own plan, not a sweep.

Verification: TranscriptParser → 0; transcript-parser.ts → gone;
no commented-out code markers remain. bun run build succeeds.
bun test → 1393 pass / 28 fail / 7 skip — EXACT match to
baseline. Zero regressions.

Plan: PATHFINDER-2026-04-22/07-dead-code.md

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* chore: remove residual ProcessRegistry comment reference

Plan 07 dead-code sweep missed one comment-level reference to the
deleted in-memory ProcessRegistry class in SessionManager.ts:347.
Rewritten to describe the supervisor.json scope without naming the
deleted class, completing the verification grep target.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix: address Greptile review (P1 + 2× P2)

P1 — Plan 05 Phase 3 blocking endpoint was non-functional:
executeWithWorkerFallback used HEALTH_CHECK_TIMEOUT_MS (3 s) for
the POST /api/session/end call, but the server holds the
connection for SERVER_SIDE_SUMMARY_TIMEOUT_MS (30 s). Client
always raced to a "timed out" rejection that isWorkerUnavailable
classified as worker-unreachable, so the hook silently degraded
instead of waiting for summaryStoredEvent.
  - Added optional timeoutMs to executeWithWorkerFallback,
    forwarded to workerHttpRequest.
  - summarize.ts call site now passes 35_000 (5 s above server
    hold window).

P2 — ingestSummary({ kind: 'parsed' }) branch was dead code:
ResponseProcessor emitted summaryStoredEvent directly via the
event bus, bypassing the centralized helper that the comment
claimed was the single source.
  - ResponseProcessor now calls ingestSummary({ kind: 'parsed',
    sessionDbId, messageId, contentSessionId, parsed }) so the
    event-emission path is single-sourced.
  - ingestSummary's requireContext() resolution moved inside the
    'queue' branch (the only branch that needs sessionManager /
    dbManager). 'parsed' is a pure event-bus emission and
    doesn't need worker-internal context — fixes mocked
    ResponseProcessor unit tests that don't call
    setIngestContext.

P2 — isWorkerFallback could false-positive on legitimate API
responses whose schema includes { continue: true, ... }:
  - Added a Symbol.for('claude-mem/worker-fallback') brand to
    WorkerFallback. isWorkerFallback now checks the brand, not
    a duck-typed property name.
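The brand check can be sketched as follows (the factory name is an assumption; the Symbol.for key is the one named above):

```typescript
// Symbol.for keys live in the global symbol registry, so the brand survives
// even if the module is instantiated twice in one process.
const WORKER_FALLBACK = Symbol.for("claude-mem/worker-fallback");

type WorkerFallback = {
  [WORKER_FALLBACK]: true;
  continue: true;
  reason?: string;
};

function makeWorkerFallback(reason?: string): WorkerFallback {
  return { [WORKER_FALLBACK]: true, continue: true, reason };
}

function isWorkerFallback(value: unknown): value is WorkerFallback {
  // Brand check, not a duck-typed property check: an API response that
  // happens to include { continue: true } no longer matches.
  return (
    typeof value === "object" &&
    value !== null &&
    (value as Record<PropertyKey, unknown>)[WORKER_FALLBACK] === true
  );
}
```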

Verification: bun run build succeeds. bun test → 1393 pass /
28 fail / 7 skip — exact baseline match. Zero new failures.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix: address Greptile iteration 2 (P1 + P2)

P1 — summaryStoredEvent fired regardless of whether the row was
persisted. ResponseProcessor's call to ingestSummary({ kind:
'parsed' }) ran for every parsed.kind === 'summary' even when
result.summaryId came back null (e.g. FK violation, null
memory_session_id at commit). The blocking /api/session/end
endpoint then returned { ok: true } and the Stop hook logged
'Summary stored' for a non-existent row.

  - Gate ingestSummary call on (parsed.data.skipped ||
    session.lastSummaryStored). Skipped summaries are an explicit
    no-op bypass and still confirm; real summaries only confirm
    when storage actually wrote a row.
  - Non-skipped + summaryId === null path logs a warn and lets
    the server-side timeout (504) surface to the hook instead of
    a false ok:true.

P2 — PendingMessageStore.enqueue() returns 0 when INSERT OR
IGNORE suppresses a duplicate (the UNIQUE(session_id, tool_use_id)
constraint added by Plan 01 Phase 1). The two callers
(SessionManager.queueObservation and queueSummarize) previously
logged 'ENQUEUED messageId=0' which read like a row was inserted.

  - Branch on messageId === 0 and emit a 'DUP_SUPPRESSED' debug
    log instead of the misleading ENQUEUED line. No behavior
    change — the duplicate is still correctly suppressed by the
    DB (Principle 3); only the log surface is corrected.
  - confirmProcessed is never called with the enqueue() return
    value (it operates on session.processingMessageIds[] from
    claimNextMessage), so no caller is broken; the visibility
    fix prevents future misuse.
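The enqueue contract can be modeled in memory (the real table enforces UNIQUE(session_id, tool_use_id) with INSERT OR IGNORE; the class and key shape here are illustrative):

```typescript
// In-memory model of the dedup contract: enqueue returns the new row id,
// or 0 when INSERT OR IGNORE suppressed a duplicate.
class PendingQueue {
  private rows = new Map<string, number>(); // "sessionId:toolUseId" -> row id
  private nextId = 1;

  enqueue(sessionId: string, toolUseId: string): number {
    const key = `${sessionId}:${toolUseId}`;
    if (this.rows.has(key)) return 0; // duplicate suppressed, nothing inserted
    const id = this.nextId++;
    this.rows.set(key, id);
    return id;
  }
}

// The caller branches on the sentinel instead of logging a misleading
// ENQUEUED line for a row that was never inserted.
function enqueueLogLine(messageId: number): string {
  return messageId === 0 ? "DUP_SUPPRESSED" : `ENQUEUED messageId=${messageId}`;
}
```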

Verification: bun run build succeeds. bun test → 1393 pass /
28 fail / 7 skip — exact baseline match. Zero new failures.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix: address Greptile iteration 3 (P1 + 2× P2)

- P1 worker-service.ts: wire ensureGeneratorRunning into the ingest
  context after SessionRoutes is constructed. setIngestContext runs
  before routes exist, so transcript-watcher observations queued via
  ingestObservation() had no way to auto-start the SDK generator.
  Added attachIngestGeneratorStarter() to patch the callback in.
- P2 shared.ts: IngestEventBus now sets maxListeners to 0. Concurrent
  /api/session/end calls register one listener each and clean up on
  completion, so the default limit of 10 produced spurious warnings
  under normal load.
- P2 SessionRoutes.ts: handleObservationsByClaudeId now delegates to
  ingestObservation() instead of duplicating skip-tool / meta /
  privacy / queue logic. Single helper, matching the Plan 03 goal.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix: address Greptile iteration 4 (P1 tool-pair + P2 parse/path/doc)

- processor.handleToolResult: restore in-memory tool-use→tool-result
  pairing via session.pendingTools for schemas (e.g. Codex) whose
  tool_result events carry only tool_use_id + output. Without this,
  neither handler fired — all tool observations silently dropped.
- processor.maybeParseJson: return raw string on parse failure instead
  of throwing. Previously a single malformed JSON-shaped field caused
  handleLine's outer catch to discard the entire transcript line.
- watcher.deepestNonGlobAncestor: split on / and \\, emit empty string
  for purely-glob inputs so the caller skips the watch instead of
  anchoring fs.watch at the filesystem root. Windows-compatible.
- PendingMessageStore.enqueue: tighten docstring — callers today only
  log on the returned id; the SessionManager branches on id === 0.

* fix: forward tool_use_id through ingestObservation (Greptile iter 5)

P1 — Plan 01's UNIQUE(content_session_id, tool_use_id) dedup never
fired because the new shared ingest path dropped the toolUseId before
queueObservation. SQLite treats NULL values as distinct for UNIQUE,
so every replayed transcript line landed a duplicate row.

- shared.ingestObservation: forward payload.toolUseId to
  queueObservation so INSERT OR IGNORE can actually collapse.
- SessionRoutes.handleObservationsByClaudeId: destructure both
  tool_use_id (HTTP convention) and toolUseId (JS convention) from
  req.body and pass into ingestObservation.
- observationsByClaudeIdSchema: declare both keys explicitly so the
  validator doesn't rely on .passthrough() alone.

* fix: drop dead pairToolUsesByJoin, close session-end listener race

- PendingMessageStore: delete pairToolUsesByJoin. The method was never
  called and its self-join semantics are structurally incompatible
  with UNIQUE(content_session_id, tool_use_id): INSERT OR IGNORE
  collapses any second row with the same pair, so a self-join can
  only ever match a row to itself. In-memory pendingTools in
  processor.ts remains the pairing path for split-event schemas.

- IngestEventBus: retain a short-lived (60s) recentStored map keyed
  by sessionId. Populated on summaryStoredEvent emit, evicted on
  consume or TTL.

- handleSessionEnd: drain the recent-events buffer before attaching
  the listener. Closes the register-after-emit race where the summary
  can persist between the hook's summarize POST and its session/end
  POST — previously that window returned 504 after the 30s timeout.
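The recent-events buffer can be sketched like this (method names are assumptions; this sketch uses the consume-evicts semantics described above, and takes explicit timestamps so the TTL is testable):

```typescript
import { EventEmitter } from "node:events";

// Closes the register-after-emit race: a summary persisted between the
// summarize POST and the session/end POST is remembered for a short TTL,
// so the late listener drains it instead of timing out.
class IngestEventBus extends EventEmitter {
  private recentStored = new Map<string, number>(); // sessionId -> emitted-at

  constructor(private readonly ttlMs = 60_000) {
    super();
    this.setMaxListeners(0); // concurrent session/end calls each listen once
  }

  emitSummaryStored(sessionId: string, now = Date.now()): void {
    this.recentStored.set(sessionId, now);
    this.emit("summaryStoredEvent", { sessionId });
  }

  // handleSessionEnd calls this BEFORE attaching its listener.
  takeRecentSummaryStored(sessionId: string, now = Date.now()): boolean {
    const at = this.recentStored.get(sessionId);
    if (at === undefined || now - at > this.ttlMs) return false;
    this.recentStored.delete(sessionId); // evict on consume (or by TTL)
    return true;
  }
}
```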

* chore: merge origin/main into vivacious-teeth

Resolves conflicts with 15 commits on main (v12.3.9, security
observation types, Telegram notifier, PID-reuse worker start-guard).

Conflict resolution strategy:
- plugin/hooks/hooks.json, plugin/scripts/*.cjs, plugin/ui/viewer-bundle.js:
  kept ours — PATHFINDER Plan 05 deletes the for-i-in-1-to-20 curl retry
  loops and the built artifacts regenerate on build.
- src/cli/handlers/summarize.ts: kept ours — Plan 05 blocking
  POST /api/session/end supersedes main's fire-and-forget path.
- src/services/worker-service.ts: kept ours — Plan 05 ingest bus +
  summaryStoredEvent supersedes main's SessionCompletionHandler DI
  refactor + orphan-reaper fallback.
- src/services/worker/http/routes/SessionRoutes.ts: kept ours — same
  reason; generator .finally() Stop-hook self-clean is a guard for a
  path our blocking endpoint removes.
- src/services/worker/http/routes/CorpusRoutes.ts: merged — added
  security_alert / security_note to ALLOWED_CORPUS_TYPES (feature from
  #2084) while preserving our Zod validateBody schema.

Typecheck: 294 errors (vs 298 pre-merge). No new errors introduced; all
remaining are pre-existing (Component-enum gaps, DOM lib for viewer,
bun:sqlite types).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix: address Greptile P2 findings

1) SessionRoutes.handleSessionEnd was the only route handler not wrapped
   in wrapHandler — synchronous exceptions would hang the client rather
   than surfacing as 500s. Wrap it like every other handler.

2) processor.handleToolResult only consumed the session.pendingTools
   entry when the tool_result arrived without a toolName. In the
   split-schema path where tool_result carries both toolName and toolId,
   the entry was never deleted and the map grew for the life of the
   session. Consume the entry whenever toolId is present.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix: typing cleanup and viewer tsconfig split for PR feedback

- Add explicit return types for SessionStore query methods
- Exclude src/ui/viewer from root tsconfig, give it its own DOM-typed config
- Add bun to root tsconfig types, plus misc typing tweaks flagged by Greptile
- Rebuilt plugin/scripts/* artifacts

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix: address Greptile P2 findings (iter 2)

- PendingMessageStore.transitionMessagesTo: require sessionDbId (drop
  the unscoped-drain branch that would nuke every pending/processing
  row across all sessions if a future caller omitted the filter).
- IngestEventBus.takeRecentSummaryStored: make idempotent — keep the
  cached event until TTL eviction so a retried Stop hook's second
  /api/session/end returns immediately instead of hanging 30 s.
- TranscriptWatcher fs.watch callback: skip full glob scan for paths
  already tailed (JSONL appends fire on every line; only unknown
  paths warrant a rescan).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix: call finalizeSession in terminal session paths (Greptile iter 3)

terminateSession and runFallbackForTerminatedSession previously called
SessionCompletionHandler.finalizeSession before removeSessionImmediate;
the refactor dropped those calls, leaving sdk_sessions.status='active'
for every session killed by wall-clock limit, unrecoverable error, or
exhausted fallback chain. The deleted reapStaleSessions interval was
the only prior backstop.

Re-wires finalizeSession (idempotent: marks completed, drains pending,
broadcasts) into both paths; no reaper reintroduced.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix: GC failed pending_messages rows at startup (Greptile iter 4)

Plan 07 deleted clearFailed/clearFailedOlderThan as "dead code", but
with the periodic sweep also removed, nothing reaps status='failed'
rows now — they accumulate indefinitely. Since claimNextMessage's
self-healing subquery scans this table, unbounded growth degrades
claim latency over time.

Re-introduces clearFailedOlderThan and calls it once at worker startup
(not a reaper — one-shot, idempotent). 7-day retention keeps enough
history for operator inspection while bounding the table.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix: finalize sessions on normal exit; cleanup hoist; share handler (iter 5)

1. startSessionProcessor success branch now calls completionHandler.
   finalizeSession before removeSessionImmediate. Hooks-disabled installs
   (and any Stop hook that fails before POST /api/sessions/complete) no
   longer leave sdk_sessions rows as status='active' forever. Idempotent
   — a subsequent /api/sessions/complete is a no-op.

2. Hoist SessionRoutes.handleSessionEnd cleanup declaration above the
   closures that reference it (TDZ safety; safe at runtime today but
   fragile if timeout ever shrinks).

3. SessionRoutes now receives WorkerService's shared SessionCompletionHandler
   instead of constructing its own — prevents silent divergence if the
   handler ever becomes stateful.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix: stop runaway crash-recovery loop on dead sessions

Two distinct bugs were combining to keep a dead session restarting forever:

Bug 1 (uncaught "The operation was aborted."):
  child_process.spawn emits 'error' asynchronously for ENOENT/EACCES/abort
  signal aborts. spawnSdkProcess() never attached an 'error' listener, so
  any async spawn failure became uncaughtException and escaped to the
  daemon-level handler. Attach an 'error' listener immediately after spawn,
  before the !child.pid early-return, so async spawn errors are logged
  (with errno code) and swallowed locally.

Bug 2 (sliding-window limiter never trips on slow restart cadence):
  RestartGuard tripped only when restartTimestamps.length exceeded
  MAX_WINDOWED_RESTARTS (10) within RESTART_WINDOW_MS (60s). With the 8s
  exponential-backoff cap, only ~7-8 restarts fit in the window, so a dead
  session failing and restarting on 8s cycles would loop forever
  (consecutiveRestarts climbing past 30+ in observed logs). Add a
  consecutiveFailures counter that increments on every restart and resets
  only on recordSuccess(). Trip when consecutive failures exceed
  MAX_CONSECUTIVE_FAILURES (5) — meaning 5 restarts with zero successful
  processing in between proves the session is dead. Both guards now run in
  parallel: tight loops still trip the windowed cap; slow loops trip the
  consecutive-failure cap.
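The two guards can be sketched together as follows (constants come from the commit message; the class shape is an assumption):

```typescript
// Sketch of the dual-cap RestartGuard described above.
const MAX_WINDOWED_RESTARTS = 10;
const RESTART_WINDOW_MS = 60_000;
const MAX_CONSECUTIVE_FAILURES = 5;

class RestartGuard {
  private restartTimestamps: number[] = [];
  private consecutiveFailures = 0;

  recordRestart(now: number): void {
    this.restartTimestamps.push(now);
    this.consecutiveFailures++;
  }

  recordSuccess(): void {
    this.consecutiveFailures = 0; // only real processing resets the counter
  }

  tripped(now: number): boolean {
    // Windowed cap: catches tight restart loops.
    this.restartTimestamps = this.restartTimestamps.filter(
      (t) => now - t < RESTART_WINDOW_MS,
    );
    if (this.restartTimestamps.length > MAX_WINDOWED_RESTARTS) return true;
    // Consecutive cap: catches slow loops that never fit in the window.
    return this.consecutiveFailures > MAX_CONSECUTIVE_FAILURES;
  }
}
```

Six restarts on an 8s cadence never exceed the windowed cap, but they do exceed the consecutive cap, which is exactly the case the old guard missed.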

Also: when the SessionRoutes path trips the guard, drain pending messages
to 'abandoned' so the session does not reappear in
getSessionsWithPendingMessages and trigger another auto-start cycle. The
worker-service.ts path already does this via terminateSession.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* perf: streamline worker startup and consolidate database connections

1. Database Pooling: Modified DatabaseManager, SessionStore, and SessionSearch to share a single bun:sqlite connection, eliminating redundant file descriptors.
2. Non-blocking Startup: Refactored WorktreeAdoption and Chroma backfill to run in the background (fire-and-forget), preventing them from stalling core initialization.
3. Diagnostic Routes: Added /api/chroma/status and bypassed the initialization guard for health/readiness endpoints to allow diagnostics during startup.
4. Robust Search: Implemented reliable SQLite FTS5 fallback in SearchManager for when Chroma (uvx) fails or is unavailable.
5. Code Cleanup: Removed redundant loopback MCP checks and tangled initialization logic from WorkerService.
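Item 1 reduces to a lazily-created singleton. A minimal sketch, assuming a module-level handle (the real code threads one bun:sqlite `Database` through DatabaseManager, SessionStore, and SessionSearch; names here are hypothetical):

```typescript
// Hypothetical shared-connection sketch: the first caller opens the file,
// every later caller reuses the same handle, so only one file descriptor
// exists regardless of how many stores ask for a connection.
type Connection = { file: string };

let shared: Connection | null = null;
let opens = 0; // instrumentation for the sketch only

function getSharedConnection(file: string): Connection {
  if (!shared) {
    opens++;           // only the first call pays the open cost
    shared = { file };
  }
  return shared;       // all stores share this handle
}
```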

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix: hard-exclude observer-sessions from hooks; bundle migration 29 (#2124)

* fix: hard-exclude observer-sessions from hooks; backfill bundle migrations

Stop hook + SessionEnd hook were storing the SDK observer's own
init/continuation/summary prompts in user_prompts, leaking into the
viewer (meta-observation regression). 25 such rows accumulated.

- shouldTrackProject: hard-reject OBSERVER_SESSIONS_DIR (and its subtree)
  before consulting user-configured exclusion globs.
- summarize.ts (Stop) and session-complete.ts (SessionEnd): early-return
  when shouldTrackProject(cwd) is false, so the observer's own hooks
  cannot bootstrap the worker or queue a summary against the meta-session.
- SessionRoutes: cap user-prompt body at 256 KiB at the session-init
  boundary so a runaway observer prompt cannot blow up storage.
- SessionStore: add migration 29 (UNIQUE(memory_session_id, content_hash)
  on observations) inline so bundled artifacts (worker-service.cjs,
  context-generator.cjs) stay schema-consistent — without it, the
  ON CONFLICT clause in observation inserts throws.
- spawnSdkProcess: stdio[stdin] from 'ignore' to 'pipe' so the
  supervisor can actually feed the observer's stdin.

Also rebuilds plugin/scripts/{worker-service,context-generator}.cjs.
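The migration-29 bullet hinges on a SQLite rule: an `ON CONFLICT (cols)` clause throws unless a matching unique index exists. A sketch of the pair, with the index columns taken from the commit and the exact statements as assumptions:

```typescript
// Hypothetical inline migration: without this unique index, the insert's
// conflict target below does not match any index and SQLite throws.
const MIGRATION_29 = `
  CREATE UNIQUE INDEX IF NOT EXISTS idx_obs_session_content
  ON observations (memory_session_id, content_hash);
`;

// With the index in place, duplicate observations become a silent no-op
// instead of an error (illustrative column list).
const INSERT_OBSERVATION = `
  INSERT INTO observations (memory_session_id, content_hash, content)
  VALUES (?, ?, ?)
  ON CONFLICT (memory_session_id, content_hash) DO NOTHING;
`;
```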

* fix: walk back to UTF-8 boundary on prompt truncation (Greptile P2)

Plain Buffer.subarray at MAX_USER_PROMPT_BYTES can land mid-codepoint,
which the utf8 decoder silently rewrites to U+FFFD. Walk back over any
continuation bytes (0b10xxxxxx) before decoding so the truncated prompt
ends on a valid sequence boundary instead of a replacement character.
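The walk-back can be sketched as below (function name hypothetical; the continuation-byte test is the standard UTF-8 rule):

```typescript
// Truncate a buffer to at most maxBytes, then walk back over UTF-8
// continuation bytes (0b10xxxxxx) so the cut lands on a codepoint
// boundary instead of producing U+FFFD in the decoded string.
function truncateUtf8(buf: Buffer, maxBytes: number): string {
  if (buf.length <= maxBytes) return buf.toString("utf8");
  let end = maxBytes;
  // A continuation byte has its top two bits set to 10.
  while (end > 0 && (buf[end] & 0b1100_0000) === 0b1000_0000) end--;
  return buf.subarray(0, end).toString("utf8");
}
```

Cutting `"héllo"` at byte 2 would split the two-byte `é`; the walk-back drops it and yields `"h"` instead of `"h\uFFFD"`.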

* fix: cross-platform observer-dir containment; clarify SDK stdin pipe

claude-review feedback on PR #2124.

- shouldTrackProject: literal `cwd.startsWith(OBSERVER_SESSIONS_DIR + '/')`
  hard-coded a POSIX separator and missed Windows backslash paths plus any
  trailing-slash variance. Switched to a path.relative-based isWithin()
  helper so Windows hook input under observer-sessions\\... is also excluded.
- spawnSdkProcess: added a comment explaining why stdin must be 'pipe' —
  SpawnedSdkProcess.stdin is typed NonNullable and the Claude Agent SDK
  consumes that pipe; 'ignore' would null it and the null-check below
  would tear the child down on every spawn.
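The containment helper can be sketched as follows (the helper name comes from the commit; treating the directory itself as "within" is an assumption):

```typescript
import path from "node:path";

// path.relative-based containment: works with either separator and
// absorbs trailing-slash variance, unlike a literal startsWith(dir + "/").
function isWithin(parent: string, child: string): boolean {
  const rel = path.relative(parent, child);
  // Inside iff the relative path neither escapes upward ("..") nor is
  // absolute (a different Windows drive yields an absolute result).
  return !rel.startsWith("..") && !path.isAbsolute(rel);
}
```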

* fix: make Stop hook fire-and-forget; remove dead /api/session/end

The Stop hook was awaiting a 35-second long-poll on /api/session/end,
which the worker held open until the summary-stored event fired (or its
30s server-side timeout elapsed), followed by another await on
/api/sessions/complete. Three sequential awaits, the middle one a 30s
hold — not fire-and-forget despite repeated requests.

The Stop hook now does ONE thing: POST /api/sessions/summarize to
queue the summary work and return. The worker drives the rest async.
Session-map cleanup is performed by the SessionEnd handler
(session-complete.ts), not duplicated here.

- summarize.ts: drop the /api/session/end long-poll and the trailing
  /api/sessions/complete await; ~40 lines removed; unused
  SessionEndResponse interface gone; header comment rewritten.
- SessionRoutes: delete handleSessionEnd, sessionEndSchema, the
  SERVER_SIDE_SUMMARY_TIMEOUT_MS constant, and the /api/session/end
  route registration. Drop the now-unused ingestEventBus and
  SummaryStoredEvent imports.
- ResponseProcessor + shared.ts + worker-utils.ts: update stale
  comments that referenced the dead endpoint. The IngestEventBus is
  left in place dormant (no listeners) for follow-up cleanup so this
  PR stays focused on the blocker.

Bundle artifact (worker-service.cjs) rebuilt via build-and-sync.

Verification:
- grep '/api/session/end' plugin/scripts/worker-service.cjs → 0
- grep 'timeoutMs:35' plugin/scripts/worker-service.cjs → 0
- Worker restarted clean, /api/health ok at pid 92368

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* deps: bump all dependencies to latest including majors

Upgrades: React 18→19, Express 4→5, Zod 3→4, TypeScript 5→6,
@types/node 20→25, @anthropic-ai/claude-agent-sdk 0.1→0.2,
@clack/prompts 0.9→1.2, plus minors. Adds Daily Maintenance section
to CLAUDE.md mandating latest-version policy across manifests.

Express 5 surfaced a race in Server.listen() where the 'error' handler
was attached after listen() was invoked; refactored to use
http.createServer with both 'error' and 'listening' handlers attached
before listen(), restoring port-conflict rejection semantics.
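The refactor described above can be sketched like this (helper name hypothetical; the `error`/`listening` events are standard `net.Server` behavior):

```typescript
import http from "node:http";

// Attach both handlers BEFORE calling listen() so a port conflict
// (EADDRINUSE) rejects the promise instead of escaping as an
// unhandled 'error' event.
function listenSafe(
  app: http.RequestListener,
  port: number,
): Promise<http.Server> {
  return new Promise((resolve, reject) => {
    const server = http.createServer(app);
    server.once("error", reject);                    // e.g. EADDRINUSE
    server.once("listening", () => resolve(server)); // bound successfully
    server.listen(port);
  });
}
```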

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix: surface real chroma errors and add deep status probe

Replace the misleading "Vector search failed - semantic search unavailable.
Install uv... restart the worker." string in SearchManager with the actual
exception text from chroma_query_documents. The lying message blamed `uv`
for any failure — even when the real cause was a chroma-mcp transport
timeout, an empty collection, or a dead subprocess.

Also add /api/chroma/status?deep=1 backed by a new
ChromaMcpManager.probeSemanticSearch() that round-trips a real query
(chroma_list_collections + chroma_query_documents) instead of just
checking the stdio handshake. The cheap default path is unchanged.
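A sketch of the deep probe's shape, assuming a generic tool-call function (the `chroma_*` tool names come from the commit; everything else is illustrative):

```typescript
// Round-trip a real query instead of trusting the stdio handshake: a dead
// chroma-mcp subprocess or transport timeout fails here, at probe time,
// rather than later at search time.
type ToolCall = (tool: string, args: object) => Promise<unknown>;

async function probeSemanticSearch(call: ToolCall): Promise<{ ok: true }> {
  await call("chroma_list_collections", {});
  await call("chroma_query_documents", { query_texts: ["probe"], n_results: 1 });
  return { ok: true }; // any throw above propagates the real error text
}
```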

Includes the diagnostic plan (PLAN-fix-mcp-search.md) and updated test
fixtures for the new structured failure message.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* chore: rebuild worker-service bundle to match merged src

Bundle was stale after the squash merge of #2124 — it still contained
the old "Install uv... semantic search unavailable" string and lacked
probeSemanticSearch. Rebuilt via bun run build-and-sync.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* docs: address coderabbit feedback on PLAN-fix-mcp-search.md

- replace machine-specific /Users/alexnewman absolute paths with portable
  <repo-root> placeholder (MD-style portability)
- add blank lines around the TypeScript fenced block (MD031)
- tag the bare fenced block with `text` (MD040)

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
This commit is contained in:
Alex Newman
2026-04-25 13:37:40 -07:00
committed by GitHub
parent 8ace1d9c84
commit 94d592f212
159 changed files with 18091 additions and 5843 deletions
@@ -0,0 +1,74 @@
# Pathfinder Phase 0: Feature Inventory
**Date**: 2026-04-21
**Repo**: claude-mem (vivacious-teeth branch)
**Total Features**: 12
---
## 1. lifecycle-hooks
- **Purpose**: Intercepts Claude Code session lifecycle (SessionStart → UserPromptSubmit → PostToolUse → Summary → SessionEnd) to capture tool usage and trigger downstream processing.
- **Entry Points**: `src/hooks/hook-response.ts:1`, `src/services/worker-service.ts:23`, `src/supervisor/index.ts:1`
- **Core Files**: `src/hooks/hook-response.ts`, `src/services/infrastructure/GracefulShutdown.ts`, `src/supervisor/index.ts`, `src/supervisor/process-registry.ts`, `src/shared/hook-constants.ts`
## 2. privacy-tag-filtering
- **Purpose**: Strips privacy-control tags (`<private>`, `<system-reminder>`, `<system-instruction>`, `<claude-mem-context>`) from content at edge before storage.
- **Entry Points**: `src/utils/tag-stripping.ts:24`, `src/utils/tag-stripping.ts:51`, `src/utils/tag-stripping.ts:75`
- **Core Files**: `src/utils/tag-stripping.ts`, `src/services/worker/agents/ResponseProcessor.ts`, `src/cli/handlers/observation.ts`
## 3. sqlite-persistence
- **Purpose**: SQLite3-backed storage for observations, summaries, sessions, prompts, pending messages; schema migrations and transactions.
- **Entry Points**: `src/services/sqlite/Database.ts:1`, `src/services/sqlite/migrations/runner.ts:1`, `src/services/sqlite/observations/store.ts:1`
- **Core Files**: `src/services/sqlite/Database.ts`, `src/services/sqlite/Observations.ts`, `src/services/sqlite/Summaries.ts`, `src/services/sqlite/SessionStore.ts`, `src/services/sqlite/PendingMessageStore.ts`, `src/services/sqlite/migrations.ts`
## 4. vector-search-sync
- **Purpose**: Syncs observations and session summaries to ChromaDB via MCP for semantic search.
- **Entry Points**: `src/services/sync/ChromaSync.ts:75`
- **Core Files**: `src/services/sync/ChromaSync.ts`, `src/services/sync/ChromaMcpManager.ts`
## 5. context-injection-engine
- **Purpose**: Generates contextual observations injected into session prompts using token budgets, mode-based filtering, semantic relevance.
- **Entry Points**: `src/services/context/ContextBuilder.ts:46`, `src/services/context/ObservationCompiler.ts:1`, `src/services/context/ContextConfigLoader.ts:17`
- **Core Files**: `src/services/context/ContextBuilder.ts`, `src/services/context/ObservationCompiler.ts`, `src/services/context/TokenCalculator.ts`, `src/services/context/sections/TimelineRenderer.ts`, `src/services/context/sections/SummaryRenderer.ts`, `src/services/context/formatters/AgentFormatter.ts`
## 6. hybrid-search-orchestration
- **Purpose**: Multi-strategy search (Chroma semantic + SQLite keyword + hybrid) with timeline context, formatting, pagination.
- **Entry Points**: `src/services/worker/search/SearchOrchestrator.ts:44`
- **Core Files**: `src/services/worker/search/SearchOrchestrator.ts`, `src/services/worker/search/strategies/ChromaSearchStrategy.ts`, `src/services/worker/search/strategies/SQLiteSearchStrategy.ts`, `src/services/worker/search/strategies/HybridSearchStrategy.ts`, `src/services/worker/search/ResultFormatter.ts`, `src/services/worker/search/TimelineBuilder.ts`
## 7. response-parsing-storage
- **Purpose**: Parses XML observations/summaries from agent responses, atomic DB transactions, SSE broadcasting, message cleanup.
- **Entry Points**: `src/services/worker/agents/ResponseProcessor.ts:49`, `src/sdk/parser.ts:1`
- **Core Files**: `src/services/worker/agents/ResponseProcessor.ts`, `src/sdk/parser.ts`, `src/services/worker/agents/ObservationBroadcaster.ts`, `src/services/worker/agents/SessionCleanupHelper.ts`
## 8. session-lifecycle-management
- **Purpose**: Active session state, pending message queue, subprocess tracking, stale session detection and reaping.
- **Entry Points**: `src/services/worker/SessionManager.ts:1`, `src/services/worker/SessionManager.ts:59`
- **Core Files**: `src/services/worker/SessionManager.ts`, `src/services/worker/ProcessRegistry.ts`, `src/services/queue/SessionQueueProcessor.ts`, `src/services/sqlite/PendingMessageStore.ts`
## 9. http-server-routes
- **Purpose**: Express server on port 37777; middleware, routing for search/viewer/session/data/settings/memory; health checks.
- **Entry Points**: `src/services/server/Server.ts:72`, `src/services/worker/http/routes/SearchRoutes.ts:1`
- **Core Files**: `src/services/server/Server.ts`, `src/services/server/Middleware.ts`, `src/services/server/ErrorHandler.ts`, `src/services/worker/http/routes/SearchRoutes.ts`, `src/services/worker/http/routes/ViewerRoutes.ts`, `src/services/worker/http/routes/SessionRoutes.ts`, `src/services/worker/http/routes/SettingsRoutes.ts`, `src/services/worker/http/routes/MemoryRoutes.ts`
## 10. viewer-ui-layer
- **Purpose**: React frontend at localhost:37777 for browsing memory stream, settings, observations; SSE-driven real-time updates.
- **Entry Points**: `src/ui/viewer/App.tsx:1`, `src/ui/viewer/index.tsx:1`, `src/ui/viewer/hooks/useSSE.ts:1`
- **Core Files**: `src/ui/viewer/App.tsx`, `src/ui/viewer/components/`, `src/ui/viewer/hooks/useSettings.ts`, `src/ui/viewer/hooks/useSSE.ts`, `src/services/worker/SSEBroadcaster.ts`
## 11. knowledge-corpus-builder
- **Purpose**: Compiles filtered observation sets into named corpus files with search, rendering, storage for knowledge agent.
- **Entry Points**: `src/services/worker/knowledge/CorpusBuilder.ts:50`, `src/services/worker/knowledge/KnowledgeAgent.ts:1`
- **Core Files**: `src/services/worker/knowledge/CorpusBuilder.ts`, `src/services/worker/knowledge/CorpusRenderer.ts`, `src/services/worker/knowledge/CorpusStore.ts`, `src/services/worker/knowledge/KnowledgeAgent.ts`, `src/services/worker/http/routes/CorpusRoutes.ts`
## 12. transcript-watcher-integration
- **Purpose**: Watches external transcript files (Cursor, OpenCode) for tool events, parses, injects observations.
- **Entry Points**: `src/services/transcripts/watcher.ts:1`, `src/services/transcripts/processor.ts:33`
- **Core Files**: `src/services/transcripts/watcher.ts`, `src/services/transcripts/processor.ts`, `src/services/transcripts/config.ts`, `src/services/transcripts/types.ts`, `src/services/integrations/CursorHooksInstaller.ts`
---
## Excluded from Feature Inventory (Shared Utilities)
- `src/utils/` (logger, project-name, claude-md-utils)
- `src/shared/` (paths, worker-utils, hook-constants)
- `src/types/` (type definitions)
@@ -0,0 +1,91 @@
# Flowchart: context-injection-engine
## Sources Consulted
- `src/services/worker/http/routes/SearchRoutes.ts:209-249` (handleContextInject)
- `src/services/worker/http/routes/SearchRoutes.ts:258-296` (handleSemanticContext)
- `src/services/context/ContextBuilder.ts:46-186`
- `src/services/context/ContextConfigLoader.ts:17-40`
- `src/services/context/ObservationCompiler.ts:26-189`
- `src/services/context/TokenCalculator.ts:14-78`
- `src/services/context/sections/HeaderRenderer.ts:15-61`
- `src/services/context/sections/TimelineRenderer.ts:21-100`
- `src/services/context/sections/SummaryRenderer.ts:15-65`
- `src/services/context/sections/FooterRenderer.ts:15-42`
- `src/services/context/formatters/AgentFormatter.ts:36-98`
- `src/services/context/formatters/HumanFormatter.ts:35-80`
- `src/services/domain/ModeManager.ts:15-100`
## Happy Path Description
Two-part system. **Route-driven flow** (`/api/context/inject`): GET request with project(s) and `colors=true|false`. Handler parses comma-separated projects (worktree support), imports `generateContext`. ContextBuilder loads mode-specific config (observation types + concepts) from ModeManager, opens SQLite, queries observations and summaries filtered by mode, calculates token economics, and passes raw data to section renderers (Header, Timeline, Summary, Footer). Each renderer branches on `forHuman` — AgentFormatter emits compact markdown for LLMs, HumanFormatter emits ANSI-colored terminal output.
**Semantic flow** (`/api/context/semantic`): POST with user query. Delegates to SearchManager for Chroma similarity, formats top-N as compact markdown with title + narrative. Returns JSON for per-prompt injection.
## Mermaid Flowchart
```mermaid
flowchart TD
HTTPInject["GET /api/context/inject<br/>SearchRoutes.ts:209"] --> ExtractParams["Extract projects + colors<br/>SearchRoutes.ts:211-212"]
HTTPSemantic["POST /api/context/semantic<br/>SearchRoutes.ts:258"] --> ExtractParamsSem["Extract q + project + limit<br/>SearchRoutes.ts:259-261"]
ExtractParams --> ParseProjects["Split comma-separated<br/>SearchRoutes.ts:221"]
ParseProjects --> GenerateCtx["generateContext<br/>ContextBuilder.ts:130"]
ExtractParamsSem --> ValidateQuery["len(q) >= 20<br/>SearchRoutes.ts:263"]
ValidateQuery --> SearchMgr["SearchManager.search via Chroma<br/>SearchRoutes.ts:270"]
SearchMgr --> FormatSemantic["Top-N markdown<br/>SearchRoutes.ts:287-293"]
FormatSemantic --> ReturnSemJSON["Return JSON<br/>SearchRoutes.ts:295"]
GenerateCtx --> LoadConfig["loadContextConfig<br/>ContextBuilder.ts:134"]
LoadConfig --> ModeLoad["ModeManager.getActiveMode<br/>ContextConfigLoader.ts:22"]
ModeLoad --> CreateDB["initializeDatabase<br/>ContextBuilder.ts:152"]
CreateDB --> QueryObs["query observations<br/>ContextBuilder.ts:159"]
QueryObs --> ObsMulti{Multi-project worktree?}
ObsMulti -->|Yes| QueryObsMulti["queryObservationsMulti<br/>ObservationCompiler.ts:105"]
ObsMulti -->|No| QueryObsSingle["queryObservations<br/>ObservationCompiler.ts:26"]
QueryObsMulti --> QuerySumm["query summaries<br/>ContextBuilder.ts:162"]
QueryObsSingle --> QuerySumm
QuerySumm --> CheckEmpty{Empty?<br/>ContextBuilder.ts:167}
CheckEmpty -->|Yes| RenderEmptyState["renderEmptyState<br/>ContextBuilder.ts:73"]
CheckEmpty -->|No| BuildCtxOut["buildContextOutput<br/>ContextBuilder.ts:80-122"]
BuildCtxOut --> CalcEcon["calculateTokenEconomics<br/>TokenCalculator.ts:25"]
CalcEcon --> RenderHeader["renderHeader<br/>HeaderRenderer.ts:15"]
RenderHeader --> FormatMode{forHuman?}
FormatMode -->|true| HumanHeader["HumanFormatter<br/>HumanFormatter.ts:35"]
FormatMode -->|false| AgentHeader["AgentFormatter<br/>AgentFormatter.ts:36"]
HumanHeader --> RenderTimeline["renderTimeline<br/>TimelineRenderer.ts"]
AgentHeader --> RenderTimeline
RenderTimeline --> GroupDays["groupTimelineByDay<br/>TimelineRenderer.ts:21"]
GroupDays --> IterateDays[/"For each day"/]
IterateDays --> FormatDay{forHuman?}
FormatDay -->|true| RenderDayHuman["renderDayTimelineHuman<br/>TimelineRenderer.ts:97"]
FormatDay -->|false| RenderDayAgent["renderDayTimelineAgent<br/>TimelineRenderer.ts:56"]
RenderDayAgent --> CheckSummary["shouldShowSummary<br/>SummaryRenderer.ts:15"]
RenderDayHuman --> CheckSummary
CheckSummary --> RenderPrev["renderPreviouslySection<br/>FooterRenderer.ts:15"]
RenderPrev --> JoinLines["Join + trim<br/>ContextBuilder.ts:121"]
JoinLines --> HTTPReturn["Return text/plain<br/>SearchRoutes.ts:247"]
```
## Side Effects
- DB connection opened, closed in finally (ContextBuilder.ts:184).
- Mode state (ModeManager singleton) drives all filtering.
- Read-only — no writes during generation.
- Semantic path queries Chroma; inject path is SQLite-only.
## External Feature Dependencies
**Calls into:** ModeManager, SessionStore (SQLite), SearchManager (semantic path only), SettingsDefaultsManager, timeline-formatting utilities.
**Called by:** lifecycle-hooks (SessionStart context + UserPromptSubmit semantic), `/api/context/inject` clients (viewer UI), transcript-watcher post-session-end refresh.
## Confidence + Gaps
**High:** Route entry points; orchestration pipeline; mode filtering; Agent vs Human formatter split; token economics.
**Gaps:** HumanFormatter ANSI detail; ModeManager deep-merge inheritance; prior-session message extraction. No duplication observed internally — AgentFormatter/HumanFormatter are cleanly separated by audience.
@@ -0,0 +1,90 @@
# Flowchart: http-server-routes
## Sources Consulted
- `src/services/server/Server.ts:1-286`
- `src/services/server/Middleware.ts`
- `src/services/server/ErrorHandler.ts`
- `src/services/worker/http/middleware.ts`
- `src/services/worker/http/BaseRouteHandler.ts`
- All 8 route files under `src/services/worker/http/routes/`
## Route Inventory
| File | Endpoints | Method(s) | Purpose |
|---|---|---|---|
| ViewerRoutes.ts | `/`, `/health`, `/stream` | GET | UI HTML; SSE broadcaster |
| SearchRoutes.ts | `/api/search`, `/api/timeline`, `/api/decisions`, `/api/changes`, `/api/how-it-works`, `/api/search/*`, `/api/context/*` | GET/POST | Search + context injection |
| SessionRoutes.ts | `/sessions/:id/*`, `/api/sessions/*` | POST/GET/DELETE | Session init/observations/summarize/complete |
| DataRoutes.ts | `/api/observations`, `/api/summaries`, `/api/prompts`, `/api/stats`, `/api/projects`, `/api/processing-status`, `/api/pending-queue` | GET/POST/DELETE | Data retrieval + queue mgmt |
| SettingsRoutes.ts | `/api/settings`, `/api/mcp/*`, `/api/branch/*` | GET/POST | Settings + MCP toggle + branch |
| MemoryRoutes.ts | `/api/memory/save` | POST | Manual observation insert |
| CorpusRoutes.ts | `/api/corpus`, `/api/corpus/:name/*` | GET/POST/DELETE | Knowledge corpus CRUD |
| LogsRoutes.ts | `/api/logs`, `/api/logs/clear` | GET/POST | Log retrieval |
| Server.ts core | `/api/health`, `/api/readiness`, `/api/version`, `/api/instructions`, `/api/admin/*` | GET/POST | System health + admin |
## Happy Path Description
Request → middleware chain (JSON parse 5MB → CORS localhost → rate limit 300/min → request logging) → Express router → route handler extends `BaseRouteHandler` (provides `wrapHandler()` catching sync/async errors) → service call (SearchManager, DatabaseManager, etc.) → response (JSON, SSE, HTML). Global `errorHandler` catches uncaught errors. Admin endpoints require localhost.
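The `wrapHandler()` pattern can be sketched as follows (generic shapes assumed; the real method wraps Express handlers on `BaseRouteHandler`):

```typescript
// Wrap a route handler so both synchronous throws and rejected promises
// reach the error middleware via next(), never an unhandled rejection.
type Next = (err?: unknown) => void;

function wrapHandler<Req, Res>(
  fn: (req: Req, res: Res) => unknown,
): (req: Req, res: Res, next: Next) => void {
  return (req, res, next) => {
    try {
      Promise.resolve(fn(req, res)).catch(next); // async errors
    } catch (err) {
      next(err);                                 // sync throws
    }
  };
}
```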
## Mermaid Flowchart
```mermaid
flowchart TD
A([Request on :37777]) --> B["Middleware chain"]
B --> B1["JSON parse 5MB"]
B1 --> B2["CORS localhost"]
B2 --> B3["Rate limit 300/min/IP"]
B3 --> B4["Request logger"]
B4 --> C["Router match"]
C --> D{Route found?}
D -->|No| D1["notFoundHandler 404"]
D -->|Yes| E["Handler"]
E --> F["BaseRouteHandler.wrapHandler"]
F --> G{Try}
G -->|success| H["Service call"]
G -->|error| J["handleError"]
H --> I{Response type?}
I -->|JSON| I1["res.status.json"]
I -->|SSE| I3["text/event-stream<br/>register SSEBroadcaster"]
I -->|HTML| I6["file read + send"]
J --> J1["logger.error"]
J1 --> J2{Headers sent?}
J2 -->|No| J3["JSON error response"]
J2 -->|Yes| J4["Skip"]
I1 --> K([Sent])
I3 --> K
I6 --> K
J3 --> K
D1 --> K
L["Global errorHandler middleware"] --> J
```
## Repeated Patterns (Phase 2 candidates)
1. **Try-catch wrapping:** All routes inherit `BaseRouteHandler.wrapHandler()` — consistent, good.
2. **Validation:** Each route validates query/body **independently** — no shared validator middleware. Duplicated shape.
3. **Service injection:** Constructors accept services — consistent DI.
4. **Response shape:**
- Success: `res.status(200).json({ ... })`
- Error: `{ error, message, code?, details? }`
- 404: `notFoundHandler`
- 500: global errorHandler
5. **SSE is structurally different:** stateful persistent connection; managed by `SSEBroadcaster`.
## Side Effects
- SSE client registration grows connection list until close.
- Rate limiter in-memory IP map.
- Logger writes (stderr, async).
- Admin endpoints: `/api/admin/restart` and `/api/admin/shutdown` call `process.exit(0)`.
- File I/O for `/`, `/api/instructions`, `/api/logs` (synchronous).
## External Feature Dependencies
SearchManager, SessionManager, DatabaseManager, SSEBroadcaster, SettingsManager, BranchManager, ModeManager, CorpusStore/Builder/KnowledgeAgent, logger, AppError, Supervisor/ProcessRegistry.
## Confidence + Gaps
**High:** Middleware order; BaseRouteHandler pattern; error shape; SSE setup.
**Gaps:** No auth/permission middleware (single-machine trust model assumed); validator duplication; blocking synchronous file I/O in `/` and `/api/instructions`; SSE race on connect-mid-broadcast.
@@ -0,0 +1,97 @@
# Flowchart: hybrid-search-orchestration
## Sources Consulted
- `src/services/worker/search/SearchOrchestrator.ts:1-290`
- `src/services/worker/search/strategies/ChromaSearchStrategy.ts:1-120`
- `src/services/worker/search/strategies/SQLiteSearchStrategy.ts:1-120`
- `src/services/worker/search/strategies/HybridSearchStrategy.ts:1-240`
- `src/services/worker/search/ResultFormatter.ts:1-200`
- `src/services/worker/search/TimelineBuilder.ts:1-220`
- `src/services/worker/SearchManager.ts:1-600`
- `src/services/worker/http/routes/SearchRoutes.ts:1-150`
## Happy Path Description
`/api/search` → `SearchRoutes` → `SearchManager.search()` (thin facade) → `SearchOrchestrator` chooses among three strategies:
**Path 1 (Filter-only):** No query text → `SQLiteSearchStrategy` does metadata-only filter via SessionSearch (date range, project, concept/type/file).
**Path 2 (Semantic):** Query text + ChromaSync available → `ChromaSearchStrategy.queryChroma` → filter by recency (90-day default or custom) → categorize by doc type → hydrate from SQLite. If Chroma fails mid-query, orchestrator falls back to filter-only SQLite (drops the query term).
**Path 3 (Hybrid):** `findByConcept|Type|File` specialty methods → `HybridSearchStrategy` three-phase: (1) SQLite metadata filter → all matching IDs; (2) Chroma semantic ranking → re-rank; (3) intersect + hydrate → return metadata-matched IDs in Chroma rank order.
`ResultFormatter` renders markdown tables grouped by date/file. `TimelineBuilder` handles chronological grouping with anchor-based depth filtering.
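The phase-3 intersection reduces to a set-membership filter that preserves Chroma's ordering (function name from the flowchart; the signature is an assumption):

```typescript
// Keep only IDs that passed the SQLite metadata filter, emitted in
// Chroma's semantic rank order rather than SQLite's row order.
function intersectWithRanking(metadataIds: string[], rankedIds: string[]): string[] {
  const allowed = new Set(metadataIds);
  return rankedIds.filter((id) => allowed.has(id));
}
```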
## Mermaid Flowchart
```mermaid
flowchart TD
A["GET /api/search<br/>SearchRoutes.ts:22"] --> B["SearchManager.search<br/>SearchManager.ts:161"]
B --> C["SearchOrchestrator.search<br/>SearchOrchestrator.ts:71"]
C --> D{Decision<br/>SearchOrchestrator.ts:81}
D -->|no query| E["SQLiteStrategy.search<br/>SQLiteSearchStrategy.ts:38"]
D -->|query + Chroma| F["ChromaStrategy.search<br/>ChromaSearchStrategy.ts:42"]
D -->|no Chroma| G["Return empty<br/>SearchOrchestrator.ts:115"]
E --> E1["SessionSearch.searchObservations/Sessions/Prompts"]
E1 --> E4["StrategySearchResult<br/>SearchOrchestrator.ts:98"]
F --> F1["ChromaSync.queryChroma<br/>ChromaSearchStrategy.ts:104"]
F1 --> F3["filterByRecency 90d<br/>SearchOrchestrator.ts:119"]
F3 --> F4["categorizeByDocType<br/>SearchOrchestrator.ts:120"]
F4 --> F5["hydrate from SQLite"]
F5 --> F6["StrategySearchResult usedChroma=true"]
F --> F7[/Error?/]
F7 -->|yes| F8["SQLiteStrategy fallback<br/>SearchOrchestrator.ts:102"]
F8 --> E4_Fallback["fellBack=true<br/>SearchOrchestrator.ts:107"]
E4 --> H["SearchManager formats<br/>SearchManager.ts:320-444"]
E4_Fallback --> H
F6 --> H
G --> H
H --> Hfmt{format?}
Hfmt -->|json| H1["Raw JSON"]
Hfmt -->|markdown| H2["ResultFormatter.formatSearchResults<br/>ResultFormatter.ts:25"]
H2 --> H3["combineResults<br/>ResultFormatter.ts:115"]
H3 --> H4["groupByDate<br/>ResultFormatter.ts:49"]
H4 --> H5["groupByFile<br/>ResultFormatter.ts:61"]
H5 --> H9["Markdown tables"]
J["findByConcept/Type/File<br/>SearchOrchestrator.ts:126-180"] --> K["HybridStrategy<br/>HybridSearchStrategy.ts:26"]
K --> K1["Phase 1: SessionSearch metadata filter<br/>HybridSearchStrategy.ts:74/112/152"]
K1 --> K2["Phase 2: ChromaSync.queryChroma<br/>HybridSearchStrategy.ts:180/208"]
K2 --> K3["Phase 3: intersectWithRanking<br/>HybridSearchStrategy.ts:228"]
K3 --> K4["hydrate SQLite<br/>HybridSearchStrategy.ts:188"]
K4 --> K5["StrategySearchResult usedChroma=true"]
L["TimelineBuilder.buildTimeline<br/>TimelineBuilder.ts:46"] --> L1["Unify obs/sessions/prompts"]
L1 --> L2["filterByDepth<br/>TimelineBuilder.ts:73"]
L2 --> L3["formatTimeline<br/>TimelineBuilder.ts:124"]
```
## Side Effects
- Chroma unavailability → fallback to filter-only SQLite (drops query text).
- Default 90-day recency filter unless `dateRange` is explicit.
- HybridStrategy errors → metadata-only results with `fellBack=true`.
- SearchManager normalizes comma-separated URL params → arrays.
## External Feature Dependencies
**Calls into:** ChromaSync, SessionSearch (SQLite FTS5), SessionStore (hydration), ModeManager (type icons), timeline-formatting helpers.
**Called by:** Search routes, mem-search skill, CorpusBuilder (via SearchOrchestrator).
## Important Clarification: SearchManager vs SearchOrchestrator
- **SearchOrchestrator** is the canonical strategy coordinator introduced in Jan 2026 monolith refactor.
- **SearchManager** is a **thin facade** delegating to SearchOrchestrator, plus HTTP/display wrapping.
- **NOT duplicates.** But SearchManager retains legacy private methods (`queryChroma`, `searchChromaForTimeline` marked `@deprecated`) — candidates for cleanup.
## Confidence + Gaps
**High:** Three paths + fallback chains; SearchManager is thin facade; TimelineBuilder is standalone formatter.
**Gaps:** Pagination enforcement across strategies; CorpusBuilder's exact call into SearchOrchestrator; deprecated SearchManager methods still present.
@@ -0,0 +1,87 @@
# Flowchart: knowledge-corpus-builder
## Sources Consulted
- `src/services/worker/knowledge/CorpusBuilder.ts:1-174`
- `src/services/worker/knowledge/KnowledgeAgent.ts:1-284`
- `src/services/worker/knowledge/CorpusRenderer.ts:1-133`
- `src/services/worker/knowledge/CorpusStore.ts:1-127`
- `src/services/worker/http/routes/CorpusRoutes.ts:1-284`
- `src/services/worker/search/SearchOrchestrator.ts:1-80`
- `src/services/worker/search/ResultFormatter.ts:1-100`
- `src/services/context/formatters/AgentFormatter.ts:1-100`
## Happy Path Description
`POST /api/corpus` → `handleBuildCorpus` → `CorpusBuilder.build()` maps filters to `SearchOrchestrator.search()` → extract IDs → `SessionStore.getObservationsByIds()` hydrates full records → map to `CorpusObservation` → compute stats (type breakdown, date range) → `CorpusRenderer.generateSystemPrompt()` → `CorpusRenderer.renderCorpus()` produces full-detail markdown → persist to `~/.claude-mem/corpora/{name}.corpus.json` via `CorpusStore.write`.
`POST /api/corpus/:name/prime` → `KnowledgeAgent.prime()` → render full corpus text + system prompt → pass to Claude Agent SDK `query()` → capture `session_id` → persist in corpus.json.
`POST /api/corpus/:name/query` → `KnowledgeAgent.query()` resumes SDK session by id, agent answers from corpus context, auto-reprimes on expiration.
## Mermaid Flowchart
```mermaid
flowchart TD
A["POST /api/corpus<br/>CorpusRoutes.ts:43"] --> B["handleBuildCorpus"]
B --> C["CorpusBuilder.build<br/>CorpusBuilder.ts:50"]
C --> D["SearchOrchestrator.search<br/>CorpusBuilder.ts:64"]
D --> E["SessionStore.getObservationsByIds<br/>CorpusBuilder.ts:82"]
E --> F["mapObservationToCorpus<br/>CorpusBuilder.ts:126"]
F --> G["calculateStats<br/>CorpusBuilder.ts:146"]
G --> H["CorpusRenderer.generateSystemPrompt<br/>CorpusBuilder.ts:109"]
H --> I["CorpusRenderer.renderCorpus (estimate tokens)<br/>CorpusBuilder.ts:112"]
I --> J["CorpusStore.write<br/>CorpusBuilder.ts:116"]
J --> K[(~/.claude-mem/corpora/{name}.corpus.json<br/>CorpusStore.ts:14)]
L1["GET /api/corpus/:name"] --> L3["CorpusStore.read<br/>CorpusStore.ts:39"]
L3 --> K
M["POST /api/corpus/:name/prime<br/>CorpusRoutes.ts:213"] --> N["KnowledgeAgent.prime<br/>KnowledgeAgent.ts:58"]
N --> P["CorpusRenderer.renderCorpus<br/>CorpusRenderer.ts:14"]
P --> Q["Claude Agent SDK query<br/>KnowledgeAgent.ts:75"]
Q --> R["session_id captured<br/>KnowledgeAgent.ts:89"]
R --> S["CorpusStore.write update session_id<br/>KnowledgeAgent.ts:114"]
T["POST /api/corpus/:name/query<br/>CorpusRoutes.ts:235"] --> V["KnowledgeAgent.query<br/>KnowledgeAgent.ts:125"]
V --> W["Agent SDK resume session_id<br/>KnowledgeAgent.ts:190-200"]
W --> X{Session expired?}
X -->|Yes| Y["auto-reprime<br/>KnowledgeAgent.ts:148"]
X -->|No| Z["Return answer"]
AA["POST /api/corpus/:name/rebuild"] --> C
AB["POST /api/corpus/:name/reprime"] --> N
AC["DELETE /api/corpus/:name"] --> AD["CorpusStore.delete<br/>CorpusStore.ts:94"]
```
## Side Effects
- Writes `{name}.corpus.json` in `~/.claude-mem/corpora/`.
- Spawns Claude Agent SDK subprocess for prime/query.
- Creates `OBSERVER_SESSIONS_DIR` if absent.
- Environment isolation via `buildIsolatedEnv`.
## External Feature Dependencies
**Calls into:** SearchOrchestrator (strategy routing), SessionStore (hydration), Anthropic Claude Agent SDK, SettingsDefaultsManager, ChromaSync (indirect through hybrid).
**Called by:** CorpusRoutes HTTP endpoints; knowledge-agent skill (external).
## Potential Duplication Noted
**CorpusRenderer vs ResultFormatter vs AgentFormatter** — all three produce markdown from observations:
| Renderer | Audience | Density | Grouping |
|---|---|---|---|
| ResultFormatter | CLI search results | Compact table rows | Date/file |
| AgentFormatter | Session context injection | Compact per-line | Day timeline |
| CorpusRenderer | Agent priming corpus | FULL DETAIL narrative-first | List or chronological |
**No direct code reuse** but all three independently iterate observations and format markdown. Consolidating on a shared rendering interface (base class or strategy) could reduce surface area if output configurations overlap.
**Search logic NOT duplicated** — CorpusBuilder correctly delegates to SearchOrchestrator.
## Confidence + Gaps
**High:** Build → prime → query flow; 8 HTTP endpoints; session reprime on expiration.
**Gaps:** Exact "session expired" detection (regex match at KnowledgeAgent.ts:179); token heuristic (chars/4 at CorpusRenderer.ts:91); no quota enforcement for corpus count/size.
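The chars/4 heuristic flagged in the gaps above is simple enough to sketch. This is an assumed shape based on the description, not the actual code at CorpusRenderer.ts:91:

```typescript
// Sketch of the chars/4 token heuristic noted in the gaps above.
// The real CorpusRenderer may round or clamp differently.
function estimateTokens(text: string): number {
  // Rough rule of thumb: ~4 characters per token for English prose.
  return Math.ceil(text.length / 4);
}
```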
@@ -0,0 +1,128 @@
# Flowchart: lifecycle-hooks
## Sources Consulted
- `src/cli/hook-command.ts:1-122`
- `src/cli/handlers/index.ts:1-72`
- `src/cli/handlers/context.ts:1-95` (SessionStart)
- `src/cli/handlers/session-init.ts:1-192` (UserPromptSubmit)
- `src/cli/handlers/observation.ts:1-86` (PostToolUse)
- `src/cli/handlers/summarize.ts:1-170` (Stop / Summary phase)
- `src/cli/handlers/session-complete.ts:1-66` (Stop / Completion phase)
- `src/cli/handlers/user-message.ts:1-54` (SessionStart parallel)
- `src/cli/adapters/claude-code.ts:1-45`
- `src/hooks/hook-response.ts:1-12`
- `src/shared/hook-constants.ts:1-35`
- `src/services/worker-service.ts:1-100`
- `src/supervisor/index.ts:1-100`
- `src/services/worker/http/routes/SessionRoutes.ts:1-330`
- `src/services/worker/http/routes/SearchRoutes.ts:1-150`
- `src/services/infrastructure/GracefulShutdown.ts:1-100`
- `src/supervisor/process-registry.ts:1-80`
- `src/services/worker-spawner.ts:1-150`
## Happy Path Description
Claude-Mem's lifecycle-hooks system intercepts Claude Code's session lifecycle events and routes them through specialized handlers that coordinate session tracking, tool observation capture, semantic context injection, and session summarization.
**SessionStart** fires immediately when a session begins. The **context handler** ensures the worker daemon is running, queries the Chroma vector database for relevant past observations, and returns them as `additionalContext` for injection into Claude's prompt. In parallel, the **user-message** handler prints formatted context to the user's terminal and broadcasts the worker's live dashboard URL. Both handlers degrade gracefully if the worker is unavailable.
**UserPromptSubmit** fires when the user submits their first prompt. The **session-init handler** calls `/api/sessions/init` to create a session record in the database, captures the prompt, checks privacy settings, and optionally starts the Claude SDK agent. If semantic injection is enabled, it fetches relevant observations via `/api/context/semantic` and injects them as additional context alongside the user's prompt.
**PostToolUse** fires after Claude executes each tool. The **observation handler** sends the tool usage (name, input, response) to `/api/sessions/observations` where the worker validates privacy rules, enriches the observation with cwd/platform metadata, stores it in SQLite, and queues an async Chroma embedding for semantic search.
The **Stop** hook fires when a session ends. It is split into two phases with different timing guarantees. In phase 1, the **summarize handler** queues the session's final assistant message to `/api/sessions/summarize`, polls `/api/sessions/status` (up to 110s) until the SDK agent finishes processing the summary, then calls `/api/sessions/complete`. In phase 2, the **session-complete handler** marks the session inactive in the sessions map.
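The Stop-phase wait (poll every 500ms, give up after 110s) can be sketched as a bounded polling loop. `isDrained` stands in for the real `GET /api/sessions/status` check and is an assumption of this sketch, not the handler's actual signature:

```typescript
// Sketch of the summarize handler's wait loop, assuming an injected
// status check. The real handler polls GET /api/sessions/status.
async function waitForQueueDrain(
  isDrained: () => Promise<boolean>,
  intervalMs = 500,
  timeoutMs = 110_000,
): Promise<boolean> {
  const deadline = Date.now() + timeoutMs;
  while (Date.now() < deadline) {
    if (await isDrained()) return true; // SDK agent finished processing
    await new Promise((resolve) => setTimeout(resolve, intervalMs));
  }
  return false; // gave up; caller proceeds to /api/sessions/complete anyway
}
```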
## Mermaid Flowchart
```mermaid
flowchart TD
Start([Claude Code Session<br/>Lifecycle Event]) --> Dispatch{Event Type?<br/>hook-command.ts:88}
Dispatch -->|SessionStart| CtxSetup["ensureWorkerRunning<br/>worker-spawner.ts:100"]
Dispatch -->|UserPromptSubmit| InitSetup["ensureWorkerRunning<br/>worker-spawner.ts:100"]
Dispatch -->|PostToolUse| ObsSetup["ensureWorkerRunning<br/>worker-spawner.ts:100"]
Dispatch -->|Stop| SumSetup["Check if subagent<br/>summarize.ts:34"]
CtxSetup -->|Worker unavailable| CtxEmpty["Return empty context<br/>context.ts:44-46"]
CtxSetup -->|Worker ready| CtxFetch["Fetch /api/context/inject<br/>context.ts:54-56"]
CtxFetch --> CtxInject["Return additionalContext<br/>context.ts:88-93"]
CtxInject --> UMsgStart["userMessageHandler parallel<br/>user-message.ts:32"]
UMsgStart --> UMsgFetch["GET /api/context/inject (colors)<br/>user-message.ts:13-29"]
UMsgFetch --> UMsgDisplay["Write formatted ctx to stderr<br/>user-message.ts:24-28"]
InitSetup --> InitGuard["Validate session + cwd + project<br/>session-init.ts:51-61"]
InitGuard --> InitCall["POST /api/sessions/init<br/>session-init.ts:75-84"]
InitCall --> InitProcess["Receive sessionDbId + promptNumber<br/>session-init.ts:97-106"]
InitProcess --> InitSDK["POST /sessions/{id}/init start SDK<br/>session-init.ts:141-150"]
InitSDK --> InitSemantic["Semantic injection enabled?<br/>session-init.ts:158-159"]
InitSemantic -->|Yes| SemanticFetch["POST /api/context/semantic<br/>session-init.ts:164-165"]
SemanticFetch --> SemanticInject["Return additionalContext<br/>session-init.ts:179-188"]
ObsSetup --> ObsGuard["Validate toolName + cwd + not excluded<br/>observation.ts:40-62"]
ObsGuard --> ObsSend["POST /api/sessions/observations<br/>observation.ts:65-77"]
ObsSend --> ObsDB["Worker stores + queues Chroma embed<br/>SessionRoutes.ts:30"]
SumSetup -->|Not subagent| SumEnsure["ensureWorkerRunning<br/>summarize.ts:44"]
SumEnsure --> SumValidate["Extract last assistant msg<br/>summarize.ts:50-78"]
SumValidate --> SumQueue["POST /api/sessions/summarize<br/>summarize.ts:86-104"]
SumQueue --> SumPoll["Poll /api/sessions/status 500ms up to 110s<br/>summarize.ts:117-150"]
SumPoll --> SumComplete["POST /api/sessions/complete<br/>summarize.ts:156-161"]
SumComplete --> SessionComplete["sessionCompleteHandler phase 2<br/>session-complete.ts:32"]
SessionComplete --> SCSend["POST /api/sessions/complete<br/>remove from active map<br/>session-complete.ts:54"]
CtxEmpty --> Done([Exit code 0<br/>hook-command.ts:106])
UMsgDisplay --> Done
SemanticInject --> Done
ObsDB --> Done
SCSend --> Done
```
## Side Effects
**HTTP Calls to Worker (port 37777):**
- `GET /api/context/inject` — returns markdown context for injection
- `POST /api/sessions/init` — creates session record, returns sessionDbId
- `POST /api/context/semantic` — semantic search on Chroma
- `POST /sessions/{sessionDbId}/init` — starts SDK agent
- `POST /api/sessions/observations` — stores tool usage observation
- `POST /api/sessions/summarize` — queues summary generation
- `GET /api/sessions/status` — polls queue length
- `POST /api/sessions/complete` — marks session inactive
**Database (SQLite via worker):**
- Inserts into `sdk_sessions`, `user_prompts`, `observations`
- Updates `sdk_sessions.summary` with `summary_stored` flag
**Process Management:**
- `ensureWorkerStarted` spawns worker daemon via `spawnDaemon` if not alive
- SDK agent subprocess spawned per session
- Summarize handler waits up to 110s for SDK agent to finish
**File I/O:**
- Worker PID file at `~/.claude-mem/worker.pid`
- Hook logs at `~/.claude-mem/logs/hook.log`
## External Feature Dependencies
**Calls into:**
- **context-injection-engine** (via `/api/context/inject`, `/api/context/semantic`)
- **sqlite-persistence** (all writes via worker HTTP)
- **vector-search-sync** (async Chroma embeds)
- **session-lifecycle-management** (session state, SDK subprocess)
- **privacy-tag-filtering** (observation content filtered before storage)
- **http-server-routes** (all HTTP communication)
**Called by:**
- Claude Code CLI plugin harness (registered hooks)
- Cursor IDE (routed through observation handler)
- Gemini CLI / OpenRouter adapters
## Confidence + Gaps
**High Confidence:** Hook lifecycle → handler mapping; HTTP endpoints + payloads; graceful degradation on worker unavailability; exit code 0 strategy.
**Medium Confidence:** Exact SDK agent lifecycle and crash recovery; Cursor hook integration paths.
**Gaps:** Hook installer (how hooks register in Claude Code settings); TypeScript build → CLI entry process.
@@ -0,0 +1,86 @@
# Flowchart: privacy-tag-filtering
## Sources Consulted
- `src/utils/tag-stripping.ts:1-92`
- `src/services/worker/http/routes/SessionRoutes.ts:1-900`
- `src/services/worker/SessionManager.ts:270-360`
- `src/services/sqlite/PendingMessageStore.ts:1-100`
- `src/cli/handlers/summarize.ts:1-150`
- `src/shared/transcript-parser.ts:1-130`
## Happy Path Description
User submits a prompt containing `<private>` tags via hook → Worker HTTP endpoint `/api/sessions/init` receives request → `SessionRoutes.handleSessionInitByClaudeId` (line 814) validates and extracts the prompt. At line 862, `stripMemoryTagsFromPrompt()` is called, which invokes `stripTagsInternal()` to remove six tag types: `<claude-mem-context>`, `<private>`, `<system_instruction>`, `<system-instruction>`, `<persisted-output>`, and `<system-reminder>`. The cleaned prompt is saved to `user_prompts`. Concurrently, tool observations flow through `handleObservationsByClaudeId` (line 565), where `tool_input` and `tool_response` are stringified and stripped via `stripMemoryTagsFromJson()` (lines 629, 633), then queued to `PendingMessageStore` as already-cleaned data.
Stripping occurs BEFORE persistence, ensuring the database never receives unfiltered content. However, the **assistant-message summarize path** only strips `<system-reminder>` at extraction time (summarize.ts:66), not the full suite — a known gap.
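A minimal sketch of the six-tag strip described above, assuming one non-greedy paired-tag regex per tag type (matching the "multiple regex passes" note later in this doc); the real `stripTagsInternal` also counts tags first for ReDoS protection:

```typescript
// Sketch of stripTagsInternal's behavior: remove six tag types
// (paired form), then trim. The tag list comes from the description
// above; the regex details are assumptions, not the shipped code.
const STRIPPED_TAGS = [
  "claude-mem-context",
  "private",
  "system_instruction",
  "system-instruction",
  "persisted-output",
  "system-reminder",
];

function stripMemoryTags(text: string): string {
  let out = text;
  for (const tag of STRIPPED_TAGS) {
    // One non-greedy pass per tag type.
    out = out.replace(new RegExp(`<${tag}>[\\s\\S]*?</${tag}>`, "g"), "");
  }
  return out.trim();
}
```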
## Mermaid Flowchart
```mermaid
flowchart TD
Start([User prompt with tags<br/>SessionRoutes.ts:814]) --> Init["handleSessionInitByClaudeId<br/>SessionRoutes.ts:814"]
Start2([Tool invocation completes<br/>SessionRoutes.ts:565]) --> ObsRoute["handleObservationsByClaudeId<br/>SessionRoutes.ts:565"]
Start3([Session stops, summarize<br/>summarize.ts:66]) --> Extract["extractLastMessage stripSystemReminders=true<br/>summarize.ts:66"]
Init --> StripPrompt["stripMemoryTagsFromPrompt<br/>SessionRoutes.ts:862"]
StripPrompt --> StripInternal1["stripTagsInternal (all 6 tags)<br/>tag-stripping.ts:51"]
StripInternal1 --> RemoveTags1["Remove private, claude-mem-context,<br/>system_instruction, system-reminder,<br/>persisted-output, system-instruction<br/>tag-stripping.ts:53-59"]
RemoveTags1 --> CheckEmpty{Empty?<br/>SessionRoutes.ts:865}
CheckEmpty -->|Yes| SkipPrivate["Return skipped=true<br/>SessionRoutes.ts:872"]
CheckEmpty -->|No| SavePrompt["saveUserPrompt<br/>SessionRoutes.ts:882"]
SavePrompt --> DBPrompt["INSERT user_prompts<br/>SessionStore.ts"]
ObsRoute --> ExtractObs["Extract tool_input, tool_response<br/>SessionRoutes.ts:587"]
ExtractObs --> StripInput["stripMemoryTagsFromJson input<br/>SessionRoutes.ts:629"]
StripInput --> StripInternal2["stripTagsInternal<br/>tag-stripping.ts:51"]
StripInternal2 --> StripResponse["stripMemoryTagsFromJson response<br/>SessionRoutes.ts:633"]
StripResponse --> StripInternal3["stripTagsInternal<br/>tag-stripping.ts:51"]
StripInternal3 --> QueueObs["queueObservation<br/>SessionRoutes.ts:637"]
QueueObs --> EnqueueDB["PendingMessageStore.enqueue<br/>PendingMessageStore.ts:63"]
EnqueueDB --> DBObs["pending_messages cleaned"]
Extract --> PartialStrip["SYSTEM_REMINDER_REGEX only<br/>shared/transcript-parser.ts:84"]
PartialStrip --> SummarizeRoute["handleSummarizeByClaudeId<br/>SessionRoutes.ts:669"]
SummarizeRoute --> QueueSum["queueSummarize last_assistant_message<br/>SessionRoutes.ts:705"]
QueueSum --> PendingSum["pending_messages with INCOMPLETE strip"]
style PartialStrip fill:#fff9c4
style PendingSum fill:#fff9c4
style StripPrompt fill:#c8e6c9
style StripInput fill:#c8e6c9
style StripResponse fill:#c8e6c9
```
## Call Sites Inventory
| Location | Function | Data Protected | Tag Types | Entry |
|---|---|---|---|---|
| `SessionRoutes.ts:862` | `stripMemoryTagsFromPrompt()` | User prompts | All 6 | handleSessionInitByClaudeId |
| `SessionRoutes.ts:629` | `stripMemoryTagsFromJson()` | Tool inputs | All 6 | handleObservationsByClaudeId |
| `SessionRoutes.ts:633` | `stripMemoryTagsFromJson()` | Tool responses | All 6 | handleObservationsByClaudeId |
| `transcript-parser.ts:84` | `SYSTEM_REMINDER_REGEX` | None (read-time) | system-reminder only | Context extraction |
| `transcript-parser.ts:128` | `SYSTEM_REMINDER_REGEX` | None (read-time) | system-reminder only | Context extraction |
| `summarize.ts:66` | `extractLastMessage(..., true)` | Assistant msgs (summary path) | system-reminder only | Hook summarize handler |
| `SessionRoutes.ts:378` (LEGACY) | `handleObservations()` | Tool observations | **NONE** | Unused endpoint |
## Side Effects
- **ReDoS protection**: counts tags before regex, warns if > MAX_TAG_COUNT=100 (tag-stripping.ts:56-60).
- **Whitespace trim** after all replacements (tag-stripping.ts:65).
- **Multiple regex passes** — one per tag type. Could be unified.
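The "could be unified" note above might look like a single alternation pass with a backreference pairing each open tag to its close. This is a hypothetical consolidation, not the shipped code:

```typescript
// Hypothetical single-pass alternative to the per-tag passes: one
// alternation over all six tag names; \1 pairs close with open.
const TAGS = [
  "claude-mem-context", "private", "system_instruction",
  "system-instruction", "persisted-output", "system-reminder",
];
const UNIFIED = new RegExp(`<(${TAGS.join("|")})>[\\s\\S]*?</\\1>`, "g");

function stripTagsUnified(text: string): string {
  return text.replace(UNIFIED, "").trim();
}
```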
## External Feature Dependencies
- **PrivacyCheckValidator** (SessionRoutes.ts:614) — after stripping, validates empty-result handling.
- **PendingMessageStore** — receives pre-cleaned data; no re-strip.
- **ResponseProcessor** — consumes pending messages; no re-strip.
- **ChromaSync** — operates on already-sanitized text from DB.
## Confidence + Gaps
**High confidence:** User prompts + tool observations fully stripped before DB write; ReDoS protection active.
**Known gaps:**
1. Assistant messages in summary path only strip `<system-reminder>`, not full suite (summarize.ts:66, SessionRoutes.ts:669).
2. Legacy endpoint `SessionRoutes.ts:378` has no stripping — stale route.
3. `stripTagsInternal` is called from two public wrappers (`stripMemoryTagsFromPrompt`, `stripMemoryTagsFromJson`) that differ only by caller context — minor DRY violation.
@@ -0,0 +1,100 @@
# Flowchart: response-parsing-storage
## Sources Consulted
- `src/services/worker/agents/ResponseProcessor.ts:49` (processAgentResponse)
- `src/sdk/parser.ts:1` (parseObservations, parseSummary, helpers)
- `src/services/worker/agents/ObservationBroadcaster.ts`
- `src/services/worker/agents/SessionCleanupHelper.ts`
- `src/services/sqlite/SessionStore.ts:1916` (storeObservations atomic)
- `src/services/worker/SDKAgent.ts`, `OpenRouterAgent.ts`, `GeminiAgent.ts` (callers)
- `src/services/sqlite/PendingMessageStore.ts`
## Happy Path Description
Agent returns final assistant text → `parseObservations` extracts `<observation>` blocks via regex, validates types, filters empty observations → `parseSummary` extracts `<summary>` (fallback coercion from observations if summary missing and `summaryExpected=true`) → ResponseProcessor detects non-XML responses (auth errors, garbage) and fails early → atomic transaction wraps both observation and summary storage with content-hash dedup → `confirmProcessed` deletes pending message (only AFTER commit) → SSE broadcasts observations + summaries → Chroma sync fire-and-forget → SessionCleanupHelper resets timestamp and broadcasts status → RestartGuard records success.
## Mermaid Flowchart
```mermaid
flowchart TD
A([Agent Returns Text<br/>SDKAgent.ts:266 / OpenRouterAgent.ts / GeminiAgent.ts]) --> B["processAgentResponse<br/>ResponseProcessor.ts:49"]
B --> C["Track lastGeneratorActivity"]
C --> D["Add to conversationHistory"]
D --> E["parseObservations<br/>parser.ts:33"]
E --> E1["Regex &lt;observation&gt; blocks"]
E1 --> E2["extractField / extractArrayElements"]
E2 --> E3["Validate type vs ModeManager"]
E3 --> E4["Skip ghost observations"]
E4 --> E6["ParsedObservation[]"]
D --> F["parseSummary<br/>parser.ts:122"]
F --> F1["Check &lt;skip_summary/&gt;"]
F1 --> F2["Regex &lt;summary&gt; block"]
F2 --> F5["coerceObservationToSummary fallback<br/>parser.ts:222"]
F5 --> F7["ParsedSummary or null"]
E6 --> G{Non-XML response?<br/>no tags + no obs}
F7 --> G
G -->|Yes| G2["Mark processingMessageIds FAILED"]
G2 --> G3([Return early])
G -->|No| H["Normalize null → empty string"]
H --> K["ATOMIC TX<br/>sessionStore.storeObservations<br/>SessionStore.ts:1916"]
K --> K1["computeContentHash"]
K1 --> K2["findDuplicateObservation 30s window"]
K2 --> K3["INSERT observations (or reuse id)"]
K3 --> K5["INSERT session_summaries if present"]
K5 --> K6["Return ids + epoch"]
K6 --> N["Circuit breaker: consecutiveSummaryFailures"]
N --> O["CLAIM-CONFIRM<br/>pendingStore.confirmProcessed each id"]
O --> O3["session.restartGuard.recordSuccess"]
O3 --> Q["syncAndBroadcastObservations<br/>ResponseProcessor.ts:270"]
Q --> Q1["getChromaSync().syncObservation FnF"]
Q1 --> Q2["worker.broadcastObservation SSE"]
Q2 --> Q3["Update folder CLAUDE.md if enabled"]
O3 --> R["syncAndBroadcastSummary<br/>ResponseProcessor.ts:363"]
R --> R1["syncSummary FnF"]
R1 --> R2["broadcastSummary SSE"]
Q3 --> S["cleanupProcessedMessages<br/>SessionCleanupHelper.ts:26"]
R2 --> S
S --> S1["Reset earliestPendingTimestamp"]
S1 --> S2["broadcastProcessingStatus"]
S2 --> T([End])
```
## Parsing Inventory
| Parser | Location | Tags | Notes |
|---|---|---|---|
| `parseObservations` | parser.ts:33 | `<observation>`, `<type>`, `<title>`, `<subtitle>`, `<narrative>`, `<facts>`, `<concept>`, `<files_read>`, `<files_modified>` | Validates types vs ModeManager; filters empty |
| `parseSummary` | parser.ts:122 | `<summary>`, `<skip_summary/>`, `<request>`, `<investigated>`, `<learned>`, `<completed>`, `<next_steps>`, `<notes>` | Skip-marker first; false-positive detection |
| `coerceObservationToSummary` | parser.ts:222 | obs → summary mapping | Fallback when summary missing + expected (#1633) |
| `extractField` | parser.ts:267 | Generic `<X>...</X>` | Non-greedy regex handles nested tags |
| `extractArrayElements` | parser.ts:282 | Generic `<Arr><Elem>...</Elem></Arr>` | Non-greedy, trims empties |
**Single parser architecture.** All XML parsing through `src/sdk/parser.ts`. No duplicate parsing layers.
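The generic helpers in the table above can be sketched as follows; the signatures are assumptions based on the table, not the exact code in parser.ts:

```typescript
// Sketch of extractField / extractArrayElements from the table above.
// Non-greedy matching returns the first <X>...</X> block; array
// extraction collects <Elem> children, trims them, and drops empties.
function extractField(xml: string, tag: string): string | null {
  const m = xml.match(new RegExp(`<${tag}>([\\s\\S]*?)</${tag}>`));
  return m ? m[1].trim() : null;
}

function extractArrayElements(xml: string, arr: string, elem: string): string[] {
  const body = extractField(xml, arr);
  if (body === null) return [];
  const out: string[] = [];
  for (const m of body.matchAll(new RegExp(`<${elem}>([\\s\\S]*?)</${elem}>`, "g"))) {
    const value = m[1].trim();
    if (value) out.push(value); // trim and drop empties, per the table
  }
  return out;
}
```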
## Side Effects
- Message queue cleanup via `confirmProcessed` (DELETE after commit).
- Chroma sync async fire-and-forget.
- SSE broadcasting to web UI.
- CLAUDE.md folder sync (feature-flagged).
- Session state tracking: `lastGeneratorActivity`, `lastSummaryStored`, `consecutiveSummaryFailures`, `restartGuard` metrics.
## External Feature Dependencies
**Calls into:** ModeManager (type validation), SettingsDefaultsManager, ChromaSync, SSEBroadcaster, PendingMessageStore, SessionStore.
**Called by:** SDKAgent, OpenRouterAgent, GeminiAgent (all agent providers).
## Confidence + Gaps
**High:** Single parser; atomic transaction; claim-confirm ordering; non-XML early-fail; coercion fallback.
**Gaps:** Chroma sync error propagation specifics; CLAUDE.md update error paths; content-hash window boundary conditions.
@@ -0,0 +1,125 @@
# Flowchart: session-lifecycle-management
## Sources Consulted
- `src/services/worker/SessionManager.ts:1-678`
- `src/services/worker/ProcessRegistry.ts:1-528`
- `src/services/queue/SessionQueueProcessor.ts:1-149`
- `src/services/sqlite/PendingMessageStore.ts:1-150`
- `src/supervisor/process-registry.ts:175-409`
- `src/services/worker-service.ts:173-174, 508-560, 1100-1111`
## Happy Path Description
1. HTTP request (SessionRoutes) triggers `SessionManager.initializeSession(sessionDbId)` (SessionManager.ts:118).
2. ActiveSession created in-memory with AbortController; stale memorySessionId cleared from DB (205-235).
3. SDK subprocess spawned via `createPidCapturingSpawn` → registered in supervisor ProcessRegistry (393, 57, supervisor/process-registry.ts:223).
4. Observations persisted to `PendingMessageStore` (claim-confirm) before processing (SessionManager.ts:276, PendingMessageStore.ts:63).
5. `SessionQueueProcessor.createIterator` yields messages via EventEmitter; resets stale-processing >60s on claim (SessionQueueProcessor.ts:32, PendingMessageStore.ts:99).
6. SDKAgent consumes iterator, updates `lastGeneratorActivity` per yield (SessionManager.ts:666).
7. Messages confirmed only after successful DB commit (prevents loss on crash).
8. Idle timeout (3 min) → `onIdleTimeout` → `session.abortController.abort()` → generator exits → session deleted (SessionManager.ts:651-655, 381).
9. Stuck-generator detection (5 min inactive) → `reapStaleSessions` SIGKILLs subprocess (516-568, 535).
10. Orphan reaper (30s) cleans dead sessions + system orphans + idle daemon children (ProcessRegistry.ts:349).
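The claim-confirm pattern in steps 4-7 (persist before processing, confirm only after commit, self-heal stale claims) can be sketched in memory. The real store is SQLite (`pending_messages`); these names and shapes are illustrative only:

```typescript
// In-memory sketch of the claim-confirm queue from steps 4-7 above.
type Pending = {
  id: number;
  payload: string;
  status: "pending" | "processing";
  claimedAt?: number;
};

class ClaimConfirmQueue {
  private rows: Pending[] = [];
  private nextId = 1;
  constructor(private staleMs = 60_000) {}

  enqueue(payload: string): number {
    const id = this.nextId++;
    this.rows.push({ id, payload, status: "pending" });
    return id;
  }

  // Claim the next message. First self-heal rows stuck in "processing"
  // longer than the stale threshold (mirrors the >60s reset in step 5).
  claimNext(now = Date.now()): Pending | undefined {
    for (const r of this.rows) {
      if (r.status === "processing" && now - (r.claimedAt ?? 0) > this.staleMs) {
        r.status = "pending";
      }
    }
    const row = this.rows.find((r) => r.status === "pending");
    if (row) {
      row.status = "processing";
      row.claimedAt = now;
    }
    return row;
  }

  // Confirm = delete, only after the consumer has committed its work,
  // so a crash mid-processing leaves the message recoverable.
  confirm(id: number): void {
    this.rows = this.rows.filter((r) => r.id !== id);
  }

  size(): number {
    return this.rows.length;
  }
}
```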
## Mermaid Flowchart
```mermaid
flowchart TD
A["SessionRoutes triggers init"] --> B["SessionManager.initializeSession<br/>SessionManager.ts:118"]
B --> C{In memory?}
C -->|Yes| D["Return cached"]
C -->|No| E["Create ActiveSession<br/>SessionManager.ts:205-235"]
E --> F["Clear stale memorySessionId<br/>SessionManager.ts:206-214"]
D --> G["SDKAgent.generateResponse<br/>SessionManager.ts:631-670"]
F --> G
G --> H["createPidCapturingSpawn<br/>ProcessRegistry.ts:393"]
H --> I["registerProcess<br/>ProcessRegistry.ts:57"]
I --> J["supervisor.registerProcess<br/>supervisor/process-registry.ts:223"]
K["queueObservation<br/>SessionManager.ts:276"] --> L["PendingMessageStore.enqueue<br/>PendingMessageStore.ts:63"]
L --> M["INSERT pending_messages status=pending"]
M --> N["emit 'message'"]
G --> O["getMessageIterator<br/>SessionManager.ts:631"]
O --> P["SessionQueueProcessor.createIterator<br/>SessionQueueProcessor.ts:32"]
P --> Q["claimNextMessage<br/>PendingMessageStore.ts:99"]
Q --> R["Reset processing>60s → pending<br/>PendingMessageStore.ts:107-116"]
R --> S["UPDATE status=processing"]
S --> T["Yield message<br/>SessionManager.ts:648"]
T --> U["lastGeneratorActivity=now<br/>SessionManager.ts:666"]
U --> V["SDK agent stores → confirmProcessed DELETE"]
V --> Q
Q -->|empty| Y["waitForMessage signal<br/>SessionQueueProcessor.ts:116"]
Y --> Z{idle >= 3min?}
Z -->|Yes| AA["onIdleTimeout<br/>SessionManager.ts:651"]
AA --> AB["abortController.abort"]
AB --> AC["Generator exits"]
AC --> AD["Auto-unregister on exit<br/>ProcessRegistry.ts:479"]
AC --> AF["SessionManager.deleteSession<br/>SessionManager.ts:381"]
AF --> AG["await generatorPromise 30s<br/>SessionManager.ts:392-403"]
AF --> AH["ensureProcessExit 5s<br/>ProcessRegistry.ts:185"]
AH -->|still alive| AI["SIGKILL escalation"]
AF --> AJ["supervisor reapSession SIGTERM→5s→SIGKILL<br/>supervisor/process-registry.ts:292"]
AF --> AL["sessions.delete + queues.delete<br/>SessionManager.ts:433-434"]
AL --> AM["onSessionDeletedCallback"]
AN["staleSessionReaperInterval 2min<br/>worker-service.ts:547"] --> AO["iterate active sessions<br/>SessionManager.ts:516-568"]
AO --> AP{idle > 5min?}
AP -->|Yes| AQ["detectStaleGenerator<br/>SessionManager.ts:59"]
AQ --> AR["SIGKILL<br/>SessionManager.ts:535"]
AR --> AS["abortController.abort"]
AO --> AU{idle > 15min?<br/>no generator + no pending}
AU -->|Yes| AF
AW["startOrphanReaper 30s<br/>ProcessRegistry.ts:508"] --> AX["reapOrphanedProcesses<br/>ProcessRegistry.ts:349"]
AX --> AY["getActiveSessionIds"]
AY --> AZ["Kill orphan PIDs"]
AX --> BB["killSystemOrphans ppid=1<br/>ProcessRegistry.ts:315"]
AX --> BC["killIdleDaemonChildren<br/>ProcessRegistry.ts:244"]
```
## Timer Inventory
| Timer | Purpose | Lifetime | Cleared On | Location |
|---|---|---|---|---|
| `waitForMessage()` setTimeout | Wait for next message or idle | Per message | clearTimeout or abort | SessionQueueProcessor.ts:145 |
| Idle timeout | Trigger onIdleTimeout at 3min | Per iterator session | resolves or signal aborts | SessionQueueProcessor.ts:130 |
| `staleSessionReaperInterval` | Reap stuck gens (5min) + old sessions (15min) | Worker lifetime | clearInterval on shutdown | worker-service.ts:547, 1108 |
| Orphan reaper (`startOrphanReaper`) | Kill dead-session procs, orphans, idle daemons | Worker lifetime | clearInterval returned | ProcessRegistry.ts:508 |
| Stale-processing self-heal | Atomic UPDATE reset >60s | Per claim (inline SQL) | n/a | PendingMessageStore.ts:106 |
| Generator-exit wait | 30s timeout on deleteSession | Per delete | AbortSignal.timeout + Promise.race | SessionManager.ts:397 |
| `ensureProcessExit` | 5s before SIGKILL | Per delete | setTimeout for escalation | ProcessRegistry.ts:200 |
## Side Effects
- Process registration persisted to supervisor.json.
- PendingMessage lifecycle persisted to SQLite (INSERT → UPDATE → DELETE).
- AbortController cascades through iterator.
- Pool-slot notification on process exit.
- Broadcast callbacks on session delete.
## External Feature Dependencies
**Calls into:** SQLite (pending_messages + sessions), supervisor ProcessRegistry, SDKAgent, RestartGuard, SSEBroadcaster.
**Called by:** SessionRoutes, DataRoutes, worker-service lifecycle (reapers, shutdown).
## Confidence + Gaps
**High:** Happy path; stale detection thresholds (5min generator, 15min session); 3-min idle timeout; 30s orphan reaper; claim-confirm; supervisor-delegated registry model.
**KNOWN GAPS (critical for duplication analysis):**
1. **ProcessRegistry duplication:** YES — two files exist:
- `src/services/worker/ProcessRegistry.ts` — worker-level facade
- `src/supervisor/process-registry.ts` — supervisor-level persistent registry
- NOT fully independent; worker-level delegates via `getSupervisor().getRegistry()`. But there is real surface-area duplication.
2. **staleSessionReaperInterval vs startUnifiedReaper:**
- `staleSessionReaperInterval` is ACTIVE at worker-service.ts:547.
- `startUnifiedReaper` does NOT appear in a codebase search — observation notes suggest a T31/T32 refactor was planned to unify the two reapers, but it has NOT been implemented yet.
- Currently TWO independent reapers: `startOrphanReaper` (30s) + stale-session reaper (2min). Unification pending.
3. **MAX_SESSION_IDLE_MS (15 min)** is used only by `reapStaleSessions` — it may be deprecated, but the code is still in place.
@@ -0,0 +1,97 @@
# Flowchart: sqlite-persistence
## Sources Consulted
- `src/services/sqlite/Database.ts:1-349`
- `src/services/sqlite/migrations/runner.ts:1-1019`
- `src/services/sqlite/observations/store.ts:1-108`
- `src/services/sqlite/SessionStore.ts:1-500`
- `src/services/sqlite/PendingMessageStore.ts:1-150`
- `src/services/sqlite/index.ts:1-33`
## Happy Path Description
On startup, `ClaudeMemDatabase` opens a bun:sqlite connection to `DB_PATH`, optionally heals malformed schemas via a Python sqlite3 wrapper, then applies PRAGMAs for WAL journaling and performance tuning (memory mapping, foreign keys, cache settings). The `MigrationRunner` runs 27 migrations in sequence, creating or altering core tables (`sdk_sessions`, `observations`, `session_summaries`, `user_prompts`, `pending_messages`) and their FTS5 virtual indexes. Each migration checks actual schema state via `PRAGMA table_info` to ensure idempotence across fresh installs, partial migrations, and cross-machine syncs.
A write cycle (e.g., `storeObservation`) computes a content hash for deduplication, checks for recent duplicates within a 30-second window, and if unique, INSERTs into `observations` with all structured fields. Reads use prepared statements with optional filtering, leveraging indexes on `created_at_epoch DESC`. Transaction boundaries are explicit via `db.transaction(fn)` wrappers. `PendingMessageStore.claimNextMessage()` self-heals stale processing messages (>60s) back to pending in a single transaction.
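The content-hash dedup with a 30-second window can be sketched over an in-memory row set; the hash input format and field choice here are assumptions, and the real check is a SQL query against `observations`:

```typescript
import { createHash } from "node:crypto";

// Sketch of the content-hash dedup described above. The filter is
// equivalent in spirit to:
//   SELECT ... WHERE content_hash = ? AND created_at_epoch > ?
//   ORDER BY created_at_epoch DESC LIMIT 1
type StoredObs = { id: number; contentHash: string; createdAtEpoch: number };

function computeContentHash(title: string, narrative: string): string {
  return createHash("sha256").update(`${title}\n${narrative}`).digest("hex");
}

function findDuplicate(
  rows: StoredObs[],
  hash: string,
  nowEpoch: number,
  windowMs = 30_000,
): StoredObs | undefined {
  return rows
    .filter((r) => r.contentHash === hash && nowEpoch - r.createdAtEpoch <= windowMs)
    .sort((a, b) => b.createdAtEpoch - a.createdAtEpoch)[0];
}
```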
## Mermaid Flowchart
```mermaid
flowchart TD
Boot([Boot / SDK Call<br/>index.ts:1]) --> InitDB["ClaudeMemDatabase.ctor<br/>Database.ts:148"]
InitDB --> EnsureDir["ensureDir DATA_DIR<br/>Database.ts:151"]
EnsureDir --> OpenConn["new bun:sqlite Database<br/>Database.ts:155"]
OpenConn --> RepairSchema["repairMalformedSchema<br/>Database.ts:160"]
RepairSchema --> SetPRAGMAs["PRAGMA WAL/NORMAL/FK/mmap<br/>Database.ts:163-168"]
SetPRAGMAs --> MigRunner["new MigrationRunner<br/>Database.ts:171"]
MigRunner --> RunMigrations["runAllMigrations (27)<br/>Database.ts:172"]
RunMigrations --> Mig4["initializeSchema m4<br/>runner.ts:52-123"]
Mig4 --> Mig8["addObservationHierarchicalFields m8<br/>runner.ts:265-296"]
Mig8 --> Mig10["createUserPromptsTable m10<br/>runner.ts:383-433"]
Mig10 --> Mig16["createPendingMessagesTable m16<br/>runner.ts:506-548"]
Mig16 --> Mig22["addObservationContentHashColumn m22<br/>runner.ts:844-864"]
Mig22 --> Mig27["addObservationSubagentColumns m27<br/>runner.ts:982-1016"]
Mig27 --> Ready["DB Ready<br/>schema_versions sync'd"]
Ready --> UserWrite["storeObservation<br/>observations/store.ts:53"]
UserWrite --> ComputeHash["computeObservationContentHash<br/>observations/store.ts:21-29"]
ComputeHash --> CheckDup["findDuplicateObservation 30s window<br/>observations/store.ts:36-45"]
CheckDup -->|Dup| ReturnExisting["Return existing id+epoch"]
CheckDup -->|New| PrepareStmt["prepare INSERT observations<br/>observations/store.ts:77-82"]
PrepareStmt --> ExecInsert["stmt.run 17 params<br/>observations/store.ts:84-101"]
ExecInsert --> ReturnNew["Return id+epoch"]
Ready --> PendingMsg["PendingMessageStore.enqueue<br/>PendingMessageStore.ts:63"]
PendingMsg --> EnqueueStmt["INSERT pending_messages<br/>PendingMessageStore.ts:65-88"]
EnqueueStmt --> ClaimMsg["claimNextMessage TX<br/>PendingMessageStore.ts:99-144"]
ClaimMsg --> ResetStale["UPDATE stale → pending 60s<br/>PendingMessageStore.ts:107-115"]
ResetStale --> SelectNext["SELECT pending ORDER BY id LIMIT 1<br/>PendingMessageStore.ts:118-124"]
SelectNext --> MarkProcess["UPDATE status=processing<br/>PendingMessageStore.ts:129-134"]
Ready --> SessionWrite["SessionStore CRUD<br/>SessionStore.ts:34"]
SessionWrite --> SessionStmt["INSERT sdk_sessions<br/>SessionStore.ts:93-143"]
Ready --> UserRead["get observations<br/>observations/get.ts:14"]
UserRead --> PrepareQuery["prepare SELECT filters<br/>observations/get.ts:15-19"]
PrepareQuery --> ExecRead["stmt.get/all<br/>observations/get.ts:27-80"]
```
## Tables Owned
| Table | Owner | Purpose |
|---|---|---|
| `schema_versions` | MigrationRunner | Migration tracking |
| `sdk_sessions` | SessionStore | User + worker sessions |
| `observations` | Observations module | Work items (findings, actions) |
| `session_summaries` | Summaries module | Session conclusions |
| `user_prompts` | Prompts module | User input history |
| `pending_messages` | PendingMessageStore | Work queue (claim-confirm) |
| `observation_feedback` | SessionStore | Usage signals |
| `observations_fts` (virtual) | SessionSearch | FTS5 index |
| `session_summaries_fts` (virtual) | SessionSearch | FTS5 index |
| `user_prompts_fts` (virtual) | SessionStore | FTS5 index |
## Side Effects
**File I/O**: DB file, WAL (`db.sqlite-wal`), shared-memory (`db.sqlite-shm`).
**PRAGMAs**: `journal_mode=WAL`, `synchronous=NORMAL`, `foreign_keys=ON`, `temp_store=MEMORY`, `mmap_size=256MB`, `cache_size=10_000`.
**Transactions**: Single-connection architecture; explicit `db.transaction(fn)` for multi-step writes; `claimNextMessage` self-heals via transactional UPDATE.
**Schema Repair**: Python `sqlite3` subprocess invoked via `execFileSync('python3', ...)` for malformed-file recovery.
## External Feature Dependencies
**Called by:** SDK agents (observations/summaries), Response Processor, Search routes, Data import/export, Worker lifecycle.
**Calls into:** `bun:sqlite` driver, Python sqlite3 (repair only), logger, paths utility.
## Confidence + Gaps
**High:** init flow, migrations 4/16/22/27, dedup via content_hash + 30s window, claim-confirm with 60s stale reset.
**Medium:** FTS5 trigger mechanics, transaction isolation semantics under WAL.
**Gaps:** No explicit connection pool (single-writer via WAL); backup/restore not in scope.
# Flowchart: transcript-watcher-integration
## Sources Consulted
- `src/services/transcripts/watcher.ts:1-242`
- `src/services/transcripts/processor.ts:33-393`
- `src/services/transcripts/config.ts:1-100`
- `src/services/transcripts/types.ts:1-71`
- `src/services/worker-service.ts:91, 164, 466, 614-658`
- `src/services/integrations/CursorHooksInstaller.ts:1-100`
- `src/cli/handlers/observation.ts:1-87`
- `src/services/worker/http/routes/SessionRoutes.ts:378-660`
## Happy Path Description
Worker startup loads transcript-watch config and instantiates `TranscriptWatcher`. `FileTailer` uses `fs.watch()` on each JSONL transcript; on growth, reads new bytes and splits by newline. Each line is `JSON.parse`d and routed to `TranscriptEventProcessor.processEntry()`, which matches schema rules to classify the event (`session_init`, `tool_use`, `tool_result`, `session_end`). Per-session `SessionState` holds `pendingTools` map: `tool_use` stores name+input; `tool_result` retrieves pending, pairs with response, and calls `observationHandler.execute()` — which POSTs to `/api/sessions/observations` (the same endpoint used by lifecycle-hooks). On `session_end`, processor queues summary via `/api/sessions/summarize` and refreshes Cursor context via `/api/context/inject`.
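The `pendingTools` pairing described above can be sketched as a small state machine. This is an illustrative sketch, not the actual `SessionState` implementation in `processor.ts` — the names and shapes here are assumptions:

```typescript
// Minimal sketch of the pendingTools pairing: tool_use stores name + input,
// tool_result retrieves the pending entry and pairs it with the response.
interface PendingTool {
  name: string;
  input: unknown;
}

interface PairedObservation {
  name: string;
  input: unknown;
  response: unknown;
}

class SessionState {
  private pendingTools = new Map<string, PendingTool>();

  // tool_use: remember name + input keyed by the tool-use id.
  onToolUse(id: string, name: string, input: unknown): void {
    this.pendingTools.set(id, { name, input });
  }

  // tool_result: retrieve the pending entry, pair it with the response,
  // and clear it so each observation is emitted exactly once.
  onToolResult(id: string, response: unknown): PairedObservation | null {
    const pending = this.pendingTools.get(id);
    if (!pending) return null; // result arrived without a matching tool_use
    this.pendingTools.delete(id);
    return { ...pending, response };
  }
}
```

The delete-on-pair step is what keeps the map bounded: an unmatched `tool_result` is dropped rather than accumulating state.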
## Mermaid Flowchart
```mermaid
flowchart TD
Start["Worker Start<br/>worker-service.ts:614"] --> Config["loadTranscriptWatchConfig<br/>config.ts:1"]
Config --> Watcher["new TranscriptWatcher<br/>watcher.ts:83-91"]
Watcher --> StartW["watcher.start<br/>watcher.ts:93"]
StartW --> SetupWatch["setupWatch per target<br/>watcher.ts:110-134"]
SetupWatch --> AddTailer["addTailer<br/>watcher.ts:169-210"]
AddTailer --> CreateTailer["new FileTailer<br/>watcher.ts:15-26"]
CreateTailer --> TailerStart["fs.watch filePath<br/>watcher.ts:28"]
TailerStart --> FileChange([File change event])
FileChange --> ReadNewData["readNewData<br/>watcher.ts:40-80"]
ReadNewData --> ParseLine["JSON.parse each line<br/>watcher.ts:220"]
ParseLine --> HandleLine["handleLine<br/>watcher.ts:212-236"]
HandleLine --> ProcessEntry["processor.processEntry<br/>processor.ts:36-46"]
ProcessEntry --> MatchRule["matchesRule<br/>processor.ts:42"]
MatchRule --> HandleEvent["handleEvent<br/>processor.ts:113-169"]
HandleEvent -->|session_init| SI["handleSessionInit<br/>processor.ts:138-142"]
HandleEvent -->|tool_use| TU["handleToolUse<br/>processor.ts:193-221"]
HandleEvent -->|tool_result| TR["handleToolResult<br/>processor.ts:224-246"]
HandleEvent -->|session_end| SE["handleSessionEnd<br/>processor.ts:309-320"]
SI --> SIhttp["POST /api/sessions/init"]
TU --> TUmap["session.pendingTools.set<br/>processor.ts:202"]
TR --> TRlookup["Lookup pending tool<br/>processor.ts:232-236"]
TRlookup --> SendObs["sendObservation<br/>processor.ts:240-244"]
SendObs --> ObsHandler["observationHandler.execute<br/>observation.ts:31-86"]
ObsHandler --> WorkerHttp["POST /api/sessions/observations<br/>observation.ts:77"]
WorkerHttp --> Routes["SessionRoutes.handleObservationsByClaudeId<br/>SessionRoutes.ts:565"]
Routes --> Strip["stripMemoryTagsFromJson<br/>SessionRoutes.ts:627-634"]
Strip --> Queue["sessionManager.queueObservation<br/>SessionRoutes.ts:637"]
Queue --> Gen["ensureGeneratorRunning<br/>SessionRoutes.ts:654"]
SE --> QS["queueSummary<br/>processor.ts:322-344"]
QS --> SumHttp["POST /api/sessions/summarize"]
SE --> UpdateCtx["updateContext<br/>processor.ts:346-392"]
UpdateCtx --> CtxHttp["GET /api/context/inject<br/>processor.ts:377"]
CtxHttp --> WriteAgentsMd["writeAgentsMd<br/>processor.ts:390"]
SE --> ClearState["sessions.delete<br/>processor.ts:319"]
```
## Side Effects
- Byte-offset state persisted to `transcript-watch-state.json`.
- Rescan timer every 5s for new transcript files (watcher.ts:124).
- PendingTools map state cleared after each paired observation.
- `AGENTS.md` context file written by Cursor session_end.
- SSE broadcast via existing pipeline when observations queued.
## External Feature Dependencies
**Calls into:** observationHandler (bridge), `/api/sessions/observations` endpoint (shared with lifecycle-hooks), `/api/sessions/summarize`, `/api/context/inject`. SessionManager processes identically regardless of source.
**Called by:** Worker-service initialization only; not user-invoked.
## Duplication with lifecycle-hooks?
**YES — significant re-implementation.** Both paths ingest observations, but via different capture mechanisms:
| Aspect | lifecycle-hooks | transcript-watcher |
|---|---|---|
| Source | Cursor/Claude Code PostToolUse hook | JSONL file via fs.watch + FileTailer |
| Tool pairing | Hook receives tool_name + response atomically | pendingTools map pairs tool_use + tool_result |
| Session init | observationHandler → sessionInitHandler | processor directly calls sessionInitHandler |
| HTTP transport | observationHandler → `/api/sessions/observations` | observationHandler → `/api/sessions/observations` (same) |
| Exclusion check | observationHandler checks `isProjectExcluded` | processor may skip this check; SessionRoutes enforces privacy |
| Storage convergence | SessionRoutes queue → SessionManager → SDK agent | SessionRoutes queue → SessionManager → SDK agent (same) |
**Conclusion:** transcript-watcher is a **parallel capture path** that re-implements session-init + observation dispatch logic but converges at the same HTTP endpoint. The pendingTools state machine is unique to transcripts. This is the clearest cross-feature duplication in the codebase and a prime target for Phase 3 unification.
## Confidence + Gaps
**High:** TranscriptWatcher → FileTailer → processor → observationHandler → shared HTTP endpoint.
**Medium:** Privacy filter coverage when bypassing observationHandler's exclusion check.
**Gaps:** FileTailer retry strategy on I/O errors; schema FieldSpec coalesce/default evaluation details; updateContext timing relative to sessionCompleteHandler.
# Flowchart: vector-search-sync
## Sources Consulted
- `src/services/sync/ChromaSync.ts:1-969`
- `src/services/sync/ChromaMcpManager.ts:1-509`
- `src/services/worker/agents/ResponseProcessor.ts:1-423`
- `src/services/worker/DatabaseManager.ts:1-100`
- `src/services/worker-service.ts:1-550`
- `src/services/infrastructure/WorktreeAdoption.ts:1-348`
- `src/services/infrastructure/GracefulShutdown.ts:1-110`
- `src/services/worker/SearchManager.ts:1-100`
## Happy Path Description
When a new observation is stored to SQLite, ResponseProcessor runs two async steps: (1) the database write commits the observation row transactionally, then (2) ChromaSync is notified fire-and-forget via `syncObservation()` to send formatted documents to Chroma over MCP. If Chroma is disabled (`CLAUDE_MEM_CHROMA_ENABLED=false`), sync is skipped. ChromaMcpManager maintains a persistent singleton stdio connection to the chroma-mcp Python subprocess with lazy initialization, auto-reconnect with backoff, and graceful shutdown.
On worker startup, `ChromaSync.backfillAllProjects()` runs fire-and-forget to detect missing observations by comparing Chroma's metadata index with SQLite. It batches in 100-document chunks, formats each observation into multiple granular documents (one per field), and syncs to per-project collections named `cm__<sanitized_project>`.
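The collection naming and 100-document batching can be sketched as below. The sanitization rule here is an assumption for illustration — the real rules live in `ChromaSync.ts`:

```typescript
// Illustrative sketch of per-project collection naming and batching.
const BATCH_SIZE = 100;

function collectionNameFor(project: string): string {
  // Assumed sanitizer: lowercase, runs of non-alphanumerics collapsed to "_".
  const sanitized = project.toLowerCase().replace(/[^a-z0-9]+/g, "_");
  return `cm__${sanitized}`;
}

// Split a document list into fixed-size batches for chroma_add_documents.
function chunk<T>(items: T[], size: number = BATCH_SIZE): T[][] {
  const batches: T[][] = [];
  for (let i = 0; i < items.length; i += size) {
    batches.push(items.slice(i, i + size));
  }
  return batches;
}
```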
## Mermaid Flowchart
```mermaid
flowchart TD
Start([Agent Response Returned<br/>ResponseProcessor.ts:49]) --> Parse["Parse Observations + Summary<br/>ResponseProcessor.ts:70-81"]
Parse --> StoreDB["Store to SQLite<br/>ResponseProcessor.ts:151"]
StoreDB --> ConfirmMsg["pendingStore.confirmProcessed<br/>ResponseProcessor.ts:206"]
ConfirmMsg --> SyncObsDef["syncAndBroadcastObservations<br/>ResponseProcessor.ts:270"]
ConfirmMsg --> SyncSumDef["syncAndBroadcastSummary<br/>ResponseProcessor.ts:363"]
SyncObsDef --> LoopObs["For each Observation<br/>ResponseProcessor.ts:280"]
LoopObs --> CheckChromaObs{Chroma Enabled?<br/>DatabaseManager.ts:34-39}
CheckChromaObs -->|Yes| CallSyncObs["getChromaSync().syncObservation<br/>ResponseProcessor.ts:286"]
CheckChromaObs -->|No| SkipObs["No-op skip"]
CallSyncObs --> SyncObsEntry["ChromaSync.syncObservation<br/>ChromaSync.ts:339"]
SyncObsEntry --> FormatObs["formatObservationDocs per field<br/>ChromaSync.ts:125"]
FormatObs --> EnsureCollObs["ensureCollectionExists<br/>ChromaSync.ts:96"]
EnsureCollObs --> AddDocObs["addDocuments batch<br/>ChromaSync.ts:262"]
AddDocObs --> SanitizeMeta["Filter null/empty metadata<br/>ChromaSync.ts:277-280"]
SanitizeMeta --> CallAddDocs["chromaMcp.callTool chroma_add_documents<br/>ChromaSync.ts:284"]
CallAddDocs --> CheckDupObs{ID Conflict?}
CheckDupObs -->|Yes| DelThenAdd["Delete then Re-add<br/>ChromaSync.ts:297-306"]
CheckDupObs -->|No| LogSuccess["Log success<br/>ChromaSync.ts:329"]
DelThenAdd --> LogSuccess
LogSuccess --> BroadcastObs["SSE broadcast<br/>ResponseProcessor.ts:312"]
SyncSumDef --> SyncSumEntry["ChromaSync.syncSummary<br/>ChromaSync.ts:384"]
SyncSumEntry --> FormatSum["formatSummaryDocs per field<br/>ChromaSync.ts:193"]
FormatSum --> CallAddSum["chroma_add_documents<br/>ChromaSync.ts:284"]
CallAddSum --> BroadcastSum["SSE broadcast<br/>ResponseProcessor.ts:403"]
InitWorker([Worker Initializes<br/>worker-service.ts:406-420]) --> InitDBMgr["dbManager.initialize<br/>DatabaseManager.ts:27"]
InitDBMgr --> CreateChromaSync["new ChromaSync<br/>DatabaseManager.ts:36"]
CreateChromaSync --> LazyMCP["ChromaMcpManager.getInstance<br/>ChromaMcpManager.ts:47"]
LazyMCP --> Backfill["backfillAllProjects FnF<br/>worker-service.ts:470"]
Backfill --> FetchProjects["SELECT DISTINCT project<br/>ChromaSync.ts:868"]
FetchProjects --> LoopProjects["For each project<br/>ChromaSync.ts:874"]
LoopProjects --> EnsureBackfilled["ensureBackfilled<br/>ChromaSync.ts:554"]
EnsureBackfilled --> GetChromaIds["getExistingChromaIds<br/>ChromaSync.ts:479"]
GetChromaIds --> RunPipeline["runBackfillPipeline<br/>ChromaSync.ts:575"]
RunPipeline --> BackfillObs["backfillObservations<br/>ChromaSync.ts:603"]
BackfillObs --> BackfillSum["backfillSummaries<br/>ChromaSync.ts:652"]
BackfillSum --> BackfillPrompts["backfillPrompts<br/>ChromaSync.ts:701"]
SearchFlow([User Search Query<br/>SearchManager.ts:56]) --> QueryChroma["chromaSync.queryChroma<br/>SearchManager.ts:59"]
QueryChroma --> CallQuery["chroma_query_documents<br/>ChromaSync.ts:768"]
CallQuery --> Dedupe["deduplicateQueryResults<br/>ChromaSync.ts:808"]
Shutdown([Worker Shutdown<br/>GracefulShutdown.ts:56]) --> StopChromaMcp["chromaMcpManager.stop<br/>GracefulShutdown.ts:73"]
StopChromaMcp --> KillSubproc["transport.close<br/>ChromaMcpManager.ts:357"]
```
## Side Effects
- **MCP Connection**: Singleton stdio connection to chroma-mcp, lazy-init, reconnect with backoff, graceful shutdown.
- **Per-project collections**: `cm__<sanitized_project>` naming.
- **Granular vectorization**: Observations split into multiple docs per field (3-5× vector count).
- **Batch reconciliation**: Duplicate IDs handled via delete-then-add within batch.
- **Fire-and-forget**: All sync is non-blocking; failures log but don't block.
- **Worktree metadata patching**: `merged_into_project` stamp applied idempotently.
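The delete-then-add reconciliation on ID conflict can be sketched against a minimal client interface. `McpClient` and the conflict-detection heuristic here are assumptions; the tool names `chroma_add_documents` / `chroma_delete_documents` are the ones named in the flowchart:

```typescript
// Hedged sketch of the ID-conflict fallback described above.
interface McpClient {
  callTool(name: string, args: Record<string, unknown>): Promise<unknown>;
}

async function addWithConflictFallback(
  client: McpClient,
  collection: string,
  ids: string[],
  documents: string[],
): Promise<void> {
  try {
    await client.callTool("chroma_add_documents", { collection, ids, documents });
  } catch (err) {
    // Assumed conflict detection: duplicate-ID errors mention the IDs.
    if (!String(err).includes("ID")) throw err;
    // Reconcile: delete the conflicting IDs, then re-add the batch.
    await client.callTool("chroma_delete_documents", { collection, ids });
    await client.callTool("chroma_add_documents", { collection, ids, documents });
  }
}
```

Because the whole path is fire-and-forget, a failure after the fallback still only logs; it never blocks the SQLite write that already committed.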
## External Feature Dependencies
**Calls into:**
- `chroma-mcp` Python subprocess (via stdio MCP protocol)
- ChromaMcpManager (singleton lifecycle)
- SQLite (source of truth for backfill)
**Called by:**
- ResponseProcessor (observation/summary sync after DB write)
- SearchManager (read-side Chroma queries)
- WorktreeAdoption (post-merge metadata updates)
- Worker lifecycle (startup backfill, shutdown)
## Confidence + Gaps
**High Confidence**: Single sync implementation; fire-and-forget pattern; per-project metadata-scoped collections; lazy MCP init.
**Medium Confidence**: Exact chroma-mcp tool names verified via grep.
**Gaps**: Embedding model config is inside chroma-mcp package (not this codebase); HNSW/ANN parameters not visible.
# Flowchart: viewer-ui-layer
## Sources Consulted
- `src/ui/viewer/App.tsx:1-162`
- `src/ui/viewer/index.tsx:1-16`
- `src/ui/viewer/hooks/useSSE.ts:1-147`
- `src/ui/viewer/hooks/useSettings.ts:1-80`
- `src/ui/viewer/hooks/usePagination.ts:1-80`
- `src/ui/viewer/types.ts:1-80`
- `src/ui/viewer/components/Header.tsx:1-60`
- `src/ui/viewer/components/Feed.tsx:1-60`
- `src/ui/viewer/components/ObservationCard.tsx:1-60`
- `src/ui/viewer/components/ErrorBoundary.tsx:1-63`
- `src/ui/viewer/components/ContextSettingsModal.tsx:1-60`
- `src/services/worker/SSEBroadcaster.ts:1-77`
- `src/services/worker/http/routes/ViewerRoutes.ts`
## Component Tree
1. ErrorBoundary (root)
2. App (orchestrator)
3. Header — project/source filters, SSE status, theme toggle
4. Feed — interleaved cards, infinite scroll via IntersectionObserver
5. ObservationCard / SummaryCard / PromptCard
6. ContextSettingsModal
7. LogsDrawer
## Happy Path Description
User loads `http://localhost:37777` → static viewer.html served → React mounts `<ErrorBoundary><App/></ErrorBoundary>` via `index.tsx` → App initializes hooks (`useSSE`, `useSettings`, `useTheme`, `usePagination`, `useStats`) → `useSSE` opens `EventSource('/stream')` → backend emits `initial_load` with catalog → Header + Feed render → IntersectionObserver triggers `handleLoadMore` on scroll → `pagination.*.loadMore()` fetches `/api/observations?offset=X&limit=20` → merged with live SSE data in `useMemo` (deduped by `(project, id)`) → re-render. Real-time events (`new_observation`, `new_summary`, `new_prompt`) update state → re-render. Settings modal saves via `POST /api/settings`.
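The `(project, id)` dedup merge can be sketched as a pure function. This is a sketch of the shape, not the actual `useMemo` body in `App.tsx`, which may key or order differently:

```typescript
// Sketch of merging live SSE items with older paginated chunks,
// deduplicated by the composite (project, id) key. Live items win.
interface Item {
  project: string;
  id: number;
}

function mergeFeed<T extends Item>(live: T[], paginated: T[]): T[] {
  const seen = new Set<string>();
  const merged: T[] = [];
  for (const item of [...live, ...paginated]) {
    const key = `${item.project}:${item.id}`;
    if (seen.has(key)) continue; // skip duplicates across the two sources
    seen.add(key);
    merged.push(item);
  }
  return merged;
}
```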
## Mermaid Flowchart
```mermaid
flowchart TD
HTTP["GET /<br/>ViewerRoutes.ts"] --> EB["ErrorBoundary<br/>index.tsx:4"]
EB --> APP["App<br/>App.tsx:14"]
APP --> SSE["useSSE<br/>useSSE.ts:6"]
APP --> SETTINGS["useSettings<br/>useSettings.ts:8"]
APP --> PAGINATION["usePagination<br/>usePagination.ts:18"]
APP --> THEME["useTheme"]
APP --> STATS["useStats"]
SSE -->|EventSource| STREAM["/stream<br/>ViewerRoutes.handleSSEStream"]
STREAM --> BROADCASTER["SSEBroadcaster<br/>SSEBroadcaster.ts:15"]
BROADCASTER --> SSE
APP --> HEADER["Header<br/>Header.tsx:34"]
APP --> FEED["Feed<br/>Feed.tsx:18"]
APP --> MODAL["ContextSettingsModal"]
APP --> LOGS["LogsDrawer"]
HEADER --> FilterState[(currentFilter<br/>currentSource)]
FEED -->|IntersectionObserver| LoadMore["handleLoadMore"]
LoadMore --> PAGINATION
PAGINATION -->|GET /api/observations?offset=X| API_OBS["DataRoutes"]
FEED --> OBS["ObservationCard<br/>ObservationCard.tsx:33"]
FEED --> SUM["SummaryCard"]
FEED --> PRO["PromptCard"]
MODAL -->|POST /api/settings| API_SET["SettingsRoutes"]
```
## State Management
Hooks + local state; no Redux/Zustand/Context store.
- `useSSE`: observations, summaries, prompts, catalog, isConnected, isProcessing, queueDepth. EventSource events update.
- `useSettings`: settings object, isSaving, saveStatus.
- `usePagination`: per-datatype isLoading, hasMore, offsetRef, lastSelectionRef. Resets offset on filter change.
- `useTheme`: preference, applies to DOM.
- `useStats`: stats fetched once.
- App local: `currentFilter`, `currentSource`, `contextPreviewOpen`, `logsModalOpen`, `paginatedObservations/Summaries/Prompts`.
**Duplication note:** Observations live in both `useSSE().observations` (live) and App's `paginatedObservations` (older chunks). Merged in `useMemo` with `(project, id)` dedup.
## Side Effects
- EventSource auto-reconnect on error after `TIMING.SSE_RECONNECT_DELAY_MS`.
- IntersectionObserver setup/cleanup per Feed mount.
- Fetch settings + stats on mount.
- DOM theme attribute mutation.
## External Feature Dependencies
**Consumes:** SSEBroadcaster (backend SSE), DataRoutes (pagination), SettingsRoutes (config), SessionStore (catalog on init).
## Confidence + Gaps
**High:** SSE flow; hook composition; pagination; state merging.
**Medium:** Exact paginated response shape; catalog-update strategy (additive only).
**Gaps:** CSS layer; `TerminalPreview`, `ThemeToggle`, `GitHubStarsButton`; full LogsModal console capture; saveSettings error branch.
# Pathfinder Phase 2: Duplication Report
**Date**: 2026-04-21
**Method**: Two parallel subagents (within-feature + cross-feature) with source verification.
---
## Part A: Within-Feature Duplications
### A1. privacy-tag-filtering — redundant wrapper functions
- **Pattern**: `stripMemoryTagsFromPrompt` and `stripMemoryTagsFromJson` wrap `stripTagsInternal` with identical logic.
- **Locations**: `src/utils/tag-stripping.ts:79-91`
- **Consolidation shape**: Single `stripMemoryTags(content, context?)` with optional caller-context parameter.
### A2. context-injection-engine — independent formatter traversals
- **Pattern**: AgentFormatter, HumanFormatter, CorpusRenderer each independently iterate observations with identical icon/title/token/time lookup.
- **Locations**: `src/services/context/formatters/AgentFormatter.ts:36-200`, `src/services/context/formatters/HumanFormatter.ts:35-238`, `src/services/worker/knowledge/CorpusRenderer.ts:39-85`
- **Consolidation shape**: Shared `ObservationRenderer` base with pluggable header/row/footer methods.
### A3. hybrid-search-orchestration — strategy result post-processing
- **Pattern**: Grouping-by-date and grouping-by-file logic duplicated across strategies/formatter/timeline builder.
- **Locations**: `src/services/worker/search/SearchOrchestrator.ts:71-115`, `src/services/worker/search/ResultFormatter.ts:25-110`, `src/services/worker/search/TimelineBuilder.ts:124-240`
- **Consolidation shape**: Strategies return raw `SearchResults`; formatting centralized in `ResultFormatter`.
### A4. session-lifecycle-management — dual reapers
- **Pattern**: `staleSessionReaperInterval` (2m) and `startOrphanReaper` (30s) serve overlapping lifecycle goals.
- **Locations**: `src/services/worker-service.ts:547`, `src/services/worker/ProcessRegistry.ts:508`, `src/services/worker/SessionManager.ts:516`
- **Consolidation shape**: Single `UnifiedReaper` with pluggable check intervals per concern.
### A5. sqlite-persistence — migration boilerplate
- **Pattern**: 27 migrations repeat `CREATE TABLE IF NOT EXISTS`, ALTER logic, PRAGMA settings, and FK-preserving table recreation.
- **Locations**: `src/services/sqlite/migrations/runner.ts:52-123, 265-296, 383-433, ...`
- **Consolidation shape**: Extract `createTableWithDefaults`, `alterTableRename`, `recreateTableWithForeignKeys` helpers.
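One of the proposed helpers might look like the sketch below. The signature and column-spec shape are hypothetical, chosen only to show how 27 migrations' worth of `CREATE TABLE IF NOT EXISTS` boilerplate could collapse into one call:

```typescript
// Hypothetical createTableWithDefaults: builds the repeated DDL from a
// declarative column map instead of hand-written SQL per migration.
function createTableWithDefaults(
  table: string,
  columns: Record<string, string>,
): string {
  const cols = Object.entries(columns)
    .map(([name, type]) => `  ${name} ${type}`)
    .join(",\n");
  return `CREATE TABLE IF NOT EXISTS ${table} (\n${cols}\n)`;
}
```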
### A6. response-parsing-storage — parallel XML parsers
- **Pattern**: `parseObservations` and `parseSummary` use identical regex-based extraction helpers on different tag sets.
- **Locations**: `src/sdk/parser.ts:33-120` (obs), `src/sdk/parser.ts:122-240` (summary)
- **Consolidation shape**: `parseXmlContent(text, tagDefinitions)` driven by a registry.
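The registry-driven shape can be sketched as below. The regex approach mirrors what the doc says the current parsers do; the `TagDefinition` type is an assumption, not the `sdk/parser.ts` API:

```typescript
// Sketch of parseXmlContent driven by a tag registry: one extraction loop
// replaces the parallel parseObservations / parseSummary helpers.
interface TagDefinition {
  tag: string;
  required: boolean;
}

function parseXmlContent(
  text: string,
  tags: TagDefinition[],
): Record<string, string | null> {
  const result: Record<string, string | null> = {};
  for (const { tag, required } of tags) {
    // Non-greedy match of one <tag>...</tag> pair, spanning newlines.
    const match = text.match(new RegExp(`<${tag}>([\\s\\S]*?)</${tag}>`));
    if (!match && required) throw new Error(`missing required <${tag}>`);
    result[tag] = match ? match[1].trim() : null;
  }
  return result;
}
```

Observation and summary parsing would then differ only in the tag registry they pass in.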
### A7. session-lifecycle-management — ProcessRegistry layering
- **Pattern**: Worker-level `ProcessRegistry` is a facade over supervisor-level registry; surface duplication in registerProcess/unregisterProcess/getAll/getByPid.
- **Locations**: `src/services/worker/ProcessRegistry.ts:57-79`, `src/supervisor/process-registry.ts:175-409`
- **Consolidation shape**: Deprecate worker facade; expose supervisor registry directly.
### A8. knowledge-corpus-builder — observation metadata duplication
- **Pattern**: CorpusRenderer.renderObservation and AgentFormatter.renderAgentTableRow both format icon + title + tokens + time with nearly identical logic.
- **Locations**: `src/services/worker/knowledge/CorpusRenderer.ts:39-85`, `src/services/context/formatters/AgentFormatter.ts:127-137`
- **Consolidation shape**: Extract `formatObservationMetadata(obs, config)` returning structured metadata.
---
## Part B: Cross-Feature Duplications
### B1. Observation capture paths — LEGITIMATE
- **Locations**: `src/cli/handlers/observation.ts:31-86`, `src/services/transcripts/processor.ts:240-244`, `src/services/worker/http/routes/SessionRoutes.ts:565`
- **Verdict**: Both capture mechanisms are valid (sync IDE hook vs file-based JSONL) and converge at `/api/sessions/observations`. Divergence above the endpoint is intrinsic to their data sources.
### B2. Observation rendering — ACCIDENTAL
- **Locations**: `src/services/worker/search/ResultFormatter.ts:25-100`, `src/services/context/formatters/AgentFormatter.ts:36-80`, `src/services/worker/knowledge/CorpusRenderer.ts:14-80`
- **Verdict**: Audiences differ (CLI search results vs LLM context injection vs agent priming) but no shared interface — ~200 lines of overlapping logic. **Top candidate for unification.**
### B3. Observation storage write paths — MIXED
- **Locations**: `src/services/sqlite/observations/store.ts:53` (ResponseProcessor), `src/services/worker/http/routes/MemoryRoutes.ts` (manual save), `src/services/worker/http/routes/SessionRoutes.ts:637` (queueObservation → pending queue), `src/services/transcripts/processor.ts:252` (via observationHandler)
- **Verdict**: ResponseProcessor + PendingMessageStore path is intentional (queue + atomic write). MemoryRoutes manual insert is a deliberate feature. Transcript-watcher's re-delegation through observationHandler is **ACCIDENTAL** — could invoke `queueObservation` directly.
### B4. XML parser duplication — ACCIDENTAL
- **Locations**: `src/sdk/parser.ts:33-300` (canonical), `src/bin/import-xml-observations.ts:162` (parallel parseSummary in CLI import tool)
- **Verdict**: Import tool should reuse canonical parser. Type-validation bypass is a code smell and future schema drift risk.
### B5. Privacy tag stripping asymmetry — ACCIDENTAL + SECURITY GAP
- **Locations**: `src/utils/tag-stripping.ts:51` (full 6-tag strip for prompts + tool I/O), `src/utils/transcript-parser.ts:84` (system-reminder only at read time), `src/cli/handlers/summarize.ts:66` (system-reminder only for assistant-message summaries)
- **Verdict**: The summary path does NOT strip `<private>`, `<claude-mem-context>`, etc. from assistant messages before queuing. **Private content can leak into stored summaries.** Highest-priority fix.
### B6. Session initialization flow — LEGITIMATE
- **Locations**: `src/services/worker/http/routes/SessionRoutes.ts:814` (HTTP endpoint), `src/cli/handlers/session-init.ts:38-192` (CLI wrapper), `src/services/transcripts/processor.ts:185` (direct handler invocation)
- **Verdict**: HTTP is canonical; CLI wraps; transcript-watcher's direct-handler path avoids loopback — acceptable optimization.
### B7. Search entry points — LEGITIMATE
- **Locations**: `src/services/worker/search/SearchOrchestrator.ts:71` (canonical), `src/services/worker/SearchManager.ts:161` (thin HTTP facade), `src/services/worker/knowledge/CorpusBuilder.ts:64` (direct call)
- **Verdict**: SearchManager is explicitly a thin facade. CorpusBuilder's direct call intentionally skips HTTP display wrapping. Note: SearchManager retains legacy `@deprecated` private methods (`queryChroma`, `searchChromaForTimeline`) that should be removed as cleanup.
### B8. Process Registry duplication — ACCIDENTAL
- **Locations**: `src/services/worker/ProcessRegistry.ts:1-528`, `src/supervisor/process-registry.ts:1-409`
- **Verdict**: Worker is a facade delegating to supervisor, but API surface overlap (registerProcess/unregisterProcess/getAll/getByPid) duplicates. Worker wrapper adds minimal value beyond supervisor's own API.
### B9. Dual reapers / timers — ACCIDENTAL
- **Locations**: `src/services/worker-service.ts:547` (staleSessionReaperInterval 2min), `src/services/worker-service.ts:537` (startOrphanReaper 30s), `src/services/worker/SessionManager.ts:516-568` (reapStaleSessions body), `src/supervisor/process-registry.ts:292` (reapSession)
- **Verdict**: Historical separation. `startUnifiedReaper` was planned but not implemented. Currently two independent timers with overlapping concerns.
### B10. Database opening / migration — LEGITIMATE
- **Locations**: `src/services/sqlite/Database.ts:155` + migrations + Python repair path
- **Verdict**: Single connection (WAL enforces single writer); repair path is a legitimate safety net. Properly layered.
### B11. HTTP response shaping / validation — ACCIDENTAL
- **Locations**: All 8 route files under `src/services/worker/http/routes/`
- **Verdict**: Each route validates query/body independently. No shared validator middleware. Schema changes require N edits.
### B12. Context injection vs corpus builder — LEGITIMATE
- **Locations**: `src/services/context/ContextBuilder.ts` vs `src/services/worker/knowledge/CorpusBuilder.ts:64`
- **Verdict**: Both correctly delegate to SearchOrchestrator. Output formatting requirements differ enough to justify two call sites.
---
## Priority-Ordered Consolidation Opportunities
| # | Concern | Severity | Effort | Value |
|---|---|---|---|---|
| **P1** | **Privacy tag stripping asymmetry (summary path gap)** | SECURITY | Low | Closes private-tag leak into summaries |
| **P2** | **Unified observation renderer** (ResultFormatter / AgentFormatter / CorpusRenderer) | Code quality | Medium | ~600 lines consolidated; consistent rendering |
| **P3** | **Unified reaper** (staleSessionReaperInterval + startOrphanReaper → single unified reaper) | Complexity | Medium | Simpler lifecycle; matches stated intent (T32 refactor) |
| **P4** | **ProcessRegistry consolidation** (drop worker-level facade) | Surface area | Low | Single source of truth for process tracking |
| **P5** | **XML parser deduplication** (canonical parser in import tool) | Drift risk | Trivial | One-line import change; prevents schema divergence |
| **P6** | **HTTP validator middleware** (centralize per-route validation boilerplate) | Maintenance | High | Low ROI today; watchlist |
| **P7** | **Drop SearchManager `@deprecated` legacy methods** | Cleanup | Trivial | Dead code removal |
| **P8** | **Transcript-watcher direct `queueObservation`** (skip observationHandler hop) | Minor | Low | Small simplification |
---
## What is NOT duplication (legitimate specialization)
- Dual capture paths (lifecycle-hooks + transcript-watcher) — intrinsic to source diversity.
- HTTP endpoint vs CLI handler for session init — loopback vs direct invocation.
- SearchOrchestrator + SearchManager + CorpusBuilder search calls — thin facade + direct-path optimization.
- ContextBuilder vs CorpusBuilder — genuinely different output requirements.
- Database connection + migrations + Python repair — single connection, layered safety.
# Pathfinder Phase 3: Unified Architecture Proposal
**Date**: 2026-04-21
**Scope**: 8 unification targets derived from Phase 2 findings. Only accidental duplications — legitimate specializations are preserved untouched.
**Design principle**: Prefer deletion over abstraction. Prefer one path over configurable paths. If the simplest fix is "move the call site," do that instead of building a registry.
---
## U1. Close the Privacy-Stripping Summary Gap
**Current state**: `src/utils/tag-stripping.ts` exports `stripTagsInternal()` (all 6 tags) used at `SessionRoutes.ts:862` (user prompts) and `SessionRoutes.ts:629/633` (tool I/O). The summary-ingest path receives assistant messages stripped only of `<system-reminder>` (via `SYSTEM_REMINDER_REGEX` in `transcript-parser.ts:84` / `summarize.ts:66`), then queues them without a full-suite strip at `SessionRoutes.handleSummarizeByClaudeId:669+705`.
**Result**: `<private>`, `<claude-mem-context>`, `<system_instruction>`, `<persisted-output>` tags can reach `session_summaries` rows.
**Unified design**:
- Single entry point: `stripMemoryTags(content)` in `src/utils/tag-stripping.ts` (remove the two wrapper functions `stripMemoryTagsFromPrompt` / `stripMemoryTagsFromJson` — they already call the same internal function).
- Call `stripMemoryTags(last_assistant_message)` at `SessionRoutes.ts:~680` (inside `handleSummarizeByClaudeId`, before `queueSummarize`). This is a **three-line fix**.
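A minimal sketch of the proposed single entry point, assuming regex-based stripping like the current wrappers. The tag list below contains only the tags this document names; the canonical list (and the sixth tag) lives in `src/utils/tag-stripping.ts`:

```typescript
// Sketch of the unified stripMemoryTags entry point. Tag list is partial:
// only the tags named in this document are included here.
const MEMORY_TAGS = [
  "private",
  "claude-mem-context",
  "system_instruction",
  "persisted-output",
  "system-reminder",
];

function stripMemoryTags(content: string): string {
  let result = content;
  for (const tag of MEMORY_TAGS) {
    // Remove paired tags together with their contents...
    result = result.replace(new RegExp(`<${tag}>[\\s\\S]*?</${tag}>`, "g"), "");
    // ...and any stray unpaired open/close tags.
    result = result.replace(new RegExp(`</?${tag}>`, "g"), "");
  }
  return result.trim();
}
```

The summary-path fix is then just calling this once on `last_assistant_message` before queuing.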
**Replaces**:
- `src/utils/tag-stripping.ts:79-91` (delete both wrapper function exports, update 3 call sites to new name)
- Adds one call in `SessionRoutes.ts:~680`
**What's lost**: Nothing. No behavior change for non-summary paths.
---
## U2. Unified Observation Renderer
**Current state**: Four independent renderers produce markdown from observations:
| File | Audience | Shape |
|---|---|---|
| `src/services/worker/search/ResultFormatter.ts:25-200` | CLI search results | Compact tables grouped by date/file |
| `src/services/context/formatters/AgentFormatter.ts:36-200` | Session context injection | One-line-per-observation for LLM tokens |
| `src/services/context/formatters/HumanFormatter.ts:35-238` | Terminal context display | ANSI-colored human-readable |
| `src/services/worker/knowledge/CorpusRenderer.ts:14-133` | Agent priming corpus | Full-detail narrative sections |
Each independently looks up type icon (via ModeManager), computes tokens, formats title/subtitle, walks facts/concepts. ~600 lines of overlapping traversal.
**Unified design**: New `src/services/rendering/ObservationRenderer.ts` base with pluggable strategy:
```ts
// Shared: type-icon lookup, token estimation, time formatting, facts/concepts walk.
interface ObservationRenderer {
  renderObservation(obs: Observation, strategy: RenderStrategy): string
}

interface RenderStrategy {
  headerLine(obs: Observation): string
  detailLines(obs: Observation): string[]
  footerLine(obs: Observation): string
  groupingMode: 'date-file' | 'day-timeline' | 'none'
}
```
Concrete strategies:
- `SearchResultStrategy` (replaces ResultFormatter row-level logic)
- `AgentContextStrategy` (replaces AgentFormatter row-level logic)
- `HumanContextStrategy` (replaces HumanFormatter row-level logic)
- `CorpusDetailStrategy` (replaces CorpusRenderer row-level logic)
Shared grouping stays in `timeline-formatting.ts` utility (already exists).
**Replaces**:
- Traversal code in `ResultFormatter.ts:115-200`, `AgentFormatter.ts:86-137`, `HumanFormatter.ts:80-238`, `CorpusRenderer.ts:39-85`
- Keeps the four callers as thin wrappers that build a strategy and invoke the renderer.
**What's lost**: Nothing. Same outputs, one traversal.
**Anti-pattern to reject**: Do NOT build a plugin registry or factory. Four concrete strategy objects are sufficient — a simple switch or direct construction at call sites is fine.
---
## U3. Unified Reaper
**Current state**: Two independent timers with overlapping lifecycle concerns:
| Timer | Interval | Concern | Location |
|---|---|---|---|
| `staleSessionReaperInterval` | 2 min | reapStaleSessions (5-min stuck generators, 15-min stale sessions) | `worker-service.ts:547` |
| `startOrphanReaper` | 30 s | Dead-session PIDs, system orphans (ppid=1), idle daemon children | `ProcessRegistry.ts:508` |
The T32 observation notes explicitly state this unification was planned but not implemented. `reapStaleSessions` is distinct session-lifecycle logic; the orphan reaper is process-lifecycle only.
**Unified design**: `src/services/worker/UnifiedReaper.ts` with a single `setInterval` ticking every 30s. Each tick runs three checks **in order**, each skippable if its cooldown hasn't elapsed:
```
UnifiedReaper tick @30s:
1. reapOrphanedProcesses() — every tick (30s)
2. reapStaleGenerators() — every 4 ticks (2 min)
3. reapAbandonedSessions() — every 4 ticks (2 min, 15-min threshold)
```
Move `reapStaleSessions` body out of SessionManager into UnifiedReaper; keep `detectStaleGenerator` helper on SessionManager (session-owned logic).
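The tick-cadence logic can be sketched with the reap functions injected, which also keeps the cadence testable. This is a sketch of the proposed design, not an existing implementation:

```typescript
// Sketch of the single-timer UnifiedReaper: one 30s interval, with the
// 2-minute session checks gated by a tick counter instead of a second timer.
interface ReapChecks {
  reapOrphanedProcesses(): void;
  reapStaleGenerators(): void;
  reapAbandonedSessions(): void;
}

class UnifiedReaper {
  private tick = 0;
  private timer: ReturnType<typeof setInterval> | null = null;

  constructor(private checks: ReapChecks) {}

  start(intervalMs = 30_000): void {
    this.timer = setInterval(() => this.onTick(), intervalMs);
  }

  stop(): void {
    if (this.timer) clearInterval(this.timer);
    this.timer = null;
  }

  // One 30s tick; exposed so tests can drive it without real timers.
  onTick(): void {
    this.tick++;
    this.checks.reapOrphanedProcesses(); // every tick (30s)
    if (this.tick % 4 === 0) {
      this.checks.reapStaleGenerators(); // every 4th tick (2 min)
      this.checks.reapAbandonedSessions(); // every 4th tick (2 min)
    }
  }
}
```

One timer handle means one setup call at worker startup and one `stop()` at shutdown, replacing both existing teardown paths.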
**Replaces**:
- Delete `staleSessionReaperInterval` setup + teardown at `worker-service.ts:547, 1108-1110`
- Delete `startOrphanReaper` at `ProcessRegistry.ts:508`
- Delete `reapStaleSessions` body at `SessionManager.ts:516-568`
- Wire new `UnifiedReaper` into worker startup/shutdown
**What's lost**: Nothing functionally. The 30s orphan-reap cadence is preserved; the 2-min session cadence is preserved; call sites unify to one timer handle.
**Anti-pattern to reject**: Do NOT parameterize each check with its own separate timer. The point is ONE timer.
---
## U4. Single Process Registry
**Current state**:
- `src/services/worker/ProcessRegistry.ts` (528 lines) — worker-level facade. Delegates to supervisor via `getSupervisor().getRegistry()` for actual state.
- `src/supervisor/process-registry.ts` (409 lines) — supervisor-level persistent registry (supervisor.json).
The worker facade duplicates API surface (`registerProcess`, `unregisterProcess`, `getAll`, `getByPid`) but adds the spawn-wrapping helpers (`createPidCapturingSpawn`, `ensureProcessExit`).
**Unified design**: Keep `src/supervisor/process-registry.ts` as the sole registry. Move the spawn-wrapping helpers (the parts that DO add value) into `src/services/worker/process-spawning.ts` as plain functions, not another class. Delete `src/services/worker/ProcessRegistry.ts` and update imports to hit the supervisor registry directly.
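The extracted spawn helpers need no class. A hypothetical shape for `ensureProcessExit` (the real signature in the current facade may differ):

```typescript
// Hypothetical sketch: SIGTERM, bounded grace wait, then SIGKILL.
function pidIsAlive(pid: number): boolean {
  try {
    process.kill(pid, 0); // signal 0 = existence check, sends nothing
    return true;
  } catch {
    return false;
  }
}

async function ensureProcessExit(pid: number, graceMs = 30_000): Promise<void> {
  try {
    process.kill(pid, "SIGTERM");
  } catch {
    return; // already gone
  }
  const deadline = Date.now() + graceMs;
  while (Date.now() < deadline) {
    if (!pidIsAlive(pid)) return;
    await new Promise((resolve) => setTimeout(resolve, 250));
  }
  try {
    process.kill(pid, "SIGKILL");
  } catch {
    // raced with exit; nothing to do
  }
}
```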
**Replaces**:
- Delete `src/services/worker/ProcessRegistry.ts`
- Extract spawn helpers to `src/services/worker/process-spawning.ts`
- Update ~15 import sites to use `getSupervisor().getRegistry()` directly
**What's lost**: A layer of indirection that was mostly pass-through.
**Anti-pattern to reject**: Do NOT replace the worker facade with a "simpler worker facade." Just delete it.
---
## U5. Canonical XML Parser
**Current state**:
- `src/sdk/parser.ts` — canonical `parseObservations` + `parseSummary` with ModeManager type validation.
- `src/bin/import-xml-observations.ts:162` — parallel `parseSummary` for CLI import, missing type validation.
**Unified design**: Delete the inline parser in `import-xml-observations.ts` and call `parseSummary` from `src/sdk/parser.ts`. Pass an option flag to skip type validation if the import tool genuinely needs that (likely it doesn't — historical observations should still validate).
**Replaces**:
- `src/bin/import-xml-observations.ts:162` (delete ~40 lines; replace with import)
**What's lost**: Potentially the ability to import observations with types not currently in ModeManager. If that's actually needed, add a `parseSummary(text, { strict: false })` option.
---
## U6. Single `stripMemoryTags` Export
**Current state**: `src/utils/tag-stripping.ts` exports three functions: `stripTagsInternal` (internal), `stripMemoryTagsFromPrompt` (wrapper), `stripMemoryTagsFromJson` (wrapper). The two public wrappers are identical.
**Unified design**: Rename `stripTagsInternal` to a single public export, `stripMemoryTags(content: string)`, and remove the two wrappers. Update the 3 call sites in SessionRoutes to the new name.
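A single alternation regex covers every tag in one pass. A sketch, with the tag list taken from U1 (the real pattern in `tag-stripping.ts` may handle additional cases, e.g. self-closing tags):

```typescript
// Tag names come from the U1 security plan in this corpus.
const MEMORY_TAGS = [
  "private",
  "claude-mem-context",
  "system_instruction",
  "persisted-output",
  "system-reminder",
];

// One alternation, backreference forces matching close tag; lazy body match.
const MEMORY_TAG_REGEX = new RegExp(
  `<(${MEMORY_TAGS.join("|")})>[\\s\\S]*?</\\1>`,
  "g",
);

function stripMemoryTags(content: string): string {
  return content.replace(MEMORY_TAG_REGEX, "").trim();
}
```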
**Replaces**:
- Delete `stripMemoryTagsFromPrompt` and `stripMemoryTagsFromJson` at `src/utils/tag-stripping.ts:79-91`
- Update `SessionRoutes.ts:629, 633, 862` (plus U1's new call at ~680)
**What's lost**: Nothing. Pure rename/deletion.
---
## U7. Remove SearchManager Legacy Methods
**Current state**: `src/services/worker/SearchManager.ts` retains private `@deprecated` methods (`queryChroma`, `searchChromaForTimeline`) that were superseded by SearchOrchestrator strategies.
**Unified design**: Delete the deprecated private methods. If any external caller exists (unlikely), update to use SearchOrchestrator directly.
**Replaces**: Dead code removal only.
**What's lost**: Nothing — these are flagged deprecated.
---
## U8. Transcript-Watcher Direct Queue
**Current state**: `src/services/transcripts/processor.ts:240-244` calls `observationHandler.execute()` which then POSTs to `/api/sessions/observations`, which calls `sessionManager.queueObservation()`. The HTTP loopback adds latency and an extra JSON round-trip for a same-process call.
**Unified design**: Have the transcript processor call `sessionManager.queueObservation()` directly (same as `SessionRoutes` does after validation). Move the privacy-check and tag-strip logic currently in `SessionRoutes.handleObservationsByClaudeId` into a shared helper `ingestObservation(payload)` that both SessionRoutes and TranscriptProcessor call.
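A sketch of the shared helper's shape; the payload fields and dependency seams here are assumptions based on the routes described above, not the real signatures:

```typescript
// Assumed payload shape for illustration only.
interface ObservationPayload {
  sessionDbId: number;
  toolUseId: string;
  toolInput: string;
  toolResponse: string;
}

interface ObservationQueue {
  queueObservation(payload: ObservationPayload): void;
}

// Both SessionRoutes and the transcript processor call the returned function
// directly: no HTTP loopback for a same-process hand-off.
function makeIngestObservation(
  queue: ObservationQueue,
  stripMemoryTags: (text: string) => string,
  isProjectExcluded: (sessionDbId: number) => boolean,
) {
  return function ingestObservation(payload: ObservationPayload): boolean {
    if (isProjectExcluded(payload.sessionDbId)) return false;
    queue.queueObservation({
      ...payload,
      toolInput: stripMemoryTags(payload.toolInput),
      toolResponse: stripMemoryTags(payload.toolResponse),
    });
    return true;
  };
}
```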
**Replaces**:
- `src/services/transcripts/processor.ts:240-244` (skip observationHandler hop)
- Extract `ingestObservation` helper from `SessionRoutes.ts:565-659`
**What's lost**: Nothing, provided the observationHandler's `isProjectExcluded` check moves into the extracted helper so it still runs on both paths.
---
## Combined Unified Flowchart
```mermaid
flowchart TD
subgraph Capture["Observation Capture (kept parallel — legitimate)"]
HOOK["lifecycle-hooks PostToolUse<br/>src/cli/handlers/observation.ts"]
TRANS["transcript-watcher tool_result<br/>src/services/transcripts/processor.ts"]
end
HOOK --> INGEST["ingestObservation helper<br/>shared: privacy + strip + queue<br/>(NEW, extracted from SessionRoutes.ts:565-659)"]
TRANS --> INGEST
INGEST --> STRIP["stripMemoryTags<br/>src/utils/tag-stripping.ts (U6)"]
STRIP --> QUEUE["sessionManager.queueObservation<br/>SessionManager.ts:276"]
QUEUE --> PMS[("PendingMessageStore<br/>PendingMessageStore.ts")]
PMS --> SDK["SDK agent processes<br/>SDKAgent / GeminiAgent / OpenRouterAgent"]
SDK --> PARSE["parseObservations + parseSummary<br/>src/sdk/parser.ts (canonical, U5)"]
PARSE --> RP["ResponseProcessor<br/>src/services/worker/agents/ResponseProcessor.ts:49"]
SUM["/api/sessions/summarize<br/>handleSummarizeByClaudeId"] --> STRIP
STRIP --> SUMQ["queueSummarize"]
RP --> STORE["sessionStore.storeObservations<br/>atomic TX<br/>(U1 also applies here)"]
STORE --> CHROMA["ChromaSync.syncObservation / syncSummary<br/>fire-and-forget"]
STORE --> SSE["SSEBroadcaster.broadcast"]
subgraph Render["Unified Observation Rendering (U2)"]
OR["ObservationRenderer<br/>(NEW) src/services/rendering/"]
OR --> SRS["SearchResultStrategy"]
OR --> ACS["AgentContextStrategy"]
OR --> HCS["HumanContextStrategy"]
OR --> CDS["CorpusDetailStrategy"]
end
SRS --> SEARCH_ROUTE["/api/search → ResultFormatter shell"]
ACS --> CTX_INJECT["/api/context/inject (LLM)"]
HCS --> CTX_INJECT_HUMAN["/api/context/inject (ANSI)"]
CDS --> CORPUS["CorpusBuilder → corpus.json"]
subgraph Lifecycle["Unified Reaper (U3)"]
UR["UnifiedReaper tick 30s<br/>(NEW) src/services/worker/UnifiedReaper.ts"]
UR -->|every tick| ORPHAN["reapOrphanedProcesses"]
UR -->|every 4 ticks| STALE_GEN["reapStaleGenerators (5min)"]
UR -->|every 4 ticks| STALE_SESS["reapAbandonedSessions (15min)"]
end
ORPHAN --> REG["supervisor ProcessRegistry (U4)<br/>single source of truth"]
STALE_GEN --> REG
STALE_SESS --> REG
subgraph Search["Search path (unchanged — legitimate)"]
SO["SearchOrchestrator<br/>SearchOrchestrator.ts:71"]
SO --> CH["ChromaSearchStrategy"]
SO --> SL["SQLiteSearchStrategy"]
SO --> HY["HybridSearchStrategy"]
end
SEARCH_ROUTE -.via.-> SO
CTX_INJECT -.semantic.-> SO
CORPUS -.build.-> SO
```
## Summary of Deletions
| Target | Lines removed (approx) |
|---|---|
| `stripMemoryTagsFromPrompt`/`stripMemoryTagsFromJson` wrappers | 20 |
| `src/bin/import-xml-observations.ts` inline parser | 40 |
| `src/services/worker/ProcessRegistry.ts` (mostly) | 400 |
| `staleSessionReaperInterval` + `startOrphanReaper` + `reapStaleSessions` (moved, not net-new) | 0 net (re-homed) |
| SearchManager `@deprecated` methods | 60 |
| ResultFormatter/AgentFormatter/HumanFormatter/CorpusRenderer traversal duplication | ~400 |
| **Total net deletion estimate** | **~900 lines** |
## Summary of Additions
| Addition | Lines (estimate) |
|---|---|
| `src/services/rendering/ObservationRenderer.ts` + 4 strategy files | ~300 |
| `src/services/worker/UnifiedReaper.ts` | ~120 |
| `src/services/worker/process-spawning.ts` (extracted helpers) | ~150 |
| `ingestObservation` helper | ~60 |
| **Total additions** | **~630 lines** |
**Net**: ~270 lines removed, surface area significantly reduced, security gap closed.
# Pathfinder Phase 4: Handoff Prompts for `/make-plan`
Each block below is a ready-to-run `/make-plan` prompt for one unified system from `03-unified-proposal.md`. Copy a block directly into `/make-plan`.
Prompts are ordered by priority (from Phase 2 ranking): **U1 (security) → U6 (low-hanging fruit) → U4 → U3 → U2 → U5 → U7 → U8**.
---
## U1. Close the Privacy-Stripping Summary Gap (PRIORITY 1 — SECURITY)
```
/make-plan
TARGET: Close the privacy-tag-stripping asymmetry so that `<private>`, `<claude-mem-context>`, `<system_instruction>`, `<persisted-output>`, and `<system-reminder>` tags cannot reach the `session_summaries` table.
CURRENT BUG: The summary ingest path at `src/services/worker/http/routes/SessionRoutes.ts` handler `handleSummarizeByClaudeId` (around line 669-705) accepts a `last_assistant_message` field that was only partially stripped upstream — `src/cli/handlers/summarize.ts:66` passes `stripSystemReminders=true` to `extractLastMessage`, which only removes `<system-reminder>` via `SYSTEM_REMINDER_REGEX` in `src/shared/transcript-parser.ts:84`. Other privacy tags pass through and land in `pending_messages` → `session_summaries`.
FIX:
1. In `SessionRoutes.ts` `handleSummarizeByClaudeId`, immediately after extracting `last_assistant_message` from the body (before calling `queueSummarize`), call `stripMemoryTags(last_assistant_message)` from `src/utils/tag-stripping.ts`.
2. Verify the call site handles the empty-after-strip case (skip queuing if empty, mirroring `SessionRoutes.ts:865-872`).
PHASE 1 FLOWCHART: PATHFINDER-2026-04-21/01-flowcharts/privacy-tag-filtering.md
EVIDENCE: PATHFINDER-2026-04-21/02-duplication-report.md §B5
ANTI-PATTERNS TO REJECT:
- Do NOT add a new "privacy service" or class. `stripMemoryTags` is already a stateless utility.
- Do NOT add a feature flag. Just strip.
- Do NOT strip inside `queueSummarize` — strip at the HTTP boundary where other user-facing inputs are stripped.
TESTS: Add a unit/integration test that POSTs a summary with `<private>foo</private>` in `last_assistant_message` and asserts the stored `session_summaries` row contains no trace of it.
```
---
## U6. Collapse tag-stripping Wrappers to One Export
```
/make-plan
TARGET: Reduce `src/utils/tag-stripping.ts` to a single public export `stripMemoryTags(content: string)` and update call sites.
CURRENT STATE: The file exports two wrapper functions that both call the internal function with identical logic:
- `stripMemoryTagsFromPrompt` at `src/utils/tag-stripping.ts:79-91` (approx)
- `stripMemoryTagsFromJson` at same region
Both call `stripTagsInternal`.
CALL SITES TO UPDATE:
- `src/services/worker/http/routes/SessionRoutes.ts:629` (tool_input)
- `src/services/worker/http/routes/SessionRoutes.ts:633` (tool_response)
- `src/services/worker/http/routes/SessionRoutes.ts:862` (user prompt)
- Plus the new site from U1 (last_assistant_message)
FIX:
1. Rename `stripTagsInternal` to the public export `stripMemoryTags` and remove the two wrapper functions.
2. Update call sites to use the new name.
PHASE 1 FLOWCHART: PATHFINDER-2026-04-21/01-flowcharts/privacy-tag-filtering.md
EVIDENCE: PATHFINDER-2026-04-21/02-duplication-report.md §A1
ANTI-PATTERNS TO REJECT:
- Do NOT add overloads or options for "pretty print" etc. — keep it one argument in, one string out.
- Do NOT keep the old names as re-exports. Just update the imports.
```
---
## U4. Single Process Registry (drop worker-level facade)
```
/make-plan
TARGET: Delete the worker-level ProcessRegistry facade; make `src/supervisor/process-registry.ts` the sole process registry. Extract genuinely-useful spawn helpers to a plain-function module.
CURRENT STATE:
- `src/services/worker/ProcessRegistry.ts` (~528 lines) is a facade that delegates to `getSupervisor().getRegistry()` for state.
- `src/supervisor/process-registry.ts` (~409 lines) is the persistent registry (supervisor.json) with real logic.
- The facade adds spawn helpers (`createPidCapturingSpawn` at ~:393, `ensureProcessExit` at ~:185) that DO have value but don't need a class.
CALL SITES TO REWRITE (from Phase 2 evidence):
- Any import of `ProcessRegistry` from `src/services/worker/ProcessRegistry.ts` — change to `getSupervisor().getRegistry()` for state methods, OR to the new `process-spawning.ts` for spawn helpers.
- `src/services/worker/SessionManager.ts:535, 540, 631-670` (uses both spawn and state)
- `src/services/worker-service.ts:537` (orphan reaper setup — handled separately in U3)
FIX:
1. Create `src/services/worker/process-spawning.ts` exporting `createPidCapturingSpawn(...)` and `ensureProcessExit(...)` as plain functions.
2. Update every import of `src/services/worker/ProcessRegistry` to either `process-spawning.ts` (spawn helpers) or `getSupervisor().getRegistry()` (registration/lookup).
3. Delete `src/services/worker/ProcessRegistry.ts`.
PHASE 1 FLOWCHART: PATHFINDER-2026-04-21/01-flowcharts/session-lifecycle-management.md
EVIDENCE: PATHFINDER-2026-04-21/02-duplication-report.md §A7, §B8
ANTI-PATTERNS TO REJECT:
- Do NOT replace the worker facade with a "simpler worker facade." Delete it.
- Do NOT create an adapter class. Plain exported functions only for spawn helpers.
- Do NOT keep a re-export shim. Update all imports.
```
---
## U3. Unified Reaper (merge staleSession + orphan timers)
```
/make-plan
TARGET: Replace two independent reaper timers with a single `UnifiedReaper` that ticks every 30s and runs three checks at their respective cadences.
CURRENT STATE:
- `staleSessionReaperInterval` at `src/services/worker-service.ts:547` (2-min interval) calls `reapStaleSessions` in `src/services/worker/SessionManager.ts:516-568` which detects 5-min stuck generators and 15-min abandoned sessions.
- `startOrphanReaper` at `src/services/worker/ProcessRegistry.ts:508` (30s interval) runs `reapOrphanedProcesses` at `ProcessRegistry.ts:349` (dead-session PIDs, system orphans via ppid=1, idle daemon children).
- Shutdown at `worker-service.ts:1108-1110` clears `staleSessionReaperInterval`.
Relates to work item: **T32 refactor** (per context: "plan premise incorrect regarding unified reaper scope"). This plan clarifies the correct scope.
FIX:
1. Create `src/services/worker/UnifiedReaper.ts` with a single `setInterval` at 30s. Each tick:
- Always: run orphan-process cleanup (existing `reapOrphanedProcesses` body).
- Every 4th tick (2 min): run stuck-generator detection (existing `detectStaleGenerator` calls for each session with threshold 5 min).
- Every 4th tick (2 min): run abandoned-session detection (threshold 15 min, deleteSession).
2. Move `reapStaleSessions` body into UnifiedReaper; keep `detectStaleGenerator` helper on SessionManager.
3. Delete `staleSessionReaperInterval` setup + teardown.
4. Delete `startOrphanReaper` (ProcessRegistry.ts:508) and the interval it returned.
5. Wire `UnifiedReaper` into worker startup (after sessionManager init) and shutdown (before graceful shutdown).
CALL SITES TO REWRITE:
- `src/services/worker-service.ts:547` → replace with `UnifiedReaper.start()`
- `src/services/worker-service.ts:1108-1110` → replace with `UnifiedReaper.stop()`
- `src/services/worker/ProcessRegistry.ts:508` → delete startOrphanReaper setup (migrated into UnifiedReaper)
- `src/services/worker/SessionManager.ts:516-568` → delete `reapStaleSessions` body (migrated)
PHASE 1 FLOWCHART: PATHFINDER-2026-04-21/01-flowcharts/session-lifecycle-management.md
EVIDENCE: PATHFINDER-2026-04-21/02-duplication-report.md §A4, §B9
ANTI-PATTERNS TO REJECT:
- Do NOT give each check its own timer "for flexibility." The whole point is ONE timer.
- Do NOT make intervals configurable via settings — hard-code 30s base tick and 4x multiplier.
- Do NOT build a plugin/registry. Three checks, called directly in sequence.
- Do NOT preserve the old reapers behind a feature flag.
NOTE: This plan supersedes any existing T32 plan premise; the unified reaper handles BOTH process orphans AND session-lifecycle concerns in one scheduler. Depends on U4 being complete first (so that ProcessRegistry refs resolve cleanly).
```
---
## U2. Unified Observation Renderer
```
/make-plan
TARGET: Create a single `ObservationRenderer` that four call sites use with pluggable strategies, eliminating ~600 lines of overlapping traversal and formatting logic.
CURRENT STATE (four independent renderers producing markdown from observations):
- `src/services/worker/search/ResultFormatter.ts:25-200` — CLI search results, grouped-by-date+file tables
- `src/services/context/formatters/AgentFormatter.ts:36-200` — LLM-compact one-liners
- `src/services/context/formatters/HumanFormatter.ts:35-238` — ANSI terminal output
- `src/services/worker/knowledge/CorpusRenderer.ts:14-133` — full-detail agent priming
All four look up type icon via ModeManager, estimate tokens, format title/subtitle, walk facts/concepts. Shared grouping helper already exists in `src/shared/timeline-formatting.ts`.
FIX:
1. Create `src/services/rendering/ObservationRenderer.ts` with:
- `renderObservations(obs[], strategy): string`
- Shared traversal: ModeManager lookup, token calc, time formatting, facts/concepts iteration.
2. Define `RenderStrategy` interface: `headerLine(obs)`, `detailLines(obs)`, `footerLine(obs)`, `groupingMode: 'date-file' | 'day-timeline' | 'none'`.
3. Concrete strategies (small files, each ~60 lines):
- `SearchResultStrategy`
- `AgentContextStrategy`
- `HumanContextStrategy`
- `CorpusDetailStrategy`
4. Reduce the four existing renderer files to thin shells: construct a strategy, call the renderer.
5. Delete the duplicate iteration/formatting code.
CALL SITES TO REWRITE:
- `ResultFormatter.formatSearchResults` (ResultFormatter.ts:25) → build SearchResultStrategy, call renderer
- `AgentFormatter.renderAgentTable` (AgentFormatter.ts:86) → build AgentContextStrategy, call renderer
- `HumanFormatter.renderHumanTable` (HumanFormatter.ts:80) → build HumanContextStrategy, call renderer
- `CorpusRenderer.renderCorpus` (CorpusRenderer.ts:14) → build CorpusDetailStrategy, call renderer
PHASE 1 FLOWCHARTS:
- PATHFINDER-2026-04-21/01-flowcharts/context-injection-engine.md
- PATHFINDER-2026-04-21/01-flowcharts/hybrid-search-orchestration.md
- PATHFINDER-2026-04-21/01-flowcharts/knowledge-corpus-builder.md
EVIDENCE: PATHFINDER-2026-04-21/02-duplication-report.md §A2, §A8, §B2
ANTI-PATTERNS TO REJECT:
- Do NOT build a registry or factory for strategies. Construct directly at call sites.
- Do NOT make strategies discoverable by name. They are four concrete classes.
- Do NOT introduce a DSL for rendering — plain TypeScript strategies only.
- Do NOT support dynamic output formats ("just in case"). If a fifth audience appears later, add a fifth strategy then.
TESTS: Snapshot tests for each of the four output formats using fixture observations; confirm byte-identical output before/after refactor.
```
---
## U5. Canonical XML Parser in Import Tool
```
/make-plan
TARGET: Make `src/bin/import-xml-observations.ts` use `parseSummary` from `src/sdk/parser.ts` instead of its parallel implementation.
CURRENT STATE: `src/bin/import-xml-observations.ts:162` has its own `parseSummary` that lacks ModeManager type validation. If summary XML schema evolves, the two diverge silently.
FIX:
1. Delete the inline parser in `import-xml-observations.ts`.
2. Import `parseSummary` from `src/sdk/parser.ts` and call it.
3. If (and only if) the import tool genuinely needs to skip type validation for historical observations with retired types, add an options argument to `parseSummary` (e.g., `{ strict: false }`) and pass it.
PHASE 1 FLOWCHART: PATHFINDER-2026-04-21/01-flowcharts/response-parsing-storage.md
EVIDENCE: PATHFINDER-2026-04-21/02-duplication-report.md §B4
ANTI-PATTERNS TO REJECT:
- Do NOT extend the parser API with an options object unless test data actually requires it. Start strict.
- Do NOT keep the inline parser as a fallback.
```
---
## U7. Delete SearchManager Deprecated Methods
```
/make-plan
TARGET: Remove `@deprecated` private methods from `src/services/worker/SearchManager.ts`.
CURRENT STATE: SearchManager retains legacy private methods (`queryChroma`, `searchChromaForTimeline`) that are flagged `@deprecated` and superseded by `SearchOrchestrator` strategies.
FIX:
1. Grep for remaining callers — likely none (they are private).
2. Delete the methods.
3. Confirm no test or compile breakage.
PHASE 1 FLOWCHART: PATHFINDER-2026-04-21/01-flowcharts/hybrid-search-orchestration.md
EVIDENCE: PATHFINDER-2026-04-21/02-duplication-report.md §B7
ANTI-PATTERNS TO REJECT:
- Do NOT leave dead deprecated code "just in case."
```
---
## U8. Transcript-Watcher Direct Queue + `ingestObservation` Helper
```
/make-plan
TARGET: Eliminate HTTP loopback in the transcript-watcher path by extracting the privacy-check + tag-strip + queue logic into a shared helper `ingestObservation(payload)` called directly by both `SessionRoutes` and `TranscriptEventProcessor`.
CURRENT STATE:
- `src/services/transcripts/processor.ts:240-244` calls `observationHandler.execute()` which POSTs to `/api/sessions/observations` via loopback HTTP.
- `src/services/worker/http/routes/SessionRoutes.ts:565-659` runs validation, privacy check, `stripMemoryTags` on tool_input/response, and `sessionManager.queueObservation`.
FIX:
1. Extract the validation + privacy-check + strip + queue logic from `SessionRoutes.ts:565-659` into a helper `ingestObservation(payload, { source })` in `src/services/worker/observation-ingest.ts`.
2. Update `SessionRoutes.handleObservationsByClaudeId` to call the helper.
3. Update `src/services/transcripts/processor.ts` to call the helper directly (delete the observationHandler invocation at line 240-244).
CALL SITES TO REWRITE:
- `src/services/worker/http/routes/SessionRoutes.ts:565-659` → reduce to thin wrapper over `ingestObservation`
- `src/services/transcripts/processor.ts:240-244` → replace observationHandler call with direct `ingestObservation` call
PHASE 1 FLOWCHARTS:
- PATHFINDER-2026-04-21/01-flowcharts/transcript-watcher-integration.md
- PATHFINDER-2026-04-21/01-flowcharts/lifecycle-hooks.md
EVIDENCE: PATHFINDER-2026-04-21/02-duplication-report.md §B3
ANTI-PATTERNS TO REJECT:
- Do NOT parameterize every difference between the two callers ("source: enum of 7 possible values"). Two call sites, two keyword args max.
- Do NOT move the logic into `SessionManager` itself — queue ingest is a boundary concern (privacy + strip happen here).
- Do NOT preserve the observationHandler → HTTP path as a fallback.
NOTE: Depends on U1 + U6 landing first so the strip helper name is already unified.
```
---
## Execution Order Recommendation
1. **U1** (security fix — land immediately)
2. **U6** (trivial; unblocks U1 cleanup)
3. **U5** (trivial; prevents drift)
4. **U7** (trivial; dead code)
5. **U4** (enables clean U3)
6. **U3** (unified reaper — requires U4 done)
7. **U8** (requires U1 + U6)
8. **U2** (largest, lowest risk — snapshot tests gate)
Each `/make-plan` invocation should produce a phased plan with ≤3 tasks per phase. Land in that order, verifying after each.
# Pathfinder Phase 5: Brutal Audit + Clean Flowcharts
**Date**: 2026-04-21
**Scope**: Strip every timer, fallback, wrapper, and coercion that exists to patch a failed abstraction. Preserve every user-facing feature. Replace patch-piles with single clear paths.
**Rules of engagement:**
- User-facing features (context injection, semantic search, Chroma sync, transcript watch, summary, viewer UI, corpus, CLAUDE.md folder sync, per-prompt semantic) — **KEEP**.
- Crash-recovery that solves a real OS-level problem (subprocess hang watchdog, dead-parent detection, FS watcher missing events on some platforms) — **KEEP but consolidate**.
- Cosmetic duplication, polling where events exist, fallbacks that hide contract violations, facades that pass through — **KILL**.
---
## Part 1: Bullshit Inventory
Every item here is a patch applied in place of a root-cause fix. They all go.
| # | Bullshit | Why it exists | Root cause to fix instead |
|---|---|---|---|
| 1 | `stripMemoryTagsFromPrompt` + `stripMemoryTagsFromJson` wrappers | Cosmetic naming; both call `stripTagsInternal` identically. | One public `stripMemoryTags(text)`. |
| 2 | Summary path only strips `<system-reminder>` | Different code path missed the fix. **SECURITY BUG**. | Funnel every ingest through the same strip call. |
| 3 | 6 sequential `.replace()` calls for 6 tags | One pass per tag. | One regex with alternation. |
| 4 | Worker-level `ProcessRegistry.ts` (528 lines) | Wraps supervisor registry with spawn helpers. | Supervisor registry is the source of truth; spawn helpers are free functions. |
| 5 | `staleSessionReaperInterval` (2 min) | Second reaper added later to catch what the first missed. | One reaper, three checks. |
| 6 | `startOrphanReaper` (30 s) | First reaper. | Same one reaper. |
| 7 | `detectStaleGenerator` helper + 5-min threshold | Watchdog for hung SDK subprocess. | Keep watchdog — it's real — but run it on the one reaper tick. |
| 8 | 15-min `MAX_SESSION_IDLE_MS` abandoned-session check | Crash recovery. | Keep — real — but same reaper. |
| 9 | 30-s `ensureProcessExit` + SIGKILL escalation ladder | Subprocesses ignore SIGTERM. | Keep SIGTERM → SIGKILL, delete the ladder framework — inline it. |
| 10 | `conversationHistory` in-memory accumulator | Multi-turn agent memory. | Keep — this is the agent's working memory, not a patch. |
| 11 | 500 ms polling `/api/sessions/status` up to 110 s in summarize hook | Hook needs to wait for SDK agent; no push mechanism. | `/api/sessions/summarize` blocks until done OR closes an SSE to the hook. Hook waits on one call. |
| 12 | `/api/context/inject` called TWICE at SessionStart (context + user-message) | Two handlers needed same data, ran in parallel. | One handler, one fetch, caller passes data to the formatter. |
| 13 | `ensureWorkerRunning` called at every hook entry | Hook has no shared state. | Cache `alive=true` in the hook process for the session. |
| 14 | `/api/context/inject` + `/api/context/semantic` both called at UserPromptSubmit | Two endpoints, two roundtrips, same session boot. | `/api/session/start` returns `{sessionDbId, contextMarkdown, semanticMarkdown}`. |
| 15 | 30-second dedup window in `storeObservation` | PostToolUse hook can fire twice on retry. | UNIQUE constraint on `(session_id, tool_use_id)`; DB rejects dup. |
| 16 | `claim-confirm` 60-s stale-reset in `PendingMessageStore.claimNextMessage` | Crash recovery mid-processing. | Keep — real — but move the reset into worker startup, not every claim call. |
| 17 | `pendingTools` map in `TranscriptEventProcessor` | Pairs `tool_use` and `tool_result` as they arrive. | JSONL lines carry `tool_use_id`; match by ID, no state map. |
| 18 | `observationHandler.execute()` HTTP loopback from transcript-watcher | Reuse of CLI handler inside worker process. | Extract `ingestObservation(payload)` helper; both call it directly. |
| 19 | 5-s rescan timer for new transcript files | `fs.watch` misses new files on some platforms. | Watch the parent directory too; add new files when created. Remove the interval. |
| 20 | `coerceObservationToSummary` fallback | Agent returns observations but no `<summary>`. | Agent contract says `<summary>` or `<skip_summary/>`. Enforce; fail the session. |
| 21 | Non-XML response detection + early-fail branch | Agent returns auth error or garbage instead of XML. | Same contract enforcement; one failure path. |
| 22 | Consecutive summary failures circuit breaker | Repeated parse failures. | Contract enforcement + RestartGuard covers this already; delete the separate counter. |
| 23 | `coerceObservationToSummary` regex chains | Summary-missing fallback only. | Delete with item 20. |
| 24 | `ChromaSync.backfillAllProjects` on every worker start | Writes sometimes fail silently, miss Chroma. | Write-path is atomic: SQLite row + Chroma doc in one `Promise.all` with hard failure. If Chroma is enabled but down at write time, mark `chroma_synced=false` on the row; backfill only rows where flag is false. No full-project scan. |
| 25 | Chroma "delete-then-add" on ID conflict | Chroma add() fails on duplicate. | Stable ID = `obs:<sqlite_rowid>`; use upsert. No conflict. |
| 26 | 3-5 granular docs per observation in Chroma | Each field separately vectorized. | One doc per observation: title + narrative + facts concatenated. Recall stays high; index is 1/4 the size. |
| 27 | Python `sqlite3` subprocess for schema repair | Historical migrations created malformed state. | Migrations are idempotent and tested; malformed state can't happen. Delete the repair path. Users on malformed DBs from v<X run a one-shot `claude-mem repair` command manually. |
| 28 | 27 migrations with copy-pasted `CREATE TABLE IF NOT EXISTS` / ALTER boilerplate | Each author wrote their own. | On fresh DB: one `schema.sql` defines current state. Migration runner only touches DBs with `schema_versions` rows < current. |
| 29 | `stripMemoryTagsFromJson` stringifies → strips → parses | Only JSON-shaped payloads. | Strip on the raw string fields (`tool_input.content`, `tool_response.output`) before serialization. One strip call per user-facing text field. |
| 30 | SearchManager `@deprecated` methods (`queryChroma`, `searchChromaForTimeline`) | Pre-Orchestrator code. | Delete. |
| 31 | SearchManager thin facade at HTTP boundary | HTTP wants markdown; Orchestrator returns structured. | Keep the display-wrap (it's real work), but delete every method that just forwards to Orchestrator. |
| 32 | `SearchOrchestrator` Chroma-fails-silently-drops-query-text fallback | Hide Chroma subprocess crashes. | Return `{error: "chroma_unavailable"}` to caller; caller decides whether to retry without query. No silent coercion. |
| 33 | 90-day default recency filter baked into `filterByRecency` | Older results are usually noise. | Orchestrator accepts `dateRange` or nothing; caller is explicit. No implicit filter. |
| 34 | `AgentFormatter` / `HumanFormatter` / `ResultFormatter` / `CorpusRenderer` — 4 independent observation walkers | Each audience implemented separately. | One `renderObservations(obs[], strategy)`; strategy = which columns/density/grouping. |
| 35 | KnowledgeAgent auto-reprime on session-expiration regex match | SDK session IDs expire silently. | Prime is cheap when corpus is loaded; just always prime on query — or store corpus content in a file the SDK loads fresh. No session_id persistence. |
| 36 | `corpus.json` stores `session_id` | Enables SDK resume. | Kill with item 35. |
| 37 | Per-route validation boilerplate × 8 files | No shared schema. | `validateBody(schema)` middleware; per-route Zod schema. |
| 38 | `/api/admin/restart` and `/api/admin/shutdown` with `process.exit(0)` | Manual worker control. | Keep (internal tooling used by version-bump). Not bullshit. |
| 39 | Rate limit 300/min in-memory IP map | Abuse limiter on localhost-only server. | Delete. Localhost trust model assumed everywhere else; this limiter doesn't add safety. |
| 40 | JSON parse 5MB limit on every request | Uploading observations that large would be pathological. | Keep (cheap), but delete any special handling for oversized — 413 is fine. |
**Total bullshit items**: 40.
**Lines expected to be deleted**: ~1400 (up from the ~900 estimate in 03-unified-proposal.md once the audit covers bullshit, not just "duplication").
---
## Part 2: Clean Architecture — Root-Cause Fixes
Six decisions, applied everywhere:
**D1. One observation ingest path.** Hook, transcript-watcher, and manual-save all call `ingestObservation(payload)`. That function does: strip tags → validate privacy → INSERT `pending_messages`. No HTTP loopback inside the worker process.
**D2. One tag-strip function.** `stripMemoryTags(text)`. One regex with alternation. Called at every text-ingress point.
**D3. Zero repeating background timers** (revised 2026-04-22). Every recurring check is replaced by one of three mechanisms: (a) a subprocess-`exit`/`close` event handler for in-process subprocess death, (b) a per-session/per-operation `setTimeout` for time-bounded waits (resets on activity, fires and clears once), or (c) a boot-once reconciliation pass at worker startup for cleanup of state that can only have been orphaned by a previous worker instance. Worker-level `ProcessRegistry` facade deleted; supervisor registry is authoritative. No `setInterval` remains in `src/services/worker/` or `worker-service.ts`.
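Mechanism (b) is the classic reset-on-activity timeout. A minimal sketch (the real per-session wiring is assumed):

```typescript
// Fires onIdle once after idleMs of no touch() calls; no recurring interval.
function makeIdleWatchdog(onIdle: () => void, idleMs: number) {
  let timer: ReturnType<typeof setTimeout> | null = null;
  return {
    // Call on every unit of activity; restarts the countdown.
    touch() {
      if (timer) clearTimeout(timer);
      timer = setTimeout(() => {
        timer = null;
        onIdle();
      }, idleMs);
    },
    // Call on clean completion so the watchdog never fires.
    cancel() {
      if (timer) clearTimeout(timer);
      timer = null;
    },
  };
}
```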
**D4. One renderer.** `renderObservations(obs[], strategy)` where `strategy` selects columns, density, and grouping. The four existing formatters become four small strategy configs.
**D5. Contract enforcement, not coercion.** Agent must return `<summary>` or `<skip_summary/>`. If it returns neither: `session.fail()`. No coerce, no circuit breaker, no non-XML fallback — RestartGuard already exists for repeated failures.
**D6. Blocking endpoints over polling.** `/api/sessions/summarize` doesn't return until the SDK has written the summary row (with a hard timeout). Hook does one request. No 500-ms loop.
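A minimal sketch of D1 under assumed shapes — `ObservationPayload`, `PendingStore`, and the injected strip function are illustrative stand-ins, not the real module signatures:

```typescript
// Hypothetical sketch of D1: one ingest path shared by hook, transcript-watcher,
// and manual-save. Payload fields and PendingStore are assumptions.
interface ObservationPayload {
  sessionDbId: number;
  tool_use_id: string;
  name: string;
  input: string;
  output: string;
}

interface PendingStore {
  insertPending(p: ObservationPayload): number; // INSERT pending_messages, returns row id
}

function ingestObservation(
  raw: ObservationPayload,
  strip: (text: string) => string, // stripMemoryTags (D2), injected for illustration
  store: PendingStore,
): { observationId: number } | { skipped: true } {
  // strip tags → validate privacy → INSERT pending_messages; no HTTP loopback
  const input = strip(raw.input);
  const output = strip(raw.output);
  if (input === "" && output === "") return { skipped: true };
  const observationId = store.insertPending({ ...raw, input, output });
  return { observationId };
}
```

Every caller — hook route, transcript watcher, manual save — goes through this one function, so the privacy strip cannot be skipped by construction.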
---
## Part 3: New Flowcharts
Each diagram below replaces the same-named file in `01-flowcharts/`. Deleted nodes are listed under the diagram. All boxes cite target file:line for the clean implementation.
---
### 3.1 lifecycle-hooks (clean)
```mermaid
flowchart TD
Start([Claude Code lifecycle event]) --> Dispatch{Event?}
Dispatch -->|SessionStart| SS["GET /api/session/start?project=...<br/>(one call returns ctx + semantic)"]
Dispatch -->|UserPromptSubmit| UPS["POST /api/session/prompt<br/>{sessionDbId, prompt}"]
Dispatch -->|PostToolUse| PTU["POST /api/session/observation<br/>{sessionDbId, tool_use_id, name, input, output}"]
Dispatch -->|Stop| STOP["POST /api/session/end<br/>{sessionDbId, last_assistant_message}<br/>BLOCKS until summary written or 110s timeout"]
SS --> SSR["Returns {sessionDbId, contextMarkdown, semanticMarkdown}"]
SSR --> Print["Write ctx to stdout for Claude<br/>Write human-formatted copy to stderr"]
UPS --> UPSR["Returns {promptId}"]
PTU --> PTUR["Returns {observationId}"]
STOP --> STOPR["Returns {summaryId or null}"]
Print --> Done([Exit 0])
UPSR --> Done
PTUR --> Done
STOPR --> Done
```
**Deleted from old flowchart:**
- `ensureWorkerRunning` at every entry point (cache `alive` for the hook lifetime)
- `POST /api/context/semantic` separate call (folded into `/api/session/start`)
- `POST /sessions/{id}/init` SDK-start endpoint (implicit inside `/api/session/prompt`)
- `userMessageHandler` duplicate `/api/context/inject` fetch (single fetch returned from `/api/session/start` covers both)
- 500-ms poll loop on `/api/sessions/status` (replaced by blocking `/api/session/end`)
- Two-phase Stop handling (summarize then session-complete) — one endpoint, one response
**Endpoint count**: 8 → 4.
---
### 3.2 privacy-tag-filtering (clean)
```mermaid
flowchart TD
In["Any text ingress<br/>(prompt / tool_input / tool_output / assistant_message)"] --> Strip["stripMemoryTags(text)<br/>src/utils/tag-stripping.ts"]
Strip --> OneRegex["Single regex alternation:<br/>/<(private|claude-mem-context|system_instruction|system-instruction|persisted-output|system-reminder)>[\\s\\S]*?<\\/\\1>/g"]
OneRegex --> Count{Tag count > MAX=100?}
Count -->|Yes| Warn["logger.warn ReDoS suspicion"]
Count -->|No| Replace["Replace → empty string"]
Warn --> Replace
Replace --> Trim["String.trim()"]
Trim --> Empty{Empty after strip?}
Empty -->|Yes| Skip["Caller returns skipped=true"]
Empty -->|No| Pass["Return cleaned text"]
subgraph CallSites["Call sites (every text ingress uses the same function)"]
C1["ingestObservation: tool_input.content, tool_response.output"]
C2["ingestPrompt: user prompt text"]
C3["ingestSummary: last_assistant_message (CLOSES SECURITY GAP)"]
end
```
**Deleted:**
- `stripMemoryTagsFromPrompt` wrapper (20 lines)
- `stripMemoryTagsFromJson` wrapper + its stringify/parse dance (30 lines)
- Six sequential `.replace()` calls (one alternating regex instead)
- Summary-path partial strip at `summarize.ts:66` and `SessionRoutes.ts:669`
**Closes:** P1 security gap (private content reaching `session_summaries`).
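A sketch of the single-regex strip the diagram describes; the tag list and MAX threshold come from the flowchart, while the return shape and the use of `console.warn` in place of the worker's logger are assumptions:

```typescript
// Hypothetical stripMemoryTags sketch: one alternating regex, backreference \1
// ensures the closing tag matches the opening one.
const MEMORY_TAGS = [
  "private",
  "claude-mem-context",
  "system_instruction",
  "system-instruction",
  "persisted-output",
  "system-reminder",
] as const;

const TAG_RE = new RegExp(
  `<(${MEMORY_TAGS.join("|")})>[\\s\\S]*?<\\/\\1>`,
  "g",
);

const MAX_TAGS = 100;

export function stripMemoryTags(text: string): { text: string; skipped: boolean } {
  const matches = text.match(TAG_RE) ?? [];
  if (matches.length > MAX_TAGS) {
    console.warn(`stripMemoryTags: ${matches.length} tags — ReDoS suspicion`); // logger.warn in the worker
  }
  const cleaned = text.replace(TAG_RE, "").trim();
  // Caller treats skipped=true as "nothing left to ingest"
  return { text: cleaned, skipped: cleaned.length === 0 };
}
```

Mismatched pairs (e.g. `<private>…</system-reminder>`) deliberately do not match, so malformed tags pass through rather than silently eating surrounding text.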
---
### 3.3 sqlite-persistence (clean)
```mermaid
flowchart TD
Boot["Worker boot<br/>src/services/sqlite/Database.ts"] --> Open["new bun:sqlite"]
Open --> Pragmas["PRAGMA WAL/NORMAL/FK/mmap (one block)"]
Pragmas --> Check["SELECT version FROM schema_versions"]
Check --> Fresh{Empty?}
Fresh -->|Yes| Schema["Execute schema.sql (current state)<br/>INSERT schema_versions=N"]
Fresh -->|No| Migrate["Run migrations where id > current"]
Schema --> Ready["DB ready"]
Migrate --> Ready
Ready --> Write["INSERT observations<br/>UNIQUE(session_id, tool_use_id)"]
Write --> Conflict{UNIQUE violation?}
Conflict -->|Yes| SkipWrite["Return existing id (idempotent)"]
Conflict -->|No| Inserted["Return new id + epoch"]
Ready --> Queue["INSERT pending_messages status=pending"]
Queue --> Claim["claimNextMessage TX<br/>SELECT pending ORDER BY id LIMIT 1<br/>UPDATE status=processing"]
Claim --> Worker["Worker processes, confirms (DELETE)"]
Ready --> Read["Prepared SELECTs (indexes on created_at_epoch DESC)"]
BootOnce["Worker startup ONCE<br/>(not on every claim)"] --> Recover["UPDATE pending_messages<br/>SET status=pending<br/>WHERE status=processing<br/>(crash recovery)"]
```
**Deleted:**
- Python `sqlite3` subprocess schema-repair path (~120 lines; if someone's DB is malformed from v<6.5, they run `claude-mem repair` explicitly)
- 30-second content-hash dedup window in `storeObservation` (replaced by DB UNIQUE constraint on `(session_id, tool_use_id)`)
- `findDuplicateObservation` function (~30 lines)
- 60-s stale-reset inside `claimNextMessage` (moved to one-time boot recovery; normal claims are a pure SELECT+UPDATE)
- 24+ migrations of `CREATE TABLE IF NOT EXISTS` boilerplate collapsed into one `schema.sql` for fresh DBs; the migration runner only runs actual upgrade steps
**Tables unchanged.** FTS5 triggers unchanged. WAL mode unchanged.
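The idempotent-write behavior can be modeled without SQLite. This in-memory sketch mirrors the contract — the real implementation is the UNIQUE constraint plus `ON CONFLICT DO NOTHING`; the class and column subset here are assumptions:

```typescript
// In-memory model of the idempotent observation insert. The real statement
// (shape per the diagram above) is roughly:
//   INSERT INTO observations (session_id, tool_use_id, ...)
//   VALUES (?, ?, ...)
//   ON CONFLICT (session_id, tool_use_id) DO NOTHING;
//   -- then SELECT id WHERE session_id=? AND tool_use_id=?
type Obs = { id: number; sessionId: string; toolUseId: string };

class ObservationStore {
  private rows: Obs[] = [];
  private nextId = 1;

  insert(sessionId: string, toolUseId: string): number {
    const existing = this.rows.find(
      (r) => r.sessionId === sessionId && r.toolUseId === toolUseId,
    );
    if (existing) return existing.id; // idempotent: replay returns the same id
    const row = { id: this.nextId++, sessionId, toolUseId };
    this.rows.push(row);
    return row.id;
  }
}
```

Replaying the same `(session_id, tool_use_id)` pair is a no-op that returns the surviving row's id, which is exactly what makes the 30-second dedup window deletable.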
---
### 3.4 vector-search-sync (clean)
```mermaid
flowchart TD
Write["Observation written to SQLite<br/>id=42, session_id=abc"] --> FlagCheck{Chroma enabled?}
FlagCheck -->|No| End([Skip])
FlagCheck -->|Yes| Format["formatDoc<br/>text = title + narrative + facts<br/>id = 'obs:42'"]
Format --> Upsert["chroma_mcp.upsert(id, text, metadata)<br/>(stable ID = stable upsert)"]
Upsert --> OK{Success?}
OK -->|Yes| Mark["UPDATE observations SET chroma_synced=1 WHERE id=42"]
OK -->|No| LogFail["Leave chroma_synced=0<br/>logger.warn"]
Mark --> End
LogFail --> End
BootOnce["Worker startup ONCE"] --> CheckUnsync["SELECT id FROM observations<br/>WHERE chroma_synced=0<br/>LIMIT 1000"]
CheckUnsync --> LoopBackfill["For each: formatDoc → upsert → mark"]
Query["User search query"] --> QueryChroma["chroma_mcp.query(project, text, n)"]
QueryChroma --> Hydrate["SELECT * FROM observations WHERE id IN (...)"]
Hydrate --> Return["Return results"]
```
**Deleted:**
- `ensureBackfilled` + `runBackfillPipeline` full-project scan on every startup (~200 lines)
- `getExistingChromaIds` metadata index scan (~80 lines)
- Delete-then-add for ID conflicts (replaced by `upsert`)
- Granular per-field doc formatter (3-5 docs per observation → 1 doc per observation)
- `backfillAllProjects` fire-and-forget on worker boot (replaced by targeted `WHERE chroma_synced=0`)
**Adds:** `chroma_synced` boolean column on `observations`. Schema migration.
**Effect:** Chroma index size drops ~70%. Backfill cost drops from "every startup, every project, full scan" to "boot once, only unsynced rows."
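The boot-once backfill loop can be sketched as below. The `Store` and `Chroma` interfaces are stand-ins (assumptions) for the SQLite store and the Chroma MCP client; only the `obs:{id}` stable-ID convention and the `chroma_synced` flag come from the diagram:

```typescript
// Hypothetical boot-once backfill: only rows with chroma_synced=0, stable IDs,
// upsert instead of delete-then-add. Interface shapes are assumptions.
interface ObsRow { id: number; title: string; narrative: string }
interface Store {
  unsynced(limit: number): ObsRow[];       // SELECT ... WHERE chroma_synced=0 LIMIT ?
  markSynced(id: number): void;            // UPDATE ... SET chroma_synced=1
}
interface Chroma {
  upsert(id: string, text: string): Promise<void>;
}

export async function backfillOnce(store: Store, chroma: Chroma): Promise<number> {
  let synced = 0;
  for (const row of store.unsynced(1000)) {
    try {
      await chroma.upsert(`obs:${row.id}`, `${row.title}\n${row.narrative}`);
      store.markSynced(row.id);
      synced++;
    } catch {
      // leave chroma_synced=0; the next boot retries this row
    }
  }
  return synced;
}
```

A failed upsert leaves the flag at 0, so the next boot's pass picks the row up again — no separate retry machinery.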
---
### 3.5 context-injection-engine (clean)
```mermaid
flowchart TD
Route["GET /api/session/start?project=X"] --> Gen["generateContext(projects, forHuman=false)<br/>ContextBuilder.ts"]
Route --> GenH["generateContext(projects, forHuman=true)"]
Gen --> Mode["ModeManager.getActiveMode()"]
GenH --> Mode
Mode --> Fetch["SELECT observations + summaries<br/>filtered by mode types"]
Fetch --> Budget["calculateTokenEconomics"]
Budget --> Render["renderObservations(obs, strategy)<br/>(U2 unified renderer)"]
Render --> Strategy{strategy?}
Strategy -->|AgentContextStrategy| AgentOut["Compact markdown for LLM"]
Strategy -->|HumanContextStrategy| HumanOut["ANSI-colored terminal"]
AgentOut --> Return["Return contextMarkdown"]
HumanOut --> Return
Semantic["POST /api/session/start (also includes semantic)"] --> SearchO["SearchOrchestrator.search(query, limit=5)"]
SearchO --> Strategy
```
**Deleted:**
- Separate `renderEmptyState`, `renderHeader`, `renderTimeline`, `renderPreviouslySection`, `renderFooter` branches — one strategy definition carries the shape
- `formatDay` branching (forHuman split pushed to strategy)
- Independent `AgentFormatter` vs `HumanFormatter` traversals — one renderer, two strategies
**Kept user-facing:** Agent format (LLM), Human format (terminal ANSI), token budgets, mode filtering, semantic injection.
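A minimal sketch of the U2 renderer-plus-strategy split (D4). The strategy names come from the diagram; the `Obs` fields and per-line formatting are illustrative assumptions:

```typescript
// Hypothetical renderObservations: one traversal, strategy picks the shape.
interface Obs { title: string; createdAt: string }

interface RenderStrategy {
  line(o: Obs): string;
  join(lines: string[]): string;
}

const AgentContextStrategy: RenderStrategy = {
  line: (o) => `- ${o.title}`,                               // compact markdown for the LLM
  join: (lines) => lines.join("\n"),
};

const HumanContextStrategy: RenderStrategy = {
  line: (o) => `\x1b[36m${o.createdAt}\x1b[0m  ${o.title}`,  // ANSI-colored terminal
  join: (lines) => lines.join("\n"),
};

function renderObservations(obs: Obs[], strategy: RenderStrategy): string {
  return strategy.join(obs.map((o) => strategy.line(o)));
}
```

Adding a fourth output shape means writing one more strategy object, not a fourth traversal of the observation list.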
---
### 3.6 hybrid-search-orchestration (clean)
```mermaid
flowchart TD
A["GET /api/search?q=...&project=...&concept=..."] --> B["SearchRoutes.handleSearch"]
B --> C["SearchOrchestrator.search(params)"]
C --> D{Decision}
D -->|q + Chroma enabled| Semantic["ChromaSearchStrategy.search"]
D -->|q + Chroma disabled| Err["Return 503<br/>error=chroma_unavailable<br/>(NO silent fallback)"]
D -->|no q| FilterOnly["SQLiteSearchStrategy.search"]
D -->|concept/type/file| Hybrid["HybridSearchStrategy.search<br/>(SQLite filter + Chroma rank)"]
Semantic --> Hydrate["Hydrate from SQLite"]
FilterOnly --> Hydrate
Hybrid --> Hydrate
Hydrate --> Fmt{format?}
Fmt -->|json| J["Raw JSON"]
Fmt -->|markdown| M["renderObservations(results, SearchResultStrategy)"]
```
**Deleted:**
- `SearchManager` thin facade (~300 lines; route handler talks to Orchestrator directly)
- `SearchManager.queryChroma`, `SearchManager.searchChromaForTimeline` (`@deprecated`)
- Silent Chroma-fails-drops-query fallback (returns 503 now)
- 90-day default recency filter (callers pass `dateRange` explicitly or get all)
- `filterByRecency` helper
**Kept user-facing:** All three search paths, markdown + json formats, per-concept/type/file filters, timeline builder.
---
### 3.7 response-parsing-storage (clean)
```mermaid
flowchart TD
A["SDK agent returns text"] --> B["processAgentResponse"]
B --> C["parseAgentXml(text, { requireSummary })<br/>src/sdk/parser.ts"]
C --> D{Valid?}
D -->|No| Fail["session.recordFailure()<br/>Mark pending_messages FAILED<br/>RestartGuard handles repeats"]
D -->|Yes| Store["sessionStore.storeObservations(parsed)<br/>atomic TX"]
Store --> Confirm["pendingStore.confirmProcessed(ids)<br/>DELETE after commit"]
Confirm --> Sync["getChromaSync().syncObservation / syncSummary<br/>fire-and-forget"]
Confirm --> SSE["SSEBroadcaster.broadcast"]
Confirm --> Folder["Optional: writeAgentsMd (flagged)"]
```
**Deleted:**
- `coerceObservationToSummary` fallback (~40 lines) — agent must return `<summary>` or `<skip_summary/>`
- `parseObservations` and `parseSummary` as two separate functions → one `parseAgentXml(text, opts)` driven by a tag registry
- Non-XML early-fail special case (collapsed into a single `parseAgentXml` `{valid: false, reason}` response)
- `consecutiveSummaryFailures` counter + circuit-breaker logic (RestartGuard covers this already)
- Null-normalization hacks between parser and store (parser returns structured, never null)
**Kept:** Atomic transaction for obs + summary, content-hash dedup *within the parse output* (not window-based), SSE broadcast, Chroma sync trigger, CLAUDE.md folder sync (feature flagged).
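The D5 contract can be sketched as a single parse entry point. The `requireSummary` option and the `{valid: false, reason}` shape come from the plan; the regex-based extraction and result fields are assumptions (the real parser is tag-registry driven):

```typescript
// Hypothetical parseAgentXml sketch: the agent must return <summary> or
// <skip_summary/> when a summary is required; otherwise the caller fails
// the session (no coercion, no circuit breaker).
type ParseResult =
  | { valid: true; summary: string | null }
  | { valid: false; reason: string };

function parseAgentXml(text: string, opts: { requireSummary: boolean }): ParseResult {
  const summary = text.match(/<summary>([\s\S]*?)<\/summary>/);
  const skip = /<skip_summary\s*\/>/.test(text);
  if (opts.requireSummary && !summary && !skip) {
    return { valid: false, reason: "missing <summary> or <skip_summary/>" };
  }
  return { valid: true, summary: summary ? summary[1].trim() : null };
}
```

Non-XML text falls out of the same path: no `<summary>` match, no `<skip_summary/>`, so the result is `{valid: false}` without a special case.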
---
### 3.8 session-lifecycle-management (clean) — **BIGGEST CULL**
```mermaid
flowchart TD
A["POST /api/session/prompt"] --> B["SessionManager.initializeSession(sessionDbId)"]
B --> C{In memory?}
C -->|Yes| Use["Use cached"]
C -->|No| Create["Create ActiveSession<br/>spawn SDK subprocess<br/>register in supervisor.ProcessRegistry"]
Use --> Gen["SDKAgent.generateResponse iterator"]
Create --> Gen
Q["POST /api/session/observation"] --> Enqueue["ingestObservation(payload)<br/>strip → validate → INSERT pending_messages<br/>emit 'message' event"]
Enqueue --> Wake["iterator.wakeUp()"]
Gen --> Claim["claimNextMessage TX"]
Claim --> YieldMsg["yield message"]
YieldMsg --> Update["lastActivity = now"]
Update --> SDKProcess["SDK processes → ResponseProcessor confirms"]
SDKProcess --> Claim
Claim -->|queue empty + idle≥3min| Idle["signal abort"]
Idle --> Exit["iterator exits"]
Exit --> Unreg["Auto-unregister (process 'exit' event)"]
Unreg --> Delete["SessionManager.delete"]
End["POST /api/session/end"] --> Queue_Sum["queueSummarize as normal pending_message"]
Queue_Sum --> WaitSum["await summary_stored flag OR 110s timeout"]
WaitSum --> Abort["abortController.abort → iterator exits"]
Abort --> Delete
subgraph EventDriven["Event-driven cleanup — no repeating timers"]
EH1["child.on('exit') on SDK spawn<br/>ProcessRegistry.ts:479"] --> Unreg2["unregisterProcess(pid)"]
EH2["mcpProcess.once('exit')<br/>worker-service.ts:530"] --> Unreg3["supervisor.unregisterProcess('mcp-server')"]
IdleT["Per-iterator 3-min setTimeout<br/>SessionQueueProcessor.ts:6<br/>(resets on every chunk at :51-52, :62-63)"] --> IdleFire["onIdleTimeout → abortController.abort<br/>→ child.on('exit') fires → Unreg"]
AbandT["Per-session setTimeout(deleteSession, 15min)<br/>scheduled on last-generator-completion<br/>cleared on new activity"] --> Delete
end
EH1 -.-> Delete
EH2 -.-> Delete
IdleFire -.-> Delete
subgraph BootOnceBlock["Worker startup — boot-once reconciliation"]
BootOnce["Worker startup"] --> Recover["UPDATE pending_messages status processing → pending<br/>(crash recovery)"]
Recover --> BootOrphans["killSystemOrphans(): kill ppid=1 Claude processes<br/>from previous crashed worker instance<br/>(ProcessRegistry.ts:315-344, called ONCE)"]
BootOrphans --> BootPrune["supervisor.pruneDeadEntries():<br/>drop registry entries for PIDs no longer in OS"]
BootPrune --> BootSQL["clearFailedOlderThan(1h)<br/>(one-shot cleanup of stale failed rows)"]
end
```
**Deleted:**
- `src/services/worker/ProcessRegistry.ts` (facade, 528 lines) — supervisor registry is source of truth
- `staleSessionReaperInterval` (separate 2-min timer)
- `startOrphanReaper` (separate 30-s timer)
- `reapStaleSessions` / `reapHungGenerators` / `reapAbandonedSessions` as **background-scanner** sweeps — replaced by per-session `setTimeout`s that fire at the session itself, not from a global scanner
- `reapOrphanedProcesses` as a separate function — folded into boot-once `pruneDeadEntries` + per-spawn `exit` handlers
- `killIdleDaemonChildren` as a runtime sweep — its job is covered by subprocess `exit` handlers during runtime and by boot-once `killSystemOrphans` for ppid=1 leftovers from a prior worker crash
- `killSystemOrphans` as a **repeating** call — function kept, but called exactly once at boot (it can only catch state that predates this worker's existence)
- `ensureProcessExit` 5-s escalation scaffolding — inline the SIGTERM→wait 5s→SIGKILL in one function (remains per-operation, not repeating)
- 60-s self-healing `UPDATE stale → pending` inside `claimNextMessage` — runs once at boot instead
- `MAX_SESSION_IDLE_MS` global (just a constant — consolidated into per-session-timer config)
- Explicit `PRAGMA wal_checkpoint(PASSIVE)` call — SQLite's default `wal_autocheckpoint=1000` pages is the contract (`Database.ts:162-168` sets no override, so the default is live)
- Periodic `clearFailedOlderThan(1h)` — moved to boot-once in plan 02
**Repeating background timers**: 2 → 0.
**Process-registry files**: 2 → 1.
**Process-lifecycle lines**: ~900 → ~400.
**Kept user-facing:** Session init/observe/end, async SDK processing, subprocess crash recovery (via `exit` handlers), hung-generator cleanup (via per-session idle timeout that already exists at `SessionQueueProcessor.ts:6`), abandoned-session cleanup (via per-session `setTimeout`), cross-restart orphan cleanup (via boot-once `killSystemOrphans`). Zero functional loss.
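The per-session abandoned timer in the diagram — scheduled on last-generator-completion, cleared on new activity — can be sketched as follows. The 15-minute constant comes from the plan; the class name and `deleteSession` callback are assumptions:

```typescript
// Hypothetical per-session abandoned timer: no global scanner, one setTimeout
// per session that fires at the session itself.
const ABANDONED_MS = 15 * 60 * 1000;

class SessionTimers {
  private timers = new Map<number, ReturnType<typeof setTimeout>>();

  constructor(private deleteSession: (sessionDbId: number) => void) {}

  // Called when the last generator completes: start the countdown.
  scheduleAbandoned(sessionDbId: number, ms = ABANDONED_MS): void {
    this.clear(sessionDbId);
    this.timers.set(
      sessionDbId,
      setTimeout(() => {
        this.timers.delete(sessionDbId);
        this.deleteSession(sessionDbId);
      }, ms),
    );
  }

  // Called on any new activity: the session is not abandoned.
  clear(sessionDbId: number): void {
    const t = this.timers.get(sessionDbId);
    if (t !== undefined) {
      clearTimeout(t);
      this.timers.delete(sessionDbId);
    }
  }
}
```

Because the timer lives on the session, there is nothing to scan: a session with activity keeps resetting its own countdown, and a dead one fires exactly once.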
---
### 3.9 http-server-routes (clean)
```mermaid
flowchart TD
A([Request on :37777]) --> B["JSON parse 5MB<br/>CORS localhost<br/>request logger"]
B --> C{Route match}
C -->|Yes| D["validateBody(schema) middleware<br/>(Zod per route)"]
C -->|No| NF[404]
D --> E{Valid?}
E -->|No| BR["400 with field errors"]
E -->|Yes| F["BaseRouteHandler.wrapHandler"]
F --> G["Service call"]
G --> H{Response}
H -->|JSON| J1["res.json"]
H -->|SSE| J2["text/event-stream<br/>SSEBroadcaster register"]
H -->|HTML/file| J3["res.sendFile"]
G -->|error| Err["Global errorHandler → { error, message, code }"]
subgraph Routes["Route inventory (user-facing, unchanged)"]
R1["ViewerRoutes: /, /health, /stream"]
R2["SearchRoutes: /api/search, /api/timeline, /api/context/*"]
R3["SessionRoutes: /api/session/* (4 endpoints — see 3.1)"]
R4["DataRoutes: /api/observations, /api/summaries, /api/prompts, /api/stats, /api/projects"]
R5["SettingsRoutes: /api/settings, /api/mcp/*, /api/branch/*"]
R6["MemoryRoutes: /api/memory/save"]
R7["CorpusRoutes: /api/corpus/*"]
R8["LogsRoutes: /api/logs"]
end
```
**Deleted:**
- In-memory rate limiter (300/min IP map) — localhost trust model everywhere else makes this theater
- Per-route hand-rolled validation (Zod middleware replaces)
- Synchronous file read for `/` and `/api/instructions` (replace with cached `Buffer` loaded at boot)
- Legacy `SessionRoutes.handleObservations` (no-privacy-strip) endpoint at `SessionRoutes.ts:378`
**Kept:** All user-facing routes, SSE, middleware chain, admin endpoints (used by tooling).
---
### 3.10 viewer-ui-layer (clean)
```mermaid
flowchart TD
HTTP["GET /"] --> HTML["viewer.html (cached at boot)"]
HTML --> React["React mount"]
React --> SSE["useSSE → EventSource('/stream')"]
SSE --> Initial["Receive initial_load catalog"]
Initial --> Feed["Feed renders<br/>IntersectionObserver → loadMore"]
Feed --> Page["GET /api/observations?offset&limit"]
Page --> Merge["useMemo dedup (project, id)<br/>live SSE + paginated"]
Merge --> Cards["ObservationCard / SummaryCard / PromptCard"]
SSE -->|new_observation / new_summary / new_prompt| Cards
Settings["ContextSettingsModal save"] -->|POST /api/settings| API
SSE -->|disconnect| Reconnect["EventSource auto-reconnect"]
Reconnect --> SSE
```
**Deleted:**
- (Nothing — this subsystem is clean. The only internal cosmetic is `useSSE().observations` + `paginatedObservations` dedup, which is a correct pattern for live + historical merging.)
**Kept:** Everything. User-facing.
---
### 3.11 knowledge-corpus-builder (clean)
```mermaid
flowchart TD
A["POST /api/corpus<br/>{name, filters}"] --> B["CorpusBuilder.build"]
B --> C["SearchOrchestrator.search(filters)"]
C --> D["SessionStore.getObservationsByIds"]
D --> E["renderObservations(obs, CorpusDetailStrategy)<br/>(U2 unified renderer)"]
E --> F["CorpusStore.write(~/.claude-mem/corpora/{name}.corpus.json)"]
Q["POST /api/corpus/:name/query {question}"] --> R["CorpusStore.read(name)"]
R --> S["SDK.query(systemPrompt=corpus, userPrompt=question)<br/>(fresh query — no session resume)"]
S --> T["Return answer"]
Re["POST /api/corpus/:name/rebuild"] --> B
Del["DELETE /api/corpus/:name"] --> DelFile["CorpusStore.delete"]
```
**Deleted:**
- `KnowledgeAgent.prime` as a distinct operation — build IS prime (corpus.json is the prime artifact)
- `session_id` persisted in corpus.json
- Auto-reprime on regex-matched expiration (~40 lines)
- `reprime` endpoint (rebuild covers it)
**Kept user-facing:** Build, query, rebuild, delete. Same HTTP surface minus `/prime` and `/reprime`.
**Cost note:** Every query re-loads corpus as system prompt. Claude Agent SDK with prompt caching makes this cheap (cached system prompt TTL is 5 min). Cost approximately equal to session-resume path without the session-expiration brittleness.
---
### 3.12 transcript-watcher-integration (clean)
```mermaid
flowchart TD
Boot["Worker startup"] --> LoadCfg["loadTranscriptWatchConfig"]
LoadCfg --> ParentWatch["fs.watch(parent_dir, {recursive})<br/>watches existing files AND new files"]
ParentWatch --> OnChange([File event])
OnChange --> ReadDelta["FileTailer.readNewBytes"]
ReadDelta --> SplitLines["Split by \\n"]
SplitLines --> Parse["JSON.parse line"]
Parse --> Match["processor.matchesRule(schema)"]
Match --> Route{event type}
Route -->|session_init| Init["sessionManager.initializeSession(sessionDbId)<br/>(direct, no HTTP loopback)"]
Route -->|tool_use + tool_result paired by tool_use_id| Ingest["ingestObservation({sessionDbId, tool_use_id, name, input, output})"]
Route -->|session_end| EndFlow["sessionManager.endSession(sessionDbId)<br/>→ queueSummarize (same as hook path)"]
EndFlow --> WriteCtx["Optional: writeAgentsMd (Cursor flag)"]
Ingest --> Queue["Same pending_messages queue"]
```
**Deleted:**
- 5-second rescan timer for new files (parent-directory recursive watch catches new files natively)
- `pendingTools` state map (lines match by `tool_use_id`; no per-session pairing map needed)
- `observationHandler.execute()` HTTP loopback (direct `ingestObservation` call)
- `isProjectExcluded` re-check inside transcript processor (done once in `ingestObservation`)
**Kept user-facing:** Cursor, OpenCode, Gemini-CLI transcript ingestion. Summary generation at session end. AGENTS.md write.
---
## Part 4: Timer Census — Before vs After (revised 2026-04-22)
| Timer | Before | After |
|---|---|---|
| `staleSessionReaperInterval` (2 min) | ✓ | ✗ deleted (replaced by per-session `setTimeout` for abandoned sessions) |
| `startOrphanReaper` (30 s) | ✓ | ✗ deleted (replaced by `child.on('exit')` handlers + boot-once reconciliation) |
| Transcript rescan (5 s) | ✓ | ✗ parent watch (event-driven `fs.watch` recursive) |
| Summary poll (500 ms × 220 iter) | ✓ | ✗ endpoint blocks |
| Periodic `clearFailedOlderThan(1h)` (2 min) | ✓ | ✗ deleted (moved to boot-once in plan 02) |
| Explicit `PRAGMA wal_checkpoint(PASSIVE)` (2 min) | ✓ | ✗ deleted outright (SQLite `wal_autocheckpoint=1000` default is the contract) |
| Chroma MCP backoff reconnect | ✓ | ✓ (event-driven on disconnect — not a repeating sweeper) |
| Claim-confirm 60-s stale reset | ✓ per claim | ✗ replaced by boot-once `recoverStuckProcessing()` |
| `killSystemOrphans` ppid=1 sweep | ✓ (inside 30-s interval) | ✗ repeating form deleted; function kept and called ONCE at boot (catches leftovers from a prior worker crash) |
| Boot-once `supervisor.pruneDeadEntries` | — | ✓ NEW (catches any registry entry whose PID died before we saw the `exit` event, e.g., across worker restart) |
| Per-iterator idle 3-min `setTimeout` | ✓ | ✓ (per-session, resets on every chunk — now the only defense against hung SDK generators) |
| Per-session abandoned `setTimeout(deleteSession, 15min)` | — | ✓ NEW (per-session; scheduled on last-generator-completion; cleared on new activity) |
| `child.on('exit')` on SDK / MCP spawn | ✓ | ✓ (already wired; now the sole runtime subprocess-death signal) |
| Generator-exit 30-s wait | ✓ | ✓ (per-delete `Promise.race`, not repeating) |
| `ensureProcessExit` 5-s escalate | ✓ | ✓ (inline SIGTERM→SIGKILL, per-operation) |
| EventSource auto-reconnect (UI) | ✓ | ✓ (browser-owned) |
**Repeating background timers:** 3 → **0**.
**Polling loops:** 1 → 0.
**Per-operation timeouts:** unchanged (they're correct).
**Boot-once reconciliation steps:** 3 (recoverStuckProcessing, killSystemOrphans + pruneDeadEntries, clearFailedOlderThan).
**Why zero is achievable** (investigation 2026-04-22, see `08-reconciliation.md` Part 4 cross-check):
1. In-process subprocess death is covered by `child.on('exit')` handlers at `ProcessRegistry.ts:479` (SDK) and `worker-service.ts:530` (MCP). No scanner needed.
2. Hung SDK generators are caught by the per-iterator 3-min `setTimeout` at `SessionQueueProcessor.ts:6` (resets on every chunk at `:51-52, :62-63`). The background `reapHungGenerators` sweep was redundant with it.
3. Cross-restart orphans (ppid=1 Claude processes from a prior crashed worker) are the only case event handlers cannot catch — but they can only exist *before* this worker started, so a single boot-time `killSystemOrphans()` call covers them exhaustively.
4. Abandoned sessions (no activity for 15 min with no pending work) are now detected at the session itself via a per-session `setTimeout(deleteSession, 15min)` set on last-generator-completion and cleared on new activity — no global scanner.
5. SQLite housekeeping: `clearFailedOlderThan(1h)` becomes boot-once (`pending_messages` has no constraint needing periodic purge); explicit `wal_checkpoint(PASSIVE)` is deleted because SQLite's default `wal_autocheckpoint=1000` pages is active (`Database.ts:162-168` sets no override).
---
## Part 5: Deletion Totals
| Area | Lines deleted | Lines added | Net |
|---|---|---|---|
| `ProcessRegistry.ts` facade | -528 | — | -528 |
| `process-spawning.ts` extracted helpers | — | +150 | +150 |
| `staleSessionReaperInterval` + `startOrphanReaper` + `reapStaleSessions` body | -380 | +280 (UnifiedReaper) | -100 |
| `stripMemoryTagsFromPrompt` / `FromJson` wrappers + 6 regex passes | -60 | +15 | -45 |
| Summary-path privacy gap fix | — | +3 | +3 |
| `AgentFormatter` / `HumanFormatter` / `ResultFormatter` / `CorpusRenderer` traversals | -600 | +320 (renderer + 4 strategies) | -280 |
| `parseObservations` + `parseSummary` + `coerceObservationToSummary` | -280 | +150 (unified `parseAgentXml`) | -130 |
| Non-XML fallback + circuit breaker | -80 | — | -80 |
| SearchManager thin facade + `@deprecated` methods | -300 | +40 (display-wrap only) | -260 |
| Chroma silent-fallback + 90-day filter + granular docs + delete-then-add | -220 | +60 | -160 |
| Chroma backfill full-project scan | -200 | +40 (`chroma_synced` flag backfill) | -160 |
| 30-s content-hash dedup window + `findDuplicateObservation` | -80 | +10 (UNIQUE constraint + migration) | -70 |
| Python sqlite3 schema repair | -120 | — | -120 |
| 24+ migration boilerplate collapsed into schema.sql + upgrade-only migrations | -700 | +400 | -300 |
| Summarize 500-ms polling hook | -60 | +20 (blocking endpoint) | -40 |
| Double `/api/context/*` fetches → `/api/session/start` | -120 | +60 | -60 |
| Transcript 5-s rescan + `pendingTools` map + HTTP loopback | -150 | +40 | -110 |
| Rate-limit middleware | -40 | — | -40 |
| `KnowledgeAgent.prime` + `session_id` persistence + auto-reprime | -140 | +30 | -110 |
| Per-route validation boilerplate | -320 | +200 (Zod middleware + schemas) | -120 |
| **TOTAL** | **-4378** | **+1818** | **-2560** |
Estimate: ~4400 lines removed, ~1800 lines added, net ~2600 lines deleted. Actual numbers depend on how aggressively the schema.sql consolidation goes; conservative net is ~1800.
---
## Part 6: Execution Order
Clean-architecture migrations must land in dependency order:
1. **U6 — `stripMemoryTags`** (trivial; unblocks U1) [<1 hr]
2. **U1 — Summary privacy gap** (3 lines; security) [<1 hr]
3. **Ingest helper** (`ingestObservation`, `ingestPrompt`, `ingestSummary`) — consolidates privacy + queue. Foundation for everything else. [1 day]
4. **U5 + response-parser unification** — delete `coerceObservationToSummary`, unify parseAgentXml. [1 day]
5. **U7 + SearchOrchestrator direct routing** — delete SearchManager facade. [1 day]
6. **U4 — delete worker ProcessRegistry facade** — do before U3 because U3 depends on single-registry. [2 days]
7. **U3 — Zero-timer session lifecycle** (revised 2026-04-22) — delete `staleSessionReaperInterval` + `startOrphanReaper`; replace with (a) per-session `setTimeout(deleteSession, 15min)` for abandoned sessions, (b) boot-once `killSystemOrphans()` + `supervisor.pruneDeadEntries()` for cross-restart orphans, (c) trust existing `child.on('exit')` handlers + per-iterator 3-min idle `setTimeout` for in-process cleanup. No `ReaperTick`, no `setInterval` in `src/services/worker/`. [1 day]
8. **Transcript cleanup** — direct `ingestObservation`, parent watch, drop pendingTools map. [1 day]
9. **U2 — unified `renderObservations`** — largest refactor, lowest risk (pure code reorg, no behavior change). [3 days]
10. **SQLite consolidation** — UNIQUE constraint + schema.sql + delete Python repair + one-shot boot recovery. [2 days]
11. **Chroma rewrite** — stable IDs, `chroma_synced` flag, delete backfill scan. [2 days]
12. **Endpoint consolidation** — `/api/session/start`, blocking `/api/session/end`. [2 days]
13. **Zod validator middleware** — replaces per-route validation. [2 days]
14. **KnowledgeAgent simplification** — drop prime endpoint, drop session_id. [1 day]
15. **HTTP cleanup** — delete rate limit, cache static files. [<1 day]
Total estimated work: ~18 engineer-days for full clean-through. The first three items (tag-strip helper, privacy-gap fix, ingest helper) can land in about a day and close the security bug.
---
## Part 7: What This Does NOT Cull
For the record, the following are **not** bullshit and stay as-is:
- **Pending-messages queue** (async pipeline between hook ack and SDK processing)
- **Fire-and-forget Chroma sync from write path** (writes must not block on vector index)
- **SSE broadcasting** (live UI updates)
- **WAL mode + FTS5 triggers** (correct SQLite design)
- **Graceful shutdown with SIGTERM→SIGKILL escalation** (correct process lifecycle)
- **RestartGuard** (crash-loop prevention)
- **Mode-based filtering** (user-facing feature)
- **Per-project Chroma collections** (multi-tenant semantics)
- **Content-hash on observations** (useful for cross-machine dedup, just not the 30-s window)
- **EventSource auto-reconnect** (correct networking)
- **Agent provider abstraction** (SDKAgent / OpenRouterAgent / GeminiAgent)
- **Transcript schema-driven classification** (Cursor, OpenCode, etc.)
- **Human vs Agent context formats** (user-facing output shapes)
- **Admin restart/shutdown endpoints** (used by version-bump)
Everything above is real work. Everything deleted above it is accumulated patch cruft.
# Pathfinder Phase 6: Implementation Plan
**Date**: 2026-04-22
**Source**: `PATHFINDER-2026-04-21/05-clean-flowcharts.md`
**Scope**: 15 execution phases to land the brutal-audit cleanup. Each phase is self-contained so it can be run in a fresh chat session.
> **Design authority**: `05-clean-flowcharts.md` is the canonical design doc. This plan references it by section number (e.g., "05: 3.2" = section 3.2 of the clean-flowcharts file). When the plan and audit disagree, the plan's *verified-findings* take precedence — those corrections are called out explicitly in Phase 0.
---
## Phase 0 — Documentation Discovery (ALREADY COMPLETED)
The design docs needed for this plan have been read and verified against the live codebase. **Do not re-do this phase**; cite its outputs from later phases.
### Sources consulted
1. `PATHFINDER-2026-04-21/05-clean-flowcharts.md` — brutal audit + 12 clean flowcharts (Part 3), timer census (Part 4), deletion ledger (Part 5), execution order (Part 6), non-cull list (Part 7)
2. `PATHFINDER-2026-04-21/02-duplication-report.md` — 12 cross-feature duplication findings (background)
3. `PATHFINDER-2026-04-21/03-unified-proposal.md` — earlier consolidation targets (U1–U8)
4. Live codebase at `/Users/alexnewman/.superset/worktrees/claude-mem/vivacious-teeth/src/**/*.ts`
### Verified-findings corrections (supersede the audit where they disagree)
These were produced by four parallel discovery subagents. Use these numbers in every downstream phase.
| # | Audit claimed | Reality | Impact on plan |
|---|---|---|---|
| V1 | Summary path only strips `<system-reminder>` (`summarize.ts:66`, `SessionRoutes.ts:669`) | Summary paths strip **ZERO** tags. `handleSummarize` (`SessionRoutes.ts:491`) and `handleSummarizeByClaudeId` (`SessionRoutes.ts:669`) pass `last_assistant_message` straight to `queueSummarize` with no strip. | Privacy gap is **worse** than audit — fix must be added to `ingestSummary`, not a one-line patch. |
| V2 | Legacy `handleObservations` with no-strip at `SessionRoutes.ts:378` | `handleObservations` is at `SessionRoutes.ts:464`. It does **not** strip tags. `handleObservationsByClaudeId` at `SessionRoutes.ts:560` **does** strip (lines 629–633). | Delete/consolidate *both* into `ingestObservation` helper. |
| V3 | `stripMemoryTagsFromJson` + `stripMemoryTagsFromPrompt` wrappers exist | Confirmed. `src/utils/tag-stripping.ts:79` and `:89` both delegate to `stripTagsInternal` at line 48. Six sequential `.replace()` calls at lines 61–66. | U6 target is exact. |
| V4 | Only 3 files call any `stripMemoryTags*` variant | Confirmed. `SessionRoutes.ts:629`, `:633`, `:862`. **No call sites** in summary, legacy observation, or summarize hook. | After U6, verify call-site count equals number of new ingest helpers × text fields. |
| V5 | `startUnifiedReaper` at `process-registry.ts:492` | **Does not exist**. Supervisor registry (`src/supervisor/process-registry.ts`, 408 lines) has `ProcessRegistry` class + `reapSession()` (line 292) but no background timer. Both reapers live in the **worker layer**. | Phase 6 builds `ReaperTick` fresh in worker-service.ts; supervisor registry stays as-is. |
| V6 | Two reapers in worker | Confirmed. `startOrphanReaper` (`src/services/worker/ProcessRegistry.ts:508`, invoked from `worker-service.ts:537`, 30 s). `staleSessionReaperInterval` (inline `setInterval` at `worker-service.ts:547`, 2 min, calls `SessionManager.reapStaleSessions`). Orphan reaper does **not** call `reapStaleSessions`. | Phase 6 replaces both. |
| V7 | `coerceObservationToSummary` exists + non-XML early-fail + circuit breaker | Confirmed. Private fn at `src/sdk/parser.ts:222`. Non-XML fail at `ResponseProcessor.ts:87–108`. Circuit breaker at `ResponseProcessor.ts:176–200` using `session.consecutiveSummaryFailures`. | Phase 3 deletion set is exact. |
| V8 | 500 ms poll up to 110 s in summarize hook | Confirmed. `src/cli/handlers/summarize.ts:117–150`. Constants: `POLL_INTERVAL_MS = 500` (:24), `MAX_WAIT_FOR_SUMMARY_MS = 110_000` (:25). | Phase 11 replaces with blocking endpoint. |
| V9 | SessionRoutes has 8 endpoints | Actually **10**: six under `/sessions/:sessionDbId/*` (`:377`–`:382`) and five under `/api/sessions/*` (`:385`–`:389`). `/api/sessions/status` is the one the summary hook polls. | Phase 11 collapses to 4; deletes are larger than audit implied. |
| V10 | `ensureWorkerRunning` at every hook entry | Confirmed. Called in all 8 CLI handlers (`context.ts:19`, `user-message.ts:35`, `summarize.ts:44`, `observation.ts`, `file-context.ts`, `file-edit.ts`, `session-init.ts`, `session-complete.ts`). | Phase 1 hook-cache module lands before endpoint consolidation. |
| V11 | SearchManager thin facade | Confirmed for `@deprecated` methods (`queryChroma` at `:59`, `searchChromaForTimeline` at `:70`) — but `search()` at `:161`–`:445` does *real* work (result combining, date filtering, grouping, markdown tables). File is 2069 lines. | Phase 4 keeps display-wrap, deletes deprecated + passthroughs only. |
| V12 | 27 migrations | 22 private methods in `MigrationRunner.runAllMigrations` (lines 22–41 of `src/services/sqlite/migrations/runner.ts`); legacy system adds ~5 more. `schema_versions` table created at `runner.ts:55`. | Phase 9 target is "22+legacy → schema.sql + N upgrade migrations". |
| V13 | Python `sqlite3` subprocess ~120 lines | Python script embedded; invoked via `execSync('python3 ...')` at `tests/services/sqlite/schema-repair.test.ts:62` (test file is 253 lines; production script similar). | Phase 9 deletion confirmed; move to user-facing `claude-mem repair` subcommand. |
| V14 | 30-s content-hash dedup window + `findDuplicateObservation` ~30 lines | Confirmed at `src/services/sqlite/observations/store.ts:13` (`DEDUP_WINDOW_MS = 30_000`). `findDuplicateObservation` is 11 lines at `:36–46`. Dedup key is SHA of `(memory_session_id, title, narrative)` — not `tool_use_id`. | Phase 9 adds `UNIQUE(session_id, tool_use_id)` constraint and removes window; this is a **new** constraint, not an existing one. |
| V15 | No `chroma_synced` column | Confirmed. Phase 10 must add it in a migration. | Blocks Phase 10's backfill simplification. |
| V16 | Granular per-field Chroma docs (3–5 per obs) | Confirmed. 7 observation fields + 6 summary fields (`ChromaSync.ts:125–256`). `formatObservationsAsDocs` and `formatSummariesAsDocs` produce separate docs. | Phase 10 concatenates into one doc per observation/summary. |
| V17 | `getExistingChromaIds` metadata scan + delete-then-add on conflict | Confirmed. `getExistingChromaIds` at `ChromaSync.ts:479–545` pages via `chroma_get_documents` with `include: ['metadatas']`. Delete-then-add at `:292–306`. | Phase 10 replaces with `upsert` using stable IDs. |
| V18 | 5-s rescan + `pendingTools` map + HTTP loopback | Confirmed. `src/services/transcripts/watcher.ts:124` (`rescanIntervalMs ?? 5000`). `pendingTools` in `SessionState` interface. `observation.ts:17` loops through `workerHttpRequest('/api/sessions/observations', …)`. Watcher calls handler directly; handler HTTPs back to worker. | Phase 7 replaces with `fs.watch(parentDir, {recursive})` and direct `ingestObservation(payload)` call. |
| V19 | 60-s stale reset in every `claimNextMessage` | Confirmed. `src/services/sqlite/PendingMessageStore.ts:99–145`. Constant `STALE_PROCESSING_THRESHOLD_MS = 60_000` at `:6`. | Phase 6 moves the reset to worker startup. |
| V20 | Rate limiter 300/min | Confirmed at `src/services/worker/http/middleware.ts:45–79`. Constants at `:49–50`. Keyed by IP, normalizes `::ffff:127.0.0.1`. | Phase 14 deletes. |
### Allowed APIs (what the refactor may rely on)
Copy from these exact sources; do **not** invent.
- **bun:sqlite** — `Database`, `db.prepare(sql)`, `db.run`, `db.transaction(fn)`. Unique constraint: `CREATE TABLE x (... UNIQUE(a,b))`. Conflict clause: `INSERT ... ON CONFLICT DO NOTHING` or `ON CONFLICT (a,b) DO UPDATE SET ...`. (Used everywhere under `src/services/sqlite/`.)
- **Express 4** — `app.get/post`, `router.use(middleware)`, `req.body`, `res.json`, `res.sendFile`, SSE via `res.write('event: …\ndata: …\n\n')`. (See `BaseRouteHandler.ts`, `SSEBroadcaster.ts`.)
- **Zod** — `z.object({...})`, `schema.safeParse(body)`, `result.success ? result.data : result.error.flatten()`. (Not yet a direct dependency; Phase 12 adds `zod` via npm. It may already ship transitively via `@anthropic-ai/sdk` — confirm before landing.)
- **Node `fs.watch`** — `fs.watch(dir, { recursive: true }, (event, filename) => …)`. Recursive watching is supported natively on macOS and Windows, and on Linux as of Node 20 (hence the `engines.node >= 20` bump). New files in the watched directory fire `rename` events. (Replaces the 5-s rescan timer.)
- **Claude Agent SDK `@anthropic-ai/claude-agent-sdk`** — existing usage in `src/services/worker/SDKAgent.ts`. Agent contract requires `<summary>` OR `<skip_summary/>`; see `src/sdk/prompts.ts` for the exact instruction text.
### Anti-patterns to prohibit (cite in every phase)
A. **Inventing APIs** — never add a method to a class because it "should exist". Grep the class first.
B. **Polling where events exist** — `setInterval` + HTTP poll replaced by blocking endpoint or SSE.
C. **Silent fallbacks** — Chroma failure returns 503, not dropped-query-text search. Parser failure marks `pending_messages` FAILED, not coerced summary.
D. **Facades that pass through** — if a method body is `return this.other.method(args)`, delete it; call `this.other` directly.
E. **Two code paths for the same data** — if transcript watcher and CLI handler both ingest observations, they call the same helper. No duplicate tag-strip logic.
---
## Phase 1 — One `stripMemoryTags` + close summary privacy gap
**Outcome**: A single public `stripMemoryTags(text: string): string`. Every text-ingress call-site switches to it. Summary paths strip tags (closes P1 security bug).
### Context this phase needs
- `05-clean-flowcharts.md` section 3.2 (privacy-tag-filtering clean flowchart)
- Verified-findings V1, V2, V3, V4
- `src/utils/tag-stripping.ts:48–91` — existing wrappers
### Tasks
1. **Rewrite `src/utils/tag-stripping.ts`** to export:
```ts
const MEMORY_TAGS = ['private','claude-mem-context','system_instruction','system-instruction','persisted-output','system-reminder'] as const;
const STRIP_REGEX = new RegExp(`<(${MEMORY_TAGS.join('|')})\\b[^>]*>[\\s\\S]*?<\\/\\1>`, 'g');
export function stripMemoryTags(text: string): string {
  let matches = 0;
  // One pass; ReDoS guard: past 100 matches, leave the remaining text untouched.
  return text.replace(STRIP_REGEX, (m) => (++matches > 100 ? m : ''));
}
```
Delete `stripMemoryTagsFromPrompt`, `stripMemoryTagsFromJson`, `stripTagsInternal`, `SYSTEM_REMINDER_REGEX`. Keep the length/timing guards from the existing file if they're there today.
2. **Fix every call site** to use `stripMemoryTags`:
- `SessionRoutes.ts:629,633` (was `stripMemoryTagsFromJson`): call on `JSON.stringify(tool_input)` and `JSON.stringify(tool_response)` — same shape, new name.
- `SessionRoutes.ts:862` (was `stripMemoryTagsFromPrompt`): unchanged signature.
- **Add** in `SessionRoutes.ts:464` (legacy `handleObservations`): strip `tool_input` and `tool_response` before `queueObservation`.
- **Add** in `SessionRoutes.ts:491` (`handleSummarize`): strip `last_assistant_message` before `queueSummarize`.
- **Add** in `SessionRoutes.ts:669` (`handleSummarizeByClaudeId`): same.
3. **Update the test** `tests/utils/tag-stripping.test.ts` (if present) to cover the merged function; delete tests for the removed wrappers.
### Verification
- [ ] `grep -r "stripMemoryTagsFromJson\|stripMemoryTagsFromPrompt\|stripTagsInternal" src/` → zero hits.
- [ ] `grep -r "stripMemoryTags(" src/ | wc -l` ≥ 5 (new call sites: 3 existing + 3 new summary/legacy paths).
- [ ] Regression test: insert `<private>secret</private>` into a summary via `/sessions/:id/summarize`; assert `session_summaries.last_assistant_message` contains no `<private>` or `secret`.
- [ ] `npm run build-and-sync` succeeds.
### Anti-pattern guards
- A: Don't add a `stripMemoryTagsV2` wrapper — rename in place.
- D: Don't leave the old function names as re-exports "for safety" — delete.
### Blast radius
Edits: 2 files (`tag-stripping.ts`, `SessionRoutes.ts`). No schema changes.
---
## Phase 2 — Unified ingest helpers
**Outcome**: Three helpers that every ingest point calls. No HTTP loopback inside the worker process.
### Context this phase needs
- `05-clean-flowcharts.md` section 3.1 (lifecycle-hooks clean), Part 2 Decision D1
- Verified-findings V2, V18
- Phase 1 **MUST** be done first.
### Tasks
1. **Create `src/services/worker/ingest/index.ts`** exporting:
```ts
export function ingestObservation(payload: IngestObservationPayload): Promise<IngestResult>;
export function ingestPrompt(payload: IngestPromptPayload): Promise<IngestResult>;
export function ingestSummary(payload: IngestSummaryPayload): Promise<IngestResult>;
```
Each helper: (a) calls `stripMemoryTags` on user-facing text fields, (b) runs privacy / project-exclusion validation (move logic from `SessionRoutes.handleObservationsByClaudeId:614–621` and `PrivacyCheckValidator.ts`), (c) INSERTs into `pending_messages`. Returns `{ skipped: boolean, id?: number, reason?: string }`.
2. **Rewire** `SessionRoutes.ts:464` (`handleObservations`), `:560` (`handleObservationsByClaudeId`), `:491` + `:669` (summarize), `:862` (`handleSessionInitByClaudeId` → `ingestPrompt`) to call the helpers. Route handler's job shrinks to body parsing + response serialization.
3. **Leave `src/cli/handlers/observation.ts` HTTP-based** — hooks run in a separate CLI process, so they still HTTP to the worker. The change is on the worker side: the route handler delegates to `ingestObservation` instead of carrying inline logic.
4. **Rewire** `src/services/transcripts/watcher.ts` to call `ingestObservation(payload)` directly (no `workerHttpRequest` from inside the worker). Delete the inner HTTP call from the transcript path.
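The helper contract above can be sketched with injected dependencies, keeping the ingest logic HTTP-free. The payload fields and the `IngestDeps` interface below are illustrative stand-ins, not the real `PrivacyCheckValidator` or `pending_messages` store APIs:

```typescript
// Hypothetical minimal shapes; the real payloads carry more fields.
interface IngestObservationPayload {
  sessionDbId: number;
  toolUseId: string;
  toolInput: string;
  toolResponse: string;
}
interface IngestResult { skipped: boolean; id?: number; reason?: string }

interface IngestDeps {
  stripMemoryTags(text: string): string;
  isExcluded(sessionDbId: number): boolean; // privacy / project-exclusion check
  insertPending(row: { sessionDbId: number; toolUseId: string; body: string }): number;
}

export function ingestObservation(p: IngestObservationPayload, deps: IngestDeps): IngestResult {
  // (a) strip memory tags from every user-facing text field
  const input = deps.stripMemoryTags(p.toolInput);
  const response = deps.stripMemoryTags(p.toolResponse);
  // (b) privacy / project-exclusion validation
  if (deps.isExcluded(p.sessionDbId)) return { skipped: true, reason: 'project_excluded' };
  // (c) single INSERT into pending_messages
  const id = deps.insertPending({
    sessionDbId: p.sessionDbId,
    toolUseId: p.toolUseId,
    body: JSON.stringify({ input, response }),
  });
  return { skipped: false, id };
}
```

Both HTTP handlers and the transcript watcher call this one function; the route handler's remaining job is body parsing and response serialization.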
### Verification
- [ ] `grep -rn "stripMemoryTags" src/services/worker/` → only inside `ingest/index.ts`.
- [ ] `grep -n "queueObservation\|queueSummarize" src/services/worker/http/routes/SessionRoutes.ts` → zero (handlers use ingest helpers).
- [ ] Unit tests for each helper: tag stripping, privacy validation, project exclusion, INSERT behaviour, idempotent returns for dup.
- [ ] Integration: run full hook cycle via `npm run build-and-sync` + trigger `SessionStart` + `PostToolUse`; observe `pending_messages` row.
### Anti-pattern guards
- E: Don't leave behind `handleObservations` and `handleObservationsByClaudeId` with slightly different logic. One helper, both handlers call it.
- A: No `IngestService` class unless two existing classes already share state. A module with three functions is enough.
### Blast radius
Files touched: `SessionRoutes.ts`, new `ingest/*`, `watcher.ts`, `PrivacyCheckValidator.ts` (may collapse into helper). No schema changes.
---
## Phase 3 — Unify parser; delete coerce + circuit breaker
**Outcome**: One `parseAgentXml(text, {requireSummary})`. `coerceObservationToSummary`, consecutive-failure counter, and non-XML early-fail branch are gone. RestartGuard handles repeated failures.
### Context this phase needs
- `05-clean-flowcharts.md` section 3.7, Part 2 Decision D5
- Verified-findings V7
- `src/sdk/parser.ts`, `src/services/worker/agents/ResponseProcessor.ts:87–200`, `src/services/worker/RestartGuard.ts`
- `src/sdk/prompts.ts` — agent instructions must already state "return `<summary>` or `<skip_summary/>`". If not, update the prompt in this phase.
### Tasks
1. **Replace `parser.ts`** with:
```ts
export interface ParsedAgentOutput {
observations: ParsedObservation[];
summary: ParsedSummary | null;
skipSummary: boolean;
}
export interface ParseResult {
valid: boolean;
data?: ParsedAgentOutput;
reason?: 'no_xml' | 'missing_summary' | 'malformed';
}
export function parseAgentXml(text: string, opts: { requireSummary: boolean }): ParseResult;
```
Delete the `parseObservations` and `parseSummary` exports; keep them as private helpers only if `parseAgentXml`'s implementation still needs them. Delete `coerceObservationToSummary` outright.
2. **Update `ResponseProcessor.ts`**:
- Replace the parse path with a single `parseAgentXml(text, {requireSummary: session.expectsSummary})`.
- On `valid:false`: call `session.recordFailure(result.reason)` → mark `pending_messages` FAILED → let RestartGuard decide. Delete lines `:87–108` (non-XML early-fail), lines `:176–200` (`consecutiveSummaryFailures` counter + circuit).
- Remove the `consecutiveSummaryFailures` field from `ActiveSession`.
3. **Update `sdk/prompts.ts`** if needed so the agent contract is explicit: on work → one or more `<observation>` then exactly one `<summary>`; on no work → `<skip_summary/>`.
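A minimal sketch of the parse contract — the tag grammar is reduced to bare `<observation>`/`<summary>`/`<skip_summary/>` matching and the payload types to strings; the real parser keeps the richer `ParsedObservation`/`ParsedSummary` shapes:

```typescript
interface ParseResult {
  valid: boolean;
  data?: { observations: string[]; summary: string | null; skipSummary: boolean };
  reason?: 'no_xml' | 'missing_summary' | 'malformed';
}

export function parseAgentXml(text: string, opts: { requireSummary: boolean }): ParseResult {
  const observations = [...text.matchAll(/<observation>([\s\S]*?)<\/observation>/g)].map(m => m[1].trim());
  const summaryMatch = /<summary>([\s\S]*?)<\/summary>/.exec(text);
  const skipSummary = /<skip_summary\s*\/>/.test(text);

  // No recognizable tags at all → fail fast; no coercion, no circuit breaker.
  if (observations.length === 0 && !summaryMatch && !skipSummary) {
    return { valid: false, reason: 'no_xml' };
  }
  // Contract: a summary (or an explicit skip) is required when the caller says so.
  if (opts.requireSummary && !summaryMatch && !skipSummary) {
    return { valid: false, reason: 'missing_summary' };
  }
  return {
    valid: true,
    data: { observations, summary: summaryMatch ? summaryMatch[1].trim() : null, skipSummary },
  };
}
```

On `valid:false` the caller marks the pending message FAILED and lets RestartGuard count failures; nothing is coerced into a summary.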
### Verification
- [ ] `grep -rn "coerceObservationToSummary\|consecutiveSummaryFailures" src/` → zero hits.
- [ ] `grep -rn "parseObservations\|parseSummary" src/ | grep -v parser.ts` → zero (callers use `parseAgentXml`).
- [ ] Test: inject garbage-text agent output; assert `pending_messages.status = 'failed'` and no summary row written.
- [ ] Test: inject valid `<observation>` without `<summary>` when `requireSummary=true`; assert `valid:false, reason:'missing_summary'`.
- [ ] RestartGuard still trips after N consecutive failures (unchanged count).
### Anti-pattern guards
- C: Don't coerce "close enough" to `<summary>`. Fail fast.
- A: No new `ParserValidator` class. Pure function returns a result object.
### Blast radius
Files: `parser.ts`, `ResponseProcessor.ts`, possibly `prompts.ts`, `ActiveSession` (remove counter field). No schema changes.
---
## Phase 4 — Delete `SearchManager` pass-throughs
**Outcome**: HTTP route → `SearchOrchestrator` directly. `SearchManager` shrinks to the display-wrap only.
### Context this phase needs
- `05-clean-flowcharts.md` section 3.6
- Verified-finding V11
- `src/services/worker/SearchManager.ts` (2069 lines) and `src/services/worker/http/routes/SearchRoutes.ts`
### Tasks
1. **Route rewire**: `SearchRoutes.ts` handlers call `SearchOrchestrator.search(params)` directly for structured results, then `renderSearchResults(results, format)` (new small helper extracted from current SearchManager) for markdown.
2. **Delete from `SearchManager.ts`**:
- `queryChroma` (`:59`, `@deprecated`) — delete all call sites first (grep).
- `searchChromaForTimeline` (`:70`) — delete.
- Any method whose body is `return this.orchestrator.foo(...)` with no other work.
3. **Keep** the result-combining / grouping / markdown-table code in `SearchManager.search()` as a `renderSearchResults(results, opts)` module. This is real work (V11). Put it in `src/services/worker/search/ResultRenderer.ts` if not already there.
4. **Delete** `filterByRecency` default 90-day filter. Callers pass `dateRange` explicitly.
### Verification
- [ ] `grep -rn "class SearchManager" src/` → file either deleted or reduced to < 200 lines of display logic.
- [ ] `grep -rn "queryChroma\|searchChromaForTimeline" src/` → zero.
- [ ] `grep -rn "filterByRecency" src/` → zero.
- [ ] Integration: `curl '/api/search?q=test&project=cm&format=markdown'` and `format=json` — both return expected shapes.
### Anti-pattern guards
- D: A method that forwards must die.
- C: If Chroma is disabled and `q` is set, return 503 with `error: 'chroma_unavailable'` — don't silently run a SQLite fallback.
### Blast radius
`SearchManager.ts`, `SearchRoutes.ts`, new `ResultRenderer.ts`. No schema changes.
---
## Phase 5 — Delete worker `ProcessRegistry` facade
**Outcome**: Worker talks to `src/supervisor/process-registry.ts` directly. `src/services/worker/ProcessRegistry.ts` becomes a small module of free functions for spawning and SIGTERM→SIGKILL escalation (not a registry).
### Context this phase needs
- `05-clean-flowcharts.md` section 3.8, Part 2 Decision D3
- Verified-findings V5, V6
- `src/services/worker/ProcessRegistry.ts` (527 lines), `src/supervisor/process-registry.ts` (408 lines), `src/services/worker-service.ts` (uses both)
### Tasks
1. **Audit `worker/ProcessRegistry.ts` exports** and rehome:
- `registerProcess`, `unregisterProcess`, `getProcessBySession`, `getActiveCount`, `waitForSlot`, `getActiveProcesses`, `reapOrphanedProcesses` → these wrap the supervisor's registry. Delete the worker copies; callers switch to `getSupervisor().getRegistry().foo(…)` (already what they ultimately hit).
- `ensureProcessExit` (`:185`, SIGTERM→SIGKILL escalation) → keep as a free function in a new `src/services/worker/process-control.ts`. Inline the 5-s wait + SIGKILL. Remove the ladder-framework packaging.
- `createPidCapturingSpawn` (`:393`) → move to `process-control.ts`.
- `startOrphanReaper` (`:508`) → **delete in Phase 6** (replaced by ReaperTick).
2. **Delete** `src/services/worker/ProcessRegistry.ts` when it's empty.
3. **Update all imports** (grep for `from.*worker/ProcessRegistry` and re-point).
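The escalation helper stays a free function. A sketch of the SIGTERM→SIGKILL ladder (POSIX signal semantics assumed; the 5-s grace matches the plan):

```typescript
// SIGTERM first; if the pid is still alive after the grace window, SIGKILL it.
export function ensureProcessExit(pid: number, graceMs = 5_000): void {
  try {
    process.kill(pid, 'SIGTERM');
  } catch {
    return; // ESRCH: process already gone
  }
  const t = setTimeout(() => {
    try {
      process.kill(pid, 0); // signal 0 = liveness probe; throws if the pid is gone
    } catch {
      return;               // exited within the grace window
    }
    try { process.kill(pid, 'SIGKILL'); } catch { /* raced with exit */ }
  }, graceMs);
  // Don't keep the worker alive just for this timer.
  (t as { unref?: () => void }).unref?.();
}
```

Five lines of substance, as the anti-pattern guard below says — no class, no ladder framework.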
### Verification
- [ ] `test -f src/services/worker/ProcessRegistry.ts` → false.
- [ ] `grep -rn "worker/ProcessRegistry" src/` → zero.
- [ ] All worker + tests still compile: `npx tsc --noEmit`.
- [ ] Manual test: start worker, spawn a summarize subprocess, SIGTERM it → observe SIGKILL after 5 s.
### Anti-pattern guards
- D: Do not add a "compatibility shim" that re-exports the deleted symbols.
- A: `ensureProcessExit` is five lines — don't build a class for it.
### Blast radius
Big import fan-out. Compile-time breakage until all imports are fixed. Runtime: identical behavior (supervisor registry was always the backing store).
---
## Phase 6 — `ReaperTick`: single 30-s timer with three checks
**Outcome**: One `setInterval(30_000)` in `worker-service.ts`. Three skippable checks: prune dead PIDs (every tick), kill hung generators (every 4 ticks), delete abandoned sessions (every 4 ticks). The per-claim 60-s stale reset runs once at boot instead.
### Context this phase needs
- `05-clean-flowcharts.md` section 3.8 subgraph `OneReaper`, Part 4 timer census
- Verified-findings V6, V19
- Phase 5 **MUST** be done.
### Tasks
1. **Create `src/services/worker/reaper.ts`**:
```ts
export function startReaperTick(deps: {
processRegistry: ProcessRegistry;
sessionManager: SessionManager;
pendingStore: PendingMessageStore;
thresholds?: { generatorIdleMs?: number; sessionIdleMs?: number };
}): { stop(): void };
```
Internally: tick counter, `reapDeadPids()` every tick, `reapHungGenerators()` + `reapAbandonedSessions()` every 4 ticks. Thresholds: `generatorIdleMs=5*60_000`, `sessionIdleMs=15*60_000`.
2. **Delete `startOrphanReaper`** (`ProcessRegistry.ts:508`) and `staleSessionReaperInterval` (`worker-service.ts:547`). Delete `reapOrphanedProcesses`, `killSystemOrphans`, `killIdleDaemonChildren` as separate functions; fold their bodies into `reapDeadPids`.
3. **Move `PendingMessageStore.claimNextMessage`** stale reset from inside the claim (lines `:99–145`) into a new `PendingMessageStore.recoverStuckProcessing()` method called once at worker boot in `worker-service.ts` after the DB is ready. The claim becomes a clean `SELECT ... LIMIT 1 FOR UPDATE`-equivalent transaction.
4. **Update `worker-service.ts`** shutdown path to `stop()` the ReaperTick before orphan reaper (it's the same thing now).
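The cadence reduces to one interval and a modulo counter. A sketch with the three checks injected (the `tick` handle is exposed only so tests can drive the counter without timers):

```typescript
interface ReaperChecks {
  reapDeadPids(): void;          // cheap: runs every tick
  reapHungGenerators(): void;    // every 4th tick
  reapAbandonedSessions(): void; // every 4th tick
}

const TICK_MS = 30_000;
const SLOW_EVERY = 4; // 4 × 30 s = 2 min, matching today's stale-session cadence

export function startReaperTick(checks: ReaperChecks, intervalMs = TICK_MS) {
  let ticks = 0;
  const tick = () => {
    ticks++;
    checks.reapDeadPids();
    if (ticks % SLOW_EVERY === 0) {
      checks.reapHungGenerators();
      checks.reapAbandonedSessions();
    }
  };
  const timer = setInterval(tick, intervalMs);
  return { stop: () => clearInterval(timer), tick }; // tick exposed for tests
}
```

Worker boot calls `recoverStuckProcessing()` once, then `startReaperTick(…)`; the shutdown path calls `stop()`.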
### Verification
- [ ] `grep -rn "setInterval" src/services/worker*` → exactly one call site (inside `reaper.ts`).
- [ ] `grep -rn "staleSessionReaperInterval\|startOrphanReaper" src/` → zero.
- [ ] `grep -A3 "STALE_PROCESSING_THRESHOLD_MS" src/services/sqlite/PendingMessageStore.ts` → threshold used only in `recoverStuckProcessing`.
- [ ] Integration test: kill the SDK subprocess for a running session; within 30 s the ProcessRegistry has unregistered and SessionManager entry is gone.
- [ ] Boot recovery test: insert `pending_messages` row with `status=processing, started_processing_at_epoch=epoch-2hr`; start worker; assert row flipped back to `pending` within boot.
### Anti-pattern guards
- B: No polling loops. `claimNextMessage` must not do self-healing on each call.
- A: No `Reaper` class unless a second state ever has to live there. Start as a function.
### Blast radius
Worker lifecycle + SQLite claim path. Risk: reaper timing regression. Mitigation: keep the three thresholds identical to today.
---
## Phase 7 — Transcript watcher cleanup
**Outcome**: `fs.watch(parent_dir, {recursive: true})` instead of 5-s rescan. No `pendingTools` state map (match by `tool_use_id` at line boundary). Direct `ingestObservation` call; no HTTP loopback from inside worker.
### Context this phase needs
- `05-clean-flowcharts.md` section 3.12
- Verified-finding V18
- Phases 2, 5, 6 **MUST** be done.
### Tasks
1. **Rewrite `src/services/transcripts/watcher.ts`**:
- Replace periodic rescan (`setInterval(… 5000)`) with `fs.watch(parentDir, { recursive: true }, onFileEvent)`. Handle `rename` events to add new files, `change` events to tail existing ones.
- Delete `rescanIntervalMs` config option and the watcher-internal timer.
2. **Rewrite `src/services/transcripts/processor.ts`**:
- Remove `pendingTools: Map<string, {name?, input?}>` from `SessionState`.
- When a JSONL line is a `tool_use` → enqueue into a per-file map keyed by `tool_use_id`. When a later line is a `tool_result` with the same `tool_use_id`, emit one `IngestObservationPayload` and drop the entry. If a tool_use has no tool_result after N lines (say, 10 MB of JSONL read), timeout-log and drop.
3. **Replace HTTP loopback** with `import { ingestObservation } from '…/worker/ingest'` and direct call.
4. **Project-exclusion**: let `ingestObservation` handle it; remove the re-check in the transcript processor.
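The `tool_use`/`tool_result` pairing in Task 2 is a per-file map keyed by `tool_use_id`. A sketch — the line-shape field names here are assumptions; check them against the real JSONL before landing:

```typescript
interface TranscriptLine {
  type: 'tool_use' | 'tool_result' | string;
  tool_use_id?: string;
  name?: string;
  input?: unknown;
  output?: unknown;
}
interface PairedObservation { toolUseId: string; name?: string; input?: unknown; output?: unknown }

export function pairToolLines(lines: TranscriptLine[]): PairedObservation[] {
  const open = new Map<string, TranscriptLine>(); // tool_uses awaiting their result
  const out: PairedObservation[] = [];
  for (const line of lines) {
    if (line.type === 'tool_use' && line.tool_use_id) {
      open.set(line.tool_use_id, line);
    } else if (line.type === 'tool_result' && line.tool_use_id) {
      const use = open.get(line.tool_use_id);
      if (use) {
        open.delete(line.tool_use_id);
        out.push({ toolUseId: line.tool_use_id, name: use.name, input: use.input, output: line.output });
      }
    }
  }
  // Entries left in `open` are unmatched tool_uses: timeout-log and drop.
  return out;
}
```

Each emitted pair becomes one `IngestObservationPayload` handed straight to `ingestObservation`.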
### Verification
- [ ] `grep -rn "setInterval" src/services/transcripts/` → zero.
- [ ] `grep -rn "pendingTools" src/` → zero.
- [ ] `grep -rn "workerHttpRequest" src/services/transcripts/` → zero (the CLI handler `observation.ts` may still HTTP the worker; only the *in-process* loopback is forbidden).
- [ ] Integration: drop a new Cursor transcript file into the watched dir; within 1 s a `pending_messages` row appears.
### Anti-pattern guards
- B: No fallback polling "in case fs.watch misses an event". Parent-recursive watch is the contract.
- E: The transcript ingest path and the hook ingest path both call `ingestObservation`. One function, two callers.
### Blast radius
Transcript watcher only. Kept user-facing: Cursor, OpenCode, Gemini-CLI JSONL ingest still works.
---
## Phase 8 — Unified `renderObservations(obs, strategy)`
**Outcome**: One traversal, four strategy configs. `AgentFormatter`, `HumanFormatter`, `ResultFormatter`, and `CorpusRenderer` become strategy definitions that plug into the single renderer.
### Context this phase needs
- `05-clean-flowcharts.md` section 3.5 (context-injection) + Part 2 Decision D4
- Files: `src/services/context/formatters/{AgentFormatter,HumanFormatter}.ts`, `src/services/worker/search/ResultFormatter.ts`, `src/services/worker/knowledge/CorpusRenderer.ts`, all section renderers under `src/services/context/sections/`
### Tasks
1. **Design the renderer contract** in `src/services/rendering/renderObservations.ts`:
```ts
export interface RenderStrategy {
name: 'agent' | 'human' | 'search' | 'corpus';
columns: Array<'title'|'narrative'|'facts'|'file'|'date'|'session'|'tokens'>;
density: 'compact' | 'normal' | 'verbose';
grouping?: 'none' | 'by-day' | 'by-file' | 'by-session';
colorize?: boolean; // terminal ANSI
tokenBudget?: number;
}
export function renderObservations(obs: Observation[], strategy: RenderStrategy): string;
```
2. **Replace** each of the four formatters with a `RenderStrategy` object (e.g., `AgentContextStrategy`, `HumanContextStrategy`, `SearchResultStrategy`, `CorpusDetailStrategy`). The strategies live in their respective feature folders; the renderer is shared.
3. **Move one-off logic** (ANSI coloring, token budgeting, day-grouping) from the four formatters into the renderer, gated by strategy flags.
4. **Keep** mode filtering + section ordering in the *builder* (`ContextBuilder`) — only the final render step unifies.
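A reduced sketch of the shared renderer: only `columns`, `density`, and `by-day` grouping are handled, and the `Observation` shape is trimmed to illustrative fields:

```typescript
interface Observation { title: string; narrative: string; file?: string; date: string }

interface RenderStrategy {
  columns: Array<'title' | 'narrative' | 'file' | 'date'>;
  density: 'compact' | 'normal';
  grouping?: 'none' | 'by-day';
}

export function renderObservations(obs: Observation[], s: RenderStrategy): string {
  // One row per observation, built from the strategy's column list.
  const row = (o: Observation) =>
    s.columns.map(c => o[c] ?? '').filter(Boolean)
      .join(s.density === 'compact' ? ' | ' : '\n');

  if (s.grouping === 'by-day') {
    const byDay = new Map<string, Observation[]>();
    for (const o of obs) {
      if (!byDay.has(o.date)) byDay.set(o.date, []);
      byDay.get(o.date)!.push(o);
    }
    return [...byDay].map(([day, items]) => `## ${day}\n${items.map(row).join('\n')}`).join('\n\n');
  }
  return obs.map(row).join(s.density === 'compact' ? '\n' : '\n\n');
}
```

The four formatters then shrink to `export const FooStrategy: RenderStrategy = …` files, exactly as anti-pattern guard E demands.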
### Verification
- [ ] `grep -rn "formatObservation\|renderObservation" src/ | wc -l` — one shared renderer, four strategy files.
- [ ] Snapshot tests: for each strategy, feed the same fixture `Observation[]` and assert output is byte-equal to the old formatter's output.
- [ ] `npm run build-and-sync` + SessionStart injects a context block identical to pre-refactor bytes (modulo strategy-flagged differences).
### Anti-pattern guards
- E: No "almost the same" paths remain. All four formatters end up as thin `export const FooStrategy: RenderStrategy = …` files.
- A: No `RendererFactory`. The renderer is a pure function.
### Blast radius
Pure code reorganization, lowest risk. Snapshot tests are the safety net.
---
## Phase 9 — SQLite consolidation
**Outcome**: Fresh DBs use `schema.sql` (current state). Upgrade-only migrations run for old DBs. `UNIQUE(session_id, tool_use_id)` added. 30-s content-hash dedup window removed. Python repair script gone; user-facing `claude-mem repair` command added.
### Context this phase needs
- `05-clean-flowcharts.md` section 3.3, Part 5 ledger rows for SQLite
- Verified-findings V12, V13, V14
- `src/services/sqlite/migrations/runner.ts`, `src/services/sqlite/observations/store.ts`, `tests/services/sqlite/schema-repair.test.ts`
### Tasks
1. **Add `observations.tool_use_id` column** in a new migration (if not already there — grep the schema). Add `UNIQUE(session_id, tool_use_id)` constraint. For observations without a `tool_use_id` (legacy rows), set a synthetic value like `legacy:<id>` so the UNIQUE doesn't collide.
2. **Rewrite `observations/store.ts`**:
- Use `INSERT ... ON CONFLICT (session_id, tool_use_id) DO NOTHING RETURNING id`.
- On conflict, re-SELECT the existing row and return its `id`. Idempotent.
- Delete `DEDUP_WINDOW_MS`, `findDuplicateObservation`, and the content-hash dedup query. **Keep** the `content_hash` column — it's useful for cross-machine dedup analytics; just don't use it as a dedup gate.
3. **Create `src/services/sqlite/schema.sql`** with the current schema. On fresh DB, run `schema.sql` then write `schema_versions` row at current version. On existing DB, skip `schema.sql` and run only migrations with `version > max(schema_versions.version)`.
4. **Delete the Python repair path** (`execSync('python3 …')`). Add a new CLI subcommand `claude-mem repair` that runs the Python script on demand — this is for users who hit corruption from v<X. Document in a new `docs/public/troubleshooting/repair.mdx` page.
5. **Consolidate migration boilerplate**. 22+ migrations with `CREATE TABLE IF NOT EXISTS` patterns become: `schema.sql` covers everything; remaining upgrade migrations only do `ALTER TABLE` / `CREATE INDEX IF NOT EXISTS` / data migrations.
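The fresh-vs-upgrade decision in Task 3 is a small pure function. A sketch with migrations as hypothetical `{ version, run }` records:

```typescript
interface Migration { version: number; run(): void }

// currentVersion = max(schema_versions.version), or null on a fresh DB.
export function selectMigrations(
  currentVersion: number | null,
  all: Migration[],
): { runSchemaSql: boolean; toRun: Migration[] } {
  if (currentVersion === null) {
    // Fresh DB: schema.sql brings it straight to the current shape; no replay.
    return { runSchemaSql: true, toRun: [] };
  }
  // Existing DB: skip schema.sql, run only newer upgrade migrations in order.
  return {
    runSchemaSql: false,
    toRun: all.filter(m => m.version > currentVersion).sort((a, b) => a.version - b.version),
  };
}
```

The runner then writes a `schema_versions` row after whichever path executed.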
### Verification
- [ ] Fresh-install test: delete `~/.claude-mem/claude-mem.db`; start worker; assert `schema_versions.version = N` and all expected tables exist.
- [ ] Upgrade test: start worker on an old DB from v6.0; assert all migrations run and the final schema matches `schema.sql`.
- [ ] Dup test: insert two `observations` rows with the same `(session_id, tool_use_id)`; assert second INSERT returns the first row's id and no duplicate row exists.
- [ ] `grep -rn "execSync.*python" src/` → zero.
- [ ] `claude-mem repair` command executes without error on a known-corrupt DB fixture.
### Anti-pattern guards
- A: No "schema migration framework". bun:sqlite + a `schema_versions` table + a list of migration functions is enough.
- E: Don't keep both content-hash dedup and UNIQUE(session_id, tool_use_id) as two gates. Pick one (the constraint).
### Blast radius
Highest-risk migration in the plan. Requires backfill of `tool_use_id` for rows that don't have it. Run in a staged release with the `claude-mem repair` fallback.
---
## Phase 10 — Chroma rewrite
**Outcome**: One doc per observation (title + narrative + facts concatenated). Stable ID `obs:<sqlite_rowid>`. Upsert instead of delete-then-add. `chroma_synced` boolean column on `observations`; backfill only rows where the flag is false. Full-project scan on boot deleted.
### Context this phase needs
- `05-clean-flowcharts.md` section 3.4
- Verified-findings V15, V16, V17
- `src/services/sync/ChromaSync.ts:125–545`
- Phase 9 **MUST** be done (so `chroma_synced` migration can land alongside).
### Tasks
1. **Migration**: add `chroma_synced INTEGER DEFAULT 0` column to `observations` and `session_summaries`.
2. **Rewrite `ChromaSync.formatObservationAsDoc`**: one doc per observation. Text = `title + "\n\n" + narrative + "\n\n" + facts.join("\n")`. ID = `obs:${sqliteRowId}`. Metadata keeps project, session_id, timestamp, type. Same for summaries (one doc, stable ID).
3. **Replace `chromaSync.syncObservation`** write path: `chroma_mcp.upsert(id, text, metadata)`. On success: `UPDATE observations SET chroma_synced=1 WHERE id=?`. On failure: `logger.warn`, leave flag 0.
4. **Replace `ensureBackfilled` + `runBackfillPipeline` + `getExistingChromaIds`** with a simple `backfillUnsynced(limit=1000)` called **once at boot**. Query: `SELECT id FROM observations WHERE chroma_synced=0 LIMIT 1000`. For each: format → upsert → mark.
5. **Delete** `backfillAllProjects` (static), `ensureBackfilled`, `runBackfillPipeline`, `getExistingChromaIds`, `formatObservationsAsDocs`, `formatSummariesAsDocs` (multi-doc), and the delete-then-add conflict handler.
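The one-doc-per-observation shape from Task 2, sketched; the `ObservationRow` fields and metadata keys mirror the plan's description, but the exact column names are assumptions:

```typescript
interface ObservationRow {
  id: number;          // sqlite rowid — the stable part of the Chroma ID
  title: string;
  narrative: string;
  facts: string[];
  project: string;
  session_id: string;
  created_at_epoch: number;
  type: string;
}

export function formatObservationAsDoc(o: ObservationRow) {
  return {
    id: `obs:${o.id}`, // stable: re-syncs upsert over the same doc instead of duplicating
    text: [o.title, o.narrative, o.facts.join('\n')].join('\n\n'),
    metadata: {
      project: o.project,
      session_id: o.session_id,
      timestamp: o.created_at_epoch,
      type: o.type,
    },
  };
}
```

The write path becomes upsert(doc) then `UPDATE observations SET chroma_synced=1 WHERE id=?`; on Chroma failure the flag stays 0 and the boot-time `backfillUnsynced` retries.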
### Verification
- [ ] Chroma index contains one doc per observation (not 7). Query Chroma directly: `chroma_count_documents(collection)` = `SELECT COUNT(*) FROM observations WHERE chroma_synced=1`.
- [ ] Idempotent re-sync: call `syncObservation` twice with same ID; assert no conflict, one doc.
- [ ] Boot with Chroma down: observations sync'd to SQLite normally, `chroma_synced=0`. Start Chroma, restart worker: those rows upserted within boot.
- [ ] `grep -rn "backfillAllProjects\|ensureBackfilled\|getExistingChromaIds" src/` → zero.
### Anti-pattern guards
- C: On Chroma failure at write time, do **not** throw — leave flag 0 and move on. The backfill path covers recovery.
- A: No `ChromaBackfillScheduler`. One function, called at boot, done.
### Blast radius
Chroma index regenerates under the new doc shape. Users see the old index until the first boot-time backfill completes (may take minutes on large corpora).
---
## Phase 11 — Endpoint consolidation
**Outcome**: 10 session endpoints → 4. `/api/session/start` returns context + semantic in one call. `/api/session/end` blocks until summary written or 110-s timeout (no hook-side polling). `/api/context/inject` + `/api/context/semantic` deleted or folded.
### Context this phase needs
- `05-clean-flowcharts.md` section 3.1, section 3.9 (Routes inventory), Part 2 Decision D6
- Verified-findings V8, V9, V10
- `src/services/worker/http/routes/SessionRoutes.ts`, `src/services/worker/http/routes/SearchRoutes.ts`, `src/cli/handlers/{context,user-message,summarize,session-complete}.ts`
### Tasks
1. **New endpoints** (4 total):
- `POST /api/session/start` — body: `{project, claudeSessionId}`. Returns `{sessionDbId, contextMarkdown, semanticMarkdown}`. Internally: calls `ContextBuilder.generateContext` + `SearchOrchestrator.search`.
- `POST /api/session/prompt` — body: `{sessionDbId, prompt}`. Returns `{promptId}`.
- `POST /api/session/observation` — body: `{sessionDbId, tool_use_id, name, input, output}`. Returns `{observationId|null, skipped}`.
- `POST /api/session/end` — body: `{sessionDbId, last_assistant_message}`. **Blocks** until the queue is drained and the summary row is written (or 110-s timeout). Returns `{summaryId|null}`.
2. **Blocking `/api/session/end`**: implement via a per-session `Deferred<SummaryResult>`. When `ResponseProcessor` writes the summary row, resolve the deferred. Route handler `await`s the promise with a 110-s race.
3. **Delete the old 10 endpoints** under `/sessions/:sessionDbId/*` and `/api/sessions/*` after all hook-side callers are switched. Also delete `/api/context/inject` and `/api/context/semantic`.
4. **Rewrite hook handlers** (`context.ts`, `user-message.ts`, `summarize.ts`, `session-complete.ts`) to use the 4 new endpoints. Delete the 500-ms polling loop in `summarize.ts:117-150`.
5. **Hook-side `ensureWorkerRunning` cache**: create `src/hooks/worker-cache.ts` that caches `alive=true` in module scope for the hook process. First call spawns/HTTPs `/health`; subsequent calls skip. Switch all 8 handlers to import from this module.
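Task 2's blocking mechanism can be sketched as below. `Deferred`, `SummaryResult`, `awaitSummary`, and `onSummaryWritten` are illustrative names for the pattern, not the shipped API:

```typescript
// Per-session deferred that the /api/session/end handler awaits,
// raced against the 110-s timeout. ResponseProcessor resolves it
// when the summary row is written.
type SummaryResult = { summaryId: number | null };

class Deferred<T> {
  promise: Promise<T>;
  resolve!: (value: T) => void;
  constructor() {
    this.promise = new Promise((res) => (this.resolve = res));
  }
}

const pendingEnds = new Map<number, Deferred<SummaryResult>>();

// Route-handler side: block until the summary lands or the timeout fires.
async function awaitSummary(
  sessionDbId: number,
  timeoutMs = 110_000,
): Promise<SummaryResult> {
  const deferred = new Deferred<SummaryResult>();
  pendingEnds.set(sessionDbId, deferred);
  let timer: ReturnType<typeof setTimeout> | undefined;
  const timeout = new Promise<SummaryResult>((res) => {
    timer = setTimeout(() => res({ summaryId: null }), timeoutMs);
  });
  try {
    return await Promise.race([deferred.promise, timeout]);
  } finally {
    clearTimeout(timer);
    pendingEnds.delete(sessionDbId);
  }
}

// ResponseProcessor side: resolve the deferred for that session.
function onSummaryWritten(sessionDbId: number, summaryId: number): void {
  pendingEnds.get(sessionDbId)?.resolve({ summaryId });
}
```

On timeout the handler returns `{summaryId: null}` rather than hanging the hook, which preserves the Stop hook's upper bound without any polling.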
### Verification
- [ ] `grep -n "router\.\(get\|post\|delete\)" src/services/worker/http/routes/SessionRoutes.ts` → 4 routes.
- [ ] `grep -n "/api/context/inject\|/api/context/semantic" src/` → zero.
- [ ] `grep -n "POLL_INTERVAL_MS\|MAX_WAIT_FOR_SUMMARY_MS" src/cli/handlers/` → zero.
- [ ] Integration: run a full session lifecycle; assert Stop hook returns within ~110 s (or earlier) with a `summaryId`, and no /status polling requests hit the worker.
- [ ] Perf: SessionStart latency ≤ previous latency (one request vs two).
### Anti-pattern guards
- B: No polling. Blocking + timeout replaces it.
- D: `/api/session/start` must not be a facade over `/api/context/inject`; the old endpoints are deleted.
### Blast radius
Hook ↔ worker HTTP contract changes. Needs coordinated plugin rebuild (`npm run build-and-sync`). Old hooks calling old endpoints will 404 — land after a version bump.
---
## Phase 12 — Zod validator middleware
**Outcome**: Per-route Zod schema + one `validateBody(schema)` middleware. Per-route hand-rolled validation gone.
### Context this phase needs
- `05-clean-flowcharts.md` section 3.9
- `src/services/worker/http/routes/*.ts` (8 files with inline validation)
### Tasks
1. **Add `zod`** to `package.json` dependencies (confirm not already present; if it is, skip).
2. **Create `src/services/worker/http/middleware/validateBody.ts`**:
```ts
export function validateBody<T>(schema: z.ZodType<T>): RequestHandler { … }
```
On parse failure: `res.status(400).json({ error: 'validation_failed', fields: result.error.flatten() })`.
3. **Per-route schemas** in a parallel `schemas/` directory (or inline at top of each route file). One `z.object({…})` per endpoint.
4. **Delete** per-route boilerplate: manual `typeof x !== 'string'` checks, `if (!body.foo) return res.status(400)…`.
### Verification
- [ ] `grep -n "res.status(400)" src/services/worker/http/routes/ | wc -l` significantly reduced (only routes that return 400 for domain reasons, not shape validation).
- [ ] Error-shape tests: each endpoint, with invalid body, returns `{error, message, code, fields}`.
- [ ] No behavioral regression on happy path (snapshot test of responses).
### Anti-pattern guards
- A: Don't invent `ZodUtil.assertBody` — use `safeParse` directly.
- E: Single middleware, not one per route.
### Blast radius
HTTP error shape might change slightly (field names in 400s). Client (viewer UI) must tolerate `fields` key.
---
## Phase 13 — KnowledgeAgent simplification
**Outcome**: No `session_id` persistence in `corpus.json`. No `prime` endpoint. No auto-reprime regex. `build` IS prime; every `query` loads the corpus fresh as system prompt.
### Context this phase needs
- `05-clean-flowcharts.md` section 3.11
- `src/services/worker/knowledge/KnowledgeAgent.ts`, `CorpusStore.ts`, `CorpusBuilder.ts`, corresponding routes in `CorpusRoutes.ts`
### Tasks
1. **Delete** `KnowledgeAgent.prime` and the `reprime` endpoint. Update the OpenAPI/route table to drop them.
2. **Simplify `CorpusStore`**: corpus JSON contains `{name, filters, renderedCorpus, generatedAt}`. No `session_id`.
3. **Rewrite `KnowledgeAgent.query`** to always pass `systemPrompt = renderedCorpus` to the SDK. Claude prompt-caching reduces cost when the same corpus is queried repeatedly within the 5-min TTL.
4. **Delete** the session-expiration regex match and auto-reprime path.
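The simplified record and a tolerant loader can be sketched as follows; field names come from task 2, while `loadCorpus` itself is illustrative:

```typescript
// Corpus JSON after task 2: no session_id. A legacy session_id field
// is dropped on read (migrate-on-read) rather than breaking the load.
interface CorpusRecord {
  name: string;
  filters: Record<string, unknown>;
  renderedCorpus: string;
  generatedAt: string;
}

function loadCorpus(raw: string): CorpusRecord {
  const parsed = JSON.parse(raw) as CorpusRecord & { session_id?: string };
  // Destructure only the kept fields; session_id silently falls away.
  const { name, filters, renderedCorpus, generatedAt } = parsed;
  return { name, filters, renderedCorpus, generatedAt };
}
```

Every `query` then passes `renderedCorpus` straight through as the system prompt; no session state survives between calls.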
### Verification
- [ ] `grep -n "session_id" src/services/worker/knowledge/` → zero.
- [ ] `grep -n "reprime\|auto.*reprime" src/` → zero.
- [ ] Cost test: query the same corpus 3× within 5 min; assert cache hits (the SDK returns `cache_read_input_tokens > 0`).
- [ ] `POST /api/corpus/:name/rebuild` still works; `POST /api/corpus/:name/prime` returns 404.
### Anti-pattern guards
- C: Don't try to "detect session expiration". Always pass fresh system prompt; let the SDK cache decide.
### Blast radius
Corpus JSON format changes (drops `session_id`). Existing corpora still load (extra field ignored or migrated on read).
---
## Phase 14 — HTTP cleanup
**Outcome**: Rate limiter deleted. Static file reads cached at boot.
### Context this phase needs
- `05-clean-flowcharts.md` section 3.9
- Verified-finding V20
- `src/services/worker/http/middleware.ts:45-79`, `ViewerRoutes.ts`
### Tasks
1. **Delete `src/services/worker/http/middleware.ts:45-79`** (the rate limiter) and its registration in `Middleware.ts`.
2. **Cache `viewer.html`** and `/api/instructions` content in memory at boot; serve from `Buffer` instead of `fs.readFile`.
3. **Delete** the legacy `SessionRoutes.handleObservations` no-privacy-strip endpoint (already handled in Phase 2 if the route is rewired; this is the cleanup pass).
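Task 2's boot-time cache can be sketched as below; `cacheAtBoot` and `serveStatic` are illustrative names, with `readOnce` standing in for a one-time `fs.readFileSync`:

```typescript
// Read static assets exactly once at boot; serve the cached Buffer
// afterwards so the request path never touches the filesystem.
const staticCache = new Map<string, Buffer>();

function cacheAtBoot(path: string, readOnce: (path: string) => Buffer): void {
  staticCache.set(path, readOnce(path));
}

function serveStatic(path: string): Buffer {
  const cached = staticCache.get(path);
  // Fail loudly: a miss means the boot sequence forgot to cache this asset.
  if (!cached) throw new Error(`asset not cached at boot: ${path}`);
  return cached;
}
```

A restart is required to pick up an edited `viewer.html`, which is acceptable for a worker whose assets only change on plugin rebuild.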
### Verification
- [ ] `grep -n "RATE_LIMIT_WINDOW_MS\|RATE_LIMIT_MAX_REQUESTS" src/` → zero.
- [ ] Request path: `viewer.html` hits don't trigger `fs.readFile` calls (verify with lsof or a log statement).
### Anti-pattern guards
- B: Don't re-introduce the rate limiter as a "config flag". Localhost trust model is explicit.
### Blast radius
Minimal. The rate limiter was theater on a localhost server.
---
## Phase 15 — Final verification
**Outcome**: Whole system behaves per the clean flowcharts. Timer census reads 1 repeating timer. No polling loops. No silent fallbacks. Deleted-lines counter ≥ 2500 net.
### Tasks
1. **Run the timer census**:
```
grep -rn "setInterval\|setTimeout.*recursive\|setTimeout.*repeat" src/ | grep -v test
```
Expected: one `setInterval` in `reaper.ts`; one per-session idle timeout; one EventSource reconnect (UI); no others. Compare against `05-clean-flowcharts.md` Part 4.
2. **Anti-pattern grep pass**:
- `grep -rn "coerceObservationToSummary\|consecutiveSummaryFailures\|DEDUP_WINDOW_MS\|STALE_PROCESSING_THRESHOLD_MS.*claimNextMessage\|backfillAllProjects\|getExistingChromaIds\|stripMemoryTagsFromJson\|stripMemoryTagsFromPrompt\|POLL_INTERVAL_MS" src/` → zero matches.
- `grep -rn "res.status(503)" src/` includes `chroma_unavailable` path (positive check).
3. **Deleted-lines count**: `git diff main --stat | tail -1` — compare against the audit's Part 5 estimate (~2500 net).
4. **Run full test suite**: `npm test`.
5. **Run plugin end-to-end**: `npm run build-and-sync` → trigger all 5 lifecycle hooks in a real Claude Code session → verify SSE events, viewer UI renders, search works, corpus builds + queries, transcript watcher picks up a synthetic Cursor log.
6. **Document**: update `docs/public/architecture.mdx` (or equivalent) to point at `05-clean-flowcharts.md` as the canonical architecture doc.
### Verification
- [ ] Timer census matches `05-clean-flowcharts.md` Part 4 "after" column.
- [ ] All grep anti-pattern checks return zero matches.
- [ ] Full test suite green.
- [ ] End-to-end plugin test passes.
---
## Phase dependency graph
```
P1 ─┐
    ├─> P2 ─┬─> P3
    │       ├─> P6 ─> P7
    │       └─> P11
P4 (independent)
P5 ──> P6 (already sequenced above)
P8 (independent — can run anytime)
P9 ──> P10
P11 ──> P12 (Zod lands after endpoint shape is final)
P13 (independent)
P14 (after P11 so legacy route delete is clean)
P15 gates merge.
```
Parallelizable tracks: (P1→P2→P3), (P4), (P5→P6→P7), (P8), (P9→P10), (P13). Merge order: P1,P2,P3,P4,P5,P6,P7,P8,P9,P10,P11,P12,P13,P14,P15.
---
## Estimated effort
Per `05-clean-flowcharts.md` Part 6: ~18 engineer-days for full clean-through. Phase 1 alone closes the P1 security gap (<1 day).
## Success criteria
- One `setInterval` in the worker codebase.
- Zero polling loops on the hook side.
- 40 bullshit items from `05-clean-flowcharts.md` Part 1 all deleted (verified by grep).
- All 12 user-facing features from Pathfinder Phase 0 still work.
- Net LOC deleted ≥ 1800.
@@ -0,0 +1,214 @@
# Pathfinder Phase 7: Master Orchestration Plan
**Date**: 2026-04-22
**Produced by**: `/make-plan` skill invoked on `05-clean-flowcharts.md`
**Supersedes**: `06-implementation-plan.md` as the top-level execution doc (06 is kept as Phase 0 Documentation-Discovery evidence; its verified-findings V1-V20 are still canonical and are re-cited from each per-flowchart plan).
> **For `/do` execution, read `09-execution-runbook.md` first** — it's the live runbook with drift-prevention rules, preflight status, and tier-by-tier checkboxes. This master plan describes the dispatch *strategy*; the runbook tracks the *state*.
---
## Why this plan exists
`06-implementation-plan.md` was written *without* invoking the `/make-plan` skill, so it collapsed 12 distinct flowcharts into 15 cross-cutting phases and lost per-flowchart isolation. A new chat context executing a single phase from 06 had to skim across multiple flowchart sections to piece its work together, which is the exact failure mode `/make-plan` exists to prevent.
**This plan fixes that by one-to-one mapping**: every flowchart in `05-clean-flowcharts.md` gets its own self-contained plan document in `07-plans/`, authored by a subagent that runs `/make-plan` methodology against that single flowchart. Any chat session can then execute any per-flowchart plan cold, with all design references, verified findings, and copy-ready snippets inlined.
---
## Phase 0 — Documentation Discovery (consolidated)
Sources and verified findings are not re-derived here — they already exist:
- **Design sources**: `05-clean-flowcharts.md` (canonical flowcharts + deletion ledger + execution order), `02-duplication-report.md` (cross-feature duplication), `03-unified-proposal.md` (U1-U8 targets), `00-features.md` (feature boundary map).
- **Verified-findings ledger**: `06-implementation-plan.md` Phase 0 table (V1-V20). Every per-flowchart plan **must** cite the V-numbers that apply to its scope and use the V-number reality over the audit's claim whenever they disagree.
- **Allowed APIs**: `06-implementation-plan.md` Phase 0 "Allowed APIs" section (`bun:sqlite`, Express 4, Zod, `fs.watch`, Claude Agent SDK). No new libraries are adopted in Phase 7; if a per-flowchart plan needs one it surfaces the request and stops.
- **Anti-patterns**: `06-implementation-plan.md` Phase 0 "Anti-patterns" (A-E). Every per-flowchart plan re-lists the subset of A-E that applies to it.
---
## Split strategy — 12 flowcharts, 12 plans
Each section of Part 3 in `05-clean-flowcharts.md` becomes exactly one plan document. The `01/` flowchart file in `PATHFINDER-2026-04-21/01-flowcharts/` is the "before" reference; the `05` section is the "after" design; the `07-plans/NN-<slug>.md` is the executable plan.
| # | Plan file | Flowchart in 05 | Original flowchart file | Primary 06 phases covered |
|---|---|---|---|---|
| 01 | `07-plans/01-privacy-tag-filtering.md` | 3.2 | `privacy-tag-filtering.md` | Phase 1 |
| 02 | `07-plans/02-sqlite-persistence.md` | 3.3 | `sqlite-persistence.md` | Phase 9 |
| 03 | `07-plans/03-response-parsing-storage.md` | 3.7 | `response-parsing-storage.md` | Phase 3 |
| 04 | `07-plans/04-vector-search-sync.md` | 3.4 | `vector-search-sync.md` | Phase 10 |
| 05 | `07-plans/05-context-injection-engine.md` | 3.5 | `context-injection-engine.md` | Phase 8 (partial) |
| 06 | `07-plans/06-hybrid-search-orchestration.md` | 3.6 | `hybrid-search-orchestration.md` | Phase 4, Phase 8 (partial) |
| 07 | `07-plans/07-session-lifecycle-management.md` | 3.8 | `session-lifecycle-management.md` | Phases 5, 6 |
| 08 | `07-plans/08-transcript-watcher-integration.md` | 3.12 | `transcript-watcher-integration.md` | Phase 7 |
| 09 | `07-plans/09-lifecycle-hooks.md` | 3.1 | `lifecycle-hooks.md` | Phases 2, 11 |
| 10 | `07-plans/10-knowledge-corpus-builder.md` | 3.11 | `knowledge-corpus-builder.md` | Phase 13 |
| 11 | `07-plans/11-http-server-routes.md` | 3.9 | `http-server-routes.md` | Phases 12, 14 |
| 12 | `07-plans/12-viewer-ui-layer.md` | 3.10 | `viewer-ui-layer.md` | — (no-change lockdown) |
The numeric prefix on each plan file encodes the **dispatch-and-execution order** (see "Dependency ordering" below). Filename slugs match the flowchart section title for easy grep.
---
## Dispatch strategy — parallel subagents, one per flowchart
### Why subagents (and not one monolithic author)
Each plan needs independent grep-verification against the live codebase (file:line citations, API confirmations, API-non-existence checks). Running these in parallel divides the codebase scan cost by 12 and forces each plan to stand alone — the subagent has no shared context, so anything it omits would not be available in a downstream `/do` execution either.
### Subagent contract (MANDATORY for every dispatch)
Each subagent receives a prompt with the following five fields, exactly matching the `/make-plan` skill's Subagent Reporting Contract:
1. **Target flowchart**: Section number in `05-clean-flowcharts.md` + the corresponding `01-flowcharts/*.md` "before" file + the output path in `07-plans/`.
2. **Reading list**: `05` (read full file; the section under plan is the authoritative "after" design), `06` Phase 0 ledger (V1-V20), the live codebase files cited in `05` (verify file:line; do not copy from the audit without re-grep).
3. **Dependencies**: Upstream flowcharts whose plans must land first, downstream flowcharts that depend on this one (copied from the dependency table below).
4. **Phase contract**: Every phase in the output plan must include (a) What to implement, framed as *copy from doc:line*; (b) Documentation references (05 section + V-numbers + live file:line); (c) Verification checklist (grep counts, tests); (d) Anti-pattern guards (subset of 06 Phase 0 A-E).
5. **Reporting contract** — the plan doc opens with:
- **Sources consulted** — every file/URL read, with line ranges.
- **Concrete findings** — exact API signatures, exact file:line locations, differences from the audit.
- **Copy-ready snippet locations** — files and line ranges a future `/do` run will copy from.
- **Confidence + gaps** — what the subagent could not verify and would need a follow-up read to confirm.
A plan doc missing any of the five reporting-contract fields is **rejected** and the subagent is redispatched.
### Parallelism envelope
All 12 subagents dispatch in one batch. They do not talk to each other. Cross-flowchart ordering concerns are handled by each plan citing its dependencies in its header, not by serializing the authoring work. Execution order (via `/do`) is the dependency order below; **authoring order is irrelevant** as long as every plan header lists its deps.
---
## Dependency ordering (for `/do` execution, not for authoring)
Derived from `05-clean-flowcharts.md` Part 6 and reconciled with `06-implementation-plan.md` Phase-dependency graph (line 659+).
```
01 privacy-tag-filtering ──┬──► 08 transcript-watcher
                           ├──► 09 lifecycle-hooks
                           └──► 07 session-lifecycle
02 sqlite-persistence ──┬──► 03 response-parsing
                        ├──► 04 vector-search-sync (needs chroma_synced migration)
                        └──► 07 session-lifecycle (needs boot-recovery path)
03 response-parsing-storage ──► 07 session-lifecycle (parser contract used by ResponseProcessor)
05 context-injection-engine ──┬──► 06 hybrid-search (both consume U2 renderObservations)
                              └──► 10 knowledge-corpus (CorpusDetailStrategy is a renderObservations strategy)
06 hybrid-search-orchestration ──► 10 knowledge-corpus (CorpusBuilder calls SearchOrchestrator)
07 session-lifecycle-management ──► 09 lifecycle-hooks (blocking /api/session/end)
11 http-server-routes ── independent of all except 12 (Zod middleware wraps existing routes)
12 viewer-ui-layer ── independent; lockdown-only plan (no code changes planned)
```
**Execution ladder (top-down for `/do`):**
1. `01-privacy-tag-filtering` — unblocks everything that ingests text.
2. `02-sqlite-persistence` — unblocks every downstream DB change.
3. `03-response-parsing-storage` — unblocks session lifecycle.
4. `04-vector-search-sync` — requires 02's `chroma_synced` migration.
5. `05-context-injection-engine` — introduces U2 renderer; unblocks 06 and 10.
6. `06-hybrid-search-orchestration` — consumes U2 renderer; unblocks 10.
7. `07-session-lifecycle-management` — biggest cull; requires 01, 02, 03.
8. `08-transcript-watcher-integration` — requires 01 (shared ingest helper).
9. `09-lifecycle-hooks` — requires 01, 07 (blocking endpoint must exist).
10. `10-knowledge-corpus-builder` — requires 05, 06.
11. `11-http-server-routes` — independent; land any time after 01 for consistency.
12. `12-viewer-ui-layer` — lockdown doc; no code changes; land last as final regression gate.
If a downstream plan cannot be executed because an upstream one hasn't landed, `/do` halts that branch and reports the missing prerequisite. Parallel execution of independent branches (e.g., 04 and 07) is allowed.
---
## Aggregation / reconciliation step (post-dispatch)
After all 12 per-flowchart plans have been authored, the orchestrator (a human or a follow-up `/make-plan` session) performs these reconciliation checks:
1. **Cross-plan citation consistency** — every file:line cited in more than one plan must resolve to the same code. Any divergence indicates two subagents read different commits; re-dispatch the one citing the older line.
2. **Deletion-ledger totals** — sum the "lines deleted" claimed by all 12 plans; must be within ±15% of `05` Part 5's `-2560` net-lines figure. A large overshoot means duplicate deletion claims (two plans claiming ownership of the same file); the aggregator resolves ownership.
3. **Endpoint inventory** — collate every `/api/*` endpoint claimed as added/removed/renamed across 09 and 11; must equal `05` 3.1's "8→4" and `05` 3.9's route table exactly.
4. **Timer census** — aggregate every `setInterval`/`setTimeout` each plan claims to delete vs. keep; must match `05` Part 4 (3 repeating background timers → **0**, replaced by event-driven handlers + per-session `setTimeout`s + boot-once reconciliation).
5. **Confidence/Gap roll-up** — extract every plan's "Confidence + gaps" block into one aggregated gaps ledger. Any gap blocking execution triggers a targeted discovery subagent before `/do` runs.
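Check 2 is simple arithmetic; a sketch of the tolerance test (function name illustrative, numbers only for demonstration):

```typescript
// Sum the per-plan deletion claims and flag a roll-up that lands
// outside the ±15% band around the Part 5 net-lines figure.
function deletionRollupOk(
  claimed: number[],
  target: number,
  tolerance = 0.15,
): boolean {
  const total = claimed.reduce((sum, n) => sum + n, 0);
  return Math.abs(total - target) <= Math.abs(target) * tolerance;
}
```

An overshoot past the band is the signal to hunt for two plans claiming the same file.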
Reconciliation writes `PATHFINDER-2026-04-21/08-reconciliation.md` before `/do` executes anything.
---
## Per-flowchart dispatch payload template
Every subagent dispatched in this batch receives this prompt scaffold (with `<FIELDS>` substituted):
```
You are implementing the /make-plan skill methodology on ONE flowchart from claude-mem
v6.5.0's brutal-audit refactor. You have no context from prior sessions; treat this
prompt as self-contained.
TARGET:
- Flowchart section: <SECTION> of PATHFINDER-2026-04-21/05-clean-flowcharts.md
("<FLOWCHART NAME>")
- Before-state file: PATHFINDER-2026-04-21/01-flowcharts/<BEFORE>.md
- Output path: PATHFINDER-2026-04-21/07-plans/<NN>-<SLUG>.md
DEPENDENCIES (cite in plan header):
- Upstream (must land before): <UPSTREAM LIST>
- Downstream (depends on this): <DOWNSTREAM LIST>
READING LIST (all five required):
1. PATHFINDER-2026-04-21/05-clean-flowcharts.md — full file for cross-refs; section
<SECTION> is the authoritative "after" design.
2. PATHFINDER-2026-04-21/06-implementation-plan.md — Phase 0 verified-findings
V1..V20 (lines ~26-47). Cite V-numbers whose scope touches this flowchart and
prefer V-reality over audit claims.
3. PATHFINDER-2026-04-21/01-flowcharts/<BEFORE>.md — "before" diagram.
4. Live codebase files cited in section <SECTION> — re-grep every file:line before
trusting it.
5. Any dependency plans already in PATHFINDER-2026-04-21/07-plans/ — for cross-plan
citation consistency.
PHASE CONTRACT (every phase in the plan):
(a) What to implement — framed as "Copy from <file>:<line-range> into <dest>",
never "transform existing code".
(b) Documentation references — 05 section + V-numbers + live file:line.
(c) Verification checklist — concrete greps (with expected counts) + tests to run.
(d) Anti-pattern guards — subset of 06 Phase 0 A-E relevant to this phase.
REPORTING CONTRACT (plan doc opens with four blocks):
- Sources consulted (files/URLs + line ranges)
- Concrete findings (exact APIs, file:line, deltas from audit)
- Copy-ready snippet locations (files a /do run will copy from)
- Confidence + gaps (what you could not verify; what a follow-up discovery must close)
CONSTRAINTS:
- Do NOT invent APIs. If a method "should exist", grep the class first and report
absence in the Gaps block.
- Do NOT widen scope beyond <SECTION>'s "Kept user-facing" list.
- Cite exact file:line for every change; never write "somewhere in SearchManager".
- Plans must be /do-executable: each phase self-contained, copy-ready, verifiable.
WRITE the plan to PATHFINDER-2026-04-21/07-plans/<NN>-<SLUG>.md and stop. Do NOT
edit source code. Do NOT run /do. Report back with a one-paragraph summary
including the plan's phase count, total expected lines deleted, and top 1-2 gaps.
```
---
## What this orchestration plan does NOT do
- It does not edit source code. All source edits happen inside per-flowchart plans, executed by `/do` in a later session.
- It does not produce a consolidated deletion PR. Each per-flowchart plan is a separate landable unit.
- It does not redo the brutal audit. `05-clean-flowcharts.md` is the design authority; this plan only restructures its execution.
- It does not obsolete `06-implementation-plan.md`. 06's Phase 0 (verified-findings V1-V20) remains the canonical discovery record. 06's Phases 1-15 are superseded by the 12 per-flowchart plans, which preserve the same deletion targets but repackage them by flowchart boundary.
---
## Success criteria for Phase 7 (this orchestration plan)
- [ ] 12 plan documents exist under `PATHFINDER-2026-04-21/07-plans/`.
- [ ] Every plan opens with the four-block reporting contract (sources / findings / snippets / confidence).
- [ ] Every plan cites at least one V-number from 06's verified-findings ledger (or states explicitly that none apply).
- [ ] Every plan's phase has all four required sub-fields (What / Docs / Verification / Anti-pattern).
- [ ] Deletion-ledger roll-up across the 12 plans sums to −2500 ±15% net lines.
- [ ] 08-reconciliation.md is written before any `/do` execution.
When all six are true, the cleanup is ready for `/do` to execute the 12 plans in the dependency order above.
@@ -0,0 +1,433 @@
# Plan 01 — privacy-tag-filtering (foundation)
**Target design**: `PATHFINDER-2026-04-21/05-clean-flowcharts.md` section 3.2 ("privacy-tag-filtering (clean)")
**Before-state diagram**: `PATHFINDER-2026-04-21/01-flowcharts/privacy-tag-filtering.md`
**Author date**: 2026-04-22
**Execution order slot**: Part 6 steps 1 and 2 (U6 `stripMemoryTags` + U1 summary privacy gap). First plan in the series.
## Dependencies
- **Upstream (must land before this)**: **none** — this is the foundation plan for the v6.5.0 brutal-audit refactor.
- **Downstream (depends on this)**:
- `07-session-lifecycle-management.md` — introduces `ingestObservation` / `ingestPrompt` / `ingestSummary` helpers that wrap `stripMemoryTags`. Plan 01 must land first so those helpers have a single strip function to call.
- `08-transcript-watcher-integration.md` — calls `ingestObservation` directly (dropping the HTTP loopback). Needs the ingest helpers introduced downstream, which in turn need `stripMemoryTags`.
- `09-lifecycle-hooks.md` — the new `POST /api/session/observation`, `/api/session/prompt`, `/api/session/end` paths must all run stripping; they will route through the downstream ingest helpers.
---
## Sources Consulted
| Source | Lines | What it gave us |
|---|---|---|
| `PATHFINDER-2026-04-21/05-clean-flowcharts.md` | 19, 20, 21, 47, 127-156, 534-558, 564-584 | Part 1 items #1, #2, #3, #29; section 3.2 authoritative clean design; Part 5 deletion ledger row "stripMemoryTagsFromPrompt / FromJson wrappers" (-60/+15 = -45) + summary-path privacy-gap fix row (+3); Part 6 execution steps 1-3 |
| `PATHFINDER-2026-04-21/06-implementation-plan.md` | 22-47 (Phase 0 verified findings V1-V4), 69-111 (Phase 1 tasks), 114-151 (Phase 2 context on ingest helpers), 59-66 (anti-pattern guards A-E) | Verified findings that correct the audit (V1: summary strips ZERO tags not just `<system-reminder>`; V2: `handleObservations` is at line 464, not 378; V3+V4: wrapper + call-site inventory) |
| `PATHFINDER-2026-04-21/01-flowcharts/privacy-tag-filtering.md` | 1-86 | Before-state: three ingress paths (prompt, observation, summary) with partial/missing strip coverage on the summary path |
| `src/utils/tag-stripping.ts` | 1-91 (full file) | Current implementation: `stripTagsInternal` (line 51) + 6 sequential `.replace()` (lines 63-69) + two public wrappers (`stripMemoryTagsFromJson` line 79, `stripMemoryTagsFromPrompt` line 89), `SYSTEM_REMINDER_REGEX` export (line 24), `MAX_TAG_COUNT=100` ReDoS guard (line 31) |
| `src/services/worker/http/routes/SessionRoutes.ts` | 11 (import), 376-389 (route map), 464-485 (`handleObservations` legacy), 491-506 (`handleSummarize` legacy), 560-660 (`handleObservationsByClaudeId` with strip at 629/633), 669-710 (`handleSummarizeByClaudeId` — NO strip), 814-895 (`handleSessionInitByClaudeId` with strip at 862) | Every call site; confirmed every audit line number against live code |
| `src/cli/handlers/summarize.ts` | 19, 59-68, 84-97 | Hook extracts `last_assistant_message` via `extractLastMessage(transcriptPath, 'assistant', true)` (line 64; the `true` strips `<system-reminder>` at read-time only), then POSTs it raw to `/api/sessions/summarize` (line 89). The hook itself does NOT run `stripMemoryTags`; it relies on the worker. Today the worker doesn't strip either — that is the P1 bug. |
| `tests/utils/tag-stripping.test.ts` | 1-80 (413 total lines) | Existing tests import `stripMemoryTagsFromPrompt` + `stripMemoryTagsFromJson` by name; these imports must change. |
## Concrete Findings
1. **Wrappers are identical**. `stripMemoryTagsFromJson(content)` and `stripMemoryTagsFromPrompt(content)` both call `stripTagsInternal(content)` with no behavioural difference (`src/utils/tag-stripping.ts:80` and `:90`). Confirms audit item #1.
2. **Six sequential `.replace()` calls** at `src/utils/tag-stripping.ts:64-69`, one per tag type, each scanning the full string. Confirms audit item #3.
3. **Summary paths strip ZERO tags, not just "`<system-reminder>` only"** — this is the V1 correction to the before-state audit:
- `handleSummarize` (`SessionRoutes.ts:491`): receives `last_assistant_message`, passes it untouched to `this.sessionManager.queueSummarize(sessionDbId, last_assistant_message)` at `:497`.
- `handleSummarizeByClaudeId` (`SessionRoutes.ts:669`): same — raw body → `queueSummarize(sessionDbId, last_assistant_message)` at `:705`.
- The hook-side `extractLastMessage(..., true)` at `summarize.ts:64` only strips `<system-reminder>` via `SYSTEM_REMINDER_REGEX` during transcript parsing; it does nothing for `<private>`, `<claude-mem-context>`, etc.
- **Result**: a `<private>secret</private>` inside an assistant message persists to `pending_messages` and then to `session_summaries`. This is the P1 security gap audit item #2 claims to close.
4. **Legacy `handleObservations` is at line 464, not 378** (V2). It has NO strip — it calls `queueObservation(sessionDbId, {tool_input, tool_response, ...})` directly at `:470`.
5. **Call-site inventory (grep-verified, V4)**:
| File | Line | Function called | Text stripped |
|---|---|---|---|
| `src/utils/tag-stripping.ts` | 79 | declaration `stripMemoryTagsFromJson` | — |
| `src/utils/tag-stripping.ts` | 89 | declaration `stripMemoryTagsFromPrompt` | — |
| `src/services/worker/http/routes/SessionRoutes.ts` | 11 | import both wrappers | — |
| `src/services/worker/http/routes/SessionRoutes.ts` | 629 | `stripMemoryTagsFromJson(JSON.stringify(tool_input))` | observation |
| `src/services/worker/http/routes/SessionRoutes.ts` | 633 | `stripMemoryTagsFromJson(JSON.stringify(tool_response))` | observation |
| `src/services/worker/http/routes/SessionRoutes.ts` | 862 | `stripMemoryTagsFromPrompt(prompt)` | prompt |
| `tests/utils/tag-stripping.test.ts` | 13 | import both wrappers | — (test) |
**No other call sites exist**. The summary path (`:491`, `:669`), the legacy observation path (`:464`), and the hook side of summarize (`summarize.ts`) never touch a strip function.
6. **ReDoS guard & trim already correct**. `countTags` at `tag-stripping.ts:37` + `MAX_TAG_COUNT=100` check at `:54`; `.trim()` at `:70`. Keep both.
7. **`SYSTEM_REMINDER_REGEX` is exported** (`tag-stripping.ts:24`) and used by `src/shared/transcript-parser.ts:84` and `:128` to strip system-reminder at transcript-read-time (the `stripSystemReminders=true` path in `extractLastMessage`). That external use is **not** a memory-strip call site — it is a read-time sanitation of raw transcript JSON. Section 3.2 of 05 keeps that behaviour (it operates before text ever enters our pipeline). **Keep `SYSTEM_REMINDER_REGEX` as an export.**
## Copy-Ready Snippet Locations
`/do` runs can copy verbatim from these locations:
| Copy from | Into | Purpose |
|---|---|---|
| `src/utils/tag-stripping.ts:31` (`MAX_TAG_COUNT = 100`) | New `src/utils/tag-stripping.ts` (rewritten) | ReDoS constant — preserve exact value |
| `src/utils/tag-stripping.ts:37-45` (`countTags`) | New `src/utils/tag-stripping.ts` | Tag-count helper — preserve exact body (one-regex version still needs a count for the warn path) |
| `src/utils/tag-stripping.ts:54-61` (ReDoS guard with `logger.warn`) | New `stripMemoryTags` body | Preserve the warn-but-continue semantics |
| `src/utils/tag-stripping.ts:24` (`SYSTEM_REMINDER_REGEX` export) | New `src/utils/tag-stripping.ts` | External callers (`transcript-parser.ts:84`, `:128`) still import this — must keep export |
| Section 3.2 alternation regex at `05-clean-flowcharts.md:132` | New `stripMemoryTags` body | `/<(private\|claude-mem-context\|system_instruction\|system-instruction\|persisted-output\|system-reminder)>[\s\S]*?<\/\1>/g` |
| `SessionRoutes.ts:629-634` (existing call shape `JSON.stringify(tool_input)`) | Replacement lines at `:629` and `:633` | Same two arguments, new function name |
| `SessionRoutes.ts:862` (existing `stripMemoryTagsFromPrompt(prompt)`) | Replacement line | Same text, new function name |
## Confidence + Gaps
**High confidence**
- Every source line number verified against live code on 2026-04-22.
- The P1 security gap is reproducible: inserting `<private>secret</private>` into an assistant message today writes through to `session_summaries.last_assistant_message` untouched.
- `SYSTEM_REMINDER_REGEX` external usage is real — if Phase 1 deletes it, `transcript-parser.ts` breaks. Keep the export.
**Gaps / unverified**
- I did not measure the ReDoS cost of the alternation regex vs. six sequential `replace()` on pathological inputs. Section 3.2 and audit item #3 claim the single regex is net-faster; that is plausible but untested. Phase 1 includes a micro-benchmark test to confirm before/after.
- Phase 1 assumes `queueObservation` and `queueSummarize` accept arbitrary strings. Confirmed by reading `SessionRoutes.ts:470` and `:497, :705` but not by reading `SessionManager.queueSummarize` itself. If `queueSummarize` does any parsing of `last_assistant_message`, stripping before the call may or may not change that behaviour — Phase 3 verifies with a targeted integration test.
- The hook-side `summarize.ts:64` call to `extractLastMessage(..., true)` leaves `<system-reminder>` stripped *before* the raw message hits the wire. After this plan lands, the worker also runs `stripMemoryTags` on it. That is a double-strip on `<system-reminder>`, which is idempotent (first pass removes it, second pass is a no-op). **Noted; not a bug.**
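The idempotence claim is easy to spot-check. A minimal sketch using the section 3.2 alternation regex (the `strip` helper here is a stand-in for the planned `stripMemoryTags`, not the shipped code):

```typescript
// Stand-in for the planned stripMemoryTags (tag list from section 3.2).
const TAGS = [
  'private', 'claude-mem-context', 'system_instruction',
  'system-instruction', 'persisted-output', 'system-reminder',
];
const STRIP = new RegExp(`<(${TAGS.join('|')})>[\\s\\S]*?<\\/\\1>`, 'g');
const strip = (text: string): string => text.replace(STRIP, '').trim();

// First pass removes the tag span; the second pass finds nothing to match.
const once = strip('ok <system-reminder>noise</system-reminder> done');
const twice = strip(once);
console.log(once === twice); // true — double-strip is a no-op
```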
---
## Phase 1 — Rewrite `src/utils/tag-stripping.ts` to a single `stripMemoryTags`
### (a) What to implement
Replace the entire contents of `src/utils/tag-stripping.ts` with a new version that exports:
1. `SYSTEM_REMINDER_REGEX` (unchanged — external callers depend on it).
2. `stripMemoryTags(text: string): string` — single public function using one alternation regex with back-reference.
Copy `MAX_TAG_COUNT = 100` from current `src/utils/tag-stripping.ts:31`.
Copy `countTags` body from current `src/utils/tag-stripping.ts:37-45` (keep call-site warn semantics).
Copy the `logger.warn('SYSTEM', 'tag count exceeds limit', ...)` block from current `:54-61`.
Copy the alternation regex pattern from `PATHFINDER-2026-04-21/05-clean-flowcharts.md:132`:
```ts
const MEMORY_TAG_NAMES = [
'private',
'claude-mem-context',
'system_instruction',
'system-instruction',
'persisted-output',
'system-reminder',
] as const;
const STRIP_REGEX = new RegExp(
`<(${MEMORY_TAG_NAMES.join('|')})>[\\s\\S]*?<\\/\\1>`,
'g'
);
export function stripMemoryTags(text: string): string {
if (!text) return text;
const tagCount = countTags(text);
if (tagCount > MAX_TAG_COUNT) {
logger.warn('SYSTEM', 'tag count exceeds limit', undefined, {
tagCount,
maxAllowed: MAX_TAG_COUNT,
contentLength: text.length,
});
// Still process but log the anomaly (preserves current behaviour)
}
return text.replace(STRIP_REGEX, '').trim();
}
```
Delete `stripTagsInternal`, `stripMemoryTagsFromJson`, `stripMemoryTagsFromPrompt`.
### (b) Documentation references
- `05-clean-flowcharts.md:127-156` (section 3.2 authoritative design)
- `05-clean-flowcharts.md:19` (audit item #1 — wrapper collapse)
- `05-clean-flowcharts.md:21` (audit item #3 — one-regex alternation)
- `05-clean-flowcharts.md:47` (audit item #29 — strip-on-raw-string, no stringify/parse dance — already how callers pass arguments, so no change needed here)
- `06-implementation-plan.md:30` (V3 verified inventory)
- `06-implementation-plan.md:81-87` (Phase 1 task 1 exact prescription)
- Live file: `src/utils/tag-stripping.ts:1-91`
### (c) Verification checklist
Run from repo root:
```bash
# No stray wrappers survive
grep -rn "stripMemoryTagsFromPrompt\|stripMemoryTagsFromJson\|stripTagsInternal" src/
# Expected: 0 matches
# The new function exists exactly once as a declaration
grep -n "export function stripMemoryTags\b" src/utils/tag-stripping.ts
# Expected: 1 match, on a single line
# SYSTEM_REMINDER_REGEX export preserved
grep -n "export const SYSTEM_REMINDER_REGEX" src/utils/tag-stripping.ts
# Expected: 1 match
# TypeScript compiles
npx tsc --noEmit
# Expected: errors only in SessionRoutes.ts (it still imports the old names — fixed in Phase 2); tag-stripping.ts itself must typecheck clean
```
Tests: not yet — the test file still imports the old wrappers. Phase 4 updates the test file; Phase 1 leaves it broken.
### (d) Anti-pattern guards
- **A (invent APIs)**: do not add `stripMemoryTagsV2`, `stripMemoryTagsAsync`, `stripTagsSafe`, or any other variant. One public function.
- **C (silent fallbacks)**: the ReDoS guard continues to *warn and process*, not *warn and return empty*. Copy the `logger.warn` call verbatim.
- **D (facades that pass through)**: do not leave `stripMemoryTagsFromPrompt` / `stripMemoryTagsFromJson` as deprecated re-exports calling `stripMemoryTags`. Delete the names.
- **E (two code paths for same data)**: the new file has exactly one strip implementation. No branch on "is JSON" vs "is prompt".
---
## Phase 2 — Replace existing `stripMemoryTagsFromJson` / `FromPrompt` call sites
### (a) What to implement
Edit `src/services/worker/http/routes/SessionRoutes.ts` in exactly three places:
1. **Line 11** — change import:
- From: `import { stripMemoryTagsFromJson, stripMemoryTagsFromPrompt } from '../../../../utils/tag-stripping.js';`
- To: `import { stripMemoryTags } from '../../../../utils/tag-stripping.js';`
2. **Line 629** — rename only:
- From: `? stripMemoryTagsFromJson(JSON.stringify(tool_input))`
- To: `? stripMemoryTags(JSON.stringify(tool_input))`
3. **Line 633** — rename only:
- From: `? stripMemoryTagsFromJson(JSON.stringify(tool_response))`
- To: `? stripMemoryTags(JSON.stringify(tool_response))`
4. **Line 862** — rename only:
- From: `const cleanedPrompt = stripMemoryTagsFromPrompt(prompt);`
- To: `const cleanedPrompt = stripMemoryTags(prompt);`
No logic changes. No reordering. Same arguments.
### (b) Documentation references
- `05-clean-flowcharts.md:127-156` (section 3.2)
- `06-implementation-plan.md:31` (V4 verified call-site inventory — "No call sites in summary, legacy observation, or summarize hook")
- `06-implementation-plan.md:88-90` (Phase 1 task 2 prescription)
- Live file: `src/services/worker/http/routes/SessionRoutes.ts:11, :629, :633, :862`
### (c) Verification checklist
```bash
# Old names gone from the only consumer
grep -n "stripMemoryTagsFromJson\|stripMemoryTagsFromPrompt" src/services/worker/http/routes/SessionRoutes.ts
# Expected: 0 matches
# New name present exactly three times in SessionRoutes (629, 633, 862) plus one import
grep -c "stripMemoryTags(" src/services/worker/http/routes/SessionRoutes.ts
# Expected: 3 (call sites; the import statement uses `stripMemoryTags` without trailing `(`)
grep -n "import .*stripMemoryTags" src/services/worker/http/routes/SessionRoutes.ts
# Expected: 1 match on line 11
# Compiles
npx tsc --noEmit
# Expected: exit 0 (SessionRoutes now uses the new API; summary + legacy obs paths still untouched — will pass)
```
No runtime tests yet — Phase 3 adds the new strip calls that unlock the regression test.
### (d) Anti-pattern guards
- **A (invent APIs)**: do not introduce `stripMemoryTagsAt(callerType, text)`; the single function is enough.
- **E (two code paths)**: after this phase all live strip call sites funnel through one function. Do not leave a "fast path" for prompts and a "JSON path" for observations.
---
## Phase 3 — ADD `stripMemoryTags` calls at summary-path and legacy-observation entry points (closes P1 per V1)
### (a) What to implement
Edit `src/services/worker/http/routes/SessionRoutes.ts` in three additional places. Each change **adds** a strip call before the existing queue call.
1. **`handleObservations` — line 464 handler** (V2 correction of audit's "line 378"):
- Before line 470 (`this.sessionManager.queueObservation(sessionDbId, {...})`), copy the pattern from `:628-634`:
```ts
const cleanedToolInput = tool_input !== undefined
? stripMemoryTags(JSON.stringify(tool_input))
: '{}';
const cleanedToolResponse = tool_response !== undefined
? stripMemoryTags(JSON.stringify(tool_response))
: '{}';
```
- Pass `cleanedToolInput` / `cleanedToolResponse` into `queueObservation` instead of `tool_input` / `tool_response`.
2. **`handleSummarize` — line 491 handler** (V1 security gap; audit had only described missing `<system-reminder>` but V1 confirms ZERO tags are stripped):
- Before line 497 (`this.sessionManager.queueSummarize(sessionDbId, last_assistant_message);`), insert:
```ts
const cleanedAssistantMessage = typeof last_assistant_message === 'string'
? stripMemoryTags(last_assistant_message)
: '';
```
- Pass `cleanedAssistantMessage` into `queueSummarize`.
3. **`handleSummarizeByClaudeId` — line 669 handler** (same V1 gap, `/api/sessions/summarize` endpoint):
- Before line 705 (`this.sessionManager.queueSummarize(sessionDbId, last_assistant_message);`), insert the same cleaning block as #2.
- Pass `cleanedAssistantMessage` into `queueSummarize`.
No new wrappers, no new helper module. Inline call site.
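The observation-path additions strip on the serialized string, per audit item #29 (no stringify/parse dance). A minimal sketch of that behaviour, with the regex inlined rather than imported and a hypothetical `tool_response` payload:

```typescript
// Section 3.2 alternation regex, inlined for a self-contained sketch.
const STRIP =
  /<(private|claude-mem-context|system_instruction|system-instruction|persisted-output|system-reminder)>[\s\S]*?<\/\1>/g;
const stripMemoryTags = (text: string): string => text.replace(STRIP, '').trim();

// Hypothetical tool_response payload: the tag span vanishes from the
// serialized JSON string; the surrounding structure is left as-is.
const tool_response = { output: 'a <persisted-output>blob</persisted-output> b' };
const cleaned = stripMemoryTags(JSON.stringify(tool_response));
console.log(cleaned); // {"output":"a  b"}
```

The cleaned serialized string is what `queueObservation` receives; nothing re-parses it afterwards, which is why stripping on the raw string is sufficient.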
### (b) Documentation references
- `05-clean-flowcharts.md:20` (audit item #2 — SECURITY BUG label)
- `05-clean-flowcharts.md:127-156` (section 3.2 — the `C3: ingestSummary` call site is the design that lands properly once the downstream ingest helper plan uses it; this plan inlines the strip at the route boundary in the interim)
- `05-clean-flowcharts.md:542` (Part 5 ledger row "Summary-path privacy gap fix: +3")
- `06-implementation-plan.md:28` (V1 — "Summary paths strip ZERO tags")
- `06-implementation-plan.md:29` (V2 — `handleObservations` is at line 464)
- `06-implementation-plan.md:91-93` (Phase 1 task 2 sub-bullets)
- Live file: `src/services/worker/http/routes/SessionRoutes.ts:464-485, :491-506, :669-710`
### (c) Verification checklist
```bash
# Every strip call site accounted for
grep -c "stripMemoryTags(" src/services/worker/http/routes/SessionRoutes.ts
# Expected: 7 (four strip calls added in this phase + three Phase 2 renames)
# Breakdown:
# :464-handler — 2 (input + response) NEW
# :491-handler — 1 (assistant message) NEW
# :565-handler — 2 (input + response) PHASE-2 RENAME
# :669-handler — 1 (assistant message) NEW
# :862-handler — 1 (prompt) PHASE-2 RENAME
# Total: 7 call sites. grep -c counts matching lines, not matches; each call
# sits on its own line here, so line count and call count agree.
grep -n "queueSummarize(sessionDbId, last_assistant_message)" src/services/worker/http/routes/SessionRoutes.ts
# Expected: 0 — both sites should now pass cleanedAssistantMessage
grep -n "queueObservation(sessionDbId, {" src/services/worker/http/routes/SessionRoutes.ts
# Expected: 2 call sites, both using cleanedToolInput / cleanedToolResponse
# Regression test: insert <private>secret</private> into a summary
# - Start worker locally: npm run build-and-sync
# - POST /sessions/:id/summarize with body {"last_assistant_message":"ok <private>secret</private> done"}
# - SELECT last_assistant_message FROM session_summaries WHERE session_id = :id
# - Expected: "ok done" (trimmed, no "secret", no "<private>")
# - Repeat with POST /api/sessions/summarize and contentSessionId
# - Expected: same result
# Regression test: <persisted-output> in tool_response routed through /sessions/:id/observations
# - POST /sessions/:id/observations with body containing tool_response: "a <persisted-output>blob</persisted-output> b"
# - SELECT tool_response FROM observations WHERE session_id = :id
# - Expected: serialized JSON with "a b", no <persisted-output>, no "blob"
npx tsc --noEmit
# Expected: exit 0
```
### (d) Anti-pattern guards
- **A (invent APIs)**: do not add a `cleanMessageForSummary` or `sanitizeObservation` helper — a two-line inline strip is simpler than any new abstraction. A unified `ingestSummary` / `ingestObservation` helper IS planned, but in the downstream plan `07-session-lifecycle-management.md`, not here. This plan deliberately inlines to land the security fix fast (Part 6 step 2 — "3 lines to close P1, <1 hr").
- **C (silent fallbacks)**: if `last_assistant_message` is not a string, the strip returns `''`. `queueSummarize` then stores an empty summary. That is the explicit behaviour — do not silently coerce a non-string to `JSON.stringify(...)`.
- **E (two code paths for same data)**: `handleObservations` (line 464) and `handleObservationsByClaudeId` (line 565) still have mostly-duplicate bodies after this phase. The downstream `07-session-lifecycle-management.md` plan merges them via `ingestObservation`. Do NOT attempt that merge here — it is out of scope. This phase only adds the missing strip call into the legacy handler; the merge is the next plan's job.
---
## Phase 4 — Delete obsolete wrappers, tests, and dead exports
### (a) What to implement
1. **`src/utils/tag-stripping.ts`** already rewritten in Phase 1 — confirm the file no longer contains `stripMemoryTagsFromPrompt`, `stripMemoryTagsFromJson`, or `stripTagsInternal`.
2. **`tests/utils/tag-stripping.test.ts`** — rewrite to import the new API. Delete any `describe('stripMemoryTagsFromPrompt')` and `describe('stripMemoryTagsFromJson')` blocks; merge their cases into a single `describe('stripMemoryTags')` block. Keep every input assertion — the behaviour must be identical to today for all supported tags.
- Specifically: the test file at `tests/utils/tag-stripping.test.ts:13` imports `{ stripMemoryTagsFromPrompt, stripMemoryTagsFromJson }`. Change to `{ stripMemoryTags }`. Substitute every `stripMemoryTagsFromPrompt(` and `stripMemoryTagsFromJson(` with `stripMemoryTags(`.
3. **grep for any other importer** in `src/`:
- Expected (by V4): only `SessionRoutes.ts` and the test file import the old names. After Phase 2 + Phase 4 edits, no importer remains.
### (b) Documentation references
- `05-clean-flowcharts.md:149-150` (3.2 deletion list: the two wrapper files)
- `05-clean-flowcharts.md:541` (Part 5 ledger: -60/+15 = -45 net line delta)
- `06-implementation-plan.md:94` (Phase 1 task 3 — update tests)
- Live file: `tests/utils/tag-stripping.test.ts:13`, `:33-413`
### (c) Verification checklist
```bash
# No consumer of old names anywhere in tree
grep -rn "stripMemoryTagsFromPrompt\|stripMemoryTagsFromJson\|stripTagsInternal" src/ tests/
# Expected: 0 matches
# Test file compiles and uses the new API
grep -c "stripMemoryTags(" tests/utils/tag-stripping.test.ts
# Expected: >= number of old-wrapper call sites (current file has ~40 calls across the two wrappers; new file should have >= that count)
# Run the test suite
bun test tests/utils/tag-stripping.test.ts
# Expected: all tests green
# Full project typecheck
npx tsc --noEmit
# Expected: exit 0
```
### (d) Anti-pattern guards
- **D (facades that pass through)**: do not add `export const stripMemoryTagsFromPrompt = stripMemoryTags` for "backward compatibility". Callers are entirely internal; change them.
- **E (two code paths)**: the test file should have ONE describe block, not two. Do not leave parallel test suites.
---
## Phase 5 — Final verification (counts + regression + benchmark)
### (a) What to implement
This is a verification-only phase. No new code. Run the following checks and record results in the PR description.
1. **Grep census** (expected counts anchor the acceptance criteria):
| Command | Expected |
|---|---|
| `grep -rn "stripMemoryTagsFromPrompt\|stripMemoryTagsFromJson\|stripTagsInternal" src/ tests/` | `0` matches |
| `grep -rn "stripMemoryTags\b" src/ tests/` | exactly 1 declaration (`src/utils/tag-stripping.ts`) + 1 import and 7 call lines in `SessionRoutes.ts` + 1 test import + however many test-body call sites exist |
| `grep -c "stripMemoryTags(" src/services/worker/http/routes/SessionRoutes.ts` | `7` (3 Phase-2 rename sites + 4 Phase-3 additions: 2 observation strips in the `:464` handler, 1 summary strip each in the `:491` and `:669` handlers) |
| `grep -rn "queueSummarize(sessionDbId, last_assistant_message\b" src/` | `0` (both sites now pass `cleanedAssistantMessage`) |
| `grep -rn "SYSTEM_REMINDER_REGEX" src/` | `>= 3` (export in `tag-stripping.ts`, imports in `transcript-parser.ts:84` and `:128`) |
2. **End-to-end regression: `<private>` in summary path**
- Insert `<private>SHOULD_NOT_APPEAR</private>` into an assistant message via the transcript used by the summarize hook.
- Trigger `Stop` hook. Wait for `/api/sessions/summarize` blocking response.
- `SELECT last_assistant_message FROM session_summaries ORDER BY id DESC LIMIT 1;`
- Expected: no occurrence of `SHOULD_NOT_APPEAR` and no `<private>`.
3. **End-to-end regression: `<persisted-output>` in tool_response**
- POST a sample observation via hook path with a `tool_response` containing `<persisted-output>LARGE</persisted-output>`.
- `SELECT tool_response FROM observations ORDER BY id DESC LIMIT 1;`
- Expected: `LARGE` absent, `<persisted-output>` absent.
4. **Micro-benchmark** (informational, not blocking):
- New single-regex alternation should be no worse than the old six-sequential `.replace()` on a 1 MB input with 50 tags. Record ms/op.
- If the new version is >2× slower, escalate — but the audit claim is that one regex is faster.
5. **Build sanity**: `npm run build-and-sync` succeeds; worker restarts cleanly.
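The micro-benchmark in item 4 can be a short throwaway script. A sketch, assuming the old path was six sequential `.replace()` passes (one regex per tag); numbers are informational only, per the item:

```typescript
const TAGS = [
  'private', 'claude-mem-context', 'system_instruction',
  'system-instruction', 'persisted-output', 'system-reminder',
];
// New path: one alternation regex with a back-reference.
const ONE = new RegExp(`<(${TAGS.join('|')})>[\\s\\S]*?<\\/\\1>`, 'g');
// Old path (assumed shape): one regex per tag, applied sequentially.
const SIX = TAGS.map((t) => new RegExp(`<${t}>[\\s\\S]*?<\\/${t}>`, 'g'));

const stripOne = (s: string): string => s.replace(ONE, '');
const stripSix = (s: string): string => SIX.reduce((acc, re) => acc.replace(re, ''), s);

// ~1 MB input with 50 embedded tags, per the prescription above.
const input = ('x'.repeat(20_000) + '<private>secret</private>').repeat(50);

const msPerOp = (fn: (s: string) => string, iters = 10): number => {
  const t0 = performance.now();
  for (let i = 0; i < iters; i++) fn(input);
  return (performance.now() - t0) / iters;
};

console.log(`one-regex: ${msPerOp(stripOne).toFixed(2)} ms/op`);
console.log(`six-pass:  ${msPerOp(stripSix).toFixed(2)} ms/op`);
```

Record both numbers in the PR; escalate only on the >2× regression threshold above.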
### (b) Documentation references
- `05-clean-flowcharts.md:155` (3.2 closes: "P1 security gap (private content reaching `session_summaries`)")
- `05-clean-flowcharts.md:538-558` (Part 5 — deletion totals for this row: -45 lines wrappers + -3 lines partial strip + +3 lines new summary-path strip)
- `06-implementation-plan.md:96-101` (Phase 1 verification checklist template)
### (c) Verification checklist
Already enumerated in (a).
### (d) Anti-pattern guards
- **A**: do not add a wrapper "for the benchmark" — measure by timing `stripMemoryTags` directly.
- **C**: if the regression test finds stripped content leaking to the DB, the fix is to call `stripMemoryTags` — not to add a post-strip "second pass" to the consumer. The ingress is the only place to strip.
---
## Line-count summary (this plan only)
Referencing Part 5 of `05-clean-flowcharts.md`:
| Change | Lines deleted | Lines added | Source row |
|---|---|---|---|
| Wrappers + six regex passes collapse to one | -60 | +15 | 05 Part 5 row "stripMemoryTagsFromPrompt / FromJson wrappers" |
| Summary-path privacy gap fix (V1) | 0 | +3 | 05 Part 5 row "Summary-path privacy gap fix" |
| Legacy-observation privacy gap fix (V2, not in 05 ledger) | 0 | +6 | V2 correction (two strip calls in `handleObservations`) |
| Test file rewrites | ~-5 | ~+5 | Phase 4 |
| **Net** | **≈ -65** | **≈ +29** | **≈ -36 net** |
Net code delta is small; the load-bearing outcome is **closing P1** (private content no longer reaches `session_summaries` or the legacy observation path).
---
# Plan 02 — sqlite-persistence (clean)
**Target**: claude-mem v6.5.0 brutal-audit refactor, flowchart 3.3.
**Design authority**: `PATHFINDER-2026-04-21/05-clean-flowcharts.md` section **3.3**.
**Corrections authority**: `PATHFINDER-2026-04-21/06-implementation-plan.md` Phase 0 verified-findings **V12, V13, V14, V15, V19**.
**Date**: 2026-04-22.
---
## Dependencies
- **Upstream (must land before this plan):** none. This is a leaf plan.
- **Downstream (blocked on this plan):**
- `03-response-parsing-storage` — depends on `UNIQUE(session_id, tool_use_id)` + `ON CONFLICT DO NOTHING` added in **Phase 1** below (dedup gate moves from content-hash window to DB constraint).
- `04-vector-search-sync` — depends on the `chroma_synced INTEGER DEFAULT 0` column added in **Phase 2** below. 04's whole backfill simplification (`WHERE chroma_synced=0 LIMIT 1000`) cannot ship until that column exists.
- `07-session-lifecycle-management` — depends on the boot-once `recoverStuckProcessing()` extracted in **Phase 4** below (07 wires it into the worker startup sequence).
---
## Reporting block 1 — Sources consulted
1. `PATHFINDER-2026-04-21/05-clean-flowcharts.md` — full file (607 lines). **Section 3.3** is the canonical clean design for sqlite-persistence (lines 159-194). Part 1 items **#15** (30-s dedup window → UNIQUE constraint, line 33), **#16** (60-s claim stale-reset → boot recovery, line 34), **#27** (Python sqlite3 repair → `claude-mem repair`, line 45), **#28** (27 migrations → `schema.sql` + upgrade-only runner, line 46). Part 5 ledger rows for SQLite referenced in `06-implementation-plan.md` Phase 9.
2. `PATHFINDER-2026-04-21/06-implementation-plan.md` Phase 0 verified-findings:
- **V12** (line 39): audit claimed 27 migrations; reality is **19 private methods** in `MigrationRunner.runAllMigrations()` at `runner.ts:22-41`; highest `schema_versions.version` written is **27** (legacy system from `DatabaseManager` contributed ~5 more numbers). Plan target: "19 methods + legacy → `schema.sql` + N upgrade-only migrations".
- **V13** (line 40): Python sqlite3 subprocess **lives in production code** (`Database.ts:79-99`, not just tests). Test file exists at `tests/services/sqlite/schema-repair.test.ts` (253 lines). Phase 5 must delete from production; test file becomes a CLI test.
- **V14** (line 41): `DEDUP_WINDOW_MS = 30_000` at `observations/store.ts:13`. Dedup key is SHA-256 of `(memory_session_id, title, narrative)` at `:21-29` — **NOT** `tool_use_id`. The new UNIQUE is an **additive** gate (different key space); it does not automatically subsume every path the content-hash hit.
- **V15** (line 42): No `chroma_synced` column exists today; Phase 2 creates it.
- **V19** (line 46): `STALE_PROCESSING_THRESHOLD_MS = 60_000` at `PendingMessageStore.ts:6`; stale reset happens inside every `claimNextMessage()` call (lines 99-145).
- Phase 9 (lines 412-448) is a prior scope draft — superseded where this plan differs.
3. `PATHFINDER-2026-04-21/01-flowcharts/sqlite-persistence.md` — "before" diagram (97 lines). Confirms: 27 migrations claim (V12 corrects), content-hash dedup with 30-s window, claim-confirm self-heal, Python schema repair at boot.
4. Live codebase:
- `src/services/sqlite/Database.ts` (359 lines). Python repair at `:37-109`, reopen wrapper at `:115-132`, PRAGMA block at `:163-168`, `MigrationRunner` invocation at `:171-172`.
- `src/services/sqlite/migrations/runner.ts` (1018 lines). 19 private methods listed at `:22-41`. Schema-version INSERTs write versions {4,5,6,7,8,9,10,11,16,17,19,20,21,22,23,24,25,27} — gaps (12-15, 18, 26) confirm the legacy `DatabaseManager` numbering V12 mentions.
- `src/services/sqlite/observations/store.ts` (108 lines). `DEDUP_WINDOW_MS` at `:13`, `computeObservationContentHash` at `:21-30`, `findDuplicateObservation` at `:36-46`, `storeObservation` at `:53-108`.
- `src/services/sqlite/PendingMessageStore.ts` (529 lines). `STALE_PROCESSING_THRESHOLD_MS` at `:6`, stale-reset block inside `claimNextMessage` transaction at `:99-145` (reset SQL at `:107-115`, peek at `:118-124`, mark-processing at `:129-134`).
- `tests/services/sqlite/schema-repair.test.ts` (253 lines) — Python script invoked via `execSync`, per V13.
- `tests/services/sqlite/migration-runner.test.ts` (361 lines) — existing migration regression tests; these must still pass after consolidation.
- **No** `src/services/sqlite/schema.sql` exists today (grep confirms). Phase 3 must create it.
5. `PATHFINDER-2026-04-21/07-plans/` — empty of dependency plans (this is the first plan written).
---
## Reporting block 2 — Concrete findings
| Claim | Verified? | Evidence |
|---|---|---|
| Migration method count is 22 (V12 audit) | **Partially** — actual is **19 private methods** enumerated in `runAllMigrations` at `runner.ts:22-41`. 27 is the highest `schema_versions.version` written (legacy `DatabaseManager` migrations 1-3, 12-15, 18, 26 contribute the gap). | `runner.ts:22-41` + grep of `schema_versions.*VALUES.*run(N)` lines. |
| Highest current schema version is 27 | **Yes** — last INSERT at `runner.ts:1015` writes version `27` for `addObservationSubagentColumns`. | `runner.ts:1015`. |
| `UNIQUE(session_id, tool_use_id)` exists today | **No** — zero references to `tool_use_id` anywhere under `src/services/sqlite/`. The identifier only appears in `src/types/transcript.ts` and `src/services/worker/SDKAgent.ts` (input payload shape). | Grep `tool_use_id` in `src/services/sqlite/` returns zero files. |
| Dedup is content-hash based, NOT `tool_use_id` | **Yes** — `computeObservationContentHash` hashes `(memory_session_id, title, narrative)` at `store.ts:21-29`. Subagent `agent_type`/`agent_id` intentionally excluded per the comment at `:18-19`. | `store.ts:13-46`. |
| `chroma_synced` column exists | **No** — no migration adds it; no reference in `runner.ts` or any store. | Grep confirms. |
| 60-s stale reset fires per-claim, not at boot | **Yes** — reset UPDATE lives **inside** the `claimTx` transaction at `PendingMessageStore.ts:107-115`, run every time `claimNextMessage()` is called. | `PendingMessageStore.ts:99-145`. |
| Python sqlite3 lives in production, not just tests | **Yes** — `execFileSync('python3', [scriptPath, dbPath, objectName], ...)` at `Database.ts:99` inside the production `repairMalformedSchema` function (`:37-109`). Test file at `tests/services/sqlite/schema-repair.test.ts` exercises that production code path. | `Database.ts:99`. |
| `schema.sql` file exists today | **No** — Phase 3 must create it. "HOW" is detailed below (dump current state from a clean fresh-install DB). | Glob `**/*.sql` under `src/` returns zero. |
**Net count correction propagated to every phase below:** "19 methods (not 22 or 27)" where migration count is cited.
---
## Reporting block 3 — Copy-ready snippet locations
| Destination | Source file:line | What to copy |
|---|---|---|
| `src/services/sqlite/migrations/2026-04-22_add_observations_tool_use_id.ts` (new upgrade migration) | Existing patterns from `runner.ts:658-842` (migration `addOnUpdateCascadeToForeignKeys`, idempotent ALTER) | The idempotent "check column via `PRAGMA table_info`, ALTER if missing, mark `schema_versions`" pattern. |
| `src/services/sqlite/observations/store.ts` (Phase 1 rewrite) | Existing INSERT shape at `store.ts:77-102` | Keep the 17-column INSERT layout; only change the body from "compute hash → check dup → INSERT" to "INSERT … ON CONFLICT (memory_session_id, tool_use_id) DO NOTHING RETURNING id". |
| `src/services/sqlite/migrations/2026-04-23_add_observations_chroma_synced.ts` (new upgrade migration) | Pattern from `addObservationContentHashColumn` at `runner.ts:844-864` | Exact template: `PRAGMA table_info` → `ALTER TABLE observations ADD COLUMN chroma_synced INTEGER DEFAULT 0` → record version. |
| `src/services/sqlite/schema.sql` (new — created in Phase 3) | `runner.ts:52-124` (initializeSchema block) + tables from migrations 5,6,8,9,10,11,16,17,19,20,21,22,23,24,25,27 | Run the current `MigrationRunner` end-to-end on a fresh `:memory:` DB, then dump via `SELECT sql FROM sqlite_master WHERE type IN ('table','index') ORDER BY rootpage` — this is the authoritative generator. Detail in Phase 3 tasks. |
| `src/services/sqlite/PendingMessageStore.ts` (Phase 4) | Stale-reset block at `PendingMessageStore.ts:107-115` | Copy the SQL verbatim into a new `recoverStuckProcessing()` method; delete the copy from inside `claimTx`. `claimNextMessage` keeps only `peek` (`:118-124`) + `mark-processing` (`:129-134`) inside its transaction. |
| `src/cli/handlers/repair.ts` (new — Phase 5) | `Database.ts:79-107` (Python script body + `execFileSync` call) | Move the whole Python-script-written-to-tempfile + `execFileSync` pattern into a user-invoked CLI command handler; remove boot-time auto-call. |
---
## Reporting block 4 — Confidence + gaps
**Confidence: HIGH** on:
- Phases 1, 2, 4, 6 — all reference existing, stable code (V14/V15/V19 are pinned to single-file call sites).
- Phase 5 — Python block is small (~70 lines of wrapper + embedded script at `Database.ts:37-109`) and test coverage already exists at `tests/services/sqlite/schema-repair.test.ts`.
**Confidence: MEDIUM** on:
- Phase 3 (schema.sql generation). `schema.sql` does not exist today. The mechanical path is: (a) spin up `:memory:` DB, (b) run current `MigrationRunner.runAllMigrations()` unchanged, (c) dump `SELECT sql FROM sqlite_master` in a stable order, (d) check the dump into the repo. Risk: FTS5 virtual tables and their implicit rowid-shadow tables may need hand-tuning because `sqlite_master` includes internal `*_content`/`*_idx` tables that must NOT be in `schema.sql` (they're auto-created by the `CREATE VIRTUAL TABLE USING fts5` statement). **The schema.sql generator must filter `name NOT LIKE '%_data' AND name NOT LIKE '%_idx' AND name NOT LIKE '%_content' AND name NOT LIKE '%_docsize' AND name NOT LIKE '%_config'`** (the standard FTS5 shadow-table suffixes; `%_segments`/`%_segdir` are FTS3/4-era suffixes and can stay in the filter harmlessly).
- Phase 1 ordering w.r.t. Phase 6. Dropping `DEDUP_WINDOW_MS` + `findDuplicateObservation` (Phase 6) ONLY after Phase 1 lands AND verification proves every observation-ingest path writes a `tool_use_id`. The **transcript-watcher ingest path** (`src/services/transcripts/watcher.ts`, referenced by downstream plan `07-session-lifecycle-management`) may emit observations where `tool_use_id` is derived from JSONL line parsing rather than the hook payload — if that path produces a non-unique or missing `tool_use_id`, the UNIQUE constraint will not cover it and the content-hash gate still provides value. **Phase 6 is gated by a concrete grep + runtime check that every call site into `storeObservation` supplies a real `tool_use_id`.**
**Top gaps:**
1. **`schema.sql` doesn't exist today — must be generated mechanically.** Phase 3 specifies the exact generator script so this is reproducible. The risk is that FTS5 shadow tables leak into the dump; the filter list above must be applied. If a future migration adds a `USING fts5` virtual table with a non-default suffix, the filter will need updating.
2. **Dedup semantics may differ across ingest paths.** V14 confirms the current dedup key (SHA of title+narrative) and V14's warning applies: the transcript watcher, `/api/sessions/observations` hook path, and `/sessions/:id/observations` legacy path may each derive `tool_use_id` differently. Phase 1 adds the UNIQUE constraint but Phase 6 (dedup-window removal) must verify all three paths supply a consistent `tool_use_id` BEFORE the content-hash fallback is deleted. If the transcript-watcher path uses synthetic IDs (e.g., `file:offset`) instead of the real Claude Code `tool_use_id`, that's a real gap to flag to the owner of plan `07-session-lifecycle-management` before both plans land.
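For gap 1, the whole filter can live in the dump query itself. A sketch of the statement the Phase 3 generator would run (the suffix list is an assumption extended with the FTS5 `_data`/`_idx` shadow tables; `sql IS NOT NULL` additionally drops auto-created PK/UNIQUE indexes, which carry no DDL text):

```sql
SELECT sql || ';'
FROM sqlite_master
WHERE type IN ('table', 'index')
  AND sql IS NOT NULL              -- auto-indexes carry NULL sql
  AND name NOT LIKE 'sqlite_%'     -- SQLite-internal bookkeeping
  AND name NOT LIKE '%_content'
  AND name NOT LIKE '%_data'
  AND name NOT LIKE '%_idx'        -- safe only while project indexes use the idx_ prefix
  AND name NOT LIKE '%_docsize'
  AND name NOT LIKE '%_config'
  AND name NOT LIKE '%_segments'   -- FTS3/4 legacy, harmless to keep
  AND name NOT LIKE '%_segdir'
ORDER BY rootpage;
```

Note the `%_idx` clause excludes any name *ending* in `_idx`; that is safe here because this codebase names indexes with a leading `idx_`, but it is a trap if a future index is suffix-named.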
---
## Phase contract — template applied below
Every phase specifies:
- **(a) What to implement** — framed as "Copy from `<file>:<line>` into `<dest>`".
- **(b) Documentation references** — 05 section + V-numbers + live file:line.
- **(c) Verification checklist** — concrete greps + tests.
- **(d) Anti-pattern guards** — A (invent migration methods), B (polling), C (silent fallback), E (two dedup paths).
---
## Phase 1 — Add `UNIQUE(session_id, tool_use_id)` and `ON CONFLICT DO NOTHING` INSERT
**Outcome**: Observations have a `tool_use_id` column; `(memory_session_id, tool_use_id)` is UNIQUE; `storeObservation` uses `INSERT ... ON CONFLICT DO NOTHING RETURNING id` (idempotent, constraint-based). Content-hash dedup still runs underneath (removed in Phase 6 after verification).
### (a) Tasks
1. **Create new migration method** in `src/services/sqlite/migrations/`: add `addObservationToolUseIdUnique` to `MigrationRunner.runAllMigrations`, invoked immediately after `addObservationSubagentColumns` (line 41), and have it record `schema_versions.version = 28`.
- Copy the idempotent pattern from `addObservationContentHashColumn` at `runner.ts:844-864`: `PRAGMA table_info(observations)` → if `tool_use_id` column missing, `ALTER TABLE observations ADD COLUMN tool_use_id TEXT`.
- Backfill legacy rows: `UPDATE observations SET tool_use_id = 'legacy:' || id WHERE tool_use_id IS NULL`. Legacy synthetic IDs must be unique across existing rows (row `id` is unique by PK) and prefixed so future real `tool_use_id` values never collide.
- Create unique partial index: `CREATE UNIQUE INDEX IF NOT EXISTS idx_observations_session_tool_use_id ON observations(memory_session_id, tool_use_id) WHERE tool_use_id IS NOT NULL`.
- Register version 28.
2. **Rewrite `src/services/sqlite/observations/store.ts:53-108`** (`storeObservation`):
- Add `tool_use_id: string` to `ObservationInput` (`src/services/sqlite/observations/types.ts`).
- Replace the INSERT at `:77102` with:
```sql
INSERT INTO observations
(memory_session_id, project, type, title, subtitle, facts, narrative, concepts,
files_read, files_modified, prompt_number, discovery_tokens, agent_type, agent_id,
content_hash, tool_use_id, created_at, created_at_epoch)
VALUES (?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?)
ON CONFLICT(memory_session_id, tool_use_id) DO NOTHING
RETURNING id, created_at_epoch
```
- If `RETURNING` returns a row → new insert, return it.
- If no row returned → SELECT the existing row: `SELECT id, created_at_epoch FROM observations WHERE memory_session_id = ? AND tool_use_id = ?` and return.
- **Keep** `computeObservationContentHash` and `findDuplicateObservation` and the pre-INSERT dedup check **intact** in this phase. Phase 6 removes them. (Rationale: additive gate first, drop old gate only after confirming coverage — anti-pattern E avoidance.)
3. **Wire `tool_use_id` through every call site that creates an observation**. Grep: every `storeObservation(` caller must now pass `tool_use_id`. The three known ingest paths are (i) `/api/sessions/observations` HTTP route, (ii) `/sessions/:id/observations` legacy route, (iii) transcript-watcher ingest. Each must read `tool_use_id` from the incoming payload (hook sends it; transcript JSONL lines contain it).
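The insert-or-select contract above can be exercised against a scratch database. A minimal sketch in Python's stdlib `sqlite3` for a self-contained illustration (the production code is TypeScript over `bun:sqlite`), with the table trimmed to the columns that matter; all names mirror the plan, not the real file:

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.executescript("""
    CREATE TABLE observations (
        id INTEGER PRIMARY KEY,
        memory_session_id TEXT NOT NULL,
        tool_use_id TEXT,
        title TEXT
    );
    CREATE UNIQUE INDEX idx_observations_session_tool_use_id
        ON observations(memory_session_id, tool_use_id)
        WHERE tool_use_id IS NOT NULL;
""")

def store_observation(session_id: str, tool_use_id: str, title: str) -> int:
    # Single round-trip on the happy path; conflict path falls back to a SELECT.
    cur = db.execute(
        """INSERT INTO observations (memory_session_id, tool_use_id, title)
           VALUES (?, ?, ?)
           ON CONFLICT(memory_session_id, tool_use_id)
           WHERE tool_use_id IS NOT NULL DO NOTHING""",
        (session_id, tool_use_id, title),
    )
    if cur.rowcount == 1:  # a new row was inserted
        return cur.lastrowid
    # Conflict: idempotent, not silent -- hand back the existing row's id.
    return db.execute(
        "SELECT id FROM observations WHERE memory_session_id = ? AND tool_use_id = ?",
        (session_id, tool_use_id),
    ).fetchone()[0]

first_id = store_observation("s1", "toolu_01", "Read config")
dupe_id = store_observation("s1", "toolu_01", "Read config (retry)")
row_count = db.execute("SELECT COUNT(*) FROM observations").fetchone()[0]
```

Note that because the unique index is partial, the conflict target repeats its `WHERE` clause; without it SQLite cannot match the conflict target to the index.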
### (b) Documentation references
- `05-clean-flowcharts.md` **section 3.3**, line 172 (`INSERT observations UNIQUE(session_id, tool_use_id)`) and line 188 (deletion ledger entry). Part 1 item **#15** at line 33.
- Verified-finding **V14** (`06-implementation-plan.md:41`).
- Live code: `observations/store.ts:13-108`, `runner.ts:844-864` (copy-from template).
### (c) Verification checklist
- [ ] Grep: `grep -n "tool_use_id" src/services/sqlite/` returns at least 3 hits (types, store INSERT, migration).
- [ ] Grep: `grep -n "tool_use_id" src/services/worker/http/routes/SessionRoutes.ts` confirms both observation route handlers read it from body.
- [ ] New unit test `tests/services/sqlite/observations/unique-constraint.test.ts`: insert two observations with same `(memory_session_id, tool_use_id)`; assert second returns the first's `id`; assert `SELECT COUNT(*) FROM observations` incremented by exactly 1.
- [ ] Existing `tests/services/sqlite/migration-runner.test.ts` (361 lines) still passes — no regressions on migrations 4-27.
- [ ] Fresh-install smoke: delete DB, boot worker, confirm `PRAGMA index_list(observations)` includes `idx_observations_session_tool_use_id`.
- [ ] Upgrade smoke: copy a v6.5.0 DB into place, boot worker, confirm legacy rows got `tool_use_id = 'legacy:<id>'` and new index exists.
### (d) Anti-pattern guards
- **A (invent migration methods)**: do NOT add any migration method besides `addObservationToolUseIdUnique` in this phase. Enumerate before adding.
- **C (silent fallback)**: `ON CONFLICT DO NOTHING` is **idempotent, not silent** — conflicts are expected and return the existing id. The route handler must not treat "no new row inserted" as an error; the caller gets the existing id back.
- **E (two dedup paths)**: both dedup gates are present in this phase **intentionally**. The old one is removed in Phase 6 after every path is verified.
### Blast radius
Schema change (one new column, one new index). Hook + route payload shapes gain `tool_use_id`. No runtime behavior change on happy path (first INSERT wins as before); conflict path now returns the existing id faster (no pre-check query, one INSERT round-trip).
---
## Phase 2 — Add `chroma_synced` column (blocks plan 04)
**Outcome**: `observations.chroma_synced INTEGER DEFAULT 0`, `session_summaries.chroma_synced INTEGER DEFAULT 0`, and `user_prompts.chroma_synced INTEGER DEFAULT 0` exist. Partial index on `chroma_synced = 0` for the backfill scan on all three tables. Plan `04-vector-search-sync` can now consume these.
> **Preflight edit 2026-04-22 (reconciliation C3)**: The original phase covered only `observations` + `session_summaries`. Reconciliation identified that plan 04 also backfills `user_prompts`, so this phase must add the column there too. Migration body below extends to all three tables.
### (a) Tasks
1. **Add migration method `addChromaSyncedColumns`** to `MigrationRunner.runAllMigrations` (immediately after the new `addObservationToolUseIdUnique` from Phase 1, at the end of the list), assigning `schema_versions.version = 29`.
- Template: `addObservationContentHashColumn` at `runner.ts:844-864`.
- Body:
```ts
const obsInfo = this.db.query('PRAGMA table_info(observations)').all() as TableColumnInfo[];
if (!obsInfo.some(c => c.name === 'chroma_synced')) {
  this.db.run('ALTER TABLE observations ADD COLUMN chroma_synced INTEGER NOT NULL DEFAULT 0');
}
const sumInfo = this.db.query('PRAGMA table_info(session_summaries)').all() as TableColumnInfo[];
if (!sumInfo.some(c => c.name === 'chroma_synced')) {
  this.db.run('ALTER TABLE session_summaries ADD COLUMN chroma_synced INTEGER NOT NULL DEFAULT 0');
}
const promptInfo = this.db.query('PRAGMA table_info(user_prompts)').all() as TableColumnInfo[];
if (!promptInfo.some(c => c.name === 'chroma_synced')) {
  this.db.run('ALTER TABLE user_prompts ADD COLUMN chroma_synced INTEGER NOT NULL DEFAULT 0');
}
this.db.run('CREATE INDEX IF NOT EXISTS idx_observations_chroma_synced ON observations(chroma_synced) WHERE chroma_synced = 0');
this.db.run('CREATE INDEX IF NOT EXISTS idx_summaries_chroma_synced ON session_summaries(chroma_synced) WHERE chroma_synced = 0');
this.db.run('CREATE INDEX IF NOT EXISTS idx_prompts_chroma_synced ON user_prompts(chroma_synced) WHERE chroma_synced = 0');
this.db.prepare('INSERT OR IGNORE INTO schema_versions (version, applied_at) VALUES (?, ?)').run(29, new Date().toISOString());
```
2. **Do NOT** modify `ChromaSync.ts` in this phase — that is plan 04's responsibility. This phase only lands the schema.
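The idempotency gate this migration relies on (check `PRAGMA table_info`, alter only if missing) is easy to demonstrate standalone. A sketch in Python's stdlib `sqlite3` (the real method lives on `MigrationRunner` in TypeScript; the function name here is illustrative):

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE observations (id INTEGER PRIMARY KEY)")

def add_chroma_synced(db: sqlite3.Connection) -> bool:
    """Add the column only if missing, so every boot can call this safely."""
    cols = {row[1] for row in db.execute("PRAGMA table_info(observations)")}
    if "chroma_synced" in cols:
        return False  # already migrated: no-op
    db.execute("ALTER TABLE observations ADD COLUMN chroma_synced INTEGER NOT NULL DEFAULT 0")
    # Partial index keeps the backfill scan cheap: only unsynced rows are indexed.
    db.execute(
        "CREATE INDEX IF NOT EXISTS idx_observations_chroma_synced "
        "ON observations(chroma_synced) WHERE chroma_synced = 0"
    )
    return True

ran_first = add_chroma_synced(db)   # first boot: applies the ALTER
ran_second = add_chroma_synced(db)  # second boot: no-op
```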
### (b) Documentation references
- `05-clean-flowcharts.md` **section 3.4** line 226 ("Adds: `chroma_synced` boolean column on `observations`. Schema migration.").
- Verified-finding **V15** (`06-implementation-plan.md:42`).
- Live code: `runner.ts:844-864` (copy template).
### (c) Verification checklist
- [ ] `PRAGMA table_info(observations)` on a fresh-boot DB includes `chroma_synced`.
- [ ] `PRAGMA table_info(session_summaries)` includes `chroma_synced`.
- [ ] `PRAGMA table_info(user_prompts)` includes `chroma_synced`.
- [ ] Partial indexes exist: `SELECT name FROM sqlite_master WHERE type='index' AND name LIKE '%chroma_synced%'` returns 3 rows.
- [ ] Upgrade smoke: on a pre-Phase-2 DB, all three ALTERs run exactly once; second boot is a no-op (idempotency gate).
- [ ] `migration-runner.test.ts` extended with a case asserting `schema_versions.version = 29` after fresh install.
### (d) Anti-pattern guards
- **A**: one method, one version. Do not add a backfill-on-migration step here (that's plan 04).
- **E**: do NOT touch `ChromaSync.ts` write path in this phase; keep concerns isolated so plans can land independently.
### Blast radius
Pure additive schema. Zero runtime behavior change until plan 04 starts writing to the column.
---
## Phase 3 — Consolidate 19 migrations into `schema.sql` + slim upgrade-only runner
**Outcome**: Fresh DBs execute `src/services/sqlite/schema.sql` in one shot and write `schema_versions.version = <current>`. Existing DBs continue running only upgrade-step migrations whose version is `> max(schema_versions.version)`. The fresh-install cost of the 19 `CREATE TABLE IF NOT EXISTS` / `ALTER TABLE` idempotency bodies drops to near zero, since fresh DBs no longer traverse them.
### (a) Tasks
1. **Generate `src/services/sqlite/schema.sql`** by a reproducible script, not by hand:
- Write a one-shot generator at `scripts/dump-schema.ts`:
```ts
import { Database } from 'bun:sqlite';
import { MigrationRunner } from '../src/services/sqlite/migrations/runner.js';
import { writeFileSync } from 'fs';

const db = new Database(':memory:');
new MigrationRunner(db).runAllMigrations();

// Filter out FTS5 shadow tables — they're created automatically by CREATE VIRTUAL TABLE.
const rows = db.query(`
  SELECT sql FROM sqlite_master
  WHERE sql IS NOT NULL
    AND name NOT LIKE 'sqlite_%'
    AND name NOT LIKE '%_content'
    AND name NOT LIKE '%_segments'
    AND name NOT LIKE '%_segdir'
    AND name NOT LIKE '%_docsize'
    AND name NOT LIKE '%_config'
    AND name NOT LIKE '%_data'
    AND name NOT LIKE '%_idx'
  ORDER BY
    CASE type WHEN 'table' THEN 0 WHEN 'index' THEN 1 WHEN 'trigger' THEN 2 ELSE 3 END,
    name
`).all() as { sql: string }[];

writeFileSync(
  'src/services/sqlite/schema.sql',
  rows.map(r => r.sql + ';').join('\n\n') + '\n'
);
```
- Run `bun run scripts/dump-schema.ts`, commit the resulting `schema.sql`.
- `schema.sql` must end with `INSERT OR IGNORE INTO schema_versions (version, applied_at) VALUES (29, datetime('now'));` (where 29 = current max after Phases 1 and 2).
2. **Rewrite `Database.ts:171-172`** to check for fresh DB:
- After PRAGMAs, query `SELECT COUNT(*) FROM sqlite_master WHERE type='table' AND name='schema_versions'`.
- If zero (true fresh DB): read `schema.sql` (bundled via `import.meta` or FS at a known path), execute via `db.exec(sql)`, done.
- Else: run `MigrationRunner` as today (it's already idempotent per-migration via `PRAGMA table_info` checks).
3. **DO NOT delete the 19 migration methods.** They remain as upgrade paths for existing DBs from v6.4.x or earlier. What shrinks is the fresh-install path cost (19 idempotent ALTER checks → 1 `db.exec(schema.sql)`).
4. **Add a CI check** in `tests/services/sqlite/schema-consistency.test.ts`: runs the dump-schema generator in-memory, diffs against the checked-in `schema.sql`; fails if they drift. This is the only way to keep `schema.sql` honest as new migrations land.
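The generator and the consistency test together enforce one property: executing the dump against a fresh database reproduces the same `sqlite_master`. A toy round-trip in Python's stdlib `sqlite3` (the real generator is `scripts/dump-schema.ts`; the FTS5 shadow-table filter is omitted because this toy schema has none):

```python
import sqlite3

def run_all_migrations(db: sqlite3.Connection) -> None:
    # Stand-in for MigrationRunner.runAllMigrations().
    db.executescript("""
        CREATE TABLE schema_versions (version INTEGER PRIMARY KEY, applied_at TEXT);
        CREATE TABLE observations (id INTEGER PRIMARY KEY, title TEXT);
        CREATE INDEX idx_observations_title ON observations(title);
    """)

def dump_schema(db: sqlite3.Connection) -> str:
    rows = db.execute("""
        SELECT sql FROM sqlite_master
        WHERE sql IS NOT NULL AND name NOT LIKE 'sqlite_%'
        ORDER BY CASE type WHEN 'table' THEN 0 WHEN 'index' THEN 1 ELSE 2 END, name
    """).fetchall()
    return "\n\n".join(r[0] + ";" for r in rows) + "\n"

migrated = sqlite3.connect(":memory:")
run_all_migrations(migrated)
schema_sql = dump_schema(migrated)   # what the generator would commit

fresh = sqlite3.connect(":memory:")
fresh.executescript(schema_sql)      # the fresh-install fast path
# The consistency test's core assertion: no drift between the two paths.
roundtrip_ok = dump_schema(fresh) == schema_sql
```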
### (b) Documentation references
- `05-clean-flowcharts.md` **section 3.3** lines 166-170 (Boot → Check → Fresh? → Execute `schema.sql` vs Migrate). Line 191 in the deletion ledger.
- Verified-finding **V12** (`06-implementation-plan.md:39`) — confirms 19 methods, not 27.
- Live code: `Database.ts:163-173` (boot sequence), `runner.ts:22-41` (method list).
- **Gap note from reporting block 4 (#1)**: the FTS5 shadow-table filter list in the generator is non-obvious; comment it inline with a link to the SQLite FTS5 docs section on shadow tables.
### (c) Verification checklist
- [ ] `ls src/services/sqlite/schema.sql` exists and is > 0 bytes.
- [ ] Fresh-install test: delete DB → boot → dump `sqlite_master` → byte-equal to `schema.sql` content (modulo the `schema_versions` INSERT).
- [ ] Upgrade test: copy a v6.4 fixture DB → boot → all 19 migration methods run → final schema matches `schema.sql`.
- [ ] `schema-consistency.test.ts` (new) passes on CI.
- [ ] `migration-runner.test.ts` (existing, 361 lines) still passes — upgrade path is unchanged.
- [ ] No FTS5 shadow table names appear in `schema.sql` (grep: `_content\|_segments\|_segdir\|_docsize\|_config\|_data\|_idx` returns zero).
### (d) Anti-pattern guards
- **A (invent migration methods)**: `schema.sql` is NOT a replacement for the runner's upgrade methods — it's a fresh-install fast-path. Don't invent a "migration framework". `db.exec()` + a list of functions is the whole system.
- **C (silent fallback)**: if `schema.sql` parsing throws on boot, **do not** fall back to running the runner from scratch — fail boot with a clear error. A fresh-DB schema failure is a shipped bug; users should see it.
### Blast radius
Fresh-install boot drops from ~19 idempotency checks to one `db.exec`. Existing DBs: identical behavior. Risk: `schema.sql` drift from runner — mitigated by the consistency test.
**Lines deleted estimate for this phase alone: 0 net from runner (methods stay for upgrades). Lines added: ~200 for `schema.sql`, ~30 for consistency test, ~15 for boot branch.**
---
## Phase 4 — Move all SQLite housekeeping to boot-once (revised 2026-04-22)
**Outcome**: zero repeating SQLite-related `setInterval`s anywhere in the worker. `PendingMessageStore.claimNextMessage()` becomes pure SELECT+UPDATE (no self-healing per call). Three boot-once jobs exist on `PendingMessageStore` / `Database`, called exactly once at worker startup:
1. `recoverStuckProcessing()` — resets `status='processing'` rows left by a crashed prior worker.
2. `clearFailedOlderThan(1h)` — prunes old failed rows that accumulated before this boot (no schema constraint requires periodic execution; see Reporting block 2).
3. Deletion of the periodic `PRAGMA wal_checkpoint(PASSIVE)` call — replaced by SQLite's native `wal_autocheckpoint` default (1000 pages). `Database.ts:162-168` sets no override so the default is already active; no new code is required.
**Why zero-timer** (authoritative rationale, supersedes any older plan text): SQLite auto-checkpoints when the WAL reaches 1000 pages of writes, which is the correct contract for a long-running worker. An explicit 2-min `PRAGMA wal_checkpoint(PASSIVE)` call accelerates checkpoints beyond that default but is not required for correctness — it was a band-aid layered on top of the stale-reaper interval (`worker-service.ts:547-589`). Similarly, `clearFailedOlderThan(1h)` running every 2 min purges rows that realistically accumulate at single-digit-per-hour rates; once-per-boot is sufficient and no `pending_messages` query cares about row count or stale-row presence. See `08-reconciliation.md` Part 4 revised cross-check (Invariant 4).
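The default this rationale leans on can be read directly off any connection; no code change is needed to "enable" it. A two-line check in Python's stdlib `sqlite3` (standard SQLite builds compile the default in as 1000 pages):

```python
import sqlite3

db = sqlite3.connect(":memory:")
# With no explicit override anywhere, SQLite reports its compiled-in
# autocheckpoint threshold. An override to 0 would disable checkpointing,
# which is exactly what the Phase 4 greps guard against.
threshold_pages = db.execute("PRAGMA wal_autocheckpoint").fetchone()[0]
```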
### (a) Tasks
1. **Add new method** `PendingMessageStore.recoverStuckProcessing()`:
- Copy the stale-reset SQL block from `PendingMessageStore.ts:106-115` **verbatim** into the new method:
```ts
recoverStuckProcessing(): number {
  const staleCutoff = Date.now() - STALE_PROCESSING_THRESHOLD_MS;
  const resetStmt = this.db.prepare(`
    UPDATE pending_messages
    SET status = 'pending', started_processing_at_epoch = NULL
    WHERE status = 'processing' AND started_processing_at_epoch < ?
  `);
  const result = resetStmt.run(staleCutoff);
  if (result.changes > 0) {
    logger.info('QUEUE', `BOOT_RECOVERY | recovered ${result.changes} stale processing message(s)`);
  }
  return result.changes as number;
}
```
- The SQL differs from the copied block in one respect: the `session_db_id = ?` predicate is dropped, because boot recovery is global across all sessions.
2. **Delete** `PendingMessageStore.ts:103-116` (the `staleCutoff` / `resetStmt` block inside `claimTx`). The transaction body shrinks to peek (lines 118-124) + mark-processing (lines 129-134).
3. **Confirm `clearFailedOlderThan()` is callable standalone.** Current signature at `PendingMessageStore.ts:486-495` accepts a `thresholdMs` number and runs a single-statement UPDATE/DELETE. No change to the method body; this phase only moves **where it is called from**. No new method is added for this — the existing one is sufficient.
4. **Delete the explicit `PRAGMA wal_checkpoint(PASSIVE)` call** from `worker-service.ts:~581` as part of plan 07 Phase 4's deletion of the stale-reaper block (`worker-service.ts:547-589`). This plan is the authority that it is safe to delete: `Database.ts:162-168` sets `journal_mode=WAL`, `synchronous=NORMAL`, `cache_size`, `mmap_size`, and leaves `wal_autocheckpoint` at SQLite's default (1000 pages). No override was ever introduced. Verification in (c) confirms.
5. **Wire the three boot calls** in the downstream plan `07-session-lifecycle-management` Phase 3 Mechanism C (boot-once reconciliation block). That plan's responsibility to place `pendingStore.recoverStuckProcessing()` and `pendingStore.clearFailedOlderThan(60 * 60 * 1000)` in the worker startup sequence. This plan **adds/confirms the methods** but does not modify `worker-service.ts` directly (single-responsibility per plan).
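Boot-once recovery is a single global UPDATE. A sketch of the semantics in Python's stdlib `sqlite3` (production is the TypeScript method above; the 60 s value stands in for `STALE_PROCESSING_THRESHOLD_MS`):

```python
import sqlite3, time

STALE_PROCESSING_THRESHOLD_MS = 60_000

db = sqlite3.connect(":memory:")
db.execute("""CREATE TABLE pending_messages (
    id INTEGER PRIMARY KEY,
    status TEXT NOT NULL,
    started_processing_at_epoch INTEGER
)""")

now_ms = int(time.time() * 1000)
# One row abandoned by a crashed prior worker, one claimed just now.
db.execute("INSERT INTO pending_messages (status, started_processing_at_epoch) VALUES ('processing', ?)",
           (now_ms - 2 * 60_000,))
db.execute("INSERT INTO pending_messages (status, started_processing_at_epoch) VALUES ('processing', ?)",
           (now_ms,))

def recover_stuck_processing(db: sqlite3.Connection) -> int:
    # Global across sessions: deliberately no session_db_id predicate.
    cutoff = int(time.time() * 1000) - STALE_PROCESSING_THRESHOLD_MS
    cur = db.execute("""
        UPDATE pending_messages
        SET status = 'pending', started_processing_at_epoch = NULL
        WHERE status = 'processing' AND started_processing_at_epoch < ?
    """, (cutoff,))
    return cur.rowcount

recovered = recover_stuck_processing(db)
```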
### (b) Documentation references
- `05-clean-flowcharts.md` **section 3.3** lines 183-184 ("Worker startup ONCE (not on every claim) … crash recovery") and line 190 (deletion ledger).
- `05-clean-flowcharts.md` Part 2 **D3** (revised 2026-04-22 — zero repeating background timers).
- `05-clean-flowcharts.md` Part 4 timer census (revised — `clearFailedOlderThan` and `PRAGMA wal_checkpoint` explicit disposition).
- Part 1 item **#16** (line 34) and Part 2 decision on "Crash-recovery that solves a real OS-level problem … keep but consolidate".
- Verified-finding **V19** (`06-implementation-plan.md:46`).
- `08-reconciliation.md` Part 4 revised — Invariant 4 (SQLite auto-checkpoint default is active).
- Live code: `PendingMessageStore.ts:6` (threshold), `:99-145` (full `claimNextMessage`), `:486-495` (`clearFailedOlderThan`), `Database.ts:162-168` (PRAGMA block — confirms no `wal_autocheckpoint` override), `worker-service.ts:547-589` (stale-reaper block being deleted by plan 07 Phase 4).
### (c) Verification checklist
- [ ] Grep: `grep -n "STALE_PROCESSING_THRESHOLD_MS" src/services/sqlite/PendingMessageStore.ts` → 2 matches max (constant + `recoverStuckProcessing` body).
- [ ] Grep: `grep -n "status = 'processing'" src/services/sqlite/PendingMessageStore.ts` finds exactly one UPDATE that flips processing→pending (in `recoverStuckProcessing`), NOT in `claimNextMessage`.
- [ ] Inspect `claimNextMessage`: transaction body has no UPDATE-to-pending step.
- [ ] Grep: `grep -rn "clearFailedOlderThan" src/` → exactly 2 matches (the method definition in `PendingMessageStore.ts` and a single call site in the boot-once reconciliation block inside `worker-service.ts`). No call inside any `setInterval` or handler.
- [ ] Grep: `grep -rn "wal_checkpoint" src/services/worker/ src/services/worker-service.ts` → **0 matches** in `worker-service.ts`. If the codebase introduces an observability read of `PRAGMA wal_autocheckpoint` at boot for logging purposes, that is fine — but no explicit `PRAGMA wal_checkpoint(...)` execution anywhere.
- [ ] Grep: `grep -n "wal_autocheckpoint" src/services/sqlite/Database.ts` → 0 matches (confirms we are relying on SQLite's default of 1000 pages; any future non-zero override must be reviewed against this plan).
- [ ] Grep: `grep -rn "setInterval" src/services/sqlite/ src/services/worker-service.ts` → **0 matches** for SQLite-related intervals.
- [ ] New unit test `tests/services/sqlite/PendingMessageStore.boot-recovery.test.ts`:
- Insert a row with `status='processing'`, `started_processing_at_epoch = Date.now() - 2*60_000`.
- Call `recoverStuckProcessing()`; assert return = 1; assert `status='pending'` and `started_processing_at_epoch=NULL`.
- [ ] New unit test `tests/services/sqlite/PendingMessageStore.failed-purge.test.ts`:
- Insert three `status='failed'` rows with `updated_at_epoch` values `now-2h`, `now-30min`, `now-5min`.
- Call `clearFailedOlderThan(60 * 60 * 1000)`; assert exactly the `now-2h` row is removed; the other two remain.
- [ ] WAL-checkpoint regression test: with `wal_autocheckpoint` at SQLite default, write > 1000 pages to the DB in a loop; assert the WAL file size stabilizes (does not grow unbounded). Proves the default is sufficient without explicit `PRAGMA wal_checkpoint`.
- [ ] Existing `tests/services/sqlite/PendingMessageStore.test.ts` tests for `claimNextMessage` still pass, but the "self-healing" test case (if present) is rewritten against `recoverStuckProcessing` instead.
### (d) Anti-pattern guards
- **B (no polling, no new interval)**: none of the three boot-once jobs may run on a timer, inside `claimNextMessage`, or inside any request handler. Boot-once is the contract. The canonical check is `grep -rn "setInterval" src/services/sqlite/ src/services/worker-service.ts` → **0**.
- **A (no invented abstractions)**: no `SqliteHousekeepingService` class, no `BootRecoveryOrchestrator`. The three calls live as three plain method invocations inside plan 07's boot-once reconciliation block. If a fourth housekeeping job appears later, *then* extract.
- **D (no facade-over-facade)**: `clearFailedOlderThan` is called directly on `PendingMessageStore` — do not add a `housekeepFailed()` wrapper that just forwards.
### Blast radius
`PendingMessageStore` (new method + deletion of in-transaction self-heal) and — through plan 07's boot block — `worker-service.ts` (deletion of the periodic `wal_checkpoint` + `clearFailedOlderThan` calls inside the stale-reaper interval). Downstream `07-session-lifecycle-management` adds the call sites; until that plan lands, `recoverStuckProcessing()` is dead code (acceptable — additive, doesn't break anything). Deleting the explicit `wal_checkpoint` call has no user-visible effect; the WAL grows slightly larger between auto-checkpoints, which is within SQLite's designed behavior.
---
## Phase 5 — Delete Python sqlite3 schema-repair; replace with user-facing `claude-mem repair`
**Outcome**: `Database.ts:37-132` (`repairMalformedSchema` + `repairMalformedSchemaWithReopen`) gone. Production boot never shells out to Python. A new CLI subcommand `claude-mem repair` exists (or is stubbed with a documented follow-up plan) for users hitting pre-v6.5 corruption.
### (a) Tasks
1. **Delete** `Database.ts:2-5` (imports: `execFileSync`, `fs` helpers, `tmpdir`, `path.join`) and `Database.ts:37-132` (both `repairMalformedSchema` functions and their reopen wrapper).
2. **Delete** `Database.ts:160` (the call to `repairMalformedSchemaWithReopen`) in the `ClaudeMemDatabase` constructor. PRAGMAs now execute directly after `new Database()`.
3. **Create CLI subcommand** `src/cli/handlers/repair.ts`:
- Copy the Python script body + `execFileSync` pattern from the deleted `Database.ts:81-99` verbatim.
- Expose via `src/cli/index.ts` (or wherever subcommand dispatch lives) as `claude-mem repair`.
- On success, print a human-readable summary: "Dropped N orphaned schema objects; reset migration versions. Restart the worker."
- On failure: exit code 1 with the Python error surfaced.
- **Acceptable alternative if CLI scaffolding is heavier than expected**: ship this phase as a **stub** handler that prints a "Feature scheduled — see follow-up plan [link]" message and register the follow-up plan explicitly. Do not leave the production Python path alive "until the CLI is ready" — the boot-time auto-repair must be deleted in this phase.
4. **Move the existing test** `tests/services/sqlite/schema-repair.test.ts` (253 lines) to exercise the CLI handler instead of the production boot path. If the stub route is taken, the test becomes a skipped/TODO stub with a reference to the follow-up plan.
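The "fail loudly" contract of the new boot path can be sketched in a few lines. Python's stdlib `sqlite3` stands in for `bun:sqlite` here, and `open_database` is an illustrative name, not the real constructor:

```python
import sqlite3

def open_database(path: str) -> sqlite3.Connection:
    """Boot path after Phase 5: no auto-repair, no Python subprocess."""
    try:
        db = sqlite3.connect(path)
        db.execute("PRAGMA journal_mode=WAL")  # first statement to touch the schema
        return db
    except sqlite3.DatabaseError as err:
        # Surface the error and point at the single repair entry point;
        # do NOT silently heal the schema at boot.
        raise SystemExit(f"claude-mem: database error: {err}. Run `claude-mem repair`.") from err

db = open_database(":memory:")  # healthy DB: boots cleanly, no repair involved
```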
### (b) Documentation references
- `05-clean-flowcharts.md` Part 1 item **#27** (line 45): "Users on malformed DBs from v<X run a one-shot `claude-mem repair` command manually."
- Section 3.3 deletion ledger line 187 (~120 lines estimate).
- Verified-finding **V13** (`06-implementation-plan.md:40`).
- Live code: `Database.ts:37-132` (delete), `tests/services/sqlite/schema-repair.test.ts` (repoint).
### (c) Verification checklist
- [ ] `grep -n "execFileSync\|execSync" src/services/sqlite/` → zero hits.
- [ ] `grep -n "python3" src/services/` → zero hits.
- [ ] `grep -rn "repairMalformedSchema" src/` → zero hits.
- [ ] `wc -l src/services/sqlite/Database.ts` shows ~100 fewer lines than today (359 → ~260).
- [ ] `claude-mem repair --help` prints usage (or stub message with follow-up-plan link).
- [ ] Fresh boot smoke: start worker with a healthy DB; confirm no Python process spawned (check `ps` or instrumentation log).
- [ ] Malformed-DB smoke: deliberately corrupt `sqlite_master`, boot worker → expect a clean error with instruction "run `claude-mem repair`" (not a silent auto-heal).
### (d) Anti-pattern guards
- **C (silent fallback)**: boot must not auto-recover from malformed schema. Surface the error. That's the whole point of V13's call-out.
- **A**: do not invent an `AutoRepairService`. One CLI handler, done.
- **E**: `claude-mem repair` is the ONE repair entry point. Delete everywhere else.
### Blast radius
Boot path simplifies. Users on corrupt DBs get a clear message instead of silent auto-fix. Risk: users accustomed to auto-repair will see hard failure — mitigated by the message pointing at `claude-mem repair`.
**Lines deleted estimate: ~100 from `Database.ts`.**
---
## Phase 6 — Delete `DEDUP_WINDOW_MS` + `findDuplicateObservation` (gated on Phase 1 verification)
**Outcome**: Content-hash dedup window removed. UNIQUE constraint is the sole dedup gate. `store.ts` drops to the single INSERT-with-conflict path.
**CRITICAL GATE**: this phase ONLY runs after the gap in reporting block 4 (#2) has been closed: every call site into `storeObservation` provably supplies a real, hook-or-transcript-sourced `tool_use_id`. Before running the `rm` commands below, execute the verification grep AND the integration test described.
### (a) Tasks
**Pre-phase gate (must pass before any deletion):**
- Run `grep -rn "storeObservation(" src/` → enumerate every caller.
- For each caller, trace the `tool_use_id` field back to its source. Must be either (i) the Claude Code hook payload (`tool_use_id` field from `PostToolUse`), (ii) a JSONL transcript line's `tool_use_id`, or (iii) a synthetic-but-stable identifier documented in the caller's comments.
- If any caller has no stable `tool_use_id`, **stop**. Flag to plan owner, keep content-hash fallback, exit this phase.
**If gate passes:**
1. **Delete from `observations/store.ts`**:
- Line 13 (`DEDUP_WINDOW_MS`).
- Lines 21-30 (`computeObservationContentHash` export) — **KEEP** the column and the value written into it for analytics, but the function itself is no longer a public export; inline the SHA computation inside `storeObservation` so the column still gets populated on INSERT. Alternative: keep `computeObservationContentHash` as a utility if any caller outside this file uses it (grep first; V14 implies it's only used here).
- Lines 36-46 (`findDuplicateObservation`).
- Lines 69-75 (the pre-INSERT dup check block).
2. **Simplify `storeObservation` body** to a single INSERT path (the one added in Phase 1).
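The coverage shift this phase completes (identical content no longer blocks; identical `(session, tool_use_id)` still does) can be sketched with the hash kept purely as data. Python's stdlib `sqlite3` again; a non-partial index stands in for the real constraint:

```python
import hashlib, sqlite3

db = sqlite3.connect(":memory:")
db.executescript("""
    CREATE TABLE observations (
        id INTEGER PRIMARY KEY,
        memory_session_id TEXT NOT NULL,
        tool_use_id TEXT NOT NULL,
        content_hash TEXT NOT NULL,
        title TEXT
    );
    CREATE UNIQUE INDEX idx_obs_session_tool
        ON observations(memory_session_id, tool_use_id);
""")

def store(session: str, tool_use_id: str, title: str) -> None:
    # Hash is still computed and written (analytics), but no longer gates inserts.
    content_hash = hashlib.sha256(title.encode()).hexdigest()
    db.execute(
        """INSERT INTO observations (memory_session_id, tool_use_id, content_hash, title)
           VALUES (?, ?, ?, ?)
           ON CONFLICT(memory_session_id, tool_use_id) DO NOTHING""",
        (session, tool_use_id, content_hash, title),
    )

store("s1", "toolu_01", "same content")
store("s1", "toolu_02", "same content")  # same hash, different tool_use_id: persists
store("s1", "toolu_01", "same content")  # same (session, tool_use_id): no-op
row_count = db.execute("SELECT COUNT(*) FROM observations").fetchone()[0]
```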
### (b) Documentation references
- `05-clean-flowcharts.md` section 3.3 lines 188-189 (deletion ledger).
- Verified-finding **V14** (`06-implementation-plan.md:41`).
- Gap #2 in reporting block 4 above — this phase's gate is the closure mechanism for that gap.
### (c) Verification checklist
- [ ] Grep: `grep -rn "DEDUP_WINDOW_MS\|findDuplicateObservation" src/` → zero hits.
- [ ] Grep: `grep -n "computeObservationContentHash" src/services/sqlite/observations/` → limited to `store.ts` (inline) OR zero external callers.
- [ ] New integration test: simulate two PostToolUse hook payloads with the same content (title+narrative) but different `tool_use_id` → assert **both** observations are persisted (UNIQUE doesn't trigger, content-hash no longer blocks). This validates the coverage shift is correct behavior.
- [ ] New integration test: simulate two PostToolUse hook payloads with the same `(session, tool_use_id)` → assert only one row persists, both return the same id.
- [ ] End-to-end: run the full hook cycle; confirm observations land in DB and no dedup log lines from the deleted path appear.
### (d) Anti-pattern guards
- **E (two dedup paths)**: the WHOLE POINT of this phase. Grep must prove the old path is gone before merge.
- **C**: the UNIQUE constraint raises a conflict, which `ON CONFLICT DO NOTHING` converts to a no-op + SELECT-existing. That's **idempotent**, not silent — the caller gets the existing id. Do not introduce any `try/catch` that swallows the conflict differently.
### Blast radius
`observations/store.ts` shrinks to ~40 lines. If the gate fails and this phase is skipped, content-hash dedup survives harmlessly alongside the UNIQUE constraint (extra work per INSERT, no correctness loss).
**Lines deleted estimate: ~40 from `store.ts` (file goes from 108 → ~65 lines).**
---
## Phase 7 — Final verification
**Outcome**: All six phases above land; regression suite green; anti-pattern greps zero.
### (a) Tasks
1. **Run anti-pattern grep pass** (cite these exact patterns):
- `grep -rn "DEDUP_WINDOW_MS" src/` → zero (Phase 6).
- `grep -rn "findDuplicateObservation" src/` → zero (Phase 6).
- `grep -rn "repairMalformedSchema\|execFileSync.*python" src/services/` → zero (Phase 5).
- `grep -rn "STALE_PROCESSING_THRESHOLD_MS" src/` → 2 hits max: constant definition + `recoverStuckProcessing` body (Phase 4).
- `grep -n "status = 'processing'" src/services/sqlite/PendingMessageStore.ts` finds exactly one pending-flip UPDATE, inside `recoverStuckProcessing` (Phase 4).
- `grep -n "tool_use_id" src/services/sqlite/observations/store.ts` ≥ 2 hits (type + INSERT) (Phase 1).
- `grep -n "chroma_synced" src/services/sqlite/migrations/runner.ts` finds the Phase 2 migration (Phase 2).
- `ls src/services/sqlite/schema.sql` exists (Phase 3).
2. **Run tests**:
- `bun test tests/services/sqlite/` — all existing + new tests green.
- Specifically: `migration-runner.test.ts` (361 lines, unchanged test set must still pass), `PendingMessageStore.test.ts`, `schema-repair.test.ts` (retargeted to CLI), plus new: `unique-constraint.test.ts`, `boot-recovery.test.ts`, `schema-consistency.test.ts`.
3. **Run fresh-install smoke**:
- Delete `~/.claude-mem/claude-mem.db`.
- Boot worker via `npm run build-and-sync`.
- Assert: `schema.sql` path taken (no Python process, no 19 migration logs on fresh install).
- Assert: `schema_versions.version = 29` (or whatever the final version is after Phase 2's migration 29 lands).
4. **Run upgrade smoke**:
- Copy a v6.4.x fixture DB to the live path.
- Boot worker.
- Assert: all upgrade migrations through version 29 run; final schema matches `schema.sql`.
5. **Count deleted lines**: `git diff main -- src/services/sqlite/ | grep -c "^-"` should show:
- ~40 lines from `store.ts` (Phase 6).
- ~100 lines from `Database.ts` (Phase 5).
- ~15 lines from `PendingMessageStore.ts` (Phase 4 — net ~0 because `recoverStuckProcessing` is added).
- Net deletions: **~140 lines** (before counting Phase 3's `schema.sql` which is additive).
### (b) Documentation references
- `05-clean-flowcharts.md` section 3.3 (full).
- `06-implementation-plan.md` Phase 9 (lines 412-448) — superseded-but-aligned.
- `06-implementation-plan.md` Phase 15 (lines 631-655) — final-verification template.
### (c) Verification checklist
- [ ] All anti-pattern greps pass.
- [ ] All tests green.
- [ ] Fresh + upgrade smoke tests pass.
- [ ] Deleted-line count ≥ 140.
- [ ] Downstream plan owners (03, 04, 07) notified that their prerequisites (UNIQUE constraint, `chroma_synced` column, `recoverStuckProcessing`) are available.
### (d) Anti-pattern guards
- **A/B/C/E**: final grep pass is the enforcement.
---
## Summary
- **Phase count**: 7 (matches minimum expected set).
- **Net lines deleted** (estimate, source-only, excluding `schema.sql` which is added): **~140**, split:
- Phase 5: ~100 lines from `Database.ts` (Python repair).
- Phase 6: ~40 lines from `observations/store.ts` (dedup window + helper + call block).
- Phase 4: ~0 net (delete ~13, add ~15 for `recoverStuckProcessing`).
- Phase 3: 0 from source (migrations stay for upgrade path; `schema.sql` is new).
- Phases 1, 2: additive only (new migration methods + column + constraint).
- **Top gaps** (see reporting block 4):
1. `schema.sql` generator must filter FTS5 shadow tables; Phase 3 includes the exact NOT-LIKE filter list, but a new FTS5 virtual table with a non-default suffix in a future migration would break this — needs a convention-lock or a more general regex.
2. Phase 6 is **gated** by cross-path `tool_use_id` verification (Phase 1's UNIQUE must provably cover the transcript-watcher ingest path, owned by plan `07-session-lifecycle-management`). If transcript-watcher produces synthetic `tool_use_id`s (e.g., `file:offset`) that don't match hook-path IDs, the content-hash gate cannot be removed safely and Phase 6 must be deferred to a follow-up plan.
# 03 — response-parsing-storage (implementation plan)
> **Design authority**: `05-clean-flowcharts.md` §3.7 (clean diagram + deletion list at lines 295-317), Part 1 bullshit items #20-#23 (lines 38-41), Part 2 decision **D5** (line 77). This plan translates §3.7 into concrete edits. Where the audit disagrees with verified code, the live-file citations win and are called out.
## Dependencies
- **Upstream**: `02-sqlite-persistence`. The sibling plan introduces a `UNIQUE(session_id, tool_use_id)` constraint on `pending_messages` and replaces the 30 s in-memory dedup window with `INSERT … ON CONFLICT DO NOTHING`. *This plan does not touch `pending_messages` schema, but the sibling's `markFailed` contract (`UPDATE … SET status='failed'`) must remain intact — parser-level failure marking continues to go through `PendingMessageStore.markFailed(messageId)` at `src/services/sqlite/PendingMessageStore.ts:349`.* Cite: 02-sqlite-persistence Phase 2 (UNIQUE-constraint phase).
- **Downstream**: `07-session-lifecycle-management`. That plan owns `RestartGuard` evolution and the one-reaper timer. **Critical coupling**: today `RestartGuard` (`src/services/worker/RestartGuard.ts:12-70`) exposes only `recordRestart()`, `recordSuccess()`, and read-only counters — **there is no `recordFailure()` method**. The audit's D5 claim "RestartGuard already exists for repeated failures" is half-true: it covers process-restart loops, not per-message parse failures. Two legitimate options:
1. (preferred) Let parse-failure propagate via `PendingMessageStore.markFailed` only. Session exits through the existing idle path; on the next summarize or observation attempt the session is re-initialised. If parsing fails repeatedly enough to crash the SDK subprocess, `RestartGuard.recordRestart()` is the thing that trips — already wired via existing restart paths. No new RestartGuard surface area required.
2. (alt) Add `session.recordFailure(reason)` as a thin helper that logs + calls `markFailed` for each `processingMessageIds` entry. Still no RestartGuard API changes.
**This plan adopts option (1)**: no new methods on RestartGuard. The flowchart box "session.recordFailure()" from §3.7 resolves to the block of code that marks all `processingMessageIds` as `'failed'` in `pending_messages` — identical shape to today's non-XML early-fail branch at `ResponseProcessor.ts:102–106`, but reached through the single `parseAgentXml` return path. See the `07-session-lifecycle-management` plan for any RestartGuard API additions; do not add them here.
## Verified facts (pinned to files)
| # | Fact | Source |
|---|---|---|
| V7a | `coerceObservationToSummary` is a private fn used twice inside `parseSummary`. | `src/sdk/parser.ts:222` (def), `:152` + `:197` (call sites) |
| V7b | Non-XML early-fail branch lives at lines 87–108. | `src/services/worker/agents/ResponseProcessor.ts:87–108` |
| V7c | Consecutive-summary-failures circuit breaker lives at lines 176–200. | `src/services/worker/agents/ResponseProcessor.ts:176–200` |
| V7d | `consecutiveSummaryFailures` field on `ActiveSession`. | `src/services/worker-types.ts:53` |
| V7e | `consecutiveSummaryFailures` is also **read** by `SessionManager.queueSummarize` at line 340 to short-circuit. That site must be deleted too — the original Phase 3 draft in `06-implementation-plan.md` did not list it. | `src/services/worker/SessionManager.ts:340–346` |
| V7f | `MAX_CONSECUTIVE_SUMMARY_FAILURES` constant in `src/sdk/prompts.ts:21` is imported by both `ResponseProcessor.ts:16` and `SessionManager.ts` (via prompts import). Delete the constant and both imports. | `src/sdk/prompts.ts:21` |
| V7g | Pending-message FAILED state literal is **`'failed'`** (lowercase). CHECK constraint: `status IN ('pending','processing','processed','failed')`. `markFailed(messageId)` is the official API. | `src/services/sqlite/PendingMessageStore.ts:22`, `:349`, `:369`; `src/services/sqlite/migrations/runner.ts:533`; `src/services/sqlite/SessionStore.ts:565` |
| V7h | RestartGuard has no `recordFailure()` method. Public surface: `recordRestart()`, `recordSuccess()`, `restartsInWindow`, `windowMs`, `maxRestarts`. | `src/services/worker/RestartGuard.ts:1–70` |
| V7i | Prompts already mandate `<summary>` root tag for summary turns ("you MUST wrap your ENTIRE response in `<summary>...</summary>` tags", "The ONLY accepted root tag is `<summary>`"). `<skip_summary reason="..."/>` is recognised by the parser (`parser.ts:124`) but is **not** documented in `buildSummaryPrompt` as a valid alternative. Prompt must be updated (Phase 1b) so the D5 contract is actually printed to the agent. | `src/sdk/prompts.ts:153–174`; `src/sdk/parser.ts:124` |
| V7j | Atomic TX boundary is `sessionStore.storeObservations(...)` (single call, internal BEGIN/COMMIT). Do not split it. Today it wraps observations + optional summary in one transaction. | `src/services/worker/agents/ResponseProcessor.ts:149–164`, `src/services/sqlite/observations/store.ts` (module) |
| V7k | `parseSummary` accepts `coerceFromObservation: boolean = false`. All coercion is gated on this flag — it is `true` only when `summaryExpected` (derived from `SUMMARY_MODE_MARKER` substring match) is true. | `src/sdk/parser.ts:122`, `ResponseProcessor.ts:75–81` |
## Concrete target signatures
```ts
// src/sdk/parser.ts — replaces parseObservations + parseSummary + coerceObservationToSummary
export type ParseFailureReason = 'no_xml' | 'missing_summary' | 'malformed';
export interface ParsedAgentOutput {
observations: ParsedObservation[];
summary: ParsedSummary | null;
skipSummary: boolean;
}
export type ParseResult =
| { valid: true; data: ParsedAgentOutput }
| { valid: false; reason: ParseFailureReason };
export function parseAgentXml(
text: string,
opts: { requireSummary: boolean; correlationId?: string; sessionId?: number }
): ParseResult;
```
Failure semantics (no coercion, per D5):
- `text.trim()` is non-empty, no `<observation>`/`<summary>`/`<skip_summary` token → `{valid:false, reason:'no_xml'}`.
- `opts.requireSummary === true` and parse yields no `<summary>` and no `<skip_summary/>` → `{valid:false, reason:'missing_summary'}`.
- Any regex match with empty sub-tag payload where `requireSummary` → `{valid:false, reason:'malformed'}`.
- Otherwise → `{valid:true, data:{observations, summary|null, skipSummary}}`.
## Phases
### Phase 1 — Write `parseAgentXml` in `src/sdk/parser.ts`
**(a) What to implement**
1. Copy `extractField` from `src/sdk/parser.ts:267–276` and `extractArrayElements` from `:282–305` verbatim into the new module layout. These remain private helpers.
2. Copy the observation-extraction loop body (field extraction, type validation, ghost-obs filter) from `src/sdk/parser.ts:40–108` into a private `extractObservations(text, correlationId)` that returns `ParsedObservation[]`. No behaviour change.
3. Copy the summary-extraction happy path (skip_summary check at `:124–133`, `<summary>` regex at `:136–137`, field extraction at `:164–169`, false-positive guard at `:191–214`) into a private `extractSummary(text, sessionId)` that returns `{ summary: ParsedSummary | null; skipSummary: boolean; malformed: boolean }`. **Delete the two `coerceFromObservation` branches at `:151–158` and `:196–203` — they do not survive.**
4. Delete `coerceObservationToSummary` (`src/sdk/parser.ts:222–259`, 38 lines) outright.
5. Write the public `parseAgentXml(text, opts)` that:
- Computes `observations = extractObservations(text, opts.correlationId)`.
- Computes `{ summary, skipSummary, malformed } = extractSummary(text, opts.sessionId)`.
- Returns `{valid:false, reason:'no_xml'}` if `text.trim()` && `observations.length === 0` && `!summary` && `!skipSummary` && `!/<observation>|<summary>|<skip_summary\b/.test(text)`.
- Returns `{valid:false, reason:'missing_summary'}` if `opts.requireSummary` && `!summary` && `!skipSummary`.
- Returns `{valid:false, reason:'malformed'}` if `opts.requireSummary` && `malformed`.
- Returns `{valid:true, data:{observations, summary, skipSummary}}` otherwise.
6. Remove the old named exports `parseObservations` and `parseSummary` and their `coerceFromObservation` parameter. Keep `ParsedObservation`/`ParsedSummary` interfaces (`src/sdk/parser.ts:9–27`) — they're part of the public shape.
**(b) Docs** — `05-clean-flowcharts.md` §3.7 (clean diagram, lines 295–317), Part 1 #20/#21/#23 (lines 38–41), Part 2 D5 (line 77). V7a (parser.ts:222). V7i (prompt contract already mandates `<summary>`; skip-summary token recognised at parser.ts:124). V7k (coerceFromObservation gating on `summaryExpected`).
**(c) Verification**
- `grep -n "coerceObservationToSummary" src/` → 0 hits.
- `grep -n "parseObservations\|parseSummary\b" src/` → 0 hits outside `parser.ts` itself; inside `parser.ts` only the private helpers.
- Unit test: `parseAgentXml('', {requireSummary:false})` → `{valid:true, data:{observations:[], summary:null, skipSummary:false}}` (empty string is not `no_xml`; trim is empty).
- Unit test: `parseAgentXml('Error: auth token expired', {requireSummary:true})` → `{valid:false, reason:'no_xml'}`.
- Unit test: agent returns `<observation><type>x</type><title>t</title></observation>` with `requireSummary:true` → `{valid:false, reason:'missing_summary'}` (no coercion to summary).
- Unit test: `<skip_summary reason="no work"/>` with `requireSummary:true` → `{valid:true, data:{observations:[], summary:null, skipSummary:true}}`.
- Unit test: `<summary><request>r</request>…</summary>` → `{valid:true, data:{…, summary:{…}, skipSummary:false}}`.
**(d) Anti-pattern guards**
- **Guard C (silent fallback)**: Coercion is *deleted*, not relocated. `grep -n "coerce" src/sdk/parser.ts` → 0 hits.
- **Guard D (facades)**: `parseObservations` + `parseSummary` collapse to a single `parseAgentXml`. Two public fns → one.
- **Guard A (invent APIs)**: No new classes. Pure function returning a discriminated union. No `ParserValidator`, no `SummaryCoercer`, no base class.
---
### Phase 1b — Update agent contract in `src/sdk/prompts.ts`
**(a) What to implement** — Extend `buildSummaryPrompt()` at `src/sdk/prompts.ts:140–175` (the return-value template) so it explicitly permits `<skip_summary reason="..."/>` as an alternative when there is literally nothing to summarise. Current text says "The ONLY accepted root tag is `<summary>`" (`:155`), which is incompatible with the parser's `<skip_summary/>` recognition (`parser.ts:124`) and incompatible with the D5 contract ("`<summary>` or `<skip_summary/>`"). Proposed insertion, directly after the existing line `:173`:
```
• If (and ONLY if) there is no work to summarise, you may return
<skip_summary reason="..."/> as the sole root tag instead of <summary>.
Any other response is a protocol violation and the session will fail.
```
Also delete the export `MAX_CONSECUTIVE_SUMMARY_FAILURES` at `src/sdk/prompts.ts:21` and its JSDoc at `:17–20`. The constant is unused after Phase 2 + Phase 3.
**(b) Docs** — §3.7 deletion list ("agent must return `<summary>` or `<skip_summary/>`", line 311). Part 2 D5 (line 77). V7i.
**(c) Verification**
- `grep -n "MAX_CONSECUTIVE_SUMMARY_FAILURES" src/` → 0 hits.
- Manual diff of generated summary prompt shows the skip-summary clause.
- Existing prompt-mandate text (`:153`, `:155`, `:173`) preserved so the normal-case contract stays strict.
**(d) Anti-pattern guards**
- **Guard C**: The contract is now self-describing — no silent downstream coercion needed because the agent is told the protocol explicitly.
---
### Phase 2 — Replace parse path in `ResponseProcessor.ts`
**(a) What to implement**
1. Replace the import at `src/services/worker/agents/ResponseProcessor.ts:15` with `import { parseAgentXml, type ParsedObservation, type ParsedSummary } from '../../../sdk/parser.js';`. Delete `MAX_CONSECUTIVE_SUMMARY_FAILURES` from the `:16` import (keep `SUMMARY_MODE_MARKER`).
2. Replace `processAgentResponse` body at `:69–108`:
- Keep `:62–67` (lastGeneratorActivity + conversationHistory append).
- Compute `summaryExpected` exactly as today (`:75–79`).
- Replace `:70` and `:81` (two separate parse calls) with a single call:
```ts
const parsed = parseAgentXml(text, {
requireSummary: summaryExpected,
correlationId: session.contentSessionId,
sessionId: session.sessionDbId,
});
```
- Replace the non-XML early-fail block `:83–108` (26 lines) with:
```ts
if (!parsed.valid) {
const preview = text.length > 200 ? `${text.slice(0, 200)}...` : text;
logger.warn('PARSER', `${agentName} returned invalid response (${parsed.reason}); marking messages as failed`, {
sessionId: session.sessionDbId,
reason: parsed.reason,
preview,
});
const pendingStore = sessionManager.getPendingMessageStore();
for (const messageId of session.processingMessageIds) {
pendingStore.markFailed(messageId);
}
session.processingMessageIds = [];
return;
}
const { observations, summary } = parsed.data;
```
- Everything at `:110–174` stays unchanged (normalize, ensureMemorySessionIdRegistered, STORING log, labeledObservations, atomic TX, STORED log, lastSummaryStored) — the single-TX invariant is preserved.
3. **Delete the circuit-breaker block `:176–200`** (25 lines) entirely. After deleting, `:202` (claim-confirm) runs immediately after `:174` (lastSummaryStored).
4. No changes to `:202–241` (claim-confirm, restartGuard.recordSuccess, Chroma sync, SSE broadcast, cleanup).
5. **(Preflight edit 2026-04-22 — reconciliation C6)** Emit `summaryStoredEvent` when a summary row is committed. After setting `session.lastSummaryStored` (unchanged from today), if `session.summaryStoredEvent` exists (initialized by `SessionManager` when the session is created, see plan 07 Phase 7), call `session.summaryStoredEvent.emit('stored', summaryId)`. This unblocks the blocking `/api/session/end` handler in plan 07 Phase 7 without polling. Contract: emit exactly once per summary commit; `summaryId` is the newly inserted row id from the atomic TX.
```ts
// inside the block that sets session.lastSummaryStored (around :170–174)
session.lastSummaryStored = true;
session.summaryStoredEvent?.emit('stored', summaryRowId);
```
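The emit-exactly-once contract can be sketched with a plain `EventEmitter`. The standalone `waitForSummary`/`onSummaryCommitted` names below are hypothetical stand-ins for the blocking `/api/session/end` handler (plan 07 Phase 7) and the `lastSummaryStored` block above — a sketch of the wiring, not the real session plumbing:

```typescript
import { EventEmitter, once } from 'node:events';

// summaryStoredEvent: initialized by SessionManager per the plan; here a bare
// emitter stands in for session.summaryStoredEvent.
const summaryStoredEvent = new EventEmitter();

// Consumer side (plan 07 Phase 7): block until the summary row commits,
// bounded by a timeout so a dead session cannot hang the HTTP handler.
async function waitForSummary(timeoutMs: number): Promise<number> {
  const ac = new AbortController();
  const timer = setTimeout(() => ac.abort(), timeoutMs);
  try {
    const [summaryId] = (await once(summaryStoredEvent, 'stored', { signal: ac.signal })) as [number];
    return summaryId;
  } finally {
    clearTimeout(timer);
  }
}

// Producer side (this plan): emit exactly once per summary commit, after the
// atomic TX, with the newly inserted row id.
function onSummaryCommitted(summaryRowId: number): void {
  summaryStoredEvent.emit('stored', summaryRowId);
}
```

`events.once` with an `AbortSignal` gives the no-polling wait the preflight note asks for; if the timeout fires first, the awaiting handler rejects instead of spinning.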
**(b) Docs** — §3.7 clean diagram (B→C→D→{Fail | Store}→Confirm→…, lines 299–308). Part 1 #21 (line 39), #22 (line 40). Part 2 D5 (line 77). V7b (`:87–108`), V7c (`:176–200`), V7g (`'failed'` + `markFailed`).
**(c) Verification**
- `grep -n "parseObservations\|parseSummary\|coerceObservationToSummary\|consecutiveSummaryFailures" src/services/worker/agents/ResponseProcessor.ts` → 0 hits.
- `grep -n "MAX_CONSECUTIVE_SUMMARY_FAILURES" src/services/worker/agents/ResponseProcessor.ts` → 0 hits.
- Integration test A — malformed input: send `"Service temporarily unavailable"` as `text`, assert (i) no row inserted in `observations` table, (ii) no row in `session_summaries`, (iii) every id in `session.processingMessageIds` has `status='failed'` in `pending_messages` after the call returns, (iv) `session.processingMessageIds === []`.
- Integration test B — observation-without-summary when summary expected: `summaryExpected=true`, text is `<observation><type>code</type><title>x</title></observation>`, assert (i) no row in `session_summaries`, (ii) no row in `observations` (contract failure fails the whole batch — no partial write), (iii) pending messages marked `failed`. This is **the critical regression test** — today the coerce path would have written a coerced summary row.
- Integration test C — valid obs + summary: single atomic TX still commits both rows together (pre-existing behaviour, no regression).
**(d) Anti-pattern guards**
- **Guard C**: No coercion, no "close-enough" branch. Every `parsed.valid === false` path leads to `markFailed` and `return`.
- **Guard D**: One parse call (`parseAgentXml`) replaces two (`parseObservations` + `parseSummary`). No wrapper facade.
- **Guard A**: No new method on `RestartGuard`, no new class, no new helper file. Direct calls to the existing `PendingMessageStore.markFailed`.
---
### Phase 3 — Remove `consecutiveSummaryFailures` from `ActiveSession` + its consumer
**(a) What to implement**
1. Delete `src/services/worker-types.ts:5153` (the three lines: JSDoc + `consecutiveSummaryFailures: number;` field). Field name must vanish from the type.
2. Delete `src/services/worker/SessionManager.ts:336–346` (the 11-line circuit-breaker check in `queueSummarize`). The method body goes straight from the auto-initialize check (`:331–334`) to the `// CRITICAL: Persist to database FIRST` comment (`:348`). **This deletion was omitted from the original Phase 3 draft at `06-implementation-plan.md:155–204` — V7e is the new citation.**
3. Delete the initialiser `consecutiveSummaryFailures: 0,` at `SessionManager.ts:232` (inside `initializeSession`).
4. Delete the `MAX_CONSECUTIVE_SUMMARY_FAILURES` import in `SessionManager.ts` (if present). Use `grep -n "MAX_CONSECUTIVE_SUMMARY_FAILURES" src/services/worker/SessionManager.ts` first; remove the line.
5. No schema changes. No new `RestartGuard` API (see Dependencies above — option (1)).
**(b) Docs** — §3.7 deletion bullet "consecutiveSummaryFailures counter + circuit-breaker logic (RestartGuard covers this already)" (line 314). Part 1 #22 (line 40). Part 2 D5 (line 77). V7d, V7e, V7f.
**(c) Verification**
- `grep -rn "consecutiveSummaryFailures" src/` → 0 hits.
- `grep -rn "MAX_CONSECUTIVE_SUMMARY_FAILURES" src/` → 0 hits (constant, its JSDoc, all imports gone).
- TypeScript compile succeeds (removing a field and all references is mechanical; no union fallout expected).
- Behavioural test: call `sessionManager.queueSummarize(sessionDbId)` five times in rapid succession with intentionally failing agent output; assert every call enqueues to `pending_messages` (no silent drop) and each failed attempt marks that message `'failed'`. The old circuit breaker would have swallowed calls 4–5; the new contract doesn't.
- Behavioural test: existing `RestartGuard` still trips after the configured restart count (`MAX_WINDOWED_RESTARTS = 10`, `RESTART_WINDOW_MS = 60_000`) — prove that repeated parse failures + subsequent subprocess restarts still converge to guard-tripped within the window. Covered by `07-session-lifecycle-management` tests; no duplication here.
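For reference, the windowed-restart semantics the second behavioural test leans on (`MAX_WINDOWED_RESTARTS = 10` inside `RESTART_WINDOW_MS = 60_000`) can be sketched as a sliding-window counter — an illustration of the concept, not the real `RestartGuard` implementation:

```typescript
// Sliding-window restart counter matching the guard semantics cited above.
// Timestamps are passed explicitly so the behaviour is deterministic in tests.
class WindowedRestartCounter {
  private restarts: number[] = [];
  constructor(
    private readonly maxRestarts = 10,    // MAX_WINDOWED_RESTARTS
    private readonly windowMs = 60_000,   // RESTART_WINDOW_MS
  ) {}

  recordRestart(now = Date.now()): void {
    this.restarts.push(now);
    // Drop timestamps that have aged out of the window.
    this.restarts = this.restarts.filter((t) => now - t < this.windowMs);
  }

  tripped(now = Date.now()): boolean {
    return this.restarts.filter((t) => now - t < this.windowMs).length >= this.maxRestarts;
  }
}
```

The point of the sketch: parse failures never touch this counter directly — only subprocess restarts do, which is exactly why option (1) needs no new RestartGuard surface.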
**(d) Anti-pattern guards**
- **Guard A**: No new `RestartGuard.recordFailure()` invented. The class stays at 70 lines, public API unchanged. Dependency coupling to `07-session-lifecycle-management` is documentation-only.
- **Guard C**: Removing the circuit breaker means failures flow to queue-level `'failed'` state — a single, visible, DB-backed failure signal. No silent swallow.
---
### Phase 4 — Verification sweep
**(a) What to implement** — Grep audit + targeted regression tests. No new code.
**(b) Docs** — §3.7 full deletion list (lines 310–315), Phase 3 verification block in `06-implementation-plan.md:189–195`.
**(c) Verification — must all return 0 matches**
- `grep -rn "coerceObservationToSummary" src/` → 0.
- `grep -rn "consecutiveSummaryFailures" src/` → 0.
- `grep -rn "MAX_CONSECUTIVE_SUMMARY_FAILURES" src/` → 0.
- `grep -rn "parseObservations\|parseSummary" src/ | grep -v "src/sdk/parser.ts"` → 0 (the only survivors are private helpers inside `parser.ts` itself; if you named them without the `parse` prefix this grep is also 0).
- `grep -rn "coerceFromObservation" src/` → 0.
**(c-cont) Regression tests — must all pass**
- Parser fuzz: feed 1 000 synthetic agent outputs mixing valid/invalid XML + present/absent `<summary>`; assert `valid:false` paths never write to `observations` or `session_summaries`. Must be 0 coerced summary rows.
- Atomic-TX sanity: inject a DB error on `INSERT INTO session_summaries`; assert `storeObservations` rolls back so `observations` for that batch also revert. (Pre-existing invariant; we didn't touch it, but prove it.)
- Idempotency of failure: double-delivery of the same malformed response (e.g., via worker crash + retry) results in the same `pending_messages` row in `'failed'` status; second attempt does not create a duplicate observation. Relies on upstream `02-sqlite-persistence` `UNIQUE(session_id, tool_use_id)` — cross-check with that plan.
- End-to-end: Stop-hook summarize path exercises `parseAgentXml({requireSummary:true})`. With a mocked agent returning garbage, assert the hook receives the 110 s timeout path (no silent summary write), the pending message is `'failed'`, and SessionManager does NOT short-circuit subsequent summarize enqueues (circuit breaker is gone).
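The idempotency test's reliance on `UNIQUE(session_id, tool_use_id)` + `ON CONFLICT DO NOTHING` can be illustrated with an in-memory stand-in; the `PendingMessages` class here is a sketch of the contract only, not the SQLite-backed `PendingMessageStore`:

```typescript
// In-memory stand-in for the pending_messages uniqueness contract from plan
// 02 that the double-delivery regression test leans on. insertOrIgnore
// returns true only when a new row was actually inserted (INSERT ... ON
// CONFLICT DO NOTHING semantics).
class PendingMessages {
  private rows = new Map<string, { status: string }>();

  insertOrIgnore(sessionId: number, toolUseId: string): boolean {
    const key = `${sessionId}:${toolUseId}`;
    if (this.rows.has(key)) return false; // conflict: second delivery is a no-op
    this.rows.set(key, { status: 'pending' });
    return true;
  }

  markFailed(sessionId: number, toolUseId: string): void {
    const row = this.rows.get(`${sessionId}:${toolUseId}`);
    if (row) row.status = 'failed';
  }

  status(sessionId: number, toolUseId: string): string | undefined {
    return this.rows.get(`${sessionId}:${toolUseId}`)?.status;
  }
}
```

A redelivered malformed message therefore converges on the same single `'failed'` row rather than duplicating work.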
**(d) Anti-pattern guards** — All four grep checks enforce Guards A/C/D structurally.
---
## Blast radius
**Files modified**:
- `src/sdk/parser.ts` — full rewrite of public surface; private helpers preserved.
- `src/sdk/prompts.ts` — two-edit surgical change (skip-summary clause, constant delete).
- `src/services/worker/agents/ResponseProcessor.ts` — replace lines 15–16 imports, 69–108 parse block, delete 176–200 circuit breaker.
- `src/services/worker-types.ts` — delete 3 lines.
- `src/services/worker/SessionManager.ts` — delete 11 lines (queueSummarize guard) + 1 line initialiser + maybe 1 import.
**Files not touched**: `src/services/sqlite/observations/store.ts` (atomic TX lives here and is preserved). `src/services/worker/RestartGuard.ts` (API unchanged — see Dependencies option 1). `src/services/worker/agents/SessionCleanupHelper.ts`. `ObservationBroadcaster.ts`. Any Chroma sync module.
**Schema changes**: none.
**Estimated lines deleted**:
- `coerceObservationToSummary` body + JSDoc: ~43 lines
- `coerceFromObservation` branches in `parseSummary`: ~16 lines
- `parseSummary` / `parseObservations` wrapper deduplication: ~15 lines (after collapse into `parseAgentXml`)
- Non-XML early-fail block in `ResponseProcessor.ts:83–108`: ~26 lines (replaced by ~12 lines → net 14)
- Circuit breaker in `ResponseProcessor.ts:176200`: ~25 lines
- `consecutiveSummaryFailures` field + initialiser + SessionManager guard: ~15 lines
- `MAX_CONSECUTIVE_SUMMARY_FAILURES` constant + JSDoc + imports: ~8 lines
**Net**: ~135 lines deleted, ~35 lines added → **~100 LoC net reduction**.
## Confidence + gaps
**High confidence**:
- Parser rewrite is mechanical (extract three private fns, compose them, add the discriminated-union return).
- `'failed'` status string + `markFailed` API are verified.
- Circuit-breaker + field removals are pure deletion once call sites are enumerated (V7e catches the missed site).
**Gaps**:
1. **RestartGuard contract claim in D5 is overstated.** D5 says "RestartGuard already exists for repeated failures — delete the separate counter". RestartGuard today only handles **process-restart** loops, not per-message parse failures. This plan adopts the narrower interpretation (parse failure → `markFailed`; existing RestartGuard handles the subprocess-restart side effects unchanged). If the `07-session-lifecycle-management` plan decides to add `RestartGuard.recordFailure()`, callers here can start using it in a follow-up — no churn to this plan. **Flag for `07-session-lifecycle-management` author**: confirm the RestartGuard surface they want.
2. **Prompt updates assumed in-scope.** The audit implies the agent contract "already states `<summary>` or `<skip_summary/>`". Verified: prompts enforce `<summary>` strictly but never mention `<skip_summary/>`. Phase 1b adds the missing clause. If the team prefers to keep `<skip_summary/>` as a *recognised-but-undocumented* escape hatch, Phase 1b can be dropped — but then the parser should be stricter too (reason `missing_summary` when only skip-summary is emitted without prompt permission). Flag for product owner.
@@ -0,0 +1,314 @@
# Plan 04 — vector-search-sync
**Design authority**: `PATHFINDER-2026-04-21/05-clean-flowcharts.md` **section 3.4** (lines 197–229). Bullshit ledger items **#24, #25, #26** (lines 42–44 of `05-clean-flowcharts.md` Part 1). Implementation-plan anchor: `06-implementation-plan.md` **Phase 10** (lines 452–486) and Phase 0 verified findings **V15, V16, V17** (lines 42–44).
**Dependency — upstream (blocker)**: Plan `02-sqlite-persistence` **Phase 2** (`07-plans/02-sqlite-persistence.md:154–190`) adds `observations.chroma_synced INTEGER DEFAULT 0`, `session_summaries.chroma_synced INTEGER DEFAULT 0`, and partial indexes `idx_observations_chroma_synced` / `idx_summaries_chroma_synced`. This plan ASSUMES that column and indexes exist. Do not start Phase 1 here until Plan 02 Phase 2 is merged and migrated on dev.
**Dependency — downstream (consumer)**: Plan `06-hybrid-search-orchestration` consumes this plan's write-path contract "Chroma down at write time → row committed to SQLite with `chroma_synced=0`, logger.warn, no throw", and the read-path contract "search with Chroma disabled returns 503 `chroma_unavailable`, no silent drop" (see `05-clean-flowcharts.md` section 3.6, lines 270–272, bullshit item #32 line 50). Keep both contracts stable.
---
## Sources consulted
- `PATHFINDER-2026-04-21/05-clean-flowcharts.md:197–229` — section 3.4 clean flowchart + deletion ledger.
- `PATHFINDER-2026-04-21/05-clean-flowcharts.md:42–44` — bullshit items #24 #25 #26.
- `PATHFINDER-2026-04-21/05-clean-flowcharts.md:547–548` — Part 5 deletion totals for Chroma (160 + 160 lines; +60 +40 added).
- `PATHFINDER-2026-04-21/06-implementation-plan.md:42–44` — verified findings V15, V16, V17.
- `PATHFINDER-2026-04-21/06-implementation-plan.md:452–486` — Phase 10 outcome, tasks, verification.
- `PATHFINDER-2026-04-21/01-flowcharts/vector-search-sync.md:1–102` — before-state flowchart.
- `PATHFINDER-2026-04-21/07-plans/02-sqlite-persistence.md:154–190` — chroma_synced migration (Phase 2).
- `src/services/sync/ChromaSync.ts:125–187` — `formatObservationDocs` (granular, multi-doc).
- `src/services/sync/ChromaSync.ts:193–256` — `formatSummaryDocs` (granular, multi-doc).
- `src/services/sync/ChromaSync.ts:262–333` — `addDocuments` + delete-then-add conflict handler.
- `src/services/sync/ChromaSync.ts:339–420` — `syncObservation` / `syncSummary`.
- `src/services/sync/ChromaSync.ts:479–545` — `getExistingChromaIds` metadata scan.
- `src/services/sync/ChromaSync.ts:554–592` — `ensureBackfilled` + `runBackfillPipeline`.
- `src/services/sync/ChromaSync.ts:864–890` — static `backfillAllProjects`.
- `src/services/sync/ChromaSync.ts:903–956` — `updateMergedIntoProject` (kept; uses `chroma_update_documents`).
- `src/services/worker/agents/ResponseProcessor.ts:286–308` — observation call site (fire-and-forget).
- `src/services/worker/agents/ResponseProcessor.ts:380–405` — summary call site (fire-and-forget).
- `src/services/worker-service.ts:470` — boot-time `ChromaSync.backfillAllProjects()` fire-and-forget.
## Concrete findings
- **CRITICAL — no `chroma_upsert_documents` tool exists in the codebase.** Grep of `ChromaSync.ts` for `upsert` returns zero hits. Available MCP tools used today: `chroma_add_documents` (line 284), `chroma_delete_documents` (line 297), `chroma_update_documents` (lines 899, 942, used only for metadata patching in `updateMergedIntoProject`), `chroma_get_documents` (lines 499, 918), `chroma_query_documents`. `chroma_update_documents` *silently ignores missing IDs* (confirmed by the comment at `ChromaSync.ts:293–294`). Therefore a single-call upsert is not available via the current MCP surface.
- **Fallback strategy (documented)**: Replace the write path with "try `chroma_add_documents` first; on `"already exists"` error, call `chroma_delete_documents` then `chroma_add_documents` for that single ID (not the whole batch)." Because the new ID scheme is stable (`obs:<rowid>`, `sum:<rowid>`), conflicts can only occur on legitimate resync — never on organic dedup as before. Keep the branch but collapse it into one helper. Flag: if chroma-mcp ever exposes `chroma_upsert_documents`, replace the add-or-delete+add branch with a single call. Track as a TODO in the code.
- **Write-path is already fire-and-forget** at `ResponseProcessor.ts:286–308` and `:380–405` (`.then().catch()` with `logger.error`, no await). Do not make it blocking. The `chroma_synced=1` UPDATE must run inside the `.then()` arm; the `logger.warn` + leave-flag-0 must run inside the `.catch()` arm.
- **Granularity today**: an observation with narrative + 3 facts = **4** Chroma docs (`narrative` + `text` + `fact_0..fact_N`). A summary with 6 fields populated = **6** docs. Target: 1 doc per row (2 collections, one per doc_type).
- **`getExistingChromaIds` scans *all* metadata for a project** via paged `chroma_get_documents`. On large corpora this is expensive and happens on every worker boot. Replace with `WHERE chroma_synced=0 LIMIT 1000` scan of SQLite.
- **`updateMergedIntoProject` (lines 903–956)** uses `chroma_update_documents` for metadata patching during worktree adoption. That code path is **unrelated** to this plan and must not be touched.
- **Boot-time backfill** is fire-and-forget at `worker-service.ts:470` via static `ChromaSync.backfillAllProjects()`. Swap with instance method `startupBackfillUnsynced()` but keep fire-and-forget.
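The findings above imply a small orchestration shape for `startupBackfillUnsynced()`. A sketch with the SQLite scan and Chroma write injected as hypothetical callbacks (only the control flow is real; the seams are illustrative):

```typescript
// Sketch of startupBackfillUnsynced(): one SQLite scan of rows with
// chroma_synced=0 (LIMIT 1000) instead of paging all Chroma metadata on boot.
// fetchUnsynced / upsertDoc / markSynced are injected stand-ins for the real
// store and the Phase 2 helper.
interface UnsyncedRow { id: number; kind: 'obs' | 'sum' }

async function startupBackfillUnsynced(
  fetchUnsynced: (limit: number) => Promise<UnsyncedRow[]>, // SELECT ... WHERE chroma_synced=0 LIMIT ?
  upsertDoc: (row: UnsyncedRow) => Promise<void>,           // Phase 2 upsert helper
  markSynced: (row: UnsyncedRow) => Promise<void>,          // UPDATE ... SET chroma_synced=1
): Promise<number> {
  const rows = await fetchUnsynced(1000);
  let synced = 0;
  for (const row of rows) {
    try {
      await upsertDoc(row);
      await markSynced(row);
      synced++;
    } catch (err) {
      // Chroma down: leave chroma_synced=0 and continue; warn, never throw.
      console.warn(`chroma backfill skipped ${row.kind}:${row.id}`, err);
    }
  }
  return synced;
}
```

Per-row catch preserves the write-path contract: a Chroma outage degrades to unsynced rows that the next boot picks up, and the fire-and-forget boot call never rejects.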
## Copy-ready snippet locations
| What to copy / cut | From | To |
|---|---|---|
| Replace multi-doc formatter body | `ChromaSync.ts:125–187` (`formatObservationDocs`) | One `formatObservationAsDoc` returning single doc; id `obs:${id}`, text `title + "\n\n" + narrative + "\n\n" + facts.join("\n")`, metadata block kept from lines 134–157. |
| Replace multi-doc formatter body | `ChromaSync.ts:193–256` (`formatSummaryDocs`) | One `formatSummaryAsDoc` returning single doc; id `sum:${id}`, text = all six fields joined with `"\n\n"`, metadata from lines 196–204. |
| Rewrite write path | `ChromaSync.ts:262–333` (`addDocuments` body) | `upsertDoc(doc)` helper: try `chroma_add_documents` with single id; on `"already exist"` call `chroma_delete_documents` then `chroma_add_documents` for that one id. No batch branch; callers pass a single doc. |
| Replace `syncObservation` tail | `ChromaSync.ts:369–377` (`formatObservationDocs` → `addDocuments`) | `const doc = this.formatObservationAsDoc(stored); await this.upsertDoc(doc); await markObservationSynced(observationId);` |
| Replace `syncSummary` tail | `ChromaSync.ts:411–419` (`formatSummaryDocs` → `addDocuments`) | `const doc = this.formatSummaryAsDoc(stored); await this.upsertDoc(doc); await markSummarySynced(summaryId);` |
| Wrap call sites with flag update | `ResponseProcessor.ts:286–308` and `:380–405` | Move `UPDATE observations SET chroma_synced=1 WHERE id=?` inside the helper (Phase 3), not in the call site. Leave the call site's `.catch()` as-is; it already logs. |
| Delete — static full-project scanner | `ChromaSync.ts:864–890` (`backfillAllProjects`) | Replace with instance method `startupBackfillUnsynced()` that does one SELECT LIMIT 1000 and iterates. |
| Delete — metadata scanner | `ChromaSync.ts:479–545` (`getExistingChromaIds`) | Remove entirely after Phase 6 verification passes. |
| Delete — pipeline + per-type backfill | `ChromaSync.ts:554–592` (`ensureBackfilled`, `runBackfillPipeline`) + `backfillObservations/Summaries/Prompts` blocks | Remove after `startupBackfillUnsynced()` replaces them. |
| Boot-site swap | `worker-service.ts:470` (`ChromaSync.backfillAllProjects().then(...)`) | `const sync = this.dbManager.getChromaSync(); sync?.startupBackfillUnsynced().then(...)`. Keep fire-and-forget. |
## Confidence + gaps
- **High**: Flag column exists via Plan 02 Phase 2. Write path is fire-and-forget at call sites. Stable ID scheme is trivial. Granular formatter lines (125–256) can be excised cleanly. `updateMergedIntoProject` is decoupled from this refactor.
- **Medium**: The `"already exists"` string is the only signal of ID conflict today (lines 292–295). If chroma-mcp changes the error message, the delete-then-add branch will silently fall through to the generic error path. Mitigation: match on both `"already exist"` substring and error code if chroma-mcp exposes one (grep on landing day).
- **Gap — unverified MCP assumption**: No `chroma_upsert_documents` tool. Plan commits to delete-then-add fallback. If chroma-mcp adds native upsert post-landing, collapse `upsertDoc` into one call. Flag as TODO at the helper.
- **Gap — prompts**: `backfillPrompts` (`ChromaSync.ts:701`+) and `formatUserPromptDoc` (`:426–438`) already produce one doc per prompt. Keep them; this plan only restructures obs + summary. Verify in Phase 4 that prompt backfill is folded into `startupBackfillUnsynced()` using a `user_prompts.chroma_synced` column (add to Plan 02 Phase 2 or skip — see Phase 4 note below).
---
## Phase 1 — One doc per row: rewrite formatters
### (a) What to implement
- Copy metadata block from `src/services/sync/ChromaSync.ts:134–157` into a new `formatObservationAsDoc(stored): ChromaDocument` that returns exactly one document.
- Copy metadata block from `src/services/sync/ChromaSync.ts:196–204` into a new `formatSummaryAsDoc(stored): ChromaDocument` that returns exactly one document.
- Replace `private formatObservationDocs` (lines 125–187) and `private formatSummaryDocs` (lines 193–256) with these single-doc versions. Delete the `field_type`, per-fact, per-field, and `obs_${id}_narrative` / `obs_${id}_text` / `summary_${id}_request` ID variants.
Observation doc shape:
```ts
{
id: `obs:${stored.id}`,
document: [stored.title, stored.narrative, facts.join("\n")]
.filter(Boolean)
.join("\n\n"),
metadata: /* existing baseMetadata block */
}
```
Summary doc shape: id `sum:${stored.id}`, document = `[request, investigated, learned, completed, next_steps, notes].filter(Boolean).join("\n\n")`.
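A runnable sketch of the observation formatter under the shapes above; the `StoredObservation` row type is an assumption for illustration, and metadata handling is elided:

```typescript
// Sketch of formatObservationAsDoc per the doc shape above. Empty fields are
// dropped via filter(Boolean); a fully empty row throws (Guard C: never
// produce an empty vector).
interface StoredObservation { id: number; title: string; narrative: string; facts: string[] }

function formatObservationAsDoc(stored: StoredObservation): { id: string; document: string } {
  const document = [stored.title, stored.narrative, stored.facts.join('\n')]
    .filter(Boolean)
    .join('\n\n');
  if (!document) throw new Error('refusing to index an empty observation'); // Guard C
  return { id: `obs:${stored.id}`, document };
}
```

`formatSummaryAsDoc` follows the same pattern with the six summary fields and the `sum:` prefix.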
### (b) Docs
- `05-clean-flowcharts.md` section 3.4 (line 203 `Format` node) and deletion ledger line 223.
- Bullshit item **#26** (`05-clean-flowcharts.md:44`).
- Verified finding **V16** (`06-implementation-plan.md:43`).
- Live code: `src/services/sync/ChromaSync.ts:125–256`.
### (c) Verification
- `grep -n "obs_\${" src/services/sync/ChromaSync.ts` → zero.
- `grep -n "summary_\${" src/services/sync/ChromaSync.ts` → zero.
- `grep -nE "field_type|fact_\\\$\\{" src/services/sync/ChromaSync.ts` → zero.
- Unit test: given an observation with narrative + 3 facts, `formatObservationAsDoc` returns 1 doc whose `document` string contains title, narrative, and each fact, separated by `\n\n`, and `id === "obs:<rowid>"`.
### (d) Anti-pattern guards
- **A (Inventing APIs)**: do not add a new class for the single-doc shape — reuse the existing `ChromaDocument` type (already defined at top of `ChromaSync.ts`).
- **C (Silent fallbacks)**: if title is empty AND narrative is empty AND facts is empty, throw — do not produce an empty vector.
- **E (Two code paths)**: delete the multi-doc branches, do not leave them behind a feature flag.
---
## Phase 2 — Replace delete-then-add with upsert-or-fallback
### (a) What to implement
- Cut `private async addDocuments(documents[])` at `src/services/sync/ChromaSync.ts:262–333`.
- Replace with `private async upsertDoc(doc: ChromaDocument): Promise<void>` that:
1. `await this.ensureCollectionExists();`
2. Sanitizes metadata (keep the `filter(([_, v]) => v !== null && v !== undefined && v !== '')` pattern from lines 277–281).
3. Calls `chroma_add_documents` with a single-id payload.
4. On thrown error whose message matches `/already exist/i`: call `chroma_delete_documents` with `[doc.id]`, then retry `chroma_add_documents`. Log at `info` level.
5. On any other error: rethrow. The caller (the `.then()`/`.catch()` in Phase 3 or the `ResponseProcessor` fire-and-forget path) logs and sets the flag.
- TODO comment at top of `upsertDoc`: `// TODO: Replace delete+add fallback with chroma_upsert_documents when MCP exposes it.`
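The five steps above can be sketched as one helper (hedged: `callTool` is an injected stand-in for the real chroma-mcp client, and the payload field names are illustrative — verify them against the live `addDocuments` body before copying):

```typescript
type ChromaDoc = { id: string; document: string; metadata: Record<string, unknown> };
type CallTool = (tool: string, args: Record<string, unknown>) => Promise<void>;

// Sketch of the Phase 2 helper; ensureCollectionExists and logging elided.
async function upsertDoc(doc: ChromaDoc, callTool: CallTool): Promise<void> {
  // Step 2: drop null/undefined/empty metadata values before sending.
  const metadata = Object.fromEntries(
    Object.entries(doc.metadata).filter(
      ([_, v]) => v !== null && v !== undefined && v !== ""
    )
  );
  const payload = { ids: [doc.id], documents: [doc.document], metadatas: [metadata] };
  try {
    await callTool("chroma_add_documents", payload); // step 3: single-id add
  } catch (err) {
    if (err instanceof Error && /already exist/i.test(err.message)) {
      // Step 4: bridge fallback until chroma-mcp ships native upsert.
      await callTool("chroma_delete_documents", { ids: [doc.id] });
      await callTool("chroma_add_documents", payload);
      return;
    }
    throw err; // step 5: caller logs and leaves chroma_synced=0
  }
}
```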
### (b) Docs
- `05-clean-flowcharts.md` section 3.4 line 204 (`Upsert` node) and deletion ledger line 222.
- Bullshit item **#25** (`05-clean-flowcharts.md:43`).
- Verified finding **V17** (`06-implementation-plan.md:44`).
- Live code to cut: `src/services/sync/ChromaSync.ts:262–333`.
### (c) Verification
- `grep -nE "chroma_upsert_documents|upsertDoc" src/services/sync/ChromaSync.ts` → `upsertDoc` appears; `chroma_upsert_documents` absent unless chroma-mcp has shipped it.
- Behavioral test: call `upsertDoc({id:"obs:9999", ...})` twice in a row against a live Chroma. Expect: no error, `chroma_count_documents WHERE metadata.sqlite_id=9999` returns 1.
- Behavioral test: rename the collection to a read-only state, call `upsertDoc`. Expect: error propagates, caller's `.catch()` fires.
### (d) Anti-pattern guards
- **A**: do not add a `ChromaUpsertStrategy` class. One helper function.
- **C**: if delete succeeds but re-add fails, rethrow — do not swallow the error and return silently. The caller's `.catch()` path will leave `chroma_synced=0`, and the backfill will retry.
- **D (Facades that pass through)**: do not wrap `chromaMcp.callTool('chroma_add_documents', ...)` in a `ChromaClient.add()` method — call `callTool` directly inside `upsertDoc`.
---
## Phase 3 — Write path sets `chroma_synced=1` on success
### (a) What to implement
- In `SessionStore` (or nearest matching store file — grep for `prepareStatement('UPDATE observations SET ')` to confirm location before editing), add two 1-line helpers: `markObservationSynced(id: number)` → `UPDATE observations SET chroma_synced=1 WHERE id=?`; and `markSummarySynced(id: number)` likewise against `session_summaries`. Use `db.prepare().run(id)` pattern already used by the store.
- In `ChromaSync.syncObservation` (`ChromaSync.ts:339–378`), replace the existing tail (`formatObservationDocs` + `addDocuments`) with:
```ts
const doc = this.formatObservationAsDoc(stored);
await this.upsertDoc(doc);
markObservationSynced(observationId);
```
Wrap the above in a `try`: on throw, `logger.warn('CHROMA_SYNC', 'obs sync failed, flag stays 0', {id: observationId}, err)` and **rethrow** so the `ResponseProcessor.ts:286–308` `.catch()` still fires (it logs at error level — do not lose that log).
- Same pattern for `syncSummary` (`ChromaSync.ts:384–420`) with `markSummarySynced`.
- Leave the `ResponseProcessor` call site alone — the existing `.then()/.catch()` is correct.
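Put together, the new tail behaves roughly like this (a hedged sketch with dependency-injected stand-ins — `stored`, `logger`, and the store helper names are illustrative, not the literal source):

```typescript
// Illustrative shape of the Phase 3 write path: mark only on success,
// warn and rethrow on failure so the caller's .catch() still fires.
async function syncObservationTail(
  observationId: number,
  stored: { id: number; title: string },
  deps: {
    formatObservationAsDoc: (s: { id: number; title: string }) => { id: string };
    upsertDoc: (doc: { id: string }) => Promise<void>;
    markObservationSynced: (id: number) => void;
    warn: (msg: string, ctx: Record<string, unknown>) => void;
  }
): Promise<void> {
  try {
    const doc = deps.formatObservationAsDoc(stored);
    await deps.upsertDoc(doc);
    deps.markObservationSynced(observationId); // flag flips only on success
  } catch (err) {
    deps.warn("obs sync failed, flag stays 0", { id: observationId, err });
    throw err; // rethrow: ResponseProcessor's .catch() logs at error level
  }
}
```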
### (b) Docs
- `05-clean-flowcharts.md` section 3.4 lines 205–209 (OK branch → `Mark`; fail branch → `LogFail`).
- Bullshit item **#24** (`05-clean-flowcharts.md:42`).
- Phase 10 task 3 (`06-implementation-plan.md:467`).
- Anti-pattern **C** (`06-implementation-plan.md:63`): "On Chroma failure at write time, do not throw — leave flag 0".
- Live call sites: `src/services/worker/agents/ResponseProcessor.ts:286–308` (obs) and `:380–405` (summary).
### (c) Verification
- Functional test: Chroma enabled, worker running, send one observation → after 1 s, `SELECT chroma_synced FROM observations WHERE id=<new>` returns `1`.
- Functional test: Stop Chroma subprocess (kill chroma-mcp), send one observation → SQLite row commits, `chroma_synced=0`, `logger.warn` line emitted. No 500 to the hook.
- Start Chroma again, restart worker. Phase 4's `startupBackfillUnsynced()` upserts the row; flag flips to `1`.
- `grep -n "chroma_synced=1\\|chroma_synced = 1" src/services/` → finds only the two new `mark*Synced` statements.
### (d) Anti-pattern guards
- **C (Silent fallbacks)**: the `logger.warn` call must include `obsId`, `project`, and the error message — never a bare "sync failed".
- **E**: do not set the flag inside the `.then()` arm at the call site. The store update lives in `ChromaSync`, one place.
- **A**: no `SyncStateMachine`, no `ChromaSyncResult` enum. Boolean column + throw-on-fail is enough.
---
## Phase 4 — Replace backfill trio with `startupBackfillUnsynced()`
### (a) What to implement
- Add instance method on `ChromaSync`:
```ts
async startupBackfillUnsynced(limit = 1000): Promise<void> {
  const db = new SessionStore();
  try {
    const obsRows = db.db.prepare(
      'SELECT id FROM observations WHERE chroma_synced = 0 LIMIT ?'
    ).all(limit) as { id: number }[];
    for (const { id } of obsRows) { /* load, formatObservationAsDoc, upsertDoc, markObservationSynced — swallow per-row errors */ }
    const sumRows = db.db.prepare(
      'SELECT id FROM session_summaries WHERE chroma_synced = 0 LIMIT ?'
    ).all(limit) as { id: number }[];
    for (const { id } of sumRows) { /* same pattern */ }
  } finally {
    db.close();
  }
}
```
- Per-row `try/catch`: a single failed upsert must not abort the whole backfill. Logger.warn per failure; leave flag 0.
- In `src/services/worker-service.ts:470`, replace `ChromaSync.backfillAllProjects().then(...)` with `this.dbManager.getChromaSync()?.startupBackfillUnsynced().then(...).catch(...)`. Keep fire-and-forget.
- Delete `static async backfillAllProjects()` (`ChromaSync.ts:864–890`), `ensureBackfilled` (`:554–573`), `runBackfillPipeline` (`:575–592`), `backfillObservations`, `backfillSummaries`, `backfillPrompts`.
- **Prompts note**: if `user_prompts.chroma_synced` column is not added by Plan 02 Phase 2, then either (a) extend Plan 02 Phase 2 to include it, or (b) keep `formatUserPromptDoc`-based one-shot backfill for prompts only and mark as a follow-up. Do not block Phase 4 on this — flag it and continue.
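The per-row error handling those loops need can be sketched as follows (hedged: `upsertRow`, `markSynced`, and `warn` stand in for the real load → `formatObservationAsDoc``upsertDoc` → mark chain and the logger):

```typescript
// Sketch of the per-row backfill loop: one bad row must not abort the rest.
async function backfillRows(
  ids: number[],
  upsertRow: (id: number) => Promise<void>,
  markSynced: (id: number) => void,
  warn: (msg: string, ctx: Record<string, unknown>) => void
): Promise<number> {
  let synced = 0;
  for (const id of ids) {
    try {
      await upsertRow(id); // load, format, upsert against Chroma
      markSynced(id);      // flag flips per row, not per batch
      synced++;
    } catch (err) {
      // Swallow and continue: flag stays 0, next boot retries this row.
      warn("backfill row failed, flag stays 0", { id, err });
    }
  }
  return synced;
}
```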
### (b) Docs
- `05-clean-flowcharts.md` section 3.4 lines 211–212 (`BootOnce` → `CheckUnsync` → `LoopBackfill`).
- Deletion ledger lines 220, 224.
- Phase 10 task 4 (`06-implementation-plan.md:468`).
- Live code to cut: `src/services/sync/ChromaSync.ts:554–592`, `:864–890`, and `backfillObservations/Summaries/Prompts` helper bodies (currently inside the 600–860 range).
- Boot call site: `src/services/worker-service.ts:470`.
### (c) Verification
- `grep -n "backfillAllProjects\|ensureBackfilled\|runBackfillPipeline" src/` → zero.
- Functional test: Insert 5 observations while Chroma is down. Restart worker. Within 10 s, all 5 rows have `chroma_synced=1` and Chroma collection shows 5 docs with ids `obs:<id>`.
- Functional test: Set 1001 rows to `chroma_synced=0`. Restart worker. Exactly 1000 rows flip to `1` after boot backfill; the 1001st stays `0` until next boot (LIMIT 1000 is intentional — document this).
- Log check: `CHROMA_SYNC` logger emits one `"startup backfill complete"` info line per boot with counts.
### (d) Anti-pattern guards
- **A**: no `BackfillScheduler`, no `cron`, no second setInterval. One boot call, fire-and-forget.
- **B (Polling where events exist)**: the existing 5-s rescan or per-startup metadata scan are the exact pollers being removed — do not add a retry timer here.
- **E**: `startupBackfillUnsynced` must use `upsertDoc` and `formatObservationAsDoc` from Phases 1–2. Do not write a parallel fast path.
---
## Phase 5 — Delete `getExistingChromaIds` metadata scan
### (a) What to implement
- Delete `private async getExistingChromaIds(projectOverride?: string)` at `src/services/sync/ChromaSync.ts:479–545` and every call site (only call today is from the now-deleted `ensureBackfilled`).
- **Precondition**: Phase 4 must be landed and its verification passing. This phase is the cleanup sweep.
- **Do NOT delete** in the same PR as Phase 4 unless the targeted `WHERE chroma_synced=0` backfill has been proven in staging to cover missing-doc recovery. Keep `getExistingChromaIds` dead-code-fenced with an `@deprecated` JSDoc for one release if there is any concern.
### (b) Docs
- `05-clean-flowcharts.md:221` ("`getExistingChromaIds` metadata index scan (~80 lines)").
- Verified finding **V17** (`06-implementation-plan.md:44`).
- Live code to cut: `src/services/sync/ChromaSync.ts:479–545`.
### (c) Verification
- `grep -n "getExistingChromaIds" src/` → zero.
- No change in functional behavior vs. end of Phase 4 — this is a pure deletion.
- Re-run Phase 4 functional tests; all pass.
### (d) Anti-pattern guards
- **D (Facades that pass through)**: confirm no caller besides `ensureBackfilled` existed (grep both `ChromaSync.ts` and test files).
- **A**: do not replace with a `getSyncedIds` helper. The SQLite flag is source of truth now.
---
## Phase 6 — Verification gates
### (a) What to implement
Pure test/verification phase. No source edits.
1. **Chroma doc-count = one per obs row**:
- Fresh DB + Chroma. Insert 20 observations. Wait for sync.
- `SELECT COUNT(*) FROM observations WHERE chroma_synced=1` → 20.
- `chroma_count_documents(cm__claude-mem)` → 20 (not 60–100 as before).
2. **Idempotent re-sync**:
- For existing observation id 42 (`chroma_synced=1`): call `syncObservation(42, ...)` again (simulate worktree adoption touch-up).
- Expect: no error, Chroma still has exactly one doc with id `obs:42`, SQLite flag still `1`.
3. **Chroma-down write path**:
- Stop chroma-mcp subprocess. Insert 5 observations via hook.
- SQLite rows commit, `chroma_synced=0` for all 5, `logger.warn` emitted 5 times.
- Restart Chroma, restart worker. Within 10 s: 5 rows flip to `1`, Chroma has 5 docs with ids `obs:<id>`.
4. **Downstream contract smoke** (for Plan 06):
- With Chroma disabled (`CLAUDE_MEM_CHROMA_ENABLED=false`), new observations commit with `chroma_synced=0` and no warn spam.
- Search path (Plan 06's 503 contract): not tested here — plan 06 owns that test.
5. **Grep gates** (all must return zero):
- `grep -nE "formatObservationDocs|formatSummaryDocs" src/`
- `grep -nE "backfillAllProjects|ensureBackfilled|runBackfillPipeline|getExistingChromaIds" src/`
- `grep -nE "obs_\\\$\\{|summary_\\\$\\{|field_type" src/services/sync/`
- `grep -n "addDocuments" src/services/sync/` (should show only the new `upsertDoc` name).
### (b) Docs
- `06-implementation-plan.md:473–476` (Phase 10 verification list).
- `05-clean-flowcharts.md:228` (effect: ~70% index shrink).
### (c) Verification
- All grep gates green.
- All four functional tests pass in CI.
- Chroma on-disk size (`du -sh ~/.claude-mem/chroma`) drops vs. pre-landing baseline (expected ~70% reduction after a full reindex; partial if tests only rebuild a fraction).
### (d) Anti-pattern guards
- **C**: the idempotent re-sync test catches silent divergence (doc count != row count).
- **E**: the grep gates catch any stray code path left behind.
---
## Blast radius
- **Index regenerates under new doc shape**: users on an upgrade path see the old index until `startupBackfillUnsynced()` catches up. On a large corpus (10k+ observations) with a 1000-row limit per boot, full reindex takes ~10 worker restarts or a one-time `claude-mem reindex` CLI (out of scope for this plan — file follow-up).
- **Breaking ID change** (`obs_42_narrative` → `obs:42`): any caller that had hard-coded the old ID scheme (there are none in this repo — grep) would break. Third-party search tools reading Chroma directly would also break; document in changelog.
- **Metadata field removal**: `field_type` and `fact_index` disappear from Chroma metadata. If the viewer UI or search filters depend on these, Plan 06 must absorb the change. Grep `src/` for `field_type` and `fact_index` before merging.
## Estimated deletion
Matches the Part-5 ledger entry "Chroma silent-fallback + 90-day filter + granular docs + delete-then-add" (`-220 +60`) plus "Chroma backfill full-project scan" (`-200 +40`). Net for this plan alone: **~-320 lines** (not counting test churn).
@@ -0,0 +1,308 @@
# Plan 05 — context-injection-engine (U2 unified renderObservations)
**Date**: 2026-04-22
**Flowchart**: `PATHFINDER-2026-04-21/05-clean-flowcharts.md` section **3.5** (context-injection-engine clean)
**Before-state**: `PATHFINDER-2026-04-21/01-flowcharts/context-injection-engine.md`
**Design authority**: `05-clean-flowcharts.md` Part 1 item #34, Part 2 Decision **D4**, Part 3 section **3.5**.
---
## Dependencies
**Upstream**: none direct. This plan *introduces* **U2 `renderObservations(obs, strategy)`** — the single traversal that all four existing formatters become strategy configs for.
**Downstream**:
- `06-hybrid-search-orchestration` — `SearchResultStrategy` is a `renderObservations` strategy (05 section 3.6 arrow `Fmt -->|markdown| M["renderObservations(results, SearchResultStrategy)"]`).
- `10-knowledge-corpus-builder` — `CorpusDetailStrategy` is a `renderObservations` strategy (05 section 3.11 arrow `D --> E["renderObservations(obs, CorpusDetailStrategy)"]`).
- `09-lifecycle-hooks` — consumes the single `GET /api/session/start` endpoint introduced in 05 section 3.1; that endpoint returns `{sessionDbId, contextMarkdown, semanticMarkdown}` in one payload (Phase 6 below).
**Note on `06-implementation-plan.md`**: Phase 8 of the implementation plan covers the same renderer unification and owns the verification-findings list (V1–V20). **There is no V-number for `renderObservations` itself** — the audit's item #34 is the sole design reference. Cited here explicitly so downstream agents don't look for a V-number that doesn't exist.
---
## Sources consulted
1. `PATHFINDER-2026-04-21/05-clean-flowcharts.md` — full file (607 lines). Section 3.5 at lines 232–258; Part 1 item #34 at line 52; Decision D4 at line 75; deletion ledger row for this refactor at line 543 (600 lines formatters → +320 renderer + 4 strategies = **−280 net**).
2. `PATHFINDER-2026-04-21/06-implementation-plan.md` — Phase 8 at lines 368–408. No V-number for renderObservations.
3. `PATHFINDER-2026-04-21/01-flowcharts/context-injection-engine.md` — before diagram; documents the existing two-path surface (`/api/context/inject` GET for SQLite context + `/api/context/semantic` POST for Chroma injection) and the HeaderRenderer/TimelineRenderer/SummaryRenderer/FooterRenderer fan-out.
4. Live codebase — file:line table below.
5. Existing 07-plans/ — directory empty at planning time; this is the first plan file.
### Live file:line inventory (the four formatters + orchestration)
| Concern | File | Lines | Key symbols |
|---|---|---|---|
| **AgentFormatter** (LLM markdown) | `src/services/context/formatters/AgentFormatter.ts` | 227 | `renderAgentHeader` :36, `renderAgentLegend` :46, `renderAgentContextEconomics` :75, `renderAgentDayHeader` :103, `renderAgentTableRow` :127, `renderAgentFullObservation` :142, `renderAgentSummaryItem` :177, `renderAgentSummaryField` :189, `renderAgentPreviouslySection` :197, `renderAgentFooter` :214, `renderAgentEmptyState` :225, private `compactTime` :120, private `formatHeaderDateTime` :21 |
| **HumanFormatter** (ANSI terminal) | `src/services/context/formatters/HumanFormatter.ts` | 238 | `renderHumanHeader` :35, `renderHumanLegend` :47, `renderHumanColumnKey` :60, `renderHumanContextIndex` :72, `renderHumanContextEconomics` :87, `renderHumanDayHeader` :116, `renderHumanFileHeader` :126, `renderHumanTableRow` :135, `renderHumanFullObservation` :155, `renderHumanSummaryItem` :186, `renderHumanSummaryField` :200, `renderHumanPreviouslySection` :208, `renderHumanFooter` :225, `renderHumanEmptyState` :236, private `formatHeaderDateTime` :20 |
| **ResultFormatter** (search markdown, class) | `src/services/worker/search/ResultFormatter.ts` | 301 | `class ResultFormatter` :21, `formatSearchResults` :25 (the top-level walker), `combineResults` :115, `formatSearchTableHeader` :141, `formatTableHeader` :149, `formatObservationSearchRow` :157, `formatSessionSearchRow` :178, `formatPromptSearchRow` :199, `formatObservationIndex` :221, `formatSessionIndex` :237, `formatPromptIndex` :250, `estimateReadTokens` :264, `formatChromaFailureMessage` :275, `formatSearchTips` :288 |
| **CorpusRenderer** (corpus detail, class) | `src/services/worker/knowledge/CorpusRenderer.ts` | 133 | `class CorpusRenderer` :10, `renderCorpus` :14 (the top-level walker), `renderObservation` :39 (private, the per-obs detail renderer), `estimateTokens` :90, `generateSystemPrompt` :97 |
| Orchestrator | `src/services/context/ContextBuilder.ts` | 186 | `generateContext` :130, `buildContextOutput` :80, `initializeDatabase` :49, `renderEmptyState` :73 (calls both empty-state functions) |
| Day-grouping walker (shared today) | `src/services/context/sections/TimelineRenderer.ts` | 183 | `groupTimelineByDay` :21, `renderTimeline` :168, `renderDayTimeline` :151 (forHuman branch :159), `renderDayTimelineAgent` :56, `renderDayTimelineHuman` :97, private `getDetailField` :46 |
| Section dispatch (forHuman branching) | `src/services/context/sections/HeaderRenderer.ts` | 61 | `renderHeader` :15 (branches forHuman for 5 sub-sections) |
| Section dispatch | `src/services/context/sections/SummaryRenderer.ts` | 65 | `shouldShowSummary` :15, `renderSummaryFields` :46 (branches forHuman) |
| Section dispatch | `src/services/context/sections/FooterRenderer.ts` | 42 | `renderPreviouslySection` :15 (branches forHuman), `renderFooter` :28 (branches forHuman) |
| Token economics (KEEP) | `src/services/context/TokenCalculator.ts` | 78 | `calculateTokenEconomics`, `formatObservationTokenDisplay`, `shouldShowContextEconomics` |
| Mode filtering (KEEP) | `src/services/domain/ModeManager.ts` | 266 | `ModeManager.getInstance()`, `getActiveMode`, `getTypeIcon`, `getWorkEmoji` |
| HTTP caller (today) | `src/services/worker/http/routes/SearchRoutes.ts` | — | `handleContextInject` :209 (GET, dynamically imports `context-generator.generateContext`), `handleSemanticContext` :258 (POST, inlines its own formatter at :286–293) |
**Top-level LoC of the four formatters**: 227 + 238 + 301 + 133 = **899 lines**. Section dispatch files (Header/Summary/Footer/Timeline) add another 61 + 65 + 42 + 183 = **351 lines of forHuman branching** that collapse once strategies own the shape.
### Copy-ready: the shared "walk" all four formatters share
Every formatter does some subset of the same four-step traversal. The invariants below become the body of `renderObservations`:
1. **Optional header**: project/title/date line + legend + economics. Today: `HeaderRenderer.renderHeader` (`HeaderRenderer.ts:15`) + `ResultFormatter.formatSearchResults` :53 + `CorpusRenderer.renderCorpus` :17. → Strategy flag: `header: 'context' | 'search' | 'corpus' | 'none'`.
2. **Group and iterate** — the core walk. Today: `groupTimelineByDay` (`TimelineRenderer.ts:21`) for agent/human paths; `groupByDate` (`shared/timeline-formatting.ts`) + file-bucketing at `ResultFormatter.ts:56–72` for search; flat iteration for corpus at `CorpusRenderer.ts:28–31`. → Strategy flag: `grouping: 'by-day' | 'by-day-then-file' | 'none'`.
3. **Per-observation row** — either compact line or full-detail block. Today: `renderAgentTableRow`/`renderAgentFullObservation`, `renderHumanTableRow`/`renderHumanFullObservation`, `formatObservationSearchRow`/`formatObservationIndex`, `CorpusRenderer.renderObservation`. → Strategy flag: `density: 'compact' | 'table' | 'full-detail'` + `colorize: boolean` + `columns: [...]` + `showTokens: {read, work}`.
4. **Optional tail**: summary fields + previously section + footer tips. Today: `SummaryRenderer.renderSummaryFields`, `FooterRenderer.renderPreviouslySection`, `FooterRenderer.renderFooter`, `ResultFormatter.formatSearchTips`. → Strategy flag: `tail: 'context' | 'search-tips' | 'corpus-stats' | 'none'`.
The **five constants** all four share: `ModeManager.getTypeIcon(type)` for the type emoji, `formatTime(epoch)` / `formatDate` / `formatDateTime` from `shared/timeline-formatting.ts`, `extractFirstFile` for file extraction, `parseJsonArray` for facts parsing, and the title-fallback rule `obs.title || 'Untitled'`. These move unchanged into the renderer.
### Confidence + gaps
**High confidence**:
- File inventory, LoC, and symbol-level API of the four formatters.
- That all four read the same shape (`Observation` with `id/title/narrative/facts/type/created_at_epoch/files_modified/files_read`).
- Decision D4's four-strategy ceiling: **Agent, Human, SearchResult, CorpusDetail** — no others.
**Gaps / risks**:
- **ANSI-color preservation in `HumanContextStrategy` is a regression surface**. `HumanFormatter.ts` uses `colors.bright`, `colors.cyan`, `colors.gray`, `colors.dim`, `colors.yellow`, `colors.magenta`, `colors.green`, `colors.blue` imported from `../types.js`. Any divergence — including trailing spaces around ANSI wrappers, padding in `renderHumanTableRow` at :145 (`' '.repeat(time.length)` when `showTime=false`), and the `─`×60 separator at `:39` and `:237` — is a user-visible regression. Phase 8 fixtures must assert byte equality including escape sequences.
- **ResultFormatter has two row formats** (`formatSearchTableHeader` without `Work` column + `formatTableHeader` with `Work` column). `SearchResultStrategy` must support both, gated by a `columns` array — otherwise index-rendering callers (`formatObservationIndex` used elsewhere) regress silently. Grep during Phase 4 to enumerate callers before choosing defaults.
- Semantic-injection POST handler at `SearchRoutes.ts:286–293` implements **its own mini-formatter** (`## Relevant Past Work (semantic match)` header + `### title (date)` + narrative). Anti-pattern E forbids this post-refactor. Phase 6 folds it into a `SearchResultStrategy` variant or a narrow `SemanticInjectStrategy` (still counts as a `SearchResult` strategy per Decision D4's four-total rule — treat this as a strategy *flag*, not a fifth strategy).
---
## Phase contract (applies to every phase)
Every phase below carries:
- **(a) What**: "Copy from …" instructions. The four existing formatters become four strategy configs feeding ONE `renderObservations`.
- **(b) Docs**: `05-clean-flowcharts.md` section 3.5 + Decision D4 + live file:line for each of the four formatters (table above).
- **(c) Verification**: unit tests per strategy against a fixed `Observation[]` fixture; **byte-for-byte match** against the old formatter's output for identical inputs.
- **(d) Anti-pattern guards**:
- **Guard A** (audit Part 2): only four strategies — `AgentContextStrategy`, `HumanContextStrategy`, `SearchResultStrategy`, `CorpusDetailStrategy`. Any fifth strategy fails review.
- **Guard E** (audit Part 2): single renderer path. No caller may implement its own walker. Grep check (Phase 8) enforces.
---
## Phase 1 — Extract common traversal into `renderObservations(obs, strategy)`
**(a) What**:
Create a new module `src/services/rendering/renderObservations.ts` (new folder `src/services/rendering/` so no caller is forced to import across feature boundaries). Copy the *walk* from the three existing walkers:
- Day grouping: from `TimelineRenderer.groupTimelineByDay` (`src/services/context/sections/TimelineRenderer.ts:21`).
- Day-then-file grouping: from `ResultFormatter.formatSearchResults` (`src/services/worker/search/ResultFormatter.ts:56–72`).
- Flat iteration: from `CorpusRenderer.renderCorpus` (`src/services/worker/knowledge/CorpusRenderer.ts:28–31`).
Signature:
```ts
export interface RenderStrategy {
  name: 'agent-context' | 'human-context' | 'search-result' | 'corpus-detail';
  header?: (ctx: HeaderCtx) => string[];
  grouping: 'by-day' | 'by-day-then-file' | 'none';
  renderGroupHeader?: (key: string) => string[];
  renderSubgroupHeader?: (key: string) => string[]; // e.g., file within day
  renderSummaryItem?: (s: SummaryItem, time: string) => string[];
  renderRow: (obs: Observation, ctx: RowCtx) => string;
  renderFullObservation?: (obs: Observation, ctx: RowCtx) => string[];
  tail?: (ctx: TailCtx) => string[];
  emptyState?: (ctx: HeaderCtx) => string;
}

export function renderObservations(
  items: Array<Observation | SummaryItem>,
  strategy: RenderStrategy,
  ctx: RenderContext,
): string;
```
The orchestrator owns: (1) token budget enforcement (from `calculateTokenEconomics`, `TokenCalculator.ts:25`), (2) mode filtering (from `ModeManager.getActiveMode()`, `ModeManager.ts:15`), (3) full-vs-compact selection (from `getFullObservationIds` in `ObservationCompiler.ts`). Strategies **do not** re-implement any of this.
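Under that split, the core of the traversal reduces to something like the sketch below (hedged: `Obs` and `MiniStrategy` are deliberately simplified stand-ins — the real `RenderContext` carries economics, fullObservationIds, and mode data, and the real walker also handles summary items and subgroups):

```typescript
type Obs = { id: number; title: string; day: string };

interface MiniStrategy {
  grouping: "by-day" | "none";
  renderGroupHeader?: (day: string) => string[];
  renderRow: (obs: Obs) => string;
}

// Sketch of the single shared walk: group (or not), then emit rows.
// The walker never branches on forHuman — shape lives in the strategy.
function renderObservations(items: Obs[], strategy: MiniStrategy): string {
  const lines: string[] = [];
  if (strategy.grouping === "by-day") {
    const byDay = new Map<string, Obs[]>();
    for (const o of items) {
      const bucket = byDay.get(o.day) ?? [];
      bucket.push(o);
      byDay.set(o.day, bucket);
    }
    for (const [day, bucket] of byDay) {
      lines.push(...(strategy.renderGroupHeader?.(day) ?? []));
      for (const o of bucket) lines.push(strategy.renderRow(o));
    }
  } else {
    for (const o of items) lines.push(strategy.renderRow(o));
  }
  return lines.join("\n");
}
```

Each of the four strategies then reduces to a plain config object handed to this one walker.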
**(b) Docs**: 05 section 3.5 lines 234–251; Decision D4 line 75. File:line for all four formatters per inventory table.
**(c) Verification**:
- Unit tests: `tests/services/rendering/renderObservations.test.ts` — three tests, one per `grouping` mode, with a synthetic `Observation[]` of 5 items across 2 days and 3 files.
- Build check: `npm run build-and-sync` passes after new module is in place (not yet wired).
**(d) Anti-pattern guards**: A — stop at four strategy names (compile-time `name` union enforces). E — module is the single renderer; callers will switch to it in Phase 6, Phase 7 deletes the old paths.
---
## Phase 2 — `AgentContextStrategy` from `AgentFormatter`
**(a) What**: Create `src/services/context/strategies/AgentContextStrategy.ts` and copy the output-shape bytes from `AgentFormatter.ts` into strategy callbacks:
- `header` ← `renderAgentHeader` (:36) + `renderAgentLegend` (:46) + `renderAgentColumnKey` (:61, no-op) + `renderAgentContextIndex` (:68, no-op) + `renderAgentContextEconomics` (:75) composed in order per `HeaderRenderer.renderHeader` :15.
- `grouping: 'by-day'`; `renderGroupHeader` ← `renderAgentDayHeader` (:103).
- `renderSummaryItem` ← `renderAgentSummaryItem` (:177).
- `renderRow` ← `renderAgentTableRow` (:127); `renderFullObservation` ← `renderAgentFullObservation` (:142).
- `tail` ← `renderAgentSummaryField` (:189) for each of the four fields + `renderAgentPreviouslySection` (:197) + `renderAgentFooter` (:214).
- `emptyState` ← `renderAgentEmptyState` (:225).
The shared `formatHeaderDateTime` (:21) and `compactTime` (:120) move into `src/services/rendering/render-helpers.ts` or stay inline in the strategy (two callers — no DRY pressure yet).
**(b) Docs**: 05 section 3.5 arrow `Strategy -->|AgentContextStrategy| AgentOut["Compact markdown for LLM"]` (line 244); inventory row for `AgentFormatter.ts` above.
**(c) Verification**: snapshot test — feed the same `Observation[]` fixture to (i) the old `buildContextOutput(..., forHuman=false)` and (ii) `renderObservations(items, AgentContextStrategy, ctx)`; assert string equality. Zero-tolerance: LLM context is consumed by models — any whitespace change shifts KV-cache and can surface as behavioral regressions.
**(d) Anti-pattern guards**: A — strategy file defines the config object only, no walker. E — no custom grouping code; reuse Phase 1's `by-day` grouping.
---
## Phase 3 — `HumanContextStrategy` from `HumanFormatter` (preserves ANSI)
**(a) What**: Create `src/services/context/strategies/HumanContextStrategy.ts`. Copy output-shape bytes from `HumanFormatter.ts`:
- `header` ← `renderHumanHeader` (:35) + `renderHumanLegend` (:47) + `renderHumanColumnKey` (:60) + `renderHumanContextIndex` (:72) + `renderHumanContextEconomics` (:87).
- `grouping: 'by-day-then-file'`; `renderGroupHeader` ← `renderHumanDayHeader` (:116); `renderSubgroupHeader` ← `renderHumanFileHeader` (:126).
- `renderSummaryItem` ← `renderHumanSummaryItem` (:186).
- `renderRow` ← `renderHumanTableRow` (:135) — **preserves `colors.dim`, `colors.cyan`, `colors.bright`, `colors.reset` escapes and the `' '.repeat(time.length)` padding for `showTime=false`** (see HumanFormatter.ts:145).
- `renderFullObservation` ← `renderHumanFullObservation` (:155).
- `tail` ← `renderHumanSummaryField` (:200) per field (with its per-field ANSI color from `SummaryRenderer.ts:52–56` — `blue/yellow/green/magenta`) + `renderHumanPreviouslySection` (:208) + `renderHumanFooter` (:225).
- `emptyState` ← `renderHumanEmptyState` (:236) — note the literal `─`×60 separator and the `\n` layout.
ANSI `colors` import from `src/services/context/types.js` stays inside this strategy only. The renderer core is ANSI-agnostic.
**(b) Docs**: 05 section 3.5 arrow `Strategy -->|HumanContextStrategy| HumanOut["ANSI-colored terminal"]` (line 245); inventory row for `HumanFormatter.ts`; D4 explicit about "columns/density/grouping" plus `colorize` per Phase 8 sketch in 06-implementation-plan.md line 385.
**(c) Verification**: snapshot test with explicit ANSI-escape comparison. Fixture MUST include: a no-time continuation row (to exercise the `' '.repeat(time.length)` padding at :145), a full-observation row with facts (exercises :167–177), and the empty-state path (exercises :237). Assert raw buffer equality — not stripped-ANSI equality. Confidence gap: this is the highest regression risk in the plan (see Gaps above).
**(d) Anti-pattern guards**: A — one human strategy. E — no duplicate ANSI wrapping helper; `colors` constants travel with the strategy.
---
## Phase 4 — `SearchResultStrategy` from `ResultFormatter`
**(a) What**: Create `src/services/worker/search/strategies/SearchResultStrategy.ts`. Copy from `ResultFormatter.ts`:
- `header` ← the `Found N result(s) matching "…"` line at :53 (parameterized on query + counts).
- `grouping: 'by-day-then-file'`; `renderGroupHeader` ← day label ``### ${day}`` (:57); `renderSubgroupHeader` ← `**${file}**` + `formatSearchTableHeader` :141 (the `| ID | Time | T | Title | Read |` header).
- `renderRow` dispatches on item kind: `formatObservationSearchRow` (:157), `formatSessionSearchRow` (:178), `formatPromptSearchRow` (:199). The `lastTime` threading for `"` continuation stays in the renderer's `RowCtx` (from Phase 1).
- `tail` ← `formatSearchTips` (:288) appended when not empty.
- `emptyState` ← `No results found matching "${query}"` (:38) / `formatChromaFailureMessage` (:275) gated by a new `ctx.chromaFailed` flag.
The index-column variant (`formatObservationIndex` :221 etc., with the `Work` column) becomes a strategy *option* `columns: ['id','time','type','title','read'] | ['id','time','type','title','read','work']`. Before choosing a default, grep Phase 4 callers to enumerate usages — confidence gap noted above.
**(b) Docs**: 05 section 3.6 line 281 (`renderObservations(results, SearchResultStrategy)`); inventory row for `ResultFormatter.ts`. Cross-reference: `06-hybrid-search-orchestration` plan (downstream) will consume this strategy.
**(c) Verification**: feed the same `SearchResults` fixture to `ResultFormatter.formatSearchResults` and to `renderObservations(combined, SearchResultStrategy, ctx)`; assert byte equality including the date-group headers, file headers, table pipe characters, and trailing blank lines.
**(d) Anti-pattern guards**: A — single `SearchResultStrategy`; if semantic-injection handler at `SearchRoutes.ts:286–293` needs a different shape, it becomes a **flag** on this strategy (`variant: 'table' | 'injection'`), not a fifth strategy. E — delete any caller that still walks `results.observations.map(...)` by hand (Phase 7 grep).
---
## Phase 5 — `CorpusDetailStrategy` from `CorpusRenderer`
**(a) What**: Create `src/services/worker/knowledge/strategies/CorpusDetailStrategy.ts`. Copy from `CorpusRenderer.ts`:
- `header` ← `CorpusRenderer.renderCorpus` :14-26 (the `# Knowledge Corpus: …`, description, stats block, `---` divider). Parameterized on `CorpusFile.name/description/stats`.
- `grouping: 'none'` — corpus walks flat (:28-31).
- `renderFullObservation` ← `CorpusRenderer.renderObservation` (:39) — full narrative, facts list, concepts, files_read, files_modified. No compact row form; every observation renders at full detail (per CorpusRenderer.ts:5).
- `tail: undefined` — corpus has no tail beyond the trailing `---`.
`generateSystemPrompt` (:97) is **not** part of the strategy — it's a separate function on the corpus feature that stays where it is. `estimateTokens` (:90) needs no move — `shared/timeline-formatting.ts` already exports it (per the `ResultFormatter.ts:17` import); delete the duplicate at `CorpusRenderer.ts:90`.
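A minimal sketch of the same contract for the corpus case, mirroring only the fields named above (the stats block inside `header` is elided; `Obs` and `CorpusMeta` are illustrative stand-ins for the real types):

```typescript
// Illustrative types — the real Observation/CorpusFile shapes come from the codebase.
interface Obs { title: string; narrative: string; facts: string[] }
interface CorpusMeta { name: string; description: string }

const CorpusDetailStrategy = {
  grouping: 'none' as const, // corpus walks flat
  tail: undefined,           // nothing beyond the trailing ---
  header: (c: CorpusMeta) =>
    // stats block omitted in this sketch
    `# Knowledge Corpus: ${c.name}\n\n${c.description}\n\n---`,
  // full narrative per observation — no compact row form
  renderFullObservation: (o: Obs) =>
    [`## ${o.title}`, o.narrative, ...o.facts.map((f) => `- ${f}`)].join('\n'),
};
```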
**(b) Docs**: 05 section 3.11 line 457 (`renderObservations(obs, CorpusDetailStrategy)`); inventory row for `CorpusRenderer.ts`. Cross-reference: `10-knowledge-corpus-builder` plan (downstream) consumes this strategy.
**(c) Verification**: feed the same `CorpusFile` to `CorpusRenderer.renderCorpus` and to `renderObservations(corpus.observations, CorpusDetailStrategy, {corpus})`; assert byte equality. Important: corpus output is a *prompt* — whitespace divergence changes prompt-cache hit rate on the SDK side (see 05 section 3.11 cost note, line 476).
**(d) Anti-pattern guards**: A — single `CorpusDetailStrategy`. E — `KnowledgeAgent` and `CorpusBuilder` both route through it; no direct `CorpusRenderer` instantiation post-Phase 7.
---
## Phase 6 — Switch `ContextBuilder.generateContext` + `/api/session/start` handler to `renderObservations`
**(a) What**:
1. Rewrite `src/services/context/ContextBuilder.ts`:
- `buildContextOutput` :80 collapses to: resolve strategy = `forHuman ? HumanContextStrategy : AgentContextStrategy`, build `RenderContext` (economics, fullObservationIds, priorMessages, mostRecentSummary), call `renderObservations(timeline, strategy, ctx)`. The explicit `renderHeader`/`renderTimeline`/`renderSummaryFields`/`renderPreviouslySection`/`renderFooter` fan-out at :95-119 is deleted in favor of strategy-owned `header`/`renderGroupHeader`/`renderRow`/`tail`.
- `renderEmptyState` :73 collapses to `strategy.emptyState?.(ctx)`.
- `generateContext` :130 signature is unchanged — external callers see identical input/output.
2. Add the new `/api/session/start` handler (per 05 section 3.1 line 95 `GET /api/session/start?project=…`). Owned by `lifecycle-hooks` plan (09); this plan lands the *renderer-facing* side: one call into `generateContext(forHuman:false)` for `contextMarkdown`, one call into `SearchOrchestrator.search(query, limit=5)` + `renderObservations(results, SearchResultStrategy, {variant:'injection'})` for `semanticMarkdown`. Both served from a single response body.
3. Delete the inline mini-formatter at `SearchRoutes.ts:286-293` (the `## Relevant Past Work …` block); route through `SearchResultStrategy`.
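The collapsed `buildContextOutput` shape in step 1 can be sketched as follows — the renderer and both strategies are Plan 05 Phase 1-3 artifacts, stubbed here to show only the control flow:

```typescript
// Stand-ins for the Plan 05 Phase 1-3 artifacts; shapes are illustrative.
type Strategy = { name: string };
const AgentContextStrategy: Strategy = { name: 'agent' };
const HumanContextStrategy: Strategy = { name: 'human' };

function renderObservations(timeline: unknown[], strategy: Strategy, _ctx: object): string {
  // stub for the real Phase 1 renderer — returns a marker so the flow is testable
  return `[${strategy.name}] ${timeline.length} items`;
}

function buildContextOutput(timeline: unknown[], forHuman: boolean, ctx: object): string {
  // resolve strategy, then a single renderer call — the explicit
  // renderHeader/renderTimeline/... fan-out is gone
  const strategy = forHuman ? HumanContextStrategy : AgentContextStrategy;
  return renderObservations(timeline, strategy, ctx);
}
```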
**(b) Docs**: 05 section 3.5 entry arrows lines 236-242; 05 section 3.1 lines 95 + 100 (one `/api/session/start` returns ctx + semantic); 06 plan Phase 8 lines 391-394.
**(c) Verification**:
- End-to-end byte-identity: capture the pre-refactor output of `GET /api/context/inject?projects=X&colors=true` and `…&colors=false` for a seeded DB; after the switch, curl the same and diff. Zero diff.
- New `/api/session/start` returns `{sessionDbId, contextMarkdown, semanticMarkdown}` (per 05 section 3.1 line 100) with the two markdown fields byte-matching the previous two-endpoint responses.
- `npm run build-and-sync` passes.
**(d) Anti-pattern guards**: A — no new strategies introduced. E — `SearchRoutes.handleSemanticContext` either deleted (covered by `/api/session/start`) or its body becomes a single `renderObservations(…, SearchResultStrategy, {variant:'injection'})` call — no more inline `lines.push('### …')`.
---
## Phase 7 — Delete the four old formatter files; update imports
**(a) What**:
1. `rm src/services/context/formatters/AgentFormatter.ts` (227 lines).
2. `rm src/services/context/formatters/HumanFormatter.ts` (238 lines).
3. `rm src/services/worker/search/ResultFormatter.ts` (301 lines).
4. `rm src/services/worker/knowledge/CorpusRenderer.ts` (133 lines).
5. Delete `src/services/context/sections/{HeaderRenderer,TimelineRenderer,SummaryRenderer,FooterRenderer}.ts` — their forHuman branching is now owned by strategies. `ObservationCompiler.ts` keeps the data-loading helpers (`queryObservations`, `buildTimeline`, `getFullObservationIds` — these feed the renderer, not part of the deletion).
6. Update imports at: `ContextBuilder.ts` (switch to `renderObservations` + strategies), `SearchManager.ts` / `SearchRoutes.ts` (switch to `SearchResultStrategy`), `KnowledgeAgent.ts` / `CorpusBuilder.ts` (switch to `CorpusDetailStrategy`). Grep for every `import … from '.*AgentFormatter|HumanFormatter|ResultFormatter|CorpusRenderer'` — expect zero after this phase.
**Net line impact**: deletes 227 + 238 + 301 + 133 + 61 + 183 + 65 + 42 = **1,250 lines**. Adds ~320 for `renderObservations` + 4 strategies + shared helpers. **Net ≈ 930 lines** — beats the audit's estimate at 05 line 543 (280 net) because the forHuman branching in the section renderers was not counted there.
**(b) Docs**: 05 section 3.5 "Deleted" list lines 253-256; 06 plan Phase 8 verification line 397.
**(c) Verification**:
- `grep -rn "AgentFormatter\|HumanFormatter\|ResultFormatter\|CorpusRenderer" src/ tests/` → zero hits.
- `grep -rn "renderHeader\|renderTimeline\|renderSummaryFields\|renderPreviouslySection\|renderFooter" src/services/context/sections/` → zero hits (directory removed).
- `npx tsc --noEmit` passes.
- `npm run build-and-sync` passes.
**(d) Anti-pattern guards**: D — no compatibility shim re-exports old names. E — single walker; grep `for (const .* of .*observations)` in `src/services/worker/` and `src/services/context/` should only match inside `renderObservations.ts` (and test fixtures).
---
## Phase 8 — Verification: byte-identical output for all four paths
**(a) What**: Add four golden-file fixtures under `tests/fixtures/rendering/`:
- `agent-context.txt` — output of old `generateContext(input, forHuman=false)` captured before Phase 6.
- `human-context.ansi` — raw bytes including ANSI escapes from old `generateContext(input, forHuman=true)`.
- `search-result.md` — output of old `ResultFormatter.formatSearchResults(results, "test query")`.
- `corpus-detail.md` — output of old `CorpusRenderer.renderCorpus(corpus)`.
Capture on the branch tip *before* Phase 1 so the baseline is pre-refactor. Each phase's unit test (Phases 2-5) diffs against its golden file.
A final integration test runs the four renderers end-to-end against a seeded DB and diffs all four outputs simultaneously.
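The byte-identity check underlying these fixtures can be sketched as a small helper — a sketch assuming a node/bun test runner; the fixture layout is as described above, and `firstDivergence`/`assertByteIdentical` are hypothetical helper names:

```typescript
import { readFileSync } from 'node:fs';

// first divergent byte index, or -1 if byte-identical —
// pinpointing the byte makes whitespace drift easy to spot
function firstDivergence(actual: string, golden: string): number {
  const n = Math.min(actual.length, golden.length);
  for (let i = 0; i < n; i++) if (actual[i] !== golden[i]) return i;
  return actual.length === golden.length ? -1 : n;
}

function assertByteIdentical(actual: string, goldenPath: string): void {
  const i = firstDivergence(actual, readFileSync(goldenPath, 'utf8'));
  if (i !== -1) throw new Error(`output diverges from ${goldenPath} at byte ${i}`);
}
```

Exact byte position matters here because the corpus path is a prompt: a single whitespace divergence breaks prompt-cache hits (05 section 3.11 cost note).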
**(b) Docs**: 06 plan Phase 8 verification lines 396-399 ("Snapshot tests: for each strategy, feed the same fixture `Observation[]` and assert output is byte-equal to the old formatter's output").
**(c) Verification**:
- All four snapshot tests green.
- Grep audit: `grep -rn "setInterval\|formatObservation\|renderObservation" src/ | grep -v renderObservations.ts | grep -v test` — zero hits outside the one renderer.
- SessionStart end-to-end: trigger a real Claude Code session with `npm run build-and-sync`; Agent context in the session + ANSI context in terminal both diff-clean against pre-refactor capture.
- Chroma corpus query test: build a corpus, query it 3× within 5 minutes, assert `cache_read_input_tokens > 0` on SDK response (proves corpus prompt bytes are stable, per 05 section 3.11 cost note).
**(d) Anti-pattern guards**: A — tests enforce the four-strategy ceiling by unioned `name` type. E — the grep audit above is the single-walker check.
---
## Constraints summary
- **Zero behavior change** for LLM (Agent) output bytes and human terminal ANSI bytes. Enforced by Phase 8 golden files.
- **Token-budget logic stays in the orchestrator** (`calculateTokenEconomics` at `TokenCalculator.ts:25`; `getFullObservationIds` at `ObservationCompiler.ts`). Strategies receive computed `RowCtx.isFull`, never re-decide.
- **Mode filtering stays in the orchestrator** (`ModeManager.getActiveMode()` at `ModeManager.ts:15`). Strategies receive filtered `Observation[]`.
- **ANSI color codes preserved**: all `colors.*` literals from `src/services/context/types.js` travel into `HumanContextStrategy` only. The renderer core is ANSI-agnostic.
- **Four strategies, no more**: `AgentContextStrategy`, `HumanContextStrategy`, `SearchResultStrategy`, `CorpusDetailStrategy`. Variants live as strategy config flags.
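The four-strategy ceiling (and Phase 8 guard A's "unioned `name` type" test) can be sketched as a closed union — names come from this summary; the helper is illustrative:

```typescript
// Closed union — adding a fifth strategy requires touching this type,
// which is exactly the review friction guard A wants.
type StrategyName =
  | 'AgentContextStrategy'
  | 'HumanContextStrategy'
  | 'SearchResultStrategy'
  | 'CorpusDetailStrategy';

const ALL_STRATEGIES: readonly StrategyName[] = [
  'AgentContextStrategy',
  'HumanContextStrategy',
  'SearchResultStrategy',
  'CorpusDetailStrategy',
] as const;

// variants (e.g. 'injection') must be config flags, never new entries here
function isKnownStrategy(name: string): name is StrategyName {
  return (ALL_STRATEGIES as readonly string[]).includes(name);
}
```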
---
## Phase count
**8 phases.**
- Phase 1: extract renderer.
- Phase 2: `AgentContextStrategy`.
- Phase 3: `HumanContextStrategy` (ANSI).
- Phase 4: `SearchResultStrategy`.
- Phase 5: `CorpusDetailStrategy`.
- Phase 6: wire `ContextBuilder.generateContext` + `/api/session/start`.
- Phase 7: delete old formatters + section renderers.
- Phase 8: byte-identical verification.
---
## Blast radius + estimated LoC
- **Files deleted**: 8 (four formatters + four section renderers).
- **Files created**: ~6 (`renderObservations.ts` + 4 strategy files + shared helpers).
- **Lines deleted**: ~1,250 (AgentFormatter 227 + HumanFormatter 238 + ResultFormatter 301 + CorpusRenderer 133 + HeaderRenderer 61 + TimelineRenderer 183 + SummaryRenderer 65 + FooterRenderer 42).
- **Lines added**: ~320 (renderer + four strategies, per audit estimate at 05 line 543).
- **Net**: **930 lines**, ~3.3× the audit's row-level estimate of 280, once the forHuman branching in `*Renderer.ts` section files is counted.
Risk: lowest of the cleanup plan (pure reorganization, no behavior change). Snapshot tests are the safety net.
# Plan 06 — hybrid-search-orchestration (clean)
> **Design authority**: `05-clean-flowcharts.md` section 3.6. This plan implements that diagram. When plan and audit disagree, the `06-implementation-plan.md` verified-findings (Phase 0, V11) take precedence.
## Dependencies
- **Upstream**: `07-plans/05-context-injection-engine.md` — introduces `renderObservations(obs, strategy)` and the `SearchResultStrategy` strategy config (derived from `ResultFormatter.ts`). This plan consumes that strategy; it does NOT create it. Hard blocker: Phase 6 below cannot land until Plan 05 Phase 4 lands.
- **Downstream**: `07-plans/10-knowledge-corpus-builder.md` — `CorpusBuilder.build` calls `SearchOrchestrator.search(params)`. Signature stability of `SearchOrchestrator.search` is the contract Plan 10 depends on. Do not rename. Do not change the shape of `StrategySearchResult`.
## Sources consulted
1. `PATHFINDER-2026-04-21/05-clean-flowcharts.md` — section 3.6 (lines 262-292); Part 1 bullshit items #30 #31 #32 #33 (lines 48-51).
2. `PATHFINDER-2026-04-21/06-implementation-plan.md` — Phase 0 V11 (line 38); Phase 4 (lines 208-242); anti-pattern guards C and D (lines 63-64).
3. `PATHFINDER-2026-04-21/01-flowcharts/hybrid-search-orchestration.md` — before-state; full 97 lines.
4. `src/services/worker/SearchManager.ts:1-2069` — full method inventory via grep; spot-read `:1-200`, `:1209-1310`.
5. `src/services/worker/search/SearchOrchestrator.ts:1-290` — confirmed `search(args: any): Promise<StrategySearchResult>` signature; `executeWithFallback` at `:81-121`; silent fallback branch at `:100-110`.
6. `src/services/worker/search/strategies/ChromaSearchStrategy.ts:1-247` — `filterByRecency` at `:196-217`; hard-coded 90-day cutoff via `SEARCH_CONSTANTS.RECENCY_WINDOW_MS` at `:200`.
7. `src/services/worker/search/strategies/SQLiteSearchStrategy.ts:1-132`, `HybridSearchStrategy.ts:1-240`, `SearchStrategy.ts:1-61` — strategy interface and existence confirmed.
8. `src/services/worker/search/types.ts:15-16` — `RECENCY_WINDOW_DAYS: 90` and `RECENCY_WINDOW_MS: 90 * 24 * 60 * 60 * 1000`.
9. `src/services/worker/http/routes/SearchRoutes.ts:1-303` — 14 search/context handlers, all delegating `await this.searchManager.<method>(req.query)`.
10. `PATHFINDER-2026-04-21/07-plans/05-context-injection-engine.md` — `SearchResultStrategy` signature & path (`src/services/worker/search/strategies/SearchResultStrategy.ts` per that plan's Phase 4).
## Concrete findings
### SearchManager method inventory (2069 lines)
Classifications per Decision D ("if body is `return this.other.method(args)`, delete it"):
| `:line` | Method | Classification | Notes |
|---|---|---|---|
| `:59` | `queryChroma` | **real-work (but @deprecated)** | Pre-Orchestrator; called only by `searchChromaForTimeline` and `findByConcept`/`findByFile` hybrid paths inside `SearchManager`. **DELETE** (item #30). |
| `:70` | `searchChromaForTimeline` | **real-work (but @deprecated)** | Bakes 90-day cutoff via `ninetyDaysAgo` param. Callers: only `timeline()` `:490`. **DELETE** (item #30). |
| `:103` | `normalizeParams` | **display-wrap helper** | SearchOrchestrator `:239` has an equivalent. This one adds `filePath→files`, `concept→concepts`, `isFolder` coercion. If we keep SearchManager display-wrap, keep this. Otherwise fold into SearchOrchestrator.normalizeParams and delete. |
| `:161` | `search` | **real-work (display-wrap)** | Lines 161-445: re-implements the whole decision tree + recency filter + categorization + markdown tables. Contains one of four 90-day filter copies (`:230-259`). This is the V11 "real work" method. **REFACTOR**: decision tree/execution deleted (already in Orchestrator); keep only the markdown combining → migrate to `renderObservations(combined, SearchResultStrategy)`. |
| `:450` | `timeline` | **real-work (display-wrap)** | Uses `searchChromaForTimeline` `:490` + 90-day cutoff `:488`. Delegates to `TimelineBuilder` for rendering. **REFACTOR**: strip 90-day cutoff; call `SearchOrchestrator` timeline helpers (`getTimeline`, `formatTimeline` at Orchestrator `:185-209`). |
| `:731` | `decisions` | **display-wrap** | Semantic shortcut; queries Chroma for "decision" observations, renders tables. Route could call `SearchOrchestrator.search({query:'decision', ...})` directly; keep the markdown wrap. |
| `:810` | `changes` | **display-wrap** | Same shape as `decisions`. |
| `:894` | `howItWorks` | **display-wrap** | Same shape. |
| `:951` | `searchObservations` | **pass-through** (with backward-compat shim) | `{type:'observations'}` preset + call through. **DELETE**; route calls `SearchOrchestrator.search({...req.query, type:'observations'})`. |
| `:1037` | `searchSessions` | **pass-through** | Same; `type:'sessions'`. **DELETE**. |
| `:1123` | `searchUserPrompts` | **pass-through** | Same; `type:'prompts'`. **DELETE**. |
| `:1209` | `findByConcept` | **real-work (display-wrap)** | Duplicates the two-phase hybrid logic that exists in `HybridSearchStrategy.findByConcept` at `HybridSearchStrategy.ts:74`. Pure duplication. **DELETE** execution; route calls `SearchOrchestrator.findByConcept(concept, args)` at `SearchOrchestrator.ts:126`. Keep markdown header/table rendering via `renderObservations(obs, SearchResultStrategy)`. |
| `:1277` | `findByFile` | **real-work (display-wrap)** | Same pattern — duplicates `HybridSearchStrategy.findByFile`. **DELETE** execution; route → `SearchOrchestrator.findByFile`. Keep render. |
| `:1399` | `findByType` | **real-work (display-wrap)** | Same pattern — duplicates `HybridSearchStrategy.findByType`. **DELETE** execution; route → `SearchOrchestrator.findByType`. Keep render. |
| `:1468` | `getRecentContext` | **real-work** | ContextBuilder territory, NOT search. Leave to Plan 05. |
| `:1596` | `getContextTimeline` | **real-work** | Same — ContextBuilder / Plan 05. Leave. |
| `:1810` | `getTimelineByQuery` | **real-work** | Contains a fourth copy of the 90-day filter at `:1840-1847`. Depends on `SearchOrchestrator.getTimeline` + `formatTimeline`. **REFACTOR**: strip 90-day; delegate. |
**Tally**: 3 pure pass-throughs to delete (`:951`, `:1037`, `:1123`); 2 `@deprecated` to delete (`:59`, `:70`); 6 real-work methods that keep only their rendering (`:161`, `:450`, `:1209`, `:1277`, `:1399`, `:1810`); 3 semantic shortcuts kept as display-wraps (`:731`, `:810`, `:894`); 2 ContextBuilder-owned methods left for Plan 05 (`:1468`, `:1596`). Every remaining "real-work" body becomes `orchestrator.X(args)` + `renderObservations(combined, SearchResultStrategy, ctx)` — no decision tree, no Chroma calls, no recency filter.
### Duplication vs facade distinction
The three hybrid methods (`findByConcept` `:1209`, `findByFile` `:1277`, `findByType` `:1399`) are not thin facades — they implement the same two-phase (SQLite metadata filter → Chroma semantic rank → intersect) algorithm that already lives in `HybridSearchStrategy.ts:26-240`. This is **parallel reimplementation**, not delegation. Phase 6 kills the in-file copy and routes through `SearchOrchestrator.findByConcept/File/Type` (`SearchOrchestrator.ts:126-180`), which already wraps `HybridSearchStrategy`.
### filterByRecency location
- **Canonical**: `src/services/worker/search/strategies/ChromaSearchStrategy.ts:196-217` — `private filterByRecency(chromaResults)`. Uses `SEARCH_CONSTANTS.RECENCY_WINDOW_MS` at `:200`. Called from `:119` inside `executeChromaSearch`.
- **Constant**: `src/services/worker/search/types.ts:15` — `RECENCY_WINDOW_DAYS: 90`; `:16` — `RECENCY_WINDOW_MS: 90 * 24 * 60 * 60 * 1000`.
- **Legacy copies in `SearchManager.ts`**: `:230`, `:247-259`, `:488`, `:978-985`, `:1064-1071`, `:1150-1157`, `:1840-1847`. All delete with the methods above or their refactors.
### Current Chroma-fail behavior (item #32 silent fallback)
`SearchOrchestrator.executeWithFallback` at `SearchOrchestrator.ts:93-110`:
```ts
const result = await this.chromaStrategy.search(options);
if (result.usedChroma) return result;
// Chroma failed - fall back to SQLite for filter-only
const fallbackResult = await this.sqliteStrategy.search({
...options,
query: undefined // Remove query for SQLite fallback <-- DROPS query text silently
});
return { ...fallbackResult, fellBack: true };
```
And inside `ChromaSearchStrategy.search` at `:76-86`, a thrown error becomes `{ usedChroma: false, fellBack: false }` (swallowed). The Orchestrator's `usedChroma=false` branch then runs SQLite with the query text stripped. **This is the silent fallback from audit item #32**. The current behavior drops the query text and returns filter-only SQLite results — no 503, no error signal to the caller. The caller (SearchManager) surfaces a `chromaFailed` flag in the rendered markdown, but JSON callers (viewer UI, mem-search skill, CorpusBuilder) have no way to detect it.
### Route surface
`src/services/worker/http/routes/SearchRoutes.ts` declares 18 endpoints. Of those that invoke `this.searchManager.*`:
- Pass-through candidates (3): `/api/search/observations` `:98`, `/api/search/sessions` `:107`, `/api/search/prompts` `:116`.
- Route-to-Orchestrator-directly candidates (3): `/api/search/by-concept` `:125`, `/api/search/by-file` `:134`, `/api/search/by-type` `:143`.
- Display-wrap kept: `/api/search` `:53`, `/api/timeline` `:62`, `/api/decisions` `:71`, `/api/changes` `:80`, `/api/how-it-works` `:89`, `/api/timeline/by-query` `:303`, plus `/api/context/*` (Plan 05 territory).
## Copy-ready snippet locations
- Hybrid decision tree + 503 branch target: `SearchOrchestrator.ts:81-121`. Replace lines 100-110 with the 503 throw.
- 503 shape: follow anti-pattern guard C from `06-implementation-plan.md:63` — throw a typed `ChromaUnavailableError` (new class `src/services/worker/search/errors.ts`) with `code='chroma_unavailable'`; `SearchRoutes.wrapHandler` catches and maps to `res.status(503).json({error:'chroma_unavailable'})`.
- Render path: `renderObservations(combined, SearchResultStrategy, ctx)` from Plan 05 Phase 4 → new file `src/services/worker/search/strategies/SearchResultStrategy.ts`.
- Pass-through deletion ranges: `SearchManager.ts:951-1036` (`searchObservations`), `:1037-1122` (`searchSessions`), `:1123-1208` (`searchUserPrompts`).
- `filterByRecency` + callers to delete: `ChromaSearchStrategy.ts:196-217` + call site `:119`; `SEARCH_CONSTANTS.RECENCY_WINDOW_DAYS`/`_MS` at `types.ts:15-16`; plus the seven copies in `SearchManager.ts` listed above.
## Confidence + gaps
**High confidence**:
- SearchManager method classifications (grep-verified inventory; body-read for the three hybrid methods confirms exact duplication of `HybridSearchStrategy.*`).
- Current silent-fallback behavior (read in `SearchOrchestrator.ts:93-110`).
- 90-day default exists at exactly one shared constant (`types.ts:15-16`) plus seven in-file duplicate copies inside `SearchManager.ts`.
**Gaps**:
- Semantic-inject POST `/api/context/semantic` at `SearchRoutes.ts:270` calls `searchManager.search` with its own mini-formatter **post-render** (flagged by Plan 05 Phase 6). This plan does not touch that handler; Plan 05 owns it.
- `ResultFormatter.formatSearchResults` callers — need one grep pass during Phase 6 to confirm no other caller beyond `SearchManager.search` at `:321`, `formatSearchResults` routes, and `SearchOrchestrator.ts:214` (which also exposes it). Left as a Phase 6 checklist item.
- Exact JSON error body shape for 503 — two reasonable choices (`{error:'chroma_unavailable'}` vs `{error:{code:'chroma_unavailable', retryable:true}}`). Defer to Phase 4 decision; current plan uses the simpler shape.
---
## Phase 1 — Classify every `SearchManager` method
**(a) What**: Lock the method inventory above into the repo as a code comment in `SearchManager.ts` header (keeps future auditors honest). No behavior change.
**(b) Docs**: `05-clean-flowcharts.md` Part 1 item #31; `06-implementation-plan.md:38` (V11); live file `src/services/worker/SearchManager.ts:1-2069`.
**(c) Verification**:
- `grep -n "^\s*async \+[a-zA-Z]" src/services/worker/SearchManager.ts | wc -l` → 15 public async methods (matches inventory).
- `grep -n "@deprecated" src/services/worker/SearchManager.ts` → exactly one hit at `:57` (`queryChroma`). Confirm `searchChromaForTimeline` at `:70` is untagged but classified deprecated per `01-flowcharts/hybrid-search-orchestration.md:91`.
**(d) Anti-pattern guards**: Guard D — every method marked "pass-through" in the inventory must have a body that trivially forwards to `this.orchestrator.*` after reading. If a method claims pass-through but also does date filtering or recency windows, reclassify as real-work before later phases delete it.
---
## Phase 2 — Delete `@deprecated` methods
**(a) What**: Copy from `SearchManager.ts:59-97` — **delete** both `queryChroma` and `searchChromaForTimeline`. Update `timeline()` at `:490` to call `SearchOrchestrator.getTimeline` / `formatTimeline` (`SearchOrchestrator.ts:185-209`) instead.
**(b) Docs**: `05-clean-flowcharts.md` Part 1 item #30 (line 48); `05-clean-flowcharts.md` §3.6 "Deleted" bullet 2 (line 286); `SearchManager.ts:57` @deprecated tag.
**(c) Verification**:
- `grep -rn "queryChroma\|searchChromaForTimeline" src/` → only hits are `chromaSync.queryChroma` (ChromaSync public method — do not touch) and `ChromaSearchStrategy.ts` calls to `chromaSync.queryChroma`.
- `grep -n "@deprecated" src/services/worker/SearchManager.ts` → zero hits.
- `npm run build` passes; `/api/timeline?query=x` still returns timeline.
**(d) Anti-pattern guards**: Guard D — no replacement shim; delete outright. Do not leave a `/** @deprecated */` stub calling the Orchestrator — that is the thin-facade anti-pattern returning.
---
## Phase 3 — Route `SearchRoutes` directly to `SearchOrchestrator` for pass-throughs
**(a) What**: In `src/services/worker/http/routes/SearchRoutes.ts`:
1. Inject `SearchOrchestrator` alongside `SearchManager` (or replace `SearchManager` prop entirely once Phase 6 lands). Copy constructor wiring shape from `SearchRoutes.ts:14-18`.
2. Rewire three handlers:
- `:98` `handleSearchObservations` → `await this.orchestrator.search({...req.query, type:'observations'})`
- `:107` `handleSearchSessions` → `await this.orchestrator.search({...req.query, type:'sessions'})`
- `:116` `handleSearchPrompts` → `await this.orchestrator.search({...req.query, type:'prompts'})`
3. Delete `searchObservations`, `searchSessions`, `searchUserPrompts` from `SearchManager.ts:951-1208`.
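The three rewired handlers share one shape, so they can be sketched as a single factory — a sketch only; `makeTypedSearchHandler` is a hypothetical name, and the real wiring follows the `SearchRoutes.ts:14-18` constructor conventions:

```typescript
// Illustrative stand-ins for the Express-style query and the orchestrator surface.
type Query = Record<string, string>;

interface Orchestrator {
  search(args: Record<string, unknown>): Promise<unknown>;
}

// one factory replaces three ~85-line pass-through methods on SearchManager:
// the handler presets `type` and forwards everything else untouched
function makeTypedSearchHandler(
  orchestrator: Orchestrator,
  type: 'observations' | 'sessions' | 'prompts',
) {
  return (query: Query) => orchestrator.search({ ...query, type });
}
```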
**(b) Docs**: `05-clean-flowcharts.md` §3.6 diagram (line 267 `B --> C`); `06-implementation-plan.md:208-225` Phase 4 step 1; live file `src/services/worker/http/routes/SearchRoutes.ts:98-118` and `SearchManager.ts:951-1208`.
**(c) Verification**:
- `grep -n "this.searchManager.search\(Observations\|Sessions\|UserPrompts\)" src/` → zero hits.
- `curl localhost:37777/api/search/observations?query=x` returns the same JSON shape as before (snapshot test).
- Chroma-down test: stop the Chroma subprocess; call `/api/search/observations?query=x`**503 with `{error:'chroma_unavailable'}`** (contract established in Phase 4). Not an empty `observations:[]` array.
**(d) Anti-pattern guards**:
- Guard D — the deleted methods were ~85 lines each of wrapping; make sure the replacement route lines do NOT re-import a "for type consistency" shim from SearchManager.
- Guard C — if the old pass-through silently caught Chroma failures and returned `observations:[]`, the new direct route must propagate the 503 from Phase 4.
---
## Phase 4 — Replace silent Chroma-fail with 503 in `SearchOrchestrator`
**(a) What**: Copy from `SearchOrchestrator.ts:90-110`. Delete the fallback branch:
```ts
// DELETE these lines 100-110
const fallbackResult = await this.sqliteStrategy.search({...options, query: undefined});
return {...fallbackResult, fellBack: true};
```
Replace with:
```ts
throw new ChromaUnavailableError();
```
Add `src/services/worker/search/errors.ts` exporting `class ChromaUnavailableError extends Error { code = 'chroma_unavailable' }`.
Also update `ChromaSearchStrategy.ts:76-86` — the catch block currently swallows errors and returns `usedChroma:false`. Change to rethrow as `ChromaUnavailableError` so `executeWithFallback` sees it.
In `SearchRoutes.ts` `wrapHandler` (or `BaseRouteHandler`), catch `ChromaUnavailableError` → `res.status(503).json({error:'chroma_unavailable'})`.
Update `SearchOrchestrator.findByConcept`/`findByType`/`findByFile` (`:126-180`) — today they fall back to SQLite-only on no-hybrid. That fallback is **allowed** because concept/type/file filters are legitimate without Chroma. Only text-query paths get 503. Document this distinction inline.
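Putting the pieces above together, the new `errors.ts` plus the route-layer mapping can be sketched as follows — the `Res` interface and `mapSearchError` helper are illustrative; the real catch lives in `wrapHandler`/`BaseRouteHandler`:

```typescript
// src/services/worker/search/errors.ts (sketch)
class ChromaUnavailableError extends Error {
  readonly code = 'chroma_unavailable';
  constructor() {
    super('Chroma is unavailable');
    this.name = 'ChromaUnavailableError';
  }
}

// Minimal response surface for the sketch — stands in for the framework's res object.
interface Res {
  status(n: number): Res;
  json(body: object): void;
}

// Route layer: translate the typed error to the 503 contract; anything else
// is not ours to handle here (returns false so the caller can rethrow).
function mapSearchError(err: unknown, res: Res): boolean {
  if (err instanceof ChromaUnavailableError) {
    res.status(503).json({ error: err.code });
    return true;
  }
  return false;
}
```

The orchestrator throws; routes map. No shim in `SearchManager` (guard D).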
**(b) Docs**: `05-clean-flowcharts.md` Part 1 item #32 (line 50); `05-clean-flowcharts.md` §3.6 line 271 (`Return 503 error=chroma_unavailable (NO silent fallback)`); `06-implementation-plan.md:63` anti-pattern C; `06-implementation-plan.md:644` verification line (grep for `res.status(503)` + `chroma_unavailable`).
**(c) Verification**:
- Unit test: stub `ChromaSync.queryChroma` to throw → `SearchOrchestrator.search({query:'x'})` throws `ChromaUnavailableError`.
- Unit test: construct `SearchOrchestrator` with `chromaSync = null``search({query:'x'})` throws `ChromaUnavailableError` (today returns an empty result at `:115-120`; that branch also goes).
- Integration test: `curl localhost:37777/api/search?query=x` with Chroma disabled → `503` with body `{"error":"chroma_unavailable"}`.
- Integration test: `curl localhost:37777/api/search/by-concept?concept=x` with Chroma disabled → 200 with SQLite-only results. Concept/type/file filters remain functional without Chroma; only text-query paths hard-fail.
- `curl localhost:37777/api/search` (no query) with Chroma disabled → 200 with SQLite filter-only results (this path is legitimate per §3.6 line 272).
- `grep -rn "query: undefined" src/services/worker/search/` → zero hits (the silent-drop pattern).
- `grep -rn "fellBack" src/` → zero hits. The `fellBack` field on `StrategySearchResult` is obsolete once fallback is deleted; remove from `types.ts` as part of this phase.
**(d) Anti-pattern guards**:
- Guard C — primary target. Silent fallback deleted; explicit error class + HTTP status.
- Guard D — do not wrap the new throw behind a shim in `SearchManager`. The orchestrator throws; routes handle.
---
## Phase 5 — Delete `filterByRecency` and the 90-day default
**(a) What**:
1. Copy from `ChromaSearchStrategy.ts:196-217` — **delete** the `filterByRecency` method.
2. Delete its call site at `ChromaSearchStrategy.ts:119` (`const recentItems = this.filterByRecency(chromaResults);`). Replace with direct `chromaResults.ids` + `metadatas` join (preserve the metadata-by-id map logic from the old method's lines `:202-208` — that dedup IS real work; only the 90-day filter goes).
3. Delete `SEARCH_CONSTANTS.RECENCY_WINDOW_DAYS` and `RECENCY_WINDOW_MS` from `src/services/worker/search/types.ts:15-16`.
4. Delete the seven in-file copies in `SearchManager.ts` (lines 230-259, 488, 978-985, 1064-1071, 1150-1157, 1840-1847). Replaced by caller-supplied `dateRange` only — if caller wants recency, caller passes `dateRange: {start, end}`.
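The caller-supplied replacement semantics can be sketched in a few lines — `applyDateRange` and the `Item` shape are illustrative, not actual codebase names:

```typescript
// Guard D in code form: no implicit 90-day window.
interface DateRange { start?: number; end?: number } // epoch ms
interface Item { createdAt: number }

function applyDateRange<T extends Item>(items: T[], range?: DateRange): T[] {
  // missing = all; never re-apply a default window when the caller omits it
  if (!range) return items;
  return items.filter(
    (i) =>
      (range.start === undefined || i.createdAt >= range.start) &&
      (range.end === undefined || i.createdAt <= range.end),
  );
}
```

This is the shape the Phase 5 integration tests exercise: a 100-day-old observation passes with no `dateRange` and is excluded only by an explicit `start`.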
**(b) Docs**: `05-clean-flowcharts.md` Part 1 item #33 (line 51); `05-clean-flowcharts.md` §3.6 "Deleted" bullet 4 (line 288); live `src/services/worker/search/strategies/ChromaSearchStrategy.ts:196-217`; `src/services/worker/search/types.ts:15-16`.
**(c) Verification**:
- `grep -rn "RECENCY_WINDOW\|filterByRecency\|ninetyDaysAgo\|90.day\|90 days" src/` → zero hits.
- Integration test: seed an observation dated 100 days ago; query by its text → it appears in results (would have been filtered out pre-deletion).
- Integration test: pass `dateRange.start` = 60 days ago; observation from 100 days ago is excluded. Explicit filter still works.
**(d) Anti-pattern guards**:
- Guard C — silent implicit filter replaced by explicit caller param.
- Guard D — no "convenience wrapper" that re-applies 90 days when `dateRange` is missing. Missing = all.
---
## Phase 6 — Keep display-wrap in `SearchManager`; switch to `renderObservations(results, SearchResultStrategy)`
**BLOCKED until**: Plan 05 Phase 4 lands and ships `src/services/worker/search/strategies/SearchResultStrategy.ts`.
**(a) What**:
1. In `SearchManager.ts:161-445` (`search`): delete everything from the `PATH 1` decision at `:177` through the categorization/hydration blocks at `:321`. The full decision tree is already in `SearchOrchestrator.search`. Replace body with:
```ts
async search(args: any): Promise<any> {
const results = await this.orchestrator.search(args);
if (args.format === 'json') return { content:[{type:'text', text: JSON.stringify(results)}] };
const combined = combineResults(results.results);
return { content:[{type:'text', text: renderObservations(combined, SearchResultStrategy, ctx)}] };
}
```
2. Apply same transformation to `timeline` `:450`, `findByConcept` `:1209`, `findByFile` `:1277`, `findByType` `:1399`, `getTimelineByQuery` `:1810`. Each becomes: call orchestrator → render via strategy. Keep the outer `{content:[{type:'text', ...}]}` MCP envelope; drop everything in between.
3. Keep `decisions`, `changes`, `howItWorks` `:731-950` as semantic-shortcut wrappers. They compute a preset query string, call `this.orchestrator.search({...args, query:'decision'})` (or equivalent), render via `renderObservations`. Body shrinks from ~70 lines each to ~10.
4. Delete or drop-in replace `normalizeParams` at `:103` — `SearchOrchestrator.normalizeParams` at `:239` is canonical. If the API-only coercions (`filePath→files`, `isFolder`) are missing there, **move them into** `SearchOrchestrator.normalizeParams` and delete the SearchManager copy. Guard: grep every caller to confirm the Orchestrator version covers all cases.
**(b) Docs**: `05-clean-flowcharts.md` §3.6 line 281 (`Fmt -->|markdown| M["renderObservations(results, SearchResultStrategy)"]`); `06-implementation-plan.md:220-225` (Phase 4 step 3 — keep the combine/group/table code as a `ResultRenderer` module); `07-plans/05-context-injection-engine.md:169-182` Phase 4 (SearchResultStrategy); live `src/services/worker/SearchManager.ts:161-445`.
**(c) Verification**:
- `wc -l src/services/worker/SearchManager.ts` → under 400 lines (from 2069).
- Snapshot test: fixture `SearchResults` → `renderObservations(combined, SearchResultStrategy, ctx)` output is byte-equal to the pre-refactor `ResultFormatter.formatSearchResults` output. Plan 05 Phase 4 owns this fixture; reuse it here.
- `grep -n "combineResults\|groupByDate\|groupByFile" src/services/worker/SearchManager.ts` → zero hits (now lives in SearchResultStrategy / renderObservations).
- Manual: viewer UI `http://localhost:37777` search results render identically.
**(d) Anti-pattern guards**:
- Guard D — SearchManager's remaining methods must each be ≤15 lines (orchestrator call + render envelope). If any method balloons back, it's re-implementing decision logic.
- Guard A (strategy count from Plan 05 audit Part 2) — don't invent a fifth strategy just for "semantic context injection". Plan 05 Phase 6 routes that handler through `SearchResultStrategy` with a flag.
---
## Phase 7 — Verification
Run all checks from phases 1–6 in one pass, plus:
1. **Behavior preservation**:
- All three search paths (filter-only, Chroma-semantic, hybrid concept/type/file) return results for representative queries.
- `?format=json` and default markdown both work on every search endpoint.
- `concept=`, `type=`, `obs_type=`, `files=`, `filePath=` filters all honored (grep-verify normalizeParams covers each).
- Timeline endpoint returns chronological groupings with anchor depth filtering intact.
2. **Chroma-down contract**:
- Stop Chroma subprocess. `curl /api/search?query=x` → 503 `{"error":"chroma_unavailable"}`. Not empty, not silent.
- `curl /api/search` (no query) → 200 with SQLite filter results.
- `curl /api/search/by-concept?concept=foo` → 200 with SQLite metadata results (per `SearchOrchestrator.ts:126-140`).
3. **Line-count targets**:
- `SearchManager.ts`: 2069 → under 400 lines (≥1600 deleted).
- `SearchOrchestrator.ts`: ~290 → ~280 (fallback branch removed, error class added).
- `ChromaSearchStrategy.ts`: 247 → ~215 (filterByRecency deleted).
- Net project delete target: ~1700 lines.
4. **Grep contract checks**:
- `grep -rn "query: undefined" src/services/worker/search/` → 0.
- `grep -rn "RECENCY_WINDOW\|filterByRecency\|ninetyDaysAgo" src/` → 0.
- `grep -rn "@deprecated" src/services/worker/SearchManager.ts` → 0.
- `grep -rn "this.searchManager.search\(Observations\|Sessions\|UserPrompts\)" src/` → 0.
- `grep -rn "res.status(503)" src/services/worker/http/` → at least one hit on the `chroma_unavailable` path.
5. **Downstream smoke** (Plan 10 contract):
- `CorpusBuilder.build` test — feed synthetic observations, confirm `SearchOrchestrator.search` signature unchanged and `StrategySearchResult` shape stable.
6. **Anti-pattern audit**:
- Guard C: no `catch { return empty }` patterns in `src/services/worker/search/`.
- Guard D: every method in `SearchManager.ts` either renders or shortcut-presets. No single-line `return this.orchestrator.x(args)` passthrough remains.
# Implementation Plan: session-lifecycle-management
**Flowchart**: PATHFINDER-2026-04-21/05-clean-flowcharts.md § 3.8 ("session-lifecycle-management (clean) — BIGGEST CULL")
**Before-state**: PATHFINDER-2026-04-21/01-flowcharts/session-lifecycle-management.md
**Scope** (revised 2026-04-22: zero-timer model): delete all three repeating background timers in the worker layer — no `ReaperTick` replacement, no `sqliteHousekeepingInterval`. Replace each recurring check with one of:
- (a) the `child.on('exit')` handlers already wired at `ProcessRegistry.ts:479` (SDK) and `worker-service.ts:530` (MCP);
- (b) the per-iterator 3-min idle `setTimeout` already wired at `SessionQueueProcessor.ts:6` (covers the hung-generator case on its own);
- (c) a per-session `setTimeout(deleteSession, 15min)` scheduled on last-generator-completion and cleared on new activity (covers the abandoned-session case);
- (d) a boot-once reconciliation block that calls the existing `killSystemOrphans()` + `supervisor.pruneDeadEntries()` + `recoverStuckProcessing()` + `clearFailedOlderThan(1h)` once at worker startup.

Also: delete the worker-level `ProcessRegistry` facade (528 LoC), inline the SIGTERM→SIGKILL ladder, and implement blocking `POST /api/session/end`.
**Target LoC**: process-lifecycle ~900 → ~400.
**Target repeating-timer count in `src/services/worker/` + `worker-service.ts`**: 3 → **0**. (The only `setTimeout` calls that remain are the per-operation escalation ladder, per-session idle, per-session abandonment, and the generator-exit race — all non-repeating, all correct.)
---
## Dependencies
### Upstream (must land first)
- **01-privacy-tag-filtering** — defines shared `stripMemoryTags(text)` in `src/utils/tag-stripping.ts`. Phase 1 of THIS plan introduces `ingestObservation` / `ingestPrompt` / `ingestSummary` helpers that call that function. If 01 has not landed, Phase 1 here imports the existing wrappers, but the ingest-helper location (`src/services/ingest/`) is authoritative and 01 rewires its call-sites into these helpers.
- **02-sqlite-persistence** — owns the boot-recovery section of `sqlite-persistence (clean)` (§ 3.3 bottom box `BootOnce`). V19 per-claim 60-s reset (`PendingMessageStore.ts:99-145`) is deleted by Phase 5 of THIS plan and replaced with a single `PendingMessageStore.recoverStuckProcessing()` called once in worker boot. 02 codifies the broader schema-recovery ordering; Phase 5 slots `recoverStuckProcessing()` into that boot sequence.
- **03-response-parsing-storage** — defines `ResponseProcessor` + `session.recordFailure()` contract. Phase 7 (blocking `/api/session/end`) awaits the `summary_stored` flag that `ResponseProcessor` sets after a successful summary commit. The "summary_stored OR 110s timeout" integration point lives inside this plan (Phase 7) but depends on 03 wiring the flag.
### Downstream (this plan enables)
- **09-lifecycle-hooks** — hook layer consumes the blocking `POST /api/session/end` built in Phase 7 (replaces the current 500-ms polling loop in `src/cli/handlers/summarize.ts:117-150`). That plan's hook simplification is blocked until Phase 7 ships.
---
## Concrete findings from live code
### `src/services/worker/ProcessRegistry.ts` (527 lines — entire file slated for deletion)
Exposed surface (every export → supervisor-registry method it should hit directly):
| Worker export | File:line | Replacement |
|---|---|---|
| `registerProcess(pid, sessionDbId, process)` | `:57-65` | `getSupervisor().registerProcess(id, info, procRef)` — already the body of this function |
| `unregisterProcess(pid)` | `:70-79` | `getSupervisor().getRegistry().getByPid(pid)` + `getSupervisor().unregisterProcess(record.id)` — already the body |
| `getProcessBySession(sessionDbId)` | `:85-94` | Move to free helper `findSessionProcess(id)` in `src/services/worker/process-spawning.ts`; body iterates `getRegistry().getAll()` + filters by `type==='sdk'` (same as `getTrackedProcesses` helper at `:34-52`) |
| `getActiveCount()` | `:99-101` | Direct: `getSupervisor().getRegistry().getAll().filter(r => r.type==='sdk').length` |
| `waitForSlot(max, timeout, evict)` | `:122-167` | Pool-slot bookkeeping is worker-scoped, **not** a supervisor concern. Keep as free function in `process-spawning.ts`. The `slotWaiters` array (`:104`) stays module-local. |
| `notifySlotAvailable()` (internal) | `:109-112` | Stays module-local in `process-spawning.ts`; called from the `exit` event handler inside `createPidCapturingSpawn`. Under the zero-timer model, `exit` is the sole runtime trigger, so slot notification happens directly from the handler that already owns subprocess-death semantics. No scanner involved. |
| `getActiveProcesses()` | `:172-179` | Free helper in `process-spawning.ts` (still used for stats / debug endpoints). |
| `ensureProcessExit(tracked, timeoutMs=5000)` | `:185-229` | **Inline** into `deleteSession` (SessionManager.ts:406-413) as 12-line block: check `exitCode`, `Promise.race([once('exit'), setTimeout])`, SIGKILL, race again. Per audit item #9 and anti-pattern guard A. |
| `killIdleDaemonChildren()` | `:244-309` | **Delete**. Its runtime role (cleaning up our own idle daemons) is covered by the `child.on('exit')` handler at `ProcessRegistry.ts:479` which already calls `unregisterProcess(pid)`, combined with the per-iterator 3-min idle `setTimeout` at `SessionQueueProcessor.ts:6` that aborts hung generators. Ppid=1 leftovers from a prior worker crash are caught by boot-once `killSystemOrphans()` (see next row). |
| `killSystemOrphans()` | `:315-344` | **Keep function body; move call from interval to boot-once.** Ppid=1 Claude processes can only exist because a *previous* worker crashed without reaping them — during the current worker's lifetime, `exit` handlers catch subprocess death. So one call at worker startup covers the full scope. Called from worker boot init (Phase 3), never scheduled. |
| `reapOrphanedProcesses(activeSessionIds)` | `:349-382` | **Delete**. Runtime component: covered by `exit` handlers. Cross-restart component: covered by boot-once `supervisor.pruneDeadEntries()` which walks the registry and drops entries whose PIDs are no longer in the OS. |
| `createPidCapturingSpawn(sessionDbId)` | `:393-502` | Move verbatim to `process-spawning.ts` as free function. It already wires `child.on('exit')` → `unregisterProcess(pid)` at `:479-486` — keep that path; it's the sole runtime subprocess-death signal under the zero-timer model. |
| `startOrphanReaper(getActiveSessionIds, intervalMs=30_000)` | `:508-527` | **Delete**; no replacement timer. |
Caller fan-out (every `from '.../ProcessRegistry'` site must be re-pointed):
- `src/services/worker/SessionManager.ts:17` — imports `getProcessBySession, ensureProcessExit`. Rewrite: import from `./process-spawning.js` (findSessionProcess), and inline the exit wait in `deleteSession`.
- `src/services/worker/SDKAgent.ts:24` — imports `createPidCapturingSpawn, getProcessBySession, ensureProcessExit, waitForSlot`. Rewrite: import from `./process-spawning.js`. The `ensureProcessExit` call-site (search inside SDKAgent) goes away when we route through `deleteSession`.
- `src/services/worker-service.ts:109` — imports `startOrphanReaper, reapOrphanedProcesses, getProcessBySession, ensureProcessExit`. After Phase 3, imports shrink to `{ getActiveProcesses }` from `./process-spawning.js`. `startOrphanReaper` + `reapOrphanedProcesses` delete. The `ensureProcessExit` at `worker-service.ts:786` inlines.
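The inlined exit wait from the `ensureProcessExit` row above can be sketched as follows. The function name and exact timeouts are illustrative; in the plan this is a ~12-line block inside `deleteSession`, not an exported helper:

```typescript
import { once } from 'node:events';
import { setTimeout as sleep } from 'node:timers/promises';
import type { ChildProcess } from 'node:child_process';

// Sketch: check exitCode, race once('exit') against a timeout, escalate
// SIGTERM → SIGKILL, race again. Timeouts mirror the plan (5 s, then 1 s).
async function ensureExit(child: ChildProcess, timeoutMs = 5000): Promise<void> {
  if (child.exitCode !== null || child.signalCode !== null) return; // already gone
  const exited = once(child, 'exit'); // attach before signalling
  child.kill('SIGTERM');
  const termed = await Promise.race([exited.then(() => true), sleep(timeoutMs, false)]);
  if (termed) return;
  child.kill('SIGKILL');                     // escalate
  await Promise.race([exited, sleep(1000)]); // best-effort final wait
}
```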
### `src/supervisor/process-registry.ts` (408 lines — authoritative, stays as-is)
Relevant API (no changes needed):
- `class ProcessRegistry` at `:175`: `register`, `unregister`, `getAll`, `getBySession`, `getByPid`, `getRuntimeProcess`, `pruneDeadEntries` (`:269-285`, uses `isPidAlive`), `reapSession(sessionId)` (`:292-385`, implements SIGTERM → wait 5 s → SIGKILL → wait 1 s).
- `isPidAlive(pid)` at `:28-45` — reused directly by boot-once `supervisor.pruneDeadEntries()` (Phase 3 Mechanism C) and by the inlined `killSystemOrphans()` body, both called exactly once per worker boot. Not called by any repeating timer.
- `getSupervisor().getRegistry()` — how worker code reaches this class (verified in worker/ProcessRegistry.ts:39, 71, 353).
### `src/services/worker/worker-service.ts`
- Line `109`: import site that must shrink.
- Line `174`: `private staleSessionReaperInterval: ReturnType<typeof setInterval> | null = null;` — delete field.
- Line `537`: `this.stopOrphanReaper = startOrphanReaper(() => { ... });` — delete outright, no replacement timer. Runtime subprocess death is handled by `child.on('exit')` handlers; cross-restart orphans are handled by boot-once `killSystemOrphans()` + `supervisor.pruneDeadEntries()`.
- Line `547` (`this.staleSessionReaperInterval = setInterval(async () => { ... }, 2*60*1000)`): **delete the entire block** (outer wrapper + body). Disposition of the three things it did under the zero-timer model:
- `reapStaleSessions()` → deleted (no replacement timer). Hung-generator case is covered by the per-iterator idle `setTimeout` at `SessionQueueProcessor.ts:6`; no-generator abandonment is covered by the per-session `abandonedTimer` (Phase 3 Mechanism B).
- `clearFailedOlderThan(1h)` → moved to boot-once (Phase 3 Mechanism C step 4, co-owned with plan 02).
- `PRAGMA wal_checkpoint(PASSIVE)` → deleted outright. SQLite's default `wal_autocheckpoint=1000` pages is the contract (confirmed at `Database.ts:162-168` — no override).
- Line `786`: `await ensureProcessExit(trackedProcess, 5000)` — inline.
- Line `1108-1110`: shutdown path clears `staleSessionReaperInterval`. **Delete both shutdown clauses outright** — there is nothing to clear since no `setInterval` remains in the worker layer.
### `src/services/worker/SessionManager.ts`
- `MAX_GENERATOR_IDLE_MS = 5*60*1000` at `:23`: **delete**. Hung-generator detection is now owned by `SessionQueueProcessor.ts:6` (`IDLE_TIMEOUT_MS = 3*60*1000`) at the stream level. The 5-min worker-layer threshold is redundant with the 3-min per-iterator threshold and the old split created two sources of truth.
- `MAX_SESSION_IDLE_MS = 15*60*1000` at `:26` — keep; now consumed by the per-session `scheduleAbandonedCheck()` method (Phase 3 Mechanism B).
- `detectStaleGenerator(session, proc, now)` at `:59-84`: **delete**. Its consumer (`reapStaleSessions`) is being deleted; its logic (compare `lastGeneratorActivity` against a threshold) is superseded by the per-iterator idle `setTimeout` in `SessionQueueProcessor.ts`, which resets on every chunk and fires `onIdleTimeout` → `abortController.abort()` at the stream level, not from a scanner.
- `deleteSession(sessionDbId)` at `:381-446` — inline `ensureProcessExit` at `:412`; additionally, clear `session.abandonedTimer` at the top of this method if set (per Phase 3 Mechanism B wiring).
- `reapStaleSessions()` at `:516-568`: **delete method**, no replacement closure. The two branches:
- Generator-active branch at `:520-549`: replaced by the per-iterator idle `setTimeout` at `SessionQueueProcessor.ts:6` which aborts the controller when the stream is silent ≥3 min. The subprocess's `exit` handler then unregisters.
- No-generator branch at `:550-561`: replaced by the per-session `abandonedTimer` `setTimeout` scheduled on last-generator-completion and cleared on new activity (Phase 3 Mechanism B).
- `queueSummarize(sessionDbId, lastAssistantMessage)` at `:329-377` — unchanged; Phase 7's blocking endpoint calls this first, then awaits.
### `src/services/worker/SDKAgent.ts`
- Line `24` imports.
- The iterator pattern uses `session.abortController` (established in `SessionManager.initializeSession`); Phase 7's `/api/session/end` calls `session.abortController.abort()` after awaiting summary_stored. No change to SDKAgent body needed for abort semantics — the AbortSignal flows through the SDK query already (confirmed by SessionManager.ts:390 existing abort path).
### `src/services/sqlite/PendingMessageStore.ts`
- `STALE_PROCESSING_THRESHOLD_MS = 60_000` at `:6`.
- `claimNextMessage(sessionDbId)` at `:99-145` — the transaction body currently does both self-heal (`:103-116`) and claim (`:118-140`). Phase 5: keep the transaction, delete lines `103-116`, add a new public method `recoverStuckProcessing(): number` that runs the same UPDATE **unscoped by session id** once at worker boot.
- No behavior regression: the only functional change is timing. Crashed sessions are recovered on next worker boot (correct crash-recovery semantic), not on every claim call (polling anti-pattern).
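A minimal sketch of the Phase 5 method under stated assumptions: the `Db` interface stands in for the project's SQLite wrapper, and the `worker_pid` column is assumed from Plan 01's claim rewrite — neither is confirmed here:

```typescript
// Sketch: boot-once recovery of pending_messages rows stuck in 'processing'
// after a worker crash. Any 'processing' row at boot was claimed by a
// now-dead worker, so the UPDATE is unscoped by session id.
interface Db {
  run(sql: string): { changes: number };
}

function recoverStuckProcessing(db: Db): number {
  const result = db.run(`
    UPDATE pending_messages
       SET status = 'pending', worker_pid = NULL
     WHERE status = 'processing'
  `);
  return result.changes; // number of messages handed back to the queue
}
```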
### Blocking `POST /api/session/end` (Phase 7) — current state
- Existing endpoints (to consolidate):
- `POST /api/sessions/summarize` at `SessionRoutes.ts:387` → handler `handleSummarizeByClaudeId` → calls `queueSummarize` (`:705`) and returns immediately.
- `POST /api/sessions/complete` at `SessionRoutes.ts:753` → clears active session map.
- `GET /api/sessions/status?contentSessionId=...` at hook-side polling (`src/cli/handlers/summarize.ts:123`) — returns `{queueLength, summaryStored}`.
- `session.lastSummaryStored` is already written inside `ResponseProcessor` (see `SessionRoutes.ts:747` where it is read). This is the flag Phase 7 awaits.
- Phase 7 delivers: `POST /api/session/end` — body `{sessionDbId, last_assistant_message}`. Server-side: call `queueSummarize`, then `await` a `Promise` that resolves when `session.lastSummaryStored` flips, with a hard 110 000 ms timeout, then `session.abortController.abort()`, then `deleteSession`. Returns `{summaryId or null}`.
- Hook simplification (in 09-lifecycle-hooks plan) replaces the 220-iteration 500-ms poll loop at `summarize.ts:117-150` with one POST.
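The "await `summary_stored` with a hard 110 000 ms timeout" step can be sketched event-driven (no polling). The per-session emitter is an assumption — the plan only says `ResponseProcessor` sets the flag; how the server observes the flip is not specified:

```typescript
import { EventEmitter, once } from 'node:events';

// Sketch: resolve true when the summary commits, false on timeout. Assumes
// ResponseProcessor would emit 'summary_stored' after setting the flag.
type SessionLike = EventEmitter & { lastSummaryStored?: boolean };

async function awaitSummaryStored(session: SessionLike, timeoutMs = 110_000): Promise<boolean> {
  if (session.lastSummaryStored) return true; // already committed before we started waiting
  const ac = new AbortController();
  const timer = setTimeout(() => ac.abort(), timeoutMs);
  try {
    await once(session, 'summary_stored', { signal: ac.signal });
    return true;
  } catch {
    return false; // timed out — caller still aborts the session and deletes it
  } finally {
    clearTimeout(timer);
  }
}
```

Either way the endpoint then runs `session.abortController.abort()` and `deleteSession`, per the Phase 7 contract above.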
---
## Copy-ready snippet locations — event-driven + boot-once + per-session timers (revised 2026-04-22)
No new file. No `reaper.ts`. No `ReaperTick`. Three mechanisms, spread across existing modules:
### Mechanism A — `child.on('exit')` handlers (already wired; verify and keep)
- SDK spawn: `ProcessRegistry.ts:475-486` → moves to `process-spawning.ts:createPidCapturingSpawn` in Phase 2. The `on('exit', ...)` at `:479` must continue to call `unregisterProcess(child.pid)` at `:484`. Do not modify.
- MCP spawn: `worker-service.ts:523-532`. The `once('exit', ...)` at `:530` must continue to call `getSupervisor().unregisterProcess('mcp-server')` at `:531`. Do not modify.
- Per-iterator 3-min idle timeout: `SessionQueueProcessor.ts:6` (`IDLE_TIMEOUT_MS`), resets at `:51-52, :62-63`, fires `onIdleTimeout` at `:93-104` → `SessionManager.ts:651-655` → `session.abortController.abort()` → the abort signal reaches the spawn at `ProcessRegistry.ts:463` → child exits → `exit` handler unregisters. This chain already exists and covers the hung-generator case entirely.
**No code edit** — this mechanism is the verification target, not the change target. Phase 3 verification greps confirm these handlers are still in place after Phase 2's extraction.
### Mechanism B — Per-session abandoned-session `setTimeout` (new, replaces `reapAbandonedSessions`)
Goal: when a session has no generator running and no pending messages for 15 min, delete it. Detected at the session itself rather than by a global scanner.
Add to `SessionManager.ts`:
```ts
// In ActiveSession interface — add:
abandonedTimer?: ReturnType<typeof setTimeout>;

// New private method on SessionManager:
private scheduleAbandonedCheck(sessionDbId: number): void {
  const session = this.sessions.get(sessionDbId);
  if (!session) return;
  if (session.abandonedTimer) clearTimeout(session.abandonedTimer);
  session.abandonedTimer = setTimeout(() => {
    const s = this.sessions.get(sessionDbId);
    if (!s) return;
    if (s.generatorPromise !== null) return; // still working — drop the timer silently
    if (this.pendingStore.getPendingCount(sessionDbId) > 0) {
      this.scheduleAbandonedCheck(sessionDbId); // work arrived while we waited — reschedule
      return;
    }
    void this.deleteSession(sessionDbId); // truly abandoned — clean up
  }, MAX_SESSION_IDLE_MS);
}

// In every code path that marks "work finished" — call scheduleAbandonedCheck
// In every code path that marks "new work arrived" — call clearTimeout(session.abandonedTimer)
```
Call-sites (derived from `SessionManager.ts`):
- Schedule (work finished): after `generatorPromise` resolves at `SessionManager.ts:~335` (`queueSummarize` fire-and-forget completion) and after `iterator` exits at `SessionManager.ts:~648` (the for-await loop exit).
- Clear (new work arrived): at the top of `initializeSession()` when a pending message lands; inside `queueSummarize()`; inside any `ingestObservation` path that sets `lastActivity`.
The timer is per-session, not repeating. When it fires it either deletes the session or reschedules itself if new work snuck in — no drift, no thundering-herd scan.
### Mechanism C — Boot-once reconciliation block (new helper in `worker-service.ts`)
Goal: at worker startup, in ONE sequential block, reconcile all state that event handlers cannot catch (i.e., state that can only have been orphaned by a previous worker instance).
Add to `worker-service.ts` boot init, immediately after `resetStaleProcessingMessages(0)` at `:424`:
```ts
// Boot-once reconciliation — runs exactly ONCE per worker process lifetime.
// Catches state orphaned by a previous (possibly crashed) worker instance.
await this.reconcileWorkerStartup();

// private method:
private async reconcileWorkerStartup(): Promise<void> {
  // 1. Kill ppid=1 Claude processes leftover from a crashed prior worker.
  //    (Copy body of killSystemOrphans from ProcessRegistry.ts:315-344 into
  //    process-spawning.ts as a free helper before Phase 2 deletes the file.)
  await killSystemOrphans();

  // 2. Prune registry entries whose PID is no longer in the OS (crash-recovery).
  getSupervisor().getRegistry().pruneDeadEntries();

  // 3. pending_messages stuck on 'processing' from a crashed worker.
  //    (Moved from per-claim 60-s reset — see Phase 5.)
  this.sessionManager.getPendingMessageStore().recoverStuckProcessing();

  // 4. SQLite housekeeping (moved from the deleted stale-reaper interval).
  //    (Covered by plan 02's boot-once SQLite housekeeping phase — this
  //    plan assumes 02 has landed; if it has not, copy the call here.)
  this.sessionManager.getPendingMessageStore().clearFailedOlderThan(60 * 60 * 1000);
}
```
No `setInterval` anywhere in this block. Each step runs exactly once. Explicit `PRAGMA wal_checkpoint` is **not** in this block because SQLite's default `wal_autocheckpoint=1000` pages (`Database.ts:162-168` sets no override) is the contract — see plan 02.
### What's deleted outright (no replacement)
- `src/services/worker/reaper.ts` (never created in this revision).
- `startReaperTick` export (never created).
- `staleSessionReaperInterval` (`worker-service.ts:174, :547`).
- `startOrphanReaper` (`ProcessRegistry.ts:508-527`, `worker-service.ts:537-544`).
- `reapStaleSessions` (`SessionManager.ts:516-568`).
- `reapOrphanedProcesses` (`ProcessRegistry.ts:349-382`).
- `killIdleDaemonChildren` as a runtime sweep (`ProcessRegistry.ts:244-309`) — function deleted entirely; its role is already covered by `exit` handlers + per-iterator idle timeout.
- Periodic `PRAGMA wal_checkpoint(PASSIVE)` call at `worker-service.ts:~581` — SQLite default covers it.
- Periodic `clearFailedOlderThan(1h)` call at `worker-service.ts:~567` — moved to boot-once (Mechanism C step 4).
---
## Phases
Every phase must satisfy: (a) a precise "Copy from …" pointer, (b) doc citations, (c) verification, (d) anti-pattern guards (A: no invented supervisor API; B: no polling; D: no facade-over-facade).
### Phase 1 — Introduce ingest helpers (`ingestObservation` / `ingestPrompt` / `ingestSummary`)
(a) **Implement**:
- Create `src/services/ingest/index.ts` (new module). Three exports:
- `ingestObservation(payload: ObservationPayload): { id: number; skipped: boolean }`
- `ingestPrompt(payload: PromptPayload): { id: number; skipped: boolean }`
- `ingestSummary(payload: SummaryPayload): { id: number; skipped: boolean }`
- Each helper: `stripMemoryTags` all user-facing text fields → `PrivacyCheckValidator.validate(operationType)` (existing at `src/services/worker/validation/PrivacyCheckValidator.ts:17-24`) → `INSERT pending_messages` via `PendingMessageStore.enqueue`.
- Copy from: current HTTP-boundary strip + validate + enqueue sequence in `SessionRoutes.ts:696-705` (summarize branch) and the observation-queue path in `SessionManager.ts:276`. Consolidate.
(b) **Docs**:
- 05 § 3.8 — "`POST /api/session/observation` → `ingestObservation(payload)`: strip → validate → INSERT `pending_messages` → emit 'message' event"
- 05 Part 2 D1 ("One observation ingest path")
- 05 § 3.2 call-site list (`C1` ingestObservation, `C2` ingestPrompt, `C3` ingestSummary — **C3 closes the summary privacy gap**)
- 06 cites `src/services/worker/validation/PrivacyCheckValidator.ts:17-24`
- Live: `src/services/worker/http/routes/SessionRoutes.ts:696-705`, `src/services/worker/SessionManager.ts:276`
(c) **Verification**:
- Grep `stripMemoryTags` usage: exactly 3 call-sites (one per helper) + unit test imports.
- Unit test: `ingestSummary({ last_assistant_message: "<private>secret</private> clean text" })` → DB row's `last_assistant_message` field does not contain "secret" (closes P1).
- `POST /api/sessions/summarize` call-path routes through `ingestSummary` (no direct strip call in `SessionRoutes.ts` anymore).
(d) **Guards**:
- A: do **not** add a fourth "`ingestAny(type, payload)`" dispatcher; the three shapes have different required fields and privacy rules. Separate functions → explicit failure modes.
- D: do **not** keep the old HTTP-boundary strip calls as a "belt-and-suspenders" second pass. Edge-processing only.
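The strip → validate → enqueue shape from (a) can be sketched. The payload type, the `stripMemoryTags` regex, and the `enqueue` signature are illustrative stand-ins (the real pieces are `src/utils/tag-stripping.ts`, `PrivacyCheckValidator`, and `PendingMessageStore.enqueue`); the validator step is elided here:

```typescript
// Sketch of the shared ingest path — edge-processing only, one code path.
type ObservationPayload = { sessionDbId: number; text: string };

const stripMemoryTags = (t: string): string =>
  t.replace(/<private>[\s\S]*?<\/private>/g, ''); // stand-in for the shared util

function ingestObservation(
  payload: ObservationPayload,
  enqueue: (sessionDbId: number, text: string) => number, // PendingMessageStore.enqueue stand-in
): { id: number; skipped: boolean } {
  const text = stripMemoryTags(payload.text).trim();
  if (text.length === 0) return { id: -1, skipped: true }; // nothing survives stripping
  return { id: enqueue(payload.sessionDbId, text), skipped: false };
}
```

`ingestPrompt` and `ingestSummary` follow the same shape over their own payload fields, which is why guard A keeps them as three explicit functions.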
### Phase 2 — Delete `src/services/worker/ProcessRegistry.ts`; extract spawn helpers
(a) **Implement**:
- Create `src/services/worker/process-spawning.ts`:
- `createPidCapturingSpawn(sessionDbId)` — copy verbatim from `ProcessRegistry.ts:393-502`.
- `findSessionProcess(sessionDbId): TrackedProcess | undefined` — copy from `ProcessRegistry.ts:85-94` (`getProcessBySession` renamed for clarity).
- `getActiveProcesses()` — copy from `:172-179`.
- `getActiveProcessCount()` — copy from `:99-101`.
- `waitForSlot(max, timeoutMs, evict)` + `notifySlotAvailable()` + `slotWaiters` array + `TOTAL_PROCESS_HARD_CAP` — copy from `:104-167`.
- `TrackedProcess` interface — copy from `:27-32`.
- Inline helper `getTrackedProcesses()` — copy from `:34-52`.
- Rewire imports in:
- `SessionManager.ts:17` → `{ findSessionProcess }` from `./process-spawning.js`.
- `SDKAgent.ts:24` → `{ createPidCapturingSpawn, findSessionProcess, waitForSlot }`.
- `worker-service.ts:109` → `{ getActiveProcesses }`.
- Delete `src/services/worker/ProcessRegistry.ts`.
(b) **Docs**:
- 05 § 3.8 "Deleted: `src/services/worker/ProcessRegistry.ts` (facade, 528 lines) — supervisor registry is source of truth"
- 05 Part 1 item #4
- 06 Phase 5 "Delete worker ProcessRegistry facade" (Phase 5 :246-280)
- V5, V6
- Live: `ProcessRegistry.ts:1-527`, `worker-service.ts:109, 537, 786`, `SessionManager.ts:17, 412`, `SDKAgent.ts:24`
(c) **Verification**:
- `test -f src/services/worker/ProcessRegistry.ts` → false.
- `grep -rn "worker/ProcessRegistry" src/` → 0.
- `npx tsc --noEmit` clean.
- Manual: spawn SDK subprocess, kill with `kill -TERM <pid>`; subprocess exits; the supervisor registry prunes the dead PID at the next boot-once reconciliation (Phase 3 verifies the prune).
(d) **Guards**:
- D: no compat shim re-exporting deleted symbols.
- A: do **not** invent new methods on `supervisor/process-registry.ts` — use its existing public API (`register`, `unregister`, `getByPid`, `getBySession`, `getAll`, `pruneDeadEntries`, `reapSession`, `getRuntimeProcess`).
### Phase 3 — Wire event-driven cleanup + boot-once reconciliation + per-session abandoned-session timer (revised 2026-04-22)
**Previously proposed:** build a new `reaper.ts` module exporting a `ReaperTick` with three skippable checks on a 30-s interval; additionally introduce a dedicated `sqliteHousekeepingInterval` for `clearFailedOlderThan` + `wal_checkpoint`. Both were rejected as band-aids by investigation 2026-04-22 — see `08-reconciliation.md` Part 4 revision. This phase is now a **three-part change with zero new `setInterval`s.**
(a) **Implement — Part 1 (Mechanism A: verify existing event handlers survive Phase 2's extraction)**:
After Phase 2 moved `createPidCapturingSpawn` from `ProcessRegistry.ts:393-502` to `process-spawning.ts`, verify the subprocess `exit` handler still:
- At `ProcessRegistry.ts:479` (now `process-spawning.ts` in its new location): `child.on('exit', ...)` is present.
- Calls `unregisterProcess(child.pid)` (line `:484` relative) on exit.
- Also calls `notifySlotAvailable()` inside the same handler (keeps pool bookkeeping correct without a scanner).
No code change beyond what Phase 2 already did — the handler was already correct; this phase is where it *becomes load-bearing* because the sweeper it was backing up is being deleted.
(a) **Implement — Part 2 (Mechanism B: per-session abandoned-session `setTimeout`)**:
In `SessionManager.ts`:
1. Add `abandonedTimer?: ReturnType<typeof setTimeout>` to `ActiveSession` interface.
2. Add private `scheduleAbandonedCheck(sessionDbId: number): void` per the Copy-ready snippet section (Mechanism B). Threshold: `MAX_SESSION_IDLE_MS = 15*60*1000` (re-home the module-level const at `:26` into a `thresholds` object, or leave it in place and reference it from the method).
3. Wire schedule-on-idle call-sites:
- Inside `queueSummarize()` fire-and-forget completion handler (around `:335` — the `.finally` branch on the generator promise): `this.scheduleAbandonedCheck(sessionDbId)`.
- Inside the for-await iterator exit in `getMessageIterator()` consumer (around `:648`): `this.scheduleAbandonedCheck(sessionDbId)`.
4. Wire clear-on-activity call-sites:
- Top of `initializeSession()`: if `sessions.has(id)` and `session.abandonedTimer`, `clearTimeout(session.abandonedTimer)` + `session.abandonedTimer = undefined`.
- Inside `queueSummarize()` at entry: same clear.
- Inside observation enqueue path (wherever `ingestObservation` bumps `lastActivity`): same clear.
5. Inside `deleteSession()`: `if (session.abandonedTimer) clearTimeout(session.abandonedTimer)`. (Prevents firing after deletion.)
(a) **Implement — Part 3 (Mechanism C: boot-once reconciliation in `worker-service.ts`)**:
In `worker-service.ts`, replace the deleted blocks at lines `537-544` (`startOrphanReaper`) and `547-589` (stale reaper + WAL + failed-purge) with the boot-once call per the Copy-ready snippet section (Mechanism C). Insertion point: immediately after the existing `resetStaleProcessingMessages(0)` at `:424`.
Move the body of `killSystemOrphans` out of the doomed `ProcessRegistry.ts` **before** Phase 2 deletes that file. Two options:
- Land Phase 3 before Phase 2 and keep a direct import until Phase 2 runs; then move the function along with `createPidCapturingSpawn` into `process-spawning.ts` and re-export. (Chosen — preserves Phase ordering.)
- Copy the body inline into `worker-service.ts` boot helper. (Fallback if circular-import issues arise.)
`supervisor.getRegistry().pruneDeadEntries()` is used directly — no new method on the supervisor, per anti-pattern guard A.
(b) **Docs**:
- 05 § 3.8 revised subgraph "Event-driven cleanup — no repeating timers" and "Worker startup — boot-once reconciliation".
- 05 Part 2 **D3** ("Zero repeating background timers").
- 05 Part 4 timer census ("Repeating background timers: 3 → 0") — revision 2026-04-22.
- 08-reconciliation.md Part 4 (revised) — zero-timer model rationale + invariants.
- V6 (register ownership), V19 (stale-reset relocation to boot-once).
- Live: `ProcessRegistry.ts:315-344, 475-486, 479-484`, `worker-service.ts:421-427, 523-532, 537-589`, `SessionManager.ts:26, 59-84, 516-568, 648-656, 651-655`, `SessionQueueProcessor.ts:6, 51-52, 62-63, 93-104`, `supervisor/process-registry.ts` (pruneDeadEntries).
(c) **Verification**:
- **Zero `setInterval` in the worker layer**:
```
grep -rn "setInterval" src/services/worker/ src/services/worker-service.ts
```
Expected: **0** matches. No exclusions, no parenthetical carve-outs.
- **Zero references to the deleted sweeper names**:
```
grep -rn "ReaperTick\|startReaperTick\|startOrphanReaper\|staleSessionReaperInterval\|reapStaleSessions\|reapOrphanedProcesses\|killIdleDaemonChildren\|sqliteHousekeepingInterval" src/
```
Expected: **0**.
- **`killSystemOrphans` is called exactly once per worker boot**:
```
grep -rn "killSystemOrphans" src/
```
Expected: 2 matches — the definition and a single call site inside the boot-once helper. No call site inside any handler or interval.
- **Abandoned-session timer**:
- Unit test: initialize a session, fire-and-forget resolve its generator, advance a fake clock 15 min — assert `deleteSession` was called exactly once.
- Unit test: initialize a session, let it go idle for 14 min, then enqueue an observation — assert `abandonedTimer` was cleared and nothing was deleted.
- Unit test: initialize a session, idle 15 min, timer fires, but `pendingStore.getPendingCount()` returns > 0 at the moment of firing — assert timer reschedules and no delete occurs.
- **Hung-generator path**:
- Integration test: spawn an SDK session, freeze its stream (SIGSTOP the subprocess); after 3 min the per-iterator idle timeout at `SessionQueueProcessor.ts` fires and calls `abortController.abort()`; the child exits and the `exit` handler unregisters it. No background scanner involved.
- **Boot-once reconciliation**:
- Integration test: before starting the worker, spawn a detached Claude subprocess whose ppid is `1` (simulate a crashed prior worker). Boot the worker. Within 1 s of boot completion, that process is SIGKILLed. Registry is clean.
- Integration test: seed `pending_messages` with a row in `status='processing'` from a prior (fake-crashed) worker; boot; assert the row is reset to `status='pending'` within 1 s.
- **Subprocess crash-recovery during runtime**:
- Integration test: while the worker is running, `kill -9` an active SDK subprocess. Within 500 ms the `exit` handler fires, `unregisterProcess` is called, pool slot is released. No timer involved.
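The boot-once reconciliation these tests exercise can be sketched as below. The three helper names come from this plan, but their signatures and their relative order inside the block are assumptions, not the landed code:

```typescript
// Hedged sketch of the boot-once reconciliation block: runs exactly once at
// worker startup, leaves nothing to clear on shutdown. The dependency object
// stands in for the real supervisor / PendingMessageStore surfaces.
async function bootOnceReconciliation(deps: {
  killSystemOrphans: () => Promise<void>;
  recoverStuckProcessing: () => number;
  pruneDeadEntries: () => void;
}): Promise<void> {
  await deps.killSystemOrphans();  // ppid=1 orphans from a crashed prior worker
  deps.recoverStuckProcessing();   // processing → pending rows (Phase 5)
  deps.pruneDeadEntries();         // registry hygiene, one sweep — no timer
}
```

Because the block is one-shot, there is no interval handle to track and nothing for the shutdown path to cancel.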
(d) **Guards**:
- **B (no polling, no new interval)**: the definitive grep. `grep -rn "setInterval" src/services/worker/ src/services/worker-service.ts` must return **0**. Any hit is a regression — the fix is to either remove the call or convert it to an event-driven / per-session pattern.
- **A (no invented supervisor API)**: `pruneDeadEntries`, `getByPid`, `getBySession`, `getAll`, `reapSession`, `getRuntimeProcess`, `unregisterProcess`, `registerProcess` are the full public surface — any other method name in a diff is an invented API and must be reverted.
- **D (no facade-over-facade)**: the per-session abandoned-session timer lives on `ActiveSession` as a field — no new `AbandonedSessionManager` class, no `SessionTimeoutScheduler` abstraction. If a second per-session timer needs to be added later, *then* extract.
- **E (one code path per concern)**: the only subprocess-death signal at runtime is `child.on('exit')`. Do not add a second redundant signal (no `pid-alive` poller, no "heartbeat check").
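Guard E's single death signal can be illustrated with a minimal sketch — `activePids` and `registerChild` are illustrative names, not the real supervisor registry API; a ChildProcess is modeled here as any EventEmitter carrying a pid:

```typescript
import { EventEmitter } from 'node:events';

// One code path per concern: the ONLY subprocess-death signal is 'exit'.
const activePids = new Set<number>();

function registerChild(child: EventEmitter & { pid?: number }): void {
  if (child.pid === undefined) return; // spawn failed before a pid existed
  const pid = child.pid;
  activePids.add(pid);
  // No pid-alive poller, no heartbeat check — the event is the contract.
  child.once('exit', () => activePids.delete(pid));
}
```

Adding a second signal (polling `kill(pid, 0)`, a heartbeat) would reintroduce exactly the redundant path this phase deletes.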
### Phase 4 — Delete `staleSessionReaperInterval` + `startOrphanReaper` + periodic SQLite housekeeping (revised 2026-04-22)
(a) **Implement**:
- Delete `src/services/worker/worker-service.ts:174` field declaration (`private staleSessionReaperInterval`).
- Delete `worker-service.ts:537-544` (startOrphanReaper call + `this.stopOrphanReaper` wiring).
- Delete `worker-service.ts:547-589` (entire stale-reaper block, including its embedded `clearFailedOlderThan` and `PRAGMA wal_checkpoint(PASSIVE)` calls). **Do not** create a new `setInterval` in their place. `clearFailedOlderThan` has moved to boot-once (Phase 3 Mechanism C step 4, co-owned with plan 02). `wal_checkpoint` is deleted outright — SQLite's default `wal_autocheckpoint=1000` pages covers it (`Database.ts:162-168` sets no override; the default is active).
- Delete shutdown clauses at `worker-service.ts:1108-1110` (both `clearInterval(this.staleSessionReaperInterval)` and `this.stopOrphanReaper?.()`). The boot-once block has nothing to clear on shutdown.
- Delete `startOrphanReaper` export from `ProcessRegistry.ts` (already removed by Phase 2's file deletion).
- Delete `SessionManager.reapStaleSessions()` method entirely (`SessionManager.ts:516-568`). No stub; no replacement — both of its branches are covered by the per-iterator idle timeout (hung-generator branch) and the per-session abandoned-session timer from Phase 3 (no-generator branch).
- Keep module-level `MAX_SESSION_IDLE_MS` in `SessionManager.ts:26` — it is now consumed by `scheduleAbandonedCheck()` (Phase 3 Mechanism B). Keep `MAX_GENERATOR_IDLE_MS` at `:23` — unchanged usage by `detectStaleGenerator`.
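As a sketch of that consumer: the names `scheduleAbandonedCheck`, `abandonedTimer`, and `MAX_SESSION_IDLE_MS` come from the plan, but the signatures, the `SessionLike` shape, and the extra `idleMs` parameter (added here for testability) are assumptions:

```typescript
const MAX_SESSION_IDLE_MS = 15 * 60 * 1000; // mirrors SessionManager.ts:26

interface SessionLike { abandonedTimer?: ReturnType<typeof setTimeout> }

function scheduleAbandonedCheck(
  session: SessionLike,
  getPendingCount: () => number,
  deleteSession: () => void,
  idleMs: number = MAX_SESSION_IDLE_MS,
): void {
  clearTimeout(session.abandonedTimer);
  session.abandonedTimer = setTimeout(() => {
    if (getPendingCount() > 0) {
      // Work was still in flight at the deadline — reschedule, never delete.
      scheduleAbandonedCheck(session, getPendingCount, deleteSession, idleMs);
    } else {
      deleteSession();
    }
  }, idleMs);
}

// Any new activity (enqueue, generator start) clears the pending check.
function onSessionActivity(session: SessionLike): void {
  clearTimeout(session.abandonedTimer);
  session.abandonedTimer = undefined;
}
```

The timer lives on the session object itself (guard D) — no manager class, no scheduler abstraction.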
(b) **Docs**:
- 05 § 3.8 Deleted list (`staleSessionReaperInterval`, `startOrphanReaper`, `reapStaleSessions`, periodic `clearFailedOlderThan`, periodic `wal_checkpoint`).
- 05 Part 1 items #5, #6, #7.
- 05 Part 4 timer census (revised 2026-04-22 — 3 → 0).
- 05 Part 2 **D3** (zero repeating background timers).
- 08-reconciliation.md Part 4 revised + C7 revised (no `sqliteHousekeepingInterval`).
- V6.
- Live: `worker-service.ts:174, 537, 547-589, 1108`, `SessionManager.ts:516-568`, `Database.ts:162-168` (auto-checkpoint confirmation).
(c) **Verification**:
- `grep -rn "staleSessionReaperInterval\|startOrphanReaper\|reapStaleSessions\|sqliteHousekeepingInterval" src/` → **0** (tests included).
- `grep -rn "setInterval" src/services/worker/ src/services/worker-service.ts` → **0**. No carve-outs, no exclusions. If any match appears, the fix is to delete or convert to event-driven, never to add an exclusion comment.
- `grep -rn "wal_checkpoint" src/` → 0 in `worker-service.ts`. (The `PRAGMA wal_autocheckpoint` read at boot for observability is fine if introduced by plan 02.)
- `grep -rn "clearFailedOlderThan" src/` → 2 matches: the definition in `PendingMessageStore.ts` and a single call site inside the boot-once reconciliation block.
(d) **Guards**:
- D: no "deprecated stub" left behind for `reapStaleSessions`; no shim for `startOrphanReaper`; no renamed variant of `sqliteHousekeepingInterval`.
- B: no `setInterval` added anywhere in the worker layer — the grep above is the canonical check.
### Phase 5 — Move `PendingMessageStore` 60-s reset to one-shot boot recovery
(a) **Implement**:
- In `src/services/sqlite/PendingMessageStore.ts`:
- Delete lines `103-116` (self-heal UPDATE inside `claimNextMessage` transaction).
- Add a new public method:
```ts
recoverStuckProcessing(): number {
  const stmt = this.db.prepare(`
    UPDATE pending_messages
    SET status = 'pending', started_processing_at_epoch = NULL
    WHERE status = 'processing'
  `);
  const result = stmt.run();
  if (result.changes > 0) {
    logger.info('QUEUE', `BOOT_RECOVERY | recovered ${result.changes} stuck processing message(s)`);
  }
  return result.changes;
}
```
- Note the one-shot version is **unscoped by session** and **unscoped by threshold** — on boot, any `processing` row is by definition stuck (worker was not running a moment ago), so the 60-s guard is not needed. This is cleaner than copying the threshold logic.
- Delete `STALE_PROCESSING_THRESHOLD_MS` constant (line 6) — no remaining caller.
- In `src/services/worker-service.ts`, call `pendingStore.recoverStuckProcessing()` once during boot as part of the boot-once reconciliation block (Phase 3 Mechanism C step 3), after DB initialization. (Co-owned with 02-sqlite-persistence; that plan may also call it — this plan guarantees the call exists.)
(b) **Docs**:
- 05 § 3.3 bottom box "BootOnce → Recover" (authoritative).
- 05 Part 1 item #16.
- 05 § 3.8 bottom "Worker startup → UPDATE pending_messages status processing → pending".
- 06 Phase 6 task 3.
- V19.
- Live: `src/services/sqlite/PendingMessageStore.ts:6, 99-145`.
(c) **Verification**:
- `grep -rn "STALE_PROCESSING_THRESHOLD_MS" src/` → 0.
- Integration test: insert `pending_messages` row with `status='processing', started_processing_at_epoch=now-2*3600*1000`; start worker; assert row flips to `pending` before first `claimNextMessage` is called.
- Unit test: `claimNextMessage` is now a pure SELECT+UPDATE transaction; passing a row with `started_processing_at_epoch=now-10000` (stale by old threshold) is **not** reset — confirms boot-only recovery.
(d) **Guards**:
- B: `claimNextMessage` no longer mutates on read path.
- A: `recoverStuckProcessing` is a method on `PendingMessageStore`, not a new table / migration.
### Phase 6 — Inline SIGTERM → wait 5 s → SIGKILL
(a) **Implement**:
- In `SessionManager.deleteSession` (`:381-446`), replace the call at `:412` (`await ensureProcessExit(tracked, 5000)`) with the inlined SIGTERM → SIGKILL ladder:
```ts
if (tracked.process.exitCode !== null) {
  // already exited
} else {
  try { tracked.process.kill('SIGTERM'); } catch { /* dead */ }
  const exited = new Promise<void>(resolve => tracked.process.once('exit', () => resolve()));
  const timed = new Promise<void>(resolve => setTimeout(resolve, 5000));
  await Promise.race([exited, timed]);
  if (tracked.process.exitCode === null) {
    try { tracked.process.kill('SIGKILL'); } catch { /* dead */ }
    const killed = new Promise<void>(resolve => tracked.process.once('exit', () => resolve()));
    const killTimed = new Promise<void>(resolve => setTimeout(resolve, 1000));
    await Promise.race([killed, killTimed]);
  }
}
// unregister via supervisor
for (const rec of getSupervisor().getRegistry().getByPid(tracked.pid)) {
  if (rec.type === 'sdk') getSupervisor().unregisterProcess(rec.id);
}
notifySlotAvailable();
```
- Do the same inline at `worker-service.ts:786` (other call-site).
- Delete `ensureProcessExit` (already removed with `ProcessRegistry.ts` in Phase 2; this phase also removes its re-export if any temporary shim existed).
(b) **Docs**:
- 05 Part 1 item #9 ("Keep SIGTERM → SIGKILL, delete the ladder framework — inline it").
- 05 § 3.8 Deleted list.
- 06 Phase 5 task 1 ("`ensureProcessExit` → keep as free function... Remove the ladder-framework packaging").
- Live: `ProcessRegistry.ts:185-229`, `SessionManager.ts:412`, `worker-service.ts:786`.
(c) **Verification**:
- `grep -rn "ensureProcessExit" src/` → 0.
- Manual: spawn subprocess that ignores SIGTERM (`trap '' TERM; sleep 60`); call `deleteSession`; observe SIGKILL 5 s after the abort.
(d) **Guards**:
- A: no new `EscalationLadder` class, no `ProcessControl` wrapper.
### Phase 7 — Blocking `POST /api/session/end`
(a) **Implement**:
- Add new route in `src/services/worker/http/routes/SessionRoutes.ts`:
```ts
app.post('/api/session/end', this.handleSessionEnd.bind(this));
```
- Handler body (copy and simplify from `handleSummarizeByClaudeId` at `:663-720` + the hook-side wait at `summarize.ts:117-150`):
1. Resolve `session = sessionManager.getSession(sessionDbId)`; if missing, try to init from DB (same pattern `queueSummarize` uses at `SessionManager.ts:332-334`).
2. `sessionManager.queueSummarize(sessionDbId, last_assistant_message)`. Also call `ensureGeneratorRunning(sessionDbId, 'summarize')` (same helper used at `SessionRoutes.ts:500, 708`).
3. Await `session.lastSummaryStored` flag flipping (currently written by `ResponseProcessor` — see 03-response-parsing-storage). Implementation: expose an `awaitSummary(sessionDbId, timeoutMs)` helper on `SessionManager` that returns a `Promise<{ summaryId: number | null; timedOut: boolean }>`. Internally: subscribe to the existing `sessionQueues` EventEmitter for a `summary-stored` event, OR fall back to polling `session.lastSummaryStored` once per 200 ms. *Recommendation: add a `session.summaryStoredEvent = new EventEmitter()` field and have `ResponseProcessor` emit `'stored'` with the summary id; `awaitSummary` uses `events.once(emitter, 'stored')` raced against `setTimeout(110_000)`.*
4. After the promise resolves (or times out): `session.abortController.abort()`. Wait briefly (≤1 s) for generator, then `sessionManager.deleteSession(sessionDbId)` (which runs the inline SIGTERM→SIGKILL from Phase 6 + supervisor `reapSession`).
5. **(Preflight edit 2026-04-22 — reconciliation B2)** Return `{ summaryId, timedOut }` with **HTTP 200 on both success and timeout**. Do NOT return 504 on timeout — that status was rejected in reconciliation. Windows Terminal closes tabs only when the hook exits with code 0; hook 09 Phase 3 maps HTTP 200 → exit 0 unconditionally. If the endpoint returns any non-200, the hook must fall through to exit 1 which accumulates Windows Terminal tabs per CLAUDE.md. Contract: timeout path response is `{ summaryId: null, timedOut: true }` with status 200; success path is `{ summaryId: <number>, timedOut: false }` with status 200. Only programmer errors (400 invalid body, 404 missing session) use non-200.
6. **(Preflight edit 2026-04-22 — reconciliation C6)** Initialize `session.summaryStoredEvent = new EventEmitter()` when an `ActiveSession` is created in `SessionManager` (likely the `initializeSession` method). The emitter is consumed by `awaitSummary` above and produced by `ResponseProcessor` per plan 03 Phase 2 step 5. Field addition on `ActiveSession` shape: `summaryStoredEvent?: EventEmitter`. Use `events.once(session.summaryStoredEvent, 'stored')` raced against `setTimeout(110_000)` inside `awaitSummary`.
- Delete after hook 09 lands: `POST /api/sessions/complete` (`:753`) and `GET /api/sessions/status` consumers in hooks (the hook-side poll loop at `summarize.ts:117-150`). Keep the status endpoint for the viewer UI short-term.
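A minimal sketch of the `awaitSummary` race from step 3, assuming the `summaryStoredEvent` emitter and `'stored'` event name recommended above; the 110 s default mirrors the endpoint contract, and cleanup uses an `AbortController` so the timeout also detaches the listener:

```typescript
import { EventEmitter, once } from 'node:events';

async function awaitSummary(
  emitter: EventEmitter,
  timeoutMs = 110_000,
): Promise<{ summaryId: number | null; timedOut: boolean }> {
  const ac = new AbortController();
  const timer = setTimeout(() => ac.abort(), timeoutMs);
  try {
    // events.once resolves with the emit args; abort doubles as the timeout.
    const [summaryId] = (await once(emitter, 'stored', { signal: ac.signal })) as [number];
    return { summaryId, timedOut: false };
  } catch {
    return { summaryId: null, timedOut: true };
  } finally {
    clearTimeout(timer);
  }
}
```

Either branch resolves — never rejects — so the route handler can map both outcomes to HTTP 200 per reconciliation B2.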
(b) **Docs**:
- 05 § 3.8 `End → queueSummarize → await summary_stored OR 110s → abortController.abort → delete` (authoritative).
- 05 § 3.1 (STOP box: "BLOCKS until summary written or 110s timeout").
- 05 Part 1 item #11 ("`/api/sessions/summarize` blocks until done... Hook waits on one call").
- 05 Part 2 D6.
- Live: `src/cli/handlers/summarize.ts:25, 89, 117-150`, `src/services/worker/http/routes/SessionRoutes.ts:379-720, 747-753`, `src/services/worker/SessionManager.ts:329-377`, `src/services/worker/agents/ObservationBroadcaster.ts:43-55`.
(c) **Verification**:
- Hook-less integration test: POST `/api/session/end` with a valid sessionDbId that has queued work; response arrives only after the summary row exists in `session_summaries`; **HTTP 200** with `{ summaryId: <number>, timedOut: false }`; total latency <5 s in happy path.
- Timeout test: POST with a session whose SDK is hung; response at 110 s with **HTTP 200** and `{ summaryId: null, timedOut: true }`; subprocess is killed (verify PID gone from registry). Assert status code is 200, not 504 — this is a Windows Terminal contract gate (preflight edit B2).
- Hook 09 plan's verification runs one POST (no 500-ms loop) and asserts hook exit 0 on both the success and timeout paths.
(d) **Guards**:
- B: no 500-ms polling loop in the server handler either — use the event emitter or single 200-ms fall-back.
- D: do not keep `/api/sessions/complete` as a "safety net" — one endpoint owns session termination.
- A: do not extend `SessionRoutes` with a seventh summary endpoint; route-count goal is shrink, not grow.
### Phase 8 — Verification
(a) **Run**:
- `grep -rn "setInterval" src/services/worker/ src/services/worker-service.ts` → **0** matches. No repeating intervals in the worker layer at all.
- `wc -l src/services/worker/ProcessRegistry.ts 2>/dev/null || echo DELETED` → DELETED.
- `wc -l src/services/worker/process-spawning.ts` → ~150 LoC (contains `createPidCapturingSpawn`, `findSessionProcess`, `getActiveProcesses`, `waitForSlot`, `notifySlotAvailable`, `killSystemOrphans` as free helpers). No `reaper.ts` exists.
- Session-lifecycle total: `SessionManager.ts` (~570 after deleting `reapStaleSessions` + `detectStaleGenerator` + `MAX_GENERATOR_IDLE_MS`, adding `scheduleAbandonedCheck` + `abandonedTimer` wiring) + `process-spawning.ts` (~150) + worker-service boot-once block (~40 added, ~55 removed from the deleted stale-reaper block) + `supervisor/process-registry.ts` (unchanged 408) ≈ **~450 LoC reduction** from today's ~900 in worker-layer lifecycle code.
(b) **Regression suite**:
- Subprocess crash recovery: kill SDK subprocess → within ~500 ms the `child.on('exit')` handler fires at `process-spawning.ts` (copied from `ProcessRegistry.ts:479`) and calls `unregisterProcess(pid)`. No scanner involved.
- Hung-generator kill: SDK subprocess frozen (SIGSTOP) → after 3 min of stream silence the per-iterator idle `setTimeout` at `SessionQueueProcessor.ts:6` fires `onIdleTimeout``SessionManager.ts:651-655``abortController.abort()` → child exits → `exit` handler unregisters. No scanner involved.
- Abandoned-session cleanup: session with no generator and no pending for 15 min → the per-session `abandonedTimer` (scheduled on last-generator-completion) fires, calls `deleteSession(id)`. If new work arrived first, the timer was cleared on activity. No scanner involved.
- Cross-restart orphans: ppid=1 Claude processes from a previously crashed worker are cleaned up exactly once, at the next worker's boot, by `killSystemOrphans()` in the boot-once reconciliation block. No repeating sweep.
- PID reuse: supervisor `isPidAlive` + `verifyPidFileOwnership` (already at `supervisor/process-registry.ts:28-172`) catches PID reuse — no behavior change.
- Privacy gap closed: end-to-end test with `<private>` tag in `last_assistant_message` — not persisted to `session_summaries`.
- Blocking `/api/session/end`: one request, ≤110 s, returns summary id or null.
(c) **Doc-driven coverage check**: every item in 05 § 3.8 "Deleted" list corresponds to a Phase and a grep-based verification.
(d) **Guards audit**: no new timers, no new classes over 5 LoC, no supervisor-registry surface extension.
---
## Confidence + gaps
### High confidence
- Worker-layer `ProcessRegistry.ts` (527 LoC) is a pure facade over `supervisor/process-registry.ts`: every method body I audited (`:34-52`, `:57-65`, `:70-79`, `:85-94`, `:99-101`, `:349-382`) already delegates via `getSupervisor().getRegistry()`. Deletion is mechanical.
- `reapStaleSessions` (SessionManager.ts:516-568) has two independent branches that map cleanly onto existing mechanisms: the generator-active branch is already covered by `SessionQueueProcessor.ts:6` (per-iterator 3-min idle `setTimeout` that resets on every chunk and aborts the controller — then `child.on('exit')` unregisters); the no-generator branch is covered by the new per-session `abandonedTimer` `setTimeout` (Phase 3 Mechanism B). `detectStaleGenerator` (`:59-84`) is deleted along with `reapStaleSessions` — the per-iterator timer at the stream level is the single source of truth for "silent generator."
- Supervisor `reapSession` (`supervisor/process-registry.ts:292-385`) already implements SIGTERM → 5 s → SIGKILL; the worker-layer `ensureProcessExit` (`ProcessRegistry.ts:185-229`) duplicates this for the ChildProcess reference. Inlining the worker version keeps per-process escalation while supervisor-level reap handles the session-wide sweep on `deleteSession`.
- Cadence math: 30 s tick × 4 = 2 min matches the current `staleSessionReaperInterval` cadence at `worker-service.ts:589`. Zero timing regression.
### Gaps / open integration points
1. **`summary_stored` wiring (Phase 7)** — the cleanest implementation needs `ResponseProcessor` (03-response-parsing-storage) to emit a per-session event on successful summary write. Today `session.lastSummaryStored` is written (referenced at `SessionRoutes.ts:747`) but there is no event — only a polled read. **Blocking coordinate point: 09-lifecycle-hooks cannot simplify its hook until Phase 7 is wired, and Phase 7 cannot wire `awaitSummary` cleanly until 03 exposes an emitter.** Concrete ask from 03: add `session.summaryStoredEvent = new EventEmitter()` populated inside `ResponseProcessor` after the commit (approx. location: `src/services/worker/agents/ResponseProcessor.ts:228` region where `broadcastSummary` is already called). Fallback if 03 can't accommodate: Phase 7 polls `session.lastSummaryStored` at 200 ms with the 110 s timeout — still one HTTP call from the hook's perspective, still blocking server-side, just internally polled. Degrades cleanly.
2. **SQLite housekeeping in `worker-service.ts:547-589`** (resolved 2026-04-22) — the stale-reaper block today also runs `clearFailedOlderThan(1h)` and `PRAGMA wal_checkpoint`. Under the zero-timer model: `clearFailedOlderThan` moves to boot-once (co-owned with plan 02's boot-once SQLite housekeeping phase); `wal_checkpoint` explicit calls are deleted outright because `Database.ts:162-168` sets no `wal_autocheckpoint` override, so SQLite's default of 1000 pages is the active policy. This plan's Phase 4 deletes all three items together — no transient "two `setInterval` hits" in the diff.
# Plan 08 — transcript-watcher-integration (clean)
**Feature scope**: `src/services/transcripts/*` + `src/cli/handlers/observation.ts` HTTP loopback.
**Source of truth (design)**: `PATHFINDER-2026-04-21/05-clean-flowcharts.md` § 3.12; Part 1 items #17, #18, #19.
**Phase-7 counterpart in 06**: `PATHFINDER-2026-04-21/06-implementation-plan.md` Phase 7 (Transcript watcher cleanup).
**Before-state**: `PATHFINDER-2026-04-21/01-flowcharts/transcript-watcher-integration.md`.
## Dependencies (must land first)
| Plan | Dependency | What this plan consumes |
|---|---|---|
| `07-plans/01-privacy-tag-filtering.md` | `stripMemoryTags(text)` (06 Phase 1) | Single call used inside `ingestObservation`. We never strip in the watcher. |
| `07-plans/07-session-lifecycle-management.md` | `ingestObservation(payload)` helper (06 Phase 2) + `SessionManager.initializeSession` / `endSession` direct API (06 § 3.8) | Watcher calls the helper **directly** (no `workerHttpRequest`, no `observationHandler.execute`). Session lifecycle routes `session_init` / `session_end` to `SessionManager` without HTTP. |
Downstream dependents: **none**.
## Dependency-verified facts (live-code citations)
- **V18 confirmed** (`06-implementation-plan.md:45`). All three artifacts still present:
- 5-s rescan timer — `src/services/transcripts/watcher.ts:124` (`rescanIntervalMs ?? 5000`) + `setInterval(...)` at `:125`.
- `pendingTools` map — `src/services/transcripts/processor.ts:23` (in `SessionState` interface) + `.set` at `:202`, `.get/.delete` at `:232-236`, `.clear` at `:317`.
- HTTP loopback — `src/cli/handlers/observation.ts:17` loops through `workerHttpRequest('/api/sessions/observations', ...)`. Chain: watcher.ts:221 → processor.ts:252 `observationHandler.execute` → observation.ts:17 `workerHttpRequest` back to the same worker. This is the "call the CLI handler from inside the worker, which HTTP-loops back to the worker" anti-pattern.
- **Schema list (exhaustive)**: only **one** JSONL transcript schema ships today: **Codex**, defined in `src/services/transcripts/config.ts:9` as `CODEX_SAMPLE_SCHEMA` (confirming `63472 — CODEX_SAMPLE_SCHEMA in config.ts is the source of truth`). The live config file is `transcript-watch.example.json` (lines 1-95), which registers only `codex` under `schemas.codex`. The `CodexCliInstaller.ts` is the only installer that merges JSONL schemas into `~/.claude-mem/transcript-watch.json` (`src/services/integrations/CodexCliInstaller.ts:97-99`).
- `CursorHooksInstaller.ts`, `OpenCodeInstaller.ts`, `GeminiCliHooksInstaller.ts` do **not** register JSONL transcript schemas — they install **PostToolUse hooks** that feed the CLI observation handler directly (same path as Claude Code's own hooks). They do not touch the transcript watcher.
- **The audit's "Cursor, OpenCode, Gemini-CLI" for transcript ingestion is accurate only at the user-facing-feature level (these agents' activity is captured), but the capture path for those three is the hook handler chain, not the JSONL watcher.** The watcher's only current JSONL client is Codex.
- **tool_use_id availability in Codex schema** (`src/services/transcripts/config.ts:47-77`):
- `tool-use` event: `toolId: 'payload.call_id'` — present on `function_call`, `custom_tool_call`, `web_search_call`, `exec_command`.
- `tool-result` event: `toolId: 'payload.call_id'` — present on `function_call_output`, `custom_tool_call_output`, `exec_command_output`.
- **Both sides always carry `call_id`** in the Codex schema. No fallback needed for Codex.
- **Schema-driven, not hard-coded**: the `toolId` field is part of the `SchemaEvent.fields` contract (`src/services/transcripts/types.ts:34`). Any future schema that wants to use the transcript watcher must set `fields.toolId` on both its tool_use and tool_result events, or pair them some other way. Phase 2 below documents this contract explicitly.
- **Watched parent dir per schema**: `~/.codex/sessions/**/*.jsonl` (`config.ts:95`, `transcript-watch.example.json:83`). The glob matches files recursively under `~/.codex/sessions/`. The parent dir to pass to `fs.watch(..., { recursive: true })` is the **glob-root**: `expandHomePath('~/.codex/sessions')` (everything before the first glob metachar). `resolveWatchFiles()` at `watcher.ts:143-163` already understands glob vs plain-dir vs plain-file — the new watch code will derive the root the same way.
- **fs.watch recursive support**: Node's `{ recursive: true }` option has long worked on macOS and Windows; on Linux (`inotify`-backed, kernel >= 2.6.36) it only became available in Node 20. CI target: `package.json:58` declares `"node": ">=18.0.0"`. **Recursive fs.watch on Linux requires Node 20+**, so we must bump the engines floor (see Gaps). Bun supports recursive `fs.watch` on all three platforms.
- **FileTailer location**: `src/services/transcripts/watcher.ts:15-81` (unchanged by this plan — lines already do the byte-offset-tail correctly; only the file-discovery layer changes).
## Phase contract (applies to every phase below)
- **(a) Copy from** `05-clean-flowcharts.md` § 3.12 (canonical flowchart).
- **(b) Docs** at the top of each phase: 05 section ref + 06 verified finding (V-number) + live file:line.
- **(c) Verification** is mechanical: a `grep` count, a runtime test, or a file existence check.
- **(d) Anti-pattern guards** — every phase cites (from `06:59-66`):
- **A** — no invented APIs. Grep for the method before using it.
- **B** — no polling; `fs.watch` events only (no rescan `setInterval`).
- **E** — one code path for observation ingest; watcher + CLI hook both call `ingestObservation`, never a second path.
---
## Phase 1 — Parent-directory recursive watch replaces per-file `fs.watch` + 5 s rescan
**Goal**: `fs.watch(parentDir, { recursive: true }, onFileEvent)` supplants both the per-file `fsWatch(filePath, ...)` in `FileTailer` and the `setInterval(..., rescanIntervalMs)` rescan in `TranscriptWatcher`.
### (a) What to implement — Copy from § 3.12
From the clean flowchart (`05-clean-flowcharts.md:484-500`):
```
Boot["Worker startup"] --> LoadCfg["loadTranscriptWatchConfig"]
LoadCfg --> ParentWatch["fs.watch(parent_dir, {recursive})
watches existing files AND new files"]
ParentWatch --> OnChange([File event])
OnChange --> ReadDelta["FileTailer.readNewBytes"]
```
**Code change (watcher.ts)**:
1. Delete the per-file watcher inside `FileTailer` (`src/services/transcripts/watcher.ts:16`, `:28-33`, `:35-38`). `FileTailer` becomes a pure byte-offset reader — no internal `fs.watch` subscription. Rename its `start()` to `readAvailable()` (one-shot tail) and drop the `close()` method (nothing to close now).
2. In `TranscriptWatcher.setupWatch` (`:110`), derive `glob-root` from `watch.path`:
- If `watch.path` has no glob metachars and is a file: watch `dirname(resolved)` non-recursively.
- Otherwise: walk the path tokens, stop at the first token containing a glob metachar, join the prefix — that's the root dir (e.g. `~/.codex/sessions/**/*.jsonl``~/.codex/sessions`). Use the new helper `getGlobRoot(inputPath): string`.
3. Replace `setInterval(async () => { ... }, rescanIntervalMs)` (`:124-132`) with:
```ts
fs.watch(globRoot, { recursive: true, persistent: true }, (eventType, filename) => {
  if (!filename) return;
  const absPath = path.resolve(globRoot, filename);
  if (!globMatches(absPath, resolvedPath)) return;
  // rename event fires when a new file is created (or renamed/deleted)
  if (!this.tailers.has(absPath) && existsSync(absPath)) {
    this.addTailer(absPath, watch, schema, false).catch(err =>
      logger.warn('TRANSCRIPT', 'addTailer failed on fs.watch event',
        { file: absPath, error: err instanceof Error ? err.message : String(err) }));
  }
  const tailer = this.tailers.get(absPath);
  tailer?.readAvailable().catch(() => undefined);
});
```
4. Update `TranscriptWatcher.stop()` (`:99-108`) to close the single parent watcher per target instead of iterating per-tailer `.close()` + `clearInterval` on the timer array. Delete the `rescanTimers: NodeJS.Timeout[]` field (`:87`).
5. Delete the `rescanIntervalMs?: number` field from `WatchTarget` (`src/services/transcripts/types.ts:61`). Update `CodexCliInstaller.ts` and `transcript-watch.example.json` if either still sets it (grep).
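`getGlobRoot` from step 2 might look like the following — the metacharacter set (`*?[{`) and the split-on-`/` simplification are assumptions; the real helper would run after `expandHomePath` and may need to honor `path.sep`:

```typescript
// Walk path segments and stop at the first one containing a glob
// metacharacter; the joined prefix is the directory to fs.watch recursively.
function getGlobRoot(inputPath: string): string {
  const segments = inputPath.split('/');
  const prefix: string[] = [];
  for (const seg of segments) {
    if (/[*?[{]/.test(seg)) break; // first glob token ends the root
    prefix.push(seg);
  }
  return prefix.join('/') || '/';
}
```

A glob-free path falls through unchanged, matching the plan's rule that plain files are watched via `dirname(resolved)` by the caller.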
### (b) Docs cited
- 05 § 3.12 lines 482-500 (clean flowchart).
- Part 1 item #19 (`05-clean-flowcharts.md:37`) — "5-s rescan timer for new transcript files".
- V18 (`06-implementation-plan.md:45`) — `rescanIntervalMs ?? 5000` at `watcher.ts:124`.
- Live: `src/services/transcripts/watcher.ts:28` (per-file `fsWatch`), `:124-133` (rescan interval + `setInterval`).
### (c) Verification
- `grep -rn "setInterval" src/services/transcripts/` → **zero** matches.
- `grep -rn "rescanIntervalMs" src/ transcript-watch.example.json` → **zero** matches.
- Runtime test: start worker against an empty temp dir `T`; wait 1 s; `touch T/new-session.jsonl` then `echo '{"type":"session_meta","payload":{"id":"test","cwd":"/tmp"}}' >> T/new-session.jsonl`; assert a `TRANSCRIPT Watching transcript file` log line appears within **100 ms** of the write (not within the old 5 s window). Follow up with a tool_use line and assert `pending_messages` row appears within another 100 ms.
- `grep -rn "new FileTailer.*filePath.*offset.*onLine" src/services/transcripts/` → still exactly one call site in `addTailer` (signature preserved for byte-offset state).
### (d) Anti-pattern guards
- **A**: do not invent a "glob walker" class. A single `getGlobRoot(path: string): string` top-level function is enough.
- **B**: **no** fallback `setInterval` "in case fs.watch misses events". The parent-recursive watch is the contract; missed-event scenarios fall under the Gaps section (Node-version requirement).
### Blast radius
Single file rewrite: `src/services/transcripts/watcher.ts`. Small touch: `types.ts` (drop `rescanIntervalMs`). One touch to `CodexCliInstaller.ts` or `transcript-watch.example.json` only if they reference that deleted option.
---
## Phase 2 — Delete `pendingTools` map; match `tool_use` + `tool_result` by `tool_use_id` at parse time
**Goal**: `SessionState.pendingTools: Map<string, …>` is gone. Tool pairing happens locally inside each log file's tail buffer keyed by `tool_use_id`; the per-session map disappears.
### (a) What to implement — Copy from § 3.12
```
Route -->|tool_use + tool_result paired by tool_use_id| Ingest["ingestObservation({sessionDbId, tool_use_id, name, input, output})"]
```
**Code change (processor.ts)**:
1. Remove `pendingTools: Map<string, {name?, input?}>` from `SessionState` (`src/services/transcripts/processor.ts:23`).
2. Remove `pendingTools: new Map()` from `getOrCreateSession` (`:59`).
3. Rewrite `handleToolUse` (`:193-222`):
- Move the per-file pairing buffer **out of** the session and **into** `TranscriptWatcher` as a **per-file** map: `private pendingToolUses = new Map<string /* filePath */, Map<string /* tool_use_id */, { name: string; input: unknown; ts: number }>>()`. Inject it as a callback arg, or move the pairing into the processor keyed by file.
- Simpler option (preferred): keep the short-lived pairing **in the processor keyed by `${watch.name}:${sessionId}:${tool_use_id}`** — it still clears on `tool_result`, but it's keyed by ID, not by session-state entry. Upper-bound its size with an LRU (`max=10_000`, drop-oldest) to avoid unbounded growth when a tool_use never gets a matching tool_result.
4. Rewrite `handleToolResult` (`:224-246`) to read from that keyed map; on hit, emit **one** `ingestObservation({sessionDbId, tool_use_id, name, input, output})` call (Phase 3 wires the helper). On miss, log debug + drop (don't synthesize).
5. Keep the `apply_patch` auto-file-edit branch (`:205-213`) — file edits are a separate path not in scope here; drop it only if Codex ever stops sending `tool_use` with `toolResponse` inline. The legacy branch at `:215-221` fires `sendObservation` from inside `handleToolUse` when `toolResponse !== undefined`; that branch is the **first half of the duplicated ingest** and must be deleted in Phase 3.
6. Session state retains `lastUserMessage`, `lastAssistantMessage`, `cwd`, `project` — untouched.
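The keyed, bounded pairing buffer from step 3 (preferred option) can be sketched as below. The key shape and the 10k cap come from the text; the drop-oldest trick via `Map` insertion order is an assumption standing in for a real LRU:

```typescript
interface PendingToolUse { name: string; input: unknown; ts: number }

const MAX_PENDING = 10_000;
const pendingByKey = new Map<string, PendingToolUse>(); // key: `${watch.name}:${sessionId}:${tool_use_id}`

function recordToolUse(key: string, entry: PendingToolUse): void {
  if (pendingByKey.size >= MAX_PENDING && !pendingByKey.has(key)) {
    // Map preserves insertion order, so the first key is the oldest entry.
    const oldest = pendingByKey.keys().next().value;
    if (oldest !== undefined) pendingByKey.delete(oldest);
  }
  pendingByKey.set(key, entry);
}

function takeToolUse(key: string): PendingToolUse | undefined {
  const entry = pendingByKey.get(key);
  if (entry) pendingByKey.delete(key); // pairing consumes the entry
  return entry;
}
```

`handleToolResult` would call `takeToolUse` and, on a hit, emit the single `ingestObservation` call; a miss is logged and dropped, never synthesized.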
### (b) Docs cited
- 05 § 3.12 line 494 ("paired by tool_use_id").
- Part 1 item #17 (`05-clean-flowcharts.md:35`) — "pendingTools map in TranscriptEventProcessor ... match by ID, no state map."
- V18 — pendingTools presence confirmed.
- Live: `src/services/transcripts/processor.ts:23` (interface field), `:59` (init), `:202` (`.set`), `:232-236` (lookup/delete), `:317` (clear on session_end).
- Contract source: Codex schema in `src/services/transcripts/config.ts:47-77` — `toolId: 'payload.call_id'` on both tool_use and tool_result.
### (c) Verification
- `grep -rn "pendingTools" src/` → **zero** matches (interface field, initializer, and three call sites all gone).
- `grep -n "SessionState" src/services/transcripts/processor.ts` — interface still exists, but with `pendingTools` field removed (assert via a small diff check in a test).
- Runtime: replay a recorded Codex JSONL (fixture). Assert the stream of `pending_messages` rows matches byte-for-byte with the pre-refactor run for the same fixture (the pairing semantics are unchanged; we only moved where the map lives).
- Memory test: feed 50 sessions with 1,000 tool_use events each but **no** tool_result. Assert the LRU caps the pairing map at 10k entries — growth stays bounded.
### (d) Anti-pattern guards
- **A**: the pairing map is a private field of `TranscriptEventProcessor`, not a new `ToolPairingService` class.
- **E**: only **one** observation ingest call per paired event — delete the `handleToolUse`-inline `sendObservation` branch at `:215-221` in Phase 3.
### Blast radius
`src/services/transcripts/processor.ts` only. No schema contract change (Codex already populates `call_id` on both sides).
---
## Phase 3 — Replace `observationHandler.execute()` HTTP loopback with direct `ingestObservation(payload)`
**Goal**: `sendObservation` no longer calls the CLI handler, which no longer does `workerHttpRequest`. The worker process calls its own helper in-memory.
### (a) What to implement — Copy from § 3.12 + D1
From 05 Part 2 Decision D1 (`:69-70`):
> **D1. One observation ingest path.** Hook, transcript-watcher, and manual-save all call `ingestObservation(payload)`. That function does: strip tags → validate privacy → INSERT `pending_messages`. **No HTTP loopback inside the worker process.**
From § 3.12 line 494 — `ingestObservation({sessionDbId, tool_use_id, name, input, output})`.
**Code change**:
1. In `src/services/transcripts/processor.ts`:
- Replace `sendObservation` body (`:248-260`) so it builds the `IngestObservationPayload` (matching the shape owned by `07-plans/07-session-lifecycle-management.md`) and calls `await ingestObservation(payload)` directly. No `observationHandler` import.
- Remove the import of `observationHandler` (`:3`).
- Remove the import of `workerHttpRequest` and `ensureWorkerRunning` from `../../shared/worker-utils.js` (`:6`) **from the observation path only** — `queueSummary` still hits `/api/sessions/summarize` today and `updateContext` still hits `/api/context/inject`; those two are untouched by Phase 3. Phase 4 deletes both.
2. In `src/services/transcripts/watcher.ts`: no change — the watcher already delegates to `processor.processEntry`; the processor is what imports the helper.
3. `IngestObservationPayload` shape reused from Plan 07 (definition lives in `src/services/worker/ingest/index.ts`):
```ts
{ contentSessionId, platformSource, cwd, tool_name, tool_use_id,
tool_input, tool_response, agentId?, agentType? }
```
Plan 07 additionally adds `tool_use_id` as a required field when the caller is the transcript watcher (already present in hook-path flows via the UNIQUE constraint added in Phase 9 of `06-implementation-plan.md`). Synthesize `tool_use_id = payload.call_id` from the schema's `toolId` field.
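A minimal sketch of how the rewritten `sendObservation` body might assemble that payload before calling `ingestObservation` in-process. `buildObservationPayload` and the `session`/`paired` input shapes are assumptions for illustration; only the payload field names come from the Plan 07 contract above:

```typescript
// Payload shape owned by Plan 07 (src/services/worker/ingest/index.ts).
type IngestObservationPayload = {
  contentSessionId: string;
  platformSource: string;
  cwd: string;
  tool_name: string;
  tool_use_id: string;
  tool_input: unknown;
  tool_response: unknown;
  agentId?: string;
  agentType?: string;
};

// Hypothetical helper: map a paired tool_use/tool_result to the ingest payload.
function buildObservationPayload(
  session: { contentSessionId: string; cwd: string },
  paired: { tool_use_id: string; name: string; input: unknown; output: unknown },
): IngestObservationPayload {
  return {
    contentSessionId: session.contentSessionId,
    platformSource: "codex", // only Codex registers a JSONL schema today
    cwd: session.cwd,
    tool_name: paired.name,
    // Synthesized from payload.call_id via the schema's toolId field.
    tool_use_id: paired.tool_use_id,
    tool_input: paired.input,
    tool_response: paired.output,
  };
}
```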
### (b) Docs cited
- 05 § 3.12 line 494, Part 2 D1 lines 69-70.
- Part 1 item #18 (`05-clean-flowcharts.md:36`) — "observationHandler.execute() HTTP loopback from transcript-watcher ... Extract ingestObservation helper; both call it directly."
- V18 — `observation.ts:17` HTTP loopback confirmed.
- Live: `src/cli/handlers/observation.ts:17` (`workerHttpRequest('/api/sessions/observations', …)`), `src/services/transcripts/processor.ts:252` (`observationHandler.execute` call site).
- Dependency contract: `07-plans/07-session-lifecycle-management.md` exports `ingestObservation` at `src/services/worker/ingest/index.ts` per `06-implementation-plan.md:126-132`.
### (c) Verification
- `grep -n "observationHandler" src/services/transcripts/` → **zero** matches.
- `grep -n "workerHttpRequest.*observations" src/services/transcripts/` → **zero** matches.
- `grep -n "workerHttpRequest" src/services/transcripts/` → count ≤ 2 (temporarily: `queueSummary` + `updateContext`, deleted in Phase 4).
- `grep -n "workerHttpRequest" src/cli/handlers/observation.ts` → still exactly one (CLI hook path still uses HTTP when the CLI is a separate process from the worker; that's **not** a loopback, it's the hook-to-worker boundary).
- Unit test: seed a single Codex JSONL line with a tool_use + tool_result pair; assert (1) exactly one `pending_messages` INSERT, (2) zero outbound HTTP requests recorded against the worker's own `/api/sessions/observations` endpoint (use an HTTP spy).
### (d) Anti-pattern guards
- **B**: no polling — direct function call, not an event bus, not a retry loop.
- **E**: the hook path and the transcript path **both** call `ingestObservation(payload)`. Only ingress shape conversion differs; the helper is the single code path (matches `06-implementation-plan.md:146` — "One helper, both handlers call it.").
### Blast radius
`src/services/transcripts/processor.ts` only. The watcher chain inside the worker process no longer crosses the HTTP boundary. The CLI hook (`observation.ts`) remains unchanged for this phase — it runs in the hook subprocess and must HTTP the worker.
---
## Phase 4 — Route `session_init` / `session_end` directly to `SessionManager` (drop `/api/sessions/summarize` + `/api/context/inject` loopbacks)
**Goal**: `handleSessionInit` calls `SessionManager.initializeSession` directly. `handleSessionEnd` calls `SessionManager.endSession` (which internally queues the summary the same way the hook-side does). The last two in-process HTTP loopbacks disappear from the transcript path.
### (a) What to implement — Copy from § 3.12
```
Route -->|session_init| Init["sessionManager.initializeSession(sessionDbId)
(direct, no HTTP loopback)"]
Route -->|session_end| EndFlow["sessionManager.endSession(sessionDbId)
→ queueSummarize (same as hook path)"]
EndFlow --> WriteCtx["Optional: writeAgentsMd (Cursor flag)"]
```
**Code change (processor.ts)**:
1. Replace `handleSessionInit` (`:178-191`) with a direct call to `SessionManager.initializeSession(sessionDbId, userPrompt=fields.prompt, promptNumber)`. The worker-process `SessionManager` instance is injected via constructor (plan 07 already plumbs this; the watcher receives it in `TranscriptWatcher` constructor).
2. Replace `queueSummary` (`:322-344`): call the same helper that `07-plans/07-session-lifecycle-management.md` exposes as `endSession({contentSessionId, platformSource, last_assistant_message})` → internally it calls `ingestSummary(payload)` (from `06-implementation-plan.md:130`). No `workerHttpRequest('/api/sessions/summarize', …)`.
3. Replace `updateContext` (`:346-392`): keep the **path-traversal guard** (`:363-373` — real security check, not patch cruft), but replace the HTTP call at `:377` with a direct `generateContext(allProjects)` call from `ContextBuilder` (the same function `/api/context/inject` handler wraps). `writeAgentsMd` unchanged.
4. Remove import of `ensureWorkerRunning` and `workerHttpRequest` (both already freed by this point).
5. `sessionCompleteHandler.execute` at `processor.ts:311-315` — delete; `endSession` subsumes it.
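For reference, a guard of the kind step 3 says to preserve might look like the following. `isPathSafe` here is an illustrative reimplementation, not the live `:363-373` code, which may differ in detail:

```typescript
import * as path from "node:path";

// Reject candidate paths that escape the project root via ../ segments
// or absolute paths before writeAgentsMd touches the filesystem.
function isPathSafe(root: string, candidate: string): boolean {
  const resolved = path.resolve(root, candidate);
  const normalizedRoot = path.resolve(root) + path.sep;
  return resolved.startsWith(normalizedRoot) || resolved === path.resolve(root);
}
```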
### (b) Docs cited
- 05 § 3.12 lines 493, 495, 497 — direct `initializeSession` / `endSession`, `writeAgentsMd` kept.
- 05 Part 2 D1 line 70 — "no HTTP loopback inside the worker process."
- Dependency: plan 07 `06-implementation-plan.md:114-152` (Phase 2 helpers: `ingestObservation`, `ingestPrompt`, `ingestSummary`) and `:321-326` (§ 3.8 `endSession` blocks until summary).
- Live: `src/services/transcripts/processor.ts:185` (`sessionInitHandler.execute`), `:334` (`workerHttpRequest('/api/sessions/summarize', …)`), `:377` (`workerHttpRequest(contextUrl)`), `:363-373` (security guard — **preserve**).
### (c) Verification
- `grep -n "workerHttpRequest\|ensureWorkerRunning" src/services/transcripts/` → **zero** matches.
- `grep -n "sessionInitHandler\|sessionCompleteHandler\|observationHandler" src/services/transcripts/` → **zero** matches.
- `grep -n "writeAgentsMd\|isPathSafe" src/services/transcripts/processor.ts` → still present (security guard kept).
- Integration: drive a full Codex JSONL run through the watcher; assert the AGENTS.md file is written with the same content as the pre-refactor path.
### (d) Anti-pattern guards
- **D**: no facade — the processor talks to `SessionManager` **directly**, not via a `TranscriptSessionBridge`.
- **E**: `ingestSummary` is the one code path — transcript `session_end` and hook `Stop` both call it.
### Blast radius
`src/services/transcripts/processor.ts` — large internal rewrite. No external shape changes: the eventual `pending_messages` rows are byte-identical to today's hook-path output.
---
## Phase 5 — Remove `isProjectExcluded` re-check in the processor (moved into `ingestObservation`)
**Goal**: The transcript processor does not re-run project-exclusion. `ingestObservation` (and its siblings) run the check once, centrally (per Plan 07).
### (a) What to implement — Copy from § 3.12
From 05 § 3.12 Deleted list (`:502-506`):
> - `isProjectExcluded` re-check inside transcript processor (done once in `ingestObservation`)
**Code change**:
1. `grep -n "isProjectExcluded" src/services/transcripts/` — if any call site exists (it is currently checked inside `observationHandler.execute`, `src/cli/handlers/observation.ts:59`, which the watcher path no longer uses after Phase 3), delete it.
2. Assert `ingestObservation` performs the exclusion check (Plan 07 requirement, per `06-implementation-plan.md:132` — "(b) runs privacy / project-exclusion validation").
### (b) Docs cited
- 05 § 3.12 deleted-list (`:506`).
- Dependency: `06-implementation-plan.md:132`.
- Live: `src/cli/handlers/observation.ts:57-62` — current exclusion check (removed from the transcript path by Phase 3's loopback kill; this phase confirms no second copy exists in the watcher).
### (c) Verification
- `grep -rn "isProjectExcluded" src/services/transcripts/` → **zero** matches.
- `grep -n "isProjectExcluded" src/services/worker/ingest/` → **exactly one** call (inside `ingestObservation` / shared privacy-validate path).
### (d) Anti-pattern guards
- **E**: one exclusion check, one code path — `ingestObservation` is authoritative.
### Blast radius
Essentially a grep-and-delete pass; most likely zero lines to change (the check never lived in the processor, only in the CLI handler we've already unlinked).
---
## Phase 6 — Verification gate
**Goal**: Prove the four deletions and the single new mechanism by mechanical checks.
### Checks
1. **Parent-dir watch drop test** (from Phase 1's (c)): write a brand-new JSONL file into a mock watched dir; within **100 ms** observe a `Watching transcript file` log line AND a `pending_messages` INSERT after the first tool_use+tool_result pair. Without the 5-s rescan, this must succeed on a sub-second timeline.
2. **`pendingTools` gone**: `grep -rn "pendingTools" src/` → `0`.
3. **HTTP loopback gone**: `grep -rn "workerHttpRequest\|ensureWorkerRunning" src/services/transcripts/` → `0`. `grep -rn "observationHandler\|sessionInitHandler\|sessionCompleteHandler" src/services/transcripts/` → `0`.
4. **Timer gone**: `grep -rn "setInterval" src/services/transcripts/` → `0`.
5. **Single-path ingest**: `grep -rn "ingestObservation(" src/` — ≥ 2 call sites (transcript processor + hook-path route handler from Plan 07); zero in CLI handler (still uses HTTP to reach the worker).
6. **Schema-contract fuzz**: drop a crafted JSONL where `tool_use` omits `call_id`. Assert: debug log "tool_use without toolId", no crash, no paired observation emitted. Drop a `tool_result` with a `call_id` we never saw. Assert: debug log "orphan tool_result", no crash.
7. **Cursor / OpenCode / Gemini-CLI unaffected**: those paths go through `src/cli/handlers/observation.ts` (hook PostToolUse). Run the standard hook-round-trip smoke test (`npm run build-and-sync` + trigger a PostToolUse from each); assert `pending_messages` rows still appear. **This is the non-regression guard for the prompt's "preserve Cursor/OpenCode/Gemini-CLI" constraint** — they never depended on the transcript JSONL watcher, so Phases 1-5 cannot break them; this check exists to *prove* it.
8. **End-to-end**: full Codex JSONL fixture → expected SQLite state identical to pre-refactor.
### Anti-pattern guards (final sweep)
- **A**: every new identifier (`getGlobRoot`, `pendingToolUses` map, `readAvailable`) traces to a concrete live function or the plan's invented, single-use helper. No new classes.
- **B**: one `fs.watch` subscription per target, no timers, no polling, no "retry-rescan on SIGCHLD".
- **E**: transcript processor and hook route both import `ingestObservation` from the same module (`src/services/worker/ingest/index.ts`), with no privately duplicated strip / privacy / exclusion logic.
---
## Summary of line deletions
Against current live code:
| File | Lines removed | Lines added | Net |
|---|---|---|---|
| `src/services/transcripts/watcher.ts` | ~40 (per-file fsWatch + rescan interval + timer-cleanup scaffolding) | ~25 (parent-dir recursive watch + `getGlobRoot`) | -15 |
| `src/services/transcripts/processor.ts` | ~120 (`pendingTools` state, `handleToolUse` inline ingest, HTTP queueSummary, HTTP updateContext, handler imports) | ~50 (LRU tool-pairing map, direct `ingestObservation`/`endSession` calls, direct `generateContext` import) | -70 |
| `src/services/transcripts/types.ts` | 1 (`rescanIntervalMs` field) | 0 | -1 |
| `src/cli/handlers/observation.ts` | 0 (preserved; hook path still HTTPs the worker) | 0 | 0 |
| **Total** | **~161** | **~75** | **~-86** |
Plan-level estimate aligns with `05-clean-flowcharts.md:554` row "Transcript 5-s rescan + pendingTools map + HTTP loopback: -150 / +40 / -110" — consistent with our per-file count.
---
## Phase count
**6 phases** (5 implementation + 1 verification gate), matching the minimum set specified in the prompt.
---
## Gaps and open questions
1. **Node-version floor must bump.** `package.json:58` currently pins `"node": ">=18.0.0"`. `fs.watch(dir, { recursive: true })` on **Linux** became stable in **Node 20** (earlier versions throw `ERR_FEATURE_UNAVAILABLE_ON_PLATFORM`). macOS + Windows + Bun have supported it all along. **Action before merging Phase 1**: bump `engines.node` to `>=20.0.0` (coordinate with infra/CI matrix) and verify the plugin's install path (Bun-managed) satisfies it. If bumping is blocked, a Linux-only fallback (chokidar or a polling Map of child dirs) is needed — but that re-introduces anti-pattern B, so the Node-20 bump is the right move.
2. **Single schema in the live codebase, audit phrasing diverges from implementation.** The audit text (and this prompt) references "Cursor, OpenCode, Gemini-CLI transcript ingestion" as preserved. In this codebase **those three agents ingest through the PostToolUse hook chain** (`CursorHooksInstaller.ts`, `OpenCodeInstaller.ts`, `GeminiCliHooksInstaller.ts` — none of which register a JSONL schema). The only JSONL schema is **Codex** (`src/services/transcripts/config.ts:9` + `transcript-watch.example.json`). Phases 1-5 therefore only affect the Codex capture path. The preservation claim for Cursor/OpenCode/Gemini-CLI is satisfied trivially — their path doesn't touch this feature. This is worth calling out in the PR description to avoid reviewer confusion.
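A startup preflight for the recursive-watch floor could look like the following sketch. `supportsRecursiveWatch` is a hypothetical helper, but `ERR_FEATURE_UNAVAILABLE_ON_PLATFORM` is the error code pre-20 Node throws for `{ recursive: true }` on Linux:

```typescript
import * as fs from "node:fs";
import * as os from "node:os";

// Probe whether fs.watch(dir, { recursive: true }) works on this runtime.
// Returns false instead of throwing on the known Linux + Node < 20 gap.
function supportsRecursiveWatch(dir: string): boolean {
  try {
    const watcher = fs.watch(dir, { recursive: true }, () => {});
    watcher.close();
    return true;
  } catch (err) {
    if ((err as NodeJS.ErrnoException).code === "ERR_FEATURE_UNAVAILABLE_ON_PLATFORM") {
      return false; // caller decides: refuse to start, or use a fallback watcher
    }
    throw err;
  }
}
```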
## Sources consulted
- `PATHFINDER-2026-04-21/05-clean-flowcharts.md` — full file, § 3.12 canonical, Part 1 #17/18/19, Part 2 D1, Part 4 timer census, Part 5 deletion row.
- `PATHFINDER-2026-04-21/06-implementation-plan.md` — full file, Phase 0 V18, Phase 7 scope, Phase 2 ingest-helper contract.
- `PATHFINDER-2026-04-21/01-flowcharts/transcript-watcher-integration.md` — full before-state.
- `src/services/transcripts/watcher.ts` (lines 1-242).
- `src/services/transcripts/processor.ts` (lines 1-393).
- `src/services/transcripts/config.ts` (lines 1-138).
- `src/services/transcripts/types.ts` (lines 1-70).
- `src/services/transcripts/field-utils.ts` (lines 1-153).
- `src/cli/handlers/observation.ts` (lines 1-86).
- `src/services/worker/http/routes/SessionRoutes.ts` (lines 560-659 for `handleObservationsByClaudeId` shape).
- `src/services/worker-service.ts` (watcher lifecycle at :90, :164, :466, :614-640, :1095-1097).
- `src/services/integrations/{CursorHooksInstaller,OpenCodeInstaller,GeminiCliHooksInstaller,CodexCliInstaller}.ts` — confirming only Codex registers a JSONL schema.
- `transcript-watch.example.json` — confirming only `codex` schema in the live config template.
- `package.json:57-60` — Node engine floor.
# Phase Plan 09 — lifecycle-hooks (clean)
**Date**: 2026-04-22
**Target flowchart**: `PATHFINDER-2026-04-21/05-clean-flowcharts.md` §3.1 ("lifecycle-hooks (clean)")
**Before-state**: `PATHFINDER-2026-04-21/01-flowcharts/lifecycle-hooks.md`
**Scope**: Collapse the 10 current `SessionRoutes` endpoints + the 500-ms polling Stop hook + the 8 per-handler `ensureWorkerRunning` calls + the duplicate `/api/context/*` fetches into the clean 4-endpoint, no-polling, hook-cached design from §3.1. **Zero user-facing change. Exit codes preserved.**
---
## Header: Dependencies
**Upstream (must land first):**
- **Plan 01 — privacy-tag-filtering** (Phases 1-2 of the implementation plan — `stripMemoryTags` + `ingestObservation/ingestPrompt/ingestSummary` helpers). Required because the new `POST /api/session/observation`, `POST /api/session/prompt`, and `POST /api/session/end` endpoints call those ingest helpers rather than re-implementing tag stripping. Cite: `06-implementation-plan.md` Phase 1 + Phase 2 (plan authoring pipeline; `01-privacy-tag-filtering.md` when landed).
- **Plan 05 — context-injection-engine** — introduces `GET /api/session/start` returning `{sessionDbId, contextMarkdown, semanticMarkdown}`. Phase 1 of this plan depends on that endpoint existing on the worker side. Cite: `05-clean-flowcharts.md` §3.5 + §3.1 arrow `SS → SSR`.
- **Plan 07 — session-lifecycle-management** — introduces blocking `POST /api/session/end` (per-session `Deferred<SummaryResult>` resolved by `ResponseProcessor` when the summary row is written; 110 s hard timeout). Phase 3 of this plan switches the Stop hook to call that endpoint. Cite: `05-clean-flowcharts.md` §3.8 (`POST /api/session/end → queueSummarize → await summary_stored flag OR 110s timeout`), Part 2 decision **D6** (blocking endpoints over polling), `06-implementation-plan.md` Phase 11 step 2.
**Downstream:** none. This is a leaf cleanup in the dependency DAG — no other feature plan reads from the hook layer.
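The per-session blocking primitive Plan 07 describes (the HTTP handler awaits it, `ResponseProcessor` resolves it when the summary row lands, a hard cap stops it at 110 s) reduces to a generic pattern. A sketch; `Deferred` and `awaitWithTimeout` are illustrative names, not the plan's exports:

```typescript
// Deferred: a promise whose resolve/reject are exposed to another component.
class Deferred<T> {
  readonly promise: Promise<T>;
  resolve!: (value: T) => void;
  reject!: (reason?: unknown) => void;
  constructor() {
    this.promise = new Promise<T>((res, rej) => {
      this.resolve = res;
      this.reject = rej;
    });
  }
}

// Await the deferred, but resolve null at the hard timeout instead of hanging
// the hook process (mirrors the 110 s cap on POST /api/session/end).
async function awaitWithTimeout<T>(d: Deferred<T>, ms: number): Promise<T | null> {
  const timeout = new Promise<null>((res) => setTimeout(() => res(null), ms));
  return Promise.race([d.promise, timeout]);
}
```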
---
## Sources Consulted (what this plan is built from)
1. `PATHFINDER-2026-04-21/05-clean-flowcharts.md` — full read. Authoritative §3.1 diagram (lines 89-123); §3.9 route inventory (lines 382-418); Part 1 bullshit-inventory items **#11** (500 ms poll), **#12** (double `/api/context/inject`), **#13** (`ensureWorkerRunning` every entry), **#14** (`/api/context/inject` + `/api/context/semantic` both at UserPromptSubmit); Part 2 decision **D6** (blocking endpoints over polling, line 79); Part 4 timer census (Summary poll 500 ms × 220 iter → endpoint blocks, line 520); Part 5 deletion ledger rows `Summarize 500-ms polling hook -60/+20` and `Double /api/context/* fetches → /api/session/start -120/+60` (lines 552-553).
2. `PATHFINDER-2026-04-21/06-implementation-plan.md` — Phase 0 verified-findings **V8** (500 ms poll @ `summarize.ts:117-150`, `POLL_INTERVAL_MS=500` @ `:24`, `MAX_WAIT_FOR_SUMMARY_MS=110_000` @ `:25`), **V9** (SessionRoutes is **actually 10 endpoints, not 8**: six `/sessions/:sessionDbId/*` at `:377-:382` + five `/api/sessions/*` at `:385-:389`; `/api/sessions/status` is the polled one), **V10** (`ensureWorkerRunning` in all 8 CLI handlers: `context.ts:19`, `user-message.ts:35`, `summarize.ts:44`, `observation.ts:34`, `file-context.ts:218`, `file-edit.ts:32`, `session-init.ts:41`, `session-complete.ts:35`). Phase 2 (unified ingest helpers) and Phase 11 (endpoint consolidation) define the shared contract.
3. `PATHFINDER-2026-04-21/01-flowcharts/lifecycle-hooks.md` — "before" diagram. 10 hook→worker HTTP edges enumerated (lines 84-92 — side effects). Two-phase Stop handling (`summarize` → poll → `session-complete`) at lines 68-73.
4. Live codebase (verified `Read`/`Grep` during authoring, 2026-04-22):
- `src/cli/handlers/context.ts:19` — `await ensureWorkerRunning()` at SessionStart.
- `src/cli/handlers/user-message.ts:35` — `await ensureWorkerRunning()` at SessionStart (parallel).
- `src/cli/handlers/session-init.ts:41` — UserPromptSubmit.
- `src/cli/handlers/observation.ts:34` — PostToolUse.
- `src/cli/handlers/summarize.ts:17` (import), `:24` (`POLL_INTERVAL_MS = 500`), `:25` (`MAX_WAIT_FOR_SUMMARY_MS = 110_000`), `:44` (`ensureWorkerRunning`), `:89` (`POST /api/sessions/summarize`), `:117-150` (poll loop against `/api/sessions/status?contentSessionId=…`), `:156` (`POST /api/sessions/complete`).
- `src/cli/handlers/session-complete.ts:18` (`POST /api/sessions/complete`), `:35` (`ensureWorkerRunning`).
- `src/cli/handlers/file-context.ts:218` (`ensureWorkerRunning`), `:237` (`GET /api/observations/by-file`).
- `src/cli/handlers/file-edit.ts:15` (`POST /api/sessions/observations`), `:32` (`ensureWorkerRunning`).
- `src/services/worker/http/routes/SessionRoutes.ts:375-389` — `setupRoutes` registers **10** routes:
- Legacy `/sessions/:sessionDbId/*` × **6** (`:377` init, `:378` observations, `:379` summarize, `:380` status, `:381` delete, `:382` complete).
- `/api/sessions/*` × **5** (`:385` init, `:386` observations, `:387` summarize, `:388` complete, `:389` status).
- (Earlier sections above register `setupRoutes` itself on the Express app; the 11 `.get/.post/.delete(` tokens outside `setupRoutes` are internal maps, not routes — verified.)
- `src/shared/hook-constants.ts:21-22` — `HOOK_EXIT_CODES.SUCCESS = 0`. Every handler returns it on the graceful-degradation path (required by CLAUDE.md exit-code strategy — Windows Terminal tab preservation depends on exit 0).
5. Dependency plans: **not yet written on disk**. Plans 01, 05, 07 will be authored in parallel to this one; citations above reference their planned phase numbers per `06-implementation-plan.md` (authoritative sequencing doc).
---
## Endpoint Reality Check (numbers — V9 vs §3.9 claim)
| Source | Claimed current count | Verified current count |
|---|---|---|
| `05-clean-flowcharts.md` §3.1 "Endpoint count: 8 → 4" (line 123) | 8 | — |
| `06-implementation-plan.md` Phase 0 **V9** | — | **10** (six `:377-:382` + five `:385-:389`) |
| Live `Grep router\.` / `.post/.get/.delete` on `SessionRoutes.ts` (2026-04-22) | — | **10** (confirms V9; §3.9 "8" is an undercount) |
**This plan uses 10 → 4** as the verified target. The §3.1 "8 → 4" claim is footnoted as an undercount of the legacy `/sessions/:sessionDbId/*` subtree.
---
## Hook → Endpoint Mapping (current vs clean)
| Claude Code event | Current hook handler | Current endpoints called | Clean endpoint (§3.1) |
|---|---|---|---|
| SessionStart | `context.ts` | `GET /api/context/inject?projects=…` (`:41`) + (conditionally) `GET /api/context/inject?colors=true` (`:42`) | **`GET /api/session/start?project=…`** — returns `{sessionDbId, contextMarkdown, semanticMarkdown}` |
| SessionStart (parallel) | `user-message.ts` | `GET /api/context/inject?project=…&colors=true` (`:14`) | (same) — reads from the cached `/api/session/start` response in `context.ts`; no second HTTP call |
| UserPromptSubmit | `session-init.ts` | `POST /api/sessions/init` (`:75`), `POST /sessions/{id}/init` (`:141`), `POST /api/context/semantic` (`:23`) | **`POST /api/session/prompt`** `{sessionDbId, prompt}` → returns `{promptId}` (SDK-start implicit inside prompt handler) |
| PostToolUse | `observation.ts` | `POST /api/sessions/observations` (`:17`) | **`POST /api/session/observation`** `{sessionDbId, tool_use_id, name, input, output}` → `{observationId}` |
| PostToolUse (Cursor file-edit) | `file-edit.ts` | `POST /api/sessions/observations` (`:15`) | **`POST /api/session/observation`** (same endpoint, same payload shape) |
| PreToolUse (file-context gate) | `file-context.ts` | `GET /api/observations/by-file` (`:237`) | Unchanged — this is a read endpoint outside the Session lifecycle; belongs to Plan 08 (DataRoutes), not this one |
| Stop | `summarize.ts` | `POST /api/sessions/summarize` (`:89`) + poll `GET /api/sessions/status` 500 ms × up to 220 iter (`:117-150`) + `POST /api/sessions/complete` (`:156`) | **`POST /api/session/end`** `{sessionDbId, last_assistant_message}` — blocks until summary written or 110 s timeout; returns `{summaryId|null}` |
| Stop (phase 2) | `session-complete.ts` | `POST /api/sessions/complete` (`:18`) | **Deleted.** Folded into `POST /api/session/end` (§3.1: "Two-phase Stop handling (summarize then session-complete) — one endpoint, one response"). |
**Endpoints before**: 10 on `SessionRoutes` + 2 on `SearchRoutes` (`/api/context/inject`, `/api/context/semantic`) = 12 lifecycle-touching endpoints.
**Endpoints after**: 4 on `SessionRoutes` (`start`, `prompt`, `observation`, `end`). `/api/context/*` removed (folded into `/api/session/start`).
**Net delete**: 10 - 4 = **6 from SessionRoutes**; **2 from SearchRoutes**; **8 total**.
---
## Phase Contract (applied to every phase below)
Each phase specifies:
- **(a) What to implement** — "Copy from §X.Y / V-finding / file:line" — no invention.
- **(b) Docs** — `05-clean-flowcharts.md` section + `V8/V9/V10` + live file:line.
- **(c) Verification** — grep counts, before/after.
- **(d) Anti-pattern guards** — **A** (invent hook event types), **B** (polling — replace 500 ms loop with blocking endpoint + SSE), **D** (two context fetches collapse to one `GET /api/session/start`), **E** (duplicate `/api/context/inject` at SessionStart + user-message — single cache).
---
## Phase 1 — Collapse double `/api/context/*` fetches into single `GET /api/session/start`
### (a) What to implement
Copy from `05-clean-flowcharts.md` §3.1 lines 95, 100 (`SS --> SSR["Returns {sessionDbId, contextMarkdown, semanticMarkdown}"]`) and §3.5 line 236 (`generateContext(projects, forHuman=false)` + `generateContext(projects, forHuman=true)` on one route handler).
Switch `context.ts` + `user-message.ts` to a **single** `GET /api/session/start` call. The worker route is produced by Plan 05 Phase 1; this phase only rewires the two hook handlers.
1. **Rewrite `src/cli/handlers/context.ts:41-74`**: replace the two-URL `Promise.all([workerHttpRequest(apiPath), showTerminalOutput ? workerHttpRequest(colorApiPath).catch(()=>null) : …])` with one `workerHttpRequest('/api/session/start?project=…&colors=…&semantic=…')`. Parse response as `{sessionDbId, contextMarkdown, humanMarkdown?, semanticMarkdown}`. `contextMarkdown` → `additionalContext`; `humanMarkdown` (present when `colors=true`) → `systemMessage` block.
2. **Delete `user-message.ts:fetchAndDisplayContext` (lines 13-30) entirely.** The parallel SessionStart display becomes a second consumer of `context.ts`'s cached `/api/session/start` result — see Phase 2 for the shared cache. In the interim (before Phase 2 lands), `user-message.ts` calls `/api/session/start?colors=true&display=true` with its own request — one HTTP call, still replaces the old `/api/context/inject` double-call. Remove the `fetchAndDisplayContext` helper + its usage at `:46`.
3. **Delete hook-side calls to `/api/context/inject`** anywhere they appear. Grep: only `context.ts:41,42` + `user-message.ts:14-16` touch it. After this phase: zero hook-side references to `/api/context/inject`.
4. `session-init.ts:23` (`POST /api/context/semantic`) moves to Phase 6 (consolidated with session-prompt); leave untouched here.
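The single-call rewire in step 1 might build its request path like this. `buildSessionStartPath` is a hypothetical helper; the `project`/`colors`/`semantic` query parameters come from the step above, everything else is an assumption:

```typescript
// Response shape per §3.1/§3.5: one request returns both markdowns.
type SessionStartResponse = {
  sessionDbId: number;
  contextMarkdown: string;
  humanMarkdown?: string; // present when colors=true
  semanticMarkdown: string;
};

// Build the single GET /api/session/start URL that replaces the old
// apiPath + colorApiPath pair in context.ts.
function buildSessionStartPath(project: string, colors: boolean, semantic: boolean): string {
  const params = new URLSearchParams({
    project,
    colors: String(colors),
    semantic: String(semantic),
  });
  return `/api/session/start?${params.toString()}`;
}
```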
### (b) Docs
- §3.1 lines 95, 100 — `SS → SSR` edge.
- §3.5 line 236 — `generateContext(projects, forHuman=false)` + `generateContext(projects, forHuman=true)` (dual-strategy render).
- Part 1 items **#12** ("double `/api/context/inject` at SessionStart") and **#14** ("`/api/context/inject` + `/api/context/semantic` both at UserPromptSubmit — fold into `/api/session/start`").
- **V10** — both `context.ts:19` and `user-message.ts:35` currently bootstrap the worker then each fire a GET.
- Live: `src/cli/handlers/context.ts:41-74`, `src/cli/handlers/user-message.ts:13-30,46`.
### (c) Verification
```
grep -rn "/api/context/inject" src/cli/handlers/ → 0 matches
grep -rn "/api/session/start" src/cli/handlers/ → 2 matches (context.ts + user-message.ts)
grep -c "workerHttpRequest" src/cli/handlers/context.ts → 1 (was 2 — the `apiPath` + `colorApiPath` pair collapses)
```
Snapshot test: capture `additionalContext` bytes from an existing SessionStart fixture and assert byte-equal after the rewire (strategy-driven rendering must be indistinguishable in `forHuman=false` mode).
### (d) Anti-pattern guards
- **D** — no two fetches for the same data. `/api/session/start` is one request returning both markdowns.
- **E** — the parallel SessionStart display in `user-message.ts` shares the response shape; Phase 2 collapses to one cache entry.
- **A** — no new `hookEventName` values. Still `'SessionStart'` at `context.ts:88`.
---
## Phase 2 — Cache `alive=true` in the hook process for the session lifetime
### (a) What to implement
Copy from `05-clean-flowcharts.md` §3.1 "Deleted from old flowchart" bullet 1 ("`ensureWorkerRunning` at every entry point (cache `alive` for the hook lifetime)") + Part 1 item **#13** ("Hook has no shared state. — Cache `alive=true` in the hook process for the session.").
1. **Create `src/hooks/worker-cache.ts`** (new file, ~25 lines):
```ts
import { ensureWorkerRunning as originalEnsureWorkerRunning } from '../shared/worker-utils.js';
// Assumed home for the /api/session/start response type (owned by Plan 05).
import type { SessionStartResponse } from '../shared/types.js';

// One variable in the hook's process; lives as long as the hook process does.
let alive: boolean | null = null;
// Cached /api/session/start response, shared between context + user-message handlers
// within the same hook process (invoked once per SessionStart fan-out).
let sessionStartResponse: SessionStartResponse | null = null;
export async function ensureWorkerAliveOnce(): Promise<boolean> {
if (alive !== null) return alive;
alive = await originalEnsureWorkerRunning();
return alive;
}
export function cacheSessionStart(response: SessionStartResponse): void { sessionStartResponse = response; }
export function getCachedSessionStart(): SessionStartResponse | null { return sessionStartResponse; }
```
"Hook process" = one Node/Bun invocation per Claude Code hook event. Lifetime ~50 ms ~120 s. Module-scope `let` is sufficient; no cross-process state needed.
2. **Switch all 8 CLI handlers** to import `ensureWorkerAliveOnce` instead of `ensureWorkerRunning`:
- `context.ts:19`, `user-message.ts:35`, `summarize.ts:44`, `observation.ts:34`, `file-context.ts:218`, `file-edit.ts:32`, `session-init.ts:41`, `session-complete.ts:35`.
3. **First-call behaviour**: the first handler in a given hook process spawns/pings the worker (same code path as today's `ensureWorkerRunning` in `src/shared/worker-utils.ts`). Subsequent calls in the **same process** skip.
4. **Cross-handler coordination for SessionStart**: when `context.ts` receives the `/api/session/start` response it calls `cacheSessionStart(response)`. `user-message.ts` (running as a parallel handler in the same hook process when both are wired to SessionStart) calls `getCachedSessionStart()` first; falls back to its own fetch if null (separate hook-process invocations).
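The cache-first fallback in step 4 reduces to the following self-contained pattern. All names here are illustrative stand-ins for the worker-cache module sketched in step 1, not live code:

```typescript
// Minimal demo of "read the cached /api/session/start response first,
// fall back to an own fetch when the handlers run in separate processes".
type SessionStartResponse = { sessionDbId: number; contextMarkdown: string };

let cached: SessionStartResponse | null = null;
let fetchCount = 0;

// Stand-in for the real workerHttpRequest('/api/session/start…') call.
async function fetchSessionStart(): Promise<SessionStartResponse> {
  fetchCount++;
  return { sessionDbId: 1, contextMarkdown: "# context" };
}

async function getSessionStart(): Promise<SessionStartResponse> {
  if (cached) return cached; // user-message.ts path: reuse context.ts's response
  cached = await fetchSessionStart(); // fallback: separate hook-process invocation
  return cached;
}
```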
### (b) Docs
- §3.1 "Deleted from old flowchart" bullet 1.
- Part 1 item **#13**.
- **V10** — 8 live callsites today.
- Live: `src/shared/worker-utils.ts` (current `ensureWorkerRunning` implementation is the one `ensureWorkerAliveOnce` delegates to internally).
### (c) Verification
```
grep -rn "ensureWorkerRunning" src/cli/handlers/ → 0 matches (was 8 import lines + 8 callsites)
grep -rn "ensureWorkerAliveOnce" src/cli/handlers/ → 8 import + 8 callsite matches
grep -c "ensureWorkerRunning" src/cli/handlers/*.ts → reduces from 8 to 0 (cached)
```
Instrumentation test: start a Claude Code session, trigger SessionStart → UserPromptSubmit → 2× PostToolUse → Stop. Assert the worker's `GET /health` (or equivalent startup ping) is called **once** per hook process, not once per handler. (Today it's 5 calls in the SessionStart fan-out alone.)
### (d) Anti-pattern guards
- **E** — one cache, two readers (`context.ts` + `user-message.ts`). No duplicate cache keys.
- **A** — no `WorkerCacheService` class. Module-scope `let` is sufficient; adding a class would be invention (CLAUDE.md: YAGNI, simple-first).
### Exit-code invariant
The caller still returns `HOOK_EXIT_CODES.SUCCESS` when `ensureWorkerAliveOnce()` returns `false` (worker unavailable → empty context → exit 0). CLAUDE.md exit-code strategy preserved: Windows Terminal tabs continue to close on exit 0 even when the worker is down.
---
## Phase 3 — Replace `summarize.ts` 500 ms poll loop with single blocking `POST /api/session/end`
### (a) What to implement
Copy from `05-clean-flowcharts.md` §3.1 lines 98, 107 (`STOP --> STOPR["Returns {summaryId or null}"]`) + §3.8 lines 346–349 (`POST /api/session/end → queueSummarize → await summary_stored flag OR 110s timeout → abortController.abort → Delete`) + Part 2 decision **D6**. The worker-side blocking endpoint is implemented by Plan 07 Phase 2 (per-session `Deferred<SummaryResult>` resolved by `ResponseProcessor` when the summary row is written).
1. **Rewrite `src/cli/handlers/summarize.ts:86–167`** (the queue + poll + complete block) into:
```ts
const response = await workerHttpRequest('/api/session/end', {
method: 'POST',
headers: { 'Content-Type': 'application/json' },
body: JSON.stringify({ contentSessionId: sessionId, last_assistant_message: lastAssistantMessage, platformSource }),
timeoutMs: MAX_WAIT_FOR_SUMMARY_MS + 5_000 // 115s — hook times out slightly after server
});
// Response: { summaryId: number | null, timedOut?: boolean }
```
2. **Delete constants** `POLL_INTERVAL_MS = 500` (`:24`) and `POLL_INTERVAL_MS` references. `MAX_WAIT_FOR_SUMMARY_MS` stays — migrates from poll-duration cap to HTTP-client timeout (preserves the 110 s semantic).
3. **Delete the poll loop** (`summarize.ts:117–150`).
4. **Delete the explicit session-complete call** (`summarize.ts:155–161`) — folded into the worker's `/api/session/end` handler on the other side of the wire.
5. **Preserve the subagent guard** at `:34–41` (exits early before any HTTP).
6. **Preserve the transcript-extract guard** at `:60–78` (exits 0 when no assistant message).
7. **Preserve the exit-code contract**: successful completion, timeout, and worker-unreachable all return `HOOK_EXIT_CODES.SUCCESS` (exit 0). This matches today's `summarize.ts:47,56,67,77,103,107,167` — every return path exits 0. CLAUDE.md exit-code strategy: Windows Terminal closes tabs on exit 0, so the 110 s timeout path must also exit 0, not 2.
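The exit-code contract in step 7 can be sketched as follows; `requestSessionEnd` is a hypothetical stand-in for the `workerHttpRequest` call above, and the shape illustrates the invariant, not the final handler:

```typescript
type SessionEndResponse = { summaryId: number | null; timedOut?: boolean };

const SUCCESS = 0; // HOOK_EXIT_CODES.SUCCESS

// Every branch resolves to exit 0: summary stored, server-side 110 s
// timeout ({summaryId: null, timedOut: true}), and worker unreachable.
async function stopHookExitCode(
  requestSessionEnd: () => Promise<SessionEndResponse>
): Promise<number> {
  try {
    const res = await requestSessionEnd();
    if (res.timedOut) {
      // Still a graceful stop; never map the timeout to exit 2.
      return SUCCESS;
    }
    return SUCCESS; // summary stored
  } catch {
    return SUCCESS; // worker unreachable: graceful degradation
  }
}
```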
### (b) Docs
- §3.1 lines 98, 107 — STOP edge.
- §3.8 lines 346–349 — `End → Queue_Sum → WaitSum → Abort → Delete`.
- Part 2 **D6** (blocking endpoints over polling, line 79).
- Part 4 timer census line 520 (`Summary poll (500 ms × 220 iter)` ✓ before / ✗ after).
- **V8** — `summarize.ts:117–150` + `:24` + `:25`.
- **V9** — `/api/sessions/status` is deleted in Phase 5.
- Live: `src/cli/handlers/summarize.ts:24–25,86–167`.
### (c) Verification
```
grep -n "POLL_INTERVAL_MS" src/ → 0 matches
grep -n "MAX_WAIT_FOR_SUMMARY_MS" src/cli/handlers/summarize.ts → 1 match (used as HTTP timeout)
grep -n "/api/sessions/status" src/ → 0 matches in src/cli/
grep -n "/api/session/end" src/cli/handlers/summarize.ts → 1 match
wc -l src/cli/handlers/summarize.ts → < 90 (was 169)
```
End-to-end: run a Claude Code session that produces a summary. Assert the Stop hook returns within ~(summary-processing time + 1 s), not ≥500 ms (the old minimum due to the first poll interval). Assert no `GET /api/sessions/status` requests hit the worker log.
Timeout path test: configure the SDK agent to hang past 110 s. Assert Stop hook returns exit 0 with `summaryId: null, timedOut: true`. **This is the exit-code invariant that CLAUDE.md's Windows Terminal note demands — confirm explicitly** (see "Confidence + Gaps" below).
### (d) Anti-pattern guards
- **B** — polling replaced by blocking endpoint + HTTP-client timeout. The hook-side client timeout is `MAX_WAIT_FOR_SUMMARY_MS + 5_000` to give the server side first claim on the 110 s budget.
- **A** — no new `SessionStopResult` type; reuse the existing `{summaryId, timedOut?}` shape Plan 07 Phase 2 defines.
---
## Phase 4 — Delete `/sessions/:sessionDbId/*` legacy endpoints (6)
### (a) What to implement
Copy from `06-implementation-plan.md` Phase 11 step 3 ("Delete the old 10 endpoints under `/sessions/:sessionDbId/*` and `/api/sessions/*` after all hook-side callers are switched"). Also §3.9 line 403 (SessionRoutes: "`/api/session/*` (4 endpoints — see 3.1)").
1. **Delete registrations** at `SessionRoutes.ts:377–382`:
- `app.post('/sessions/:sessionDbId/init', this.handleSessionInit.bind(this));`
- `app.post('/sessions/:sessionDbId/observations', this.handleObservations.bind(this));`
- `app.post('/sessions/:sessionDbId/summarize', this.handleSummarize.bind(this));`
- `app.get('/sessions/:sessionDbId/status', this.handleSessionStatus.bind(this));`
- `app.delete('/sessions/:sessionDbId', this.handleSessionDelete.bind(this));`
- `app.post('/sessions/:sessionDbId/complete', this.handleSessionComplete.bind(this));`
2. **Delete handler methods** `handleSessionInit`, `handleObservations`, `handleSummarize`, `handleSessionStatus`, `handleSessionDelete`, `handleSessionComplete` (the legacy six) if no other code references them.
3. Keep the `handle*ByClaudeId` variants in place *for this phase* — Phase 5 deletes `/api/sessions/status` specifically; Phase 6 replaces the remaining four `/api/sessions/*` with the unified four `/api/session/*`.
### (b) Docs
- §3.1 line 123 ("Endpoint count: 8 → 4") — corrected to **10 → 4** per V9.
- §3.9 line 403 — final target `R3["SessionRoutes: /api/session/* (4 endpoints — see 3.1)"]`.
- **V9**.
- Live: `src/services/worker/http/routes/SessionRoutes.ts:377–382`.
### (c) Verification
```
grep -nE "app\.(post|get|delete)\('/sessions/" src/services/worker/http/routes/SessionRoutes.ts → 0 matches
grep -nE "app\.(post|get|delete)\('/api/sessions/" src/services/worker/http/routes/SessionRoutes.ts → 5 matches (Phase 5+6 reduce to 0)
wc -l src/services/worker/http/routes/SessionRoutes.ts → drops by ~250 lines (legacy handlers removed)
```
Integration test: send `POST /sessions/1/init` to a running worker. Assert `404`. Send to `/api/session/prompt` (Phase 6's replacement). Assert `200`.
### (d) Anti-pattern guards
- **D** — pure deletion; no "forwarding shim" to the new endpoints.
- **A** — no "LegacySessionRoutes" compatibility module. Delete means delete. Users who pinned an old plugin version still have the old worker binary shipped with their install.
---
## Phase 5 — Delete `/api/sessions/status` (polling endpoint is obsolete)
### (a) What to implement
Copy from §3.1 "Deleted from old flowchart" bullet 5 ("500-ms poll loop on `/api/sessions/status` (replaced by blocking `/api/session/end`)"). Phase 3 removes the only consumer; this phase deletes the supply.
1. **Delete registration** at `SessionRoutes.ts:389` (`app.get('/api/sessions/status', this.handleStatusByClaudeId.bind(this));`).
2. **Delete handler method** `handleStatusByClaudeId` + any private helpers it uses (if no other code references them).
3. Sanity-grep for any residual polling client.
### (b) Docs
- §3.1 deletion bullet 5.
- Part 2 **D6**.
- **V9** (endpoint 10 of 10).
- Live: `src/services/worker/http/routes/SessionRoutes.ts:389`.
### (c) Verification
```
grep -rn "/api/sessions/status" src/ → 0 matches (hook side removed in Phase 3)
grep -n "handleStatusByClaudeId" src/ → 0 matches
```
### (d) Anti-pattern guards
- **B** — no polling endpoint means no one can be tempted to re-add a 500 ms loop against it later.
---
## Phase 6 — Consolidate `session-init` / `session-complete` handlers into unified session endpoints
### (a) What to implement
Copy from §3.1 diagram edges:
- `UPS["POST /api/session/prompt<br/>{sessionDbId, prompt}"] --> UPSR["Returns {promptId}"]` (lines 96, 103).
- `PTU["POST /api/session/observation<br/>{sessionDbId, tool_use_id, name, input, output}"] --> PTUR["Returns {observationId}"]` (lines 97, 105).
- "Deleted" bullet 3: "`POST /sessions/{id}/init` SDK-start endpoint (implicit inside `/api/session/prompt`)".
- "Deleted" bullet 6: "Two-phase Stop handling (summarize then session-complete) — one endpoint, one response".
1. **Rewrite `src/cli/handlers/session-init.ts:72–150`** as a single `POST /api/session/prompt` call:
- Replace `/api/sessions/init` (`:75`) + `/sessions/{sessionDbId}/init` (`:141`) + `/api/context/semantic` (`:23`) with one `workerHttpRequest('/api/session/prompt', {body: JSON.stringify({sessionId, project, prompt, platformSource})})`.
- The worker-side `/api/session/prompt` handler (implemented by Plan 07 Phase 3) does: (a) resolve/create `sessionDbId`, (b) `ingestPrompt` (Plan 01 Phase 2), (c) start the SDK agent if not already running for this session, (d) fetch semantic markdown via `SearchOrchestrator`, (e) return `{promptId, sessionDbId, semanticMarkdown?}`.
- `session-init.ts` passes `semanticMarkdown` into `additionalContext` (preserves the user-facing semantic injection feature — §3.5 + §3.1 `SS → SSR`).
2. **Rewrite `src/cli/handlers/observation.ts:17`** to call `POST /api/session/observation` with the new `{sessionDbId, tool_use_id, name, input, output}` payload. `tool_use_id` is passed through from the Claude Code hook input (already captured in `NormalizedHookInput` — verify before landing; if not, Plan 01 Phase 2 adds it because the UNIQUE constraint in Phase 9 depends on it).
3. **Rewrite `src/cli/handlers/file-edit.ts:15`** similarly — same endpoint, Cursor flow generates a synthetic `tool_use_id` (`file-edit:<path>:<mtime>`) if none exists.
4. **Delete `src/cli/handlers/session-complete.ts` entirely.** Its only role (mark session inactive) moves server-side into `/api/session/end`.
5. **Delete hook wiring** for the Stop-phase-2 `sessionCompleteHandler` in the adapter layer (`src/cli/adapters/claude-code.ts` — verify dispatcher mapping; this handler was the second callsite for the Stop event, feeding the old two-phase flow).
6. **Delete the remaining four `/api/sessions/*` legacy endpoints** at `SessionRoutes.ts:385–388` (`init`, `observations`, `summarize`, `complete`) — Phase 5 already deleted `status`. Their handlers `handleSessionInitByClaudeId`, `handleObservationsByClaudeId`, `handleSummarizeByClaudeId`, `handleCompleteByClaudeId` are deleted.
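Step 3's synthetic id can be sketched like this (an assumption about the exact shape; the plan only fixes the `file-edit:<path>:<mtime>` template):

```typescript
import { statSync } from "node:fs";

// Synthetic tool_use_id for the Cursor flow: stable per (path, mtime),
// so UNIQUE(session_id, tool_use_id) dedupes repeated deliveries of the
// same edit. mtimeMs is injectable for callers that already stat'd.
function syntheticToolUseId(filePath: string, mtimeMs?: number): string {
  const mtime = mtimeMs ?? statSync(filePath).mtimeMs;
  return `file-edit:${filePath}:${Math.trunc(mtime)}`;
}
```

Truncating `mtimeMs` keeps the id identical across repeated stats of an unmodified file, which is what the dedup constraint needs.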
### (b) Docs
- §3.1 lines 96, 97, 103, 105 + deletion bullets 3, 6.
- §3.8 lines 325–332 (A `POST /api/session/prompt` → `SessionManager.initializeSession → Create → ActiveSession → spawn SDK`) — implicit SDK start.
- **V9** — endpoints `:385`–`:388`.
- Live: `src/cli/handlers/session-init.ts:75,141,23`; `src/cli/handlers/observation.ts:17`; `src/cli/handlers/file-edit.ts:15`; `src/cli/handlers/session-complete.ts` (entire file).
### (c) Verification
```
grep -rn "/api/sessions/" src/ → 0 matches (all five legacy paths deleted)
grep -rn "/sessions/.*sessionDbId" src/ → 0 matches (legacy six deleted in Phase 4)
grep -rn "/api/session/" src/ → exactly 4 distinct paths: start, prompt, observation, end
grep -rn "/api/context/semantic" src/ → 0 matches (folded into /api/session/prompt)
grep -rn "sessionCompleteHandler" src/ → 0 matches (file deleted)
test -f src/cli/handlers/session-complete.ts → false
```
End-to-end: full SessionStart → UserPromptSubmit → PostToolUse × 3 → Stop cycle against a fresh worker. Assert exactly these HTTP calls (verified via worker access log):
1. `GET /api/session/start?project=…` (SessionStart, from `context.ts`)
2. (Maybe) `GET /api/session/start?project=…&colors=true` (SessionStart parallel, from `user-message.ts`) — **if Phase 2 cache misses because the two handlers run in separate hook processes; otherwise 0 calls.**
3. `POST /api/session/prompt` (UserPromptSubmit)
4. `POST /api/session/observation` × 3 (PostToolUse)
5. `POST /api/session/end` (Stop)
Total: 5 or 6 HTTP calls per session (was 10–14: one `ensureWorkerRunning` ping per handler + two `/api/context/inject` + `/api/sessions/init` + `/sessions/1/init` + `/api/context/semantic` + 3× `/api/sessions/observations` + `/api/sessions/summarize` + ~220× poll `/api/sessions/status` + `/api/sessions/complete` × 2).
### (d) Anti-pattern guards
- **A** — no new event type; `POST /api/session/prompt` maps 1:1 to the existing UserPromptSubmit hook. No `hookEventName` changes.
- **D**`/api/session/prompt` is the single source of truth for "start processing this user prompt". No facade calling an internal `/api/sessions/init`.
- **E**`session-init.ts` and `observation.ts` both land on the same backend `ingestObservation`/`ingestPrompt` helpers via their respective endpoints; no duplicate tag-strip / privacy check paths.
---
## Phase 7 — Verification (grep counts, exit codes, Windows Terminal)
### (a) What to verify
1. **Grep counts** (final "clean" state):
```
grep -rn "ensureWorkerRunning" src/cli/handlers/ → 0
grep -rn "ensureWorkerAliveOnce" src/cli/handlers/ → 8
grep -n "POLL_INTERVAL_MS" src/ → 0
grep -n "MAX_WAIT_FOR_SUMMARY_MS" src/cli/handlers/summarize.ts → 1 (HTTP client timeout)
grep -rn "/api/sessions/" src/ → 0
grep -rn "/sessions/.*sessionDbId" src/ → 0
grep -rn "/api/context/inject" src/ → 0
grep -rn "/api/context/semantic" src/ → 0
grep -rn "/api/session/" src/ → exactly 4 paths
grep -c "app\.\(post\|get\|delete\)" src/services/worker/http/routes/SessionRoutes.ts → 4
```
2. **Exit-code census** (preserves CLAUDE.md contract):
- Every hook-handler return path uses `HOOK_EXIT_CODES.SUCCESS` (= 0) on the graceful-degradation branch. Run:
```
grep -B1 "HOOK_EXIT_CODES" src/cli/handlers/*.ts
```
Expected: exit 0 on (worker-unreachable, empty context, empty transcript, 110 s timeout, subagent, project excluded). No new exit 2 paths.
- Windows Terminal tab behaviour: exit 0 closes the tab on successful completion. The blocking `/api/session/end` 110 s path MUST also return exit 0 (not exit 2), so tabs close on timeout. Ship a Windows-Terminal integration test: trigger a synthetic 110 s timeout; confirm tab closes.
3. **Timer census**:
```
grep -n "setInterval\|setTimeout.*recursive" src/cli/ → 0 in CLI handlers
grep -n "setTimeout.*POLL" src/cli/ → 0
```
4. **Endpoint count** on `SessionRoutes.ts`: exactly **4** route registrations. Matches §3.1.
### (b) Docs
- Whole §3.1 diagram, Part 4 timer census, Part 5 deletion ledger rows for "Summarize 500-ms polling hook" and "Double `/api/context/*` fetches".
- **V8**, **V9**, **V10**.
- CLAUDE.md exit-code strategy section ("Exit 0: Success or graceful shutdown — Windows Terminal closes tabs").
### (c) Verification (running the phase)
The phase produces no new code; it runs the grep + integration tests above and fails the rollout if any gate trips. Land only when:
- all greps pass,
- synthetic 110 s timeout → exit 0 → tab closes (Windows),
- full session cycle reports 5–6 HTTP calls (was 10–14).
### (d) Anti-pattern guards
- **B/D/E** — verified by absence (grep). **A** — verified by "`hookEventName` value set unchanged" (`SessionStart`, `UserPromptSubmit`, `PostToolUse`, `Stop`).
---
## Copy-Ready Snippet Locations
**Hook-side session-alive cache (Phase 2)**:
Location: new file `src/hooks/worker-cache.ts` (create; this is the one file added by this plan).
Shape: one module-scope `let alive: boolean | null = null;` + one `let sessionStartResponse: SessionStartResponse | null = null;`. Lives as long as the hook process does (≤120 s). No persistence, no cross-process sharing — that's the point. Plan 07 owns the *server-side* session state; Plan 09 owns only the per-hook-process cache.
**Poll loop deletion target (Phase 3)**:
`src/cli/handlers/summarize.ts:117–150` — the entire `while ((Date.now() - waitStart) < MAX_WAIT_FOR_SUMMARY_MS) { await sleep(POLL_INTERVAL_MS); … }` block plus `summarize.ts:24` (`POLL_INTERVAL_MS = 500`).
**Double-fetch deletion target (Phase 1)**:
`src/cli/handlers/context.ts:41–57` (the `Promise.all([workerHttpRequest(apiPath), workerHttpRequest(colorApiPath)])`) + `src/cli/handlers/user-message.ts:13–30` (`fetchAndDisplayContext`).
**`ensureWorkerRunning` 8 callsites (Phase 2 rewires all 8)**:
```
src/cli/handlers/context.ts:19
src/cli/handlers/user-message.ts:35
src/cli/handlers/session-init.ts:41
src/cli/handlers/observation.ts:34
src/cli/handlers/summarize.ts:44
src/cli/handlers/session-complete.ts:35 (file deleted in Phase 6 — callsite deleted with it)
src/cli/handlers/file-context.ts:218
src/cli/handlers/file-edit.ts:32
```
---
## Confidence + Gaps
### High confidence
- Hook → endpoint mapping (enumerated against live code).
- V8/V9/V10 verified against `Grep` output this session (2026-04-22).
- Endpoint count **10 → 4** verified at `SessionRoutes.ts:377–389` — supersedes the §3.1 "8 → 4" claim.
- `HOOK_EXIT_CODES.SUCCESS = 0` is the sole value used in every return branch of every handler today. Every phase preserves exit-0 semantics.
### Gaps (call out before executing)
1. **Stop-hook exit codes on 110 s timeout path — NEEDS CONFIRMATION.** Current `summarize.ts` returns exit 0 on all branches (poll timeout falls through to `/api/sessions/complete` → `return { exitCode: undefined }` implicitly → adapter defaults to 0). The new blocking `/api/session/end` must explicitly return exit 0 when the server responds `{timedOut: true, summaryId: null}`. §3.1 ("Exit 0") and CLAUDE.md ("Exit 0: graceful shutdown — Windows Terminal closes tabs") agree. **Phase 3 verification step must include a synthetic-timeout Windows Terminal test** — otherwise the refactor could silently introduce an exit-2 path that blocks tab closure, which CLAUDE.md explicitly warns against.
2. **`tool_use_id` availability in CLI hook payloads.** `POST /api/session/observation` requires `tool_use_id` (§3.1 `PTU` edge). Current `NormalizedHookInput` may or may not already carry it — `src/shared/NormalizedHookInput` needs a verification pass in Phase 6 (deferred to Plan 01 Phase 2 if absent). This gates the UNIQUE constraint in Plan 09 Phase 9 (SQLite); out of scope here but a coupling to flag.
3. **`user-message.ts` + `context.ts` run as separate hook processes on some Claude Code versions.** Module-scope `let` in `worker-cache.ts` won't share state across processes. If the Claude Code hook runner invokes them sequentially in one process: 1 HTTP call. If in parallel processes: 2 HTTP calls (still one each, still ≤2 total — acceptable, same as today's `/api/context/inject` double-fetch but under the new endpoint). **Not a correctness issue; a minor perf claim in Phase 1 verification needs empirical confirmation, not a blocker.**
### Out-of-scope adjacencies (flagged)
- Worker-side implementation of `GET /api/session/start`, `POST /api/session/prompt`, `POST /api/session/end` → Plans 05 + 07.
- `ingestObservation`/`ingestPrompt`/`ingestSummary` helpers → Plan 01.
- `file-context.ts` `GET /api/observations/by-file` endpoint → Plan 08 (DataRoutes), not touched here.
- `pre-compact.ts` (delegates to `summarizeHandler`) inherits the Phase 3 rewrite automatically; no extra work.
---
## Summary
- **7 phases**, executed in order (1 → 7). Phases 1, 2, 3 are independent of each other on the **hook side** (different files) but all depend on worker-side Plans 01, 05, 07 Phase-N endpoints existing; Phases 4, 5, 6 delete worker-side code after hooks stop calling it.
- **Lines deleted (hook side)**: `summarize.ts` loses ~80 lines (lines 86–167 collapse to ~10); `user-message.ts` loses ~17 lines; `context.ts` loses ~15 lines; `session-complete.ts` deleted entirely (65 lines); `session-init.ts` loses ~60 lines. **~237 lines gone** from `src/cli/handlers/`.
- **Lines deleted (worker side, SessionRoutes.ts)**: ~250 lines (6 legacy handlers + 5 ByClaudeId handlers).
- **Lines added**: `src/hooks/worker-cache.ts` ~25 lines; 8 handler rewires net ~0. **Total net**: ~-460 lines in this plan's scope (consistent with Part 5 ledger rows `-60/+20` summarize + `-120/+60` context = **-100 net**, plus the Phase 4+5+6 SessionRoutes delete not counted in §5 because §5 lumped it into "session-lifecycle-management").
- **Top gaps**: (1) 110 s timeout exit code must be 0 (Windows Terminal contract); (2) `tool_use_id` presence in `NormalizedHookInput` needs verification before Phase 6.
# Plan 10 — knowledge-corpus-builder (clean)
**Target section**: `PATHFINDER-2026-04-21/05-clean-flowcharts.md` § 3.11 (lines 450–476), Part 1 items #35 (line 53) and #36 (line 54).
**Before-state**: `PATHFINDER-2026-04-21/01-flowcharts/knowledge-corpus-builder.md` (lines 1–87).
**Implementation-plan correspondence**: `PATHFINDER-2026-04-21/06-implementation-plan.md` Phase 13 — "KnowledgeAgent simplification" (lines 567–597). **Direct V-number: NONE** — the verified-findings matrix (V1–V20, lines 22–47) does not include a corpus-specific entry. No upstream discrepancy was registered for this area; treat 05 § 3.11 + Phase 13 as the canonical pair.
## Dependencies
- **Upstream**:
- Plan 05-context-injection-engine — defines `CorpusDetailStrategy` (one of the four strategy configs in 05 § 3.5 lines 232–259 and Part 2 decision D4 line 75). This plan calls `renderObservations(obs, CorpusDetailStrategy)` from CorpusBuilder.
- Plan 06-hybrid-search-orchestration — defines the clean `SearchOrchestrator.search` signature (05 § 3.6 lines 262–292). CorpusBuilder is a *consumer* — the live call is `SearchOrchestrator.search(args)` at `src/services/worker/search/SearchOrchestrator.ts:71`.
- **Downstream**: none.
## Phase 0 — Documentation Discovery (already done)
### Sources consulted
1. `PATHFINDER-2026-04-21/05-clean-flowcharts.md` — full file (607 lines). Section 3.11 (lines 450–476) is canonical; Part 1 items #35–36 (lines 53–54) set the kill rationale; Part 5 ledger row (line 556) promises ~110 net lines deleted in this area.
2. `PATHFINDER-2026-04-21/06-implementation-plan.md` — full file (691 lines). Phase 13 (lines 567–597). **No V-number in 06's verified-findings table (V1–V20) covers the corpus.** Stated explicitly: Phase 13 cites 05 § 3.11 directly without a V-correction, because the audit's claims matched the live code.
3. `PATHFINDER-2026-04-21/01-flowcharts/knowledge-corpus-builder.md` — full file (87 lines). "Before" flowchart + the Confidence+Gaps section pinpoints the regex at `KnowledgeAgent.ts:179`.
4. Live codebase (confirmed paths, line counts, and specific anchors):
- `src/services/worker/knowledge/KnowledgeAgent.ts` (284 lines)
- `src/services/worker/knowledge/CorpusStore.ts` (127 lines)
- `src/services/worker/knowledge/CorpusBuilder.ts` (174 lines)
- `src/services/worker/knowledge/CorpusRenderer.ts` (133 lines)
- `src/services/worker/knowledge/types.ts` (56 lines)
- `src/services/worker/knowledge/index.ts` (14 lines)
- `src/services/worker/http/routes/CorpusRoutes.ts` (283 lines)
- `src/services/worker-service.ts:455-456` — constructor wiring
- `src/servers/mcp-server.ts:499,517,551` — MCP tool surface that mirrors HTTP
5. Dependency plans (cross-refs only, not re-planned here):
- 05 § 3.5 (CorpusDetailStrategy) — renderer contract at 05 lines 379–389
- 05 § 3.6 (SearchOrchestrator.search) — live signature at `src/services/worker/search/SearchOrchestrator.ts:71`.
### Allowed APIs (copy from; do not invent)
- **Claude Agent SDK**`query({ prompt, options })` already used at `KnowledgeAgent.ts:75` and `:190`. Per 05 § 3.11 (line 461 node "S"): call as `SDK.query(systemPrompt=corpus, userPrompt=question)` — a fresh query every call. The existing SDK usage patterns (cwd, disallowedTools, pathToClaudeCodeExecutable, env) at `KnowledgeAgent.ts:77-84` stay.
- **Prompt caching** — the SDK supplies it automatically when the same system prompt is sent within the 5-min TTL. 05 § 3.11 "Cost note" (line 476): "cached system prompt TTL is 5 min. Cost approximately equal to session-resume path without the session-expiration brittleness." The refactor does not add any caching code — it relies on the SDK's own behavior.
- **CorpusDetailStrategy** — comes from Plan 05 (renderer contract at 05 lines 379–389). This plan consumes it; it does not define it.
- **`bun:sqlite` / file I/O** — `CorpusStore` already uses `fs.writeFileSync/readFileSync`. No new storage primitives.
### Anti-patterns to prohibit (cited in every phase)
- **A — Invent SDK methods for session resume.** The SDK has no documented session-expiry ping or refresh endpoint. Don't add one.
- **B — Polling.** The regex test `/session|resume|expired|invalid.*session|not found/i` at `KnowledgeAgent.ts:179` is a polling heuristic in disguise — try, match on error text, retry. Delete.
- **C — Silent fallback.** The current "session expired → silently reprime → retry" path at `KnowledgeAgent.ts:146–160` hides a contract violation. Replacement contract: every `/query` runs a **fresh** SDK query; there is no expiration state to recover from.
- **D — Facades that pass through.** `KnowledgeAgent.reprime` at `KnowledgeAgent.ts:168–171` is a two-line call to `prime`. Both die together.
- **E — Two code paths for the same data.** After the refactor, there is exactly one path that sends a corpus to the SDK: inside the `/query` handler.
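Guards C and E together reduce `/query` to one shape. A hedged sketch (the SDK call is abstracted as an injected `sdkQuery`; the real code uses the Claude Agent SDK `query({ prompt, options })` pattern at `KnowledgeAgent.ts:75`):

```typescript
type SdkQuery = (args: { systemPrompt: string; userPrompt: string }) => Promise<string>;

// One code path, one fresh SDK query per call: no session_id, no expiry
// regex, no silent reprime-and-retry. Errors propagate to the route
// handler, which returns them to the client as {error: '…'}.
async function queryCorpus(
  sdkQuery: SdkQuery,          // stand-in for the SDK query() wrapper
  corpusSystemPrompt: string,  // read from corpus.json, the prime artifact
  question: string
): Promise<{ answer: string }> {
  const answer = await sdkQuery({ systemPrompt: corpusSystemPrompt, userPrompt: question });
  return { answer }; // QueryResult after Phase 1 drops session_id
}
```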
### Corpus.json schema change (from `types.ts:40–51`)
Before:
```ts
interface CorpusFile {
version: 1;
name: string;
description: string;
created_at: string;
updated_at: string;
filter: CorpusFilter;
stats: CorpusStats;
system_prompt: string;
session_id: string | null; // <-- DROP
observations: CorpusObservation[];
}
```
After (per 06 Phase 13 task 2, line 579 — with this plan's note that observations stay because `/query` still needs them to build the system prompt):
```ts
interface CorpusFile {
version: 2; // bump so older files with session_id are recognized
name: string;
description: string;
created_at: string;
updated_at: string;
filter: CorpusFilter;
stats: CorpusStats;
system_prompt: string;
observations: CorpusObservation[];
}
```
> 06 Phase 13 line 579 suggests trimming further to `{name, filters, renderedCorpus, generatedAt}`. This plan keeps the richer shape so `/query` can recompute `renderObservations(obs, CorpusDetailStrategy)` on demand without re-hitting SQLite. If the stored `system_prompt` + observations combined are too large, switch to storing `renderedCorpus` directly; decision flagged in "Gaps" below.
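A version-tolerant read matching the Phase 1 guards (ignore `session_id` on read, never re-emit it) might look like this sketch; `normalizeCorpusFile` and the trimmed field list are illustrative, not the full `CorpusFile` shape:

```typescript
type CorpusFileV2 = {
  version: 2;
  name: string;
  system_prompt: string;
  observations: unknown[];
};

// Accept a legacy v1 object that still carries session_id: strip the
// field and stamp version 2. No migration helper rewrites old files;
// the worker simply never re-emits the field on its next write.
function normalizeCorpusFile(raw: Record<string, unknown>): CorpusFileV2 {
  const { session_id: _dropped, version: _v, ...rest } = raw;
  return { ...(rest as Omit<CorpusFileV2, "version">), version: 2 };
}
```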
### HTTP surface (constraint from prompt)
Keep:
- `POST /api/corpus` (build)
- `POST /api/corpus/:name/query`
- `POST /api/corpus/:name/rebuild`
- `DELETE /api/corpus/:name`
- `GET /api/corpus` (list) and `GET /api/corpus/:name` (get) — present today at `CorpusRoutes.ts:29-30`; 05 § 3.11 doesn't mention them but they are user-facing read endpoints. Keep.
Delete (per 05 § 3.11 lines 468–474):
- `POST /api/corpus/:name/prime` (handler at `CorpusRoutes.ts:33` / `:213-228`)
- `POST /api/corpus/:name/reprime` (handler at `CorpusRoutes.ts:35` / `:267-282`)
---
## Phase 1 — Remove `session_id` from the corpus schema and `CorpusStore`
### (a) What to implement — Copy from …
- Copy from **05 § 3.11 line 470**: "`session_id` persisted in corpus.json" is in the deleted list. Also **06 Phase 13 task 2** (line 579): "Simplify `CorpusStore`… No `session_id`."
### (b) Docs
- 05 § 3.11 (lines 450–474) — sets the "no session_id" rule.
- 06 Phase 13 task 2 (line 579) — task text.
- Live file:line targets:
- `src/services/worker/knowledge/types.ts:49``session_id: string | null;` inside `CorpusFile`. Remove.
- `src/services/worker/knowledge/types.ts:40` — bump `version: 1``version: 2`.
- `src/services/worker/knowledge/types.ts:53-56``QueryResult { answer, session_id }`. Remove `session_id` from `QueryResult` (new shape: `{ answer }`).
- `src/services/worker/knowledge/CorpusStore.ts:61, :67, :77``list()` return type drops `session_id`; payload builder at `:74-78` drops the field.
- `src/services/worker/knowledge/CorpusBuilder.ts:104` — literal `session_id: null` inside the built corpus. Delete the line.
### (c) Verification
- `grep -n "session_id" src/services/worker/knowledge/` → zero lines. (Today: 18 matches across KnowledgeAgent.ts, CorpusStore.ts, CorpusBuilder.ts, types.ts.)
- Compile clean: `npx tsc --noEmit`.
- Unit test: `CorpusStore.read` on a legacy corpus file that still has `session_id` returns a valid `CorpusFile` (extra field ignored by the structural cast, or migrated — see "Blast radius" note below).
- `corpus.json` schema assertion (new integration test): build a corpus; read the file back with `JSON.parse`; assert `!("session_id" in parsed)`.
### (d) Anti-pattern guards
- **A**: Don't add a "migration helper" that re-writes old `session_id: "..."` fields into some new shape. Ignore the field on read; the worker never re-emits it.
- **C**: Don't default `session_id` to `null` "for backward compat" — drop the field outright.
---
## Phase 2 — Delete `KnowledgeAgent.prime` as a distinct operation
### (a) What to implement — Copy from …
- Copy from **05 § 3.11 deleted list, line 469**: "`KnowledgeAgent.prime` as a distinct operation — build IS prime (corpus.json is the prime artifact)."
- 06 Phase 13 task 1 (line 578).
### (b) Docs
- 05 § 3.11 (lines 450–474) — deleted-nodes rationale.
- Live file:line targets:
- `src/services/worker/knowledge/KnowledgeAgent.ts:52-117` — entire `prime()` method (66 lines). Delete.
- `src/services/worker/knowledge/KnowledgeAgent.ts:163-171` — entire `reprime()` method (9 lines). Delete (see Phase 4 for endpoint). `reprime` just calls `prime`, so it dies with it (anti-pattern **D**).
- `src/services/worker/knowledge/KnowledgeAgent.ts:12-41` — imports `OBSERVER_SESSIONS_DIR`, `ensureDir`, `buildIsolatedEnv`, `sanitizeEnv`, `KNOWLEDGE_AGENT_DISALLOWED_TOOLS`. Some still used by the rewritten `query()` in Phase 5; reassess after Phase 5 lands. The disallowedTools list at `:28-41` stays (still applied per call per 05 § 3.11 — Q&A only).
### (c) Verification
- `grep -n "^\s*async prime\|\.prime(" src/services/worker/knowledge/` → zero.
- `grep -n "async reprime\|\.reprime(" src/services/worker/knowledge/` → zero.
- Corpus still builds end-to-end: `curl -X POST /api/corpus -d '{"name":"t","limit":5}'` returns metadata; the resulting `~/.claude-mem/corpora/t.corpus.json` has observations + system_prompt but no SDK session was spawned during build.
- `wc -l src/services/worker/knowledge/KnowledgeAgent.ts` drops by roughly 75 lines (prime 66 + reprime 9). Tracked against the 110-line net-delete target in 05 Part 5.
### (d) Anti-pattern guards
- **A**: Don't add `buildAndPrime(corpus)` as a "unified" helper. Build *is* prime; the SDK is not touched at build time anymore.
- **D**: `reprime` is a pass-through; delete the method, don't keep a stub.
---
## Phase 3 — Delete the auto-reprime regex and the session-expiration retry path
### (a) What to implement — Copy from …
- Copy from **05 Part 1 line 53** (item #35): "KnowledgeAgent auto-reprime on session-expiration regex match … just always prime on query — or store corpus content in a file the SDK loads fresh. No session_id persistence."
- Copy from **05 § 3.11 deleted list, line 471**: "Auto-reprime on regex-matched expiration (~40 lines)."
### (b) Docs
- 05 Part 1 #35 (line 53) — kill rationale.
- 05 § 3.11 (lines 450–474) — replacement flow ("SDK.query(systemPrompt=corpus, userPrompt=question) — fresh query — no session resume").
- Live file:line targets:
- `src/services/worker/knowledge/KnowledgeAgent.ts:119-161``query()` method with its try/catch auto-reprime branch. Delete the entire body; Phase 5 rewrites it.
- `src/services/worker/knowledge/KnowledgeAgent.ts:173-180``isSessionResumeError()`. **Exact regex to delete** (captured at `:179`):
```
/session|resume|expired|invalid.*session|not found/i
```
Delete the whole method.
- `src/services/worker/knowledge/KnowledgeAgent.ts:183-230``executeQuery()` (the resume path). Delete; Phase 5 replaces it.
### (c) Verification
- `grep -rn "isSessionResumeError\|auto.\?reprime\|session.*expired" src/services/worker/knowledge/` → zero.
- `grep -rnE "session\|resume\|expired\|invalid.*session\|not found" src/services/worker/knowledge/` → zero (the raw regex string is gone).
- No retry-on-error logic anywhere in `KnowledgeAgent`. A failed `/query` call propagates to the route handler as a thrown error, returned to the client as `{error: '…'}`.
### (d) Anti-pattern guards
- **B**: Do not replace the regex with a different error-string match. The whole "detect expiry → retry" pattern goes.
- **C**: If `SDK.query` throws, do **not** silently reprime and retry. Propagate. The caller decides.
- **A**: The SDK does not expose a `refreshSession` or `isSessionValid` method — confirmed by the existing usage in `SDKAgent.ts` (not imported for our code path). Don't invent one.
---
## Phase 4 — Delete `/prime` and `/reprime` endpoints
### (a) What to implement — Copy from …
- Copy from **05 § 3.11 deleted list, lines 472-474**: "`reprime` endpoint (rebuild covers it)" and (by implication) `prime` endpoint (since `prime` as an operation is gone).
- 06 Phase 13 task 1 (line 578): "Delete `KnowledgeAgent.prime` and the `reprime` endpoint."
### (b) Docs
- Constraint from the request: keep `POST /api/corpus`, `POST /api/corpus/:name/query`, `POST /api/corpus/:name/rebuild`, `DELETE /api/corpus/:name`. Drop `/prime` and `/reprime`.
- Live file:line targets:
- `src/services/worker/http/routes/CorpusRoutes.ts:33` — `app.post('/api/corpus/:name/prime', …)` registration. Delete.
- `src/services/worker/http/routes/CorpusRoutes.ts:35` — `app.post('/api/corpus/:name/reprime', …)` registration. Delete.
- `src/services/worker/http/routes/CorpusRoutes.ts:209-228` — `handlePrimeCorpus` handler (20 lines). Delete.
- `src/services/worker/http/routes/CorpusRoutes.ts:263-282` — `handleReprimeCorpus` handler (20 lines). Delete.
- `src/servers/mcp-server.ts:499` — MCP tool `prime_corpus`. Delete (tool registration + handler). The deferred-tool namespace exposes it today as `mcp__plugin_claude-mem_mcp-search__prime_corpus`.
- `src/servers/mcp-server.ts:551` — MCP tool `reprime_corpus`. Delete.
- `src/servers/mcp-server.ts:517` — `query_corpus` description mentions "The corpus must be primed first"; update to "Ask a question about the corpus; the corpus content is loaded fresh per query."
### (c) Verification
- `curl -X POST http://localhost:37777/api/corpus/foo/prime` → HTTP 404 (route no longer registered; Express default 404).
- `curl -X POST http://localhost:37777/api/corpus/foo/reprime` → HTTP 404.
- `grep -rn "prime_corpus\|reprime_corpus" src/` → zero.
- `grep -rn "handlePrimeCorpus\|handleReprimeCorpus" src/` → zero.
- MCP client listing no longer shows `prime_corpus` or `reprime_corpus` tools.
### (d) Anti-pattern guards
- **D**: Don't leave thin `/prime` and `/reprime` handlers that just return 410 Gone. Delete the routes; 404 is the correct response.
- **A**: Don't add a compatibility-shim tool `prime_corpus_deprecated`.
---
## Phase 5 — Rewrite `/query` to issue a fresh SDK query with corpus content as system prompt
### (a) What to implement — Copy from …
- Copy from **05 § 3.11 lines 460-463** (the clean flowchart):
```
Q["POST /api/corpus/:name/query {question}"] --> R["CorpusStore.read(name)"]
R --> S["SDK.query(systemPrompt=corpus, userPrompt=question) (fresh query — no session resume)"]
S --> T["Return answer"]
```
- Copy from **06 Phase 13 task 3** (line 580): "Rewrite `KnowledgeAgent.query` to always pass `systemPrompt = renderedCorpus` to the SDK. Claude prompt-caching reduces cost when the same corpus is queried repeatedly within the 5-min TTL."
### (b) Docs
- 05 § 3.11 (lines 450-476), especially the Cost note (line 476).
- Live file:line targets:
- `src/services/worker/knowledge/KnowledgeAgent.ts` — new `query(corpus, question)` body. Copy the SDK-invocation pattern from the current `executeQuery` at `:185-230`, but with:
- `prompt: question` (user prompt)
- `options.systemPrompt: renderedCorpus` (new — load the corpus as system prompt)
- **Remove** `options.resume: corpus.session_id` (line 194)
- Keep `options.model`, `options.cwd`, `options.disallowedTools`, `options.pathToClaudeCodeExecutable`, `options.env` (lines 193, 195-198).
- `src/services/worker/knowledge/KnowledgeAgent.ts:14` — `import { CorpusRenderer }` already exists. Use it. The corpus-rendering call is the combination of `corpus.system_prompt` + `renderer.renderCorpus(corpus)`. Exact shape (copy from the current `prime` prompt at `KnowledgeAgent.ts:61-69`, minus the "Acknowledge" ending):
```
const systemPrompt = [
  corpus.system_prompt,
  '',
  'Here is your complete knowledge base:',
  '',
  renderer.renderCorpus(corpus),
].join('\n');
```
- **Note for Phase 6**: `renderer.renderCorpus(corpus)` is the migration target for `renderObservations(obs, CorpusDetailStrategy)`. In this phase, call the existing renderer; Phase 6 swaps the internals.
- `src/services/worker/http/routes/CorpusRoutes.ts:235-261` — `handleQueryCorpus`. Keep the handler; change the response shape from `{answer, session_id}` (line 260) to `{answer}` only.
- `src/services/worker/knowledge/types.ts:53-56` — `QueryResult` narrowed to `{ answer: string }`.
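Putting the bullets above together, a minimal sketch of the post-Phase-5 `query()` shape. The `Corpus` fields and the injected `sdkQuery`/`render` signatures are stand-ins for illustration, not the real SDK types:

```typescript
// Stand-in types: the real Corpus and SDK option types live in the project.
interface Corpus { system_prompt: string; observations: string[] }
type RenderCorpus = (c: Corpus) => string;
type SdkQuery = (args: { prompt: string; options: { systemPrompt: string } }) => Promise<string>;

function buildSystemPrompt(corpus: Corpus, render: RenderCorpus): string {
  // Mirrors the prompt assembly quoted above (minus the "Acknowledge" ending).
  return [corpus.system_prompt, '', 'Here is your complete knowledge base:', '', render(corpus)].join('\n');
}

async function query(
  corpus: Corpus,
  question: string,
  sdkQuery: SdkQuery,
  render: RenderCorpus
): Promise<{ answer: string }> {
  // Fresh query every time: no resume, no retry. A throw propagates to the route handler.
  const answer = await sdkQuery({
    prompt: question,
    options: { systemPrompt: buildSystemPrompt(corpus, render) },
  });
  return { answer };
}
```

Note there is deliberately no `session_id` anywhere in the signature — the response shape shrinks to `{ answer }` per the `handleQueryCorpus` change above.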
### (c) Verification
- Send three queries against the same corpus within 5 min. Inspect SDK response usage (cache fields). Expected: call 1 writes full system prompt to the cache; calls 2 and 3 report `cache_read_input_tokens > 0`.
- `grep -n "resume:" src/services/worker/knowledge/KnowledgeAgent.ts` → zero.
- `grep -n "systemPrompt" src/services/worker/knowledge/KnowledgeAgent.ts` → exactly one occurrence (inside new `query`).
- Every `/query` call produces a subprocess with no `--resume` flag. Verify via the process table (`ps -ef` on the spawned process argv) or SDK logs.
- End-to-end: `curl -X POST /api/corpus/foo/query -d '{"question":"What did we learn about Chroma?"}'` returns `{answer: "..."}` with no `session_id` field.
### (d) Anti-pattern guards
- **A**: The SDK option is `systemPrompt`; do not invent `systemMessage`, `initialContext`, or `primePrompt`. Verify the exact SDK option name in `@anthropic-ai/claude-agent-sdk` types before shipping.
- **C**: If `SDK.query` throws, propagate the error. No silent retry. No fallback to "cached answer".
- **E**: There is exactly one SDK-call site in the knowledge module after this phase — inside `KnowledgeAgent.query`. Anyone adding a second SDK call elsewhere in the module is introducing duplication.
---
## Phase 6 — Switch `CorpusBuilder` rendering to `renderObservations(obs, CorpusDetailStrategy)`
### (a) What to implement — Copy from …
- Copy from **05 § 3.11 line 457** (the clean flowchart node E): `E["renderObservations(obs, CorpusDetailStrategy)<br/>(U2 unified renderer)"]`.
- Copy from **05 Part 2 Decision D4** (line 75): "One renderer. `renderObservations(obs[], strategy)` where `strategy` selects columns, density, and grouping. The four existing formatters become four small strategy configs."
- Copy the `RenderStrategy` contract from **05 § 3.5 / 06 Phase 8** (06 lines 379-389).
### (b) Docs
- 05 § 3.11 (lines 450-476), 05 § 3.5, 05 Part 2 D4.
- **This plan depends on Plan 05-context-injection-engine** to have defined `CorpusDetailStrategy` at `src/services/rendering/renderObservations.ts` (path per 06 Phase 8 task 1, line 379). If Plan 05 has not shipped, this phase BLOCKS on it.
- Live file:line targets:
- `src/services/worker/knowledge/CorpusBuilder.ts:44` — `this.renderer = new CorpusRenderer();` constructor line. Replace with import of `renderObservations` and `CorpusDetailStrategy`.
- `src/services/worker/knowledge/CorpusBuilder.ts:109` — `corpus.system_prompt = this.renderer.generateSystemPrompt(corpus)`. Keep (the system-prompt *preamble* is distinct from the observation rendering). Or migrate to a separate strategy if 05 specifies one; 05 does not, so keep.
- `src/services/worker/knowledge/CorpusBuilder.ts:112` — `const renderedText = this.renderer.renderCorpus(corpus)`. Replace with `const renderedText = renderObservations(corpus.observations, CorpusDetailStrategy);`.
- `src/services/worker/knowledge/CorpusBuilder.ts:113` — `corpus.stats.token_estimate = this.renderer.estimateTokens(renderedText)`. Keep (token estimator is independent); if Plan 05 moves `estimateTokens` into the unified renderer's output, update.
- `src/services/worker/knowledge/KnowledgeAgent.ts` (Phase 5 rewrite) — swap `renderer.renderCorpus(corpus)` inside the query-time systemPrompt builder for `renderObservations(corpus.observations, CorpusDetailStrategy)`.
- `src/services/worker/knowledge/CorpusRenderer.ts` — after both call-sites migrate, delete `renderCorpus()` (lines 14-34) and `renderObservation()` (lines 39-85). Keep `generateSystemPrompt()` (lines 97-132) and `estimateTokens()` (lines 90-92) unless Plan 05 absorbs them. If nothing remains, delete the file; otherwise trim.
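For orientation, a sketch of the call shape this phase converges on. The `RenderStrategy` fields below are illustrative assumptions; the real contract comes from Plan 05 / 06 Phase 8:

```typescript
// Illustrative strategy contract; Plan 05 owns the real field names.
interface Obs { title: string; narrative: string }
interface RenderStrategy { heading(o: Obs): string; includeNarrative: boolean }

const CorpusDetailStrategy: RenderStrategy = {
  heading: (o) => `## ${o.title}`,
  includeNarrative: true,
};

// One traversal of observations, shaped by the strategy (anti-pattern guard E).
function renderObservations(obs: Obs[], strategy: RenderStrategy): string {
  return obs
    .map((o) => (strategy.includeNarrative ? `${strategy.heading(o)}\n${o.narrative}` : strategy.heading(o)))
    .join('\n\n');
}

// CorpusBuilder.ts:112 after the swap:
// const renderedText = renderObservations(corpus.observations, CorpusDetailStrategy);
```

The point of the sketch: the four existing formatters collapse into strategy configs that select columns, density, and grouping, while the traversal lives in one place.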
### (c) Verification
- `grep -n "renderCorpus\|renderObservation(" src/services/worker/knowledge/CorpusBuilder.ts` → zero.
- `grep -rn "renderObservations" src/services/worker/knowledge/` → matches only in CorpusBuilder and KnowledgeAgent (one call-site each, plus their import lines).
- Snapshot test: feed the same fixture `CorpusObservation[]` to the old `CorpusRenderer.renderCorpus` and the new `renderObservations(obs, CorpusDetailStrategy)` call; assert byte-equal output (or diff in a controlled way documented in Plan 05's snapshot contract).
- `wc -l src/services/worker/knowledge/CorpusRenderer.ts` drops from 133 to roughly 40 (only `generateSystemPrompt` + `estimateTokens` remain, if they remain at all).
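The byte-equality snapshot assertion needs no test framework; a plain check that reports the first diverging offset makes whitespace drift easy to spot (sketch — the renderer invocations themselves are elided):

```typescript
// Compares old- and new-renderer output byte-for-byte and surfaces the first
// diverging offset with a slice of context on each side.
function assertByteEqual(oldOut: string, newOut: string): void {
  if (oldOut === newOut) return;
  const max = Math.min(oldOut.length, newOut.length);
  let i = 0;
  while (i < max && oldOut[i] === newOut[i]) i++;
  throw new Error(
    `renderer outputs diverge at byte ${i}: ` +
      `old=${JSON.stringify(oldOut.slice(i, i + 20))} new=${JSON.stringify(newOut.slice(i, i + 20))}`
  );
}
```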
### (d) Anti-pattern guards
- **A**: The function name is `renderObservations` (plural), per 05 D4 and 06 Phase 8. Don't invent `renderCorpusObservations` or `renderForAgent`.
- **E**: After this phase, there is one traversal of `observations` in the knowledge module — inside `renderObservations`. Don't leave `renderObservation` (singular) as a helper in CorpusRenderer; Plan 05 owns it.
---
## Phase 7 — Verification (final)
### (a) What to implement — Copy from …
- Copy the verification pattern from **06 Phase 13 task 4 / verification block** (lines 581-588).
- Copy the cost-check from **05 § 3.11 Cost note** (line 476).
### (b) Docs
- 05 § 3.11 (lines 450-476).
- 06 Phase 13 (lines 567-597).
### (c) Verification
1. **Grep gauntlet** (exact commands):
- `grep -rn "session_id" src/services/worker/knowledge/` → **zero**.
- `grep -rn "session_id" src/services/worker/http/routes/CorpusRoutes.ts src/servers/mcp-server.ts` → zero for corpus/knowledge paths.
- `grep -rn "isSessionResumeError\|auto.\?reprime\|session.*expired" src/services/worker/knowledge/` → zero.
- `grep -rn "/session|resume|expired|invalid.*session|not found/" src/services/worker/knowledge/` → zero (the exact regex string must be gone).
- `grep -rn "\.prime(\|\.reprime(" src/services/worker/knowledge/ src/servers/mcp-server.ts` → zero.
- `grep -rn "prime_corpus\|reprime_corpus" src/` → zero.
- `grep -rn "handlePrimeCorpus\|handleReprimeCorpus" src/` → zero.
2. **HTTP endpoints**:
- `POST /api/corpus` → 200, returns metadata.
- `POST /api/corpus/:name/rebuild` → 200.
- `POST /api/corpus/:name/query` → 200, `{answer: "..."}` only (no `session_id`).
- `DELETE /api/corpus/:name` → 200.
- `POST /api/corpus/:name/prime` → **404**.
- `POST /api/corpus/:name/reprime` → **404**.
3. **Cost smoke test** (per 05 line 476, "cached system prompt TTL is 5 min"):
- Build a 20-observation corpus.
- Run `POST /api/corpus/test/query` three times within 90 seconds, each with a different question.
- Record SDK response usage counters for each call. Expect: call 1 `cache_read_input_tokens == 0`; calls 2 and 3 `cache_read_input_tokens > 0` (approximately equal to the rendered corpus length in tokens).
- If no cache hits on calls 2-3, escalate to "Gaps" below — cost model is broken and the refactor must be revisited.
4. **corpus.json on disk**:
- `cat ~/.claude-mem/corpora/test.corpus.json | jq 'has("session_id")'` → `false`.
- `jq '.version'` → `2`.
5. **Line-count delta** (target from 05 Part 5 line 556: net -110 LOC for this area):
- Before: KnowledgeAgent 284 + CorpusStore 127 + CorpusBuilder 174 + CorpusRenderer 133 + CorpusRoutes 283 = **1001 lines** in the five files.
- After: roughly -75 (prime+reprime) -10 (CorpusStore `session_id` fields) -40 (auto-reprime + regex + executeQuery body) -40 (prime+reprime HTTP handlers) -93 (CorpusRenderer renderCorpus+renderObservation shift to shared renderer) +30 (new slim query() using systemPrompt). Net ≈ **-228**.
- 05 Part 5 promised -110; actual deletion is larger because the audit underweighted the CorpusRenderer migration credit (it's also double-counted in Plan 08/unified-renderer).
6. **Full `npm run build-and-sync`** passes.
7. **MCP tool listing** no longer exposes `prime_corpus` or `reprime_corpus`.
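For step 3's logging requirement, a small helper over the per-call usage counters can compute the ratio to log. The field name assumes the Anthropic-style `cache_read_input_tokens` counter cited above:

```typescript
interface Usage { cache_read_input_tokens: number }

// Fraction of repeat calls (2..n) that read from the prompt cache.
// Call 1 is the cache write and is excluded from the ratio.
function cacheHitRatio(calls: Usage[]): number {
  const repeats = calls.slice(1);
  if (repeats.length === 0) return 1;
  return repeats.filter((u) => u.cache_read_input_tokens > 0).length / repeats.length;
}
```

Per Gap 1 below, escalate when `1 - cacheHitRatio(calls) > 0.1` for repeat queries inside the 5-min TTL.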
### (d) Anti-pattern guards
- **A**: Every grep that returns a non-zero match is a failed phase. No "we'll clean it up later" waivers.
- **B**: If the cost smoke test fails (no cache hits on call 2/3), do not "fix" by reintroducing session-resume. Investigate the SDK's prompt-caching behavior and file the bug.
- **C**: Any handler that silently returns a cached answer without calling the SDK is a regression. Every `/query` must invoke the SDK.
---
## Blast radius + migration
- **corpus.json schema**: `version: 1``version: 2`. Old files with `session_id` still parse because TypeScript structural casting is permissive on reads; extra field is ignored, never re-emitted. No explicit migration script — corpus files are rebuilt on `/rebuild` anyway.
- **MCP surface shrinks**: downstream users of the MCP search plugin lose `prime_corpus` and `reprime_corpus` tool names. Coordinate with plugin release notes.
- **Cost profile**: depends on SDK prompt-caching TTL (5 min). See Gap 1 below.
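A sketch of why a leftover v1 `session_id` parses but never survives: the read path picks fields explicitly instead of spreading the parsed value. Field names beyond `session_id` are assumptions here:

```typescript
interface CorpusFile { version: number; name: string; observations: unknown[] }

function readCorpus(raw: string): CorpusFile {
  const parsed = JSON.parse(raw) as CorpusFile & { session_id?: string };
  // Explicit field pick: a stray v1 session_id parses fine but is dropped on rewrite.
  return { version: parsed.version, name: parsed.name, observations: parsed.observations };
}
```

This only holds if writes construct the object explicitly (as above) rather than spreading `parsed` — worth checking in `CorpusStore` when landing.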
## Confidence + Gaps
**Confidence — High**:
- All deletion targets have exact file:line references verified against live code.
- The 06 Phase 13 verification steps align 1:1 with 05 § 3.11 deletion list.
- Every HTTP and MCP endpoint has been mapped to a specific line in `CorpusRoutes.ts` or `mcp-server.ts`.
**Gap 1 (flagged per prompt — prompt-caching TTL)**: 05 line 476 asserts "cached system prompt TTL is 5 min" → cost roughly equal to session-resume. **This is an assumption**, not a measured fact. If the Claude Agent SDK's caching hits on `systemPrompt` behave differently than expected (e.g., cache key sensitive to small whitespace changes in the rendered corpus; cache disabled when `options.cwd` varies; TTL shorter than 5 min), every `/query` becomes a full prompt-ingest — per-call cost jumps ~20×. **Required**: Phase 7 step 3 (the cost smoke test) must run and the cache-hit ratio must be logged before declaring the phase shipped. If cache miss rate > 10% on repeat queries within 5 min, escalate.
**Gap 2 — corpus.json storage shape**: 06 Phase 13 task 2 (line 579) suggests `{name, filters, renderedCorpus, generatedAt}` — storing the fully-rendered string instead of observations. This plan keeps observations because `renderObservations(obs, CorpusDetailStrategy)` is recomputed per query (Phase 5). Tradeoff: storing `renderedCorpus` saves one render per query (small) but loses the ability to change strategies without a rebuild. **Decision deferred**: ship Phases 1-7 with observations preserved; reopen if Plan 05 lands and stores `renderedCorpus` directly.
---
## Phase Count
**7 phases**: schema cleanup → `prime` deletion → auto-reprime deletion → endpoint deletion → `/query` rewrite → renderer unification → verification.
## Anticipated LOC Impact
- 05 Part 5 row 19 (line 556): `-140 / +30 / net -110`.
- This plan's line-by-line trace (see Phase 7 step 5): actual net deletion closer to **-228** once the `CorpusRenderer` shrink lands.
- Five files carry the bulk of the delta: `KnowledgeAgent.ts`, `CorpusStore.ts`, `CorpusBuilder.ts`, `CorpusRenderer.ts`, `CorpusRoutes.ts`; `mcp-server.ts` and `types.ts` take smaller edits.
# Plan 11: http-server-routes (clean)
Implements flowchart §3.9 of `PATHFINDER-2026-04-21/05-clean-flowcharts.md`.
Introduces Zod + `validateBody(schema)` middleware, deletes the rate limiter, caches the two served static files at boot, and strips per-route hand-rolled shape-validation. Bullshit-inventory items **#37 (per-route validation boilerplate)**, **#39 (rate limit)**, **#40 (oversized-body special handling)** are eliminated. **#38 (admin endpoints)** is explicitly preserved per the inventory note.
## Header
- **Target flowchart**: `PATHFINDER-2026-04-21/05-clean-flowcharts.md` §3.9 "http-server-routes (clean)" (lines 382-420).
- **Before state**: `PATHFINDER-2026-04-21/01-flowcharts/http-server-routes.md`.
- **Upstream dependencies**: *none*. Zod adoption is orthogonal to every other plan; this plan OWNS the Zod introduction.
- **Downstream dependencies**: *none*. Other plans land unaffected; they gain `validateBody(schema)` validation by attaching a schema to their routes at landing time, not by rewriting this plan.
- **Coordination note**: Plan 09 (lifecycle-hooks) collapses `SessionRoutes` from 10 → 4 endpoints (V9 finding). This plan MUST land **after** Plan 09 so the Zod schemas here target the final 4-endpoint surface, not the legacy 10. If landing order flips, re-attach schemas to whichever route names survive.
- **Verified findings cited**: V2 (legacy `/sessions/*` vs `/api/sessions/*`, SessionRoutes.ts:378-389); V9 (SessionRoutes has 10 endpoints, not 8); V20 (rate limiter at `src/services/worker/http/middleware.ts:45-79`, 300 req/min IP map, keyed by `::ffff:127.0.0.1`-normalized IP).
## Anti-patterns prohibited in every phase
- **A**: No invented Zod methods. Every API used must be verified against the installed zod version (Phase 1). In particular, use `schema.safeParse(body)` + `result.success ? result.data : result.error.flatten()` — no `ZodUtil.assertBody`, no `schema.validateOrThrow`.
- **D**: No per-route validation blocks of 5+ if statements. Any block that currently does `if (typeof x !== 'string') ... if (!body.foo) ... if (!body.bar) ...` collapses to a single `validateBody(schema)` middleware call.
- **E**: No two validation paths. If a route gets a Zod schema, the hand-rolled checks in the handler body get deleted in the same commit. "Defense in depth" via duplicate validation is forbidden.
---
## Phase 1 — Confirm Zod availability; add if absent
**Outcome**: `zod` is a first-class dependency in `package.json`, installed in `node_modules`, with a known version so every schema in Phase 3 uses a stable API.
### (a) What to implement
- Run `npm ls zod` in the repo root.
- If present (transitive or direct): pin the resolved major version in `package.json` dependencies (move from transitive to explicit so future `npm ci` can't drop it).
- If absent (confirmed state as of 2026-04-22 — see findings below): `npm install zod@^3.23.8` (current stable 3.x line). Commit `package.json` + `package-lock.json`.
- Record the resolved version in the PR description. All subsequent phases use this version's API surface.
Copy from: nothing — this is a dependency add. Reference the `package.json` structure at `package.json:111-125` (current `dependencies` block).
### (b) Docs
- §3.9 "Deleted" bullet 2 ("Per-route hand-rolled validation (Zod middleware replaces)").
- `06-implementation-plan.md` line 55: "Zod — `z.object({...})`, `schema.safeParse(body)`, `result.success ? result.data : result.error.flatten()`. (Not yet a dep; Phase 12 adds `zod` via npm; already shipped transitively via `@anthropic-ai/sdk` — confirm before landing.)"
- V9 (06-implementation-plan.md:36) confirms the SessionRoutes endpoint count that Phase 3 must schema.
- Live file:line: `package.json:111-125` (dependencies block); `package.json:124` (`zod-to-json-schema` — sibling package, *not* zod itself).
### (c) Verification
- `npm ls zod` prints a single resolved path, not "(empty)".
- `node -e "require('zod')"` exits 0.
- Grep: `grep -n '"zod"' package.json` → **≥1** match in dependencies (not just `zod-to-json-schema`).
- `git diff package.json` shows `zod` added; `package-lock.json` shows resolved version.
### (d) Anti-pattern guards
- **A**: Don't pin to `@latest`; pin to the major line installed now (3.x). Record the exact minor in the plan PR.
- **E**: Don't add `zod` to both `dependencies` and `devDependencies` — runtime code imports it, so `dependencies` only.
---
## Phase 2 — Write `validateBody(schema)` middleware
**Outcome**: One Express middleware file, ~40 lines, that accepts any Zod schema and rejects non-conforming bodies with a uniform 400 shape. Zero per-route boilerplate.
### (a) What to implement
Create `src/services/worker/http/middleware/validateBody.ts`:
```ts
import { RequestHandler } from 'express';
import { ZodType } from 'zod';

export function validateBody<T>(schema: ZodType<T>): RequestHandler {
  return (req, res, next) => {
    const result = schema.safeParse(req.body);
    if (!result.success) {
      res.status(400).json({
        error: 'validation_failed',
        message: 'Request body failed schema validation',
        code: 'VALIDATION_FAILED',
        fields: result.error.flatten()
      });
      return;
    }
    req.body = result.data;
    next();
  };
}
```
Copy error-shape keys (`error`, `message`, `code`) from the existing `BaseRouteHandler.handleError` response shape at `src/services/worker/http/BaseRouteHandler.ts:82-99`, extended with `fields` (per 06-implementation-plan.md:546, 553, 563).
Create the directory: `src/services/worker/http/middleware/` (new; sibling to `middleware.ts`). One file, one export.
### (b) Docs
- §3.9 flowchart node D: `validateBody(schema) middleware (Zod per route)` → node E `Valid? → 400 with field errors` (05-clean-flowcharts.md:388-391).
- 06-implementation-plan.md Phase 12, lines 542-548 (middleware signature + `safeParse` + 400 with `fields`).
- Live file:line: existing error shape at `src/services/worker/http/BaseRouteHandler.ts:82-99` (fields: `error`, `code`, `details`).
### (c) Verification
- `grep -n "export function validateBody" src/services/worker/http/middleware/validateBody.ts` → 1 match.
- `grep -rn "res.status(400)" src/services/worker/http/middleware/validateBody.ts` → exactly 1 (the single 400 response).
- Unit test: schema `z.object({ foo: z.string() })` accepts `{foo:"bar"}`, rejects `{foo:42}` with 400 and `fields.fieldErrors.foo` populated.
- TypeScript: `tsc --noEmit` succeeds — the generic `<T>` signature must compile.
### (d) Anti-pattern guards
- **A**: `safeParse` only — no `.parse()` with try/catch wrapper, no `assertSafe`, no `ZodUtil` helper class. The Express middleware contract already provides error isolation.
- **D**: This file is the *only* place a Zod parse happens in the HTTP layer. If a future PR adds a second `safeParse` call inside a handler, it is a duplicate validation path — delete it.
- **E**: `next()` only on success. On failure, `res.status(400).json(...)` **and return**. Never both call `next()` and send a response.
---
## Phase 3 — Per-route Zod schemas; attach via middleware
**Outcome**: Every POST / PUT / DELETE-with-body endpoint has a Zod schema sitting next to its route registration. `validateBody(schema)` is inserted into the middleware chain for that route.
### (a) What to implement
For each route file, add a top-of-file `schemas` block (plain `const X = z.object({...})` — do NOT build a `schemas/` parallel directory; inline at top of file keeps the schema co-located with its handler). Attach via the route registration:
Before (`CorpusRoutes.ts:28`):
```ts
app.post('/api/corpus', this.handleBuildCorpus.bind(this));
```
After:
```ts
app.post('/api/corpus', validateBody(BuildCorpusSchema), this.handleBuildCorpus.bind(this));
```
**Schemas required (one per endpoint with a body). Target list assumes Plan 09 has already collapsed SessionRoutes to the 4-endpoint surface per §3.1.** If Plan 09 has not landed, also schema the legacy `/sessions/:sessionDbId/*` endpoints at `src/services/worker/http/routes/SessionRoutes.ts:377-382` — they're deleted by Plan 09 but must not be left unvalidated in the interim.
| Route file | Endpoint | Schema name | Core fields |
|---|---|---|---|
| `SessionRoutes.ts` | `POST /api/session/start` (post-Plan 09) | `SessionStartSchema` | `{ project: string, contentSessionId: string, platformSource?: string, customTitle?: string }` |
| `SessionRoutes.ts` | `POST /api/session/prompt` | `SessionPromptSchema` | `{ sessionDbId: number, prompt: string }` |
| `SessionRoutes.ts` | `POST /api/session/observation` | `SessionObservationSchema` | `{ sessionDbId: number, tool_use_id: string, name: string, input: unknown, output: unknown, cwd?: string }` |
| `SessionRoutes.ts` | `POST /api/session/end` | `SessionEndSchema` | `{ sessionDbId: number, last_assistant_message: string }` |
| `DataRoutes.ts` | `POST /api/observations/batch` | `ObservationsBatchSchema` | `{ ids: z.array(z.number().int()), orderBy?: z.enum(['date_desc','date_asc']), limit?: number, project?: string }` |
| `DataRoutes.ts` | `POST /api/sdk-sessions/batch` | `SdkSessionsBatchSchema` | `{ memorySessionIds: z.array(z.string()) }` |
| `DataRoutes.ts` | `POST /api/processing` | `SetProcessingSchema` | `{ isProcessing: z.boolean() }` (verify field name in handler) |
| `DataRoutes.ts` | `POST /api/pending-queue/process` | `ProcessQueueSchema` | (likely empty — `z.object({}).strict()`) |
| `DataRoutes.ts` | `POST /api/import` | `ImportSchema` | per handler's body shape |
| `MemoryRoutes.ts` | `POST /api/memory/save` | `MemorySaveSchema` | `{ text: z.string().min(1), title?: string, project?: string }` |
| `CorpusRoutes.ts` | `POST /api/corpus` | `BuildCorpusSchema` | `{ name: z.string().min(1), description?: string, project?: string, types?: z.array(z.string()), concepts?: z.array(z.string()), files?: z.array(z.string()), query?: string, date_start?: string, date_end?: string, limit?: z.number().int().positive() }` |
| `CorpusRoutes.ts` | `POST /api/corpus/:name/query` | `QueryCorpusSchema` | `{ question: z.string().min(1) }` |
| `CorpusRoutes.ts` | `POST /api/corpus/:name/rebuild` | `RebuildCorpusSchema` | `z.object({}).strict()` or per handler |
| `SettingsRoutes.ts` | `POST /api/settings` | `UpdateSettingsSchema` | **see note below** |
| `SettingsRoutes.ts` | `POST /api/mcp/toggle` | `ToggleMcpSchema` | `{ enabled: z.boolean() }` |
| `SettingsRoutes.ts` | `POST /api/branch/switch` | `SwitchBranchSchema` | `{ branch: z.enum(['main', 'beta/7.0', 'feature/bun-executable']) }` |
| `SettingsRoutes.ts` | `POST /api/branch/update` | `UpdateBranchSchema` | `z.object({}).strict()` |
| `LogsRoutes.ts` | `POST /api/logs/clear` | `ClearLogsSchema` | `z.object({}).strict()` or per handler |
| `ViewerRoutes.ts` | (GET-only) | — | no body schemas needed |
| `SearchRoutes.ts` | `POST /api/context/semantic` | `SemanticContextSchema` | per handler at `src/services/worker/http/routes/SearchRoutes.ts:41` |
**Special case — `POST /api/settings`**: the existing `validateSettings(settings)` function at `src/services/worker/http/routes/SettingsRoutes.ts:237-385` is ~148 lines of domain validation (valid providers, port ranges, Python version regex, URL parse). That is **domain validation, not shape validation.** Keep it. The Zod schema here validates only that each field, if present, is of the right primitive type (`z.string().optional()`, `z.number().optional()`, `z.boolean().optional()` as appropriate per the `settingKeys` array at `SettingsRoutes.ts:88-128`). The domain validation stays in the handler. This is the correct application of rule D: delete only shape checks, not domain checks.
Copy-ready pattern to replicate: `CorpusRoutes.ts:238-244` — the `QueryCorpusSchema` replaces exactly this block. Cleanest single-field existing check in the codebase.
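The settings special case boils down to this split — shape first (what the Zod schema expresses), domain second (what stays in the handler). The field names and allowlist values below are illustrative; the real list is the `settingKeys` array:

```typescript
// Shape check: each present field has the right primitive type.
const fieldTypes: Record<string, 'string' | 'number' | 'boolean'> = {
  provider: 'string',   // illustrative field names, not the real settingKeys
  workerPort: 'number',
  autoUpdate: 'boolean',
};

function shapeOk(body: Record<string, unknown>): boolean {
  return Object.entries(fieldTypes).every(([k, t]) => body[k] === undefined || typeof body[k] === t);
}

// Domain check (stays in the handler, per rule D): ranges, allowlists, URL parse.
function domainOk(body: { workerPort?: number; provider?: string }): boolean {
  if (body.workerPort !== undefined && (body.workerPort < 1024 || body.workerPort > 65535)) return false;
  if (body.provider !== undefined && !['anthropic', 'openai'].includes(body.provider)) return false;
  return true;
}
```

`shapeOk` is what `UpdateSettingsSchema` replaces; `domainOk` is the ~148-line `validateSettings` that survives untouched.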
### (b) Docs
- §3.9 flowchart node D (`validateBody(schema) middleware (Zod per route)`, 05-clean-flowcharts.md:388).
- Bullshit-inventory item #37: "Per-route validation boilerplate × 8 files" → "`validateBody(schema)` middleware; per-route Zod schema" (05-clean-flowcharts.md:55).
- 06-implementation-plan.md Phase 12 task 3 (line 547): "Per-route schemas in a parallel `schemas/` directory (or inline at top of each route file). One `z.object({…})` per endpoint." **This plan chooses inline** (co-location wins over directory partition at this scale — 8 files × ~3 schemas each = ~24 schemas; a separate directory adds import overhead with no clarity gain).
- V9 (06-implementation-plan.md:36): confirms SessionRoutes endpoint count pre/post Plan 09.
- Live file:line per row in the schema table above.
### (c) Verification
- `grep -rn "^import.*from 'zod'" src/services/worker/http/routes/` → **≥1 per route file with a POST endpoint** (7 of 8 files — ViewerRoutes is GET-only).
- `grep -rn "validateBody(" src/services/worker/http/routes/` → count matches the POST/PUT endpoint total in the table above (~18 endpoints).
- For each schema: a successful request round-trips unchanged; an invalid-shape request returns 400 with `{error:'validation_failed', fields:...}`.
### (d) Anti-pattern guards
- **A**: Every schema uses published zod 3.x methods (`z.object`, `z.string`, `z.number`, `z.array`, `z.enum`, `z.boolean`, `.optional`, `.min`, `.int`, `.positive`). Anything else — verify against the resolved zod version from Phase 1. **Do not invent** `.isPositiveInt()` or `.nonEmptyString()` helper methods; use the built-in chain.
- **E**: No schema duplicated. If two endpoints share a shape (e.g. `contentSessionId` appears in multiple SessionRoutes handlers), extract to a shared `const SessionIdField = z.string()` at the top of the file and reuse. Duplicated literal `z.object({...})` with identical fields across files = delete one.
- **D**: Inline schemas only. Do not build `schemas/SessionSchemas.ts` / `schemas/DataSchemas.ts` — that re-introduces the parallel-directory anti-pattern the plan text at 06-implementation-plan.md:547 warns about.
---
## Phase 4 — Delete hand-rolled validation blocks
**Outcome**: Every shape-validation block (type check, presence check, array check) inside a route handler is deleted. Only domain validation remains.
### (a) What to implement
Delete (exact line ranges, to be deleted alongside the Phase 3 schema attachment for each route):
| File | Line range to delete | What | Replaced by |
|---|---|---|---|
| `src/services/worker/http/routes/CorpusRoutes.ts` | `44-51` | `if (!req.body.name) { res.status(400).json({error:'Missing required field: name', fix:..., example:...}); return; }` | `BuildCorpusSchema` in Phase 3 |
| `src/services/worker/http/routes/CorpusRoutes.ts` | `55-69` | Coercion calls for `types`, `concepts`, `files`, `limit` (`coerceStringArray`, `coercePositiveInteger`) | Zod coerces via `z.coerce.number()`, `z.string().transform(s => s.split(','))` as needed |
| `src/services/worker/http/routes/CorpusRoutes.ts` | `88-125` | `coerceStringArray` + `coercePositiveInteger` helper methods | Zod schema coercion replaces both helpers entirely |
| `src/services/worker/http/routes/CorpusRoutes.ts` | `238-245` | `QueryCorpus` question presence + type check | `QueryCorpusSchema` in Phase 3 |
| `src/services/worker/http/routes/DataRoutes.ts` | `118-123` | `path` query-param check (note: query-param, not body — keep as-is OR migrate to `validateQuery(schema)` if the middleware is extended; for this plan, leave) | — |
| `src/services/worker/http/routes/DataRoutes.ts` | `144-163` | `ids` coerce + array-check + integer-check for `POST /api/observations/batch` | `ObservationsBatchSchema` |
| `src/services/worker/http/routes/DataRoutes.ts` | `196-206` | `memorySessionIds` coerce + array-check for `POST /api/sdk-sessions/batch` | `SdkSessionsBatchSchema` |
| `src/services/worker/http/routes/SessionRoutes.ts` | `570-572` | `if (!contentSessionId) return this.badRequest(...)` in `handleObservationsByClaudeId` | Pre-Plan 09: keep as-is until routes collapse; post-Plan 09: replaced by `SessionObservationSchema` |
| `src/services/worker/http/routes/SessionRoutes.ts` | `672-676` | `contentSessionId` check in `handleSummarizeByClaudeId` | Same |
| `src/services/worker/http/routes/SessionRoutes.ts` | `724-728` | `contentSessionId` query-param check in `handleStatusByClaudeId` (GET — query not body; leave) | — |
| `src/services/worker/http/routes/SessionRoutes.ts` | `767-771` | `contentSessionId` check in `handleCompleteByClaudeId` | `SessionEndSchema` post-Plan 09 |
| `src/services/worker/http/routes/SessionRoutes.ts` | `831-835` | `this.validateRequired(req, res, ['contentSessionId'])` in `handleSessionInitByClaudeId` | `SessionStartSchema` post-Plan 09 |
| `src/services/worker/http/routes/SettingsRoutes.ts` | `159-164` | `enabled` boolean type check in `handleToggleMcp` | `ToggleMcpSchema` |
| `src/services/worker/http/routes/SettingsRoutes.ts` | `184-198` | `branch` presence + allowlist check in `handleSwitchBranch` | `SwitchBranchSchema` (`z.enum([...])` handles both presence and allowlist) |
| `src/services/worker/http/routes/MemoryRoutes.ts` | `33-36` | `text` presence + type + non-empty check | `MemorySaveSchema` |
| `src/services/worker/http/routes/BaseRouteHandler.ts` | `54-62` | `validateRequired(req, res, params)` helper method | **Delete entire method.** No caller remains after this phase. Keep `parseIntParam`, `badRequest`, `notFound`, `handleError`, `wrapHandler`. |
Total hand-rolled-validation lines deleted: approximately **125 LOC** across 5 files.
**`SettingsRoutes.validateSettings` at lines 237-385 is NOT deleted** — that is domain validation (provider allowlists, port ranges, URL parse) and stays in the handler as-is. Zod handles only shape. Rule D — "per-route validation blocks of 5+ if statements — collapsed to validateBody(schema)" — applies to shape blocks only; domain blocks are orthogonal and survive.
### (b) Docs
- §3.9 "Deleted" bullet 2: "Per-route hand-rolled validation (Zod middleware replaces)" (05-clean-flowcharts.md:414).
- Bullshit-inventory #37 (05-clean-flowcharts.md:55).
- 06-implementation-plan.md Phase 12 task 4 (line 548): "Delete per-route boilerplate: manual `typeof x !== 'string'` checks, `if (!body.foo) return res.status(400)…`."
- Live line ranges per row in the table above.
### (c) Verification
- `grep -rn "validateRequired" src/services/worker/http/` → **0**.
- `grep -rn "typeof .* !== 'string'" src/services/worker/http/routes/` → **0** for body validation; any surviving matches must be for non-body purposes (e.g., narrowing a union type inside business logic).
- `grep -rn "res.status(400)" src/services/worker/http/routes/` drops significantly (from ~12 to ≤ 2 domain-specific 400s in `SettingsRoutes.validateSettings` path and corpus `404 → 400` edge).
- `grep -n "coerceStringArray\|coercePositiveInteger" src/` → **0**.
- Happy-path tests for each endpoint: response shape unchanged.
### (d) Anti-pattern guards
- **D**: If a handler still has a `typeof` check on a body field after this phase, the schema is missing a constraint. Fix the schema, not the handler.
- **E**: No fall-through: after `validateBody` accepts, the handler does NOT re-validate the same field. Example: `SwitchBranchSchema` uses `z.enum(['main','beta/7.0','feature/bun-executable'])` — the handler must not re-check `if (!allowedBranches.includes(branch))`.
- **A**: Don't replace `validateRequired` with a similarly-named Zod wrapper. Delete the method outright.
---
## Phase 5 — Delete rate-limit middleware
**Outcome**: The rate limiter at `src/services/worker/http/middleware.ts:45-79` (300 req/min IP map, keyed by `::ffff:127.0.0.1`-normalized IP) is deleted. Bullshit item #39 removed.
### (a) What to implement
Delete the following from `src/services/worker/http/middleware.ts`:
- **Lines 45-50**: comment block + `requestCounts` map + `RATE_LIMIT_WINDOW_MS` + `RATE_LIMIT_MAX_REQUESTS` constants.
- **Lines 52-77**: the `rateLimiter` RequestHandler.
- **Line 79**: `middlewares.push(rateLimiter);`.
Total: **35 LOC deleted from middleware.ts**.
No change needed in `Server.ts` — it registers middleware via `createMiddleware(summarizeRequestBody)` at `src/services/server/Server.ts:156`, which returns the array. Removing the `.push(rateLimiter)` call is sufficient; the caller loops over whatever middleware returns.
### (b) Docs
- §3.9 "Deleted" bullet 1: "In-memory rate limiter (300/min IP map) — localhost trust model everywhere else makes this theater" (05-clean-flowcharts.md:413).
- Bullshit-inventory #39 (05-clean-flowcharts.md:57).
- V20 (06-implementation-plan.md:47): "Rate limiter 300/min — Confirmed at `src/services/worker/http/middleware.ts:45-79`. Constants at `:49-50`. Keyed by IP, normalizes `::ffff:127.0.0.1`. Phase 14 deletes."
- 06-implementation-plan.md Phase 14 task 1 (line 612).
- Live file:line: `src/services/worker/http/middleware.ts:45-79`.
### (c) Verification
- `grep -n "RATE_LIMIT_WINDOW_MS\|RATE_LIMIT_MAX_REQUESTS\|requestCounts\|rateLimiter" src/` → **0 matches**.
- `grep -n "429" src/services/worker/http/` → **0** (the only 429 in the codebase is the rate limiter; survey the repo with `grep -rn "429" src/` to confirm).
- `curl -s -w "%{http_code}" -o /dev/null http://localhost:37777/api/health` repeated 1000× returns 200 every time — no 429 after request #300.
- Build green: `tsc --noEmit`.
### (d) Anti-pattern guards
- **B** (from 06-implementation-plan.md:623): "Don't re-introduce the rate limiter as a 'config flag'. Localhost trust model is explicit." No `if (settings.rateLimitEnabled)` conditional reintroduction.
- **D**: Do not leave the function in place "commented out" — delete the lines.
- **A**: Do not repurpose the `requestCounts` Map for a "request-counting telemetry" feature. Delete the Map.
---
## Phase 6 — Cache viewer.html and /api/instructions at boot
**Outcome**: The sync `readFileSync` on every `GET /` and `GET /api/instructions` request is replaced by an in-memory `Buffer` loaded once at worker boot.
> **Cache lifecycle contract (Preflight edit 2026-04-22 — reconciliation C10)**: The cached `Buffer` lives for the **lifetime of the worker process** — re-read on every worker boot, never refreshed mid-process. This is the contract plan 12's T1 regression test (SHA-256 of `GET /`) assumes when it mandates re-baselining after every worker restart. If the viewer.html content includes a per-boot bearer-token injection (observation 71147), the Buffer captures that token at constructor time and serves it consistently until the next boot. **Do not** add any hot-reload / file-watcher / TTL cache invalidation. If an operator edits `viewer.html` in place, they must restart the worker to see the change — documented tradeoff, not a regression.
### (a) What to implement
**`/` (viewer.html)** — currently at `src/services/worker/http/routes/ViewerRoutes.ts:54-72`:
Refactor `ViewerRoutes` constructor (currently `src/services/worker/http/routes/ViewerRoutes.ts:19-25`) to resolve + read `viewer.html` once and store as a module-level or instance-level `Buffer`:
```ts
// Module scope:
import { existsSync, readFileSync } from 'node:fs';
import path from 'node:path';

private viewerHtml: Buffer;

constructor(...) {
  super();
  const packageRoot = getPackageRoot();
  const candidates = [
    path.join(packageRoot, 'ui', 'viewer.html'),
    path.join(packageRoot, 'plugin', 'ui', 'viewer.html')
  ];
  const found = candidates.find(existsSync);
  if (!found) throw new Error('Viewer UI not found at boot'); // fail-fast: no request-time fallback
  this.viewerHtml = readFileSync(found); // Buffer, held for the worker's lifetime
}

private handleViewerUI = this.wrapHandler((req, res) => {
  res.setHeader('Content-Type', 'text/html');
  res.send(this.viewerHtml);
});
```
Delete `readFileSync` + `existsSync` calls from inside the request handler (lines 63-71 of current file).
**`/api/instructions`** — currently at `src/services/server/Server.ts:202-234`:
The endpoint supports 4 `topic` values × N `operation` values. Option (a): pre-compute the 4 section strings at boot. Option (b): pre-read `SKILL.md` once and read `operations/*.md` lazily (these are rarer).
Recommended: Option (a). At `Server` constructor time, call `loadInstructionContent(undefined, 'all')` once, extract all 4 sections, store as `Record<string, Buffer>`. Store a `Map<string, Buffer>` for `operations/*.md` populated lazily on first hit (or eagerly if the operations directory is small — enumerate at boot).
Preserve path-traversal security: the `operationPath.startsWith(OPERATIONS_BASE_DIR + path.sep)` check at `Server.ts:218` stays. Caching does not bypass validation — the cache key is the already-validated operation name.
Preserve the `ALLOWED_TOPICS` + `ALLOWED_OPERATIONS` allowlist at `Server.ts:207-213`.
Copy-ready pattern: the current `extractInstructionSection` function at `Server.ts:350-359` already partitions content into a `sections` record — that IS the cache structure; just hoist it from per-request to boot.
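That hoist can be sketched as follows — a stand-in only: it assumes `## `-delimited topic headings, whereas the real delimiter logic lives in `extractInstructionSection` (`Server.ts:350-359`) and is not reproduced here:

```typescript
import { Buffer } from 'node:buffer';

// Split instruction markdown into per-topic Buffers once, at construction
// time. The '## ' heading delimiter is an assumption for this sketch.
function buildSectionCache(markdown: string): Record<string, Buffer> {
  const cache: Record<string, Buffer> = {};
  const parts = markdown.split(/^## /m).filter(Boolean);
  for (const part of parts) {
    const newline = part.indexOf('\n');
    if (newline === -1) continue; // heading with no body: nothing to cache
    const topic = part.slice(0, newline).trim().toLowerCase();
    cache[topic] = Buffer.from(part.slice(newline + 1));
  }
  return cache;
}

// At Server construction (worker-lifetime contract — never refreshed):
//   const sections = buildSectionCache(loadInstructionContent(undefined, 'all'));
// Request time:
//   GET /api/instructions?topic=workflow → res.send(sections['workflow'])
```

The per-request handler then only indexes the record — the already-validated topic/operation name is the cache key, so the allowlist and path-traversal checks run unchanged before any lookup.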
### (b) Docs
- §3.9 "Deleted" bullet 3: "Synchronous file read for `/` and `/api/instructions` (replace with cached `Buffer` loaded at boot)" (05-clean-flowcharts.md:415).
- §3.10 flowchart node HTML: "viewer.html (cached at boot)" (05-clean-flowcharts.md:426).
- 06-implementation-plan.md Phase 14 task 2 (line 613): "Cache `viewer.html` and `/api/instructions` content in memory at boot; serve from `Buffer` instead of `fs.readFile`."
- Live file:line: `src/services/worker/http/routes/ViewerRoutes.ts:54-72` (viewer.html); `src/services/server/Server.ts:202-234` (instructions endpoint); `src/services/server/Server.ts:337-345` (loader); `src/services/server/Server.ts:350-359` (section extractor).
### (c) Verification
- Static file reads happen once at boot: add a `logger.info('WORKER', 'viewer.html cached', { bytes: this.viewerHtml.length })` at constructor time; grep logs after 100 `GET /` requests to confirm the message fires exactly once.
- `lsof -p $(pidof node) | grep viewer.html` at steady-state: either zero (Buffer held in memory, no open FD) or exactly one (memory-mapped).
- `grep -n "readFileSync.*viewer.html\|readFileSync.*SKILL.md\|readFileSync.*operations" src/services/worker/ src/services/server/` → **0** matches inside request handlers (module-scope or constructor-scope matches are fine; per-request matches fail).
- Response body unchanged (byte-for-byte) across a request pair before and after the change.
### (d) Anti-pattern guards
- **E**: Do not keep the `readFileSync` path "as a fallback" for when the Buffer is undefined. If the file isn't found at boot, throw — fail-fast aligns with global standard #3. No silent fallback.
- **D**: The viewer-path-candidate array at `ViewerRoutes.ts:58-61` is not a duplicate validation — it's install-layout probing. Keep both candidates for boot-time resolution. After the first successful read, the candidate list is discarded.
- **A**: Do not wrap the Buffer in a `StaticFileCache` class. Hold it as a private field on the route class. One field, one assignment.
---
## Phase 7 — Delete oversized-body special handling
**Outcome**: The 5MB JSON parse limit stays (cheap; bullshit item #40 keep-clause). Any `if (body.size > …) specialHandler()` or hand-rolled 413 logic is deleted — Express's built-in 413 from the `express.json({ limit: '5mb' })` middleware is sufficient.
### (a) What to implement
Survey the route files and `middleware.ts` for body-size special handling:
- `src/services/worker/http/middleware.ts:25` — `express.json({ limit: '5mb' })` — **KEEP**. This is the one-line limit per item #40.
- Any handler that inspects `req.body.length`, `req.headers['content-length']`, or returns a custom 413: **DELETE**.
Based on the grep survey in Phase 0, **no custom oversized-body handling currently exists in `src/services/worker/http/`**. This phase is a verification pass confirming absence. If any is discovered during implementation, delete it without replacement — the `express.json()` middleware already emits 413 with `entity.too.large` on oversized bodies.
If any handler catches the Express 413 and remaps it to a different shape, delete the catch — uniform error handling via `BaseRouteHandler.handleError` (`src/services/worker/http/BaseRouteHandler.ts:82-99`) is already in place.
### (b) Docs
- Bullshit-inventory #40 (05-clean-flowcharts.md:58): "JSON parse 5MB limit on every request — Keep (cheap), but delete any special handling for oversized — 413 is fine."
- Live file:line: `src/services/worker/http/middleware.ts:25` (the `express.json` call to preserve).
### (c) Verification
- `grep -rn "413\|'entity.too.large'\|PayloadTooLarge" src/services/worker/http/` → **0 matches in handler code** (framework-internal uses do not appear in our source).
- `grep -rn "content-length\|contentLength\|Content-Length" src/services/worker/http/routes/` → **0** matches in route handlers (header-inspection by handlers is the anti-pattern to find).
- Sending a 6MB body returns Express default 413. Sending a 4MB body round-trips.
### (d) Anti-pattern guards
- **D**: If a grep hit appears, delete it. Do not "improve" it.
- **A**: Don't add a `RequestSizeGuard` middleware. `express.json({ limit })` already guards.
- **E**: Don't let a handler's try/catch swallow a 413 and remap to 400. The Express error shape for 413 is Express's; uniformity below that boundary is enforced by `handleError`.
---
## Phase 8 — Verification
**Outcome**: Whole §3.9 diagram is reality. All greps clean, route smoke tests pass, deleted-line count matches estimate.
### (a) What to implement
Execute the verification checklist below. This phase does not modify production code; it runs scripts/tests and fixes regressions uncovered.
### (b) Docs
- §3.9 full diagram (05-clean-flowcharts.md:384-410).
- §3.9 "Deleted" block (lines 412-416).
- §3.9 "Kept" block (line 418): "All user-facing routes, SSE, middleware chain, admin endpoints (used by tooling)." — the admin endpoints (`/api/admin/restart`, `/api/admin/shutdown`, `/api/admin/doctor` at `src/services/server/Server.ts:237-330`) are explicitly preserved; item #38 (05-clean-flowcharts.md:56).
- 06-implementation-plan.md Phase 15 (line 631-656): timer census + grep pass + full test suite.
### (c) Verification checklist
- [ ] **Rate limiter gone**: `grep -rn "RATE_LIMIT_WINDOW_MS\|RATE_LIMIT_MAX_REQUESTS\|requestCounts\|rateLimiter" src/` → **0**.
- [ ] **Zod present**: `grep -rn "^import .* from 'zod'" src/services/worker/http/` → **≥8** matches (middleware + 7 route files with POSTs).
- [ ] **validateBody attached**: `grep -rn "validateBody(" src/services/worker/http/routes/` → **~18** matches (one per schemaed POST/PUT).
- [ ] **validateRequired deleted**: `grep -rn "validateRequired" src/` → **0**.
- [ ] **Static-file reads hoisted**: `grep -rn "readFileSync.*viewer.html" src/services/worker/` → 0 matches inside request handlers; OK in constructor/module-scope.
- [ ] **SSE preserved**: `GET /stream` returns `text/event-stream` with initial `initial_load` event (manual smoke test).
- [ ] **Admin preserved**: `POST /api/admin/doctor` from localhost returns JSON; from non-localhost returns 403 (per `requireLocalhost` at `src/services/worker/http/middleware.ts:121-143`). Used by version-bump per item #38.
- [ ] **Route smoke tests per endpoint (curl or integration suite)**:
- `GET /` → 200 HTML (from cached Buffer).
- `GET /health` → 200 JSON `{status:'ok', activeSessions:N}`.
- `GET /stream` → 200 SSE stream.
- `POST /api/memory/save` with `{text:""}` → 400 `{error:'validation_failed', fields:...}`.
- `POST /api/memory/save` with `{text:"hi"}` → 200 `{success:true, id:...}`.
- `POST /api/corpus` with `{name:"t", query:"hooks"}` → 200 metadata.
- `POST /api/corpus` with `{}` → 400 validation_failed with `fields.fieldErrors.name`.
- `POST /api/mcp/toggle` with `{enabled:"yes"}` → 400; `{enabled:true}` → 200.
- `POST /api/branch/switch` with `{branch:"nonexistent"}` → 400; `{branch:"main"}` → 200.
- `GET /api/instructions?topic=workflow` → 200 JSON content (served from cache).
- `POST /api/admin/restart` from localhost → 200 `{status:'restarting'}`.
- [ ] **Build green**: `npm run build` succeeds.
- [ ] **Worker boots**: `npm run build-and-sync` and verify `GET /health` answers within 2s.
- [ ] **Deleted-lines tally**: approximately **35 LOC** (rate limiter, Phase 5) + **~125 LOC** (hand-rolled validation + helpers, Phase 4) + **~9 LOC** (`BaseRouteHandler.validateRequired` method, Phase 4) + **~10 LOC** (per-request `readFileSync`/`existsSync` probes moved to constructor, Phase 6) ≈ **~180 LOC gross deleted**, offset by **~60 LOC added** (new `validateBody` + ~24 schemas averaging 2-3 lines each) = **~120 LOC net deletion**.
### (d) Anti-pattern guards
- **D** (whole plan): if any verification grep finds unexpected matches, do not "fix forward" — delete the offending code.
- **E**: If a route smoke test fails due to schema over-constraint (e.g., an optional field rejected), **relax the schema, do not re-add a hand-rolled fallback.**
- **A**: Do not add integration tests that fake the Zod surface. Use the installed zod.
---
## Reporting summary
**Phase count**: 8.
**Estimated deletion**: ~180 LOC gross, ~60 LOC added, **~120 LOC net**. Primary deletes: rate limiter (35), hand-rolled validation blocks (125), `validateRequired` helper (9), per-request file-read probing (10). Primary additions: `validateBody.ts` (~40), Zod schemas inline (~60 across 7 files).
**Sources consulted**:
- `PATHFINDER-2026-04-21/05-clean-flowcharts.md` (full); §3.9 (lines 382-420) canonical; Part 1 items #37-40 (lines 55-58); Part 2 decisions (lines 65-79).
- `PATHFINDER-2026-04-21/06-implementation-plan.md`: V2 (line 29), V9 (line 36), V20 (line 47); allowed-APIs block (lines 49-55); anti-patterns (line 59); Phase 12 (lines 530-565); Phase 14 (lines 600-627); Phase 15 (lines 631-656).
- `PATHFINDER-2026-04-21/01-flowcharts/http-server-routes.md` (before state).
- Live codebase (9 files): `src/services/worker/http/middleware.ts`, `src/services/worker/http/BaseRouteHandler.ts`, `src/services/worker/http/routes/{ViewerRoutes,SearchRoutes,SessionRoutes,DataRoutes,SettingsRoutes,MemoryRoutes,CorpusRoutes,LogsRoutes}.ts`, `src/services/server/Server.ts`.
- `package.json` (dependencies block lines 111-125) + `npm ls zod` + filesystem probe of `node_modules/zod`.
**Concrete findings**:
- **Zod presence check** (2026-04-22 10:18 PDT): `npm ls zod` returns `(empty)`. `node_modules/zod/package.json` does not exist. Transitively it is NOT shipped — the only zod-adjacent package is `zod-to-json-schema@^3.24.6` at `package.json:124`, which does not pull in `zod` itself. **Phase 1 MUST add `zod` via `npm install zod@^3.x`.** Verified findings block at `06-implementation-plan.md:55` should be updated: "already shipped transitively via `@anthropic-ai/sdk`" is false for this repo (the SDK is `@anthropic-ai/claude-agent-sdk`, not `@anthropic-ai/sdk`).
- **Route-file inventory with validation styles** (8 files, `src/services/worker/http/routes/`):
- `ViewerRoutes.ts` (116 LOC): GET-only, no body schemas needed.
- `SearchRoutes.ts` (421 LOC): 1 POST (`/api/context/semantic` at line 41), mostly query-param validation.
- `SessionRoutes.ts` (958 LOC): 10 POST endpoints per V9 (6 legacy `/sessions/:id/*` at lines 377-382 + 4 under `/api/sessions/*` at lines 385-389, plus `/api/sessions/status` GET). Uses `this.validateRequired` (line 833) and inline `if (!contentSessionId)` checks (lines 570, 674, 726, 769). Post-Plan 09 collapses to 4.
- `DataRoutes.ts` (562 LOC): 5 POST endpoints. Uses `this.badRequest` + inline `typeof` checks (lines 120-123, 149-163, 203-206). Contains ad-hoc coerce logic (JSON.parse-or-split-by-comma) at lines 145-147, 199-201 — Zod `z.preprocess` subsumes this.
- `SettingsRoutes.ts` (434 LOC): 5 POST endpoints. Has a 148-line **domain-validation** function `validateSettings` (lines 237-385) — **preserve**; the shape-validation is inline at lines 161-164, 185-197 — **delete**.
- `MemoryRoutes.ts` (93 LOC): 1 POST. Validation block at lines 33-36. Cleanest single-endpoint pattern in the codebase — **copy-ready template for Phase 3**.
- `CorpusRoutes.ts` (283 LOC): 5 POST endpoints. Validation at lines 44-51, 238-245 plus two coerce helpers at lines 88-125 (~38 LOC of helper boilerplate deletable).
- `LogsRoutes.ts` (165 LOC): 1 POST (`/api/logs/clear` at line 102). Minimal body.
- **Static file endpoints**:
- `GET /` serves `viewer.html` — `ViewerRoutes.ts:54-72` does per-request `readFileSync` over 2 candidate paths. Move to constructor.
- `GET /api/instructions` — `Server.ts:202-234` does per-request `fs.promises.readFile` via `loadInstructionContent` (line 337). 4 topic sections (extractable at boot) + operation files (lazy-cache OK). Allowlist at `Server.ts:207-213` (`ALLOWED_TOPICS`, `ALLOWED_OPERATIONS`) stays; path-traversal check at line 218 stays.
- Static assets (`js`, `css`, fonts) served via `express.static(uiDir)` at `middleware.ts:110-112` — **already cached by Express; no change**.
- **Copy-ready snippet locations**:
- Cleanest single-field validation example to replicate: `CorpusRoutes.ts:238-244` (the `question` check for `QueryCorpus`) — this exact shape replaces one-to-one with a `QueryCorpusSchema = z.object({ question: z.string().min(1) })`.
- Cleanest presence check to Zod-ify: `MemoryRoutes.ts:33-36` (the `text` check) — maps to `MemorySaveSchema = z.object({ text: z.string().min(1), title: z.string().optional(), project: z.string().optional() })`.
- Error-shape template to mirror in `validateBody`: `BaseRouteHandler.ts:82-99` (existing `{error, code, details}` shape) — extend with `fields`.
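The JSON.parse-or-split-by-comma coercion flagged in the `DataRoutes.ts` row above has roughly this shape — an illustrative reconstruction, not a copy of the code at `DataRoutes.ts:145-147`/`199-201` — with the `z.preprocess` replacement noted alongside:

```typescript
// Reconstruction of the ad-hoc coerce logic the plan deletes: accept a real
// array, a JSON-encoded array, or a comma-separated string; else empty.
function coerceStringArray(input: unknown): string[] {
  if (Array.isArray(input)) return input.map(String);
  if (typeof input === 'string') {
    try {
      const parsed = JSON.parse(input);
      if (Array.isArray(parsed)) return parsed.map(String);
    } catch { /* not JSON — fall through to comma split */ }
    return input.split(',').map(s => s.trim()).filter(Boolean);
  }
  return [];
}

// The Zod replacement moves the same coercion into the schema itself, e.g.:
//   z.preprocess(coerceStringArray, z.array(z.string()).nonempty())
// so the handler body never sees an uncoerced value.
```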
**Confidence + gaps**:
- **High confidence**: rate-limiter deletion (V20 verified exact lines), static-file caching (exact file:line confirmed), validation-block locations (grep returned matching line numbers), BaseRouteHandler method cleanup.
- **Gap 1 — Plan 09 landing order**: This plan assumes the §3.1 4-endpoint SessionRoutes surface is the target. If Plan 09 has not landed when this plan begins Phase 3, the plan must attach schemas to the 10 legacy endpoints (`src/services/worker/http/routes/SessionRoutes.ts:377-389`) and then refactor in lockstep when Plan 09 merges. Coordination required — add `[blocked-on: plan-09]` gate on the Phase 3 PR, or land Plan 09 first.
- **Gap 2 — Zod version lock-in for the whole refactor**: Phase 1 picks the zod 3.x version; if a future phase in another plan wants a zod 4.x-only API, this plan's schemas become incompatible. Mitigation: schemas use only the stable `z.object/string/number/array/enum/boolean/optional/min/int/positive` surface, which is unchanged between 3.x majors and 4.x. Still, a breaking upgrade must be coordinated here.
# Plan 12 — viewer-ui-layer (LOCKDOWN / REGRESSION-DETECTION)
**Target flowchart:** `PATHFINDER-2026-04-21/05-clean-flowcharts.md` section 3.10 ("viewer-ui-layer (clean)")
**Before-state flowchart:** `PATHFINDER-2026-04-21/01-flowcharts/viewer-ui-layer.md`
**Canonical doctrine from 05 §3.10:** *"Deleted: (Nothing — this subsystem is clean.)"* / *"Kept: Everything. User-facing."*
## Plan Type
**LOCKDOWN / REGRESSION-DETECTION.** This is NOT a refactor plan. Section 3.10 declares the viewer subsystem already aligned with the clean architecture. The deliverable is a protective harness that detects regressions introduced by the **other 11 plans** landing.
No source code in `src/ui/viewer/**` is modified by this plan. The only artifacts produced are regression tests, baselines, and a re-run schedule.
**Expected lines deleted by this plan:** 0
**Expected lines added to `src/`:** 0 (tests live under `tests/viewer-lockdown/`)
## Dependencies
- **Upstream:** none — no other plan produces code this plan consumes.
- **Downstream:** none — no other plan consumes code this plan produces.
- **Cross-reference dependencies (tests-run-after):**
- Plan 11 (`http-server-routes`, flowchart §3.9) — **CRITICAL.** Phase 14 of `06-implementation-plan.md:600-627` caches `viewer.html` at boot. The lockdown suite MUST run after plan 11 to confirm the cached Buffer serve still produces a byte-identical HTML response and that `express.static(path.join(packageRoot, 'ui'))` (`ViewerRoutes.ts:30`) still serves JS/CSS assets.
- Plan 09 (`lifecycle-hooks`) — only indirectly relevant; hooks don't talk to the viewer, but SSE broadcast events originate from write paths the hooks trigger. Re-run the `new_observation` live-update test after plan 09 lands.
- All remaining 9 plans — run the suite as a smoke check.
- **Implementation-plan cross-ref:** no V-finding targets the viewer subsystem directly in `06-implementation-plan.md`. V20 (rate-limiter deletion, Phase 14) and the "cache `viewer.html`" task in Phase 14 (task 2) are the only lines that touch the viewer's serve path. **No V-number in `06-implementation-plan.md` is assigned to viewer-ui behavior. State recorded here for audit completeness.**
## Sources Consulted
- `PATHFINDER-2026-04-21/05-clean-flowcharts.md:422-447` (section 3.10, canonical)
- `PATHFINDER-2026-04-21/05-clean-flowcharts.md:564-587` (Part 5 deletion totals — viewer contributes 0)
- `PATHFINDER-2026-04-21/01-flowcharts/viewer-ui-layer.md:1-95` (before-state, identical to after-state)
- `PATHFINDER-2026-04-21/06-implementation-plan.md:600-627` (Phase 14 — static-file cache task)
- `src/ui/viewer/App.tsx:1-163`
- `src/ui/viewer/index.tsx:1-17`
- `src/ui/viewer/hooks/useSSE.ts:1-148`
- `src/ui/viewer/hooks/usePagination.ts:1-119`
- `src/ui/viewer/hooks/useSettings.ts:1-100`
- `src/ui/viewer/components/Feed.tsx:1-100`
- `src/ui/viewer/constants/api.ts:5-12`
- `src/ui/viewer/constants/timing.ts:7` (`SSE_RECONNECT_DELAY_MS: 3000`)
- `src/services/worker/http/routes/ViewerRoutes.ts:1-116`
- `src/services/worker/http/routes/DataRoutes.ts:38-45` (`/api/observations` endpoints)
- `src/services/worker/http/routes/SettingsRoutes.ts:30-31` (`/api/settings` endpoints)
## Concrete Findings (React Component + Hook Inventory)
### React Components (all in `src/ui/viewer/components/`)
- `ErrorBoundary.tsx` — root wrapper, mounted via `index.tsx:13-15`.
- `Header.tsx` — project/source filters, SSE connection light, theme toggle.
- `Feed.tsx:18` — interleaved card list; IntersectionObserver at `Feed.tsx:33-41` with `threshold: UI.LOAD_MORE_THRESHOLD`.
- `ObservationCard.tsx` / `SummaryCard.tsx` / `PromptCard.tsx` — rendered in `Feed.tsx:69-75`.
- `ContextSettingsModal.tsx` — POST `/api/settings` via `useSettings.saveSettings`.
- `LogsDrawer` (from `LogsModal.tsx`) — console capture drawer.
- `ScrollToTop.tsx` — inside `Feed.tsx:65`.
- `TerminalPreview.tsx`, `ThemeToggle.tsx`, `GitHubStarsButton.tsx` — supplemental.
### Hooks (all in `src/ui/viewer/hooks/`)
- `useSSE.ts:6` — **SSE subscription owner.** Returns `{observations, summaries, prompts, projects, sources, projectsBySource, isProcessing, queueDepth, isConnected}`. EventSource at `useSSE.ts:50`; auto-reconnect at `useSSE.ts:61-71` after `TIMING.SSE_RECONNECT_DELAY_MS`.
- `usePagination.ts:108` — exposes `{observations, summaries, prompts}`, each with `{isLoading, hasMore, loadMore}`. Resets offset on filter change (`usePagination.ts:36-46`).
- `useSettings.ts:8` — GET/POST `/api/settings`.
- `useTheme.ts`, `useStats.ts`, `useContextPreview.ts`, `useGitHubStars.ts`, `useSpinningFavicon.ts` — ancillary.
### SSE Event Types the Viewer Subscribes To
From `useSSE.ts:76-120` switch:
- `initial_load` — catalog payload `{projects, sources, projectsBySource}`.
- `new_observation` — appends to `observations` state (prepend).
- `new_summary` — appends to `summaries` state (prepend).
- `new_prompt` — appends to `prompts` state (prepend).
- `processing_status` — updates `isProcessing` + `queueDepth`.
### The Dedup Invariant (05 §3.10 line 444)
Live SSE data (`useSSE().observations`) and paginated history (`App.paginatedObservations`) are merged with `(project, id)` dedup in `App.tsx:50-66` via `mergeAndDeduplicateByProject`. Section 3.10 line 444 explicitly protects this: *"which is a correct pattern for live + historical merging."* **Anti-pattern guard E:** do NOT collapse the two paginated fetches into one. The duplication is legitimate.
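A structural stand-in for the protected merge — the real `mergeAndDeduplicateByProject` at `App.tsx:50-66` may differ in detail; the `Card` shape and the function name `mergeAndDedup` here are assumptions for illustration:

```typescript
interface Card { project: string; id: number; }

// Live SSE items come first (newest-first); paginated history follows.
// A (project, id) pair already seen in the live array suppresses its
// historical duplicate — the two input arrays stay distinct (guard E).
function mergeAndDedup<T extends Card>(live: T[], paginated: T[]): T[] {
  const seen = new Set<string>();
  const out: T[] = [];
  for (const item of [...live, ...paginated]) {
    const key = `${item.project}:${item.id}`;
    if (seen.has(key)) continue;
    seen.add(key);
    out.push(item);
  }
  return out;
}
```

The point of the invariant is that both source arrays survive as inputs; only the rendered output is deduplicated.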
## Phase Contract
Every phase below follows this structure:
- **(a) What to implement** — the regression artifact or action.
- **(b) Docs** — 05 §3.10 + live file:line anchors.
- **(c) Verification** — exact executable checks.
- **(d) Anti-pattern guards** — A (invent new UI behaviors) + E (collapse legitimate dedup).
---
## Phase 1 — Inventory viewer behaviors
**(a) What to implement**
Produce a single source-of-truth inventory document at `tests/viewer-lockdown/INVENTORY.md` enumerating:
1. All 7 component files under `src/ui/viewer/components/` with file:line anchors for their main exports.
2. All 9 hook files under `src/ui/viewer/hooks/` with exported function signatures.
3. Every SSE event type the viewer subscribes to (5 types, from `useSSE.ts:76-120`).
4. Every HTTP endpoint the viewer calls (`/stream`, `/api/observations`, `/api/summaries`, `/api/prompts`, `/api/settings`, `/api/stats`).
5. Timing constants currently in effect: `SSE_RECONNECT_DELAY_MS=3000` (`constants/timing.ts:7`), `UI.PAGINATION_PAGE_SIZE`, `UI.LOAD_MORE_THRESHOLD` (`constants/ui.ts`).
**(b) Docs**
- 05 §3.10 (mermaid diagram at `05-clean-flowcharts.md:424-441`)
- `01-flowcharts/viewer-ui-layer.md:18-27` (component tree) + `:30` (happy path)
**(c) Verification**
- `grep -c "^" tests/viewer-lockdown/INVENTORY.md` ≥ 60 lines.
- Every file:line reference in the inventory resolves under `git ls-files`.
- All 5 SSE event types from `useSSE.ts:76-120` appear verbatim in the inventory.
**(d) Anti-pattern guards**
- **A:** Do not invent behaviors. Inventory strictly what exists in HEAD.
- **E:** List the dedup call site (`App.tsx:50-66`) as a "protected pattern — do not collapse".
---
## Phase 2 — Define invariants (one per behavior from 05 §3.10)
**(a) What to implement**
Write `tests/viewer-lockdown/INVARIANTS.md` with one numbered invariant per flowchart node/edge in 05 §3.10:
- **I1 (serve):** `GET /` returns HTML that is byte-identical to the baseline OR differs only by bearer-token substitution. Anchor: `ViewerRoutes.ts:54-72`.
- **I2 (mount):** `index.tsx:11-15` mounts `<ErrorBoundary><App/></ErrorBoundary>` into `#root`. No other mount paths.
- **I3 (SSE open):** `useSSE.ts:50` opens `new EventSource(API_ENDPOINTS.STREAM)` where `STREAM === '/stream'` (`constants/api.ts:12`).
- **I4 (initial_load):** On the first `initial_load` event, `catalog.projects`, `catalog.sources`, `catalog.projectsBySource` populate (`useSSE.ts:77-87`).
- **I5 (live appends):** `new_observation` / `new_summary` / `new_prompt` prepend to their arrays (`useSSE.ts:89-111`). Order: newest first.
- **I6 (processing_status):** Updates `isProcessing` + `queueDepth` (`useSSE.ts:113-119`).
- **I7 (pagination):** `Feed.tsx:33-41` IntersectionObserver fires `onLoadMoreRef.current()` → `App.handleLoadMore` (`App.tsx:79-99`) → three parallel `/api/{observations,summaries,prompts}` fetches with `offset` + `limit` query params.
- **I8 (dedup):** `App.tsx:50-66` merges live + paginated with `mergeAndDeduplicateByProject` keyed on `(project, id)`. **Two distinct arrays MUST remain.** (Anti-pattern guard E.)
- **I9 (filter reset):** Changing `currentFilter` or `currentSource` resets `paginatedObservations/Summaries/Prompts` to `[]` and re-fetches page 0 (`App.tsx:102-108`, `usePagination.ts:36-46`).
- **I10 (settings round-trip):** `ContextSettingsModal` save → `useSettings.saveSettings``POST /api/settings``{success: true}` response path sets `saveStatus='✓ Saved'` (`useSettings.ts:65-96`).
- **I11 (reconnect):** EventSource `onerror` closes and calls `connect()` after `TIMING.SSE_RECONNECT_DELAY_MS` (3000 ms) (`useSSE.ts:61-71`).
- **I12 (static assets):** `express.static(path.join(packageRoot, 'ui'))` (`ViewerRoutes.ts:30`) serves bundled JS/CSS. Must still 200 after plan 11 lands its cache change.
**(b) Docs**
- Each invariant cites file:line as shown above.
- Cross-ref 05 §3.10 mermaid nodes one-to-one: HTTP→I1, HTML→I1/I12, React→I2, SSE→I3, Initial→I4, Feed→I7, Page→I7, Merge→I8, Cards→I5, Settings→I10, Reconnect→I11.
**(c) Verification**
- Every mermaid node in `05-clean-flowcharts.md:426-440` maps to ≥1 invariant in `INVARIANTS.md`.
- Every invariant cites at least one live `file.ts:NN` anchor that resolves at HEAD.
**(d) Anti-pattern guards**
- **A:** Each invariant must be phrased as "X currently happens", not "X should happen". This is a lockdown, not a wish list.
- **E:** I8 is the anti-collapse invariant — explicitly forbid "flattening paginated + live into a single array".
---
## Phase 3 — Write regression tests (one per invariant)
**(a) What to implement**
Create the test harness `tests/viewer-lockdown/` with these files. Prefer Playwright (headless Chromium) since EventSource + IntersectionObserver require a real browser. If Playwright is not already a dev dep, author a **manual checklist** instead — do not introduce a new test framework.
1. `tests/viewer-lockdown/regression.spec.ts` (Playwright) OR `tests/viewer-lockdown/CHECKLIST.md` (manual):
- **T1 → I1:** `curl -s http://localhost:37777/` returns 200 + `Content-Type: text/html`. Diff against `baseline/viewer.html.sha256`.
- **T2 → I2:** Page loads, `document.querySelector('#root').children.length > 0` within 2 s.
- **T3 → I3+I4:** Open `/stream` via EventSource, receive `initial_load` within 2 s; payload has `projects`, `sources`, `projectsBySource`.
- **T4 → I5:** Insert a synthetic observation via `POST /api/sessions/:id/observations`; assert a card appears in the feed within 2 s without a page refresh.
- **T5 → I7:** Scroll the feed past the IntersectionObserver sentinel; assert network panel shows `GET /api/observations?offset=20&limit=20` (or matching `UI.PAGINATION_PAGE_SIZE`).
- **T6 → I8:** Inject a duplicate `(project, id)` pair via SSE and paginated response; assert exactly one card rendered.
- **T7 → I9:** Change project filter; assert `paginatedObservations` cleared (check via `Feed` DOM length before/after) and a fresh page-0 request fires.
- **T8 → I10:** Open `ContextSettingsModal`, change `CLAUDE_MEM_CONTEXT_OBSERVATIONS`, click save; assert `POST /api/settings` → 200 → `saveStatus` text contains `✓ Saved`.
- **T9 → I11:** Kill the worker SSE connection (e.g. `curl -X POST /__test__/drop-sse-clients` if available, else restart worker); assert EventSource reconnects within 4 s (3 s delay + 1 s slack).
- **T10 → I12:** `curl -sI http://localhost:37777/viewer.js` (or whatever the bundled asset is named) returns 200.
- **T11 → I6:** Trigger worker processing; assert `queueDepth` in DOM increments.
2. `tests/viewer-lockdown/run.sh` — wrapper that spins up the worker on a test port, seeds fixtures, runs the spec, and tears down.
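T1's hash-after-token-strip step might look like the following sketch. The `window.__BEARER_TOKEN__` pattern is an assumption — the regex must be matched to the actual per-boot injection format before the baseline is captured:

```typescript
import { createHash } from "node:crypto";

// T1 helper sketch: hash the viewer HTML after blanking the per-boot
// bearer token, so the SHA-256 baseline survives worker restarts.
// The token-injection pattern below is hypothetical.
function stripBearerToken(html: string): string {
  return html.replace(
    /window\.__BEARER_TOKEN__\s*=\s*"[^"]*"/g,
    'window.__BEARER_TOKEN__ = "<stripped>"',
  );
}

function sha256Hex(body: string): string {
  return createHash("sha256").update(body, "utf8").digest("hex");
}
```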
**(b) Docs**
- Each T-number maps to an I-number in a table at the top of `regression.spec.ts` / `CHECKLIST.md`.
**(c) Verification**
- Running the suite against a clean HEAD worker (before any of plans 1–11 land) produces 11/11 PASS. This is the baseline.
- Every test has a deterministic pass/fail criterion. No "looks right" assertions.
**(d) Anti-pattern guards**
- **A:** Do not add tests for behaviors not listed in 05 §3.10 mermaid (e.g. do not test Header theme-toggle colors — out of scope).
- **E:** T6 is the explicit anti-collapse test.
---
## Phase 4 — Baseline current outputs
**(a) What to implement**
Capture pre-refactor baselines under `tests/viewer-lockdown/baseline/`:
1. `baseline/viewer.html.sha256` — SHA-256 of `GET /` response body with bearer token stripped (token is injected per-boot per `Apr 19 2026 observation 71147`).
2. `baseline/initial_load.json` — full `initial_load` SSE event payload captured against a seeded DB.
3. `baseline/api-observations-page0.json` — response of `GET /api/observations?offset=0&limit=20` on the same seeded DB.
4. `baseline/api-settings.json` — response of `GET /api/settings`.
5. `baseline/screenshots/` — 3 Playwright screenshots: initial feed render, modal open, filter applied. These are visual-regression anchors only; do NOT gate CI on pixel diffs.
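The "identical JSON payloads (modulo timestamps stripped)" comparison needs a small normalizer. A minimal sketch follows; the key list is an assumption and must be extended to match the real payload shapes:

```typescript
// Sketch: recursively blank timestamp-ish keys before diffing two
// baseline JSON payloads. Key names are illustrative assumptions.
const TIMESTAMP_KEYS = new Set(["created_at", "updated_at", "timestamp", "epoch_ms"]);

function stripTimestamps(value: unknown): unknown {
  if (Array.isArray(value)) return value.map(stripTimestamps);
  if (value && typeof value === "object") {
    const out: Record<string, unknown> = {};
    for (const [k, v] of Object.entries(value)) {
      out[k] = TIMESTAMP_KEYS.has(k) ? null : stripTimestamps(v);
    }
    return out;
  }
  return value;
}
```

Two captures are then "identical" iff `JSON.stringify(stripTimestamps(a)) === JSON.stringify(stripTimestamps(b))`.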
**(b) Docs**
- `baseline/README.md` records git SHA, worker version, node version, OS at capture time.
**(c) Verification**
- Running the suite twice against HEAD produces identical SHA-256s and identical JSON payloads (modulo timestamps stripped).
**(d) Anti-pattern guards**
- **A:** Baselines represent observed HEAD behavior, not design wishes.
- **E:** n/a.
---
## Phase 5 — Post-landing re-run schedule
**(a) What to implement**
A schedule table in `tests/viewer-lockdown/SCHEDULE.md` mandating suite re-run after each of the **other 11 plans** lands. Critical re-run points:
| Upstream plan | Trigger | Critical tests |
|---|---|---|
| Plan 01 (privacy-tag-filtering) | new tag stripping at ingest | T4 (observation renders with stripped tags visible in card) |
| Plan 02 (sqlite-persistence) | schema migration | T3 (`initial_load` catalog non-empty after migration) |
| Plan 03 (response-parsing-storage) | ResponseProcessor changes | T4, T11 |
| Plan 04 (vector-search-sync) | `chroma_synced` column added | T5 (pagination response shape unchanged) |
| Plan 05 (context-injection-engine) | — | smoke only |
| Plan 06 (hybrid-search-orchestration) | — | smoke only |
| Plan 07 (session-lifecycle-management) | reaper consolidation | T3, T11 |
| Plan 08 (knowledge-corpus-builder) | — | smoke only |
| Plan 09 (lifecycle-hooks) | hook cache / `ensureWorkerRunning` changes | T4 (hook-triggered observation still broadcasts via SSE) |
| **Plan 11 (http-server-routes)** | **Phase 14 static-file cache + rate-limiter delete** (`06-implementation-plan.md:600-627`) | **ALL 11 tests** — critical. |
| Plan 12 (transcript-watcher-integration) | watcher rewires to direct-call | T4 (Cursor-sourced observation still appears via SSE) |
**(b) Docs**
- Schedule references 05 §3.10 as the unchanging contract.
- Mention CI hook location: if a CI workflow runs the test suite, gate merges of plans 1–11 on the lockdown suite passing green.
**(c) Verification**
- Schedule covers every plan in `06-implementation-plan.md` Phases 1–14 that is not this one.
- Plan 11 row explicitly lists all 11 tests (T1–T11) as critical.
**(d) Anti-pattern guards**
- **A:** Do not skip the re-run for "unrelated" plans; smoke-run is still mandatory.
- **E:** n/a.
---
## Phase 6 — Escalation path
**(a) What to implement**
Write `tests/viewer-lockdown/ESCALATION.md` documenting:
1. **If the lockdown suite goes red after plan N lands:** open a new plan `07-plans/13-viewer-regression-{short-name}.md` describing:
- Which test failed (T-number).
- Which invariant was violated (I-number).
- Which upstream plan's change triggered the regression.
- A fix proposal.
2. **Do NOT** fix regressions inline inside plan N's branch. Regressions get their own branch, their own PR, and their own review. This preserves audit traceability.
3. **Special case — Plan 11 static-file cache:** if T1 SHA-256 mismatches after plan 11 lands, the likely cause is that `ViewerRoutes.handleViewerUI` (`ViewerRoutes.ts:54-72`) now serves a cached Buffer with a different bearer-token-injection strategy. Document whether (a) the baseline should be regenerated (bearer-token format changed) or (b) the cache implementation needs to match the pre-cache injection point. This is the single highest-risk interaction in the entire refactor.
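To make the per-boot-vs-persistent distinction concrete, here is a sketch of the per-boot cache shape: the bearer token is baked in on first use and the same Buffer is reused for the lifetime of the process. The placeholder string and function names are hypothetical, not plan 11's actual implementation:

```typescript
// Hypothetical per-boot viewer.html cache: token injected once, Buffer
// reused until the process exits. If this cache ever outlived a worker
// restart, the baked-in token would desync from the new boot token --
// exactly the T1 failure mode described above.
function injectBootToken(rawHtml: string, bootToken: string): string {
  return rawHtml.replace("__TOKEN_PLACEHOLDER__", bootToken);
}

let cachedViewerHtml: Buffer | null = null;

function getViewerHtml(rawHtml: string, bootToken: string): Buffer {
  if (cachedViewerHtml === null) {
    cachedViewerHtml = Buffer.from(injectBootToken(rawHtml, bootToken), "utf8");
  }
  return cachedViewerHtml; // later calls ignore their arguments: per-boot bake-in
}
```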
**(b) Docs**
- Reference `06-implementation-plan.md:600-627` Phase 14 task 2.
- Reference `01-flowcharts/viewer-ui-layer.md:80` (reconnect timing constant) for I11 reconnect regressions.
**(c) Verification**
- Escalation doc exists.
- Template for `13-viewer-regression-*.md` is included.
**(d) Anti-pattern guards**
- **A:** Escalation doc does not prescribe fixes — only detection + routing.
- **E:** n/a.
---
## Copy-ready snippet locations
**None.** This is a lockdown plan; no code snippets are authored.
Regression-test files to be created (all under `tests/viewer-lockdown/`):
- `INVENTORY.md`
- `INVARIANTS.md`
- `regression.spec.ts` (or `CHECKLIST.md` if Playwright is unavailable)
- `run.sh`
- `baseline/viewer.html.sha256`
- `baseline/initial_load.json`
- `baseline/api-observations-page0.json`
- `baseline/api-settings.json`
- `baseline/screenshots/` (3 PNGs)
- `baseline/README.md`
- `SCHEDULE.md`
- `ESCALATION.md`
## Confidence + Gaps
**High confidence:**
- React component tree (confirmed in `App.tsx:1-163`).
- SSE event type list (confirmed in `useSSE.ts:76-120`).
- Hook inventory (confirmed via `src/ui/viewer/hooks/*` glob).
- Dedup pattern anchor (`App.tsx:50-66`, `utils/data.ts` → `mergeAndDeduplicateByProject`).
- Flowchart-to-live-code mapping for I1–I12.
**Medium / gaps:**
1. **Gap — Plan 11 cache + bearer-token interaction.** Phase 14 task 2 in `06-implementation-plan.md:613` says "Cache `viewer.html` … in memory at boot; serve from `Buffer` instead of `fs.readFile`." But observation 71147 (Apr 19 2026) says the bearer token is injected into the viewer HTML as a per-boot window global. If the cache is a static immutable Buffer captured at worker-start, the bearer token will be baked in once per worker boot — fine. If plan 11 changes that to share a Buffer across worker restarts (e.g. via a persistent cache file), the token would desync. **T1 SHA-256 baseline must be regenerated after every worker restart** — document this in `baseline/README.md`. Confirm with plan 11 author whether caching happens at process-boot or at module-import (which could be once per container lifetime).
2. **Gap — Playwright availability.** If `package.json` does not already list Playwright as a dev dependency, adding it to satisfy this lockdown plan would violate the "no code changes" constraint. Fallback: author a manual `CHECKLIST.md` instead of the spec file. Decision deferred to execution time. Check: `grep -q playwright package.json` before choosing automation-vs-manual path.
3. **Low-priority gap — catalog update strategy.** `01-flowcharts/viewer-ui-layer.md:93` lists this as Medium confidence ("additive only"). If a plan introduces project deletion, `updateCatalogForItem` (`useSSE.ts:21-42`) is additive-only and will show stale entries. Not in scope for this lockdown but worth adding I13 if any upstream plan touches catalog eviction.
## Summary
- **Phase count:** 6
- **Expected lines deleted:** 0
- **Expected lines added to `src/`:** 0 (tests go under `tests/viewer-lockdown/`, outside the protected subsystem)
- **Top gaps:**
1. Plan 11's static-file cache change may reshape how bearer tokens are injected into `viewer.html` — T1 SHA-256 baseline needs re-capture after worker boots, and the cache lifecycle (per-boot vs. persistent) must be confirmed with plan 11 before T1 is considered reliable.
2. Playwright may not be a project dev dependency; fall back to a manual `CHECKLIST.md` if adding it is out-of-scope for a lockdown plan (which it is).
@@ -0,0 +1,244 @@
# Pathfinder Phase 8: Reconciliation
**Date**: 2026-04-22
**Inputs**: 12 per-flowchart plans in `PATHFINDER-2026-04-21/07-plans/`
**Authority**: Master plan `07-master-plan.md` defines the five reconciliation checks executed here. Plans supersede audit claims where they verified against live code.
---
## Status gate
- **5 hard blockers** must be resolved before `/do` runs. All are single-file, single-command fixes or out-of-band decisions — none requires re-planning.
- **11 coordination items** are resolved by landing plans in dependency order (the ladder in `07-master-plan.md`). No deadlocks detected.
- **15 info items** logged; none blocks execution.
- **2 ownership conflicts** detected (plans 07/09 on `/api/session/end`, plans 09/11 on `/api/context/semantic` schema). Both resolved by landing order — no code-level conflict.
- **Deletion-ledger aggregate: ~4,000 net source LoC**, 56% higher than the audit's 2,560 target. The overage is **genuine**, not double-counting: plan 06's live-code audit of `SearchManager.ts` (2,069 lines → <400) and plan 05's inclusion of `{Header,Timeline,Summary,Footer}Renderer.ts` files both exceeded the audit's row-level estimates.
**Recommended action**: resolve the 5 blockers (below), then run `/do` against the plans in the dependency order from `07-master-plan.md` § "Execution ladder". Reconciliation re-runs after each tier lands.
---
## Part 1 — Cross-plan citation index (overlap hotspots)
Only citations referenced by two or more plans are catalogued. Every overlap was verified consistent (same file, same or overlapping line range, same referenced symbol). No stale/divergent citations detected.
### Hotspot files (cited by 3+ plans)
| File | Cited by | Regions cited | Consistency |
|---|---|---|---|
| `src/services/worker/http/routes/SessionRoutes.ts` | 01, 03, 07, 09, 10, 11 | :377-389 (setupRoutes), :464-485 (handleObservations), :491-506 (handleSummarize), :629-633 (strip), :669-710 (summarizeByClaudeId), :747-753 (complete), :814-895 (sessionInit) | ✓ |
| `src/services/worker/SessionManager.ts` | 03, 07 | :17 (imports), :59-84 (detectStaleGenerator), :329-377 (queueSummarize), :336-346 (circuit breaker), :381-446 (deleteSession), :516-568 (reapStaleSessions) | ✓ |
| `src/services/worker/agents/ResponseProcessor.ts` | 03, 04, 07 | :69-108 (processAgentResponse), :87-108 (non-XML fail), :176-200 (circuit breaker), :286-308 (syncObservation), :380-405 (syncSummary) | ✓ |
### Cross-plan overlaps with symbol-level detail
- `SessionRoutes.ts:464-485` — plan 01 replaces the hand-rolled strip; plan 07 reframes the handler as an `ingestObservation()` call site. Plans must sequence: **01 before 07**.
- `SessionRoutes.ts:747-753` — plan 07 reads `session.lastSummaryStored`; plan 09 wires the hook-side blocking-call contract against the same state field. Plans must sequence: **07 before 09**.
- `SessionManager.ts:329-377` (queueSummarize) — plan 03 deletes lines :336-346 (circuit breaker + `consecutiveSummaryFailures`); plan 07 reframes the whole method around new pending-message queueing. Sequencing: **03 before 07**.
- `PendingMessageStore.ts:6, :99-145` — plan 02 Phase 4 moves the 60-s stale reset out of `claimNextMessage` into boot; plan 07 Phase 5 consumes that boot-recovery path. Sequencing: **02 before 07**.
- `SearchRoutes.ts:286-293` (inline semantic-injection mini-formatter) — plan 05 folds into `SearchResultStrategy`; plan 06 flags this as out-of-scope. Both plans acknowledge the handoff explicitly. Sequencing: **05 before 06**.
- `ResultFormatter.ts` and `CorpusRenderer.ts` — plan 05 deletes (consolidates into `renderObservations`); plans 06 and 10 consume the strategies. Sequencing: **05 before 06 before 10**.
### Files cited by exactly one plan (per-plan scope; no overlap)
- `src/utils/tag-stripping.ts` — plan 01 only.
- `src/services/sync/ChromaSync.ts` — plan 04 only.
- `src/services/worker/SearchManager.ts` — plan 06 only.
- `src/services/worker/knowledge/*` — plan 10 only.
- `src/services/transcripts/*` — plan 08 only.
- `src/cli/handlers/*` — plan 09 (most), plan 07 (summarize.ts for poll→blocking migration).
- `src/sdk/parser.ts` + `src/sdk/prompts.ts` — plan 03 only.
- `src/services/worker/ProcessRegistry.ts` (full file, 527 lines) — plan 07 only.
- `src/services/worker/http/middleware.ts` + route files (non-session) — plan 11 only.
- `src/ui/viewer/**` — plan 12 only (lockdown).
**No stale or divergent citations detected.** Reconciliation check 1 PASS.
---
## Part 2 — Deletion-ledger aggregate
| # | Flowchart | Plan: gross del | Plan: gross add | Plan: net | Audit Part 5 net | Delta | Flag |
|---|---|---|---|---|---|---|---|
| 01 | 3.2 privacy | 60 | +29 | **31** | 42 | −11 | ✓ |
| 02 | 3.3 sqlite | 140 | +~295 (incl. schema.sql) | **140** source-only | 490 | −71% | ⚠ reframe |
| 03 | 3.7 parsing | 135 | +35 | **100** | 210 | −52% | ⚠ narrow count |
| 04 | 3.4 chroma | 320 | +~60 | **320** | 320 | 0 | ✓ |
| 05 | 3.5 context | 1,250 | +320 | **930** | 280 | +233% | ⚠ expanded scope |
| 06 | 3.6 search | 1,700 | +40 | **1,700** | 260 | +554% | ⚠ audit undercounted |
| 07 | 3.8 lifecycle | 900 target | +400 | **500** | 478 | +5% | ✓ |
| 08 | 3.12 transcripts | 161 | +75 | **86** | 110 | −22% | ✓ |
| 09 | 3.1 hooks | 487 | +25 | **460** | 100 | +360% | ⚠ includes SessionRoutes cleanup |
| 10 | 3.11 corpus | 228 | +30 | **198** | 110 | +80% | ⚠ renderer double-count risk |
| 11 | 3.9 http | 180 | +60 | **120** | 160 | −25% | ✓ |
| 12 | 3.10 viewer | 0 | 0 | **0** (lockdown) | 0 | — | ✓ |
| **TOTAL** | | **~5,364** | **~+1,369** | **~4,000** | 2,560 | **+56%** | — |
**Delta analysis:**
- Plans 05, 06, 09 overshoot the audit rows by genuine margins (live-code counts dwarfed the audit's row estimates — plan 06's SearchManager was estimated at 260 but the actual file is 2,069 lines of which >1,700 is boilerplate/deprecated/pass-through).
- Plan 02 undershoots because it keeps 19 private migration methods as upgrade-only runners and treats `schema.sql` as *additive new file* rather than a replacement for deleted lines.
- Plan 03 undershoots because its count is area-local (parser + ResponseProcessor) and doesn't roll up the audit's row pairing.
- Plans 05/06/10 share renderer deletion credit. Plan 10 explicitly flags this (`CorpusRenderer` migration "credit is shared with Plan 05/unified-renderer"). **Action**: plan 05 owns the deletion; plans 06 and 10 count only their consumer-side imports.
**Adjusted net after double-count correction**: ~3,800 LoC (still +48% vs audit target; primarily plan 06's SearchManager mass-delete). Reconciliation check 2 PASS with note.
---
## Part 3 — Endpoint inventory reconciliation
### Before/after census
| Route file | Before | After | Δ |
|---|---|---|---|
| `SessionRoutes.ts` | 10 (6× `/sessions/:id/*` + 4× `/api/sessions/*`) | 4 (`/api/session/{start,prompt,observation,end}`) | −6 |
| `SearchRoutes.ts` | 18 | ~12 (pass-throughs deleted; `/api/context/{inject,semantic}` folded) | −6 |
| `CorpusRoutes.ts` | 7 | 5 (`/prime` and `/reprime` deleted) | −2 |
| Everything else | ~20 | ~20 (unchanged; Zod schemas added) | 0 |
**Audit claim** (05 § 3.1): "Endpoint count: 8 → 4". **Actual**: 10 → 4 per V9; plan 09 explicitly flags the audit undercount. Reconciliation adopts **10 → 4**.
### Ownership conflicts
1. **`POST /api/sessions/complete` → `POST /api/session/end` (blocking)**
- Plan 07 Phase 7: owns the worker-side blocking handler (replaces old `handleSessionComplete` at `SessionRoutes.ts:753`).
- Plan 09 Phase 6: owns the hook-side caller (replaces 500ms poll loop with single blocking call).
- **Status**: co-ownership, not a conflict. **Sequencing: 07 before 09.**
2. **`POST /api/context/semantic`**
- Plan 09 Phase 6: deletes the endpoint (folded into `/api/session/prompt`).
- Plan 11 Phase 3: attaches a `SemanticContextSchema` Zod schema to it (still exists from 11's perspective).
- **Status**: **landing-order conflict**. Plan 11 explicitly documents this (Gap 1: "Plan 09 landing order"). **Resolution: 09 must land before 11, or plan 11 must omit the semantic-context schema at execution time.**
3. **Plan-05 `/api/session/start`**
- Plan 05 Phase 6: worker-side handler returns `{sessionDbId, contextMarkdown, semanticMarkdown}`.
- Plan 09 Phase 1: hook-side caller consumes the payload.
- **Status**: co-ownership, declared. **Sequencing: 05 before 09.**
Reconciliation check 3 PASS with mandatory landing constraints: 05 → 06, 05 → 09, 07 → 09, 09 → 11 (consistent with the tier ladder in Part 6).
---
## Part 4 — Timer census (revised 2026-04-22: zero-timer model)
> **Revision note:** this section previously accepted a "3 → 1" (`ReaperTick`) target and, via C7, quietly added a second `sqliteHousekeepingInterval`, which pushed the real count to 2. Both were band-aids over an event-driven model that already exists. Investigation 2026-04-22 (Invariants 1-4) confirmed the live code supports a true zero-timer model with one additional boot-once call. Target revised to **3 → 0 repeating background timers**.
| Timer | Location | Action | Owner | Before | After |
|---|---|---|---|---|---|
| `staleSessionReaperInterval` | `worker-service.ts:174, :547` | **delete** (replaced by event-driven + boot-once) | 07 P3 | 2 min | — |
| `startOrphanReaper` | `worker-service.ts:537` + `ProcessRegistry.ts:508-527` | **delete** | 07 P3 | 30 s | — |
| Transcript rescan | `watcher.ts:124-132` | delete (event-driven `fs.watch` recursive) | 08 P1 | 5 s | — |
| Summary poll | `summarize.ts:24, :117-150` | delete (blocking endpoint) | 09 P3 | 500 ms × 220 | — |
| Claim-stale reset (in `claimNextMessage`) | `PendingMessageStore.ts:99-145` | delete → boot-once `recoverStuckProcessing()` | 02 P4 / 07 P5 | per-claim | boot-once |
| `clearFailedOlderThan(1h)` | `worker-service.ts:567` | delete interval → boot-once call | 02 P(new) | 2 min | boot-once |
| `PRAGMA wal_checkpoint(PASSIVE)` | `worker-service.ts:581` | **delete outright** (SQLite default `wal_autocheckpoint=1000` pages is the contract) | 02 P(new) | 2 min | — |
| `killSystemOrphans` (ppid=1 sweep) | `ProcessRegistry.ts:315-344` | keep function, **move call** from interval → boot-once | 07 P3 | 30 s | boot-once |
| Chroma MCP backoff | (existing) | keep (event-driven on disconnect, not a repeating sweeper) | — | as-is | as-is |
| `ensureProcessExit` 5-s escalate | `ProcessRegistry.ts:185-229` | keep (inlined SIGTERM→5s→SIGKILL per-operation) | 07 P6 | per-delete | per-delete |
| Generator-exit 30-s wait | per-delete `Promise.race` | keep (per-operation) | — | per-delete | per-delete |
| Per-iterator idle 3-min `setTimeout` | `SessionQueueProcessor.ts:6` + resets at `:51-52, :62-63` | keep (per-session, resets on every chunk — covers hung-generator case on its own) | — | per-session | per-session |
| **Abandoned-session `setTimeout(deleteSession, 15min)`** | new, in `SessionManager.ts` | **ADD (per-session)** — scheduled on last-generator-completion, cleared on new activity; replaces `reapAbandonedSessions` sweeper | 07 P3 | — | per-session |
| SSE auto-reconnect (UI) | `useSSE.ts:61-71` | keep (I11, browser-owned) | 12 | 3 s | 3 s |
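The boot-once `recoverStuckProcessing()` path that replaces the per-claim stale reset reduces to one predicate: a pending message is stuck iff it is marked processing by a pid that is no longer alive. A sketch as a pure filter (the real check runs as SQL against `pending_messages`; the row shape here is illustrative):

```typescript
// Illustrative row shape for the boot-once recovery predicate.
interface PendingRow {
  id: number;
  status: "pending" | "processing" | "failed";
  worker_pid: number | null;
}

// Rows claimed by a dead worker get reset to "pending" at boot --
// no repeating timer needed, because a live worker's claims are
// released by its own exit handlers (Invariant 1).
function rowsToRecover(rows: PendingRow[], livePids: Set<number>): PendingRow[] {
  return rows.filter(
    (r) => r.status === "processing" && r.worker_pid !== null && !livePids.has(r.worker_pid),
  );
}
```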
### Cross-check against 05 Part 4 (revised 2026-04-22)
- **"Repeating background timers: 3 → 0"** — CONFIRMED. `staleSessionReaperInterval`, `startOrphanReaper`, transcript rescan, summary poll all retire. No `ReaperTick` is introduced. No `sqliteHousekeepingInterval` is introduced. Final worker-layer count: **0 `setInterval`** across `src/services/worker/` + `worker-service.ts`.
- **"Polling loops: 1 → 0"** — CONFIRMED. Summary poll retires into blocking endpoint.
- **Zero-timer viability** (investigation 2026-04-22):
  - **Invariant 1 (subprocess exit handlers)**: SDK at `ProcessRegistry.ts:479` → `unregisterProcess(:484)`; MCP at `worker-service.ts:530` → `supervisor.unregisterProcess(:531)`. HOLDS.
  - **Invariant 2 (per-iterator idle timer)**: `SessionQueueProcessor.ts:6` with resets at `:51-52, :62-63` and `onIdleTimeout` → `SessionManager.ts:651-655` → `abortController.abort()`. HOLDS; supersedes `reapHungGenerators`.
- **Invariant 3 (sweeper coverage)**: only remaining event-model gap is ppid=1 orphans from a previous crashed worker. Closed by moving the existing `killSystemOrphans()` call from the interval to boot-once. HOLDS.
- **Invariant 4 (SQLite housekeeping)**: `Database.ts:162-168` sets no `wal_autocheckpoint` override → SQLite default (1000 pages) is active. Explicit `wal_checkpoint(PASSIVE)` call is redundant. `pending_messages` has no constraint requiring periodic purge; `clearFailedOlderThan` at boot-once is sufficient. HOLDS.
Reconciliation check 4 PASS (no action items; the prior action item is rescinded).
---
## Part 5 — Consolidated gaps ledger
### BLOCKERS (5) — resolve before `/do`
| # | Plan | Blocker | Resolution |
|---|---|---|---|
| B1 | 08 | `package.json:58` engine floor is `>=18.0.0`; recursive `fs.watch` on Linux requires Node 20+ | Bump `engines.node` to `>=20.0.0` in `package.json` **before** plan 08 Phase 1. Single-line change. |
| B2 | 09 | Stop-hook exit code on 110-s timeout must be 0 (Windows Terminal contract from CLAUDE.md) — plan 07's new blocking `/api/session/end` must return 200 with `{timedOut: true, summaryId: null}`, not 504/408 | Decision: plan 07 Phase 7's blocking endpoint returns HTTP 200 with `{summaryId: null, timedOut: true}` on timeout. Plan 09 Phase 3 maps any 200 to exit 0. Document in plan 07 Phase 7 edit. |
| B3 | 10 | Prompt-caching TTL assumption ("~5 min, near free") is unmeasured. If SDK cache key is whitespace-sensitive or cwd-scoped, per-query cost jumps ~20× | Run plan 10 Phase 7 step 3 (cost smoke test: three sequential `/api/corpus/:name/query` calls; assert `cache_read_input_tokens > 0` on calls 2 and 3) **before** declaring plan 10 landed. Gate subsequent work on pass. |
| B4 | 11 | Zod is NOT transitively shipped (`npm ls zod` empty). 06 Phase 0's claim that it's transitive via `@anthropic-ai/sdk` is factually wrong — this repo uses `@anthropic-ai/claude-agent-sdk`. | Plan 11 Phase 1 must run `npm install zod@^3.x` and commit the `package.json` + `package-lock.json` delta before any other Phase 11 work. Already in plan, flagged here for ops visibility. |
| B5 | 04 | No native `chroma_upsert_documents` in MCP surface; plan uses `add → on "already exist" error → delete+add` fallback keyed on error-text match | Document the error-text match pattern in plan 04 Phase 2. Add a guard: if Chroma MCP ships upsert or changes error text, fallback must be updated. Low risk, but brittle — INFO-level in practice, but listed here because it's a silent-failure surface. Consider demoting to INFO after ops review. |
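The B2 contract reduces to a small hook-side mapping, sketched here with hypothetical names (the real handler lives in plan 09's stop-hook edit):

```typescript
// B2 sketch: the blocking /api/session/end always answers HTTP 200,
// even on the 110-s timeout path ({ summaryId: null, timedOut: true }).
// The hook maps ANY 200 to exit 0 -- the Windows Terminal contract
// forbids a nonzero exit from the stop hook.
interface EndResponse {
  summaryId: number | null;
  timedOut: boolean;
}

function hookExitCode(status: number, _body: EndResponse): number {
  if (status === 200) return 0; // includes the timedOut:true case
  return 1;                     // transport/server failure only
}
```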
### COORDINATION (11) — resolve by landing order
| # | Plans | Coordination | Resolution via |
|---|---|---|---|
| C1 | 02 ← 08 | Plan 02 Phase 6 (delete DEDUP_WINDOW_MS) gated on cross-path `tool_use_id` availability | Plan 08 must land first; its ingest ensures `tool_use_id` is present. Plan 02 Phase 6 gates on grep-verify during /do execution. |
| C2 | 03 ↔ 07 | RestartGuard surface ownership — plan 03 does not add `recordFailure()`; plan 07 may need to extend RestartGuard later | 03 lands first with narrower interpretation; 07 evaluates during Phase 7 whether to extend. Non-blocking. |
| C3 | 02 ← 04 | Plan 04 assumes `user_prompts.chroma_synced` column exists; plan 02 Phase 2 adds `observations.chroma_synced` only | **Action**: plan 02 Phase 2 also adds `user_prompts.chroma_synced` (or defer prompt backfill as plan 04 follow-up). Recommend extending 02 during /do. |
| C4 | 05 → 06 | `SearchResultStrategy.columns` option must handle two row shapes (with/without Work column) + the `SearchRoutes.ts:286-293` inline mini-formatter | Plan 05 defines the option in Phase 4; plan 06 Phase 6 consumes. Enforce landing order 05 → 06. |
| C5 | 05 → 09 | `/api/session/start` must include semantic markdown — plan 05 Phase 6 worker-side; plan 09 Phase 1 hook-side | Landing order 05 → 09. |
| C6 | 03 → 07 → 09 | `summary_stored` event wiring — plan 03 owns ResponseProcessor emission; plan 07 owns blocking-endpoint await; plan 09 owns hook blocking call | **Action**: plan 03 Phase 2 adds `session.summaryStoredEvent = new EventEmitter()`; plan 07 Phase 7 awaits; plan 09 Phase 3 calls. Landing 03 → 07 → 09. |
| C7 (REVISED 2026-04-22) | 07 ↔ 02 | `clearFailedOlderThan` + `wal_checkpoint` currently ride the stale-reaper interval; interval itself is being deleted | **Resolution**: `clearFailedOlderThan` moves to boot-once in plan 02 (new phase). Explicit `PRAGMA wal_checkpoint(PASSIVE)` is deleted outright — SQLite's default `wal_autocheckpoint=1000` pages covers it. No new `setInterval` is introduced. Plan 07 Phase 3 deletes the shared interval as part of removing the stale reaper. |
| C8 | 01 → 02 → 09 | `tool_use_id` availability in `NormalizedHookInput` (plan 01 payload), DB UNIQUE constraint (plan 02), hook serialization (plan 09) | Landing order 01 → 02 → 09; plan 02 UNIQUE constraint verifies presence. |
| C9 | 09 → 11 | Plan 11 Zod schemas target plan 09's post-state endpoint surface | Landing order 09 → 11, OR plan 11 ships schemas for legacy endpoints and prunes when 09 lands. **Recommend 09 → 11.** |
| C10 | 12 ↔ 11 | Viewer T1 SHA-256 baseline vs plan 11's viewer.html static cache; bearer-token-per-boot injection | Plan 12 T1 re-baselines after every worker boot. Plan 11 must document that cache lifecycle is per-boot (not persistent) — add to plan 11 Phase 6 notes. |
| C11 | 01 → 07, 08, 09 | `ingestObservation/ingestPrompt/ingestSummary` helper location — plan 07 owns; plans 08 and 09 consume | Landing order 01 → 07 → (08, 09 parallel). |
### INFO (15) — logged only
- Plan 01: ReDoS micro-benchmark informational; `queueSummarize` integration covered by Phase 3 test.
- Plan 01: Double-strip of `<system-reminder>` is idempotent.
- Plan 02: `schema.sql` generator filter must cover future FTS5 suffix variants.
- Plan 03: `<skip_summary/>` recognition decision (prompt update vs parser strict) — flagged for product owner.
- Plan 04: `updateMergedIntoProject` metadata patching left untouched.
- Plan 05: ANSI color-preservation regression surface (byte-equal snapshot required in Phase 8).
- Plan 05: `ResultFormatter` has two row shapes (tracked in C4).
- Plan 06: 503 error-body JSON shape decision (`{error:'chroma_unavailable'}`).
- Plan 06: `ResultFormatter.formatSearchResults` caller grep checklist.
- Plan 08: audit-named "Cursor/OpenCode/Gemini-CLI transcripts" diverges from implementation — those use hooks, not JSONL watcher.
- Plan 09: hook-process module-scope cache caveat (perf, not correctness).
- Plan 10: `corpus.json` storage shape tradeoff (observations vs rendered string).
- Plan 11: Zod version lock-in (3.x stable surface).
- Plan 12: Playwright optional; fallback manual `CHECKLIST.md`.
- Plan 12: Catalog update strategy may stale on future project-deletion feature.
---
## Part 6 — Execution decision
**Reconciliation verdict: READY to run `/do`**, subject to completing the blocker resolutions below as a preflight step.
### Preflight (before `/do`)
1. Bump `package.json` `engines.node` from `>=18.0.0` to `>=20.0.0` (B1).
2. Edit plan 07 Phase 7 spec to mandate `HTTP 200 + {summaryId: null, timedOut: true}` on the 110-s timeout path; edit plan 09 Phase 3 to map HTTP 200 → hook exit 0 (B2).
3. Edit plan 02 Phase 2 to add `user_prompts.chroma_synced` column alongside `observations.chroma_synced` (C3).
4. Edit plan 03 Phase 2 to add `session.summaryStoredEvent = new EventEmitter()` emission on summary commit (C6).
5. Edit plan 02 to add a boot-once `clearFailedOlderThan` call at worker start and delete the explicit `PRAGMA wal_checkpoint(PASSIVE)` outright; plan 07 Phase 3 deletes the shared 2-min interval (C7 revised — no new interval, per the Part 4 zero-timer model).
6. Edit plan 11 Phase 6 to document per-boot cache lifecycle (for plan 12's T1 baseline reset — C10).
Blockers B3 (plan 10 prompt-caching cost smoke test) and B4 (plan 11 Zod install) are already in the respective plans; no preflight edit needed but `/do` must block on these gates during execution of those plans.
### Recommended `/do` landing order
Landing tiers (plans in a tier can run in parallel; next tier waits for previous):
- **Tier 1**: 01 (privacy), 12 (viewer lockdown — regression harness only, independent).
- **Tier 2**: 02 (sqlite), 03 (parsing).
- **Tier 3**: 04 (chroma, requires 02), 05 (context/renderer).
- **Tier 4**: 06 (search, requires 05), 07 (session lifecycle, requires 01+02+03).
- **Tier 5**: 08 (transcripts, requires 01+07), 09 (hooks, requires 01+05+07), 10 (corpus, requires 05+06).
- **Tier 6**: 11 (http routes, requires 09).
Rerun reconciliation after Tier 3 and Tier 4 — they have the highest cross-plan overlap. Viewer regression suite from plan 12 runs after every tier per its Phase 5 schedule.
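The tier ladder above can be sanity-checked mechanically against the sequencing constraints collected in Parts 1 and 3. The pair list below is transcribed from this document and is not exhaustive; the helper is a sketch, not part of any plan:

```typescript
// "X must land before Y" pairs transcribed from Parts 1 and 3.
const MUST_PRECEDE: Array<[string, string]> = [
  ["01", "07"], ["02", "04"], ["02", "07"], ["03", "07"],
  ["05", "06"], ["05", "09"], ["07", "09"], ["09", "11"],
];

// A tier ordering is valid iff every "before" plan sits in a strictly
// earlier tier than its "after" plan (constraints only bind when both
// plans appear in the schedule).
function orderRespectsConstraints(tiers: string[][]): boolean {
  const tierOf = new Map<string, number>();
  tiers.forEach((plans, i) => plans.forEach((p) => tierOf.set(p, i)));
  return MUST_PRECEDE.every(([before, after]) => {
    const a = tierOf.get(before);
    const b = tierOf.get(after);
    return a === undefined || b === undefined || a < b;
  });
}
```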
### Success gate for the full cleanup
All six success criteria from `07-master-plan.md` must be true. After `/do` completes all tiers:
- 12 plan documents exist ✓ (already)
- All plans have the four-block reporting contract ✓ (extraction confirmed)
- All plans cite at least one V-number or declare absence ✓ (extraction confirmed)
- All phases have the four sub-fields ✓ (extraction confirmed per sampled plan)
- Deletion-ledger roll-up ~4,000 LoC (after double-count correction: 3,800) — **exceeds** audit's 2,560 target by +48% due to genuine live-code undercount in the audit; reconciliation-verified, not padded.
- `08-reconciliation.md` written ✓ (this document)
**Gate status: CLEAR to proceed once preflight edits 1-6 above are applied.**
@@ -0,0 +1,145 @@
# Pathfinder Phase 9: Execution Runbook
**This is the control document for `/do` execution of the claude-mem v6.5.0 brutal-audit cleanup.**
Read this file first. It tells you what to read next, what to skip, what rules apply, and where to mark progress. Do not rely on memory — check this file every turn. Do not re-plan; a plan already exists.
---
## STOP — read this before touching anything
### Reading hierarchy (canonical → supporting → stale → forbidden)
| Tier | Files | How to use |
|---|---|---|
| **Canonical (always authoritative)** | `07-plans/01-*.md` through `07-plans/12-*.md` | The 12 per-flowchart plans. Each is self-contained and /do-executable. When phase instructions conflict with anything else, the per-flowchart plan wins. |
| **Canonical (design authority, read-only)** | `05-clean-flowcharts.md` | The brutal-audit design. Per-flowchart plans already cite the relevant sections; re-read only to resolve ambiguity. **Never modify.** |
| **Canonical (this file)** | `09-execution-runbook.md` | Runbook + checklists. Update the checkboxes as tiers land. |
| **Canonical (reconciliation)** | `08-reconciliation.md` | Preflight status, tier dependencies, ownership conflicts, timer census, gaps ledger. Re-read before each tier. Re-run reconciliation itself after Tiers 3 and 4. |
| **Supporting (cite when needed)** | `07-master-plan.md` | Dispatch strategy + ladder. Skim once to orient, then work from `07-plans/`. |
| **Supporting (discovery evidence)** | `06-implementation-plan.md` **Phase 0 only** (V1–V20 verified-findings table, ~lines 22–47) | Cross-reference when a plan cites a V-number. The V-table is still authoritative. |
| **STALE — DO NOT FOLLOW** | `06-implementation-plan.md` **Phases 1–15** | Superseded by `07-plans/`. These 15 cross-cutting phases were written without `/make-plan` and collapse 12 flowcharts into phase-ordered chunks. Every instruction in these phases is replaced by the per-flowchart plan. If you find yourself reading Phases 1–15, stop and go to the corresponding `07-plans/` file. |
| **STALE — DO NOT FOLLOW** | `03-unified-proposal.md`, `04-handoff-prompts.md` | Earlier drafts, superseded by `05-clean-flowcharts.md`. Background only. |
| **Reference (read-only)** | `00-features.md`, `01-flowcharts/*.md`, `02-duplication-report.md` | "Before" state documentation. Read only when a plan cites them for the current implementation's shape. |
### Rules — do not drift
1. **One tier at a time.** Finish all plans in a tier before starting the next. Plans within a tier may run in parallel.
2. **One plan at a time inside a session** (unless you're the orchestrator dispatching subagents). `/do` executes one per-flowchart plan per subagent; the subagent opens the plan file, works its phases in order, runs every Verification block, then reports back.
3. **Copy from file:line — never invent APIs.** Every plan phase says "copy from `<file>:<line>`". If the line doesn't match what you expect, stop and ask — don't guess.
4. **Never widen scope.** If a plan's phase list doesn't mention a file, don't touch that file. Out-of-scope fixes go in a new follow-up plan, never in the current execution.
5. **Never edit `05-clean-flowcharts.md`.** It is the design authority. If reality contradicts 05, write a correction into the affected per-flowchart plan as a `> **Preflight edit YYYY-MM-DD**` note — do not silently modify the plan body, and never the design doc.
6. **Never edit `06-implementation-plan.md` Phases 1–15.** They are stale by definition.
7. **Check every Verification checklist.** "Phase complete" means every checkbox in the phase's Verification block is green. A subagent that reports "done" without running the greps/tests is rejected.
8. **Update this runbook as you go.** Mark tier boxes complete only after all plans in the tier pass verification. Mark a plan in-progress the moment a subagent is dispatched; mark it landed when verification passes; mark it blocked if verification fails.
9. **Stop the tier on failure.** If any plan in a tier fails verification, halt the tier — do not start the next tier until the failure is triaged.
10. **Re-run reconciliation after Tier 3 and Tier 4** (largest cross-plan overlap). The existing reconciliation process is in `08-reconciliation.md` § "Part 6"; repeat the five checks against the landed state.
11. **Viewer regression (plan 12) runs after every tier.** Plan 12 is a lockdown doc; its regression suite (`tests/viewer-lockdown/*`) executes once before Tier 1 to baseline, then again after Tiers 1, 2, 3, 4, 5, 6. Any regression halts the tier.
12. **Do not commit the worktree branch partway through a tier** unless the tier's partial state builds and tests pass. Per-plan commits within a tier are fine.
13. **When in doubt, read `08-reconciliation.md` Part 6** — it lists the landing decision and preflight status.
14. **Ask the user before destructive moves outside the plan's scope.** Deleting extra files, bumping unrelated dependencies, reorganizing directories = all require permission.
### Preflight status (must be true before Tier 1 starts)
- [x] **B1** — `package.json` engines.node bumped to `>=20.0.0` (applied 2026-04-22)
- [x] **B2** — plan 07 Phase 7 spec: HTTP 200 + `{timedOut: true}` on 110s timeout (applied 2026-04-22)
- [x] **C3** — plan 02 Phase 2: `user_prompts.chroma_synced` column added alongside observations + summaries (applied 2026-04-22)
- [x] **C6** — plan 03 Phase 2 + plan 07 Phase 7: `session.summaryStoredEvent` wiring (applied 2026-04-22)
- [x] **C7 (REVERTED 2026-04-22)** — earlier proposal introduced a dedicated `sqliteHousekeepingInterval`, which added a new repeating timer. Replaced by zero-timer model: `clearFailedOlderThan` moves to boot-once (plan 02); explicit `PRAGMA wal_checkpoint` calls are deleted (SQLite's `wal_autocheckpoint` default = 1000 pages is the contract). See 05 Part 4 revision 2026-04-22.
- [x] **C10** — plan 11 Phase 6: per-boot cache lifecycle contract documented (applied 2026-04-22)
**Preflight gate**: all six boxes must be `[x]` before launching Tier 1. If any is `[ ]`, stop — preflight edits are mandatory, not optional.
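The C7 zero-timer model can be sketched as a single boot-once pass. The `MessageStore` shape, the retention constant, and the function name below are illustrative assumptions, not the project's actual API:

```typescript
// Illustrative sketch of C7: housekeeping runs once at worker boot instead of
// on a repeating 2-min interval. MessageStore, FAILED_RETENTION_MS, and
// bootOnceHousekeeping are hypothetical names.
interface MessageStore {
  clearFailedOlderThan(cutoffEpochMs: number): number; // returns rows deleted
}

const FAILED_RETENTION_MS = 24 * 60 * 60 * 1000;

function bootOnceHousekeeping(store: MessageStore, nowMs = Date.now()): number {
  // No setInterval and no explicit `PRAGMA wal_checkpoint`: SQLite's
  // wal_autocheckpoint default (1000 pages) is relied on for WAL truncation.
  return store.clearFailedOlderThan(nowMs - FAILED_RETENTION_MS);
}
```

The design choice is that anything a previous worker left behind is reconciled once at startup, so no state needs a recurring sweep while the worker is running.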
**In-flight blockers (gated during tier execution, not preflight):**
- **B3** (plan 10, Tier 5): prompt-caching TTL cost smoke test must pass before declaring plan 10 landed.
- **B4** (plan 11, Tier 6): plan 11 Phase 1 `npm install zod@^3.x` must run before any other plan 11 phase.
- **B5** (plan 04, Tier 3): Chroma upsert fallback error-text match is brittle; landed as-is with documentation, but flagged for ops review.
---
## Execution ladder — check off as you land
Tick the plan box only when every phase in the plan has passed verification AND the post-tier reconciliation (if applicable) is clean.
### Tier 1 — privacy foundation + viewer baseline (parallel)
- [ ] **Plan 01** — `07-plans/01-privacy-tag-filtering.md` (5 phases, ~31 LoC)
- [ ] **Plan 12** — `07-plans/12-viewer-ui-layer.md` (6 phases, lockdown: baseline regression suite)
**Tier gate**: plan 12 Phase 4 must produce a clean baseline snapshot before any other tier runs. Plan 01's summary privacy gap (P1 security bug) must be verified closed via the `<private>secret</private>` regression test.
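A minimal sketch of what the `<private>secret</private>` regression checks (the function name and regex here are assumptions; the real implementation lives in `src/utils/tag-stripping.ts`):

```typescript
// Hypothetical stand-in for the plan-01 invariant: <private>…</private> spans
// must never survive into storage, summaries, or injected context.
function stripPrivateTags(text: string): string {
  return text.replace(/<private>[\s\S]*?<\/private>/g, "");
}
```

The P1 gap was that the summary path bypassed this stripping; the regression test feeds a tagged secret through that path and asserts the secret never reappears.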
### Tier 2 — data plane (parallel)
- [ ] **Plan 02** — `07-plans/02-sqlite-persistence.md` (7 phases, ~140 LoC source, +~295 add inc. schema.sql)
- [ ] **Plan 03** — `07-plans/03-response-parsing-storage.md` (5 phases, ~100 LoC)
**Tier gate**: plan 02 Phase 2 must land `chroma_synced` on observations + summaries + user_prompts (three tables per preflight C3). Plan 03 Phase 2 step 5 must wire `summaryStoredEvent.emit('stored', summaryId)`. Plan 12 regression re-run.
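The `summaryStoredEvent` wiring this gate checks (and its B2 timeout contract) can be sketched roughly as follows; `Session`, `commitSummary`, and `waitForSummary` are assumed names for illustration, not the signatures in plans 03 and 07:

```typescript
import { EventEmitter } from "node:events";

// Hypothetical shape of the C6 wiring: the session exposes an emitter and the
// blocking endpoint awaits a 'stored' event instead of polling.
interface Session { summaryStoredEvent: EventEmitter }

function commitSummary(session: Session, summaryId: number): void {
  // ...the atomic obs+summary transaction would commit here...
  session.summaryStoredEvent.emit("stored", summaryId);
}

function waitForSummary(
  session: Session,
  timeoutMs: number,
): Promise<{ summaryId?: number; timedOut: boolean }> {
  return new Promise((resolve) => {
    const onStored = (summaryId: number) => {
      clearTimeout(timer);
      resolve({ summaryId, timedOut: false });
    };
    // Per preflight B2, a timeout still resolves normally: the route answers
    // HTTP 200 with { timedOut: true } rather than erroring.
    const timer = setTimeout(() => {
      session.summaryStoredEvent.removeListener("stored", onStored);
      resolve({ timedOut: true });
    }, timeoutMs);
    session.summaryStoredEvent.once("stored", onStored);
  });
}
```

This replaces a poll loop with a single listener plus one one-shot timeout, consistent with the zero-timer model.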
### Tier 3 — chroma + renderer (parallel)
- [ ] **Plan 04** — `07-plans/04-vector-search-sync.md` (6 phases, ~320 LoC) — depends on plan 02
- [ ] **Plan 05** — `07-plans/05-context-injection-engine.md` (8 phases, ~930 LoC)
**Tier gate**: plan 04 relies on plan 02's migration. Plan 05's ANSI byte-equal snapshots must pass. **Re-run full reconciliation** per rule 10. Plan 12 regression re-run.
### Tier 4 — search + session lifecycle (parallel)
- [ ] **Plan 06** — `07-plans/06-hybrid-search-orchestration.md` (7 phases, ~1700 LoC) — depends on plan 05
- [ ] **Plan 07** — `07-plans/07-session-lifecycle-management.md` (8 phases, ~500 LoC) — depends on plans 01, 02, 03
**Tier gate**: plan 07 Phase 7 blocking endpoint must pass both happy-path and 110s-timeout integration tests with HTTP 200 on both paths (preflight B2). Plan 06 must return 503 `{error:'chroma_unavailable'}` when Chroma is down, not silent SQL fallback. **Re-run full reconciliation** per rule 10. Plan 12 regression re-run.
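The Chroma-down behavior this gate requires can be sketched as a thin handler wrapper; `handleSearch`, `searchChroma`, and `SearchResult` are illustrative names, not the project's route API:

```typescript
// Hypothetical sketch of the tier-4 gate: when Chroma is unreachable the
// search route surfaces 503 {error:'chroma_unavailable'}, never a silent
// SQL-only fallback.
type SearchResult = { ids: string[] };

async function handleSearch(
  query: string,
  searchChroma: (q: string) => Promise<SearchResult>,
): Promise<{ status: number; body: unknown }> {
  try {
    const result = await searchChroma(query);
    return { status: 200, body: result };
  } catch {
    // Make the outage visible to the caller instead of degrading silently.
    return { status: 503, body: { error: "chroma_unavailable" } };
  }
}
```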
### Tier 5 — transcripts + hooks + corpus (parallel)
- [ ] **Plan 08** — `07-plans/08-transcript-watcher-integration.md` (6 phases, ~86 LoC) — depends on plans 01, 07
- [ ] **Plan 09** — `07-plans/09-lifecycle-hooks.md` (7 phases, ~460 LoC) — depends on plans 01, 05, 07
- [ ] **Plan 10** — `07-plans/10-knowledge-corpus-builder.md` (7 phases, ~198 LoC) — depends on plans 05, 06
**Tier gate**: plan 09 Phase 3 Windows Terminal tab-close test (hook exit 0 on 110s timeout). Plan 10 Phase 7 step 3 cost smoke test (preflight B3). Plan 08 relies on Node 20+ (preflight B1). Plan 12 regression re-run.
### Tier 6 — http routes + zod (solo)
- [ ] **Plan 11** — `07-plans/11-http-server-routes.md` (8 phases, ~120 LoC) — depends on plan 09
**Tier gate**: plan 11 Phase 1 `npm install zod@^3.x` (preflight B4). Schemas attach to the post-plan-09 endpoint surface (4 session endpoints, folded context endpoints). Plan 12 regression re-run (final).
### Post-landing
- [ ] Full reconciliation re-run; all green.
- [ ] Deletion-ledger total landed within ±10% of the reconciliation target (~3,800 LoC after double-count correction).
- [ ] Viewer regression baseline from Tier 1 matches viewer behavior after Tier 6 (modulo bearer-token re-baseline per plan 12 T1 + preflight C10).
- [ ] Full test suite clean on Node 20+.
- [ ] `grep -r "ProcessRegistry" src/` returns zero hits in `src/services/worker/` (supervisor registry is the only one left).
- [ ] `grep -rn "setInterval" src/services/worker/ src/services/worker-service.ts` returns **zero** hits. Zero-timer model: every recurring check is replaced by an event-driven handler, a per-operation `setTimeout`, or boot-once reconciliation. See 05 Part 4 (revised 2026-04-22).
- [ ] `grep -rn "startOrphanReaper\|staleSessionReaperInterval\|reapStaleSessions\|startReaperTick\|ReaperTick" src/` returns zero hits.
- [ ] `grep -rn "POLL_INTERVAL_MS\|MAX_WAIT_FOR_SUMMARY_MS" src/` returns zero hits (polling loop gone).
- [ ] `grep -rn "coerceObservationToSummary\|consecutiveSummaryFailures\|findDuplicateObservation\|stripMemoryTagsFromJson\|stripMemoryTagsFromPrompt" src/` returns zero hits.
---
## Per-plan quick reference
| Plan | Flowchart | Key files touched | Critical invariant to preserve |
|---|---|---|---|
| 01 | 3.2 privacy | `src/utils/tag-stripping.ts`, `src/services/worker/http/routes/SessionRoutes.ts` | Every text-ingress point strips memory tags; summary path closes P1 security gap |
| 02 | 3.3 sqlite | `src/services/sqlite/**` | WAL mode, FTS5 triggers, tables unchanged; only constraints + columns added |
| 03 | 3.7 parsing | `src/sdk/parser.ts`, `src/services/worker/agents/ResponseProcessor.ts`, `src/sdk/prompts.ts` | Atomic obs+summary TX preserved; parser contract enforced (no coerce) |
| 04 | 3.4 chroma | `src/services/sync/ChromaSync.ts` | Writes to SQLite never blocked by Chroma; `chroma_synced` flag drives backfill |
| 05 | 3.5 context | `src/services/context/**`, `src/services/worker/search/ResultFormatter.ts`, `src/services/worker/knowledge/CorpusRenderer.ts` | Agent + Human outputs byte-identical post-refactor |
| 06 | 3.6 search | `src/services/worker/SearchManager.ts`, `src/services/worker/search/**` | All three search paths preserved; 503 on Chroma-down, no silent fallback |
| 07 | 3.8 session | `src/services/worker/ProcessRegistry.ts` (deleted), `src/services/worker/worker-service.ts`, `src/services/worker/SessionManager.ts` | Subprocess crash recovery preserved via `child.on('exit')` handlers (already wired); previous-worker-crash orphans cleaned via boot-once `killSystemOrphans()`; abandoned-session cleanup via per-session `setTimeout(deleteSession,15min)` scheduled on last-generator-completion. **No repeating background timers.** |
| 08 | 3.12 transcripts | `src/services/transcripts/**` | Codex JSONL ingestion preserved; session_end → queueSummarize still triggers |
| 09 | 3.1 hooks | `src/cli/handlers/**`, `src/services/worker/http/routes/SessionRoutes.ts` | Hook exit codes preserved; Windows Terminal tab behavior (exit 0) preserved |
| 10 | 3.11 corpus | `src/services/worker/knowledge/**`, `src/services/worker/http/routes/CorpusRoutes.ts` | Build / query / rebuild / delete HTTP surface preserved; prime/reprime removed |
| 11 | 3.9 http | `src/services/worker/http/**`, all route files | All user-facing routes preserved; SSE preserved; admin endpoints preserved |
| 12 | 3.10 viewer | `tests/viewer-lockdown/*` (new) | No source changes; invariants I1–I12 hold |
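The plan-07 invariant in the table above (event-driven crash recovery, boot-once orphan cleanup, one-shot abandonment timer) can be sketched as follows. Every name here (`SessionManager`, `killCandidates`, `ChildLike`) is a hypothetical stand-in, not the project's real API:

```typescript
import { EventEmitter } from "node:events";

const ABANDON_MS = 15 * 60 * 1000; // abandoned-session cleanup window

type ChildLike = { on(event: "exit", cb: (code: number | null) => void): void };

class SessionManager {
  private abandonTimers = new Map<string, ReturnType<typeof setTimeout>>();
  readonly deleted: string[] = [];

  // Boot-once reconciliation: PIDs tracked by a crashed previous worker that
  // are no longer live would be killed here. No repeating reaper needed.
  killCandidates(livePids: Set<number>, trackedPids: number[]): number[] {
    return trackedPids.filter((pid) => !livePids.has(pid));
  }

  // Crash recovery is event-driven: a child exit fires the same path as
  // normal generator completion, so no polling loop watches the child.
  watch(sessionId: string, child: ChildLike, delayMs = ABANDON_MS): void {
    child.on("exit", () => this.onGeneratorCompleted(sessionId, delayMs));
  }

  // Per-session one-shot setTimeout scheduled on last-generator-completion;
  // this is the only timer left, and it never repeats.
  onGeneratorCompleted(sessionId: string, delayMs = ABANDON_MS): void {
    clearTimeout(this.abandonTimers.get(sessionId));
    const t = setTimeout(() => this.deleteSession(sessionId), delayMs);
    t.unref?.(); // cleanup timers must not keep the worker alive
    this.abandonTimers.set(sessionId, t);
  }

  deleteSession(sessionId: string): void {
    this.abandonTimers.delete(sessionId);
    this.deleted.push(sessionId);
  }
}
```

Rescheduling on every generator completion means the 15-min window always measures from the *last* activity, which is what the post-landing `setInterval` grep check enforces structurally.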
---
## If something goes wrong
1. **Read `08-reconciliation.md` Part 5 gaps ledger first** — the issue may be a known blocker or coordination item.
2. **Check preflight status** at the top of this file. A missed preflight is the most common drift source.
3. **Do not "fix" by widening scope.** If a plan phase fails, the fix goes in that phase or a follow-up plan. Do not hand-edit the codebase outside the plan's scope.
4. **If a plan's file:line citation is stale** (file has moved or line numbers shifted because an earlier tier already edited it), note it in the plan body as a `> **Live correction YYYY-MM-DD**:` block and proceed with the updated location. Do not re-run the subagent that wrote the plan.
5. **If reconciliation after a tier fails the deletion-ledger check** by more than ±15% below target, a plan's deletions were incomplete. Re-read the plan's Phase verification blocks; the missing greps point to the undone work.
6. **If a plan reports "blocked"** because an upstream plan's assumption doesn't hold, escalate to the user with the plan file + phase number + the broken assumption. Do not improvise.
---
## Why this file exists
`07-master-plan.md` describes the split-and-dispatch strategy. `08-reconciliation.md` captures the snapshot of the 12 plans and the preflight decisions. Neither is a runbook with live state — they're snapshots. This file is the living execution record: it says what to read, what to skip, what to check off, and what rules prevent drift. An agent picking up the work cold reads **this file first** and can orient from here without having to reconstruct the state from 20 prior docs.