Alex Newman ba1ef6c42c fix: Issue Blowout 2026 — 25 bugs across worker, hooks, security, and search (#2080)
* fix: resolve search, database, and docker bugs (#1913, #1916, #1956, #1957, #2048)

- Fix concept/concepts param mismatch in SearchManager.normalizeParams (#1916)
- Add FTS5 keyword fallback when ChromaDB is unavailable (#1913, #2048)
- Add periodic WAL checkpoint and journal_size_limit to prevent unbounded WAL growth (#1956)
- Add periodic clearFailed() to purge stale pending_messages (#1957)
- Fix nounset-safe TTY_ARGS expansion in docker/claude-mem/run.sh

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix: prevent silent data loss on non-XML responses, add queue info to /health (#1867, #1874)

- ResponseProcessor: mark messages as failed (with retry) instead of confirming
  when the LLM returns non-XML garbage (auth errors, rate limits) (#1874)
- Health endpoint: include activeSessions count for queue liveness monitoring (#1867)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix: cache isFts5Available() at construction time

Addresses Greptile review: avoid DDL probe (CREATE + DROP) on every text
query. Result is now cached in _fts5Available at construction.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix: resolve worker stability bugs — pool deadlock, MCP loopback, restart guard (#1868, #1876, #2053)

- Replace flat consecutiveRestarts counter with time-windowed RestartGuard:
  only counts restarts within 60s window (cap=10), decays after 5min of
  success. Prevents stranding pending messages on long-running sessions. (#2053)

- Add idle session eviction to pool slot allocation: when all slots are full,
  evict the idlest session (no pending work, oldest activity) to free a slot
  for new requests, preventing 60s timeout deadlock. (#1868)

- Fix MCP loopback self-check: use process.execPath instead of bare 'node'
  which fails on non-interactive PATH. Fix crash misclassification by removing
  false "Generator exited unexpectedly" error log on normal completion. (#1876)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix: resolve hooks reliability bugs — summarize exit code, session-init health wait (#1896, #1901, #1903, #1907)

- Wrap summarize hook's workerHttpRequest in try/catch to prevent exit
  code 2 (blocking error) on network failures or malformed responses.
  Session exit no longer blocks on worker errors. (#1901)

- Add health-check wait loop to UserPromptSubmit session-init command in
  hooks.json. On Linux/WSL where hook ordering fires UserPromptSubmit
  before SessionStart, session-init now waits up to 10s for worker health
  before proceeding. Also wrap session-init HTTP call in try/catch. (#1907)

- Close #1896 as already-fixed: mtime comparison at file-context.ts:255-267
  bypasses truncation when file is newer than latest observation.

- Close #1903 as no-repro: hooks.json correctly declares all hook events.
  Issue was Claude Code 12.0.1/macOS platform event-dispatch bug.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix: security hardening — bearer auth, path validation, rate limits, per-user port (#1932, #1933, #1934, #1935, #1936)

- Add bearer token auth to all API endpoints: auto-generated 32-byte
  token stored at ~/.claude-mem/worker-auth-token (mode 0600). All hook,
  MCP, viewer, and OpenCode requests include Authorization header.
  Health/readiness endpoints exempt for polling. (#1932, #1933)

- Add path traversal protection: watch.context.path validated against
  project root and ~/.claude-mem/ before write. Rejects ../../../etc
  style attacks. (#1934)

- Reduce JSON body limit from 50MB to 5MB. Add in-memory rate limiter
  (300 req/min/IP) to prevent abuse. (#1935)

- Derive default worker port from UID (37700 + uid%100) to prevent
  cross-user data leakage on multi-user macOS. Windows falls back to
  37777. Shell hooks use same formula via id -u. (#1936)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix: resolve search project filtering and import Chroma sync (#1911, #1912, #1914, #1918)

- Fix per-type search endpoints to pass project filter to Chroma queries
  and SQLite hydration. searchObservations/Sessions/UserPrompts now use
  $or clause matching project + merged_into_project. (#1912)

- Fix timeline/search methods to pass project to Chroma anchor queries.
  Prevents cross-project result leakage when project param omitted. (#1911)

- Sync imported observations to ChromaDB after FTS rebuild. Import
  endpoint now calls chromaSync.syncObservation() for each imported
  row, making them visible to MCP search(). (#1914)

- Fix session-init cwd fallback to match context.ts (process.cwd()).
  Prevents project key mismatch that caused "no previous sessions"
  on fresh sessions. (#1918)

- Fix sync-marketplace restart to include auth token and per-user port.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix: resolve all CodeRabbit and Greptile review comments on PR #2080

- Fix run.sh comment mismatch (no-op flag vs empty array)
- Gate session-init on health check success (prevent running when worker unreachable)
- Fix date_desc ordering ignored in FTS session search
- Age-scope failed message purge (1h retention) instead of clearing all
- Anchor RestartGuard decay to real successes (null init, not Date.now())
- Add recordSuccess() calls in ResponseProcessor and completion path
- Prevent caller headers from overriding bearer auth token
- Add lazy cleanup for rate limiter map to prevent unbounded growth
- Bound post-import Chroma sync with concurrency limit of 8
- Add doc_type:'observation' filter to Chroma queries feeding observation hydration
- Add FTS fallback to all specialized search handlers (observations, sessions, prompts, timeline)
- Add response.ok check and error handling in viewer saveSettings

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix: resolve CodeRabbit round-2 review comments

- Use failure timestamp (COALESCE) instead of created_at_epoch for stale purge
- Downgrade _fts5Available flag when FTS table creation fails
- Escape FTS5 MATCH input by quoting user queries as literal phrases
- Escape LIKE metacharacters (%, _, \) in prompt text search
- Add response.ok check in initial settings load (matches save flow)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix: resolve CodeRabbit round-3 review comments

- Include failed_at_epoch in COALESCE for age-scoped purge
- Re-throw FTS5 errors so callers can distinguish failure from no-results
- Wrap all FTS fallback calls in SearchManager with try/catch

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-20 11:42:09 -07:00

Worker Service Architecture

Overview

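The Worker Service is an Express HTTP server that handles all claude-mem operations and is managed by PM2. By default it listens on a per-user port derived from the user's UID (37700 + uid % 100), which prevents cross-user data leakage on multi-user machines. On Windows it falls back to 37777. The port is configurable via CLAUDE_MEM_WORKER_PORT.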
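The commit notes above (#1936) give the per-user port formula: 37700 + uid % 100, with Windows falling back to 37777. A minimal sketch of that calculation follows. The function name `resolveWorkerPort` is hypothetical, and the assumption that an explicit CLAUDE_MEM_WORKER_PORT value takes precedence over the derived default is not confirmed by the source.

```typescript
// Hypothetical helper: resolves the worker port for the current user.
// Assumption: an explicit CLAUDE_MEM_WORKER_PORT value wins over the derived default.
function resolveWorkerPort(
  envPort: string | undefined, // value of CLAUDE_MEM_WORKER_PORT, if set
  uid: number | undefined, // process.getuid?.() on POSIX; undefined on Windows
): number {
  if (envPort !== undefined && envPort.trim() !== "") {
    const parsed = Number(envPort);
    if (Number.isInteger(parsed) && parsed > 0 && parsed < 65536) {
      return parsed;
    }
    throw new Error(`Invalid CLAUDE_MEM_WORKER_PORT: ${envPort}`);
  }
  // Windows has no numeric UID, so use the historical fixed port.
  if (uid === undefined) {
    return 37777;
  }
  // Per-user port: two users collide only if their UIDs are equal mod 100.
  return 37700 + (uid % 100);
}

// Usage in Node on POSIX:
// resolveWorkerPort(process.env.CLAUDE_MEM_WORKER_PORT, process.getuid?.())
```

The shell hooks are described as applying the same formula via `id -u`, so a user with UID 501 would reach the worker on port 37701 from both sides.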

Request Flow

Hook (plugin/scripts/*-hook.js)
  → HTTP Request to Worker (localhost:<worker port>)
    → Route Handler (http/routes/*.ts)
      → MCP Server Tool (for search) OR Service Layer (for session/data)
        → Database (SQLite3 + Chroma vector DB)
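The security notes in the commit message (#1932, #1933) say that every hook request in this flow carries a bearer token read from ~/.claude-mem/worker-auth-token, and that caller-supplied headers must not override it. Below is a sketch of how a hook might build such a request. `buildWorkerRequest` and the `WorkerRequest` shape are hypothetical names. The returned object is meant to be handed to Node's global `fetch`.

```typescript
// Hypothetical request shape, usable as fetch(req.url, req).
interface WorkerRequest {
  url: string;
  method: "GET" | "POST" | "DELETE";
  headers: Record<string, string>;
  body?: string;
}

function buildWorkerRequest(
  port: number,
  path: string,
  token: string,
  method: WorkerRequest["method"] = "GET",
  payload?: unknown,
  callerHeaders: Record<string, string> = {},
): WorkerRequest {
  // Drop any caller-supplied Authorization header (any casing) so it cannot
  // replace the worker token.
  const headers: Record<string, string> = {};
  for (const [name, value] of Object.entries(callerHeaders)) {
    if (name.toLowerCase() !== "authorization") headers[name] = value;
  }
  headers["Authorization"] = `Bearer ${token}`;

  const request: WorkerRequest = { url: `http://localhost:${port}${path}`, method, headers };
  if (payload !== undefined) {
    const hasContentType = Object.keys(headers).some((n) => n.toLowerCase() === "content-type");
    if (!hasContentType) headers["Content-Type"] = "application/json";
    request.body = JSON.stringify(payload);
  }
  return request;
}

// Usage (Node 18+):
// const req = buildWorkerRequest(port, "/sessions/init", token, "POST", body);
// const res = await fetch(req.url, req);
```

The Authorization header is written last, after the caller's headers are copied. This ordering is what keeps a stray header from displacing the token.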

Directory Structure

src/services/worker/
├── README.md                     # This file
├── WorkerService.ts              # Slim orchestrator (~150 lines)
├── http/                         # HTTP layer
│   ├── middleware.ts             # Shared middleware (logging, CORS, etc.)
│   └── routes/                   # Route handlers organized by feature area
│       ├── SessionRoutes.ts      # Session lifecycle (init, observations, summarize, complete)
│       ├── DataRoutes.ts         # Data retrieval (get observations, summaries, prompts, stats)
│       ├── SearchRoutes.ts       # Search/MCP proxy (all search endpoints)
│       ├── SettingsRoutes.ts     # Settings, MCP toggle, branch switching
│       └── ViewerRoutes.ts       # Health check, viewer UI, SSE stream
└── services/                     # Business logic services (existing, NO CHANGES in Phase 1)
    ├── DatabaseManager.ts        # SQLite connection management
    ├── SessionManager.ts         # Session state tracking
    ├── SDKAgent.ts               # Claude Agent SDK for observations/summaries
    ├── SSEBroadcaster.ts         # Server-Sent Events for real-time updates
    ├── PaginationHelper.ts       # Query pagination utilities
    ├── SettingsManager.ts        # User settings CRUD
    └── BranchManager.ts          # Git branch operations
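The division between a slim WorkerService.ts and per-feature route classes can be sketched as follows. This is an illustration under assumptions, not the worker's actual types. The `RouteApp` interface stands in for an Express application, with only `get`/`post`/`delete` modeled. `RouteClass` and `registerAll` are hypothetical names.

```typescript
type Handler = (req: unknown, res: unknown) => void | Promise<void>;

// Minimal stand-in for the parts of an Express app that route classes use.
interface RouteApp {
  get(path: string, handler: Handler): void;
  post(path: string, handler: Handler): void;
  delete(path: string, handler: Handler): void;
}

// Each route file (SessionRoutes, DataRoutes, ...) exposes setupRoutes().
interface RouteClass {
  setupRoutes(app: RouteApp): void;
}

// The orchestrator only constructs route classes and asks each to register itself.
function registerAll(app: RouteApp, routes: RouteClass[]): void {
  for (const route of routes) {
    route.setupRoutes(app);
  }
}
```

Under this design, the orchestrator stays small because it knows nothing about individual endpoints. Adding a feature area means adding one more entry to the array passed to `registerAll`.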

Route Organization

ViewerRoutes.ts

  • GET /health - Health check endpoint
  • GET / - Serve viewer UI (React app)
  • GET /stream - SSE stream for real-time updates

SessionRoutes.ts

Session lifecycle operations (use service layer directly):

  • POST /sessions/init - Initialize new session
  • POST /sessions/:sessionId/observations - Add tool usage observations
  • POST /sessions/:sessionId/summarize - Trigger session summary
  • GET /sessions/:sessionId/status - Get session status
  • DELETE /sessions/:sessionId - Delete session
  • POST /sessions/:sessionId/complete - Mark session complete
  • POST /sessions/claude-id/:claudeId/observations - Add observations by claude_id
  • POST /sessions/claude-id/:claudeId/summarize - Summarize by claude_id
  • POST /sessions/claude-id/:claudeId/complete - Complete by claude_id
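Taken together, these endpoints describe a session's lifecycle. The sketch below lists the calls a client would make for one session when addressing it by Claude session ID. The ordering (init, then observations, then summarize, then complete) is inferred from the endpoint descriptions. `sessionLifecycle` is a hypothetical helper, and request bodies are omitted because their shape is not documented here.

```typescript
type Call = { method: "POST"; path: string };

// Returns the ordered endpoint calls for one session addressed by claude_id.
function sessionLifecycle(claudeId: string, observationCount: number): Call[] {
  const base = `/sessions/claude-id/${encodeURIComponent(claudeId)}`;
  const calls: Call[] = [{ method: "POST", path: "/sessions/init" }];
  for (let i = 0; i < observationCount; i++) {
    calls.push({ method: "POST", path: `${base}/observations` }); // one per tool use
  }
  calls.push({ method: "POST", path: `${base}/summarize` });
  calls.push({ method: "POST", path: `${base}/complete` });
  return calls;
}
```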

DataRoutes.ts

Data retrieval operations (use service layer directly):

  • GET /observations - List observations (paginated)
  • GET /summaries - List session summaries (paginated)
  • GET /prompts - List user prompts (paginated)
  • GET /observations/:id - Get observation by ID
  • GET /sessions/:sessionId - Get session by ID
  • GET /prompts/:id - Get prompt by ID
  • GET /stats - Get database statistics
  • GET /projects - List all projects
  • GET /processing - Get processing status
  • POST /processing - Set processing status
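Several of these endpoints are paginated (see PaginationHelper.ts), but their query parameters are not documented here. The sketch below assumes offset/limit paging with a `hasMore` flag, which is a common convention and not a confirmed description of PaginationHelper. It requests one extra row so that `hasMore` can be detected without a separate COUNT(*) query. `fetchPage` and `Page` are hypothetical names.

```typescript
interface Page<T> {
  items: T[];
  offset: number;
  limit: number;
  hasMore: boolean;
}

// Fetches one page by asking for limit + 1 rows; the extra row only signals hasMore.
function fetchPage<T>(
  fetchRows: (offset: number, count: number) => T[], // e.g. SELECT ... LIMIT ? OFFSET ?
  offset: number,
  limit: number,
): Page<T> {
  const rows = fetchRows(offset, limit + 1);
  const hasMore = rows.length > limit;
  return { items: hasMore ? rows.slice(0, limit) : rows, offset, limit, hasMore };
}
```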

SearchRoutes.ts

All search operations (proxy to MCP server):

  • GET /search - Unified search (observations + sessions + prompts)
  • GET /timeline - Unified timeline context
  • GET /decisions - Decision-type observations
  • GET /changes - Change-related observations
  • GET /how-it-works - How-it-works explanations
  • GET /search/observations - Search observations
  • GET /search/sessions - Search sessions
  • GET /search/prompts - Search prompts
  • GET /search/by-concept - Find by concept tag
  • GET /search/by-file - Find by file path
  • GET /search/by-type - Find by observation type
  • GET /search/recent-context - Get recent context
  • GET /search/context-timeline - Get context timeline
  • GET /context/preview - Preview context
  • GET /context/inject - Inject context
  • GET /search/timeline-by-query - Timeline by search query
  • GET /search/help - Search help
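All of these are GET routes, so a client passes its search criteria as query-string parameters. The parameter names used below (`query`, `project`, `limit`) are assumptions for illustration. The authoritative list should come from GET /search/help. The commit notes above (#1911, #1912) describe fixes to how the `project` filter is applied, so including it when known is worthwhile. `buildSearchUrl` is a hypothetical helper.

```typescript
// Builds a search URL; undefined values are omitted from the query string.
function buildSearchUrl(
  port: number,
  endpoint: string, // e.g. "/search" or "/search/observations"
  params: Record<string, string | number | undefined>,
): string {
  const qs = new URLSearchParams();
  for (const [key, value] of Object.entries(params)) {
    if (value !== undefined) qs.set(key, String(value));
  }
  const query = qs.toString();
  return `http://localhost:${port}${endpoint}${query ? `?${query}` : ""}`;
}
```

`URLSearchParams` handles the encoding: spaces become `+`, and reserved characters are percent-escaped. User queries can therefore be passed through unmodified.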

SettingsRoutes.ts

Settings and configuration (use service layer directly):

  • GET /settings - Get user settings
  • POST /settings - Update user settings
  • GET /mcp/status - Get MCP server status
  • POST /mcp/toggle - Toggle MCP server on/off
  • GET /branch/status - Get git branch info
  • POST /branch/switch - Switch git branch
  • POST /branch/update - Pull branch updates
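The commit notes above mention adding a `response.ok` check to the viewer's saveSettings and to the initial settings load. Below is a sketch of that pattern for POST /settings. The fetch function is injected so the code can be exercised without a running worker. The `saveSettings` signature and the flat `Settings` shape are assumptions.

```typescript
type Settings = Record<string, string>;

// Structural subset of the Fetch API used here; intended to be compatible with Node 18+ fetch.
type FetchLike = (
  url: string,
  init: { method: string; headers: Record<string, string>; body: string },
) => Promise<{ ok: boolean; status: number; json(): Promise<unknown> }>;

async function saveSettings(
  fetchFn: FetchLike,
  baseUrl: string,
  token: string,
  settings: Settings,
): Promise<unknown> {
  const response = await fetchFn(`${baseUrl}/settings`, {
    method: "POST",
    headers: { "Content-Type": "application/json", Authorization: `Bearer ${token}` },
    body: JSON.stringify(settings),
  });
  // fetch rejects only on network failure; HTTP errors (401, 500, ...) must be checked explicitly.
  if (!response.ok) {
    throw new Error(`Saving settings failed with HTTP ${response.status}`);
  }
  return response.json();
}
```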

Current State (Phase 1)

Phase 1 is a pure code reorganization with ZERO functional changes:

  • Extract route handlers from WorkerService.ts monolith
  • Organize into logical route classes
  • Keep all existing behavior identical

MCP vs Direct DB Split (inherited, not changed in Phase 1):

  • Search operations → MCP server (mem-search)
  • Session/data operations → Direct DB access via service layer

Future Phase 2

Phase 2 will unify the architecture:

  1. Expand MCP server to handle ALL operations (not just search)
  2. Convert all route handlers to proxy through MCP
  3. Move database logic from service layer into MCP tools
  4. Result: the worker becomes a pure HTTP → MCP proxy, maximizing portability
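Step 4 describes an end state in which each HTTP route is a thin translation into an MCP tool call. The sketch below illustrates that translation. `McpClient`, its `callTool` method, and the route-to-tool table are hypothetical and are not the MCP SDK's API. The sketch only relies on the fact that an MCP tool call is identified by a tool name plus JSON arguments.

```typescript
// Hypothetical client interface; a real one would wrap an MCP SDK client.
interface McpClient {
  callTool(name: string, args: Record<string, unknown>): Promise<unknown>;
}

// Hypothetical route-to-tool table (exact match only; path parameters omitted for brevity).
const ROUTE_TO_TOOL: Record<string, string> = {
  "GET /observations": "list_observations",
  "POST /sessions/init": "init_session",
};

// Translates one HTTP request into one MCP tool call; the worker itself holds no DB logic.
async function proxyToMcp(
  client: McpClient,
  method: string,
  path: string,
  args: Record<string, unknown>,
): Promise<{ status: number; body: unknown }> {
  const tool = ROUTE_TO_TOOL[`${method} ${path}`];
  if (tool === undefined) {
    return { status: 404, body: { error: `No MCP tool for ${method} ${path}` } };
  }
  return { status: 200, body: await client.callTool(tool, args) };
}
```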

Moving all database logic behind MCP would let the worker be deployed anywhere (as a CLI tool, a cloud service, etc.) without carrying database dependencies.

Adding New Endpoints

  1. Choose the appropriate route file based on the endpoint's purpose
  2. Add the route handler method to the class
  3. Register the route in the setupRoutes() method
  4. Import any needed services in the constructor
  5. Follow the existing patterns for error handling and logging

Example:

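// In DataRoutes.ts (Request/Response are Express types)
import type { Request, Response } from 'express';

// Handler method inside the DataRoutes class
private async handleGetFoo(req: Request, res: Response): Promise<void> {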
  try {
    const result = await this.dbManager.getFoo();
    res.json(result);
  } catch (error) {
    logger.failure('WORKER', 'Get foo failed', {}, error as Error);
    res.status(500).json({ error: (error as Error).message });
  }
}

// Register in setupRoutes()
app.get('/foo', this.handleGetFoo.bind(this));
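Because a route class receives its services through the constructor, the handler's error path can be exercised without starting Express. Below is a self-contained sketch. `FooRoutes`, `FooDb.getFoo`, and the `ResLike` fake response are hypothetical stand-ins (the real handlers take Express `Request`/`Response`). A `console.error` call replaces `logger.failure()`.

```typescript
// Minimal stand-in for the parts of Express's Response used by the handler.
interface ResLike {
  statusCode: number;
  payload: unknown;
  status(code: number): ResLike;
  json(body: unknown): void;
}

// Recording fake response for tests.
function makeRes(): ResLike {
  const res: ResLike = {
    statusCode: 200,
    payload: undefined,
    status(code: number) {
      res.statusCode = code;
      return res;
    },
    json(body: unknown) {
      res.payload = body;
    },
  };
  return res;
}

interface FooDb {
  getFoo(): Promise<unknown>; // hypothetical data-access method
}

class FooRoutes {
  private readonly dbManager: FooDb;

  // Dependency injection: the route class receives only the service it needs.
  constructor(dbManager: FooDb) {
    this.dbManager = dbManager;
  }

  async handleGetFoo(_req: unknown, res: ResLike): Promise<void> {
    try {
      res.json(await this.dbManager.getFoo());
    } catch (error) {
      console.error("[WORKER] Get foo failed", error); // stand-in for logger.failure()
      res.status(500).json({ error: (error as Error).message });
    }
  }
}
```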

Key Design Principles

  1. Progressive Disclosure: Navigate from the high-level orchestrator (WorkerService.ts) to specific route classes, then to implementation details
  2. Single Responsibility: Each route class handles one feature area
  3. Dependency Injection: Route classes receive only the services they need
  4. Consistent Error Handling: All handlers use try/catch with logger.failure()
  5. Bound Methods: All route handlers use .bind(this) so that this still refers to the route class instance when Express calls them
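Principle 5 matters because a router stores a handler as a bare function and later calls it detached from its object. Without binding, `this` inside the method no longer points at the route class. Below is a minimal framework-free sketch. The `register` callback stands in for `app.get`, and `CounterRoutes` and `invokeDetached` are hypothetical names.

```typescript
type Handler = () => string;

class CounterRoutes {
  private hits = 0;

  handleHit(): string {
    this.hits += 1; // requires `this` to be the CounterRoutes instance
    return `hits=${this.hits}`;
  }

  setupRoutes(register: (path: string, handler: Handler) => void): void {
    // Binding captures the instance, so the stored function works when called alone.
    register("/hit", this.handleHit.bind(this));
  }
}

// Calls a handler as a plain function, the way a router does.
function invokeDetached(handler: Handler): string {
  return handler();
}
```

Registering `this.handleHit` without `.bind(this)` would store a function whose `this` is undefined in strict-mode class code. The first request would then fail with a TypeError.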