Files
claude-mem/src/services/worker
Alex Newman 40daf8f3fa feat: replace WASM embeddings with persistent chroma-mcp MCP connection (#1176)
* feat: replace WASM embeddings with persistent chroma-mcp MCP connection

Replace ChromaServerManager (npx chroma run + chromadb npm + ONNX/WASM)
with ChromaMcpManager, a singleton stdio MCP client that communicates with
chroma-mcp via uvx. This eliminates native binary issues, segfaults, and
WASM embedding failures that plagued cross-platform installs.

Key changes:
- Add ChromaMcpManager: singleton MCP client with lazy connect, auto-reconnect,
  connection lock, and Zscaler SSL cert support
- Rewrite ChromaSync to use MCP tool calls instead of chromadb npm client
- Handle chroma-mcp's non-JSON responses (plain text success/error messages)
- Treat "collection already exists" as idempotent success
- Wire ChromaMcpManager into GracefulShutdown for clean subprocess teardown
- Delete ChromaServerManager (no longer needed)

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix: address PR review — connection guard leak, timer leak, async reset

- Clear connecting guard in finally block to prevent permanent reconnection block
- Clear timeout after successful connection to prevent timer leak
- Make reset() async to await stop() before nullifying instance
- Delete obsolete chroma-server-manager test (imports deleted class)
- Update graceful-shutdown test to use chromaMcpManager property name

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix: prevent chroma-mcp spawn storm — zombie cleanup, stale onclose guard, reconnect backoff

Three bugs caused chroma-mcp processes to accumulate (92+ observed):

1. Zombie on timeout: failed connections left subprocess alive because
   only the timer was cleared, not the transport. Now catch block
   explicitly closes transport+client before rethrowing.

2. Stale onclose race: old transport's onclose handler captured `this`
   and overwrote the current connection reference after reconnect,
   orphaning the new subprocess. Now guarded with reference check.

3. No backoff: every failure triggered immediate reconnect. With
   backfill doing hundreds of MCP calls, this created rapid-fire
   spawning. Added 10s backoff on both connection failure and
   unexpected process death.

Also includes ChromaSync fixes from PR review:
- queryChroma deduplication now preserves index-aligned arrays
- SQL injection guard on backfill ID exclusion lists

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-18 18:32:38 -05:00
..
2026-02-07 01:05:38 -05:00

Worker Service Architecture

Overview

The Worker Service is an Express HTTP server that handles all claude-mem operations. It runs on port 37777 (configurable via CLAUDE_MEM_WORKER_PORT) and is managed by PM2.

Request Flow

Hook (plugin/scripts/*-hook.js)
  → HTTP Request to Worker (localhost:37777)
    → Route Handler (http/routes/*.ts)
      → MCP Server Tool (for search) OR Service Layer (for session/data)
        → Database (SQLite3 + Chroma vector DB)

Directory Structure

src/services/worker/
├── README.md                     # This file
├── WorkerService.ts              # Slim orchestrator (~150 lines)
├── http/                         # HTTP layer
│   ├── middleware.ts             # Shared middleware (logging, CORS, etc.)
│   └── routes/                   # Route handlers organized by feature area
│       ├── SessionRoutes.ts      # Session lifecycle (init, observations, summarize, complete)
│       ├── DataRoutes.ts         # Data retrieval (get observations, summaries, prompts, stats)
│       ├── SearchRoutes.ts       # Search/MCP proxy (all search endpoints)
│       ├── SettingsRoutes.ts     # Settings, MCP toggle, branch switching
│       └── ViewerRoutes.ts       # Health check, viewer UI, SSE stream
└── services/                     # Business logic services (existing, NO CHANGES in Phase 1)
    ├── DatabaseManager.ts        # SQLite connection management
    ├── SessionManager.ts         # Session state tracking
    ├── SDKAgent.ts               # Claude Agent SDK for observations/summaries
    ├── SSEBroadcaster.ts         # Server-Sent Events for real-time updates
    ├── PaginationHelper.ts       # Query pagination utilities
    ├── SettingsManager.ts        # User settings CRUD
    └── BranchManager.ts          # Git branch operations

Route Organization

ViewerRoutes.ts

  • GET /health - Health check endpoint
  • GET / - Serve viewer UI (React app)
  • GET /stream - SSE stream for real-time updates

SessionRoutes.ts

Session lifecycle operations (use service layer directly):

  • POST /sessions/init - Initialize new session
  • POST /sessions/:sessionId/observations - Add tool usage observations
  • POST /sessions/:sessionId/summarize - Trigger session summary
  • GET /sessions/:sessionId/status - Get session status
  • DELETE /sessions/:sessionId - Delete session
  • POST /sessions/:sessionId/complete - Mark session complete
  • POST /sessions/claude-id/:claudeId/observations - Add observations by claude_id
  • POST /sessions/claude-id/:claudeId/summarize - Summarize by claude_id
  • POST /sessions/claude-id/:claudeId/complete - Complete by claude_id

DataRoutes.ts

Data retrieval operations (use service layer directly):

  • GET /observations - List observations (paginated)
  • GET /summaries - List session summaries (paginated)
  • GET /prompts - List user prompts (paginated)
  • GET /observations/:id - Get observation by ID
  • GET /sessions/:sessionId - Get session by ID
  • GET /prompts/:id - Get prompt by ID
  • GET /stats - Get database statistics
  • GET /projects - List all projects
  • GET /processing - Get processing status
  • POST /processing - Set processing status

SearchRoutes.ts

All search operations (proxy to MCP server):

  • GET /search - Unified search (observations + sessions + prompts)
  • GET /timeline - Unified timeline context
  • GET /decisions - Decision-type observations
  • GET /changes - Change-related observations
  • GET /how-it-works - How-it-works explanations
  • GET /search/observations - Search observations
  • GET /search/sessions - Search sessions
  • GET /search/prompts - Search prompts
  • GET /search/by-concept - Find by concept tag
  • GET /search/by-file - Find by file path
  • GET /search/by-type - Find by observation type
  • GET /search/recent-context - Get recent context
  • GET /search/context-timeline - Get context timeline
  • GET /context/preview - Preview context
  • GET /context/inject - Inject context
  • GET /search/timeline-by-query - Timeline by search query
  • GET /search/help - Search help

SettingsRoutes.ts

Settings and configuration (use service layer directly):

  • GET /settings - Get user settings
  • POST /settings - Update user settings
  • GET /mcp/status - Get MCP server status
  • POST /mcp/toggle - Toggle MCP server on/off
  • GET /branch/status - Get git branch info
  • POST /branch/switch - Switch git branch
  • POST /branch/update - Pull branch updates

Current State (Phase 1)

Phase 1 is a pure code reorganization with ZERO functional changes:

  • Extract route handlers from WorkerService.ts monolith
  • Organize into logical route classes
  • Keep all existing behavior identical

MCP vs Direct DB Split (inherited, not changed in Phase 1):

  • Search operations → MCP server (mem-search)
  • Session/data operations → Direct DB access via service layer

Future Phase 2

Phase 2 will unify the architecture:

  1. Expand MCP server to handle ALL operations (not just search)
  2. Convert all route handlers to proxy through MCP
  3. Move database logic from service layer into MCP tools
  4. Result: Worker becomes pure HTTP → MCP proxy for maximum portability

This separation allows the worker to be deployed anywhere (as a CLI tool, cloud service, etc.) without carrying database dependencies.

Adding New Endpoints

  1. Choose the appropriate route file based on the endpoint's purpose
  2. Add the route handler method to the class
  3. Register the route in the setupRoutes() method
  4. Import any needed services in the constructor
  5. Follow the existing patterns for error handling and logging

Example:

// In DataRoutes.ts
private async handleGetFoo(req: Request, res: Response): Promise<void> {
  try {
    const result = await this.dbManager.getFoo();
    res.json(result);
  } catch (error) {
    logger.failure('WORKER', 'Get foo failed', {}, error as Error);
    res.status(500).json({ error: (error as Error).message });
  }
}

// Register in setupRoutes()
app.get('/foo', this.handleGetFoo.bind(this));

Key Design Principles

  1. Progressive Disclosure: Navigate from high-level (WorkerService.ts) to specific routes to implementation details
  2. Single Responsibility: Each route class handles one feature area
  3. Dependency Injection: Route classes receive only the services they need
  4. Consistent Error Handling: All handlers use try/catch with logger.failure()
  5. Bound Methods: All route handlers use .bind(this) to preserve context