40daf8f3fa
* feat: replace WASM embeddings with persistent chroma-mcp MCP connection Replace ChromaServerManager (npx chroma run + chromadb npm + ONNX/WASM) with ChromaMcpManager, a singleton stdio MCP client that communicates with chroma-mcp via uvx. This eliminates native binary issues, segfaults, and WASM embedding failures that plagued cross-platform installs. Key changes: - Add ChromaMcpManager: singleton MCP client with lazy connect, auto-reconnect, connection lock, and Zscaler SSL cert support - Rewrite ChromaSync to use MCP tool calls instead of chromadb npm client - Handle chroma-mcp's non-JSON responses (plain text success/error messages) - Treat "collection already exists" as idempotent success - Wire ChromaMcpManager into GracefulShutdown for clean subprocess teardown - Delete ChromaServerManager (no longer needed) Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> * fix: address PR review — connection guard leak, timer leak, async reset - Clear connecting guard in finally block to prevent permanent reconnection block - Clear timeout after successful connection to prevent timer leak - Make reset() async to await stop() before nullifying instance - Delete obsolete chroma-server-manager test (imports deleted class) - Update graceful-shutdown test to use chromaMcpManager property name Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> * fix: prevent chroma-mcp spawn storm — zombie cleanup, stale onclose guard, reconnect backoff Three bugs caused chroma-mcp processes to accumulate (92+ observed): 1. Zombie on timeout: failed connections left subprocess alive because only the timer was cleared, not the transport. Now catch block explicitly closes transport+client before rethrowing. 2. Stale onclose race: old transport's onclose handler captured `this` and overwrote the current connection reference after reconnect, orphaning the new subprocess. Now guarded with reference check. 3. No backoff: every failure triggered immediate reconnect. With backfill doing hundreds of MCP calls, this created rapid-fire spawning. Added 10s backoff on both connection failure and unexpected process death. Also includes ChromaSync fixes from PR review: - queryChroma deduplication now preserves index-aligned arrays - SQL injection guard on backfill ID exclusion lists Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> --------- Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
Worker Service Architecture
Overview
The Worker Service is an Express HTTP server that handles all claude-mem operations. It runs on port 37777 (configurable via CLAUDE_MEM_WORKER_PORT) and is managed by PM2.
Request Flow
Hook (plugin/scripts/*-hook.js)
→ HTTP Request to Worker (localhost:37777)
→ Route Handler (http/routes/*.ts)
→ MCP Server Tool (for search) OR Service Layer (for session/data)
→ Database (SQLite3 + Chroma vector DB)
Directory Structure
src/services/worker/
├── README.md # This file
├── WorkerService.ts # Slim orchestrator (~150 lines)
├── http/ # HTTP layer
│ ├── middleware.ts # Shared middleware (logging, CORS, etc.)
│ └── routes/ # Route handlers organized by feature area
│ ├── SessionRoutes.ts # Session lifecycle (init, observations, summarize, complete)
│ ├── DataRoutes.ts # Data retrieval (get observations, summaries, prompts, stats)
│ ├── SearchRoutes.ts # Search/MCP proxy (all search endpoints)
│ ├── SettingsRoutes.ts # Settings, MCP toggle, branch switching
│ └── ViewerRoutes.ts # Health check, viewer UI, SSE stream
└── services/ # Business logic services (existing, NO CHANGES in Phase 1)
├── DatabaseManager.ts # SQLite connection management
├── SessionManager.ts # Session state tracking
├── SDKAgent.ts # Claude Agent SDK for observations/summaries
├── SSEBroadcaster.ts # Server-Sent Events for real-time updates
├── PaginationHelper.ts # Query pagination utilities
├── SettingsManager.ts # User settings CRUD
└── BranchManager.ts # Git branch operations
Route Organization
ViewerRoutes.ts
GET /health- Health check endpointGET /- Serve viewer UI (React app)GET /stream- SSE stream for real-time updates
SessionRoutes.ts
Session lifecycle operations (use service layer directly):
POST /sessions/init- Initialize new sessionPOST /sessions/:sessionId/observations- Add tool usage observationsPOST /sessions/:sessionId/summarize- Trigger session summaryGET /sessions/:sessionId/status- Get session statusDELETE /sessions/:sessionId- Delete sessionPOST /sessions/:sessionId/complete- Mark session completePOST /sessions/claude-id/:claudeId/observations- Add observations by claude_idPOST /sessions/claude-id/:claudeId/summarize- Summarize by claude_idPOST /sessions/claude-id/:claudeId/complete- Complete by claude_id
DataRoutes.ts
Data retrieval operations (use service layer directly):
GET /observations- List observations (paginated)GET /summaries- List session summaries (paginated)GET /prompts- List user prompts (paginated)GET /observations/:id- Get observation by IDGET /sessions/:sessionId- Get session by IDGET /prompts/:id- Get prompt by IDGET /stats- Get database statisticsGET /projects- List all projectsGET /processing- Get processing statusPOST /processing- Set processing status
SearchRoutes.ts
All search operations (proxy to MCP server):
GET /search- Unified search (observations + sessions + prompts)GET /timeline- Unified timeline contextGET /decisions- Decision-type observationsGET /changes- Change-related observationsGET /how-it-works- How-it-works explanationsGET /search/observations- Search observationsGET /search/sessions- Search sessionsGET /search/prompts- Search promptsGET /search/by-concept- Find by concept tagGET /search/by-file- Find by file pathGET /search/by-type- Find by observation typeGET /search/recent-context- Get recent contextGET /search/context-timeline- Get context timelineGET /context/preview- Preview contextGET /context/inject- Inject contextGET /search/timeline-by-query- Timeline by search queryGET /search/help- Search help
SettingsRoutes.ts
Settings and configuration (use service layer directly):
GET /settings- Get user settingsPOST /settings- Update user settingsGET /mcp/status- Get MCP server statusPOST /mcp/toggle- Toggle MCP server on/offGET /branch/status- Get git branch infoPOST /branch/switch- Switch git branchPOST /branch/update- Pull branch updates
Current State (Phase 1)
Phase 1 is a pure code reorganization with ZERO functional changes:
- Extract route handlers from WorkerService.ts monolith
- Organize into logical route classes
- Keep all existing behavior identical
MCP vs Direct DB Split (inherited, not changed in Phase 1):
- Search operations → MCP server (mem-search)
- Session/data operations → Direct DB access via service layer
Future Phase 2
Phase 2 will unify the architecture:
- Expand MCP server to handle ALL operations (not just search)
- Convert all route handlers to proxy through MCP
- Move database logic from service layer into MCP tools
- Result: Worker becomes pure HTTP → MCP proxy for maximum portability
This separation allows the worker to be deployed anywhere (as a CLI tool, cloud service, etc.) without carrying database dependencies.
Adding New Endpoints
- Choose the appropriate route file based on the endpoint's purpose
- Add the route handler method to the class
- Register the route in the
setupRoutes()method - Import any needed services in the constructor
- Follow the existing patterns for error handling and logging
Example:
// In DataRoutes.ts
private async handleGetFoo(req: Request, res: Response): Promise<void> {
try {
const result = await this.dbManager.getFoo();
res.json(result);
} catch (error) {
logger.failure('WORKER', 'Get foo failed', {}, error as Error);
res.status(500).json({ error: (error as Error).message });
}
}
// Register in setupRoutes()
app.get('/foo', this.handleGetFoo.bind(this));
Key Design Principles
- Progressive Disclosure: Navigate from high-level (WorkerService.ts) to specific routes to implementation details
- Single Responsibility: Each route class handles one feature area
- Dependency Injection: Route classes receive only the services they need
- Consistent Error Handling: All handlers use try/catch with logger.failure()
- Bound Methods: All route handlers use
.bind(this)to preserve context