Worker Service Architecture
Overview
The Worker Service is an Express HTTP server that handles all claude-mem operations. It runs on port 37777 (configurable via CLAUDE_MEM_WORKER_PORT) and is managed by PM2.
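As a rough illustration of the port configuration described above, the sketch below resolves the worker port from an environment-like object. The default 37777 and the variable name CLAUDE_MEM_WORKER_PORT come from this README; the helper names `resolveWorkerPort` and `workerBaseUrl`, and the fallback-on-invalid-input behavior, are assumptions made for the example and may not match what the worker actually does.

```typescript
// Default port and env var name are taken from this README.
const DEFAULT_WORKER_PORT = 37777;

// Hypothetical helper (not part of claude-mem): resolve the worker port,
// falling back to the default when the variable is missing or invalid.
function resolveWorkerPort(env: Record<string, string | undefined>): number {
  const raw = env["CLAUDE_MEM_WORKER_PORT"];
  if (raw === undefined || raw.trim() === "") return DEFAULT_WORKER_PORT;
  const port = Number(raw);
  const valid = Number.isInteger(port) && port >= 1 && port <= 65535;
  return valid ? port : DEFAULT_WORKER_PORT;
}

// Hypothetical helper: base URL a hook would send its HTTP requests to.
function workerBaseUrl(env: Record<string, string | undefined>): string {
  return `http://localhost:${resolveWorkerPort(env)}`;
}
```

In a Node.js hook script, `workerBaseUrl(process.env)` would be the natural call.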
Request Flow
Hook (plugin/scripts/*-hook.js)
→ HTTP Request to Worker (localhost:37777)
→ Route Handler (http/routes/*.ts)
→ MCP Server Tool (for search) OR Service Layer (for session/data)
→ Database (SQLite3 + Chroma vector DB)
Directory Structure
src/services/worker/
├── README.md                  # This file
├── WorkerService.ts           # Slim orchestrator (~150 lines)
├── http/                      # HTTP layer
│   ├── middleware.ts          # Shared middleware (logging, CORS, etc.)
│   └── routes/                # Route handlers organized by feature area
│       ├── SessionRoutes.ts   # Session lifecycle (init, observations, summarize, complete)
│       ├── DataRoutes.ts      # Data retrieval (get observations, summaries, prompts, stats)
│       ├── SearchRoutes.ts    # Search/MCP proxy (all search endpoints)
│       ├── SettingsRoutes.ts  # Settings, MCP toggle, branch switching
│       └── ViewerRoutes.ts    # Health check, viewer UI, SSE stream
└── services/                  # Business logic services (existing, NO CHANGES in Phase 1)
    ├── DatabaseManager.ts     # SQLite connection management
    ├── SessionManager.ts      # Session state tracking
    ├── SDKAgent.ts            # Claude Agent SDK for observations/summaries
    ├── SSEBroadcaster.ts      # Server-Sent Events for real-time updates
    ├── PaginationHelper.ts    # Query pagination utilities
    ├── SettingsManager.ts     # User settings CRUD
    └── BranchManager.ts       # Git branch operations
Route Organization
ViewerRoutes.ts
- GET /health - Health check endpoint
- GET / - Serve viewer UI (React app)
- GET /stream - SSE stream for real-time updates
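For readers unfamiliar with Server-Sent Events, the sketch below shows what a single frame on the /stream endpoint could look like on the wire. The frame layout (an `event:` field, a `data:` field, and a blank-line terminator) follows the SSE standard. The helper name `formatSseEvent`, the event name, and the payload shape are hypothetical and are not taken from SSEBroadcaster.ts.

```typescript
// Hypothetical helper, not taken from SSEBroadcaster.ts: serialize one
// Server-Sent Events frame. JSON.stringify emits a single line here, so
// the payload fits in one "data:" field.
function formatSseEvent(event: string, payload: unknown): string {
  return `event: ${event}\ndata: ${JSON.stringify(payload)}\n\n`;
}

// The event name and payload are made up for illustration.
const frame = formatSseEvent("observation", { id: 1 });
```

A browser client would typically consume such frames with `EventSource` and `addEventListener` using the same event name.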
SessionRoutes.ts
Session lifecycle operations (use service layer directly):
- POST /sessions/init - Initialize new session
- POST /sessions/:sessionId/observations - Add tool usage observations
- POST /sessions/:sessionId/summarize - Trigger session summary
- GET /sessions/:sessionId/status - Get session status
- DELETE /sessions/:sessionId - Delete session
- POST /sessions/:sessionId/complete - Mark session complete
- POST /sessions/claude-id/:claudeId/observations - Add observations by claude_id
- POST /sessions/claude-id/:claudeId/summarize - Summarize by claude_id
- POST /sessions/claude-id/:claudeId/complete - Complete by claude_id
DataRoutes.ts
Data retrieval operations (use service layer directly):
- GET /observations - List observations (paginated)
- GET /summaries - List session summaries (paginated)
- GET /prompts - List user prompts (paginated)
- GET /observations/:id - Get observation by ID
- GET /sessions/:sessionId - Get session by ID
- GET /prompts/:id - Get prompt by ID
- GET /stats - Get database statistics
- GET /projects - List all projects
- GET /processing - Get processing status
- POST /processing - Set processing status
SearchRoutes.ts
All search operations (proxy to MCP server):
- GET /search - Unified search (observations + sessions + prompts)
- GET /timeline - Unified timeline context
- GET /decisions - Decision-type observations
- GET /changes - Change-related observations
- GET /how-it-works - How-it-works explanations
- GET /search/observations - Search observations
- GET /search/sessions - Search sessions
- GET /search/prompts - Search prompts
- GET /search/by-concept - Find by concept tag
- GET /search/by-file - Find by file path
- GET /search/by-type - Find by observation type
- GET /search/recent-context - Get recent context
- GET /search/context-timeline - Get context timeline
- GET /context/preview - Preview context
- GET /context/inject - Inject context
- GET /search/timeline-by-query - Timeline by search query
- GET /search/help - Search help
SettingsRoutes.ts
Settings and configuration (use service layer directly):
- GET /settings - Get user settings
- POST /settings - Update user settings
- GET /mcp/status - Get MCP server status
- POST /mcp/toggle - Toggle MCP server on/off
- GET /branch/status - Get git branch info
- POST /branch/switch - Switch git branch
- POST /branch/update - Pull branch updates
Current State (Phase 1)
Phase 1 is a pure code reorganization with ZERO functional changes:
- Extract route handlers from WorkerService.ts monolith
- Organize into logical route classes
- Keep all existing behavior identical
MCP vs Direct DB Split (inherited, not changed in Phase 1):
- Search operations → MCP server (mem-search)
- Session/data operations → Direct DB access via service layer
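The split above can be pictured as a small classifier over request paths. This is only a sketch: the prefixes are copied from the SearchRoutes.ts list, while the `Backend` type and the `backendFor` function are hypothetical names introduced for illustration. The worker itself expresses the split through separate route classes, not a lookup like this.

```typescript
type Backend = "mcp" | "service";

// Path prefixes taken from the SearchRoutes.ts endpoint list in this README.
const SEARCH_PREFIXES = [
  "/search",
  "/timeline",
  "/decisions",
  "/changes",
  "/how-it-works",
  "/context",
];

// Hypothetical helper: search-style paths go to the MCP server (mem-search),
// everything else goes to the service layer with direct DB access.
function backendFor(path: string): Backend {
  const isSearch = SEARCH_PREFIXES.some(
    (prefix) => path === prefix || path.startsWith(prefix + "/")
  );
  return isSearch ? "mcp" : "service";
}
```

Matching on a whole path segment (exact match or prefix plus "/") keeps an unrelated path such as /searchable from being classified as a search route.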
Future Phase 2
Phase 2 will unify the architecture:
- Expand MCP server to handle ALL operations (not just search)
- Convert all route handlers to proxy through MCP
- Move database logic from service layer into MCP tools
- Result: Worker becomes pure HTTP → MCP proxy for maximum portability
This separation allows the worker to be deployed anywhere (as a CLI tool, cloud service, etc.) without carrying database dependencies.
Adding New Endpoints
- Choose the appropriate route file based on the endpoint's purpose
- Add the route handler method to the class
- Register the route in the setupRoutes() method
- Import any needed services in the constructor
- Follow the existing patterns for error handling and logging
Example:
// In DataRoutes.ts
private async handleGetFoo(req: Request, res: Response): Promise<void> {
  try {
    const result = await this.dbManager.getFoo();
    res.json(result);
  } catch (error) {
    logger.failure('WORKER', 'Get foo failed', {}, error as Error);
    res.status(500).json({ error: (error as Error).message });
  }
}

// Register in setupRoutes()
app.get('/foo', this.handleGetFoo.bind(this));
Key Design Principles
- Progressive Disclosure: Navigate from high-level (WorkerService.ts) to specific routes to implementation details
- Single Responsibility: Each route class handles one feature area
- Dependency Injection: Route classes receive only the services they need
- Consistent Error Handling: All handlers use try/catch with logger.failure()
- Bound Methods: All route handlers use .bind(this) to preserve context
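To show why the Dependency Injection and Bound Methods principles matter together, here is a self-contained sketch that does not depend on Express. `FooService`, `FooRoutes`, the `Handler` type, and the `Map`-based registry are all hypothetical stand-ins. Only the `setupRoutes()` and `.bind(this)` pattern is taken from this README.

```typescript
// Hypothetical stand-ins for illustration; not claude-mem or Express types.
type Handler = () => string;

class FooService {
  getFoo(): string {
    return "foo";
  }
}

class FooRoutes {
  // Dependency injection: the route class receives only the service it needs.
  constructor(private readonly service: FooService) {}

  handleGetFoo(): string {
    return this.service.getFoo();
  }

  // Mirrors setupRoutes(): handlers are registered with .bind(this).
  setupRoutes(register: (path: string, handler: Handler) => void): void {
    register("/foo", this.handleGetFoo.bind(this));
  }
}

// A Map stands in for the Express app's route table.
const routes = new Map<string, Handler>();
const fooRoutes = new FooRoutes(new FooService());
fooRoutes.setupRoutes((path, handler) => routes.set(path, handler));

// The bound handler keeps its FooRoutes instance when called by the registry.
const bound = routes.get("/foo")!;
const boundResult = bound();

// Without .bind(this), `this` is undefined inside the handler and it throws.
const unbound: Handler = fooRoutes.handleGetFoo;

function throwsWhenCalled(handler: Handler): boolean {
  try {
    handler();
    return false;
  } catch {
    return true;
  }
}

const unboundThrows = throwsWhenCalled(unbound);
```

The unbound call fails because class bodies run in strict mode, so a detached method sees `this` as `undefined` and the access to `this.service` throws.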