claude-mem/src/services/worker
Alex Newman c6f932988a Fix 30+ root-cause bugs across 10 triage phases (#1214)
* MAESTRO: fix ChromaDB core issues — Python pinning, Windows paths, disable toggle, metadata sanitization, transport errors

- Add --python version pinning to uvx args in both local and remote mode (fixes #1196, #1206, #1208)
- Convert backslash paths to forward slashes for --data-dir on Windows (fixes #1199)
- Add CLAUDE_MEM_CHROMA_ENABLED setting for SQLite-only fallback mode (fixes #707)
- Sanitize metadata in addDocuments() to filter null/undefined/empty values (fixes #1183, #1188)
- Wrap callTool() in try/catch for transport errors with auto-reconnect (fixes #1162)

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* MAESTRO: fix data integrity — content-hash deduplication, project name collision, empty project guard, stuck isProcessing

- Add SHA-256 content-hash deduplication to observations INSERT (store.ts, transactions.ts, SessionStore.ts)
- Add content_hash column via migration 22 with backfill and index
- Fix project name collision: getCurrentProjectName() now returns parent/basename
- Guard against empty project string with cwd-derived fallback
- Fix stuck isProcessing: hasAnyPendingWork() resets processing messages older than 5 minutes
- Add 12 new tests covering all four fixes

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* MAESTRO: fix hook lifecycle — stderr suppression, output isolation, conversation pollution prevention

- Suppress process.stderr.write in hookCommand() to prevent Claude Code showing diagnostic
  output as error UI (#1181). Restores stderr in finally block for worker-continues case.
- Convert console.error() to logger.warn()/error() in hook-command.ts and handlers/index.ts
  so all diagnostics route to log file instead of stderr.
- Verified all 7 handlers return suppressOutput: true (prevents conversation pollution #598, #784).
- Verified session-complete is a recognized event type (fixes #984).
- Verified unknown event types return no-op handler with exit 0 (graceful degradation).
- Added 10 new tests in tests/hook-lifecycle.test.ts covering event dispatch, adapter defaults,
  stderr suppression, and standard response constants.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* MAESTRO: fix worker lifecycle — restart loop coordination, stale transport retry, ENOENT shutdown race

- Add PID file mtime guard to prevent concurrent restart storms (#1145):
  isPidFileRecent() + touchPidFile() coordinate across sessions
- Add transparent retry in ChromaMcpManager.callTool() on transport
  error — reconnects and retries once instead of failing (#1131)
- Wrap getInstalledPluginVersion() with ENOENT/EBUSY handling (#1042)
- Verified ChromaMcpManager.stop() already called on all shutdown paths

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* MAESTRO: fix Windows platform support — uvx.cmd spawn, PowerShell $_ elimination, windowsHide, FTS5 fallback

- Route uvx spawn through cmd.exe /c on Windows since MCP SDK lacks shell:true (#1190, #1192, #1199)
- Replace all PowerShell Where-Object {$_} pipelines with WQL -Filter server-side filtering (#1024, #1062)
- Add windowsHide: true to all exec/spawn calls missing it to prevent console popups (#1048)
- Add FTS5 runtime probe with graceful fallback when unavailable on Windows (#791)
- Guard FTS5 table creation in migrations, SessionSearch, and SessionStore with try/catch

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* MAESTRO: fix skills/ distribution — build-time verification and regression tests (#1187)

Add post-build verification in build-hooks.js that fails if critical
distribution files (skills, hooks, plugin manifest) are missing. Add
10 regression tests covering skill file presence, YAML frontmatter,
hooks.json integrity, and package.json files field.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* MAESTRO: fix MigrationRunner schema initialization (#979) — version conflict between parallel migration systems

Root cause: old DatabaseManager migrations 1-7 shared schema_versions table with
MigrationRunner's 4-22, causing version number collisions (5=drop tables vs add column,
6=FTS5 vs prompt tracking, 7=discovery_tokens vs remove UNIQUE).  initializeSchema()
was gated behind maxApplied===0, so core tables were never created when old versions
were present.

Fixes:
- initializeSchema() always creates core tables via CREATE TABLE IF NOT EXISTS
- Migrations 5-7 check actual DB state (columns/constraints) not just version tracking
- Crash-safe temp table rebuilds (DROP IF EXISTS _new before CREATE)
- Added missing migration 21 (ON UPDATE CASCADE) to MigrationRunner
- Added ON UPDATE CASCADE to FK definitions in initializeSchema()
- All changes applied to both runner.ts and SessionStore.ts

Tests: 13 new tests in migration-runner.test.ts covering fresh DB, idempotency,
version conflicts, crash recovery, FK constraints, and data integrity.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* MAESTRO: fix 21 test failures — stale mocks, outdated assertions, missing OpenClaw guards

Server tests (12): Added missing workerPath and getAiStatus to ServerOptions
mocks after interface expansion. ChromaSync tests (3): Updated to verify
transport cleanup in ChromaMcpManager after architecture refactor. OpenClaw (2):
Added memory_ tool skipping and response truncation to prevent recursive loops
and oversized payloads. MarkdownFormatter (2): Updated assertions to match
current output. SettingsDefaultsManager (1): Used correct default key for
getBool test. Logger standards (1): Excluded CLI transcript command from
background service check.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* MAESTRO: fix Codex CLI compatibility (#744) — session_id fallbacks, unknown platform tolerance, undefined guard

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* MAESTRO: fix Cursor IDE integration (#838, #1049) — adapter field fallbacks, tolerant session-init validation

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* MAESTRO: fix /api/logs OOM (#1203) — tail-read replaces full-file readFileSync

Replace readFileSync (loads entire file into memory) with readLastLines()
that reads only from the end of the file in expanding chunks (64KB → 10MB cap).
Prevents OOM on large log files while preserving the same API response shape.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* MAESTRO: fix Settings CORS error (#1029) — explicit methods and allowedHeaders in CORS config

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* MAESTRO: add session custom_title for agent attribution (#1213) — migration 23, endpoint + store support

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* MAESTRO: prevent CLAUDE.md/AGENTS.md writes inside .git/ directories (#1165)

Add .git path guard to all 4 write sites to prevent ref corruption when
paths resolve inside .git internals.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* MAESTRO: fix plugin disabled state not respected (#781) — early exit check in all hook entry points

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* MAESTRO: fix UserPromptSubmit context re-injection on every turn (#1079) — contextInjected session flag

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* MAESTRO: fix stale AbortController queue stall (#1099) — lastGeneratorActivity tracking + 30s timeout

Three-layer fix:
1. Added lastGeneratorActivity timestamp to ActiveSession, updated by
   processAgentResponse (all agents), getMessageIterator (queue yields),
   and startGeneratorWithProvider (generator launch)
2. Added stale generator detection in ensureGeneratorRunning — if no
   activity for >30s, aborts stale controller, resets state, restarts
3. Added AbortSignal.timeout(30000) in deleteSession to prevent
   indefinite hang when awaiting a stuck generator promise

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-23 19:34:35 -05:00

Worker Service Architecture

Overview

The Worker Service is an Express HTTP server that handles all claude-mem operations. It runs on port 37777 (configurable via CLAUDE_MEM_WORKER_PORT) and is managed by PM2.
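
A minimal sketch of how the port could be resolved, assuming the override is read from an environment map. The helper name resolveWorkerPort and the validation rules are hypothetical; only the default (37777) and the variable name come from this README.

```typescript
// Default port stated in this README.
const DEFAULT_WORKER_PORT = 37777;

// Hypothetical helper: resolves the worker port from an environment map.
// Falls back to the default when the variable is unset, empty, or not a
// valid TCP port number.
function resolveWorkerPort(env: Record<string, string | undefined>): number {
  const raw = env["CLAUDE_MEM_WORKER_PORT"];
  if (raw === undefined || raw.trim() === "") {
    return DEFAULT_WORKER_PORT;
  }
  const port = Number(raw);
  const valid =
    isFinite(port) && Math.floor(port) === port && port >= 1 && port <= 65535;
  return valid ? port : DEFAULT_WORKER_PORT;
}
```

In a Node process this would typically be called as resolveWorkerPort(process.env).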

Request Flow

Hook (plugin/scripts/*-hook.js)
  → HTTP Request to Worker (localhost:37777)
    → Route Handler (http/routes/*.ts)
      → MCP Server Tool (for search) OR Service Layer (for session/data)
        → Database (SQLite3 + Chroma vector DB)

Directory Structure

src/services/worker/
├── README.md                     # This file
├── WorkerService.ts              # Slim orchestrator (~150 lines)
├── http/                         # HTTP layer
│   ├── middleware.ts             # Shared middleware (logging, CORS, etc.)
│   └── routes/                   # Route handlers organized by feature area
│       ├── SessionRoutes.ts      # Session lifecycle (init, observations, summarize, complete)
│       ├── DataRoutes.ts         # Data retrieval (get observations, summaries, prompts, stats)
│       ├── SearchRoutes.ts       # Search/MCP proxy (all search endpoints)
│       ├── SettingsRoutes.ts     # Settings, MCP toggle, branch switching
│       └── ViewerRoutes.ts       # Health check, viewer UI, SSE stream
└── services/                     # Business logic services (existing, NO CHANGES in Phase 1)
    ├── DatabaseManager.ts        # SQLite connection management
    ├── SessionManager.ts         # Session state tracking
    ├── SDKAgent.ts               # Claude Agent SDK for observations/summaries
    ├── SSEBroadcaster.ts         # Server-Sent Events for real-time updates
    ├── PaginationHelper.ts       # Query pagination utilities
    ├── SettingsManager.ts        # User settings CRUD
    └── BranchManager.ts          # Git branch operations

Route Organization

ViewerRoutes.ts

  • GET /health - Health check endpoint
  • GET / - Serve viewer UI (React app)
  • GET /stream - SSE stream for real-time updates

SessionRoutes.ts

Session lifecycle operations (use service layer directly):

  • POST /sessions/init - Initialize new session
  • POST /sessions/:sessionId/observations - Add tool usage observations
  • POST /sessions/:sessionId/summarize - Trigger session summary
  • GET /sessions/:sessionId/status - Get session status
  • DELETE /sessions/:sessionId - Delete session
  • POST /sessions/:sessionId/complete - Mark session complete
  • POST /sessions/claude-id/:claudeId/observations - Add observations by claude_id
  • POST /sessions/claude-id/:claudeId/summarize - Summarize by claude_id
  • POST /sessions/claude-id/:claudeId/complete - Complete by claude_id

DataRoutes.ts

Data retrieval operations (use service layer directly):

  • GET /observations - List observations (paginated)
  • GET /summaries - List session summaries (paginated)
  • GET /prompts - List user prompts (paginated)
  • GET /observations/:id - Get observation by ID
  • GET /sessions/:sessionId - Get session by ID
  • GET /prompts/:id - Get prompt by ID
  • GET /stats - Get database statistics
  • GET /projects - List all projects
  • GET /processing - Get processing status
  • POST /processing - Set processing status

SearchRoutes.ts

All search operations (proxy to MCP server):

  • GET /search - Unified search (observations + sessions + prompts)
  • GET /timeline - Unified timeline context
  • GET /decisions - Decision-type observations
  • GET /changes - Change-related observations
  • GET /how-it-works - How-it-works explanations
  • GET /search/observations - Search observations
  • GET /search/sessions - Search sessions
  • GET /search/prompts - Search prompts
  • GET /search/by-concept - Find by concept tag
  • GET /search/by-file - Find by file path
  • GET /search/by-type - Find by observation type
  • GET /search/recent-context - Get recent context
  • GET /search/context-timeline - Get context timeline
  • GET /context/preview - Preview context
  • GET /context/inject - Inject context
  • GET /search/timeline-by-query - Timeline by search query
  • GET /search/help - Search help

SettingsRoutes.ts

Settings and configuration (use service layer directly):

  • GET /settings - Get user settings
  • POST /settings - Update user settings
  • GET /mcp/status - Get MCP server status
  • POST /mcp/toggle - Toggle MCP server on/off
  • GET /branch/status - Get git branch info
  • POST /branch/switch - Switch git branch
  • POST /branch/update - Pull branch updates

Current State (Phase 1)

Phase 1 is a pure code reorganization with ZERO functional changes:

  • Extract route handlers from WorkerService.ts monolith
  • Organize into logical route classes
  • Keep all existing behavior identical

MCP vs Direct DB Split (inherited, not changed in Phase 1):

  • Search operations → MCP server (mem-search)
  • Session/data operations → Direct DB access via service layer
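
The split can be expressed as a small lookup. This is a sketch, not the worker's actual dispatch code: backendFor is a hypothetical helper, and the prefix list is derived from the SearchRoutes.ts endpoints listed in this README.

```typescript
type Backend = "mcp" | "service";

// Path prefixes taken from the SearchRoutes.ts list in this README.
const MCP_PREFIXES = [
  "/search",
  "/timeline",
  "/decisions",
  "/changes",
  "/how-it-works",
  "/context",
];

// Hypothetical helper: reports which backend serves a request path.
// Expects the path only, without a query string.
function backendFor(path: string): Backend {
  const isSearch = MCP_PREFIXES.some(
    (prefix) => path === prefix || path.indexOf(prefix + "/") === 0
  );
  return isSearch ? "mcp" : "service";
}
```

Matching on whole path segments keeps an unrelated path such as /searching from being routed to the MCP server by accident.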

Future Phase 2

Phase 2 will unify the architecture:

  1. Expand MCP server to handle ALL operations (not just search)
  2. Convert all route handlers to proxy through MCP
  3. Move database logic from service layer into MCP tools
  4. Result: the Worker becomes a pure HTTP → MCP proxy for maximum portability

Once the database logic lives behind MCP, the worker can be deployed anywhere (as a CLI tool, cloud service, etc.) without carrying database dependencies.

Adding New Endpoints

  1. Choose the appropriate route file based on the endpoint's purpose
  2. Add the route handler method to the class
  3. Register the route in the setupRoutes() method
  4. Import any needed services and accept them through the constructor
  5. Follow the existing patterns for error handling and logging

Example:

// In DataRoutes.ts
private async handleGetFoo(req: Request, res: Response): Promise<void> {
  try {
    const result = await this.dbManager.getFoo();
    res.json(result);
  } catch (error) {
    logger.failure('WORKER', 'Get foo failed', {}, error as Error);
    res.status(500).json({ error: (error as Error).message });
  }
}

// Register in setupRoutes()
app.get('/foo', this.handleGetFoo.bind(this));
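
The snippet above assumes a surrounding route class. A self-contained sketch of that class follows, under stated assumptions: Req, Res, App, FooService, and FooRoutes are hypothetical stand-ins (the real code uses Express types and the project's own services), the handler is synchronous to keep the sketch short, and the logger.failure() call is left out.

```typescript
// Minimal stand-ins for the Express types so the sketch has no dependencies.
interface Req {
  params: Record<string, string>;
}
interface Res {
  status(code: number): Res;
  json(body: unknown): void;
}
type Handler = (req: Req, res: Res) => void;
interface App {
  get(path: string, handler: Handler): void;
}

// Hypothetical service; the real DatabaseManager API is not shown in this README.
interface FooService {
  getFoo(): string[];
}

class FooRoutes {
  // Dependency injection: the class receives only the service it needs.
  constructor(private readonly foo: FooService) {}

  setupRoutes(app: App): void {
    // bind(this) keeps this.foo reachable when the app invokes the handler.
    app.get("/foo", this.handleGetFoo.bind(this));
  }

  private handleGetFoo(_req: Req, res: Res): void {
    try {
      res.json(this.foo.getFoo());
    } catch (error) {
      // The real handlers also log the error before responding.
      res.status(500).json({ error: (error as Error).message });
    }
  }
}
```

Because the class depends only on the App and FooService interfaces, it can be exercised with in-memory fakes instead of a running server.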

Key Design Principles

  1. Progressive Disclosure: Read from the high-level orchestrator (WorkerService.ts) down to specific routes, then to implementation details
  2. Single Responsibility: Each route class handles one feature area
  3. Dependency Injection: Route classes receive only the services they need
  4. Consistent Error Handling: All handlers use try/catch with logger.failure()
  5. Bound Methods: All route handlers use .bind(this) to preserve context