# Claude-Mem Worker Server Architecture
**Document Version:** 1.0
**Last Updated:** 2025-01-24
**Author:** Analysis by Claude Code
**Purpose:** Comprehensive technical analysis of the worker server architecture, logic flow, blocking behavior, and component value assessment
---
## Executive Summary
The claude-mem worker server is a long-running HTTP service managed by PM2 that processes tool execution observations and generates session summaries using the Claude Agent SDK. It implements a **defensive, layered architecture** designed to maximize data persistence while maintaining flexibility.
### Key Design Principles
1. **Maximally Permissive Storage** - System defaults to saving data even if incomplete
2. **Auto-Recovery** - Worker restarts don't prevent processing (session state reconstructed from database)
3. **Queue-Based Processing** - HTTP API decoupled from AI processing for reliability
4. **Defensive Programming** - Auto-creates missing database records, accepts null fields
5. **Session Isolation** - Each session has independent state and SDK agent
### Architecture at a Glance
```
┌─────────────────────────────────────────────────────────────┐
│ Layer 1: HTTP API (Express.js)                              │
│ - 6 REST endpoints                                          │
│ - Always queues messages (maximally permissive)             │
└──────────────────┬──────────────────────────────────────────┘
                   │
┌──────────────────▼──────────────────────────────────────────┐
│ Layer 2: In-Memory Queue                                    │
│ - pendingMessages array per session                         │
│ - VULNERABILITY: Lost on worker restart                     │
└──────────────────┬──────────────────────────────────────────┘
                   │
┌──────────────────▼──────────────────────────────────────────┐
│ Layer 3: SDK Agent (Claude Agent SDK)                       │
│ - Processes queued messages via async generator             │
│ - Can fail due to config or AI errors                       │
└──────────────────┬──────────────────────────────────────────┘
                   │
┌──────────────────▼──────────────────────────────────────────┐
│ Layer 4: Parser (XML Extraction)                            │
│ - Extracts observations and summaries from AI responses     │
│ - Permissive (v4.2.5/v4.2.6 fixes ensure partial data saved)│
└──────────────────┬──────────────────────────────────────────┘
                   │
┌──────────────────▼──────────────────────────────────────────┐
│ Layer 5: Database (SQLite with better-sqlite3)              │
│ - Permanent storage (once here, data persists)              │
│ - Auto-creates missing sessions, accepts nulls              │
└─────────────────────────────────────────────────────────────┘
```
**Critical Insight:** Once the HTTP API has accepted a message, data can only be lost in layers 2-4. Once it reaches the database (layer 5), it's permanent.
---
## Component Inventory
### HTTP REST API Endpoints
| Endpoint | Purpose | Blocks Data? |
|----------|---------|--------------|
| `GET /health` | Worker health check | N/A |
| `POST /sessions/:id/init` | Initialize session and start SDK agent | Only if session not in DB (expected) |
| `POST /sessions/:id/observations` | Queue tool observation | ❌ Never (auto-recovery) |
| `POST /sessions/:id/summarize` | Queue summary request | ❌ Never (auto-recovery) |
| `GET /sessions/:id/status` | Get session status | N/A |
| `DELETE /sessions/:id` | Abort session | ⚠️ Queued messages lost |
### Core Processing Components
| Component | File | Lines | Purpose |
|-----------|------|-------|---------|
| WorkerService | worker-service.ts | 52-590 | Main service class, manages sessions |
| runSDKAgent | worker-service.ts | 345-404 | Runs SDK agent for a session |
| createMessageGenerator | worker-service.ts | 410-502 | Async generator feeding SDK |
| handleAgentMessage | worker-service.ts | 508-563 | Parses and stores SDK responses |
| parseObservations | parser.ts | 32-96 | Extracts observations from XML |
| parseSummary | parser.ts | 102-157 | Extracts summary from XML |
| SessionStore | SessionStore.ts | 9-1086 | Database operations |
---
## Deep Dive: HTTP Endpoints
### GET /health (lines 100-109)
**Purpose:** Health check for monitoring and debugging
**Logic Flow:**
1. Returns JSON with status, port, PID, active sessions, uptime, memory
**Blocking Analysis:** ❌ N/A (read-only endpoint)
**Value Assessment:** ✅ HIGH VALUE
- Essential for monitoring worker health
- Helps debug port conflicts and process state
- Keep as-is
---
### POST /sessions/:sessionDbId/init (lines 115-169)
**Purpose:** Initialize a new session and start the SDK agent
**Logic Flow:**
1. Parse `sessionDbId` from URL
2. Extract `project` and `userPrompt` from request body
3. Fetch session from database using `SessionStore.getSessionById()`
4. **CRITICAL CHECK:** Return 404 if session not found in DB
5. Retrieve `claudeSessionId` from database record
6. Create `ActiveSession` object with initial state:
```typescript
{
sessionDbId, claudeSessionId, sdkSessionId: null,
project, userPrompt, pendingMessages: [],
abortController: new AbortController(),
generatorPromise: null, lastPromptNumber: 0,
observationCounter: 0, startTime: Date.now()
}
```
7. Store session in memory map (`this.sessions`)
8. Update `worker_port` in database
9. Start `runSDKAgent()` in background (fire-and-forget promise)
10. Return success response immediately
**Blocking Analysis:** ⚠️ CONDITIONAL
- Returns 404 if session doesn't exist in database
- This is expected behavior - session must be created before init
- Doesn't prevent future initialization attempts
- Error logged and hook can retry
**Value Assessment:** ✅ HIGH VALUE
- Critical initialization step
- Background SDK agent startup prevents timeout
- Keep as-is
**Edge Cases:**
- Session exists but SDK agent fails to start → Session marked as failed, but new init can retry
- Multiple init calls for same session → First one wins (subsequent calls find session in memory)
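
Steps 9-10 of the init flow (start the agent, respond without waiting) can be sketched as follows. This is a minimal illustration, not the worker's code: `InitSession`, `startInBackground`, and the response shape are hypothetical names chosen for the example, and the agent is passed in as a plain function.

```typescript
// Sketch of the fire-and-forget startup in steps 9-10.
// `InitSession`, `startInBackground`, and the returned object are
// illustrative, not the worker's actual identifiers.
interface InitSession {
  sessionDbId: number;
  generatorPromise: Promise<void> | null;
  failed: boolean;
}

function startInBackground(
  session: InitSession,
  runAgent: (s: InitSession) => Promise<void>,
): { status: string; sessionDbId: number } {
  // Attach the catch handler up front so a failing agent never becomes an
  // unhandled rejection. The caller does not await this promise.
  session.generatorPromise = runAgent(session).catch((err: unknown) => {
    session.failed = true;
    console.error(`agent failed for session ${session.sessionDbId}:`, err);
  });
  // Respond immediately; processing continues in the background.
  return { status: "initialized", sessionDbId: session.sessionDbId };
}
```

Because the promise is stored on the session, a later abort can still wait for it to settle.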
---
### POST /sessions/:sessionDbId/observations (lines 175-230)
**Purpose:** Queue a tool execution observation for processing
**Logic Flow:**
1. Parse `sessionDbId` from URL
2. Extract `tool_name`, `tool_input`, `tool_output`, `prompt_number` from body
3. Check if session exists in memory map (`this.sessions.get(sessionDbId)`)
4. **AUTO-RECOVERY** (lines 181-209): If session NOT in memory:
- Fetch session from database
- Recreate `ActiveSession` object
- Start new SDK agent in background
- This enables recovery from worker restarts!
5. Increment `observationCounter` for correlation ID tracking
6. Push observation message to `pendingMessages` queue:
```typescript
{
type: 'observation',
tool_name, tool_input, tool_output, prompt_number
}
```
7. Return success with queue length
**Blocking Analysis:** ❌ NEVER BLOCKS
- Auto-creates session state from database if missing
- Always queues the observation
- HTTP response confirms receipt immediately
- Processing happens asynchronously
**Value Assessment:** ✅ HIGH VALUE
- Auto-recovery is brilliant design
- Worker restart doesn't lose ability to process observations
- Keep as-is
**Edge Cases:**
- Worker restart while observation in queue → Lost (queue is in-memory)
- But NEW observations after restart are queued successfully (auto-recovery)
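- Session missing from database → The recovery code assumes the lookup returns a row and would throw; SessionStore's session auto-creation applies only when storing observations and summaries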
---
### POST /sessions/:sessionDbId/summarize (lines 236-284)
**Purpose:** Queue a summary generation request
**Logic Flow:**
1. Parse `sessionDbId` and `prompt_number` from request
2. Check if session exists in memory
3. **AUTO-RECOVERY** (lines 241-270): Same pattern as observations endpoint
- Fetches session from database
- Recreates `ActiveSession` object
- Starts new SDK agent
4. Push summarize message to `pendingMessages` queue:
```typescript
{
type: 'summarize',
prompt_number
}
```
5. Return success with queue length
**Blocking Analysis:** ❌ NEVER BLOCKS
- Same auto-recovery mechanism as observations
- Always queues the summary request
- Processing happens asynchronously
**Value Assessment:** ✅ HIGH VALUE
- Auto-recovery keeps new summary requests from being rejected after a worker restart
- Keep as-is
**Code Quality Note:** ⚠️ MEDIUM - Duplicated auto-recovery code (lines 181-209 and 241-270 are nearly identical)
- Could extract to helper function: `getOrCreateSession(sessionDbId)`
- Would reduce duplication and improve maintainability
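
One possible shape for the suggested helper is sketched below. The `ActiveSession` fields follow the object shown for the init endpoint; `SessionRow`, `SessionRegistry`, and the injected `lookup` and `runAgent` callbacks are assumptions made so the example runs without the real `SessionStore` or SDK.

```typescript
// Sketch of a shared lookup-or-rebuild helper. `SessionRow`, `SessionRegistry`,
// and the injected callbacks are hypothetical stand-ins for the real
// SessionStore and runSDKAgent.
interface SessionRow {
  claude_session_id: string;
  project: string;
  user_prompt: string;
}

interface ActiveSession {
  sessionDbId: number;
  claudeSessionId: string;
  sdkSessionId: string | null;
  project: string;
  userPrompt: string;
  pendingMessages: unknown[];
  abortController: AbortController;
  generatorPromise: Promise<void> | null;
  lastPromptNumber: number;
  observationCounter: number;
  startTime: number;
}

class SessionRegistry {
  private sessions = new Map<number, ActiveSession>();
  private lookup: (id: number) => SessionRow | null;
  private runAgent: (s: ActiveSession) => Promise<void>;

  constructor(
    lookup: (id: number) => SessionRow | null,
    runAgent: (s: ActiveSession) => Promise<void>,
  ) {
    this.lookup = lookup;
    this.runAgent = runAgent;
  }

  getOrCreateSession(sessionDbId: number): ActiveSession {
    const existing = this.sessions.get(sessionDbId);
    if (existing) return existing;

    // Rebuild in-memory state from the database row.
    const row = this.lookup(sessionDbId);
    if (!row) throw new Error(`Session ${sessionDbId} not found in database`);

    const session: ActiveSession = {
      sessionDbId,
      claudeSessionId: row.claude_session_id,
      sdkSessionId: null,
      project: row.project,
      userPrompt: row.user_prompt,
      pendingMessages: [],
      abortController: new AbortController(),
      generatorPromise: null,
      lastPromptNumber: 0,
      observationCounter: 0,
      startTime: Date.now(),
    };
    this.sessions.set(sessionDbId, session);
    session.generatorPromise = this.runAgent(session).catch((err: unknown) => {
      console.error(`agent failed for session ${sessionDbId}:`, err);
    });
    return session;
  }
}
```

Both endpoints could then call `getOrCreateSession(sessionDbId)` before pushing to `pendingMessages`. The explicit "not found" error is an addition in this sketch.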
---
### GET /sessions/:sessionDbId/status (lines 289-304)
**Purpose:** Get current session status and queue length
**Logic Flow:**
1. Parse `sessionDbId` from URL
2. Get session from memory map
3. Return 404 if not found
4. Return session info: `sessionDbId`, `sdkSessionId`, `project`, `pendingMessages.length`
**Blocking Analysis:** ❌ N/A (read-only endpoint)
**Value Assessment:** ✅ MEDIUM VALUE
- Useful for debugging
- Not critical for core functionality
- Keep as-is
---
### DELETE /sessions/:sessionDbId (lines 309-340)
**Purpose:** Abort a running session and clean up
**Logic Flow:**
1. Parse `sessionDbId` from URL
2. Get session from memory map
3. Return 404 if not found
4. Call `abortController.abort()` to signal SDK agent to stop
5. Wait for `generatorPromise` to finish (max 5 seconds timeout)
6. Mark session as 'failed' in database
7. Delete session from memory map
8. Return success
**Blocking Analysis:** ⚠️ BLOCKS QUEUED MESSAGES
- Aborts SDK agent processing
- Any messages in `pendingMessages` queue are lost
- Already-stored observations/summaries remain in database
**Value Assessment:** ✅ MEDIUM VALUE
- Provides clean shutdown mechanism
- Used for manual cleanup
- As of v4.1.0, SessionEnd hook doesn't call DELETE (graceful cleanup)
- Keep for manual intervention, but not used automatically
**Historical Note:**
- v4.0.x: SessionEnd hook called DELETE → interrupted summary generation
- v4.1.0+: Graceful cleanup → workers finish naturally
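
The "abort, then wait at most 5 seconds" behavior in steps 4-5 can be approximated with `Promise.race`. The sketch below is an assumption about how such a wait could be written, not the worker's implementation; `abortAndWait` and its `timeoutMs` parameter are hypothetical.

```typescript
// Sketch of "signal abort, then wait up to timeoutMs for the agent to stop".
// `abortAndWait` is an illustrative helper; the document only states that the
// wait is capped at 5 seconds.
async function abortAndWait(
  abortController: AbortController,
  generatorPromise: Promise<void> | null,
  timeoutMs = 5000,
): Promise<"finished" | "timed_out"> {
  abortController.abort();
  if (!generatorPromise) return "finished";

  let timer: ReturnType<typeof setTimeout> | undefined;
  const timeout = new Promise<"timed_out">((resolve) => {
    timer = setTimeout(() => resolve("timed_out"), timeoutMs);
  });
  const finished = generatorPromise.then(
    () => "finished" as const,
    () => "finished" as const, // a rejected agent has still stopped
  );

  try {
    return await Promise.race([finished, timeout]);
  } finally {
    clearTimeout(timer); // avoid keeping the process alive after the race settles
  }
}
```

Passing the timeout as a parameter would also make it easy to source from an environment variable.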
---
## Deep Dive: SDK Agent Processing
### runSDKAgent (lines 345-404)
**Purpose:** Core processing engine that runs continuously for each session
**Logic Flow:**
1. Call `query()` from Claude Agent SDK with:
```typescript
{
prompt: this.createMessageGenerator(session),
options: {
model: MODEL, // from CLAUDE_MEM_MODEL env var
disallowedTools: DISALLOWED_TOOLS,
abortController: session.abortController,
pathToClaudeCodeExecutable: claudePath
}
}
```
2. Iterate over SDK responses using `for await`
3. For each assistant message:
- Extract text content from response
- Log response size
- Call `handleAgentMessage()` to parse and store
4. On completion:
- Log session duration
- Mark session as 'completed' in database
- Delete session from memory map
5. On error:
- Log error (or warning for AbortError)
- Mark session as 'failed' in database
- Throw error (caught by `generatorPromise.catch()`)
**Blocking Analysis:** ⚠️ CAN BLOCK IF:
- Invalid `CLAUDE_MEM_MODEL` → SDK initialization fails
- Invalid `CLAUDE_CODE_PATH` → SDK initialization fails
- SDK crashes → Session marked as failed
- BUT: Doesn't prevent NEW sessions from being created
**Value Assessment:** ✅ HIGH VALUE
- Core processing engine
- Proper error handling with session status tracking
- Keep as-is
**Configuration Dependencies:**
- `CLAUDE_MEM_MODEL` (default: 'claude-sonnet-4-5')
- `CLAUDE_CODE_PATH` (default: specific path to claude executable)
- `DISALLOWED_TOOLS` (default: ['Glob', 'Grep', 'ListMcpResourcesTool', 'WebSearch'])
**Edge Cases:**
- SDK hangs → AbortController provides cancellation mechanism
- Network issues → SDK handles retries internally
- Multiple concurrent sessions → Each has isolated SDK agent
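
The iterate, complete, and fail structure described above can be sketched without the SDK by treating the response stream as any `AsyncIterable`. Everything named here (`AgentEvent`, `SessionStatus`, `consumeAgent`) is illustrative; the real `query()` message shape is not reproduced.

```typescript
// Sketch of the control flow in runSDKAgent: iterate responses, then record
// 'completed' or 'failed'. The SDK stream is replaced by a generic
// AsyncIterable; all names below are hypothetical.
type SessionStatus = "active" | "completed" | "failed";

interface AgentEvent {
  type: string;
  text: string;
}

async function consumeAgent(
  responses: AsyncIterable<AgentEvent>,
  onAssistantText: (text: string) => void,
  setStatus: (status: SessionStatus) => void,
): Promise<void> {
  try {
    for await (const event of responses) {
      if (event.type === "assistant") onAssistantText(event.text);
    }
    setStatus("completed");
  } catch (err) {
    // Aborts are expected during shutdown, so they are logged as warnings.
    const aborted = err instanceof Error && err.name === "AbortError";
    if (aborted) console.warn("agent aborted");
    else console.error("agent failed:", err);
    setStatus("failed");
    throw err; // let the caller's .catch() observe the failure
  }
}
```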
---
### createMessageGenerator (lines 410-502)
**Purpose:** Async generator that feeds messages to the SDK agent (bridge between HTTP API and SDK)
**Logic Flow:**
1. Build init prompt using `buildInitPrompt(project, claudeSessionId, userPrompt)`
2. Yield initial user message:
```typescript
{
type: 'user',
session_id: claudeSessionId, // Real Claude Code session ID
parent_tool_use_id: null,
message: { role: 'user', content: initPrompt }
}
```
3. Enter infinite loop (`while (true)`):
- Check `abortController.signal.aborted` → break if aborted
- If no pending messages → sleep 100ms and continue
- While `pendingMessages.length > 0`:
- Shift message from queue (FIFO)
- If type === 'summarize':
- Update `lastPromptNumber`
- Fetch session from database
- Build summary prompt using `buildSummaryPrompt()`
- Yield summary prompt as user message
- If type === 'observation':
- Update `lastPromptNumber`
- Build observation prompt using `buildObservationPrompt()`
- Yield observation prompt as user message
**Blocking Analysis:** ❌ DOESN'T BLOCK
- Continuously processes queue until aborted
- 100ms polling means small delay but no data loss
- Messages shifted from queue and sent to SDK
- If SDK fails, messages lost from queue (but already confirmed via HTTP)
**Value Assessment:** ✅ HIGH VALUE
- Elegant async generator pattern
- Keep as-is
**Performance Note:** ⚠️ 100ms polling interval
- Could be improved with event-driven queue (e.g., `AsyncQueue` with notifications)
- Current implementation is simple and works well
- Low priority optimization
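
As a rough illustration of the event-driven alternative, the sketch below wakes the consumer as soon as a message is pushed instead of sleeping. `AsyncQueue` is the hypothetical name used in the note above; it is not an existing class in the worker, and it assumes `undefined` is never pushed as a message.

```typescript
// Sketch of an event-driven queue that could replace the 100ms polling loop.
// Hypothetical class; assumes `undefined` is never a valid message.
class AsyncQueue<T> {
  private items: T[] = [];
  private waiters: Array<(item: T | undefined) => void> = [];
  private closed = false;

  push(item: T): void {
    if (this.closed) return;
    const waiter = this.waiters.shift();
    if (waiter) waiter(item); // hand off directly to a waiting consumer
    else this.items.push(item);
  }

  close(): void {
    this.closed = true;
    for (const waiter of this.waiters.splice(0)) waiter(undefined);
  }

  private next(): Promise<T | undefined> {
    if (this.items.length > 0) return Promise.resolve(this.items.shift());
    if (this.closed) return Promise.resolve(undefined);
    return new Promise<T | undefined>((resolve) => {
      this.waiters.push(resolve);
    });
  }

  async *[Symbol.asyncIterator](): AsyncGenerator<T> {
    for (;;) {
      const item = await this.next();
      if (item === undefined) return; // closed and drained
      yield item;
    }
  }
}
```

A generator built on this could iterate with `for await (const message of queue)` and call `queue.close()` from an `abort` event listener, which would remove both the sleep and the explicit `aborted` check.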
**Data Flow:**
```
HTTP /observations → pendingMessages.push() → [sleep 100ms] →
pendingMessages.shift() → buildObservationPrompt() → yield to SDK →
SDK processes → handleAgentMessage()
```
---
### handleAgentMessage (lines 508-563)
**Purpose:** Parse SDK response and store observations/summaries in database
**Logic Flow:**
1. Call `parseObservations(content, correlationId)`
2. If observations found:
- For each observation:
- Call `db.storeObservation(claudeSessionId, project, observation, promptNumber)`
- Log success with correlation ID
3. Call `parseSummary(content, sessionId)`
4. If summary found:
- Call `db.storeSummary(claudeSessionId, project, summary, promptNumber)`
- Log success
5. If NO summary found:
- Log warning with content sample
**Blocking Analysis:** ⚠️ CAN BLOCK IF:
- Parser returns empty array/null → Nothing stored (but this is expected for routine operations)
- Database error → Would throw and crash handler (rare with permissive schema)
**Value Assessment:** ✅ HIGH VALUE
- Core storage logic
- Proper logging for debugging
- Keep as-is
**Critical Dependencies:**
- `parseObservations()` must return valid observations
- `parseSummary()` must return valid summary
- Database must accept the data (schema constraints)
**Logging:**
- Extensive logging at INFO, SUCCESS, and WARN levels
- Correlation IDs for tracking individual observations
- Debug mode logs full SDK responses
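
Since a database error would currently propagate out of the handler, one possible hardening is to isolate each store call so a single failure does not drop the remaining observations from the same response. The sketch below is a suggestion under that assumption; `ParsedObs` and `storeAll` are hypothetical names, and the store function is injected.

```typescript
// Sketch of per-record error isolation around the store step.
// This is a suggested hardening, not current behavior; names are illustrative.
interface ParsedObs {
  type: string;
  title: string | null;
}

function storeAll(
  observations: ParsedObs[],
  storeObservation: (obs: ParsedObs) => void,
): { stored: number; failed: number } {
  let stored = 0;
  let failed = 0;
  for (const obs of observations) {
    try {
      storeObservation(obs);
      stored++;
    } catch (err) {
      // Log and continue so one bad record does not block the rest.
      failed++;
      const reason = err instanceof Error ? err.message : String(err);
      console.error(`failed to store observation "${obs.title ?? "(untitled)"}": ${reason}`);
    }
  }
  return { stored, failed };
}
```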
---
## Deep Dive: Parser System
### parseObservations (parser.ts lines 32-96)
**Purpose:** Extract observation XML blocks from SDK response and parse into structured data
**Logic Flow:**
1. Use regex to find all `<observation>...</observation>` blocks (non-greedy):
```typescript
/<observation>([\s\S]*?)<\/observation>/g
```
2. For each block:
- Extract all fields: `type`, `title`, `subtitle`, `narrative`, `facts`, `concepts`, `files_read`, `files_modified`
- **VALIDATION** (lines 52-67):
- If `type` is missing or invalid → default to "change"
- Valid types: `['bugfix', 'feature', 'refactor', 'change', 'discovery', 'decision']`
- All other fields can be null
- Filter out `type` from `concepts` array (types and concepts are separate dimensions)
- Push observation to results array
3. Return all observations
**Blocking Analysis:** ❌ NEVER BLOCKS (as of v4.2.6)
- **CRITICAL FIX** (v4.2.6): Removed validation that required title, subtitle, and narrative
- Comment on line 52: "NOTE FROM THEDOTMACK: ALWAYS save observations - never skip. 10/24/2025"
- Always returns observations with whatever fields exist
- Only transformation: type defaults to "change" if invalid
**Value Assessment:** ✅ HIGH VALUE
- Permissive parsing ensures data is never lost
- v4.2.6 fix was critical for reliability
- Keep as-is
**Historical Context:**
- **Before v4.2.6:** Would skip observations missing required fields → data loss
- **After v4.2.6:** Always saves with defaults → maximally permissive
**Edge Cases:**
1. No `<observation>` tags → Returns empty array (normal for routine operations)
2. All fields empty → Returns observation with null fields and type="change"
3. Malformed XML → Regex won't match → Returns empty array (data loss)
4. Type in concepts → Filtered out (types and concepts are orthogonal)
**Example:**
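```xml
<observation>
  <type>feature</type>
  <title>Authentication added</title>
  <subtitle>Implemented OAuth2 flow</subtitle>
  <facts>
    <fact>Added OAuth2 provider configuration</fact>
    <fact>Created callback endpoint</fact>
  </facts>
  <narrative>Full OAuth2 authentication...</narrative>
  <concepts>
    <concept>how-it-works</concept>
    <concept>what-changed</concept>
  </concepts>
  <files_read>
    <file>src/auth/oauth.ts</file>
  </files_read>
  <files_modified>
    <file>src/auth/oauth.ts</file>
  </files_modified>
</observation>
```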
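
The permissive parsing rules above can be sketched as follows. This is a simplified illustration rather than the contents of `parser.ts`: it handles only a subset of fields, and it assumes that list fields use `<concept>` child elements and that the concept filter removes the observation's own type.

```typescript
// Simplified sketch of permissive observation parsing. Assumptions: concepts
// are wrapped in <concept> child tags, and only the observation's own type is
// filtered out of the concepts list.
const VALID_TYPES = ["bugfix", "feature", "refactor", "change", "discovery", "decision"];

interface ParsedObservation {
  type: string;
  title: string | null;
  subtitle: string | null;
  narrative: string | null;
  concepts: string[];
}

// Trimmed text of the first <tag>...</tag> pair, or null if missing or empty.
function extractField(block: string, tag: string): string | null {
  const match = new RegExp(`<${tag}>([\\s\\S]*?)</${tag}>`).exec(block);
  const text = (match?.[1] ?? "").trim();
  return text === "" ? null : text;
}

function extractList(block: string, container: string, item: string): string[] {
  const inner = extractField(block, container);
  if (inner === null) return [];
  const items: string[] = [];
  const re = new RegExp(`<${item}>([\\s\\S]*?)</${item}>`, "g");
  let m: RegExpExecArray | null;
  while ((m = re.exec(inner)) !== null) {
    const text = (m[1] ?? "").trim();
    if (text !== "") items.push(text);
  }
  return items;
}

function parseObservations(text: string): ParsedObservation[] {
  const results: ParsedObservation[] = [];
  const re = /<observation>([\s\S]*?)<\/observation>/g;
  let m: RegExpExecArray | null;
  while ((m = re.exec(text)) !== null) {
    const block = m[1] ?? "";
    const rawType = extractField(block, "type");
    // Never skip: fall back to "change" when the type is missing or invalid.
    const type = rawType !== null && VALID_TYPES.includes(rawType) ? rawType : "change";
    results.push({
      type,
      title: extractField(block, "title"),
      subtitle: extractField(block, "subtitle"),
      narrative: extractField(block, "narrative"),
      concepts: extractList(block, "concepts", "concept").filter((c) => c !== type),
    });
  }
  return results;
}
```

An unclosed `<observation>` tag never matches the regex, which is the malformed-XML data-loss case listed in the edge cases.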
---
### parseSummary (parser.ts lines 102-157)
**Purpose:** Extract summary XML block from SDK response
**Logic Flow:**
1. Check for the skip tag (lines 104-113)
- If found → log reason and return null (intentional skip)
2. Match `<summary>...</summary>` block (non-greedy):
```typescript
/<summary>([\s\S]*?)<\/summary>/
```
- If not found → return null (SDK didn't provide summary)
3. Extract all fields: `request`, `investigated`, `learned`, `completed`, `next_steps`, `notes` (optional)
4. **VALIDATION REMOVED** (lines 133-147):
- Comment: "NOTE FROM THEDOTMACK: 100% of the time we must SAVE the summary, even if fields are missing. 10/24/2025"
- Comment: "NEVER DO THIS NONSENSE AGAIN."
- Old code checked if all required fields present → would return null
- New code returns summary with whatever fields exist
5. Return `ParsedSummary` object
**Blocking Analysis:** ⚠️ MINIMAL BLOCKING (as of v4.2.5)
- Skip tag present → Returns null (intentional, not a bug)
- Missing `<summary>` tags → Returns null (SDK didn't provide)
- Missing fields within `<summary>` → Does NOT block anymore (v4.2.5 fix)
**Value Assessment:** ✅ HIGH VALUE
- v4.2.5 fix ensures partial summaries are saved
- Keep as-is
**Historical Context:**
- **Before v4.2.5:** Would return null if any required field missing → data loss
- **After v4.2.5:** Returns summary with whatever fields exist → maximally permissive
**Edge Cases:**
1. Skip tag present → Returns null, logs reason
2. No `<summary>` tags → Returns null (SDK didn't generate summary)
3. `<summary>` with all empty fields → Returns summary with empty/null strings
4. Malformed XML → Regex won't match → Returns null (data loss)
**Example:**
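```xml
<summary>
  <request>Add OAuth2 authentication</request>
  <investigated>Reviewed existing auth system</investigated>
  <learned>System uses JWT tokens for sessions</learned>
  <completed>Implemented OAuth2 provider integration</completed>
  <next_steps>Test with production credentials</next_steps>
  <notes>Need to configure callback URLs in provider dashboard</notes>
</summary>
```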
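
A minimal sketch of the post-v4.2.5 behavior is shown below: a summary is returned whenever a `<summary>` block exists, with missing fields left as null. It is an approximation rather than the code in `parser.ts`; the skip-tag check from step 1 is omitted, and this sketch normalizes empty fields to null.

```typescript
// Sketch of permissive summary parsing: return whatever fields exist.
// The skip-tag check is omitted, and empty fields are normalized to null
// (an assumption of this sketch).
interface ParsedSummary {
  request: string | null;
  investigated: string | null;
  learned: string | null;
  completed: string | null;
  next_steps: string | null;
  notes: string | null;
}

function parseSummary(text: string): ParsedSummary | null {
  const match = /<summary>([\s\S]*?)<\/summary>/.exec(text);
  if (!match) return null; // no summary block in the response
  const block = match[1] ?? "";

  // Trimmed field text, or null when the tag is missing or empty.
  const field = (tag: string): string | null => {
    const m = new RegExp(`<${tag}>([\\s\\S]*?)</${tag}>`).exec(block);
    const value = (m?.[1] ?? "").trim();
    return value === "" ? null : value;
  };

  // No required-field validation: partial summaries are still returned.
  return {
    request: field("request"),
    investigated: field("investigated"),
    learned: field("learned"),
    completed: field("completed"),
    next_steps: field("next_steps"),
    notes: field("notes"),
  };
}
```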
---
## Deep Dive: Database Layer
### SessionStore.storeObservation (SessionStore.ts lines 901-964)
**Purpose:** Store a parsed observation in the database
**Logic Flow:**
1. **AUTO-CREATE SESSION** (lines 920-940):
- Check if `sdk_session_id` exists in `sdk_sessions` table
- If NOT found:
- Auto-create session record
- Log: "Auto-created session record for session_id: {id}"
- This prevents foreign key constraint errors
2. Prepare INSERT statement:
```sql
INSERT INTO observations
(sdk_session_id, project, type, title, subtitle, facts, narrative,
concepts, files_read, files_modified, prompt_number, created_at, created_at_epoch)
VALUES (?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?)
```
3. Insert observation with:
- `facts`, `concepts`, `files_read`, `files_modified` → JSON.stringify()
- Timestamps auto-generated
- All fields as-is (nulls allowed)
**Blocking Analysis:** ❌ NEVER BLOCKS
- Auto-creates missing sessions (defensive programming)
- All fields nullable except `sdk_session_id`, `project`, `type`, `created_at`, and `created_at_epoch` (NOT NULL in the schema)
- No validation checks that could fail
- Schema is permissive
**Value Assessment:** ✅ HIGH VALUE
- Auto-creation pattern is brilliant
- Prevents foreign key errors
- Keep as-is
**Schema Constraints:**
- `type` must be one of 6 valid types (CHECK constraint)
- BUT: Parser ensures type is always valid (defaults to "change")
- `sdk_session_id` has foreign key to `sdk_sessions`
- BUT: Auto-creation ensures session exists
- Arrays stored as JSON strings
**Edge Cases:**
- Session doesn't exist → Auto-created
- Invalid type → Parser prevents this (defaults to "change")
- Null fields → Allowed by schema
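
The auto-create and serialization steps can be illustrated with an in-memory stand-in for the two tables. This sketch deliberately avoids the real `better-sqlite3` calls; `InMemoryStore` and `ObservationInput` are hypothetical, and only a subset of columns is shown.

```typescript
// In-memory stand-in for the sdk_sessions and observations tables, used only
// to illustrate auto-creation and JSON serialization. Not the real
// SessionStore; names and the reduced column set are assumptions.
interface ObservationInput {
  type: string;
  title: string | null;
  facts: string[];
  concepts: string[];
}

class InMemoryStore {
  readonly sessions = new Set<string>();
  readonly observations: Array<Record<string, string | number | null>> = [];

  storeObservation(
    sdkSessionId: string,
    project: string,
    obs: ObservationInput,
    promptNumber: number | null,
  ): void {
    // Auto-create the parent session so the insert cannot hit a foreign key error.
    if (!this.sessions.has(sdkSessionId)) {
      this.sessions.add(sdkSessionId);
      console.log(`Auto-created session record for session_id: ${sdkSessionId}`);
    }

    const now = new Date();
    this.observations.push({
      sdk_session_id: sdkSessionId,
      project,
      type: obs.type,
      title: obs.title, // nulls are stored as-is
      facts: JSON.stringify(obs.facts), // arrays become JSON strings
      concepts: JSON.stringify(obs.concepts),
      prompt_number: promptNumber,
      created_at: now.toISOString(),
      created_at_epoch: now.getTime(),
    });
  }
}
```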
---
### SessionStore.storeSummary (SessionStore.ts lines 970-1029)
**Purpose:** Store a parsed summary in the database
**Logic Flow:**
1. **AUTO-CREATE SESSION** (lines 987-1007):
- Same defensive pattern as `storeObservation()`
- Ensures session exists before INSERT
2. Prepare INSERT statement:
```sql
INSERT INTO session_summaries
(sdk_session_id, project, request, investigated, learned, completed,
next_steps, notes, prompt_number, created_at, created_at_epoch)
VALUES (?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?)
```
3. Insert summary with:
- All content fields as-is (nulls allowed)
- Timestamps auto-generated
**Blocking Analysis:** ❌ NEVER BLOCKS
- Auto-creates missing sessions
- All content fields nullable
- No validation checks
- Multiple summaries per session allowed (migration 7 removed UNIQUE constraint)
**Value Assessment:** ✅ HIGH VALUE
- Auto-creation ensures reliability
- Nullable fields allow partial data
- Keep as-is
**Schema Evolution:**
- **Before migration 7:** `sdk_session_id` had UNIQUE constraint → Only one summary per session
- **After migration 7:** UNIQUE removed → Multiple summaries per session (one per prompt)
**Edge Cases:**
- Session doesn't exist → Auto-created
- All fields null/empty → Allowed
- Multiple summaries for same session → Allowed (migration 7)
---
### Database Schema Constraints
#### observations table
```sql
CREATE TABLE observations (
id INTEGER PRIMARY KEY AUTOINCREMENT,
sdk_session_id TEXT NOT NULL, -- Foreign key
project TEXT NOT NULL,
text TEXT, -- Nullable (deprecated, migration 9)
type TEXT NOT NULL CHECK(type IN ('decision', 'bugfix', 'feature', 'refactor', 'discovery', 'change')),
title TEXT, -- Nullable
subtitle TEXT, -- Nullable
facts TEXT, -- Nullable (JSON array)
narrative TEXT, -- Nullable
concepts TEXT, -- Nullable (JSON array)
files_read TEXT, -- Nullable (JSON array)
files_modified TEXT, -- Nullable (JSON array)
prompt_number INTEGER, -- Nullable
created_at TEXT NOT NULL,
created_at_epoch INTEGER NOT NULL,
FOREIGN KEY(sdk_session_id) REFERENCES sdk_sessions(sdk_session_id) ON DELETE CASCADE
);
```
**Blocking Potential:**
- Invalid `type` → CHECK constraint violation
- Mitigated by: Parser defaults to "change"
- Missing `sdk_session_id` → Foreign key violation
- Mitigated by: Auto-creation in storeObservation()
#### session_summaries table
```sql
CREATE TABLE session_summaries (
id INTEGER PRIMARY KEY AUTOINCREMENT,
sdk_session_id TEXT NOT NULL, -- No longer UNIQUE (migration 7)
project TEXT NOT NULL,
request TEXT, -- Nullable
investigated TEXT, -- Nullable
learned TEXT, -- Nullable
completed TEXT, -- Nullable
next_steps TEXT, -- Nullable
notes TEXT, -- Nullable
prompt_number INTEGER, -- Nullable
created_at TEXT NOT NULL,
created_at_epoch INTEGER NOT NULL,
FOREIGN KEY(sdk_session_id) REFERENCES sdk_sessions(sdk_session_id) ON DELETE CASCADE
);
```
**Blocking Potential:**
- Missing `sdk_session_id` → Foreign key violation
- Mitigated by: Auto-creation in storeSummary()
**Key Design Decisions:**
1. **Nullable fields** - Allows partial data to be saved
2. **Auto-creation** - Prevents foreign key errors
3. **No UNIQUE constraints** (migration 7) - Multiple summaries per session
4. **WAL mode** - Better concurrency for multiple sessions
5. **JSON arrays** - Flexible storage for lists (facts, concepts, files)
---
## Deep Dive: Prompt System
### buildInitPrompt (prompts.ts lines 24-125)
**Purpose:** Generate initial prompt that instructs the SDK agent what to observe and how to record
**Content:**
1. **Role Definition:** "You are observing a development session to create searchable memory FOR FUTURE SESSIONS"
2. **Critical Instruction:** "Record what was BUILT/FIXED/DEPLOYED/CONFIGURED, not what you (the observer) are doing"
3. **What to Record:** Focus on deliverables, capabilities, technical changes
4. **When to Skip:** Routine operations (empty status checks, package installations, file listings)
5. **Output Format:** XML structure with `<observation>` tags and required fields
**Blocking Analysis:** ⚠️ CAN CAUSE SKIPPING
- "WHEN TO SKIP" section instructs SDK to not output for routine operations
- "No output necessary if skipping" means no observations stored
- **This is intentional filtering**, not a bug
**Value Assessment:** ✅ HIGH VALUE
- Prevents noise from routine operations
- Focuses on meaningful changes
- Keep as-is, but consider making "WHEN TO SKIP" configurable
**Key Instructions:**
```
WHEN TO SKIP
------------
Skip routine operations:
- Empty status checks
- Package installations with no errors
- Simple file listings
- Repetitive operations you've already documented
- **No output necessary if skipping.**
```
**Impact:**
- Reduces database size by filtering noise
- But could cause "missing" observations for operations user cares about
- Trade-off between signal and completeness
---
### buildObservationPrompt (prompts.ts lines 130-153)
**Purpose:** Wrap tool execution data in XML for SDK processing
**Content:**
```xml
{name}
{ISO timestamp}
{JSON}
{JSON}
```
**Blocking Analysis:** ❌ NO BLOCKING
- Just data presentation, no instructions to skip
- SDK processes based on init prompt rules
**Value Assessment:** ✅ HIGH VALUE
- Simple, clean data wrapper
- Keep as-is
---
### buildSummaryPrompt (prompts.ts lines 158-178)
**Purpose:** Request summary of the session so far
**Content:**
1. **Instruction:** "Think about the last request, and write a summary of what was done, what was learned, and what's next"
2. **Important Note:** "DO NOT summarize the observation process itself - you are summarizing a DIFFERENT claude code session, not this one"
3. **Output Format:** XML `<summary>` with required fields
4. **Encouragement:** "Always write at least a minimal summary explaining where we are at currently, even if you didn't learn anything new or complete any work"
**Blocking Analysis:** ❌ NO BLOCKING
- Encourages always writing summary
- SDK may still skip if truly nothing to summarize
**Value Assessment:** ✅ HIGH VALUE
- Ensures summaries are generated
- "Always write at least a minimal summary" reduces skip rate
- Keep as-is
---
## Data Flow Analysis
### End-to-End Flow: Tool Execution → Database
```
1. User executes tool in Claude Code
↓
2. PostToolUse hook captures execution
↓
3. Hook sends HTTP POST to worker /observations endpoint
↓
4. Worker queues message in pendingMessages array
└─→ HTTP 200 response (confirmed receipt)
↓
5. createMessageGenerator polls queue (100ms interval)
↓
6. Message shifted from queue
↓
7. buildObservationPrompt wraps tool data in XML
↓
8. Generator yields message to SDK agent
↓
9. SDK sends message to Claude API
↓
10. Claude processes tool data based on init prompt
↓
11. Claude responds with XML (or skips if routine operation)
↓
12. SDK returns response to runSDKAgent
↓
13. handleAgentMessage receives response
↓
14. parseObservations extracts <observation> blocks
↓
15. For each observation:
- db.storeObservation called
- Auto-creates session if missing
- Inserts into observations table
↓
16. Data persisted in SQLite database
```
**Failure Points:**
- **Point 3:** Worker not running → HTTP request fails → Hook logs error
- **Point 4:** Worker crashes before processing → Queue lost
- **Point 9:** Invalid model config → SDK fails → Session marked failed
- **Point 11:** Malformed XML response → Parser returns empty array
- **Point 15:** Database error (rare) → Throws exception
**Recovery Mechanisms:**
- **Auto-recovery:** New requests after worker restart auto-create session
- **Graceful degradation:** Partial data saved (v4.2.5/v4.2.6 fixes)
- **Database persistence:** Once stored, data survives all restarts
---
## Blocking Assessment Matrix
### Components That CAN Block Data Storage
| Component | Blocking Scenario | Impact | Mitigation |
|-----------|------------------|---------|------------|
| Worker not running | HTTP requests fail | Observations not queued | PM2 auto-restart, health monitoring |
| Invalid CLAUDE_MEM_MODEL | SDK agent fails to start | Queued messages never processed | Validation in settings script |
| Invalid CLAUDE_CODE_PATH | SDK agent fails to start | Queued messages never processed | Default path, env var fallback |
| Malformed XML in SDK response | Parser can't extract | Data lost for that response | Better error handling, partial parsing |
| Worker restart | In-memory queue lost | Queued messages lost | Could persist queue to DB |
| Session abort (DELETE) | Queue processing stopped | Remaining queue lost | Graceful cleanup (v4.1.0) |
| Init prompt "WHEN TO SKIP" | SDK intentionally skips | No observation stored | Intentional filtering, configurable? |
### Components That CANNOT Block Data Storage
| Component | Reason | Design Pattern |
|-----------|--------|----------------|
| /observations endpoint | Auto-recovery, always queues | Maximally permissive |
| /summarize endpoint | Auto-recovery, always queues | Maximally permissive |
| parseObservations() | Defaults to "change" type, accepts nulls | Permissive (v4.2.6 fix) |
| parseSummary() | Returns partial summaries | Permissive (v4.2.5 fix) |
| storeObservation() | Auto-creates sessions, accepts nulls | Defensive programming |
| storeSummary() | Auto-creates sessions, accepts nulls | Defensive programming |
| Database schema | Nullable fields, no UNIQUE constraints | Flexible storage |
---
## Critical Findings
### 1. Auto-Recovery Pattern Keeps Processing Available After Worker Restart
**Location:** `/observations` and `/summarize` endpoints (lines 181-209, 241-270)
**How it works:**
```typescript
if (!session) {
// Fetch session from database
const dbSession = db.getSessionById(sessionDbId);
// Recreate in-memory state
session = {
sessionDbId,
claudeSessionId: dbSession!.claude_session_id,
sdkSessionId: null,
project: dbSession!.project,
userPrompt: dbSession!.user_prompt,
pendingMessages: [],
abortController: new AbortController(),
generatorPromise: null,
lastPromptNumber: 0,
observationCounter: 0,
startTime: Date.now()
};
// Start new SDK agent
session.generatorPromise = this.runSDKAgent(session);
}
```
**Value:** ✅ HIGH
- Worker restart doesn't prevent new observations from being processed
- Database is source of truth
- In-memory session state can be rebuilt from the database, which enables resilience
**Recommendation:** Extract to helper function to reduce duplication
---
### 2. Parser Fixes (v4.2.5/v4.2.6) Ensure Partial Data Saved
**parseObservations (v4.2.6):**
```typescript
// NOTE FROM THEDOTMACK: ALWAYS save observations - never skip. 10/24/2025
// All fields except type are nullable in schema
// If type is missing or invalid, use "change" as catch-all fallback
let finalType = 'change'; // Default catch-all
if (type && validTypes.includes(type.trim())) {
finalType = type.trim();
}
// All other fields are optional - save whatever we have
observations.push({
type: finalType,
title, // Can be null
subtitle, // Can be null
facts,
narrative, // Can be null
concepts,
files_read,
files_modified
});
```
**parseSummary (v4.2.5):**
```typescript
// NOTE FROM THEDOTMACK: 100% of the time we must SAVE the summary,
// even if fields are missing. 10/24/2025
// NEVER DO THIS NONSENSE AGAIN.
return {
request, // Can be null
investigated, // Can be null
learned, // Can be null
completed, // Can be null
next_steps, // Can be null
notes // Can be null
};
```
**Value:** ✅ CRITICAL
- Prevents data loss from incomplete AI responses
- LLMs make mistakes - system must be resilient
- Partial data is better than no data
**Recommendation:** Keep as-is, this is the right design
---
### 3. In-Memory Queue is Main Vulnerability
**Issue:** `pendingMessages` array is in-memory only
- Worker restart → All queued messages lost
- But HTTP response already confirmed receipt
**Current behavior:**
1. Hook sends observation → Worker responds "queued" → Hook thinks it's saved
2. Worker crashes before processing → Observation lost
3. BUT: New observations after restart are still processed (auto-recovery)
**Impact:** ⚠️ MEDIUM
- Data loss window between queue and processing
- But observations are idempotent (can be resent)
- Hooks don't retry on success response
**Recommendation:** ⚠️ CONSIDER
- Persist queue to database (e.g., `pending_observations` table)
- Mark as processed when SDK handles
- Increases reliability but adds complexity
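
To make the recommendation concrete, the sketch below models the enqueue, read-unprocessed, and mark-processed cycle with an in-memory stand-in for a `pending_observations` table. The row shape and method names are hypothetical; a real version would run the equivalent INSERT, SELECT, and UPDATE statements against SQLite.

```typescript
// In-memory model of a durable queue backed by a `pending_observations` table.
// The columns and method names are assumptions for illustration only.
interface PendingRow {
  id: number;
  sessionDbId: number;
  payload: string; // JSON-encoded message
  processed: boolean;
}

class PendingTable {
  private rows: PendingRow[] = [];
  private nextId = 1;

  // Would be an INSERT executed before the HTTP response is sent.
  enqueue(sessionDbId: number, message: object): number {
    const id = this.nextId++;
    this.rows.push({ id, sessionDbId, payload: JSON.stringify(message), processed: false });
    return id;
  }

  // Would be a SELECT ... WHERE processed = 0, run again after a restart.
  unprocessed(sessionDbId: number): PendingRow[] {
    return this.rows.filter((r) => r.sessionDbId === sessionDbId && !r.processed);
  }

  // Would be an UPDATE issued once the SDK has handled the message.
  markProcessed(id: number): void {
    const row = this.rows.find((r) => r.id === id);
    if (row) row.processed = true;
  }
}
```

With this shape, auto-recovery could reload `unprocessed(sessionDbId)` into `pendingMessages` when it rebuilds a session, closing the window between the "queued" response and SDK processing.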
---
### 4. Init Prompt "WHEN TO SKIP" Intentionally Filters
**Instruction:**
```
WHEN TO SKIP
------------
Skip routine operations:
- Empty status checks
- Package installations with no errors
- Simple file listings
- Repetitive operations you've already documented
- **No output necessary if skipping.**
```
**Impact:**
- Reduces noise in database
- Focuses on meaningful changes
- BUT: User might wonder why some tool executions aren't recorded
**Value:** ✅ MEDIUM - Intentional filtering
- Prevents database bloat
- Trade-off between signal and completeness
**Recommendation:** ⚠️ CONSIDER
- Make "WHEN TO SKIP" configurable (env var or settings)
- Or add verbosity levels (minimal/normal/verbose)
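The verbosity idea could be sketched as a pure function that builds the skip section from a level. Everything here is an assumption made for illustration: the function names, the mapping of rules to levels, and the extra rule under `minimal` are not part of the current init prompt.

```typescript
// Hypothetical sketch: level names come from the recommendation above,
// but the rule-to-level mapping is an assumption.
type Verbosity = "minimal" | "normal" | "verbose";

const NORMAL_RULES: string[] = [
  "Empty status checks",
  "Package installations with no errors",
  "Simple file listings",
  "Repetitive operations you've already documented",
];

const SKIP_RULES: Record<Verbosity, string[]> = {
  // "minimal" skips more; the extra rule is purely illustrative.
  minimal: NORMAL_RULES.concat(["Read-only file inspections (illustrative extra rule)"]),
  normal: NORMAL_RULES,
  // "verbose" records everything, so there is nothing to skip.
  verbose: [],
};

// Unknown or missing values fall back to "normal".
function parseVerbosity(raw: string | undefined): Verbosity {
  const value = (raw ?? "").trim().toLowerCase();
  return value === "minimal" || value === "verbose" ? value : "normal";
}

// Returns the prompt section, or an empty string when nothing is skipped.
function buildSkipSection(level: Verbosity): string {
  const rules = SKIP_RULES[level];
  if (rules.length === 0) return "";
  return [
    "WHEN TO SKIP",
    "------------",
    "Skip routine operations:",
    ...rules.map((rule) => `- ${rule}`),
    "- **No output necessary if skipping.**",
  ].join("\n");
}
```

A caller might use `buildSkipSection(parseVerbosity(process.env.CLAUDE_MEM_VERBOSITY))` when assembling the init prompt. Falling back to `normal` keeps today's behavior for users who never set the variable.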
---
## Value Assessment by Component
### HIGH VALUE - Keep As-Is
| Component | Reason |
|-----------|--------|
| Auto-recovery pattern | Worker restarts don't block processing; session state is rebuilt from the database |
| Permissive parser (v4.2.5/v4.2.6) | Ensures partial data saved, critical for reliability |
| Nullable database schema | Flexible storage, allows incomplete data |
| WAL mode SQLite | Good concurrency, reliable writes |
| Isolated session state | No cross-contamination between sessions |
| Queue-based architecture | Decouples HTTP from SDK processing |
| storeObservation/storeSummary auto-creation | Defensive programming, prevents foreign key errors |
### MEDIUM VALUE - Consider Improvements
| Component | Current State | Potential Improvement |
|-----------|--------------|----------------------|
| In-memory queue | Lost on restart | Persist to DB for durability |
| 100ms polling | Works but inefficient | Event-driven async queue |
| Duplicated auto-recovery code | Lines 181-209 and 241-270 identical | Extract to `getOrCreateSession()` helper |
| No try-catch around DB ops | Errors crash handler | Add error handling with logging |
| Model/port defaults | Defaults are hard-coded | None needed; already overridable via env vars ✓ |
| Init prompt filtering | Fixed "WHEN TO SKIP" rules | Make configurable (verbosity levels) |
### LOW VALUE - Questionable Design
| Component | Issue | Recommendation |
|-----------|-------|----------------|
| cleanupOrphanedSessions() | Marks ALL active sessions failed on startup | Aggressive, but necessary with fixed port |
| 5-second DELETE timeout | Arbitrary | Make configurable via env var |
| "NO SUMMARY TAGS FOUND" warning | Log level too high | Change to INFO level |
---
## Recommendations
### Priority 1: Critical Reliability Improvements
1. **Persist Message Queue to Database**
- Create `pending_messages` table
- Store queued observations/summaries
- Mark as processed when handled by SDK
- Prevents data loss on worker restart
- **Effort:** Medium, **Impact:** High
2. **Add Error Handling Around Database Operations**
- Wrap `db.storeObservation()` and `db.storeSummary()` in try-catch
- Log errors with full context
- Continue processing other messages on error
- **Effort:** Low, **Impact:** Medium
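The error-handling recommendation could look roughly like the sketch below. It is an illustration under assumptions: `storeAll` and `Logger` are hypothetical names, and the `store` callback stands in for a call such as `db.storeObservation()`.

```typescript
// Hypothetical sketch: `storeAll` and `Logger` are illustrative names.
interface Logger {
  error(message: string, context: Record<string, unknown>): void;
}

// Stores each item independently so that one failure does not stop the rest.
function storeAll<T>(
  items: T[],
  store: (item: T) => void,
  logger: Logger
): { stored: number; failed: T[] } {
  let stored = 0;
  const failed: T[] = [];
  for (const item of items) {
    try {
      store(item);
      stored++;
    } catch (err) {
      // Log with full context, then continue with the next item.
      failed.push(item);
      logger.error("Failed to store item", {
        item,
        error: err instanceof Error ? err.message : String(err),
      });
    }
  }
  return { stored, failed };
}
```

Returning the failed items lets the caller decide whether to retry or drop them, rather than hiding the failure inside the loop.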
### Priority 2: Code Quality Improvements
3. **Extract Auto-Recovery to Helper Function**
```typescript
// `ActiveSession` is a placeholder name for the worker's per-session state type
private async getOrCreateSession(sessionDbId: number): Promise<ActiveSession> {
// Consolidate lines 181-209 and 241-270
}
```
- **Effort:** Low, **Impact:** Low (code quality)
4. **Make Configuration More Flexible**
- Add `CLAUDE_MEM_VERBOSITY` env var (minimal/normal/verbose)
- Adjust init prompt "WHEN TO SKIP" based on verbosity
- Add `CLAUDE_MEM_DELETE_TIMEOUT` env var
- **Effort:** Low, **Impact:** Medium
### Priority 3: Performance Optimizations
5. **Replace Polling with Event-Driven Queue**
- Use `AsyncQueue` with notifications instead of 100ms polling
- Reduces latency from queue to processing
- **Effort:** Medium, **Impact:** Low (performance)
6. **Add Queue Metrics**
- Track queue length over time
- Alert if queue grows unbounded
- Add to `/health` endpoint
- **Effort:** Low, **Impact:** Low (observability)
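Recommendations 5 and 6 could be combined in a small event-driven queue that also exposes its length. This is a minimal sketch, assuming a single consumer per session; the class below is written for illustration and is not the `AsyncQueue` of any particular library.

```typescript
// Hypothetical sketch of an event-driven queue with a length metric.
class AsyncQueue<T> {
  private items: T[] = [];
  private waiters: Array<(item: T) => void> = [];

  // Number of buffered messages; a health endpoint could report this value.
  get length(): number {
    return this.items.length;
  }

  // Hands the item straight to a waiting consumer, or buffers it.
  push(item: T): void {
    const waiter = this.waiters.shift();
    if (waiter) {
      waiter(item);
    } else {
      this.items.push(item);
    }
  }

  // Resolves immediately if a message is buffered, otherwise when one arrives.
  // No timer is involved, so there is no polling interval to wait out.
  next(): Promise<T> {
    if (this.items.length > 0) {
      return Promise.resolve(this.items.shift() as T);
    }
    return new Promise<T>((resolve) => {
      this.waiters.push(resolve);
    });
  }
}
```

A consumer loop would `await queue.next()` instead of sleeping 100ms between checks, so a message is picked up as soon as `push` is called.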
---
## Appendix: Configuration Reference
### Environment Variables
| Variable | Default | Purpose | Blocking Impact |
|----------|---------|---------|----------------|
| `CLAUDE_MEM_MODEL` | `claude-sonnet-4-5` | AI model for processing | Invalid = SDK fails |
| `CLAUDE_MEM_WORKER_PORT` | `37777` | HTTP server port | Invalid = Worker won't start |
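Reading these variables with validation could be sketched as follows. `loadWorkerConfig` and `WorkerConfig` are hypothetical names, and failing fast on a bad port is an assumption about how to surface the error, not a description of the worker's current startup code.

```typescript
// Hypothetical sketch: `loadWorkerConfig` is an illustrative name.
interface WorkerConfig {
  model: string;
  port: number;
}

function loadWorkerConfig(env: Record<string, string | undefined>): WorkerConfig {
  // The model name cannot be validated locally; an invalid value only
  // shows up later, when the SDK call fails.
  const model = env["CLAUDE_MEM_MODEL"]?.trim() || "claude-sonnet-4-5";

  const rawPort = env["CLAUDE_MEM_WORKER_PORT"];
  if (rawPort === undefined || rawPort.trim() === "") {
    return { model, port: 37777 };
  }

  // Reject non-numeric, fractional, and out-of-range values with a clear message.
  const port = Number(rawPort);
  if (Math.floor(port) !== port || port < 1 || port > 65535) {
    throw new Error(`Invalid CLAUDE_MEM_WORKER_PORT: "${rawPort}"`);
  }
  return { model, port };
}
```

In the worker this would be called as `loadWorkerConfig(process.env)`. Taking the environment as a parameter keeps the function easy to test.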
### Constants
| Constant | Value | Purpose |
|----------|-------|---------|
| `DISALLOWED_TOOLS` | `['Glob', 'Grep', 'ListMcpResourcesTool', 'WebSearch']` | Tools SDK agent can't use |
| Polling interval | `100ms` | Queue polling frequency |
| DELETE timeout | `5000ms` | Max wait for agent shutdown |
---
## Conclusion
The claude-mem worker server is a well-designed system with a clear **defensive, layered architecture** that prioritizes **data persistence**. The key strengths are:
1. **Auto-recovery** from worker restarts
2. **Permissive parsing** that saves partial data
3. **Nullable schema** that accepts incomplete information
4. **Session isolation** preventing cross-contamination
The main vulnerability is the **in-memory queue**, which could be mitigated by persisting to the database. Overall, the system achieves its goal of creating a persistent memory system that survives failures and continues operating even with incomplete data.
**Design Philosophy:** "Better to save partial data than lose everything."
This philosophy is evident throughout the codebase, from the v4.2.5/v4.2.6 parser fixes to the auto-creation patterns in the database layer. The system is built to be resilient to AI errors, configuration issues, and process failures.
---
**End of Document**