Files
claude-mem/src/servers/mcp-server.ts
T
Alex Newman 817b9e8f27 Improve error handling and logging across worker services (#528)
* fix: prevent memory_session_id from equaling content_session_id

The bug: memory_session_id was initialized to contentSessionId as a
"placeholder for FK purposes". This caused the SDK resume logic to
inject memory agent messages into the USER's Claude Code transcript,
corrupting their conversation history.

Root cause:
- SessionStore.createSDKSession initialized memory_session_id = contentSessionId
- SDKAgent checked memorySessionId !== contentSessionId but this check
  only worked if the session was fetched fresh from DB

The fix:
- SessionStore: Initialize memory_session_id as NULL, not contentSessionId
- SDKAgent: Simple truthy check !!session.memorySessionId (NULL = fresh start)
- Database migration: Ran UPDATE to set memory_session_id = NULL for 1807
  existing sessions that had the bug

Also adds [ALIGNMENT] logging across the session lifecycle to help debug
session continuity issues:
- Hook entry: contentSessionId + promptNumber
- DB lookup: contentSessionId → memorySessionId mapping proof
- Resume decision: shows which memorySessionId will be used for resume
- Capture: logs when memorySessionId is captured from first SDK response

UI: Added "Alignment" quick filter button in LogsModal to show only
alignment logs for debugging session continuity.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

* refactor: improve error handling in worker-service.ts

- Fix GENERIC_CATCH anti-patterns by logging full error objects instead of just messages
- Add [ANTI-PATTERN IGNORED] markers for legitimate cases (cleanup, hot paths)
- Simplify error handling comments to be more concise
- Improve httpShutdown() error discrimination for ECONNREFUSED
- Reduce LARGE_TRY_BLOCK issues in initialization code

Part of anti-pattern cleanup plan (132 total issues)

* refactor: improve error logging in SearchManager.ts

- Pass full error objects to logger instead of just error.message
- Fixes PARTIAL_ERROR_LOGGING anti-patterns (10 instances)
- Better debugging visibility when Chroma queries fail

Part of anti-pattern cleanup (133 remaining)

* refactor: improve error logging across SessionStore and mcp-server

- SessionStore.ts: Fix error logging in column rename utility
- mcp-server.ts: Log full error objects instead of just error.message
- Improve error handling in Worker API calls and tool execution

Part of anti-pattern cleanup (133 remaining)

* Refactor hooks to streamline error handling and loading states

- Simplified error handling in useContextPreview by removing try-catch and directly checking response status.
- Refactored usePagination to eliminate try-catch, improving readability and maintaining error handling through response checks.
- Cleaned up useSSE by removing unnecessary try-catch around JSON parsing, ensuring clarity in message handling.
- Enhanced useSettings by streamlining the saving process, removing try-catch, and directly checking the result for success.

* refactor: add error handling back to SearchManager Chroma calls

- Wrap queryChroma calls in try-catch to prevent generator crashes
- Log Chroma errors as warnings and fall back gracefully
- Fixes generator failures when Chroma has issues
- Part of anti-pattern cleanup recovery

* feat: Add generator failure investigation report and observation duplication regression report

- Created a comprehensive investigation report detailing the root cause of generator failures during anti-pattern cleanup, including the impact, investigation process, and implemented fixes.
- Documented the critical regression causing observation duplication due to race conditions in the SDK agent, outlining symptoms, root cause analysis, and proposed fixes.

* fix: address PR #528 review comments - atomic cleanup and detector improvements

This commit addresses critical review feedback from PR #528:

## 1. Atomic Message Cleanup (Fix Race Condition)

**Problem**: SessionRoutes.ts generator error handler had race condition
- Queried messages then marked failed in loop
- If crash during loop → partial marking → inconsistent state

**Solution**:
- Added `markSessionMessagesFailed()` to PendingMessageStore.ts
- Single atomic UPDATE statement replaces loop
- Follows existing pattern from `resetProcessingToPending()`

**Files**:
- src/services/sqlite/PendingMessageStore.ts (new method)
- src/services/worker/http/routes/SessionRoutes.ts (use new method)

## 2. Anti-Pattern Detector Improvements

**Problem**: Detector didn't recognize logger.failure() method
- Lines 212 & 335 already included "failure"
- Lines 112-113 (PARTIAL_ERROR_LOGGING detection) did not

**Solution**: Updated regex patterns to include "failure" for consistency

**Files**:
- scripts/anti-pattern-test/detect-error-handling-antipatterns.ts

## 3. Documentation

**PR Comment**: Added clarification on memory_session_id fix location
- Points to SessionStore.ts:1155
- Explains why NULL initialization prevents message injection bug

## Review Response

Addresses "Must Address Before Merge" items from review:
 Clarified memory_session_id bug fix location (via PR comment)
 Made generator error handler message cleanup atomic
 Deferred comprehensive test suite to follow-up PR (keeps PR focused)

## Testing

- Build passes with no errors
- Anti-pattern detector runs successfully
- Atomic cleanup follows proven pattern from existing methods

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>

* fix: FOREIGN KEY constraint and missing failed_at_epoch column

Two critical bugs fixed:

1. Missing failed_at_epoch column in pending_messages table
   - Added migration 20 to create the column
   - Fixes error when trying to mark messages as failed

2. FOREIGN KEY constraint failed when storing observations
   - All three agents (SDK, Gemini, OpenRouter) were passing
     session.contentSessionId instead of session.memorySessionId
   - storeObservationsAndMarkComplete expects memorySessionId
   - Added null check and clear error message

However, observations still not saving - see investigation report.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>

* Refactor hook input parsing to improve error handling

- Added a nested try-catch block in new-hook.ts, save-hook.ts, and summary-hook.ts to handle JSON parsing errors more gracefully.
- Replaced direct error throwing with logging of the error details using logger.error.
- Ensured that the process exits cleanly after handling input in all three hooks.

---------

Co-authored-by: Claude Opus 4.5 <noreply@anthropic.com>
2026-01-03 18:51:59 -05:00

315 lines
9.2 KiB
TypeScript

/**
* Claude-mem MCP Search Server - Thin HTTP Wrapper
*
* Refactored from 2,718 lines to ~600-800 lines
* Delegates all business logic to Worker HTTP API at localhost:37777
* Maintains MCP protocol handling and tool schemas
*/
// Import logger first
import { logger } from '../utils/logger.js';
// CRITICAL: Redirect console to stderr BEFORE other imports
// MCP uses stdio transport where stdout is reserved for JSON-RPC protocol messages.
// Any logs to stdout break the protocol (Claude Desktop parses "[2025..." as JSON array).
const _originalLog = console['log'];
console['log'] = (...args: any[]) => {
logger.error('CONSOLE', 'Intercepted console output (MCP protocol protection)', undefined, { args });
};
import { Server } from '@modelcontextprotocol/sdk/server/index.js';
import { StdioServerTransport } from '@modelcontextprotocol/sdk/server/stdio.js';
import {
CallToolRequestSchema,
ListToolsRequestSchema,
} from '@modelcontextprotocol/sdk/types.js';
import { getWorkerPort, getWorkerHost } from '../shared/worker-utils.js';
/**
* Worker HTTP API configuration
*/
const WORKER_PORT = getWorkerPort();
const WORKER_HOST = getWorkerHost();
const WORKER_BASE_URL = `http://${WORKER_HOST}:${WORKER_PORT}`;
/**
* Map tool names to Worker HTTP endpoints
*/
const TOOL_ENDPOINT_MAP: Record<string, string> = {
'search': '/api/search',
'timeline': '/api/timeline'
};
/**
* Call Worker HTTP API endpoint
*/
async function callWorkerAPI(
endpoint: string,
params: Record<string, any>
): Promise<{ content: Array<{ type: 'text'; text: string }>; isError?: boolean }> {
logger.debug('SYSTEM', '→ Worker API', undefined, { endpoint, params });
try {
const searchParams = new URLSearchParams();
// Convert params to query string
for (const [key, value] of Object.entries(params)) {
if (value !== undefined && value !== null) {
searchParams.append(key, String(value));
}
}
const url = `${WORKER_BASE_URL}${endpoint}?${searchParams}`;
const response = await fetch(url);
if (!response.ok) {
const errorText = await response.text();
throw new Error(`Worker API error (${response.status}): ${errorText}`);
}
const data = await response.json() as { content: Array<{ type: 'text'; text: string }>; isError?: boolean };
logger.debug('SYSTEM', '← Worker API success', undefined, { endpoint });
// Worker returns { content: [...] } format directly
return data;
} catch (error) {
logger.error('SYSTEM', '← Worker API error', { endpoint }, error as Error);
return {
content: [{
type: 'text' as const,
text: `Error calling Worker API: ${error instanceof Error ? error.message : String(error)}`
}],
isError: true
};
}
}
/**
* Call Worker HTTP API with POST body
*/
async function callWorkerAPIPost(
endpoint: string,
body: Record<string, any>
): Promise<{ content: Array<{ type: 'text'; text: string }>; isError?: boolean }> {
logger.debug('HTTP', 'Worker API request (POST)', undefined, { endpoint });
try {
const url = `${WORKER_BASE_URL}${endpoint}`;
const response = await fetch(url, {
method: 'POST',
headers: {
'Content-Type': 'application/json'
},
body: JSON.stringify(body)
});
if (!response.ok) {
const errorText = await response.text();
throw new Error(`Worker API error (${response.status}): ${errorText}`);
}
const data = await response.json();
logger.debug('HTTP', 'Worker API success (POST)', undefined, { endpoint });
// Wrap raw data in MCP format
return {
content: [{
type: 'text' as const,
text: JSON.stringify(data, null, 2)
}]
};
} catch (error) {
logger.error('HTTP', 'Worker API error (POST)', { endpoint }, error as Error);
return {
content: [{
type: 'text' as const,
text: `Error calling Worker API: ${error instanceof Error ? error.message : String(error)}`
}],
isError: true
};
}
}
/**
* Verify Worker is accessible
*/
async function verifyWorkerConnection(): Promise<boolean> {
try {
const response = await fetch(`${WORKER_BASE_URL}/api/health`);
return response.ok;
} catch (error) {
// Expected during worker startup or if worker is down
logger.debug('SYSTEM', 'Worker health check failed', {}, error as Error);
return false;
}
}
/**
* Tool definitions with HTTP-based handlers
* Minimal descriptions - use help() tool with operation parameter for detailed docs
*/
const tools = [
{
name: '__IMPORTANT',
description: `3-LAYER WORKFLOW (ALWAYS FOLLOW):
1. search(query) → Get index with IDs (~50-100 tokens/result)
2. timeline(anchor=ID) → Get context around interesting results
3. get_observations([IDs]) → Fetch full details ONLY for filtered IDs
NEVER fetch full details without filtering first. 10x token savings.`,
inputSchema: {
type: 'object',
properties: {}
},
handler: async () => ({
content: [{
type: 'text' as const,
text: `# Memory Search Workflow
**3-Layer Pattern (ALWAYS follow this):**
1. **Search** - Get index of results with IDs
\`search(query="...", limit=20, project="...")\`
Returns: Table with IDs, titles, dates (~50-100 tokens/result)
2. **Timeline** - Get context around interesting results
\`timeline(anchor=<ID>, depth_before=3, depth_after=3)\`
Returns: Chronological context showing what was happening
3. **Fetch** - Get full details ONLY for relevant IDs
\`get_observations(ids=[...])\` # ALWAYS batch for 2+ items
Returns: Complete details (~500-1000 tokens/result)
**Why:** 10x token savings. Never fetch full details without filtering first.`
}]
})
},
{
name: 'search',
description: 'Step 1: Search memory. Returns index with IDs. Params: query, limit, project, type, obs_type, dateStart, dateEnd, offset, orderBy',
inputSchema: {
type: 'object',
properties: {},
additionalProperties: true
},
handler: async (args: any) => {
const endpoint = TOOL_ENDPOINT_MAP['search'];
return await callWorkerAPI(endpoint, args);
}
},
{
name: 'timeline',
description: 'Step 2: Get context around results. Params: anchor (observation ID) OR query (finds anchor automatically), depth_before, depth_after, project',
inputSchema: {
type: 'object',
properties: {},
additionalProperties: true
},
handler: async (args: any) => {
const endpoint = TOOL_ENDPOINT_MAP['timeline'];
return await callWorkerAPI(endpoint, args);
}
},
{
name: 'get_observations',
description: 'Step 3: Fetch full details for filtered IDs. Params: ids (array of observation IDs, required), orderBy, limit, project',
inputSchema: {
type: 'object',
properties: {
ids: {
type: 'array',
items: { type: 'number' },
description: 'Array of observation IDs to fetch (required)'
}
},
required: ['ids'],
additionalProperties: true
},
handler: async (args: any) => {
return await callWorkerAPIPost('/api/observations/batch', args);
}
}
];
// Create the MCP server
const server = new Server(
{
name: 'mcp-search-server',
version: '1.0.0',
},
{
capabilities: {
tools: {},
},
}
);
// Register tools/list handler
server.setRequestHandler(ListToolsRequestSchema, async () => {
return {
tools: tools.map(tool => ({
name: tool.name,
description: tool.description,
inputSchema: tool.inputSchema
}))
};
});
// Register tools/call handler
server.setRequestHandler(CallToolRequestSchema, async (request) => {
const tool = tools.find(t => t.name === request.params.name);
if (!tool) {
throw new Error(`Unknown tool: ${request.params.name}`);
}
try {
return await tool.handler(request.params.arguments || {});
} catch (error) {
logger.error('SYSTEM', 'Tool execution failed', { tool: request.params.name }, error as Error);
return {
content: [{
type: 'text' as const,
text: `Tool execution failed: ${error instanceof Error ? error.message : String(error)}`
}],
isError: true
};
}
});
// Cleanup function
async function cleanup() {
logger.info('SYSTEM', 'MCP server shutting down');
process.exit(0);
}
// Register cleanup handlers for graceful shutdown
process.on('SIGTERM', cleanup);
process.on('SIGINT', cleanup);
// Start the server
async function main() {
// Start the MCP server
const transport = new StdioServerTransport();
await server.connect(transport);
logger.info('SYSTEM', 'Claude-mem search server started');
// Check Worker availability in background
setTimeout(async () => {
const workerAvailable = await verifyWorkerConnection();
if (!workerAvailable) {
logger.warn('SYSTEM', 'Worker not available', undefined, { workerUrl: WORKER_BASE_URL });
logger.warn('SYSTEM', 'Tools will fail until Worker is started');
logger.warn('SYSTEM', 'Start Worker with: npm run worker:restart');
} else {
logger.info('SYSTEM', 'Worker available', undefined, { workerUrl: WORKER_BASE_URL });
}
}, 0);
}
main().catch((error) => {
logger.error('SYSTEM', 'Fatal error', undefined, error);
process.exit(1);
});