Files
claude-mem/src/services/sqlite/PendingMessageStore.ts
T
Alex Newman 417acb0f81 fix: comprehensive error handling improvements and architecture documentation (#522)
* Add enforceable anti-pattern detection for try-catch abuse

PROBLEM:
- Overly-broad try-catch blocks waste 10+ hours of debugging time
- Empty catch blocks silently swallow errors
- AI assistants use try-catch to paper over uncertainty instead of doing research

SOLUTION:
1. Created detect-error-handling-antipatterns.ts test
   - Detects empty catch blocks (45 CRITICAL found)
   - Detects catch without logging (45 CRITICAL total)
   - Detects large try blocks (>10 lines)
   - Detects generic catch without type checking
   - Detects catch-and-continue on critical paths
   - Exit code 1 if critical issues found

2. Updated CLAUDE.md with MANDATORY ERROR HANDLING RULES
   - 5-question pre-flight checklist before any try-catch
   - FORBIDDEN patterns with examples
   - ALLOWED patterns with examples
   - Meta-rule: UNCERTAINTY TRIGGERS RESEARCH, NOT TRY-CATCH
   - Critical path protection list

3. Created comprehensive try-catch audit report
   - Documents all 96 try-catch blocks in worker service
   - Identifies critical issue at worker-service.ts:748-750
   - Categorizes patterns and provides recommendations

This is enforceable via test, not just instructions that can be ignored.

Current state: 163 anti-patterns detected (45 critical, 47 high, 71 medium)
Next: Fix critical issues identified by test

🤖 Generated with Claude Code
Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>

* fix: add logging to 5 critical empty catch blocks (Wave 1)

Wave 1 of error handling cleanup - fixing empty catch blocks that
silently swallow errors without any trace.

Fixed files:
- src/bin/import-xml-observations.ts:80 - Log skipped invalid JSON
- src/utils/bun-path.ts:33 - Log when bun not in PATH
- src/utils/cursor-utils.ts:44 - Log failed registry reads
- src/utils/cursor-utils.ts:149 - Log corrupt MCP config
- src/shared/worker-utils.ts:128 - Log failed health checks

All catch blocks now have proper logging with context and error details.

Progress: 41 → 39 CRITICAL issues remaining

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>

* fix: add logging to promise catches on critical paths (Wave 2)

Wave 2 of error handling cleanup - fixing empty promise catch handlers
that silently swallow errors on critical code paths. These are the
patterns that caused the 10-hour debugging session.

Fixed empty promise catches:
- worker-service.ts:642 - Background initialization failures
- SDKAgent.ts:372,446 - Session processor errors
- GeminiAgent.ts:408,475 - Finalization failures
- OpenRouterAgent.ts:451,518 - Finalization failures
- SessionManager.ts:289 - Generator promise failures

Added justification comments to catch-and-continue blocks:
- worker-service.ts:68 - PID file removal (cleanup, non-critical)
- worker-service.ts:130 - Cursor context update (non-critical)

All promise rejection handlers now log errors with context, preventing
silent failures that were nearly impossible to debug.

Note: The anti-pattern detector only tracks try-catch blocks, not
standalone promise chains. These fixes address the root cause of the
original 10-hour debugging session even though the detector count
remains unchanged.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>

* fix: add logging and documentation to error handling patterns (Wave 3)

Wave 3 of error handling cleanup - comprehensive review and fixes for
remaining critical issues identified by the anti-pattern detector.

Changes organized by severity:

**Wave 3.1: Fixed 2 EMPTY_CATCH blocks**
- worker-service.ts:162 - Health check polling now logs failures
- worker-service.ts:610 - Process cleanup logs failures

**Wave 3.2: Reviewed 12 CATCH_AND_CONTINUE patterns**
- Verified all are correct (log errors AND exit/return HTTP errors)
- Added justification comment to session recovery (line 829)
- All patterns properly notify callers of failures

**Wave 3.3: Fixed 29 NO_LOGGING_IN_CATCH issues**

Added logging to 16 catch blocks:
- UI layer: useSettings.ts, useContextPreview.ts (console logging)
- Servers: mcp-server.ts health checks and tool execution
- Worker: version fetch, cleanup, config corruption
- Routes: error handler, session recovery, settings validation
- Services: branch checkout, timeline queries

Documented 13 intentional exceptions with comments explaining why:
- Hot paths (port checks, process checks in tight loops)
- Error accumulation (transcript parser collects for batch retrieval)
- Special cases (logger can't log its own failures)
- Fallback parsing (JSON parse in optional data structures)

All changes follow error handling guidelines from CLAUDE.md:
- Appropriate log levels (error/warn/debug)
- Context objects with relevant details
- Descriptive messages explaining failures
- Error extraction pattern for Error instances

Progress: 41 → 29 detector warnings
Remaining warnings are conservative flags on verified-correct patterns
(catch-and-continue blocks that properly log + notify callers).

Build verified successful. All error handling now provides visibility
for debugging while avoiding excessive logging on hot paths.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>

* feat: add queue:clear command to remove failed messages

Added functionality to clear failed messages from the observation queue:

**Changes:**
- PendingMessageStore: Added clearFailed() method to delete failed messages
- DataRoutes: Added DELETE /api/pending-queue/failed endpoint
- CLI: Created scripts/clear-failed-queue.ts for interactive queue clearing
- package.json: Added npm run queue:clear script

**Usage:**
  npm run queue:clear          # Interactive - prompts for confirmation
  npm run queue:clear -- --force  # Non-interactive - clears without prompt

Failed messages are observations that exceeded max retry count. They
remain in the queue for debugging but won't be processed. This command
removes them to clean up the queue.

Works alongside existing queue:check and queue:process commands to
provide complete queue management capabilities.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>

* feat: add --all flag to queue:clear for complete queue reset

Extended queue clearing functionality to support clearing all messages,
not just failed ones.

**Changes:**
- PendingMessageStore: Added clearAll() method to clear pending, processing, and failed
- DataRoutes: Added DELETE /api/pending-queue/all endpoint
- clear-failed-queue.ts: Added --all flag to clear everything
- Updated help text and UI to distinguish between failed-only and all-clear modes

**Usage:**
  npm run queue:clear              # Clear failed only (interactive)
  npm run queue:clear -- --all     # Clear ALL messages (interactive)
  npm run queue:clear -- --all --force  # Clear all without confirmation

The --all flag provides a complete queue reset, removing pending,
processing, and failed messages. Useful when you want a fresh start
or need to cancel stuck sessions.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>

* feat: add comprehensive documentation for session ID architecture and validation tests

* feat: add logs viewer with clear functionality to UI

- Add LogsRoutes API endpoint for fetching and clearing worker logs
- Create LogsModal component with auto-refresh and clear button
- Integrate logs viewer button into Header component
- Add comprehensive CSS styling for logs modal
- Logs accessible via new document icon button in header

Logs viewer features:
- Display last 1000 lines of current day's log file
- Auto-refresh toggle (2s interval)
- Clear logs button with confirmation
- Monospace font for readable log output
- Responsive modal design matching existing UI

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>

* refactor: redesign logs as Chrome DevTools-style console drawer

Major UX improvements to match Chrome DevTools console:
- Convert from modal to bottom drawer that slides up
- Move toggle button to bottom-left corner (floating button)
- Add draggable resize handle for height adjustment
- Use plain monospace font (SF Mono/Monaco/Consolas) instead of Monaspace
- Simplify controls with icon-only buttons
- Add Console tab UI matching DevTools aesthetic

Changes:
- Renamed LogsModal to LogsDrawer with drawer implementation
- Added resize functionality with mouse drag
- Removed logs button from header
- Added floating console toggle button in bottom-left
- Updated all CSS to match Chrome console styling
- Minimum height: 150px, maximum: window height - 100px

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>

* fix: suppress /api/logs endpoint logging to reduce noise

Skip logging GET /api/logs requests in HTTP middleware to prevent
log spam from auto-refresh polling (every 2s). Keeps the auto-refresh
feature functional while eliminating the repetitive log entries.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>

* refactor: enhance error handling guidelines with approved overrides for justified exceptions

---------

Co-authored-by: Claude Sonnet 4.5 <noreply@anthropic.com>
2026-01-01 23:38:22 -05:00

430 lines
14 KiB
TypeScript

import { Database } from './sqlite-compat.js';
import type { PendingMessage } from '../worker-types.js';
import { logger } from '../../utils/logger.js';
/**
* Persistent pending message record from database
*/
export interface PersistentPendingMessage {
id: number;
session_db_id: number;
content_session_id: string;
message_type: 'observation' | 'summarize';
tool_name: string | null;
tool_input: string | null;
tool_response: string | null;
cwd: string | null;
last_user_message: string | null;
last_assistant_message: string | null;
prompt_number: number | null;
status: 'pending' | 'processing' | 'processed' | 'failed';
retry_count: number;
created_at_epoch: number;
started_processing_at_epoch: number | null;
completed_at_epoch: number | null;
}
/**
* PendingMessageStore - Persistent work queue for SDK messages
*
* Messages are persisted before processing and marked complete after success.
* This enables recovery from SDK hangs and worker crashes.
*
* Lifecycle:
* 1. enqueue() - Message persisted with status 'pending'
* 2. markProcessing() - Status changes to 'processing' when yielded to SDK
* 3. markProcessed() - Status changes to 'processed' after successful SDK response
* 4. markFailed() - Status changes to 'failed' if max retries exceeded
*
* Recovery:
* - resetStuckMessages() - Moves 'processing' messages back to 'pending' if stuck
* - getSessionsWithPendingMessages() - Find sessions that need recovery on startup
*/
export class PendingMessageStore {
private db: Database;
private maxRetries: number;
constructor(db: Database, maxRetries: number = 3) {
this.db = db;
this.maxRetries = maxRetries;
}
/**
* Enqueue a new message (persist before processing)
* @returns The database ID of the persisted message
*/
enqueue(sessionDbId: number, contentSessionId: string, message: PendingMessage): number {
const now = Date.now();
const stmt = this.db.prepare(`
INSERT INTO pending_messages (
session_db_id, content_session_id, message_type,
tool_name, tool_input, tool_response, cwd,
last_user_message, last_assistant_message,
prompt_number, status, retry_count, created_at_epoch
) VALUES (?, ?, ?, ?, ?, ?, ?, ?, ?, ?, 'pending', 0, ?)
`);
const result = stmt.run(
sessionDbId,
contentSessionId,
message.type,
message.tool_name || null,
message.tool_input ? JSON.stringify(message.tool_input) : null,
message.tool_response ? JSON.stringify(message.tool_response) : null,
message.cwd || null,
message.last_user_message || null,
message.last_assistant_message || null,
message.prompt_number || null,
now
);
return result.lastInsertRowid as number;
}
/**
* Atomically claim the next pending message for processing
* Finds oldest pending -> marks processing -> returns it
* Uses a transaction to prevent race conditions
*/
claimNextMessage(sessionDbId: number): PersistentPendingMessage | null {
const now = Date.now();
const claimTx = this.db.transaction((sessionId: number, timestamp: number) => {
const peekStmt = this.db.prepare(`
SELECT * FROM pending_messages
WHERE session_db_id = ? AND status = 'pending'
ORDER BY id ASC
LIMIT 1
`);
const msg = peekStmt.get(sessionId) as PersistentPendingMessage | null;
if (msg) {
const updateStmt = this.db.prepare(`
UPDATE pending_messages
SET status = 'processing', started_processing_at_epoch = ?
WHERE id = ?
`);
updateStmt.run(timestamp, msg.id);
// Return updated object
return {
...msg,
status: 'processing',
started_processing_at_epoch: timestamp
} as PersistentPendingMessage;
}
return null;
});
return claimTx(sessionDbId, now) as PersistentPendingMessage | null;
}
/**
* Get all pending messages for session (ordered by creation time)
*/
getAllPending(sessionDbId: number): PersistentPendingMessage[] {
const stmt = this.db.prepare(`
SELECT * FROM pending_messages
WHERE session_db_id = ? AND status = 'pending'
ORDER BY id ASC
`);
return stmt.all(sessionDbId) as PersistentPendingMessage[];
}
/**
* Get all queue messages (for UI display)
* Returns pending, processing, and failed messages (not processed - they're deleted)
* Joins with sdk_sessions to get project name
*/
getQueueMessages(): (PersistentPendingMessage & { project: string | null })[] {
const stmt = this.db.prepare(`
SELECT pm.*, ss.project
FROM pending_messages pm
LEFT JOIN sdk_sessions ss ON pm.content_session_id = ss.content_session_id
WHERE pm.status IN ('pending', 'processing', 'failed')
ORDER BY
CASE pm.status
WHEN 'failed' THEN 0
WHEN 'processing' THEN 1
WHEN 'pending' THEN 2
END,
pm.created_at_epoch ASC
`);
return stmt.all() as (PersistentPendingMessage & { project: string | null })[];
}
/**
* Get count of stuck messages (processing longer than threshold)
*/
getStuckCount(thresholdMs: number): number {
const cutoff = Date.now() - thresholdMs;
const stmt = this.db.prepare(`
SELECT COUNT(*) as count FROM pending_messages
WHERE status = 'processing' AND started_processing_at_epoch < ?
`);
const result = stmt.get(cutoff) as { count: number };
return result.count;
}
/**
* Retry a specific message (reset to pending)
* Works for pending (re-queue), processing (reset stuck), and failed messages
*/
retryMessage(messageId: number): boolean {
const stmt = this.db.prepare(`
UPDATE pending_messages
SET status = 'pending', started_processing_at_epoch = NULL
WHERE id = ? AND status IN ('pending', 'processing', 'failed')
`);
const result = stmt.run(messageId);
return result.changes > 0;
}
/**
* Reset all processing messages for a session to pending
* Used when force-restarting a stuck session
*/
resetProcessingToPending(sessionDbId: number): number {
const stmt = this.db.prepare(`
UPDATE pending_messages
SET status = 'pending', started_processing_at_epoch = NULL
WHERE session_db_id = ? AND status = 'processing'
`);
const result = stmt.run(sessionDbId);
return result.changes;
}
/**
* Abort a specific message (delete from queue)
*/
abortMessage(messageId: number): boolean {
const stmt = this.db.prepare('DELETE FROM pending_messages WHERE id = ?');
const result = stmt.run(messageId);
return result.changes > 0;
}
/**
* Retry all stuck messages at once
*/
retryAllStuck(thresholdMs: number): number {
const cutoff = Date.now() - thresholdMs;
const stmt = this.db.prepare(`
UPDATE pending_messages
SET status = 'pending', started_processing_at_epoch = NULL
WHERE status = 'processing' AND started_processing_at_epoch < ?
`);
const result = stmt.run(cutoff);
return result.changes;
}
/**
* Get recently processed messages (for UI feedback)
* Shows messages completed in the last N minutes so users can see their stuck items were processed
*/
getRecentlyProcessed(limit: number = 10, withinMinutes: number = 30): (PersistentPendingMessage & { project: string | null })[] {
const cutoff = Date.now() - (withinMinutes * 60 * 1000);
const stmt = this.db.prepare(`
SELECT pm.*, ss.project
FROM pending_messages pm
LEFT JOIN sdk_sessions ss ON pm.content_session_id = ss.content_session_id
WHERE pm.status = 'processed' AND pm.completed_at_epoch > ?
ORDER BY pm.completed_at_epoch DESC
LIMIT ?
`);
return stmt.all(cutoff, limit) as (PersistentPendingMessage & { project: string | null })[];
}
/**
* Mark message as being processed (status: pending -> processing)
*/
markProcessing(messageId: number): void {
const now = Date.now();
const stmt = this.db.prepare(`
UPDATE pending_messages
SET status = 'processing', started_processing_at_epoch = ?
WHERE id = ? AND status = 'pending'
`);
stmt.run(now, messageId);
}
/**
* Mark message as successfully processed (status: processing -> processed)
* Clears tool_input and tool_response to save space (observations are already saved)
*/
markProcessed(messageId: number): void {
const now = Date.now();
const stmt = this.db.prepare(`
UPDATE pending_messages
SET
status = 'processed',
completed_at_epoch = ?,
tool_input = NULL,
tool_response = NULL
WHERE id = ? AND status = 'processing'
`);
stmt.run(now, messageId);
}
/**
* Mark message as failed (status: processing -> failed or back to pending for retry)
* If retry_count < maxRetries, moves back to 'pending' for retry
* Otherwise marks as 'failed' permanently
*/
markFailed(messageId: number): void {
const now = Date.now();
// Get current retry count
const msg = this.db.prepare('SELECT retry_count FROM pending_messages WHERE id = ?').get(messageId) as { retry_count: number } | undefined;
if (!msg) return;
if (msg.retry_count < this.maxRetries) {
// Move back to pending for retry
const stmt = this.db.prepare(`
UPDATE pending_messages
SET status = 'pending', retry_count = retry_count + 1, started_processing_at_epoch = NULL
WHERE id = ?
`);
stmt.run(messageId);
} else {
// Max retries exceeded, mark as permanently failed
const stmt = this.db.prepare(`
UPDATE pending_messages
SET status = 'failed', completed_at_epoch = ?
WHERE id = ?
`);
stmt.run(now, messageId);
}
}
/**
* Reset stuck messages (processing -> pending if stuck longer than threshold)
* @param thresholdMs Messages processing longer than this are considered stuck (0 = reset all)
* @returns Number of messages reset
*/
resetStuckMessages(thresholdMs: number): number {
const cutoff = thresholdMs === 0 ? Date.now() : Date.now() - thresholdMs;
const stmt = this.db.prepare(`
UPDATE pending_messages
SET status = 'pending', started_processing_at_epoch = NULL
WHERE status = 'processing' AND started_processing_at_epoch < ?
`);
const result = stmt.run(cutoff);
return result.changes;
}
/**
* Get count of pending messages for a session
*/
getPendingCount(sessionDbId: number): number {
const stmt = this.db.prepare(`
SELECT COUNT(*) as count FROM pending_messages
WHERE session_db_id = ? AND status IN ('pending', 'processing')
`);
const result = stmt.get(sessionDbId) as { count: number };
return result.count;
}
/**
* Check if any session has pending work
*/
hasAnyPendingWork(): boolean {
const stmt = this.db.prepare(`
SELECT COUNT(*) as count FROM pending_messages
WHERE status IN ('pending', 'processing')
`);
const result = stmt.get() as { count: number };
return result.count > 0;
}
/**
* Get all session IDs that have pending messages (for recovery on startup)
*/
getSessionsWithPendingMessages(): number[] {
const stmt = this.db.prepare(`
SELECT DISTINCT session_db_id FROM pending_messages
WHERE status IN ('pending', 'processing')
`);
const results = stmt.all() as { session_db_id: number }[];
return results.map(r => r.session_db_id);
}
/**
* Get session info for a pending message (for recovery)
*/
getSessionInfoForMessage(messageId: number): { sessionDbId: number; contentSessionId: string } | null {
const stmt = this.db.prepare(`
SELECT session_db_id, content_session_id FROM pending_messages WHERE id = ?
`);
const result = stmt.get(messageId) as { session_db_id: number; content_session_id: string } | undefined;
return result ? { sessionDbId: result.session_db_id, contentSessionId: result.content_session_id } : null;
}
/**
* Cleanup old processed messages (retention policy)
* Keeps the most recent N processed messages, deletes the rest
* @param retentionCount Number of processed messages to keep (default: 100)
* @returns Number of messages deleted
*/
cleanupProcessed(retentionCount: number = 100): number {
const stmt = this.db.prepare(`
DELETE FROM pending_messages
WHERE status = 'processed'
AND id NOT IN (
SELECT id FROM pending_messages
WHERE status = 'processed'
ORDER BY completed_at_epoch DESC
LIMIT ?
)
`);
const result = stmt.run(retentionCount);
return result.changes;
}
/**
* Clear all failed messages from the queue
* @returns Number of messages deleted
*/
clearFailed(): number {
const stmt = this.db.prepare(`
DELETE FROM pending_messages
WHERE status = 'failed'
`);
const result = stmt.run();
return result.changes;
}
/**
* Clear all pending, processing, and failed messages from the queue
* Keeps only processed messages (for history)
* @returns Number of messages deleted
*/
clearAll(): number {
const stmt = this.db.prepare(`
DELETE FROM pending_messages
WHERE status IN ('pending', 'processing', 'failed')
`);
const result = stmt.run();
return result.changes;
}
/**
* Convert a PersistentPendingMessage back to PendingMessage format
*/
toPendingMessage(persistent: PersistentPendingMessage): PendingMessage {
return {
type: persistent.message_type,
tool_name: persistent.tool_name || undefined,
tool_input: persistent.tool_input ? JSON.parse(persistent.tool_input) : undefined,
tool_response: persistent.tool_response ? JSON.parse(persistent.tool_response) : undefined,
prompt_number: persistent.prompt_number || undefined,
cwd: persistent.cwd || undefined,
last_user_message: persistent.last_user_message || undefined,
last_assistant_message: persistent.last_assistant_message || undefined
};
}
}