Improve error handling and logging across worker services (#528)

* fix: prevent memory_session_id from equaling content_session_id

  The bug: memory_session_id was initialized to contentSessionId as a "placeholder for FK purposes". This caused the SDK resume logic to inject memory agent messages into the USER's Claude Code transcript, corrupting their conversation history.

  Root cause:
  - SessionStore.createSDKSession initialized memory_session_id = contentSessionId
  - SDKAgent checked memorySessionId !== contentSessionId but this check only worked if the session was fetched fresh from DB

  The fix:
  - SessionStore: Initialize memory_session_id as NULL, not contentSessionId
  - SDKAgent: Simple truthy check !!session.memorySessionId (NULL = fresh start)
  - Database migration: Ran UPDATE to set memory_session_id = NULL for 1807 existing sessions that had the bug

  Also adds [ALIGNMENT] logging across the session lifecycle to help debug session continuity issues:
  - Hook entry: contentSessionId + promptNumber
  - DB lookup: contentSessionId → memorySessionId mapping proof
  - Resume decision: shows which memorySessionId will be used for resume
  - Capture: logs when memorySessionId is captured from first SDK response

  UI: Added "Alignment" quick filter button in LogsModal to show only alignment logs for debugging session continuity.

  🤖 Generated with [Claude Code](https://claude.com/claude-code)

  Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

* refactor: improve error handling in worker-service.ts

  - Fix GENERIC_CATCH anti-patterns by logging full error objects instead of just messages
  - Add [ANTI-PATTERN IGNORED] markers for legitimate cases (cleanup, hot paths)
  - Simplify error handling comments to be more concise
  - Improve httpShutdown() error discrimination for ECONNREFUSED
  - Reduce LARGE_TRY_BLOCK issues in initialization code

  Part of anti-pattern cleanup plan (132 total issues)

* refactor: improve error logging in SearchManager.ts

  - Pass full error objects to logger instead of just error.message
  - Fixes PARTIAL_ERROR_LOGGING anti-patterns (10 instances)
  - Better debugging visibility when Chroma queries fail

  Part of anti-pattern cleanup (133 remaining)

* refactor: improve error logging across SessionStore and mcp-server

  - SessionStore.ts: Fix error logging in column rename utility
  - mcp-server.ts: Log full error objects instead of just error.message
  - Improve error handling in Worker API calls and tool execution

  Part of anti-pattern cleanup (133 remaining)

* Refactor hooks to streamline error handling and loading states

  - Simplified error handling in useContextPreview by removing try-catch and directly checking response status.
  - Refactored usePagination to eliminate try-catch, improving readability and maintaining error handling through response checks.
  - Cleaned up useSSE by removing unnecessary try-catch around JSON parsing, ensuring clarity in message handling.
  - Enhanced useSettings by streamlining the saving process, removing try-catch, and directly checking the result for success.

* refactor: add error handling back to SearchManager Chroma calls

  - Wrap queryChroma calls in try-catch to prevent generator crashes
  - Log Chroma errors as warnings and fall back gracefully
  - Fixes generator failures when Chroma has issues
  - Part of anti-pattern cleanup recovery

* feat: Add generator failure investigation report and observation duplication regression report

  - Created a comprehensive investigation report detailing the root cause of generator failures during anti-pattern cleanup, including the impact, investigation process, and implemented fixes.
  - Documented the critical regression causing observation duplication due to race conditions in the SDK agent, outlining symptoms, root cause analysis, and proposed fixes.

* fix: address PR #528 review comments - atomic cleanup and detector improvements

  This commit addresses critical review feedback from PR #528:

  ## 1. Atomic Message Cleanup (Fix Race Condition)

  **Problem**: SessionRoutes.ts generator error handler had race condition
  - Queried messages then marked failed in loop
  - If crash during loop → partial marking → inconsistent state

  **Solution**:
  - Added `markSessionMessagesFailed()` to PendingMessageStore.ts
  - Single atomic UPDATE statement replaces loop
  - Follows existing pattern from `resetProcessingToPending()`

  **Files**:
  - src/services/sqlite/PendingMessageStore.ts (new method)
  - src/services/worker/http/routes/SessionRoutes.ts (use new method)

  ## 2. Anti-Pattern Detector Improvements

  **Problem**: Detector didn't recognize logger.failure() method
  - Lines 212 & 335 already included "failure"
  - Lines 112-113 (PARTIAL_ERROR_LOGGING detection) did not

  **Solution**: Updated regex patterns to include "failure" for consistency

  **Files**:
  - scripts/anti-pattern-test/detect-error-handling-antipatterns.ts

  ## 3. Documentation

  **PR Comment**: Added clarification on memory_session_id fix location
  - Points to SessionStore.ts:1155
  - Explains why NULL initialization prevents message injection bug

  ## Review Response

  Addresses "Must Address Before Merge" items from review:
  ✅ Clarified memory_session_id bug fix location (via PR comment)
  ✅ Made generator error handler message cleanup atomic
  ❌ Deferred comprehensive test suite to follow-up PR (keeps PR focused)

  ## Testing

  - Build passes with no errors
  - Anti-pattern detector runs successfully
  - Atomic cleanup follows proven pattern from existing methods

  🤖 Generated with [Claude Code](https://claude.com/claude-code)

  Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>

* fix: FOREIGN KEY constraint and missing failed_at_epoch column

  Two critical bugs fixed:

  1. Missing failed_at_epoch column in pending_messages table
     - Added migration 20 to create the column
     - Fixes error when trying to mark messages as failed

  2. FOREIGN KEY constraint failed when storing observations
     - All three agents (SDK, Gemini, OpenRouter) were passing session.contentSessionId instead of session.memorySessionId
     - storeObservationsAndMarkComplete expects memorySessionId
     - Added null check and clear error message

  However, observations still not saving - see investigation report.

  🤖 Generated with [Claude Code](https://claude.com/claude-code)

  Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>

* Refactor hook input parsing to improve error handling

  - Added a nested try-catch block in new-hook.ts, save-hook.ts, and summary-hook.ts to handle JSON parsing errors more gracefully.
  - Replaced direct error throwing with logging of the error details using logger.error.
  - Ensured that the process exits cleanly after handling input in all three hooks.

---------

Co-authored-by: Claude Opus 4.5 <noreply@anthropic.com>
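
Several of the commits above replace `error.message` logging with logging of the full error object. The following is a minimal sketch of why that matters, not the project's logger: `ConnectionError`, `partialLog`, and `fullLog` are hypothetical names introduced only for illustration, and the real logger's serialization may differ.

```typescript
// Hypothetical error type carrying a machine-readable code,
// similar in spirit to Node's ECONNREFUSED errors.
class ConnectionError extends Error {
  readonly code: string;

  constructor(message: string, code: string) {
    super(message);
    this.name = "ConnectionError";
    this.code = code;
  }
}

// Anti-pattern: only the message survives; the code and stack trace are dropped.
function partialLog(error: Error): string {
  return JSON.stringify({ error: error.message });
}

// Preferred: keep the whole error. JSON.stringify skips the non-enumerable
// message and stack fields, so they are copied explicitly next to the
// error's own enumerable properties (such as code).
function fullLog(error: Error): string {
  return JSON.stringify({
    ...error,
    name: error.name,
    message: error.message,
    stack: error.stack,
  });
}

const refused = new ConnectionError("worker unreachable", "ECONNREFUSED");
console.log(partialLog(refused)); // message only
console.log(fullLog(refused)); // name, message, stack, and code
```

With the partial form, a handler that later needs to distinguish ECONNREFUSED from other failures has nothing to work with except the message string, which is the situation the ERROR_STRING_MATCHING check in the detector is meant to flag.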
2026-01-03 18:51:59 -05:00
parent e830157e77
commit 817b9e8f27
31 changed files with 4490 additions and 3292 deletions
@@ -15,7 +15,7 @@ interface AntiPattern {
  file: string;
  line: number;
  pattern: string;
-  severity: 'CRITICAL' | 'HIGH' | 'MEDIUM' | 'APPROVED_OVERRIDE';
+  severity: 'ISSUE' | 'APPROVED_OVERRIDE';
  description: string;
  code: string;
  overrideReason?: string;
@@ -98,7 +98,7 @@ function detectAntiPatterns(filePath: string, projectRoot: string): AntiPattern[
            file: relPath,
            line: i + 1,
            pattern: 'ERROR_STRING_MATCHING',
-            severity: isGeneric ? 'CRITICAL' : 'HIGH',
+            severity: 'ISSUE',
            description: `Error type detection via string matching on "${matchedString}" - fragile and masks the real error. Log the FULL error object. We don't care about pretty error handling, we care about SEEING what went wrong.`,
            code: trimmed
          });
@@ -109,8 +109,8 @@ function detectAntiPatterns(filePath: string, projectRoot: string): AntiPattern[
    // HIGH: Logging only error.message instead of the full error object
    // Patterns like: logger.error('X', 'Y', {}, error.message) or console.error(error.message)
    const partialErrorLoggingPatterns = [
-      /logger\.(error|warn|info|debug)\s*\([^)]*,\s*(?:error|err|e)\.message\s*\)/,
-      /logger\.(error|warn|info|debug)\s*\([^)]*\{\s*(?:error|err|e):\s*(?:error|err|e)\.message\s*\}/,
+      /logger\.(error|warn|info|debug|failure)\s*\([^)]*,\s*(?:error|err|e)\.message\s*\)/,
+      /logger\.(error|warn|info|debug|failure)\s*\([^)]*\{\s*(?:error|err|e):\s*(?:error|err|e)\.message\s*\}/,
      /console\.(error|warn|log)\s*\(\s*(?:error|err|e)\.message\s*\)/,
      /console\.(error|warn|log)\s*\(\s*['"`][^'"`]+['"`]\s*,\s*(?:error|err|e)\.message\s*\)/,
    ];
@@ -132,7 +132,7 @@ function detectAntiPatterns(filePath: string, projectRoot: string): AntiPattern[
            file: relPath,
            line: i + 1,
            pattern: 'PARTIAL_ERROR_LOGGING',
-            severity: 'HIGH',
+            severity: 'ISSUE',
            description: 'Logging only error.message HIDES the stack trace, error type, and all properties. ALWAYS pass the full error object - you need the complete picture, not a summary.',
            code: trimmed
          });
@@ -159,7 +159,7 @@ function detectAntiPatterns(filePath: string, projectRoot: string): AntiPattern[
          file: relPath,
          line: i + 1,
          pattern: 'ERROR_MESSAGE_GUESSING',
-          severity: 'CRITICAL',
+          severity: 'ISSUE',
          description: 'Multiple string checks on error message to guess error type. STOP GUESSING. Log the FULL error object. We don\'t care what the library throws - we care about SEEING the error when it happens.',
          code: trimmed
        });
@@ -187,7 +187,7 @@ function detectAntiPatterns(filePath: string, projectRoot: string): AntiPattern[
        file: relPath,
        line: i + 1,
        pattern: 'PROMISE_EMPTY_CATCH',
-        severity: 'CRITICAL',
+        severity: 'ISSUE',
        description: 'Promise .catch() with empty handler - errors disappear into the void.',
        code: trimmed
      });
@@ -217,7 +217,7 @@ function detectAntiPatterns(filePath: string, projectRoot: string): AntiPattern[
          file: relPath,
          line: i + 1,
          pattern: 'PROMISE_CATCH_NO_LOGGING',
-          severity: 'CRITICAL',
+          severity: 'ISSUE',
          description: 'Promise .catch() without logging - errors are silently swallowed.',
          code: catchBody.trim().split('\n').slice(0, 5).join('\n')
        });
@@ -353,7 +353,7 @@ function analyzeTryCatchBlock(
        file: relPath,
        line: catchStartLine,
        pattern: 'NO_LOGGING_IN_CATCH',
-        severity: 'CRITICAL',
+        severity: 'ISSUE',
        description: 'Catch block has no logging - errors occur invisibly.',
        code: catchBlock.trim()
      });
@@ -371,7 +371,7 @@ function analyzeTryCatchBlock(
      file: relPath,
      line: tryStartLine,
      pattern: 'LARGE_TRY_BLOCK',
-      severity: 'HIGH',
+      severity: 'ISSUE',
      description: `Try block has ${significantTryLines} lines - too broad. Multiple errors lumped together.`,
      code: `${tryLines.slice(0, 3).join('\n')}\n... (${significantTryLines} lines) ...`
    });
@@ -388,7 +388,7 @@ function analyzeTryCatchBlock(
      file: relPath,
      line: catchStartLine,
      pattern: 'GENERIC_CATCH',
-      severity: 'MEDIUM',
+      severity: 'ISSUE',
      description: 'Catch block handles all errors identically - no error type discrimination.',
      code: catchBlock.trim()
    });
@@ -416,7 +416,7 @@ function analyzeTryCatchBlock(
          file: relPath,
          line: catchStartLine,
          pattern: 'CATCH_AND_CONTINUE_CRITICAL_PATH',
-          severity: 'CRITICAL',
+          severity: 'ISSUE',
          description: 'Critical path continues after error - may cause silent data corruption.',
          code: catchBlock.trim()
        });
@@ -427,9 +427,7 @@ function analyzeTryCatchBlock(
 }

 function formatReport(antiPatterns: AntiPattern[]): string {
-  const critical = antiPatterns.filter(a => a.severity === 'CRITICAL');
-  const high = antiPatterns.filter(a => a.severity === 'HIGH');
-  const medium = antiPatterns.filter(a => a.severity === 'MEDIUM');
+  const issues = antiPatterns.filter(a => a.severity === 'ISSUE');
  const approved = antiPatterns.filter(a => a.severity === 'APPROVED_OVERRIDE');

  if (antiPatterns.length === 0) {
@@ -440,47 +438,16 @@ function formatReport(antiPatterns: AntiPattern[]): string {
  report += '═══════════════════════════════════════════════════════════════\n';
  report += '  ERROR HANDLING ANTI-PATTERNS DETECTED\n';
  report += '═══════════════════════════════════════════════════════════════\n\n';
-  report += `Found ${critical.length + high.length + medium.length} anti-patterns:\n`;
-  report += `  🔴 CRITICAL: ${critical.length}\n`;
-  report += `  🟠 HIGH: ${high.length}\n`;
-  report += `  🟡 MEDIUM: ${medium.length}\n`;
+  report += `Found ${issues.length} anti-patterns that must be fixed:\n`;
  if (approved.length > 0) {
    report += `  ⚪ APPROVED OVERRIDES: ${approved.length}\n`;
  }
  report += '\n';

-  if (critical.length > 0) {
-    report += '🔴 CRITICAL ISSUES (Fix immediately - these cause silent failures):\n';
+  if (issues.length > 0) {
+    report += '❌ ISSUES TO FIX:\n';
    report += '─────────────────────────────────────────────────────────────\n\n';
-    for (const ap of critical) {
-      report += `📁 ${ap.file}:${ap.line}\n`;
-      report += `❌ ${ap.pattern}\n`;
-      report += `   ${ap.description}\n\n`;
-      report += `   Code:\n`;
-      const codeLines = ap.code.split('\n');
-      for (const line of codeLines.slice(0, 5)) {
-        report += `   ${line}\n`;
-      }
-      if (codeLines.length > 5) {
-        report += `   ... (${codeLines.length - 5} more lines)\n`;
-      }
-      report += '\n';
-    }
-  }
-
-  if (high.length > 0) {
-    report += '🟠 HIGH PRIORITY:\n';
-    report += '─────────────────────────────────────────────────────────────\n\n';
-    for (const ap of high) {
-      report += `📁 ${ap.file}:${ap.line} - ${ap.pattern}\n`;
-      report += `   ${ap.description}\n\n`;
-    }
-  }
-
-  if (medium.length > 0) {
-    report += '🟡 MEDIUM PRIORITY:\n';
-    report += '─────────────────────────────────────────────────────────────\n\n';
-    for (const ap of medium) {
+    for (const ap of issues) {
      report += `📁 ${ap.file}:${ap.line} - ${ap.pattern}\n`;
      report += `   ${ap.description}\n\n`;
    }
@@ -537,10 +504,10 @@ for (const file of tsFiles) {
 const report = formatReport(allAntiPatterns);
 console.log(report);

-// Exit with error code if critical issues found
-const critical = allAntiPatterns.filter(a => a.severity === 'CRITICAL');
-if (critical.length > 0) {
-  console.error(`❌ FAILED: ${critical.length} critical error handling anti-patterns must be fixed.\n`);
+// Exit with error code if any issues found
+const issues = allAntiPatterns.filter(a => a.severity === 'ISSUE');
+if (issues.length > 0) {
+  console.error(`❌ FAILED: ${issues.length} error handling anti-patterns must be fixed.\n`);
  process.exit(1);
 }
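
The effect of adding `failure` to the PARTIAL_ERROR_LOGGING patterns can be checked in isolation. The sketch below copies the first pattern from the diff in its before and after forms; the sample log lines, the `'WORKER'` component name, and the `flagsPartialLogging` helper are assumptions made for illustration and do not come from the codebase.

```typescript
// First PARTIAL_ERROR_LOGGING pattern, before and after the change in this diff.
const beforeFix =
  /logger\.(error|warn|info|debug)\s*\([^)]*,\s*(?:error|err|e)\.message\s*\)/;
const afterFix =
  /logger\.(error|warn|info|debug|failure)\s*\([^)]*,\s*(?:error|err|e)\.message\s*\)/;

// Illustrative source lines (hypothetical component name and message text).
const partialFailureCall =
  "logger.failure('WORKER', 'Chroma query failed', {}, error.message)";
const fullFailureCall =
  "logger.failure('WORKER', 'Chroma query failed', {}, error)";
const partialErrorCall =
  "logger.error('WORKER', 'Chroma query failed', {}, err.message)";

// Hypothetical helper: true when the line would be reported as partial error logging.
function flagsPartialLogging(pattern: RegExp, line: string): boolean {
  return pattern.test(line);
}

console.log(flagsPartialLogging(beforeFix, partialFailureCall)); // false: missed before the fix
console.log(flagsPartialLogging(afterFix, partialFailureCall)); // true: caught after the fix
console.log(flagsPartialLogging(afterFix, fullFailureCall)); // false: full error object is fine
console.log(flagsPartialLogging(beforeFix, partialErrorCall)); // true: logger.error was already covered
```

Because every finding now carries the single `ISSUE` severity, a line flagged this way is enough to make the script exit with code 1, whereas before this change only `CRITICAL` findings did.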