Improve error handling and logging across worker services (#528)

* fix: prevent memory_session_id from equaling content_session_id The bug: memory_session_id was initialized to contentSessionId as a "placeholder for FK purposes". This caused the SDK resume logic to inject memory agent messages into the USER's Claude Code transcript, corrupting their conversation history. Root cause: - SessionStore.createSDKSession initialized memory_session_id = contentSessionId - SDKAgent checked memorySessionId !== contentSessionId but this check only worked if the session was fetched fresh from DB The fix: - SessionStore: Initialize memory_session_id as NULL, not contentSessionId - SDKAgent: Simple truthy check !!session.memorySessionId (NULL = fresh start) - Database migration: Ran UPDATE to set memory_session_id = NULL for 1807 existing sessions that had the bug Also adds [ALIGNMENT] logging across the session lifecycle to help debug session continuity issues: - Hook entry: contentSessionId + promptNumber - DB lookup: contentSessionId → memorySessionId mapping proof - Resume decision: shows which memorySessionId will be used for resume - Capture: logs when memorySessionId is captured from first SDK response UI: Added "Alignment" quick filter button in LogsModal to show only alignment logs for debugging session continuity. 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com> * refactor: improve error handling in worker-service.ts - Fix GENERIC_CATCH anti-patterns by logging full error objects instead of just messages - Add [ANTI-PATTERN IGNORED] markers for legitimate cases (cleanup, hot paths) - Simplify error handling comments to be more concise - Improve httpShutdown() error discrimination for ECONNREFUSED - Reduce LARGE_TRY_BLOCK issues in initialization code Part of anti-pattern cleanup plan (132 total issues) * refactor: improve error logging in SearchManager.ts - Pass full error objects to logger instead of just error.message - Fixes PARTIAL_ERROR_LOGGING anti-patterns (10 instances) - Better debugging visibility when Chroma queries fail Part of anti-pattern cleanup (133 remaining) * refactor: improve error logging across SessionStore and mcp-server - SessionStore.ts: Fix error logging in column rename utility - mcp-server.ts: Log full error objects instead of just error.message - Improve error handling in Worker API calls and tool execution Part of anti-pattern cleanup (133 remaining) * Refactor hooks to streamline error handling and loading states - Simplified error handling in useContextPreview by removing try-catch and directly checking response status. - Refactored usePagination to eliminate try-catch, improving readability and maintaining error handling through response checks. - Cleaned up useSSE by removing unnecessary try-catch around JSON parsing, ensuring clarity in message handling. - Enhanced useSettings by streamlining the saving process, removing try-catch, and directly checking the result for success. * refactor: add error handling back to SearchManager Chroma calls - Wrap queryChroma calls in try-catch to prevent generator crashes - Log Chroma errors as warnings and fall back gracefully - Fixes generator failures when Chroma has issues - Part of anti-pattern cleanup recovery * feat: Add generator failure investigation report and observation duplication regression report - Created a comprehensive investigation report detailing the root cause of generator failures during anti-pattern cleanup, including the impact, investigation process, and implemented fixes. - Documented the critical regression causing observation duplication due to race conditions in the SDK agent, outlining symptoms, root cause analysis, and proposed fixes. * fix: address PR #528 review comments - atomic cleanup and detector improvements This commit addresses critical review feedback from PR #528: ## 1. Atomic Message Cleanup (Fix Race Condition) **Problem**: SessionRoutes.ts generator error handler had race condition - Queried messages then marked failed in loop - If crash during loop → partial marking → inconsistent state **Solution**: - Added `markSessionMessagesFailed()` to PendingMessageStore.ts - Single atomic UPDATE statement replaces loop - Follows existing pattern from `resetProcessingToPending()` **Files**: - src/services/sqlite/PendingMessageStore.ts (new method) - src/services/worker/http/routes/SessionRoutes.ts (use new method) ## 2. Anti-Pattern Detector Improvements **Problem**: Detector didn't recognize logger.failure() method - Lines 212 & 335 already included "failure" - Lines 112-113 (PARTIAL_ERROR_LOGGING detection) did not **Solution**: Updated regex patterns to include "failure" for consistency **Files**: - scripts/anti-pattern-test/detect-error-handling-antipatterns.ts ## 3. Documentation **PR Comment**: Added clarification on memory_session_id fix location - Points to SessionStore.ts:1155 - Explains why NULL initialization prevents message injection bug ## Review Response Addresses "Must Address Before Merge" items from review: ✅ Clarified memory_session_id bug fix location (via PR comment) ✅ Made generator error handler message cleanup atomic ❌ Deferred comprehensive test suite to follow-up PR (keeps PR focused) ## Testing - Build passes with no errors - Anti-pattern detector runs successfully - Atomic cleanup follows proven pattern from existing methods 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com> * fix: FOREIGN KEY constraint and missing failed_at_epoch column Two critical bugs fixed: 1. Missing failed_at_epoch column in pending_messages table - Added migration 20 to create the column - Fixes error when trying to mark messages as failed 2. FOREIGN KEY constraint failed when storing observations - All three agents (SDK, Gemini, OpenRouter) were passing session.contentSessionId instead of session.memorySessionId - storeObservationsAndMarkComplete expects memorySessionId - Added null check and clear error message However, observations still not saving - see investigation report. 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com> * Refactor hook input parsing to improve error handling - Added a nested try-catch block in new-hook.ts, save-hook.ts, and summary-hook.ts to handle JSON parsing errors more gracefully. - Replaced direct error throwing with logging of the error details using logger.error. - Ensured that the process exits cleanly after handling input in all three hooks. --------- Co-authored-by: Claude Opus 4.5 <noreply@anthropic.com>
2026-01-03 18:51:59 -05:00
parent e830157e77
commit 817b9e8f27
31 changed files with 4490 additions and 3292 deletions
@@ -147,23 +147,18 @@ export class SessionRoutes extends BaseRouteHandler {

        // Mark all processing messages as failed so they can be retried or abandoned
        const pendingStore = this.sessionManager.getPendingMessageStore();
-        const db = this.dbManager.getSessionStore().db;
        try {
-          const stmt = db.prepare(`
-            SELECT id FROM pending_messages
-            WHERE session_db_id = ? AND status = 'processing'
-          `);
-          const processingMessages = stmt.all(session.sessionDbId) as { id: number }[];
-
-          for (const msg of processingMessages) {
-            pendingStore.markFailed(msg.id);
-            logger.warn('SESSION', `Marked message as failed after generator error`, {
+          const failedCount = pendingStore.markSessionMessagesFailed(session.sessionDbId);
+          if (failedCount > 0) {
+            logger.warn('SESSION', `Marked messages as failed after generator error`, {
              sessionId: session.sessionDbId,
-              messageId: msg.id
+              failedCount
            });
          }
        } catch (dbError) {
-          logger.error('SESSION', 'Failed to mark messages as failed', { sessionId: session.sessionDbId }, dbError as Error);
+          logger.error('SESSION', 'Failed to mark messages as failed', {
+            sessionId: session.sessionDbId
+          }, dbError as Error);
        }
      })
      .finally(() => {
@@ -570,6 +565,11 @@ export class SessionRoutes extends BaseRouteHandler {
      contentSessionId
    });

+    // SESSION ALIGNMENT LOG: DB lookup proof - show content→memory mapping
+    const dbSession = store.getSessionById(sessionDbId);
+    const memorySessionId = dbSession?.memory_session_id || null;
+    const hasCapturedMemoryId = !!memorySessionId;
+
    // Step 2: Get next prompt number from user_prompts count
    const currentCount = store.getPromptNumberFromUserPrompts(contentSessionId);
    const promptNumber = currentCount + 1;
@@ -580,6 +580,13 @@ export class SessionRoutes extends BaseRouteHandler {
      currentCount
    });

+    // SESSION ALIGNMENT LOG: For prompt > 1, prove we looked up memorySessionId from contentSessionId
+    if (promptNumber > 1) {
+      logger.info('HTTP', `[ALIGNMENT] DB Lookup Proof | contentSessionId=${contentSessionId} → memorySessionId=${memorySessionId || '(not yet captured)'} | prompt#=${promptNumber} | hasCapturedMemoryId=${hasCapturedMemoryId}`);
+    } else {
+      logger.info('HTTP', `[ALIGNMENT] New Session | contentSessionId=${contentSessionId} | prompt#=${promptNumber} | memorySessionId will be captured on first SDK response`);
+    }
+
    // Step 3: Strip privacy tags from prompt
    const cleanedPrompt = stripMemoryTagsFromPrompt(prompt);