Improve error handling and logging across worker services (#528)
* fix: prevent memory_session_id from equaling content_session_id The bug: memory_session_id was initialized to contentSessionId as a "placeholder for FK purposes". This caused the SDK resume logic to inject memory agent messages into the USER's Claude Code transcript, corrupting their conversation history. Root cause: - SessionStore.createSDKSession initialized memory_session_id = contentSessionId - SDKAgent checked memorySessionId !== contentSessionId but this check only worked if the session was fetched fresh from DB The fix: - SessionStore: Initialize memory_session_id as NULL, not contentSessionId - SDKAgent: Simple truthy check !!session.memorySessionId (NULL = fresh start) - Database migration: Ran UPDATE to set memory_session_id = NULL for 1807 existing sessions that had the bug Also adds [ALIGNMENT] logging across the session lifecycle to help debug session continuity issues: - Hook entry: contentSessionId + promptNumber - DB lookup: contentSessionId → memorySessionId mapping proof - Resume decision: shows which memorySessionId will be used for resume - Capture: logs when memorySessionId is captured from first SDK response UI: Added "Alignment" quick filter button in LogsModal to show only alignment logs for debugging session continuity. 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com> * refactor: improve error handling in worker-service.ts - Fix GENERIC_CATCH anti-patterns by logging full error objects instead of just messages - Add [ANTI-PATTERN IGNORED] markers for legitimate cases (cleanup, hot paths) - Simplify error handling comments to be more concise - Improve httpShutdown() error discrimination for ECONNREFUSED - Reduce LARGE_TRY_BLOCK issues in initialization code Part of anti-pattern cleanup plan (132 total issues) * refactor: improve error logging in SearchManager.ts - Pass full error objects to logger instead of just error.message - Fixes PARTIAL_ERROR_LOGGING anti-patterns (10 instances) - Better debugging visibility when Chroma queries fail Part of anti-pattern cleanup (133 remaining) * refactor: improve error logging across SessionStore and mcp-server - SessionStore.ts: Fix error logging in column rename utility - mcp-server.ts: Log full error objects instead of just error.message - Improve error handling in Worker API calls and tool execution Part of anti-pattern cleanup (133 remaining) * Refactor hooks to streamline error handling and loading states - Simplified error handling in useContextPreview by removing try-catch and directly checking response status. - Refactored usePagination to eliminate try-catch, improving readability and maintaining error handling through response checks. - Cleaned up useSSE by removing unnecessary try-catch around JSON parsing, ensuring clarity in message handling. - Enhanced useSettings by streamlining the saving process, removing try-catch, and directly checking the result for success. * refactor: add error handling back to SearchManager Chroma calls - Wrap queryChroma calls in try-catch to prevent generator crashes - Log Chroma errors as warnings and fall back gracefully - Fixes generator failures when Chroma has issues - Part of anti-pattern cleanup recovery * feat: Add generator failure investigation report and observation duplication regression report - Created a comprehensive investigation report detailing the root cause of generator failures during anti-pattern cleanup, including the impact, investigation process, and implemented fixes. - Documented the critical regression causing observation duplication due to race conditions in the SDK agent, outlining symptoms, root cause analysis, and proposed fixes. * fix: address PR #528 review comments - atomic cleanup and detector improvements This commit addresses critical review feedback from PR #528: ## 1. Atomic Message Cleanup (Fix Race Condition) **Problem**: SessionRoutes.ts generator error handler had race condition - Queried messages then marked failed in loop - If crash during loop → partial marking → inconsistent state **Solution**: - Added `markSessionMessagesFailed()` to PendingMessageStore.ts - Single atomic UPDATE statement replaces loop - Follows existing pattern from `resetProcessingToPending()` **Files**: - src/services/sqlite/PendingMessageStore.ts (new method) - src/services/worker/http/routes/SessionRoutes.ts (use new method) ## 2. Anti-Pattern Detector Improvements **Problem**: Detector didn't recognize logger.failure() method - Lines 212 & 335 already included "failure" - Lines 112-113 (PARTIAL_ERROR_LOGGING detection) did not **Solution**: Updated regex patterns to include "failure" for consistency **Files**: - scripts/anti-pattern-test/detect-error-handling-antipatterns.ts ## 3. Documentation **PR Comment**: Added clarification on memory_session_id fix location - Points to SessionStore.ts:1155 - Explains why NULL initialization prevents message injection bug ## Review Response Addresses "Must Address Before Merge" items from review: ✅ Clarified memory_session_id bug fix location (via PR comment) ✅ Made generator error handler message cleanup atomic ❌ Deferred comprehensive test suite to follow-up PR (keeps PR focused) ## Testing - Build passes with no errors - Anti-pattern detector runs successfully - Atomic cleanup follows proven pattern from existing methods 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com> * fix: FOREIGN KEY constraint and missing failed_at_epoch column Two critical bugs fixed: 1. Missing failed_at_epoch column in pending_messages table - Added migration 20 to create the column - Fixes error when trying to mark messages as failed 2. FOREIGN KEY constraint failed when storing observations - All three agents (SDK, Gemini, OpenRouter) were passing session.contentSessionId instead of session.memorySessionId - storeObservationsAndMarkComplete expects memorySessionId - Added null check and clear error message However, observations still not saving - see investigation report. 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com> * Refactor hook input parsing to improve error handling - Added a nested try-catch block in new-hook.ts, save-hook.ts, and summary-hook.ts to handle JSON parsing errors more gracefully. - Replaced direct error throwing with logging of the error details using logger.error. - Ensured that the process exits cleanly after handling input in all three hooks. --------- Co-authored-by: Claude Opus 4.5 <noreply@anthropic.com>
This commit is contained in:
+145
-160
@@ -212,16 +212,12 @@ export class GeminiAgent {
|
||||
const tokensUsed = obsResponse.tokensUsed || 0;
|
||||
session.cumulativeInputTokens += Math.floor(tokensUsed * 0.7);
|
||||
session.cumulativeOutputTokens += Math.floor(tokensUsed * 0.3);
|
||||
await this.processGeminiResponse(session, obsResponse.content, worker, tokensUsed, originalTimestamp);
|
||||
} else {
|
||||
// Empty response - still mark messages as processed to avoid stuck state
|
||||
logger.warn('SDK', 'Empty Gemini response for observation, marking as processed', {
|
||||
sessionId: session.sessionDbId,
|
||||
toolName: message.tool_name
|
||||
});
|
||||
await this.markMessagesProcessed(session, worker);
|
||||
}
|
||||
|
||||
// Process response (even if empty) - empty responses will have no observations/summaries
|
||||
// but messages still need to be marked complete atomically
|
||||
await this.processGeminiResponse(session, obsResponse.content || '', worker, tokensUsed, originalTimestamp);
|
||||
|
||||
} else if (message.type === 'summarize') {
|
||||
// Build summary prompt
|
||||
const summaryPrompt = buildSummaryPrompt({
|
||||
@@ -243,14 +239,11 @@ export class GeminiAgent {
|
||||
const tokensUsed = summaryResponse.tokensUsed || 0;
|
||||
session.cumulativeInputTokens += Math.floor(tokensUsed * 0.7);
|
||||
session.cumulativeOutputTokens += Math.floor(tokensUsed * 0.3);
|
||||
await this.processGeminiResponse(session, summaryResponse.content, worker, tokensUsed, originalTimestamp);
|
||||
} else {
|
||||
// Empty response - still mark messages as processed to avoid stuck state
|
||||
logger.warn('SDK', 'Empty Gemini response for summary, marking as processed', {
|
||||
sessionId: session.sessionDbId
|
||||
});
|
||||
await this.markMessagesProcessed(session, worker);
|
||||
}
|
||||
|
||||
// Process response (even if empty) - empty responses will have no observations/summaries
|
||||
// but messages still need to be marked complete atomically
|
||||
await this.processGeminiResponse(session, summaryResponse.content || '', worker, tokensUsed, originalTimestamp);
|
||||
}
|
||||
}
|
||||
|
||||
@@ -374,163 +367,155 @@ export class GeminiAgent {
|
||||
discoveryTokens: number,
|
||||
originalTimestamp: number | null
|
||||
): Promise<void> {
|
||||
// Parse observations (same XML format)
|
||||
// Parse observations and summary
|
||||
const observations = parseObservations(text, session.contentSessionId);
|
||||
|
||||
// Store observations with original timestamp (if processing backlog) or current time
|
||||
for (const obs of observations) {
|
||||
const { id: obsId, createdAtEpoch } = this.dbManager.getSessionStore().storeObservation(
|
||||
session.contentSessionId,
|
||||
session.project,
|
||||
obs,
|
||||
session.lastPromptNumber,
|
||||
discoveryTokens,
|
||||
originalTimestamp ?? undefined
|
||||
);
|
||||
|
||||
logger.info('SDK', 'Gemini observation saved', {
|
||||
sessionId: session.sessionDbId,
|
||||
obsId,
|
||||
type: obs.type,
|
||||
title: obs.title || '(untitled)'
|
||||
});
|
||||
|
||||
// Sync to Chroma
|
||||
this.dbManager.getChromaSync().syncObservation(
|
||||
obsId,
|
||||
session.contentSessionId,
|
||||
session.project,
|
||||
obs,
|
||||
session.lastPromptNumber,
|
||||
createdAtEpoch,
|
||||
discoveryTokens
|
||||
).catch(err => {
|
||||
logger.warn('SDK', 'Gemini chroma sync failed', { obsId }, err);
|
||||
});
|
||||
|
||||
// Broadcast to SSE clients
|
||||
if (worker && worker.sseBroadcaster) {
|
||||
worker.sseBroadcaster.broadcast({
|
||||
type: 'new_observation',
|
||||
observation: {
|
||||
id: obsId,
|
||||
memory_session_id: session.memorySessionId,
|
||||
session_id: session.contentSessionId,
|
||||
type: obs.type,
|
||||
title: obs.title,
|
||||
subtitle: obs.subtitle,
|
||||
text: null,
|
||||
narrative: obs.narrative || null,
|
||||
facts: JSON.stringify(obs.facts || []),
|
||||
concepts: JSON.stringify(obs.concepts || []),
|
||||
files_read: JSON.stringify(obs.files_read || []),
|
||||
files_modified: JSON.stringify(obs.files_modified || []),
|
||||
project: session.project,
|
||||
prompt_number: session.lastPromptNumber,
|
||||
created_at_epoch: createdAtEpoch
|
||||
}
|
||||
});
|
||||
}
|
||||
}
|
||||
|
||||
// Parse summary
|
||||
const summary = parseSummary(text, session.sessionDbId);
|
||||
|
||||
if (summary) {
|
||||
// Convert nullable fields to empty strings for storeSummary
|
||||
const summaryForStore = {
|
||||
request: summary.request || '',
|
||||
investigated: summary.investigated || '',
|
||||
learned: summary.learned || '',
|
||||
completed: summary.completed || '',
|
||||
next_steps: summary.next_steps || '',
|
||||
notes: summary.notes
|
||||
};
|
||||
// Convert nullable fields to empty strings for storeSummary (if summary exists)
|
||||
const summaryForStore = summary ? {
|
||||
request: summary.request || '',
|
||||
investigated: summary.investigated || '',
|
||||
learned: summary.learned || '',
|
||||
completed: summary.completed || '',
|
||||
next_steps: summary.next_steps || '',
|
||||
notes: summary.notes
|
||||
} : null;
|
||||
|
||||
const { id: summaryId, createdAtEpoch } = this.dbManager.getSessionStore().storeSummary(
|
||||
session.contentSessionId,
|
||||
session.project,
|
||||
summaryForStore,
|
||||
session.lastPromptNumber,
|
||||
discoveryTokens,
|
||||
originalTimestamp ?? undefined
|
||||
);
|
||||
|
||||
logger.info('SDK', 'Gemini summary saved', {
|
||||
sessionId: session.sessionDbId,
|
||||
summaryId,
|
||||
request: summary.request || '(no request)'
|
||||
});
|
||||
|
||||
// Sync to Chroma
|
||||
this.dbManager.getChromaSync().syncSummary(
|
||||
summaryId,
|
||||
session.contentSessionId,
|
||||
session.project,
|
||||
summaryForStore,
|
||||
session.lastPromptNumber,
|
||||
createdAtEpoch,
|
||||
discoveryTokens
|
||||
).catch(err => {
|
||||
logger.warn('SDK', 'Gemini chroma sync failed', { summaryId }, err);
|
||||
});
|
||||
|
||||
// Broadcast to SSE clients
|
||||
if (worker && worker.sseBroadcaster) {
|
||||
worker.sseBroadcaster.broadcast({
|
||||
type: 'new_summary',
|
||||
summary: {
|
||||
id: summaryId,
|
||||
session_id: session.contentSessionId,
|
||||
request: summary.request,
|
||||
investigated: summary.investigated,
|
||||
learned: summary.learned,
|
||||
completed: summary.completed,
|
||||
next_steps: summary.next_steps,
|
||||
notes: summary.notes,
|
||||
project: session.project,
|
||||
prompt_number: session.lastPromptNumber,
|
||||
created_at_epoch: createdAtEpoch
|
||||
}
|
||||
});
|
||||
}
|
||||
|
||||
// Update Cursor context file for registered projects (fire-and-forget)
|
||||
updateCursorContextForProject(session.project, getWorkerPort()).catch(error => {
|
||||
logger.warn('CURSOR', 'Context update failed (non-critical)', { project: session.project }, error as Error);
|
||||
});
|
||||
}
|
||||
|
||||
// Mark messages as processed
|
||||
await this.markMessagesProcessed(session, worker);
|
||||
}
|
||||
|
||||
/**
|
||||
* Mark pending messages as processed
|
||||
*/
|
||||
private async markMessagesProcessed(session: ActiveSession, worker: any | undefined): Promise<void> {
|
||||
// Get the pending message ID(s) for this response
|
||||
const pendingMessageStore = this.sessionManager.getPendingMessageStore();
|
||||
if (session.pendingProcessingIds.size > 0) {
|
||||
for (const messageId of session.pendingProcessingIds) {
|
||||
pendingMessageStore.markProcessed(messageId);
|
||||
}
|
||||
logger.debug('SDK', 'Gemini messages marked as processed', {
|
||||
sessionId: session.sessionDbId,
|
||||
count: session.pendingProcessingIds.size
|
||||
});
|
||||
session.pendingProcessingIds.clear();
|
||||
const sessionStore = this.dbManager.getSessionStore();
|
||||
|
||||
if (session.pendingProcessingIds.size > 0) {
|
||||
// ATOMIC TRANSACTION: Store observations + summary + mark message(s) complete
|
||||
for (const messageId of session.pendingProcessingIds) {
|
||||
// CRITICAL: Must use memorySessionId (not contentSessionId) for FK constraint
|
||||
if (!session.memorySessionId) {
|
||||
throw new Error('Cannot store observations: memorySessionId not yet captured');
|
||||
}
|
||||
|
||||
const result = sessionStore.storeObservationsAndMarkComplete(
|
||||
session.memorySessionId,
|
||||
session.project,
|
||||
observations,
|
||||
summaryForStore,
|
||||
messageId,
|
||||
pendingMessageStore,
|
||||
session.lastPromptNumber,
|
||||
discoveryTokens,
|
||||
originalTimestamp ?? undefined
|
||||
);
|
||||
|
||||
logger.info('SDK', 'Gemini observations and summary saved atomically', {
|
||||
sessionId: session.sessionDbId,
|
||||
messageId,
|
||||
observationCount: result.observationIds.length,
|
||||
hasSummary: !!result.summaryId,
|
||||
atomicTransaction: true
|
||||
});
|
||||
|
||||
// AFTER transaction commits - async operations (can fail safely)
|
||||
for (let i = 0; i < observations.length; i++) {
|
||||
const obsId = result.observationIds[i];
|
||||
const obs = observations[i];
|
||||
|
||||
this.dbManager.getChromaSync().syncObservation(
|
||||
obsId,
|
||||
session.contentSessionId,
|
||||
session.project,
|
||||
obs,
|
||||
session.lastPromptNumber,
|
||||
result.createdAtEpoch,
|
||||
discoveryTokens
|
||||
).catch(err => {
|
||||
logger.warn('SDK', 'Gemini chroma sync failed', { obsId }, err);
|
||||
});
|
||||
|
||||
// Broadcast to SSE clients
|
||||
if (worker && worker.sseBroadcaster) {
|
||||
worker.sseBroadcaster.broadcast({
|
||||
type: 'new_observation',
|
||||
observation: {
|
||||
id: obsId,
|
||||
memory_session_id: session.memorySessionId,
|
||||
session_id: session.contentSessionId,
|
||||
type: obs.type,
|
||||
title: obs.title,
|
||||
subtitle: obs.subtitle,
|
||||
text: null,
|
||||
narrative: obs.narrative || null,
|
||||
facts: JSON.stringify(obs.facts || []),
|
||||
concepts: JSON.stringify(obs.concepts || []),
|
||||
files_read: JSON.stringify(obs.files_read || []),
|
||||
files_modified: JSON.stringify(obs.files_modified || []),
|
||||
project: session.project,
|
||||
prompt_number: session.lastPromptNumber,
|
||||
created_at_epoch: result.createdAtEpoch
|
||||
}
|
||||
});
|
||||
}
|
||||
}
|
||||
|
||||
// Sync summary to Chroma (if present)
|
||||
if (summaryForStore && result.summaryId) {
|
||||
this.dbManager.getChromaSync().syncSummary(
|
||||
result.summaryId,
|
||||
session.contentSessionId,
|
||||
session.project,
|
||||
summaryForStore,
|
||||
session.lastPromptNumber,
|
||||
result.createdAtEpoch,
|
||||
discoveryTokens
|
||||
).catch(err => {
|
||||
logger.warn('SDK', 'Gemini chroma sync failed', { summaryId: result.summaryId }, err);
|
||||
});
|
||||
|
||||
// Broadcast to SSE clients
|
||||
if (worker && worker.sseBroadcaster) {
|
||||
worker.sseBroadcaster.broadcast({
|
||||
type: 'new_summary',
|
||||
summary: {
|
||||
id: result.summaryId,
|
||||
session_id: session.contentSessionId,
|
||||
request: summary!.request,
|
||||
investigated: summary!.investigated,
|
||||
learned: summary!.learned,
|
||||
completed: summary!.completed,
|
||||
next_steps: summary!.next_steps,
|
||||
notes: summary!.notes,
|
||||
project: session.project,
|
||||
prompt_number: session.lastPromptNumber,
|
||||
created_at_epoch: result.createdAtEpoch
|
||||
}
|
||||
});
|
||||
}
|
||||
|
||||
// Update Cursor context file for registered projects (fire-and-forget)
|
||||
updateCursorContextForProject(session.project, getWorkerPort()).catch(error => {
|
||||
logger.warn('CURSOR', 'Context update failed (non-critical)', { project: session.project }, error as Error);
|
||||
});
|
||||
}
|
||||
}
|
||||
|
||||
// Clear the processed message IDs
|
||||
session.pendingProcessingIds.clear();
|
||||
session.earliestPendingTimestamp = null;
|
||||
|
||||
// Clean up old processed messages
|
||||
const deletedCount = pendingMessageStore.cleanupProcessed(100);
|
||||
if (deletedCount > 0) {
|
||||
logger.debug('SDK', 'Gemini cleaned up old processed messages', { deletedCount });
|
||||
logger.debug('SDK', 'Cleaned up old processed messages', { deletedCount });
|
||||
}
|
||||
|
||||
// Broadcast activity status after processing
|
||||
if (worker && typeof worker.broadcastProcessingStatus === 'function') {
|
||||
worker.broadcastProcessingStatus();
|
||||
}
|
||||
}
|
||||
|
||||
if (worker && typeof worker.broadcastProcessingStatus === 'function') {
|
||||
worker.broadcastProcessingStatus();
|
||||
}
|
||||
}
|
||||
|
||||
// REMOVED: markMessagesProcessed() - replaced by atomic transaction in processGeminiResponse()
|
||||
// Messages are now marked complete atomically with observation storage to prevent duplicates
|
||||
|
||||
/**
|
||||
* Get Gemini configuration from settings or environment
|
||||
*/
|
||||
|
||||
Reference in New Issue
Block a user