feat: Fix observation timestamps, refactor session management, and enhance worker reliability (#437)
* Refactor worker version checks and increase timeout settings - Updated the default hook timeout from 5000ms to 120000ms for improved stability. - Modified the worker version check to log a warning instead of restarting the worker on version mismatch. - Removed legacy PM2 cleanup and worker start logic, simplifying the ensureWorkerRunning function. - Enhanced polling mechanism for worker readiness with increased retries and reduced interval. * feat: implement worker queue polling to ensure processing completion before proceeding * refactor: change worker command from start to restart in hooks configuration * refactor: remove session management complexity - Simplify createSDKSession to pure INSERT OR IGNORE - Remove auto-create logic from storeObservation/storeSummary - Delete 11 unused session management methods - Derive prompt_number from user_prompts count - Keep sdk_sessions table schema unchanged for compatibility * refactor: simplify session management by removing unused methods and auto-creation logic * Refactor session prompt number retrieval in SessionRoutes - Updated the method of obtaining the prompt number from the session. - Replaced `store.getPromptCounter(sessionDbId)` with `store.getPromptNumberFromUserPrompts(claudeSessionId)` for better clarity and accuracy. - Adjusted the logic for incrementing the prompt number to derive it from the user prompts count instead of directly incrementing a counter. * refactor: replace getPromptCounter with getPromptNumberFromUserPrompts in SessionManager Phase 7 of session management simplification. Updates SessionManager to derive prompt numbers from user_prompts table count instead of using the deprecated prompt_counter column. 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com> * refactor: simplify SessionCompletionHandler to use direct SQL query Phase 8: Remove call to findActiveSDKSession() and replace with direct database query in SessionCompletionHandler.completeByClaudeId(). This removes dependency on the deleted findActiveSDKSession() method and simplifies the code by using a straightforward SELECT query. 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com> * refactor: remove markSessionCompleted call from SDKAgent - Delete call to markSessionCompleted() in SDKAgent.ts - Session status is no longer tracked or updated - Part of phase 9: simplifying session management 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com> * refactor: remove markSessionComplete method (Phase 10) - Deleted markSessionComplete() method from DatabaseManager - Removed markSessionComplete call from SessionCompletionHandler - Session completion status no longer tracked in database - Part of session management simplification effort 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com> * refactor: replace deleted updateSDKSessionId calls in import script (Phase 11) - Replace updateSDKSessionId() calls with direct SQL UPDATE statements - Method was deleted in Phase 3 as part of session management simplification - Import script now uses direct database access consistently 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com> * test: add validation for SQL updates in sdk_sessions table * refactor: enhance worker-cli to support manual and automated runs * Remove cleanup hook and associated session completion logic - Deleted the cleanup-hook implementation from the hooks directory. - Removed the session completion endpoint that was used by the cleanup hook. - Updated the SessionCompletionHandler to eliminate the completeByClaudeId method and its dependencies. - Adjusted the SessionRoutes to reflect the removal of the session completion route. * fix: update worker-cli command to use bun for consistency * feat: Implement timestamp fix for observations and enhance processing logic - Added `earliestPendingTimestamp` to `ActiveSession` to track the original timestamp of the earliest pending message. - Updated `SDKAgent` to capture and utilize the earliest pending timestamp during response processing. - Modified `SessionManager` to track the earliest timestamp when yielding messages. - Created scripts for fixing corrupted timestamps, validating fixes, and investigating timestamp issues. - Verified that all corrupted observations have been repaired and logic for future processing is sound. - Ensured orphan processing can be safely re-enabled after validation. * feat: Enhance SessionStore to support custom database paths and add timestamp fields for observations and summaries * Refactor pending queue processing and add management endpoints - Disabled automatic recovery of orphaned queues on startup; users must now use the new /api/pending-queue/process endpoint. - Updated processOrphanedQueues method to processPendingQueues with improved session handling and return detailed results. - Added new API endpoints for managing pending queues: GET /api/pending-queue and POST /api/pending-queue/process. - Introduced a new script (check-pending-queue.ts) for checking and processing pending observation queues interactively or automatically. - Enhanced logging and error handling for better monitoring of session processing. * updated agent sdk * feat: Add manual recovery guide and queue management endpoints to documentation --------- Co-authored-by: Claude Sonnet 4.5 <noreply@anthropic.com>
This commit is contained in:
@@ -427,6 +427,16 @@ export class WorkerService {
|
||||
// Initialize database (once, stays open)
|
||||
await this.dbManager.initialize();
|
||||
|
||||
// Recover stuck messages from previous crashes
|
||||
// Messages stuck in 'processing' state are reset to 'pending' for reprocessing
|
||||
const { PendingMessageStore } = await import('./sqlite/PendingMessageStore.js');
|
||||
const pendingStore = new PendingMessageStore(this.dbManager.getSessionStore().db, 3);
|
||||
const STUCK_THRESHOLD_MS = 5 * 60 * 1000; // 5 minutes
|
||||
const resetCount = pendingStore.resetStuckMessages(STUCK_THRESHOLD_MS);
|
||||
if (resetCount > 0) {
|
||||
logger.info('SYSTEM', `Recovered ${resetCount} stuck messages from previous session`, { thresholdMinutes: 5 });
|
||||
}
|
||||
|
||||
// Initialize search services (requires initialized database)
|
||||
const formattingService = new FormattingService();
|
||||
const timelineService = new TimelineService();
|
||||
@@ -464,6 +474,8 @@ export class WorkerService {
|
||||
this.initializationCompleteFlag = true;
|
||||
this.resolveInitialization();
|
||||
logger.info('SYSTEM', 'Background initialization complete');
|
||||
|
||||
// Note: Auto-recovery of orphaned queues disabled - use /api/pending-queue/process endpoint instead
|
||||
} catch (error) {
|
||||
logger.error('SYSTEM', 'Background initialization failed', {}, error as Error);
|
||||
// Don't resolve - let the promise remain pending so readiness check continues to fail
|
||||
@@ -471,6 +483,78 @@ export class WorkerService {
|
||||
}
|
||||
}
|
||||
|
||||
/**
|
||||
* Process pending session queues
|
||||
* Starts SDK agents for sessions that have pending messages but no active processor
|
||||
* @param sessionLimit Maximum number of sessions to start processing (default: 10)
|
||||
* @returns Info about what was started
|
||||
*/
|
||||
async processPendingQueues(sessionLimit: number = 10): Promise<{
|
||||
totalPendingSessions: number;
|
||||
sessionsStarted: number;
|
||||
sessionsSkipped: number;
|
||||
startedSessionIds: number[];
|
||||
}> {
|
||||
const { PendingMessageStore } = await import('./sqlite/PendingMessageStore.js');
|
||||
const pendingStore = new PendingMessageStore(this.dbManager.getSessionStore().db, 3);
|
||||
const orphanedSessionIds = pendingStore.getSessionsWithPendingMessages();
|
||||
|
||||
const result = {
|
||||
totalPendingSessions: orphanedSessionIds.length,
|
||||
sessionsStarted: 0,
|
||||
sessionsSkipped: 0,
|
||||
startedSessionIds: [] as number[]
|
||||
};
|
||||
|
||||
if (orphanedSessionIds.length === 0) {
|
||||
return result;
|
||||
}
|
||||
|
||||
logger.info('SYSTEM', `Processing up to ${sessionLimit} of ${orphanedSessionIds.length} pending session queues`);
|
||||
|
||||
// Process each session sequentially up to the limit
|
||||
for (const sessionDbId of orphanedSessionIds) {
|
||||
if (result.sessionsStarted >= sessionLimit) {
|
||||
break;
|
||||
}
|
||||
|
||||
try {
|
||||
// Skip if session already has an active generator
|
||||
const existingSession = this.sessionManager.getSession(sessionDbId);
|
||||
if (existingSession?.generatorPromise) {
|
||||
result.sessionsSkipped++;
|
||||
continue;
|
||||
}
|
||||
|
||||
// Initialize session and start SDK agent
|
||||
const session = this.sessionManager.initializeSession(sessionDbId);
|
||||
|
||||
logger.info('SYSTEM', `Starting processor for session ${sessionDbId}`, {
|
||||
project: session.project,
|
||||
pendingCount: pendingStore.getPendingCount(sessionDbId)
|
||||
});
|
||||
|
||||
// Start SDK agent (non-blocking)
|
||||
session.generatorPromise = this.sdkAgent.startSession(session, this)
|
||||
.finally(() => {
|
||||
session.generatorPromise = null;
|
||||
this.broadcastProcessingStatus();
|
||||
});
|
||||
|
||||
result.sessionsStarted++;
|
||||
result.startedSessionIds.push(sessionDbId);
|
||||
|
||||
// Small delay between sessions to avoid rate limiting
|
||||
await new Promise(resolve => setTimeout(resolve, 100));
|
||||
} catch (error) {
|
||||
logger.warn('SYSTEM', `Failed to process session ${sessionDbId}`, {}, error as Error);
|
||||
result.sessionsSkipped++;
|
||||
}
|
||||
}
|
||||
|
||||
return result;
|
||||
}
|
||||
|
||||
/**
|
||||
* Extract a specific section from instruction content
|
||||
* Used by /api/instructions endpoint for progressive instruction loading
|
||||
|
||||
Reference in New Issue
Block a user