feat: Fix observation timestamps, refactor session management, and enhance worker reliability (#437)
* Refactor worker version checks and increase timeout settings
  - Updated the default hook timeout from 5000ms to 120000ms for improved stability.
  - Modified the worker version check to log a warning instead of restarting the worker on version mismatch.
  - Removed legacy PM2 cleanup and worker start logic, simplifying the ensureWorkerRunning function.
  - Enhanced polling mechanism for worker readiness with increased retries and reduced interval.

* feat: implement worker queue polling to ensure processing completion before proceeding

* refactor: change worker command from start to restart in hooks configuration

* refactor: remove session management complexity
  - Simplify createSDKSession to pure INSERT OR IGNORE
  - Remove auto-create logic from storeObservation/storeSummary
  - Delete 11 unused session management methods
  - Derive prompt_number from user_prompts count
  - Keep sdk_sessions table schema unchanged for compatibility

* refactor: simplify session management by removing unused methods and auto-creation logic

* Refactor session prompt number retrieval in SessionRoutes
  - Updated the method of obtaining the prompt number from the session.
  - Replaced `store.getPromptCounter(sessionDbId)` with `store.getPromptNumberFromUserPrompts(claudeSessionId)` for better clarity and accuracy.
  - Adjusted the logic for incrementing the prompt number to derive it from the user prompts count instead of directly incrementing a counter.

* refactor: replace getPromptCounter with getPromptNumberFromUserPrompts in SessionManager

  Phase 7 of session management simplification. Updates SessionManager to derive prompt numbers from user_prompts table count instead of using the deprecated prompt_counter column.

  🤖 Generated with [Claude Code](https://claude.com/claude-code)
  Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>

* refactor: simplify SessionCompletionHandler to use direct SQL query

  Phase 8: Remove call to findActiveSDKSession() and replace with direct database query in SessionCompletionHandler.completeByClaudeId(). This removes dependency on the deleted findActiveSDKSession() method and simplifies the code by using a straightforward SELECT query.

  🤖 Generated with [Claude Code](https://claude.com/claude-code)
  Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>

* refactor: remove markSessionCompleted call from SDKAgent
  - Delete call to markSessionCompleted() in SDKAgent.ts
  - Session status is no longer tracked or updated
  - Part of phase 9: simplifying session management

  🤖 Generated with [Claude Code](https://claude.com/claude-code)
  Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>

* refactor: remove markSessionComplete method (Phase 10)
  - Deleted markSessionComplete() method from DatabaseManager
  - Removed markSessionComplete call from SessionCompletionHandler
  - Session completion status no longer tracked in database
  - Part of session management simplification effort

  🤖 Generated with [Claude Code](https://claude.com/claude-code)
  Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>

* refactor: replace deleted updateSDKSessionId calls in import script (Phase 11)
  - Replace updateSDKSessionId() calls with direct SQL UPDATE statements
  - Method was deleted in Phase 3 as part of session management simplification
  - Import script now uses direct database access consistently

  🤖 Generated with [Claude Code](https://claude.com/claude-code)
  Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>

* test: add validation for SQL updates in sdk_sessions table

* refactor: enhance worker-cli to support manual and automated runs

* Remove cleanup hook and associated session completion logic
  - Deleted the cleanup-hook implementation from the hooks directory.
  - Removed the session completion endpoint that was used by the cleanup hook.
  - Updated the SessionCompletionHandler to eliminate the completeByClaudeId method and its dependencies.
  - Adjusted the SessionRoutes to reflect the removal of the session completion route.

* fix: update worker-cli command to use bun for consistency

* feat: Implement timestamp fix for observations and enhance processing logic
  - Added `earliestPendingTimestamp` to `ActiveSession` to track the original timestamp of the earliest pending message.
  - Updated `SDKAgent` to capture and utilize the earliest pending timestamp during response processing.
  - Modified `SessionManager` to track the earliest timestamp when yielding messages.
  - Created scripts for fixing corrupted timestamps, validating fixes, and investigating timestamp issues.
  - Verified that all corrupted observations have been repaired and logic for future processing is sound.
  - Ensured orphan processing can be safely re-enabled after validation.

* feat: Enhance SessionStore to support custom database paths and add timestamp fields for observations and summaries

* Refactor pending queue processing and add management endpoints
  - Disabled automatic recovery of orphaned queues on startup; users must now use the new /api/pending-queue/process endpoint.
  - Updated processOrphanedQueues method to processPendingQueues with improved session handling and return detailed results.
  - Added new API endpoints for managing pending queues: GET /api/pending-queue and POST /api/pending-queue/process.
  - Introduced a new script (check-pending-queue.ts) for checking and processing pending observation queues interactively or automatically.
  - Enhanced logging and error handling for better monitoring of session processing.

* updated agent sdk

* feat: Add manual recovery guide and queue management endpoints to documentation

---------

Co-authored-by: Claude Sonnet 4.5 <noreply@anthropic.com>
@@ -19,7 +19,7 @@ The worker service is a long-running HTTP API built with Express.js and managed

## REST API Endpoints

The worker service exposes 20 HTTP endpoints organized into five categories:
The worker service exposes 22 HTTP endpoints organized into six categories:

### Viewer & Health Endpoints

@@ -385,9 +385,106 @@ POST /api/settings
}
```

### Queue Management Endpoints

#### 16. Get Pending Queue Status
```
GET /api/pending-queue
```

**Purpose**: View current processing queue status and identify stuck messages

**Response**:
```json
{
  "queue": {
    "messages": [
      {
        "id": 123,
        "session_db_id": 45,
        "claude_session_id": "abc123",
        "message_type": "observation",
        "status": "pending",
        "retry_count": 0,
        "created_at_epoch": 1730886600000,
        "started_processing_at_epoch": null,
        "completed_at_epoch": null
      }
    ],
    "totalPending": 5,
    "totalProcessing": 2,
    "totalFailed": 0,
    "stuckCount": 1
  },
  "recentlyProcessed": [
    {
      "id": 122,
      "session_db_id": 44,
      "status": "processed",
      "completed_at_epoch": 1730886500000
    }
  ],
  "sessionsWithPendingWork": [44, 45, 46]
}
```

**Status Definitions**:
- `pending`: Message queued, not yet processed
- `processing`: Message currently being processed by SDK agent
- `processed`: Message completed successfully
- `failed`: Message failed after max retry attempts (3 by default)

**Stuck Detection**: Messages in `processing` status for >5 minutes are considered stuck and included in `stuckCount`

**Use Case**: Check queue health after worker crashes or restarts to identify unprocessed observations

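
The status definitions and the stuck-detection rule above can be mirrored on the client when inspecting the queue. The following is a minimal TypeScript sketch, not the worker's own code. It assumes the response matches the example JSON, that "stuck" is measured from `started_processing_at_epoch` (the text does not say which timestamp is used), and that the caller supplies the worker's base URL, since this section does not give a host or port. `PendingMessage`, `isStuck`, `countStuck`, and `fetchQueueStatus` are hypothetical names introduced for illustration.

```typescript
// Shape of one queue message, copied from the example response above.
interface PendingMessage {
  id: number;
  session_db_id: number;
  claude_session_id: string;
  message_type: string;
  status: "pending" | "processing" | "processed" | "failed";
  retry_count: number;
  created_at_epoch: number;
  started_processing_at_epoch: number | null;
  completed_at_epoch: number | null;
}

// From the docs: more than 5 minutes in `processing` counts as stuck.
const STUCK_THRESHOLD_MS = 5 * 60 * 1000;

// Hypothetical helper. Assumption: elapsed time is measured from
// started_processing_at_epoch, which the docs do not state explicitly.
function isStuck(msg: PendingMessage, nowEpochMs: number): boolean {
  return (
    msg.status === "processing" &&
    msg.started_processing_at_epoch !== null &&
    nowEpochMs - msg.started_processing_at_epoch > STUCK_THRESHOLD_MS
  );
}

// Counts stuck messages locally, for comparison with the server's `stuckCount`.
function countStuck(messages: PendingMessage[], nowEpochMs: number): number {
  return messages.filter((m) => isStuck(m, nowEpochMs)).length;
}

// Hypothetical wrapper around GET /api/pending-queue.
// `baseUrl` is an assumption supplied by the caller.
async function fetchQueueStatus(
  baseUrl: string
): Promise<{ queue: { messages: PendingMessage[]; stuckCount: number } }> {
  const res = await fetch(`${baseUrl}/api/pending-queue`);
  if (!res.ok) {
    throw new Error(`GET /api/pending-queue failed with status ${res.status}`);
  }
  return (await res.json()) as {
    queue: { messages: PendingMessage[]; stuckCount: number };
  };
}
```

After a worker restart, a caller could pass the `queue.messages` array from `fetchQueueStatus` to `countStuck` with `Date.now()` to cross-check the reported `stuckCount`.
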
#### 17. Trigger Manual Recovery
```
POST /api/pending-queue/process
```

**Purpose**: Manually trigger processing of pending queues (replaces automatic recovery in v5.x+)

**Request Body**:
```json
{
  "sessionLimit": 10
}
```

**Body Parameters**:
- `sessionLimit` (optional): Maximum number of sessions to process (default: 10, max: 100)

**Response**:
```json
{
  "success": true,
  "totalPendingSessions": 15,
  "sessionsStarted": 10,
  "sessionsSkipped": 2,
  "startedSessionIds": [44, 45, 46, 47, 48, 49, 50, 51, 52, 53]
}
```

**Response Fields**:
- `totalPendingSessions`: Total sessions with pending messages in database
- `sessionsStarted`: Number of sessions we started processing this request
- `sessionsSkipped`: Sessions already actively processing (not restarted)
- `startedSessionIds`: Database IDs of sessions started

**Behavior**:
- Processes up to `sessionLimit` sessions with pending work
- Skips sessions already actively processing (prevents duplicate agents)
- Starts non-blocking SDK agents for each session
- Returns immediately with status (processing continues in background)

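
The behavior list and response fields above can be read as a simple selection loop. The sketch below is one possible interpretation in TypeScript, not the worker's actual `processPendingQueues` implementation. It assumes that skipped sessions do not count against `sessionLimit` and that iteration stops once the limit is reached; `planRecovery` and `RecoveryPlan` are hypothetical names, and launching the SDK agent is reduced to a comment.

```typescript
// Mirrors the documented response fields (without `success`).
interface RecoveryPlan {
  totalPendingSessions: number;
  sessionsStarted: number;
  sessionsSkipped: number;
  startedSessionIds: number[];
}

// Hypothetical helper: decides which sessions would be started.
// Assumption: sessions that are already active are skipped and do not
// consume the limit; the docs do not specify this detail.
function planRecovery(
  pendingSessionIds: number[],
  activeSessionIds: ReadonlySet<number>,
  sessionLimit: number
): RecoveryPlan {
  const startedSessionIds: number[] = [];
  let sessionsSkipped = 0;

  for (const id of pendingSessionIds) {
    // Stop once the requested number of sessions has been started.
    if (startedSessionIds.length >= sessionLimit) break;

    // Already processing: skip so no duplicate agent is created.
    if (activeSessionIds.has(id)) {
      sessionsSkipped++;
      continue;
    }

    // A real worker would start a non-blocking SDK agent here
    // and return without waiting for it to finish.
    startedSessionIds.push(id);
  }

  return {
    totalPendingSessions: pendingSessionIds.length,
    sessionsStarted: startedSessionIds.length,
    sessionsSkipped,
    startedSessionIds,
  };
}
```

Under these assumptions, 15 pending sessions with 2 of them already active and a limit of 10 yield `sessionsStarted: 10` and `sessionsSkipped: 2`, the same counts as in the example response, with the remaining sessions left for a later request.
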
**Use Case**: Manually recover stuck observations after worker crashes, or when automatic recovery was disabled

**Recovery Strategy Note**: As of v5.x, automatic recovery on worker startup is disabled by default. Users must manually trigger recovery using this endpoint or the CLI tool (`bun scripts/check-pending-queue.ts`) to maintain explicit control over reprocessing.

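
Because recovery is now an explicit step, a small client helper may be convenient. The following TypeScript sketch builds the request for `POST /api/pending-queue/process` and keeps `sessionLimit` inside the documented range (default 10, max 100). The lower bound of 1, the client-side clamping, and the caller-supplied `baseUrl` are assumptions; `normalizeSessionLimit`, `buildProcessRequest`, and `triggerRecovery` are hypothetical names, not part of the worker or its CLI.

```typescript
// Documented defaults for the `sessionLimit` body parameter.
const DEFAULT_SESSION_LIMIT = 10;
const MAX_SESSION_LIMIT = 100;

// Hypothetical helper: clamps a requested limit to [1, 100].
// The lower bound of 1 is an assumption; the docs give only default and max.
function normalizeSessionLimit(requested?: number): number {
  if (requested === undefined || !Number.isFinite(requested)) {
    return DEFAULT_SESSION_LIMIT;
  }
  return Math.min(MAX_SESSION_LIMIT, Math.max(1, Math.floor(requested)));
}

// Builds the fetch options for the documented JSON request body.
function buildProcessRequest(requested?: number): {
  method: string;
  headers: Record<string, string>;
  body: string;
} {
  return {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({ sessionLimit: normalizeSessionLimit(requested) }),
  };
}

// Hypothetical wrapper around POST /api/pending-queue/process.
// `baseUrl` is an assumption supplied by the caller.
async function triggerRecovery(
  baseUrl: string,
  requested?: number
): Promise<unknown> {
  const res = await fetch(
    `${baseUrl}/api/pending-queue/process`,
    buildProcessRequest(requested)
  );
  if (!res.ok) {
    throw new Error(`POST /api/pending-queue/process failed with status ${res.status}`);
  }
  // The endpoint returns immediately; processing continues in the background.
  return res.json();
}
```

Since the endpoint returns before processing finishes, a caller that needs to confirm completion would have to poll `GET /api/pending-queue` afterwards.
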
### Session Management Endpoints

#### 16. Initialize Session
#### 19. Initialize Session
```
POST /sessions/:sessionDbId/init
```
@@ -408,7 +505,7 @@ POST /sessions/:sessionDbId/init
}
```

#### 17. Add Observation
#### 20. Add Observation
```
POST /sessions/:sessionDbId/observations
```
@@ -431,7 +528,7 @@ POST /sessions/:sessionDbId/observations
}
```

#### 18. Generate Summary
#### 21. Generate Summary
```
POST /sessions/:sessionDbId/summarize
```
@@ -451,7 +548,7 @@ POST /sessions/:sessionDbId/summarize
}
```

#### 19. Session Status
#### 22. Session Status
```
GET /sessions/:sessionDbId/status
```
@@ -466,7 +563,7 @@ GET /sessions/:sessionDbId/status
}
```

#### 20. Delete Session
#### 23. Delete Session
```
DELETE /sessions/:sessionDbId
```