fix: self-healing claimNextMessage prevents stuck processing messages (#1159)

* fix: self-healing claimNextMessage prevents stuck processing messages

claimAndDelete → claimNextMessage with atomic self-healing: resets stale
processing messages (>60s) back to pending before claiming. Eliminates
stuck messages from generator crashes without external timers. Removes
redundant idle-timeout reset in worker-service.ts. Adds QUEUE to logger
Component type.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix: update stale comments in SessionQueueProcessor to reflect claim-confirm pattern

Comments still referenced the old claim-and-delete pattern after the
claimNextMessage rename. Updated to accurately describe the current
lifecycle where messages are marked as processing and stay in DB until
confirmProcessed() is called.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix: move Date.now() inside transaction and extract stale threshold constant

- Move Date.now() inside claimNextMessage transaction closure so timestamp
  is fresh if WAL contention causes retry
- Extract STALE_PROCESSING_THRESHOLD_MS to module-level constant
- Add comment clarifying strict < boundary semantics

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
This commit is contained in:
Alex Newman
2026-02-17 23:15:46 -05:00
committed by GitHub
parent b2e3a7e668
commit b88251bc8b
9 changed files with 298 additions and 93 deletions
+7 -6
View File
@@ -20,8 +20,9 @@ export class SessionQueueProcessor {
/**
* Create an async iterator that yields messages as they become available.
* Uses atomic claim-and-delete to prevent duplicates.
* The queue is a pure buffer: claim it, delete it, process in memory.
* Uses atomic claim-confirm to prevent duplicates.
* Messages are claimed (marked processing) and stay in DB until confirmProcessed().
* Self-heals stale processing messages before each claim.
* Waits for 'message' event when queue is empty.
*
* CRITICAL: Calls onIdleTimeout callback after 3 minutes of inactivity.
@@ -34,14 +35,14 @@ export class SessionQueueProcessor {
while (!signal.aborted) {
try {
// Atomically claim AND DELETE next message from DB
// Message is now in memory only - no "processing" state tracking needed
const persistentMessage = this.store.claimAndDelete(sessionDbId);
// Atomically claim next pending message (marks as 'processing')
// Self-heals any stale processing messages before claiming
const persistentMessage = this.store.claimNextMessage(sessionDbId);
if (persistentMessage) {
// Reset activity time when we successfully yield a message
lastActivityTime = Date.now();
// Yield the message for processing (it's already deleted from queue)
// Yield the message for processing (it's marked as 'processing' in DB)
yield this.toPendingMessageWithId(persistentMessage);
} else {
// Queue empty - wait for wake-up event or timeout