Files
claude-mem/QUEUE-SIMPLIFICATION-PLAN.md
T

107 lines
5.0 KiB
Markdown

# Queue System Simplification Plan
## 1. Executive Summary
The current queue system suffers from accidental complexity due to **state duplication** (in-memory vs. database), **fragile control flow** (recursive restarts), and **distributed state management**. This plan proposes a refactoring to establish the Database as the Single Source of Truth, unifying the processing logic into a robust, linear "Pump" model.
## 2. Identified Pain Points
1. **Dual State Synchronization:**
* *Issue:* The system maintains both `session.pendingMessages` (in-memory array) and the `pending_messages` SQLite table.
* *Impact:* Requires constant manual synchronization (push/shift/enqueue), leading to race conditions where the in-memory queue drifts from the DB state.
2. **Fragile Generator Lifecycle:**
* *Issue:* The use of `startGeneratorWithProvider` and `startSessionWithAutoRestart` with recursive `setTimeout` calls to keep the processor alive is brittle.
* *Impact:* Hard to debug, prone to stack issues or silent failures if the "chain" breaks.
3. **Non-Atomic State Transitions:**
* *Issue:* The logic separates "peeking" a message from "marking it processing" (the "Critical Flow" identified in the analysis).
* *Impact:* If the worker crashes or halts between these steps, messages can be processed twice or lost in limbo.
4. **Distributed Logic:**
* *Issue:* Queue logic is scattered across `SessionManager` (coordination), `PendingMessageStore` (DB queries), `SDKAgent` (consumption), and `WorkerService` (orchestration).
* *Impact:* Difficult to trace the lifecycle of a single message.
## 3. Proposed Architecture
### 3.1. Core Principle: "The Database is the Queue"
We will eliminate the in-memory `pendingMessages` array entirely. The SQLite database will be the *only* place where queue state exists.
### 3.2. Architecture Components
#### A. Atomic `claimNextMessage()`
Instead of `peek` then `mark`, we will implement a single atomic operation in `PendingMessageStore`.
* **Logic:**
1. Find the oldest `pending` message for the session.
2. Update it to `processing` and set the timestamp.
3. Return the message record.
* **SQL Strategy:** Use a transaction or `UPDATE ... RETURNING` (if supported) to ensure no other worker can claim the same message.
#### B. The `QueuePump` (Unified Processor)
We will replace the recursive generator logic with a class (or function) dedicated to "pumping" messages for a specific session.
* **Pseudocode Structure:**
```typescript
async function runSessionPump(sessionId: number, signal: AbortSignal) {
while (!signal.aborted) {
// 1. Atomic Claim
const message = store.claimNextMessage(sessionId);
if (!message) {
// 2. Wait for signal (Event-driven, not polling)
await waitForNewData(sessionId, signal);
continue;
}
try {
// 3. Process
await sdkAgent.processMessage(message);
// 4. Mark Complete
store.markProcessed(message.id);
} catch (error) {
// 5. Handle Failure
store.markFailed(message.id, error);
}
}
}
```
### 3.3. Key Changes
| Component | Current State | Proposed State |
| :--- | :--- | :--- |
| **Storage** | In-memory Array + SQLite | SQLite Only |
| **Consumption** | `yield` loop inside SDK Agent | `QueuePump` calls SDK Agent per message |
| **Concurrency** | `peekPending` -> `markProcessing` (Race Prone) | `claimNextMessage` (Atomic Transaction) |
| **Lifecycle** | Recursive `setTimeout` loops | Single `while` loop with `await` |
| **Recovery** | `resetStuckMessages` (Global) | Pump handles own retries + Global cleanup on startup |
## 4. Implementation Steps
### Phase 1: Database Layer Hardening
1. Add `claimNextMessage(sessionDbId)` to `PendingMessageStore`.
* Must be transactional.
* Returns `null` if no work is available.
2. Ensure `markProcessed` and `markFailed` are robust.
### Phase 2: The Pump
1. Create `SessionQueueProcessor.ts`.
2. Implement the `while(!aborted)` loop.
3. Integrate the `EventEmitter` to wake the loop when `enqueue()` happens (replacing the current polling-like behavior).
### Phase 3: SDK Integration
1. Refactor `SDKAgent` to accept a *single* message or a streamlined iterator that doesn't manage queue state itself.
2. Remove `session.pendingMessages` from `ActiveSession` type.
### Phase 4: Cleanup
1. Remove `startGeneratorWithProvider` and `startSessionWithAutoRestart`.
2. Remove `peekPending` (as it's replaced by `claimNextMessage`).
3. Remove manual synchronization code in `SessionManager`.
## 5. Benefits
* **Simplicity:** Code reduction of ~30-40%.
* **Reliability:** Atomic database operations eliminate race conditions.
* **Observability:** Linear control flow is easier to log and debug.
* **Resilience:** Crashes are handled by simply restarting the Pump, which naturally picks up "processing" (stuck) or "pending" messages.