Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Phase 08: Session Management & Message Processing
These PRs fix session lifecycle issues: orphaned messages, provider recovery, infinite loops, and stateless provider support.
High Priority (Data Loss / Cost Risk)
- Review PR #934 (`fix: terminate session on prompt-too-long instead of retrying indefinitely` by @jayvenn21). File: `src/services/worker/SDKAgent.ts`. Currently, prompt-too-long errors cause infinite retry loops. Steps: (1) `gh pr checkout 934` (2) Review: the fix should detect the specific error and terminate the session cleanly (3) Verify no data loss on termination (pending observations should still be saved) (4) Run `npm run build` (5) If clean: `gh pr merge 934 --rebase --delete-branch`
  - MERGED (2026-02-06): Clean +5 line fix. Detects "Prompt is too long" in SDK response text and throws a fatal error, which propagates up to the `startSessionProcessor` catch handler for clean session termination. No data loss: previously processed observations are already saved before this point. Build passes.
- Review PR #693 (`fix: prevent infinite restart loop that causes runaway API costs` by @ajbmachon). Files: `src/services/worker-types.ts`, `GeminiAgent.ts`, `OpenRouterAgent.ts`, `SessionManager.ts`, `SessionRoutes.ts`. Steps: (1) `gh pr checkout 693` (2) Review restart limiting logic: it should have a max restart count or cooldown (3) Verify it applies to all providers (Claude, Gemini, OpenRouter) (4) Run `npm run build` (5) If clean: `gh pr merge 693 --rebase --delete-branch`
  - MERGED (2026-02-06): Applied directly to main (branch was 3+ weeks old). Adds a `consecutiveRestarts` counter to ActiveSession, limits crash-recovery restarts to 3 with exponential backoff (1s, 2s, 4s), and adds defensive `memorySessionId` checks in GeminiAgent and OpenRouterAgent before expensive LLM calls. Build passes. Addresses the $402+ runaway cost scenario reported by the author.
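The two fixes in this group can be read together as one restart policy: a prompt-too-long error is fatal and never retried, while other crashes get at most 3 restarts with 1s, 2s, 4s delays. The sketch below is only an illustration of that policy. `ActiveSessionSketch`, `RestartDecision`, `isPromptTooLong`, and `decideRestart` are hypothetical names, and the real code in `SDKAgent.ts` and `SessionManager.ts` may be structured quite differently; only `consecutiveRestarts`, the limit of 3, the backoff values, and the "Prompt is too long" text come from the notes above.

```typescript
// Hypothetical session shape for illustration; the real ActiveSession has more fields.
interface ActiveSessionSketch {
  consecutiveRestarts: number;
}

// Hypothetical result type: either stop the session or restart after a delay.
type RestartDecision =
  | { action: "terminate"; reason: string }
  | { action: "restart"; delayMs: number };

const MAX_CONSECUTIVE_RESTARTS = 3; // limit described in the PR #693 note
const BASE_DELAY_MS = 1000; // doubles per restart: 1s, 2s, 4s

// Assumed detection rule, based on the PR #934 note: match on the response text.
function isPromptTooLong(message: string): boolean {
  return message.includes("Prompt is too long");
}

function decideRestart(
  session: ActiveSessionSketch,
  errorMessage: string
): RestartDecision {
  // Fatal errors are never retried, regardless of the restart counter.
  if (isPromptTooLong(errorMessage)) {
    return { action: "terminate", reason: "prompt too long (fatal, no retry)" };
  }
  // Stop once the crash-recovery budget is used up.
  if (session.consecutiveRestarts >= MAX_CONSECUTIVE_RESTARTS) {
    return { action: "terminate", reason: "restart limit reached" };
  }
  // Exponential backoff based on how many restarts already happened.
  const delayMs = BASE_DELAY_MS * 2 ** session.consecutiveRestarts;
  session.consecutiveRestarts += 1;
  return { action: "restart", delayMs };
}
```

A caller would presumably reset `consecutiveRestarts` to 0 after a successful run, so that the limit applies only to back-to-back crashes; that reset is an assumption and is not shown here.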
Session Processing
- Review PR #940 (`fix: Backfill project field on SDK session creation` by @miclip). Files: `src/services/sqlite/SessionStore.ts`, `src/services/sqlite/sessions/create.ts`. Steps: (1) `gh pr checkout 940` (2) Review: the fix should populate the project field when creating a session so observations are properly scoped (3) Small, focused change (4) Run `npm run build` (5) If clean: `gh pr merge 940 --rebase --delete-branch`
  - MERGED (2026-02-06): Clean +18/-2 line fix. Adds a conditional UPDATE after INSERT OR IGNORE to backfill the project field when a later hook provides it; it only updates when the existing value is NULL or empty. Fixes a race condition where the PostToolUse hook creates the session before UserPromptSubmit sets the project. Both the functional (`sessions/create.ts`) and class (`SessionStore.ts`) versions were updated identically. Build passes.
- Review PR #937 (`fix(worker): gracefully process orphaned pending messages after session termination` by @jayvenn21). Files: `src/services/sqlite/PendingMessageStore.ts`, `src/services/worker-service.ts`, `src/services/worker/SessionManager.ts`. Steps: (1) `gh pr checkout 937` (2) Review: orphaned messages should be processed or discarded cleanly, not stuck forever (3) Run `npm run build` (4) If clean: `gh pr merge 937 --rebase --delete-branch`
  - MERGED (2026-02-06): Clean +125/-3 line fix addressing issue #936 (51+ orphaned messages/day). Adds `isSessionTerminatedError()` to detect SDK resume failures from closed terminals. On failure, falls back to Gemini/OpenRouter agents if available to drain the queue. If no fallback is available, `markAllSessionMessagesAbandoned()` marks pending/processing messages as failed and `removeSessionImmediate()` cleans up the session without deadlocking on the generator promise. Build passes. Rebased cleanly onto main.
- Review PR #899 (`fix: resolve message processing failures in multi-session scenarios` by @hahaschool). Files: 7 files including SessionStore, SDKAgent, ResponseProcessor, SessionRoutes. Steps: (1) `gh pr checkout 899` (2) This is a broader fix, so review carefully for scope creep (3) Check that multi-session message routing is correct (messages go to the right session) (4) Run `npm run build` (5) If focused and correct: `gh pr merge 899 --rebase --delete-branch`. If too broad, request scope reduction.
  - CLOSED (2026-02-06): Too broad, with conflicts in 3 files (`worker-service.ts`, `worker-types.ts`, `SDKAgent.ts`) due to overlap with recently merged PRs #693, #937, and #940. Crash recovery logic is already addressed by #693 (restart limits with exponential backoff) and #937 (orphaned message fallback + session termination detection). Requested that the author re-submit the genuinely valuable parts as focused PRs: (1) AbortController stale-check at generator start (~10 lines, real bug), (2) FK constraint protection in SDKAgent/ResponseProcessor (prevents FK violation when the SDK generates a new session ID after restart), (3) queueDepth logging fix using `pendingStore.getPendingCount()` instead of the deprecated `session.pendingMessages.length`.
- Review PR #627 (`fix: Reset AbortController before starting generator` by @TranslateMe). File: `src/services/worker/http/routes/SessionRoutes.ts`. Steps: (1) `gh pr checkout 627` (2) Old PR (Dec 27): check if still applicable after the v8.5.2 memory leak fix (3) If the abort controller reset is still needed: rebase and merge. If already handled: close.
  - MERGED (2026-02-06): Clean +10 line fix, still applicable. Adds a stale-AbortController check at the start of `startGeneratorWithProvider()`: if `signal.aborted` is already true, it creates a fresh AbortController before proceeding. This prevents infinite "Generator aborted" loops where: (1) the session aborts, setting `signal.aborted=true`, (2) `generatorPromise` is set to null, (3) new observations trigger `ensureGeneratorRunning`, (4) the new generator immediately sees the stale abort signal and exits, creating an infinite loop. The crash recovery path (merged in PR #693) already resets the controller for non-abort exits, but this fix covers the abort case. Rebased cleanly onto main, build passes.
- Review PR #741 (`fix: Provider-aware recovery and stale session cleanup` by @licutis). File: `src/services/worker-service.ts`. Steps: (1) `gh pr checkout 741` (2) Review provider-aware recovery logic: it should handle Gemini/OpenRouter differently from the Claude SDK (3) Run `npm run build` (4) If clean: `gh pr merge 741 --rebase --delete-branch`
  - MERGED (2026-02-06): Applied directly to main (branch was 3 weeks old with conflicts from PRs #693, #937, #627). Two clean changes: (1) Adds `getActiveAgent()` to WorkerService, matching the SessionRoutes logic, so startup recovery now uses the correct provider (OpenRouter/Gemini/SDK) instead of hardcoding SDKAgent. (2) Stale session cleanup in `processPendingQueues()`: marks sessions stuck 'active' for 6+ hours as failed, along with their pending messages, before processing orphaned queues. Conflicts resolved by combining PR #741's dynamic agent selection with PR #937's session-terminated fallback logic. Build passes.
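The stale-AbortController check from PR #627 is small enough to sketch in isolation. The snippet below is an illustrative reconstruction from the description above, not the merged code: `SessionSketch` and `ensureFreshAbortController` are hypothetical names, and the real check lives inside `startGeneratorWithProvider()`.

```typescript
// Hypothetical session shape; only the fields mentioned in the PR #627 note.
interface SessionSketch {
  abortController: AbortController;
  generatorPromise: Promise<void> | null;
}

// If a previous abort left signal.aborted === true, swap in a fresh controller
// so the next generator does not exit immediately on the stale signal.
// Returns true when a stale controller was replaced.
function ensureFreshAbortController(session: SessionSketch): boolean {
  if (session.abortController.signal.aborted) {
    session.abortController = new AbortController();
    return true;
  }
  return false;
}
```

Calling a helper like this before starting the generator breaks the loop at step (4): the new generator receives a signal that has not been aborted, so it can process the new observations instead of exiting.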
Stateless Provider Support
These PRs fix the Gemini/OpenRouter providers, which don't have SDK sessions.
- Review PR #910 (`fix: complete stateless provider support with enhanced validation` by @Scheevel). WARNING: This PR modifies ~90 files, most of which are CLAUDE.md files that should NOT have been included. Steps: (1) `gh pr checkout 910` (2) Identify the actual source changes (look at `.ts` files only, ignore CLAUDE.md files) (3) Key files: `src/services/sqlite/SessionStore.ts`, `src/services/worker/GeminiAgent.ts`, `src/services/worker/OpenRouterAgent.ts`, `src/services/worker/agents/ResponseProcessor.ts` (4) If the core logic is good but the CLAUDE.md pollution is unacceptable, request changes to remove all CLAUDE.md files from the PR. Or cherry-pick just the source changes.
  - CLOSED (2026-02-06): Stale branch with critical regressions against 5 recently merged PRs. Reverts: `consecutiveRestarts` (PR #693, infinite restart prevention), `removeSessionImmediate()` (PR #937, orphaned message handling), `getCredential()` (Issue #733, centralized env), the `CLAUDE_MEM_FOLDER_CLAUDEMD_ENABLED` check (PR #913), and `createIterator()` `onIdleTimeout`. Also bundles 3 unrelated features: observation deduplication (+285 lines Levenshtein in ResponseProcessor), summary soft-delete (hidden column + PATCH API), and deletion of ~80 CLAUDE.md files + compiled plugin artifacts. The core synthetic memorySessionId generation is valuable, so re-submission was requested as focused PRs: (1) synthetic ID gen for stateless providers, (2) observation deduplication, (3) summary soft-delete.
- Evaluate PR #615 (`fix: generate memorySessionId for stateless providers` by @JiehoonKwak). Files: 6 files. Steps: (1) `gh pr checkout 615` (2) Check if #910 supersedes this (both fix stateless provider session IDs) (3) If #910 is more complete, close #615: `gh pr close 615 --comment "Superseded by PR #910 which provides more complete stateless provider support. Thank you!"` (4) If #615 has unique value, rebase and merge first.
  - APPLIED TO MAIN (2026-02-06): PR #910 was already CLOSED (regressions), so #615 was NOT superseded. Core synthetic memorySessionId generation (+8 lines each in GeminiAgent.ts and OpenRouterAgent.ts) applied directly to main (the PR was 1 month old with conflicts from PRs #693, #913, #940). The fix generates `gemini-{contentSessionId}-{timestamp}` / `openrouter-{contentSessionId}-{timestamp}` IDs before the first API call, fixing a real bug where PR #693's defensive `!session.memorySessionId` checks would throw errors for all stateless provider sessions. Other PR changes (sessions/create.ts backfill, CLAUDE_MEM_FOLDER_CLAUDEMD_ENABLED, quote style changes) were already in main or formatting-only. Build passes.
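As a rough illustration of the synthetic ID scheme described for PR #615, the sketch below fills in `memorySessionId` before the first API call when a stateless provider has none. `StatelessSessionSketch` and `ensureMemorySessionId` are hypothetical names, and the timestamp format (epoch milliseconds from `Date.now()`) is an assumption; the note above only gives the `{provider}-{contentSessionId}-{timestamp}` pattern.

```typescript
// The two stateless providers named in the PR #615 note.
type StatelessProvider = "gemini" | "openrouter";

// Hypothetical session shape; only the two fields the ID scheme needs.
interface StatelessSessionSketch {
  contentSessionId: string;
  memorySessionId: string | null;
}

// Generate a synthetic ID once, then reuse it on later calls.
// `now` is injectable for tests; epoch milliseconds is an assumed format.
function ensureMemorySessionId(
  session: StatelessSessionSketch,
  provider: StatelessProvider,
  now: number = Date.now()
): string {
  if (!session.memorySessionId) {
    session.memorySessionId = `${provider}-${session.contentSessionId}-${now}`;
  }
  return session.memorySessionId;
}
```

With an ID assigned up front, a defensive `!session.memorySessionId` guard of the kind added in PR #693 would pass for Gemini and OpenRouter sessions instead of throwing.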