f62b5d248c
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
39 lines
9.6 KiB
Markdown
39 lines
9.6 KiB
Markdown
# Phase 08: Session Management & Message Processing
|
|
|
|
These PRs fix various session lifecycle issues — orphaned messages, provider recovery, infinite loops, and stateless provider support.
|
|
|
|
## High Priority (Data Loss / Cost Risk)
|
|
|
|
- [x] Review PR #934 (`fix: terminate session on prompt-too-long instead of retrying indefinitely` by @jayvenn21). File: `src/services/worker/SDKAgent.ts`. Currently, prompt-too-long errors cause infinite retry loops. Steps: (1) `gh pr checkout 934` (2) Review — should detect the specific error and terminate the session cleanly (3) Verify no data loss on termination (pending observations should still be saved) (4) Run `npm run build` (5) If clean: `gh pr merge 934 --rebase --delete-branch`
|
|
- **MERGED** (2026-02-06): Clean +5 line fix. Detects "Prompt is too long" in SDK response text and throws a fatal error, which propagates up to `startSessionProcessor` catch handler for clean session termination. No data loss — previously processed observations are already saved before this point. Build passes.
|
|
|
|
- [x] Review PR #693 (`fix: prevent infinite restart loop that causes runaway API costs` by @ajbmachon). Files: `src/services/worker-types.ts`, `GeminiAgent.ts`, `OpenRouterAgent.ts`, `SessionManager.ts`, `SessionRoutes.ts`. Steps: (1) `gh pr checkout 693` (2) Review restart limiting logic — should have max restart count or cooldown (3) Verify it applies to all providers (Claude, Gemini, OpenRouter) (4) Run `npm run build` (5) If clean: `gh pr merge 693 --rebase --delete-branch`
|
|
- **MERGED** (2026-02-06): Applied directly to main (branch was 3+ weeks old). Adds `consecutiveRestarts` counter to ActiveSession, limits crash-recovery restarts to 3 with exponential backoff (1s, 2s, 4s), and adds defensive `memorySessionId` checks in GeminiAgent and OpenRouterAgent before expensive LLM calls. Build passes. Addresses $402+ runaway cost scenario reported by the author.
|
|
|
|
## Session Processing
|
|
|
|
- [x] Review PR #940 (`fix: Backfill project field on SDK session creation` by @miclip). Files: `src/services/sqlite/SessionStore.ts`, `src/services/sqlite/sessions/create.ts`. Steps: (1) `gh pr checkout 940` (2) Review — should populate the project field when creating a session so observations are properly scoped (3) Small, focused change (4) Run `npm run build` (5) If clean: `gh pr merge 940 --rebase --delete-branch`
|
|
- **MERGED** (2026-02-06): Clean +18/-2 line fix. Adds a conditional UPDATE after INSERT OR IGNORE to backfill the project field when a later hook provides it — only updates when existing value is NULL or empty. Fixes race condition where PostToolUse hook creates the session before UserPromptSubmit sets the project. Both functional (`sessions/create.ts`) and class (`SessionStore.ts`) versions updated identically. Build passes.
|
|
|
|
- [x] Review PR #937 (`fix(worker): gracefully process orphaned pending messages after session termination` by @jayvenn21). Files: `src/services/sqlite/PendingMessageStore.ts`, `src/services/worker-service.ts`, `src/services/worker/SessionManager.ts`. Steps: (1) `gh pr checkout 937` (2) Review — orphaned messages should be processed or discarded cleanly, not stuck forever (3) Run `npm run build` (4) If clean: `gh pr merge 937 --rebase --delete-branch`
|
|
- **MERGED** (2026-02-06): Clean +125/-3 line fix addressing issue #936 (51+ orphaned messages/day). Adds `isSessionTerminatedError()` to detect SDK resume failures from closed terminals. On failure, falls back to Gemini/OpenRouter agents if available to drain the queue. If no fallback, `markAllSessionMessagesAbandoned()` marks pending/processing messages as failed and `removeSessionImmediate()` cleans up the session without deadlocking on the generator promise. Build passes. Rebased cleanly onto main.
|
|
|
|
- [x] Review PR #899 (`fix: resolve message processing failures in multi-session scenarios` by @hahaschool). Files: 7 files including SessionStore, SDKAgent, ResponseProcessor, SessionRoutes. Steps: (1) `gh pr checkout 899` (2) This is a broader fix — review carefully for scope creep (3) Check that multi-session message routing is correct (messages go to the right session) (4) Run `npm run build` (5) If focused and correct: `gh pr merge 899 --rebase --delete-branch`. If too broad, request scope reduction.
|
|
- **CLOSED** (2026-02-06): Too broad — conflicts in 3 files (`worker-service.ts`, `worker-types.ts`, `SDKAgent.ts`) due to overlap with recently merged PRs #693, #937, and #940. Crash recovery logic already addressed by #693 (restart limits with exponential backoff) and #937 (orphaned message fallback + session termination detection). Requested author re-submit as focused PRs for the genuinely valuable parts: (1) AbortController stale-check at generator start (~10 lines, real bug), (2) FK constraint protection in SDKAgent/ResponseProcessor (prevents FK violation when SDK generates new session ID after restart), (3) queueDepth logging fix using `pendingStore.getPendingCount()` instead of deprecated `session.pendingMessages.length`.
|
|
|
|
- [x] Review PR #627 (`fix: Reset AbortController before starting generator` by @TranslateMe). File: `src/services/worker/http/routes/SessionRoutes.ts`. Steps: (1) `gh pr checkout 627` (2) Old PR (Dec 27) — check if still applicable after v8.5.2 memory leak fix (3) If the abort controller reset is still needed: rebase and merge. If already handled: close.
|
|
- **MERGED** (2026-02-06): Clean +10 line fix still applicable. Adds stale-AbortController check at the start of `startGeneratorWithProvider()` — if `signal.aborted` is already true, creates a fresh AbortController before proceeding. This prevents infinite "Generator aborted" loops where: (1) session aborts, setting `signal.aborted=true`, (2) `generatorPromise` is set to null, (3) new observations trigger `ensureGeneratorRunning`, (4) new generator immediately sees stale abort signal and exits, creating an infinite loop. The crash recovery path (merged in PR #693) already resets the controller for non-abort exits, but this fix covers the abort case. Rebased cleanly onto main, build passes.
|
|
|
|
- [x] Review PR #741 (`fix: Provider-aware recovery and stale session cleanup` by @licutis). File: `src/services/worker-service.ts`. Steps: (1) `gh pr checkout 741` (2) Review provider-aware recovery logic — should handle Gemini/OpenRouter differently from Claude SDK (3) Run `npm run build` (4) If clean: `gh pr merge 741 --rebase --delete-branch`
|
|
- **MERGED** (2026-02-06): Applied directly to main (branch was 3 weeks old with conflicts from PRs #693, #937, #627). Two clean changes: (1) Adds `getActiveAgent()` to WorkerService matching SessionRoutes logic — startup recovery now uses the correct provider (OpenRouter/Gemini/SDK) instead of hardcoding SDKAgent. (2) Stale session cleanup in `processPendingQueues()` — marks sessions stuck 'active' for 6+ hours as failed along with their pending messages before processing orphaned queues. Conflicts resolved by combining PR #741's dynamic agent selection with PR #937's session-terminated fallback logic. Build passes.
|
|
|
|
## Stateless Provider Support
|
|
|
|
These fix Gemini/OpenRouter providers that don't have SDK sessions.
|
|
|
|
- [x] Review PR #910 (`fix: complete stateless provider support with enhanced validation` by @Scheevel). WARNING: This PR modifies ~90 files, most of which are CLAUDE.md files that should NOT have been included. Steps: (1) `gh pr checkout 910` (2) Identify the actual source changes (look at `.ts` files only, ignore CLAUDE.md files) (3) Key files: `src/services/sqlite/SessionStore.ts`, `src/services/worker/GeminiAgent.ts`, `src/services/worker/OpenRouterAgent.ts`, `src/services/worker/agents/ResponseProcessor.ts` (4) If the core logic is good but CLAUDE.md pollution is unacceptable, request changes to remove all CLAUDE.md files from the PR. Or cherry-pick just the source changes.
|
|
- **CLOSED** (2026-02-06): Stale branch with critical regressions against 5 recently merged PRs. Reverts: `consecutiveRestarts` (PR #693 — infinite restart prevention), `removeSessionImmediate()` (PR #937 — orphaned message handling), `getCredential()` (Issue #733 — centralized env), `CLAUDE_MEM_FOLDER_CLAUDEMD_ENABLED` check (PR #913), and `createIterator()` onIdleTimeout. Also bundles 3 unrelated features: observation deduplication (+285 lines Levenshtein in ResponseProcessor), summary soft-delete (hidden column + PATCH API), and deletion of ~80 CLAUDE.md files + compiled plugin artifacts. Core synthetic memorySessionId generation is valuable — requested re-submission as focused PRs: (1) synthetic ID gen for stateless providers, (2) observation deduplication, (3) summary soft-delete.
|
|
|
|
- [x] Evaluate PR #615 (`fix: generate memorySessionId for stateless providers` by @JiehoonKwak). Files: 6 files. Steps: (1) `gh pr checkout 615` (2) Check if #910 supersedes this (both fix stateless provider session IDs) (3) If #910 is more complete, close #615: `gh pr close 615 --comment "Superseded by PR #910 which provides more complete stateless provider support. Thank you!"` (4) If #615 has unique value, rebase and merge first.
|
|
- **APPLIED TO MAIN** (2026-02-06): PR #910 was already CLOSED (regressions), so #615 was NOT superseded. Core synthetic memorySessionId generation (+8 lines each in GeminiAgent.ts and OpenRouterAgent.ts) applied directly to main — PR was 1 month old with conflicts from PRs #693, #913, #940. The fix generates `gemini-{contentSessionId}-{timestamp}` / `openrouter-{contentSessionId}-{timestamp}` IDs before the first API call, fixing a real bug where PR #693's defensive `!session.memorySessionId` checks would throw errors for all stateless provider sessions. Other PR changes (sessions/create.ts backfill, CLAUDE_MEM_FOLDER_CLAUDEMD_ENABLED, quote style changes) already in main or formatting-only. Build passes.
|