Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
6.9 KiB
Phase 04: Hook Resilience & Non-blocking Startup
All these PRs share the goal of preventing hooks from blocking Claude Code prompts when the worker is unavailable or slow. They should be reviewed together since they touch overlapping files.
Core Hook Files Affected
src/cli/handlers/session-init.ts— PRs #973, #828, #829, #928src/shared/worker-utils.ts— PRs #964, #530plugin/hooks/hooks.json/src/shared/hook-constants.ts— PRs #960, #922src/services/worker-service.ts— PR #959
Tasks
-
Review and merge PR #960 (
fix: remove user-message hook from SessionStart to prevent startup errorby @rodboev). Files:plugin/hooks/hooks.json,src/shared/hook-constants.ts. This removes a user-message hook that was incorrectly bundled into SessionStart, causing errors on startup. Steps: (1)gh pr checkout 960(2) Verify that the user-message hook is genuinely not needed in SessionStart — check ifhooks.jsonhas a separate UserPromptSubmit hook that handles user messages (3) Runnpm run buildand verify (4) If clean:gh pr merge 960 --rebase --delete-branch- Merged on 2026-02-05. Verified:
contexthook handles SessionStart injection,UserPromptSubmithandles user messages viasession-init. Build clean. TheUSER_MESSAGE_ONLY: 3exit code was also documented inhook-constants.ts.
- Merged on 2026-02-05. Verified:
-
Review PR #973 (
Fix hooks to fail gracefully instead of blocking promptsby @farikh). Files:src/cli/handlers/session-init.ts,src/cli/handlers/user-message.ts. This wraps hook handlers in try-catch so failures don't block Claude Code. Steps: (1)gh pr checkout 973(2) Review — hooks should output valid JSON status on failure (exit 0 with error info) rather than crashing (exit 2 which blocks Claude) (3) Verify the approach aligns with the exit code strategy in CLAUDE.md (exit 0 for non-blocking, exit 2 for blocking) (4) Runnpm run build(5) If appropriate:gh pr merge 973 --rebase --delete-branch- Merged on 2026-02-05. Rebased cleanly onto main. session-init.ts: replaced two
throwon worker 500/SDK agent failure withlogger.failure()+ graceful exit 0 (fail-open). user-message.ts: replacedthrowwith graceful return,console.error()→process.stderr.write(),USER_MESSAGE_ONLY→SUCCESS. Note: user-message handler is effectively dead code since PR #960 removed its hook from SessionStart, but the cleanup is harmless. Build clean, all existing tests pass.
- Merged on 2026-02-05. Rebased cleanly onto main. session-init.ts: replaced two
-
Review PR #959 (
fix: fail open on /api/context/inject during initializationby @rodboev). File:src/services/worker-service.ts. The context inject endpoint should return empty context (not 503) during worker initialization so hooks don't block. Steps: (1)gh pr checkout 959(2) Verify the endpoint returns a valid empty context response during init rather than erroring (3) Runnpm run build(4) If clean:gh pr merge 959 --rebase --delete-branch- Merged on 2026-02-05. Rebased cleanly onto main. Replaced blocking
await Promise.race([initializationComplete, 5-min-timeout])with synchronousinitializationCompleteFlagcheck. Returns 200 with empty context{ content: [{ type: 'text', text: '' }] }instead of 503 error during initialization. Aligns with fail-open hook strategy: hooks get valid response and exit 0 instead of hanging for up to 5 minutes. Build clean, no test regressions (9 pre-existing failures in worker-json-status.test.ts unrelated to this change).
- Merged on 2026-02-05. Rebased cleanly onto main. Replaced blocking
-
Review PR #964 (
Add fetch timeouts to Stop hook and health checksby @rodboev). Files:src/cli/handlers/summarize.ts,src/shared/worker-utils.ts. Adds AbortController timeouts to prevent hooks from hanging on fetch calls. Steps: (1)gh pr checkout 964(2) Verify timeout values are reasonable (should be < hook timeout of 120s) (3) Check that AbortController usage is correct (signal passed to fetch) (4) Runnpm run build(5) If clean:gh pr merge 964 --rebase --delete-branch- Merged on 2026-02-05. Rebased cleanly onto main. Adds
fetchWithTimeout()helper inworker-utils.tsusingPromise.race+setTimeout(avoidsAbortSignal.timeout()which causes libuv assertion crash in Bun on Windows). AppliedHEALTH_CHECK_TIMEOUT_MS(30s / 45s on Windows) toisWorkerHealthy()andgetWorkerVersion()— these previously had no timeout and would hang indefinitely when worker was unreachable. AppliedHOOK_TIMEOUTS.DEFAULT(5min) to summarize POST request. Implementation properly clears timeout on both resolve and reject paths. Build clean.
- Merged on 2026-02-05. Rebased cleanly onto main. Adds
-
Review PR #922 (
fix: add async:true to SessionStart hooksby @kamran-khalid-v9). File:plugin/hooks/hooks.json. This addsasync: trueto SessionStart hooks so they don't block terminal on Windows. Steps: (1)gh pr checkout 922(2) Verify that making SessionStart async doesn't break context injection (context must be available before Claude starts processing) (3) IMPORTANT: If SessionStart is async, context won't be injected in time. This may conflict with the architecture. Verify carefully. (4) If async is inappropriate for SessionStart (which injects context), close with explanation. If only certain sub-hooks should be async, request changes.- Closed on 2026-02-05. Making SessionStart hooks async would break claude-mem's core functionality — the
contexthook's stdout must be captured synchronously for memory context injection. The three SessionStart hooks also have a strict dependency chain (install → start worker → fetch context). The Windows blocking issue was already resolved by the fail-open approach in PRs #973, #959, and #964 (graceful failures, empty context during init, fetch timeouts). PR was also stale: referencedbundirectly instead ofbun-runner.jswrapper and included theuser-messagehook removed in PR #960.
- Closed on 2026-02-05. Making SessionStart hooks async would break claude-mem's core functionality — the
-
Evaluate PR #530 (
fix: add retry logic with exponential backoff to hook fetch callsby @BeamNawapat). Files: 10 files including all hook scripts and worker-utils. This is an older PR (Jan 3) that adds retry logic. Steps: (1) Check if PR is rebased on current main (2) The approach (retry with backoff) may conflict with the "fail fast" philosophy — hooks should fail quickly rather than retry. (3) If the retry approach conflicts with the hook resilience PRs above (#973, #959), close with:gh pr close 530 --comment "The hook resilience approach has evolved — hooks now fail open rather than retry. See PRs #973 and #959 for the current approach. Thank you for the contribution!"- Closed on 2026-02-05. The retry-with-backoff approach (3 retries, 100ms→200ms→400ms exponential backoff) directly conflicts with the fail-open strategy established by PRs #973, #959, and #964. Current architecture: hooks fail open immediately with valid empty responses (exit 0) rather than retrying failed connections. The PR was also stale (Jan 3), touching 10 files including hook scripts that have since been restructured. The
fetchWithRetry()utility in worker-utils.ts would conflict with thefetchWithTimeout()already merged from PR #964.
- Closed on 2026-02-05. The retry-with-backoff approach (3 retries, 100ms→200ms→400ms exponential backoff) directly conflicts with the fail-open strategy established by PRs #973, #959, and #964. Current architecture: hooks fail open immediately with valid empty responses (exit 0) rather than retrying failed connections. The PR was also stale (Jan 3), touching 10 files including hook scripts that have since been restructured. The