Files
claude-mem/Auto Run Docs/PR-Triage/PR-Triage-04.md
T
2026-02-05 18:27:55 -05:00

6.9 KiB

Phase 04: Hook Resilience & Non-blocking Startup

All these PRs share the goal of preventing hooks from blocking Claude Code prompts when the worker is unavailable or slow. They should be reviewed together since they touch overlapping files.

Core Hook Files Affected

  • src/cli/handlers/session-init.ts — PRs #973, #828, #829, #928
  • src/shared/worker-utils.ts — PRs #964, #530
  • plugin/hooks/hooks.json / src/shared/hook-constants.ts — PRs #960, #922
  • src/services/worker-service.ts — PR #959

Tasks

  • Review and merge PR #960 (fix: remove user-message hook from SessionStart to prevent startup error by @rodboev). Files: plugin/hooks/hooks.json, src/shared/hook-constants.ts. This removes a user-message hook that was incorrectly bundled into SessionStart, causing errors on startup. Steps: (1) gh pr checkout 960 (2) Verify that the user-message hook is genuinely not needed in SessionStart — check if hooks.json has a separate UserPromptSubmit hook that handles user messages (3) Run npm run build and verify (4) If clean: gh pr merge 960 --rebase --delete-branch

    • Merged on 2026-02-05. Verified: context hook handles SessionStart injection, UserPromptSubmit handles user messages via session-init. Build clean. The USER_MESSAGE_ONLY: 3 exit code was also documented in hook-constants.ts.
  • Review PR #973 (Fix hooks to fail gracefully instead of blocking prompts by @farikh). Files: src/cli/handlers/session-init.ts, src/cli/handlers/user-message.ts. This wraps hook handlers in try-catch so failures don't block Claude Code. Steps: (1) gh pr checkout 973 (2) Review — hooks should output valid JSON status on failure (exit 0 with error info) rather than crashing (exit 2 which blocks Claude) (3) Verify the approach aligns with the exit code strategy in CLAUDE.md (exit 0 for non-blocking, exit 2 for blocking) (4) Run npm run build (5) If appropriate: gh pr merge 973 --rebase --delete-branch

    • Merged on 2026-02-05. Rebased cleanly onto main. session-init.ts: replaced two throw on worker 500/SDK agent failure with logger.failure() + graceful exit 0 (fail-open). user-message.ts: replaced throw with graceful return, console.error()process.stderr.write(), USER_MESSAGE_ONLYSUCCESS. Note: user-message handler is effectively dead code since PR #960 removed its hook from SessionStart, but the cleanup is harmless. Build clean, all existing tests pass.
  • Review PR #959 (fix: fail open on /api/context/inject during initialization by @rodboev). File: src/services/worker-service.ts. The context inject endpoint should return empty context (not 503) during worker initialization so hooks don't block. Steps: (1) gh pr checkout 959 (2) Verify the endpoint returns a valid empty context response during init rather than erroring (3) Run npm run build (4) If clean: gh pr merge 959 --rebase --delete-branch

    • Merged on 2026-02-05. Rebased cleanly onto main. Replaced blocking await Promise.race([initializationComplete, 5-min-timeout]) with synchronous initializationCompleteFlag check. Returns 200 with empty context { content: [{ type: 'text', text: '' }] } instead of 503 error during initialization. Aligns with fail-open hook strategy: hooks get valid response and exit 0 instead of hanging for up to 5 minutes. Build clean, no test regressions (9 pre-existing failures in worker-json-status.test.ts unrelated to this change).
  • Review PR #964 (Add fetch timeouts to Stop hook and health checks by @rodboev). Files: src/cli/handlers/summarize.ts, src/shared/worker-utils.ts. Adds AbortController timeouts to prevent hooks from hanging on fetch calls. Steps: (1) gh pr checkout 964 (2) Verify timeout values are reasonable (should be < hook timeout of 120s) (3) Check that AbortController usage is correct (signal passed to fetch) (4) Run npm run build (5) If clean: gh pr merge 964 --rebase --delete-branch

    • Merged on 2026-02-05. Rebased cleanly onto main. Adds fetchWithTimeout() helper in worker-utils.ts using Promise.race + setTimeout (avoids AbortSignal.timeout() which causes libuv assertion crash in Bun on Windows). Applied HEALTH_CHECK_TIMEOUT_MS (30s / 45s on Windows) to isWorkerHealthy() and getWorkerVersion() — these previously had no timeout and would hang indefinitely when worker was unreachable. Applied HOOK_TIMEOUTS.DEFAULT (5min) to summarize POST request. Implementation properly clears timeout on both resolve and reject paths. Build clean.
  • Review PR #922 (fix: add async:true to SessionStart hooks by @kamran-khalid-v9). File: plugin/hooks/hooks.json. This adds async: true to SessionStart hooks so they don't block terminal on Windows. Steps: (1) gh pr checkout 922 (2) Verify that making SessionStart async doesn't break context injection (context must be available before Claude starts processing) (3) IMPORTANT: If SessionStart is async, context won't be injected in time. This may conflict with the architecture. Verify carefully. (4) If async is inappropriate for SessionStart (which injects context), close with explanation. If only certain sub-hooks should be async, request changes.

    • Closed on 2026-02-05. Making SessionStart hooks async would break claude-mem's core functionality — the context hook's stdout must be captured synchronously for memory context injection. The three SessionStart hooks also have a strict dependency chain (install → start worker → fetch context). The Windows blocking issue was already resolved by the fail-open approach in PRs #973, #959, and #964 (graceful failures, empty context during init, fetch timeouts). PR was also stale: referenced bun directly instead of bun-runner.js wrapper and included the user-message hook removed in PR #960.
  • Evaluate PR #530 (fix: add retry logic with exponential backoff to hook fetch calls by @BeamNawapat). Files: 10 files including all hook scripts and worker-utils. This is an older PR (Jan 3) that adds retry logic. Steps: (1) Check if PR is rebased on current main (2) The approach (retry with backoff) may conflict with the "fail fast" philosophy — hooks should fail quickly rather than retry. (3) If the retry approach conflicts with the hook resilience PRs above (#973, #959), close with: gh pr close 530 --comment "The hook resilience approach has evolved — hooks now fail open rather than retry. See PRs #973 and #959 for the current approach. Thank you for the contribution!"

    • Closed on 2026-02-05. The retry-with-backoff approach (3 retries, 100ms→200ms→400ms exponential backoff) directly conflicts with the fail-open strategy established by PRs #973, #959, and #964. Current architecture: hooks fail open immediately with valid empty responses (exit 0) rather than retrying failed connections. The PR was also stale (Jan 3), touching 10 files including hook scripts that have since been restructured. The fetchWithRetry() utility in worker-utils.ts would conflict with the fetchWithTimeout() already merged from PR #964.