Reduce timeouts to eliminate 10-30s startup delay when worker is dead
(common on WSL2 after hibernate). Add stale PID detection, graceful
error handling across all handlers, and error classification that
distinguishes worker unavailability from handler bugs.
- HEALTH_CHECK 30s→3s, new POST_SPAWN_WAIT (5s), PORT_IN_USE_WAIT (3s)
- isProcessAlive() with EPERM handling, cleanStalePidFile()
- getPluginVersion() try-catch for shutdown race (#1042)
- isWorkerUnavailableError: transport+5xx+429→exit 0, 4xx→exit 2
- No-op handler for unknown event types (#984)
- Wrap all handler fetch calls in try-catch for graceful degradation
- CLAUDE_MEM_HEALTH_TIMEOUT_MS env var override with validation
Cherry-picked from PR #844 by @thusdigital. Sessions stayed in active
sessions map forever after summarize, causing the orphan reaper to think
all processes were still active. Adds session-complete as Stop phase 2
hook that calls POST /api/sessions/complete to remove sessions from the
active map, allowing the reaper to correctly identify and clean up
orphaned worker processes. Fixes#842.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>