Replace execAsync kill command with individual process.kill calls wrapped in try/catch to gracefully handle processes that have already exited. 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
5.7 KiB
PR #458 Fix Plan: Worker Lifecycle Critical Issues
Branch: bugfix/spawn-worker
Created: 2025-12-26
Context: PR review found 3 critical bugs, 5 important issues in the worker self-spawn implementation
Phase 1: Fix Critical waitForProcessesExit Bug
Goal: Fix the logic bug that crashes shutdown when child processes exit
File: src/services/worker-service.ts
What to do:
- Find the
waitForProcessesExitmethod (around line 752-771) - The
pids.filter()callback callsprocess.kill(pid, 0)which throws when the process is dead - Wrap in try/catch to return false when process doesn't exist
Current code:
const stillAlive = pids.filter(pid => {
process.kill(pid, 0); // Signal 0 checks if process exists - throws if dead
return true;
});
Fixed code:
const stillAlive = pids.filter(pid => {
try {
process.kill(pid, 0);
return true;
} catch {
return false;
}
});
Verification: The filter should now correctly remove PIDs of exited processes instead of throwing.
Phase 2: Fix Spawn PID Validation
Goal: Prevent invalid PID file writes when spawn fails
File: src/services/worker-service.ts
What to do:
- Find the
startcase in the CLI switch (around line 816-844) - After spawn(), add check for undefined pid before writing PID file
- Exit with error if spawn failed
Current code:
const child = spawn(process.execPath, [__filename, '--daemon'], {...});
child.unref();
writePidFile({ pid: child.pid!, port, startedAt: new Date().toISOString() });
Fixed code:
const child = spawn(process.execPath, [__filename, '--daemon'], {...});
if (child.pid === undefined) {
console.error('Failed to spawn worker daemon');
process.exit(1);
}
child.unref();
writePidFile({ pid: child.pid, port, startedAt: new Date().toISOString() });
Verification: Remove the ! non-null assertion since we now properly check.
Phase 3: Fix Unix Process Cleanup Error Handling
Goal: Handle errors in orphaned process cleanup
File: src/services/worker-service.ts
What to do:
- Find
cleanupOrphanedProcessesmethod (around line 389-458) - The Unix branch at line 454 has
await execAsync(\kill ${pids.join(' ')}`)` with no error handling - Replace with individual process.kill calls wrapped in try/catch
Current code:
} else {
await execAsync(`kill ${pids.join(' ')}`);
}
Fixed code:
} else {
for (const pid of pids) {
try {
process.kill(pid, 'SIGKILL');
} catch {
// Process already exited - that's fine
}
}
}
Phase 4: Fix Windows taskkill Error Handling
Goal: Prevent cleanup abort when one process fails to kill
File: src/services/worker-service.ts
What to do:
- Find the Windows taskkill loop in
cleanupOrphanedProcesses(around line 444-452) - Wrap
execSyncin try/catch so one failure doesn't abort the loop
Current code:
for (const pid of pids) {
if (!Number.isInteger(pid) || pid <= 0) {
logger.warn('SYSTEM', 'Skipping invalid PID', { pid });
continue;
}
execSync(`taskkill /PID ${pid} /T /F`, { timeout: 60000, stdio: 'ignore' });
}
Fixed code:
for (const pid of pids) {
if (!Number.isInteger(pid) || pid <= 0) {
logger.warn('SYSTEM', 'Skipping invalid PID', { pid });
continue;
}
try {
execSync(`taskkill /PID ${pid} /T /F`, { timeout: 60000, stdio: 'ignore' });
} catch {
// Process may have already exited - continue cleanup
}
}
Phase 5: Use Readiness Endpoint for Health Checks
Goal: Ensure waitForHealth checks full initialization, not just HTTP server
File: src/services/worker-service.ts
What to do:
- Find
waitForHealthfunction (around line 61-68) - Change endpoint from
/api/healthto/api/readiness /api/readinessreturns 503 until background init completes,/api/healthreturns 200 immediately
Current code:
async function waitForHealth(port: number, timeoutMs: number = 30000): Promise<boolean> {
const start = Date.now();
while (Date.now() - start < timeoutMs) {
if (await isPortInUse(port)) return true;
await new Promise(r => setTimeout(r, 500));
}
return false;
}
Fixed code:
async function waitForHealth(port: number, timeoutMs: number = 30000): Promise<boolean> {
const start = Date.now();
while (Date.now() - start < timeoutMs) {
try {
const response = await fetch(`http://127.0.0.1:${port}/api/readiness`, {
signal: AbortSignal.timeout(2000)
});
if (response.ok) return true;
} catch {
// Not ready yet
}
await new Promise(r => setTimeout(r, 500));
}
return false;
}
Phase 6: Build, Test, and Commit
Goal: Verify fixes work and commit changes
Commands:
npm run build-and-sync
Commit message:
fix: address critical error handling issues in worker lifecycle
- Fix waitForProcessesExit crash when child processes exit
- Add spawn pid validation before writing PID file
- Handle Unix kill errors in orphaned process cleanup
- Handle Windows taskkill errors in cleanup loop
- Use /api/readiness for health checks instead of /api/health
Fixes issues found in PR #458 review.
Summary
| Phase | Priority | Description |
|---|---|---|
| 1 | CRITICAL | Fix waitForProcessesExit filter bug |
| 2 | CRITICAL | Validate child.pid after spawn |
| 3 | CRITICAL | Fix Unix kill error handling |
| 4 | HIGH | Fix Windows taskkill error handling |
| 5 | HIGH | Use readiness endpoint for health |
| 6 | - | Build and commit |
Estimated changes: ~30 lines modified in worker-service.ts