fix: prevent zombie subprocess accumulation by only trusting exitCode (#1226) (#1325)

proc.killed only means Node sent a signal — the process can still be alive.
This caused premature pool slot release, allowing unbounded process spawning.
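
A minimal sketch of the behavior described above, not code from this repository. The helper name `demoKilledFlag` and the spawned one-liner are hypothetical. It spawns a long-lived Node child, signals it, and records `killed` and `exitCode` right after the signal is sent, before the exit event has been delivered:

```typescript
import { spawn } from "node:child_process";
import { once } from "node:events";

// Hypothetical demo helper: snapshot the flags right after signalling.
async function demoKilledFlag(): Promise<{ killed: boolean; exitCode: number | null }> {
  // Child that would otherwise stay alive for a minute.
  const child = spawn(process.execPath, ["-e", "setTimeout(() => {}, 60000)"], {
    stdio: "ignore",
  });

  child.kill("SIGTERM");

  // `killed` flips as soon as the signal is sent successfully;
  // `exitCode` is still null because no exit has been observed yet.
  const snapshot = { killed: child.killed, exitCode: child.exitCode };

  // Wait for the real exit so the demo itself leaves no child behind.
  await once(child, "exit");
  return snapshot;
}
```

Under these assumptions the snapshot should show `killed` as `true` while `exitCode` is still `null`, which is the window in which a guard based on `proc.killed` would skip cleanup.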

- ensureProcessExit: remove proc.killed from early-exit checks, only trust exitCode
- Fix 3 call-site guards that skipped cleanup for signaled-but-alive processes
- Add TOTAL_PROCESS_HARD_CAP=10 safety net in waitForSlot()
- After SIGKILL, wait up to 1s via exit event instead of blind 200ms sleep
- Reduce reaper interval from 5min to 1min, idle threshold from 2min to 1min
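
The first and fourth items above can be sketched roughly as follows. This is an illustration under stated assumptions, not the actual `ensureProcessExit`: the names `ProcLike`, `waitForExit`, and `ensureExitSketch` are hypothetical, and the SIGTERM-then-SIGKILL escalation is assumed rather than taken from the commit.

```typescript
import { EventEmitter } from "node:events";

// Hypothetical minimal shape of a child process used by this sketch.
interface ProcLike extends EventEmitter {
  exitCode: number | null;
  killed: boolean;
  kill(signal?: string): boolean;
}

// Resolve true if the process exits within timeoutMs, false otherwise.
function waitForExit(proc: ProcLike, timeoutMs: number): Promise<boolean> {
  if (proc.exitCode !== null) return Promise.resolve(true);
  return new Promise<boolean>((resolve) => {
    const onExit = () => {
      clearTimeout(timer);
      resolve(true);
    };
    const timer = setTimeout(() => {
      proc.off("exit", onExit);
      resolve(false);
    }, timeoutMs);
    proc.once("exit", onExit);
  });
}

// Assumed escalation: SIGTERM, wait gracefulMs, then SIGKILL and wait up to 1s.
async function ensureExitSketch(proc: ProcLike, gracefulMs: number): Promise<boolean> {
  // Only exitCode is consulted; proc.killed is deliberately ignored.
  if (proc.exitCode !== null) return true;

  proc.kill("SIGTERM");
  if (await waitForExit(proc, gracefulMs)) return true;

  proc.kill("SIGKILL");
  // Wait on the exit event instead of sleeping for a fixed interval.
  return waitForExit(proc, 1000);
}
```

Returning a boolean lets the caller decide whether to release a pool slot; a caller that releases the slot on `false` would reintroduce the accumulation this commit addresses.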

Closes #1226

Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
Nir Alfasi
2026-03-13 04:59:42 +02:00
committed by GitHub
parent 23058d4b0c
commit 38d9ac7adb
5 changed files with 231 additions and 13 deletions
+1 -1
@@ -281,7 +281,7 @@ export class SDKAgent {
   } finally {
     // Ensure subprocess is terminated after query completes (or on error)
     const tracked = getProcessBySession(session.sessionDbId);
-    if (tracked && !tracked.process.killed && tracked.process.exitCode === null) {
+    if (tracked && tracked.process.exitCode === null) {
       await ensureProcessExit(tracked, 5000);
     }
   }