fix: prevent zombie subprocess accumulation by only trusting exitCode (#1226) (#1325)

proc.killed only means that Node successfully sent a signal; the process can still be alive.
Treating it as proof of exit released pool slots prematurely, which allowed unbounded process spawning.
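
To make the distinction concrete, here is a minimal sketch of the two kinds of guard. The `ProcessLike` interface and both function names are hypothetical stand-ins for illustration, not code from this repository; only the `killed` and `exitCode` fields mirror the Node `ChildProcess` properties named above.

```typescript
// Hypothetical structural stand-in for the two ChildProcess fields discussed above.
interface ProcessLike {
  killed: boolean;          // true once kill() has successfully sent a signal
  exitCode: number | null;  // a number once the process has exited with a code
}

// Old-style guard (hypothetical name): treats a signaled process as already gone.
function looksExitedNaive(proc: ProcessLike): boolean {
  return proc.killed || proc.exitCode !== null;
}

// Guard in the spirit of this commit (hypothetical name): only exitCode counts.
function hasExited(proc: ProcessLike): boolean {
  return proc.exitCode !== null;
}

// A process that was sent a signal but has not exited yet.
const signaledButAlive: ProcessLike = { killed: true, exitCode: null };

console.log(looksExitedNaive(signaledButAlive)); // true  (slot would be released too early)
console.log(hasExited(signaledButAlive));        // false (cleanup still has to wait)
```

With the naive guard, a signaled-but-alive process is reported as finished, which is the situation that let pool slots be released early.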

- ensureProcessExit: remove proc.killed from early-exit checks, only trust exitCode
- Fix 3 call-site guards that skipped cleanup for signaled-but-alive processes
- Add TOTAL_PROCESS_HARD_CAP=10 safety net in waitForSlot()
- After SIGKILL, wait up to 1s via exit event instead of blind 200ms sleep
- Reduce reaper interval from 5min to 1min, idle threshold from 2min to 1min
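
The "wait up to 1s via exit event" step could look roughly like the sketch below. This is an assumption about the shape of the change, not the actual implementation: `ExitSource` and `waitForExit` are hypothetical names, and the interface only lists the members the helper needs.

```typescript
// Hypothetical minimal view of a child process: just what the helper uses.
interface ExitSource {
  exitCode: number | null;
  once(event: 'exit', listener: () => void): unknown;
  removeListener(event: 'exit', listener: () => void): unknown;
}

// Resolves true as soon as 'exit' fires, or false if timeoutMs elapses first.
// Unlike a fixed sleep, it returns early when the process exits quickly.
function waitForExit(proc: ExitSource, timeoutMs = 1000): Promise<boolean> {
  // Already exited with a code: nothing to wait for.
  if (proc.exitCode !== null) return Promise.resolve(true);

  return new Promise<boolean>((resolve) => {
    const onExit = () => {
      clearTimeout(timer); // do not keep the event loop alive after exit
      resolve(true);
    };
    const timer = setTimeout(() => {
      proc.removeListener('exit', onExit); // avoid leaking the listener
      resolve(false);
    }, timeoutMs);
    proc.once('exit', onExit);
  });
}

// Usage sketch (inside an async function, after escalating to SIGKILL):
//   proc.kill('SIGKILL');
//   const exited = await waitForExit(proc, 1000);
//   if (!exited) { /* keep the pool slot occupied and log the stuck PID */ }
```

Returning a boolean rather than rejecting on timeout keeps the caller's cleanup path linear; what the caller does with a `false` result is a design choice this sketch leaves open.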

Closes #1226

Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
Author: Nir Alfasi
Date: 2026-03-13 04:59:42 +02:00
Committed by: GitHub
Parent: 23058d4b0c
Commit: 38d9ac7adb
5 changed files with 231 additions and 13 deletions
+1 -1
@@ -302,7 +302,7 @@ export class SessionManager {
 // 3. Verify subprocess exit with 5s timeout (Issue #737 fix)
 const tracked = getProcessBySession(sessionDbId);
-if (tracked && !tracked.process.killed && tracked.process.exitCode === null) {
+if (tracked && tracked.process.exitCode === null) {
   logger.debug('SESSION', `Waiting for subprocess PID ${tracked.pid} to exit`, {
     sessionId: sessionDbId,
     pid: tracked.pid