fix: prevent zombie process accumulation via PID registry and signal propagation (Issue #737) (#806)

* Fix zombie process accumulation (Issue #737)

Problem: Claude haiku subprocesses spawned by the SDK weren't terminating
properly, causing zombie process accumulation (user reported 155 processes
consuming 51GB RAM).

Root causes:
1. SDK's SpawnedProcess interface hides subprocess PIDs
2. deleteSession() didn't verify subprocess exit
3. abort() was fire-and-forget with no confirmation
4. No mechanism to track or clean up orphaned processes

Solution:
- Add ProcessRegistry module to track spawned Claude subprocesses
- Use SDK's spawnClaudeCodeProcess option to capture PIDs via custom spawn
- Pass signal parameter to enable AbortController integration
- Wait for subprocess exit in deleteSession() with 5s timeout
- Escalate to SIGKILL if graceful exit fails
- Add orphan reaper running every 5 minutes as safety net

Files changed:
- src/services/worker/ProcessRegistry.ts (new): PID registry and reaper
- src/services/worker/SDKAgent.ts: Use custom spawn to capture PIDs
- src/services/worker/SessionManager.ts: Verify subprocess exit on delete
- src/services/worker-service.ts: Start/stop orphan reaper

Fixes #737

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

* fix: address code review feedback

- Replace busy-wait polling with event-based proc.once('exit')
- Detect and warn about multiple processes per session (race condition)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

---------

Co-authored-by: bigphoot <bigphoot@gmail.com>
Co-authored-by: Claude Opus 4.5 <noreply@anthropic.com>
This commit is contained in:
Alexander Knigge
2026-01-25 17:10:11 -08:00
committed by GitHub
parent 67651669a1
commit c1b5b2a783
4 changed files with 295 additions and 5 deletions
+5 -1
View File
@@ -20,6 +20,7 @@ import { USER_SETTINGS_PATH } from '../../shared/paths.js';
import type { ActiveSession, SDKUserMessage } from '../worker-types.js';
import { ModeManager } from '../domain/ModeManager.js';
import { processAgentResponse, type WorkerRef } from './agents/index.js';
import { createPidCapturingSpawn, getProcessBySession, ensureProcessExit } from './ProcessRegistry.js';
// Import Agent SDK (assumes it's installed)
// @ts-ignore - Agent SDK types may not be available
@@ -99,6 +100,7 @@ export class SDKAgent {
// Run Agent SDK query loop
// Only resume if we have a captured memory session ID
// Use custom spawn to capture PIDs for zombie process cleanup (Issue #737)
const queryResult = query({
prompt: messageGenerator,
options: {
@@ -109,7 +111,9 @@ export class SDKAgent {
...(hasRealMemorySessionId && session.lastPromptNumber > 1 && { resume: session.memorySessionId }),
disallowedTools,
abortController: session.abortController,
pathToClaudeCodeExecutable: claudePath
pathToClaudeCodeExecutable: claudePath,
// Custom spawn function captures PIDs to fix zombie process accumulation
spawnClaudeCodeProcess: createPidCapturingSpawn(session.sessionDbId)
}
});