fix: prevent chroma-mcp spawn storm with 5-layer defense (641 processes → max 2)

During SIGHUP testing with 6+ active sessions, ChromaSync.ensureConnection()
had no mutex, so concurrent fire-and-forget syncObservation() calls each
spawned their own chroma-mcp subprocess via StdioClientTransport, leaving
641 orphaned processes in ~5 min. Error-driven reconnection formed a
positive feedback loop that amplified the storm.
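The race and the Layer 0 fix can be illustrated with a minimal promise-memoization sketch. This is not the ChromaSync source. `ChromaConnectionSketch`, `Connection`, and the injected `spawn` callback are hypothetical stand-ins for the real StdioClientTransport setup. Concurrent callers share one in-flight promise, so only the first call reaches `spawn()`.

```typescript
// Hypothetical handle standing in for a connected chroma-mcp client.
interface Connection {
  id: number;
}

class ChromaConnectionSketch {
  private connection: Connection | null = null;
  // In-flight attempt shared by every concurrent caller (the Layer 0 mutex).
  private connecting: Promise<Connection> | null = null;
  private readonly spawn: () => Promise<Connection>;

  constructor(spawn: () => Promise<Connection>) {
    this.spawn = spawn;
  }

  ensureConnection(): Promise<Connection> {
    if (this.connection) return Promise.resolve(this.connection);
    let pending = this.connecting;
    if (!pending) {
      // Only the first caller spawns; later callers await the same promise.
      pending = this.spawn().then(
        (conn) => {
          this.connection = conn;
          this.connecting = null;
          return conn;
        },
        (err: unknown) => {
          // Clear the memoized attempt so a later call can retry, then rethrow.
          this.connecting = null;
          throw err;
        },
      );
      this.connecting = pending;
    }
    return pending;
  }
}

// Usage: six fire-and-forget callers racing, like concurrent syncObservation() calls.
let spawnCount = 0;
const sketch = new ChromaConnectionSketch(async () => {
  spawnCount += 1; // the real code would start a chroma-mcp subprocess here
  return { id: spawnCount };
});
const racing = Array.from({ length: 6 }, () => sketch.ensureConnection());
```

After the six calls, `spawnCount` is 1 and every promise resolves to the same connection object. Clearing the memoized promise on failure keeps one bad attempt from blocking later calls, which is why the retry path still needs a limit such as Layer 4.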

Defense layers:
- Layer 0: Connection mutex via promise memoization (prevents concurrent spawns)
- Layer 1: Pre-spawn process count guard using execFileSync('ps') (kills excess)
- Layer 2: Hardened close() with try-finally + Unix pkill in GracefulShutdown
- Layer 3: Count-based orphan reaper in ProcessManager (not age-based)
- Layer 4: Circuit breaker stops retries for 60s after 3 consecutive failures
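Layer 4 can be sketched as a small consecutive-failure counter. This is an illustration under assumptions, not the commit's code. `CircuitBreakerSketch` and its method names are hypothetical, and the clock is injectable only so the 60 s cooldown can be exercised without waiting.

```typescript
// Hypothetical sketch of Layer 4: after `threshold` consecutive failures the
// breaker opens and refuses reconnect attempts until `cooldownMs` has passed.
class CircuitBreakerSketch {
  private consecutiveFailures = 0;
  private openedAt: number | null = null;
  private readonly threshold: number;
  private readonly cooldownMs: number;
  private readonly now: () => number;

  constructor(threshold = 3, cooldownMs = 60_000, now: () => number = Date.now) {
    this.threshold = threshold;
    this.cooldownMs = cooldownMs;
    this.now = now;
  }

  // Callers check this before spawning or reconnecting.
  canAttempt(): boolean {
    if (this.openedAt === null) return true;
    if (this.now() - this.openedAt >= this.cooldownMs) {
      // Cooldown elapsed: close the breaker and start counting failures afresh.
      this.openedAt = null;
      this.consecutiveFailures = 0;
      return true;
    }
    return false;
  }

  recordSuccess(): void {
    this.consecutiveFailures = 0;
    this.openedAt = null;
  }

  recordFailure(): void {
    this.consecutiveFailures += 1;
    // Open once; failures recorded while already open do not extend the window.
    if (this.consecutiveFailures >= this.threshold && this.openedAt === null) {
      this.openedAt = this.now();
    }
  }
}

// Usage with a fake clock so the 60 s cooldown can be checked instantly.
let fakeTime = 0;
const breaker = new CircuitBreakerSketch(3, 60_000, () => fakeTime);
breaker.recordFailure();
breaker.recordFailure();
const allowedAfterTwo = breaker.canAttempt(); // true: only 2 failures so far
breaker.recordFailure();
const allowedAfterThree = breaker.canAttempt(); // false: breaker is open
fakeTime = 59_999;
const allowedBeforeCooldown = breaker.canAttempt(); // false: still cooling down
fakeTime = 60_000;
const allowedAfterCooldown = breaker.canAttempt(); // true: cooldown elapsed
```

In this sketch the cooldown resets the count, so three more failures are needed to trip it again. A stricter half-open variant would reopen on the first failure after the cooldown.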

Closes #1063, closes #695
Relates to #1010, #707
Rod Boev
2026-02-11 05:53:10 -05:00
parent 79b3a61ac8
commit a3f9e7f638
7 changed files with 505 additions and 21 deletions
@@ -66,6 +66,7 @@ import {
   removePidFile,
   getPlatformTimeout,
   cleanupOrphanedProcesses,
+  cleanupExcessChromaProcesses,
   spawnDaemon,
   createSignalHandler
 } from './infrastructure/ProcessManager.js';
@@ -316,6 +317,7 @@ export class WorkerService {
   private async initializeBackground(): Promise<void> {
     try {
       await cleanupOrphanedProcesses();
+      await cleanupExcessChromaProcesses();
       // Load mode configuration
       const { ModeManager } = await import('./domain/ModeManager.js');
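This hunk only wires cleanupExcessChromaProcesses() into initializeBackground(). Its body lives in ProcessManager and is not shown here. A hypothetical sketch of the count-based selection that Layer 3 describes might look like the following. `parsePsOutput`, `selectExcessChromaPids`, and the sample rows are illustrative. The cap of 2 mirrors the "max 2" in the commit title, and treating higher PIDs as newer is an assumption. The sketch only chooses PIDs; it does not run ps or send signals.

```typescript
// Hypothetical shape of one parsed process-table row.
interface ProcInfo {
  pid: number;
  command: string;
}

// Parse "PID COMMAND..." rows, e.g. text captured from `ps -eo pid,args`.
// Rows whose first field is not numeric (such as the header) are skipped.
function parsePsOutput(output: string): ProcInfo[] {
  const procs: ProcInfo[] = [];
  for (const line of output.split("\n")) {
    const match = /^\s*(\d+)\s+(.+)$/.exec(line);
    if (match) {
      procs.push({ pid: Number(match[1]), command: match[2] ?? "" });
    }
  }
  return procs;
}

// Count-based selection: regardless of how long each process has been alive,
// keep at most `maxAllowed` chroma-mcp processes and return the PIDs to kill.
// Assumption: a higher PID means a newer process (PIDs can wrap, so this is a heuristic).
function selectExcessChromaPids(procs: ProcInfo[], maxAllowed: number): number[] {
  const chroma = procs
    .filter((p) => p.command.includes("chroma-mcp"))
    .sort((a, b) => a.pid - b.pid);
  const excess = Math.max(0, chroma.length - maxAllowed);
  // The lowest PIDs come first after sorting, so they are the ones reaped.
  return chroma.slice(0, excess).map((p) => p.pid);
}

// Hypothetical sample rows; only the chroma-mcp entries are candidates.
const sample = [
  "  PID ARGS",
  "  101 node worker.js",
  "  205 chroma-mcp",
  "  230 chroma-mcp",
  "  231 chroma-mcp",
  "  240 chroma-mcp",
].join("\n");

const toReap = selectExcessChromaPids(parsePsOutput(sample), 2);
```

With the sample rows, `toReap` is `[205, 230]`, leaving 231 and 240 running. Because the decision depends only on the count, a burst of young orphans is still trimmed, which an age-based reaper would miss.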