fix(windows): solve zombie port problem with wrapper architecture (#372)

On Windows, Bun doesn't properly release socket handles when the worker
process exits, causing "zombie ports" that remain bound even after all
processes are dead. This required a system reboot to clear.

Solution: Introduce a wrapper process (worker-wrapper.cjs) that:
- Spawns the actual worker as a child with IPC channel
- On restart/shutdown, uses `taskkill /T /F` to kill the entire process tree
- Exits itself, allowing hooks to start fresh

The wrapper has no sockets, so Bun's socket cleanup bug doesn't affect it.
When the wrapper kills the inner worker tree and exits, the port is properly
released and can be immediately reused.

Key changes:
- New worker-wrapper.ts for Windows process lifecycle management
- ProcessManager starts wrapper on Windows, worker directly on Unix
- Worker sends IPC messages to wrapper for restart/shutdown
- Health endpoint now includes debug info (build ID, managed status, hasIpc)

Tested: Restart API now properly releases port and new worker binds to same port.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-authored-by: Claude Opus 4.5 <noreply@anthropic.com>
This commit is contained in:
ToxMox
2025-12-17 14:45:41 -05:00
committed by GitHub
parent 1fec1e8339
commit a5bf653a47
14 changed files with 664 additions and 248 deletions
+23 -4
View File
@@ -43,8 +43,10 @@ export class ProcessManager {
// Ensure log directory exists
mkdirSync(LOG_DIR, { recursive: true });
// Get worker script path
const workerScript = join(MARKETPLACE_ROOT, 'plugin', 'scripts', 'worker-service.cjs');
// On Windows, use the wrapper script to solve zombie port problem
// On Unix, use the worker directly
const scriptName = process.platform === 'win32' ? 'worker-wrapper.cjs' : 'worker-service.cjs';
const workerScript = join(MARKETPLACE_ROOT, 'plugin', 'scripts', scriptName);
if (!existsSync(workerScript)) {
return { success: false, error: `Worker script not found at ${workerScript}` };
@@ -86,6 +88,10 @@ export class ProcessManager {
// Note: windowsHide: true doesn't work with detached: true (Bun inherits Node.js process spawning semantics)
// See: https://github.com/nodejs/node/issues/21825 and PR #315 for detailed testing
//
// On Windows, we start worker-wrapper.cjs which manages the actual worker-service.cjs.
// This solves the zombie port problem: the wrapper has no sockets, so when it kills
// and respawns the inner worker, the socket is properly released.
//
// Security: All paths (bunPath, script, MARKETPLACE_ROOT) are application-controlled system paths,
// not user input. If an attacker could modify these paths, they would already have full filesystem
// access including direct access to ~/.claude-mem/claude-mem.db. Nevertheless, we properly escape
@@ -168,8 +174,21 @@ export class ProcessManager {
if (!info) return true;
try {
process.kill(info.pid, 'SIGTERM');
await this.waitForExit(info.pid, timeout);
if (process.platform === 'win32') {
// On Windows, use taskkill /T /F to kill entire process tree
// This ensures the wrapper AND all its children (inner worker, MCP, ChromaSync) are killed
// which is necessary to properly release the socket and avoid zombie ports
const { execSync } = await import('child_process');
try {
execSync(`taskkill /PID ${info.pid} /T /F`, { timeout: 10000, stdio: 'ignore' });
} catch {
// Process may already be dead
}
} else {
// On Unix, use signals
process.kill(info.pid, 'SIGTERM');
await this.waitForExit(info.pid, timeout);
}
} catch {
try {
process.kill(info.pid, 'SIGKILL');