feat: add embedded Process Supervisor for unified process lifecycle (#1370)

* feat: add embedded Process Supervisor for unified process lifecycle management

Consolidates scattered process management (ProcessManager, GracefulShutdown,
HealthMonitor, ProcessRegistry) into a unified src/supervisor/ module.

New: ProcessRegistry with JSON persistence, env sanitizer (strips CLAUDECODE_*
and CLAUDE_CODE_* vars plus exact matches such as MCP_SESSION_ID), graceful
shutdown cascade (SIGTERM → 5s wait → SIGKILL, with tree-kill on Windows), PID
file liveness validation, and singleton Supervisor API.
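
The env sanitizer's filtering can be sketched roughly as follows. This is a minimal illustration inferred from the test expectations in this commit, not the actual contents of src/supervisor/env-sanitizer.ts; the names `STRIP_PREFIXES`, `STRIP_EXACT`, and `sanitizeEnvSketch` are hypothetical, and the real lists may contain more entries.

```typescript
// Hypothetical sketch of the env sanitizer; the real module may differ.
type Env = Record<string, string | undefined>;

// Assumed lists, inferred from the test cases in this commit.
const STRIP_PREFIXES: readonly string[] = ['CLAUDECODE_', 'CLAUDE_CODE_'];
const STRIP_EXACT: readonly string[] = [
  'CLAUDECODE',
  'CLAUDE_CODE_SESSION',
  'CLAUDE_CODE_ENTRYPOINT',
  'MCP_SESSION_ID',
];

// Returns a new object in a single pass; the input is never mutated.
function sanitizeEnvSketch(env: Env): Record<string, string> {
  const result: Record<string, string> = {};
  for (const key of Object.keys(env)) {
    const value = env[key];
    if (value === undefined) continue; // skip unset entries
    if (STRIP_EXACT.indexOf(key) !== -1) continue; // exact-match removal
    if (STRIP_PREFIXES.some((prefix) => key.startsWith(prefix))) continue; // prefix removal
    result[key] = value;
  }
  return result;
}
```

A spawn call would then pass something like `sanitizeEnvSketch(process.env)` as the child's `env`, so the worker does not see the parent session's CLAUDECODE markers.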

Fixes #1352 (worker inherits CLAUDECODE env causing nested sessions)
Fixes #1356 (zombie TCP socket after Windows reboot)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* feat: add session-scoped process reaping to supervisor

Adds reapSession(sessionId) to ProcessRegistry for killing session-tagged
processes on session end. SessionManager.deleteSession() now triggers reaping.
Tightens orphan reaper interval from 60s to 30s.
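
As a rough illustration of session-scoped reaping, the sketch below tags records with a session ID, signals every match on reap, and drops the records. It is an assumption-laden simplification: `RegistrySketch`, `ProcessRecord`, and the injected `kill` callback are hypothetical names, and the sketch is synchronous for brevity, whereas the tests in this commit await the real reapSession(). Comparing session IDs as strings mirrors the getBySession test expectations.

```typescript
// Hypothetical sketch of session-scoped reaping; not the real ProcessRegistry.
interface ProcessRecord {
  pid: number;
  type: string;
  sessionId?: string | number;
}

class RegistrySketch {
  private readonly records = new Map<string, ProcessRecord>();
  // `kill` is injected so the sketch can run without touching real processes.
  private readonly kill: (pid: number) => void;

  constructor(kill: (pid: number) => void) {
    this.kill = kill;
  }

  register(id: string, record: ProcessRecord): void {
    this.records.set(id, record);
  }

  // Session IDs are compared as strings, so 42 matches '42'.
  getBySession(sessionId: string | number): string[] {
    const wanted = String(sessionId);
    const ids: string[] = [];
    this.records.forEach((record, id) => {
      if (record.sessionId !== undefined && String(record.sessionId) === wanted) {
        ids.push(id);
      }
    });
    return ids;
  }

  // Signals every process tagged with the session, then drops its record.
  reapSession(sessionId: string | number): number {
    const ids = this.getBySession(sessionId);
    for (const id of ids) {
      const record = this.records.get(id);
      if (!record) continue;
      try {
        this.kill(record.pid);
      } catch {
        // The process already exited; unregistering is still correct.
      }
      this.records.delete(id);
    }
    return ids.length;
  }
}
```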

Fixes #1351 (MCP server processes leak on session end)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* feat: add Unix domain socket support for worker communication

Introduces socket-manager.ts for UDS-based worker communication, eliminating
port 37777 collisions between concurrent sessions. Worker listens on
~/.claude-mem/sockets/worker.sock by default with TCP fallback.

All hook handlers, MCP server, health checks, and admin commands updated to
use socket-aware workerHttpRequest(). Backwards compatible — settings can
force TCP mode via CLAUDE_MEM_WORKER_TRANSPORT=tcp.

Fixes #1346 (port 37777 collision across concurrent sessions)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix: remove in-process worker fallback from hook command

Removes the fallback path where hook scripts started WorkerService in-process,
making the worker a grandchild of Claude Code (killed by sandbox). Hooks now
always delegate to ensureWorkerStarted() which spawns a fully detached daemon.

Fixes #1249 (grandchild process killed by sandbox)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* feat: add health checker and /api/admin/doctor endpoint

Adds 30-second periodic health sweep that prunes dead processes from the
supervisor registry and cleans stale socket files. Adds /api/admin/doctor
endpoint exposing supervisor state, process liveness, and environment health.
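
The start/stop guard around the periodic sweep might look roughly like this. It is a sketch only: `startSweepSketch`, `stopSweepSketch`, and `SWEEP_INTERVAL_MS` are hypothetical names, and the sweep callback stands in for the real pruning and socket cleanup. The guard reflects the test expectation that repeated start calls create a single interval.

```typescript
// Hypothetical sketch of an idempotent periodic health sweep.
const SWEEP_INTERVAL_MS = 30000; // 30-second sweep, per the commit message

let sweepTimer: ReturnType<typeof setInterval> | null = null;

// Starts the sweep; returns false if one is already running.
function startSweepSketch(sweep: () => void, intervalMs: number = SWEEP_INTERVAL_MS): boolean {
  if (sweepTimer !== null) return false; // guard: never stack intervals
  sweepTimer = setInterval(sweep, intervalMs);
  return true;
}

// Stops the sweep; safe to call when nothing is running.
function stopSweepSketch(): void {
  if (sweepTimer === null) return;
  clearInterval(sweepTimer);
  sweepTimer = null;
}
```

Resetting the handle to null in the stop path is what lets a later start create a fresh interval.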

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* test: add comprehensive supervisor test suite

64 tests covering all supervisor modules: process registry (18 tests),
env sanitizer (8), shutdown cascade (10), socket manager (15), health
checker (5), and supervisor API (6). Includes persistence, isolation,
edge cases, and cross-module integration scenarios.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix: revert Unix domain socket transport, restore TCP on port 37777

The socket-manager introduced UDS as default transport, but this broke
the HTTP server's TCP accessibility (viewer UI, curl, external monitoring).
Since there's only ever one worker process handling all sessions, the
port collision rationale for UDS doesn't apply. Reverts to TCP-only,
removing ~900 lines of unnecessary complexity.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* chore: remove dead code found in pre-landing review

Remove unused `acceptingSpawns` field from Supervisor class (written but
never read — assertCanSpawn uses stopPromise instead) and unused
`buildWorkerUrl` import from context handler.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* chore: update gitignore

* fix: address PR review feedback - downgrade HTTP logging, clean up gitignore, harden supervisor

- Downgrade request/response HTTP logging from info to debug to reduce noise
- Remove unused getWorkerPort imports, use buildWorkerUrl helper
- Export ENV_PREFIXES/ENV_EXACT_MATCHES from env-sanitizer, reuse in Server.ts
- Fix isPidAlive(0) returning true (should be false)
- Add shutdownInitiated flag to prevent signal handler race condition
- Make validateWorkerPidFile testable with pidFilePath option
- Remove unused dataDir from ShutdownCascadeOptions
- Upgrade reapSession log from debug to warn
- Rename zombiePidFiles to deadProcessPids (returns actual PIDs)
- Clean up gitignore: remove duplicate datasets/, stale ~*/ and http*/ patterns
- Fix tests to use temp directories instead of relying on real PID file
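
To illustrate the isPidAlive(0) fix, here is a minimal sketch of a liveness probe. `isPidAliveSketch` is a hypothetical name and the EPERM handling is an assumption; the real implementation in src/supervisor/process-registry.ts may differ. The key point is that on POSIX systems PID 0 and negative PIDs address process groups rather than a single process, so a bare `process.kill(pid, 0)` can succeed for them.

```typescript
// Hypothetical sketch of a PID liveness probe; the real isPidAlive may differ.
function isPidAliveSketch(pid: number): boolean {
  // Reject 0, negatives, and non-integers before probing: on POSIX these
  // would target process groups (or be invalid) instead of one process.
  if (!Number.isInteger(pid) || pid <= 0) return false;
  try {
    process.kill(pid, 0); // signal 0 checks for existence without delivering a signal
    return true;
  } catch (error) {
    // Assumption: EPERM means the process exists but belongs to another user.
    return (error as { code?: string }).code === 'EPERM';
  }
}
```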

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Alex Newman
2026-03-16 14:49:23 -07:00
committed by GitHub
parent 237a4c37f8
commit 80a8c90a1a
44 changed files with 2385 additions and 636 deletions
@@ -27,6 +27,15 @@ mock.module('../../src/shared/SettingsDefaultsManager.js', () => ({
mock.module('../../src/shared/worker-utils.js', () => ({
ensureWorkerRunning: () => Promise.resolve(true),
getWorkerPort: () => 37777,
workerHttpRequest: (apiPath: string, options?: any) => {
// Delegate to global fetch so tests can mock fetch behavior
const url = `http://127.0.0.1:37777${apiPath}`;
return globalThis.fetch(url, {
method: options?.method ?? 'GET',
headers: options?.headers,
body: options?.body,
});
},
}));
mock.module('../../src/utils/project-name.js', () => ({
+26 -8
@@ -59,7 +59,11 @@ describe('HealthMonitor', () => {
describe('waitForHealth', () => {
it('should succeed immediately when server responds', async () => {
-global.fetch = mock(() => Promise.resolve({ ok: true } as Response));
+global.fetch = mock(() => Promise.resolve({
+ok: true,
+status: 200,
+text: () => Promise.resolve('')
+} as unknown as Response));
const start = Date.now();
const result = await waitForHealth(37777, 5000);
@@ -91,7 +95,11 @@ describe('HealthMonitor', () => {
if (callCount < 3) {
return Promise.reject(new Error('ECONNREFUSED'));
}
-return Promise.resolve({ ok: true } as Response);
+return Promise.resolve({
+ok: true,
+status: 200,
+text: () => Promise.resolve('')
+} as unknown as Response);
});
const result = await waitForHealth(37777, 5000);
@@ -101,7 +109,11 @@ describe('HealthMonitor', () => {
});
it('should check health endpoint for liveness', async () => {
-const fetchMock = mock(() => Promise.resolve({ ok: true } as Response));
+const fetchMock = mock(() => Promise.resolve({
+ok: true,
+status: 200,
+text: () => Promise.resolve('')
+} as unknown as Response));
global.fetch = fetchMock;
await waitForHealth(37777, 1000);
@@ -115,7 +127,11 @@ describe('HealthMonitor', () => {
});
it('should use default timeout when not specified', async () => {
-global.fetch = mock(() => Promise.resolve({ ok: true } as Response));
+global.fetch = mock(() => Promise.resolve({
+ok: true,
+status: 200,
+text: () => Promise.resolve('')
+} as unknown as Response));
// Just verify it doesn't throw and returns quickly
const result = await waitForHealth(37777);
@@ -154,8 +170,9 @@ describe('HealthMonitor', () => {
it('should detect version mismatch', async () => {
global.fetch = mock(() => Promise.resolve({
ok: true,
-json: () => Promise.resolve({ version: '0.0.0-definitely-wrong' })
-} as Response));
+status: 200,
+text: () => Promise.resolve(JSON.stringify({ version: '0.0.0-definitely-wrong' }))
+} as unknown as Response));
const result = await checkVersionMatch(37777);
@@ -172,8 +189,9 @@ describe('HealthMonitor', () => {
global.fetch = mock(() => Promise.resolve({
ok: true,
-json: () => Promise.resolve({ version: pluginVersion })
-} as Response));
+status: 200,
+text: () => Promise.resolve(JSON.stringify({ version: pluginVersion }))
+} as unknown as Response));
const result = await checkVersionMatch(37777);
+123
@@ -0,0 +1,123 @@
import { describe, expect, it } from 'bun:test';
import { sanitizeEnv } from '../../src/supervisor/env-sanitizer.js';
describe('sanitizeEnv', () => {
it('strips variables with CLAUDECODE_ prefix', () => {
const result = sanitizeEnv({
CLAUDECODE_FOO: 'bar',
CLAUDECODE_SOMETHING: 'value',
PATH: '/usr/bin'
});
expect(result.CLAUDECODE_FOO).toBeUndefined();
expect(result.CLAUDECODE_SOMETHING).toBeUndefined();
expect(result.PATH).toBe('/usr/bin');
});
it('strips variables with CLAUDE_CODE_ prefix', () => {
const result = sanitizeEnv({
CLAUDE_CODE_BAR: 'baz',
CLAUDE_CODE_OAUTH_TOKEN: 'token',
HOME: '/home/user'
});
expect(result.CLAUDE_CODE_BAR).toBeUndefined();
expect(result.CLAUDE_CODE_OAUTH_TOKEN).toBeUndefined();
expect(result.HOME).toBe('/home/user');
});
it('strips exact-match variables (CLAUDECODE, CLAUDE_CODE_SESSION, CLAUDE_CODE_ENTRYPOINT, MCP_SESSION_ID)', () => {
const result = sanitizeEnv({
CLAUDECODE: '1',
CLAUDE_CODE_SESSION: 'session-123',
CLAUDE_CODE_ENTRYPOINT: 'hook',
MCP_SESSION_ID: 'mcp-abc',
NODE_PATH: '/usr/local/lib'
});
expect(result.CLAUDECODE).toBeUndefined();
expect(result.CLAUDE_CODE_SESSION).toBeUndefined();
expect(result.CLAUDE_CODE_ENTRYPOINT).toBeUndefined();
expect(result.MCP_SESSION_ID).toBeUndefined();
expect(result.NODE_PATH).toBe('/usr/local/lib');
});
it('preserves allowed variables like PATH, HOME, NODE_PATH', () => {
const result = sanitizeEnv({
PATH: '/usr/bin:/usr/local/bin',
HOME: '/home/user',
NODE_PATH: '/usr/local/lib/node_modules',
SHELL: '/bin/zsh',
USER: 'developer',
LANG: 'en_US.UTF-8'
});
expect(result.PATH).toBe('/usr/bin:/usr/local/bin');
expect(result.HOME).toBe('/home/user');
expect(result.NODE_PATH).toBe('/usr/local/lib/node_modules');
expect(result.SHELL).toBe('/bin/zsh');
expect(result.USER).toBe('developer');
expect(result.LANG).toBe('en_US.UTF-8');
});
it('returns a new object and does not mutate the original', () => {
const original: NodeJS.ProcessEnv = {
PATH: '/usr/bin',
CLAUDECODE_FOO: 'bar',
KEEP: 'yes'
};
const originalCopy = { ...original };
const result = sanitizeEnv(original);
// Result should be a different object
expect(result).not.toBe(original);
// Original should be unchanged
expect(original).toEqual(originalCopy);
// Result should not contain stripped vars
expect(result.CLAUDECODE_FOO).toBeUndefined();
expect(result.PATH).toBe('/usr/bin');
});
it('handles empty env gracefully', () => {
const result = sanitizeEnv({});
expect(result).toEqual({});
});
it('skips entries with undefined values', () => {
const env: NodeJS.ProcessEnv = {
DEFINED: 'value',
UNDEFINED_KEY: undefined
};
const result = sanitizeEnv(env);
expect(result.DEFINED).toBe('value');
expect('UNDEFINED_KEY' in result).toBe(false);
});
it('combines prefix and exact match removal in a single pass', () => {
const result = sanitizeEnv({
PATH: '/usr/bin',
CLAUDECODE: '1',
CLAUDECODE_FOO: 'bar',
CLAUDE_CODE_BAR: 'baz',
CLAUDE_CODE_OAUTH_TOKEN: 'oauth-token',
CLAUDE_CODE_SESSION: 'session',
CLAUDE_CODE_ENTRYPOINT: 'entry',
MCP_SESSION_ID: 'mcp',
KEEP_ME: 'yes'
});
expect(result.PATH).toBe('/usr/bin');
expect(result.KEEP_ME).toBe('yes');
expect(result.CLAUDECODE).toBeUndefined();
expect(result.CLAUDECODE_FOO).toBeUndefined();
expect(result.CLAUDE_CODE_BAR).toBeUndefined();
expect(result.CLAUDE_CODE_OAUTH_TOKEN).toBeUndefined();
expect(result.CLAUDE_CODE_SESSION).toBeUndefined();
expect(result.CLAUDE_CODE_ENTRYPOINT).toBeUndefined();
expect(result.MCP_SESSION_ID).toBeUndefined();
});
});
+73
@@ -0,0 +1,73 @@
import { afterEach, describe, expect, it, mock } from 'bun:test';
import { startHealthChecker, stopHealthChecker } from '../../src/supervisor/health-checker.js';
describe('health-checker', () => {
afterEach(() => {
// Always stop the checker to avoid leaking intervals between tests
stopHealthChecker();
});
it('startHealthChecker sets up an interval without throwing', () => {
expect(() => startHealthChecker()).not.toThrow();
});
it('stopHealthChecker clears the interval without throwing', () => {
startHealthChecker();
expect(() => stopHealthChecker()).not.toThrow();
});
it('stopHealthChecker is safe to call when no checker is running', () => {
expect(() => stopHealthChecker()).not.toThrow();
});
it('multiple startHealthChecker calls do not create multiple intervals', () => {
// Track setInterval calls
const originalSetInterval = globalThis.setInterval;
let setIntervalCallCount = 0;
globalThis.setInterval = ((...args: Parameters<typeof setInterval>) => {
setIntervalCallCount++;
return originalSetInterval(...args);
}) as typeof setInterval;
try {
// Stop any existing checker first to ensure clean state
stopHealthChecker();
setIntervalCallCount = 0;
startHealthChecker();
startHealthChecker();
startHealthChecker();
// Only one interval should have been created due to the guard
expect(setIntervalCallCount).toBe(1);
} finally {
globalThis.setInterval = originalSetInterval;
}
});
it('stopHealthChecker after start allows restarting', () => {
const originalSetInterval = globalThis.setInterval;
let setIntervalCallCount = 0;
globalThis.setInterval = ((...args: Parameters<typeof setInterval>) => {
setIntervalCallCount++;
return originalSetInterval(...args);
}) as typeof setInterval;
try {
stopHealthChecker();
setIntervalCallCount = 0;
startHealthChecker();
expect(setIntervalCallCount).toBe(1);
stopHealthChecker();
startHealthChecker();
expect(setIntervalCallCount).toBe(2);
} finally {
globalThis.setInterval = originalSetInterval;
}
});
});
+111
@@ -0,0 +1,111 @@
import { afterEach, describe, expect, it } from 'bun:test';
import { mkdirSync, rmSync, writeFileSync } from 'fs';
import { tmpdir } from 'os';
import path from 'path';
import { validateWorkerPidFile, type ValidateWorkerPidStatus } from '../../src/supervisor/index.js';
function makeTempDir(): string {
const dir = path.join(tmpdir(), `claude-mem-index-${Date.now()}-${Math.random().toString(36).slice(2)}`);
mkdirSync(dir, { recursive: true });
return dir;
}
const tempDirs: string[] = [];
describe('validateWorkerPidFile', () => {
afterEach(() => {
while (tempDirs.length > 0) {
const dir = tempDirs.pop();
if (dir) {
rmSync(dir, { recursive: true, force: true });
}
}
});
it('returns "missing" when PID file does not exist', () => {
const tempDir = makeTempDir();
tempDirs.push(tempDir);
const pidFilePath = path.join(tempDir, 'worker.pid');
const status = validateWorkerPidFile({ logAlive: false, pidFilePath });
expect(status).toBe('missing');
});
it('returns "invalid" when PID file contains bad JSON', () => {
const tempDir = makeTempDir();
tempDirs.push(tempDir);
const pidFilePath = path.join(tempDir, 'worker.pid');
writeFileSync(pidFilePath, 'not-json!!!');
const status = validateWorkerPidFile({ logAlive: false, pidFilePath });
expect(status).toBe('invalid');
});
it('returns "stale" when PID file references a dead process', () => {
const tempDir = makeTempDir();
tempDirs.push(tempDir);
const pidFilePath = path.join(tempDir, 'worker.pid');
writeFileSync(pidFilePath, JSON.stringify({
pid: 2147483647,
port: 37777,
startedAt: new Date().toISOString()
}));
const status = validateWorkerPidFile({ logAlive: false, pidFilePath });
expect(status).toBe('stale');
});
it('returns "alive" when PID file references the current process', () => {
const tempDir = makeTempDir();
tempDirs.push(tempDir);
const pidFilePath = path.join(tempDir, 'worker.pid');
writeFileSync(pidFilePath, JSON.stringify({
pid: process.pid,
port: 37777,
startedAt: new Date().toISOString()
}));
const status = validateWorkerPidFile({ logAlive: false, pidFilePath });
expect(status).toBe('alive');
});
});
describe('Supervisor assertCanSpawn behavior', () => {
it('assertCanSpawn does not throw when no shutdown is in progress', () => {
const { getSupervisor } = require('../../src/supervisor/index.js');
const supervisor = getSupervisor();
// When not shutting down, assertCanSpawn should not throw
expect(() => supervisor.assertCanSpawn('test')).not.toThrow();
});
it('registerProcess and unregisterProcess delegate to the registry', () => {
const { getSupervisor } = require('../../src/supervisor/index.js');
const supervisor = getSupervisor();
const registry = supervisor.getRegistry();
const testId = `test-${Date.now()}`;
supervisor.registerProcess(testId, {
pid: process.pid,
type: 'test',
startedAt: new Date().toISOString()
});
const found = registry.getAll().find((r: { id: string }) => r.id === testId);
expect(found).toBeDefined();
expect(found?.type).toBe('test');
supervisor.unregisterProcess(testId);
const afterUnregister = registry.getAll().find((r: { id: string }) => r.id === testId);
expect(afterUnregister).toBeUndefined();
});
});
describe('Supervisor singleton', () => {
it('getSupervisor returns the same instance', () => {
const { getSupervisor } = require('../../src/supervisor/index.js');
const s1 = getSupervisor();
const s2 = getSupervisor();
expect(s1).toBe(s2);
});
});
+423
@@ -0,0 +1,423 @@
import { afterEach, describe, expect, it } from 'bun:test';
import { existsSync, mkdirSync, readFileSync, rmSync, writeFileSync } from 'fs';
import { tmpdir } from 'os';
import path from 'path';
import { createProcessRegistry, isPidAlive } from '../../src/supervisor/process-registry.js';
function makeTempDir(): string {
return path.join(tmpdir(), `claude-mem-supervisor-${Date.now()}-${Math.random().toString(36).slice(2)}`);
}
const tempDirs: string[] = [];
describe('supervisor ProcessRegistry', () => {
afterEach(() => {
while (tempDirs.length > 0) {
const dir = tempDirs.pop();
if (dir) {
rmSync(dir, { recursive: true, force: true });
}
}
});
describe('isPidAlive', () => {
it('treats current process as alive', () => {
expect(isPidAlive(process.pid)).toBe(true);
});
it('treats an impossibly high PID as dead', () => {
expect(isPidAlive(2147483647)).toBe(false);
});
it('treats negative PID as dead', () => {
expect(isPidAlive(-1)).toBe(false);
});
it('treats non-integer PID as dead', () => {
expect(isPidAlive(3.14)).toBe(false);
});
});
describe('persistence', () => {
it('persists entries to disk and reloads them on initialize', () => {
const tempDir = makeTempDir();
tempDirs.push(tempDir);
mkdirSync(tempDir, { recursive: true });
const registryPath = path.join(tempDir, 'supervisor.json');
// Create a registry, register an entry, and let it persist
const registry1 = createProcessRegistry(registryPath);
registry1.register('worker:1', {
pid: process.pid,
type: 'worker',
startedAt: '2026-03-15T00:00:00.000Z'
});
// Verify file exists on disk
expect(existsSync(registryPath)).toBe(true);
const diskData = JSON.parse(readFileSync(registryPath, 'utf-8'));
expect(diskData.processes['worker:1']).toBeDefined();
// Create a second registry from the same path — it should load the persisted entry
const registry2 = createProcessRegistry(registryPath);
registry2.initialize();
const records = registry2.getAll();
expect(records).toHaveLength(1);
expect(records[0]?.id).toBe('worker:1');
expect(records[0]?.pid).toBe(process.pid);
});
it('prunes dead processes on initialize', () => {
const tempDir = makeTempDir();
tempDirs.push(tempDir);
mkdirSync(tempDir, { recursive: true });
const registryPath = path.join(tempDir, 'supervisor.json');
writeFileSync(registryPath, JSON.stringify({
processes: {
alive: {
pid: process.pid,
type: 'worker',
startedAt: '2026-03-15T00:00:00.000Z'
},
dead: {
pid: 2147483647,
type: 'mcp',
startedAt: '2026-03-15T00:00:01.000Z'
}
}
}));
const registry = createProcessRegistry(registryPath);
registry.initialize();
const records = registry.getAll();
expect(records).toHaveLength(1);
expect(records[0]?.id).toBe('alive');
expect(existsSync(registryPath)).toBe(true);
});
it('handles corrupted registry file gracefully', () => {
const tempDir = makeTempDir();
tempDirs.push(tempDir);
mkdirSync(tempDir, { recursive: true });
const registryPath = path.join(tempDir, 'supervisor.json');
writeFileSync(registryPath, '{ not valid json!!!');
const registry = createProcessRegistry(registryPath);
registry.initialize();
// Should recover with an empty registry
expect(registry.getAll()).toHaveLength(0);
});
});
describe('register and unregister', () => {
it('register adds an entry retrievable by getAll', () => {
const tempDir = makeTempDir();
tempDirs.push(tempDir);
const registry = createProcessRegistry(path.join(tempDir, 'supervisor.json'));
expect(registry.getAll()).toHaveLength(0);
registry.register('sdk:1', {
pid: process.pid,
type: 'sdk',
startedAt: '2026-03-15T00:00:00.000Z'
});
const records = registry.getAll();
expect(records).toHaveLength(1);
expect(records[0]?.id).toBe('sdk:1');
expect(records[0]?.type).toBe('sdk');
});
it('unregister removes an entry', () => {
const tempDir = makeTempDir();
tempDirs.push(tempDir);
const registry = createProcessRegistry(path.join(tempDir, 'supervisor.json'));
registry.register('sdk:1', {
pid: process.pid,
type: 'sdk',
startedAt: '2026-03-15T00:00:00.000Z'
});
expect(registry.getAll()).toHaveLength(1);
registry.unregister('sdk:1');
expect(registry.getAll()).toHaveLength(0);
});
it('unregister is a no-op for unknown IDs', () => {
const tempDir = makeTempDir();
tempDirs.push(tempDir);
const registry = createProcessRegistry(path.join(tempDir, 'supervisor.json'));
registry.register('sdk:1', {
pid: process.pid,
type: 'sdk',
startedAt: '2026-03-15T00:00:00.000Z'
});
registry.unregister('nonexistent');
expect(registry.getAll()).toHaveLength(1);
});
});
describe('getAll', () => {
it('returns records sorted by startedAt ascending', () => {
const tempDir = makeTempDir();
tempDirs.push(tempDir);
const registry = createProcessRegistry(path.join(tempDir, 'supervisor.json'));
registry.register('newest', {
pid: process.pid,
type: 'sdk',
startedAt: '2026-03-15T00:00:02.000Z'
});
registry.register('oldest', {
pid: process.pid,
type: 'worker',
startedAt: '2026-03-15T00:00:00.000Z'
});
registry.register('middle', {
pid: process.pid,
type: 'mcp',
startedAt: '2026-03-15T00:00:01.000Z'
});
const records = registry.getAll();
expect(records).toHaveLength(3);
expect(records[0]?.id).toBe('oldest');
expect(records[1]?.id).toBe('middle');
expect(records[2]?.id).toBe('newest');
});
it('returns empty array when no entries exist', () => {
const tempDir = makeTempDir();
tempDirs.push(tempDir);
const registry = createProcessRegistry(path.join(tempDir, 'supervisor.json'));
expect(registry.getAll()).toEqual([]);
});
});
describe('getBySession', () => {
it('filters records by session id', () => {
const tempDir = makeTempDir();
tempDirs.push(tempDir);
const registry = createProcessRegistry(path.join(tempDir, 'supervisor.json'));
registry.register('sdk:1', {
pid: process.pid,
type: 'sdk',
sessionId: 42,
startedAt: '2026-03-15T00:00:00.000Z'
});
registry.register('sdk:2', {
pid: process.pid,
type: 'sdk',
sessionId: 'other',
startedAt: '2026-03-15T00:00:01.000Z'
});
const records = registry.getBySession(42);
expect(records).toHaveLength(1);
expect(records[0]?.id).toBe('sdk:1');
});
it('returns empty array when no processes match the session', () => {
const tempDir = makeTempDir();
tempDirs.push(tempDir);
const registry = createProcessRegistry(path.join(tempDir, 'supervisor.json'));
registry.register('sdk:1', {
pid: process.pid,
type: 'sdk',
sessionId: 42,
startedAt: '2026-03-15T00:00:00.000Z'
});
expect(registry.getBySession(999)).toHaveLength(0);
});
it('matches string and numeric session IDs by string comparison', () => {
const tempDir = makeTempDir();
tempDirs.push(tempDir);
const registry = createProcessRegistry(path.join(tempDir, 'supervisor.json'));
registry.register('sdk:1', {
pid: process.pid,
type: 'sdk',
sessionId: '42',
startedAt: '2026-03-15T00:00:00.000Z'
});
// Querying with number should find string "42"
expect(registry.getBySession(42)).toHaveLength(1);
});
});
describe('pruneDeadEntries', () => {
it('removes entries with dead PIDs and preserves live ones', () => {
const tempDir = makeTempDir();
tempDirs.push(tempDir);
const registryPath = path.join(tempDir, 'supervisor.json');
const registry = createProcessRegistry(registryPath);
registry.register('alive', {
pid: process.pid,
type: 'worker',
startedAt: '2026-03-15T00:00:00.000Z'
});
registry.register('dead', {
pid: 2147483647,
type: 'mcp',
startedAt: '2026-03-15T00:00:01.000Z'
});
const removed = registry.pruneDeadEntries();
expect(removed).toBe(1);
expect(registry.getAll()).toHaveLength(1);
expect(registry.getAll()[0]?.id).toBe('alive');
});
it('returns 0 when all entries are alive', () => {
const tempDir = makeTempDir();
tempDirs.push(tempDir);
const registry = createProcessRegistry(path.join(tempDir, 'supervisor.json'));
registry.register('alive', {
pid: process.pid,
type: 'worker',
startedAt: '2026-03-15T00:00:00.000Z'
});
const removed = registry.pruneDeadEntries();
expect(removed).toBe(0);
expect(registry.getAll()).toHaveLength(1);
});
it('persists changes to disk after pruning', () => {
const tempDir = makeTempDir();
tempDirs.push(tempDir);
const registryPath = path.join(tempDir, 'supervisor.json');
const registry = createProcessRegistry(registryPath);
registry.register('dead', {
pid: 2147483647,
type: 'mcp',
startedAt: '2026-03-15T00:00:01.000Z'
});
registry.pruneDeadEntries();
const diskData = JSON.parse(readFileSync(registryPath, 'utf-8'));
expect(Object.keys(diskData.processes)).toHaveLength(0);
});
});
describe('clear', () => {
it('removes all entries', () => {
const tempDir = makeTempDir();
tempDirs.push(tempDir);
const registryPath = path.join(tempDir, 'supervisor.json');
const registry = createProcessRegistry(registryPath);
registry.register('sdk:1', {
pid: process.pid,
type: 'sdk',
startedAt: '2026-03-15T00:00:00.000Z'
});
registry.register('sdk:2', {
pid: process.pid,
type: 'sdk',
startedAt: '2026-03-15T00:00:01.000Z'
});
expect(registry.getAll()).toHaveLength(2);
registry.clear();
expect(registry.getAll()).toHaveLength(0);
// Verify persisted to disk
const diskData = JSON.parse(readFileSync(registryPath, 'utf-8'));
expect(Object.keys(diskData.processes)).toHaveLength(0);
});
});
describe('createProcessRegistry', () => {
it('creates an isolated instance with a custom path', () => {
const tempDir1 = makeTempDir();
const tempDir2 = makeTempDir();
tempDirs.push(tempDir1, tempDir2);
const registry1 = createProcessRegistry(path.join(tempDir1, 'supervisor.json'));
const registry2 = createProcessRegistry(path.join(tempDir2, 'supervisor.json'));
registry1.register('sdk:1', {
pid: process.pid,
type: 'sdk',
startedAt: '2026-03-15T00:00:00.000Z'
});
// registry2 should be independent
expect(registry1.getAll()).toHaveLength(1);
expect(registry2.getAll()).toHaveLength(0);
});
});
describe('reapSession', () => {
it('unregisters dead processes for the given session', async () => {
const tempDir = makeTempDir();
tempDirs.push(tempDir);
const registry = createProcessRegistry(path.join(tempDir, 'supervisor.json'));
registry.register('sdk:99:50001', {
pid: 2147483640,
type: 'sdk',
sessionId: 99,
startedAt: '2026-03-15T00:00:00.000Z'
});
registry.register('mcp:99:50002', {
pid: 2147483641,
type: 'mcp',
sessionId: 99,
startedAt: '2026-03-15T00:00:01.000Z'
});
// Register a process for a different session (should survive)
registry.register('sdk:100:50003', {
pid: process.pid,
type: 'sdk',
sessionId: 100,
startedAt: '2026-03-15T00:00:02.000Z'
});
const reaped = await registry.reapSession(99);
expect(reaped).toBe(2);
expect(registry.getBySession(99)).toHaveLength(0);
expect(registry.getBySession(100)).toHaveLength(1);
});
it('returns 0 when no processes match the session', async () => {
const tempDir = makeTempDir();
tempDirs.push(tempDir);
const registry = createProcessRegistry(path.join(tempDir, 'supervisor.json'));
registry.register('sdk:1', {
pid: process.pid,
type: 'sdk',
sessionId: 42,
startedAt: '2026-03-15T00:00:00.000Z'
});
const reaped = await registry.reapSession(999);
expect(reaped).toBe(0);
expect(registry.getAll()).toHaveLength(1);
});
});
});
+186
@@ -0,0 +1,186 @@
import { afterEach, describe, expect, it } from 'bun:test';
import { mkdirSync, readFileSync, rmSync, writeFileSync } from 'fs';
import { tmpdir } from 'os';
import path from 'path';
import { createProcessRegistry } from '../../src/supervisor/process-registry.js';
import { runShutdownCascade } from '../../src/supervisor/shutdown.js';
function makeTempDir(): string {
return path.join(tmpdir(), `claude-mem-shutdown-${Date.now()}-${Math.random().toString(36).slice(2)}`);
}
const tempDirs: string[] = [];
describe('supervisor shutdown cascade', () => {
afterEach(() => {
while (tempDirs.length > 0) {
const dir = tempDirs.pop();
if (dir) {
rmSync(dir, { recursive: true, force: true });
}
}
});
it('removes child records and pid file', async () => {
const tempDir = makeTempDir();
tempDirs.push(tempDir);
mkdirSync(tempDir, { recursive: true });
const registryPath = path.join(tempDir, 'supervisor.json');
const pidFilePath = path.join(tempDir, 'worker.pid');
writeFileSync(pidFilePath, JSON.stringify({
pid: process.pid,
port: 37777,
startedAt: new Date().toISOString()
}));
const registry = createProcessRegistry(registryPath);
registry.register('worker', {
pid: process.pid,
type: 'worker',
startedAt: '2026-03-15T00:00:00.000Z'
});
registry.register('dead-child', {
pid: 2147483647,
type: 'mcp',
startedAt: '2026-03-15T00:00:01.000Z'
});
await runShutdownCascade({
registry,
currentPid: process.pid,
pidFilePath
});
const persisted = JSON.parse(readFileSync(registryPath, 'utf-8'));
expect(Object.keys(persisted.processes)).toHaveLength(0);
expect(() => readFileSync(pidFilePath, 'utf-8')).toThrow();
});
it('terminates tracked children in reverse spawn order', async () => {
const tempDir = makeTempDir();
tempDirs.push(tempDir);
mkdirSync(tempDir, { recursive: true });
const registry = createProcessRegistry(path.join(tempDir, 'supervisor.json'));
registry.register('oldest', {
pid: 41001,
type: 'sdk',
startedAt: '2026-03-15T00:00:00.000Z'
});
registry.register('middle', {
pid: 41002,
type: 'mcp',
startedAt: '2026-03-15T00:00:01.000Z'
});
registry.register('newest', {
pid: 41003,
type: 'chroma',
startedAt: '2026-03-15T00:00:02.000Z'
});
const originalKill = process.kill;
const alive = new Set([41001, 41002, 41003]);
const calls: Array<{ pid: number; signal: NodeJS.Signals | number }> = [];
process.kill = ((pid: number, signal?: NodeJS.Signals | number) => {
const normalizedSignal = signal ?? 'SIGTERM';
if (normalizedSignal === 0) {
if (!alive.has(pid)) {
const error = new Error(`kill ESRCH ${pid}`) as NodeJS.ErrnoException;
error.code = 'ESRCH';
throw error;
}
return true;
}
calls.push({ pid, signal: normalizedSignal });
alive.delete(pid);
return true;
}) as typeof process.kill;
try {
await runShutdownCascade({
registry,
currentPid: process.pid,
pidFilePath: path.join(tempDir, 'worker.pid')
});
} finally {
process.kill = originalKill;
}
expect(calls).toEqual([
{ pid: 41003, signal: 'SIGTERM' },
{ pid: 41002, signal: 'SIGTERM' },
{ pid: 41001, signal: 'SIGTERM' }
]);
});
it('handles already-dead processes gracefully without throwing', async () => {
const tempDir = makeTempDir();
tempDirs.push(tempDir);
mkdirSync(tempDir, { recursive: true });
const registryPath = path.join(tempDir, 'supervisor.json');
const registry = createProcessRegistry(registryPath);
// Register processes with PIDs that are definitely dead
registry.register('dead:1', {
pid: 2147483640,
type: 'sdk',
startedAt: '2026-03-15T00:00:00.000Z'
});
registry.register('dead:2', {
pid: 2147483641,
type: 'mcp',
startedAt: '2026-03-15T00:00:01.000Z'
});
// Should not throw
await runShutdownCascade({
registry,
currentPid: process.pid,
pidFilePath: path.join(tempDir, 'worker.pid')
});
// All entries should be unregistered
const persisted = JSON.parse(readFileSync(registryPath, 'utf-8'));
expect(Object.keys(persisted.processes)).toHaveLength(0);
});
it('unregisters all children from registry after cascade', async () => {
const tempDir = makeTempDir();
tempDirs.push(tempDir);
mkdirSync(tempDir, { recursive: true });
const registryPath = path.join(tempDir, 'supervisor.json');
const registry = createProcessRegistry(registryPath);
registry.register('worker', {
pid: process.pid,
type: 'worker',
startedAt: '2026-03-15T00:00:00.000Z'
});
registry.register('child:1', {
pid: 2147483640,
type: 'sdk',
startedAt: '2026-03-15T00:00:01.000Z'
});
registry.register('child:2', {
pid: 2147483641,
type: 'mcp',
startedAt: '2026-03-15T00:00:02.000Z'
});
await runShutdownCascade({
registry,
currentPid: process.pid,
pidFilePath: path.join(tempDir, 'worker.pid')
});
// All records (including the current process one) should be removed
expect(registry.getAll()).toHaveLength(0);
});
});
+18
@@ -14,6 +14,24 @@ mock.module('../../src/utils/logger.js', () => ({
},
}));
// Mock worker-utils to delegate workerHttpRequest to global.fetch
mock.module('../../src/shared/worker-utils.js', () => ({
getWorkerPort: () => 37777,
getWorkerHost: () => '127.0.0.1',
workerHttpRequest: (apiPath: string, options?: any) => {
const url = `http://127.0.0.1:37777${apiPath}`;
return globalThis.fetch(url, {
method: options?.method ?? 'GET',
headers: options?.headers,
body: options?.body,
});
},
clearPortCache: () => {},
ensureWorkerRunning: () => Promise.resolve(true),
fetchWithTimeout: (url: string, init: any, timeoutMs: number) => globalThis.fetch(url, init),
buildWorkerUrl: (apiPath: string) => `http://127.0.0.1:37777${apiPath}`,
}));
// Import after mocks
import {
replaceTaggedContent,