bff10d49c9
* chore: bump version to 7.3.6 in package.json

* Enhance worker readiness checks and MCP connection handling

- Updated health check endpoint to /api/readiness for better initialization tracking.
- Increased timeout for health checks and worker startup retries, especially for Windows.
- Added initialization flags to track MCP readiness and overall worker initialization status.
- Implemented a timeout guard for MCP connection to prevent hanging.
- Adjusted logging to reflect readiness state and errors more accurately.

* fix(windows): use Bun PATH detection in worker wrapper

Phase 2/8: Fix Bun PATH Detection in Worker Wrapper

- Import getBunPath() in worker-wrapper.ts for Bun detection
- Add Bun path resolution before spawning inner worker process
- Update spawn call to use detected Bun path instead of process.execPath
- Add logging to bun-path.ts when PATH detection succeeds
- Add logging when fallback paths are used
- Add Windows-specific validation for .exe extension
- Log warning with searched paths when Bun not found
- Fail fast with clear error message if Bun cannot be detected

This ensures worker-wrapper uses the correct Bun executable on Windows even when Bun is not in PATH, fixing issue #371 where users reported "Bun not in PATH" errors despite Bun being installed.

Addresses: #371

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>

* fix(windows): standardize child process spawning with windowsHide

Phase 3/8: Standardize Child Process Spawning (Windows)

Changes:
- Added windowsHide flag to ChromaSync MCP subprocess spawn
- Added Windows-specific process tracking (childPid) in ChromaSync
- Force-kill subprocess on Windows before closing transport to prevent zombie processes
- Updated cleanupOrphanedProcesses() to support Windows using PowerShell Get-CimInstance
- Use taskkill /T /F for proper process tree cleanup on Windows
- Audited BranchManager - confirmed windowsHide already present on all spawn calls

This prevents PowerShell windows from appearing during ChromaSync operations and ensures proper cleanup of subprocess trees on Windows.

Addresses: #363, #361, #367, #371, #373, #374

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>

* fix(windows): enhance socket cleanup with recursive process tree management

Phase 4/8: Enhanced Socket Cleanup & Process Tree Management

Changes:
- Added recursive process tree enumeration in worker-wrapper.ts for Windows
- Enhanced killInner() to enumerate all descendants before killing
- Added fallback individual process kill if taskkill /T fails
- Added 10s timeout to ChromaSync.close() in DatabaseManager to prevent hangs
- Force nullify ChromaSync even on close failure to prevent resource leaks
- Improved logging to show full process tree during cleanup

This ensures complete cleanup of all child processes (ChromaSync MCP subprocess, Python processes, etc.) preventing socket leaks and CLOSE_WAIT states.

Addresses: #363, #361

* fix(windows): consolidate project name extraction with drive root handling

Phase 5/8: Project Name Extraction Consolidation

- Created shared getProjectName() utility in src/utils/project-name.ts
- Handles edge case: drive roots (C:\, J:\) now return "drive-X" format
- Handles edge case: null/undefined/empty cwd now returns "unknown-project"
- Fixed missing null check bug in new-hook.ts
- Replaced duplicated path.basename(cwd) logic in:
  - src/hooks/context-hook.ts
  - src/hooks/new-hook.ts
  - src/services/context-generator.ts

Addresses: #374

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>

* fix(windows): increase timeouts and improve error messages

Phase 6/8: Increase Timeouts & Improve Error Messages

- Enhanced logger.ts with platform prefix (WIN32/DARWIN) and PID in all logs
- Added comprehensive Windows troubleshooting to ProcessManager error messages
- Enhanced Bun detection error message with Windows-specific troubleshooting
- All error messages now include GitHub issue numbers and docs links
- Windows timeout already increased to 2.0x multiplier in previous phases

Changes:
- src/utils/logger.ts: Added platform prefix and PID to all log output
- src/services/process/ProcessManager.ts: Enhanced error messages with troubleshooting steps
- src/utils/bun-path.ts: Added Windows-specific Bun detection error guidance

Addresses: #363, #361, #367, #371, #373, #374

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>

* fix(windows): add comprehensive Windows CI testing

Phase 7/8: Add Windows CI Testing

- Create automated Windows testing workflow
- Test worker startup/shutdown cycles
- Verify Bun PATH detection on Windows
- Test rapid restart scenarios
- Validate port cleanup after shutdown
- Check for zombie processes
- Run on all pushes and PRs to main/fix/feature branches

Addresses: #363, #361, #367, #371, #373, #374

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>

* ci(windows): remove build steps from Windows CI workflow

Build files are already included in the plugin folder, so npm install and npm run build are unnecessary steps in the CI workflow.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>

* revert: remove Windows CI workflow

The CI workflow cannot be properly implemented in the current architecture due to limitations in testing the worker service in CI environments.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>

* security: add PID validation and improve ChromaSync timeout handling

Address critical security and reliability issues identified in PR review:

**Security Fixes:**
- Add PID validation before all PowerShell/taskkill command execution
- Validate PIDs are positive integers to prevent command injection
- Apply validation in worker-wrapper.ts, worker-service.ts, and ChromaSync.ts

**Reliability Improvements:**
- Add timeout handling to ChromaSync client.close() (10s timeout)
- Add timeout handling to ChromaSync transport.close() (5s timeout)
- Implement force-kill fallback when ChromaSync close operations timeout
- Prevents hanging on shutdown and ensures subprocess cleanup

**Implementation Details:**
- PID validation checks: Number.isInteger(pid) && pid > 0
- Applied before all execSync taskkill calls on Windows
- Applied in process enumeration (Get-CimInstance) PowerShell commands
- ChromaSync.close() uses Promise.race for timeout enforcement
- Graceful degradation with force-kill fallback on timeout

Addresses PR #378 review feedback

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>

* Refactor ChromaSync client and transport closure logic

- Removed timeout handling for closing the Chroma client and transport.
- Simplified error logging for client and transport closure.
- Ensured subprocess cleanup logic is more straightforward.

* fix(worker): streamline Windows process management and cleanup

* revert: remove speculative LLM-generated complexity

Reverts defensive code that was added speculatively without user-reported issues:
- ChromaSync: Remove PID extraction and explicit taskkill (wrapper handles this)
- worker-wrapper: Restore simple taskkill /T /F (validated in v7.3.5)
- DatabaseManager: Remove Promise.race timeout wrapper
- hook-constants: Restore original timeout values
- logger: Remove platform/PID additions to every log line
- bun-path: Remove speculative logging

Keeps only changes that map to actual GitHub issues:
- #374: Drive root project name fix (getProjectName utility)
- #363: Readiness endpoint and Windows orphan cleanup
- #367: windowsHide on ChromaSync transport

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

---------

Co-authored-by: Claude Sonnet 4.5 <noreply@anthropic.com>
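The edge cases listed for getProjectName() in the Phase 5/8 notes (drive roots, empty or missing cwd) could look roughly like the sketch below. It is a hypothetical illustration, not the contents of src/utils/project-name.ts: the uppercase drive letter in the "drive-X" result and the fallback for a bare Unix root are assumptions.

```typescript
// Hypothetical sketch of a shared project-name helper; the real utility may differ.
function getProjectName(cwd: string | null | undefined): string {
  const trimmed = (cwd ?? '').trim();

  // null, undefined, and empty strings all fall back to a fixed placeholder.
  if (trimmed === '') return 'unknown-project';

  // A bare drive root such as "C:\" or "J:/" has no basename to use.
  // Assumption: the "drive-X" format uses the uppercase drive letter.
  if (/^[A-Za-z]:[\\/]*$/.test(trimmed)) {
    return `drive-${trimmed.charAt(0).toUpperCase()}`;
  }

  // Split on both separators so Windows-style paths also work on Unix hosts.
  const segments = trimmed.split(/[\\/]+/).filter(Boolean);

  // Assumption: a path with no segments (for example "/") is also "unknown-project".
  return segments.pop() ?? 'unknown-project';
}
```

Splitting on both separators, rather than calling path.basename(), keeps the result independent of the platform the code happens to run on.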
343 lines
12 KiB
TypeScript
import { createWriteStream, existsSync, readFileSync, writeFileSync, unlinkSync, mkdirSync } from 'fs';
import { join } from 'path';
import { spawn, spawnSync } from 'child_process';
import { homedir } from 'os';
import { DATA_DIR } from '../../shared/paths.js';
import { getBunPath, isBunAvailable } from '../../utils/bun-path.js';

const PID_FILE = join(DATA_DIR, 'worker.pid');
const LOG_DIR = join(DATA_DIR, 'logs');
const MARKETPLACE_ROOT = join(homedir(), '.claude', 'plugins', 'marketplaces', 'thedotmack');

// Timeout constants
const PROCESS_STOP_TIMEOUT_MS = 5000;
const HEALTH_CHECK_TIMEOUT_MS = 10000;
const HEALTH_CHECK_INTERVAL_MS = 200;
const HEALTH_CHECK_FETCH_TIMEOUT_MS = 1000;
const PROCESS_EXIT_CHECK_INTERVAL_MS = 100;

interface PidInfo {
  pid: number;
  port: number;
  startedAt: string;
  version: string;
}

export class ProcessManager {
  static async start(port: number): Promise<{ success: boolean; pid?: number; error?: string }> {
    // Validate port range (Number.isInteger also rejects NaN and fractional values)
    if (!Number.isInteger(port) || port < 1024 || port > 65535) {
      return {
        success: false,
        error: `Invalid port ${port}. Must be between 1024 and 65535`
      };
    }

    // Check if already running
    if (await this.isRunning()) {
      const info = this.getPidInfo();
      return { success: true, pid: info?.pid };
    }

    // Ensure log directory exists
    mkdirSync(LOG_DIR, { recursive: true });

    // On Windows, use the wrapper script to solve zombie port problem
    // On Unix, use the worker directly
    const scriptName = process.platform === 'win32' ? 'worker-wrapper.cjs' : 'worker-service.cjs';
    const workerScript = join(MARKETPLACE_ROOT, 'plugin', 'scripts', scriptName);

    if (!existsSync(workerScript)) {
      return { success: false, error: `Worker script not found at ${workerScript}` };
    }

    const logFile = this.getLogFilePath();

    // Use Bun on all platforms; on Windows it is launched through PowerShell to avoid console popups
    return this.startWithBun(workerScript, logFile, port);
  }

  private static isBunAvailable(): boolean {
    return isBunAvailable();
  }

  /**
   * Escapes a string for safe use in PowerShell single-quoted strings.
   * In PowerShell single quotes, the only special character is the single quote itself,
   * which must be doubled to escape it.
   */
  private static escapePowerShellString(str: string): string {
    return str.replace(/'/g, "''");
  }

  private static async startWithBun(script: string, logFile: string, port: number): Promise<{ success: boolean; pid?: number; error?: string }> {
    const bunPath = getBunPath();
    if (!bunPath) {
      return {
        success: false,
        error: 'Bun is required but not found in PATH or common installation paths. Install from https://bun.sh'
      };
    }
    try {
      const isWindows = process.platform === 'win32';

      if (isWindows) {
        // Windows: Use PowerShell Start-Process with -WindowStyle Hidden
        // This properly hides the console window (affects both Bun and Node.js)
        // Note: windowsHide: true doesn't work with detached: true (Bun inherits Node.js process spawning semantics)
        // See: https://github.com/nodejs/node/issues/21825 and PR #315 for detailed testing
        //
        // On Windows, we start worker-wrapper.cjs which manages the actual worker-service.cjs.
        // This solves the zombie port problem: the wrapper has no sockets, so when it kills
        // and respawns the inner worker, the socket is properly released.
        //
        // Security: All paths (bunPath, script, MARKETPLACE_ROOT) are application-controlled system paths,
        // not user input. If an attacker could modify these paths, they would already have full filesystem
        // access including direct access to ~/.claude-mem/claude-mem.db. Nevertheless, we properly escape
        // all values for PowerShell to follow security best practices.
        const escapedBunPath = this.escapePowerShellString(bunPath);
        const escapedScript = this.escapePowerShellString(script);
        const escapedWorkDir = this.escapePowerShellString(MARKETPLACE_ROOT);
        const envVars = `$env:CLAUDE_MEM_WORKER_PORT='${port}'`;
        const psCommand = `${envVars}; Start-Process -FilePath '${escapedBunPath}' -ArgumentList '${escapedScript}' -WorkingDirectory '${escapedWorkDir}' -WindowStyle Hidden -PassThru | Select-Object -ExpandProperty Id`;

        const result = spawnSync('powershell', ['-Command', psCommand], {
          stdio: 'pipe',
          timeout: 10000,
          windowsHide: true
        });

        if (result.status !== 0) {
          return {
            success: false,
            error: `PowerShell spawn failed: ${result.stderr?.toString() || 'unknown error'}`
          };
        }

        const pid = parseInt(result.stdout.toString().trim(), 10);
        if (isNaN(pid)) {
          return { success: false, error: 'Failed to get PID from PowerShell' };
        }

        // Write PID file
        this.writePidFile({
          pid,
          port,
          startedAt: new Date().toISOString(),
          version: process.env.npm_package_version || 'unknown'
        });

        // Wait for health
        return this.waitForHealth(pid, port);
      } else {
        // Unix: Use standard spawn with detached
        const child = spawn(bunPath, [script], {
          detached: true,
          stdio: ['ignore', 'pipe', 'pipe'],
          env: { ...process.env, CLAUDE_MEM_WORKER_PORT: String(port) },
          cwd: MARKETPLACE_ROOT
        });

        // Write logs
        const logStream = createWriteStream(logFile, { flags: 'a' });
        child.stdout?.pipe(logStream);
        child.stderr?.pipe(logStream);

        child.unref();

        if (!child.pid) {
          return { success: false, error: 'Failed to get PID from spawned process' };
        }

        // Write PID file
        this.writePidFile({
          pid: child.pid,
          port,
          startedAt: new Date().toISOString(),
          version: process.env.npm_package_version || 'unknown'
        });

        // Wait for health
        return this.waitForHealth(child.pid, port);
      }
    } catch (error) {
      return {
        success: false,
        error: error instanceof Error ? error.message : String(error)
      };
    }
  }

  static async stop(timeout: number = PROCESS_STOP_TIMEOUT_MS): Promise<boolean> {
    const info = this.getPidInfo();
    if (!info) return true;

    try {
      if (process.platform === 'win32') {
        // On Windows, use taskkill /T /F to kill entire process tree
        // This ensures the wrapper AND all its children (inner worker, MCP, ChromaSync) are killed
        // which is necessary to properly release the socket and avoid zombie ports
        const { execSync } = await import('child_process');
        try {
          execSync(`taskkill /PID ${info.pid} /T /F`, { timeout: 10000, stdio: 'ignore' });
        } catch {
          // Process may already be dead
        }
      } else {
        // On Unix, use signals
        process.kill(info.pid, 'SIGTERM');
        await this.waitForExit(info.pid, timeout);
      }
    } catch {
      try {
        process.kill(info.pid, 'SIGKILL');
      } catch {
        // Process already dead
      }
    }

    this.removePidFile();
    return true;
  }

  static async restart(port: number): Promise<{ success: boolean; pid?: number; error?: string }> {
    await this.stop();
    return this.start(port);
  }

  static async status(): Promise<{ running: boolean; pid?: number; port?: number; uptime?: string }> {
    const info = this.getPidInfo();
    if (!info) return { running: false };

    const running = this.isProcessAlive(info.pid);
    return {
      running,
      pid: running ? info.pid : undefined,
      port: running ? info.port : undefined,
      uptime: running ? this.formatUptime(info.startedAt) : undefined
    };
  }

  static async isRunning(): Promise<boolean> {
    const info = this.getPidInfo();
    if (!info) return false;
    const alive = this.isProcessAlive(info.pid);
    if (!alive) {
      this.removePidFile(); // Clean up stale PID file
    }
    return alive;
  }

  // Helper methods
  private static getPidInfo(): PidInfo | null {
    try {
      if (!existsSync(PID_FILE)) return null;
      const content = readFileSync(PID_FILE, 'utf-8');
      const parsed = JSON.parse(content);
      // Validate required fields. The PID must be a positive integer: it is passed to
      // process.kill(), where 0 or a negative value addresses a process group on Unix,
      // and it is interpolated into the taskkill command on Windows.
      if (!Number.isInteger(parsed.pid) || parsed.pid <= 0) {
        return null;
      }
      if (!Number.isInteger(parsed.port)) {
        return null;
      }
      return parsed as PidInfo;
    } catch {
      return null;
    }
  }

  private static writePidFile(info: PidInfo): void {
    mkdirSync(DATA_DIR, { recursive: true });
    writeFileSync(PID_FILE, JSON.stringify(info, null, 2));
  }

  private static removePidFile(): void {
    try {
      if (existsSync(PID_FILE)) {
        unlinkSync(PID_FILE);
      }
    } catch {
      // Ignore errors
    }
  }

  private static isProcessAlive(pid: number): boolean {
    try {
      // Signal 0 sends nothing; it only checks that the process can be signalled
      process.kill(pid, 0);
      return true;
    } catch {
      return false;
    }
  }

  private static async waitForHealth(pid: number, port: number, timeoutMs: number = HEALTH_CHECK_TIMEOUT_MS): Promise<{ success: boolean; pid?: number; error?: string }> {
    const startTime = Date.now();
    const isWindows = process.platform === 'win32';
    // Increase timeout on Windows to account for slower process startup
    const adjustedTimeout = isWindows ? timeoutMs * 2 : timeoutMs;

    while (Date.now() - startTime < adjustedTimeout) {
      // Check if process is still alive
      if (!this.isProcessAlive(pid)) {
        const errorMsg = isWindows
          ? `Process died during startup\n\nTroubleshooting:\n1. Check Task Manager for zombie 'bun.exe' or 'node.exe' processes\n2. Verify port ${port} is not in use: netstat -ano | findstr ${port}\n3. Check worker logs in ~/.claude-mem/logs/\n4. See GitHub issues: #363, #367, #371, #373\n5. Docs: https://docs.claude-mem.ai/troubleshooting/windows-issues`
          : 'Process died during startup';
        return { success: false, error: errorMsg };
      }

      // Try readiness check (changed from /health to /api/readiness)
      try {
        const response = await fetch(`http://127.0.0.1:${port}/api/readiness`, {
          signal: AbortSignal.timeout(HEALTH_CHECK_FETCH_TIMEOUT_MS)
        });
        if (response.ok) {
          return { success: true, pid };
        }
      } catch {
        // Not ready yet, continue polling
      }

      await new Promise(resolve => setTimeout(resolve, HEALTH_CHECK_INTERVAL_MS));
    }

    const timeoutMsg = isWindows
      ? `Worker failed to start on Windows (readiness check timed out after ${adjustedTimeout}ms)\n\nTroubleshooting:\n1. Check Task Manager for zombie 'bun.exe' or 'node.exe' processes\n2. Verify port ${port} is not in use: netstat -ano | findstr ${port}\n3. Check worker logs in ~/.claude-mem/logs/\n4. See GitHub issues: #363, #367, #371, #373\n5. Docs: https://docs.claude-mem.ai/troubleshooting/windows-issues`
      : `Readiness check timed out after ${adjustedTimeout}ms`;

    return { success: false, error: timeoutMsg };
  }

  private static async waitForExit(pid: number, timeout: number): Promise<void> {
    const startTime = Date.now();

    while (Date.now() - startTime < timeout) {
      if (!this.isProcessAlive(pid)) {
        return;
      }
      await new Promise(resolve => setTimeout(resolve, PROCESS_EXIT_CHECK_INTERVAL_MS));
    }

    throw new Error('Process did not exit within timeout');
  }

  private static getLogFilePath(): string {
    const date = new Date().toISOString().slice(0, 10);
    return join(LOG_DIR, `worker-${date}.log`);
  }

  private static formatUptime(startedAt: string): string {
    const startTime = new Date(startedAt).getTime();
    const now = Date.now();
    const diffMs = now - startTime;

    const seconds = Math.floor(diffMs / 1000);
    const minutes = Math.floor(seconds / 60);
    const hours = Math.floor(minutes / 60);
    const days = Math.floor(hours / 24);

    if (days > 0) return `${days}d ${hours % 24}h`;
    if (hours > 0) return `${hours}h ${minutes % 60}m`;
    if (minutes > 0) return `${minutes}m ${seconds % 60}s`;
    return `${seconds}s`;
  }
}