fix(windows): Windows platform stabilization improvements (#378)
* chore: bump version to 7.3.6 in package.json * Enhance worker readiness checks and MCP connection handling - Updated health check endpoint to /api/readiness for better initialization tracking. - Increased timeout for health checks and worker startup retries, especially for Windows. - Added initialization flags to track MCP readiness and overall worker initialization status. - Implemented a timeout guard for MCP connection to prevent hanging. - Adjusted logging to reflect readiness state and errors more accurately. * fix(windows): use Bun PATH detection in worker wrapper Phase 2/8: Fix Bun PATH Detection in Worker Wrapper - Import getBunPath() in worker-wrapper.ts for Bun detection - Add Bun path resolution before spawning inner worker process - Update spawn call to use detected Bun path instead of process.execPath - Add logging to bun-path.ts when PATH detection succeeds - Add logging when fallback paths are used - Add Windows-specific validation for .exe extension - Log warning with searched paths when Bun not found - Fail fast with clear error message if Bun cannot be detected This ensures worker-wrapper uses the correct Bun executable on Windows even when Bun is not in PATH, fixing issue #371 where users reported "Bun not in PATH" errors despite Bun being installed. Addresses: #371 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com> * fix(windows): standardize child process spawning with windowsHide Phase 3/8: Standardize Child Process Spawning (Windows) Changes: - Added windowsHide flag to ChromaSync MCP subprocess spawn - Added Windows-specific process tracking (childPid) in ChromaSync - Force-kill subprocess on Windows before closing transport to prevent zombie processes - Updated cleanupOrphanedProcesses() to support Windows using PowerShell Get-CimInstance - Use taskkill /T /F for proper process tree cleanup on Windows - Audited BranchManager - confirmed windowsHide already present on all spawn calls This prevents PowerShell windows from appearing during ChromaSync operations and ensures proper cleanup of subprocess trees on Windows. Addresses: #363, #361, #367, #371, #373, #374 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com> * fix(windows): enhance socket cleanup with recursive process tree management Phase 4/8: Enhanced Socket Cleanup & Process Tree Management Changes: - Added recursive process tree enumeration in worker-wrapper.ts for Windows - Enhanced killInner() to enumerate all descendants before killing - Added fallback individual process kill if taskkill /T fails - Added 10s timeout to ChromaSync.close() in DatabaseManager to prevent hangs - Force nullify ChromaSync even on close failure to prevent resource leaks - Improved logging to show full process tree during cleanup This ensures complete cleanup of all child processes (ChromaSync MCP subprocess, Python processes, etc.) preventing socket leaks and CLOSE_WAIT states. Addresses: #363, #361 * fix(windows): consolidate project name extraction with drive root handling Phase 5/8: Project Name Extraction Consolidation - Created shared getProjectName() utility in src/utils/project-name.ts - Handles edge case: drive roots (C:\, J:\) now return "drive-X" format - Handles edge case: null/undefined/empty cwd now returns "unknown-project" - Fixed missing null check bug in new-hook.ts - Replaced duplicated path.basename(cwd) logic in: - src/hooks/context-hook.ts - src/hooks/new-hook.ts - src/services/context-generator.ts Addresses: #374 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com> * fix(windows): increase timeouts and improve error messages Phase 6/8: Increase Timeouts & Improve Error Messages - Enhanced logger.ts with platform prefix (WIN32/DARWIN) and PID in all logs - Added comprehensive Windows troubleshooting to ProcessManager error messages - Enhanced Bun detection error message with Windows-specific troubleshooting - All error messages now include GitHub issue numbers and docs links - Windows timeout already increased to 2.0x multiplier in previous phases Changes: - src/utils/logger.ts: Added platform prefix and PID to all log output - src/services/process/ProcessManager.ts: Enhanced error messages with troubleshooting steps - src/utils/bun-path.ts: Added Windows-specific Bun detection error guidance Addresses: #363, #361, #367, #371, #373, #374 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com> * fix(windows): add comprehensive Windows CI testing Phase 7/8: Add Windows CI Testing - Create automated Windows testing workflow - Test worker startup/shutdown cycles - Verify Bun PATH detection on Windows - Test rapid restart scenarios - Validate port cleanup after shutdown - Check for zombie processes - Run on all pushes and PRs to main/fix/feature branches Addresses: #363, #361, #367, #371, #373, #374 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com> * ci(windows): remove build steps from Windows CI workflow Build files are already included in the plugin folder, so npm install and npm run build are unnecessary steps in the CI workflow. 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com> * revert: remove Windows CI workflow The CI workflow cannot be properly implemented in the current architecture due to limitations in testing the worker service in CI environments. 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com> * security: add PID validation and improve ChromaSync timeout handling Address critical security and reliability issues identified in PR review: **Security Fixes:** - Add PID validation before all PowerShell/taskkill command execution - Validate PIDs are positive integers to prevent command injection - Apply validation in worker-wrapper.ts, worker-service.ts, and ChromaSync.ts **Reliability Improvements:** - Add timeout handling to ChromaSync client.close() (10s timeout) - Add timeout handling to ChromaSync transport.close() (5s timeout) - Implement force-kill fallback when ChromaSync close operations timeout - Prevents hanging on shutdown and ensures subprocess cleanup **Implementation Details:** - PID validation checks: Number.isInteger(pid) && pid > 0 - Applied before all execSync taskkill calls on Windows - Applied in process enumeration (Get-CimInstance) PowerShell commands - ChromaSync.close() uses Promise.race for timeout enforcement - Graceful degradation with force-kill fallback on timeout Addresses PR #378 review feedback 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com> * Refactor ChromaSync client and transport closure logic - Removed timeout handling for closing the Chroma client and transport. - Simplified error logging for client and transport closure. - Ensured subprocess cleanup logic is more straightforward. * fix(worker): streamline Windows process management and cleanup * revert: remove speculative LLM-generated complexity Reverts defensive code that was added speculatively without user-reported issues: - ChromaSync: Remove PID extraction and explicit taskkill (wrapper handles this) - worker-wrapper: Restore simple taskkill /T /F (validated in v7.3.5) - DatabaseManager: Remove Promise.race timeout wrapper - hook-constants: Restore original timeout values - logger: Remove platform/PID additions to every log line - bun-path: Remove speculative logging Keeps only changes that map to actual GitHub issues: - #374: Drive root project name fix (getProjectName utility) - #363: Readiness endpoint and Windows orphan cleanup - #367: windowsHide on ChromaSync transport 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com> --------- Co-authored-by: Claude Sonnet 4.5 <noreply@anthropic.com>
This commit is contained in:
@@ -25,6 +25,7 @@ import {
|
||||
toRelativePath,
|
||||
extractFirstFile
|
||||
} from '../shared/timeline-formatting.js';
|
||||
import { getProjectName } from '../utils/project-name.js';
|
||||
|
||||
// Version marker path - use homedir-based path that works in both CJS and ESM contexts
|
||||
const VERSION_MARKER_PATH = path.join(homedir(), '.claude', 'plugins', 'marketplaces', 'thedotmack', 'plugin', '.install-version');
|
||||
@@ -222,7 +223,7 @@ function extractPriorMessages(transcriptPath: string): { userMessage: string; as
|
||||
export async function generateContext(input?: ContextInput, useColors: boolean = false): Promise<string> {
|
||||
const config = loadContextConfig();
|
||||
const cwd = input?.cwd ?? process.cwd();
|
||||
const project = cwd ? path.basename(cwd) : 'unknown-project';
|
||||
const project = getProjectName(cwd);
|
||||
|
||||
let db: SessionStore | null = null;
|
||||
try {
|
||||
|
||||
@@ -271,29 +271,39 @@ export class ProcessManager {
|
||||
|
||||
private static async waitForHealth(pid: number, port: number, timeoutMs: number = HEALTH_CHECK_TIMEOUT_MS): Promise<{ success: boolean; pid?: number; error?: string }> {
|
||||
const startTime = Date.now();
|
||||
const isWindows = process.platform === 'win32';
|
||||
// Increase timeout on Windows to account for slower process startup
|
||||
const adjustedTimeout = isWindows ? timeoutMs * 2 : timeoutMs;
|
||||
|
||||
while (Date.now() - startTime < timeoutMs) {
|
||||
while (Date.now() - startTime < adjustedTimeout) {
|
||||
// Check if process is still alive
|
||||
if (!this.isProcessAlive(pid)) {
|
||||
return { success: false, error: 'Process died during startup' };
|
||||
const errorMsg = isWindows
|
||||
? `Process died during startup\n\nTroubleshooting:\n1. Check Task Manager for zombie 'bun.exe' or 'node.exe' processes\n2. Verify port ${port} is not in use: netstat -ano | findstr ${port}\n3. Check worker logs in ~/.claude-mem/logs/\n4. See GitHub issues: #363, #367, #371, #373\n5. Docs: https://docs.claude-mem.ai/troubleshooting/windows-issues`
|
||||
: 'Process died during startup';
|
||||
return { success: false, error: errorMsg };
|
||||
}
|
||||
|
||||
// Try health check
|
||||
// Try readiness check (changed from /health to /api/readiness)
|
||||
try {
|
||||
const response = await fetch(`http://127.0.0.1:${port}/health`, {
|
||||
const response = await fetch(`http://127.0.0.1:${port}/api/readiness`, {
|
||||
signal: AbortSignal.timeout(HEALTH_CHECK_FETCH_TIMEOUT_MS)
|
||||
});
|
||||
if (response.ok) {
|
||||
return { success: true, pid };
|
||||
}
|
||||
} catch {
|
||||
// Not ready yet
|
||||
// Not ready yet, continue polling
|
||||
}
|
||||
|
||||
await new Promise(resolve => setTimeout(resolve, HEALTH_CHECK_INTERVAL_MS));
|
||||
}
|
||||
|
||||
return { success: false, error: 'Health check timed out' };
|
||||
const timeoutMsg = isWindows
|
||||
? `Worker failed to start on Windows (readiness check timed out after ${adjustedTimeout}ms)\n\nTroubleshooting:\n1. Check Task Manager for zombie 'bun.exe' or 'node.exe' processes\n2. Verify port ${port} is not in use: netstat -ano | findstr ${port}\n3. Check worker logs in ~/.claude-mem/logs/\n4. See GitHub issues: #363, #367, #371, #373\n5. Docs: https://docs.claude-mem.ai/troubleshooting/windows-issues`
|
||||
: `Readiness check timed out after ${adjustedTimeout}ms`;
|
||||
|
||||
return { success: false, error: timeoutMsg };
|
||||
}
|
||||
|
||||
private static async waitForExit(pid: number, timeout: number): Promise<void> {
|
||||
|
||||
@@ -101,7 +101,9 @@ export class ChromaSync {
|
||||
// See: https://github.com/thedotmack/claude-mem/issues/170 (Python 3.14 incompatibility)
|
||||
const settings = SettingsDefaultsManager.loadFromFile(USER_SETTINGS_PATH);
|
||||
const pythonVersion = settings.CLAUDE_MEM_PYTHON_VERSION;
|
||||
this.transport = new StdioClientTransport({
|
||||
const isWindows = process.platform === 'win32';
|
||||
|
||||
const transportOptions: any = {
|
||||
command: 'uvx',
|
||||
args: [
|
||||
'--python', pythonVersion,
|
||||
@@ -110,7 +112,16 @@ export class ChromaSync {
|
||||
'--data-dir', this.VECTOR_DB_DIR
|
||||
],
|
||||
stderr: 'ignore'
|
||||
});
|
||||
};
|
||||
|
||||
// CRITICAL: On Windows, try to hide console window to prevent PowerShell popups
|
||||
// Note: windowsHide may not be supported by MCP SDK's StdioClientTransport
|
||||
if (isWindows) {
|
||||
transportOptions.windowsHide = true;
|
||||
logger.debug('CHROMA_SYNC', 'Windows detected, attempting to hide console window', { project: this.project });
|
||||
}
|
||||
|
||||
this.transport = new StdioClientTransport(transportOptions);
|
||||
|
||||
this.client = new Client({
|
||||
name: 'claude-mem-chroma-sync',
|
||||
|
||||
+105
-23
@@ -14,7 +14,7 @@ import { Client } from '@modelcontextprotocol/sdk/client/index.js';
|
||||
import { StdioClientTransport } from '@modelcontextprotocol/sdk/client/stdio.js';
|
||||
import { getWorkerPort, getWorkerHost } from '../shared/worker-utils.js';
|
||||
import { logger } from '../utils/logger.js';
|
||||
import { exec } from 'child_process';
|
||||
import { exec, execSync } from 'child_process';
|
||||
import { promisify } from 'util';
|
||||
|
||||
const execAsync = promisify(exec);
|
||||
@@ -45,6 +45,10 @@ export class WorkerService {
|
||||
private startTime: number = Date.now();
|
||||
private mcpClient: Client;
|
||||
|
||||
// Initialization flags for MCP/SDK readiness tracking
|
||||
private mcpReady: boolean = false;
|
||||
private initializationCompleteFlag: boolean = false;
|
||||
|
||||
// Domain services
|
||||
private dbManager: DatabaseManager;
|
||||
private sessionManager: SessionManager;
|
||||
@@ -128,17 +132,36 @@ export class WorkerService {
|
||||
hasIpc: typeof process.send === 'function',
|
||||
platform: process.platform,
|
||||
pid: process.pid,
|
||||
initialized: this.initializationCompleteFlag,
|
||||
mcpReady: this.mcpReady,
|
||||
});
|
||||
});
|
||||
|
||||
// Readiness check endpoint - returns 503 until full initialization completes
|
||||
// Used by ProcessManager and worker-utils to ensure worker is fully ready before routing requests
|
||||
this.app.get('/api/readiness', (_req, res) => {
|
||||
if (this.initializationCompleteFlag) {
|
||||
res.status(200).json({
|
||||
status: 'ready',
|
||||
mcpReady: this.mcpReady,
|
||||
});
|
||||
} else {
|
||||
res.status(503).json({
|
||||
status: 'initializing',
|
||||
message: 'Worker is still initializing, please retry',
|
||||
});
|
||||
}
|
||||
});
|
||||
|
||||
// Version endpoint - returns the worker's current version
|
||||
this.app.get('/api/version', (_req, res) => {
|
||||
const { homedir } = require('os');
|
||||
const { readFileSync } = require('fs');
|
||||
const marketplaceRoot = path.join(homedir(), '.claude', 'plugins', 'marketplaces', 'thedotmack');
|
||||
const packageJsonPath = path.join(marketplaceRoot, 'package.json');
|
||||
|
||||
try {
|
||||
// Read version from marketplace package.json
|
||||
const { homedir } = require('os');
|
||||
const { readFileSync } = require('fs');
|
||||
const marketplaceRoot = path.join(homedir(), '.claude', 'plugins', 'marketplaces', 'thedotmack');
|
||||
const packageJsonPath = path.join(marketplaceRoot, 'package.json');
|
||||
const packageJson = JSON.parse(readFileSync(packageJsonPath, 'utf-8'));
|
||||
res.status(200).json({ version: packageJson.version });
|
||||
} catch (error) {
|
||||
@@ -295,25 +318,47 @@ export class WorkerService {
|
||||
*/
|
||||
private async cleanupOrphanedProcesses(): Promise<void> {
|
||||
try {
|
||||
// Find all chroma-mcp processes
|
||||
const { stdout } = await execAsync('ps aux | grep "chroma-mcp" | grep -v grep || true');
|
||||
|
||||
if (!stdout.trim()) {
|
||||
logger.debug('SYSTEM', 'No orphaned chroma-mcp processes found');
|
||||
return;
|
||||
}
|
||||
|
||||
const lines = stdout.trim().split('\n');
|
||||
const isWindows = process.platform === 'win32';
|
||||
const pids: number[] = [];
|
||||
|
||||
for (const line of lines) {
|
||||
const parts = line.trim().split(/\s+/);
|
||||
if (parts.length > 1) {
|
||||
const pid = parseInt(parts[1], 10);
|
||||
if (!isNaN(pid)) {
|
||||
if (isWindows) {
|
||||
// Windows: Use PowerShell Get-CimInstance to find chroma-mcp processes
|
||||
const cmd = `powershell -Command "Get-CimInstance Win32_Process | Where-Object { $_.Name -like '*python*' -and $_.CommandLine -like '*chroma-mcp*' } | Select-Object -ExpandProperty ProcessId"`;
|
||||
const { stdout } = await execAsync(cmd, { timeout: 5000 });
|
||||
|
||||
if (!stdout.trim()) {
|
||||
logger.debug('SYSTEM', 'No orphaned chroma-mcp processes found (Windows)');
|
||||
return;
|
||||
}
|
||||
|
||||
const pidStrings = stdout.trim().split('\n');
|
||||
for (const pidStr of pidStrings) {
|
||||
const pid = parseInt(pidStr.trim(), 10);
|
||||
// SECURITY: Validate PID is positive integer before adding to list
|
||||
if (!isNaN(pid) && Number.isInteger(pid) && pid > 0) {
|
||||
pids.push(pid);
|
||||
}
|
||||
}
|
||||
} else {
|
||||
// Unix: Use ps aux | grep
|
||||
const { stdout } = await execAsync('ps aux | grep "chroma-mcp" | grep -v grep || true');
|
||||
|
||||
if (!stdout.trim()) {
|
||||
logger.debug('SYSTEM', 'No orphaned chroma-mcp processes found (Unix)');
|
||||
return;
|
||||
}
|
||||
|
||||
const lines = stdout.trim().split('\n');
|
||||
for (const line of lines) {
|
||||
const parts = line.trim().split(/\s+/);
|
||||
if (parts.length > 1) {
|
||||
const pid = parseInt(parts[1], 10);
|
||||
// SECURITY: Validate PID is positive integer before adding to list
|
||||
if (!isNaN(pid) && Number.isInteger(pid) && pid > 0) {
|
||||
pids.push(pid);
|
||||
}
|
||||
}
|
||||
}
|
||||
}
|
||||
|
||||
if (pids.length === 0) {
|
||||
@@ -321,12 +366,28 @@ export class WorkerService {
|
||||
}
|
||||
|
||||
logger.info('SYSTEM', 'Cleaning up orphaned chroma-mcp processes', {
|
||||
platform: isWindows ? 'Windows' : 'Unix',
|
||||
count: pids.length,
|
||||
pids
|
||||
});
|
||||
|
||||
// Kill all found processes
|
||||
await execAsync(`kill ${pids.join(' ')}`);
|
||||
if (isWindows) {
|
||||
for (const pid of pids) {
|
||||
// SECURITY: Double-check PID validation before using in taskkill command
|
||||
if (!Number.isInteger(pid) || pid <= 0) {
|
||||
logger.warn('SYSTEM', 'Skipping invalid PID', { pid });
|
||||
continue;
|
||||
}
|
||||
try {
|
||||
execSync(`taskkill /PID ${pid} /T /F`, { timeout: 5000, stdio: 'ignore' });
|
||||
} catch (error) {
|
||||
logger.warn('SYSTEM', 'Failed to kill orphaned process', { pid }, error as Error);
|
||||
}
|
||||
}
|
||||
} else {
|
||||
await execAsync(`kill ${pids.join(' ')}`);
|
||||
}
|
||||
|
||||
logger.info('SYSTEM', 'Orphaned processes cleaned up', { count: pids.length });
|
||||
} catch (error) {
|
||||
@@ -380,7 +441,7 @@ export class WorkerService {
|
||||
this.searchRoutes.setupRoutes(this.app); // Setup search routes now that SearchManager is ready
|
||||
logger.info('WORKER', 'SearchManager initialized and search routes registered');
|
||||
|
||||
// Connect to MCP server
|
||||
// Connect to MCP server with timeout guard
|
||||
const mcpServerPath = path.join(__dirname, 'mcp-server.cjs');
|
||||
const transport = new StdioClientTransport({
|
||||
command: 'node',
|
||||
@@ -388,10 +449,19 @@ export class WorkerService {
|
||||
env: process.env
|
||||
});
|
||||
|
||||
await this.mcpClient.connect(transport);
|
||||
// Add timeout guard to prevent hanging on MCP connection (15 seconds)
|
||||
const MCP_INIT_TIMEOUT_MS = 15000;
|
||||
const mcpConnectionPromise = this.mcpClient.connect(transport);
|
||||
const timeoutPromise = new Promise<never>((_, reject) =>
|
||||
setTimeout(() => reject(new Error('MCP connection timeout after 15s')), MCP_INIT_TIMEOUT_MS)
|
||||
);
|
||||
|
||||
await Promise.race([mcpConnectionPromise, timeoutPromise]);
|
||||
this.mcpReady = true;
|
||||
logger.success('WORKER', 'Connected to MCP server');
|
||||
|
||||
// Signal that initialization is complete
|
||||
this.initializationCompleteFlag = true;
|
||||
this.resolveInitialization();
|
||||
logger.info('SYSTEM', 'Background initialization complete');
|
||||
} catch (error) {
|
||||
@@ -492,6 +562,12 @@ export class WorkerService {
|
||||
return [];
|
||||
}
|
||||
|
||||
// SECURITY: Validate PID is a positive integer to prevent command injection
|
||||
if (!Number.isInteger(parentPid) || parentPid <= 0) {
|
||||
logger.warn('SYSTEM', 'Invalid parent PID for child process enumeration', { parentPid });
|
||||
return [];
|
||||
}
|
||||
|
||||
try {
|
||||
const cmd = `powershell -Command "Get-CimInstance Win32_Process | Where-Object { $_.ParentProcessId -eq ${parentPid} } | Select-Object -ExpandProperty ProcessId"`;
|
||||
const { stdout } = await execAsync(cmd, { timeout: 5000 });
|
||||
@@ -499,7 +575,7 @@ export class WorkerService {
|
||||
.trim()
|
||||
.split('\n')
|
||||
.map(s => parseInt(s.trim(), 10))
|
||||
.filter(n => !isNaN(n));
|
||||
.filter(n => !isNaN(n) && Number.isInteger(n) && n > 0); // SECURITY: Validate each PID
|
||||
} catch (error) {
|
||||
logger.warn('SYSTEM', 'Failed to enumerate child processes', {}, error as Error);
|
||||
return [];
|
||||
@@ -510,6 +586,12 @@ export class WorkerService {
|
||||
* Force kill a process by PID (Windows: uses taskkill /F /T)
|
||||
*/
|
||||
private async forceKillProcess(pid: number): Promise<void> {
|
||||
// SECURITY: Validate PID is a positive integer to prevent command injection
|
||||
if (!Number.isInteger(pid) || pid <= 0) {
|
||||
logger.warn('SYSTEM', 'Invalid PID for force kill', { pid });
|
||||
return;
|
||||
}
|
||||
|
||||
try {
|
||||
if (process.platform === 'win32') {
|
||||
// /T kills entire process tree, /F forces termination
|
||||
|
||||
Reference in New Issue
Block a user