bff10d49c9
* chore: bump version to 7.3.6 in package.json

* Enhance worker readiness checks and MCP connection handling

- Updated health check endpoint to /api/readiness for better initialization tracking.
- Increased timeout for health checks and worker startup retries, especially for Windows.
- Added initialization flags to track MCP readiness and overall worker initialization status.
- Implemented a timeout guard for MCP connection to prevent hanging.
- Adjusted logging to reflect readiness state and errors more accurately.

* fix(windows): use Bun PATH detection in worker wrapper

Phase 2/8: Fix Bun PATH Detection in Worker Wrapper

- Import getBunPath() in worker-wrapper.ts for Bun detection
- Add Bun path resolution before spawning inner worker process
- Update spawn call to use detected Bun path instead of process.execPath
- Add logging to bun-path.ts when PATH detection succeeds
- Add logging when fallback paths are used
- Add Windows-specific validation for .exe extension
- Log warning with searched paths when Bun not found
- Fail fast with clear error message if Bun cannot be detected

This ensures worker-wrapper uses the correct Bun executable on Windows even when Bun is not in PATH, fixing issue #371 where users reported "Bun not in PATH" errors despite Bun being installed.

Addresses: #371

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>

* fix(windows): standardize child process spawning with windowsHide

Phase 3/8: Standardize Child Process Spawning (Windows)

Changes:
- Added windowsHide flag to ChromaSync MCP subprocess spawn
- Added Windows-specific process tracking (childPid) in ChromaSync
- Force-kill subprocess on Windows before closing transport to prevent zombie processes
- Updated cleanupOrphanedProcesses() to support Windows using PowerShell Get-CimInstance
- Use taskkill /T /F for proper process tree cleanup on Windows
- Audited BranchManager - confirmed windowsHide already present on all spawn calls

This prevents PowerShell windows from appearing during ChromaSync operations and ensures proper cleanup of subprocess trees on Windows.

Addresses: #363, #361, #367, #371, #373, #374

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>

* fix(windows): enhance socket cleanup with recursive process tree management

Phase 4/8: Enhanced Socket Cleanup & Process Tree Management

Changes:
- Added recursive process tree enumeration in worker-wrapper.ts for Windows
- Enhanced killInner() to enumerate all descendants before killing
- Added fallback individual process kill if taskkill /T fails
- Added 10s timeout to ChromaSync.close() in DatabaseManager to prevent hangs
- Force nullify ChromaSync even on close failure to prevent resource leaks
- Improved logging to show full process tree during cleanup

This ensures complete cleanup of all child processes (ChromaSync MCP subprocess, Python processes, etc.) preventing socket leaks and CLOSE_WAIT states.

Addresses: #363, #361

* fix(windows): consolidate project name extraction with drive root handling

Phase 5/8: Project Name Extraction Consolidation

- Created shared getProjectName() utility in src/utils/project-name.ts
- Handles edge case: drive roots (C:\, J:\) now return "drive-X" format
- Handles edge case: null/undefined/empty cwd now returns "unknown-project"
- Fixed missing null check bug in new-hook.ts
- Replaced duplicated path.basename(cwd) logic in:
  - src/hooks/context-hook.ts
  - src/hooks/new-hook.ts
  - src/services/context-generator.ts

Addresses: #374

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>

* fix(windows): increase timeouts and improve error messages

Phase 6/8: Increase Timeouts & Improve Error Messages

- Enhanced logger.ts with platform prefix (WIN32/DARWIN) and PID in all logs
- Added comprehensive Windows troubleshooting to ProcessManager error messages
- Enhanced Bun detection error message with Windows-specific troubleshooting
- All error messages now include GitHub issue numbers and docs links
- Windows timeout already increased to 2.0x multiplier in previous phases

Changes:
- src/utils/logger.ts: Added platform prefix and PID to all log output
- src/services/process/ProcessManager.ts: Enhanced error messages with troubleshooting steps
- src/utils/bun-path.ts: Added Windows-specific Bun detection error guidance

Addresses: #363, #361, #367, #371, #373, #374

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>

* fix(windows): add comprehensive Windows CI testing

Phase 7/8: Add Windows CI Testing

- Create automated Windows testing workflow
- Test worker startup/shutdown cycles
- Verify Bun PATH detection on Windows
- Test rapid restart scenarios
- Validate port cleanup after shutdown
- Check for zombie processes
- Run on all pushes and PRs to main/fix/feature branches

Addresses: #363, #361, #367, #371, #373, #374

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>

* ci(windows): remove build steps from Windows CI workflow

Build files are already included in the plugin folder, so npm install and npm run build are unnecessary steps in the CI workflow.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>

* revert: remove Windows CI workflow

The CI workflow cannot be properly implemented in the current architecture due to limitations in testing the worker service in CI environments.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>

* security: add PID validation and improve ChromaSync timeout handling

Address critical security and reliability issues identified in PR review:

**Security Fixes:**
- Add PID validation before all PowerShell/taskkill command execution
- Validate PIDs are positive integers to prevent command injection
- Apply validation in worker-wrapper.ts, worker-service.ts, and ChromaSync.ts

**Reliability Improvements:**
- Add timeout handling to ChromaSync client.close() (10s timeout)
- Add timeout handling to ChromaSync transport.close() (5s timeout)
- Implement force-kill fallback when ChromaSync close operations timeout
- Prevents hanging on shutdown and ensures subprocess cleanup

**Implementation Details:**
- PID validation checks: Number.isInteger(pid) && pid > 0
- Applied before all execSync taskkill calls on Windows
- Applied in process enumeration (Get-CimInstance) PowerShell commands
- ChromaSync.close() uses Promise.race for timeout enforcement
- Graceful degradation with force-kill fallback on timeout

Addresses PR #378 review feedback

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>

* Refactor ChromaSync client and transport closure logic

- Removed timeout handling for closing the Chroma client and transport.
- Simplified error logging for client and transport closure.
- Ensured subprocess cleanup logic is more straightforward.

* fix(worker): streamline Windows process management and cleanup

* revert: remove speculative LLM-generated complexity

Reverts defensive code that was added speculatively without user-reported issues:
- ChromaSync: Remove PID extraction and explicit taskkill (wrapper handles this)
- worker-wrapper: Restore simple taskkill /T /F (validated in v7.3.5)
- DatabaseManager: Remove Promise.race timeout wrapper
- hook-constants: Restore original timeout values
- logger: Remove platform/PID additions to every log line
- bun-path: Remove speculative logging

Keeps only changes that map to actual GitHub issues:
- #374: Drive root project name fix (getProjectName utility)
- #363: Readiness endpoint and Windows orphan cleanup
- #367: windowsHide on ChromaSync transport

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

---------

Co-authored-by: Claude Sonnet 4.5 <noreply@anthropic.com>
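The Phase 5 change is one of the three that the final revert keeps (#374). A minimal TypeScript sketch of the behavior it describes follows; it is not the actual contents of src/utils/project-name.ts. It assumes that "drive-X" means the upper-cased drive letter, that `C:`, `C:\` and `C:/` all count as drive roots, and that a cwd whose basename is empty (such as a POSIX `/`) should also fall back to "unknown-project".

```typescript
import * as path from "path";

/**
 * Hypothetical sketch of the getProjectName() behavior described in Phase 5;
 * the real src/utils/project-name.ts may differ.
 */
export function getProjectName(cwd: string | null | undefined): string {
  // null, undefined, empty, or whitespace-only cwd: no usable project directory
  if (!cwd || cwd.trim() === "") {
    return "unknown-project";
  }

  // Windows drive roots ("C:\", "J:/", "J:") have no directory name of their own,
  // so name them after the drive letter (upper-casing is an assumption).
  const driveRoot = /^([A-Za-z]):[\\/]?$/.exec(cwd);
  if (driveRoot) {
    return `drive-${driveRoot[1].toUpperCase()}`;
  }

  // Otherwise behave like the path.basename(cwd) calls this utility replaced;
  // an empty basename (e.g. POSIX "/") also falls back (assumption).
  const name = path.basename(cwd);
  return name === "" ? "unknown-project" : name;
}
```

Under these assumptions `getProjectName("J:\\")` returns `"drive-J"` and `getProjectName(undefined)` returns `"unknown-project"`, so the hooks never receive an empty project name.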
240 lines
7.4 KiB
TypeScript
import path from "path";
import { homedir } from "os";
import { spawnSync } from "child_process";
import { existsSync, writeFileSync, readFileSync, mkdirSync } from "fs";
import { logger } from "../utils/logger.js";
import { HOOK_TIMEOUTS, getTimeout } from "./hook-constants.js";
import { ProcessManager } from "../services/process/ProcessManager.js";
import { SettingsDefaultsManager } from "./SettingsDefaultsManager.js";
import { getWorkerRestartInstructions } from "../utils/error-messages.js";

const MARKETPLACE_ROOT = path.join(homedir(), '.claude', 'plugins', 'marketplaces', 'thedotmack');

// Named constants for health checks
const HEALTH_CHECK_TIMEOUT_MS = getTimeout(HOOK_TIMEOUTS.HEALTH_CHECK);

// Port cache to avoid repeated settings file reads
let cachedPort: number | null = null;

/**
 * Get the worker port number from settings
 * Uses CLAUDE_MEM_WORKER_PORT from settings file or default (37777)
 * Caches the port value to avoid repeated file reads
 */
export function getWorkerPort(): number {
  if (cachedPort !== null) {
    return cachedPort;
  }

  try {
    const settingsPath = path.join(SettingsDefaultsManager.get('CLAUDE_MEM_DATA_DIR'), 'settings.json');
    const settings = SettingsDefaultsManager.loadFromFile(settingsPath);
    cachedPort = parseInt(settings.CLAUDE_MEM_WORKER_PORT, 10);
    return cachedPort;
  } catch (error) {
    // Fallback to default if settings load fails
    logger.debug('SYSTEM', 'Failed to load port from settings, using default', { error });
    cachedPort = parseInt(SettingsDefaultsManager.get('CLAUDE_MEM_WORKER_PORT'), 10);
    return cachedPort;
  }
}

/**
 * Clear the cached port value
 * Call this when settings are updated to force re-reading from file
 */
export function clearPortCache(): void {
  cachedPort = null;
}

/**
 * Get the worker host address
 * Priority: ~/.claude-mem/settings.json > env var > default (127.0.0.1)
 */
export function getWorkerHost(): string {
  const settingsPath = path.join(homedir(), '.claude-mem', 'settings.json');
  const settings = SettingsDefaultsManager.loadFromFile(settingsPath);
  return settings.CLAUDE_MEM_WORKER_HOST;
}

/**
 * Check if worker is responsive and fully initialized by trying the readiness endpoint
 * Changed from /health to /api/readiness to ensure MCP initialization is complete
 */
async function isWorkerHealthy(): Promise<boolean> {
  try {
    const port = getWorkerPort();
    const response = await fetch(`http://127.0.0.1:${port}/api/readiness`, {
      signal: AbortSignal.timeout(HEALTH_CHECK_TIMEOUT_MS)
    });
    return response.ok;
  } catch (error) {
    logger.debug('SYSTEM', 'Worker readiness check failed', {
      error: error instanceof Error ? error.message : String(error),
      errorType: error?.constructor?.name
    });
    return false;
  }
}

/**
 * Get the current plugin version from package.json
 */
function getPluginVersion(): string | null {
  try {
    const packageJsonPath = path.join(MARKETPLACE_ROOT, 'package.json');
    const packageJson = JSON.parse(readFileSync(packageJsonPath, 'utf-8'));
    return packageJson.version;
  } catch (error) {
    logger.debug('SYSTEM', 'Failed to read plugin version', {
      error: error instanceof Error ? error.message : String(error)
    });
    return null;
  }
}

/**
 * Get the running worker's version from the API
 */
async function getWorkerVersion(): Promise<string | null> {
  try {
    const port = getWorkerPort();
    const response = await fetch(`http://127.0.0.1:${port}/api/version`, {
      signal: AbortSignal.timeout(HEALTH_CHECK_TIMEOUT_MS)
    });
    if (!response.ok) return null;
    const data = await response.json() as { version: string };
    return data.version;
  } catch (error) {
    logger.debug('SYSTEM', 'Failed to get worker version', {
      error: error instanceof Error ? error.message : String(error)
    });
    return null;
  }
}

/**
 * Check if worker version matches plugin version
 * If mismatch detected, restart the worker automatically
 */
async function ensureWorkerVersionMatches(): Promise<void> {
  const pluginVersion = getPluginVersion();
  const workerVersion = await getWorkerVersion();

  if (!pluginVersion || !workerVersion) {
    // Can't determine versions, skip check
    return;
  }

  if (pluginVersion !== workerVersion) {
    logger.info('SYSTEM', 'Worker version mismatch detected - restarting worker', {
      pluginVersion,
      workerVersion
    });

    // Give files time to sync before restart
    await new Promise(resolve => setTimeout(resolve, getTimeout(HOOK_TIMEOUTS.PRE_RESTART_SETTLE_DELAY)));

    // Restart the worker
    await ProcessManager.restart(getWorkerPort());

    // Give it a moment to start
    await new Promise(resolve => setTimeout(resolve, 1000));

    // Verify it's healthy
    if (!await isWorkerHealthy()) {
      logger.error('SYSTEM', 'Worker failed to restart after version mismatch', {
        expectedVersion: pluginVersion,
        runningVersion: workerVersion,
        port: getWorkerPort()
      });
    }
  }
}

/**
 * Start the worker service using ProcessManager
 * Handles both Unix (Bun) and Windows (compiled exe) platforms
 */
async function startWorker(): Promise<boolean> {
  // Clean up legacy PM2 (one-time migration)
  const dataDir = SettingsDefaultsManager.get('CLAUDE_MEM_DATA_DIR');
  const pm2MigratedMarker = path.join(dataDir, '.pm2-migrated');

  // Ensure data directory exists (may not exist on fresh install)
  mkdirSync(dataDir, { recursive: true });

  if (!existsSync(pm2MigratedMarker)) {
    try {
      // spawnSync does not throw when pm2 is not installed or the process doesn't exist:
      // a spawn failure is reported in the returned object's `error` field and a failed
      // delete as a non-zero `status`, so execution continues to the marker either way.
      spawnSync('pm2', ['delete', 'claude-mem-worker'], { stdio: 'ignore' });
      // Mark migration as complete
      writeFileSync(pm2MigratedMarker, new Date().toISOString(), 'utf-8');
      logger.debug('SYSTEM', 'PM2 cleanup completed and marked');
    } catch {
      // Unexpected failure during cleanup - still mark as migrated
      writeFileSync(pm2MigratedMarker, new Date().toISOString(), 'utf-8');
    }
  }

  const port = getWorkerPort();
  const result = await ProcessManager.start(port);

  if (!result.success) {
    logger.error('SYSTEM', 'Failed to start worker', {
      platform: process.platform,
      port,
      error: result.error,
      marketplaceRoot: MARKETPLACE_ROOT
    });
  }

  return result.success;
}

/**
 * Ensure worker service is running
 * Checks health and auto-starts if not running
 * Also ensures worker version matches plugin version
 */
export async function ensureWorkerRunning(): Promise<void> {
  // Check if already healthy
  if (await isWorkerHealthy()) {
    // Worker is healthy, but check if version matches
    await ensureWorkerVersionMatches();
    return;
  }

  // Try to start the worker
  const started = await startWorker();

  if (!started) {
    const port = getWorkerPort();
    throw new Error(
      getWorkerRestartInstructions({
        port,
        customPrefix: `Worker service failed to start on port ${port}.`
      })
    );
  }

  // Wait for worker to become responsive after starting.
  // Try up to 5 times, sleeping 500ms before each readiness check: at least
  // 2.5 seconds in total, plus the time each check itself takes.
  for (let i = 0; i < 5; i++) {
    await new Promise(resolve => setTimeout(resolve, 500));
    if (await isWorkerHealthy()) {
      await ensureWorkerVersionMatches();
      return;
    }
  }

  // Worker started but isn't responding
  const port = getWorkerPort();
  logger.error('SYSTEM', 'Worker started but not responding to health checks');
  throw new Error(
    getWorkerRestartInstructions({
      port,
      customPrefix: `Worker service started but is not responding on port ${port}.`
    })
  );
}
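ensureWorkerRunning() polls isWorkerHealthy() up to five times, sleeping 500 ms before each attempt. If that loop ever needs tuning or a unit test, it could be factored into a helper that takes the probe as a parameter. The sketch below assumes a hypothetical `waitForReady` helper that is not part of this module; its defaults mirror the values the loop hard-codes.

```typescript
/**
 * Hypothetical helper (not part of this module): poll an async readiness probe,
 * sleeping `delayMs` before each attempt, until it resolves true or `attempts`
 * probes have failed. Defaults mirror the loop in ensureWorkerRunning().
 */
export async function waitForReady(
  probe: () => Promise<boolean>,
  attempts = 5,
  delayMs = 500,
): Promise<boolean> {
  for (let i = 0; i < attempts; i++) {
    // Sleep first, as ensureWorkerRunning() does: a freshly spawned worker
    // is unlikely to answer immediately.
    await new Promise<void>((resolve) => setTimeout(resolve, delayMs));
    if (await probe()) {
      return true;
    }
  }
  return false;
}
```

With such a helper, the post-start wait would become `if (await waitForReady(isWorkerHealthy)) { await ensureWorkerVersionMatches(); return; }`, and a test can pass a stub probe with a 1 ms delay instead of starting a real worker.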