Files
claude-mem/src/services/process/ProcessManager.ts
T
Alex Newman a537433eae Code quality: comprehensive nonsense audit cleanup (20 issues) (#400)
* fix: prevent initialization promise from resolving on failure

Background initialization was resolving the promise even when it failed,
causing the readiness check to incorrectly indicate the worker was ready.
Now the promise stays pending on failure, ensuring /api/readiness
continues returning 503 until initialization succeeds.

Fixes critical issue #1 from nonsense audit.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>

* fix: improve error handling in context inject and settings update routes

* Enhance error handling for ChromaDB failures in SearchManager

- Introduced a flag to track ChromaDB failure states.
- Updated logging messages to provide clearer feedback on ChromaDB initialization and failure.
- Modified the response structure to inform users when semantic search is unavailable due to ChromaDB issues, including installation instructions for UVX/Python.

* refactor: remove deprecated silent-debug utility functions

* Enhance error handling and validation in hooks

- Added validation for required fields in `summary-hook.ts` and `save-hook.ts` to ensure necessary inputs are provided before processing.
- Improved error messages for missing `cwd` in `save-hook.ts` and `transcript_path` in `summary-hook.ts`.
- Cleaned up code by removing unnecessary error handling logic and directly throwing errors when required fields are missing.
- Updated binary file `mem-search.zip` to reflect changes in the plugin.

* fix: improve error handling in summary hook to ensure errors are not masked

* fix: add error handling for unknown message content format in transcript parser

* fix: log error when failing to notify worker of session end

* Refactor date formatting functions: move to shared module

- Removed redundant date formatting functions from SearchManager.ts.
- Consolidated date formatting logic into shared timeline-formatting.ts.
- Updated functions to accept both ISO date strings and epoch milliseconds.

* Refactor tag stripping functions to extract shared logic

- Introduced a new internal function `stripTagsInternal` to handle the common logic for stripping memory tags from both JSON and prompt content.
- Updated `stripMemoryTagsFromJson` to utilize the new internal function, simplifying its implementation.
- Modified `stripMemoryTagsFromPrompt` to also call `stripTagsInternal`, reducing code duplication and improving maintainability.
- Removed redundant type checks and logging from both functions, as they now rely on the internal function for processing.

* Refactor settings validation in SettingsRoutes

- Consolidated multiple individual setting validations into a single validateSettings method.
- Updated handleUpdateSettings to use the new validation method for improved maintainability.
- Each setting now has its validation logic encapsulated within validateSettings, ensuring a single source of truth for validation rules.

* fix: add error logging to ProcessManager.getPidInfo()

Previously getPidInfo() returned null silently for three cases:
1. File not found (expected - no action needed)
2. JSON parse error (corrupted file - now logs warning)
3. Type validation failure (malformed data - now logs warning)

This fix adds warning logs for cases 2 and 3 to provide visibility
into PID file corruption issues. Logs include context like parsed
data structure or error message with file path.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>

* fix: remove overly defensive try-catch in SessionRoutes

Remove unnecessary try-catch block that was masking potential errors when
checking file paths for session-memory meta-observations. Property access
on parsed JSON objects never throws - existing truthiness checks already
safely handle undefined/null values.

Issue #12 from nonsense audit: SessionRoutes catch-all exception masking

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>

* fix: remove redundant try-catch from getWorkerPort()

Simplified getWorkerPort() by removing unnecessary try-catch wrapper.
SettingsDefaultsManager.loadFromFile() already handles missing files
by returning defaults, and .get() never throws - making the catch
block completely redundant.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>

* refactor: eliminate ceremonial wrapper in hook-response.ts

Replace buildHookResponse() function with direct constant export.
Most hook responses were calling a function just to return the same
constant object. Only SessionStart with context needs special handling.

Changes:
- Export STANDARD_HOOK_RESPONSE constant directly
- Simplify createHookResponse() to only handle SessionStart special case
- Update all hooks to use STANDARD_HOOK_RESPONSE instead of function call
- Eliminate buildHookResponse() function with redundant branching

Files modified:
- src/hooks/hook-response.ts: Export constant, simplify function
- src/hooks/new-hook.ts: Use STANDARD_HOOK_RESPONSE
- src/hooks/save-hook.ts: Use STANDARD_HOOK_RESPONSE
- src/hooks/summary-hook.ts: Use STANDARD_HOOK_RESPONSE

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>

* fix: make getWorkerHost() consistent with getWorkerPort()

- Use SettingsDefaultsManager.get('CLAUDE_MEM_DATA_DIR') for path resolution
  instead of hardcoded ~/.claude-mem (supports custom data directories)
- Add caching to getWorkerHost() (same pattern as getWorkerPort())
- Update clearPortCache() to also clear host cache
- Both functions now have identical patterns: caching, consistent path
  resolution, and same error handling via SettingsDefaultsManager

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>

* refactor: inline single-use timeout constants in ProcessManager

Remove 6 timeout constants used only once each, inlining their values
directly at the point of use. Following YAGNI principle - constants
should only exist when used multiple times.

Removed constants:
- PROCESS_STOP_TIMEOUT_MS (5000ms)
- HEALTH_CHECK_TIMEOUT_MS (10000ms)
- HEALTH_CHECK_INTERVAL_MS (200ms)
- HEALTH_CHECK_FETCH_TIMEOUT_MS (1000ms)
- PROCESS_EXIT_CHECK_INTERVAL_MS (100ms)
- HTTP_SHUTDOWN_TIMEOUT_MS (2000ms)

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>

* fix: replace overly broad path filter in HTTP logging middleware

Replace `req.path.includes('.')` with explicit static file extension
checking to prevent incorrectly skipping API endpoint logging.

- Add `staticExtensions` array with legitimate asset types
- Use `.endsWith()` matching instead of `.includes()`
- API endpoints containing periods (if any) now logged correctly
- Static assets (.js, .css, .svg, etc.) still skip logging as intended

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>

* refactor: expand logger.formatTool() to handle all tool types

Replace hard-coded tool formatting for 4 tools with comprehensive coverage:

File operations (Read, Edit, Write, NotebookEdit):
- Consolidated file_path handling for all file operations
- Added notebook_path support for NotebookEdit
- Shows filename only (not full path)

Search tools (Glob, Grep):
- Glob: shows pattern
- Grep: shows pattern (truncated if > 30 chars)

Network tools (WebFetch, WebSearch):
- Shows URL or query (truncated if > 40 chars)

Meta tools (Task, Skill, LSP):
- Task: shows subagent_type or description
- Skill: shows skill name
- LSP: shows operation type

This eliminates the "hard-coded 4 tools" limitation and provides
meaningful log output for all tool types.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>

* fix: remove all truncation from logger.formatTool()

Truncation hides critical debugging information. Show everything:

- Bash: full command (was truncated at 50 chars)
- File operations: full path (was showing filename only)
- Grep: full pattern (was truncated at 30 chars)
- WebFetch/WebSearch: full URL/query (was truncated at 40 chars)
- Task: full description (was truncated at 30 chars)

Logs exist to provide complete information. Never hide details.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>

* refactor: replace array indexing with regex capture for drive letter

Use explicit regex capture group to extract Windows drive letter instead
of assuming cwd[0] is always the first character. Safer and more explicit.

- Changed cwd.match(/^[A-Z]:\\/i) to cwd.match(/^([A-Z]):\\/i)
- Extract drive letter from driveMatch[1] instead of cwd[0]
- Restructured control flow to avoid nested conditionals

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>

* fix: return computed values from DataRoutes processing endpoint

The handleSetProcessing endpoint was computing queueDepth and
activeSessions but not including them in the response. This commit
includes all computed values in the API response.

- Return queueDepth and activeSessions in /api/processing response
- Eliminates dead code pattern where values are computed but unused
- API callers can now access these metrics

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>

* fix: move error handling into SettingsDefaultsManager.loadFromFile()

Wrap the entire loadFromFile() method in try-catch so it handles ALL
error cases (missing file, corrupted JSON, permission errors, I/O failures)
instead of forcing every caller to add redundant try-catch blocks.

This follows DRY principle: one function owns error handling, all callers
stay simple and clean.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>

* Refactor hook response handling and optimize token estimation

- Removed the HookType and HookResponse types and the createHookResponse function from hook-response.ts to simplify the response handling for hooks.
- Introduced a standardized hook response for all hooks in hook-response.ts.
- Moved the estimateTokens function from SearchManager.ts to timeline-formatting.ts for better reusability and clarity.
- Cleaned up redundant estimateTokens function definitions in SearchManager.ts.

---------

Co-authored-by: Claude Sonnet 4.5 <noreply@anthropic.com>
2025-12-20 19:41:33 -05:00

434 lines
15 KiB
TypeScript

import { existsSync, readFileSync, writeFileSync, unlinkSync, mkdirSync } from 'fs';
import { createWriteStream } from 'fs';
import { join } from 'path';
import { spawn, spawnSync } from 'child_process';
import { homedir } from 'os';
import { DATA_DIR } from '../../shared/paths.js';
import { getBunPath, isBunAvailable } from '../../utils/bun-path.js';
import { SettingsDefaultsManager } from '../../shared/SettingsDefaultsManager.js';
const PID_FILE = join(DATA_DIR, 'worker.pid');
const LOG_DIR = join(DATA_DIR, 'logs');
const MARKETPLACE_ROOT = join(homedir(), '.claude', 'plugins', 'marketplaces', 'thedotmack');
interface PidInfo {
pid: number;
port: number;
startedAt: string;
version: string;
}
export class ProcessManager {
static async start(port: number): Promise<{ success: boolean; pid?: number; error?: string }> {
// Validate port range
if (isNaN(port) || port < 1024 || port > 65535) {
return {
success: false,
error: `Invalid port ${port}. Must be between 1024 and 65535`
};
}
// Check if already running
if (await this.isRunning()) {
const info = this.getPidInfo();
return { success: true, pid: info?.pid };
}
// Ensure log directory exists
mkdirSync(LOG_DIR, { recursive: true });
// On Windows, use the wrapper script to solve zombie port problem
// On Unix, use the worker directly
const scriptName = process.platform === 'win32' ? 'worker-wrapper.cjs' : 'worker-service.cjs';
const workerScript = join(MARKETPLACE_ROOT, 'plugin', 'scripts', scriptName);
if (!existsSync(workerScript)) {
return { success: false, error: `Worker script not found at ${workerScript}` };
}
const logFile = this.getLogFilePath();
// Use Bun on all platforms with PowerShell workaround for Windows console popups
return this.startWithBun(workerScript, logFile, port);
}
private static isBunAvailable(): boolean {
return isBunAvailable();
}
/**
* Escapes a string for safe use in PowerShell single-quoted strings.
* In PowerShell single quotes, the only special character is the single quote itself,
* which must be doubled to escape it.
*/
private static escapePowerShellString(str: string): string {
return str.replace(/'/g, "''");
}
private static async startWithBun(script: string, logFile: string, port: number): Promise<{ success: boolean; pid?: number; error?: string }> {
const bunPath = getBunPath();
if (!bunPath) {
return {
success: false,
error: 'Bun is required but not found in PATH or common installation paths. Install from https://bun.sh'
};
}
try {
const isWindows = process.platform === 'win32';
if (isWindows) {
// Windows: Use PowerShell Start-Process with -WindowStyle Hidden
// This properly hides the console window (affects both Bun and Node.js)
// Note: windowsHide: true doesn't work with detached: true (Bun inherits Node.js process spawning semantics)
// See: https://github.com/nodejs/node/issues/21825 and PR #315 for detailed testing
//
// On Windows, we start worker-wrapper.cjs which manages the actual worker-service.cjs.
// This solves the zombie port problem: the wrapper has no sockets, so when it kills
// and respawns the inner worker, the socket is properly released.
//
// Security: All paths (bunPath, script, MARKETPLACE_ROOT) are application-controlled system paths,
// not user input. If an attacker could modify these paths, they would already have full filesystem
// access including direct access to ~/.claude-mem/claude-mem.db. Nevertheless, we properly escape
// all values for PowerShell to follow security best practices.
const escapedBunPath = this.escapePowerShellString(bunPath);
const escapedScript = this.escapePowerShellString(script);
const escapedWorkDir = this.escapePowerShellString(MARKETPLACE_ROOT);
const escapedLogFile = this.escapePowerShellString(logFile);
const envVars = `$env:CLAUDE_MEM_WORKER_PORT='${port}'`;
const psCommand = `${envVars}; Start-Process -FilePath '${escapedBunPath}' -ArgumentList '${escapedScript}' -WorkingDirectory '${escapedWorkDir}' -WindowStyle Hidden -RedirectStandardOutput '${escapedLogFile}' -RedirectStandardError '${escapedLogFile}.err' -PassThru | Select-Object -ExpandProperty Id`;
const result = spawnSync('powershell', ['-Command', psCommand], {
stdio: 'pipe',
timeout: 10000,
windowsHide: true
});
if (result.status !== 0) {
return {
success: false,
error: `PowerShell spawn failed: ${result.stderr?.toString() || 'unknown error'}`
};
}
const pid = parseInt(result.stdout.toString().trim(), 10);
if (isNaN(pid)) {
return { success: false, error: 'Failed to get PID from PowerShell' };
}
// Write PID file
this.writePidFile({
pid,
port,
startedAt: new Date().toISOString(),
version: process.env.npm_package_version || 'unknown'
});
// Wait for health
return this.waitForHealth(pid, port);
} else {
// Unix: Use standard spawn with detached
const child = spawn(bunPath, [script], {
detached: true,
stdio: ['ignore', 'pipe', 'pipe'],
env: { ...process.env, CLAUDE_MEM_WORKER_PORT: String(port) },
cwd: MARKETPLACE_ROOT
});
// Write logs
const logStream = createWriteStream(logFile, { flags: 'a' });
child.stdout?.pipe(logStream);
child.stderr?.pipe(logStream);
child.unref();
if (!child.pid) {
return { success: false, error: 'Failed to get PID from spawned process' };
}
// Write PID file
this.writePidFile({
pid: child.pid,
port,
startedAt: new Date().toISOString(),
version: process.env.npm_package_version || 'unknown'
});
// Wait for health
return this.waitForHealth(child.pid, port);
}
} catch (error) {
return {
success: false,
error: error instanceof Error ? error.message : String(error)
};
}
}
static async stop(timeout: number = 5000): Promise<boolean> {
const info = this.getPidInfo();
if (process.platform === 'win32') {
// Windows: Try graceful HTTP shutdown first - this works regardless of PID file state
// because the worker shuts itself down from the inside (via wrapper IPC)
const port = info?.port ?? this.getPortFromSettings();
const httpShutdownSucceeded = await this.tryHttpShutdown(port);
if (httpShutdownSucceeded) {
// HTTP shutdown succeeded - worker confirmed down, safe to remove PID file
this.removePidFile();
return true;
}
// HTTP shutdown failed (worker not responding), fall back to taskkill
if (!info) {
// No PID file and HTTP failed - nothing more we can do
return true;
}
const { execSync } = await import('child_process');
try {
// Use taskkill /T /F to kill entire process tree
// This ensures the wrapper AND all its children (inner worker, MCP, ChromaSync) are killed
// which is necessary to properly release the socket and avoid zombie ports
execSync(`taskkill /PID ${info.pid} /T /F`, { timeout: 10000, stdio: 'ignore' });
} catch {
// Process may already be dead
}
// Wait for process to actually exit before removing PID file
try {
await this.waitForExit(info.pid, timeout);
} catch {
// Timeout waiting - process may still be alive
}
// Only remove PID file if process is confirmed dead
if (!this.isProcessAlive(info.pid)) {
this.removePidFile();
}
return true;
} else {
// Unix: Use signals (unchanged behavior)
if (!info) return true;
try {
process.kill(info.pid, 'SIGTERM');
await this.waitForExit(info.pid, timeout);
} catch {
try {
process.kill(info.pid, 'SIGKILL');
} catch {
// Process already dead
}
}
this.removePidFile();
return true;
}
}
static async restart(port: number): Promise<{ success: boolean; pid?: number; error?: string }> {
await this.stop();
return this.start(port);
}
static async status(): Promise<{ running: boolean; pid?: number; port?: number; uptime?: string }> {
const info = this.getPidInfo();
if (!info) return { running: false };
const running = this.isProcessAlive(info.pid);
return {
running,
pid: running ? info.pid : undefined,
port: running ? info.port : undefined,
uptime: running ? this.formatUptime(info.startedAt) : undefined
};
}
static async isRunning(): Promise<boolean> {
const info = this.getPidInfo();
if (!info) return false;
const alive = this.isProcessAlive(info.pid);
if (!alive) {
this.removePidFile(); // Clean up stale PID file
}
return alive;
}
/**
* Get worker port from settings file
*/
private static getPortFromSettings(): number {
try {
const settingsPath = join(DATA_DIR, 'settings.json');
const settings = SettingsDefaultsManager.loadFromFile(settingsPath);
return parseInt(settings.CLAUDE_MEM_WORKER_PORT, 10);
} catch {
return parseInt(SettingsDefaultsManager.get('CLAUDE_MEM_WORKER_PORT'), 10);
}
}
/**
* Try to shut down the worker via HTTP endpoint
* Returns true if shutdown succeeded, false if worker not responding
*/
private static async tryHttpShutdown(port: number): Promise<boolean> {
try {
// Send shutdown request
const response = await fetch(`http://127.0.0.1:${port}/api/admin/shutdown`, {
method: 'POST',
signal: AbortSignal.timeout(2000)
});
if (!response.ok) {
return false;
}
// Wait for worker to actually stop responding
return await this.waitForWorkerDown(port, 5000);
} catch {
// Worker not responding to HTTP - it may be dead or hung
return false;
}
}
/**
* Wait for worker to stop responding on the given port
*/
private static async waitForWorkerDown(port: number, timeout: number): Promise<boolean> {
const startTime = Date.now();
while (Date.now() - startTime < timeout) {
try {
await fetch(`http://127.0.0.1:${port}/api/health`, {
signal: AbortSignal.timeout(500)
});
// Still responding, wait and retry
await new Promise(resolve => setTimeout(resolve, 100));
} catch {
// Worker stopped responding - success
return true;
}
}
// Timeout - worker still responding
return false;
}
// Helper methods
private static getPidInfo(): PidInfo | null {
try {
if (!existsSync(PID_FILE)) return null;
const content = readFileSync(PID_FILE, 'utf-8');
const parsed = JSON.parse(content);
// Validate required fields have correct types
if (typeof parsed.pid !== 'number' || typeof parsed.port !== 'number') {
logger.warn('PROCESS', 'Malformed PID file: missing or invalid pid/port fields', {}, { parsed });
return null;
}
return parsed as PidInfo;
} catch (error) {
logger.warn('PROCESS', 'Failed to read PID file', {}, {
error: error instanceof Error ? error.message : String(error),
path: PID_FILE
});
return null;
}
}
private static writePidFile(info: PidInfo): void {
mkdirSync(DATA_DIR, { recursive: true });
writeFileSync(PID_FILE, JSON.stringify(info, null, 2));
}
private static removePidFile(): void {
try {
if (existsSync(PID_FILE)) {
unlinkSync(PID_FILE);
}
} catch {
// Ignore errors
}
}
private static isProcessAlive(pid: number): boolean {
try {
process.kill(pid, 0);
return true;
} catch {
return false;
}
}
private static async waitForHealth(pid: number, port: number, timeoutMs: number = 10000): Promise<{ success: boolean; pid?: number; error?: string }> {
const startTime = Date.now();
const isWindows = process.platform === 'win32';
// Increase timeout on Windows to account for slower process startup
const adjustedTimeout = isWindows ? timeoutMs * 2 : timeoutMs;
while (Date.now() - startTime < adjustedTimeout) {
// Check if process is still alive
if (!this.isProcessAlive(pid)) {
const errorMsg = isWindows
? `Process died during startup\n\nTroubleshooting:\n1. Check Task Manager for zombie 'bun.exe' or 'node.exe' processes\n2. Verify port ${port} is not in use: netstat -ano | findstr ${port}\n3. Check worker logs in ~/.claude-mem/logs/\n4. See GitHub issues: #363, #367, #371, #373\n5. Docs: https://docs.claude-mem.ai/troubleshooting/windows-issues`
: 'Process died during startup';
return { success: false, error: errorMsg };
}
// Try readiness check (changed from /health to /api/readiness)
try {
const response = await fetch(`http://127.0.0.1:${port}/api/readiness`, {
signal: AbortSignal.timeout(1000)
});
if (response.ok) {
return { success: true, pid };
}
} catch {
// Not ready yet, continue polling
}
await new Promise(resolve => setTimeout(resolve, 200));
}
const timeoutMsg = isWindows
? `Worker failed to start on Windows (readiness check timed out after ${adjustedTimeout}ms)\n\nTroubleshooting:\n1. Check Task Manager for zombie 'bun.exe' or 'node.exe' processes\n2. Verify port ${port} is not in use: netstat -ano | findstr ${port}\n3. Check worker logs in ~/.claude-mem/logs/\n4. See GitHub issues: #363, #367, #371, #373\n5. Docs: https://docs.claude-mem.ai/troubleshooting/windows-issues`
: `Readiness check timed out after ${adjustedTimeout}ms`;
return { success: false, error: timeoutMsg };
}
private static async waitForExit(pid: number, timeout: number): Promise<void> {
const startTime = Date.now();
while (Date.now() - startTime < timeout) {
if (!this.isProcessAlive(pid)) {
return;
}
await new Promise(resolve => setTimeout(resolve, 100));
}
throw new Error('Process did not exit within timeout');
}
private static getLogFilePath(): string {
const date = new Date().toISOString().slice(0, 10);
return join(LOG_DIR, `worker-${date}.log`);
}
private static formatUptime(startedAt: string): string {
const startTime = new Date(startedAt).getTime();
const now = Date.now();
const diffMs = now - startTime;
const seconds = Math.floor(diffMs / 1000);
const minutes = Math.floor(seconds / 60);
const hours = Math.floor(minutes / 60);
const days = Math.floor(hours / 24);
if (days > 0) return `${days}d ${hours % 24}h`;
if (hours > 0) return `${hours}h ${minutes % 60}m`;
if (minutes > 0) return `${minutes}m ${seconds % 60}s`;
return `${seconds}s`;
}
}