ba1ef6c42c
* fix: resolve search, database, and docker bugs (#1913, #1916, #1956, #1957, #2048)

- Fix concept/concepts param mismatch in SearchManager.normalizeParams (#1916)
- Add FTS5 keyword fallback when ChromaDB is unavailable (#1913, #2048)
- Add periodic WAL checkpoint and journal_size_limit to prevent unbounded WAL growth (#1956)
- Add periodic clearFailed() to purge stale pending_messages (#1957)
- Fix nounset-safe TTY_ARGS expansion in docker/claude-mem/run.sh

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix: prevent silent data loss on non-XML responses, add queue info to /health (#1867, #1874)

- ResponseProcessor: mark messages as failed (with retry) instead of confirming when the LLM returns non-XML garbage (auth errors, rate limits) (#1874)
- Health endpoint: include activeSessions count for queue liveness monitoring (#1867)

* fix: cache isFts5Available() at construction time

Addresses Greptile review: avoid DDL probe (CREATE + DROP) on every text query. Result is now cached in _fts5Available at construction.

* fix: resolve worker stability bugs - pool deadlock, MCP loopback, restart guard (#1868, #1876, #2053)

- Replace flat consecutiveRestarts counter with time-windowed RestartGuard: only counts restarts within 60s window (cap=10), decays after 5min of success. Prevents stranding pending messages on long-running sessions. (#2053)
- Add idle session eviction to pool slot allocation: when all slots are full, evict the idlest session (no pending work, oldest activity) to free a slot for new requests, preventing 60s timeout deadlock. (#1868)
- Fix MCP loopback self-check: use process.execPath instead of bare 'node' which fails on non-interactive PATH. Fix crash misclassification by removing false "Generator exited unexpectedly" error log on normal completion. (#1876)

* fix: resolve hooks reliability bugs - summarize exit code, session-init health wait (#1896, #1901, #1903, #1907)

- Wrap summarize hook's workerHttpRequest in try/catch to prevent exit code 2 (blocking error) on network failures or malformed responses. Session exit no longer blocks on worker errors. (#1901)
- Add health-check wait loop to UserPromptSubmit session-init command in hooks.json. On Linux/WSL where hook ordering fires UserPromptSubmit before SessionStart, session-init now waits up to 10s for worker health before proceeding. Also wrap session-init HTTP call in try/catch. (#1907)
- Close #1896 as already-fixed: mtime comparison at file-context.ts:255-267 bypasses truncation when file is newer than latest observation.
- Close #1903 as no-repro: hooks.json correctly declares all hook events. Issue was Claude Code 12.0.1/macOS platform event-dispatch bug.

* fix: security hardening - bearer auth, path validation, rate limits, per-user port (#1932, #1933, #1934, #1935, #1936)

- Add bearer token auth to all API endpoints: auto-generated 32-byte token stored at ~/.claude-mem/worker-auth-token (mode 0600). All hook, MCP, viewer, and OpenCode requests include Authorization header. Health/readiness endpoints exempt for polling. (#1932, #1933)
- Add path traversal protection: watch.context.path validated against project root and ~/.claude-mem/ before write. Rejects ../../../etc style attacks. (#1934)
- Reduce JSON body limit from 50MB to 5MB. Add in-memory rate limiter (300 req/min/IP) to prevent abuse. (#1935)
- Derive default worker port from UID (37700 + uid%100) to prevent cross-user data leakage on multi-user macOS. Windows falls back to 37777. Shell hooks use same formula via id -u. (#1936)

* fix: resolve search project filtering and import Chroma sync (#1911, #1912, #1914, #1918)

- Fix per-type search endpoints to pass project filter to Chroma queries and SQLite hydration. searchObservations/Sessions/UserPrompts now use $or clause matching project + merged_into_project. (#1912)
- Fix timeline/search methods to pass project to Chroma anchor queries. Prevents cross-project result leakage when project param omitted. (#1911)
- Sync imported observations to ChromaDB after FTS rebuild. Import endpoint now calls chromaSync.syncObservation() for each imported row, making them visible to MCP search(). (#1914)
- Fix session-init cwd fallback to match context.ts (process.cwd()). Prevents project key mismatch that caused "no previous sessions" on fresh sessions. (#1918)
- Fix sync-marketplace restart to include auth token and per-user port.

* fix: resolve all CodeRabbit and Greptile review comments on PR #2080

- Fix run.sh comment mismatch (no-op flag vs empty array)
- Gate session-init on health check success (prevent running when worker unreachable)
- Fix date_desc ordering ignored in FTS session search
- Age-scope failed message purge (1h retention) instead of clearing all
- Anchor RestartGuard decay to real successes (null init, not Date.now())
- Add recordSuccess() calls in ResponseProcessor and completion path
- Prevent caller headers from overriding bearer auth token
- Add lazy cleanup for rate limiter map to prevent unbounded growth
- Bound post-import Chroma sync with concurrency limit of 8
- Add doc_type:'observation' filter to Chroma queries feeding observation hydration
- Add FTS fallback to all specialized search handlers (observations, sessions, prompts, timeline)
- Add response.ok check and error handling in viewer saveSettings

* fix: resolve CodeRabbit round-2 review comments

- Use failure timestamp (COALESCE) instead of created_at_epoch for stale purge
- Downgrade _fts5Available flag when FTS table creation fails
- Escape FTS5 MATCH input by quoting user queries as literal phrases
- Escape LIKE metacharacters (%, _, \) in prompt text search
- Add response.ok check in initial settings load (matches save flow)

* fix: resolve CodeRabbit round-3 review comments

- Include failed_at_epoch in COALESCE for age-scoped purge
- Re-throw FTS5 errors so callers can distinguish failure from no-results
- Wrap all FTS fallback calls in SearchManager with try/catch

---------

Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
245 lines
8.1 KiB
TypeScript
import path from "path";
import { readFileSync } from "fs";
import { logger } from "../utils/logger.js";
import { HOOK_TIMEOUTS, getTimeout } from "./hook-constants.js";
import { SettingsDefaultsManager } from "./SettingsDefaultsManager.js";
import { MARKETPLACE_ROOT } from "./paths.js";
import { getAuthToken } from "./auth-token.js";

// Named constants for health checks
// Allow env var override for users on slow systems (e.g., CLAUDE_MEM_HEALTH_TIMEOUT_MS=10000)
const HEALTH_CHECK_TIMEOUT_MS = (() => {
  const envVal = process.env.CLAUDE_MEM_HEALTH_TIMEOUT_MS;
  if (envVal) {
    const parsed = parseInt(envVal, 10);
    if (Number.isFinite(parsed) && parsed >= 500 && parsed <= 300000) {
      return parsed;
    }
    // Invalid env var: log once and use default
    logger.warn('SYSTEM', 'Invalid CLAUDE_MEM_HEALTH_TIMEOUT_MS, using default', {
      value: envVal, min: 500, max: 300000
    });
  }
  return getTimeout(HOOK_TIMEOUTS.HEALTH_CHECK);
})();

/**
 * Fetch with a timeout by racing the request against a setTimeout instead of using AbortSignal.
 * AbortSignal.timeout() causes a libuv assertion crash in Bun on Windows,
 * so we use a racing setTimeout pattern that avoids signal cleanup entirely.
 * On timeout the underlying fetch is not cancelled; the orphaned request is
 * harmless since the process exits shortly after.
 */
export function fetchWithTimeout(url: string, init: RequestInit = {}, timeoutMs: number): Promise<Response> {
  return new Promise((resolve, reject) => {
    const timeoutId = setTimeout(
      () => reject(new Error(`Request timed out after ${timeoutMs}ms`)),
      timeoutMs
    );
    fetch(url, init).then(
      response => { clearTimeout(timeoutId); resolve(response); },
      err => { clearTimeout(timeoutId); reject(err); }
    );
  });
}

// Cache to avoid repeated settings file reads
let cachedPort: number | null = null;
let cachedHost: string | null = null;

/**
 * Get the worker port number from settings.
 * Uses CLAUDE_MEM_WORKER_PORT from the settings file or the default
 * (per-user 37700 + uid % 100 since #1936; 37777 on Windows).
 * Caches the port value to avoid repeated file reads.
 */
export function getWorkerPort(): number {
  if (cachedPort !== null) {
    return cachedPort;
  }

  const settingsPath = path.join(SettingsDefaultsManager.get('CLAUDE_MEM_DATA_DIR'), 'settings.json');
  const settings = SettingsDefaultsManager.loadFromFile(settingsPath);
  cachedPort = parseInt(settings.CLAUDE_MEM_WORKER_PORT, 10);
  return cachedPort;
}

/**
 * Get the worker host address.
 * Uses CLAUDE_MEM_WORKER_HOST from the settings file or the default (127.0.0.1).
 * Caches the host value to avoid repeated file reads.
 */
export function getWorkerHost(): string {
  if (cachedHost !== null) {
    return cachedHost;
  }

  const settingsPath = path.join(SettingsDefaultsManager.get('CLAUDE_MEM_DATA_DIR'), 'settings.json');
  const settings = SettingsDefaultsManager.loadFromFile(settingsPath);
  cachedHost = settings.CLAUDE_MEM_WORKER_HOST;
  return cachedHost;
}

/**
 * Clear the cached port and host values.
 * Call this when settings are updated to force re-reading from file.
 */
export function clearPortCache(): void {
  cachedPort = null;
  cachedHost = null;
}

/**
 * Build a full URL for a given API path.
 */
export function buildWorkerUrl(apiPath: string): string {
  return `http://${getWorkerHost()}:${getWorkerPort()}${apiPath}`;
}

/**
 * Make an HTTP request to the worker over TCP.
 *
 * This is the preferred way for hooks to communicate with the worker.
 * If timeoutMs is omitted, HEALTH_CHECK_TIMEOUT_MS is used; a value of 0 or
 * less disables the timeout.
 */
export function workerHttpRequest(
  apiPath: string,
  options: {
    method?: string;
    headers?: Record<string, string>;
    body?: string;
    timeoutMs?: number;
  } = {}
): Promise<Response> {
  const method = options.method ?? 'GET';
  const timeoutMs = options.timeoutMs ?? HEALTH_CHECK_TIMEOUT_MS;

  const url = buildWorkerUrl(apiPath);
  const init: RequestInit = { method };
  // Inject bearer token for worker API auth (#1932/#1933).
  // Copy caller headers first, dropping any caller-supplied Authorization header
  // in any casing (header names are case-insensitive, so a lowercase
  // 'authorization' key would otherwise be combined with ours), then set
  // Authorization last to prevent override.
  const authHeaders: Record<string, string> = {};
  for (const [name, value] of Object.entries(options.headers ?? {})) {
    if (name.toLowerCase() !== 'authorization') {
      authHeaders[name] = value;
    }
  }
  authHeaders['Authorization'] = `Bearer ${getAuthToken()}`;
  init.headers = authHeaders;
  if (options.body) {
    init.body = options.body;
  }

  if (timeoutMs > 0) {
    return fetchWithTimeout(url, init, timeoutMs);
  }
  return fetch(url, init);
}

/**
 * Check if worker HTTP server is responsive.
 * Uses /api/health (liveness) instead of /api/readiness because:
 * - Hooks have 15-second timeout, but full initialization can take 5+ minutes (MCP connection)
 * - /api/health returns 200 as soon as HTTP server is up (sufficient for hook communication)
 * - /api/readiness returns 503 until full initialization completes (too slow for hooks)
 * See: https://github.com/thedotmack/claude-mem/issues/811
 */
async function isWorkerHealthy(): Promise<boolean> {
  const response = await workerHttpRequest('/api/health', { timeoutMs: HEALTH_CHECK_TIMEOUT_MS });
  return response.ok;
}

/**
 * Get the current plugin version from package.json.
 * Returns 'unknown' on ENOENT/EBUSY (shutdown race condition, fix #1042).
 */
function getPluginVersion(): string {
  try {
    const packageJsonPath = path.join(MARKETPLACE_ROOT, 'package.json');
    const packageJson = JSON.parse(readFileSync(packageJsonPath, 'utf-8'));
    return packageJson.version;
  } catch (error: unknown) {
    const code = error instanceof Error ? (error as NodeJS.ErrnoException).code : undefined;
    if (code === 'ENOENT' || code === 'EBUSY') {
      logger.debug('SYSTEM', 'Could not read plugin version (shutdown race)', { code });
      return 'unknown';
    }
    throw error;
  }
}

/**
 * Get the running worker's version from the API
 */
async function getWorkerVersion(): Promise<string> {
  const response = await workerHttpRequest('/api/version', { timeoutMs: HEALTH_CHECK_TIMEOUT_MS });
  if (!response.ok) {
    throw new Error(`Failed to get worker version: ${response.status}`);
  }
  const data = await response.json() as { version: string };
  return data.version;
}

/**
 * Check if worker version matches plugin version
 * Note: Auto-restart on version mismatch is now handled in worker-service.ts start command (issue #484)
 * This function logs for informational purposes only.
 * Skips comparison when either version is 'unknown' (fix #1042; avoids restart loops).
 */
async function checkWorkerVersion(): Promise<void> {
  let pluginVersion: string;
  try {
    pluginVersion = getPluginVersion();
  } catch (error: unknown) {
    logger.debug('SYSTEM', 'Version check failed reading plugin version', {
      error: error instanceof Error ? error.message : String(error)
    });
    return;
  }

  // Skip version check if plugin version couldn't be read (shutdown race)
  if (pluginVersion === 'unknown') return;

  let workerVersion: string;
  try {
    workerVersion = await getWorkerVersion();
  } catch (error: unknown) {
    logger.debug('SYSTEM', 'Version check failed reading worker version', {
      error: error instanceof Error ? error.message : String(error)
    });
    return;
  }

  // Skip version check if worker version is 'unknown' (avoids restart loops)
  if (workerVersion === 'unknown') return;

  if (pluginVersion !== workerVersion) {
    // Just log debug info - auto-restart handles the mismatch in worker-service.ts
    logger.debug('SYSTEM', 'Version check', {
      pluginVersion,
      workerVersion,
      note: 'Mismatch will be auto-restarted by worker-service start command'
    });
  }
}

/**
 * Ensure worker service is running
 * Quick health check - returns false if worker not healthy (doesn't block)
 * Port might be in use by another process, or worker might not be started yet
 */
export async function ensureWorkerRunning(): Promise<boolean> {
  // Quick health check (single attempt, no polling)
  try {
    if (await isWorkerHealthy()) {
      await checkWorkerVersion(); // logs at debug level on mismatch, doesn't restart
      return true; // Worker healthy
    }
  } catch (e) {
    // Health request failed (timeout, connection refused) - log for debugging
    logger.debug('SYSTEM', 'Worker health check failed', {
      error: e instanceof Error ? e.message : String(e)
    });
  }

  // Port might be in use by something else, or worker not started
  // Return false but don't throw - let caller decide how to handle
  logger.warn('SYSTEM', 'Worker not healthy, hook will proceed gracefully');
  return false;
}
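
The "set Authorization last" rule used by `workerHttpRequest` can be sketched in isolation. This is only an illustration under stated assumptions: `mergeAuthHeaders` is a hypothetical helper that does not exist in this module, and the token is passed in directly rather than read via `getAuthToken()`. It also filters differently cased `authorization` keys, since HTTP header names are case-insensitive and a plain object spread would not treat them as the same key.

```typescript
// Hypothetical helper (not part of this module): merge caller headers with a
// bearer token so that no caller-supplied Authorization header survives.
function mergeAuthHeaders(
  callerHeaders: Record<string, string> | undefined,
  token: string
): Record<string, string> {
  const merged: Record<string, string> = {};
  for (const [name, value] of Object.entries(callerHeaders ?? {})) {
    // Header names are case-insensitive, so drop every casing of "authorization".
    if (name.toLowerCase() !== 'authorization') {
      merged[name] = value;
    }
  }
  // Set the real token last so it is the only Authorization value sent.
  merged['Authorization'] = `Bearer ${token}`;
  return merged;
}

// Example: a caller tries to supply its own (lowercase) authorization header.
const exampleHeaders = mergeAuthHeaders(
  { 'Content-Type': 'application/json', authorization: 'Bearer spoofed' },
  'example-token'
);
console.log(exampleHeaders);
```

Filtering by lowercase name, rather than relying on object key order alone, keeps the result unambiguous once the record is turned into a `Headers` object, where same-named headers are combined rather than replaced.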