Compare commits
9 Commits
| Author | SHA1 | Date |
|---|---|---|
| | c2c3e3069c | |
| | 7966c6cba9 | |
| | e4e735d3ff | |
| | 780cc3894e | |
| | 8d46c00dd8 | |
| | 4ab601fc9f | |
| | 097035de6c | |
| | e788fd3676 | |
| | 44cdbec173 | |
@@ -10,7 +10,7 @@
   "plugins": [
     {
       "name": "claude-mem",
-      "version": "10.3.0",
+      "version": "10.3.2",
       "source": "./plugin",
       "description": "Persistent memory system for Claude Code - context compression across sessions"
     }
@@ -1,6 +1,7 @@
 datasets/
 node_modules/
 dist/
+!installer/dist/
 *.log
 .DS_Store
 .env
+40
-29
@@ -2,6 +2,46 @@
 
 All notable changes to claude-mem.
 
+## [v10.3.1] - 2026-02-19
+
+## Fix: Prevent Duplicate Worker Daemons and Zombie Processes
+
+Three root causes of chroma-mcp timeouts identified and fixed:
+
+### PID-based daemon guard
+Exit immediately on startup if PID file points to a live process. Prevents the race condition where hooks firing simultaneously could start multiple daemons before either wrote a PID file.
+
+### Port-based daemon guard
+Exit if port 37777 is already bound — runs before WorkerService constructor registers keepalive signal handlers that previously prevented exit on EADDRINUSE.
+
+### Guaranteed process.exit() after HTTP shutdown
+HTTP shutdown (POST /api/admin/shutdown) now calls `process.exit(0)` in a `try/finally` block. Previously, zombie workers stayed alive after shutdown, and background tasks reconnected to chroma-mcp, spawning duplicate subprocesses contending for the same data directory.
+
+## [v10.3.0] - 2026-02-18
+
+## Replace WASM Embeddings with Persistent chroma-mcp MCP Connection
+
+### Highlights
+
+- **New: ChromaMcpManager** — Singleton stdio MCP client communicating with chroma-mcp via `uvx`, replacing the previous ChromaServerManager (`npx chroma run` + `chromadb` npm + ONNX/WASM)
+- **Eliminates native binary issues** — No more segfaults, WASM embedding failures, or cross-platform install headaches
+- **Graceful subprocess lifecycle** — Wired into GracefulShutdown for clean teardown; zombie process prevention with kill-on-failure and stale `onclose` handler guards
+- **Connection backoff** — 10-second reconnect backoff prevents chroma-mcp spawn storms
+- **SQL injection guards** — Added parameterization to ChromaSync ID exclusion queries
+- **Simplified ChromaSync** — Reduced complexity by delegating embedding concerns to chroma-mcp
+
+### Breaking Changes
+
+None — backward compatible. ChromaDB data is preserved; only the connection mechanism changed.
+
+### Files Changed
+
+- `src/services/sync/ChromaMcpManager.ts` (new) — MCP client singleton
+- `src/services/sync/ChromaServerManager.ts` (deleted) — Old WASM/native approach
+- `src/services/sync/ChromaSync.ts` — Simplified to use MCP client
+- `src/services/worker-service.ts` — Updated startup sequence
+- `src/services/infrastructure/GracefulShutdown.ts` — Subprocess cleanup integration
+
 ## [v10.2.6] - 2026-02-18
 
 ## Bug Fixes
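Reviewer note on the PID-based guard described in the v10.3.1 entry: the guard depends on telling a live PID from a stale one. This diff calls `isProcessAlive` but does not show its body, so the following is only a minimal sketch of one common approach on Node or Bun, assuming a signal-0 probe (the equivalent of `kill -0`). The name `isPidAlive` is hypothetical and is not the project's function.

```typescript
// Hypothetical helper for illustration; NOT the project's isProcessAlive,
// whose implementation is not part of this diff.
// Sending signal 0 runs the existence/permission check without delivering a signal.
function isPidAlive(pid: number): boolean {
  // Reject anything that cannot be a real PID before asking the OS.
  if (!Number.isInteger(pid) || pid <= 0) return false;
  try {
    process.kill(pid, 0);
    return true;
  } catch (err) {
    // EPERM: the process exists but belongs to another user, so it is alive.
    // ESRCH (and anything else): treat the PID as not running.
    return (err as { code?: string }).code === "EPERM";
  }
}

// The current process is always alive, so this is a safe smoke check.
console.log(isPidAlive(process.pid));
```

A check like this is instant and has no HTTP dependency, which matches the intent stated in the changelog, but it cannot tell whether a reused PID still belongs to a worker.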
@@ -1411,32 +1451,3 @@ Thanks @yungweng for the detailed bug report!
 - Updated worker CLI scripts to reference worker-service.cjs directly
 - Simplified hook command configurations
 
-## [v8.2.8] - 2025-12-29
-
-## Bug Fixes
-
-- Fixed orphaned chroma-mcp processes during shutdown (#489)
-- Added graceful shutdown handling with signal handlers registered early in WorkerService lifecycle
-- Ensures ChromaSync subprocess cleanup even when interrupted during initialization
-- Removes PID file during shutdown to prevent stale process tracking
-
-## Technical Details
-
-This patch release addresses a race condition where SIGTERM/SIGINT signals arriving during ChromaSync initialization could leave orphaned chroma-mcp processes. The fix moves signal handler registration from the start() method to the constructor, ensuring cleanup handlers exist throughout the entire initialization lifecycle.
-
-**Full Changelog**: https://github.com/thedotmack/claude-mem/compare/v8.2.7...v8.2.8
-
-## [v8.2.7] - 2025-12-29
-
-## What's Changed
-
-### Token Optimizations
-- Simplified MCP server tool definitions for reduced token usage
-- Removed outdated troubleshooting and mem-search skill documentation
-- Enhanced search parameter descriptions for better clarity
-- Streamlined MCP workflows for improved efficiency
-
-This release significantly reduces the token footprint of the plugin's MCP tools and documentation.
-
-**Full Changelog**: https://github.com/thedotmack/claude-mem/compare/v8.2.6...v8.2.7
-
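Reviewer note on the "Connection backoff" highlight in the v10.3.0 entry: the changelog only states that a 10-second reconnect backoff prevents chroma-mcp spawn storms, and the ChromaMcpManager code is not part of this diff. The sketch below shows one way such a gate could work, assuming a simple "at most one attempt per window" rule. `ReconnectGate` and its injectable clock are hypothetical names introduced for illustration.

```typescript
// Hypothetical gate for illustration; NOT taken from ChromaMcpManager.
// Allows one reconnect attempt, then refuses further attempts until
// backoffMs has elapsed since the last allowed one.
class ReconnectGate {
  private lastAttemptAt: number | null = null;
  private readonly backoffMs: number;
  private readonly now: () => number;

  constructor(backoffMs: number = 10000, now: () => number = () => Date.now()) {
    this.backoffMs = backoffMs; // 10 s is the figure given in the changelog
    this.now = now;             // injectable clock so the gate can be tested
  }

  /** Returns true and records the attempt if the backoff window has passed. */
  tryAcquire(): boolean {
    const t = this.now();
    if (this.lastAttemptAt !== null && t - this.lastAttemptAt < this.backoffMs) {
      return false;
    }
    this.lastAttemptAt = t;
    return true;
  }
}

// Walk a fake clock through three attempts.
let clock = 0;
const gate = new ReconnectGate(10000, () => clock);
const first = gate.tryAcquire();  // allowed: no earlier attempt
clock = 4000;
const second = gate.tryAcquire(); // refused: only 4 s since the last attempt
clock = 10000;
const third = gate.tryAcquire();  // allowed: the full window has elapsed
console.log(first, second, third); // true false true
```

Callers that are refused would typically fail fast or reuse the existing connection rather than spawn another subprocess; which of those the project does is not visible here.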
@@ -198,7 +198,7 @@ See [Architecture Overview](https://docs.claude-mem.ai/architecture/overview) fo
 
 ## MCP Search Tools
 
-Claude-Mem provides intelligent memory search through **5 MCP tools** following a token-efficient **3-layer workflow pattern**:
+Claude-Mem provides intelligent memory search through **4 MCP tools** following a token-efficient **3-layer workflow pattern**:
 
 **The 3-Layer Workflow:**
 
@@ -211,7 +211,6 @@ Claude-Mem provides intelligent memory search through **5 MCP tools** following
 - Start with `search` to get an index of results
 - Use `timeline` to see what was happening around specific observations
 - Use `get_observations` to fetch full details for relevant IDs
-- Use `save_memory` to manually store important information
 - **~10x token savings** by filtering before fetching details
 
 **Available MCP Tools:**
@@ -219,8 +218,6 @@ Claude-Mem provides intelligent memory search through **5 MCP tools** following
 1. **`search`** - Search memory index with full-text queries, filters by type/date/project
 2. **`timeline`** - Get chronological context around a specific observation or query
 3. **`get_observations`** - Fetch full observation details by IDs (always batch multiple IDs)
-4. **`save_memory`** - Manually save a memory/observation for semantic search
-5. **`__IMPORTANT`** - Workflow documentation (always visible to Claude)
 
 **Example Usage:**
 
@@ -232,9 +229,6 @@ search(query="authentication bug", type="bugfix", limit=10)
 
 // Step 3: Fetch full details
 get_observations(ids=[123, 456])
-
-// Save important information manually
-save_memory(text="API requires auth header X-API-Key", title="API Auth")
 ```
 
 See [Search Tools Guide](https://docs.claude-mem.ai/usage/search-tools) for detailed examples.
@@ -5,7 +5,7 @@ set -euo pipefail
 # Usage: curl -fsSL https://install.cmem.ai | bash
 # or: curl -fsSL https://install.cmem.ai | bash -s -- --provider=gemini --api-key=YOUR_KEY
 
-INSTALLER_URL="https://raw.githubusercontent.com/thedotmack/claude-mem/main/installer/dist/index.js"
+INSTALLER_URL="https://install.cmem.ai/installer.js"
 
 # Colors
 RED='\033[0;31m'
File diff suppressed because it is too large
@@ -1,5 +1,8 @@
 {
   "$schema": "https://openapi.vercel.sh/vercel.json",
+  "rewrites": [
+    { "source": "/", "destination": "/install.sh" }
+  ],
   "headers": [
     {
       "source": "/(.*)\\.sh",
Vendored
+2107
File diff suppressed because it is too large
+1
-1
@@ -1,6 +1,6 @@
 {
   "name": "claude-mem",
-  "version": "10.3.0",
+  "version": "10.3.2",
   "description": "Memory compression system for Claude Code - persist context across sessions",
   "keywords": [
     "claude",
@@ -0,0 +1,52 @@
+# Fix: SessionStart Hook "startup hook error" — Worker Not Waiting
+
+## Root Cause
+
+The **installed plugin** (`~/.claude/plugins/marketplaces/thedotmack/`) is version **10.2.5** and has **none** of the recent fixes:
+
+| Fix | Repo Status | Installed Status |
+|-----|-------------|-----------------|
+| Hook group split (smart-install isolated from worker start) | In `plugin/hooks/hooks.json` | **Missing** — all 3 hooks in one group, smart-install failure blocks worker |
+| `waitForReadiness()` after spawn | In `src/services/infrastructure/HealthMonitor.ts` | **Missing** — 0 occurrences in installed `worker-service.cjs` |
+| Early `initializationCompleteFlag` (after DB+search, not MCP) | In `src/services/worker-service.ts` | **Missing** — flag set after MCP connection (5+ minute wait) |
+
+The changes exist in source code but were **never built and synced** to the installed location.
+
+---
+
+## Phase 1: Build and Sync
+
+```bash
+npm run build-and-sync
+```
+
+### Verification
+
+```bash
+# 1. Confirm waitForReadiness exists in installed build
+grep -c "waitForReadiness" ~/.claude/plugins/marketplaces/thedotmack/plugin/scripts/worker-service.cjs
+# Expected: > 0
+
+# 2. Confirm hooks.json has two SessionStart groups (the split)
+python3 -c "import json; d=json.load(open('$(echo $HOME)/.claude/plugins/marketplaces/thedotmack/plugin/hooks/hooks.json')); print('SessionStart groups:', len(d['hooks']['SessionStart']))"
+# Expected: 2
+
+# 3. Confirm initializationCompleteFlag is set before MCP connection
+grep -n "Core initialization complete" ~/.claude/plugins/marketplaces/thedotmack/plugin/scripts/worker-service.cjs | head -1
+# Expected: appears BEFORE "MCP server connected"
+```
+
+## Phase 2: Restart Worker and Test
+
+```bash
+# Stop existing worker
+bun plugin/scripts/worker-service.cjs stop
+
+# Verify stopped
+curl -s http://127.0.0.1:37777/api/health && echo "STILL RUNNING" || echo "STOPPED"
+```
+
+Then start a new Claude Code session and verify:
+- No "SessionStart:startup hook error" messages
+- Worker is running: `curl http://127.0.0.1:37777/api/health`
+- Readiness endpoint works: `curl http://127.0.0.1:37777/api/readiness`
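Reviewer note on the Phase 2 checks in the plan above: the two `curl` probes can be combined into a single state check. The sketch below is illustrative only. It assumes the worker listens on port 37777, that `/api/health` answers 200 once the HTTP server is up, and that `/api/readiness` answers 200 only after core initialization, all as described elsewhere in this diff. It further assumes that readiness returns some non-200 status before then. `classifyWorker`, `probeStatus`, and `checkWorker` are hypothetical names, not project APIs.

```typescript
type WorkerState = "down" | "alive-not-ready" | "ready";

// Pure decision step: null means the request failed outright
// (for example, connection refused because nothing is listening).
function classifyWorker(
  healthStatus: number | null,
  readinessStatus: number | null
): WorkerState {
  if (healthStatus !== 200) return "down";
  return readinessStatus === 200 ? "ready" : "alive-not-ready";
}

// Returns the HTTP status code, or null when the request itself fails.
async function probeStatus(url: string): Promise<number | null> {
  try {
    const res = await fetch(url);
    return res.status;
  } catch {
    return null;
  }
}

// Probes both endpoints in parallel and classifies the result.
async function checkWorker(port: number = 37777): Promise<WorkerState> {
  const base = `http://127.0.0.1:${port}`;
  const [health, readiness] = await Promise.all([
    probeStatus(`${base}/api/health`),
    probeStatus(`${base}/api/readiness`),
  ]);
  return classifyWorker(health, readiness);
}
```

Calling `checkWorker()` after starting a new session should report `"ready"` within a few seconds if the early readiness flag made it into the installed build; a long-lived `"alive-not-ready"` would point back at the sync problem described under Root Cause.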
@@ -1,6 +1,6 @@
 {
   "name": "claude-mem",
-  "version": "10.3.0",
+  "version": "10.3.2",
   "description": "Persistent memory system for Claude Code - seamlessly preserve context across sessions",
   "author": {
     "name": "Alex Newman"
@@ -21,7 +21,12 @@
             "type": "command",
             "command": "node \"${CLAUDE_PLUGIN_ROOT}/scripts/smart-install.js\"",
             "timeout": 300
-          },
+          }
+        ]
+      },
+      {
+        "matcher": "startup|clear|compact",
+        "hooks": [
           {
             "type": "command",
             "command": "node \"${CLAUDE_PLUGIN_ROOT}/scripts/bun-runner.js\" \"${CLAUDE_PLUGIN_ROOT}/scripts/worker-service.cjs\" start",
+1
-1
@@ -1,6 +1,6 @@
 {
   "name": "claude-mem-plugin",
-  "version": "10.3.0",
+  "version": "10.3.2",
   "private": true,
   "description": "Runtime dependencies for claude-mem bundled hooks",
   "type": "module",
Executable
BIN
Binary file not shown.
File diff suppressed because one or more lines are too long
File diff suppressed because one or more lines are too long
+275
-270
File diff suppressed because one or more lines are too long
@@ -93,20 +93,6 @@ get_observations(ids=[11131, 10942])
 
 **Returns:** Complete observation objects with title, subtitle, narrative, facts, concepts, files (~500-1000 tokens each)
 
-## Saving Memories
-
-Use the `save_memory` MCP tool to store manual observations:
-
-```
-save_memory(text="Important discovery about the auth system", title="Auth Architecture", project="my-project")
-```
-
-**Parameters:**
-
-- `text` (string, required) - Content to remember
-- `title` (string, optional) - Short title, auto-generated if omitted
-- `project` (string, optional) - Project name, defaults to "claude-mem"
-
 ## Examples
 
 **Find recent bug fixes:**
@@ -235,8 +235,8 @@ NEVER fetch full details without filtering first. 10x token savings.`,
     }
   },
   {
-    name: 'save_memory',
-    description: 'Save a manual memory/observation for semantic search. Use this to remember important information.',
+    name: 'save_observation',
+    description: 'Save an observation to the database. Params: text (required), title, project',
     inputSchema: {
       type: 'object',
       properties: {
@@ -74,8 +74,8 @@ export function renderColorContextIndex(): string[] {
     `${colors.dim}Context Index: This semantic index (titles, types, files, tokens) is usually sufficient to understand past work.${colors.reset}`,
     '',
     `${colors.dim}When you need implementation details, rationale, or debugging context:${colors.reset}`,
-    `${colors.dim} - Use MCP tools (search, get_observations) to fetch full observations on-demand${colors.reset}`,
-    `${colors.dim} - Critical types ( bugfix, decision) often need detailed fetching${colors.reset}`,
+    `${colors.dim} - Fetch by ID: get_observations([IDs]) for observations visible in this index${colors.reset}`,
+    `${colors.dim} - Search history: Use the mem-search skill for past decisions, bugs, and deeper research${colors.reset}`,
     `${colors.dim} - Trust this index over re-reading code for past decisions and learnings${colors.reset}`,
     ''
   ];
@@ -72,8 +72,8 @@ export function renderMarkdownContextIndex(): string[] {
     `**Context Index:** This semantic index (titles, types, files, tokens) is usually sufficient to understand past work.`,
     '',
     `When you need implementation details, rationale, or debugging context:`,
-    `- Use MCP tools (search, get_observations) to fetch full observations on-demand`,
-    `- Critical types ( bugfix, decision) often need detailed fetching`,
+    `- Fetch by ID: get_observations([IDs]) for observations visible in this index`,
+    `- Search history: Use the mem-search skill for past decisions, bugs, and deeper research`,
     `- Trust this index over re-reading code for past decisions and learnings`,
     ''
   ];
@@ -29,31 +29,49 @@ export async function isPortInUse(port: number): Promise<boolean> {
 }
 
 /**
- * Wait for the worker HTTP server to become responsive (liveness check)
- * Uses /api/health instead of /api/readiness because:
- * - /api/health returns 200 as soon as HTTP server is listening
- * - /api/readiness waits for full initialization (MCP connection can take 5+ minutes)
- * See: https://github.com/thedotmack/claude-mem/issues/811
- * @param port Worker port to check
- * @param timeoutMs Maximum time to wait in milliseconds
- * @returns true if worker became responsive, false if timeout
+ * Poll a localhost endpoint until it returns 200 OK or timeout.
+ * Shared implementation for liveness and readiness checks.
  */
-export async function waitForHealth(port: number, timeoutMs: number = 30000): Promise<boolean> {
+async function pollEndpointUntilOk(
+  port: number,
+  endpointPath: string,
+  timeoutMs: number,
+  retryLogMessage: string
+): Promise<boolean> {
   const start = Date.now();
   while (Date.now() - start < timeoutMs) {
     try {
       // Note: Removed AbortSignal.timeout to avoid Windows Bun cleanup issue (libuv assertion)
-      const response = await fetch(`http://127.0.0.1:${port}/api/health`);
+      const response = await fetch(`http://127.0.0.1:${port}${endpointPath}`);
       if (response.ok) return true;
     } catch (error) {
       // [ANTI-PATTERN IGNORED]: Retry loop - expected failures during startup, will retry
-      logger.debug('SYSTEM', 'Service not ready yet, will retry', { port }, error as Error);
+      logger.debug('SYSTEM', retryLogMessage, { port }, error as Error);
     }
     await new Promise(r => setTimeout(r, 500));
   }
   return false;
 }
+
+/**
+ * Wait for the worker HTTP server to become responsive (liveness check).
+ * Uses /api/health which returns 200 as soon as the HTTP server is listening.
+ * For full initialization (DB + search), use waitForReadiness() instead.
+ */
+export function waitForHealth(port: number, timeoutMs: number = 30000): Promise<boolean> {
+  return pollEndpointUntilOk(port, '/api/health', timeoutMs, 'Service not ready yet, will retry');
+}
+
+/**
+ * Wait for the worker to be fully initialized (DB + search ready).
+ * Uses /api/readiness which returns 200 only after core initialization completes.
+ * Now that initializationCompleteFlag is set after DB/search init (not MCP),
+ * this typically completes in a few seconds.
+ */
+export function waitForReadiness(port: number, timeoutMs: number = 30000): Promise<boolean> {
+  return pollEndpointUntilOk(port, '/api/readiness', timeoutMs, 'Worker not ready yet, will retry');
+}
 
 /**
  * Wait for a port to become free (no longer responding to health checks)
  * Used after shutdown to confirm the port is available for restart
@@ -10,7 +10,7 @@
 
 import path from 'path';
 import { homedir } from 'os';
-import { existsSync, writeFileSync, readFileSync, unlinkSync, mkdirSync } from 'fs';
+import { existsSync, writeFileSync, readFileSync, unlinkSync, mkdirSync, rmSync } from 'fs';
 import { exec, execSync, spawn } from 'child_process';
 import { promisify } from 'util';
 import { logger } from '../../utils/logger.js';
@@ -426,6 +426,182 @@ export async function cleanupOrphanedProcesses(): Promise<void> {
   logger.info('SYSTEM', 'Orphaned processes cleaned up', { count: pidsToKill.length });
 }
 
+// Patterns that should be killed immediately at startup (no age gate)
+// These are child processes that should not outlive their parent worker
+const AGGRESSIVE_CLEANUP_PATTERNS = ['worker-service.cjs', 'chroma-mcp'];
+
+// Patterns that keep the age-gated threshold (may be legitimately running)
+const AGE_GATED_CLEANUP_PATTERNS = ['mcp-server.cjs'];
+
+/**
+ * Aggressive startup cleanup for orphaned claude-mem processes.
+ *
+ * Unlike cleanupOrphanedProcesses() which age-gates everything at 30 minutes,
+ * this function kills worker-service.cjs and chroma-mcp processes immediately
+ * (they should not outlive their parent worker). Only mcp-server.cjs keeps
+ * the age threshold since it may be legitimately running.
+ *
+ * Called once at daemon startup.
+ */
+export async function aggressiveStartupCleanup(): Promise<void> {
+  const isWindows = process.platform === 'win32';
+  const currentPid = process.pid;
+  const pidsToKill: number[] = [];
+  const allPatterns = [...AGGRESSIVE_CLEANUP_PATTERNS, ...AGE_GATED_CLEANUP_PATTERNS];
+
+  try {
+    if (isWindows) {
+      const patternConditions = allPatterns
+        .map(p => `$_.CommandLine -like '*${p}*'`)
+        .join(' -or ');
+
+      const cmd = `powershell -NoProfile -NonInteractive -Command "Get-CimInstance Win32_Process | Where-Object { (${patternConditions}) -and $_.ProcessId -ne ${currentPid} } | Select-Object ProcessId, CommandLine, CreationDate | ConvertTo-Json"`;
+      const { stdout } = await execAsync(cmd, { timeout: HOOK_TIMEOUTS.POWERSHELL_COMMAND });
+
+      if (!stdout.trim() || stdout.trim() === 'null') {
+        logger.debug('SYSTEM', 'No orphaned claude-mem processes found (Windows)');
+        return;
+      }
+
+      const processes = JSON.parse(stdout);
+      const processList = Array.isArray(processes) ? processes : [processes];
+      const now = Date.now();
+
+      for (const proc of processList) {
+        const pid = proc.ProcessId;
+        if (!Number.isInteger(pid) || pid <= 0 || pid === currentPid) continue;
+
+        const commandLine = proc.CommandLine || '';
+        const isAggressive = AGGRESSIVE_CLEANUP_PATTERNS.some(p => commandLine.includes(p));
+
+        if (isAggressive) {
+          // Kill immediately — no age check
+          pidsToKill.push(pid);
+          logger.debug('SYSTEM', 'Found orphaned process (aggressive)', { pid, commandLine: commandLine.substring(0, 80) });
+        } else {
+          // Age-gated: only kill if older than threshold
+          const creationMatch = proc.CreationDate?.match(/\/Date\((\d+)\)\//);
+          if (creationMatch) {
+            const creationTime = parseInt(creationMatch[1], 10);
+            const ageMinutes = (now - creationTime) / (1000 * 60);
+            if (ageMinutes >= ORPHAN_MAX_AGE_MINUTES) {
+              pidsToKill.push(pid);
+              logger.debug('SYSTEM', 'Found orphaned process (age-gated)', { pid, ageMinutes: Math.round(ageMinutes) });
+            }
+          }
+        }
+      }
+    } else {
+      // Unix: Use ps with elapsed time
+      const patternRegex = allPatterns.join('|');
+      const { stdout } = await execAsync(
+        `ps -eo pid,etime,command | grep -E "${patternRegex}" | grep -v grep || true`
+      );
+
+      if (!stdout.trim()) {
+        logger.debug('SYSTEM', 'No orphaned claude-mem processes found (Unix)');
+        return;
+      }
+
+      const lines = stdout.trim().split('\n');
+      for (const line of lines) {
+        const match = line.trim().match(/^(\d+)\s+(\S+)\s+(.*)$/);
+        if (!match) continue;
+
+        const pid = parseInt(match[1], 10);
+        const etime = match[2];
+        const command = match[3];
+
+        if (!Number.isInteger(pid) || pid <= 0 || pid === currentPid) continue;
+
+        const isAggressive = AGGRESSIVE_CLEANUP_PATTERNS.some(p => command.includes(p));
+
+        if (isAggressive) {
+          // Kill immediately — no age check
+          pidsToKill.push(pid);
+          logger.debug('SYSTEM', 'Found orphaned process (aggressive)', { pid, command: command.substring(0, 80) });
+        } else {
+          // Age-gated: only kill if older than threshold
+          const ageMinutes = parseElapsedTime(etime);
+          if (ageMinutes >= ORPHAN_MAX_AGE_MINUTES) {
+            pidsToKill.push(pid);
+            logger.debug('SYSTEM', 'Found orphaned process (age-gated)', { pid, ageMinutes, command: command.substring(0, 80) });
+          }
+        }
+      }
+    }
+  } catch (error) {
+    logger.error('SYSTEM', 'Failed to enumerate orphaned processes during aggressive cleanup', {}, error as Error);
+    return;
+  }
+
+  if (pidsToKill.length === 0) {
+    return;
+  }
+
+  logger.info('SYSTEM', 'Aggressive startup cleanup: killing orphaned processes', {
+    platform: isWindows ? 'Windows' : 'Unix',
+    count: pidsToKill.length,
+    pids: pidsToKill
+  });
+
+  if (isWindows) {
+    for (const pid of pidsToKill) {
+      if (!Number.isInteger(pid) || pid <= 0) continue;
+      try {
+        execSync(`taskkill /PID ${pid} /T /F`, { timeout: HOOK_TIMEOUTS.POWERSHELL_COMMAND, stdio: 'ignore' });
+      } catch (error) {
+        logger.debug('SYSTEM', 'Failed to kill process, may have already exited', { pid }, error as Error);
+      }
+    }
+  } else {
+    for (const pid of pidsToKill) {
+      try {
+        process.kill(pid, 'SIGKILL');
+      } catch (error) {
+        logger.debug('SYSTEM', 'Process already exited', { pid }, error as Error);
+      }
+    }
+  }
+
+  logger.info('SYSTEM', 'Aggressive startup cleanup complete', { count: pidsToKill.length });
+}
+
+const CHROMA_MIGRATION_MARKER_FILENAME = '.chroma-cleaned-v10.3';
+
+/**
+ * One-time chroma data wipe for users upgrading from versions with duplicate
+ * worker bugs that could corrupt chroma data. Since chroma is always rebuildable
+ * from SQLite (via backfillAllProjects), this is safe.
+ *
+ * Checks for a marker file. If absent, wipes ~/.claude-mem/chroma/ and writes
+ * the marker. If present, skips. Idempotent.
+ *
+ * @param dataDirectory - Override for DATA_DIR (used in tests)
+ */
+export function runOneTimeChromaMigration(dataDirectory?: string): void {
+  const effectiveDataDir = dataDirectory ?? DATA_DIR;
+  const markerPath = path.join(effectiveDataDir, CHROMA_MIGRATION_MARKER_FILENAME);
+  const chromaDir = path.join(effectiveDataDir, 'chroma');
+
+  if (existsSync(markerPath)) {
+    logger.debug('SYSTEM', 'Chroma migration marker exists, skipping wipe');
+    return;
+  }
+
+  logger.warn('SYSTEM', 'Running one-time chroma data wipe (upgrade from pre-v10.3)', { chromaDir });
+
+  if (existsSync(chromaDir)) {
+    rmSync(chromaDir, { recursive: true, force: true });
+    logger.info('SYSTEM', 'Chroma data directory removed', { chromaDir });
+  }
+
+  // Write marker file to prevent future wipes
+  mkdirSync(effectiveDataDir, { recursive: true });
+  writeFileSync(markerPath, new Date().toISOString());
+  logger.info('SYSTEM', 'Chroma migration marker written', { markerPath });
+}
+
 /**
  * Spawn a detached daemon process
  * Returns the child PID or undefined if spawn failed
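Reviewer note on the Unix branch of `aggressiveStartupCleanup` above: the age gate relies on `parseElapsedTime(etime)`, whose body is outside this diff. For readers unfamiliar with the `ps` `etime` column, its format is `[[dd-]hh:]mm:ss`. The sketch below shows one way to convert it to minutes. `etimeToMinutes` is a hypothetical stand-in, and both the whole-minute result and the `-1` return for malformed input are assumptions rather than the project's behavior.

```typescript
// Hypothetical stand-in for parseElapsedTime (not shown in this diff).
// Converts a ps `etime` value of the form [[dd-]hh:]mm:ss to whole minutes.
// Returns -1 for input that does not match, so an age comparison such as
// `age >= threshold` fails closed and the process is left alone.
function etimeToMinutes(etime: string): number {
  const match = etime.trim().match(/^(?:(\d+)-)?(?:(\d+):)?(\d+):(\d+)$/);
  if (!match) return -1;
  const days = parseInt(match[1] ?? "0", 10);   // optional "dd-" prefix
  const hours = parseInt(match[2] ?? "0", 10);  // optional "hh:" prefix
  const minutes = parseInt(match[3], 10);       // seconds (match[4]) are dropped
  return days * 24 * 60 + hours * 60 + minutes;
}

console.log(etimeToMinutes("05:30"));      // 5
console.log(etimeToMinutes("01:05:30"));   // 65
console.log(etimeToMinutes("2-01:05:30")); // 2945
```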
@@ -248,8 +248,14 @@ export class Server {
         process.send!({ type: 'restart' });
       } else {
         // Unix or standalone Windows - handle restart ourselves
+        // The spawner (ensureWorkerStarted/restart command) handles spawning the new daemon.
+        // This process just needs to shut down and exit.
         setTimeout(async () => {
-          await this.options.onRestart();
+          try {
+            await this.options.onRestart();
+          } finally {
+            process.exit(0);
+          }
         }, 100);
       }
     });
@@ -268,7 +274,14 @@ export class Server {
       } else {
         // Unix or standalone Windows - handle shutdown ourselves
         setTimeout(async () => {
-          await this.options.onShutdown();
+          try {
+            await this.options.onShutdown();
+          } finally {
+            // CRITICAL: Exit the process after shutdown completes (or fails).
+            // Without this, the daemon stays alive as a zombie — background tasks
+            // (backfill, reconnects) keep running and respawn chroma-mcp subprocesses.
+            process.exit(0);
+          }
         }, 100);
       }
     });
@@ -69,14 +69,17 @@ import {
   readPidFile,
   removePidFile,
   getPlatformTimeout,
-  cleanupOrphanedProcesses,
+  aggressiveStartupCleanup,
+  runOneTimeChromaMigration,
   cleanStalePidFile,
+  isProcessAlive,
   spawnDaemon,
   createSignalHandler
 } from './infrastructure/ProcessManager.js';
 import {
   isPortInUse,
   waitForHealth,
+  waitForReadiness,
   waitForPortFree,
   httpShutdown,
   checkVersionMatch
@@ -367,7 +370,7 @@ export class WorkerService {
    */
   private async initializeBackground(): Promise<void> {
     try {
-      await cleanupOrphanedProcesses();
+      await aggressiveStartupCleanup();
 
       // Load mode configuration
       const { ModeManager } = await import('./domain/ModeManager.js');
@@ -376,6 +379,12 @@ export class WorkerService {
 
       const settings = SettingsDefaultsManager.loadFromFile(USER_SETTINGS_PATH);
 
+      // One-time chroma wipe for users upgrading from versions with duplicate worker bugs.
+      // Only runs in local mode (chroma is local-only). Backfill at line ~414 rebuilds from SQLite.
+      if (settings.CLAUDE_MEM_MODE === 'local' || !settings.CLAUDE_MEM_MODE) {
+        runOneTimeChromaMigration();
+      }
+
       // Initialize ChromaMcpManager (lazy - connects on first use via ChromaSync)
       this.chromaMcpManager = ChromaMcpManager.getInstance();
       logger.info('SYSTEM', 'ChromaMcpManager initialized (lazy - connects on first use)');
@@ -408,6 +417,13 @@ export class WorkerService {
       this.server.registerRoutes(this.searchRoutes);
       logger.info('WORKER', 'SearchManager initialized and search routes registered');
 
+      // DB and search are ready — mark initialization complete so hooks can proceed.
+      // MCP connection is tracked separately via mcpReady and is NOT required for
+      // the worker to serve context/search requests.
+      this.initializationCompleteFlag = true;
+      this.resolveInitialization();
+      logger.info('SYSTEM', 'Core initialization complete (DB + search ready)');
+
       // Auto-backfill Chroma for all projects if out of sync with SQLite (fire-and-forget)
       if (this.chromaMcpManager) {
         ChromaSync.backfillAllProjects().then(() => {
@@ -433,11 +449,7 @@ export class WorkerService {
 
       await Promise.race([mcpConnectionPromise, timeoutPromise]);
       this.mcpReady = true;
-      logger.success('WORKER', 'Connected to MCP server');
-
-      this.initializationCompleteFlag = true;
-      this.resolveInitialization();
-      logger.info('SYSTEM', 'Background initialization complete');
+      logger.success('WORKER', 'MCP server connected');
 
       // Start orphan reaper to clean up zombie processes (Issue #737)
       this.stopOrphanReaper = startOrphanReaper(() => {
@@ -937,6 +949,13 @@ async function ensureWorkerStarted(port: number): Promise<boolean> {
     return false;
   }
 
+  // Health passed (HTTP listening). Now wait for DB + search initialization
+  // so hooks that run immediately after can actually use the worker.
+  const ready = await waitForReadiness(port, getPlatformTimeout(HOOK_TIMEOUTS.READINESS_WAIT));
+  if (!ready) {
+    logger.warn('SYSTEM', 'Worker is alive but readiness timed out — proceeding anyway');
+  }
+
   clearWorkerSpawnAttempted();
   logger.info('SYSTEM', 'Worker started successfully');
   return true;
@@ -1097,6 +1116,28 @@ async function main() {
 
     case '--daemon':
     default: {
+      // GUARD 1: Refuse to start if another worker is already alive (PID check).
+      // Instant check (kill -0) — no HTTP dependency.
+      const existingPidInfo = readPidFile();
+      if (existingPidInfo && isProcessAlive(existingPidInfo.pid)) {
+        logger.info('SYSTEM', 'Worker already running (PID alive), refusing to start duplicate', {
+          existingPid: existingPidInfo.pid,
+          existingPort: existingPidInfo.port,
+          startedAt: existingPidInfo.startedAt
+        });
+        process.exit(0);
+      }
+
+      // GUARD 2: Refuse to start if the port is already bound.
+      // Catches the race where two daemons start simultaneously before
+      // either writes a PID file. Must run BEFORE constructing WorkerService
+      // because the constructor registers signal handlers and timers that
+      // prevent the process from exiting even if listen() fails later.
+      if (await isPortInUse(port)) {
+        logger.info('SYSTEM', 'Port already in use, refusing to start duplicate', { port });
+        process.exit(0);
+      }
+
       // Prevent daemon from dying silently on unhandled errors.
       // The HTTP server can continue serving even if a background task throws.
       process.on('unhandledRejection', (reason) => {
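Reviewer note on GUARD 2 above: the guard calls `isPortInUse(port)`, whose body is not part of this diff and may work differently (for example by probing the health endpoint). As a point of comparison, the sketch below shows a bind-based probe using Node's `net` module. `isPortBound` is a hypothetical name, and the loopback host default is an assumption.

```typescript
import * as net from "node:net";

// Hypothetical probe for illustration; NOT the project's isPortInUse.
// Tries to bind the port briefly: EADDRINUSE means another process holds it.
function isPortBound(port: number, host: string = "127.0.0.1"): Promise<boolean> {
  return new Promise((resolve) => {
    const probe = net.createServer();
    probe.once("error", (err: NodeJS.ErrnoException) => {
      // Any other error (e.g. EACCES) is reported as "not bound" here;
      // a stricter caller might want to surface it instead.
      resolve(err.code === "EADDRINUSE");
    });
    probe.once("listening", () => {
      // The bind succeeded, so nobody else had the port; release it again.
      probe.close(() => resolve(false));
    });
    probe.listen(port, host);
  });
}

// Example: check the default worker port mentioned in the changelog.
isPortBound(37777).then((bound) => {
  console.log(bound ? "port busy, a worker is probably running" : "port free");
});
```

Any probe of this kind is a check-then-act step, so two daemons starting at the same instant could in principle both see the port as free; the eventual `listen()` call remains the final arbiter, which is presumably why the diff pairs this guard with the PID check and the guaranteed `process.exit()`.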
@@ -2,6 +2,7 @@ export const HOOK_TIMEOUTS = {
   DEFAULT: 300000, // Standard HTTP timeout (5 min for slow systems)
   HEALTH_CHECK: 3000, // Worker health check (3s — healthy worker responds in <100ms)
   POST_SPAWN_WAIT: 5000, // Wait for daemon to start after spawn (starts in <1s on Linux)
+  READINESS_WAIT: 30000, // Wait for DB + search init after spawn (typically <5s)
   PORT_IN_USE_WAIT: 3000, // Wait when port occupied but health failing
   WORKER_STARTUP_WAIT: 1000,
   PRE_RESTART_SETTLE_DELAY: 2000, // Give files time to sync before restart
@@ -1,6 +1,7 @@
 import { describe, it, expect, beforeEach, afterEach } from 'bun:test';
-import { existsSync, readFileSync } from 'fs';
+import { existsSync, readFileSync, mkdirSync, writeFileSync, rmSync } from 'fs';
 import { homedir } from 'os';
+import { tmpdir } from 'os';
 import path from 'path';
 import {
   writePidFile,
@@ -12,6 +13,7 @@ import {
   cleanStalePidFile,
   spawnDaemon,
   resolveWorkerRuntimePath,
+  runOneTimeChromaMigration,
   type PidInfo
 } from '../../src/services/infrastructure/index.js';

@@ -32,7 +34,6 @@ describe('ProcessManager', () => {
   afterEach(() => {
     // Restore original PID file or remove test one
     if (originalPidContent !== null) {
-      const { writeFileSync } = require('fs');
       writeFileSync(PID_FILE, originalPidContent);
       originalPidContent = null;
     } else {
@@ -105,7 +106,6 @@ describe('ProcessManager', () => {
     });

     it('should return null for corrupted JSON', () => {
-      const { writeFileSync } = require('fs');
       writeFileSync(PID_FILE, 'not valid json {{{');

       const result = readPidFile();
@@ -415,4 +415,53 @@ describe('ProcessManager', () => {
       // This is a logic verification test — actual signal delivery is tested manually
     });
   });
+
+  describe('runOneTimeChromaMigration', () => {
+    let testDataDir: string;
+
+    beforeEach(() => {
+      testDataDir = path.join(tmpdir(), `claude-mem-test-${Date.now()}-${Math.random().toString(36).slice(2)}`);
+      mkdirSync(testDataDir, { recursive: true });
+    });
+
+    afterEach(() => {
+      rmSync(testDataDir, { recursive: true, force: true });
+    });
+
+    it('should wipe chroma directory and write marker file', () => {
+      // Create a fake chroma directory with data
+      const chromaDir = path.join(testDataDir, 'chroma');
+      mkdirSync(chromaDir, { recursive: true });
+      writeFileSync(path.join(chromaDir, 'test-data.bin'), 'fake chroma data');
+
+      runOneTimeChromaMigration(testDataDir);
+
+      // Chroma dir should be gone
+      expect(existsSync(chromaDir)).toBe(false);
+      // Marker file should exist
+      expect(existsSync(path.join(testDataDir, '.chroma-cleaned-v10.3'))).toBe(true);
+    });
+
+    it('should skip when marker file already exists (idempotent)', () => {
+      // Write marker file first
+      writeFileSync(path.join(testDataDir, '.chroma-cleaned-v10.3'), 'already done');
+
+      // Create a chroma directory that should NOT be wiped
+      const chromaDir = path.join(testDataDir, 'chroma');
+      mkdirSync(chromaDir, { recursive: true });
+      writeFileSync(path.join(chromaDir, 'important.bin'), 'should survive');
+
+      runOneTimeChromaMigration(testDataDir);
+
+      // Chroma dir should still exist (migration was skipped)
+      expect(existsSync(chromaDir)).toBe(true);
+      expect(existsSync(path.join(chromaDir, 'important.bin'))).toBe(true);
+    });
+
+    it('should handle missing chroma directory gracefully', () => {
+      // No chroma dir exists — should just write marker without error
+      expect(() => runOneTimeChromaMigration(testDataDir)).not.toThrow();
+      expect(existsSync(path.join(testDataDir, '.chroma-cleaned-v10.3'))).toBe(true);
+    });
+  });
 });
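
The three tests above pin down the contract of `runOneTimeChromaMigration`: wipe `chroma/` once, write a `.chroma-cleaned-v10.3` marker, do nothing when the marker already exists, and tolerate a missing `chroma/` directory. The implementation itself is not in this section, so the following is a sketch inferred only from those assertions. The function name `migrateChromaOnceSketch` and the marker's contents (a timestamp) are assumptions.

```typescript
import * as fs from 'fs';
import * as os from 'os';
import * as path from 'path';

const MARKER_NAME = '.chroma-cleaned-v10.3'; // marker file name taken from the tests above

// Hypothetical marker-guarded one-time cleanup, not the project's actual code.
function migrateChromaOnceSketch(dataDir: string): void {
  const markerPath = path.join(dataDir, MARKER_NAME);
  if (fs.existsSync(markerPath)) return; // already migrated: leave chroma/ alone

  // force: true makes a missing chroma/ directory a no-op instead of an error.
  fs.rmSync(path.join(dataDir, 'chroma'), { recursive: true, force: true });
  fs.mkdirSync(dataDir, { recursive: true });
  fs.writeFileSync(markerPath, new Date().toISOString());
}

// Usage against a throwaway directory.
const demoDir = fs.mkdtempSync(path.join(os.tmpdir(), 'claude-mem-sketch-'));
fs.mkdirSync(path.join(demoDir, 'chroma'));
migrateChromaOnceSketch(demoDir); // first run: removes chroma/ and writes the marker
fs.mkdirSync(path.join(demoDir, 'chroma'));
migrateChromaOnceSketch(demoDir); // second run: marker present, chroma/ is kept
```

Writing the marker only after the removal means an interrupted first run is retried on the next start rather than being recorded as done.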