chore: bump version to 10.3.2

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
fix: rename save_memory and fix MCP search instructions + startup hook (#1210)
2026-02-23 03:32:22 -05:00 · 2026-02-23 03:30:31 -05:00 · 2026-02-19 22:39:36 -05:00 · 2026-02-19 22:08:43 -05:00 · 2026-02-19 22:06:05 -05:00 · 2026-02-18 20:12:46 -05:00
24 changed files with 4633 additions and 343 deletions
@@ -10,7 +10,7 @@
  "plugins": [
    {
      "name": "claude-mem",
-      "version": "10.3.1",
+      "version": "10.3.2",
      "source": "./plugin",
      "description": "Persistent memory system for Claude Code - context compression across sessions"
    }
@@ -1,6 +1,7 @@
 datasets/
 node_modules/
 dist/
+!installer/dist/
 *.log
 .DS_Store
 .env
@@ -2,6 +2,21 @@

 All notable changes to claude-mem.

+## [v10.3.1] - 2026-02-19
+
+## Fix: Prevent Duplicate Worker Daemons and Zombie Processes
+
+Three root causes of chroma-mcp timeouts identified and fixed:
+
+### PID-based daemon guard
+Exit immediately on startup if the PID file points to a live process. This prevents the race condition where hooks firing simultaneously could start multiple daemons before either had written a PID file.
+
+### Port-based daemon guard
+Exit if port 37777 is already bound. This check runs before the WorkerService constructor registers the keepalive signal handlers that previously prevented the process from exiting on EADDRINUSE.
+
+### Guaranteed process.exit() after HTTP shutdown
+HTTP shutdown (POST /api/admin/shutdown) now calls `process.exit(0)` in a `try/finally` block. Previously, zombie workers stayed alive after shutdown, and background tasks reconnected to chroma-mcp, spawning duplicate subprocesses contending for the same data directory.
+
 ## [v10.3.0] - 2026-02-18

 ## Replace WASM Embeddings with Persistent chroma-mcp MCP Connection
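Below is a minimal sketch of the PID-based guard described in the v10.3.1 entry. It assumes the PID file holds a bare integer; the real claude-mem PID file format and path are not shown in this diff. `isProcessAlive` and `daemonAlreadyRunning` are hypothetical names. The check relies on Node's documented behavior that `process.kill(pid, 0)` tests whether a process exists without actually sending a signal.

```typescript
import { readFileSync } from "node:fs";

// Hypothetical helper: true if a process with this PID currently exists.
// Signal 0 performs only the existence/permission check.
function isProcessAlive(pid: number): boolean {
  // PID 0 or negative values would target a process group, so reject them.
  if (!Number.isInteger(pid) || pid <= 0) return false;
  try {
    process.kill(pid, 0);
    return true;
  } catch (err) {
    // EPERM: the process exists but is owned by another user.
    return (err as NodeJS.ErrnoException).code === "EPERM";
  }
}

// Hypothetical guard. ASSUMPTION: the PID file contains a plain integer.
function daemonAlreadyRunning(pidFilePath: string): boolean {
  let raw: string;
  try {
    raw = readFileSync(pidFilePath, "utf8");
  } catch {
    return false; // No readable PID file: no recorded daemon.
  }
  return isProcessAlive(Number.parseInt(raw.trim(), 10));
}
```

In this sketch, a `true` result before the `WorkerService` constructor runs would let startup call `process.exit(0)` before any signal handlers exist. The port-based guard could take the same shape using the existing `isPortInUse(37777)` helper.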
@@ -1436,18 +1451,3 @@ Thanks @yungweng for the detailed bug report!
 - Updated worker CLI scripts to reference worker-service.cjs directly
 - Simplified hook command configurations

-## [v8.2.8] - 2025-12-29
-
-## Bug Fixes
-
-- Fixed orphaned chroma-mcp processes during shutdown (#489)
-  - Added graceful shutdown handling with signal handlers registered early in WorkerService lifecycle
-  - Ensures ChromaSync subprocess cleanup even when interrupted during initialization
-  - Removes PID file during shutdown to prevent stale process tracking
-
-## Technical Details
-
-This patch release addresses a race condition where SIGTERM/SIGINT signals arriving during ChromaSync initialization could leave orphaned chroma-mcp processes. The fix moves signal handler registration from the start() method to the constructor, ensuring cleanup handlers exist throughout the entire initialization lifecycle.
-
-**Full Changelog**: https://github.com/thedotmack/claude-mem/compare/v8.2.7...v8.2.8
-
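The guaranteed-exit behavior from the v10.3.1 entry can be sketched like this. `handleShutdownRequest` is a hypothetical name. `cleanup` stands in for whatever teardown the worker performs. `exit` is injected only so the sketch can be exercised without terminating the process; the changelog describes a plain `process.exit(0)` call inside `try/finally`.

```typescript
// Hypothetical handler for POST /api/admin/shutdown.
// `cleanup` is a placeholder for stopping background tasks, disconnecting chroma-mcp, etc.
async function handleShutdownRequest(
  cleanup: () => Promise<void>,
  exit: (code: number) => void = (code) => process.exit(code)
): Promise<void> {
  try {
    await cleanup();
  } finally {
    // Runs whether cleanup resolves or throws, so the worker cannot linger
    // as a zombie whose background tasks reconnect to chroma-mcp.
    exit(0);
  }
}
```

The key design point is that `finally` does not depend on cleanup succeeding. A single failing teardown step can no longer keep the process alive.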
@@ -198,7 +198,7 @@ See [Architecture Overview](https://docs.claude-mem.ai/architecture/overview) fo

 ## MCP Search Tools

-Claude-Mem provides intelligent memory search through **5 MCP tools** following a token-efficient **3-layer workflow pattern**:
+Claude-Mem provides intelligent memory search through **4 MCP tools** following a token-efficient **3-layer workflow pattern**:

 **The 3-Layer Workflow:**

@@ -211,7 +211,6 @@ Claude-Mem provides intelligent memory search through **5 MCP tools** following
 - Start with `search` to get an index of results
 - Use `timeline` to see what was happening around specific observations
 - Use `get_observations` to fetch full details for relevant IDs
-- Use `save_memory` to manually store important information
 - **~10x token savings** by filtering before fetching details

 **Available MCP Tools:**
@@ -219,8 +218,6 @@ Claude-Mem provides intelligent memory search through **5 MCP tools** following
 1. **`search`** - Search memory index with full-text queries, filters by type/date/project
 2. **`timeline`** - Get chronological context around a specific observation or query
 3. **`get_observations`** - Fetch full observation details by IDs (always batch multiple IDs)
-4. **`save_memory`** - Manually save a memory/observation for semantic search
-5. **`__IMPORTANT`** - Workflow documentation (always visible to Claude)

 **Example Usage:**

@@ -232,9 +229,6 @@ search(query="authentication bug", type="bugfix", limit=10)

 // Step 3: Fetch full details
 get_observations(ids=[123, 456])
-
-// Save important information manually
-save_memory(text="API requires auth header X-API-Key", title="API Auth")
 ```

 See [Search Tools Guide](https://docs.claude-mem.ai/usage/search-tools) for detailed examples.
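The filter-before-fetch idea behind the 3-layer workflow can be sketched against a hypothetical client interface. `MemoryTools`, `IndexEntry`, `Observation` and `searchThenFetch` are illustrative names and shapes, not the MCP tools' actual schemas. The `timeline` step is omitted for brevity.

```typescript
// Illustrative shapes only (hypothetical); the real tools return richer objects.
interface IndexEntry { id: number; title: string; type: string }
interface Observation extends IndexEntry { narrative: string }

interface MemoryTools {
  search(query: string, opts: { type?: string; limit?: number }): Promise<IndexEntry[]>;
  getObservations(ids: number[]): Promise<Observation[]>;
}

// Layer 1: cheap index search. Layer 3: one batched fetch of only the IDs
// that survive filtering, so full details are never fetched blindly.
async function searchThenFetch(
  tools: MemoryTools,
  query: string,
  opts: { type?: string; limit?: number },
  keep: (entry: IndexEntry) => boolean
): Promise<Observation[]> {
  const index = await tools.search(query, opts);
  const ids = index.filter(keep).map((entry) => entry.id);
  if (ids.length === 0) return []; // Nothing relevant: skip the expensive call.
  return tools.getObservations(ids); // Batch all IDs in a single request.
}
```

Batching matters because each full observation costs far more tokens than its index entry. Fetching only the filtered IDs in one call is where the claimed savings come from.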
@@ -5,7 +5,7 @@ set -euo pipefail
 # Usage: curl -fsSL https://install.cmem.ai | bash
 #   or:  curl -fsSL https://install.cmem.ai | bash -s -- --provider=gemini --api-key=YOUR_KEY

-INSTALLER_URL="https://raw.githubusercontent.com/thedotmack/claude-mem/main/installer/dist/index.js"
+INSTALLER_URL="https://install.cmem.ai/installer.js"

 # Colors
 RED='\033[0;31m'
@@ -1,5 +1,8 @@
 {
  "$schema": "https://openapi.vercel.sh/vercel.json",
+  "rewrites": [
+    { "source": "/", "destination": "/install.sh" }
+  ],
  "headers": [
    {
      "source": "/(.*)\\.sh",
@@ -1,6 +1,6 @@
 {
  "name": "claude-mem",
-  "version": "10.3.1",
+  "version": "10.3.2",
  "description": "Memory compression system for Claude Code - persist context across sessions",
  "keywords": [
    "claude",
@@ -0,0 +1,52 @@
+# Fix: SessionStart Hook "startup hook error" — Worker Not Waiting
+
+## Root Cause
+
+The **installed plugin** (`~/.claude/plugins/marketplaces/thedotmack/`) is version **10.2.5** and has **none** of the recent fixes:
+
+| Fix | Repo Status | Installed Status |
+|-----|-------------|-----------------|
+| Hook group split (smart-install isolated from worker start) | In `plugin/hooks/hooks.json` | **Missing** — all 3 hooks in one group, smart-install failure blocks worker |
+| `waitForReadiness()` after spawn | In `src/services/infrastructure/HealthMonitor.ts` | **Missing** — 0 occurrences in installed `worker-service.cjs` |
+| Early `initializationCompleteFlag` (after DB+search, not MCP) | In `src/services/worker-service.ts` | **Missing** — flag set after MCP connection (5+ minute wait) |
+
+The changes exist in source code but were **never built and synced** to the installed location.
+
+---
+
+## Phase 1: Build and Sync
+
+```bash
+npm run build-and-sync
+```
+
+### Verification
+
+```bash
+# 1. Confirm waitForReadiness exists in installed build
+grep -c "waitForReadiness" ~/.claude/plugins/marketplaces/thedotmack/plugin/scripts/worker-service.cjs
+# Expected: > 0
+
+# 2. Confirm hooks.json has two SessionStart groups (the split)
+python3 -c "import json; d=json.load(open('$HOME/.claude/plugins/marketplaces/thedotmack/plugin/hooks/hooks.json')); print('SessionStart groups:', len(d['hooks']['SessionStart']))"
+# Expected: 2
+
+# 3. Confirm initializationCompleteFlag is set before MCP connection
+grep -o -e "Core initialization complete" -e "MCP server connected" ~/.claude/plugins/marketplaces/thedotmack/plugin/scripts/worker-service.cjs | head -1
+# Expected: "Core initialization complete" (i.e. it appears BEFORE "MCP server connected")
+```
+
+## Phase 2: Restart Worker and Test
+
+```bash
+# Stop existing worker
+bun plugin/scripts/worker-service.cjs stop
+
+# Verify stopped
+curl -s http://127.0.0.1:37777/api/health && echo "STILL RUNNING" || echo "STOPPED"
+```
+
+Then start a new Claude Code session and verify:
+- No "SessionStart:startup hook error" messages
+- Worker is running: `curl http://127.0.0.1:37777/api/health`
+- Readiness endpoint works: `curl http://127.0.0.1:37777/api/readiness`
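The hook group split in the table above can be illustrated with a simplified model. The model assumes, as the commit message states, that a non-zero exit from one command stops the later commands in the same hook group, while separate groups run independently. `Hook` and `runGroups` are hypothetical and do not reproduce Claude Code's actual hook runner.

```typescript
// Simplified model (hypothetical): each hook returns an exit code; 0 means success.
type Hook = { name: string; run: () => number };

// ASSUMED semantics, per the commit message: within a group, the first
// non-zero exit skips the remaining hooks; groups do not affect each other.
function runGroups(groups: Hook[][]): string[] {
  const ran: string[] = [];
  for (const group of groups) {
    for (const hook of group) {
      ran.push(hook.name);
      if (hook.run() !== 0) break; // Skip the rest of this group only.
    }
  }
  return ran;
}

const failingInstall: Hook = { name: "smart-install", run: () => 1 };
const workerStart: Hook = { name: "worker-start", run: () => 0 };

// Before the fix: one group, so the failed install blocks worker start.
const before = runGroups([[failingInstall, workerStart]]); // ["smart-install"]
// After the fix: separate groups, so worker start still runs.
const after = runGroups([[failingInstall], [workerStart]]); // ["smart-install", "worker-start"]
```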
@@ -1,6 +1,6 @@
 {
  "name": "claude-mem",
-  "version": "10.3.1",
+  "version": "10.3.2",
  "description": "Persistent memory system for Claude Code - seamlessly preserve context across sessions",
  "author": {
    "name": "Alex Newman"
@@ -21,7 +21,12 @@
            "type": "command",
            "command": "node \"${CLAUDE_PLUGIN_ROOT}/scripts/smart-install.js\"",
            "timeout": 300
-          },
+          }
+        ]
+      },
+      {
+        "matcher": "startup|clear|compact",
+        "hooks": [
          {
            "type": "command",
            "command": "node \"${CLAUDE_PLUGIN_ROOT}/scripts/bun-runner.js\" \"${CLAUDE_PLUGIN_ROOT}/scripts/worker-service.cjs\" start",
@@ -1,6 +1,6 @@
 {
  "name": "claude-mem-plugin",
-  "version": "10.3.1",
+  "version": "10.3.2",
  "private": true,
  "description": "Runtime dependencies for claude-mem bundled hooks",
  "type": "module",
@@ -93,20 +93,6 @@ get_observations(ids=[11131, 10942])

 **Returns:** Complete observation objects with title, subtitle, narrative, facts, concepts, files (~500-1000 tokens each)

-## Saving Memories
-
-Use the `save_memory` MCP tool to store manual observations:
-
-```
-save_memory(text="Important discovery about the auth system", title="Auth Architecture", project="my-project")
-```
-
-**Parameters:**
-
-- `text` (string, required) - Content to remember
-- `title` (string, optional) - Short title, auto-generated if omitted
-- `project` (string, optional) - Project name, defaults to "claude-mem"
-
 ## Examples

 **Find recent bug fixes:**
@@ -235,8 +235,8 @@ NEVER fetch full details without filtering first. 10x token savings.`,
    }
  },
  {
-    name: 'save_memory',
-    description: 'Save a manual memory/observation for semantic search. Use this to remember important information.',
+    name: 'save_observation',
+    description: 'Save an observation to the database. Params: text (required), title, project',
    inputSchema: {
      type: 'object',
      properties: {
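Below is a sketch of what the renamed tool's definition and argument check might look like. The parameter list (`text` required; `title` and `project` optional) comes from the new description string. The property descriptions and the `validateSaveObservationArgs` helper are assumptions, since the rest of the schema is cut off in this diff.

```typescript
// ASSUMED JSON Schema for save_observation: only the parameter names and
// which one is required are taken from the tool description.
const saveObservationTool = {
  name: "save_observation",
  description: "Save an observation to the database. Params: text (required), title, project",
  inputSchema: {
    type: "object",
    properties: {
      text: { type: "string", description: "Content to save" },
      title: { type: "string", description: "Short title (optional)" },
      project: { type: "string", description: "Project name (optional)" },
    },
    required: ["text"],
  },
} as const;

interface SaveObservationArgs { text: string; title?: string; project?: string }

// Hypothetical runtime check mirroring the schema above.
// Rejecting blank text is an extra assumption, not stated in the source.
function validateSaveObservationArgs(args: unknown): args is SaveObservationArgs {
  if (typeof args !== "object" || args === null) return false;
  const a = args as Record<string, unknown>;
  if (typeof a.text !== "string" || a.text.trim() === "") return false;
  for (const key of ["title", "project"] as const) {
    if (a[key] !== undefined && typeof a[key] !== "string") return false;
  }
  return true;
}
```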
@@ -74,8 +74,8 @@ export function renderColorContextIndex(): string[] {
    `${colors.dim}Context Index: This semantic index (titles, types, files, tokens) is usually sufficient to understand past work.${colors.reset}`,
    '',
    `${colors.dim}When you need implementation details, rationale, or debugging context:${colors.reset}`,
-    `${colors.dim}  - Use MCP tools (search, get_observations) to fetch full observations on-demand${colors.reset}`,
-    `${colors.dim}  - Critical types ( bugfix, decision) often need detailed fetching${colors.reset}`,
+    `${colors.dim}  - Fetch by ID: get_observations([IDs]) for observations visible in this index${colors.reset}`,
+    `${colors.dim}  - Search history: Use the mem-search skill for past decisions, bugs, and deeper research${colors.reset}`,
    `${colors.dim}  - Trust this index over re-reading code for past decisions and learnings${colors.reset}`,
    ''
  ];
@@ -72,8 +72,8 @@ export function renderMarkdownContextIndex(): string[] {
    `**Context Index:** This semantic index (titles, types, files, tokens) is usually sufficient to understand past work.`,
    '',
    `When you need implementation details, rationale, or debugging context:`,
-    `- Use MCP tools (search, get_observations) to fetch full observations on-demand`,
-    `- Critical types ( bugfix, decision) often need detailed fetching`,
+    `- Fetch by ID: get_observations([IDs]) for observations visible in this index`,
+    `- Search history: Use the mem-search skill for past decisions, bugs, and deeper research`,
    `- Trust this index over re-reading code for past decisions and learnings`,
    ''
  ];
@@ -29,31 +29,49 @@ export async function isPortInUse(port: number): Promise<boolean> {
 }

 /**
- * Wait for the worker HTTP server to become responsive (liveness check)
- * Uses /api/health instead of /api/readiness because:
- * - /api/health returns 200 as soon as HTTP server is listening
- * - /api/readiness waits for full initialization (MCP connection can take 5+ minutes)
- * See: https://github.com/thedotmack/claude-mem/issues/811
- * @param port Worker port to check
- * @param timeoutMs Maximum time to wait in milliseconds
- * @returns true if worker became responsive, false if timeout
+ * Poll a localhost endpoint until it returns 200 OK or timeout.
+ * Shared implementation for liveness and readiness checks.
 */
-export async function waitForHealth(port: number, timeoutMs: number = 30000): Promise<boolean> {
+async function pollEndpointUntilOk(
+  port: number,
+  endpointPath: string,
+  timeoutMs: number,
+  retryLogMessage: string
+): Promise<boolean> {
  const start = Date.now();
  while (Date.now() - start < timeoutMs) {
    try {
      // Note: Removed AbortSignal.timeout to avoid Windows Bun cleanup issue (libuv assertion)
-      const response = await fetch(`http://127.0.0.1:${port}/api/health`);
+      const response = await fetch(`http://127.0.0.1:${port}${endpointPath}`);
      if (response.ok) return true;
    } catch (error) {
      // [ANTI-PATTERN IGNORED]: Retry loop - expected failures during startup, will retry
-      logger.debug('SYSTEM', 'Service not ready yet, will retry', { port }, error as Error);
+      logger.debug('SYSTEM', retryLogMessage, { port }, error as Error);
    }
    await new Promise(r => setTimeout(r, 500));
  }
  return false;
 }

+/**
+ * Wait for the worker HTTP server to become responsive (liveness check).
+ * Uses /api/health which returns 200 as soon as the HTTP server is listening.
+ * For full initialization (DB + search), use waitForReadiness() instead.
+ */
+export function waitForHealth(port: number, timeoutMs: number = 30000): Promise<boolean> {
+  return pollEndpointUntilOk(port, '/api/health', timeoutMs, 'Service not ready yet, will retry');
+}
+
+/**
+ * Wait for the worker to be fully initialized (DB + search ready).
+ * Uses /api/readiness which returns 200 only after core initialization completes.
+ * Now that initializationCompleteFlag is set after DB/search init (not MCP),
+ * this typically completes in a few seconds.
+ */
+export function waitForReadiness(port: number, timeoutMs: number = 30000): Promise<boolean> {
+  return pollEndpointUntilOk(port, '/api/readiness', timeoutMs, 'Worker not ready yet, will retry');
+}
+
 /**
 * Wait for a port to become free (no longer responding to health checks)
 * Used after shutdown to confirm the port is available for restart
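With separate liveness and readiness endpoints, a worker can be in one of three states. The hypothetical `probeWorkerState` helper (not part of the codebase) checks each endpoint once, which can be handy for the manual testing in the plan above. It assumes a runtime with a global `fetch` (Node 18+ or Bun). `fetchFn` is injectable so the helper can be exercised without a running worker.

```typescript
type WorkerState = "down" | "alive" | "ready";

// One-shot probe (hypothetical helper):
//   "down"  - /api/health fails (HTTP server not listening)
//   "alive" - HTTP is up but /api/readiness is not yet 200
//   "ready" - DB + search initialization has completed
async function probeWorkerState(
  port: number,
  fetchFn: typeof fetch = fetch
): Promise<WorkerState> {
  const isOk = async (path: string): Promise<boolean> => {
    try {
      const res = await fetchFn(`http://127.0.0.1:${port}${path}`);
      return res.ok;
    } catch {
      return false; // Connection refused or similar: treat as not OK.
    }
  };
  if (!(await isOk("/api/health"))) return "down";
  return (await isOk("/api/readiness")) ? "ready" : "alive";
}
```

Unlike `pollEndpointUntilOk`, this probes only once and does not retry. It reports the current state rather than waiting for a transition.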
@@ -79,6 +79,7 @@ import {
 import {
  isPortInUse,
  waitForHealth,
+  waitForReadiness,
  waitForPortFree,
  httpShutdown,
  checkVersionMatch
@@ -416,6 +417,13 @@ export class WorkerService {
      this.server.registerRoutes(this.searchRoutes);
      logger.info('WORKER', 'SearchManager initialized and search routes registered');

+      // DB and search are ready — mark initialization complete so hooks can proceed.
+      // MCP connection is tracked separately via mcpReady and is NOT required for
+      // the worker to serve context/search requests.
+      this.initializationCompleteFlag = true;
+      this.resolveInitialization();
+      logger.info('SYSTEM', 'Core initialization complete (DB + search ready)');
+
      // Auto-backfill Chroma for all projects if out of sync with SQLite (fire-and-forget)
      if (this.chromaMcpManager) {
        ChromaSync.backfillAllProjects().then(() => {
@@ -441,11 +449,7 @@ export class WorkerService {

      await Promise.race([mcpConnectionPromise, timeoutPromise]);
      this.mcpReady = true;
-      logger.success('WORKER', 'Connected to MCP server');
-
-      this.initializationCompleteFlag = true;
-      this.resolveInitialization();
-      logger.info('SYSTEM', 'Background initialization complete');
+      logger.success('WORKER', 'MCP server connected');

      // Start orphan reaper to clean up zombie processes (Issue #737)
      this.stopOrphanReaper = startOrphanReaper(() => {
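The flag-plus-promise pair (`initializationCompleteFlag` and `resolveInitialization`) resembles a common readiness-gate pattern. The sketch below is an assumption about its shape, because the field declarations and the `/api/readiness` handler are not in this diff. `InitializationGate`, its methods, and the 503 status are hypothetical.

```typescript
// Hypothetical readiness gate: a boolean for cheap synchronous checks
// (e.g. a readiness endpoint) plus a promise for code that must await init.
class InitializationGate {
  private complete = false;
  // Declared before `ready` so the executor's assignment is not overwritten.
  private resolveFn!: () => void;
  readonly ready: Promise<void> = new Promise<void>((resolve) => {
    this.resolveFn = resolve; // Promise executors run synchronously.
  });

  // Called once core init (DB + search) finishes; MCP is deliberately not awaited.
  markComplete(): void {
    if (this.complete) return; // Idempotent.
    this.complete = true;
    this.resolveFn();
  }

  // Status a readiness endpoint might return (ASSUMED codes): 200 once ready, 503 before.
  readinessStatus(): number {
    return this.complete ? 200 : 503;
  }
}
```

Moving `markComplete()` earlier, as this diff does with the real flag, changes only when readiness flips. The MCP connection can keep progressing in the background.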
@@ -945,6 +949,13 @@ async function ensureWorkerStarted(port: number): Promise<boolean> {
    return false;
  }

+  // Health passed (HTTP listening). Now wait for DB + search initialization
+  // so hooks that run immediately after can actually use the worker.
+  const ready = await waitForReadiness(port, getPlatformTimeout(HOOK_TIMEOUTS.READINESS_WAIT));
+  if (!ready) {
+    logger.warn('SYSTEM', 'Worker is alive but readiness timed out — proceeding anyway');
+  }
+
  clearWorkerSpawnAttempted();
  logger.info('SYSTEM', 'Worker started successfully');
  return true;
@@ -2,6 +2,7 @@ export const HOOK_TIMEOUTS = {
  DEFAULT: 300000,            // Standard HTTP timeout (5 min for slow systems)
  HEALTH_CHECK: 3000,         // Worker health check (3s — healthy worker responds in <100ms)
  POST_SPAWN_WAIT: 5000,      // Wait for daemon to start after spawn (starts in <1s on Linux)
+  READINESS_WAIT: 30000,      // Wait for DB + search init after spawn (typically <5s)
  PORT_IN_USE_WAIT: 3000,     // Wait when port occupied but health failing
  WORKER_STARTUP_WAIT: 1000,
  PRE_RESTART_SETTLE_DELAY: 2000,  // Give files time to sync before restart
Author	SHA1	Message	Date
Alex Newman	c2c3e3069c	chore: bump version to 10.3.2 Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-02-23 03:32:22 -05:00
Alex Newman	7966c6cba9	fix: rename save_memory and fix MCP search instructions + startup hook (#1210) * fix: rename save_memory to save_observation and fix MCP search instructions Stop the primary agent from proactively saving memories by renaming save_memory to save_observation with a neutral description. Remove "Saving Memories" section from SKILL.md. Update context formatters and output styles to reference the mem-search skill instead of raw MCP tool names. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> * fix: split SessionStart hooks so smart-install failure doesn't block worker start smart-install.js and worker-start were in the same hook group, so if smart-install exited non-zero the worker never started. Split into separate hook groups so they run independently. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> * fix: worker startup waits for readiness before hooks fire Move initializationCompleteFlag to set after DB/search init (not MCP), add waitForReadiness() polling /api/readiness, and extract shared pollEndpointUntilOk helper to DRY up health/readiness checks. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> --------- Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>	2026-02-23 03:30:31 -05:00
Alex Newman	e4e735d3ff	fix: add rewrite rule so install.cmem.ai root serves install.sh Without this, curl https://install.cmem.ai returns 404 because Vercel has no index file mapping for the root path. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-02-19 22:39:36 -05:00
Alex Newman	780cc3894e	fix: serve installer JS from install.cmem.ai instead of GitHub raw Copied compiled installer to install/public/installer.js so Vercel serves it at install.cmem.ai/installer.js. Updated install.sh to fetch from same domain instead of raw.githubusercontent.com. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-02-19 22:08:43 -05:00
Alex Newman	8d46c00dd8	fix: add compiled installer dist so CLI installation works The bootstrap script (install.sh) fetches installer/dist/index.js from main, but it was never committed due to the global dist/ gitignore rule. Added negation rule and the compiled installer bundle. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-02-19 22:06:05 -05:00
Alex Newman	4ab601fc9f	docs: update CHANGELOG.md for v10.3.1 Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-02-18 20:12:46 -05:00