diff --git a/.claude-plugin/marketplace.json b/.claude-plugin/marketplace.json
index be8c1fd5..212d8fe3 100644
--- a/.claude-plugin/marketplace.json
+++ b/.claude-plugin/marketplace.json
@@ -10,7 +10,7 @@
   "plugins": [
     {
       "name": "claude-mem",
-      "version": "8.2.0",
+      "version": "8.2.1",
       "source": "./plugin",
       "description": "Persistent memory system for Claude Code - context compression across sessions"
     }
diff --git a/CHANGELOG.md b/CHANGELOG.md
index 5d4f0c29..4da6ebc9 100644
--- a/CHANGELOG.md
+++ b/CHANGELOG.md
@@ -4,6 +4,71 @@ All notable changes to this project will be documented in this file.
 
 The format is based on [Keep a Changelog](https://keepachangelog.com/en/1.0.0/).
 
+## [8.2.1] - 2025-12-26
+
+## 🔧 Worker Lifecycle Hardening
+
+This patch release addresses critical bugs discovered during PR review of the self-spawn pattern introduced in 8.2.0. The worker daemon now handles edge cases robustly across both Unix and Windows platforms.
+
+### 🐛 Critical Bug Fixes
+
+#### Process Exit Detection Fixed
+The `waitForProcessesExit` function was crashing when processes exited during monitoring. The `process.kill(pid, 0)` call throws when a process no longer exists, which was not being caught. Now wrapped in try/catch to correctly identify exited processes.
+
+#### Spawn PID Validation
+The worker daemon now validates that `spawn()` actually returned a valid PID before writing to the PID file. Previously, spawn failures could leave invalid PID files that broke subsequent lifecycle operations.
+
+#### Cross-Platform Orphan Cleanup
+- **Unix**: Replaced single `kill` command with individual `process.kill()` calls wrapped in try/catch, so one already-exited process doesn't abort cleanup of remaining orphans
+- **Windows**: Wrapped `taskkill` calls in try/catch for the same reason
+
+#### Health Check Reliability
+Changed `waitForHealth` to use the `/api/readiness` endpoint (returns 503 until fully initialized) instead of just checking if the port is in use. Callers now wait for *actual* worker readiness, not just network availability.
+
+### 🔄 Refactoring
+
+#### Code Consolidation (-580 lines)
+Deleted obsolete process management infrastructure that was replaced by the self-spawn pattern:
+- `src/services/process/ProcessManager.ts` (433 lines) - PID management now in worker-service
+- `src/cli/worker-cli.ts` (81 lines) - CLI handling now in worker-service
+- `src/services/worker-wrapper.ts` (157 lines) - Replaced by `--daemon` flag
+
+#### Updated Hook Commands
+All hooks now use `worker-service.cjs` CLI directly instead of the deleted `worker-cli.js`.
+
+### ⏱️ Timeout Adjustments
+
+Increased timeouts throughout for compatibility with slow systems:
+
+| Component | Before | After |
+|-----------|--------|-------|
+| Default hook timeout | 120s | 300s |
+| Health check timeout | 1s | 30s |
+| Health check retries | 15 | 300 |
+| Context initialization | 30s | 300s |
+| MCP connection | 15s | 300s |
+| PowerShell commands | 5s | 60s |
+| Git commands | 30s | 300s |
+| NPM install | 120s | 600s |
+| Hook worker commands | 30s | 180s |
+
+### 🧪 Testing
+
+Added comprehensive test suites:
+- `tests/hook-constants.test.ts` - Validates timeout configurations
+- `tests/worker-spawn.test.ts` - Tests worker CLI and health endpoints
+
+### 🛡️ Additional Robustness
+
+- PID validation in restart command (matches start command behavior)
+- Try/catch around `forceKillProcess()` for graceful shutdown
+- Try/catch around `getChildProcesses()` for Windows failures
+- Improved logging for PID file operations and HTTP shutdown
+
+---
+
+**Full Changelog**: https://github.com/thedotmack/claude-mem/compare/v8.2.0...v8.2.1
+
 ## [8.2.0] - 2025-12-26
 
 ## 🚀 Gemini API as Alternative AI Provider
@@ -64,98 +129,98 @@ Huge thanks to **Alexander Knigge** ([@AlexanderKnigge](https://x.com/AlexanderK
 
 ## [8.1.0] - 2025-12-25
 
-## The 3-Month Battle Against Complexity
-
-**TL;DR:** For three months, Claude's instinct to add code instead of delete it caused the same bugs to recur. What should have been 5 lines of code became ~1000 lines, 11 useless methods, and 7+ failed "fixes." The timestamp corruption that finally broke things was just a symptom. The real achievement: **984 lines of code deleted.**
-
----
-
-## What Actually Happened
-
-Every Claude Code hook receives a session ID. That's all you need.
-
-But Claude built an entire redundant session management system on top:
-- An `sdk_sessions` table with status tracking, port assignment, and prompt counting
-- 11 methods in `SessionStore` to manage this artificial complexity
-- Auto-creation logic scattered across 3 locations
-- A cleanup hook that "completed" sessions at the end
-
-**Why?** Because it seemed "robust." Because "what if the session doesn't exist?"
-
-But the edge cases didn't exist. Hooks ALWAYS provide session IDs. The "defensive" code was solving imaginary problems while creating real ones.
-
----
-
-## The Pattern of Failure
-
-Every time a bug appeared, Claude's instinct was to **ADD** more code:
-
-| Bug | What Claude Added | What Should Have Happened |
-|-----|------------------|--------------------------|
-| Race conditions | Auto-create fallbacks | Delete the auto-create logic |
-| Duplicate observations | Validation layers | Delete the code path allowing duplicates |
-| UNIQUE constraint violations | Try-catch with fallbacks | Use `INSERT OR IGNORE` (5 characters) |
-| Session not found | Silent auto-creation | **FAIL LOUDLY** (it's a hook bug) |
-
----
-
-## The 7+ Failed Attempts
-
-- **Nov 4**: "Always store session data regardless of pre-existence." Complexity planted.
-- **Nov 11**: `INSERT OR IGNORE` recognized. But complexity documented, not removed.
-- **Nov 21**: Duplicate observations bug. Fixed. Then broken again by endless mode.
-- **Dec 5**: "6 hours of work delivered zero value." User requests self-audit.
-- **Dec 20**: "Phase 2: Eliminated Race Conditions" - felt like progress. Complexity remained.
-- **Dec 24**: Finally, forced deletion.
-
-The user stated "hooks provide session IDs, no extra management needed" **seven times** across months. Claude didn't listen.
-
----
-
-## The Fix
-
-### Deleted (984 lines):
-- 11 `SessionStore` methods: `incrementPromptCounter`, `getPromptCounter`, `setWorkerPort`, `getWorkerPort`, `markSessionCompleted`, `markSessionFailed`, `reactivateSession`, `findActiveSDKSession`, `findAnySDKSession`, `updateSDKSessionId`
-- Auto-create logic from `storeObservation` and `storeSummary`
-- The entire cleanup hook (was aborting SDK agent and causing data loss)
-- 117 lines from `worker-utils.ts`
-
-### What remains (~10 lines):
-```javascript
-createSDKSession(sessionId) {
-  db.run('INSERT OR IGNORE INTO sdk_sessions (...) VALUES (...)');
-  return db.query('SELECT id FROM sdk_sessions WHERE ...').get(sessionId);
-}
-```
-
-**That's it.**
-
----
-
-## Behavior Change
-
-- **Before:** Missing session? Auto-create silently. Bug hidden.
-- **After:** Missing session? Storage fails. Bug visible immediately.
-
----
-
-## New Tools
-
-Since we're now explicit about recovery instead of silently papering over problems:
-
-- `GET /api/pending-queue` - See what's stuck
-- `POST /api/pending-queue/process` - Manually trigger recovery
-- `npm run queue:check` / `npm run queue:process` - CLI equivalents
-
----
-
-## Dependencies
-- Upgraded `@anthropic-ai/claude-agent-sdk` from `^0.1.67` to `^0.1.76`
-
----
-
-**PR #437:** https://github.com/thedotmack/claude-mem/pull/437
-
+## The 3-Month Battle Against Complexity
+
+**TL;DR:** For three months, Claude's instinct to add code instead of delete it caused the same bugs to recur. What should have been 5 lines of code became ~1000 lines, 11 useless methods, and 7+ failed "fixes." The timestamp corruption that finally broke things was just a symptom. The real achievement: **984 lines of code deleted.**
+
+---
+
+## What Actually Happened
+
+Every Claude Code hook receives a session ID. That's all you need.
+
+But Claude built an entire redundant session management system on top:
+- An `sdk_sessions` table with status tracking, port assignment, and prompt counting
+- 11 methods in `SessionStore` to manage this artificial complexity
+- Auto-creation logic scattered across 3 locations
+- A cleanup hook that "completed" sessions at the end
+
+**Why?** Because it seemed "robust." Because "what if the session doesn't exist?"
+
+But the edge cases didn't exist. Hooks ALWAYS provide session IDs. The "defensive" code was solving imaginary problems while creating real ones.
+
+---
+
+## The Pattern of Failure
+
+Every time a bug appeared, Claude's instinct was to **ADD** more code:
+
+| Bug | What Claude Added | What Should Have Happened |
+|-----|------------------|--------------------------|
+| Race conditions | Auto-create fallbacks | Delete the auto-create logic |
+| Duplicate observations | Validation layers | Delete the code path allowing duplicates |
+| UNIQUE constraint violations | Try-catch with fallbacks | Use `INSERT OR IGNORE` (5 characters) |
+| Session not found | Silent auto-creation | **FAIL LOUDLY** (it's a hook bug) |
+
+---
+
+## The 7+ Failed Attempts
+
+- **Nov 4**: "Always store session data regardless of pre-existence." Complexity planted.
+- **Nov 11**: `INSERT OR IGNORE` recognized. But complexity documented, not removed.
+- **Nov 21**: Duplicate observations bug. Fixed. Then broken again by endless mode.
+- **Dec 5**: "6 hours of work delivered zero value." User requests self-audit.
+- **Dec 20**: "Phase 2: Eliminated Race Conditions" - felt like progress. Complexity remained.
+- **Dec 24**: Finally, forced deletion.
+
+The user stated "hooks provide session IDs, no extra management needed" **seven times** across months. Claude didn't listen.
+
+---
+
+## The Fix
+
+### Deleted (984 lines):
+- 11 `SessionStore` methods: `incrementPromptCounter`, `getPromptCounter`, `setWorkerPort`, `getWorkerPort`, `markSessionCompleted`, `markSessionFailed`, `reactivateSession`, `findActiveSDKSession`, `findAnySDKSession`, `updateSDKSessionId`
+- Auto-create logic from `storeObservation` and `storeSummary`
+- The entire cleanup hook (was aborting SDK agent and causing data loss)
+- 117 lines from `worker-utils.ts`
+
+### What remains (~10 lines):
+```javascript
+createSDKSession(sessionId) {
+  db.run('INSERT OR IGNORE INTO sdk_sessions (...) VALUES (...)');
+  return db.query('SELECT id FROM sdk_sessions WHERE ...').get(sessionId);
+}
+```
+
+**That's it.**
+
+---
+
+## Behavior Change
+
+- **Before:** Missing session? Auto-create silently. Bug hidden.
+- **After:** Missing session? Storage fails. Bug visible immediately.
+
+---
+
+## New Tools
+
+Since we're now explicit about recovery instead of silently papering over problems:
+
+- `GET /api/pending-queue` - See what's stuck
+- `POST /api/pending-queue/process` - Manually trigger recovery
+- `npm run queue:check` / `npm run queue:process` - CLI equivalents
+
+---
+
+## Dependencies
+- Upgraded `@anthropic-ai/claude-agent-sdk` from `^0.1.67` to `^0.1.76`
+
+---
+
+**PR #437:** https://github.com/thedotmack/claude-mem/pull/437
+
 *The evidence: Observations #3646, #6738, #7598, #12860, #12866, #13046, #15259, #20995, #21055, #30524, #31080, #32114, #32116, #32125, #32126, #32127, #32146, #32324 - the complete record of a 3-month battle.*
 
 ## [8.0.6] - 2025-12-24
@@ -382,13 +447,13 @@ This represents a major reliability improvement for Windows users, eliminating c
 
 ## [7.3.5] - 2025-12-17
 
-## What's Changed
-* fix(windows): solve zombie port problem with wrapper architecture by @ToxMox in https://github.com/thedotmack/claude-mem/pull/372
-* chore: bump version to 7.3.5 by @thedotmack in https://github.com/thedotmack/claude-mem/pull/375
-
-## New Contributors
-* @ToxMox made their first contribution in https://github.com/thedotmack/claude-mem/pull/372
-
+## What's Changed
+* fix(windows): solve zombie port problem with wrapper architecture by @ToxMox in https://github.com/thedotmack/claude-mem/pull/372
+* chore: bump version to 7.3.5 by @thedotmack in https://github.com/thedotmack/claude-mem/pull/375
+
+## New Contributors
+* @ToxMox made their first contribution in https://github.com/thedotmack/claude-mem/pull/372
+
 **Full Changelog**: https://github.com/thedotmack/claude-mem/compare/v7.3.4...v7.3.5
 
 ## [7.3.4] - 2025-12-17
@@ -2918,12 +2983,12 @@ None (patch version)
 
 ## [4.3.0] - 2025-10-25
 
-## What's Changed
-* feat: Enhanced context hook with session observations and cross-platform improvements by @thedotmack in https://github.com/thedotmack/claude-mem/pull/25
-
-## New Contributors
-* @thedotmack made their first contribution in https://github.com/thedotmack/claude-mem/pull/25
-
+## What's Changed
+* feat: Enhanced context hook with session observations and cross-platform improvements by @thedotmack in https://github.com/thedotmack/claude-mem/pull/25
+
+## New Contributors
+* @thedotmack made their first contribution in https://github.com/thedotmack/claude-mem/pull/25
+
 **Full Changelog**: https://github.com/thedotmack/claude-mem/compare/v4.2.11...v4.3.0
 
 ## [4.2.10] - 2025-10-25
diff --git a/package.json b/package.json
index 1f2d1c3b..211ede35 100644
--- a/package.json
+++ b/package.json
@@ -1,6 +1,6 @@
 {
   "name": "claude-mem",
-  "version": "8.2.0",
+  "version": "8.2.1",
   "description": "Memory compression system for Claude Code - persist context across sessions",
   "keywords": [
     "claude",
diff --git a/plugin/.claude-plugin/plugin.json b/plugin/.claude-plugin/plugin.json
index be321bbb..075d5ba7 100644
--- a/plugin/.claude-plugin/plugin.json
+++ b/plugin/.claude-plugin/plugin.json
@@ -1,6 +1,6 @@
 {
   "name": "claude-mem",
-  "version": "8.2.0",
+  "version": "8.2.1",
   "description": "Persistent memory system for Claude Code - seamlessly preserve context across sessions",
   "author": {
     "name": "Alex Newman"
diff --git a/plugin/package.json b/plugin/package.json
index d3168c2c..3576de3a 100644
--- a/plugin/package.json
+++ b/plugin/package.json
@@ -1,6 +1,6 @@
 {
   "name": "claude-mem-plugin",
-  "version": "8.2.0",
+  "version": "8.2.1",
   "private": true,
   "description": "Runtime dependencies for claude-mem bundled hooks",
   "type": "module",