chore: release v8.2.1 - worker lifecycle hardening

Critical bug fixes for self-spawn pattern:
- Fix process exit detection in waitForProcessesExit
- Validate spawn PID before writing PID file
- Handle Unix/Windows kill errors in orphan cleanup
- Use /api/readiness for health checks

Refactoring (-580 lines):
- Deleted ProcessManager.ts, worker-cli.ts, worker-wrapper.ts
- Consolidated all lifecycle logic into worker-service.ts

Also: increased timeouts for slow systems, added test suites.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
This commit is contained in:
Alex Newman
2025-12-26 21:24:21 -05:00
parent d616307781
commit 071b0156a9
5 changed files with 174 additions and 109 deletions
+1 -1
View File
@@ -10,7 +10,7 @@
"plugins": [
{
"name": "claude-mem",
"version": "8.2.0",
"version": "8.2.1",
"source": "./plugin",
"description": "Persistent memory system for Claude Code - context compression across sessions"
}
+170 -105
View File
@@ -4,6 +4,71 @@ All notable changes to this project will be documented in this file.
The format is based on [Keep a Changelog](https://keepachangelog.com/en/1.0.0/).
## [8.2.1] - 2025-12-26
## 🔧 Worker Lifecycle Hardening
This patch release addresses critical bugs discovered during PR review of the self-spawn pattern introduced in 8.2.0. The worker daemon now handles edge cases robustly across both Unix and Windows platforms.
### 🐛 Critical Bug Fixes
#### Process Exit Detection Fixed
The `waitForProcessesExit` function was crashing when processes exited during monitoring. The `process.kill(pid, 0)` call throws when a process no longer exists, which was not being caught. Now wrapped in try/catch to correctly identify exited processes.
#### Spawn PID Validation
The worker daemon now validates that `spawn()` actually returned a valid PID before writing to the PID file. Previously, spawn failures could leave invalid PID files that broke subsequent lifecycle operations.
#### Cross-Platform Orphan Cleanup
- **Unix**: Replaced single `kill` command with individual `process.kill()` calls wrapped in try/catch, so one already-exited process doesn't abort cleanup of remaining orphans
- **Windows**: Wrapped `taskkill` calls in try/catch for the same reason
#### Health Check Reliability
Changed `waitForHealth` to use the `/api/readiness` endpoint (returns 503 until fully initialized) instead of just checking if the port is in use. Callers now wait for *actual* worker readiness, not just network availability.
### 🔄 Refactoring
#### Code Consolidation (-580 lines)
Deleted obsolete process management infrastructure that was replaced by the self-spawn pattern:
- `src/services/process/ProcessManager.ts` (433 lines) - PID management now in worker-service
- `src/cli/worker-cli.ts` (81 lines) - CLI handling now in worker-service
- `src/services/worker-wrapper.ts` (157 lines) - Replaced by `--daemon` flag
#### Updated Hook Commands
All hooks now use `worker-service.cjs` CLI directly instead of the deleted `worker-cli.js`.
### ⏱️ Timeout Adjustments
Increased timeouts throughout for compatibility with slow systems:
| Component | Before | After |
|-----------|--------|-------|
| Default hook timeout | 120s | 300s |
| Health check timeout | 1s | 30s |
| Health check retries | 15 | 300 |
| Context initialization | 30s | 300s |
| MCP connection | 15s | 300s |
| PowerShell commands | 5s | 60s |
| Git commands | 30s | 300s |
| NPM install | 120s | 600s |
| Hook worker commands | 30s | 180s |
### 🧪 Testing
Added comprehensive test suites:
- `tests/hook-constants.test.ts` - Validates timeout configurations
- `tests/worker-spawn.test.ts` - Tests worker CLI and health endpoints
### 🛡️ Additional Robustness
- PID validation in restart command (matches start command behavior)
- Try/catch around `forceKillProcess()` for graceful shutdown
- Try/catch around `getChildProcesses()` for Windows failures
- Improved logging for PID file operations and HTTP shutdown
---
**Full Changelog**: https://github.com/thedotmack/claude-mem/compare/v8.2.0...v8.2.1
## [8.2.0] - 2025-12-26
## 🚀 Gemini API as Alternative AI Provider
@@ -64,98 +129,98 @@ Huge thanks to **Alexander Knigge** ([@AlexanderKnigge](https://x.com/AlexanderK
## [8.1.0] - 2025-12-25
## The 3-Month Battle Against Complexity
**TL;DR:** For three months, Claude's instinct to add code instead of delete it caused the same bugs to recur. What should have been 5 lines of code became ~1000 lines, 11 useless methods, and 7+ failed "fixes." The timestamp corruption that finally broke things was just a symptom. The real achievement: **984 lines of code deleted.**
---
## What Actually Happened
Every Claude Code hook receives a session ID. That's all you need.
But Claude built an entire redundant session management system on top:
- An `sdk_sessions` table with status tracking, port assignment, and prompt counting
- 11 methods in `SessionStore` to manage this artificial complexity
- Auto-creation logic scattered across 3 locations
- A cleanup hook that "completed" sessions at the end
**Why?** Because it seemed "robust." Because "what if the session doesn't exist?"
But the edge cases didn't exist. Hooks ALWAYS provide session IDs. The "defensive" code was solving imaginary problems while creating real ones.
---
## The Pattern of Failure
Every time a bug appeared, Claude's instinct was to **ADD** more code:
| Bug | What Claude Added | What Should Have Happened |
|-----|------------------|--------------------------|
| Race conditions | Auto-create fallbacks | Delete the auto-create logic |
| Duplicate observations | Validation layers | Delete the code path allowing duplicates |
| UNIQUE constraint violations | Try-catch with fallbacks | Use `INSERT OR IGNORE` (5 characters) |
| Session not found | Silent auto-creation | **FAIL LOUDLY** (it's a hook bug) |
---
## The 7+ Failed Attempts
- **Nov 4**: "Always store session data regardless of pre-existence." Complexity planted.
- **Nov 11**: `INSERT OR IGNORE` recognized. But complexity documented, not removed.
- **Nov 21**: Duplicate observations bug. Fixed. Then broken again by endless mode.
- **Dec 5**: "6 hours of work delivered zero value." User requests self-audit.
- **Dec 20**: "Phase 2: Eliminated Race Conditions" — felt like progress. Complexity remained.
- **Dec 24**: Finally, forced deletion.
The user stated "hooks provide session IDs, no extra management needed" **seven times** across months. Claude didn't listen.
---
## The Fix
### Deleted (984 lines):
- 11 `SessionStore` methods: `incrementPromptCounter`, `getPromptCounter`, `setWorkerPort`, `getWorkerPort`, `markSessionCompleted`, `markSessionFailed`, `reactivateSession`, `findActiveSDKSession`, `findAnySDKSession`, `updateSDKSessionId`
- Auto-create logic from `storeObservation` and `storeSummary`
- The entire cleanup hook (was aborting SDK agent and causing data loss)
- 117 lines from `worker-utils.ts`
### What remains (~10 lines):
```javascript
createSDKSession(sessionId) {
db.run('INSERT OR IGNORE INTO sdk_sessions (...) VALUES (...)');
return db.query('SELECT id FROM sdk_sessions WHERE ...').get(sessionId);
}
```
**That's it.**
---
## Behavior Change
- **Before:** Missing session? Auto-create silently. Bug hidden.
- **After:** Missing session? Storage fails. Bug visible immediately.
---
## New Tools
Since we're now explicit about recovery instead of silently papering over problems:
- `GET /api/pending-queue` - See what's stuck
- `POST /api/pending-queue/process` - Manually trigger recovery
- `npm run queue:check` / `npm run queue:process` - CLI equivalents
---
## Dependencies
- Upgraded `@anthropic-ai/claude-agent-sdk` from `^0.1.67` to `^0.1.76`
---
**PR #437:** https://github.com/thedotmack/claude-mem/pull/437
## The 3-Month Battle Against Complexity
**TL;DR:** For three months, Claude's instinct to add code instead of delete it caused the same bugs to recur. What should have been 5 lines of code became ~1000 lines, 11 useless methods, and 7+ failed "fixes." The timestamp corruption that finally broke things was just a symptom. The real achievement: **984 lines of code deleted.**
---
## What Actually Happened
Every Claude Code hook receives a session ID. That's all you need.
But Claude built an entire redundant session management system on top:
- An `sdk_sessions` table with status tracking, port assignment, and prompt counting
- 11 methods in `SessionStore` to manage this artificial complexity
- Auto-creation logic scattered across 3 locations
- A cleanup hook that "completed" sessions at the end
**Why?** Because it seemed "robust." Because "what if the session doesn't exist?"
But the edge cases didn't exist. Hooks ALWAYS provide session IDs. The "defensive" code was solving imaginary problems while creating real ones.
---
## The Pattern of Failure
Every time a bug appeared, Claude's instinct was to **ADD** more code:
| Bug | What Claude Added | What Should Have Happened |
|-----|------------------|--------------------------|
| Race conditions | Auto-create fallbacks | Delete the auto-create logic |
| Duplicate observations | Validation layers | Delete the code path allowing duplicates |
| UNIQUE constraint violations | Try-catch with fallbacks | Use `INSERT OR IGNORE` (5 characters) |
| Session not found | Silent auto-creation | **FAIL LOUDLY** (it's a hook bug) |
---
## The 7+ Failed Attempts
- **Nov 4**: "Always store session data regardless of pre-existence." Complexity planted.
- **Nov 11**: `INSERT OR IGNORE` recognized. But complexity documented, not removed.
- **Nov 21**: Duplicate observations bug. Fixed. Then broken again by endless mode.
- **Dec 5**: "6 hours of work delivered zero value." User requests self-audit.
- **Dec 20**: "Phase 2: Eliminated Race Conditions" — felt like progress. Complexity remained.
- **Dec 24**: Finally, forced deletion.
The user stated "hooks provide session IDs, no extra management needed" **seven times** across months. Claude didn't listen.
---
## The Fix
### Deleted (984 lines):
- 11 `SessionStore` methods: `incrementPromptCounter`, `getPromptCounter`, `setWorkerPort`, `getWorkerPort`, `markSessionCompleted`, `markSessionFailed`, `reactivateSession`, `findActiveSDKSession`, `findAnySDKSession`, `updateSDKSessionId`
- Auto-create logic from `storeObservation` and `storeSummary`
- The entire cleanup hook (was aborting SDK agent and causing data loss)
- 117 lines from `worker-utils.ts`
### What remains (~10 lines):
```javascript
createSDKSession(sessionId) {
db.run('INSERT OR IGNORE INTO sdk_sessions (...) VALUES (...)');
return db.query('SELECT id FROM sdk_sessions WHERE ...').get(sessionId);
}
```
**That's it.**
---
## Behavior Change
- **Before:** Missing session? Auto-create silently. Bug hidden.
- **After:** Missing session? Storage fails. Bug visible immediately.
---
## New Tools
Since we're now explicit about recovery instead of silently papering over problems:
- `GET /api/pending-queue` - See what's stuck
- `POST /api/pending-queue/process` - Manually trigger recovery
- `npm run queue:check` / `npm run queue:process` - CLI equivalents
---
## Dependencies
- Upgraded `@anthropic-ai/claude-agent-sdk` from `^0.1.67` to `^0.1.76`
---
**PR #437:** https://github.com/thedotmack/claude-mem/pull/437
*The evidence: Observations #3646, #6738, #7598, #12860, #12866, #13046, #15259, #20995, #21055, #30524, #31080, #32114, #32116, #32125, #32126, #32127, #32146, #32324—the complete record of a 3-month battle.*
## [8.0.6] - 2025-12-24
@@ -382,13 +447,13 @@ This represents a major reliability improvement for Windows users, eliminating c
## [7.3.5] - 2025-12-17
## What's Changed
* fix(windows): solve zombie port problem with wrapper architecture by @ToxMox in https://github.com/thedotmack/claude-mem/pull/372
* chore: bump version to 7.3.5 by @thedotmack in https://github.com/thedotmack/claude-mem/pull/375
## New Contributors
* @ToxMox made their first contribution in https://github.com/thedotmack/claude-mem/pull/372
## What's Changed
* fix(windows): solve zombie port problem with wrapper architecture by @ToxMox in https://github.com/thedotmack/claude-mem/pull/372
* chore: bump version to 7.3.5 by @thedotmack in https://github.com/thedotmack/claude-mem/pull/375
## New Contributors
* @ToxMox made their first contribution in https://github.com/thedotmack/claude-mem/pull/372
**Full Changelog**: https://github.com/thedotmack/claude-mem/compare/v7.3.4...v7.3.5
## [7.3.4] - 2025-12-17
@@ -2918,12 +2983,12 @@ None (patch version)
## [4.3.0] - 2025-10-25
## What's Changed
* feat: Enhanced context hook with session observations and cross-platform improvements by @thedotmack in https://github.com/thedotmack/claude-mem/pull/25
## New Contributors
* @thedotmack made their first contribution in https://github.com/thedotmack/claude-mem/pull/25
## What's Changed
* feat: Enhanced context hook with session observations and cross-platform improvements by @thedotmack in https://github.com/thedotmack/claude-mem/pull/25
## New Contributors
* @thedotmack made their first contribution in https://github.com/thedotmack/claude-mem/pull/25
**Full Changelog**: https://github.com/thedotmack/claude-mem/compare/v4.2.11...v4.3.0
## [4.2.10] - 2025-10-25
+1 -1
View File
@@ -1,6 +1,6 @@
{
"name": "claude-mem",
"version": "8.2.0",
"version": "8.2.1",
"description": "Memory compression system for Claude Code - persist context across sessions",
"keywords": [
"claude",
+1 -1
View File
@@ -1,6 +1,6 @@
{
"name": "claude-mem",
"version": "8.2.0",
"version": "8.2.1",
"description": "Persistent memory system for Claude Code - seamlessly preserve context across sessions",
"author": {
"name": "Alex Newman"
+1 -1
View File
@@ -1,6 +1,6 @@
{
"name": "claude-mem-plugin",
"version": "8.2.0",
"version": "8.2.1",
"private": true,
"description": "Runtime dependencies for claude-mem bundled hooks",
"type": "module",