PM2 to Bun Migration: Complete Technical Documentation

Version: 7.1.0
Date: December 2025
Migration Type: Process Management (PM2 → Bun) + Database Driver (better-sqlite3 → bun:sqlite)


Table of Contents

  1. Executive Summary
  2. Architecture Comparison
  3. Migration Mechanics
  4. User Experience Timeline
  5. Platform-Specific Behavior
  6. Observable Changes
  7. File System State
  8. Edge Cases and Troubleshooting
  9. Developer Notes

Executive Summary

Claude-mem version 7.0.10 introduces two major architectural migrations:

  1. Process Management: PM2 → Custom Bun-based ProcessManager
  2. Database Driver: better-sqlite3 npm package → bun:sqlite runtime module

Both migrations are automatic and transparent to end users. The first time a hook fires after updating to 7.0.10+, the system performs a one-time cleanup of legacy PM2 processes and transitions to the new architecture.

Key Benefits

  • Simplified Dependencies: Removes PM2 and better-sqlite3 npm packages
  • Improved Cross-Platform Support: Better Windows compatibility
  • Faster Installation: No native module compilation required
  • Built-in Runtime: Uses Bun's built-in process spawning (Bun.spawn) and SQLite driver (bun:sqlite)
  • Reduced Complexity: Custom ProcessManager is simpler than PM2 integration

Migration Impact

  • Data Preservation: User data, settings, and database remain unchanged
  • Automatic Cleanup: Old PM2 processes automatically terminated (all platforms)
  • No User Action Required: Migration happens automatically on first hook trigger
  • Backward Compatible: SQLite database format unchanged (only driver changed)

Architecture Comparison

Old System (PM2-based)

Process Management (PM2)

Component: PM2 (Process Manager 2)

  • Package: pm2 npm dependency
  • Process Name: claude-mem-worker
  • Management: External PM2 daemon manages lifecycle
  • Discovery: pm2 list, pm2 describe commands
  • Auto-restart: PM2 automatically restarts on crash
  • Logs: ~/.pm2/logs/claude-mem-worker-*.log
  • PID File: ~/.pm2/pids/claude-mem-worker.pid

Lifecycle Commands:

pm2 start <script>           # Start worker
pm2 stop claude-mem-worker   # Stop worker
pm2 restart claude-mem-worker # Restart worker
pm2 delete claude-mem-worker  # Remove from PM2
pm2 logs claude-mem-worker    # View logs

Pain Points:

  • Additional npm dependency required
  • PM2 daemon must be running
  • Potential conflicts with other PM2 processes
  • Windows compatibility issues
  • Complex configuration for simple use case

Database Driver (better-sqlite3)

Component: better-sqlite3

  • Package: better-sqlite3 npm package (native module)
  • Installation: Requires native compilation (node-gyp)
  • Windows: Requires Visual Studio build tools + Python
  • Import: import Database from 'better-sqlite3'
  • Verification: Extensive checks in smart-install.js

Installation Requirements:

  • Node.js development headers
  • C++ compiler (gcc/clang on Mac/Linux, MSVC on Windows)
  • Python (for node-gyp)
  • Windows: Visual Studio Build Tools

New System (Bun-based)

Process Management (Custom ProcessManager)

Component: Custom ProcessManager (src/services/process/ProcessManager.ts)

  • Package: Built-in Bun APIs (no external dependency)
  • Process Spawn: Bun.spawn() with detached mode
  • Management: Direct process control via PID file
  • Discovery: PID file + process existence check + HTTP health check
  • Auto-restart: Hook-triggered restart on failure detection
  • Logs: ~/.claude-mem/logs/worker-YYYY-MM-DD.log
  • PID File: ~/.claude-mem/.worker.pid
  • Port File: ~/.claude-mem/.worker.port (new)

Lifecycle Commands:

npm run worker:start    # Start worker
npm run worker:stop     # Stop worker
npm run worker:restart  # Restart worker
npm run worker:status   # Check status
npm run worker:logs     # View logs

Core Mechanisms:

  1. PID File Management:

    • File: ~/.claude-mem/.worker.pid
    • Content: Process ID (e.g., "35557")
    • Created: On worker start
    • Deleted: On worker stop
    • Validation: Process existence via kill(pid, 0) signal
  2. Port File Management:

    • File: ~/.claude-mem/.worker.port
    • Content: Two lines (port number, PID)
    • Purpose: Track port binding and validate PID match
    • Created: After successful port binding
    • Deleted: On worker stop
  3. Health Checking:

    • Layer 1: PID file exists?
    • Layer 2: Process alive? (kill(pid, 0))
    • Layer 3: HTTP health check (GET /health)
    • All three must pass for "healthy" status
  4. Port Validation:

    • Range: 1024-65535
    • Validation: At ProcessManager.start() entry point
    • Prevents: Invalid ports from reaching spawn logic
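
The port validation rule above can be sketched as a small guard. This is a minimal illustration, not the ProcessManager source: the function name `validatePort` and the constant names are assumptions, while the range and the message wording follow the "Invalid port X. Must be between 1024 and 65535" error listed under Common Error Messages.

```typescript
// Hypothetical helper; the name and constants are assumptions for illustration.
const MIN_PORT = 1024;
const MAX_PORT = 65535;

function validatePort(port: number): number {
  // Reject NaN, fractional values, and anything outside the allowed range
  // before the value can reach any spawn logic.
  if (!Number.isInteger(port) || port < MIN_PORT || port > MAX_PORT) {
    throw new Error(`Invalid port ${port}. Must be between ${MIN_PORT} and ${MAX_PORT}`);
  }
  return port;
}

// Usage: a valid port passes through unchanged.
console.log(validatePort(37777));
```

Checking at the entry point means a bad setting fails with a clear message instead of surfacing later as a bind error.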

Advantages:

  • No external dependencies
  • Simpler codebase (direct control)
  • Better error handling and validation
  • Platform-agnostic (Bun handles platform differences)
  • Cleaner separation of concerns

Database Driver (bun:sqlite)

Component: bun:sqlite

  • Package: Built into Bun runtime (no npm package)
  • Installation: None required (comes with Bun ≥1.0)
  • Platform: Works anywhere Bun works
  • Import: import { Database } from 'bun:sqlite'
  • API: Similar to better-sqlite3 (synchronous)

Installation Requirements:

  • Bun ≥1.0 (automatically installed if missing)
  • No native compilation required
  • No platform-specific build tools needed

Compatibility:

  • SQLite database format: Unchanged
  • Database file: ~/.claude-mem/claude-mem.db (same location)
  • Query syntax: Identical (both use SQLite SQL)
  • API surface: Similar (both provide synchronous SQLite API)

Migration Mechanics

One-Time PM2 Cleanup

The migration system uses a marker-based approach to perform PM2 cleanup exactly once.

Implementation: src/shared/worker-utils.ts:73-86

// Clean up legacy PM2 (one-time migration)
const pm2MigratedMarker = join(DATA_DIR, '.pm2-migrated');

if (!existsSync(pm2MigratedMarker)) {
  try {
    spawnSync('pm2', ['delete', 'claude-mem-worker'], { stdio: 'ignore' });
    // Mark migration as complete
    writeFileSync(pm2MigratedMarker, new Date().toISOString(), 'utf-8');
    logger.debug('SYSTEM', 'PM2 cleanup completed and marked');
  } catch {
    // PM2 not installed or process doesn't exist - still mark as migrated
    writeFileSync(pm2MigratedMarker, new Date().toISOString(), 'utf-8');
  }
}
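
One detail worth knowing when reading the snippet above: assuming `spawnSync` comes from `node:child_process` (the import is not shown in the source), a missing executable is normally reported through the `error` field of the return value rather than as a thrown exception. The marker is written either way, so the outcome is the same. The sketch below shows how the three cleanup outcomes could be told apart; `classifyCleanup`, `cleanupLegacyPm2`, and the outcome labels are hypothetical names, not part of claude-mem.

```typescript
import { spawnSync } from "node:child_process";

// Hypothetical outcome labels for illustration.
type CleanupOutcome = "deleted" | "not-found-or-failed" | "pm2-unavailable";

// Pure classifier over the two spawnSync result fields that matter here.
function classifyCleanup(result: { error?: Error; status: number | null }): CleanupOutcome {
  if (result.error) return "pm2-unavailable"; // e.g. ENOENT: pm2 is not on PATH
  if (result.status === 0) return "deleted"; // pm2 removed the process
  return "not-found-or-failed"; // non-zero exit, e.g. process not found
}

// Hypothetical wrapper around the same command the migration runs.
// Defined but not invoked here, so this sketch never touches a real PM2 setup.
function cleanupLegacyPm2(): CleanupOutcome {
  const result = spawnSync("pm2", ["delete", "claude-mem-worker"], { stdio: "ignore" });
  return classifyCleanup(result);
}
```

A classifier like this is only useful for logging or diagnostics; the migration itself treats all three outcomes as "done".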

Migration Trigger Points

Hook Path (where migration happens):

  1. SessionStart, UserPromptSubmit, PostToolUse hooks execute
  2. Hook calls ensureWorkerRunning() (worker-utils.ts)
  3. ensureWorkerRunning() determines worker not running (no PID file)
  4. Calls startWorker() (worker-utils.ts)
  5. startWorker() checks for migration marker
  6. If marker missing: Runs PM2 cleanup, creates marker
  7. If marker exists: Skips cleanup
  8. Proceeds to start new Bun-managed worker

CLI Path (bypasses migration):

  1. User runs npm run worker:start|stop|restart
  2. CLI calls ProcessManager.start|stop|restart() directly
  3. ProcessManager methods do NOT check migration marker
  4. No PM2 cleanup attempted
  5. Direct Bun process management

Key Insight: Migration only happens via hook path, not CLI path. This is intentional - CLI starts are explicit user actions, while hooks represent automatic background starts.

Migration Steps (First Hook Trigger)

  1. Marker Check:

    • Check: ~/.claude-mem/.pm2-migrated exists?
    • Missing → Continue to cleanup
    • Present → Skip to worker start
  2. PM2 Cleanup Attempt:

    • Executed on all platforms (Mac/Linux/Windows)
    • Safe due to try/catch error handling
  3. PM2 Cleanup:

    • Execute: pm2 delete claude-mem-worker
    • Ignore errors (PM2 might not be installed, process might not exist)
    • This terminates the old PM2-managed worker
  4. Marker Creation:

    • Write: ISO timestamp to ~/.claude-mem/.pm2-migrated
    • Purpose: Prevent repeated cleanup attempts
    • Created even if PM2 cleanup failed
  5. New Worker Start:

    • Spawn: New Bun-managed worker process
    • Create: .worker.pid and .worker.port files
    • Log: Worker startup in ~/.claude-mem/logs/
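
The marker-driven steps above boil down to a run-once guard. The following is a minimal sketch under stated assumptions: `migrateOnce` is a hypothetical name, the cleanup is injected as a callback, and a temporary directory stands in for ~/.claude-mem so the example can run anywhere.

```typescript
import { existsSync, mkdtempSync, readFileSync, writeFileSync } from "node:fs";
import { tmpdir } from "node:os";
import { join } from "node:path";

// Runs `cleanup` at most once per data directory; returns true if it ran.
function migrateOnce(dataDir: string, cleanup: () => void): boolean {
  const marker = join(dataDir, ".pm2-migrated");
  if (existsSync(marker)) return false; // marker present: skip cleanup
  try {
    cleanup();
  } catch {
    // Cleanup failures are ignored, as in the steps above.
  } finally {
    // The marker is written whether or not cleanup succeeded.
    writeFileSync(marker, new Date().toISOString(), "utf-8");
  }
  return true;
}

// Usage against throwaway directories (stand-ins for ~/.claude-mem).
const demoDir = mkdtempSync(join(tmpdir(), "claude-mem-demo-"));
let cleanupCalls = 0;
const firstRun = migrateOnce(demoDir, () => { cleanupCalls += 1; });
const secondRun = migrateOnce(demoDir, () => { cleanupCalls += 1; });
const markerText = readFileSync(join(demoDir, ".pm2-migrated"), "utf-8");

const failedDir = mkdtempSync(join(tmpdir(), "claude-mem-demo-"));
const ranDespiteError = migrateOnce(failedDir, () => { throw new Error("pm2 missing"); });
const markerAfterFailure = existsSync(join(failedDir, ".pm2-migrated"));
console.log(firstRun, secondRun, cleanupCalls, markerText, ranDespiteError, markerAfterFailure);
```

Writing the marker in `finally` mirrors the "created even if PM2 cleanup failed" rule, which keeps a machine without PM2 from retrying the cleanup on every start.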

Marker File

Location: ~/.claude-mem/.pm2-migrated

Content: ISO 8601 timestamp

2025-12-13T00:18:39.673Z

Purpose:

  • One-time migration flag
  • Prevents repeated PM2 cleanup on every start
  • Persists across restarts and reboots

Lifecycle:

  • Created: First hook trigger after update to 7.0.10+ (all platforms)
  • Updated: Never
  • Deleted: Never (user could manually delete to force re-migration)

Platform Behavior:

  • All Platforms: Created on first hook trigger after update
  • Cross-platform: Same migration behavior on Mac/Linux/Windows

User Experience Timeline

Pre-Update State (Version < 7.0.10)

Process Management:

  • Worker managed by PM2 daemon
  • Process name: claude-mem-worker
  • PID file: ~/.pm2/pids/claude-mem-worker.pid
  • Logs: ~/.pm2/logs/claude-mem-worker-*.log

Database:

  • Driver: better-sqlite3 npm package
  • Database file: ~/.claude-mem/claude-mem.db
  • Native module: Compiled during npm install

User Commands:

pm2 list                     # See worker status
pm2 logs claude-mem-worker   # View logs
pm2 restart claude-mem-worker # Restart worker

Update Process

Method 1: Automatic Update

  1. Claude Code checks for plugin updates
  2. Downloads claude-mem 7.0.10+
  3. Syncs to ~/.claude/plugins/marketplaces/thedotmack/
  4. New hook scripts deployed

Method 2: Manual Update

cd ~/Scripts/claude-mem
git pull origin main
npm run build
npm run sync-marketplace

What Gets Updated:

  • Hook scripts (6 files in plugin/scripts/*-hook.js)
  • Worker service code (bundled)
  • Skill definitions
  • Package metadata

What Doesn't Change:

  • User data: ~/.claude-mem/claude-mem.db (unchanged)
  • Settings: ~/.claude-mem/settings.json (unchanged)
  • Chroma: ~/.claude-mem/chroma/ (unchanged)
  • Logs: ~/.claude-mem/logs/ (preserved)

Old Worker State During Update:

  • Old PM2 worker may still be running
  • Running old code (pre-7.0.10)
  • Will continue until next hook trigger or manual stop

First Session After Update (Critical Migration Moment)

Trigger: User opens Claude Code, any hook fires (SessionStart most common)

Step-by-Step Execution:

  1. Hook Execution (using new 7.0.10 code):

    SessionStart hook fires
    → Calls ensureWorkerRunning()
    
  2. Worker Status Check:

    ensureWorkerRunning() checks:
    - Does ~/.claude-mem/.worker.pid exist? NO
    - Conclusion: Worker not running (from new system perspective)
    
  3. Start Worker Decision:

    Worker not running → Call startWorker()
    
  4. Migration Check:

    startWorker() checks:
    - Marker: ~/.claude-mem/.pm2-migrated exists? NO
    
  5. PM2 Cleanup (all platforms):

    Execute: pm2 delete claude-mem-worker
    Result: Old PM2 worker terminated (if exists)
    Create: ~/.claude-mem/.pm2-migrated with timestamp
    Log: "PM2 cleanup completed and marked"
    
  6. New Worker Start:

    Spawn: bun plugin/scripts/worker-cli.js start <port>
    Create: ~/.claude-mem/.worker.pid (e.g., "35557")
    Create: ~/.claude-mem/.worker.port (port + PID)
    Log: Worker startup in ~/.claude-mem/logs/worker-YYYY-MM-DD.log
    
  7. Verification:

    Check: Process exists (kill -0)
    Check: HTTP health check (GET /health)
    Result: Worker confirmed running
    
  8. Hook Completion:

    Hook returns success
    Claude Code session starts normally
    

User Observable Behavior:

  • Slight delay on first startup (PM2 cleanup + new worker spawn)
  • No error messages (cleanup failures silently handled)
  • Worker appears running via npm run worker:status
  • Old PM2 worker no longer in pm2 list

Timing:

  • Total migration time: ~2-5 seconds
  • PM2 cleanup: ~1 second
  • New worker spawn: ~1-3 seconds
  • Health check: ~1 second

Subsequent Sessions (After Migration)

Every Hook Trigger:

  1. Hook Execution:

    Any hook fires → ensureWorkerRunning()
    
  2. Worker Status Check:

    Check 1: ~/.claude-mem/.worker.pid exists? YES
    Check 2: Process alive (kill -0)? YES
    Check 3: HTTP health check? SUCCESS
    Result: Worker already running, done
    
  3. No Migration Logic:

    startWorker() NOT called
    Marker check NOT performed
    PM2 cleanup NOT attempted
    Fast path: ~50ms total
    

If Worker Needs Restart:

Scenario: Worker crashed, PID file stale

Check 1: PID file exists? YES (35557)
Check 2: Process alive? NO (process 35557 dead)
Action: Call startWorker()
Migration: Marker exists → skip PM2 cleanup
Result: Spawn new worker immediately

CLI Commands (all sessions):

npm run worker:status   # Check: PID file + process + health
npm run worker:restart  # Kill current, spawn new
npm run worker:stop     # Kill current, delete PID files
npm run worker:start    # Spawn new (if not running)
npm run worker:logs     # tail -f logs/worker-YYYY-MM-DD.log

Key Differences from First Session:

  • No PM2 cleanup (marker exists)
  • No migration delay
  • Faster startup (~1-2 seconds vs ~2-5 seconds)

Platform-Specific Behavior

macOS (Darwin)

First Session After Update:

  1. Marker Check:

    File: ~/.claude-mem/.pm2-migrated
    Exists: NO
    
  2. PM2 Cleanup:

    Command: pm2 delete claude-mem-worker
    
    Possible Outcomes:
    A) PM2 installed, process exists:
       → Successfully deleted, exit code 0
    
    B) PM2 installed, process doesn't exist:
       → Error: "process claude-mem-worker not found"
       → Exit code 1, error ignored
    
    C) PM2 not installed:
       → Error: "command not found: pm2"
       → Error ignored (catch block)
    
  3. Marker Creation:

    File: ~/.claude-mem/.pm2-migrated
    Content: 2025-12-13T00:18:39.673Z
    Created: Regardless of PM2 cleanup success/failure
    
  4. New Worker:

    Spawn: bun plugin/scripts/worker-cli.js start 37777
    Detached: true (process runs independently)
    Stdout/Stderr: ~/.claude-mem/logs/worker-YYYY-MM-DD.log
    

Subsequent Sessions:

  • Marker exists → PM2 cleanup skipped
  • Standard ProcessManager flow
  • Fast startup (~50ms status check)

macOS-Specific Notes:

  • POSIX signal handling (kill -0, SIGTERM work natively)
  • Bun fully supported on macOS
  • No platform-specific workarounds needed

Linux

Behavior: Identical to macOS

First Session:

  • Marker check → Missing
  • PM2 cleanup → Attempted
  • Marker created → ~/.claude-mem/.pm2-migrated

Subsequent Sessions:

  • Marker exists → Skip cleanup
  • Standard ProcessManager flow

Linux-Specific Notes:

  • POSIX signal handling (same as macOS)
  • Systemd integration possible (not implemented)
  • Process management via standard Linux APIs

Distribution Compatibility:

  • Ubuntu/Debian: Fully supported
  • RHEL/CentOS: Fully supported
  • Arch: Fully supported
  • Alpine: Bun may require glibc (not musl)

Windows

First Session After Update:

  1. Marker Check:

    File: ~/.claude-mem/.pm2-migrated
    Exists: NO
    
  2. PM2 Cleanup Attempt:

    Execute: pm2 delete claude-mem-worker
    
    Possible Outcomes:
    A) PM2 installed, process exists:
       → Successfully deleted, exit code 0
    
    B) PM2 installed, process doesn't exist:
       → Error: "process claude-mem-worker not found"
       → Exit code 1, error ignored
    
    C) PM2 not installed:
       → Error: "command not found: pm2" (or pm2.cmd on Windows)
       → Error ignored (catch block)
    
    D) PM2.cmd exists but fails:
       → Error caught and ignored
    
  3. Marker Creation:

    File: ~/.claude-mem/.pm2-migrated
    Content: 2025-12-13T00:18:39.673Z
    Created: Regardless of PM2 cleanup success/failure
    
  4. New Worker:

    Spawn: bun plugin/scripts/worker-cli.js start 37777
    Detached: true (Windows process detachment)
    Stdout/Stderr: ~/.claude-mem/logs/worker-YYYY-MM-DD.log
    

Subsequent Sessions:

  • Marker exists → PM2 cleanup skipped
  • Standard ProcessManager flow
  • Fast startup (~50ms status check)

Windows-Specific Notes:

  1. PM2 Cleanup on Windows:

    • Now runs on Windows just like Mac/Linux
    • Safe due to try/catch error handling
    • Even if PM2 had issues historically, orphaned processes are cleaned up
    • Clean migration: no orphaned processes are left behind
  2. Signal Handling:

    • Windows doesn't support POSIX signals (SIGTERM, etc.)
    • Bun abstracts this: kill(pid, 0) works on Windows
    • Process termination uses Windows APIs internally
  3. Path Separators:

    • Bun handles ~/.claude-mem/ on Windows (C:\Users\<user>\.claude-mem\)
    • Path module ensures correct separators
    • Works seamlessly across platforms
  4. File Locking:

    • Windows file locking stricter than Unix
    • SQLite database handles this (bun:sqlite)
    • PID/port files use atomic writes
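
The notes above say the PID and port files use atomic writes but do not say how. One common technique, shown here purely as an assumption about what that could look like, is to write to a temporary file in the same directory and then rename it over the target. `writeFileAtomic` is a hypothetical helper name.

```typescript
import { mkdtempSync, readdirSync, readFileSync, renameSync, writeFileSync } from "node:fs";
import { tmpdir } from "node:os";
import { dirname, join } from "node:path";

// Write-then-rename: readers see either the old file or the complete new one,
// never a partially written file.
function writeFileAtomic(target: string, content: string): void {
  // The temp file lives next to the target so the rename stays on one filesystem.
  const tmp = join(dirname(target), `.${process.pid}.${Date.now()}.tmp`);
  writeFileSync(tmp, content, "utf-8");
  renameSync(tmp, target);
}

// Usage in a throwaway directory (a stand-in for ~/.claude-mem).
const stateDir = mkdtempSync(join(tmpdir(), "claude-mem-atomic-"));
writeFileAtomic(join(stateDir, ".worker.pid"), "35557");
const written = readFileSync(join(stateDir, ".worker.pid"), "utf-8");
const entries = readdirSync(stateDir);
console.log(written, entries);
```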

Windows Command Equivalents:

npm run worker:status   # Works (uses HTTP + process check)
npm run worker:restart  # Works (Bun process management)
npm run worker:logs     # Works (PowerShell compatible)

Platform Comparison Table

| Feature | macOS | Linux | Windows |
|---|---|---|---|
| PM2 Cleanup | Attempted | Attempted | Attempted |
| Marker File | Created | Created | Created |
| Process Signals | POSIX (native) | POSIX (native) | Bun abstraction |
| Bun Support | Full | Full | Full |
| PID File | Yes | Yes | Yes |
| Port File | Yes | Yes | Yes |
| Health Check | HTTP | HTTP | HTTP |
| Migration Delay | ~2-5s first time | ~2-5s first time | ~2-5s first time |

Observable Changes

Command Changes

Old PM2 Commands → New Bun Commands:

| Old (PM2) | New (Bun) | Notes |
|---|---|---|
| pm2 list | npm run worker:status | Shows worker status |
| pm2 start <script> | npm run worker:start | Start worker |
| pm2 stop claude-mem-worker | npm run worker:stop | Stop worker |
| pm2 restart claude-mem-worker | npm run worker:restart | Restart worker |
| pm2 delete claude-mem-worker | npm run worker:stop | Remove worker |
| pm2 logs claude-mem-worker | npm run worker:logs | View logs |
| pm2 describe claude-mem-worker | npm run worker:status | Detailed status |
| pm2 monit | No equivalent | PM2-specific monitoring |

New Commands Work Everywhere:

  • Cross-platform (Mac/Linux/Windows)
  • No PM2 installation required
  • Consistent behavior across platforms

File Location Changes

Logs:

Old: ~/.pm2/logs/claude-mem-worker-out.log
     ~/.pm2/logs/claude-mem-worker-error.log

New: ~/.claude-mem/logs/worker-YYYY-MM-DD.log
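
The dated log name can be derived from a `Date` as sketched below. This is an illustration, not the worker's logging code: `workerLogName` and `workerLogPath` are hypothetical names, and using the local calendar date (rather than UTC) is an assumption chosen to match the `date +%Y-%m-%d` command used elsewhere in this document.

```typescript
import { homedir } from "node:os";
import { join } from "node:path";

// Builds "worker-YYYY-MM-DD.log" from the local calendar date.
function workerLogName(date: Date): string {
  const yyyy = date.getFullYear();
  const mm = String(date.getMonth() + 1).padStart(2, "0"); // getMonth() is 0-based
  const dd = String(date.getDate()).padStart(2, "0");
  return `worker-${yyyy}-${mm}-${dd}.log`;
}

// Full path under the user's home directory.
function workerLogPath(date: Date): string {
  return join(homedir(), ".claude-mem", "logs", workerLogName(date));
}

console.log(workerLogName(new Date(2025, 11, 13))); // prints "worker-2025-12-13.log"
```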

PID Files:

Old: ~/.pm2/pids/claude-mem-worker.pid

New: ~/.claude-mem/.worker.pid

Process State:

Old: PM2 daemon memory (persisted to ~/.pm2/dump.pm2 by pm2 save)

New: ~/.claude-mem/.worker.pid
     ~/.claude-mem/.worker.port
     ~/.claude-mem/.pm2-migrated (all platforms)

Database (unchanged):

Same: ~/.claude-mem/claude-mem.db

User-Visible Changes

Before Update:

$ pm2 list
┌────┬────────────────────┬─────────┬─────────┬──────────┐
│ id │ name               │ status  │ restart │ uptime   │
├────┼────────────────────┼─────────┼─────────┼──────────┤
│ 0  │ claude-mem-worker  │ online  │ 0       │ 2d 5h    │
└────┴────────────────────┴─────────┴─────────┴──────────┘

$ pm2 logs claude-mem-worker
[2025-12-12 10:00:00] Worker started on port 37777
[2025-12-12 10:01:00] Processing observation #1234

After Update:

$ pm2 list
┌────┬────────┬─────────┬─────────┬──────────┐
│ id │ name   │ status  │ restart │ uptime   │
├────┼────────┼─────────┼─────────┼──────────┤
└────┴────────┴─────────┴─────────┴──────────┘
# Empty - worker no longer managed by PM2

$ npm run worker:status
Worker is running
PID: 35557
Port: 37777
Uptime: 2h 15m

$ npm run worker:logs
[2025-12-13 00:18:40] Worker started on port 37777
[2025-12-13 00:19:00] Processing observation #1235

Debugging Changes

Old System:

# Get detailed process info
pm2 describe claude-mem-worker

# Show process tree
pm2 prettylist

# Flush logs
pm2 flush

# Monitor in real-time
pm2 monit

New System:

# Get detailed process info
npm run worker:status
cat ~/.claude-mem/.worker.pid
cat ~/.claude-mem/.worker.port

# Show process info (direct)
ps aux | grep worker-cli

# View logs
npm run worker:logs
# Or directly:
tail -f ~/.claude-mem/logs/worker-$(date +%Y-%m-%d).log

# Check migration status
ls -la ~/.claude-mem/.pm2-migrated
cat ~/.claude-mem/.pm2-migrated

Orphaned Files

After migration, these PM2 files may remain (safe to delete):

~/.pm2/                    # Entire PM2 directory
~/.pm2/logs/               # Old logs
~/.pm2/pids/               # Old PID files
~/.pm2/pm2.log             # PM2 daemon log
~/.pm2/dump.pm2            # PM2 process dump

Cleanup (optional):

# Remove PM2 entirely (if not used for other processes)
pm2 kill
rm -rf ~/.pm2

# Or just remove claude-mem logs
rm -f ~/.pm2/logs/claude-mem-worker-*.log
rm -f ~/.pm2/pids/claude-mem-worker.pid

File System State

PID File (.worker.pid)

Location: ~/.claude-mem/.worker.pid

Content: Single line with process ID

35557

Lifecycle:

Worker Start:
1. Spawn Bun process
2. Get PID from spawn result
3. Write PID to .worker.pid
4. File created

Worker Running:
- File exists (read-only after creation)
- Used for process checks

Worker Stop:
1. Read PID from .worker.pid
2. Send SIGTERM to process
3. Wait for graceful shutdown
4. Delete .worker.pid
5. File removed

Validation:

// Check if worker is running
const pidFile = join(DATA_DIR, '.worker.pid');
if (!existsSync(pidFile)) return false;

const pid = parseInt(readFileSync(pidFile, 'utf-8'));
if (isNaN(pid)) return false;

// Verify process exists
try {
  process.kill(pid, 0); // Signal 0 = existence check
  return true; // Process exists
} catch {
  return false; // Process dead
}

Edge Cases:

  • Stale PID file: Process died, file remains → Detected and cleaned up
  • Corrupt PID file: Non-numeric content → Treated as not running
  • Missing PID file: Worker not running → Start new worker
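
The edge cases above can be made explicit with a stricter parser and liveness probe. This is a hedged sketch rather than the project's code: `parsePid` and `isProcessAlive` are hypothetical names. It also treats an `EPERM` error from `process.kill(pid, 0)` as "alive", since that error means the process exists but belongs to another user; whether claude-mem handles that case is not stated in the source.

```typescript
// Returns the PID, or null for empty, non-numeric, or non-positive content.
function parsePid(content: string): number | null {
  const text = content.trim();
  if (!/^\d+$/.test(text)) return null; // rejects "abc", "123abc", and ""
  const pid = Number.parseInt(text, 10);
  return pid > 0 ? pid : null; // PID 0 would address the whole process group
}

// Signal 0 performs the existence check without delivering a signal.
function isProcessAlive(pid: number): boolean {
  try {
    process.kill(pid, 0);
    return true;
  } catch (err) {
    // EPERM: the process exists but we may not signal it.
    return (err as NodeJS.ErrnoException).code === "EPERM";
  }
}

// Usage: the current process is always alive.
console.log(parsePid("35557\n"), parsePid("not-a-pid"), isProcessAlive(process.pid));
```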

Port File (.worker.port)

Location: ~/.claude-mem/.worker.port

Content: Two lines (port, PID)

37777
35557

Purpose:

  1. Remember which port worker is using
  2. Validate port file matches current PID
  3. Prevent stale port information

Lifecycle:

Worker Start:
1. Spawn Bun process (PID: 35557)
2. Worker binds to port (37777)
3. Write port file:
   Line 1: 37777
   Line 2: 35557
4. File created

Worker Running:
- File exists (read-only)
- Used to get worker port

Worker Stop:
1. Read PID from .worker.pid
2. Kill process
3. Delete .worker.port
4. Delete .worker.pid
5. Files removed

Validation:

// Get worker port with PID validation
const portFile = join(DATA_DIR, '.worker.port');
if (!existsSync(portFile)) return null;

const [portStr, pidStr] = readFileSync(portFile, 'utf-8').split('\n');
const port = parseInt(portStr);
const filePid = parseInt(pidStr);

// Check PID matches current worker
const currentPid = getWorkerPid(); // Read from .worker.pid
if (filePid !== currentPid) {
  // PID mismatch - port file stale
  unlinkSync(portFile);
  return null;
}

return port;

Why Two Files?:

  • .worker.pid: Canonical source of truth (which process is worker)
  • .worker.port: Cached port info (avoid config file reads)
  • PID in port file: Validation (ensure port file matches current worker)
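
The two-line port file and its PID cross-check can be expressed as pure functions, which makes the stale-file rule easy to test. This is a sketch under assumptions: `parsePortFile`, `portFor`, and the `PortRecord` shape are hypothetical names, and tolerating Windows line endings is an added precaution not mentioned in the source.

```typescript
interface PortRecord {
  port: number;
  pid: number;
}

// Parses "port\npid"; returns null when either line is missing or non-numeric.
function parsePortFile(content: string): PortRecord | null {
  const [portLine, pidLine] = content.split(/\r?\n/);
  const port = Number.parseInt(portLine ?? "", 10);
  const pid = Number.parseInt(pidLine ?? "", 10);
  if (!Number.isInteger(port) || !Number.isInteger(pid)) return null;
  return { port, pid };
}

// Returns the cached port only when the recorded PID matches the current worker.
function portFor(content: string, currentPid: number | null): number | null {
  const record = parsePortFile(content);
  if (record === null || record.pid !== currentPid) return null; // stale or corrupt
  return record.port;
}

console.log(portFor("37777\n35557", 35557)); // prints 37777
console.log(portFor("37777\n35557", 36000)); // prints null
```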

Migration Marker (.pm2-migrated)

Location: ~/.claude-mem/.pm2-migrated

Content: ISO 8601 timestamp

2025-12-13T00:18:39.673Z

Purpose:

  • One-time migration flag
  • Prevents repeated PM2 cleanup
  • Debugging aid (when was migration performed)

Lifecycle:

First Hook Trigger (All Platforms):
1. Check: File exists? NO
2. Execute: pm2 delete claude-mem-worker (errors ignored)
3. Create: .pm2-migrated with timestamp
4. File persists forever

Subsequent Hook Triggers (All Platforms):
1. Check: File exists? YES
2. Action: Skip PM2 cleanup
3. Continue: Start worker normally

Platform Behavior:

  • All Platforms: Consistent migration behavior
  • Mac/Linux/Windows: File created on first hook trigger

Manual Intervention:

# Force re-migration (all platforms)
rm ~/.claude-mem/.pm2-migrated
# Next hook trigger will re-run PM2 cleanup

# Check migration status
ls -la ~/.claude-mem/.pm2-migrated  # Mac/Linux
dir %USERPROFILE%\.claude-mem\.pm2-migrated  # Windows

cat ~/.claude-mem/.pm2-migrated
# Output: 2025-12-13T00:18:39.673Z

File Permissions

PID and Port Files:

-rw-r--r--  1 user  staff  5 Dec 13 00:18 .worker.pid
-rw-r--r--  1 user  staff 11 Dec 13 00:18 .worker.port
  • Readable by all (needed for status checks)
  • Writable by owner only

Migration Marker:

-rw-r--r--  1 user  staff 24 Dec 13 00:18 .pm2-migrated
  • Readable by all
  • Writable by owner only
  • Content not sensitive (just timestamp)

Database:

-rw-r--r--  1 user  staff 10485760 Dec 13 00:20 claude-mem.db
  • Writable by owner only; readable by all under the mode shown
  • Contains user data (observations, sessions)

State Directory Structure

Before Migration (PM2 system):

~/.claude-mem/
├── claude-mem.db          # Database (unchanged)
├── chroma/                # Vector embeddings (unchanged)
├── logs/                  # Application logs (unchanged)
└── settings.json          # User settings (unchanged)

~/.pm2/
├── logs/
│   ├── claude-mem-worker-out.log
│   └── claude-mem-worker-error.log
├── pids/
│   └── claude-mem-worker.pid
└── pm2.log

After Migration (Bun system):

~/.claude-mem/
├── claude-mem.db          # Database (same file)
├── chroma/                # Vector embeddings (unchanged)
├── logs/
│   └── worker-2025-12-13.log  # New log format
├── settings.json          # User settings (unchanged)
├── .worker.pid            # ← NEW: Process ID
├── .worker.port           # ← NEW: Port + PID
└── .pm2-migrated          # ← NEW: Migration marker (all platforms)

~/.pm2/                    # ← Orphaned (safe to delete)
├── logs/                  # Old logs (no longer written)
├── pids/                  # Old PID (no longer updated)
└── pm2.log                # PM2 daemon log (not used)

Edge Cases and Troubleshooting

Scenario 1: Migration Fails (PM2 Still Running)

Symptoms:

  • pm2 list still shows claude-mem-worker
  • Port conflict errors in logs
  • Worker fails to start

Diagnosis:

# Check if old PM2 worker running
pm2 list

# Check migration marker
cat ~/.claude-mem/.pm2-migrated
# If missing → migration not attempted or failed

Causes:

  1. PM2 cleanup threw exception (caught silently)
  2. PM2 process resurrection (a saved process list restored by pm2 resurrect or a pm2 startup script)
  3. User manually started PM2 worker after migration

Resolution:

# Manual cleanup
pm2 delete claude-mem-worker
pm2 save  # Persist the deletion

# Force re-migration (optional)
rm ~/.claude-mem/.pm2-migrated

# Restart worker
npm run worker:restart

Scenario 2: Stale PID File (Process Dead)

Symptoms:

  • npm run worker:status shows "not running"
  • .worker.pid file exists
  • Process ID doesn't exist in ps aux

Diagnosis:

# Check PID file
cat ~/.claude-mem/.worker.pid
# Example: 35557

# Check if process exists
ps -p 35557
# Only the header line is printed → process dead

# Or use kill -0
kill -0 35557 2>&1
# Output: "No such process"

Causes:

  1. Worker crashed
  2. Process manually killed (kill 35557)
  3. System reboot (PID file persists across reboots)

Automatic Recovery:

Next hook trigger:
1. Read PID: 35557
2. Check existence: Process dead
3. Cleanup: Delete .worker.pid
4. Action: Start new worker
5. Result: Automatic recovery
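
The cleanup step of this recovery can be sketched as follows. It is an illustration only: `clearStaleState` is a hypothetical name, the liveness check is injected so the example runs without real worker processes, and removing `.worker.port` alongside `.worker.pid` is an assumption borrowed from the manual resolution rather than something the automatic path is documented to do.

```typescript
import { existsSync, mkdtempSync, readFileSync, rmSync, writeFileSync } from "node:fs";
import { tmpdir } from "node:os";
import { join } from "node:path";

// Removes the state files when the recorded process is gone.
// Returns true if stale files were removed.
function clearStaleState(dataDir: string, isAlive: (pid: number) => boolean): boolean {
  const pidFile = join(dataDir, ".worker.pid");
  if (!existsSync(pidFile)) return false; // nothing recorded, nothing to clean
  const pid = Number.parseInt(readFileSync(pidFile, "utf-8").trim(), 10);
  if (Number.isInteger(pid) && isAlive(pid)) return false; // worker still running
  rmSync(pidFile, { force: true });
  rmSync(join(dataDir, ".worker.port"), { force: true });
  return true;
}

// Usage in a throwaway directory (a stand-in for ~/.claude-mem).
const staleDir = mkdtempSync(join(tmpdir(), "claude-mem-stale-"));
writeFileSync(join(staleDir, ".worker.pid"), "35557");
writeFileSync(join(staleDir, ".worker.port"), "37777\n35557");
const skippedWhileAlive = clearStaleState(staleDir, () => true);
const pidFileKept = existsSync(join(staleDir, ".worker.pid"));
const removedWhenDead = clearStaleState(staleDir, () => false);
const pidFileLeft = existsSync(join(staleDir, ".worker.pid"));
const portFileLeft = existsSync(join(staleDir, ".worker.port"));
console.log(skippedWhileAlive, pidFileKept, removedWhenDead, pidFileLeft, portFileLeft);
```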

Manual Resolution:

# Clean up stale files
rm ~/.claude-mem/.worker.pid
rm ~/.claude-mem/.worker.port

# Start fresh worker
npm run worker:start

Scenario 3: Port File PID Mismatch

Symptoms:

  • Worker running but port unknown
  • Port cache returns null
  • Settings updates don't find worker

Diagnosis:

# Check PID file
cat ~/.claude-mem/.worker.pid
# Output: 36000

# Check port file
cat ~/.claude-mem/.worker.port
# Output:
# 37777
# 35557  ← Different PID!

Causes:

  1. Worker restarted but port file not updated
  2. Race condition during restart
  3. Manual file modification

Automatic Recovery:

// Code handles this automatically
const port = getWorkerPort();
if (port === null) {
  // PID mismatch detected, port file deleted
  // Re-read from settings
  return getPortFromSettings();
}

Manual Resolution:

# Remove stale port file
rm ~/.claude-mem/.worker.port

# Port will be re-read from settings on next access

Scenario 4: Simultaneous Hook Triggers (Race Condition)

Symptoms:

  • Multiple worker processes spawned
  • Port binding failures
  • Duplicate entries in logs

Diagnosis:

# Check for multiple workers
ps aux | grep worker-cli
# Shows 2+ worker processes

# Check port binding
lsof -i :37777
# Shows which process has the port

Cause:

  • Two hooks fire simultaneously
  • Both check PID file (missing)
  • Both attempt to start worker
  • First succeeds, second fails (port in use)

Automatic Recovery:

First worker:
1. Spawns successfully
2. Binds to port 37777
3. Writes PID file
4. Running

Second worker:
1. Spawns successfully
2. Attempts to bind to port 37777
3. Error: Address already in use
4. Worker exits
5. No PID file written (first worker owns it)

Result: One worker running (correct state)
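
The recovery above relies on the port-binding failure to eliminate the second worker; the source does not describe any locking. As an illustration of one alternative, and not something claude-mem is documented to do, an exclusive-create lock file lets only one caller proceed to spawn. `tryAcquireStartLock`, `releaseStartLock`, and the lock file name are hypothetical.

```typescript
import { mkdtempSync, rmSync, writeFileSync } from "node:fs";
import { tmpdir } from "node:os";
import { join } from "node:path";

// Returns true for exactly one caller until the lock is released.
function tryAcquireStartLock(lockPath: string): boolean {
  try {
    // The "wx" flag fails with EEXIST if the file already exists.
    writeFileSync(lockPath, String(process.pid), { flag: "wx" });
    return true;
  } catch (err) {
    if ((err as NodeJS.ErrnoException).code === "EEXIST") return false;
    throw err; // unexpected failure, e.g. a missing directory
  }
}

function releaseStartLock(lockPath: string): void {
  rmSync(lockPath, { force: true });
}

// Usage in a throwaway directory (a stand-in for ~/.claude-mem).
const lockDir = mkdtempSync(join(tmpdir(), "claude-mem-lock-"));
const lockPath = join(lockDir, ".worker.start.lock");
const firstCaller = tryAcquireStartLock(lockPath);
const secondCaller = tryAcquireStartLock(lockPath);
releaseStartLock(lockPath);
const afterRelease = tryAcquireStartLock(lockPath);
console.log(firstCaller, secondCaller, afterRelease);
```

A lock like this needs its own stale-lock handling if the holder crashes, which is part of why letting the port bind arbitrate is a reasonable, simpler choice.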

Prevention:

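// ProcessManager.start() returns early if a worker is already running
const isRunning = await this.isRunning();
if (isRunning) {
  return { success: true, pid: currentPid };
}
// Prevents a double-start once the first worker's PID file exists.
// Two truly simultaneous starts can both pass this check; the
// port-binding failure described above resolves that case.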

Scenario 5: Health Check Fails (Worker Running but Unhealthy)

Symptoms:

  • Worker process exists
  • npm run worker:status shows "not running"
  • HTTP health check fails

Diagnosis:

# Check process exists
cat ~/.claude-mem/.worker.pid
ps aux | grep $(cat ~/.claude-mem/.worker.pid)
# Process is running

# Check HTTP health
curl http://localhost:37777/health
# Connection refused or timeout

Causes:

  1. Worker startup incomplete (still initializing)
  2. Worker crashed after spawn (zombie process)
  3. Port binding failed but process didn't exit
  4. Firewall blocking localhost connections

Automatic Recovery:

Hook health check:
1. PID exists: YES
2. Process alive: YES
3. HTTP health: FAIL
4. Action: Kill process, restart worker
5. Result: Fresh worker spawned
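
The layered check used here, and in Scenario 2, can be summarized as a single decision function. This is a sketch with hypothetical names (`Probe`, `Action`, `decideAction`); the probes are passed in as plain booleans so the logic is visible without any file, process, or HTTP access.

```typescript
// Results of the three health layers, gathered by the caller.
type Probe = { pidFileExists: boolean; processAlive: boolean; httpHealthy: boolean };

// Hypothetical action labels for illustration.
type Action = "none" | "start" | "cleanup-and-start" | "kill-and-restart";

// Layers are evaluated in order; the first failing layer determines the action.
function decideAction(probe: Probe): Action {
  if (!probe.pidFileExists) return "start"; // no recorded worker
  if (!probe.processAlive) return "cleanup-and-start"; // stale PID file
  if (!probe.httpHealthy) return "kill-and-restart"; // process up but unhealthy
  return "none"; // all three layers passed
}

// Usage: the situation described in this scenario.
console.log(decideAction({ pidFileExists: true, processAlive: true, httpHealthy: false }));
// prints "kill-and-restart"
```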

Manual Resolution:

# Kill unhealthy worker
kill $(cat ~/.claude-mem/.worker.pid)

# Clean up state
rm ~/.claude-mem/.worker.pid
rm ~/.claude-mem/.worker.port

# Start fresh
npm run worker:start

# Verify health
curl http://localhost:37777/health
# Should return: {"status":"healthy"}

Scenario 6: Fresh Install (Never Had PM2)

Symptoms:

  • User installs claude-mem 7.0.10+ for first time
  • No previous PM2 installation
  • Migration marker created but PM2 cleanup fails

Diagnosis:

# Check PM2
pm2 list
# Output: command not found: pm2

# Check marker
cat ~/.claude-mem/.pm2-migrated
# File exists (created despite PM2 not found)

Expected Behavior:

First hook trigger:
1. Marker check: Missing
2. PM2 cleanup: Attempted
3. Error: "command not found: pm2"
4. Catch block: Error ignored
5. Marker creation: Success
6. Worker start: Success

Result: Normal startup, marker created, no issues

No Action Needed: This is expected and correct behavior.

Scenario 7: Manual Marker Deletion

Symptoms:

  • User deletes .pm2-migrated file
  • Next hook trigger runs PM2 cleanup again

Diagnosis:

# Check marker
ls ~/.claude-mem/.pm2-migrated
# File not found (user deleted it)

Behavior:

Next hook trigger:
1. Marker check: Missing
2. PM2 cleanup: Attempted
3. Result: No PM2 worker exists (already cleaned)
4. Error: "process claude-mem-worker not found"
5. Catch block: Ignored
6. Marker recreation: Success
7. Worker start: Normal

Result: No harm done, marker recreated

Impact: Minimal (one extra PM2 command execution, ~1 second delay)

Common Error Messages

Error: EADDRINUSE: address already in use

Cause: Another process (or old worker) using port
Resolution:
1. Check: lsof -i :37777
2. Kill: kill -9 <PID>
3. Restart: npm run worker:restart

Error: No such process

Cause: PID file references dead process
Resolution: Automatic cleanup on next hook trigger
Manual: rm ~/.claude-mem/.worker.pid && npm run worker:start

Error: pm2: command not found (during migration)

Cause: PM2 not installed (fresh install or already uninstalled)
Resolution: None needed (error is caught and ignored)
Impact: Migration completes normally

Error: Invalid port X. Must be between 1024 and 65535

Cause: Port validation failed
Resolution: Set CLAUDE_MEM_WORKER_PORT to a value between 1024 and 65535
Command: Edit ~/.claude-mem/settings.json

Error: Failed to bind to port

Cause: Port already in use, or permission denied (<1024)
Resolution:
1. Check: lsof -i :<port>
2. Change: Update CLAUDE_MEM_WORKER_PORT in settings
3. Restart: npm run worker:restart

Developer Notes

Testing the Migration

Test Environment Setup:

# 1. Install old version (with PM2)
git checkout <pre-7.0.10-tag>
npm install
npm run build
npm run sync-marketplace

# 2. Start PM2 worker
pm2 start plugin/scripts/worker-cli.js --name claude-mem-worker

# 3. Verify PM2 running
pm2 list  # Should show claude-mem-worker

# 4. Update to new version
git checkout main
npm install
npm run build
npm run sync-marketplace

# 5. Trigger hook (simulate Claude Code session)
# Open Claude Code, or manually trigger:
node plugin/scripts/session-start-hook.js

# 6. Verify migration
pm2 list  # Should NOT show claude-mem-worker
cat ~/.claude-mem/.pm2-migrated  # Should exist (all platforms)
npm run worker:status  # Should show Bun worker running

Automated Testing:

# Run test suite (includes migration tests)
npm test

# Specific migration tests
npm test -- src/services/process/ProcessManager.test.ts

Architecture Decisions

Why Custom ProcessManager Instead of PM2?

  1. Simplicity: Direct control, no external daemon
  2. Dependencies: Removes the pm2 npm dependency
  3. Cross-platform: Bun handles platform differences
  4. Bundle Size: Reduce plugin package size
  5. Control: Fine-grained error handling and validation

Why PID File Instead of PM2 Daemon?

  1. Simplicity: Filesystem-based state (no daemon)
  2. Debugging: Easy to inspect (cat .worker.pid)
  3. Reliability: No daemon failure scenarios
  4. Unix Philosophy: Simple, composable tools

Why One-Time Marker Instead of Always Running PM2 Delete?

  1. Performance: Avoid unnecessary process spawning
  2. Idempotency: Migration runs once while the marker exists; if the marker is deleted, it reruns harmlessly
  3. Debugging: Timestamp shows when migration occurred
  4. Simplicity: Clear migration state
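
Because the marker stores an ISO 8601 timestamp, its contents can be read back to see when the migration ran. The following is a small sketch of that idea; `parseMigrationMarker` is a hypothetical helper name, and the strict round-trip check is an assumption made here for illustration, not documented behavior.

```typescript
// Hypothetical helper: interpret the contents of ~/.claude-mem/.pm2-migrated.
// Returns null unless the text is exactly what Date.prototype.toISOString() produces.
function parseMigrationMarker(contents: string): Date | null {
  const text = contents.trim();
  const parsed = new Date(text);
  if (Number.isNaN(parsed.getTime())) return null;
  // Round-trip check rejects loosely parsed strings such as "1" or "Dec 2025".
  return parsed.toISOString() === text ? parsed : null;
}

const stamp = parseMigrationMarker("2025-12-01T10:15:30.000Z\n");
console.log(stamp ? `migrated at ${stamp.toISOString()}` : "marker is not a timestamp");
// prints "migrated at 2025-12-01T10:15:30.000Z"
```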

Why Run PM2 Cleanup on All Platforms?

  1. Quality Migration: Clean up orphaned processes, even if PM2 had issues
  2. Consistency: Same behavior across all platforms
  3. Safety: Error handling already in place (try/catch)
  4. No Downside: If PM2 not installed, error is caught and ignored

Future Considerations

Potential Improvements:

  1. Systemd Integration (Linux): Optional systemd unit file for system-level management
  2. launchd Integration (macOS): Optional launchd plist for startup on boot
  3. Windows Service: Optional Windows Service wrapper
  4. Process Monitoring: Built-in restart on crash (without waiting for hook)
  5. Graceful Shutdown: SIGTERM handler for clean database closing
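
As an illustration of the graceful shutdown idea in item 5, a handler might look roughly like this. It is a sketch of a possible future design, not existing claude-mem code: `Closable` and `createShutdownHandler` are hypothetical names, and the exit function is injectable only so the handler can be exercised without ending the process.

```typescript
// Hypothetical resource shape: anything with a close() method, such as a database handle.
interface Closable {
  close(): void;
}

// Builds a handler that closes the resource once and then exits.
function createShutdownHandler(
  resource: Closable,
  exit: (code: number) => void = (code) => process.exit(code)
): () => void {
  let shuttingDown = false;
  return () => {
    if (shuttingDown) return; // ignore repeated signals
    shuttingDown = true;
    let code = 0;
    try {
      resource.close();
    } catch {
      code = 1; // report the failed close through the exit code
    }
    exit(code);
  };
}

// Demonstration with a fake resource and a fake exit function.
const demoHandler = createShutdownHandler(
  { close: () => console.log("database closed") },
  (code) => console.log(`would exit with code ${code}`)
);
demoHandler();
```

In a worker, the handler would be registered with something like `process.on("SIGTERM", createShutdownHandler(db))`, where `db` stands for whatever handle the worker holds open.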

Migration Cleanup (future version):

  1. After ~6 months (all users migrated), remove PM2 cleanup code
  2. Remove .pm2-migrated marker file logic
  3. Simplify startWorker() function
  4. Keep ProcessManager as permanent architecture

Related Files

Core Implementation:

  • src/services/process/ProcessManager.ts - Main process management
  • src/shared/worker-utils.ts - Worker utilities, migration logic
  • src/cli/worker-cli.ts - CLI commands

Database:

  • src/services/sqlite/Database.ts - bun:sqlite integration
  • src/types/database.ts - Type definitions

Documentation:

  • docs/public/architecture/database.mdx - Database architecture
  • docs/public/architecture/overview.mdx - System overview
  • plugin/skills/troubleshoot/operations/worker.md - Worker troubleshooting

Tests:

  • src/services/process/ProcessManager.test.ts - Process management tests
  • src/hooks/__tests__/full-lifecycle.test.ts - Integration tests

Code References

Migration Marker Logic:

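// src/shared/worker-utils.ts:74-86
const pm2MigratedMarker = join(DATA_DIR, '.pm2-migrated');

// Run the PM2 cleanup only if the marker file does not exist yet
if (!existsSync(pm2MigratedMarker)) {
  try {
    // The result is not inspected: a missing PM2 binary or missing process is acceptable
    spawnSync('pm2', ['delete', 'claude-mem-worker'], { stdio: 'ignore' });
    // Record when the migration ran
    writeFileSync(pm2MigratedMarker, new Date().toISOString(), 'utf-8');
    logger.debug('SYSTEM', 'PM2 cleanup completed and marked');
  } catch {
    // Still mark as migrated so the cleanup is not retried on every hook
    writeFileSync(pm2MigratedMarker, new Date().toISOString(), 'utf-8');
  }
}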
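
One detail worth keeping in mind when reading the excerpt above: in Node's `child_process` API, `spawnSync` reports a missing executable through the `error` property of its return value rather than by throwing. The variant below makes that case explicit. It is a sketch under assumptions, not the project's implementation: `cleanupPm2Once`, its parameters, and the outcome labels are hypothetical and exist only so the example can run against a temporary directory.

```typescript
import { spawnSync } from "node:child_process";
import { existsSync, mkdtempSync, writeFileSync } from "node:fs";
import { tmpdir } from "node:os";
import { join } from "node:path";

type CleanupOutcome = "already-migrated" | "spawn-failed" | "pm2-ran";

// Hypothetical helper. `dataDir` and `pm2Command` are parameters only so the
// sketch can be pointed at a temp directory and a nonexistent binary.
function cleanupPm2Once(dataDir: string, pm2Command = "pm2"): CleanupOutcome {
  const marker = join(dataDir, ".pm2-migrated");
  if (existsSync(marker)) return "already-migrated";

  // A missing executable does not throw here; it shows up as result.error.
  const result = spawnSync(pm2Command, ["delete", "claude-mem-worker"], { stdio: "ignore" });
  const outcome: CleanupOutcome = result.error ? "spawn-failed" : "pm2-ran";

  // Write the marker either way so the cleanup is not retried on every hook.
  writeFileSync(marker, new Date().toISOString(), "utf-8");
  return outcome;
}

// Usage against a throwaway directory (not the real ~/.claude-mem).
const markerDemoDir = mkdtempSync(join(tmpdir(), "claude-mem-marker-demo-"));
console.log(cleanupPm2Once(markerDemoDir, "pm2-binary-that-does-not-exist")); // "spawn-failed"
console.log(cleanupPm2Once(markerDemoDir)); // "already-migrated"
```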

Port Validation:

// src/services/process/ProcessManager.ts:27-33
if (isNaN(port) || port < 1024 || port > 65535) {
  return {
    success: false,
    error: `Invalid port ${port}. Must be between 1024 and 65535`
  };
}
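
The range check above accepts any non-NaN number, so a fractional value such as 37777.5 would pass. A stricter parse of the raw setting value could look like the sketch below. `parseWorkerPort` and `PortResult` are hypothetical names, and how the real code converts the setting to a number before validation is not shown in this document, so treat the input handling as an assumption.

```typescript
type PortResult =
  | { success: true; port: number }
  | { success: false; error: string };

// Hypothetical helper: accept only plain decimal integers in the allowed range.
function parseWorkerPort(raw: string | number): PortResult {
  const text = String(raw).trim();
  // Digits only: rejects "37777abc", "3.7e4", "0x9371", and fractional numbers.
  const port = /^\d+$/.test(text) ? Number(text) : NaN;
  if (!Number.isInteger(port) || port < 1024 || port > 65535) {
    return {
      success: false,
      error: `Invalid port ${text}. Must be between 1024 and 65535`,
    };
  }
  return { success: true, port };
}

console.log(parseWorkerPort("37777").success); // true
console.log(parseWorkerPort("80").success); // false
console.log(parseWorkerPort(37777.5).success); // false
```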

Health Check Layers:

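// src/shared/worker-utils.ts (conceptual)
// isWorkerHealthy is an illustrative name; DATA_DIR is the data directory constant used above
import { existsSync, readFileSync } from 'fs';
import { join } from 'path';

async function isWorkerHealthy(port: number): Promise<boolean> {
  // Layer 1: PID file check
  const pidFile = join(DATA_DIR, '.worker.pid');
  if (!existsSync(pidFile)) return false;

  // Layer 2: Process existence check (signal 0 probes without killing)
  const pid = parseInt(readFileSync(pidFile, 'utf-8').trim(), 10);
  if (isNaN(pid)) return false;
  try {
    process.kill(pid, 0);
  } catch {
    return false;
  }

  // Layer 3: HTTP health check (a refused connection counts as unhealthy)
  try {
    const response = await fetch(`http://localhost:${port}/health`);
    return response.ok;
  } catch {
    return false;
  }
}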

Summary

The migration from PM2 to Bun-based ProcessManager is a one-time, automatic, transparent transition that:

  1. Removes external dependencies (PM2, better-sqlite3)
  2. Simplifies architecture (direct process control)
  3. Improves cross-platform support (especially Windows)
  4. Preserves user data (database, settings, logs unchanged)
  5. Requires no user action (automatic on first hook trigger)

  • Key Migration Moment: First hook trigger after update to 7.0.10+
  • Duration: ~2-5 seconds (one-time delay)
  • Impact: Seamless transition, user-invisible
  • Rollback: Not needed (migration is forward-only, safe)

For most users, the migration will be completely transparent: they will see no errors and no data loss, and they should find the worker more reliable and simpler to troubleshoot going forward.