diff --git a/docs/plans/session-logic-fixes.md b/docs/plans/session-logic-fixes.md
new file mode 100644
index 00000000..fd79ab35
--- /dev/null
+++ b/docs/plans/session-logic-fixes.md
@@ -0,0 +1,1123 @@
+# Session Logic Fixes - Claude-Mem
+
+**Status:** Planning
+**Created:** 2025-10-16
+**Priority:** High
+**Estimated Effort:** 2-3 days
+
+## Executive Summary
+
+The claude-mem session logic architecture is fundamentally sound: it uses the Claude Agent SDK in streaming input mode, with Unix socket IPC for real-time observation processing. However, **we need to verify that the basic happy path works end-to-end before addressing edge cases**.
+
+**Critical Goal:** Session ends → summary generated → next session immediately sees summary in context
+
+**Overall Assessment:** Architecture is correct, but needs systematic verification that the happy path works, then resilience improvements
+
+**Current Status:** Unknown if basic cycle works - need to test and debug the core flow first
+
+## Feedback Applied (2025-10-16)
+
+### Round 1: Technical Corrections
+- ✅ Confirmed architectural approach is sound
+- ❌ **Corrected:** SessionEnd hooks already exist in Claude Code - we're configuring, not implementing
+- ✅ Technical fixes for resilience issues are sound
+
+### Round 2: Priority Reordering (MAJOR CHANGE)
+**Critical realization:** The document focused on edge cases (zombies, crashes) when the basic happy path might not even work yet.
+
+**Complete restructure:**
+1. **Phase 0 (NEW - TOP PRIORITY):** Verify the basic cycle works
+   - Does Stop hook fire on normal exit?
+   - Does worker generate and store summary?
+   - Does context hook load summaries on next session?
+   - End-to-end integration test
+
+2. **Phase 1 (SECOND PRIORITY):** Fix resilience issues
+   - Zombie workers, race conditions, stale sockets
+   - All the original issues moved here
+
+**Key principle:** Everything else is irrelevant if "session ends → next session sees summary" doesn't work.
+
+**Revised Focus:** Get the happy path working first, then worry about edge cases.
+
+## Architecture Overview
+
+### Current Flow
+
+```
+SessionStart (startup)
+  → context-hook.ts:15
+  → Loads recent summaries from DB
+  → Outputs markdown to stdout (becomes context)
+
+UserPromptSubmit
+  → new-hook.ts:16
+  → Creates SDK session in DB (status='active')
+  → Spawns detached worker process
+  → Worker starts immediately, hooks return
+
+Worker Process (worker.ts:75)
+  → Starts Unix socket server at /tmp/claude-mem-worker-{id}.sock
+  → Runs SDK agent with streaming input (async generator)
+  → Yields init prompt to SDK agent
+  → Waits for messages from hooks
+
+PostToolUse (fired for each tool)
+  → save-hook.ts:24
+  → Sends observation to worker via Unix socket
+  → Worker receives → yields to SDK agent
+  → SDK agent analyzes → returns <observation> XML
+  → Worker parses XML → stores in observations table
+
+Stop (session ends)
+  → summary-hook.ts:15
+  → Sends FINALIZE message to worker via socket
+  → Worker yields finalize prompt to SDK agent
+  → SDK agent generates <summary> XML
+  → Worker parses → stores in session_summaries table
+  → Worker marks session completed, closes socket, exits
+```
+
+### Key Components
+
+**Hook Files:**
+- `src/hooks/context.ts` - SessionStart hook logic
+- `src/hooks/new.ts` - UserPromptSubmit hook logic
+- `src/hooks/save.ts` - PostToolUse hook logic
+- `src/hooks/summary.ts` - Stop hook logic
+- `src/bin/hooks/*.ts` - Entry point wrappers for each hook
+
+**Worker:**
+- `src/sdk/worker.ts` - Main worker process with SDK integration
+- `src/sdk/prompts.ts` - Prompt generation for SDK agent
+- `src/sdk/parser.ts` - XML parser for SDK responses
+
+**Database:**
+- `src/services/sqlite/HooksDatabase.ts` - Lightweight DB interface for hooks
+- `src/services/sqlite/migrations.ts` - Schema definitions
+
+**Configuration:**
+- `hooks/hooks.json` - Hook configuration for Claude Code plugin
+
+### Technologies
+
+- **IPC:** Unix domain sockets (`/tmp/claude-mem-worker-{id}.sock`)
+- **SDK Mode:** Streaming input (async generator pattern)
+- **Output Format:** XML blocks (`<observation>` and `<summary>`)
+- **Process Model:** Detached worker (spawn with detached: true, stdio: 'ignore')
+- **Database:** SQLite with Bun
+
+## Identified Issues
+
+### Phase 0: Verify Happy Path Works (DO THIS FIRST)
+
+**Priority:** CRITICAL - Everything else is irrelevant if the basic cycle doesn't work
+
+**Goal:** Prove that when a session ends normally, the next session immediately sees the summary in its context.
+
+#### Test 0.1: Does Stop Hook Fire on Normal Exit?
+
+**What to test:**
+```bash
+# Start Claude Code session
+claude
+
+# Do some work (read files, etc)
+
+# Exit normally
+exit
+
+# Check logs - did Stop hook run?
+```
+
+**Expected behavior:**
+- Stop hook (`summary-hook`) should fire
+- Should send FINALIZE message to worker socket
+- Worker should receive it and generate summary
+
+**How to verify:**
+1. Add logging to `src/hooks/summary.ts` at the top of `summaryHook()`
+2. Add logging when sending socket message
+3. Exit session normally and check logs
+
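+**If it doesn't work:** Debug why the Stop hook isn't firing or why the socket message fails. Note that the Claude Code hooks docs describe Stop as running when the main agent finishes responding, so it may fire after each response rather than only at exit; confirm which event actually triggers `summary-hook`.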
+
+---
+
+#### Test 0.2: Does Worker Receive FINALIZE and Generate Summary?
+
+**What to test:**
+After Stop hook fires, does the worker:
+1. Receive the FINALIZE message
+2. Yield finalize prompt to SDK agent
+3. Get back a summary from SDK
+4. Parse the XML
+5. Store it in `session_summaries` table
+
+**How to verify:**
+1. Add console.error logging in `src/sdk/worker.ts:239` in the message handler
+2. Log when FINALIZE is received
+3. Log the SDK agent response
+4. Log when summary is parsed
+5. Query DB after session ends:
+   ```bash
+   sqlite3 ~/.claude-mem/data/claude-mem.db "SELECT * FROM session_summaries ORDER BY created_at DESC LIMIT 1"
+   ```
+
+**If it doesn't work:**
+- Check if worker is even running (`ps aux | grep worker`)
+- Check if socket message arrived
+- Check if SDK agent returned valid XML
+- Check if parser worked
+- Check if DB insert succeeded
+
+---
+
+#### Test 0.3: Does Context Hook Load Summaries?
+
+**What to test:**
+When starting a new session, does context hook:
+1. Query recent summaries from DB
+2. Format them as markdown
+3. Output to stdout (becomes context)
+
+**How to verify:**
+1. Add logging to `src/hooks/context.ts:24`
+2. Log the summaries retrieved from DB
+3. Log the markdown output
+4. Start new session and check:
+   - Console output (should see markdown)
+   - Claude's context (ask "what did we do last session?")
+
+**If it doesn't work:**
+- Check if SessionStart hook is firing
+- Check if DB query returns results
+- Check if markdown is being formatted correctly
+- Check if output is going to stdout properly
+
+---
+
+#### Test 0.4: End-to-End Integration Test
+
+**What to test:**
+Full cycle from start to finish:
+
+```bash
+# Session 1
+claude
+# Do some work
+echo "test file" > test.txt
+cat test.txt
+exit
+
+# Verify summary was stored
+sqlite3 ~/.claude-mem/data/claude-mem.db "SELECT summary_text FROM session_summaries ORDER BY created_at DESC LIMIT 1"
+
+# Session 2
+claude
+# Ask Claude: "What did we do last session?"
+# Expected: Claude should know we created and read test.txt
+```
+
+**Success criteria:**
+- ✅ Summary appears in DB after session 1
+- ✅ Session 2 context includes summary from session 1
+- ✅ Claude can answer questions about previous session
+
+**If it doesn't work:**
+- Review logs from Tests 0.1-0.3
+- Add more granular logging
+- Check each step of the pipeline
+
+---
+
+#### Common Failure Points & Debugging
+
+**If summaries aren't showing up in new sessions:**
+
+1. **Stop hook not configured/firing:**
+   ```bash
+   # Check hooks config
+   cat ~/.claude/plugins/claude-mem/hooks.json | jq '.hooks.Stop'
+
+   # Should see summary-hook configured
+   # If not, hooks.json is wrong or plugin not installed
+   ```
+
+2. **Worker not running:**
+   ```bash
+   ps aux | grep '[c]laude-mem-worker'   # [c] keeps grep from matching itself
+
+   # If no worker, UserPromptSubmit hook failed to spawn it
+   # Check new-hook logs
+   ```
+
+3. **Socket communication failing:**
+   ```bash
+   # Check socket exists
+   ls /tmp/claude-mem-worker-*.sock
+
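+   # Try to connect manually (use one concrete socket path; a glob may match several)
+   # Careful: if the worker accepts this message, it will finalize that session
+   echo '{"type":"finalize"}' | nc -U /tmp/claude-mem-worker-{id}.sock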
+   ```
+
+4. **SDK agent not returning summary:**
+   - Check API key is set
+   - Check SDK agent prompt is valid
+   - Check XML parser is working
+   - Add logging to see SDK response
+
+5. **DB write failing:**
+   ```bash
+   # Check DB exists and is writable
+   sqlite3 ~/.claude-mem/data/claude-mem.db "SELECT * FROM sdk_sessions WHERE status='active'"
+
+   # If no active session, new-hook didn't create it
+   ```
+
+6. **Context hook not loading:**
+   ```bash
+   # Check SessionStart hook configured
+   cat ~/.claude/plugins/claude-mem/hooks.json | jq '.hooks.SessionStart'
+
+   # Start session and check for context output
+   # Should see markdown in initial context
+   ```
+
+**Debugging Checklist:**
+- [ ] Verify all hooks are configured in hooks.json
+- [ ] Verify plugin is installed correctly
+- [ ] Add console.error logging to all hooks (goes to stderr, visible in terminal)
+- [ ] Check each step of the pipeline systematically
+- [ ] Don't assume anything works - verify each piece
+
+---
+
+### Phase 1: Critical Resilience Issues (Fix After Happy Path Works)
+
+#### 1. Zombie Worker Processes
+
+**Severity:** High
+**Impact:** Memory/CPU waste, orphaned processes accumulate
+
+**Problem:**
+If Stop hook never fires (user Ctrl-C, Claude Code crash), worker runs forever waiting for FINALIZE message.
+
+**Location:** `src/sdk/worker.ts:239`
+```typescript
+// Current code - infinite loop with no timeout
+while (!this.isFinalized) {
+  if (this.pendingMessages.length === 0) {
+    await this.sleep(100);
+    continue;
+  }
+  // ... process messages
+}
+```
+
+**Fix Required:**
+```typescript
+// Add watchdog timer
+class SDKWorker {
+  private maxIdleTime = 2 * 60 * 60 * 1000; // 2 hours
+  private lastActivityTime = Date.now();
+
+  private updateActivity(): void {
+    this.lastActivityTime = Date.now();
+  }
+
+  private async* createMessageGenerator(): AsyncIterable<...> {
+    // Yield initial prompt
+    const initPrompt = buildInitPrompt(...);
+    yield { type: 'user', message: { role: 'user', content: initPrompt } };
+    this.updateActivity();
+
+    while (!this.isFinalized) {
+      // Check for timeout
+      const idleTime = Date.now() - this.lastActivityTime;
+      if (idleTime > this.maxIdleTime) {
+        console.error(`[SDK Worker] Timeout - no activity for ${this.maxIdleTime / 1000}s`);
+        this.isFinalized = true;
+        break;
+      }
+
+      if (this.pendingMessages.length === 0) {
+        await this.sleep(100);
+        continue;
+      }
+
+      // Process messages and update activity
+      this.updateActivity();
+      // ... existing message processing
+    }
+  }
+}
+```
+
+**Testing:**
+1. Start claude-mem session
+2. Kill Claude Code process (kill -9)
+3. Verify worker exits once `maxIdleTime` elapses (2 hours by default; temporarily lower it to keep the test practical)
+4. Check no orphaned processes remain
+
+---
+
+#### 2. SessionEnd Hook Not Configured
+
+**Severity:** High
+**Impact:** No cleanup on abrupt exit, sessions stuck in "active" status
+
+**Problem:**
+SessionEnd hooks are a built-in Claude Code feature that "run when a session ends" and "cannot block session termination but can perform cleanup tasks" ([docs](https://docs.claude.com/en/docs/claude-code/hooks#hook-events)). However, claude-mem's `hooks/hooks.json` does NOT configure this hook. Worker doesn't get cleaned up when Claude Code exits abruptly.
+
+**Note:** This is NOT a missing feature in Claude Code - SessionEnd hooks already exist. We just need to configure them.
+
+**Current Configuration:** `hooks/hooks.json:1-51`
+```json
+{
+  "hooks": {
+    "SessionStart": [...],
+    "UserPromptSubmit": [...],
+    "PostToolUse": [...],
+    "Stop": [...]
+    // SessionEnd is MISSING
+  }
+}
+```
+
+**Fix Required:**
+
+SessionEnd hooks receive structured input including:
+```json
+{
+  "session_id": "abc123",
+  "transcript_path": "~/.claude/projects/.../transcript.jsonl",
+  "cwd": "/Users/...",
+  "hook_event_name": "SessionEnd",
+  "reason": "exit"  // or "clear", "logout", "prompt_input_exit", etc.
+}
+```
+
+**Implementation Steps:**
+
+1. **Add SessionEnd configuration to hooks/hooks.json:**
+
+For events like SessionEnd that don't use matchers, we can omit the matcher field:
+
+```json
+{
+  "hooks": {
+    "SessionEnd": [
+      {
+        "hooks": [
+          {
+            "type": "command",
+            "command": "bun ${CLAUDE_PLUGIN_ROOT}/scripts/hooks/cleanup-hook.js",
+            "timeout": 60
+          }
+        ]
+      }
+    ]
+  }
+}
+```
+
+2. **Create src/hooks/cleanup.ts:**
+```typescript
+import { HooksDatabase } from '../services/sqlite/HooksDatabase.js';
+import { getWorkerSocketPath } from '../shared/paths.js';
+import { existsSync, unlinkSync } from 'fs';
+import { execFileSync } from 'child_process';
+
+export interface SessionEndInput {
+  session_id: string;
+  cwd: string;
+  // The hooks docs list 'clear', 'logout', 'prompt_input_exit' and 'other';
+  // the sample payload above uses 'exit', so accept any string
+  reason: string;
+  [key: string]: any;
+}
+
+/**
+ * Cleanup Hook - SessionEnd
+ * Cleans up worker process and marks session as terminated
+ */
+export function cleanupHook(input?: SessionEndInput): void {
+  try {
+    if (!input) {
+      console.log('No input provided - this script is designed to run as a Claude Code SessionEnd hook');
+      process.exit(0);
+    }
+
+    const { session_id } = input;
+
+    // Find active SDK session
+    const db = new HooksDatabase();
+    const session = db.findActiveSDKSession(session_id);
+
+    if (!session) {
+      db.close();
+      console.log('{"suppressOutput": true}');
+      process.exit(0);
+    }
+
+    const socketPath = getWorkerSocketPath(session.id);
+
+    // Kill the worker BEFORE removing the socket file: lsof looks the worker up
+    // by socket path, so the path must still exist at this point.
+    // NOTE: SessionEnd also fires after a normal exit, so the worker may still be
+    // writing its summary here; killing it unconditionally would lose that summary.
+    try {
+      // `lsof -t` prints bare PIDs, one per line, and exits non-zero if none match
+      const lsofOutput = execFileSync('lsof', ['-t', socketPath], {
+        encoding: 'utf8',
+        stdio: ['ignore', 'pipe', 'ignore']
+      });
+      for (const line of lsofOutput.split('\n')) {
+        const pid = parseInt(line, 10);
+        if (Number.isNaN(pid) || pid === process.pid) continue;
+        console.error(`[claude-mem cleanup] Killing worker process ${pid}`);
+        process.kill(pid, 'SIGTERM');
+      }
+    } catch (err) {
+      // Worker already dead or couldn't find it - that's fine
+    }
+
+    // Remove the socket file now that the worker lookup is done
+    if (existsSync(socketPath)) {
+      try {
+        unlinkSync(socketPath);
+      } catch (err: any) {
+        console.error(`[claude-mem cleanup] Failed to remove socket: ${err.message}`);
+      }
+    }
+
+    // Mark session as failed (not completed since it was terminated)
+    db.markSessionFailed(session.id);
+    db.close();
+
+    console.log('{"suppressOutput": true}');
+    process.exit(0);
+
+  } catch (error: any) {
+    console.error(`[claude-mem cleanup error: ${error.message}]`);
+    console.log('{"suppressOutput": true}');
+    process.exit(0);
+  }
+}
+
+3. **Create src/bin/hooks/cleanup-hook.ts:**
+```typescript
+#!/usr/bin/env bun
+
+/**
+ * Cleanup Hook Entry Point - SessionEnd
+ * Standalone executable for plugin hooks
+ */
+
+import { cleanupHook } from '../../hooks/cleanup.js';
+
+// Read input from stdin
+const input = await Bun.stdin.text();
+
+try {
+  const parsed = input.trim() ? JSON.parse(input) : undefined;
+  cleanupHook(parsed);
+} catch (error: any) {
+  console.error(`[claude-mem cleanup-hook error: ${error.message}]`);
+  console.log('{"suppressOutput": true}');
+  process.exit(0);
+}
+```
+
+4. **Update build process to compile cleanup-hook.ts to scripts/hooks/cleanup-hook.js**
+
+**Testing:**
+1. Start claude-mem session
+2. Exit Claude Code with Ctrl-C
+3. Verify worker process is killed
+4. Verify socket file is removed
+5. Verify session marked as "failed" in DB
+
+---
+
+#### 3. Stale Socket Files Block New Sessions
+
+**Severity:** Medium
+**Impact:** Worker fails to start if previous worker crashed
+
+**Problem:**
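+If the worker crashes, its socket file persists at `/tmp/claude-mem-worker-{id}.sock`. The next worker with the same session ID can then fail with EADDRINUSE. The current code guards against this only by unlinking any existing file without checking whether a live worker still owns it, and it registers no `error` handler, so a failed `listen` never rejects the startup promise.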
+
+**Location:** `src/sdk/worker.ts:111-163`
+```typescript
+private async startSocketServer(): Promise<void> {
+  // Current code only removes if exists
+  if (existsSync(this.socketPath)) {
+    unlinkSync(this.socketPath);
+  }
+
+  return new Promise((resolve, reject) => {
+    this.server = net.createServer((socket) => { ... });
+    this.server.listen(this.socketPath, () => { resolve(); });
+  });
+}
+```
+
+**Fix Required:**
+```typescript
+private async startSocketServer(): Promise<void> {
+  // Clean up stale socket if it exists
+  if (existsSync(this.socketPath)) {
+    // Test if socket is responsive
+    const isStale = await this.testSocketStale(this.socketPath);
+    if (isStale) {
+      console.error(`[SDK Worker] Removing stale socket: ${this.socketPath}`);
+      unlinkSync(this.socketPath);
+    } else {
+      // Socket is active - another worker is using this session ID
+      throw new Error(`Socket already in use: ${this.socketPath}`);
+    }
+  }
+
+  return new Promise((resolve, reject) => {
+    this.server = net.createServer((socket) => {
+      let buffer = '';
+      socket.on('data', (chunk) => {
+        // ... existing code
+      });
+    });
+
+    this.server.on('error', (err: any) => {
+      if (err.code === 'EADDRINUSE') {
+        console.error(`[SDK Worker] Socket already in use: ${this.socketPath}`);
+      }
+      reject(err);
+    });
+
+    this.server.listen(this.socketPath, () => {
+      resolve();
+    });
+  });
+}
+
+/**
+ * Test if socket file is stale (no process listening)
+ */
+private async testSocketStale(socketPath: string): Promise<boolean> {
+  return new Promise((resolve) => {
+    const testClient = net.connect(socketPath);
+
+    // Release the timer and the test connection however the probe ends
+    const settle = (isStale: boolean) => {
+      clearTimeout(timer);
+      testClient.destroy();
+      resolve(isStale);
+    };
+
+    // Timeout after 100ms - no answer, treat as stale
+    const timer = setTimeout(() => settle(true), 100);
+
+    // Socket is responsive - not stale
+    testClient.on('connect', () => settle(false));
+
+    // Socket exists but not responsive - stale
+    testClient.on('error', () => settle(true));
+  });
+}
+```
+
+**Testing:**
+1. Start worker, kill it with kill -9
+2. Verify socket file persists
+3. Start new worker with same session ID
+4. Verify old socket is detected as stale and removed
+5. Verify new worker starts successfully
+
+---
+
+#### 4. Race Condition on First Observation
+
+**Severity:** Medium
+**Impact:** First observation might be lost if socket not ready
+
+**Problem:**
+Worker startup is async (socket creation, SDK initialization). PostToolUse can fire immediately after UserPromptSubmit returns, before socket is ready.
+
+**Current Flow:**
+1. UserPromptSubmit → creates session → spawns worker → returns immediately
+2. PostToolUse fires (Claude reads a file)
+3. save-hook tries to connect → ENOENT (socket not ready yet)
+4. Connection fails → logs error, continues
+5. First observation lost
+
+**Location:** `src/hooks/save.ts:71`
+```typescript
+const client = net.connect(socketPath, () => {
+  client.write(JSON.stringify(message) + '\n');
+  client.end();
+});
+
+client.on('error', (err) => {
+  // Currently just logs and continues - observation lost
+  console.error(`[claude-mem save] Socket error: ${err.message}`);
+});
+```
+
+**Fix Required:**
+```typescript
+/**
+ * Save Hook - PostToolUse
+ * Sends tool observations to worker via Unix socket with retry logic
+ */
+export function saveHook(input?: PostToolUseInput): void {
+  try {
+    if (!input) {
+      console.log('No input provided - this script is designed to run as a Claude Code PostToolUse hook');
+      process.exit(0);
+    }
+
+    const { session_id, tool_name, tool_input, tool_output } = input;
+
+    if (SKIP_TOOLS.has(tool_name)) {
+      console.log('{"continue": true, "suppressOutput": true}');
+      process.exit(0);
+    }
+
+    const db = new HooksDatabase();
+    const session = db.findActiveSDKSession(session_id);
+    db.close();
+
+    if (!session) {
+      console.log('{"continue": true, "suppressOutput": true}');
+      process.exit(0);
+    }
+
+    const socketPath = getWorkerSocketPath(session.id);
+    const message = {
+      type: 'observation',
+      tool_name,
+      tool_input: JSON.stringify(tool_input),
+      tool_output: JSON.stringify(tool_output)
+    };
+
+    // Try to send with retries
+    sendWithRetry(socketPath, message, 5).then(() => {
+      console.log('{"continue": true, "suppressOutput": true}');
+      process.exit(0);
+    }).catch((err) => {
+      console.error(`[claude-mem save] Failed after retries: ${err.message}`);
+      console.log('{"continue": true, "suppressOutput": true}');
+      process.exit(0);
+    });
+
+  } catch (error: any) {
+    console.error(`[claude-mem save error: ${error.message}]`);
+    console.log('{"continue": true, "suppressOutput": true}');
+    process.exit(0);
+  }
+}
+
+/**
+ * Send message to socket with exponential backoff retry
+ * maxAttempts counts every attempt, including the first one
+ */
+async function sendWithRetry(
+  socketPath: string,
+  message: any,
+  maxAttempts: number
+): Promise<void> {
+  let delay = 100; // Start with 100ms
+
+  for (let attempt = 1; ; attempt++) {
+    try {
+      await sendMessage(socketPath, message);
+      return; // Success
+    } catch (err: any) {
+      if (attempt >= maxAttempts) {
+        throw err; // Out of attempts
+      }
+
+      // Exponential backoff
+      await sleep(delay);
+      delay = Math.min(delay * 2, 2000); // Cap at 2s
+    }
+  }
+}
+
+/**
+ * Send single message to socket
+ */
+function sendMessage(socketPath: string, message: any): Promise<void> {
+  return new Promise((resolve, reject) => {
+    const client = net.connect(socketPath, () => {
+      client.write(JSON.stringify(message) + '\n');
+      client.end();
+      resolve();
+    });
+
+    client.on('error', (err) => {
+      reject(err);
+    });
+  });
+}
+
+function sleep(ms: number): Promise<void> {
+  return new Promise(resolve => setTimeout(resolve, ms));
+}
+```
+
+**Testing:**
+1. Add artificial delay in worker startup
+2. Fire PostToolUse immediately after UserPromptSubmit
+3. Verify save-hook retries and succeeds
+4. Verify observation is captured
+
+---
+
+### Medium Priority (Should Fix)
+
+#### 5. Orphaned Active Sessions in Database
+
+**Severity:** Low
+**Impact:** DB bloat, confusion about session status
+
+**Problem:**
+Sessions marked "active" never transition to "completed" or "failed" if worker crashes or is killed.
+
+**Fix Required:**
+
+Create cleanup script: `src/commands/cleanup-sessions.ts`
+```typescript
+import { HooksDatabase } from '../services/sqlite/HooksDatabase.js';
+
+/**
+ * Mark old active sessions as failed
+ */
+export function cleanupSessions(maxAgeHours: number = 24): void {
+  const db = new HooksDatabase();
+  const maxAgeMs = maxAgeHours * 60 * 60 * 1000;
+  const cutoffEpoch = Date.now() - maxAgeMs;
+
+  const query = (db as any).db.query(`
+    UPDATE sdk_sessions
+    SET status = 'failed', completed_at = datetime('now'), completed_at_epoch = ?
+    WHERE status = 'active' AND started_at_epoch < ?
+  `);
+
+  const result = query.run(Date.now(), cutoffEpoch);
+  console.log(`Marked ${result.changes} old active sessions as failed`);
+
+  db.close();
+}
+```
+
+Add to CLI: `src/bin/cli.ts`
+```typescript
+.command('cleanup-sessions')
+.description('Mark old active sessions as failed')
+.option('--max-age <hours>', 'Maximum age in hours', '24')
+.action((options) => {
+  cleanupSessions(parseInt(options.maxAge, 10));
+})
+```
+
+**Alternative:** Add auto-expiry check in `context-hook`:
+```typescript
+// Before loading summaries, clean up stale sessions
+// (same private-handle access and completion fields as cleanupSessions above)
+const maxAgeMs = 24 * 60 * 60 * 1000;
+const now = Date.now();
+(db as any).db.query(`
+  UPDATE sdk_sessions
+  SET status = 'failed', completed_at = datetime('now'), completed_at_epoch = ?
+  WHERE status = 'active' AND started_at_epoch < ?
+`).run(now, now - maxAgeMs);
+```
+
+---
+
+#### 6. SessionStart Only Runs on "startup"
+
+**Severity:** Low
+**Impact:** No context loaded on /resume
+
+**Problem:**
+`context-hook` only loads context on "startup" source, skips "resume", "clear", and "compact".
+
+**Location:** `src/hooks/context.ts:24`
+```typescript
+// Only run on startup (not on resume)
+if (input.source && input.source !== 'startup') {
+  console.log('');
+  process.exit(0);
+}
+```
+
+**Fix Required:**
+```typescript
+// Load context on startup and resume
+if (input.source && input.source !== 'startup' && input.source !== 'resume') {
+  console.log(''); // Skip for clear/compact
+  process.exit(0);
+}
+```
+
+**Rationale:**
+- **startup:** Load context (project overview)
+- **resume:** Load context (user continuing work)
+- **clear:** Skip (user wants fresh start)
+- **compact:** Skip (just memory optimization, context preserved)
+
+---
+
+### Low Priority (Nice to Have)
+
+#### 7. No Cost Control or Observation Limits
+
+**Severity:** Low
+**Impact:** Long sessions can be expensive
+
+**Problem:**
+No limits on SDK agent API calls. A session with thousands of tool calls could rack up significant costs.
+
+**Fix Ideas:**
+1. Add observation counter, warn after N observations
+2. Add cost estimation based on token usage
+3. Add budget limit in config
+4. Batch observations (send N at once instead of one-by-one)
+
+**Example:**
+```typescript
+class SDKWorker {
+  private observationCount = 0;
+  private maxObservations = 1000;
+
+  private handleMessage(message: WorkerMessage): void {
+    if (message.type === 'observation') {
+      this.observationCount++;
+      if (this.observationCount > this.maxObservations) {
+        console.error(`[SDK Worker] Exceeded max observations: ${this.maxObservations}`);
+        this.isFinalized = true;
+        return;
+      }
+    }
+    this.pendingMessages.push(message);
+  }
+}
+```
+
+---
+
+#### 8. No Health Check Mechanism
+
+**Severity:** Low
+**Impact:** Can't tell if worker is alive/healthy
+
+**Fix Ideas:**
+1. Add `/status` command that checks for active workers
+2. Add health check endpoint on socket (ping/pong)
+3. Add metrics to DB (last_activity_at)
+
+---
+
+#### 9. No Observation Deduplication
+
+**Severity:** Low
+**Impact:** Duplicate observations if same tool executed multiple times
+
+**Fix Ideas:**
+1. Hash tool_name + tool_input + tool_output
+2. Check for duplicate hash before storing
+3. Or let SDK agent handle deduplication naturally
+
+---
+
+## Implementation Checklist
+
+### Phase 0: Verify Happy Path (DO THIS FIRST - HIGHEST PRIORITY)
+
+**Goal:** Prove the basic cycle works end-to-end before fixing edge cases.
+
+- [ ] **Test 0.1: Verify Stop Hook Fires**
+  - [ ] Add logging to `src/hooks/summary.ts`
+  - [ ] Exit session normally and verify hook runs
+  - [ ] Verify FINALIZE message is sent to socket
+
+- [ ] **Test 0.2: Verify Worker Generates Summary**
+  - [ ] Add logging to worker message handler
+  - [ ] Verify FINALIZE message received
+  - [ ] Verify SDK agent response
+  - [ ] Verify summary parsed and stored in DB
+  - [ ] Query DB to confirm summary exists
+
+- [ ] **Test 0.3: Verify Context Hook Loads Summaries**
+  - [ ] Add logging to `src/hooks/context.ts`
+  - [ ] Start new session, verify summaries loaded
+  - [ ] Verify markdown output to stdout
+  - [ ] Verify Claude has context from previous session
+
+- [ ] **Test 0.4: End-to-End Integration Test**
+  - [ ] Run session 1 with test work
+  - [ ] Verify summary in DB
+  - [ ] Run session 2
+  - [ ] Ask Claude about previous session
+  - [ ] Confirm Claude has correct context
+
+**STOP HERE:** Only proceed to Phase 1 after confirming all Phase 0 tests pass.
+
+---
+
+### Phase 1: Critical Resilience Fixes (Do After Phase 0)
+
+- [ ] Add watchdog timer to worker (Issue #1)
+  - [ ] Add lastActivityTime tracking
+  - [ ] Add timeout check in message generator loop
+  - [ ] Test with zombie worker scenario
+
+- [ ] Configure existing SessionEnd hook (Issue #2)
+  - [ ] Add SessionEnd configuration to hooks/hooks.json
+  - [ ] Create src/hooks/cleanup.ts (implements cleanup logic)
+  - [ ] Create src/bin/hooks/cleanup-hook.ts (entry point)
+  - [ ] Update build process to compile cleanup-hook
+  - [ ] Test with Ctrl-C exit and verify worker cleanup
+
+- [ ] Fix stale socket detection (Issue #3)
+  - [ ] Add testSocketStale method
+  - [ ] Update startSocketServer to check for stale sockets
+  - [ ] Test with crashed worker scenario
+
+- [ ] Fix save-hook race condition (Issue #4)
+  - [ ] Add sendWithRetry function
+  - [ ] Add exponential backoff logic
+  - [ ] Update save-hook to use retry logic
+  - [ ] Test with immediate PostToolUse
+
+### Phase 2: Medium Priority
+
+- [ ] Add session cleanup script (Issue #5)
+  - [ ] Create cleanup-sessions command
+  - [ ] Add to CLI
+  - [ ] Optional: Add auto-cleanup to context-hook
+
+- [ ] Fix SessionStart source handling (Issue #6)
+  - [ ] Update context-hook to load on "resume"
+  - [ ] Test with /resume command
+
+### Phase 3: Low Priority (Optional)
+
+- [ ] Add cost control (Issue #7)
+- [ ] Add health checks (Issue #8)
+- [ ] Add observation deduplication (Issue #9)
+
+## Testing Strategy
+
+### Unit Tests
+
+Create tests for each fix:
+- `test/hooks/cleanup.test.ts` - SessionEnd hook
+- `test/sdk/worker-timeout.test.ts` - Watchdog timer
+- `test/hooks/save-retry.test.ts` - Retry logic
+
+### Integration Tests
+
+Test complete flows:
+1. **Normal flow:** SessionStart → UserPromptSubmit → PostToolUse → Stop
+2. **Crash recovery:** Worker crash → SessionEnd cleanup
+3. **Zombie worker:** No Stop hook → Worker timeout
+4. **Socket race:** Immediate PostToolUse → Retry success
+
+### Manual Testing Scenarios
+
+1. **Zombie Worker Test:**
+   ```bash
+   # Start session
+   claude
+   # Kill Claude with Ctrl-C
+   # Check for worker process
+   ps aux | grep '[c]laude-mem-worker'   # [c] keeps grep from matching itself
+   # Wait 2 hours, verify worker exits
+   ```
+
+2. **SessionEnd Test:**
+   ```bash
+   # Start session
+   claude
+   # Exit normally or Ctrl-C
+   # Verify worker killed
+   # Verify socket removed
+   # Check DB for session status
+   sqlite3 ~/.claude-mem/data/claude-mem.db "SELECT * FROM sdk_sessions"
+   ```
+
+3. **Stale Socket Test:**
+   ```bash
+   # Start session
+   claude
+   # Kill worker with kill -9 <pid>
+   # Verify socket exists
+   ls /tmp/claude-mem-worker-*.sock
+   # Start new session
+   # Verify old socket removed, new session starts
+   ```
+
+4. **Race Condition Test:**
+   ```bash
+   # Add delay to worker startup (for testing)
+   # Start session, immediately run command
+   claude "list all files"
+   # Verify first observation captured
+   ```
+
+## File Modifications Required
+
+### New Files
+- `src/hooks/cleanup.ts` - SessionEnd hook logic
+- `src/bin/hooks/cleanup-hook.ts` - SessionEnd entry point
+- `src/commands/cleanup-sessions.ts` - Session cleanup script
+- `test/hooks/cleanup.test.ts` - Tests for SessionEnd hook
+- `test/sdk/worker-timeout.test.ts` - Tests for watchdog timer
+- `test/hooks/save-retry.test.ts` - Tests for retry logic
+
+### Modified Files
+- `hooks/hooks.json` - Add SessionEnd configuration
+- `src/sdk/worker.ts` - Add watchdog timer, stale socket detection
+- `src/hooks/save.ts` - Add retry logic
+- `src/hooks/context.ts` - Load context on resume
+- `src/bin/cli.ts` - Add cleanup-sessions command
+
+## Dependencies
+
+No new dependencies required. All fixes use existing:
+- `net` (Unix sockets)
+- `fs` (file operations)
+- `child_process` (process management)
+- `bun:sqlite` (database)
+
+## Success Criteria
+
+### Phase 0 (Must Pass First)
+1. ✅ Stop hook fires on normal exit
+2. ✅ Worker receives FINALIZE and generates summary
+3. ✅ Summary is stored in DB correctly
+4. ✅ Context hook loads summaries on next session
+5. ✅ New session immediately sees previous session's summary in context
+6. ✅ End-to-end integration test passes
+
+### Phase 1 (After Phase 0 Passes)
+1. ✅ Worker processes never become zombies (exit after 2h max)
+2. ✅ SessionEnd hook cleans up worker and socket on exit
+3. ✅ Stale sockets don't block new sessions
+4. ✅ First observation always captured (no race condition)
+5. ✅ No orphaned "active" sessions in DB after 24h
+6. ✅ Context loads on /resume
+7. ✅ All tests pass
+
+## References
+
+- Claude Code Hooks Documentation: https://docs.claude.com/en/docs/claude-code/hooks
+- Claude Agent SDK Streaming: https://docs.claude.com/en/api/agent-sdk/streaming-vs-single-mode
+- Unix Domain Sockets: Node.js `net` module
+- SQLite Best Practices: Bun SQLite documentation
+
+## Notes
+
+- All hooks must return `{"continue": true, "suppressOutput": true}` on error
+- Hooks have 60s default timeout (configurable)
+- Worker is detached process, doesn't block Claude Code
+- SessionEnd hooks "cannot block session termination" per Claude Code docs
+- Streaming input mode is the recommended SDK approach for this architecture