Add stderr test hook for UI experiment

Alex Newman
2025-10-26 22:29:43 -04:00
parent 44b69b737b
commit b0fae0cfd4
10 changed files with 2501 additions and 2 deletions
+7 -1
@@ -79,9 +79,15 @@ npx mintlify dev
- **[Usage Guide](docs/usage/getting-started.mdx)** - How Claude-Mem works automatically
- **[MCP Search Tools](docs/usage/search-tools.mdx)** - Query your project history
### Best Practices
- **[Context Engineering](docs/context-engineering.mdx)** - AI agent context optimization principles
- **[Progressive Disclosure](docs/progressive-disclosure.mdx)** - Philosophy behind Claude-Mem's context priming strategy
### Architecture
- **[Overview](docs/architecture/overview.mdx)** - System components & data flow
- **[Hooks](docs/architecture/hooks.mdx)** - 5 lifecycle hooks explained
- **[Architecture Evolution](docs/architecture-evolution.mdx)** - The journey from v3 to v4
- **[Hooks Architecture](docs/hooks-architecture.mdx)** - How Claude-Mem uses lifecycle hooks
- **[Hooks Reference](docs/architecture/hooks.mdx)** - 5 lifecycle hooks explained
- **[Worker Service](docs/architecture/worker-service.mdx)** - HTTP API & PM2 management
- **[Database](docs/architecture/database.mdx)** - SQLite schema & FTS5 search
- **[MCP Search](docs/architecture/mcp-search.mdx)** - 7 search tools & examples
+801
@@ -0,0 +1,801 @@
# Architecture Evolution: The Journey from v3 to v4
## The Problem We Solved
**Goal:** Create a memory system that makes Claude smarter across sessions without the user noticing it exists.
**Challenge:** How do you observe AI agent behavior, compress it intelligently, and serve it back at the right time - all without slowing down or interfering with the main workflow?
This is the story of how claude-mem evolved from a simple idea to a production-ready system, and the key architectural decisions that made it work.
---
## v1-v2: The Naive Approach
### The First Attempt: Dump Everything
**Architecture:**
```
PostToolUse Hook → Save raw tool outputs → Retrieve everything on startup
```
**What we learned:**
- ❌ Context pollution (thousands of tokens of irrelevant data)
- ❌ No compression (raw tool outputs are verbose)
- ❌ No search (had to scan everything linearly)
- ✅ Proved the concept: Memory across sessions is valuable
**Example of what went wrong:**
```
SessionStart loaded:
- 150 file read operations
- 80 grep searches
- 45 bash commands
- Total: ~35,000 tokens
- Relevant to current task: ~500 tokens (1.4%)
```
---
## v3: Smart Compression, Wrong Architecture
### The Breakthrough: AI-Powered Compression
**New idea:** Use Claude itself to compress observations
**Architecture:**
```
PostToolUse Hook → Queue observation → SDK Worker → AI compression → Store insights
```
**What we added:**
1. **Claude Agent SDK integration** - Use AI to compress observations
2. **Background worker** - Don't block main session
3. **Structured observations** - Extract facts, decisions, insights
4. **Session summaries** - Generate comprehensive summaries
**What worked:**
- ✅ Compression ratio: 10:1 to 100:1
- ✅ Semantic understanding (not just keyword matching)
- ✅ Background processing (hooks stayed fast)
- ✅ Search became useful
**What didn't work:**
- ❌ Still loaded everything upfront
- ❌ Session ID management was broken
- ❌ Aggressive cleanup interrupted summaries
- ❌ Multiple SDK sessions per Claude Code session
---
## The Key Realizations
### Realization 1: Progressive Disclosure
**Problem:** Even compressed observations can pollute context if you load them all.
**Insight:** Humans don't read everything before starting work. Why should AI?
**Solution:** Show an index first, fetch details on-demand.
```
❌ Old: Load 50 observations (8,500 tokens)
✅ New: Show index of 50 observations (800 tokens)
Agent fetches 2-3 relevant ones (300 tokens)
Total: 1,100 tokens vs 8,500 tokens
```
**Impact:**
- 87% reduction in context usage
- 100% relevance (only fetch what's needed)
- Agent autonomy (decides what's relevant)
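To make the index concrete, here is a minimal sketch of how an index row could be built from stored observations. The `Observation` shape and the `estimateTokens` helper are illustrative assumptions, not claude-mem's actual API:
```typescript
// Hypothetical observation shape; the real claude-mem schema may differ
interface Observation {
  id: number;
  title: string;
  type: string;      // e.g. "gotcha", "decision"
  narrative: string; // full detail, fetched only on demand
}

// Rough token estimate: ~4 characters per token for English text
const estimateTokens = (text: string): number => Math.ceil(text.length / 4);

// Emit one compact index row per observation instead of the full narrative
function formatIndex(observations: Observation[]): string {
  return observations
    .map(o => `| #${o.id} | ${o.type} | ${o.title} | ~${estimateTokens(o.narrative)} |`)
    .join('\n');
}
```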
### Realization 2: Session ID Chaos
**Problem:** SDK session IDs change on every turn.
**What we thought:**
```typescript
// ❌ Wrong assumption
UserPromptSubmit → Capture session ID once → Use forever
```
**Reality:**
```typescript
// ✅ Actual behavior
Turn 1: session_abc123
Turn 2: session_def456
Turn 3: session_ghi789
```
**Why this matters:**
- Can't resume sessions without tracking ID updates
- Session state gets lost between turns
- Observations get orphaned
**Solution:**
```typescript
// Capture from system init message
for await (const msg of response) {
if (msg.type === 'system' && msg.subtype === 'init') {
sdkSessionId = msg.session_id;
await updateSessionId(sessionId, sdkSessionId);
}
}
```
### Realization 3: Graceful vs Aggressive Cleanup
**v3 approach:**
```typescript
// ❌ Aggressive: Kill worker immediately
SessionEnd → DELETE /worker/session → Worker stops
```
**Problems:**
- Summary generation interrupted mid-process
- Pending observations lost
- Race conditions everywhere
**v4 approach:**
```typescript
// ✅ Graceful: Let worker finish
SessionEnd → Mark session complete → Worker finishes → Exit naturally
```
**Benefits:**
- Summaries complete successfully
- No lost observations
- Clean state transitions
**Code:**
```typescript
// v3: Aggressive
async function sessionEnd(sessionId: string) {
await fetch(`http://localhost:37777/sessions/${sessionId}`, {
method: 'DELETE'
});
}
// v4: Graceful
async function sessionEnd(sessionId: string) {
await db.run(
'UPDATE sdk_sessions SET completed_at = ? WHERE id = ?',
[Date.now(), sessionId]
);
}
```
### Realization 4: One Session, Not Many
**Problem:** We were creating multiple SDK sessions per Claude Code session.
**What we thought:**
```
Claude Code session → Create SDK session per observation → 100+ SDK sessions
```
**Reality should be:**
```
Claude Code session → ONE long-running SDK session → Streaming input
```
**Why this matters:**
- SDK maintains conversation state
- Context accumulates naturally
- Much more efficient
**Implementation:**
```typescript
// ✅ Streaming Input Mode
async function* messageGenerator(): AsyncIterable<UserMessage> {
// Initial prompt
yield {
role: "user",
content: "You are a memory assistant..."
};
// Then continuously yield observations
while (session.status === 'active') {
const observations = await pollQueue();
for (const obs of observations) {
yield {
role: "user",
content: formatObservation(obs)
};
}
await sleep(1000);
}
}
const response = query({
prompt: messageGenerator(),
options: { maxTurns: 1000 }
});
```
---
## v4: The Architecture That Works
### The Core Design
```
┌─────────────────────────────────────────────────────────┐
│ CLAUDE CODE SESSION │
│ User → Claude → Tools (Read, Edit, Write, Bash) │
│ ↓ │
│ PostToolUse Hook │
│ (queues observation) │
└─────────────────────────────────────────────────────────┘
↓ SQLite queue
┌─────────────────────────────────────────────────────────┐
│ SDK WORKER PROCESS │
│ ONE streaming session per Claude Code session │
│ │
│ AsyncIterable<UserMessage> │
│ → Yields observations from queue │
│ → SDK compresses via AI │
│ → Parses XML responses │
│ → Stores in database │
└─────────────────────────────────────────────────────────┘
↓ SQLite storage
┌─────────────────────────────────────────────────────────┐
│ NEXT SESSION │
│ SessionStart Hook │
│ → Queries database │
│ → Returns progressive disclosure index │
│ → Agent fetches details via MCP │
└─────────────────────────────────────────────────────────┘
```
### The Five Hook Architecture
<Tabs>
<Tab title="SessionStart">
**Purpose:** Inject context from previous sessions
**Timing:** When Claude Code starts
**What it does:**
- Queries last 10 session summaries
- Formats as progressive disclosure index
- Injects into context via stdout
**Key change from v3:**
- ✅ Index format (not full details)
- ✅ Token counts visible
- ✅ MCP search instructions included
</Tab>
<Tab title="UserPromptSubmit">
**Purpose:** Initialize session tracking
**Timing:** Before Claude processes prompt
**What it does:**
- Creates session record
- Saves raw user prompt (v4.2.0+)
- Starts worker if needed
**Key change from v3:**
- ✅ Stores raw prompts for search
- ✅ Auto-starts PM2 worker
</Tab>
<Tab title="PostToolUse">
**Purpose:** Capture tool observations
**Timing:** After every tool execution
**What it does:**
- Enqueues observation in database
- Returns immediately
**Key change from v3:**
- ✅ Just enqueues (doesn't process)
- ✅ Worker handles all AI calls
</Tab>
<Tab title="Summary">
**Purpose:** Generate session summaries
**Timing:** Worker-triggered (mid-session)
**What it does:**
- Gathers observations
- Sends to Claude for summarization
- Stores structured summary
**Key change from v3:**
- ✅ Multiple summaries per session
- ✅ Summaries are checkpoints, not endings
</Tab>
<Tab title="SessionEnd">
**Purpose:** Graceful cleanup
**Timing:** When session ends
**What it does:**
- Marks session complete
- Lets worker finish processing
**Key change from v3:**
- ✅ Graceful (not aggressive)
- ✅ No DELETE requests
- ✅ Worker finishes naturally
</Tab>
</Tabs>
### Database Schema Evolution
**v3 schema:**
```sql
-- Simple, flat structure
CREATE TABLE observations (
id INTEGER PRIMARY KEY,
session_id TEXT,
text TEXT,
created_at INTEGER
);
```
**v4 schema:**
```sql
-- Rich, structured schema
CREATE TABLE observations (
id INTEGER PRIMARY KEY AUTOINCREMENT,
session_id TEXT NOT NULL,
project TEXT NOT NULL,
-- Progressive disclosure metadata
title TEXT NOT NULL,
subtitle TEXT,
type TEXT NOT NULL, -- decision, bugfix, feature, etc.
-- Content
narrative TEXT NOT NULL,
facts TEXT, -- JSON array
-- Searchability
concepts TEXT, -- JSON array of tags
files_read TEXT, -- JSON array
files_modified TEXT, -- JSON array
-- Timestamps
created_at TEXT NOT NULL,
created_at_epoch INTEGER NOT NULL,
FOREIGN KEY(session_id) REFERENCES sdk_sessions(id)
);
-- FTS5 for full-text search
CREATE VIRTUAL TABLE observations_fts USING fts5(
title, subtitle, narrative, facts, concepts,
content=observations
);
-- Auto-sync triggers
CREATE TRIGGER observations_ai AFTER INSERT ON observations BEGIN
INSERT INTO observations_fts(rowid, title, subtitle, narrative, facts, concepts)
VALUES (new.id, new.title, new.subtitle, new.narrative, new.facts, new.concepts);
END;
```
**What changed:**
- ✅ Structured fields (title, subtitle, type)
- ✅ FTS5 full-text search
- ✅ Project-scoped queries
- ✅ Rich metadata for progressive disclosure
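One caveat worth noting: the excerpt above shows only the `AFTER INSERT` trigger. External-content FTS5 tables (`content=observations`) are not updated automatically on deletes or updates, so a full setup typically adds companion triggers. A sketch, assuming a better-sqlite3-style driver (which may not be what claude-mem actually uses):
```typescript
import Database from 'better-sqlite3'; // assumption: the actual driver may differ

const db = new Database('claude-mem.db');

// Keep the FTS index in sync when base rows are deleted or rewritten;
// the special 'delete' command removes the old copy from the index.
db.exec(`
  CREATE TRIGGER IF NOT EXISTS observations_ad AFTER DELETE ON observations BEGIN
    INSERT INTO observations_fts(observations_fts, rowid, title, subtitle, narrative, facts, concepts)
    VALUES ('delete', old.id, old.title, old.subtitle, old.narrative, old.facts, old.concepts);
  END;

  CREATE TRIGGER IF NOT EXISTS observations_au AFTER UPDATE ON observations BEGIN
    INSERT INTO observations_fts(observations_fts, rowid, title, subtitle, narrative, facts, concepts)
    VALUES ('delete', old.id, old.title, old.subtitle, old.narrative, old.facts, old.concepts);
    INSERT INTO observations_fts(rowid, title, subtitle, narrative, facts, concepts)
    VALUES (new.id, new.title, new.subtitle, new.narrative, new.facts, new.concepts);
  END;
`);
```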
### Worker Service Redesign
**v3 worker:**
```typescript
// Multiple short SDK sessions
app.post('/process', async (req, res) => {
const response = await query({
prompt: buildPrompt(req.body),
options: { maxTurns: 1 }
});
for await (const msg of response) {
// Process single observation
}
res.json({ success: true });
});
```
**v4 worker:**
```typescript
// ONE long-running SDK session
async function runWorker(sessionId: string) {
const response = query({
prompt: messageGenerator(), // AsyncIterable
options: { maxTurns: 1000 }
});
for await (const msg of response) {
if (msg.type === 'text') {
parseObservations(msg.content);
parseSummaries(msg.content);
}
}
}
```
**Benefits:**
- Maintains conversation state
- SDK handles context automatically
- More efficient (fewer API calls)
- Natural multi-turn flow
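The `parseObservations` call above is where the worker turns the SDK's text output into database rows. A hedged sketch of such a parser, assuming `<observation>` blocks with `<title>` and `<narrative>` children (the actual tag set is not shown in this document):
```typescript
// Extract structured observations from free-form model output.
// Tag names here are assumptions for illustration.
function parseObservations(content: string): Array<{ title: string; narrative: string }> {
  const results: Array<{ title: string; narrative: string }> = [];
  const blocks = content.match(/<observation>[\s\S]*?<\/observation>/g) ?? [];
  for (const block of blocks) {
    const title = block.match(/<title>([\s\S]*?)<\/title>/)?.[1]?.trim() ?? '';
    const narrative = block.match(/<narrative>([\s\S]*?)<\/narrative>/)?.[1]?.trim() ?? '';
    if (title) results.push({ title, narrative });
  }
  return results;
}
```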
---
## Critical Fixes Along the Way
### Fix 1: Context Injection Pollution (v4.3.1)
**Problem:** SessionStart hook output polluted with npm install logs
```bash
# Hook output contained:
npm WARN deprecated ...
npm WARN deprecated ...
{"hookSpecificOutput": {"additionalContext": "..."}}
```
**Why it broke:**
- Claude Code expects clean JSON or plain text
- stderr/stdout from npm install mixed with hook output
- Context didn't inject properly
**Solution:**
```json
{
"command": "npm install --loglevel=silent && node context-hook.js"
}
```
**Result:** Clean JSON output, context injection works
### Fix 2: Double Shebang Issue (v4.3.1)
**Problem:** Hook executables had duplicate shebangs
```javascript
#!/usr/bin/env node
#!/usr/bin/env node // ← Duplicate!
// Rest of code...
```
**Why it happened:**
- Source files had shebang
- esbuild added another shebang during build
**Solution:**
```typescript
// Remove shebangs from source files
// Let esbuild add them during build
```
**Result:** Clean executables, no parsing errors
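For illustration, the fix can live entirely in the build script: let esbuild emit the shebang once via its `banner` option. A minimal sketch (the project's actual build configuration may differ):
```typescript
import { build } from 'esbuild';

// The shebang lives in the build config, not in the source file,
// so it appears exactly once in each output executable.
await build({
  entryPoints: ['src/hooks/save-hook.ts'],
  bundle: true,
  platform: 'node',
  outfile: 'plugin/scripts/save-hook.js',
  banner: { js: '#!/usr/bin/env node' },
});
```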
### Fix 3: FTS5 Injection Vulnerability (v4.2.3)
**Problem:** User input passed directly to FTS5 query
```typescript
// ❌ Vulnerable
const results = db.query(
`SELECT * FROM observations_fts WHERE observations_fts MATCH '${userQuery}'`
);
```
**Attack:**
```typescript
userQuery = "'; DROP TABLE observations; --"
```
**Solution:**
```typescript
// ✅ Safe: Use parameterized queries
const results = db.query(
'SELECT * FROM observations_fts WHERE observations_fts MATCH ?',
[userQuery]
);
```
### Fix 4: NOT NULL Constraint Violation (v4.2.8)
**Problem:** Session creation failed when prompt was empty
```sql
INSERT INTO sdk_sessions (claude_session_id, user_prompt, ...)
VALUES ('abc123', NULL, ...) -- ❌ user_prompt is NOT NULL
```
**Solution:**
```typescript
// Allow NULL user_prompts
user_prompt: input.prompt ?? null
```
**Schema change:**
```sql
-- Before
user_prompt TEXT NOT NULL
-- After
user_prompt TEXT -- Nullable
```
---
## Performance Improvements
### Optimization 1: Prepared Statements
**Before:**
```typescript
for (const obs of observations) {
db.run(`INSERT INTO observations (...) VALUES (?, ?, ...)`, [obs.id, obs.text, ...]);
}
```
**After:**
```typescript
const stmt = db.prepare(`INSERT INTO observations (...) VALUES (?, ?, ...)`);
for (const obs of observations) {
stmt.run([obs.id, obs.text, ...]);
}
stmt.finalize();
```
**Impact:** 5x faster bulk inserts
### Optimization 2: FTS5 Indexing
**Before:**
```typescript
// Manual full-text search
const results = db.query(
`SELECT * FROM observations WHERE text LIKE '%${query}%'`
);
```
**After:**
```typescript
// FTS5 virtual table
const results = db.query(
`SELECT * FROM observations_fts WHERE observations_fts MATCH ?`,
[query]
);
```
**Impact:** 100x faster searches on large datasets
### Optimization 3: Index Format Default
**Before:**
```typescript
// Always return full observations
search_observations({ query: "hooks" });
// Returns: 5,000 tokens
```
**After:**
```typescript
// Default to index format
search_observations({ query: "hooks", format: "index" });
// Returns: 200 tokens
// Fetch full only when needed
search_observations({ query: "hooks", format: "full", limit: 1 });
// Returns: 150 tokens
```
**Impact:** 25x reduction in average search result size
---
## What We Learned
### Lesson 1: Context is Precious
**Principle:** Every token you put in the context window costs attention.
**Application:**
- Progressive disclosure reduces waste by 87%
- Index-first approach gives agent control
- Token counts make costs visible
### Lesson 2: Session State is Complicated
**Principle:** Distributed state is hard. SDK handles it better than we can.
**Application:**
- Use SDK's built-in session resumption
- Don't try to manually reconstruct state
- Track session IDs from init messages
### Lesson 3: Graceful Beats Aggressive
**Principle:** Let processes finish their work before terminating.
**Application:**
- Graceful cleanup prevents data loss
- Workers finish important operations
- Clean state transitions reduce bugs
### Lesson 4: AI is the Compressor
**Principle:** Don't compress manually. Let AI do semantic compression.
**Application:**
- 10:1 to 100:1 compression ratios
- Semantic understanding, not keyword extraction
- Structured outputs (XML parsing)
### Lesson 5: Progressive Everything
**Principle:** Show metadata first, fetch details on-demand.
**Application:**
- Progressive disclosure in context injection
- Index format in search results
- Layer 1 (titles) → Layer 2 (summaries) → Layer 3 (full details)
---
## The Road Ahead
### Planned: Adaptive Index Size
```typescript
SessionStart({ source: "startup" }):
→ Show last 10 sessions (normal)
SessionStart({ source: "resume" }):
→ Show only current session (minimal)
SessionStart({ source: "compact" }):
→ Show last 20 sessions (comprehensive)
```
### Planned: Relevance Scoring
```typescript
// Use embeddings to pre-sort index by semantic relevance
search_observations({
query: "authentication bug",
sort: "relevance" // Based on embeddings
});
```
### Planned: Multi-Project Context
```typescript
// Cross-project pattern recognition
search_observations({
query: "API rate limiting",
projects: ["api-gateway", "user-service", "billing-service"]
});
```
### Planned: Collaborative Memory
```typescript
// Team-shared observations (optional)
createObservation({
title: "Rate limit: 100 req/min",
scope: "team" // vs "user"
});
```
---
## Migration Guide: v3 → v4
### Step 1: Backup Database
```bash
cp ~/.claude-mem/claude-mem.db ~/.claude-mem/claude-mem-v3-backup.db
```
### Step 2: Update Plugin
```bash
cd ~/.claude/plugins/marketplaces/thedotmack
git pull
```
### Step 3: Run Migration
```bash
npx tsx src/services/sqlite/migrations/v3-to-v4.ts
```
**What the migration does:**
- Adds new columns to observations table
- Creates FTS5 virtual tables
- Sets up auto-sync triggers
- Migrates existing observations to new schema
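A condensed sketch of the kinds of statements such a migration runs, assuming a better-sqlite3-style driver; the real script covers more columns and error handling:
```typescript
import Database from 'better-sqlite3'; // assumption: the actual driver may differ
import { homedir } from 'node:os';

const db = new Database(`${homedir()}/.claude-mem/claude-mem.db`);

// New progressive-disclosure columns (added nullable, backfilled below)
db.exec(`
  ALTER TABLE observations ADD COLUMN title TEXT;
  ALTER TABLE observations ADD COLUMN subtitle TEXT;
  ALTER TABLE observations ADD COLUMN narrative TEXT;
  ALTER TABLE observations ADD COLUMN facts TEXT;
  ALTER TABLE observations ADD COLUMN concepts TEXT;
`);

// Backfill: carry v3's flat text over as the narrative, derive a rough title
db.exec(`
  UPDATE observations SET narrative = text WHERE narrative IS NULL;
  UPDATE observations SET title = substr(text, 1, 80) WHERE title IS NULL;
`);

// Full-text index over the enriched schema
db.exec(`
  CREATE VIRTUAL TABLE IF NOT EXISTS observations_fts USING fts5(
    title, subtitle, narrative, facts, concepts,
    content=observations
  );
`);
```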
### Step 4: Restart Worker
```bash
pm2 restart claude-mem-worker
pm2 logs claude-mem-worker
```
### Step 5: Test
```bash
# Start Claude Code
claude
# Check that context is injected
# (Should see progressive disclosure index)
# Submit a prompt and check observations
pm2 logs claude-mem-worker --nostream
```
---
## Key Metrics
### v3 Performance
| Metric | Value |
|--------|-------|
| Context usage per session | ~25,000 tokens |
| Relevant context | ~2,000 tokens (8%) |
| Hook execution time | ~200ms |
| Search latency | ~500ms (LIKE queries) |
### v4 Performance
| Metric | Value |
|--------|-------|
| Context usage per session | ~1,100 tokens |
| Relevant context | ~1,100 tokens (100%) |
| Hook execution time | ~45ms |
| Search latency | ~15ms (FTS5) |
**Improvements:**
- 96% reduction in context waste
- 12x increase in relevance
- 4x faster hooks
- 33x faster search
---
## Conclusion
The journey from v3 to v4 was about understanding these fundamental truths:
1. **Context is finite** - Progressive disclosure respects attention budget
2. **AI is the compressor** - Semantic understanding beats keyword extraction
3. **Agents are smart** - Let them decide what to fetch
4. **State is hard** - Use SDK's built-in mechanisms
5. **Graceful wins** - Let processes finish cleanly
The result is a memory system that's both powerful and invisible. Users never notice it working - Claude just gets smarter over time.
---
## Further Reading
- [Progressive Disclosure](/docs/progressive-disclosure) - The philosophy behind v4
- [Hooks Architecture](/docs/hooks-architecture) - How hooks power the system
- [Context Engineering](/docs/context-engineering) - Foundational principles
- [v4.0.0 Release Notes](/CHANGELOG.md#v400) - Full changelog
---
*This architecture evolution reflects hundreds of hours of experimentation, dozens of dead ends, and the invaluable experience of real-world usage. v4 is the architecture that emerged from understanding what actually works.*
+222
@@ -0,0 +1,222 @@
# Context Engineering for AI Agents: Best Practices Cheat Sheet
## Core Principle
**Find the smallest possible set of high-signal tokens that maximize the likelihood of your desired outcome.**
---
## Context Engineering vs Prompt Engineering
**Prompt Engineering**: Writing and organizing LLM instructions for optimal outcomes (one-time task)
**Context Engineering**: Curating and maintaining the optimal set of tokens during inference across multiple turns (iterative process)
Context engineering manages:
- System instructions
- Tools
- Model Context Protocol (MCP)
- External data
- Message history
- Runtime data retrieval
---
## The Problem: Context Rot
**Key Insight**: LLMs have an "attention budget" that gets depleted as context grows
- Every token attends to every other token (n² relationships)
- As context length increases, model accuracy decreases
- Models have less training experience with longer sequences
- Context must be treated as a finite resource with diminishing marginal returns
---
## System Prompts: Find the "Right Altitude"
### The Goldilocks Zone
**Too Prescriptive** ❌
- Hardcoded if-else logic
- Brittle and fragile
- High maintenance complexity
**Too Vague** ❌
- High-level guidance without concrete signals
- Falsely assumes shared context
- Lacks actionable direction
**Just Right** ✅
- Specific enough to guide behavior effectively
- Flexible enough to provide strong heuristics
- Minimal set of information that fully outlines expected behavior
### Best Practices
- Use simple, direct language
- Organize into distinct sections (`<background_information>`, `<instructions>`, `## Tool guidance`, etc.)
- Use XML tags or Markdown headers for structure
- Start with minimal prompt, add based on failure modes
- Note: Minimal ≠ short (provide sufficient information upfront)
---
## Tools: Minimal and Clear
### Design Principles
- **Self-contained**: Each tool has a single, clear purpose
- **Robust to error**: Handle edge cases gracefully
- **Extremely clear**: Intended use is unambiguous
- **Token-efficient**: Returns relevant information without bloat
- **Descriptive parameters**: Unambiguous input names (e.g., `user_id` not `user`)
### Critical Rule
**If a human engineer can't definitively say which tool to use in a given situation, an AI agent can't be expected to do better.**
### Common Failure Modes to Avoid
- Bloated tool sets covering too much functionality
- Tools with overlapping purposes
- Ambiguous decision points about which tool to use
---
## Examples: Diverse, Not Exhaustive
**Do** ✅
- Curate a set of diverse, canonical examples
- Show expected behavior effectively
- Think "pictures worth a thousand words"
**Don't** ❌
- Stuff in a laundry list of edge cases
- Try to articulate every possible rule
- Overwhelm with exhaustive scenarios
---
## Context Retrieval Strategies
### Just-In-Time Context (Recommended for Agents)
**Approach**: Maintain lightweight identifiers (file paths, queries, links) and dynamically load data at runtime
**Benefits**:
- Avoids context pollution
- Enables progressive disclosure
- Mirrors human cognition (we don't memorize everything)
- Leverages metadata (file names, folder structure, timestamps)
- Agents discover context incrementally
**Trade-offs**:
- Slower than pre-computed retrieval
- Requires proper tool guidance to avoid dead-ends
### Pre-Inference Retrieval (Traditional RAG)
**Approach**: Use embedding-based retrieval to surface context before inference
**When to Use**: Static content that won't change during interaction
### Hybrid Strategy (Best of Both)
**Approach**: Retrieve some data upfront, enable autonomous exploration as needed
**Example**: Claude Code loads CLAUDE.md files upfront, uses glob/grep for just-in-time retrieval
**Rule of Thumb**: "Do the simplest thing that works"
---
## Long-Horizon Tasks: Three Techniques
### 1. Compaction
**What**: Summarize conversation nearing context limit, reinitiate with summary
**Implementation**:
- Pass message history to model for compression
- Preserve critical details (architectural decisions, bugs, implementation)
- Discard redundant outputs
- Continue with compressed context + recently accessed files
**Tuning Process**:
1. **First**: Maximize recall (capture all relevant information)
2. **Then**: Improve precision (eliminate superfluous content)
**Low-Hanging Fruit**: Clear old tool calls and results
**Best For**: Tasks requiring extensive back-and-forth
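As a sketch of the mechanics (the message shape and `summarize` function are assumptions, not tied to any particular SDK):
```typescript
interface Message { role: 'user' | 'assistant'; content: string }

// Keep the most recent turns verbatim; replace everything older with a summary.
async function compact(
  history: Message[],
  summarize: (msgs: Message[]) => Promise<string>, // e.g. a call to a smaller model
  keepRecent = 10
): Promise<Message[]> {
  if (history.length <= keepRecent) return history;
  const older = history.slice(0, -keepRecent);
  const recent = history.slice(-keepRecent);
  const summary = await summarize(older);
  return [
    { role: 'user', content: `Summary of earlier conversation:\n${summary}` },
    ...recent,
  ];
}
```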
### 2. Structured Note-Taking (Agentic Memory)
**What**: Agent writes notes persisted outside context window, retrieved later
**Examples**:
- To-do lists
- NOTES.md files
- Game state tracking (Pokémon example: tracking 1,234 steps of training)
- Project progress logs
**Benefits**:
- Persistent memory with minimal overhead
- Maintains critical context across tool calls
- Enables multi-hour coherent strategies
**Best For**: Iterative development with clear milestones
### 3. Sub-Agent Architectures
**What**: Specialized sub-agents handle focused tasks with clean context windows
**How It Works**:
- Main agent coordinates high-level plan
- Sub-agents perform deep technical work
- Sub-agents explore extensively (tens of thousands of tokens)
- Return condensed summaries (1,000-2,000 tokens)
**Benefits**:
- Clear separation of concerns
- Parallel exploration
- Detailed context remains isolated
**Best For**: Complex research and analysis tasks
---
## Quick Decision Framework
| Scenario | Recommended Approach |
|----------|---------------------|
| Static content | Pre-inference retrieval or hybrid |
| Dynamic exploration needed | Just-in-time context |
| Extended back-and-forth | Compaction |
| Iterative development | Structured note-taking |
| Complex research | Sub-agent architectures |
| Rapid model improvement | "Do the simplest thing that works" |
---
## Key Takeaways
1. **Context is finite**: Treat it as a precious resource with an attention budget
2. **Think holistically**: Consider the entire state available to the LLM
3. **Stay minimal**: More context isn't always better
4. **Be iterative**: Context curation happens each time you pass to the model
5. **Design for autonomy**: As models improve, let them act intelligently
6. **Start simple**: Test with minimal setup, add based on failure modes
---
## Anti-Patterns to Avoid
- ❌ Cramming everything into prompts
- ❌ Creating brittle if-else logic
- ❌ Building bloated tool sets
- ❌ Stuffing exhaustive edge cases as examples
- ❌ Assuming larger context windows solve everything
- ❌ Ignoring context pollution over long interactions
---
## Remember
> "Even as models continue to improve, the challenge of maintaining coherence across extended interactions will remain central to building more effective agents."
Context engineering will evolve, but the core principle stays the same: **optimize signal-to-noise ratio in your token budget**.
---
*Based on Anthropic's "Effective context engineering for AI agents" (September 2025)*
+10
@@ -39,6 +39,14 @@
"usage/search-tools"
]
},
{
"group": "Best Practices",
"icon": "lightbulb",
"pages": [
"context-engineering",
"progressive-disclosure"
]
},
{
"group": "Configuration & Development",
"icon": "gear",
@@ -53,6 +61,8 @@
"icon": "diagram-project",
"pages": [
"architecture/overview",
"architecture-evolution",
"hooks-architecture",
"architecture/hooks",
"architecture/worker-service",
"architecture/database",
+784
@@ -0,0 +1,784 @@
# How Claude-Mem Uses Hooks: A Lifecycle-Driven Architecture
## Core Principle
**Observe the main Claude Code session from the outside, process observations in the background, inject context at the right time.**
---
## The Big Picture
Claude-Mem is fundamentally a **hook-driven system**. Every piece of functionality happens in response to lifecycle events:
```
┌─────────────────────────────────────────────────────────┐
│ CLAUDE CODE SESSION │
│ (Main session - user interacting with Claude) │
│ │
│ SessionStart → UserPromptSubmit → Tool Use → Stop │
│ ↓ ↓ ↓ ↓ │
│ [Hook] [Hook] [Hook] [Hook] │
└─────────────────────────────────────────────────────────┘
↓ ↓ ↓ ↓
┌─────────────────────────────────────────────────────────┐
│ CLAUDE-MEM SYSTEM │
│ │
│ Context New Session Observation Summary │
│ Injection Tracking Capture Generation │
└─────────────────────────────────────────────────────────┘
```
**Key insight:** Claude-Mem doesn't interrupt or modify Claude Code's behavior. It observes from the outside and provides value through lifecycle hooks.
---
## Why Hooks?
### The Non-Invasive Requirement
Claude-Mem had several architectural constraints:
1. **Can't modify Claude Code**: It's a closed-source binary
2. **Must be fast**: Can't slow down the main session
3. **Must be reliable**: Can't break Claude Code if it fails
4. **Must be portable**: Works on any project without configuration
**Solution:** External command hooks configured via settings.json
### The Hook System Advantage
Claude Code's hook system provides exactly what we need:
<CardGroup cols={2}>
<Card title="Lifecycle Events" icon="clock">
SessionStart, UserPromptSubmit, PostToolUse, Stop
</Card>
<Card title="Non-Blocking" icon="forward">
Hooks run in parallel, don't wait for completion
</Card>
<Card title="Context Injection" icon="upload">
SessionStart and UserPromptSubmit can add context
</Card>
<Card title="Tool Observation" icon="eye">
PostToolUse sees all tool inputs and outputs
</Card>
</CardGroup>
---
## The Five Hooks
### Hook 1: SessionStart (Context Hook)
**Purpose:** Inject relevant context from previous sessions
**When:** Claude Code starts or resumes
**What it does:**
1. Extracts project name from current working directory
2. Queries SQLite for recent session summaries (last 10)
3. Queries SQLite for recent observations (last 50)
4. Formats as progressive disclosure index
5. Outputs to stdout (automatically injected into context)
**Configuration:**
```json
{
"hooks": {
"SessionStart": [{
"matcher": "startup",
"hooks": [{
"type": "command",
"command": "${CLAUDE_PLUGIN_ROOT}/scripts/context-hook.js",
"timeout": 120
}]
}]
}
}
```
**Key decisions:**
- ✅ Only runs on "startup" (not "clear" or "compact")
- ✅ 120-second timeout for npm install (v4.3.1 fix)
- ✅ Uses `--loglevel=silent` for clean JSON output
- ✅ Progressive disclosure format (index, not full details)
**Output format:**
```markdown
# [claude-mem] recent context
**Legend:** 🎯 session-request | 🔴 gotcha | 🟡 problem-solution ...
### Oct 26, 2025
**General**
| ID | Time | T | Title | Tokens |
|----|------|---|-------|--------|
| #2586 | 12:58 AM | 🔵 | Context hook file empty | ~51 |
*Use claude-mem MCP search to access full details*
```
**Source:** `src/hooks/context-hook.ts` → `plugin/scripts/context-hook.js`
---
### Hook 2: UserPromptSubmit (New Session Hook)
**Purpose:** Initialize session tracking when user submits a prompt
**When:** Before Claude processes the user's message
**What it does:**
1. Reads user prompt and session ID from stdin
2. Creates new session record in SQLite
3. Saves raw user prompt for full-text search (v4.2.0+)
4. Starts PM2 worker service if not running
5. Returns immediately (non-blocking)
**Configuration:**
```json
{
"hooks": {
"UserPromptSubmit": [{
"hooks": [{
"type": "command",
"command": "${CLAUDE_PLUGIN_ROOT}/scripts/new-hook.js"
}]
}]
}
}
```
**Key decisions:**
- ✅ No matcher (runs for all prompts)
- ✅ Creates session record immediately
- ✅ Stores raw prompts for search (privacy note: local SQLite only)
- ✅ Auto-starts worker service
- ✅ Suppresses output (`suppressOutput: true`)
**Database operations:**
```sql
INSERT INTO sdk_sessions (claude_session_id, project, user_prompt, ...)
VALUES (?, ?, ?, ...)
INSERT INTO user_prompts (session_id, prompt, prompt_number, ...)
VALUES (?, ?, ?, ...)
```
**Source:** `src/hooks/new-hook.ts` → `plugin/scripts/new-hook.js`
---
### Hook 3: PostToolUse (Save Observation Hook)
**Purpose:** Capture tool execution observations for later processing
**When:** Immediately after any tool completes successfully
**What it does:**
1. Receives tool name, input, output from stdin
2. Finds active session for current project
3. Enqueues observation in observation_queue table
4. Returns immediately (processing happens in worker)
**Configuration:**
```json
{
"hooks": {
"PostToolUse": [{
"matcher": "*",
"hooks": [{
"type": "command",
"command": "${CLAUDE_PLUGIN_ROOT}/scripts/save-hook.js"
}]
}]
}
}
```
**Key decisions:**
- ✅ Matcher: `*` (captures all tools)
- ✅ Non-blocking (just enqueues, doesn't process)
- ✅ Worker processes observations asynchronously
- ✅ Parallel execution safe (each hook gets own stdin)
**Database operations:**
```sql
INSERT INTO observation_queue (session_id, tool_name, tool_input, tool_output, ...)
VALUES (?, ?, ?, ?, ...)
```
**What gets queued:**
```json
{
"session_id": "abc123",
"tool_name": "Edit",
"tool_input": {
"file_path": "/path/to/file.ts",
"old_string": "...",
"new_string": "..."
},
"tool_output": {
"success": true,
"linesChanged": 5
},
"created_at_epoch": 1698765432
}
```
**Source:** `src/hooks/save-hook.ts` → `plugin/scripts/save-hook.js`
---
### Hook 4: Summary Hook (Mid-Session Checkpoint)
**Purpose:** Generate AI-powered session summaries during the session
**When:** Triggered programmatically by the worker service
**What it does:**
1. Gathers session observations from database
2. Sends to Claude Agent SDK for summarization
3. Processes response and extracts structured summary
4. Stores in session_summaries table
**Configuration:**
```json
{
"hooks": {
"Summary": [{
"hooks": [{
"type": "command",
"command": "${CLAUDE_PLUGIN_ROOT}/scripts/summary-hook.js"
}]
}]
}
}
```
**Key decisions:**
- ✅ Triggered by worker, not by Claude Code lifecycle
- ✅ Multiple summaries per session (v4.2.0+)
- ✅ Summaries are checkpoints, not endings
- ✅ Uses Claude Agent SDK for AI compression
**Summary structure:**
```xml
<summary>
<request>User's original request</request>
<investigated>What was examined</investigated>
<learned>Key discoveries</learned>
<completed>Work finished</completed>
<next_steps>Remaining tasks</next_steps>
<files_read>
<file>path/to/file1.ts</file>
<file>path/to/file2.ts</file>
</files_read>
<files_modified>
<file>path/to/file3.ts</file>
</files_modified>
<notes>Additional context</notes>
</summary>
```
**Source:** `src/hooks/summary-hook.ts` → `plugin/scripts/summary-hook.js`
---
### Hook 5: SessionEnd (Cleanup Hook)
**Purpose:** Mark sessions as completed when they end
**When:** Claude Code session ends (not on `/clear`)
**What it does:**
1. Marks session as completed in database
2. Allows worker to finish processing
3. Performs graceful cleanup
**Configuration:**
```json
{
"hooks": {
"SessionEnd": [{
"hooks": [{
"type": "command",
"command": "${CLAUDE_PLUGIN_ROOT}/scripts/cleanup-hook.js"
}]
}]
}
}
```
**Key decisions:**
- ✅ Graceful completion (v4.1.0+)
- ✅ No longer sends DELETE to workers
- ✅ Skips cleanup on `/clear` commands
- ✅ Preserves ongoing sessions
**Why graceful cleanup?**
**Old approach (v3):**
```typescript
// ❌ Aggressive cleanup
SessionEnd → DELETE /worker/session → Worker stops immediately
```
**Problems:**
- Interrupted summary generation
- Lost pending observations
- Race conditions
**New approach (v4.1.0+):**
```typescript
// ✅ Graceful completion
SessionEnd → UPDATE sessions SET completed_at = NOW()
Worker sees completion → Finishes processing → Exits naturally
```
**Benefits:**
- Worker finishes important operations
- Summaries complete successfully
- Clean state transitions
**Source:** `src/hooks/cleanup-hook.ts` → `plugin/scripts/cleanup-hook.js`
---
## Hook Execution Flow
### Session Lifecycle
```mermaid
sequenceDiagram
participant User
participant Claude
participant Hooks
participant Worker
participant DB
User->>Claude: Start Claude Code
Claude->>Hooks: SessionStart hook
Hooks->>DB: Query recent context
DB-->>Hooks: Session summaries + observations
Hooks-->>Claude: Inject context
Note over Claude: Context available for session
User->>Claude: Submit prompt
Claude->>Hooks: UserPromptSubmit hook
Hooks->>DB: Create session record
Hooks->>Worker: Start worker (if not running)
Worker-->>DB: Ready to process
Claude->>Claude: Execute tools
Claude->>Hooks: PostToolUse (multiple times)
Hooks->>DB: Queue observations
Note over Worker: Polls queue, processes observations
Worker->>Worker: AI compression
Worker->>DB: Store compressed observations
Worker->>Hooks: Trigger summary hook
Hooks->>DB: Store session summary
User->>Claude: Finish
Claude->>Hooks: SessionEnd hook
Hooks->>DB: Mark session complete
Worker->>DB: Check completion
Worker->>Worker: Finish processing
Worker->>Worker: Exit gracefully
```
### Hook Timing
| Event | Timing | Blocking | Timeout | Output Handling |
|-------|--------|----------|---------|-----------------|
| **SessionStart** | Before session | No | 120s | stdout → context |
| **UserPromptSubmit** | Before processing | No | 60s | stdout → context |
| **PostToolUse** | After tool | No | 60s | Transcript only |
| **Summary** | Worker triggered | No | 300s | Database |
| **SessionEnd** | On exit | No | 60s | Log only |
---
## The Worker Service Architecture
### Why a Background Worker?
**Problem:** Hooks must be fast (< 1 second)
**Reality:** AI compression takes 5-30 seconds per observation
**Solution:** Hooks enqueue observations, worker processes async
```
┌─────────────────────────────────────────────────────────┐
│ HOOK (Fast) │
│ 1. Read stdin (< 1ms) │
│ 2. Insert into queue (< 10ms) │
│ 3. Return success (< 20ms total) │
└─────────────────────────────────────────────────────────┘
↓ (queue)
┌─────────────────────────────────────────────────────────┐
│ WORKER (Slow) │
│ 1. Poll queue every 1s │
│ 2. Process observation via Claude SDK (5-30s) │
│ 3. Parse and store results │
│ 4. Mark observation processed │
└─────────────────────────────────────────────────────────┘
```
### PM2 Process Management
**Technology:** PM2 (process manager for Node.js)
**Why PM2:**
- Auto-restart on failure
- Log management
- Process monitoring
- Cross-platform (works on macOS, Linux, Windows)
- No systemd/launchd needed
**Configuration:**
```javascript
// ecosystem.config.cjs
module.exports = {
apps: [{
name: 'claude-mem-worker',
script: './plugin/scripts/worker-service.cjs',
instances: 1,
autorestart: true,
watch: false,
max_memory_restart: '500M',
env: {
NODE_ENV: 'production',
CLAUDE_MEM_WORKER_PORT: 37777
}
}]
};
```
**Worker lifecycle:**
```bash
# Started by new-hook (if not running)
pm2 start ecosystem.config.cjs
# Status check
pm2 status claude-mem-worker
# View logs
pm2 logs claude-mem-worker
# Restart
pm2 restart claude-mem-worker
```
### Worker HTTP API
**Technology:** Express.js REST API on port 37777
**Endpoints:**
| Endpoint | Method | Purpose |
|----------|--------|---------|
| `/health` | GET | Health check |
| `/sessions` | POST | Create session |
| `/sessions/:id` | GET | Get session status |
| `/sessions/:id` | PATCH | Update session |
| `/observations` | POST | Enqueue observation |
| `/observations/:id` | GET | Get observation |
**Why HTTP API?**
- Language-agnostic (hooks can be written in any language)
- Easy debugging (curl commands)
- Standard error handling
- Proper async handling
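Because it is plain HTTP, the API is trivial to probe from any client. For example, a health check with Node's built-in `fetch` (the response fields are illustrative, not a documented schema):
```typescript
// Probe the worker's health endpoint on its fixed port
const res = await fetch('http://localhost:37777/health');
if (!res.ok) throw new Error(`Worker unhealthy: HTTP ${res.status}`);
console.log(await res.json()); // e.g. { status: "ok" }
```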
---
## Design Patterns
### Pattern 1: Fire-and-Forget Hooks
**Principle:** Hooks should return immediately, not wait for completion
```typescript
// ❌ Bad: Hook waits for processing
export async function saveHook(stdin: HookInput) {
const observation = parseInput(stdin);
await processObservation(observation); // BLOCKS!
return success();
}
// ✅ Good: Hook enqueues and returns
export async function saveHook(stdin: HookInput) {
const observation = parseInput(stdin);
await enqueueObservation(observation); // Fast
return success(); // Immediate
}
```
### Pattern 2: Queue-Based Processing
**Principle:** Decouple capture from processing
```
Hook (capture) → Queue (buffer) → Worker (process)
```
**Benefits:**
- Parallel hook execution safe
- Worker failure doesn't affect hooks
- Retry logic centralized
- Backpressure handling
### Pattern 3: Graceful Degradation
**Principle:** Memory system failure shouldn't break Claude Code
```typescript
try {
await captureObservation();
} catch (error) {
// Log error, but don't throw
console.error('Memory capture failed:', error);
return { continue: true, suppressOutput: true };
}
```
**Failure modes:**
- Database locked → Skip observation, log error
- Worker crashed → Auto-restart via PM2
- Network issue → Retry with exponential backoff
- Disk full → Warn user, disable memory
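The retry path for transient failures might look like this generic helper; the attempt count and delays are illustrative:
```typescript
// Retry an async operation with exponential backoff: 1s, 2s, 4s, ...
async function withRetry<T>(fn: () => Promise<T>, attempts = 3): Promise<T> {
  let lastError: unknown;
  for (let i = 0; i < attempts; i++) {
    try {
      return await fn();
    } catch (error) {
      lastError = error;
      await new Promise(resolve => setTimeout(resolve, 1000 * 2 ** i));
    }
  }
  throw lastError;
}
```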
### Pattern 4: Progressive Enhancement
**Principle:** Core functionality works without memory, memory enhances it
```
Without memory: Claude Code works normally
With memory: Claude Code + context from past sessions
Memory broken: Falls back to working normally
```
---
## Hook Debugging
### Debug Mode
Enable detailed hook execution logs:
```bash
claude --debug
```
**Output:**
```
[DEBUG] Executing hooks for PostToolUse:Write
[DEBUG] Getting matching hook commands for PostToolUse with query: Write
[DEBUG] Found 1 hook matchers in settings
[DEBUG] Matched 1 hooks for query "Write"
[DEBUG] Found 1 hook commands to execute
[DEBUG] Executing hook command: ${CLAUDE_PLUGIN_ROOT}/scripts/save-hook.js with timeout 60000ms
[DEBUG] Hook command completed with status 0: {"continue":true,"suppressOutput":true}
```
### Common Issues
<AccordionGroup>
<Accordion title="Hook not executing">
**Symptoms:** Hook command never runs
**Debugging:**
1. Check `/hooks` menu - is hook registered?
2. Verify matcher pattern (case-sensitive!)
3. Test command manually: `echo '{}' | node save-hook.js`
4. Check file permissions (executable?)
</Accordion>
<Accordion title="Hook times out">
**Symptoms:** Hook execution exceeds timeout
**Debugging:**
1. Check timeout setting (default 60s)
2. Identify slow operation (database? network?)
3. Move slow operation to worker
4. Increase timeout if necessary
</Accordion>
<Accordion title="Context not injecting">
**Symptoms:** SessionStart hook runs but context missing
**Debugging:**
1. Check stdout (must be valid JSON or plain text)
2. Verify no stderr output (pollutes JSON)
3. Check exit code (must be 0)
4. Look for npm install output (v4.3.1 fix)
</Accordion>
<Accordion title="Observations not captured">
**Symptoms:** PostToolUse hook runs but observations missing
**Debugging:**
1. Check database: `sqlite3 ~/.claude-mem/claude-mem.db "SELECT * FROM observation_queue"`
2. Verify session exists: `SELECT * FROM sdk_sessions`
3. Check worker status: `pm2 status`
4. View worker logs: `pm2 logs claude-mem-worker`
</Accordion>
</AccordionGroup>
### Testing Hooks Manually
```bash
# Test context hook
echo '{
"session_id": "test123",
"cwd": "/Users/alex/projects/my-app",
"hook_event_name": "SessionStart",
"source": "startup"
}' | node plugin/scripts/context-hook.js
# Test save hook
echo '{
"session_id": "test123",
"tool_name": "Edit",
"tool_input": {"file_path": "test.ts"},
"tool_output": {"success": true}
}' | node plugin/scripts/save-hook.js
# Test with actual Claude Code
claude --debug
/hooks # View registered hooks
# Submit prompt and watch debug output
```
---
## Performance Considerations
### Hook Execution Time
**Target:** < 100ms per hook
**Actual measurements:**
| Hook | Average | p95 | p99 |
|------|---------|-----|-----|
| SessionStart | 45ms | 120ms | 250ms |
| UserPromptSubmit | 12ms | 25ms | 50ms |
| PostToolUse | 8ms | 15ms | 30ms |
| SessionEnd | 5ms | 10ms | 20ms |
**Why SessionStart is slower:**
- npm install check (idempotent but runs every time)
- Database query for 10 sessions + 50 observations
- Formatting progressive disclosure index
**Optimization (v4.3.1):**
- Use `--loglevel=silent` for npm install
- Cache package.json hash to skip unnecessary installs
- Use prepared statements for database queries
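The package.json hash cache mentioned above could be as simple as the following sketch (the cache file name and working directory are assumptions):
```typescript
import { createHash } from 'node:crypto';
import { existsSync, readFileSync, writeFileSync } from 'node:fs';
import { execSync } from 'node:child_process';

// Skip `npm install` entirely when package.json hasn't changed since last run
const CACHE = '.claude-mem-pkg-hash';
const hash = createHash('sha256').update(readFileSync('package.json')).digest('hex');

if (!existsSync(CACHE) || readFileSync(CACHE, 'utf8') !== hash) {
  execSync('npm install --prefer-offline --loglevel=silent', { stdio: 'ignore' });
  writeFileSync(CACHE, hash);
}
```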
### Database Performance
**Schema optimizations:**
- Indexes on `project`, `created_at_epoch`, `claude_session_id`
- FTS5 virtual tables for full-text search
- WAL mode for concurrent reads/writes
**Query patterns:**
```sql
-- Fast: Uses index on (project, created_at_epoch)
SELECT * FROM session_summaries
WHERE project = ?
ORDER BY created_at_epoch DESC
LIMIT 10
-- Fast: Uses index on claude_session_id
SELECT * FROM sdk_sessions
WHERE claude_session_id = ?
LIMIT 1
-- Fast: FTS5 full-text search
SELECT * FROM observations_fts
WHERE observations_fts MATCH ?
ORDER BY rank
LIMIT 20
```
### Worker Throughput
**Bottleneck:** Claude API latency (5-30s per observation)
**Mitigation:**
- Process observations sequentially (simpler, more predictable)
- Skip low-value observations (TodoWrite, ListMcpResourcesTool)
- Batch summaries (generate every N observations, not every observation)
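The skip list can be a simple set lookup before enqueueing, for example:
```typescript
// Tools whose outputs rarely carry durable insight are dropped before queueing
const SKIP_TOOLS = new Set(['TodoWrite', 'ListMcpResourcesTool']);

function shouldEnqueue(toolName: string): boolean {
  return !SKIP_TOOLS.has(toolName);
}

shouldEnqueue('Edit');      // true  → worth compressing
shouldEnqueue('TodoWrite'); // false → dropped
```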
**Future optimization:**
- Parallel processing (multiple workers)
- Smart batching (combine related observations)
- Lazy summarization (summarize only when needed)
---
## Security Considerations
### Hook Command Safety
**Risk:** Hooks execute arbitrary commands with user permissions
**Mitigations:**
1. **Frozen at startup:** Hook configuration captured at start, changes require review
2. **User review required:** `/hooks` menu shows changes, requires approval
3. **Plugin isolation:** `${CLAUDE_PLUGIN_ROOT}` prevents path traversal
4. **Input validation:** Hooks validate stdin schema before processing
### Data Privacy
**What gets stored:**
- User prompts (raw text) - v4.2.0+
- Tool inputs and outputs
- File paths read/modified
- Session summaries
**Privacy guarantees:**
- All data stored locally in `~/.claude-mem/claude-mem.db`
- No cloud uploads (API calls only for AI compression)
- SQLite file permissions: user-only read/write
- No analytics or telemetry
### API Key Protection
**Configuration:**
- Anthropic API key in `~/.anthropic/api_key` or `ANTHROPIC_API_KEY` env var
- Worker inherits environment from Claude Code
- Never logged or stored in database
---
## Key Takeaways
1. **Hooks are interfaces**: They define clean boundaries between systems
2. **Non-blocking is critical**: Hooks must return fast, workers do the heavy lifting
3. **Graceful degradation**: Memory system can fail without breaking Claude Code
4. **Queue-based decoupling**: Capture and processing happen independently
5. **Progressive disclosure**: Context injection uses index-first approach
6. **Lifecycle alignment**: Each hook has a clear, single purpose
---
## Further Reading
- [Claude Code Hooks Reference](https://docs.claude.com/claude-code/hooks) - Official documentation
- [Progressive Disclosure](/docs/progressive-disclosure) - Context priming philosophy
- [Architecture Evolution](/docs/architecture-evolution) - v3 to v4 journey
- [Worker Service Design](/docs/worker-service) - Background processing details
---
*The hook-driven architecture enables Claude-Mem to be both powerful and invisible. Users never notice the memory system working - it just makes Claude smarter over time.*
+655
@@ -0,0 +1,655 @@
# Progressive Disclosure: Claude-Mem's Context Priming Philosophy
## Core Principle
**Show what exists and its retrieval cost first. Let the agent decide what to fetch based on relevance and need.**
---
## What is Progressive Disclosure?
Progressive disclosure is an information architecture pattern where you reveal complexity gradually rather than all at once. In the context of AI agents, it means:
1. **Layer 1 (Index)**: Show lightweight metadata (titles, dates, types, token counts)
2. **Layer 2 (Details)**: Fetch full content only when needed
3. **Layer 3 (Deep Dive)**: Read original source files if required
This mirrors how humans work: We scan headlines before reading articles, review table of contents before diving into chapters, and check file names before opening files.
---
## The Problem: Context Pollution
Traditional RAG (Retrieval-Augmented Generation) systems fetch everything upfront:
```
❌ Traditional Approach:
┌─────────────────────────────────────┐
│ Session Start │
│ │
│ [15,000 tokens of past sessions] │
│ [8,000 tokens of observations] │
│ [12,000 tokens of file summaries] │
│ │
│ Total: 35,000 tokens │
│ Relevant: ~2,000 tokens (6%) │
└─────────────────────────────────────┘
```
**Problems:**
- Wastes 94% of attention budget on irrelevant context
- User prompt gets buried under mountain of history
- Agent must process everything before understanding task
- No way to know what's actually useful until after reading
---
## Claude-Mem's Solution: Progressive Disclosure
```
✅ Progressive Disclosure Approach:
┌─────────────────────────────────────┐
│ Session Start │
│ │
│ Index of 50 observations: ~800 tokens│
│ ↓ │
│ Agent sees: "🔴 Hook timeout issue" │
│ Agent decides: "Relevant!" │
│ ↓ │
│ Fetch observation #2543: ~120 tokens│
│ │
│ Total: 920 tokens │
│ Relevant: 920 tokens (100%) │
└─────────────────────────────────────┘
```
**Benefits:**
- Agent controls its own context consumption
- Directly relevant to current task
- Can fetch more if needed
- Can skip everything if not relevant
- Clear cost/benefit for each retrieval decision
---
## How It Works in Claude-Mem
### The Index Format
Every SessionStart hook provides a compact index:
```markdown
### Oct 26, 2025
**General**
| ID | Time | T | Title | Tokens |
|----|------|---|-------|--------|
| #2586 | 12:58 AM | 🔵 | Context hook file exists but is empty | ~51 |
| #2587 | ″ | 🔵 | Context hook script file is empty | ~46 |
| #2589 | ″ | 🟡 | Investigated hook debug output docs | ~105 |
**src/hooks/context-hook.ts**
| ID | Time | T | Title | Tokens |
|----|------|---|-------|--------|
| #2591 | 1:15 AM | ⚖️ | Stderr messaging abandoned | ~155 |
| #2592 | 1:16 AM | ⚖️ | Web UI strategy redesigned | ~193 |
```
**What the agent sees:**
- **What exists**: Observation titles give semantic meaning
- **When it happened**: Timestamps for temporal context
- **What type**: Icons indicate observation category
- **Retrieval cost**: Token counts for informed decisions
- **Where to get it**: MCP search tools referenced at bottom
### The Legend System
```
🎯 session-request - User's original goal
🔴 gotcha - Critical edge case or pitfall
🟡 problem-solution - Bug fix or workaround
🔵 how-it-works - Technical explanation
🟢 what-changed - Code/architecture change
🟣 discovery - Learning or insight
🟠 why-it-exists - Design rationale
🟤 decision - Architecture decision
⚖️ trade-off - Deliberate compromise
```
**Purpose:**
- Visual scanning (humans and AI both benefit)
- Semantic categorization
- Priority signaling (🔴 gotchas are more critical)
- Pattern recognition across sessions
### Progressive Disclosure Instructions
The index includes usage guidance:
```markdown
💡 **Progressive Disclosure:** This index shows WHAT exists and retrieval COST.
- Use MCP search tools to fetch full observation details on-demand
- Prefer searching observations over re-reading code for past decisions
- Critical types (🔴 gotcha, 🟤 decision, ⚖️ trade-off) often worth fetching immediately
```
**What this does:**
- Teaches the agent the pattern
- Suggests when to fetch (critical types)
- Recommends search over code re-reading (efficiency)
- Makes the system self-documenting
---
## The Philosophy: Context as Currency
### Mental Model: Token Budget as Money
Think of context window as a bank account:
| Approach | Metaphor | Outcome |
|----------|----------|---------|
| **Dump everything** | Spending your entire paycheck on groceries you might need someday | Waste, clutter, can't afford what you actually need |
| **Fetch nothing** | Refusing to spend any money | Starvation, can't accomplish tasks |
| **Progressive disclosure** | Check your pantry, make a shopping list, buy only what you need | Efficiency, room for unexpected needs |
### The Attention Budget
LLMs have finite attention:
- Every token attends to every other token (n² relationships)
- A 100,000-token window ≠ 100,000 tokens of useful attention
- Context "rot" sets in as the window fills
- Later tokens get less attention than earlier ones
**Claude-Mem's approach:**
- Start with ~1,000 tokens of index
- Agent has 99,000 tokens free for task
- Agent fetches ~200 tokens when needed
- Final budget: ~98,000 tokens for actual work
### Design for Autonomy
> "As models improve, let them act intelligently"
Progressive disclosure treats the agent as an **intelligent information forager**, not a passive recipient of pre-selected context.
**Traditional RAG:**
```
System → [Decides relevance] → Agent
Hope this helps!
```
**Progressive Disclosure:**
```
System → [Shows index] → Agent → [Decides relevance] → [Fetches details]
You know best!
```
The agent knows:
- The current task context
- What information would help
- How much budget to spend
- When to stop searching
We don't.
---
## Implementation Principles
### 1. Make Costs Visible
Every item in the index shows token count:
```
| #2591 | 1:15 AM | ⚖️ | Stderr messaging abandoned | ~155 |
^^^^
Retrieval cost
```
**Why:**
- Agent can make informed ROI decisions
- Small observations (~50 tokens) are "cheap" to fetch
- Large observations (~500 tokens) require stronger justification
- Matches how humans think about effort
### 2. Use Semantic Compression
Titles compress full observations into ~10 words:
**Bad title:**
```
Observation about a thing
```
**Good title:**
```
🔴 Hook timeout issue: 60s default too short for npm install
```
**What makes a good title:**
- Specific: Identifies exact issue
- Actionable: Clear what to do
- Self-contained: Doesn't require reading observation
- Searchable: Contains key terms (hook, timeout, npm)
- Categorized: Icon indicates type
### 3. Group by Context
Observations are grouped by:
- **Date**: Temporal context
- **File path**: Spatial context (work on specific files)
- **Project**: Logical context
```markdown
**src/hooks/context-hook.ts**
| ID | Time | T | Title | Tokens |
|----|------|---|-------|--------|
| #2591 | 1:15 AM | ⚖️ | Stderr messaging abandoned | ~155 |
| #2594 | 1:17 AM | 🟠 | Removed stderr section from docs | ~93 |
```
**Benefit:** If agent is working on `src/hooks/context-hook.ts`, related observations are already grouped together.
### 4. Provide Retrieval Tools
The index is useless without retrieval mechanisms:
```markdown
*Use claude-mem MCP search to access records with the given ID*
```
**Available tools:**
- `search_observations` - Full-text search
- `find_by_concept` - Concept-based retrieval
- `find_by_file` - File-based retrieval
- `find_by_type` - Type-based retrieval
- `get_recent_context` - Recent session summaries
Each tool supports `format: "index"` (default) and `format: "full"`.
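Illustrative call shapes for two of these tools; parameter names beyond `format` and `limit` are assumptions, declared here as stubs rather than the real MCP bindings:
```typescript
// Stub signatures for illustration only
declare function find_by_file(args: { file: string; format: 'index' | 'full'; limit?: number }): Promise<string>;
declare function find_by_type(args: { type: string; format: 'index' | 'full'; limit?: number }): Promise<string>;

await find_by_file({ file: 'src/hooks/context-hook.ts', format: 'index' });
await find_by_type({ type: 'gotcha', format: 'full', limit: 2 });
```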
---
## Real-World Example
### Scenario: Agent asked to fix a bug in hooks
**Without progressive disclosure:**
```
SessionStart injects 25,000 tokens of past context
Agent reads everything
Agent finds 1 relevant observation (buried in middle)
Total tokens consumed: 25,000
Relevant tokens: ~200
Efficiency: 0.8%
```
**With progressive disclosure:**
```
SessionStart shows index: ~800 tokens
Agent sees title: "🔴 Hook timeout issue: 60s too short"
Agent thinks: "This looks relevant to my bug!"
Agent fetches observation #2543: ~155 tokens
Total tokens consumed: 955
Relevant tokens: 955
Efficiency: 100%
```
### The Index Entry
```markdown
| #2543 | 2:14 PM | 🔴 | Hook timeout: 60s too short for npm install | ~155 |
```
**What the agent learns WITHOUT fetching:**
- There's a known gotcha (🔴) about hook timeouts
- It's related to npm install taking too long
- Full details are ~155 tokens (cheap)
- Happened at 2:14 PM (recent)
**Decision tree:**
```
Is my task related to hooks? → YES
Is my task related to timeouts? → YES
Is my task related to npm? → YES
155 tokens is cheap → FETCH IT
```
---
## The Two-Tier Search Strategy
Claude-Mem implements progressive disclosure in search results too:
### Tier 1: Index Format (Default)
```typescript
search_observations({
query: "hook timeout",
format: "index" // Default
})
```
**Returns:**
```
Found 3 observations matching "hook timeout":
| ID | Date | Type | Title | Tokens |
|----|------|------|-------|--------|
| #2543 | Oct 26 | gotcha | Hook timeout: 60s too short | ~155 |
| #2891 | Oct 25 | how-it-works | Hook timeout configuration | ~203 |
| #2102 | Oct 20 | problem-solution | Fixed timeout in CI | ~89 |
```
**Cost:** ~100 tokens for 3 results
**Value:** Agent can scan and decide which to fetch
### Tier 2: Full Format (On-Demand)
```typescript
search_observations({
query: "hook timeout",
format: "full",
limit: 1 // Fetch just the most relevant
})
```
**Returns:**
```
#2543 🔴 Hook timeout: 60s too short for npm install
─────────────────────────────────────────────────
Date: Oct 26, 2025 2:14 PM
Type: gotcha
Project: claude-mem
Narrative:
Discovered that the default 60-second hook timeout is insufficient
for npm install operations, especially with large dependency trees
or slow network conditions. This causes SessionStart hook to fail
silently, preventing context injection.
Facts:
- Default timeout: 60 seconds
- npm install with cold cache: ~90 seconds
- Configured timeout: 120 seconds in plugin/hooks/hooks.json:25
Files Modified:
- plugin/hooks/hooks.json
Concepts: hooks, timeout, npm, configuration
```
**Cost:** ~155 tokens for full details
**Value:** Complete understanding of the issue
---
## Cognitive Load Theory
Progressive disclosure is grounded in **Cognitive Load Theory**:
### Intrinsic Load
The inherent difficulty of the task itself.
**Example:** "Fix authentication bug"
- Must understand auth system
- Must understand the bug
- Must write the fix
This load is unavoidable.
### Extraneous Load
The cognitive burden of poorly presented information.
**Traditional RAG adds extraneous load:**
- Scanning irrelevant observations
- Filtering out noise
- Remembering what to ignore
- Re-contextualizing after each section
**Progressive disclosure minimizes extraneous load:**
- Scan titles (low effort)
- Fetch only relevant (targeted effort)
- Full attention on current task
### Germane Load
The effort of building mental models and schemas.
**Progressive disclosure supports germane load:**
- Consistent structure (legend, grouping)
- Clear categorization (types, icons)
- Semantic compression (good titles)
- Explicit costs (token counts)
---
## Anti-Patterns to Avoid
### ❌ Verbose Titles
**Bad:**
```
| #2543 | 2:14 PM | 🔴 | Investigation into the issue where hooks time out | ~155 |
```
**Good:**
```
| #2543 | 2:14 PM | 🔴 | Hook timeout: 60s too short for npm install | ~155 |
```
### ❌ Hiding Costs
**Bad:**
```
| #2543 | 2:14 PM | 🔴 | Hook timeout issue |
```
**Good:**
```
| #2543 | 2:14 PM | 🔴 | Hook timeout issue | ~155 |
```
### ❌ No Retrieval Path
**Bad:**
```
Here are 10 observations. [No instructions on how to get full details]
```
**Good:**
```
Here are 10 observations.
*Use MCP search tools to fetch full observation details on-demand*
```
### ❌ Defaulting to Full Format
**Bad:**
```typescript
search_observations({
query: "hooks",
format: "full" // Fetches everything
})
```
**Good:**
```typescript
search_observations({
query: "hooks",
format: "index", // Scan first
limit: 20
})
// Then, if needed:
search_observations({
query: "hooks",
format: "full",
limit: 1 // Just the most relevant
})
```
---
## Key Design Decisions
### Why Token Counts?
**Decision:** Show approximate token counts (~155, ~203) rather than exact counts.
**Rationale:**
- Communicates scale (50 vs 500) without false precision
- Maps to human intuition (small/medium/large)
- Allows agent to budget attention
- Encourages cost-conscious retrieval
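A sketch of how such an estimate could be produced; the four-characters-per-token rule of thumb is a common approximation for English text, not claude-mem's confirmed method:
```typescript
// The "~" prefix in the index signals an estimate, not an exact count
const approxTokens = (text: string): number => Math.ceil(text.length / 4);

const row = (id: number, title: string, body: string) =>
  `| #${id} | ${title} | ~${approxTokens(body)} |`;
```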
### Why Icons Instead of Text Labels?
**Decision:** Use emoji icons (🔴, 🟡, 🔵) rather than text (GOTCHA, PROBLEM, HOWTO).
**Rationale:**
- Visual scanning (pattern recognition)
- Token efficient (1 char vs 10 chars)
- Language-agnostic
- Aesthetically distinct
- Works for both humans and AI
### Why Index-First, Not Smart Pre-Fetch?
**Decision:** Always show index first, even if we "know" what's relevant.
**Rationale:**
- We can't know what's relevant better than the agent
- Pre-fetching assumes we understand the task
- Agent knows current context, we don't
- Respects agent autonomy
- Fails gracefully (can always fetch more)
### Why Group by File Path?
**Decision:** Group observations by file path in addition to date.
**Rationale:**
- Spatial locality: Work on file X likely needs context about file X
- Reduces scanning effort
- Matches how developers think
- Clear semantic boundaries
---
## Measuring Success
Progressive disclosure is working when:
### ✅ Low Waste Ratio
```
Relevant Tokens / Total Context Tokens > 80%
```
Most of the context consumed is actually useful.
### ✅ Selective Fetching
```
Index Shown: 50 observations
Details Fetched: 2-3 observations
```
Agent is being selective, not fetching everything.
### ✅ Fast Task Completion
```
Session with index: 30 seconds to find relevant context
Session without: 90 seconds scanning all context
```
Time-to-relevant-information is faster.
### ✅ Appropriate Depth
```
Simple task: Only index needed
Medium task: 1-2 observations fetched
Complex task: 5-10 observations + code reads
```
Depth scales with task complexity.
---
## Future Enhancements
### Adaptive Index Size
```typescript
// Vary index size based on session type
SessionStart({ source: "startup" }):
→ Show last 10 sessions (small index)
SessionStart({ source: "resume" }):
→ Show only current session (micro index)
SessionStart({ source: "compact" }):
→ Show last 20 sessions (larger index)
```
### Relevance Scoring
```typescript
// Use embeddings to pre-sort index by relevance
search_observations({
query: "authentication bug",
format: "index",
sort: "relevance" // Based on semantic similarity
})
```
### Cost Forecasting
```markdown
💡 **Budget Estimate:**
- Fetching all 🔴 gotchas: ~450 tokens
- Fetching all file-related: ~1,200 tokens
- Fetching everything: ~8,500 tokens
```
### Progressive Detail Levels
```
Layer 1: Index (titles only)
Layer 2: Summaries (2-3 sentences)
Layer 3: Full details (complete observation)
Layer 4: Source files (referenced code)
```
---
## Key Takeaways
1. **Show, don't tell**: Index reveals what exists without forcing consumption
2. **Cost-conscious**: Make retrieval costs visible for informed decisions
3. **Agent autonomy**: Let the agent decide what's relevant
4. **Semantic compression**: Good titles make or break the system
5. **Consistent structure**: Patterns reduce cognitive load
6. **Two-tier everything**: Index first, details on-demand
7. **Context as currency**: Spend wisely on high-value information
---
## Remember
> "The best interface is one that disappears when not needed, and appears exactly when it is."
Progressive disclosure respects the agent's intelligence and autonomy. We provide the map; the agent chooses the path.
---
## Further Reading
- [Context Engineering for AI Agents](/docs/context-engineering) - Foundational principles
- [Claude-Mem Architecture](/docs/architecture) - How it all fits together
- Cognitive Load Theory (Sweller, 1988)
- Information Foraging Theory (Pirolli & Card, 1999)
- Progressive Disclosure (Nielsen Norman Group)
---
*This philosophy emerged from real-world usage of Claude-Mem across hundreds of coding sessions. The pattern works because it aligns with both human cognition and LLM attention mechanics.*
+5
@@ -8,6 +8,11 @@
"type": "command",
"command": "cd \"${CLAUDE_PLUGIN_ROOT}/..\" && npm install --prefer-offline --no-audit --no-fund --loglevel=silent && node ${CLAUDE_PLUGIN_ROOT}/scripts/context-hook.js",
"timeout": 300
},
{
"type": "command",
"command": "node ${CLAUDE_PLUGIN_ROOT}/scripts/stderr-test-hook.js",
"timeout": 10
}
]
}
+3
@@ -0,0 +1,3 @@
#!/usr/bin/env node
console.error("\u{1F9EA} TEST: This is a stderr message from the claude-mem hook");process.exit(0);
+2 -1
@@ -17,7 +17,8 @@ const HOOKS = [
{ name: 'new-hook', source: 'src/hooks/new-hook.ts' },
{ name: 'save-hook', source: 'src/hooks/save-hook.ts' },
{ name: 'summary-hook', source: 'src/hooks/summary-hook.ts' },
{ name: 'cleanup-hook', source: 'src/hooks/cleanup-hook.ts' }
{ name: 'cleanup-hook', source: 'src/hooks/cleanup-hook.ts' },
{ name: 'stderr-test-hook', source: 'src/hooks/stderr-test-hook.ts' }
];
const WORKER_SERVICE = {
+12
@@ -0,0 +1,12 @@
#!/usr/bin/env node
/**
* Test hook to verify if stderr messages appear in Claude Code UI
* This hook simply outputs a message via console.error()
*/
// Output a test message to stderr
console.error('🧪 TEST: This is a stderr message from the claude-mem hook');
// Exit successfully
process.exit(0);