diff --git a/README.md b/README.md index f0ac8706..bf0b182f 100644 --- a/README.md +++ b/README.md @@ -79,9 +79,15 @@ npx mintlify dev - **[Usage Guide](docs/usage/getting-started.mdx)** - How Claude-Mem works automatically - **[MCP Search Tools](docs/usage/search-tools.mdx)** - Query your project history +### Best Practices +- **[Context Engineering](docs/context-engineering.mdx)** - AI agent context optimization principles +- **[Progressive Disclosure](docs/progressive-disclosure.mdx)** - Philosophy behind Claude-Mem's context priming strategy + ### Architecture - **[Overview](docs/architecture/overview.mdx)** - System components & data flow -- **[Hooks](docs/architecture/hooks.mdx)** - 5 lifecycle hooks explained +- **[Architecture Evolution](docs/architecture-evolution.mdx)** - The journey from v3 to v4 +- **[Hooks Architecture](docs/hooks-architecture.mdx)** - How Claude-Mem uses lifecycle hooks +- **[Hooks Reference](docs/architecture/hooks.mdx)** - 5 lifecycle hooks explained - **[Worker Service](docs/architecture/worker-service.mdx)** - HTTP API & PM2 management - **[Database](docs/architecture/database.mdx)** - SQLite schema & FTS5 search - **[MCP Search](docs/architecture/mcp-search.mdx)** - 7 search tools & examples diff --git a/docs/architecture-evolution.mdx b/docs/architecture-evolution.mdx new file mode 100644 index 00000000..bfe06eeb --- /dev/null +++ b/docs/architecture-evolution.mdx @@ -0,0 +1,801 @@ +# Architecture Evolution: The Journey from v3 to v4 + +## The Problem We Solved + +**Goal:** Create a memory system that makes Claude smarter across sessions without the user noticing it exists. + +**Challenge:** How do you observe AI agent behavior, compress it intelligently, and serve it back at the right time - all without slowing down or interfering with the main workflow? + +This is the story of how claude-mem evolved from a simple idea to a production-ready system, and the key architectural decisions that made it work. 
+ +--- + +## v1-v2: The Naive Approach + +### The First Attempt: Dump Everything + +**Architecture:** +``` +PostToolUse Hook → Save raw tool outputs → Retrieve everything on startup +``` + +**What we learned:** +- ❌ Context pollution (thousands of tokens of irrelevant data) +- ❌ No compression (raw tool outputs are verbose) +- ❌ No search (had to scan everything linearly) +- ✅ Proved the concept: Memory across sessions is valuable + +**Example of what went wrong:** +``` +SessionStart loaded: +- 150 file read operations +- 80 grep searches +- 45 bash commands +- Total: ~35,000 tokens +- Relevant to current task: ~500 tokens (1.4%) +``` + +--- + +## v3: Smart Compression, Wrong Architecture + +### The Breakthrough: AI-Powered Compression + +**New idea:** Use Claude itself to compress observations + +**Architecture:** +``` +PostToolUse Hook → Queue observation → SDK Worker → AI compression → Store insights +``` + +**What we added:** +1. **Claude Agent SDK integration** - Use AI to compress observations +2. **Background worker** - Don't block main session +3. **Structured observations** - Extract facts, decisions, insights +4. **Session summaries** - Generate comprehensive summaries + +**What worked:** +- ✅ Compression ratio: 10:1 to 100:1 +- ✅ Semantic understanding (not just keyword matching) +- ✅ Background processing (hooks stayed fast) +- ✅ Search became useful + +**What didn't work:** +- ❌ Still loaded everything upfront +- ❌ Session ID management was broken +- ❌ Aggressive cleanup interrupted summaries +- ❌ Multiple SDK sessions per Claude Code session + +--- + +## The Key Realizations + +### Realization 1: Progressive Disclosure + +**Problem:** Even compressed observations can pollute context if you load them all. + +**Insight:** Humans don't read everything before starting work. Why should AI? + +**Solution:** Show an index first, fetch details on-demand. 
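The index-first idea can be sketched roughly as follows. This is a minimal illustration, not claude-mem's actual implementation: the `Observation` and `IndexEntry` shapes, the `observationStore` data, and the 4-characters-per-token estimate are all assumptions made for the example.

```typescript
// Hypothetical shapes for illustration; not claude-mem's real schema.
interface Observation {
  id: number;
  title: string;
  narrative: string;
}

interface IndexEntry {
  id: number;
  title: string;
  approxTokens: number;
}

// Rough heuristic (assumption): about 4 characters per token.
function estimateTokens(text: string): number {
  return Math.ceil(text.length / 4);
}

// Layer 1: a compact index carrying only titles and the cost of fetching more.
function buildIndex(observations: Observation[]): IndexEntry[] {
  return observations.map((obs) => ({
    id: obs.id,
    title: obs.title,
    approxTokens: estimateTokens(obs.narrative),
  }));
}

// Layer 2: full details, returned only for the IDs the agent asks for.
function fetchDetails(observations: Observation[], ids: number[]): Observation[] {
  const wanted = new Set(ids);
  return observations.filter((obs) => wanted.has(obs.id));
}

// Made-up sample data standing in for stored observations.
const observationStore: Observation[] = [
  { id: 1, title: "Fixed FTS5 query escaping", narrative: "a".repeat(400) },
  { id: 2, title: "Switched to graceful cleanup", narrative: "b".repeat(800) },
  { id: 3, title: "Added prompt storage", narrative: "c".repeat(1200) },
];

const observationIndex = buildIndex(observationStore);
const fetchedDetails = fetchDetails(observationStore, [2]);

// Compare "index + one fetch" against "load everything up front".
const indexCost = observationIndex.reduce((sum, e) => sum + estimateTokens(e.title), 0);
const detailCost = fetchedDetails.reduce((sum, o) => sum + estimateTokens(o.narrative), 0);
const loadAllCost = observationStore.reduce((sum, o) => sum + estimateTokens(o.narrative), 0);

console.log({ indexCost, detailCost, loadAllCost });
```

The point of the design is that the index exposes the token cost of each entry, so the agent can decide what is worth fetching before paying for it.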
+ +``` +❌ Old: Load 50 observations (8,500 tokens) +✅ New: Show index of 50 observations (800 tokens) + Agent fetches 2-3 relevant ones (300 tokens) + Total: 1,100 tokens vs 8,500 tokens +``` + +**Impact:** +- 87% reduction in context usage +- 100% relevance (only fetch what's needed) +- Agent autonomy (decides what's relevant) + +### Realization 2: Session ID Chaos + +**Problem:** SDK session IDs change on every turn. + +**What we thought:** +```typescript +// ❌ Wrong assumption +UserPromptSubmit → Capture session ID once → Use forever +``` + +**Reality:** +```typescript +// ✅ Actual behavior +Turn 1: session_abc123 +Turn 2: session_def456 +Turn 3: session_ghi789 +``` + +**Why this matters:** +- Can't resume sessions without tracking ID updates +- Session state gets lost between turns +- Observations get orphaned + +**Solution:** +```typescript +// Capture from system init message +for await (const msg of response) { + if (msg.type === 'system' && msg.subtype === 'init') { + sdkSessionId = msg.session_id; + await updateSessionId(sessionId, sdkSessionId); + } +} +``` + +### Realization 3: Graceful vs Aggressive Cleanup + +**v3 approach:** +```typescript +// ❌ Aggressive: Kill worker immediately +SessionEnd → DELETE /worker/session → Worker stops +``` + +**Problems:** +- Summary generation interrupted mid-process +- Pending observations lost +- Race conditions everywhere + +**v4 approach:** +```typescript +// ✅ Graceful: Let worker finish +SessionEnd → Mark session complete → Worker finishes → Exit naturally +``` + +**Benefits:** +- Summaries complete successfully +- No lost observations +- Clean state transitions + +**Code:** +```typescript +// v3: Aggressive +async function sessionEnd(sessionId: string) { + await fetch(`http://localhost:37777/sessions/${sessionId}`, { + method: 'DELETE' + }); +} + +// v4: Graceful +async function sessionEnd(sessionId: string) { + await db.run( + 'UPDATE sdk_sessions SET completed_at = ? 
WHERE id = ?', + [Date.now(), sessionId] + ); +} +``` + +### Realization 4: One Session, Not Many + +**Problem:** We were creating multiple SDK sessions per Claude Code session. + +**What we thought:** +``` +Claude Code session → Create SDK session per observation → 100+ SDK sessions +``` + +**Reality should be:** +``` +Claude Code session → ONE long-running SDK session → Streaming input +``` + +**Why this matters:** +- SDK maintains conversation state +- Context accumulates naturally +- Much more efficient + +**Implementation:** +```typescript +// ✅ Streaming Input Mode +async function* messageGenerator(): AsyncIterable { + // Initial prompt + yield { + role: "user", + content: "You are a memory assistant..." + }; + + // Then continuously yield observations + while (session.status === 'active') { + const observations = await pollQueue(); + for (const obs of observations) { + yield { + role: "user", + content: formatObservation(obs) + }; + } + await sleep(1000); + } +} + +const response = query({ + prompt: messageGenerator(), + options: { maxTurns: 1000 } +}); +``` + +--- + +## v4: The Architecture That Works + +### The Core Design + +``` +┌─────────────────────────────────────────────────────────┐ +│ CLAUDE CODE SESSION │ +│ User → Claude → Tools (Read, Edit, Write, Bash) │ +│ ↓ │ +│ PostToolUse Hook │ +│ (queues observation) │ +└─────────────────────────────────────────────────────────┘ + ↓ SQLite queue +┌─────────────────────────────────────────────────────────┐ +│ SDK WORKER PROCESS │ +│ ONE streaming session per Claude Code session │ +│ │ +│ AsyncIterable │ +│ → Yields observations from queue │ +│ → SDK compresses via AI │ +│ → Parses XML responses │ +│ → Stores in database │ +└─────────────────────────────────────────────────────────┘ + ↓ SQLite storage +┌─────────────────────────────────────────────────────────┐ +│ NEXT SESSION │ +│ SessionStart Hook │ +│ → Queries database │ +│ → Returns progressive disclosure index │ +│ → Agent fetches details via MCP │ 
+└─────────────────────────────────────────────────────────┘ +``` + +### The Five Hook Architecture + + + + **Purpose:** Inject context from previous sessions + + **Timing:** When Claude Code starts + + **What it does:** + - Queries last 10 session summaries + - Formats as progressive disclosure index + - Injects into context via stdout + + **Key change from v3:** + - ✅ Index format (not full details) + - ✅ Token counts visible + - ✅ MCP search instructions included + + + + **Purpose:** Initialize session tracking + + **Timing:** Before Claude processes prompt + + **What it does:** + - Creates session record + - Saves raw user prompt (v4.2.0+) + - Starts worker if needed + + **Key change from v3:** + - ✅ Stores raw prompts for search + - ✅ Auto-starts PM2 worker + + + + **Purpose:** Capture tool observations + + **Timing:** After every tool execution + + **What it does:** + - Enqueues observation in database + - Returns immediately + + **Key change from v3:** + - ✅ Just enqueues (doesn't process) + - ✅ Worker handles all AI calls + + + + **Purpose:** Generate session summaries + + **Timing:** Worker-triggered (mid-session) + + **What it does:** + - Gathers observations + - Sends to Claude for summarization + - Stores structured summary + + **Key change from v3:** + - ✅ Multiple summaries per session + - ✅ Summaries are checkpoints, not endings + + + + **Purpose:** Graceful cleanup + + **Timing:** When session ends + + **What it does:** + - Marks session complete + - Lets worker finish processing + + **Key change from v3:** + - ✅ Graceful (not aggressive) + - ✅ No DELETE requests + - ✅ Worker finishes naturally + + + +### Database Schema Evolution + +**v3 schema:** +```sql +-- Simple, flat structure +CREATE TABLE observations ( + id INTEGER PRIMARY KEY, + session_id TEXT, + text TEXT, + created_at INTEGER +); +``` + +**v4 schema:** +```sql +-- Rich, structured schema +CREATE TABLE observations ( + id INTEGER PRIMARY KEY AUTOINCREMENT, + session_id TEXT NOT NULL, + 
project TEXT NOT NULL, + + -- Progressive disclosure metadata + title TEXT NOT NULL, + subtitle TEXT, + type TEXT NOT NULL, -- decision, bugfix, feature, etc. + + -- Content + narrative TEXT NOT NULL, + facts TEXT, -- JSON array + + -- Searchability + concepts TEXT, -- JSON array of tags + files_read TEXT, -- JSON array + files_modified TEXT, -- JSON array + + -- Timestamps + created_at TEXT NOT NULL, + created_at_epoch INTEGER NOT NULL, + + FOREIGN KEY(session_id) REFERENCES sdk_sessions(id) +); + +-- FTS5 for full-text search +CREATE VIRTUAL TABLE observations_fts USING fts5( + title, subtitle, narrative, facts, concepts, + content=observations +); + +-- Auto-sync triggers +CREATE TRIGGER observations_ai AFTER INSERT ON observations BEGIN + INSERT INTO observations_fts(rowid, title, subtitle, narrative, facts, concepts) + VALUES (new.id, new.title, new.subtitle, new.narrative, new.facts, new.concepts); +END; +``` + +**What changed:** +- ✅ Structured fields (title, subtitle, type) +- ✅ FTS5 full-text search +- ✅ Project-scoped queries +- ✅ Rich metadata for progressive disclosure + +### Worker Service Redesign + +**v3 worker:** +```typescript +// Multiple short SDK sessions +app.post('/process', async (req, res) => { + const response = await query({ + prompt: buildPrompt(req.body), + options: { maxTurns: 1 } + }); + + for await (const msg of response) { + // Process single observation + } + + res.json({ success: true }); +}); +``` + +**v4 worker:** +```typescript +// ONE long-running SDK session +async function runWorker(sessionId: string) { + const response = query({ + prompt: messageGenerator(), // AsyncIterable + options: { maxTurns: 1000 } + }); + + for await (const msg of response) { + if (msg.type === 'text') { + parseObservations(msg.content); + parseSummaries(msg.content); + } + } +} +``` + +**Benefits:** +- Maintains conversation state +- SDK handles context automatically +- More efficient (fewer API calls) +- Natural multi-turn flow + +--- + +## Critical 
Fixes Along the Way + +### Fix 1: Context Injection Pollution (v4.3.1) + +**Problem:** SessionStart hook output polluted with npm install logs + +```bash +# Hook output contained: +npm WARN deprecated ... +npm WARN deprecated ... +{"hookSpecificOutput": {"additionalContext": "..."}} +``` + +**Why it broke:** +- Claude Code expects clean JSON or plain text +- stderr/stdout from npm install mixed with hook output +- Context didn't inject properly + +**Solution:** +```json +{ + "command": "npm install --loglevel=silent && node context-hook.js" +} +``` + +**Result:** Clean JSON output, context injection works + +### Fix 2: Double Shebang Issue (v4.3.1) + +**Problem:** Hook executables had duplicate shebangs + +```javascript +#!/usr/bin/env node +#!/usr/bin/env node // ← Duplicate! + +// Rest of code... +``` + +**Why it happened:** +- Source files had shebang +- esbuild added another shebang during build + +**Solution:** +```typescript +// Remove shebangs from source files +// Let esbuild add them during build +``` + +**Result:** Clean executables, no parsing errors + +### Fix 3: FTS5 Injection Vulnerability (v4.2.3) + +**Problem:** User input passed directly to FTS5 query + +```typescript +// ❌ Vulnerable +const results = db.query( + `SELECT * FROM observations_fts WHERE observations_fts MATCH '${userQuery}'` +); +``` + +**Attack:** +```typescript +userQuery = "'; DROP TABLE observations; --" +``` + +**Solution:** +```typescript +// ✅ Safe: Use parameterized queries +const results = db.query( + 'SELECT * FROM observations_fts WHERE observations_fts MATCH ?', + [userQuery] +); +``` + +### Fix 4: NOT NULL Constraint Violation (v4.2.8) + +**Problem:** Session creation failed when prompt was empty + +```sql +INSERT INTO sdk_sessions (claude_session_id, user_prompt, ...) +VALUES ('abc123', NULL, ...) -- ❌ user_prompt is NOT NULL +``` + +**Solution:** +```typescript +// Allow NULL user_prompts +user_prompt: input.prompt ?? 
null +``` + +**Schema change:** +```sql +-- Before +user_prompt TEXT NOT NULL + +-- After +user_prompt TEXT -- Nullable +``` + +--- + +## Performance Improvements + +### Optimization 1: Prepared Statements + +**Before:** +```typescript +for (const obs of observations) { + db.run(`INSERT INTO observations (...) VALUES (?, ?, ...)`, [obs.id, obs.text, ...]); +} +``` + +**After:** +```typescript +const stmt = db.prepare(`INSERT INTO observations (...) VALUES (?, ?, ...)`); +for (const obs of observations) { + stmt.run([obs.id, obs.text, ...]); +} +stmt.finalize(); +``` + +**Impact:** 5x faster bulk inserts + +### Optimization 2: FTS5 Indexing + +**Before:** +```typescript +// Manual full-text search +const results = db.query( + `SELECT * FROM observations WHERE text LIKE '%${query}%'` +); +``` + +**After:** +```typescript +// FTS5 virtual table +const results = db.query( + `SELECT * FROM observations_fts WHERE observations_fts MATCH ?`, + [query] +); +``` + +**Impact:** 100x faster searches on large datasets + +### Optimization 3: Index Format Default + +**Before:** +```typescript +// Always return full observations +search_observations({ query: "hooks" }); +// Returns: 5,000 tokens +``` + +**After:** +```typescript +// Default to index format +search_observations({ query: "hooks", format: "index" }); +// Returns: 200 tokens + +// Fetch full only when needed +search_observations({ query: "hooks", format: "full", limit: 1 }); +// Returns: 150 tokens +``` + +**Impact:** 25x reduction in average search result size + +--- + +## What We Learned + +### Lesson 1: Context is Precious + +**Principle:** Every token you put in context window costs attention. + +**Application:** +- Progressive disclosure reduces waste by 87% +- Index-first approach gives agent control +- Token counts make costs visible + +### Lesson 2: Session State is Complicated + +**Principle:** Distributed state is hard. SDK handles it better than we can. 
+ +**Application:** +- Use SDK's built-in session resumption +- Don't try to manually reconstruct state +- Track session IDs from init messages + +### Lesson 3: Graceful Beats Aggressive + +**Principle:** Let processes finish their work before terminating. + +**Application:** +- Graceful cleanup prevents data loss +- Workers finish important operations +- Clean state transitions reduce bugs + +### Lesson 4: AI is the Compressor + +**Principle:** Don't compress manually. Let AI do semantic compression. + +**Application:** +- 10:1 to 100:1 compression ratios +- Semantic understanding, not keyword extraction +- Structured outputs (XML parsing) + +### Lesson 5: Progressive Everything + +**Principle:** Show metadata first, fetch details on-demand. + +**Application:** +- Progressive disclosure in context injection +- Index format in search results +- Layer 1 (titles) → Layer 2 (summaries) → Layer 3 (full details) + +--- + +## The Road Ahead + +### Planned: Adaptive Index Size + +```typescript +SessionStart({ source: "startup" }): + → Show last 10 sessions (normal) + +SessionStart({ source: "resume" }): + → Show only current session (minimal) + +SessionStart({ source: "compact" }): + → Show last 20 sessions (comprehensive) +``` + +### Planned: Relevance Scoring + +```typescript +// Use embeddings to pre-sort index by semantic relevance +search_observations({ + query: "authentication bug", + sort: "relevance" // Based on embeddings +}); +``` + +### Planned: Multi-Project Context + +```typescript +// Cross-project pattern recognition +search_observations({ + query: "API rate limiting", + projects: ["api-gateway", "user-service", "billing-service"] +}); +``` + +### Planned: Collaborative Memory + +```typescript +// Team-shared observations (optional) +createObservation({ + title: "Rate limit: 100 req/min", + scope: "team" // vs "user" +}); +``` + +--- + +## Migration Guide: v3 → v4 + +### Step 1: Backup Database + +```bash +cp ~/.claude-mem/claude-mem.db 
~/.claude-mem/claude-mem-v3-backup.db +``` + +### Step 2: Update Plugin + +```bash +cd ~/.claude/plugins/marketplaces/thedotmack +git pull +``` + +### Step 3: Run Migration + +```bash +npx tsx src/services/sqlite/migrations/v3-to-v4.ts +``` + +**What the migration does:** +- Adds new columns to observations table +- Creates FTS5 virtual tables +- Sets up auto-sync triggers +- Migrates existing observations to new schema + +### Step 4: Restart Worker + +```bash +pm2 restart claude-mem-worker +pm2 logs claude-mem-worker +``` + +### Step 5: Test + +```bash +# Start Claude Code +claude + +# Check that context is injected +# (Should see progressive disclosure index) + +# Submit a prompt and check observations +pm2 logs claude-mem-worker --nostream +``` + +--- + +## Key Metrics + +### v3 Performance + +| Metric | Value | +|--------|-------| +| Context usage per session | ~25,000 tokens | +| Relevant context | ~2,000 tokens (8%) | +| Hook execution time | ~200ms | +| Search latency | ~500ms (LIKE queries) | + +### v4 Performance + +| Metric | Value | +|--------|-------| +| Context usage per session | ~1,100 tokens | +| Relevant context | ~1,100 tokens (100%) | +| Hook execution time | ~45ms | +| Search latency | ~15ms (FTS5) | + +**Improvements:** +- ~96% reduction in context usage (~25,000 → ~1,100 tokens) +- ~12x increase in relevance (8% → 100% of loaded context) +- ~4x faster hooks (~200ms → ~45ms) +- ~33x faster search (~500ms → ~15ms) + +--- + +## Conclusion + +The journey from v3 to v4 was about understanding these fundamental truths: + +1. **Context is finite** - Progressive disclosure respects attention budget +2. **AI is the compressor** - Semantic understanding beats keyword extraction +3. **Agents are smart** - Let them decide what to fetch +4. **State is hard** - Use SDK's built-in mechanisms +5. **Graceful wins** - Let processes finish cleanly + +The result is a memory system that's both powerful and invisible. Users never notice it working - Claude just gets smarter over time.
+ +--- + +## Further Reading + +- [Progressive Disclosure](/docs/progressive-disclosure) - The philosophy behind v4 +- [Hooks Architecture](/docs/hooks-architecture) - How hooks power the system +- [Context Engineering](/docs/context-engineering) - Foundational principles +- [v4.0.0 Release Notes](/CHANGELOG.md#v400) - Full changelog + +--- + +*This architecture evolution reflects hundreds of hours of experimentation, dozens of dead ends, and the invaluable experience of real-world usage. v4 is the architecture that emerged from understanding what actually works.* diff --git a/docs/context-engineering.mdx b/docs/context-engineering.mdx new file mode 100644 index 00000000..adb1dce3 --- /dev/null +++ b/docs/context-engineering.mdx @@ -0,0 +1,222 @@ +# Context Engineering for AI Agents: Best Practices Cheat Sheet + +## Core Principle +**Find the smallest possible set of high-signal tokens that maximize the likelihood of your desired outcome.** + +--- + +## Context Engineering vs Prompt Engineering + +**Prompt Engineering**: Writing and organizing LLM instructions for optimal outcomes (one-time task) + +**Context Engineering**: Curating and maintaining the optimal set of tokens during inference across multiple turns (iterative process) + +Context engineering manages: +- System instructions +- Tools +- Model Context Protocol (MCP) +- External data +- Message history +- Runtime data retrieval + +--- + +## The Problem: Context Rot + +**Key Insight**: LLMs have an "attention budget" that gets depleted as context grows + +- Every token attends to every other token (n² relationships) +- As context length increases, model accuracy decreases +- Models have less training experience with longer sequences +- Context must be treated as a finite resource with diminishing marginal returns + +--- + +## System Prompts: Find the "Right Altitude" + +### The Goldilocks Zone + +**Too Prescriptive** ❌ +- Hardcoded if-else logic +- Brittle and fragile +- High maintenance complexity + +**Too 
Vague** ❌ +- High-level guidance without concrete signals +- Falsely assumes shared context +- Lacks actionable direction + +**Just Right** ✅ +- Specific enough to guide behavior effectively +- Flexible enough to provide strong heuristics +- Minimal set of information that fully outlines expected behavior + +### Best Practices +- Use simple, direct language +- Organize into distinct sections (`<background_information>`, `<instructions>`, `## Tool guidance`, etc.) +- Use XML tags or Markdown headers for structure +- Start with a minimal prompt, add based on failure modes +- Note: Minimal ≠ short (provide sufficient information upfront) + +--- + +## Tools: Minimal and Clear + +### Design Principles +- **Self-contained**: Each tool has a single, clear purpose +- **Robust to error**: Handle edge cases gracefully +- **Extremely clear**: Intended use is unambiguous +- **Token-efficient**: Returns relevant information without bloat +- **Descriptive parameters**: Unambiguous input names (e.g., `user_id` not `user`) + +### Critical Rule +**If a human engineer can't definitively say which tool to use in a given situation, an AI agent can't be expected to do better.** + +### Common Failure Modes to Avoid +- Bloated tool sets covering too much functionality +- Tools with overlapping purposes +- Ambiguous decision points about which tool to use + +--- + +## Examples: Diverse, Not Exhaustive + +**Do** ✅ +- Curate a set of diverse, canonical examples +- Show expected behavior effectively +- Think "pictures worth a thousand words" + +**Don't** ❌ +- Stuff in a laundry list of edge cases +- Try to articulate every possible rule +- Overwhelm with exhaustive scenarios + +--- + +## Context Retrieval Strategies + +### Just-In-Time Context (Recommended for Agents) +**Approach**: Maintain lightweight identifiers (file paths, queries, links) and dynamically load data at runtime + +**Benefits**: +- Avoids context pollution +- Enables progressive disclosure +- Mirrors human cognition (we don't memorize everything) +- Leverages
metadata (file names, folder structure, timestamps) +- Agents discover context incrementally + +**Trade-offs**: +- Slower than pre-computed retrieval +- Requires proper tool guidance to avoid dead-ends + +### Pre-Inference Retrieval (Traditional RAG) +**Approach**: Use embedding-based retrieval to surface context before inference + +**When to Use**: Static content that won't change during interaction + +### Hybrid Strategy (Best of Both) +**Approach**: Retrieve some data upfront, enable autonomous exploration as needed + +**Example**: Claude Code loads CLAUDE.md files upfront, uses glob/grep for just-in-time retrieval + +**Rule of Thumb**: "Do the simplest thing that works" + +--- + +## Long-Horizon Tasks: Three Techniques + +### 1. Compaction +**What**: Summarize conversation nearing context limit, reinitiate with summary + +**Implementation**: +- Pass message history to model for compression +- Preserve critical details (architectural decisions, bugs, implementation) +- Discard redundant outputs +- Continue with compressed context + recently accessed files + +**Tuning Process**: +1. **First**: Maximize recall (capture all relevant information) +2. **Then**: Improve precision (eliminate superfluous content) + +**Low-Hanging Fruit**: Clear old tool calls and results + +**Best For**: Tasks requiring extensive back-and-forth + +### 2. Structured Note-Taking (Agentic Memory) +**What**: Agent writes notes persisted outside context window, retrieved later + +**Examples**: +- To-do lists +- NOTES.md files +- Game state tracking (Pokémon example: tracking 1,234 steps of training) +- Project progress logs + +**Benefits**: +- Persistent memory with minimal overhead +- Maintains critical context across tool calls +- Enables multi-hour coherent strategies + +**Best For**: Iterative development with clear milestones + +### 3. 
Sub-Agent Architectures +**What**: Specialized sub-agents handle focused tasks with clean context windows + +**How It Works**: +- Main agent coordinates high-level plan +- Sub-agents perform deep technical work +- Sub-agents explore extensively (tens of thousands of tokens) +- Return condensed summaries (1,000-2,000 tokens) + +**Benefits**: +- Clear separation of concerns +- Parallel exploration +- Detailed context remains isolated + +**Best For**: Complex research and analysis tasks + +--- + +## Quick Decision Framework + +| Scenario | Recommended Approach | +|----------|---------------------| +| Static content | Pre-inference retrieval or hybrid | +| Dynamic exploration needed | Just-in-time context | +| Extended back-and-forth | Compaction | +| Iterative development | Structured note-taking | +| Complex research | Sub-agent architectures | +| Rapid model improvement | "Do the simplest thing that works" | + +--- + +## Key Takeaways + +1. **Context is finite**: Treat it as a precious resource with an attention budget +2. **Think holistically**: Consider the entire state available to the LLM +3. **Stay minimal**: More context isn't always better +4. **Be iterative**: Context curation happens each time you pass to the model +5. **Design for autonomy**: As models improve, let them act intelligently +6. **Start simple**: Test with minimal setup, add based on failure modes + +--- + +## Anti-Patterns to Avoid + +- ❌ Cramming everything into prompts +- ❌ Creating brittle if-else logic +- ❌ Building bloated tool sets +- ❌ Stuffing exhaustive edge cases as examples +- ❌ Assuming larger context windows solve everything +- ❌ Ignoring context pollution over long interactions + +--- + +## Remember + +> "Even as models continue to improve, the challenge of maintaining coherence across extended interactions will remain central to building more effective agents." 
+ +Context engineering will evolve, but the core principle stays the same: **optimize signal-to-noise ratio in your token budget**. + +--- + +*Based on Anthropic's "Effective context engineering for AI agents" (September 2025)* \ No newline at end of file diff --git a/docs/docs.json b/docs/docs.json index cf921adf..b6af3176 100644 --- a/docs/docs.json +++ b/docs/docs.json @@ -39,6 +39,14 @@ "usage/search-tools" ] }, + { + "group": "Best Practices", + "icon": "lightbulb", + "pages": [ + "context-engineering", + "progressive-disclosure" + ] + }, { "group": "Configuration & Development", "icon": "gear", @@ -53,6 +61,8 @@ "icon": "diagram-project", "pages": [ "architecture/overview", + "architecture-evolution", + "hooks-architecture", "architecture/hooks", "architecture/worker-service", "architecture/database", diff --git a/docs/hooks-architecture.mdx b/docs/hooks-architecture.mdx new file mode 100644 index 00000000..35cf5d43 --- /dev/null +++ b/docs/hooks-architecture.mdx @@ -0,0 +1,784 @@ +# How Claude-Mem Uses Hooks: A Lifecycle-Driven Architecture + +## Core Principle +**Observe the main Claude Code session from the outside, process observations in the background, inject context at the right time.** + +--- + +## The Big Picture + +Claude-Mem is fundamentally a **hook-driven system**. 
Every piece of functionality happens in response to lifecycle events: + +``` +┌─────────────────────────────────────────────────────────┐ +│ CLAUDE CODE SESSION │ +│ (Main session - user interacting with Claude) │ +│ │ +│ SessionStart → UserPromptSubmit → Tool Use → Stop │ +│ ↓ ↓ ↓ ↓ │ +│ [Hook] [Hook] [Hook] [Hook] │ +└─────────────────────────────────────────────────────────┘ + ↓ ↓ ↓ ↓ +┌─────────────────────────────────────────────────────────┐ +│ CLAUDE-MEM SYSTEM │ +│ │ +│ Context New Session Observation Summary │ +│ Injection Tracking Capture Generation │ +└─────────────────────────────────────────────────────────┘ +``` + +**Key insight:** Claude-Mem doesn't interrupt or modify Claude Code's behavior. It observes from the outside and provides value through lifecycle hooks. + +--- + +## Why Hooks? + +### The Non-Invasive Requirement + +Claude-Mem had several architectural constraints: + +1. **Can't modify Claude Code**: It's a closed-source binary +2. **Must be fast**: Can't slow down the main session +3. **Must be reliable**: Can't break Claude Code if it fails +4. **Must be portable**: Works on any project without configuration + +**Solution:** External command hooks configured via settings.json + +### The Hook System Advantage + +Claude Code's hook system provides exactly what we need: + + + + SessionStart, UserPromptSubmit, PostToolUse, Stop + + + + Hooks run in parallel, don't wait for completion + + + + SessionStart and UserPromptSubmit can add context + + + + PostToolUse sees all tool inputs and outputs + + + +--- + +## The Five Hooks + +### Hook 1: SessionStart (Context Hook) + +**Purpose:** Inject relevant context from previous sessions + +**When:** Claude Code starts or resumes + +**What it does:** +1. Extracts project name from current working directory +2. Queries SQLite for recent session summaries (last 10) +3. Queries SQLite for recent observations (last 50) +4. Formats as progressive disclosure index +5. 
Outputs to stdout (automatically injected into context) + +**Configuration:** +```json +{ + "hooks": { + "SessionStart": [{ + "matcher": "startup", + "hooks": [{ + "type": "command", + "command": "${CLAUDE_PLUGIN_ROOT}/scripts/context-hook.js", + "timeout": 120 + }] + }] + } +} +``` + +**Key decisions:** +- ✅ Only runs on "startup" (not "clear" or "compact") +- ✅ 120-second timeout for npm install (v4.3.1 fix) +- ✅ Uses `--loglevel=silent` for clean JSON output +- ✅ Progressive disclosure format (index, not full details) + +**Output format:** +```markdown +# [claude-mem] recent context + +**Legend:** 🎯 session-request | 🔴 gotcha | 🟡 problem-solution ... + +### Oct 26, 2025 + +**General** +| ID | Time | T | Title | Tokens | +|----|------|---|-------|--------| +| #2586 | 12:58 AM | 🔵 | Context hook file empty | ~51 | + +*Use claude-mem MCP search to access full details* +``` + +**Source:** `src/hooks/context-hook.ts` → `plugin/scripts/context-hook.js` + +--- + +### Hook 2: UserPromptSubmit (New Session Hook) + +**Purpose:** Initialize session tracking when user submits a prompt + +**When:** Before Claude processes the user's message + +**What it does:** +1. Reads user prompt and session ID from stdin +2. Creates new session record in SQLite +3. Saves raw user prompt for full-text search (v4.2.0+) +4. Starts PM2 worker service if not running +5. Returns immediately (non-blocking) + +**Configuration:** +```json +{ + "hooks": { + "UserPromptSubmit": [{ + "hooks": [{ + "type": "command", + "command": "${CLAUDE_PLUGIN_ROOT}/scripts/new-hook.js" + }] + }] + } +} +``` + +**Key decisions:** +- ✅ No matcher (runs for all prompts) +- ✅ Creates session record immediately +- ✅ Stores raw prompts for search (privacy note: local SQLite only) +- ✅ Auto-starts worker service +- ✅ Suppresses output (`suppressOutput: true`) + +**Database operations:** +```sql +INSERT INTO sdk_sessions (claude_session_id, project, user_prompt, ...) +VALUES (?, ?, ?, ...) 
+ +INSERT INTO user_prompts (session_id, prompt, prompt_number, ...) +VALUES (?, ?, ?, ...) +``` + +**Source:** `src/hooks/new-hook.ts` → `plugin/scripts/new-hook.js` + +--- + +### Hook 3: PostToolUse (Save Observation Hook) + +**Purpose:** Capture tool execution observations for later processing + +**When:** Immediately after any tool completes successfully + +**What it does:** +1. Receives tool name, input, output from stdin +2. Finds active session for current project +3. Enqueues observation in observation_queue table +4. Returns immediately (processing happens in worker) + +**Configuration:** +```json +{ + "hooks": { + "PostToolUse": [{ + "matcher": "*", + "hooks": [{ + "type": "command", + "command": "${CLAUDE_PLUGIN_ROOT}/scripts/save-hook.js" + }] + }] + } +} +``` + +**Key decisions:** +- ✅ Matcher: `*` (captures all tools) +- ✅ Non-blocking (just enqueues, doesn't process) +- ✅ Worker processes observations asynchronously +- ✅ Parallel execution safe (each hook gets own stdin) + +**Database operations:** +```sql +INSERT INTO observation_queue (session_id, tool_name, tool_input, tool_output, ...) +VALUES (?, ?, ?, ?, ...) +``` + +**What gets queued:** +```json +{ + "session_id": "abc123", + "tool_name": "Edit", + "tool_input": { + "file_path": "/path/to/file.ts", + "old_string": "...", + "new_string": "..." + }, + "tool_output": { + "success": true, + "linesChanged": 5 + }, + "created_at_epoch": 1698765432 +} +``` + +**Source:** `src/hooks/save-hook.ts` → `plugin/scripts/save-hook.js` + +--- + +### Hook 4: Summary Hook (Mid-Session Checkpoint) + +**Purpose:** Generate AI-powered session summaries during the session + +**When:** Triggered programmatically by the worker service + +**What it does:** +1. Gathers session observations from database +2. Sends to Claude Agent SDK for summarization +3. Processes response and extracts structured summary +4. 
Stores in session_summaries table + +**Configuration:** +```json +{ + "hooks": { + "Summary": [{ + "hooks": [{ + "type": "command", + "command": "${CLAUDE_PLUGIN_ROOT}/scripts/summary-hook.js" + }] + }] + } +} +``` + +**Key decisions:** +- ✅ Triggered by worker, not by Claude Code lifecycle +- ✅ Multiple summaries per session (v4.2.0+) +- ✅ Summaries are checkpoints, not endings +- ✅ Uses Claude Agent SDK for AI compression + +**Summary structure:** +```xml + + User's original request + What was examined + Key discoveries + Work finished + Remaining tasks + + path/to/file1.ts + path/to/file2.ts + + + path/to/file3.ts + + Additional context + +``` + +**Source:** `src/hooks/summary-hook.ts` → `plugin/scripts/summary-hook.js` + +--- + +### Hook 5: SessionEnd (Cleanup Hook) + +**Purpose:** Mark sessions as completed when they end + +**When:** Claude Code session ends (not on `/clear`) + +**What it does:** +1. Marks session as completed in database +2. Allows worker to finish processing +3. Performs graceful cleanup + +**Configuration:** +```json +{ + "hooks": { + "SessionEnd": [{ + "hooks": [{ + "type": "command", + "command": "${CLAUDE_PLUGIN_ROOT}/scripts/cleanup-hook.js" + }] + }] + } +} +``` + +**Key decisions:** +- ✅ Graceful completion (v4.1.0+) +- ✅ No longer sends DELETE to workers +- ✅ Skips cleanup on `/clear` commands +- ✅ Preserves ongoing sessions + +**Why graceful cleanup?** + +**Old approach (v3):** +```typescript +// ❌ Aggressive cleanup +SessionEnd → DELETE /worker/session → Worker stops immediately +``` + +**Problems:** +- Interrupted summary generation +- Lost pending observations +- Race conditions + +**New approach (v4.1.0+):** +```typescript +// ✅ Graceful completion +SessionEnd → UPDATE sessions SET completed_at = NOW() +Worker sees completion → Finishes processing → Exits naturally +``` + +**Benefits:** +- Worker finishes important operations +- Summaries complete successfully +- Clean state transitions + +**Source:** `src/hooks/cleanup-hook.ts` 
→ `plugin/scripts/cleanup-hook.js` + +--- + +## Hook Execution Flow + +### Session Lifecycle + +```mermaid +sequenceDiagram + participant User + participant Claude + participant Hooks + participant Worker + participant DB + + User->>Claude: Start Claude Code + Claude->>Hooks: SessionStart hook + Hooks->>DB: Query recent context + DB-->>Hooks: Session summaries + observations + Hooks-->>Claude: Inject context + Note over Claude: Context available for session + + User->>Claude: Submit prompt + Claude->>Hooks: UserPromptSubmit hook + Hooks->>DB: Create session record + Hooks->>Worker: Start worker (if not running) + Worker-->>DB: Ready to process + + Claude->>Claude: Execute tools + Claude->>Hooks: PostToolUse (multiple times) + Hooks->>DB: Queue observations + Note over Worker: Polls queue, processes observations + + Worker->>Worker: AI compression + Worker->>DB: Store compressed observations + Worker->>Hooks: Trigger summary hook + Hooks->>DB: Store session summary + + User->>Claude: Finish + Claude->>Hooks: SessionEnd hook + Hooks->>DB: Mark session complete + Worker->>DB: Check completion + Worker->>Worker: Finish processing + Worker->>Worker: Exit gracefully +``` + +### Hook Timing + +| Event | Timing | Blocking | Timeout | Output Handling | +|-------|--------|----------|---------|-----------------| +| **SessionStart** | Before session | No | 120s | stdout → context | +| **UserPromptSubmit** | Before processing | No | 60s | stdout → context | +| **PostToolUse** | After tool | No | 60s | Transcript only | +| **Summary** | Worker triggered | No | 300s | Database | +| **SessionEnd** | On exit | No | 60s | Log only | + +--- + +## The Worker Service Architecture + +### Why a Background Worker? + +**Problem:** Hooks must be fast (< 1 second) + +**Reality:** AI compression takes 5-30 seconds per observation + +**Solution:** Hooks enqueue observations, worker processes async + +``` +┌─────────────────────────────────────────────────────────┐ +│ HOOK (Fast) │ +│ 1. 
Read stdin (< 1ms) │ +│ 2. Insert into queue (< 10ms) │ +│ 3. Return success (< 20ms total) │ +└─────────────────────────────────────────────────────────┘ + ↓ (queue) +┌─────────────────────────────────────────────────────────┐ +│ WORKER (Slow) │ +│ 1. Poll queue every 1s │ +│ 2. Process observation via Claude SDK (5-30s) │ +│ 3. Parse and store results │ +│ 4. Mark observation processed │ +└─────────────────────────────────────────────────────────┘ +``` + +### PM2 Process Management + +**Technology:** PM2 (process manager for Node.js) + +**Why PM2:** +- Auto-restart on failure +- Log management +- Process monitoring +- Cross-platform (works on macOS, Linux, Windows) +- No systemd/launchd needed + +**Configuration:** +```javascript +// ecosystem.config.cjs +module.exports = { + apps: [{ + name: 'claude-mem-worker', + script: './plugin/scripts/worker-service.cjs', + instances: 1, + autorestart: true, + watch: false, + max_memory_restart: '500M', + env: { + NODE_ENV: 'production', + CLAUDE_MEM_WORKER_PORT: 37777 + } + }] +}; +``` + +**Worker lifecycle:** +```bash +# Started by new-hook (if not running) +pm2 start ecosystem.config.cjs + +# Status check +pm2 status claude-mem-worker + +# View logs +pm2 logs claude-mem-worker + +# Restart +pm2 restart claude-mem-worker +``` + +### Worker HTTP API + +**Technology:** Express.js REST API on port 37777 + +**Endpoints:** + +| Endpoint | Method | Purpose | +|----------|--------|---------| +| `/health` | GET | Health check | +| `/sessions` | POST | Create session | +| `/sessions/:id` | GET | Get session status | +| `/sessions/:id` | PATCH | Update session | +| `/observations` | POST | Enqueue observation | +| `/observations/:id` | GET | Get observation | + +**Why HTTP API?** +- Language-agnostic (hooks can be any language) +- Easy debugging (curl commands) +- Standard error handling +- Proper async handling + +--- + +## Design Patterns + +### Pattern 1: Fire-and-Forget Hooks + +**Principle:** Hooks should return immediately, 
not wait for completion + +```typescript +// ❌ Bad: Hook waits for processing +export async function saveHook(stdin: HookInput) { + const observation = parseInput(stdin); + await processObservation(observation); // BLOCKS! + return success(); +} + +// ✅ Good: Hook enqueues and returns +export async function saveHook(stdin: HookInput) { + const observation = parseInput(stdin); + await enqueueObservation(observation); // Fast + return success(); // Immediate +} +``` + +### Pattern 2: Queue-Based Processing + +**Principle:** Decouple capture from processing + +``` +Hook (capture) → Queue (buffer) → Worker (process) +``` + +**Benefits:** +- Parallel hook execution safe +- Worker failure doesn't affect hooks +- Retry logic centralized +- Backpressure handling + +### Pattern 3: Graceful Degradation + +**Principle:** Memory system failure shouldn't break Claude Code + +```typescript +try { + await captureObservation(); +} catch (error) { + // Log error, but don't throw + console.error('Memory capture failed:', error); + return { continue: true, suppressOutput: true }; +} +``` + +**Failure modes:** +- Database locked → Skip observation, log error +- Worker crashed → Auto-restart via PM2 +- Network issue → Retry with exponential backoff +- Disk full → Warn user, disable memory + +### Pattern 4: Progressive Enhancement + +**Principle:** Core functionality works without memory, memory enhances it + +``` +Without memory: Claude Code works normally +With memory: Claude Code + context from past sessions +Memory broken: Falls back to working normally +``` + +--- + +## Hook Debugging + +### Debug Mode + +Enable detailed hook execution logs: + +```bash +claude --debug +``` + +**Output:** +``` +[DEBUG] Executing hooks for PostToolUse:Write +[DEBUG] Getting matching hook commands for PostToolUse with query: Write +[DEBUG] Found 1 hook matchers in settings +[DEBUG] Matched 1 hooks for query "Write" +[DEBUG] Found 1 hook commands to execute +[DEBUG] Executing hook command: 
${CLAUDE_PLUGIN_ROOT}/scripts/save-hook.js with timeout 60000ms +[DEBUG] Hook command completed with status 0: {"continue":true,"suppressOutput":true} +``` + +### Common Issues + + + + **Symptoms:** Hook command never runs + + **Debugging:** + 1. Check `/hooks` menu - is hook registered? + 2. Verify matcher pattern (case-sensitive!) + 3. Test command manually: `echo '{}' | node save-hook.js` + 4. Check file permissions (executable?) + + + + **Symptoms:** Hook execution exceeds timeout + + **Debugging:** + 1. Check timeout setting (default 60s) + 2. Identify slow operation (database? network?) + 3. Move slow operation to worker + 4. Increase timeout if necessary + + + + **Symptoms:** SessionStart hook runs but context missing + + **Debugging:** + 1. Check stdout (must be valid JSON or plain text) + 2. Verify no stderr output (pollutes JSON) + 3. Check exit code (must be 0) + 4. Look for npm install output (v4.3.1 fix) + + + + **Symptoms:** PostToolUse hook runs but observations missing + + **Debugging:** + 1. Check database: `sqlite3 ~/.claude-mem/claude-mem.db "SELECT * FROM observation_queue"` + 2. Verify session exists: `SELECT * FROM sdk_sessions` + 3. Check worker status: `pm2 status` + 4. 
View worker logs: `pm2 logs claude-mem-worker` + + + +### Testing Hooks Manually + +```bash +# Test context hook +echo '{ + "session_id": "test123", + "cwd": "/Users/alex/projects/my-app", + "hook_event_name": "SessionStart", + "source": "startup" +}' | node plugin/scripts/context-hook.js + +# Test save hook +echo '{ + "session_id": "test123", + "tool_name": "Edit", + "tool_input": {"file_path": "test.ts"}, + "tool_output": {"success": true} +}' | node plugin/scripts/save-hook.js + +# Test with actual Claude Code +claude --debug +/hooks # View registered hooks +# Submit prompt and watch debug output +``` + +--- + +## Performance Considerations + +### Hook Execution Time + +**Target:** < 100ms per hook + +**Actual measurements:** + +| Hook | Average | p95 | p99 | +|------|---------|-----|-----| +| SessionStart | 45ms | 120ms | 250ms | +| UserPromptSubmit | 12ms | 25ms | 50ms | +| PostToolUse | 8ms | 15ms | 30ms | +| SessionEnd | 5ms | 10ms | 20ms | + +**Why SessionStart is slower:** +- npm install check (idempotent but runs every time) +- Database query for 10 sessions + 50 observations +- Formatting progressive disclosure index + +**Optimization (v4.3.1):** +- Use `--loglevel=silent` for npm install +- Cache package.json hash to skip unnecessary installs +- Use prepared statements for database queries + +### Database Performance + +**Schema optimizations:** +- Indexes on `project`, `created_at_epoch`, `claude_session_id` +- FTS5 virtual tables for full-text search +- WAL mode for concurrent reads/writes + +**Query patterns:** +```sql +-- Fast: Uses index on (project, created_at_epoch) +SELECT * FROM session_summaries +WHERE project = ? +ORDER BY created_at_epoch DESC +LIMIT 10 + +-- Fast: Uses index on claude_session_id +SELECT * FROM sdk_sessions +WHERE claude_session_id = ? +LIMIT 1 + +-- Fast: FTS5 full-text search +SELECT * FROM observations_fts +WHERE observations_fts MATCH ? 
+ORDER BY rank +LIMIT 20 +``` + +### Worker Throughput + +**Bottleneck:** Claude API latency (5-30s per observation) + +**Mitigation:** +- Process observations sequentially (simpler, more predictable) +- Skip low-value observations (TodoWrite, ListMcpResourcesTool) +- Batch summaries (generate every N observations, not every observation) + +**Future optimization:** +- Parallel processing (multiple workers) +- Smart batching (combine related observations) +- Lazy summarization (summarize only when needed) + +--- + +## Security Considerations + +### Hook Command Safety + +**Risk:** Hooks execute arbitrary commands with user permissions + +**Mitigations:** +1. **Frozen at startup:** Hook configuration captured at start, changes require review +2. **User review required:** `/hooks` menu shows changes, requires approval +3. **Plugin isolation:** `${CLAUDE_PLUGIN_ROOT}` prevents path traversal +4. **Input validation:** Hooks validate stdin schema before processing + +### Data Privacy + +**What gets stored:** +- User prompts (raw text) - v4.2.0+ +- Tool inputs and outputs +- File paths read/modified +- Session summaries + +**Privacy guarantees:** +- All data stored locally in `~/.claude-mem/claude-mem.db` +- No cloud uploads (API calls only for AI compression) +- SQLite file permissions: user-only read/write +- No analytics or telemetry + +### API Key Protection + +**Configuration:** +- Anthropic API key in `~/.anthropic/api_key` or `ANTHROPIC_API_KEY` env var +- Worker inherits environment from Claude Code +- Never logged or stored in database + +--- + +## Key Takeaways + +1. **Hooks are interfaces**: They define clean boundaries between systems +2. **Non-blocking is critical**: Hooks must return fast, workers do the heavy lifting +3. **Graceful degradation**: Memory system can fail without breaking Claude Code +4. **Queue-based decoupling**: Capture and processing happen independently +5. **Progressive disclosure**: Context injection uses index-first approach +6. 
**Lifecycle alignment**: Each hook has a clear, single purpose + +--- + +## Further Reading + +- [Claude Code Hooks Reference](https://docs.claude.com/claude-code/hooks) - Official documentation +- [Progressive Disclosure](/docs/progressive-disclosure) - Context priming philosophy +- [Architecture Evolution](/docs/architecture-evolution) - v3 to v4 journey +- [Worker Service Design](/docs/worker-service) - Background processing details + +--- + +*The hook-driven architecture enables Claude-Mem to be both powerful and invisible. Users never notice the memory system working - it just makes Claude smarter over time.* diff --git a/docs/progressive-disclosure.mdx b/docs/progressive-disclosure.mdx new file mode 100644 index 00000000..e2ddb11d --- /dev/null +++ b/docs/progressive-disclosure.mdx @@ -0,0 +1,655 @@ +# Progressive Disclosure: Claude-Mem's Context Priming Philosophy + +## Core Principle +**Show what exists and its retrieval cost first. Let the agent decide what to fetch based on relevance and need.** + +--- + +## What is Progressive Disclosure? + +Progressive disclosure is an information architecture pattern where you reveal complexity gradually rather than all at once. In the context of AI agents, it means: + +1. **Layer 1 (Index)**: Show lightweight metadata (titles, dates, types, token counts) +2. **Layer 2 (Details)**: Fetch full content only when needed +3. **Layer 3 (Deep Dive)**: Read original source files if required + +This mirrors how humans work: We scan headlines before reading articles, review table of contents before diving into chapters, and check file names before opening files. 
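The first two layers described above can be sketched in a few lines of TypeScript. This is a minimal illustration rather than Claude-Mem's actual implementation: the `Observation` and `IndexEntry` shapes, the `buildIndex` and `fetchDetails` helpers, and the rough four-characters-per-token estimate are all assumptions made for the example.

```typescript
// Hypothetical shapes for illustration only; not claude-mem's real schema.
interface Observation {
  id: number;
  type: string;      // e.g. "gotcha", "how-it-works"
  title: string;
  narrative: string; // full content, returned only on demand
}

interface IndexEntry {
  id: number;
  type: string;
  title: string;
  approxTokens: number; // retrieval cost shown to the agent
}

// Assumption: a crude estimate of about 4 characters per token.
function estimateTokens(text: string): number {
  return Math.ceil(text.length / 4);
}

// Layer 1: lightweight metadata only, no full content.
function buildIndex(observations: Observation[]): IndexEntry[] {
  return observations.map((o) => ({
    id: o.id,
    type: o.type,
    title: o.title,
    approxTokens: estimateTokens(o.narrative),
  }));
}

// Layer 2: full content for one entry, fetched only when the agent asks.
function fetchDetails(observations: Observation[], id: number): Observation | undefined {
  return observations.find((o) => o.id === id);
}

const store: Observation[] = [
  { id: 1, type: "gotcha", title: "Hook timeout too short for npm install", narrative: "x".repeat(400) },
  { id: 2, type: "how-it-works", title: "Context hook output format", narrative: "y".repeat(80) },
];

const index = buildIndex(store);       // cheap: titles and costs only
const details = fetchDetails(store, 1); // paid for only when relevant

for (const entry of index) {
  console.log(`#${entry.id} [${entry.type}] ${entry.title} (~${entry.approxTokens} tokens)`);
}
```

The point of the split is that the index never carries the `narrative` field, so its size grows with the number of observations rather than with their length.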
+ +--- + +## The Problem: Context Pollution + +Traditional RAG (Retrieval-Augmented Generation) systems fetch everything upfront: + +``` +❌ Traditional Approach: +┌─────────────────────────────────────┐ +│ Session Start │ +│ │ +│ [15,000 tokens of past sessions] │ +│ [8,000 tokens of observations] │ +│ [12,000 tokens of file summaries] │ +│ │ +│ Total: 35,000 tokens │ +│ Relevant: ~2,000 tokens (6%) │ +└─────────────────────────────────────┘ +``` + +**Problems:** +- Wastes 94% of attention budget on irrelevant context +- User prompt gets buried under mountain of history +- Agent must process everything before understanding task +- No way to know what's actually useful until after reading + +--- + +## Claude-Mem's Solution: Progressive Disclosure + +``` +✅ Progressive Disclosure Approach: +┌─────────────────────────────────────┐ +│ Session Start │ +│ │ +│ Index of 50 observations: ~800 tokens│ +│ ↓ │ +│ Agent sees: "🔴 Hook timeout issue" │ +│ Agent decides: "Relevant!" │ +│ ↓ │ +│ Fetch observation #2543: ~120 tokens│ +│ │ +│ Total: 920 tokens │ +│ Relevant: 920 tokens (100%) │ +└─────────────────────────────────────┘ +``` + +**Benefits:** +- Agent controls its own context consumption +- Directly relevant to current task +- Can fetch more if needed +- Can skip everything if not relevant +- Clear cost/benefit for each retrieval decision + +--- + +## How It Works in Claude-Mem + +### The Index Format + +Every SessionStart hook provides a compact index: + +```markdown +### Oct 26, 2025 + +**General** +| ID | Time | T | Title | Tokens | +|----|------|---|-------|--------| +| #2586 | 12:58 AM | 🔵 | Context hook file exists but is empty | ~51 | +| #2587 | ″ | 🔵 | Context hook script file is empty | ~46 | +| #2589 | ″ | 🟡 | Investigated hook debug output docs | ~105 | + +**src/hooks/context-hook.ts** +| ID | Time | T | Title | Tokens | +|----|------|---|-------|--------| +| #2591 | 1:15 AM | ⚖️ | Stderr messaging abandoned | ~155 | +| #2592 | 1:16 AM | ⚖️ | Web UI strategy 
redesigned | ~193 | +``` + +**What the agent sees:** +- **What exists**: Observation titles give semantic meaning +- **When it happened**: Timestamps for temporal context +- **What type**: Icons indicate observation category +- **Retrieval cost**: Token counts for informed decisions +- **Where to get it**: MCP search tools referenced at bottom + +### The Legend System + +``` +🎯 session-request - User's original goal +🔴 gotcha - Critical edge case or pitfall +🟡 problem-solution - Bug fix or workaround +🔵 how-it-works - Technical explanation +🟢 what-changed - Code/architecture change +🟣 discovery - Learning or insight +🟠 why-it-exists - Design rationale +🟤 decision - Architecture decision +⚖️ trade-off - Deliberate compromise +``` + +**Purpose:** +- Visual scanning (humans and AI both benefit) +- Semantic categorization +- Priority signaling (🔴 gotchas are more critical) +- Pattern recognition across sessions + +### Progressive Disclosure Instructions + +The index includes usage guidance: + +```markdown +💡 **Progressive Disclosure:** This index shows WHAT exists and retrieval COST. 
+- Use MCP search tools to fetch full observation details on-demand +- Prefer searching observations over re-reading code for past decisions +- Critical types (🔴 gotcha, 🟤 decision, ⚖️ trade-off) often worth fetching immediately +``` + +**What this does:** +- Teaches the agent the pattern +- Suggests when to fetch (critical types) +- Recommends search over code re-reading (efficiency) +- Makes the system self-documenting + +--- + +## The Philosophy: Context as Currency + +### Mental Model: Token Budget as Money + +Think of context window as a bank account: + +| Approach | Metaphor | Outcome | +|----------|----------|---------| +| **Dump everything** | Spending your entire paycheck on groceries you might need someday | Waste, clutter, can't afford what you actually need | +| **Fetch nothing** | Refusing to spend any money | Starvation, can't accomplish tasks | +| **Progressive disclosure** | Check your pantry, make a shopping list, buy only what you need | Efficiency, room for unexpected needs | + +### The Attention Budget + +LLMs have finite attention: +- Every token attends to every other token (n² relationships) +- 100,000 token window ≠ 100,000 tokens of useful attention +- Context "rot" happens as window fills +- Later tokens get less attention than earlier ones + +**Claude-Mem's approach:** +- Start with ~1,000 tokens of index +- Agent has 99,000 tokens free for task +- Agent fetches ~200 tokens when needed +- Final budget: ~98,000 tokens for actual work + +### Design for Autonomy + +> "As models improve, let them act intelligently" + +Progressive disclosure treats the agent as an **intelligent information forager**, not a passive recipient of pre-selected context. + +**Traditional RAG:** +``` +System → [Decides relevance] → Agent + ↑ + Hope this helps! +``` + +**Progressive Disclosure:** +``` +System → [Shows index] → Agent → [Decides relevance] → [Fetches details] + ↑ + You know best! 
+``` + +The agent knows: +- The current task context +- What information would help +- How much budget to spend +- When to stop searching + +We don't. + +--- + +## Implementation Principles + +### 1. Make Costs Visible + +Every item in the index shows token count: + +``` +| #2591 | 1:15 AM | ⚖️ | Stderr messaging abandoned | ~155 | + ^^^^ + Retrieval cost +``` + +**Why:** +- Agent can make informed ROI decisions +- Small observations (~50 tokens) are "cheap" to fetch +- Large observations (~500 tokens) require stronger justification +- Matches how humans think about effort + +### 2. Use Semantic Compression + +Titles compress full observations into ~10 words: + +**Bad title:** +``` +Observation about a thing +``` + +**Good title:** +``` +🔴 Hook timeout issue: 60s default too short for npm install +``` + +**What makes a good title:** +- Specific: Identifies exact issue +- Actionable: Clear what to do +- Self-contained: Doesn't require reading observation +- Searchable: Contains key terms (hook, timeout, npm) +- Categorized: Icon indicates type + +### 3. Group by Context + +Observations are grouped by: +- **Date**: Temporal context +- **File path**: Spatial context (work on specific files) +- **Project**: Logical context + +```markdown +**src/hooks/context-hook.ts** +| ID | Time | T | Title | Tokens | +|----|------|---|-------|--------| +| #2591 | 1:15 AM | ⚖️ | Stderr messaging abandoned | ~155 | +| #2594 | 1:17 AM | 🟠 | Removed stderr section from docs | ~93 | +``` + +**Benefit:** If agent is working on `src/hooks/context-hook.ts`, related observations are already grouped together. + +### 4. 
Provide Retrieval Tools + +The index is useless without retrieval mechanisms: + +```markdown +*Use claude-mem MCP search to access records with the given ID* +``` + +**Available tools:** +- `search_observations` - Full-text search +- `find_by_concept` - Concept-based retrieval +- `find_by_file` - File-based retrieval +- `find_by_type` - Type-based retrieval +- `get_recent_context` - Recent session summaries + +Each tool supports `format: "index"` (default) and `format: "full"`. + +--- + +## Real-World Example + +### Scenario: Agent asked to fix a bug in hooks + +**Without progressive disclosure:** +``` +SessionStart injects 25,000 tokens of past context +Agent reads everything +Agent finds 1 relevant observation (buried in middle) +Total tokens consumed: 25,000 +Relevant tokens: ~200 +Efficiency: 0.8% +``` + +**With progressive disclosure:** +``` +SessionStart shows index: ~800 tokens +Agent sees title: "🔴 Hook timeout issue: 60s too short" +Agent thinks: "This looks relevant to my bug!" +Agent fetches observation #2543: ~155 tokens +Total tokens consumed: 955 +Relevant tokens: 955 +Efficiency: 100% +``` + +### The Index Entry + +```markdown +| #2543 | 2:14 PM | 🔴 | Hook timeout: 60s too short for npm install | ~155 | +``` + +**What the agent learns WITHOUT fetching:** +- There's a known gotcha (🔴) about hook timeouts +- It's related to npm install taking too long +- Full details are ~155 tokens (cheap) +- Happened at 2:14 PM (recent) + +**Decision tree:** +``` +Is my task related to hooks? → YES +Is my task related to timeouts? → YES +Is my task related to npm? 
→ YES +155 tokens is cheap → FETCH IT +``` + +--- + +## The Two-Tier Search Strategy + +Claude-Mem implements progressive disclosure in search results too: + +### Tier 1: Index Format (Default) + +```typescript +search_observations({ + query: "hook timeout", + format: "index" // Default +}) +``` + +**Returns:** +``` +Found 3 observations matching "hook timeout": + +| ID | Date | Type | Title | Tokens | +|----|------|------|-------|--------| +| #2543 | Oct 26 | gotcha | Hook timeout: 60s too short | ~155 | +| #2891 | Oct 25 | how-it-works | Hook timeout configuration | ~203 | +| #2102 | Oct 20 | problem-solution | Fixed timeout in CI | ~89 | +``` + +**Cost:** ~100 tokens for 3 results +**Value:** Agent can scan and decide which to fetch + +### Tier 2: Full Format (On-Demand) + +```typescript +search_observations({ + query: "hook timeout", + format: "full", + limit: 1 // Fetch just the most relevant +}) +``` + +**Returns:** +``` +#2543 🔴 Hook timeout: 60s too short for npm install +───────────────────────────────────────────────── +Date: Oct 26, 2025 2:14 PM +Type: gotcha +Project: claude-mem + +Narrative: +Discovered that the default 60-second hook timeout is insufficient +for npm install operations, especially with large dependency trees +or slow network conditions. This causes SessionStart hook to fail +silently, preventing context injection. + +Facts: +- Default timeout: 60 seconds +- npm install with cold cache: ~90 seconds +- Configured timeout: 120 seconds in plugin/hooks/hooks.json:25 + +Files Modified: +- plugin/hooks/hooks.json + +Concepts: hooks, timeout, npm, configuration +``` + +**Cost:** ~155 tokens for full details +**Value:** Complete understanding of the issue + +--- + +## Cognitive Load Theory + +Progressive disclosure is grounded in **Cognitive Load Theory**: + +### Intrinsic Load +The inherent difficulty of the task itself. 
+ +**Example:** "Fix authentication bug" +- Must understand auth system +- Must understand the bug +- Must write the fix + +This load is unavoidable. + +### Extraneous Load +The cognitive burden of poorly presented information. + +**Traditional RAG adds extraneous load:** +- Scanning irrelevant observations +- Filtering out noise +- Remembering what to ignore +- Re-contextualizing after each section + +**Progressive disclosure minimizes extraneous load:** +- Scan titles (low effort) +- Fetch only relevant (targeted effort) +- Full attention on current task + +### Germane Load +The effort of building mental models and schemas. + +**Progressive disclosure supports germane load:** +- Consistent structure (legend, grouping) +- Clear categorization (types, icons) +- Semantic compression (good titles) +- Explicit costs (token counts) + +--- + +## Anti-Patterns to Avoid + +### ❌ Verbose Titles + +**Bad:** +``` +| #2543 | 2:14 PM | 🔴 | Investigation into the issue where hooks time out | ~155 | +``` + +**Good:** +``` +| #2543 | 2:14 PM | 🔴 | Hook timeout: 60s too short for npm install | ~155 | +``` + +### ❌ Hiding Costs + +**Bad:** +``` +| #2543 | 2:14 PM | 🔴 | Hook timeout issue | +``` + +**Good:** +``` +| #2543 | 2:14 PM | 🔴 | Hook timeout issue | ~155 | +``` + +### ❌ No Retrieval Path + +**Bad:** +``` +Here are 10 observations. [No instructions on how to get full details] +``` + +**Good:** +``` +Here are 10 observations. +*Use MCP search tools to fetch full observation details on-demand* +``` + +### ❌ Defaulting to Full Format + +**Bad:** +```typescript +search_observations({ + query: "hooks", + format: "full" // Fetches everything +}) +``` + +**Good:** +```typescript +search_observations({ + query: "hooks", + format: "index", // Scan first + limit: 20 +}) + +// Then, if needed: +search_observations({ + query: "hooks", + format: "full", + limit: 1 // Just the most relevant +}) +``` + +--- + +## Key Design Decisions + +### Why Token Counts? 
+ +**Decision:** Show approximate token counts (~155, ~203) rather than exact counts. + +**Rationale:** +- Communicates scale (50 vs 500) without false precision +- Maps to human intuition (small/medium/large) +- Allows agent to budget attention +- Encourages cost-conscious retrieval + +### Why Icons Instead of Text Labels? + +**Decision:** Use emoji icons (🔴, 🟡, 🔵) rather than text (GOTCHA, PROBLEM, HOWTO). + +**Rationale:** +- Visual scanning (pattern recognition) +- Token efficient (1 char vs 10 chars) +- Language-agnostic +- Aesthetically distinct +- Works for both humans and AI + +### Why Index-First, Not Smart Pre-Fetch? + +**Decision:** Always show index first, even if we "know" what's relevant. + +**Rationale:** +- We can't know what's relevant better than the agent +- Pre-fetching assumes we understand the task +- Agent knows current context, we don't +- Respects agent autonomy +- Fails gracefully (can always fetch more) + +### Why Group by File Path? + +**Decision:** Group observations by file path in addition to date. + +**Rationale:** +- Spatial locality: Work on file X likely needs context about file X +- Reduces scanning effort +- Matches how developers think +- Clear semantic boundaries + +--- + +## Measuring Success + +Progressive disclosure is working when: + +### ✅ Low Waste Ratio +``` +Relevant Tokens / Total Context Tokens > 80% +``` + +Most of the context consumed is actually useful. + +### ✅ Selective Fetching +``` +Index Shown: 50 observations +Details Fetched: 2-3 observations +``` + +Agent is being selective, not fetching everything. + +### ✅ Fast Task Completion +``` +Session with index: 30 seconds to find relevant context +Session without: 90 seconds scanning all context +``` + +Time-to-relevant-information is faster. + +### ✅ Appropriate Depth +``` +Simple task: Only index needed +Medium task: 1-2 observations fetched +Complex task: 5-10 observations + code reads +``` + +Depth scales with task complexity. 
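The waste-ratio check can be made concrete with the numbers from the hook-bug scenario earlier in this document (200 relevant tokens out of 25,000 without an index, 955 out of 955 with one). The helper names below are hypothetical, and the 0.8 threshold simply restates the "> 80%" target; treat this as a sketch of the arithmetic, not an existing Claude-Mem metric.

```typescript
// Hypothetical helper: share of consumed context that was actually relevant.
function relevanceRatio(relevantTokens: number, totalTokens: number): number {
  if (totalTokens <= 0) return 0; // nothing consumed, nothing to measure
  return relevantTokens / totalTokens;
}

// Restates the "> 80%" target from the Low Waste Ratio criterion.
function meetsWasteTarget(ratio: number): boolean {
  return ratio > 0.8;
}

// Figures taken from the hook-bug scenario in this document.
const withoutIndex = relevanceRatio(200, 25000);
const withIndex = relevanceRatio(955, 955);

console.log(`without index: ${(withoutIndex * 100).toFixed(1)}%`);
console.log(`with index: ${(withIndex * 100).toFixed(1)}%`);
```

Tracking this ratio per session would be one way to notice when the index itself starts to grow large enough to count as waste.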
+ +--- + +## Future Enhancements + +### Adaptive Index Size + +```typescript +// Vary index size based on session type +SessionStart({ source: "startup" }): + → Show last 10 sessions (small index) + +SessionStart({ source: "resume" }): + → Show only current session (micro index) + +SessionStart({ source: "compact" }): + → Show last 20 sessions (larger index) +``` + +### Relevance Scoring + +```typescript +// Use embeddings to pre-sort index by relevance +search_observations({ + query: "authentication bug", + format: "index", + sort: "relevance" // Based on semantic similarity +}) +``` + +### Cost Forecasting + +```markdown +💡 **Budget Estimate:** +- Fetching all 🔴 gotchas: ~450 tokens +- Fetching all file-related: ~1,200 tokens +- Fetching everything: ~8,500 tokens +``` + +### Progressive Detail Levels + +``` +Layer 1: Index (titles only) +Layer 2: Summaries (2-3 sentences) +Layer 3: Full details (complete observation) +Layer 4: Source files (referenced code) +``` + +--- + +## Key Takeaways + +1. **Show, don't tell**: Index reveals what exists without forcing consumption +2. **Cost-conscious**: Make retrieval costs visible for informed decisions +3. **Agent autonomy**: Let the agent decide what's relevant +4. **Semantic compression**: Good titles make or break the system +5. **Consistent structure**: Patterns reduce cognitive load +6. **Two-tier everything**: Index first, details on-demand +7. **Context as currency**: Spend wisely on high-value information + +--- + +## Remember + +> "The best interface is one that disappears when not needed, and appears exactly when it is." + +Progressive disclosure respects the agent's intelligence and autonomy. We provide the map; the agent chooses the path. 
+ +--- + +## Further Reading + +- [Context Engineering for AI Agents](/docs/context-engineering) - Foundational principles +- [Claude-Mem Architecture](/docs/architecture) - How it all fits together +- Cognitive Load Theory (Sweller, 1988) +- Information Foraging Theory (Pirolli & Card, 1999) +- Progressive Disclosure (Nielsen Norman Group) + +--- + +*This philosophy emerged from real-world usage of Claude-Mem across hundreds of coding sessions. The pattern works because it aligns with both human cognition and LLM attention mechanics.* diff --git a/plugin/hooks/hooks.json b/plugin/hooks/hooks.json index d2dcc37f..d22a5a52 100644 --- a/plugin/hooks/hooks.json +++ b/plugin/hooks/hooks.json @@ -8,6 +8,11 @@ "type": "command", "command": "cd \"${CLAUDE_PLUGIN_ROOT}/..\" && npm install --prefer-offline --no-audit --no-fund --loglevel=silent && node ${CLAUDE_PLUGIN_ROOT}/scripts/context-hook.js", "timeout": 300 + }, + { + "type": "command", + "command": "node ${CLAUDE_PLUGIN_ROOT}/scripts/stderr-test-hook.js", + "timeout": 10 } ] } diff --git a/plugin/scripts/stderr-test-hook.js b/plugin/scripts/stderr-test-hook.js new file mode 100755 index 00000000..759885ac --- /dev/null +++ b/plugin/scripts/stderr-test-hook.js @@ -0,0 +1,3 @@ +#!/usr/bin/env node +#!/usr/bin/env node +console.error("\u{1F9EA} TEST: This is a stderr message from the claude-mem hook");process.exit(0); diff --git a/scripts/build-hooks.js b/scripts/build-hooks.js index 0ee545d5..87a5fbdf 100644 --- a/scripts/build-hooks.js +++ b/scripts/build-hooks.js @@ -17,7 +17,8 @@ const HOOKS = [ { name: 'new-hook', source: 'src/hooks/new-hook.ts' }, { name: 'save-hook', source: 'src/hooks/save-hook.ts' }, { name: 'summary-hook', source: 'src/hooks/summary-hook.ts' }, - { name: 'cleanup-hook', source: 'src/hooks/cleanup-hook.ts' } + { name: 'cleanup-hook', source: 'src/hooks/cleanup-hook.ts' }, + { name: 'stderr-test-hook', source: 'src/hooks/stderr-test-hook.ts' } ]; const WORKER_SERVICE = { diff --git 
a/src/hooks/stderr-test-hook.ts b/src/hooks/stderr-test-hook.ts new file mode 100644 index 00000000..04d80f6b --- /dev/null +++ b/src/hooks/stderr-test-hook.ts @@ -0,0 +1,12 @@ +#!/usr/bin/env node + +/** + * Test hook to verify if stderr messages appear in Claude Code UI + * This hook simply outputs a message via console.error() + */ + +// Output a test message to stderr +console.error('🧪 TEST: This is a stderr message from the claude-mem hook'); + +// Exit successfully +process.exit(0);