Add stderr test hook for UI experiment

Alex Newman
2025-10-26 22:29:43 -04:00
parent 44b69b737b
commit b0fae0cfd4
10 changed files with 2501 additions and 2 deletions
+7 -1
@@ -79,9 +79,15 @@ npx mintlify dev
- **[Usage Guide](docs/usage/getting-started.mdx)** - How Claude-Mem works automatically
- **[MCP Search Tools](docs/usage/search-tools.mdx)** - Query your project history
### Best Practices
- **[Context Engineering](docs/context-engineering.mdx)** - AI agent context optimization principles
- **[Progressive Disclosure](docs/progressive-disclosure.mdx)** - Philosophy behind Claude-Mem's context priming strategy
### Architecture
- **[Overview](docs/architecture/overview.mdx)** - System components & data flow
- **[Hooks](docs/architecture/hooks.mdx)** - 5 lifecycle hooks explained
- **[Architecture Evolution](docs/architecture-evolution.mdx)** - The journey from v3 to v4
- **[Hooks Architecture](docs/hooks-architecture.mdx)** - How Claude-Mem uses lifecycle hooks
- **[Hooks Reference](docs/architecture/hooks.mdx)** - 5 lifecycle hooks explained
- **[Worker Service](docs/architecture/worker-service.mdx)** - HTTP API & PM2 management
- **[Database](docs/architecture/database.mdx)** - SQLite schema & FTS5 search
- **[MCP Search](docs/architecture/mcp-search.mdx)** - 7 search tools & examples
+801
@@ -0,0 +1,801 @@
# Architecture Evolution: The Journey from v3 to v4
## The Problem We Solved
**Goal:** Create a memory system that makes Claude smarter across sessions without the user noticing it exists.
**Challenge:** How do you observe AI agent behavior, compress it intelligently, and serve it back at the right time - all without slowing down or interfering with the main workflow?
This is the story of how claude-mem evolved from a simple idea to a production-ready system, and the key architectural decisions that made it work.
---
## v1-v2: The Naive Approach
### The First Attempt: Dump Everything
**Architecture:**
```
PostToolUse Hook → Save raw tool outputs → Retrieve everything on startup
```
**What we learned:**
- ❌ Context pollution (thousands of tokens of irrelevant data)
- ❌ No compression (raw tool outputs are verbose)
- ❌ No search (had to scan everything linearly)
- ✅ Proved the concept: Memory across sessions is valuable
**Example of what went wrong:**
```
SessionStart loaded:
- 150 file read operations
- 80 grep searches
- 45 bash commands
- Total: ~35,000 tokens
- Relevant to current task: ~500 tokens (1.4%)
```
---
## v3: Smart Compression, Wrong Architecture
### The Breakthrough: AI-Powered Compression
**New idea:** Use Claude itself to compress observations
**Architecture:**
```
PostToolUse Hook → Queue observation → SDK Worker → AI compression → Store insights
```
**What we added:**
1. **Claude Agent SDK integration** - Use AI to compress observations
2. **Background worker** - Don't block main session
3. **Structured observations** - Extract facts, decisions, insights
4. **Session summaries** - Generate comprehensive summaries
**What worked:**
- ✅ Compression ratio: 10:1 to 100:1
- ✅ Semantic understanding (not just keyword matching)
- ✅ Background processing (hooks stayed fast)
- ✅ Search became useful
**What didn't work:**
- ❌ Still loaded everything upfront
- ❌ Session ID management was broken
- ❌ Aggressive cleanup interrupted summaries
- ❌ Multiple SDK sessions per Claude Code session
---
## The Key Realizations
### Realization 1: Progressive Disclosure
**Problem:** Even compressed observations can pollute context if you load them all.
**Insight:** Humans don't read everything before starting work. Why should AI?
**Solution:** Show an index first, fetch details on-demand.
```
❌ Old: Load 50 observations (8,500 tokens)
✅ New: Show index of 50 observations (800 tokens)
Agent fetches 2-3 relevant ones (300 tokens)
Total: 1,100 tokens vs 8,500 tokens
```
**Impact:**
- 87% reduction in context usage
- 100% relevance (only fetch what's needed)
- Agent autonomy (decides what's relevant)
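To make the index concrete, here is a minimal sketch of how an index row could be built from stored observations. The `Observation` shape and the `estimateTokens` helper are illustrative assumptions, not claude-mem's actual API:
```typescript
// Hypothetical observation shape; the real claude-mem schema may differ
interface Observation {
  id: number;
  title: string;
  type: string;      // e.g. "gotcha", "decision"
  narrative: string; // full detail, fetched only on demand
}

// Rough token estimate: ~4 characters per token for English text
const estimateTokens = (text: string): number => Math.ceil(text.length / 4);

// Emit one compact index row per observation instead of the full narrative
function formatIndex(observations: Observation[]): string {
  return observations
    .map(o => `| #${o.id} | ${o.type} | ${o.title} | ~${estimateTokens(o.narrative)} |`)
    .join('\n');
}
```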
### Realization 2: Session ID Chaos
**Problem:** SDK session IDs change on every turn.
**What we thought:**
```typescript
// ❌ Wrong assumption
UserPromptSubmit → Capture session ID once → Use forever
```
**Reality:**
```typescript
// ✅ Actual behavior
Turn 1: session_abc123
Turn 2: session_def456
Turn 3: session_ghi789
```
**Why this matters:**
- Can't resume sessions without tracking ID updates
- Session state gets lost between turns
- Observations get orphaned
**Solution:**
```typescript
// Capture from system init message
for await (const msg of response) {
if (msg.type === 'system' && msg.subtype === 'init') {
sdkSessionId = msg.session_id;
await updateSessionId(sessionId, sdkSessionId);
}
}
```
### Realization 3: Graceful vs Aggressive Cleanup
**v3 approach:**
```typescript
// ❌ Aggressive: Kill worker immediately
SessionEnd → DELETE /worker/session → Worker stops
```
**Problems:**
- Summary generation interrupted mid-process
- Pending observations lost
- Race conditions everywhere
**v4 approach:**
```typescript
// ✅ Graceful: Let worker finish
SessionEnd → Mark session complete → Worker finishes → Exit naturally
```
**Benefits:**
- Summaries complete successfully
- No lost observations
- Clean state transitions
**Code:**
```typescript
// v3: Aggressive
async function sessionEnd(sessionId: string) {
await fetch(`http://localhost:37777/sessions/${sessionId}`, {
method: 'DELETE'
});
}
// v4: Graceful
async function sessionEnd(sessionId: string) {
await db.run(
'UPDATE sdk_sessions SET completed_at = ? WHERE id = ?',
[Date.now(), sessionId]
);
}
```
### Realization 4: One Session, Not Many
**Problem:** We were creating multiple SDK sessions per Claude Code session.
**What we thought:**
```
Claude Code session → Create SDK session per observation → 100+ SDK sessions
```
**Reality should be:**
```
Claude Code session → ONE long-running SDK session → Streaming input
```
**Why this matters:**
- SDK maintains conversation state
- Context accumulates naturally
- Much more efficient
**Implementation:**
```typescript
// ✅ Streaming Input Mode
async function* messageGenerator(): AsyncIterable<UserMessage> {
// Initial prompt
yield {
role: "user",
content: "You are a memory assistant..."
};
// Then continuously yield observations
while (session.status === 'active') {
const observations = await pollQueue();
for (const obs of observations) {
yield {
role: "user",
content: formatObservation(obs)
};
}
await sleep(1000);
}
}
const response = query({
prompt: messageGenerator(),
options: { maxTurns: 1000 }
});
```
---
## v4: The Architecture That Works
### The Core Design
```
┌─────────────────────────────────────────────────────────┐
│ CLAUDE CODE SESSION │
│ User → Claude → Tools (Read, Edit, Write, Bash) │
│ ↓ │
│ PostToolUse Hook │
│ (queues observation) │
└─────────────────────────────────────────────────────────┘
↓ SQLite queue
┌─────────────────────────────────────────────────────────┐
│ SDK WORKER PROCESS │
│ ONE streaming session per Claude Code session │
│ │
│ AsyncIterable<UserMessage> │
│ → Yields observations from queue │
│ → SDK compresses via AI │
│ → Parses XML responses │
│ → Stores in database │
└─────────────────────────────────────────────────────────┘
↓ SQLite storage
┌─────────────────────────────────────────────────────────┐
│ NEXT SESSION │
│ SessionStart Hook │
│ → Queries database │
│ → Returns progressive disclosure index │
│ → Agent fetches details via MCP │
└─────────────────────────────────────────────────────────┘
```
### The Five Hook Architecture
<Tabs>
<Tab title="SessionStart">
**Purpose:** Inject context from previous sessions
**Timing:** When Claude Code starts
**What it does:**
- Queries last 10 session summaries
- Formats as progressive disclosure index
- Injects into context via stdout
**Key change from v3:**
- ✅ Index format (not full details)
- ✅ Token counts visible
- ✅ MCP search instructions included
</Tab>
<Tab title="UserPromptSubmit">
**Purpose:** Initialize session tracking
**Timing:** Before Claude processes prompt
**What it does:**
- Creates session record
- Saves raw user prompt (v4.2.0+)
- Starts worker if needed
**Key change from v3:**
- ✅ Stores raw prompts for search
- ✅ Auto-starts PM2 worker
</Tab>
<Tab title="PostToolUse">
**Purpose:** Capture tool observations
**Timing:** After every tool execution
**What it does:**
- Enqueues observation in database
- Returns immediately
**Key change from v3:**
- ✅ Just enqueues (doesn't process)
- ✅ Worker handles all AI calls
</Tab>
<Tab title="Summary">
**Purpose:** Generate session summaries
**Timing:** Worker-triggered (mid-session)
**What it does:**
- Gathers observations
- Sends to Claude for summarization
- Stores structured summary
**Key change from v3:**
- ✅ Multiple summaries per session
- ✅ Summaries are checkpoints, not endings
</Tab>
<Tab title="SessionEnd">
**Purpose:** Graceful cleanup
**Timing:** When session ends
**What it does:**
- Marks session complete
- Lets worker finish processing
**Key change from v3:**
- ✅ Graceful (not aggressive)
- ✅ No DELETE requests
- ✅ Worker finishes naturally
</Tab>
</Tabs>
### Database Schema Evolution
**v3 schema:**
```sql
-- Simple, flat structure
CREATE TABLE observations (
id INTEGER PRIMARY KEY,
session_id TEXT,
text TEXT,
created_at INTEGER
);
```
**v4 schema:**
```sql
-- Rich, structured schema
CREATE TABLE observations (
id INTEGER PRIMARY KEY AUTOINCREMENT,
session_id TEXT NOT NULL,
project TEXT NOT NULL,
-- Progressive disclosure metadata
title TEXT NOT NULL,
subtitle TEXT,
type TEXT NOT NULL, -- decision, bugfix, feature, etc.
-- Content
narrative TEXT NOT NULL,
facts TEXT, -- JSON array
-- Searchability
concepts TEXT, -- JSON array of tags
files_read TEXT, -- JSON array
files_modified TEXT, -- JSON array
-- Timestamps
created_at TEXT NOT NULL,
created_at_epoch INTEGER NOT NULL,
FOREIGN KEY(session_id) REFERENCES sdk_sessions(id)
);
-- FTS5 for full-text search
CREATE VIRTUAL TABLE observations_fts USING fts5(
title, subtitle, narrative, facts, concepts,
content=observations
);
-- Auto-sync triggers
CREATE TRIGGER observations_ai AFTER INSERT ON observations BEGIN
INSERT INTO observations_fts(rowid, title, subtitle, narrative, facts, concepts)
VALUES (new.id, new.title, new.subtitle, new.narrative, new.facts, new.concepts);
END;
```
**What changed:**
- ✅ Structured fields (title, subtitle, type)
- ✅ FTS5 full-text search
- ✅ Project-scoped queries
- ✅ Rich metadata for progressive disclosure
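One caveat worth noting: the excerpt above shows only the `AFTER INSERT` trigger. External-content FTS5 tables (`content=observations`) are not updated automatically on deletes or updates, so a full setup typically adds companion triggers. A sketch, assuming a better-sqlite3-style driver (which may not be what claude-mem actually uses):
```typescript
import Database from 'better-sqlite3'; // assumption: the actual driver may differ

const db = new Database('claude-mem.db');

// Keep the FTS index in sync when base rows are deleted or rewritten;
// the special 'delete' command removes the old copy from the index.
db.exec(`
  CREATE TRIGGER IF NOT EXISTS observations_ad AFTER DELETE ON observations BEGIN
    INSERT INTO observations_fts(observations_fts, rowid, title, subtitle, narrative, facts, concepts)
    VALUES ('delete', old.id, old.title, old.subtitle, old.narrative, old.facts, old.concepts);
  END;

  CREATE TRIGGER IF NOT EXISTS observations_au AFTER UPDATE ON observations BEGIN
    INSERT INTO observations_fts(observations_fts, rowid, title, subtitle, narrative, facts, concepts)
    VALUES ('delete', old.id, old.title, old.subtitle, old.narrative, old.facts, old.concepts);
    INSERT INTO observations_fts(rowid, title, subtitle, narrative, facts, concepts)
    VALUES (new.id, new.title, new.subtitle, new.narrative, new.facts, new.concepts);
  END;
`);
```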
### Worker Service Redesign
**v3 worker:**
```typescript
// Multiple short SDK sessions
app.post('/process', async (req, res) => {
const response = await query({
prompt: buildPrompt(req.body),
options: { maxTurns: 1 }
});
for await (const msg of response) {
// Process single observation
}
res.json({ success: true });
});
```
**v4 worker:**
```typescript
// ONE long-running SDK session
async function runWorker(sessionId: string) {
const response = query({
prompt: messageGenerator(), // AsyncIterable
options: { maxTurns: 1000 }
});
for await (const msg of response) {
if (msg.type === 'text') {
parseObservations(msg.content);
parseSummaries(msg.content);
}
}
}
```
**Benefits:**
- Maintains conversation state
- SDK handles context automatically
- More efficient (fewer API calls)
- Natural multi-turn flow
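The `parseObservations` call above is where the worker turns the SDK's text output into database rows. A hedged sketch of such a parser, assuming `<observation>` blocks with `<title>` and `<narrative>` children (the actual tag set is not shown in this document):
```typescript
// Extract structured observations from free-form model output.
// Tag names here are assumptions for illustration.
function parseObservations(content: string): Array<{ title: string; narrative: string }> {
  const results: Array<{ title: string; narrative: string }> = [];
  const blocks = content.match(/<observation>[\s\S]*?<\/observation>/g) ?? [];
  for (const block of blocks) {
    const title = block.match(/<title>([\s\S]*?)<\/title>/)?.[1]?.trim() ?? '';
    const narrative = block.match(/<narrative>([\s\S]*?)<\/narrative>/)?.[1]?.trim() ?? '';
    if (title) results.push({ title, narrative });
  }
  return results;
}
```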
---
## Critical Fixes Along the Way
### Fix 1: Context Injection Pollution (v4.3.1)
**Problem:** SessionStart hook output polluted with npm install logs
```bash
# Hook output contained:
npm WARN deprecated ...
npm WARN deprecated ...
{"hookSpecificOutput": {"additionalContext": "..."}}
```
**Why it broke:**
- Claude Code expects clean JSON or plain text
- stderr/stdout from npm install mixed with hook output
- Context didn't inject properly
**Solution:**
```json
{
"command": "npm install --loglevel=silent && node context-hook.js"
}
```
**Result:** Clean JSON output, context injection works
### Fix 2: Double Shebang Issue (v4.3.1)
**Problem:** Hook executables had duplicate shebangs
```javascript
#!/usr/bin/env node
#!/usr/bin/env node // ← Duplicate!
// Rest of code...
```
**Why it happened:**
- Source files had shebang
- esbuild added another shebang during build
**Solution:**
```typescript
// Remove shebangs from source files
// Let esbuild add them during build
```
**Result:** Clean executables, no parsing errors
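For illustration, the fix can live entirely in the build script: let esbuild emit the shebang once via its `banner` option. A minimal sketch (the project's actual build configuration may differ):
```typescript
import { build } from 'esbuild';

// The shebang lives in the build config, not in the source file,
// so it appears exactly once in each output executable.
await build({
  entryPoints: ['src/hooks/save-hook.ts'],
  bundle: true,
  platform: 'node',
  outfile: 'plugin/scripts/save-hook.js',
  banner: { js: '#!/usr/bin/env node' },
});
```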
### Fix 3: FTS5 Injection Vulnerability (v4.2.3)
**Problem:** User input passed directly to FTS5 query
```typescript
// ❌ Vulnerable
const results = db.query(
`SELECT * FROM observations_fts WHERE observations_fts MATCH '${userQuery}'`
);
```
**Attack:**
```typescript
userQuery = "'; DROP TABLE observations; --"
```
**Solution:**
```typescript
// ✅ Safe: Use parameterized queries
const results = db.query(
'SELECT * FROM observations_fts WHERE observations_fts MATCH ?',
[userQuery]
);
```
### Fix 4: NOT NULL Constraint Violation (v4.2.8)
**Problem:** Session creation failed when prompt was empty
```sql
INSERT INTO sdk_sessions (claude_session_id, user_prompt, ...)
VALUES ('abc123', NULL, ...) -- ❌ user_prompt is NOT NULL
```
**Solution:**
```typescript
// Allow NULL user_prompts
user_prompt: input.prompt ?? null
```
**Schema change:**
```sql
-- Before
user_prompt TEXT NOT NULL
-- After
user_prompt TEXT -- Nullable
```
---
## Performance Improvements
### Optimization 1: Prepared Statements
**Before:**
```typescript
for (const obs of observations) {
db.run(`INSERT INTO observations (...) VALUES (?, ?, ...)`, [obs.id, obs.text, ...]);
}
```
**After:**
```typescript
const stmt = db.prepare(`INSERT INTO observations (...) VALUES (?, ?, ...)`);
for (const obs of observations) {
stmt.run([obs.id, obs.text, ...]);
}
stmt.finalize();
```
**Impact:** 5x faster bulk inserts
### Optimization 2: FTS5 Indexing
**Before:**
```typescript
// Manual full-text search
const results = db.query(
`SELECT * FROM observations WHERE text LIKE '%${query}%'`
);
```
**After:**
```typescript
// FTS5 virtual table
const results = db.query(
`SELECT * FROM observations_fts WHERE observations_fts MATCH ?`,
[query]
);
```
**Impact:** 100x faster searches on large datasets
### Optimization 3: Index Format Default
**Before:**
```typescript
// Always return full observations
search_observations({ query: "hooks" });
// Returns: 5,000 tokens
```
**After:**
```typescript
// Default to index format
search_observations({ query: "hooks", format: "index" });
// Returns: 200 tokens
// Fetch full only when needed
search_observations({ query: "hooks", format: "full", limit: 1 });
// Returns: 150 tokens
```
**Impact:** 25x reduction in average search result size
---
## What We Learned
### Lesson 1: Context is Precious
**Principle:** Every token you put in the context window costs attention.
**Application:**
- Progressive disclosure reduces waste by 87%
- Index-first approach gives agent control
- Token counts make costs visible
### Lesson 2: Session State is Complicated
**Principle:** Distributed state is hard. SDK handles it better than we can.
**Application:**
- Use SDK's built-in session resumption
- Don't try to manually reconstruct state
- Track session IDs from init messages
### Lesson 3: Graceful Beats Aggressive
**Principle:** Let processes finish their work before terminating.
**Application:**
- Graceful cleanup prevents data loss
- Workers finish important operations
- Clean state transitions reduce bugs
### Lesson 4: AI is the Compressor
**Principle:** Don't compress manually. Let AI do semantic compression.
**Application:**
- 10:1 to 100:1 compression ratios
- Semantic understanding, not keyword extraction
- Structured outputs (XML parsing)
### Lesson 5: Progressive Everything
**Principle:** Show metadata first, fetch details on-demand.
**Application:**
- Progressive disclosure in context injection
- Index format in search results
- Layer 1 (titles) → Layer 2 (summaries) → Layer 3 (full details)
---
## The Road Ahead
### Planned: Adaptive Index Size
```typescript
SessionStart({ source: "startup" }):
→ Show last 10 sessions (normal)
SessionStart({ source: "resume" }):
→ Show only current session (minimal)
SessionStart({ source: "compact" }):
→ Show last 20 sessions (comprehensive)
```
### Planned: Relevance Scoring
```typescript
// Use embeddings to pre-sort index by semantic relevance
search_observations({
query: "authentication bug",
sort: "relevance" // Based on embeddings
});
```
### Planned: Multi-Project Context
```typescript
// Cross-project pattern recognition
search_observations({
query: "API rate limiting",
projects: ["api-gateway", "user-service", "billing-service"]
});
```
### Planned: Collaborative Memory
```typescript
// Team-shared observations (optional)
createObservation({
title: "Rate limit: 100 req/min",
scope: "team" // vs "user"
});
```
---
## Migration Guide: v3 → v4
### Step 1: Backup Database
```bash
cp ~/.claude-mem/claude-mem.db ~/.claude-mem/claude-mem-v3-backup.db
```
### Step 2: Update Plugin
```bash
cd ~/.claude/plugins/marketplaces/thedotmack
git pull
```
### Step 3: Run Migration
```bash
npx tsx src/services/sqlite/migrations/v3-to-v4.ts
```
**What the migration does:**
- Adds new columns to observations table
- Creates FTS5 virtual tables
- Sets up auto-sync triggers
- Migrates existing observations to new schema
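A condensed sketch of the kinds of statements such a migration runs, assuming a better-sqlite3-style driver; the real script covers more columns and error handling:
```typescript
import Database from 'better-sqlite3'; // assumption: the actual driver may differ
import { homedir } from 'node:os';

const db = new Database(`${homedir()}/.claude-mem/claude-mem.db`);

// New progressive-disclosure columns (added nullable, backfilled below)
db.exec(`
  ALTER TABLE observations ADD COLUMN title TEXT;
  ALTER TABLE observations ADD COLUMN subtitle TEXT;
  ALTER TABLE observations ADD COLUMN narrative TEXT;
  ALTER TABLE observations ADD COLUMN facts TEXT;
  ALTER TABLE observations ADD COLUMN concepts TEXT;
`);

// Backfill: carry v3's flat text over as the narrative, derive a rough title
db.exec(`
  UPDATE observations SET narrative = text WHERE narrative IS NULL;
  UPDATE observations SET title = substr(text, 1, 80) WHERE title IS NULL;
`);

// Full-text index over the enriched schema
db.exec(`
  CREATE VIRTUAL TABLE IF NOT EXISTS observations_fts USING fts5(
    title, subtitle, narrative, facts, concepts,
    content=observations
  );
`);
```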
### Step 4: Restart Worker
```bash
pm2 restart claude-mem-worker
pm2 logs claude-mem-worker
```
### Step 5: Test
```bash
# Start Claude Code
claude
# Check that context is injected
# (Should see progressive disclosure index)
# Submit a prompt and check observations
pm2 logs claude-mem-worker --nostream
```
---
## Key Metrics
### v3 Performance
| Metric | Value |
|--------|-------|
| Context usage per session | ~25,000 tokens |
| Relevant context | ~2,000 tokens (8%) |
| Hook execution time | ~200ms |
| Search latency | ~500ms (LIKE queries) |
### v4 Performance
| Metric | Value |
|--------|-------|
| Context usage per session | ~1,100 tokens |
| Relevant context | ~1,100 tokens (100%) |
| Hook execution time | ~45ms |
| Search latency | ~15ms (FTS5) |
**Improvements:**
- 96% reduction in context waste
- 12x increase in relevance
- 4x faster hooks
- 33x faster search
---
## Conclusion
The journey from v3 to v4 was about understanding these fundamental truths:
1. **Context is finite** - Progressive disclosure respects attention budget
2. **AI is the compressor** - Semantic understanding beats keyword extraction
3. **Agents are smart** - Let them decide what to fetch
4. **State is hard** - Use SDK's built-in mechanisms
5. **Graceful wins** - Let processes finish cleanly
The result is a memory system that's both powerful and invisible. Users never notice it working - Claude just gets smarter over time.
---
## Further Reading
- [Progressive Disclosure](/docs/progressive-disclosure) - The philosophy behind v4
- [Hooks Architecture](/docs/hooks-architecture) - How hooks power the system
- [Context Engineering](/docs/context-engineering) - Foundational principles
- [v4.0.0 Release Notes](/CHANGELOG.md#v400) - Full changelog
---
*This architecture evolution reflects hundreds of hours of experimentation, dozens of dead ends, and the invaluable experience of real-world usage. v4 is the architecture that emerged from understanding what actually works.*
+222
@@ -0,0 +1,222 @@
# Context Engineering for AI Agents: Best Practices Cheat Sheet
## Core Principle
**Find the smallest possible set of high-signal tokens that maximize the likelihood of your desired outcome.**
---
## Context Engineering vs Prompt Engineering
**Prompt Engineering**: Writing and organizing LLM instructions for optimal outcomes (one-time task)
**Context Engineering**: Curating and maintaining the optimal set of tokens during inference across multiple turns (iterative process)
Context engineering manages:
- System instructions
- Tools
- Model Context Protocol (MCP)
- External data
- Message history
- Runtime data retrieval
---
## The Problem: Context Rot
**Key Insight**: LLMs have an "attention budget" that gets depleted as context grows
- Every token attends to every other token (n² relationships)
- As context length increases, model accuracy decreases
- Models have less training experience with longer sequences
- Context must be treated as a finite resource with diminishing marginal returns
---
## System Prompts: Find the "Right Altitude"
### The Goldilocks Zone
**Too Prescriptive** ❌
- Hardcoded if-else logic
- Brittle and fragile
- High maintenance complexity
**Too Vague** ❌
- High-level guidance without concrete signals
- Falsely assumes shared context
- Lacks actionable direction
**Just Right** ✅
- Specific enough to guide behavior effectively
- Flexible enough to provide strong heuristics
- Minimal set of information that fully outlines expected behavior
### Best Practices
- Use simple, direct language
- Organize into distinct sections (`<background_information>`, `<instructions>`, `## Tool guidance`, etc.)
- Use XML tags or Markdown headers for structure
- Start with minimal prompt, add based on failure modes
- Note: Minimal ≠ short (provide sufficient information upfront)
---
## Tools: Minimal and Clear
### Design Principles
- **Self-contained**: Each tool has a single, clear purpose
- **Robust to error**: Handle edge cases gracefully
- **Extremely clear**: Intended use is unambiguous
- **Token-efficient**: Returns relevant information without bloat
- **Descriptive parameters**: Unambiguous input names (e.g., `user_id` not `user`)
### Critical Rule
**If a human engineer can't definitively say which tool to use in a given situation, an AI agent can't be expected to do better.**
### Common Failure Modes to Avoid
- Bloated tool sets covering too much functionality
- Tools with overlapping purposes
- Ambiguous decision points about which tool to use
---
## Examples: Diverse, Not Exhaustive
**Do** ✅
- Curate a set of diverse, canonical examples
- Show expected behavior effectively
- Think "pictures worth a thousand words"
**Don't** ❌
- Stuff in a laundry list of edge cases
- Try to articulate every possible rule
- Overwhelm with exhaustive scenarios
---
## Context Retrieval Strategies
### Just-In-Time Context (Recommended for Agents)
**Approach**: Maintain lightweight identifiers (file paths, queries, links) and dynamically load data at runtime
**Benefits**:
- Avoids context pollution
- Enables progressive disclosure
- Mirrors human cognition (we don't memorize everything)
- Leverages metadata (file names, folder structure, timestamps)
- Agents discover context incrementally
**Trade-offs**:
- Slower than pre-computed retrieval
- Requires proper tool guidance to avoid dead-ends
### Pre-Inference Retrieval (Traditional RAG)
**Approach**: Use embedding-based retrieval to surface context before inference
**When to Use**: Static content that won't change during interaction
### Hybrid Strategy (Best of Both)
**Approach**: Retrieve some data upfront, enable autonomous exploration as needed
**Example**: Claude Code loads CLAUDE.md files upfront, uses glob/grep for just-in-time retrieval
**Rule of Thumb**: "Do the simplest thing that works"
---
## Long-Horizon Tasks: Three Techniques
### 1. Compaction
**What**: Summarize conversation nearing context limit, reinitiate with summary
**Implementation**:
- Pass message history to model for compression
- Preserve critical details (architectural decisions, bugs, implementation)
- Discard redundant outputs
- Continue with compressed context + recently accessed files
**Tuning Process**:
1. **First**: Maximize recall (capture all relevant information)
2. **Then**: Improve precision (eliminate superfluous content)
**Low-Hanging Fruit**: Clear old tool calls and results
**Best For**: Tasks requiring extensive back-and-forth
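As a sketch of the mechanics (the message shape and `summarize` function are assumptions, not tied to any particular SDK):
```typescript
interface Message { role: 'user' | 'assistant'; content: string }

// Keep the most recent turns verbatim; replace everything older with a summary.
async function compact(
  history: Message[],
  summarize: (msgs: Message[]) => Promise<string>, // e.g. a call to a smaller model
  keepRecent = 10
): Promise<Message[]> {
  if (history.length <= keepRecent) return history;
  const older = history.slice(0, -keepRecent);
  const recent = history.slice(-keepRecent);
  const summary = await summarize(older);
  return [
    { role: 'user', content: `Summary of earlier conversation:\n${summary}` },
    ...recent,
  ];
}
```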
### 2. Structured Note-Taking (Agentic Memory)
**What**: Agent writes notes persisted outside context window, retrieved later
**Examples**:
- To-do lists
- NOTES.md files
- Game state tracking (Pokémon example: tracking 1,234 steps of training)
- Project progress logs
**Benefits**:
- Persistent memory with minimal overhead
- Maintains critical context across tool calls
- Enables multi-hour coherent strategies
**Best For**: Iterative development with clear milestones
### 3. Sub-Agent Architectures
**What**: Specialized sub-agents handle focused tasks with clean context windows
**How It Works**:
- Main agent coordinates high-level plan
- Sub-agents perform deep technical work
- Sub-agents explore extensively (tens of thousands of tokens)
- Return condensed summaries (1,000-2,000 tokens)
**Benefits**:
- Clear separation of concerns
- Parallel exploration
- Detailed context remains isolated
**Best For**: Complex research and analysis tasks
---
## Quick Decision Framework
| Scenario | Recommended Approach |
|----------|---------------------|
| Static content | Pre-inference retrieval or hybrid |
| Dynamic exploration needed | Just-in-time context |
| Extended back-and-forth | Compaction |
| Iterative development | Structured note-taking |
| Complex research | Sub-agent architectures |
| Rapid model improvement | "Do the simplest thing that works" |
---
## Key Takeaways
1. **Context is finite**: Treat it as a precious resource with an attention budget
2. **Think holistically**: Consider the entire state available to the LLM
3. **Stay minimal**: More context isn't always better
4. **Be iterative**: Context curation happens each time you pass to the model
5. **Design for autonomy**: As models improve, let them act intelligently
6. **Start simple**: Test with minimal setup, add based on failure modes
---
## Anti-Patterns to Avoid
- ❌ Cramming everything into prompts
- ❌ Creating brittle if-else logic
- ❌ Building bloated tool sets
- ❌ Stuffing exhaustive edge cases as examples
- ❌ Assuming larger context windows solve everything
- ❌ Ignoring context pollution over long interactions
---
## Remember
> "Even as models continue to improve, the challenge of maintaining coherence across extended interactions will remain central to building more effective agents."
Context engineering will evolve, but the core principle stays the same: **optimize signal-to-noise ratio in your token budget**.
---
*Based on Anthropic's "Effective context engineering for AI agents" (September 2025)*
+10
@@ -39,6 +39,14 @@
"usage/search-tools"
]
},
{
"group": "Best Practices",
"icon": "lightbulb",
"pages": [
"context-engineering",
"progressive-disclosure"
]
},
{
"group": "Configuration & Development",
"icon": "gear",
@@ -53,6 +61,8 @@
"icon": "diagram-project",
"pages": [
"architecture/overview",
"architecture-evolution",
"hooks-architecture",
"architecture/hooks",
"architecture/worker-service",
"architecture/database",
+784
@@ -0,0 +1,784 @@
# How Claude-Mem Uses Hooks: A Lifecycle-Driven Architecture
## Core Principle
**Observe the main Claude Code session from the outside, process observations in the background, inject context at the right time.**
---
## The Big Picture
Claude-Mem is fundamentally a **hook-driven system**. Every piece of functionality happens in response to lifecycle events:
```
┌─────────────────────────────────────────────────────────┐
│ CLAUDE CODE SESSION │
│ (Main session - user interacting with Claude) │
│ │
│ SessionStart → UserPromptSubmit → Tool Use → Stop │
│ ↓ ↓ ↓ ↓ │
│ [Hook] [Hook] [Hook] [Hook] │
└─────────────────────────────────────────────────────────┘
↓ ↓ ↓ ↓
┌─────────────────────────────────────────────────────────┐
│ CLAUDE-MEM SYSTEM │
│ │
│ Context New Session Observation Summary │
│ Injection Tracking Capture Generation │
└─────────────────────────────────────────────────────────┘
```
**Key insight:** Claude-Mem doesn't interrupt or modify Claude Code's behavior. It observes from the outside and provides value through lifecycle hooks.
---
## Why Hooks?
### The Non-Invasive Requirement
Claude-Mem had several architectural constraints:
1. **Can't modify Claude Code**: It's a closed-source binary
2. **Must be fast**: Can't slow down the main session
3. **Must be reliable**: Can't break Claude Code if it fails
4. **Must be portable**: Works on any project without configuration
**Solution:** External command hooks configured via settings.json
### The Hook System Advantage
Claude Code's hook system provides exactly what we need:
<CardGroup cols={2}>
<Card title="Lifecycle Events" icon="clock">
SessionStart, UserPromptSubmit, PostToolUse, Stop
</Card>
<Card title="Non-Blocking" icon="forward">
Hooks run in parallel, don't wait for completion
</Card>
<Card title="Context Injection" icon="upload">
SessionStart and UserPromptSubmit can add context
</Card>
<Card title="Tool Observation" icon="eye">
PostToolUse sees all tool inputs and outputs
</Card>
</CardGroup>
---
## The Five Hooks
### Hook 1: SessionStart (Context Hook)
**Purpose:** Inject relevant context from previous sessions
**When:** Claude Code starts or resumes
**What it does:**
1. Extracts project name from current working directory
2. Queries SQLite for recent session summaries (last 10)
3. Queries SQLite for recent observations (last 50)
4. Formats as progressive disclosure index
5. Outputs to stdout (automatically injected into context)
**Configuration:**
```json
{
"hooks": {
"SessionStart": [{
"matcher": "startup",
"hooks": [{
"type": "command",
"command": "${CLAUDE_PLUGIN_ROOT}/scripts/context-hook.js",
"timeout": 120
}]
}]
}
}
```
**Key decisions:**
- ✅ Only runs on "startup" (not "clear" or "compact")
- ✅ 120-second timeout for npm install (v4.3.1 fix)
- ✅ Uses `--loglevel=silent` for clean JSON output
- ✅ Progressive disclosure format (index, not full details)
**Output format:**
```markdown
# [claude-mem] recent context
**Legend:** 🎯 session-request | 🔴 gotcha | 🟡 problem-solution ...
### Oct 26, 2025
**General**
| ID | Time | T | Title | Tokens |
|----|------|---|-------|--------|
| #2586 | 12:58 AM | 🔵 | Context hook file empty | ~51 |
*Use claude-mem MCP search to access full details*
```
**Source:** `src/hooks/context-hook.ts` → `plugin/scripts/context-hook.js`
---
### Hook 2: UserPromptSubmit (New Session Hook)
**Purpose:** Initialize session tracking when user submits a prompt
**When:** Before Claude processes the user's message
**What it does:**
1. Reads user prompt and session ID from stdin
2. Creates new session record in SQLite
3. Saves raw user prompt for full-text search (v4.2.0+)
4. Starts PM2 worker service if not running
5. Returns immediately (non-blocking)
**Configuration:**
```json
{
"hooks": {
"UserPromptSubmit": [{
"hooks": [{
"type": "command",
"command": "${CLAUDE_PLUGIN_ROOT}/scripts/new-hook.js"
}]
}]
}
}
```
**Key decisions:**
- ✅ No matcher (runs for all prompts)
- ✅ Creates session record immediately
- ✅ Stores raw prompts for search (privacy note: local SQLite only)
- ✅ Auto-starts worker service
- ✅ Suppresses output (`suppressOutput: true`)
**Database operations:**
```sql
INSERT INTO sdk_sessions (claude_session_id, project, user_prompt, ...)
VALUES (?, ?, ?, ...)
INSERT INTO user_prompts (session_id, prompt, prompt_number, ...)
VALUES (?, ?, ?, ...)
```
**Source:** `src/hooks/new-hook.ts` → `plugin/scripts/new-hook.js`
---
### Hook 3: PostToolUse (Save Observation Hook)
**Purpose:** Capture tool execution observations for later processing
**When:** Immediately after any tool completes successfully
**What it does:**
1. Receives tool name, input, output from stdin
2. Finds active session for current project
3. Enqueues observation in observation_queue table
4. Returns immediately (processing happens in worker)
**Configuration:**
```json
{
"hooks": {
"PostToolUse": [{
"matcher": "*",
"hooks": [{
"type": "command",
"command": "${CLAUDE_PLUGIN_ROOT}/scripts/save-hook.js"
}]
}]
}
}
```
**Key decisions:**
- ✅ Matcher: `*` (captures all tools)
- ✅ Non-blocking (just enqueues, doesn't process)
- ✅ Worker processes observations asynchronously
- ✅ Parallel execution safe (each hook gets own stdin)
**Database operations:**
```sql
INSERT INTO observation_queue (session_id, tool_name, tool_input, tool_output, ...)
VALUES (?, ?, ?, ?, ...)
```
**What gets queued:**
```json
{
"session_id": "abc123",
"tool_name": "Edit",
"tool_input": {
"file_path": "/path/to/file.ts",
"old_string": "...",
"new_string": "..."
},
"tool_output": {
"success": true,
"linesChanged": 5
},
"created_at_epoch": 1698765432
}
```
**Source:** `src/hooks/save-hook.ts` → `plugin/scripts/save-hook.js`
---
### Hook 4: Summary Hook (Mid-Session Checkpoint)
**Purpose:** Generate AI-powered session summaries during the session
**When:** Triggered programmatically by the worker service
**What it does:**
1. Gathers session observations from database
2. Sends to Claude Agent SDK for summarization
3. Processes response and extracts structured summary
4. Stores in session_summaries table
**Configuration:**
```json
{
"hooks": {
"Summary": [{
"hooks": [{
"type": "command",
"command": "${CLAUDE_PLUGIN_ROOT}/scripts/summary-hook.js"
}]
}]
}
}
```
**Key decisions:**
- ✅ Triggered by worker, not by Claude Code lifecycle
- ✅ Multiple summaries per session (v4.2.0+)
- ✅ Summaries are checkpoints, not endings
- ✅ Uses Claude Agent SDK for AI compression
**Summary structure:**
```xml
<summary>
<request>User's original request</request>
<investigated>What was examined</investigated>
<learned>Key discoveries</learned>
<completed>Work finished</completed>
<next_steps>Remaining tasks</next_steps>
<files_read>
<file>path/to/file1.ts</file>
<file>path/to/file2.ts</file>
</files_read>
<files_modified>
<file>path/to/file3.ts</file>
</files_modified>
<notes>Additional context</notes>
</summary>
```
**Source:** `src/hooks/summary-hook.ts` → `plugin/scripts/summary-hook.js`
---
### Hook 5: SessionEnd (Cleanup Hook)
**Purpose:** Mark sessions as completed when they end
**When:** Claude Code session ends (not on `/clear`)
**What it does:**
1. Marks session as completed in database
2. Allows worker to finish processing
3. Performs graceful cleanup
**Configuration:**
```json
{
"hooks": {
"SessionEnd": [{
"hooks": [{
"type": "command",
"command": "${CLAUDE_PLUGIN_ROOT}/scripts/cleanup-hook.js"
}]
}]
}
}
```
**Key decisions:**
- ✅ Graceful completion (v4.1.0+)
- ✅ No longer sends DELETE to workers
- ✅ Skips cleanup on `/clear` commands
- ✅ Preserves ongoing sessions
**Why graceful cleanup?**
**Old approach (v3):**
```typescript
// ❌ Aggressive cleanup
SessionEnd → DELETE /worker/session → Worker stops immediately
```
**Problems:**
- Interrupted summary generation
- Lost pending observations
- Race conditions
**New approach (v4.1.0+):**
```typescript
// ✅ Graceful completion
SessionEnd → UPDATE sessions SET completed_at = NOW()
Worker sees completion → Finishes processing → Exits naturally
```
**Benefits:**
- Worker finishes important operations
- Summaries complete successfully
- Clean state transitions
**Source:** `src/hooks/cleanup-hook.ts` → `plugin/scripts/cleanup-hook.js`
---
## Hook Execution Flow
### Session Lifecycle
```mermaid
sequenceDiagram
participant User
participant Claude
participant Hooks
participant Worker
participant DB
User->>Claude: Start Claude Code
Claude->>Hooks: SessionStart hook
Hooks->>DB: Query recent context
DB-->>Hooks: Session summaries + observations
Hooks-->>Claude: Inject context
Note over Claude: Context available for session
User->>Claude: Submit prompt
Claude->>Hooks: UserPromptSubmit hook
Hooks->>DB: Create session record
Hooks->>Worker: Start worker (if not running)
Worker-->>DB: Ready to process
Claude->>Claude: Execute tools
Claude->>Hooks: PostToolUse (multiple times)
Hooks->>DB: Queue observations
Note over Worker: Polls queue, processes observations
Worker->>Worker: AI compression
Worker->>DB: Store compressed observations
Worker->>Hooks: Trigger summary hook
Hooks->>DB: Store session summary
User->>Claude: Finish
Claude->>Hooks: SessionEnd hook
Hooks->>DB: Mark session complete
Worker->>DB: Check completion
Worker->>Worker: Finish processing
Worker->>Worker: Exit gracefully
```
### Hook Timing
| Event | Timing | Blocking | Timeout | Output Handling |
|-------|--------|----------|---------|-----------------|
| **SessionStart** | Before session | No | 120s | stdout → context |
| **UserPromptSubmit** | Before processing | No | 60s | stdout → context |
| **PostToolUse** | After tool | No | 60s | Transcript only |
| **Summary** | Worker triggered | No | 300s | Database |
| **SessionEnd** | On exit | No | 60s | Log only |
---
## The Worker Service Architecture
### Why a Background Worker?
**Problem:** Hooks must be fast (< 1 second)
**Reality:** AI compression takes 5-30 seconds per observation
**Solution:** Hooks enqueue observations, worker processes async
```
┌─────────────────────────────────────────────────────────┐
│ HOOK (Fast) │
│ 1. Read stdin (< 1ms) │
│ 2. Insert into queue (< 10ms) │
│ 3. Return success (< 20ms total) │
└─────────────────────────────────────────────────────────┘
↓ (queue)
┌─────────────────────────────────────────────────────────┐
│ WORKER (Slow) │
│ 1. Poll queue every 1s │
│ 2. Process observation via Claude SDK (5-30s) │
│ 3. Parse and store results │
│ 4. Mark observation processed │
└─────────────────────────────────────────────────────────┘
```
### PM2 Process Management
**Technology:** PM2 (process manager for Node.js)
**Why PM2:**
- Auto-restart on failure
- Log management
- Process monitoring
- Cross-platform (works on macOS, Linux, Windows)
- No systemd/launchd needed
**Configuration:**
```javascript
// ecosystem.config.cjs
module.exports = {
apps: [{
name: 'claude-mem-worker',
script: './plugin/scripts/worker-service.cjs',
instances: 1,
autorestart: true,
watch: false,
max_memory_restart: '500M',
env: {
NODE_ENV: 'production',
CLAUDE_MEM_WORKER_PORT: 37777
}
}]
};
```
**Worker lifecycle:**
```bash
# Started by new-hook (if not running)
pm2 start ecosystem.config.cjs
# Status check
pm2 status claude-mem-worker
# View logs
pm2 logs claude-mem-worker
# Restart
pm2 restart claude-mem-worker
```
### Worker HTTP API
**Technology:** Express.js REST API on port 37777
**Endpoints:**
| Endpoint | Method | Purpose |
|----------|--------|---------|
| `/health` | GET | Health check |
| `/sessions` | POST | Create session |
| `/sessions/:id` | GET | Get session status |
| `/sessions/:id` | PATCH | Update session |
| `/observations` | POST | Enqueue observation |
| `/observations/:id` | GET | Get observation |
**Why HTTP API?**
- Language-agnostic (hooks can be written in any language)
- Easy debugging (curl commands)
- Standard error handling
- Proper async handling
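Because it is plain HTTP, the API is trivial to probe from any client. For example, a health check with Node's built-in `fetch` (the response fields are illustrative, not a documented schema):
```typescript
// Probe the worker's health endpoint on its fixed port
const res = await fetch('http://localhost:37777/health');
if (!res.ok) throw new Error(`Worker unhealthy: HTTP ${res.status}`);
console.log(await res.json()); // e.g. { status: "ok" }
```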
---
## Design Patterns
### Pattern 1: Fire-and-Forget Hooks
**Principle:** Hooks should return immediately, not wait for completion
```typescript
// ❌ Bad: Hook waits for processing
export async function saveHook(stdin: HookInput) {
const observation = parseInput(stdin);
await processObservation(observation); // BLOCKS!
return success();
}
// ✅ Good: Hook enqueues and returns
export async function saveHook(stdin: HookInput) {
const observation = parseInput(stdin);
await enqueueObservation(observation); // Fast
return success(); // Immediate
}
```
### Pattern 2: Queue-Based Processing
**Principle:** Decouple capture from processing
```
Hook (capture) → Queue (buffer) → Worker (process)
```
**Benefits:**
- Parallel hook execution safe
- Worker failure doesn't affect hooks
- Retry logic centralized
- Backpressure handling
### Pattern 3: Graceful Degradation
**Principle:** Memory system failure shouldn't break Claude Code
```typescript
try {
await captureObservation();
} catch (error) {
// Log error, but don't throw
console.error('Memory capture failed:', error);
return { continue: true, suppressOutput: true };
}
```
**Failure modes:**
- Database locked → Skip observation, log error
- Worker crashed → Auto-restart via PM2
- Network issue → Retry with exponential backoff
- Disk full → Warn user, disable memory
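The retry path for transient failures might look like this generic helper; the attempt count and delays are illustrative:
```typescript
// Retry an async operation with exponential backoff: 1s, 2s, 4s, ...
async function withRetry<T>(fn: () => Promise<T>, attempts = 3): Promise<T> {
  let lastError: unknown;
  for (let i = 0; i < attempts; i++) {
    try {
      return await fn();
    } catch (error) {
      lastError = error;
      await new Promise(resolve => setTimeout(resolve, 1000 * 2 ** i));
    }
  }
  throw lastError;
}
```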
### Pattern 4: Progressive Enhancement
**Principle:** Core functionality works without memory, memory enhances it
```
Without memory: Claude Code works normally
With memory: Claude Code + context from past sessions
Memory broken: Falls back to working normally
```
---
## Hook Debugging
### Debug Mode
Enable detailed hook execution logs:
```bash
claude --debug
```
**Output:**
```
[DEBUG] Executing hooks for PostToolUse:Write
[DEBUG] Getting matching hook commands for PostToolUse with query: Write
[DEBUG] Found 1 hook matchers in settings
[DEBUG] Matched 1 hooks for query "Write"
[DEBUG] Found 1 hook commands to execute
[DEBUG] Executing hook command: ${CLAUDE_PLUGIN_ROOT}/scripts/save-hook.js with timeout 60000ms
[DEBUG] Hook command completed with status 0: {"continue":true,"suppressOutput":true}
```
### Common Issues
<AccordionGroup>
<Accordion title="Hook not executing">
**Symptoms:** Hook command never runs
**Debugging:**
1. Check `/hooks` menu - is hook registered?
2. Verify matcher pattern (case-sensitive!)
3. Test command manually: `echo '{}' | node save-hook.js`
4. Check file permissions (executable?)
</Accordion>
<Accordion title="Hook times out">
**Symptoms:** Hook execution exceeds timeout
**Debugging:**
1. Check timeout setting (default 60s)
2. Identify slow operation (database? network?)
3. Move slow operation to worker
4. Increase timeout if necessary
</Accordion>
<Accordion title="Context not injecting">
**Symptoms:** SessionStart hook runs but context missing
**Debugging:**
1. Check stdout (must be valid JSON or plain text)
2. Verify no stderr output (pollutes JSON)
3. Check exit code (must be 0)
4. Look for npm install output (v4.3.1 fix)
</Accordion>
<Accordion title="Observations not captured">
**Symptoms:** PostToolUse hook runs but observations missing
**Debugging:**
1. Check database: `sqlite3 ~/.claude-mem/claude-mem.db "SELECT * FROM observation_queue"`
2. Verify session exists: `SELECT * FROM sdk_sessions`
3. Check worker status: `pm2 status`
4. View worker logs: `pm2 logs claude-mem-worker`
</Accordion>
</AccordionGroup>
### Testing Hooks Manually
```bash
# Test context hook
echo '{
"session_id": "test123",
"cwd": "/Users/alex/projects/my-app",
"hook_event_name": "SessionStart",
"source": "startup"
}' | node plugin/scripts/context-hook.js
# Test save hook
echo '{
"session_id": "test123",
"tool_name": "Edit",
"tool_input": {"file_path": "test.ts"},
"tool_output": {"success": true}
}' | node plugin/scripts/save-hook.js
# Test with actual Claude Code
claude --debug
/hooks # View registered hooks
# Submit prompt and watch debug output
```
---
## Performance Considerations
### Hook Execution Time
**Target:** < 100ms per hook
**Actual measurements:**
| Hook | Average | p95 | p99 |
|------|---------|-----|-----|
| SessionStart | 45ms | 120ms | 250ms |
| UserPromptSubmit | 12ms | 25ms | 50ms |
| PostToolUse | 8ms | 15ms | 30ms |
| SessionEnd | 5ms | 10ms | 20ms |
**Why SessionStart is slower:**
- npm install check (idempotent but runs every time)
- Database query for 10 sessions + 50 observations
- Formatting progressive disclosure index
**Optimization (v4.3.1):**
- Use `--loglevel=silent` for npm install
- Cache package.json hash to skip unnecessary installs
- Use prepared statements for database queries
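The package.json hash cache mentioned above could be as simple as the following sketch (the cache file name and working directory are assumptions):
```typescript
import { createHash } from 'node:crypto';
import { existsSync, readFileSync, writeFileSync } from 'node:fs';
import { execSync } from 'node:child_process';

// Skip `npm install` entirely when package.json hasn't changed since last run
const CACHE = '.claude-mem-pkg-hash';
const hash = createHash('sha256').update(readFileSync('package.json')).digest('hex');

if (!existsSync(CACHE) || readFileSync(CACHE, 'utf8') !== hash) {
  execSync('npm install --prefer-offline --loglevel=silent', { stdio: 'ignore' });
  writeFileSync(CACHE, hash);
}
```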
### Database Performance
**Schema optimizations:**
- Indexes on `project`, `created_at_epoch`, `claude_session_id`
- FTS5 virtual tables for full-text search
- WAL mode for concurrent reads/writes
**Query patterns:**
```sql
-- Fast: Uses index on (project, created_at_epoch)
SELECT * FROM session_summaries
WHERE project = ?
ORDER BY created_at_epoch DESC
LIMIT 10
-- Fast: Uses index on claude_session_id
SELECT * FROM sdk_sessions
WHERE claude_session_id = ?
LIMIT 1
-- Fast: FTS5 full-text search
SELECT * FROM observations_fts
WHERE observations_fts MATCH ?
ORDER BY rank
LIMIT 20
```
### Worker Throughput
**Bottleneck:** Claude API latency (5-30s per observation)
**Mitigation:**
- Process observations sequentially (simpler, more predictable)
- Skip low-value observations (TodoWrite, ListMcpResourcesTool)
- Batch summaries (generate every N observations, not every observation)
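The skip list can be a simple set lookup before enqueueing, for example:
```typescript
// Tools whose outputs rarely carry durable insight are dropped before queueing
const SKIP_TOOLS = new Set(['TodoWrite', 'ListMcpResourcesTool']);

function shouldEnqueue(toolName: string): boolean {
  return !SKIP_TOOLS.has(toolName);
}

shouldEnqueue('Edit');      // true  → worth compressing
shouldEnqueue('TodoWrite'); // false → dropped
```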
**Future optimization:**
- Parallel processing (multiple workers)
- Smart batching (combine related observations)
- Lazy summarization (summarize only when needed)
---
## Security Considerations
### Hook Command Safety
**Risk:** Hooks execute arbitrary commands with user permissions
**Mitigations:**
1. **Frozen at startup:** Hook configuration captured at start, changes require review
2. **User review required:** `/hooks` menu shows changes, requires approval
3. **Plugin isolation:** `${CLAUDE_PLUGIN_ROOT}` prevents path traversal
4. **Input validation:** Hooks validate stdin schema before processing
### Data Privacy
**What gets stored:**
- User prompts (raw text) - v4.2.0+
- Tool inputs and outputs
- File paths read/modified
- Session summaries
**Privacy guarantees:**
- All data stored locally in `~/.claude-mem/claude-mem.db`
- No cloud uploads (API calls only for AI compression)
- SQLite file permissions: user-only read/write
- No analytics or telemetry
### API Key Protection
**Configuration:**
- Anthropic API key in `~/.anthropic/api_key` or `ANTHROPIC_API_KEY` env var
- Worker inherits environment from Claude Code
- Never logged or stored in database
---
## Key Takeaways
1. **Hooks are interfaces**: They define clean boundaries between systems
2. **Non-blocking is critical**: Hooks must return fast, workers do the heavy lifting
3. **Graceful degradation**: Memory system can fail without breaking Claude Code
4. **Queue-based decoupling**: Capture and processing happen independently
5. **Progressive disclosure**: Context injection uses index-first approach
6. **Lifecycle alignment**: Each hook has a clear, single purpose
---
## Further Reading
- [Claude Code Hooks Reference](https://docs.claude.com/claude-code/hooks) - Official documentation
- [Progressive Disclosure](/docs/progressive-disclosure) - Context priming philosophy
- [Architecture Evolution](/docs/architecture-evolution) - v3 to v4 journey
- [Worker Service Design](/docs/worker-service) - Background processing details
---
*The hook-driven architecture enables Claude-Mem to be both powerful and invisible. Users never notice the memory system working - it just makes Claude smarter over time.*
+655
@@ -0,0 +1,655 @@
# Progressive Disclosure: Claude-Mem's Context Priming Philosophy
## Core Principle
**Show what exists and its retrieval cost first. Let the agent decide what to fetch based on relevance and need.**
---
## What is Progressive Disclosure?
Progressive disclosure is an information architecture pattern where you reveal complexity gradually rather than all at once. In the context of AI agents, it means:
1. **Layer 1 (Index)**: Show lightweight metadata (titles, dates, types, token counts)
2. **Layer 2 (Details)**: Fetch full content only when needed
3. **Layer 3 (Deep Dive)**: Read original source files if required
This mirrors how humans work: We scan headlines before reading articles, review table of contents before diving into chapters, and check file names before opening files.
---
## The Problem: Context Pollution
Traditional RAG (Retrieval-Augmented Generation) systems fetch everything upfront:
```
❌ Traditional Approach:
┌─────────────────────────────────────┐
│ Session Start │
│ │
│ [15,000 tokens of past sessions] │
│ [8,000 tokens of observations] │
│ [12,000 tokens of file summaries] │
│ │
│ Total: 35,000 tokens │
│ Relevant: ~2,000 tokens (6%) │
└─────────────────────────────────────┘
```
**Problems:**
- Wastes 94% of attention budget on irrelevant context
- User prompt gets buried under mountain of history
- Agent must process everything before understanding task
- No way to know what's actually useful until after reading
---
## Claude-Mem's Solution: Progressive Disclosure
```
✅ Progressive Disclosure Approach:
┌─────────────────────────────────────┐
│ Session Start │
│ │
│ Index of 50 observations: ~800 tokens│
│ ↓ │
│ Agent sees: "🔴 Hook timeout issue" │
│ Agent decides: "Relevant!" │
│ ↓ │
│ Fetch observation #2543: ~120 tokens│
│ │
│ Total: 920 tokens │
│ Relevant: 920 tokens (100%) │
└─────────────────────────────────────┘
```
**Benefits:**
- Agent controls its own context consumption
- Directly relevant to current task
- Can fetch more if needed
- Can skip everything if not relevant
- Clear cost/benefit for each retrieval decision
---
## How It Works in Claude-Mem
### The Index Format
Every SessionStart hook provides a compact index:
```markdown
### Oct 26, 2025
**General**
| ID | Time | T | Title | Tokens |
|----|------|---|-------|--------|
| #2586 | 12:58 AM | 🔵 | Context hook file exists but is empty | ~51 |
| #2587 | ″ | 🔵 | Context hook script file is empty | ~46 |
| #2589 | ″ | 🟡 | Investigated hook debug output docs | ~105 |
**src/hooks/context-hook.ts**
| ID | Time | T | Title | Tokens |
|----|------|---|-------|--------|
| #2591 | 1:15 AM | ⚖️ | Stderr messaging abandoned | ~155 |
| #2592 | 1:16 AM | ⚖️ | Web UI strategy redesigned | ~193 |
```
**What the agent sees:**
- **What exists**: Observation titles give semantic meaning
- **When it happened**: Timestamps for temporal context
- **What type**: Icons indicate observation category
- **Retrieval cost**: Token counts for informed decisions
- **Where to get it**: MCP search tools referenced at bottom
### The Legend System
```
🎯 session-request - User's original goal
🔴 gotcha - Critical edge case or pitfall
🟡 problem-solution - Bug fix or workaround
🔵 how-it-works - Technical explanation
🟢 what-changed - Code/architecture change
🟣 discovery - Learning or insight
🟠 why-it-exists - Design rationale
🟤 decision - Architecture decision
⚖️ trade-off - Deliberate compromise
```
**Purpose:**
- Visual scanning (humans and AI both benefit)
- Semantic categorization
- Priority signaling (🔴 gotchas are more critical)
- Pattern recognition across sessions
### Progressive Disclosure Instructions
The index includes usage guidance:
```markdown
💡 **Progressive Disclosure:** This index shows WHAT exists and retrieval COST.
- Use MCP search tools to fetch full observation details on-demand
- Prefer searching observations over re-reading code for past decisions
- Critical types (🔴 gotcha, 🟤 decision, ⚖️ trade-off) often worth fetching immediately
```
**What this does:**
- Teaches the agent the pattern
- Suggests when to fetch (critical types)
- Recommends search over code re-reading (efficiency)
- Makes the system self-documenting
---
## The Philosophy: Context as Currency
### Mental Model: Token Budget as Money
Think of context window as a bank account:
| Approach | Metaphor | Outcome |
|----------|----------|---------|
| **Dump everything** | Spending your entire paycheck on groceries you might need someday | Waste, clutter, can't afford what you actually need |
| **Fetch nothing** | Refusing to spend any money | Starvation, can't accomplish tasks |
| **Progressive disclosure** | Check your pantry, make a shopping list, buy only what you need | Efficiency, room for unexpected needs |
### The Attention Budget
LLMs have finite attention:
- Every token attends to every other token (n² relationships)
- A 100,000-token window ≠ 100,000 tokens of useful attention
- Context "rot" sets in as the window fills
- Later tokens get less attention than earlier ones
**Claude-Mem's approach:**
- Start with ~1,000 tokens of index
- Agent has 99,000 tokens free for task
- Agent fetches ~200 tokens when needed
- Final budget: ~98,000 tokens for actual work
### Design for Autonomy
> "As models improve, let them act intelligently"
Progressive disclosure treats the agent as an **intelligent information forager**, not a passive recipient of pre-selected context.
**Traditional RAG:**
```
System → [Decides relevance] → Agent
Hope this helps!
```
**Progressive Disclosure:**
```
System → [Shows index] → Agent → [Decides relevance] → [Fetches details]
You know best!
```
The agent knows:
- The current task context
- What information would help
- How much budget to spend
- When to stop searching
We don't.
---
## Implementation Principles
### 1. Make Costs Visible
Every item in the index shows token count:
```
| #2591 | 1:15 AM | ⚖️ | Stderr messaging abandoned | ~155 |
^^^^
Retrieval cost
```
**Why:**
- Agent can make informed ROI decisions
- Small observations (~50 tokens) are "cheap" to fetch
- Large observations (~500 tokens) require stronger justification
- Matches how humans think about effort
### 2. Use Semantic Compression
Titles compress full observations into ~10 words:
**Bad title:**
```
Observation about a thing
```
**Good title:**
```
🔴 Hook timeout issue: 60s default too short for npm install
```
**What makes a good title:**
- Specific: Identifies exact issue
- Actionable: Clear what to do
- Self-contained: Doesn't require reading observation
- Searchable: Contains key terms (hook, timeout, npm)
- Categorized: Icon indicates type
### 3. Group by Context
Observations are grouped by:
- **Date**: Temporal context
- **File path**: Spatial context (work on specific files)
- **Project**: Logical context
```markdown
**src/hooks/context-hook.ts**
| ID | Time | T | Title | Tokens |
|----|------|---|-------|--------|
| #2591 | 1:15 AM | ⚖️ | Stderr messaging abandoned | ~155 |
| #2594 | 1:17 AM | 🟠 | Removed stderr section from docs | ~93 |
```
**Benefit:** If agent is working on `src/hooks/context-hook.ts`, related observations are already grouped together.
### 4. Provide Retrieval Tools
The index is useless without retrieval mechanisms:
```markdown
*Use claude-mem MCP search to access records with the given ID*
```
**Available tools:**
- `search_observations` - Full-text search
- `find_by_concept` - Concept-based retrieval
- `find_by_file` - File-based retrieval
- `find_by_type` - Type-based retrieval
- `get_recent_context` - Recent session summaries
Each tool supports `format: "index"` (default) and `format: "full"`.
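Illustrative call shapes for two of these tools; parameter names beyond `format` and `limit` are assumptions, declared here as stubs rather than the real MCP bindings:
```typescript
// Stub signatures for illustration only
declare function find_by_file(args: { file: string; format: 'index' | 'full'; limit?: number }): Promise<string>;
declare function find_by_type(args: { type: string; format: 'index' | 'full'; limit?: number }): Promise<string>;

await find_by_file({ file: 'src/hooks/context-hook.ts', format: 'index' });
await find_by_type({ type: 'gotcha', format: 'full', limit: 2 });
```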
---
## Real-World Example
### Scenario: Agent asked to fix a bug in hooks
**Without progressive disclosure:**
```
SessionStart injects 25,000 tokens of past context
Agent reads everything
Agent finds 1 relevant observation (buried in middle)
Total tokens consumed: 25,000
Relevant tokens: ~200
Efficiency: 0.8%
```
**With progressive disclosure:**
```
SessionStart shows index: ~800 tokens
Agent sees title: "🔴 Hook timeout issue: 60s too short"
Agent thinks: "This looks relevant to my bug!"
Agent fetches observation #2543: ~155 tokens
Total tokens consumed: 955
Relevant tokens: 955
Efficiency: 100%
```
### The Index Entry
```markdown
| #2543 | 2:14 PM | 🔴 | Hook timeout: 60s too short for npm install | ~155 |
```
**What the agent learns WITHOUT fetching:**
- There's a known gotcha (🔴) about hook timeouts
- It's related to npm install taking too long
- Full details are ~155 tokens (cheap)
- Happened at 2:14 PM (recent)
**Decision tree:**
```
Is my task related to hooks? → YES
Is my task related to timeouts? → YES
Is my task related to npm? → YES
155 tokens is cheap → FETCH IT
```
---
## The Two-Tier Search Strategy
Claude-Mem implements progressive disclosure in search results too:
### Tier 1: Index Format (Default)
```typescript
search_observations({
query: "hook timeout",
format: "index" // Default
})
```
**Returns:**
```
Found 3 observations matching "hook timeout":
| ID | Date | Type | Title | Tokens |
|----|------|------|-------|--------|
| #2543 | Oct 26 | gotcha | Hook timeout: 60s too short | ~155 |
| #2891 | Oct 25 | how-it-works | Hook timeout configuration | ~203 |
| #2102 | Oct 20 | problem-solution | Fixed timeout in CI | ~89 |
```
**Cost:** ~100 tokens for 3 results
**Value:** Agent can scan and decide which to fetch
### Tier 2: Full Format (On-Demand)
```typescript
search_observations({
query: "hook timeout",
format: "full",
limit: 1 // Fetch just the most relevant
})
```
**Returns:**
```
#2543 🔴 Hook timeout: 60s too short for npm install
─────────────────────────────────────────────────
Date: Oct 26, 2025 2:14 PM
Type: gotcha
Project: claude-mem
Narrative:
Discovered that the default 60-second hook timeout is insufficient
for npm install operations, especially with large dependency trees
or slow network conditions. This causes SessionStart hook to fail
silently, preventing context injection.
Facts:
- Default timeout: 60 seconds
- npm install with cold cache: ~90 seconds
- Configured timeout: 120 seconds in plugin/hooks/hooks.json:25
Files Modified:
- plugin/hooks/hooks.json
Concepts: hooks, timeout, npm, configuration
```
**Cost:** ~155 tokens for full details
**Value:** Complete understanding of the issue
---
## Cognitive Load Theory
Progressive disclosure is grounded in **Cognitive Load Theory**:
### Intrinsic Load
The inherent difficulty of the task itself.
**Example:** "Fix authentication bug"
- Must understand auth system
- Must understand the bug
- Must write the fix
This load is unavoidable.
### Extraneous Load
The cognitive burden of poorly presented information.
**Traditional RAG adds extraneous load:**
- Scanning irrelevant observations
- Filtering out noise
- Remembering what to ignore
- Re-contextualizing after each section
**Progressive disclosure minimizes extraneous load:**
- Scan titles (low effort)
- Fetch only relevant (targeted effort)
- Full attention on current task
### Germane Load
The effort of building mental models and schemas.
**Progressive disclosure supports germane load:**
- Consistent structure (legend, grouping)
- Clear categorization (types, icons)
- Semantic compression (good titles)
- Explicit costs (token counts)
---
## Anti-Patterns to Avoid
### ❌ Verbose Titles
**Bad:**
```
| #2543 | 2:14 PM | 🔴 | Investigation into the issue where hooks time out | ~155 |
```
**Good:**
```
| #2543 | 2:14 PM | 🔴 | Hook timeout: 60s too short for npm install | ~155 |
```
### ❌ Hiding Costs
**Bad:**
```
| #2543 | 2:14 PM | 🔴 | Hook timeout issue |
```
**Good:**
```
| #2543 | 2:14 PM | 🔴 | Hook timeout issue | ~155 |
```
### ❌ No Retrieval Path
**Bad:**
```
Here are 10 observations. [No instructions on how to get full details]
```
**Good:**
```
Here are 10 observations.
*Use MCP search tools to fetch full observation details on-demand*
```
### ❌ Defaulting to Full Format
**Bad:**
```typescript
search_observations({
query: "hooks",
format: "full" // Fetches everything
})
```
**Good:**
```typescript
search_observations({
query: "hooks",
format: "index", // Scan first
limit: 20
})
// Then, if needed:
search_observations({
query: "hooks",
format: "full",
limit: 1 // Just the most relevant
})
```
---
## Key Design Decisions
### Why Token Counts?
**Decision:** Show approximate token counts (~155, ~203) rather than exact counts.
**Rationale:**
- Communicates scale (50 vs 500) without false precision
- Maps to human intuition (small/medium/large)
- Allows agent to budget attention
- Encourages cost-conscious retrieval
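A sketch of how such an estimate could be produced; the four-characters-per-token rule of thumb is a common approximation for English text, not claude-mem's confirmed method:
```typescript
// The "~" prefix in the index signals an estimate, not an exact count
const approxTokens = (text: string): number => Math.ceil(text.length / 4);

const row = (id: number, title: string, body: string) =>
  `| #${id} | ${title} | ~${approxTokens(body)} |`;
```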
### Why Icons Instead of Text Labels?
**Decision:** Use emoji icons (🔴, 🟡, 🔵) rather than text (GOTCHA, PROBLEM, HOWTO).
**Rationale:**
- Visual scanning (pattern recognition)
- Token efficient (1 char vs 10 chars)
- Language-agnostic
- Aesthetically distinct
- Works for both humans and AI
### Why Index-First, Not Smart Pre-Fetch?
**Decision:** Always show index first, even if we "know" what's relevant.
**Rationale:**
- We can't know what's relevant better than the agent
- Pre-fetching assumes we understand the task
- Agent knows current context, we don't
- Respects agent autonomy
- Fails gracefully (can always fetch more)
### Why Group by File Path?
**Decision:** Group observations by file path in addition to date.
**Rationale:**
- Spatial locality: Work on file X likely needs context about file X
- Reduces scanning effort
- Matches how developers think
- Clear semantic boundaries
---
## Measuring Success
Progressive disclosure is working when:
### ✅ Low Waste Ratio
```
Relevant Tokens / Total Context Tokens > 80%
```
Most of the context consumed is actually useful.
### ✅ Selective Fetching
```
Index Shown: 50 observations
Details Fetched: 2-3 observations
```
Agent is being selective, not fetching everything.
### ✅ Fast Task Completion
```
Session with index: 30 seconds to find relevant context
Session without: 90 seconds scanning all context
```
Time-to-relevant-information is faster.
### ✅ Appropriate Depth
```
Simple task: Only index needed
Medium task: 1-2 observations fetched
Complex task: 5-10 observations + code reads
```
Depth scales with task complexity.
---
## Future Enhancements
### Adaptive Index Size
```typescript
// Vary index size based on session type
SessionStart({ source: "startup" }):
→ Show last 10 sessions (small index)
SessionStart({ source: "resume" }):
→ Show only current session (micro index)
SessionStart({ source: "compact" }):
→ Show last 20 sessions (larger index)
```
### Relevance Scoring
```typescript
// Use embeddings to pre-sort index by relevance
search_observations({
query: "authentication bug",
format: "index",
sort: "relevance" // Based on semantic similarity
})
```
### Cost Forecasting
```markdown
💡 **Budget Estimate:**
- Fetching all 🔴 gotchas: ~450 tokens
- Fetching all file-related: ~1,200 tokens
- Fetching everything: ~8,500 tokens
```
### Progressive Detail Levels
```
Layer 1: Index (titles only)
Layer 2: Summaries (2-3 sentences)
Layer 3: Full details (complete observation)
Layer 4: Source files (referenced code)
```
---
## Key Takeaways
1. **Show, don't tell**: Index reveals what exists without forcing consumption
2. **Cost-conscious**: Make retrieval costs visible for informed decisions
3. **Agent autonomy**: Let the agent decide what's relevant
4. **Semantic compression**: Good titles make or break the system
5. **Consistent structure**: Patterns reduce cognitive load
6. **Two-tier everything**: Index first, details on-demand
7. **Context as currency**: Spend wisely on high-value information
---
## Remember
> "The best interface is one that disappears when not needed, and appears exactly when it is."
Progressive disclosure respects the agent's intelligence and autonomy. We provide the map; the agent chooses the path.
---
## Further Reading
- [Context Engineering for AI Agents](/docs/context-engineering) - Foundational principles
- [Claude-Mem Architecture](/docs/architecture) - How it all fits together
- Cognitive Load Theory (Sweller, 1988)
- Information Foraging Theory (Pirolli & Card, 1999)
- Progressive Disclosure (Nielsen Norman Group)
---
*This philosophy emerged from real-world usage of Claude-Mem across hundreds of coding sessions. The pattern works because it aligns with both human cognition and LLM attention mechanics.*
+5
@@ -8,6 +8,11 @@
"type": "command",
"command": "cd \"${CLAUDE_PLUGIN_ROOT}/..\" && npm install --prefer-offline --no-audit --no-fund --loglevel=silent && node ${CLAUDE_PLUGIN_ROOT}/scripts/context-hook.js",
"timeout": 300
},
{
"type": "command",
"command": "node ${CLAUDE_PLUGIN_ROOT}/scripts/stderr-test-hook.js",
"timeout": 10
}
]
}
+3
@@ -0,0 +1,3 @@
#!/usr/bin/env node
console.error("\u{1F9EA} TEST: This is a stderr message from the claude-mem hook");process.exit(0);
+2 -1
@@ -17,7 +17,8 @@ const HOOKS = [
{ name: 'new-hook', source: 'src/hooks/new-hook.ts' },
{ name: 'save-hook', source: 'src/hooks/save-hook.ts' },
{ name: 'summary-hook', source: 'src/hooks/summary-hook.ts' },
{ name: 'cleanup-hook', source: 'src/hooks/cleanup-hook.ts' }
{ name: 'cleanup-hook', source: 'src/hooks/cleanup-hook.ts' },
{ name: 'stderr-test-hook', source: 'src/hooks/stderr-test-hook.ts' }
];
const WORKER_SERVICE = {
+12
@@ -0,0 +1,12 @@
#!/usr/bin/env node
/**
* Test hook to verify if stderr messages appear in Claude Code UI
* This hook simply outputs a message via console.error()
*/
// Output a test message to stderr
console.error('🧪 TEST: This is a stderr message from the claude-mem hook');
// Exit successfully
process.exit(0);