Add stderr test hook for UI experiment
This commit is contained in:
@@ -79,9 +79,15 @@ npx mintlify dev
|
||||
- **[Usage Guide](docs/usage/getting-started.mdx)** - How Claude-Mem works automatically
|
||||
- **[MCP Search Tools](docs/usage/search-tools.mdx)** - Query your project history
|
||||
|
||||
### Best Practices
|
||||
- **[Context Engineering](docs/context-engineering.mdx)** - AI agent context optimization principles
|
||||
- **[Progressive Disclosure](docs/progressive-disclosure.mdx)** - Philosophy behind Claude-Mem's context priming strategy
|
||||
|
||||
### Architecture
|
||||
- **[Overview](docs/architecture/overview.mdx)** - System components & data flow
|
||||
- **[Hooks](docs/architecture/hooks.mdx)** - 5 lifecycle hooks explained
|
||||
- **[Architecture Evolution](docs/architecture-evolution.mdx)** - The journey from v3 to v4
|
||||
- **[Hooks Architecture](docs/hooks-architecture.mdx)** - How Claude-Mem uses lifecycle hooks
|
||||
- **[Hooks Reference](docs/architecture/hooks.mdx)** - 5 lifecycle hooks explained
|
||||
- **[Worker Service](docs/architecture/worker-service.mdx)** - HTTP API & PM2 management
|
||||
- **[Database](docs/architecture/database.mdx)** - SQLite schema & FTS5 search
|
||||
- **[MCP Search](docs/architecture/mcp-search.mdx)** - 7 search tools & examples
|
||||
|
||||
@@ -0,0 +1,801 @@
|
||||
# Architecture Evolution: The Journey from v3 to v4
|
||||
|
||||
## The Problem We Solved
|
||||
|
||||
**Goal:** Create a memory system that makes Claude smarter across sessions without the user noticing it exists.
|
||||
|
||||
**Challenge:** How do you observe AI agent behavior, compress it intelligently, and serve it back at the right time - all without slowing down or interfering with the main workflow?
|
||||
|
||||
This is the story of how claude-mem evolved from a simple idea to a production-ready system, and the key architectural decisions that made it work.
|
||||
|
||||
---
|
||||
|
||||
## v1-v2: The Naive Approach
|
||||
|
||||
### The First Attempt: Dump Everything
|
||||
|
||||
**Architecture:**
|
||||
```
|
||||
PostToolUse Hook → Save raw tool outputs → Retrieve everything on startup
|
||||
```
|
||||
|
||||
**What we learned:**
|
||||
- ❌ Context pollution (thousands of tokens of irrelevant data)
|
||||
- ❌ No compression (raw tool outputs are verbose)
|
||||
- ❌ No search (had to scan everything linearly)
|
||||
- ✅ Proved the concept: Memory across sessions is valuable
|
||||
|
||||
**Example of what went wrong:**
|
||||
```
|
||||
SessionStart loaded:
|
||||
- 150 file read operations
|
||||
- 80 grep searches
|
||||
- 45 bash commands
|
||||
- Total: ~35,000 tokens
|
||||
- Relevant to current task: ~500 tokens (1.4%)
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
## v3: Smart Compression, Wrong Architecture
|
||||
|
||||
### The Breakthrough: AI-Powered Compression
|
||||
|
||||
**New idea:** Use Claude itself to compress observations
|
||||
|
||||
**Architecture:**
|
||||
```
|
||||
PostToolUse Hook → Queue observation → SDK Worker → AI compression → Store insights
|
||||
```
|
||||
|
||||
**What we added:**
|
||||
1. **Claude Agent SDK integration** - Use AI to compress observations
|
||||
2. **Background worker** - Don't block main session
|
||||
3. **Structured observations** - Extract facts, decisions, insights
|
||||
4. **Session summaries** - Generate comprehensive summaries
|
||||
|
||||
**What worked:**
|
||||
- ✅ Compression ratio: 10:1 to 100:1
|
||||
- ✅ Semantic understanding (not just keyword matching)
|
||||
- ✅ Background processing (hooks stayed fast)
|
||||
- ✅ Search became useful
|
||||
|
||||
**What didn't work:**
|
||||
- ❌ Still loaded everything upfront
|
||||
- ❌ Session ID management was broken
|
||||
- ❌ Aggressive cleanup interrupted summaries
|
||||
- ❌ Multiple SDK sessions per Claude Code session
|
||||
|
||||
---
|
||||
|
||||
## The Key Realizations
|
||||
|
||||
### Realization 1: Progressive Disclosure
|
||||
|
||||
**Problem:** Even compressed observations can pollute context if you load them all.
|
||||
|
||||
**Insight:** Humans don't read everything before starting work. Why should AI?
|
||||
|
||||
**Solution:** Show an index first, fetch details on-demand.
|
||||
|
||||
```
|
||||
❌ Old: Load 50 observations (8,500 tokens)
|
||||
✅ New: Show index of 50 observations (800 tokens)
|
||||
Agent fetches 2-3 relevant ones (300 tokens)
|
||||
Total: 1,100 tokens vs 8,500 tokens
|
||||
```
|
||||
|
||||
**Impact:**
|
||||
- 87% reduction in context usage
|
||||
- 100% relevance (only fetch what's needed)
|
||||
- Agent autonomy (decides what's relevant)
|
||||
|
||||
### Realization 2: Session ID Chaos
|
||||
|
||||
**Problem:** SDK session IDs change on every turn.
|
||||
|
||||
**What we thought:**
|
||||
```typescript
|
||||
// ❌ Wrong assumption
|
||||
UserPromptSubmit → Capture session ID once → Use forever
|
||||
```
|
||||
|
||||
**Reality:**
|
||||
```typescript
|
||||
// ✅ Actual behavior
|
||||
Turn 1: session_abc123
|
||||
Turn 2: session_def456
|
||||
Turn 3: session_ghi789
|
||||
```
|
||||
|
||||
**Why this matters:**
|
||||
- Can't resume sessions without tracking ID updates
|
||||
- Session state gets lost between turns
|
||||
- Observations get orphaned
|
||||
|
||||
**Solution:**
|
||||
```typescript
|
||||
// Capture from system init message
|
||||
for await (const msg of response) {
|
||||
if (msg.type === 'system' && msg.subtype === 'init') {
|
||||
sdkSessionId = msg.session_id;
|
||||
await updateSessionId(sessionId, sdkSessionId);
|
||||
}
|
||||
}
|
||||
```
|
||||
|
||||
### Realization 3: Graceful vs Aggressive Cleanup
|
||||
|
||||
**v3 approach:**
|
||||
```typescript
|
||||
// ❌ Aggressive: Kill worker immediately
|
||||
SessionEnd → DELETE /worker/session → Worker stops
|
||||
```
|
||||
|
||||
**Problems:**
|
||||
- Summary generation interrupted mid-process
|
||||
- Pending observations lost
|
||||
- Race conditions everywhere
|
||||
|
||||
**v4 approach:**
|
||||
```typescript
|
||||
// ✅ Graceful: Let worker finish
|
||||
SessionEnd → Mark session complete → Worker finishes → Exit naturally
|
||||
```
|
||||
|
||||
**Benefits:**
|
||||
- Summaries complete successfully
|
||||
- No lost observations
|
||||
- Clean state transitions
|
||||
|
||||
**Code:**
|
||||
```typescript
|
||||
// v3: Aggressive
|
||||
async function sessionEnd(sessionId: string) {
|
||||
await fetch(`http://localhost:37777/sessions/${sessionId}`, {
|
||||
method: 'DELETE'
|
||||
});
|
||||
}
|
||||
|
||||
// v4: Graceful
|
||||
async function sessionEnd(sessionId: string) {
|
||||
await db.run(
|
||||
'UPDATE sdk_sessions SET completed_at = ? WHERE id = ?',
|
||||
[Date.now(), sessionId]
|
||||
);
|
||||
}
|
||||
```
|
||||
|
||||
### Realization 4: One Session, Not Many
|
||||
|
||||
**Problem:** We were creating multiple SDK sessions per Claude Code session.
|
||||
|
||||
**What we thought:**
|
||||
```
|
||||
Claude Code session → Create SDK session per observation → 100+ SDK sessions
|
||||
```
|
||||
|
||||
**Reality should be:**
|
||||
```
|
||||
Claude Code session → ONE long-running SDK session → Streaming input
|
||||
```
|
||||
|
||||
**Why this matters:**
|
||||
- SDK maintains conversation state
|
||||
- Context accumulates naturally
|
||||
- Much more efficient
|
||||
|
||||
**Implementation:**
|
||||
```typescript
|
||||
// ✅ Streaming Input Mode
|
||||
async function* messageGenerator(): AsyncIterable<UserMessage> {
|
||||
// Initial prompt
|
||||
yield {
|
||||
role: "user",
|
||||
content: "You are a memory assistant..."
|
||||
};
|
||||
|
||||
// Then continuously yield observations
|
||||
while (session.status === 'active') {
|
||||
const observations = await pollQueue();
|
||||
for (const obs of observations) {
|
||||
yield {
|
||||
role: "user",
|
||||
content: formatObservation(obs)
|
||||
};
|
||||
}
|
||||
await sleep(1000);
|
||||
}
|
||||
}
|
||||
|
||||
const response = query({
|
||||
prompt: messageGenerator(),
|
||||
options: { maxTurns: 1000 }
|
||||
});
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
## v4: The Architecture That Works
|
||||
|
||||
### The Core Design
|
||||
|
||||
```
|
||||
┌─────────────────────────────────────────────────────────┐
|
||||
│ CLAUDE CODE SESSION │
|
||||
│ User → Claude → Tools (Read, Edit, Write, Bash) │
|
||||
│ ↓ │
|
||||
│ PostToolUse Hook │
|
||||
│ (queues observation) │
|
||||
└─────────────────────────────────────────────────────────┘
|
||||
↓ SQLite queue
|
||||
┌─────────────────────────────────────────────────────────┐
|
||||
│ SDK WORKER PROCESS │
|
||||
│ ONE streaming session per Claude Code session │
|
||||
│ │
|
||||
│ AsyncIterable<UserMessage> │
|
||||
│ → Yields observations from queue │
|
||||
│ → SDK compresses via AI │
|
||||
│ → Parses XML responses │
|
||||
│ → Stores in database │
|
||||
└─────────────────────────────────────────────────────────┘
|
||||
↓ SQLite storage
|
||||
┌─────────────────────────────────────────────────────────┐
|
||||
│ NEXT SESSION │
|
||||
│ SessionStart Hook │
|
||||
│ → Queries database │
|
||||
│ → Returns progressive disclosure index │
|
||||
│ → Agent fetches details via MCP │
|
||||
└─────────────────────────────────────────────────────────┘
|
||||
```
|
||||
|
||||
### The Five Hook Architecture
|
||||
|
||||
<Tabs>
|
||||
<Tab title="SessionStart">
|
||||
**Purpose:** Inject context from previous sessions
|
||||
|
||||
**Timing:** When Claude Code starts
|
||||
|
||||
**What it does:**
|
||||
- Queries last 10 session summaries
|
||||
- Formats as progressive disclosure index
|
||||
- Injects into context via stdout
|
||||
|
||||
**Key change from v3:**
|
||||
- ✅ Index format (not full details)
|
||||
- ✅ Token counts visible
|
||||
- ✅ MCP search instructions included
|
||||
</Tab>
|
||||
|
||||
<Tab title="UserPromptSubmit">
|
||||
**Purpose:** Initialize session tracking
|
||||
|
||||
**Timing:** Before Claude processes prompt
|
||||
|
||||
**What it does:**
|
||||
- Creates session record
|
||||
- Saves raw user prompt (v4.2.0+)
|
||||
- Starts worker if needed
|
||||
|
||||
**Key change from v3:**
|
||||
- ✅ Stores raw prompts for search
|
||||
- ✅ Auto-starts PM2 worker
|
||||
</Tab>
|
||||
|
||||
<Tab title="PostToolUse">
|
||||
**Purpose:** Capture tool observations
|
||||
|
||||
**Timing:** After every tool execution
|
||||
|
||||
**What it does:**
|
||||
- Enqueues observation in database
|
||||
- Returns immediately
|
||||
|
||||
**Key change from v3:**
|
||||
- ✅ Just enqueues (doesn't process)
|
||||
- ✅ Worker handles all AI calls
|
||||
</Tab>
|
||||
|
||||
<Tab title="Summary">
|
||||
**Purpose:** Generate session summaries
|
||||
|
||||
**Timing:** Worker-triggered (mid-session)
|
||||
|
||||
**What it does:**
|
||||
- Gathers observations
|
||||
- Sends to Claude for summarization
|
||||
- Stores structured summary
|
||||
|
||||
**Key change from v3:**
|
||||
- ✅ Multiple summaries per session
|
||||
- ✅ Summaries are checkpoints, not endings
|
||||
</Tab>
|
||||
|
||||
<Tab title="SessionEnd">
|
||||
**Purpose:** Graceful cleanup
|
||||
|
||||
**Timing:** When session ends
|
||||
|
||||
**What it does:**
|
||||
- Marks session complete
|
||||
- Lets worker finish processing
|
||||
|
||||
**Key change from v3:**
|
||||
- ✅ Graceful (not aggressive)
|
||||
- ✅ No DELETE requests
|
||||
- ✅ Worker finishes naturally
|
||||
</Tab>
|
||||
</Tabs>
|
||||
|
||||
### Database Schema Evolution
|
||||
|
||||
**v3 schema:**
|
||||
```sql
|
||||
-- Simple, flat structure
|
||||
CREATE TABLE observations (
|
||||
id INTEGER PRIMARY KEY,
|
||||
session_id TEXT,
|
||||
text TEXT,
|
||||
created_at INTEGER
|
||||
);
|
||||
```
|
||||
|
||||
**v4 schema:**
|
||||
```sql
|
||||
-- Rich, structured schema
|
||||
CREATE TABLE observations (
|
||||
id INTEGER PRIMARY KEY AUTOINCREMENT,
|
||||
session_id TEXT NOT NULL,
|
||||
project TEXT NOT NULL,
|
||||
|
||||
-- Progressive disclosure metadata
|
||||
title TEXT NOT NULL,
|
||||
subtitle TEXT,
|
||||
type TEXT NOT NULL, -- decision, bugfix, feature, etc.
|
||||
|
||||
-- Content
|
||||
narrative TEXT NOT NULL,
|
||||
facts TEXT, -- JSON array
|
||||
|
||||
-- Searchability
|
||||
concepts TEXT, -- JSON array of tags
|
||||
files_read TEXT, -- JSON array
|
||||
files_modified TEXT, -- JSON array
|
||||
|
||||
-- Timestamps
|
||||
created_at TEXT NOT NULL,
|
||||
created_at_epoch INTEGER NOT NULL,
|
||||
|
||||
FOREIGN KEY(session_id) REFERENCES sdk_sessions(id)
|
||||
);
|
||||
|
||||
-- FTS5 for full-text search
|
||||
CREATE VIRTUAL TABLE observations_fts USING fts5(
|
||||
title, subtitle, narrative, facts, concepts,
|
||||
content=observations
|
||||
);
|
||||
|
||||
-- Auto-sync triggers
|
||||
CREATE TRIGGER observations_ai AFTER INSERT ON observations BEGIN
|
||||
INSERT INTO observations_fts(rowid, title, subtitle, narrative, facts, concepts)
|
||||
VALUES (new.id, new.title, new.subtitle, new.narrative, new.facts, new.concepts);
|
||||
END;
|
||||
```
|
||||
|
||||
**What changed:**
|
||||
- ✅ Structured fields (title, subtitle, type)
|
||||
- ✅ FTS5 full-text search
|
||||
- ✅ Project-scoped queries
|
||||
- ✅ Rich metadata for progressive disclosure
|
||||
|
||||
### Worker Service Redesign
|
||||
|
||||
**v3 worker:**
|
||||
```typescript
|
||||
// Multiple short SDK sessions
|
||||
app.post('/process', async (req, res) => {
|
||||
const response = await query({
|
||||
prompt: buildPrompt(req.body),
|
||||
options: { maxTurns: 1 }
|
||||
});
|
||||
|
||||
for await (const msg of response) {
|
||||
// Process single observation
|
||||
}
|
||||
|
||||
res.json({ success: true });
|
||||
});
|
||||
```
|
||||
|
||||
**v4 worker:**
|
||||
```typescript
|
||||
// ONE long-running SDK session
|
||||
async function runWorker(sessionId: string) {
|
||||
const response = query({
|
||||
prompt: messageGenerator(), // AsyncIterable
|
||||
options: { maxTurns: 1000 }
|
||||
});
|
||||
|
||||
for await (const msg of response) {
|
||||
if (msg.type === 'text') {
|
||||
parseObservations(msg.content);
|
||||
parseSummaries(msg.content);
|
||||
}
|
||||
}
|
||||
}
|
||||
```
|
||||
|
||||
**Benefits:**
|
||||
- Maintains conversation state
|
||||
- SDK handles context automatically
|
||||
- More efficient (fewer API calls)
|
||||
- Natural multi-turn flow
|
||||
|
||||
---
|
||||
|
||||
## Critical Fixes Along the Way
|
||||
|
||||
### Fix 1: Context Injection Pollution (v4.3.1)
|
||||
|
||||
**Problem:** SessionStart hook output polluted with npm install logs
|
||||
|
||||
```bash
|
||||
# Hook output contained:
|
||||
npm WARN deprecated ...
|
||||
npm WARN deprecated ...
|
||||
{"hookSpecificOutput": {"additionalContext": "..."}}
|
||||
```
|
||||
|
||||
**Why it broke:**
|
||||
- Claude Code expects clean JSON or plain text
|
||||
- stderr/stdout from npm install mixed with hook output
|
||||
- Context didn't inject properly
|
||||
|
||||
**Solution:**
|
||||
```json
|
||||
{
|
||||
"command": "npm install --loglevel=silent && node context-hook.js"
|
||||
}
|
||||
```
|
||||
|
||||
**Result:** Clean JSON output, context injection works
|
||||
|
||||
### Fix 2: Double Shebang Issue (v4.3.1)
|
||||
|
||||
**Problem:** Hook executables had duplicate shebangs
|
||||
|
||||
```javascript
|
||||
#!/usr/bin/env node
|
||||
#!/usr/bin/env node // ← Duplicate!
|
||||
|
||||
// Rest of code...
|
||||
```
|
||||
|
||||
**Why it happened:**
|
||||
- Source files had shebang
|
||||
- esbuild added another shebang during build
|
||||
|
||||
**Solution:**
|
||||
```typescript
|
||||
// Remove shebangs from source files
|
||||
// Let esbuild add them during build
|
||||
```
|
||||
|
||||
**Result:** Clean executables, no parsing errors
|
||||
|
||||
### Fix 3: FTS5 Injection Vulnerability (v4.2.3)
|
||||
|
||||
**Problem:** User input passed directly to FTS5 query
|
||||
|
||||
```typescript
|
||||
// ❌ Vulnerable
|
||||
const results = db.query(
|
||||
`SELECT * FROM observations_fts WHERE observations_fts MATCH '${userQuery}'`
|
||||
);
|
||||
```
|
||||
|
||||
**Attack:**
|
||||
```typescript
|
||||
userQuery = "'; DROP TABLE observations; --"
|
||||
```
|
||||
|
||||
**Solution:**
|
||||
```typescript
|
||||
// ✅ Safe: Use parameterized queries
|
||||
const results = db.query(
|
||||
'SELECT * FROM observations_fts WHERE observations_fts MATCH ?',
|
||||
[userQuery]
|
||||
);
|
||||
```
|
||||
|
||||
### Fix 4: NOT NULL Constraint Violation (v4.2.8)
|
||||
|
||||
**Problem:** Session creation failed when prompt was empty
|
||||
|
||||
```sql
|
||||
INSERT INTO sdk_sessions (claude_session_id, user_prompt, ...)
|
||||
VALUES ('abc123', NULL, ...) -- ❌ user_prompt is NOT NULL
|
||||
```
|
||||
|
||||
**Solution:**
|
||||
```typescript
|
||||
// Allow NULL user_prompts
|
||||
user_prompt: input.prompt ?? null
|
||||
```
|
||||
|
||||
**Schema change:**
|
||||
```sql
|
||||
-- Before
|
||||
user_prompt TEXT NOT NULL
|
||||
|
||||
-- After
|
||||
user_prompt TEXT -- Nullable
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
## Performance Improvements
|
||||
|
||||
### Optimization 1: Prepared Statements
|
||||
|
||||
**Before:**
|
||||
```typescript
|
||||
for (const obs of observations) {
|
||||
db.run(`INSERT INTO observations (...) VALUES (?, ?, ...)`, [obs.id, obs.text, ...]);
|
||||
}
|
||||
```
|
||||
|
||||
**After:**
|
||||
```typescript
|
||||
const stmt = db.prepare(`INSERT INTO observations (...) VALUES (?, ?, ...)`);
|
||||
for (const obs of observations) {
|
||||
stmt.run([obs.id, obs.text, ...]);
|
||||
}
|
||||
stmt.finalize();
|
||||
```
|
||||
|
||||
**Impact:** 5x faster bulk inserts
|
||||
|
||||
### Optimization 2: FTS5 Indexing
|
||||
|
||||
**Before:**
|
||||
```typescript
|
||||
// Manual full-text search
|
||||
const results = db.query(
|
||||
`SELECT * FROM observations WHERE text LIKE '%${query}%'`
|
||||
);
|
||||
```
|
||||
|
||||
**After:**
|
||||
```typescript
|
||||
// FTS5 virtual table
|
||||
const results = db.query(
|
||||
`SELECT * FROM observations_fts WHERE observations_fts MATCH ?`,
|
||||
[query]
|
||||
);
|
||||
```
|
||||
|
||||
**Impact:** 100x faster searches on large datasets
|
||||
|
||||
### Optimization 3: Index Format Default
|
||||
|
||||
**Before:**
|
||||
```typescript
|
||||
// Always return full observations
|
||||
search_observations({ query: "hooks" });
|
||||
// Returns: 5,000 tokens
|
||||
```
|
||||
|
||||
**After:**
|
||||
```typescript
|
||||
// Default to index format
|
||||
search_observations({ query: "hooks", format: "index" });
|
||||
// Returns: 200 tokens
|
||||
|
||||
// Fetch full only when needed
|
||||
search_observations({ query: "hooks", format: "full", limit: 1 });
|
||||
// Returns: 150 tokens
|
||||
```
|
||||
|
||||
**Impact:** 25x reduction in average search result size
|
||||
|
||||
---
|
||||
|
||||
## What We Learned
|
||||
|
||||
### Lesson 1: Context is Precious
|
||||
|
||||
**Principle:** Every token you put in context window costs attention.
|
||||
|
||||
**Application:**
|
||||
- Progressive disclosure reduces waste by 87%
|
||||
- Index-first approach gives agent control
|
||||
- Token counts make costs visible
|
||||
|
||||
### Lesson 2: Session State is Complicated
|
||||
|
||||
**Principle:** Distributed state is hard. SDK handles it better than we can.
|
||||
|
||||
**Application:**
|
||||
- Use SDK's built-in session resumption
|
||||
- Don't try to manually reconstruct state
|
||||
- Track session IDs from init messages
|
||||
|
||||
### Lesson 3: Graceful Beats Aggressive
|
||||
|
||||
**Principle:** Let processes finish their work before terminating.
|
||||
|
||||
**Application:**
|
||||
- Graceful cleanup prevents data loss
|
||||
- Workers finish important operations
|
||||
- Clean state transitions reduce bugs
|
||||
|
||||
### Lesson 4: AI is the Compressor
|
||||
|
||||
**Principle:** Don't compress manually. Let AI do semantic compression.
|
||||
|
||||
**Application:**
|
||||
- 10:1 to 100:1 compression ratios
|
||||
- Semantic understanding, not keyword extraction
|
||||
- Structured outputs (XML parsing)
|
||||
|
||||
### Lesson 5: Progressive Everything
|
||||
|
||||
**Principle:** Show metadata first, fetch details on-demand.
|
||||
|
||||
**Application:**
|
||||
- Progressive disclosure in context injection
|
||||
- Index format in search results
|
||||
- Layer 1 (titles) → Layer 2 (summaries) → Layer 3 (full details)
|
||||
|
||||
---
|
||||
|
||||
## The Road Ahead
|
||||
|
||||
### Planned: Adaptive Index Size
|
||||
|
||||
```typescript
|
||||
SessionStart({ source: "startup" }):
|
||||
→ Show last 10 sessions (normal)
|
||||
|
||||
SessionStart({ source: "resume" }):
|
||||
→ Show only current session (minimal)
|
||||
|
||||
SessionStart({ source: "compact" }):
|
||||
→ Show last 20 sessions (comprehensive)
|
||||
```
|
||||
|
||||
### Planned: Relevance Scoring
|
||||
|
||||
```typescript
|
||||
// Use embeddings to pre-sort index by semantic relevance
|
||||
search_observations({
|
||||
query: "authentication bug",
|
||||
sort: "relevance" // Based on embeddings
|
||||
});
|
||||
```
|
||||
|
||||
### Planned: Multi-Project Context
|
||||
|
||||
```typescript
|
||||
// Cross-project pattern recognition
|
||||
search_observations({
|
||||
query: "API rate limiting",
|
||||
projects: ["api-gateway", "user-service", "billing-service"]
|
||||
});
|
||||
```
|
||||
|
||||
### Planned: Collaborative Memory
|
||||
|
||||
```typescript
|
||||
// Team-shared observations (optional)
|
||||
createObservation({
|
||||
title: "Rate limit: 100 req/min",
|
||||
scope: "team" // vs "user"
|
||||
});
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
## Migration Guide: v3 → v4
|
||||
|
||||
### Step 1: Backup Database
|
||||
|
||||
```bash
|
||||
cp ~/.claude-mem/claude-mem.db ~/.claude-mem/claude-mem-v3-backup.db
|
||||
```
|
||||
|
||||
### Step 2: Update Plugin
|
||||
|
||||
```bash
|
||||
cd ~/.claude/plugins/marketplaces/thedotmack
|
||||
git pull
|
||||
```
|
||||
|
||||
### Step 3: Run Migration
|
||||
|
||||
```bash
|
||||
npx tsx src/services/sqlite/migrations/v3-to-v4.ts
|
||||
```
|
||||
|
||||
**What the migration does:**
|
||||
- Adds new columns to observations table
|
||||
- Creates FTS5 virtual tables
|
||||
- Sets up auto-sync triggers
|
||||
- Migrates existing observations to new schema
|
||||
|
||||
### Step 4: Restart Worker
|
||||
|
||||
```bash
|
||||
pm2 restart claude-mem-worker
|
||||
pm2 logs claude-mem-worker
|
||||
```
|
||||
|
||||
### Step 5: Test
|
||||
|
||||
```bash
|
||||
# Start Claude Code
|
||||
claude
|
||||
|
||||
# Check that context is injected
|
||||
# (Should see progressive disclosure index)
|
||||
|
||||
# Submit a prompt and check observations
|
||||
pm2 logs claude-mem-worker --nostream
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
## Key Metrics
|
||||
|
||||
### v3 Performance
|
||||
|
||||
| Metric | Value |
|
||||
|--------|-------|
|
||||
| Context usage per session | ~25,000 tokens |
|
||||
| Relevant context | ~2,000 tokens (8%) |
|
||||
| Hook execution time | ~200ms |
|
||||
| Search latency | ~500ms (LIKE queries) |
|
||||
|
||||
### v4 Performance
|
||||
|
||||
| Metric | Value |
|
||||
|--------|-------|
|
||||
| Context usage per session | ~1,100 tokens |
|
||||
| Relevant context | ~1,100 tokens (100%) |
|
||||
| Hook execution time | ~45ms |
|
||||
| Search latency | ~15ms (FTS5) |
|
||||
|
||||
**Improvements:**
|
||||
- 96% reduction in context waste
|
||||
- 12x increase in relevance
|
||||
- 4x faster hooks
|
||||
- 33x faster search
|
||||
|
||||
---
|
||||
|
||||
## Conclusion
|
||||
|
||||
The journey from v3 to v4 was about understanding these fundamental truths:
|
||||
|
||||
1. **Context is finite** - Progressive disclosure respects attention budget
|
||||
2. **AI is the compressor** - Semantic understanding beats keyword extraction
|
||||
3. **Agents are smart** - Let them decide what to fetch
|
||||
4. **State is hard** - Use SDK's built-in mechanisms
|
||||
5. **Graceful wins** - Let processes finish cleanly
|
||||
|
||||
The result is a memory system that's both powerful and invisible. Users never notice it working - Claude just gets smarter over time.
|
||||
|
||||
---
|
||||
|
||||
## Further Reading
|
||||
|
||||
- [Progressive Disclosure](/docs/progressive-disclosure) - The philosophy behind v4
|
||||
- [Hooks Architecture](/docs/hooks-architecture) - How hooks power the system
|
||||
- [Context Engineering](/docs/context-engineering) - Foundational principles
|
||||
- [v4.0.0 Release Notes](/CHANGELOG.md#v400) - Full changelog
|
||||
|
||||
---
|
||||
|
||||
*This architecture evolution reflects hundreds of hours of experimentation, dozens of dead ends, and the invaluable experience of real-world usage. v4 is the architecture that emerged from understanding what actually works.*
|
||||
@@ -0,0 +1,222 @@
|
||||
# Context Engineering for AI Agents: Best Practices Cheat Sheet
|
||||
|
||||
## Core Principle
|
||||
**Find the smallest possible set of high-signal tokens that maximize the likelihood of your desired outcome.**
|
||||
|
||||
---
|
||||
|
||||
## Context Engineering vs Prompt Engineering
|
||||
|
||||
**Prompt Engineering**: Writing and organizing LLM instructions for optimal outcomes (one-time task)
|
||||
|
||||
**Context Engineering**: Curating and maintaining the optimal set of tokens during inference across multiple turns (iterative process)
|
||||
|
||||
Context engineering manages:
|
||||
- System instructions
|
||||
- Tools
|
||||
- Model Context Protocol (MCP)
|
||||
- External data
|
||||
- Message history
|
||||
- Runtime data retrieval
|
||||
|
||||
---
|
||||
|
||||
## The Problem: Context Rot
|
||||
|
||||
**Key Insight**: LLMs have an "attention budget" that gets depleted as context grows
|
||||
|
||||
- Every token attends to every other token (n² relationships)
|
||||
- As context length increases, model accuracy decreases
|
||||
- Models have less training experience with longer sequences
|
||||
- Context must be treated as a finite resource with diminishing marginal returns
|
||||
|
||||
---
|
||||
|
||||
## System Prompts: Find the "Right Altitude"
|
||||
|
||||
### The Goldilocks Zone
|
||||
|
||||
**Too Prescriptive** ❌
|
||||
- Hardcoded if-else logic
|
||||
- Brittle and fragile
|
||||
- High maintenance complexity
|
||||
|
||||
**Too Vague** ❌
|
||||
- High-level guidance without concrete signals
|
||||
- Falsely assumes shared context
|
||||
- Lacks actionable direction
|
||||
|
||||
**Just Right** ✅
|
||||
- Specific enough to guide behavior effectively
|
||||
- Flexible enough to provide strong heuristics
|
||||
- Minimal set of information that fully outlines expected behavior
|
||||
|
||||
### Best Practices
|
||||
- Use simple, direct language
|
||||
- Organize into distinct sections (`<background_information>`, `<instructions>`, `## Tool guidance`, etc.)
|
||||
- Use XML tags or Markdown headers for structure
|
||||
- Start with minimal prompt, add based on failure modes
|
||||
- Note: Minimal ≠ short (provide sufficient information upfront)
|
||||
|
||||
---
|
||||
|
||||
## Tools: Minimal and Clear
|
||||
|
||||
### Design Principles
|
||||
- **Self-contained**: Each tool has a single, clear purpose
|
||||
- **Robust to error**: Handle edge cases gracefully
|
||||
- **Extremely clear**: Intended use is unambiguous
|
||||
- **Token-efficient**: Returns relevant information without bloat
|
||||
- **Descriptive parameters**: Unambiguous input names (e.g., `user_id` not `user`)
|
||||
|
||||
### Critical Rule
|
||||
**If a human engineer can't definitively say which tool to use in a given situation, an AI agent can't be expected to do better.**
|
||||
|
||||
### Common Failure Modes to Avoid
|
||||
- Bloated tool sets covering too much functionality
|
||||
- Tools with overlapping purposes
|
||||
- Ambiguous decision points about which tool to use
|
||||
|
||||
---
|
||||
|
||||
## Examples: Diverse, Not Exhaustive
|
||||
|
||||
**Do** ✅
|
||||
- Curate a set of diverse, canonical examples
|
||||
- Show expected behavior effectively
|
||||
- Think "pictures worth a thousand words"
|
||||
|
||||
**Don't** ❌
|
||||
- Stuff in a laundry list of edge cases
|
||||
- Try to articulate every possible rule
|
||||
- Overwhelm with exhaustive scenarios
|
||||
|
||||
---
|
||||
|
||||
## Context Retrieval Strategies
|
||||
|
||||
### Just-In-Time Context (Recommended for Agents)
|
||||
**Approach**: Maintain lightweight identifiers (file paths, queries, links) and dynamically load data at runtime
|
||||
|
||||
**Benefits**:
|
||||
- Avoids context pollution
|
||||
- Enables progressive disclosure
|
||||
- Mirrors human cognition (we don't memorize everything)
|
||||
- Leverages metadata (file names, folder structure, timestamps)
|
||||
- Agents discover context incrementally
|
||||
|
||||
**Trade-offs**:
|
||||
- Slower than pre-computed retrieval
|
||||
- Requires proper tool guidance to avoid dead-ends
|
||||
|
||||
### Pre-Inference Retrieval (Traditional RAG)
|
||||
**Approach**: Use embedding-based retrieval to surface context before inference
|
||||
|
||||
**When to Use**: Static content that won't change during interaction
|
||||
|
||||
### Hybrid Strategy (Best of Both)
|
||||
**Approach**: Retrieve some data upfront, enable autonomous exploration as needed
|
||||
|
||||
**Example**: Claude Code loads CLAUDE.md files upfront, uses glob/grep for just-in-time retrieval
|
||||
|
||||
**Rule of Thumb**: "Do the simplest thing that works"
|
||||
|
||||
---
|
||||
|
||||
## Long-Horizon Tasks: Three Techniques
|
||||
|
||||
### 1. Compaction
|
||||
**What**: Summarize conversation nearing context limit, reinitiate with summary
|
||||
|
||||
**Implementation**:
|
||||
- Pass message history to model for compression
|
||||
- Preserve critical details (architectural decisions, bugs, implementation)
|
||||
- Discard redundant outputs
|
||||
- Continue with compressed context + recently accessed files
|
||||
|
||||
**Tuning Process**:
|
||||
1. **First**: Maximize recall (capture all relevant information)
|
||||
2. **Then**: Improve precision (eliminate superfluous content)
|
||||
|
||||
**Low-Hanging Fruit**: Clear old tool calls and results
|
||||
|
||||
**Best For**: Tasks requiring extensive back-and-forth
|
||||
|
||||
### 2. Structured Note-Taking (Agentic Memory)
|
||||
**What**: Agent writes notes persisted outside context window, retrieved later
|
||||
|
||||
**Examples**:
|
||||
- To-do lists
|
||||
- NOTES.md files
|
||||
- Game state tracking (Pokémon example: tracking 1,234 steps of training)
|
||||
- Project progress logs
|
||||
|
||||
**Benefits**:
|
||||
- Persistent memory with minimal overhead
|
||||
- Maintains critical context across tool calls
|
||||
- Enables multi-hour coherent strategies
|
||||
|
||||
**Best For**: Iterative development with clear milestones
|
||||
|
||||
### 3. Sub-Agent Architectures
|
||||
**What**: Specialized sub-agents handle focused tasks with clean context windows
|
||||
|
||||
**How It Works**:
|
||||
- Main agent coordinates high-level plan
|
||||
- Sub-agents perform deep technical work
|
||||
- Sub-agents explore extensively (tens of thousands of tokens)
|
||||
- Return condensed summaries (1,000-2,000 tokens)
|
||||
|
||||
**Benefits**:
|
||||
- Clear separation of concerns
|
||||
- Parallel exploration
|
||||
- Detailed context remains isolated
|
||||
|
||||
**Best For**: Complex research and analysis tasks
|
||||
|
||||
---
|
||||
|
||||
## Quick Decision Framework
|
||||
|
||||
| Scenario | Recommended Approach |
|
||||
|----------|---------------------|
|
||||
| Static content | Pre-inference retrieval or hybrid |
|
||||
| Dynamic exploration needed | Just-in-time context |
|
||||
| Extended back-and-forth | Compaction |
|
||||
| Iterative development | Structured note-taking |
|
||||
| Complex research | Sub-agent architectures |
|
||||
| Rapid model improvement | "Do the simplest thing that works" |
|
||||
|
||||
---
|
||||
|
||||
## Key Takeaways
|
||||
|
||||
1. **Context is finite**: Treat it as a precious resource with an attention budget
|
||||
2. **Think holistically**: Consider the entire state available to the LLM
|
||||
3. **Stay minimal**: More context isn't always better
|
||||
4. **Be iterative**: Context curation happens each time you pass to the model
|
||||
5. **Design for autonomy**: As models improve, let them act intelligently
|
||||
6. **Start simple**: Test with minimal setup, add based on failure modes
|
||||
|
||||
---
|
||||
|
||||
## Anti-Patterns to Avoid
|
||||
|
||||
- ❌ Cramming everything into prompts
|
||||
- ❌ Creating brittle if-else logic
|
||||
- ❌ Building bloated tool sets
|
||||
- ❌ Stuffing exhaustive edge cases as examples
|
||||
- ❌ Assuming larger context windows solve everything
|
||||
- ❌ Ignoring context pollution over long interactions
|
||||
|
||||
---
|
||||
|
||||
## Remember
|
||||
|
||||
> "Even as models continue to improve, the challenge of maintaining coherence across extended interactions will remain central to building more effective agents."
|
||||
|
||||
Context engineering will evolve, but the core principle stays the same: **optimize signal-to-noise ratio in your token budget**.
|
||||
|
||||
---
|
||||
|
||||
*Based on Anthropic's "Effective context engineering for AI agents" (September 2025)*
|
||||
@@ -39,6 +39,14 @@
|
||||
"usage/search-tools"
|
||||
]
|
||||
},
|
||||
{
|
||||
"group": "Best Practices",
|
||||
"icon": "lightbulb",
|
||||
"pages": [
|
||||
"context-engineering",
|
||||
"progressive-disclosure"
|
||||
]
|
||||
},
|
||||
{
|
||||
"group": "Configuration & Development",
|
||||
"icon": "gear",
|
||||
@@ -53,6 +61,8 @@
|
||||
"icon": "diagram-project",
|
||||
"pages": [
|
||||
"architecture/overview",
|
||||
"architecture-evolution",
|
||||
"hooks-architecture",
|
||||
"architecture/hooks",
|
||||
"architecture/worker-service",
|
||||
"architecture/database",
|
||||
|
||||
@@ -0,0 +1,784 @@
|
||||
# How Claude-Mem Uses Hooks: A Lifecycle-Driven Architecture
|
||||
|
||||
## Core Principle
|
||||
**Observe the main Claude Code session from the outside, process observations in the background, inject context at the right time.**
|
||||
|
||||
---
|
||||
|
||||
## The Big Picture
|
||||
|
||||
Claude-Mem is fundamentally a **hook-driven system**. Every piece of functionality happens in response to lifecycle events:
|
||||
|
||||
```
|
||||
┌─────────────────────────────────────────────────────────┐
|
||||
│ CLAUDE CODE SESSION │
|
||||
│ (Main session - user interacting with Claude) │
|
||||
│ │
|
||||
│ SessionStart → UserPromptSubmit → Tool Use → Stop │
|
||||
│ ↓ ↓ ↓ ↓ │
|
||||
│ [Hook] [Hook] [Hook] [Hook] │
|
||||
└─────────────────────────────────────────────────────────┘
|
||||
↓ ↓ ↓ ↓
|
||||
┌─────────────────────────────────────────────────────────┐
|
||||
│ CLAUDE-MEM SYSTEM │
|
||||
│ │
|
||||
│ Context New Session Observation Summary │
|
||||
│ Injection Tracking Capture Generation │
|
||||
└─────────────────────────────────────────────────────────┘
|
||||
```
|
||||
|
||||
**Key insight:** Claude-Mem doesn't interrupt or modify Claude Code's behavior. It observes from the outside and provides value through lifecycle hooks.
|
||||
|
||||
---
|
||||
|
||||
## Why Hooks?
|
||||
|
||||
### The Non-Invasive Requirement
|
||||
|
||||
Claude-Mem had several architectural constraints:
|
||||
|
||||
1. **Can't modify Claude Code**: It's a closed-source binary
|
||||
2. **Must be fast**: Can't slow down the main session
|
||||
3. **Must be reliable**: Can't break Claude Code if it fails
|
||||
4. **Must be portable**: Works on any project without configuration
|
||||
|
||||
**Solution:** External command hooks configured via settings.json
|
||||
|
||||
### The Hook System Advantage
|
||||
|
||||
Claude Code's hook system provides exactly what we need:
|
||||
|
||||
<CardGroup cols={2}>
|
||||
<Card title="Lifecycle Events" icon="clock">
|
||||
SessionStart, UserPromptSubmit, PostToolUse, Stop
|
||||
</Card>
|
||||
|
||||
<Card title="Non-Blocking" icon="forward">
|
||||
Hooks run in parallel, don't wait for completion
|
||||
</Card>
|
||||
|
||||
<Card title="Context Injection" icon="upload">
|
||||
SessionStart and UserPromptSubmit can add context
|
||||
</Card>
|
||||
|
||||
<Card title="Tool Observation" icon="eye">
|
||||
PostToolUse sees all tool inputs and outputs
|
||||
</Card>
|
||||
</CardGroup>
|
||||
|
||||
---
|
||||
|
||||
## The Five Hooks
|
||||
|
||||
### Hook 1: SessionStart (Context Hook)
|
||||
|
||||
**Purpose:** Inject relevant context from previous sessions
|
||||
|
||||
**When:** Claude Code starts or resumes
|
||||
|
||||
**What it does:**
|
||||
1. Extracts project name from current working directory
|
||||
2. Queries SQLite for recent session summaries (last 10)
|
||||
3. Queries SQLite for recent observations (last 50)
|
||||
4. Formats as progressive disclosure index
|
||||
5. Outputs to stdout (automatically injected into context)
|
||||
|
||||
**Configuration:**
|
||||
```json
|
||||
{
|
||||
"hooks": {
|
||||
"SessionStart": [{
|
||||
"matcher": "startup",
|
||||
"hooks": [{
|
||||
"type": "command",
|
||||
"command": "${CLAUDE_PLUGIN_ROOT}/scripts/context-hook.js",
|
||||
"timeout": 120
|
||||
}]
|
||||
}]
|
||||
}
|
||||
}
|
||||
```
|
||||
|
||||
**Key decisions:**
|
||||
- ✅ Only runs on "startup" (not "clear" or "compact")
|
||||
- ✅ 120-second timeout for npm install (v4.3.1 fix)
|
||||
- ✅ Uses `--loglevel=silent` for clean JSON output
|
||||
- ✅ Progressive disclosure format (index, not full details)
|
||||
|
||||
**Output format:**
|
||||
```markdown
|
||||
# [claude-mem] recent context
|
||||
|
||||
**Legend:** 🎯 session-request | 🔴 gotcha | 🟡 problem-solution ...
|
||||
|
||||
### Oct 26, 2025
|
||||
|
||||
**General**
|
||||
| ID | Time | T | Title | Tokens |
|
||||
|----|------|---|-------|--------|
|
||||
| #2586 | 12:58 AM | 🔵 | Context hook file empty | ~51 |
|
||||
|
||||
*Use claude-mem MCP search to access full details*
|
||||
```
|
||||
|
||||
**Source:** `src/hooks/context-hook.ts` → `plugin/scripts/context-hook.js`
|
||||
|
||||
---
|
||||
|
||||
### Hook 2: UserPromptSubmit (New Session Hook)
|
||||
|
||||
**Purpose:** Initialize session tracking when user submits a prompt
|
||||
|
||||
**When:** Before Claude processes the user's message
|
||||
|
||||
**What it does:**
|
||||
1. Reads user prompt and session ID from stdin
|
||||
2. Creates new session record in SQLite
|
||||
3. Saves raw user prompt for full-text search (v4.2.0+)
|
||||
4. Starts PM2 worker service if not running
|
||||
5. Returns immediately (non-blocking)
|
||||
|
||||
**Configuration:**
|
||||
```json
|
||||
{
|
||||
"hooks": {
|
||||
"UserPromptSubmit": [{
|
||||
"hooks": [{
|
||||
"type": "command",
|
||||
"command": "${CLAUDE_PLUGIN_ROOT}/scripts/new-hook.js"
|
||||
}]
|
||||
}]
|
||||
}
|
||||
}
|
||||
```
|
||||
|
||||
**Key decisions:**
|
||||
- ✅ No matcher (runs for all prompts)
|
||||
- ✅ Creates session record immediately
|
||||
- ✅ Stores raw prompts for search (privacy note: local SQLite only)
|
||||
- ✅ Auto-starts worker service
|
||||
- ✅ Suppresses output (`suppressOutput: true`)
|
||||
|
||||
**Database operations:**
|
||||
```sql
|
||||
INSERT INTO sdk_sessions (claude_session_id, project, user_prompt, ...)
|
||||
VALUES (?, ?, ?, ...)
|
||||
|
||||
INSERT INTO user_prompts (session_id, prompt, prompt_number, ...)
|
||||
VALUES (?, ?, ?, ...)
|
||||
```
|
||||
|
||||
**Source:** `src/hooks/new-hook.ts` → `plugin/scripts/new-hook.js`
|
||||
|
||||
---
|
||||
|
||||
### Hook 3: PostToolUse (Save Observation Hook)
|
||||
|
||||
**Purpose:** Capture tool execution observations for later processing
|
||||
|
||||
**When:** Immediately after any tool completes successfully
|
||||
|
||||
**What it does:**
|
||||
1. Receives tool name, input, output from stdin
|
||||
2. Finds active session for current project
|
||||
3. Enqueues observation in observation_queue table
|
||||
4. Returns immediately (processing happens in worker)
|
||||
|
||||
**Configuration:**
|
||||
```json
|
||||
{
|
||||
"hooks": {
|
||||
"PostToolUse": [{
|
||||
"matcher": "*",
|
||||
"hooks": [{
|
||||
"type": "command",
|
||||
"command": "${CLAUDE_PLUGIN_ROOT}/scripts/save-hook.js"
|
||||
}]
|
||||
}]
|
||||
}
|
||||
}
|
||||
```
|
||||
|
||||
**Key decisions:**
|
||||
- ✅ Matcher: `*` (captures all tools)
|
||||
- ✅ Non-blocking (just enqueues, doesn't process)
|
||||
- ✅ Worker processes observations asynchronously
|
||||
- ✅ Parallel execution safe (each hook gets own stdin)
|
||||
|
||||
**Database operations:**
|
||||
```sql
|
||||
INSERT INTO observation_queue (session_id, tool_name, tool_input, tool_output, ...)
|
||||
VALUES (?, ?, ?, ?, ...)
|
||||
```
|
||||
|
||||
**What gets queued:**
|
||||
```json
|
||||
{
|
||||
"session_id": "abc123",
|
||||
"tool_name": "Edit",
|
||||
"tool_input": {
|
||||
"file_path": "/path/to/file.ts",
|
||||
"old_string": "...",
|
||||
"new_string": "..."
|
||||
},
|
||||
"tool_output": {
|
||||
"success": true,
|
||||
"linesChanged": 5
|
||||
},
|
||||
"created_at_epoch": 1698765432
|
||||
}
|
||||
```
|
||||
|
||||
**Source:** `src/hooks/save-hook.ts` → `plugin/scripts/save-hook.js`
|
||||
|
||||
---
|
||||
|
||||
### Hook 4: Summary Hook (Mid-Session Checkpoint)
|
||||
|
||||
**Purpose:** Generate AI-powered session summaries during the session
|
||||
|
||||
**When:** Triggered programmatically by the worker service
|
||||
|
||||
**What it does:**
|
||||
1. Gathers session observations from database
|
||||
2. Sends to Claude Agent SDK for summarization
|
||||
3. Processes response and extracts structured summary
|
||||
4. Stores in session_summaries table
|
||||
|
||||
**Configuration:**
|
||||
```json
|
||||
{
|
||||
"hooks": {
|
||||
"Summary": [{
|
||||
"hooks": [{
|
||||
"type": "command",
|
||||
"command": "${CLAUDE_PLUGIN_ROOT}/scripts/summary-hook.js"
|
||||
}]
|
||||
}]
|
||||
}
|
||||
}
|
||||
```
|
||||
|
||||
**Key decisions:**
|
||||
- ✅ Triggered by worker, not by Claude Code lifecycle
|
||||
- ✅ Multiple summaries per session (v4.2.0+)
|
||||
- ✅ Summaries are checkpoints, not endings
|
||||
- ✅ Uses Claude Agent SDK for AI compression
|
||||
|
||||
**Summary structure:**
|
||||
```xml
|
||||
<summary>
|
||||
<request>User's original request</request>
|
||||
<investigated>What was examined</investigated>
|
||||
<learned>Key discoveries</learned>
|
||||
<completed>Work finished</completed>
|
||||
<next_steps>Remaining tasks</next_steps>
|
||||
<files_read>
|
||||
<file>path/to/file1.ts</file>
|
||||
<file>path/to/file2.ts</file>
|
||||
</files_read>
|
||||
<files_modified>
|
||||
<file>path/to/file3.ts</file>
|
||||
</files_modified>
|
||||
<notes>Additional context</notes>
|
||||
</summary>
|
||||
```
|
||||
|
||||
**Source:** `src/hooks/summary-hook.ts` → `plugin/scripts/summary-hook.js`
|
||||
|
||||
---
|
||||
|
||||
### Hook 5: SessionEnd (Cleanup Hook)
|
||||
|
||||
**Purpose:** Mark sessions as completed when they end
|
||||
|
||||
**When:** Claude Code session ends (not on `/clear`)
|
||||
|
||||
**What it does:**
|
||||
1. Marks session as completed in database
|
||||
2. Allows worker to finish processing
|
||||
3. Performs graceful cleanup
|
||||
|
||||
**Configuration:**
|
||||
```json
|
||||
{
|
||||
"hooks": {
|
||||
"SessionEnd": [{
|
||||
"hooks": [{
|
||||
"type": "command",
|
||||
"command": "${CLAUDE_PLUGIN_ROOT}/scripts/cleanup-hook.js"
|
||||
}]
|
||||
}]
|
||||
}
|
||||
}
|
||||
```
|
||||
|
||||
**Key decisions:**
|
||||
- ✅ Graceful completion (v4.1.0+)
|
||||
- ✅ No longer sends DELETE to workers
|
||||
- ✅ Skips cleanup on `/clear` commands
|
||||
- ✅ Preserves ongoing sessions
|
||||
|
||||
**Why graceful cleanup?**
|
||||
|
||||
**Old approach (v3):**
|
||||
```typescript
|
||||
// ❌ Aggressive cleanup
|
||||
SessionEnd → DELETE /worker/session → Worker stops immediately
|
||||
```
|
||||
|
||||
**Problems:**
|
||||
- Interrupted summary generation
|
||||
- Lost pending observations
|
||||
- Race conditions
|
||||
|
||||
**New approach (v4.1.0+):**
|
||||
```typescript
|
||||
// ✅ Graceful completion
|
||||
SessionEnd → UPDATE sessions SET completed_at = NOW()
|
||||
Worker sees completion → Finishes processing → Exits naturally
|
||||
```
|
||||
|
||||
**Benefits:**
|
||||
- Worker finishes important operations
|
||||
- Summaries complete successfully
|
||||
- Clean state transitions
|
||||
|
||||
**Source:** `src/hooks/cleanup-hook.ts` → `plugin/scripts/cleanup-hook.js`
|
||||
|
||||
---
|
||||
|
||||
## Hook Execution Flow
|
||||
|
||||
### Session Lifecycle
|
||||
|
||||
```mermaid
|
||||
sequenceDiagram
|
||||
participant User
|
||||
participant Claude
|
||||
participant Hooks
|
||||
participant Worker
|
||||
participant DB
|
||||
|
||||
User->>Claude: Start Claude Code
|
||||
Claude->>Hooks: SessionStart hook
|
||||
Hooks->>DB: Query recent context
|
||||
DB-->>Hooks: Session summaries + observations
|
||||
Hooks-->>Claude: Inject context
|
||||
Note over Claude: Context available for session
|
||||
|
||||
User->>Claude: Submit prompt
|
||||
Claude->>Hooks: UserPromptSubmit hook
|
||||
Hooks->>DB: Create session record
|
||||
Hooks->>Worker: Start worker (if not running)
|
||||
Worker-->>DB: Ready to process
|
||||
|
||||
Claude->>Claude: Execute tools
|
||||
Claude->>Hooks: PostToolUse (multiple times)
|
||||
Hooks->>DB: Queue observations
|
||||
Note over Worker: Polls queue, processes observations
|
||||
|
||||
Worker->>Worker: AI compression
|
||||
Worker->>DB: Store compressed observations
|
||||
Worker->>Hooks: Trigger summary hook
|
||||
Hooks->>DB: Store session summary
|
||||
|
||||
User->>Claude: Finish
|
||||
Claude->>Hooks: SessionEnd hook
|
||||
Hooks->>DB: Mark session complete
|
||||
Worker->>DB: Check completion
|
||||
Worker->>Worker: Finish processing
|
||||
Worker->>Worker: Exit gracefully
|
||||
```
|
||||
|
||||
### Hook Timing
|
||||
|
||||
| Event | Timing | Blocking | Timeout | Output Handling |
|
||||
|-------|--------|----------|---------|-----------------|
|
||||
| **SessionStart** | Before session | No | 120s | stdout → context |
|
||||
| **UserPromptSubmit** | Before processing | No | 60s | stdout → context |
|
||||
| **PostToolUse** | After tool | No | 60s | Transcript only |
|
||||
| **Summary** | Worker triggered | No | 300s | Database |
|
||||
| **SessionEnd** | On exit | No | 60s | Log only |
|
||||
|
||||
---
|
||||
|
||||
## The Worker Service Architecture
|
||||
|
||||
### Why a Background Worker?
|
||||
|
||||
**Problem:** Hooks must be fast (< 1 second)
|
||||
|
||||
**Reality:** AI compression takes 5-30 seconds per observation
|
||||
|
||||
**Solution:** Hooks enqueue observations, worker processes async
|
||||
|
||||
```
|
||||
┌─────────────────────────────────────────────────────────┐
|
||||
│ HOOK (Fast) │
|
||||
│ 1. Read stdin (< 1ms) │
|
||||
│ 2. Insert into queue (< 10ms) │
|
||||
│ 3. Return success (< 20ms total) │
|
||||
└─────────────────────────────────────────────────────────┘
|
||||
↓ (queue)
|
||||
┌─────────────────────────────────────────────────────────┐
|
||||
│ WORKER (Slow) │
|
||||
│ 1. Poll queue every 1s │
|
||||
│ 2. Process observation via Claude SDK (5-30s) │
|
||||
│ 3. Parse and store results │
|
||||
│ 4. Mark observation processed │
|
||||
└─────────────────────────────────────────────────────────┘
|
||||
```
|
||||
|
||||
### PM2 Process Management
|
||||
|
||||
**Technology:** PM2 (process manager for Node.js)
|
||||
|
||||
**Why PM2:**
|
||||
- Auto-restart on failure
|
||||
- Log management
|
||||
- Process monitoring
|
||||
- Cross-platform (works on macOS, Linux, Windows)
|
||||
- No systemd/launchd needed
|
||||
|
||||
**Configuration:**
|
||||
```javascript
|
||||
// ecosystem.config.cjs
|
||||
module.exports = {
|
||||
apps: [{
|
||||
name: 'claude-mem-worker',
|
||||
script: './plugin/scripts/worker-service.cjs',
|
||||
instances: 1,
|
||||
autorestart: true,
|
||||
watch: false,
|
||||
max_memory_restart: '500M',
|
||||
env: {
|
||||
NODE_ENV: 'production',
|
||||
CLAUDE_MEM_WORKER_PORT: 37777
|
||||
}
|
||||
}]
|
||||
};
|
||||
```
|
||||
|
||||
**Worker lifecycle:**
|
||||
```bash
|
||||
# Started by new-hook (if not running)
|
||||
pm2 start ecosystem.config.cjs
|
||||
|
||||
# Status check
|
||||
pm2 status claude-mem-worker
|
||||
|
||||
# View logs
|
||||
pm2 logs claude-mem-worker
|
||||
|
||||
# Restart
|
||||
pm2 restart claude-mem-worker
|
||||
```
|
||||
|
||||
### Worker HTTP API
|
||||
|
||||
**Technology:** Express.js REST API on port 37777
|
||||
|
||||
**Endpoints:**
|
||||
|
||||
| Endpoint | Method | Purpose |
|
||||
|----------|--------|---------|
|
||||
| `/health` | GET | Health check |
|
||||
| `/sessions` | POST | Create session |
|
||||
| `/sessions/:id` | GET | Get session status |
|
||||
| `/sessions/:id` | PATCH | Update session |
|
||||
| `/observations` | POST | Enqueue observation |
|
||||
| `/observations/:id` | GET | Get observation |
|
||||
|
||||
**Why HTTP API?**
|
||||
- Language-agnostic (hooks can be any language)
|
||||
- Easy debugging (curl commands)
|
||||
- Standard error handling
|
||||
- Proper async handling
|
||||
|
||||
---
|
||||
|
||||
## Design Patterns
|
||||
|
||||
### Pattern 1: Fire-and-Forget Hooks
|
||||
|
||||
**Principle:** Hooks should return immediately, not wait for completion
|
||||
|
||||
```typescript
|
||||
// ❌ Bad: Hook waits for processing
|
||||
export async function saveHook(stdin: HookInput) {
|
||||
const observation = parseInput(stdin);
|
||||
await processObservation(observation); // BLOCKS!
|
||||
return success();
|
||||
}
|
||||
|
||||
// ✅ Good: Hook enqueues and returns
|
||||
export async function saveHook(stdin: HookInput) {
|
||||
const observation = parseInput(stdin);
|
||||
await enqueueObservation(observation); // Fast
|
||||
return success(); // Immediate
|
||||
}
|
||||
```
|
||||
|
||||
### Pattern 2: Queue-Based Processing
|
||||
|
||||
**Principle:** Decouple capture from processing
|
||||
|
||||
```
|
||||
Hook (capture) → Queue (buffer) → Worker (process)
|
||||
```
|
||||
|
||||
**Benefits:**
|
||||
- Parallel hook execution safe
|
||||
- Worker failure doesn't affect hooks
|
||||
- Retry logic centralized
|
||||
- Backpressure handling
|
||||
|
||||
### Pattern 3: Graceful Degradation
|
||||
|
||||
**Principle:** Memory system failure shouldn't break Claude Code
|
||||
|
||||
```typescript
|
||||
try {
|
||||
await captureObservation();
|
||||
} catch (error) {
|
||||
// Log error, but don't throw
|
||||
console.error('Memory capture failed:', error);
|
||||
return { continue: true, suppressOutput: true };
|
||||
}
|
||||
```
|
||||
|
||||
**Failure modes:**
|
||||
- Database locked → Skip observation, log error
|
||||
- Worker crashed → Auto-restart via PM2
|
||||
- Network issue → Retry with exponential backoff
|
||||
- Disk full → Warn user, disable memory
|
||||
|
||||
### Pattern 4: Progressive Enhancement
|
||||
|
||||
**Principle:** Core functionality works without memory, memory enhances it
|
||||
|
||||
```
|
||||
Without memory: Claude Code works normally
|
||||
With memory: Claude Code + context from past sessions
|
||||
Memory broken: Falls back to working normally
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
## Hook Debugging
|
||||
|
||||
### Debug Mode
|
||||
|
||||
Enable detailed hook execution logs:
|
||||
|
||||
```bash
|
||||
claude --debug
|
||||
```
|
||||
|
||||
**Output:**
|
||||
```
|
||||
[DEBUG] Executing hooks for PostToolUse:Write
|
||||
[DEBUG] Getting matching hook commands for PostToolUse with query: Write
|
||||
[DEBUG] Found 1 hook matchers in settings
|
||||
[DEBUG] Matched 1 hooks for query "Write"
|
||||
[DEBUG] Found 1 hook commands to execute
|
||||
[DEBUG] Executing hook command: ${CLAUDE_PLUGIN_ROOT}/scripts/save-hook.js with timeout 60000ms
|
||||
[DEBUG] Hook command completed with status 0: {"continue":true,"suppressOutput":true}
|
||||
```
|
||||
|
||||
### Common Issues
|
||||
|
||||
<AccordionGroup>
|
||||
<Accordion title="Hook not executing">
|
||||
**Symptoms:** Hook command never runs
|
||||
|
||||
**Debugging:**
|
||||
1. Check `/hooks` menu - is hook registered?
|
||||
2. Verify matcher pattern (case-sensitive!)
|
||||
3. Test command manually: `echo '{}' | node save-hook.js`
|
||||
4. Check file permissions (executable?)
|
||||
</Accordion>
|
||||
|
||||
<Accordion title="Hook times out">
|
||||
**Symptoms:** Hook execution exceeds timeout
|
||||
|
||||
**Debugging:**
|
||||
1. Check timeout setting (default 60s)
|
||||
2. Identify slow operation (database? network?)
|
||||
3. Move slow operation to worker
|
||||
4. Increase timeout if necessary
|
||||
</Accordion>
|
||||
|
||||
<Accordion title="Context not injecting">
|
||||
**Symptoms:** SessionStart hook runs but context missing
|
||||
|
||||
**Debugging:**
|
||||
1. Check stdout (must be valid JSON or plain text)
|
||||
2. Verify no stderr output (pollutes JSON)
|
||||
3. Check exit code (must be 0)
|
||||
4. Look for npm install output (v4.3.1 fix)
|
||||
</Accordion>
|
||||
|
||||
<Accordion title="Observations not captured">
|
||||
**Symptoms:** PostToolUse hook runs but observations missing
|
||||
|
||||
**Debugging:**
|
||||
1. Check database: `sqlite3 ~/.claude-mem/claude-mem.db "SELECT * FROM observation_queue"`
|
||||
2. Verify session exists: `SELECT * FROM sdk_sessions`
|
||||
3. Check worker status: `pm2 status`
|
||||
4. View worker logs: `pm2 logs claude-mem-worker`
|
||||
</Accordion>
|
||||
</AccordionGroup>
|
||||
|
||||
### Testing Hooks Manually
|
||||
|
||||
```bash
|
||||
# Test context hook
|
||||
echo '{
|
||||
"session_id": "test123",
|
||||
"cwd": "/Users/alex/projects/my-app",
|
||||
"hook_event_name": "SessionStart",
|
||||
"source": "startup"
|
||||
}' | node plugin/scripts/context-hook.js
|
||||
|
||||
# Test save hook
|
||||
echo '{
|
||||
"session_id": "test123",
|
||||
"tool_name": "Edit",
|
||||
"tool_input": {"file_path": "test.ts"},
|
||||
"tool_output": {"success": true}
|
||||
}' | node plugin/scripts/save-hook.js
|
||||
|
||||
# Test with actual Claude Code
|
||||
claude --debug
|
||||
/hooks # View registered hooks
|
||||
# Submit prompt and watch debug output
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
## Performance Considerations
|
||||
|
||||
### Hook Execution Time
|
||||
|
||||
**Target:** < 100ms per hook
|
||||
|
||||
**Actual measurements:**
|
||||
|
||||
| Hook | Average | p95 | p99 |
|
||||
|------|---------|-----|-----|
|
||||
| SessionStart | 45ms | 120ms | 250ms |
|
||||
| UserPromptSubmit | 12ms | 25ms | 50ms |
|
||||
| PostToolUse | 8ms | 15ms | 30ms |
|
||||
| SessionEnd | 5ms | 10ms | 20ms |
|
||||
|
||||
**Why SessionStart is slower:**
|
||||
- npm install check (idempotent but runs every time)
|
||||
- Database query for 10 sessions + 50 observations
|
||||
- Formatting progressive disclosure index
|
||||
|
||||
**Optimization (v4.3.1):**
|
||||
- Use `--loglevel=silent` for npm install
|
||||
- Cache package.json hash to skip unnecessary installs
|
||||
- Use prepared statements for database queries
|
||||
|
||||
### Database Performance
|
||||
|
||||
**Schema optimizations:**
|
||||
- Indexes on `project`, `created_at_epoch`, `claude_session_id`
|
||||
- FTS5 virtual tables for full-text search
|
||||
- WAL mode for concurrent reads/writes
|
||||
|
||||
**Query patterns:**
|
||||
```sql
|
||||
-- Fast: Uses index on (project, created_at_epoch)
|
||||
SELECT * FROM session_summaries
|
||||
WHERE project = ?
|
||||
ORDER BY created_at_epoch DESC
|
||||
LIMIT 10
|
||||
|
||||
-- Fast: Uses index on claude_session_id
|
||||
SELECT * FROM sdk_sessions
|
||||
WHERE claude_session_id = ?
|
||||
LIMIT 1
|
||||
|
||||
-- Fast: FTS5 full-text search
|
||||
SELECT * FROM observations_fts
|
||||
WHERE observations_fts MATCH ?
|
||||
ORDER BY rank
|
||||
LIMIT 20
|
||||
```
|
||||
|
||||
### Worker Throughput
|
||||
|
||||
**Bottleneck:** Claude API latency (5-30s per observation)
|
||||
|
||||
**Mitigation:**
|
||||
- Process observations sequentially (simpler, more predictable)
|
||||
- Skip low-value observations (TodoWrite, ListMcpResourcesTool)
|
||||
- Batch summaries (generate every N observations, not every observation)
|
||||
|
||||
**Future optimization:**
|
||||
- Parallel processing (multiple workers)
|
||||
- Smart batching (combine related observations)
|
||||
- Lazy summarization (summarize only when needed)
|
||||
|
||||
---
|
||||
|
||||
## Security Considerations
|
||||
|
||||
### Hook Command Safety
|
||||
|
||||
**Risk:** Hooks execute arbitrary commands with user permissions
|
||||
|
||||
**Mitigations:**
|
||||
1. **Frozen at startup:** Hook configuration captured at start, changes require review
|
||||
2. **User review required:** `/hooks` menu shows changes, requires approval
|
||||
3. **Plugin isolation:** `${CLAUDE_PLUGIN_ROOT}` prevents path traversal
|
||||
4. **Input validation:** Hooks validate stdin schema before processing
|
||||
|
||||
### Data Privacy
|
||||
|
||||
**What gets stored:**
|
||||
- User prompts (raw text) - v4.2.0+
|
||||
- Tool inputs and outputs
|
||||
- File paths read/modified
|
||||
- Session summaries
|
||||
|
||||
**Privacy guarantees:**
|
||||
- All data stored locally in `~/.claude-mem/claude-mem.db`
|
||||
- No cloud uploads (API calls only for AI compression)
|
||||
- SQLite file permissions: user-only read/write
|
||||
- No analytics or telemetry
|
||||
|
||||
### API Key Protection
|
||||
|
||||
**Configuration:**
|
||||
- Anthropic API key in `~/.anthropic/api_key` or `ANTHROPIC_API_KEY` env var
|
||||
- Worker inherits environment from Claude Code
|
||||
- Never logged or stored in database
|
||||
|
||||
---
|
||||
|
||||
## Key Takeaways
|
||||
|
||||
1. **Hooks are interfaces**: They define clean boundaries between systems
|
||||
2. **Non-blocking is critical**: Hooks must return fast, workers do the heavy lifting
|
||||
3. **Graceful degradation**: Memory system can fail without breaking Claude Code
|
||||
4. **Queue-based decoupling**: Capture and processing happen independently
|
||||
5. **Progressive disclosure**: Context injection uses index-first approach
|
||||
6. **Lifecycle alignment**: Each hook has a clear, single purpose
|
||||
|
||||
---
|
||||
|
||||
## Further Reading
|
||||
|
||||
- [Claude Code Hooks Reference](https://docs.claude.com/claude-code/hooks) - Official documentation
|
||||
- [Progressive Disclosure](/docs/progressive-disclosure) - Context priming philosophy
|
||||
- [Architecture Evolution](/docs/architecture-evolution) - v3 to v4 journey
|
||||
- [Worker Service Design](/docs/worker-service) - Background processing details
|
||||
|
||||
---
|
||||
|
||||
*The hook-driven architecture enables Claude-Mem to be both powerful and invisible. Users never notice the memory system working - it just makes Claude smarter over time.*
|
||||
@@ -0,0 +1,655 @@
|
||||
# Progressive Disclosure: Claude-Mem's Context Priming Philosophy
|
||||
|
||||
## Core Principle
|
||||
**Show what exists and its retrieval cost first. Let the agent decide what to fetch based on relevance and need.**
|
||||
|
||||
---
|
||||
|
||||
## What is Progressive Disclosure?
|
||||
|
||||
Progressive disclosure is an information architecture pattern where you reveal complexity gradually rather than all at once. In the context of AI agents, it means:
|
||||
|
||||
1. **Layer 1 (Index)**: Show lightweight metadata (titles, dates, types, token counts)
|
||||
2. **Layer 2 (Details)**: Fetch full content only when needed
|
||||
3. **Layer 3 (Deep Dive)**: Read original source files if required
|
||||
|
||||
This mirrors how humans work: We scan headlines before reading articles, review table of contents before diving into chapters, and check file names before opening files.
|
||||
|
||||
---
|
||||
|
||||
## The Problem: Context Pollution
|
||||
|
||||
Traditional RAG (Retrieval-Augmented Generation) systems fetch everything upfront:
|
||||
|
||||
```
|
||||
❌ Traditional Approach:
|
||||
┌─────────────────────────────────────┐
|
||||
│ Session Start │
|
||||
│ │
|
||||
│ [15,000 tokens of past sessions] │
|
||||
│ [8,000 tokens of observations] │
|
||||
│ [12,000 tokens of file summaries] │
|
||||
│ │
|
||||
│ Total: 35,000 tokens │
|
||||
│ Relevant: ~2,000 tokens (6%) │
|
||||
└─────────────────────────────────────┘
|
||||
```
|
||||
|
||||
**Problems:**
|
||||
- Wastes 94% of attention budget on irrelevant context
|
||||
- User prompt gets buried under mountain of history
|
||||
- Agent must process everything before understanding task
|
||||
- No way to know what's actually useful until after reading
|
||||
|
||||
---
|
||||
|
||||
## Claude-Mem's Solution: Progressive Disclosure
|
||||
|
||||
```
|
||||
✅ Progressive Disclosure Approach:
|
||||
┌─────────────────────────────────────┐
|
||||
│ Session Start │
|
||||
│ │
|
||||
│ Index of 50 observations: ~800 tokens│
|
||||
│ ↓ │
|
||||
│ Agent sees: "🔴 Hook timeout issue" │
|
||||
│ Agent decides: "Relevant!" │
|
||||
│ ↓ │
|
||||
│ Fetch observation #2543: ~120 tokens│
|
||||
│ │
|
||||
│ Total: 920 tokens │
|
||||
│ Relevant: 920 tokens (100%) │
|
||||
└─────────────────────────────────────┘
|
||||
```
|
||||
|
||||
**Benefits:**
|
||||
- Agent controls its own context consumption
|
||||
- Directly relevant to current task
|
||||
- Can fetch more if needed
|
||||
- Can skip everything if not relevant
|
||||
- Clear cost/benefit for each retrieval decision
|
||||
|
||||
---
|
||||
|
||||
## How It Works in Claude-Mem
|
||||
|
||||
### The Index Format
|
||||
|
||||
Every SessionStart hook provides a compact index:
|
||||
|
||||
```markdown
|
||||
### Oct 26, 2025
|
||||
|
||||
**General**
|
||||
| ID | Time | T | Title | Tokens |
|
||||
|----|------|---|-------|--------|
|
||||
| #2586 | 12:58 AM | 🔵 | Context hook file exists but is empty | ~51 |
|
||||
| #2587 | ″ | 🔵 | Context hook script file is empty | ~46 |
|
||||
| #2589 | ″ | 🟡 | Investigated hook debug output docs | ~105 |
|
||||
|
||||
**src/hooks/context-hook.ts**
|
||||
| ID | Time | T | Title | Tokens |
|
||||
|----|------|---|-------|--------|
|
||||
| #2591 | 1:15 AM | ⚖️ | Stderr messaging abandoned | ~155 |
|
||||
| #2592 | 1:16 AM | ⚖️ | Web UI strategy redesigned | ~193 |
|
||||
```
|
||||
|
||||
**What the agent sees:**
|
||||
- **What exists**: Observation titles give semantic meaning
|
||||
- **When it happened**: Timestamps for temporal context
|
||||
- **What type**: Icons indicate observation category
|
||||
- **Retrieval cost**: Token counts for informed decisions
|
||||
- **Where to get it**: MCP search tools referenced at bottom
|
||||
|
||||
### The Legend System
|
||||
|
||||
```
|
||||
🎯 session-request - User's original goal
|
||||
🔴 gotcha - Critical edge case or pitfall
|
||||
🟡 problem-solution - Bug fix or workaround
|
||||
🔵 how-it-works - Technical explanation
|
||||
🟢 what-changed - Code/architecture change
|
||||
🟣 discovery - Learning or insight
|
||||
🟠 why-it-exists - Design rationale
|
||||
🟤 decision - Architecture decision
|
||||
⚖️ trade-off - Deliberate compromise
|
||||
```
|
||||
|
||||
**Purpose:**
|
||||
- Visual scanning (humans and AI both benefit)
|
||||
- Semantic categorization
|
||||
- Priority signaling (🔴 gotchas are more critical)
|
||||
- Pattern recognition across sessions
|
||||
|
||||
### Progressive Disclosure Instructions
|
||||
|
||||
The index includes usage guidance:
|
||||
|
||||
```markdown
|
||||
💡 **Progressive Disclosure:** This index shows WHAT exists and retrieval COST.
|
||||
- Use MCP search tools to fetch full observation details on-demand
|
||||
- Prefer searching observations over re-reading code for past decisions
|
||||
- Critical types (🔴 gotcha, 🟤 decision, ⚖️ trade-off) often worth fetching immediately
|
||||
```
|
||||
|
||||
**What this does:**
|
||||
- Teaches the agent the pattern
|
||||
- Suggests when to fetch (critical types)
|
||||
- Recommends search over code re-reading (efficiency)
|
||||
- Makes the system self-documenting
|
||||
|
||||
---
|
||||
|
||||
## The Philosophy: Context as Currency
|
||||
|
||||
### Mental Model: Token Budget as Money
|
||||
|
||||
Think of context window as a bank account:
|
||||
|
||||
| Approach | Metaphor | Outcome |
|
||||
|----------|----------|---------|
|
||||
| **Dump everything** | Spending your entire paycheck on groceries you might need someday | Waste, clutter, can't afford what you actually need |
|
||||
| **Fetch nothing** | Refusing to spend any money | Starvation, can't accomplish tasks |
|
||||
| **Progressive disclosure** | Check your pantry, make a shopping list, buy only what you need | Efficiency, room for unexpected needs |
|
||||
|
||||
### The Attention Budget
|
||||
|
||||
LLMs have finite attention:
|
||||
- Every token attends to every other token (n² relationships)
|
||||
- 100,000 token window ≠ 100,000 tokens of useful attention
|
||||
- Context "rot" happens as window fills
|
||||
- Later tokens get less attention than earlier ones
|
||||
|
||||
**Claude-Mem's approach:**
|
||||
- Start with ~1,000 tokens of index
|
||||
- Agent has 99,000 tokens free for task
|
||||
- Agent fetches ~200 tokens when needed
|
||||
- Final budget: ~98,000 tokens for actual work
|
||||
|
||||
### Design for Autonomy
|
||||
|
||||
> "As models improve, let them act intelligently"
|
||||
|
||||
Progressive disclosure treats the agent as an **intelligent information forager**, not a passive recipient of pre-selected context.
|
||||
|
||||
**Traditional RAG:**
|
||||
```
|
||||
System → [Decides relevance] → Agent
|
||||
↑
|
||||
Hope this helps!
|
||||
```
|
||||
|
||||
**Progressive Disclosure:**
|
||||
```
|
||||
System → [Shows index] → Agent → [Decides relevance] → [Fetches details]
|
||||
↑
|
||||
You know best!
|
||||
```
|
||||
|
||||
The agent knows:
|
||||
- The current task context
|
||||
- What information would help
|
||||
- How much budget to spend
|
||||
- When to stop searching
|
||||
|
||||
We don't.
|
||||
|
||||
---
|
||||
|
||||
## Implementation Principles
|
||||
|
||||
### 1. Make Costs Visible
|
||||
|
||||
Every item in the index shows token count:
|
||||
|
||||
```
|
||||
| #2591 | 1:15 AM | ⚖️ | Stderr messaging abandoned | ~155 |
|
||||
^^^^
|
||||
Retrieval cost
|
||||
```
|
||||
|
||||
**Why:**
|
||||
- Agent can make informed ROI decisions
|
||||
- Small observations (~50 tokens) are "cheap" to fetch
|
||||
- Large observations (~500 tokens) require stronger justification
|
||||
- Matches how humans think about effort
|
||||
|
||||
### 2. Use Semantic Compression
|
||||
|
||||
Titles compress full observations into ~10 words:
|
||||
|
||||
**Bad title:**
|
||||
```
|
||||
Observation about a thing
|
||||
```
|
||||
|
||||
**Good title:**
|
||||
```
|
||||
🔴 Hook timeout issue: 60s default too short for npm install
|
||||
```
|
||||
|
||||
**What makes a good title:**
|
||||
- Specific: Identifies exact issue
|
||||
- Actionable: Clear what to do
|
||||
- Self-contained: Doesn't require reading observation
|
||||
- Searchable: Contains key terms (hook, timeout, npm)
|
||||
- Categorized: Icon indicates type
|
||||
|
||||
### 3. Group by Context
|
||||
|
||||
Observations are grouped by:
|
||||
- **Date**: Temporal context
|
||||
- **File path**: Spatial context (work on specific files)
|
||||
- **Project**: Logical context
|
||||
|
||||
```markdown
|
||||
**src/hooks/context-hook.ts**
|
||||
| ID | Time | T | Title | Tokens |
|
||||
|----|------|---|-------|--------|
|
||||
| #2591 | 1:15 AM | ⚖️ | Stderr messaging abandoned | ~155 |
|
||||
| #2594 | 1:17 AM | 🟠 | Removed stderr section from docs | ~93 |
|
||||
```
|
||||
|
||||
**Benefit:** If agent is working on `src/hooks/context-hook.ts`, related observations are already grouped together.
|
||||
|
||||
### 4. Provide Retrieval Tools
|
||||
|
||||
The index is useless without retrieval mechanisms:
|
||||
|
||||
```markdown
|
||||
*Use claude-mem MCP search to access records with the given ID*
|
||||
```
|
||||
|
||||
**Available tools:**
|
||||
- `search_observations` - Full-text search
|
||||
- `find_by_concept` - Concept-based retrieval
|
||||
- `find_by_file` - File-based retrieval
|
||||
- `find_by_type` - Type-based retrieval
|
||||
- `get_recent_context` - Recent session summaries
|
||||
|
||||
Each tool supports `format: "index"` (default) and `format: "full"`.
|
||||
|
||||
---
|
||||
|
||||
## Real-World Example
|
||||
|
||||
### Scenario: Agent asked to fix a bug in hooks
|
||||
|
||||
**Without progressive disclosure:**
|
||||
```
|
||||
SessionStart injects 25,000 tokens of past context
|
||||
Agent reads everything
|
||||
Agent finds 1 relevant observation (buried in middle)
|
||||
Total tokens consumed: 25,000
|
||||
Relevant tokens: ~200
|
||||
Efficiency: 0.8%
|
||||
```
|
||||
|
||||
**With progressive disclosure:**
|
||||
```
|
||||
SessionStart shows index: ~800 tokens
|
||||
Agent sees title: "🔴 Hook timeout issue: 60s too short"
|
||||
Agent thinks: "This looks relevant to my bug!"
|
||||
Agent fetches observation #2543: ~155 tokens
|
||||
Total tokens consumed: 955
|
||||
Relevant tokens: 955
|
||||
Efficiency: 100%
|
||||
```
|
||||
|
||||
### The Index Entry
|
||||
|
||||
```markdown
|
||||
| #2543 | 2:14 PM | 🔴 | Hook timeout: 60s too short for npm install | ~155 |
|
||||
```
|
||||
|
||||
**What the agent learns WITHOUT fetching:**
|
||||
- There's a known gotcha (🔴) about hook timeouts
|
||||
- It's related to npm install taking too long
|
||||
- Full details are ~155 tokens (cheap)
|
||||
- Happened at 2:14 PM (recent)
|
||||
|
||||
**Decision tree:**
|
||||
```
|
||||
Is my task related to hooks? → YES
|
||||
Is my task related to timeouts? → YES
|
||||
Is my task related to npm? → YES
|
||||
155 tokens is cheap → FETCH IT
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
## The Two-Tier Search Strategy
|
||||
|
||||
Claude-Mem implements progressive disclosure in search results too:
|
||||
|
||||
### Tier 1: Index Format (Default)
|
||||
|
||||
```typescript
|
||||
search_observations({
|
||||
query: "hook timeout",
|
||||
format: "index" // Default
|
||||
})
|
||||
```
|
||||
|
||||
**Returns:**
|
||||
```
|
||||
Found 3 observations matching "hook timeout":
|
||||
|
||||
| ID | Date | Type | Title | Tokens |
|
||||
|----|------|------|-------|--------|
|
||||
| #2543 | Oct 26 | gotcha | Hook timeout: 60s too short | ~155 |
|
||||
| #2891 | Oct 25 | how-it-works | Hook timeout configuration | ~203 |
|
||||
| #2102 | Oct 20 | problem-solution | Fixed timeout in CI | ~89 |
|
||||
```
|
||||
|
||||
**Cost:** ~100 tokens for 3 results
|
||||
**Value:** Agent can scan and decide which to fetch
|
||||
|
||||
### Tier 2: Full Format (On-Demand)
|
||||
|
||||
```typescript
|
||||
search_observations({
|
||||
query: "hook timeout",
|
||||
format: "full",
|
||||
limit: 1 // Fetch just the most relevant
|
||||
})
|
||||
```
|
||||
|
||||
**Returns:**
|
||||
```
|
||||
#2543 🔴 Hook timeout: 60s too short for npm install
|
||||
─────────────────────────────────────────────────
|
||||
Date: Oct 26, 2025 2:14 PM
|
||||
Type: gotcha
|
||||
Project: claude-mem
|
||||
|
||||
Narrative:
|
||||
Discovered that the default 60-second hook timeout is insufficient
|
||||
for npm install operations, especially with large dependency trees
|
||||
or slow network conditions. This causes SessionStart hook to fail
|
||||
silently, preventing context injection.
|
||||
|
||||
Facts:
|
||||
- Default timeout: 60 seconds
|
||||
- npm install with cold cache: ~90 seconds
|
||||
- Configured timeout: 120 seconds in plugin/hooks/hooks.json:25
|
||||
|
||||
Files Modified:
|
||||
- plugin/hooks/hooks.json
|
||||
|
||||
Concepts: hooks, timeout, npm, configuration
|
||||
```
|
||||
|
||||
**Cost:** ~155 tokens for full details
|
||||
**Value:** Complete understanding of the issue
|
||||
|
||||
---
|
||||
|
||||
## Cognitive Load Theory
|
||||
|
||||
Progressive disclosure is grounded in **Cognitive Load Theory**:
|
||||
|
||||
### Intrinsic Load
|
||||
The inherent difficulty of the task itself.
|
||||
|
||||
**Example:** "Fix authentication bug"
|
||||
- Must understand auth system
|
||||
- Must understand the bug
|
||||
- Must write the fix
|
||||
|
||||
This load is unavoidable.
|
||||
|
||||
### Extraneous Load
|
||||
The cognitive burden of poorly presented information.
|
||||
|
||||
**Traditional RAG adds extraneous load:**
|
||||
- Scanning irrelevant observations
|
||||
- Filtering out noise
|
||||
- Remembering what to ignore
|
||||
- Re-contextualizing after each section
|
||||
|
||||
**Progressive disclosure minimizes extraneous load:**
|
||||
- Scan titles (low effort)
|
||||
- Fetch only relevant (targeted effort)
|
||||
- Full attention on current task
|
||||
|
||||
### Germane Load
|
||||
The effort of building mental models and schemas.
|
||||
|
||||
**Progressive disclosure supports germane load:**
|
||||
- Consistent structure (legend, grouping)
|
||||
- Clear categorization (types, icons)
|
||||
- Semantic compression (good titles)
|
||||
- Explicit costs (token counts)
|
||||
|
||||
---
|
||||
|
||||
## Anti-Patterns to Avoid
|
||||
|
||||
### ❌ Verbose Titles
|
||||
|
||||
**Bad:**
|
||||
```
|
||||
| #2543 | 2:14 PM | 🔴 | Investigation into the issue where hooks time out | ~155 |
|
||||
```
|
||||
|
||||
**Good:**
|
||||
```
|
||||
| #2543 | 2:14 PM | 🔴 | Hook timeout: 60s too short for npm install | ~155 |
|
||||
```
|
||||
|
||||
### ❌ Hiding Costs
|
||||
|
||||
**Bad:**
|
||||
```
|
||||
| #2543 | 2:14 PM | 🔴 | Hook timeout issue |
|
||||
```
|
||||
|
||||
**Good:**
|
||||
```
|
||||
| #2543 | 2:14 PM | 🔴 | Hook timeout issue | ~155 |
|
||||
```
|
||||
|
||||
### ❌ No Retrieval Path
|
||||
|
||||
**Bad:**
|
||||
```
|
||||
Here are 10 observations. [No instructions on how to get full details]
|
||||
```
|
||||
|
||||
**Good:**
|
||||
```
|
||||
Here are 10 observations.
|
||||
*Use MCP search tools to fetch full observation details on-demand*
|
||||
```
|
||||
|
||||
### ❌ Defaulting to Full Format
|
||||
|
||||
**Bad:**
|
||||
```typescript
|
||||
search_observations({
|
||||
query: "hooks",
|
||||
format: "full" // Fetches everything
|
||||
})
|
||||
```
|
||||
|
||||
**Good:**
|
||||
```typescript
|
||||
search_observations({
|
||||
query: "hooks",
|
||||
format: "index", // Scan first
|
||||
limit: 20
|
||||
})
|
||||
|
||||
// Then, if needed:
|
||||
search_observations({
|
||||
query: "hooks",
|
||||
format: "full",
|
||||
limit: 1 // Just the most relevant
|
||||
})
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
## Key Design Decisions
|
||||
|
||||
### Why Token Counts?
|
||||
|
||||
**Decision:** Show approximate token counts (~155, ~203) rather than exact counts.
|
||||
|
||||
**Rationale:**
|
||||
- Communicates scale (50 vs 500) without false precision
|
||||
- Maps to human intuition (small/medium/large)
|
||||
- Allows agent to budget attention
|
||||
- Encourages cost-conscious retrieval
|
||||
|
||||
### Why Icons Instead of Text Labels?
|
||||
|
||||
**Decision:** Use emoji icons (🔴, 🟡, 🔵) rather than text (GOTCHA, PROBLEM, HOWTO).
|
||||
|
||||
**Rationale:**
|
||||
- Visual scanning (pattern recognition)
|
||||
- Token efficient (1 char vs 10 chars)
|
||||
- Language-agnostic
|
||||
- Aesthetically distinct
|
||||
- Works for both humans and AI
|
||||
|
||||
### Why Index-First, Not Smart Pre-Fetch?
|
||||
|
||||
**Decision:** Always show index first, even if we "know" what's relevant.
|
||||
|
||||
**Rationale:**
|
||||
- We can't know what's relevant better than the agent
|
||||
- Pre-fetching assumes we understand the task
|
||||
- Agent knows current context, we don't
|
||||
- Respects agent autonomy
|
||||
- Fails gracefully (can always fetch more)
|
||||
|
||||
### Why Group by File Path?
|
||||
|
||||
**Decision:** Group observations by file path in addition to date.
|
||||
|
||||
**Rationale:**
|
||||
- Spatial locality: Work on file X likely needs context about file X
|
||||
- Reduces scanning effort
|
||||
- Matches how developers think
|
||||
- Clear semantic boundaries
|
||||
|
||||
---
|
||||
|
||||
## Measuring Success
|
||||
|
||||
Progressive disclosure is working when:
|
||||
|
||||
### ✅ Low Waste Ratio
|
||||
```
|
||||
Relevant Tokens / Total Context Tokens > 80%
|
||||
```
|
||||
|
||||
Most of the context consumed is actually useful.
|
||||
|
||||
### ✅ Selective Fetching
|
||||
```
|
||||
Index Shown: 50 observations
|
||||
Details Fetched: 2-3 observations
|
||||
```
|
||||
|
||||
Agent is being selective, not fetching everything.
|
||||
|
||||
### ✅ Fast Task Completion
|
||||
```
|
||||
Session with index: 30 seconds to find relevant context
|
||||
Session without: 90 seconds scanning all context
|
||||
```
|
||||
|
||||
Time-to-relevant-information is faster.
|
||||
|
||||
### ✅ Appropriate Depth
|
||||
```
|
||||
Simple task: Only index needed
|
||||
Medium task: 1-2 observations fetched
|
||||
Complex task: 5-10 observations + code reads
|
||||
```
|
||||
|
||||
Depth scales with task complexity.
|
||||
|
||||
---
|
||||
|
||||
## Future Enhancements
|
||||
|
||||
### Adaptive Index Size
|
||||
|
||||
```typescript
|
||||
// Vary index size based on session type
|
||||
SessionStart({ source: "startup" }):
|
||||
→ Show last 10 sessions (small index)
|
||||
|
||||
SessionStart({ source: "resume" }):
|
||||
→ Show only current session (micro index)
|
||||
|
||||
SessionStart({ source: "compact" }):
|
||||
→ Show last 20 sessions (larger index)
|
||||
```
|
||||
|
||||
### Relevance Scoring
|
||||
|
||||
```typescript
|
||||
// Use embeddings to pre-sort index by relevance
|
||||
search_observations({
|
||||
query: "authentication bug",
|
||||
format: "index",
|
||||
sort: "relevance" // Based on semantic similarity
|
||||
})
|
||||
```
|
||||
|
||||
### Cost Forecasting
|
||||
|
||||
```markdown
|
||||
💡 **Budget Estimate:**
|
||||
- Fetching all 🔴 gotchas: ~450 tokens
|
||||
- Fetching all file-related: ~1,200 tokens
|
||||
- Fetching everything: ~8,500 tokens
|
||||
```
|
||||
|
||||
### Progressive Detail Levels
|
||||
|
||||
```
|
||||
Layer 1: Index (titles only)
|
||||
Layer 2: Summaries (2-3 sentences)
|
||||
Layer 3: Full details (complete observation)
|
||||
Layer 4: Source files (referenced code)
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
## Key Takeaways
|
||||
|
||||
1. **Show, don't tell**: Index reveals what exists without forcing consumption
|
||||
2. **Cost-conscious**: Make retrieval costs visible for informed decisions
|
||||
3. **Agent autonomy**: Let the agent decide what's relevant
|
||||
4. **Semantic compression**: Good titles make or break the system
|
||||
5. **Consistent structure**: Patterns reduce cognitive load
|
||||
6. **Two-tier everything**: Index first, details on-demand
|
||||
7. **Context as currency**: Spend wisely on high-value information
|
||||
|
||||
---
|
||||
|
||||
## Remember
|
||||
|
||||
> "The best interface is one that disappears when not needed, and appears exactly when it is."
|
||||
|
||||
Progressive disclosure respects the agent's intelligence and autonomy. We provide the map; the agent chooses the path.
|
||||
|
||||
---
|
||||
|
||||
## Further Reading
|
||||
|
||||
- [Context Engineering for AI Agents](/docs/context-engineering) - Foundational principles
|
||||
- [Claude-Mem Architecture](/docs/architecture) - How it all fits together
|
||||
- Cognitive Load Theory (Sweller, 1988)
|
||||
- Information Foraging Theory (Pirolli & Card, 1999)
|
||||
- Progressive Disclosure (Nielsen Norman Group)
|
||||
|
||||
---
|
||||
|
||||
*This philosophy emerged from real-world usage of Claude-Mem across hundreds of coding sessions. The pattern works because it aligns with both human cognition and LLM attention mechanics.*
|
||||
@@ -8,6 +8,11 @@
|
||||
"type": "command",
|
||||
"command": "cd \"${CLAUDE_PLUGIN_ROOT}/..\" && npm install --prefer-offline --no-audit --no-fund --loglevel=silent && node ${CLAUDE_PLUGIN_ROOT}/scripts/context-hook.js",
|
||||
"timeout": 300
|
||||
},
|
||||
{
|
||||
"type": "command",
|
||||
"command": "node ${CLAUDE_PLUGIN_ROOT}/scripts/stderr-test-hook.js",
|
||||
"timeout": 10
|
||||
}
|
||||
]
|
||||
}
|
||||
|
||||
Executable
+3
@@ -0,0 +1,3 @@
|
||||
#!/usr/bin/env node
|
||||
#!/usr/bin/env node
|
||||
console.error("\u{1F9EA} TEST: This is a stderr message from the claude-mem hook");process.exit(0);
|
||||
@@ -17,7 +17,8 @@ const HOOKS = [
|
||||
{ name: 'new-hook', source: 'src/hooks/new-hook.ts' },
|
||||
{ name: 'save-hook', source: 'src/hooks/save-hook.ts' },
|
||||
{ name: 'summary-hook', source: 'src/hooks/summary-hook.ts' },
|
||||
{ name: 'cleanup-hook', source: 'src/hooks/cleanup-hook.ts' }
|
||||
{ name: 'cleanup-hook', source: 'src/hooks/cleanup-hook.ts' },
|
||||
{ name: 'stderr-test-hook', source: 'src/hooks/stderr-test-hook.ts' }
|
||||
];
|
||||
|
||||
const WORKER_SERVICE = {
|
||||
|
||||
@@ -0,0 +1,12 @@
|
||||
#!/usr/bin/env node
|
||||
|
||||
/**
|
||||
* Test hook to verify if stderr messages appear in Claude Code UI
|
||||
* This hook simply outputs a message via console.error()
|
||||
*/
|
||||
|
||||
// Output a test message to stderr
|
||||
console.error('🧪 TEST: This is a stderr message from the claude-mem hook');
|
||||
|
||||
// Exit successfully
|
||||
process.exit(0);
|
||||
Reference in New Issue
Block a user