# Architecture Evolution: The Journey from v3 to v4 ## The Problem We Solved **Goal:** Create a memory system that makes Claude smarter across sessions without the user noticing it exists. **Challenge:** How do you observe AI agent behavior, compress it intelligently, and serve it back at the right time - all without slowing down or interfering with the main workflow? This is the story of how claude-mem evolved from a simple idea to a production-ready system, and the key architectural decisions that made it work. --- ## v1-v2: The Naive Approach ### The First Attempt: Dump Everything **Architecture:** ``` PostToolUse Hook → Save raw tool outputs → Retrieve everything on startup ``` **What we learned:** - ❌ Context pollution (thousands of tokens of irrelevant data) - ❌ No compression (raw tool outputs are verbose) - ❌ No search (had to scan everything linearly) - ✅ Proved the concept: Memory across sessions is valuable **Example of what went wrong:** ``` SessionStart loaded: - 150 file read operations - 80 grep searches - 45 bash commands - Total: ~35,000 tokens - Relevant to current task: ~500 tokens (1.4%) ``` --- ## v3: Smart Compression, Wrong Architecture ### The Breakthrough: AI-Powered Compression **New idea:** Use Claude itself to compress observations **Architecture:** ``` PostToolUse Hook → Queue observation → SDK Worker → AI compression → Store insights ``` **What we added:** 1. **Claude Agent SDK integration** - Use AI to compress observations 2. **Background worker** - Don't block main session 3. **Structured observations** - Extract facts, decisions, insights 4. **Session summaries** - Generate comprehensive summaries **What worked:** - ✅ Compression ratio: 10:1 to 100:1 - ✅ Semantic understanding (not just keyword matching) - ✅ Background processing (hooks stayed fast) - ✅ Search became useful **What didn't work:** - ❌ Still loaded everything upfront - ❌ Session ID management was broken - ❌ Aggressive cleanup interrupted summaries - ❌ Multiple SDK sessions per Claude Code session --- ## The Key Realizations ### Realization 1: Progressive Disclosure **Problem:** Even compressed observations can pollute context if you load them all. **Insight:** Humans don't read everything before starting work. Why should AI? **Solution:** Show an index first, fetch details on-demand. ``` ❌ Old: Load 50 observations (8,500 tokens) ✅ New: Show index of 50 observations (800 tokens) Agent fetches 2-3 relevant ones (300 tokens) Total: 1,100 tokens vs 8,500 tokens ``` **Impact:** - 87% reduction in context usage - 100% relevance (only fetch what's needed) - Agent autonomy (decides what's relevant) ### Realization 2: Session ID Chaos **Problem:** SDK session IDs change on every turn. **What we thought:** ```typescript // ❌ Wrong assumption UserPromptSubmit → Capture session ID once → Use forever ``` **Reality:** ```typescript // ✅ Actual behavior Turn 1: session_abc123 Turn 2: session_def456 Turn 3: session_ghi789 ``` **Why this matters:** - Can't resume sessions without tracking ID updates - Session state gets lost between turns - Observations get orphaned **Solution:** ```typescript // Capture from system init message for await (const msg of response) { if (msg.type === 'system' && msg.subtype === 'init') { sdkSessionId = msg.session_id; await updateSessionId(sessionId, sdkSessionId); } } ``` ### Realization 3: Graceful vs Aggressive Cleanup **v3 approach:** ```typescript // ❌ Aggressive: Kill worker immediately SessionEnd → DELETE /worker/session → Worker stops ``` **Problems:** - Summary generation interrupted mid-process - Pending observations lost - Race conditions everywhere **v4 approach:** ```typescript // ✅ Graceful: Let worker finish SessionEnd → Mark session complete → Worker finishes → Exit naturally ``` **Benefits:** - Summaries complete successfully - No lost observations - Clean state transitions **Code:** ```typescript // v3: Aggressive async function sessionEnd(sessionId: string) { await fetch(`http://localhost:37777/sessions/${sessionId}`, { method: 'DELETE' }); } // v4: Graceful async function sessionEnd(sessionId: string) { await db.run( 'UPDATE sdk_sessions SET completed_at = ? WHERE id = ?', [Date.now(), sessionId] ); } ``` ### Realization 4: One Session, Not Many **Problem:** We were creating multiple SDK sessions per Claude Code session. **What we thought:** ``` Claude Code session → Create SDK session per observation → 100+ SDK sessions ``` **Reality should be:** ``` Claude Code session → ONE long-running SDK session → Streaming input ``` **Why this matters:** - SDK maintains conversation state - Context accumulates naturally - Much more efficient **Implementation:** ```typescript // ✅ Streaming Input Mode async function* messageGenerator(): AsyncIterable { // Initial prompt yield { role: "user", content: "You are a memory assistant..." }; // Then continuously yield observations while (session.status === 'active') { const observations = await pollQueue(); for (const obs of observations) { yield { role: "user", content: formatObservation(obs) }; } await sleep(1000); } } const response = query({ prompt: messageGenerator(), options: { maxTurns: 1000 } }); ``` --- ## v4: The Architecture That Works ### The Core Design ``` ┌─────────────────────────────────────────────────────────┐ │ CLAUDE CODE SESSION │ │ User → Claude → Tools (Read, Edit, Write, Bash) │ │ ↓ │ │ PostToolUse Hook │ │ (queues observation) │ └─────────────────────────────────────────────────────────┘ ↓ SQLite queue ┌─────────────────────────────────────────────────────────┐ │ SDK WORKER PROCESS │ │ ONE streaming session per Claude Code session │ │ │ │ AsyncIterable │ │ → Yields observations from queue │ │ → SDK compresses via AI │ │ → Parses XML responses │ │ → Stores in database │ └─────────────────────────────────────────────────────────┘ ↓ SQLite storage ┌─────────────────────────────────────────────────────────┐ │ NEXT SESSION │ │ SessionStart Hook │ │ → Queries database │ │ → Returns progressive disclosure index │ │ → Agent fetches details via MCP │ └─────────────────────────────────────────────────────────┘ ``` ### The Five Hook Architecture **Purpose:** Inject context from previous sessions **Timing:** When Claude Code starts **What it does:** - Queries last 10 session summaries - Formats as progressive disclosure index - Injects into context via stdout **Key change from v3:** - ✅ Index format (not full details) - ✅ Token counts visible - ✅ MCP search instructions included **Purpose:** Initialize session tracking **Timing:** Before Claude processes prompt **What it does:** - Creates session record - Saves raw user prompt (v4.2.0+) - Starts worker if needed **Key change from v3:** - ✅ Stores raw prompts for search - ✅ Auto-starts PM2 worker **Purpose:** Capture tool observations **Timing:** After every tool execution **What it does:** - Enqueues observation in database - Returns immediately **Key change from v3:** - ✅ Just enqueues (doesn't process) - ✅ Worker handles all AI calls **Purpose:** Generate session summaries **Timing:** Worker-triggered (mid-session) **What it does:** - Gathers observations - Sends to Claude for summarization - Stores structured summary **Key change from v3:** - ✅ Multiple summaries per session - ✅ Summaries are checkpoints, not endings **Purpose:** Graceful cleanup **Timing:** When session ends **What it does:** - Marks session complete - Lets worker finish processing **Key change from v3:** - ✅ Graceful (not aggressive) - ✅ No DELETE requests - ✅ Worker finishes naturally ### Database Schema Evolution **v3 schema:** ```sql -- Simple, flat structure CREATE TABLE observations ( id INTEGER PRIMARY KEY, session_id TEXT, text TEXT, created_at INTEGER ); ``` **v4 schema:** ```sql -- Rich, structured schema CREATE TABLE observations ( id INTEGER PRIMARY KEY AUTOINCREMENT, session_id TEXT NOT NULL, project TEXT NOT NULL, -- Progressive disclosure metadata title TEXT NOT NULL, subtitle TEXT, type TEXT NOT NULL, -- decision, bugfix, feature, etc. -- Content narrative TEXT NOT NULL, facts TEXT, -- JSON array -- Searchability concepts TEXT, -- JSON array of tags files_read TEXT, -- JSON array files_modified TEXT, -- JSON array -- Timestamps created_at TEXT NOT NULL, created_at_epoch INTEGER NOT NULL, FOREIGN KEY(session_id) REFERENCES sdk_sessions(id) ); -- FTS5 for full-text search CREATE VIRTUAL TABLE observations_fts USING fts5( title, subtitle, narrative, facts, concepts, content=observations ); -- Auto-sync triggers CREATE TRIGGER observations_ai AFTER INSERT ON observations BEGIN INSERT INTO observations_fts(rowid, title, subtitle, narrative, facts, concepts) VALUES (new.id, new.title, new.subtitle, new.narrative, new.facts, new.concepts); END; ``` **What changed:** - ✅ Structured fields (title, subtitle, type) - ✅ FTS5 full-text search - ✅ Project-scoped queries - ✅ Rich metadata for progressive disclosure ### Worker Service Redesign **v3 worker:** ```typescript // Multiple short SDK sessions app.post('/process', async (req, res) => { const response = await query({ prompt: buildPrompt(req.body), options: { maxTurns: 1 } }); for await (const msg of response) { // Process single observation } res.json({ success: true }); }); ``` **v4 worker:** ```typescript // ONE long-running SDK session async function runWorker(sessionId: string) { const response = query({ prompt: messageGenerator(), // AsyncIterable options: { maxTurns: 1000 } }); for await (const msg of response) { if (msg.type === 'text') { parseObservations(msg.content); parseSummaries(msg.content); } } } ``` **Benefits:** - Maintains conversation state - SDK handles context automatically - More efficient (fewer API calls) - Natural multi-turn flow --- ## Critical Fixes Along the Way ### Fix 1: Context Injection Pollution (v4.3.1) **Problem:** SessionStart hook output polluted with npm install logs ```bash # Hook output contained: npm WARN deprecated ... npm WARN deprecated ... {"hookSpecificOutput": {"additionalContext": "..."}} ``` **Why it broke:** - Claude Code expects clean JSON or plain text - stderr/stdout from npm install mixed with hook output - Context didn't inject properly **Solution:** ```json { "command": "npm install --loglevel=silent && node context-hook.js" } ``` **Result:** Clean JSON output, context injection works ### Fix 2: Double Shebang Issue (v4.3.1) **Problem:** Hook executables had duplicate shebangs ```javascript #!/usr/bin/env node #!/usr/bin/env node // ← Duplicate! // Rest of code... ``` **Why it happened:** - Source files had shebang - esbuild added another shebang during build **Solution:** ```typescript // Remove shebangs from source files // Let esbuild add them during build ``` **Result:** Clean executables, no parsing errors ### Fix 3: FTS5 Injection Vulnerability (v4.2.3) **Problem:** User input passed directly to FTS5 query ```typescript // ❌ Vulnerable const results = db.query( `SELECT * FROM observations_fts WHERE observations_fts MATCH '${userQuery}'` ); ``` **Attack:** ```typescript userQuery = "'; DROP TABLE observations; --" ``` **Solution:** ```typescript // ✅ Safe: Use parameterized queries const results = db.query( 'SELECT * FROM observations_fts WHERE observations_fts MATCH ?', [userQuery] ); ``` ### Fix 4: NOT NULL Constraint Violation (v4.2.8) **Problem:** Session creation failed when prompt was empty ```sql INSERT INTO sdk_sessions (claude_session_id, user_prompt, ...) VALUES ('abc123', NULL, ...) -- ❌ user_prompt is NOT NULL ``` **Solution:** ```typescript // Allow NULL user_prompts user_prompt: input.prompt ?? null ``` **Schema change:** ```sql -- Before user_prompt TEXT NOT NULL -- After user_prompt TEXT -- Nullable ``` --- ## Performance Improvements ### Optimization 1: Prepared Statements **Before:** ```typescript for (const obs of observations) { db.run(`INSERT INTO observations (...) VALUES (?, ?, ...)`, [obs.id, obs.text, ...]); } ``` **After:** ```typescript const stmt = db.prepare(`INSERT INTO observations (...) VALUES (?, ?, ...)`); for (const obs of observations) { stmt.run([obs.id, obs.text, ...]); } stmt.finalize(); ``` **Impact:** 5x faster bulk inserts ### Optimization 2: FTS5 Indexing **Before:** ```typescript // Manual full-text search const results = db.query( `SELECT * FROM observations WHERE text LIKE '%${query}%'` ); ``` **After:** ```typescript // FTS5 virtual table const results = db.query( `SELECT * FROM observations_fts WHERE observations_fts MATCH ?`, [query] ); ``` **Impact:** 100x faster searches on large datasets ### Optimization 3: Index Format Default **Before:** ```typescript // Always return full observations search_observations({ query: "hooks" }); // Returns: 5,000 tokens ``` **After:** ```typescript // Default to index format search_observations({ query: "hooks", format: "index" }); // Returns: 200 tokens // Fetch full only when needed search_observations({ query: "hooks", format: "full", limit: 1 }); // Returns: 150 tokens ``` **Impact:** 25x reduction in average search result size --- ## What We Learned ### Lesson 1: Context is Precious **Principle:** Every token you put in context window costs attention. **Application:** - Progressive disclosure reduces waste by 87% - Index-first approach gives agent control - Token counts make costs visible ### Lesson 2: Session State is Complicated **Principle:** Distributed state is hard. SDK handles it better than we can. **Application:** - Use SDK's built-in session resumption - Don't try to manually reconstruct state - Track session IDs from init messages ### Lesson 3: Graceful Beats Aggressive **Principle:** Let processes finish their work before terminating. **Application:** - Graceful cleanup prevents data loss - Workers finish important operations - Clean state transitions reduce bugs ### Lesson 4: AI is the Compressor **Principle:** Don't compress manually. Let AI do semantic compression. **Application:** - 10:1 to 100:1 compression ratios - Semantic understanding, not keyword extraction - Structured outputs (XML parsing) ### Lesson 5: Progressive Everything **Principle:** Show metadata first, fetch details on-demand. **Application:** - Progressive disclosure in context injection - Index format in search results - Layer 1 (titles) → Layer 2 (summaries) → Layer 3 (full details) --- ## The Road Ahead ### Planned: Adaptive Index Size ```typescript SessionStart({ source: "startup" }): → Show last 10 sessions (normal) SessionStart({ source: "resume" }): → Show only current session (minimal) SessionStart({ source: "compact" }): → Show last 20 sessions (comprehensive) ``` ### Planned: Relevance Scoring ```typescript // Use embeddings to pre-sort index by semantic relevance search_observations({ query: "authentication bug", sort: "relevance" // Based on embeddings }); ``` ### Planned: Multi-Project Context ```typescript // Cross-project pattern recognition search_observations({ query: "API rate limiting", projects: ["api-gateway", "user-service", "billing-service"] }); ``` ### Planned: Collaborative Memory ```typescript // Team-shared observations (optional) createObservation({ title: "Rate limit: 100 req/min", scope: "team" // vs "user" }); ``` --- ## Migration Guide: v3 → v4 ### Step 1: Backup Database ```bash cp ~/.claude-mem/claude-mem.db ~/.claude-mem/claude-mem-v3-backup.db ``` ### Step 2: Update Plugin ```bash cd ~/.claude/plugins/marketplaces/thedotmack git pull ``` ### Step 3: Run Migration ```bash npx tsx src/services/sqlite/migrations/v3-to-v4.ts ``` **What the migration does:** - Adds new columns to observations table - Creates FTS5 virtual tables - Sets up auto-sync triggers - Migrates existing observations to new schema ### Step 4: Restart Worker ```bash pm2 restart claude-mem-worker pm2 logs claude-mem-worker ``` ### Step 5: Test ```bash # Start Claude Code claude # Check that context is injected # (Should see progressive disclosure index) # Submit a prompt and check observations pm2 logs claude-mem-worker --nostream ``` --- ## Key Metrics ### v3 Performance | Metric | Value | |--------|-------| | Context usage per session | ~25,000 tokens | | Relevant context | ~2,000 tokens (8%) | | Hook execution time | ~200ms | | Search latency | ~500ms (LIKE queries) | ### v4 Performance | Metric | Value | |--------|-------| | Context usage per session | ~1,100 tokens | | Relevant context | ~1,100 tokens (100%) | | Hook execution time | ~45ms | | Search latency | ~15ms (FTS5) | **Improvements:** - 96% reduction in context waste - 12x increase in relevance - 4x faster hooks - 33x faster search --- ## Conclusion The journey from v3 to v4 was about understanding these fundamental truths: 1. **Context is finite** - Progressive disclosure respects attention budget 2. **AI is the compressor** - Semantic understanding beats keyword extraction 3. **Agents are smart** - Let them decide what to fetch 4. **State is hard** - Use SDK's built-in mechanisms 5. **Graceful wins** - Let processes finish cleanly The result is a memory system that's both powerful and invisible. Users never notice it working - Claude just gets smarter over time. --- ## Further Reading - [Progressive Disclosure](/docs/progressive-disclosure) - The philosophy behind v4 - [Hooks Architecture](/docs/hooks-architecture) - How hooks power the system - [Context Engineering](/docs/context-engineering) - Foundational principles - [v4.0.0 Release Notes](/CHANGELOG.md#v400) - Full changelog --- *This architecture evolution reflects hundreds of hours of experimentation, dozens of dead ends, and the invaluable experience of real-world usage. v4 is the architecture that emerged from understanding what actually works.*