# Architecture Evolution: The Journey from v3 to v5 ## The Problem We Solved **Goal:** Create a memory system that makes Claude smarter across sessions without the user noticing it exists. **Challenge:** How do you observe AI agent behavior, compress it intelligently, and serve it back at the right time - all without slowing down or interfering with the main workflow? This is the story of how claude-mem evolved from a simple idea to a production-ready system, and the key architectural decisions that made it work. --- ## v5.x: Maturity and User Experience After establishing the solid v4 architecture, v5.x focused on user experience, visualization, and polish. ### v5.1.2: Theme Toggle (November 2025) **What Changed**: Added light/dark mode theme toggle to viewer UI **New Features**: - User-selectable theme preference (light, dark, system) - Persistent theme settings in localStorage - Smooth theme transitions - System preference detection **Implementation**: ```typescript // Theme context with persistence const ThemeProvider = ({ children }) => { const [theme, setTheme] = useState<'light' | 'dark' | 'system'>(() => { return localStorage.getItem('claude-mem-theme') || 'system'; }); useEffect(() => { localStorage.setItem('claude-mem-theme', theme); }, [theme]); return ( {children} ); }; ``` **Why It Matters**: Users working in different lighting conditions can now customize the viewer for comfort. ### v5.1.1: PM2 Windows Fix (November 2025) **The Problem**: PM2 startup failed on Windows with ENOENT error **Root Cause**: ```typescript // ❌ Failed on Windows - PM2 not in PATH execSync('pm2 start ecosystem.config.cjs'); ``` **The Fix**: ```typescript // ✅ Use full path to PM2 binary const PM2_PATH = join(PLUGIN_ROOT, 'node_modules', '.bin', 'pm2'); execSync(`"${PM2_PATH}" start "${ECOSYSTEM_CONFIG}"`); ``` **Impact**: Cross-platform compatibility restored, Windows users can now use claude-mem without issues. ### v5.1.0: Web-Based Viewer UI (October 2025) **The Breakthrough**: Real-time visualization of memory stream **What We Built**: - React-based web UI at http://localhost:37777 - Server-Sent Events (SSE) for real-time updates - Infinite scroll pagination - Project filtering - Settings persistence (sidebar state, selected project) - Auto-reconnection with exponential backoff - GPU-accelerated animations **New Worker Endpoints** (8 additions): ``` GET / # Serves viewer HTML GET /stream # SSE real-time updates GET /api/prompts # Paginated user prompts GET /api/observations # Paginated observations GET /api/summaries # Paginated session summaries GET /api/stats # Database statistics GET /api/settings # User settings POST /api/settings # Save settings ``` **Database Enhancements**: ```typescript // New SessionStore methods for viewer getRecentPrompts(limit, offset, project?) getRecentObservations(limit, offset, project?) getRecentSummaries(limit, offset, project?) getStats() getUniqueProjects() ``` **React Architecture**: ``` src/ui/viewer/ ├── components/ │ ├── Header.tsx # Navigation + stats │ ├── Sidebar.tsx # Project filter │ ├── Feed.tsx # Infinite scroll │ └── cards/ │ ├── ObservationCard.tsx │ ├── PromptCard.tsx │ ├── SummaryCard.tsx │ └── SkeletonCard.tsx ├── hooks/ │ ├── useSSE.ts # Real-time events │ ├── usePagination.ts # Infinite scroll │ ├── useSettings.ts # Persistence │ └── useStats.ts # Statistics └── utils/ ├── merge.ts # Data deduplication └── format.ts # Display formatting ``` **Build Process**: ```typescript // esbuild bundles everything into single HTML file esbuild.build({ entryPoints: ['src/ui/viewer/index.tsx'], bundle: true, outfile: 'plugin/ui/viewer.html', loader: { '.tsx': 'tsx', '.woff2': 'dataurl' }, define: { 'process.env.NODE_ENV': '"production"' }, }); ``` **Why It Matters**: Users can now see exactly what's being captured in real-time, making the memory system transparent and debuggable. ### v5.0.3: Smart Install Caching (October 2025) **The Problem**: `npm install` ran on every SessionStart (2-5 seconds) **The Insight**: Dependencies rarely change between sessions **The Solution**: Version-based caching ```typescript // Check version marker before installing const currentVersion = getPackageVersion(); const installedVersion = readFileSync('.install-version', 'utf-8'); if (currentVersion !== installedVersion) { // Only install if version changed await runNpmInstall(); writeFileSync('.install-version', currentVersion); } ``` **Cached Check Logic**: 1. Does `node_modules` exist? 2. Does `.install-version` match `package.json` version? 3. Is `better-sqlite3` present? **Impact**: - SessionStart hook: 2-5 seconds → 10ms (99.5% faster) - Only installs on: first run, version change, missing deps - Better Windows error messages with build tool help ### v5.0.2: Worker Health Checks (October 2025) **What Changed**: More robust worker startup and monitoring **New Features**: ```typescript // Health check endpoint app.get('/health', (req, res) => { res.json({ status: 'ok', uptime: process.uptime(), port: WORKER_PORT, memory: process.memoryUsage(), }); }); // Smart worker startup async function ensureWorkerHealthy() { const healthy = await isWorkerHealthy(1000); if (!healthy) { await startWorker(); await waitForWorkerHealth(10000); } } ``` **Benefits**: - Graceful degradation when worker is down - Auto-recovery from crashes - Better error messages for debugging ### v5.0.1: Stability Improvements (October 2025) **What Changed**: Various bug fixes and stability enhancements **Key Fixes**: - Fixed race conditions in observation queue processing - Improved error handling in SDK worker - Better cleanup of stale PM2 processes - Enhanced logging for debugging ### v5.0.0: Hybrid Search Architecture (October 2025) **The Evolution**: SQLite FTS5 + Chroma vector search **What We Added**: ``` ┌─────────────────────────────────────────────────────────┐ │ HYBRID SEARCH │ │ │ │ Text Query → SQLite FTS5 (keyword matching) │ │ ↓ │ │ Chroma Vector Search (semantic) │ │ ↓ │ │ Merge + Re-rank Results │ └─────────────────────────────────────────────────────────┘ ``` **New Dependencies**: - `chromadb` - Vector database for semantic search - Python 3.8+ - Required by chromadb **MCP Tools Enhancement**: ```typescript // Chroma-backed semantic search search_observations({ query: "authentication bug", useSemanticSearch: true // Uses Chroma }); // Falls back to FTS5 if Chroma unavailable ``` **Why Hybrid**: - FTS5: Fast keyword matching, no dependencies - Chroma: Semantic understanding, finds related concepts - Graceful degradation: Works without Chroma (FTS5 only) **Trade-offs**: - Added Python dependency (optional) - Increased installation complexity - Better search relevance --- ## v1-v2: The Naive Approach ### The First Attempt: Dump Everything **Architecture:** ``` PostToolUse Hook → Save raw tool outputs → Retrieve everything on startup ``` **What we learned:** - ❌ Context pollution (thousands of tokens of irrelevant data) - ❌ No compression (raw tool outputs are verbose) - ❌ No search (had to scan everything linearly) - ✅ Proved the concept: Memory across sessions is valuable **Example of what went wrong:** ``` SessionStart loaded: - 150 file read operations - 80 grep searches - 45 bash commands - Total: ~35,000 tokens - Relevant to current task: ~500 tokens (1.4%) ``` --- ## v3: Smart Compression, Wrong Architecture ### The Breakthrough: AI-Powered Compression **New idea:** Use Claude itself to compress observations **Architecture:** ``` PostToolUse Hook → Queue observation → SDK Worker → AI compression → Store insights ``` **What we added:** 1. **Claude Agent SDK integration** - Use AI to compress observations 2. **Background worker** - Don't block main session 3. **Structured observations** - Extract facts, decisions, insights 4. **Session summaries** - Generate comprehensive summaries **What worked:** - ✅ Compression ratio: 10:1 to 100:1 - ✅ Semantic understanding (not just keyword matching) - ✅ Background processing (hooks stayed fast) - ✅ Search became useful **What didn't work:** - ❌ Still loaded everything upfront - ❌ Session ID management was broken - ❌ Aggressive cleanup interrupted summaries - ❌ Multiple SDK sessions per Claude Code session --- ## The Key Realizations ### Realization 1: Progressive Disclosure **Problem:** Even compressed observations can pollute context if you load them all. **Insight:** Humans don't read everything before starting work. Why should AI? **Solution:** Show an index first, fetch details on-demand. ``` ❌ Old: Load 50 observations (8,500 tokens) ✅ New: Show index of 50 observations (800 tokens) Agent fetches 2-3 relevant ones (300 tokens) Total: 1,100 tokens vs 8,500 tokens ``` **Impact:** - 87% reduction in context usage - 100% relevance (only fetch what's needed) - Agent autonomy (decides what's relevant) ### Realization 2: Session ID Chaos **Problem:** SDK session IDs change on every turn. **What we thought:** ```typescript // ❌ Wrong assumption UserPromptSubmit → Capture session ID once → Use forever ``` **Reality:** ```typescript // ✅ Actual behavior Turn 1: session_abc123 Turn 2: session_def456 Turn 3: session_ghi789 ``` **Why this matters:** - Can't resume sessions without tracking ID updates - Session state gets lost between turns - Observations get orphaned **Solution:** ```typescript // Capture from system init message for await (const msg of response) { if (msg.type === 'system' && msg.subtype === 'init') { sdkSessionId = msg.session_id; await updateSessionId(sessionId, sdkSessionId); } } ``` ### Realization 3: Graceful vs Aggressive Cleanup **v3 approach:** ```typescript // ❌ Aggressive: Kill worker immediately SessionEnd → DELETE /worker/session → Worker stops ``` **Problems:** - Summary generation interrupted mid-process - Pending observations lost - Race conditions everywhere **v4 approach:** ```typescript // ✅ Graceful: Let worker finish SessionEnd → Mark session complete → Worker finishes → Exit naturally ``` **Benefits:** - Summaries complete successfully - No lost observations - Clean state transitions **Code:** ```typescript // v3: Aggressive async function sessionEnd(sessionId: string) { await fetch(`http://localhost:37777/sessions/${sessionId}`, { method: 'DELETE' }); } // v4: Graceful async function sessionEnd(sessionId: string) { await db.run( 'UPDATE sdk_sessions SET completed_at = ? WHERE id = ?', [Date.now(), sessionId] ); } ``` ### Realization 4: One Session, Not Many **Problem:** We were creating multiple SDK sessions per Claude Code session. **What we thought:** ``` Claude Code session → Create SDK session per observation → 100+ SDK sessions ``` **Reality should be:** ``` Claude Code session → ONE long-running SDK session → Streaming input ``` **Why this matters:** - SDK maintains conversation state - Context accumulates naturally - Much more efficient **Implementation:** ```typescript // ✅ Streaming Input Mode async function* messageGenerator(): AsyncIterable { // Initial prompt yield { role: "user", content: "You are a memory assistant..." }; // Then continuously yield observations while (session.status === 'active') { const observations = await pollQueue(); for (const obs of observations) { yield { role: "user", content: formatObservation(obs) }; } await sleep(1000); } } const response = query({ prompt: messageGenerator(), options: { maxTurns: 1000 } }); ``` --- ## v4: The Architecture That Works ### The Core Design ``` ┌─────────────────────────────────────────────────────────┐ │ CLAUDE CODE SESSION │ │ User → Claude → Tools (Read, Edit, Write, Bash) │ │ ↓ │ │ PostToolUse Hook │ │ (queues observation) │ └─────────────────────────────────────────────────────────┘ ↓ SQLite queue ┌─────────────────────────────────────────────────────────┐ │ SDK WORKER PROCESS │ │ ONE streaming session per Claude Code session │ │ │ │ AsyncIterable │ │ → Yields observations from queue │ │ → SDK compresses via AI │ │ → Parses XML responses │ │ → Stores in database │ └─────────────────────────────────────────────────────────┘ ↓ SQLite storage ┌─────────────────────────────────────────────────────────┐ │ NEXT SESSION │ │ SessionStart Hook │ │ → Queries database │ │ → Returns progressive disclosure index │ │ → Agent fetches details via MCP │ └─────────────────────────────────────────────────────────┘ ``` ### The Five Hook Architecture **Purpose:** Inject context from previous sessions **Timing:** When Claude Code starts **What it does:** - Queries last 10 session summaries - Formats as progressive disclosure index - Injects into context via stdout **Key change from v3:** - ✅ Index format (not full details) - ✅ Token counts visible - ✅ MCP search instructions included **Purpose:** Initialize session tracking **Timing:** Before Claude processes prompt **What it does:** - Creates session record - Saves raw user prompt (v4.2.0+) - Starts worker if needed **Key change from v3:** - ✅ Stores raw prompts for search - ✅ Auto-starts PM2 worker **Purpose:** Capture tool observations **Timing:** After every tool execution **What it does:** - Enqueues observation in database - Returns immediately **Key change from v3:** - ✅ Just enqueues (doesn't process) - ✅ Worker handles all AI calls **Purpose:** Generate session summaries **Timing:** Worker-triggered (mid-session) **What it does:** - Gathers observations - Sends to Claude for summarization - Stores structured summary **Key change from v3:** - ✅ Multiple summaries per session - ✅ Summaries are checkpoints, not endings **Purpose:** Graceful cleanup **Timing:** When session ends **What it does:** - Marks session complete - Lets worker finish processing **Key change from v3:** - ✅ Graceful (not aggressive) - ✅ No DELETE requests - ✅ Worker finishes naturally ### Database Schema Evolution **v3 schema:** ```sql -- Simple, flat structure CREATE TABLE observations ( id INTEGER PRIMARY KEY, session_id TEXT, text TEXT, created_at INTEGER ); ``` **v4 schema:** ```sql -- Rich, structured schema CREATE TABLE observations ( id INTEGER PRIMARY KEY AUTOINCREMENT, session_id TEXT NOT NULL, project TEXT NOT NULL, -- Progressive disclosure metadata title TEXT NOT NULL, subtitle TEXT, type TEXT NOT NULL, -- decision, bugfix, feature, etc. -- Content narrative TEXT NOT NULL, facts TEXT, -- JSON array -- Searchability concepts TEXT, -- JSON array of tags files_read TEXT, -- JSON array files_modified TEXT, -- JSON array -- Timestamps created_at TEXT NOT NULL, created_at_epoch INTEGER NOT NULL, FOREIGN KEY(session_id) REFERENCES sdk_sessions(id) ); -- FTS5 for full-text search CREATE VIRTUAL TABLE observations_fts USING fts5( title, subtitle, narrative, facts, concepts, content=observations ); -- Auto-sync triggers CREATE TRIGGER observations_ai AFTER INSERT ON observations BEGIN INSERT INTO observations_fts(rowid, title, subtitle, narrative, facts, concepts) VALUES (new.id, new.title, new.subtitle, new.narrative, new.facts, new.concepts); END; ``` **What changed:** - ✅ Structured fields (title, subtitle, type) - ✅ FTS5 full-text search - ✅ Project-scoped queries - ✅ Rich metadata for progressive disclosure ### Worker Service Redesign **v3 worker:** ```typescript // Multiple short SDK sessions app.post('/process', async (req, res) => { const response = await query({ prompt: buildPrompt(req.body), options: { maxTurns: 1 } }); for await (const msg of response) { // Process single observation } res.json({ success: true }); }); ``` **v4 worker:** ```typescript // ONE long-running SDK session async function runWorker(sessionId: string) { const response = query({ prompt: messageGenerator(), // AsyncIterable options: { maxTurns: 1000 } }); for await (const msg of response) { if (msg.type === 'text') { parseObservations(msg.content); parseSummaries(msg.content); } } } ``` **Benefits:** - Maintains conversation state - SDK handles context automatically - More efficient (fewer API calls) - Natural multi-turn flow --- ## Critical Fixes Along the Way ### Fix 1: Context Injection Pollution (v4.3.1) **Problem:** SessionStart hook output polluted with npm install logs ```bash # Hook output contained: npm WARN deprecated ... npm WARN deprecated ... {"hookSpecificOutput": {"additionalContext": "..."}} ``` **Why it broke:** - Claude Code expects clean JSON or plain text - stderr/stdout from npm install mixed with hook output - Context didn't inject properly **Solution:** ```json { "command": "npm install --loglevel=silent && node context-hook.js" } ``` **Result:** Clean JSON output, context injection works ### Fix 2: Double Shebang Issue (v4.3.1) **Problem:** Hook executables had duplicate shebangs ```javascript #!/usr/bin/env node #!/usr/bin/env node // ← Duplicate! // Rest of code... ``` **Why it happened:** - Source files had shebang - esbuild added another shebang during build **Solution:** ```typescript // Remove shebangs from source files // Let esbuild add them during build ``` **Result:** Clean executables, no parsing errors ### Fix 3: FTS5 Injection Vulnerability (v4.2.3) **Problem:** User input passed directly to FTS5 query ```typescript // ❌ Vulnerable const results = db.query( `SELECT * FROM observations_fts WHERE observations_fts MATCH '${userQuery}'` ); ``` **Attack:** ```typescript userQuery = "'; DROP TABLE observations; --" ``` **Solution:** ```typescript // ✅ Safe: Use parameterized queries const results = db.query( 'SELECT * FROM observations_fts WHERE observations_fts MATCH ?', [userQuery] ); ``` ### Fix 4: NOT NULL Constraint Violation (v4.2.8) **Problem:** Session creation failed when prompt was empty ```sql INSERT INTO sdk_sessions (claude_session_id, user_prompt, ...) VALUES ('abc123', NULL, ...) -- ❌ user_prompt is NOT NULL ``` **Solution:** ```typescript // Allow NULL user_prompts user_prompt: input.prompt ?? null ``` **Schema change:** ```sql -- Before user_prompt TEXT NOT NULL -- After user_prompt TEXT -- Nullable ``` --- ## Performance Improvements ### Optimization 1: Prepared Statements **Before:** ```typescript for (const obs of observations) { db.run(`INSERT INTO observations (...) VALUES (?, ?, ...)`, [obs.id, obs.text, ...]); } ``` **After:** ```typescript const stmt = db.prepare(`INSERT INTO observations (...) VALUES (?, ?, ...)`); for (const obs of observations) { stmt.run([obs.id, obs.text, ...]); } stmt.finalize(); ``` **Impact:** 5x faster bulk inserts ### Optimization 2: FTS5 Indexing **Before:** ```typescript // Manual full-text search const results = db.query( `SELECT * FROM observations WHERE text LIKE '%${query}%'` ); ``` **After:** ```typescript // FTS5 virtual table const results = db.query( `SELECT * FROM observations_fts WHERE observations_fts MATCH ?`, [query] ); ``` **Impact:** 100x faster searches on large datasets ### Optimization 3: Index Format Default **Before:** ```typescript // Always return full observations search_observations({ query: "hooks" }); // Returns: 5,000 tokens ``` **After:** ```typescript // Default to index format search_observations({ query: "hooks", format: "index" }); // Returns: 200 tokens // Fetch full only when needed search_observations({ query: "hooks", format: "full", limit: 1 }); // Returns: 150 tokens ``` **Impact:** 25x reduction in average search result size --- ## What We Learned ### Lesson 1: Context is Precious **Principle:** Every token you put in context window costs attention. **Application:** - Progressive disclosure reduces waste by 87% - Index-first approach gives agent control - Token counts make costs visible ### Lesson 2: Session State is Complicated **Principle:** Distributed state is hard. SDK handles it better than we can. **Application:** - Use SDK's built-in session resumption - Don't try to manually reconstruct state - Track session IDs from init messages ### Lesson 3: Graceful Beats Aggressive **Principle:** Let processes finish their work before terminating. **Application:** - Graceful cleanup prevents data loss - Workers finish important operations - Clean state transitions reduce bugs ### Lesson 4: AI is the Compressor **Principle:** Don't compress manually. Let AI do semantic compression. **Application:** - 10:1 to 100:1 compression ratios - Semantic understanding, not keyword extraction - Structured outputs (XML parsing) ### Lesson 5: Progressive Everything **Principle:** Show metadata first, fetch details on-demand. **Application:** - Progressive disclosure in context injection - Index format in search results - Layer 1 (titles) → Layer 2 (summaries) → Layer 3 (full details) --- ## The Road Ahead ### Planned: Adaptive Index Size ```typescript SessionStart({ source: "startup" }): → Show last 10 sessions (normal) SessionStart({ source: "resume" }): → Show only current session (minimal) SessionStart({ source: "compact" }): → Show last 20 sessions (comprehensive) ``` ### Planned: Relevance Scoring ```typescript // Use embeddings to pre-sort index by semantic relevance search_observations({ query: "authentication bug", sort: "relevance" // Based on embeddings }); ``` ### Planned: Multi-Project Context ```typescript // Cross-project pattern recognition search_observations({ query: "API rate limiting", projects: ["api-gateway", "user-service", "billing-service"] }); ``` ### Planned: Collaborative Memory ```typescript // Team-shared observations (optional) createObservation({ title: "Rate limit: 100 req/min", scope: "team" // vs "user" }); ``` --- ## Migration Guide: v3 → v5 ### Step 1: Backup Database ```bash cp ~/.claude-mem/claude-mem.db ~/.claude-mem/claude-mem-v3-backup.db ``` ### Step 2: Update Plugin ```bash cd ~/.claude/plugins/marketplaces/thedotmack git pull ``` ### Step 3: Update Plugin ```bash /plugin update claude-mem ``` **What happens automatically:** - Dependencies update (including new ones like chromadb for v5.0.0+) - Database schema migrations run automatically - Worker service restarts with new code - Smart install caching activates (v5.0.3+) ### Step 4: Test ```bash # Start Claude Code claude # Check that context is injected # (Should see progressive disclosure index with v5 viewer link) # Open viewer UI (v5.1.0+) open http://localhost:37777 # Submit a prompt and watch real-time updates in viewer ``` ### Step 5: Explore New Features ```bash # View memory stream in browser (v5.1.0+) open http://localhost:37777 # Toggle theme (v5.1.2+) # Click theme button in viewer header # Check worker health npm run worker:status curl http://localhost:37777/health ``` --- ## Key Metrics ### v3 Performance | Metric | Value | |--------|-------| | Context usage per session | ~25,000 tokens | | Relevant context | ~2,000 tokens (8%) | | Hook execution time | ~200ms | | Search latency | ~500ms (LIKE queries) | ### v4 Performance | Metric | Value | |--------|-------| | Context usage per session | ~1,100 tokens | | Relevant context | ~1,100 tokens (100%) | | Hook execution time | ~45ms | | Search latency | ~15ms (FTS5) | ### v5 Performance | Metric | Value | |--------|-------| | Context usage per session | ~1,100 tokens | | Relevant context | ~1,100 tokens (100%) | | Hook execution time | ~10ms (cached install) | | Search latency | ~12ms (FTS5) or ~25ms (hybrid) | | Viewer UI load time | ~50ms (bundled HTML) | | SSE update latency | ~5ms (real-time) | **v3 → v4 Improvements:** - 96% reduction in context waste - 12x increase in relevance - 4x faster hooks - 33x faster search **v4 → v5 Improvements:** - 78% faster hooks (smart caching) - Real-time visualization (viewer UI) - Better search relevance (hybrid) - Enhanced UX (theme toggle, persistence) --- ## Conclusion The journey from v3 to v5 was about understanding these fundamental truths: 1. **Context is finite** - Progressive disclosure respects attention budget 2. **AI is the compressor** - Semantic understanding beats keyword extraction 3. **Agents are smart** - Let them decide what to fetch 4. **State is hard** - Use SDK's built-in mechanisms 5. **Graceful wins** - Let processes finish cleanly The result is a memory system that's both powerful and invisible. Users never notice it working - Claude just gets smarter over time. **v5 adds visibility**: Now users CAN see the memory system working if they want (via viewer UI), but it's still non-intrusive. --- ## Further Reading - [Progressive Disclosure](progressive-disclosure) - The philosophy behind v4 - [Hooks Architecture](hooks-architecture) - How hooks power the system - [Context Engineering](context-engineering) - Foundational principles - [Viewer UI](VIEWER) - Real-time visualization (v5.1.0+) --- *This architecture evolution reflects hundreds of hours of experimentation, dozens of dead ends, and the invaluable experience of real-world usage. v5 is the architecture that emerged from understanding what actually works - and making it visible to users.*