# Progressive Disclosure: Claude-Mem's Context Priming Philosophy

## Core Principle

**Show what exists and its retrieval cost first. Let the agent decide what to fetch based on relevance and need.**

---

## What is Progressive Disclosure?

Progressive disclosure is an information architecture pattern where you reveal complexity gradually rather than all at once. In the context of AI agents, it means:

1. **Layer 1 (Index)**: Show lightweight metadata (titles, dates, types, token counts)
2. **Layer 2 (Details)**: Fetch full content only when needed
3. **Layer 3 (Deep Dive)**: Read original source files if required

This mirrors how humans work: we scan headlines before reading articles, review a table of contents before diving into chapters, and check file names before opening files.

---

## The Problem: Context Pollution

Traditional RAG (Retrieval-Augmented Generation) systems fetch everything upfront:

```
❌ Traditional Approach:
┌─────────────────────────────────────┐
│ Session Start                       │
│                                     │
│ [15,000 tokens of past sessions]    │
│ [8,000 tokens of observations]      │
│ [12,000 tokens of file summaries]   │
│                                     │
│ Total: 35,000 tokens                │
│ Relevant: ~2,000 tokens (6%)        │
└─────────────────────────────────────┘
```

**Problems:**
- Wastes 94% of the attention budget on irrelevant context
- User prompt gets buried under a mountain of history
- Agent must process everything before understanding the task
- No way to know what's actually useful until after reading

---

## Claude-Mem's Solution: Progressive Disclosure

```
✅ Progressive Disclosure Approach:
┌───────────────────────────────────────┐
│ Session Start                         │
│                                       │
│ Index of 50 observations: ~800 tokens │
│                   ↓                   │
│ Agent sees: "🔴 Hook timeout issue"   │
│ Agent decides: "Relevant!"            │
│                   ↓                   │
│ Fetch observation #2543: ~120 tokens  │
│                                       │
│ Total: 920 tokens                     │
│ Relevant: 920 tokens (100%)           │
└───────────────────────────────────────┘
```

**Benefits:**
- Agent controls its own context consumption
- Fetched content is directly relevant to the current task
- Can fetch more if needed
- Can skip everything if not relevant
- Clear cost/benefit for each retrieval decision

---

## How It Works in Claude-Mem

### The Index Format

Every SessionStart hook provides a compact index:

```markdown
### Oct 26, 2025

**General**
| ID | Time | T | Title | Tokens |
|----|------|---|-------|--------|
| #2586 | 12:58 AM | 🔵 | Context hook file exists but is empty | ~51 |
| #2587 | ″ | 🔵 | Context hook script file is empty | ~46 |
| #2589 | ″ | 🟡 | Investigated hook debug output docs | ~105 |

**src/hooks/context-hook.ts**
| ID | Time | T | Title | Tokens |
|----|------|---|-------|--------|
| #2591 | 1:15 AM | ⚖️ | Stderr messaging abandoned | ~155 |
| #2592 | 1:16 AM | ⚖️ | Web UI strategy redesigned | ~193 |
```

**What the agent sees:**
- **What exists**: Observation titles give semantic meaning
- **When it happened**: Timestamps for temporal context
- **What type**: Icons indicate observation category
- **Retrieval cost**: Token counts for informed decisions
- **Where to get it**: MCP search tools referenced at bottom
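
Generating a row in this format is mechanical. A minimal sketch (the `Observation` shape here is illustrative, not Claude-Mem's actual schema):

```typescript
// Hypothetical shape of a stored observation (illustrative, not the real schema).
interface Observation {
  id: number;
  time: string;
  icon: string;
  title: string;
  tokens: number;
}

// Render one markdown table row in the index format shown above.
function indexRow(o: Observation): string {
  return `| #${o.id} | ${o.time} | ${o.icon} | ${o.title} | ~${o.tokens} |`;
}

console.log(indexRow({ id: 2591, time: "1:15 AM", icon: "⚖️", title: "Stderr messaging abandoned", tokens: 155 }));
// → | #2591 | 1:15 AM | ⚖️ | Stderr messaging abandoned | ~155 |
```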

### The Legend System

```
🎯 session-request   - User's original goal
🔴 gotcha            - Critical edge case or pitfall
🟡 problem-solution  - Bug fix or workaround
🔵 how-it-works      - Technical explanation
🟢 what-changed      - Code/architecture change
🟣 discovery         - Learning or insight
🟠 why-it-exists     - Design rationale
🟤 decision          - Architecture decision
⚖️ trade-off         - Deliberate compromise
```

**Purpose:**
- Visual scanning (humans and AI both benefit)
- Semantic categorization
- Priority signaling (🔴 gotchas are more critical)
- Pattern recognition across sessions
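
The type-to-icon mapping above can live in a single lookup. A minimal sketch, assuming the type names from the legend (the real lookup may differ):

```typescript
// Map observation type to its legend icon.
// Type names are taken from the legend above; the fallback marker is an assumption.
const TYPE_ICONS: Record<string, string> = {
  "session-request": "🎯",
  "gotcha": "🔴",
  "problem-solution": "🟡",
  "how-it-works": "🔵",
  "what-changed": "🟢",
  "discovery": "🟣",
  "why-it-exists": "🟠",
  "decision": "🟤",
  "trade-off": "⚖️",
};

// Fall back to a neutral marker for unknown types.
function iconFor(type: string): string {
  return TYPE_ICONS[type] ?? "▫️";
}
```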

### Progressive Disclosure Instructions

The index includes usage guidance:

```markdown
💡 **Progressive Disclosure:** This index shows WHAT exists and retrieval COST.
- Use MCP search tools to fetch full observation details on-demand
- Prefer searching observations over re-reading code for past decisions
- Critical types (🔴 gotcha, 🟤 decision, ⚖️ trade-off) often worth fetching immediately
```

**What this does:**
- Teaches the agent the pattern
- Suggests when to fetch (critical types)
- Recommends search over code re-reading (efficiency)
- Makes the system self-documenting

---

## The Philosophy: Context as Currency

### Mental Model: Token Budget as Money

Think of the context window as a bank account:

| Approach | Metaphor | Outcome |
|----------|----------|---------|
| **Dump everything** | Spending your entire paycheck on groceries you might need someday | Waste, clutter, can't afford what you actually need |
| **Fetch nothing** | Refusing to spend any money | Starvation, can't accomplish tasks |
| **Progressive disclosure** | Check your pantry, make a shopping list, buy only what you need | Efficiency, room for unexpected needs |

### The Attention Budget

LLMs have finite attention:
- Every token attends to every other token (n² relationships)
- A 100,000-token window ≠ 100,000 tokens of useful attention
- Context "rot" happens as the window fills
- Later tokens get less attention than earlier ones

**Claude-Mem's approach:**
- Start with ~1,000 tokens of index
- Agent has ~99,000 tokens free for the task
- Agent fetches ~200 tokens when needed
- Final budget: ~98,000 tokens for actual work
### Design for Autonomy

> "As models improve, let them act intelligently"

Progressive disclosure treats the agent as an **intelligent information forager**, not a passive recipient of pre-selected context.

**Traditional RAG:**
```
System → [Decides relevance] → Agent
              ↑
        Hope this helps!
```

**Progressive Disclosure:**
```
System → [Shows index] → Agent → [Decides relevance] → [Fetches details]
                           ↑
                     You know best!
```

The agent knows:
- The current task context
- What information would help
- How much budget to spend
- When to stop searching

We don't.

---

## Implementation Principles

### 1. Make Costs Visible

Every item in the index shows a token count:

```
| #2591 | 1:15 AM | ⚖️ | Stderr messaging abandoned | ~155 |
                                                      ^^^^
                                                 Retrieval cost
```

**Why:**
- Agent can make informed ROI decisions
- Small observations (~50 tokens) are "cheap" to fetch
- Large observations (~500 tokens) require stronger justification
- Matches how humans think about effort
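
The approximate counts don't need a real tokenizer. A minimal sketch, assuming the common ~4 characters-per-token rule of thumb (the actual estimator may differ):

```typescript
// Rough token estimate using the ~4 chars/token heuristic.
// Deliberately cheap: the index only needs order-of-magnitude costs.
function estimateTokens(text: string): number {
  return Math.ceil(text.length / 4);
}

// Format the approximate cost for display in an index row (~51, ~155, ...).
function displayCost(text: string): string {
  return `~${estimateTokens(text)}`;
}
```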

### 2. Use Semantic Compression

Titles compress full observations into ~10 words:

**Bad title:**
```
Observation about a thing
```

**Good title:**
```
🔴 Hook timeout issue: 60s default too short for npm install
```

**What makes a good title:**
- Specific: Identifies the exact issue
- Actionable: Clear what to do
- Self-contained: Doesn't require reading the observation
- Searchable: Contains key terms (hook, timeout, npm)
- Categorized: Icon indicates type
### 3. Group by Context

Observations are grouped by:
- **Date**: Temporal context
- **File path**: Spatial context (work on specific files)
- **Project**: Logical context

```markdown
**src/hooks/context-hook.ts**
| ID | Time | T | Title | Tokens |
|----|------|---|-------|--------|
| #2591 | 1:15 AM | ⚖️ | Stderr messaging abandoned | ~155 |
| #2594 | 1:17 AM | 🟠 | Removed stderr section from docs | ~93 |
```

**Benefit:** If the agent is working on `src/hooks/context-hook.ts`, related observations are already grouped together.
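
The file grouping is a straightforward bucketing pass. A minimal sketch (`filePath` is an assumed field name; observations without one fall into the "General" bucket, matching the index above):

```typescript
// Group observations by file path so spatially related entries
// land under one heading. "filePath" is an assumed field name.
interface Obs {
  id: number;
  filePath?: string;
  title: string;
}

function groupByFile(observations: Obs[]): Map<string, Obs[]> {
  const groups = new Map<string, Obs[]>();
  for (const o of observations) {
    const key = o.filePath ?? "General"; // no file → "General" section
    const bucket = groups.get(key) ?? [];
    bucket.push(o);
    groups.set(key, bucket);
  }
  return groups;
}
```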

### 4. Provide Retrieval Tools

The index is useless without retrieval mechanisms:

```markdown
*Use claude-mem MCP search to access records with the given ID*
```

**Available tools:**
- `search_observations` - Full-text search
- `find_by_concept` - Concept-based retrieval
- `find_by_file` - File-based retrieval
- `find_by_type` - Type-based retrieval
- `get_recent_context` - Recent session summaries

Each tool supports `format: "index"` (default) and `format: "full"`.

---

## Real-World Example

### Scenario: Agent asked to fix a bug in hooks

**Without progressive disclosure:**
```
SessionStart injects 25,000 tokens of past context
Agent reads everything
Agent finds 1 relevant observation (buried in the middle)
Total tokens consumed: 25,000
Relevant tokens: ~200
Efficiency: 0.8%
```

**With progressive disclosure:**
```
SessionStart shows index: ~800 tokens
Agent sees title: "🔴 Hook timeout issue: 60s too short"
Agent thinks: "This looks relevant to my bug!"
Agent fetches observation #2543: ~155 tokens
Total tokens consumed: 955
Relevant tokens: 955
Efficiency: 100%
```

### The Index Entry

```markdown
| #2543 | 2:14 PM | 🔴 | Hook timeout: 60s too short for npm install | ~155 |
```

**What the agent learns WITHOUT fetching:**
- There's a known gotcha (🔴) about hook timeouts
- It's related to npm install taking too long
- Full details are ~155 tokens (cheap)
- Happened at 2:14 PM (recent)

**Decision tree:**
```
Is my task related to hooks?    → YES
Is my task related to timeouts? → YES
Is my task related to npm?      → YES
155 tokens is cheap             → FETCH IT
```
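
That decision can be sketched as a keyword-overlap-plus-cost check. The scoring and threshold here are illustrative, not Claude-Mem's actual logic (in practice the agent reasons about relevance rather than running a fixed rule):

```typescript
// Decide whether an index entry is worth fetching: count keyword
// overlap between the task and the title, then weigh the token cost.
// Scoring and thresholds are illustrative only.
function shouldFetch(taskKeywords: string[], title: string, tokens: number): boolean {
  const lower = title.toLowerCase();
  const hits = taskKeywords.filter((k) => lower.includes(k.toLowerCase())).length;
  const cheap = tokens <= 200;
  return hits >= 2 || (hits >= 1 && cheap);
}

shouldFetch(["hook", "timeout", "npm"], "Hook timeout: 60s too short for npm install", 155);
// → true
```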

---

## The Two-Tier Search Strategy

Claude-Mem implements progressive disclosure in search results too:

### Tier 1: Index Format (Default)

```typescript
search_observations({
  query: "hook timeout",
  format: "index" // Default
})
```

**Returns:**
```
Found 3 observations matching "hook timeout":

| ID | Date | Type | Title | Tokens |
|----|------|------|-------|--------|
| #2543 | Oct 26 | gotcha | Hook timeout: 60s too short | ~155 |
| #2891 | Oct 25 | how-it-works | Hook timeout configuration | ~203 |
| #2102 | Oct 20 | problem-solution | Fixed timeout in CI | ~89 |
```

**Cost:** ~100 tokens for 3 results
**Value:** Agent can scan and decide which to fetch

### Tier 2: Full Format (On-Demand)

```typescript
search_observations({
  query: "hook timeout",
  format: "full",
  limit: 1 // Fetch just the most relevant
})
```

**Returns:**
```
#2543 🔴 Hook timeout: 60s too short for npm install
─────────────────────────────────────────────────
Date: Oct 26, 2025 2:14 PM
Type: gotcha
Project: claude-mem

Narrative:
Discovered that the default 60-second hook timeout is insufficient
for npm install operations, especially with large dependency trees
or slow network conditions. This causes the SessionStart hook to fail
silently, preventing context injection.

Facts:
- Default timeout: 60 seconds
- npm install with cold cache: ~90 seconds
- Configured timeout: 120 seconds in plugin/hooks/hooks.json:25

Files Modified:
- plugin/hooks/hooks.json

Concepts: hooks, timeout, npm, configuration
```

**Cost:** ~155 tokens for full details
**Value:** Complete understanding of the issue
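
The two tiers compose into a scan-then-fetch loop. A minimal sketch, with `searchObservations` standing in for the MCP tool (the real call goes over the MCP protocol, not a local function, and the budget logic is illustrative):

```typescript
// Stand-in types for the MCP search tool; illustrative only.
type IndexEntry = { id: number; title: string; tokens: number };

interface SearchClient {
  searchObservations(query: string, format: "index" | "full", limit?: number): Promise<IndexEntry[] | string>;
}

// Scan-then-fetch: get the cheap index first, fetch full details
// only for entries that fit within the remaining token budget.
async function scanThenFetch(client: SearchClient, query: string, budget: number): Promise<string[]> {
  const index = (await client.searchObservations(query, "index")) as IndexEntry[];
  const details: string[] = [];
  let spent = 0;
  for (const entry of index) {
    if (spent + entry.tokens > budget) continue; // too expensive, skip
    details.push((await client.searchObservations(`#${entry.id}`, "full", 1)) as string);
    spent += entry.tokens;
  }
  return details;
}
```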

---

## Cognitive Load Theory

Progressive disclosure is grounded in **Cognitive Load Theory**:

### Intrinsic Load
The inherent difficulty of the task itself.

**Example:** "Fix authentication bug"
- Must understand the auth system
- Must understand the bug
- Must write the fix

This load is unavoidable.

### Extraneous Load
The cognitive burden of poorly presented information.

**Traditional RAG adds extraneous load:**
- Scanning irrelevant observations
- Filtering out noise
- Remembering what to ignore
- Re-contextualizing after each section

**Progressive disclosure minimizes extraneous load:**
- Scan titles (low effort)
- Fetch only what's relevant (targeted effort)
- Full attention on the current task

### Germane Load
The effort of building mental models and schemas.

**Progressive disclosure supports germane load:**
- Consistent structure (legend, grouping)
- Clear categorization (types, icons)
- Semantic compression (good titles)
- Explicit costs (token counts)

---

## Anti-Patterns to Avoid

### ❌ Verbose Titles

**Bad:**
```
| #2543 | 2:14 PM | 🔴 | Investigation into the issue where hooks time out | ~155 |
```

**Good:**
```
| #2543 | 2:14 PM | 🔴 | Hook timeout: 60s too short for npm install | ~155 |
```

### ❌ Hiding Costs

**Bad:**
```
| #2543 | 2:14 PM | 🔴 | Hook timeout issue |
```

**Good:**
```
| #2543 | 2:14 PM | 🔴 | Hook timeout issue | ~155 |
```

### ❌ No Retrieval Path

**Bad:**
```
Here are 10 observations. [No instructions on how to get full details]
```

**Good:**
```
Here are 10 observations.
*Use MCP search tools to fetch full observation details on-demand*
```

### ❌ Defaulting to Full Format

**Bad:**
```typescript
search_observations({
  query: "hooks",
  format: "full" // Fetches everything
})
```

**Good:**
```typescript
search_observations({
  query: "hooks",
  format: "index", // Scan first
  limit: 20
})

// Then, if needed:
search_observations({
  query: "hooks",
  format: "full",
  limit: 1 // Just the most relevant
})
```

---

## Key Design Decisions

### Why Token Counts?

**Decision:** Show approximate token counts (~155, ~203) rather than exact counts.

**Rationale:**
- Communicates scale (50 vs 500) without false precision
- Maps to human intuition (small/medium/large)
- Allows the agent to budget attention
- Encourages cost-conscious retrieval

### Why Icons Instead of Text Labels?

**Decision:** Use emoji icons (🔴, 🟡, 🔵) rather than text labels (GOTCHA, PROBLEM, HOWTO).

**Rationale:**
- Visual scanning (pattern recognition)
- Token efficient (1 char vs 10 chars)
- Language-agnostic
- Aesthetically distinct
- Works for both humans and AI

### Why Index-First, Not Smart Pre-Fetch?

**Decision:** Always show the index first, even if we "know" what's relevant.

**Rationale:**
- We can't know what's relevant better than the agent does
- Pre-fetching assumes we understand the task
- The agent knows the current context; we don't
- Respects agent autonomy
- Fails gracefully (the agent can always fetch more)

### Why Group by File Path?

**Decision:** Group observations by file path in addition to date.

**Rationale:**
- Spatial locality: Work on file X likely needs context about file X
- Reduces scanning effort
- Matches how developers think
- Clear semantic boundaries

---

## Measuring Success

Progressive disclosure is working when:

### ✅ Low Waste Ratio
```
Relevant Tokens / Total Context Tokens > 80%
```

Most of the context consumed is actually useful.
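
The metric itself is a simple ratio; the token counts would have to come from session telemetry (assumed inputs):

```typescript
// Compute the relevance ratio from the formula above.
// Inputs are assumed to come from session telemetry.
function relevanceRatio(relevantTokens: number, totalTokens: number): number {
  if (totalTokens === 0) return 0; // no context consumed at all
  return relevantTokens / totalTokens;
}

relevanceRatio(920, 920);    // → 1 (the progressive-disclosure example)
relevanceRatio(2000, 35000); // ≈ 0.057 (the traditional-dump example)
```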

### ✅ Selective Fetching
```
Index Shown: 50 observations
Details Fetched: 2-3 observations
```

The agent is being selective, not fetching everything.

### ✅ Fast Task Completion
```
Session with index: 30 seconds to find relevant context
Session without: 90 seconds scanning all context
```

Time-to-relevant-information is faster.

### ✅ Appropriate Depth
```
Simple task: Only index needed
Medium task: 1-2 observations fetched
Complex task: 5-10 observations + code reads
```

Depth scales with task complexity.

---

## Future Enhancements

### Adaptive Index Size

```typescript
// Vary index size based on session type
// (sketch; the per-source session counts are illustrative)
function adaptiveIndexSize(source: "startup" | "resume" | "compact"): number {
  switch (source) {
    case "startup": return 10; // show last 10 sessions (small index)
    case "resume":  return 1;  // show only current session (micro index)
    case "compact": return 20; // show last 20 sessions (larger index)
  }
}
```

### Relevance Scoring

```typescript
// Use embeddings to pre-sort the index by relevance
search_observations({
  query: "authentication bug",
  format: "index",
  sort: "relevance" // Based on semantic similarity
})
```

### Cost Forecasting

```markdown
💡 **Budget Estimate:**
- Fetching all 🔴 gotchas: ~450 tokens
- Fetching all file-related: ~1,200 tokens
- Fetching everything: ~8,500 tokens
```
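
A budget estimate like this is just a filtered sum over the index. A minimal sketch (the entry shape and sample values are illustrative):

```typescript
// Forecast retrieval cost for a filtered subset of index entries.
// Entry shape and sample values are assumptions for illustration.
type Entry = { type: string; tokens: number };

function forecast(entries: Entry[], filter?: (e: Entry) => boolean): number {
  return entries.filter(filter ?? (() => true)).reduce((sum, e) => sum + e.tokens, 0);
}

const sample: Entry[] = [
  { type: "gotcha", tokens: 155 },
  { type: "gotcha", tokens: 295 },
  { type: "decision", tokens: 120 },
];

forecast(sample, (e) => e.type === "gotcha"); // → 450
forecast(sample);                             // → 570
```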

### Progressive Detail Levels

```
Layer 1: Index (titles only)
Layer 2: Summaries (2-3 sentences)
Layer 3: Full details (complete observation)
Layer 4: Source files (referenced code)
```

---

## Key Takeaways

1. **Show, don't tell**: The index reveals what exists without forcing consumption
2. **Cost-conscious**: Make retrieval costs visible for informed decisions
3. **Agent autonomy**: Let the agent decide what's relevant
4. **Semantic compression**: Good titles make or break the system
5. **Consistent structure**: Patterns reduce cognitive load
6. **Two-tier everything**: Index first, details on-demand
7. **Context as currency**: Spend wisely on high-value information

---

## Remember

> "The best interface is one that disappears when not needed, and appears exactly when it is."

Progressive disclosure respects the agent's intelligence and autonomy. We provide the map; the agent chooses the path.

---

## Further Reading

- [Context Engineering for AI Agents](/docs/context-engineering) - Foundational principles
- [Claude-Mem Architecture](/docs/architecture) - How it all fits together
- Cognitive Load Theory (Sweller, 1988)
- Information Foraging Theory (Pirolli & Card, 1999)
- Progressive Disclosure (Nielsen Norman Group)

---

*This philosophy emerged from real-world usage of Claude-Mem across hundreds of coding sessions. The pattern works because it aligns with both human cognition and LLM attention mechanics.*