Refactor search documentation to implement a 3-layer workflow for memory retrieval; update tool names and usage examples for clarity and efficiency. Enhance troubleshooting section with new error handling and token management strategies.

Alex Newman
2025-12-29 00:26:06 -05:00
parent f1aa4c3943
commit 00d0bc51e0
6 changed files with 1024 additions and 732 deletions
@@ -248,6 +248,164 @@ search_observations({
---
## MCP Architecture Simplification (December 2025)
### The Problem: Complex MCP Implementation
**Before:**
```
9+ MCP tools registered at session start:
- search_observations
- find_by_type
- find_by_file
- find_by_concept
- get_recent_context
- get_observation
- get_session
- get_prompt
- help
Problems:
- Overlapping operations (search_observations vs find_by_type)
- Complex parameter schemas (~2,500 tokens in tool definitions)
- No built-in workflow guidance
- High cognitive load for Claude (which tool to use?)
- Code size: ~2,718 lines in mcp-server.ts
```
**The Insight:** Progressive disclosure should be built into tool design itself, not something Claude has to remember.
### The Solution: 3-Layer Workflow
**After:**
```
4 MCP tools following 3-layer workflow:
1. __IMPORTANT - Workflow documentation (always visible)
"3-LAYER WORKFLOW (ALWAYS FOLLOW):
1. search(query) → Get index with IDs
2. timeline(anchor=ID) → Get context
3. get_observations([IDs]) → Fetch details
NEVER fetch full details without filtering first."
2. search - Layer 1: Get index with IDs (~50-100 tokens/result)
3. timeline - Layer 2: Get chronological context
4. get_observations - Layer 3: Fetch full details (~500-1,000 tokens/result)
Benefits:
- Progressive disclosure enforced by tool structure
- No overlapping operations
- Simple schemas (additionalProperties: true)
- Clear workflow pattern
- Code size: ~312 lines in mcp-server.ts (88% reduction)
- ~10x token savings
```
### Migration: Skill-Based Search Removed
**Previously:** Used skill-based search
- mem-search skill invoked via natural language
- HTTP API called directly via curl
- Progressive disclosure through skill loading
- 17 skill documentation files
**Now:** Removed skill-based approach
- MCP-only architecture
- Native MCP protocol (better Claude integration)
- Works with both Claude Desktop and Claude Code
- Simpler to maintain (no skill files)
- All 19 mem-search skill files removed (~2,744 lines)
### Key Architectural Changes
**MCP Server Refactor:**
Before:
```typescript
// Complex parameter schemas
{
name: "search_observations",
inputSchema: {
type: "object",
properties: {
query: { type: "string", description: "..." },
type: { type: "array", items: { enum: [...] } },
format: { enum: ["index", "full"] },
limit: { type: "number", minimum: 1, maximum: 100 },
// ... many more parameters
}
}
}
```
After:
```typescript
// Simple schemas with workflow guidance
{
name: "search",
description: "Step 1: Search memory. Returns index with IDs.",
inputSchema: {
type: "object",
properties: {},
additionalProperties: true // Accept any parameters
}
}
```
**Workflow Enforcement:**
Before: Claude had to remember progressive disclosure pattern
After: Tool structure makes it impossible to skip steps
- Can't get details without IDs from search
- Can't search without seeing __IMPORTANT reminder
- Timeline provides middle ground (context without full details)
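The filtering step at the heart of this enforcement can be sketched in a few lines. This is an illustrative sketch only — the entry shape and keyword filter are hypothetical, not the actual tool output:

```typescript
// Illustrative sketch: Layer 1 returns a compact index; only IDs that
// survive filtering are passed to Layer 3. Shapes here are hypothetical.
interface IndexEntry { id: number; title: string; type: string; }

function pickRelevant(index: IndexEntry[], keyword: string): number[] {
  return index
    .filter((e) => e.title.toLowerCase().includes(keyword))
    .map((e) => e.id);
}

const index: IndexEntry[] = [
  { id: 123, title: "Fixed auth token expiry", type: "bugfix" },
  { id: 456, title: "Refactored CSS grid", type: "refactor" },
  { id: 789, title: "Auth session race condition", type: "bugfix" },
];

// Only these IDs would be sent to get_observations (Layer 3)
console.log(pickRelevant(index, "auth")); // → [ 123, 789 ]
```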
### Impact
**Token Efficiency:**
```
Traditional: Fetch 20 observations upfront
→ 10,000-20,000 tokens
→ Only 2 observations relevant (90% waste)
3-Layer Workflow:
→ search (20 results): ~1,000-2,000 tokens
→ Review index, identify 3 relevant IDs
→ get_observations (3 IDs): ~1,500-3,000 tokens
→ Total: 2,500-5,000 tokens (50-75% savings)
```
**Code Simplicity:**
- MCP server: 2,718 lines → 312 lines (88% reduction)
- Removed: 19 skill files (~2,744 lines)
- Net reduction: ~5,150 lines of code removed
**User Experience:**
- Same natural language interaction
- Better token efficiency
- Clearer architecture
- Works identically on Claude Desktop and Claude Code
### Design Philosophy
**Progressive Disclosure Through Structure:**
The 3-layer workflow embodies progressive disclosure at the architectural level:
1. **Layer 1 (Index)** - "What exists?" - Cheap survey of options
2. **Layer 2 (Timeline)** - "What was happening?" - Context around specific points
3. **Layer 3 (Details)** - "Tell me everything" - Full details only when justified
Each layer provides a decision point where Claude can:
- Stop if irrelevant
- Get more context if uncertain
- Dive deep if confident
This makes it structurally difficult to waste tokens.
---
## v1-v2: The Naive Approach
### The First Attempt: Dump Everything
@@ -1,448 +1,497 @@
---
title: "Search Architecture"
description: "MCP tools with 3-layer workflow for token-efficient memory retrieval"
---
# Search Architecture
Claude-mem uses an **MCP-based search architecture** that provides intelligent memory retrieval through 4 streamlined tools following a 3-layer workflow pattern.
## Overview
**Architecture**: MCP Tools → MCP Protocol → HTTP API → Worker Service
**Key Components**:
1. **MCP Tools** (4 tools) - `search`, `timeline`, `get_observations`, `__IMPORTANT`
2. **MCP Server** (`plugin/scripts/mcp-server.cjs`) - Thin wrapper over HTTP API
3. **HTTP API Endpoints** - Fast search operations on Worker Service (port 37777)
4. **Worker Service** - Express.js server with FTS5 full-text search
5. **SQLite Database** - Persistent storage with FTS5 virtual tables
6. **Chroma Vector DB** - Semantic search with hybrid retrieval
**Token Efficiency**: ~10x savings through 3-layer workflow pattern
## How It Works
### 1. User Query
Claude has access to 4 MCP tools. When searching memory, Claude follows the 3-layer workflow:
```
User: "What bugs did we fix last session?"
Step 1: search(query="authentication bug", type="bugfix", limit=10)
Step 2: timeline(anchor=<observation_id>, depth_before=3, depth_after=3)
Step 3: get_observations(ids=[123, 456, 789])
```
### 2. MCP Protocol
MCP server receives tool call via JSON-RPC over stdio:
```json
{
"method": "tools/call",
"params": {
"name": "search",
"arguments": {
"query": "authentication bug",
"type": "bugfix",
"limit": 10
}
}
}
```
### 3. HTTP API Call
MCP server translates to HTTP request:
```typescript
const url = `http://localhost:37777/api/search?query=authentication%20bug&type=bugfix&limit=10`;
const response = await fetch(url);
```
### 4. Worker Processing
Worker service executes FTS5 query:
```sql
SELECT * FROM observations_fts
WHERE observations_fts MATCH ?
AND type = 'bugfix'
ORDER BY rank
LIMIT 10
```
### 5. Results Returned
Worker returns structured data → MCP server → Claude:
```json
{
  "content": [{
    "type": "text",
    "text": "| ID | Time | Title | Type |\n|---|---|---|---|\n| #123 | 2:15 PM | Fixed auth token expiry | bugfix |"
  }]
}
```
### 6. Claude Processes Results
Claude reviews the index, decides which observations are relevant, and can:
- Use `timeline` to get context
- Use `get_observations` to fetch full details for selected IDs
## The 4 MCP Tools
### `__IMPORTANT` - Workflow Documentation
Always visible to Claude. Explains the 3-layer workflow pattern.
**Description:**
```
3-LAYER WORKFLOW (ALWAYS FOLLOW):
1. search(query) → Get index with IDs (~50-100 tokens/result)
2. timeline(anchor=ID) → Get context around interesting results
3. get_observations([IDs]) → Fetch full details ONLY for filtered IDs
NEVER fetch full details without filtering first. 10x token savings.
```
**Purpose:** Ensures Claude follows token-efficient pattern
### `search` - Search Memory Index
**Tool Definition:**
```typescript
{
name: 'search',
description: 'Step 1: Search memory. Returns index with IDs. Params: query, limit, project, type, obs_type, dateStart, dateEnd, offset, orderBy',
inputSchema: {
type: 'object',
properties: {},
additionalProperties: true // Accepts any parameters
}
}
```
**HTTP Endpoint:** `GET /api/search`
**Parameters:**
- `query` - Full-text search query
- `limit` - Maximum results (default: 20)
- `type` - Filter by observation type
- `project` - Filter by project name
- `dateStart`, `dateEnd` - Date range filters
- `offset` - Pagination offset
- `orderBy` - Sort order
**Returns:** Compact index with IDs, titles, dates, types (~50-100 tokens per result)
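As a sketch, a client could assemble the Layer 1 request like this. Parameter names follow the list above; the query values are illustrative:

```typescript
// Build the /api/search query string; URLSearchParams handles encoding.
const params = new URLSearchParams({
  query: "authentication bug",
  type: "bugfix",
  limit: "10",
});
const url = `http://localhost:37777/api/search?${params}`;
console.log(url);
// → http://localhost:37777/api/search?query=authentication+bug&type=bugfix&limit=10
```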
### `timeline` - Get Chronological Context
**Tool Definition:**
```typescript
{
name: 'timeline',
description: 'Step 2: Get context around results. Params: anchor (observation ID) OR query (finds anchor automatically), depth_before, depth_after, project',
inputSchema: {
type: 'object',
properties: {},
additionalProperties: true
}
}
```
**HTTP Endpoint:** `GET /api/timeline`
**Parameters:**
- `anchor` - Observation ID to center timeline around (optional if query provided)
- `query` - Search query to find anchor automatically (optional if anchor provided)
- `depth_before` - Number of observations before anchor (default: 3)
- `depth_after` - Number of observations after anchor (default: 3)
- `project` - Filter by project name
**Returns:** Chronological view showing what happened before/during/after
### `get_observations` - Fetch Full Details
**Tool Definition:**
```typescript
{
name: 'get_observations',
description: 'Step 3: Fetch full details for filtered IDs. Params: ids (array of observation IDs, required), orderBy, limit, project',
inputSchema: {
type: 'object',
properties: {
ids: {
type: 'array',
items: { type: 'number' },
description: 'Array of observation IDs to fetch (required)'
}
},
required: ['ids'],
additionalProperties: true
}
}
```
**HTTP Endpoint:** `POST /api/observations/batch`
**Body:**
```json
{
"ids": [123, 456, 789],
"orderBy": "date_desc",
"project": "my-app"
}
```
**Returns:** Complete observation details (~500-1,000 tokens per observation)
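A minimal sketch of assembling that request body. Field names come from the example above; `buildBatchRequest` is a hypothetical helper, not part of the plugin:

```typescript
// Hypothetical helper: shape the POST /api/observations/batch request.
function buildBatchRequest(ids: number[], project?: string) {
  return {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({ ids, orderBy: "date_desc", ...(project ? { project } : {}) }),
  };
}

const req = buildBatchRequest([123, 456, 789], "my-app");
console.log(req.body);
// → {"ids":[123,456,789],"orderBy":"date_desc","project":"my-app"}
```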
## MCP Server Implementation
**Location:** `/Users/YOUR_USERNAME/.claude/plugins/marketplaces/thedotmack/plugin/scripts/mcp-server.cjs`
**Role:** Thin wrapper that translates MCP protocol to HTTP API calls
**Key Characteristics:**
- ~312 lines of code (reduced from ~2,718 lines in old implementation)
- No business logic - just protocol translation
- Single source of truth: Worker HTTP API
- Simple schemas with `additionalProperties: true`
**Handler Example:**
```typescript
{
name: 'search',
handler: async (args: any) => {
const endpoint = '/api/search';
const searchParams = new URLSearchParams();
for (const [key, value] of Object.entries(args)) {
searchParams.append(key, String(value));
}
const url = `http://localhost:37777${endpoint}?${searchParams}`;
const response = await fetch(url);
return await response.json();
}
}
```
## Worker HTTP API
**Location:** `src/services/worker-service.ts`
**Port:** 37777
**Search Endpoints:**
```typescript
GET /api/search # Main search (used by MCP search tool)
GET /api/timeline # Timeline context (used by MCP timeline tool)
POST /api/observations/batch # Fetch by IDs (used by MCP get_observations tool)
GET /api/health # Health check
```
**Database Access:**
- Uses `SessionSearch` service for FTS5 queries
- Uses `SessionStore` for structured queries
- Hybrid search with ChromaDB for semantic similarity
**FTS5 Full-Text Search:**
```sql
-- search tool → HTTP GET → FTS5 query
SELECT * FROM observations_fts
WHERE observations_fts MATCH ?
AND type = ?
AND date >= ? AND date <= ?
ORDER BY rank
LIMIT ? OFFSET ?
```
## The 3-Layer Workflow Pattern
### Design Philosophy
The 3-layer workflow embodies **progressive disclosure** - a core principle of claude-mem's architecture.
**Layer 1: Index (Search)**
- **What:** Compact table with IDs, titles, dates, types
- **Cost:** ~50-100 tokens per result
- **Purpose:** Survey what exists before committing tokens
- **Decision Point:** "Which observations are relevant?"
**Layer 2: Context (Timeline)**
- **What:** Chronological view of observations around a point
- **Cost:** Variable based on depth
- **Purpose:** Understand narrative arc, see what led to/from a point
- **Decision Point:** "Do I need full details?"
**Layer 3: Details (Get Observations)**
- **What:** Complete observation data (narrative, facts, files, concepts)
- **Cost:** ~500-1,000 tokens per observation
- **Purpose:** Deep dive on validated, relevant observations
- **Decision Point:** "Apply knowledge to current task"
### Token Efficiency
**Traditional RAG Approach:**
```
Fetch 20 observations upfront: 10,000-20,000 tokens
Relevance: ~10% (only 2 observations actually useful)
Waste: 18,000 tokens on irrelevant context
```
**3-Layer Workflow:**
```
Step 1: search (20 results) ~1,000-2,000 tokens
Step 2: Review index, filter to 3 relevant IDs
Step 3: get_observations (3 IDs) ~1,500-3,000 tokens
Total: 2,500-5,000 tokens (50-75% savings)
```
**10x Savings:** By filtering at index level before fetching full details
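The arithmetic behind the 10x figure, using the optimistic ends of the per-result costs quoted above and the typical "2 relevant results" case:

```typescript
// Token estimates from this doc: index rows ~50 tokens (lower bound),
// full observations ~1,000 tokens (upper bound), 2 truly relevant results.
const traditional = 20 * 1000;      // fetch 20 full observations upfront
const layered = 20 * 50 + 2 * 500;  // index of 20, then 2 full fetches

console.log(traditional, layered, traditional / layered);
// → 20000 2000 10
```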
## Architecture Evolution
### Before: Complex MCP Implementation
**Approach:** 9 MCP tools with detailed parameter schemas
**Token Cost:** ~2,500 tokens in tool definitions per session
- `search_observations` - Full-text search
- `find_by_type` - Filter by type
- `find_by_file` - Filter by file
- `find_by_concept` - Filter by concept
- `get_recent_context` - Recent sessions
- `get_observation` - Fetch single observation
- `get_session` - Fetch session
- `get_prompt` - Fetch prompt
- `help` - API documentation
**Problems:**
- Overlapping operations (search_observations vs find_by_type)
- Complex parameter schemas
- No built-in workflow guidance
- High token cost at session start
**Code Size:** ~2,718 lines in mcp-server.ts
### After: Streamlined MCP Implementation
**Approach:** 4 MCP tools following 3-layer workflow
**Token Cost:** Minimal - 4 simplified tool definitions instead of 9 detailed schemas
**Tools:**
1. `__IMPORTANT` - Workflow guidance (always visible)
2. `search` - Step 1 (index)
3. `timeline` - Step 2 (context)
4. `get_observations` - Step 3 (details)
**Benefits:**
- Progressive disclosure built into tool design
- No overlapping operations
- Simple schemas (`additionalProperties: true`)
- Clear workflow pattern
- ~10x token savings
**Code Size:** ~312 lines in mcp-server.ts (88% reduction)
### Key Insight
**Before:** Progressive disclosure was something Claude had to remember
**After:** Progressive disclosure is enforced by tool design itself
The 3-layer workflow pattern makes it structurally difficult to waste tokens:
- Can't fetch details without first getting IDs from search
- Can't search without seeing workflow reminder (`__IMPORTANT`)
- Timeline provides middle ground between index and full details
## Configuration
### Claude Desktop
Add to `claude_desktop_config.json`:
```json
{
"mcpServers": {
"mcp-search": {
"command": "node",
"args": [
"/Users/YOUR_USERNAME/.claude/plugins/marketplaces/thedotmack/plugin/scripts/mcp-server.cjs"
]
}
}
}
```
### Claude Code
MCP server is automatically configured via plugin installation. No manual setup required.
**Both clients use the same MCP tools** - the architecture works identically for Claude Desktop and Claude Code.
## Security
### FTS5 Injection Prevention
All search queries are escaped before FTS5 processing:
```typescript
function escapeFTS5Query(query: string): string {
return query.replace(/"/g, '""');
}
```
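For instance, quotes in user input are doubled so they stay inside the FTS5 string literal (function reproduced from above to make the sketch runnable):

```typescript
// Doubling embedded quotes keeps user input inside the FTS5 string literal.
function escapeFTS5Query(query: string): string {
  return query.replace(/"/g, '""');
}

console.log(escapeFTS5Query('fix "auth" bug'));
// → fix ""auth"" bug
```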
**Testing:** 332 injection attack tests covering special characters, SQL keywords, quote escaping, and boolean operators.
### MCP Protocol Security
- Stdio transport (no network exposure)
- Local-only HTTP API (localhost:37777)
- No authentication needed (local development only)
## Performance
**FTS5 Full-Text Search:** <10ms for typical queries
**MCP Overhead:** Minimal - simple protocol translation
**Caching:** HTTP layer allows response caching (future enhancement)
**Pagination:** Efficient with offset/limit
**Batching:** `get_observations` accepts multiple IDs in single call
## Benefits Over Alternative Approaches
### vs. Traditional RAG
**Traditional RAG:**
- Fetches everything upfront
- High token cost
- Low relevance ratio
**3-Layer MCP:**
- Fetches only what's needed
- ~10x token savings
- 100% relevance (Claude chooses what to fetch)
### vs. Previous MCP Implementation (v5.x)
**Previous (9 tools):**
- Complex schemas
- Overlapping operations
- No workflow guidance
- ~2,500 tokens in definitions
**Current (4 tools):**
- Simple schemas
- Clear workflow
- Built-in guidance
- ~312 lines of code
### vs. Skill-Based Approach (Previously)
**Skill approach:**
- Required separate skill files
- HTTP API called directly via curl
- Progressive disclosure through skill loading
**MCP approach:**
- Native MCP protocol (better Claude integration)
- Cleaner architecture (protocol translation layer)
- Works with both Claude Desktop and Claude Code
- Simpler to maintain (no skill files)
**Migration:** Skill-based search was removed in favor of streamlined MCP architecture.
## Troubleshooting
### MCP Server Not Connected
**Symptoms:** Tools not appearing in Claude
**Solution:**
1. Check MCP server path in configuration
2. Verify worker service is running: `curl http://localhost:37777/api/health`
3. Restart Claude Desktop/Code
### Worker Service Not Running
**Symptoms:** MCP tools fail with connection errors
**Solution:**
```bash
npm run worker:status # Check status
npm run worker:restart # Restart worker
npm run worker:logs # View logs
```
### Empty Search Results
**Symptoms:** search() returns no results
**Troubleshooting:**
1. Test API directly: `curl "http://localhost:37777/api/search?query=test"`
2. Check database: `ls ~/.claude-mem/claude-mem.db`
3. Verify observations exist: `curl "http://localhost:37777/api/health"`
## Next Steps
- [Memory Search Usage](/usage/search-tools) - User guide with examples
- [Progressive Disclosure](/progressive-disclosure) - Philosophy behind 3-layer workflow
- [Worker Service Architecture](/architecture/worker-service) - HTTP API details
- [Database Schema](/architecture/database) - FTS5 tables and indexes
@@ -260,14 +260,12 @@ The index is useless without retrieval mechanisms:
*Use claude-mem MCP search to access records with the given ID*
```
**Available MCP tools:**
- `search` - Search memory index (Layer 1: Get IDs)
- `timeline` - Get chronological context (Layer 2: See narrative arc)
- `get_observations` - Fetch full details (Layer 3: Deep dive)
The 3-layer workflow ensures progressive disclosure: index → context → details.
---
@@ -318,16 +316,18 @@ Is my task related to npm? → YES
---
## The Three-Layer Workflow
Claude-Mem implements progressive disclosure through a 3-layer workflow pattern:
### Layer 1: Search (Index)
Start by searching to get a compact index with IDs:
```typescript
search({
query: "hook timeout",
  limit: 10
})
```
@@ -335,23 +335,40 @@ search_observations({
```
Found 3 observations matching "hook timeout":
| ID | Date | Type | Title |
|----|------|------|-------|
| #2543 | Oct 26 | gotcha | Hook timeout: 60s too short |
| #2891 | Oct 25 | how-it-works | Hook timeout configuration |
| #2102 | Oct 20 | problem-solution | Fixed timeout in CI |
```
**Cost:** ~50-100 tokens per result
**Value:** Agent can scan and decide which observations are relevant
### Layer 2: Timeline (Context)
Get chronological context around interesting observations:
```typescript
timeline({
anchor: 2543, // Observation ID from search
depth_before: 3,
depth_after: 3
})
```
**Returns:** Chronological view showing what happened before/during/after observation #2543
**Cost:** Variable based on depth
**Value:** Understand narrative arc and context
### Layer 3: Get Observations (Details)
Fetch full details only for relevant observations:
```typescript
get_observations({
ids: [2543, 2102] // Selected from search results
})
```
@@ -463,29 +480,30 @@ Here are 10 observations.
*Use MCP search tools to fetch full observation details on-demand*
```
### ❌ Skipping the Index Layer
**Bad:**
```typescript
// Fetching full details immediately
get_observations({
ids: [1, 2, 3, 4, 5, 6, 7, 8, 9, 10] // Guessing which are relevant
})
```
**Good:**
```typescript
// Follow the 3-layer workflow
// Layer 1: Search for index
search({
  query: "hooks",
  limit: 20
})

// Layer 2: Review index, identify 2-3 relevant IDs
// Layer 3: Fetch only relevant observations
get_observations({
ids: [2543, 2891] // Just the most relevant
})
```
```typescript
// Use embeddings to pre-sort index by relevance
search({
query: "authentication bug",
orderBy: "relevance" // Based on semantic similarity (future enhancement)
})
```
3. Test simple query:
```bash
# Test MCP search tool
search(query="test", limit=5)
```
4. Check query syntax:
```bash
# Bad: Special characters may cause issues
search(query="[test]")
# Good: Simple words
search(query="test")
```
### Token Limit Errors
**Solutions**:
1. Follow 3-layer workflow (don't skip to get_observations):
```bash
# Start with search to get index
search(query="...", limit=10)
# Review IDs, then fetch only relevant ones
get_observations(ids=[<2-3 relevant IDs>])
```
2. Reduce limit in search:
```bash
search(query="...", limit=3)
```
3. Use filters to narrow results:
```bash
search(query="...", type="decision", limit=5)
```
4. Paginate results:
```bash
# First page
search(query="...", limit=5, offset=0)
# Second page
search(query="...", limit=5, offset=5)
```
5. Batch IDs in get_observations:
```bash
# Always batch multiple IDs in one call
get_observations(ids=[123, 456, 789])
# Don't make separate calls per ID
```
## Performance Issues
---
title: "Memory Search"
description: "Search your project history with MCP tools"
---
# Memory Search with MCP Tools
Claude-mem provides persistent memory across sessions through **4 MCP tools** that follow a token-efficient **3-layer workflow pattern**.
## Overview
Instead of fetching all historical data upfront (expensive), claude-mem uses a progressive disclosure approach:
1. **Search** → Get a compact index with IDs (~50-100 tokens/result)
2. **Timeline** → Get context around interesting results
3. **Get Observations** → Fetch full details ONLY for filtered IDs
This achieves **~10x token savings** compared to traditional RAG approaches.
## The 3-Layer Workflow
### Layer 1: Search (Index)
Start by searching to get a lightweight index of results:
```
search(query="authentication bug", type="bugfix", limit=10)
```
**Returns:** Compact table with IDs, titles, dates, types
**Cost:** ~50-100 tokens per result
**Purpose:** Survey what exists before fetching details
### Layer 2: Timeline (Context)
Get chronological context around specific observations:
```
timeline(anchor=<observation_id>, depth_before=3, depth_after=3)
```
Or search and get timeline in one step:
```
timeline(query="authentication", depth_before=2, depth_after=2)
```
**Returns:** Chronological view showing what was happening before/after
**Cost:** Variable, depends on depth
**Purpose:** Understand narrative arc and context
### Layer 3: Get Observations (Details)
Fetch full details only for relevant observations:
```
get_observations(ids=[123, 456, 789])
```
**Returns:** Complete observation details (narrative, facts, files, concepts)
**Cost:** ~500-1000 tokens per observation
**Purpose:** Deep dive on specific, validated items
### Why This Works
**Traditional Approach:**
- Fetch everything upfront: 20,000 tokens
- Relevance: ~10% (2,000 tokens actually useful)
- Waste: 18,000 tokens on irrelevant context
**3-Layer Approach:**
- Search index: 1,000 tokens (10 results)
- Timeline context: 500 tokens (around 2 key results)
- Fetch details: 1,500 tokens (3 observations)
- **Total: 3,000 tokens, 100% relevant**
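The comparison above is simple arithmetic. As an illustrative sketch (the per-item costs are rough midpoints of the estimates quoted on this page, not measured values):

```python
# Illustrative token budgets; per-item costs are rough midpoints
# of the estimates quoted in this documentation, not measured values.

def budget_traditional(n_items: int, tokens_per_item: int = 1000) -> int:
    """Fetch full details for every observation upfront."""
    return n_items * tokens_per_item

def budget_three_layer(index_rows: int, timeline_items: int, detail_items: int) -> int:
    """Search index (~100/row), timeline (~250/item), full details (~500/item)."""
    return index_rows * 100 + timeline_items * 250 + detail_items * 500

print(budget_traditional(20))        # 20000 tokens, mostly irrelevant
print(budget_three_layer(10, 2, 3))  # 3000 tokens, all relevant
```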
## Available Tools
### `__IMPORTANT` - Workflow Documentation
Always visible reminder of the 3-layer workflow pattern. Helps Claude understand how to use the search tools efficiently.
**Usage:** Automatically shown, no need to invoke
### `search` - Search Memory Index
Search your memory and get a compact index with IDs.
**Parameters:**
- `query` - Full-text search query (supports AND, OR, NOT, phrase searches)
- `limit` - Maximum results (default: 20)
- `offset` - Skip first N results for pagination
- `type` - Filter by observation type (bugfix, feature, decision, discovery, refactor, change)
- `obs_type` - Filter by record type (observation, session, prompt)
- `project` - Filter by project name
- `dateStart` - Filter by start date (YYYY-MM-DD)
- `dateEnd` - Filter by end date (YYYY-MM-DD)
- `orderBy` - Sort order (date_desc, date_asc, relevance)
**Returns:** Compact index table with IDs, titles, dates, types
**Example:**
```
search(query="database migration", type="bugfix", limit=5, orderBy="date_desc")
```
### `timeline` - Get Chronological Context
Get a chronological view of observations around a specific point or query.
**Parameters:**
- `anchor` - Observation ID to center timeline around (optional if query provided)
- `query` - Search query to find anchor automatically (optional if anchor provided)
- `depth_before` - Number of observations before anchor (default: 3)
- `depth_after` - Number of observations after anchor (default: 3)
- `project` - Filter by project name
**Returns:** Chronological list showing what happened before/during/after
**Example:**
```
timeline(anchor=12345, depth_before=5, depth_after=5)
```
Or search-based:
```
timeline(query="implemented JWT auth", depth_before=3, depth_after=3)
```
### `get_observations` - Fetch Full Details
Fetch complete observation details by IDs. **Always batch multiple IDs in a single call for efficiency.**
**Parameters:**
- `ids` - Array of observation IDs (required)
- `orderBy` - Sort order (date_desc, date_asc)
- `limit` - Maximum observations to return
- `project` - Filter by project name
**Returns:** Complete observation details including narrative, facts, files, concepts
**Example:**
```
get_observations(ids=[123, 456, 789, 1011])
```
**Important:** Always batch IDs instead of making separate calls per observation.
## Common Use Cases
### Debugging Issues
**Scenario:** Find what went wrong with database connections
```
Step 1: search(query="error database connection", type="bugfix", limit=10)
→ Review index, identify observations #245, #312, #489
Step 2: timeline(anchor=312, depth_before=3, depth_after=3)
→ See what was happening around the fix
Step 3: get_observations(ids=[312, 489])
→ Get full details on relevant fixes
```
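The same flow can be sketched in Python. The stub functions below are hypothetical stand-ins for the MCP tools (the real tools are invoked by Claude, not imported from a library) and return canned data, included only to show the shape of the three layers:

```python
# Hypothetical stand-ins for the MCP tools, for illustration only.
def search(query, type=None, limit=20):
    # Layer 1: would return a compact index of {id, title} rows.
    return [{"id": 245, "title": "Connection pool exhaustion"},
            {"id": 312, "title": "Fixed retry on connection reset"},
            {"id": 489, "title": "Timeout handling in driver"}]

def timeline(anchor, depth_before=3, depth_after=3):
    # Layer 2: would return observations surrounding the anchor, in order.
    return ([f"before #{anchor}"] * depth_before
            + [f"#{anchor}"]
            + [f"after #{anchor}"] * depth_after)

def get_observations(ids):
    # Layer 3: would return full narrative/facts/files for each ID.
    return [{"id": i, "narrative": "..."} for i in ids]

index = search("error database connection", type="bugfix", limit=10)
relevant = [312, 489]  # chosen by reviewing the index titles
context = timeline(anchor=312, depth_before=3, depth_after=3)
details = get_observations(relevant)  # one batched call, not one call per ID
print(len(index), len(context), len(details))  # 3 7 2
```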
### Understanding Decisions
**Scenario:** Review architectural choices about authentication
```
Step 1: search(query="authentication", type="decision", limit=5)
→ Find decision observations
Step 2: get_observations(ids=[<relevant_ids>])
→ Get full decision rationale, trade-offs, facts
```
### Code Archaeology
**Scenario:** Find when a specific file was modified
```
Step 1: search(query="worker-service.ts", limit=20)
→ Get all observations mentioning that file
Step 2: timeline(query="worker-service.ts refactor", depth_before=2, depth_after=2)
→ See what led to and followed from the refactor
Step 3: get_observations(ids=[<specific_observation_ids>])
→ Get implementation details
```
### Feature History
**Scenario:** Track how a feature evolved
```
Step 1: search(query="dark mode", type="feature", orderBy="date_asc")
→ Chronological view of feature work
Step 2: timeline(anchor=<first_observation_id>, depth_after=10)
→ See the full development timeline
Step 3: get_observations(ids=[<key_milestones>])
→ Deep dive on critical implementation points
```
### Learning from Past Work
**Scenario:** Review refactoring patterns
```
Step 1: search(type="refactor", limit=10, orderBy="date_desc")
→ Recent refactoring work
Step 2: get_observations(ids=[<interesting_ids>])
→ Study the patterns and approaches used
```
### Context Recovery
**Scenario:** Restore context after time away from project
```
Step 1: search(query="project-name", limit=10, orderBy="date_desc")
→ See recent work
Step 2: timeline(anchor=<most_recent_id>, depth_before=10)
→ Understand what led to current state
Step 3: get_observations(ids=[<critical_observations>])
→ Refresh memory on key decisions
```
## Search Query Syntax
The `query` parameter supports SQLite FTS5 full-text search syntax:
### Boolean Operators
```
query="authentication AND JWT" # Both terms must appear
query="OAuth OR JWT" # Either term can appear
query="security NOT deprecated" # Exclude deprecated items
```
### Phrase Searches
```
query='"database migration"' # Exact phrase match
```
### Column-Specific Searches
```
query="title:authentication" # Search in title only
query="content:database" # Search in content only
query="concepts:security" # Search in concepts only
```
### Combining Operators
```
query='"user auth" AND (JWT OR session) NOT deprecated'
```
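These operators are standard SQLite FTS5 syntax, so you can experiment with them outside claude-mem. A minimal sketch using Python's built-in `sqlite3` module with a throwaway in-memory table (assumes your SQLite build ships with FTS5 enabled, as most modern builds do; the table and rows here are invented for the demo):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE VIRTUAL TABLE notes USING fts5(title, content)")
conn.executemany(
    "INSERT INTO notes VALUES (?, ?)",
    [
        ("JWT auth decision", "chose JWT authentication over sessions"),
        ("Deprecated OAuth flow", "old deprecated security approach"),
        ("Database migration", "ran the database migration script"),
    ],
)

# AND: both terms must appear somewhere in the row.
rows = conn.execute(
    "SELECT title FROM notes WHERE notes MATCH ?", ("authentication AND JWT",)
).fetchall()
print(rows)  # [('JWT auth decision',)]

# NOT: exclude rows containing the second term.
rows = conn.execute(
    "SELECT title FROM notes WHERE notes MATCH ?", ("security NOT deprecated",)
).fetchall()
print(rows)  # [] -- the only "security" row is deprecated

# Phrase search: double quotes inside the query string.
rows = conn.execute(
    "SELECT title FROM notes WHERE notes MATCH ?", ('"database migration"',)
).fetchall()
print(rows)  # [('Database migration',)]
```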
## Token Management
### Token Efficiency Best Practices
1. **Always start with search** - Get index first (~50-100 tokens/result)
2. **Use small limits** - Start with 3-5 results, increase if needed
3. **Filter before fetching** - Use type, date, project filters
4. **Batch get_observations** - Always group multiple IDs in one call
5. **Use timeline strategically** - Get context only when narrative matters
### Token Cost Estimates
| Operation | Tokens per Result |
|-----------|-------------------|
| search (index) | 50-100 |
| timeline (per observation) | 100-200 |
| get_observations (full details) | 500-1,000 |
**Example Comparison:**
**Inefficient:**
```
# Fetching 20 full observations upfront: 10,000-20,000 tokens
get_observations(ids=[1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20])
```
**Efficient:**
```
# Search index: ~1,000 tokens
search(query="bug fix", limit=20)
# Review IDs, identify 3 relevant observations
# Fetch only relevant: ~1,500-3,000 tokens
get_observations(ids=[5, 12, 18])
# Total: 2,500-4,000 tokens (vs 10,000-20,000)
```
## Advanced Filtering
Refine searches using the `search` tool's filter parameters:
### Date Ranges
```
search(
query="performance optimization",
dateStart="2025-10-01",
dateEnd="2025-10-31"
)
```
### Multiple Types
To cover observations of multiple types, run one search per type or use a broader query:
```
search(query="database", type="bugfix", limit=10)
search(query="database", type="feature", limit=10)
```
### Project-Specific
```
search(query="API", project="my-app", limit=15)
```
### Pagination
```
# First page
search(query="refactor", limit=10, offset=0)
# Second page
search(query="refactor", limit=10, offset=10)
# Third page
search(query="refactor", limit=10, offset=20)
```
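When a result set spans more than a couple of pages, the offset pattern above generalizes to a loop. A sketch using a hypothetical `search` stub that mimics the tool's limit/offset behavior over a fake result set:

```python
# Hypothetical stand-in mimicking the search tool's limit/offset paging.
ALL_RESULTS = [f"#{i} refactor note" for i in range(23)]

def search(query, limit=10, offset=0):
    return ALL_RESULTS[offset:offset + limit]

pages = []
offset = 0
while True:
    page = search("refactor", limit=10, offset=offset)
    if not page:  # empty page means we've paged past the end
        break
    pages.append(page)
    offset += 10

print(len(pages), sum(len(p) for p in pages))  # 3 23
```

Stop as soon as the index rows you need have appeared; there is no reason to page through everything before moving to `get_observations`.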
## Result Metadata
All observations include rich metadata:
```
## JWT authentication decision
**Type**: decision
**Date**: 2025-10-21 14:23:45
**Concepts**: authentication, security, architecture
**Files Read**: src/auth/middleware.ts, src/utils/jwt.ts
**Files Modified**: src/auth/jwt-strategy.ts
**Narrative**:
Decided to implement JWT-based authentication instead of session-based
authentication for better scalability and stateless design...
**Facts**:
• JWT tokens expire after 1 hour
• Refresh tokens stored in httpOnly cookies
• Token signing uses RS256 algorithm
• Public keys rotated every 30 days
```
- **ID** - Unique observation identifier
- **Type** - bugfix, feature, decision, discovery, refactor, change
- **Date** - When the work occurred
- **Title** - Concise description
- **Concepts** - Tagged themes (e.g., security, performance, architecture)
- **Files Read** - Files examined during work
- **Files Modified** - Files changed during work
- **Narrative** - Story of what happened and why
- **Facts** - Key factual points (decisions made, patterns used, metrics)
## Troubleshooting
### No Results Found
1. **Broaden your search:**
```
# Too specific
search(query="JWT authentication implementation with RS256")
# Better
search(query="authentication")
```
2. **Check database has data:**
```bash
sqlite3 ~/.claude-mem/claude-mem.db "SELECT COUNT(*) FROM observations;"
curl "http://localhost:37777/api/search?query=test"
```
3. **Try without filters:**
```
# Remove type/date filters to see if data exists
search(query="your-search-term")
```
### IDs Not Found in get_observations
**Error:** "Observation IDs not found: [123, 456]"
**Causes:**
- IDs from different project (use `project` parameter)
- IDs were deleted
- Typo in ID numbers
**Solution:**
```
# Verify IDs exist
search(query="<related-search>")
# Use correct project filter
get_observations(ids=[123, 456], project="correct-project-name")
```
### Token Limit Errors
**Error:** Response exceeds token limits
**Solution:** Use the 3-layer workflow to reduce upfront costs:
```
# Instead of fetching 50 full observations:
# get_observations(ids=[1,2,3,...,50]) # 25,000-50,000 tokens!
# Do this:
search(query="<your-query>", limit=50) # ~2,500-5,000 tokens
# Review index, identify 5 relevant observations
get_observations(ids=[<5-most-relevant>]) # ~2,500-5,000 tokens
# Total: 5,000-10,000 tokens (50-80% savings)
```
### Search Performance
If searches seem slow:
1. Be more specific in queries (helps FTS5 index)
2. Use date range filters to narrow scope
3. Specify project filter when possible
4. Use smaller limit values
## Best Practices
1. **Index First, Details Later** - Always start with search to survey options
2. **Filter Before Fetching** - Use search parameters to narrow results
3. **Batch ID Fetches** - Group multiple IDs in one get_observations call
4. **Use Timeline for Context** - When narrative matters, timeline shows the story
5. **Specific Queries** - More specific = better relevance
6. **Small Limits Initially** - Start with 3-5 results, expand if needed
7. **Review Before Deep Dive** - Check index before fetching full details
## Technical Details
**Architecture:** MCP tools are a thin wrapper over the Worker HTTP API (localhost:37777). The MCP server translates tool calls into HTTP requests to the worker service, which handles all business logic, database queries, and Chroma vector search.
**MCP Server:** Located at `~/.claude/plugins/marketplaces/thedotmack/plugin/scripts/mcp-server.cjs`
**Worker Service:** Express API on port 37777, managed by Bun
**Database:** SQLite FTS5 full-text search on `~/.claude-mem/claude-mem.db`
**Vector Search:** Chroma embeddings for semantic search (underlying implementation)
## Next Steps
- [Progressive Disclosure](/progressive-disclosure) - Philosophy behind 3-layer workflow
- [Architecture Overview](/architecture/overview) - System components
- [Getting Started](/usage/getting-started) - Automatic operation
- [Database Schema](/architecture/database) - Understanding the data structure
- [Claude Desktop Setup](/usage/claude-desktop) - Installation and configuration