---
title: "Search Architecture"
description: "MCP tools with 3-layer workflow for token-efficient memory retrieval"
---

# Search Architecture

Claude-mem uses an **MCP-based search architecture** that provides intelligent memory retrieval through 4 streamlined tools following a 3-layer workflow pattern.

## Overview

**Architecture**: MCP Tools → MCP Protocol → HTTP API → Worker Service

**Key Components**:

1. **MCP Tools** (4 tools) - `search`, `timeline`, `get_observations`, `__IMPORTANT`
2. **MCP Server** (`plugin/scripts/mcp-server.cjs`) - Thin wrapper over the HTTP API
3. **HTTP API Endpoints** - Fast search operations on the Worker Service (port 37777)
4. **Worker Service** - Express.js server with FTS5 full-text search
5. **SQLite Database** - Persistent storage with FTS5 virtual tables
6. **Chroma Vector DB** - Semantic search with hybrid retrieval

**Token Efficiency**: ~10x savings per result through the 3-layer workflow pattern. An index row costs roughly a tenth of a full observation, so Claude pays full price only for the results it selects.

## How It Works

### 1. User Query

Claude has access to 4 MCP tools. When searching memory, Claude follows the 3-layer workflow:

```
Step 1: search(query="authentication bug", type="bugfix", limit=10)
Step 2: timeline(anchor=<observation_id>, depth_before=3, depth_after=3)
Step 3: get_observations(ids=[123, 456, 789])
```

### 2. MCP Protocol

The MCP server receives the tool call as a JSON-RPC 2.0 request over stdio:

```json
{
  "jsonrpc": "2.0",
  "id": 1,
  "method": "tools/call",
  "params": {
    "name": "search",
    "arguments": {
      "query": "authentication bug",
      "type": "bugfix",
      "limit": 10
    }
  }
}
```

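MCP's stdio transport sends each JSON-RPC message as one line of JSON terminated by a newline. The sketch below shows how a client might frame the `tools/call` request above. `makeToolCall` and `frame` are hypothetical helper names for illustration, not functions from claude-mem.

```typescript
// Hypothetical helpers: JSON-RPC 2.0 framing for MCP's stdio transport.
interface ToolCallRequest {
  jsonrpc: "2.0";
  id: number;
  method: "tools/call";
  params: { name: string; arguments: Record<string, unknown> };
}

function makeToolCall(id: number, name: string, args: Record<string, unknown>): ToolCallRequest {
  return { jsonrpc: "2.0", id, method: "tools/call", params: { name, arguments: args } };
}

// The stdio transport separates messages with newlines, so each message must
// serialize to a single line. JSON.stringify without an indent argument emits
// no raw newlines (a newline inside a string becomes the two characters \n).
function frame(message: ToolCallRequest): string {
  return JSON.stringify(message) + "\n";
}

const line = frame(
  makeToolCall(1, "search", { query: "authentication bug", type: "bugfix", limit: 10 }),
);
// A client would write `line` to the MCP server process's stdin.
```

The `id` lets the client match the server's response to this request when several calls are in flight.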
### 3. HTTP API Call

The MCP server translates the call into an HTTP request to the worker:

```typescript
const url = `http://localhost:37777/api/search?query=authentication%20bug&type=bugfix&limit=10`;
const response = await fetch(url);
```

### 4. Worker Processing

The worker service executes an FTS5 query:

```sql
SELECT * FROM observations_fts
WHERE observations_fts MATCH ?
  AND type = 'bugfix'
ORDER BY rank
LIMIT 10
```

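The query above hard-codes the filter values for readability. A worker would more likely bind every value as a parameter and add a `WHERE` clause only for each filter the caller supplied. A minimal sketch, assuming `observations_fts` exposes `type` and `date` columns as the examples on this page suggest; `buildSearchQuery` is a hypothetical helper, not claude-mem's actual code.

```typescript
// Hypothetical helper: builds a parameterized FTS5 search query.
// Assumes observations_fts has `type` and `date` columns, as in the examples above.
interface SearchFilters {
  query: string;
  type?: string;
  dateStart?: string;
  dateEnd?: string;
  limit?: number;
  offset?: number;
}

function buildSearchQuery(f: SearchFilters): { sql: string; params: (string | number)[] } {
  const clauses: string[] = ["observations_fts MATCH ?"];
  const params: (string | number)[] = [f.query];

  // Add a clause only when the caller supplied that filter.
  if (f.type !== undefined) {
    clauses.push("type = ?");
    params.push(f.type);
  }
  if (f.dateStart !== undefined) {
    clauses.push("date >= ?");
    params.push(f.dateStart);
  }
  if (f.dateEnd !== undefined) {
    clauses.push("date <= ?");
    params.push(f.dateEnd);
  }

  // A default page size of 20 mirrors the documented `limit` default for /api/search.
  params.push(f.limit ?? 20, f.offset ?? 0);

  const sql =
    "SELECT * FROM observations_fts WHERE " +
    clauses.join(" AND ") +
    " ORDER BY rank LIMIT ? OFFSET ?";
  return { sql, params };
}

const q = buildSearchQuery({ query: "authentication bug", type: "bugfix", limit: 10 });
```

Binding values keeps user input out of the SQL text itself. The `MATCH` argument is still FTS5 query syntax, though, so it needs the escaping described under Security.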
### 5. Results Returned

The worker returns structured data → MCP server → Claude:

```json
{
  "content": [{
    "type": "text",
    "text": "| ID | Time | Title | Type |\n|---|---|---|---|\n| #123 | 2:15 PM | Fixed auth token expiry | bugfix |"
  }]
}
```

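The index comes back as a Markdown table inside a text content item. A client that wanted to act on the rows programmatically, for example to collect IDs for `get_observations`, could parse them as sketched below. This assumes the four-column `ID | Time | Title | Type` layout shown above; `parseIndexTable` and `IndexRow` are hypothetical names.

```typescript
// Hypothetical parser for the Markdown index table returned by `search`.
// Assumes the column order | ID | Time | Title | Type | shown above and
// that titles contain no "|" characters.
interface IndexRow {
  id: number;
  time: string;
  title: string;
  type: string;
}

function parseIndexTable(text: string): IndexRow[] {
  const rows: IndexRow[] = [];
  for (const line of text.split("\n")) {
    // Split "| a | b |" into ["a", "b"], dropping the empty edge cells.
    const cells = line.split("|").slice(1, -1).map((c) => c.trim());
    if (cells.length !== 4) continue;
    const match = /^#(\d+)$/.exec(cells[0]);
    if (!match) continue; // skips the header row and the |---| separator row
    rows.push({ id: Number(match[1]), time: cells[1], title: cells[2], type: cells[3] });
  }
  return rows;
}

const sample =
  "| ID | Time | Title | Type |\n|---|---|---|---|\n| #123 | 2:15 PM | Fixed auth token expiry | bugfix |";
const ids = parseIndexTable(sample).map((r) => r.id);
```

Here `ids` is `[123]`, ready to pass to `get_observations`.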
### 6. Claude Processes Results

Claude reviews the index, decides which observations are relevant, and can:

- Use `timeline` to get context
- Use `get_observations` to fetch full details for selected IDs

## The 4 MCP Tools

### `__IMPORTANT` - Workflow Documentation

Always visible to Claude. Explains the 3-layer workflow pattern.

**Description:**

```
3-LAYER WORKFLOW (ALWAYS FOLLOW):
1. search(query) → Get index with IDs (~50-100 tokens/result)
2. timeline(anchor=ID) → Get context around interesting results
3. get_observations([IDs]) → Fetch full details ONLY for filtered IDs
NEVER fetch full details without filtering first. 10x token savings.
```

**Purpose:** Ensures Claude follows the token-efficient pattern

### `search` - Search Memory Index

**Tool Definition:**

```typescript
{
  name: 'search',
  description: 'Step 1: Search memory. Returns index with IDs. Params: query, limit, project, type, obs_type, dateStart, dateEnd, offset, orderBy',
  inputSchema: {
    type: 'object',
    properties: {},
    additionalProperties: true // Accepts any parameters
  }
}
```

**HTTP Endpoint:** `GET /api/search`

**Parameters:**

- `query` - Full-text search query
- `limit` - Maximum results (default: 20)
- `type` - Filter by observation type
- `project` - Filter by project name
- `dateStart`, `dateEnd` - Date range filters
- `offset` - Pagination offset
- `orderBy` - Sort order

**Returns:** Compact index with IDs, titles, dates, and types (~50-100 tokens per result)

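Because the schema accepts arbitrary properties, the caller is responsible for sending only the documented parameters. A typed wrapper makes that explicit. A sketch, assuming the worker listens on `http://localhost:37777` as documented here; `SearchParams`, `WORKER_BASE`, and `buildSearchUrl` are hypothetical names.

```typescript
// Hypothetical typed wrapper for the GET /api/search parameters documented above.
interface SearchParams {
  query?: string;
  limit?: number;
  type?: string;
  project?: string;
  dateStart?: string;
  dateEnd?: string;
  offset?: number;
  orderBy?: string;
}

const WORKER_BASE = "http://localhost:37777"; // assumed default worker address

function buildSearchUrl(params: SearchParams): string {
  const url = new URL("/api/search", WORKER_BASE);
  for (const [key, value] of Object.entries(params)) {
    // Omit unset filters instead of sending the string "undefined".
    if (value !== undefined) url.searchParams.set(key, String(value));
  }
  return url.toString();
}

const searchUrl = buildSearchUrl({ query: "authentication bug", type: "bugfix", limit: 10 });
// "http://localhost:37777/api/search?query=authentication+bug&type=bugfix&limit=10"
```

`URLSearchParams` encodes spaces as `+` rather than `%20`; form-style query decoding, which most server frameworks apply, treats both as a space.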
### `timeline` - Get Chronological Context

**Tool Definition:**

```typescript
{
  name: 'timeline',
  description: 'Step 2: Get context around results. Params: anchor (observation ID) OR query (finds anchor automatically), depth_before, depth_after, project',
  inputSchema: {
    type: 'object',
    properties: {},
    additionalProperties: true
  }
}
```

**HTTP Endpoint:** `GET /api/timeline`

**Parameters:**

- `anchor` - Observation ID to center the timeline around (optional if `query` is provided)
- `query` - Search query used to find the anchor automatically (optional if `anchor` is provided)
- `depth_before` - Number of observations before the anchor (default: 3)
- `depth_after` - Number of observations after the anchor (default: 3)
- `project` - Filter by project name

**Returns:** Chronological view showing what happened before, during, and after the anchor

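Conceptually, `timeline` selects a window around the anchor in chronological order. The sketch below shows that windowing over an in-memory list, using the documented defaults of 3 before and 3 after. It illustrates the idea only; the worker presumably does this against the database. `Observation` and `timelineWindow` are hypothetical.

```typescript
// Hypothetical illustration of timeline windowing around an anchor observation.
interface Observation {
  id: number;
  createdAt: number; // epoch milliseconds
  title: string;
}

function timelineWindow(
  observations: Observation[],
  anchorId: number,
  depthBefore = 3,
  depthAfter = 3,
): Observation[] {
  // Sort a copy chronologically so the caller's array is left untouched.
  const sorted = [...observations].sort((a, b) => a.createdAt - b.createdAt);
  const index = sorted.findIndex((o) => o.id === anchorId);
  if (index === -1) return [];
  // Clamp at the start explicitly; slice already clamps at the end.
  const start = Math.max(0, index - depthBefore);
  return sorted.slice(start, index + depthAfter + 1);
}

const history: Observation[] = Array.from({ length: 10 }, (_, i) => ({
  id: 100 + i,
  createdAt: i * 1000,
  title: `Observation ${100 + i}`,
}));
const around = timelineWindow(history, 101).map((o) => o.id); // [100, 101, 102, 103, 104]
```

Near the start of the history the window is shorter on the "before" side, which is why anchoring on #101 returns only one earlier observation.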
### `get_observations` - Fetch Full Details

**Tool Definition:**

```typescript
{
  name: 'get_observations',
  description: 'Step 3: Fetch full details for filtered IDs. Params: ids (array of observation IDs, required), orderBy, limit, project',
  inputSchema: {
    type: 'object',
    properties: {
      ids: {
        type: 'array',
        items: { type: 'number' },
        description: 'Array of observation IDs to fetch (required)'
      }
    },
    required: ['ids'],
    additionalProperties: true
  }
}
```

**HTTP Endpoint:** `POST /api/observations/batch`

**Body:**

```json
{
  "ids": [123, 456, 789],
  "orderBy": "date_desc",
  "project": "my-app"
}
```

**Returns:** Complete observation details (~500-1,000 tokens per observation)

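Since `ids` is the only required field, a client can validate it before sending the batch request. The sketch below builds the `fetch` arguments without performing the call, assuming the worker address documented on this page. `BatchOptions`, `BatchRequest`, and `buildBatchRequest` are hypothetical names, and the integer check is a stricter assumption than the schema's `number` type.

```typescript
// Hypothetical builder for POST /api/observations/batch.
interface BatchOptions {
  orderBy?: string;
  limit?: number;
  project?: string;
}

interface BatchRequest {
  url: string;
  init: { method: string; headers: Record<string, string>; body: string };
}

function buildBatchRequest(ids: number[], options: BatchOptions = {}): BatchRequest {
  // Mirror the tool schema: ids is required; here we also require integer IDs.
  if (ids.length === 0 || !ids.every((id) => Number.isInteger(id))) {
    throw new Error("ids must be a non-empty array of integer observation IDs");
  }
  return {
    url: "http://localhost:37777/api/observations/batch", // assumed worker address
    init: {
      method: "POST",
      headers: { "Content-Type": "application/json" },
      // JSON.stringify drops option fields whose value is undefined.
      body: JSON.stringify({ ids, ...options }),
    },
  };
}

const req = buildBatchRequest([123, 456, 789], { orderBy: "date_desc", project: "my-app" });
// `await fetch(req.url, req.init)` would send the request.
```

Rejecting an empty list locally avoids a round trip that could only return nothing.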
## MCP Server Implementation

**Location:** `/Users/YOUR_USERNAME/.claude/plugins/marketplaces/thedotmack/plugin/scripts/mcp-server.cjs`

**Role:** Thin wrapper that translates MCP protocol calls into HTTP API calls

**Key Characteristics:**

- ~312 lines of code (reduced from ~2,718 lines in the old implementation)
- No business logic, just protocol translation
- Single source of truth: the Worker HTTP API
- Simple schemas with `additionalProperties: true`

**Handler Example:**

```typescript
{
  name: 'search',
  handler: async (args: Record<string, unknown>) => {
    const endpoint = '/api/search';
    const searchParams = new URLSearchParams();

    // Forward every tool argument as a query parameter, skipping unset values
    for (const [key, value] of Object.entries(args ?? {})) {
      if (value !== undefined && value !== null) {
        searchParams.append(key, String(value));
      }
    }

    const url = `http://localhost:37777${endpoint}?${searchParams}`;
    const response = await fetch(url);

    // Surface worker errors instead of trying to parse an error page as JSON
    if (!response.ok) {
      throw new Error(`Worker returned ${response.status} for ${endpoint}`);
    }
    return await response.json();
  }
}
```

## Worker HTTP API

**Location:** `src/services/worker-service.ts`

**Port:** 37777

**Search Endpoints:**

```text
GET  /api/search              # Main search (used by MCP search tool)
GET  /api/timeline            # Timeline context (used by MCP timeline tool)
POST /api/observations/batch  # Fetch by IDs (used by MCP get_observations tool)
GET  /api/health              # Health check
```

**Database Access:**

- Uses `SessionSearch` service for FTS5 queries
- Uses `SessionStore` for structured queries
- Hybrid search with ChromaDB for semantic similarity

**FTS5 Full-Text Search:**

```sql
-- search tool → HTTP GET → FTS5 query
SELECT * FROM observations_fts
WHERE observations_fts MATCH ?
  AND type = ?
  AND date >= ? AND date <= ?
ORDER BY rank
LIMIT ? OFFSET ?
```

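This page does not specify how FTS5 and ChromaDB results are combined in hybrid search. One common technique for merging two ranked lists is reciprocal rank fusion (RRF), sketched below purely as an illustration. It is an assumption, not a description of claude-mem's merge logic; `reciprocalRankFusion` is a hypothetical name, and `k = 60` is the constant conventionally used with RRF.

```typescript
// Illustrative reciprocal rank fusion (RRF) over ranked lists of observation IDs.
// A generic technique, not necessarily how claude-mem merges FTS5 and Chroma results.
function reciprocalRankFusion(rankings: number[][], k = 60): number[] {
  const scores = new Map<number, number>();
  for (const ranking of rankings) {
    ranking.forEach((id, index) => {
      // Ranks are 1-based: the top hit of each list contributes 1 / (k + 1).
      scores.set(id, (scores.get(id) ?? 0) + 1 / (k + index + 1));
    });
  }
  // Highest fused score first; ties keep first-seen order because sort is stable.
  return Array.from(scores.entries())
    .sort((a, b) => b[1] - a[1])
    .map(([id]) => id);
}

const keywordHits = [123, 456, 789]; // e.g. FTS5 rank order
const semanticHits = [456, 321, 123]; // e.g. vector-similarity order
const fused = reciprocalRankFusion([keywordHits, semanticHits]); // [456, 123, 321, 789]
```

RRF only needs rank positions, so it sidesteps the problem that FTS5 `rank` values and vector distances are on unrelated scales. Observations found by both searches (#456 and #123) rise to the top.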
## The 3-Layer Workflow Pattern

### Design Philosophy

The 3-layer workflow embodies **progressive disclosure**, a core principle of claude-mem's architecture.

**Layer 1: Index (Search)**

- **What:** Compact table with IDs, titles, dates, types
- **Cost:** ~50-100 tokens per result
- **Purpose:** Survey what exists before committing tokens
- **Decision Point:** "Which observations are relevant?"

**Layer 2: Context (Timeline)**

- **What:** Chronological view of observations around a point
- **Cost:** Variable, depending on depth
- **Purpose:** Understand the narrative arc and see what led to or followed a point
- **Decision Point:** "Do I need full details?"

**Layer 3: Details (Get Observations)**

- **What:** Complete observation data (narrative, facts, files, concepts)
- **Cost:** ~500-1,000 tokens per observation
- **Purpose:** Deep dive on validated, relevant observations
- **Decision Point:** "Apply knowledge to current task"

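The three layers map onto three tool calls. The sketch below drives them from client code with the tools injected as functions, so the control flow is visible: survey the index, narrow the IDs, look at context, then fetch details only for what survived. The `Tools` interface, the `isRelevant` predicate, and `runWorkflow` are hypothetical; in claude-mem, Claude makes these decisions itself.

```typescript
// Hypothetical client-side driver for the 3-layer workflow.
interface IndexEntry {
  id: number;
  title: string;
  type: string;
}

interface FullObservation {
  id: number;
  narrative: string;
}

interface Tools {
  search(query: string, limit: number): Promise<IndexEntry[]>; // Layer 1: index
  timeline(anchor: number): Promise<IndexEntry[]>; // Layer 2: context
  getObservations(ids: number[]): Promise<FullObservation[]>; // Layer 3: details
}

async function runWorkflow(
  tools: Tools,
  query: string,
  isRelevant: (entry: IndexEntry) => boolean,
): Promise<FullObservation[]> {
  // Layer 1: survey the cheap index and keep only relevant IDs.
  const index = await tools.search(query, 20);
  const selected = new Set(index.filter(isRelevant).map((e) => e.id));
  if (selected.size === 0) return []; // nothing worth fetching in full

  // Layer 2: look around the first relevant hit; neighbours that also
  // pass the filter are added to the selection.
  const [anchor] = Array.from(selected);
  for (const entry of await tools.timeline(anchor)) {
    if (isRelevant(entry)) selected.add(entry.id);
  }

  // Layer 3: pay full price only for the filtered IDs.
  return tools.getObservations(Array.from(selected));
}

// Usage with in-memory stubs standing in for the MCP tools.
const stubTools: Tools = {
  search: async () => [
    { id: 1, title: "Fixed auth token expiry", type: "bugfix" },
    { id: 2, title: "Updated button styles", type: "change" },
  ],
  timeline: async () => [{ id: 3, title: "Added auth retry", type: "bugfix" }],
  getObservations: async (ids) => ids.map((id) => ({ id, narrative: `details for #${id}` })),
};

const result = runWorkflow(stubTools, "authentication bug", (e) => e.type === "bugfix");
```

With these stubs, only observations #1 and #3 reach `getObservations`; #2 is discarded at the index level and never costs full-detail tokens.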
### Token Efficiency

**Traditional RAG Approach:**

```
Fetch 20 observations upfront: 10,000-20,000 tokens
Relevance: ~10% (only 2 observations actually useful)
Waste: 9,000-18,000 tokens on irrelevant context
```

**3-Layer Workflow:**

```
Step 1: search (20 results)          ~1,000-2,000 tokens
Step 2: Review index, filter to 3 relevant IDs
Step 3: get_observations (3 IDs)     ~1,500-3,000 tokens
Total: 2,500-5,000 tokens (~75% savings)
```

**~10x Savings per Result:** An index row (~50-100 tokens) costs about a tenth of a full observation (~500-1,000 tokens). Filtering at the index level before fetching full details means each rejected result costs roughly a tenth of what fetching it would have.

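The arithmetic behind these figures can be checked in a few lines. The sketch below uses the per-item estimates from this page (50-100 tokens per index row, 500-1,000 per full observation); `estimateTokens` is a hypothetical helper, and the inputs are the page's estimates, not measurements.

```typescript
// Reproduces the token estimates above from per-item costs.
interface Costs {
  indexRow: number;
  fullObservation: number;
}

const LOW: Costs = { indexRow: 50, fullObservation: 500 };
const HIGH: Costs = { indexRow: 100, fullObservation: 1000 };

function estimateTokens(costs: Costs, candidates: number, selected: number) {
  // Traditional RAG: fetch every candidate in full.
  const upfront = candidates * costs.fullObservation;
  // 3-layer: index every candidate, then fetch only the selected ones in full.
  const layered = candidates * costs.indexRow + selected * costs.fullObservation;
  return { upfront, layered, savings: 1 - layered / upfront };
}

const low = estimateTokens(LOW, 20, 3); // { upfront: 10000, layered: 2500, savings: 0.75 }
const high = estimateTokens(HIGH, 20, 3); // { upfront: 20000, layered: 5000, savings: 0.75 }
```

At matching per-item costs the whole query saves about 75% (4x fewer tokens). The ~10x figure is the ratio between a full observation and an index row, so the saving grows as Claude discards a larger share of the index.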
## Architecture Evolution

### Before: Complex MCP Implementation

**Approach:** 9 MCP tools with detailed parameter schemas

**Token Cost:** ~2,500 tokens in tool definitions per session

**Tools:**

- `search_observations` - Full-text search
- `find_by_type` - Filter by type
- `find_by_file` - Filter by file
- `find_by_concept` - Filter by concept
- `get_recent_context` - Recent sessions
- `get_observation` - Fetch single observation
- `get_session` - Fetch session
- `get_prompt` - Fetch prompt
- `help` - API documentation

**Problems:**

- Overlapping operations (`search_observations` vs `find_by_type`)
- Complex parameter schemas
- No built-in workflow guidance
- High token cost at session start

**Code Size:** ~2,718 lines in mcp-server.ts

### After: Streamlined MCP Implementation

**Approach:** 4 MCP tools following the 3-layer workflow

**Token Cost:** Lower than before, thanks to simplified tool definitions

**Tools:**

1. `__IMPORTANT` - Workflow guidance (always visible)
2. `search` - Step 1 (index)
3. `timeline` - Step 2 (context)
4. `get_observations` - Step 3 (details)

**Benefits:**

- Progressive disclosure built into tool design
- No overlapping operations
- Simple schemas (`additionalProperties: true`)
- Clear workflow pattern
- ~10x token savings per result

**Code Size:** ~312 lines in mcp-server.ts (88% reduction)

### Key Insight

**Before:** Progressive disclosure was something Claude had to remember

**After:** Progressive disclosure is enforced by the tool design itself

The 3-layer workflow pattern makes it structurally difficult to waste tokens:

- Can't fetch details without first getting IDs from search
- Can't search without seeing the workflow reminder (`__IMPORTANT`)
- Timeline provides a middle ground between the index and full details

## Configuration

### Claude Desktop

Add to `claude_desktop_config.json`:

```json
{
  "mcpServers": {
    "mcp-search": {
      "command": "node",
      "args": [
        "/Users/YOUR_USERNAME/.claude/plugins/marketplaces/thedotmack/plugin/scripts/mcp-server.cjs"
      ]
    }
  }
}
```

### Claude Code

The MCP server is configured automatically when the plugin is installed. No manual setup is required.

**Both clients use the same MCP tools**: the architecture works identically for Claude Desktop and Claude Code.

## Security

### FTS5 Injection Prevention

All search queries are escaped before FTS5 processing:

```typescript
// Doubles embedded double quotes so the text can sit inside an FTS5 string literal
function escapeFTS5Query(query: string): string {
  return query.replace(/"/g, '""');
}
```

**Testing:** 332 injection attack tests covering special characters, SQL keywords, quote escaping, and boolean operators.

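Doubling quotes only neutralizes FTS5 syntax once the escaped text is wrapped in double quotes: FTS5 treats a double-quoted string as literal text, while bare words such as `AND`, `OR`, `NOT`, and `NEAR` act as operators. One way to keep multi-word queries working as keyword searches is to quote each term separately, as sketched below. It reuses the `escapeFTS5Query` function shown above; `toSafeFtsQuery` is hypothetical and not claude-mem's code.

```typescript
// Same escaping as above: "" is FTS5's escape for a literal double quote.
function escapeFTS5Query(query: string): string {
  return query.replace(/"/g, '""');
}

// Hypothetical: turn free text into an FTS5 query where every term is a quoted string.
function toSafeFtsQuery(input: string): string {
  return (
    input
      .split(/\s+/)
      .filter((term) => term.length > 0)
      // Each quoted term is matched as text, never parsed as an operator.
      .map((term) => `"${escapeFTS5Query(term)}"`)
      // FTS5 treats whitespace-separated strings as an implicit AND.
      .join(" ")
  );
}

const safe = toSafeFtsQuery('auth OR "token" NEAR(x)');
// '"auth" "OR" """token""" "NEAR(x)"'
```

Here `OR` and `NEAR(x)` are searched as words instead of being parsed as operators. Input that is only whitespace produces an empty string, which a caller should handle before it reaches `MATCH`.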
### MCP Protocol Security

- Stdio transport (no network exposure)
- Local-only HTTP API (localhost:37777)
- No authentication needed (local development only)

## Performance

**FTS5 Full-Text Search:** Sub-10ms for typical queries

**MCP Overhead:** Minimal - simple protocol translation

**Caching:** Not yet implemented; routing through the HTTP layer leaves room for response caching as a future enhancement

**Pagination:** Supported via `offset`/`limit` (very large offsets get slower, because SQLite still steps through the skipped rows)

**Batching:** `get_observations` accepts multiple IDs in a single call

## Benefits Over Alternative Approaches

### vs. Traditional RAG

**Traditional RAG:**

- Fetches everything upfront
- High token cost
- Low relevance ratio

**3-Layer MCP:**

- Fetches only what's needed
- ~10x token savings per result
- Higher relevance (Claude chooses what to fetch)

### vs. Previous MCP Implementation (v5.x)

**Previous (9 tools):**

- Complex schemas
- Overlapping operations
- No workflow guidance
- ~2,500 tokens in definitions

**Current (4 tools):**

- Simple schemas
- Clear workflow
- Built-in guidance
- ~312 lines of code

### vs. Skill-Based Approach (Previously)

**Skill approach:**

- Required separate skill files
- HTTP API called directly via curl
- Progressive disclosure through skill loading

**MCP approach:**

- Native MCP protocol (better Claude integration)
- Cleaner architecture (protocol translation layer)
- Works with both Claude Desktop and Claude Code
- Simpler to maintain (no skill files)

**Migration:** Skill-based search was removed in favor of the streamlined MCP architecture.

## Troubleshooting

### MCP Server Not Connected

**Symptoms:** Tools not appearing in Claude

**Solution:**

1. Check the MCP server path in your configuration
2. Verify the worker service is running: `curl http://localhost:37777/api/health`
3. Restart Claude Desktop/Code

### Worker Service Not Running

**Symptoms:** MCP tools fail with connection errors

**Solution:**

```bash
npm run worker:status   # Check status
npm run worker:restart  # Restart worker
npm run worker:logs     # View logs
```

### Empty Search Results

**Symptoms:** `search` returns no results

**Troubleshooting:**

1. Test the API directly: `curl "http://localhost:37777/api/search?query=test"`
2. Check the database: `ls ~/.claude-mem/claude-mem.db`
3. Verify observations exist: `curl "http://localhost:37777/api/health"`

## Next Steps

- [Memory Search Usage](/usage/search-tools) - User guide with examples
- [Progressive Disclosure](/progressive-disclosure) - Philosophy behind 3-layer workflow
- [Worker Service Architecture](/architecture/worker-service) - HTTP API details
- [Database Schema](/architecture/database) - FTS5 tables and indexes