Files
Alex Newman d874ce6eb3 fix: escape less-than character in search-architecture.mdx to resolve MDX parsing error
Changed '<10ms' to 'Sub-10ms' to avoid MDX interpreting the < character as an HTML tag opening, which was causing deployment failure.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
2025-12-29 00:34:14 -05:00

498 lines
13 KiB
Plaintext

---
title: "Search Architecture"
description: "MCP tools with 3-layer workflow for token-efficient memory retrieval"
---
# Search Architecture
Claude-mem uses an **MCP-based search architecture** that provides intelligent memory retrieval through 4 streamlined tools following a 3-layer workflow pattern.
## Overview
**Architecture**: MCP Tools → MCP Protocol → HTTP API → Worker Service
**Key Components**:
1. **MCP Tools** (4 tools) - `search`, `timeline`, `get_observations`, `__IMPORTANT`
2. **MCP Server** (`plugin/scripts/mcp-server.cjs`) - Thin wrapper over HTTP API
3. **HTTP API Endpoints** - Fast search operations on Worker Service (port 37777)
4. **Worker Service** - Express.js server with FTS5 full-text search
5. **SQLite Database** - Persistent storage with FTS5 virtual tables
6. **Chroma Vector DB** - Semantic search with hybrid retrieval
**Token Efficiency**: ~10x savings through 3-layer workflow pattern
## How It Works
### 1. User Query
Claude has access to 4 MCP tools. When searching memory, Claude follows the 3-layer workflow:
```
Step 1: search(query="authentication bug", type="bugfix", limit=10)
Step 2: timeline(anchor=<observation_id>, depth_before=3, depth_after=3)
Step 3: get_observations(ids=[123, 456, 789])
```
### 2. MCP Protocol
MCP server receives tool call via JSON-RPC over stdio:
```json
{
"method": "tools/call",
"params": {
"name": "search",
"arguments": {
"query": "authentication bug",
"type": "bugfix",
"limit": 10
}
}
}
```
### 3. HTTP API Call
MCP server translates to HTTP request:
```typescript
const url = `http://localhost:37777/api/search?query=authentication%20bug&type=bugfix&limit=10`;
const response = await fetch(url);
```
### 4. Worker Processing
Worker service executes FTS5 query:
```sql
SELECT * FROM observations_fts
WHERE observations_fts MATCH ?
AND type = 'bugfix'
ORDER BY rank
LIMIT 10
```
### 5. Results Returned
Worker returns structured data → MCP server → Claude:
```json
{
"content": [{
"type": "text",
"text": "| ID | Time | Title | Type |\n|---|---|---|---|\n| #123 | 2:15 PM | Fixed auth token expiry | bugfix |"
}]
}
```
### 6. Claude Processes Results
Claude reviews the index, decides which observations are relevant, and can:
- Use `timeline` to get context
- Use `get_observations` to fetch full details for selected IDs
## The 4 MCP Tools
### `__IMPORTANT` - Workflow Documentation
Always visible to Claude. Explains the 3-layer workflow pattern.
**Description:**
```
3-LAYER WORKFLOW (ALWAYS FOLLOW):
1. search(query) → Get index with IDs (~50-100 tokens/result)
2. timeline(anchor=ID) → Get context around interesting results
3. get_observations([IDs]) → Fetch full details ONLY for filtered IDs
NEVER fetch full details without filtering first. 10x token savings.
```
**Purpose:** Ensures Claude follows token-efficient pattern
### `search` - Search Memory Index
**Tool Definition:**
```typescript
{
name: 'search',
description: 'Step 1: Search memory. Returns index with IDs. Params: query, limit, project, type, obs_type, dateStart, dateEnd, offset, orderBy',
inputSchema: {
type: 'object',
properties: {},
additionalProperties: true // Accepts any parameters
}
}
```
**HTTP Endpoint:** `GET /api/search`
**Parameters:**
- `query` - Full-text search query
- `limit` - Maximum results (default: 20)
- `type` - Filter by observation type
- `project` - Filter by project name
- `dateStart`, `dateEnd` - Date range filters
- `offset` - Pagination offset
- `orderBy` - Sort order
**Returns:** Compact index with IDs, titles, dates, types (~50-100 tokens per result)
### `timeline` - Get Chronological Context
**Tool Definition:**
```typescript
{
name: 'timeline',
description: 'Step 2: Get context around results. Params: anchor (observation ID) OR query (finds anchor automatically), depth_before, depth_after, project',
inputSchema: {
type: 'object',
properties: {},
additionalProperties: true
}
}
```
**HTTP Endpoint:** `GET /api/timeline`
**Parameters:**
- `anchor` - Observation ID to center timeline around (optional if query provided)
- `query` - Search query to find anchor automatically (optional if anchor provided)
- `depth_before` - Number of observations before anchor (default: 3)
- `depth_after` - Number of observations after anchor (default: 3)
- `project` - Filter by project name
**Returns:** Chronological view showing what happened before/during/after
### `get_observations` - Fetch Full Details
**Tool Definition:**
```typescript
{
name: 'get_observations',
description: 'Step 3: Fetch full details for filtered IDs. Params: ids (array of observation IDs, required), orderBy, limit, project',
inputSchema: {
type: 'object',
properties: {
ids: {
type: 'array',
items: { type: 'number' },
description: 'Array of observation IDs to fetch (required)'
}
},
required: ['ids'],
additionalProperties: true
}
}
```
**HTTP Endpoint:** `POST /api/observations/batch`
**Body:**
```json
{
"ids": [123, 456, 789],
"orderBy": "date_desc",
"project": "my-app"
}
```
**Returns:** Complete observation details (~500-1,000 tokens per observation)
## MCP Server Implementation
**Location:** `/Users/YOUR_USERNAME/.claude/plugins/marketplaces/thedotmack/plugin/scripts/mcp-server.cjs`
**Role:** Thin wrapper that translates MCP protocol to HTTP API calls
**Key Characteristics:**
- ~312 lines of code (reduced from ~2,718 lines in old implementation)
- No business logic - just protocol translation
- Single source of truth: Worker HTTP API
- Simple schemas with `additionalProperties: true`
**Handler Example:**
```typescript
{
name: 'search',
handler: async (args: any) => {
const endpoint = '/api/search';
const searchParams = new URLSearchParams();
for (const [key, value] of Object.entries(args)) {
searchParams.append(key, String(value));
}
const url = `http://localhost:37777${endpoint}?${searchParams}`;
const response = await fetch(url);
return await response.json();
}
}
```
## Worker HTTP API
**Location:** `src/services/worker-service.ts`
**Port:** 37777
**Search Endpoints:**
```typescript
GET /api/search # Main search (used by MCP search tool)
GET /api/timeline # Timeline context (used by MCP timeline tool)
POST /api/observations/batch # Fetch by IDs (used by MCP get_observations tool)
GET /api/health # Health check
```
**Database Access:**
- Uses `SessionSearch` service for FTS5 queries
- Uses `SessionStore` for structured queries
- Hybrid search with ChromaDB for semantic similarity
**FTS5 Full-Text Search:**
```typescript
// search tool → HTTP GET → FTS5 query
SELECT * FROM observations_fts
WHERE observations_fts MATCH ?
AND type = ?
AND date >= ? AND date <= ?
ORDER BY rank
LIMIT ? OFFSET ?
```
## The 3-Layer Workflow Pattern
### Design Philosophy
The 3-layer workflow embodies **progressive disclosure** - a core principle of claude-mem's architecture.
**Layer 1: Index (Search)**
- **What:** Compact table with IDs, titles, dates, types
- **Cost:** ~50-100 tokens per result
- **Purpose:** Survey what exists before committing tokens
- **Decision Point:** "Which observations are relevant?"
**Layer 2: Context (Timeline)**
- **What:** Chronological view of observations around a point
- **Cost:** Variable based on depth
- **Purpose:** Understand narrative arc, see what led to/from a point
- **Decision Point:** "Do I need full details?"
**Layer 3: Details (Get Observations)**
- **What:** Complete observation data (narrative, facts, files, concepts)
- **Cost:** ~500-1,000 tokens per observation
- **Purpose:** Deep dive on validated, relevant observations
- **Decision Point:** "Apply knowledge to current task"
### Token Efficiency
**Traditional RAG Approach:**
```
Fetch 20 observations upfront: 10,000-20,000 tokens
Relevance: ~10% (only 2 observations actually useful)
Waste: 18,000 tokens on irrelevant context
```
**3-Layer Workflow:**
```
Step 1: search (20 results) ~1,000-2,000 tokens
Step 2: Review index, filter to 3 relevant IDs
Step 3: get_observations (3 IDs) ~1,500-3,000 tokens
Total: 2,500-5,000 tokens (50-75% savings)
```
**10x Savings:** By filtering at index level before fetching full details
## Architecture Evolution
### Before: Complex MCP Implementation
**Approach:** 9 MCP tools with detailed parameter schemas
**Token Cost:** ~2,500 tokens in tool definitions per session
- `search_observations` - Full-text search
- `find_by_type` - Filter by type
- `find_by_file` - Filter by file
- `find_by_concept` - Filter by concept
- `get_recent_context` - Recent sessions
- `get_observation` - Fetch single observation
- `get_session` - Fetch session
- `get_prompt` - Fetch prompt
- `help` - API documentation
**Problems:**
- Overlapping operations (search_observations vs find_by_type)
- Complex parameter schemas
- No built-in workflow guidance
- High token cost at session start
**Code Size:** ~2,718 lines in mcp-server.ts
### After: Streamlined MCP Implementation
**Approach:** 4 MCP tools following 3-layer workflow
**Token Cost:** ~312 lines of code, simplified tool definitions
**Tools:**
1. `__IMPORTANT` - Workflow guidance (always visible)
2. `search` - Step 1 (index)
3. `timeline` - Step 2 (context)
4. `get_observations` - Step 3 (details)
**Benefits:**
- Progressive disclosure built into tool design
- No overlapping operations
- Simple schemas (`additionalProperties: true`)
- Clear workflow pattern
- ~10x token savings
**Code Size:** ~312 lines in mcp-server.ts (88% reduction)
### Key Insight
**Before:** Progressive disclosure was something Claude had to remember
**After:** Progressive disclosure is enforced by tool design itself
The 3-layer workflow pattern makes it structurally difficult to waste tokens:
- Can't fetch details without first getting IDs from search
- Can't search without seeing workflow reminder (`__IMPORTANT`)
- Timeline provides middle ground between index and full details
## Configuration
### Claude Desktop
Add to `claude_desktop_config.json`:
```json
{
"mcpServers": {
"mcp-search": {
"command": "node",
"args": [
"/Users/YOUR_USERNAME/.claude/plugins/marketplaces/thedotmack/plugin/scripts/mcp-server.cjs"
]
}
}
}
```
### Claude Code
MCP server is automatically configured via plugin installation. No manual setup required.
**Both clients use the same MCP tools** - the architecture works identically for Claude Desktop and Claude Code.
## Security
### FTS5 Injection Prevention
All search queries are escaped before FTS5 processing:
```typescript
function escapeFTS5Query(query: string): string {
return query.replace(/"/g, '""');
}
```
**Testing:** 332 injection attack tests covering special characters, SQL keywords, quote escaping, and boolean operators.
### MCP Protocol Security
- Stdio transport (no network exposure)
- Local-only HTTP API (localhost:37777)
- No authentication needed (local development only)
## Performance
**FTS5 Full-Text Search:** Sub-10ms for typical queries
**MCP Overhead:** Minimal - simple protocol translation
**Caching:** HTTP layer allows response caching (future enhancement)
**Pagination:** Efficient with offset/limit
**Batching:** `get_observations` accepts multiple IDs in single call
## Benefits Over Alternative Approaches
### vs. Traditional RAG
**Traditional RAG:**
- Fetches everything upfront
- High token cost
- Low relevance ratio
**3-Layer MCP:**
- Fetches only what's needed
- ~10x token savings
- 100% relevance (Claude chooses what to fetch)
### vs. Previous MCP Implementation (v5.x)
**Previous (9 tools):**
- Complex schemas
- Overlapping operations
- No workflow guidance
- ~2,500 tokens in definitions
**Current (4 tools):**
- Simple schemas
- Clear workflow
- Built-in guidance
- ~312 lines of code
### vs. Skill-Based Approach (Previously)
**Skill approach:**
- Required separate skill files
- HTTP API called directly via curl
- Progressive disclosure through skill loading
**MCP approach:**
- Native MCP protocol (better Claude integration)
- Cleaner architecture (protocol translation layer)
- Works with both Claude Desktop and Claude Code
- Simpler to maintain (no skill files)
**Migration:** Skill-based search was removed in favor of streamlined MCP architecture.
## Troubleshooting
### MCP Server Not Connected
**Symptoms:** Tools not appearing in Claude
**Solution:**
1. Check MCP server path in configuration
2. Verify worker service is running: `curl http://localhost:37777/api/health`
3. Restart Claude Desktop/Code
### Worker Service Not Running
**Symptoms:** MCP tools fail with connection errors
**Solution:**
```bash
npm run worker:status # Check status
npm run worker:restart # Restart worker
npm run worker:logs # View logs
```
### Empty Search Results
**Symptoms:** search() returns no results
**Troubleshooting:**
1. Test API directly: `curl "http://localhost:37777/api/search?query=test"`
2. Check database: `ls ~/.claude-mem/claude-mem.db`
3. Verify observations exist: `curl "http://localhost:37777/api/health"`
## Next Steps
- [Memory Search Usage](/usage/search-tools) - User guide with examples
- [Progressive Disclosure](/progressive-disclosure) - Philosophy behind 3-layer workflow
- [Worker Service Architecture](/architecture/worker-service) - HTTP API details
- [Database Schema](/architecture/database) - FTS5 tables and indexes