claude-mem/docs/public/architecture/search-architecture.mdx

---
title: "Search Architecture"
description: "MCP tools with 3-layer workflow for token-efficient memory retrieval"
---

# Search Architecture

Claude-mem uses an **MCP-based search architecture** that provides intelligent memory retrieval through 4 streamlined tools following a 3-layer workflow pattern.

## Overview

**Architecture**: MCP Tools → MCP Protocol → HTTP API → Worker Service

**Key Components**:
1. **MCP Tools** (4 tools) - `search`, `timeline`, `get_observations`, `__IMPORTANT`
2. **MCP Server** (`plugin/scripts/mcp-server.cjs`) - Thin wrapper over HTTP API
3. **HTTP API Endpoints** - Fast search operations on Worker Service (port 37777)
4. **Worker Service** - Express.js server with FTS5 full-text search
5. **SQLite Database** - Persistent storage with FTS5 virtual tables
6. **Chroma Vector DB** - Semantic search with hybrid retrieval

**Token Efficiency**: ~10x savings through 3-layer workflow pattern

## How It Works

### 1. User Query

Claude has access to 4 MCP tools. When searching memory, Claude follows the 3-layer workflow:

```
Step 1: search(query="authentication bug", type="bugfix", limit=10)
Step 2: timeline(anchor=<observation_id>, depth_before=3, depth_after=3)
Step 3: get_observations(ids=[123, 456, 789])
```

### 2. MCP Protocol

MCP server receives tool call via JSON-RPC over stdio:

```json
{
  "method": "tools/call",
  "params": {
    "name": "search",
    "arguments": {
      "query": "authentication bug",
      "type": "bugfix",
      "limit": 10
    }
  }
}
```

### 3. HTTP API Call

MCP server translates to HTTP request:

```typescript
const url = `http://localhost:37777/api/search?query=authentication%20bug&type=bugfix&limit=10`;
const response = await fetch(url);
```

### 4. Worker Processing

Worker service executes FTS5 query:

```sql
SELECT * FROM observations_fts
WHERE observations_fts MATCH ?
AND type = 'bugfix'
ORDER BY rank
LIMIT 10
```

### 5. Results Returned

Worker returns structured data → MCP server → Claude:

```json
{
  "content": [{
    "type": "text",
    "text": "| ID | Time | Title | Type |\n|---|---|---|---|\n| #123 | 2:15 PM | Fixed auth token expiry | bugfix |"
  }]
}
```

### 6. Claude Processes Results

Claude reviews the index, decides which observations are relevant, and can:
- Use `timeline` to get context
- Use `get_observations` to fetch full details for selected IDs

## The 4 MCP Tools

### `__IMPORTANT` - Workflow Documentation

Always visible to Claude. Explains the 3-layer workflow pattern.

**Description:**
```
3-LAYER WORKFLOW (ALWAYS FOLLOW):
1. search(query) → Get index with IDs (~50-100 tokens/result)
2. timeline(anchor=ID) → Get context around interesting results
3. get_observations([IDs]) → Fetch full details ONLY for filtered IDs
NEVER fetch full details without filtering first. 10x token savings.
```

**Purpose:** Ensures Claude follows token-efficient pattern

### `search` - Search Memory Index

**Tool Definition:**
```typescript
{
  name: 'search',
  description: 'Step 1: Search memory. Returns index with IDs. Params: query, limit, project, type, obs_type, dateStart, dateEnd, offset, orderBy',
  inputSchema: {
    type: 'object',
    properties: {},
    additionalProperties: true  // Accepts any parameters
  }
}
```

**HTTP Endpoint:** `GET /api/search`

**Parameters:**
- `query` - Full-text search query
- `limit` - Maximum results (default: 20)
- `type` - Filter by observation type
- `project` - Filter by project name
- `dateStart`, `dateEnd` - Date range filters
- `offset` - Pagination offset
- `orderBy` - Sort order

**Returns:** Compact index with IDs, titles, dates, types (~50-100 tokens per result)

### `timeline` - Get Chronological Context

**Tool Definition:**
```typescript
{
  name: 'timeline',
  description: 'Step 2: Get context around results. Params: anchor (observation ID) OR query (finds anchor automatically), depth_before, depth_after, project',
  inputSchema: {
    type: 'object',
    properties: {},
    additionalProperties: true
  }
}
```

**HTTP Endpoint:** `GET /api/timeline`

**Parameters:**
- `anchor` - Observation ID to center timeline around (optional if query provided)
- `query` - Search query to find anchor automatically (optional if anchor provided)
- `depth_before` - Number of observations before anchor (default: 3)
- `depth_after` - Number of observations after anchor (default: 3)
- `project` - Filter by project name

**Returns:** Chronological view showing what happened before/during/after

### `get_observations` - Fetch Full Details

**Tool Definition:**
```typescript
{
  name: 'get_observations',
  description: 'Step 3: Fetch full details for filtered IDs. Params: ids (array of observation IDs, required), orderBy, limit, project',
  inputSchema: {
    type: 'object',
    properties: {
      ids: {
        type: 'array',
        items: { type: 'number' },
        description: 'Array of observation IDs to fetch (required)'
      }
    },
    required: ['ids'],
    additionalProperties: true
  }
}
```

**HTTP Endpoint:** `POST /api/observations/batch`

**Body:**
```json
{
  "ids": [123, 456, 789],
  "orderBy": "date_desc",
  "project": "my-app"
}
```

**Returns:** Complete observation details (~500-1,000 tokens per observation)

## MCP Server Implementation

**Location:** `/Users/YOUR_USERNAME/.claude/plugins/marketplaces/thedotmack/plugin/scripts/mcp-server.cjs`

**Role:** Thin wrapper that translates MCP protocol to HTTP API calls

**Key Characteristics:**
- ~312 lines of code (reduced from ~2,718 lines in old implementation)
- No business logic - just protocol translation
- Single source of truth: Worker HTTP API
- Simple schemas with `additionalProperties: true`

**Handler Example:**
```typescript
{
  name: 'search',
  handler: async (args: any) => {
    const endpoint = '/api/search';
    const searchParams = new URLSearchParams();

    for (const [key, value] of Object.entries(args)) {
      searchParams.append(key, String(value));
    }

    const url = `http://localhost:37777${endpoint}?${searchParams}`;
    const response = await fetch(url);
    return await response.json();
  }
}
```

## Worker HTTP API

**Location:** `src/services/worker-service.ts`

**Port:** 37777

**Search Endpoints:**
```typescript
GET  /api/search           # Main search (used by MCP search tool)
GET  /api/timeline         # Timeline context (used by MCP timeline tool)
POST /api/observations/batch  # Fetch by IDs (used by MCP get_observations tool)
GET  /api/health           # Health check
```

**Database Access:**
- Uses `SessionSearch` service for FTS5 queries
- Uses `SessionStore` for structured queries
- Hybrid search with ChromaDB for semantic similarity

**FTS5 Full-Text Search:**
```typescript
// search tool → HTTP GET → FTS5 query
SELECT * FROM observations_fts
WHERE observations_fts MATCH ?
AND type = ?
AND date >= ? AND date <= ?
ORDER BY rank
LIMIT ? OFFSET ?
```

## The 3-Layer Workflow Pattern

### Design Philosophy

The 3-layer workflow embodies **progressive disclosure** - a core principle of claude-mem's architecture.

**Layer 1: Index (Search)**
- **What:** Compact table with IDs, titles, dates, types
- **Cost:** ~50-100 tokens per result
- **Purpose:** Survey what exists before committing tokens
- **Decision Point:** "Which observations are relevant?"

**Layer 2: Context (Timeline)**
- **What:** Chronological view of observations around a point
- **Cost:** Variable based on depth
- **Purpose:** Understand narrative arc, see what led to/from a point
- **Decision Point:** "Do I need full details?"

**Layer 3: Details (Get Observations)**
- **What:** Complete observation data (narrative, facts, files, concepts)
- **Cost:** ~500-1,000 tokens per observation
- **Purpose:** Deep dive on validated, relevant observations
- **Decision Point:** "Apply knowledge to current task"

### Token Efficiency

**Traditional RAG Approach:**
```
Fetch 20 observations upfront: 10,000-20,000 tokens
Relevance: ~10% (only 2 observations actually useful)
Waste: 18,000 tokens on irrelevant context
```

**3-Layer Workflow:**
```
Step 1: search (20 results)        ~1,000-2,000 tokens
Step 2: Review index, filter to 3 relevant IDs
Step 3: get_observations (3 IDs)   ~1,500-3,000 tokens
Total: 2,500-5,000 tokens (50-75% savings)
```

**10x Savings:** By filtering at index level before fetching full details

## Architecture Evolution

### Before: Complex MCP Implementation

**Approach:** 9 MCP tools with detailed parameter schemas

**Token Cost:** ~2,500 tokens in tool definitions per session
- `search_observations` - Full-text search
- `find_by_type` - Filter by type
- `find_by_file` - Filter by file
- `find_by_concept` - Filter by concept
- `get_recent_context` - Recent sessions
- `get_observation` - Fetch single observation
- `get_session` - Fetch session
- `get_prompt` - Fetch prompt
- `help` - API documentation

**Problems:**
- Overlapping operations (search_observations vs find_by_type)
- Complex parameter schemas
- No built-in workflow guidance
- High token cost at session start

**Code Size:** ~2,718 lines in mcp-server.ts

### After: Streamlined MCP Implementation

**Approach:** 4 MCP tools following 3-layer workflow

**Token Cost:** ~312 lines of code, simplified tool definitions

**Tools:**
1. `__IMPORTANT` - Workflow guidance (always visible)
2. `search` - Step 1 (index)
3. `timeline` - Step 2 (context)
4. `get_observations` - Step 3 (details)

**Benefits:**
- Progressive disclosure built into tool design
- No overlapping operations
- Simple schemas (`additionalProperties: true`)
- Clear workflow pattern
- ~10x token savings

**Code Size:** ~312 lines in mcp-server.ts (88% reduction)

### Key Insight

**Before:** Progressive disclosure was something Claude had to remember

**After:** Progressive disclosure is enforced by tool design itself

The 3-layer workflow pattern makes it structurally difficult to waste tokens:
- Can't fetch details without first getting IDs from search
- Can't search without seeing workflow reminder (`__IMPORTANT`)
- Timeline provides middle ground between index and full details

## Configuration

### Claude Desktop

Add to `claude_desktop_config.json`:

```json
{
  "mcpServers": {
    "mcp-search": {
      "command": "node",
      "args": [
        "/Users/YOUR_USERNAME/.claude/plugins/marketplaces/thedotmack/plugin/scripts/mcp-server.cjs"
      ]
    }
  }
}
```

### Claude Code

MCP server is automatically configured via plugin installation. No manual setup required.

**Both clients use the same MCP tools** - the architecture works identically for Claude Desktop and Claude Code.

## Security

### FTS5 Injection Prevention

All search queries are escaped before FTS5 processing:

```typescript
function escapeFTS5Query(query: string): string {
  return query.replace(/"/g, '""');
}
```

**Testing:** 332 injection attack tests covering special characters, SQL keywords, quote escaping, and boolean operators.

### MCP Protocol Security

- Stdio transport (no network exposure)
- Local-only HTTP API (localhost:37777)
- No authentication needed (local development only)

## Performance

**FTS5 Full-Text Search:** Sub-10ms for typical queries

**MCP Overhead:** Minimal - simple protocol translation

**Caching:** HTTP layer allows response caching (future enhancement)

**Pagination:** Efficient with offset/limit

**Batching:** `get_observations` accepts multiple IDs in single call

## Benefits Over Alternative Approaches

### vs. Traditional RAG

**Traditional RAG:**
- Fetches everything upfront
- High token cost
- Low relevance ratio

**3-Layer MCP:**
- Fetches only what's needed
- ~10x token savings
- 100% relevance (Claude chooses what to fetch)

### vs. Previous MCP Implementation (v5.x)

**Previous (9 tools):**
- Complex schemas
- Overlapping operations
- No workflow guidance
- ~2,500 tokens in definitions

**Current (4 tools):**
- Simple schemas
- Clear workflow
- Built-in guidance
- ~312 lines of code

### vs. Skill-Based Approach (Previously)

**Skill approach:**
- Required separate skill files
- HTTP API called directly via curl
- Progressive disclosure through skill loading

**MCP approach:**
- Native MCP protocol (better Claude integration)
- Cleaner architecture (protocol translation layer)
- Works with both Claude Desktop and Claude Code
- Simpler to maintain (no skill files)

**Migration:** Skill-based search was removed in favor of streamlined MCP architecture.

## Troubleshooting

### MCP Server Not Connected

**Symptoms:** Tools not appearing in Claude

**Solution:**
1. Check MCP server path in configuration
2. Verify worker service is running: `curl http://localhost:37777/api/health`
3. Restart Claude Desktop/Code

### Worker Service Not Running

**Symptoms:** MCP tools fail with connection errors

**Solution:**
```bash
npm run worker:status       # Check status
npm run worker:restart      # Restart worker
npm run worker:logs         # View logs
```

### Empty Search Results

**Symptoms:** search() returns no results

**Troubleshooting:**
1. Test API directly: `curl "http://localhost:37777/api/search?query=test"`
2. Check database: `ls ~/.claude-mem/claude-mem.db`
3. Verify observations exist: `curl "http://localhost:37777/api/health"`

## Next Steps

- [Memory Search Usage](/usage/search-tools) - User guide with examples
- [Progressive Disclosure](/progressive-disclosure) - Philosophy behind 3-layer workflow
- [Worker Service Architecture](/architecture/worker-service) - HTTP API details
- [Database Schema](/architecture/database) - FTS5 tables and indexes