docs: Align search documentation with hybrid ChromaDB architecture (#116)

* feat: Add discovery_tokens for ROI tracking in observations and session summaries

- Introduced `discovery_tokens` column in `observations` and `session_summaries` tables to track token costs associated with discovering and creating each observation and summary.
- Updated relevant services and hooks to calculate and display ROI metrics based on discovery tokens.
- Enhanced context economics reporting to include savings from reusing previous observations.
- Implemented migration to ensure the new column is added to existing tables.
- Adjusted data models and sync processes to accommodate the new `discovery_tokens` field.

* refactor: streamline context hook by removing unused functions and updating terminology

- Removed the estimateTokens and getObservations helper functions as they were not utilized.
- Updated the legend and output messages to replace "discovery" with "work" for clarity.
- Changed the emoji representation for different observation types to better reflect their purpose.
- Enhanced output formatting for improved readability and understanding of token usage.

* Refactor user-message-hook and context-hook for improved clarity and functionality

- Updated user-message-hook.js to enhance error messaging and improve variable naming for clarity.
- Modified context-hook.ts to include a new column key section, improved context index instructions, and added emoji icons for observation types.
- Adjusted footer messages in context-hook.ts to emphasize token savings and access to past research.
- Changed user-message-hook.ts to update the feedback and support message for clarity.

* fix: Critical ROI tracking fixes from PR review

Addresses critical findings from PR #111 review:

1. **Fixed incorrect discovery token calculation** (src/services/worker/SDKAgent.ts)
   - Changed from passing cumulative total to per-response delta
   - Now correctly tracks token cost for each observation/summary
   - Captures token state before/after response processing
   - Prevents all observations getting inflated cumulative values

2. **Fixed schema version mismatch** (src/services/sqlite/SessionStore.ts)
   - Changed ensureDiscoveryTokensColumn() from version 11 to version 7
   - Now matches migration007 definition in migrations.ts
   - Ensures consistent version tracking across migration system

These fixes ensure ROI metrics accurately reflect token costs.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>

* fix: Update search documentation to reflect hybrid ChromaDB architecture

The backend correctly implements ChromaDB-first semantic search with SQLite
temporal ordering and FTS5 fallback, but documentation incorrectly described
it as "FTS5 full-text search". This fix aligns all skill guides and tool
descriptions with the actual implementation.

Changes:
- Update SKILL.md to describe hybrid architecture with ChromaDB primary
- Update observations.md title and query parameter descriptions
- Update all three search tool descriptions in search-server.ts:
  * search_observations
  * search_sessions
  * search_user_prompts

All tools now correctly document:
- ChromaDB semantic search (primary ranking)
- 90-day recency filter
- SQLite temporal ordering
- FTS5 fallback (when ChromaDB unavailable)

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>

* fix: Add discovery_tokens column to observations and session_summaries tables

---------

Co-authored-by: Claude <noreply@anthropic.com>
This commit is contained in:
Alex Newman
2025-11-16 13:36:17 -05:00
committed by GitHub
parent 3cbc041c8b
commit c0778bef00
10 changed files with 541 additions and 64 deletions
@@ -148,16 +148,19 @@ When Claude invokes the skill:
## Search Architecture
### Hybrid Search System
### 3-Layer Hybrid Search System
claude-mem uses a **hybrid search architecture** combining:
claude-mem uses a **3-layer sequential search architecture** that mimics human long-term memory:
1. **SQLite FTS5 (Full-Text Search)** - Keyword-based search
2. **ChromaDB (Vector Search)** - Semantic similarity search
**Storage Flow (Write Path):**
1. **SQLite First** - Data written synchronously to SQLite (fast, immediate access)
2. **ChromaDB Background Sync** - Worker asynchronously generates embeddings and syncs to ChromaDB
**Search Flow (Read Path - Sequential, NOT parallel):**
```
┌─────────────────────────────────────────────────────────────┐
Search Request Flow
3-Layer Sequential Search Flow
└─────────────────────────────────────────────────────────────┘
@@ -166,61 +169,70 @@ claude-mem uses a **hybrid search architecture** combining:
│ /api/search/* │
└─────────────────────────┘
┌─────────────┴─────────────┐
▼ ▼
┌──────────────────────────┐ ┌──────────────────────────┐
│ SessionSearch (FTS5) │ │ ChromaSync (Vector DB) │
│ │ │ │
│ Full-text keyword │ │ Semantic similarity │
│ search on: │ │ search on: │
│ - titles │ │ - narratives │
│ - narratives │ │ - facts │
│ - facts │ │ - file content │
│ - concepts │ │ │
│ │ │ Embeddings: │
│ SQLite DB: │ │ - text-embedding-3-small│
│ observations_fts │ │ - 90-day recency filter │
│ sessions_fts │ │ │
│ prompts_fts │ │ ChromaDB: │
│ │ │ observations collection │
└──────────────────────────┘ └──────────────────────────┘
│ │
└─────────────┬─────────────┘
─────────────────────────┐
│ Merged Results
│ - Deduplicated
│ - Sorted by relevance
│ - Formatted (index/full)
└─────────────────────────┘
┌─────────────────────────────────────────────────────────────┐
LAYER 1: Semantic Retrieval (ChromaDB)
│ ─────────────────────────────────────────────────────────
│ Vector similarity search finds semantically relevant items
Returns: observation IDs in index format (~50-100 tokens)
│ Filter: 90-day recency prioritizes recent work │
│ Output: List of relevant observation IDs │
└─────────────────────────────────────────────────────────────┘
┌─────────────────────────────────────────────────────────────┐
│ LAYER 2: Temporal Ordering (SQLite) │
│ ───────────────────────────────────────────────────────── │
│ Takes observation IDs from Layer 1 │
│ Sorts by created_at timestamp (fast SQLite temporal query) │
│ Identifies: MOST RECENT relevant observation │
│ Why: ChromaDB doesn't easily query by date range sorted │
│ Output: Top observation ID by time │
└─────────────────────────────────────────────────────────────┘
┌─────────────────────────────────────────────────────────────┐
│ LAYER 3: Instant Context Timeline (SQLite) │
│ ───────────────────────────────────────────────────────── │
│ Uses top observation ID from Layer 2 as anchor │
│ Retrieves N observations BEFORE and AFTER that point │
│ Provides: "what led here" + "what happened next" context │
│ This is the KILLER FEATURE: mimics human memory │
│ Output: Timeline with temporal context │
└─────────────────────────────────────────────────────────────┘
```
**Why This Architecture Exists:**
The problem: LLMs don't experience time linearly like humans do. Finding semantically relevant information isn't enough—you need temporal context.
The solution:
- **ChromaDB** for "what's relevant" (semantic understanding)
- **SQLite** for "when did it happen" (temporal ordering with fast date-range queries)
- **Timeline** for "what was the context" (before/after observations)
Together, they mimic how humans recall: "I did X, which led to Y, then Z happened."
**Human Memory Analogy:**
Humans don't just remember isolated facts. They remember sequences: what they did before something, what happened after. The instant context timeline gives LLMs this same temporal awareness that humans experience naturally.
### Search Types
#### 1. Full-Text Search (FTS5)
#### 1. Vector Search (ChromaDB) - PRIMARY Search Layer
**How it works:**
- Uses SQLite FTS5 virtual tables for instant keyword matching
- Supports boolean operators: `AND`, `OR`, `NOT`, `NEAR`, `*` (wildcard)
- Ranks results by BM25 relevance scoring
- Sub-100ms performance on 8,000+ observations
**Example query:**
```sql
-- User asks: "How did we implement JWT authentication?"
SELECT * FROM observations_fts
WHERE observations_fts MATCH 'JWT AND authentication'
ORDER BY rank
LIMIT 20;
```
#### 2. Vector Search (ChromaDB)
**Role:** Layer 1 - Semantic Retrieval
**How it works:**
- Text is embedded using OpenAI's `text-embedding-3-small` model
- Vector similarity search finds semantically related content
- Vector similarity search finds semantically related content, not just keyword matches
- 90-day recency filter prioritizes recent work
- Combined with keyword search for hybrid results
- Returns observation IDs for temporal processing in Layer 2
**Why it's primary:**
- Understands meaning, not just keywords ("auth flow" matches "JWT implementation")
- Finds relevant work even when you don't know exact terms used
- Semantic understanding crucial for LLM memory retrieval
**Example query:**
```python
@@ -230,6 +242,37 @@ collection.query(
n_results=20,
where={"created_at": {"$gte": ninety_days_ago}}
)
# Returns: observation IDs semantically related to login/auth
```
#### 2. Full-Text Search (FTS5) - Supporting Layer
**Role:** Layer 2 & 3 - Temporal Ordering and Timeline Context
**How it works:**
- Uses SQLite FTS5 virtual tables for instant keyword matching
- Supports boolean operators: `AND`, `OR`, `NOT`, `NEAR`, `*` (wildcard)
- Fast temporal queries with date-range sorting
- Sub-100ms performance on 8,000+ observations
**Why it's supporting:**
- ChromaDB handles semantic "what's relevant"
- SQLite/FTS5 handles temporal "when did it happen" and "what came before/after"
- Optimized for timeline queries and date-based sorting
**Example query:**
```sql
-- Takes observation IDs from ChromaDB, sorts by time
SELECT * FROM observations
WHERE id IN (/* IDs from ChromaDB */)
ORDER BY created_at_epoch DESC
LIMIT 1;
-- Then retrieves timeline context around that observation
SELECT * FROM observations
WHERE created_at_epoch < anchor_timestamp
ORDER BY created_at_epoch DESC
LIMIT 10; -- "what led here"
```
#### 3. Structured Filters