feat(search-server): enhance decision search with optional semantic query support

- Updated the 'decisions' tool to accept an optional 'query' parameter for semantic filtering.
- Implemented logic to handle semantic search using Chroma when a query is provided.
- Preserved ranking order of results based on Chroma's output.
- Added fallback to metadata-first search when no query is present.
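
The "preserve ranking order" step can be illustrated in isolation. This is a minimal sketch, not the code in this commit: `sortByRank` and the `Observation` shape are hypothetical names, and it assumes the vector search returns a list of numeric observation IDs in relevance order. It uses a `Map` for position lookups where the commit's handler uses `indexOf`.

```typescript
// Hypothetical shape for illustration; the real ObservationSearchResult has more fields.
interface Observation {
  id: number;
  title: string;
}

// Reorder fetched rows so they follow the ranked ID list from the vector search.
// A Map gives constant-time position lookups instead of repeated indexOf scans.
function sortByRank<T extends { id: number }>(rows: T[], rankedIds: number[]): T[] {
  const position = new Map<number, number>();
  rankedIds.forEach((id, index) => {
    if (!position.has(id)) position.set(id, index); // keep the first occurrence of each ID
  });
  // Rows whose ID is missing from the ranking sort to the end.
  const rankOf = (id: number): number => position.get(id) ?? Number.MAX_SAFE_INTEGER;
  return [...rows].sort((a, b) => rankOf(a.id) - rankOf(b.id));
}

// The store returns rows in its own order; the ranking comes from the semantic search.
const exampleRows: Observation[] = [
  { id: 7, title: 'Use SQLite for storage' },
  { id: 19, title: 'Adopt a beta branch' },
  { id: 42, title: 'Switch to synchronous transformation' },
];
const exampleOrdered = sortByRank(exampleRows, [42, 7, 19]);
console.log(exampleOrdered.map((o) => o.id)); // [ 42, 7, 19 ]
```

Copying the array before sorting leaves the caller's rows untouched, which is a design choice of this sketch rather than something the commit requires.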
Alex Newman
2025-11-25 16:37:08 -05:00
parent 73982dc709
commit 4e5913611a
3 changed files with 135 additions and 43 deletions
+77
@@ -0,0 +1,77 @@
@everyone
**Endless Mode: Breaking Claude's Context Limits**
## The Problem
Ever hit 67% context usage mid-session and had to restart Claude Code? Context window limits are the #1 killer of long coding sessions. When you're deep in a complex refactor or debugging session, the last thing you want is to lose all that built-up context.
## The Solution: Endless Mode
Endless Mode compresses tool outputs **in real-time** as you work. Instead of storing the full 500-line file you just read, it stores a compact observation like:
> "Read package.json - found 47 dependencies including React 18, TypeScript 5.2, and custom build scripts"
**The result: 70-84% token reduction** on tool outputs, letting you work indefinitely without hitting context limits.
## The Numbers (Real Test Results)
We analyzed **500 transcripts** containing **1,884 tool uses**:
| Metric | Value |
|--------|-------|
| Tool uses analyzed | 1,884 |
| Observations matched | 868 |
| Eligible for compression | 406 |
| Compression rate (facts-only) | **84%** |
| Characters saved | 887,783 of 1,056,285 |
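
The 84% figure follows directly from the character counts in the table. A small sketch of the arithmetic (the function name `compressionStats` is illustrative, not part of the analysis scripts):

```typescript
// Derive the compression rate and the remaining size from the table's character counts.
function compressionStats(savedChars: number, totalChars: number) {
  return {
    rate: savedChars / totalChars,           // fraction of characters removed
    remainingChars: totalChars - savedChars, // characters still stored after compression
  };
}

const stats = compressionStats(887783, 1056285);
console.log(`${Math.round(stats.rate * 100)}% saved, ${stats.remainingChars} characters remain`);
// prints "84% saved, 168502 characters remain"
```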
**Which tools benefit most:**
- **Bash output**: 236 compressible (command outputs -> facts)
- **Read file contents**: 98 compressible (file contents -> summaries)
- **Grep results**: 42 compressible (search results -> key matches)
**Key insight**: We only compress tool **outputs**, never inputs. Inputs contain semantic meaning (the actual diff, the query, the code you wrote). Outputs are verbose results that can be summarized without losing meaning.
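
A rough sketch of the outputs-only rule, under stated assumptions: the `ToolUse` shape, `compressToolUse`, and the `summarize` callback are hypothetical stand-ins, and the "only swap when the summary is shorter" guard is an assumption of this example rather than documented behavior.

```typescript
// Hypothetical record of one tool call; field names are assumptions for illustration.
interface ToolUse {
  tool: string;
  input: string;  // never modified: carries the diff, query, or code that was written
  output: string; // verbose result that may be replaced by a compact observation
}

// Replace only the output with a summary; the input passes through unchanged.
function compressToolUse(use: ToolUse, summarize: (use: ToolUse) => string): ToolUse {
  const observation = summarize(use);
  // Assumption: keep the original output if the summary would not actually save space.
  const output = observation.length < use.output.length ? observation : use.output;
  return { ...use, output };
}

const readCall: ToolUse = {
  tool: 'Read',
  input: 'package.json',
  output: '{ "dependencies": { ... } }'.padEnd(2000, ' '), // stand-in for a long file body
};
const compressed = compressToolUse(readCall, () => 'Read package.json - found 47 dependencies');
console.log(compressed.input, '|', compressed.output);
```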
## The Journey (69 observations over 10 days)
**Nov 16 - The Vision**
Decided to build Endless Mode as an *optional* feature to avoid mandatory architectural refactoring. The idea: let users opt in to experimental compression without breaking anything for those who don't.
**Nov 19-20 - Implementation Begins**
Hit our first bug immediately: duplicate observations appearing on the 2nd prompt of each session. Classic regression - the endless mode changes broke something that was already working. Fixed it, kept going.
**Nov 21 - The Big Switch**
Made a critical architectural change: switched from **deferred** (async, 5-second timeout) to **synchronous** transformation (blocking, 90-second timeout). Endless Mode needs to wait for compression to complete before continuing - otherwise you'd read uncompressed data.
Multiple rounds of experimental release preparation. Documented all dependencies. Critical bugs kept appearing.
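
The synchronous approach amounts to awaiting compression with an upper bound on the wait. The sketch below is illustrative only: `withTimeout` and `transformSynchronously` are hypothetical names, and falling back to the raw output when the timer fires is an assumption of this example, since the actual timeout behavior is not described here.

```typescript
// Wait for `work`, but resolve with `fallback` if it has not settled within `timeoutMs`.
function withTimeout<T>(work: Promise<T>, timeoutMs: number, fallback: T): Promise<T> {
  return new Promise<T>((resolve, reject) => {
    const timer = setTimeout(() => resolve(fallback), timeoutMs);
    work.then(
      (value) => { clearTimeout(timer); resolve(value); },
      (error) => { clearTimeout(timer); reject(error); }
    );
  });
}

// Blocking transformation: the caller does not continue until compression finishes
// or the timeout elapses (90 seconds by default, matching the figure above).
// Assumption: on timeout, the uncompressed output is used as-is.
async function transformSynchronously(
  raw: string,
  compress: (raw: string) => Promise<string>,
  timeoutMs: number = 90000
): Promise<string> {
  return withTimeout(compress(raw), timeoutMs, raw);
}
```

With a deferred design the caller would continue immediately and could read the uncompressed output; here the caller awaits the result, so a slow `compress` call stalls it for up to `timeoutMs`.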
**Nov 22 - Validation**
Endpoints verified. Toggle working. Documentation reviewed. Things looking stable.
**Nov 23 - The Setback**
**Disabled endless mode.** It was causing everything to hang. The 90-second synchronous blocking was too aggressive - when compression took too long, the whole system locked up. Had to prioritize stability.
25 sessions had successfully used it before this point.
**Nov 25 - The Solution**
Created a **beta branch strategy**: Endless Mode lives on `beta/7.0`, isolated from main. Added Version Channel UI so users can safely try it without affecting stable users. Easy rollback if issues occur.
Built analysis scripts to measure *actual* compression rates instead of theoretical ones. Validated 84% savings on real transcripts.
## How to Try It
**v6.3.1** added a Version Channel switcher:
1. Open http://localhost:37777
2. Find **"Version Channel"** in the Settings sidebar
3. Click **"Try Beta (Endless Mode)"**
4. Refresh the UI after switching
**Safe to try**: Your memory data lives in `~/.claude-mem/`, completely separate from the plugin code, so switching branches won't touch it. To roll back, click the "Switch to Stable" button.
**Current beta branch**: `beta/7.0`
---
This has been a real engineering journey - vision, implementation, bugs, setbacks, and creative solutions. The beta branch approach lets us keep iterating on stability while giving adventurous users access to the feature.
File diff suppressed because one or more lines are too long
+32 -17
@@ -930,8 +930,9 @@ const tools = [
},
{
name: 'decisions',
description: 'Semantic shortcut to find decision-type observations. Returns observations where important architectural, technical, or process decisions were made. Equivalent to find_by_type with type="decision".',
description: 'Semantic shortcut to find decision-type observations. Returns observations where important architectural, technical, or process decisions were made. Supports optional semantic search query to filter decisions by relevance.',
inputSchema: z.object({
query: z.string().optional().describe('Search query to filter decisions semantically'),
format: z.enum(['index', 'full']).default('index').describe('Output format: "index" for titles/dates only (default), "full" for complete details'),
project: z.string().optional().describe('Filter by project name'),
dateRange: z.object({
@@ -944,33 +945,47 @@ const tools = [
}),
handler: async (args: any) => {
try {
const { format = 'index', ...filters } = args;
const { query, format = 'index', ...filters } = args;
let results: ObservationSearchResult[] = [];
// Search for decision-type observations
if (chromaClient) {
try {
console.error('[search-server] Using metadata-first + semantic ranking for decisions');
const metadataResults = search.findByType('decision', filters);
if (query) {
// Semantic search filtered to decision type
console.error('[search-server] Using Chroma semantic search with type=decision filter');
const chromaResults = await queryChroma(query, Math.min((filters.limit || 20) * 2, 100), { type: 'decision' });
const obsIds = chromaResults.ids;
if (metadataResults.length > 0) {
const ids = metadataResults.map(obs => obs.id);
const chromaResults = await queryChroma('decision', Math.min(ids.length, 100));
const rankedIds: number[] = [];
for (const chromaId of chromaResults.ids) {
if (ids.includes(chromaId) && !rankedIds.includes(chromaId)) {
rankedIds.push(chromaId);
}
if (obsIds.length > 0) {
results = store.getObservationsByIds(obsIds, { ...filters, type: 'decision' });
// Preserve Chroma ranking order
results.sort((a, b) => obsIds.indexOf(a.id) - obsIds.indexOf(b.id));
}
} else {
// No query: get all decisions, rank by "decision" keyword
console.error('[search-server] Using metadata-first + semantic ranking for decisions');
const metadataResults = search.findByType('decision', filters);
if (rankedIds.length > 0) {
results = store.getObservationsByIds(rankedIds, { limit: filters.limit || 20 });
results.sort((a, b) => rankedIds.indexOf(a.id) - rankedIds.indexOf(b.id));
if (metadataResults.length > 0) {
const ids = metadataResults.map(obs => obs.id);
const chromaResults = await queryChroma('decision', Math.min(ids.length, 100));
const rankedIds: number[] = [];
for (const chromaId of chromaResults.ids) {
if (ids.includes(chromaId) && !rankedIds.includes(chromaId)) {
rankedIds.push(chromaId);
}
}
if (rankedIds.length > 0) {
results = store.getObservationsByIds(rankedIds, { limit: filters.limit || 20 });
results.sort((a, b) => rankedIds.indexOf(a.id) - rankedIds.indexOf(b.id));
}
}
}
} catch (chromaError: any) {
console.error('[search-server] Chroma ranking failed, using SQLite order:', chromaError.message);
console.error('[search-server] Chroma search failed, using SQLite fallback:', chromaError.message);
}
}