feat(search-server): enhance decision search with optional semantic query support
- Updated the 'decisions' tool to accept an optional 'query' parameter for semantic filtering.
- Implemented logic to handle semantic search using Chroma when a query is provided.
- Preserved ranking order of results based on Chroma's output.
- Added fallback to metadata-first search when no query is present.
@@ -0,0 +1,77 @@
@everyone

**Endless Mode: Breaking Claude's Context Limits**

## The Problem

Ever hit 67% context usage mid-session and had to restart Claude Code? Context window limits are the #1 killer of long coding sessions. When you're deep in a complex refactor or debugging session, the last thing you want is to lose all that built-up context.

## The Solution: Endless Mode

Endless Mode compresses tool outputs **in real time** as you work. Instead of storing the full 500-line file you just read, it stores a compact observation like:

> "Read package.json - found 47 dependencies including React 18, TypeScript 5.2, and custom build scripts"

**The result: 70-84% token reduction** on tool outputs, letting you work indefinitely without hitting context limits.

## The Numbers (Real Test Results)

We analyzed **500 transcripts** containing **1,884 tool uses**:

| Metric | Value |
|--------|-------|
| Tool uses analyzed | 1,884 |
| Observations matched | 868 |
| Eligible for compression | 406 |
| Compression rate (facts-only) | **84%** |
| Characters saved | 887,783 of 1,056,285 |
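
As a quick sanity check on the table, the 84% figure can be recomputed from the two character counts. This is only a sketch of the arithmetic; the function name `compressionRate` is made up for illustration and is not part of the analysis scripts.

```typescript
// Figures copied from the table above.
const totalChars = 1056285;
const savedChars = 887783;

/** Percentage of characters removed by compression. */
function compressionRate(saved: number, total: number): number {
  if (total <= 0) throw new RangeError("total must be positive");
  return (saved / total) * 100;
}

const rate = compressionRate(savedChars, totalChars);
const remainingChars = totalChars - savedChars;

console.log(rate.toFixed(1)); // "84.0"
console.log(remainingChars);  // 168502 characters left after compression
```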

**Which tools benefit most:**
- **Bash output**: 236 compressible (command outputs -> facts)
- **Read file contents**: 98 compressible (file contents -> summaries)
- **Grep results**: 42 compressible (search results -> key matches)

**Key insight**: We only compress tool **outputs**, never inputs. Inputs contain semantic meaning (the actual diff, the query, the code you wrote). Outputs are verbose results that can be summarized without losing meaning.
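
To make the outputs-only rule concrete, here is a minimal sketch of what that could look like. Everything in it is an assumption for illustration: the `ToolUse` shape, the `Summarize` callback (standing in for whatever actually produces the observation), and the `minLength` threshold are not taken from the plugin.

```typescript
// Hypothetical record shape; the real transcript format may differ.
interface ToolUse {
  tool: string;
  input: string;  // kept verbatim: the command, query, or diff
  output: string; // verbose result, candidate for compression
}

// Stand-in for whatever turns a raw output into a compact observation.
type Summarize = (tool: string, output: string) => string;

function compressToolUse(use: ToolUse, summarize: Summarize, minLength = 200): ToolUse {
  // Short outputs are not worth compressing (threshold is an assumption).
  if (use.output.length < minLength) return use;
  const observation = summarize(use.tool, use.output);
  // Keep the original if the "summary" is not actually smaller.
  if (observation.length >= use.output.length) return use;
  return { ...use, output: observation }; // input is never touched
}

const readUse: ToolUse = { tool: "Read", input: "package.json", output: "x".repeat(5000) };
const stub: Summarize = (tool) => `${tool}: found 47 dependencies`;
const compressed = compressToolUse(readUse, stub);

console.log(compressed.input);  // "package.json"
console.log(compressed.output); // "Read: found 47 dependencies"
```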

## The Journey (69 observations over 10 days)

**Nov 16 - The Vision**
Decided to build Endless Mode as an *optional* feature to avoid mandatory architectural refactoring. The idea: let users opt in to experimental compression without breaking anything for those who don't.

**Nov 19-20 - Implementation Begins**
Hit our first bug immediately: duplicate observations appearing on the 2nd prompt of each session. Classic regression - the Endless Mode changes broke something that was already working. Fixed it, kept going.

**Nov 21 - The Big Switch**
Made a critical architectural change: switched from **deferred** (async, 5-second timeout) to **synchronous** transformation (blocking, 90-second timeout). Endless Mode needs to wait for compression to complete before continuing - otherwise you'd read uncompressed data.

Multiple rounds of experimental release preparation. Documented all dependencies. Critical bugs kept appearing.
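
For readers wondering what "blocking with a timeout" means in practice, here is a rough sketch. It is not the plugin's code: `withTimeout`, `transformBlocking`, and the `compress` callback are hypothetical names, and falling back to the raw output when the timer fires is an assumption made for the example.

```typescript
/** Resolve with the task's value, or reject once `ms` milliseconds elapse. */
async function withTimeout<T>(task: Promise<T>, ms: number): Promise<T> {
  let timer: ReturnType<typeof setTimeout> | undefined;
  const timeout = new Promise<never>((_resolve, reject) => {
    timer = setTimeout(() => reject(new Error(`timed out after ${ms} ms`)), ms);
  });
  try {
    return await Promise.race([task, timeout]);
  } finally {
    clearTimeout(timer); // avoid leaving a pending timer behind
  }
}

/**
 * Blocking transformation: the caller waits for the compressed text.
 * On timeout or error this sketch returns the raw output (an assumption).
 */
async function transformBlocking(
  rawOutput: string,
  compress: (raw: string) => Promise<string>,
  timeoutMs: number,
): Promise<string> {
  try {
    return await withTimeout(compress(rawOutput), timeoutMs);
  } catch {
    return rawOutput;
  }
}
```

With `timeoutMs` at 90,000 the caller can sit idle for up to a minute and a half per tool call, which is the trade-off behind the switch: no uncompressed reads, but everything waits on the slowest compression.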

**Nov 22 - Validation**
Endpoints verified. Toggle working. Documentation reviewed. Things looking stable.

**Nov 23 - The Setback**
**Disabled Endless Mode.** It was causing everything to hang. The 90-second synchronous blocking was too aggressive - when compression took too long, the whole system locked up. Had to prioritize stability.

25 sessions had successfully used it before this point.

**Nov 25 - The Solution**
Created a **beta branch strategy**: Endless Mode lives on `beta/7.0`, isolated from main. Added a Version Channel UI so users can safely try it without affecting stable users. Easy rollback if issues occur.

Built analysis scripts to measure *actual* compression rates instead of theoretical ones. Validated 84% savings on real transcripts.

## How to Try It

**v6.3.1** added a Version Channel switcher:

1. Open http://localhost:37777
2. Find **"Version Channel"** in the Settings sidebar
3. Click **"Try Beta (Endless Mode)"**
4. Refresh the UI after switching

**Safe to try**: Your memory data lives in `~/.claude-mem/` - completely separate from the plugin code. Switching branches won't touch your data. Easy rollback with the "Switch to Stable" button.

**Current beta branch**: `beta/7.0`

---

This has been a real engineering journey - vision, implementation, bugs, setbacks, and creative solutions. The beta branch approach lets us keep iterating on stability while giving adventurous users access to the feature.
File diff suppressed because one or more lines are too long
@@ -930,8 +930,9 @@ const tools = [
   },
   {
     name: 'decisions',
-    description: 'Semantic shortcut to find decision-type observations. Returns observations where important architectural, technical, or process decisions were made. Equivalent to find_by_type with type="decision".',
+    description: 'Semantic shortcut to find decision-type observations. Returns observations where important architectural, technical, or process decisions were made. Supports optional semantic search query to filter decisions by relevance.',
     inputSchema: z.object({
+      query: z.string().optional().describe('Search query to filter decisions semantically'),
       format: z.enum(['index', 'full']).default('index').describe('Output format: "index" for titles/dates only (default), "full" for complete details'),
       project: z.string().optional().describe('Filter by project name'),
       dateRange: z.object({
@@ -944,33 +945,47 @@ const tools = [
     }),
     handler: async (args: any) => {
       try {
-        const { format = 'index', ...filters } = args;
+        const { query, format = 'index', ...filters } = args;
         let results: ObservationSearchResult[] = [];

         // Search for decision-type observations
         if (chromaClient) {
           try {
-            console.error('[search-server] Using metadata-first + semantic ranking for decisions');
-            const metadataResults = search.findByType('decision', filters);
+            if (query) {
+              // Semantic search filtered to decision type
+              console.error('[search-server] Using Chroma semantic search with type=decision filter');
+              const chromaResults = await queryChroma(query, Math.min((filters.limit || 20) * 2, 100), { type: 'decision' });
+              const obsIds = chromaResults.ids;

-            if (metadataResults.length > 0) {
-              const ids = metadataResults.map(obs => obs.id);
-              const chromaResults = await queryChroma('decision', Math.min(ids.length, 100));
-
-              const rankedIds: number[] = [];
-              for (const chromaId of chromaResults.ids) {
-                if (ids.includes(chromaId) && !rankedIds.includes(chromaId)) {
-                  rankedIds.push(chromaId);
-                }
+              if (obsIds.length > 0) {
+                results = store.getObservationsByIds(obsIds, { ...filters, type: 'decision' });
+                // Preserve Chroma ranking order
+                results.sort((a, b) => obsIds.indexOf(a.id) - obsIds.indexOf(b.id));
               }
+            } else {
+              // No query: get all decisions, rank by "decision" keyword
+              console.error('[search-server] Using metadata-first + semantic ranking for decisions');
+              const metadataResults = search.findByType('decision', filters);

-              if (rankedIds.length > 0) {
-                results = store.getObservationsByIds(rankedIds, { limit: filters.limit || 20 });
-                results.sort((a, b) => rankedIds.indexOf(a.id) - rankedIds.indexOf(b.id));
+              if (metadataResults.length > 0) {
+                const ids = metadataResults.map(obs => obs.id);
+                const chromaResults = await queryChroma('decision', Math.min(ids.length, 100));
+
+                const rankedIds: number[] = [];
+                for (const chromaId of chromaResults.ids) {
+                  if (ids.includes(chromaId) && !rankedIds.includes(chromaId)) {
+                    rankedIds.push(chromaId);
+                  }
+                }
+
+                if (rankedIds.length > 0) {
+                  results = store.getObservationsByIds(rankedIds, { limit: filters.limit || 20 });
+                  results.sort((a, b) => rankedIds.indexOf(a.id) - rankedIds.indexOf(b.id));
+                }
               }
             }
           } catch (chromaError: any) {
-            console.error('[search-server] Chroma ranking failed, using SQLite order:', chromaError.message);
+            console.error('[search-server] Chroma search failed, using SQLite fallback:', chromaError.message);
           }
         }
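
Both branches of the handler restore Chroma's ranking by sorting with `indexOf`, which rescans the ID list on every comparison. A possible alternative, sketched here under assumptions, is to precompute each ID's position once. The helper `sortByRank` and the `Ranked` interface are hypothetical and not part of this commit; note also that this version places IDs missing from the ranking last, whereas `indexOf` returns -1 and would sort them first.

```typescript
// Minimal shape needed for ranking; real observations carry more fields.
interface Ranked {
  id: number;
}

/** Return a copy of `rows` ordered by each row's position in `rankedIds`. */
function sortByRank<T extends Ranked>(rows: T[], rankedIds: number[]): T[] {
  const position = new Map<number, number>();
  rankedIds.forEach((id, index) => {
    // First occurrence wins, matching what indexOf would report.
    if (!position.has(id)) position.set(id, index);
  });
  const rankOf = (id: number): number => position.get(id) ?? Number.MAX_SAFE_INTEGER;
  return [...rows].sort((a, b) => rankOf(a.id) - rankOf(b.id));
}

const rows = [{ id: 3 }, { id: 1 }, { id: 2 }];
const ordered = sortByRank(rows, [2, 3, 1]);
console.log(ordered.map((r) => r.id)); // [2, 3, 1]
```

With the over-fetch capped at 100 IDs the difference is unlikely to matter for speed; the main benefit would be making the handling of unranked IDs explicit.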