chore: Bump version to 6.3.2

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
feat(search-server): enhance decision search with optional semantic query support
2025-11-25 16:38:06 -05:00 · 2025-11-25 16:37:08 -05:00 · 2025-11-25 15:48:37 -05:00
8 changed files with 145 additions and 47 deletions
@@ -10,7 +10,7 @@
  "plugins": [
    {
      "name": "claude-mem",
-      "version": "6.3.1",
+      "version": "6.3.2",
      "source": "./plugin",
      "description": "Persistent memory system for Claude Code - context compression across sessions"
    }
@@ -4,6 +4,12 @@ All notable changes to this project will be documented in this file.

 The format is based on [Keep a Changelog](https://keepachangelog.com/en/1.0.0/).

+## [6.3.1] - 2025-11-25
+
+## What's New
+
+- Add script to help estimate token savings from on-the-fly replacements
+
 ## [6.3.0] - 2025-11-25

 ## What's New
@@ -6,7 +6,7 @@

 Claude-mem is a Claude Code plugin providing persistent memory across sessions. It captures tool usage, compresses observations using the Claude Agent SDK, and injects relevant context into future sessions.

-**Current Version**: 6.3.1
+**Current Version**: 6.3.2

 ## Architecture

@@ -0,0 +1,77 @@
+@everyone
+
+**Endless Mode: Breaking Claude's Context Limits**
+
+## The Problem
+
+Ever hit 67% context usage mid-session and had to restart Claude Code? Context window limits are the #1 killer of long coding sessions. When you're deep in a complex refactor or debugging session, the last thing you want is to lose all that built-up context.
+
+## The Solution: Endless Mode
+
+Endless Mode compresses tool outputs **in real-time** as you work. Instead of storing the full 500-line file you just read, it stores a compact observation like:
+
+> "Read package.json - found 47 dependencies including React 18, TypeScript 5.2, and custom build scripts"
+
+**The result: 70-84% token reduction** on tool outputs, letting you work indefinitely without hitting context limits.
+
+## The Numbers (Real Test Results)
+
+We analyzed **500 transcripts** containing **1,884 tool uses**:
+
+| Metric | Value |
+|--------|-------|
+| Tool uses analyzed | 1,884 |
+| Observations matched | 868 |
+| Eligible for compression | 406 |
+| Compression rate (facts-only) | **84%** |
+| Characters saved | 887,783 of 1,056,285 |
+
+**Which tools benefit most:**
+- **Bash output**: 236 compressible (command outputs -> facts)
+- **Read file contents**: 98 compressible (file contents -> summaries)
+- **Grep results**: 42 compressible (search results -> key matches)
+
+**Key insight**: We only compress tool **outputs**, never inputs. Inputs contain semantic meaning (the actual diff, the query, the code you wrote). Outputs are verbose results that can be summarized without losing meaning.
+
+## The Journey (69 observations over 10 days)
+
+**Nov 16 - The Vision**
+Decided to build Endless Mode as an *optional* feature to avoid mandatory architectural refactoring. The idea: let users opt-in to experimental compression without breaking anything for those who don't.
+
+**Nov 19-20 - Implementation Begins**
+Hit our first bug immediately: duplicate observations appearing on the 2nd prompt of each session. Classic regression - the Endless Mode changes broke something that was already working. Fixed it, kept going.
+
+**Nov 21 - The Big Switch**
+Made a critical architectural change: switched from **deferred** (async, 5-second timeout) to **synchronous** transformation (blocking, 90-second timeout). Endless Mode needs to wait for compression to complete before continuing - otherwise you'd read uncompressed data.
+
+Multiple rounds of experimental release preparation. Documented all dependencies. Critical bugs kept appearing.
+
+**Nov 22 - Validation**
+Endpoints verified. Toggle working. Documentation reviewed. Things looking stable.
+
+**Nov 23 - The Setback**
+**Disabled Endless Mode.** It was causing everything to hang. The 90-second synchronous blocking was too aggressive - when compression took too long, the whole system locked up. Had to prioritize stability.
+
+25 sessions had successfully used it before this point.
+
+**Nov 25 - The Solution**
+Created a **beta branch strategy**: Endless Mode lives on `beta/7.0`, isolated from main. Added a Version Channel UI so users can safely try it without affecting stable users. Easy rollback if issues occur.
+
+Built analysis scripts to measure *actual* compression rates instead of theoretical ones. Validated 84% savings on real transcripts.
+
+## How to Try It
+
+**v6.3.1** added a Version Channel switcher:
+
+1. Open http://localhost:37777
+2. Find **"Version Channel"** in the Settings sidebar
+3. Click **"Try Beta (Endless Mode)"**
+4. Refresh the UI after switching
+
+**Safe to try**: Your memory data lives in `~/.claude-mem/` - completely separate from the plugin code. Switching branches won't touch your data. Easy rollback with the "Switch to Stable" button.
+
+**Current beta branch**: `beta/7.0`
+
+---
+
+This has been a real engineering journey - vision, implementation, bugs, setbacks, and creative solutions. The beta branch approach lets us keep iterating on stability while giving adventurous users access to the feature.
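
Reviewer note: the 84% figure in the announcement's results table can be checked directly from its "Characters saved" row. The snippet below is a minimal sketch of that arithmetic only. The `CompressionStats` type and `compressionRate` helper are hypothetical names introduced for illustration and are not part of claude-mem or its analysis scripts. Note also that the table counts characters, which is only a proxy for tokens.

```typescript
// Hypothetical helper for illustration; not taken from the claude-mem codebase.
interface CompressionStats {
  originalChars: number; // characters in the eligible tool outputs before compression
  savedChars: number;    // characters removed by replacing outputs with observations
}

// Fraction of the original characters that compression removed.
function compressionRate(stats: CompressionStats): number {
  if (stats.originalChars <= 0) {
    throw new RangeError('originalChars must be positive');
  }
  return stats.savedChars / stats.originalChars;
}

// Figures copied from the "Characters saved" row: 887,783 of 1,056,285.
const reported: CompressionStats = { originalChars: 1056285, savedChars: 887783 };

const rate = compressionRate(reported);
const remainingChars = reported.originalChars - reported.savedChars;

console.log(`${(rate * 100).toFixed(1)}% saved, ${remainingChars} characters remain`);
```

Under these numbers the rate rounds to 84%, which agrees with the "Compression rate (facts-only)" row, and 168,502 characters remain after compression.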
@@ -1,6 +1,6 @@
 {
  "name": "claude-mem",
-  "version": "6.3.1",
+  "version": "6.3.2",
  "description": "Memory compression system for Claude Code - persist context across sessions",
  "keywords": [
    "claude",
@@ -1,6 +1,6 @@
 {
  "name": "claude-mem",
-  "version": "6.3.1",
+  "version": "6.3.2",
  "description": "Persistent memory system for Claude Code - seamlessly preserve context across sessions",
  "author": {
    "name": "Alex Newman"
@@ -930,8 +930,9 @@ const tools = [
  },
  {
    name: 'decisions',
-    description: 'Semantic shortcut to find decision-type observations. Returns observations where important architectural, technical, or process decisions were made. Equivalent to find_by_type with type="decision".',
+    description: 'Semantic shortcut to find decision-type observations. Returns observations where important architectural, technical, or process decisions were made. Supports optional semantic search query to filter decisions by relevance.',
    inputSchema: z.object({
+      query: z.string().optional().describe('Search query to filter decisions semantically'),
      format: z.enum(['index', 'full']).default('index').describe('Output format: "index" for titles/dates only (default), "full" for complete details'),
      project: z.string().optional().describe('Filter by project name'),
      dateRange: z.object({
@@ -944,33 +945,47 @@ const tools = [
    }),
    handler: async (args: any) => {
      try {
-        const { format = 'index', ...filters } = args;
+        const { query, format = 'index', ...filters } = args;
        let results: ObservationSearchResult[] = [];

        // Search for decision-type observations
        if (chromaClient) {
          try {
-            console.error('[search-server] Using metadata-first + semantic ranking for decisions');
-            const metadataResults = search.findByType('decision', filters);
+            if (query) {
+              // Semantic search filtered to decision type
+              console.error('[search-server] Using Chroma semantic search with type=decision filter');
+              const chromaResults = await queryChroma(query, Math.min((filters.limit || 20) * 2, 100), { type: 'decision' });
+              const obsIds = chromaResults.ids;

-            if (metadataResults.length > 0) {
-              const ids = metadataResults.map(obs => obs.id);
-              const chromaResults = await queryChroma('decision', Math.min(ids.length, 100));
-
-              const rankedIds: number[] = [];
-              for (const chromaId of chromaResults.ids) {
-                if (ids.includes(chromaId) && !rankedIds.includes(chromaId)) {
-                  rankedIds.push(chromaId);
-                }
+              if (obsIds.length > 0) {
+                results = store.getObservationsByIds(obsIds, { ...filters, type: 'decision' });
+                // Preserve Chroma ranking order
+                results.sort((a, b) => obsIds.indexOf(a.id) - obsIds.indexOf(b.id));
              }
+            } else {
+              // No query: get all decisions, rank by "decision" keyword
+              console.error('[search-server] Using metadata-first + semantic ranking for decisions');
+              const metadataResults = search.findByType('decision', filters);

-              if (rankedIds.length > 0) {
-                results = store.getObservationsByIds(rankedIds, { limit: filters.limit || 20 });
-                results.sort((a, b) => rankedIds.indexOf(a.id) - rankedIds.indexOf(b.id));
+              if (metadataResults.length > 0) {
+                const ids = metadataResults.map(obs => obs.id);
+                const chromaResults = await queryChroma('decision', Math.min(ids.length, 100));
+
+                const rankedIds: number[] = [];
+                for (const chromaId of chromaResults.ids) {
+                  if (ids.includes(chromaId) && !rankedIds.includes(chromaId)) {
+                    rankedIds.push(chromaId);
+                  }
+                }
+
+                if (rankedIds.length > 0) {
+                  results = store.getObservationsByIds(rankedIds, { limit: filters.limit || 20 });
+                  results.sort((a, b) => rankedIds.indexOf(a.id) - rankedIds.indexOf(b.id));
+                }
              }
            }
          } catch (chromaError: any) {
-            console.error('[search-server] Chroma ranking failed, using SQLite order:', chromaError.message);
+            console.error('[search-server] Chroma search failed, using SQLite fallback:', chromaError.message);
          }
        }