claude-mem/docs/context/post-about-endless.md at 0a70bcecc5b8bebf31168fa4099b2c21ca166c2b


Alex Newman 4e5913611a feat(search-server): enhance decision search with optional semantic query support

- Updated the 'decisions' tool to accept an optional 'query' parameter for semantic filtering.
- Implemented logic to handle semantic search using Chroma when a query is provided.
- Preserved ranking order of results based on Chroma's output.
- Added fallback to metadata-first search when no query is present.

2025-11-25 16:37:08 -05:00

3.8 KiB


@everyone

Endless Mode: Breaking Claude's Context Limits

The Problem

Ever hit 67% context usage mid-session and had to restart Claude Code? Context window limits are the #1 killer of long coding sessions. When you're deep in a complex refactor or debugging session, the last thing you want is to lose all that built-up context.

The Solution: Endless Mode

Endless Mode compresses tool outputs in real time as you work. Instead of storing the full 500-line file you just read, it stores a compact observation like:

"Read package.json - found 47 dependencies including React 18, TypeScript 5.2, and custom build scripts"

The result: a 70-84% reduction in the size of tool outputs (84% by character count in our tests), so long sessions can keep going without hitting context limits.

The Numbers (Real Test Results)

We analyzed 500 transcripts containing 1,884 tool uses:

Metric	Value
Tool uses analyzed	1,884
Observations matched	868
Eligible for compression	406
Compression rate (facts-only)	84%
Characters saved	887,783 of 1,056,285

Which tools benefit most:

Bash output: 236 compressible (command outputs -> facts)
Read file contents: 98 compressible (file contents -> summaries)
Grep results: 42 compressible (search results -> key matches)
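
As a quick sanity check, the headline rate and the per-tool shares can be recomputed from the figures above. This is only arithmetic on the published numbers; the variable names are ours, and the "other tools" count is simply whatever the three listed tools leave unaccounted for.

```python
# Figures copied from the table and the per-tool list above.
chars_total = 1_056_285
chars_saved = 887_783
eligible = 406
by_tool = {"Bash": 236, "Read": 98, "Grep": 42}

compression_rate = chars_saved / chars_total
chars_remaining = chars_total - chars_saved
other = eligible - sum(by_tool.values())  # eligible outputs from tools not listed

print(f"compression rate: {compression_rate:.1%}")    # 84.0%
print(f"characters remaining: {chars_remaining:,}")   # 168,502
for tool, count in by_tool.items():
    print(f"{tool}: {count / eligible:.1%} of eligible outputs")
print(f"other tools: {other} outputs")                # 30
```

Note that the rate is measured in characters, so the equivalent token savings may differ somewhat.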

Key insight: We only compress tool outputs, never inputs. Inputs contain semantic meaning (the actual diff, the query, the code you wrote). Outputs are verbose results that can be summarized without losing meaning.
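
The outputs-only rule can be sketched in a few lines. This is a minimal illustration, not the claude-mem implementation: `ToolUse`, `toy_summarizer`, and `compress_tool_use` are hypothetical names, the summarizer is a trivial stand-in for whatever produces the real observations, and the "only swap when the summary is shorter" check is our own assumption.

```python
from dataclasses import dataclass, replace
from typing import Callable


@dataclass(frozen=True)
class ToolUse:
    tool: str
    tool_input: str   # kept verbatim: the command, query, or diff
    tool_output: str  # verbose result, candidate for compression


def toy_summarizer(use: ToolUse) -> str:
    """Stand-in summarizer: report the size and the first line of the output."""
    lines = use.tool_output.splitlines()
    first = lines[0] if lines else ""
    return f"{use.tool} output: {len(lines)} lines, starting with {first!r}"


def compress_tool_use(
    use: ToolUse, summarize: Callable[[ToolUse], str] = toy_summarizer
) -> ToolUse:
    """Return a copy with the output replaced by an observation; input untouched."""
    observation = summarize(use)
    # Assumption: keep the original when the summary would not save anything.
    if len(observation) >= len(use.tool_output):
        return use
    return replace(use, tool_output=observation)


read = ToolUse("Read", "package.json", "\n".join(f"line {i}" for i in range(500)))
compressed = compress_tool_use(read)
print(compressed.tool_input)   # the input is unchanged
print(compressed.tool_output)  # a one-line observation instead of 500 lines
```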

The Journey (69 observations over 10 days)

Nov 16 - The Vision

Decided to build Endless Mode as an optional feature, so nobody is forced into an architectural refactor. The idea: let users opt in to experimental compression without breaking anything for those who don't.

Nov 19-20 - Implementation Begins

Hit our first bug immediately: duplicate observations appearing on the second prompt of each session. Classic regression - the Endless Mode changes broke something that was already working. Fixed it, kept going.

Nov 21 - The Big Switch

Made a critical architectural change: switched from deferred transformation (async, 5-second timeout) to synchronous transformation (blocking, 90-second timeout). Endless Mode needs to wait for compression to complete before continuing - otherwise the uncompressed output gets read instead.

Multiple rounds of experimental release preparation. Documented all dependencies. Critical bugs kept appearing.
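
The difference between the two modes in the Nov 21 switch can be sketched with `asyncio`. This is an illustration under stated assumptions, not the project's code: the function names are hypothetical, `compress` is a stand-in that just sleeps, and falling back to the raw output on timeout is our guess at a reasonable behavior. Only the 5-second and 90-second values come from the post.

```python
import asyncio

DEFERRED_TIMEOUT_S = 5    # from the post (not exercised in this demo)
BLOCKING_TIMEOUT_S = 90   # from the post


async def compress(output: str, delay: float) -> str:
    """Stand-in for real compression: sleep, then return a short summary."""
    await asyncio.sleep(delay)
    return f"summary of {len(output)} chars"


async def deferred_transform(output: str, delay: float):
    """Deferred: hand back the raw output now; the summary arrives later."""
    task = asyncio.create_task(compress(output, delay))
    return output, task


async def blocking_transform(
    output: str, delay: float, timeout: float = BLOCKING_TIMEOUT_S
) -> str:
    """Blocking: wait for the summary. The caller stalls for up to `timeout`."""
    try:
        return await asyncio.wait_for(compress(output, delay), timeout=timeout)
    except asyncio.TimeoutError:
        return output  # assumption: fall back to the uncompressed output


async def demo() -> None:
    raw = "x" * 1000
    print(await blocking_transform(raw, delay=0.01, timeout=1.0))  # summary
    late = await blocking_transform(raw, delay=0.5, timeout=0.05)
    print(late == raw)  # timed out, raw output returned


asyncio.run(demo())
```

The blocking version guarantees that only the summary is read, at the cost of stalling the caller for as long as compression takes, up to the timeout.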

Nov 22 - Validation

Endpoints verified. Toggle working. Documentation reviewed. Things looking stable.

Nov 23 - The Setback

Disabled Endless Mode. It was causing everything to hang. The 90-second synchronous blocking was too aggressive - when compression took too long, the whole system locked up. Had to prioritize stability.

25 sessions had successfully used it before this point.

Nov 25 - The Solution

Created a beta branch strategy: Endless Mode lives on beta/7.0, isolated from main. Added Version Channel UI so users can safely try it without affecting stable users. Easy rollback if issues occur.

Built analysis scripts to measure actual compression rates instead of theoretical ones. Validated 84% savings on real transcripts.

How to Try It

v6.3.1 added a Version Channel switcher:

1. Open http://localhost:37777
2. Find "Version Channel" in the Settings sidebar
3. Click "Try Beta (Endless Mode)"
4. Refresh the UI after switching

Safe to try: Your memory data lives in ~/.claude-mem/ - completely separate from the plugin code. Switching branches won't touch your data. Easy rollback with the "Switch to Stable" button.

Current beta branch: beta/7.0

This has been a real engineering journey - vision, implementation, bugs, setbacks, and creative solutions. The beta branch approach lets us keep iterating on stability while giving adventurous users access to the feature.
