airkjw/claude-mem

Alex Newman 9285826547 feat: implement Endless Mode for real-time context compression in Claude sessions

2025-11-16 23:19:43 -05:00


Endless Mode: Real-Time Context Compression Plan

Executive Summary

"Endless Mode" is an optional feature that lets Claude sessions run indefinitely by transparently compressing tool use transcripts in real time. An in-memory transformation layer in the worker service replaces heavy tool outputs with lightweight observations during session resume, without modifying the immutable source transcripts. Sessions can then continue for weeks or months without hitting context window limits, while the full conversation history stays on disk and the source data is never at risk of corruption.

Problem Statement

Current Behavior

Claude sessions accumulate full tool transcripts in the context window:

File reads: 5k-10k tokens per read
Bash outputs: 1k-5k tokens per command
Search results: 2k-8k tokens per search
Total context limit: ~200k tokens

When the context window fills, users must start a new session, losing conversational continuity.

What Happens Today

Tool executes during session
PostToolUse hook captures tool data
Worker creates compressed observation (~200-500 tokens)
But: Full tool transcript stays in Claude's context window
Observation only helps next session via SessionStart injection

The Gap

Observations exist and are created in real time, but they are not used to compress the current session's context. We have the compressed data; we just don't apply it to the active session.

Proposed Solution: Endless Mode

Core Concept

When a session resumes (either after restart or during continuation), transform messages in memory by replacing heavy tool use content with lightweight observations before feeding them to the Agent SDK. The source transcript remains immutable on disk.

Architecture Principle

Immutable Storage + Ephemeral Transform = Safe Compression

Disk (never modified)     Memory (transform)          Agent SDK
──────────────────────    ──────────────────────      ────────────────
transcript.jsonl          Load messages               Resume session
  tool_use_abc      →     Look up observation   →     with compressed
  tool_use_def            Replace content             context
  tool_use_xyz            Feed to SDK

Key Properties

Immutable: Original transcripts never modified
Non-destructive: Full history preserved on disk
No duplication: No forks, no copies
Transparent: User sees same conversation, compression is under the hood
Optional: Feature flag allows users to opt-in/out
Reversible: Can always read original transcript

How It Works

Session Resume Flow (Endless Mode Enabled)

1. User continues session / Claude Code restarts
   ↓
2. Worker service intercepts resume request
   ↓
3. Load transcript JSONL from disk (immutable)
   ↓
4. Transform Loop:
   For each message in transcript:
     - If tool_use message:
       - Query SQLite: SELECT observation WHERE tool_use_id = ?
       - Replace tool content with observation (facts, narrative, concepts)
     - If other message type:
       - Pass through unchanged
   ↓
5. Feed transformed messages to Agent SDK
   ↓
6. Agent SDK resumes session with compressed context
   ↓
7. New tool uses append to original transcript (normal flow)
   ↓
8. Next resume: Loop repeats, new tool uses also get compressed
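
The transform loop in step 4 can be sketched roughly as follows. This is a minimal illustration, not the worker's actual code: the `TranscriptMessage` and `Observation` shapes are assumptions (the real JSONL schema is still an open question), and an in-memory `Map` stands in for the SQLite lookup.

```typescript
// Hypothetical shapes: the real transcript JSONL schema is not confirmed.
interface TranscriptMessage {
  type: string;        // e.g. "tool_use", "user", "assistant"
  toolUseId?: string;  // assumed to be present only on tool use messages
  content: string;
}

interface Observation {
  facts: string[];
  narrative: string;
  concepts: string[];
}

// Render an observation as the lightweight text that replaces tool content.
function renderObservation(obs: Observation): string {
  return [
    `Narrative: ${obs.narrative}`,
    `Facts: ${obs.facts.join("; ")}`,
    `Concepts: ${obs.concepts.join(", ")}`,
  ].join("\n");
}

// Returns a new array; the input messages are never mutated.
function transformMessages(
  messages: readonly TranscriptMessage[],
  observations: ReadonlyMap<string, Observation>, // stand-in for the SQLite lookup
): TranscriptMessage[] {
  return messages.map((message) => {
    if (message.type !== "tool_use" || message.toolUseId === undefined) {
      return message; // other message types pass through unchanged
    }
    const obs = observations.get(message.toolUseId);
    if (obs === undefined) {
      return message; // fallback: no observation yet, keep original content
    }
    return { ...message, content: renderObservation(obs) };
  });
}
```

Because the function only builds new objects, the loaded transcript data stays untouched, which mirrors the immutability principle described above.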

Session Resume Flow (Endless Mode Disabled)

1. User continues session
   ↓
2. Load transcript JSONL from disk
   ↓
3. Feed messages directly to Agent SDK (no transformation)
   ↓
4. Session resumes with full tool transcripts (current behavior)

Implementation Plan

Phase 1: Foundation (Week 1)

Goal: Set up infrastructure for transformation layer

Tasks:

Add tool_use_id column to observations table (SQLite schema migration)
Update PostToolUse hook to capture and store tool_use_id
Create TransformLayer class in worker service
Add CLAUDE_MEM_ENDLESS_MODE environment variable (default: false)
Write tests for observation lookup by tool_use_id

Deliverable: Database schema updated, tool_use_ids being captured

Phase 2: Transform Logic (Week 2)

Goal: Build message transformation engine

Tasks:

Implement TransformLayer.transformMessages(messages) function
Tool use detection logic (identify tool_use messages in transcript)
Observation lookup and replacement logic
Fallback handling (if observation missing, keep original content)
Message serialization/deserialization

Deliverable: Working transform function that compresses messages in memory

Phase 3: Agent SDK Integration (Week 2-3)

Goal: Wire transform layer into session resume flow

Tasks:

Identify where worker service resumes Agent SDK sessions
Inject transform layer before session resume
Add feature flag check (only transform if endless mode enabled)
Logging and instrumentation (track compression ratios, transform time)
Error handling and graceful degradation

Deliverable: Worker service can resume sessions with compressed context

Phase 4: Testing & Validation (Week 3-4)

Goal: Verify endless mode works correctly

Tasks:

Create test session with 50+ tool uses
Enable endless mode and resume session
Verify context window usage (should be dramatically lower)
Test conversation quality (does Claude have enough context?)
Measure performance (transform latency, lookup speed)
Edge case testing (missing observations, malformed transcripts)

Deliverable: Endless mode working in test environment

Phase 5: Beta Release (Week 4+)

Goal: Release to power users for feedback

Tasks:

Documentation (how to enable, what to expect, how to disable)
Add endless mode toggle to viewer UI
Monitoring and observability (track usage, failures, compression stats)
Collect feedback from beta users
Iterate based on real-world usage

Deliverable: Endless mode available as opt-in beta feature

Technical Requirements

Database Schema

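-- Add to observations table
-- SQLite cannot add a UNIQUE column via ALTER TABLE ADD COLUMN,
-- so uniqueness is enforced by the index instead
ALTER TABLE observations ADD COLUMN tool_use_id TEXT;
CREATE UNIQUE INDEX idx_observations_tool_use_id ON observations(tool_use_id);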
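
One way to keep this migration safe to re-run is to plan the statements from the columns that already exist. The sketch below is illustrative only: `planToolUseIdMigration` is a hypothetical helper, and it assumes the caller reads the current column names (for example from `PRAGMA table_info(observations)`) and executes the returned SQL itself. It adds the column without a constraint and relies on a unique index, since SQLite rejects `ADD COLUMN` with a `UNIQUE` constraint.

```typescript
// Hypothetical helper: returns the SQL statements still needed, given the
// column names that already exist on the observations table.
function planToolUseIdMigration(existingColumns: readonly string[]): string[] {
  const statements: string[] = [];
  if (!existingColumns.includes("tool_use_id")) {
    // Existing rows get NULL here; SQLite's unique indexes allow multiple NULLs.
    statements.push("ALTER TABLE observations ADD COLUMN tool_use_id TEXT");
  }
  // IF NOT EXISTS makes the index creation a no-op on repeat runs.
  statements.push(
    "CREATE UNIQUE INDEX IF NOT EXISTS idx_observations_tool_use_id " +
      "ON observations(tool_use_id)",
  );
  return statements;
}
```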

Worker Service API

interface TransformLayerConfig {
  enabled: boolean; // CLAUDE_MEM_ENDLESS_MODE
  fallbackToOriginal: boolean; // If observation missing, use full content
  maxLookupTime: number; // Timeout for SQLite queries, in milliseconds
}

class TransformLayer {
  constructor(config: TransformLayerConfig, db: SessionStore);

  // Main transform function
  async transformMessages(messages: Message[]): Promise<Message[]>;

  // Helper functions
  private async lookupObservation(toolUseId: string): Promise<Observation | null>;
  private replaceToolContent(message: Message, observation: Observation): Message;
  private isToolUseMessage(message: Message): boolean;
}
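
The `maxLookupTime` setting implies a timeout around each observation lookup. A minimal sketch of that idea is shown below; `withTimeout` is a hypothetical helper, not part of the planned API. It resolves to `null` when the lookup is too slow, which the caller could treat the same way as a missing observation. It only helps if the lookup is genuinely asynchronous, and it does not cancel the underlying query.

```typescript
// Hypothetical helper: resolves with the lookup result, or with null if the
// lookup has not settled within timeoutMs. Rejections are passed through.
function withTimeout<T>(lookup: Promise<T>, timeoutMs: number): Promise<T | null> {
  return new Promise<T | null>((resolve, reject) => {
    const timer = setTimeout(() => resolve(null), timeoutMs);
    lookup.then(
      (value) => {
        clearTimeout(timer);
        resolve(value);
      },
      (error) => {
        clearTimeout(timer);
        reject(error);
      },
    );
  });
}
```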

Agent SDK Integration Point

// In worker service session resume logic
async function resumeSession(sessionId: string, transcriptPath: string) {
  const messages = await loadTranscript(transcriptPath);

  // Transform layer (only if endless mode enabled)
  const transformedMessages = config.endlessMode
    ? await transformLayer.transformMessages(messages)
    : messages;

  // Resume with transformed (or original) messages
  return await agentSDK.resumeSession({
    sessionId,
    messages: transformedMessages
  });
}
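
The plan also calls for graceful degradation at this integration point. One possible shape, sketched under the assumption that any transform failure should fall back to the original messages, is a small wrapper around the transform call. `transformOrPassThrough` and its `onFallback` callback are hypothetical names used only for illustration.

```typescript
// Hypothetical wrapper: try the transform, and on any error return the
// untouched input so the session can still resume with full transcripts.
async function transformOrPassThrough<T>(
  messages: T[],
  transform: (messages: T[]) => Promise<T[]>,
  onFallback: (error: unknown) => void = () => {},
): Promise<T[]> {
  try {
    return await transform(messages);
  } catch (error) {
    onFallback(error); // e.g. log the failure so worker issues stay visible
    return messages;
  }
}
```

Inside `resumeSession`, the transform call could be routed through a wrapper like this so that a bug in the transform layer degrades to today's uncompressed behavior rather than blocking the resume.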

Risks and Mitigations

Risk 1: Information Loss

Risk: Compressed observations may lose critical details that Claude needs to reference later.

Mitigation:

Make endless mode optional (users can disable if quality degrades)
Improve observation quality (better prompts, more comprehensive facts)
Hybrid approach: Keep recent N tool uses in full, compress older ones
Monitor conversation quality metrics
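
The hybrid approach (keep the recent N tool uses in full, compress older ones) could be sketched as a selection step that runs before the transform. The `ToolMessage` shape and the `selectCompressibleIds` name are assumptions for illustration; the real transcript schema may differ.

```typescript
// Hypothetical minimal shape; only the fields this sketch needs.
interface ToolMessage {
  type: string;
  toolUseId?: string;
}

// Returns the IDs of tool uses that are eligible for compression, leaving
// the most recent `keepRecent` tool uses in full.
function selectCompressibleIds(
  messages: readonly ToolMessage[],
  keepRecent: number,
): Set<string> {
  const ids = messages
    .filter((m) => m.type === "tool_use" && m.toolUseId !== undefined)
    .map((m) => m.toolUseId as string);
  // Clamp so that negative or oversized keepRecent values behave sensibly.
  const cutoff = Math.max(0, ids.length - Math.max(0, keepRecent));
  return new Set(ids.slice(0, cutoff));
}
```

With `keepRecent` set to 0 every tool use is compressible, which matches the "compress all" default described for `CLAUDE_MEM_TRANSFORM_KEEP_RECENT`.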

Risk 2: Transform Performance

Risk: Looking up observations for 100+ tool uses during resume could be slow.

Mitigation:

Index tool_use_id in SQLite (O(log n) lookups)
Batch queries (single SELECT with IN clause)
Measure and optimize (target <100ms for typical session)
Cache observations in memory during session
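
The batching mitigation can be illustrated with a helper that builds parameterized `IN` queries. This is a sketch under assumptions: the `observation` column name follows the lookup shown in the resume flow, `buildBatchQueries` is a hypothetical name, and the chunk size of 500 is an arbitrary choice that stays below SQLite's bound-parameter limit (historically 999 by default in older builds).

```typescript
interface BatchQuery {
  sql: string;
  params: string[];
}

// Builds one parameterized SELECT per chunk of distinct IDs, so a session
// with many tool uses needs a handful of queries instead of one per tool use.
function buildBatchQueries(
  toolUseIds: readonly string[],
  chunkSize = 500, // assumption: comfortably under SQLite's parameter limit
): BatchQuery[] {
  if (chunkSize < 1) throw new RangeError("chunkSize must be at least 1");
  const unique = Array.from(new Set(toolUseIds));
  const queries: BatchQuery[] = [];
  for (let i = 0; i < unique.length; i += chunkSize) {
    const params = unique.slice(i, i + chunkSize);
    const placeholders = params.map(() => "?").join(", ");
    queries.push({
      sql: `SELECT tool_use_id, observation FROM observations WHERE tool_use_id IN (${placeholders})`,
      params,
    });
  }
  return queries;
}
```

Selecting `tool_use_id` alongside the observation lets the caller rebuild an ID-to-observation map from the rows, which can then double as the in-memory cache mentioned above.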

Risk 3: Missing Observations

Risk: Tool use executed but observation not yet created (async worker lag).

Mitigation:

Fallback to original content if observation missing
Log when fallback occurs (helps identify worker performance issues)
Allow observations to be created retroactively
Consider synchronous observation creation for critical tools

Risk 4: Transcript Corruption

Risk: Bug in transform layer could corrupt user conversations.

Mitigation:

Never modify source transcripts (read-only)
Transform happens in memory only
Extensive testing before beta release
Feature flag allows instant disable if issues found
Keep full audit trail in logs

Risk 5: Agent SDK Compatibility

Risk: Agent SDK updates could break transform layer integration.

Mitigation:

Document exact Agent SDK version requirements
Monitor Agent SDK release notes
Test against new SDK versions before upgrading
Graceful degradation if SDK changes detected

Success Criteria

Proof of Concept Success

Transform layer successfully compresses a 50-tool-use session
Context window usage reduced by 80%+ compared to uncompressed
Session resumes without errors
Conversation quality remains high (subjective evaluation)

Beta Release Success

10+ users running endless mode without issues
Average context savings: 85%+ across all sessions
Transform latency: <200ms for typical resume
Zero transcript corruption incidents
Positive user feedback on conversation continuity

Production Success

Endless mode becomes default setting
Sessions running for weeks/months without context issues
Context window exhaustion becomes rare edge case
User-reported "session too long" issues drop to near zero
Transform layer performance scales to 1000+ tool use sessions

Configuration

Environment Variables

# Enable endless mode (default: false)
CLAUDE_MEM_ENDLESS_MODE=true

# Fallback behavior if observation missing (default: true)
CLAUDE_MEM_TRANSFORM_FALLBACK=true

# Max time to wait for observation lookup (default: 500ms)
CLAUDE_MEM_TRANSFORM_TIMEOUT=500

# Keep recent N tool uses uncompressed (default: 0, compress all)
CLAUDE_MEM_TRANSFORM_KEEP_RECENT=0
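
A minimal sketch of how these variables might be read into a typed config follows. The parsing rules are assumptions rather than documented behavior: only the literal value `true` (case-insensitive) enables a boolean, and invalid or negative numbers fall back to the defaults listed above. `loadTransformConfig` and `TransformEnvConfig` are hypothetical names.

```typescript
interface TransformEnvConfig {
  enabled: boolean;
  fallbackToOriginal: boolean;
  timeoutMs: number;
  keepRecent: number;
}

type Env = Record<string, string | undefined>;

// Assumption: only "true" (any casing) counts as true; unset uses the default.
function parseBool(value: string | undefined, fallback: boolean): boolean {
  if (value === undefined) return fallback;
  return value.trim().toLowerCase() === "true";
}

// Invalid or negative values fall back to the default instead of throwing.
function parseNonNegativeInt(value: string | undefined, fallback: number): number {
  if (value === undefined) return fallback;
  const parsed = Number.parseInt(value, 10);
  return Number.isInteger(parsed) && parsed >= 0 ? parsed : fallback;
}

function loadTransformConfig(env: Env): TransformEnvConfig {
  return {
    enabled: parseBool(env["CLAUDE_MEM_ENDLESS_MODE"], false),
    fallbackToOriginal: parseBool(env["CLAUDE_MEM_TRANSFORM_FALLBACK"], true),
    timeoutMs: parseNonNegativeInt(env["CLAUDE_MEM_TRANSFORM_TIMEOUT"], 500),
    keepRecent: parseNonNegativeInt(env["CLAUDE_MEM_TRANSFORM_KEEP_RECENT"], 0),
  };
}
```

In a Node-based worker this would typically be called as `loadTransformConfig(process.env)`; taking the environment as a parameter keeps the function easy to test.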

User Controls

// Future: UI toggle in viewer
interface EndlessModeSettings {
  enabled: boolean;
  keepRecentToolUses: number; // Hybrid mode
  fallbackToOriginal: boolean;
}

Context Economics: Before vs. After

Example Session (50 tool uses)

Before (Endless Mode OFF):

File reads:    10 × 8,000 tokens  = 80,000 tokens
Bash outputs:  20 × 2,000 tokens  = 40,000 tokens
Searches:      15 × 4,000 tokens  = 60,000 tokens
Other tools:    5 × 1,000 tokens  =  5,000 tokens
──────────────────────────────────────────────────
Total:                              185,000 tokens
Context remaining:                   15,000 tokens (92.5% full)

After (Endless Mode ON):

File reads:    10 ×   300 tokens  =  3,000 tokens
Bash outputs:  20 ×   250 tokens  =  5,000 tokens
Searches:      15 ×   400 tokens  =  6,000 tokens
Other tools:    5 ×   200 tokens  =  1,000 tokens
──────────────────────────────────────────────────
Total:                               15,000 tokens
Context remaining:                  185,000 tokens (7.5% full)

Savings: 170,000 tokens (92% reduction)

Session Longevity:

Before: ~50 tool uses before context full
After: ~600+ tool uses before context full
Result: roughly 12x longer sessions
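
The longevity figures follow from the example totals. The short calculation below reproduces them under a simplifying assumption: only tool use tokens count toward the ~200k limit, and each tool use costs the session's average. `toolUsesUntilFull` is a hypothetical helper.

```typescript
const CONTEXT_LIMIT = 200_000; // approximate context limit from the problem statement

// Number of tool uses that fit in the context at the session's average cost.
function toolUsesUntilFull(
  totalTokens: number,
  toolUses: number,
  limit: number = CONTEXT_LIMIT,
): number {
  const averagePerToolUse = totalTokens / toolUses;
  return Math.floor(limit / averagePerToolUse);
}

const before = toolUsesUntilFull(185_000, 50); // 3,700 tokens each -> 54 tool uses
const after = toolUsesUntilFull(15_000, 50);   //   300 tokens each -> 666 tool uses
const ratio = after / before;                  // a little over 12

console.log({ before, after, ratio });
```

Real sessions also spend tokens on user and assistant messages, so the achievable ratio is likely somewhat lower than this tool-only estimate.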

Next Steps

Immediate Actions (This Week)

Database Migration: Add tool_use_id column to observations table
Hook Update: Modify PostToolUse hook to capture tool_use_id from Agent SDK
Architecture Validation: Confirm where Agent SDK session resume happens in worker service
Prototype: Build minimal TransformLayer class with observation lookup

Short Term (Next 2 Weeks)

Implement complete transform logic
Wire into worker service resume flow
Add endless mode feature flag
Test with real sessions

Medium Term (Next Month)

Beta release to power users
Gather feedback and iterate
Performance optimization
Documentation and user guides

Long Term (Future)

Make endless mode default
Hybrid sliding window (keep recent tools uncompressed)
Selective compression by tool type
Auto-tune compression based on context usage patterns

Open Questions

Tool Use ID Format: What does the Agent SDK's tool_use_id look like? Is it a UUID, a hash, or a sequential ID?
Transcript Format: What's the exact JSONL schema for tool_use messages? Where is the content we'll replace?
Resume Hook Point: Where exactly in the worker service does session resume happen? Is there a clear integration point?
Observation Delay: How long between PostToolUse firing and observation being available in SQLite? Does this affect resume?
Feature Flag Storage: Environment variable, or persist user preference in database?

Conclusion

Endless Mode transforms claude-mem from a "memory between sessions" system into a "continuous compression engine" that enables truly infinite sessions. By leveraging the observations we're already creating in real-time and applying them as an ephemeral transformation layer during resume, we can extend session longevity by 10-12x without any risk to user data.

The key architectural insight is immutability: by never modifying source transcripts and performing all compression in memory, we get the benefits of context window optimization without the risks of data corruption or loss. Combined with the optional nature of the feature, this provides a safe, reversible path to fundamentally better session continuity.

This is the natural evolution of claude-mem: from remembering what happened before, to making it possible to never stop.
