14 KiB
Endless Mode: Real-Time Context Compression Plan
Executive Summary
"Endless Mode" is an optional feature that enables Claude sessions to run indefinitely by transparently compressing tool use transcripts in real-time. Using an in-memory transformation layer in the worker service, heavy tool outputs are dynamically replaced with lightweight observations during session resume—without modifying the immutable source transcripts. This allows sessions to continue for weeks or months without hitting context window limits, while preserving full conversation history and maintaining zero risk of data corruption.
Problem Statement
Current Behavior
Claude sessions accumulate full tool transcripts in the context window:
- File reads: 5k-10k tokens per read
- Bash outputs: 1k-5k tokens per command
- Search results: 2k-8k tokens per search
- Total context limit: ~200k tokens
When the context window fills, users must start a new session, losing conversational continuity.
What Happens Today
- Tool executes during session
- PostToolUse hook captures tool data
- Worker creates compressed observation (~200-500 tokens)
- But: Full tool transcript stays in Claude's context window
- Observation only helps next session via SessionStart injection
The Gap
Observations exist and are created in real-time, but they're not used to compress the current session's context. We have the compressed data, we just don't apply it to the active session.
Proposed Solution: Endless Mode
Core Concept
When a session resumes (either after restart or during continuation), transform messages in memory by replacing heavy tool use content with lightweight observations before feeding them to the Agent SDK. The source transcript remains immutable on disk.
Architecture Principle
Immutable Storage + Ephemeral Transform = Safe Compression
Disk (never modified) Memory (transform) Agent SDK
────────────────────── ────────────────────── ────────────────
transcript.jsonl Load messages Resume session
tool_use_abc → Look up observation → with compressed
tool_use_def Replace content context
tool_use_xyz Feed to SDK
Key Properties
- Immutable: Original transcripts never modified
- Non-destructive: Full history preserved on disk
- No duplication: No forks, no copies
- Transparent: User sees same conversation, compression is under the hood
- Optional: Feature flag allows users to opt-in/out
- Reversible: Can always read original transcript
How It Works
Session Resume Flow (Endless Mode Enabled)
1. User continues session / Claude Code restarts
↓
2. Worker service intercepts resume request
↓
3. Load transcript JSONL from disk (immutable)
↓
4. Transform Loop:
For each message in transcript:
- If tool_use message:
- Query SQLite: SELECT observation WHERE tool_use_id = ?
- Replace tool content with observation (facts, narrative, concepts)
- If other message type:
- Pass through unchanged
↓
5. Feed transformed messages to Agent SDK
↓
6. Agent SDK resumes session with compressed context
↓
7. New tool uses append to original transcript (normal flow)
↓
8. Next resume: Loop repeats, new tool uses also get compressed
Session Resume Flow (Endless Mode Disabled)
1. User continues session
↓
2. Load transcript JSONL from disk
↓
3. Feed messages directly to Agent SDK (no transformation)
↓
4. Session resumes with full tool transcripts (current behavior)
Implementation Plan
Phase 1: Foundation (Week 1)
Goal: Set up infrastructure for transformation layer
Tasks:
- Add
tool_use_idcolumn to observations table (SQLite schema migration) - Update PostToolUse hook to capture and store tool_use_id
- Create
TransformLayerclass in worker service - Add
CLAUDE_MEM_ENDLESS_MODEenvironment variable (default: false) - Write tests for observation lookup by tool_use_id
Deliverable: Database schema updated, tool_use_ids being captured
Phase 2: Transform Logic (Week 2)
Goal: Build message transformation engine
Tasks:
- Implement
TransformLayer.transformMessages(messages)function - Tool use detection logic (identify tool_use messages in transcript)
- Observation lookup and replacement logic
- Fallback handling (if observation missing, keep original content)
- Message serialization/deserialization
Deliverable: Working transform function that compresses messages in memory
Phase 3: Agent SDK Integration (Week 2-3)
Goal: Wire transform layer into session resume flow
Tasks:
- Identify where worker service resumes Agent SDK sessions
- Inject transform layer before session resume
- Add feature flag check (only transform if endless mode enabled)
- Logging and instrumentation (track compression ratios, transform time)
- Error handling and graceful degradation
Deliverable: Worker service can resume sessions with compressed context
Phase 4: Testing & Validation (Week 3-4)
Goal: Verify endless mode works correctly
Tasks:
- Create test session with 50+ tool uses
- Enable endless mode and resume session
- Verify context window usage (should be dramatically lower)
- Test conversation quality (does Claude have enough context?)
- Measure performance (transform latency, lookup speed)
- Edge case testing (missing observations, malformed transcripts)
Deliverable: Endless mode working in test environment
Phase 5: Beta Release (Week 4+)
Goal: Release to power users for feedback
Tasks:
- Documentation (how to enable, what to expect, how to disable)
- Add endless mode toggle to viewer UI
- Monitoring and observability (track usage, failures, compression stats)
- Collect feedback from beta users
- Iterate based on real-world usage
Deliverable: Endless mode available as opt-in beta feature
Technical Requirements
Database Schema
-- Add to observations table
ALTER TABLE observations ADD COLUMN tool_use_id TEXT UNIQUE;
CREATE INDEX idx_observations_tool_use_id ON observations(tool_use_id);
Worker Service API
interface TransformLayerConfig {
enabled: boolean; // CLAUDE_MEM_ENDLESS_MODE
fallbackToOriginal: boolean; // If observation missing, use full content
maxLookupTime: number; // Timeout for SQLite queries
}
class TransformLayer {
constructor(config: TransformLayerConfig, db: SessionStore);
// Main transform function
async transformMessages(messages: Message[]): Promise<Message[]>;
// Helper functions
private async lookupObservation(toolUseId: string): Promise<Observation | null>;
private replaceToolContent(message: Message, observation: Observation): Message;
private isToolUseMessage(message: Message): boolean;
}
Agent SDK Integration Point
// In worker service session resume logic
async function resumeSession(sessionId: string, transcriptPath: string) {
const messages = await loadTranscript(transcriptPath);
// Transform layer (only if endless mode enabled)
const transformedMessages = config.endlessMode
? await transformLayer.transformMessages(messages)
: messages;
// Resume with transformed (or original) messages
return await agentSDK.resumeSession({
sessionId,
messages: transformedMessages
});
}
Risks and Mitigations
Risk 1: Information Loss
Risk: Compressed observations may lose critical details that Claude needs to reference later.
Mitigation:
- Make endless mode optional (users can disable if quality degrades)
- Improve observation quality (better prompts, more comprehensive facts)
- Hybrid approach: Keep recent N tool uses in full, compress older ones
- Monitor conversation quality metrics
Risk 2: Transform Performance
Risk: Looking up observations for 100+ tool uses during resume could be slow.
Mitigation:
- Index tool_use_id in SQLite (O(log n) lookups)
- Batch queries (single SELECT with IN clause)
- Measure and optimize (target <100ms for typical session)
- Cache observations in memory during session
Risk 3: Missing Observations
Risk: Tool use executed but observation not yet created (async worker lag).
Mitigation:
- Fallback to original content if observation missing
- Log when fallback occurs (helps identify worker performance issues)
- Allow observations to be created retroactively
- Consider synchronous observation creation for critical tools
Risk 4: Transcript Corruption
Risk: Bug in transform layer could corrupt user conversations.
Mitigation:
- Never modify source transcripts (read-only)
- Transform happens in memory only
- Extensive testing before beta release
- Feature flag allows instant disable if issues found
- Keep full audit trail in logs
Risk 5: Agent SDK Compatibility
Risk: Agent SDK updates could break transform layer integration.
Mitigation:
- Document exact Agent SDK version requirements
- Monitor Agent SDK release notes
- Test against new SDK versions before upgrading
- Graceful degradation if SDK changes detected
Success Criteria
Proof of Concept Success
- Transform layer successfully compresses a 50-tool-use session
- Context window usage reduced by 80%+ compared to uncompressed
- Session resumes without errors
- Conversation quality remains high (subjective evaluation)
Beta Release Success
- 10+ users running endless mode without issues
- Average context savings: 85%+ across all sessions
- Transform latency: <200ms for typical resume
- Zero transcript corruption incidents
- Positive user feedback on conversation continuity
Production Success
- Endless mode becomes default setting
- Sessions running for weeks/months without context issues
- Context window exhaustion becomes rare edge case
- User-reported "session too long" issues drop to near zero
- Transform layer performance scales to 1000+ tool use sessions
Configuration
Environment Variables
# Enable endless mode (default: false)
CLAUDE_MEM_ENDLESS_MODE=true
# Fallback behavior if observation missing (default: true)
CLAUDE_MEM_TRANSFORM_FALLBACK=true
# Max time to wait for observation lookup (default: 500ms)
CLAUDE_MEM_TRANSFORM_TIMEOUT=500
# Keep recent N tool uses uncompressed (default: 0, compress all)
CLAUDE_MEM_TRANSFORM_KEEP_RECENT=0
User Controls
// Future: UI toggle in viewer
interface EndlessModeSettings {
enabled: boolean;
keepRecentToolUses: number; // Hybrid mode
fallbackToOriginal: boolean;
}
Context Economics: Before vs. After
Example Session (50 tool uses)
Before (Endless Mode OFF):
File reads: 10 × 8,000 tokens = 80,000 tokens
Bash outputs: 20 × 2,000 tokens = 40,000 tokens
Searches: 15 × 4,000 tokens = 60,000 tokens
Other tools: 5 × 1,000 tokens = 5,000 tokens
──────────────────────────────────────────────────
Total: 185,000 tokens
Context remaining: 15,000 tokens (92% full)
After (Endless Mode ON):
File reads: 10 × 300 tokens = 3,000 tokens
Bash outputs: 20 × 250 tokens = 5,000 tokens
Searches: 15 × 400 tokens = 6,000 tokens
Other tools: 5 × 200 tokens = 1,000 tokens
──────────────────────────────────────────────────
Total: 15,000 tokens
Context remaining: 185,000 tokens (7.5% full)
Savings: 170,000 tokens (92% reduction)
Session Longevity:
- Before: ~50 tool uses before context full
- After: ~600+ tool uses before context full
- 12x longer sessions
Next Steps
Immediate Actions (This Week)
- Database Migration: Add tool_use_id column to observations table
- Hook Update: Modify PostToolUse hook to capture tool_use_id from Agent SDK
- Architecture Validation: Confirm where Agent SDK session resume happens in worker service
- Prototype: Build minimal TransformLayer class with observation lookup
Short Term (Next 2 Weeks)
- Implement complete transform logic
- Wire into worker service resume flow
- Add endless mode feature flag
- Test with real sessions
Medium Term (Next Month)
- Beta release to power users
- Gather feedback and iterate
- Performance optimization
- Documentation and user guides
Long Term (Future)
- Make endless mode default
- Hybrid sliding window (keep recent tools uncompressed)
- Selective compression by tool type
- Auto-tune compression based on context usage patterns
Open Questions
- Tool Use ID Format: What does the Agent SDK's tool_use_id look like? Is it UUID, hash, or sequential?
- Transcript Format: What's the exact JSONL schema for tool_use messages? Where is the content we'll replace?
- Resume Hook Point: Where exactly in the worker service does session resume happen? Is there a clear integration point?
- Observation Delay: How long between PostToolUse firing and observation being available in SQLite? Does this affect resume?
- Feature Flag Storage: Environment variable, or persist user preference in database?
Conclusion
Endless Mode transforms claude-mem from a "memory between sessions" system into a "continuous compression engine" that enables truly infinite sessions. By leveraging the observations we're already creating in real-time and applying them as an ephemeral transformation layer during resume, we can extend session longevity by 10-12x without any risk to user data.
The key architectural insight is immutability: by never modifying source transcripts and performing all compression in memory, we get the benefits of context window optimization without the risks of data corruption or loss. Combined with the optional nature of the feature, this provides a safe, reversible path to fundamentally better session continuity.
This is the natural evolution of claude-mem: from remembering what happened before, to making it possible to never stop.