Files
claude-mem/docs/context/jit-context-postmortem.md
T
basher83 97d565e3cd Replace search skill with mem-search (#91)
* feat: add mem-search skill with progressive disclosure architecture

Add comprehensive mem-search skill for accessing claude-mem's persistent
cross-session memory database. Implements progressive disclosure workflow
and token-efficient search patterns.

Features:
- 12 search operations (observations, sessions, prompts, by-type, by-concept, by-file, timelines, etc.)
- Progressive disclosure principles to minimize token usage
- Anti-patterns documentation to guide LLM behavior
- HTTP API integration for all search functionality
- Common workflows with composition examples

Structure:
- SKILL.md: Entry point with temporal trigger patterns
- principles/: Progressive disclosure + anti-patterns
- operations/: 12 search operation files

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>

* docs: add CHANGELOG entry for mem-search skill

Document mem-search skill addition in Unreleased section with:
- 100% effectiveness compliance metrics
- Comparison to previous search skill implementation
- Progressive disclosure architecture details
- Reference to audit report documentation

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>

* docs: add mem-search skill audit report

Add comprehensive audit report validating mem-search skill against
Anthropic's official skill-creator documentation.

Report includes:
- Effectiveness metrics comparison (search vs mem-search)
- Critical issues analysis for production readiness
- Compliance validation across 6 key dimensions
- Reference implementation guidance

Result: mem-search achieves 100% compliance vs search's 67%

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>

* feat: Add comprehensive search architecture analysis document

- Document current state of dual search architectures (HTTP API and MCP)
- Analyze HTTP endpoints and MCP search server architectures
- Identify DRY violations across search implementations
- Evaluate the use of curl as the optimal approach for search
- Provide architectural recommendations for immediate and long-term improvements
- Outline action plan for cleanup, feature parity, DRY refactoring

* refactor: Remove deprecated search skill documentation and operations

* refactor: Reorganize documentation into public and context directories

Changes:
- Created docs/public/ for Mintlify documentation (.mdx files)
- Created docs/context/ for internal planning and implementation docs
- Moved all .mdx files and assets to docs/public/
- Moved all internal .md files to docs/context/
- Added CLAUDE.md to both directories explaining their purpose
- Updated docs.json paths to work with new structure

Benefits:
- Clear separation between user-facing and internal documentation
- Easier to maintain Mintlify docs in dedicated directory
- Internal context files organized separately

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>

* Enhance session management and continuity in hooks

- Updated new-hook.ts to clarify session_id threading and idempotent session creation.
- Modified prompts.ts to require claudeSessionId for continuation prompts, ensuring session context is maintained.
- Improved SessionStore.ts documentation on createSDKSession to emphasize idempotent behavior and session connection.
- Refined SDKAgent.ts to detail continuation prompt logic and its reliance on session.claudeSessionId for unified session handling.

---------

Co-authored-by: Claude <noreply@anthropic.com>
Co-authored-by: Alex Newman <thedotmack@gmail.com>
2025-11-11 16:15:07 -05:00

22 KiB

JIT Context Filtering: Post-Mortem

Date: November 9, 2025 Duration: 3.5 hours (7:45 PM - 11:11 PM) Branches: feature/jit-context, failed/jit-context Status: Failed, reverted to main Commits:

  • 3ac0790 - feat: Implement JIT context hook for user prompt submission
  • adf7bf4 - Refactor JIT context handling in SDKAgent and WorkerService

Executive Summary

Attempted to implement JIT (Just-In-Time) context filtering—a feature that would dynamically generate relevant context timelines on every user prompt, potentially replacing the static session-start context entirely. After multiple architectural iterations spanning 3.5 hours and adding ~2,850 lines of code, the implementation was abandoned and reverted. The revert was not due to lack of vision (the feature aligns with long-term architectural goals), but due to implementation complexity and the need for a simpler initial approach. Significant architectural knowledge was gained about hook limitations, worker patterns, and proper separation of concerns.

What We Tried to Build

Goal

When a user submits a prompt, dynamically generate a relevant context timeline instead of the static session-start context. Use the fast search infrastructure (SQLite FTS5 + ChromaDB) to fetch precisely relevant context on-demand.

The Vision

Current approach: SessionStart hook loads 50 recent observations blindly, displays them all.

Proposed approach: UserPromptSubmit hook analyzes the prompt, queries the timeline search API, and loads only the relevant context window dynamically.

Why this makes sense:

  • We already have fast search: SQLite FTS5 + Chroma semantic search
  • Dynamic context timeline search is implemented and tested
  • Search results come back in <200ms
  • Could replace session-start context entirely with smarter, prompt-specific context

User Experience

User types: "How did we fix the authentication bug?"

Behind the scenes:
1. Analyze prompt: "authentication bug fix"
2. Query timeline search for relevant period
3. Load 5-10 observations from that specific timeline
4. Inject as context
5. Claude answers with precisely relevant historical context

vs. Current:
Load 50 most recent observations regardless of relevance

Why Checkbox Settings Became Less Important

Originally asked for checkboxes to customize session-start context display. But if JIT context could replace session-start context with intelligent, prompt-specific timelines, the display customization became a non-issue.

Architectural Attempts

Attempt 1: Hook-Based Filtering (7:45 PM - 9:30 PM)

Approach: Call Agent SDK query() directly in new-hook.ts during UserPromptSubmit event.

Implementation:

  • Created jit-context-hook.ts (~432 lines)
  • Added generateJitContext() function in hook
  • Called SDK query() with observation list and user prompt
  • Expected hook to block for ~1-2s while Haiku filters

Failure:

Error: Claude Code executable not found at
/Users/alexnewman/.claude/plugins/marketplaces/thedotmack/plugin/scripts/cli.js

Root Cause: Hooks run in sandboxed environment without access to claudePath (path to Claude Code executable). The Agent SDK requires this path, which is only available in the worker service.

Architectural Violation: This broke the established pattern where hooks handle orchestration and workers handle AI processing. The save-hook sets the precedent: hooks capture data, send to worker, worker runs SDK queries asynchronously.

Attempt 2: Worker-Based with Simple Queries (9:30 PM - 10:30 PM)

Approach: Move JIT filtering to worker service, keep it simple with per-request SDK queries.

Implementation:

  • Documented architecture fix plan in docs/jit-context-architecture-fix.md
  • Moved generateJitContext() to worker (considered creating src/services/worker/JitContext.ts)
  • Modified /sessions/:id/init endpoint to accept jitEnabled flag
  • Worker would run one-shot SDK query per prompt

Architecture:

UserPromptSubmit → new-hook → POST /sessions/:id/init { jitEnabled: true }
                                      ↓
                              Worker spawns Claude Haiku
                                      ↓
                              Filters 50 obs → 3-5 IDs
                                      ↓
                              Returns { context: [...] }
                                      ↓
                              Hook injects context → Claude

Issues Identified:

  • Each filter request spawns a new Claude subprocess (~200-500ms overhead)
  • Observation list re-sent on every prompt (~5-10KB per request)
  • No token caching between requests
  • Performance worse than just loading all observations directly

Decision: Pivoted to persistent sessions to solve performance issues.

Attempt 3: Persistent JIT Sessions (10:30 PM - 11:11 PM)

Approach: Create a long-lived Agent SDK session that persists throughout user session, similar to main memory session pattern.

Implementation (291 new lines in SDKAgent.ts):

  1. Session Lifecycle:

    • Added jitSessionId, jitAbortController, jitGeneratorPromise to ActiveSession interface
    • startJitSession(): Creates persistent SDK session at session init
    • cleanupJitSession(): Terminates JIT session at session end
  2. Request Queue Architecture:

    • jitFilterQueues Map: Per-session request queues
    • JITFilterRequest interface: { userPrompt, resolve, reject }
    • EventEmitter coordination: Wake generator when new requests arrive
  3. Message Generator Pattern:

    • createJitMessageGenerator(): Async generator that yields filter requests
    • Initial prompt: Load 50 observations, wait for "READY" response
    • Loop: Wait for EventEmitter signal → yield user prompt → parse response → resolve promise
    • Pattern: Persistent session stays alive between requests
  4. Filter Query Flow:

    runFilterQuery(sessionDbId, userPrompt) {
      // Queue request
      queue.requests.push({ userPrompt, resolve, reject });
      queue.emitter.emit('request');
    
      // Wait for response (30s timeout)
      return Promise.race([
        new Promise((resolve, reject) => { /* queued */ }),
        timeout(30000)
      ]);
    }
    
  5. Response Processing:

    • processJitFilterResponse(): Accumulate streaming text
    • Parse IDs: "1,5,23,41" or "NONE"
    • Resolve queued promise with ID array

Added Files:

  • src/services/worker/SDKAgent.ts: +291 lines
  • src/services/worker-types.ts: +3 fields (jit state tracking)
  • src/services/worker/SessionManager.ts: +26 lines (JIT cleanup)
  • src/services/worker-service.ts: +102 lines (JIT initialization)
  • src/shared/settings.ts: +65 lines (JIT config)
  • src/hooks/jit-context-hook.ts: +208 lines (orchestration)
  • docs/jit-context-architecture-fix.md: +265 lines
  • context/session-pattern-parity.md: +298 lines

Total Changes: 18 files, +2,852 lines, -133 lines

Final Status at Revert: Implementation was complete and likely functional, but...

Why It Failed

1. Architectural Complexity Explosion

Problem: The persistent session pattern added enormous complexity for marginal benefit.

Evidence:

  • Parallel session management: Regular + JIT sessions running concurrently
  • Complex coordination: EventEmitter + promise queues + generator pattern
  • Lifecycle coupling: Session init, request handling, cleanup all intertwined
  • State explosion: 3 new fields per session (jitSessionId, jitAbortController, jitGeneratorPromise)

Code Smell: When the "optimization" requires 300 lines of coordination code, it's probably not an optimization.

2. Premature Optimization

YAGNI Violation: Built elaborate token caching and persistent session architecture before proving the feature provided value.

Reality Check:

  • Current approach: Load 50 observations = ~25KB context, works fine
  • JIT overhead: Haiku query = 1-2s latency + coordination complexity
  • User benefit: Unclear—users haven't complained about context relevance
  • Token savings: Marginal—Claude caches long contexts efficiently anyway

Quote from CLAUDE.md:

"Write the dumb, obvious thing first. Add complexity only when you actually hit the problem."

We didn't hit a problem. We invented one.

3. Implementation Complexity, Not Vision

The Vision is Sound:

  • Dynamic context is better than static context
  • Timeline search API exists and is fast
  • Infrastructure (SQLite + Chroma) can support this
  • Replacing session-start context with prompt-specific context makes sense

The Problem: We jumped to the complex persistent-session approach without trying the simple per-request approach first.

What We Should Have Done:

// Simple version (not tried):
app.post('/sessions/:id/init', async (req, res) => {
  const { userPrompt } = req.body;

  // Query timeline search API (already exists, fast)
  const timeline = await timelineSearch(project, userPrompt, depth=10);

  // Return observations
  return res.json({ context: timeline });
});

This would have:

  • Validated the feature's value quickly
  • Used existing infrastructure
  • Avoided all the persistence complexity
  • Taken 30 minutes instead of 3.5 hours

4. Pattern Divergence

Inconsistency: JIT sessions work fundamentally differently from memory sessions.

Memory Session Pattern:

// One-shot: Init → Process observations → Complete
startSession()  yield prompts  parse responses  complete

JIT Session Pattern:

// Persistent: Init → Wait indefinitely → Process on-demand → Complete
startJitSession()  yield initial load  LOOP:
  - Wait for EventEmitter signal
  - Yield filter request
  - Parse response
  - Resolve promise
  - GOTO LOOP

Maintenance Burden: Two completely different session patterns means:

  • Doubled testing complexity
  • Increased cognitive load for contributors
  • Higher risk of subtle bugs in lifecycle management

Session Pattern Parity Document: The 298-line session-pattern-parity.md was created to document the differences—a sign that maybe they shouldn't be different.

5. Blocking I/O in Critical Path

Performance Impact: Every user prompt now blocks for 1-2s waiting for Haiku filtering.

Current Flow:

User types prompt → 10ms → Claude responds

JIT Flow:

User types prompt → 10ms init → 1-2s Haiku filter → Claude responds

User Experience: We added 1-2 seconds of latency to every interaction for questionable benefit.

Alternative: If context filtering is valuable, do it asynchronously and apply to next prompt.

6. Missing the Forest for the Trees

Real Issue: We focused on technical implementation without asking strategic questions:

  • Is context relevance actually a problem? No evidence.
  • Do users want this? No feedback requested.
  • Is 50 observations too many? Not proven.
  • Does filtering improve responses? Not tested.

Anti-Pattern: Solution in search of a problem.

What We Should Have Done

Option 1: Don't Build It

Justification: No validated user need. Current system works fine.

Next Step: Wait for user feedback indicating context relevance is an issue.

Option 2: Simple MVP

If we really wanted to explore this:

  1. Week 1: Add basic filtering in worker with one-shot queries

    • Accept slight performance hit (~500ms overhead)
    • Measure filter accuracy and user impact
    • Gather feedback
  2. Week 2: If proven valuable, optimize

    • Add token caching only if needed
    • Consider persistent sessions only if performance is bottleneck
  3. Week 3: If still valuable, scale

    • Polish error handling
    • Add configuration options
    • Document patterns

Philosophy: Incremental validation, not big-bang architecture.

Option 3: Different Approach Entirely

Alternative: Pre-computed relevance scores

Instead of on-demand filtering:

  • Score observations at creation time (save-hook)
  • Store relevance embeddings in Chroma
  • At session start, query Chroma with user's first prompt
  • Load top 10-20 most relevant observations
  • No runtime latency, better accuracy, simpler architecture

Benefit: Leverages existing Chroma infrastructure, avoids runtime overhead.

Technical Lessons Learned

1. EventEmitter Coordination Anti-Pattern

Code:

queue.emitter.on('request', () => {
  // Wake up generator to process request
});

Issue: Complex async coordination using event-driven wakeup signals is hard to reason about.

Better: Use async queues or channels (e.g., async-queue package) that handle coordination internally.

2. Generator Pattern Complexity

Pattern:

async *createJitMessageGenerator() {
  yield initialPrompt;
  while (!aborted) {
    await waitForEvent();  // Blocks here
    yield nextRequest;
  }
}

Tradeoff: Generators are great for iteration, but terrible for event-driven request/response patterns.

Better: Use explicit session object with sendMessage()/waitForResponse() methods.

3. Dual Session Management

Complexity: Managing two concurrent SDK sessions per user session is inherently complex.

Alternatives Considered:

  • Single session handling both observations and filtering (rejected: tight coupling)
  • Separate service for filtering (rejected: too much infrastructure)
  • Pre-computed filtering (not considered: should have been)

Lesson: When parallel state management feels hard, question whether you need parallel state.

4. Promise Queue Pattern

Implementation:

interface QueuedRequest {
  resolve: (result: T) => void;
  reject: (error: Error) => void;
}
queue.push({ resolve, reject });
// Later...
queue[0].resolve(result);

Good: Clean async API for callers Bad: Easy to leak promises if error handling isn't perfect Improvement: Use libraries like p-queue that handle edge cases

Process Lessons Learned

1. No Incremental Validation

Mistake: Went from "idea" to "complete architecture" without validation points.

Better Process:

  1. Write one-pager explaining user value
  2. Build simplest possible version (2 hours max)
  3. Test with real usage
  4. Measure impact
  5. Decide: kill, iterate, or scale

Checkpoint Questions:

  • After 1 hour: "Does this solve a real problem?"
  • After 2 hours: "Is this getting too complex?"
  • After 3 hours: "Should I just ship the simple version?"

2. Architecture Astronomy

Definition: Designing elaborate systems without building/testing them.

Evidence:

  • 265-line architecture doc written before any code
  • 298-line session pattern parity analysis
  • Multiple complete rewrites of the same feature

Better: Code first, document later. Spike solutions, learn from implementation.

3. Sunk Cost Fallacy

Timeline:

  • Hour 1: "This seems complex but achievable"
  • Hour 2: "We're halfway done, can't stop now"
  • Hour 3: "Just need to fix this one coordination issue"
  • Hour 4: "It's working, but... this feels wrong"

Correct Decision: Revert. Took courage to throw away 4 hours of work.

Learning: Time invested is not a reason to continue. Quality of outcome matters more.

4. Missing User Feedback Loop

No User Input:

  • Didn't ask: "Is context relevance a problem for you?"
  • Didn't test: "Does filtered context improve your responses?"
  • Didn't measure: "Are you hitting context limits?"

Engineering Theater: Building impressive-sounding features without user validation.

What We Actually Learned (The Real Value)

Despite reverting, this was productive R&D:

1. Deep Understanding of Hook Architecture

Critical Discovery: Hooks run in sandboxed environment without claudePath.

  • Hooks cannot call Agent SDK query() directly
  • All AI processing must happen in worker service
  • This architectural constraint is now documented

Learned Pattern:

Hook (orchestration) → Worker (AI processing)
✓ save-hook: Captures data → Worker processes with SDK
✓ new-hook: Creates session → Worker returns confirmation
✗ jit-hook: Tried SDK in hook → Failed, no claudePath

Value: Future features will avoid this mistake. We now know the boundary.

2. Worker Architecture Patterns

Blocking vs. Non-Blocking:

  • SessionStart: Can be non-blocking (context loads async)
  • UserPromptSubmit: Must be blocking (session must exist before processing)
  • JIT Context: Must be blocking (context needed before prompt processed)

Established Pattern:

// Worker endpoint for features requiring AI
app.post('/sessions/:id/operation', async (req, res) => {
  const { operationData } = req.body;
  const result = await sdkAgent.performOperation(operationData);
  return res.json({ result });
});

3. Persistent Session Management

Architecture Knowledge Gained:

  • How to maintain long-lived SDK sessions
  • EventEmitter coordination patterns for request/response
  • Promise queue management for async operations
  • Proper cleanup with AbortControllers

Pattern Documented:

  • Dual session management (regular + JIT)
  • Generator-based message loops
  • Request queuing with timeouts

Value: When we build the simpler version, we'll know these patterns.

4. Configuration Infrastructure

src/shared/settings.ts (65 lines) provides reusable configuration patterns:

export function getConfigValue(key: string, defaultValue: string): string {
  // Priority: settings.json → env var → default
}

Kept After Revert: This module is useful for other features.

5. Key Architectural Decisions Made

Decisions that will guide future implementation:

  1. JIT context filtering must happen in worker (proven via failed hook attempt)
  2. Context must be blocking on UserPromptSubmit (session needs context before processing)
  3. Dynamic timeline search is the right approach (fast, precise, leverages existing infrastructure)
  4. Simple per-request queries should be tried before persistent sessions

6. Documentation Quality

  • jit-context-architecture-fix.md: Documents why hooks can't run SDK queries
  • session-pattern-parity.md: Reference for implementing dual sessions
  • Hooks reference: Comprehensive hook documentation added

Value: These docs help future contributors understand the system constraints.

7. Infrastructure Validation

Confirmed that our search stack is ready:

  • SQLite FTS5: Fast full-text search (<50ms)
  • ChromaDB: Semantic search (<200ms with 8,000+ vectors)
  • Timeline search API: Already implemented and tested
  • Worker service: Can handle synchronous AI operations

The infrastructure exists. We just need a simpler integration.

Recommendations

Immediate Actions

  1. Archive the work:

    • Keep failed/jit-context branch for reference
    • Extract reusable components (settings.ts)
    • Save architecture docs for future features
  2. Document the anti-patterns:

    • Add this post-mortem to CLAUDE.md references
    • Update coding standards with lessons learned
  3. Reset focus:

    • Return to validated user needs
    • Prioritize features with clear value propositions

Future Feature Development

Gating Questions (Answer before coding):

  1. User Value: What specific user problem does this solve?
  2. Evidence: Have users requested this or reported the underlying issue?
  3. Measurement: How will we know if it's successful?
  4. Simplicity: What's the dumbest version that could work?
  5. Time Limit: If we can't prove value in 2 hours, should we build it?

Process:

VALIDATE → BUILD SIMPLE → TEST → MEASURE → DECIDE
   ↑                                          ↓
   └──────────── ITERATE OR KILL ────────────┘

If Context Filtering Returns

Should we revisit this idea in the future:

Prerequisites:

  • User feedback requesting better context relevance
  • Metrics showing current context is too broad
  • Evidence that filtering improves response quality

Simple Approach:

// In worker-service.ts /sessions/:id/init
if (jitEnabled) {
  const observations = await db.getRecentObservations(project, 50);
  const filtered = await simpleFilter(observations, userPrompt);  // One-shot query
  return { context: filtered };
}

Acceptance Criteria:

  • <100 lines of code
  • <500ms latency impact
  • No new session types
  • Degrades gracefully on errors

If that works: Then consider optimization.

Conclusion

JIT context filtering failed not because the vision was wrong, but because we jumped to the complex implementation without validating the simple one first. The feature aligns with long-term goals (dynamic, prompt-specific context using our fast search infrastructure), but the persistent-session architecture was premature optimization.

The right call: Revert the complex implementation. Build the simple version when ready.

Key Takeaway: The vision is sound. The execution was overcomplicated. We now have:

  • Deep knowledge of hook/worker architecture constraints
  • Documented patterns for persistent SDK sessions
  • Validated fast search infrastructure
  • Clear understanding of what to build next time (simple timeline search API integration)

This was R&D, not failure. We learned what doesn't work (SDK in hooks), what does work (worker-based AI processing), and how to approach it next time (simple API calls before persistent sessions).

Next Implementation: When we revisit this (and we should), start with:

  1. Worker endpoint that accepts prompt
  2. Queries existing timeline search API
  3. Returns context
  4. Hook injects context
  5. Validate it improves responses
  6. Then optimize if needed

Final Thought: Sometimes you have to build the wrong thing to understand the right thing. That's R&D.


Branch Status:

  • feature/jit-context: Abandoned
  • failed/jit-context: Archived for reference
  • main: Stable at v5.4.0

Files to Keep:

  • src/shared/settings.ts: Reusable config utilities

Files Discarded:

  • Everything else (+2,850 lines)

Emotional State: Relieved. Dodged a maintenance nightmare.