
JIT Context Filtering: Post-Mortem

Date: November 9, 2025
Duration: 3.5 hours (7:45 PM - 11:11 PM)
Branches: feature/jit-context, failed/jit-context
Status: Failed, reverted to main
Commits:

3ac0790 - feat: Implement JIT context hook for user prompt submission
adf7bf4 - Refactor JIT context handling in SDKAgent and WorkerService

Executive Summary

Attempted to implement JIT (Just-In-Time) context filtering—a feature that would dynamically generate relevant context timelines on every user prompt, potentially replacing the static session-start context entirely. After multiple architectural iterations spanning 3.5 hours and adding ~2,850 lines of code, the implementation was abandoned and reverted. The revert was not due to lack of vision (the feature aligns with long-term architectural goals), but due to implementation complexity and the need for a simpler initial approach. Significant architectural knowledge was gained about hook limitations, worker patterns, and proper separation of concerns.

What We Tried to Build

Goal

When a user submits a prompt, dynamically generate a relevant context timeline instead of the static session-start context. Use the fast search infrastructure (SQLite FTS5 + ChromaDB) to fetch precisely relevant context on-demand.

The Vision

Current approach: SessionStart hook loads 50 recent observations blindly, displays them all.

Proposed approach: UserPromptSubmit hook analyzes the prompt, queries the timeline search API, and loads only the relevant context window dynamically.

Why this makes sense:

We already have fast search: SQLite FTS5 + Chroma semantic search
Dynamic context timeline search is implemented and tested
Search results come back in <200ms
Could replace session-start context entirely with smarter, prompt-specific context

User Experience

User types: "How did we fix the authentication bug?"

Behind the scenes:
1. Analyze prompt: "authentication bug fix"
2. Query timeline search for relevant period
3. Load 5-10 observations from that specific timeline
4. Inject as context
5. Claude answers with precisely relevant historical context

vs. Current:
Load 50 most recent observations regardless of relevance

Why Checkbox Settings Became Less Important

The original request was for checkboxes to customize the session-start context display. But if JIT context could replace session-start context with intelligent, prompt-specific timelines, display customization would become a non-issue.

Architectural Attempts

Attempt 1: Hook-Based Filtering (7:45 PM - 9:30 PM)

Approach: Call Agent SDK query() directly in new-hook.ts during UserPromptSubmit event.

Implementation:

Created jit-context-hook.ts (~432 lines)
Added generateJitContext() function in hook
Called SDK query() with observation list and user prompt
Expected hook to block for ~1-2s while Haiku filters

Failure:

Error: Claude Code executable not found at
/Users/alexnewman/.claude/plugins/marketplaces/thedotmack/plugin/scripts/cli.js

Root Cause: Hooks run in a sandboxed environment without access to claudePath (the path to the Claude Code executable). The Agent SDK requires this path, which is only available in the worker service.

Architectural Violation: This broke the established pattern where hooks handle orchestration and workers handle AI processing. The save-hook sets the precedent: hooks capture data, send to worker, worker runs SDK queries asynchronously.
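The hook/worker split described above can be sketched roughly as follows. This is an illustration under assumptions, not the project's code: `PostFn`, `HookInput`, and `handleUserPrompt` are hypothetical names, and the request body shape is a guess. Only the `/sessions/:id/init` route comes from this document.

```typescript
// Hypothetical sketch: the hook only orchestrates; the worker owns every SDK call.
// PostFn stands in for whatever HTTP client the hook uses to reach the worker.
type PostFn = (path: string, body: unknown) => Promise<{ context: string[] }>;

interface HookInput {
  sessionId: string;
  prompt: string;
}

// The hook never imports the Agent SDK. It forwards the prompt to the worker
// and relays whatever context the worker returns.
async function handleUserPrompt(input: HookInput, post: PostFn): Promise<string[]> {
  const path = `/sessions/${encodeURIComponent(input.sessionId)}/init`;
  const reply = await post(path, { userPrompt: input.prompt });
  return reply.context;
}
```

Passing the HTTP function in as a parameter keeps the hook free of any dependency that needs claudePath and makes the boundary easy to test with a fake.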

Attempt 2: Worker-Based with Simple Queries (9:30 PM - 10:30 PM)

Approach: Move JIT filtering to worker service, keep it simple with per-request SDK queries.

Implementation:

Documented architecture fix plan in docs/jit-context-architecture-fix.md
Moved generateJitContext() to worker (considered creating src/services/worker/JitContext.ts)
Modified /sessions/:id/init endpoint to accept jitEnabled flag
Worker would run one-shot SDK query per prompt

Architecture:

UserPromptSubmit → new-hook → POST /sessions/:id/init { jitEnabled: true }
                                      ↓
                              Worker spawns Claude Haiku
                                      ↓
                              Filters 50 obs → 3-5 IDs
                                      ↓
                              Returns { context: [...] }
                                      ↓
                              Hook injects context → Claude

Issues Identified:

Each filter request spawns a new Claude subprocess (~200-500ms overhead)
Observation list re-sent on every prompt (~5-10KB per request)
No token caching between requests
Performance worse than just loading all observations directly

Decision: Pivoted to persistent sessions to solve performance issues.

Attempt 3: Persistent JIT Sessions (10:30 PM - 11:11 PM)

Approach: Create a long-lived Agent SDK session that persists throughout the user session, similar to the main memory session pattern.

Implementation (291 new lines in SDKAgent.ts):

Session Lifecycle:
- Added jitSessionId, jitAbortController, jitGeneratorPromise to ActiveSession interface
- startJitSession(): Creates persistent SDK session at session init
- cleanupJitSession(): Terminates JIT session at session end
Request Queue Architecture:
- jitFilterQueues Map: Per-session request queues
- JITFilterRequest interface: { userPrompt, resolve, reject }
- EventEmitter coordination: Wake generator when new requests arrive
Message Generator Pattern:
- createJitMessageGenerator(): Async generator that yields filter requests
- Initial prompt: Load 50 observations, wait for "READY" response
- Loop: Wait for EventEmitter signal → yield user prompt → parse response → resolve promise
- Pattern: Persistent session stays alive between requests

Filter Query Flow:

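async function runFilterQuery(sessionDbId: number, userPrompt: string): Promise<number[]> {
  const queue = jitFilterQueues.get(sessionDbId);
  if (!queue) throw new Error(`No JIT queue for session ${sessionDbId}`);

  // Queue the request inside the executor so resolve/reject are in scope
  const pending = new Promise<number[]>((resolve, reject) => {
    queue.requests.push({ userPrompt, resolve, reject });
    queue.emitter.emit('request');
  });

  // Wait for response (30s timeout); timeout() is assumed to reject after the given ms
  return Promise.race([pending, timeout(30000)]);
}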
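The `timeout(30000)` call above is not shown anywhere in this document. One way such a helper might look is sketched below. `withTimeout` is a hypothetical name, and wrapping the race so the timer is always cleared is a design choice made here, not something the reverted code is known to have done.

```typescript
// Hypothetical helper: race a promise against a timer and always clear the timer,
// so a fast response does not leave a 30-second timer pending in the worker.
async function withTimeout<T>(work: Promise<T>, ms: number, label = "operation"): Promise<T> {
  let timer: ReturnType<typeof setTimeout> | undefined;
  const deadline = new Promise<never>((_resolve, reject) => {
    timer = setTimeout(() => reject(new Error(`${label} timed out after ${ms}ms`)), ms);
  });
  try {
    // Whichever settles first decides the outcome.
    return await Promise.race([work, deadline]);
  } finally {
    clearTimeout(timer);
  }
}
```

A caller would write something like `withTimeout(pendingRequest, 30000, "JIT filter")` instead of building the race by hand.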

Response Processing:
- processJitFilterResponse(): Accumulate streaming text
- Parse IDs: "1,5,23,41" or "NONE"
- Resolve queued promise with ID array
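The ID parsing step above is small enough to sketch in full. `parseFilterResponse` is a hypothetical name. Dropping non-numeric tokens and de-duplicating IDs are assumptions about how a tolerant parser might behave, not a description of processJitFilterResponse().

```typescript
// Hypothetical parser for the filter model's reply: "1,5,23,41" or "NONE".
function parseFilterResponse(text: string): number[] {
  const trimmed = text.trim();
  if (trimmed === "" || trimmed.toUpperCase() === "NONE") return [];
  const ids = trimmed
    .split(",")
    .map((part) => part.trim())
    .filter((part) => /^\d+$/.test(part)) // drop anything that is not a plain integer
    .map((part) => Number.parseInt(part, 10));
  return Array.from(new Set(ids)); // de-duplicate, preserving first-seen order
}
```

Because the model's output is free text, treating unparseable tokens as noise rather than throwing keeps one malformed reply from failing the whole request.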

Changed Files:

src/services/worker/SDKAgent.ts: +291 lines
src/services/worker-types.ts: +3 fields (jit state tracking)
src/services/worker/SessionManager.ts: +26 lines (JIT cleanup)
src/services/worker-service.ts: +102 lines (JIT initialization)
src/shared/settings.ts: +65 lines (JIT config)
src/hooks/jit-context-hook.ts: +208 lines (orchestration)
docs/jit-context-architecture-fix.md: +265 lines
context/session-pattern-parity.md: +298 lines

Total Changes: 18 files, +2,852 lines, -133 lines

Final Status at Revert: Implementation was complete and likely functional, but...

Why It Failed

1. Architectural Complexity Explosion

Problem: The persistent session pattern added enormous complexity for marginal benefit.

Evidence:

Parallel session management: Regular + JIT sessions running concurrently
Complex coordination: EventEmitter + promise queues + generator pattern
Lifecycle coupling: Session init, request handling, cleanup all intertwined
State explosion: 3 new fields per session (jitSessionId, jitAbortController, jitGeneratorPromise)

Code Smell: When the "optimization" requires 300 lines of coordination code, it's probably not an optimization.

2. Premature Optimization

YAGNI Violation: Built elaborate token caching and persistent session architecture before proving the feature provided value.

Reality Check:

Current approach: Load 50 observations = ~25KB context, works fine
JIT overhead: Haiku query = 1-2s latency + coordination complexity
User benefit: Unclear—users haven't complained about context relevance
Token savings: Marginal—Claude caches long contexts efficiently anyway

Quote from CLAUDE.md:

"Write the dumb, obvious thing first. Add complexity only when you actually hit the problem."

We didn't hit a problem. We invented one.

3. Implementation Complexity, Not Vision

The Vision is Sound:

Dynamic context is better than static context
Timeline search API exists and is fast
Infrastructure (SQLite + Chroma) can support this
Replacing session-start context with prompt-specific context makes sense

The Problem: We jumped to the complex persistent-session approach without trying the simple per-request approach first.

What We Should Have Done:

// Simple version (not tried):
app.post('/sessions/:id/init', async (req, res) => {
  // `project` is assumed to arrive with the request; the original sketch left its source unspecified
  const { userPrompt, project } = req.body;

  // Query timeline search API (already exists, fast); 10 is the timeline depth
  const timeline = await timelineSearch(project, userPrompt, 10);

  // Return observations
  return res.json({ context: timeline });
});

This would have:

Validated the feature's value quickly
Used existing infrastructure
Avoided all the persistence complexity
Taken 30 minutes instead of 3.5 hours

4. Pattern Divergence

Inconsistency: JIT sessions work fundamentally differently from memory sessions.

Memory Session Pattern:

// One-shot: Init → Process observations → Complete
startSession() → yield prompts → parse responses → complete

JIT Session Pattern:

// Persistent: Init → Wait indefinitely → Process on-demand → Complete
startJitSession() → yield initial load → LOOP:
  - Wait for EventEmitter signal
  - Yield filter request
  - Parse response
  - Resolve promise
  - GOTO LOOP

Maintenance Burden: Two completely different session patterns means:

Doubled testing complexity
Increased cognitive load for contributors
Higher risk of subtle bugs in lifecycle management

Session Pattern Parity Document: The 298-line session-pattern-parity.md was created to document the differences—a sign that maybe they shouldn't be different.

5. Blocking I/O in Critical Path

Performance Impact: Every user prompt now blocks for 1-2s waiting for Haiku filtering.

Current Flow:

User types prompt → 10ms → Claude responds

JIT Flow:

User types prompt → 10ms init → 1-2s Haiku filter → Claude responds

User Experience: We added 1-2 seconds of latency to every interaction for questionable benefit.

Alternative: If context filtering is valuable, do it asynchronously and apply to next prompt.
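The asynchronous alternative could look something like the sketch below: each prompt gets whatever context the previous prompt's filtering produced, and a fresh filter run starts in the background. `DeferredContext` and `FilterFn` are hypothetical names and nothing like this existed in the reverted code.

```typescript
// Hypothetical sketch: serve the last computed context immediately and refresh it
// in the background, so filtering latency never sits on the prompt's critical path.
type FilterFn = (prompt: string) => Promise<string[]>;

class DeferredContext {
  private latest: string[] = [];
  private seq = 0;
  private inFlight: Promise<void> = Promise.resolve();
  private readonly filter: FilterFn;

  constructor(filter: FilterFn) {
    this.filter = filter;
  }

  // Returns at once with the context produced for the previous prompt.
  contextFor(prompt: string): string[] {
    const snapshot = this.latest;
    const ticket = ++this.seq;
    this.inFlight = this.filter(prompt)
      .then((result) => {
        // Ignore results from an older prompt that finished late.
        if (ticket === this.seq) this.latest = result;
      })
      .catch(() => {
        // Keep the previous context if filtering fails.
      });
    return snapshot;
  }

  // Shutdown/test helper: wait for the most recent background refresh.
  async settled(): Promise<void> {
    await this.inFlight;
  }
}
```

The trade-off is staleness: the first prompt in a session gets no filtered context, and every later prompt sees context chosen for the one before it.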

6. Missing the Forest for the Trees

Real Issue: We focused on technical implementation without asking strategic questions:

Is context relevance actually a problem? No evidence.
Do users want this? No feedback requested.
Is 50 observations too many? Not proven.
Does filtering improve responses? Not tested.

Anti-Pattern: Solution in search of a problem.

What We Should Have Done

Option 1: Don't Build It

Justification: No validated user need. Current system works fine.

Next Step: Wait for user feedback indicating context relevance is an issue.

Option 2: Simple MVP

If we really wanted to explore this:

Week 1: Add basic filtering in worker with one-shot queries
- Accept slight performance hit (~500ms overhead)
- Measure filter accuracy and user impact
- Gather feedback
Week 2: If proven valuable, optimize
- Add token caching only if needed
- Consider persistent sessions only if performance is bottleneck
Week 3: If still valuable, scale
- Polish error handling
- Add configuration options
- Document patterns

Philosophy: Incremental validation, not big-bang architecture.

Option 3: Different Approach Entirely

Alternative: Pre-computed relevance scores

Instead of on-demand filtering:

Score observations at creation time (save-hook)
Store relevance embeddings in Chroma
At session start, query Chroma with user's first prompt
Load top 10-20 most relevant observations
No runtime latency, better accuracy, simpler architecture

Benefit: Leverages existing Chroma infrastructure, avoids runtime overhead.
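The ranking step in this option amounts to a nearest-neighbour lookup. The sketch below shows the idea with an in-memory cosine-similarity ranking. It deliberately does not call Chroma, and `EmbeddedObservation`, `cosine`, and `topK` are hypothetical names. In practice the vector store would perform this query itself.

```typescript
// Hypothetical in-memory stand-in for a vector query against stored embeddings.
interface EmbeddedObservation {
  id: number;
  embedding: number[];
}

// Cosine similarity; returns 0 when either vector has zero length.
function cosine(a: number[], b: number[]): number {
  let dot = 0;
  let normA = 0;
  let normB = 0;
  a.forEach((x, i) => {
    const y = b[i] ?? 0;
    dot += x * y;
    normA += x * x;
  });
  b.forEach((y) => {
    normB += y * y;
  });
  const denom = Math.sqrt(normA) * Math.sqrt(normB);
  return denom === 0 ? 0 : dot / denom;
}

// Return the ids of the k observations most similar to the query embedding.
function topK(query: number[], items: EmbeddedObservation[], k: number): number[] {
  return items
    .map((item) => ({ id: item.id, score: cosine(query, item.embedding) }))
    .sort((left, right) => right.score - left.score)
    .slice(0, k)
    .map((entry) => entry.id);
}
```

Since embeddings are computed once at save time, the only work left at session start is embedding the first prompt and running one query.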

Technical Lessons Learned

1. EventEmitter Coordination Anti-Pattern

Code:

queue.emitter.on('request', () => {
  // Wake up generator to process request
});

Issue: Complex async coordination using event-driven wakeup signals is hard to reason about.

Better: Use async queues or channels (e.g., async-queue package) that handle coordination internally.
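To make the "async queue" suggestion concrete, here is a minimal hand-rolled version. It is a sketch, not the API of any particular package, and `AsyncQueue` is a hypothetical class. It assumes a single consumer, which matches the one-generator-per-session shape described earlier.

```typescript
// Hypothetical minimal async queue: producers push, a single consumer awaits next().
// The wakeup logic lives inside the queue instead of in EventEmitter listeners.
class AsyncQueue<T> {
  private items: T[] = [];
  private waiters: Array<(item: T) => void> = [];

  push(item: T): void {
    const waiter = this.waiters.shift();
    if (waiter) {
      waiter(item); // hand off directly to a consumer that is already waiting
    } else {
      this.items.push(item); // otherwise buffer until someone asks
    }
  }

  next(): Promise<T> {
    if (this.items.length > 0) {
      return Promise.resolve(this.items.shift() as T);
    }
    return new Promise<T>((resolve) => {
      this.waiters.push(resolve);
    });
  }

  // Number of buffered items not yet consumed.
  get size(): number {
    return this.items.length;
  }
}
```

With this in place, the consumer loop reduces to repeatedly awaiting `queue.next()`, and there is no separate "request" event to keep in sync with the queue's contents.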

2. Generator Pattern Complexity

Pattern:

async *createJitMessageGenerator() {
  yield initialPrompt;
  while (!aborted) {
    await waitForEvent();  // Blocks here
    yield nextRequest;
  }
}

Tradeoff: Generators are great for iteration, but terrible for event-driven request/response patterns.

Better: Use explicit session object with sendMessage()/waitForResponse() methods.

3. Dual Session Management

Complexity: Managing two concurrent SDK sessions per user session is inherently complex.

Alternatives Considered:

Single session handling both observations and filtering (rejected: tight coupling)
Separate service for filtering (rejected: too much infrastructure)
Pre-computed filtering (not considered: should have been)

Lesson: When parallel state management feels hard, question whether you need parallel state.

4. Promise Queue Pattern

Implementation:

interface QueuedRequest<T> {
  resolve: (result: T) => void;
  reject: (error: Error) => void;
}
queue.push({ resolve, reject });
// Later: dequeue the oldest request before settling it
queue.shift()?.resolve(result);

Good: Clean async API for callers
Bad: Easy to leak promises if error handling isn't perfect
Improvement: Use libraries like p-queue that handle edge cases
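The leak risk comes from requests that are queued but never settled, for example when a session is aborted mid-flight. A hand-rolled guard against that might look like the sketch below. `PendingRequests` and its methods are hypothetical names and this is not how p-queue works internally.

```typescript
// Hypothetical sketch: track pending requests so none are left unsettled on shutdown.
interface Pending<T> {
  resolve: (value: T) => void;
  reject: (error: Error) => void;
}

class PendingRequests<T> {
  private queue: Pending<T>[] = [];

  // Register a request and return the promise the caller will await.
  enqueue(): Promise<T> {
    return new Promise<T>((resolve, reject) => {
      this.queue.push({ resolve, reject });
    });
  }

  // Settle the oldest request; returns false when nothing was waiting.
  resolveNext(value: T): boolean {
    const head = this.queue.shift();
    if (!head) return false;
    head.resolve(value);
    return true;
  }

  // Reject everything still waiting, e.g. when the session is aborted.
  rejectAll(reason: Error): number {
    const drained = this.queue.splice(0, this.queue.length);
    drained.forEach((entry) => entry.reject(reason));
    return drained.length;
  }
}
```

Calling `rejectAll` from the cleanup path means every caller gets an answer, even if that answer is an error.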

Process Lessons Learned

1. No Incremental Validation

Mistake: Went from "idea" to "complete architecture" without validation points.

Better Process:

Write one-pager explaining user value
Build simplest possible version (2 hours max)
Test with real usage
Measure impact
Decide: kill, iterate, or scale

Checkpoint Questions:

After 1 hour: "Does this solve a real problem?"
After 2 hours: "Is this getting too complex?"
After 3 hours: "Should I just ship the simple version?"

2. Architecture Astronomy

Definition: Designing elaborate systems without building/testing them.

Evidence:

265-line architecture doc written before any code
298-line session pattern parity analysis
Multiple complete rewrites of the same feature

Better: Code first, document later. Spike solutions, learn from implementation.

3. Sunk Cost Fallacy

Timeline:

Hour 1: "This seems complex but achievable"
Hour 2: "We're halfway done, can't stop now"
Hour 3: "Just need to fix this one coordination issue"
Hour 4: "It's working, but... this feels wrong"

Correct Decision: Revert. It took courage to throw away 3.5 hours of work.

Learning: Time invested is not a reason to continue. Quality of outcome matters more.

4. Missing User Feedback Loop

No User Input:

Didn't ask: "Is context relevance a problem for you?"
Didn't test: "Does filtered context improve your responses?"
Didn't measure: "Are you hitting context limits?"

Engineering Theater: Building impressive-sounding features without user validation.

What We Actually Learned (The Real Value)

Despite reverting, this was productive R&D:

1. Deep Understanding of Hook Architecture

Critical Discovery: Hooks run in a sandboxed environment without claudePath.

Hooks cannot call Agent SDK query() directly
All AI processing must happen in worker service
This architectural constraint is now documented

Learned Pattern:

Hook (orchestration) → Worker (AI processing)
✓ save-hook: Captures data → Worker processes with SDK
✓ new-hook: Creates session → Worker returns confirmation
✗ jit-hook: Tried SDK in hook → Failed, no claudePath

Value: Future features will avoid this mistake. We now know the boundary.

2. Worker Architecture Patterns

Blocking vs. Non-Blocking:

SessionStart: Can be non-blocking (context loads async)
UserPromptSubmit: Must be blocking (session must exist before processing)
JIT Context: Must be blocking (context needed before prompt processed)

Established Pattern:

// Worker endpoint for features requiring AI
app.post('/sessions/:id/operation', async (req, res) => {
  const { operationData } = req.body;
  const result = await sdkAgent.performOperation(operationData);
  return res.json({ result });
});

3. Persistent Session Management

Architecture Knowledge Gained:

How to maintain long-lived SDK sessions
EventEmitter coordination patterns for request/response
Promise queue management for async operations
Proper cleanup with AbortControllers

Pattern Documented:

Dual session management (regular + JIT)
Generator-based message loops
Request queuing with timeouts

Value: When we build the simpler version, we'll know these patterns.

4. Configuration Infrastructure

src/shared/settings.ts (65 lines) provides reusable configuration patterns:

export function getConfigValue(key: string, defaultValue: string): string {
  // Priority: settings.json → env var → default
}

Kept After Revert: This module is useful for other features.
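The body of getConfigValue is elided above, so the following is only a sketch of the stated priority order (settings.json → env var → default). `resolveConfigValue` and `StringMap` are hypothetical names. Passing the two sources in as plain objects, and treating an empty string as unset, are assumptions made to keep the example self-contained.

```typescript
// Hypothetical sketch of the lookup order: settings.json → env var → default.
// Both sources are passed in, so nothing here reads the file system or process.env.
type StringMap = Record<string, string | undefined>;

function resolveConfigValue(
  key: string,
  defaultValue: string,
  settings: StringMap,
  env: StringMap
): string {
  const fromSettings = settings[key];
  if (fromSettings !== undefined && fromSettings !== "") return fromSettings;

  const fromEnv = env[key];
  if (fromEnv !== undefined && fromEnv !== "") return fromEnv;

  return defaultValue;
}
```

For example, with a hypothetical key `JIT_ENABLED` set to `"true"` in the environment and absent from settings, the function returns `"true"`. With the key in both, the settings value wins.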

5. Key Architectural Decisions Made

Decisions that will guide future implementation:

JIT context filtering must happen in worker (proven via failed hook attempt)
Context must be blocking on UserPromptSubmit (session needs context before processing)
Dynamic timeline search is the right approach (fast, precise, leverages existing infrastructure)
Simple per-request queries should be tried before persistent sessions

6. Documentation Quality

jit-context-architecture-fix.md: Documents why hooks can't run SDK queries
session-pattern-parity.md: Reference for implementing dual sessions
Hooks reference: Comprehensive hook documentation added

Value: These docs help future contributors understand the system constraints.

7. Infrastructure Validation

Confirmed that our search stack is ready:

SQLite FTS5: Fast full-text search (<50ms)
ChromaDB: Semantic search (<200ms with 8,000+ vectors)
Timeline search API: Already implemented and tested
Worker service: Can handle synchronous AI operations

The infrastructure exists. We just need a simpler integration.

Recommendations

Immediate Actions

Archive the work:
- Keep failed/jit-context branch for reference
- Extract reusable components (settings.ts)
- Save architecture docs for future features
Document the anti-patterns:
- Add this post-mortem to CLAUDE.md references
- Update coding standards with lessons learned
Reset focus:
- Return to validated user needs
- Prioritize features with clear value propositions

Future Feature Development

Gating Questions (Answer before coding):

User Value: What specific user problem does this solve?
Evidence: Have users requested this or reported the underlying issue?
Measurement: How will we know if it's successful?
Simplicity: What's the dumbest version that could work?
Time Limit: If we can't prove value in 2 hours, should we build it?

Process:

VALIDATE → BUILD SIMPLE → TEST → MEASURE → DECIDE
   ↑                                          ↓
   └──────────── ITERATE OR KILL ────────────┘

If Context Filtering Returns

Should we revisit this idea in the future:

Prerequisites:

User feedback requesting better context relevance
Metrics showing current context is too broad
Evidence that filtering improves response quality

Simple Approach:

// In worker-service.ts /sessions/:id/init
if (jitEnabled) {
  const observations = await db.getRecentObservations(project, 50);
  const filtered = await simpleFilter(observations, userPrompt);  // One-shot query
  return { context: filtered };
}

Acceptance Criteria:

<100 lines of code
<500ms latency impact
No new session types
Degrades gracefully on errors
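Two of these criteria, the latency budget and graceful degradation, can be enforced by a single small wrapper. The sketch below is one possible shape. `filterOrFallback` is a hypothetical name, and falling back to the full unfiltered list is an assumption about what "degrades gracefully" should mean here.

```typescript
// Hypothetical sketch: try to filter within a latency budget. On timeout or error,
// return the unfiltered observations so behavior matches JIT being disabled.
async function filterOrFallback<T>(
  observations: T[],
  filter: (items: T[]) => Promise<T[]>,
  budgetMs: number
): Promise<T[]> {
  let timer: ReturnType<typeof setTimeout> | undefined;
  const budget = new Promise<T[]>((resolve) => {
    // Too slow: resolve with everything rather than rejecting.
    timer = setTimeout(() => resolve(observations), budgetMs);
  });
  try {
    return await Promise.race([filter(observations), budget]);
  } catch {
    // Filter failed: fall back to the unfiltered list.
    return observations;
  } finally {
    clearTimeout(timer);
  }
}
```

With `budgetMs` set to 500, the worst case for the user is the current behavior plus half a second.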

If that works: Then consider optimization.

Conclusion

JIT context filtering failed not because the vision was wrong, but because we jumped to the complex implementation without validating the simple one first. The feature aligns with long-term goals (dynamic, prompt-specific context using our fast search infrastructure), but the persistent-session architecture was premature optimization.

The right call: Revert the complex implementation. Build the simple version when ready.

Key Takeaway: The vision is sound. The execution was overcomplicated. We now have:

Deep knowledge of hook/worker architecture constraints
Documented patterns for persistent SDK sessions
Validated fast search infrastructure
Clear understanding of what to build next time (simple timeline search API integration)

This was R&D, not failure. We learned what doesn't work (SDK in hooks), what does work (worker-based AI processing), and how to approach it next time (simple API calls before persistent sessions).

Next Implementation: When we revisit this (and we should), start with:

Worker endpoint that accepts prompt
Queries existing timeline search API
Returns context
Hook injects context
Validate it improves responses
Then optimize if needed

Final Thought: Sometimes you have to build the wrong thing to understand the right thing. That's R&D.

Branch Status:

feature/jit-context: Abandoned
failed/jit-context: Archived for reference
main: Stable at v5.4.0

Files to Keep:

src/shared/settings.ts: Reusable config utilities

Files Discarded:

Everything else (+2,850 lines)

Emotional State: Relieved. Dodged a maintenance nightmare.
