From 5fdf25d60fb3a8f599585f6fcfe2b36b917f8229 Mon Sep 17 00:00:00 2001
From: Alex Newman <thedotmack@gmail.com>
Date: Sun, 9 Nov 2025 23:31:03 -0500
Subject: [PATCH] docs: Add JIT context filtering post-mortem
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

Comprehensive analysis of the 3.5-hour JIT context filtering experiment.
Documents architectural learnings, implementation attempts, and why the
complex persistent-session approach was reverted in favor of a simpler
future implementation.

Key learnings:
- Hooks cannot run Agent SDK queries (no claudePath access)
- All AI processing must happen in worker service
- Dynamic timeline search is the right approach
- Infrastructure (SQLite FTS5 + Chroma) validated and ready
- Simple per-request approach should be tried before optimization

The vision is sound; the execution was overcomplicated. This was productive
R&D that validated constraints and documented patterns for future work.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
---
 context/jit-context-postmortem.md | 616 ++++++++++++++++++++++++++++++
 1 file changed, 616 insertions(+)
 create mode 100644 context/jit-context-postmortem.md

diff --git a/context/jit-context-postmortem.md b/context/jit-context-postmortem.md
new file mode 100644
index 00000000..d58fa03b
--- /dev/null
+++ b/context/jit-context-postmortem.md
@@ -0,0 +1,616 @@
+# JIT Context Filtering: Post-Mortem
+
+**Date:** November 9, 2025
+**Duration:** 3.5 hours (7:45 PM - 11:11 PM)
+**Branches:** `feature/jit-context`, `failed/jit-context`
+**Status:** Failed, reverted to main
+**Commits:**
+- `3ac0790` - feat: Implement JIT context hook for user prompt submission
+- `adf7bf4` - Refactor JIT context handling in SDKAgent and WorkerService
+
+## Executive Summary
+
+Attempted to implement JIT (Just-In-Time) context filtering—a feature that would dynamically generate relevant context timelines on every user prompt, potentially replacing the static session-start context entirely. After multiple architectural iterations spanning 3.5 hours and adding ~2,850 lines of code, the implementation was abandoned and reverted. The revert was not due to lack of vision (the feature aligns with long-term architectural goals), but due to implementation complexity and the need for a simpler initial approach. Significant architectural knowledge was gained about hook limitations, worker patterns, and proper separation of concerns.
+
+## What We Tried to Build
+
+### Goal
+When a user submits a prompt, dynamically generate a relevant context timeline instead of the static session-start context. Use the fast search infrastructure (SQLite FTS5 + ChromaDB) to fetch precisely relevant context on-demand.
+
+### The Vision
+**Current approach:** SessionStart hook loads 50 recent observations blindly, displays them all.
+
+**Proposed approach:** UserPromptSubmit hook analyzes the prompt, queries the timeline search API, and loads only the relevant context window dynamically.
+
+**Why this makes sense:**
+- We already have fast search: SQLite FTS5 + Chroma semantic search
+- Dynamic context timeline search is implemented and tested
+- Search results come back in <200ms
+- Could **replace** session-start context entirely with smarter, prompt-specific context
+
+### User Experience
+```
+User types: "How did we fix the authentication bug?"
+
+Behind the scenes:
+1. Analyze prompt: "authentication bug fix"
+2. Query timeline search for relevant period
+3. Load 5-10 observations from that specific timeline
+4. Inject as context
+5. Claude answers with precisely relevant historical context
+
+vs. Current:
+Load 50 most recent observations regardless of relevance
+```
+
+### Why Checkbox Settings Became Less Important
+The original request was for checkboxes to customize the session-start context display. If JIT context could replace session-start context with intelligent, prompt-specific timelines, that display customization would become a non-issue.
+
+## Architectural Attempts
+
+### Attempt 1: Hook-Based Filtering (7:45 PM - 9:30 PM)
+
+**Approach:** Call Agent SDK `query()` directly in `new-hook.ts` during UserPromptSubmit event.
+
+**Implementation:**
+- Created `jit-context-hook.ts` (~432 lines)
+- Added `generateJitContext()` function in hook
+- Called SDK `query()` with observation list and user prompt
+- Expected hook to block for ~1-2s while Haiku filters
+
+**Failure:**
+```
+Error: Claude Code executable not found at
+/Users/alexnewman/.claude/plugins/marketplaces/thedotmack/plugin/scripts/cli.js
+```
+
+**Root Cause:** Hooks run in a sandboxed environment without access to `claudePath` (the path to the Claude Code executable). The Agent SDK requires this path, which is only available in the worker service.
+
+**Architectural Violation:** This broke the established pattern where hooks handle orchestration and workers handle AI processing. The `save-hook` sets the precedent: hooks capture data, send to worker, worker runs SDK queries asynchronously.
+
+### Attempt 2: Worker-Based with Simple Queries (9:30 PM - 10:30 PM)
+
+**Approach:** Move JIT filtering to worker service, keep it simple with per-request SDK queries.
+
+**Implementation:**
+- Documented architecture fix plan in `docs/jit-context-architecture-fix.md`
+- Moved `generateJitContext()` to worker (considered creating `src/services/worker/JitContext.ts`)
+- Modified `/sessions/:id/init` endpoint to accept `jitEnabled` flag
+- Worker would run one-shot SDK query per prompt
+
+**Architecture:**
+```
+UserPromptSubmit → new-hook → POST /sessions/:id/init { jitEnabled: true }
+                                      ↓
+                              Worker spawns Claude Haiku
+                                      ↓
+                              Filters 50 obs → 3-5 IDs
+                                      ↓
+                              Returns { context: [...] }
+                                      ↓
+                              Hook injects context → Claude
+```
+
+**Issues Identified:**
+- Each filter request spawns a new Claude subprocess (~200-500ms overhead)
+- Observation list re-sent on every prompt (~5-10KB per request)
+- No token caching between requests
+- Performance worse than just loading all observations directly
+
+**Decision:** Pivoted to persistent sessions to solve performance issues.
+
+### Attempt 3: Persistent JIT Sessions (10:30 PM - 11:11 PM)
+
+**Approach:** Create a long-lived Agent SDK session that persists throughout the user session, similar to the main memory session pattern.
+
+**Implementation (291 new lines in SDKAgent.ts):**
+
+1. **Session Lifecycle:**
+   - Added `jitSessionId`, `jitAbortController`, `jitGeneratorPromise` to `ActiveSession` interface
+   - `startJitSession()`: Creates persistent SDK session at session init
+   - `cleanupJitSession()`: Terminates JIT session at session end
+
+2. **Request Queue Architecture:**
+   - `jitFilterQueues` Map: Per-session request queues
+   - `JITFilterRequest` interface: `{ userPrompt, resolve, reject }`
+   - EventEmitter coordination: Wake generator when new requests arrive
+
+3. **Message Generator Pattern:**
+   - `createJitMessageGenerator()`: Async generator that yields filter requests
+   - Initial prompt: Load 50 observations, wait for "READY" response
+   - Loop: Wait for EventEmitter signal → yield user prompt → parse response → resolve promise
+   - Pattern: Persistent session stays alive between requests
+
+4. **Filter Query Flow:**
+   ```typescript
+   runFilterQuery(sessionDbId, userPrompt) {
+     // Queue the request; the generator resolves or rejects it later
+     const pending = new Promise((resolve, reject) => {
+       queue.requests.push({ userPrompt, resolve, reject });
+       queue.emitter.emit('request');
+     });
+
+     // Wait for response (30s timeout)
+     return Promise.race([pending, timeout(30000)]);
+   }
+   ```
+
+5. **Response Processing:**
+   - `processJitFilterResponse()`: Accumulate streaming text
+   - Parse IDs: "1,5,23,41" or "NONE"
+   - Resolve queued promise with ID array
+
+**Changed Files:**
+- `src/services/worker/SDKAgent.ts`: +291 lines
+- `src/services/worker-types.ts`: +3 fields (jit state tracking)
+- `src/services/worker/SessionManager.ts`: +26 lines (JIT cleanup)
+- `src/services/worker-service.ts`: +102 lines (JIT initialization)
+- `src/shared/settings.ts`: +65 lines (JIT config)
+- `src/hooks/jit-context-hook.ts`: +208 lines (orchestration)
+- `docs/jit-context-architecture-fix.md`: +265 lines
+- `context/session-pattern-parity.md`: +298 lines
+
+**Total Changes:** 18 files, +2,852 lines, -133 lines
+
+**Final Status at Revert:** Implementation was complete and likely functional, but...
+
+## Why It Failed
+
+### 1. Architectural Complexity Explosion
+
+**Problem:** The persistent session pattern added enormous complexity for marginal benefit.
+
+**Evidence:**
+- Parallel session management: Regular + JIT sessions running concurrently
+- Complex coordination: EventEmitter + promise queues + generator pattern
+- Lifecycle coupling: Session init, request handling, cleanup all intertwined
+- State explosion: 3 new fields per session (`jitSessionId`, `jitAbortController`, `jitGeneratorPromise`)
+
+**Code Smell:** When the "optimization" requires 300 lines of coordination code, it's probably not an optimization.
+
+### 2. Premature Optimization
+
+**YAGNI Violation:** Built elaborate token caching and persistent session architecture before proving the feature provided value.
+
+**Reality Check:**
+- **Current approach:** Load 50 observations = ~25KB context, works fine
+- **JIT overhead:** Haiku query = 1-2s latency + coordination complexity
+- **User benefit:** Unclear—users haven't complained about context relevance
+- **Token savings:** Marginal—Claude caches long contexts efficiently anyway
+
+**Quote from CLAUDE.md:**
+> "Write the dumb, obvious thing first. Add complexity only when you actually hit the problem."
+
+We didn't hit a problem. We invented one.
+
+### 3. Implementation Complexity, Not Vision
+
+**The Vision is Sound:**
+- Dynamic context is better than static context
+- Timeline search API exists and is fast
+- Infrastructure (SQLite + Chroma) can support this
+- Replacing session-start context with prompt-specific context makes sense
+
+**The Problem:**
+We jumped to the complex persistent-session approach without trying the simple per-request approach first.
+
+**What We Should Have Done:**
+```typescript
+// Simple version (not tried):
+app.post('/sessions/:id/init', async (req, res) => {
+  const { userPrompt } = req.body;
+
+  // Query timeline search API (already exists, fast)
+  const timeline = await timelineSearch(project, userPrompt, 10); // depth = 10
+
+  // Return observations
+  return res.json({ context: timeline });
+});
+```
+
+**This would have:**
+- Validated the feature's value quickly
+- Used existing infrastructure
+- Avoided all the persistence complexity
+- Taken 30 minutes instead of 3.5 hours
+
+### 4. Pattern Divergence
+
+**Inconsistency:** JIT sessions work fundamentally differently from memory sessions.
+
+**Memory Session Pattern:**
+```typescript
+// One-shot: Init → Process observations → Complete
+startSession() → yield prompts → parse responses → complete
+```
+
+**JIT Session Pattern:**
+```typescript
+// Persistent: Init → Wait indefinitely → Process on-demand → Complete
+startJitSession() → yield initial load → LOOP:
+  - Wait for EventEmitter signal
+  - Yield filter request
+  - Parse response
+  - Resolve promise
+  - GOTO LOOP
+```
+
+**Maintenance Burden:** Two completely different session patterns means:
+- Doubled testing complexity
+- Increased cognitive load for contributors
+- Higher risk of subtle bugs in lifecycle management
+
+**Session Pattern Parity Document:** The 298-line `session-pattern-parity.md` was created to document the differences—a sign that maybe they shouldn't be different.
+
+### 5. Blocking I/O in Critical Path
+
+**Performance Impact:** Every user prompt now blocks for 1-2s waiting for Haiku filtering.
+
+**Current Flow:**
+```
+User types prompt → 10ms → Claude responds
+```
+
+**JIT Flow:**
+```
+User types prompt → 10ms init → 1-2s Haiku filter → Claude responds
+```
+
+**User Experience:** We added 1-2 seconds of latency to every interaction for questionable benefit.
+
+**Alternative:** If context filtering is valuable, do it asynchronously and apply to next prompt.
+
+### 6. Missing the Forest for the Trees
+
+**Real Issue:** We focused on technical implementation without asking strategic questions:
+
+- **Is context relevance actually a problem?** No evidence.
+- **Do users want this?** No feedback requested.
+- **Is 50 observations too many?** Not proven.
+- **Does filtering improve responses?** Not tested.
+
+**Anti-Pattern:** Solution in search of a problem.
+
+## What We Should Have Done
+
+### Option 1: Don't Build It
+
+**Justification:** No validated user need. Current system works fine.
+
+**Next Step:** Wait for user feedback indicating context relevance is an issue.
+
+### Option 2: Simple MVP
+
+If we really wanted to explore this:
+
+1. **Week 1:** Add basic filtering in worker with one-shot queries
+   - Accept slight performance hit (~500ms overhead)
+   - Measure filter accuracy and user impact
+   - Gather feedback
+
+2. **Week 2:** If proven valuable, optimize
+   - Add token caching only if needed
+   - Consider persistent sessions only if performance is a bottleneck
+
+3. **Week 3:** If still valuable, scale
+   - Polish error handling
+   - Add configuration options
+   - Document patterns
+
+**Philosophy:** Incremental validation, not big-bang architecture.
+
+### Option 3: Different Approach Entirely
+
+**Alternative:** Pre-computed relevance scores
+
+Instead of on-demand filtering:
+- Score observations at creation time (save-hook)
+- Store relevance embeddings in Chroma
+- At session start, query Chroma with user's first prompt
+- Load top 10-20 most relevant observations
+- No runtime latency, better accuracy, simpler architecture
+
+**Benefit:** Leverages existing Chroma infrastructure, avoids runtime overhead.
+
+## Technical Lessons Learned
+
+### 1. EventEmitter Coordination Anti-Pattern
+
+**Code:**
+```typescript
+queue.emitter.on('request', () => {
+  // Wake up generator to process request
+});
+```
+
+**Issue:** Complex async coordination using event-driven wakeup signals is hard to reason about.
+
+**Better:** Use async queues or channels (e.g., `async-queue` package) that handle coordination internally.
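+
+A minimal sketch of that idea, assuming a single consumer: `push()` hands an item straight to a waiting `next()` call, so no separate wakeup event is needed. `AsyncQueue` and its method names are hypothetical; they are not taken from the reverted code or from any particular package.
+
+```typescript
+// Hypothetical single-consumer async queue (illustration only).
+class AsyncQueue<T> {
+  private items: T[] = [];
+  private waiters: Array<(item: T) => void> = [];
+
+  // Hand the item to a waiting consumer, or buffer it until next() is called.
+  push(item: T): void {
+    const waiter = this.waiters.shift();
+    if (waiter) {
+      waiter(item);
+    } else {
+      this.items.push(item);
+    }
+  }
+
+  // Resolve immediately if an item is buffered, otherwise wait for push().
+  next(): Promise<T> {
+    if (this.items.length > 0) {
+      return Promise.resolve(this.items.shift() as T);
+    }
+    return new Promise<T>((resolve) => this.waiters.push(resolve));
+  }
+}
+```
+
+With a queue like this, the generator loop could in principle reduce to awaiting `queue.next()` and yielding the result, though that was not tried during the experiment.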
+
+### 2. Generator Pattern Complexity
+
+**Pattern:**
+```typescript
+async *createJitMessageGenerator() {
+  yield initialPrompt;
+  while (!aborted) {
+    await waitForEvent();  // Blocks here
+    yield nextRequest;
+  }
+}
+```
+
+**Tradeoff:** Generators are great for iteration, but terrible for event-driven request/response patterns.
+
+**Better:** Use explicit session object with `sendMessage()/waitForResponse()` methods.
+
+### 3. Dual Session Management
+
+**Complexity:** Managing two concurrent SDK sessions per user session is inherently complex.
+
+**Alternatives Considered:**
+- Single session handling both observations and filtering (rejected: tight coupling)
+- Separate service for filtering (rejected: too much infrastructure)
+- Pre-computed filtering (not considered: should have been)
+
+**Lesson:** When parallel state management feels hard, question whether you need parallel state.
+
+### 4. Promise Queue Pattern
+
+**Implementation:**
+```typescript
+interface QueuedRequest<T> {
+  resolve: (result: T) => void;
+  reject: (error: Error) => void;
+}
+queue.push({ resolve, reject });
+// Later...
+queue[0].resolve(result);
+```
+
+**Good:** Clean async API for callers
+**Bad:** Easy to leak promises if error handling isn't perfect
+**Improvement:** Use libraries like `p-queue` that handle edge cases
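+
+One concrete leak in this pattern: the timeout fires, the caller gives up, and the stale entry stays in the queue. The sketch below is an illustration, not the reverted implementation. It clears the timer when a request settles and removes timed-out entries from the queue. `PendingRequest` and `enqueueWithTimeout` are hypothetical names.
+
+```typescript
+// Hypothetical helper (illustration only): a queued request that cleans up after itself.
+interface PendingRequest<T> {
+  userPrompt: string;
+  resolve: (result: T) => void;
+  reject: (error: Error) => void;
+}
+
+function enqueueWithTimeout<T>(
+  queue: PendingRequest<T>[],
+  userPrompt: string,
+  timeoutMs: number,
+): Promise<T> {
+  return new Promise<T>((resolve, reject) => {
+    const entry: PendingRequest<T> = {
+      userPrompt,
+      // Clear the timer on either outcome so nothing fires after settlement.
+      resolve: (result) => { clearTimeout(timer); resolve(result); },
+      reject: (error) => { clearTimeout(timer); reject(error); },
+    };
+    const timer = setTimeout(() => {
+      // Remove the stale entry so the consumer never picks it up later.
+      const index = queue.indexOf(entry);
+      if (index !== -1) queue.splice(index, 1);
+      reject(new Error(`filter request timed out after ${timeoutMs}ms`));
+    }, timeoutMs);
+    queue.push(entry);
+  });
+}
+```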
+
+## Process Lessons Learned
+
+### 1. No Incremental Validation
+
+**Mistake:** Went from "idea" to "complete architecture" without validation points.
+
+**Better Process:**
+1. Write one-pager explaining user value
+2. Build simplest possible version (2 hours max)
+3. Test with real usage
+4. Measure impact
+5. Decide: kill, iterate, or scale
+
+**Checkpoint Questions:**
+- After 1 hour: "Does this solve a real problem?"
+- After 2 hours: "Is this getting too complex?"
+- After 3 hours: "Should I just ship the simple version?"
+
+### 2. Architecture Astronomy
+
+**Definition:** Designing elaborate systems without building/testing them.
+
+**Evidence:**
+- 265-line architecture doc written before any worker-side code
+- 298-line session pattern parity analysis
+- Multiple complete rewrites of the same feature
+
+**Better:** Code first, document later. Spike solutions, learn from implementation.
+
+### 3. Sunk Cost Fallacy
+
+**Timeline:**
+- **Hour 1:** "This seems complex but achievable"
+- **Hour 2:** "We're halfway done, can't stop now"
+- **Hour 3:** "Just need to fix this one coordination issue"
+- **Hour 4:** "It's working, but... this feels wrong"
+
+**Correct Decision:** Revert. It took courage to throw away 3.5 hours of work.
+
+**Learning:** Time invested is not a reason to continue. Quality of outcome matters more.
+
+### 4. Missing User Feedback Loop
+
+**No User Input:**
+- Didn't ask: "Is context relevance a problem for you?"
+- Didn't test: "Does filtered context improve your responses?"
+- Didn't measure: "Are you hitting context limits?"
+
+**Engineering Theater:** Building impressive-sounding features without user validation.
+
+## What We Actually Learned (The Real Value)
+
+Despite reverting, this was productive R&D:
+
+### 1. Deep Understanding of Hook Architecture
+
+**Critical Discovery:** Hooks run in a sandboxed environment without `claudePath`.
+- Hooks cannot call Agent SDK `query()` directly
+- All AI processing must happen in worker service
+- This architectural constraint is now documented
+
+**Learned Pattern:**
+```
+Hook (orchestration) → Worker (AI processing)
+✓ save-hook: Captures data → Worker processes with SDK
+✓ new-hook: Creates session → Worker returns confirmation
+✗ jit-hook: Tried SDK in hook → Failed, no claudePath
+```
+
+**Value:** Future features will avoid this mistake. We now know the boundary.
+
+### 2. Worker Architecture Patterns
+
+**Blocking vs. Non-Blocking:**
+- SessionStart: Can be non-blocking (context loads async)
+- UserPromptSubmit: Must be blocking (session must exist before processing)
+- JIT Context: Must be blocking (context needed before prompt processed)
+
+**Established Pattern:**
+```typescript
+// Worker endpoint for features requiring AI
+app.post('/sessions/:id/operation', async (req, res) => {
+  const { operationData } = req.body;
+  const result = await sdkAgent.performOperation(operationData);
+  return res.json({ result });
+});
+```
+
+### 3. Persistent Session Management
+
+**Architecture Knowledge Gained:**
+- How to maintain long-lived SDK sessions
+- EventEmitter coordination patterns for request/response
+- Promise queue management for async operations
+- Proper cleanup with AbortControllers
+
+**Pattern Documented:**
+- Dual session management (regular + JIT)
+- Generator-based message loops
+- Request queuing with timeouts
+
+**Value:** When we build the simpler version, we'll know these patterns.
+
+### 4. Configuration Infrastructure
+
+`src/shared/settings.ts` (65 lines) provides reusable configuration patterns:
+```typescript
+export function getConfigValue(key: string, defaultValue: string): string {
+  // Priority: settings.json → env var → default
+}
+```
+
+**Kept After Revert:** This module is useful for other features.
+
+### 5. Key Architectural Decisions Made
+
+**Decisions that will guide future implementation:**
+1. JIT context filtering must happen in worker (proven via failed hook attempt)
+2. Context must be blocking on UserPromptSubmit (session needs context before processing)
+3. Dynamic timeline search is the right approach (fast, precise, leverages existing infrastructure)
+4. Simple per-request queries should be tried before persistent sessions
+
+### 6. Documentation Quality
+
+- `jit-context-architecture-fix.md`: Documents why hooks can't run SDK queries
+- `session-pattern-parity.md`: Reference for implementing dual sessions
+- Hooks reference: Comprehensive hook documentation added
+
+**Value:** These docs help future contributors understand the system constraints.
+
+### 7. Infrastructure Validation
+
+**Confirmed that our search stack is ready:**
+- SQLite FTS5: Fast full-text search (<50ms)
+- ChromaDB: Semantic search (<200ms with 8,000+ vectors)
+- Timeline search API: Already implemented and tested
+- Worker service: Can handle synchronous AI operations
+
+**The infrastructure exists. We just need a simpler integration.**
+
+## Recommendations
+
+### Immediate Actions
+
+1. **Archive the work:**
+   - Keep `failed/jit-context` branch for reference
+   - Extract reusable components (settings.ts)
+   - Save architecture docs for future features
+
+2. **Document the anti-patterns:**
+   - Add this post-mortem to CLAUDE.md references
+   - Update coding standards with lessons learned
+
+3. **Reset focus:**
+   - Return to validated user needs
+   - Prioritize features with clear value propositions
+
+### Future Feature Development
+
+**Gating Questions (Answer before coding):**
+
+1. **User Value:** What specific user problem does this solve?
+2. **Evidence:** Have users requested this or reported the underlying issue?
+3. **Measurement:** How will we know if it's successful?
+4. **Simplicity:** What's the dumbest version that could work?
+5. **Time Limit:** If we can't prove value in 2 hours, should we build it?
+
+**Process:**
+
+```
+VALIDATE → BUILD SIMPLE → TEST → MEASURE → DECIDE
+   ↑                                          ↓
+   └──────────── ITERATE OR KILL ────────────┘
+```
+
+### If Context Filtering Returns
+
+Should we revisit this idea in the future:
+
+**Prerequisites:**
+- User feedback requesting better context relevance
+- Metrics showing current context is too broad
+- Evidence that filtering improves response quality
+
+**Simple Approach:**
+```typescript
+// In worker-service.ts /sessions/:id/init
+if (jitEnabled) {
+  const observations = await db.getRecentObservations(project, 50);
+  const filtered = await simpleFilter(observations, userPrompt);  // One-shot query
+  return { context: filtered };
+}
+```
+
+**Acceptance Criteria:**
+- <100 lines of code
+- <500ms latency impact
+- No new session types
+- Degrades gracefully on errors
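+
+The latency and graceful-degradation criteria could be met with a small wrapper along these lines. This is a sketch under assumptions: `Observation`, `FilterFn`, and `filterWithFallback` are hypothetical names, and the real `simpleFilter` would be passed in as `filter`. On error, or when the budget elapses, it returns the unfiltered list, which matches today's behavior.
+
+```typescript
+// Hypothetical sketch: fall back to the unfiltered list on error or timeout.
+interface Observation {
+  id: number;
+  text: string;
+}
+
+type FilterFn = (observations: Observation[], userPrompt: string) => Promise<Observation[]>;
+
+async function filterWithFallback(
+  observations: Observation[],
+  userPrompt: string,
+  filter: FilterFn,
+  budgetMs = 500,
+): Promise<Observation[]> {
+  let timer: ReturnType<typeof setTimeout> | undefined;
+  const budget = new Promise<Observation[]>((resolve) => {
+    // When the budget elapses, behave like the current static context.
+    timer = setTimeout(() => resolve(observations), budgetMs);
+  });
+  try {
+    return await Promise.race([filter(observations, userPrompt), budget]);
+  } catch {
+    // Any filter failure degrades to the unfiltered list.
+    return observations;
+  } finally {
+    if (timer !== undefined) clearTimeout(timer);
+  }
+}
+```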
+
+**If that works:** Then consider optimization.
+
+## Conclusion
+
+JIT context filtering failed not because the vision was wrong, but because we jumped to the complex implementation without validating the simple one first. The feature aligns with long-term goals (dynamic, prompt-specific context using our fast search infrastructure), but the persistent-session architecture was premature optimization.
+
+**The right call:** Revert the complex implementation. Build the simple version when ready.
+
+**Key Takeaway:** The vision is sound. The execution was overcomplicated. We now have:
+- Deep knowledge of hook/worker architecture constraints
+- Documented patterns for persistent SDK sessions
+- Validated fast search infrastructure
+- Clear understanding of what to build next time (simple timeline search API integration)
+
+**This was R&D, not failure.** We learned what doesn't work (SDK in hooks), what does work (worker-based AI processing), and how to approach it next time (simple API calls before persistent sessions).
+
+**Next Implementation:**
+When we revisit this (and we should), start with:
+1. Worker endpoint that accepts prompt
+2. Queries existing timeline search API
+3. Returns context
+4. Hook injects context
+5. Validate it improves responses
+6. Then optimize if needed
+
+**Final Thought:** Sometimes you have to build the wrong thing to understand the right thing. That's R&D.
+
+---
+
+**Branch Status:**
+- `feature/jit-context`: Abandoned
+- `failed/jit-context`: Archived for reference
+- `main`: Stable at v5.4.0
+
+**Files to Keep:**
+- `src/shared/settings.ts`: Reusable config utilities
+
+**Files Discarded:**
+- Everything else (+2,850 lines)
+
+**Emotional State:** Relieved. Dodged a maintenance nightmare.