From 5fdf25d60fb3a8f599585f6fcfe2b36b917f8229 Mon Sep 17 00:00:00 2001 From: Alex Newman Date: Sun, 9 Nov 2025 23:31:03 -0500 Subject: [PATCH] docs: Add JIT context filtering post-mortem MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Comprehensive analysis of the 3.5-hour JIT context filtering experiment. Documents architectural learnings, implementation attempts, and why the complex persistent-session approach was reverted in favor of a simpler future implementation. Key learnings: - Hooks cannot run Agent SDK queries (no claudePath access) - All AI processing must happen in worker service - Dynamic timeline search is the right approach - Infrastructure (SQLite FTS5 + Chroma) validated and ready - Simple per-request approach should be tried before optimization The vision is sound, execution was overcomplicated. This was productive R&D that validated constraints and documented patterns for future work. πŸ€– Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude --- context/jit-context-postmortem.md | 616 ++++++++++++++++++++++++++++++ 1 file changed, 616 insertions(+) create mode 100644 context/jit-context-postmortem.md diff --git a/context/jit-context-postmortem.md b/context/jit-context-postmortem.md new file mode 100644 index 00000000..d58fa03b --- /dev/null +++ b/context/jit-context-postmortem.md @@ -0,0 +1,616 @@ +# JIT Context Filtering: Post-Mortem + +**Date:** November 9, 2025 +**Duration:** 3.5 hours (7:45 PM - 11:11 PM) +**Branches:** `feature/jit-context`, `failed/jit-context` +**Status:** Failed, reverted to main +**Commits:** +- `3ac0790` - feat: Implement JIT context hook for user prompt submission +- `adf7bf4` - Refactor JIT context handling in SDKAgent and WorkerService + +## Executive Summary + +Attempted to implement JIT (Just-In-Time) context filteringβ€”a feature that would dynamically generate relevant context timelines on every user prompt, potentially replacing the static session-start context entirely. After multiple architectural iterations spanning 3.5 hours and adding ~2,850 lines of code, the implementation was abandoned and reverted. The revert was not due to lack of vision (the feature aligns with long-term architectural goals), but due to implementation complexity and the need for a simpler initial approach. Significant architectural knowledge was gained about hook limitations, worker patterns, and proper separation of concerns. + +## What We Tried to Build + +### Goal +When a user submits a prompt, dynamically generate a relevant context timeline instead of the static session-start context. Use the fast search infrastructure (SQLite FTS5 + ChromaDB) to fetch precisely relevant context on-demand. + +### The Vision +**Current approach:** SessionStart hook loads 50 recent observations blindly, displays them all. + +**Proposed approach:** UserPromptSubmit hook analyzes the prompt, queries the timeline search API, and loads only the relevant context window dynamically. + +**Why this makes sense:** +- We already have fast search: SQLite FTS5 + Chroma semantic search +- Dynamic context timeline search is implemented and tested +- Search results come back in <200ms +- Could **replace** session-start context entirely with smarter, prompt-specific context + +### User Experience +``` +User types: "How did we fix the authentication bug?" + +Behind the scenes: +1. Analyze prompt: "authentication bug fix" +2. Query timeline search for relevant period +3. Load 5-10 observations from that specific timeline +4. Inject as context +5. Claude answers with precisely relevant historical context + +vs. Current: +Load 50 most recent observations regardless of relevance +``` + +### Why Checkbox Settings Became Less Important +Originally asked for checkboxes to customize session-start context display. But if JIT context could replace session-start context with intelligent, prompt-specific timelines, the display customization became a non-issue. + +## Architectural Attempts + +### Attempt 1: Hook-Based Filtering (7:45 PM - 9:30 PM) + +**Approach:** Call Agent SDK `query()` directly in `new-hook.ts` during UserPromptSubmit event. + +**Implementation:** +- Created `jit-context-hook.ts` (~432 lines) +- Added `generateJitContext()` function in hook +- Called SDK `query()` with observation list and user prompt +- Expected hook to block for ~1-2s while Haiku filters + +**Failure:** +``` +Error: Claude Code executable not found at +/Users/alexnewman/.claude/plugins/marketplaces/thedotmack/plugin/scripts/cli.js +``` + +**Root Cause:** Hooks run in sandboxed environment without access to `claudePath` (path to Claude Code executable). The Agent SDK requires this path, which is only available in the worker service. + +**Architectural Violation:** This broke the established pattern where hooks handle orchestration and workers handle AI processing. The `save-hook` sets the precedent: hooks capture data, send to worker, worker runs SDK queries asynchronously. + +### Attempt 2: Worker-Based with Simple Queries (9:30 PM - 10:30 PM) + +**Approach:** Move JIT filtering to worker service, keep it simple with per-request SDK queries. + +**Implementation:** +- Documented architecture fix plan in `docs/jit-context-architecture-fix.md` +- Moved `generateJitContext()` to worker (considered creating `src/services/worker/JitContext.ts`) +- Modified `/sessions/:id/init` endpoint to accept `jitEnabled` flag +- Worker would run one-shot SDK query per prompt + +**Architecture:** +``` +UserPromptSubmit β†’ new-hook β†’ POST /sessions/:id/init { jitEnabled: true } + ↓ + Worker spawns Claude Haiku + ↓ + Filters 50 obs β†’ 3-5 IDs + ↓ + Returns { context: [...] } + ↓ + Hook injects context β†’ Claude +``` + +**Issues Identified:** +- Each filter request spawns a new Claude subprocess (~200-500ms overhead) +- Observation list re-sent on every prompt (~5-10KB per request) +- No token caching between requests +- Performance worse than just loading all observations directly + +**Decision:** Pivoted to persistent sessions to solve performance issues. + +### Attempt 3: Persistent JIT Sessions (10:30 PM - 11:11 PM) + +**Approach:** Create a long-lived Agent SDK session that persists throughout user session, similar to main memory session pattern. + +**Implementation (291 new lines in SDKAgent.ts):** + +1. **Session Lifecycle:** + - Added `jitSessionId`, `jitAbortController`, `jitGeneratorPromise` to `ActiveSession` interface + - `startJitSession()`: Creates persistent SDK session at session init + - `cleanupJitSession()`: Terminates JIT session at session end + +2. **Request Queue Architecture:** + - `jitFilterQueues` Map: Per-session request queues + - `JITFilterRequest` interface: `{ userPrompt, resolve, reject }` + - EventEmitter coordination: Wake generator when new requests arrive + +3. **Message Generator Pattern:** + - `createJitMessageGenerator()`: Async generator that yields filter requests + - Initial prompt: Load 50 observations, wait for "READY" response + - Loop: Wait for EventEmitter signal β†’ yield user prompt β†’ parse response β†’ resolve promise + - Pattern: Persistent session stays alive between requests + +4. **Filter Query Flow:** + ```typescript + runFilterQuery(sessionDbId, userPrompt) { + // Queue request + queue.requests.push({ userPrompt, resolve, reject }); + queue.emitter.emit('request'); + + // Wait for response (30s timeout) + return Promise.race([ + new Promise((resolve, reject) => { /* queued */ }), + timeout(30000) + ]); + } + ``` + +5. **Response Processing:** + - `processJitFilterResponse()`: Accumulate streaming text + - Parse IDs: "1,5,23,41" or "NONE" + - Resolve queued promise with ID array + +**Added Files:** +- `src/services/worker/SDKAgent.ts`: +291 lines +- `src/services/worker-types.ts`: +3 fields (jit state tracking) +- `src/services/worker/SessionManager.ts`: +26 lines (JIT cleanup) +- `src/services/worker-service.ts`: +102 lines (JIT initialization) +- `src/shared/settings.ts`: +65 lines (JIT config) +- `src/hooks/jit-context-hook.ts`: +208 lines (orchestration) +- `docs/jit-context-architecture-fix.md`: +265 lines +- `context/session-pattern-parity.md`: +298 lines + +**Total Changes:** 18 files, +2,852 lines, -133 lines + +**Final Status at Revert:** Implementation was complete and likely functional, but... + +## Why It Failed + +### 1. Architectural Complexity Explosion + +**Problem:** The persistent session pattern added enormous complexity for marginal benefit. + +**Evidence:** +- Parallel session management: Regular + JIT sessions running concurrently +- Complex coordination: EventEmitter + promise queues + generator pattern +- Lifecycle coupling: Session init, request handling, cleanup all intertwined +- State explosion: 3 new fields per session (`jitSessionId`, `jitAbortController`, `jitGeneratorPromise`) + +**Code Smell:** When the "optimization" requires 300 lines of coordination code, it's probably not an optimization. + +### 2. Premature Optimization + +**YAGNI Violation:** Built elaborate token caching and persistent session architecture before proving the feature provided value. + +**Reality Check:** +- **Current approach:** Load 50 observations = ~25KB context, works fine +- **JIT overhead:** Haiku query = 1-2s latency + coordination complexity +- **User benefit:** Unclearβ€”users haven't complained about context relevance +- **Token savings:** Marginalβ€”Claude caches long contexts efficiently anyway + +**Quote from CLAUDE.md:** +> "Write the dumb, obvious thing first. Add complexity only when you actually hit the problem." + +We didn't hit a problem. We invented one. + +### 3. Implementation Complexity, Not Vision + +**The Vision is Sound:** +- Dynamic context is better than static context +- Timeline search API exists and is fast +- Infrastructure (SQLite + Chroma) can support this +- Replacing session-start context with prompt-specific context makes sense + +**The Problem:** +We jumped to the complex persistent-session approach without trying the simple per-request approach first. + +**What We Should Have Done:** +```typescript +// Simple version (not tried): +app.post('/sessions/:id/init', async (req, res) => { + const { userPrompt } = req.body; + + // Query timeline search API (already exists, fast) + const timeline = await timelineSearch(project, userPrompt, depth=10); + + // Return observations + return res.json({ context: timeline }); +}); +``` + +**This would have:** +- Validated the feature's value quickly +- Used existing infrastructure +- Avoided all the persistence complexity +- Taken 30 minutes instead of 3.5 hours + +### 4. Pattern Divergence + +**Inconsistency:** JIT sessions work fundamentally differently from memory sessions. + +**Memory Session Pattern:** +```typescript +// One-shot: Init β†’ Process observations β†’ Complete +startSession() β†’ yield prompts β†’ parse responses β†’ complete +``` + +**JIT Session Pattern:** +```typescript +// Persistent: Init β†’ Wait indefinitely β†’ Process on-demand β†’ Complete +startJitSession() β†’ yield initial load β†’ LOOP: + - Wait for EventEmitter signal + - Yield filter request + - Parse response + - Resolve promise + - GOTO LOOP +``` + +**Maintenance Burden:** Two completely different session patterns means: +- Doubled testing complexity +- Increased cognitive load for contributors +- Higher risk of subtle bugs in lifecycle management + +**Session Pattern Parity Document:** The 298-line `session-pattern-parity.md` was created to document the differencesβ€”a sign that maybe they shouldn't be different. + +### 5. Blocking I/O in Critical Path + +**Performance Impact:** Every user prompt now blocks for 1-2s waiting for Haiku filtering. + +**Current Flow:** +``` +User types prompt β†’ 10ms β†’ Claude responds +``` + +**JIT Flow:** +``` +User types prompt β†’ 10ms init β†’ 1-2s Haiku filter β†’ Claude responds +``` + +**User Experience:** We added 1-2 seconds of latency to every interaction for questionable benefit. + +**Alternative:** If context filtering is valuable, do it asynchronously and apply to next prompt. + +### 6. Missing the Forest for the Trees + +**Real Issue:** We focused on technical implementation without asking strategic questions: + +- **Is context relevance actually a problem?** No evidence. +- **Do users want this?** No feedback requested. +- **Is 50 observations too many?** Not proven. +- **Does filtering improve responses?** Not tested. + +**Anti-Pattern:** Solution in search of a problem. + +## What We Should Have Done + +### Option 1: Don't Build It + +**Justification:** No validated user need. Current system works fine. + +**Next Step:** Wait for user feedback indicating context relevance is an issue. + +### Option 2: Simple MVP + +If we really wanted to explore this: + +1. **Week 1:** Add basic filtering in worker with one-shot queries + - Accept slight performance hit (~500ms overhead) + - Measure filter accuracy and user impact + - Gather feedback + +2. **Week 2:** If proven valuable, optimize + - Add token caching only if needed + - Consider persistent sessions only if performance is bottleneck + +3. **Week 3:** If still valuable, scale + - Polish error handling + - Add configuration options + - Document patterns + +**Philosophy:** Incremental validation, not big-bang architecture. + +### Option 3: Different Approach Entirely + +**Alternative:** Pre-computed relevance scores + +Instead of on-demand filtering: +- Score observations at creation time (save-hook) +- Store relevance embeddings in Chroma +- At session start, query Chroma with user's first prompt +- Load top 10-20 most relevant observations +- No runtime latency, better accuracy, simpler architecture + +**Benefit:** Leverages existing Chroma infrastructure, avoids runtime overhead. + +## Technical Lessons Learned + +### 1. EventEmitter Coordination Anti-Pattern + +**Code:** +```typescript +queue.emitter.on('request', () => { + // Wake up generator to process request +}); +``` + +**Issue:** Complex async coordination using event-driven wakeup signals is hard to reason about. + +**Better:** Use async queues or channels (e.g., `async-queue` package) that handle coordination internally. + +### 2. Generator Pattern Complexity + +**Pattern:** +```typescript +async *createJitMessageGenerator() { + yield initialPrompt; + while (!aborted) { + await waitForEvent(); // Blocks here + yield nextRequest; + } +} +``` + +**Tradeoff:** Generators are great for iteration, but terrible for event-driven request/response patterns. + +**Better:** Use explicit session object with `sendMessage()/waitForResponse()` methods. + +### 3. Dual Session Management + +**Complexity:** Managing two concurrent SDK sessions per user session is inherently complex. + +**Alternatives Considered:** +- Single session handling both observations and filtering (rejected: tight coupling) +- Separate service for filtering (rejected: too much infrastructure) +- Pre-computed filtering (not considered: should have been) + +**Lesson:** When parallel state management feels hard, question whether you need parallel state. + +### 4. Promise Queue Pattern + +**Implementation:** +```typescript +interface QueuedRequest { + resolve: (result: T) => void; + reject: (error: Error) => void; +} +queue.push({ resolve, reject }); +// Later... +queue[0].resolve(result); +``` + +**Good:** Clean async API for callers +**Bad:** Easy to leak promises if error handling isn't perfect +**Improvement:** Use libraries like `p-queue` that handle edge cases + +## Process Lessons Learned + +### 1. No Incremental Validation + +**Mistake:** Went from "idea" to "complete architecture" without validation points. + +**Better Process:** +1. Write one-pager explaining user value +2. Build simplest possible version (2 hours max) +3. Test with real usage +4. Measure impact +5. Decide: kill, iterate, or scale + +**Checkpoint Questions:** +- After 1 hour: "Does this solve a real problem?" +- After 2 hours: "Is this getting too complex?" +- After 3 hours: "Should I just ship the simple version?" + +### 2. Architecture Astronomy + +**Definition:** Designing elaborate systems without building/testing them. + +**Evidence:** +- 265-line architecture doc written before any code +- 298-line session pattern parity analysis +- Multiple complete rewrites of the same feature + +**Better:** Code first, document later. Spike solutions, learn from implementation. + +### 3. Sunk Cost Fallacy + +**Timeline:** +- **Hour 1:** "This seems complex but achievable" +- **Hour 2:** "We're halfway done, can't stop now" +- **Hour 3:** "Just need to fix this one coordination issue" +- **Hour 4:** "It's working, but... this feels wrong" + +**Correct Decision:** Revert. Took courage to throw away 4 hours of work. + +**Learning:** Time invested is not a reason to continue. Quality of outcome matters more. + +### 4. Missing User Feedback Loop + +**No User Input:** +- Didn't ask: "Is context relevance a problem for you?" +- Didn't test: "Does filtered context improve your responses?" +- Didn't measure: "Are you hitting context limits?" + +**Engineering Theater:** Building impressive-sounding features without user validation. + +## What We Actually Learned (The Real Value) + +Despite reverting, this was productive R&D: + +### 1. Deep Understanding of Hook Architecture + +**Critical Discovery:** Hooks run in sandboxed environment without `claudePath`. +- Hooks cannot call Agent SDK `query()` directly +- All AI processing must happen in worker service +- This architectural constraint is now documented + +**Learned Pattern:** +``` +Hook (orchestration) β†’ Worker (AI processing) +βœ“ save-hook: Captures data β†’ Worker processes with SDK +βœ“ new-hook: Creates session β†’ Worker returns confirmation +βœ— jit-hook: Tried SDK in hook β†’ Failed, no claudePath +``` + +**Value:** Future features will avoid this mistake. We now know the boundary. + +### 2. Worker Architecture Patterns + +**Blocking vs. Non-Blocking:** +- SessionStart: Can be non-blocking (context loads async) +- UserPromptSubmit: Must be blocking (session must exist before processing) +- JIT Context: Must be blocking (context needed before prompt processed) + +**Established Pattern:** +```typescript +// Worker endpoint for features requiring AI +app.post('/sessions/:id/operation', async (req, res) => { + const { operationData } = req.body; + const result = await sdkAgent.performOperation(operationData); + return res.json({ result }); +}); +``` + +### 3. Persistent Session Management + +**Architecture Knowledge Gained:** +- How to maintain long-lived SDK sessions +- EventEmitter coordination patterns for request/response +- Promise queue management for async operations +- Proper cleanup with AbortControllers + +**Pattern Documented:** +- Dual session management (regular + JIT) +- Generator-based message loops +- Request queuing with timeouts + +**Value:** When we build the simpler version, we'll know these patterns. + +### 4. Configuration Infrastructure + +`src/shared/settings.ts` (65 lines) provides reusable configuration patterns: +```typescript +export function getConfigValue(key: string, defaultValue: string): string { + // Priority: settings.json β†’ env var β†’ default +} +``` + +**Kept After Revert:** This module is useful for other features. + +### 5. Key Architectural Decisions Made + +**Decisions that will guide future implementation:** +1. JIT context filtering must happen in worker (proven via failed hook attempt) +2. Context must be blocking on UserPromptSubmit (session needs context before processing) +3. Dynamic timeline search is the right approach (fast, precise, leverages existing infrastructure) +4. Simple per-request queries should be tried before persistent sessions + +### 6. Documentation Quality + +- `jit-context-architecture-fix.md`: Documents why hooks can't run SDK queries +- `session-pattern-parity.md`: Reference for implementing dual sessions +- Hooks reference: Comprehensive hook documentation added + +**Value:** These docs help future contributors understand the system constraints. + +### 7. Infrastructure Validation + +**Confirmed that our search stack is ready:** +- SQLite FTS5: Fast full-text search (<50ms) +- ChromaDB: Semantic search (<200ms with 8,000+ vectors) +- Timeline search API: Already implemented and tested +- Worker service: Can handle synchronous AI operations + +**The infrastructure exists. We just need a simpler integration.** + +## Recommendations + +### Immediate Actions + +1. **Archive the work:** + - Keep `failed/jit-context` branch for reference + - Extract reusable components (settings.ts) + - Save architecture docs for future features + +2. **Document the anti-patterns:** + - Add this post-mortem to CLAUDE.md references + - Update coding standards with lessons learned + +3. **Reset focus:** + - Return to validated user needs + - Prioritize features with clear value propositions + +### Future Feature Development + +**Gating Questions (Answer before coding):** + +1. **User Value:** What specific user problem does this solve? +2. **Evidence:** Have users requested this or reported the underlying issue? +3. **Measurement:** How will we know if it's successful? +4. **Simplicity:** What's the dumbest version that could work? +5. **Time Limit:** If we can't prove value in 2 hours, should we build it? + +**Process:** + +``` +VALIDATE β†’ BUILD SIMPLE β†’ TEST β†’ MEASURE β†’ DECIDE + ↑ ↓ + └──────────── ITERATE OR KILL β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜ +``` + +### If Context Filtering Returns + +Should we revisit this idea in the future: + +**Prerequisites:** +- User feedback requesting better context relevance +- Metrics showing current context is too broad +- Evidence that filtering improves response quality + +**Simple Approach:** +```typescript +// In worker-service.ts /sessions/:id/init +if (jitEnabled) { + const observations = await db.getRecentObservations(project, 50); + const filtered = await simpleFilter(observations, userPrompt); // One-shot query + return { context: filtered }; +} +``` + +**Acceptance Criteria:** +- <100 lines of code +- <500ms latency impact +- No new session types +- Degrades gracefully on errors + +**If that works:** Then consider optimization. + +## Conclusion + +JIT context filtering failed not because the vision was wrong, but because we jumped to the complex implementation without validating the simple one first. The feature aligns with long-term goals (dynamic, prompt-specific context using our fast search infrastructure), but the persistent-session architecture was premature optimization. + +**The right call:** Revert the complex implementation. Build the simple version when ready. + +**Key Takeaway:** The vision is sound. The execution was overcomplicated. We now have: +- Deep knowledge of hook/worker architecture constraints +- Documented patterns for persistent SDK sessions +- Validated fast search infrastructure +- Clear understanding of what to build next time (simple timeline search API integration) + +**This was R&D, not failure.** We learned what doesn't work (SDK in hooks), what does work (worker-based AI processing), and how to approach it next time (simple API calls before persistent sessions). + +**Next Implementation:** +When we revisit this (and we should), start with: +1. Worker endpoint that accepts prompt +2. Queries existing timeline search API +3. Returns context +4. Hook injects context +5. Validate it improves responses +6. Then optimize if needed + +**Final Thought:** Sometimes you have to build the wrong thing to understand the right thing. That's R&D. + +--- + +**Branch Status:** +- `feature/jit-context`: Abandoned +- `failed/jit-context`: Archived for reference +- `main`: Stable at v5.4.0 + +**Files to Keep:** +- `src/shared/settings.ts`: Reusable config utilities + +**Files Discarded:** +- Everything else (+2,850 lines) + +**Emotional State:** Relieved. Dodged a maintenance nightmare.