Files
claude-mem/NEXT_SESSION_PROMPT.md
T
Alex Newman 309e8a7139 Implement hybrid search: Chroma semantic + SQLite temporal
Core implementation:
- Added Chroma MCP client integration to search-server.ts
- Implemented queryChroma() helper with Python dict parsing
- Added VECTOR_DB_DIR constant to paths.ts
- Added SessionStore.getObservationsByIds() method

Search handlers updated:
- search_observations: Semantic-first with 90-day temporal filter
- find_by_concept/type/file: Metadata-first, semantic-enhanced ranking
- All handlers fall back to FTS5 if Chroma unavailable

Technical details:
- Direct MCP client usage (no abstractions)
- Regex parsing of Chroma Python dict responses
- Semantic ranking preserved in final results
- Graceful degradation to FTS5-only search

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
2025-10-31 23:00:04 -04:00

194 lines
6.1 KiB
Markdown

# Prompt for Next Session: Hybrid Search Implementation
Copy this entire prompt into a new Claude Code session to continue the hybrid search feature implementation.
---
## Context
I'm working on the `claude-mem` project (persistent memory system for Claude Code). I have an experimental branch `experiment/chroma-mcp` that attempted to add semantic search via ChromaDB, but it has implementation issues and was done in the wrong order.
**Current Status:**
- ✅ Experiment validated: Semantic search (Chroma) + temporal filtering (SQLite) works
- ✅ Chroma collection `cm__claude-mem` has 2,800+ documents synced
- ✅ Search quality tests show semantic search provides value
- ❌ Production implementation has issues (dead code, uncommitted fixes, wrong process)
- ✅ Feature plan written and ready to execute
**Your Task:**
Follow the feature implementation plan in `FEATURE_PLAN_HYBRID_SEARCH.md` to implement hybrid search correctly from the ground up.
---
## Immediate Actions
1. **Read the feature plan:**
```
Read: /Users/alexnewman/Scripts/claude-mem/FEATURE_PLAN_HYBRID_SEARCH.md
```
2. **Understand the experiment results:**
- The experiment scripts work correctly
- Chroma semantic search is functional
- We just need to implement it properly in production
3. **Execute Phase 1 of the plan:**
- Create new `feature/hybrid-search` branch from `main`
- Port working experiment scripts from `experiment/chroma-mcp`
- Clean up any dead code references
---
## Key Principles for This Implementation
1. **Start clean:** New branch from `main`, no baggage from failed attempt
2. **No abstractions:** Direct MCP client usage, no ChromaOrchestrator wrapper
3. **Validate at each step:** Don't commit until you've tested it works
4. **Proper parsing:** Chroma MCP returns Python dicts, not JSON - use regex parsing
5. **Temporal boundaries:** 90-day filter prevents stale semantic matches
---
## Files You'll Need to Work With
**Core Implementation:**
- `src/servers/search-server.ts` - Add hybrid search workflows
- `src/services/sync/ChromaSync.ts` - NEW: Auto-sync observations to Chroma
- `src/services/worker-service.ts` - Integrate auto-sync
- `src/shared/paths.ts` - Add VECTOR_DB_DIR constant
**Experiment Files (keep these, they work):**
- `experiment/chroma-sync-experiment.ts` - Manual sync tool
- `experiment/chroma-search-test.ts` - Search quality validator
**Files to DELETE (dead code from failed attempt):**
- `src/services/chroma/ChromaOrchestrator.ts` - Broken wrapper, never used
- `test-chroma-connection.ts` - Uses broken ChromaOrchestrator
- `plugin/scripts/search-server.cjs` - Stale CommonJS build
---
## Validation Checklist
Before committing any code, verify:
```bash
# 1. Build succeeds
npm run build
# 2. Sync works
npx tsx experiment/chroma-sync-experiment.ts
# 3. Search works
npx tsx experiment/chroma-search-test.ts
# 4. MCP server starts
node plugin/scripts/search-server.js
# (Ctrl+C to stop)
# 5. No dead code
grep -r "ChromaOrchestrator" src/ # Should return nothing
# 6. No stale builds
ls plugin/scripts/search-server.cjs # Should not exist
# 7. Git status clean
git status # No uncommitted changes to production files
```
---
## Implementation Workflow (from Phase 3 of plan)
### Step 1: Add queryChroma Helper
In `src/servers/search-server.ts`, add a helper function that:
- Takes: `query: string, limit: number, whereFilter?: object`
- Calls: `chromaClient.callTool({ name: 'chroma_query_documents', ... })`
- Parses: Python dict response with regex (see lines 256-318 in current branch for example)
- Returns: `{ ids: number[], distances: number[], metadatas: any[] }`
### Step 2: Initialize Chroma Client
In `main()` function:
```typescript
const chromaTransport = new StdioClientTransport({
command: 'uvx',
args: ['chroma-mcp', '--client-type', 'persistent', '--data-dir', VECTOR_DB_DIR]
});
chromaClient = new Client({ name: 'claude-mem-search-chroma-client', version: '1.0.0' }, { capabilities: {} });
await chromaClient.connect(chromaTransport);
```
### Step 3: Update search_observations Handler
Replace FTS5 keyword search with:
1. Chroma semantic search (top 100)
2. Filter by recency (90 days)
3. Hydrate from SQLite in temporal order
4. Return results
### Step 4: Update Metadata Search Handlers
For `find_by_concept`, `find_by_type`, `find_by_file`:
1. SQLite metadata filter first
2. Chroma semantic ranking second
3. Preserve semantic rank order in results
---
## Expected Timeline
- Phase 1 (Clean Start): 15 minutes
- Phase 2 (Architecture Review): Already done, read the plan
- Phase 3 (Implementation): 2-3 hours
- Phase 4 (Validation): 1 hour
- Phase 5 (Documentation): 1 hour
- Phase 6 (Deployment): 30 minutes
**Total: ~5-6 hours**
---
## Questions to Ask Me
If you encounter any issues:
1. "The Chroma MCP client isn't connecting" → Check if `uvx chroma-mcp` is available
2. "Parsing errors from Chroma responses" → Show me the response format, I'll help fix regex
3. "Not sure about the search workflow logic" → Reference Phase 2.2 in the plan
4. "Should I commit now?" → Only if validation checklist passes
5. "Merge to main or PR?" → I'll decide, just get to Phase 6 first
---
## Success Criteria
Don't merge until ALL of these are true:
- ✅ Sync experiment completes without errors
- ✅ Search test shows Chroma returning relevant results
- ✅ MCP server starts and responds to queries
- ✅ Fallback to FTS5 works if Chroma unavailable
- ✅ No breaking changes to MCP tool interfaces
- ✅ Documentation updated (CLAUDE.md + release notes)
- ✅ No uncommitted changes in git status
- ✅ No dead code (ChromaOrchestrator removed)
- ✅ No stale build artifacts (.cjs files deleted)
---
## Start Here
```
1. Read the feature plan:
Read: /Users/alexnewman/Scripts/claude-mem/FEATURE_PLAN_HYBRID_SEARCH.md
2. Create the feature branch:
Bash: git checkout main && git pull && git checkout -b feature/hybrid-search
3. Begin Phase 1 of the plan (porting experiment scripts)
4. Work through each phase systematically, validating at each step
5. Ask me questions if anything is unclear
```
Let's build this correctly, from the ground up. Take your time and validate at each step.