Alex Newman 309e8a7139 Implement hybrid search: Chroma semantic + SQLite temporal

Core implementation:
- Added Chroma MCP client integration to search-server.ts
- Implemented queryChroma() helper with Python dict parsing
- Added VECTOR_DB_DIR constant to paths.ts
- Added SessionStore.getObservationsByIds() method

Search handlers updated:
- search_observations: Semantic-first with 90-day temporal filter
- find_by_concept/type/file: Metadata-first, semantic-enhanced ranking
- All handlers fall back to FTS5 if Chroma unavailable

Technical details:
- Direct MCP client usage (no abstractions)
- Regex parsing of Chroma Python dict responses
- Semantic ranking preserved in final results
- Graceful degradation to FTS5-only search

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>

2025-10-31 23:00:04 -04:00

Prompt for Next Session: Hybrid Search Implementation

Copy this entire prompt into a new Claude Code session to continue the hybrid search feature implementation.

Context

I'm working on the claude-mem project (persistent memory system for Claude Code). I have an experimental branch experiment/chroma-mcp that attempted to add semantic search via ChromaDB, but it has implementation issues and was done in the wrong order.

Current Status:

✅ Experiment validated: Semantic search (Chroma) + temporal filtering (SQLite) works
✅ Chroma collection cm__claude-mem has 2,800+ documents synced
✅ Search quality tests show semantic search provides value
❌ Production implementation has issues (dead code, uncommitted fixes, wrong process)
✅ Feature plan written and ready to execute

Your Task: Follow the feature implementation plan in FEATURE_PLAN_HYBRID_SEARCH.md to implement hybrid search correctly from the ground up.

Immediate Actions

1. Read the feature plan:

   Read: /Users/alexnewman/Scripts/claude-mem/FEATURE_PLAN_HYBRID_SEARCH.md

2. Understand the experiment results:
   - The experiment scripts work correctly
   - Chroma semantic search is functional
   - We just need to implement it properly in production

3. Execute Phase 1 of the plan:
   - Create new feature/hybrid-search branch from main
   - Port working experiment scripts from experiment/chroma-mcp
   - Clean up any dead code references

Key Principles for This Implementation

Start clean: New branch from main, no baggage from failed attempt
No abstractions: Direct MCP client usage, no ChromaOrchestrator wrapper
Validate at each step: Don't commit until you've tested it works
Proper parsing: Chroma MCP returns Python dicts, not JSON - use regex parsing
Temporal boundaries: 90-day filter prevents stale semantic matches

Files You'll Need to Work With

Core Implementation:

src/servers/search-server.ts - Add hybrid search workflows
src/services/sync/ChromaSync.ts - NEW: Auto-sync observations to Chroma
src/services/worker-service.ts - Integrate auto-sync
src/shared/paths.ts - Add VECTOR_DB_DIR constant

Experiment Files (keep these, they work):

experiment/chroma-sync-experiment.ts - Manual sync tool
experiment/chroma-search-test.ts - Search quality validator

Files to DELETE (dead code from failed attempt):

src/services/chroma/ChromaOrchestrator.ts - Broken wrapper, never used
test-chroma-connection.ts - Uses broken ChromaOrchestrator
plugin/scripts/search-server.cjs - Stale CommonJS build

Validation Checklist

Before committing any code, verify:

# 1. Build succeeds
npm run build

# 2. Sync works
npx tsx experiment/chroma-sync-experiment.ts

# 3. Search works
npx tsx experiment/chroma-search-test.ts

# 4. MCP server starts
node plugin/scripts/search-server.js
# (Ctrl+C to stop)

# 5. No dead code
grep -r "ChromaOrchestrator" src/  # Should return nothing

# 6. No stale builds
ls plugin/scripts/search-server.cjs  # Should not exist

# 7. Git status clean
git status  # No uncommitted changes to production files

Implementation Workflow (from Phase 3 of plan)

Step 1: Add queryChroma Helper

In src/servers/search-server.ts, add a helper function that:

Takes: query: string, limit: number, whereFilter?: object
Calls: chromaClient.callTool({ name: 'chroma_query_documents', ... })
Parses: Python dict response with regex (see lines 256-318 in current branch for example)
Returns: { ids: number[], distances: number[], metadatas: any[] }
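The parsing step can be sketched as follows. This is a minimal illustration, not the code from the branch: the response shape in the comment, the `obs_<number>` document ID format, and the name `parseChromaResponse` are all assumptions made for the example, so adjust the patterns to whatever chroma-mcp actually returns.

```typescript
interface ChromaQueryResult {
  ids: number[];
  distances: number[];
}

// ASSUMED response text (not confirmed by the plan), e.g.:
//   {'ids': [['obs_12', 'obs_7']], 'distances': [[0.12, 0.34]], ...}
// Hypothetical helper: extracts numeric observation IDs and their distances.
function parseChromaResponse(text: string): ChromaQueryResult {
  const ids: number[] = [];
  const distances: number[] = [];

  // Grab the contents of the first nested list after each key.
  const idsBlock = /'ids':\s*\[\[([\s\S]*?)\]\]/.exec(text);
  if (!idsBlock) return { ids, distances };
  const distBlock = /'distances':\s*\[\[([\s\S]*?)\]\]/.exec(text);
  const rawDistances = distBlock
    ? distBlock[1].split(',').map((s) => parseFloat(s.trim()))
    : [];

  // Walk the single-quoted document IDs in order.
  const idPattern = /'([^']*)'/g;
  let match: RegExpExecArray | null;
  let index = 0;
  while ((match = idPattern.exec(idsBlock[1])) !== null) {
    const digits = /\d+/.exec(match[1]);
    if (digits) {
      const id = parseInt(digits[0], 10);
      // Keep only the first (best-ranked) hit per observation.
      if (ids.indexOf(id) === -1) {
        ids.push(id);
        const d = rawDistances[index];
        distances.push(d === undefined ? NaN : d);
      }
    }
    index++;
  }
  return { ids, distances };
}
```

Deduplicating on the numeric ID keeps the best-ranked hit when several Chroma documents map to the same observation, which is one plausible reason the helper returns `number[]` rather than raw document IDs.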

Step 2: Initialize Chroma Client

In main() function:

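// Imports at the top of search-server.ts (MCP TypeScript SDK)
import { Client } from '@modelcontextprotocol/sdk/client/index.js';
import { StdioClientTransport } from '@modelcontextprotocol/sdk/client/stdio.js';

// Inside main(): launch chroma-mcp over stdio against the local persistent store
const chromaTransport = new StdioClientTransport({
  command: 'uvx',
  args: ['chroma-mcp', '--client-type', 'persistent', '--data-dir', VECTOR_DB_DIR]
});
// chromaClient is assumed to be declared at module scope so the handlers can reach it
chromaClient = new Client({ name: 'claude-mem-search-chroma-client', version: '1.0.0' }, { capabilities: {} });
await chromaClient.connect(chromaTransport);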

Step 3: Update search_observations Handler

Replace FTS5 keyword search as the primary path (keep FTS5 as the fallback when Chroma is unavailable) with:

Chroma semantic search (top 100)
Filter by recency (90 days)
Hydrate from SQLite in temporal order
Return results
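The recency filter and temporal ordering in the steps above could look roughly like this. It is a sketch under assumptions: the row shape, the `created_at_epoch` field (taken to be milliseconds), and the function name `filterRecent` are hypothetical and not taken from the plan.

```typescript
// Hypothetical row shape; the real SessionStore rows may differ.
interface ObservationRow {
  id: number;
  created_at_epoch: number; // ASSUMED: creation time in milliseconds
}

const NINETY_DAYS_MS = 90 * 24 * 60 * 60 * 1000;

// Keep rows from the last 90 days and return them newest first.
function filterRecent<T extends ObservationRow>(rows: T[], now: number = Date.now()): T[] {
  const cutoff = now - NINETY_DAYS_MS;
  return rows
    .filter((row) => row.created_at_epoch >= cutoff) // filter() copies, so the input is not mutated
    .sort((a, b) => b.created_at_epoch - a.created_at_epoch);
}
```

Passing `now` as a parameter keeps the function deterministic in tests. In the handler, the input rows would be the observations hydrated from SQLite for the IDs Chroma returned.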

Step 4: Update Metadata Search Handlers

For find_by_concept, find_by_type, find_by_file:

SQLite metadata filter first
Chroma semantic ranking second
Preserve semantic rank order in results
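Preserving the semantic rank order after a metadata-first filter might be done as below. This is an illustrative sketch: `orderBySemanticRank` is a hypothetical name, and placing rows that Chroma did not rank at the end (in their original order) is an assumption rather than something the plan specifies.

```typescript
interface HasId {
  id: number;
}

// rows: observations that passed the SQLite metadata filter.
// rankedIds: observation IDs in Chroma's semantic rank order (best first).
function orderBySemanticRank<T extends HasId>(rows: T[], rankedIds: number[]): T[] {
  const byId = new Map<number, T>();
  for (const row of rows) byId.set(row.id, row);

  const ranked: T[] = [];
  const placed = new Set<number>();
  for (const id of rankedIds) {
    const row = byId.get(id);
    // Skip IDs that failed the metadata filter, and duplicates.
    if (row !== undefined && !placed.has(id)) {
      ranked.push(row);
      placed.add(id);
    }
  }

  // ASSUMPTION: rows Chroma did not rank keep their original order at the end.
  const unranked = rows.filter((row) => !placed.has(row.id));
  return ranked.concat(unranked);
}
```

Building the result from `rankedIds` instead of sorting `rows` avoids depending on sort stability and makes the rank order explicit.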

Expected Timeline

Phase 1 (Clean Start): 15 minutes
Phase 2 (Architecture Review): Already done, read the plan
Phase 3 (Implementation): 2-3 hours
Phase 4 (Validation): 1 hour
Phase 5 (Documentation): 1 hour
Phase 6 (Deployment): 30 minutes

Total: ~5-6 hours

Questions to Ask Me

If you encounter any issues:

"The Chroma MCP client isn't connecting" → Check if uvx chroma-mcp is available
"Parsing errors from Chroma responses" → Show me the response format, I'll help fix regex
"Not sure about the search workflow logic" → Reference Phase 2.2 in the plan
"Should I commit now?" → Only if validation checklist passes
"Merge to main or PR?" → I'll decide, just get to Phase 6 first

Success Criteria

Don't merge until ALL of these are true:

✅ Sync experiment completes without errors
✅ Search test shows Chroma returning relevant results
✅ MCP server starts and responds to queries
✅ Fallback to FTS5 works if Chroma unavailable
✅ No breaking changes to MCP tool interfaces
✅ Documentation updated (CLAUDE.md + release notes)
✅ No uncommitted changes in git status
✅ No dead code (ChromaOrchestrator removed)
✅ No stale build artifacts (.cjs files deleted)
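The FTS5 fallback criterion can be exercised with a small wrapper like the one below. It is a sketch only: `searchWithFallback` and the `usedFallback` flag are hypothetical names, and the two callbacks stand in for the Chroma-backed and FTS5-backed searches.

```typescript
// Hypothetical helper: try the primary (Chroma) search; if it throws,
// run the fallback (FTS5) search instead.
async function searchWithFallback<T>(
  primary: () => Promise<T[]>,
  fallback: () => Promise<T[]>,
): Promise<{ results: T[]; usedFallback: boolean }> {
  try {
    return { results: await primary(), usedFallback: false };
  } catch {
    // In real code, log the failure here before degrading to keyword search.
    return { results: await fallback(), usedFallback: true };
  }
}
```

Returning `usedFallback` makes the degraded path observable, so the fallback criterion can be verified by stopping Chroma and checking the flag.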

Start Here

1. Read the feature plan:
   Read: /Users/alexnewman/Scripts/claude-mem/FEATURE_PLAN_HYBRID_SEARCH.md

2. Create the feature branch:
   Bash: git checkout main && git pull && git checkout -b feature/hybrid-search

3. Begin Phase 1 of the plan (porting experiment scripts)

4. Work through each phase systematically, validating at each step

5. Ask me questions if anything is unclear

Let's build this correctly, from the ground up. Take your time and validate at each step.
