Files
claude-mem/NEXT_SESSION_PROMPT.md
T
Alex Newman 309e8a7139 Implement hybrid search: Chroma semantic + SQLite temporal
Core implementation:
- Added Chroma MCP client integration to search-server.ts
- Implemented queryChroma() helper with Python dict parsing
- Added VECTOR_DB_DIR constant to paths.ts
- Added SessionStore.getObservationsByIds() method

Search handlers updated:
- search_observations: Semantic-first with 90-day temporal filter
- find_by_concept/type/file: Metadata-first, semantic-enhanced ranking
- All handlers fall back to FTS5 if Chroma unavailable

Technical details:
- Direct MCP client usage (no abstractions)
- Regex parsing of Chroma Python dict responses
- Semantic ranking preserved in final results
- Graceful degradation to FTS5-only search

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
2025-10-31 23:00:04 -04:00

6.1 KiB

Prompt for Next Session: Hybrid Search Implementation

Copy this entire prompt into a new Claude Code session to continue the hybrid search feature implementation.


Context

I'm working on the claude-mem project (persistent memory system for Claude Code). I have an experimental branch experiment/chroma-mcp that attempted to add semantic search via ChromaDB, but it has implementation issues and was done in the wrong order.

Current Status:

  • Experiment validated: Semantic search (Chroma) + temporal filtering (SQLite) works
  • Chroma collection cm__claude-mem has 2,800+ documents synced
  • Search quality tests show semantic search provides value
  • Production implementation has issues (dead code, uncommitted fixes, wrong process)
  • Feature plan written and ready to execute

Your Task: Follow the feature implementation plan in FEATURE_PLAN_HYBRID_SEARCH.md to implement hybrid search correctly from the ground up.


Immediate Actions

  1. Read the feature plan:

    Read: /Users/alexnewman/Scripts/claude-mem/FEATURE_PLAN_HYBRID_SEARCH.md
    
  2. Understand the experiment results:

    • The experiment scripts work correctly
    • Chroma semantic search is functional
    • We just need to implement it properly in production
  3. Execute Phase 1 of the plan:

    • Create new feature/hybrid-search branch from main
    • Port working experiment scripts from experiment/chroma-mcp
    • Clean up any dead code references

Key Principles for This Implementation

  1. Start clean: New branch from main, no baggage from failed attempt
  2. No abstractions: Direct MCP client usage, no ChromaOrchestrator wrapper
  3. Validate at each step: Don't commit until you've tested it works
  4. Proper parsing: Chroma MCP returns Python dicts, not JSON - use regex parsing
  5. Temporal boundaries: 90-day filter prevents stale semantic matches

Files You'll Need to Work With

Core Implementation:

  • src/servers/search-server.ts - Add hybrid search workflows
  • src/services/sync/ChromaSync.ts - NEW: Auto-sync observations to Chroma
  • src/services/worker-service.ts - Integrate auto-sync
  • src/shared/paths.ts - Add VECTOR_DB_DIR constant

Experiment Files (keep these, they work):

  • experiment/chroma-sync-experiment.ts - Manual sync tool
  • experiment/chroma-search-test.ts - Search quality validator

Files to DELETE (dead code from failed attempt):

  • src/services/chroma/ChromaOrchestrator.ts - Broken wrapper, never used
  • test-chroma-connection.ts - Uses broken ChromaOrchestrator
  • plugin/scripts/search-server.cjs - Stale CommonJS build

Validation Checklist

Before committing any code, verify:

# 1. Build succeeds
npm run build

# 2. Sync works
npx tsx experiment/chroma-sync-experiment.ts

# 3. Search works
npx tsx experiment/chroma-search-test.ts

# 4. MCP server starts
node plugin/scripts/search-server.js
# (Ctrl+C to stop)

# 5. No dead code
grep -r "ChromaOrchestrator" src/  # Should return nothing

# 6. No stale builds
ls plugin/scripts/search-server.cjs  # Should not exist

# 7. Git status clean
git status  # No uncommitted changes to production files

Implementation Workflow (from Phase 3 of plan)

Step 1: Add queryChroma Helper

In src/servers/search-server.ts, add a helper function that:

  • Takes: query: string, limit: number, whereFilter?: object
  • Calls: chromaClient.callTool({ name: 'chroma_query_documents', ... })
  • Parses: Python dict response with regex (see lines 256-318 in current branch for example)
  • Returns: { ids: number[], distances: number[], metadatas: any[] }

Step 2: Initialize Chroma Client

In main() function:

const chromaTransport = new StdioClientTransport({
  command: 'uvx',
  args: ['chroma-mcp', '--client-type', 'persistent', '--data-dir', VECTOR_DB_DIR]
});
chromaClient = new Client({ name: 'claude-mem-search-chroma-client', version: '1.0.0' }, { capabilities: {} });
await chromaClient.connect(chromaTransport);

Step 3: Update search_observations Handler

Replace FTS5 keyword search with:

  1. Chroma semantic search (top 100)
  2. Filter by recency (90 days)
  3. Hydrate from SQLite in temporal order
  4. Return results

Step 4: Update Metadata Search Handlers

For find_by_concept, find_by_type, find_by_file:

  1. SQLite metadata filter first
  2. Chroma semantic ranking second
  3. Preserve semantic rank order in results

Expected Timeline

  • Phase 1 (Clean Start): 15 minutes
  • Phase 2 (Architecture Review): Already done, read the plan
  • Phase 3 (Implementation): 2-3 hours
  • Phase 4 (Validation): 1 hour
  • Phase 5 (Documentation): 1 hour
  • Phase 6 (Deployment): 30 minutes

Total: ~5-6 hours


Questions to Ask Me

If you encounter any issues:

  1. "The Chroma MCP client isn't connecting" → Check if uvx chroma-mcp is available
  2. "Parsing errors from Chroma responses" → Show me the response format, I'll help fix regex
  3. "Not sure about the search workflow logic" → Reference Phase 2.2 in the plan
  4. "Should I commit now?" → Only if validation checklist passes
  5. "Merge to main or PR?" → I'll decide, just get to Phase 6 first

Success Criteria

Don't merge until ALL of these are true:

  • Sync experiment completes without errors
  • Search test shows Chroma returning relevant results
  • MCP server starts and responds to queries
  • Fallback to FTS5 works if Chroma unavailable
  • No breaking changes to MCP tool interfaces
  • Documentation updated (CLAUDE.md + release notes)
  • No uncommitted changes in git status
  • No dead code (ChromaOrchestrator removed)
  • No stale build artifacts (.cjs files deleted)

Start Here

1. Read the feature plan:
   Read: /Users/alexnewman/Scripts/claude-mem/FEATURE_PLAN_HYBRID_SEARCH.md

2. Create the feature branch:
   Bash: git checkout main && git pull && git checkout -b feature/hybrid-search

3. Begin Phase 1 of the plan (porting experiment scripts)

4. Work through each phase systematically, validating at each step

5. Ask me questions if anything is unclear

Let's build this correctly, from the ground up. Take your time and validate at each step.