03ba89b703
- Changed user prompt formatting to use full text instead of truncated version. - Updated date filtering logic to use milliseconds instead of seconds for 90-day cutoff. - Renamed doc_type values in ChromaSync to ensure consistency and prevent deduplication issues. - Improved documentation for concept tags in input schema.
315 lines
9.9 KiB
Markdown
315 lines
9.9 KiB
Markdown
# CodeRabbit Review - Issue Validation
|
|
|
|
**Analysis Date:** 2025-11-03
|
|
**Analyzed By:** Claude (Sonnet 4.5)
|
|
**Priority:** 🔴 Critical | 🟡 Medium | 🟢 Low
|
|
|
|
---
|
|
|
|
## Issue 1: Chroma Search False Positives
|
|
|
|
**Location:** `experiment/chroma-search-test.ts:135-166`
|
|
**Priority:** 🟢 Low
|
|
**Status:** ✅ CONFIRMED - Real bug, correct fix
|
|
**Severity:** Low (experiment file only, not production code)
|
|
|
|
### Problem
|
|
The code marks `chromaFound = true` if the raw text contains the string `'ids'`, even for empty results like `'ids': [[]]`.
|
|
|
|
**Current code (line 137):**
|
|
```typescript
|
|
testResult.chromaFound = resultText.includes('ids') && resultText.length > 50;
|
|
```
|
|
|
|
This creates false positives by checking for string containment rather than validating actual result content.
|
|
|
|
### Validation
|
|
Confirmed by reading the actual code. The logic uses simple string matching which would match both:
|
|
- Real results: `'ids': [['obs_123', 'obs_456']]` ✓
|
|
- Empty results: `'ids': [[]]` ✗ (incorrectly marked as success)
|
|
|
|
### Recommended Fix
|
|
Parse and validate the actual content of the `ids` and/or `documents` arrays:
|
|
|
|
```typescript
|
|
// Extract and parse the 'ids' array
|
|
const idsMatch = resultText.match(/'ids':\s*\[(.*?)\]/s);
|
|
if (idsMatch) {
|
|
try {
|
|
// Check if there's at least one non-empty inner array
|
|
const idsContent = idsMatch[1];
|
|
const hasResults = idsContent.includes('[') &&
|
|
!idsContent.match(/\[\s*\]/); // Not just empty arrays
|
|
testResult.chromaFound = hasResults;
|
|
} catch {
|
|
testResult.chromaFound = false;
|
|
}
|
|
}
|
|
```
|
|
|
|
### Decision
|
|
**DEFER** - This is an experiment file, not production code. The bug doesn't affect actual functionality. Can be fixed as a cleanup task when working in this area.
|
|
|
|
---
|
|
|
|
## Issue 2: 90-Day Cutoff Units Mismatch
|
|
|
|
**Location:** `src/servers/search-server.ts:374-381` (and 3 other hybrid search handlers)
|
|
**Priority:** 🔴 Critical
|
|
**Status:** ✅ CONFIRMED - Critical bug, MUST FIX IMMEDIATELY
|
|
**Severity:** High (breaks 90-day temporal filtering entirely)
|
|
|
|
### Problem
|
|
The 90-day cutoff is computed in **seconds** but `created_at_epoch` is stored in **milliseconds**, causing the filter to never exclude anything.
|
|
|
|
**Current code (line 374):**
|
|
```typescript
|
|
const ninetyDaysAgo = Math.floor(Date.now() / 1000) - (90 * 24 * 60 * 60);
|
|
// ...
|
|
return meta && meta.created_at_epoch > ninetyDaysAgo;
|
|
```
|
|
|
|
### Validation
|
|
**Database verification:**
|
|
```bash
|
|
$ sqlite3 ~/.claude-mem/claude-mem.db "SELECT created_at_epoch FROM observations LIMIT 1"
|
|
1762212399087 # This is in MILLISECONDS
|
|
```
|
|
|
|
**Comparison breakdown:**
|
|
- `ninetyDaysAgo` = ~1,754,000,000 (seconds, 10 digits)
|
|
- `created_at_epoch` = ~1,762,212,399,087 (milliseconds, 13 digits)
|
|
|
|
The millisecond value is **ALWAYS** larger than the second value, so the filter `created_at_epoch > ninetyDaysAgo` **ALWAYS** passes, accepting ALL documents regardless of age.
|
|
|
|
### Impact
|
|
- 90-day temporal boundary completely non-functional
|
|
- Performance degradation (processes all historical data)
|
|
- Incorrect search results (includes very old observations)
|
|
- Affects 4 handlers: `search_observations`, `search_sessions`, `search_user_prompts`, `get_timeline_by_query`
|
|
|
|
### Recommended Fix
|
|
Keep milliseconds throughout (remove the `/1000` division):
|
|
|
|
**File:** `src/servers/search-server.ts`
|
|
|
|
**Find and replace in all 4 hybrid search handlers:**
|
|
```typescript
|
|
// OLD (WRONG - converts to seconds)
|
|
const ninetyDaysAgo = Math.floor(Date.now() / 1000) - (90 * 24 * 60 * 60);
|
|
|
|
// NEW (CORRECT - stays in milliseconds)
|
|
const ninetyDaysAgo = Date.now() - (90 * 24 * 60 * 60 * 1000);
|
|
```
|
|
|
|
**Locations to fix:**
|
|
1. `search_observations` handler (~line 374)
|
|
2. `search_sessions` handler
|
|
3. `search_user_prompts` handler
|
|
4. `get_timeline_by_query` handler
|
|
|
|
### Decision
|
|
**FIX IMMEDIATELY** - This is a critical bug that breaks core functionality.
|
|
|
|
---
|
|
|
|
## Issue 3: Chroma Collection Name Mismatch
|
|
|
|
**Location:** `src/services/sync/ChromaSync.ts:77-81` and `src/servers/search-server.ts:26`
|
|
**Priority:** 🟡 Medium
|
|
**Status:** ⚠️ CURRENTLY WORKS but architectural risk
|
|
**Severity:** Medium (maintainability issue, potential future breakage)
|
|
|
|
### Problem
|
|
ChromaSync builds collection names as `cm__${project}` (parameterized) while search-server uses a hard-coded `'cm__claude-mem'`, creating maintainability risk.
|
|
|
|
**ChromaSync.ts (line 79):**
|
|
```typescript
|
|
this.collectionName = `cm__${project}`;
|
|
```
|
|
|
|
**search-server.ts (line 26):**
|
|
```typescript
|
|
const COLLECTION_NAME = 'cm__claude-mem';
|
|
```
|
|
|
|
**worker-service.ts (line 94):**
|
|
```typescript
|
|
this.chromaSync = new ChromaSync('claude-mem');
|
|
```
|
|
|
|
### Validation
|
|
**Current state:** WORKS (both resolve to `'cm__claude-mem'`)
|
|
**Risk:** If anyone changes the ChromaSync instantiation parameter or creates another instance, collections won't match.
|
|
|
|
### Recommended Fix
|
|
Create a shared constant in a common config location:
|
|
|
|
**New file:** `src/shared/config.ts`
|
|
```typescript
|
|
export const CHROMA_COLLECTION_NAME = 'cm__claude-mem';
|
|
// OR for dynamic project support:
|
|
export function getCollectionName(project: string = 'claude-mem'): string {
|
|
return `cm__${project}`;
|
|
}
|
|
```
|
|
|
|
**Update ChromaSync.ts:**
|
|
```typescript
|
|
import { CHROMA_COLLECTION_NAME } from '../shared/config';
|
|
// ...
|
|
this.collectionName = CHROMA_COLLECTION_NAME;
|
|
```
|
|
|
|
**Update search-server.ts:**
|
|
```typescript
|
|
import { CHROMA_COLLECTION_NAME } from '../shared/config';
|
|
// ...
|
|
const COLLECTION_NAME = CHROMA_COLLECTION_NAME;
|
|
```
|
|
|
|
### Decision
|
|
**RECOMMENDED FIX** - Good architectural improvement, prevents future bugs. Not urgent since it currently works, but should be included in the next refactoring pass.
|
|
|
|
---
|
|
|
|
## Issue 4: doc_type Value Mismatch in ChromaSync
|
|
|
|
**Location:** `src/services/sync/ChromaSync.ts:523-532` (read) vs lines 240, 429 (write)
|
|
**Priority:** 🔴 Critical
|
|
**Status:** ✅ CONFIRMED - Critical bug, MUST FIX
|
|
**Severity:** High (breaks deduplication, causes duplicate insert failures)
|
|
|
|
### Problem
|
|
Documents are written with `'session_summary'` and `'user_prompt'` but the deduplication logic looks for `'summary'` and `'prompt'`, causing existing documents to not be detected.
|
|
|
|
**Write side (formatSummaryDocs, line 240):**
|
|
```typescript
|
|
doc_type: 'session_summary',
|
|
```
|
|
|
|
**Write side (formatUserPromptDoc, line 429):**
|
|
```typescript
|
|
doc_type: 'user_prompt',
|
|
```
|
|
|
|
**Read side (getExistingChromaIds, lines 526-529):**
|
|
```typescript
|
|
} else if (meta.doc_type === 'summary') {
|
|
summaryIds.add(meta.sqlite_id);
|
|
} else if (meta.doc_type === 'prompt') {
|
|
promptIds.add(meta.sqlite_id);
|
|
}
|
|
```
|
|
|
|
### Validation
|
|
Confirmed by code inspection. The mismatch causes:
|
|
1. `getExistingChromaIds` doesn't find existing summaries/prompts
|
|
2. They're not added to the deduplication sets
|
|
3. System tries to insert them again
|
|
4. Chroma rejects with duplicate ID errors
|
|
|
|
### Impact
|
|
- Deduplication completely broken for summaries and prompts
|
|
- Backfill operations fail (see Issue 5)
|
|
- Duplicate insert errors in production
|
|
- Observations work fine (they use 'observation' consistently)
|
|
|
|
### Recommended Fix
|
|
**PREFERRED APPROACH:** Fix the read side (backward compatible with existing Chroma data)
|
|
|
|
**File:** `src/services/sync/ChromaSync.ts`
|
|
**Lines:** 526-529
|
|
|
|
```typescript
|
|
} else if (meta.doc_type === 'session_summary') { // Changed from 'summary'
|
|
summaryIds.add(meta.sqlite_id);
|
|
} else if (meta.doc_type === 'user_prompt') { // Changed from 'prompt'
|
|
promptIds.add(meta.sqlite_id);
|
|
}
|
|
```
|
|
|
|
**Why this approach:**
|
|
- ✅ Backward compatible with existing Chroma data
|
|
- ✅ No data migration required
|
|
- ✅ Safer than changing write side
|
|
- ✅ Works immediately
|
|
|
|
**Alternative approach (NOT recommended):** Change write side to use 'summary'/'prompt'
|
|
- ❌ Requires Chroma data migration
|
|
- ❌ Orphans existing documents
|
|
- ❌ Higher risk
|
|
|
|
### Decision
|
|
**FIX IMMEDIATELY** - Critical bug affecting deduplication. Use the backward-compatible fix (change read side).
|
|
|
|
---
|
|
|
|
## Issue 5: doc_type Mismatch Causing Backfill Failures
|
|
|
|
**Location:** `src/services/worker-service.ts:120-128` (manifestation of Issue 4)
|
|
**Priority:** 🔴 Critical
|
|
**Status:** ✅ CONFIRMED - Same root cause as Issue 4
|
|
**Severity:** High (duplicate of Issue 4)
|
|
|
|
### Problem
|
|
Backfill operations fail because of the doc_type mismatch described in Issue 4.
|
|
|
|
### Validation
|
|
This is not a separate bug - it's a **symptom** of Issue 4. The backfill process:
|
|
1. Queries SQLite for summaries/prompts to sync
|
|
2. Calls `getExistingChromaIds` to avoid duplicates
|
|
3. Due to doc_type mismatch, existing IDs aren't found
|
|
4. Tries to insert documents that already exist
|
|
5. Chroma rejects with duplicate ID errors
|
|
6. Backfill fails
|
|
|
|
### Decision
|
|
**AUTOMATICALLY RESOLVED** by fixing Issue 4. Not a separate fix needed.
|
|
|
|
---
|
|
|
|
## Summary & Action Plan
|
|
|
|
### Critical Issues (Fix Immediately)
|
|
1. ✅ **Issue 2** - 90-day units mismatch
|
|
- Fix: Change all 4 handlers to use milliseconds
|
|
- Impact: Restores temporal filtering functionality
|
|
|
|
2. ✅ **Issue 4** - doc_type mismatch
|
|
- Fix: Change getExistingChromaIds to use 'session_summary'/'user_prompt'
|
|
- Impact: Fixes deduplication and backfill
|
|
|
|
3. ✅ **Issue 5** - Automatically resolved by fixing Issue 4
|
|
|
|
### Medium Priority (Include in Next Refactor)
|
|
4. ⚠️ **Issue 3** - Collection name consistency
|
|
- Fix: Create shared constant
|
|
- Impact: Better maintainability, prevents future bugs
|
|
|
|
### Low Priority (Defer)
|
|
5. 🟢 **Issue 1** - False positives in experiment
|
|
- Fix: Parse and validate arrays
|
|
- Impact: More accurate test results (experiment only)
|
|
|
|
### Files Requiring Changes
|
|
|
|
**High Priority:**
|
|
- `src/servers/search-server.ts` (Issue 2 - 4 locations)
|
|
- `src/services/sync/ChromaSync.ts` (Issue 4 - lines 526-529)
|
|
|
|
**Medium Priority:**
|
|
- `src/shared/config.ts` (Issue 3 - new file)
|
|
- `src/services/sync/ChromaSync.ts` (Issue 3 - import)
|
|
- `src/servers/search-server.ts` (Issue 3 - import)
|
|
|
|
**Low Priority:**
|
|
- `experiment/chroma-search-test.ts` (Issue 1)
|
|
|
|
### Testing Recommendations
|
|
After fixes:
|
|
1. Test 90-day filtering with dates before/after cutoff
|
|
2. Run backfill operation to verify deduplication
|
|
3. Verify no duplicate ID errors in logs
|
|
4. Test hybrid search with temporal boundaries
|