Files
claude-mem/docs/coderabbit-PR-41.md
T
Alex Newman 03ba89b703 fix: update user prompt formatting and correct 90-day cutoff logic
- Changed user prompt formatting to use full text instead of truncated version.
- Updated date filtering logic to use milliseconds instead of seconds for 90-day cutoff.
- Renamed doc_type values in ChromaSync to ensure consistency and prevent deduplication issues.
- Improved documentation for concept tags in input schema.
2025-11-03 19:05:12 -05:00

315 lines
9.9 KiB
Markdown

# CodeRabbit Review - Issue Validation
**Analysis Date:** 2025-11-03
**Analyzed By:** Claude (Sonnet 4.5)
**Priority:** 🔴 Critical | 🟡 Medium | 🟢 Low
---
## Issue 1: Chroma Search False Positives
**Location:** `experiment/chroma-search-test.ts:135-166`
**Priority:** 🟢 Low
**Status:** ✅ CONFIRMED - Real bug, correct fix
**Severity:** Low (experiment file only, not production code)
### Problem
The code marks `chromaFound = true` if the raw text contains the string `'ids'`, even for empty results like `'ids': [[]]`.
**Current code (line 137):**
```typescript
testResult.chromaFound = resultText.includes('ids') && resultText.length > 50;
```
This creates false positives by checking for string containment rather than validating actual result content.
### Validation
Confirmed by reading the actual code. The logic uses simple string matching which would match both:
- Real results: `'ids': [['obs_123', 'obs_456']]`
- Empty results: `'ids': [[]]` ✗ (incorrectly marked as success)
### Recommended Fix
Parse and validate the actual content of the `ids` and/or `documents` arrays:
```typescript
// Extract and parse the 'ids' array
const idsMatch = resultText.match(/'ids':\s*\[(.*?)\]/s);
if (idsMatch) {
try {
// Check if there's at least one non-empty inner array
const idsContent = idsMatch[1];
const hasResults = idsContent.includes('[') &&
!idsContent.match(/\[\s*\]/); // Not just empty arrays
testResult.chromaFound = hasResults;
} catch {
testResult.chromaFound = false;
}
}
```
### Decision
**DEFER** - This is an experiment file, not production code. The bug doesn't affect actual functionality. Can be fixed as a cleanup task when working in this area.
---
## Issue 2: 90-Day Cutoff Units Mismatch
**Location:** `src/servers/search-server.ts:374-381` (and 3 other hybrid search handlers)
**Priority:** 🔴 Critical
**Status:** ✅ CONFIRMED - Critical bug, MUST FIX IMMEDIATELY
**Severity:** High (breaks 90-day temporal filtering entirely)
### Problem
The 90-day cutoff is computed in **seconds** but `created_at_epoch` is stored in **milliseconds**, causing the filter to never exclude anything.
**Current code (line 374):**
```typescript
const ninetyDaysAgo = Math.floor(Date.now() / 1000) - (90 * 24 * 60 * 60);
// ...
return meta && meta.created_at_epoch > ninetyDaysAgo;
```
### Validation
**Database verification:**
```bash
$ sqlite3 ~/.claude-mem/claude-mem.db "SELECT created_at_epoch FROM observations LIMIT 1"
1762212399087 # This is in MILLISECONDS
```
**Comparison breakdown:**
- `ninetyDaysAgo` = ~1,754,000,000 (seconds, 10 digits)
- `created_at_epoch` = ~1,762,212,399,087 (milliseconds, 13 digits)
The millisecond value is **ALWAYS** larger than the second value, so the filter `created_at_epoch > ninetyDaysAgo` **ALWAYS** passes, accepting ALL documents regardless of age.
### Impact
- 90-day temporal boundary completely non-functional
- Performance degradation (processes all historical data)
- Incorrect search results (includes very old observations)
- Affects 4 handlers: `search_observations`, `search_sessions`, `search_user_prompts`, `get_timeline_by_query`
### Recommended Fix
Keep milliseconds throughout (remove the `/1000` division):
**File:** `src/servers/search-server.ts`
**Find and replace in all 4 hybrid search handlers:**
```typescript
// OLD (WRONG - converts to seconds)
const ninetyDaysAgo = Math.floor(Date.now() / 1000) - (90 * 24 * 60 * 60);
// NEW (CORRECT - stays in milliseconds)
const ninetyDaysAgo = Date.now() - (90 * 24 * 60 * 60 * 1000);
```
**Locations to fix:**
1. `search_observations` handler (~line 374)
2. `search_sessions` handler
3. `search_user_prompts` handler
4. `get_timeline_by_query` handler
### Decision
**FIX IMMEDIATELY** - This is a critical bug that breaks core functionality.
---
## Issue 3: Chroma Collection Name Mismatch
**Location:** `src/services/sync/ChromaSync.ts:77-81` and `src/servers/search-server.ts:26`
**Priority:** 🟡 Medium
**Status:** ⚠️ CURRENTLY WORKS but architectural risk
**Severity:** Medium (maintainability issue, potential future breakage)
### Problem
ChromaSync builds collection names as `cm__${project}` (parameterized) while search-server uses a hard-coded `'cm__claude-mem'`, creating maintainability risk.
**ChromaSync.ts (line 79):**
```typescript
this.collectionName = `cm__${project}`;
```
**search-server.ts (line 26):**
```typescript
const COLLECTION_NAME = 'cm__claude-mem';
```
**worker-service.ts (line 94):**
```typescript
this.chromaSync = new ChromaSync('claude-mem');
```
### Validation
**Current state:** WORKS (both resolve to `'cm__claude-mem'`)
**Risk:** If anyone changes the ChromaSync instantiation parameter or creates another instance, collections won't match.
### Recommended Fix
Create a shared constant in a common config location:
**New file:** `src/shared/config.ts`
```typescript
export const CHROMA_COLLECTION_NAME = 'cm__claude-mem';
// OR for dynamic project support:
export function getCollectionName(project: string = 'claude-mem'): string {
return `cm__${project}`;
}
```
**Update ChromaSync.ts:**
```typescript
import { CHROMA_COLLECTION_NAME } from '../shared/config';
// ...
this.collectionName = CHROMA_COLLECTION_NAME;
```
**Update search-server.ts:**
```typescript
import { CHROMA_COLLECTION_NAME } from '../shared/config';
// ...
const COLLECTION_NAME = CHROMA_COLLECTION_NAME;
```
### Decision
**RECOMMENDED FIX** - Good architectural improvement, prevents future bugs. Not urgent since it currently works, but should be included in the next refactoring pass.
---
## Issue 4: doc_type Value Mismatch in ChromaSync
**Location:** `src/services/sync/ChromaSync.ts:523-532` (read) vs lines 240, 429 (write)
**Priority:** 🔴 Critical
**Status:** ✅ CONFIRMED - Critical bug, MUST FIX
**Severity:** High (breaks deduplication, causes duplicate insert failures)
### Problem
Documents are written with `'session_summary'` and `'user_prompt'` but the deduplication logic looks for `'summary'` and `'prompt'`, causing existing documents to not be detected.
**Write side (formatSummaryDocs, line 240):**
```typescript
doc_type: 'session_summary',
```
**Write side (formatUserPromptDoc, line 429):**
```typescript
doc_type: 'user_prompt',
```
**Read side (getExistingChromaIds, lines 526-529):**
```typescript
} else if (meta.doc_type === 'summary') {
summaryIds.add(meta.sqlite_id);
} else if (meta.doc_type === 'prompt') {
promptIds.add(meta.sqlite_id);
}
```
### Validation
Confirmed by code inspection. The mismatch causes:
1. `getExistingChromaIds` doesn't find existing summaries/prompts
2. They're not added to the deduplication sets
3. System tries to insert them again
4. Chroma rejects with duplicate ID errors
### Impact
- Deduplication completely broken for summaries and prompts
- Backfill operations fail (see Issue 5)
- Duplicate insert errors in production
- Observations work fine (they use 'observation' consistently)
### Recommended Fix
**PREFERRED APPROACH:** Fix the read side (backward compatible with existing Chroma data)
**File:** `src/services/sync/ChromaSync.ts`
**Lines:** 526-529
```typescript
} else if (meta.doc_type === 'session_summary') { // Changed from 'summary'
summaryIds.add(meta.sqlite_id);
} else if (meta.doc_type === 'user_prompt') { // Changed from 'prompt'
promptIds.add(meta.sqlite_id);
}
```
**Why this approach:**
- ✅ Backward compatible with existing Chroma data
- ✅ No data migration required
- ✅ Safer than changing write side
- ✅ Works immediately
**Alternative approach (NOT recommended):** Change write side to use 'summary'/'prompt'
- ❌ Requires Chroma data migration
- ❌ Orphans existing documents
- ❌ Higher risk
### Decision
**FIX IMMEDIATELY** - Critical bug affecting deduplication. Use the backward-compatible fix (change read side).
---
## Issue 5: doc_type Mismatch Causing Backfill Failures
**Location:** `src/services/worker-service.ts:120-128` (manifestation of Issue 4)
**Priority:** 🔴 Critical
**Status:** ✅ CONFIRMED - Same root cause as Issue 4
**Severity:** High (duplicate of Issue 4)
### Problem
Backfill operations fail because of the doc_type mismatch described in Issue 4.
### Validation
This is not a separate bug - it's a **symptom** of Issue 4. The backfill process:
1. Queries SQLite for summaries/prompts to sync
2. Calls `getExistingChromaIds` to avoid duplicates
3. Due to doc_type mismatch, existing IDs aren't found
4. Tries to insert documents that already exist
5. Chroma rejects with duplicate ID errors
6. Backfill fails
### Decision
**AUTOMATICALLY RESOLVED** by fixing Issue 4. Not a separate fix needed.
---
## Summary & Action Plan
### Critical Issues (Fix Immediately)
1.**Issue 2** - 90-day units mismatch
- Fix: Change all 4 handlers to use milliseconds
- Impact: Restores temporal filtering functionality
2.**Issue 4** - doc_type mismatch
- Fix: Change getExistingChromaIds to use 'session_summary'/'user_prompt'
- Impact: Fixes deduplication and backfill
3.**Issue 5** - Automatically resolved by fixing Issue 4
### Medium Priority (Include in Next Refactor)
4. ⚠️ **Issue 3** - Collection name consistency
- Fix: Create shared constant
- Impact: Better maintainability, prevents future bugs
### Low Priority (Defer)
5. 🟢 **Issue 1** - False positives in experiment
- Fix: Parse and validate arrays
- Impact: More accurate test results (experiment only)
### Files Requiring Changes
**High Priority:**
- `src/servers/search-server.ts` (Issue 2 - 4 locations)
- `src/services/sync/ChromaSync.ts` (Issue 4 - lines 526-529)
**Medium Priority:**
- `src/shared/config.ts` (Issue 3 - new file)
- `src/services/sync/ChromaSync.ts` (Issue 3 - import)
- `src/servers/search-server.ts` (Issue 3 - import)
**Low Priority:**
- `experiment/chroma-search-test.ts` (Issue 1)
### Testing Recommendations
After fixes:
1. Test 90-day filtering with dates before/after cutoff
2. Run backfill operation to verify deduplication
3. Verify no duplicate ID errors in logs
4. Test hybrid search with temporal boundaries