Files
claude-mem/docs/coderabbit-PR-41.md
T
Alex Newman 03ba89b703 fix: update user prompt formatting and correct 90-day cutoff logic
- Changed user prompt formatting to use full text instead of truncated version.
- Updated date filtering logic to use milliseconds instead of seconds for 90-day cutoff.
- Renamed doc_type values in ChromaSync to ensure consistency and prevent deduplication issues.
- Improved documentation for concept tags in input schema.
2025-11-03 19:05:12 -05:00

9.9 KiB

CodeRabbit Review - Issue Validation

Analysis Date: 2025-11-03 Analyzed By: Claude (Sonnet 4.5) Priority: 🔴 Critical | 🟡 Medium | 🟢 Low


Issue 1: Chroma Search False Positives

Location: experiment/chroma-search-test.ts:135-166 Priority: 🟢 Low Status: CONFIRMED - Real bug, correct fix Severity: Low (experiment file only, not production code)

Problem

The code marks chromaFound = true if the raw text contains the string 'ids', even for empty results like 'ids': [[]].

Current code (line 137):

testResult.chromaFound = resultText.includes('ids') && resultText.length > 50;

This creates false positives by checking for string containment rather than validating actual result content.

Validation

Confirmed by reading the actual code. The logic uses simple string matching which would match both:

  • Real results: 'ids': [['obs_123', 'obs_456']]
  • Empty results: 'ids': [[]] ✗ (incorrectly marked as success)

Parse and validate the actual content of the ids and/or documents arrays:

// Extract and parse the 'ids' array
const idsMatch = resultText.match(/'ids':\s*\[(.*?)\]/s);
if (idsMatch) {
  try {
    // Check if there's at least one non-empty inner array
    const idsContent = idsMatch[1];
    const hasResults = idsContent.includes('[') &&
                      !idsContent.match(/\[\s*\]/); // Not just empty arrays
    testResult.chromaFound = hasResults;
  } catch {
    testResult.chromaFound = false;
  }
}

Decision

DEFER - This is an experiment file, not production code. The bug doesn't affect actual functionality. Can be fixed as a cleanup task when working in this area.


Issue 2: 90-Day Cutoff Units Mismatch

Location: src/servers/search-server.ts:374-381 (and 3 other hybrid search handlers) Priority: 🔴 Critical Status: CONFIRMED - Critical bug, MUST FIX IMMEDIATELY Severity: High (breaks 90-day temporal filtering entirely)

Problem

The 90-day cutoff is computed in seconds but created_at_epoch is stored in milliseconds, causing the filter to never exclude anything.

Current code (line 374):

const ninetyDaysAgo = Math.floor(Date.now() / 1000) - (90 * 24 * 60 * 60);
// ...
return meta && meta.created_at_epoch > ninetyDaysAgo;

Validation

Database verification:

$ sqlite3 ~/.claude-mem/claude-mem.db "SELECT created_at_epoch FROM observations LIMIT 1"
1762212399087  # This is in MILLISECONDS

Comparison breakdown:

  • ninetyDaysAgo = ~1,754,000,000 (seconds, 10 digits)
  • created_at_epoch = ~1,762,212,399,087 (milliseconds, 13 digits)

The millisecond value is ALWAYS larger than the second value, so the filter created_at_epoch > ninetyDaysAgo ALWAYS passes, accepting ALL documents regardless of age.

Impact

  • 90-day temporal boundary completely non-functional
  • Performance degradation (processes all historical data)
  • Incorrect search results (includes very old observations)
  • Affects 4 handlers: search_observations, search_sessions, search_user_prompts, get_timeline_by_query

Keep milliseconds throughout (remove the /1000 division):

File: src/servers/search-server.ts

Find and replace in all 4 hybrid search handlers:

// OLD (WRONG - converts to seconds)
const ninetyDaysAgo = Math.floor(Date.now() / 1000) - (90 * 24 * 60 * 60);

// NEW (CORRECT - stays in milliseconds)
const ninetyDaysAgo = Date.now() - (90 * 24 * 60 * 60 * 1000);

Locations to fix:

  1. search_observations handler (~line 374)
  2. search_sessions handler
  3. search_user_prompts handler
  4. get_timeline_by_query handler

Decision

FIX IMMEDIATELY - This is a critical bug that breaks core functionality.


Issue 3: Chroma Collection Name Mismatch

Location: src/services/sync/ChromaSync.ts:77-81 and src/servers/search-server.ts:26 Priority: 🟡 Medium Status: ⚠️ CURRENTLY WORKS but architectural risk Severity: Medium (maintainability issue, potential future breakage)

Problem

ChromaSync builds collection names as cm__${project} (parameterized) while search-server uses a hard-coded 'cm__claude-mem', creating maintainability risk.

ChromaSync.ts (line 79):

this.collectionName = `cm__${project}`;

search-server.ts (line 26):

const COLLECTION_NAME = 'cm__claude-mem';

worker-service.ts (line 94):

this.chromaSync = new ChromaSync('claude-mem');

Validation

Current state: WORKS (both resolve to 'cm__claude-mem') Risk: If anyone changes the ChromaSync instantiation parameter or creates another instance, collections won't match.

Create a shared constant in a common config location:

New file: src/shared/config.ts

export const CHROMA_COLLECTION_NAME = 'cm__claude-mem';
// OR for dynamic project support:
export function getCollectionName(project: string = 'claude-mem'): string {
  return `cm__${project}`;
}

Update ChromaSync.ts:

import { CHROMA_COLLECTION_NAME } from '../shared/config';
// ...
this.collectionName = CHROMA_COLLECTION_NAME;

Update search-server.ts:

import { CHROMA_COLLECTION_NAME } from '../shared/config';
// ...
const COLLECTION_NAME = CHROMA_COLLECTION_NAME;

Decision

RECOMMENDED FIX - Good architectural improvement, prevents future bugs. Not urgent since it currently works, but should be included in the next refactoring pass.


Issue 4: doc_type Value Mismatch in ChromaSync

Location: src/services/sync/ChromaSync.ts:523-532 (read) vs lines 240, 429 (write) Priority: 🔴 Critical Status: CONFIRMED - Critical bug, MUST FIX Severity: High (breaks deduplication, causes duplicate insert failures)

Problem

Documents are written with 'session_summary' and 'user_prompt' but the deduplication logic looks for 'summary' and 'prompt', causing existing documents to not be detected.

Write side (formatSummaryDocs, line 240):

doc_type: 'session_summary',

Write side (formatUserPromptDoc, line 429):

doc_type: 'user_prompt',

Read side (getExistingChromaIds, lines 526-529):

} else if (meta.doc_type === 'summary') {
  summaryIds.add(meta.sqlite_id);
} else if (meta.doc_type === 'prompt') {
  promptIds.add(meta.sqlite_id);
}

Validation

Confirmed by code inspection. The mismatch causes:

  1. getExistingChromaIds doesn't find existing summaries/prompts
  2. They're not added to the deduplication sets
  3. System tries to insert them again
  4. Chroma rejects with duplicate ID errors

Impact

  • Deduplication completely broken for summaries and prompts
  • Backfill operations fail (see Issue 5)
  • Duplicate insert errors in production
  • Observations work fine (they use 'observation' consistently)

PREFERRED APPROACH: Fix the read side (backward compatible with existing Chroma data)

File: src/services/sync/ChromaSync.ts Lines: 526-529

} else if (meta.doc_type === 'session_summary') {  // Changed from 'summary'
  summaryIds.add(meta.sqlite_id);
} else if (meta.doc_type === 'user_prompt') {      // Changed from 'prompt'
  promptIds.add(meta.sqlite_id);
}

Why this approach:

  • Backward compatible with existing Chroma data
  • No data migration required
  • Safer than changing write side
  • Works immediately

Alternative approach (NOT recommended): Change write side to use 'summary'/'prompt'

  • Requires Chroma data migration
  • Orphans existing documents
  • Higher risk

Decision

FIX IMMEDIATELY - Critical bug affecting deduplication. Use the backward-compatible fix (change read side).


Issue 5: doc_type Mismatch Causing Backfill Failures

Location: src/services/worker-service.ts:120-128 (manifestation of Issue 4) Priority: 🔴 Critical Status: CONFIRMED - Same root cause as Issue 4 Severity: High (duplicate of Issue 4)

Problem

Backfill operations fail because of the doc_type mismatch described in Issue 4.

Validation

This is not a separate bug - it's a symptom of Issue 4. The backfill process:

  1. Queries SQLite for summaries/prompts to sync
  2. Calls getExistingChromaIds to avoid duplicates
  3. Due to doc_type mismatch, existing IDs aren't found
  4. Tries to insert documents that already exist
  5. Chroma rejects with duplicate ID errors
  6. Backfill fails

Decision

AUTOMATICALLY RESOLVED by fixing Issue 4. Not a separate fix needed.


Summary & Action Plan

Critical Issues (Fix Immediately)

  1. Issue 2 - 90-day units mismatch

    • Fix: Change all 4 handlers to use milliseconds
    • Impact: Restores temporal filtering functionality
  2. Issue 4 - doc_type mismatch

    • Fix: Change getExistingChromaIds to use 'session_summary'/'user_prompt'
    • Impact: Fixes deduplication and backfill
  3. Issue 5 - Automatically resolved by fixing Issue 4

Medium Priority (Include in Next Refactor)

  1. ⚠️ Issue 3 - Collection name consistency
    • Fix: Create shared constant
    • Impact: Better maintainability, prevents future bugs

Low Priority (Defer)

  1. 🟢 Issue 1 - False positives in experiment
    • Fix: Parse and validate arrays
    • Impact: More accurate test results (experiment only)

Files Requiring Changes

High Priority:

  • src/servers/search-server.ts (Issue 2 - 4 locations)
  • src/services/sync/ChromaSync.ts (Issue 4 - lines 526-529)

Medium Priority:

  • src/shared/config.ts (Issue 3 - new file)
  • src/services/sync/ChromaSync.ts (Issue 3 - import)
  • src/servers/search-server.ts (Issue 3 - import)

Low Priority:

  • experiment/chroma-search-test.ts (Issue 1)

Testing Recommendations

After fixes:

  1. Test 90-day filtering with dates before/after cutoff
  2. Run backfill operation to verify deduplication
  3. Verify no duplicate ID errors in logs
  4. Test hybrid search with temporal boundaries