Files
claude-mem/Auto Run Docs/PR-Triage/PR-Triage-09.md
T

5.9 KiB

Phase 09: Chroma/Vector Search Fixes

These PRs address Chroma vector database stability, zombie processes, and enterprise compatibility.

Bug Fixes

  • Review PR #887 (fix: align IDs with metadatas in ChromaSearchStrategy by @abkrim). Files: src/services/worker/search/strategies/ChromaSearchStrategy.ts, tests. IDs and metadata arrays are misaligned causing incorrect search results. Steps: (1) gh pr checkout 887 (2) Review — array alignment between document IDs and their metadata is critical for correct results (3) Check test coverage (4) Run npm run build (5) If clean: gh pr merge 887 --rebase --delete-branch

    • Merged (2026-02-06): Confirmed the bug — queryChroma() deduplicates IDs but returns raw metadatas array, causing index misalignment in filterByRecency(). Fix builds a Map from sqlite_id→metadata then iterates deduplicated IDs. 20 tests pass (including 2 new). Build clean.
  • Review PR #769 (fix: close transport on connection error to prevent chroma-mcp zombie processes by @jenyapoyarkov). Files: src/services/sync/ChromaSync.ts, tests. Transport left open on connection failure creates zombies. Steps: (1) gh pr checkout 769 (2) Review — should close/dispose transport in the error path (3) Run npm run build (4) If clean: gh pr merge 769 --rebase --delete-branch

    • Merged (2026-02-06): Confirmed the bug — both ensureCollection() (~line 202) and queryChroma() (~line 862) error handlers reset connected and client on connection errors but never called transport.close(), leaving chroma-mcp subprocesses alive as zombies. Fix adds transport.close() (wrapped in try/catch for already-dead transports) and transport = null before resetting state, mirroring the close() method pattern. 3 new regression tests added. All 19 integration tests + 20 ChromaSearchStrategy tests pass. Build clean.
  • Review PR #884 (fix: add Zscaler SSL certificate support for ChromaDB vector search by @RClark4958). Files: src/services/sync/ChromaSync.ts + build artifacts. Enterprise environments use Zscaler SSL inspection which breaks Chroma HTTPS connections. Steps: (1) gh pr checkout 884 (2) Review — should respect NODE_EXTRA_CA_CERTS or custom CA cert configuration (3) Verify this doesn't weaken SSL for non-Zscaler users (4) Run npm run build (5) If appropriate for enterprise support: gh pr merge 884 --rebase --delete-branch

    • Merged (2026-02-06): Manually applied to main (compiled output conflicts from PR #769 merge). Adds getCombinedCertPath() method that detects Zscaler certificates in the macOS system keychain via security find-certificate, combines them with certifi CA certs into ~/.claude-mem/combined_certs.pem (cached 24h, atomic write), and passes SSL_CERT_FILE/REQUESTS_CA_BUNDLE/CURL_CA_BUNDLE env vars to the chroma-mcp subprocess. Falls back gracefully when Zscaler is absent — no impact on non-Zscaler environments. NODE_EXTRA_CA_CERTS not needed since the Python subprocess uses Python SSL env vars, not Node.js ones. macOS-only cert extraction with cross-platform cache reuse. All 39 Chroma tests pass. Build clean.
  • Review PR #830 (fix: Chroma stability + additional process management layers by @michelhelsdingen). Files: 7 files including ChromaSync.ts, ProcessManager.ts, worker-service.ts. Steps: (1) gh pr checkout 830 (2) This is a broader stability PR — review scope carefully (3) Check for overlap with ProcessRegistry (v9.0.8) (4) If too broad, request scope reduction to just the Chroma stability portions (5) Run npm run build

    • Closed (2026-02-06): PR scope too broad (7 files, 7 distinct concerns) with severe merge conflicts against already-merged PRs #769 (transport cleanup) and #884 (Zscaler SSL). The ensureConnection() rewrite in this PR predates both merges and would require extensive conflict resolution. Left detailed review comment requesting rebase on current main and split into 8 focused PRs: CHROMA_DISABLED setting, connection timeout, reconnection mutex, parent heartbeat, chroma watchdog, stale session recovery, subprocess pool limit, and MCP orphan cleanup. Individual ideas are solid — the connection timeout, parent heartbeat, and CHROMA_DISABLED setting are particularly welcome as standalone PRs.

Feature

  • Evaluate PR #792 (feat: Replace MCP subprocess with persistent Chroma HTTP server by @bigph00t). Files: 12 files. Major architectural change — replaces the MCP subprocess model for Chroma with a persistent HTTP server. Steps: (1) gh pr checkout 792 (2) This is a significant architecture change. Review the approach: persistent HTTP server vs. MCP subprocess (3) Check resource implications (always-running server vs. on-demand subprocess) (4) Verify graceful shutdown and cross-platform support (5) Run npm run build (6) If the architecture is sound and well-tested: gh pr merge 792 --rebase --delete-branch. If concerns exist, leave detailed review comments.
    • Changes Requested (2026-02-06): Architecture direction is sound — persistent HTTP server via npx chroma run with ChromaServerManager singleton is the right long-term approach. Eliminates per-operation subprocess overhead, re-enables Windows support, clean separation between server lifecycle and data operations. Build succeeds and all 34 existing Chroma tests pass. However, 4 critical blockers prevent merge: (1) Zscaler SSL regression — PR #884's getCombinedCertPath() and SSL env var injection are completely removed, breaking enterprise users behind corporate proxies; (2) No tests for 290-line ChromaServerManager (process spawn, heartbeat polling, cross-platform shutdown); (3) Merge conflicts with PRs #769, #884, #887 already merged to main; (4) macOS untested despite being primary dev platform. Additional concerns: ~180MB memory overhead, port 8000 default conflicts, Python dependency not actually removed (shifted to npx chroma run). Left detailed review comment requesting rebase, Zscaler SSL port, ChromaServerManager tests, and macOS verification.