Files
claude-mem/Auto Run Docs/PR-Triage/PR-Triage-09.md
T

23 lines
5.9 KiB
Markdown

# Phase 09: Chroma/Vector Search Fixes
These PRs address Chroma vector database stability, zombie processes, and enterprise compatibility.
## Bug Fixes
- [x] Review PR #887 (`fix: align IDs with metadatas in ChromaSearchStrategy` by @abkrim). Files: `src/services/worker/search/strategies/ChromaSearchStrategy.ts`, tests. IDs and metadata arrays are misaligned causing incorrect search results. Steps: (1) `gh pr checkout 887` (2) Review — array alignment between document IDs and their metadata is critical for correct results (3) Check test coverage (4) Run `npm run build` (5) If clean: `gh pr merge 887 --rebase --delete-branch`
- **Merged** (2026-02-06): Confirmed the bug — `queryChroma()` deduplicates IDs but returns raw metadatas array, causing index misalignment in `filterByRecency()`. Fix builds a Map from sqlite_id→metadata then iterates deduplicated IDs. 20 tests pass (including 2 new). Build clean.
- [x] Review PR #769 (`fix: close transport on connection error to prevent chroma-mcp zombie processes` by @jenyapoyarkov). Files: `src/services/sync/ChromaSync.ts`, tests. Transport left open on connection failure creates zombies. Steps: (1) `gh pr checkout 769` (2) Review — should close/dispose transport in the error path (3) Run `npm run build` (4) If clean: `gh pr merge 769 --rebase --delete-branch`
- **Merged** (2026-02-06): Confirmed the bug — both `ensureCollection()` (~line 202) and `queryChroma()` (~line 862) error handlers reset `connected` and `client` on connection errors but never called `transport.close()`, leaving chroma-mcp subprocesses alive as zombies. Fix adds `transport.close()` (wrapped in try/catch for already-dead transports) and `transport = null` before resetting state, mirroring the `close()` method pattern. 3 new regression tests added. All 19 integration tests + 20 ChromaSearchStrategy tests pass. Build clean.
- [x] Review PR #884 (`fix: add Zscaler SSL certificate support for ChromaDB vector search` by @RClark4958). Files: `src/services/sync/ChromaSync.ts` + build artifacts. Enterprise environments use Zscaler SSL inspection which breaks Chroma HTTPS connections. Steps: (1) `gh pr checkout 884` (2) Review — should respect NODE_EXTRA_CA_CERTS or custom CA cert configuration (3) Verify this doesn't weaken SSL for non-Zscaler users (4) Run `npm run build` (5) If appropriate for enterprise support: `gh pr merge 884 --rebase --delete-branch`
- **Merged** (2026-02-06): Manually applied to main (compiled output conflicts from PR #769 merge). Adds `getCombinedCertPath()` method that detects Zscaler certificates in the macOS system keychain via `security find-certificate`, combines them with certifi CA certs into `~/.claude-mem/combined_certs.pem` (cached 24h, atomic write), and passes `SSL_CERT_FILE`/`REQUESTS_CA_BUNDLE`/`CURL_CA_BUNDLE` env vars to the chroma-mcp subprocess. Falls back gracefully when Zscaler is absent — no impact on non-Zscaler environments. `NODE_EXTRA_CA_CERTS` not needed since the Python subprocess uses Python SSL env vars, not Node.js ones. macOS-only cert extraction with cross-platform cache reuse. All 39 Chroma tests pass. Build clean.
- [x] Review PR #830 (`fix: Chroma stability + additional process management layers` by @michelhelsdingen). Files: 7 files including ChromaSync.ts, ProcessManager.ts, worker-service.ts. Steps: (1) `gh pr checkout 830` (2) This is a broader stability PR — review scope carefully (3) Check for overlap with ProcessRegistry (v9.0.8) (4) If too broad, request scope reduction to just the Chroma stability portions (5) Run `npm run build`
- **Closed** (2026-02-06): PR scope too broad (7 files, 7 distinct concerns) with severe merge conflicts against already-merged PRs #769 (transport cleanup) and #884 (Zscaler SSL). The `ensureConnection()` rewrite in this PR predates both merges and would require extensive conflict resolution. Left detailed review comment requesting rebase on current main and split into 8 focused PRs: CHROMA_DISABLED setting, connection timeout, reconnection mutex, parent heartbeat, chroma watchdog, stale session recovery, subprocess pool limit, and MCP orphan cleanup. Individual ideas are solid — the connection timeout, parent heartbeat, and CHROMA_DISABLED setting are particularly welcome as standalone PRs.
## Feature
- [x] Evaluate PR #792 (`feat: Replace MCP subprocess with persistent Chroma HTTP server` by @bigph00t). Files: 12 files. Major architectural change — replaces the MCP subprocess model for Chroma with a persistent HTTP server. Steps: (1) `gh pr checkout 792` (2) This is a significant architecture change. Review the approach: persistent HTTP server vs. MCP subprocess (3) Check resource implications (always-running server vs. on-demand subprocess) (4) Verify graceful shutdown and cross-platform support (5) Run `npm run build` (6) If the architecture is sound and well-tested: `gh pr merge 792 --rebase --delete-branch`. If concerns exist, leave detailed review comments.
- **Changes Requested** (2026-02-06): Architecture direction is sound — persistent HTTP server via `npx chroma run` with ChromaServerManager singleton is the right long-term approach. Eliminates per-operation subprocess overhead, re-enables Windows support, clean separation between server lifecycle and data operations. Build succeeds and all 34 existing Chroma tests pass. However, 4 critical blockers prevent merge: (1) **Zscaler SSL regression** — PR #884's `getCombinedCertPath()` and SSL env var injection are completely removed, breaking enterprise users behind corporate proxies; (2) **No tests** for 290-line ChromaServerManager (process spawn, heartbeat polling, cross-platform shutdown); (3) **Merge conflicts** with PRs #769, #884, #887 already merged to main; (4) **macOS untested** despite being primary dev platform. Additional concerns: ~180MB memory overhead, port 8000 default conflicts, Python dependency not actually removed (shifted to npx chroma run). Left detailed review comment requesting rebase, Zscaler SSL port, ChromaServerManager tests, and macOS verification.