feat: replace WASM embeddings with persistent chroma-mcp MCP connection (#1176)

* feat: replace WASM embeddings with persistent chroma-mcp MCP connection

Replace ChromaServerManager (npx chroma run + chromadb npm + ONNX/WASM)
with ChromaMcpManager, a singleton stdio MCP client that communicates with
chroma-mcp via uvx. This eliminates native binary issues, segfaults, and
WASM embedding failures that plagued cross-platform installs.

Key changes:
- Add ChromaMcpManager: singleton MCP client with lazy connect, auto-reconnect,
  connection lock, and Zscaler SSL cert support
- Rewrite ChromaSync to use MCP tool calls instead of chromadb npm client
- Handle chroma-mcp's non-JSON responses (plain text success/error messages)
- Treat "collection already exists" as idempotent success
- Wire ChromaMcpManager into GracefulShutdown for clean subprocess teardown
- Delete ChromaServerManager (no longer needed)

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix: address PR review — connection guard leak, timer leak, async reset

- Clear connecting guard in finally block to prevent permanent reconnection block
- Clear timeout after successful connection to prevent timer leak
- Make reset() async to await stop() before nullifying instance
- Delete obsolete chroma-server-manager test (imports deleted class)
- Update graceful-shutdown test to use chromaMcpManager property name

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix: prevent chroma-mcp spawn storm — zombie cleanup, stale onclose guard, reconnect backoff

Three bugs caused chroma-mcp processes to accumulate (92+ observed):

1. Zombie on timeout: failed connections left subprocess alive because
   only the timer was cleared, not the transport. Now catch block
   explicitly closes transport+client before rethrowing.

2. Stale onclose race: old transport's onclose handler captured `this`
   and overwrote the current connection reference after reconnect,
   orphaning the new subprocess. Now guarded with reference check.

3. No backoff: every failure triggered immediate reconnect. With
   backfill doing hundreds of MCP calls, this created rapid-fire
   spawning. Added 10s backoff on both connection failure and
   unexpected process death.

Also includes ChromaSync fixes from PR review:
- queryChroma deduplication now preserves index-aligned arrays
- SQL injection guard on backfill ID exclusion lists

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
This commit is contained in:
Alex Newman
2026-02-18 18:32:38 -05:00
committed by GitHub
parent 7e57b6e02d
commit 40daf8f3fa
13 changed files with 833 additions and 1189 deletions
@@ -30,9 +30,9 @@ export interface CloseableDatabase {
}
/**
* Stoppable service interface for Chroma server
* Stoppable service interface for ChromaMcpManager
*/
export interface StoppableServer {
export interface StoppableService {
stop(): Promise<void>;
}
@@ -44,7 +44,7 @@ export interface GracefulShutdownConfig {
sessionManager: ShutdownableService;
mcpClient?: CloseableClient;
dbManager?: CloseableDatabase;
chromaServer?: StoppableServer;
chromaMcpManager?: StoppableService;
}
/**
@@ -79,11 +79,11 @@ export async function performGracefulShutdown(config: GracefulShutdownConfig): P
logger.info('SYSTEM', 'MCP client closed');
}
// STEP 5: Stop Chroma server (local mode only)
if (config.chromaServer) {
logger.info('SHUTDOWN', 'Stopping Chroma server...');
await config.chromaServer.stop();
logger.info('SHUTDOWN', 'Chroma server stopped');
// STEP 5: Stop Chroma MCP connection
if (config.chromaMcpManager) {
logger.info('SHUTDOWN', 'Stopping Chroma MCP connection...');
await config.chromaMcpManager.stop();
logger.info('SHUTDOWN', 'Chroma MCP connection stopped');
}
// STEP 6: Close database connection (includes ChromaSync cleanup)