fix: remove ONNX/OpenBLAS thread cap from chroma-mcp spawn env

The 2-thread cap was a band-aid for the CPU runaway reports on v12.4.9,
#2220 (Windows) and #2253 (macOS Intel). The actual root causes (watermark stuck
at 0 → continuous re-embed, orphan process trees, fire-and-forget backfill
across 80+ projects) were fixed structurally in #2282: per-batch watermark
persistence, killProcessTree() + pgid registration, max-3 concurrent
backfills with re-entrancy guard, kernel-enforced child cleanup (#2216).

With the structural fixes in place, capping ONNX/OpenBLAS/MKL at 2 threads
slows initial backfill 3–6× on multi-core machines and provides no
steady-state benefit. Defer to the OS scheduler and the user's environment.

ANONYMIZED_TELEMETRY=false stays: it is unrelated to the storm and
blocks background HTTP from the embedding subprocess.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Alex Newman
2026-05-04 13:08:53 -07:00
parent 43037782a8
commit 39f1102600
2 changed files with 1 additions and 10 deletions
@@ -588,15 +588,6 @@ export class ChromaMcpManager {
 }
 }
-// Cap embedding-thread fanout. ONNX Runtime / OpenBLAS / MKL all default to
-// cpu_count(), so a 12-core box runs 12 threads burning embeddings in
-// parallel — the dominant cause of the chroma-mcp CPU storm on Windows
-// (#2220). Two threads keeps backfill latency reasonable without saturating
-// the box. Only set if the user hasn't pinned them explicitly.
-const threadCap = '2';
-for (const key of ['OMP_NUM_THREADS', 'ONNX_NUM_THREADS', 'OPENBLAS_NUM_THREADS', 'MKL_NUM_THREADS']) {
-if (!baseEnv[key]) baseEnv[key] = threadCap;
-}
 // Disable Chroma's anonymous telemetry — it issues background HTTP from
 // the embedding subprocess on every collection touch.
 if (!baseEnv.ANONYMIZED_TELEMETRY) baseEnv.ANONYMIZED_TELEMETRY = 'false';
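
For reviewers, the resulting behavior can be sketched as a standalone function. This is only an illustration under stated assumptions: `buildSpawnEnv`, the `Env` type, and `THREAD_KEYS` are hypothetical names, and the real `ChromaMcpManager` code surrounding this hunk is not shown in the diff. The point is that the thread variables now pass through exactly as the user set them (or stay unset), while the telemetry default is still applied.

```typescript
// Hypothetical sketch; not the actual ChromaMcpManager implementation.
type Env = Record<string, string | undefined>;

// The four variables the removed block used to cap at '2'.
const THREAD_KEYS = [
  'OMP_NUM_THREADS',
  'ONNX_NUM_THREADS',
  'OPENBLAS_NUM_THREADS',
  'MKL_NUM_THREADS',
] as const;

function buildSpawnEnv(parentEnv: Env): Env {
  // Copy so the caller's environment object is never mutated.
  const baseEnv: Env = { ...parentEnv };

  // No thread cap is injected: values pinned by the user are inherited
  // unchanged, and unset values stay unset so each library picks its
  // own default.

  // Telemetry default is kept, and only applied when the user has not
  // set the variable explicitly.
  if (!baseEnv.ANONYMIZED_TELEMETRY) baseEnv.ANONYMIZED_TELEMETRY = 'false';
  return baseEnv;
}

// Usage: a user-pinned value survives, nothing else is added.
const pinned = buildSpawnEnv({ OMP_NUM_THREADS: '4' });
console.log(pinned.OMP_NUM_THREADS, pinned.MKL_NUM_THREADS, pinned.ANONYMIZED_TELEMETRY);
```

Users who still want a cap on a particular machine can export any of the four variables before launch; under this sketch they are inherited as-is.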