fa4ae3b9467df18e5eadf025876fe37cb4e4558b
6 Commits
| Author | SHA1 | Message | Date | |
|---|---|---|---|---|
|
|
8d166b47c1 |
Revert "revert: roll back v12.3.3 (Issue Blowout 2026)"
This reverts commit
|
||
|
|
bfc7de377a |
revert: roll back v12.3.3 (Issue Blowout 2026)
SessionStart context injection regressed in v12.3.3: no memory context is being delivered to new sessions. Rolling back to the v12.3.2 tree state while the regression is investigated.

Reverts #2080.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> |
||
|
|
ba1ef6c42c |
fix: Issue Blowout 2026 — 25 bugs across worker, hooks, security, and search (#2080)
* fix: resolve search, database, and docker bugs (#1913, #1916, #1956, #1957, #2048)

- Fix concept/concepts param mismatch in SearchManager.normalizeParams (#1916)
- Add FTS5 keyword fallback when ChromaDB is unavailable (#1913, #2048)
- Add periodic WAL checkpoint and journal_size_limit to prevent unbounded WAL growth (#1956)
- Add periodic clearFailed() to purge stale pending_messages (#1957)
- Fix nounset-safe TTY_ARGS expansion in docker/claude-mem/run.sh

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix: prevent silent data loss on non-XML responses, add queue info to /health (#1867, #1874)

- ResponseProcessor: mark messages as failed (with retry) instead of confirming when the LLM returns non-XML garbage (auth errors, rate limits) (#1874)
- Health endpoint: include activeSessions count for queue liveness monitoring (#1867)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix: cache isFts5Available() at construction time

Addresses Greptile review: avoid DDL probe (CREATE + DROP) on every text query. Result is now cached in _fts5Available at construction.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix: resolve worker stability bugs - pool deadlock, MCP loopback, restart guard (#1868, #1876, #2053)

- Replace flat consecutiveRestarts counter with time-windowed RestartGuard: only counts restarts within 60s window (cap=10), decays after 5min of success. Prevents stranding pending messages on long-running sessions. (#2053)
- Add idle session eviction to pool slot allocation: when all slots are full, evict the idlest session (no pending work, oldest activity) to free a slot for new requests, preventing 60s timeout deadlock. (#1868)
- Fix MCP loopback self-check: use process.execPath instead of bare 'node' which fails on non-interactive PATH. Fix crash misclassification by removing false "Generator exited unexpectedly" error log on normal completion. (#1876)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix: resolve hooks reliability bugs - summarize exit code, session-init health wait (#1896, #1901, #1903, #1907)

- Wrap summarize hook's workerHttpRequest in try/catch to prevent exit code 2 (blocking error) on network failures or malformed responses. Session exit no longer blocks on worker errors. (#1901)
- Add health-check wait loop to UserPromptSubmit session-init command in hooks.json. On Linux/WSL where hook ordering fires UserPromptSubmit before SessionStart, session-init now waits up to 10s for worker health before proceeding. Also wrap session-init HTTP call in try/catch. (#1907)
- Close #1896 as already-fixed: mtime comparison at file-context.ts:255-267 bypasses truncation when file is newer than latest observation.
- Close #1903 as no-repro: hooks.json correctly declares all hook events. Issue was Claude Code 12.0.1/macOS platform event-dispatch bug.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix: security hardening - bearer auth, path validation, rate limits, per-user port (#1932, #1933, #1934, #1935, #1936)

- Add bearer token auth to all API endpoints: auto-generated 32-byte token stored at ~/.claude-mem/worker-auth-token (mode 0600). All hook, MCP, viewer, and OpenCode requests include Authorization header. Health/readiness endpoints exempt for polling. (#1932, #1933)
- Add path traversal protection: watch.context.path validated against project root and ~/.claude-mem/ before write. Rejects ../../../etc style attacks. (#1934)
- Reduce JSON body limit from 50MB to 5MB. Add in-memory rate limiter (300 req/min/IP) to prevent abuse. (#1935)
- Derive default worker port from UID (37700 + uid%100) to prevent cross-user data leakage on multi-user macOS. Windows falls back to 37777. Shell hooks use same formula via id -u. (#1936)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix: resolve search project filtering and import Chroma sync (#1911, #1912, #1914, #1918)

- Fix per-type search endpoints to pass project filter to Chroma queries and SQLite hydration. searchObservations/Sessions/UserPrompts now use $or clause matching project + merged_into_project. (#1912)
- Fix timeline/search methods to pass project to Chroma anchor queries. Prevents cross-project result leakage when project param omitted. (#1911)
- Sync imported observations to ChromaDB after FTS rebuild. Import endpoint now calls chromaSync.syncObservation() for each imported row, making them visible to MCP search(). (#1914)
- Fix session-init cwd fallback to match context.ts (process.cwd()). Prevents project key mismatch that caused "no previous sessions" on fresh sessions. (#1918)
- Fix sync-marketplace restart to include auth token and per-user port.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix: resolve all CodeRabbit and Greptile review comments on PR #2080

- Fix run.sh comment mismatch (no-op flag vs empty array)
- Gate session-init on health check success (prevent running when worker unreachable)
- Fix date_desc ordering ignored in FTS session search
- Age-scope failed message purge (1h retention) instead of clearing all
- Anchor RestartGuard decay to real successes (null init, not Date.now())
- Add recordSuccess() calls in ResponseProcessor and completion path
- Prevent caller headers from overriding bearer auth token
- Add lazy cleanup for rate limiter map to prevent unbounded growth
- Bound post-import Chroma sync with concurrency limit of 8
- Add doc_type:'observation' filter to Chroma queries feeding observation hydration
- Add FTS fallback to all specialized search handlers (observations, sessions, prompts, timeline)
- Add response.ok check and error handling in viewer saveSettings

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix: resolve CodeRabbit round-2 review comments

- Use failure timestamp (COALESCE) instead of created_at_epoch for stale purge
- Downgrade _fts5Available flag when FTS table creation fails
- Escape FTS5 MATCH input by quoting user queries as literal phrases
- Escape LIKE metacharacters (%, _, \) in prompt text search
- Add response.ok check in initial settings load (matches save flow)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix: resolve CodeRabbit round-3 review comments

- Include failed_at_epoch in COALESCE for age-scoped purge
- Re-throw FTS5 errors so callers can distinguish failure from no-results
- Wrap all FTS fallback calls in SearchManager with try/catch

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com> |
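
The two escaping fixes from the round-2 review (quoting user queries as literal FTS5 phrases, and escaping LIKE metacharacters) can be illustrated with a pair of small helpers. This is a minimal sketch, not the repository's code: the names `toFts5Phrase` and `escapeLike` are hypothetical, the table and column names in the comments are placeholders, and the LIKE helper assumes the query is issued with an `ESCAPE '\'` clause.

```typescript
// Hypothetical helpers: names and signatures are illustrative only.

/**
 * Quote a user query as one FTS5 string literal so that operators such as
 * AND, OR, NEAR, "*" and ":" are matched as text instead of being parsed.
 */
function toFts5Phrase(userQuery: string): string {
  // Inside an FTS5 string, a double quote is escaped by doubling it.
  return '"' + userQuery.replace(/"/g, '""') + '"';
}

/**
 * Escape the LIKE metacharacters %, _ and \ so they match literally.
 * Assumes the SQL statement declares backslash as the escape character.
 */
function escapeLike(text: string): string {
  return text.replace(/[\\%_]/g, (ch) => "\\" + ch);
}

// Intended use with bound parameters (placeholder SQL, shown for context):
//   SELECT ... WHERE some_fts_table MATCH ?           <- toFts5Phrase(input)
//   SELECT ... WHERE some_column LIKE ? ESCAPE '\'    <- "%" + escapeLike(input) + "%"
```

Both helpers only prepare parameter values; they are meant to be combined with bound parameters, not string concatenation into the SQL text.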
||
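
Two of the security-hardening items in the commit above are concrete enough to sketch: the per-user port formula (37700 + uid % 100, with 37777 as the Windows fallback) and the in-memory rate limiter (300 requests per minute per IP, with lazy cleanup). The code below is an illustration under stated assumptions, not the project's implementation: every identifier is hypothetical, the limiter is assumed to use a fixed window (the commit message does not say which algorithm is used), and expired entries are swept on every call for simplicity.

```typescript
// Hypothetical sketch: all names are illustrative, not the repository's API.

const BASE_PORT = 37700;     // from the commit message: 37700 + uid % 100
const FALLBACK_PORT = 37777; // from the commit message: Windows fallback

/** Derive a per-user default port; pass undefined when no numeric UID exists. */
function defaultWorkerPort(uid: number | undefined): number {
  if (uid === undefined || !Number.isInteger(uid) || uid < 0) {
    return FALLBACK_PORT;
  }
  return BASE_PORT + (uid % 100);
}

interface WindowState {
  windowStart: number; // epoch ms at which the current window began
  count: number;       // requests seen in the current window
}

/** Fixed-window limiter keyed by client IP (assumed algorithm). */
class RateLimiter {
  private readonly hits = new Map<string, WindowState>();
  private readonly limit: number;
  private readonly windowMs: number;

  constructor(limit: number = 300, windowMs: number = 60000) {
    this.limit = limit;
    this.windowMs = windowMs;
  }

  /** Returns true if the request is allowed; `now` is injectable for tests. */
  allow(ip: string, now: number = Date.now()): boolean {
    this.sweep(now);
    const state = this.hits.get(ip);
    if (state === undefined) {
      this.hits.set(ip, { windowStart: now, count: 1 });
      return true;
    }
    state.count += 1;
    return state.count <= this.limit;
  }

  /** Number of clients currently tracked. */
  size(): number {
    return this.hits.size;
  }

  /** Lazy cleanup: drop expired windows so the map cannot grow without bound. */
  private sweep(now: number): void {
    this.hits.forEach((state, ip) => {
      if (now - state.windowStart >= this.windowMs) {
        this.hits.delete(ip);
      }
    });
  }
}
```

In Node, the UID could come from `process.getuid?.()`, which is not available on Windows and therefore yields the fallback port. Note that two users whose UIDs differ by a multiple of 100 would still derive the same port under this formula.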
|
|
be99a5d690 |
fix: resolve search, database, and docker bugs (#2079)
* fix: resolve search, database, and docker bugs (#1913, #1916, #1956, #1957, #2048)

- Fix concept/concepts param mismatch in SearchManager.normalizeParams (#1916)
- Add FTS5 keyword fallback when ChromaDB is unavailable (#1913, #2048)
- Add periodic WAL checkpoint and journal_size_limit to prevent unbounded WAL growth (#1956)
- Add periodic clearFailed() to purge stale pending_messages (#1957)
- Fix nounset-safe TTY_ARGS expansion in docker/claude-mem/run.sh

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix: prevent silent data loss on non-XML responses, add queue info to /health (#1867, #1874)

- ResponseProcessor: mark messages as failed (with retry) instead of confirming when the LLM returns non-XML garbage (auth errors, rate limits) (#1874)
- Health endpoint: include activeSessions count for queue liveness monitoring (#1867)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix: cache isFts5Available() at construction time

Addresses Greptile review: avoid DDL probe (CREATE + DROP) on every text query. Result is now cached in _fts5Available at construction.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com> |
||
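
The commit above describes two related ideas: keyword search as a fallback when ChromaDB is unavailable, and probing FTS5 support once at construction instead of on every text query. The sketch below shows how these might fit together. It is an assumption-laden illustration: `SearchManagerSketch`, its constructor parameters, and the `Search` type are hypothetical, and passing `null` for the vector backend stands in for "ChromaDB is unavailable".

```typescript
// Hypothetical sketch: class, type, and parameter names are illustrative only.

type Search = (query: string) => string[];

class SearchManagerSketch {
  // Probed once; the probe is assumed to be expensive (e.g. CREATE + DROP).
  private readonly fts5Available: boolean;
  private readonly vectorSearch: Search | null;
  private readonly keywordSearch: Search;

  constructor(
    probeFts5: () => boolean,
    vectorSearch: Search | null, // null models an unavailable vector backend
    keywordSearch: Search,
  ) {
    this.fts5Available = probeFts5(); // cached for the object's lifetime
    this.vectorSearch = vectorSearch;
    this.keywordSearch = keywordSearch;
  }

  search(query: string): string[] {
    if (this.vectorSearch !== null) {
      return this.vectorSearch(query);
    }
    if (this.fts5Available) {
      return this.keywordSearch(query); // keyword fallback
    }
    return []; // neither backend is usable
  }
}
```

Because the flag is read-only here, this sketch does not model the later fix that downgrades the cached flag when FTS table creation fails; that would need a mutable field.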
|
|
c9adb1c77b |
docs: add README for docker/claude-mem harness
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> |
||
|
|
97c7c999b1 |
feat: basic claude-mem Docker container for easy spin-up (#2076)
* feat(evals): SWE-bench Docker scaffolding for claude-mem resolve-rate measurement

Adds evals/swebench/ scaffolding per .claude/plans/swebench-claude-mem-docker.md. Agent image builds Claude Code 2.1.114 + locally-built claude-mem plugin; run-instance.sh executes the two-turn ingest/fix protocol per instance; run-batch.py orchestrates parallel Docker runs with per-instance isolation; eval.sh wraps the upstream SWE-bench harness; summarize.py aggregates reports.

Orchestrator owns JSONL writes under a lock to avoid racy concurrent appends; agent writes its authoritative diff to CLAUDE_MEM_OUTPUT_DIR (/scratch in container mode) and the orchestrator reads it back.

Scaffolding only: no Docker build or smoke test run yet.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* feat(evals): OAuth credential mounting for Claude Max/Pro subscriptions

Skips per-call API billing by extracting OAuth creds from host Keychain (macOS) or ~/.claude/.credentials.json (Linux) and bind-mounting them read-only into each agent container. Creds are copied into HOME=$SCRATCH/.claude at container start so the per-instance isolation model still holds.

Adds run-batch.py --auth {oauth,api-key,auto} (auto prefers OAuth, falls back to API key). run-instance.sh accepts either ANTHROPIC_API_KEY or CLAUDE_MEM_CREDENTIALS_FILE. smoke-test.sh runs one instance end-to-end using OAuth for quick verification before batch runs.

Caveat surfaced in docstrings: Max/Pro has per-window usage limits and is framed for individual developer use; batch evaluation may exhaust the quota or raise compliance questions.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* feat(docker): basic claude-mem container for ad-hoc testing

Adds docker/claude-mem/ with a fresh spin-up image:

- Dockerfile: FROM node:20 (reproduces anthropics/claude-code .devcontainer pattern; Anthropic ships the Dockerfile, not a pullable image); layers Bun + uv + locally-built plugin/; runs as non-root node user
- entrypoint.sh: seeds OAuth creds from CLAUDE_MEM_CREDENTIALS_FILE into $HOME/.claude/.credentials.json, then exec's the command (default: bash)
- build.sh: npm run build + docker build
- run.sh: interactive launcher; auto-extracts OAuth from macOS Keychain (security find-generic-password) or ~/.claude/.credentials.json on Linux, mounts host .docker-claude-mem-data/ at /home/node/.claude-mem so the observations DB survives container exit

Validated end-to-end: PostToolUse hook fires, queue enqueues, worker's SDK compression runs under subscription OAuth, observations row lands with populated facts/concepts/files_read, Chroma sync triggers.

Also updates .gitignore/.dockerignore for the new runtime-output paths. Built plugin artifacts refreshed by the build step.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix(evals/swebench): non-root user, OAuth mount, Lite dataset default

- Dockerfile.agent: switch to non-root `node` user (uid 1000); Claude Code refuses --permission-mode bypassPermissions when euid==0, which made every agent run exit 1 before producing a diff. Also move Bun + uv installs to system paths so the non-root user can exec them.
- run-batch.py: add extract_oauth_credentials() that pulls from macOS Keychain / Linux ~/.claude/.credentials.json into a temp file and bind-mounts it at /auth/.credentials.json:ro with CLAUDE_MEM_CREDENTIALS_FILE. New --auth {oauth,api-key,auto} flag. New --dataset flag so the batch can target SWE-bench_Lite without editing the script.
- smoke-test.sh: default DATASET to princeton-nlp/SWE-bench_Lite (Lite contains sympy__sympy-24152, Verified does not); accept DATASET env override.

Caveat surfaced during testing: Max/Pro subscriptions have per-window usage limits; running 5 instances in parallel with the "read every source file" ingest prompt exhausted the 5h window within ~25 minutes (3/5 hit HTTP 429).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix: address PR #2076 review comments

- docker/claude-mem/run.sh: chmod 600 (not 644) on extracted OAuth creds to match what `claude login` writes; avoids exposing tokens to other host users. Verified readable inside the container under Docker Desktop's UID translation.
- docker/claude-mem/Dockerfile: pin Bun + uv via --build-arg BUN_VERSION / UV_VERSION (defaults: 1.3.12, 0.11.7). Bun via `bash -s "bun-v<V>"`; uv via versioned installer URL `https://astral.sh/uv/<V>/install.sh`.
- evals/swebench/smoke-test.sh: pipe JSON through stdin to `python3 -c` so paths with spaces/special chars can't break shell interpolation.
- evals/swebench/run-batch.py: add --overwrite flag; abort by default when predictions.jsonl for the run-id already exists, preventing accidental silent discard of partial results.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix: address coderabbit review on PR #2076

Actionable (4):

- Dockerfile uv install: wrap `chmod ... || true` in braces so the trailing `|| true` no longer masks failures from `curl|sh`. In a shell list, `&&` and `||` have equal precedence and associate left to right, so an unbraced `a && b || true` groups as `(a && b) || true`. Applied to both docker/claude-mem/ and evals/swebench/Dockerfile.agent. Added `set -eux` to the RUN lines.
- docker/claude-mem/Dockerfile: drop unused `sudo` apt package (~2 MB).
- run-batch.py: name each agent container (`swebench-agent-<id>-<pid>-<tid>`) and force-remove via `docker rm -f <name>` in the TimeoutExpired handler so timed-out runs don't leave orphan containers.

Nitpicks (2):

- smoke-test.sh: collapse 3 python3 invocations into 1: parse the instance JSON once, print `repo base_commit`, and write problem.txt in the same call.
- run-instance.sh: shallow clone via `--depth 1 --no-single-branch` + `fetch --depth 1 origin $BASE_COMMIT`. Falls back to a full clone if the server rejects the by-commit fetch.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix: address second coderabbit review on PR #2076

Actionable (3):

- docker/claude-mem/run.sh: on macOS, fall back to ~/.claude/.credentials.json when the Keychain lookup misses (some setups still have file-only creds). Unified into a single creds_obtained gate so the error surface lists both sources tried.
- docker/claude-mem/run.sh: drop `exec docker run`; `exec` replaces the shell so the EXIT trap (`rm -f "$CREDS_FILE"`) never fires and the extracted OAuth JSON leaks to disk until tmpfs cleanup. Run as a child instead so the trap runs on exit.
- evals/swebench/smoke-test.sh: actually enforce the TIMEOUT env var. Pick `timeout` or `gtimeout` (coreutils on macOS), fall back to uncapped with a warning. Name the container so exit-124 from timeout can `docker rm -f` it deterministically.

Nitpick from the same review (consolidated python3 calls in smoke-test.sh) was already addressed in the prior commit ef621e00.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix: address third coderabbit review on PR #2076

Actionable (1):

- evals/swebench/smoke-test.sh: the consolidated python heredoc had competing stdin redirections: `<<'PY'` (script body) AND `< "$INSTANCE_JSON"` (data). The heredoc won, so `json.load(sys.stdin)` saw an empty stream and the parse would have failed at runtime. Pass INSTANCE_JSON as argv[2] and `open()` it inside the script instead; the heredoc is now only the script body, which is what `python3 -` needs.

Nitpicks (2):

- evals/swebench/smoke-test.sh: macOS Keychain lookup now falls through to ~/.claude/.credentials.json on miss (matches docker/claude-mem/run.sh).
- evals/swebench/run-batch.py: extract_oauth_credentials() no longer early-returns on Darwin keychain miss; falls through to the on-disk creds file so macOS setups with file-only credentials work in batch mode too.

Functional spot-check of the parse fix confirmed: REPO/BASE_COMMIT populated and problem.txt written from a synthetic INSTANCE_JSON.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com> |