feat: basic claude-mem Docker container for easy spin-up (#2076)
* feat(evals): SWE-bench Docker scaffolding for claude-mem resolve-rate measurement

Adds evals/swebench/ scaffolding per .claude/plans/swebench-claude-mem-docker.md.
Agent image builds Claude Code 2.1.114 + locally-built claude-mem plugin;
run-instance.sh executes the two-turn ingest/fix protocol per instance;
run-batch.py orchestrates parallel Docker runs with per-instance isolation;
eval.sh wraps the upstream SWE-bench harness; summarize.py aggregates reports.

Orchestrator owns JSONL writes under a lock to avoid racy concurrent appends;
agent writes its authoritative diff to CLAUDE_MEM_OUTPUT_DIR (/scratch in
container mode) and the orchestrator reads it back. Scaffolding only: no
Docker build or smoke test run yet.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* feat(evals): OAuth credential mounting for Claude Max/Pro subscriptions

Skips per-call API billing by extracting OAuth creds from host Keychain
(macOS) or ~/.claude/.credentials.json (Linux) and bind-mounting them
read-only into each agent container. Creds are copied into
HOME=$SCRATCH/.claude at container start so the per-instance isolation model
still holds.

Adds run-batch.py --auth {oauth,api-key,auto} (auto prefers OAuth, falls back
to API key). run-instance.sh accepts either ANTHROPIC_API_KEY or
CLAUDE_MEM_CREDENTIALS_FILE. smoke-test.sh runs one instance end-to-end using
OAuth for quick verification before batch runs.

Caveat surfaced in docstrings: Max/Pro has per-window usage limits and is
framed for individual developer use; batch evaluation may exhaust the quota
or raise compliance questions.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* feat(docker): basic claude-mem container for ad-hoc testing

Adds docker/claude-mem/ with a fresh spin-up image:
- Dockerfile: FROM node:20 (reproduces anthropics/claude-code .devcontainer
  pattern; Anthropic ships the Dockerfile, not a pullable image); layers
  Bun + uv + locally-built plugin/; runs as non-root node user
- entrypoint.sh: seeds OAuth creds from CLAUDE_MEM_CREDENTIALS_FILE into
  $HOME/.claude/.credentials.json, then exec's the command (default: bash)
- build.sh: npm run build + docker build
- run.sh: interactive launcher; auto-extracts OAuth from macOS Keychain
  (security find-generic-password) or ~/.claude/.credentials.json on Linux,
  mounts host .docker-claude-mem-data/ at /home/node/.claude-mem so the
  observations DB survives container exit

Validated end-to-end: PostToolUse hook fires, queue enqueues, worker's SDK
compression runs under subscription OAuth, observations row lands with
populated facts/concepts/files_read, Chroma sync triggers.

Also updates .gitignore/.dockerignore for the new runtime-output paths.
Built plugin artifacts refreshed by the build step.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix(evals/swebench): non-root user, OAuth mount, Lite dataset default

- Dockerfile.agent: switch to non-root `node` user (uid 1000); Claude Code
  refuses --permission-mode bypassPermissions when euid==0, which made every
  agent run exit 1 before producing a diff. Also move Bun + uv installs to
  system paths so the non-root user can exec them.
- run-batch.py: add extract_oauth_credentials() that pulls from macOS
  Keychain / Linux ~/.claude/.credentials.json into a temp file and
  bind-mounts it at /auth/.credentials.json:ro with
  CLAUDE_MEM_CREDENTIALS_FILE. New --auth {oauth,api-key,auto} flag. New
  --dataset flag so the batch can target SWE-bench_Lite without editing the
  script.
- smoke-test.sh: default DATASET to princeton-nlp/SWE-bench_Lite (Lite
  contains sympy__sympy-24152, Verified does not); accept DATASET env
  override.

Caveat surfaced during testing: Max/Pro subscriptions have per-window usage
limits; running 5 instances in parallel with the "read every source file"
ingest prompt exhausted the 5h window within ~25 minutes (3/5 hit HTTP 429).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix: address PR #2076 review comments

- docker/claude-mem/run.sh: chmod 600 (not 644) on extracted OAuth creds to
  match what `claude login` writes; avoids exposing tokens to other host
  users. Verified readable inside the container under Docker Desktop's UID
  translation.
- docker/claude-mem/Dockerfile: pin Bun + uv via --build-arg BUN_VERSION /
  UV_VERSION (defaults: 1.3.12, 0.11.7). Bun via `bash -s "bun-v<V>"`; uv via
  versioned installer URL `https://astral.sh/uv/<V>/install.sh`.
- evals/swebench/smoke-test.sh: pipe JSON through stdin to `python3 -c` so
  paths with spaces/special chars can't break shell interpolation.
- evals/swebench/run-batch.py: add --overwrite flag; abort by default when
  predictions.jsonl for the run-id already exists, preventing accidental
  silent discard of partial results.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix: address coderabbit review on PR #2076

Actionable (4):
- Dockerfile uv install: wrap `chmod ... || true` in braces so the trailing
  `|| true` no longer masks failures from `curl|sh` (`&&` and `||` have equal
  precedence in the shell and group left to right, so `a && b || true` also
  swallows a failure of `a`). Applied to both docker/claude-mem/ and
  evals/swebench/Dockerfile.agent. Added `set -eux` to the RUN lines.
- docker/claude-mem/Dockerfile: drop unused `sudo` apt package (~2 MB).
- run-batch.py: name each agent container (`swebench-agent-<id>-<pid>-<tid>`)
  and force-remove via `docker rm -f <name>` in the TimeoutExpired handler so
  timed-out runs don't leave orphan containers.

Nitpicks (2):
- smoke-test.sh: collapse 3 python3 invocations into 1: parse the instance
  JSON once, print `repo base_commit`, and write problem.txt in the same call.
- run-instance.sh: shallow clone via `--depth 1 --no-single-branch` +
  `fetch --depth 1 origin $BASE_COMMIT`. Falls back to a full clone if the
  server rejects the by-commit fetch.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix: address second coderabbit review on PR #2076

Actionable (3):
- docker/claude-mem/run.sh: on macOS, fall back to
  ~/.claude/.credentials.json when the Keychain lookup misses (some setups
  still have file-only creds). Unified into a single creds_obtained gate so
  the error surface lists both sources tried.
- docker/claude-mem/run.sh: drop `exec docker run`: `exec` replaces the shell
  so the EXIT trap (`rm -f "$CREDS_FILE"`) never fires and the extracted OAuth
  JSON leaks to disk until tmpfs cleanup. Run as a child instead so the trap
  runs on exit.
- evals/swebench/smoke-test.sh: actually enforce the TIMEOUT env var. Pick
  `timeout` or `gtimeout` (coreutils on macOS), fall back to uncapped with a
  warning. Name the container so exit-124 from timeout can `docker rm -f` it
  deterministically.

Nitpick from the same review (consolidated python3 calls in smoke-test.sh)
was already addressed in the prior commit ef621e00.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix: address third coderabbit review on PR #2076

Actionable (1):
- evals/swebench/smoke-test.sh: the consolidated python heredoc had competing
  stdin redirections: `<<'PY'` (script body) AND `< "$INSTANCE_JSON"` (data).
  The heredoc won, so `json.load(sys.stdin)` saw an empty stream and the parse
  would have failed at runtime. Pass INSTANCE_JSON as argv[2] and `open()` it
  inside the script instead; the heredoc is now only the script body, which is
  what `python3 -` needs.

Nitpicks (2):
- evals/swebench/smoke-test.sh: macOS Keychain lookup now falls through to
  ~/.claude/.credentials.json on miss (matches docker/claude-mem/run.sh).
- evals/swebench/run-batch.py: extract_oauth_credentials() no longer
  early-returns on Darwin keychain miss; falls through to the on-disk creds
  file so macOS setups with file-only credentials work in batch mode too.

Functional spot-check of the parse fix confirmed: REPO/BASE_COMMIT populated
and problem.txt written from a synthetic INSTANCE_JSON.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
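The TIMEOUT handling described above for evals/swebench/smoke-test.sh (pick `timeout` or `gtimeout`, otherwise run uncapped with a warning) could look roughly like the sketch below. smoke-test.sh is not part of this diff, so the function names `pick_timeout_cmd` and `run_capped` are hypothetical and the real script may be structured differently.

```shell
# Hypothetical sketch; not the actual smoke-test.sh code.

# Print the name of the first available timeout binary; fail if none exists.
pick_timeout_cmd() {
  local candidate
  for candidate in timeout gtimeout; do
    if command -v "$candidate" >/dev/null 2>&1; then
      printf '%s\n' "$candidate"
      return 0
    fi
  done
  return 1
}

# Run the given command under a limit of $1 seconds when a timeout binary is
# available; otherwise warn on stderr and run it without a cap.
run_capped() {
  local limit="$1"
  shift
  local timeout_cmd
  if timeout_cmd="$(pick_timeout_cmd)"; then
    "$timeout_cmd" "$limit" "$@"
  else
    echo "WARN: neither timeout nor gtimeout found; running uncapped" >&2
    "$@"
  fi
}

run_capped 5 echo "ran under the cap"
```

When the limit is hit, `timeout` exits with status 124, which is the value the commit message keys the `docker rm -f` cleanup on.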
@@ -0,0 +1,93 @@
# Basic claude-mem container for ad-hoc testing.
#
# Base layout mirrors anthropics/claude-code .devcontainer
# (https://github.com/anthropics/claude-code/blob/main/.devcontainer/Dockerfile):
# FROM node:20, non-root `node` user, global npm install of @anthropic-ai/claude-code.
# We skip the firewall/zsh/fzf/delta/git-hist noise since this image is for
# exercising claude-mem, not as a full dev environment.
#
# On top of that base we install:
#   - Bun (claude-mem worker service runtime)
#   - uv (provides Python for Chroma per CLAUDE.md)
#   - The locally-built plugin/ tree at /opt/claude-mem
#
# Usage:
#   docker build -f docker/claude-mem/Dockerfile -t claude-mem:basic .
#   docker run --rm -it \
#     -v $(mktemp -d):/home/node/.claude-mem \
#     -e CLAUDE_MEM_CREDENTIALS_FILE=/auth/.credentials.json \
#     -v /path/to/extracted/creds.json:/auth/.credentials.json:ro \
#     claude-mem:basic

FROM node:20

ENV DEBIAN_FRONTEND=noninteractive

RUN apt-get update \
    && apt-get install -y --no-install-recommends \
        git \
        curl \
        ca-certificates \
        unzip \
        jq \
        less \
        procps \
        uuid-runtime \
        sqlite3 \
    && apt-get clean && rm -rf /var/lib/apt/lists/*

# Bun: system-wide so the unprivileged `node` user can execute it.
# Pin via --build-arg BUN_VERSION=X.Y.Z; default is the version verified at PR time.
ARG BUN_VERSION=1.3.12
ENV BUN_INSTALL="/usr/local/bun"
RUN curl -fsSL https://bun.sh/install | bash -s "bun-v${BUN_VERSION}" \
    && chmod -R a+rX /usr/local/bun
ENV PATH="/usr/local/bun/bin:${PATH}"

# uv: system-wide, for Chroma's Python runtime. Pin via --build-arg UV_VERSION=X.Y.Z.
# Versioned installer URL per https://docs.astral.sh/uv/getting-started/installation/.
ARG UV_VERSION=0.11.7
ENV UV_INSTALL_DIR="/usr/local/bin"
# `&&` and `||` have equal precedence and group left to right in the shell, so
# the previous form (`curl|sh && chmod ... || true`) let a `curl|sh` failure be
# swallowed by the trailing `|| true`. Group the chmod so tolerated failure is
# scoped to perms-fixing only.
RUN set -eux \
    && curl -LsSf "https://astral.sh/uv/${UV_VERSION}/install.sh" | sh \
    && { chmod a+rX /usr/local/bin/uv /usr/local/bin/uvx 2>/dev/null || true; }

# Match the upstream devcontainer's npm-global prefix so `npm install -g`
# targets a dir the `node` user owns.
RUN mkdir -p /usr/local/share/npm-global \
    && chown -R node:node /usr/local/share/npm-global
ENV NPM_CONFIG_PREFIX=/usr/local/share/npm-global
ENV PATH="/usr/local/share/npm-global/bin:${PATH}"

# Claude Code CLI. Override at build-time with --build-arg CLAUDE_CODE_VERSION=X.Y.Z
# to pin; default tracks latest.
ARG CLAUDE_CODE_VERSION=latest
USER node
RUN npm install -g @anthropic-ai/claude-code@${CLAUDE_CODE_VERSION}

# Locally-built claude-mem plugin. COPY runs as root by default and layers are
# cached, so put this after the npm install so iterating on the plugin doesn't
# invalidate the CLI install layer.
USER root
COPY plugin/ /opt/claude-mem/
RUN chown -R node:node /opt/claude-mem

# Persistent mount points for ad-hoc testing: mount a host dir at either of
# these to inspect the claude-mem DB after a session.
RUN mkdir -p /home/node/.claude /home/node/.claude-mem \
    && chown -R node:node /home/node/.claude /home/node/.claude-mem

USER node
WORKDIR /home/node

# Helper: copies OAuth creds out of the read-only mount into $HOME/.claude/
# before exec'ing whatever you asked for. Saves the "cp + chmod" dance every
# time you drop in.
COPY --chown=node:node docker/claude-mem/entrypoint.sh /usr/local/bin/claude-mem-entrypoint
RUN chmod +x /usr/local/bin/claude-mem-entrypoint

ENTRYPOINT ["/usr/local/bin/claude-mem-entrypoint"]
CMD ["bash"]
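The grouping issue called out in the uv install comment can be reproduced outside Docker. The sketch below uses two stand-in functions (`install_step` and `fix_perms`, both hypothetical names) in place of the real `curl | sh` and `chmod` steps; it only illustrates how the shell groups the list and is not taken from the repository.

```shell
# Stand-ins for the real steps: the installer fails, the permission fix succeeds.
install_step() { return 1; }
fix_perms() { return 0; }

# Ungrouped: evaluated as (install_step && fix_perms) || true, so the failed
# install is reported as success.
ungrouped() { install_step && fix_perms || true; }

# Grouped: `|| true` applies to fix_perms only; the install failure propagates.
grouped() { install_step && { fix_perms || true; }; }

if ungrouped; then
  echo "ungrouped: success (failure masked)"
else
  echo "ungrouped: failure"
fi

if grouped; then
  echo "grouped: success"
else
  echo "grouped: failure (propagated)"
fi
```

With the braces in place, a broken installer download fails the RUN step, while a missing `uvx` binary during the chmod is still tolerated.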
@@ -0,0 +1,24 @@
#!/usr/bin/env bash
# Build the basic claude-mem Docker image from the current worktree.
#
# Usage:
#   docker/claude-mem/build.sh                  # builds claude-mem:basic
#   TAG=my-tag docker/claude-mem/build.sh       # override the tag
set -euo pipefail

SCRIPT_DIR="$(cd "$(dirname "${BASH_SOURCE[0]}")" && pwd)"
REPO_ROOT="$(cd "$SCRIPT_DIR/../.." && pwd)"
TAG="${TAG:-claude-mem:basic}"

cd "$REPO_ROOT"

echo "[build] npm run build"
npm run build

echo "[build] docker build -t $TAG"
docker build \
  -f docker/claude-mem/Dockerfile \
  -t "$TAG" \
  "$REPO_ROOT"

echo "[build] done: $TAG"
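build.sh as written does not forward the version pins that the Dockerfile exposes as build args. As a sketch only, a helper along the following lines could assemble the `docker build` command with optional `--build-arg` pins. The function name `build_cmd` is hypothetical, reading the pins from same-named environment variables is an assumption, and the function prints the command instead of running it.

```shell
# Hypothetical helper: print a docker build command with optional version pins.
# BUN_VERSION, UV_VERSION and CLAUDE_CODE_VERSION are the ARG names declared in
# the Dockerfile; taking them from the environment is an assumption.
build_cmd() {
  set -- docker build -f docker/claude-mem/Dockerfile -t "${TAG:-claude-mem:basic}"
  if [ -n "${BUN_VERSION:-}" ]; then
    set -- "$@" --build-arg "BUN_VERSION=$BUN_VERSION"
  fi
  if [ -n "${UV_VERSION:-}" ]; then
    set -- "$@" --build-arg "UV_VERSION=$UV_VERSION"
  fi
  if [ -n "${CLAUDE_CODE_VERSION:-}" ]; then
    set -- "$@" --build-arg "CLAUDE_CODE_VERSION=$CLAUDE_CODE_VERSION"
  fi
  # The build context is the repo root, written here as "." for brevity.
  set -- "$@" .
  printf '%s\n' "$*"
}

build_cmd
```

Unset pins are simply omitted, so the Dockerfile defaults (1.3.12 for Bun, 0.11.7 for uv, latest for the CLI) still apply.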
@@ -0,0 +1,28 @@
#!/usr/bin/env bash
# Entrypoint for the basic claude-mem container. Seeds OAuth creds if a
# credentials file is mounted, then exec's whatever was passed (default: bash).
#
# Env vars:
#   CLAUDE_MEM_CREDENTIALS_FILE  Path to a mounted OAuth credentials JSON file
#                                (e.g. /auth/.credentials.json). Copied into
#                                $HOME/.claude/.credentials.json at startup.
#   ANTHROPIC_API_KEY            Standard API-key auth; set when OAuth isn't used.

set -euo pipefail

mkdir -p "$HOME/.claude" "$HOME/.claude-mem"

if [[ -n "${CLAUDE_MEM_CREDENTIALS_FILE:-}" ]]; then
  if [[ ! -f "$CLAUDE_MEM_CREDENTIALS_FILE" ]]; then
    echo "ERROR: CLAUDE_MEM_CREDENTIALS_FILE set but file missing: $CLAUDE_MEM_CREDENTIALS_FILE" >&2
    exit 1
  fi
  cp "$CLAUDE_MEM_CREDENTIALS_FILE" "$HOME/.claude/.credentials.json"
  chmod 600 "$HOME/.claude/.credentials.json"
fi

# Keep Bun and the npm-global bin dir (which provides `claude`) on PATH so
# interactive users can run e.g. `claude --plugin-dir /opt/claude-mem` directly.
# Nothing is forced: `exec "$@"` runs whatever command was passed.
export PATH="/usr/local/bun/bin:/usr/local/share/npm-global/bin:$PATH"

exec "$@"
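The entrypoint finishes with `exec "$@"` instead of running the command as a child. As general shell behaviour (not something the commit states), `exec` replaces the current process, so the requested command takes over the entrypoint's PID. The sketch below shows that effect with plain `sh`; the function name `same_pid_after_exec` is hypothetical.

```shell
# The outer shell prints its PID, then exec's a second shell that prints its
# own PID. Because exec replaces the process instead of forking, both match.
same_pid_after_exec() {
  sh -c 'echo "$$"; exec sh -c "echo \$\$"'
}

same_pid_after_exec
```

In the container this means that, unless an init wrapper is added, the command passed to `docker run` becomes the process that receives signals such as the one sent by `docker stop`.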
@@ -0,0 +1,69 @@
#!/usr/bin/env bash
# Drop into an interactive claude-mem container with OAuth creds + persistent
# memory volume. For ad-hoc testing / poking around.
#
# Usage:
#   docker/claude-mem/run.sh
#   docker/claude-mem/run.sh claude --plugin-dir /opt/claude-mem --print "hi"
#
# On exit, the mounted .claude-mem/ dir on the host survives so you can inspect
# the DB: `sqlite3 <HOST_MEM_DIR>/claude-mem.db 'select count(*) from observations'`.
set -euo pipefail

SCRIPT_DIR="$(cd "$(dirname "${BASH_SOURCE[0]}")" && pwd)"
REPO_ROOT="$(cd "$SCRIPT_DIR/../.." && pwd)"
TAG="${TAG:-claude-mem:basic}"

HOST_MEM_DIR="${HOST_MEM_DIR:-$REPO_ROOT/.docker-claude-mem-data}"
mkdir -p "$HOST_MEM_DIR"
echo "[run] host .claude-mem dir: $HOST_MEM_DIR" >&2

# Auth. Prefer OAuth (extracted from macOS Keychain / Linux creds file);
# fall back to ANTHROPIC_API_KEY env.
CREDS_FILE=""
CREDS_MOUNT_ARGS=()
if [[ -z "${ANTHROPIC_API_KEY:-}" ]]; then
  CREDS_FILE="$(mktemp -t claude-mem-creds.XXXXXX.json)"
  trap 'rm -f "$CREDS_FILE"' EXIT

  # Try macOS Keychain first (primary storage on Darwin), then fall back to
  # the on-disk credentials file; some macOS setups (older CLI versions,
  # users who migrated machines) still have the file-only form.
  creds_obtained=0
  if [[ "$(uname)" == "Darwin" ]]; then
    if security find-generic-password -s 'Claude Code-credentials' -w > "$CREDS_FILE" 2>/dev/null \
        && [[ -s "$CREDS_FILE" ]]; then
      creds_obtained=1
    fi
  fi
  if [[ "$creds_obtained" -eq 0 && -f "$HOME/.claude/.credentials.json" ]]; then
    cp "$HOME/.claude/.credentials.json" "$CREDS_FILE"
    creds_obtained=1
  fi
  if [[ "$creds_obtained" -eq 0 ]]; then
    echo "ERROR: no ANTHROPIC_API_KEY set and no Claude OAuth credentials found." >&2
    echo "       Tried: macOS Keychain ('Claude Code-credentials') and ~/.claude/.credentials.json." >&2
    echo "       Run \`claude login\` on the host first, or set ANTHROPIC_API_KEY." >&2
    exit 1
  fi
  chmod 600 "$CREDS_FILE"
  CREDS_MOUNT_ARGS=(
    -e CLAUDE_MEM_CREDENTIALS_FILE=/auth/.credentials.json
    -v "$CREDS_FILE:/auth/.credentials.json:ro"
  )
else
  CREDS_MOUNT_ARGS=(-e ANTHROPIC_API_KEY)
fi

# Pick -it only when a TTY is attached (keeps non-interactive callers working).
TTY_ARGS=()
[[ -t 0 && -t 1 ]] && TTY_ARGS=(-it)

# NOT `exec`: we want the EXIT trap above to run and remove $CREDS_FILE
# after the container exits. Running docker as a child keeps the shell
# alive long enough for the trap to fire.
#
# `${TTY_ARGS[@]+...}` guards the empty-array case: bash older than 4.4 (such
# as the macOS system bash 3.2) treats expanding an empty array under `set -u`
# as an unbound-variable error.
docker run --rm ${TTY_ARGS[@]+"${TTY_ARGS[@]}"} \
  "${CREDS_MOUNT_ARGS[@]}" \
  -v "$HOST_MEM_DIR:/home/node/.claude-mem" \
  "$TAG" \
  "$@"
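The reason run.sh avoids `exec docker run` can be checked in isolation. In the sketch below a subshell stands in for run.sh and `true` stands in for `docker run`; the function names `leaks_with_exec` and `leaks_as_child` are hypothetical, and the temp file plays the role of `$CREDS_FILE`.

```shell
# Returns 0 if the temp file survived (leaked), 1 if the EXIT trap removed it.
leaks_with_exec() {
  tmp="$(mktemp)"
  # `exec` replaces the subshell, so its EXIT trap never gets a chance to run.
  ( trap 'rm -f "$tmp"' EXIT; exec true )
  if [ -e "$tmp" ]; then
    rm -f "$tmp"
    return 0
  fi
  return 1
}

leaks_as_child() {
  tmp="$(mktemp)"
  # The command runs as a child; the subshell then exits and fires the trap.
  ( trap 'rm -f "$tmp"' EXIT; true; exit 0 )
  if [ -e "$tmp" ]; then
    rm -f "$tmp"
    return 0
  fi
  return 1
}

if leaks_with_exec; then
  echo "exec: temp file left behind"
fi
if ! leaks_as_child; then
  echo "child: temp file removed by the EXIT trap"
fi
```

The cost of the child-process form is one extra shell process that stays alive for the duration of the container run, which is what allows the cleanup to happen.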