Files
claude-mem/docker/claude-mem/Dockerfile
T
Alex Newman 97c7c999b1 feat: basic claude-mem Docker container for easy spin-up (#2076)
* feat(evals): SWE-bench Docker scaffolding for claude-mem resolve-rate measurement

Adds evals/swebench/ scaffolding per .claude/plans/swebench-claude-mem-docker.md.
Agent image builds Claude Code 2.1.114 + locally-built claude-mem plugin;
run-instance.sh executes the two-turn ingest/fix protocol per instance;
run-batch.py orchestrates parallel Docker runs with per-instance isolation;
eval.sh wraps the upstream SWE-bench harness; summarize.py aggregates reports.

Orchestrator owns JSONL writes under a lock to avoid racy concurrent appends;
agent writes its authoritative diff to CLAUDE_MEM_OUTPUT_DIR (/scratch in
container mode) and the orchestrator reads it back. Scaffolding only — no
Docker build or smoke test run yet.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* feat(evals): OAuth credential mounting for Claude Max/Pro subscriptions

Skips per-call API billing by extracting OAuth creds from host Keychain
(macOS) or ~/.claude/.credentials.json (Linux) and bind-mounting them
read-only into each agent container. Creds are copied into HOME=$SCRATCH/.claude
at container start so the per-instance isolation model still holds.

Adds run-batch.py --auth {oauth,api-key,auto} (auto prefers OAuth, falls
back to API key). run-instance.sh accepts either ANTHROPIC_API_KEY or
CLAUDE_MEM_CREDENTIALS_FILE. smoke-test.sh runs one instance end-to-end
using OAuth for quick verification before batch runs.

Caveat surfaced in docstrings: Max/Pro has per-window usage limits and is
framed for individual developer use — batch evaluation may exhaust the
quota or raise compliance questions.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* feat(docker): basic claude-mem container for ad-hoc testing

Adds docker/claude-mem/ with a fresh spin-up image:
- Dockerfile: FROM node:20 (reproduces anthropics/claude-code .devcontainer
  pattern — Anthropic ships the Dockerfile, not a pullable image); layers
  Bun + uv + locally-built plugin/; runs as non-root node user
- entrypoint.sh: seeds OAuth creds from CLAUDE_MEM_CREDENTIALS_FILE into
  $HOME/.claude/.credentials.json, then exec's the command (default: bash)
- build.sh: npm run build + docker build
- run.sh: interactive launcher; auto-extracts OAuth from macOS Keychain
  (security find-generic-password) or ~/.claude/.credentials.json on Linux,
  mounts host .docker-claude-mem-data/ at /home/node/.claude-mem so the
  observations DB survives container exit

Validated end-to-end: PostToolUse hook fires, queue enqueues, worker's SDK
compression runs under subscription OAuth, observations row lands with
populated facts/concepts/files_read, Chroma sync triggers.

Also updates .gitignore/.dockerignore for the new runtime-output paths.
Built plugin artifacts refreshed by the build step.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix(evals/swebench): non-root user, OAuth mount, Lite dataset default

- Dockerfile.agent: switch to non-root \`node\` user (uid 1000); Claude Code
  refuses --permission-mode bypassPermissions when euid==0, which made every
  agent run exit 1 before producing a diff. Also move Bun + uv installs to
  system paths so the non-root user can exec them.
- run-batch.py: add extract_oauth_credentials() that pulls from macOS
  Keychain / Linux ~/.claude/.credentials.json into a temp file and bind-
  mounts it at /auth/.credentials.json:ro with CLAUDE_MEM_CREDENTIALS_FILE.
  New --auth {oauth,api-key,auto} flag. New --dataset flag so the batch can
  target SWE-bench_Lite without editing the script.
- smoke-test.sh: default DATASET to princeton-nlp/SWE-bench_Lite (Lite
  contains sympy__sympy-24152, Verified does not); accept DATASET env
  override.

Caveat surfaced during testing: Max/Pro subscriptions have per-window usage
limits; running 5 instances in parallel with the "read every source file"
ingest prompt exhausted the 5h window within ~25 minutes (3/5 hit HTTP 429).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix: address PR #2076 review comments

- docker/claude-mem/run.sh: chmod 600 (not 644) on extracted OAuth creds
  to match what `claude login` writes; avoids exposing tokens to other
  host users. Verified readable inside the container under Docker
  Desktop's UID translation.
- docker/claude-mem/Dockerfile: pin Bun + uv via --build-arg BUN_VERSION
  / UV_VERSION (defaults: 1.3.12, 0.11.7). Bun via `bash -s "bun-v<V>"`;
  uv via versioned installer URL `https://astral.sh/uv/<V>/install.sh`.
- evals/swebench/smoke-test.sh: pipe JSON through stdin to `python3 -c`
  so paths with spaces/special chars can't break shell interpolation.
- evals/swebench/run-batch.py: add --overwrite flag; abort by default
  when predictions.jsonl for the run-id already exists, preventing
  accidental silent discard of partial results.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix: address coderabbit review on PR #2076

Actionable (4):
- Dockerfile uv install: wrap `chmod ... || true` in braces so the trailing
  `|| true` no longer masks failures from `curl|sh` via bash operator
  precedence (&& binds tighter than ||). Applied to both docker/claude-mem/
  and evals/swebench/Dockerfile.agent. Added `set -eux` to the RUN lines.
- docker/claude-mem/Dockerfile: drop unused `sudo` apt package (~2 MB).
- run-batch.py: name each agent container (`swebench-agent-<id>-<pid>-<tid>`)
  and force-remove via `docker rm -f <name>` in the TimeoutExpired handler
  so timed-out runs don't leave orphan containers.

Nitpicks (2):
- smoke-test.sh: collapse 3 python3 invocations into 1 — parse the instance
  JSON once, print `repo base_commit`, and write problem.txt in the same
  call.
- run-instance.sh: shallow clone via `--depth 1 --no-single-branch` +
  `fetch --depth 1 origin $BASE_COMMIT`. Falls back to a full clone if the
  server rejects the by-commit fetch.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix: address second coderabbit review on PR #2076

Actionable (3):
- docker/claude-mem/run.sh: on macOS, fall back to ~/.claude/.credentials.json
  when the Keychain lookup misses (some setups still have file-only creds).
  Unified into a single creds_obtained gate so the error surface lists both
  sources tried.
- docker/claude-mem/run.sh: drop `exec docker run` — `exec` replaces the shell
  so the EXIT trap (`rm -f "$CREDS_FILE"`) never fires and the extracted
  OAuth JSON leaks to disk until tmpfs cleanup. Run as a child instead so
  the trap runs on exit.
- evals/swebench/smoke-test.sh: actually enforce the TIMEOUT env var. Pick
  `timeout` or `gtimeout` (coreutils on macOS), fall back to uncapped with
  a warning. Name the container so exit-124 from timeout can `docker rm -f`
  it deterministically.

Nitpick from the same review (consolidated python3 calls in smoke-test.sh)
was already addressed in the prior commit ef621e00.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix: address third coderabbit review on PR #2076

Actionable (1):
- evals/swebench/smoke-test.sh: the consolidated python heredoc had competing
  stdin redirections — `<<'PY'` (script body) AND `< "$INSTANCE_JSON"` (data).
  The heredoc won, so `json.load(sys.stdin)` saw an empty stream and the parse
  would have failed at runtime. Pass INSTANCE_JSON as argv[2] and `open()` it
  inside the script instead; the heredoc is now only the script body, which
  is what `python3 -` needs.

Nitpicks (2):
- evals/swebench/smoke-test.sh: macOS Keychain lookup now falls through to
  ~/.claude/.credentials.json on miss (matches docker/claude-mem/run.sh).
- evals/swebench/run-batch.py: extract_oauth_credentials() no longer
  early-returns on Darwin keychain miss; falls through to the on-disk creds
  file so macOS setups with file-only credentials work in batch mode too.

Functional spot-check of the parse fix confirmed: REPO/BASE_COMMIT populated
and problem.txt written from a synthetic INSTANCE_JSON.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-19 17:34:30 -07:00

94 lines
3.6 KiB
Docker

# Basic claude-mem container for ad-hoc testing.
#
# Base layout mirrors anthropics/claude-code .devcontainer
# (https://github.com/anthropics/claude-code/blob/main/.devcontainer/Dockerfile):
# FROM node:20, non-root `node` user, global npm install of @anthropic-ai/claude-code.
# We skip the firewall/zsh/fzf/delta/git-hist noise since this image is for
# exercising claude-mem, not as a full dev environment.
#
# On top of that base we install:
# - Bun (claude-mem worker service runtime)
# - uv (provides Python for Chroma per CLAUDE.md)
# - The locally-built plugin/ tree at /opt/claude-mem
#
# Usage:
# docker build -f docker/claude-mem/Dockerfile -t claude-mem:basic .
# docker run --rm -it \
# -v $(mktemp -d):/home/node/.claude-mem \
# -e CLAUDE_MEM_CREDENTIALS_FILE=/auth/.credentials.json \
# -v /path/to/extracted/creds.json:/auth/.credentials.json:ro \
# claude-mem:basic
FROM node:20
ENV DEBIAN_FRONTEND=noninteractive
RUN apt-get update \
&& apt-get install -y --no-install-recommends \
git \
curl \
ca-certificates \
unzip \
jq \
less \
procps \
uuid-runtime \
sqlite3 \
&& apt-get clean && rm -rf /var/lib/apt/lists/*
# Bun — system-wide so the unprivileged `node` user can execute it.
# Pin via --build-arg BUN_VERSION=X.Y.Z; default is the version verified at PR time.
ARG BUN_VERSION=1.3.12
ENV BUN_INSTALL="/usr/local/bun"
RUN curl -fsSL https://bun.sh/install | bash -s "bun-v${BUN_VERSION}" \
&& chmod -R a+rX /usr/local/bun
ENV PATH="/usr/local/bun/bin:${PATH}"
# uv — system-wide, for Chroma's Python runtime. Pin via --build-arg UV_VERSION=X.Y.Z.
# Versioned installer URL per https://docs.astral.sh/uv/getting-started/installation/.
ARG UV_VERSION=0.11.7
ENV UV_INSTALL_DIR="/usr/local/bin"
# `&&` binds tighter than `||` in bash, so the previous form let `curl|sh` fail
# silently via the trailing `|| true`. Group the chmod so tolerated failure is
# scoped to perms-fixing only.
RUN set -eux \
&& curl -LsSf "https://astral.sh/uv/${UV_VERSION}/install.sh" | sh \
&& { chmod a+rX /usr/local/bin/uv /usr/local/bin/uvx 2>/dev/null || true; }
# Match the upstream devcontainer's npm-global prefix so `npm install -g`
# targets a dir the `node` user owns.
RUN mkdir -p /usr/local/share/npm-global \
&& chown -R node:node /usr/local/share/npm-global
ENV NPM_CONFIG_PREFIX=/usr/local/share/npm-global
ENV PATH="/usr/local/share/npm-global/bin:${PATH}"
# Claude Code CLI. Override at build-time with --build-arg CLAUDE_CODE_VERSION=X.Y.Z
# to pin; default tracks latest.
ARG CLAUDE_CODE_VERSION=latest
USER node
RUN npm install -g @anthropic-ai/claude-code@${CLAUDE_CODE_VERSION}
# Locally-built claude-mem plugin. COPY runs as root by default and layers are
# cached, so put this after the npm install so iterating on the plugin doesn't
# invalidate the CLI install layer.
USER root
COPY plugin/ /opt/claude-mem/
RUN chown -R node:node /opt/claude-mem
# Persistent mount points for ad-hoc testing — mount a host dir at either of
# these to inspect the claude-mem DB after a session.
RUN mkdir -p /home/node/.claude /home/node/.claude-mem \
&& chown -R node:node /home/node/.claude /home/node/.claude-mem
USER node
WORKDIR /home/node
# Helper: copies OAuth creds out of the read-only mount into $HOME/.claude/
# before exec'ing whatever you asked for. Saves the "cp + chmod" dance every
# time you drop in.
COPY --chown=node:node docker/claude-mem/entrypoint.sh /usr/local/bin/claude-mem-entrypoint
RUN chmod +x /usr/local/bin/claude-mem-entrypoint
ENTRYPOINT ["/usr/local/bin/claude-mem-entrypoint"]
CMD ["bash"]