Files
claude-mem/docs/public/usage/openrouter-provider.mdx
T
Alex Newman d13662d5d8 Cynical deletion: close 27 issues by removing defenders + tolerators (#2141)
* fix: mirror migration 28 in SessionStore so pending_messages.tool_use_id and worker_pid columns are created (#2139)

SessionStore's inline migration list jumped from v27 to v29, skipping
rebuildPendingMessagesForSelfHealingClaim. The worker uses SessionStore
directly via worker/DatabaseManager.ts and bypasses the canonical
MigrationRunner, so fresh installs ended up at "max v29" with neither
column present — every queue claim and observation insert failed.

Adds addPendingMessagesToolUseIdAndWorkerPidColumns following the existing
mirror precedent (addObservationSubagentColumns / addObservationsUniqueContentHashIndex).
Uses ALTER TABLE + column-existence guards so already-broken DBs at v29
self-heal on next worker boot.

Verified on fresh DB and on a synthetic v29-without-v28 broken DB:
both columns and indexes (idx_pending_messages_worker_pid,
ux_pending_session_tool) appear after one boot.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix: wrap v28 mirror dedup+index creation in transaction

Addresses Greptile P2 review on PR #2140: matches the existing pattern in
addObservationsUniqueContentHashIndex (v29 mirror at SessionStore.ts:1127)
and runner.ts rebuildPendingMessagesForSelfHealingClaim. A crash between
the dedup DELETE and the schema_versions INSERT no longer leaves the DB
in a half-applied state.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* docs(plan): cynical-deletion plan for 29 open issues

9-phase plan applying delete-first lens to triaged issue corpus.
Headlines: kill defenders (orphan cleanup, EncodedCommand spawn,
restart-port-steal) and tolerators (silent JSON drops, drifted SSE
filters). Each phase closes a named subset of issues.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix: delete process-management theater (Phase 1: DEL-1 + DEL-2)

Delete aggressiveStartupCleanup, the PowerShell -EncodedCommand
spawn branch, and the restart-with-port-steal sequence. Replace
daemon spawning with a single uniform child_process.spawn path
using arg-array form, keeping setsid on Unix when available.

The defenders (orphan cleanup, duplicate-worker probes, port
stealing) bred more bugs than they fixed. PID file with start-time
token already provides correct OS-trust ownership; restart now
requests httpShutdown, waits 5s for the port to free, then exits 1
if it didn't (user resolves). Net -247 lines.

Closes #2090, #2095 (already fixed at session-init.ts:78), #2107,
#2111, #2114, #2117, #2123, #2097, #2135.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix: observer-sessions trust boundary via CLAUDE_MEM_INTERNAL env (Phase 2: DEL-9)

Replace the cwd === OBSERVER_SESSIONS_DIR discriminator (which every
consumer must repeat and inevitably drifts) with a single env-var
trust boundary set once at spawn time in buildIsolatedEnv.

- buildIsolatedEnv now sets CLAUDE_MEM_INTERNAL=1, covering all three
  spawn sites (SDKAgent, KnowledgeAgent.prime, KnowledgeAgent.executeQuery)
- shouldTrackProject checks the env var first (cwd check stays as
  belt-and-braces fallback)
- New shared shouldEmitProjectRow predicate — SSE broadcaster and
  pagination filter share the same predicate so they can never drift
  apart (#2118)
- ObservationBroadcaster filters observer rows from SSE stream
- PaginationHelper hardcoded 'observer-sessions' replaced with
  OBSERVER_SESSIONS_PROJECT const
- project-filter basename match pass — *observer-sessions* now matches
  basename, not just full path (globToRegex's [^/]* can't cross /)
  (#2126 item 1)
- New `claude-mem cleanup [--dry-run]` subcommand wires CleanupV12_4_3
  through to the worker for #2126 item 5

Closes #2118, #2126.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix: strip proxy env vars before spawning worker (Phase 4: CON-1)

User's HTTP_PROXY/HTTPS_PROXY config was bleeding into internal AI
calls when claude-mem spawns the claude subprocess, causing
connection failures. Strip unconditionally — no passthrough knob,
which rejects #2099's whitelist proposal.

Closes #2115, #2099.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix: fail-fast on silent drops in stdin/file-context/memory-save (Phase 5: FF-1)

Three independent fail-fast fixes:

#2089 — stdin-reader silent drop. Non-empty stdin that fails JSON.parse
now rejects with a clear error instead of resolving undefined. Empty
stdin still resolves undefined.

#2094 — PreToolUse:Read truncation Edit deadlock. file-context handler
no longer returns a fake truncated Read result via updatedInput.
Removes userOffset/userLimit/truncated machinery; injects the timeline
via additionalContext only and lets the real Read pass through. Read
state and Claude's expectation now stay consistent, eliminating the
infinite Edit retry loop.

#2116 — /api/memory/save metadata drop + project bug. Schema accepts
metadata as a documented JSON column (migration 30 adds observations.
metadata TEXT, mirrored in SessionStore). Schema also tightened to
.strict() so unknown top-level fields fail fast instead of being
silently dropped. Project resolution now consults metadata.project as
a fallback before defaultProject.

Closes #2089, #2094, #2116.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix: small deletions — Zod externalize / Gemini fallback / session timeout / installCLI alias (Phase 6)

DEL-4 (#2113): Externalize zod from mcp-server.cjs and context-generator.cjs
hook bundles so OpenCode's runtime resolves a single Zod copy. Worker
keeps Zod bundled (it's a daemon subprocess, not in OpenCode's hook
bundle). Added zod to plugin/package.json so externalized requires
resolve at runtime.

DEL-5 (#2087): Delete the never-wired GeminiAgent → Claude fallback.
fallbackAgent was always null in production. On 429 the agent now
throws cleanly (message stays pending for retry). Removed
setFallbackAgent, FallbackAgent interface, and the 429 fallback
branch from both GeminiAgent and OpenRouterAgent. Updated docs
that claimed automatic Claude fallback.

DEL-6 (#2127, #2098): Raise MAX_SESSION_WALL_CLOCK_MS from 4h to
24h. The timeout is a real guard against runaway-cost loops (per
issue #1590), but 4h kills legitimate long Claude Code days. 24h
preserves the guard while never hitting in normal use. No knob —
a session approaching this age is a bug worth investigating, not
a value worth tuning.

DEL-8 (#2054): Delete installCLI() alias function. Saves 4 keystrokes
at the cost of cross-platform shell-config mutation surface — not
worth it. Canonical entry is npx claude-mem (and bunx). Uninstall
now strips legacy alias/function lines from ~/.bashrc, ~/.zshrc,
and the PowerShell profile.

Closes #2087, #2098, #2113, #2127, #2054.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix: de-hardcode worker port + multi-account commit (Phase 3: CON-2 + DEL-7)

Replace hardcoded 37777 fallbacks with SettingsDefaultsManager.get(
'CLAUDE_MEM_WORKER_PORT') in npx-cli (runtime/install/uninstall),
opencode-plugin, OpenClaw installer, SearchRoutes example URLs.
Timeline-report SKILL.md now resolves WORKER_PORT from settings.json
at the top and uses ${WORKER_PORT} in all curl invocations.
Remaining 37777 literals are doc comments + viewer build-time form-
field placeholder (which is replaced by /api/settings on mount).

hooks.json: add cygpath POSIX→Windows path translation between _R
resolution and node invocation. No-op on macOS/Linux. Closes the
Windows + Git Bash MODULE_NOT_FOUND in #2109.

CLAUDE.md gains a Multi-account section documenting CLAUDE_MEM_DATA_DIR
+ optional CLAUDE_MEM_WORKER_PORT — every existing path/port code
path now honors them.

Closes #2103, #2109, #2101.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix: install/uninstall improvements (Phase 7: #2106)

5 fixes for the install/uninstall flow:

Item 1 — multiselect default. install.ts no longer pre-selects every
detected IDE; user explicitly opts in.

Item 3 — shutdown-before-overwrite. New
src/services/install/shutdown-helper.ts shared by install and
uninstall: POSTs /api/admin/shutdown then polls /api/health until
the worker stops responding. install calls it before
copyPluginToMarketplace so reinstall over a running worker doesn't
conflict; uninstall calls it before deletion.

Item 4 — uninstall path coverage. Removes ~/.npm/_npx/*/node_modules/
claude-mem, ~/.cache/claude-cli-nodejs/*/mcp-logs-plugin-claude-mem-*,
~/.claude/plugins/data/claude-mem-thedotmack/. Best-effort: per-path
try/catch so a single permission failure doesn't abort uninstall.
chroma-mcp shutdown is implicit via the worker's GracefulShutdown
cascade in item 3's helper.

Item 5 — install summary documents "Close all Claude Code sessions
before uninstalling, or ~/.claude-mem will be recreated by active
hooks."

Item 6 — real-port query. After install, fetches /api/health on the
configured port with 3s timeout. Reports actually-bound port if the
response carries it; falls back to requested port. No retry loop.

Closes #2106 (items 1, 3, 4, 5, 6). Items 2, 7 closed separately
as already-fixed and insufficient-detail.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix: pin chroma-mcp to 0.2.6 (Phase 8: DEL-3 lite)

Replace unpinned 'chroma-mcp' arg with chroma-mcp==0.2.6 in both
local and remote modes. Pinning makes installs deterministic across
machines and across time, eliminating the dependency-drift class
of bugs.

Verified 0.2.6 in a clean uv cache: starts cleanly, no httpcore/
httpx ImportError, no --with flags needed. The --with flags removed
in a0dd516c are not required at this pin (transitive deps resolve
correctly when the top-level version is fixed).

#2102's three protections (transport cleanup on failure, stale onclose
handler guard, 10s reconnect backoff) confirmed intact.

Closes #2046, #2085, #2102.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* test: update stale assertions for per-UID port + migration 30 (Phase 9)

SettingsDefaultsManager.CLAUDE_MEM_WORKER_PORT default is per-UID
(37700 + uid%100), not literal '37777'. Three assertions in
settings-defaults-manager.test.ts now compute the expected value
the same way the source does.

migration-runner.test.ts: drop expect(versions).toContain(19)
(version 19 was a noop never recorded — pre-existing bug at parent),
add expect(versions).toContain(30) for the new observations.metadata
column added in Phase 5.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix: address Greptile P1/P2 review comments on PR #2141

P1: spawnDaemon return value was unchecked in worker-service.ts restart
case, so a failed spawn silently exited 0 with a misleading "Worker
restart spawned" log. Now error and exit 1 when restartPid is undefined.

P2: shutdown-helper.ts health-poll catch treated AbortError (timeout)
the same as connection-refused, so a slow worker could be reported
confirmedStopped while still holding file locks. Now distinguish:
AbortError continues polling; other errors return confirmedStopped.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* build: rebuild plugin artifacts after merging main

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix: address CodeRabbit review comments on PR #2141

- hooks.json: quote $HOME in cache lookup so paths with spaces work
- timeline-report SKILL.md: fall back when process.getuid is unavailable (Windows)
- opencode-plugin: validate CLAUDE_MEM_WORKER_PORT before using
- uninstall.ts: only strip alias lines, not function declarations (multi-line bodies left intact)
- MemoryRoutes: trim whitespace-only project before precedence resolution
- SessionStore migration 21: preserve metadata column if observations already has it
- stdin-reader test: restore full property descriptor to avoid cross-test pollution

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-25 21:23:24 -07:00

309 lines
10 KiB
Plaintext

---
title: "OpenRouter Provider"
description: "Access 100+ AI models through OpenRouter's unified API, including free models for cost-effective observation extraction"
---
# OpenRouter Provider
Claude-mem supports [OpenRouter](https://openrouter.ai) as an alternative provider for observation extraction. OpenRouter provides a unified API to access 100+ models from different providers including Google, Meta, Mistral, DeepSeek, and many others—often with generous free tiers.
<Tip>
**Free Models Available**: OpenRouter offers several completely free models, making it an excellent choice for reducing observation extraction costs to zero while maintaining quality.
</Tip>
## Why Use OpenRouter?
- **Access to 100+ models**: Choose from models across multiple providers through one API
- **Free tier options**: Several high-quality models are completely free to use
- **Cost flexibility**: Pay-as-you-go pricing on premium models with no commitments
- **Errors throw clearly**: 429s, 5xx, and network failures throw — leaving messages pending so they can be retried
- **Hot-swappable**: Switch providers without restarting the worker
- **Multi-turn conversations**: Full conversation history maintained across API calls
## Free Models on OpenRouter
OpenRouter actively supports democratizing AI access by offering free models. These are production-ready models suitable for observation extraction.
### Featured Free Models
| Model | ID | Parameters | Context | Best For |
|-------|------|------------|---------|----------|
| **Xiaomi MiMo-V2-Flash** | `xiaomi/mimo-v2-flash:free` | 309B (15B active, MoE) | 256K | Reasoning, coding, agents |
| **Gemini 2.0 Flash** | `google/gemini-2.0-flash-exp:free` | — | 1M | General purpose |
| **Gemini 2.5 Flash** | `google/gemini-2.5-flash-preview:free` | — | 1M | Latest capabilities |
| **DeepSeek R1** | `deepseek/deepseek-r1:free` | 671B | 64K | Reasoning, analysis |
| **Llama 3.1 70B** | `meta-llama/llama-3.1-70b-instruct:free` | 70B | 128K | General purpose |
| **Llama 3.1 8B** | `meta-llama/llama-3.1-8b-instruct:free` | 8B | 128K | Fast, lightweight |
| **Mistral Nemo** | `mistralai/mistral-nemo:free` | 12B | 128K | Efficient performance |
<Note>
**Default Model**: Claude-mem uses `xiaomi/mimo-v2-flash:free` by default—a 309B parameter mixture-of-experts model that ranks #1 on SWE-bench Verified and excels at coding and reasoning tasks.
</Note>
### Free Model Considerations
- **Rate limits**: Free models may have stricter rate limits than paid models
- **Availability**: Free capacity depends on provider partnerships and demand
- **Queue times**: During peak usage, requests may be queued briefly
- **Max tokens**: Most free models support 65,536 completion tokens
All free models support:
- Tool use and function calling
- Temperature and sampling controls
- Stop sequences
- Streaming responses
## Getting an API Key
1. Go to [OpenRouter](https://openrouter.ai)
2. Sign in with Google, GitHub, or email
3. Navigate to [API Keys](https://openrouter.ai/keys)
4. Click **Create Key**
5. Copy and securely store your API key
<Tip>
**Free to start**: No credit card required to create an account or use free models. Add credits only if you want to use premium models.
</Tip>
## Configuration
### Settings
| Setting | Values | Default | Description |
|---------|--------|---------|-------------|
| `CLAUDE_MEM_PROVIDER` | `claude`, `gemini`, `openrouter` | `claude` | AI provider for observation extraction |
| `CLAUDE_MEM_OPENROUTER_API_KEY` | string | — | Your OpenRouter API key |
| `CLAUDE_MEM_OPENROUTER_MODEL` | string | `xiaomi/mimo-v2-flash:free` | Model identifier (see list above) |
| `CLAUDE_MEM_OPENROUTER_MAX_CONTEXT_MESSAGES` | number | `20` | Max messages in conversation history |
| `CLAUDE_MEM_OPENROUTER_MAX_TOKENS` | number | `100000` | Token budget safety limit |
| `CLAUDE_MEM_OPENROUTER_SITE_URL` | string | — | Optional: URL for analytics attribution |
| `CLAUDE_MEM_OPENROUTER_APP_NAME` | string | `claude-mem` | Optional: App name for analytics |
### Using the Settings UI
1. Open the viewer at http://localhost:37777
2. Click the **gear icon** to open Settings
3. Under **AI Provider**, select **OpenRouter**
4. Enter your OpenRouter API key
5. Optionally select a different model
Settings are applied immediately—no restart required.
### Manual Configuration
Edit `~/.claude-mem/settings.json`:
```json
{
"CLAUDE_MEM_PROVIDER": "openrouter",
"CLAUDE_MEM_OPENROUTER_API_KEY": "sk-or-v1-your-key-here",
"CLAUDE_MEM_OPENROUTER_MODEL": "xiaomi/mimo-v2-flash:free"
}
```
Alternatively, set the API key via environment variable:
```bash
export OPENROUTER_API_KEY="sk-or-v1-your-key-here"
```
The settings file takes precedence over the environment variable.
## Model Selection Guide
### For Free Usage (No Cost)
**Recommended**: `xiaomi/mimo-v2-flash:free`
- Best-in-class performance on coding benchmarks
- 256K context window handles large observations
- 65K max completion tokens
- Mixture-of-experts architecture (15B active parameters)
**Alternatives**:
- `google/gemini-2.0-flash-exp:free` - 1M context, Google's flagship
- `deepseek/deepseek-r1:free` - Excellent reasoning capabilities
- `meta-llama/llama-3.1-70b-instruct:free` - Strong general purpose
### For Paid Usage (Higher Quality/Speed)
| Model | Price (per 1M tokens) | Best For |
|-------|----------------------|----------|
| `anthropic/claude-3.5-sonnet` | $3 in / $15 out | Highest quality observations |
| `google/gemini-2.0-flash` | $0.075 in / $0.30 out | Fast, cost-effective |
| `openai/gpt-4o` | $2.50 in / $10 out | GPT-4 quality |
## Context Window Management
OpenRouter agent implements intelligent context management to prevent runaway costs:
### Automatic Truncation
The agent uses a sliding window strategy:
1. Checks if message count exceeds `MAX_CONTEXT_MESSAGES` (default: 20)
2. Checks if estimated tokens exceed `MAX_TOKENS` (default: 100,000)
3. If limits exceeded, keeps most recent messages only
4. Logs warnings with dropped message counts
### Token Estimation
- Conservative estimate: 1 token ≈ 4 characters
- Used for proactive context management
- Actual usage logged from API response
### Cost Tracking
Logs include detailed usage information:
```
OpenRouter API usage: {
model: "xiaomi/mimo-v2-flash:free",
inputTokens: 2500,
outputTokens: 1200,
totalTokens: 3700,
estimatedCostUSD: "0.00",
messagesInContext: 8
}
```
## Provider Switching
You can switch between providers at any time:
- **No restart required**: Changes take effect on the next observation
- **Conversation history preserved**: When switching mid-session, the new provider sees the full conversation context
- **Seamless transition**: All providers use the same observation format
### Switching via UI
1. Open Settings in the viewer
2. Change the **AI Provider** dropdown
3. The next observation will use the new provider
### Switching via Settings File
```json
{
"CLAUDE_MEM_PROVIDER": "openrouter"
}
```
## Error Behavior
If OpenRouter errors, claude-mem logs the failure and re-throws so the message stays pending for later retry. There is no Claude SDK fallback — earlier docs claimed automatic Claude fallback, but the wiring was never actually engaged in production (#2087). To switch providers, change `CLAUDE_MEM_PROVIDER` in settings.
**Throwing conditions:**
- Rate limiting (HTTP 429)
- Server errors (HTTP 500, 502, 503)
- Network issues (connection refused, timeout)
- 4xx errors other than 429
- Missing API key
## Multi-Turn Conversation Support
OpenRouter agent maintains full conversation history across API calls:
```
Session Created
Load Pending Messages (observations from queue)
For each message:
→ Add to conversation history
→ Call OpenRouter API with FULL history
→ Parse XML response
→ Store observations in database
→ Sync to Chroma vector DB
Session complete
```
This enables:
- Coherent multi-turn exchanges
- Context preservation across observations
- Seamless provider switching mid-session
## Troubleshooting
### "OpenRouter API key not configured"
Either:
- Set `CLAUDE_MEM_OPENROUTER_API_KEY` in `~/.claude-mem/settings.json`, or
- Set the `OPENROUTER_API_KEY` environment variable
### Rate Limiting
Free models may have rate limits during peak usage. If you hit rate limits:
- The agent throws and leaves the message pending — it will be retried later
- Consider switching to a different free model
- Add credits for premium model access
### Model Not Found
Verify the model ID is correct:
- Check [OpenRouter Models](https://openrouter.ai/models) for current availability
- Use the `:free` suffix for free model variants
- Model IDs are case-sensitive
### High Token Usage Warning
If you see warnings about high token usage (>50,000 per request):
- Reduce `CLAUDE_MEM_OPENROUTER_MAX_CONTEXT_MESSAGES`
- Reduce `CLAUDE_MEM_OPENROUTER_MAX_TOKENS`
- Consider a model with larger context window
### Connection Errors
If you see connection errors:
- Check your internet connection
- Verify OpenRouter service status at [status.openrouter.ai](https://status.openrouter.ai)
- The agent throws and leaves the message pending for later retry
## API Details
OpenRouter uses an OpenAI-compatible REST API:
**Endpoint**: `https://openrouter.ai/api/v1/chat/completions`
**Headers**:
```
Authorization: Bearer {apiKey}
HTTP-Referer: https://github.com/thedotmack/claude-mem
X-Title: claude-mem
Content-Type: application/json
```
**Request Format**:
```json
{
"model": "xiaomi/mimo-v2-flash:free",
"messages": [
{"role": "system", "content": "..."},
{"role": "user", "content": "..."}
],
"temperature": 0.3,
"max_tokens": 4096
}
```
## Comparing Providers
| Feature | Claude (SDK) | Gemini | OpenRouter |
|---------|-------------|--------|------------|
| **Cost** | Pay per token | Free tier + paid | Free models + paid |
| **Models** | Claude only | Gemini only | 100+ models |
| **Quality** | Highest | High | Varies by model |
| **Rate limits** | Based on tier | 5-4000 RPM | Varies by model |
| **On error** | Throws | Throws | Throws |
| **Setup** | Automatic | API key required | API key required |
<Tip>
**Recommendation**: Start with OpenRouter's free `xiaomi/mimo-v2-flash:free` model for zero-cost observation extraction. If you need higher quality or encounter rate limits, switch to Claude or add OpenRouter credits for premium models.
</Tip>
## Next Steps
- [Configuration](/configuration) - Full settings reference
- [Gemini Provider](/usage/gemini-provider) - Alternative free provider
- [Getting Started](/usage/getting-started) - Basic usage guide
- [Troubleshooting](/troubleshooting) - Common issues