---
title: "OpenRouter Provider"
description: "Access 100+ AI models through OpenRouter's unified API, including free models for cost-effective observation extraction"
---

# OpenRouter Provider

Claude-mem supports [OpenRouter](https://openrouter.ai) as an alternative provider for observation extraction. OpenRouter provides a unified API to access 100+ models from different providers including Google, Meta, Mistral, DeepSeek, and many others—often with generous free tiers.

<Tip>
**Free Models Available**: OpenRouter offers several completely free models, making it an excellent choice for reducing observation extraction costs to zero while maintaining quality.
</Tip>
## Why Use OpenRouter?

- **Access to 100+ models**: Choose from models across multiple providers through one API
- **Free tier options**: Several high-quality models are completely free to use
- **Cost flexibility**: Pay-as-you-go pricing on premium models with no commitments
- **Seamless fallback**: Automatically falls back to Claude if OpenRouter is unavailable
- **Hot-swappable**: Switch providers without restarting the worker
- **Multi-turn conversations**: Full conversation history maintained across API calls
## Free Models on OpenRouter

OpenRouter actively supports democratizing AI access by offering free models. These are production-ready models suitable for observation extraction.

### Featured Free Models

| Model | ID | Parameters | Context | Best For |
|-------|------|------------|---------|----------|
| **Xiaomi MiMo-V2-Flash** | `xiaomi/mimo-v2-flash:free` | 309B (15B active, MoE) | 256K | Reasoning, coding, agents |
| **Gemini 2.0 Flash** | `google/gemini-2.0-flash-exp:free` | — | 1M | General purpose |
| **Gemini 2.5 Flash** | `google/gemini-2.5-flash-preview:free` | — | 1M | Latest capabilities |
| **DeepSeek R1** | `deepseek/deepseek-r1:free` | 671B | 64K | Reasoning, analysis |
| **Llama 3.1 70B** | `meta-llama/llama-3.1-70b-instruct:free` | 70B | 128K | General purpose |
| **Llama 3.1 8B** | `meta-llama/llama-3.1-8b-instruct:free` | 8B | 128K | Fast, lightweight |
| **Mistral Nemo** | `mistralai/mistral-nemo:free` | 12B | 128K | Efficient performance |

<Note>
**Default Model**: Claude-mem uses `xiaomi/mimo-v2-flash:free` by default—a 309B parameter mixture-of-experts model that ranks #1 on SWE-bench Verified and excels at coding and reasoning tasks.
</Note>
### Free Model Considerations

- **Rate limits**: Free models may have stricter rate limits than paid models
- **Availability**: Free capacity depends on provider partnerships and demand
- **Queue times**: During peak usage, requests may be queued briefly
- **Max tokens**: Most free models support up to 65,536 completion tokens

All free models support:

- Tool use and function calling
- Temperature and sampling controls
- Stop sequences
- Streaming responses
## Getting an API Key

1. Go to [OpenRouter](https://openrouter.ai)
2. Sign in with Google, GitHub, or email
3. Navigate to [API Keys](https://openrouter.ai/keys)
4. Click **Create Key**
5. Copy and securely store your API key

<Tip>
**Free to start**: No credit card required to create an account or use free models. Add credits only if you want to use premium models.
</Tip>
## Configuration

### Settings

| Setting | Values | Default | Description |
|---------|--------|---------|-------------|
| `CLAUDE_MEM_PROVIDER` | `claude`, `gemini`, `openrouter` | `claude` | AI provider for observation extraction |
| `CLAUDE_MEM_OPENROUTER_API_KEY` | string | — | Your OpenRouter API key |
| `CLAUDE_MEM_OPENROUTER_MODEL` | string | `xiaomi/mimo-v2-flash:free` | Model identifier (see list above) |
| `CLAUDE_MEM_OPENROUTER_MAX_CONTEXT_MESSAGES` | number | `20` | Max messages in conversation history |
| `CLAUDE_MEM_OPENROUTER_MAX_TOKENS` | number | `100000` | Token budget safety limit |
| `CLAUDE_MEM_OPENROUTER_SITE_URL` | string | — | Optional: URL for analytics attribution |
| `CLAUDE_MEM_OPENROUTER_APP_NAME` | string | `claude-mem` | Optional: App name for analytics |
### Using the Settings UI

1. Open the viewer at http://localhost:37777
2. Click the **gear icon** to open Settings
3. Under **AI Provider**, select **OpenRouter**
4. Enter your OpenRouter API key
5. Optionally select a different model

Settings are applied immediately—no restart required.
### Manual Configuration

Edit `~/.claude-mem/settings.json`:

```json
{
  "CLAUDE_MEM_PROVIDER": "openrouter",
  "CLAUDE_MEM_OPENROUTER_API_KEY": "sk-or-v1-your-key-here",
  "CLAUDE_MEM_OPENROUTER_MODEL": "xiaomi/mimo-v2-flash:free"
}
```

Alternatively, set the API key via environment variable:

```bash
export OPENROUTER_API_KEY="sk-or-v1-your-key-here"
```

The settings file takes precedence over the environment variable.
## Model Selection Guide

### For Free Usage (No Cost)

**Recommended**: `xiaomi/mimo-v2-flash:free`

- Best-in-class performance on coding benchmarks
- 256K context window handles large observations
- 65K max completion tokens
- Mixture-of-experts architecture (15B active parameters)

**Alternatives**:

- `google/gemini-2.0-flash-exp:free` - 1M context, Google's flagship
- `deepseek/deepseek-r1:free` - Excellent reasoning capabilities
- `meta-llama/llama-3.1-70b-instruct:free` - Strong general purpose
### For Paid Usage (Higher Quality/Speed)

| Model | Price (per 1M tokens) | Best For |
|-------|----------------------|----------|
| `anthropic/claude-3.5-sonnet` | $3 in / $15 out | Highest quality observations |
| `google/gemini-2.0-flash` | $0.075 in / $0.30 out | Fast, cost-effective |
| `openai/gpt-4o` | $2.50 in / $10 out | GPT-4 quality |
## Context Window Management

The OpenRouter agent implements intelligent context management to prevent runaway costs:

### Automatic Truncation

The agent uses a sliding window strategy:

1. Checks if message count exceeds `MAX_CONTEXT_MESSAGES` (default: 20)
2. Checks if estimated tokens exceed `MAX_TOKENS` (default: 100,000)
3. If either limit is exceeded, keeps only the most recent messages
4. Logs warnings with dropped message counts
### Token Estimation

- Conservative estimate: 1 token ≈ 4 characters
- Used for proactive context management
- Actual usage logged from API response
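The truncation and estimation behavior above can be sketched as follows. This is a minimal illustration of the sliding-window idea, assuming the defaults from the settings table; the actual agent logic may differ in detail:

```python
def estimate_tokens(text: str) -> int:
    """Conservative estimate: roughly 1 token per 4 characters."""
    return len(text) // 4

def truncate_history(messages, max_messages=20, max_tokens=100_000):
    """Sliding window: keep only the most recent messages that fit both limits."""
    kept = messages[-max_messages:]  # enforce the message-count limit first
    # Then drop oldest messages until the estimated token budget fits
    while len(kept) > 1 and sum(estimate_tokens(m["content"]) for m in kept) > max_tokens:
        kept = kept[1:]
    dropped = len(messages) - len(kept)
    if dropped:
        print(f"warning: dropped {dropped} oldest messages from context")
    return kept
```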
### Cost Tracking

Logs include detailed usage information:

```
OpenRouter API usage: {
  model: "xiaomi/mimo-v2-flash:free",
  inputTokens: 2500,
  outputTokens: 1200,
  totalTokens: 3700,
  estimatedCostUSD: "0.00",
  messagesInContext: 8
}
```
## Provider Switching

You can switch between providers at any time:

- **No restart required**: Changes take effect on the next observation
- **Conversation history preserved**: When switching mid-session, the new provider sees the full conversation context
- **Seamless transition**: All providers use the same observation format

### Switching via UI

1. Open Settings in the viewer
2. Change the **AI Provider** dropdown
3. The next observation will use the new provider

### Switching via Settings File

```json
{
  "CLAUDE_MEM_PROVIDER": "openrouter"
}
```
## Fallback Behavior

If OpenRouter encounters errors, claude-mem automatically falls back to the Claude Agent SDK:

**Triggers fallback:**

- Rate limiting (HTTP 429)
- Server errors (HTTP 500, 502, 503)
- Network issues (connection refused, timeout)
- Generic fetch failures

**Does not trigger fallback:**

- Missing API key (logs warning, uses Claude from start)
- Invalid API key (fails with error)

When fallback occurs:

1. A warning is logged
2. Any in-progress messages are reset to pending
3. The Claude SDK takes over with the full conversation context
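The classification above can be sketched as a small predicate. This is a hypothetical sketch of the decision described in the lists, not claude-mem's real code:

```python
# HTTP statuses that the docs list as fallback triggers
FALLBACK_STATUSES = {429, 500, 502, 503}

def should_fall_back(status=None, network_error=False):
    """Rate limits, server errors, and network failures trigger fallback;
    auth problems (e.g. an invalid key) do not."""
    if network_error:
        return True  # connection refused, timeout, generic fetch failure
    return status in FALLBACK_STATUSES
```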
<Note>
**Fallback is transparent**: Your observations continue processing without interruption. The fallback preserves all conversation context.
</Note>
## Multi-Turn Conversation Support

The OpenRouter agent maintains full conversation history across API calls:

```
Session Created
  ↓
Load Pending Messages (observations from queue)
  ↓
For each message:
  → Add to conversation history
  → Call OpenRouter API with FULL history
  → Parse XML response
  → Store observations in database
  → Sync to Chroma vector DB
  ↓
Session complete
```

This enables:

- Coherent multi-turn exchanges
- Context preservation across observations
- Seamless provider switching mid-session
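The loop in the diagram above can be sketched as follows. All names here (`process_session` and its callable parameters) are illustrative stand-ins, not claude-mem's actual API:

```python
def process_session(pending, call_api, parse_xml, store_and_sync):
    """Process queued observations while accumulating full conversation history."""
    history = [{"role": "system", "content": "extraction prompt..."}]
    for message in pending:
        history.append({"role": "user", "content": message})
        reply = call_api(history)               # full history sent on every call
        history.append({"role": "assistant", "content": reply})
        store_and_sync(parse_xml(reply))        # database insert + Chroma sync
    return history
```

Because the full `history` list is passed on every call, a provider swapped in mid-session sees the same context the previous provider had.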
## Troubleshooting

### "OpenRouter API key not configured"

Either:

- Set `CLAUDE_MEM_OPENROUTER_API_KEY` in `~/.claude-mem/settings.json`, or
- Set the `OPENROUTER_API_KEY` environment variable

### Rate Limiting

Free models may have rate limits during peak usage. If you hit rate limits:

- Claude-mem automatically falls back to the Claude SDK
- Consider switching to a different free model
- Add credits for premium model access

### Model Not Found

Verify the model ID is correct:

- Check [OpenRouter Models](https://openrouter.ai/models) for current availability
- Use the `:free` suffix for free model variants
- Model IDs are case-sensitive

### High Token Usage Warning

If you see warnings about high token usage (>50,000 per request):

- Reduce `CLAUDE_MEM_OPENROUTER_MAX_CONTEXT_MESSAGES`
- Reduce `CLAUDE_MEM_OPENROUTER_MAX_TOKENS`
- Consider a model with a larger context window

### Connection Errors

If you see connection errors:

- Check your internet connection
- Verify OpenRouter service status at [status.openrouter.ai](https://status.openrouter.ai)
- The agent will automatically fall back to Claude
## API Details

OpenRouter uses an OpenAI-compatible REST API:

**Endpoint**: `https://openrouter.ai/api/v1/chat/completions`

**Headers**:

```
Authorization: Bearer {apiKey}
HTTP-Referer: https://github.com/thedotmack/claude-mem
X-Title: claude-mem
Content-Type: application/json
```

**Request Format**:

```json
{
  "model": "xiaomi/mimo-v2-flash:free",
  "messages": [
    {"role": "system", "content": "..."},
    {"role": "user", "content": "..."}
  ],
  "temperature": 0.3,
  "max_tokens": 4096
}
```
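Putting the headers and body together, a request can be assembled like this. `build_request` is a hypothetical helper shown only to make the wire format concrete; it builds the payload without sending it:

```python
import json

def build_request(api_key, system_prompt, user_text,
                  model="xiaomi/mimo-v2-flash:free"):
    """Build headers and JSON body for OpenRouter's OpenAI-compatible endpoint."""
    headers = {
        "Authorization": f"Bearer {api_key}",
        "HTTP-Referer": "https://github.com/thedotmack/claude-mem",
        "X-Title": "claude-mem",
        "Content-Type": "application/json",
    }
    body = {
        "model": model,
        "messages": [
            {"role": "system", "content": system_prompt},
            {"role": "user", "content": user_text},
        ],
        "temperature": 0.3,
        "max_tokens": 4096,
    }
    return headers, json.dumps(body).encode()
```

POST the returned headers and body to `https://openrouter.ai/api/v1/chat/completions` with any HTTP client.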
## Comparing Providers

| Feature | Claude (SDK) | Gemini | OpenRouter |
|---------|-------------|--------|------------|
| **Cost** | Pay per token | Free tier + paid | Free models + paid |
| **Models** | Claude only | Gemini only | 100+ models |
| **Quality** | Highest | High | Varies by model |
| **Rate limits** | Based on tier | 5-4000 RPM | Varies by model |
| **Fallback** | N/A (primary) | → Claude | → Claude |
| **Setup** | Automatic | API key required | API key required |

<Tip>
**Recommendation**: Start with OpenRouter's free `xiaomi/mimo-v2-flash:free` model for zero-cost observation extraction. If you need higher quality or encounter rate limits, switch to Claude or add OpenRouter credits for premium models.
</Tip>
## Next Steps

- [Configuration](/configuration) - Full settings reference
- [Gemini Provider](/usage/gemini-provider) - Alternative free provider
- [Getting Started](/usage/getting-started) - Basic usage guide
- [Troubleshooting](/troubleshooting) - Common issues