diff --git a/docs/public/configuration.mdx b/docs/public/configuration.mdx
index be0fc8eb..1e906c29 100644
--- a/docs/public/configuration.mdx
+++ b/docs/public/configuration.mdx
@@ -14,7 +14,7 @@ Settings are managed in `~/.claude-mem/settings.json`. The file is auto-created
 | Setting | Default | Description |
 |-------------------------------|---------------------------------|---------------------------------------|
 | `CLAUDE_MEM_MODEL` | `sonnet` | AI model for processing observations (when using Claude) |
-| `CLAUDE_MEM_PROVIDER` | `claude` | AI provider: `claude` or `gemini` |
+| `CLAUDE_MEM_PROVIDER` | `claude` | AI provider: `claude`, `gemini`, or `openrouter` |
 | `CLAUDE_MEM_MODE` | `code` | Active mode profile (e.g., `code--es`, `email-investigation`) |
 | `CLAUDE_MEM_CONTEXT_OBSERVATIONS` | `50` | Number of observations to inject |
 | `CLAUDE_MEM_WORKER_PORT` | `37777` | Worker service port |
@@ -29,6 +29,19 @@ Settings are managed in `~/.claude-mem/settings.json`. The file is auto-created
 
 See [Gemini Provider](usage/gemini-provider) for detailed configuration and free tier information.
 
+### OpenRouter Provider Settings
+
+| Setting | Default | Description |
+|----------------------------------------------|-----------------------------|---------------------------------------|
+| `CLAUDE_MEM_OPENROUTER_API_KEY` | — | OpenRouter API key ([get key](https://openrouter.ai/keys)) |
+| `CLAUDE_MEM_OPENROUTER_MODEL` | `xiaomi/mimo-v2-flash:free` | Model identifier (supports 100+ models) |
+| `CLAUDE_MEM_OPENROUTER_MAX_CONTEXT_MESSAGES` | `20` | Max messages in conversation history |
+| `CLAUDE_MEM_OPENROUTER_MAX_TOKENS` | `100000` | Token budget safety limit |
+| `CLAUDE_MEM_OPENROUTER_SITE_URL` | — | Optional: URL for analytics |
+| `CLAUDE_MEM_OPENROUTER_APP_NAME` | `claude-mem` | Optional: App name for analytics |
+
+See [OpenRouter Provider](usage/openrouter-provider) for detailed configuration, free model list, and usage guide.
+
 ### System Configuration
 
 | Setting | Default | Description |
diff --git a/docs/public/docs.json b/docs/public/docs.json
index 7d521c41..7f2b2771 100644
--- a/docs/public/docs.json
+++ b/docs/public/docs.json
@@ -36,6 +36,7 @@
       "introduction",
       "installation",
       "usage/getting-started",
+      "usage/openrouter-provider",
       "usage/gemini-provider",
       "usage/search-tools",
       "usage/claude-desktop",
diff --git a/docs/public/usage/openrouter-provider.mdx b/docs/public/usage/openrouter-provider.mdx
new file mode 100644
index 00000000..d14f46c4
--- /dev/null
+++ b/docs/public/usage/openrouter-provider.mdx
@@ -0,0 +1,320 @@
+---
+title: "OpenRouter Provider"
+description: "Access 100+ AI models through OpenRouter's unified API, including free models for cost-effective observation extraction"
+---
+
+# OpenRouter Provider
+
+Claude-mem supports [OpenRouter](https://openrouter.ai) as an alternative provider for observation extraction. OpenRouter provides a unified API to access 100+ models from different providers including Google, Meta, Mistral, DeepSeek, and many others—often with generous free tiers.
+
+**Free Models Available**: OpenRouter offers several completely free models, making it an excellent choice for reducing observation extraction costs to zero while maintaining quality.
+
+## Why Use OpenRouter?
+
+- **Access to 100+ models**: Choose from models across multiple providers through one API
+- **Free tier options**: Several high-quality models are completely free to use
+- **Cost flexibility**: Pay-as-you-go pricing on premium models with no commitments
+- **Seamless fallback**: Automatically falls back to Claude if OpenRouter is unavailable
+- **Hot-swappable**: Switch providers without restarting the worker
+- **Multi-turn conversations**: Full conversation history maintained across API calls
+
+## Free Models on OpenRouter
+
+OpenRouter actively supports democratizing AI access by offering free models. These are production-ready models suitable for observation extraction.
+
+### Featured Free Models
+
+| Model | ID | Parameters | Context | Best For |
+|-------|------|------------|---------|----------|
+| **Xiaomi MiMo-V2-Flash** | `xiaomi/mimo-v2-flash:free` | 309B (15B active, MoE) | 256K | Reasoning, coding, agents |
+| **Gemini 2.0 Flash** | `google/gemini-2.0-flash-exp:free` | — | 1M | General purpose |
+| **Gemini 2.5 Flash** | `google/gemini-2.5-flash-preview:free` | — | 1M | Latest capabilities |
+| **DeepSeek R1** | `deepseek/deepseek-r1:free` | 671B | 64K | Reasoning, analysis |
+| **Llama 3.1 70B** | `meta-llama/llama-3.1-70b-instruct:free` | 70B | 128K | General purpose |
+| **Llama 3.1 8B** | `meta-llama/llama-3.1-8b-instruct:free` | 8B | 128K | Fast, lightweight |
+| **Mistral Nemo** | `mistralai/mistral-nemo:free` | 12B | 128K | Efficient performance |
+
+**Default Model**: Claude-mem uses `xiaomi/mimo-v2-flash:free` by default—a 309B parameter mixture-of-experts model that ranks #1 on SWE-bench Verified and excels at coding and reasoning tasks.
+
+### Free Model Considerations
+
+- **Rate limits**: Free models may have stricter rate limits than paid models
+- **Availability**: Free capacity depends on provider partnerships and demand
+- **Queue times**: During peak usage, requests may be queued briefly
+- **Max tokens**: Most free models support 65,536 completion tokens
+
+All free models support:
+- Tool use and function calling
+- Temperature and sampling controls
+- Stop sequences
+- Streaming responses
+
+## Getting an API Key
+
+1. Go to [OpenRouter](https://openrouter.ai)
+2. Sign in with Google, GitHub, or email
+3. Navigate to [API Keys](https://openrouter.ai/keys)
+4. Click **Create Key**
+5. Copy and securely store your API key
+
+**Free to start**: No credit card required to create an account or use free models. Add credits only if you want to use premium models.
+
+## Configuration
+
+### Settings
+
+| Setting | Values | Default | Description |
+|---------|--------|---------|-------------|
+| `CLAUDE_MEM_PROVIDER` | `claude`, `gemini`, `openrouter` | `claude` | AI provider for observation extraction |
+| `CLAUDE_MEM_OPENROUTER_API_KEY` | string | — | Your OpenRouter API key |
+| `CLAUDE_MEM_OPENROUTER_MODEL` | string | `xiaomi/mimo-v2-flash:free` | Model identifier (see list above) |
+| `CLAUDE_MEM_OPENROUTER_MAX_CONTEXT_MESSAGES` | number | `20` | Max messages in conversation history |
+| `CLAUDE_MEM_OPENROUTER_MAX_TOKENS` | number | `100000` | Token budget safety limit |
+| `CLAUDE_MEM_OPENROUTER_SITE_URL` | string | — | Optional: URL for analytics attribution |
+| `CLAUDE_MEM_OPENROUTER_APP_NAME` | string | `claude-mem` | Optional: App name for analytics |
+
+### Using the Settings UI
+
+1. Open the viewer at http://localhost:37777
+2. Click the **gear icon** to open Settings
+3. Under **AI Provider**, select **OpenRouter**
+4. Enter your OpenRouter API key
+5. Optionally select a different model
+
+Settings are applied immediately—no restart required.
+
+### Manual Configuration
+
+Edit `~/.claude-mem/settings.json`:
+
+```json
+{
+  "CLAUDE_MEM_PROVIDER": "openrouter",
+  "CLAUDE_MEM_OPENROUTER_API_KEY": "sk-or-v1-your-key-here",
+  "CLAUDE_MEM_OPENROUTER_MODEL": "xiaomi/mimo-v2-flash:free"
+}
+```
+
+Alternatively, set the API key via environment variable:
+
+```bash
+export OPENROUTER_API_KEY="sk-or-v1-your-key-here"
+```
+
+The settings file takes precedence over the environment variable.
+
+## Model Selection Guide
+
+### For Free Usage (No Cost)
+
+**Recommended**: `xiaomi/mimo-v2-flash:free`
+- Best-in-class performance on coding benchmarks
+- 256K context window handles large observations
+- 65K max completion tokens
+- Mixture-of-experts architecture (15B active parameters)
+
+**Alternatives**:
+- `google/gemini-2.0-flash-exp:free` - 1M context, Google's flagship
+- `deepseek/deepseek-r1:free` - Excellent reasoning capabilities
+- `meta-llama/llama-3.1-70b-instruct:free` - Strong general purpose
+
+### For Paid Usage (Higher Quality/Speed)
+
+| Model | Price (per 1M tokens) | Best For |
+|-------|----------------------|----------|
+| `anthropic/claude-3.5-sonnet` | $3 in / $15 out | Highest quality observations |
+| `google/gemini-2.0-flash` | $0.075 in / $0.30 out | Fast, cost-effective |
+| `openai/gpt-4o` | $2.50 in / $10 out | GPT-4 quality |
+
+## Context Window Management
+
+The OpenRouter agent implements intelligent context management to prevent runaway costs.
+
+### Automatic Truncation
+
+The agent uses a sliding-window strategy:
+1. Checks if the message count exceeds `MAX_CONTEXT_MESSAGES` (default: 20)
+2. Checks if estimated tokens exceed `MAX_TOKENS` (default: 100,000)
+3. If either limit is exceeded, keeps only the most recent messages
+4. Logs warnings with dropped message counts
+
+### Token Estimation
+
+- Conservative estimate: 1 token ≈ 4 characters
+- Used for proactive context management
+- Actual usage logged from the API response
+
+### Cost Tracking
+
+Logs include detailed usage information:
+
+```
+OpenRouter API usage: {
+  model: "xiaomi/mimo-v2-flash:free",
+  inputTokens: 2500,
+  outputTokens: 1200,
+  totalTokens: 3700,
+  estimatedCostUSD: "0.00",
+  messagesInContext: 8
+}
+```
+
+## Provider Switching
+
+You can switch between providers at any time:
+
+- **No restart required**: Changes take effect on the next observation
+- **Conversation history preserved**: When switching mid-session, the new provider sees the full conversation context
+- **Seamless transition**: All providers use the same observation format
+
+### Switching via UI
+
+1. Open Settings in the viewer
+2. Change the **AI Provider** dropdown
+3. The next observation will use the new provider
+
+### Switching via Settings File
+
+```json
+{
+  "CLAUDE_MEM_PROVIDER": "openrouter"
+}
+```
+
+## Fallback Behavior
+
+If OpenRouter encounters errors, claude-mem automatically falls back to the Claude Agent SDK.
+
+**Triggers fallback:**
+- Rate limiting (HTTP 429)
+- Server errors (HTTP 500, 502, 503)
+- Network issues (connection refused, timeout)
+- Generic fetch failures
+
+**Does not trigger fallback:**
+- Missing API key (logs warning, uses Claude from the start)
+- Invalid API key (fails with an error)
+
+When fallback occurs:
+1. A warning is logged
+2. Any in-progress messages are reset to pending
+3. The Claude SDK takes over with the full conversation context
+
+**Fallback is transparent**: Your observations continue processing without interruption. The fallback preserves all conversation context.
+
+## Multi-Turn Conversation Support
+
+The OpenRouter agent maintains full conversation history across API calls:
+
+```
+Session Created
+  ↓
+Load Pending Messages (observations from queue)
+  ↓
+For each message:
+  → Add to conversation history
+  → Call OpenRouter API with FULL history
+  → Parse XML response
+  → Store observations in database
+  → Sync to Chroma vector DB
+  ↓
+Session complete
+```
+
+This enables:
+- Coherent multi-turn exchanges
+- Context preservation across observations
+- Seamless provider switching mid-session
+
+## Troubleshooting
+
+### "OpenRouter API key not configured"
+
+Either:
+- Set `CLAUDE_MEM_OPENROUTER_API_KEY` in `~/.claude-mem/settings.json`, or
+- Set the `OPENROUTER_API_KEY` environment variable
+
+### Rate Limiting
+
+Free models may have rate limits during peak usage. If you hit rate limits:
+- Claude-mem automatically falls back to the Claude SDK
+- Consider switching to a different free model
+- Add credits for premium model access
+
+### Model Not Found
+
+Verify the model ID is correct:
+- Check [OpenRouter Models](https://openrouter.ai/models) for current availability
+- Use the `:free` suffix for free model variants
+- Model IDs are case-sensitive
+
+### High Token Usage Warning
+
+If you see warnings about high token usage (>50,000 per request):
+- Reduce `CLAUDE_MEM_OPENROUTER_MAX_CONTEXT_MESSAGES`
+- Reduce `CLAUDE_MEM_OPENROUTER_MAX_TOKENS`
+- Consider a model with a larger context window
+
+### Connection Errors
+
+If you see connection errors:
+- Check your internet connection
+- Verify OpenRouter service status at [status.openrouter.ai](https://status.openrouter.ai)
+- The agent will automatically fall back to Claude
+
+## API Details
+
+OpenRouter uses an OpenAI-compatible REST API.
+
+**Endpoint**: `https://openrouter.ai/api/v1/chat/completions`
+
+**Headers**:
+```
+Authorization: Bearer {apiKey}
+HTTP-Referer: https://github.com/thedotmack/claude-mem
+X-Title: claude-mem
+Content-Type: application/json
+```
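Putting the endpoint and headers together, a request can be assembled as in the Python sketch below. `build_openrouter_request` is a hypothetical helper introduced for illustration, not part of claude-mem; `HTTP-Referer` and `X-Title` are OpenRouter's optional attribution headers.

```python
import json

# Illustrative sketch: assemble the headers and JSON body for the
# chat/completions endpoint shown above. Hypothetical helper, not
# claude-mem's actual implementation.
OPENROUTER_URL = "https://openrouter.ai/api/v1/chat/completions"

def build_openrouter_request(api_key: str, model: str, messages: list) -> tuple:
    headers = {
        "Authorization": f"Bearer {api_key}",
        "HTTP-Referer": "https://github.com/thedotmack/claude-mem",  # optional attribution
        "X-Title": "claude-mem",                                     # optional attribution
        "Content-Type": "application/json",
    }
    body = json.dumps({
        "model": model,
        "messages": messages,
        "temperature": 0.3,
        "max_tokens": 4096,
    })
    return headers, body

headers, body = build_openrouter_request(
    "sk-or-v1-example",
    "xiaomi/mimo-v2-flash:free",
    [{"role": "user", "content": "Extract observations from this session."}],
)
print(json.loads(body)["model"])  # xiaomi/mimo-v2-flash:free
```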
+
+**Request Format**:
+```json
+{
+  "model": "xiaomi/mimo-v2-flash:free",
+  "messages": [
+    {"role": "system", "content": "..."},
+    {"role": "user", "content": "..."}
+  ],
+  "temperature": 0.3,
+  "max_tokens": 4096
+}
+```
+
+## Comparing Providers
+
+| Feature | Claude (SDK) | Gemini | OpenRouter |
+|---------|-------------|--------|------------|
+| **Cost** | Pay per token | Free tier + paid | Free models + paid |
+| **Models** | Claude only | Gemini only | 100+ models |
+| **Quality** | Highest | High | Varies by model |
+| **Rate limits** | Based on tier | 5-4000 RPM | Varies by model |
+| **Fallback** | N/A (primary) | → Claude | → Claude |
+| **Setup** | Automatic | API key required | API key required |
+
+**Recommendation**: Start with OpenRouter's free `xiaomi/mimo-v2-flash:free` model for zero-cost observation extraction. If you need higher quality or encounter rate limits, switch to Claude or add OpenRouter credits for premium models.
+
+## Next Steps
+
+- [Configuration](../configuration) - Full settings reference
+- [Gemini Provider](gemini-provider) - Alternative free provider
+- [Getting Started](getting-started) - Basic usage guide
+- [Troubleshooting](../troubleshooting) - Common issues