fix: add context window management to prevent runaway API costs

Implements critical safeguards for OpenRouter integration:

- Context truncation: sliding window keeps max 20 messages (configurable)
- Token limits: hard cap at 100k estimated tokens per request
- Cost tracking: logs token usage and estimated cost per API call
- High-usage warnings: alerts when single request exceeds 50k tokens

New settings:
- CLAUDE_MEM_OPENROUTER_MAX_CONTEXT_MESSAGES (default: 20)
- CLAUDE_MEM_OPENROUTER_MAX_TOKENS (default: 100000)

Prevents exponential context growth that caused $300 runaway charges
in initial testing. Context now automatically truncates to most recent
messages within token budget, with detailed logging for monitoring.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
This commit is contained in:
Jarad DeLorenzo
2025-12-26 15:38:14 -05:00
parent 86d0d1a21a
commit e3186ede5d
11 changed files with 195 additions and 103 deletions
File diff suppressed because one or more lines are too long