85 Commits

Author SHA1 Message Date
hkfires ce7474d953 feat(cliproxy): propagate thinking support metadata to aliased models 2025-12-30 15:16:54 +08:00
leaph 6403ff4ec4 feat(iflow): add model-specific thinking configs for GLM-4.7 and MiniMax-M2.1
- GLM-4.7: Uses extra_body={"thinking": {"type": "enabled"}, "clear_thinking": false}
- MiniMax-M2.1: Uses reasoning_split=true for OpenAI-style reasoning separation
- Added preserveReasoningContentInMessages() to support re-injection of reasoning
  content in assistant message history for multi-turn conversations
- Added ThinkingSupport to MiniMax-M2.1 model definition
2025-12-27 18:39:15 +01:00
Luis Pater 9d975e0375 feat(models): add support for GLM-4.7 and MiniMax-M2.1 2025-12-24 19:30:57 +08:00
sheauhuu df777650ac feat: add gemini-3-flash-preview model definition in GetGeminiModels 2025-12-20 20:05:20 +08:00
hkfires fa70b220e9 feat(registry): add gpt 5.2 codex model definition 2025-12-19 09:53:03 +08:00
Ben Vargas a33f5d31fc feat: use thinkingLevel for Gemini 3 models per Google documentation
Per Google's official documentation, Gemini 3 models should use
thinkingLevel (string) instead of thinkingBudget (number) for
optimal performance.

From Google's Gemini Thinking docs:
> Use the thinkingLevel parameter with Gemini 3 models. While
> thinkingBudget is accepted for backwards compatibility, using
> it with Gemini 3 Pro may result in suboptimal performance.

Changes:
- Add model family detection functions (IsGemini3Model, IsGemini25Model,
  IsGemini3ProModel, IsGemini3FlashModel)
- Add ApplyGeminiThinkingLevel and ApplyGeminiCLIThinkingLevel functions
  for applying thinkingLevel config
- Add ValidateGemini3ThinkingLevel for model-specific level validation
- Add ThinkingBudgetToGemini3Level for backward compatibility conversion
- Update NormalizeGeminiThinkingBudget to convert budget to level for
  Gemini 3 models
- Update ApplyDefaultThinkingIfNeeded to not set a default level for
  Gemini 3 (lets API use its dynamic default "high")
- Update ConvertThinkingLevelToBudget to preserve thinkingLevel for
  Gemini 3 models
- Add Levels field to all Gemini 3 model definitions:
  - Gemini 3 Pro: ["low", "high"]
  - Gemini 3 Flash: ["minimal", "low", "medium", "high"]

Backward compatibility:
- Gemini 2.5 models continue to use thinkingBudget as before
- If thinkingBudget is provided for Gemini 3, it's converted to the
  appropriate thinkingLevel
- Existing configurations continue to work
2025-12-17 15:28:20 -07:00
Luis Pater f27672f6cf feat(antigravity): add Gemini 3 Flash Preview model definition with enhanced capabilities
docker-image / docker (push) Has been cancelled
goreleaser / goreleaser (push) Has been cancelled
2025-12-18 01:02:19 +08:00
hkfires b326ec3641 feat(iflow): add thinking support for iFlow models 2025-12-16 18:34:43 +08:00
Luis Pater 5a75ef8ffd Merge pull request #536 from AoaoMH/feature/auth-model-check
docker-image / docker (push) Has been cancelled
goreleaser / goreleaser (push) Has been cancelled
feat: using Client Model Infos;
2025-12-15 00:29:33 +08:00
Test 07279f8746 feat: using Client Model Infos; 2025-12-15 00:13:05 +08:00
Luis Pater 71f788b13a fix(registry): remove unused ThinkingSupport from DeepSeek-R1 model 2025-12-14 21:30:17 +08:00
Luis Pater 59c62dc580 fix(registry): correct DeepSeek-V3.2 experimental model ID 2025-12-14 21:27:43 +08:00
Luis Pater d5310a3300 Merge pull request #531 from AoaoMH/feature/auth-model-check
docker-image / docker (push) Has been cancelled
goreleaser / goreleaser (push) Has been cancelled
feat: add API endpoint to query models for auth credentials
2025-12-14 16:46:43 +08:00
Luis Pater f0a3eb574e fix(registry): update DeepSeek model definitions with new IDs and descriptions
docker-image / docker (push) Has been cancelled
goreleaser / goreleaser (push) Has been cancelled
2025-12-14 16:17:11 +08:00
Test bb15855443 feat: add API endpoint to query models for auth credentials 2025-12-14 15:16:26 +08:00
Ben Vargas b09e2115d1 fix(models): add "none" reasoning effort level to gpt-5.2
Per OpenAI API documentation, gpt-5.2 supports reasoning_effort values
of "none", "low", "medium", "high", and "xhigh". The "none" level was
missing from the model definition.

Reference: https://platform.openai.com/docs/api-reference/chat/create#chat_create-reasoning_effort
2025-12-11 15:26:23 -07:00
Luis Pater cd2da152d4 feat(models): add GPT 5.2 model definition and prompts
docker-image / docker (push) Has been cancelled
goreleaser / goreleaser (push) Has been cancelled
2025-12-12 03:02:27 +08:00
hkfires 007572b58e fix(util): do not strip thinking suffix on registered models
NormalizeThinkingModel now checks ModelSupportsThinking before removing
"-thinking" or "-thinking-<ver>", avoiding accidental parsing of model
names where the suffix is part of the official id (e.g., kimi-k2-thinking,
qwen3-235b-a22b-thinking-2507).

The registry adds ThinkingSupport metadata for several models and
propagates it via ModelInfo (e.g., kimi-k2-thinking, deepseek-r1,
qwen3-235b-a22b-thinking-2507, minimax-m2), enabling accurate detection
of thinking-capable models and correcting base model inference.
2025-12-11 15:52:14 +08:00
hkfires a03d514095 feat(registry): add thinking metadata for models 2025-12-11 11:28:44 +08:00
Luis Pater 423ce97665 feat(util): implement dynamic thinking suffix normalization and refactor budget resolution logic
docker-image / docker (push) Has been cancelled
goreleaser / goreleaser (push) Has been cancelled
- Added support for parsing and normalizing dynamic thinking model suffixes.
- Centralized budget resolution across executors and payload helpers.
- Retired legacy Gemini-specific thinking handlers in favor of unified logic.
- Updated executors to use metadata-based thinking configuration.
- Added `ResolveOriginalModel` utility for resolving normalized upstream models using request metadata.
- Updated executors (Gemini, Codex, iFlow, OpenAI, Qwen) to incorporate upstream model resolution and substitute model values in payloads and request URLs.
- Ensured fallbacks handle cases with missing or malformed metadata to derive models robustly.
- Refactored upstream model resolution to dynamically incorporate metadata for selecting and normalizing models.
- Improved handling of thinking configurations and model overrides in executors.
- Removed hardcoded thinking model entries and migrated logic to metadata-based resolution.
- Updated payload mutations to always include the resolved model.
2025-12-11 03:10:50 +08:00
hkfires 3cfe7008a2 fix(registry): update gpt 5.1 model names 2025-12-09 17:55:21 +08:00
hkfires e5312fb5a2 feat(antigravity): support canonical names for antigravity models 2025-12-09 16:54:13 +08:00
hkfires a283545b6b feat(antigravity): enforce thinking budget limits for Claude models 2025-12-08 20:36:17 +08:00
hkfires 9c09128e00 feat(registry): add explicit thinking support config for antigravity models 2025-12-07 19:12:55 +08:00
Luis Pater 897c40bed8 feat(registry): add DeepSeek-V3.2-Chat model definition
docker-image / docker (push) Has been cancelled
goreleaser / goreleaser (push) Has been cancelled
Add new DeepSeek-V3.2-Chat model to the registry with standard chat configuration, positioned before the experimental variant for better organization.
2025-12-03 21:34:50 +08:00
Luis Pater 1434bc38e5 **refactor(registry): remove Qwen3-Coder from model definitions**
docker-image / docker (push) Has been cancelled
goreleaser / goreleaser (push) Has been cancelled
2025-12-02 11:34:38 +08:00
hkfires 75e278c7a5 feat(registry): add thinking support to gemini models 2025-11-30 20:56:29 +08:00
Luis Pater d2e4639b2a **feat(registry): add context length and update max tokens for Claude model configurations**
- Added `ContextLength` field with a value of 200,000 to all applicable Claude model definitions.
- Standardized `MaxCompletionTokens` values across models for consistency and alignment.
2025-11-27 16:13:25 +08:00
nestharus e73cdf5cff fix(claude): ensure max_tokens exceeds thinking budget for thinking models
Fixes an issue where Claude thinking models would return 400 errors when
the thinking.budget_tokens was greater than or equal to max_tokens.

Changes:
- Add MaxCompletionTokens: 128000 to all Claude thinking model definitions
- Add ensureMaxTokensForThinking() function in claude_executor.go that:
  - Checks if thinking is enabled with a budget_tokens value
  - Looks up the model's MaxCompletionTokens from the registry
  - Ensures max_tokens is set to at least the model's MaxCompletionTokens
  - Falls back to budget_tokens + 4000 buffer if registry lookup fails

This ensures Anthropic API constraint (max_tokens > thinking.budget_tokens)
is always satisfied when using extended thinking features.

Fixes: #339

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
2025-11-26 22:31:05 -08:00
Luis Pater 36755421fe Merge pull request #343 from router-for-me/misc
style(amp): tidy whitespace in proxy module and tests
2025-11-26 19:03:07 +08:00
hkfires 6c17dbc4da style(amp): tidy whitespace in proxy module and tests 2025-11-26 18:57:26 +08:00
Luis Pater ee6429cc75 **feat(registry): add Gemini 3 Pro Image Preview model and remove Claude Sonnet 4.5 Thinking**
docker-image / docker (push) Has been cancelled
goreleaser / goreleaser (push) Has been cancelled
- Added new `Gemini 3 Pro Image Preview` model with detailed metadata and configuration.
- Removed outdated `Claude Sonnet 4.5 Thinking` model definition for cleanup and relevance.
2025-11-26 18:22:40 +08:00
nestharus d0e694d4ed feat(claude): add thinking model variants and beta headers support
- Add Claude thinking model definitions (sonnet-4-5-thinking, opus-4-5-thinking variants)
- Add Thinking support for antigravity models with -thinking suffix
- Add injectThinkingConfig() for automatic thinking budget based on model suffix
- Add resolveUpstreamModel() mappings for thinking variants to actual Claude models
- Add extractAndRemoveBetas() to convert betas array to anthropic-beta header
- Update applyClaudeHeaders() to merge custom betas from request body

Closes #324
2025-11-25 03:33:05 -08:00
Ben Vargas 0895533400 fix(registry): correct Claude Opus 4.5 created timestamp
Update epoch from 1730419200 (2024-11-01) to 1761955200 (2025-11-01).
2025-11-24 12:27:23 -07:00
Ben Vargas 43f007c234 feat(registry): add Claude Opus 4.5 model definition
Add support for claude-opus-4-5-20251101 with 200K context window
and 64K max output tokens.
2025-11-24 12:26:39 -07:00
Luis Pater db81331ae8 **refactor(middleware): extract request logging logic and optimize condition checks**
docker-image / docker (push) Has been cancelled
goreleaser / goreleaser (push) Has been cancelled
- Added `shouldLogRequest` helper to simplify path-based request logging logic.
- Updated middleware to skip management endpoints for improved security.
- Introduced an explicit `nil` logger check for minimal overhead.
- Updated dependencies in `go.mod`.

**feat(auth): add handling for 404 response with retry logic**

- Introduced support for 404 `not_found` status with a 12-hour backoff period.
- Updated `manager.go` to align state and status messages for 404 scenarios.

**refactor(translator): comment out debug logging in Gemini responses request**
2025-11-20 23:20:40 +08:00
Luis Pater 371324c090 **feat(registry): expand Gemini model definitions and support Vertex AI** 2025-11-20 18:16:26 +08:00
Luis Pater 0586da9c2b **refactor(registry): move Gemini 3 Pro Preview model definition to base set** 2025-11-20 10:51:16 +08:00
Ben Vargas 782bba0bc4 feat(registry): enable gemini-3-pro-preview for gemini-cli provider
Add gemini-3-pro-preview model to GetGeminiCLIModels() to make it
available for OAuth-based Gemini CLI users, matching the model
already available in AI Studio provider.

Model spec:
- ID: gemini-3-pro-preview
- Version: 3.0
- Input: 1M tokens
- Output: 64K tokens
- Thinking: 128-32K tokens (dynamic)
2025-11-19 12:47:39 -07:00
Luis Pater bf116b68f8 **feat(registry): add GPT-5.1 Codex Max model definitions and support**
docker-image / docker (push) Has been cancelled
goreleaser / goreleaser (push) Has been cancelled
- Introduced `gpt-5.1-codex-max` variants to model definitions (`low`, `medium`, `high`, `xhigh`).
- Updated executor logic to map effort levels for Codex Max models.
- Added `lastCodexMaxPrompt` processing for `gpt-5.1-codex-max` prompts.
- Defined instructions for `gpt-5.1-codex-max` in a new file: `codex_instructions/gpt-5.1-codex-max_prompt.md`.
2025-11-20 03:12:22 +08:00
Luis Pater 17016ae6a5 **feat(registry): add Gemini 3 Pro Preview model definition**
docker-image / docker (push) Has been cancelled
goreleaser / goreleaser (push) Has been cancelled
2025-11-18 23:48:21 +08:00
Luis Pater 01b7b60901 **feat(registry): add Gemini 3 Pro Preview model definition** 2025-11-18 23:46:58 +08:00
Luis Pater 23a7633e6d **fix(registry): update Thinking parameters and replace Gemini-3 Preview with Gemini-2.5 Flash Lite**
docker-image / docker (push) Has been cancelled
goreleaser / goreleaser (push) Has been cancelled
2025-11-18 11:51:52 +08:00
Luis Pater 772fa69515 Fixed: #254
docker-image / docker (push) Has been cancelled
goreleaser / goreleaser (push) Has been cancelled
**feat(registry): add Kimi-K2-Thinking model to model definitions**
2025-11-14 21:20:54 +08:00
Luis Pater 1ccb01631d **refactor(runtime): centralize reasoning effort logic for GPT models**
docker-image / docker (push) Has been cancelled
goreleaser / goreleaser (push) Has been cancelled
Extract reasoning effort mapping into a reusable function `setReasoningEffortByAlias` to reduce redundancy and improve maintainability. Introduce support for the "gpt-5.1-none" variant in the registry and runtime executor.
2025-11-14 17:24:40 +08:00
Ben Vargas cfbaed0e90 fix(runtime): remove gpt-5.1 minimal effort variant
Stop advertising and mapping the unsupported gpt-5.1-minimal variant in the model registry and Codex executor, and align bare gpt-5.1 requests to use medium reasoning effort like Codex CLI while preserving minimal for gpt-5.
2025-11-13 19:43:52 -07:00
Luis Pater 75b57bc112 Fixed: #246
docker-image / docker (push) Has been cancelled
goreleaser / goreleaser (push) Has been cancelled
feat(runtime): add support for GPT-5.1 models and variants

Introduce GPT-5.1 model family, including minimal, low, medium, high, Codex, and Codex Mini variants. Update tokenization and reasoning effort handling to accommodate new models in executor and registry.
2025-11-13 17:42:19 +08:00
TUGOhost 92f4278039 feat: add auto model resolution and model creation timestamp tracking
- Add 'created' field to model registry for tracking model creation time
- Implement GetFirstAvailableModel() to find the first available model by newest creation timestamp
- Add ResolveAutoModel() utility function to resolve "auto" model name to actual available model
- Update request handler to resolve "auto" model before processing requests
- Ensures automatic model selection when "auto" is specified as model name

This enables dynamic model selection based on availability and creation time, improving the user experience when no specific model is requested.
2025-11-11 20:30:09 +08:00
Luis Pater d745f07044 fix(registry): replace Gemini model list with updated stable and preview versions
docker-image / docker (push) Has been cancelled
goreleaser / goreleaser (push) Has been cancelled
2025-11-08 15:51:57 +08:00
Jeff Nash fcb0293c0d feat(registry): add GPT-5 Codex Mini model variants
Adds three new Codex Mini model variants (mini, mini-medium, mini-high)
that map to codex-mini-latest. Codex Mini supports medium and high
reasoning effort levels only (no low/minimal). Base model defaults to
medium reasoning effort.
2025-11-07 17:07:39 -08:00