Commit Graph

299 Commits

Author SHA1 Message Date
Luis Pater 13eb5268de Merge pull request #582 from ben-vargas/fix-gemini-3-thinking-level
feat: use thinkingLevel for Gemini 3 models per Google documentation
2025-12-18 07:19:37 +08:00
Ben Vargas 598f0af19b fix: apply thinkingLevel from model suffix metadata for Gemini 3
The previous commit added thinkingLevel support but didn't apply it
when the reasoning effort came from model name suffix (e.g., model(minimal)).

This was because ResolveThinkingConfigFromMetadata returns nil for
level-based models, bypassing the metadata application.

Changes:
- Add ApplyGemini3ThinkingLevelFromMetadata for standard Gemini API
- Add ApplyGemini3ThinkingLevelFromMetadataCLI for CLI API format
- Update gemini_cli_executor to apply Gemini 3 thinkingLevel from metadata
- Update antigravity_executor to apply Gemini 3 thinkingLevel from metadata
- Update aistudio_executor to apply Gemini 3 thinkingLevel from metadata
- Add comprehensive test coverage for Gemini 3 thinkingLevel functions
2025-12-17 16:08:38 -07:00
Ben Vargas a33f5d31fc feat: use thinkingLevel for Gemini 3 models per Google documentation
Per Google's official documentation, Gemini 3 models should use
thinkingLevel (string) instead of thinkingBudget (number) for
optimal performance.

From Google's Gemini Thinking docs:
> Use the thinkingLevel parameter with Gemini 3 models. While
> thinkingBudget is accepted for backwards compatibility, using
> it with Gemini 3 Pro may result in suboptimal performance.

Changes:
- Add model family detection functions (IsGemini3Model, IsGemini25Model,
  IsGemini3ProModel, IsGemini3FlashModel)
- Add ApplyGeminiThinkingLevel and ApplyGeminiCLIThinkingLevel functions
  for applying thinkingLevel config
- Add ValidateGemini3ThinkingLevel for model-specific level validation
- Add ThinkingBudgetToGemini3Level for backward compatibility conversion
- Update NormalizeGeminiThinkingBudget to convert budget to level for
  Gemini 3 models
- Update ApplyDefaultThinkingIfNeeded to not set a default level for
  Gemini 3 (lets API use its dynamic default "high")
- Update ConvertThinkingLevelToBudget to preserve thinkingLevel for
  Gemini 3 models
- Add Levels field to all Gemini 3 model definitions:
  - Gemini 3 Pro: ["low", "high"]
  - Gemini 3 Flash: ["minimal", "low", "medium", "high"]

Backward compatibility:
- Gemini 2.5 models continue to use thinkingBudget as before
- If thinkingBudget is provided for Gemini 3, it's converted to the
  appropriate thinkingLevel
- Existing configurations continue to work
2025-12-17 15:28:20 -07:00
Luis Pater 68a27772b3 feat(antigravity): enable token counting via API with resilient routing
docker-image / docker (push) Has been cancelled
goreleaser / goreleaser (push) Has been cancelled
Introduces the capability to count tokens for Antigravity-backed requests. This implementation leverages the `countTokens` endpoint of the Antigravity API, replacing the prior unsupported stub.

Key aspects of this update include:

- **API Integration**: Direct integration with the Antigravity `countTokens` API, including necessary request payload translation and authentication.
- **Resilient Infrastructure**: A fallback mechanism has been established, allowing the system to attempt connections across multiple Antigravity base URLs to ensure request success even in the event of temporary service interruptions.
- **Model Aliasing**: Added mappings for `gemini-3-flash` and `gemini-3-flash-preview` to ensure compatibility with the latest model variants.
- **Robust Error Handling**: Comprehensive error handling and logging are in place to manage failures during API interactions.
2025-12-18 03:12:46 +08:00
Luis Pater 5fda6f8ef3 feat(antigravity): implement non-streaming execution for Claude model requests 2025-12-17 23:17:11 +08:00
Luis Pater 09923f654c feat(antigravity): add streaming support for Claude model requests 2025-12-17 22:16:57 +08:00
이대희 1b8e538a77 feature: Improves Gemini JSON schema compatibility
Enhances compatibility with the Gemini API by implementing a schema cleaning process.

This includes:
- Centralizing schema cleaning logic for Gemini in a dedicated utility function.
- Converting unsupported schema keywords to hints within the description field.
- Flattening complex schema structures like `anyOf`, `oneOf`, and type arrays to simplify the schema.
- Handling streaming responses with empty tool names, which can occur in subsequent chunks after the initial tool use.
2025-12-17 17:10:53 +09:00
hkfires 28a428ae2f fix(thinking): align budget effort mapping across translators
Unify thinking budget-to-effort conversion in a shared helper, handle disabled/default thinking cases in translators, adjust zero-budget mapping, and drop the old OpenAI-specific helper with updated tests.
2025-12-16 18:34:43 +08:00
hkfires b326ec3641 feat(iflow): add thinking support for iFlow models 2025-12-16 18:34:43 +08:00
hkfires 367a05bdf6 refactor(thinking): export thinking helpers
Expose thinking/effort normalization helpers from the executor package
so conversion tests use production code and stay aligned with runtime
validation behavior.
2025-12-15 09:16:15 +08:00
hkfires 712ce9f781 fix(thinking): drop unsupported none effort
When budget 0 maps to "none" for models that use thinking levels
but don't support that effort level, strip thinking fields instead
of setting an invalid reasoning_effort value.
Tests now expect removal for this edge case.
2025-12-15 09:16:14 +08:00
hkfires 27a5ad8ec2 Fixed: #534
fix(aistudio): correct JSON string boundary detection for backslash sequences
2025-12-15 09:00:14 +08:00
Luis Pater 14ce6aebd1 Merge pull request #449 from sususu98/fix/gemini-cli-429-retry-delay-parsing
fix(gemini-cli): enhance 429 retry delay parsing
2025-12-14 14:04:14 +08:00
Luis Pater 660aabc437 fix(executor): add allowCompat support for reasoning effort normalization
Introduced `allowCompat` parameter to improve compatibility handling for reasoning effort in payloads across OpenAI and similar models.
2025-12-13 04:06:02 +08:00
Luis Pater f3f0f1717d Merge branch 'dev' into think 2025-12-12 22:16:44 +08:00
Luis Pater 7621ec609e Merge pull request #501 from huynguyen03dev/fix/openai-compat-model-alias-resolution
docker-image / docker (push) Has been cancelled
goreleaser / goreleaser (push) Has been cancelled
fix(openai-compat): prevent model alias from being overwritten
2025-12-12 21:58:15 +08:00
Luis Pater 9f511f0024 fix(executor): improve model compatibility handling for OpenAI-compatibility
Enhances payload handling by introducing OpenAI-compatibility checks and refining how reasoning metadata is resolved, ensuring broader model support.
2025-12-12 21:57:25 +08:00
hkfires 374faa2640 fix(thinking): map budgets to effort levels
Ensure thinking settings translate correctly across providers:
- Only apply reasoning_effort to level-based models and derive it from numeric
  budget suffixes when present
- Strip effort string fields for budget-based models and skip Claude/Gemini
  budget resolution for level-based or unsupported models
- Default Gemini include_thoughts when a nonzero budget override is set
- Add cross-protocol conversion and budget range tests
2025-12-12 21:33:20 +08:00
huynguyen03.dev 15c3cc3a50 fix(openai-compat): prevent model alias from being overwritten by ResolveOriginalModel
When using OpenAI-compatible providers with model aliases (e.g., glm-4.6-zai -> glm-4.6),
the alias resolution was correctly applied but then immediately overwritten by
ResolveOriginalModel, causing 'Unknown Model' errors from upstream APIs.

This fix skips the ResolveOriginalModel override when a model alias has already
been resolved, ensuring the correct model name is sent to the upstream provider.

Co-authored-by: Amp <amp@ampcode.com>
2025-12-12 17:20:24 +07:00
hkfires d131435e25 fix(codex): raise default reasoning effort to medium 2025-12-12 18:18:48 +08:00
hkfires 3c315551b0 refactor(executor): relocate gemini token counters 2025-12-11 21:56:44 +08:00
hkfires 27c9c5c4da refactor(executor): clarify executor comments and oauth names 2025-12-11 21:56:44 +08:00
hkfires fc9f6c974a refactor(executor): clarify providers and streams
Add package and constructor documentation for AI Studio, Antigravity,
Gemini CLI, Gemini API, and Vertex executors to describe their roles and
inputs.

Introduce a shared stream scanner buffer constant in the Gemini API
executor and reuse it in Gemini CLI and Vertex streaming code so stream
handling uses a consistent configuration.

Update Refresh implementations for AI Studio, Gemini CLI, Gemini API
(API key), and Vertex executors to short‑circuit and simply return the
incoming auth object, while keeping Antigravity token renewal as the
only executor that performs OAuth refresh.

Remove OAuth2-based token refresh logic and related dependencies from
the Gemini API executor, since it now operates strictly with API key
credentials.
2025-12-11 21:56:43 +08:00
Luis Pater a74ee3f319 Merge pull request #481 from sususu98/fix/increase-buffer-size
docker-image / docker (push) Has been cancelled
goreleaser / goreleaser (push) Has been cancelled
fix: increase buffer size for stream scanners to 50MB across multiple executors
2025-12-11 21:20:54 +08:00
hkfires e79f65fd8e refactor(thinking): use parentheses for metadata suffix 2025-12-11 18:39:07 +08:00
hkfires facfe7c518 refactor(thinking): use bracket tags for thinking meta
Align thinking suffix handling on a single bracket-style marker.

NormalizeThinkingModel strips a terminal `[value]` segment from
model identifiers and turns it into either a thinking budget (for
numeric values) or a reasoning effort hint (for strings). Emission
of `ThinkingIncludeThoughtsMetadataKey` is removed.

Executor helpers and the example config are updated so their
comments reference the new `[value]` suffix format instead of the
legacy dash variants.

BREAKING CHANGE: dash-based thinking suffixes (`-thinking`,
`-thinking-N`, `-reasoning`, `-nothinking`) are no longer parsed
for thinking metadata; only `[value]` annotations are recognized.
2025-12-11 18:17:28 +08:00
hkfires 6285459c08 fix(runtime): unify claude thinking config resolution 2025-12-11 17:20:44 +08:00
hkfires 21bbceca0c docs(runtime): document reasoning effort precedence 2025-12-11 16:35:36 +08:00
hkfires f6300c72b7 fix(runtime): validate thinking config in iflow and qwen 2025-12-11 16:21:50 +08:00
hkfires 3a81ab22fd fix(runtime): unify reasoning effort metadata overrides 2025-12-11 14:35:05 +08:00
hkfires 519da2e042 fix(runtime): validate reasoning effort levels 2025-12-11 12:36:54 +08:00
hkfires 3ffd120ae9 feat(runtime): add thinking config normalization 2025-12-11 11:51:33 +08:00
sususu 07d21463ca fix(gemini-cli): enhance 429 retry delay parsing
Add fallback parsing for quota reset delay when RetryInfo is not present:
- Try ErrorInfo.metadata.quotaResetDelay (e.g., "373.801628ms")
- Parse from error.message "Your quota will reset after Xs."

This ensures proper cooldown timing for rate-limited requests.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2025-12-11 09:34:39 +08:00
Luis Pater 423ce97665 feat(util): implement dynamic thinking suffix normalization and refactor budget resolution logic
docker-image / docker (push) Has been cancelled
goreleaser / goreleaser (push) Has been cancelled
- Added support for parsing and normalizing dynamic thinking model suffixes.
- Centralized budget resolution across executors and payload helpers.
- Retired legacy Gemini-specific thinking handlers in favor of unified logic.
- Updated executors to use metadata-based thinking configuration.
- Added `ResolveOriginalModel` utility for resolving normalized upstream models using request metadata.
- Updated executors (Gemini, Codex, iFlow, OpenAI, Qwen) to incorporate upstream model resolution and substitute model values in payloads and request URLs.
- Ensured fallbacks handle cases with missing or malformed metadata to derive models robustly.
- Refactored upstream model resolution to dynamically incorporate metadata for selecting and normalizing models.
- Improved handling of thinking configurations and model overrides in executors.
- Removed hardcoded thinking model entries and migrated logic to metadata-based resolution.
- Updated payload mutations to always include the resolved model.
2025-12-11 03:10:50 +08:00
sususu 76c563d161 fix(executor): increase buffer size for stream scanners to 50MB across multiple executors 2025-12-10 23:20:04 +08:00
Luis Pater 94d61c7b2b fix(logging): update response aggregation logic to include all attempts
docker-image / docker (push) Has been cancelled
goreleaser / goreleaser (push) Has been cancelled
2025-12-10 16:53:48 +08:00
Luis Pater f25f419e5a fix(antigravity): remove references to autopush endpoint and update fallback logic
docker-image / docker (push) Has been cancelled
goreleaser / goreleaser (push) Has been cancelled
2025-12-10 00:13:20 +08:00
hkfires 9b202b6c1c fix(executor): centralize default thinking config 2025-12-09 21:05:06 +08:00
hkfires 6a66b6801a feat(executor): enforce minimum thinking budget for antigravity models 2025-12-09 21:05:06 +08:00
hkfires 5ec9b5e5a9 feat(executor): normalize thinking budget across all Gemini executors 2025-12-09 21:05:06 +08:00
Luis Pater 5db3b58717 Merge pull request #470 from router-for-me/agry
fix(gemini): normalize model listing output
2025-12-09 21:00:29 +08:00
Luis Pater 39b6b3b289 Fixed: #463
docker-image / docker (push) Has been cancelled
goreleaser / goreleaser (push) Has been cancelled
fix(antigravity): remove `$ref` and `$defs` from JSON during key deletion
2025-12-09 17:32:17 +08:00
hkfires e5312fb5a2 feat(antigravity): support canonical names for antigravity models 2025-12-09 16:54:13 +08:00
hkfires 96b55acff8 feat(aistudio): normalize thinking budget in request translation 2025-12-09 08:27:44 +08:00
Luis Pater af00304b0c fix(antigravity): remove exclusiveMaximum from JSON during key deletion 2025-12-08 23:28:01 +08:00
Luis Pater 6ad188921c refactor(logging): remove unused variable in ensureAttempt and redundant function call
docker-image / docker (push) Has been cancelled
goreleaser / goreleaser (push) Has been cancelled
2025-12-08 22:25:58 +08:00
hkfires a283545b6b feat(antigravity): enforce thinking budget limits for Claude models 2025-12-08 20:36:17 +08:00
hkfires 9c09128e00 feat(registry): add explicit thinking support config for antigravity models 2025-12-07 19:12:55 +08:00
Luis Pater fd29ab418a Fixed: #424
docker-image / docker (push) Has been cancelled
goreleaser / goreleaser (push) Has been cancelled
**feat(antigravity): add support for maxOutputTokens and refine Claude model handling**
2025-12-07 01:55:57 +08:00
Luis Pater d7564173dd fix(antigravity): restore production base URL in the executor
docker-image / docker (push) Has been cancelled
goreleaser / goreleaser (push) Has been cancelled
2025-12-06 01:11:37 +08:00