- Added comprehensive unit tests for `authScheduler` and related components.
- Implemented `authScheduler` with support for Round Robin, Fill First, and custom selector strategies.
- Improved tracking of auth states, cooldowns, and recovery logic in scheduler.
test(websocket): add tests for incremental input and prewarm handling logic
- Added test cases for incremental input support based on upstream capabilities.
- Introduced validation for prewarm handling of `response.create` messages locally.
- Enhanced test coverage for websocket executor behavior, including payload forwarding checks.
- Updated websocket implementation with prewarm and incremental input logic for better testability.
test(translator): add unit tests for OpenAI to Claude requests and tool result handling
- Introduced tests for converting OpenAI requests to Claude with text, base64 images, and URL images in tool results.
- Refactored `convertClaudeToolResultContent` and related functionality to properly handle raw content with images and text.
- Updated conversion logic to streamline image handling for both base64 and URL formats.
- Simplified connection logic by removing `connCreateSent` and related state handling.
- Updated `buildCodexWebsocketRequestBody` to always use `response.create`.
- Added unit tests to validate `response.create` behavior and beta header preservation.
- Dropped unsupported `response.append` and outdated `response.done` event types.
- Add model to GetGeminiModels()
- Add model to GetGeminiVertexModels()
- Add model to GetGeminiCLIModels()
- Add model to GetAIStudioModels()
- Add to AntigravityModelConfig with thinking levels
- Update gemini-3-flash-preview description
Registers the new lightweight Gemini model across all provider
endpoints for cost-effective high-volume usage scenarios.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
SetOAuthSessionError previously sent generic messages to the management
panel (e.g. "Failed to complete Gemini CLI onboarding"), hiding the
actual error returned by Google APIs. The specific error was only
written to the server log via log.Errorf, which is often inaccessible
in headless/Docker deployments.
Include the upstream error in all 8 OAuth error paths so the
management panel shows actionable messages like "no Google Cloud
projects available for this account" instead of a generic failure.
normalizeCacheControlTTL unconditionally re-serializes the entire request
body through json.Unmarshal/json.Marshal even when no TTL normalization
is needed. Go's json.Marshal randomizes map key order and HTML-escapes
<, >, & characters (to \u003c, \u003e, \u0026), producing different raw
bytes on every call.
Anthropic's prompt caching uses byte-prefix matching, so any byte-level
difference causes a cache miss. This means the ~119K system prompt and
tools are re-processed on every request when routed through CPA.
The fix adds a bool return to normalizeTTLForBlock to indicate whether
it actually modified anything, and skips the marshal step in
normalizeCacheControlTTL when no blocks were changed.
Add util.SanitizeClaudeToolID() to replace non-conforming characters in
tool_use.id fields across all five response translators (gemini, codex,
openai, antigravity, gemini-cli).
Upstream tool names may contain dots or other special characters
(e.g. "fs.readFile") that violate Claude's ID validation regex.
The sanitizer replaces such characters with underscores and provides
a generated fallback for empty IDs.
Fixes#1872, Fixes#1849
Made-with: Cursor
- Add `TestTriggerServerUpdateCancelsPendingTimerOnImmediate` to verify proper handling of server update debounce and timer cancellation.
- Fix logic in `triggerServerUpdate` to prevent duplicate timers and ensure proper cleanup of pending state.
The instructions restore logic was originally needed when the proxy
injected custom instructions (per-model system prompts) into requests.
Since ac802a46 removed the injection system, the proxy no longer
modifies instructions before forwarding. The upstream response's
instructions field now matches the client's original value, making
the restore a no-op.
Also removes unused sjson import.
Closesrouter-for-me/CLIProxyAPI#1868
fix(gemini): add `deprecated` to unsupported schema keywords
Add `deprecated` to the list of unsupported schema metadata fields in Gemini and update tests to verify its removal.
- Set Accept-Encoding: identity for SSE streams; upstream must not compress
line-delimited SSE bodies that bufio.Scanner reads directly
- Re-enforce identity after ApplyCustomHeadersFromAttrs to prevent auth
attribute injection from re-enabling compression on the stream path
- Add peekableBody type wrapping bufio.Reader for non-consuming magic-byte
inspection of the first 4 bytes without affecting downstream readers
- Detect gzip (0x1f 0x8b) and zstd (0x28 0xb5 0x2f 0xfd) by magic bytes
when Content-Encoding header is absent, covering misbehaving upstreams
- Remove if-Content-Encoding guard on all three error paths (Execute,
ExecuteStream, CountTokens); unconditionally delegate to decodeResponseBody
so magic-byte detection applies consistently to all response paths
- Add 10 tests covering stream identity enforcement, compressed success bodies,
magic-byte detection without headers, error path decoding, and
auth attribute override prevention
Claude's Tool Search feature (advanced-tool-use-2025-11-20 beta) adds
defer_loading field to tool definitions. When proxying Claude requests
to Codex or Gemini, this unknown field causes 400 errors upstream.
Strip defer_loading (and cache_control where missing) in all three
Claude-to-upstream translation paths:
- codex/claude: defer_loading + cache_control
- gemini-cli/claude: defer_loading
- gemini/claude: defer_loading
Fixes#1725, Fixes#1375
Add support for Claude's "adaptive" and "auto" thinking modes using `output_config.effort`. Introduce support for new effort level "max" in adaptive thinking. Update thinking logic, validate model capabilities, and extend converters and handling to ensure compatibility with adaptive modes. Adjust static model data with supported levels and refine handling across translators and executors.
## Problem
When using Antigravity Claude models through CLIProxyAPI, the thinking
chain (reasoning content) does not display in the Amp client.
## Root Cause
The Amp client sends `thinking: {"type": "auto"}` in its requests,
but `ConvertClaudeRequestToAntigravity` only handled `"enabled"` and
`"adaptive"` types in its switch statement. The `"auto"` type was
silently ignored, resulting in no `thinkingConfig` being set in the
translated Gemini request. Without `thinkingConfig`, the Antigravity
API returns responses without any thinking content.
Additionally, the Antigravity API for Claude models does not support
`thinkingBudget: -1` (auto mode sentinel). It requires a concrete
positive budget value. The fix uses 128000 as the budget for "auto"
mode, which `ApplyThinking` will then normalize to stay within the
model's actual limits (e.g., capped to `maxOutputTokens - 1`).
## Changes
### internal/translator/antigravity/claude/antigravity_claude_request.go
1. **Add "auto" case** to the thinking type switch statement.
Sets `thinkingBudget: 128000` and `includeThoughts: true`.
The budget is subsequently normalized by `ApplyThinking` based
on model-specific limits.
2. **Add "auto" to hasThinking check** so that interleaved thinking
hints are injected for tool-use scenarios when Amp sends
`thinking.type="auto"`.
### internal/registry/model_definitions_static_data.go
3. **Add Thinking configuration** for `claude-sonnet-4-6`,
`claude-sonnet-4-5`, and `claude-opus-4-6` in
`GetAntigravityModelConfig()` -- these were previously missing,
causing `ApplyThinking` to skip thinking config entirely.
## Testing
- Deployed to Railway test instance (cpa-thinking-test)
- Verified via debug logging that:
- Amp sends `thinking: {"type": "auto"}`
- CPA now translates this to `thinkingConfig: {thinkingBudget: 128000, includeThoughts: true}`
- `ApplyThinking` normalizes the budget to model-specific limits
- Antigravity API receives the correct thinkingConfig
Amp-Thread-ID: https://ampcode.com/threads/T-019ca511-710d-776d-a07c-4b750f871a93
Co-authored-by: Amp <amp@ampcode.com>
Captured and compared outgoing requests from CLIProxyAPI against real
Claude Code 2.1.63 and fixed all detectable differences:
Headers:
- Update anthropic-beta to match 2.1.63: replace fine-grained-tool-streaming
and prompt-caching-2024-07-31 with context-management-2025-06-27 and
prompt-caching-scope-2026-01-05
- Remove X-Stainless-Helper-Method header (real Claude Code does not send it)
- Update default User-Agent from "claude-cli/2.1.44 (external, sdk-cli)" to
"claude-cli/2.1.63 (external, cli)"
- Force Claude Code User-Agent for non-Claude clients to avoid leaking
real client identity (e.g. curl, OpenAI SDKs) during cloaking
Body:
- Inject x-anthropic-billing-header as system[0] (matches real format)
- Change system prompt identifier from "You are Claude Code..." to
"You are a Claude agent, built on Anthropic's Claude Agent SDK."
- Add cache_control with ttl:"1h" to match real request format
- Fix user_id format: user_[64hex]_account_[uuid]_session_[uuid]
(was missing account UUID)
- Disable tool name prefix (set claudeToolPrefix to empty string)
TLS:
- Switch utls fingerprint from HelloFirefox_Auto to HelloChrome_Auto
(closer to Node.js/OpenSSL used by real Claude Code)
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Guard the openai-response streaming path against truncated/invalid SSE data payloads by validating data: JSON before forwarding; surface a 502 terminal error instead of letting clients crash with JSON parse errors.
Move base64 image data from Claude tool_result into functionResponse.parts
as inlineData instead of outer sibling parts, preventing context bloat.
Unify all inlineData field naming to camelCase mimeType across Claude,
OpenAI, and Gemini translators. Add comprehensive edge case tests and
Gemini-side regression test for functionResponse.parts preservation.
CI blocks PRs that modify internal/translator. Revert translator edits and keep only the /v1/responses streaming error-chunk fix; file an issue for translator conformance work.
Ensure response.created and response.completed chunks produced by the OpenAI/Gemini/Claude translators always include required fields (response.model and response.usage) so clients validating Responses SSE do not fail schema validation.
When /v1/responses streaming fails after headers are sent, we now emit a type=error chunk instead of an HTTP-style {error:{...}} payload, preventing AI SDK chunk validation errors.
- Add 60 requests/minute rate limiting per credential using sliding window
- Detect insufficient_quota errors and set cooldown until next day (Beijing time)
- Map quota errors (HTTP 403/429) to 429 with retryAfter for conductor integration
- Cache Beijing timezone at package level to avoid repeated syscalls
- Add redactAuthID function to protect credentials in logs
- Extract wrapQwenError helper to consolidate error handling
- force http/1.1 instead of http/2
- explicit connection close
- strip proxy headers X-Forwarded-For and X-Real-IP
- add project id to fetch models payload
Changes the RoundRobinSelector to use two-level round-robin when
gemini-cli virtual auths are detected (via gemini_virtual_parent attr):
- Level 1: cycle across credential groups (parent accounts)
- Level 2: cycle within each group's project auths
Credentials start from a random offset (rand.IntN) for fair distribution.
Non-virtual auths and single-credential scenarios fall back to flat RR.
Adds 3 test cases covering multi-credential grouping, single-parent
fallback, and mixed virtual/non-virtual fallback.
Default to generating a fresh random user_id per request instead of
reusing cached IDs. Add cache-user-id config option to opt in to the
previous caching behavior.
- Add CacheUserID field to CloakConfig
- Extract user_id cache logic to dedicated file
- Generate fresh user_id by default, cache only when enabled
- Add tests for both paths
console.anthropic.com is now protected by a Cloudflare managed challenge
that blocks all non-browser POST requests to /v1/oauth/token, causing
`-claude-login` to fail with a 403 error.
Switch to api.anthropic.com which hosts the same OAuth token endpoint
without the Cloudflare managed challenge.
Fixes#1659
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Non-spark codex models (gpt-5.3-codex, gpt-5.2-codex) stream function call
arguments via multiple delta events followed by a done event. The done handler
unconditionally emitted the full arguments, duplicating what deltas already
streamed. This produced invalid double JSON that Claude Code couldn't parse,
causing tool calls to fail with missing parameters and infinite retry loops.
Add HasReceivedArgumentsDelta flag to track whether delta events were received.
The done handler now only emits arguments when no deltas preceded it (spark
models), while delta-based streaming continues to work for non-spark models.
Some Codex models (e.g. gpt-5.3-codex-spark) send function call arguments
in a single "done" event without preceding "delta" events. The streaming
translator only handled "delta" events, causing tool call arguments to be
lost — resulting in empty tool inputs and infinite retry loops in clients
like Claude Code.
Emit the full arguments from the "done" event as a single input_json_delta
so downstream clients receive the complete tool input.
- Introduced `passthrough-headers` option in configuration to control forwarding of upstream response headers.
- Updated handlers to respect the passthrough headers setting.
- Added tests to verify behavior when passthrough is enabled or disabled.
sjson.SetRaw with an empty string produces malformed JSON (e.g. "result":}).
This happens when a Claude tool_result block has no content field, causing
functionResponseResult.Raw to be "". Guard against this by falling back to
sjson.Set with an empty string only when .Raw is empty.
- Implemented `--standalone` mode to launch an embedded server for TUI.
- Enhanced TUI client to support API-based log polling when log hooks are unavailable.
- Added authentication gate for password input and connection handling.
- Improved localization and UX for logs, authentication, and status bar rendering.
- Implemented `--standalone` mode to launch an embedded server for TUI.
- Enhanced TUI client to support API-based log polling when log hooks are unavailable.
- Added authentication gate for password input and connection handling.
- Improved localization and UX for logs, authentication, and status bar rendering.
- Introduced unit tests for request logging middleware to enhance coverage.
- Added WebSocket-based Codex executor to support Responses API upgrade.
- Updated middleware logic to selectively capture request bodies for memory efficiency.
- Enhanced Codex configuration handling with new WebSocket attributes.
The OpenAI Chat Completions translator was silently dropping
response.function_call_arguments.delta and
response.function_call_arguments.done Codex SSE events, meaning
tool call arguments were never streamed incrementally to clients.
Add proper handling mirroring the proven Claude translator pattern:
- response.output_item.added: announce tool call (id, name, empty args)
- response.function_call_arguments.delta: stream argument chunks
- response.function_call_arguments.done: emit full args if no deltas
- response.output_item.done: defensive fallback for backward compat
State tracking via HasReceivedArgumentsDelta and HasToolCallAnnounced
ensures no duplicate argument emission and correct behavior for models
like codex-spark that skip delta events entirely.
Update hardcoded X-Stainless-* and User-Agent defaults to match
Claude Code 2.1.44 / @anthropic-ai/sdk 0.74.0 (verified via
diagnostic proxy capture 2026-02-17).
Changes:
- X-Stainless-Os/Arch: dynamic via runtime.GOOS/GOARCH
- X-Stainless-Package-Version: 0.55.1 → 0.74.0
- X-Stainless-Timeout: 60 → 600
- User-Agent: claude-cli/1.0.83 (external, cli) → claude-cli/2.1.44 (external, sdk-cli)
Add claude-header-defaults config section so values can be updated
without recompilation when Claude Code releases new versions.
CPA-internal thinking levels like 'xhigh' and 'minimal' are not accepted
by OpenAI-format providers (MiniMax, etc.). The OpenAI applier now maps
non-standard levels to the nearest valid reasoning_effort value before
writing to the request body:
xhigh → high
minimal → low
auto → medium
Allow disabling the proxy_ tool name prefix on a per-account basis.
Users who route their own Anthropic account through CPA can set
"tool_prefix_disabled": true in their OAuth auth JSON to send tool
names unchanged to Anthropic.
Default behavior is fully preserved — prefix is applied unless
explicitly disabled.
Changes:
- Add ToolPrefixDisabled() accessor to Auth (reads metadata key
"tool_prefix_disabled" or "tool-prefix-disabled")
- Gate all 6 prefix apply/strip points with the new flag
- Add unit tests for the accessor
The proxy_ prefix logic correctly skips built-in tools (those with a
non-empty "type" field) in tools[] definitions but does not skip them
in messages[].content[] tool_use blocks or tool_choice. This causes
web_search in conversation history to become proxy_web_search, which
Anthropic does not recognize.
Fix: collect built-in tool names from tools[] into a set and also
maintain a hardcoded fallback set (web_search, code_execution,
text_editor, computer) for cases where the built-in tool appears in
history but not in the current request's tools[] array. Skip prefixing
in messages and tool_choice when name matches a built-in.
- Collect built-in tool names (those with a "type" field like
web_search, code_execution) and skip prefixing tool_reference
blocks that reference them, preventing name mismatch.
- Refactor if-else if chains to switch statements in all three
prefix functions for idiomatic Go style.
tool_reference blocks can appear nested inside tool_result.content[]
arrays, not just at the top level of messages[].content[]. The prefix
logic now iterates into tool_result blocks with array content to find
and prefix/strip nested tool_reference.tool_name fields.
applyClaudeToolPrefix, stripClaudeToolPrefixFromResponse, and
stripClaudeToolPrefixFromStreamLine now handle "tool_reference" blocks
(field "tool_name") in addition to "tool_use" blocks (field "name").
Without this fix, tool_reference blocks in conversation history retain
their original unprefixed names while tool definitions carry the proxy_
prefix, causing Anthropic API 400 errors: "Tool reference 'X' not found
in available tools."
Co-authored-by: Kirill Turanskiy <kt@novamedia.ru>
Adds explicit handling for context.Canceled errors in the reverse proxy error handler to return 499 status code without logging, which is more appropriate for client-side cancellations during polling.
Also adds a test case to verify this behavior.
- Added unit tests for combining OAuth excluded models across global and attribute-specific scopes.
- Implemented priority attribute parsing with support for different formats and trimming.
Add Google One personal account login to Gemini CLI OAuth flow:
- CLI --login shows mode menu (Code Assist vs Google One)
- Web management API accepts project_id=GOOGLE_ONE sentinel
- Auto-discover project via onboardUser without cloudaicompanionProject when project is unresolved
Improve robustness of auto-discovery and token handling:
- Add context-aware auto-discovery polling (30s timeout, 2s interval)
- Distinguish network errors from project-selection-required errors
- Refresh expired access tokens in readAuthFile before project lookup
- Extend project_id auto-fill to gemini auth type (was antigravity-only)
Unify credential file naming to geminicli- prefix for both CLI and web.
Add extractAccessToken unit tests (9 cases).
Sanitize tool schemas by stripping prefill, enumTitles, $id, and patternProperties to prevent Gemini INVALID_ARGUMENT 400 errors, and add unit and executor-level tests to lock in the behavior.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- Added support for single and array-based `content` cases.
- Enhanced `system_instruction` structure population logic.
- Improved handling of user role assignment for string-based `content`.
Silences logging for client cancellations during polling to reduce noise in logs.
Client-side cancellations are common during long-running operations and should not be treated as errors.
The `ConvertOpenAIResponsesRequestToGemini` function had code that attempted
to uppercase JSON Schema type values (e.g. "string" -> "STRING") for Gemini
compatibility. This broke nullable types because when `type` is a JSON array
like `["string", "null"]`:
1. `gjson.Result.String()` returns the raw JSON text `["string","null"]`
2. `strings.ToUpper()` produces `["STRING","NULL"]`
3. `sjson.Set()` stores it as a JSON **string** `"[\"STRING\",\"NULL\"]"`
instead of a JSON array
4. The downstream `CleanJSONSchemaForGemini()` / `flattenTypeArrays()`
cannot detect it (since `IsArray()` returns false on a string)
5. Gemini/Antigravity API rejects it with:
`400 Invalid value at '...type' (Type), "["STRING","NULL"]"`
This was confirmed and tested with Droid Factory (Antigravity) Gemini models
where Claude Code sends tool schemas with nullable parameters.
The fix removes the uppercasing logic entirely and passes the raw schema
through to `parametersJsonSchema`. This is safe because:
- Antigravity executor already runs `CleanJSONSchemaForGemini()` which
properly handles type arrays, nullable fields, and all schema cleanup
- Gemini/Vertex executors use `parametersJsonSchema` which accepts raw
JSON Schema directly (no uppercasing needed)
- The uppercasing code also only iterated top-level properties, missing
nested schemas entirely
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
The ResponseRewriter's modelFieldPaths was missing 'response.model',
causing the mapped model name to leak through SSE streaming events
(response.created, response.in_progress, response.completed) in the
OpenAI Responses API (/v1/responses).
This caused Amp CLI to report 'Unknown OpenAI model' errors when
model mapping was active (e.g., gpt-5.2-codex -> gpt-5.3-codex),
because the mapped name reached Amp's backend via telemetry.
Also sorted modelFieldPaths alphabetically per review feedback
and added regression tests for all rewrite paths.
Fixes#1463
fix(translator): restructure message content handling to support multiple content types
- Consolidated `input_text` and `output_text` handling into a single case.
- Added support for processing `input_image` content with associated URLs.
- Updated `KimiAPIBaseURL` to remove versioning from the root path.
- Integrated `ClaudeExecutor` fallback in `KimiExecutor` methods for compatibility with Claude requests.
- Simplified token counting by delegating to `ClaudeExecutor`.
- Standardized the handling of `stop_reason` and `finish_reason` across Codex and Gemini responses.
- Restricted pass-through of specific reasons (`max_tokens`, `stop`) for consistency.
- Enhanced fallback logic for undefined reasons.
- Added support to extract and include `cachedContentTokenCount` in `usage.prompt_tokens_details`.
- Logged warnings for failures to set cached token count for better debugging.
- Introduced `RequestKimiToken` API for Kimi authentication flow.
- Integrated device ID management throughout Kimi-related components.
- Enhanced header management for Kimi API requests with device ID context.
- OAuth2 device authorization grant flow (RFC 8628) for authentication
- Streaming and non-streaming chat completions via OpenAI-compatible API
- Models: kimi-k2, kimi-k2-thinking, kimi-k2.5
- CLI `--kimi-login` command for device flow auth
- Token management with automatic refresh
- Thinking/reasoning effort support for thinking-enabled models
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- Replaced all instances of `bytes.Clone` with direct references to enhance efficiency.
- Simplified payload handling across executors and translators by eliminating unnecessary data duplication.
- Consolidated path-finding logic into a new `findPathsByFields` helper function.
- Refactored repetitive loop structures to improve readability and performance.
- Added depth-based sorting for deletion paths to ensure proper removal order.
Claude Haiku 4.5 (claude-haiku-4-5-20251001) supports extended thinking
according to Anthropic's official documentation:
https://docs.anthropic.com/en/docs/build-with-claude/extended-thinking
The model was incorrectly marked as not supporting thinking in the static
model definitions. This fix adds ThinkingSupport with the same parameters
as other Claude 4.5 models (Sonnet, Opus):
- Min: 1024 tokens
- Max: 128000 tokens
- ZeroAllowed: true
- DynamicAllowed: false
- Replaced repetitive string operations with a centralized `escapeGJSONPathKey` function.
- Streamlined handling of JSON schema cleaning for Gemini and Antigravity requests.
- Improved payload management by transitioning from byte slices to strings for processing.
- Removed unnecessary cloning of byte slices in several places.
Google official Gemini Python SDK sends thinking_level, thinking_budget,
and include_thoughts (snake_case) instead of thinkingLevel, thinkingBudget,
and includeThoughts (camelCase). This caused thinking configuration to be
ignored when using Python SDK.
Changes:
- Extract layer: extractGeminiConfig now reads snake_case as fallback
- Apply layer: Gemini/CLI/Antigravity appliers clean up snake_case fields
- Translator layer: Gemini->OpenAI/Claude/Codex translators support fallback
- Tests: Added 4 test cases for snake_case field coverage
Fixes#1426
- Add logic to handle `image` content type during request translation.
- Map Claude base64 image data to Gemini's `inlineData` structure.
- Support automatic extraction of `media_type` and `data` for image parts.
- Introduced a new `pprof` server to enable/debug HTTP profiling.
- Added configuration options for enabling/disabling and specifying the server address.
- Integrated pprof server lifecycle management with `Service`.
#1287
- Use util.ResolveAuthDir to properly expand ~ to user home directory
- Fixes issue where logs were created in literal "~/.cli-proxy-api" folder
- Add warning log when auth-dir resolution fails for debugging
Bug introduced in 62e2b67 (refactor(logging): centralize log directory
resolution logic), where strings.TrimSpace was used instead of
util.ResolveAuthDir to process auth-dir path.
Introduce `Filter` rules in the payload configuration to remove specified JSON paths from the payload. Update related helper functions and add examples to `config.example.yaml`.
- Add ErrorLogsMaxFiles config field with default value 10
- Support hot-reload via config file changes
- Add Management API: GET/PUT/PATCH /v0/management/error-logs-max-files
- Maintain SDK backward compatibility with NewFileRequestLogger (3 params)
- Add NewFileRequestLoggerWithOptions for custom error log retention
When request logging is disabled, forced error logs are retained up to
the configured limit. Set to 0 to disable cleanup.
Introduce a custom HTTP client utilizing utls with Firefox TLS fingerprinting to bypass Cloudflare fingerprinting on Anthropic domains. Includes support for proxy configuration and enhanced connection management for HTTP/2.
Add support for Gemini's code_execution and url_context tools in the
request translators, enabling:
- Agentic Vision: Image analysis with Python code execution for
bounding boxes, annotations, and visual reasoning
- URL Context: Live web page content fetching and analysis
Tools are passed through using the same pattern as google_search:
- code_execution: {} -> codeExecution: {}
- url_context: {} -> urlContext: {}
Tested with Gemini 3 Flash Preview agentic vision successfully.
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Removes x-* extension fields from JSON schemas to ensure compatibility with the Gemini API.
These fields, while valid in OpenAPI/JSON Schema, are not recognized by the Gemini API and can cause issues.
The change recursively walks the schema, identifies these extension fields, and removes them, except when they define properties.
Amp-Thread-ID: https://ampcode.com/threads/T-019c0cd1-9e59-722b-83f0-e0582aba6914
Co-authored-by: Amp <amp@ampcode.com>
- Add ensureCacheControl() to auto-inject cache breakpoints
- Cache tools (last tool), system (last element), and messages (2nd-to-last user turn)
- Add prompt-caching-2024-07-31 beta header
- Return original payload on sjson error to prevent corruption
- Include verification test for caching logic
Enables up to 90% cost reduction on cached tokens.
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
- Add firstChunkTimestamp field to ResponseWriterWrapper for sync capture
- Capture TTFB in Write() and WriteString() before async channel send
- Add SetFirstChunkTimestamp() to StreamingLogWriter interface
- Make requestTimestamp/apiResponseTimestamp required in LogRequest()
- Remove timestamp capture from WriteAPIResponse() (now via setter)
- Fix Gemini handler to set API_RESPONSE_TIMESTAMP before writing response
This ensures accurate TTFB measurement for all streaming API formats
(OpenAI, Gemini, Claude) by capturing timestamp synchronously when
the first response chunk arrives, not when the stream finalizes.
Previously:
- REQUEST INFO timestamp was captured at log write time (not request arrival)
- API RESPONSE had NO timestamp at all
This fix:
- Captures REQUEST INFO timestamp when request first arrives
- Adds API RESPONSE timestamp when upstream response arrives
Changes:
- Add Timestamp field to RequestInfo, set at middleware initialization
- Set API_RESPONSE_TIMESTAMP in appendAPIResponse() and gemini handler
- Pass timestamps through logging chain to writeNonStreamingLog()
- Add timestamp output to API RESPONSE section
This enables accurate measurement of backend response latency in error logs.
Introduce `TestExecuteStreamWithAuthManager_DoesNotRetryAfterFirstByte` to validate that stream executions do not retry after receiving partial responses. Implement `payloadThenErrorStreamExecutor` for test coverage of this behavior.
When using Gemini API format with Antigravity backend, the executor
renames usageMetadata to cpaUsageMetadata in non-terminal chunks.
The Gemini translator was returning this internal field name directly
to clients instead of the standard usageMetadata field.
Add restoreUsageMetadata() to rename cpaUsageMetadata back to
usageMetadata before returning responses to clients.
When Claude API sends an assistant message with empty text content like:
{"role":"assistant","content":[{"type":"text","text":""}]}
The translator was creating a part object {} with no data field,
causing Gemini API to return error:
"required oneof field 'data' must have one initialized field"
This fix:
1. Skips empty text parts (text="") during translation
2. Skips entire messages when their parts array becomes empty
This ensures compatibility when clients send empty assistant messages
in their conversation history.
Optimize channel operations by introducing reusable context-aware send functions (`send` and `sendErr`) across `wsrelay`, `handlers`, and `cliproxy`. Ensure graceful handling of canceled contexts during stream operations.
Implement `request_retry` and `disable_cooling` metadata overrides for authentication management. Update retry and cooling logic accordingly across `Manager`, Antigravity executor, and file synthesizer. Add tests to validate new behaviors.
Introduce `WithSkipPersist` to disable persistence during Manager Update/Register calls, preventing write-back loops caused by redundant file writes. Add corresponding tests and integrate with existing file store and conductor logic.
Fixes#1082
When all Antigravity accounts are unavailable, the error response now returns
HTTP 502 (Bad Gateway) instead of HTTP 400 (Bad Request). This ensures that
NewAPI and other clients will retry the request on a different channel,
improving overall reliability.
Previously GeminiModels handler unconditionally overwrote displayName
and description with the model name, losing the original values defined
in model definitions (e.g., 'Gemini 3 Pro Preview').
Now only set these fields as fallback when they are missing or empty.
Add support for Imagen 3.0 and 4.0 image generation models in Vertex AI:
- Add 5 Imagen model definitions (4.0, 4.0-ultra, 4.0-fast, 3.0, 3.0-fast)
- Implement :predict action routing for Imagen models
- Convert Imagen request/response format to match Gemini structure like gemini-3-pro-image
- Transform prompts to Imagen's instances/parameters format
- Convert base64 image responses to Gemini-compatible inline data
Add new PATCH /v0/management/auth-files/status endpoint that allows
toggling the disabled state of auth files without deleting them.
This enables users to temporarily disable credentials from the
management UI.
- Introduced `ResolveLogDirectory` function in `logging` package to standardize log directory determination across components.
- Replaced redundant logic in `server`, `global_logger`, and `handlers` with the new utility function.
Update ApplyThinking signature to accept fromFormat and toFormat parameters
instead of a single provider string. This enables:
- Proper level-to-budget conversion when source is level-based (openai/codex)
and target is budget-based (gemini/claude)
- Strict budget range validation when source and target formats match
- Level clamping to nearest supported level for cross-format requests
- Format alias resolution in SDK translator registry for codex/openai-response
Also adds ErrBudgetOutOfRange error code and improves iflow config extraction
to fall back to openai format when iflow-specific config is not present.
Refactor thinking configuration tests by separating model name suffix-based
scenarios from request body parameter-based scenarios into distinct test
functions with independent case numbering.
Architectural improvements:
- Extract thinkingTestCase struct to package level for shared usage
- Add getTestModels() helper returning complete model fixture set
- Introduce runThinkingTests() runner with protocol-specific field detection
- Register level-subset-model fixture with constrained low/high level support
- Extend iflow protocol handling for glm-test and minimax-test models
- Add same-protocol strict boundary validation cases (80-89)
- Replace error responses with clamped values for boundary-exceeding budgets
Gemini API requires all enum values in function declarations to be
strings. Some MCP tools (e.g., roxybrowser) define schemas with numeric
enums like `"enum": [0, 1, 2]`, causing INVALID_ARGUMENT errors.
Add convertEnumValuesToStrings() to automatically convert numeric and
boolean enum values to their string representations during schema
transformation.
feat(translator): improve system message handling and content indexing across translators
- Updated logic for processing system messages in `claude`, `gemini`, `gemini-cli`, and `antigravity` translators.
- Introduced indexing for `systemInstruction.parts` to ensure proper ordering and handling of multi-part content.
- Added safeguards for accurate content transformation and serialization.
- Introduced `payloadModelAliases` and `payloadModelCandidates` functions to support model aliases for improved flexibility.
- Updated rule matching logic to handle multiple model candidates.
- Refactored variable naming in executor to improve code clarity and consistency.
- Enhanced ID matching in `cliproxy` by adding additional conditions to better handle ID equality cases.
- Updated `gemini` handlers to include `displayName` and `description` in normalized models for enriched metadata.
- Added conditional logic for Codex instruction injection based on configuration.
- Updated role terminology from "user" to "developer" for better alignment with context.
- Added logic to transform `inputResults` into structured JSON for improved processing.
- Removed redundant `safety_identifier` field in executor payload to streamline requests.
- Introduced `default-raw` and `override-raw` rules to handle raw JSON values.
- Enhanced `PayloadConfig` to validate and sanitize raw JSON payload rules.
- Updated executor logic to apply `default-raw` and `override-raw` rules.
- Extended example YAML to include usage of raw JSON rules.
This change removes the translation logic for several non-standard, proprietary extensions used to configure thinking/reasoning. Specifically, support for `extra_body.google.thinking_config` and the Anthropic-style `thinking` object has been dropped from the OpenAI request translators.
This simplification streamlines the translators, focusing them on the standard `reasoning_effort` parameter. It also removes the need to look up model information from the registry within these components.
BREAKING CHANGE: Support for non-standard thinking configurations via `extra_body.google.thinking_config` and the Anthropic-style `thinking` object has been removed. Clients should now use the standard `reasoning_effort` parameter to control reasoning.
- Added `metadataEqualIgnoringTimestamps` to compare metadata while ignoring volatile fields.
- Prevented redundant writes caused by changes in timestamp-related fields.
- Improved efficiency in filestore operations by skipping unnecessary updates.
feat(auth): fetch and update Antigravity project ID from metadata during filestore operations
- Added support to retrieve and update `project_id` using the access token if missing in metadata.
- Integrated HTTP client to fetch project ID dynamically.
- Enhanced metadata persistence logic.
- Updated `ideType` to `ANTIGRAVITY` in request payload.
- Introduced tier-selection logic to determine default tier for onboarding.
- Added `antigravityOnboardUser` function for project ID retrieval via polling.
- Enhanced error handling and response decoding for onboarding flow.
This change introduces environment variable interpolation for volume paths, allowing users to customize where configuration, authentication, and log data are stored.
Why: Makes the project easier to deploy on various hosting environments that require decoupled data management without needing to modify the core docker-compose.yml..
Key points:
Defaults to existing paths (./config.yaml, ./auths, ./logs) to ensure zero breaking changes for current users.
Follows the existing naming convention used in the project.
Enhances portability for CI/CD and cloud-native deployments.
feat(translator): add function name to response output item serialization
- Included `item.name` in the serialized response output to enhance output item handling.
feat(oauth): add support for customizable OAuth callback ports
- Introduced `oauth-callback-port` flag to override default callback ports.
- Updated SDK and login flows for `iflow`, `gemini`, `antigravity`, `codex`, `claude`, and `openai` to respect configurable callback ports.
- Refactored internal OAuth servers to dynamically assign ports based on the provided options.
- Revised tests and documentation to reflect the new flag and behavior.
When switching from Claude models (e.g., Opus 4.5) to Gemini models
(e.g., Flash) mid-conversation via Antigravity OAuth, the client-provided
thinking signatures from Claude would cause "Corrupted thought signature"
errors since they are incompatible with Gemini API.
Changes:
- Remove fallback to client-provided signatures in thinking block handling
- Only use cached signatures (from same-session Gemini responses)
- Skip thinking blocks without valid cached signatures
- tool_use blocks continue to use skip_thought_signature_validator when
no valid signature is available
This ensures cross-model switching works correctly while preserving
signature validation for same-model conversations.
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
- Updated `SDKConfig` to use `nonstream-keepalive-interval` (seconds) instead of the boolean `nonstream-keepalive`.
- Refactored handlers and logic to incorporate the new interval-based configuration.
- Updated config diff, tests, and example YAML to reflect the changes.
- Introduced `StartNonStreamingKeepAlive` to emit periodic blank lines during non-streaming responses.
- Added `nonstream-keepalive` configuration option in `SDKConfig`.
- Updated handlers to utilize `StartNonStreamingKeepAlive` and ensure proper cleanup.
- Extended config diff and tests to include `nonstream-keepalive` changes.
feat(cliproxy): support multiple aliases for OAuth model mappings
- Updated mapping logic to allow multiple aliases per upstream model name.
- Adjusted `SanitizeOAuthModelMappings` to ensure aliases remain unique within channels.
- Added test cases to validate multi-alias scenarios.
- Updated example config to clarify multi-alias support.
Keep the upstream payload prefixed for OAuth while passing the unprefixed request body into response translators. This avoids proxy_ leaking into OpenAI Responses echoed tool metadata while preserving the Claude OAuth workaround.
Prefix tool names with proxy_ for Claude OAuth requests and strip the prefix from streaming and non-streaming responses to restore client-facing names.
Updates the Claude executor to:
- add prefixing for tools, tool_choice, and tool_use messages when using OAuth tokens
- strip the prefix from tool_use events in SSE and non-streaming payloads
- add focused unit tests for prefix/strip helpers
Update the model name check in `buildRequest` to target "gemini-3-pro-preview" instead of "gemini-3-pro" when applying specific system instruction handling.
When streaming responses with tool calls, the finish_reason was being
overwritten. The upstream sends functionCall in chunk 1, then
finishReason: STOP in chunk 2. The old code would set finish_reason
from every chunk, causing "tool_calls" to be overwritten by "stop".
This broke clients like Claude Code that rely on finish_reason to
detect when tool calls are complete.
Changes:
- Add SawToolCall bool to track tool calls across entire stream
- Add UpstreamFinishReason to cache the finish reason
- Only emit finish_reason on final chunk (has both finishReason + usage)
- Priority: tool_calls > max_tokens > stop
Includes 5 unit tests covering:
- Tool calls not overwritten by subsequent STOP
- Normal text gets "stop"
- MAX_TOKENS without tool calls gets "max_tokens"
- Tool calls take priority over MAX_TOKENS
- Intermediate chunks have no finish_reason
Fixes streaming tool call detection for Claude Code + Gemini models.
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude <noreply@anthropic.com>
Previously, metadataEqualIgnoringTimestamps() ignored access_token for all
providers, which prevented refreshed tokens from being persisted to disk/database.
This caused tokens to be lost on server restart for providers like iFlow.
This change makes the behavior provider-specific:
- Providers like gemini/gemini-cli that issue new tokens on every refresh and
can re-fetch when needed will continue to ignore access_token (optimization)
- Other providers like iFlow will now persist access_token changes
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude <noreply@anthropic.com>
integrate claude-cloak functionality to disguise api requests:
- add CloakConfig with mode (auto/always/never) and strict-mode options
- generate fake user_id in claude code format (user_[hex]_account__session_[uuid])
- inject claude code system prompt (configurable strict mode)
- obfuscate sensitive words with zero-width characters
- auto-detect claude code clients via user-agent
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Enhanced node structure by including `thoughtSignature` for inline data parts in Gemini OpenAI, Gemini CLI, and Antigravity request handlers to improve traceability of thought processes.
Fixes issue where free tier users cannot access Gemini 3 preview models
due to frontend/backend project ID mapping.
## Problem
Google's Gemini API uses a frontend/backend project mapping system for
free tier users:
- Frontend projects (e.g., gen-lang-client-*) are user-visible
- Backend projects (e.g., mystical-victor-*) host actual API access
- Only backend projects have access to preview models (gemini-3-*)
Previously, CLIProxyAPI ignored the backend project ID returned by
Google's onboarding API and kept using the frontend ID, preventing
access to preview models.
## Solution
### CLI (internal/cmd/login.go)
- Detect free tier users (gen-lang-client-* projects or FREE/LEGACY tier)
- Show interactive prompt allowing users to choose frontend or backend
- Default to backend (recommended for preview model access)
- Pro users: maintain original behavior (keep frontend ID)
### Web UI (internal/api/handlers/management/auth_files.go)
- Detect free tier users using same logic
- Automatically use backend project ID (recommended choice)
- Pro users: maintain original behavior (keep frontend ID)
### Deduplication (internal/cmd/login.go)
- Add deduplication when user selects ALL projects
- Prevents redundant API calls when multiple frontend projects map to
same backend
- Skips duplicate project IDs in activation loop
## Impact
- Free tier users: Can now access gemini-3-pro-preview and
gemini-3-flash-preview models
- Pro users: No change in behavior (backward compatible)
- Only affects Gemini CLI OAuth (not antigravity or API key auth)
## Testing
- Tested with free tier account selecting single project
- Tested with free tier account selecting ALL projects
- Verified deduplication prevents redundant onboarding calls
- Confirmed pro user behavior unchanged
Remove os.RemoveAll() call in syncAuthFromBucket() that was causing
a race condition with the file watcher.
Problem:
1. syncAuthFromBucket() wipes local auth directory with RemoveAll
2. File watcher detects deletions and propagates them to remote store
3. syncAuthFromBucket() then pulls from remote, but files are now gone
Solution:
Use incremental sync instead of delete-then-pull. Just ensure the
directory exists and overwrite files as they're downloaded.
This prevents the watcher from seeing spurious delete events.
Fixed incorrect boundary logic for `message_delta` emission, ensuring proper handling of usage updates and `emitMessageStopIfNeeded` within the response loop.
Implemented `Fork` flag in `ModelNameMapping` to allow aliases as additional models while preserving the original model ID. Updated the `applyOAuthModelMappings` logic, added tests for `Fork` behavior, and updated documentation and examples accordingly.
Introduced `tokenRefreshTimeout` constant for token refresh operations and enhanced context propagation for `refreshToken` by embedding roundtrip information if available. Adjusted `refreshAuth` to ensure default context initialization and handle cancellation errors appropriately.
Replaced file-based auth entry counting with `TokenStore`-backed implementation, enhancing flexibility and context-aware token management. Updated related logic to reflect this change.
Added support for external hooks to observe model registry events using the `ModelRegistryHook` interface. Implemented thread-safe, non-blocking execution of hooks with panic recovery. Comprehensive tests added to verify hook behavior during registration, unregistration, blocking, and panic scenarios.
Refactored `applyPayloadConfig` to `applyPayloadConfigWithRoot`, adding support for default rule validation against the original payload when available. Updated all executors to use `applyPayloadConfigWithRoot` and incorporate an optional original request payload for translations.
This commit reverts all modifications within internal/translator. A separate issue
will be created for the maintenance team to integrate SanitizeFunctionName into
the translators.
Co-authored-by: factory-droid[bot] <138933559+factory-droid[bot]@users.noreply.github.com>
This commit centralizes tool name sanitization in SanitizeFunctionName,
applying character compliance, starting character rules, and length limits.
It also fixes a regression in gemini_schema tests and preserves MCP-specific
shortening logic while ensuring compliance.
Co-authored-by: factory-droid[bot] <138933559+factory-droid[bot]@users.noreply.github.com>
Implemented SanitizeFunctionName utility to ensure Claude tool names meet
Gemini/Upstream strict naming conventions (alphanumeric, starts with letter/underscore, max 64 chars).
Applied sanitization to tool definitions and usage in all relevant translators.
Co-authored-by: factory-droid[bot] <138933559+factory-droid[bot]@users.noreply.github.com>
When receiving HTTP 429 (Too Many Requests) responses, parse the retry
delay from the response body using parseRetryDelay and populate the
statusErr.retryAfter field. This allows upstream callers to respect
the server's requested retry timing.
Applied to all error paths in Execute, executeClaudeNonStream,
ExecuteStream, CountTokens, and refreshToken functions.
When multiple OAuth providers share an account email, the existing "Use OAuth" debug lines are ambiguous and hard to correlate with management usage stats. Include provider, auth file, and auth index in the selection log, and only compute these fields when debug logging is enabled to avoid impacting normal request performance.
Before:
[debug] Use OAuth user@example.com for model gemini-3-flash-preview
[debug] Use OAuth user@example.com (project-1234) for model gemini-3-flash-preview
After:
[debug] Use OAuth provider=antigravity auth_file=antigravity-user_example_com.json auth_index=1a2b3c4d5e6f7788 account="user@example.com" for model gemini-3-flash-preview
[debug] Use OAuth provider=gemini-cli auth_file=gemini-user@example.com-project-1234.json auth_index=99aabbccddeeff00 account="user@example.com (project-1234)" for model gemini-3-flash-preview
Anthropic API does not allow extended thinking when tool_choice is set
to "any" or a specific tool. This was causing 400 errors when using
features like Amp's /handoff command which forces tool_choice.
Added disableThinkingIfToolChoiceForced() that removes thinking config
when incompatible tool_choice is detected, applied to both streaming
and non-streaming paths.
Fixesrouter-for-me/CLIProxyAPI#630
- GLM-4.7: Uses extra_body={"thinking": {"type": "enabled"}, "clear_thinking": false}
- MiniMax-M2.1: Uses reasoning_split=true for OpenAI-style reasoning separation
- Added preserveReasoningContentInMessages() to support re-injection of reasoning
content in assistant message history for multi-turn conversations
- Added ThinkingSupport to MiniMax-M2.1 model definition
Improved consistency across OpenAI, Claude, and Gemini handlers by replacing initial `select` statement with a `for` loop for better readability and error-handling robustness.
Refined header assignment to use `x-api-key` for Anthropic API requests, ensuring correct authorization behavior based on request attributes and URL validation.
Updated Antigravity, Gemini, and Gemini-CLI translators to process `systemResult` of type `string` for system instructions. Ensures properly formatted JSON with dynamic content assignment.
fix(antigravity): validate function arguments before serialization
Ensure `function.arguments` is a valid JSON before setting raw bytes, fallback to setting as parameterized content if invalid.
feat: handle array input for system instructions in translators
Enhanced Gemini, Gemini-CLI, and Antigravity translators to process array content for system instructions. Adds support for assigning roles and handling multiple content parts dynamically.
Added comprehensive tests for `FillFirstSelector` and `RoundRobinSelector` to ensure proper behavior, including deterministic, cyclical, and concurrent scenarios. Introduced dynamic routing strategy updates in `service.go`, normalizing strategies and seamlessly switching between `fill-first` and `round-robin`. Updated `Manager` to support selector changes via the new `SetSelector` method.
Optimized the handling of JSON serialization and deserialization by replacing redundant `json.Marshal` and `json.Unmarshal` calls with `sjson` and `gjson`. Introduced a `marshalJSONValue` utility for compact JSON encoding, improving performance and code simplicity. Removed unused `encoding/json` imports.
This update enhances the `FileRequestLogger` by introducing support for spooling large request and response bodies to temporary files, reducing memory consumption. It adds atomic requestLogID generation for sequential log naming and new methods for non-streaming/streaming log assembly. Also includes better error handling during logging and temp file cleanups.
Improve parsing of tool call inputs and Antigravity compatibility to avoid invalid thinking/tool_use errors.
- Parse tool call inputs robustly by accepting both object and JSON-string formats and only produce a functionCall part when valid args exist, reducing spurious or malformed parts.
- Preserve the skip_thought_signature_validator approach for calls without a valid thinking signature but stop toggling/tracking a separate "disable thinking" flag; this prevents unnecessary removal of thinkingConfig.
- Sanitize tool input schemas before attaching them to the Antigravity request to improve compatibility.
- Append the interleaved-thinking hint as a new parts entry instead of overwriting/setting text directly, preserving structure.
- Remove unused tracking logic and related comments to simplify flow.
These changes reduce errors related to missing/invalid thinking signatures, improve schema compatibility, and make hint injection safer and more consistent.
Removes the addition of the "anthropic-beta: interleaved-thinking-2025-05-14" header for Claude thinking models when building HTTP requests.
This prevents sending an experimental/feature flag header that is no longer required and avoids potential compatibility or routing issues with downstream services. Keeps request headers simpler and more standard.
Prefer cached signatures and avoid injecting dummy thinking blocks; instead remove unsigned thinking blocks and add a skip sentinel for tool calls without a valid signature. Generate stable session IDs from the first user message, apply schema cleaning only for Claude models, and reorder thinking parts so thinking appears first. For Gemini, remove thinking blocks and attach a skip sentinel to function calls. Simplify response handling by passing raw function args through (remove special Bash conversion). Update and add tests to reflect the new behavior.
These changes prevent rejected dummy signatures, improve compatibility with Antigravity’s signature validation, provide more stable session IDs for conversation grouping, and make request/response translation more robust.
Remove the opencode-antigravity-auth submodule reference from the repository.
Cleans up the project by eliminating an external submodule pointer that is no longer needed or maintained, reducing repository complexity and avoiding dangling submodule state.
Removes two obsolete git submodules to clean up repository state and reduce maintenance overhead.
This eliminates external references that are no longer needed, simplifying dependency management and repository maintenance going forward.
fix: unify response field naming across translators
Standardize `text` to `delta` and add missing `output` field in all response payloads for consistency across OpenAI, Claude, and Gemini translators.
This commit introduces several improvements to the AMP (Advanced Model Proxy) module:
- **Model Mapping Logic:** The `FallbackHandler` now uses a more robust approach for model mapping. It includes the extraction and preservation of dynamic "thinking suffixes" (e.g., `(xhigh)`) during mapping, ensuring that these configurations are correctly applied to the mapped model. A new `resolveMappedModel` function centralizes this logic for cleaner code.
- **ModelMapper Verification:** The `ModelMapper` in `model_mapping.go` now verifies that the target model of a mapping has available providers *after* normalizing it. This prevents mappings to non-existent or unresolvable models.
- **Gemini Thinking Configuration Cleanup:** In `gemini_thinking.go`, unnecessary `generationConfig.thinkingConfig.include_thoughts` and `generationConfig.thinkingConfig.thinkingBudget` fields are now deleted from the request body when applying Gemini thinking levels. This prevents potential conflicts or redundant configurations.
- **Testing:** A new test case `TestModelMapper_MapModel_TargetWithThinkingSuffix` has been added to `model_mapping_test.go` to specifically cover the preservation of thinking suffixes during model mapping.
This commit introduces a new configuration option `logs-max-total-size-mb` that allows users to set a maximum total size (in MB) for log files in the logs directory. When this limit is exceeded, the oldest log files will be automatically deleted to stay within the specified size. Setting this value to 0 (the default) disables this feature. This change enhances log management by preventing excessive disk space usage.
Add applyPayloadConfig calls to all Antigravity executor paths (Execute,
executeClaudeNonStream, ExecuteStream) to enable config.yaml payload
overrides for Antigravity/Gemini-Claude models.
This allows users to configure thinking budget and other parameters via
payload.override in config.yaml for models like gemini-claude-opus-4-5*.
Refactors the reasoning effort conversion logic for Gemini models.
The update specifically addresses how `reasoning_effort` is translated into Gemini 3 specific thinking configurations (`thinkingLevel`, `includeThoughts`) and ensures that numeric budgets are not incorrectly applied to level-based models.
Changes include:
- Differentiating conversion logic for Gemini 3 models versus other models.
- Handling `none`, `auto`, and validated thinking levels for Gemini 3.
- Maintaining existing conversion for models not using discrete thinking levels.
The prompt for the gpt-5.2 codex model has been updated with more comprehensive instructions. This includes detailed guidelines on general usage, editing constraints, the plan tool, sandboxing configurations, handling special user requests, frontend task considerations, and final message presentation. The updates aim to improve the model's understanding and execution of complex coding tasks by providing clearer directives and constraints.
Improve the cache eviction routine to sort entries by timestamp using the standard library sort routine (stable, clearer and faster than the prior manual selection/bubble logic), and remove a redundant request-derived session ID helper in favor of the centralized session ID function. Also drop now-unused crypto/encoding imports.
This yields clearer, more maintainable eviction logic and removes duplicated/unused code and imports to reduce surface area and potential inconsistencies.
Improve robustness when handling "thinking" content by using a dedicated helper to extract the thinking text. This ensures wrapped or nested thinking objects are handled correctly instead of relying on a direct string extraction, reducing parsing errors for complex payloads.
This change introduces specific logic to load and use instructions for the 'gpt-5.2-codex' model variant by recognizing the 'gpt-5.2-codex_prompt.md' filename. This ensures the correct prompts are used when the '5.2-codex' model is identified, complementing the recent addition of its definition.
Normalize Bash tool arguments by converting a "command" key into "cmd" using JSON-aware parsing, avoiding brittle string replacements that could corrupt values. Apply this conversion in both streaming and non-streaming response paths so bash-style tool calls are emitted with the expected "cmd" field.
Add support for accumulating thinking text and carrying session identifiers to enable signature caching/restore for unsigned thinking blocks, improving handling of thinking-state continuity across requests/responses.
Also perform small cleanups: import logging, tidy comments and test descriptions. These changes make tool-argument handling more robust and enable reliable signature restoration for thinking blocks.
Ensures compatibility with the Amp client by suppressing
"thinking" blocks when "tool_use" blocks are also present in
the response.
The Amp client has issues rendering both types of blocks
simultaneously. This change filters out "thinking" blocks in
such cases, preventing rendering problems.
Gemini API does not support the JSON Schema `propertyNames` keyword,
causing 400 errors when Claude tool schemas containing this field are
proxied through the Antigravity provider.
Add `propertyNames` to the list of unsupported keywords removed by
CleanJSONSchemaForGemini(), alongside existing removals like $ref,
definitions, and additionalProperties.
Introduce a centralized OAuth session store with TTL-based expiration
to replace the previous simple map-based status tracking. Add a new
/api/oauth/callback endpoint that allows remote clients to relay OAuth
callback data back to the CLI proxy, enabling OAuth flows when the
callback cannot reach the local machine directly.
- Add oauth_sessions.go with thread-safe session store and validation
- Add oauth_callback.go with POST handler for remote callback relay
- Refactor auth_files.go to use new session management APIs
- Register new callback route in server.go
Add metadataEqualIgnoringTimestamps() function to compare metadata JSON
without timestamp/expired/expires_in/last_refresh/access_token fields.
This prevents unnecessary file writes when only these fields change
during refresh, breaking the fsnotify event → Watcher callback → refresh loop.
Key insight: Google OAuth returns a new access_token on each refresh,
which was causing file writes and triggering the refresh loop.
Fixes antigravity channel excessive log generation issue.
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude <noreply@anthropic.com>
The previous commit added thinkingLevel support but didn't apply it
when the reasoning effort came from model name suffix (e.g., model(minimal)).
This was because ResolveThinkingConfigFromMetadata returns nil for
level-based models, bypassing the metadata application.
Changes:
- Add ApplyGemini3ThinkingLevelFromMetadata for standard Gemini API
- Add ApplyGemini3ThinkingLevelFromMetadataCLI for CLI API format
- Update gemini_cli_executor to apply Gemini 3 thinkingLevel from metadata
- Update antigravity_executor to apply Gemini 3 thinkingLevel from metadata
- Update aistudio_executor to apply Gemini 3 thinkingLevel from metadata
- Add comprehensive test coverage for Gemini 3 thinkingLevel functions
Per Google's official documentation, Gemini 3 models should use
thinkingLevel (string) instead of thinkingBudget (number) for
optimal performance.
From Google's Gemini Thinking docs:
> Use the thinkingLevel parameter with Gemini 3 models. While
> thinkingBudget is accepted for backwards compatibility, using
> it with Gemini 3 Pro may result in suboptimal performance.
Changes:
- Add model family detection functions (IsGemini3Model, IsGemini25Model,
IsGemini3ProModel, IsGemini3FlashModel)
- Add ApplyGeminiThinkingLevel and ApplyGeminiCLIThinkingLevel functions
for applying thinkingLevel config
- Add ValidateGemini3ThinkingLevel for model-specific level validation
- Add ThinkingBudgetToGemini3Level for backward compatibility conversion
- Update NormalizeGeminiThinkingBudget to convert budget to level for
Gemini 3 models
- Update ApplyDefaultThinkingIfNeeded to not set a default level for
Gemini 3 (lets API use its dynamic default "high")
- Update ConvertThinkingLevelToBudget to preserve thinkingLevel for
Gemini 3 models
- Add Levels field to all Gemini 3 model definitions:
- Gemini 3 Pro: ["low", "high"]
- Gemini 3 Flash: ["minimal", "low", "medium", "high"]
Backward compatibility:
- Gemini 2.5 models continue to use thinkingBudget as before
- If thinkingBudget is provided for Gemini 3, it's converted to the
appropriate thinkingLevel
- Existing configurations continue to work
Introduces the capability to count tokens for Antigravity-backed requests. This implementation leverages the `countTokens` endpoint of the Antigravity API, replacing the prior unsupported stub.
Key aspects of this update include:
- **API Integration**: Direct integration with the Antigravity `countTokens` API, including necessary request payload translation and authentication.
- **Resilient Infrastructure**: A fallback mechanism has been established, allowing the system to attempt connections across multiple Antigravity base URLs to ensure request success even in the event of temporary service interruptions.
- **Model Aliasing**: Added mappings for `gemini-3-flash` and `gemini-3-flash-preview` to ensure compatibility with the latest model variants.
- **Robust Error Handling**: Comprehensive error handling and logging are in place to manage failures during API interactions.
Updates schema flattening logic to handle multiple non-null types, providing a more descriptive "Accepts" hint.
Removes redundant tracking of the current tool name in `Params` as it's no longer needed for streaming limits, simplifying the structure.
Enhances compatibility with the Gemini API by implementing a schema cleaning process.
This includes:
- Centralizing schema cleaning logic for Gemini in a dedicated utility function.
- Converting unsupported schema keywords to hints within the description field.
- Flattening complex schema structures like `anyOf`, `oneOf`, and type arrays to simplify the schema.
- Handling streaming responses with empty tool names, which can occur in subsequent chunks after the initial tool use.
Enhance the configuration diff logic to include detection and reporting of `prefix` changes for all model types. Update related struct naming for consistency across the watcher module.
Normalize action handling by accommodating wildcard patterns in route definitions for Gemini endpoints. Adjust `request.Action` parsing logic to correctly process routes with prefixed actions.
Refactor model management to include an optional `prefix` field for model credentials, enabling better namespace handling. Update affected configuration files, APIs, and handlers to support prefix normalization and routing. Remove unused OpenAI compatibility provider logic to simplify processing.
Introduce formatProxyURL helper to sanitize proxy addresses before
logging, stripping credentials and path components while preserving
host information. Rework model hash computation to sort and deduplicate
name/alias pairs with case normalization, ensuring consistent output
regardless of input ordering. Add signature-based identification for
anonymous OpenAI-compatible provider entries to maintain stable keys
across configuration reloads. Replace direct stdout prints with
structured logger calls for file change notifications.
Break out config diffing, hashing, and OpenAI compatibility utilities into a dedicated diff package, update watcher to consume them, and add comprehensive tests for diff logic and watcher behavior.
Unify thinking budget-to-effort conversion in a shared helper, handle disabled/default thinking cases in translators, adjust zero-budget mapping, and drop the old OpenAI-specific helper with updated tests.
Some OpenAI-compatible providers (like GitHub Copilot) may send tool_calls
in the first streaming chunk without including the role field. The previous
implementation only emitted message_start when the first chunk contained
role="assistant", causing Anthropic protocol violations when tool calls
arrived first.
This fix ensures message_start is always emitted on the very first chunk,
preventing 'content_block_start before message_start' errors in clients
that strictly validate Anthropic SSE event ordering.
Introduce `panel-github-repository` in the configuration to allow specifying a custom repository for management panel assets. Update dependency versions and enhance asset URL resolution logic to support overrides.
- Pass through non-function tool definitions like web_search
- Translate tool_choice for built-in tools and function tools
- Add regression tests for built-in tool passthrough
The response writer wrapper has been refactored to more reliably capture response bodies for logging, fixing several edge cases.
- Implements `WriteString` to capture writes from `io.StringWriter`, which were previously missed by the `Write` method override.
- A new `shouldBufferResponseBody` helper centralizes the logic to ensure the body is buffered only when logging is active or for errors when `logOnErrorOnly` is enabled.
- Streaming detection is now more robust. It correctly handles non-streaming error responses (e.g., `application/json`) that are generated for a request that was intended to be streaming.
BREAKING CHANGE: The public methods `Status()`, `Size()`, and `Written()` have been removed from the `ResponseWriterWrapper` as they are no longer required by the new implementation.
The WriteErrorResponse function now caches the error response body in the gin context.
The deferred request logger checks for this cached response. If an error response is found, it bypasses the standard response logging. This prevents scenarios where an error is logged twice or an empty payload log overwrites the original, more detailed error log.
The API error handling is updated to return a structured JSON payload
instead of a plain text message. This provides more context and allows
clients to programmatically handle different error types.
The new error response has the following structure:
{
"error": {
"message": "...",
"type": "..."
}
}
The `type` field is determined by the HTTP status code, such as
`authentication_error`, `rate_limit_error`, or `server_error`.
If the underlying error message from an upstream service is already a
valid JSON string, it will be preserved and returned directly.
BREAKING CHANGE: API error responses are now in a structured JSON
format instead of plain text. Clients expecting plain text error
messages will need to be updated to parse the new JSON body.
All Amp management endpoints (e.g., /api/user, /threads) are now protected by the standard API key authentication middleware. This ensures that all management operations require a valid API key, significantly improving security.
As a result of this change:
- The `restrict-management-to-localhost` setting now defaults to `false`. API key authentication provides a stronger and more flexible security control than IP-based restrictions, improving usability in containerized environments.
- The reverse proxy logic now strips the client's `Authorization` header after authenticating the initial request. It then injects the configured `upstream-api-key` for the request to the upstream Amp service.
BREAKING CHANGE: Amp management endpoints now require a valid API key for authentication. Requests without a valid API key in the `Authorization` header will be rejected with a 401 Unauthorized error.
Expose thinking/effort normalization helpers from the executor package
so conversion tests use production code and stay aligned with runtime
validation behavior.
Route OpenAI reasoning effort through ThinkingEffortToBudget for Claude
translators, preserve "minimal" when translating OpenAI Responses, and
treat blank/unknown efforts as no-ops for Gemini thinking configs.
Also map budget -1 to "auto" and expand cross-protocol thinking tests.
When budget 0 maps to "none" for models that use thinking levels
but don't support that effort level, strip thinking fields instead
of setting an invalid reasoning_effort value.
Tests now expect removal for this edge case.
Move OpenAI `reasoning_effort` -> Gemini `thinkingConfig` budget logic into
shared helpers used by Gemini, Gemini CLI, and antigravity translators.
Normalize Claude thinking handling by preferring positive budgets, applying
budget token normalization, and gating by model support.
Always convert Gemini `thinkingBudget` back to OpenAI `reasoning_effort` to
support allowCompat models, and update tests for normalization behavior.
Only map OpenAI reasoning effort to Claude thinking for models that support
thinking and use budget tokens (not level-based thinking).
Also add "xhigh" effort mapping and adjust minimal/low budgets, with new
raw-payload conversion tests across protocols and models.
Ensure thinking settings translate correctly across providers:
- Only apply reasoning_effort to level-based models and derive it from numeric
budget suffixes when present
- Strip effort string fields for budget-based models and skip Claude/Gemini
budget resolution for level-based or unsupported models
- Default Gemini include_thoughts when a nonzero budget override is set
- Add cross-protocol conversion and budget range tests
When using OpenAI-compatible providers with model aliases (e.g., glm-4.6-zai -> glm-4.6),
the alias resolution was correctly applied but then immediately overwritten by
ResolveOriginalModel, causing 'Unknown Model' errors from upstream APIs.
This fix skips the ResolveOriginalModel override when a model alias has already
been resolved, ensuring the correct model name is sent to the upstream provider.
Co-authored-by: Amp <amp@ampcode.com>
- Write each SSE chunk directly to c.Writer and flush immediately
- Remove buffered writer and ticker-based flushing that caused delayed output
- Add 500ms timeout case for consistency with OpenAI/Gemini handlers
- Clean up unused bufio import
This fixes the 'not streaming' issue where small responses were held
in the buffer until timeout/threshold was reached.
Amp-Thread-ID: https://ampcode.com/threads/T-019b1186-164e-740c-96ab-856f64ee6bee
Co-authored-by: Amp <amp@ampcode.com>
Add package and constructor documentation for AI Studio, Antigravity,
Gemini CLI, Gemini API, and Vertex executors to describe their roles and
inputs.
Introduce a shared stream scanner buffer constant in the Gemini API
executor and reuse it in Gemini CLI and Vertex streaming code so stream
handling uses a consistent configuration.
Update Refresh implementations for AI Studio, Gemini CLI, Gemini API
(API key), and Vertex executors to short‑circuit and simply return the
incoming auth object, while keeping Antigravity token renewal as the
only executor that performs OAuth refresh.
Remove OAuth2-based token refresh logic and related dependencies from
the Gemini API executor, since it now operates strictly with API key
credentials.
Align thinking suffix handling on a single bracket-style marker.
NormalizeThinkingModel strips a terminal `[value]` segment from
model identifiers and turns it into either a thinking budget (for
numeric values) or a reasoning effort hint (for strings). Emission
of `ThinkingIncludeThoughtsMetadataKey` is removed.
Executor helpers and the example config are updated so their
comments reference the new `[value]` suffix format instead of the
legacy dash variants.
BREAKING CHANGE: dash-based thinking suffixes (`-thinking`,
`-thinking-N`, `-reasoning`, `-nothinking`) are no longer parsed
for thinking metadata; only `[value]` annotations are recognized.
NormalizeThinkingModel now checks ModelSupportsThinking before removing
"-thinking" or "-thinking-<ver>", avoiding accidental parsing of model
names where the suffix is part of the official id (e.g., kimi-k2-thinking,
qwen3-235b-a22b-thinking-2507).
The registry adds ThinkingSupport metadata for several models and
propagates it via ModelInfo (e.g., kimi-k2-thinking, deepseek-r1,
qwen3-235b-a22b-thinking-2507, minimax-m2), enabling accurate detection
of thinking-capable models and correcting base model inference.
Add fallback parsing for quota reset delay when RetryInfo is not present:
- Try ErrorInfo.metadata.quotaResetDelay (e.g., "373.801628ms")
- Parse from error.message "Your quota will reset after Xs."
This ensures proper cooldown timing for rate-limited requests.
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
- Added support for parsing and normalizing dynamic thinking model suffixes.
- Centralized budget resolution across executors and payload helpers.
- Retired legacy Gemini-specific thinking handlers in favor of unified logic.
- Updated executors to use metadata-based thinking configuration.
- Added `ResolveOriginalModel` utility for resolving normalized upstream models using request metadata.
- Updated executors (Gemini, Codex, iFlow, OpenAI, Qwen) to incorporate upstream model resolution and substitute model values in payloads and request URLs.
- Ensured fallbacks handle cases with missing or malformed metadata to derive models robustly.
- Refactored upstream model resolution to dynamically incorporate metadata for selecting and normalizing models.
- Improved handling of thinking configurations and model overrides in executors.
- Removed hardcoded thinking model entries and migrated logic to metadata-based resolution.
- Updated payload mutations to always include the resolved model.
Add new REST API endpoints under /v0/management/ampcode for managing
ampcode configuration including upstream URL, API key, localhost
restriction, model mappings, and force model mappings settings.
- Move force-model-mappings from config_basic to config_lists
- Add GET/PUT/PATCH/DELETE endpoints for all ampcode settings
- Support model mapping CRUD with upsert (PATCH) capability
- Add comprehensive test coverage for all ampcode endpoints
Add a configuration option to control whether model mappings take
precedence over local API keys for Amp CLI requests.
- Add PrioritizeModelMappings field to AmpCode config struct
- When false (default): Local API keys take precedence (original behavior)
- When true: Model mappings take precedence over local API keys
- Add management API endpoints GET/PUT /prioritize-model-mappings
This allows users who want mapping priority to enable it explicitly
while preserving backward compatibility.
Config example:
ampcode:
model-mappings:
- from: claude-opus-4-5-20251101
to: gemini-claude-opus-4-5-thinking
prioritize-model-mappings: true
Skip text content blocks that are empty or contain only whitespace
when translating Claude messages to OpenAI format. This fixes GLM-4.6
and other strict OpenAI-compatible providers that reject empty text
with error 'text cannot be empty'.
Simplifies routing logic by delegating all provider/mapping/proxy
decisions to FallbackHandler. Previously, the route checked for
provider/mapping availability before calling the handler, then
FallbackHandler performed the same checks again.
Changes:
- Remove model extraction and provider checking from route (lines 182-201)
- Route now only checks if request is POST with /models/ path
- FallbackHandler handles provider -> mapping -> proxy fallback
- Remove unused internal/util import
Benefits:
- Eliminates duplicate checks (addresses PR review feedback #2)
- Centralizes all provider/mapping logic in FallbackHandler
- Reduces routing code by ~20 lines
- Aligns with how other /api/provider routes work
Performance: No impact (checks still happen once in FallbackHandler)
- Change createGeminiBridgeHandler to accept gin.HandlerFunc instead of *gemini.GeminiAPIHandler
This allows tests to inject mock handlers instead of duplicating bridge logic
- Replace magic number 8 with len(modelsPrefix) for better maintainability
- Remove redundant test case that doesn't test edge case in production
- Update routes.go to pass geminiHandlers.GeminiHandler directly
Addresses PR review feedback on test architecture and code clarity.
Amp-Thread-ID: https://ampcode.com/threads/T-1ae2c691-e434-4b99-a49a-10cabd3544db
Gemini handler extracts model from URL path, not JSON body, so
rewriting the request body alone wasn't sufficient for model mapping.
- Add MappedModelContextKey constant for context passing
- Update routes.go to use NewFallbackHandlerWithMapper
- Add check for valid mapping before routing to local handler
- Add tests for gemini bridge model mapping
- Added `gpt_5_1_prompt.md` and `gpt_5_codex_prompt.md` to document Codex instruction guidelines.
- These detail the behavior, constraints, and execution policies for GPT-5-based Codex agents in the CLI environment.
* feat(config): add pruning of stale YAML mapping keys during config save
* Revert watcher.go in "fix: enable hot reload for amp-model-mappings config"
- Renamed `generationConfig.responseSchema` to `generationConfig.responseJsonSchema` in Gemini request transformation to align with updated schema expectations.
- Removed `vertex-compat` executor and related configuration.
- Consolidated Vertex compatibility checks into `vertex` handling with `apikey`-based model resolution.
- Streamlined model generation logic for Vertex API key entries.
- Store ampModule in Server struct to access it during config updates
- Call ampModule.OnConfigUpdated() in UpdateClients() for hot reload
- Watch config directory instead of file to handle atomic saves (vim, VSCode, etc.)
- Improve config file event detection with basename matching
- Add diagnostic logging for config reload tracing
Extract applyThinkingMetadata and applyThinkingMetadataCLI helpers to
payload_helpers.go and use them across all four Gemini-based executors:
- gemini_executor.go (Execute, ExecuteStream, CountTokens)
- gemini_cli_executor.go (Execute, ExecuteStream, CountTokens)
- aistudio_executor.go (translateRequest)
- antigravity_executor.go (Execute, ExecuteStream)
This eliminates code duplication introduced in the -reasoning suffix PR
and centralizes the thinking config application logic.
Net reduction: 28 lines of code.
Adds support for the `-reasoning` model name suffix which enables
thinking/reasoning mode with dynamic budget. This allows clients to
request reasoning-enabled inference using model names like
`gemini-2.5-flash-reasoning` without explicit configuration.
The suffix is normalized to the base model (e.g., gemini-2.5-flash)
with thinkingBudget=-1 (dynamic) and include_thoughts=true.
Follows the existing pattern established by -nothinking and
-thinking-N suffixes.
- Updated `usage_helpers.go` to call `EnsureIndex()` for proper index assignment in reporter initialization.
- Adjusted `auth/manager.go` to assign auth indices inside a locked section when they are unassigned, ensuring thread safety and consistency.
AMP CLI requests /threads.rss at the root level, but the AMP module
only registered routes under /api/*. This caused a 404 error during
AMP CLI startup.
Add the missing root-level route with the same security middleware
(noCORS, optional localhost restriction) as other management routes.
- Add model mapping feature to README.md Amp CLI section
- Add detailed Model Mapping Configuration section to amp-cli-integration.md
- Update architecture diagram to show model mapping flow
- Update Model Fallback Behavior to include mapping step
- Add Table of Contents entry for model mapping
- Add AmpModelMapping config to route models like 'claude-opus-4.5' to 'claude-sonnet-4'
- Add ModelMapper interface and DefaultModelMapper implementation with hot-reload support
- Enhance FallbackHandler to apply model mappings before falling back to ampcode.com
- Add structured logging for routing decisions (local provider, mapping, amp credits)
- Update config.example.yaml with amp-model-mappings documentation
- Updated `antigravity_openai_request.go` to process non-JSON outputs gracefully by verifying and distinguishing between JSON and plain string formats.
- Ensured proper assignment of parsed or raw response to `functionResponse`.
- Added `ContextLength` field with a value of 200,000 to all applicable Claude model definitions.
- Standardized `MaxCompletionTokens` values across models for consistency and alignment.
**fix(translator): add support for "xhigh" reasoning effort in OpenAI responses**
- Updated handling in `openai_openai-responses_request.go` to include the new "xhigh" reasoning effort level.
Fixes an issue where Claude thinking models would return 400 errors when
the thinking.budget_tokens was greater than or equal to max_tokens.
Changes:
- Add MaxCompletionTokens: 128000 to all Claude thinking model definitions
- Add ensureMaxTokensForThinking() function in claude_executor.go that:
- Checks if thinking is enabled with a budget_tokens value
- Looks up the model's MaxCompletionTokens from the registry
- Ensures max_tokens is set to at least the model's MaxCompletionTokens
- Falls back to budget_tokens + 4000 buffer if registry lookup fails
This ensures Anthropic API constraint (max_tokens > thinking.budget_tokens)
is always satisfied when using extended thinking features.
Fixes: #339🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude <noreply@anthropic.com>
- Implemented logic to pair consecutive function calls and their outputs, ensuring proper sequencing for processing.
- Adjusted `gemini_openai-responses_request.go` to normalize message structures and maintain expected flow.
- Updated handling of `output` in `gemini_openai-responses_request.go` to use `.Str` instead of `.Raw` when parsing non-JSON string outputs.
- Added checks to distinguish between JSON and non-JSON `output` types for accurate `functionResponse` construction.
- Updated handling of `output` in `gemini_openai-responses_request.go` to use `.Raw` instead of `.String` for preserving original JSON encoding.
- Ensured proper setting of raw JSON output when constructing `functionResponse`.
- Updated handling of `thoughtSignature` across all translator modules to retain other content payloads if present.
- Adjusted logic for `thought_signature` and `inline_data` keys for consistent processing.
- Added new `Gemini 3 Pro Image Preview` model with detailed metadata and configuration.
- Removed outdated `Claude Sonnet 4.5 Thinking` model definition for cleanup and relevance.
**feat(handlers, executor): add Gemini 3 Pro Preview support and refine Claude system instructions**
- Added support for the new "Gemini 3 Pro Preview" action in Gemini handlers, including detailed metadata and configuration.
- Removed redundant `cache_control` field from Claude system instructions for cleaner payload structure.
**fix(executor): replace redundant commented code with `checkSystemInstructions` helper**
- Replaced commented-out `sjson.SetRawBytes` lines with the new `checkSystemInstructions` function.
- Centralized system instruction handling for better code clarity and reuse.
- Ensured consistent logic for managing `system` field across Claude executor flows.
- Commented out multiple instances of `sjson.SetRawBytes` for setting `system` key to Claude instructions as they are redundant.
- Code cleanup to improve clarity and maintainability without affecting functionality.
- Add Claude thinking model definitions (sonnet-4-5-thinking, opus-4-5-thinking variants)
- Add Thinking support for antigravity models with -thinking suffix
- Add injectThinkingConfig() for automatic thinking budget based on model suffix
- Add resolveUpstreamModel() mappings for thinking variants to actual Claude models
- Add extractAndRemoveBetas() to convert betas array to anthropic-beta header
- Update applyClaudeHeaders() to merge custom betas from request body
Closes#324
- Introduced `appendAPIResponse` helper to preserve and append data to existing API responses.
- Ensured newline inclusion when appending, if necessary.
- Improved `nil` and data type checks for response handling.
- Updated middleware to skip request logging for `GET` requests.
- Added additional metadata fields (`Name`, `Description`, `DisplayName`, `Version`) to `ModelInfo` struct initialization for better model representation.
- Removed unnecessary whitespace in the code.
- Ensured `functionDeclarations` key renaming only occurs if the key exists in Gemini tools processing.
- Prevented unnecessary JSON reassignment when the target key is absent.
- Fixed parameter key renaming to correctly handle `functionDeclarations` and `parametersJsonSchema` in Gemini tools.
- Resolved potential overwriting issue by reassigning JSON strings after each key rename.
- Introduced `TLSConfig` to support HTTPS configurations, including enabling TLS, specifying certificate and key files.
- Updated HTTP server logic to handle HTTPS mode when TLS is enabled.
- Enhanced `config.example.yaml` with TLS settings example.
- Adjusted internal URL generation to respect protocol based on TLS state.
- Fixed improper handling of `indexAssigned` and `Index` during auth reassignment.
- Ensured `EnsureIndex` is invoked after validating existing auth entries.
**feat(retry): add configurable retry logic with cooldown support**
- Introduced `max-retry-interval` configuration for cooldown durations between retries.
- Added `SetRetryConfig` in `Manager` to handle retry attempts and cooldown intervals.
- Enhanced provider execution logic to include retry attempts, cooldown management, and dynamic wait periods.
- Updated API endpoints and YAML configuration to support `max-retry-interval`.
- Introduced `logOnErrorOnly` mode to enable logging only for error responses when request logging is disabled.
- Added endpoints to list and download error logs (`/request-error-logs`).
- Implemented error log file cleanup to retain only the newest 10 logs.
- Refactored `ResponseWriterWrapper` to support forced logging for error responses.
- Enhanced middleware to capture data for upstream error persistence.
- Improved log file naming and error log filename generation.
- Restored `thoughtSignature` validator bypass for model-specific parts in Gemini content processing.
- Removed redundant logic from the `executor` for cleaner handling.
Remove generationConfig.maxOutputTokens, generationConfig.responseMimeType and generationConfig.responseJsonSchema from the Gemini payload in translateRequest so we no longer send unsupported or conflicting response configuration fields. This lets the backend or caller control response formatting and output limits and helps prevent potential API errors caused by these keys.
- Updated `defaultAntigravityAgent` to version `1.11.5`.
- Adjusted const value formatting for improved readability.
**feat(executor): introduce fallback mechanism for Antigravity base URLs**
- Added retry logic with fallback order for Antigravity base URLs to handle request errors and rate limits.
- Refactored base URL handling with `antigravityBaseURLFallbackOrder` and related utilities.
- Enhanced error handling in non-streaming and streaming requests with retry support and improved metadata reporting.
- Updated `buildRequest` to support dynamic base URL assignment.
- Improved Antigravity executor to handle `thinkingConfig` adjustments and default `thinkingBudget` when `thinkingLevel` is removed.
- Updated translator response handling to set default values for output token counts when specific token data is missing.
- Introduced `modelName2Alias` and `alias2ModelName` functions for mapping between model names and aliases.
- Improved Antigravity payload transformation to include alias-to-model name conversion.
- Enhanced processing for Claude Sonnet models to adjust template parameters based on schema presence.
- Added `authFileUnchanged` to skip reloads for unchanged files based on SHA256 hash comparisons.
- Introduced `isKnownAuthFile` to verify known files before handling removal events.
- Improved event processing in `handleEvent` to reduce unnecessary reloads and enhance performance.
- Refactored response translation logic to support mixed content types (`input_text`, `output_text`, `input_image`) with better role assignments and part handling.
- Added image processing logic for embedding inline data with MIME type and base64 encoded content.
- Updated Antigravity request conversion to replace Gemini CLI references for consistency.
**feat(executor): enhance WebSocket error handling and metadata logging**
- Added handling for stream closure before start with appropriate error recording.
- Improved metadata logging for non-OK HTTP status codes in WebSocket responses.
- Consolidated event processing logic with `processEvent` for better error handling and payload management.
- Refactored stream initialization to include the first event handling for smoother execution flow.
- Added new endpoint `/antigravity-auth-url` to initiate Antigravity authentication.
- Implemented `RequestAntigravityToken` to manage the OAuth flow, including token exchange and user info retrieval.
- Introduced `.oauth-antigravity` temporary file handling for state and code management.
- Added `sanitizeAntigravityFileName` utility for safe token file names based on user email.
- Registered `/antigravity/callback` endpoint for OAuth redirects.
- Introduced request and response translation functions to enable compatibility between OpenAI Chat Completions API and Antigravity.
- Registered translation utilities for both streaming and non-streaming scenarios.
- Added support for reasoning content, tool calls, and metadata handling.
- Established request normalization and embedding for Antigravity-compatible payloads.
- Added new fields to `Params` struct for better tracking of finish reasons, usage metadata, and tool usage.
- Refactored handling of response transitions, final events, and state-driven logic in `ConvertAntigravityResponseToClaude`.
- Introduced `appendFinalEvents` and `resolveStopReason` helper functions for cleaner separation of concerns.
- Added `TotalTokenCount` field to `Params` struct for enhanced token tracking.
- Updated token count calculations to fallback on `TotalTokenCount` when specific counts are missing.
- Introduced `hasNonZeroUsageMetadata` function to validate presence of token data in `usage_metadata`.
- Replaced `http.DefaultClient` with a configurable `http.Client` instance for Antigravity OAuth flow methods.
- Updated `exchangeAntigravityCode` and `fetchAntigravityUserInfo` to accept `httpClient` as a parameter.
- Added `util.SetProxy` usage to initialize the `httpClient` with proxy support.
- Implemented OAuth login flow for the Antigravity provider in `auth/antigravity.go`.
- Added `AntigravityExecutor` for handling requests and streaming via Antigravity APIs.
- Created `antigravity_login.go` command for triggering Antigravity authentication.
- Introduced OpenAI-to-Antigravity translation logic in `translator/antigravity/openai/chat-completions`.
**refactor(translator, executor): update Gemini CLI response translation and add Antigravity payload customization**
- Renamed Gemini CLI translation methods to align with response handling (`ConvertGeminiCliResponseToGemini` and `ConvertGeminiCliResponseToGeminiNonStream`).
- Updated `init.go` to reflect these method changes.
- Introduced `geminiToAntigravity` function to embed metadata (`model`, `userAgent`, `project`, etc.) into Antigravity payloads.
- Added random project, request, and session ID generators for enhanced tracking.
- Streamlined `buildRequest` to use `geminiToAntigravity` transformation before request execution.
**feat(executor): add thinking level to budget conversion utility**
- Introduced `ConvertThinkingLevelToBudget` to map thinking level ("high"/"low") to corresponding budget values.
- Applied the utility in `aistudio_executor.go` before stripping unsupported configs.
- Updated dependencies to include `tidwall/gjson` for JSON parsing.
- Added `shouldLogRequest` helper to simplify path-based request logging logic.
- Updated middleware to skip management endpoints for improved security.
- Introduced an explicit `nil` logger check for minimal overhead.
- Updated dependencies in `go.mod`.
**feat(auth): add handling for 404 response with retry logic**
- Introduced support for 404 `not_found` status with a 12-hour backoff period.
- Updated `manager.go` to align state and status messages for 404 scenarios.
**refactor(translator): comment out debug logging in Gemini responses request**
- Updated `README_CN.md` to introduce Amp CLI and IDE support.
- Added detailed integration guide in `docs/amp-cli-integration_CN.md`.
- Covered setup, configuration, OAuth, security, and usage of Amp CLI with Google/ChatGPT/Claude subscriptions.
Fix critical security vulnerability in amp-restrict-management-to-localhost
feature where attackers could bypass localhost restriction by spoofing
X-Forwarded-For headers.
Changes:
- Use RemoteAddr (actual TCP connection) instead of ClientIP() in
localhostOnlyMiddleware to prevent header spoofing attacks
- Add comprehensive test coverage for spoofing prevention (6 test cases)
- Update documentation with reverse proxy deployment guidance and
limitations of the RemoteAddr approach
The fix prevents attacks like:
curl -H "X-Forwarded-For: 127.0.0.1" https://server/api/user
Trade-off: Users behind reverse proxies will need to disable the feature
and use alternative security measures (firewall rules, proxy ACLs).
Addresses security review feedback from PR #287.
AMP CLI sends Gemini requests to non-standard paths that were being
directly proxied to ampcode.com without checking for local OAuth.
This fix adds:
- GeminiBridge handler to transform AMP CLI paths to standard format
- Enhanced model extraction from AMP's /publishers/google/models/* paths
- FallbackHandler wrapper to check for local OAuth before proxying
Flow:
- If user has local Google OAuth → use it (free tier)
- If no local OAuth → fallback to ampcode.com (charges credits)
Fixes issue where gemini-3-pro-preview requests always charged AMP
credits even when user had valid Google Cloud OAuth configured.
- Update README.md to present Amp CLI as standard feature, not fork differentiator
- Remove USING_WITH_FACTORY_AND_AMP.md (fork-specific, Factory docs live upstream)
- Add comprehensive docs/amp-cli-integration.md with setup, config, troubleshooting
- Eliminate fork justification messaging throughout documentation
- Prepare Amp CLI integration for upstream merge consideration
This positions Amp CLI support as a natural extension of CLIProxyAPI's
multi-client architecture rather than a fork-specific feature.
- Replace deprecated GPT-5 and GPT-5-Codex with GPT-5.1 family
- Add explicit reasoning effort levels (low/medium/high)
- Remove duplicate base models (use medium as default)
- GPT-5.1 Codex Mini supports medium/high only (per OpenAI docs)
- Remove older Claude Sonnet 4 (keep 4.5)
- Final config: 11 models (3 Claude + 8 GPT-5.1 variants)
Amp CLI sends 'context-1m-2025-08-07' in Anthropic-Beta header which
requires a special 1M context window subscription. After upstream rebase
to v6.3.7 (commit 38cfbac), CLIProxyAPI now respects client-provided
Anthropic-Beta headers instead of always using defaults.
When users configure local OAuth providers (Claude, etc), requests bypass
the ampcode.com proxy and use their own API subscriptions. These personal
subscriptions typically don't include the 1M context beta feature, causing
'long context beta not available' errors.
Changes:
- Add filterBetaFeatures() helper to strip specific beta features
- Filter context-1m-2025-08-07 in fallback handler when using local providers
- Preserve full headers when proxying to ampcode.com (paid users get all features)
- Add 7 test cases covering all edge cases
This fix is isolated to the Amp module and only affects the local provider
path. Users proxying through ampcode.com are unaffected and receive full
1M context support as part of their paid service.
- add fallback handler that forwards Amp provider requests to ampcode.com when the provider isn’t configured locally
- wrap AMP provider routes with the fallback so requests always have a handler
- share Gemini thinking model normalization helper between core handlers and AMP fallback
Add full Amp CLI support to enable routing AI model requests through the proxy
while maintaining Amp-specific features like thread management, user info, and
telemetry. Includes complete documentation and pull bot configuration.
Features:
- Modular architecture with RouteModule interface for clean integration
- Reverse proxy for Amp management routes (thread/user/meta/ads/telemetry)
- Provider-specific route aliases (/api/provider/{provider}/*)
- Secret management with precedence: config > env > file
- 5-minute secret caching to reduce file I/O
- Automatic gzip decompression for responses
- Proper connection cleanup to prevent leaks
- Localhost-only restriction for management routes (configurable)
- CORS protection for management endpoints
Documentation:
- Complete setup guide (USING_WITH_FACTORY_AND_AMP.md)
- OAuth setup for OpenAI (ChatGPT Plus/Pro) and Anthropic (Claude Pro/Max)
- Factory CLI config examples with all model variants
- Amp CLI/IDE configuration examples
- tmux setup for remote server deployment
- Screenshots and diagrams
Configuration:
- Pull bot disabled for this repo (manual rebase workflow)
- Config fields: AmpUpstreamURL, AmpUpstreamAPIKey, AmpRestrictManagementToLocalhost
- Compatible with upstream DisableCooling and other features
Technical details:
- internal/api/modules/amp/: Complete Amp routing module
- sdk/api/httpx/: HTTP utilities for gzip/transport
- 94.6% test coverage with 34 comprehensive test cases
- Clean integration minimizes merge conflict risk
Security:
- Management routes restricted to localhost by default
- Configurable via amp-restrict-management-to-localhost
- Prevents drive-by browser attacks on user data
This provides a production-ready foundation for Amp CLI integration while
maintaining clean separation from upstream code for easy rebasing.
Amp-Thread-ID: https://ampcode.com/threads/T-9e2befc5-f969-41c6-890c-5b779d58cf18
Fix critical bug where ExecuteStream would create a streaming channel
from a failed (non-2xx) response after exhausting all retries with no
fallback models available.
When retries were exhausted on the last model, the code would break from
the inner loop but fall through to streaming channel creation (line 401),
immediately returning at line 461. This made the error handling code at
lines 464-471 unreachable, causing clients to receive an empty/closed
stream instead of a proper error response.
Solution: Check if httpResp is non-2xx before creating the streaming
channel. If failed, continue the outer loop to reach error handling.
Identified by: codex-bot review
Ref: https://github.com/router-for-me/CLIProxyAPI/pull/280#pullrequestreview-3484560423
Fix critical bug where ExecuteStream would create a streaming channel
using a 429 error response instead of continuing to the next fallback
model after exhausting retries.
When 429 retries were exhausted and a fallback model was available,
the inner retry loop would break but immediately fall through to the
streaming channel creation, attempting to stream from the failed 429
response instead of trying the next model.
Solution: Add shouldContinueToNextModel flag to explicitly skip the
streaming logic and continue the outer model loop when appropriate.
Identified by: codex-bot review
Ref: https://github.com/router-for-me/CLIProxyAPI/pull/280#pullrequestreview-3484479106
Match official Gemini CLI behavior by always sending default
thinkingConfig when client doesn't specify reasoning parameters.
- Set thinkingBudget=-1 (dynamic) for gemini-3-pro-preview
- Set include_thoughts=true to return thinking process
- Apply to both /v1/chat/completions and /v1/responses endpoints
- See: ai-gemini-cli/packages/core/src/config/defaultModelConfigs.ts
Implement Google RetryInfo.retryDelay support for handling 429 rate
limit errors. Retries same model up to 3 times using exact delays
from Google's API before trying fallback models.
- Add parseRetryDelay() to extract Google's retry guidance
- Implement inner retry loop in Execute() and ExecuteStream()
- Context-aware waiting with cancellation support
- Cap delays at 60s maximum for safety
Add gemini-3-pro-preview model to GetGeminiCLIModels() to make it
available for OAuth-based Gemini CLI users, matching the model
already available in AI Studio provider.
Model spec:
- ID: gemini-3-pro-preview
- Version: 3.0
- Input: 1M tokens
- Output: 64K tokens
- Thinking: 128-32K tokens (dynamic)
- Introduced `gpt-5.1-codex-max` variants to model definitions (`low`, `medium`, `high`, `xhigh`).
- Updated executor logic to map effort levels for Codex Max models.
- Added `lastCodexMaxPrompt` processing for `gpt-5.1-codex-max` prompts.
- Defined instructions for `gpt-5.1-codex-max` in a new file: `codex_instructions/gpt-5.1-codex-max_prompt.md`.
Replace the "~<n>" suffix with "_<n>" when generating unique short names in codex translators (Claude, Gemini, OpenAI chat).
This avoids using a special character in identifiers, improving compatibility with downstream APIs while preserving length constraints.
Add logic to avoid exposing credentials that have been removed from disk but still persist in memory. Ensure `runtimeOnly` checks and proper handling of disabled or removed authentication states.
Introduce `migrateLegacyOpenAICompatibilityKeys` to streamline and reuse the normalization of OpenAI compatibility entries. Remove redundant loops and enhance maintainability for compatibility key handling. Add cleanup for legacy `api-keys` in YAML configuration during persistence.
Extract reasoning effort mapping into a reusable function `setReasoningEffortByAlias` to reduce redundancy and improve maintainability. Introduce support for the "gpt-5.1-none" variant in the registry and runtime executor.
Stop advertising and mapping the unsupported gpt-5.1-minimal variant in the model registry and Codex executor, and align bare gpt-5.1 requests to use medium reasoning effort like Codex CLI while preserving minimal for gpt-5.
Expand executor logic to handle GPT-5.1 Codex family and its variants, including reasoning effort configurations for minimal, low, medium, and high levels. Ensure proper mapping of models to payload parameters.
Introduce `PayloadConfig` in the configuration to define default and override rules for modifying payload parameters. Implement `applyPayloadConfig` and `applyPayloadConfigWithRoot` to apply these rules across various executors, ensuring consistent parameter handling for different models and protocols. Update all relevant executors to utilize this functionality.
feat(runtime): add support for GPT-5.1 models and variants
Introduce GPT-5.1 model family, including minimal, low, medium, high, Codex, and Codex Mini variants. Update tokenization and reasoning effort handling to accommodate new models in executor and registry.
Refine `handleEvent` to support additional file system operations (Rename, Remove) for config and auth JSON files. Improve client update/removal logic with atomic file replacement handling and incremental processing for auth changes.
Introduce a new `buildinfo` package to store version, commit, and build date metadata. Update HTTP handlers to include build metadata in response headers and modify initialization to set `buildinfo` values during runtime.
Introduce support for multi-project Gemini CLI logins, including shared and virtual credential management. Enhance runtime, metadata handling, and token updates for better project granularity and consistency across virtual and shared credentials. Extend onboarding to allow activating all available projects.
Extend `resolveUsageSource` to support Vertex projects by extracting and normalizing `project_id` or `project` from the metadata for accurate source resolution.
Extend `vertexAccessToken` to support proxy-aware HTTP clients and update calls accordingly for better configurability. Add `deleteTokenRecord` to handle token cleanup, improving management of authentication files.
- Add 'created' field to model registry for tracking model creation time
- Implement GetFirstAvailableModel() to find the first available model by newest creation timestamp
- Add ResolveAutoModel() utility function to resolve "auto" model name to actual available model
- Update request handler to resolve "auto" model before processing requests
- Ensures automatic model selection when "auto" is specified as model name
This enables dynamic model selection based on availability and creation time, improving the user experience when no specific model is requested.
feat(management): add auth ID normalization and file-based ID resolution
Introduce `authIDForPath` to standardize ID generation from file paths, improving consistency in authentication handling. Update `registerAuthFromFile` and `disableAuth` to utilize normalized IDs, incorporating relative path resolution and file name extraction where applicable.
Enhance error management for file operations and clean up temporary files. Add `NormalizeCommentIndentation` function to ensure YAML comments maintain consistent formatting.
Introduce an endpoint for importing Vertex service account JSON keys and storing them as authentication records. Add handlers for managing WebSocket authentication configuration.
Introduce `scheduleConfigReload` with debounce functionality for config reloads, ensuring efficient handling of frequent changes. Added `stopConfigReloadTimer` for stopping timers during watcher shutdown.
Introduce Vertex AI Gemini integration with support for service account-based authentication, credential storage, and import functionality. Added new executor for Vertex AI requests, including execution and streaming paths, and integrated it into the core manager. Enhanced CLI with `--vertex-import` flag for importing service account keys.
Add `ensurePublished` to guarantee request counting even when usage fields (e.g., tokens) are absent in OpenAI-compatible executor responses, particularly for streaming paths.
Added detailed operational instructions for Codex agents based on GPT-5, covering shell usage, editing constraints, sandboxing policies, and approval mechanisms. Also included comprehensive review process guidelines for flagging and communicating issues effectively.
Adds three new Codex Mini model variants (mini, mini-medium, mini-high)
that map to codex-mini-latest. Codex Mini supports medium and high
reasoning effort levels only (no low/minimal). Base model defaults to
medium reasoning effort.
- Deleted `MANAGEMENT_API.md` and `MANAGEMENT_API_CN.md` as they are no longer necessary.
- Streamlined project documentation by removing redundant API details already covered elsewhere.
- Added detailed descriptions for new `/config.yaml` endpoints (GET/PUT).
- Documented API responses, error codes, and enhancements for log management, usage statistics, and OAuth flows.
- Updated examples and notes for better clarity across both EN and CN versions.
- Streamlined Chinese README by reducing redundancy and unnecessary sections.
- Added a concise link to CLIProxyAPI user manual for detailed instructions.
- Reorganized the original README with a simplified overview.
- Streamlined Chinese README by reducing redundancy and unnecessary sections.
- Added a concise link to CLIProxyAPI user manual for detailed instructions.
- Reorganized the original README with a simplified overview.
- Streamlined Chinese README by reducing redundancy and unnecessary sections.
- Added a concise link to CLIProxyAPI user manual for detailed instructions.
- Reorganized the original README with a simplified overview.
fix(translator): consolidate temperature and top_p conditionals in OpenAI Claude request
Fixed: #169
fix(translator): adjust instruction strings in Codex Claude and OpenAI responses
- Introduced `ClientSupportsModel` function to `ModelRegistry` for verifying client support for specific models.
- Integrated model support validation into executor candidate filtering logic.
- Updated CLIProxy registry interface to include the new support check method.
- Introduced `builtin` package exposing a default registry and pipeline for built-in translators.
- Added format constants for common schemas (e.g., OpenAI, Gemini, Codex).
- Implemented helper functions for schema translation using format name strings.
- Provided example usage for integration with translator helpers.
- Standardized User-Agent strings for Codex and Claude executors to improve request tracing and compatibility.
- Updated header insertion logic in both executors for consistency.
- Updated `README.md` and `README_CN.md` with steps to install via the Linux installer.
- Acknowledged [brokechubb](https://github.com/brokechubb) for building the installer.
- Added fallback to set default `parametersJsonSchema` when `parameters` key is absent.
- Enhanced logging to capture detailed errors during schema transformation.
- Refined tool declaration appending logic for robustness.
- Updated `README.md` and `README_CN.md` to include additional instructions on requesting undefined models using the `openrouter://` format.
- Added example for `moonshotai/kimi-k2:free` usage.
Implements functionality to parse model names with provider information in the format "provider://model" This allows dynamic provider selection rather than relying only on predefined mappings.
The change affects all execution methods to properly handle these dynamic model specifications while maintaining compatibility with the existing approach for standard model names.
feat(runtime): add Brotli and Zstd compression support, improve response handling
- Implemented Brotli and Zstd decompression handling in `FileRequestLogger` and executor logic for enhanced compatibility.
- Added `decodeResponseBody` utility for streamlined multi-encoding support (Gzip, Deflate, Brotli, Zstd).
- Improved resource cleanup with composite readers for proper closure under all conditions.
- Updated dependencies in `go.mod` and `go.sum` to include Brotli and Zstd libraries.
- Introduced model alias mapping for Claude configurations, enabling upstream and client-facing model name associations.
- Added `computeClaudeModelsHash` to generate a consistent hash for model aliases.
- Implemented `normalizeClaudeKey` function to standardize input API key configuration, including models.
- Enhanced executor to resolve model aliases to upstream names dynamically.
- Updated documentation and configuration examples to reflect new model alias support.
- Replaced `s.currentPath` with `s.configFilePath` for consistent handling of management asset paths.
- Adjusted calls to `managementasset.FilePath` and `StaticDir` to use the updated configuration path.
- Introduce Server.AttachWebsocketRoute(path, handler) to mount websocket
upgrade handlers on the Gin engine.
- Track registered WS paths via wsRoutes with wsRouteMu to prevent
duplicate registrations; initialize in NewServer and import sync.
- Add Manager.UnregisterExecutor(provider) for clean executor lifecycle
management.
- Add github.com/gorilla/websocket v1.5.3 dependency and update go.sum.
Motivation: enable services to expose WS endpoints through the core server
and allow removing auth executors dynamically while avoiding duplicate
route setup. No breaking changes.
feat(translator): add token counting functionality for Gemini, Claude, and CLI
- Introduced `TokenCount` handling across various Codex translators (Gemini, Claude, CLI) with respective implementations.
- Added utility methods for token counting and formatting responses.
- Integrated `tiktoken-go/tokenizer` library for tokenization.
- Updated CodexExecutor with token counting logic to support multiple models including GPT-5 variants.
- Refined go.mod and go.sum to include new dependencies.
feat(runtime): add token counting functionality across executors
- Implemented token counting in OpenAICompatExecutor, QwenExecutor, and IFlowExecutor.
- Added utilities for token counting and response formatting using `tiktoken-go/tokenizer`.
- Integrated token counting into translators for Gemini, Claude, and Gemini CLI.
- Enhanced multiple model support, including GPT-5 variants, for token counting.
docs: update environment variable instructions for multi-model support
- Added details for setting `ANTHROPIC_DEFAULT_OPUS_MODEL`, `ANTHROPIC_DEFAULT_SONNET_MODEL`, and `ANTHROPIC_DEFAULT_HAIKU_MODEL` for version 2.x.x.
- Clarified usage of `ANTHROPIC_MODEL` and `ANTHROPIC_SMALL_FAST_MODEL` for version 1.x.x.
- Expanded examples for setting environment variables across different models including Gemini, GPT-5, Claude, and Qwen3.
docs: add GPT-5 Codex guidelines for CLI usage
- Added detailed guidelines for GPT-5 Codex in Codex CLI.
- Expanded instructions on sandboxing, approvals, editing constraints, and style requirements.
- Included presentation and response formatting best practices.
fix(codex_instructions): update comparison logic to use prefix matching
- Changed system instructions comparison to use `strings.HasPrefix` for improved flexibility.
- Added comprehensive instructions for Codex CLI harness, sandboxing, approvals, and editing constraints to `internal/misc/codex_instructions/`.
- Clarified `approval_policy` configurations and scenarios requiring escalated permissions.
- Provided detailed style and structure guidelines for presenting results in the Codex CLI.
- Updated Execute methods to include enhanced error handling via `StatusCode` and `Headers` extraction.
- Introduced structured error responses for cooling down scenarios, providing additional metadata and retry suggestions.
- Refined quota management, allowing for differentiation between cool-down, disabled, and other block reasons.
- Improved model filtering logic based on client availability and suspension criteria.
- Updated the Execute methods in various executors (GeminiCLIExecutor, GeminiExecutor, IFlowExecutor, OpenAICompatExecutor, QwenExecutor) to return a response and error as named return values for improved clarity.
- Enhanced error handling by deferring failure tracking in usage reporters, ensuring that failures are reported correctly.
- Improved response body handling by ensuring proper closure and error logging for HTTP responses across all executors.
- Added failure tracking and reporting in the usage reporter to capture unsuccessful requests.
- Updated the usage logging structure to include a 'Failed' field for better tracking of request outcomes.
- Adjusted the logic in the RequestStatistics and Record methods to accommodate the new failure tracking mechanism.
- Added detailed logging of upstream request metadata including URL, method, headers, and body for Codex, Gemini, IFlow, OpenAI Compat, and Qwen executors.
- Implemented error logging for API response failures to capture errors during HTTP requests.
- Introduced structured logging for authentication details (AuthID, AuthLabel, AuthType, AuthValue) to improve traceability.
- Updated response logging to include status codes and headers for better debugging.
- Ensured that all executors consistently log API interactions to facilitate monitoring and troubleshooting.
feat(gemini): add Gemini thinking configuration support and metadata normalization
- Introduced logic to parse and apply `thinkingBudget` and `include_thoughts` configurations from metadata.
- Enhanced request handling to include normalized Gemini model metadata, preserving the original model identifier.
- Updated Gemini and Gemini-CLI executors to apply thinking configuration based on metadata overrides.
- Refactored handlers to support metadata extraction and cloning during request preparation.
Add a `MessageStarted` flag to `ConvertOpenAIResponseToAnthropicParams` to ensure the `message_start` event is emitted only once during streaming.
Refactor response handling to detect streaming mode via the `stream` field instead of the `object` type, simplifying the branching logic.
Update the streaming conversion to set `MessageStarted` after sending the `message_start` event, preventing duplicate starts.
These changes improve correctness of streaming response handling for Claude integration.
When from==to (Claude→Claude scenario), directly forward SSE stream
line-by-line without invoking TranslateStream. This preserves the
multi-line SSE event structure (event:/data:/blank) and prevents
JSON parsing errors caused by event fragmentation.
Resolves: JSON parsing error when using Claude Code streaming responses
fix: correct SSE event formatting in Handler layer
Remove duplicate newline additions (\n\n) that were breaking SSE event format.
The Executor layer already provides properly formatted SSE chunks with correct
line endings, so the Handler should forward them as-is without modification.
Changes:
- Remove redundant \n\n addition after each chunk
- Add len(chunk) > 0 check before writing
- Format error messages as proper SSE events (event: error\ndata: {...}\n\n)
- Add chunkIdx counter for future debugging needs
This fixes JSON parsing errors caused by malformed SSE event streams.
fix: update comments for clarity in SSE event forwarding
- Add oldConfigYaml to store previous config snapshot
- Rebuild oldCfg from YAML in UpdateClients for reliable change detection
- Initialize and refresh snapshot on startup and after updates
- Prevents change detection bugs when Management API mutates cfg in place
- Import gopkg.in/yaml.v3
refactor(translator): streamline Codex response handling and remove redundant code
- Updated `ConvertCodexResponseToOpenAIResponses` logic for clarity and consistency.
- Simplified `ConvertCodexResponseToOpenAIResponsesNonStream` by removing unnecessary buffer setup and scanner logic.
- Switched to using `sjson.SetRaw` for improved processing of raw input strings.
feat(translator): map Claude web search tool type to Codex web_search
- Added special handling to replace `web_search_20250305` tool type with `{"type":"web_search"}` in Claude request processing.
- Introduced `Source` field to usage-related structs for better origin tracking.
- Updated `newUsageReporter` to resolve and populate the `Source` attribute.
- Implemented `resolveUsageSource` to determine source from auth metadata or API key.
- Replaced `contentParts` with `aggregatedParts` to support mixed content (text and inline data).
- Introduced `textBuilder` for efficient text concatenation.
- Added support for inline data processing, including base64-encoded image URLs.
- Updated `msg["content"]` logic to handle both plain text and mixed content scenarios.
- Added instructions for configuring a Git repository as a backend for `config.yaml` and token storage.
- Included example environment variable configurations for Docker and Docker Compose.
- Updated both English (README.md) and Chinese (README_CN.md) documentation.
- Added `GitTokenStore` to handle token storage and metadata using Git as a backing storage.
- Implemented methods for initialization, save, retrieval, listing, and deletion of auth files.
- Updated `go.mod` and `go.sum` to include new dependencies for Git integration.
- Integrated support for Git-backed configuration via `GitTokenStore`.
- Updated server logic to clone, initialize, and manage configurations from Git repositories.
- Added helper functions for verifying and synchronizing configuration files.
- Improved error handling and contextual logging for Git operations.
- Modified Dockerfile to include `config.example.yaml` for initial setup.
- Added `gitCommitter` interface to handle Git-based commit and push operations.
- Configured `Watcher` to detect and leverage Git-backed token stores.
- Implemented `commitConfigAsync` and `commitAuthAsync` methods for asynchronous change synchronization.
- Enhanced `GitTokenStore` with `CommitPaths` method to support selective file commits.
- Introduced synchronization with `sync.Mutex` to ensure thread safety.
- Added logic to skip update checks if the last check was performed within the 3-hour interval.
feat(translator): add support for removing `strict` in Gemini request transformation
- Updated API and CLI translators to remove the `strict` path during request transformation, in addition to existing predefined JSON paths.
- Introduced `gemini-2.5-flash-image-preview` model to the registry with updated definitions.
- Enhanced Gemini CLI and API executors to handle image aspect ratio adjustments for the new model.
- Added utility function to create base64 white image placeholders based on aspect ratio configurations.
- Added `glm-4.6` model to registry and documentation.
- Updated Gemini CLI executor to pass configuration to `prepareGeminiCLITokenSource` for improved token management.
- Introduced `gemini-2.5-flash-image` model with updated definitions in registry.
- Enhanced model marker detection in Gemini CLI executor to include support for the new model.
feat(translator): enhance request and response parsing for Gemini API and CLI
- Added support for removing predefined JSON paths (`additionalProperties`, `$schema`, `ref`) during request transformation for Gemini.
- Introduced `FunctionIndex` parameter to manage function call indexing in streaming responses for both API and CLI translators.
- Improved handling of tool call content and function call templates in response parsing logic.
feat(translator): add support for single input string in Codex responses parser
- Modified input parsing logic to handle cases where input is a single string instead of an array.
- Added functionality to convert single string inputs into structured JSON format.
- Added `DefaultConfigPath` variable to manage default configuration file paths.
- Updated `config` flag to use `DefaultConfigPath` for better path handling.
- Added new "Gemini 2.5 Flash Image Preview" model definition, with enhanced image generation capabilities.
- Increased scanner buffer size to 20,971,520 bytes across executors and translators to handle larger payloads.
- Added `ensureGeminiProjectAndOnboard` to streamline project onboarding.
- Implemented API checks for Cloud AI enablement to ensure compatibility.
- Extended record metadata with additional onboarding details such as `auto` and `checked`.
- Centralized OAuth success HTML response in `oauthCallbackSuccessHTML`.
- Introduced callback forwarders for Anthropic, Gemini, Codex, and iFlow OAuth flows.
- Added `is_webui` query parameter detection to enhance Web UI compatibility.
- Implemented mechanisms to start and stop callback forwarders dynamically.
- Improved error handling and logging for callback server initialization.
- Integrated iFlow as a new authentication provider with OAuth.
- Updated README and documentation for iFlow-specific configuration.
- Enhanced CLI and Docker commands to support iFlow login and server setup.
- Expanded model routing to include iFlow-supported models.
Previously, management API routes were conditionally registered at server startup based on the presence of the `remote-management-key`. This static approach meant a server restart was required to enable or disable these endpoints.
This commit refactors the route handling by:
1. Introducing an `atomic.Bool` flag, `managementRoutesEnabled`, to track the state.
2. Always registering the management routes at startup.
3. Adding a new `managementAvailabilityMiddleware` to the management route group.
This middleware checks the `managementRoutesEnabled` flag for each request, rejecting it if management is disabled. This change provides the same initial behavior but creates a more flexible architecture that will allow for dynamically enabling or disabling management routes at runtime in the future.
- Introduced `trimmedRequest` for consistent project ID trimming.
- Improved handling of project ID retrieval from Gemini onboarding responses.
- Added safeguards to maintain the requested project ID when discrepancies occur.
- Refined trimming and normalization logic for `baseURL` and `apiKey` attributes.
- Updated `Authorization` header logic to omit empty API keys.
- Enhanced compatibility processing by handling empty `api-key-entries`.
- Improved legacy format fallback and added safeguards for empty credentials across executor paths.
- Implemented logic to delete `previous_response_id` property from the request body in Codex executor.
- Applied changes consistently across relevant Codex executor paths.
- Introduced `proxyURL` parameter for `EnsureLatestManagementHTML` to enable proxy configuration.
- Refactored HTTP client initialization with new `newHTTPClient` to support proxy-aware requests.
- Updated asset download and fetch logic to utilize the proxy-aware HTTP client.
- Adjusted `server.go` to pass `cfg.ProxyURL` for management asset synchronization calls.
- Dropped unused `context` and `managementasset` imports from `main.go`.
- Removed logic for management control panel asset synchronization, including `EnsureLatestManagementHTML`.
- Simplified server initialization by eliminating related control panel functionality.
- Added explanation for the `remote-management.disable-control-panel` configuration in both `README.md` and `README_CN.md`.
- Clarified its functionality to disable the bundled management UI and return 404 for `/management.html`.
- Introduced `EnsureLatestManagementHTML` to sync `management.html` asset from the latest GitHub release.
- Added config option `DisableControlPanel` to toggle control panel functionality.
- Serve management control panel via `/management.html` endpoint, with automatic download and update mechanism.
- Updated `.gitignore` to include `static/*` directory for control panel assets.
- Replaced `SyncInlineAPIKeys` with `MakeInlineAPIKeyProvider` for better clarity and reduced redundancy.
- Removed legacy logic for inline API key syncing and migration.
- Enhanced provider synchronization logic to handle empty states consistently.
- Added normalization to API key handling across configurations.
- Updated handlers to reflect streamlined provider update logic.
- Added automatic trimming of API keys and migration of legacy `api-keys` to `api-key-entries`.
- Introduced per-key `proxy-url` handling across OpenAI, Codex, and Claude API configurations.
- Updated documentation to clarify usage of `proxy-url` with examples, ensuring backward compatibility.
- Added normalization logic to reduce duplication and improve configuration consistency.
- Added nil check for proxy user credentials to prevent potential nil pointer dereference.
- Enhanced authentication logic for SOCKS5 proxies in `proxy_helpers.go` and `proxy.go`.
- Introduced `ProxyURL` field to Claude and Codex API key configurations.
- Added support for `api-key-entries` in OpenAI compatibility section with per-key proxy configuration.
- Maintained backward compatibility for legacy `api-keys` format.
- Updated logic to prioritize `api-key-entries` where applicable.
- Improved documentation and examples to reflect new proxy support.
- Added `newProxyAwareHTTPClient` to centralize proxy configuration with priority on `auth.ProxyURL` and `cfg.ProxyURL`.
- Integrated enhanced proxy support across executors for HTTP, HTTPS, and SOCKS5 protocols.
- Refactored redundant HTTP client initialization to use `newProxyAwareHTTPClient` for consistent behavior.
- Added SOCKS5 proxy support, including authentication.
- Improved handling of proxy schemes and associated error logging.
- Enhanced transport creation for HTTP, HTTPS, and SOCKS5 proxies with better configuration management.
- Introduced new model definition for `Claude 4.5 Sonnet` with metadata and creation details.
- Ensures compatibility and access to the latest Claude model variant.
- Introduced `promptForProjectSelection` to enable interactive project selection for better user onboarding.
- Improved project validation and handling when no preset project ID is provided.
- Added a default project prompt mechanism to guide users through project selection seamlessly.
- Refined error handling for onboarding and project selection failures.
- Integrated Gemini CLI user setup into the `DoLogin` flow for streamlined authentication.
- Added project selection handling, automatic project detection, and validation of Cloud AI API enablement.
- Implemented new helper functions for Gemini CLI operations, project fetching, and onboarding logic.
- Enhanced token storage and metadata updates for better user and project management.
- Replaced `TokenRecord` with `coreauth.Auth` for centralized and consistent authentication data structures.
- Migrated `TokenStore` interface to `coreauth.Store` for alignment with core CLIProxy authentication.
- Updated related login methods, token persistence logic, and file storage handling to use the new `coreauth.Auth` model.
The OpenAI Codex Responses API (chatgpt.com/backend-api/codex/responses)
rejects requests containing max_output_tokens and max_completion_tokens fields,
causing Factory CLI to fail with "Unsupported parameter" errors.
This fix strips these incompatible fields during request translation, allowing
Factory CLI to work properly with CLIProxyAPI when using ChatGPT Plus/Pro OAuth.
Fixes compatibility issue where Factory sends token limit parameters that aren't
supported by the Codex Responses endpoint.
- Moved `config-api-key` provider logic from SDK to the internal `config_access` package.
- Updated provider registration and initialization to ensure proper management via `Register` function.
- Removed redundant `config-api-key` documentation, simplifying configuration examples.
- Adjusted related imports and reconciliations for seamless integration with the new structure.
Introduces a new utility function, `util.ResolveAuthDir`, to handle the normalization and resolution of the authentication directory path.
Previously, the logic for expanding the tilde (~) to the user's home directory was implemented inline in `main.go`. This refactoring extracts that logic into a reusable function within the `util` package.
The new `ResolveAuthDir` function is now used consistently across the application:
- During initial server startup in `main.go`.
- When counting authentication files in `util.CountAuthFiles`.
- When the configuration is reloaded by the watcher.
This change eliminates code duplication, improves consistency, and makes the path resolution logic more robust and maintainable.
The logic for reconciling access providers, updating the manager, and logging the changes was previously handled directly in the service layer.
This commit introduces a new `ApplyAccessProviders` helper function in the `internal/access` package to encapsulate this entire process. The service layer is updated to use this new helper, which simplifies its implementation and reduces code duplication.
This refactoring centralizes the provider update logic and improves overall code maintainability. Additionally, the `sdk/access` package import is now aliased to `sdkaccess` for clarity.
- Replaced `config.Config` with `SDKConfig` in authentication and provider logic for consistency with SDK changes.
- Updated provider registration, reconciliation, and build functions to align with the `SDKConfig` structure.
- Refactored related imports and handlers to support the new configuration approach.
- Improved clarity and reduced redundancy in API key synchronization and provider initialization.
- Replaced `config.Config` with `config.SDKConfig` across components for simpler configuration management.
- Updated proxy setup functions and handlers to align with `SDKConfig` improvements.
- Reorganized handler imports to match new SDK structure.
- Integrated input, output, reasoning, and total token tracking in response processing for Claude and OpenAI.
- Ensured support for usage details even when specific fields are missing in the response.
- Enhanced completion outputs with aggregated usage details for accurate reporting.
- Implemented mapping for input, output, and total token usage in Gemini OpenAI response processing.
- Ensured compatibility with existing response structure even when specific token details are unavailable.
- Introduced unique `user_id` metadata generation in OpenAI to Claude transformation functions.
- Utilized `uuid` and `sha256` for deterministic `account`, `session`, and `user` values.
- Embedded `user_id` into request payloads to enhance request tracking and identification.
Previously, if an OpenAI compatibility configuration was removed from the
config file or its model list was emptied, the associated models for
that auth entry were not unregistered from the global model registry.
This resulted in stale registrations persisting.
This change ensures that when an auth entry is identified as being for
a compatibility provider, its models are explicitly unregistered if:
- The corresponding configuration is found but has an empty model list.
- The corresponding configuration is no longer found in the config file.
The `ReconcileProviders` function was incorrectly including the default
inline provider (`access.teleport.dev`) in the lists of added, updated,
and removed providers.
The inline provider is a special case managed directly by the access
controller and does not correspond to a separate, reloadable resource.
Including it in the change lists could lead to errors when attempting
to perform lifecycle operations on it.
This commit modifies the reconciliation logic to explicitly ignore the
inline provider when calculating changes. This ensures that only
external, reloadable providers are reported as changed, preventing
incorrect lifecycle management.
The provider reconciliation logic did not correctly handle aliased provider configurations (e.g., using YAML anchors). When a provider config was aliased, the check for configuration equality would pass, causing the system to reuse the existing provider instance without rebuilding it, even if the underlying configuration had changed.
This change introduces a check to detect if the old and new provider configurations point to the same object in memory. If they are aliased, the provider is now always rebuilt to ensure it reflects the latest configuration. The optimization to reuse an existing provider based on deep equality is now only applied to non-aliased providers.
The `RegisterClient` function previously deduplicated the list of models provided by a client. This could lead to an inaccurate representation of the client's state if it intentionally registered the same model ID multiple times.
This change refactors the registration logic to store the raw, unfiltered list of models, preserving their original order and count.
A new `rawModelIDs` slice tracks the complete list for storage in `clientModels`, while the logic for processing changes continues to use a unique set of model IDs for efficiency. This ensures the registry's state accurately reflects what the client provides.
When a client re-registers with the model registry, its previous status for a given model (e.g., quota exceeded or suspended) was not being cleared. This could lead to a situation where a client is permanently unable to use a model even after re-registering.
This change ensures that when a client re-registers an existing model, its ID is removed from the model's `QuotaExceededClients` and `SuspendedClients` lists. This effectively resets the client's status for that model, allowing for a fresh start upon reconnection.
The previous model registration logic used a set-like map to track the models associated with a client. This caused issues when a client registered multiple instances of the same model ID, as they were all treated as a single registration.
This commit refactors the registration logic to use count maps for both the old and new model lists. This allows the system to accurately track the number of instances for each model ID provided by a client.
The changes ensure that:
- When a client updates its model list, the exact number of added or removed instances for each model ID is correctly calculated.
- Provider counts are accurately incremented or decremented based on the number of model instances being added, removed, or having their provider changed.
- The registry correctly handles scenarios where a client reduces the number of duplicate model registrations (e.g., from `[A, A]` to `[A]`), properly deregistering the surplus instance.
When a client re-registered and changed its provider from a non-empty value to an empty string, the logic would still trigger a provider update for the client's models. An empty provider string should not cause an update.
This commit fixes this behavior by adding a check to ensure the new provider is a non-empty string before updating the model's provider information.
Additionally, the logic for detecting a provider change has been simplified by removing an unnecessary variable.
When a client changed its provider and registered a new model in the same `RegisterClient` call, the logic would incorrectly attempt to decrement the provider count for the new model from the old provider. This was because the loop iterated over all new model IDs without checking if they were part of the client's previous registration.
This commit adds a check to ensure that a model existed in the client's old model set before attempting to decrement the old provider's usage count. This prevents incorrect state updates in the registry during provider transitions that also introduce new models.
This commit introduces a reconciliation mechanism for handling configuration updates, significantly improving efficiency and resource management.
Previously, reloading the configuration would tear down and recreate all access providers from scratch, regardless of whether their individual configurations had changed. This was inefficient and could disrupt services.
The new `sdkaccess.ReconcileProviders` function now compares the old and new configurations to intelligently manage the provider lifecycle:
- Unchanged providers are kept.
- New providers are created.
- Providers removed from the config are closed and discarded.
- Providers with updated configurations are gracefully closed and recreated.
To support this, a `Close()` method has been added to the `Provider` interface.
A similar reconciliation logic has been applied to the client registration state in `state.RegisterClient`. This ensures that model registrations are accurately tracked when a client's configuration is updated, correctly handling added, removed, and unchanged models. Enhanced logging provides visibility into these operations.
- Introduced `computeOpenAICompatModelsHash` for generating a stable hash of compatibility models.
- Enhanced `watcher` to include the hash in auth attributes, enabling dynamic updates on model list changes.
- Introduced `usage-statistics-enabled` configuration to control in-memory usage aggregation.
- Updated API to include handlers for managing `usage-statistics-enabled` and `logging-to-file` options.
- Enhanced `watcher` to log changes to both configurations dynamically.
- Updated documentation and examples to reflect new configuration options.
- Added references to EasyCLI and Cli-Proxy-API-Management-Center in both English and Chinese README files.
- Updated SDK documentation section with new links for access, watcher, and custom provider examples.
- Introduced a keep-alive endpoint to monitor service activity.
- Added timeout-specific shutdown functionality when the endpoint is idle.
- Implemented password-protected access for the keep-alive endpoint.
- Updated server startup to support configurable keep-alive options.
- Implemented a global logger with structured formatting for consistent log output.
- Added support for rotating log files using Lumberjack.
- Integrated new logging functionality with Gin HTTP server for unified log handling.
- Replaced direct `log.Info` calls with `fmt.Printf` in non-critical paths to simplify core functionality.
- Added `label` field to the management API for better token identification.
- Updated request payload and validation logic to include `label` as a required field.
- Adjusted documentation (`MANAGEMENT_API.md`, `MANAGEMENT_API_CN.md`) to reflect changes.
- Added extensive SDK usage guides for `cliproxy`, `sdk/access`, and watcher integration.
- Introduced `--password` flag for specifying local management access passwords.
- Enhanced management API with local password checks to secure localhost requests.
- Updated documentation to reflect the new password functionality.
- Eliminated `allow-localhost-unauthenticated` configuration field and its usage.
- Removed associated management API handlers and middleware logic.
- Simplified authentication middleware by deprecating localhost-specific checks.
- Changed log timestamp format in `request_logger.go` to align with ISO standards for improved readability.
- Removed deprecated `force-gpt-5-codex` handlers from management API.
- Eliminated the deprecated `force-gpt-5-codex` configuration option from all configs, handlers, and documentation.
- Updated examples and README files to reflect the removal.
- Streamlined related code by dropping unused fields and logging.
- Updated all authentication commands to use `fmt` instead of `log` for errors, info, and success messages.
- Ensured message formatting aligns across commands for consistent user experience.
- Introduced `POST /gemini-web-token` endpoint to save Gemini Web cookies directly.
- Added payload validation and hashed-based file naming for persistence.
- Updated documentation to reflect the new management API functionality.
- Removed `FileStore` in favor of the new `FileTokenStore`.
- Centralized auth JSON handling and token operations through `FileTokenStore`.
- Updated all components to utilize `FileTokenStore` for consistent storage operations.
- Introduced `SetBaseDir` and directory locking mechanisms for flexible configurations.
- Enhanced metadata management, including path resolution and deep JSON comparisons.
- Introduced `RegisterTokenStore` and `GetTokenStore` to centralize token store access.
- Replaced direct file operations with a unified token persistence API.
- Updated all components to use the shared token store for consistent behavior.
- Improved logging for token save operations to include file paths.
- Introduced in-memory request statistics aggregation in `LoggerPlugin`.
- Added new structures for detailed metrics collection (e.g., token breakdown, request success/failure).
- Implemented `/usage` management API endpoint for retrieving aggregated statistics.
- Updated management handlers to support the new usage statistics functionality.
- Enhanced documentation to describe the usage metrics API.
- Adjusted stream handling to skip "[DONE]" chunks.
- Ensured "data:" prefix is trimmed for non-prefixed input in translation.
- Removed `session_id` from request bodies before processing.
The Gemini Web API client logic has been relocated from `internal/client/gemini-web` to a new, more specific `internal/provider/gemini-web` package. This refactoring improves code organization and modularity by better isolating provider-specific implementations.
As a result of this move, the `GeminiWebState` struct and its methods have been exported (capitalized) to make them accessible from the executor. All call sites have been updated to use the new package path and the exported identifiers.
- Added `ModelState` for detailed per-model runtime status management.
- Implemented methods to manage model-specific error handling, quotas, and recovery logic.
- Enhanced aggregated availability calculations for auth entries with model-specific states.
- Updated retry and recovery logic to operate separately for models and auth entries.
- Improved selector logic to filter based on model states and availability.
- Deleted `ClientAdapter` implementation and associated fallback methods.
- Removed legacy executor logic from `codex`, `claude`, `gemini`, and `qwen` executors.
- Simplified `handlers` by eliminating `UnwrapError` handling and related dependencies.
- Cleaned up `model_registry` by removing logic associated with suspended clients.
- Updated `.gitignore` to ignore `.serena/` directory.
- Implemented `TokenCount` transform method across translators to calculate token usage.
- Integrated token counting logic into executor pipelines for Claude, Gemini, and CLI translators.
- Added corresponding API endpoints and handlers (`/messages/count_tokens`) for token usage retrieval.
- Enhanced translation registry to support `TokenCount` functionality alongside existing response types.
- Switched to a slice-based queue with mutex and condition variable for better control over queuing and dispatching.
- Removed fixed buffer size to handle dynamic queuing.
- Enhanced shutdown logic to safely close the queue and wake up waiting goroutines.
- Added `LoggerPlugin` to log usage metrics for observability.
- Introduced a new `Manager` to handle usage record queuing and plugin registration.
- Integrated new usage reporter and detailed metrics parsing into executors, covering providers like OpenAI, Codex, Claude, and Gemini.
- Improved token usage breakdown across streaming and non-streaming responses.
- Enhanced support for extracting system instructions from input arrays.
- Improved input message role and type determination logic for consistent message processing.
- Refined instruction handling logic across translator types for better compatibility.
- Enhanced support for extracting system instructions from input arrays.
- Improved input message role and type determination logic for consistent message processing.
- Refined instruction handling logic across translator types for better compatibility.
- Added `ConvertCodexResponseToClaudeNonStream`, `ConvertGeminiCLIResponseToClaudeNonStream`, `ConvertGeminiResponseToClaudeNonStream`, and `ConvertOpenAIResponseToClaudeNonStream` methods for handling non-streaming JSON response conversion.
- Introduced logic for parsing and structuring content, handling reasoning, text, and tool usage blocks.
- Enhanced support for stop reasons and refined token usage data aggregation.
This commit simplifies the Gemini web client by removing several complex, stateful features. The previous implementation for auto-refreshing cookies and auto-closing the client involved background goroutines, timers, and file system caching, which made the client's lifecycle difficult to manage.
The following features have been removed:
- The cookie auto-refresh mechanism, including the background goroutine (`rotateCookies`) and related configuration fields.
- The file-based caching for the `__Secure-1PSIDTS` token. The `rotate1PSIDTS` function now fetches a new token on every call.
- The auto-close functionality, which used timers to close the client after a period of inactivity.
- Associated configuration options and methods (`WithAccountLabel`, `WithOnCookiesRefreshed`, `Close`, etc.).
By removing this logic, the client becomes more stateless and predictable. The responsibility for managing the client's lifecycle and handling token expiration is now shifted to the caller, leading to a simpler and more robust integration.
- Integrated ZSTD decompression via `github.com/klauspost/compress` for responses with "zstd" content-encoding.
- Added helper `hasZSTDEcoding` to detect ZSTD-encoded responses.
- Updated response handling logic to initialize and use a ZSTD decoder when necessary.
refactor(api-handlers): split streaming and non-streaming response handling
- Introduced `handleNonStreamingResponse` for processing non-streaming requests in `ClaudeCodeAPIHandler`.
- Improved code clarity by separating streaming and non-streaming logic.
fix(service): remove redundant token refresh interval assignment logic in `cliproxy` service.
- Updated loop iteration in `AuthSelector` to correct index management for selecting candidates.
- Fixed cursor index reset condition for large values to prevent overflow.
- Removed unnecessary conditional reassignment of `allowRemote` in management handler for clarity and correctness.
- Renamed field `NextRefreshAfter` to `NextRetryAfter` across `AuthManager`, `types`, and selector logic.
- Updated references to ensure proper handling of retry timing logic.
- Improved code readability and clarified retry behavior for different auth states.
- Added model suspension with reason tracking for 401 (unauthorized) and 402/403 (payment-related) errors.
- Implemented resumption logic upon model quota recovery or auth state changes.
- Enhanced registry to manage suspended clients, including counts and observability data.
- Updated availability computation to exclude suspended clients, ensuring accurate client model tracking.
- Added async dispatch loop to `Watcher` for handling incremental `AuthUpdate` with in-memory buffering.
- Improved resilience against high-frequency auth changes by coalescing updates and reducing redundant processing.
- Updated `cliproxy` service to increase auth update queue capacity and optimize backlog consumption.
- Added detailed SDK integration documentation in English and Chinese (`sdk-watcher.md`, `sdk-watcher_CN.md`).
- Added support for incremental auth updates using `AuthUpdate` and `AuthUpdateAction`.
- Integrated `SetAuthUpdateQueue` to propagate updates through a dedicated channel.
- Introduced new methods for handling auth add, modify, and delete actions.
- Updated service to ensure auth update queues are correctly initialized and consumed.
- Improved auth state synchronization across core and file-based clients with real-time updates.
- Refactored redundant auth handling logic for better efficiency and maintainability.
- Added `sdk-access.md` and `sdk-access_CN.md` documentation files.
- Included detailed guidelines for authentication manager lifecycle, configuration, built-in and custom providers.
- Documented integration steps with `cliproxy` and instructions for hot reloading.
- Added `CountTokens` for token counting requests in Gemini executor.
- Integrated request translation via `sdktranslator` and response handling.
- Improved error handling, logging, and API request configuration with headers.
- Introduced `CountTokens` method to Codex, Claude, Gemini, Qwen, OpenAI-compatible, and other executors.
- Implemented `ExecuteCount` in `AuthManager` for token counting via provider round-robin.
- Updated handlers to leverage `ExecuteCountWithAuthManager` for streamlined token counting.
- Added fallback and error handling logic for token counting requests.
- Introduced `EnsureHeader` in `internal/misc/header_utils.go` to streamline header setting across executors.
- Updated Codex, Claude, and Gemini executors to utilize `EnsureHeader` for consistent header application.
- Incorporated Gin context headers (if available) into request header manipulation for better integration.
- Added detailed debug logs in all executors (Codex, Claude, Gemini, Qwen, OpenAI-compatible) to capture HTTP status and response body for failed API requests.
- Added detailed debug logs in all executors (Codex, Claude, Gemini, Qwen, OpenAI-compatible) to capture HTTP status and response body for failed API requests.
- Added `last_refresh` timestamp to metadata for Codex, Claude, Qwen, and Gemini executors.
- Implemented `extractLastRefreshTimestamp` utility for parsing diverse timestamp formats in management handlers.
- Ensured consistent update and preservation of `last_refresh` in file-based auth handling.
- Introduced dynamic `providerKey` resolution for OpenAI-compatible providers, incorporating attributes like `provider_key` and `compat_name`.
- Implemented upstream model overrides via `resolveUpstreamModel` and `overrideModel` methods in the OpenAI executor.
- Updated registry logic to correctly store provider mappings and register clients using normalized keys.
- Ensured consistency in handling empty or default provider names across components.
- Added `GinLogrusLogger` for structured request logging using Logrus.
- Implemented `GinLogrusRecovery` to handle panics and log stack traces.
- Configured log rotation using Lumberjack for efficient log management.
- Replaced Gin's default logger and recovery middleware with the custom implementations.
- Deleted `Refresh` implementations in Codex, Claude, Gemini, Qwen, and Gemini-web authenticators.
- Updated the `Authenticator` interface to exclude `Refresh` for cleaner design.
- Revised `Manager` and related components to handle refresh logic improvements.
- Simplified token refresh behavior and eliminated redundant code paths.
- Introduced `logrus` for structured debugging across all executors.
- Added debug log messages in `Refresh` methods for better traceability.
- Updated `Manager` to log additional details during refresh checks.
- Replaced legacy `api-keys` field with `auth.providers` in configuration, supporting multiple authentication providers including `config-api-key`.
- Added synchronization to maintain compatibility with legacy `api-keys`.
- Updated core components like request handling and middleware to use the new provider system.
- Enhanced management API endpoints for seamless integration with `auth.providers`.
- Added `Refresh` method implementations for Codex, Claude, Gemini, and Qwen executors.
- Introduced OAuth-based token handling for Gemini and Qwen with support for refresh tokens.
- Updated Codex and Claude to use new internal auth services.
- Enhanced metadata structure and consistency for token storage across all executors.
- Added `examples/custom-provider/main.go` showcasing custom executor and translator integration using the SDK.
- Removed redundant debug logs from translator modules to enhance code cleanliness.
- Updated SDK documentation with new usage and advanced examples.
- Expanded the management API with new endpoints, including request logging and GPT-5 Codex features.
- Added helper functions to log API request and response payloads in the Gin context.
- Improved `AccountInfo` to support cookie-based authentication in addition to API key and OAuth.
- Updated log messages for better clarity on account types used.
- Renamed constants from uppercase to CamelCase for consistency.
- Replaced redundant file-based auth handling logic with the new `util.CountAuthFiles` helper.
- Fixed various error-handling inconsistencies and enhanced robustness in file operations.
- Streamlined auth client reload logic in server and watcher components.
- Applied minor code readability improvements across multiple packages.
- Unified `dataTag` initialization by removing spaces after `data:`.
- Replaced manual slicing with `bytes.TrimSpace` for consistent and robust handling of JSON payloads.
- Introduced `gemini-2.5-flash-image-preview` alias in `GeminiWebAliasMap` for enhanced model handling.
- Added `gemini-2.5-flash-image-preview` as a new model variant with custom ID, name, display name, and description.
The logic for reading authentication files, which includes retries and a preference for cookie snapshot files, was previously implemented locally within the `watcher` package. This was done to handle potential file locks during writes.
This change moves this functionality into a shared `ReadAuthFileWithRetry` function in the `util` package to promote code reuse and consistency.
The `watcher` package is updated to use this new centralized function. Additionally, the initial token loading in the `run` command now also uses this logic, making it more resilient to file access issues and consistent with the watcher's behavior.
The logic for logging the path where credentials are saved was duplicated across several client implementations.
This commit refactors this behavior by creating a new centralized function, `misc.LogSavingCredentials`, to handle this logging. The `SaveTokenToFile` method in each authentication token storage struct now calls this new function, ensuring consistent logging and reducing code duplication.
The redundant logging statements in the client-level `SaveTokenToFile` methods have been removed.
This commit introduces a generic `cookies.Manager` to centralize the logic for handling cookie snapshots, which was previously duplicated across the Gemini and PaLM clients. This refactoring eliminates code duplication and improves maintainability.
The new `cookies.Manager[T]` in `internal/auth/cookies` orchestrates the lifecycle of cookie data between a temporary snapshot file and the main token file. It provides `Apply`, `Persist`, and `Flush` methods to manage this process.
Key changes:
- A generic `Manager` is created in `internal/auth/cookies`, usable for any token storage type.
- A `Hooks` struct allows for customizable behavior, such as custom merging strategies for different token types.
- Duplicated snapshot handling code has been removed from the `gemini-web` and `palm` persistence packages.
- The `GeminiWebClient` and `PaLMClient` have been updated to use the new `cookies.Manager`.
- The `auth_gemini` and `auth_palm` CLI commands now leverage the client's `Flush` method, simplifying the command logic.
- Cookie snapshot utility functions have been moved from `internal/util/files.go` to a new `internal/util/cookies.go` for better organization.
The logic for managing cookie persistence files was previously implemented directly within the `gemini-web` client's persistence layer. This approach was not reusable and led to duplicated helper functions.
This commit refactors the cookie persistence mechanism by:
- Renaming the concept from "sidecar" to "snapshot" for clarity.
- Extracting file I/O and path manipulation logic into a new, generic `internal/util/cookie_snapshot.go` file.
- Creating reusable utility functions: `WriteCookieSnapshot`, `TryReadCookieSnapshotInto`, and `RemoveCookieSnapshot`.
- Updating the `gemini-web` persistence code to use these new centralized utility functions.
This change improves code organization, reduces duplication, and makes the cookie snapshot functionality easier to maintain and potentially reuse across other clients.
- Eliminated unnecessary calls to `misc.CodexInstructions` and corresponding checks in response processing.
- Streamlined instruction handling by directly updating "instructions" field from the original request payload.
- Simplified `gpt_5_instructions.txt` by removing redundancy and aligning scope with AGENTS.md standard instructions.
- Enhanced clarity and relevance of directives per Codex context.
- Added `CodexInstructions(modelName string)` function to dynamically select instructions based on the model (e.g., GPT-5 Codex).
- Introduced `gpt_5_instructions.txt` and `gpt_5_codex_instructions.txt` for respective model configurations.
- Updated translators to pass `modelName` and use the new instruction logic.
This commit introduces two major enhancements to the Gemini Web client to improve user experience and conversation continuity.
First, it implements a pseudo-streaming mechanism for non-code mode. The Gemini Web API returns the full response at once in this mode, leading to a poor user experience with a long wait for output. This change splits the full response into smaller chunks and sends them with an 80ms delay, simulating a real-time streaming effect.
Second, the conversation context reuse logic is now more robust. A fallback mechanism has been added to reuse conversation metadata when a clear continuation context is detected (e.g., a user replies to an assistant's turn). This improves conversational flow. Metadata lookups have also been improved to check both the canonical model key and its alias for better compatibility.
- Introduced `streamDone` channel to signal the completion of streaming goroutines.
- Updated `processStreamingChunks` to incorporate proper cleanup and defer close operations.
- Ensured `streamWriter` and associated resources are released when streaming completes.
- Add SanitizeSchemaForGemini utility handling union types, allOf, exclusiveMinimum
- Fix both gemini-cli and gemini API translators
- Resolve "Proto field is not repeating, cannot start list" errors
- Maintain backward compatibility with fallback logic
This fixes Claude Code CLI compatibility issues when using tools with either
Gemini CLI credentials or direct Gemini API keys by properly sanitizing
JSON Schema fields that are incompatible with Gemini's Protocol Buffer validation.
- Introduced a new `ForceGPT5Codex` configuration option in settings.
- Added relevant API endpoints for managing `ForceGPT5Codex`.
- Enhanced Codex client to handle GPT-5 Codex-specific logic and mapping.
- Updated example configuration file to include the new option.
Add GPT-5 Codex model support and configuration options in documentation
- Updated all `github.com/luispater/CLIProxyAPI/internal/...` imports to point to `github.com/luispater/CLIProxyAPI/v5/internal/...`.
- Adjusted `go.mod` to specify `module github.com/luispater/CLIProxyAPI/v5`.
- Introduced a new `attemptInfo` structure to track failed login attempts per IP.
- Added logic to temporarily ban IPs exceeding the allowed number of failures.
- Enhanced middleware to reset failed attempt counters on successful authentication.
- Updated `Handler` to include a `failedAttempts` map with thread-safe access.
- Transitioned OAuth callback handling from temporary servers to predefined persistent endpoints.
- Simplified token retrieval by replacing in-memory handling with state-file-based persistence.
- Introduced unified `oauthStatus` map for tracking flow progress and errors.
- Added new `/auth/*/callback` routes, streamlining code and state management for OAuth flows.
- Improved error handling and logging in token exchange and callback flows.
- Implemented new handlers (`RequestGeminiCLIToken`, `RequestAnthropicToken`, `RequestCodexToken`, `RequestQwenToken`) for initiating OAuth flows.
- Added endpoints for authorization URLs in management API.
- Extracted `GenerateRandomState` to a reusable utility in `misc` package.
- Refactored related logic in `openai_login.go` and `anthropic_login.go` for consistency.
Add endpoints for initiating OAuth flows in management API documentation
- Documented new endpoints for Anthropic, Codex, Gemini CLI, and Qwen login URLs.
- Provided request and response examples for each endpoint in both English and Chinese versions.
Add support for API key indexing in OpenAI compatibility clients
- Updated `NewOpenAICompatibilityClient` to accept `apiKeyIndex` for managing multiple API keys.
- Modified client instantiation loops to initialize one client per API key.
- Adjusted client ID format to include `apiKeyIndex` for unique identification.
- Removed API key rotation logic within `GetCurrentAPIKey`.
- Updated `.gitignore` to include `AGENTS.md`.
- Introduced `SetLogLevel` utility function for unified log level management.
- Updated dynamic log level handling across server and watcher components.
- Extended auth files response by extracting and including the `type` field from file content.
- Updated management API documentation with the new `type` field in auth files response.
- Added JSON annotations across all configuration structs in `config.go`.
- Introduced `/config` management API endpoint to fetch complete configuration.
- Updated management API documentation (`MANAGEMENT_API.md`, `MANAGEMENT_API_CN.md`) with `/config` usage.
- Implemented `GetConfig` handler in `config_basic.go`.
- Introduced `Version` variable, set during build via `-ldflags`, to embed application version.
- Updated Dockerfile to accept `APP_VERSION` argument for version injection during build.
- Modified `.goreleaser.yml` to pass GitHub release tag as version via `ldflags`.
- Added version logging in the application startup.
- Introduced `Version` variable, set during build via `-ldflags`, to embed application version.
- Updated Dockerfile to accept `APP_VERSION` argument for version injection during build.
- Modified `.goreleaser.yml` to pass GitHub release tag as version via `ldflags`.
- Added version logging in the application startup.
- replace debounce timing with content-based change detection using SHA256 hashes
- skip client reload when auth file content is unchanged
- handle empty auth files gracefully by ignoring them
- ensure hash cache is updated only on successful client creation
- clean up hash cache when clients are removed
- separate file-based and API key-based clients in watcher
- improve client reloading logic with better locking and error handling
- add dedicated functions for building API key clients and loading file clients
- update combined client map generation to include cached API key clients
- enhance logging and debugging information during client reloads
- fix potential race conditions in client updates and removals
- Introduced `FunctionCallIndex` to track and manage function call indices within `ConvertCliToOpenAIParams`.
- Enhanced handling for `response.completed` and `response.output_item.done` data types to support tool call scenarios.
- Improved logic for restoring original tool names and setting function arguments during response parsing.
- Documented Gemini 2.5 Flash-Lite model in English and Chinese README files.
- Updated README and example configuration to include Codex API key settings.
- Added examples for custom Codex API endpoint configuration.
- Updated imports and function calls to use `filepath` across all token storage implementations and server entry point.
- Ensured consistent handling of directory and file paths for improved portability.
- Introduced reverse mapping logic for tool names in translators to restore original names when shortened.
- Enhanced error handling by logging API response errors consistently across handlers.
- Refactored request and response loggers to include API error details, improving debugging capabilities.
- Integrated robust tool name shortening and uniqueness mechanisms for OpenAI, Gemini, and Claude requests.
- Improved handler retry logic to properly capture and respond to errors.
- Added `buildCombinedClientMap` to merge file-based clients with API key and compatibility clients.
- Updated callbacks to use the combined client map for consistency.
- Improved error logging and variable naming for clarity in client creation logic.
- Refined API documentation structure and enhanced clarity for various endpoints, including `/debug`, `/proxy-url`, `/quota-exceeded`, and authentication management.
- Added comprehensive examples for request and response bodies across endpoints.
- Detailed object-array API key management for Codex, Gemini, Claude, and OpenAI compatibility providers.
- Enhanced descriptions for request retry logic and request logging endpoints.
- Improved authentication key handling descriptions and updated examples for YAML configuration impacts.
- Introduced functionality to handle Codex API keys, including initialization and management via new endpoints in the management API.
- Updated Codex client to support both OAuth and API key authentication.
- Documented Codex API key configuration in both English and Chinese README files.
- Enhanced logging to distinguish between API key and OAuth usage scenarios.
- Added OpenAI Codex (GPT models) support in both English and Chinese README versions.
- Documented multi-account load balancing for Codex in the feature list.
- Added OpenAI Codex (GPT models) support in both English and Chinese README versions.
- Documented multi-account load balancing for Codex in the feature list.
- Updated README and README_CN to include a guide for configuring multiple account load balancing with CLI Proxy API.
- Enhanced JSON handling in gemini translators by differentiating object and string outputs.
- Added commented debug logging for Gemini CLI response conversion.
Simplifies parsing and error handling for function arguments across OpenAI response processing methods. Replaces repeated logic with a reusable utility function.
Add support for API providers that emit `data:` (no space) in Server‑Sent Events. Introduces a new `dataUglyTag` and corresponding parsing logic to correctly process and forward these lines, ensuring compatibility with non‑standard streaming formats.
Fuck for them all
- Renamed `openai` packages to `chat_completions` across translator modules.
- Introduced `openai_responses_handlers` with handlers for `/v1/models` and OpenAI-compatible chat completions endpoints.
- Updated constants and registry identifiers for OpenAI response type.
- Simplified request/response conversions and added detailed retry/error handling.
- Added `golang.org/x/crypto` for additional cryptographic functions.
- Renamed `openai` packages to `chat_completions` across translator modules.
- Introduced `openai_responses_handlers` with handlers for `/v1/models` and OpenAI-compatible chat completions endpoints.
- Updated constants and registry identifiers for OpenAI response type.
- Simplified request/response conversions and added detailed retry/error handling.
- Added `golang.org/x/crypto` for additional cryptographic functions.
- Renamed `openai` packages to `chat_completions` across translator modules.
- Introduced `openai_responses_handlers` with handlers for `/v1/models` and OpenAI-compatible chat completions endpoints.
- Updated constants and registry identifiers for OpenAI response type.
- Simplified request/response conversions and added detailed retry/error handling.
- Added `golang.org/x/crypto` for additional cryptographic functions.
- Renamed `openai` packages to `chat_completions` across translator modules.
- Introduced `openai_responses_handlers` with handlers for `/v1/models` and OpenAI-compatible chat completions endpoints.
- Updated constants and registry identifiers for OpenAI response type.
- Simplified request/response conversions and added detailed retry/error handling.
- Added `golang.org/x/crypto` for additional cryptographic functions.
- Renamed `openai` packages to `chat_completions` across translator modules.
- Introduced `openai_responses_handlers` with handlers for `/v1/models` and OpenAI-compatible chat completions endpoints.
- Updated constants and registry identifiers for OpenAI response type.
- Simplified request/response conversions and added detailed retry/error handling.
- Added `golang.org/x/crypto` for additional cryptographic functions.
- Renamed `openai` packages to `chat_completions` across translator modules.
- Introduced `openai_responses_handlers` with handlers for `/v1/models` and OpenAI-compatible chat completions endpoints.
- Updated constants and registry identifiers for OpenAI response type.
- Simplified request/response conversions and added detailed retry/error handling.
- Added `golang.org/x/crypto` for additional cryptographic functions.
- Changed authentication requirements to mandate management keys for all requests, including local access.
- Clarified remote access setup and key provision methods.
- Adjusted the section header.
- Introduced a `ConvertGeminiRequestToGemini` function to normalize Gemini v1beta requests by ensuring valid or default roles.
- Added passthrough response handlers for both streamed and non-streamed Gemini responses.
- Registered translators for Gemini-to-Gemini traffic in the initialization process.
- Updated `gemini-cli` request normalization to align with the new Gemini translator logic.
- Implemented CRUD operations for authentication files.
- Added endpoints for managing API keys, quotas, proxy settings, and other configurations.
- Enhanced management access with robust validation, remote access control, and persistence support.
- Updated README with new configuration details.
Fixed OpenAI Chat Completions for codex
- Added handling for OpenAI-compatible providers during client reload.
- Implemented client unregistration for old clients during reload.
- Improved logging for detailed client reload insights.
Expand `AuthDir` handling to support tilde (`~`) for home directory resolution
- Added logic to replace `~` with the user's home directory in `AuthDir`.
- Prevents errors when using `~` in configuration paths.
- Implemented `RefreshTokens` method in client interfaces and Gemini clients.
- Updated handlers to call `RefreshTokens` on 401 responses and retry requests if token refresh succeeds.
- Enhanced error handling and retry logic to accommodate token refresh flow.
Update README to include Qwen login instructions
- Added Qwen OAuth login command instructions in both English and Chinese README files.
- Made minor updates to existing command examples for consistency.
- Implemented `RefreshTokens` method in client interfaces and Gemini clients.
- Updated handlers to call `RefreshTokens` on 401 responses and retry requests if token refresh succeeds.
- Enhanced error handling and retry logic to accommodate token refresh flow.
- Added a note for setting `auth-dir` on Windows systems in both English and Chinese README files.
- Improved descriptions for existing configuration options.
Address Qwen3 tool injection issue to prevent random token insertions
- Modify Qwen client to insert a placeholder tool when none is defined, avoiding erratic behavior in streaming responses.
- Added a note for setting `auth-dir` on Windows systems in both English and Chinese README files.
- Improved descriptions for existing configuration options.
- Updated all handlers to safely unlock the request mutex only if it's non-nil.
- Enhanced mutex locking and unlocking logic to avoid runtime errors.
- Improved robustness of resource cleanup across clients.
Add `GetRequestMutex` method for synchronization across clients
- Introduced a new `GetRequestMutex` method in OpenAICompatibilityClient, CodexClient, GeminiCLIClient, GeminiClient, and QwenClient for request synchronization.
- Ensures only one request is processed at a time to manage quotas effectively.
- Implemented `/v1/completions` endpoint mirroring OpenAI's completions API specification.
- Added conversion functions to translate between completions and chat completions formats.
- Introduced streaming and non-streaming response handling for completions requests.
- Updated `server.go` to register the new endpoint and include it in the API's metadata.
- Comment out verbose routing logs in the API server to reduce noise.
- Remove the `tools` field from Qwen client requests when it is an empty array.
- Add guards in Claude, Codex, Gemini‑CLI, and Gemini translators to skip tool conversion when the `tools` array is empty, preventing unnecessary payload modifications.
- Introduced `AllowLocalhostUnauthenticated` flag allowing unauthenticated requests from localhost.
- Updated authentication middleware to bypass checks for localhost when enabled.
Add new Gemini CLI models and update model registry function
- Introduced `GetGeminiCLIModels` for updated Gemini CLI model definitions.
- Added new models: "Gemini 2.5 Flash Lite" and "Gemini 2.5 Pro".
- Updated `RegisterModels` to use `GetGeminiCLIModels` in Gemini client initialization.
- Introduced OpenAI compatibility configurations for external providers, enabling model alias routing via the OpenAI API format.
- Enhanced provider logic in `GetProviderName` to handle OpenAI aliases and added new helper functions for compatibility checks.
- Updated API handlers and client initialization to support OpenAI compatibility models.
- Improved resource cleanup across clients by closing response bodies and streams using deferred functions.
- Introduced retry counter with a configurable ` RequestRetry ` limit in all handlers.
- Enhanced error handling with specific HTTP status codes for switching clients.
- Standardized response forwarding for non-retriable errors.
- Improved logging for quota and client switch scenarios.
- Simplified variable initialization in `browser.go` for readability.
- Updated error handling in `request_logger.go` with better resource cleanup using deferred anonymous functions.
Refactor API handlers to use `GetContextWithCancel` for streamlined context creation and response handling
- Replaced redundant `context.WithCancel` and `context.WithValue` logic with the new `GetContextWithCancel` utility in all handlers.
- Centralized API response storage in the given context during cancellation.
- Updated associated cancellation calls for consistency and improved resource management.
- Replaced `apiResponseData` with `AddAPIResponseData` for centralized response recording.
- Simplified cancellation logic by switching to a boolean-based `cliCancel` method.
- Removed unused `apiResponseData` slices across handlers to reduce memory usage.
- Updated `handlers.go` to support unified response data storage per request context.
A proxy server that provides an OpenAI/Gemini/Claude compatible API interface for CLI. This allows you to use CLI models with tools and libraries designed for the OpenAI/Gemini/Claude API.
A proxy server that provides OpenAI/Gemini/Claude/Codex compatible API interfaces for CLI.
## Features
It now also supports OpenAI Codex (GPT models) and Claude Code via OAuth.
So you can use local or multi-account CLI access with OpenAI(include Responses)/Gemini/Claude-compatible clients and SDKs.
This project is sponsored by Z.ai, supporting us with their GLM CODING PLAN.
GLM CODING PLAN is a subscription service designed for AI coding, starting at just $10/month. It provides access to their flagship GLM-4.7 & (GLM-5 Only Available for Pro Users)model across 10+ popular AI coding tools (Claude Code, Cline, Roo Code, etc.), offering developers top-tier, fast, and stable coding experiences.
Get 10% OFF GLM CODING PLAN:https://z.ai/subscribe?ic=8JVLJQFSKB
<td>Thanks to PackyCode for sponsoring this project! PackyCode is a reliable and efficient API relay service provider, offering relay services for Claude Code, Codex, Gemini, and more. PackyCode provides special discounts for our software users: register using <a href="https://www.packyapi.com/register?aff=cliproxyapi">this link</a> and enter the "cliproxyapi" promo code during recharge to get 10% off.</td>
<td>Thanks to AICodeMirror for sponsoring this project! AICodeMirror provides official high-stability relay services for Claude Code / Codex / Gemini CLI, with enterprise-grade concurrency, fast invoicing, and 24/7 dedicated technical support. Claude Code / Codex / Gemini official channels at 38% / 2% / 9% of original price, with extra discounts on top-ups! AICodeMirror offers special benefits for CLIProxyAPI users: register via <a href="https://www.aicodemirror.com/register?invitecode=TJNAIF">this link</a> to enjoy 20% off your first top-up, and enterprise customers can get up to 25% off!</td>
</tr>
</tbody>
</table>
## Overview
- OpenAI/Gemini/Claude compatible API endpoints for CLI models
-Support for both streaming and non-streaming responses
-OpenAI Codex support (GPT models) via OAuth login
- Claude Code support via OAuth login
- Qwen Code support via OAuth login
- iFlow support via OAuth login
- Amp CLI and IDE extensions support with provider routing
- Streaming and non-streaming responses
- Function calling/tools support
- Multimodal input support (text and images)
- Multiple account support with load balancing
- Simple CLI authentication flow
- Support for Generative Language API Key
-Support Gemini CLI with multiple account load balancing
- Multiple accounts with round-robin load balancing (Gemini, OpenAI, Claude, Qwen and iFlow)
CLIProxyAPI includes integrated support for [Amp CLI](https://ampcode.com) and Amp IDE extensions, enabling you to use your Google/ChatGPT/Claude OAuth subscriptions with Amp's coding tools:
## Usage
- Provider route aliases for Amp's API patterns (`/api/provider/{provider}/v1...`)
- Management proxy for OAuth authentication and account features
- Smart model fallback with automatic routing
- **Model mapping** to route unavailable models to alternatives (e.g., `claude-opus-4.5` → `claude-sonnet-4`)
- Security-first design with localhost-only management endpoints
- And it automates switching to various preview versions
## Configuration
The server uses a YAML configuration file (`config.yaml`) located in the project root directory by default. You can specify a different configuration file path using the `--config` flag:
| `api-keys` | string[] | [] | List of API keys that can be used to authenticate requests |
| `generative-language-api-key` | string[] | [] | List of Generative Language API keys |
### Example Configuration File
```yaml
# Server port
port: 8317
# Authentication directory (supports ~ for home directory)
auth-dir: "~/.cli-proxy-api"
# Enable debug logging
debug: false
# Proxy url, support socks5/http/https protocol, example: socks5://user:pass@192.168.1.1:1080/
proxy-url: ""
# Quota exceeded behavior
quota-exceeded:
switch-project: true # Whether to automatically switch to another project when a quota is exceeded
switch-preview-model: true # Whether to automatically switch to a preview model when a quota is exceeded
# API keys for authentication
api-keys:
- "your-api-key-1"
- "your-api-key-2"
# API keys for official Generative Language API
generative-language-api-key:
- "AIzaSy...01"
- "AIzaSy...02"
- "AIzaSy...03"
- "AIzaSy...04"
```
### Authentication Directory
The `auth-dir` parameter specifies where authentication tokens are stored. When you run the login command, the application will create JSON files in this directory containing the authentication tokens for your Google accounts. Multiple accounts can be used for load balancing.
### API Keys
The `api-keys` parameter allows you to define a list of API keys that can be used to authenticate requests to your proxy server. When making requests to the API, you can include one of these keys in the `Authorization` header:
```
Authorization: Bearer your-api-key-1
```
### Official Generative Language API
The `generative-language-api-key` parameter allows you to define a list of API keys that can be used to authenticate requests to the official Generative Language API.
## Gemini CLI with multiple account load balancing
Start CLI Proxy API server, and then set the `CODE_ASSIST_ENDPOINT` environment variable to the URL of the CLI Proxy API server.
The server will relay the `loadCodeAssist`, `onboardUser`, and `countTokens` requests. And automatically load balance the text generation requests between the multiple accounts.
> [!NOTE]
> This feature only allows local access because I couldn't find a way to authenticate the requests.
> I hardcoded `127.0.0.1` into the load balancing.
Browser-based tool to translate SRT subtitles using your Gemini subscription via CLIProxyAPI with automatic validation/error correction - no API keys needed
CLI wrapper for instant switching between multiple Claude accounts and alternative models (Gemini, Codex, Antigravity) via CLIProxyAPI OAuth - no API keys needed
Native macOS menu bar app that unifies Claude, Gemini, OpenAI, Qwen, and Antigravity subscriptions with real-time quota tracking and smart auto-failover for AI coding tools like Claude Code, OpenCode, and Droid - no API keys needed.
### [CodMate](https://github.com/loocor/CodMate)
Native macOS SwiftUI app for managing CLI AI sessions (Codex, Claude Code, Gemini CLI) with unified provider management, Git review, project organization, global search, and terminal integration. Integrates CLIProxyAPI to provide OAuth authentication for Codex, Claude, Gemini, Antigravity, and Qwen Code, with built-in and third-party provider rerouting through a single proxy endpoint - no API keys needed for OAuth providers.
VSCode extension for quick switching between Claude Code models, featuring integrated CLIProxyAPI as its backend with automatic background lifecycle management.
Windows desktop app built with Tauri + React for monitoring AI coding assistant quotas via CLIProxyAPI. Track usage across Gemini, Claude, OpenAI Codex, and Antigravity accounts with real-time dashboard, system tray integration, and one-click proxy control - no API keys needed.
A lightweight web admin panel for CLIProxyAPI with health checks, resource monitoring, real-time logs, auto-update, request statistics and pricing display. Supports one-click installation and systemd service.
A Windows tray application implemented using PowerShell scripts, without relying on any third-party libraries. The main features include: automatic creation of shortcuts, silent running, password management, channel switching (Main / Plus), and automatic downloading and updating.
### [霖君](https://github.com/wangdabaoqq/LinJun)
霖君 is a cross-platform desktop application for managing AI programming assistants, supporting macOS, Windows, and Linux systems. Unified management of Claude Code, Gemini CLI, OpenAI Codex, Qwen Code, and other AI coding tools, with local proxy for multi-account quota tracking and one-click configuration.
A modern web-based management dashboard for CLIProxyAPI built with Next.js, React, and PostgreSQL. Features real-time log streaming, structured configuration editing, API key management, OAuth provider integration for Claude/Gemini/Codex, usage analytics, container management, and config sync with OpenCode via companion plugin - no manual YAML editing needed.
> [!NOTE]
> If you developed a project based on CLIProxyAPI, please open a PR to add it to this list.
## More choices
Those projects are ports of CLIProxyAPI or inspired by it:
### [9Router](https://github.com/decolua/9router)
A Next.js implementation inspired by CLIProxyAPI, easy to install and use, built from scratch with format translation (OpenAI/Claude/Gemini/Ollama), combo system with auto-fallback, multi-account management with exponential backoff, a Next.js web dashboard, and support for CLI tools (Cursor, Claude Code, Cline, RooCode) - no API keys needed.
Never stop coding. Smart routing to FREE & low-cost AI models with automatic fallback.
OmniRoute is an AI gateway for multi-provider LLMs: an OpenAI-compatible endpoint with smart routing, load balancing, retries, and fallbacks. Add policies, rate limits, caching, and observability for reliable, cost-aware inference.
> [!NOTE]
> If you have developed a port of CLIProxyAPI or a project inspired by it, please open a PR to add it to this list.
## License
This project is licensed under the MIT License - see the [LICENSE](LICENSE) file for details.
The `github.com/router-for-me/CLIProxyAPI/v6/sdk/access` package centralizes inbound request authentication for the proxy. It offers a lightweight manager that chains credential providers, so servers can reuse the same access control logic inside or outside the CLI runtime.
// Supplied credentials were present but rejected.
default:
// Internal/transport failure was returned by a provider.
}
```
`Manager.Authenticate` walks the configured providers in order. It returns on the first success, skips providers that return `AuthErrorCodeNotHandled`, and aggregates `AuthErrorCodeNoCredentials` / `AuthErrorCodeInvalidCredential` for a final result.
Each `Result` includes the provider identifier, the resolved principal, and optional metadata (for example, which header carried the credential).
## Built-in `config-api-key` Provider
The proxy includes one built-in access provider:
-`config-api-key`: Validates API keys declared under top-level `api-keys`.
The blank identifier import ensures `init` runs so `sdkaccess.RegisterProvider` executes before you call `RegisteredProviders()` (or before `cliproxy.NewBuilder().Build()`).
### Metadata and auditing
`Result.Metadata` carries provider-specific context. The built-in `config-api-key` provider, for example, stores the credential source (`authorization`, `x-goog-api-key`, `x-api-key`, `query-key`, `query-auth-token`). Populate this map in custom providers to enrich logs and downstream auditing.
A provider must implement `Identifier()` and `Authenticate()`. To make it available to the access manager, call `RegisterProvider` inside `init` with an initialized provider instance.
## Error Semantics
-`NewNoCredentialsError()` (`AuthErrorCodeNoCredentials`): no credentials were present or recognized. (HTTP 401)
-`NewInvalidCredentialError()` (`AuthErrorCodeInvalidCredential`): credentials were present but rejected. (HTTP 401)
-`NewNotHandledError()` (`AuthErrorCodeNotHandled`): fall through to the next provider.
Errors propagate immediately to the caller unless they are classified as `not_handled` / `no_credentials` / `invalid_credential` and can be aggregated by the manager.
## Integration with cliproxy Service
`sdk/cliproxy` wires `@sdk/access` automatically when you build a CLI service via `cliproxy.NewBuilder`. Supplying a manager lets you reuse the same instance in your host process:
```go
coreCfg,_:=config.LoadConfig("config.yaml")
accessManager:=sdkaccess.NewManager()
svc,_:=cliproxy.NewBuilder().
WithConfig(coreCfg).
WithConfigPath("config.yaml").
WithRequestAccessManager(accessManager).
Build()
```
Register any custom providers (typically via blank imports) before calling `Build()` so they are present in the global registry snapshot.
### Hot reloading
When configuration changes, refresh any config-backed providers and then reset the manager's provider chain:
```go
// configaccess is github.com/router-for-me/CLIProxyAPI/v6/internal/access/config_access
This guide explains how to extend the embedded proxy with custom providers and schemas using the SDK. You will:
- Implement a provider executor that talks to your upstream API
- Register request/response translators for schema conversion
- Register models so they appear in `/v1/models`
The examples use Go 1.24+ and the v6 module path.
## Concepts
- Provider executor: a runtime component implementing `auth.ProviderExecutor` that performs outbound calls for a given provider key (e.g., `gemini`, `claude`, `codex`). Executors can also implement `RequestPreparer` to inject credentials on raw HTTP requests.
- Translator registry: schema conversion functions routed by `sdk/translator`. The built‑in handlers translate between OpenAI/Gemini/Claude/Codex formats; you can register new ones.
- Model registry: publishes the list of available models per client/provider to power `/v1/models` and routing hints.
## 1) Implement a Provider Executor
Create a type that satisfies `auth.ProviderExecutor`.
If your auth entries use provider `"myprov"`, the manager routes requests to your executor.
## 2) Register Translators
The handlers accept OpenAI/Gemini/Claude/Codex inputs. To support a new provider format, register translation functions in `sdk/translator`’s default registry.
Direction matters:
- Request: register from inbound schema to provider schema
- Response: register from provider schema back to inbound schema
Example: Convert OpenAI Chat → MyProv Chat and back.
The embedded server calls this automatically for built‑in providers; for custom providers, register during startup (e.g., after loading auths) or upon auth registration hooks.
## Credentials & Transports
- Use `Manager.SetRoundTripperProvider` to inject per‑auth `*http.Transport` (e.g., proxy):
```go
core.SetRoundTripperProvider(myProvider) // returns transport per auth
```
- For raw HTTP flows, implement `PrepareRequest` and/or call `Manager.InjectCredentials(req, authID)` to set headers.
## Testing Tips
- Enable request logging: Management API GET/PUT `/v0/management/request-log`
- Toggle debug logs: Management API GET/PUT `/v0/management/debug`
- Hot reload changes in `config.yaml` and `auths/` are picked up automatically by the watcher
The `sdk/cliproxy` module exposes the proxy as a reusable Go library so external programs can embed the routing, authentication, hot‑reload, and translation layers without depending on the CLI binary.
## Install & Import
```bash
go get github.com/router-for-me/CLIProxyAPI/v6/sdk/cliproxy
- See MANAGEMENT_API.md for endpoints. Your embedded server exposes them under `/v0/management` on the configured port.
## Using the Core Auth Manager
The service uses a core `auth.Manager` for selection, execution, and auto‑refresh. When embedding, you can provide your own manager to customize transports or hooks:
Note: Built‑in provider executors are wired automatically when you run the `Service`. If you want to use `Manager` stand‑alone without the HTTP server, you must register your own executors that implement `auth.ProviderExecutor`.
## Custom Client Sources
Replace the default loaders if your creds live outside the local filesystem:
The SDK service exposes a watcher integration that surfaces granular auth updates without forcing a full reload. This document explains the queue contract, how the service consumes updates, and how high-frequency change bursts are handled.
## Update Queue Contract
-`watcher.AuthUpdate` represents a single credential change. `Action` may be `add`, `modify`, or `delete`, and `ID` carries the credential identifier. For `add`/`modify` the `Auth` payload contains a fully populated clone of the credential; `delete` may omit `Auth`.
-`WatcherWrapper.SetAuthUpdateQueue(chan<- watcher.AuthUpdate)` wires the queue produced by the SDK service into the watcher. The queue must be created before the watcher starts.
- The service builds the queue via `ensureAuthUpdateQueue`, using a buffered channel (`capacity=256`) and a dedicated consumer goroutine (`consumeAuthUpdates`). The consumer drains bursts by looping through the backlog before reacquiring the select loop.
## Watcher Behaviour
-`internal/watcher/watcher.go` keeps a shadow snapshot of auth state (`currentAuths`). Each filesystem or configuration event triggers a recomputation and a diff against the previous snapshot to produce minimal `AuthUpdate` entries that mirror adds, edits, and removals.
- Updates are coalesced per credential identifier. If multiple changes occur before dispatch (e.g., write followed by delete), only the final action is sent downstream.
- The watcher runs an internal dispatch loop that buffers pending updates in memory and forwards them asynchronously to the queue. Producers never block on channel capacity; they just enqueue into the in-memory buffer and signal the dispatcher. Dispatch cancellation happens when the watcher stops, guaranteeing goroutines exit cleanly.
## High-Frequency Change Handling
- The dispatch loop and service consumer run independently, preventing filesystem watchers from blocking even when many updates arrive at once.
- Back-pressure is absorbed in two places:
- The dispatch buffer (map + order slice) coalesces repeated updates for the same credential until the consumer catches up.
- The service channel capacity (256) combined with the consumer drain loop ensures several bursts can be processed without oscillation.
- If the queue is saturated for an extended period, updates continue to be merged, so the latest state is eventually applied without replaying redundant intermediate states.
## Usage Checklist
1. Instantiate the SDK service (builder or manual construction).
2. Call `ensureAuthUpdateQueue` before starting the watcher to allocate the shared channel.
3. When the `WatcherWrapper` is created, call `SetAuthUpdateQueue` with the service queue, then start the watcher.
4. Provide a reload callback that handles configuration updates; auth deltas will arrive via the queue and are applied by the service automatically through `handleAuthUpdate`.
Following this flow keeps auth changes responsive while avoiding full reloads for every edit.
fmt.Printf("Translated request to Gemini format:\n%s\n\n",translatedRequest)
claudeResponse:=[]byte(`{"candidates":[{"content":{"role":"model","parts":[{"thought":true,"text":"Okay, here's what's going through my mind. I need to schedule a meeting"},{"thoughtSignature":"","functionCall":{"name":"schedule_meeting","args":{"topic":"Q3 planning","attendees":["Bob","Alice"],"time":"10:00","date":"2025-03-27"}}}]},"finishReason":"STOP","avgLogprobs":-0.50018133435930523}],"usageMetadata":{"promptTokenCount":117,"candidatesTokenCount":28,"totalTokenCount":474,"trafficType":"PROVISIONED_THROUGHPUT","promptTokensDetails":[{"modality":"TEXT","tokenCount":117}],"candidatesTokensDetails":[{"modality":"TEXT","tokenCount":28}],"thoughtsTokenCount":329},"modelVersion":"gemini-2.5-pro","createTime":"2025-08-15T04:12:55.249090Z","responseId":"x7OeaIKaD6CU48APvNXDyA4"}`)
returnnil,&client.ErrorMessage{StatusCode:429,Error:fmt.Errorf(`{"error":{"code":429,"message":"All the models of '%s' are quota exceeded","status":"RESOURCE_EXHAUSTED"}}`,modelName)}
}
locked:=false
fori:=0;i<len(reorderedClients);i++{
cliClient=reorderedClients[i]
ifcliClient.RequestMutex.TryLock(){
locked=true
break
}
}
if!locked{
cliClient=h.CliClients[0]
cliClient.RequestMutex.Lock()
}
returncliClient,nil
}
// GetAlt extracts the 'alt' parameter from the request query string.
// It checks both 'alt' and '$alt' parameters and returns the appropriate value.
// model mapping already logged in mapper; avoid duplicate here
caseRouteTypeAmpCredits:
fields["cost"]="amp_credits"
fields["source"]="ampcode.com"
fields["model_id"]=requestedModel// Explicit model_id for easy config reference
log.WithFields(fields).Warnf("forwarding to ampcode.com (uses amp credits) - model_id: %s | To use local provider, add to config: ampcode.model-mappings: [{from: \"%s\", to: \"<your-local-model>\"}]",requestedModel,requestedModel)
caseRouteTypeNoProvider:
fields["cost"]="none"
fields["source"]="error"
fields["model_id"]=requestedModel// Explicit model_id for easy config reference
log.WithFields(fields).Warnf("no provider available for model_id: %s",requestedModel)
}
}
// FallbackHandler wraps a standard handler with fallback logic to ampcode.com
// when the model's provider is not available in CLIProxyAPI
typeFallbackHandlerstruct{
getProxyfunc()*httputil.ReverseProxy
modelMapperModelMapper
forceModelMappingsfunc()bool
}
// NewFallbackHandler creates a new fallback handler wrapper
// The getProxy function allows lazy evaluation of the proxy (useful when proxy is created after routes)
log.Debugf("CLI login required.\nAttempting to open authentication page in your browser.\nIf it does not open, please navigate to this URL:\n\n%s\n",authURL)
varerrerror
err=open.Run(authURL)
iferr!=nil{
log.Errorf("Failed to open browser: %v. Please open the URL manually.",err)
}
// Wait for the authorization code or an error.
varauthCodestring
select{
casecode:=<-codeChan:
authCode=code
caseerr=<-errChan:
returnnil,err
case<-time.After(5*time.Minute):// Timeout
returnnil,fmt.Errorf("oauth flow timed out")
}
// Shutdown the server.
iferr=server.Shutdown(ctx);err!=nil{
log.Errorf("Failed to shut down server: %v",err)
}
// Exchange the authorization code for a token.
token,err:=config.Exchange(ctx,authCode)
iferr!=nil{
returnnil,fmt.Errorf("failed to exchange token: %w",err)
returnnil,fmt.Errorf("failed to unmarshal JWT claims: %w",err)
}
return&claims,nil
}
// base64URLDecode decodes a Base64 URL-encoded string, adding padding if necessary.
// JWTs use a URL-safe Base64 alphabet and omit padding, so this function ensures
// correct decoding by re-adding the padding before decoding.
funcbase64URLDecode(datastring)([]byte,error){
// Add padding if necessary
switchlen(data)%4{
case2:
data+="=="
case3:
data+="="
}
returnbase64.URLEncoding.DecodeString(data)
}
// GetUserEmail extracts the user's email address from the JWT claims.
func(c*JWTClaims)GetUserEmail()string{
returnc.Email
}
// GetAccountID extracts the user's account ID (subject) from the JWT claims.
// It retrieves the unique identifier for the user's ChatGPT account.
func(c*JWTClaims)GetAccountID()string{
returnc.CodexAuthInfo.ChatgptAccountID
}
Some files were not shown because too many files have changed in this diff
Show More
Reference in New Issue
Block a user
Blocking a user prevents them from interacting with repositories, such as opening or commenting on pull requests or issues. Learn more about blocking a user.