docs: add API usage guide with multi-language code examples

curl, Python(Anthropic/OpenAI SDK), Node.js(Anthropic/OpenAI SDK), Claude Code CLI 등 다양한 호출 방법 정리 Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
docs: add Docker deployment and reverse proxy setup guides
2026-03-31 21:23:22 +09:00 · 2026-03-31 21:20:55 +09:00 · 2026-03-31 07:23:02 +08:00 · 2026-03-30 22:22:42 +08:00 · 2026-03-30 16:59:46 +08:00 · 2026-03-30 15:09:33 +08:00
22 changed files with 1601 additions and 471 deletions
--- a/API_USAGE.md
+++ b/API_USAGE.md
@@ -0,0 +1,343 @@
+# CLIProxyAPI 호출 가이드
+
+## 접속 정보
+
+| 항목 | 값 |
+|------|-----|
+| 외부 URL | `https://cliproxy.gru.farm` |
+| 내부 URL | `http://192.168.0.17:8317` |
+| API 키 | `Jinie4eva!` |
+| 인증 방식 | `Authorization: Bearer <API키>` |
+
+## 엔드포인트
+
+| 용도 | 경로 |
+|------|------|
+| Claude 네이티브 (권장) | `/api/provider/claude/v1/messages` |
+| OpenAI 호환 | `/v1/chat/completions` |
+| 모델 목록 | `/v1/models` |
+
+## 사용 가능한 모델
+
+| 모델 ID | 설명 |
+|---------|------|
+| `claude-sonnet-4-6` | Claude Sonnet 4.6 (최신, 권장) |
+| `claude-opus-4-6` | Claude Opus 4.6 (최고 성능) |
+| `claude-sonnet-4-5-20250929` | Claude Sonnet 4.5 |
+| `claude-opus-4-5-20251101` | Claude Opus 4.5 |
+| `claude-haiku-4-5-20251001` | Claude Haiku 4.5 (경량/빠름) |
+| `claude-sonnet-4-20250514` | Claude Sonnet 4 |
+| `claude-opus-4-20250514` | Claude Opus 4 |
+| `claude-3-7-sonnet-20250219` | Claude 3.7 Sonnet |
+| `claude-3-5-haiku-20241022` | Claude 3.5 Haiku |
+
+---
+
+## 1. curl
+
+### 기본 호출
+
+```bash
+curl -X POST https://cliproxy.gru.farm/api/provider/claude/v1/messages \
+  -H "Authorization: Bearer Jinie4eva!" \
+  -H "anthropic-version: 2023-06-01" \
+  -H "Content-Type: application/json" \
+  -d '{
+    "model": "claude-sonnet-4-6",
+    "max_tokens": 1024,
+    "messages": [
+      {"role": "user", "content": "안녕! 간단히 소개해줘"}
+    ]
+  }'
+```
+
+### 스트리밍
+
+```bash
+curl -X POST https://cliproxy.gru.farm/api/provider/claude/v1/messages \
+  -H "Authorization: Bearer Jinie4eva!" \
+  -H "anthropic-version: 2023-06-01" \
+  -H "Content-Type: application/json" \
+  -d '{
+    "model": "claude-sonnet-4-6",
+    "max_tokens": 1024,
+    "stream": true,
+    "messages": [
+      {"role": "user", "content": "안녕!"}
+    ]
+  }'
+```
+
+### 모델 목록 조회
+
+```bash
+curl https://cliproxy.gru.farm/v1/models \
+  -H "Authorization: Bearer Jinie4eva!"
+```
+
+---
+
+## 2. Python — Anthropic SDK
+
+### 설치
+
+```bash
+pip install anthropic
+```
+
+### 기본 호출
+
+```python
+from anthropic import Anthropic
+
+client = Anthropic(
+    base_url="https://cliproxy.gru.farm/api/provider/claude",
+    api_key="Jinie4eva!"
+)
+
+response = client.messages.create(
+    model="claude-sonnet-4-6",
+    max_tokens=1024,
+    messages=[
+        {"role": "user", "content": "안녕! 간단히 소개해줘"}
+    ]
+)
+
+print(response.content[0].text)
+```
+
+### 스트리밍
+
+```python
+from anthropic import Anthropic
+
+client = Anthropic(
+    base_url="https://cliproxy.gru.farm/api/provider/claude",
+    api_key="Jinie4eva!"
+)
+
+with client.messages.stream(
+    model="claude-sonnet-4-6",
+    max_tokens=1024,
+    messages=[
+        {"role": "user", "content": "안녕! 간단히 소개해줘"}
+    ]
+) as stream:
+    for text in stream.text_stream:
+        print(text, end="", flush=True)
+```
+
+### 시스템 프롬프트 + 멀티턴
+
+```python
+from anthropic import Anthropic
+
+client = Anthropic(
+    base_url="https://cliproxy.gru.farm/api/provider/claude",
+    api_key="Jinie4eva!"
+)
+
+response = client.messages.create(
+    model="claude-sonnet-4-6",
+    max_tokens=1024,
+    system="당신은 친절한 한국어 AI 어시스턴트입니다.",
+    messages=[
+        {"role": "user", "content": "파이썬이 뭐야?"},
+        {"role": "assistant", "content": "파이썬은 프로그래밍 언어입니다."},
+        {"role": "user", "content": "그럼 자바스크립트는?"}
+    ]
+)
+
+print(response.content[0].text)
+```
+
+---
+
+## 3. Python — OpenAI SDK (호환 모드)
+
+### 설치
+
+```bash
+pip install openai
+```
+
+### 기본 호출
+
+```python
+from openai import OpenAI
+
+client = OpenAI(
+    base_url="https://cliproxy.gru.farm/v1",
+    api_key="Jinie4eva!"
+)
+
+response = client.chat.completions.create(
+    model="claude-sonnet-4-6",
+    messages=[
+        {"role": "user", "content": "안녕!"}
+    ]
+)
+
+print(response.choices[0].message.content)
+```
+
+### 스트리밍
+
+```python
+from openai import OpenAI
+
+client = OpenAI(
+    base_url="https://cliproxy.gru.farm/v1",
+    api_key="Jinie4eva!"
+)
+
+stream = client.chat.completions.create(
+    model="claude-sonnet-4-6",
+    messages=[{"role": "user", "content": "안녕!"}],
+    stream=True
+)
+
+for chunk in stream:
+    if chunk.choices[0].delta.content:
+        print(chunk.choices[0].delta.content, end="", flush=True)
+```
+
+---
+
+## 4. Node.js — Anthropic SDK
+
+### 설치
+
+```bash
+npm install @anthropic-ai/sdk
+```
+
+### 기본 호출
+
+```javascript
+import Anthropic from "@anthropic-ai/sdk";
+
+const client = new Anthropic({
+  baseURL: "https://cliproxy.gru.farm/api/provider/claude",
+  apiKey: "Jinie4eva!",
+});
+
+const response = await client.messages.create({
+  model: "claude-sonnet-4-6",
+  max_tokens: 1024,
+  messages: [{ role: "user", content: "안녕!" }],
+});
+
+console.log(response.content[0].text);
+```
+
+### 스트리밍
+
+```javascript
+import Anthropic from "@anthropic-ai/sdk";
+
+const client = new Anthropic({
+  baseURL: "https://cliproxy.gru.farm/api/provider/claude",
+  apiKey: "Jinie4eva!",
+});
+
+const stream = client.messages.stream({
+  model: "claude-sonnet-4-6",
+  max_tokens: 1024,
+  messages: [{ role: "user", content: "안녕!" }],
+});
+
+for await (const chunk of stream) {
+  if (
+    chunk.type === "content_block_delta" &&
+    chunk.delta.type === "text_delta"
+  ) {
+    process.stdout.write(chunk.delta.text);
+  }
+}
+```
+
+---
+
+## 5. Node.js — OpenAI SDK (호환 모드)
+
+### 설치
+
+```bash
+npm install openai
+```
+
+### 기본 호출
+
+```javascript
+import OpenAI from "openai";
+
+const client = new OpenAI({
+  baseURL: "https://cliproxy.gru.farm/v1",
+  apiKey: "Jinie4eva!",
+});
+
+const response = await client.chat.completions.create({
+  model: "claude-sonnet-4-6",
+  messages: [{ role: "user", content: "안녕!" }],
+});
+
+console.log(response.choices[0].message.content);
+```
+
+---
+
+## 6. Claude Code CLI
+
+```bash
+export ANTHROPIC_BASE_URL=https://cliproxy.gru.farm/api/provider/claude
+export ANTHROPIC_API_KEY=Jinie4eva!
+
+claude
+```
+
+영구 적용 (`~/.zshrc` 또는 `~/.bashrc`):
+
+```bash
+echo 'export ANTHROPIC_BASE_URL=https://cliproxy.gru.farm/api/provider/claude' >> ~/.zshrc
+echo 'export ANTHROPIC_API_KEY=Jinie4eva!' >> ~/.zshrc
+source ~/.zshrc
+```
+
+---
+
+## 7. 환경변수로 관리
+
+`.env` 파일:
+
+```env
+ANTHROPIC_BASE_URL=https://cliproxy.gru.farm/api/provider/claude
+ANTHROPIC_API_KEY=Jinie4eva!
+```
+
+Python에서 `.env` 사용:
+
+```python
+from dotenv import load_dotenv
+from anthropic import Anthropic
+
+load_dotenv()
+
+# base_url, api_key 자동으로 환경변수에서 읽음
+client = Anthropic()
+
+response = client.messages.create(
+    model="claude-sonnet-4-6",
+    max_tokens=1024,
+    messages=[{"role": "user", "content": "안녕!"}]
+)
+print(response.content[0].text)
+```
+
+---
+
+## 주의사항
+
+- **내부망 접근 시** URL을 `http://192.168.0.17:8317`로 변경
+- **OpenAI 호환 모드**는 `/v1/chat/completions`를 사용하지만, Claude 네이티브 기능(extended thinking 등)은 `/api/provider/claude/v1/messages` 사용 권장
+- **타임아웃** 설정: 긴 응답의 경우 클라이언트 타임아웃을 600초 이상으로 설정
--- a/DOCKER_DEPLOY.md
+++ b/DOCKER_DEPLOY.md
@@ -0,0 +1,212 @@
+# CLIProxyAPI Docker 배포 가이드
+
+NAS(nas.gru.farm)에 Docker로 CLIProxyAPI를 배포하는 방법을 정리합니다.
+
+## 사전 조건
+
+| 항목 | 내용 |
+|------|------|
+| NAS 접속 | `ssh airkjw@nas.gru.farm -p 22` |
+| Docker | `sudo /usr/local/bin/docker` (NOPASSWD) |
+| Docker Compose | `sudo /usr/local/bin/docker compose` |
+| NAS 내부 IP | 192.168.0.17 |
+
+## 1. 배포 디렉토리 준비
+
+```bash
+ssh airkjw@nas.gru.farm
+
+# 배포 디렉토리 생성
+mkdir -p ~/docker/cli-proxy-api
+cd ~/docker/cli-proxy-api
+```
+
+## 2. 필요 파일 구성
+
+NAS에 아래 파일들이 필요합니다:
+
+```
+~/docker/cli-proxy-api/
+├── docker-compose.yml    # 컨테이너 설정
+├── config.yaml           # 서비스 설정 (API 키, 포트 등)
+├── auths/                # OAuth 인증 데이터 (자동 생성)
+└── logs/                 # 로그 디렉토리 (자동 생성)
+```
+
+## 3. docker-compose.yml
+
+로컬 빌드 방식 (소스에서 직접 빌드):
+
+```yaml
+services:
+  cli-proxy-api:
+    build:
+      context: .
+      dockerfile: Dockerfile
+    container_name: cli-proxy-api
+    ports:
+      - "8317:8317"       # 메인 API 포트
+      # 필요시 추가 포트 오픈
+      # - "8085:8085"
+    volumes:
+      - ./config.yaml:/CLIProxyAPI/config.yaml
+      - ./auths:/root/.cli-proxy-api
+      - ./logs:/CLIProxyAPI/logs
+    environment:
+      - TZ=Asia/Seoul
+    restart: unless-stopped
+```
+
+또는 공식 이미지 사용:
+
+```yaml
+services:
+  cli-proxy-api:
+    image: eceasy/cli-proxy-api:latest
+    container_name: cli-proxy-api
+    ports:
+      - "8317:8317"
+    volumes:
+      - ./config.yaml:/CLIProxyAPI/config.yaml
+      - ./auths:/root/.cli-proxy-api
+      - ./logs:/CLIProxyAPI/logs
+    environment:
+      - TZ=Asia/Seoul
+    restart: unless-stopped
+```
+
+## 4. config.yaml 설정
+
+`config.example.yaml`을 기반으로 작성합니다.
+
+### 최소 설정 예시
+
+```yaml
+# 서버 바인딩
+host: ""
+port: 8317
+
+# API 키 (클라이언트 인증용, 원하는 값으로 설정)
+api-keys:
+  - "my-secret-api-key-1"
+
+# 디버그 (초기 설정 시 true 권장, 안정화 후 false)
+debug: false
+
+# 로그를 파일로 기록
+logging-to-file: true
+logs-max-total-size-mb: 100
+
+# 재시도 설정
+request-retry: 3
+```
+
+### Claude API 키 사용 시 추가
+
+```yaml
+claude-api-key:
+  - api-key: "sk-ant-xxxxx"
+    # base-url: "https://api.anthropic.com"  # 기본값이므로 생략 가능
+```
+
+### Gemini API 키 사용 시 추가
+
+```yaml
+gemini-api-key:
+  - api-key: "AIzaSy..."
+```
+
+### Management UI 활성화 (웹 관리 패널)
+
+```yaml
+remote-management:
+  allow-remote: true
+  secret-key: "my-management-password"
+  disable-control-panel: false
+```
+
+## 5. 배포 실행
+
+```bash
+cd ~/docker/cli-proxy-api
+
+# 공식 이미지 사용 시
+sudo /usr/local/bin/docker compose up -d
+
+# 소스 빌드 시 (Gitea에서 소스 가져와서)
+git clone http://nas.gru.farm:3001/airkjw/CLIProxyAPI.git src
+sudo /usr/local/bin/docker compose -f src/docker-compose.yml up -d --build
+```
+
+## 6. 확인
+
+```bash
+# 컨테이너 상태 확인
+sudo /usr/local/bin/docker ps | grep cli-proxy-api
+
+# 로그 확인
+sudo /usr/local/bin/docker logs cli-proxy-api
+
+# API 응답 테스트
+curl http://localhost:8317/
+curl http://192.168.0.17:8317/
+
+# 모델 목록 확인 (API 키 인증)
+curl -H "Authorization: Bearer my-secret-api-key-1" http://localhost:8317/v1/models
+```
+
+## 7. 클라이언트 연결
+
+CLIProxyAPI가 실행되면 각 AI CLI 도구에서 프록시 주소로 연결합니다.
+
+### Claude Code에서 사용
+
+```bash
+# 환경변수 설정
+export ANTHROPIC_BASE_URL=http://192.168.0.17:8317
+export ANTHROPIC_API_KEY=my-secret-api-key-1
+```
+
+### OpenAI 호환 클라이언트에서 사용
+
+```bash
+export OPENAI_BASE_URL=http://192.168.0.17:8317/v1
+export OPENAI_API_KEY=my-secret-api-key-1
+```
+
+## 8. 관리 & 운영
+
+```bash
+# 컨테이너 중지
+sudo /usr/local/bin/docker compose down
+
+# 설정 변경 후 재시작
+sudo /usr/local/bin/docker compose restart
+
+# 이미지 업데이트 (공식 이미지 사용 시)
+sudo /usr/local/bin/docker compose pull
+sudo /usr/local/bin/docker compose up -d
+
+# 로그 실시간 모니터링
+sudo /usr/local/bin/docker logs -f cli-proxy-api
+```
+
+## 포트 목록
+
+| 포트 | 용도 | 필수 여부 |
+|------|------|-----------|
+| 8317 | 메인 API | 필수 |
+| 8085 | 추가 API | 선택 |
+| 1455 | 추가 서비스 | 선택 |
+| 54545 | 추가 서비스 | 선택 |
+| 51121 | 추가 서비스 | 선택 |
+| 11451 | 추가 서비스 | 선택 |
+
+> 기본적으로 8317 포트만 열면 됩니다. 나머지는 특정 기능 사용 시 필요합니다.
+
+## 주의사항
+
+- `config.yaml`은 `.gitignore`에 포함되어 있어 Git에 커밋되지 않음 (API 키 보호)
+- OAuth 인증(Claude, Gemini 등)은 최초 1회 브라우저 로그인 필요
+- `auths/` 디렉토리를 볼륨으로 마운트하면 컨테이너 재생성 시에도 인증 유지
+- NAS 외부 접근 시 방화벽/포트포워딩 설정 필요
--- a/README.md
+++ b/README.md
@@ -126,10 +126,6 @@ Browser-based tool to translate SRT subtitles using your Gemini subscription via

 CLI wrapper for instant switching between multiple Claude accounts and alternative models (Gemini, Codex, Antigravity) via CLIProxyAPI OAuth - no API keys needed

-### [ProxyPal](https://github.com/heyhuynhgiabuu/proxypal)
-
-Native macOS GUI for managing CLIProxyAPI: configure providers, model mappings, and endpoints via OAuth - no API keys needed.
-
 ### [Quotio](https://github.com/nguyenphutrong/quotio)

 Native macOS menu bar app that unifies Claude, Gemini, OpenAI, Qwen, and Antigravity subscriptions with real-time quota tracking and smart auto-failover for AI coding tools like Claude Code, OpenCode, and Droid - no API keys needed.
--- a/README_CN.md
+++ b/README_CN.md
@@ -125,10 +125,6 @@ CLIProxyAPI 已内置对 [Amp CLI](https://ampcode.com) 和 Amp IDE 扩展的支

 CLI 封装器，用于通过 CLIProxyAPI OAuth 即时切换多个 Claude 账户和替代模型（Gemini, Codex, Antigravity），无需 API 密钥。

-### [ProxyPal](https://github.com/heyhuynhgiabuu/proxypal)
-
-基于 macOS 平台的原生 CLIProxyAPI GUI：配置供应商、模型映射以及OAuth端点，无需 API 密钥。
-
 ### [Quotio](https://github.com/nguyenphutrong/quotio)

 原生 macOS 菜单栏应用，统一管理 Claude、Gemini、OpenAI、Qwen 和 Antigravity 订阅，提供实时配额追踪和智能自动故障转移，支持 Claude Code、OpenCode 和 Droid 等 AI 编程工具，无需 API 密钥。
--- a/README_JA.md
+++ b/README_JA.md
@@ -126,10 +126,6 @@ CLIProxyAPI経由でGeminiサブスクリプションを使用してSRT字幕を

 CLIProxyAPI OAuthを使用して複数のClaudeアカウントや代替モデル（Gemini、Codex、Antigravity）を即座に切り替えるCLIラッパー - APIキー不要

-### [ProxyPal](https://github.com/heyhuynhgiabuu/proxypal)
-
-CLIProxyAPI管理用のmacOSネイティブGUI：OAuth経由でプロバイダー、モデルマッピング、エンドポイントを設定 - APIキー不要
-
 ### [Quotio](https://github.com/nguyenphutrong/quotio)

 Claude、Gemini、OpenAI、Qwen、Antigravityのサブスクリプションを統合し、リアルタイムのクォータ追跡とスマート自動フェイルオーバーを備えたmacOSネイティブのメニューバーアプリ。Claude Code、OpenCode、Droidなどのコーディングツール向け - APIキー不要
--- a/REVERSE_PROXY_SETUP.md
+++ b/REVERSE_PROXY_SETUP.md
@@ -0,0 +1,104 @@
+# CLIProxyAPI 역방향 프록시 & HTTPS 설정 가이드
+
+외부에서 `https://cliproxy.gru.farm`으로 CLIProxyAPI에 접근하기 위한 설정입니다.
+
+## 1단계: DNS 레코드 추가
+
+hostcocoa.com DNS 관리에서 A 레코드를 추가합니다.
+
+| 타입 | 호스트 | 값 |
+|------|--------|-----|
+| A | cliproxy | 125.188.185.74 |
+
+> 기존 `nas.gru.farm`, `haesol.gru.farm` 등과 같은 IP입니다.
+
+## 2단계: Synology DSM 역방향 프록시 설정
+
+1. DSM 웹 UI 접속 (보통 `https://nas.gru.farm:5001`)
+2. **제어판** → **로그인 포털** → **고급** 탭 → **역방향 프록시** 클릭
+3. **생성** 버튼 클릭
+4. 아래와 같이 입력:
+
+### 일반 설정
+
+| 항목 | 값 |
+|------|-----|
+| 설명 | `CLIProxyAPI` |
+| **소스 (프론트엔드)** | |
+| 프로토콜 | `HTTPS` |
+| 호스트 이름 | `cliproxy.gru.farm` |
+| 포트 | `443` |
+| HSTS | 비활성화 |
+| **대상 (백엔드)** | |
+| 프로토콜 | `HTTP` |
+| 호스트 이름 | `localhost` |
+| 포트 | `8317` |
+
+### 사용자 지정 헤더 (선택)
+
+필요 시 WebSocket 지원을 위해 사용자 지정 헤더 추가:
+- `Upgrade` → `$http_upgrade`
+- `Connection` → `$connection_upgrade`
+
+### 타임아웃 설정
+
+AI 요청은 응답이 오래 걸릴 수 있으므로 타임아웃을 늘려주세요:
+- 연결 타임아웃: `600`
+- 전송 타임아웃: `600`
+- 수신 타임아웃: `600`
+
+5. **저장** 클릭
+
+## 3단계: SSL 인증서 설정
+
+Synology DSM에서 `cliproxy.gru.farm` 용 SSL 인증서를 설정합니다.
+
+### Let's Encrypt 인증서 발급 (권장)
+
+1. **제어판** → **보안** → **인증서** 탭
+2. **추가** → **새 인증서 추가** → **Let's Encrypt에서 인증서 가져오기**
+3. 도메인: `cliproxy.gru.farm`
+4. 이메일: 본인 이메일
+5. 발급 완료 후, **설정** 버튼 클릭
+6. `cliproxy.gru.farm` 역방향 프록시 항목에 방금 발급한 인증서 선택
+
+### 기존 와일드카드 인증서가 있는 경우
+
+`*.gru.farm` 와일드카드 인증서가 있다면 별도 발급 없이 해당 인증서를 선택하면 됩니다.
+
+## 4단계: 공유기 포트 포워딩
+
+공유기에서 443 포트가 NAS(192.168.0.17)로 포워딩되어 있는지 확인합니다.
+
+> 기존 `haesol.gru.farm` 등이 HTTPS로 동작 중이라면 이미 설정되어 있을 가능성이 높습니다.
+
+| 외부 포트 | 내부 IP | 내부 포트 | 프로토콜 |
+|-----------|---------|-----------|----------|
+| 443 | 192.168.0.17 | 443 | TCP |
+
+## 5단계: 확인
+
+```bash
+# DNS 전파 확인
+dig +short cliproxy.gru.farm
+# 125.188.185.74 가 나오면 성공
+
+# HTTPS 접속 테스트
+curl https://cliproxy.gru.farm/
+# {"endpoints":[...],"message":"CLI Proxy API Server"}
+
+# 모델 목록 확인
+curl -H "Authorization: Bearer Jinie4eva!" https://cliproxy.gru.farm/v1/models
+```
+
+## 클라이언트 연결 (외부)
+
+```bash
+# Claude Code
+export ANTHROPIC_BASE_URL=https://cliproxy.gru.farm
+export ANTHROPIC_API_KEY=Jinie4eva!
+
+# OpenAI 호환
+export OPENAI_BASE_URL=https://cliproxy.gru.farm/v1
+export OPENAI_API_KEY=Jinie4eva!
+```
--- a/internal/api/modules/amp/fallback_handlers.go
+++ b/internal/api/modules/amp/fallback_handlers.go
@@ -123,6 +123,10 @@ func (fh *FallbackHandler) WrapHandler(handler gin.HandlerFunc) gin.HandlerFunc
 			return
 		}

+		// Sanitize request body: remove thinking blocks with invalid signatures
+		// to prevent upstream API 400 errors
+		bodyBytes = SanitizeAmpRequestBody(bodyBytes)
+
 		// Restore the body for the handler to read
 		c.Request.Body = io.NopCloser(bytes.NewReader(bodyBytes))

@@ -259,10 +263,16 @@ func (fh *FallbackHandler) WrapHandler(handler gin.HandlerFunc) gin.HandlerFunc
 		} else if len(providers) > 0 {
 			// Log: Using local provider (free)
 			logAmpRouting(RouteTypeLocalProvider, modelName, resolvedModel, providerName, requestPath)
+			// Wrap with ResponseRewriter for local providers too, because upstream
+			// proxies (e.g. NewAPI) may return a different model name and lack
+			// Amp-required fields like thinking.signature.
+			rewriter := NewResponseRewriter(c.Writer, modelName)
+			c.Writer = rewriter
 			// Filter Anthropic-Beta header only for local handling paths
 			filterAntropicBetaHeader(c)
 			c.Request.Body = io.NopCloser(bytes.NewReader(bodyBytes))
 			handler(c)
+			rewriter.Flush()
 		} else {
 			// No provider, no mapping, no proxy: fall back to the wrapped handler so it can return an error response
 			c.Request.Body = io.NopCloser(bytes.NewReader(bodyBytes))
--- a/internal/api/modules/amp/response_rewriter.go
+++ b/internal/api/modules/amp/response_rewriter.go
@@ -2,6 +2,7 @@ package amp

 import (
 	"bytes"
+	"fmt"
 	"net/http"
 	"strings"

@@ -12,32 +13,83 @@ import (
 )

 // ResponseRewriter wraps a gin.ResponseWriter to intercept and modify the response body
-// It's used to rewrite model names in responses when model mapping is used
+// It is used to rewrite model names in responses when model mapping is used
+// and to keep Amp-compatible response shapes.
 type ResponseRewriter struct {
 	gin.ResponseWriter
-	body          *bytes.Buffer
-	originalModel string
-	isStreaming   bool
+	body                   *bytes.Buffer
+	originalModel          string
+	isStreaming            bool
+	suppressedContentBlock map[int]struct{}
 }

-// NewResponseRewriter creates a new response rewriter for model name substitution
+// NewResponseRewriter creates a new response rewriter for model name substitution.
 func NewResponseRewriter(w gin.ResponseWriter, originalModel string) *ResponseRewriter {
 	return &ResponseRewriter{
-		ResponseWriter: w,
-		body:           &bytes.Buffer{},
-		originalModel:  originalModel,
+		ResponseWriter:         w,
+		body:                   &bytes.Buffer{},
+		originalModel:          originalModel,
+		suppressedContentBlock: make(map[int]struct{}),
 	}
 }

-// Write intercepts response writes and buffers them for model name replacement
+const maxBufferedResponseBytes = 2 * 1024 * 1024 // 2MB safety cap
+
+func looksLikeSSEChunk(data []byte) bool {
+	for _, line := range bytes.Split(data, []byte("\n")) {
+		trimmed := bytes.TrimSpace(line)
+		if bytes.HasPrefix(trimmed, []byte("data:")) ||
+			bytes.HasPrefix(trimmed, []byte("event:")) {
+			return true
+		}
+	}
+	return false
+}
+
+func (rw *ResponseRewriter) enableStreaming(reason string) error {
+	if rw.isStreaming {
+		return nil
+	}
+	rw.isStreaming = true
+
+	if rw.body != nil && rw.body.Len() > 0 {
+		buf := rw.body.Bytes()
+		toFlush := make([]byte, len(buf))
+		copy(toFlush, buf)
+		rw.body.Reset()
+
+		if _, err := rw.ResponseWriter.Write(rw.rewriteStreamChunk(toFlush)); err != nil {
+			return err
+		}
+		if flusher, ok := rw.ResponseWriter.(http.Flusher); ok {
+			flusher.Flush()
+		}
+	}
+
+	log.Debugf("amp response rewriter: switched to streaming (%s)", reason)
+	return nil
+}
+
 func (rw *ResponseRewriter) Write(data []byte) (int, error) {
-	// Detect streaming on first write
-	if rw.body.Len() == 0 && !rw.isStreaming {
+	if !rw.isStreaming && rw.body.Len() == 0 {
 		contentType := rw.Header().Get("Content-Type")
 		rw.isStreaming = strings.Contains(contentType, "text/event-stream") ||
 			strings.Contains(contentType, "stream")
 	}

+	if !rw.isStreaming {
+		if looksLikeSSEChunk(data) {
+			if err := rw.enableStreaming("sse heuristic"); err != nil {
+				return 0, err
+			}
+		} else if rw.body.Len()+len(data) > maxBufferedResponseBytes {
+			log.Warnf("amp response rewriter: buffer exceeded %d bytes, switching to streaming", maxBufferedResponseBytes)
+			if err := rw.enableStreaming("buffer limit"); err != nil {
+				return 0, err
+			}
+		}
+	}
+
 	if rw.isStreaming {
 		n, err := rw.ResponseWriter.Write(rw.rewriteStreamChunk(data))
 		if err == nil {
@@ -50,7 +102,6 @@ func (rw *ResponseRewriter) Write(data []byte) (int, error) {
 	return rw.body.Write(data)
 }

-// Flush writes the buffered response with model names rewritten
 func (rw *ResponseRewriter) Flush() {
 	if rw.isStreaming {
 		if flusher, ok := rw.ResponseWriter.(http.Flusher); ok {
@@ -59,26 +110,68 @@ func (rw *ResponseRewriter) Flush() {
 		return
 	}
 	if rw.body.Len() > 0 {
-		if _, err := rw.ResponseWriter.Write(rw.rewriteModelInResponse(rw.body.Bytes())); err != nil {
+		rewritten := rw.rewriteModelInResponse(rw.body.Bytes())
+		// Update Content-Length to match the rewritten body size, since
+		// signature injection and model name changes alter the payload length.
+		rw.ResponseWriter.Header().Set("Content-Length", fmt.Sprintf("%d", len(rewritten)))
+		if _, err := rw.ResponseWriter.Write(rewritten); err != nil {
 			log.Warnf("amp response rewriter: failed to write rewritten response: %v", err)
 		}
 	}
 }

-// modelFieldPaths lists all JSON paths where model name may appear
 var modelFieldPaths = []string{"message.model", "model", "modelVersion", "response.model", "response.modelVersion"}

-// rewriteModelInResponse replaces all occurrences of the mapped model with the original model in JSON
-// It also suppresses "thinking" blocks if "tool_use" is present to ensure Amp client compatibility
-func (rw *ResponseRewriter) rewriteModelInResponse(data []byte) []byte {
-	// 1. Amp Compatibility: Suppress thinking blocks if tool use is detected
-	// The Amp client struggles when both thinking and tool_use blocks are present
+// ensureAmpSignature injects empty signature fields into tool_use/thinking blocks
+// in API responses so that the Amp TUI does not crash on P.signature.length.
+func ensureAmpSignature(data []byte) []byte {
+	for index, block := range gjson.GetBytes(data, "content").Array() {
+		blockType := block.Get("type").String()
+		if blockType != "tool_use" && blockType != "thinking" {
+			continue
+		}
+		signaturePath := fmt.Sprintf("content.%d.signature", index)
+		if gjson.GetBytes(data, signaturePath).Exists() {
+			continue
+		}
+		var err error
+		data, err = sjson.SetBytes(data, signaturePath, "")
+		if err != nil {
+			log.Warnf("Amp ResponseRewriter: failed to add empty signature to %s block: %v", blockType, err)
+			break
+		}
+	}
+
+	contentBlockType := gjson.GetBytes(data, "content_block.type").String()
+	if (contentBlockType == "tool_use" || contentBlockType == "thinking") && !gjson.GetBytes(data, "content_block.signature").Exists() {
+		var err error
+		data, err = sjson.SetBytes(data, "content_block.signature", "")
+		if err != nil {
+			log.Warnf("Amp ResponseRewriter: failed to add empty signature to streaming %s block: %v", contentBlockType, err)
+		}
+	}
+
+	return data
+}
+
+func (rw *ResponseRewriter) markSuppressedContentBlock(index int) {
+	if rw.suppressedContentBlock == nil {
+		rw.suppressedContentBlock = make(map[int]struct{})
+	}
+	rw.suppressedContentBlock[index] = struct{}{}
+}
+
+func (rw *ResponseRewriter) isSuppressedContentBlock(index int) bool {
+	_, ok := rw.suppressedContentBlock[index]
+	return ok
+}
+
+func (rw *ResponseRewriter) suppressAmpThinking(data []byte) []byte {
 	if gjson.GetBytes(data, `content.#(type=="tool_use")`).Exists() {
 		filtered := gjson.GetBytes(data, `content.#(type!="thinking")#`)
 		if filtered.Exists() {
 			originalCount := gjson.GetBytes(data, "content.#").Int()
 			filteredCount := filtered.Get("#").Int()
-
 			if originalCount > filteredCount {
 				var err error
 				data, err = sjson.SetBytes(data, "content", filtered.Value())
@@ -86,13 +179,41 @@ func (rw *ResponseRewriter) rewriteModelInResponse(data []byte) []byte {
 					log.Warnf("Amp ResponseRewriter: failed to suppress thinking blocks: %v", err)
 				} else {
 					log.Debugf("Amp ResponseRewriter: Suppressed %d thinking blocks due to tool usage", originalCount-filteredCount)
-					// Log the result for verification
-					log.Debugf("Amp ResponseRewriter: Resulting content: %s", gjson.GetBytes(data, "content").String())
 				}
 			}
 		}
 	}

+	eventType := gjson.GetBytes(data, "type").String()
+	indexResult := gjson.GetBytes(data, "index")
+	if eventType == "content_block_start" && gjson.GetBytes(data, "content_block.type").String() == "thinking" && indexResult.Exists() {
+		rw.markSuppressedContentBlock(int(indexResult.Int()))
+		return nil
+	}
+	if gjson.GetBytes(data, "delta.type").String() == "thinking_delta" {
+		if indexResult.Exists() {
+			rw.markSuppressedContentBlock(int(indexResult.Int()))
+		}
+		return nil
+	}
+	if eventType == "content_block_stop" && indexResult.Exists() {
+		index := int(indexResult.Int())
+		if rw.isSuppressedContentBlock(index) {
+			delete(rw.suppressedContentBlock, index)
+			return nil
+		}
+	}
+
+	return data
+}
+
+func (rw *ResponseRewriter) rewriteModelInResponse(data []byte) []byte {
+	data = ensureAmpSignature(data)
+	data = rw.suppressAmpThinking(data)
+	if len(data) == 0 {
+		return data
+	}
+
 	if rw.originalModel == "" {
 		return data
 	}
@@ -104,24 +225,158 @@ func (rw *ResponseRewriter) rewriteModelInResponse(data []byte) []byte {
 	return data
 }

-// rewriteStreamChunk rewrites model names in SSE stream chunks
 func (rw *ResponseRewriter) rewriteStreamChunk(chunk []byte) []byte {
-	if rw.originalModel == "" {
-		return chunk
+	lines := bytes.Split(chunk, []byte("\n"))
+	var out [][]byte
+
+	i := 0
+	for i < len(lines) {
+		line := lines[i]
+		trimmed := bytes.TrimSpace(line)
+
+		// Case 1: "event:" line - look ahead for its "data:" line
+		if bytes.HasPrefix(trimmed, []byte("event: ")) {
+			// Scan forward past blank lines to find the data: line
+			dataIdx := -1
+			for j := i + 1; j < len(lines); j++ {
+				t := bytes.TrimSpace(lines[j])
+				if len(t) == 0 {
+					continue
+				}
+				if bytes.HasPrefix(t, []byte("data: ")) {
+					dataIdx = j
+				}
+				break
+			}
+
+			if dataIdx >= 0 {
+				// Found event+data pair - process through rewriter
+				jsonData := bytes.TrimPrefix(bytes.TrimSpace(lines[dataIdx]), []byte("data: "))
+				if len(jsonData) > 0 && jsonData[0] == '{' {
+					rewritten := rw.rewriteStreamEvent(jsonData)
+					if rewritten == nil {
+						// Event suppressed (e.g. thinking block), skip event+data pair
+						i = dataIdx + 1
+						continue
+					}
+					// Emit event line
+					out = append(out, line)
+					// Emit blank lines between event and data
+					for k := i + 1; k < dataIdx; k++ {
+						out = append(out, lines[k])
+					}
+					// Emit rewritten data
+					out = append(out, append([]byte("data: "), rewritten...))
+					i = dataIdx + 1
+					continue
+				}
+			}
+
+			// No data line found (orphan event from cross-chunk split)
+			// Pass it through as-is - the data will arrive in the next chunk
+			out = append(out, line)
+			i++
+			continue
+		}
+
+		// Case 2: standalone "data:" line (no preceding event: in this chunk)
+		if bytes.HasPrefix(trimmed, []byte("data: ")) {
+			jsonData := bytes.TrimPrefix(trimmed, []byte("data: "))
+			if len(jsonData) > 0 && jsonData[0] == '{' {
+				rewritten := rw.rewriteStreamEvent(jsonData)
+				if rewritten != nil {
+					out = append(out, append([]byte("data: "), rewritten...))
+				}
+				i++
+				continue
+			}
+		}
+
+		// Case 3: everything else
+		out = append(out, line)
+		i++
 	}

-	// SSE format: "data: {json}\n\n"
-	lines := bytes.Split(chunk, []byte("\n"))
-	for i, line := range lines {
-		if bytes.HasPrefix(line, []byte("data: ")) {
-			jsonData := bytes.TrimPrefix(line, []byte("data: "))
-			if len(jsonData) > 0 && jsonData[0] == '{' {
-				// Rewrite JSON in the data line
-				rewritten := rw.rewriteModelInResponse(jsonData)
-				lines[i] = append([]byte("data: "), rewritten...)
+	return bytes.Join(out, []byte("\n"))
+}
+
+// rewriteStreamEvent processes a single JSON event in the SSE stream.
+// It rewrites model names and ensures signature fields exist.
+func (rw *ResponseRewriter) rewriteStreamEvent(data []byte) []byte {
+	// Suppress thinking blocks before any other processing.
+	data = rw.suppressAmpThinking(data)
+	if len(data) == 0 {
+		return nil
+	}
+
+	// Inject empty signature where needed
+	data = ensureAmpSignature(data)
+
+	// Rewrite model name
+	if rw.originalModel != "" {
+		for _, path := range modelFieldPaths {
+			if gjson.GetBytes(data, path).Exists() {
+				data, _ = sjson.SetBytes(data, path, rw.originalModel)
 			}
 		}
 	}

-	return bytes.Join(lines, []byte("\n"))
+	return data
+}
+
+// SanitizeAmpRequestBody removes thinking blocks with empty/missing/invalid signatures
+// from the messages array in a request body before forwarding to the upstream API.
+// This prevents 400 errors from the API which requires valid signatures on thinking blocks.
+func SanitizeAmpRequestBody(body []byte) []byte {
+	messages := gjson.GetBytes(body, "messages")
+	if !messages.Exists() || !messages.IsArray() {
+		return body
+	}
+
+	modified := false
+	for msgIdx, msg := range messages.Array() {
+		if msg.Get("role").String() != "assistant" {
+			continue
+		}
+		content := msg.Get("content")
+		if !content.Exists() || !content.IsArray() {
+			continue
+		}
+
+		var keepBlocks []interface{}
+		removedCount := 0
+
+		for _, block := range content.Array() {
+			blockType := block.Get("type").String()
+			if blockType == "thinking" {
+				sig := block.Get("signature")
+				if !sig.Exists() || sig.Type != gjson.String || strings.TrimSpace(sig.String()) == "" {
+					removedCount++
+					continue
+				}
+			}
+			keepBlocks = append(keepBlocks, block.Value())
+		}
+
+		if removedCount > 0 {
+			contentPath := fmt.Sprintf("messages.%d.content", msgIdx)
+			var err error
+			if len(keepBlocks) == 0 {
+				body, err = sjson.SetBytes(body, contentPath, []interface{}{})
+			} else {
+				body, err = sjson.SetBytes(body, contentPath, keepBlocks)
+			}
+			if err != nil {
+				log.Warnf("Amp RequestSanitizer: failed to remove thinking blocks from message %d: %v", msgIdx, err)
+				continue
+			}
+			modified = true
+			log.Debugf("Amp RequestSanitizer: removed %d thinking blocks with invalid signatures from message %d", removedCount, msgIdx)
+		}
+	}
+
+	if modified {
+		log.Debugf("Amp RequestSanitizer: sanitized request body")
+	}
+	return body
 }
--- a/internal/api/modules/amp/response_rewriter_test.go
+++ b/internal/api/modules/amp/response_rewriter_test.go
@@ -100,6 +100,44 @@ func TestRewriteStreamChunk_MessageModel(t *testing.T) {
 	}
 }

+func TestRewriteStreamChunk_SuppressesThinkingContentBlockFrames(t *testing.T) {
+	rw := &ResponseRewriter{suppressedContentBlock: make(map[int]struct{})}
+
+	chunk := []byte("event: content_block_start\ndata: {\"type\":\"content_block_start\",\"index\":0,\"content_block\":{\"type\":\"thinking\",\"thinking\":\"\"}}\n\nevent: content_block_delta\ndata: {\"type\":\"content_block_delta\",\"index\":0,\"delta\":{\"type\":\"thinking_delta\",\"thinking\":\"abc\"}}\n\nevent: content_block_stop\ndata: {\"type\":\"content_block_stop\",\"index\":0}\n\nevent: content_block_start\ndata: {\"type\":\"content_block_start\",\"index\":1,\"content_block\":{\"type\":\"tool_use\",\"name\":\"bash\",\"input\":{}}}\n\n")
+	result := rw.rewriteStreamChunk(chunk)
+
+	if contains(result, []byte("\"thinking\"")) || contains(result, []byte("\"thinking_delta\"")) {
+		t.Fatalf("expected thinking content_block frames to be suppressed, got %s", string(result))
+	}
+	if contains(result, []byte("content_block_stop")) {
+		t.Fatalf("expected suppressed thinking content_block_stop to be removed, got %s", string(result))
+	}
+	if !contains(result, []byte("\"tool_use\"")) {
+		t.Fatalf("expected tool_use content_block frame to remain, got %s", string(result))
+	}
+	if !contains(result, []byte("\"signature\":\"\"")) {
+		t.Fatalf("expected tool_use content_block signature injection, got %s", string(result))
+	}
+}
+
+func TestSanitizeAmpRequestBody_RemovesWhitespaceAndNonStringSignatures(t *testing.T) {
+	input := []byte(`{"messages":[{"role":"assistant","content":[{"type":"thinking","thinking":"drop-whitespace","signature":"   "},{"type":"thinking","thinking":"drop-number","signature":123},{"type":"thinking","thinking":"keep-valid","signature":"valid-signature"},{"type":"text","text":"keep-text"}]}]}`)
+	result := SanitizeAmpRequestBody(input)
+
+	if contains(result, []byte("drop-whitespace")) {
+		t.Fatalf("expected whitespace-only signature block to be removed, got %s", string(result))
+	}
+	if contains(result, []byte("drop-number")) {
+		t.Fatalf("expected non-string signature block to be removed, got %s", string(result))
+	}
+	if !contains(result, []byte("keep-valid")) {
+		t.Fatalf("expected valid thinking block to remain, got %s", string(result))
+	}
+	if !contains(result, []byte("keep-text")) {
+		t.Fatalf("expected non-thinking content to remain, got %s", string(result))
+	}
+}
+
 func contains(data, substr []byte) bool {
 	for i := 0; i <= len(data)-len(substr); i++ {
 		if string(data[i:i+len(substr)]) == string(substr) {
--- a/internal/runtime/executor/claude_executor.go
+++ b/internal/runtime/executor/claude_executor.go
@@ -22,6 +22,7 @@ import (
 	claudeauth "github.com/router-for-me/CLIProxyAPI/v6/internal/auth/claude"
 	"github.com/router-for-me/CLIProxyAPI/v6/internal/config"
 	"github.com/router-for-me/CLIProxyAPI/v6/internal/misc"
+	"github.com/router-for-me/CLIProxyAPI/v6/internal/registry"
 	"github.com/router-for-me/CLIProxyAPI/v6/internal/thinking"
 	"github.com/router-for-me/CLIProxyAPI/v6/internal/util"
 	cliproxyauth "github.com/router-for-me/CLIProxyAPI/v6/sdk/cliproxy/auth"
@@ -44,6 +45,10 @@ type ClaudeExecutor struct {
 // Previously "proxy_" was used but this is a detectable fingerprint difference.
 const claudeToolPrefix = ""

+// Anthropic-compatible upstreams may reject or even crash when Claude models
+// omit max_tokens. Prefer registered model metadata before using a fallback.
+const defaultModelMaxTokens = 1024
+
 func NewClaudeExecutor(cfg *config.Config) *ClaudeExecutor { return &ClaudeExecutor{cfg: cfg} }

 func (e *ClaudeExecutor) Identifier() string { return "claude" }
@@ -127,6 +132,7 @@ func (e *ClaudeExecutor) Execute(ctx context.Context, auth *cliproxyauth.Auth, r

 	requestedModel := payloadRequestedModel(opts, req.Model)
 	body = applyPayloadConfigWithRoot(e.cfg, baseModel, to.String(), "", body, originalTranslated, requestedModel)
+	body = ensureModelMaxTokens(body, baseModel)

 	// Disable thinking if tool_choice forces tool use (Anthropic API constraint)
 	body = disableThinkingIfToolChoiceForced(body)
@@ -293,6 +299,7 @@ func (e *ClaudeExecutor) ExecuteStream(ctx context.Context, auth *cliproxyauth.A

 	requestedModel := payloadRequestedModel(opts, req.Model)
 	body = applyPayloadConfigWithRoot(e.cfg, baseModel, to.String(), "", body, originalTranslated, requestedModel)
+	body = ensureModelMaxTokens(body, baseModel)

 	// Disable thinking if tool_choice forces tool use (Anthropic API constraint)
 	body = disableThinkingIfToolChoiceForced(body)
@@ -1880,3 +1887,26 @@ func injectSystemCacheControl(payload []byte) []byte {

 	return payload
 }
+
+func ensureModelMaxTokens(body []byte, modelID string) []byte {
+	if len(body) == 0 || !gjson.ValidBytes(body) {
+		return body
+	}
+
+	if maxTokens := gjson.GetBytes(body, "max_tokens"); maxTokens.Exists() {
+		return body
+	}
+
+	for _, provider := range registry.GetGlobalRegistry().GetModelProviders(strings.TrimSpace(modelID)) {
+		if strings.EqualFold(provider, "claude") {
+			maxTokens := defaultModelMaxTokens
+			if info := registry.GetGlobalRegistry().GetModelInfo(strings.TrimSpace(modelID), "claude"); info != nil && info.MaxCompletionTokens > 0 {
+				maxTokens = info.MaxCompletionTokens
+			}
+			body, _ = sjson.SetBytes(body, "max_tokens", maxTokens)
+			return body
+		}
+	}
+
+	return body
+}
--- a/internal/runtime/executor/claude_executor_test.go
+++ b/internal/runtime/executor/claude_executor_test.go
@@ -15,6 +15,7 @@ import (
 	"github.com/gin-gonic/gin"
 	"github.com/klauspost/compress/zstd"
 	"github.com/router-for-me/CLIProxyAPI/v6/internal/config"
+	"github.com/router-for-me/CLIProxyAPI/v6/internal/registry"
 	cliproxyauth "github.com/router-for-me/CLIProxyAPI/v6/sdk/cliproxy/auth"
 	cliproxyexecutor "github.com/router-for-me/CLIProxyAPI/v6/sdk/cliproxy/executor"
 	sdktranslator "github.com/router-for-me/CLIProxyAPI/v6/sdk/translator"
@@ -1183,6 +1184,83 @@ func testClaudeExecutorInvalidCompressedErrorBody(
 	}
 }

+func TestEnsureModelMaxTokens_UsesRegisteredMaxCompletionTokens(t *testing.T) {
+	reg := registry.GetGlobalRegistry()
+	clientID := "test-claude-max-completion-tokens-client"
+	modelID := "test-claude-max-completion-tokens-model"
+	reg.RegisterClient(clientID, "claude", []*registry.ModelInfo{{
+		ID:                  modelID,
+		Type:                "claude",
+		OwnedBy:             "anthropic",
+		Object:              "model",
+		Created:             time.Now().Unix(),
+		MaxCompletionTokens: 4096,
+		UserDefined:         true,
+	}})
+	defer reg.UnregisterClient(clientID)
+
+	input := []byte(`{"model":"test-claude-max-completion-tokens-model","messages":[{"role":"user","content":"hi"}]}`)
+	out := ensureModelMaxTokens(input, modelID)
+
+	if got := gjson.GetBytes(out, "max_tokens").Int(); got != 4096 {
+		t.Fatalf("max_tokens = %d, want %d", got, 4096)
+	}
+}
+
+func TestEnsureModelMaxTokens_DefaultsMissingValue(t *testing.T) {
+	reg := registry.GetGlobalRegistry()
+	clientID := "test-claude-default-max-tokens-client"
+	modelID := "test-claude-default-max-tokens-model"
+	reg.RegisterClient(clientID, "claude", []*registry.ModelInfo{{
+		ID:          modelID,
+		Type:        "claude",
+		OwnedBy:     "anthropic",
+		Object:      "model",
+		Created:     time.Now().Unix(),
+		UserDefined: true,
+	}})
+	defer reg.UnregisterClient(clientID)
+
+	input := []byte(`{"model":"test-claude-default-max-tokens-model","messages":[{"role":"user","content":"hi"}]}`)
+	out := ensureModelMaxTokens(input, modelID)
+
+	if got := gjson.GetBytes(out, "max_tokens").Int(); got != defaultModelMaxTokens {
+		t.Fatalf("max_tokens = %d, want %d", got, defaultModelMaxTokens)
+	}
+}
+
+func TestEnsureModelMaxTokens_PreservesExplicitValue(t *testing.T) {
+	reg := registry.GetGlobalRegistry()
+	clientID := "test-claude-preserve-max-tokens-client"
+	modelID := "test-claude-preserve-max-tokens-model"
+	reg.RegisterClient(clientID, "claude", []*registry.ModelInfo{{
+		ID:                  modelID,
+		Type:                "claude",
+		OwnedBy:             "anthropic",
+		Object:              "model",
+		Created:             time.Now().Unix(),
+		MaxCompletionTokens: 4096,
+		UserDefined:         true,
+	}})
+	defer reg.UnregisterClient(clientID)
+
+	input := []byte(`{"model":"test-claude-preserve-max-tokens-model","max_tokens":2048,"messages":[{"role":"user","content":"hi"}]}`)
+	out := ensureModelMaxTokens(input, modelID)
+
+	if got := gjson.GetBytes(out, "max_tokens").Int(); got != 2048 {
+		t.Fatalf("max_tokens = %d, want %d", got, 2048)
+	}
+}
+
+func TestEnsureModelMaxTokens_SkipsUnregisteredModel(t *testing.T) {
+	input := []byte(`{"model":"test-claude-unregistered-model","messages":[{"role":"user","content":"hi"}]}`)
+	out := ensureModelMaxTokens(input, "test-claude-unregistered-model")
+
+	if gjson.GetBytes(out, "max_tokens").Exists() {
+		t.Fatalf("max_tokens should remain unset, got %s", gjson.GetBytes(out, "max_tokens").Raw)
+	}
+}
+
 // TestClaudeExecutor_ExecuteStream_SetsIdentityAcceptEncoding verifies that streaming
 // requests use Accept-Encoding: identity so the upstream cannot respond with a
 // compressed SSE body that would silently break the line scanner.
--- a/internal/runtime/executor/codex_continuity.go
+++ b/internal/runtime/executor/codex_continuity.go
@@ -1,125 +0,0 @@
-package executor
-
-import (
-	"context"
-	"fmt"
-	"net/http"
-	"strings"
-
-	"github.com/google/uuid"
-	cliproxyauth "github.com/router-for-me/CLIProxyAPI/v6/sdk/cliproxy/auth"
-	cliproxyexecutor "github.com/router-for-me/CLIProxyAPI/v6/sdk/cliproxy/executor"
-	log "github.com/sirupsen/logrus"
-	"github.com/tidwall/gjson"
-	"github.com/tidwall/sjson"
-)
-
-type codexContinuity struct {
-	Key    string
-	Source string
-}
-
-func metadataString(meta map[string]any, key string) string {
-	if len(meta) == 0 {
-		return ""
-	}
-	raw, ok := meta[key]
-	if !ok || raw == nil {
-		return ""
-	}
-	switch v := raw.(type) {
-	case string:
-		return strings.TrimSpace(v)
-	case []byte:
-		return strings.TrimSpace(string(v))
-	default:
-		return ""
-	}
-}
-
-func principalString(raw any) string {
-	switch v := raw.(type) {
-	case string:
-		return strings.TrimSpace(v)
-	case fmt.Stringer:
-		return strings.TrimSpace(v.String())
-	default:
-		return strings.TrimSpace(fmt.Sprintf("%v", raw))
-	}
-}
-
-func resolveCodexContinuity(ctx context.Context, auth *cliproxyauth.Auth, req cliproxyexecutor.Request, opts cliproxyexecutor.Options) codexContinuity {
-	if promptCacheKey := strings.TrimSpace(gjson.GetBytes(req.Payload, "prompt_cache_key").String()); promptCacheKey != "" {
-		return codexContinuity{Key: promptCacheKey, Source: "prompt_cache_key"}
-	}
-	if executionSession := metadataString(opts.Metadata, cliproxyexecutor.ExecutionSessionMetadataKey); executionSession != "" {
-		return codexContinuity{Key: executionSession, Source: "execution_session"}
-	}
-	if ginCtx := ginContextFrom(ctx); ginCtx != nil {
-		if ginCtx.Request != nil {
-			if v := strings.TrimSpace(ginCtx.GetHeader("Idempotency-Key")); v != "" {
-				return codexContinuity{Key: v, Source: "idempotency_key"}
-			}
-		}
-		if v, exists := ginCtx.Get("apiKey"); exists && v != nil {
-			if trimmed := principalString(v); trimmed != "" {
-				return codexContinuity{Key: uuid.NewSHA1(uuid.NameSpaceOID, []byte("cli-proxy-api:codex:prompt-cache:"+trimmed)).String(), Source: "client_principal"}
-			}
-		}
-	}
-	if auth != nil {
-		if authID := strings.TrimSpace(auth.ID); authID != "" {
-			return codexContinuity{Key: uuid.NewSHA1(uuid.NameSpaceOID, []byte("cli-proxy-api:codex:prompt-cache:auth:"+authID)).String(), Source: "auth_id"}
-		}
-	}
-	return codexContinuity{}
-}
-
-func applyCodexContinuityBody(rawJSON []byte, continuity codexContinuity) []byte {
-	if continuity.Key == "" {
-		return rawJSON
-	}
-	rawJSON, _ = sjson.SetBytes(rawJSON, "prompt_cache_key", continuity.Key)
-	return rawJSON
-}
-
-func applyCodexContinuityHeaders(headers http.Header, continuity codexContinuity) {
-	if headers == nil || continuity.Key == "" {
-		return
-	}
-	headers.Set("session_id", continuity.Key)
-}
-
-func logCodexRequestDiagnostics(ctx context.Context, auth *cliproxyauth.Auth, req cliproxyexecutor.Request, opts cliproxyexecutor.Options, headers http.Header, body []byte, continuity codexContinuity) {
-	if !log.IsLevelEnabled(log.DebugLevel) {
-		return
-	}
-	entry := logWithRequestID(ctx)
-	authID := ""
-	authFile := ""
-	if auth != nil {
-		authID = strings.TrimSpace(auth.ID)
-		authFile = strings.TrimSpace(auth.FileName)
-	}
-	selectedAuthID := metadataString(opts.Metadata, cliproxyexecutor.SelectedAuthMetadataKey)
-	executionSessionID := metadataString(opts.Metadata, cliproxyexecutor.ExecutionSessionMetadataKey)
-	entry.Debugf(
-		"codex request diagnostics auth_id=%s selected_auth_id=%s auth_file=%s exec_session=%s continuity_source=%s session_id=%s prompt_cache_key=%s prompt_cache_retention=%s store=%t has_instructions=%t reasoning_effort=%s reasoning_summary=%s chatgpt_account_id=%t originator=%s model=%s source_format=%s",
-		authID,
-		selectedAuthID,
-		authFile,
-		executionSessionID,
-		continuity.Source,
-		strings.TrimSpace(headers.Get("session_id")),
-		gjson.GetBytes(body, "prompt_cache_key").String(),
-		gjson.GetBytes(body, "prompt_cache_retention").String(),
-		gjson.GetBytes(body, "store").Bool(),
-		gjson.GetBytes(body, "instructions").Exists(),
-		gjson.GetBytes(body, "reasoning.effort").String(),
-		gjson.GetBytes(body, "reasoning.summary").String(),
-		strings.TrimSpace(headers.Get("Chatgpt-Account-Id")) != "",
-		strings.TrimSpace(headers.Get("Originator")),
-		req.Model,
-		opts.SourceFormat.String(),
-	)
-}
--- a/internal/runtime/executor/codex_executor.go
+++ b/internal/runtime/executor/codex_executor.go
@@ -111,6 +111,7 @@ func (e *CodexExecutor) Execute(ctx context.Context, auth *cliproxyauth.Auth, re
 	body, _ = sjson.SetBytes(body, "model", baseModel)
 	body, _ = sjson.SetBytes(body, "stream", true)
 	body, _ = sjson.DeleteBytes(body, "previous_response_id")
+	body, _ = sjson.DeleteBytes(body, "prompt_cache_retention")
 	body, _ = sjson.DeleteBytes(body, "safety_identifier")
 	body, _ = sjson.DeleteBytes(body, "stream_options")
 	if !gjson.GetBytes(body, "instructions").Exists() {
@@ -118,12 +119,11 @@ func (e *CodexExecutor) Execute(ctx context.Context, auth *cliproxyauth.Auth, re
 	}

 	url := strings.TrimSuffix(baseURL, "/") + "/responses"
-	httpReq, continuity, err := e.cacheHelper(ctx, auth, from, url, req, opts, body)
+	httpReq, err := e.cacheHelper(ctx, from, url, req, body)
 	if err != nil {
 		return resp, err
 	}
 	applyCodexHeaders(httpReq, auth, apiKey, true, e.cfg)
-	logCodexRequestDiagnostics(ctx, auth, req, opts, httpReq.Header, body, continuity)
 	var authID, authLabel, authType, authValue string
 	if auth != nil {
 		authID = auth.ID
@@ -223,12 +223,11 @@ func (e *CodexExecutor) executeCompact(ctx context.Context, auth *cliproxyauth.A
 	body, _ = sjson.DeleteBytes(body, "stream")

 	url := strings.TrimSuffix(baseURL, "/") + "/responses/compact"
-	httpReq, continuity, err := e.cacheHelper(ctx, auth, from, url, req, opts, body)
+	httpReq, err := e.cacheHelper(ctx, from, url, req, body)
 	if err != nil {
 		return resp, err
 	}
 	applyCodexHeaders(httpReq, auth, apiKey, false, e.cfg)
-	logCodexRequestDiagnostics(ctx, auth, req, opts, httpReq.Header, body, continuity)
 	var authID, authLabel, authType, authValue string
 	if auth != nil {
 		authID = auth.ID
@@ -311,6 +310,7 @@ func (e *CodexExecutor) ExecuteStream(ctx context.Context, auth *cliproxyauth.Au
 	requestedModel := payloadRequestedModel(opts, req.Model)
 	body = applyPayloadConfigWithRoot(e.cfg, baseModel, to.String(), "", body, originalTranslated, requestedModel)
 	body, _ = sjson.DeleteBytes(body, "previous_response_id")
+	body, _ = sjson.DeleteBytes(body, "prompt_cache_retention")
 	body, _ = sjson.DeleteBytes(body, "safety_identifier")
 	body, _ = sjson.DeleteBytes(body, "stream_options")
 	body, _ = sjson.SetBytes(body, "model", baseModel)
@@ -319,12 +319,11 @@ func (e *CodexExecutor) ExecuteStream(ctx context.Context, auth *cliproxyauth.Au
 	}

 	url := strings.TrimSuffix(baseURL, "/") + "/responses"
-	httpReq, continuity, err := e.cacheHelper(ctx, auth, from, url, req, opts, body)
+	httpReq, err := e.cacheHelper(ctx, from, url, req, body)
 	if err != nil {
 		return nil, err
 	}
 	applyCodexHeaders(httpReq, auth, apiKey, true, e.cfg)
-	logCodexRequestDiagnostics(ctx, auth, req, opts, httpReq.Header, body, continuity)
 	var authID, authLabel, authType, authValue string
 	if auth != nil {
 		authID = auth.ID
@@ -600,9 +599,8 @@ func (e *CodexExecutor) Refresh(ctx context.Context, auth *cliproxyauth.Auth) (*
 	return auth, nil
 }

-func (e *CodexExecutor) cacheHelper(ctx context.Context, auth *cliproxyauth.Auth, from sdktranslator.Format, url string, req cliproxyexecutor.Request, opts cliproxyexecutor.Options, rawJSON []byte) (*http.Request, codexContinuity, error) {
+func (e *CodexExecutor) cacheHelper(ctx context.Context, from sdktranslator.Format, url string, req cliproxyexecutor.Request, rawJSON []byte) (*http.Request, error) {
 	var cache codexCache
-	continuity := codexContinuity{}
 	if from == "claude" {
 		userIDResult := gjson.GetBytes(req.Payload, "metadata.user_id")
 		if userIDResult.Exists() {
@@ -615,26 +613,30 @@ func (e *CodexExecutor) cacheHelper(ctx context.Context, auth *cliproxyauth.Auth
 				}
 				setCodexCache(key, cache)
 			}
-			continuity = codexContinuity{Key: cache.ID, Source: "claude_user_cache"}
 		}
 	} else if from == "openai-response" {
 		promptCacheKey := gjson.GetBytes(req.Payload, "prompt_cache_key")
 		if promptCacheKey.Exists() {
 			cache.ID = promptCacheKey.String()
-			continuity = codexContinuity{Key: cache.ID, Source: "prompt_cache_key"}
 		}
 	} else if from == "openai" {
-		continuity = resolveCodexContinuity(ctx, auth, req, opts)
-		cache.ID = continuity.Key
+		if apiKey := strings.TrimSpace(apiKeyFromContext(ctx)); apiKey != "" {
+			cache.ID = uuid.NewSHA1(uuid.NameSpaceOID, []byte("cli-proxy-api:codex:prompt-cache:"+apiKey)).String()
+		}
 	}

-	rawJSON = applyCodexContinuityBody(rawJSON, continuity)
+	if cache.ID != "" {
+		rawJSON, _ = sjson.SetBytes(rawJSON, "prompt_cache_key", cache.ID)
+	}
 	httpReq, err := http.NewRequestWithContext(ctx, http.MethodPost, url, bytes.NewReader(rawJSON))
 	if err != nil {
-		return nil, continuity, err
+		return nil, err
 	}
-	applyCodexContinuityHeaders(httpReq.Header, continuity)
-	return httpReq, continuity, nil
+	if cache.ID != "" {
+		httpReq.Header.Set("Conversation_id", cache.ID)
+		httpReq.Header.Set("Session_id", cache.ID)
+	}
+	return httpReq, nil
 }

 func applyCodexHeaders(r *http.Request, auth *cliproxyauth.Auth, token string, stream bool, cfg *config.Config) {
@@ -647,7 +649,7 @@ func applyCodexHeaders(r *http.Request, auth *cliproxyauth.Auth, token string, s
 	}

 	misc.EnsureHeader(r.Header, ginHeaders, "Version", "")
-	misc.EnsureHeader(r.Header, ginHeaders, "session_id", uuid.NewString())
+	misc.EnsureHeader(r.Header, ginHeaders, "Session_id", uuid.NewString())
 	misc.EnsureHeader(r.Header, ginHeaders, "X-Codex-Turn-Metadata", "")
 	misc.EnsureHeader(r.Header, ginHeaders, "X-Client-Request-Id", "")
 	cfgUserAgent, _ := codexHeaderDefaults(cfg, auth)
--- a/internal/runtime/executor/codex_executor_cache_test.go
+++ b/internal/runtime/executor/codex_executor_cache_test.go
@@ -8,7 +8,6 @@ import (

 	"github.com/gin-gonic/gin"
 	"github.com/google/uuid"
-	cliproxyauth "github.com/router-for-me/CLIProxyAPI/v6/sdk/cliproxy/auth"
 	cliproxyexecutor "github.com/router-for-me/CLIProxyAPI/v6/sdk/cliproxy/executor"
 	sdktranslator "github.com/router-for-me/CLIProxyAPI/v6/sdk/translator"
 	"github.com/tidwall/gjson"
@@ -28,7 +27,7 @@ func TestCodexExecutorCacheHelper_OpenAIChatCompletions_StablePromptCacheKeyFrom
 	}
 	url := "https://example.com/responses"

-	httpReq, _, err := executor.cacheHelper(ctx, nil, sdktranslator.FromString("openai"), url, req, cliproxyexecutor.Options{}, rawJSON)
+	httpReq, err := executor.cacheHelper(ctx, sdktranslator.FromString("openai"), url, req, rawJSON)
 	if err != nil {
 		t.Fatalf("cacheHelper error: %v", err)
 	}
@@ -43,14 +42,14 @@ func TestCodexExecutorCacheHelper_OpenAIChatCompletions_StablePromptCacheKeyFrom
 	if gotKey != expectedKey {
 		t.Fatalf("prompt_cache_key = %q, want %q", gotKey, expectedKey)
 	}
-	if gotSession := httpReq.Header.Get("session_id"); gotSession != expectedKey {
-		t.Fatalf("session_id = %q, want %q", gotSession, expectedKey)
+	if gotConversation := httpReq.Header.Get("Conversation_id"); gotConversation != expectedKey {
+		t.Fatalf("Conversation_id = %q, want %q", gotConversation, expectedKey)
 	}
-	if got := httpReq.Header.Get("Conversation_id"); got != "" {
-		t.Fatalf("Conversation_id = %q, want empty", got)
+	if gotSession := httpReq.Header.Get("Session_id"); gotSession != expectedKey {
+		t.Fatalf("Session_id = %q, want %q", gotSession, expectedKey)
 	}

-	httpReq2, _, err := executor.cacheHelper(ctx, nil, sdktranslator.FromString("openai"), url, req, cliproxyexecutor.Options{}, rawJSON)
+	httpReq2, err := executor.cacheHelper(ctx, sdktranslator.FromString("openai"), url, req, rawJSON)
 	if err != nil {
 		t.Fatalf("cacheHelper error (second call): %v", err)
 	}
@@ -63,118 +62,3 @@ func TestCodexExecutorCacheHelper_OpenAIChatCompletions_StablePromptCacheKeyFrom
 		t.Fatalf("prompt_cache_key (second call) = %q, want %q", gotKey2, expectedKey)
 	}
 }
-
-func TestCodexExecutorCacheHelper_OpenAIResponses_PreservesPromptCacheRetention(t *testing.T) {
-	executor := &CodexExecutor{}
-	url := "https://example.com/responses"
-	req := cliproxyexecutor.Request{
-		Model:   "gpt-5.3-codex",
-		Payload: []byte(`{"model":"gpt-5.3-codex","prompt_cache_key":"cache-key-1","prompt_cache_retention":"persistent"}`),
-	}
-	rawJSON := []byte(`{"model":"gpt-5.3-codex","stream":true,"prompt_cache_retention":"persistent"}`)
-
-	httpReq, _, err := executor.cacheHelper(context.Background(), nil, sdktranslator.FromString("openai-response"), url, req, cliproxyexecutor.Options{}, rawJSON)
-	if err != nil {
-		t.Fatalf("cacheHelper error: %v", err)
-	}
-
-	body, err := io.ReadAll(httpReq.Body)
-	if err != nil {
-		t.Fatalf("read request body: %v", err)
-	}
-
-	if got := gjson.GetBytes(body, "prompt_cache_key").String(); got != "cache-key-1" {
-		t.Fatalf("prompt_cache_key = %q, want %q", got, "cache-key-1")
-	}
-	if got := gjson.GetBytes(body, "prompt_cache_retention").String(); got != "persistent" {
-		t.Fatalf("prompt_cache_retention = %q, want %q", got, "persistent")
-	}
-	if got := httpReq.Header.Get("session_id"); got != "cache-key-1" {
-		t.Fatalf("session_id = %q, want %q", got, "cache-key-1")
-	}
-	if got := httpReq.Header.Get("Conversation_id"); got != "" {
-		t.Fatalf("Conversation_id = %q, want empty", got)
-	}
-}
-
-func TestCodexExecutorCacheHelper_OpenAIChatCompletions_UsesExecutionSessionForContinuity(t *testing.T) {
-	executor := &CodexExecutor{}
-	rawJSON := []byte(`{"model":"gpt-5.4","stream":true}`)
-	req := cliproxyexecutor.Request{
-		Model:   "gpt-5.4",
-		Payload: []byte(`{"model":"gpt-5.4"}`),
-	}
-	opts := cliproxyexecutor.Options{Metadata: map[string]any{cliproxyexecutor.ExecutionSessionMetadataKey: "exec-session-1"}}
-
-	httpReq, _, err := executor.cacheHelper(context.Background(), nil, sdktranslator.FromString("openai"), "https://example.com/responses", req, opts, rawJSON)
-	if err != nil {
-		t.Fatalf("cacheHelper error: %v", err)
-	}
-
-	body, err := io.ReadAll(httpReq.Body)
-	if err != nil {
-		t.Fatalf("read request body: %v", err)
-	}
-
-	if got := gjson.GetBytes(body, "prompt_cache_key").String(); got != "exec-session-1" {
-		t.Fatalf("prompt_cache_key = %q, want %q", got, "exec-session-1")
-	}
-	if got := httpReq.Header.Get("session_id"); got != "exec-session-1" {
-		t.Fatalf("session_id = %q, want %q", got, "exec-session-1")
-	}
-}
-
-func TestCodexExecutorCacheHelper_OpenAIChatCompletions_FallsBackToStableAuthID(t *testing.T) {
-	executor := &CodexExecutor{}
-	rawJSON := []byte(`{"model":"gpt-5.4","stream":true}`)
-	req := cliproxyexecutor.Request{
-		Model:   "gpt-5.4",
-		Payload: []byte(`{"model":"gpt-5.4"}`),
-	}
-	auth := &cliproxyauth.Auth{ID: "codex-auth-1", Provider: "codex"}
-
-	httpReq, _, err := executor.cacheHelper(context.Background(), auth, sdktranslator.FromString("openai"), "https://example.com/responses", req, cliproxyexecutor.Options{}, rawJSON)
-	if err != nil {
-		t.Fatalf("cacheHelper error: %v", err)
-	}
-
-	body, err := io.ReadAll(httpReq.Body)
-	if err != nil {
-		t.Fatalf("read request body: %v", err)
-	}
-
-	expected := uuid.NewSHA1(uuid.NameSpaceOID, []byte("cli-proxy-api:codex:prompt-cache:auth:codex-auth-1")).String()
-	if got := gjson.GetBytes(body, "prompt_cache_key").String(); got != expected {
-		t.Fatalf("prompt_cache_key = %q, want %q", got, expected)
-	}
-	if got := httpReq.Header.Get("session_id"); got != expected {
-		t.Fatalf("session_id = %q, want %q", got, expected)
-	}
-}
-
-func TestCodexExecutorCacheHelper_ClaudePreservesCacheContinuity(t *testing.T) {
-	executor := &CodexExecutor{}
-	req := cliproxyexecutor.Request{
-		Model:   "claude-3-7-sonnet",
-		Payload: []byte(`{"metadata":{"user_id":"user-1"}}`),
-	}
-	rawJSON := []byte(`{"model":"gpt-5.4","stream":true}`)
-
-	httpReq, continuity, err := executor.cacheHelper(context.Background(), nil, sdktranslator.FromString("claude"), "https://example.com/responses", req, cliproxyexecutor.Options{}, rawJSON)
-	if err != nil {
-		t.Fatalf("cacheHelper error: %v", err)
-	}
-	if continuity.Key == "" {
-		t.Fatal("continuity.Key = empty, want non-empty")
-	}
-	body, err := io.ReadAll(httpReq.Body)
-	if err != nil {
-		t.Fatalf("read request body: %v", err)
-	}
-	if got := gjson.GetBytes(body, "prompt_cache_key").String(); got != continuity.Key {
-		t.Fatalf("prompt_cache_key = %q, want %q", got, continuity.Key)
-	}
-	if got := httpReq.Header.Get("session_id"); got != continuity.Key {
-		t.Fatalf("session_id = %q, want %q", got, continuity.Key)
-	}
-}
--- a/internal/runtime/executor/codex_websockets_executor.go
+++ b/internal/runtime/executor/codex_websockets_executor.go
@@ -178,6 +178,7 @@ func (e *CodexWebsocketsExecutor) Execute(ctx context.Context, auth *cliproxyaut
 	body, _ = sjson.SetBytes(body, "model", baseModel)
 	body, _ = sjson.SetBytes(body, "stream", true)
 	body, _ = sjson.DeleteBytes(body, "previous_response_id")
+	body, _ = sjson.DeleteBytes(body, "prompt_cache_retention")
 	body, _ = sjson.DeleteBytes(body, "safety_identifier")
 	if !gjson.GetBytes(body, "instructions").Exists() {
 		body, _ = sjson.SetBytes(body, "instructions", "")
@@ -189,7 +190,7 @@ func (e *CodexWebsocketsExecutor) Execute(ctx context.Context, auth *cliproxyaut
 		return resp, err
 	}

-	body, wsHeaders, continuity := applyCodexPromptCacheHeaders(ctx, auth, from, req, opts, body)
+	body, wsHeaders := applyCodexPromptCacheHeaders(from, req, body)
 	wsHeaders = applyCodexWebsocketHeaders(ctx, wsHeaders, auth, apiKey, e.cfg)

 	var authID, authLabel, authType, authValue string
@@ -208,7 +209,6 @@ func (e *CodexWebsocketsExecutor) Execute(ctx context.Context, auth *cliproxyaut
 	}

 	wsReqBody := buildCodexWebsocketRequestBody(body)
-	logCodexRequestDiagnostics(ctx, auth, req, opts, wsHeaders, body, continuity)
 	recordAPIRequest(ctx, e.cfg, upstreamRequestLog{
 		URL:       wsURL,
 		Method:    "WEBSOCKET",
@@ -385,7 +385,7 @@ func (e *CodexWebsocketsExecutor) ExecuteStream(ctx context.Context, auth *clipr
 		return nil, err
 	}

-	body, wsHeaders, continuity := applyCodexPromptCacheHeaders(ctx, auth, from, req, opts, body)
+	body, wsHeaders := applyCodexPromptCacheHeaders(from, req, body)
 	wsHeaders = applyCodexWebsocketHeaders(ctx, wsHeaders, auth, apiKey, e.cfg)

 	var authID, authLabel, authType, authValue string
@@ -403,7 +403,6 @@ func (e *CodexWebsocketsExecutor) ExecuteStream(ctx context.Context, auth *clipr
 	}

 	wsReqBody := buildCodexWebsocketRequestBody(body)
-	logCodexRequestDiagnostics(ctx, auth, req, opts, wsHeaders, body, continuity)
 	recordAPIRequest(ctx, e.cfg, upstreamRequestLog{
 		URL:       wsURL,
 		Method:    "WEBSOCKET",
@@ -762,14 +761,13 @@ func buildCodexResponsesWebsocketURL(httpURL string) (string, error) {
 	return parsed.String(), nil
 }

-func applyCodexPromptCacheHeaders(ctx context.Context, auth *cliproxyauth.Auth, from sdktranslator.Format, req cliproxyexecutor.Request, opts cliproxyexecutor.Options, rawJSON []byte) ([]byte, http.Header, codexContinuity) {
+func applyCodexPromptCacheHeaders(from sdktranslator.Format, req cliproxyexecutor.Request, rawJSON []byte) ([]byte, http.Header) {
 	headers := http.Header{}
 	if len(rawJSON) == 0 {
-		return rawJSON, headers, codexContinuity{}
+		return rawJSON, headers
 	}

 	var cache codexCache
-	continuity := codexContinuity{}
 	if from == "claude" {
 		userIDResult := gjson.GetBytes(req.Payload, "metadata.user_id")
 		if userIDResult.Exists() {
@@ -783,22 +781,20 @@ func applyCodexPromptCacheHeaders(ctx context.Context, auth *cliproxyauth.Auth,
 				}
 				setCodexCache(key, cache)
 			}
-			continuity = codexContinuity{Key: cache.ID, Source: "claude_user_cache"}
 		}
 	} else if from == "openai-response" {
 		if promptCacheKey := gjson.GetBytes(req.Payload, "prompt_cache_key"); promptCacheKey.Exists() {
 			cache.ID = promptCacheKey.String()
-			continuity = codexContinuity{Key: cache.ID, Source: "prompt_cache_key"}
 		}
-	} else if from == "openai" {
-		continuity = resolveCodexContinuity(ctx, auth, req, opts)
-		cache.ID = continuity.Key
 	}

-	rawJSON = applyCodexContinuityBody(rawJSON, continuity)
-	applyCodexContinuityHeaders(headers, continuity)
+	if cache.ID != "" {
+		rawJSON, _ = sjson.SetBytes(rawJSON, "prompt_cache_key", cache.ID)
+		headers.Set("Conversation_id", cache.ID)
+		headers.Set("Session_id", cache.ID)
+	}

-	return rawJSON, headers, continuity
+	return rawJSON, headers
 }

 func applyCodexWebsocketHeaders(ctx context.Context, headers http.Header, auth *cliproxyauth.Auth, token string, cfg *config.Config) http.Header {
@@ -830,7 +826,7 @@ func applyCodexWebsocketHeaders(ctx context.Context, headers http.Header, auth *
 		betaHeader = codexResponsesWebsocketBetaHeaderValue
 	}
 	headers.Set("OpenAI-Beta", betaHeader)
-	misc.EnsureHeader(headers, ginHeaders, "session_id", uuid.NewString())
+	misc.EnsureHeader(headers, ginHeaders, "Session_id", uuid.NewString())
 	ensureHeaderWithConfigPrecedence(headers, ginHeaders, "User-Agent", cfgUserAgent, codexUserAgent)

 	isAPIKey := false
--- a/internal/runtime/executor/codex_websockets_executor_test.go
+++ b/internal/runtime/executor/codex_websockets_executor_test.go
@@ -9,9 +9,7 @@ import (
 	"github.com/gin-gonic/gin"
 	"github.com/router-for-me/CLIProxyAPI/v6/internal/config"
 	cliproxyauth "github.com/router-for-me/CLIProxyAPI/v6/sdk/cliproxy/auth"
-	cliproxyexecutor "github.com/router-for-me/CLIProxyAPI/v6/sdk/cliproxy/executor"
 	sdkconfig "github.com/router-for-me/CLIProxyAPI/v6/sdk/config"
-	sdktranslator "github.com/router-for-me/CLIProxyAPI/v6/sdk/translator"
 	"github.com/tidwall/gjson"
 )

@@ -34,49 +32,6 @@ func TestBuildCodexWebsocketRequestBodyPreservesPreviousResponseID(t *testing.T)
 	}
 }

-func TestApplyCodexPromptCacheHeaders_PreservesPromptCacheRetention(t *testing.T) {
-	req := cliproxyexecutor.Request{
-		Model:   "gpt-5-codex",
-		Payload: []byte(`{"prompt_cache_key":"cache-key-1","prompt_cache_retention":"persistent"}`),
-	}
-	body := []byte(`{"model":"gpt-5-codex","stream":true,"prompt_cache_retention":"persistent"}`)
-
-	updatedBody, headers, _ := applyCodexPromptCacheHeaders(context.Background(), nil, sdktranslator.FromString("openai-response"), req, cliproxyexecutor.Options{}, body)
-
-	if got := gjson.GetBytes(updatedBody, "prompt_cache_key").String(); got != "cache-key-1" {
-		t.Fatalf("prompt_cache_key = %q, want %q", got, "cache-key-1")
-	}
-	if got := gjson.GetBytes(updatedBody, "prompt_cache_retention").String(); got != "persistent" {
-		t.Fatalf("prompt_cache_retention = %q, want %q", got, "persistent")
-	}
-	if got := headers.Get("session_id"); got != "cache-key-1" {
-		t.Fatalf("session_id = %q, want %q", got, "cache-key-1")
-	}
-	if got := headers.Get("Conversation_id"); got != "" {
-		t.Fatalf("Conversation_id = %q, want empty", got)
-	}
-}
-
-func TestApplyCodexPromptCacheHeaders_ClaudePreservesContinuity(t *testing.T) {
-	req := cliproxyexecutor.Request{
-		Model:   "claude-3-7-sonnet",
-		Payload: []byte(`{"metadata":{"user_id":"user-1"}}`),
-	}
-	body := []byte(`{"model":"gpt-5.4","stream":true}`)
-
-	updatedBody, headers, continuity := applyCodexPromptCacheHeaders(context.Background(), nil, sdktranslator.FromString("claude"), req, cliproxyexecutor.Options{}, body)
-
-	if continuity.Key == "" {
-		t.Fatal("continuity.Key = empty, want non-empty")
-	}
-	if got := gjson.GetBytes(updatedBody, "prompt_cache_key").String(); got != continuity.Key {
-		t.Fatalf("prompt_cache_key = %q, want %q", got, continuity.Key)
-	}
-	if got := headers.Get("session_id"); got != continuity.Key {
-		t.Fatalf("session_id = %q, want %q", got, continuity.Key)
-	}
-}
-
 func TestApplyCodexWebsocketHeadersDefaultsToCurrentResponsesBeta(t *testing.T) {
 	headers := applyCodexWebsocketHeaders(context.Background(), http.Header{}, nil, "", nil)

--- a/internal/translator/antigravity/claude/antigravity_claude_request.go
+++ b/internal/translator/antigravity/claude/antigravity_claude_request.go
@@ -330,32 +330,45 @@ func ConvertClaudeRequestToAntigravity(modelName string, inputRawJSON []byte, _
 					}
 				}

-				// Reorder parts for 'model' role to ensure thinking block is first
+				// Reorder parts for 'model' role:
+				// 1. Thinking parts first (Antigravity API requirement)
+				// 2. Regular parts (text, inlineData, etc.)
+				// 3. FunctionCall parts last
+				//
+				// Moving functionCall parts to the end prevents tool_use↔tool_result
+				// pairing breakage: the Antigravity API internally splits model messages
+				// at functionCall boundaries. If a text part follows a functionCall, the
+				// split creates an extra assistant turn between tool_use and tool_result,
+				// which Claude rejects with "tool_use ids were found without tool_result
+				// blocks immediately after".
 				if role == "model" {
 					partsResult := gjson.GetBytes(clientContentJSON, "parts")
 					if partsResult.IsArray() {
 						parts := partsResult.Array()
-						var thinkingParts []gjson.Result
-						var otherParts []gjson.Result
-						for _, part := range parts {
-							if part.Get("thought").Bool() {
-								thinkingParts = append(thinkingParts, part)
-							} else {
-								otherParts = append(otherParts, part)
-							}
-						}
-						if len(thinkingParts) > 0 {
-							firstPartIsThinking := parts[0].Get("thought").Bool()
-							if !firstPartIsThinking || len(thinkingParts) > 1 {
-								var newParts []interface{}
-								for _, p := range thinkingParts {
-									newParts = append(newParts, p.Value())
+						if len(parts) > 1 {
+							var thinkingParts []gjson.Result
+							var regularParts []gjson.Result
+							var functionCallParts []gjson.Result
+							for _, part := range parts {
+								if part.Get("thought").Bool() {
+									thinkingParts = append(thinkingParts, part)
+								} else if part.Get("functionCall").Exists() {
+									functionCallParts = append(functionCallParts, part)
+								} else {
+									regularParts = append(regularParts, part)
 								}
-								for _, p := range otherParts {
-									newParts = append(newParts, p.Value())
-								}
-								clientContentJSON, _ = sjson.SetBytes(clientContentJSON, "parts", newParts)
 							}
+							var newParts []interface{}
+							for _, p := range thinkingParts {
+								newParts = append(newParts, p.Value())
+							}
+							for _, p := range regularParts {
+								newParts = append(newParts, p.Value())
+							}
+							for _, p := range functionCallParts {
+								newParts = append(newParts, p.Value())
+							}
+							clientContentJSON, _ = sjson.SetBytes(clientContentJSON, "parts", newParts)
 						}
 					}
 				}
--- a/internal/translator/antigravity/claude/antigravity_claude_request_test.go
+++ b/internal/translator/antigravity/claude/antigravity_claude_request_test.go
@@ -361,6 +361,167 @@ func TestConvertClaudeRequestToAntigravity_ReorderThinking(t *testing.T) {
 	}
 }

+func TestConvertClaudeRequestToAntigravity_ReorderTextAfterFunctionCall(t *testing.T) {
+	// Bug: text part after tool_use in an assistant message causes Antigravity
+	// to split at functionCall boundary, creating an extra assistant turn that
+	// breaks tool_use↔tool_result adjacency (upstream issue #989).
+	// Fix: reorder parts so functionCall comes last.
+	inputJSON := []byte(`{
+		"model": "claude-sonnet-4-5",
+		"messages": [
+			{
+				"role": "assistant",
+				"content": [
+					{"type": "text", "text": "Let me check..."},
+					{
+						"type": "tool_use",
+						"id": "call_abc",
+						"name": "Read",
+						"input": {"file": "test.go"}
+					},
+					{"type": "text", "text": "Reading the file now"}
+				]
+			},
+			{
+				"role": "user",
+				"content": [
+					{
+						"type": "tool_result",
+						"tool_use_id": "call_abc",
+						"content": "file content"
+					}
+				]
+			}
+		]
+	}`)
+
+	output := ConvertClaudeRequestToAntigravity("claude-sonnet-4-5", inputJSON, false)
+	outputStr := string(output)
+
+	parts := gjson.Get(outputStr, "request.contents.0.parts").Array()
+	if len(parts) != 3 {
+		t.Fatalf("Expected 3 parts, got %d", len(parts))
+	}
+
+	// Text parts should come before functionCall
+	if parts[0].Get("text").String() != "Let me check..." {
+		t.Errorf("Expected first text part first, got %s", parts[0].Raw)
+	}
+	if parts[1].Get("text").String() != "Reading the file now" {
+		t.Errorf("Expected second text part second, got %s", parts[1].Raw)
+	}
+	if !parts[2].Get("functionCall").Exists() {
+		t.Errorf("Expected functionCall last, got %s", parts[2].Raw)
+	}
+	if parts[2].Get("functionCall.name").String() != "Read" {
+		t.Errorf("Expected functionCall name 'Read', got '%s'", parts[2].Get("functionCall.name").String())
+	}
+}
+
+func TestConvertClaudeRequestToAntigravity_ReorderParallelFunctionCalls(t *testing.T) {
+	inputJSON := []byte(`{
+		"model": "claude-sonnet-4-5",
+		"messages": [
+			{
+				"role": "assistant",
+				"content": [
+					{"type": "text", "text": "Reading both files."},
+					{
+						"type": "tool_use",
+						"id": "call_1",
+						"name": "Read",
+						"input": {"file": "a.go"}
+					},
+					{"type": "text", "text": "And this one too."},
+					{
+						"type": "tool_use",
+						"id": "call_2",
+						"name": "Read",
+						"input": {"file": "b.go"}
+					}
+				]
+			}
+		]
+	}`)
+
+	output := ConvertClaudeRequestToAntigravity("claude-sonnet-4-5", inputJSON, false)
+	outputStr := string(output)
+
+	parts := gjson.Get(outputStr, "request.contents.0.parts").Array()
+	if len(parts) != 4 {
+		t.Fatalf("Expected 4 parts, got %d", len(parts))
+	}
+
+	if parts[0].Get("text").String() != "Reading both files." {
+		t.Errorf("Expected first text, got %s", parts[0].Raw)
+	}
+	if parts[1].Get("text").String() != "And this one too." {
+		t.Errorf("Expected second text, got %s", parts[1].Raw)
+	}
+	if parts[2].Get("functionCall.name").String() != "Read" || parts[2].Get("functionCall.id").String() != "call_1" {
+		t.Errorf("Expected fc1 third, got %s", parts[2].Raw)
+	}
+	if parts[3].Get("functionCall.name").String() != "Read" || parts[3].Get("functionCall.id").String() != "call_2" {
+		t.Errorf("Expected fc2 fourth, got %s", parts[3].Raw)
+	}
+}
+
+func TestConvertClaudeRequestToAntigravity_ReorderThinkingAndTextBeforeFunctionCall(t *testing.T) {
+	cache.ClearSignatureCache("")
+
+	validSignature := "abc123validSignature1234567890123456789012345678901234567890"
+	thinkingText := "Let me think about this..."
+
+	inputJSON := []byte(`{
+		"model": "claude-sonnet-4-5-thinking",
+		"messages": [
+			{
+				"role": "user",
+				"content": [{"type": "text", "text": "Hello"}]
+			},
+			{
+				"role": "assistant",
+				"content": [
+					{"type": "text", "text": "Before thinking"},
+					{"type": "thinking", "thinking": "` + thinkingText + `", "signature": "` + validSignature + `"},
+					{
+						"type": "tool_use",
+						"id": "call_xyz",
+						"name": "Bash",
+						"input": {"command": "ls"}
+					},
+					{"type": "text", "text": "After tool call"}
+				]
+			}
+		]
+	}`)
+
+	cache.CacheSignature("claude-sonnet-4-5-thinking", thinkingText, validSignature)
+
+	output := ConvertClaudeRequestToAntigravity("claude-sonnet-4-5-thinking", inputJSON, false)
+	outputStr := string(output)
+
+	// contents.1 = assistant message (contents.0 = user)
+	parts := gjson.Get(outputStr, "request.contents.1.parts").Array()
+	if len(parts) != 4 {
+		t.Fatalf("Expected 4 parts, got %d", len(parts))
+	}
+
+	// Order: thinking → text → text → functionCall
+	if !parts[0].Get("thought").Bool() {
+		t.Error("First part should be thinking")
+	}
+	if parts[1].Get("functionCall").Exists() || parts[1].Get("thought").Bool() {
+		t.Errorf("Second part should be text, got %s", parts[1].Raw)
+	}
+	if parts[2].Get("functionCall").Exists() || parts[2].Get("thought").Bool() {
+		t.Errorf("Third part should be text, got %s", parts[2].Raw)
+	}
+	if !parts[3].Get("functionCall").Exists() {
+		t.Errorf("Last part should be functionCall, got %s", parts[3].Raw)
+	}
+}
+
 func TestConvertClaudeRequestToAntigravity_ToolResult(t *testing.T) {
 	inputJSON := []byte(`{
 		"model": "claude-3-5-sonnet-20240620",
--- a/sdk/cliproxy/auth/conductor.go
+++ b/sdk/cliproxy/auth/conductor.go
@@ -1734,77 +1734,79 @@ func (m *Manager) MarkResult(ctx context.Context, result Result) {
 			}
 		} else {
 			if result.Model != "" {
-				state := ensureModelState(auth, result.Model)
-				state.Unavailable = true
-				state.Status = StatusError
-				state.UpdatedAt = now
-				if result.Error != nil {
-					state.LastError = cloneError(result.Error)
-					state.StatusMessage = result.Error.Message
-					auth.LastError = cloneError(result.Error)
-					auth.StatusMessage = result.Error.Message
-				}
+				if !isRequestScopedNotFoundResultError(result.Error) {
+					state := ensureModelState(auth, result.Model)
+					state.Unavailable = true
+					state.Status = StatusError
+					state.UpdatedAt = now
+					if result.Error != nil {
+						state.LastError = cloneError(result.Error)
+						state.StatusMessage = result.Error.Message
+						auth.LastError = cloneError(result.Error)
+						auth.StatusMessage = result.Error.Message
+					}

-				statusCode := statusCodeFromResult(result.Error)
-				if isModelSupportResultError(result.Error) {
-					next := now.Add(12 * time.Hour)
-					state.NextRetryAfter = next
-					suspendReason = "model_not_supported"
-					shouldSuspendModel = true
-				} else {
-					switch statusCode {
-					case 401:
-						next := now.Add(30 * time.Minute)
-						state.NextRetryAfter = next
-						suspendReason = "unauthorized"
-						shouldSuspendModel = true
-					case 402, 403:
-						next := now.Add(30 * time.Minute)
-						state.NextRetryAfter = next
-						suspendReason = "payment_required"
-						shouldSuspendModel = true
-					case 404:
+					statusCode := statusCodeFromResult(result.Error)
+					if isModelSupportResultError(result.Error) {
 						next := now.Add(12 * time.Hour)
 						state.NextRetryAfter = next
-						suspendReason = "not_found"
+						suspendReason = "model_not_supported"
 						shouldSuspendModel = true
-					case 429:
-						var next time.Time
-						backoffLevel := state.Quota.BackoffLevel
-						if result.RetryAfter != nil {
-							next = now.Add(*result.RetryAfter)
-						} else {
-							cooldown, nextLevel := nextQuotaCooldown(backoffLevel, quotaCooldownDisabledForAuth(auth))
-							if cooldown > 0 {
-								next = now.Add(cooldown)
-							}
-							backoffLevel = nextLevel
-						}
-						state.NextRetryAfter = next
-						state.Quota = QuotaState{
-							Exceeded:      true,
-							Reason:        "quota",
-							NextRecoverAt: next,
-							BackoffLevel:  backoffLevel,
-						}
-						suspendReason = "quota"
-						shouldSuspendModel = true
-						setModelQuota = true
-					case 408, 500, 502, 503, 504:
-						if quotaCooldownDisabledForAuth(auth) {
-							state.NextRetryAfter = time.Time{}
-						} else {
-							next := now.Add(1 * time.Minute)
+					} else {
+						switch statusCode {
+						case 401:
+							next := now.Add(30 * time.Minute)
 							state.NextRetryAfter = next
+							suspendReason = "unauthorized"
+							shouldSuspendModel = true
+						case 402, 403:
+							next := now.Add(30 * time.Minute)
+							state.NextRetryAfter = next
+							suspendReason = "payment_required"
+							shouldSuspendModel = true
+						case 404:
+							next := now.Add(12 * time.Hour)
+							state.NextRetryAfter = next
+							suspendReason = "not_found"
+							shouldSuspendModel = true
+						case 429:
+							var next time.Time
+							backoffLevel := state.Quota.BackoffLevel
+							if result.RetryAfter != nil {
+								next = now.Add(*result.RetryAfter)
+							} else {
+								cooldown, nextLevel := nextQuotaCooldown(backoffLevel, quotaCooldownDisabledForAuth(auth))
+								if cooldown > 0 {
+									next = now.Add(cooldown)
+								}
+								backoffLevel = nextLevel
+							}
+							state.NextRetryAfter = next
+							state.Quota = QuotaState{
+								Exceeded:      true,
+								Reason:        "quota",
+								NextRecoverAt: next,
+								BackoffLevel:  backoffLevel,
+							}
+							suspendReason = "quota"
+							shouldSuspendModel = true
+							setModelQuota = true
+						case 408, 500, 502, 503, 504:
+							if quotaCooldownDisabledForAuth(auth) {
+								state.NextRetryAfter = time.Time{}
+							} else {
+								next := now.Add(1 * time.Minute)
+								state.NextRetryAfter = next
+							}
+						default:
+							state.NextRetryAfter = time.Time{}
 						}
-					default:
-						state.NextRetryAfter = time.Time{}
 					}
-				}

-				auth.Status = StatusError
-				auth.UpdatedAt = now
-				updateAggregatedAvailability(auth, now)
+					auth.Status = StatusError
+					auth.UpdatedAt = now
+					updateAggregatedAvailability(auth, now)
+				}
 			} else {
 				applyAuthFailureState(auth, result.Error, result.RetryAfter, now)
 			}
@@ -2056,11 +2058,29 @@ func isModelSupportResultError(err *Error) bool {
 	return isModelSupportErrorMessage(err.Message)
 }

+func isRequestScopedNotFoundMessage(message string) bool {
+	if message == "" {
+		return false
+	}
+	lower := strings.ToLower(message)
+	return strings.Contains(lower, "item with id") &&
+		strings.Contains(lower, "not found") &&
+		strings.Contains(lower, "items are not persisted when `store` is set to false")
+}
+
+func isRequestScopedNotFoundResultError(err *Error) bool {
+	if err == nil || statusCodeFromResult(err) != http.StatusNotFound {
+		return false
+	}
+	return isRequestScopedNotFoundMessage(err.Message)
+}
+
 // isRequestInvalidError returns true if the error represents a client request
 // error that should not be retried. Specifically, it treats 400 responses with
-// "invalid_request_error" and all 422 responses as request-shape failures,
-// where switching auths or pooled upstream models will not help. Model-support
-// errors are excluded so routing can fall through to another auth or upstream.
+// "invalid_request_error", request-scoped 404 item misses caused by `store=false`,
+// and all 422 responses as request-shape failures, where switching auths or
+// pooled upstream models will not help. Model-support errors are excluded so
+// routing can fall through to another auth or upstream.
 func isRequestInvalidError(err error) bool {
 	if err == nil {
 		return false
@@ -2072,6 +2092,8 @@ func isRequestInvalidError(err error) bool {
 	switch status {
 	case http.StatusBadRequest:
 		return strings.Contains(err.Error(), "invalid_request_error")
+	case http.StatusNotFound:
+		return isRequestScopedNotFoundMessage(err.Error())
 	case http.StatusUnprocessableEntity:
 		return true
 	default:
@@ -2083,6 +2105,9 @@ func applyAuthFailureState(auth *Auth, resultErr *Error, retryAfter *time.Durati
 	if auth == nil {
 		return
 	}
+	if isRequestScopedNotFoundResultError(resultErr) {
+		return
+	}
 	auth.Unavailable = true
 	auth.Status = StatusError
 	auth.UpdatedAt = now
--- a/sdk/cliproxy/auth/conductor_overrides_test.go
+++ b/sdk/cliproxy/auth/conductor_overrides_test.go
@@ -12,6 +12,8 @@ import (
 	cliproxyexecutor "github.com/router-for-me/CLIProxyAPI/v6/sdk/cliproxy/executor"
 )

+const requestScopedNotFoundMessage = "Item with id 'rs_0b5f3eb6f51f175c0169ca74e4a85881998539920821603a74' not found. Items are not persisted when `store` is set to false. Try again with `store` set to true, or remove this item from your input."
+
 func TestManager_ShouldRetryAfterError_RespectsAuthRequestRetryOverride(t *testing.T) {
 	m := NewManager(nil, nil, nil)
 	m.SetRetryConfig(3, 30*time.Second, 0)
@@ -447,3 +449,114 @@ func TestManager_MarkResult_RespectsAuthDisableCoolingOverride(t *testing.T) {
 		t.Fatalf("expected NextRetryAfter to be zero when disable_cooling=true, got %v", state.NextRetryAfter)
 	}
 }
+
+func TestManager_MarkResult_RequestScopedNotFoundDoesNotCooldownAuth(t *testing.T) {
+	m := NewManager(nil, nil, nil)
+
+	auth := &Auth{
+		ID:       "auth-1",
+		Provider: "openai",
+	}
+	if _, errRegister := m.Register(context.Background(), auth); errRegister != nil {
+		t.Fatalf("register auth: %v", errRegister)
+	}
+
+	model := "gpt-4.1"
+	m.MarkResult(context.Background(), Result{
+		AuthID:   auth.ID,
+		Provider: auth.Provider,
+		Model:    model,
+		Success:  false,
+		Error: &Error{
+			HTTPStatus: http.StatusNotFound,
+			Message:    requestScopedNotFoundMessage,
+		},
+	})
+
+	updated, ok := m.GetByID(auth.ID)
+	if !ok || updated == nil {
+		t.Fatalf("expected auth to be present")
+	}
+	if updated.Unavailable {
+		t.Fatalf("expected request-scoped 404 to keep auth available")
+	}
+	if !updated.NextRetryAfter.IsZero() {
+		t.Fatalf("expected request-scoped 404 to keep auth cooldown unset, got %v", updated.NextRetryAfter)
+	}
+	if state := updated.ModelStates[model]; state != nil {
+		t.Fatalf("expected request-scoped 404 to avoid model cooldown state, got %#v", state)
+	}
+}
+
+func TestManager_RequestScopedNotFoundStopsRetryWithoutSuspendingAuth(t *testing.T) {
+	m := NewManager(nil, nil, nil)
+	executor := &authFallbackExecutor{
+		id: "openai",
+		executeErrors: map[string]error{
+			"aa-bad-auth": &Error{
+				HTTPStatus: http.StatusNotFound,
+				Message:    requestScopedNotFoundMessage,
+			},
+		},
+	}
+	m.RegisterExecutor(executor)
+
+	model := "gpt-4.1"
+	badAuth := &Auth{ID: "aa-bad-auth", Provider: "openai"}
+	goodAuth := &Auth{ID: "bb-good-auth", Provider: "openai"}
+
+	reg := registry.GetGlobalRegistry()
+	reg.RegisterClient(badAuth.ID, "openai", []*registry.ModelInfo{{ID: model}})
+	reg.RegisterClient(goodAuth.ID, "openai", []*registry.ModelInfo{{ID: model}})
+	t.Cleanup(func() {
+		reg.UnregisterClient(badAuth.ID)
+		reg.UnregisterClient(goodAuth.ID)
+	})
+
+	if _, errRegister := m.Register(context.Background(), badAuth); errRegister != nil {
+		t.Fatalf("register bad auth: %v", errRegister)
+	}
+	if _, errRegister := m.Register(context.Background(), goodAuth); errRegister != nil {
+		t.Fatalf("register good auth: %v", errRegister)
+	}
+
+	_, errExecute := m.Execute(context.Background(), []string{"openai"}, cliproxyexecutor.Request{Model: model}, cliproxyexecutor.Options{})
+	if errExecute == nil {
+		t.Fatal("expected request-scoped not-found error")
+	}
+	errResult, ok := errExecute.(*Error)
+	if !ok {
+		t.Fatalf("expected *Error, got %T", errExecute)
+	}
+	if errResult.HTTPStatus != http.StatusNotFound {
+		t.Fatalf("status = %d, want %d", errResult.HTTPStatus, http.StatusNotFound)
+	}
+	if errResult.Message != requestScopedNotFoundMessage {
+		t.Fatalf("message = %q, want %q", errResult.Message, requestScopedNotFoundMessage)
+	}
+
+	got := executor.ExecuteCalls()
+	want := []string{badAuth.ID}
+	if len(got) != len(want) {
+		t.Fatalf("execute calls = %v, want %v", got, want)
+	}
+	for i := range want {
+		if got[i] != want[i] {
+			t.Fatalf("execute call %d auth = %q, want %q", i, got[i], want[i])
+		}
+	}
+
+	updatedBad, ok := m.GetByID(badAuth.ID)
+	if !ok || updatedBad == nil {
+		t.Fatalf("expected bad auth to remain registered")
+	}
+	if updatedBad.Unavailable {
+		t.Fatalf("expected request-scoped 404 to keep bad auth available")
+	}
+	if !updatedBad.NextRetryAfter.IsZero() {
+		t.Fatalf("expected request-scoped 404 to keep bad auth cooldown unset, got %v", updatedBad.NextRetryAfter)
+	}
+	if state := updatedBad.ModelStates[model]; state != nil {
+		t.Fatalf("expected request-scoped 404 to avoid bad auth model cooldown state, got %#v", state)
+	}
+}
--- a/sdk/cliproxy/auth/scheduler.go
+++ b/sdk/cliproxy/auth/scheduler.go
@@ -293,12 +293,46 @@ func (s *authScheduler) pickMixed(ctx context.Context, providers []string, model
 	}

 	cursorKey := strings.Join(normalized, ",") + ":" + modelKey
-	start := 0
-	if len(normalized) > 0 {
-		start = s.mixedCursors[cursorKey] % len(normalized)
+	weights := make([]int, len(normalized))
+	segmentStarts := make([]int, len(normalized))
+	segmentEnds := make([]int, len(normalized))
+	totalWeight := 0
+	for providerIndex, shard := range candidateShards {
+		segmentStarts[providerIndex] = totalWeight
+		if shard != nil {
+			weights[providerIndex] = shard.readyCountAtPriorityLocked(false, bestPriority)
+		}
+		totalWeight += weights[providerIndex]
+		segmentEnds[providerIndex] = totalWeight
 	}
+	if totalWeight == 0 {
+		return nil, "", s.mixedUnavailableErrorLocked(normalized, model, tried)
+	}
+
+	startSlot := s.mixedCursors[cursorKey] % totalWeight
+	startProviderIndex := -1
+	for providerIndex := range normalized {
+		if weights[providerIndex] == 0 {
+			continue
+		}
+		if startSlot < segmentEnds[providerIndex] {
+			startProviderIndex = providerIndex
+			break
+		}
+	}
+	if startProviderIndex < 0 {
+		return nil, "", s.mixedUnavailableErrorLocked(normalized, model, tried)
+	}
+
+	slot := startSlot
 	for offset := 0; offset < len(normalized); offset++ {
-		providerIndex := (start + offset) % len(normalized)
+		providerIndex := (startProviderIndex + offset) % len(normalized)
+		if weights[providerIndex] == 0 {
+			continue
+		}
+		if providerIndex != startProviderIndex {
+			slot = segmentStarts[providerIndex]
+		}
 		providerKey := normalized[providerIndex]
 		shard := candidateShards[providerIndex]
 		if shard == nil {
@@ -308,7 +342,7 @@ func (s *authScheduler) pickMixed(ctx context.Context, providers []string, model
 		if picked == nil {
 			continue
 		}
-		s.mixedCursors[cursorKey] = providerIndex + 1
+		s.mixedCursors[cursorKey] = slot + 1
 		return picked, providerKey, nil
 	}
 	return nil, "", s.mixedUnavailableErrorLocked(normalized, model, tried)
@@ -704,6 +738,20 @@ func (m *modelScheduler) pickReadyAtPriorityLocked(preferWebsocket bool, priorit
 	return picked.auth
 }

+func (m *modelScheduler) readyCountAtPriorityLocked(preferWebsocket bool, priority int) int {
+	if m == nil {
+		return 0
+	}
+	bucket := m.readyByPriority[priority]
+	if bucket == nil {
+		return 0
+	}
+	if preferWebsocket && len(bucket.ws.flat) > 0 {
+		return len(bucket.ws.flat)
+	}
+	return len(bucket.all.flat)
+}
+
 // unavailableErrorLocked returns the correct unavailable or cooldown error for the shard.
 func (m *modelScheduler) unavailableErrorLocked(provider, model string, predicate func(*scheduledAuth) bool) error {
 	now := time.Now()
--- a/sdk/cliproxy/auth/scheduler_test.go
+++ b/sdk/cliproxy/auth/scheduler_test.go
@@ -208,7 +208,7 @@ func TestSchedulerPick_CodexWebsocketPrefersWebsocketEnabledSubset(t *testing.T)
 	}
 }

-func TestSchedulerPick_MixedProvidersUsesProviderRotationOverReadyCandidates(t *testing.T) {
+func TestSchedulerPick_MixedProvidersUsesWeightedProviderRotationOverReadyCandidates(t *testing.T) {
 	t.Parallel()

 	scheduler := newSchedulerForTest(
@@ -218,8 +218,8 @@ func TestSchedulerPick_MixedProvidersUsesProviderRotationOverReadyCandidates(t *
 		&Auth{ID: "claude-a", Provider: "claude"},
 	)

-	wantProviders := []string{"gemini", "claude", "gemini", "claude"}
-	wantIDs := []string{"gemini-a", "claude-a", "gemini-b", "claude-a"}
+	wantProviders := []string{"gemini", "gemini", "claude", "gemini"}
+	wantIDs := []string{"gemini-a", "gemini-b", "claude-a", "gemini-a"}
 	for index := range wantProviders {
 		got, provider, errPick := scheduler.pickMixed(context.Background(), []string{"gemini", "claude"}, "", cliproxyexecutor.Options{}, nil)
 		if errPick != nil {
@@ -272,7 +272,7 @@ func TestSchedulerPick_MixedProvidersPrefersHighestPriorityTier(t *testing.T) {
 	}
 }

-func TestManager_PickNextMixed_UsesProviderRotationBeforeCredentialRotation(t *testing.T) {
+func TestManager_PickNextMixed_UsesWeightedProviderRotationBeforeCredentialRotation(t *testing.T) {
 	t.Parallel()

 	manager := NewManager(nil, &RoundRobinSelector{}, nil)
@@ -288,8 +288,8 @@ func TestManager_PickNextMixed_UsesProviderRotationBeforeCredentialRotation(t *t
 		t.Fatalf("Register(claude-a) error = %v", errRegister)
 	}

-	wantProviders := []string{"gemini", "claude", "gemini", "claude"}
-	wantIDs := []string{"gemini-a", "claude-a", "gemini-b", "claude-a"}
+	wantProviders := []string{"gemini", "gemini", "claude", "gemini"}
+	wantIDs := []string{"gemini-a", "gemini-b", "claude-a", "gemini-a"}
 	for index := range wantProviders {
 		got, _, provider, errPick := manager.pickNextMixed(context.Background(), []string{"gemini", "claude"}, "", cliproxyexecutor.Options{}, map[string]struct{}{})
 		if errPick != nil {
@@ -399,8 +399,8 @@ func TestManager_PickNextMixed_UsesSchedulerRotation(t *testing.T) {
 		t.Fatalf("Register(claude-a) error = %v", errRegister)
 	}

-	wantProviders := []string{"gemini", "claude", "gemini", "claude"}
-	wantIDs := []string{"gemini-a", "claude-a", "gemini-b", "claude-a"}
+	wantProviders := []string{"gemini", "gemini", "claude", "gemini"}
+	wantIDs := []string{"gemini-a", "gemini-b", "claude-a", "gemini-a"}
 	for index := range wantProviders {
 		got, _, provider, errPick := manager.pickNextMixed(context.Background(), []string{"gemini", "claude"}, "", cliproxyexecutor.Options{}, nil)
 		if errPick != nil {
Author	SHA1	Message	Date
JOUNGWOOK KWON	f93dde43df	docs: add API usage guide with multi-language code examples curl, Python(Anthropic/OpenAI SDK), Node.js(Anthropic/OpenAI SDK), Claude Code CLI 등 다양한 호출 방법 정리 Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-03-31 21:23:22 +09:00
JOUNGWOOK KWON	84a2874c7b	docs: add Docker deployment and reverse proxy setup guides - DOCKER_DEPLOY.md: NAS Docker 배포 전체 가이드 (컨테이너 설정, config.yaml, 포트 등) - REVERSE_PROXY_SETUP.md: cliproxy.gru.farm HTTPS/역방향 프록시 설정 가이드 Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-03-31 21:20:55 +09:00
Luis Pater	c10f8ae2e2	Fixed: #2420 Some checks failed docker-image / docker_amd64 (push) Has been cancelled Details docker-image / docker_arm64 (push) Has been cancelled Details goreleaser / goreleaser (push) Has been cancelled Details docker-image / docker_manifest (push) Has been cancelled Details docs(readme): remove ProxyPal section from all README translations	2026-03-31 07:23:02 +08:00
Luis Pater	17363edf25	fix(auth): skip downtime for request-scoped 404 errors in model state management	2026-03-30 22:22:42 +08:00
Luis Pater	486cd4c343	Merge pull request #2409 from sususu98/fix/tool-use-pairing-break fix(antigravity): reorder model parts to prevent tool_use↔tool_result pairing breakage	2026-03-30 16:59:46 +08:00
sususu98	25feceb783	fix(antigravity): reorder model parts to prevent tool_use↔tool_result pairing breakage When a Claude assistant message contains [text, tool_use, text], the Antigravity API internally splits the model message at functionCall boundaries, creating an extra assistant turn between tool_use and the following tool_result. Claude then rejects with: tool_use ids were found without tool_result blocks immediately after Fix: extend the existing 2-way part reordering (thinking-first) to a 3-way partition: thinking → regular → functionCall. This ensures functionCall parts are always last, so Antigravity's split cannot insert an extra assistant turn before the user's tool_result. Fixes #989	2026-03-30 15:09:33 +08:00
Luis Pater	d26752250d	Merge pull request #2403 from CharTyr/clean-pr fix(amp): 修复Amp CLI 集成缺失/无效 signature 导致的 TUI 崩溃与上游 400 问题	2026-03-30 12:54:15 +08:00
CharTyr	b15453c369	fix(amp): address PR review - stream thinking suppression, SSE detection, test init - Call suppressAmpThinking in rewriteStreamEvent for streaming path - Handle nil return from suppressAmpThinking to skip suppressed events - Narrow looksLikeSSEChunk to line-prefix detection (HasPrefix vs Contains) - Initialize suppressedContentBlock map in test	2026-03-30 00:42:04 -04:00
CharTyr	04ba8c8bc3	feat(amp): sanitize signatures and handle stream suppression for Amp compatibility	2026-03-29 22:23:18 -04:00
Luis Pater	6570692291	Merge pull request #2400 from router-for-me/revert-2374-codex-cache-clean Some checks failed docker-image / docker_amd64 (push) Has been cancelled Details docker-image / docker_arm64 (push) Has been cancelled Details goreleaser / goreleaser (push) Has been cancelled Details docker-image / docker_manifest (push) Has been cancelled Details Revert "fix(codex): restore prompt cache continuity for Codex requests"	2026-03-29 22:19:39 +08:00
Luis Pater	13aa5b3375	Revert "fix(codex): restore prompt cache continuity for Codex requests"	2026-03-29 22:18:14 +08:00
Luis Pater	6d8de0ade4	feat(auth): implement weighted provider rotation for improved scheduling fairness	2026-03-29 13:49:01 +08:00
Luis Pater	1587ff5e74	Merge pull request #2389 from router-for-me/claude fix(claude): add default max_tokens for models	2026-03-29 13:03:20 +08:00
hkfires	f033d3a6df	fix(claude): enhance ensureModelMaxTokens to use registered max_completion_tokens and fallback to default	2026-03-29 13:00:43 +08:00
hkfires	145e0e0b5d	fix(claude): add default max_tokens for models	2026-03-29 12:46:00 +08:00