Compare commits

..

5 Commits

Author SHA1 Message Date
Luis Pater
65f47c196a Merge pull request #1 from chaudhryfaisal/main
Some checks failed
goreleaser / goreleaser (push) Has been cancelled
Correct config in README.md
2025-07-09 16:57:19 +08:00
Faisal Chaudhry
9be56fe8e0 Correct config in README.md 2025-07-08 23:28:55 -04:00
Luis Pater
589ae6d3aa Add support for Generative Language API Key and improve client initialization
Some checks failed
goreleaser / goreleaser (push) Has been cancelled
- Added `GlAPIKey` support in configuration to enable Generative Language API.
- Integrated `GenerativeLanguageAPIKey` handling in client and API handlers.
- Updated response translators to manage generative language responses properly.
- Enhanced HTTP client initialization logic with proxy support for API requests.
- Refactored streaming and non-streaming flows to account for generative language-specific logic.
2025-07-06 02:13:11 +08:00
Luis Pater
7cb76ae1a5 Enhance quota management and refactor configuration handling
Some checks failed
goreleaser / goreleaser (push) Has been cancelled
- Introduced `QuotaExceeded` settings in configuration to handle quota limits more effectively.
- Added preview model switching logic to `Client` to automatically use fallback models on quota exhaustion.
- Refactored `APIHandlers` to leverage new configuration structure.
- Simplified server initialization and removed redundant `ServerConfig` structure.
- Streamlined client initialization by unifying configuration handling throughout the project.
- Improved error handling and response mechanisms in both streaming and non-streaming flows.
2025-07-05 07:53:46 +08:00
Luis Pater
e73f165070 Refactor API handlers to streamline response handling
Some checks failed
goreleaser / goreleaser (push) Has been cancelled
- Replaced channel-based handling in `SendMessage` flow with direct synchronous execution.
- Introduced `hasFirstResponse` flag to manage keep-alive signals in streaming handler.
- Simplified error handling and removed redundant code for enhanced readability and maintainability.
2025-07-05 04:10:00 +08:00
8 changed files with 505 additions and 295 deletions

View File

@@ -10,6 +10,7 @@ A proxy server that provides an OpenAI-compatible API interface for CLI. This al
- Multimodal input support (text and images)
- Multiple account support with load balancing
- Simple CLI authentication flow
- Support for Generative Language API Key
## Installation
@@ -146,13 +147,14 @@ The server uses a YAML configuration file (`config.yaml`) located in the project
### Configuration Options
| Parameter | Type | Default | Description |
|-------------|----------|--------------------|----------------------------------------------------------------------------------------------|
| `port` | integer | 8317 | The port number on which the server will listen |
| `auth_dir` | string | "~/.cli-proxy-api" | Directory where authentication tokens are stored. Supports using `~` for home directory |
| `proxy-url` | string | "" | Proxy url, support socks5/http/https protocol, example: socks5://user:pass@192.168.1.1:1080/ |
| `debug` | boolean | false | Enable debug mode for verbose logging |
| `api_keys` | string[] | [] | List of API keys that can be used to authenticate requests |
| Parameter | Type | Default | Description |
|-------------------------------|----------|--------------------|----------------------------------------------------------------------------------------------|
| `port` | integer | 8317 | The port number on which the server will listen |
| `auth-dir` | string | "~/.cli-proxy-api" | Directory where authentication tokens are stored. Supports using `~` for home directory |
| `proxy-url` | string | "" | Proxy url, support socks5/http/https protocol, example: socks5://user:pass@192.168.1.1:1080/ |
| `debug` | boolean | false | Enable debug mode for verbose logging |
| `api-keys` | string[] | [] | List of API keys that can be used to authenticate requests |
| `generative-language-api-key` | string[] | [] | List of Generative Language API keys |
### Example Configuration File
@@ -161,24 +163,24 @@ The server uses a YAML configuration file (`config.yaml`) located in the project
port: 8317
# Authentication directory (supports ~ for home directory)
auth_dir: "~/.cli-proxy-api"
auth-dir: "~/.cli-proxy-api"
# Enable debug logging
debug: false
# API keys for authentication
api_keys:
api-keys:
- "your-api-key-1"
- "your-api-key-2"
```
### Authentication Directory
The `auth_dir` parameter specifies where authentication tokens are stored. When you run the login command, the application will create JSON files in this directory containing the authentication tokens for your Google accounts. Multiple accounts can be used for load balancing.
The `auth-dir` parameter specifies where authentication tokens are stored. When you run the login command, the application will create JSON files in this directory containing the authentication tokens for your Google accounts. Multiple accounts can be used for load balancing.
### API Keys
The `api_keys` parameter allows you to define a list of API keys that can be used to authenticate requests to your proxy server. When making requests to the API, you can include one of these keys in the `Authorization` header:
The `api-keys` parameter allows you to define a list of API keys that can be used to authenticate requests to your proxy server. When making requests to the API, you can include one of these keys in the `Authorization` header:
```
Authorization: Bearer your-api-key-1

View File

@@ -1,7 +1,15 @@
port: 8317
auth_dir: "~/.cli-proxy-api"
auth-dir: "~/.cli-proxy-api"
debug: true
proxy-url: ""
api_keys:
quota-exceeded:
switch-project: true
switch-preview-model: true
api-keys:
- "12345"
- "23456"
- "23456"
generative-language-api-key:
- "AIzaSy...01"
- "AIzaSy...02"
- "AIzaSy...03"
- "AIzaSy...04"

View File

@@ -5,6 +5,7 @@ import (
"fmt"
"github.com/luispater/CLIProxyAPI/internal/api/translator"
"github.com/luispater/CLIProxyAPI/internal/client"
"github.com/luispater/CLIProxyAPI/internal/config"
log "github.com/sirupsen/logrus"
"github.com/tidwall/gjson"
"net/http"
@@ -23,15 +24,15 @@ var (
// It holds a pool of clients to interact with the backend service.
type APIHandlers struct {
cliClients []*client.Client
debug bool
cfg *config.Config
}
// NewAPIHandlers creates a new API handlers instance.
// It takes a slice of clients and a debug flag as input.
func NewAPIHandlers(cliClients []*client.Client, debug bool) *APIHandlers {
func NewAPIHandlers(cliClients []*client.Client, cfg *config.Config) *APIHandlers {
return &APIHandlers{
cliClients: cliClients,
debug: debug,
cfg: cfg,
}
}
@@ -216,70 +217,74 @@ func (h *APIHandlers) handleNonStreamingResponse(c *gin.Context, rawJson []byte)
}
}()
// Lock the mutex to update the last used page index
mutex.Lock()
startIndex := lastUsedClientIndex
currentIndex := (startIndex + 1) % len(h.cliClients)
lastUsedClientIndex = currentIndex
mutex.Unlock()
for {
// Lock the mutex to update the last used client index
mutex.Lock()
startIndex := lastUsedClientIndex
currentIndex := (startIndex + 1) % len(h.cliClients)
lastUsedClientIndex = currentIndex
mutex.Unlock()
// Reorder the pages to start from the last used index
reorderedPages := make([]*client.Client, len(h.cliClients))
for i := 0; i < len(h.cliClients); i++ {
reorderedPages[i] = h.cliClients[(startIndex+1+i)%len(h.cliClients)]
}
locked := false
for i := 0; i < len(reorderedPages); i++ {
cliClient = reorderedPages[i]
if cliClient.RequestMutex.TryLock() {
locked = true
break
// Reorder the client to start from the last used index
reorderedClients := make([]*client.Client, 0)
for i := 0; i < len(h.cliClients); i++ {
cliClient = h.cliClients[(startIndex+1+i)%len(h.cliClients)]
if cliClient.IsModelQuotaExceeded(modelName) {
log.Debugf("Model %s is quota exceeded for account %s, project id: %s", modelName, cliClient.GetEmail(), cliClient.GetProjectID())
cliClient = nil
continue
}
reorderedClients = append(reorderedClients, cliClient)
}
}
if !locked {
cliClient = h.cliClients[0]
cliClient.RequestMutex.Lock()
}
log.Debugf("Request use account: %s, project id: %s", cliClient.GetEmail(), cliClient.GetProjectID())
if len(reorderedClients) == 0 {
c.Status(429)
_, _ = fmt.Fprint(c.Writer, fmt.Sprintf(`{"error":{"code":429,"message":"All the models of '%s' are quota exceeded","status":"RESOURCE_EXHAUSTED"}}`, modelName))
flusher.Flush()
cliCancel()
return
}
locked := false
for i := 0; i < len(reorderedClients); i++ {
cliClient = reorderedClients[i]
if cliClient.RequestMutex.TryLock() {
locked = true
break
}
}
if !locked {
cliClient = h.cliClients[0]
cliClient.RequestMutex.Lock()
}
isGlAPIKey := false
if glAPIKey := cliClient.GetGenerativeLanguageAPIKey(); glAPIKey != "" {
log.Debugf("Request use generative language API Key: %s", glAPIKey)
isGlAPIKey = true
} else {
log.Debugf("Request use account: %s, project id: %s", cliClient.GetEmail(), cliClient.GetProjectID())
}
respChan := make(chan []byte)
errChan := make(chan *client.ErrorMessage)
go func() {
resp, err := cliClient.SendMessage(cliCtx, rawJson, modelName, contents, tools)
if err != nil {
errChan <- err
} else {
respChan <- resp
}
}()
for {
select {
case <-c.Request.Context().Done():
if c.Request.Context().Err().Error() == "context canceled" {
log.Debugf("Client disconnected: %v", c.Request.Context().Err())
if err.StatusCode == 429 && h.cfg.QuotaExceeded.SwitchProject {
continue
} else {
c.Status(err.StatusCode)
_, _ = fmt.Fprint(c.Writer, err.Error.Error())
flusher.Flush()
cliCancel()
return
}
case respBody := <-respChan:
openAIFormat := translator.ConvertCliToOpenAINonStream(respBody)
break
} else {
openAIFormat := translator.ConvertCliToOpenAINonStream(resp, time.Now().Unix(), isGlAPIKey)
if openAIFormat != "" {
_, _ = fmt.Fprintf(c.Writer, "data: %s\n\n", openAIFormat)
flusher.Flush()
}
cliCancel()
return
case err := <-errChan:
c.Status(err.StatusCode)
_, _ = fmt.Fprint(c.Writer, err.Error.Error())
flusher.Flush()
cliCancel()
return
case <-time.After(500 * time.Millisecond):
_, _ = c.Writer.Write([]byte("\n"))
flusher.Flush()
break
}
}
}
@@ -314,75 +319,104 @@ func (h *APIHandlers) handleStreamingResponse(c *gin.Context, rawJson []byte) {
}
}()
// Use a round-robin approach to select the next available client.
// This distributes the load among the available clients.
mutex.Lock()
startIndex := lastUsedClientIndex
currentIndex := (startIndex + 1) % len(h.cliClients)
lastUsedClientIndex = currentIndex
mutex.Unlock()
// Reorder the clients to start from the next client in the rotation.
reorderedPages := make([]*client.Client, len(h.cliClients))
for i := 0; i < len(h.cliClients); i++ {
reorderedPages[i] = h.cliClients[(startIndex+1+i)%len(h.cliClients)]
}
// Attempt to lock a client for the request.
locked := false
for i := 0; i < len(reorderedPages); i++ {
cliClient = reorderedPages[i]
if cliClient.RequestMutex.TryLock() {
locked = true
break
}
}
// If no client is available, block and wait for the first client.
if !locked {
cliClient = h.cliClients[0]
cliClient.RequestMutex.Lock()
}
log.Debugf("Request use account: %s, project id: %s", cliClient.GetEmail(), cliClient.GetProjectID())
// Send the message and receive response chunks and errors via channels.
respChan, errChan := cliClient.SendMessageStream(cliCtx, rawJson, modelName, contents, tools)
outLoop:
for {
select {
// Handle client disconnection.
case <-c.Request.Context().Done():
if c.Request.Context().Err().Error() == "context canceled" {
log.Debugf("Client disconnected: %v", c.Request.Context().Err())
cliCancel() // Cancel the backend request.
return
// Lock the mutex to update the last used client index
mutex.Lock()
startIndex := lastUsedClientIndex
currentIndex := (startIndex + 1) % len(h.cliClients)
lastUsedClientIndex = currentIndex
mutex.Unlock()
// Reorder the client to start from the last used index
reorderedClients := make([]*client.Client, 0)
for i := 0; i < len(h.cliClients); i++ {
cliClient = h.cliClients[(startIndex+1+i)%len(h.cliClients)]
if cliClient.IsModelQuotaExceeded(modelName) {
log.Debugf("Model %s is quota exceeded for account %s, project id: %s", modelName, cliClient.GetEmail(), cliClient.GetProjectID())
cliClient = nil
continue
}
// Process incoming response chunks.
case chunk, okStream := <-respChan:
if !okStream {
// Stream is closed, send the final [DONE] message.
_, _ = fmt.Fprintf(c.Writer, "data: [DONE]\n\n")
flusher.Flush()
cliCancel()
return
} else {
// Convert the chunk to OpenAI format and send it to the client.
openAIFormat := translator.ConvertCliToOpenAI(chunk)
if openAIFormat != "" {
_, _ = fmt.Fprintf(c.Writer, "data: %s\n\n", openAIFormat)
reorderedClients = append(reorderedClients, cliClient)
}
if len(reorderedClients) == 0 {
c.Status(429)
_, _ = fmt.Fprint(c.Writer, fmt.Sprintf(`{"error":{"code":429,"message":"All the models of '%s' are quota exceeded","status":"RESOURCE_EXHAUSTED"}}`, modelName))
flusher.Flush()
cliCancel()
return
}
locked := false
for i := 0; i < len(reorderedClients); i++ {
cliClient = reorderedClients[i]
if cliClient.RequestMutex.TryLock() {
locked = true
break
}
}
if !locked {
cliClient = h.cliClients[0]
cliClient.RequestMutex.Lock()
}
isGlAPIKey := false
if glAPIKey := cliClient.GetGenerativeLanguageAPIKey(); glAPIKey != "" {
log.Debugf("Request use generative language API Key: %s", glAPIKey)
isGlAPIKey = true
} else {
log.Debugf("Request use account: %s, project id: %s", cliClient.GetEmail(), cliClient.GetProjectID())
}
// Send the message and receive response chunks and errors via channels.
respChan, errChan := cliClient.SendMessageStream(cliCtx, rawJson, modelName, contents, tools)
hasFirstResponse := false
for {
select {
// Handle client disconnection.
case <-c.Request.Context().Done():
if c.Request.Context().Err().Error() == "context canceled" {
log.Debugf("Client disconnected: %v", c.Request.Context().Err())
cliCancel() // Cancel the backend request.
return
}
// Process incoming response chunks.
case chunk, okStream := <-respChan:
if !okStream {
// Stream is closed, send the final [DONE] message.
_, _ = fmt.Fprintf(c.Writer, "data: [DONE]\n\n")
flusher.Flush()
cliCancel()
return
} else {
// Convert the chunk to OpenAI format and send it to the client.
hasFirstResponse = true
openAIFormat := translator.ConvertCliToOpenAI(chunk, time.Now().Unix(), isGlAPIKey)
if openAIFormat != "" {
_, _ = fmt.Fprintf(c.Writer, "data: %s\n\n", openAIFormat)
flusher.Flush()
}
}
// Handle errors from the backend.
case err, okError := <-errChan:
if okError {
if err.StatusCode == 429 && h.cfg.QuotaExceeded.SwitchProject {
continue outLoop
} else {
c.Status(err.StatusCode)
_, _ = fmt.Fprint(c.Writer, err.Error.Error())
flusher.Flush()
cliCancel()
}
return
}
// Send a keep-alive signal to the client.
case <-time.After(500 * time.Millisecond):
if hasFirstResponse {
_, _ = c.Writer.Write([]byte(": CLI-PROXY-API PROCESSING\n\n"))
flusher.Flush()
}
}
// Handle errors from the backend.
case err, okError := <-errChan:
if okError {
c.Status(err.StatusCode)
_, _ = fmt.Fprint(c.Writer, err.Error.Error())
flusher.Flush()
cliCancel()
return
}
// Send a keep-alive signal to the client.
case <-time.After(500 * time.Millisecond):
_, _ = c.Writer.Write([]byte(": CLI-PROXY-API PROCESSING\n\n"))
flusher.Flush()
}
}
}

View File

@@ -6,6 +6,7 @@ import (
"fmt"
"github.com/gin-gonic/gin"
"github.com/luispater/CLIProxyAPI/internal/client"
"github.com/luispater/CLIProxyAPI/internal/config"
log "github.com/sirupsen/logrus"
"net/http"
"strings"
@@ -17,29 +18,19 @@ type Server struct {
engine *gin.Engine
server *http.Server
handlers *APIHandlers
cfg *ServerConfig
}
// ServerConfig contains the configuration for the API server.
type ServerConfig struct {
// Port is the port number the server will listen on.
Port string
// Debug enables or disables debug mode for the server and Gin.
Debug bool
// ApiKeys is a list of valid API keys for authentication.
ApiKeys []string
cfg *config.Config
}
// NewServer creates and initializes a new API server instance.
// It sets up the Gin engine, middleware, routes, and handlers.
func NewServer(config *ServerConfig, cliClients []*client.Client) *Server {
func NewServer(cfg *config.Config, cliClients []*client.Client) *Server {
// Set gin mode
if !config.Debug {
if !cfg.Debug {
gin.SetMode(gin.ReleaseMode)
}
// Create handlers
handlers := NewAPIHandlers(cliClients, config.Debug)
handlers := NewAPIHandlers(cliClients, cfg)
// Create gin engine
engine := gin.New()
@@ -53,7 +44,7 @@ func NewServer(config *ServerConfig, cliClients []*client.Client) *Server {
s := &Server{
engine: engine,
handlers: handlers,
cfg: config,
cfg: cfg,
}
// Setup routes
@@ -61,7 +52,7 @@ func NewServer(config *ServerConfig, cliClients []*client.Client) *Server {
// Create HTTP server
s.server = &http.Server{
Addr: ":" + config.Port,
Addr: fmt.Sprintf(":%d", cfg.Port),
Handler: engine,
}
@@ -138,7 +129,7 @@ func corsMiddleware() gin.HandlerFunc {
// AuthMiddleware returns a Gin middleware handler that authenticates requests
// using API keys. If no API keys are configured, it allows all requests.
func AuthMiddleware(cfg *ServerConfig) gin.HandlerFunc {
func AuthMiddleware(cfg *config.Config) gin.HandlerFunc {
return func(c *gin.Context) {
if len(cfg.ApiKeys) == 0 {
c.Next()

View File

@@ -10,7 +10,11 @@ import (
// ConvertCliToOpenAI translates a single chunk of a streaming response from the
// backend client format to the OpenAI Server-Sent Events (SSE) format.
// It returns an empty string if the chunk contains no useful data.
func ConvertCliToOpenAI(rawJson []byte) string {
func ConvertCliToOpenAI(rawJson []byte, unixTimestamp int64, isGlAPIKey bool) string {
if isGlAPIKey {
rawJson, _ = sjson.SetRawBytes(rawJson, "response", rawJson)
}
// Initialize the OpenAI SSE template.
template := `{"id":"","object":"chat.completion.chunk","created":12345,"model":"model","choices":[{"index":0,"delta":{"role":null,"content":null,"reasoning_content":null,"tool_calls":null},"finish_reason":null,"native_finish_reason":null}]}`
@@ -22,11 +26,12 @@ func ConvertCliToOpenAI(rawJson []byte) string {
// Extract and set the creation timestamp.
if createTimeResult := gjson.GetBytes(rawJson, "response.createTime"); createTimeResult.Exists() {
t, err := time.Parse(time.RFC3339Nano, createTimeResult.String())
unixTimestamp := time.Now().Unix()
if err == nil {
unixTimestamp = t.Unix()
}
template, _ = sjson.Set(template, "created", unixTimestamp)
} else {
template, _ = sjson.Set(template, "created", unixTimestamp)
}
// Extract and set the response ID.
@@ -90,19 +95,25 @@ func ConvertCliToOpenAI(rawJson []byte) string {
// ConvertCliToOpenAINonStream aggregates response from the backend client
// convert a single, non-streaming OpenAI-compatible JSON response.
func ConvertCliToOpenAINonStream(rawJson []byte) string {
func ConvertCliToOpenAINonStream(rawJson []byte, unixTimestamp int64, isGlAPIKey bool) string {
if isGlAPIKey {
rawJson, _ = sjson.SetRawBytes(rawJson, "response", rawJson)
}
template := `{"id":"","object":"chat.completion","created":123456,"model":"model","choices":[{"index":0,"message":{"role":"assistant","content":null,"reasoning_content":null,"tool_calls":null},"finish_reason":null,"native_finish_reason":null}]}`
if modelVersionResult := gjson.GetBytes(rawJson, "response.modelVersion"); modelVersionResult.Exists() {
template, _ = sjson.Set(template, "model", modelVersionResult.String())
}
if createTimeResult := gjson.GetBytes(rawJson, "response.createTime"); createTimeResult.Exists() {
t, err := time.Parse(time.RFC3339Nano, createTimeResult.String())
unixTimestamp := time.Now().Unix()
if err == nil {
unixTimestamp = t.Unix()
}
template, _ = sjson.Set(template, "created", unixTimestamp)
} else {
template, _ = sjson.Set(template, "created", unixTimestamp)
}
if responseIdResult := gjson.GetBytes(rawJson, "response.responseId"); responseIdResult.Exists() {
template, _ = sjson.Set(template, "id", responseIdResult.String())
}

View File

@@ -27,22 +27,40 @@ const (
codeAssistEndpoint = "https://cloudcode-pa.googleapis.com"
apiVersion = "v1internal"
pluginVersion = "0.1.9"
glEndPoint = "https://generativelanguage.googleapis.com/"
glApiVersion = "v1beta"
)
var (
previewModels = map[string][]string{
"gemini-2.5-pro": {"gemini-2.5-pro-preview-05-06", "gemini-2.5-pro-preview-06-05"},
"gemini-2.5-flash": {"gemini-2.5-flash-preview-04-17", "gemini-2.5-flash-preview-05-20"},
}
)
// Client is the main client for interacting with the CLI API.
type Client struct {
httpClient *http.Client
RequestMutex sync.Mutex
tokenStorage *auth.TokenStorage
cfg *config.Config
httpClient *http.Client
RequestMutex sync.Mutex
tokenStorage *auth.TokenStorage
cfg *config.Config
modelQuotaExceeded map[string]*time.Time
glAPIKey string
}
// NewClient creates a new CLI API client.
func NewClient(httpClient *http.Client, ts *auth.TokenStorage, cfg *config.Config) *Client {
func NewClient(httpClient *http.Client, ts *auth.TokenStorage, cfg *config.Config, glAPIKey ...string) *Client {
var glKey string
if len(glAPIKey) > 0 {
glKey = glAPIKey[0]
}
return &Client{
httpClient: httpClient,
tokenStorage: ts,
cfg: cfg,
httpClient: httpClient,
tokenStorage: ts,
cfg: cfg,
modelQuotaExceeded: make(map[string]*time.Time),
glAPIKey: glKey,
}
}
@@ -71,7 +89,14 @@ func (c *Client) GetEmail() string {
}
func (c *Client) GetProjectID() string {
return c.tokenStorage.ProjectID
if c.tokenStorage != nil {
return c.tokenStorage.ProjectID
}
return ""
}
func (c *Client) GetGenerativeLanguageAPIKey() string {
return c.glAPIKey
}
// SetupUser performs the initial user onboarding and setup.
@@ -214,6 +239,168 @@ func (c *Client) makeAPIRequest(ctx context.Context, endpoint, method string, bo
return nil
}
// APIRequest handles making requests to the CLI API endpoints.
func (c *Client) APIRequest(ctx context.Context, endpoint string, body interface{}, stream bool) (io.ReadCloser, *ErrorMessage) {
var jsonBody []byte
var err error
if byteBody, ok := body.([]byte); ok {
jsonBody = byteBody
} else {
jsonBody, err = json.Marshal(body)
if err != nil {
return nil, &ErrorMessage{500, fmt.Errorf("failed to marshal request body: %w", err)}
}
}
var url string
if c.glAPIKey == "" {
// Add alt=sse for streaming
url = fmt.Sprintf("%s/%s:%s", codeAssistEndpoint, apiVersion, endpoint)
if stream {
url = url + "?alt=sse"
}
} else {
modelResult := gjson.GetBytes(jsonBody, "model")
url = fmt.Sprintf("%s/%s/models/%s:%s", glEndPoint, glApiVersion, modelResult.String(), endpoint)
if stream {
url = url + "?alt=sse"
}
jsonBody = []byte(gjson.GetBytes(jsonBody, "request").Raw)
}
// log.Debug(string(jsonBody))
reqBody := bytes.NewBuffer(jsonBody)
req, err := http.NewRequestWithContext(ctx, "POST", url, reqBody)
if err != nil {
return nil, &ErrorMessage{500, fmt.Errorf("failed to create request: %v", err)}
}
// Set headers
metadataStr := getClientMetadataString()
req.Header.Set("Content-Type", "application/json")
if c.glAPIKey == "" {
token, errToken := c.httpClient.Transport.(*oauth2.Transport).Source.Token()
if errToken != nil {
return nil, &ErrorMessage{500, fmt.Errorf("failed to get token: %v", errToken)}
}
req.Header.Set("User-Agent", getUserAgent())
req.Header.Set("Client-Metadata", metadataStr)
req.Header.Set("Authorization", fmt.Sprintf("Bearer %s", token.AccessToken))
} else {
req.Header.Set("x-goog-api-key", c.glAPIKey)
}
resp, err := c.httpClient.Do(req)
if err != nil {
return nil, &ErrorMessage{500, fmt.Errorf("failed to execute request: %v", err)}
}
if resp.StatusCode < 200 || resp.StatusCode >= 300 {
defer func() {
if err = resp.Body.Close(); err != nil {
log.Printf("warn: failed to close response body: %v", err)
}
}()
bodyBytes, _ := io.ReadAll(resp.Body)
return nil, &ErrorMessage{resp.StatusCode, fmt.Errorf(string(bodyBytes))}
}
return resp.Body, nil
}
// SendMessageStream handles a single conversational turn, including tool calls.
func (c *Client) SendMessage(ctx context.Context, rawJson []byte, model string, contents []Content, tools []ToolDeclaration) ([]byte, *ErrorMessage) {
request := GenerateContentRequest{
Contents: contents,
GenerationConfig: GenerationConfig{
ThinkingConfig: GenerationConfigThinkingConfig{
IncludeThoughts: true,
},
},
}
request.Tools = tools
requestBody := map[string]interface{}{
"project": c.GetProjectID(), // Assuming ProjectID is available
"request": request,
"model": model,
}
byteRequestBody, _ := json.Marshal(requestBody)
// log.Debug(string(byteRequestBody))
reasoningEffortResult := gjson.GetBytes(rawJson, "reasoning_effort")
if reasoningEffortResult.String() == "none" {
byteRequestBody, _ = sjson.DeleteBytes(byteRequestBody, "request.generationConfig.thinkingConfig.include_thoughts")
byteRequestBody, _ = sjson.SetBytes(byteRequestBody, "request.generationConfig.thinkingConfig.thinkingBudget", 0)
} else if reasoningEffortResult.String() == "auto" {
byteRequestBody, _ = sjson.SetBytes(byteRequestBody, "request.generationConfig.thinkingConfig.thinkingBudget", -1)
} else if reasoningEffortResult.String() == "low" {
byteRequestBody, _ = sjson.SetBytes(byteRequestBody, "request.generationConfig.thinkingConfig.thinkingBudget", 1024)
} else if reasoningEffortResult.String() == "medium" {
byteRequestBody, _ = sjson.SetBytes(byteRequestBody, "request.generationConfig.thinkingConfig.thinkingBudget", 8192)
} else if reasoningEffortResult.String() == "high" {
byteRequestBody, _ = sjson.SetBytes(byteRequestBody, "request.generationConfig.thinkingConfig.thinkingBudget", 24576)
} else {
byteRequestBody, _ = sjson.SetBytes(byteRequestBody, "request.generationConfig.thinkingConfig.thinkingBudget", -1)
}
temperatureResult := gjson.GetBytes(rawJson, "temperature")
if temperatureResult.Exists() && temperatureResult.Type == gjson.Number {
byteRequestBody, _ = sjson.SetBytes(byteRequestBody, "request.generationConfig.temperature", temperatureResult.Num)
}
topPResult := gjson.GetBytes(rawJson, "top_p")
if topPResult.Exists() && topPResult.Type == gjson.Number {
byteRequestBody, _ = sjson.SetBytes(byteRequestBody, "request.generationConfig.topP", topPResult.Num)
}
topKResult := gjson.GetBytes(rawJson, "top_k")
if topKResult.Exists() && topKResult.Type == gjson.Number {
byteRequestBody, _ = sjson.SetBytes(byteRequestBody, "request.generationConfig.topK", topKResult.Num)
}
modelName := model
// log.Debug(string(byteRequestBody))
for {
if c.isModelQuotaExceeded(modelName) {
if c.cfg.QuotaExceeded.SwitchPreviewModel && c.glAPIKey == "" {
modelName = c.getPreviewModel(model)
if modelName != "" {
log.Debugf("Model %s is quota exceeded. Switch to preview model %s", model, modelName)
byteRequestBody, _ = sjson.SetBytes(byteRequestBody, "model", modelName)
continue
}
}
return nil, &ErrorMessage{
StatusCode: 429,
Error: fmt.Errorf(`{"error":{"code":429,"message":"All the models of '%s' are quota exceeded","status":"RESOURCE_EXHAUSTED"}}`, model),
}
}
respBody, err := c.APIRequest(ctx, "generateContent", byteRequestBody, false)
if err != nil {
if err.StatusCode == 429 {
now := time.Now()
c.modelQuotaExceeded[modelName] = &now
if c.cfg.QuotaExceeded.SwitchPreviewModel && c.glAPIKey == "" {
continue
}
}
return nil, err
}
delete(c.modelQuotaExceeded, modelName)
bodyBytes, errReadAll := io.ReadAll(respBody)
if errReadAll != nil {
return nil, &ErrorMessage{StatusCode: 500, Error: errReadAll}
}
return bodyBytes, nil
}
}
// SendMessageStream handles a single conversational turn, including tool calls.
func (c *Client) SendMessageStream(ctx context.Context, rawJson []byte, model string, contents []Content, tools []ToolDeclaration) (<-chan []byte, <-chan *ErrorMessage) {
dataTag := []byte("data: ")
@@ -234,7 +421,7 @@ func (c *Client) SendMessageStream(ctx context.Context, rawJson []byte, model st
request.Tools = tools
requestBody := map[string]interface{}{
"project": c.tokenStorage.ProjectID, // Assuming ProjectID is available
"project": c.GetProjectID(), // Assuming ProjectID is available
"request": request,
"model": model,
}
@@ -275,12 +462,39 @@ func (c *Client) SendMessageStream(ctx context.Context, rawJson []byte, model st
}
// log.Debug(string(byteRequestBody))
stream, err := c.APIRequest(ctx, "streamGenerateContent", byteRequestBody, true)
if err != nil {
// log.Println(err)
errChan <- err
return
modelName := model
var stream io.ReadCloser
for {
if c.isModelQuotaExceeded(modelName) {
if c.cfg.QuotaExceeded.SwitchPreviewModel && c.glAPIKey == "" {
modelName = c.getPreviewModel(model)
if modelName != "" {
log.Debugf("Model %s is quota exceeded. Switch to preview model %s", model, modelName)
byteRequestBody, _ = sjson.SetBytes(byteRequestBody, "model", modelName)
continue
}
}
errChan <- &ErrorMessage{
StatusCode: 429,
Error: fmt.Errorf(`{"error":{"code":429,"message":"All the models of '%s' are quota exceeded","status":"RESOURCE_EXHAUSTED"}}`, model),
}
return
}
var err *ErrorMessage
stream, err = c.APIRequest(ctx, "streamGenerateContent", byteRequestBody, true)
if err != nil {
if err.StatusCode == 429 {
now := time.Now()
c.modelQuotaExceeded[modelName] = &now
if c.cfg.QuotaExceeded.SwitchPreviewModel && c.glAPIKey == "" {
continue
}
}
errChan <- err
return
}
delete(c.modelQuotaExceeded, modelName)
break
}
scanner := bufio.NewScanner(stream)
@@ -305,127 +519,36 @@ func (c *Client) SendMessageStream(ctx context.Context, rawJson []byte, model st
return dataChan, errChan
}
// APIRequest handles making requests to the CLI API endpoints.
func (c *Client) APIRequest(ctx context.Context, endpoint string, body interface{}, stream bool) (io.ReadCloser, *ErrorMessage) {
var jsonBody []byte
var err error
if byteBody, ok := body.([]byte); ok {
jsonBody = byteBody
} else {
jsonBody, err = json.Marshal(body)
if err != nil {
return nil, &ErrorMessage{500, fmt.Errorf("failed to marshal request body: %w", err)}
func (c *Client) isModelQuotaExceeded(model string) bool {
if lastExceededTime, hasKey := c.modelQuotaExceeded[model]; hasKey {
duration := time.Now().Sub(*lastExceededTime)
if duration > 30*time.Minute {
return false
}
return true
}
// log.Debug(string(jsonBody))
reqBody := bytes.NewBuffer(jsonBody)
// Add alt=sse for streaming
url := fmt.Sprintf("%s/%s:%s", codeAssistEndpoint, apiVersion, endpoint)
if stream {
url = url + "?alt=sse"
}
req, err := http.NewRequestWithContext(ctx, "POST", url, reqBody)
if err != nil {
return nil, &ErrorMessage{500, fmt.Errorf("failed to create request: %w", err)}
}
token, err := c.httpClient.Transport.(*oauth2.Transport).Source.Token()
if err != nil {
return nil, &ErrorMessage{500, fmt.Errorf("failed to get token: %w", err)}
}
// Set headers
metadataStr := getClientMetadataString()
req.Header.Set("Content-Type", "application/json")
req.Header.Set("User-Agent", getUserAgent())
req.Header.Set("Client-Metadata", metadataStr)
req.Header.Set("Authorization", fmt.Sprintf("Bearer %s", token.AccessToken))
resp, err := c.httpClient.Do(req)
if err != nil {
return nil, &ErrorMessage{500, fmt.Errorf("failed to execute request: %w", err)}
}
if resp.StatusCode < 200 || resp.StatusCode >= 300 {
defer func() {
if err = resp.Body.Close(); err != nil {
log.Printf("warn: failed to close response body: %v", err)
}
}()
bodyBytes, _ := io.ReadAll(resp.Body)
return nil, &ErrorMessage{resp.StatusCode, fmt.Errorf(string(bodyBytes))}
}
return resp.Body, nil
return false
}
// SendMessageStream handles a single conversational turn, including tool calls.
func (c *Client) SendMessage(ctx context.Context, rawJson []byte, model string, contents []Content, tools []ToolDeclaration) ([]byte, *ErrorMessage) {
request := GenerateContentRequest{
Contents: contents,
GenerationConfig: GenerationConfig{
ThinkingConfig: GenerationConfigThinkingConfig{
IncludeThoughts: true,
},
},
func (c *Client) getPreviewModel(model string) string {
if models, hasKey := previewModels[model]; hasKey {
for i := 0; i < len(models); i++ {
if !c.isModelQuotaExceeded(models[i]) {
return models[i]
}
}
}
request.Tools = tools
return ""
}
requestBody := map[string]interface{}{
"project": c.tokenStorage.ProjectID, // Assuming ProjectID is available
"request": request,
"model": model,
func (c *Client) IsModelQuotaExceeded(model string) bool {
if c.isModelQuotaExceeded(model) {
if c.cfg.QuotaExceeded.SwitchPreviewModel {
return c.getPreviewModel(model) == ""
}
return true
}
byteRequestBody, _ := json.Marshal(requestBody)
// log.Debug(string(byteRequestBody))
reasoningEffortResult := gjson.GetBytes(rawJson, "reasoning_effort")
if reasoningEffortResult.String() == "none" {
byteRequestBody, _ = sjson.DeleteBytes(byteRequestBody, "request.generationConfig.thinkingConfig.include_thoughts")
byteRequestBody, _ = sjson.SetBytes(byteRequestBody, "request.generationConfig.thinkingConfig.thinkingBudget", 0)
} else if reasoningEffortResult.String() == "auto" {
byteRequestBody, _ = sjson.SetBytes(byteRequestBody, "request.generationConfig.thinkingConfig.thinkingBudget", -1)
} else if reasoningEffortResult.String() == "low" {
byteRequestBody, _ = sjson.SetBytes(byteRequestBody, "request.generationConfig.thinkingConfig.thinkingBudget", 1024)
} else if reasoningEffortResult.String() == "medium" {
byteRequestBody, _ = sjson.SetBytes(byteRequestBody, "request.generationConfig.thinkingConfig.thinkingBudget", 8192)
} else if reasoningEffortResult.String() == "high" {
byteRequestBody, _ = sjson.SetBytes(byteRequestBody, "request.generationConfig.thinkingConfig.thinkingBudget", 24576)
} else {
byteRequestBody, _ = sjson.SetBytes(byteRequestBody, "request.generationConfig.thinkingConfig.thinkingBudget", -1)
}
temperatureResult := gjson.GetBytes(rawJson, "temperature")
if temperatureResult.Exists() && temperatureResult.Type == gjson.Number {
byteRequestBody, _ = sjson.SetBytes(byteRequestBody, "request.generationConfig.temperature", temperatureResult.Num)
}
topPResult := gjson.GetBytes(rawJson, "top_p")
if topPResult.Exists() && topPResult.Type == gjson.Number {
byteRequestBody, _ = sjson.SetBytes(byteRequestBody, "request.generationConfig.topP", topPResult.Num)
}
topKResult := gjson.GetBytes(rawJson, "top_k")
if topKResult.Exists() && topKResult.Type == gjson.Number {
byteRequestBody, _ = sjson.SetBytes(byteRequestBody, "request.generationConfig.topK", topKResult.Num)
}
// log.Debug(string(byteRequestBody))
respBody, err := c.APIRequest(ctx, "generateContent", byteRequestBody, false)
if err != nil {
return nil, err
}
bodyBytes, errReadAll := io.ReadAll(respBody)
if errReadAll != nil {
return nil, &ErrorMessage{StatusCode: 500, Error: errReadAll}
}
return bodyBytes, nil
return false
}
// CheckCloudAPIIsEnabled sends a simple test request to the API to verify

View File

@@ -3,13 +3,16 @@ package cmd
import (
"context"
"encoding/json"
"fmt"
"github.com/luispater/CLIProxyAPI/internal/api"
"github.com/luispater/CLIProxyAPI/internal/auth"
"github.com/luispater/CLIProxyAPI/internal/client"
"github.com/luispater/CLIProxyAPI/internal/config"
log "github.com/sirupsen/logrus"
"golang.org/x/net/proxy"
"io/fs"
"net"
"net/http"
"net/url"
"os"
"os/signal"
"path/filepath"
@@ -22,13 +25,6 @@ import (
// It loads all available authentication tokens, creates a pool of clients,
// starts the API server, and handles graceful shutdown signals.
func StartService(cfg *config.Config) {
// Configure the API server based on the main application config.
apiConfig := &api.ServerConfig{
Port: fmt.Sprintf("%d", cfg.Port),
Debug: cfg.Debug,
ApiKeys: cfg.ApiKeys,
}
// Create a pool of API clients, one for each token file found.
cliClients := make([]*client.Client, 0)
err := filepath.Walk(cfg.AuthDir, func(path string, info fs.FileInfo, err error) error {
@@ -72,9 +68,43 @@ func StartService(cfg *config.Config) {
log.Fatalf("Error walking auth directory: %v", err)
}
if len(cfg.GlAPIKey) > 0 {
var transport *http.Transport
proxyURL, errParse := url.Parse(cfg.ProxyUrl)
if errParse == nil {
if proxyURL.Scheme == "socks5" {
username := proxyURL.User.Username()
password, _ := proxyURL.User.Password()
proxyAuth := &proxy.Auth{User: username, Password: password}
dialer, errSOCKS5 := proxy.SOCKS5("tcp", proxyURL.Host, proxyAuth, proxy.Direct)
if errSOCKS5 != nil {
log.Fatalf("create SOCKS5 dialer failed: %v", errSOCKS5)
}
transport = &http.Transport{
DialContext: func(ctx context.Context, network, addr string) (net.Conn, error) {
return dialer.Dial(network, addr)
},
}
} else if proxyURL.Scheme == "http" || proxyURL.Scheme == "https" {
// Handle HTTP/HTTPS proxy.
transport = &http.Transport{Proxy: http.ProxyURL(proxyURL)}
}
}
for i := 0; i < len(cfg.GlAPIKey); i++ {
httpClient := &http.Client{}
if transport != nil {
httpClient.Transport = transport
}
log.Debug("Initializing with Generative Language API key...")
cliClient := client.NewClient(httpClient, nil, cfg, cfg.GlAPIKey[i])
cliClients = append(cliClients, cliClient)
}
}
// Create and start the API server with the pool of clients.
apiServer := api.NewServer(apiConfig, cliClients)
log.Infof("Starting API server on port %s", apiConfig.Port)
apiServer := api.NewServer(cfg, cliClients)
log.Infof("Starting API server on port %d", cfg.Port)
if err = apiServer.Start(); err != nil {
log.Fatalf("API server failed to start: %v", err)
}

View File

@@ -11,13 +11,24 @@ type Config struct {
// Port is the network port on which the API server will listen.
Port int `yaml:"port"`
// AuthDir is the directory where authentication token files are stored.
AuthDir string `yaml:"auth_dir"`
AuthDir string `yaml:"auth-dir"`
// Debug enables or disables debug-level logging and other debug features.
Debug bool `yaml:"debug"`
// ProxyUrl is the URL of an optional proxy server to use for outbound requests.
ProxyUrl string `yaml:"proxy-url"`
// ApiKeys is a list of keys for authenticating clients to this proxy server.
ApiKeys []string `yaml:"api_keys"`
ApiKeys []string `yaml:"api-keys"`
// QuotaExceeded defines the behavior when a quota is exceeded.
QuotaExceeded ConfigQuotaExceeded `yaml:"quota-exceeded"`
// GlAPIKey is the API key for the generative language API.
GlAPIKey []string `yaml:"generative-language-api-key"`
}
type ConfigQuotaExceeded struct {
// SwitchProject indicates whether to automatically switch to another project when a quota is exceeded.
SwitchProject bool `yaml:"switch-project"`
// SwitchPreviewModel indicates whether to automatically switch to a preview model when a quota is exceeded.
SwitchPreviewModel bool `yaml:"switch-preview-model"`
}
// LoadConfig reads a YAML configuration file from the given path,