Text

All three text endpoints are transparent proxies to MuleRun's native OpenAI/Anthropic-compatible APIs, with SSE streaming preserved.

Which models your account can reach depends on what MuleRun has enabled for it. A typical muk- studio key opens deepseek-v4-* plus the GPT-5.x code-plane models (openai/gpt-5.5, openai/gpt-5.4-mini, …). Unprefixed gpt-5 / claude-* models need an LLM-gateway key. If the gateway returns Model '…' is not supported, that is an account-tier issue; see Troubleshooting.

POST /v1/chat/completions

curl http://localhost:51222/v1/chat/completions \
  -H "Authorization: Bearer $CLI2API_KEY" \
  -H "Content-Type: application/json" \
  -d '{"model":"gpt-5","messages":[{"role":"user","content":"hello"}]}'

A vendor/ prefix on the model name routes the request via the code-plane (currently openai/* goes to /vendors/openai/v1/chat/completions, with the prefix stripped before forwarding). See Models.
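The routing rule above can be pictured with a small sketch. This is not cli2api's actual code: the function name `route_chat_model`, the `CODE_PLANE_VENDORS` set, and the default upstream path for unprefixed models are all assumptions made for illustration.

```python
# Hypothetical sketch of vendor-prefix routing; not cli2api's real implementation.
CODE_PLANE_VENDORS = {"openai"}  # per the text, only openai/* is routed this way today

# Assumed upstream path for models without a recognized vendor prefix.
DEFAULT_PATH = "/v1/chat/completions"


def route_chat_model(model: str) -> tuple[str, str]:
    """Return (upstream_path, model_name_to_forward) for a chat request."""
    vendor, sep, bare = model.partition("/")
    if sep and bare and vendor in CODE_PLANE_VENDORS:
        # The vendor prefix is stripped before the request is forwarded.
        return f"/vendors/{vendor}/v1/chat/completions", bare
    # No recognized prefix: forward the model name untouched.
    return DEFAULT_PATH, model


print(route_chat_model("openai/gpt-5.5"))
print(route_chat_model("gpt-5"))
```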

POST /v1/responses

Transparent proxy to MuleRun's /vendors/openai/v1/responses (the OpenAI Agents SDK entrypoint). Supports "stream": true for SSE streaming and "background": true for async jobs.

curl http://localhost:51222/v1/responses \
  -H "Authorization: Bearer $CLI2API_KEY" \
  -H "Content-Type: application/json" \
  -d '{"model":"gpt-5","input":"Summarize Black-Scholes in one paragraph."}'

Built-in tools (web_search, file_search, code_interpreter, image_generation)

The Responses API exposes OpenAI's server-side tool catalog as part of the request body. cli2api doesn't intercept tools; they pass straight through. This is the path to use when your app needs live web search: MuleRun has no standalone /v1/search endpoint, but the model can call web_search itself, and the results come back inlined in the response.

curl http://localhost:51222/v1/responses \
  -H "Authorization: Bearer $CLI2API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "gpt-5.5",
    "input": "What is the latest stable Go release as of today? Cite the source.",
    "tools": [{"type": "web_search"}],
    "max_output_tokens": 500
  }'

The response's output array interleaves tool calls with the model's messages:

{
  "status": "completed",
  "output": [
    {"type": "web_search_call", "status": "completed",
     "action": {"type": "search", "query": "latest stable Go release 2026"}},
    {"type": "message", "status": "completed",
     "content": [{"type": "output_text",
                  "text": "The latest stable release is go1.26.4 …",
                  "annotations": [{"type": "url_citation", "url": "https://go.dev/dl/", "title": "Downloads - The Go Programming Language"}]}]}
  ]
}
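To pull the answer text and its cited URLs out of that structure, something like the following may help. It is a minimal sketch that assumes the response has exactly the shape shown above; the helper name `extract_text_and_citations` is hypothetical.

```python
# Trimmed copy of the sample response shown above.
sample = {
    "status": "completed",
    "output": [
        {"type": "web_search_call", "status": "completed",
         "action": {"type": "search", "query": "latest stable Go release 2026"}},
        {"type": "message", "status": "completed",
         "content": [{"type": "output_text",
                      "text": "The latest stable release is go1.26.4 …",
                      "annotations": [{"type": "url_citation",
                                       "url": "https://go.dev/dl/",
                                       "title": "Downloads - The Go Programming Language"}]}]},
    ],
}


def extract_text_and_citations(response: dict) -> tuple[str, list[str]]:
    """Collect visible text and cited URLs from a Responses-style output array."""
    texts, urls = [], []
    for item in response.get("output", []):
        if item.get("type") != "message":
            continue  # skip web_search_call and any other non-message items
        for part in item.get("content", []):
            if part.get("type") != "output_text":
                continue
            texts.append(part.get("text", ""))
            for ann in part.get("annotations", []):
                if ann.get("type") == "url_citation":
                    urls.append(ann["url"])
    return "".join(texts), urls


text, urls = extract_text_and_citations(sample)
print(text)
print(urls)
```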

Tool variants

{"type": "web_search"} — search the public web (GPT-5.x family).
{"type": "file_search"} — search uploaded files / vector stores.
{"type": "code_interpreter"} — sandbox Python.
{"type": "image_generation"} — call OpenAI's image model inline.
Custom function tools work too — cli2api passes everything through.
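As an illustration of mixing a built-in tool with a custom function tool in one request body, here is a hedged sketch. The `lookup_ticker` tool, its schema, and the `build_responses_body` helper are invented for this example; the flat `{"type": "function", "name": …, "parameters": …}` layout follows OpenAI's Responses API convention, which cli2api is described as passing through unchanged.

```python
import json

# Hypothetical custom function tool; the name and schema are illustrative only.
lookup_ticker = {
    "type": "function",
    "name": "lookup_ticker",
    "description": "Return the latest price for a stock ticker.",
    "parameters": {
        "type": "object",
        "properties": {"symbol": {"type": "string"}},
        "required": ["symbol"],
        "additionalProperties": False,
    },
}


def build_responses_body(model, prompt, tools, max_output_tokens=500):
    """Assemble a /v1/responses request body as a plain dict."""
    return {
        "model": model,
        "input": prompt,
        "tools": list(tools),
        "max_output_tokens": max_output_tokens,
    }


body = build_responses_body(
    "gpt-5.5",
    "Find the current price of a ticker and cite a source.",
    [{"type": "web_search"}, lookup_ticker],
)
print(json.dumps(body, indent=2))
```

The printed JSON can be sent as the `-d` payload of the curl call shown earlier.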

Reasoning tokens eat the budget

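The GPT-5.x family can burn most of max_output_tokens on hidden reasoning before emitting any visible message. Set max_output_tokens to ≥500 for tool-using calls; otherwise the response can be cut off with empty text (status: incomplete on /v1/responses, finish_reason: length on /v1/chat/completions). To see how much went to thinking, check usage.output_tokens_details.reasoning_tokens on /v1/responses or usage.completion_tokens_details.reasoning_tokens on /v1/chat/completions.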
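A quick way to see how much of the budget went to reasoning is to compute the share from the usage block. The sketch below assumes the usage object follows either the Chat Completions field names (completion_tokens, completion_tokens_details) or the Responses field names (output_tokens, output_tokens_details); the helper name `reasoning_share` and the sample numbers are made up for illustration.

```python
from typing import Optional


def reasoning_share(usage: dict) -> Optional[float]:
    """Fraction of generated tokens spent on hidden reasoning, or None if unknown."""
    # Try the Chat Completions field names first, then the Responses ones.
    for total_key, details_key in (
        ("completion_tokens", "completion_tokens_details"),
        ("output_tokens", "output_tokens_details"),
    ):
        total = usage.get(total_key)
        details = usage.get(details_key) or {}
        reasoning = details.get("reasoning_tokens")
        if total and reasoning is not None:
            return reasoning / total
    return None


# Made-up numbers: 450 of 500 generated tokens were reasoning.
usage = {"output_tokens": 500, "output_tokens_details": {"reasoning_tokens": 450}}
share = reasoning_share(usage)
if share is not None and share > 0.8:
    print(f"{share:.0%} of the output budget went to reasoning; consider raising max_output_tokens")
```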

POST /v1/messages

Anthropic Messages API shape. Accepts either an x-api-key header or Authorization: Bearer. Claude's own web_search_20250305 tool flows through here unmodified.

curl http://localhost:51222/v1/messages \
  -H "x-api-key: $CLI2API_KEY" \
  -H "anthropic-version: 2023-06-01" \
  -H "Content-Type: application/json" \
  -d '{"model":"claude-sonnet-4-6","max_tokens":256,"messages":[{"role":"user","content":"hi"}]}'
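For the web_search_20250305 tool mentioned above, a request body might be assembled as follows. This is a sketch under assumptions: the tool entry follows Anthropic's documented server-tool layout (type, name, optional max_uses), while the helper name `build_messages_body`, the max_tokens value, and the max_uses value are arbitrary choices for illustration.

```python
import json

# Anthropic's server-side web search tool; max_uses=3 is an arbitrary example cap.
WEB_SEARCH_TOOL = {"type": "web_search_20250305", "name": "web_search", "max_uses": 3}


def build_messages_body(prompt, model="claude-sonnet-4-6", max_tokens=1024):
    """Assemble a /v1/messages request body with web search enabled."""
    return {
        "model": model,
        "max_tokens": max_tokens,
        "tools": [WEB_SEARCH_TOOL],
        "messages": [{"role": "user", "content": prompt}],
    }


body = build_messages_body("What is the latest stable Go release? Cite the source.")
print(json.dumps(body, indent=2))
```

The resulting JSON can replace the `-d` payload in the curl call above.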

SDK usage

# OpenAI SDK: point base_url at the proxy's /v1 root.
from openai import OpenAI
c = OpenAI(api_key="local-key", base_url="http://localhost:51222/v1")

r = c.chat.completions.create(
    model="gpt-5",
    messages=[{"role": "user", "content": "hi"}],
)
print(r.choices[0].message.content)

# Anthropic SDK: base_url is the proxy root; the SDK appends /v1/messages itself.
from anthropic import Anthropic
a = Anthropic(api_key="local-key", base_url="http://localhost:51222")

r = a.messages.create(
    model="claude-sonnet-4-6",
    max_tokens=256,
    messages=[{"role": "user", "content": "hi"}],
)
print(r.content[0].text)

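# Streaming: reuses the OpenAI client `c` defined above.
for chunk in c.chat.completions.create(
    model="gpt-5",
    messages=[{"role": "user", "content": "tell me a joke"}],
    stream=True,
):
    if chunk.choices:  # some chunks (e.g. usage-only) carry no choices
        print(chunk.choices[0].delta.content or "", end="")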
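Without an SDK, the same stream can be read directly from the SSE lines. The sketch below assumes the proxy forwards the standard Chat Completions stream format (`data: {json}` lines terminated by `data: [DONE]`); the helper name `iter_sse_deltas` and the sample lines are illustrative, not captured output.

```python
import json
from typing import Iterable, Iterator


def iter_sse_deltas(lines: Iterable[str]) -> Iterator[str]:
    """Yield content deltas from Chat Completions SSE lines."""
    for raw in lines:
        line = raw.strip()
        if not line.startswith("data:"):
            continue  # skip blank separators, comments, and event: lines
        payload = line[len("data:"):].strip()
        if payload == "[DONE]":
            break  # end-of-stream sentinel
        chunk = json.loads(payload)
        for choice in chunk.get("choices", []):
            text = (choice.get("delta") or {}).get("content")
            if text:
                yield text


# Hand-written sample lines in the assumed stream format.
sample_stream = [
    'data: {"choices":[{"delta":{"role":"assistant"}}]}',
    "",
    'data: {"choices":[{"delta":{"content":"Hel"}}]}',
    'data: {"choices":[{"delta":{"content":"lo"}}]}',
    "data: [DONE]",
]
print("".join(iter_sse_deltas(sample_stream)))
```

In practice the `lines` argument would come from an HTTP client that iterates over the response body line by line.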
