cli2api

Troubleshooting

Common errors and FAQ.

Common errors

no mulerun credentials found

Run mulerun login, or export MULERUN_TOKEN=mr_xxx. The token_source field in the startup log shows which file or environment variable the token was read from.

/v1/chat/completions returns 401 Invalid API Key format

The studio OAuth token cannot call the text endpoints, because MuleRun runs two independent auth systems. Get a separate LLM-gateway key from the MuleRun console and set it as MULERUN_TOKEN. Image, video, and audio endpoints are unaffected (they use the studio-plane token).

502 upstream HTTP 401

The upstream token has expired or been rejected. Log in again or rotate the token. The OAuth cache records an expires_at and cli2api skips expired tokens, but the simplest fix is to run mulerun login again.
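To confirm that a cached token really has expired before logging in again, a check along these lines may help. It is only a sketch: it assumes expires_at is stored as an ISO 8601 timestamp string, which is not confirmed here, and is_expired is a hypothetical helper name, not part of cli2api.

```python
from datetime import datetime, timezone
from typing import Optional


def is_expired(expires_at: str, now: Optional[datetime] = None) -> bool:
    """Return True if `expires_at` is at or before `now` (default: current UTC time)."""
    # Assumption: expires_at is ISO 8601, e.g. "2030-01-01T00:00:00+00:00".
    expiry = datetime.fromisoformat(expires_at)
    if expiry.tzinfo is None:
        # Assumption: timestamps without an offset are treated as UTC.
        expiry = expiry.replace(tzinfo=timezone.utc)
    current = now if now is not None else datetime.now(timezone.utc)
    return expiry <= current
```

If the check returns True for the cached value, running mulerun login again should replace the stale entry.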

vendor_error: code 3005 / 3006 / ...

This is a real upstream error: the MiniMax/Seedance/Wan service itself failed. cli2api is working as intended and surfaces the structured upstream error to the client. Retry, or switch models.

404 unknown image model: dall-e-3

cli2api does not alias OpenAI model names to MuleRun models. Use the real names (gpt-image-2 / wan2.6-t2i / midjourney). Run curl localhost:8080/v1/models | jq '.data[].id' for the full list.
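The same lookup can be done from Python when jq is not available. This is a minimal sketch that assumes the response has the data[].id shape implied by the jq filter above; model_ids and fetch_model_ids are hypothetical helper names, and the base URL is the one used in the curl example.

```python
import json
from typing import List
from urllib.request import urlopen


def model_ids(payload: dict) -> List[str]:
    """Extract the model IDs from a /v1/models response body."""
    # Assumption: the body looks like {"data": [{"id": "..."}, ...]}.
    return [entry["id"] for entry in payload.get("data", [])]


def fetch_model_ids(base_url: str = "http://localhost:8080") -> List[str]:
    """Fetch /v1/models from a running cli2api instance and list the IDs."""
    with urlopen(base_url + "/v1/models") as resp:
        return model_ids(json.load(resp))
```

Checking a model name against the returned list before sending a request avoids the 404 above.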

Video / music job stuck at queued

Check CLI2API_JOB_RETENTION and CLI2API_JOB_HARD_CAP_MULT: a too-short retention combined with a small multiplier means the reaper deletes the job before polling finishes. The defaults (7d / 3×) are fine; only shrink them for short-lived tests.
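The interaction between the two settings can be estimated with a little arithmetic. The sketch below assumes the hard cap is simply retention multiplied by the multiplier, and that a job older than the hard cap is removed regardless of its state; both are assumptions about the reaper, and the function names are hypothetical.

```python
from datetime import timedelta


def hard_cap(retention: timedelta, multiplier: float) -> timedelta:
    """Assumed hard cap: retention scaled by the multiplier."""
    return retention * multiplier


def reaped_before_done(
    retention: timedelta, multiplier: float, expected_runtime: timedelta
) -> bool:
    """True if a job that needs `expected_runtime` would outlive the assumed hard cap."""
    return expected_runtime > hard_cap(retention, multiplier)
```

With the defaults this gives a 21-day window, far longer than any render; with a 30-second retention and a 2× multiplier, a five-minute video job would be at risk.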

Job ID gone after restart

The in-memory store is lost on restart. Use CLI2API_JOBSTORE_DSN=file:... or a remote libsql to persist.

request body too large (400)

The per-request cap is 64 MB. It is enforced by chi's RequestSize middleware and reported as an OpenAI-style 400.
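A client can check the encoded size of a JSON body before sending it. This is a sketch only: it assumes 64 MB means 64 × 1024 × 1024 bytes, which is not stated here, and body_size and fits_request_cap are hypothetical helper names.

```python
import json

# Assumption: "64 MB" is interpreted as 64 MiB; the server's exact byte limit may differ.
MAX_BODY_BYTES = 64 * 1024 * 1024


def body_size(payload: dict) -> int:
    """Size in bytes of the payload once serialized as UTF-8 JSON."""
    return len(json.dumps(payload).encode("utf-8"))


def fits_request_cap(payload: dict, limit: int = MAX_BODY_BYTES) -> bool:
    """True if the serialized payload is within the assumed per-request cap."""
    return body_size(payload) <= limit
```

Large base64-encoded images are the usual cause of oversized bodies, so measuring the serialized payload rather than the raw file is the safer check.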

SSE streaming shows no increments

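Your reverse proxy is probably still buffering the response because proxy_buffering was not disabled; nginx and Caddy each need this configured. Also confirm that your client reads the response as a stream (stream=True) rather than calling json(), which waits for the whole body.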
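On the client side, incremental reading amounts to handling one data: line at a time. The sketch below assumes the stream follows the OpenAI convention of data: lines carrying JSON, ending with data: [DONE]; iter_sse_chunks is a hypothetical helper name.

```python
import json
from typing import Iterable, Iterator


def iter_sse_chunks(lines: Iterable[str]) -> Iterator[dict]:
    """Yield one parsed JSON chunk per SSE data line, stopping at [DONE]."""
    for raw in lines:
        line = raw.strip()
        if not line.startswith("data:"):
            continue  # skip blank separators, comments, and other SSE fields
        payload = line[len("data:"):].strip()
        if payload == "[DONE]":
            break  # assumed OpenAI-style end-of-stream marker
        yield json.loads(payload)
```

With the requests library, the lines would come from resp.iter_lines(decode_unicode=True) on a response opened with stream=True. If chunks still arrive all at once, the buffering is happening in the proxy rather than in the client.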

FAQ

Why a proxy instead of calling MuleRun directly?

/v1/chat/completions and /v1/messages are already compatible, so you can call MuleRun directly for those if you want. Image, video, and audio, however, use a /vendors/{vendor}/... async job shape that differs from OpenAI's. cli2api hides that difference so existing SDK code runs unchanged.

Does it cache results?

No. Every call is a fresh MuleRun task. Cache at the application or CDN layer.

Can it run in Lambda / Cloud Functions?

The synchronous text and image endpoints, yes. Video and music are explicitly async: the client polls, so instances can restart freely. Pair this with a persistent libsql store.
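Because the client owns the polling loop, it can survive instance restarts as long as the job record persists. The following is a minimal sketch of such a loop; the terminal status names other than queued are assumptions, and poll_job and fetch_status are hypothetical names (fetch_status stands for whatever call retrieves the job's current state).

```python
import time
from typing import Callable

# Assumption: these are the terminal states; only "queued" is named in this page.
TERMINAL_STATES = {"succeeded", "failed", "cancelled"}


def poll_job(
    fetch_status: Callable[[], dict],
    interval: float = 2.0,
    timeout: float = 600.0,
    sleep: Callable[[float], None] = time.sleep,
) -> dict:
    """Call fetch_status until the job reaches a terminal state or the timeout passes."""
    deadline = time.monotonic() + timeout
    while True:
        job = fetch_status()
        if job.get("status") in TERMINAL_STATES:
            return job
        if time.monotonic() >= deadline:
            raise TimeoutError("job did not finish before the timeout")
        sleep(interval)
```

The sleep parameter is injectable so the loop can be tested without waiting; in production code the default time.sleep is enough.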

Multi-tenancy?

There are no built-in users or quotas. CLI2API_API_KEYS is a flat allow-list. Enforce quotas in an API gateway placed in front of cli2api.

How well-reviewed is this code?

The project went through 6 rounds of reviewer/reviewee iteration (codex ×3 + cc ×2 + live e2e), which fixed 26 real bugs, including credential leaks, async jobs stuck forever, the reaper deleting live jobs, and upstream schema nesting. There are 50+ unit tests, and each bug has a regression test. See DEVELOPMENT.md.
