# Lazu Docs

# Lazu

One API key, every major model, transparent per-lane pricing. Drop-in
OpenAI-compatible — and native Anthropic / Gemini work as-is.

  - [Quickstart](/quickstart): First call in under 60 seconds.
  - [Pricing & lanes](/models/pricing): Pick direct or cheap per vendor. Three-line transparent pricing on every model.
  - [Files API](/endpoints/files): Upload PDFs and images, reference via file_id from Responses. 30-day retention.
  - [API reference](/api-reference): Interactive OpenAPI — try every endpoint in the browser.

> [info]

  Self-hosted Lazu uses the same paths and OpenAI-compatible client config. Just
  point `base_url` at your own API domain.

## Why Lazu

- **One key, every provider.** OpenAI, Anthropic, Google, DeepSeek, Moonshot, xAI, Cloudflare Workers AI, and more.
- **Choose your lane.** `direct` (first-party / Azure / AWS / GCP) or `cheap` (lower-cost path). Pricing is transparent per lane.
- **OpenAI SDK works as-is.** Same `base_url` swap any existing project uses.
- **Native APIs too.** Anthropic `/v1/messages`, Gemini `/v1beta/...` — no proxy mismatch.

## Endpoints

| Endpoint                                    | Use for                                 | Docs                                |
| ------------------------------------------- | --------------------------------------- | ----------------------------------- |
| `/v1/chat/completions`                      | OpenAI-compatible chat                  | [Chat](/endpoints/chat)             |
| `/v1/responses`                             | Reasoning models + `file_id` references | [Responses](/endpoints/responses)   |
| `/v1/embeddings`                            | Text embeddings                         | [Embeddings](/endpoints/embeddings) |
| `/v1/files`                                 | Upload / list / download / delete files | [Files](/endpoints/files)           |
| `/v1/audio/speech`, `/v1/audio/transcriptions` | TTS, Whisper STT                     | [API reference](/api-reference)     |
| `/v1/images/generations`, `/v1/images/edits`   | Image gen / edit / variation         | [API reference](/api-reference)     |
| `/v1/messages`                              | Anthropic native                        | [API reference](/api-reference)     |
| `/v1beta/models/{model}:generateContent`    | Gemini native                           | [API reference](/api-reference)     |
| `/v1/models`, `/api/models/catalog`         | Model discovery                         | [Catalog](/models/catalog)          |

## Base URL

```
https://api.lazu.ai
```

## Get started

1. Get an API key in the [console](https://lazu.ai/console/token).
2. (Optional) Pick a lane per vendor under the key's advanced settings.
3. Send a request — see [Quickstart](/quickstart).

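# Lazu Docs (Simplified Chinese)

Lazu is a unified AI model gateway. One API key connects you to the major model providers while keeping OpenAI-compatible calls, native provider endpoints, model discovery, quota control, and usage logs.

  - Base URL: The hosted version uses https://api.lazu.ai; when self-hosting, replace it with your own API domain.
  - Model catalog: GET /api/models/catalog returns the models and capabilities actually available to the current API key.
  - [Code examples](/zh/examples): Chat, streaming, multimodal input, error handling, and client wrappers.
  - [API reference](/zh/api-reference): Interactive OpenAPI reference for checking request and response structure field by field.

> [info]

  Self-hosted deployments keep the same API paths and OpenAI-compatible client
  configuration; only the base URL changes to your own API domain.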

## Core flow

1. Create an API key in the Lazu console.
2. Call `GET /api/models/catalog`.
3. Choose a model based on `modality`, `supported_endpoint_types`, `supported_endpoints`, `pricing`, `parameters`, and `default_endpoint_type`.
4. Send the request to the endpoint the model recommends.
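
The selection step can be sketched as a small filter over the catalog response. This is only a sketch: it assumes the catalog has already been fetched and parsed into a list of dictionaries, and the `id` key, the list shape, and the sample entries are assumptions for illustration that should be checked against the real response.

```python
from typing import Any


def pick_models(catalog: list[dict[str, Any]], endpoint_type: str) -> list[str]:
    """Return ids of catalog entries that advertise the given endpoint type."""
    return [
        entry["id"]
        for entry in catalog
        if endpoint_type in entry.get("supported_endpoint_types", [])
    ]


# Hypothetical, hand-written entries standing in for a parsed catalog response.
sample_catalog = [
    {"id": "model-a", "supported_endpoint_types": ["chat", "responses"]},
    {"id": "model-b", "supported_endpoint_types": ["embeddings"]},
]

print(pick_models(sample_catalog, "chat"))
```

Filtering at runtime like this keeps agents from depending on a model name that the current key cannot actually call.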

## Authentication

The most widely supported authentication method:

```txt
Authorization: Bearer YOUR_API_KEY
```

| API style         | Recommended authentication                            |
| ----------------- | ----------------------------------------------------- |
| OpenAI-compatible | `Authorization: Bearer YOUR_API_KEY`                  |
| Anthropic native  | `x-api-key: YOUR_API_KEY` + `anthropic-version`       |
| Gemini native     | `?key=YOUR_API_KEY` or `x-goog-api-key: YOUR_API_KEY` |

Tokens can be restricted by accessible models, quota, expiration time, and IP allowlist. For agents, scripts, or CI, create a separate restricted token.

## Quickstart

```bash
curl https://api.lazu.ai/api/models/catalog \
  -H "Authorization: Bearer YOUR_API_KEY"
```

```python
from openai import OpenAI

client = OpenAI(
    base_url="https://api.lazu.ai/v1",
    api_key="YOUR_API_KEY",
)

response = client.chat.completions.create(
    model="gpt-4o-mini",
    messages=[{"role": "user", "content": "Hello from Lazu"}],
)
print(response.choices[0].message.content)
```

## Choosing a model

`GET /api/models/catalog` is the source of truth for the current token. Agents and scripts should read the catalog first rather than guess model names.

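# Lazu Docs (Traditional Chinese)

> [warning]

  The Traditional Chinese version is still being proofread, and this page is
  currently a summary. For the full content, see the
  [Simplified Chinese version](/zh/) or [English](/).

Lazu is a unified AI model gateway. One API key connects you to the major model providers while keeping OpenAI-compatible calls, native provider endpoints, model discovery, quota control, and usage logs.

  - Base URL: The hosted version uses https://api.lazu.ai; when self-hosting, replace it with your own API domain.
  - Model catalog: GET /api/models/catalog returns the models and capabilities actually available to the current API key.
  - [Code examples](/zh/examples): Chat, streaming, multimodal input, error handling, and client wrappers (currently shown in Simplified Chinese).
  - [API Reference](/zh/api-reference): Interactive OpenAPI reference (currently shown in Simplified Chinese).

## Authentication

The most widely supported authentication method: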

```txt
Authorization: Bearer YOUR_API_KEY
```

| API style         | Recommended authentication                            |
| ----------------- | ----------------------------------------------------- |
| OpenAI-compatible | `Authorization: Bearer YOUR_API_KEY`                  |
| Anthropic native  | `x-api-key: YOUR_API_KEY` + `anthropic-version`       |
| Gemini native     | `?key=YOUR_API_KEY` or `x-goog-api-key: YOUR_API_KEY` |

## Quickstart

```bash
curl https://api.lazu.ai/api/models/catalog \
  -H "Authorization: Bearer YOUR_API_KEY"
```

```python
from openai import OpenAI

client = OpenAI(
    base_url="https://api.lazu.ai/v1",
    api_key="YOUR_API_KEY",
)

response = client.chat.completions.create(
    model="gpt-4o-mini",
    messages=[{"role": "user", "content": "Hello from Lazu"}],
)
print(response.choices[0].message.content)
```

For more examples and troubleshooting details, see the [Simplified Chinese version](/zh/) or [English](/).

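# Lazu Docs (Japanese)

> [warning]

  The Japanese version is currently in preparation. For details, see
  [English](/) or [Simplified Chinese](/zh/).

Lazu is a unified AI model gateway. With one API key you can connect to the major model providers and still use OpenAI-compatible calls, provider-native endpoints, model discovery, quota management, and usage logs.

  - Base URL: Hosted Lazu uses https://api.lazu.ai. For self-hosting, replace it with your own API domain.
  - Model catalog: GET /api/models/catalog returns the models and capabilities actually available to the current API key.
  - [Code samples](/examples): Chat, streaming, multimodal input, and error handling (currently English only).
  - [API reference](/api-reference): Interactive OpenAPI reference (currently English only).

## Authentication

The most widely supported authentication method: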

```txt
Authorization: Bearer YOUR_API_KEY
```

| API style         | Recommended authentication                            |
| ----------------- | ----------------------------------------------------- |
| OpenAI-compatible | `Authorization: Bearer YOUR_API_KEY`                  |
| Anthropic native  | `x-api-key: YOUR_API_KEY` + `anthropic-version`       |
| Gemini native     | `?key=YOUR_API_KEY` or `x-goog-api-key: YOUR_API_KEY` |

## Quickstart

```bash
curl https://api.lazu.ai/api/models/catalog \
  -H "Authorization: Bearer YOUR_API_KEY"
```

```python
from openai import OpenAI

client = OpenAI(
    base_url="https://api.lazu.ai/v1",
    api_key="YOUR_API_KEY",
)

response = client.chat.completions.create(
    model="gpt-4o-mini",
    messages=[{"role": "user", "content": "Hello from Lazu"}],
)
print(response.choices[0].message.content)
```

For detailed samples and troubleshooting, see the [English version](/).

# Code Examples

These examples use the hosted Lazu base URL. For self-hosted deployments, replace it with your own API origin.

## OpenAI-compatible chat

### Python

```python
from openai import OpenAI

client = OpenAI(
    base_url="https://api.lazu.ai/v1",
    api_key="YOUR_LAZU_KEY",
)

response = client.chat.completions.create(
    model="gpt-4o-mini",
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Explain model routing in one paragraph."},
    ],
)

print(response.choices[0].message.content)
```

### JavaScript

```javascript
import OpenAI from "openai";

const client = new OpenAI({
  baseURL: "https://api.lazu.ai/v1",
  apiKey: process.env.LAZU_API_KEY,
});

const response = await client.chat.completions.create({
  model: "gpt-4o-mini",
  messages: [{ role: "user", content: "Hello from Lazu" }],
});

console.log(response.choices[0].message.content);
```

### cURL

```bash
curl https://api.lazu.ai/v1/chat/completions \
  -H "Authorization: Bearer YOUR_LAZU_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "gpt-4o-mini",
    "messages": [
      {"role": "user", "content": "Reply with: Lazu test successful"}
    ]
  }'
```

## Streaming

```python
stream = client.chat.completions.create(
    model="claude-sonnet-4-6",
    messages=[{"role": "user", "content": "Write a short deployment checklist."}],
    stream=True,
)

for chunk in stream:
    delta = chunk.choices[0].delta.content
    if delta:
        print(delta, end="", flush=True)
```

## Model discovery

```bash
curl https://api.lazu.ai/api/models/catalog \
  -H "Authorization: Bearer YOUR_LAZU_KEY"
```

> [tip]

  Use the catalog response to choose endpoint, modality and parameters. Do not
  hard-code a global model list in agents or SDK wrappers.

## Troubleshooting

Every response should include `x-request-id`. Use it in the console to trace model, upstream channel, status and token accounting.

# Code Examples (Simplified Chinese)

These examples use the hosted Lazu base URL. When self-hosting, replace it with your own API domain.

## OpenAI-compatible chat

### Python

```python
from openai import OpenAI

client = OpenAI(
    base_url="https://api.lazu.ai/v1",
    api_key="YOUR_LAZU_KEY",
)

response = client.chat.completions.create(
    model="gpt-4o-mini",
    messages=[
        {"role": "system", "content": "你是一个有帮助的助手。"},
        {"role": "user", "content": "用一段话解释模型路由。"},
    ],
)

print(response.choices[0].message.content)
```

### JavaScript

```javascript
import OpenAI from "openai";

const client = new OpenAI({
  baseURL: "https://api.lazu.ai/v1",
  apiKey: process.env.LAZU_API_KEY,
});

const response = await client.chat.completions.create({
  model: "gpt-4o-mini",
  messages: [{ role: "user", content: "Hello from Lazu" }],
});

console.log(response.choices[0].message.content);
```

## Streaming

```python
stream = client.chat.completions.create(
    model="claude-sonnet-4-6",
    messages=[{"role": "user", "content": "写一份简短部署 checklist。"}],
    stream=True,
)

for chunk in stream:
    delta = chunk.choices[0].delta.content
    if delta:
        print(delta, end="", flush=True)
```

## Model discovery

```bash
curl https://api.lazu.ai/api/models/catalog \
  -H "Authorization: Bearer YOUR_LAZU_KEY"
```

> [tip]

  Use the catalog response to choose the endpoint, modality, and parameters. Do not
  hard-code a global model list in agents or SDK wrappers.

# Code Examples (Traditional Chinese)

These examples use the hosted Lazu base URL. When self-hosting, replace it with your own API domain.

## OpenAI-compatible chat

### Python

```python
from openai import OpenAI

client = OpenAI(
    base_url="https://api.lazu.ai/v1",
    api_key="YOUR_LAZU_KEY",
)

response = client.chat.completions.create(
    model="gpt-4o-mini",
    messages=[
        {"role": "system", "content": "你是一個有幫助的助手。"},
        {"role": "user", "content": "用一段話解釋模型路由。"},
    ],
)

print(response.choices[0].message.content)
```

### JavaScript

```javascript
import OpenAI from "openai";

const client = new OpenAI({
  baseURL: "https://api.lazu.ai/v1",
  apiKey: process.env.LAZU_API_KEY,
});

const response = await client.chat.completions.create({
  model: "gpt-4o-mini",
  messages: [{ role: "user", content: "Hello from Lazu" }],
});

console.log(response.choices[0].message.content);
```

## Streaming

```python
stream = client.chat.completions.create(
    model="claude-sonnet-4-6",
    messages=[{"role": "user", "content": "寫一份簡短部署 checklist。"}],
    stream=True,
)

for chunk in stream:
    delta = chunk.choices[0].delta.content
    if delta:
        print(delta, end="", flush=True)
```

## Model discovery

```bash
curl https://api.lazu.ai/api/models/catalog \
  -H "Authorization: Bearer YOUR_LAZU_KEY"
```

> [tip]

  Use the catalog response to choose the endpoint, modality, and parameters. Do not
  hard-code a global model list in agents or SDK wrappers.

# Code Examples (Japanese)

These examples use the hosted Lazu base URL. For self-hosted deployments, replace it with your own API domain.

## OpenAI-compatible chat

### Python

```python
from openai import OpenAI

client = OpenAI(
    base_url="https://api.lazu.ai/v1",
    api_key="YOUR_LAZU_KEY",
)

response = client.chat.completions.create(
    model="gpt-4o-mini",
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Explain model routing in one paragraph."},
    ],
)

print(response.choices[0].message.content)
```

### JavaScript

```javascript
import OpenAI from "openai";

const client = new OpenAI({
  baseURL: "https://api.lazu.ai/v1",
  apiKey: process.env.LAZU_API_KEY,
});

const response = await client.chat.completions.create({
  model: "gpt-4o-mini",
  messages: [{ role: "user", content: "Hello from Lazu" }],
});

console.log(response.choices[0].message.content);
```

### cURL

```bash
curl https://api.lazu.ai/v1/chat/completions \
  -H "Authorization: Bearer YOUR_LAZU_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "gpt-4o-mini",
    "messages": [
      {"role": "user", "content": "Reply with: Lazu test successful"}
    ]
  }'
```

## Streaming

```python
stream = client.chat.completions.create(
    model="claude-sonnet-4-6",
    messages=[{"role": "user", "content": "Write a short deployment checklist."}],
    stream=True,
)

for chunk in stream:
    delta = chunk.choices[0].delta.content
    if delta:
        print(delta, end="", flush=True)
```

## Model discovery

```bash
curl https://api.lazu.ai/api/models/catalog \
  -H "Authorization: Bearer YOUR_LAZU_KEY"
```

> [tip]

  Use the catalog response to choose the endpoint, modality, and parameters. Do not
  hard-code a global model list in agents or SDK wrappers.

## Troubleshooting

Every response should include `x-request-id`. Use that ID in the console to trace the model, upstream channel, status, and token accounting.

# API Reference

Lazu API interactive OpenAPI reference.

- OpenAPI JSON: /openapi.json
- OpenAPI YAML: /openapi.yaml

# API Reference (Simplified Chinese)

Interactive OpenAPI reference for the Lazu API.

- OpenAPI JSON: /openapi.json
- OpenAPI YAML: /openapi.yaml

# API Reference (Traditional Chinese)

Interactive OpenAPI reference for the Lazu API.

- OpenAPI JSON: /openapi.json
- OpenAPI YAML: /openapi.yaml

# API Reference (Japanese)

Interactive OpenAPI reference for the Lazu API.

- OpenAPI JSON: /openapi.json
- OpenAPI YAML: /openapi.yaml

# Changelog

## 2026-05-12 — Static docs return to the VPS

- Restore Vocs as the active documentation app for `docs.lazu.ai`.
- Serve docs as a standalone static nginx container.
- Keep the product brand and API examples consistent across the website and docs.

## 2026-05-09 — OpenAPI reference cleanup

- Keep `packages/contracts/openapi/lazu-api-reference.v1.yaml` as the source of truth.
- Sync OpenAPI JSON/YAML into the docs public assets during build.
- Keep the interactive reference available for request and response field checks.

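# Changelog (Simplified Chinese)

## 2026-05-12: Static docs return to the VPS

- Restore Vocs as the active documentation site for `docs.lazu.ai`.
- Serve the docs from a standalone static nginx container.
- Unify product branding and API examples across the website and docs.

## 2026-05-09: OpenAPI reference cleanup

- `packages/contracts/openapi/lazu-api-reference.v1.yaml` remains the source of truth.
- Sync OpenAPI JSON/YAML into the docs public assets at build time.
- Keep the interactive reference for checking request and response fields.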

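# Changelog (Traditional Chinese)

## 2026-05-12: Static docs return to the VPS

- Restore Vocs as the active documentation site for `docs.lazu.ai`.
- Serve the docs from a standalone static nginx container.
- Unify product branding and API examples across the website and docs.

## 2026-05-09: OpenAPI reference cleanup

- `packages/contracts/openapi/lazu-api-reference.v1.yaml` remains the source of truth.
- Sync OpenAPI JSON/YAML into the docs public assets at build time.
- Keep the interactive reference for checking request and response fields.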

# Changelog (Japanese)

## 2026-05-12: Static docs return to the VPS

- Restore Vocs as the active documentation app for `docs.lazu.ai`.
- Serve the docs as a standalone static nginx container.
- Unify product branding and API examples across the website and docs.

## 2026-05-09: OpenAPI reference cleanup

- Keep `packages/contracts/openapi/lazu-api-reference.v1.yaml` as the single source of truth.
- Sync OpenAPI JSON/YAML into the docs public assets at build time.
- Keep the interactive reference for checking request and response fields.

# Authentication

Every Lazu request needs an API key. Create one in the
[console](https://lazu.ai/console/token). Keys look like `sk-lazu-…`.

## Headers by API style

| API style         | Header                                                      |
| ----------------- | ----------------------------------------------------------- |
| OpenAI-compatible | `Authorization: Bearer YOUR_API_KEY`                        |
| Anthropic native  | `x-api-key: YOUR_API_KEY` + `anthropic-version: 2023-06-01` |
| Gemini native     | `x-goog-api-key: YOUR_API_KEY` or query `?key=…`            |

All three are accepted by Lazu — pick whichever your client SDK already speaks.
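
As a rough illustration of the table above, the helper below maps an API style to the headers it expects. The style labels (`"openai"`, `"anthropic"`, `"gemini"`) and the function name are hypothetical, chosen only for this sketch; the header names and the `anthropic-version` value come from the table.

```python
def auth_headers(api_style: str, api_key: str) -> dict[str, str]:
    """Build the auth headers from the table above for one API style."""
    if api_style == "openai":
        return {"Authorization": f"Bearer {api_key}"}
    if api_style == "anthropic":
        return {"x-api-key": api_key, "anthropic-version": "2023-06-01"}
    if api_style == "gemini":
        # The query form (?key=...) is the alternative; this sketch uses the header.
        return {"x-goog-api-key": api_key}
    raise ValueError(f"unknown API style: {api_style!r}")


print(sorted(auth_headers("anthropic", "sk-lazu-example")))
```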

## Recommended setup

- **Use a dedicated key per app / environment.** A key compromised in CI
  should not also let attackers into prod.
- **Restrict the key.** Each key can be scoped to specific models, a quota cap,
  an expiration time, an IP allowlist, and a per-vendor lane preference. See
  [API keys settings](https://lazu.ai/console/token).
- **Never commit keys to git.** Use environment variables, secret managers,
  or vendor-specific secret stores.

## Storing the key

Shell:

```bash
export LAZU_API_KEY=sk-lazu-...
```

`.env` file (gitignored):

```
LAZU_API_KEY=sk-lazu-...
```

Python / Node SDK code: read from env, do not hardcode.

```python
import os

api_key = os.environ["LAZU_API_KEY"]
```

```ts
const apiKey = process.env.LAZU_API_KEY!;
```

## Verifying

```bash
curl https://api.lazu.ai/v1/models \
  -H "Authorization: Bearer $LAZU_API_KEY"
```

A 200 response with a JSON model list means the key is live. A 401 with
`invalid_api_key` means it's been disabled, expired or never existed —
check the console.

## Rotation

If a key leaks: open the console, **delete** the compromised key, create a
fresh one, and update your deployments. There is no "rotate in place": the old
key stops working the moment you delete it.

# Authentication (Simplified Chinese)

> [info]

  The Chinese translation is in progress. For the full content, see
  [English](/authentication).

# Authentication (Traditional Chinese)

> [info]

  The Traditional Chinese translation is in progress. For the full content, see
  [English](/authentication).

# Authentication (Japanese)

> [info]

  The Japanese version is being translated. For the latest content, see the
  [English version](/authentication).

# How pricing works

Lazu's billing is **prepaid** and **per-token** (or per-call for certain
image/audio models). You top up balance, every API call deducts at the
model's lane price, and your dashboard shows usage in real time.

## The basic formula

```
final_charge_microUSD =
    (input_tokens  × input_price_per_mtok)
  + (output_tokens × output_price_per_mtok)
  + (cache_read_tokens × cache_read_price_per_mtok)
  + (audio_tokens × audio_price_per_mtok)
  + (image_input_tokens × image_input_price_per_mtok)
  + (per_call_charge if any)
```

All amounts are tracked internally in **microUSD** (1 USD = 1,000,000 microUSD), so
charges stay exact even when a token costs a fraction of a cent. Your dashboard displays USD.

Prices come from `model_sell_prices[model_name, channel_group]` — see
[Pricing & lanes](/models/pricing) for the per-lane breakdown.
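
A worked example may make the formula concrete. The sketch below assumes each `*_price_per_mtok` is quoted in USD per million tokens, in which case `tokens × price` is already a microUSD amount; that unit, the function name, and the prices are assumptions for illustration, not values from the Lazu price list.

```python
from fractions import Fraction


def charge_micro_usd(usage: dict[str, int], prices: dict[str, str], per_call: int = 0) -> Fraction:
    """Sum tokens x price for each token kind, plus any per-call charge.

    Assumes prices are USD per million tokens, so tokens x price is
    already a microUSD amount (1 USD = 1,000,000 microUSD).
    """
    total = Fraction(per_call)
    for kind, tokens in usage.items():
        # Fraction parses decimal strings exactly, avoiding float rounding.
        total += tokens * Fraction(prices[kind])
    return total


# Made-up prices for illustration only; real prices come from the catalog.
prices = {"input": "0.15", "output": "0.60"}
usage = {"input": 1200, "output": 350}

micro = charge_micro_usd(usage, prices)
print(micro, "microUSD =", float(micro) / 1_000_000, "USD")
```

With these made-up numbers the input side contributes 1200 × 0.15 = 180 microUSD and the output side 350 × 0.60 = 210 microUSD, for 390 microUSD in total.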

## Refund rules

| Upstream outcome                                     | What you pay                                                  |
| ---------------------------------------------------- | ------------------------------------------------------------- |
| `2xx` success                                        | Full price by upstream-reported usage                         |
| Streamed, then disconnect, **with** usage in trailer | Pay for the tokens actually streamed                          |
| Streamed, then disconnect, **no** usage trailer      | **Refund** — you pay 0                                        |
| `5xx` / timeout / network error                      | **Refund** — you pay 0                                        |
| `4xx` (content policy / bad request)                 | **Refund** — you pay 0 even if upstream charged us internally |
| Per-call image / audio model error                   | Charged at full per-call rate (see "Known edge case" below)   |

In practice this means: if Lazu returned `200` to you, you pay; if Lazu
returned `4xx`, `5xx` or a timeout, you don't.

> [warning]

  **Known edge case**: a small number of image-generation and audio models
  charge per-call (not per-token). If upstream fails mid-generation, you may
  still be charged the full per-call amount. We're working on this; for now,
  treat per-call models as best-effort billed.
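
The refund table can be read as a small decision function. The sketch below is one interpretation of those rows, not Lazu's billing code; the function and parameter names are hypothetical, and the per-call branch simplifies "may still be charged" to "charged".

```python
from typing import Optional


def is_billed(
    status: Optional[int],
    *,
    disconnected: bool = False,
    usage_reported: bool = True,
    per_call_model: bool = False,
) -> bool:
    """Return True when a call is charged under the refund table.

    status is the upstream HTTP status, or None for a timeout or
    network error.
    """
    if per_call_model:
        # Known edge case: per-call models may be charged even on failure.
        return True
    if status is None or status >= 400:
        return False  # timeout, network error, 4xx, 5xx: refunded
    if disconnected and not usage_reported:
        return False  # stream cut off with no usage trailer: refunded
    return True


print(is_billed(200), is_billed(500), is_billed(None))
```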

## Top-up

Top up with a card via Stripe in the
[console](https://lazu.ai/console/topup). Funds appear instantly in your
balance.

- Minimum top-up: $5
- No expiration on credits
- Refunds: open a ticket within 7 days for failed-but-charged calls

Funded accounts are **verified** and immediately move from `Unverified`
(5 RPM cap) to the tier matching their lifetime top-up total. See
[Rate limits](/limits).

## Free credits

New accounts get a small free trial credit ($X, see console for current
amount). This is enough for roughly 100 basic chat calls. Free credit:

- Counts as balance — you can use it on any model in any lane
- Does **not** verify the account — to escape the 5 RPM cap, complete a
  real top-up
- Does not expire, but if you cap out without topping up, the account stays
  rate-limited

## Where to see usage

- **Console → Usage**: per-day, per-model, per-key breakdown
- **Console → Billing**: invoices, top-up history, current balance
- **API**: `GET /api/usage/...` (see [API reference](/api-reference))

## What's NOT layered on top

Lazu's bill is **just** input × price + output × price (etc., per the
formula above). There is no:

- "Premium tier discount" stacking on top of lane price
- "Loyalty multiplier" that reduces price over time
- Hidden margin per cache read or per audio token beyond the listed
  per-mtok rate
- Surcharge on weekends, regions, or model size

If you see a charge that doesn't match `tokens × listed_price`, that's a
bug — open a ticket.

## Streaming partial usage

When you call with `stream: true`:

- Charges are deducted from your balance in real time as tokens are generated.
- If the client disconnects mid-stream, Lazu still bills for what was
  delivered (provided upstream reported it in a final `usage` trailer).
- If upstream errors before any tokens reach you, you're refunded.

This means an aborted stream of 1,000 tokens after the user clicked
"Cancel" still costs roughly 1,000 × output_price. The model already did
the work; the client just stopped reading.

## Enterprise / volume contracts

For workloads sustained above $1,000/month, contact sales via
[lazu.ai](https://lazu.ai/). Volume terms are negotiated case by case.

## See also

- [Pricing & lanes](/models/pricing)
- [Rate limits](/limits)
- [Errors](/errors)

# How pricing works (Simplified Chinese)

> [info]

  The Chinese translation is in progress. For the full content, see
  [English](/billing).

# How pricing works (Traditional Chinese)

> [info]

  The Traditional Chinese translation is in progress. For the full content, see
  [English](/billing).

# How pricing works (Japanese)

> [info]

  The Japanese version is being translated. For the latest content, see the
  [English version](/billing).

# Chat completions

`POST /v1/chat/completions` — drop-in compatible with OpenAI's chat
completions API. Send the same request body, get the same response shape.

## Basic call

```bash
curl https://api.lazu.ai/v1/chat/completions \
  -H "Authorization: Bearer $LAZU_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "gpt-4o-mini",
    "messages": [
      {"role": "system", "content": "You are a helpful assistant."},
      {"role": "user", "content": "Tell me a joke about Postgres."}
    ]
  }'
```

```python
from openai import OpenAI
client = OpenAI(base_url="https://api.lazu.ai/v1", api_key="...")

resp = client.chat.completions.create(
    model="gpt-4o-mini",
    messages=[
        {"role": "system", "content": "You are concise."},
        {"role": "user", "content": "What's 2+2?"},
    ],
)
print(resp.choices[0].message.content)
```

## Streaming

Add `"stream": true`. The response is `text/event-stream` SSE.

```python
stream = client.chat.completions.create(
    model="gpt-4o-mini",
    messages=[{"role": "user", "content": "Count to 5"}],
    stream=True,
)
for chunk in stream:
    print(chunk.choices[0].delta.content or "", end="", flush=True)
```
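
If you also want token counts from a stream, OpenAI's API accepts `stream_options={"include_usage": True}`, which adds a final chunk whose `choices` list is empty and whose `usage` is set. Whether every upstream behind Lazu emits that chunk is an assumption here. The sketch below uses stand-in objects instead of a live call, and guards against the empty `choices` list, which would otherwise raise `IndexError` on `chunk.choices[0]`.

```python
from types import SimpleNamespace as NS


def collect_stream(chunks):
    """Join streamed text and return it with the usage object, if any."""
    parts, usage = [], None
    for chunk in chunks:
        if getattr(chunk, "usage", None) is not None:
            usage = chunk.usage
        if chunk.choices:  # the usage-only chunk has an empty choices list
            parts.append(chunk.choices[0].delta.content or "")
    return "".join(parts), usage


# Stand-in chunks shaped like the SDK's objects, so the sketch runs offline.
fake_stream = [
    NS(choices=[NS(delta=NS(content="1 2 "))], usage=None),
    NS(choices=[NS(delta=NS(content="3"))], usage=None),
    NS(choices=[], usage=NS(prompt_tokens=9, completion_tokens=3)),
]

text, usage = collect_stream(fake_stream)
print(text, usage.completion_tokens)
```

With a real client, the same function can consume the iterator returned by `client.chat.completions.create(..., stream=True, stream_options={"include_usage": True})`.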

## Function / tool calling

Pass a `tools` array; the response will include `tool_calls` when the model
decides to invoke one.

```json
{
  "model": "gpt-4o",
  "messages": [...],
  "tools": [{
    "type": "function",
    "function": {
      "name": "get_weather",
      "parameters": { "type": "object", "properties": { ... } }
    }
  }]
}
```

The model catalog's `parameters.tools` field tells you which models support
this — see [Catalog](/models/catalog).
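
On the client side, each returned tool call carries a function name and a JSON string of arguments, and the result goes back as a `tool` message tied to the call id. The sketch below shows that round trip in isolation; the `city` parameter and the stub `get_weather` body are hypothetical, since the schema above elides the tool's properties.

```python
import json


def get_weather(city: str) -> str:
    """Hypothetical local tool; a real one would call a weather service."""
    return f"Weather lookup for {city} is not implemented in this sketch."


TOOLS = {"get_weather": get_weather}


def run_tool_call(call_id: str, name: str, arguments: str) -> dict:
    """Execute one tool call and build the `tool` message to send back."""
    args = json.loads(arguments)  # arguments arrive as a JSON string
    result = TOOLS[name](**args)
    return {"role": "tool", "tool_call_id": call_id, "content": result}


msg = run_tool_call("call_123", "get_weather", '{"city": "Oslo"}')
print(msg)
```

In a real loop the three inputs would come from `tool_call.id`, `tool_call.function.name`, and `tool_call.function.arguments` on the assistant message.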

## Vision (image input)

For `chat/completions`, embed images inline (data URL or HTTPS URL):

```json
{
  "model": "gpt-4o",
  "messages": [
    {
      "role": "user",
      "content": [
        { "type": "text", "text": "What's in this image?" },
        {
          "type": "image_url",
          "image_url": { "url": "data:image/png;base64,..." }
        }
      ]
    }
  ]
}
```

For PDFs / large documents, use the [Files API](/endpoints/files) +
[Responses](/endpoints/responses) instead. Chat completions does **not**
automatically dereference `file_id` references.
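
Building the inline data URL is mostly base64 plumbing. The helper below is a minimal sketch that produces a message in the same shape as the JSON above; the function name is hypothetical and the four placeholder bytes stand in for a real image file.

```python
import base64


def image_message(text: str, image_bytes: bytes, mime: str = "image/png") -> dict:
    """Build a user message with inline image data, matching the JSON above."""
    encoded = base64.b64encode(image_bytes).decode("ascii")
    return {
        "role": "user",
        "content": [
            {"type": "text", "text": text},
            {"type": "image_url", "image_url": {"url": f"data:{mime};base64,{encoded}"}},
        ],
    }


# Placeholder bytes; in practice read them with open(path, "rb").read().
message = image_message("What's in this image?", b"\x89PNG")
print(message["content"][1]["image_url"]["url"])
```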

## Supported models

Anything where the catalog lists `chat` in `supported_endpoint_types`:

- OpenAI: `gpt-4o`, `gpt-4o-mini`, `gpt-4.1`, `o1`, `o3-mini`, `gpt-5` family
- Anthropic: `claude-haiku-4-5-*`, `claude-sonnet-4-5-*`, `claude-opus-*`
- Google: `gemini-2.5-flash`, `gemini-2.5-pro`
- DeepSeek: `deepseek-chat`, `deepseek-v3`, `deepseek-reasoner`
- Moonshot: `kimi-k2`, `kimi-k2-thinking`
- xAI: `grok-4-*`
- Many more — read the catalog at runtime.

## Request body fields

Standard OpenAI fields all work: `model`, `messages`, `max_tokens`,
`temperature`, `top_p`, `presence_penalty`, `frequency_penalty`,
`stop`, `n`, `tools`, `tool_choice`, `response_format`, `seed`, `user`,
`stream_options`.

See the [API reference](/api-reference) for the full schema.

## See also

- [Responses API](/endpoints/responses) — newer OpenAI endpoint with
  built-in `file_id` references
- [Pricing & lanes](/models/pricing) — direct vs cheap routing
- [Errors](/errors)

# Chat completions (Simplified Chinese)

> [info]

  The Chinese translation is in progress. For the full content, see
  [English](/endpoints/chat).

# Chat completions (Traditional Chinese)

> [info]

  The Traditional Chinese translation is in progress. For the full content, see
  [English](/endpoints/chat).

# Chat completions (Japanese)

> [info]

  The Japanese version is being translated. For the latest content, see the
  [English version](/endpoints/chat).

# Embeddings

`POST /v1/embeddings` — OpenAI-compatible embeddings endpoint.

## Basic call

```bash
curl https://api.lazu.ai/v1/embeddings \
  -H "Authorization: Bearer $LAZU_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "BAAI/bge-m3",
    "input": "Hello world"
  }'
```

Response:

```json
{
  "object": "list",
  "data": [
    {
      "object": "embedding",
      "embedding": [-0.032, 0.023, -0.041, ...],
      "index": 0
    }
  ],
  "model": "BAAI/bge-m3",
  "usage": { "prompt_tokens": 4, "total_tokens": 4 }
}
```

## Python (OpenAI SDK)

```python
from openai import OpenAI
client = OpenAI(base_url="https://api.lazu.ai/v1", api_key="...")

resp = client.embeddings.create(
    model="text-embedding-3-small",
    input=["doc one", "doc two"],
)
for item in resp.data:
    print(item.embedding[:5], "...")
```

## Batch inputs

Pass an array of strings; the response `data` array preserves order.

```json
{
  "model": "BAAI/bge-m3",
  "input": ["first doc", "second doc", "third doc"]
}
```

For very large batches, chunk client-side: keep each request body to a
reasonable size (typically under 1 MB).
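
One way to chunk client-side is to group inputs by their encoded size. The sketch below is an approximation: it budgets only for the `input` strings, not the rest of the request body, and the 1,000,000-byte default simply mirrors the "typically 1 MB" guidance rather than a documented hard limit.

```python
import json
from typing import Iterator


def batches(texts: list[str], max_bytes: int = 1_000_000) -> Iterator[list[str]]:
    """Yield runs of texts whose JSON-encoded size stays within max_bytes.

    A single text larger than max_bytes is still yielded alone.
    """
    batch: list[str] = []
    size = 0
    for text in texts:
        cost = len(json.dumps(text).encode("utf-8")) + 1  # +1 for the comma
        if batch and size + cost > max_bytes:
            yield batch
            batch, size = [], 0
        batch.append(text)
        size += cost
    if batch:
        yield batch


print(list(batches(["aa", "bb", "cc"], max_bytes=10)))
```

Because each batch keeps the original order, concatenating the `data` arrays of the responses in sequence lines the embeddings up with the input list.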

## Available models

Common embedding models on Lazu:

| Model                    | Dim  | Notes                                       |
| ------------------------ | ---- | ------------------------------------------- |
| `BAAI/bge-m3`            | 1024 | Multilingual, runs on Cloudflare Workers AI |
| `text-embedding-3-small` | 1536 | OpenAI's cheap default                      |
| `text-embedding-3-large` | 3072 | OpenAI's high-quality                       |
| `text-embedding-ada-002` | 1536 | Older OpenAI; for legacy compat             |
| `gemini-embedding-001`   | 768  | Google's default                            |

Read the [catalog](/models/catalog) at runtime — `supported_endpoint_types`
contains `embeddings` for any model you can call here.

## Dimensions

For models that support truncation (OpenAI `text-embedding-3-*`), you can
request fewer dimensions:

```json
{
  "model": "text-embedding-3-large",
  "input": "...",
  "dimensions": 256
}
```

Fewer dimensions mean smaller vectors and lower storage / search cost, at slightly
lower quality. Not all models support this, so check the catalog.

## See also

- [Pricing & lanes](/models/pricing) — embeddings are typically direct-only
- [Chat completions](/endpoints/chat)
- [Errors](/errors)

# Embeddings (Simplified Chinese)

> [info]

  The Chinese translation is in progress. For the full content, see
  [English](/endpoints/embeddings).

# Embeddings (Traditional Chinese)

> [info]

  The Traditional Chinese translation is in progress. For the full content, see
  [English](/endpoints/embeddings).

# Embeddings (Japanese)

> [info]

  The Japanese version is being translated. For the latest content, see the
  [English version](/endpoints/embeddings).

# Files API

OpenAI-compatible file storage. Upload once, reference by `file_id` from the
[Responses API](/endpoints/responses). PDFs, plain text, and images are accepted,
up to 512 MB for `user_data` and 20 MB for `vision`.

## Endpoints

| Method | Path                    | Purpose        |
| ------ | ----------------------- | -------------- |
| POST   | `/v1/files`             | Upload         |
| GET    | `/v1/files`             | List           |
| GET    | `/v1/files/:id`         | Metadata       |
| GET    | `/v1/files/:id/content` | Download bytes |
| DELETE | `/v1/files/:id`         | Delete         |

All authenticated via the standard `Authorization: Bearer $LAZU_API_KEY`
header.

## Supported purposes

Every uploaded file must declare a `purpose`:

| Purpose     | Max size | Allowed mime types                                   | What it's for                                   |
| ----------- | -------- | ---------------------------------------------------- | ----------------------------------------------- |
| `user_data` | 512 MB   | Any non-executable                                   | PDFs, text, structured data passed to the model |
| `vision`    | 20 MB    | `image/png`, `image/jpeg`, `image/gif`, `image/webp` | Image inputs                                    |

OpenAI's `batch`, `fine-tune` and `assistants` purposes are **not yet
supported** — contact support if you need them.

Executable file extensions (`.exe`, `.dll`, `.sh`, `.bat`, …) are rejected
regardless of purpose.
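
A client-side pre-check can catch most rejections before any bytes are sent. The sketch below encodes only the rules stated in this section; treating "MB" as binary megabytes is an assumption, and the blocked-extension set holds just the four extensions named above, so the server-side list is longer.

```python
from pathlib import Path

MB = 1024 * 1024  # assumption: the limits are binary megabytes
MAX_BYTES = {"user_data": 512 * MB, "vision": 20 * MB}
VISION_MIME = {"image/png", "image/jpeg", "image/gif", "image/webp"}
# Only the extensions named above; the server-side list is longer.
BLOCKED_EXT = {".exe", ".dll", ".sh", ".bat"}


def check_upload(filename: str, size: int, mime: str, purpose: str) -> list[str]:
    """Return a list of problems; an empty list means the upload looks valid."""
    if purpose not in MAX_BYTES:
        return [f"unsupported purpose: {purpose}"]
    problems = []
    if size > MAX_BYTES[purpose]:
        problems.append("file too large for purpose")
    if Path(filename).suffix.lower() in BLOCKED_EXT:
        problems.append("executable extension")
    if purpose == "vision" and mime not in VISION_MIME:
        problems.append("mime type not allowed for vision")
    return problems


print(check_upload("paper.pdf", 1609275, "application/pdf", "user_data"))
```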

## 1. Upload

### cURL

```bash
curl https://api.lazu.ai/v1/files \
  -H "Authorization: Bearer $LAZU_API_KEY" \
  -F purpose=user_data \
  -F file=@./paper.pdf
```

Response:

```json
{
  "id": "file-lazu-01KSBV4MC6THZ9TCZEM38KPYRX",
  "object": "file",
  "bytes": 1609275,
  "created_at": 1779587764,
  "filename": "paper.pdf",
  "purpose": "user_data",
  "status": "processed"
}
```

> [warning]

  Lazu returns **HTTP 201 Created** on successful upload (OpenAI returns 200).
  If your client SDK only accepts 200, treat 201 as success too.

### Python (OpenAI SDK)

```python
from openai import OpenAI

client = OpenAI(
    base_url="https://api.lazu.ai/v1",
    api_key="YOUR_LAZU_KEY",
)

with open("paper.pdf", "rb") as f:
    file = client.files.create(file=f, purpose="user_data")

print(file.id)
```

## 2. Reference from the Responses API

```bash
curl https://api.lazu.ai/v1/responses \
  -H "Authorization: Bearer $LAZU_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "gpt-4o",
    "input": [{
      "role": "user",
      "content": [
        {"type": "input_text", "text": "Summarize this document"},
        {"type": "input_file", "file_id": "file-lazu-01KSBV4MC6THZ9TCZEM38KPYRX"}
      ]
    }]
  }'
```

For images, use `"type": "input_image"` instead of `input_file`.

> [info]

  Lazu only dereferences `file_id` on the **Responses API** (`/v1/responses`).
  `/v1/chat/completions` does **not** automatically pull the file contents — for
  chat completions, fetch the file content yourself and embed it inline (e.g. as
  base64 in an `image_url`).
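
For the chat-completions path, the inline embedding could look roughly like this. It is a sketch under assumptions: `image_part` is a hypothetical helper, and the content-part shape follows the OpenAI chat `image_url` convention with a base64 data URL.

```python
import base64


def image_part(image_bytes: bytes, mime_type: str = "image/png") -> dict:
    """Build a chat-completions content part that carries the image inline."""
    encoded = base64.b64encode(image_bytes).decode("ascii")
    return {
        "type": "image_url",
        "image_url": {"url": f"data:{mime_type};base64,{encoded}"},
    }


# The bytes would normally come from GET /v1/files/:id/content or a local file;
# a placeholder is used here so the example runs offline.
message = {
    "role": "user",
    "content": [
        {"type": "text", "text": "Describe this image"},
        image_part(b"\x89PNG placeholder bytes"),
    ],
}
```

The resulting `message` goes into the `messages` array of a `/v1/chat/completions` request.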

## 3. List, retrieve, delete

```bash
# List all files for the current key
curl https://api.lazu.ai/v1/files \
  -H "Authorization: Bearer $LAZU_API_KEY"

# Metadata for one file
curl https://api.lazu.ai/v1/files/file-lazu-... \
  -H "Authorization: Bearer $LAZU_API_KEY"

# Download bytes
curl https://api.lazu.ai/v1/files/file-lazu-.../content \
  -H "Authorization: Bearer $LAZU_API_KEY" \
  -o downloaded.pdf

# Delete
curl -X DELETE https://api.lazu.ai/v1/files/file-lazu-... \
  -H "Authorization: Bearer $LAZU_API_KEY"
```

## Retention & cleanup

- **Files are kept for 30 days** from upload.
- After 30 days, a daily cleanup job (runs around 04:30 Asia/Shanghai) deletes
  the file from storage and the database.
- You cannot extend retention from the public API. Re-upload if you need it
  to live longer.
- Deleted files are gone — there is no soft delete / recycle bin.
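
Because retention cannot be extended, a client that needs a file for longer has to track the cutoff itself. The sketch below derives it from the `created_at` field of the upload response; `earliest_deletion` and `needs_reupload` are hypothetical helpers, and the one-day safety margin is an assumption. Actual removal happens in the next cleanup run, so a file may survive slightly past this time.

```python
from datetime import datetime, timedelta, timezone

RETENTION = timedelta(days=30)


def earliest_deletion(created_at: int) -> datetime:
    """created_at is the Unix timestamp returned by the upload call."""
    uploaded = datetime.fromtimestamp(created_at, tz=timezone.utc)
    return uploaded + RETENTION


def needs_reupload(created_at: int, now: datetime,
                   margin: timedelta = timedelta(days=1)) -> bool:
    """True once the file is within `margin` of its 30-day cutoff."""
    return now >= earliest_deletion(created_at) - margin
```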

## Limits

| Limit                                          | Value  |
| ---------------------------------------------- | ------ |
| Single file (`user_data`)                      | 512 MB |
| Single file (`vision`)                         | 20 MB  |
| Total bytes dereferenced in one Responses call | 64 MB  |

If a Responses call references too many files (or files too large) and the
total exceeds 64 MB, Lazu returns `400` with code `file_too_large`.

## Errors

| HTTP | Code                         | Meaning                                            |
| ---- | ---------------------------- | -------------------------------------------------- |
| 400  | `missing_required_parameter` | Forgot `purpose`                                   |
| 400  | `purpose_not_supported`      | Used `batch` / `fine-tune` / `assistants`          |
| 400  | `invalid_request`            | Malformed multipart body / empty file              |
| 401  | `invalid_api_key`            | Auth header missing or wrong                       |
| 404  | `file_not_found`             | `file_id` doesn't exist or isn't owned by this key |
| 413  | `file_too_large`             | Single file exceeds purpose's max size             |

See [Errors](/errors) for the full table.


# Models list


`GET /v1/models` — flat OpenAI-compatible list of accessible models for the
current API key.

## Call

```bash
curl https://api.lazu.ai/v1/models \
  -H "Authorization: Bearer $LAZU_API_KEY"
```

Response (abridged):

```json
{
  "object": "list",
  "data": [
    {
      "id": "gpt-4o-mini",
      "object": "model",
      "created": 1721347200,
      "owned_by": "openai"
    },
    {
      "id": "claude-haiku-4-5-20251001",
      "object": "model",
      "created": 1730851200,
      "owned_by": "anthropic"
    },
    ...
  ]
}
```

## When to use `/v1/models` vs `/api/models/catalog`

| Endpoint              | Use when                                                                         |
| --------------------- | -------------------------------------------------------------------------------- |
| `/v1/models`          | Your client SDK uses OpenAI's `models.list()` interface                          |
| `/api/models/catalog` | You need richer metadata: pricing per lane, modality, parameters, context length |

For agents and any code that picks a model dynamically, prefer
`/api/models/catalog` — it has the data you actually need to pick well.

## Python (OpenAI SDK)

```python
from openai import OpenAI
client = OpenAI(base_url="https://api.lazu.ai/v1", api_key="...")

for m in client.models.list().data:
    print(m.id, m.owned_by)
```

## What gets filtered

A model appears in this list only if:

1. It's enabled (`status=1`) on at least one Lazu channel
2. The current API key has access (no per-key model restriction excluding it)

If a model exists in the system but your key can't use it, it won't appear.

## See also

- [Catalog](/models/catalog) — richer schema
- [Chat completions](/endpoints/chat)


# Responses API


`POST /v1/responses` — OpenAI's newer endpoint that unifies chat,
reasoning and multimodal inputs. **The one Lazu endpoint that dereferences
`file_id`** automatically.

## When to use Responses vs Chat

| Need                                      | Use                                       |
| ----------------------------------------- | ----------------------------------------- |
| Quick chat, function calling              | [`/v1/chat/completions`](/endpoints/chat) |
| PDF / document with `file_id`             | `/v1/responses`                           |
| Image with `file_id` (uploaded via Files) | `/v1/responses`                           |
| Reasoning models (`o1`, `o3`, `gpt-5`)    | `/v1/responses` (preferred)               |

## Basic call

```bash
curl https://api.lazu.ai/v1/responses \
  -H "Authorization: Bearer $LAZU_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "gpt-4o",
    "input": [{
      "role": "user",
      "content": [
        {"type": "input_text", "text": "Say hi"}
      ]
    }]
  }'
```

Response (abridged):

```json
{
  "id": "resp_...",
  "object": "response",
  "output": [
    {
      "type": "message",
      "role": "assistant",
      "content": [{ "type": "output_text", "text": "Hi! How can I help?" }]
    }
  ],
  "usage": {
    "input_tokens": 10,
    "output_tokens": 8,
    "total_tokens": 18
  }
}
```

## With a file

Upload via the [Files API](/endpoints/files), then reference:

```json
{
  "model": "gpt-4o",
  "input": [
    {
      "role": "user",
      "content": [
        { "type": "input_text", "text": "Summarize this PDF" },
        { "type": "input_file", "file_id": "file-lazu-..." }
      ]
    }
  ]
}
```

For images uploaded with `purpose=vision`, use `"type": "input_image"`
instead of `input_file`.

> [info]

  Lazu pulls the file from storage and inlines it server-side before forwarding
  to the upstream provider. You'll see `X-Lazu-File-Dereference: 1` on the
  response.
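
The request body above can be assembled programmatically. This is a minimal sketch: `file_input` is a hypothetical helper, and the `file_id` is the same elided placeholder used in the JSON example.

```python
import json


def file_input(prompt: str, file_id: str, is_image: bool = False) -> list[dict]:
    """Build the `input` array for /v1/responses with one referenced file."""
    # Images uploaded with purpose=vision use input_image; everything else input_file.
    part_type = "input_image" if is_image else "input_file"
    return [
        {
            "role": "user",
            "content": [
                {"type": "input_text", "text": prompt},
                {"type": part_type, "file_id": file_id},
            ],
        }
    ]


body = {"model": "gpt-4o", "input": file_input("Summarize this PDF", "file-lazu-...")}
print(json.dumps(body, indent=2))
```

With an OpenAI Python SDK version that includes the Responses API, the same list can be passed as the `input` argument of `client.responses.create(...)`.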

## Reasoning models

The `o1`, `o3`, and `gpt-5` families use Responses natively, with
reasoning-effort controls:

```json
{
  "model": "o3-mini",
  "reasoning": {"effort": "medium"},
  "input": [...]
}
```

## Tool calling

Same shape as chat completions — pass `tools`, get `tool_calls` back.
Reasoning models can invoke tools mid-reasoning.

## Limits

- Single `file_id` content ≤ Files API purpose limit (512 MB user_data,
  20 MB vision)
- Total dereferenced files in one Responses call ≤ **64 MB**

If you reference too many large files at once, Lazu returns 400 with
`file_too_large`.

## See also

- [Files API](/endpoints/files)
- [Chat completions](/endpoints/chat)
- [Examples](/examples)


# Errors


Every Lazu error follows the OpenAI error envelope:

```json
{
  "error": {
    "message": "Human-readable description",
    "type": "invalid_request_error",
    "code": "missing_required_parameter",
    "param": "purpose",
    "request_id": "req_lazu_01ABCDEF..."
  }
}
```

`code` is the stable machine-readable identifier — match on this, not on
`message`. `request_id` is also returned as the `X-Lazu-Request-Id` HTTP
header; include it when contacting support.
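
A small parser keeps error handling keyed on `code` rather than `message`. The sketch below is illustrative: `LazuError` and `parse_error` are hypothetical names, and the header lookup assumes a plain dict keyed by the exact header spelling shown above.

```python
import json
from typing import NamedTuple, Optional


class LazuError(NamedTuple):
    code: Optional[str]
    message: str
    param: Optional[str]
    request_id: Optional[str]


def parse_error(body: str, headers: Optional[dict] = None) -> LazuError:
    """Extract the stable fields from an error response body."""
    err = json.loads(body).get("error", {})
    headers = headers or {}
    return LazuError(
        code=err.get("code"),
        message=err.get("message", ""),
        param=err.get("param"),
        # Fall back to the header when the body carries no request_id.
        request_id=err.get("request_id") or headers.get("X-Lazu-Request-Id"),
    )
```

Branch on `err.code` in application logic and log `err.request_id` for support tickets.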

## HTTP status codes at a glance

| Status | Category              | What to do                                                    |
| ------ | --------------------- | ------------------------------------------------------------- |
| 200    | Success               | -                                                             |
| 201    | Created (file upload) | Same as 200; some clients reject 201 — patch them             |
| 400    | Bad request           | Fix the request body / params; safe to retry after fix        |
| 401    | Auth failed           | Wrong / missing API key; do **not** retry without fixing key  |
| 403    | Forbidden             | Key disabled, IP not allow-listed, or quota exhausted         |
| 404    | Not found             | Wrong path, missing model, deleted file_id                    |
| 413    | Payload too large     | Single file exceeds purpose's max size                        |
| 422    | Validation failed     | Valid schema, invalid values (e.g. temperature out of range)  |
| 429    | Rate limit            | Sleep `Retry-After` seconds, then retry                       |
| 500    | Lazu internal error   | Retry with backoff; if persistent, open a ticket              |
| 502    | Upstream bad gateway  | Retry; Lazu will try another channel automatically once       |
| 503    | Upstream unavailable  | Retry with backoff; usually transient                         |
| 504    | Upstream timeout      | Retry with backoff; check if your request is too slow         |

## Common error codes

### Auth

| Code                 | When                                                    |
| -------------------- | ------------------------------------------------------- |
| `invalid_api_key`    | Header missing, malformed, or key deleted               |
| `insufficient_quota` | Balance hit zero — top up to continue                   |
| `ip_not_allowed`     | Key has IP allowlist, your request came from outside it |
| `token_disabled`     | Key was manually disabled in console                    |
| `token_expired`      | Key passed its `expired_time`                           |

### Request shape

| Code                         | When                                                                |
| ---------------------------- | ------------------------------------------------------------------- |
| `bad_request_body`           | JSON didn't parse, or upstream rejected the request as malformed    |
| `missing_required_parameter` | Required field absent (`param` shows which one)                     |
| `invalid_request`            | Generic catch-all for malformed inputs                              |
| `model_not_found`            | The model name doesn't exist or isn't accessible to this key        |
| `convert_request_failed`     | Internal: couldn't translate the request for the chosen vendor      |
| `tokenization_error`         | Couldn't count input tokens (rare; usually huge / malformed prompt) |

### Files API specific

| Code                      | When                                                                              |
| ------------------------- | --------------------------------------------------------------------------------- |
| `purpose_not_supported`   | Used `batch` / `fine-tune` / `assistants` (not supported yet)                     |
| `file_not_found`          | `file_id` doesn't exist or wasn't uploaded by this key                            |
| `file_expired`            | File aged past its 30-day retention                                               |
| `file_too_large`          | Single file over purpose's max size, or Responses dereference total exceeds 64 MB |
| `file_dereference_failed` | Couldn't pull the file from storage (very rare)                                   |

### Pricing & quota

| Code                             | When                                                     |
| -------------------------------- | -------------------------------------------------------- |
| `pricing_not_configured`         | Model exists but no sell-price set in the requested lane |
| `pre_consume_token_quota_failed` | Token quota tracking conflict (retry once)               |

### Rate limits

| Code                          | When                                                        |
| ----------------------------- | ----------------------------------------------------------- |
| `request_rate_limit_exceeded` | Hit your tier's RPM cap. `Retry-After` header shows seconds |

### Upstream / routing

| Code                             | When                                                              |
| -------------------------------- | ----------------------------------------------------------------- |
| `upstream_error`                 | Upstream returned non-2xx. `details.status_code` has the original |
| `upstream_timeout`               | Upstream didn't respond in time                                   |
| `upstream_network_error`         | Network failure connecting to upstream                            |
| `channel_response_time_exceeded` | Channel exceeded the per-channel timeout budget                   |
| `no_available_channel`           | No channel is configured to serve the requested model+lane combo  |

## Retry strategy

As a rule of thumb, retry only **transient** failures, using exponential backoff:

| Status / code                        | Retry?                         |
| ------------------------------------ | ------------------------------ |
| `429 request_rate_limit_exceeded`    | Yes — honor `Retry-After`      |
| `500` / `502` / `503` / `504`        | Yes — exp backoff, max 3 tries |
| `400` / `404` / `422`                | No — fix the request           |
| `401` / `403`                        | No — fix auth / quota first    |
| `upstream_error` with 4xx underneath | No                             |
| `upstream_error` with 5xx underneath | Yes — exp backoff              |

Most popular OpenAI / Anthropic SDKs do this automatically. Don't disable
their retry logic unless you have a strong reason.
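
For clients that do not use one of those SDKs, the table can be encoded directly. This is a sketch, not Lazu's reference logic: `should_retry` and `backoff_delay` are hypothetical helpers, the base delay and cap are arbitrary, and the jitter is an addition not mentioned above.

```python
import random
from typing import Optional

RETRYABLE_STATUS = {429, 500, 502, 503, 504}
MAX_TRIES = 3  # from the retry table: "max 3 tries" for 5xx


def should_retry(status: int, code: Optional[str] = None,
                 upstream_status: Optional[int] = None) -> bool:
    """Apply the retry table to one failed response."""
    if code == "upstream_error" and upstream_status is not None:
        # Decide on the status the upstream provider returned (details.status_code).
        return upstream_status >= 500
    return status in RETRYABLE_STATUS


def backoff_delay(attempt: int, retry_after: Optional[float] = None,
                  base: float = 1.0, cap: float = 30.0) -> float:
    """Seconds to wait before retry number `attempt` (1-based)."""
    if retry_after is not None:
        return retry_after  # honor the server's Retry-After header
    delay = min(cap, base * 2 ** (attempt - 1))
    return delay + random.uniform(0, delay / 2)  # jitter: an added assumption
```

A caller would loop up to `MAX_TRIES` times, sleeping for `backoff_delay(attempt, retry_after)` between attempts whenever `should_retry` returns `True`.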

## Including a `request_id` in support tickets

Every Lazu response (success or error) includes:

```http
X-Lazu-Request-Id: req_lazu_01KSBV4MC6THZ9TCZEM38KPYRX
```

Paste this into any support ticket. We can trace the full request path
through routing, upstream call, and billing — far faster than re-creating
the issue from a fuzzy description.

## See also

- [Rate limits](/limits)
- [Authentication](/authentication)
- [API reference](/api-reference)


# Rate limits


Lazu protects backends with two layers of limits:

1. **Tier-based RPM/TPM** — every account belongs to a tier (`Tier0..Tier3`)
   based on rolling spend. Higher tier = higher rate caps.
2. **Anti-abuse for unverified accounts** — if you've never funded the
   account (no successful top-up), you're capped at a tiny rate regardless
   of tier.

Both limits apply **per API key user**, not globally — your usage doesn't
affect other Lazu customers.

## Defaults

| Tier           | Eligibility                                       | RPM |       TPM |
| -------------- | ------------------------------------------------- | --: | --------: |
| **Unverified** | No successful top-up yet                          |   5 |     5,000 |
| Tier 0         | Verified (paid at least once), 30-day spend < $10 |  60 |   100,000 |
| Tier 1         | 30-day spend ≥ $10                                | 120 |   300,000 |
| Tier 2         | 30-day spend ≥ $100                               | 300 | 1,000,000 |
| Tier 3         | 30-day spend ≥ $1,000                             | 600 | 2,000,000 |

- **RPM** = requests per minute (rolling 60-second window)
- **TPM** = tokens per minute, counting both input and estimated output

## How tier is determined

- New account = `Tier0` after first successful top-up, `Unverified` before.
- Tier auto-adjusts daily at 04:30 (Asia/Shanghai) based on 30-day rolling
  spend.
- A single top-up immediately upgrades you to the tier matching your
  lifetime top-up total, so you don't have to wait for the daily job.

> [info]

  Promotional credits ("free $5 to try Lazu", referral bonuses, admin-added
  quota) do **not** count as funding. You stay `Unverified` until you complete a
  real top-up with a payment method.

## When you hit a limit

HTTP `429` with:

```json
{
  "error": {
    "message": "Request rate limit exceeded (60/min)",
    "type": "rate_limit_exceeded",
    "code": "request_rate_limit_exceeded"
  }
}
```

Response headers include `Retry-After: 60` (seconds).

Best practice: catch `429`, sleep `Retry-After` seconds, then retry with
exponential backoff if you hit it twice in a row. The major OpenAI / Anthropic
SDKs already do this — make sure you haven't disabled retries.

## Streaming concurrency

Streaming (`stream: true`) responses count as **one request** for RPM, and
their tokens flush into TPM as they arrive. A long-running stream that
emits 10,000 output tokens over 30 seconds is one RPM request but consumes
roughly 10,000 TPM credit in the minute(s) it's active.

There's no separate "concurrent streams" cap beyond the RPM/TPM math.

## Per-vendor sub-limits

Lazu doesn't enforce vendor-side limits — upstream providers like OpenAI
and Anthropic still apply their own caps to the underlying API key pool.
On a busy day you may see upstream `429` or `503` propagate through. Lazu's
routing will retry against backup channels in the same lane when possible.

## Need higher limits?

For sustained high-throughput workloads (1,000+ RPM, several million TPM),
contact support via [lazu.ai](https://lazu.ai/) — limits can be raised
per-account with no public commitment.

## See also

- [Errors](/errors) — full error code table
- [Billing](/billing) — how spend rolls up to tier
- [Pricing & lanes](/models/pricing) — choosing direct vs cheap


# Model catalog


Lazu publishes the full set of models a given API key can access. Read this
catalog at runtime instead of hardcoding model names — new models get added,
old ones get deprecated, and which models a key can reach depends on the
key's scope.

## GET `/api/models/catalog`

```bash
curl https://api.lazu.ai/api/models/catalog \
  -H "Authorization: Bearer $LAZU_API_KEY"
```

Response (abridged):

```json
{
  "data": [
    {
      "model_name": "gpt-4o-mini",
      "vendor_id": "openai",
      "modality": {
        "input": ["text", "image"],
        "output": ["text"]
      },
      "supported_endpoint_types": ["chat", "responses"],
      "supported_endpoints": ["/v1/chat/completions", "/v1/responses"],
      "default_endpoint_type": "chat",
      "parameters": {
        "tools": true,
        "reasoning": false,
        "vision": true
      },
      "lanes": [
        { "name": "direct", "input_per_mtok": 0.14, "output_per_mtok": 0.56 },
        { "name": "cheap", "input_per_mtok": 0.05, "output_per_mtok": 0.2 }
      ],
      "official_input_price": 0.15,
      "official_output_price": 0.6,
      "context_length": 128000,
      "max_output_tokens": 16384
    },
    "..."
  ]
}
```

## Picking a model

| You want             | Filter on                                        | Common endpoint                           |
| -------------------- | ------------------------------------------------ | ----------------------------------------- |
| Text chat            | `modality.output` includes `text`                | `/v1/chat/completions`                    |
| Multimodal (vision)  | `modality.input` includes `image`                | `/v1/chat/completions` or `/v1/responses` |
| Reasoning (o-series) | `parameters.reasoning === true`                  | `/v1/responses` (recommended)             |
| Embeddings           | `supported_endpoint_types` includes `embeddings` | `/v1/embeddings`                          |
| Tool / function call | `parameters.tools === true`                      | `/v1/chat/completions`                    |

Use `default_endpoint_type` unless your client SDK specifically needs a
native provider endpoint (Anthropic's `/v1/messages`, Gemini's
`/v1beta/models/…:generateContent`, etc.).
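
The filters in the table translate into a few dictionary lookups on each catalog entry. This is a sketch over the response shape shown above; `supports` and `pick` are hypothetical helpers, and the capability names (`"text"`, `"vision"`, and so on) are labels chosen for this example.

```python
def supports(entry: dict, want: str) -> bool:
    """Apply one row of the picking table to a catalog entry."""
    modality = entry.get("modality", {})
    params = entry.get("parameters", {})
    if want == "text":
        return "text" in modality.get("output", [])
    if want == "vision":
        return "image" in modality.get("input", [])
    if want == "reasoning":
        return params.get("reasoning") is True
    if want == "embeddings":
        return "embeddings" in entry.get("supported_endpoint_types", [])
    if want == "tools":
        return params.get("tools") is True
    raise ValueError(f"unknown capability: {want}")


def pick(catalog: list[dict], *wants: str) -> list[str]:
    """Names of all models that satisfy every requested capability."""
    return [e["model_name"] for e in catalog if all(supports(e, w) for w in wants)]
```

For example, `pick(catalog["data"], "vision", "tools")` lists models that accept images and support tool calls.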

## Listing via `/v1/models`

If your SDK uses OpenAI-style `models.list()`, Lazu also serves a flatter
OpenAI-compatible response at `/v1/models`:

```bash
curl https://api.lazu.ai/v1/models \
  -H "Authorization: Bearer $LAZU_API_KEY"
```

See [Models list](/endpoints/models) for the exact schema.

## Pricing detail

Each catalog entry has a `lanes` array — see [Pricing & lanes](/models/pricing)
for what `direct` and `cheap` mean and how to choose between them.


# Pricing & lanes


Lazu shows three numbers on every model:

1. **Official reference price** — the upstream provider's public price for
   comparison. We don't bill this; it's only here so you can see how much
   Lazu saves vs. paying the provider direct.
2. **Direct lane price** — what we charge when routing via first-party /
   official cloud (OpenAI direct, Anthropic direct, Azure, AWS, GCP).
3. **Cheap lane price** — what we charge when routing via a lower-cost
   path. Available only for some models / vendors. Slightly less consistent
   uptime in exchange for big cost savings.

## What `direct` and `cheap` mean

| Lane     | What it routes to                                       | When to use                                                   |
| -------- | ------------------------------------------------------- | ------------------------------------------------------------- |
| `direct` | First-party endpoints + major cloud (Azure / AWS / GCP) | Production, regulated workloads, latency-sensitive            |
| `cheap`  | Lower-cost third-party paths                            | Batch jobs, experiments, prototyping, anything cost-sensitive |

Every model is guaranteed to have a `direct` lane. `cheap` is opt-in per
model — some models (small open-source, embeddings, Cloudflare Workers AI)
only ship as `direct` because there's no meaningful cheaper path.

## Example: gpt-4o-mini

```
gpt-4o-mini                       Context: 128K
─────────────────────────────────────────────────
                       input /1M    output /1M
Official (reference)   $0.15        $0.60
─────────────────────────────────────────────────
direct                 $0.14        $0.56
  discount             -6.7%        -6.7%
─────────────────────────────────────────────────
cheap                  $0.05        $0.20
  discount             -66.7%       -66.7%
```

Both lanes are billed per token, in microUSD precision. Your dashboard's
USD figure is rounded for display.
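
The arithmetic behind the table is short: a price in USD per 1M tokens is numerically the same as microUSD per token. The sketch below uses `Decimal` to avoid float drift; `cost_micro_usd` and `discount_pct` are hypothetical helpers, and since the rounding of fractional microUSD is not specified here, the cost is returned exactly.

```python
from decimal import Decimal


def cost_micro_usd(input_tokens: int, output_tokens: int,
                   input_per_mtok: str, output_per_mtok: str) -> Decimal:
    """USD per 1M tokens equals microUSD per token, so no scaling is needed."""
    return (input_tokens * Decimal(input_per_mtok)
            + output_tokens * Decimal(output_per_mtok))


def discount_pct(lane_price: str, official_price: str) -> Decimal:
    """Signed percentage difference from the official reference price."""
    change = (Decimal(lane_price) - Decimal(official_price)) / Decimal(official_price)
    return (change * 100).quantize(Decimal("0.1"))


# 1,000 input + 500 output tokens on the direct lane of gpt-4o-mini:
print(cost_micro_usd(1000, 500, "0.14", "0.56"))  # 420 microUSD, i.e. $0.00042
print(discount_pct("0.14", "0.15"))               # matches the -6.7% in the table
```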

## Picking a lane per vendor

In the [console → API keys](https://lazu.ai/console/token), open a key and
expand **Advanced**. For each vendor you can choose:

- **direct** — always route this vendor's models through direct lane
- **cheap** — prefer cheap lane; fall back to direct if no cheap route
  exists for the requested model

If you don't configure anything, the default is **cheap** — most users want
the cheapest available option.

> [info]

  Lane is chosen **per vendor**, not per model. A key set to `cheap` for OpenAI
  will use the cheap lane for every OpenAI model that has one, and fall back to
  direct for models that don't.

## Fallback behaviour

If you ask for `cheap` but the requested model only ships as `direct`,
Lazu transparently falls back to `direct`:

- Request succeeds with HTTP 200
- Response header: `X-Lazu-Lane-Fallback: cheap->direct`
- You're billed at the **direct** lane price (since that's what served you)

If you ask for `direct` but no direct channel exists, Lazu returns
HTTP 404 `model_not_found` — we don't silently downgrade. This is to
avoid surprising regulated workloads with non-direct routing.
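
Since a fallback changes the billed price, a client may want to record which lane actually served each request. The following sketch reads the documented `X-Lazu-Lane-Fallback` header; `billed_lane` is a hypothetical helper, and it assumes the header value always has the `from->to` form shown above.

```python
from typing import Mapping


def billed_lane(requested: str, headers: Mapping[str, str]) -> str:
    """Return the lane that served, and is billed for, the request."""
    # HTTP header names are case-insensitive, so normalize before the lookup.
    lowered = {k.lower(): v for k, v in headers.items()}
    fallback = lowered.get("x-lazu-lane-fallback")
    if not fallback:
        return requested  # no fallback happened
    _, _, served = fallback.partition("->")
    return served.strip() or requested
```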

## What's NOT charged extra

Some token categories are billed at the lane's headline price, **without
any extra Lazu markup or hidden surcharge**:

- Prompt caching reads (when supported by the model)
- Cache writes (5-minute and 1-hour)
- Audio input/output tokens
- Image input tokens
- Per-call image / video generation cost

The same prompt sent to the same lane always bills the same way; there's no
"premium tier discount" or "loyalty multiplier" layered on top.

## API: reading prices programmatically

The model catalog exposes prices per lane:

```bash
curl https://api.lazu.ai/api/models/catalog \
  -H "Authorization: Bearer $LAZU_API_KEY" \
  | jq '.data[] | select(.model_name=="gpt-4o-mini") | .lanes'
```

```json
[
  {
    "name": "direct",
    "input_per_mtok": 0.14,
    "output_per_mtok": 0.56,
    "cache_read_per_mtok": 0.07,
    "audio_input_per_mtok": null,
    "image_input_per_mtok": null
  },
  {
    "name": "cheap",
    "input_per_mtok": 0.05,
    "output_per_mtok": 0.2,
    "cache_read_per_mtok": 0.025,
    "audio_input_per_mtok": null,
    "image_input_per_mtok": null
  }
]
```

For agents and scripts: read this at startup, cache for an hour, and pick the
lane with acceptable risk × cost for your workload.
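
The read-once, cache-for-an-hour pattern might be sketched as follows. Everything here is illustrative: `CatalogCache` and `cheapest_lane` are hypothetical names, `fetch` stands in for whatever function calls `GET /api/models/catalog`, and choosing by input price alone is a simplification of the risk × cost trade-off.

```python
import time
from typing import Callable, Optional


class CatalogCache:
    """Cache the catalog for a fixed TTL (one hour by default)."""

    def __init__(self, fetch: Callable[[], list], ttl_seconds: float = 3600.0,
                 clock: Callable[[], float] = time.monotonic):
        self._fetch = fetch      # hypothetical: performs GET /api/models/catalog
        self._ttl = ttl_seconds
        self._clock = clock      # injectable so the cache can be tested without sleeping
        self._data: Optional[list] = None
        self._loaded_at = 0.0

    def get(self) -> list:
        now = self._clock()
        if self._data is None or now - self._loaded_at >= self._ttl:
            self._data = self._fetch()
            self._loaded_at = now
        return self._data


def cheapest_lane(entry: dict) -> str:
    """Name of the lane with the lowest input price for one catalog entry."""
    return min(entry["lanes"], key=lambda lane: lane["input_per_mtok"])["name"]
```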

## When prices change

We update prices when upstream providers change theirs, and announce in the
[Changelog](/changelog). Active billing is always at the price in effect at
**request time** — we never retroactively re-bill.


# Quickstart


## 1. Get an API key

Sign in to the [Lazu console](https://lazu.ai/console), open **API keys** and
click **Create key**. Copy the key (it starts with `sk-lazu-…`).

## 2. Pick a base URL

```
https://api.lazu.ai
```

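For self-hosted Lazu, replace it with your own API domain. OpenAI-compatible
SDKs expect the `/v1` suffix, as in the `base_url` values used in step 3.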

## 3. Make a request

### cURL

```bash
curl https://api.lazu.ai/v1/chat/completions \
  -H "Authorization: Bearer $LAZU_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "gpt-4o-mini",
    "messages": [
      {"role": "user", "content": "Say hello from Lazu"}
    ]
  }'
```

### Python (OpenAI SDK)

```python
from openai import OpenAI

client = OpenAI(
    base_url="https://api.lazu.ai/v1",
    api_key="YOUR_LAZU_KEY",
)

resp = client.chat.completions.create(
    model="gpt-4o-mini",
    messages=[{"role": "user", "content": "Say hello from Lazu"}],
)
print(resp.choices[0].message.content)
```

### TypeScript (OpenAI SDK)

```ts
import OpenAI from "openai";

const client = new OpenAI({
  baseURL: "https://api.lazu.ai/v1",
  apiKey: process.env.LAZU_API_KEY,
});

const resp = await client.chat.completions.create({
  model: "gpt-4o-mini",
  messages: [{ role: "user", content: "Say hello from Lazu" }],
});
console.log(resp.choices[0].message.content);
```

## 4. Next steps

- Browse available models: [Catalog](/models/catalog)
- See pricing per lane: [Pricing](/models/pricing)
- Stream responses, send images, upload PDFs: [Examples](/examples)
- Hit a rate limit? See [Rate limits](/limits) · [Errors](/errors)
