Pricing & lanes

Lazu shows three numbers on every model:

Official reference price: the upstream provider's public price, shown for comparison. We don't bill this; it's only here so you can see how much Lazu saves compared with paying the provider directly.
Direct lane price: what we charge when routing via first-party / official cloud (OpenAI direct, Anthropic direct, Azure, AWS, GCP).
Cheap lane price: what we charge when routing via a lower-cost path. Available only for some models / vendors. You trade slightly less consistent uptime for large cost savings.

What `direct` and `cheap` mean

| Lane | What it routes to | When to use |
| --- | --- | --- |
| `direct` | First-party endpoints + major cloud (Azure / AWS / GCP) | Production, regulated workloads, latency-sensitive |
| `cheap` | Lower-cost third-party paths | Batch jobs, experiments, prototyping, anything cost-sensitive |

Every model is guaranteed to have a direct lane. A cheap lane exists only for some models: others (small open-source, embeddings, Cloudflare Workers AI) ship only as direct because there's no meaningfully cheaper path.

Example: gpt-4o-mini

gpt-4o-mini                       Context: 128K
─────────────────────────────────────────────────
                       input /1M    output /1M
Official (reference)   $0.15        $0.60
─────────────────────────────────────────────────
direct                 $0.14        $0.56
  discount             -6.7%        -6.7%
─────────────────────────────────────────────────
cheap                  $0.05        $0.20
  discount             -66.7%       -66.7%

Both lanes are billed per token, with microUSD precision (millionths of a dollar). Your dashboard's USD figure is rounded for display.
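The discount rows and the microUSD billing unit can be checked with a little arithmetic. The sketch below is illustrative only: the function names are hypothetical, prices are taken as USD per 1M tokens (as the table columns state), and rounding to the nearest microUSD is an assumption, since this page does not say how sub-microUSD remainders are handled.

```python
from decimal import Decimal, ROUND_HALF_UP

MICRO_USD_PER_USD = 1_000_000
TOKENS_PER_PRICE_UNIT = 1_000_000  # table prices are quoted per 1M tokens


def discount_pct(reference: str, lane: str) -> Decimal:
    """Percentage change of a lane price relative to the official reference price."""
    ref, price = Decimal(reference), Decimal(lane)
    pct = (price - ref) / ref * 100
    return pct.quantize(Decimal("0.1"), rounding=ROUND_HALF_UP)


def cost_micro_usd(input_tokens: int, output_tokens: int,
                   input_price: str, output_price: str) -> int:
    """Cost of one request in microUSD, given per-1M-token prices in USD."""
    usd = (Decimal(input_tokens) * Decimal(input_price)
           + Decimal(output_tokens) * Decimal(output_price)) / TOKENS_PER_PRICE_UNIT
    # Assumption: round half up to the nearest whole microUSD.
    micro = (usd * MICRO_USD_PER_USD).quantize(Decimal("1"), rounding=ROUND_HALF_UP)
    return int(micro)
```

With the gpt-4o-mini figures above, a hypothetical request of 1,200 input tokens and 350 output tokens comes to 130 microUSD on the cheap lane and 364 microUSD on the direct lane.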

Picking a lane per vendor

In the console → API keys, open a key and expand Advanced. For each vendor you can choose:

direct — always route this vendor's models through direct lane
cheap — prefer cheap lane; fall back to direct if no cheap route exists for the requested model

If you don't configure anything, the default is cheap — most users want the cheapest available option.

Fallback behaviour

If you ask for cheap but the requested model only ships as direct, Lazu transparently falls back to direct:

Request succeeds with HTTP 200
Response header: X-Lazu-Lane-Fallback: cheap->direct
You're billed at the direct lane price (since that's what served you)
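On the client side, the fallback header is the signal to use when reconciling cost estimates. A minimal sketch, assuming the header value always has the `requested->served` shape shown above and that a missing header means no fallback happened (the helper name is hypothetical):

```python
def served_lane(requested_lane: str, headers: dict) -> str:
    """Return the lane that actually served, and therefore billed, a successful request."""
    # HTTP header names are case-insensitive, so normalise before looking up.
    normalized = {name.lower(): value for name, value in headers.items()}
    fallback = normalized.get("x-lazu-lane-fallback")
    if fallback is None:
        # Assumption: no header means the requested lane was used.
        return requested_lane
    _requested, separator, served = fallback.partition("->")
    if not separator or not served.strip():
        # Unexpected format: fall back to what was requested rather than guess.
        return requested_lane
    return served.strip()
```

For example, `served_lane("cheap", {"X-Lazu-Lane-Fallback": "cheap->direct"})` returns `"direct"`, so the estimate should use the direct lane price.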

If you ask for direct but no direct channel exists, Lazu returns HTTP 404 model_not_found — we don't silently downgrade. This is to avoid surprising regulated workloads with non-direct routing.
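The two rules are asymmetric, which is easy to see when written out as a decision function. This is a toy model of the documented behaviour, not Lazu's routing code: `RouteResult` and `resolve_lane` are hypothetical names, and the case where cheap is requested but no lane exists at all is not covered by this page, so the 404 returned for it here is an assumption.

```python
from dataclasses import dataclass
from typing import Optional, Set


@dataclass(frozen=True)
class RouteResult:
    status: int
    lane: Optional[str]                    # lane that serves and bills the request
    fallback_header: Optional[str] = None  # value of X-Lazu-Lane-Fallback, if set
    error: Optional[str] = None


def resolve_lane(preference: Optional[str], available: Set[str]) -> RouteResult:
    """Model the documented lane selection for one model."""
    requested = preference or "cheap"  # unconfigured keys default to cheap
    if requested in available:
        return RouteResult(200, requested)
    if requested == "cheap" and "direct" in available:
        # cheap may be upgraded to direct, and the response says so.
        return RouteResult(200, "direct", fallback_header="cheap->direct")
    # direct is never downgraded to cheap.
    # Assumption: cheap with no lanes available also ends up here.
    return RouteResult(404, None, error="model_not_found")
```

The upgrade from cheap to direct is visible in the response header and costs more; the reverse direction never happens silently.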

What's NOT charged extra

The following categories are billed at the lane's listed price, without any extra Lazu markup or hidden surcharge:

Prompt caching reads (when supported by the model)
Cache writes (5-minute and 1-hour)
Audio input/output tokens
Image input tokens
Per-call image / video generation cost

The same prompt sent to the same lane always bills the same way; there's no "premium tier discount" or "loyalty multiplier" layered on top.

API: reading effective prices programmatically

The model catalog returns the effective catalog price for the current API key. Use /api/models/catalog when token-scoped access matters; use /api/pricing only when you need the public list-pricing view.

curl https://api.lazu.ai/api/models/catalog \
  -H "Authorization: Bearer $LAZU_API_KEY" \
  | jq '.data[] | select(.id=="gpt-4o-mini") | .pricing'

{
  "currency": "USD",
  "billing_type": "per_token",
  "input": 0.05,
  "output": 0.2,
  "cache_read": 0.025,
  "pricing_version": 12,
  "updated_at": "2026-06-01T00:00:00Z"
}

For agents and scripts: read /api/models/catalog at startup and choose from models the key can actually access. Lane preference is configured on the API key; the returned pricing object is the price that key should use for estimation.
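A startup routine along these lines could look like the sketch below. It relies only on what the curl example shows (a top-level `data` array whose entries carry `id` and `pricing`); everything else is an assumption, in particular that `input` and `output` are USD per 1M tokens, which matches the gpt-4o-mini table but is not stated in the response itself. The function names are hypothetical.

```python
import json
import urllib.request
from decimal import Decimal

CATALOG_URL = "https://api.lazu.ai/api/models/catalog"


def fetch_catalog(api_key: str, url: str = CATALOG_URL) -> dict:
    """Download the catalog as seen by this API key."""
    request = urllib.request.Request(
        url, headers={"Authorization": f"Bearer {api_key}"}
    )
    with urllib.request.urlopen(request, timeout=10) as response:
        return json.load(response)


def pricing_by_model(catalog: dict) -> dict:
    """Index the pricing objects by model id, skipping entries without pricing."""
    return {
        model["id"]: model["pricing"]
        for model in catalog.get("data", [])
        if "id" in model and "pricing" in model
    }


def estimate_usd(pricing: dict, input_tokens: int, output_tokens: int) -> Decimal:
    """Estimate request cost in USD.

    Assumption: "input" and "output" are USD per 1M tokens.
    """
    input_price = Decimal(str(pricing["input"]))
    output_price = Decimal(str(pricing["output"]))
    return (input_tokens * input_price + output_tokens * output_price) / 1_000_000
```

A script would call `pricing_by_model(fetch_catalog(os.environ["LAZU_API_KEY"]))` once at startup, then look up each model it intends to use and skip any that are absent from the mapping.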

When prices change

We update prices when upstream providers change theirs, and announce in the Changelog. Active billing is always at the price in effect at request time — we never retroactively re-bill.
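Long-running scripts that cache catalog prices may want to notice when a price has moved. One possible approach, sketched below, compares the `pricing_version` field between two catalog snapshots. That this field changes whenever a price changes is an assumption based on its name; this page does not define it, and the Changelog remains the announced source. The function name is hypothetical.

```python
def changed_models(previous: dict, current: dict) -> list:
    """Return ids of models that are new or whose pricing_version differs.

    Both arguments map model id -> pricing object, as returned per model
    by the catalog endpoint.
    """
    changed = []
    for model_id, pricing in current.items():
        old = previous.get(model_id)
        if old is None or old.get("pricing_version") != pricing.get("pricing_version"):
            changed.append(model_id)
    return sorted(changed)
```

Because billing uses the price in effect at request time, estimates computed from a stale snapshot may drift from the invoice until the snapshot is refreshed.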
