Pricing & lanes
Lazu shows three numbers on every model:
- Official reference price — the upstream provider's public price for comparison. We don't bill this; it's only here so you can see how much Lazu saves vs. paying the provider direct.
- Direct lane price — what we charge when routing via first-party / official cloud (OpenAI direct, Anthropic direct, Azure, AWS, GCP).
- Cheap lane price — what we charge when routing via a lower-cost path. Available only for some models / vendors. Slightly less consistent uptime in exchange for big cost savings.
What direct and cheap mean
| Lane | What it routes to | When to use |
|---|---|---|
direct | First-party endpoints + major cloud (Azure / AWS / GCP) | Production, regulated workloads, latency-sensitive |
cheap | Lower-cost third-party paths | Batch jobs, experiments, prototyping, anything cost-sensitive |
Every model is guaranteed to have a direct lane. cheap is opt-in per
model — some models (small open-source, embeddings, Cloudflare Workers AI)
only ship as direct because there's no meaningful cheaper path.
Example: gpt-4o-mini
gpt-4o-mini Context: 128K
─────────────────────────────────────────────────
input /1M output /1M
Official (reference) $0.15 $0.60
─────────────────────────────────────────────────
direct $0.14 $0.56
discount -6.7% -6.7%
─────────────────────────────────────────────────
cheap $0.05 $0.20
discount -66.7% -66.7%Both lanes are billed per token, in microUSD precision. Your dashboard's USD figure is rounded for display.
Picking a lane per vendor
In the console → API keys, open a key and expand Advanced. For each vendor you can choose:
- direct — always route this vendor's models through direct lane
- cheap — prefer cheap lane; fall back to direct if no cheap route exists for the requested model
If you don't configure anything, the default is cheap — most users want the cheapest available option.
Fallback behaviour
If you ask for cheap but the requested model only ships as direct,
Lazu transparently falls back to direct:
- Request succeeds with HTTP 200
- Response header:
X-Lazu-Lane-Fallback: cheap->direct - You're billed at the direct lane price (since that's what served you)
If you ask for direct but no direct channel exists, Lazu returns
HTTP 404 model_not_found — we don't silently downgrade. This is to
avoid surprising regulated workloads with non-direct routing.
What's NOT charged extra
Some token categories are billed at the lane's headline price, without any extra Lazu markup or hidden surcharge:
- Prompt caching reads (when supported by the model)
- Cache writes (5-minute and 1-hour)
- Audio input/output tokens
- Image input tokens
- Per-call image / video generation cost
The same prompt sent to the same lane always bills the same way; there's no "premium tier discount" or "loyalty multiplier" layered on top.
API: reading prices programmatically
The model catalog exposes prices per lane:
curl https://api.lazu.ai/api/models/catalog \
-H "Authorization: Bearer $LAZU_API_KEY" \
| jq '.data[] | select(.model_name=="gpt-4o-mini") | .lanes'[
{
"name": "direct",
"input_per_mtok": 0.14,
"output_per_mtok": 0.56,
"cache_read_per_mtok": 0.07,
"audio_input_per_mtok": null,
"image_input_per_mtok": null
},
{
"name": "cheap",
"input_per_mtok": 0.05,
"output_per_mtok": 0.2,
"cache_read_per_mtok": 0.025,
"audio_input_per_mtok": null,
"image_input_per_mtok": null
}
]For agents and scripts: read this at startup, cache for an hour, and pick the lane with acceptable risk × cost for your workload.
When prices change
We update prices when upstream providers change theirs, and announce in the Changelog. Active billing is always at the price in effect at request time — we never retroactively re-bill.