Are you an LLM? Read llms.txt for a summary of the docs, or llms-full.txt for the full context.
Skip to content

Pricing & lanes

Lazu shows three numbers on every model:

  1. Official reference price — the upstream provider's public price for comparison. We don't bill this; it's only here so you can see how much Lazu saves vs. paying the provider direct.
  2. Direct lane price — what we charge when routing via first-party / official cloud (OpenAI direct, Anthropic direct, Azure, AWS, GCP).
  3. Cheap lane price — what we charge when routing via a lower-cost path. Available only for some models / vendors. Slightly less consistent uptime in exchange for big cost savings.

What direct and cheap mean

LaneWhat it routes toWhen to use
directFirst-party endpoints + major cloud (Azure / AWS / GCP)Production, regulated workloads, latency-sensitive
cheapLower-cost third-party pathsBatch jobs, experiments, prototyping, anything cost-sensitive

Every model is guaranteed to have a direct lane. cheap is opt-in per model — some models (small open-source, embeddings, Cloudflare Workers AI) only ship as direct because there's no meaningful cheaper path.

Example: gpt-4o-mini

gpt-4o-mini                       Context: 128K
─────────────────────────────────────────────────
                       input /1M    output /1M
Official (reference)   $0.15        $0.60
─────────────────────────────────────────────────
direct                 $0.14        $0.56
  discount             -6.7%        -6.7%
─────────────────────────────────────────────────
cheap                  $0.05        $0.20
  discount             -66.7%       -66.7%

Both lanes are billed per token, in microUSD precision. Your dashboard's USD figure is rounded for display.

Picking a lane per vendor

In the console → API keys, open a key and expand Advanced. For each vendor you can choose:

  • direct — always route this vendor's models through direct lane
  • cheap — prefer cheap lane; fall back to direct if no cheap route exists for the requested model

If you don't configure anything, the default is cheap — most users want the cheapest available option.

Fallback behaviour

If you ask for cheap but the requested model only ships as direct, Lazu transparently falls back to direct:

  • Request succeeds with HTTP 200
  • Response header: X-Lazu-Lane-Fallback: cheap->direct
  • You're billed at the direct lane price (since that's what served you)

If you ask for direct but no direct channel exists, Lazu returns HTTP 404 model_not_found — we don't silently downgrade. This is to avoid surprising regulated workloads with non-direct routing.

What's NOT charged extra

Some token categories are billed at the lane's headline price, without any extra Lazu markup or hidden surcharge:

  • Prompt caching reads (when supported by the model)
  • Cache writes (5-minute and 1-hour)
  • Audio input/output tokens
  • Image input tokens
  • Per-call image / video generation cost

The same prompt sent to the same lane always bills the same way; there's no "premium tier discount" or "loyalty multiplier" layered on top.

API: reading prices programmatically

The model catalog exposes prices per lane:

curl https://api.lazu.ai/api/models/catalog \
  -H "Authorization: Bearer $LAZU_API_KEY" \
  | jq '.data[] | select(.model_name=="gpt-4o-mini") | .lanes'
[
  {
    "name": "direct",
    "input_per_mtok": 0.14,
    "output_per_mtok": 0.56,
    "cache_read_per_mtok": 0.07,
    "audio_input_per_mtok": null,
    "image_input_per_mtok": null
  },
  {
    "name": "cheap",
    "input_per_mtok": 0.05,
    "output_per_mtok": 0.2,
    "cache_read_per_mtok": 0.025,
    "audio_input_per_mtok": null,
    "image_input_per_mtok": null
  }
]

For agents and scripts: read this at startup, cache for an hour, and pick the lane with acceptable risk × cost for your workload.

When prices change

We update prices when upstream providers change theirs, and announce in the Changelog. Active billing is always at the price in effect at request time — we never retroactively re-bill.