How pricing works

Lazu's billing is prepaid and per-token (or per-call for certain image/audio models). You top up your balance, every API call deducts at the model's lane price, and your dashboard shows usage in real time.

The basic formula

final_charge_microUSD =
    (input_tokens  × input_price_per_mtok)
  + (output_tokens × output_price_per_mtok)
  + (cache_read_tokens × cache_read_price_per_mtok)
  + (cache_write_5m_tokens × cache_write_5m_price_per_mtok)
  + (cache_write_1h_tokens × cache_write_1h_price_per_mtok)
  + (audio_tokens × audio_price_per_mtok)
  + (image_input_tokens × image_input_price_per_mtok)
  + (per_call_charge if any)

All amounts are tracked internally in microUSD (1 USD = 1,000,000 microUSD), so charges stay exact even when a call costs a fraction of a cent. Your dashboard displays USD.
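As a worked instance of the formula, here is a minimal Python sketch. It assumes each *_price_per_mtok is quoted in USD per million tokens, in which case tokens × price is numerically the charge in microUSD. The charge_micro_usd helper, the prices, and the half-up rounding are illustrative assumptions, not Lazu's implementation.

```python
from decimal import Decimal, ROUND_HALF_UP

def charge_micro_usd(usage, prices, per_call_charge=0):
    """Sum tokens x price over every usage dimension, in microUSD.

    Assumption: prices are USD per million tokens, so
    tokens * price is already a microUSD amount.
    """
    total = Decimal(per_call_charge)
    for dimension, tokens in usage.items():
        total += Decimal(tokens) * Decimal(prices[dimension])
    # Rounding to a whole microUSD (half-up) is an assumption.
    return int(total.quantize(Decimal(1), rounding=ROUND_HALF_UP))

# Hypothetical prices, not taken from any real lane.
prices = {"input": "3.00", "output": "15.00", "cache_read": "0.30"}
usage = {"input": 800, "output": 300, "cache_read": 300}

micro = charge_micro_usd(usage, prices)
print(micro, "microUSD =", Decimal(micro) / 1_000_000, "USD")
```

With these made-up numbers the three terms are 2,400 + 4,500 + 90 = 6,990 microUSD, or $0.00699. Prices are passed as strings so that Decimal never sees a binary float.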

Prices come from model_sell_prices[model_name, channel_group] — see Pricing & lanes for the per-lane breakdown.

Refund rules

Upstream outcome	What you pay
`2xx` success	Full price by upstream-reported usage
Streamed, then disconnect, with usage in trailer	Pay for the tokens actually streamed
Streamed, then disconnect, no usage trailer	Refund — you pay 0
`5xx` / timeout / network error	Refund — you pay 0
`4xx` (content policy / bad request)	Refund — you pay 0 even if upstream charged us internally
Per-call image / audio model error	Charged at full per-call rate (see "Known edge case" below)

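In practice this means: if Lazu returned 200 to you, you pay; if Lazu returned 4xx, 5xx, or a timeout, you don't. The two exceptions are the rows above for a stream that disconnects without a usage trailer (refunded) and for per-call image/audio model errors (charged).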
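The refund table can be read as a small decision function. The sketch below is one way to encode it in Python; the Outcome type and its field names are hypothetical and exist only to make the rows testable.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class Outcome:
    # Upstream HTTP status; None stands for a timeout or network error.
    status: Optional[int]
    disconnected: bool = False       # stream was cut off before completion
    has_usage_trailer: bool = True   # upstream sent a final usage trailer
    per_call_model: bool = False     # per-call image / audio model

def is_billed(o: Outcome) -> bool:
    """Return True when the customer pays, following the table row by row."""
    failed = o.status is None or not (200 <= o.status < 300)
    if failed:
        # Per-call model errors are the one failure row that is charged.
        return o.per_call_model
    if o.disconnected:
        # Partial streams are billed only when usage was reported.
        return o.has_usage_trailer
    return True
```

For example, is_billed(Outcome(503)) is False, while is_billed(Outcome(200, disconnected=True)) is True because the usage trailer defaults to present.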

Usage and cache fields

OpenAI-compatible inference responses keep the standard usage shape and add optional cache fields when the upstream reports them:

{
  "usage": {
    "prompt_tokens": 1200,
    "completion_tokens": 300,
    "total_tokens": 1500,
    "prompt_tokens_details": {
      "cached_tokens": 300,
      "cache_write_tokens": 100,
      "cache_write_5m_tokens": 100,
      "cache_miss_tokens": 800
    }
  }
}

cached_tokens means cache reads. cache_write_tokens, cache_write_5m_tokens, and cache_write_1h_tokens mean cache creation/write tokens. cache_miss_tokens is analytical and appears only when the provider reports misses separately.
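Because the cache fields are optional, client code should not assume they are present. A minimal Python sketch of reading them defensively follows; the cache_breakdown helper and its output keys are illustrative names, not part of the API.

```python
def cache_breakdown(response):
    """Pull the optional cache fields out of a response's usage object."""
    details = response.get("usage", {}).get("prompt_tokens_details", {})
    return {
        # cached_tokens means cache reads.
        "cache_read": details.get("cached_tokens", 0),
        "cache_write_5m": details.get("cache_write_5m_tokens", 0),
        "cache_write_1h": details.get("cache_write_1h_tokens", 0),
        # Analytical only; None when the provider does not report misses.
        "cache_miss": details.get("cache_miss_tokens"),
    }

example = {
    "usage": {
        "prompt_tokens": 1200,
        "completion_tokens": 300,
        "total_tokens": 1500,
        "prompt_tokens_details": {
            "cached_tokens": 300,
            "cache_write_tokens": 100,
            "cache_write_5m_tokens": 100,
            "cache_miss_tokens": 800,
        },
    }
}
print(cache_breakdown(example))
```

Absent write fields default to 0 so they can be multiplied by a price directly, while cache_miss stays None to distinguish "not reported" from "zero misses".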

For a complete reconciliation record, call:

curl https://api.lazu.ai/api/usage/requests/req_lazu_01ABCDEF \
  -H "Authorization: Bearer $LAZU_API_KEY"

The request detail response includes normalized usage.dimensions, billing.line_items, compact provider_usage.raw_fields, and routing metadata for the same API key that made the request.
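The same call can be made from Python with only the standard library. This is a sketch mirroring the curl command above; the helper names are hypothetical, and the response is returned as parsed JSON without assuming anything about its exact shape.

```python
import json
import urllib.request

BASE_URL = "https://api.lazu.ai"

def build_detail_request(request_id, api_key):
    """Build the GET request for one request's reconciliation record."""
    return urllib.request.Request(
        f"{BASE_URL}/api/usage/requests/{request_id}",
        headers={"Authorization": f"Bearer {api_key}"},
    )

def fetch_request_detail(request_id, api_key, timeout=30):
    """Send the request and return the decoded JSON body."""
    req = build_detail_request(request_id, api_key)
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return json.load(resp)

# Usage (performs a network call, so it is left commented out):
# import os
# detail = fetch_request_detail("req_lazu_01ABCDEF", os.environ["LAZU_API_KEY"])
# print(json.dumps(detail, indent=2))
```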

Search billing

POST /v1/search records a web_search line item from the selected search backend. The charge amount is based on the backend's configured search_price, or provider-cost passthrough when an operator enables it. The hosted Tavily, Serper and Jina search backends currently do not have an explicit search_price, so their search line item is $0 while usage is still recorded. Search responses include two usage fields:

{
  "usage": {
    "web_search_requests": 1,
    "web_search_billable_units": 1
  }
}

web_search_requests is the Lazu request count, usually 1.
web_search_billable_units is the provider billing quantity used for the charge.
Tavily basic search is 1 unit; Tavily advanced search is 2 units.
Serper is normally 1 unit per successful query.
Jina is normally 1 unit per successful query.
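The unit rules above can be sketched in Python as follows. Multiplying units by search_price is an assumption about how the charge is derived, the depth argument is a hypothetical name for the Tavily basic/advanced choice, and provider-cost passthrough is not modeled.

```python
def billable_units(backend, depth="basic"):
    """Billable units for one successful query on a given backend."""
    if backend == "tavily":
        return 2 if depth == "advanced" else 1
    if backend in ("serper", "jina"):
        return 1
    raise ValueError(f"unknown search backend: {backend}")

def search_charge_micro_usd(units, search_price_micro_usd=None):
    """Charge for a web_search line item, in microUSD.

    Assumption: charge = units x search_price. A backend with no
    explicit search_price yields a $0 line item; usage is still recorded.
    """
    if search_price_micro_usd is None:
        return 0
    return units * search_price_micro_usd

units = billable_units("tavily", depth="advanced")
print(units, search_charge_micro_usd(units))
```

For the hosted backends, which currently have no explicit search_price, the second function always returns 0.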

See Search for request fields and routing details.

Top-up

Top up with a card via Stripe in the console. Funds appear instantly in your balance.

Minimum top-up: $5
No expiration on credits
Refunds: open a ticket within 7 days for failed-but-charged calls

Funded accounts are verified and immediately move from Unverified (5 RPM cap) to the tier matching their lifetime top-up total. See Rate limits.

Free credits

New accounts get a small free trial credit ($X, see console for current amount), enough to test roughly 100 basic chat calls. Free credit:

Counts as balance — you can use it on any model in any lane
Does not verify the account — to escape the 5 RPM cap, complete a real top-up
Does not expire, but if you cap out without topping up, the account stays rate-limited

Where to see usage

Console → Usage: per-day, per-model, per-key breakdown
Console → Billing: invoices, top-up history, current balance
API: GET /api/usage/requests/{request_id} (see Request details)

What's NOT layered on top

Lazu's bill is just input × price + output × price (etc., per the formula above). There is no:

"Premium tier discount" stacking on top of lane price
"Loyalty multiplier" that reduces price over time
Hidden margin per cache read or per audio token beyond the listed per-mtok rate
Surcharge for weekends, regions, or model size

If you see a charge that doesn't match tokens × listed_price, that's a bug — open a ticket.

Streaming partial usage

When you call with stream: true:

Charges post against your balance in real time as tokens are generated.
If the client disconnects mid-stream, Lazu still bills for what was delivered (provided upstream reported it in a final usage trailer).
If upstream errors before any tokens reach you, you're refunded.

This means an aborted stream of 1,000 tokens after the user clicked "Cancel" still costs roughly 1,000 × output_price_per_mtok under the formula above. The model already did the work; the client just stopped reading.
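To put a number on the cancelled-stream case, here is a small Python sketch of the output-token portion only. It assumes the output price is quoted in USD per million tokens (so tokens × price is a microUSD amount) and uses a made-up price of $15 per million tokens; the function name is hypothetical.

```python
def aborted_stream_output_charge(tokens_delivered, output_price_per_mtok, has_usage_trailer):
    """Output-token charge, in microUSD, for a stream the client abandoned."""
    if not has_usage_trailer:
        # No usage trailer from upstream: the call is refunded.
        return 0
    # Assumption: price is USD per million tokens, so this is microUSD.
    return tokens_delivered * output_price_per_mtok

charge = aborted_stream_output_charge(1_000, 15, has_usage_trailer=True)
print(charge, "microUSD =", charge / 1_000_000, "USD")
```

Under these assumptions the cancelled 1,000-token stream costs 15,000 microUSD, or $0.015, for its output tokens; the same stream without a usage trailer costs nothing.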

Enterprise / volume contracts

For workloads sustained above $1,000/month, contact sales via lazu.ai; volume terms are negotiated case by case.