
How pricing works

Lazu's billing is prepaid and per-token (or per-call for certain image/audio models). You top up your balance, each API call deducts from it at the model's lane price, and your dashboard shows usage in real time.

The basic formula

final_charge_microUSD =
    (input_tokens  × input_price_per_mtok)
  + (output_tokens × output_price_per_mtok)
  + (cache_read_tokens × cache_read_price_per_mtok)
  + (audio_tokens × audio_price_per_mtok)
  + (image_input_tokens × image_input_price_per_mtok)
  + (per_call_charge if any)

All amounts are stored internally in microUSD (1 USD = 1,000,000 microUSD), so charges stay precise even when a token costs a fraction of a cent. Your dashboard displays USD.
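The formula above can be sketched in a few lines of Python. This is an illustration only, not Lazu's implementation: the function name, the dictionary keys, the rounding mode, and the example prices are all hypothetical. It also assumes prices are quoted in USD per million tokens, in which case tokens × price is already a microUSD amount.

```python
from decimal import Decimal, ROUND_HALF_UP

def charge_micro_usd(usage, prices_usd_per_mtok, per_call_charge_micro_usd=0):
    """Estimate a charge in microUSD from token counts.

    ASSUMPTION: prices are USD per million tokens, so
    tokens * price is numerically a microUSD amount.
    ASSUMPTION: half-up rounding to a whole microUSD.
    """
    total = Decimal(per_call_charge_micro_usd)
    for kind, tokens in usage.items():
        # Decimal(str(...)) avoids binary floating-point drift on prices.
        total += Decimal(tokens) * Decimal(str(prices_usd_per_mtok[kind]))
    return int(total.quantize(Decimal("1"), rounding=ROUND_HALF_UP))

# Hypothetical prices, not taken from the Lazu price list.
usage = {"input": 1200, "output": 350}
prices = {"input": "0.50", "output": "1.50"}
print(charge_micro_usd(usage, prices))  # 1125 microUSD, i.e. $0.001125
```

Keeping the arithmetic in `Decimal` and converting to an integer only at the end mirrors the reason the docs give for microUSD: sub-cent amounts should not be lost to rounding along the way.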

Prices come from model_sell_prices[model_name, channel_group] — see Pricing & lanes for the per-lane breakdown.

Refund rules

Upstream outcome | What you pay
--- | ---
2xx success | Full price by upstream-reported usage
Streamed, then disconnect, with usage in trailer | Pay for the tokens actually streamed
Streamed, then disconnect, no usage trailer | Refund: you pay 0
5xx / timeout / network error | Refund: you pay 0
4xx (content policy / bad request) | Refund: you pay 0 even if upstream charged us internally
Per-call image / audio model error | Charged at full per-call rate (see "Known edge case" below)

As a rule of thumb: if Lazu returned 200 to you, you pay; if Lazu returned 4xx, 5xx, or a timeout, you don't. The exceptions are the no-trailer disconnect and per-call error cases listed above.
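One way to read the refund rules is as a small decision function. The sketch below is an assumption-laden illustration: the `Outcome` enum, the function name, and its parameters are hypothetical and are not part of any Lazu API.

```python
from enum import Enum

class Outcome(Enum):
    # Hypothetical labels for the outcomes listed in the refund rules.
    SUCCESS = "2xx success"
    STREAM_CUT_WITH_USAGE = "streamed, disconnect, usage trailer"
    STREAM_CUT_NO_USAGE = "streamed, disconnect, no usage trailer"
    UPSTREAM_FAILURE = "5xx / timeout / network error"
    CLIENT_ERROR = "4xx content policy / bad request"
    PER_CALL_ERROR = "per-call image / audio model error"

def amount_due_micro_usd(outcome, metered_charge=0, per_call_charge=0):
    """Return what the caller pays, in microUSD.

    metered_charge: charge computed from upstream-reported usage.
    per_call_charge: flat rate of a per-call image/audio model.
    """
    if outcome in (Outcome.SUCCESS, Outcome.STREAM_CUT_WITH_USAGE):
        return metered_charge
    if outcome is Outcome.PER_CALL_ERROR:
        return per_call_charge  # charged at the full per-call rate
    return 0  # every other outcome is refunded

print(amount_due_micro_usd(Outcome.CLIENT_ERROR, metered_charge=1125))  # 0
```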

Top-up

Top up with a card via Stripe in the console. Funds appear instantly in your balance.

  • Minimum top-up: $5
  • No expiration on credits
  • Refunds: open a ticket within 7 days for failed-but-charged calls

Funded accounts are verified and immediately move from Unverified (5 RPM cap) to the tier matching their lifetime top-up total. See Rate limits.

Free credits

New accounts get a small free trial credit ($X, see console for current amount). This is enough to test roughly 100 basic chat calls. Free credit:

  • Counts as balance — you can use it on any model in any lane
  • Does not verify the account — to escape the 5 RPM cap, complete a real top-up
  • Does not expire, but if you use it up without topping up, the account stays rate-limited

Where to see usage

  • Console → Usage: per-day, per-model, per-key breakdown
  • Console → Billing: invoices, top-up history, current balance
  • API: GET /api/usage/... (see API reference)

What's NOT layered on top

Lazu's bill is just input × price + output × price (etc., per the formula above). There is no:

  • "Premium tier discount" stacking on top of lane price
  • "Loyalty multiplier" that reduces price over time
  • Hidden margin per cache read or per audio token beyond the listed per-mtok rate
  • Surcharge for weekends, regions, or model size

If you see a charge that doesn't match tokens × listed_price, that's a bug — open a ticket.

Streaming partial usage

When you call with stream: true:

  • Charges for tokens are deducted from your balance in real time as the tokens are generated.
  • If the client disconnects mid-stream, Lazu still bills for what was delivered (provided upstream reported it in a final usage trailer).
  • If upstream errors before any tokens reach you, you're refunded.

This means a stream aborted after 1,000 tokens because the user clicked "Cancel" still costs roughly the output price of those 1,000 tokens. The model already did the work; the client just stopped reading.
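As a rough worked example of a cancelled stream, the sketch below prices the delivered output tokens. The function name and the $2.00 price are hypothetical, and it assumes prices are quoted in USD per million tokens, so tokens × price is already in microUSD.

```python
def cancelled_stream_cost_micro_usd(output_tokens_delivered,
                                    output_price_usd_per_mtok,
                                    usage_trailer_received):
    """Estimate the output-side cost of an aborted stream in microUSD.

    ASSUMPTION: price is USD per million tokens (hypothetical value below).
    """
    if not usage_trailer_received:
        return 0  # no usage trailer from upstream: the call is refunded
    return round(output_tokens_delivered * output_price_usd_per_mtok)

cost = cancelled_stream_cost_micro_usd(1_000, 2.00, usage_trailer_received=True)
print(cost, cost / 1_000_000)  # 2000 microUSD, i.e. 0.002 USD
```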

Enterprise / volume contracts

For workloads sustained above $1,000/month, contact sales via lazu.ai. Volume terms are negotiated case by case.
