Rate limits
Lazu protects backends with two layers of limits:
- Tier-based RPM/TPM: every account belongs to a tier (`Tier0`..`Tier3`) based on rolling spend. A higher tier means higher rate caps.
- Anti-abuse for unverified accounts: if you've never funded the account (no successful top-up), you're capped at a tiny rate regardless of tier.
Both limits apply per user (the account behind the API key), not globally, so your usage doesn't affect other Lazu customers.
Defaults
| Tier | Eligibility | RPM | TPM |
|---|---|---|---|
| Unverified | No successful top-up yet | 5 | 5,000 |
| Tier 0 | Verified (paid at least once), 30-day spend < $10 | 60 | 100,000 |
| Tier 1 | 30-day spend ≥ $10 | 120 | 300,000 |
| Tier 2 | 30-day spend ≥ $100 | 300 | 1,000,000 |
| Tier 3 | 30-day spend ≥ $1,000 | 600 | 2,000,000 |
- RPM = requests per minute (rolling 60-second window)
- TPM = tokens per minute, counting both input and estimated output
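If you'd rather stay under these caps than react to 429s, you can throttle on the client side. The sketch below is a client-side rolling-window limiter, not how Lazu enforces limits on its servers. It assumes both RPM and TPM use the same rolling 60-second window and that you pass in your own per-request token estimate (input plus expected output). `RollingWindowLimiter` and `try_acquire` are hypothetical names, not part of any Lazu SDK.

```python
import time
from collections import deque
from typing import Deque, Optional, Tuple


class RollingWindowLimiter:
    """Hypothetical client-side throttle that keeps RPM and TPM under given caps.

    Assumption: both caps are measured over the same rolling window
    (60 seconds by default), and the caller supplies a token estimate.
    """

    def __init__(self, rpm: int, tpm: int, window: float = 60.0) -> None:
        self.rpm = rpm
        self.tpm = tpm
        self.window = window
        # One entry per accepted request: (timestamp, estimated tokens).
        self._events: Deque[Tuple[float, int]] = deque()

    def _evict(self, now: float) -> None:
        # Drop requests that have slid out of the rolling window.
        while self._events and now - self._events[0][0] >= self.window:
            self._events.popleft()

    def try_acquire(self, tokens: int, now: Optional[float] = None) -> bool:
        """Record the request and return True if it fits under both caps."""
        now = time.monotonic() if now is None else now
        self._evict(now)
        # At most `rpm` entries are kept, so summing is cheap.
        used_tokens = sum(t for _, t in self._events)
        if len(self._events) + 1 > self.rpm or used_tokens + tokens > self.tpm:
            return False
        self._events.append((now, tokens))
        return True
```

For example, a Tier 0 client could construct `RollingWindowLimiter(rpm=60, tpm=100_000)` and call `try_acquire(estimated_tokens)` before each request, waiting briefly whenever it returns `False`.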
How tier is determined
- A new account is `Unverified` until its first successful top-up, after which it becomes `Tier0`.
- Your tier is recalculated daily at 04:30 (Asia/Shanghai time) from your 30-day rolling spend.
- A single top-up immediately upgrades you to the tier matching your lifetime top-up total, so you don't have to wait for the daily job.
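The tier rules can be summarized as a lookup over the Defaults table. This is a hypothetical helper (`tier_for`, `TIER_THRESHOLDS`, and `RATE_CAPS` are illustrative names, not Lazu code). It takes a spend figure: the 30-day rolling spend for the daily job, or the lifetime top-up total at top-up time, assuming the same thresholds apply in both cases.

```python
# Hypothetical helper mirroring the Defaults table; not Lazu's actual code.

# (tier name, minimum spend in USD), checked from highest to lowest.
TIER_THRESHOLDS = [
    ("Tier 3", 1_000.0),
    ("Tier 2", 100.0),
    ("Tier 1", 10.0),
    ("Tier 0", 0.0),
]

# Default (RPM, TPM) per tier, copied from the Defaults table.
RATE_CAPS = {
    "Unverified": (5, 5_000),
    "Tier 0": (60, 100_000),
    "Tier 1": (120, 300_000),
    "Tier 2": (300, 1_000_000),
    "Tier 3": (600, 2_000_000),
}


def tier_for(has_topped_up: bool, spend_usd: float) -> str:
    """Return the tier name for an account.

    Assumption: `spend_usd` is the 30-day rolling spend (daily job) or
    the lifetime top-up total (immediate upgrade on top-up); both are
    compared against the same thresholds.
    """
    if not has_topped_up:
        return "Unverified"
    for name, minimum in TIER_THRESHOLDS:
        if spend_usd >= minimum:
            return name
    # Negative inputs fall back to the lowest verified tier.
    return "Tier 0"
```

For instance, `RATE_CAPS[tier_for(True, 42.0)]` gives `(120, 300_000)`, the Tier 1 caps.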
When you hit a limit
HTTP 429 with:

```json
{
  "error": {
    "message": "Request rate limit exceeded (60/min)",
    "type": "rate_limit_exceeded",
    "code": "request_rate_limit_exceeded"
  }
}
```

Response headers include `Retry-After: 60` (seconds).
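If you call the API over raw HTTP, a small parser can turn a 429 response into a structured value. This sketch assumes the body always carries the `error` object shown above and that `Retry-After` is given in whole seconds (HTTP also allows an HTTP-date form, which this ignores). `parse_rate_limit` and `RateLimitError` are hypothetical names.

```python
import json
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass
class RateLimitError:
    message: str
    error_type: str
    code: str
    retry_after: Optional[float]  # seconds; None if missing or not numeric


def parse_rate_limit(
    status: int, headers: Mapping[str, str], body: str
) -> Optional[RateLimitError]:
    """Return a RateLimitError for an HTTP 429 response, else None."""
    if status != 429:
        return None
    error = json.loads(body).get("error", {})
    # HTTP header names are case-insensitive, so normalize before lookup.
    lowered = {name.lower(): value for name, value in headers.items()}
    raw = lowered.get("retry-after", "").strip()
    return RateLimitError(
        message=error.get("message", ""),
        error_type=error.get("type", ""),
        code=error.get("code", ""),
        # Only the delta-seconds form is handled in this sketch.
        retry_after=float(raw) if raw.isdigit() else None,
    )
```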
Best practice: catch 429, sleep for `Retry-After` seconds, then retry, switching to
exponential backoff if you hit the limit twice in a row. The major OpenAI and Anthropic
SDKs already do this, so make sure you haven't disabled retries.
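For raw HTTP clients that don't get SDK retries, the policy above can be sketched as follows. `call_with_retries` and the `send` callable (one request returning `(status, headers, body)`) are hypothetical. The first 429 waits exactly `Retry-After` seconds; each further consecutive 429 doubles that wait and adds a little random jitter. A missing or non-numeric `Retry-After` falls back to `base_delay`, an assumed default.

```python
import random
import time
from typing import Callable, Mapping, Tuple

# (status code, response headers, response body)
Response = Tuple[int, Mapping[str, str], str]


def call_with_retries(
    send: Callable[[], Response],
    max_attempts: int = 5,
    base_delay: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> Response:
    """Retry on 429: honor Retry-After first, then back off exponentially."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    attempt = 1
    while True:
        status, headers, body = send()
        # Return successes, non-429 errors, and the final attempt as-is.
        if status != 429 or attempt >= max_attempts:
            return status, headers, body
        lowered = {k.lower(): v for k, v in headers.items()}
        retry_after = lowered.get("retry-after", "").strip()
        delay = float(retry_after) if retry_after.isdigit() else base_delay
        if attempt >= 2:
            # Hit the limit twice or more in a row: double per extra 429, plus jitter.
            delay = delay * 2 ** (attempt - 1) + random.uniform(0, base_delay)
        sleep(delay)
        attempt += 1
```

The `sleep` parameter is injectable so the policy can be tested without real waiting. You might also treat upstream 503s as retryable, since they can propagate through as described under Per-vendor sub-limits.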
Streaming concurrency
Streaming (`stream: true`) responses count as one request for RPM, and
their tokens flush into TPM as they arrive. A long-running stream that
emits 10,000 output tokens over 30 seconds is one RPM request but consumes
roughly 10,000 TPM credit in the minute(s) it's active.
There's no separate "concurrent streams" cap beyond the RPM/TPM math.
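The RPM/TPM math for streams can be made concrete with a simple steady-state model: each stream costs one request toward RPM and all of its tokens (input plus output) toward TPM, so the number of such streams you can start per minute is bounded by both. `max_streams_per_minute` is a hypothetical helper, and the model ignores how a stream's tokens may straddle two minute windows.

```python
def max_streams_per_minute(
    rpm: int, tpm: int, input_tokens: int, output_tokens: int
) -> int:
    """Upper bound on streams started per minute under both RPM and TPM.

    Simple model: each stream is 1 request toward RPM and
    input_tokens + output_tokens toward TPM. With no separate
    concurrent-stream cap, these two limits are the only bound.
    """
    tokens_per_stream = input_tokens + output_tokens
    if tokens_per_stream <= 0:
        raise ValueError("a stream must consume at least one token")
    return min(rpm, tpm // tokens_per_stream)
```

For example, on Tier 1 (120 RPM, 300,000 TPM) with 500-token prompts and 10,000-token outputs, this gives `min(120, 300_000 // 10_500) = 28`, so TPM rather than RPM is the binding limit.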
Per-vendor sub-limits
Lazu doesn't impose per-vendor sub-limits of its own, but upstream providers such as
OpenAI and Anthropic still apply their own caps to the underlying API key pool.
On a busy day you may see an upstream 429 or 503 propagate through. When possible,
Lazu's routing retries against backup channels in the same lane.
Need higher limits?
For sustained high-throughput workloads (1,000+ RPM, several million TPM), contact support via lazu.ai. Limits can be raised on a per-account basis, but there is no public commitment to specific raised limits.
See also
- Errors — full error code table
- Billing — how spend rolls up to tier
- Pricing & lanes — choosing direct vs cheap