Rate limits

Lazu protects backends with three layers of limits:

Tier-based RPM/TPM: every account belongs to a tier (Tier 0 to Tier 3) based on rolling 30-day spend. A higher tier means higher rate caps.
Anti-abuse for unverified accounts: if you've never funded the account (no successful top-up), you're capped at the Unverified rate (5 RPM, 5,000 TPM).
Per-key RPM/TPM — you can set a stricter cap on an individual API key for product or workload isolation. A per-key limit never raises the account tier ceiling; it can only make that key stricter.

All limits are scoped to your own account and API keys, not shared globally, so your usage doesn't affect other Lazu customers.

Defaults

Tier	Eligibility	RPM	TPM
Unverified	No successful top-up yet	5	5,000
Tier 0	Verified (paid at least once), 30-day spend < $10	60	100,000
Tier 1	30-day spend ≥ $10	120	300,000
Tier 2	30-day spend ≥ $100	300	1,000,000
Tier 3	30-day spend ≥ $1,000	600	2,000,000

RPM = requests per minute (rolling 60-second window)
TPM = tokens per minute. Request preflight counts prompt/input tokens plus an explicit output reservation when the request includes max_tokens, max_completion_tokens, or max_output_tokens. If no maximum output is present, preflight uses prompt/input tokens only.
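As a rough illustration of the preflight rule above, the sketch below estimates how many tokens a request would count against TPM before it is sent. The function name and the choice to use the first maximum-output field found are assumptions made for the example, not documented Lazu behavior; the prompt token count is assumed to come from your own tokenizer.

```python
from typing import Mapping

# Field names the text lists as explicit output maximums.
MAX_OUTPUT_FIELDS = ("max_tokens", "max_completion_tokens", "max_output_tokens")


def preflight_tokens(prompt_tokens: int, request: Mapping[str, object]) -> int:
    """Estimate the preflight count: prompt tokens plus any explicit output reservation."""
    for field in MAX_OUTPUT_FIELDS:
        value = request.get(field)
        # Assumption: the first positive integer maximum found is the reservation.
        if isinstance(value, int) and not isinstance(value, bool) and value > 0:
            return prompt_tokens + value
    # No maximum output present: only prompt/input tokens are counted.
    return prompt_tokens
```

For example, a 1,200-token prompt with max_tokens set to 800 would reserve 2,000 tokens under this estimate, while the same prompt with no maximum would count 1,200.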

Per-key limits

Per-key RPM/TPM limits are configured on the token in the console. Use them when one product, environment, or integration needs its own ceiling without slowing the rest of the account.

0 means no key-level cap on that dimension; the account tier still applies.
The effective cap is the stricter of the key cap and the account tier cap.
Keys can carry an optional product label and JSON metadata for operational attribution and audits.
Token policy changes are audit-logged without storing the plaintext key.
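The "stricter of the two" rule can be sketched as a small helper. This is only an illustration of the rule as described above; the function name is hypothetical and the real enforcement happens server-side.

```python
def effective_cap(key_cap: int, tier_cap: int) -> int:
    """Return the cap that applies to a key on one dimension (RPM or TPM)."""
    # 0 means no key-level cap, so the account tier cap applies unchanged.
    if key_cap <= 0:
        return tier_cap
    # Otherwise the stricter (smaller) value wins; a key can never exceed the tier.
    return min(key_cap, tier_cap)
```

With a Tier 1 account (120 RPM), a key cap of 30 yields 30 RPM, a key cap of 0 yields 120 RPM, and a key cap of 500 still yields 120 RPM.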

How tier is determined

A new account is Unverified before its first successful top-up and Tier 0 after it.
Tier auto-adjusts daily at 04:30 (Asia/Shanghai) based on 30-day rolling spend.
A single top-up immediately upgrades you to the tier matching your lifetime top-up total, so you don't have to wait for the daily job.
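To estimate which defaults apply to an account, the thresholds from the Defaults table can be encoded as a lookup. This is a client-side sketch for planning purposes only: the names are hypothetical, and it models the 30-day spend rule from the table, not the immediate top-up upgrade or any internal Lazu logic.

```python
from typing import NamedTuple


class Tier(NamedTuple):
    name: str
    rpm: int
    tpm: int


UNVERIFIED = Tier("Unverified", 5, 5_000)

# (minimum 30-day spend in USD, tier), ordered from highest threshold to lowest.
SPEND_TIERS = [
    (1000.0, Tier("Tier 3", 600, 2_000_000)),
    (100.0, Tier("Tier 2", 300, 1_000_000)),
    (10.0, Tier("Tier 1", 120, 300_000)),
    (0.0, Tier("Tier 0", 60, 100_000)),
]


def tier_for(has_topped_up: bool, spend_30d_usd: float) -> Tier:
    """Look up the default tier for an account from the Defaults table."""
    if not has_topped_up:
        return UNVERIFIED
    for threshold, tier in SPEND_TIERS:
        if spend_30d_usd >= threshold:
            return tier
    return SPEND_TIERS[-1][1]
```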

When you hit a limit

Lazu returns HTTP 429 with a JSON error body:

{
  "error": {
    "message": "Request rate limit exceeded (60/min)",
    "type": "rate_limit_exceeded",
    "code": "request_rate_limit_exceeded"
  }
}

request_rate_limit_exceeded means an account/tier or success-request cap was hit. token_rate_limit_exceeded means the individual key's RPM/TPM cap was hit (here "token" refers to the API key, not to TPM tokens). total_request_rate_limit_exceeded is the legacy total-request guard.
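When logging or alerting on 429s, it can help to map the error code to the layer that rejected the request. The helper below is a minimal sketch that parses the error body shown above; the function name and the fallback message are assumptions for the example.

```python
import json

# Descriptions follow the three codes listed in the text above.
EXPLANATIONS = {
    "request_rate_limit_exceeded": "account/tier or success-request cap",
    "token_rate_limit_exceeded": "per-key RPM/TPM cap",
    "total_request_rate_limit_exceeded": "legacy total-request guard",
}


def explain_429(body: str) -> str:
    """Return a short description of which limit a 429 error body refers to."""
    code = json.loads(body).get("error", {}).get("code", "")
    return EXPLANATIONS.get(code, "unknown rate-limit code: " + code)
```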

Response headers include Retry-After: 60 (seconds).

Best practice: catch 429, sleep Retry-After seconds, then retry with exponential backoff if you hit it twice in a row. The major OpenAI / Anthropic SDKs already do this — make sure you haven't disabled retries.
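If you are not using an SDK with built-in retries, the practice above can be sketched as follows. The example is transport-agnostic: send is a hypothetical callable you supply that performs one HTTP request and returns (status, headers, body). The attempt limit and the 60-second fallback are assumptions chosen for illustration.

```python
import time
from typing import Callable, Mapping, Tuple

Response = Tuple[int, Mapping[str, str], bytes]


def call_with_retry(
    send: Callable[[], Response],
    max_attempts: int = 5,
    sleep: Callable[[float], None] = time.sleep,
) -> Response:
    """Retry on HTTP 429: wait Retry-After once, then double the wait on each repeat."""
    consecutive_429 = 0
    response = send()
    for _ in range(max_attempts - 1):
        status, headers, _body = response
        if status != 429:
            return response
        consecutive_429 += 1
        try:
            # Header lookup is case-sensitive for a plain dict; normalize if needed.
            base = float(headers.get("Retry-After", "60"))
        except ValueError:
            base = 60.0  # assumption: fall back to 60 s if the value is not numeric
        # First 429 waits Retry-After; consecutive 429s back off exponentially.
        sleep(base * 2 ** (consecutive_429 - 1))
        response = send()
    return response
```

The last response is returned even if it is still a 429, so the caller can decide whether to surface the error or queue the work for later.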

Streaming concurrency

Streaming (stream: true) responses count as one request for RPM. For TPM preflight, Lazu uses the same prompt + explicit max-output reservation rule as non-streaming requests. If you omit a max-output value, only prompt/input tokens are available to the preflight limiter.

There's no separate "concurrent streams" cap beyond the RPM/TPM math.

Per-vendor sub-limits

Lazu doesn't enforce vendor-side limits — upstream providers like OpenAI and Anthropic still apply their own caps to the underlying API key pool. On a busy day you may see upstream 429 or 503 propagate through. Lazu's routing will retry against backup channels in the same lane when possible.

Need higher limits?

For sustained high-throughput workloads (1,000+ RPM, several million TPM), contact support via lazu.ai — limits can be raised per-account with no public commitment.
