
Rate limits

Lazu protects backends with two layers of limits:

  1. Tier-based RPM/TPM: every account belongs to a tier (Tier 0 through Tier 3) based on its 30-day rolling spend. Higher tiers get higher rate caps.
  2. Anti-abuse for unverified accounts: if you've never funded the account (no successful top-up), you're capped at the low Unverified rate regardless of tier.

Both limits apply per API key user, not globally — your usage doesn't affect other Lazu customers.

Defaults

Tier        Eligibility                                         RPM   TPM
Unverified  No successful top-up yet                            5     5,000
Tier 0      Verified (paid at least once), 30-day spend < $10   60    100,000
Tier 1      30-day spend ≥ $10                                  120   300,000
Tier 2      30-day spend ≥ $100                                 300   1,000,000
Tier 3      30-day spend ≥ $1,000                               600   2,000,000

  • RPM = requests per minute (rolling 60-second window)
  • TPM = tokens per minute, counting both input and estimated output
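
To stay under an RPM cap from the client side, you can track request timestamps in a rolling 60-second window yourself. The sketch below is a minimal illustration only: `SlidingWindowLimiter` and its methods are hypothetical names, and it does not claim to match how Lazu counts requests on the server.

```python
import time
from collections import deque


class SlidingWindowLimiter:
    """Client-side guard: at most `limit` events per rolling window (hypothetical helper)."""

    def __init__(self, limit, window_seconds=60.0, clock=time.monotonic):
        self.limit = limit
        self.window = window_seconds
        self.clock = clock
        self._events = deque()  # timestamps of accepted events, oldest first

    def _evict(self, now):
        # Drop timestamps that have left the rolling window.
        while self._events and now - self._events[0] >= self.window:
            self._events.popleft()

    def try_acquire(self):
        """Record the event and return True if it fits in the window, else return False."""
        now = self.clock()
        self._evict(now)
        if len(self._events) < self.limit:
            self._events.append(now)
            return True
        return False

    def seconds_until_free(self):
        """Seconds until the oldest event leaves the window (0.0 if there is room now)."""
        now = self.clock()
        self._evict(now)
        if len(self._events) < self.limit:
            return 0.0
        return self.window - (now - self._events[0])
```

A caller would construct it with its tier's RPM (for example `SlidingWindowLimiter(60)` for Tier 0) and sleep for `seconds_until_free()` whenever `try_acquire()` returns False.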

How tier is determined

  • A new account is Unverified until its first successful top-up, then becomes Tier 0.
  • Tiers auto-adjust daily at 04:30 (Asia/Shanghai) based on 30-day rolling spend.
  • A single top-up immediately upgrades you to the tier matching your lifetime top-up total, so you don't have to wait for the daily job.
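
The daily tier rule can be sketched as a simple threshold lookup. This is an illustration built from the Defaults table, reading the Unverified row as 5 RPM and 5,000 TPM; `Limits`, `TIERS`, and `limits_for` are hypothetical names, and the immediate upgrade on top-up (which uses the lifetime total) is deliberately not modelled.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Limits:
    name: str
    rpm: int
    tpm: int


UNVERIFIED = Limits("Unverified", 5, 5_000)

# (minimum 30-day spend in USD, limits), highest tier first.
TIERS = [
    (1000.0, Limits("Tier 3", 600, 2_000_000)),
    (100.0, Limits("Tier 2", 300, 1_000_000)),
    (10.0, Limits("Tier 1", 120, 300_000)),
    (0.0, Limits("Tier 0", 60, 100_000)),
]


def limits_for(has_successful_top_up: bool, spend_30d_usd: float) -> Limits:
    """Return the default limits for an account under the daily 30-day-spend rule."""
    if not has_successful_top_up:
        # The anti-abuse cap applies before any tier logic.
        return UNVERIFIED
    for threshold, limits in TIERS:
        if spend_30d_usd >= threshold:
            return limits
    # Fallback for unexpected negative spend values.
    return TIERS[-1][1]
```

For example, `limits_for(True, 42.0)` falls in the $10 to $100 band and returns the Tier 1 limits.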

When you hit a limit

The API responds with HTTP 429 and a body like:

{
  "error": {
    "message": "Request rate limit exceeded (60/min)",
    "type": "rate_limit_exceeded",
    "code": "request_rate_limit_exceeded"
  }
}

Response headers include Retry-After: 60 (seconds).

Best practice: catch the 429, sleep for the number of seconds given in Retry-After, then retry. If you hit a second 429 in a row, switch to exponential backoff. The major OpenAI / Anthropic SDKs already do this, so make sure you haven't disabled retries.
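
If you are not using an SDK with built-in retries, the pattern can be sketched roughly as below. This is one possible reading of the advice, not a reference implementation: `send` is a hypothetical zero-argument callable returning `(status, headers, body)`, the first wait equals Retry-After, and each consecutive 429 doubles the wait up to an assumed `max_delay`, with a small random jitter.

```python
import random
import time


def retry_after_seconds(headers, default=1.0):
    """Read Retry-After (in seconds) from a header mapping, case-insensitively."""
    for key, value in headers.items():
        if key.lower() == "retry-after":
            try:
                return max(0.0, float(value))
            except (TypeError, ValueError):
                return default
    return default


def call_with_retries(send, max_attempts=4, max_delay=300.0,
                      sleep=time.sleep, rng=random.random):
    """Call `send()` until it returns a non-429 status or `max_attempts` is reached."""
    streak = 0  # number of consecutive 429 responses seen so far
    while True:
        status, headers, body = send()
        if status != 429:
            return status, headers, body
        streak += 1
        if streak >= max_attempts:
            return status, headers, body  # give up and surface the 429
        # First 429: wait Retry-After. Later ones: double the wait each time.
        wait = min(max_delay, retry_after_seconds(headers) * 2 ** (streak - 1))
        sleep(wait + rng() * 0.1 * wait)  # up to 10% jitter
```

The `sleep` and `rng` parameters exist only so the function can be exercised without real delays; in normal use the defaults apply.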

Streaming concurrency

Streaming (stream: true) responses count as one request for RPM, and their tokens flush into TPM as they arrive. A long-running stream that emits 10,000 output tokens over 30 seconds is one RPM request but consumes roughly 10,000 TPM credit in the minute(s) it's active.

There's no separate "concurrent streams" cap beyond the RPM/TPM math.
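
Because streams are bounded only by the RPM/TPM math, a rough capacity estimate is the smaller of the RPM cap and the TPM cap divided by tokens per request. The sketch below is back-of-the-envelope arithmetic under assumptions: `max_requests_per_minute` is a hypothetical helper, the 1,000 input tokens in the usage note is an invented figure, and the exact timing of Lazu's token accounting is not modelled.

```python
def max_requests_per_minute(rpm, tpm, input_tokens, output_tokens):
    """Estimate how many requests of a given size fit in one minute.

    Each request costs one unit of RPM and (input + output) tokens of TPM,
    so the binding constraint is whichever cap runs out first.
    """
    tokens_per_request = input_tokens + output_tokens
    if tokens_per_request <= 0:
        return rpm
    return min(rpm, tpm // tokens_per_request)
```

With the Tier 0 defaults (60 RPM, 100,000 TPM) and streams of about 1,000 input plus 10,000 output tokens, `max_requests_per_minute(60, 100_000, 1_000, 10_000)` gives 9, so TPM rather than RPM is the binding limit for long streams.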

Per-vendor sub-limits

Lazu doesn't enforce vendor-side limits — upstream providers like OpenAI and Anthropic still apply their own caps to the underlying API key pool. On a busy day you may see upstream 429 or 503 propagate through. Lazu's routing will retry against backup channels in the same lane when possible.
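
When logging failures, it can help to separate Lazu's own rate limit from errors that may have originated upstream. The sketch below relies on an assumption the text does not confirm: that only Lazu's own limit carries the error type `rate_limit_exceeded` shown earlier, and that propagated upstream errors look different. `classify_failure` and its labels are hypothetical.

```python
import json


def classify_failure(status, body_text):
    """Label a failed response as Lazu's rate limit, a possible upstream error, or other."""
    error_type = None
    try:
        payload = json.loads(body_text)
        if isinstance(payload, dict) and isinstance(payload.get("error"), dict):
            error_type = payload["error"].get("type")
    except (TypeError, ValueError):
        pass  # non-JSON or missing body: fall back to the status code alone

    if status == 429 and error_type == "rate_limit_exceeded":
        return "lazu_rate_limit"
    if status in (429, 503):
        # Assumption: anything else with these statuses may have come from upstream.
        return "possibly_upstream"
    return "other"
```

Both of the first two labels are worth retrying; the distinction mainly tells you whether moving to a higher tier would help.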

Need higher limits?

For sustained high-throughput workloads (1,000+ RPM, several million TPM), contact support via lazu.ai — limits can be raised per-account with no public commitment.
