Rate Limits
ai& enforces rate limits per organization to keep the platform healthy under shared load. Limits depend on your tier and apply to inference endpoints; management APIs have their own (looser) limits.
| Tier | Description |
|---|---|
| Tier 0 | Evaluation tier. Lower per-minute caps. Suitable for development and small projects. New orgs start here. |
| Tier 1 | Production tier. Higher caps and access to higher-throughput models. Orgs are promoted on their first successful payment. |
What’s measured
Each request is checked against six buckets, four per-model and two org-wide (global):
- Per-model RPM — requests per minute against one specific model.
- Per-model Input TPM — input tokens per minute (estimated up-front from the request body).
- Per-model Output TPM — output tokens per minute (charged after the response completes).
- Per-model Concurrency — max in-flight requests against one model.
- Global RPM — requests per minute across all models you call.
- Global Concurrency — max in-flight requests across all models.
Whichever bucket fills first triggers throttling.
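As a rough mental model of "whichever bucket fills first", the sketch below treats each bucket as a (used, limit) pair and reports which ones are at capacity. The bucket keys reuse the policy names that appear in X-RateLimit-Policy; the function name `exhausted_buckets` and every number in `snapshot` are illustrative assumptions, not real tier limits or the platform's actual implementation.

```python
from typing import Dict, List, Tuple


def exhausted_buckets(usage: Dict[str, Tuple[int, int]]) -> List[str]:
    """Return the names of buckets whose usage has reached their limit."""
    return [name for name, (used, limit) in usage.items() if used >= limit]


# Hypothetical (used, limit) values for one org calling one model.
snapshot = {
    "rpm": (42, 60),
    "input_tpm": (100_000, 100_000),  # full: this one would deny the request
    "output_tpm": (3_000, 20_000),
    "concurrency": (2, 8),
    "global_rpm": (90, 300),
    "global_concurrency": (5, 16),
}

print(exhausted_buckets(snapshot))  # prints ['input_tpm']
```

In this example the request would be throttled on input tokens even though every other bucket still has headroom.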
Headers
Every response carries:
| Header | Meaning |
|---|---|
| X-RateLimit-Limit | Your effective RPM cap: whichever of the per-model or global RPM bucket is currently more constrained. |
| X-RateLimit-Remaining | Requests left in that same bucket. |
On a 429 Too Many Requests, two more headers are added:
| Header | Meaning |
|---|---|
| X-RateLimit-Policy | Which bucket denied the request: rpm, global_rpm, input_tpm, output_tpm, concurrency, or global_concurrency. |
| Retry-After | Seconds until the offending bucket has capacity again. Omitted for concurrency and global_concurrency rejections; finish or cancel in-flight requests instead. |
See Response Headers for non-rate-limit headers.
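One way a client might interpret the two 429 headers is sketched below. The header names and policy values come from the tables above; the `Throttle` dataclass, the `parse_throttle` function, and the idea of passing headers as a plain mapping are assumptions made for illustration, not part of any official SDK.

```python
from dataclasses import dataclass
from typing import Mapping, Optional

# Policies for which Retry-After is omitted, per the table above.
CONCURRENCY_POLICIES = {"concurrency", "global_concurrency"}


@dataclass
class Throttle:
    policy: Optional[str]          # value of X-RateLimit-Policy, if present
    retry_after: Optional[float]   # seconds to wait, or None if omitted
    wait_for_in_flight: bool       # True when the reject was concurrency-based


def parse_throttle(headers: Mapping[str, str]) -> Throttle:
    """Extract rate-limit details from the headers of a 429 response."""
    # HTTP header names are case-insensitive, so normalize before lookup.
    lowered = {name.lower(): value for name, value in headers.items()}
    policy = lowered.get("x-ratelimit-policy")
    raw = lowered.get("retry-after")
    return Throttle(
        policy=policy,
        retry_after=float(raw) if raw is not None else None,
        wait_for_in_flight=policy in CONCURRENCY_POLICIES,
    )
```

When `wait_for_in_flight` is true there is no time-based delay to honor, so a caller would typically wait for one of its own outstanding requests to finish before sending another.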
429 responses
When throttled, ai& returns 429 Too Many Requests. Back off and retry; exponential backoff with jitter is recommended. Retry-After tells you the minimum safe delay for time-based rejections.
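A minimal sketch of exponential backoff with jitter that also treats Retry-After as a lower bound on the delay. The `send` callable is a hypothetical stand-in for whatever HTTP client you use and is assumed to return a `(status, headers, body)` tuple with the header key spelled exactly `Retry-After`; the base delay, cap, and attempt count are arbitrary illustrative defaults, not recommended values.

```python
import random
import time


def backoff_delay(attempt, retry_after=None, base=0.5, cap=30.0):
    """Full-jitter exponential backoff, never shorter than Retry-After."""
    ceiling = min(cap, base * (2 ** attempt))
    delay = random.uniform(0, ceiling)
    if retry_after is not None:
        # The server-provided minimum wins over the randomized delay.
        delay = max(delay, retry_after)
    return delay


def call_with_backoff(send, max_attempts=5, sleep=time.sleep):
    """Call send() until it returns a non-429 status or attempts run out.

    send is a hypothetical callable returning (status, headers, body).
    """
    for attempt in range(max_attempts):
        status, headers, body = send()
        if status != 429:
            return status, headers, body
        if attempt == max_attempts - 1:
            break  # out of attempts; return the last 429 to the caller
        raw = headers.get("Retry-After")
        retry_after = float(raw) if raw is not None else None
        sleep(backoff_delay(attempt, retry_after))
    return status, headers, body
```

Passing `sleep` as a parameter keeps the helper easy to test. For concurrency-based rejections, where Retry-After is omitted, the jittered delay alone applies, although waiting for an in-flight request to complete is usually the more direct fix.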