Rate Limits

Rate limits are applied per organization and depend on your plan tier. Limits are enforced using a token bucket algorithm, so capacity refills continuously over time rather than resetting at fixed intervals.
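
As an illustration of how a token bucket behaves, here is a minimal Python sketch. It is not the service's actual implementation: the class name, the `capacity` and `rate` parameters, and the injectable `clock` are all assumptions made for the example, and the real burst capacity for each tier is not documented here.

```python
import time


class TokenBucket:
    """Illustrative token bucket holding up to `capacity` tokens,
    refilled continuously at `rate` tokens per second."""

    def __init__(self, capacity, rate, clock=time.monotonic):
        self.capacity = float(capacity)
        self.rate = float(rate)
        self.clock = clock              # injectable so the bucket can be tested
        self.tokens = float(capacity)   # start with a full bucket
        self.updated = clock()

    def try_consume(self, cost=1.0):
        """Take `cost` tokens if available; return False if the caller is over the limit."""
        now = self.clock()
        # Refill for the time elapsed since the last call, capped at capacity.
        self.tokens = min(self.capacity, self.tokens + (now - self.updated) * self.rate)
        self.updated = now
        if cost <= self.tokens:
            self.tokens -= cost
            return True
        return False
```

Under these assumptions, a limit of 50 requests per minute would correspond to something like `TokenBucket(capacity=50, rate=50 / 60)`, with each request consuming one token.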

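| Limit                    | Free   | Plus    | Pro       |
|--------------------------|--------|---------|-----------|
| Requests per minute      | 50     | 300     | 1,200     |
| Input tokens per minute  | 50,000 | 700,000 | 2,500,000 |
| Output tokens per minute | 7,000  | 140,000 | 450,000   |
| Max concurrent requests  | 3      | 12      | 40        |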

When a limit is exceeded, the API returns 429 Too Many Requests:

{
  "error": {
    "message": "Rate limit exceeded. Please retry after 60s.",
    "type": "rate_limit_error",
    "param": null,
    "code": "rate_limit_exceeded"
  }
}
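
A client can recognize this response by checking both the status code and the `code` field of the error body shown above. The following Python sketch assumes the caller already has the HTTP status and the raw response text; the function name `is_rate_limit_error` is hypothetical and not part of any official SDK.

```python
import json


def is_rate_limit_error(status_code, body_text):
    """Return True if the response matches the rate-limit error shape shown above."""
    if status_code != 429:
        return False
    try:
        payload = json.loads(body_text)
    except ValueError:
        # Body is not valid JSON, so it is not the documented error shape.
        return False
    error = payload.get("error") if isinstance(payload, dict) else None
    return isinstance(error, dict) and error.get("code") == "rate_limit_exceeded"
```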

To stay within your limits and recover from 429 responses:

  • Implement retries with exponential backoff: start with a 1-second delay and double it on each retry, up to a maximum of 60 seconds.
  • Use streaming for long completions to avoid tying up a concurrency slot while waiting for the full response.
  • Monitor usage via the Analytics API to stay within your limits.
  • Batch strategically: sending fewer, larger requests is more efficient than sending many small ones when you’re near the RPM limit.
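
The retry guidance above (1-second initial delay, doubling, 60-second cap) can be sketched in Python as follows. This is a minimal illustration under stated assumptions: `RateLimitError`, `backoff_delays`, `call_with_retries`, and the `max_retries` default of 8 are hypothetical names and values chosen for the example, and `send` stands in for whatever function performs the actual request and raises on a 429.

```python
import time


class RateLimitError(Exception):
    """Hypothetical exception a client raises when it receives a 429."""


def backoff_delays(initial=1.0, maximum=60.0, max_retries=8):
    """Yield delays that start at `initial` and double up to `maximum`."""
    delay = initial
    for _ in range(max_retries):
        yield delay
        delay = min(delay * 2, maximum)


def call_with_retries(send, max_retries=8, sleep=time.sleep):
    """Call `send()`, sleeping with exponential backoff after each RateLimitError."""
    for delay in backoff_delays(max_retries=max_retries):
        try:
            return send()
        except RateLimitError:
            sleep(delay)
    # Final attempt after the last delay; any error now propagates to the caller.
    return send()
```

With the defaults, the delays are 1, 2, 4, 8, 16, 32, 60, and 60 seconds. Many clients also add random jitter to each delay so that concurrent workers do not retry in lockstep; that is a common refinement and not part of the guidance above.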