# Rate Limits
Rate limits are applied per-organization based on your plan tier. Limits are enforced using a token bucket algorithm.
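A token bucket grants each request a token from a bucket that refills at a constant rate, so short bursts up to the bucket's capacity are allowed while sustained throughput is capped. A minimal sketch in Python (the capacity and refill rate shown are illustrative, not the service's actual parameters):

```python
import time

class TokenBucket:
    """Minimal token-bucket sketch: `capacity` tokens, refilled at `rate` tokens/second."""

    def __init__(self, capacity: float, rate: float):
        self.capacity = capacity
        self.rate = rate
        self.tokens = capacity
        self.last = time.monotonic()

    def allow(self, cost: float = 1.0) -> bool:
        now = time.monotonic()
        # Refill in proportion to elapsed time, capped at the bucket's capacity.
        self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= cost:
            self.tokens -= cost
            return True
        return False

# Example: 50 requests per minute ≈ capacity 50, refill rate 50/60 tokens per second.
bucket = TokenBucket(capacity=50, rate=50 / 60)
```

A burst of 50 requests succeeds immediately; after that, requests are admitted roughly one every 1.2 seconds as tokens refill.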
## Plan tiers

| Limit | Free | Plus | Pro |
|---|---|---|---|
| Requests per minute | 50 | 300 | 1,200 |
| Input tokens per minute | 50,000 | 700,000 | 2,500,000 |
| Output tokens per minute | 7,000 | 140,000 | 450,000 |
| Max concurrent requests | 3 | 12 | 40 |
## Rate limit responses

When a limit is exceeded, the API returns `429 Too Many Requests`:
```json
{
  "error": {
    "message": "Rate limit exceeded. Please retry after 60s.",
    "type": "rate_limit_error",
    "param": null,
    "code": "rate_limit_exceeded"
  }
}
```

## Best practices
- Implement retries with exponential backoff — start with a 1-second delay and double on each retry, up to a maximum of 60 seconds.
- Use streaming for long completions to avoid tying up a concurrency slot while waiting for the full response.
- Monitor usage via the Analytics API to stay within your limits.
- Batch strategically — sending fewer, larger requests is more efficient than many small ones when you’re near the RPM limit.
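The backoff recommendation above can be sketched as follows. This is an illustrative example, not an official client: the function name is made up, and whether the API sends a `Retry-After` header is an assumption (the error message only mentions a retry delay in its text).

```python
import random
import time
import urllib.error
import urllib.request

def backoff_delay(attempt: int, base: float = 1.0, cap: float = 60.0) -> float:
    """Delay before retry number `attempt` (0-based): base * 2**attempt, capped at `cap` seconds."""
    return min(base * 2 ** attempt, cap)

def get_with_backoff(url: str, max_retries: int = 6) -> bytes:
    """GET `url`, retrying on HTTP 429 with exponential backoff and jitter (sketch)."""
    for attempt in range(max_retries + 1):
        try:
            with urllib.request.urlopen(url) as resp:
                return resp.read()
        except urllib.error.HTTPError as err:
            if err.code != 429 or attempt == max_retries:
                raise
            # Prefer the server's Retry-After hint, if present (an assumption here).
            retry_after = err.headers.get("Retry-After")
            wait = float(retry_after) if retry_after else backoff_delay(attempt)
            time.sleep(wait + random.uniform(0, 0.5))  # jitter avoids synchronized retries
    raise RuntimeError("unreachable")
```

The small random jitter keeps many clients that hit the limit at the same moment from retrying in lockstep and immediately exceeding it again.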