# Rate Limits
Rate limits are applied per-organization based on your plan tier. Limits are enforced using a token bucket algorithm.
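A token bucket grants each request a token from a bucket that refills at a constant rate, so short bursts up to the bucket's capacity are allowed while sustained throughput is capped. A minimal sketch in Python (the capacity and refill rate shown are illustrative, not the service's actual parameters):

```python
import time

class TokenBucket:
    """Minimal token-bucket sketch: `capacity` tokens, refilled at `rate` tokens/second."""

    def __init__(self, capacity: float, rate: float):
        self.capacity = capacity
        self.rate = rate
        self.tokens = capacity
        self.last = time.monotonic()

    def allow(self, cost: float = 1.0) -> bool:
        now = time.monotonic()
        # Refill in proportion to elapsed time, capped at the bucket's capacity.
        self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= cost:
            self.tokens -= cost
            return True
        return False

# Example: 50 requests per minute ≈ capacity 50, refill rate 50/60 tokens per second.
bucket = TokenBucket(capacity=50, rate=50 / 60)
```

A burst of 50 requests succeeds immediately; after that, requests are admitted roughly one every 1.2 seconds as tokens refill.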
## Plan tiers

| Limit | Free | Plus | Pro |
|---|---|---|---|
| Requests per minute | 50 | 300 | 1,200 |
| Input tokens per minute | 50,000 | 700,000 | 2,500,000 |
| Output tokens per minute | 7,000 | 140,000 | 450,000 |
| Max concurrent requests | 3 | 12 | 40 |
## Rate limit responses

When a limit is exceeded, the API returns `429 Too Many Requests`:
```json
{
  "error": {
    "message": "Rate limit exceeded. Please retry after 60s.",
    "type": "rate_limit_error",
    "param": null,
    "code": "rate_limit_exceeded"
  }
}
```

## Best practices
- Implement retries with exponential backoff — start with a 1-second delay and double on each retry, up to a maximum of 60 seconds.
- Use streaming for long completions to avoid tying up a concurrency slot while waiting for the full response.
- Monitor usage via the Analytics API to stay within your limits.
- Batch strategically — sending fewer, larger requests is more efficient than many small ones when you’re near the RPM limit.
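The backoff recommendation above can be sketched as follows. This is an illustrative example, not an official client: the function name is made up, and whether the API sends a `Retry-After` header is an assumption (the error message only mentions a retry delay in its text).

```python
import random
import time
import urllib.error
import urllib.request

def backoff_delay(attempt: int, base: float = 1.0, cap: float = 60.0) -> float:
    """Delay before retry number `attempt` (0-based): base * 2**attempt, capped at `cap` seconds."""
    return min(base * 2 ** attempt, cap)

def get_with_backoff(url: str, max_retries: int = 6) -> bytes:
    """GET `url`, retrying on HTTP 429 with exponential backoff and jitter (sketch)."""
    for attempt in range(max_retries + 1):
        try:
            with urllib.request.urlopen(url) as resp:
                return resp.read()
        except urllib.error.HTTPError as err:
            if err.code != 429 or attempt == max_retries:
                raise
            # Prefer the server's Retry-After hint, if present (an assumption here).
            retry_after = err.headers.get("Retry-After")
            wait = float(retry_after) if retry_after else backoff_delay(attempt)
            time.sleep(wait + random.uniform(0, 0.5))  # jitter avoids synchronized retries
    raise RuntimeError("unreachable")
```

The small random jitter keeps many clients that hit the limit at the same moment from retrying in lockstep and immediately exceeding it again.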