Pricing
ai& bills inference per token and deducts the cost from your prepaid credit balance after each request completes. Top up via Credits & Top-Up.
The formula

cost_usd = (input_tokens / 1_000_000) × input_per_1m + (output_tokens / 1_000_000) × output_per_1m

Both `input_per_1m` and `output_per_1m` are USD per 1 million tokens, served on each model row via `GET /v1/models`.
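
As a rough illustration, the formula can be sketched in Python. The function name `estimate_cost_usd` and the rates in `example_model` are hypothetical and chosen only for the arithmetic; real rates come from the model rows returned by `GET /v1/models`. `Decimal` is used here as a design choice to avoid binary floating-point rounding on currency values, not because the API is known to require it.

```python
from decimal import Decimal

TOKENS_PER_MILLION = Decimal(1_000_000)


def estimate_cost_usd(input_tokens, output_tokens, input_per_1m, output_per_1m):
    """Apply the pricing formula; both rates are USD per 1 million tokens."""
    input_cost = Decimal(input_tokens) / TOKENS_PER_MILLION * Decimal(str(input_per_1m))
    output_cost = Decimal(output_tokens) / TOKENS_PER_MILLION * Decimal(str(output_per_1m))
    return input_cost + output_cost


# Hypothetical rates for illustration only; they are not real ai& prices.
example_model = {"input_per_1m": "0.50", "output_per_1m": "1.50"}

cost = estimate_cost_usd(
    input_tokens=12_000,
    output_tokens=3_000,
    input_per_1m=example_model["input_per_1m"],
    output_per_1m=example_model["output_per_1m"],
)
print(cost)
```

With these assumed rates, 12,000 input tokens contribute 0.006 USD and 3,000 output tokens contribute 0.0045 USD, for a total of 0.0105 USD.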
Per-request reporting

Every response carries the exact computed cost:
- Non-streaming: `X-AiAnd-Cost-USD` header. See Response Headers.
- Streaming: `event: aiand.metadata` after the model's terminal event. See Streaming Events.
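
A minimal sketch of reading the cost on the client side, under several assumptions that this page does not confirm: that the header value is a plain decimal string, that the stream follows standard server-sent events framing, and that the `aiand.metadata` payload is JSON. The helper names and the `cost_usd` field in the sample are hypothetical; check Response Headers and Streaming Events for the actual shapes.

```python
import json
from decimal import Decimal

COST_HEADER = "X-AiAnd-Cost-USD"
METADATA_EVENT = "aiand.metadata"


def cost_from_headers(headers):
    """Return the cost from a non-streaming response's headers, or None if absent."""
    # HTTP header names are case-insensitive, so normalise before looking up.
    lowered = {name.lower(): value for name, value in headers.items()}
    raw = lowered.get(COST_HEADER.lower())
    return Decimal(raw) if raw is not None else None


def metadata_from_sse(stream_text):
    """Return the decoded data payload of the aiand.metadata event, or None."""
    # Server-sent events are separated by a blank line.
    for block in stream_text.replace("\r\n", "\n").split("\n\n"):
        event_name, data_lines = None, []
        for line in block.split("\n"):
            if line.startswith("event:"):
                event_name = line[len("event:"):].strip()
            elif line.startswith("data:"):
                data_lines.append(line[len("data:"):].strip())
        if event_name == METADATA_EVENT and data_lines:
            # Assumption: the payload is JSON.
            return json.loads("\n".join(data_lines))
    return None


# Hypothetical inputs for illustration; the "cost_usd" field name is assumed.
print(cost_from_headers({"x-aiand-cost-usd": "0.0105"}))
sample_stream = (
    'event: message\ndata: {"text": "hi"}\n\n'
    'event: aiand.metadata\ndata: {"cost_usd": "0.0105"}\n\n'
)
print(metadata_from_sse(sample_stream))
```

In practice the headers mapping would come from whichever HTTP client you use, and the stream would be parsed incrementally rather than from one string; the sketch keeps both as plain values so it runs without a network call.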
You can also query usage and cost retrospectively via the Analytics and Request Logs endpoints.
When you’re charged

| Outcome | Charged? |
|---|---|
| 2xx response with output | Yes: (input tokens × input rate) + (output tokens × output rate) |
| 4xx before model dispatch (validation, auth, rate limit, no credit) | No |
| 5xx after partial output streamed | Yes; the cost reflects what was produced before the error |
| Stream cancelled mid-flight | Yes for tokens already produced |
Cost is deducted atomically when the request settles, so concurrent requests can't overdraw your balance.
Token rates are the same across tiers. What differs is throughput; see Rate Limits for per-tier RPM/TPM caps.