Skip to content

Pricing

ai& bills inference per token, deducted from your prepaid credit balance after each request completes. Top-up via Credits & Top-Up.

cost_usd = (input_tokens / 1_000_000) × input_per_1m
+ (output_tokens / 1_000_000) × output_per_1m

Both input_per_1m and output_per_1m are USD per 1 million tokens, served on each model row via GET /v1/models.

Every response carries the exact computed cost:

You can also query usage and cost retrospectively via the Analytics and Request Logs endpoints.

OutcomeCharged?
2xx response with outputYes — (input × rate) + (output × rate)
4xx before model dispatch (validation, auth, rate limit, no credit)No
5xx after partial output streamedYes for what was billed; the cost reflects what was produced
Stream cancelled mid-flightYes for tokens already produced

Cost is deducted atomically when the request settles — concurrent requests can’t overdraw.

Token rates are the same across tiers. What differs is throughput — see Rate Limits for per-tier RPM/TPM caps.