Pricing

ai& bills inference per token, deducted from your prepaid credit balance after each request completes. Top-up via Credits & Top-Up.

The formula

cost = (input_tokens  / 1_000_000) × input_per_1m
        + (output_tokens / 1_000_000) × output_per_1m

Both input_per_1m and output_per_1m are prices per 1 million tokens in your organization’s billing currency (USD or JPY, fixed at org creation), served on each model row via GET /v1/models together with the currency field. A model is only listed and callable if it has a price in your currency.

Prompt caching

Repeated prompt prefixes are detected automatically, and the reused leading portion is billed at a reduced cached input rate on models that offer one. The common pattern — a stable system prompt (and tools) followed by a growing conversation — means every turn after the first reuses the previous turn’s prompt as its prefix, so that portion is discounted. No cache_control markers or code changes are required.

When a model offers a cached rate, the cost becomes:

cost = (cached_tokens                   / 1_000_000) × cached_input_per_1m
        + (input_tokens − cached_tokens) / 1_000_000 × input_per_1m
        + (output_tokens                 / 1_000_000) × output_per_1m

The cached token count is reported back in the response usage, under the field your SDK expects:

API	Field
OpenAI Chat Completions	`usage.prompt_tokens_details.cached_tokens`
OpenAI Responses	`usage.input_tokens_details.cached_tokens`
Anthropic Messages	`usage.cache_read_input_tokens`

Streaming responses report the same count as tokens.cached in the event: metrics trailer (with X-Aiand-Metrics: true).

Caching applies to prompts of 1024 tokens or more, counted in 128-token increments.
Reuse is scoped to your organization and lasts roughly 10 minutes of inactivity.
Only models with a cached input rate report cached_tokens; on other models all input bills at the standard rate and the field is omitted.

Per-request reporting

Send the request header X-Aiand-Metrics: true to receive the exact computed cost with each response:

Non-streaming: X-Cost + X-Cost-Currency headers. See Response Headers.
Streaming: a trailing event: metrics SSE event after the model’s terminal message. Its payload includes the token counts, cost, and a currency field telling you what currency the cost is denominated in. See Streaming Events.

You can also query usage and cost retrospectively via the Analytics and Request Logs endpoints.

When you’re charged

Outcome	Charged?
2xx response with output	Yes — `(input × rate) + (output × rate)`
4xx before model dispatch (validation, auth, rate limit, no credit)	No
5xx after partial output streamed	Yes for what was billed; the cost reflects what was produced
Stream cancelled mid-flight	Yes for tokens already produced

Cost is deducted atomically when the request settles — concurrent requests can’t overdraw.

Tiers

Token rates are the same across tiers. What differs is throughput — see Rate Limits for per-tier RPM/TPM caps.