Streaming Events

When you request a streaming response, ai& streams the model’s SSE events unchanged. If the request carries the opt-in header X-Aiand-Metrics: true, ai& appends a single trailer event named metrics carrying token counts, cost, and timing.

OpenAI shape

data: {"id":"chatcmpl-...","object":"chat.completion.chunk","choices":[{"delta":{"content":"Hel"}}]}

data: {"id":"chatcmpl-...","object":"chat.completion.chunk","choices":[{"delta":{"content":"lo"}}]}

data: {"id":"chatcmpl-...","object":"chat.completion.chunk","choices":[{"delta":{},"finish_reason":"stop"}],"usage":{"prompt_tokens":7,"completion_tokens":2,"total_tokens":9}}

data: [DONE]

event: metrics
data: {"tokens":{"input":7,"output":2,"total":9,"cached":0},"cost":0.000018,"currency":"usd","ttft_ms":120,"inference_ms":850}

The metrics event is emitted after [DONE]. Robust clients should keep the connection open until the stream closes naturally.

The trailer event

Field	Description
`tokens.input`	Input tokens counted toward billing.
`tokens.output`	Output tokens counted toward billing.
`tokens.total`	Sum of input and output tokens.
`tokens.cached`	Cached (repeated-prefix) input tokens billed at the model’s discounted cached rate. `0` when there is no cache hit. See Prompt caching.
`cost`	Final cost of the request, in your billing currency.
`currency`	The currency `cost` is denominated in (`usd` or `jpy`).
`ttft_ms`	Time to first token, in milliseconds.
`inference_ms`	Time the upstream model spent producing the response, in milliseconds.

Which endpoints emit it

The trailer is emitted on /v1/chat/completions and /v1/responses streams. Anthropic-shaped /v1/messages streams and legacy /v1/completions streams do not carry it — read the native usage block, or query Request Logs for cost.

Why a trailer event?

ai& never modifies the response body emitted by the model. Tokens, cost, and timing are delivered as a named SSE event after the terminal message, so the byte stream above the trailer is identical to what the source API would produce — the official OpenAI and Anthropic SDKs work without modification.