Chat Completions

POST /v1/chat/completions

Generate a model response for a given conversation. Fully compatible with the OpenAI Chat Completions API.

Request body

Parameter	Type	Required	Description
`model`	string	Yes	Model ID (see Models)
`messages`	array	Yes	Conversation messages (see Message types)
`stream`	boolean	No	Stream partial deltas as SSE. Default: `false`
`stream_options`	object	No	`{ "include_usage": true }` to include token counts in the final stream event
`temperature`	number	No	Sampling temperature, 0–2. Default: model-dependent
`top_p`	number	No	Nucleus sampling threshold, 0–1
`n`	integer	No	Number of choices to generate, 1–128
`max_tokens`	integer	No	Maximum tokens to generate. Deprecated; use `max_completion_tokens` instead
`max_completion_tokens`	integer	No	Upper bound on generated tokens, including reasoning tokens
`stop`	string | string[]	No	Up to 4 stop sequences
`frequency_penalty`	number	No	Frequency penalty, -2 to 2
`presence_penalty`	number	No	Presence penalty, -2 to 2
`logprobs`	boolean	No	Return log probabilities of output tokens
`top_logprobs`	integer	No	Most likely tokens per position, 0–20
`logit_bias`	object	No	Map of token IDs to bias values (-100 to 100)
`response_format`	object	No	`{ "type": "text" }`, `{ "type": "json_object" }`, or `{ "type": "json_schema", "json_schema": {...} }`
`seed`	integer	No	Seed for deterministic sampling
`tools`	array	No	Function tools the model may call
`tool_choice`	string | object	No	`"none"`, `"auto"`, `"required"`, or a specific tool
`parallel_tool_calls`	boolean	No	Allow parallel function calling
`reasoning_effort`	string	No	Reasoning effort level. Accepted values vary by model; read `reasoning_efforts` from `GET /v1/models`. A value the model doesn’t accept is rejected
`top_k`	integer	No	Top-k sampling (provider-specific)
`min_p`	number	No	Min-p sampling threshold, 0–1 (provider-specific)
`repetition_penalty`	number	No	Repetition penalty (provider-specific)
`user`	string	No	End-user identifier for abuse tracking
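
As a rough illustration of the ranges in the table above, the following sketch assembles a request body and rejects out-of-range values before anything is sent. The helper name `build_chat_request` is hypothetical and not part of any SDK; the checks only mirror the documented limits, and the server remains the authority on validation.

```python
from typing import Any


def build_chat_request(model: str, messages: list[dict[str, Any]], **params: Any) -> dict[str, Any]:
    """Assemble a request body, checking a few documented ranges client-side.

    Hypothetical helper: it mirrors the limits in the parameter table and
    makes no claim about how the server itself validates requests.
    """
    # Inclusive (low, high) bounds copied from the parameter table.
    ranges = {
        "temperature": (0, 2),
        "top_p": (0, 1),
        "n": (1, 128),
        "frequency_penalty": (-2, 2),
        "presence_penalty": (-2, 2),
        "top_logprobs": (0, 20),
        "min_p": (0, 1),
    }
    for name, (low, high) in ranges.items():
        value = params.get(name)
        if value is not None and not low <= value <= high:
            raise ValueError(f"{name} must be between {low} and {high}, got {value}")

    stop = params.get("stop")
    if isinstance(stop, list) and len(stop) > 4:
        raise ValueError("stop accepts at most 4 sequences")

    # Leave out unset parameters so the server-side defaults apply.
    body: dict[str, Any] = {"model": model, "messages": messages}
    body.update({key: value for key, value in params.items() if value is not None})
    return body
```

Dropping `None` values keeps the payload minimal, which matters for parameters such as `temperature` whose default is model-dependent.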

Message types

System message

{ "role": "system", "content": "You are a helpful assistant." }

User message

{ "role": "user", "content": "What is the capital of France?" }

User messages also accept multimodal content arrays:

{
  "role": "user",
  "content": [
    { "type": "text", "text": "What's in this image?" },
    { "type": "image_url", "image_url": { "url": "https://...", "detail": "auto" } }
  ]
}

Supported content types: text, image_url, video_url, audio_url, input_audio, file.

For images, prefer uploading via the Files API and referencing the file by file_id rather than sending inline base64, so that multi-turn conversations and retries don’t re-send the bytes from your client:

{
  "role": "user",
  "content": [
    { "type": "text", "text": "What's in this image?" },
    { "type": "file", "file": { "file_id": "file-abc123" } }
  ]
}

The model must have the matching capability (e.g. vision); otherwise the request is rejected with 400 model_capability_mismatch.
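
The two user-message shapes above (a plain string, or an array of content parts) can be produced by one small builder. This is a minimal sketch; `user_message` is a hypothetical helper, and only the `text`, `file`, and `image_url` part types shown in this section are covered.

```python
from typing import Any, Optional


def user_message(
    text: str,
    *,
    file_id: Optional[str] = None,
    image_url: Optional[str] = None,
) -> dict[str, Any]:
    """Build a user message; content stays a plain string unless something is attached."""
    if file_id is None and image_url is None:
        return {"role": "user", "content": text}

    parts: list[dict[str, Any]] = [{"type": "text", "text": text}]
    if file_id is not None:
        # Reference a previously uploaded file instead of inlining its bytes.
        parts.append({"type": "file", "file": {"file_id": file_id}})
    if image_url is not None:
        parts.append({"type": "image_url", "image_url": {"url": image_url, "detail": "auto"}})
    return {"role": "user", "content": parts}
```

For example, `user_message("What's in this image?", file_id="file-abc123")` reproduces the file-based message shown above.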

Assistant message

{ "role": "assistant", "content": "The capital of France is Paris." }

Tool message

{ "role": "tool", "tool_call_id": "call_abc123", "content": "{\"result\": 42}" }

Response

{
  "id": "chatcmpl-abc123",
  "object": "chat.completion",
  "created": 1700000000,
  "model": "openai/gpt-oss-120b",
  "choices": [
    {
      "index": 0,
      "message": {
        "role": "assistant",
        "content": "Hello! How can I help you today?"
      },
      "finish_reason": "stop"
    }
  ],
  "usage": {
    "prompt_tokens": 10,
    "completion_tokens": 9,
    "total_tokens": 19
  }
}
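
To show how the request and the response fit together, here is a sketch using only the Python standard library. The function names are hypothetical; the host and the Bearer header are taken from the curl example in the Streaming section, and the 60-second timeout is an arbitrary choice.

```python
import json
import urllib.request
from typing import Any, Optional

# Host as it appears in this document's curl example.
API_URL = "https://api.aiand.com/v1/chat/completions"


def make_request(api_key: str, body: dict[str, Any]) -> urllib.request.Request:
    """Build (but do not send) the POST request for a chat completion."""
    return urllib.request.Request(
        API_URL,
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


def first_reply(response: dict[str, Any]) -> tuple[Optional[str], str]:
    """Return (content, finish_reason) of the first choice.

    content may be None, for instance when the model answers with tool calls.
    """
    choice = response["choices"][0]
    return choice["message"].get("content"), choice["finish_reason"]


def complete(api_key: str, body: dict[str, Any]) -> dict[str, Any]:
    """Send the request and decode the JSON response (performs network I/O)."""
    with urllib.request.urlopen(make_request(api_key, body), timeout=60) as resp:
        return json.load(resp)
```

Keeping `make_request` separate from `complete` makes the request easy to inspect or log without touching the network.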

Usage

Field	Type	Description
`prompt_tokens`	integer	Input tokens consumed
`completion_tokens`	integer	Output tokens generated
`total_tokens`	integer	Sum of input and output tokens
`prompt_tokens_details`	object	Optional. `{ cached_tokens, audio_tokens }`
`completion_tokens_details`	object	Optional. `{ reasoning_tokens, audio_tokens }`
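
Because the two `*_details` objects are optional, code that reads them should tolerate their absence. A minimal sketch, with a hypothetical `summarize_usage` helper and output key names chosen only for illustration:

```python
from typing import Any


def summarize_usage(usage: dict[str, Any]) -> dict[str, int]:
    """Flatten a usage object, treating missing detail objects as zero counts."""
    prompt_details = usage.get("prompt_tokens_details") or {}
    completion_details = usage.get("completion_tokens_details") or {}
    return {
        "input": usage["prompt_tokens"],
        "output": usage["completion_tokens"],
        "total": usage["total_tokens"],
        "cached_input": prompt_details.get("cached_tokens", 0),
        "reasoning_output": completion_details.get("reasoning_tokens", 0),
    }
```

Applied to the usage object from the response example, this yields 10 input, 9 output, and 19 total tokens, with both detail counts at 0.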

Streaming

Set stream: true to receive partial responses as server-sent events.

curl https://api.aiand.com/v1/chat/completions \
  -H "Authorization: Bearer sk-your-api-key" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "openai/gpt-oss-120b",
    "stream": true,
    "messages": [{"role": "user", "content": "Count to 5"}]
  }'

Each event contains a data: line with a JSON chunk. The stream ends with data: [DONE].

To include token usage in the final event:

{
  "stream": true,
  "stream_options": { "include_usage": true }
}
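
The event format described above can be consumed with a small line parser. This is a sketch under one stated assumption: the chunk shape (`choices[].delta.content`, plus an optional `usage` object on a late chunk) follows the OpenAI streaming convention and is not shown in this document. The function names are hypothetical.

```python
import json
from typing import Any, Iterable, Iterator, Optional


def iter_chunks(lines: Iterable[str]) -> Iterator[dict[str, Any]]:
    """Yield decoded JSON chunks from SSE lines until data: [DONE]."""
    for raw in lines:
        line = raw.strip()
        if not line.startswith("data:"):
            continue  # blank separators and non-data fields carry no chunk
        payload = line[len("data:"):].strip()
        if payload == "[DONE]":
            return
        yield json.loads(payload)


def collect_text(lines: Iterable[str]) -> tuple[str, Optional[dict[str, Any]]]:
    """Concatenate streamed content deltas; also return usage if one was sent."""
    pieces: list[str] = []
    usage: Optional[dict[str, Any]] = None
    for chunk in iter_chunks(lines):
        # Assumed chunk shape: choices[].delta.content (OpenAI convention).
        for choice in chunk.get("choices", []):
            piece = (choice.get("delta") or {}).get("content")
            if piece:
                pieces.append(piece)
        if chunk.get("usage"):
            usage = chunk["usage"]
    return "".join(pieces), usage
```

`collect_text` accepts any iterable of decoded text lines, so it can be fed from a file, a list in a test, or the lines of an HTTP response body.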

Tool calling

Define tools in the request and the model can choose to call them:

{
  "model": "openai/gpt-oss-120b",
  "messages": [{ "role": "user", "content": "What's the weather in Tokyo?" }],
  "tools": [
    {
      "type": "function",
      "function": {
        "name": "get_weather",
        "description": "Get current weather for a location",
        "parameters": {
          "type": "object",
          "properties": {
            "location": { "type": "string" }
          },
          "required": ["location"]
        }
      }
    }
  ]
}

When the model calls a tool, the response includes tool_calls:

{
  "choices": [
    {
      "message": {
        "role": "assistant",
        "tool_calls": [
          {
            "id": "call_abc123",
            "type": "function",
            "function": {
              "name": "get_weather",
              "arguments": "{\"location\": \"Tokyo\"}"
            }
          }
        ]
      },
      "finish_reason": "tool_calls"
    }
  ]
}

Run the tool in your own code, then send its result back as a tool message whose tool_call_id matches the call's id to continue the conversation.
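
The round trip can be sketched as follows. `run_tool_calls` and the `tools` registry are hypothetical names; the sketch assumes each tool returns a JSON-serializable value, and that the assistant message containing `tool_calls` is appended to the history before the tool messages, as in the OpenAI convention.

```python
import json
from typing import Any, Callable


def run_tool_calls(
    assistant_message: dict[str, Any],
    tools: dict[str, Callable[..., Any]],
) -> list[dict[str, Any]]:
    """Execute each requested tool and return the matching tool messages."""
    results: list[dict[str, Any]] = []
    for call in assistant_message.get("tool_calls") or []:
        function = call["function"]
        # arguments arrives as a JSON-encoded string, not as an object.
        arguments = json.loads(function["arguments"])
        output = tools[function["name"]](**arguments)
        results.append(
            {
                "role": "tool",
                "tool_call_id": call["id"],  # ties the result to its call
                "content": json.dumps(output),
            }
        )
    return results
```

A follow-up request would then send `messages + [assistant_message] + run_tool_calls(assistant_message, tools)` as its `messages` array.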