Responses
POST /v1/responses

Creates a model response for the given input. Compatible with the OpenAI Responses API.
Request body
| Parameter | Type | Required | Description |
|---|---|---|---|
| model | string | Yes | Model ID to use |
| input | string \| array | Yes | Text string (treated as a user message) or array of input items |
| instructions | string | No | System/developer message injected into context |
| stream | boolean | No | Stream the response as server-sent events (SSE). Default: false |
| temperature | number | No | Sampling temperature, 0–2 |
| top_p | number | No | Nucleus sampling, 0–1 |
| max_output_tokens | integer | No | Upper bound on generated tokens |
| tools | array | No | Function tools the model may call |
| tool_choice | string \| object | No | "none", "auto", "required", or a specific tool |
| parallel_tool_calls | boolean | No | Allow parallel tool calls |
| reasoning | object | No | { "effort": "low"\|"medium"\|"high", "summary": "auto"\|"concise"\|"detailed" } |
| truncation | string | No | "auto" (truncate to fit the context window) or "disabled" |
| previous_response_id | string | No | Continue a multi-turn conversation |
| store | boolean | No | Store the response for later retrieval |
| metadata | object | No | Key-value pairs for request tracking |
| text | object | No | Text response format configuration |
| seed | integer | No | Seed for deterministic sampling |
| stop | string \| string[] | No | Up to 4 stop sequences |
| top_k | integer | No | Top-k sampling (provider-specific) |
| repetition_penalty | number | No | Repetition penalty (provider-specific) |
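As a sketch of assembling a request body from the table above (the `build_request` helper is ours, not part of the API; only `model` and `input` are required, and unset optional parameters are simply omitted):

```python
import json

# Required parameters per the table above.
REQUIRED = {"model", "input"}

def build_request(model: str, input, **options) -> dict:
    """Assemble a /v1/responses request body, dropping unset options."""
    payload = {"model": model, "input": input}
    # Only include optional parameters that were actually provided.
    payload.update({k: v for k, v in options.items() if v is not None})
    missing = REQUIRED - payload.keys()
    if missing:
        raise ValueError(f"missing required parameters: {missing}")
    return payload

body = build_request(
    "gpt-4o-mini",
    "Explain quantum computing in one paragraph.",
    temperature=0.2,
    max_output_tokens=256,
    stream=None,  # unset -> omitted from the body
)
print(json.dumps(body))
```

The resulting dict can then be sent as the JSON body of the POST request.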
Input types
Simple string input:

```json
{
  "model": "gpt-4o-mini",
  "input": "Explain quantum computing in one paragraph."
}
```

Structured input items:

```json
{
  "model": "gpt-4o-mini",
  "input": [
    {
      "role": "user",
      "content": [
        { "type": "input_text", "text": "Describe this image" },
        { "type": "input_image", "image_url": "https://..." }
      ]
    }
  ]
}
```

Input item types:

- Message: `{ "role": "user"|"assistant"|"developer"|"system", "content": string | content[] }`
- Function call output: `{ "type": "function_call_output", "call_id": "...", "output": "..." }`
- Item reference: `{ "type": "item_reference", "id": "..." }`

Content types: `input_text`, `input_image`, `input_file`
Response
```json
{
  "id": "resp_abc123",
  "object": "response",
  "status": "completed",
  "created_at": 1700000000,
  "model": "gpt-4o-mini",
  "output": [
    {
      "type": "message",
      "role": "assistant",
      "content": [
        { "type": "output_text", "text": "Quantum computing uses quantum bits (qubits)..." }
      ]
    }
  ],
  "usage": { "input_tokens": 15, "output_tokens": 80, "total_tokens": 95 }
}
```

Response status

| Status | Description |
|---|---|
| completed | Generation finished successfully |
| failed | Generation failed (see the error field) |
| in_progress | Still generating (streaming) |
| incomplete | Stopped early (see incomplete_details.reason) |
| cancelled | Request was cancelled |
| queued | Waiting to be processed |
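When polling a stored response, the statuses above split into terminal and non-terminal states; a minimal sketch (the helper and the grouping into "terminal" states are ours, inferred from the table):

```python
# Statuses that will not change further; queued and in_progress may still advance.
TERMINAL = {"completed", "failed", "incomplete", "cancelled"}

def is_terminal(status: str) -> bool:
    """True once the response will no longer change and polling can stop."""
    return status in TERMINAL
```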
Usage object
| Field | Type | Description |
|---|---|---|
| input_tokens | integer | Input tokens consumed |
| output_tokens | integer | Output tokens generated |
| total_tokens | integer | Total tokens (input + output) |
| input_tokens_details | object | { "cached_tokens": integer } |
| output_tokens_details | object | { "reasoning_tokens": integer } |
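As a sketch of reading the usage object (field names as in the table; the token counts below echo the example response, and the cached-token value is an assumption for illustration):

```python
usage = {
    "input_tokens": 15,
    "output_tokens": 80,
    "total_tokens": 95,
    "input_tokens_details": {"cached_tokens": 5},
    "output_tokens_details": {"reasoning_tokens": 0},
}

# total_tokens is the sum of input and output tokens.
assert usage["total_tokens"] == usage["input_tokens"] + usage["output_tokens"]

# cached_tokens counts the subset of input tokens served from cache.
uncached = usage["input_tokens"] - usage["input_tokens_details"]["cached_tokens"]
```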
Multi-turn conversations
Use previous_response_id to continue a conversation without resending the full history:

```json
{
  "model": "gpt-4o-mini",
  "input": "Now explain it to a 5-year-old.",
  "previous_response_id": "resp_abc123"
}
```
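The chaining pattern can be sketched as follows; `send` here is a stub standing in for the actual HTTP POST to /v1/responses, so the code only shows how each follow-up request references the previous response ID:

```python
# Stubbed transport: in real use this would POST the payload to /v1/responses.
_counter = 0

def send(payload: dict) -> dict:
    """Placeholder for the HTTP call; echoes the payload for inspection."""
    global _counter
    _counter += 1
    return {"id": f"resp_{_counter:03d}", "status": "completed", "request": payload}

def follow_up(prev: dict, text: str) -> dict:
    """Continue a conversation without resending the full history."""
    return send({
        "model": "gpt-4o-mini",
        "input": text,
        "previous_response_id": prev["id"],
    })

first = send({"model": "gpt-4o-mini", "input": "Explain quantum computing in one paragraph."})
second = follow_up(first, "Now explain it to a 5-year-old.")
```

Each turn only carries the new input; the server reconstructs the prior context from the referenced response.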