Responses
POST /v1/responses

Creates a model response for the given input. Compatible with the OpenAI Responses API.
Request body
| Parameter | Type | Required | Description |
|---|---|---|---|
| model | string | Yes | Model ID to use |
| input | string \| array | Yes | Text string (treated as a user message) or array of input items |
| instructions | string | No | System/developer message injected into context |
| stream | boolean | No | Stream the response as server-sent events (SSE). Default: false |
| temperature | number | No | Sampling temperature, 0–2 |
| top_p | number | No | Nucleus sampling, 0–1 |
| max_output_tokens | integer | No | Upper bound on generated tokens |
| tools | array | No | Function tools the model may call |
| tool_choice | string \| object | No | "none", "auto", "required", or a specific tool |
| parallel_tool_calls | boolean | No | Allow parallel tool calls |
| reasoning | object | No | { "effort": "low"\|"medium"\|"high", "summary": "auto"\|"concise"\|"detailed" } |
| truncation | string | No | "auto" (truncate to fit the context window) or "disabled" |
| previous_response_id | string | No | Continue a multi-turn conversation |
| store | boolean | No | Store the response for later retrieval |
| metadata | object | No | Key-value pairs for request tracking |
| text | object | No | Text response format configuration |
| seed | integer | No | Seed for deterministic sampling |
| stop | string \| string[] | No | Up to 4 stop sequences |
| top_k | integer | No | Top-k sampling (provider-specific) |
| repetition_penalty | number | No | Repetition penalty (provider-specific) |
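As a sketch of assembling a request body from the table above (the `build_request` helper is ours, not part of the API; only `model` and `input` are required, and unset optional parameters are simply omitted):

```python
import json

# Required parameters per the table above.
REQUIRED = {"model", "input"}

def build_request(model: str, input, **options) -> dict:
    """Assemble a /v1/responses request body, dropping unset options."""
    payload = {"model": model, "input": input}
    # Only include optional parameters that were actually provided.
    payload.update({k: v for k, v in options.items() if v is not None})
    missing = REQUIRED - payload.keys()
    if missing:
        raise ValueError(f"missing required parameters: {missing}")
    return payload

body = build_request(
    "gpt-4o-mini",
    "Explain quantum computing in one paragraph.",
    temperature=0.2,
    max_output_tokens=256,
    stream=None,  # unset -> omitted from the body
)
print(json.dumps(body))
```

The resulting dict can then be sent as the JSON body of the POST request.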
Input types
Simple string input:

```json
{
  "model": "gpt-4o-mini",
  "input": "Explain quantum computing in one paragraph."
}
```

Structured input items:

```json
{
  "model": "gpt-4o-mini",
  "input": [
    {
      "role": "user",
      "content": [
        { "type": "input_text", "text": "Describe this image" },
        { "type": "input_image", "image_url": "https://..." }
      ]
    }
  ]
}
```

Input item types:

- Message: `{ "role": "user"|"assistant"|"developer"|"system", "content": string | content[] }`
- Function call output: `{ "type": "function_call_output", "call_id": "...", "output": "..." }`
- Item reference: `{ "type": "item_reference", "id": "..." }`

Content types: `input_text`, `input_image`, `input_file`
Response
```json
{
  "id": "resp_abc123",
  "object": "response",
  "status": "completed",
  "created_at": 1700000000,
  "model": "gpt-4o-mini",
  "output": [
    {
      "type": "message",
      "role": "assistant",
      "content": [
        { "type": "output_text", "text": "Quantum computing uses quantum bits (qubits)..." }
      ]
    }
  ],
  "usage": { "input_tokens": 15, "output_tokens": 80, "total_tokens": 95 }
}
```

Response status

| Status | Description |
|---|---|
| completed | Generation finished successfully |
| failed | Generation failed (see the error field) |
| in_progress | Still generating (streaming) |
| incomplete | Stopped early (see incomplete_details.reason) |
| cancelled | Request was cancelled |
| queued | Waiting to be processed |
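When polling a stored response, the statuses above split into terminal and non-terminal states; a minimal sketch (the helper and the grouping into "terminal" states are ours, inferred from the table):

```python
# Statuses that will not change further; queued and in_progress may still advance.
TERMINAL = {"completed", "failed", "incomplete", "cancelled"}

def is_terminal(status: str) -> bool:
    """True once the response will no longer change and polling can stop."""
    return status in TERMINAL
```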
Usage object
| Field | Type | Description |
|---|---|---|
| input_tokens | integer | Input tokens consumed |
| output_tokens | integer | Output tokens generated |
| total_tokens | integer | Total tokens (input + output) |
| input_tokens_details | object | { "cached_tokens": integer } |
| output_tokens_details | object | { "reasoning_tokens": integer } |
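As a sketch of reading the usage object (field names as in the table; the token counts below echo the example response, and the cached-token value is an assumption for illustration):

```python
usage = {
    "input_tokens": 15,
    "output_tokens": 80,
    "total_tokens": 95,
    "input_tokens_details": {"cached_tokens": 5},
    "output_tokens_details": {"reasoning_tokens": 0},
}

# total_tokens is the sum of input and output tokens.
assert usage["total_tokens"] == usage["input_tokens"] + usage["output_tokens"]

# cached_tokens counts the subset of input tokens served from cache.
uncached = usage["input_tokens"] - usage["input_tokens_details"]["cached_tokens"]
```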
Multi-turn conversations
Use previous_response_id to continue a conversation without resending the full history:

```json
{
  "model": "gpt-4o-mini",
  "input": "Now explain it to a 5-year-old.",
  "previous_response_id": "resp_abc123"
}
```
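The chaining pattern can be sketched as follows; `send` here is a stub standing in for the actual HTTP POST to /v1/responses, so the code only shows how each follow-up request references the previous response ID:

```python
# Stubbed transport: in real use this would POST the payload to /v1/responses.
_counter = 0

def send(payload: dict) -> dict:
    """Placeholder for the HTTP call; echoes the payload for inspection."""
    global _counter
    _counter += 1
    return {"id": f"resp_{_counter:03d}", "status": "completed", "request": payload}

def follow_up(prev: dict, text: str) -> dict:
    """Continue a conversation without resending the full history."""
    return send({
        "model": "gpt-4o-mini",
        "input": text,
        "previous_response_id": prev["id"],
    })

first = send({"model": "gpt-4o-mini", "input": "Explain quantum computing in one paragraph."})
second = follow_up(first, "Now explain it to a 5-year-old.")
```

Each turn only carries the new input; the server reconstructs the prior context from the referenced response.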