Vision
Vision-capable models accept image inputs alongside text. ai& supports three input modes; choose the one that fits your use case.
Three ways to send an image
Reference a public URL:

{
  "role": "user",
  "content": [
    { "type": "text", "text": "What's in this image?" },
    { "type": "image_url", "image_url": { "url": "https://example.com/cat.png" } }
  ]
}

Embed the image inline as a base64 data URL:

{
  "role": "user",
  "content": [
    { "type": "text", "text": "What's in this image?" },
    { "type": "image_url", "image_url": { "url": "data:image/png;base64,iVBORw0KG..." } }
  ]
}

Reference an uploaded file by its file_id:

{
  "role": "user",
  "content": [
    { "type": "text", "text": "What's in this image?" },
    { "type": "file", "file": { "file_id": "file-abc123" } }
  ]
}

The file_id path
When you send {type: "file", file: {file_id}}, the model receives the bytes you uploaded, so no public hosting is required. The wire format on your end stays the same regardless of which mode you use.
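The three modes differ only in the image part of the content array, which a few small helpers can make explicit. The following is a minimal sketch, not part of any official client: the function names (text_part, image_url_part, image_data_part, image_file_part, user_message) are hypothetical, and only the JSON shapes come from the examples above.

```python
import base64


def text_part(text: str) -> dict:
    """Plain text part of a user message."""
    return {"type": "text", "text": text}


def image_url_part(url: str) -> dict:
    """Image referenced by a publicly reachable URL."""
    return {"type": "image_url", "image_url": {"url": url}}


def image_data_part(data: bytes, mime_type: str = "image/png") -> dict:
    """Image embedded inline as a base64 data URL."""
    encoded = base64.b64encode(data).decode("ascii")
    return {
        "type": "image_url",
        "image_url": {"url": f"data:{mime_type};base64,{encoded}"},
    }


def image_file_part(file_id: str) -> dict:
    """Image referenced by the ID of a previously uploaded file."""
    return {"type": "file", "file": {"file_id": file_id}}


def user_message(text: str, image_part: dict) -> dict:
    """Wrap a text part and one image part in a user message."""
    return {"role": "user", "content": [text_part(text), image_part]}
```

Switching modes then means swapping one argument, for example user_message("What's in this image?", image_file_part("file-abc123")), while the rest of the request stays unchanged.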
Upload images via Files with purpose: "vision", or omit the purpose and let it be inferred from the file's MIME type.
Supported MIME types: image/png, image/jpeg, image/webp, image/gif. Max size: 100 MB.
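A client can check these limits before uploading to avoid a wasted round trip. The sketch below rests on two assumptions that the text above does not settle: the MIME type is guessed from the file extension (the service may inspect the bytes instead), and "100 MB" is read as 100,000,000 bytes, the stricter of the two common interpretations. The names check_image, EXTENSION_TO_MIME, and MAX_BYTES are hypothetical.

```python
from pathlib import PurePath

# Extension-to-MIME mapping for the four supported types listed above.
EXTENSION_TO_MIME = {
    ".png": "image/png",
    ".jpg": "image/jpeg",
    ".jpeg": "image/jpeg",
    ".webp": "image/webp",
    ".gif": "image/gif",
}

# Assumption: "100 MB" interpreted as decimal megabytes (stricter reading).
MAX_BYTES = 100 * 1000 * 1000


def check_image(filename: str, size_bytes: int) -> str:
    """Return the guessed MIME type, or raise ValueError if the file
    would fall outside the documented type or size limits."""
    suffix = PurePath(filename).suffix.lower()
    mime_type = EXTENSION_TO_MIME.get(suffix)
    if mime_type is None:
        raise ValueError(f"unsupported image type: {suffix or filename!r}")
    if size_bytes > MAX_BYTES:
        raise ValueError(f"image is {size_bytes} bytes; limit is {MAX_BYTES}")
    return mime_type
```

Calling check_image("cat.png", 2048) returns "image/png", while a .pdf file or an oversized image raises ValueError before any upload is attempted.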
Capability gating
Vision requests are rejected if the target model lacks the vision capability. This applies both to inline image_url parts and to file_id references whose purpose maps to vision. List models and their capabilities via Models.
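The same gate can be mirrored on the client so that a mismatched request fails early. This is a rough sketch under stated assumptions: a model entry is taken to be a dict with an "id" and a "capabilities" list containing the string "vision", which is a hypothetical shape rather than the documented Models response, and the caller supplies the set of file IDs it knows were uploaded for vision, since a file_id alone does not reveal its purpose.

```python
def needs_vision(messages, vision_file_ids=frozenset()):
    """Return True if any message carries an image input."""
    for message in messages:
        content = message.get("content")
        if not isinstance(content, list):
            continue  # plain-string content has no image parts
        for part in content:
            if part.get("type") == "image_url":
                return True
            if part.get("type") == "file":
                file_id = part.get("file", {}).get("file_id")
                if file_id in vision_file_ids:
                    return True
    return False


def supports_vision(model):
    """Assumed shape: {"id": ..., "capabilities": ["vision", ...]}."""
    return "vision" in model.get("capabilities", [])


def check_request(model, messages, vision_file_ids=frozenset()):
    """Raise ValueError if the messages need vision but the model lacks it."""
    if needs_vision(messages, vision_file_ids) and not supports_vision(model):
        raise ValueError(
            f"model {model.get('id')!r} lacks the vision capability"
        )
```

A local check like this only saves a round trip; the server-side rejection described above remains the authoritative gate.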