Vision Extraction

This recipe pulls structured data out of an image — receipts, invoices, ID cards, business cards. Three pieces:

  1. Upload the image via Files.
  2. Reference it in a chat message by file_id.
  3. Constrain the output with a JSON Schema.

from typing import List

from openai import OpenAI
from pydantic import BaseModel

client = OpenAI(base_url="https://api.aiand.com/v1", api_key="sk-...")


# Target schema: one entry per line on the receipt.
class LineItem(BaseModel):
    description: str
    quantity: int
    unit_price_jpy: float


class Receipt(BaseModel):
    merchant: str
    date: str
    items: List[LineItem]
    total_jpy: float


# 1. Upload the image; the context manager closes the file handle afterwards.
with open("receipt.jpg", "rb") as image:
    uploaded = client.files.create(
        file=image,
        purpose="vision",
    )

# 2. Reference the upload by file_id, and 3. constrain the output to Receipt.
response = client.chat.completions.parse(
    model="google/gemma-4-31b-it",
    messages=[
        {"role": "system", "content": "Extract structured data from receipts."},
        {
            "role": "user",
            "content": [
                {"type": "text", "text": "Parse this receipt into JSON."},
                {"type": "file", "file": {"file_id": uploaded.id}},
            ],
        },
    ],
    response_format=Receipt,
)

# The parsed message is a validated Receipt instance.
receipt = response.choices[0].message.parsed
for item in receipt.items:
    print(f"{item.description}: ¥{item.unit_price_jpy}")
print(f"Total: ¥{receipt.total_jpy}")

  • client.files.create(..., purpose="vision") uploads the image to ai& storage. If purpose is omitted, it is inferred from the file's MIME type.
  • The file content part — {type: "file", file: {file_id}} — tells ai& to resolve the file. The platform generates a short-lived signed URL and rewrites the part to image_url before forwarding to the model. See Vision for details.
  • response_format=Receipt ships the Pydantic schema as a strict JSON Schema.
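
To make the file-part rewriting described above concrete, here is a rough sketch of the transformation in plain Python. It is an illustration only, not the platform's code: rewrite_file_parts and fake_sign_url are hypothetical names, the storage.example URL is made up, and the image_url part shape is assumed to follow the usual OpenAI-style chat format.

```python
from typing import Callable


def rewrite_file_parts(content: list[dict], sign_url: Callable[[str], str]) -> list[dict]:
    """Return a copy of `content` with each file part swapped for an image_url part."""
    rewritten = []
    for part in content:
        if part.get("type") == "file":
            # Resolve the file_id to a (short-lived) URL and emit an image_url part.
            url = sign_url(part["file"]["file_id"])
            rewritten.append({"type": "image_url", "image_url": {"url": url}})
        else:
            # Text and other parts pass through unchanged.
            rewritten.append(part)
    return rewritten


def fake_sign_url(file_id: str) -> str:
    """Hypothetical signer, standing in for whatever the platform does internally."""
    return f"https://storage.example/{file_id}?sig=demo"


content = [
    {"type": "text", "text": "Parse this receipt into JSON."},
    {"type": "file", "file": {"file_id": "file-abc123"}},
]
forwarded = rewrite_file_parts(content, fake_sign_url)
```

In this sketch the text part is forwarded untouched and only the file part changes shape, which is why the client code never needs to handle URLs itself.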

Uploaded files take up storage. Delete them once extraction is done:

client.files.delete(uploaded.id)

Or let the 30-day expiry handle it.
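
If you would rather not rely on remembering the delete call, one option is to wrap upload and cleanup in a context manager. This is a minimal sketch: temporary_upload is a hypothetical helper, not part of the SDK, and it assumes only the client.files.create and client.files.delete calls used in this recipe.

```python
from contextlib import contextmanager


@contextmanager
def temporary_upload(client, path, purpose="vision"):
    """Upload `path`, yield the uploaded file object, and delete it on exit."""
    with open(path, "rb") as fh:
        uploaded = client.files.create(file=fh, purpose=purpose)
    try:
        yield uploaded
    finally:
        # Runs even if the extraction call raises, so the file is not left behind.
        client.files.delete(uploaded.id)
```

It would be used as `with temporary_upload(client, "receipt.jpg") as uploaded:` around the parse call, with uploaded.id passed as the file_id.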