
Structured Extraction

Extract structured records from free-form text such as invoices, emails, transcripts, and scraped pages. Use a strict JSON Schema so the response always parses into the shape you expect.

import json

from openai import OpenAI

client = OpenAI(base_url="https://api.aiand.com/v1", api_key="sk-...")

text = """
From: Jane Doe <jane@example.com>
Re: Coffee next week
Phone: +1 555-123-4567
"""

response = client.chat.completions.create(
    model="openai/gpt-oss-120b",
    messages=[
        {"role": "system", "content": "Extract contact info from the user's text."},
        {"role": "user", "content": text},
    ],
    # Constrain the reply to a strict JSON Schema.
    response_format={
        "type": "json_schema",
        "json_schema": {
            "name": "contact",
            "strict": True,
            "schema": {
                "type": "object",
                "properties": {
                    "name": {"type": "string"},
                    "email": {"type": "string"},
                    "phone": {"type": ["string", "null"]},  # nullable, but still required
                },
                "required": ["name", "email", "phone"],
                "additionalProperties": False,
            },
        },
    },
)

# The message content is a JSON string that conforms to the schema.
contact = json.loads(response.choices[0].message.content)
print(contact)
# {'name': 'Jane Doe', 'email': 'jane@example.com', 'phone': '+1 555-123-4567'}
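
When the source text has no phone number, a schema like this one should yield "phone": null, not a missing key. The sketch below shows one way to double-check a reply before using it. The load_contact helper and the sample string are hypothetical illustrations, not part of the SDK or of any real API response.

```python
import json

# Hypothetical helper (not part of any SDK): checks that a raw JSON reply
# has exactly the three fields of the contact schema, with the right types.
EXPECTED_KEYS = {"name", "email", "phone"}


def load_contact(raw: str) -> dict:
    """Parse a JSON string and confirm it matches the contact shape."""
    contact = json.loads(raw)  # raises json.JSONDecodeError (a ValueError) on bad JSON
    if not isinstance(contact, dict):
        raise ValueError("expected a JSON object")
    if set(contact) != EXPECTED_KEYS:
        raise ValueError(f"unexpected keys: {sorted(contact)}")
    for key in ("name", "email"):
        if not isinstance(contact[key], str):
            raise ValueError(f"{key} must be a string")
    # phone is required but nullable: a string or None are both acceptable.
    if contact["phone"] is not None and not isinstance(contact["phone"], str):
        raise ValueError("phone must be a string or null")
    return contact


# Made-up reply for a message that contains no phone number.
sample = '{"name": "Jane Doe", "email": "jane@example.com", "phone": null}'
contact = load_contact(sample)
print(contact["phone"] or "no phone on record")
```

With strict mode the check should rarely fail, but it gives downstream code a single place to handle a malformed or truncated reply.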

The OpenAI Python SDK can derive the schema from a Pydantic model:

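from typing import Optional

from pydantic import BaseModel


class Contact(BaseModel):
    name: str
    email: str
    # No default value, so in Pydantic v2 the field is required but may be null.
    phone: Optional[str]


# Older SDK releases expose this helper as client.beta.chat.completions.parse.
response = client.chat.completions.parse(
    model="openai/gpt-oss-120b",
    messages=[
        {"role": "system", "content": "Extract contact info."},
        {"role": "user", "content": text},
    ],
    response_format=Contact,
)

# parsed is a Contact instance built from the JSON reply.
contact = response.choices[0].message.parsed
print(contact.name, contact.email)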

  • Make every field required and use ["string", "null"] for optional values: strict mode rejects missing keys but accepts null values.
  • Set additionalProperties: false so the model cannot add fields outside the schema.
  • To extract several records (multiple contacts), nest {"type": "array", "items": {...}} as a property of an outer object rather than using an array as the top-level schema.
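
The last tip can be sketched as follows, assuming the same contact fields as the single-record example. The "contacts" property, the "contact_list" schema name, and the sample reply are illustrative choices, not names required by the API.

```python
import json

# Schema for one contact, matching the single-contact example.
CONTACT_SCHEMA = {
    "type": "object",
    "properties": {
        "name": {"type": "string"},
        "email": {"type": "string"},
        "phone": {"type": ["string", "null"]},
    },
    "required": ["name", "email", "phone"],
    "additionalProperties": False,
}

# Hypothetical wrapper: the array lives under a "contacts" key of an outer
# object, and the outer object is itself strict (required + no extra keys).
CONTACT_LIST_FORMAT = {
    "type": "json_schema",
    "json_schema": {
        "name": "contact_list",
        "strict": True,
        "schema": {
            "type": "object",
            "properties": {
                "contacts": {"type": "array", "items": CONTACT_SCHEMA},
            },
            "required": ["contacts"],
            "additionalProperties": False,
        },
    },
}

# Made-up reply in the shape this schema describes.
sample = '{"contacts": [{"name": "Jane Doe", "email": "jane@example.com", "phone": null}]}'
for c in json.loads(sample)["contacts"]:
    print(c["name"], c["email"], c["phone"])
```

Passing CONTACT_LIST_FORMAT as response_format in the chat completion call would then return every contact found in the text under a single "contacts" key.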