Instructor vs raw JSON Schema — when to choose which?

Use Instructor when you write Python: less boilerplate, automatic retries and multi-provider support. Use raw JSON Schema with `strict: true` when latency matters (no client-side retry round-trips) or when you integrate with a platform that generates schemas from its own types (e.g. TypeScript with Zod 4's `z.toJSONSchema()`). In practice: new Python projects → Instructor, cross-language integrations → raw schema.

How do I handle a list of 100+ items for extraction?

Don't ask the model to return 100 items at once: quality drops and the context grows. Instead, split the input into chunks of roughly 2,000 tokens, extract per chunk and deduplicate the results by a key field. Instructor also accepts `Iterable[Model]` as `response_model`; with `stream=True` the objects arrive one by one, so parsing is incremental and you can stop halfway.
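A minimal sketch of the chunk-then-deduplicate approach, using only the standard library. The extraction call itself is left out; `chunk_text`, `dedupe_by_key` and the ~1.3 tokens-per-word estimate are illustrative assumptions, not part of Instructor (a real tokenizer gives exact counts).

```python
from typing import Iterable


def chunk_text(text: str, max_tokens: int = 2000, tokens_per_word: float = 1.3) -> list[str]:
    """Split text on blank lines into chunks of roughly max_tokens tokens.

    Tokens are estimated from word counts (assumption: ~1.3 tokens per word).
    A single paragraph longer than the budget becomes its own oversized chunk.
    """
    budget_words = int(max_tokens / tokens_per_word)
    chunks: list[str] = []
    current: list[str] = []
    current_words = 0
    for paragraph in text.split("\n\n"):
        words = len(paragraph.split())
        # Close the current chunk before it would exceed the budget.
        if current and current_words + words > budget_words:
            chunks.append("\n\n".join(current))
            current, current_words = [], 0
        current.append(paragraph)
        current_words += words
    if current:
        chunks.append("\n\n".join(current))
    return chunks


def dedupe_by_key(records: Iterable[dict], key: str) -> list[dict]:
    """Keep the first record per key value, ignoring case and surrounding spaces."""
    seen: set[str] = set()
    unique: list[dict] = []
    for record in records:
        normalised = str(record[key]).strip().lower()
        if normalised not in seen:
            seen.add(normalised)
            unique.append(record)
    return unique


# Made-up per-chunk results, as two extraction calls might return them:
per_chunk = [
    [{"product": "Coffee", "quantity": 3}, {"product": "Tea", "quantity": 1}],
    [{"product": "coffee ", "quantity": 3}, {"product": "Milk", "quantity": 2}],
]
merged = dedupe_by_key((item for chunk in per_chunk for item in chunk), key="product")
print([item["product"] for item in merged])  # ['Coffee', 'Tea', 'Milk']
```

Keeping the first occurrence means the earliest chunk wins; if later chunks can carry more complete data for the same key, merging fields is the safer choice.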

Do structured outputs work with local models (Ollama, vLLM)?

Yes: use `instructor.from_openai()` with an OpenAI client whose `base_url` points to the local server's OpenAI-compatible endpoint. Quality depends on the model: Llama 3.1 70B and Mistral Large handle simple schemas well. Don't rely on `strict: true` with local models; it is an OpenAI API feature, and local servers implement schema-constrained decoding through their own mechanisms rather than the same way as the OpenAI API. In production, test every model against your target schemas before deploying.


2026-06-05 · AI & Automation · 14 min

Structured outputs from AI: Pydantic, Instructor and JSON Schema in production

Paweł Wiszniewski

SEO & GEO Specialist · AI Engineer

Structured Outputs with JSON Schema or the Instructor library guarantee that the model returns data in exactly the format your code expects — validated either at the API level or in client code. This eliminates the entire string-parsing nightmare: no markdown fences, no comments inside JSON, no fields in the wrong type. If you are building a data extraction pipeline or an LLM integration with any downstream system, this is the only approach suitable for production.

How to stop parsing strings from GPT and start getting data ready to use in code — JSON Schema, Pydantic and the Instructor library.

Tuesday morning, production deployment. The model returns: `{"name": "Jan Kowalski", "age": "thirty-two", "tags": "python, django"}`. Your code expected `age` as int, `tags` as list — and it throws an exception. The model "tried its best", but it couldn't know that the list is `["python", "django"]`, not a string. This isn't an edge case — it's the daily reality when an LLM and code communicate through a string.

Three Approaches — and Why the First Two Fail

Most teams go through the same phases. Phase 1 — "tell GPT to return JSON" — works for a week, then the model adds a markdown fence or a comment and `json.loads` blows up. Phase 2 — JSON mode (`response_format={"type": "json_object"}`) — stable JSON, but without a schema the model decides the field shapes itself. Phase 3 — Structured Outputs with JSON Schema or Instructor — you get exactly what you described, validated either at the API level or in code.
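The difference between the first two phases fits in a few lines of standard-library Python. The `shape_errors` helper and the expected-type map are illustrative assumptions; the JSON-mode payload is the one from the Tuesday-morning example above.

```python
import json

# What the calling code expects (illustrative).
EXPECTED_TYPES = {"name": str, "age": int, "tags": list}


def shape_errors(data: dict, expected: dict[str, type]) -> list[str]:
    """Return one message per field whose Python type differs from the expected one."""
    return [
        f"{field}: expected {t.__name__}, got {type(data.get(field)).__name__}"
        for field, t in expected.items()
        if not isinstance(data.get(field), t)
    ]


# Phase 1 (prompt JSON): a comment inside the "JSON" breaks parsing outright.
phase1 = '{"name": "Jan Kowalski", // customer name\n "age": 32, "tags": ["python"]}'
try:
    json.loads(phase1)
except json.JSONDecodeError as exc:
    print("phase 1 parse error:", exc.msg)

# Phase 2 (JSON mode): valid JSON, but the model picked the field shapes itself.
phase2 = json.loads('{"name": "Jan Kowalski", "age": "thirty-two", "tags": "python, django"}')
print("phase 2 shape errors:", shape_errors(phase2, EXPECTED_TYPES))
```

Phase 2 passes `json.loads` and still fails the type check, which is exactly the gap a schema closes.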

3 approaches: from chaos to guaranteed structure

Approach	Setup	Stability	Validation	Schema	Typical result	Parse success
01 Prompt JSON	"Return the answer as JSON"	✗ random	✗ none	✗ none	json.loads() blows up	~60%
02 JSON Mode	response_format: json_object	✓ stable	✗ none	✗ model decides	A field can be int or string	~95%
03 Structured Outputs	JSON Schema + Instructor	✓ guaranteed	✓ automatic	✓ enforced	Type-safe Pydantic object	100%

JSON Schema and strict mode — API-side validation

OpenAI Structured Outputs (GPT-4o from the `gpt-4o-2024-08-06` snapshot onwards) enforce the schema during decoding: the model can only generate tokens that keep the output consistent with the defined structure. `strict: true` plus a `response_format` of type `json_schema` guarantees that the response matches the schema and parses without error, unless the model refuses or the output hits the token limit. Requirements: every object needs `additionalProperties: false` and every field must be listed in `required`; you implement optionality via `anyOf` with `{"type": "null"}`.
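Both requirements are mechanical, so a schema can be checked before it is sent. A rough sketch in plain Python; `strict_problems` is a hypothetical helper, not part of the OpenAI SDK, and it only walks `properties`, `items` and `anyOf` (no `$ref`/`$defs`).

```python
from typing import Any


def strict_problems(node: Any, path: str = "$") -> list[str]:
    """Report where a JSON Schema breaks the two strict-mode rules:
    additionalProperties: false on every object, every property in required.
    Only properties, items and anyOf are walked ($ref/$defs are ignored)."""
    problems: list[str] = []
    if not isinstance(node, dict):
        return problems
    if node.get("type") == "object":
        props = node.get("properties", {})
        if node.get("additionalProperties") is not False:
            problems.append(f"{path}: additionalProperties must be false")
        missing = sorted(set(props) - set(node.get("required", [])))
        if missing:
            problems.append(f"{path}: not in required: {', '.join(missing)}")
        for name, sub in props.items():
            problems += strict_problems(sub, f"{path}.{name}")
    if "items" in node:
        problems += strict_problems(node["items"], f"{path}[]")
    for i, option in enumerate(node.get("anyOf", [])):
        problems += strict_problems(option, f"{path}.anyOf[{i}]")
    return problems


# Optional "notes" done the strict way: listed in required, but nullable via anyOf.
strict_ok = {
    "type": "object",
    "properties": {
        "order_id": {"type": "string"},
        "notes": {"anyOf": [{"type": "string"}, {"type": "null"}]},
    },
    "required": ["order_id", "notes"],
    "additionalProperties": False,
}
# The usual non-strict way: "notes" simply left out of required.
loose = {
    "type": "object",
    "properties": {"order_id": {"type": "string"}, "notes": {"type": "string"}},
    "required": ["order_id"],
}
print(strict_problems(strict_ok))  # []
print(strict_problems(loose))
# ['$: additionalProperties must be false', '$: not in required: notes']
```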

json_schema_strict.py

```python
import json

from openai import OpenAI

client = OpenAI()

# Strict mode: every object lists all its fields in "required"
# and forbids extra keys with "additionalProperties": False.
SCHEMA = {
    "name": "order_extraction",
    "strict": True,
    "schema": {
        "type": "object",
        "properties": {
            "customer_name": {"type": "string"},
            "order_id": {"type": "string"},
            "items": {
                "type": "array",
                "items": {
                    "type": "object",
                    "properties": {
                        "product": {"type": "string"},
                        "quantity": {"type": "integer"},
                        "price_pln": {"type": "number"},
                    },
                    "required": ["product", "quantity", "price_pln"],
                    "additionalProperties": False,
                },
            },
            "total_pln": {"type": "number"},
        },
        "required": ["customer_name", "order_id", "items", "total_pln"],
        "additionalProperties": False,
    },
}

resp = client.chat.completions.create(
    model="gpt-4o",
    messages=[{"role": "user", "content": "Extract: Jan Kowalski, ORD-001234, 3x coffee 12.99 PLN, 1x tea 8.50 PLN"}],
    # Structured Outputs: the API constrains generation to SCHEMA
    response_format={"type": "json_schema", "json_schema": SCHEMA},
)

order = json.loads(resp.choices[0].message.content)
print(order["total_pln"])
```

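The result is valid JSON matching the schema, so `json.loads` won't throw, except in two cases: the model refuses (the SDK reports this in `message.refusal` instead of returning schema data) or the output is cut off by `max_tokens` (`finish_reason == "length"`). The downside is verbosity: for complex objects, hand-written JSON Schema quickly becomes unreadable and hard to maintain.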
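Strict mode constrains the JSON, but a refusal (`message.refusal`) or a response stopped at the token limit (`finish_reason == "length"`) still leaves you without parseable data. A small guard, written against the shape of `resp.choices[0]` in the OpenAI Python SDK; `parse_strict_choice` and `StructuredOutputError` are hypothetical names, and the demo uses a `SimpleNamespace` stand-in so it runs without an API key.

```python
import json
from types import SimpleNamespace
from typing import Any


class StructuredOutputError(Exception):
    """Raised when a strict-mode response carries no usable data (hypothetical)."""


def parse_strict_choice(choice: Any) -> dict:
    """Parse resp.choices[0] from a Structured Outputs call, or raise."""
    message = choice.message
    # Refusal: the model returned an explanation instead of schema data.
    if getattr(message, "refusal", None):
        raise StructuredOutputError(f"model refused: {message.refusal}")
    # Token limit reached: the JSON is most likely cut off mid-object.
    if choice.finish_reason == "length":
        raise StructuredOutputError("response truncated (finish_reason='length')")
    return json.loads(message.content)


# Stand-in shaped like resp.choices[0]; in real code pass the SDK object.
fake_choice = SimpleNamespace(
    finish_reason="stop",
    message=SimpleNamespace(refusal=None, content='{"total_pln": 47.47}'),
)
print(parse_strict_choice(fake_choice)["total_pln"])  # 47.47
```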

Pydantic as the schema description layer

Instead of writing JSON Schema by hand, describe the structure as a Pydantic class. `Model.model_json_schema()` generates the schema automatically from type hints and validators. The key: `Field(description=...)` — the LLM reads field descriptions and fills data far more accurately when it knows what you expect. `field_validator` lets you add business rules that JSON Schema can't express — sum validation, ID format, conditional rules.

pydantic_model.py

```python
import re
from typing import Optional

from pydantic import BaseModel, Field, ValidationInfo, field_validator


class OrderItem(BaseModel):
    product: str = Field(description="Product name exactly as written in the text")
    quantity: int = Field(ge=1, description="Number of units, min 1")
    price_pln: float = Field(gt=0, description="Unit price in PLN")


class Order(BaseModel):
    customer_name: str = Field(description="Customer's first and last name")
    order_id: str = Field(description="Order ID in format ORD-XXXXXX")
    items: list[OrderItem] = Field(description="List of all order items")
    total_pln: float = Field(description="Sum of all items in PLN")
    notes: Optional[str] = Field(default=None, description="Notes if provided, otherwise null")

    @field_validator("order_id")
    @classmethod
    def validate_order_id(cls, v: str) -> str:
        # "ORD-" followed by exactly six digits
        if not re.fullmatch(r"ORD-\d{6}", v):
            raise ValueError(f"order_id must be ORD-XXXXXX, got: {v}")
        return v

    @field_validator("total_pln")
    @classmethod
    def validate_total(cls, v: float, info: ValidationInfo) -> float:
        # info.data holds fields validated earlier (declared above total_pln);
        # "items" is absent if it failed its own validation.
        if "items" in info.data:
            expected = sum(i.price_pln * i.quantity for i in info.data["items"])
            if abs(v - expected) > 0.01:
                raise ValueError(f"total_pln {v} != sum of items {expected:.2f}")
        return v
```

`field_validator` lets you define business rules — sum validation, ID format, date ranges — that JSON Schema can't handle. A validation error gives you a concrete message you can pass back to the model in the next retry.

Instructor — 3 lines of code instead of your own parser

Instructor wraps the OpenAI client (and 10+ other providers) and turns the response directly into a validated Pydantic object. You don't need `json.loads`, `model.model_validate` or manual retry — the library handles it for you with 3 retries by default, sending the validation error message back to the model as context.

instructor_basic.py

```python
from typing import Literal

import instructor
from openai import OpenAI
from pydantic import BaseModel, Field

# Wrap the OpenAI client so create() accepts response_model=
client = instructor.from_openai(OpenAI())


class ProductReview(BaseModel):
    sentiment: Literal["positive", "negative", "neutral"]
    score: int = Field(ge=1, le=5, description="Rating 1–5")
    key_issues: list[str] = Field(description="List of main problems or strengths, max 5 points")
    would_recommend: bool
    summary: str = Field(max_length=200, description="One-sentence summary")


review = client.chat.completions.create(
    model="gpt-4o",
    response_model=ProductReview,  # parsed and validated into this class
    messages=[
        {"role": "user", "content": "Analyse: 'Product arrived damaged, support didn't pick up for 3 days, eventually got a refund but wasted my time. Never again.'"}
    ],
)

print(review.sentiment)
print(review.score)
print(review.key_issues)
```

`response_model=ProductReview` is all you need — Instructor generates the JSON Schema from the class, calls the API, parses the response, validates with Pydantic, and on failure automatically retries with the error appended to the conversation context.
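To make the retry mechanism concrete, here is a conceptual sketch of a validate-and-retry loop in plain Python. It is not Instructor's implementation: `validate_review` is a hand-written stand-in for Pydantic validation of two `ProductReview` fields, the model is a fake function, and the wording of the feedback message is an assumption.

```python
import json
from typing import Callable

Message = dict[str, str]


def validate_review(data: object) -> list[str]:
    """Hand-written stand-in for validating two ProductReview fields."""
    if not isinstance(data, dict):
        return ["response must be a JSON object"]
    errors = []
    if data.get("sentiment") not in ("positive", "negative", "neutral"):
        errors.append("sentiment: must be positive, negative or neutral")
    score = data.get("score")
    if not isinstance(score, int) or not 1 <= score <= 5:
        errors.append("score: must be an integer from 1 to 5")
    return errors


def extract_with_retries(call_llm: Callable[[list[Message]], str],
                         messages: list[Message], max_attempts: int = 3) -> dict:
    """Call, parse, validate; on failure feed the errors back and try again."""
    messages = list(messages)  # don't mutate the caller's history
    errors: list[str] = []
    for _ in range(max_attempts):
        raw = call_llm(messages)
        try:
            data = json.loads(raw)
        except json.JSONDecodeError as exc:
            data, errors = None, [f"invalid JSON: {exc.msg}"]
        else:
            errors = validate_review(data)
        if not errors:
            return data
        # Show the model its previous answer and what was wrong with it.
        messages.append({"role": "assistant", "content": raw})
        messages.append({"role": "user", "content": "Fix these validation errors:\n" + "\n".join(errors)})
    raise ValueError(f"still invalid after {max_attempts} attempts: {errors}")


# Fake model: the first answer has an out-of-range score, the second is fixed.
answers = iter(['{"sentiment": "negative", "score": 0}', '{"sentiment": "negative", "score": 1}'])
prompts_seen: list[str] = []


def fake_llm(msgs: list[Message]) -> str:
    prompts_seen.append(msgs[-1]["content"])
    return next(answers)


history = [{"role": "user", "content": "Analyse: 'Product arrived damaged...'"}]
print(extract_with_retries(fake_llm, history))  # {'sentiment': 'negative', 'score': 1}
```

The second prompt the fake model sees is the list of validation errors, which is the "model sees its own error" step in miniature.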

From a Pydantic class to a validated object

Pydantic Model (class with field descriptions) → instructor.from_openai() (wrap the client) → LLM call (response_model=Model) → JSON parse (automatic) → Pydantic validate (field_validator())

↻ Automatic retry (3× by default): when Pydantic validation fails, Instructor appends the error message to the model context and retries the call. The model "sees" its own error and fixes the data.

3× default retry limit · 10+ providers (OpenAI, Anthropic…)

Patterns: extraction, classification, normalisation

The three main use cases call for different schema choices. Extraction (pulling data from text): use `Optional` for fields that may not appear, and never force fields the model can't fill. Classification: use `Literal` or `Enum` instead of `str`, so any value outside the allowed set is rejected by validation (or never generated in strict mode). Normalisation: describe the exact output format with an example in `description` and verify it with `field_validator`.

Pattern	Field type	Key trick	Pitfall
Extraction	Optional[str]	null when field absent from text	Forcing fields that aren't there
Classification	Literal["a","b","c"]	Enum instead of str	Too many classes (>10) — quality drops
Date normalisation	str + validator	Format example in description	Timezones — always use UTC
List of items	list[Model]	"Extract ALL" in description	Duplicates — deduplicate in validator
Nested objects	BaseModel in BaseModel	Flat schema is faster and more accurate	Depth >3 — hallucinations
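The table's advice to keep timestamps in UTC can be enforced in code rather than left to the prompt. A minimal sketch of a validator body in plain Python; in a Pydantic model it would sit inside a `@field_validator` like `validate_date` in the example below. Rejecting timestamps without an offset (instead of assuming one) is a design choice of this sketch, not a rule from the table.

```python
from datetime import datetime, timezone
from typing import Optional


def normalise_to_utc(value: Optional[str]) -> Optional[str]:
    """Accept an ISO 8601 timestamp with an offset and return it in UTC with a Z suffix.

    None passes through (the extraction pattern: null when the text has no date).
    Timestamps without an offset are rejected rather than guessed.
    """
    if value is None:
        return None
    try:
        # datetime.fromisoformat() accepts a trailing "Z" only from Python 3.11 on.
        parsed = datetime.fromisoformat(value.replace("Z", "+00:00"))
    except ValueError:
        raise ValueError(f"expected ISO 8601, e.g. 2026-06-05T10:30:00Z, got: {value!r}") from None
    if parsed.tzinfo is None:
        raise ValueError(f"timestamp has no timezone offset: {value!r}")
    # Seconds precision is enough here; microseconds are dropped.
    return parsed.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")


print(normalise_to_utc("2026-06-05T12:30:00+02:00"))  # 2026-06-05T10:30:00Z
```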

instructor_patterns.py

```python
from datetime import datetime
from enum import Enum
from typing import Literal, Optional

import instructor
from openai import OpenAI
from pydantic import BaseModel, Field, field_validator


class Priority(str, Enum):
    LOW = "low"
    MEDIUM = "medium"
    HIGH = "high"
    CRITICAL = "critical"


class TicketExtraction(BaseModel):
    title: str = Field(max_length=100, description="Short ticket title")
    # Classification: values outside the Enum / Literal fail validation
    priority: Priority = Field(description="Priority based on urgency and business impact")
    category: Literal["bug", "feature", "question", "billing"]
    # Extraction: Optional + "otherwise null" lets the model admit missing data
    affected_users: Optional[int] = Field(default=None, ge=1, description="Number of affected users if stated, otherwise null")
    # Normalisation: format example in the description, checked by the validator below
    reported_at: Optional[str] = Field(default=None, description="Date in ISO 8601 format e.g. 2026-06-05T10:30:00Z, null if unknown")
    is_regression: bool = Field(description="True if it worked before")

    @field_validator("reported_at")
    @classmethod
    def validate_date(cls, v: Optional[str]) -> Optional[str]:
        if v is None:
            return v
        try:
            # fromisoformat() accepts a trailing "Z" only from Python 3.11 on
            datetime.fromisoformat(v.replace("Z", "+00:00"))
        except ValueError:
            raise ValueError(f"reported_at must be ISO 8601, got: {v}")
        return v


client = instructor.from_openai(OpenAI())

ticket = client.chat.completions.create(
    model="gpt-4o",
    response_model=TicketExtraction,
    messages=[{"role": "user", "content": "URGENT: login stopped working at 10:30, around 500 users can't log in, it worked before"}],
)

print(ticket.priority)
print(ticket.affected_users)
```

Instructor works with multiple providers — `instructor.from_anthropic()`, `instructor.from_gemini()`, `instructor.from_mistral()` — same Pydantic code, different client.

When structured output fails — 4 scenarios

Even with Instructor you hit walls. Here are the four main ones and how to get out.

1. The model can't fill a required field. Symptom: a retry loop, or the model hallucinates a value just to fill the field with something. Fix: make the field `Optional` and add `description="null if unknown"` so the model can admit the information is missing.

2. The schema is too complex. Symptom: the model fills a field with a random value instead of null. Fix: simplify to a flat structure. If you need complexity, split into two calls — first extracts flat data, second classifies or normalises.

3. Business validation fails after 3 retries. Symptom: `InstructorRetryException`. Fix: catch the exception and log the model's last attempt — often the rule is too restrictive or the prompt doesn't contain the information the validator expects. Loosen the validator or enrich the prompt with an example of a correct response.

4. The list has too few elements. Symptom: `items` has 2 entries instead of 5. Fix: add `"Extract ALL items — don't skip any"` to the `description`. Instructor also supports `Iterable[Model]` as `response_model`; with `stream=True`, objects arrive incrementally.

---

I build data extraction and classification systems for companies — from simple pipelines to complex multi-step architectures with business validation and monitoring. Get in touch — I start with an analysis of your input data and schema design.



SEO/GEO specialist (10 years) and AI engineer (3 years). I build search visibility, AI systems and automations that reduce costs and improve operational efficiency.



BIAŁYSTOK, PL

+48 732 022 086 pawel.wiszniewski95@gmail.com
