
Every developer who has built an LLM-powered application has encountered this: you ask the model to return JSON, and sometimes it does. Other times it wraps the JSON in markdown code blocks. Occasionally, it adds a helpful preamble before the JSON. And every so often, it returns JSON that is missing required fields or includes values that do not match your expected types.

OpenAI's Structured Outputs feature solves this problem by guaranteeing that responses match your schema. Not "usually match." Not "match with high probability." A guarantee: 100% schema adherence on OpenAI's own evaluations.

What You Can Build

Before getting into the mechanics, here is what Structured Outputs looks like in practice.

Data Extraction

Extract structured data from unstructured text. This works well for invoices, contracts, articles, or any document where you need specific fields.

from openai import OpenAI
from pydantic import BaseModel, Field
from typing import Optional

client = OpenAI()

class LineItem(BaseModel):
    # Strict mode rejects free-form dicts, so line items get their own model
    description: str
    quantity: float
    amount: float

class InvoiceData(BaseModel):
    vendor_name: str
    invoice_number: str
    date: str = Field(description="Invoice date in YYYY-MM-DD format")
    line_items: list[LineItem]
    subtotal: float
    tax: Optional[float]  # nullable, but always present in the output
    total: float
    currency: str = Field(description="ISO 4217 currency code, e.g. USD")

response = client.chat.completions.parse(
    model="gpt-4o",
    messages=[
        {"role": "system", "content": "Extract invoice data from the provided text."},
        {"role": "user", "content": invoice_text}
    ],
    response_format=InvoiceData
)

invoice = response.choices[0].message.parsed
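One caveat worth internalizing: Structured Outputs guarantees the shape of the data, not its arithmetic. A model can return a well-typed invoice whose numbers do not add up. A minimal post-validation sketch, using a trimmed-down stand-in for the invoice model above:

```python
from typing import Optional
from pydantic import BaseModel

class InvoiceData(BaseModel):
    # Trimmed-down stand-in for the full invoice model
    subtotal: float
    tax: Optional[float]
    total: float

def totals_consistent(inv: InvoiceData, tolerance: float = 0.01) -> bool:
    """Check that subtotal + tax matches total within a rounding tolerance."""
    expected = inv.subtotal + (inv.tax or 0.0)
    return abs(expected - inv.total) <= tolerance

invoice = InvoiceData(subtotal=100.0, tax=8.5, total=108.5)
print(totals_consistent(invoice))  # True
```

Schema validation comes for free; semantic validation like this is still your job.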

Classification with Confidence

Classify inputs and get structured metadata about the classification.

from pydantic import BaseModel
from typing import Literal

class SupportTicketClassification(BaseModel):
    category: Literal["billing", "technical", "account", "general"]
    priority: Literal["low", "medium", "high", "urgent"]
    sentiment: Literal["positive", "neutral", "negative", "angry"]
    requires_human: bool
    summary: str
    suggested_response_template: str

response = client.chat.completions.parse(
    model="gpt-4o",
    messages=[
        {"role": "system", "content": "Classify the support ticket and suggest handling."},
        {"role": "user", "content": ticket_text}
    ],
    response_format=SupportTicketClassification
)

ticket = response.choices[0].message.parsed
if ticket.requires_human or ticket.priority == "urgent":
    escalate_to_human(ticket)

Content Generation with Structure

Generate content that fits a specific format. Useful for product descriptions, SEO content, or any templated output.

from pydantic import BaseModel
from typing import Literal

class Section(BaseModel):
    # Strict mode rejects free-form dicts, so sections get their own model
    heading: str
    key_points: list[str]

class BlogPostOutline(BaseModel):
    title: str
    meta_description: str
    sections: list[Section]
    target_audience: str
    primary_keyword: str
    secondary_keywords: list[str]
    estimated_word_count: int
    tone: Literal["professional", "casual", "technical", "friendly"]

response = client.chat.completions.parse(
    model="gpt-4o",
    messages=[
        {"role": "system", "content": "Create a blog post outline for the given topic."},
        {"role": "user", "content": "Topic: Python async programming best practices"}
    ],
    response_format=BlogPostOutline
)

How Structured Outputs Work

Structured Outputs uses constrained decoding to force the model's output to conform to your JSON schema. During generation, the model can only produce tokens that would result in valid JSON matching your schema. It is not a post-processing filter or a "try again if invalid" loop. The constraint is applied at the token generation level.

In practice, this means required fields are always present, enum values are always valid options, types are always correct (strings are strings, numbers are numbers), and nested objects follow their defined structure.

The trade-off is that the model must understand your schema well enough to generate meaningful content within those constraints. Complex schemas with many nested objects or unusual structures may require more detailed prompting to get useful results.
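To build intuition for what "constraint at the token generation level" means, here is a deliberately tiny sketch; this is not OpenAI's implementation, just the core idea. Suppose the schema admits only two complete outputs. At every step, any candidate token that would make the text no longer a prefix of some valid output is masked out:

```python
# Toy illustration of constrained decoding (not the real implementation).
# Schema: only these two complete outputs are valid.
VALID_OUTPUTS = ['{"status": "ok"}', '{"status": "error"}']

def allowed_tokens(prefix: str, candidates: list[str]) -> list[str]:
    """Keep only candidates that leave the output a prefix of
    some complete, schema-valid string."""
    return [
        tok for tok in candidates
        if any(v.startswith(prefix + tok) for v in VALID_OUTPUTS)
    ]

# Midway through generation, the decoder has produced:
prefix = '{"status": "'
# The model may "want" to emit "pending", but that token is masked out:
print(allowed_tokens(prefix, ['ok"', 'error"', 'pending"']))  # ['ok"', 'error"']
```

Because invalid continuations are never sampled in the first place, the final output cannot fail schema validation.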

Two Ways to Use Structured Outputs

OpenAI provides Structured Outputs through two mechanisms, and choosing the right one matters.

1. response_format (for direct responses)

Use response_format when you want the model's response to the user to be structured. This is ideal for data extraction, content generation with specific fields, or any case where the final output should be JSON.

from openai import OpenAI
from pydantic import BaseModel

client = OpenAI()

class MovieReview(BaseModel):
    title: str
    rating: float
    summary: str
    pros: list[str]
    cons: list[str]

response = client.chat.completions.parse(
    model="gpt-4o",
    messages=[
        {"role": "system", "content": "Extract movie review details."},
        {"role": "user", "content": "The new Dune movie is visually stunning..."}
    ],
    response_format=MovieReview
)

review = response.choices[0].message.parsed
print(f"{review.title}: {review.rating}/10")

The Python SDK's parse() method accepts Pydantic models directly in response_format, handles the JSON schema conversion, and returns a parsed object. No manual JSON parsing required.
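Under the hood, that convenience is schema conversion on the way in and validation on the way out. For illustration, here is the manual equivalent, with a hard-coded JSON string standing in for the message content of a real response:

```python
from pydantic import BaseModel

class MovieReview(BaseModel):
    title: str
    rating: float
    summary: str
    pros: list[str]
    cons: list[str]

# Stand-in for response.choices[0].message.content
content = (
    '{"title": "Dune: Part Two", "rating": 9.0, "summary": "Stunning.", '
    '"pros": ["visuals"], "cons": ["length"]}'
)

# What the parse() helper does for you, roughly:
review = MovieReview.model_validate_json(content)
print(review.title)  # Dune: Part Two
```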

2. Function Calling with strict: true

Use function calling when the model should decide whether to call a function and with what parameters. This is for tool use, API integrations, and agent-style applications where the model takes actions.

from openai import OpenAI, pydantic_function_tool
from pydantic import BaseModel, Field

client = OpenAI()

class WeatherQuery(BaseModel):
    location: str
    unit: str = Field(description="Temperature unit: celsius or fahrenheit")

tools = [
    # pydantic_function_tool builds a strict-mode tool definition:
    # it sets "strict": true, marks every field as required, and adds
    # additionalProperties: false, which a raw model_json_schema() lacks.
    pydantic_function_tool(
        WeatherQuery,
        name="get_weather",
        description="Get current weather for a location",
    )
]

response = client.chat.completions.create(
    model="gpt-4o",
    messages=[{"role": "user", "content": "What's the weather in Tokyo?"}],
    tools=tools
)

The strict: true flag tells OpenAI to apply the same constrained decoding to function call arguments, ensuring the parameters always match your schema.
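When the model does call the tool, the arguments arrive as a JSON string. With strict mode they are guaranteed to match the schema, so validating them back into the Pydantic model cannot fail on shape. A sketch, with a hard-coded string standing in for the arguments field of a real tool call:

```python
from pydantic import BaseModel

class WeatherQuery(BaseModel):
    location: str
    unit: str

# Stand-in for response.choices[0].message.tool_calls[0].function.arguments
arguments = '{"location": "Tokyo", "unit": "celsius"}'

query = WeatherQuery.model_validate_json(arguments)
print(query.location)  # Tokyo
```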

The key question for choosing between these two: Is the structured data the final output (use response_format) or an intermediate step that triggers an action (use function calling)?

Schema Design Tips

Structured Outputs has specific requirements and limitations that affect how you design schemas.

All fields must have explicit types. Use Pydantic's type hints consistently and avoid Any. Strict mode also treats every field as required: model truly optional data as a union with None (Optional[...]) rather than by omitting the field or relying on defaults.

Use descriptions liberally. The model uses field descriptions to understand what content belongs where. A field named summary with no description is ambiguous. A field with description="A 2-3 sentence summary of the main argument" is clear.

class Article(BaseModel):
    title: str = Field(description="Article headline, 60 chars max")
    summary: str = Field(description="2-3 sentence summary of key points")
    body: str = Field(description="Full article text in markdown format")

Enum values are strict. If you define an enum, the model can only output those exact values. This is powerful for classification but means you need to anticipate all valid options.
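One common mitigation when you cannot enumerate every option up front is an explicit catch-all value. A sketch using a hypothetical Classification model:

```python
from typing import Literal
from pydantic import BaseModel, Field, ValidationError

class Classification(BaseModel):
    category: Literal["billing", "technical", "account", "general", "other"] = Field(
        description="Use 'other' only when no listed category fits"
    )

# The enum is enforced: listed values pass, anything else fails validation.
print(Classification(category="other").category)  # other
try:
    Classification(category="sales")
except ValidationError:
    print("rejected")  # rejected
```

The same Literal is what the model is constrained to at generation time, so "other" gives it a valid escape hatch instead of forcing a bad fit into an existing category.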

Set additionalProperties: false. When using raw JSON schemas (not Pydantic), you must explicitly set additionalProperties: false on every object and list every property in required. The SDK handles both automatically when you pass a Pydantic model.
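For reference, this is the shape of a raw json_schema response_format, trimmed to two fields. Note the fully populated required list and additionalProperties: false, both mandatory in strict mode:

```python
# Raw (non-Pydantic) response_format for Structured Outputs
response_format = {
    "type": "json_schema",
    "json_schema": {
        "name": "movie_review",
        "strict": True,
        "schema": {
            "type": "object",
            "properties": {
                "title": {"type": "string"},
                "rating": {"type": "number"},
            },
            # Strict mode: every property listed, no extras allowed
            "required": ["title", "rating"],
            "additionalProperties": False,
        },
    },
}
```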

Watch the limits. Schemas can have up to 1,000 enum values total. Single enum properties with more than 250 values have a 15,000 character limit for all values combined.

Common Gotchas

Model compatibility. Structured Outputs via response_format is supported on GPT-4o and GPT-4o mini (from gpt-4o-2024-08-06 onward), GPT-4.1 and its variants, the o-series reasoning models, and newer releases including the GPT-5 family. Function calling with strict: true works on all models that support tool use. Check the current documentation for the latest supported model list, as OpenAI adds support with each new release.

First request latency. The first request with a new schema has additional latency while OpenAI compiles the schema into a constrained decoding grammar. Subsequent requests with the same schema are faster because the compiled grammar is cached. If latency on the first call matters, consider warming the cache during deployment.

Token consumption. Structured outputs with complex schemas use more tokens than unstructured responses because the model must generate all JSON syntax: keys, braces, quotes, and commas. For schemas with many fields, this can meaningfully affect both cost and latency. Keep schemas as lean as possible for frequently called endpoints.

Parallel function calls. Structured Outputs does not work with parallel function calls. If your use case requires multiple simultaneous function calls, set parallel_tool_calls: false.

Refusals. The model can still refuse requests that violate content policies. When this happens, you get a refusal message instead of structured output. Always check response.choices[0].message.refusal.

response = client.chat.completions.parse(...)

if response.choices[0].message.refusal:
    print(f"Request refused: {response.choices[0].message.refusal}")
else:
    result = response.choices[0].message.parsed
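To make that check hard to forget, you can centralize it in a small helper. RefusedError and parsed_or_raise are names invented here, and the mock below stands in for response.choices[0].message:

```python
from types import SimpleNamespace

class RefusedError(Exception):
    """Raised when the model returned a refusal instead of parsed output."""

def parsed_or_raise(message):
    # message is expected to look like response.choices[0].message
    if getattr(message, "refusal", None):
        raise RefusedError(message.refusal)
    return message.parsed

# Mock of a successful response message:
ok = SimpleNamespace(refusal=None, parsed={"title": "Dune"})
print(parsed_or_raise(ok))  # {'title': 'Dune'}
```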

Streaming. Structured Outputs does support streaming. The model streams valid partial JSON that, when complete, forms a valid response matching your schema. The OpenAI Python and Node SDKs provide stream helpers that handle incremental parsing. This is useful for user-facing applications where you want to display fields as they arrive rather than waiting for the complete response.
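The Python SDK's stream helper (client.beta.chat.completions.stream in recent versions) yields typed events; exact event names and shapes vary by SDK version, so the sketch below processes mocked events rather than a live stream. It shows the core pattern: accumulate "content.delta" events into the growing JSON text.

```python
from types import SimpleNamespace

def accumulate_content(events) -> str:
    """Concatenate content deltas from a stream of events.
    Mirrors iterating the SDK's stream helper, where content.delta
    events carry incremental pieces of the JSON response."""
    chunks = []
    for event in events:
        if event.type == "content.delta":
            chunks.append(event.delta)
    return "".join(chunks)

# Mocked events standing in for what the SDK stream yields:
events = [
    SimpleNamespace(type="content.delta", delta='{"title": '),
    SimpleNamespace(type="content.delta", delta='"Dune"}'),
    SimpleNamespace(type="content.done"),
]
print(accumulate_content(events))  # {"title": "Dune"}
```

In a real UI you would render each partial snapshot as it arrives instead of buffering to the end.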

Structured Outputs vs JSON Mode

OpenAI also offers a simpler "JSON mode" that ensures valid JSON but does not enforce a schema. With Structured Outputs available, there is rarely a reason to use JSON mode.

JSON mode guarantees valid JSON syntax. Structured Outputs guarantees valid JSON that matches your exact schema. Use Structured Outputs.

This Is Not Just an OpenAI Feature

While this post focuses on OpenAI's implementation, the concept of structured outputs is now available across major LLM providers. Anthropic's Claude recently launched structured outputs in public beta for Claude Sonnet 4.5 and Opus 4.1, with JSON schema enforcement and strict tool use. Google's Gemini API offers structured output with JSON schema support across Gemini 2.5 models and later. The underlying technique of constrained decoding varies by provider, and each has its own schema limitations, but the developer experience is converging: define a schema, get guaranteed-valid data back.

If you are building a production application, designing your structured output logic with a provider-agnostic Pydantic model at the center makes it straightforward to swap or add providers later.

Integration with Frameworks

If you are using higher-level frameworks, Structured Outputs integrates cleanly. LangChain provides with_structured_output() on chat models. Pydantic AI has native support via output types. And Instructor is built specifically for structured extraction with additional validation features, including automatic retries and streaming support on top of the base Structured Outputs feature.

Why This Should Be Your Default

Structured Outputs removes an entire category of LLM integration bugs. Instead of writing defensive parsing code, retry logic, and validation layers, you define a Pydantic model and get guaranteed-valid data back.

For any production application that needs reliable structured data from an LLM, this should be your default approach. The 100% schema adherence is not marketing speak. It is a technical guarantee backed by constrained decoding.

Define your schema. Trust your output. Focus on what you are building instead of parsing edge cases.

We have been building production AI systems since the early days of the API, and structured outputs have fundamentally changed how we approach LLM integrations for our clients. If you are working on an AI-powered application and need help with reliable data pipelines, agent workflows, or LLM integration architecture, we would like to hear about your project.
