Reasoning models
Long chain-of-thought, tool-using, plan-and-execute models. Configure effort, capture summaries.
Reasoning models think before they answer. Where a standard chat model starts streaming its answer immediately, a reasoning model first runs an internal chain-of-thought, evaluates multiple approaches, and only then starts streaming a final answer. They're slower per request but materially better at math, code, planning, and multi-step tasks. Configure them through providerOptions on streamText to get both the reasoning channel and the final answer.
Available reasoning models
Filter the catalog by the Reasoning capability on /models. The big four:
| Model | Reasoning style |
|---|---|
| openai/gpt-5.5-pro, gpt-5.4-pro | Deep CoT; configurable effort + summary |
| openai/o4, o3-mini | Pure reasoning; no system prompt |
| anthropic/claude-opus-4.6 | Extended thinking with dollar-denominated budget |
| deepseek/r2 | Open-weights reasoning; transparent CoT |
OpenAI reasoning (gpt-5*, o*)
Pass reasoningEffort and reasoningSummary via providerOptions.openai:
```ts
import { streamText } from "ai"

const result = streamText({
  model: "openai/gpt-5.5",
  baseURL: "https://synapse.garden/api/v1",
  apiKey: process.env.MG_KEY,
  prompt: "Prove that the square root of 2 is irrational.",
  providerOptions: {
    openai: {
      reasoningEffort: "high", // 'minimal' | 'low' | 'medium' | 'high' | 'none'
      reasoningSummary: "detailed", // 'auto' | 'concise' | 'detailed'
    },
  },
})

for await (const part of result.fullStream) {
  if (part.type === "reasoning") process.stdout.write(`\x1b[90m${part.textDelta}\x1b[0m`)
  if (part.type === "text-delta") process.stdout.write(part.textDelta)
}
```

| Effort | Behavior |
|---|---|
| none | Same as a non-reasoning model. Fast, cheap, no CoT. |
| minimal | Quick CoT for simple multi-step tasks. |
| low | Default for most apps. Balanced. |
| medium | Multi-page reasoning chains. |
| high | Maximum effort. Can take 30+ seconds per call. |
| Summary | Behavior |
|---|---|
| auto | Model decides. Often produces a brief recap. |
| concise | One-paragraph recap of the reasoning. |
| detailed | Full step-by-step recap before the answer. |
For openai/gpt-5* models, you must set both reasoningEffort and reasoningSummary to receive reasoning output. If you set only one, the reasoning channel comes back empty.
Anthropic extended thinking
Claude exposes reasoning as extended thinking with a dollar-denominated budget:
```ts
const result = streamText({
  model: "anthropic/claude-opus-4.6",
  prompt: "Find the optimal route through these 12 cities (TSP, ~100km grid).",
  providerOptions: {
    anthropic: {
      thinkingBudget: 0.05, // dollars per request — caps internal reasoning spend
    },
  },
})
```

Anthropic spends up to thinkingBudget USD on internal reasoning, then produces the final answer. Higher budget → deeper reasoning, more tokens, longer wait. Common values:
- 0.001 — quick checks
- 0.01 — typical reasoning task
- 0.05–0.10 — competition math, complex planning
- 0.50+ — research-grade exploration
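Since thinkingBudget is just a number, a small helper keeps budget choices consistent across a codebase. A minimal sketch; the tier names are our own, not part of the API, and the values mirror the guidance above:

```typescript
// Map a coarse task tier to a thinkingBudget value (USD per request).
// Tune these numbers for your own workload and cost tolerance.
type TaskTier = "quick" | "typical" | "hard" | "research"

function pickThinkingBudget(tier: TaskTier): number {
  switch (tier) {
    case "quick": return 0.001   // quick checks
    case "typical": return 0.01  // typical reasoning task
    case "hard": return 0.05     // competition math, complex planning
    case "research": return 0.5  // research-grade exploration
  }
}

// Then: providerOptions: { anthropic: { thinkingBudget: pickThinkingBudget("hard") } }
```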
DeepSeek r2
DeepSeek's reasoning model exposes its CoT explicitly — useful for transparency or for distilling into smaller models:
```ts
const result = streamText({
  model: "deepseek/r2",
  prompt: "Design a rate limiter for a multi-region API.",
})

for await (const part of result.fullStream) {
  if (part.type === "reasoning") {
    // Full CoT — markdown-formatted, can be very long
    process.stdout.write(part.textDelta)
  }
  if (part.type === "text-delta") {
    // Final answer
    process.stdout.write(part.textDelta)
  }
}
```

When to use reasoning models
Use reasoning when:
- Math — proofs, optimization, calculation chains
- Code — multi-file refactors, debugging dense logic, designing data structures
- Planning — multi-step task decomposition, scheduling, route optimization
- Comparison — evaluating multiple options against complex criteria
- Adversarial — anything that needs to anticipate edge cases
Don't use reasoning when:
- The answer is a simple lookup (use a fast chat model)
- The user expects an answer in <2 seconds (reasoning models routinely take 5–30s)
- You're on a tight token budget (CoT can 10× your input bill)
- Streaming UX matters more than answer quality (TTFT can be 5+ seconds)
Latency profile
Reasoning models stream the final answer at normal speed — but TTFT can be 15–60× longer than a non-reasoning model, because the entire CoT happens before the first answer token appears.
Typical TTFT (time to first answer token, not first reasoning token):
| Model | Effort | TTFT |
|---|---|---|
| gpt-5.4 (not reasoning) | n/a | 400–800 ms |
| gpt-5.4 w/ reasoning | low | 2–4 s |
| gpt-5.4 w/ reasoning | medium | 5–15 s |
| gpt-5.4 w/ reasoning | high | 15–60 s |
| o4 | n/a | 8–30 s |
| claude-opus-4.6 | budget $0.01 | 4–12 s |
| claude-opus-4.6 | budget $0.10 | 30–120 s |
If you want progressive UX, render the reasoning summary in a collapsible "thinking" pane while it streams, then reveal the answer when it lands.
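TTFT is also easy to measure yourself. A generic sketch that wraps any async iterable (such as result.fullStream) and reports the time to its first item; to measure time to the first answer token specifically, fire the callback on the first text-delta part instead of the first part of any type:

```typescript
// Forward every part of a stream unchanged, reporting the time (in ms) to the
// first item. Works on any AsyncIterable, so it is not tied to one provider.
async function* withTtft<T>(
  stream: AsyncIterable<T>,
  onFirst: (ms: number) => void,
): AsyncGenerator<T> {
  const start = Date.now()
  let seenFirst = false
  for await (const part of stream) {
    if (!seenFirst) {
      onFirst(Date.now() - start)
      seenFirst = true
    }
    yield part
  }
}

// Usage: for await (const part of withTtft(result.fullStream, ms => console.log(`TTFT ${ms}ms`))) { ... }
```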
Reasoning + tools
Reasoning models can use tools, but the order matters: the model usually thinks first, then issues tool calls, then thinks again with the results, then answers.
```ts
streamText({
  model: "openai/gpt-5.5",
  prompt: "Plan a multi-leg flight from SFO to Tokyo with two stopovers, optimizing for cost.",
  providerOptions: {
    openai: { reasoningEffort: "high", reasoningSummary: "auto" },
  },
  tools: {
    searchFlights: …,
    getAirportInfo: …,
    calculateLayover: …,
  },
  maxSteps: 6,
})
```

You'll see reasoning parts before any tool call, and the reasoning often references the tool calls as planned actions.
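To check that ordering in your own logs, you can collapse the stream's part types into distinct phases. A small sketch; the part-type strings match the ones used in the loops above:

```typescript
// Collapse consecutive duplicate part types into distinct phases, so the
// think / tool-call / think / answer ordering is easy to log or assert on.
function phases(partTypes: string[]): string[] {
  const out: string[] = []
  for (const t of partTypes) {
    if (out[out.length - 1] !== t) out.push(t)
  }
  return out
}

// While iterating result.fullStream, collect each part.type into an array,
// then log phases(types), e.g. ["reasoning", "tool-call", "reasoning", "text-delta"].
```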
Cost
Reasoning eats tokens. Per-request cost can be 5–30× a non-reasoning call on the same prompt. Strategies to keep cost under control:
- Tier your routing. Use a non-reasoning model first; only escalate to reasoning when the task fails confidence checks.
- Cap maxSteps when combined with tools.
- Set reasoningEffort: 'low' or 'minimal' for most production traffic; reserve 'high' for explicit requests.
- Use thinkingBudget on Claude to enforce a hard ceiling — it's the only provider that exposes this.
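The tiered-routing strategy boils down to a two-pass call: cheap model first, escalate when a confidence check fails. A sketch of the escalation decision; the confidence score and threshold here are hypothetical, and in practice might come from a verifier model or self-consistency sampling:

```typescript
// Decide whether to escalate from a non-reasoning draft to a reasoning model.
// The confidence input and the length heuristic are illustrative placeholders.
function shouldEscalate(answer: string, confidence: number, threshold = 0.7): boolean {
  if (confidence < threshold) return true
  // Cheap heuristic: very short answers to complex prompts are often punts.
  return answer.trim().length < 20
}

// Sketch of the two-pass flow (assumes the generateText setup used elsewhere on this page):
//   const draft = await generateText({ model: "openai/gpt-5.4", prompt })
//   if (shouldEscalate(draft.text, scoreConfidence(draft))) {
//     return generateText({ model: "openai/gpt-5.5", prompt,
//       providerOptions: { openai: { reasoningEffort: "high", reasoningSummary: "auto" } } })
//   }
```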
Capturing the reasoning trace
By default, the reasoning summary is included in the response. The full hidden CoT is not exposed for OpenAI / Anthropic — only the summary. DeepSeek r2 returns the full CoT.
For debugging, log the reasoning summary alongside the final answer:
```ts
import { generateText } from "ai"

const { text, reasoning } = await generateText({
  model: "openai/gpt-5.5",
  prompt: "...",
  providerOptions: {
    openai: { reasoningEffort: "medium", reasoningSummary: "detailed" },
  },
})

console.log("Reasoning:", reasoning)
console.log("Answer:", text)
```

For audit trails, store the reasoning summary alongside the final text — it's invaluable when you're debugging a wrong answer six months later.
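A minimal shape for that audit trail, one JSONL line per request. The field names are our own, not part of any API; add request IDs, model settings, and latency as needed:

```typescript
// One audit record per request, serialized as a JSONL line.
interface ReasoningAudit {
  ts: string        // ISO timestamp
  model: string
  prompt: string
  reasoning: string // the reasoning summary (or full CoT for deepseek/r2)
  answer: string
}

function toAuditLine(record: ReasoningAudit): string {
  return JSON.stringify(record)
}

// Append toAuditLine(record) + "\n" to a log file or ship it to your log pipeline.
```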