Reasoning models

Long chain-of-thought, tool-using, plan-and-execute models. Configure effort, capture summaries.

FIG. 00 · REASONING MODELS · BRANCHING THOUGHT

Reasoning models think before they answer. Unlike standard chat models that produce tokens one at a time and never look back, reasoning models perform an internal chain-of-thought, evaluate multiple approaches, and only then start streaming a final answer. They're slower per request but materially better at math, code, planning, and multi-step tasks. Configure them through providerOptions on streamText to get both the reasoning channel and the final answer.

FIG. 01 · THINK → ANSWER (schematic)
A reasoning request runs an internal chain-of-thought first, then streams the final answer. The `reasoning` parts arrive before any `text-delta`, so TTFT to the *answer* is much higher than a non-reasoning call — render the summary while you wait.

Available reasoning models

Filter the catalog by the Reasoning capability on /models. The big four:

Model                              Reasoning style
openai/gpt-5.5-pro, gpt-5.4-pro    Deep CoT; configurable effort + summary
openai/o4, o3-mini                 Pure reasoning; no system prompt
anthropic/claude-opus-4.6          Extended thinking with dollar-denominated budget
deepseek/r2                        Open-weights reasoning; transparent CoT

OpenAI reasoning (gpt-5*, o*)

Pass reasoningEffort and reasoningSummary via providerOptions.openai:

import { streamText } from "ai"

const result = streamText({
  model: "openai/gpt-5.5",
  baseURL: "https://synapse.garden/api/v1",
  apiKey: process.env.MG_KEY,
  prompt: "Prove that the square root of 2 is irrational.",
  providerOptions: {
    openai: {
      reasoningEffort: "high",      // 'minimal' | 'low' | 'medium' | 'high' | 'none'
      reasoningSummary: "detailed", // 'auto' | 'concise' | 'detailed'
    },
  },
})

for await (const part of result.fullStream) {
  if (part.type === "reasoning") process.stdout.write(`\x1b[90m${part.textDelta}\x1b[0m`)
  if (part.type === "text-delta") process.stdout.write(part.textDelta)
}

Effort     Behavior
none       Same as a non-reasoning model. Fast, cheap, no CoT.
minimal    Quick CoT for simple multi-step tasks.
low        Default for most apps. Balanced.
medium     Multi-page reasoning chains.
high       Maximum effort. Can take 30+ seconds per call.

Summary    Behavior
auto       Model decides. Often produces a brief recap.
concise    One-paragraph recap of the reasoning.
detailed   Full step-by-step recap before the answer.
Set both for gpt-5* models

For openai/gpt-5* models, you must set both reasoningEffort and reasoningSummary to receive reasoning output. If you set only one, the reasoning channel comes back empty.

Anthropic extended thinking

Claude exposes reasoning as extended thinking with a dollar-denominated budget:

const result = streamText({
  model: "anthropic/claude-opus-4.6",
  prompt: "Find the optimal route through these 12 cities (TSP, ~100km grid).",
  providerOptions: {
    anthropic: {
      thinkingBudget: 0.05, // dollars per request — caps internal reasoning spend
    },
  },
})

Anthropic spends up to thinkingBudget USD on internal reasoning, then produces the final answer. Higher budget → deeper reasoning, more tokens, longer wait. Common values:

  • 0.001 — quick checks
  • 0.01 — typical reasoning task
  • 0.05–0.10 — competition math, complex planning
  • 0.50+ — research-grade exploration
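
The tiers above can be wrapped in a small helper. A minimal sketch; pickThinkingBudget and its task labels are illustrative names, not part of any SDK:

```typescript
// Hypothetical helper: map a task label to one of the budget tiers above.
// Labels and values are illustrative, not an official API.
type TaskTier = "quick" | "typical" | "hard" | "research"

const THINKING_BUDGETS: Record<TaskTier, number> = {
  quick: 0.001,   // quick checks
  typical: 0.01,  // typical reasoning task
  hard: 0.1,      // competition math, complex planning
  research: 0.5,  // research-grade exploration
}

function pickThinkingBudget(tier: TaskTier): number {
  return THINKING_BUDGETS[tier]
}

// Usage: spread into providerOptions.anthropic
const anthropicOptions = { thinkingBudget: pickThinkingBudget("typical") }
```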

DeepSeek r2

DeepSeek's reasoning model exposes its CoT explicitly — useful for transparency or for distilling into smaller models:

const result = streamText({
  model: "deepseek/r2",
  prompt: "Design a rate limiter for a multi-region API.",
})

for await (const part of result.fullStream) {
  if (part.type === "reasoning") {
    // Full CoT — markdown-formatted, can be very long
    process.stdout.write(part.textDelta)
  }
  if (part.type === "text-delta") {
    // Final answer
    process.stdout.write(part.textDelta)
  }
}
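
To use r2's transparent CoT for distillation, one approach is to capture (prompt, reasoning, answer) triples as JSONL. A sketch, assuming you've accumulated the reasoning and text strings from the stream above; the record shape is a common convention, not a fixed format:

```typescript
// Build a JSONL distillation dataset from captured r2 traces.
interface ReasoningTrace {
  prompt: string
  reasoning: string // full CoT from the `reasoning` parts
  answer: string    // final text from the `text-delta` parts
}

function toJsonl(traces: ReasoningTrace[]): string {
  // One JSON object per line, the usual fine-tuning input format.
  return traces.map((t) => JSON.stringify(t)).join("\n")
}

const dataset = toJsonl([
  {
    prompt: "Design a rate limiter for a multi-region API.",
    reasoning: "First, consider where state lives...",
    answer: "Use a token bucket per region...",
  },
])
```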

When to use reasoning models

Use reasoning when:

  • Math — proofs, optimization, calculation chains
  • Code — multi-file refactors, debugging dense logic, designing data structures
  • Planning — multi-step task decomposition, scheduling, route optimization
  • Comparison — evaluating multiple options against complex criteria
  • Adversarial — anything that needs to anticipate edge cases

Don't use reasoning when:

  • The answer is a simple lookup (use a fast chat model)
  • The user expects an answer in <2 seconds (reasoning models routinely take 5–30s)
  • You're on a tight token budget (CoT can 10× your input bill)
  • Streaming UX matters more than answer quality (TTFT can be 5+ seconds)

Latency profile

Reasoning models stream the final answer at normal speed — but TTFT can be 15–60× longer than a non-reasoning model, because the entire CoT happens before the first answer token appears.

Typical TTFT (time to first answer token, not first reasoning token):

Model                     Effort          TTFT
gpt-5.4 (not reasoning)   n/a             400–800 ms
gpt-5.4 w/ reasoning      low             2–4 s
gpt-5.4 w/ reasoning      medium          5–15 s
gpt-5.4 w/ reasoning      high            15–60 s
o4                        n/a             8–30 s
claude-opus-4.6           budget $0.01    4–12 s
claude-opus-4.6           budget $0.10    30–120 s

If you want progressive UX, render the reasoning summary in a collapsible "thinking" pane while it streams, then reveal the answer when it lands.
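
One way to implement that: accumulate the two channels separately while consuming fullStream, and let the UI render the reasoning buffer in a collapsible pane. A minimal sketch, assuming parts carry type and textDelta as in the examples above:

```typescript
interface StreamPart { type: string; textDelta?: string }

// Split a fullStream into a "thinking" buffer and an answer buffer.
// Wire `onReasoning` to a collapsible pane and `onText` to the main view.
async function splitStream(
  stream: AsyncIterable<StreamPart>,
  onReasoning: (s: string) => void,
  onText: (s: string) => void,
): Promise<{ reasoning: string; text: string }> {
  let reasoning = ""
  let text = ""
  for await (const part of stream) {
    if (part.type === "reasoning" && part.textDelta) {
      reasoning += part.textDelta
      onReasoning(reasoning) // update the thinking pane as it streams
    }
    if (part.type === "text-delta" && part.textDelta) {
      text += part.textDelta
      onText(text) // reveal the answer as it lands
    }
  }
  return { reasoning, text }
}
```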

Reasoning + tools

Reasoning models can use tools, but the order matters: the model usually thinks first, then issues tool calls, then thinks again with the results, then answers.

streamText({
  model: "openai/gpt-5.5",
  prompt: "Plan a multi-leg flight from SFO to Tokyo with two stopovers, optimizing for cost.",
  providerOptions: {
    openai: { reasoningEffort: "high", reasoningSummary: "auto" },
  },
  tools: {
    searchFlights: …,
    getAirportInfo: …,
    calculateLayover: …,
  },
  maxSteps: 6,
})

You'll see reasoning parts before any tool-call, and the reasoning often references the tool calls as planned actions.
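
To confirm that ordering in your own traffic, you can collapse the sequence of part types into runs. A sketch, assuming the part types shown above plus a tool-call type:

```typescript
// Collapse consecutive duplicates so the think → call → think → answer
// pattern is easy to eyeball in logs.
function partOrder(types: string[]): string[] {
  const runs: string[] = []
  for (const t of types) {
    if (runs[runs.length - 1] !== t) runs.push(t)
  }
  return runs
}

// For the flight example you would typically see something like:
// ["reasoning", "tool-call", "reasoning", "text-delta"]
```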

Cost

Reasoning eats tokens. Per-request cost can be 5–30× a non-reasoning call on the same prompt. Strategies to keep cost under control:

  • Tier your routing. Use a non-reasoning model first; only escalate to reasoning when the task fails confidence checks.
  • Cap maxSteps when combined with tools.
  • Set reasoningEffort: 'low' or 'minimal' for most production traffic; reserve 'high' for explicit requests.
  • Use thinkingBudget on Claude to enforce a hard ceiling — it's the only provider that exposes this.
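
The first strategy can be sketched as a two-tier router. Hedged sketch: generateCheap, generateReasoning, and the confidence check are placeholders for your own model calls and heuristic:

```typescript
// Hypothetical two-tier router: try a fast model first, escalate to a
// reasoning model only when a confidence heuristic fails.
type Generate = (prompt: string) => Promise<string>

async function tieredGenerate(
  prompt: string,
  generateCheap: Generate,     // e.g. a non-reasoning chat model
  generateReasoning: Generate, // e.g. gpt-5.5 with reasoningEffort: "low"
  isConfident: (answer: string) => boolean,
): Promise<{ answer: string; escalated: boolean }> {
  const first = await generateCheap(prompt)
  if (isConfident(first)) return { answer: first, escalated: false }
  return { answer: await generateReasoning(prompt), escalated: true }
}
```

The heuristic can be as simple as a self-reported confidence score or a format check; the point is that most traffic never pays the reasoning premium.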

Capturing the reasoning trace

By default, the reasoning summary is included in the response. The full hidden CoT is not exposed for OpenAI / Anthropic — only the summary. DeepSeek r2 returns the full CoT.

For debugging, log the reasoning summary alongside the final answer:

const { text, reasoning } = await generateText({
  model: "openai/gpt-5.5",
  prompt: "...",
  providerOptions: {
    openai: { reasoningEffort: "medium", reasoningSummary: "detailed" },
  },
})

console.log("Reasoning:", reasoning)
console.log("Answer:", text)

For audit trails, store the reasoning summary alongside the final text — it's invaluable when you're debugging a wrong answer six months later.
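
A sketch of such an audit record; the field names are illustrative, not a prescribed schema:

```typescript
// Illustrative audit-trail record for a reasoning call.
interface ReasoningAudit {
  model: string
  prompt: string
  reasoning: string // summary (OpenAI/Anthropic) or full CoT (DeepSeek)
  text: string
  createdAt: string // ISO 8601 timestamp
}

function makeAudit(
  model: string,
  prompt: string,
  reasoning: string,
  text: string,
): ReasoningAudit {
  return { model, prompt, reasoning, text, createdAt: new Date().toISOString() }
}
```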