Reasoning models
Long chain-of-thought, tool-using, plan-and-execute models. Configure effort, capture summaries.
Reasoning models think before they answer. Where a standard chat model starts streaming its answer immediately, a reasoning model first runs an internal chain-of-thought, evaluates multiple approaches, and only then starts streaming a final answer. They're slower per request but materially better at math, code, planning, and multi-step tasks. Configure them through providerOptions on streamText to get both the reasoning channel and the final answer.
Available reasoning models
Filter the catalog by the Reasoning capability on /models. The big four:
| Model | Reasoning style |
|---|---|
| openai/gpt-5.5-pro, gpt-5.4-pro | Deep CoT; configurable effort + summary |
| openai/o4, o3-mini | Pure reasoning; no system prompt |
| anthropic/claude-opus-4.6 | Extended thinking with dollar-denominated budget |
| deepseek/r2 | Open-weights reasoning; transparent CoT |
OpenAI reasoning (gpt-5*, o*)
Pass reasoningEffort and reasoningSummary via providerOptions.openai:
```ts
import { streamText } from "ai"

const result = streamText({
  model: "openai/gpt-5.5",
  baseURL: "https://synapse.garden/api/v1",
  apiKey: process.env.MG_KEY,
  prompt: "Prove that the square root of 2 is irrational.",
  providerOptions: {
    openai: {
      reasoningEffort: "high", // 'minimal' | 'low' | 'medium' | 'high' | 'none'
      reasoningSummary: "detailed", // 'auto' | 'concise' | 'detailed'
    },
  },
})

for await (const part of result.fullStream) {
  if (part.type === "reasoning") process.stdout.write(`\x1b[90m${part.textDelta}\x1b[0m`)
  if (part.type === "text-delta") process.stdout.write(part.textDelta)
}
```

| Effort | Behavior |
|---|---|
| none | Same as a non-reasoning model. Fast, cheap, no CoT. |
| minimal | Quick CoT for simple multi-step tasks. |
| low | Default for most apps. Balanced. |
| medium | Multi-page reasoning chains. |
| high | Maximum effort. Can take 30+ seconds per call. |
| Summary | Behavior |
|---|---|
| auto | Model decides. Often produces a brief recap. |
| concise | One-paragraph recap of the reasoning. |
| detailed | Full step-by-step recap before the answer. |
For openai/gpt-5* models, you must set both reasoningEffort and reasoningSummary to receive reasoning output. If you set only one, the reasoning channel comes back empty.
Anthropic extended thinking
Claude exposes reasoning as extended thinking with a dollar-denominated budget:
```ts
const result = streamText({
  model: "anthropic/claude-opus-4.6",
  prompt: "Find the optimal route through these 12 cities (TSP, ~100km grid).",
  providerOptions: {
    anthropic: {
      thinkingBudget: 0.05, // dollars per request — caps internal reasoning spend
    },
  },
})
```

Anthropic spends up to thinkingBudget USD on internal reasoning, then produces the final answer. Higher budget → deeper reasoning, more tokens, longer wait. Common values:
- 0.001 — quick checks
- 0.01 — typical reasoning task
- 0.05–0.10 — competition math, complex planning
- 0.50+ — research-grade exploration
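Since thinkingBudget is just a number, a small helper keeps budget choices consistent across a codebase. A minimal sketch; the tier names are our own, not part of the API, and the values mirror the guidance above:

```typescript
// Map a coarse task tier to a thinkingBudget value (USD per request).
// Tune these numbers for your own workload and cost tolerance.
type TaskTier = "quick" | "typical" | "hard" | "research"

function pickThinkingBudget(tier: TaskTier): number {
  switch (tier) {
    case "quick": return 0.001   // quick checks
    case "typical": return 0.01  // typical reasoning task
    case "hard": return 0.05     // competition math, complex planning
    case "research": return 0.5  // research-grade exploration
  }
}

// Then: providerOptions: { anthropic: { thinkingBudget: pickThinkingBudget("hard") } }
```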
DeepSeek r2
DeepSeek's reasoning model exposes its CoT explicitly — useful for transparency or for distilling into smaller models:
```ts
const result = streamText({
  model: "deepseek/r2",
  prompt: "Design a rate limiter for a multi-region API.",
})

for await (const part of result.fullStream) {
  if (part.type === "reasoning") {
    // Full CoT — markdown-formatted, can be very long
    process.stdout.write(part.textDelta)
  }
  if (part.type === "text-delta") {
    // Final answer
    process.stdout.write(part.textDelta)
  }
}
```

When to use reasoning models
Use reasoning when:
- Math — proofs, optimization, calculation chains
- Code — multi-file refactors, debugging dense logic, designing data structures
- Planning — multi-step task decomposition, scheduling, route optimization
- Comparison — evaluating multiple options against complex criteria
- Adversarial — anything that needs to anticipate edge cases
Don't use reasoning when:
- The answer is a simple lookup (use a fast chat model)
- The user expects an answer in <2 seconds (reasoning models routinely take 5–30s)
- You're on a tight token budget (CoT can 10× your input bill)
- Streaming UX matters more than answer quality (TTFT can be 5+ seconds)
Latency profile
Reasoning models stream the final answer at normal speed — but TTFT can be 15–60× longer than a non-reasoning model, because the entire CoT happens before the first answer token appears.
Typical TTFT (time to first answer token, not first reasoning token):
| Model | Effort | TTFT |
|---|---|---|
| gpt-5.4 (not reasoning) | n/a | 400–800 ms |
| gpt-5.4 w/ reasoning | low | 2–4 s |
| gpt-5.4 w/ reasoning | medium | 5–15 s |
| gpt-5.4 w/ reasoning | high | 15–60 s |
| o4 | n/a | 8–30 s |
| claude-opus-4.6 | budget $0.01 | 4–12 s |
| claude-opus-4.6 | budget $0.10 | 30–120 s |
If you want progressive UX, render the reasoning summary in a collapsible "thinking" pane while it streams, then reveal the answer when it lands.
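TTFT is also easy to measure yourself. A generic sketch that wraps any async iterable (such as result.fullStream) and reports the time to its first item; to measure time to the first answer token specifically, fire the callback on the first text-delta part instead of the first part of any type:

```typescript
// Forward every part of a stream unchanged, reporting the time (in ms) to the
// first item. Works on any AsyncIterable, so it is not tied to one provider.
async function* withTtft<T>(
  stream: AsyncIterable<T>,
  onFirst: (ms: number) => void,
): AsyncGenerator<T> {
  const start = Date.now()
  let seenFirst = false
  for await (const part of stream) {
    if (!seenFirst) {
      onFirst(Date.now() - start)
      seenFirst = true
    }
    yield part
  }
}

// Usage: for await (const part of withTtft(result.fullStream, ms => console.log(`TTFT ${ms}ms`))) { ... }
```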
Reasoning + tools
Reasoning models can use tools, but the order matters: the model usually thinks first, then issues tool calls, then thinks again with the results, then answers.
```ts
streamText({
  model: "openai/gpt-5.5",
  prompt: "Plan a multi-leg flight from SFO to Tokyo with two stopovers, optimizing for cost.",
  providerOptions: {
    openai: { reasoningEffort: "high", reasoningSummary: "auto" },
  },
  tools: {
    searchFlights: …,
    getAirportInfo: …,
    calculateLayover: …,
  },
  maxSteps: 6,
})
```

You'll see reasoning parts before any tool call, and the reasoning often references the tool calls as planned actions.
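To check that ordering in your own logs, you can collapse the stream's part types into distinct phases. A small sketch; the part-type strings match the ones used in the loops above:

```typescript
// Collapse consecutive duplicate part types into distinct phases, so the
// think / tool-call / think / answer ordering is easy to log or assert on.
function phases(partTypes: string[]): string[] {
  const out: string[] = []
  for (const t of partTypes) {
    if (out[out.length - 1] !== t) out.push(t)
  }
  return out
}

// While iterating result.fullStream, collect each part.type into an array,
// then log phases(types), e.g. ["reasoning", "tool-call", "reasoning", "text-delta"].
```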
Cost
Reasoning eats tokens. Per-request cost can be 5–30× a non-reasoning call on the same prompt. Strategies to keep cost under control:
- Tier your routing. Use a non-reasoning model first; only escalate to reasoning when the task fails confidence checks.
- Cap maxSteps when combined with tools.
- Set reasoningEffort: 'low' or 'minimal' for most production traffic; reserve 'high' for explicit requests.
- Use thinkingBudget on Claude to enforce a hard ceiling — it's the only provider that exposes this.
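The tiered-routing strategy boils down to a two-pass call: cheap model first, escalate when a confidence check fails. A sketch of the escalation decision; the confidence score and threshold here are hypothetical, and in practice might come from a verifier model or self-consistency sampling:

```typescript
// Decide whether to escalate from a non-reasoning draft to a reasoning model.
// The confidence input and the length heuristic are illustrative placeholders.
function shouldEscalate(answer: string, confidence: number, threshold = 0.7): boolean {
  if (confidence < threshold) return true
  // Cheap heuristic: very short answers to complex prompts are often punts.
  return answer.trim().length < 20
}

// Sketch of the two-pass flow (assumes the generateText setup used elsewhere on this page):
//   const draft = await generateText({ model: "openai/gpt-5.4", prompt })
//   if (shouldEscalate(draft.text, scoreConfidence(draft))) {
//     return generateText({ model: "openai/gpt-5.5", prompt,
//       providerOptions: { openai: { reasoningEffort: "high", reasoningSummary: "auto" } } })
//   }
```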
Capturing the reasoning trace
By default, the reasoning summary is included in the response. The full hidden CoT is not exposed for OpenAI / Anthropic — only the summary. DeepSeek r2 returns the full CoT.
For debugging, log the reasoning summary alongside the final answer:
```ts
import { generateText } from "ai"

const { text, reasoning } = await generateText({
  model: "openai/gpt-5.5",
  prompt: "...",
  providerOptions: {
    openai: { reasoningEffort: "medium", reasoningSummary: "detailed" },
  },
})

console.log("Reasoning:", reasoning)
console.log("Answer:", text)
```

For audit trails, store the reasoning summary alongside the final text — it's invaluable when you're debugging a wrong answer six months later.
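A minimal shape for that audit trail, one JSONL line per request. The field names are our own, not part of any API; add request IDs, model settings, and latency as needed:

```typescript
// One audit record per request, serialized as a JSONL line.
interface ReasoningAudit {
  ts: string        // ISO timestamp
  model: string
  prompt: string
  reasoning: string // the reasoning summary (or full CoT for deepseek/r2)
  answer: string
}

function toAuditLine(record: ReasoningAudit): string {
  return JSON.stringify(record)
}

// Append toAuditLine(record) + "\n" to a log file or ship it to your log pipeline.
```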