Provider routing

Try a preferred provider first. Set fallback chains. Sort by cost, latency, or throughput. All per-request.

FIG. 00 · PROVIDER ROUTING · REQUEST → LANE

Some models are hosted by multiple providers — Claude lives on Anthropic, AWS Bedrock, and Google Vertex; Llama lives on Meta direct, Together, Fireworks, and a dozen others. Provider routing lets you control which one serves your request, in what order, and what happens if it fails.

All routing options live under providerOptions.gateway in the AI SDK, or in the request body for the OpenAI Chat Completions / Anthropic Messages APIs.

FIG. 01 · ORDER · ONLY · SORT · SCHEMATIC
`order` prepends a preferred sequence, `only` filters which providers can be picked, and `sort` orders the rest by cost, TTFT, or throughput. `models` adds a cross-model fallback if every provider in the list errors.

Try a preferred provider first

Use order to specify the sequence to attempt:

import { streamText } from "ai"

const result = streamText({
  model: "anthropic/claude-opus-4.6",
  prompt,
  providerOptions: {
    gateway: { order: ["bedrock", "anthropic"] },
  },
})

This tries Bedrock first and falls back to Anthropic direct if Bedrock errors or is slow. Other providers (e.g. Vertex) remain available, but only after the listed ones.

Restrict to a subset

Use only to restrict the request to a fixed set of providers:

providerOptions: {
  gateway: { only: ["bedrock", "anthropic"] }, // never use Vertex
}

If none of the listed providers are available for the model, the request fails immediately with MODEL_NOT_AVAILABLE_FROM_LISTED_PROVIDERS.
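If you want to branch on that failure (say, to retry without the restriction), one option is a small guard like the sketch below. How the error surfaces in the SDK is an assumption here; this simply looks for the code in the message.

```typescript
// Hypothetical guard, not an SDK export — assumes the error code above
// appears in the thrown error's message; adjust to however the error
// actually surfaces in your client.
function isNoListedProviderError(err: unknown): boolean {
  const message = err instanceof Error ? err.message : String(err)
  return message.includes("MODEL_NOT_AVAILABLE_FROM_LISTED_PROVIDERS")
}
```

On a hit, you might retry the same request without only so the gateway can pick any available provider.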

`only` interacts with project allowlists

If your project has a model allowlist (Dashboard → Project → Models), the request's only is intersected with the allowlist. Anything not in both is dropped.
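The intersection rule can be sketched as follows. This is illustrative only: effectiveProviders is a made-up name, not part of the SDK, and the real computation happens gateway-side.

```typescript
// Sketch of the rule above: the effective provider set is the request's
// `only` list intersected with the project allowlist; anything not in
// both is dropped. When either side is absent, the other applies alone.
function effectiveProviders(
  only: string[] | undefined,
  projectAllowlist: string[] | undefined,
): string[] | undefined {
  if (!only) return projectAllowlist
  if (!projectAllowlist) return only
  const allowed = new Set(projectAllowlist)
  return only.filter((p) => allowed.has(p))
}

console.log(effectiveProviders(["bedrock", "anthropic"], ["anthropic", "vertex"]))
// → ["anthropic"]
```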

Sort by metric

Rank providers by performance or cost:

providerOptions: {
  gateway: {
    sort: "cost",  // 'cost' | 'ttft' (latency) | 'tps' (throughput)
  },
}
| Value | Meaning | Direction |
| --- | --- | --- |
| `"cost"` | Estimated cost per request | Cheapest first |
| `"ttft"` | Time to first token (median, ms) | Fastest first |
| `"tps"` | Tokens per second (median) | Highest first |

Combine order, only, and sort

providerOptions: {
  gateway: {
    only: ["anthropic", "bedrock", "vertex"], // restrict to these three
    sort: "tps",                              // among them, fastest first
    order: ["anthropic"],                     // anthropic always tried first
  },
}

Resolution order: providers excluded by `only` are dropped → `order` entries are prepended in the given sequence → the remaining providers are ranked by `sort`.
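A minimal sketch of that resolution, with invented stats to make it runnable (this is not the gateway's actual implementation — resolveLane and the stats shape are made up for illustration):

```typescript
type SortKey = "cost" | "ttft" | "tps"
type ProviderStats = Record<string, { cost: number; ttft: number; tps: number }>

function resolveLane(
  available: string[],
  opts: { order?: string[]; only?: string[]; sort?: SortKey },
  stats: ProviderStats,
): string[] {
  // 1. `only` filters the candidate set
  const pool = opts.only ? available.filter((p) => opts.only!.includes(p)) : [...available]
  // 2. `order` entries are pulled to the front, in the given sequence
  const head = (opts.order ?? []).filter((p) => pool.includes(p))
  let rest = pool.filter((p) => !head.includes(p))
  // 3. the remainder is ranked by `sort` (tps: highest first; cost/ttft: lowest first)
  if (opts.sort) {
    const key = opts.sort
    rest = rest.sort((a, b) =>
      key === "tps" ? stats[b][key] - stats[a][key] : stats[a][key] - stats[b][key],
    )
  }
  return [...head, ...rest]
}

const stats: ProviderStats = {
  anthropic: { cost: 3, ttft: 400, tps: 60 },
  bedrock: { cost: 2, ttft: 600, tps: 80 },
  vertex: { cost: 4, ttft: 500, tps: 70 },
}
console.log(
  resolveLane(
    ["anthropic", "bedrock", "vertex"],
    { only: ["anthropic", "bedrock", "vertex"], sort: "tps", order: ["anthropic"] },
    stats,
  ),
)
// → ["anthropic", "bedrock", "vertex"]  (anthropic pinned first, then by tps)
```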

Model fallbacks

If your primary model is unavailable, fall back to a backup model — not just a backup provider:

streamText({
  model: "openai/gpt-5.4",  // primary
  providerOptions: {
    gateway: {
      models: [
        "anthropic/claude-opus-4.6",       // try this if primary fails
        "google/gemini-3.1-pro-preview",   // and this if both above fail
      ],
    },
  },
})

The response always tells you which model actually answered (in providerMetadata.gateway.modelAttempts).

Fallbacks count toward your latency budget

Each model in the chain is a fresh round-trip. By default we cap at 3 model attempts per request so a long fallback chain doesn't blow your hot-path latency. Configurable per project under Settings → Routing.
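Conceptually, the chain behaves like the capped loop below. This is a sketch: the real loop runs inside the gateway, and tryModel stands in for a full model attempt (including that model's own provider fallbacks).

```typescript
// Default cap per the docs above; entries past the cap never run.
const MAX_MODEL_ATTEMPTS = 3

async function runWithFallbacks(
  chain: string[],
  tryModel: (id: string) => Promise<string>,
): Promise<{ model: string; text: string }> {
  const attempts = chain.slice(0, MAX_MODEL_ATTEMPTS)
  let lastError: unknown
  for (const model of attempts) {
    try {
      return { model, text: await tryModel(model) } // first success wins
    } catch (err) {
      lastError = err // fall through to the next model in the chain
    }
  }
  throw lastError // every capped attempt failed
}
```

Each iteration is a fresh round-trip, which is why a long chain adds worst-case latency rather than saving it.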

Provider timeouts (BYOK only)

When you bring your own provider keys (BYOK, coming Phase 6), set per-provider timeouts to trigger fast failover:

providerOptions: {
  gateway: {
    providerTimeouts: {
      byok: { openai: 10000 }, // 10s — fail over after that
    },
  },
}

Range: 1,000–789,000ms (~13 min). Once the first token arrives, the timeout is cleared.
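The clamp-then-clear behavior can be pictured with this sketch (illustrative, not gateway internals): the configured value is clamped to the allowed range, the timer is armed for the first token only, and it is cleared as soon as that token arrives.

```typescript
const MIN_TIMEOUT_MS = 1_000
const MAX_TIMEOUT_MS = 789_000

// Out-of-range values are pulled back into the documented range.
function clampTimeout(ms: number): number {
  return Math.min(MAX_TIMEOUT_MS, Math.max(MIN_TIMEOUT_MS, ms))
}

async function firstTokenWithTimeout(
  stream: AsyncIterable<string>,
  timeoutMs: number,
): Promise<string> {
  let timer: ReturnType<typeof setTimeout> | undefined
  const deadline = new Promise<never>((_, reject) => {
    timer = setTimeout(
      () => reject(new Error("provider timed out before first token")),
      clampTimeout(timeoutMs),
    )
  })
  try {
    // Race the timer against the first chunk only; later chunks are unguarded.
    const first = await Promise.race([stream[Symbol.asyncIterator]().next(), deadline])
    return first.value
  } finally {
    clearTimeout(timer) // cleared whether we got a token or failed over
  }
}
```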

Inspecting what actually ran

The response includes a providerMetadata.gateway object with the full attempt history:

const result = streamText({ ... })

console.log(JSON.stringify(await result.providerMetadata, null, 2))
{
  "gateway": {
    "routing": {
      "originalModelId": "anthropic/claude-opus-4.6",
      "resolvedProvider": "bedrock",
      "fallbacksAvailable": ["anthropic", "vertex"],
      "modelAttempts": [
        {
          "modelId": "anthropic:claude-opus-4-6",
          "success": true,
          "providerAttempts": [
            { "provider": "bedrock", "success": true, "responseTimeMs": 4521 }
          ]
        }
      ]
    },
    "cost": "0.0045405",
    "generationId": "gen_01A2B3C4D5"
  }
}

Useful for debugging "why did this take 4 seconds?" — every attempted provider is logged with its response time.
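For that debugging case, a small helper can flatten the attempt history into one line per provider try. summarizeAttempts is a local utility sketch, not an SDK export; the field names match the JSON above.

```typescript
type ProviderAttempt = { provider: string; success: boolean; responseTimeMs: number }
type ModelAttempt = { modelId: string; success: boolean; providerAttempts: ProviderAttempt[] }

// One human-readable line per provider attempt, in the order they ran.
function summarizeAttempts(modelAttempts: ModelAttempt[]): string[] {
  return modelAttempts.flatMap((m) =>
    m.providerAttempts.map(
      (p) => `${m.modelId} via ${p.provider}: ${p.success ? "ok" : "failed"} in ${p.responseTimeMs}ms`,
    ),
  )
}

// Using the values from the metadata shown above:
console.log(summarizeAttempts([
  {
    modelId: "anthropic:claude-opus-4-6",
    success: true,
    providerAttempts: [{ provider: "bedrock", success: true, responseTimeMs: 4521 }],
  },
]))
// → ["anthropic:claude-opus-4-6 via bedrock: ok in 4521ms"]
```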

Default behavior

If you don't specify any routing options, Synapse Garden picks providers based on recent uptime and latency, with a small bias toward cost. This is usually what you want for production — set explicit options only when you have a reason to.