On-call flow · bill spike
Spend alert fires at 03:00. Open Claude Desktop on your phone, ask "why did Synapse Garden spike?" The MCP narrates the answer and proposes a budget cap you approve with a tap.
This flow is the answer to "the bill is doing something weird and I'm in bed". You don't open a laptop, you don't grep logs, you don't write SQL. You open Claude Desktop on your phone, the hosted MCP at https://synapse.garden/api/mcp is already linked over OAuth, and you ask one question. The agent walks the diagnostic loop, narrates what it's checking, and proposes a cap you can approve with a tap. For the underlying signal model — the three IDs, the structured logs, the spans — see the observability docs and the AI SDK's streamText for how to emit your own X-Mg-Trace-Id so this loop can correlate across your services.
Alert lands
At 03:14 local, your spend monitor fires. The webhook payload your Slack channel renders looks like this — same shape your Sentry rule receives:
```json
{
  "type": "spend.anomaly",
  "org": { "id": "org_01HZ8J3PQR4X9VK2", "slug": "linear-prod" },
  "window": { "start": "2026-05-10T01:00:00Z", "end": "2026-05-10T03:00:00Z" },
  "observed_usd": 184.62,
  "rolling_7d_2h_avg_usd": 11.48,
  "ratio": 16.08,
  "top_offender_hint": { "project": "rag-indexer", "model": "openai/gpt-4o-mini" },
  "request_id": "req_01J0G5B7N2M8Q4ZR1AE6CY"
}
```

The ratio (16x the rolling 2-hour band averaged across the last seven days) is what tripped the alert, not the absolute number. Rules engineered against the rolling band catch a spike before it becomes a bill.
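The rule behind that ratio is easy to sketch. The names below are illustrative, not the monitor's real implementation; the 3.0 threshold mirrors the `anomaly_threshold_ratio` the agent passes later, but the service's actual default isn't documented here:

```typescript
// Hypothetical sketch of the rolling-band rule that trips this alert.
const RATIO_THRESHOLD = 3.0; // assumption: matches anomaly_threshold_ratio

interface SpendWindow {
  observedUsd: number;        // spend in the live 2h window
  rolling7d2hAvgUsd: number;  // same 2h window, averaged over the last 7 days
}

function spendAnomaly(w: SpendWindow): { ratio: number; fires: boolean } {
  const ratio = w.observedUsd / w.rolling7d2hAvgUsd;
  return { ratio, fires: ratio >= RATIO_THRESHOLD };
}

// The payload above: $184.62 observed against an $11.48 rolling baseline.
const alert = spendAnomaly({ observedUsd: 184.62, rolling7d2hAvgUsd: 11.48 });
```

Because the baseline is the same two-hour window averaged over seven days, a nightly batch job that always runs at 02:00 doesn't trip the rule; only a departure from that window's own history does.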
Open Claude Desktop on your phone
You linked Claude Desktop to the hosted MCP at https://synapse.garden/api/mcp weeks ago — the OAuth consent screen showed you the org list and the scope set, you picked linear-prod and granted logs:read plus governance:write. Nothing is installed; the tokens live in Claude Desktop's secure storage and refresh hourly.
Tap the email alert to deep-link the alert id into a new Claude Desktop chat, or just paste the prompt:
Synapse Garden alert at 3am —
`linear-prod` spent $184 in two hours vs an $11 baseline. Why did it spike, and what cap should I set?
You stop typing here. The agent runs the rest.
The agent localizes the spike
First call — get a structured breakdown of where the dollars went:
```json
{
  "type": "tool_use",
  "name": "analyze_spend",
  "input": {
    "org": "linear-prod",
    "window": { "start": "2026-05-10T01:00:00Z", "end": "2026-05-10T03:00:00Z" },
    "compare_to": "rolling_7d_same_window",
    "group_by": ["project", "model", "hour"],
    "anomaly_threshold_ratio": 3.0,
    "limit": 10
  }
}
```

The server returns a ranked list. The first row is the smoking gun:
```json
{
  "anomalies": [
    {
      "project": "rag-indexer",
      "model": "openai/gpt-4o-mini",
      "hour": "2026-05-10T02:00:00Z",
      "observed_usd": 142.18,
      "baseline_usd": 4.31,
      "ratio": 32.99,
      "request_count": 18437,
      "avg_input_tokens": 5912,
      "p95_input_tokens": 11842,
      "finish_reasons": { "stop": 17984, "length": 412, "error": 41 }
    },
    {
      "project": "rag-indexer",
      "model": "openai/text-embedding-3-large",
      "hour": "2026-05-10T02:00:00Z",
      "observed_usd": 38.06,
      "baseline_usd": 6.94,
      "ratio": 5.48,
      "request_count": 9216
    }
  ],
  "other_projects_in_window_usd": 4.38
}
```

The agent narrates inline — not for you to read every word, but so you can interrupt if the inference is wrong:
The spike is one project, `rag-indexer`, in one hour. 18,437 requests on `gpt-4o-mini` with an average of 5.9k input tokens — that's 33x the baseline. The embedding model spiked too, but only 5.5x and only $38, so it's downstream. The ratio of `length` finishes to total is 2.2%, normal-ish. The next thing to check is whether errors are dragging retries up.
Rule out retry storms
A request count this high can be either real load or a misconfigured retry loop. The agent checks:
```json
{
  "type": "tool_use",
  "name": "tail_errors",
  "input": {
    "project": "rag-indexer",
    "since": "2026-05-10T01:00:00Z",
    "until": "2026-05-10T03:00:00Z",
    "group_by": "code",
    "include_4xx": true,
    "limit": 200
  }
}
```

Response:
```json
{
  "by_code": [
    { "code": "RATE_LIMITED", "count": 41, "first_seen": "2026-05-10T02:14:08Z", "last_seen": "2026-05-10T02:51:33Z" },
    { "code": "UPSTREAM_TIMEOUT", "count": 7 }
  ],
  "total_4xx": 41,
  "total_5xx": 7,
  "error_rate_pct": 0.26,
  "retry_storm_detected": false
}
```

The agent narrates:

Errors are 0.26% of requests and there's no retry-storm signature (no spikes of identical `request_id` prefixes, no 429-then-200 ratchet). The spend is real load. Most likely cause: an indexer job kicked off and re-embedded a corpus you didn't expect.
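What "no retry-storm signature" means can be approximated with a small heuristic. This is an assumption about the shape of the check, not the server's actual logic; the 5% threshold is invented for illustration:

```typescript
// Illustrative retry-storm heuristic. The real tail_errors check runs
// server-side and also looks at request_id prefixes and 429-then-200
// sequences; this sketch only covers the error-rate component.
interface ErrorTail {
  byCode: Record<string, number>; // error code -> count in the window
  requestCount: number;           // chat requests in the window (18,437 above)
}

function retryStormCheck(t: ErrorTail) {
  const errors = Object.values(t.byCode).reduce((a, b) => a + b, 0);
  const errorRatePct = (errors / t.requestCount) * 100;
  // A storm looks like errors dominating the window (a 429-then-retry loop
  // pushes the rate into double digits); 0.26% is background noise.
  return { errors, errorRatePct, retryStormDetected: errorRatePct > 5 };
}

const check = retryStormCheck({
  byCode: { RATE_LIMITED: 41, UPSTREAM_TIMEOUT: 7 },
  requestCount: 18437,
});
```

The distinction matters because the two diagnoses have opposite fixes: a retry storm is a client bug you patch, while real load is a workload you cap.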
The agent proposes a cap
The agent does the arithmetic out loud and proposes a number that isn't pulled from a default. The math: the rolling 7-day average for `rag-indexer` is $312/mo; the spike implies a $1,840/mo run rate if it sustains; you want the cap above your real burn but well below the runaway. It picks a calibrated number and asks for confirmation.
```json
{
  "type": "tool_use",
  "name": "set_project_budget",
  "input": {
    "project": "rag-indexer",
    "monthly_cap_usd": 480,
    "soft_alert_pct": 68,
    "hard_cutoff_pct": 102,
    "alert_emails": ["priya@yourco.com", "oncall@yourco.com"]
  }
}
```

The first call comes back with a structured summary the agent renders for you:
```json
{
  "status": "confirmation_required",
  "summary": "Set monthly cap on `rag-indexer` to $480 (≈ 1.54x current 7d avg of $312). Soft alert at 68% ($326) → priya@yourco.com, oncall@yourco.com. Hard cutoff at 102% ($489) → returns BUDGET_EXCEEDED on chat + embeddings. Current month-to-date spend on this project: $208.74 — cap leaves $271 of headroom.",
  "diff": {
    "monthly_cap_usd": { "before": null, "after": 480 },
    "soft_alert_pct": { "before": null, "after": 68 },
    "hard_cutoff_pct": { "before": null, "after": 102 }
  }
}
```

Why 68% instead of a round 70%? The agent is leaving a 2-point buffer on the alert so you don't get paged in the same minute the cap is breached — the soft alert lands first, giving you a window to react before the hard cutoff.
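The numbers in that summary reproduce with a few lines of arithmetic. A sketch; the function name and the framing of the multiplier as a judgment call are ours, not an API:

```typescript
// Reproduce the agent's cap arithmetic from the confirmation summary.
function capProposal(
  capUsd: number,
  rolling7dAvgMonthlyUsd: number,
  monthToDateUsd: number,
  softPct: number,
  hardPct: number,
) {
  return {
    multipleOfBurn: +(capUsd / rolling7dAvgMonthlyUsd).toFixed(2), // vs real burn
    softAlertUsd: +((capUsd * softPct) / 100).toFixed(2),          // page here first
    hardCutoffUsd: +((capUsd * hardPct) / 100).toFixed(2),         // BUDGET_EXCEEDED here
    headroomUsd: +(capUsd - monthToDateUsd).toFixed(2),            // room left this month
  };
}

const p = capProposal(480, 312, 208.74, 68, 102);
// multipleOfBurn 1.54, softAlertUsd 326.4, hardCutoffUsd 489.6, headroomUsd 271.26
```

The summary rounds the dollar thresholds down to whole dollars ($326, $489); the percentages are what the tool actually stores.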
Approve from the phone
Claude Desktop renders the summary as a confirmation card with Approve and Reject buttons. You tap Approve. The agent re-issues the call with `confirm: true`:
```json
{
  "type": "tool_use",
  "name": "set_project_budget",
  "input": {
    "project": "rag-indexer",
    "monthly_cap_usd": 480,
    "soft_alert_pct": 68,
    "hard_cutoff_pct": 102,
    "alert_emails": ["priya@yourco.com", "oncall@yourco.com"],
    "confirm": true
  }
}
```

Response:
```json
{
  "applied_at": "2026-05-10T03:21:47Z",
  "snapshot": {
    "project": "rag-indexer",
    "monthly_cap_usd": 480,
    "month_to_date_usd": 208.74,
    "headroom_usd": 271.26,
    "effective_on_request": "next"
  },
  "audit_event_id": "evt_01J0G6YR3K8M2QF7WD"
}
```

The cap is live on the next request the proxy sees — there's no replication lag because the budget service writes through to Upstash on the same hop.
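The enforcement that snapshot implies can be sketched as a pure admission check on the proxy's hot path. `BUDGET_EXCEEDED` comes from the confirmation summary above; everything else here (names, the pre-request estimate) is an assumption about the mechanism, not the deployed implementation:

```typescript
// Hypothetical proxy-side admission check against the live budget snapshot.
interface BudgetSnapshot {
  monthlyCapUsd: number;
  monthToDateUsd: number;
  hardCutoffPct: number; // 102 in the flow above
}

type Admission = { ok: true } | { ok: false; code: "BUDGET_EXCEEDED" };

function admit(s: BudgetSnapshot, estimatedRequestUsd: number): Admission {
  const hardCutoffUsd = s.monthlyCapUsd * (s.hardCutoffPct / 100);
  // Reject before forwarding upstream, so a runaway job stops spending
  // on the very request that would cross the cutoff.
  if (s.monthToDateUsd + estimatedRequestUsd > hardCutoffUsd) {
    return { ok: false, code: "BUDGET_EXCEEDED" };
  }
  return { ok: true };
}
```

Checking before the upstream call is what makes "effective on next request" possible: no spend has to be clawed back, only refused.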
The audit trail closes the loop
You're back asleep by 03:25. In the morning, /app/linear-prod/audit shows the full sequence with one row per tool call:

- `mcp.spend_analyzed` — `rag-indexer` × `openai/gpt-4o-mini`, anomaly ratio 32.99
- `mcp.errors_tailed` — `rag-indexer`, 48 errors over 2h, no retry storm
- `mcp.budget_set` — `rag-indexer`, cap $480, soft 68%, hard 102%, by `priya@yourco.com` via `claude-desktop` (OAuth client id `client_synapse_desktop_v2`)
Each row carries the args hash and the client identifier the MCP transport reported. If your team asks "what did the agent do at 3am", the answer is three rows long.
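One plausible way such an args hash could be derived — SHA-256 over a canonical, key-sorted JSON encoding of the tool input — is sketched below. The real scheme isn't specified in this doc; treat the whole block as an assumption:

```typescript
// Hypothetical args-hash: SHA-256 over canonical JSON, so the same tool
// input always hashes the same regardless of property order.
import { createHash } from "node:crypto";

function canonicalize(v: unknown): string {
  if (Array.isArray(v)) return "[" + v.map(canonicalize).join(",") + "]";
  if (v !== null && typeof v === "object") {
    const entries = Object.entries(v as Record<string, unknown>)
      .sort(([a], [b]) => a.localeCompare(b)) // sort keys at every level
      .map(([k, val]) => JSON.stringify(k) + ":" + canonicalize(val));
    return "{" + entries.join(",") + "}";
  }
  return JSON.stringify(v);
}

function argsHash(input: Record<string, unknown>): string {
  return createHash("sha256").update(canonicalize(input)).digest("hex");
}
```

Whatever the real derivation is, the point of the hash is the same: you can prove which exact arguments the agent sent without storing the arguments themselves in every audit row.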
By the time someone investigates the underlying job at 09:00, the indexer has caught up against the cap, traffic has flattened, and rag-indexer is sitting at $214 on the day with $266 of headroom — exactly enough buffer to write a real fix without a second page.
What this bought you
- No laptop, no SQL. The MCP narrated the diagnostic loop using `analyze_spend` and `tail_errors` against indexes the dashboard already maintains. See the governance docs for the budget surface the agent wrote to.
- No silent spend. The cap landed behind a confirmation card you tapped on your phone — see the MCP server confirmation flow for the full contract.
- Proportional response. The agent didn't propose a 50% slash that breaks production; it picked 1.54x your real burn so the runaway hits the cap and the legitimate work doesn't.
Next
Observability
The IDs, spans, and structured log rows the agent leaned on. How to wire your own `X-Mg-Trace-Id` so this loop can find your traffic in your traces.
Governance
Budgets, allowlists, alerts. The control plane the agent wrote to — and the one your humans use the rest of the week.
MCP server
Every tool the on-call flow used and the scopes each one needs.
Migration flow · Cursor
Two-line repoint from `api.openai.com` to your Synapse Garden deployment, driven by Cursor's agent over MCP. Tools, JSON mode, vision, and embeddings keep their exact shape.
Authoring agent integrations
Build your own MCP tools and skills that compose with ours. Same primitives, same auth, same audit log.