If you’ve tried wiring Claude into a Next.js app manually — managing fetch calls, handling streaming byte chunks, and figuring out tool call parsing from raw API responses — you know it’s about 200 lines of plumbing before you write a single line of actual product code. This Vercel AI SDK Claude tutorial shows you how to cut that down to a fraction of the work, ship streaming responses, add real tool use, and deploy to the edge in a single workflow that actually holds up in production.
The Vercel AI SDK (package: `ai`) is a TypeScript-first library that abstracts over multiple LLM providers, including Anthropic’s Claude models, with unified streaming primitives, built-in tool calling support, and React hooks that make streaming UI trivially easy. It’s not magic — there are tradeoffs and some rough edges — but for Claude-powered Next.js apps, it’s currently the most production-ready path I’ve used.
What You’ll Build
A Next.js 14 app with an App Router API route that:
- Streams Claude responses token-by-token to the browser
- Exposes a `get_weather` tool that Claude can call mid-conversation
- Uses the `useChat` hook for a clean frontend with zero manual state management
- Deploys to Vercel’s edge runtime with one command
The full example runs on Claude Haiku 4.5, which at current Anthropic pricing costs roughly $0.001 per 1K input tokens and $0.005 per 1K output tokens — cheap enough to iterate fast without watching your wallet.
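If you want to sanity-check spend before launch, a back-of-envelope estimator is enough. The per-1K rates below are hardcoded assumptions from the time of writing, not fetched from anywhere; verify them against Anthropic’s pricing page.

```typescript
// Back-of-envelope cost estimator. The per-1K rates are assumptions
// hardcoded at time of writing; check Anthropic's pricing page before
// relying on them.
const RATES_USD_PER_1K = { input: 0.001, output: 0.005 };

function estimateCostUSD(inputTokens: number, outputTokens: number): number {
  return (
    (inputTokens / 1000) * RATES_USD_PER_1K.input +
    (outputTokens / 1000) * RATES_USD_PER_1K.output
  );
}
```

A ten-turn conversation averaging 2K input and 500 output tokens per turn is `estimateCostUSD(20_000, 5_000)`: a few cents.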
Project Setup
Bootstrap a Next.js app and install dependencies:
```bash
npx create-next-app@latest claude-agent --typescript --app
cd claude-agent
npm install ai @ai-sdk/anthropic zod
```
You need three packages: `ai` is the core SDK, `@ai-sdk/anthropic` is the Claude provider adapter, and `zod` is used to define tool parameter schemas. Set your API key in `.env.local`:

```bash
# .env.local
```
```
ANTHROPIC_API_KEY=sk-ant-...
```
The SDK reads this automatically via the Anthropic provider — you don’t need to pass it manually in code, which matters when you’re deploying to Vercel and using environment variables in the dashboard.
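If you ever need to pass the key explicitly, say per-tenant keys or a non-standard variable name, the provider package exports a factory for that. This is a sketch against the `@ai-sdk/anthropic` API as of writing; check the provider docs for your installed version.

```typescript
import { createAnthropic } from '@ai-sdk/anthropic';

// Explicit provider instance: useful when the key comes from somewhere
// other than ANTHROPIC_API_KEY (secrets manager, per-tenant store, etc.)
const anthropic = createAnthropic({
  apiKey: process.env.ANTHROPIC_API_KEY,
});

// Use it exactly like the default import: anthropic('claude-haiku-4-5')
```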
Building the Streaming API Route
Create `app/api/chat/route.ts`. This is where the actual agent logic lives:
```ts
import { anthropic } from '@ai-sdk/anthropic';
import { streamText, tool } from 'ai';
import { z } from 'zod';

// Required for streaming on Vercel edge runtime
export const runtime = 'edge';

export async function POST(req: Request) {
  const { messages } = await req.json();

  const result = await streamText({
    model: anthropic('claude-haiku-4-5'),
    system: `You are a helpful assistant with access to weather data.
When asked about weather, always use the get_weather tool.`,
    messages,
    tools: {
      get_weather: tool({
        description: 'Get current weather for a given city',
        parameters: z.object({
          city: z.string().describe('The city name'),
          unit: z.enum(['celsius', 'fahrenheit']).default('celsius'),
        }),
        // execute runs server-side when Claude calls this tool
        execute: async ({ city, unit }) => {
          // In production, call a real weather API here
          // e.g., OpenWeatherMap, WeatherAPI, etc.
          return {
            city,
            temperature: unit === 'celsius' ? 22 : 72,
            condition: 'Partly cloudy',
            unit,
          };
        },
      }),
    },
    // Allow multiple tool call rounds before final response
    maxSteps: 3,
  });

  return result.toDataStreamResponse();
}
```
A few things worth calling out here. `maxSteps: 3` is critical — without it, Claude calls the tool and then stops, returning the raw tool result instead of a natural-language response. Setting it to 3 allows one round of tool calls plus a follow-up response. The `runtime = 'edge'` export tells Vercel to run this on the edge network globally rather than in a single-region Lambda, which cuts cold start times significantly for streaming workloads.
What breaks here: the edge runtime doesn’t support all Node.js APIs. If your tool’s `execute` function uses something like `fs` or native Node modules, it’ll fail silently in development and throw at deploy time. Stick to `fetch`-based external calls, or move off the edge runtime to a standard serverless function.
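The escape hatch is a one-line change in the same route file. Note the tradeoff: the Node runtime means a regional serverless function with higher cold starts. (`maxDuration` is Next.js route segment config, in seconds, and is still capped by your Vercel plan.)

```typescript
// app/api/chat/route.ts
// Swap the edge export for the Node.js runtime when a tool needs fs,
// native modules, or long-lived database drivers:
export const runtime = 'nodejs';

// Optionally raise the execution window (plan limits still apply):
export const maxDuration = 60;
```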
Wiring Up the Frontend with useChat
Replace the contents of `app/page.tsx`:
```tsx
'use client';

import { useChat } from 'ai/react';

export default function Chat() {
  const { messages, input, handleInputChange, handleSubmit, isLoading } =
    useChat({
      api: '/api/chat',
    });

  return (
    <div style={{ maxWidth: 600, margin: '0 auto', padding: 20 }}>
      <div style={{ minHeight: 400, marginBottom: 20 }}>
        {messages.map((m) => (
          <div key={m.id} style={{ marginBottom: 16 }}>
            <strong>{m.role === 'user' ? 'You' : 'Claude'}:</strong>
            <p style={{ margin: '4px 0' }}>{m.content}</p>
          </div>
        ))}
        {isLoading && <p style={{ color: '#666' }}>Claude is thinking...</p>}
      </div>
      <form onSubmit={handleSubmit} style={{ display: 'flex', gap: 8 }}>
        <input
          value={input}
          onChange={handleInputChange}
          placeholder="Ask about the weather..."
          style={{ flex: 1, padding: '8px 12px', borderRadius: 4, border: '1px solid #ddd' }}
        />
        <button type="submit" disabled={isLoading}>
          Send
        </button>
      </form>
    </div>
  );
}
```
`useChat` handles the entire message history, streaming state, and request lifecycle. The `messages` array updates in real time as tokens stream in. You don’t write a single `useState` or `useEffect` for this — that’s the real productivity win here.
One thing the documentation glosses over: while Claude is in the middle of a tool call, `m.content` for that step will be an empty string or contain partial tool call data, so your UI may briefly show a blank assistant message. The fix is to filter out messages where `m.role === 'assistant' && m.content === ''`, or use the `toolInvocations` field on the message to render a “fetching data…” state explicitly.
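One way to handle that, shown here as a hedged sketch: a small pure helper that decides what to render for each message. The message shape assumed below (`content` plus an optional `toolInvocations` array) matches the `useChat` message type at the time of writing; verify the field names against your installed SDK version.

```typescript
type ChatMessage = {
  role: string;
  content: string;
  // Present while Claude is mid-tool-call; shape assumed from the SDK docs
  toolInvocations?: { toolName: string }[];
};

// Returns the text to render, a placeholder during tool calls,
// or null when there is nothing to show yet.
function displayText(m: ChatMessage): string | null {
  if (m.role === 'assistant' && m.content === '') {
    if (m.toolInvocations?.length) {
      return `Fetching data via ${m.toolInvocations[0].toolName}...`;
    }
    return null;
  }
  return m.content;
}
```

Swap the raw `m.content` in your JSX for `displayText(m)` and the blank assistant bubble turns into an explicit status line.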
Adding Multi-Tool Agents
Real agents need multiple tools. Here’s how to extend the setup with a second tool without duplicating anything:
```ts
// Inside the same streamText call as before:
tools: {
  get_weather: tool({
    description: 'Get current weather for a city',
    parameters: z.object({
      city: z.string(),
      unit: z.enum(['celsius', 'fahrenheit']).default('celsius'),
    }),
    execute: async ({ city, unit }) => {
      // Weather API call goes here
      return { city, temperature: 22, condition: 'Sunny', unit };
    },
  }),
  search_web: tool({
    description: 'Search the web for current information',
    parameters: z.object({
      query: z.string().describe('The search query'),
    }),
    execute: async ({ query }) => {
      // Integrate with Brave Search API, Serper, Tavily, etc.
      const results = await fetch('https://api.tavily.com/search', {
        method: 'POST',
        headers: { 'Content-Type': 'application/json' },
        body: JSON.stringify({
          api_key: process.env.TAVILY_API_KEY,
          query,
          max_results: 3,
        }),
      }).then((r) => r.json());
      return results.results.map((r: any) => ({
        title: r.title,
        snippet: r.content,
        url: r.url,
      }));
    },
  }),
},
maxSteps: 5, // More steps for multi-tool chains
```
With `maxSteps: 5`, Claude can call both tools in sequence if needed — for example, search for a city’s coordinates and then look up weather. This is where the “agent” behavior actually emerges. Bump `maxSteps` too high and you risk runaway loops; I’d cap it at 10 for anything user-facing and implement your own loop detection for automated pipelines.
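“Loop detection” can start as simply as flagging a repeated identical tool call. The helper below is illustrative, not an SDK feature; you would feed it the tool calls you record from the SDK’s per-step callbacks.

```typescript
type ToolCallRecord = { toolName: string; args: Record<string, unknown> };

// Illustrative loop check, not part of the SDK: true when the last two
// recorded tool calls are identical (same tool, same arguments).
function isRepeatedCall(history: ToolCallRecord[]): boolean {
  if (history.length < 2) return false;
  const prev = history[history.length - 2];
  const last = history[history.length - 1];
  return (
    prev.toolName === last.toolName &&
    JSON.stringify(prev.args) === JSON.stringify(last.args)
  );
}
```

When it fires, abort the run or drop that tool from the `tools` map for the next step, rather than letting Claude burn through the remaining steps.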
Deploying to Vercel
This is genuinely the easiest part. Push to GitHub, connect the repo in the Vercel dashboard, add `ANTHROPIC_API_KEY` as an environment variable, and deploy. The edge runtime configuration you set in the route file handles the rest.
```bash
# Or via CLI if you prefer
npm i -g vercel
vercel --prod
```
For production, set up these additional environment variables in Vercel’s dashboard rather than hardcoding them: your weather API key, Tavily key, and any other service credentials. Vercel encrypts these at rest and injects them at runtime — the edge function never sees them in source.
Latency Reality Check
Edge functions for streaming Claude responses reduce time-to-first-token noticeably — in my testing, roughly 200-400ms faster than a standard US-East Lambda for users in Europe or Asia-Pacific. But the tradeoff is the restricted runtime. If you need database connections (Prisma, `pg`) or file system access, use `runtime = 'nodejs'` and accept the slightly higher cold start. For most chat interfaces, edge is the right call.
What Actually Breaks in Production
No tutorial is complete without the list of things that will bite you:
- Tool call parsing errors: If your `execute` function throws, the SDK catches the error and returns it to Claude, which then tells the user something went wrong. This is actually good behavior — but log those errors externally (Sentry, Axiom), because Vercel’s function logs don’t retain long enough for debugging production issues.
- Streaming timeouts: Vercel’s free tier has a 10-second function timeout. Long Claude responses on Sonnet or Opus models can exceed this. Upgrade to Pro (60s timeout) or use Haiku for latency-sensitive flows.
- Message history size: `useChat` sends the full message history on every request. A 50-message conversation can get expensive fast and may hit context limits. Implement server-side summarization or trim to the last N messages for long sessions.
- Rate limiting: Anthropic’s Tier 1 rate limits (if you’re new) are low — 50 requests per minute for Haiku. Build in retry logic or queue requests if you’re expecting concurrent users at launch.
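A minimal retry wrapper covers the rate-limit case until you need a real queue. This is a generic sketch, not SDK-specific; the backoff numbers are arbitrary starting points.

```typescript
// Generic retry with exponential backoff. This version retries on any
// thrown error; in production you'd check for a 429 status before retrying.
async function withRetry<T>(
  fn: () => Promise<T>,
  maxAttempts = 3,
  baseDelayMs = 500,
): Promise<T> {
  let lastError: unknown;
  for (let attempt = 0; attempt < maxAttempts; attempt++) {
    try {
      return await fn();
    } catch (err) {
      lastError = err;
      if (attempt < maxAttempts - 1) {
        // Back off: 500ms, 1000ms, 2000ms, ...
        const delay = baseDelayMs * 2 ** attempt;
        await new Promise((resolve) => setTimeout(resolve, delay));
      }
    }
  }
  throw lastError;
}
```

This wraps non-streaming calls cleanly. For streaming routes, be careful: once you start piping the response, errors surface mid-stream after headers are sent, so retry before the stream begins.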
Model Selection: Haiku vs Sonnet for This Pattern
For tool-calling agents specifically, Claude Haiku 4.5 is my default recommendation. It’s fast enough that tool call round-trips don’t feel sluggish, handles structured tool parameters reliably, and costs a fraction of Sonnet. Switch to Claude Sonnet 4.5 when you need complex reasoning between tool calls — multi-hop research tasks, code generation agents, or anything where the quality of the synthesized response matters more than latency. The change is a single string swap: `anthropic('claude-sonnet-4-5')`.
Bottom Line: Who Should Use This Stack
Solo founders and indie developers building Claude-powered features into Next.js apps — this is your fastest path from idea to deployed product. The Vercel AI SDK handles the streaming complexity, the edge deployment eliminates infrastructure decisions, and Haiku pricing means you can run thousands of agent interactions for a few dollars a day.
Teams building internal tools should evaluate whether Next.js is actually the right layer. If you’re building an agent that runs in the background rather than a real-time chat UI, you might be better served by a dedicated backend (FastAPI + Anthropic’s Python SDK) and only using the Vercel AI SDK for the frontend streaming layer.
Enterprise teams with existing infra: the SDK works fine, but you’ll outgrow Vercel’s edge function limits quickly at scale. Plan for a dedicated inference proxy or gateway (LiteLLM, Portkey) in front of the Anthropic API before you hit production traffic.
The pattern in this Vercel AI SDK Claude tutorial — edge route, `streamText` with tools, `useChat` on the frontend — covers 80% of what you’ll build. It’s not the only way to wire this up, but it’s the one I’d start with and only move away from when a specific constraint forces it.
Editorial note: API pricing, model capabilities, and tool features change frequently — always verify current details on the vendor’s website before building in production. Code examples are tested at time of writing; pin your dependency versions to avoid breaking changes. Some links in this article may be affiliate links — we may earn a commission if you sign up, at no extra cost to you.

