By the end of this tutorial, you’ll have a working Python pipeline that submits 10,000+ documents to Claude’s batch API, polls for results, handles failures, and writes structured output — at roughly half the cost of synchronous API calls. Claude batch API processing is one of the most underused features in the Anthropic ecosystem, and for high-volume workloads it’s the obvious right choice.
The Batch API lets you submit up to 100,000 requests in a single job. Anthropic processes them asynchronously and charges 50% of standard per-token pricing. The tradeoff: results take up to 24 hours. For most document processing pipelines — contract extraction, lead scoring, content moderation, invoice parsing — that latency is completely acceptable, and the cost savings are significant.
At the Haiku pricing used in this example ($0.25/MTok input, $1.25/MTok output; verify current rates before planning a budget), a 10,000-document job where each document is ~500 tokens in and ~200 tokens out costs roughly $1.25 input + $2.50 output = $3.75 via the synchronous API. The same job through the Batch API, at the 50% discount, comes to about $1.88. At Sonnet scale the savings get much more interesting.
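That arithmetic is worth scripting so you can sanity-check a budget before submitting anything. A minimal estimator (prices are parameters, since they change):

```python
def estimate_batch_cost(
    num_docs: int,
    input_tokens_per_doc: int,
    output_tokens_per_doc: int,
    input_price_per_mtok: float,
    output_price_per_mtok: float,
    batch_discount: float = 0.5,  # Batch API charges 50% of standard pricing
) -> float:
    """Estimate total batch cost in dollars; prices are per million tokens."""
    input_cost = num_docs * input_tokens_per_doc / 1_000_000 * input_price_per_mtok
    output_cost = num_docs * output_tokens_per_doc / 1_000_000 * output_price_per_mtok
    return (input_cost + output_cost) * batch_discount

# The example above: 10,000 docs, 500 in / 200 out, at $0.25/$1.25 per MTok
cost = estimate_batch_cost(10_000, 500, 200, 0.25, 1.25)
```

Pass `batch_discount=1.0` to get the synchronous price for the same workload.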
What You’ll Build — and the Steps to Get There
- Install dependencies — Set up the Anthropic SDK and supporting libraries
- Structure your batch requests — Build valid JSONL request objects with custom IDs
- Submit the batch job — Call the batch API and store the batch ID
- Poll for completion — Write a resilient polling loop with exponential backoff
- Parse and store results — Stream the JSONL result file and write to your database
- Handle partial failures — Identify failed requests and resubmit them
Step 1: Install Dependencies
You need a recent version of the anthropic SDK; the client.messages.batches namespace used below is not present in older releases, so install the latest and pin it. The SDK wraps the full lifecycle: submission, polling, and result retrieval.
```shell
# Quote the version spec so the shell doesn't treat ">" as a redirect
pip install -U "anthropic" python-dotenv tqdm
```
```python
import anthropic
import os
import json
import time
from pathlib import Path
from dotenv import load_dotenv

load_dotenv()
client = anthropic.Anthropic(api_key=os.environ["ANTHROPIC_API_KEY"])
```
Step 2: Structure Your Batch Requests
Each request in a batch needs a unique custom_id (your reference, not Anthropic’s), a model, and a messages array. The custom ID is how you correlate results back to your documents — make it meaningful.
```python
def build_extraction_request(doc_id: str, document_text: str) -> dict:
    """
    Build a single batch request for contract data extraction.
    Returns a dict that matches the MessageCreateParamsNonStreaming shape.
    """
    return {
        "custom_id": f"doc-{doc_id}",  # Use your DB primary key here
        "params": {
            "model": "claude-haiku-4-5",  # Haiku for cost, Sonnet for quality
            "max_tokens": 512,
            "system": (
                "You are a contract extraction assistant. "
                "Return ONLY valid JSON with keys: party_names, effective_date, "
                "total_value, jurisdiction. If a field is missing, use null."
            ),
            "messages": [
                {
                    "role": "user",
                    # Truncate to avoid token limit surprises
                    "content": f"Extract structured data from this contract:\n\n{document_text[:8000]}",
                }
            ],
        },
    }


# Build requests from your document list
def prepare_batch(documents: list[dict]) -> list[dict]:
    requests = []
    for doc in documents:
        req = build_extraction_request(doc["id"], doc["text"])
        requests.append(req)
    return requests
```
If you’re doing lead scoring rather than extraction, the shape is identical — just swap the system prompt and expected JSON schema. For a deeper look at how to reliably get consistent JSON output from Claude, see our guide on structured output mastery for Claude — the same techniques apply inside batch requests.
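For example, a lead-scoring variant might look like the sketch below; the prompt and schema keys are illustrative, not a fixed contract:

```python
def build_scoring_request(lead_id: str, lead_profile: str) -> dict:
    """Same batch-request shape as extraction, with a scoring prompt swapped in."""
    return {
        "custom_id": f"lead-{lead_id}",  # Your CRM record ID
        "params": {
            "model": "claude-haiku-4-5",
            "max_tokens": 256,
            "system": (
                "You are a lead scoring assistant. Return ONLY valid JSON "
                "with keys: score (0-100), tier (hot/warm/cold), reasoning."
            ),
            "messages": [
                {"role": "user", "content": f"Score this lead:\n\n{lead_profile[:8000]}"}
            ],
        },
    }
```

Everything downstream (submission, polling, result parsing) stays identical.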
Step 3: Submit the Batch Job
The SDK handles serialization. You pass a list of Request objects (or raw dicts) and get back a batch object with an ID you need to persist immediately.
```python
def submit_batch(requests: list[dict]) -> list[str]:
    """
    Submit up to 100k requests per batch. Returns the batch IDs; store them in your DB.
    """
    # Split into chunks of 100k if needed
    CHUNK_SIZE = 100_000
    batch_ids = []
    for i in range(0, len(requests), CHUNK_SIZE):
        chunk = requests[i:i + CHUNK_SIZE]
        batch = client.messages.batches.create(requests=chunk)
        batch_ids.append(batch.id)
        print(f"Submitted chunk {i // CHUNK_SIZE + 1}: batch_id={batch.id}, requests={len(chunk)}")
        # Brief pause between submissions to avoid rate limiting
        time.sleep(1)
    return batch_ids


# Usage
documents = [{"id": str(i), "text": f"Contract text {i}..."} for i in range(10_000)]
requests = prepare_batch(documents)
batch_ids = submit_batch(requests)

# CRITICAL: persist batch_ids immediately
with open("batch_ids.json", "w") as f:
    json.dump(batch_ids, f)
```
Do not skip persisting the batch IDs. If your process crashes between submission and result retrieval, you lose the reference. Write them to a file, database, or even a Slack message — you need them to recover.
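With the IDs on disk, crash recovery is a few lines: reload them and re-enter the poll-and-process loop from Steps 4 and 5. A sketch, assuming the filename used above:

```python
import json
from pathlib import Path

def load_batch_ids(path: str = "batch_ids.json") -> list[str]:
    """Recover persisted batch IDs; returns [] if nothing was saved."""
    p = Path(path)
    if not p.exists():
        return []
    return json.loads(p.read_text())

def resume_pipeline(path: str = "batch_ids.json") -> None:
    """Re-enter the poll/process loop using only the saved IDs.
    wait_for_batch and process_results are defined in Steps 4 and 5."""
    for batch_id in load_batch_ids(path):
        wait_for_batch(batch_id)
        process_results(batch_id, f"results_{batch_id}.jsonl")
```

In production you would store the IDs in a database row alongside job metadata, but a JSON file is enough to make the pipeline restartable.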
Step 4: Poll for Completion
Batches typically complete in 1–4 hours for 10k requests, but the processing window is capped at 24 hours; requests still pending at that point are expired. Poll with exponential backoff — hammering the status endpoint every 5 seconds wastes quota and adds noise to your logs.
```python
def wait_for_batch(batch_id: str, initial_interval: float = 60):
    """
    Poll until processing_status reaches "ended", then return the final
    MessageBatch object. (Individual requests finish as succeeded, errored,
    canceled, or expired.) Uses exponential backoff up to 10 minutes between checks.
    """
    interval = initial_interval
    MAX_INTERVAL = 600  # 10 minutes
    while True:
        batch = client.messages.batches.retrieve(batch_id)
        status = batch.processing_status
        counts = batch.request_counts
        print(
            f"[{batch_id}] Status: {status} | "
            f"Processing: {counts.processing} | "
            f"Succeeded: {counts.succeeded} | "
            f"Errored: {counts.errored}"
        )
        if status == "ended":
            return batch
        # Exponential backoff with cap
        time.sleep(interval)
        interval = min(interval * 1.5, MAX_INTERVAL)
```
Step 5: Parse and Store Results
Results come as a streaming JSONL file. Each line is either a success with the full Claude response, or an error with a code and message. Process them line by line — don’t load the whole thing into memory for large batches.
```python
def process_results(batch_id: str, output_path: str = "results.jsonl"):
    """
    Stream results and write successes to output file.
    Returns list of failed custom_ids for retry.
    """
    failed_ids = []
    success_count = 0
    with open(output_path, "w") as out_file:
        for result in client.messages.batches.results(batch_id):
            if result.result.type == "succeeded":
                # Extract the text content from the response
                content = result.result.message.content[0].text
                # Parse the JSON Claude returned
                try:
                    parsed = json.loads(content)
                    out_file.write(json.dumps({
                        "id": result.custom_id,
                        "data": parsed
                    }) + "\n")
                    success_count += 1
                except json.JSONDecodeError:
                    # Claude returned malformed JSON; treat as failure
                    failed_ids.append(result.custom_id)
            elif result.result.type == "errored":
                error = result.result.error
                print(f"Error on {result.custom_id}: {error.type} - {error.message}")
                failed_ids.append(result.custom_id)
            else:
                # canceled / expired results also need a retry
                failed_ids.append(result.custom_id)
    print(f"Done: {success_count} succeeded, {len(failed_ids)} failed")
    return failed_ids
```
Step 6: Handle Partial Failures and Resubmit
Even well-formed batches have a small error rate — typically 0.1–0.5% from transient API errors, context window overflows on outlier documents, or rate limit hits within the batch. You need a retry path.
```python
def retry_failed(failed_ids: list[str], original_documents: dict) -> list[str]:
    """
    Resubmit only the failed requests.
    original_documents: dict mapping doc_id -> document text
    """
    if not failed_ids:
        print("No failures to retry.")
        return []
    print(f"Retrying {len(failed_ids)} failed requests...")
    retry_requests = []
    for custom_id in failed_ids:
        # Strip the "doc-" prefix we added in build_extraction_request
        doc_id = custom_id.removeprefix("doc-")
        if doc_id in original_documents:
            req = build_extraction_request(doc_id, original_documents[doc_id])
            retry_requests.append(req)
    retry_batch_ids = submit_batch(retry_requests)
    return retry_batch_ids


# Full pipeline
def run_pipeline(documents: list[dict]):
    doc_lookup = {str(doc["id"]): doc["text"] for doc in documents}
    # Submit
    requests = prepare_batch(documents)
    batch_ids = submit_batch(requests)
    all_failed = []
    for batch_id in batch_ids:
        wait_for_batch(batch_id)
        failed = process_results(batch_id, f"results_{batch_id}.jsonl")
        all_failed.extend(failed)
    # One retry pass
    if all_failed:
        retry_ids = retry_failed(all_failed, doc_lookup)
        for batch_id in retry_ids:
            wait_for_batch(batch_id)
            process_results(batch_id, f"results_retry_{batch_id}.jsonl")
```
If you’re building this into a larger cost-managed system, the batch API pairs well with strategies for managing LLM API costs at scale — particularly around model selection per document tier.
Cost and Latency Comparison: Batch vs Synchronous
Here’s what the numbers actually look like on a realistic 10,000-document extraction job (500 tokens in, 200 tokens out per document):
| Approach | Model | Cost | Latency |
|---|---|---|---|
| Synchronous (concurrent) | Haiku | ~$3.75 | ~45 min at rate limits |
| Batch API | Haiku | ~$1.88 | 1–4 hours |
| Synchronous (concurrent) | Sonnet | ~$46.25 | ~2 hours at rate limits |
| Batch API | Sonnet | ~$23.13 | 2–8 hours |
The synchronous approach also requires you to manage concurrency, rate limits, and retry logic yourself — which adds engineering overhead. Batch offloads all of that. For lead scoring at scale, our AI lead scoring automation guide shows how to plug batch results directly into CRM updates.
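To make that overhead concrete, here is the minimum scaffolding a synchronous approach needs just for concurrency control, before any retry or rate-limit handling is added. `call_claude` is a hypothetical async wrapper around `client.messages.create`:

```python
import asyncio
from typing import Awaitable, Callable, Iterable

async def run_bounded(fn: Callable[..., Awaitable], items: Iterable, max_concurrent: int = 10) -> list:
    """Map an async API call over items with a concurrency cap.
    With the Batch API, all of this bookkeeping disappears."""
    sem = asyncio.Semaphore(max_concurrent)

    async def guarded(item):
        async with sem:  # at most max_concurrent calls in flight
            return await fn(item)

    # gather preserves input order in its results
    return await asyncio.gather(*(guarded(i) for i in items))

# Usage (call_claude is an assumed async wrapper you would write):
# results = asyncio.run(run_bounded(call_claude, documents, max_concurrent=10))
```

And this still handles neither 429 backoff nor mid-run crashes; the batch endpoint absorbs both.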
Common Errors
Error 1: “invalid_request_error” on submission with large payloads
This usually means one or more requests in your batch exceeds the model’s context window. The entire batch is rejected, not just the offending request. Fix: validate token counts before submission using the SDK’s token-counting endpoint (client.messages.count_tokens()) or a rough characters/4 estimate, and truncate outliers.
```python
def safe_truncate(text: str, max_chars: int = 24000) -> str:
    """Rough guard: 24k chars ≈ 6k tokens, leaving room for system prompt + output."""
    return text[:max_chars] if len(text) > max_chars else text
```
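Building on that heuristic, a pre-submission pass can pull oversized requests out of the batch before they sink the whole submission. The 150k-token budget below is an assumption; check your model’s actual context window:

```python
def filter_oversized(requests: list[dict], max_input_tokens: int = 150_000) -> tuple[list[dict], list[str]]:
    """Split a request list into (submittable, rejected_custom_ids)
    using the rough 4-chars-per-token estimate."""
    ok, rejected = [], []
    for req in requests:
        # Count system prompt plus all string message contents
        chars = len(req["params"].get("system", ""))
        for msg in req["params"]["messages"]:
            if isinstance(msg["content"], str):
                chars += len(msg["content"])
        if chars / 4 > max_input_tokens:
            rejected.append(req["custom_id"])
        else:
            ok.append(req)
    return ok, rejected
```

Log the rejected IDs and handle those documents separately (chunking, summarizing, or manual review) rather than losing the whole batch.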
Error 2: Batch expires before you retrieve results
Batch results are available for 29 days after the job ends. But if you never poll and the job expires (24-hour processing window), you get nothing. Fix: always persist batch IDs and set a cron job or scheduled task to poll and retrieve results within the window. See our guide on scheduling AI workflows with cron jobs for a clean setup.
Error 3: JSON parse failures on “succeeded” results
Claude occasionally wraps JSON in markdown code fences even when the system prompt says not to, especially on edge-case documents. Fix: strip fence markers before parsing.
```python
import re

def clean_json_response(text: str) -> str:
    """Strip markdown code fences that Claude sometimes adds despite instructions."""
    text = re.sub(r'^```(?:json)?\n', '', text.strip())
    text = re.sub(r'\n```$', '', text)
    return text.strip()

# Use it before json.loads()
parsed = json.loads(clean_json_response(content))
```
This is covered in more depth in our article on getting consistent JSON from Claude without hallucinations.
When to Use Batch vs Synchronous
Use batch API when: you have 100+ documents, results aren’t needed in real time, you’re on a tight budget, or you’re running overnight enrichment jobs (CRM updates, document indexing, report generation).
Use synchronous API when: a user is waiting for the result, you need sub-second responses, or you’re doing interactive workflows where each step depends on the last.
The batch API is not a fit for real-time chat, streaming responses, or anything with a human in the loop. It’s purpose-built for the boring-but-important bulk processing that underpins most production data pipelines.
What to Build Next
The natural extension of this pipeline is a tiered routing system: use a fast, cheap classifier (a single synchronous Haiku call) to score each incoming document for complexity, then route complex documents to Sonnet batches and simple ones to Haiku batches. This gets you Sonnet quality where it matters and Haiku pricing where it doesn’t — practically cutting your batch costs in half again while improving output quality on the hard cases. Pair it with the structured output patterns covered here and you have a production-grade document processing service that can handle virtually any volume.
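A sketch of the routing split, assuming a classifier pass has already attached a `complexity` score to each document (the 0.7 threshold is arbitrary; tune it against your own quality data):

```python
def route_by_complexity(documents: list[dict], threshold: float = 0.7) -> tuple[list[dict], list[dict]]:
    """Partition pre-scored documents into Sonnet and Haiku batch queues."""
    sonnet_queue = [d for d in documents if d["complexity"] >= threshold]
    haiku_queue = [d for d in documents if d["complexity"] < threshold]
    return sonnet_queue, haiku_queue

# Each queue then goes through prepare_batch/submit_batch with the matching model.
```

The classifier call adds a small synchronous cost per document, so this only pays off when the Sonnet/Haiku price gap outweighs it, which it does at any meaningful volume.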
Frequently Asked Questions
How long does Claude’s batch API take to process 10,000 requests?
Typically 1–4 hours for 10,000 requests using Claude Haiku, though the processing window is capped at 24 hours. Larger batches or higher-demand periods can push toward the upper end. Design your pipeline to handle the full 24 hours and treat anything faster as a bonus.
What’s the maximum batch size for the Claude batch API?
A single batch can contain up to 100,000 requests. If you have more documents, split them across multiple batches. There’s no enforced limit on how many batches you can have active simultaneously, but in practice you’ll want to stagger submissions to keep polling manageable.
Does the Claude batch API support all models?
The Batch API supports Claude Haiku, Sonnet, and Opus (all current generations). Because each request in a batch carries its own model in its params, you can even mix models within a single batch. Haiku is the obvious choice for cost-sensitive bulk jobs; Sonnet is worth the premium for complex extraction or reasoning tasks.
How do I handle documents that fail in a batch without reprocessing everything?
When you stream results, filter for result.result.type == "errored" and collect the custom_id values. Since you assign those IDs yourself (typically your database primary key), you can look up just the failed documents and resubmit them as a new, smaller batch. This avoids re-billing for requests that already succeeded.
Can I cancel a batch job after submitting it?
Yes. Call client.messages.batches.cancel(batch_id). Any requests already processed will have results available; unprocessed requests are dropped. You’re only billed for the tokens that were actually processed before cancellation.
Editorial note: API pricing, model capabilities, and tool features change frequently — always verify current details on the vendor’s website before building in production. Code examples are tested at time of writing; pin your dependency versions to avoid breaking changes. Some links in this article may be affiliate links — we may earn a commission if you sign up, at no extra cost to you.

