By the end of this tutorial, you’ll have a working Python pipeline that submits 10,000+ documents to Claude’s batch API, polls for results, handles failures, and writes structured output — at roughly half the cost of synchronous API calls. Claude batch API processing is one of the most underused features in the Anthropic ecosystem, and for high-volume workloads it’s the obvious right choice.
The Batch API lets you submit up to 100,000 requests in a single job. Anthropic processes them asynchronously and charges 50% of standard per-token pricing. The tradeoff: results take up to 24 hours. For most document processing pipelines — contract extraction, lead scoring, content moderation, invoice parsing — that latency is completely acceptable, and the cost savings are significant.
At the Haiku pricing used in this example ($0.25/MTok input, $1.25/MTok output; verify current rates before planning a budget), a 10,000-document job where each document is ~500 tokens in and ~200 tokens out costs roughly $1.25 input + $2.50 output = $3.75 via the synchronous API. The same job through the Batch API, at the 50% discount, comes to about $1.88. At Sonnet scale the savings get much more interesting.
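That arithmetic is worth scripting so you can sanity-check a budget before submitting anything. A minimal estimator (prices are parameters, since they change):

```python
def estimate_batch_cost(
    num_docs: int,
    input_tokens_per_doc: int,
    output_tokens_per_doc: int,
    input_price_per_mtok: float,
    output_price_per_mtok: float,
    batch_discount: float = 0.5,  # Batch API charges 50% of standard pricing
) -> float:
    """Estimate total batch cost in dollars; prices are per million tokens."""
    input_cost = num_docs * input_tokens_per_doc / 1_000_000 * input_price_per_mtok
    output_cost = num_docs * output_tokens_per_doc / 1_000_000 * output_price_per_mtok
    return (input_cost + output_cost) * batch_discount

# The example above: 10,000 docs, 500 in / 200 out, at $0.25/$1.25 per MTok
cost = estimate_batch_cost(10_000, 500, 200, 0.25, 1.25)
```

Pass `batch_discount=1.0` to get the synchronous price for the same workload.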
What You’ll Build — and the Steps to Get There
- Install dependencies — Set up the Anthropic SDK and supporting libraries
- Structure your batch requests — Build valid JSONL request objects with custom IDs
- Submit the batch job — Call the batch API and store the batch ID
- Poll for completion — Write a resilient polling loop with exponential backoff
- Parse and store results — Stream the JSONL result file and write to your database
- Handle partial failures — Identify failed requests and resubmit them
Step 1: Install Dependencies
You need a recent version of the anthropic SDK; the client.messages.batches namespace used below is not present in older releases, so install the latest and pin it. The SDK wraps the full lifecycle: submission, polling, and result retrieval.
```shell
# Quote the version spec so the shell doesn't treat ">" as a redirect
pip install -U "anthropic" python-dotenv tqdm
```
```python
import anthropic
import os
import json
import time
from pathlib import Path
from dotenv import load_dotenv

load_dotenv()
client = anthropic.Anthropic(api_key=os.environ["ANTHROPIC_API_KEY"])
```
Step 2: Structure Your Batch Requests
Each request in a batch needs a unique custom_id (your reference, not Anthropic’s), a model, and a messages array. The custom ID is how you correlate results back to your documents — make it meaningful.
```python
def build_extraction_request(doc_id: str, document_text: str) -> dict:
    """
    Build a single batch request for contract data extraction.
    Returns a dict that matches the MessageCreateParamsNonStreaming shape.
    """
    return {
        "custom_id": f"doc-{doc_id}",  # Use your DB primary key here
        "params": {
            "model": "claude-haiku-4-5",  # Haiku for cost, Sonnet for quality
            "max_tokens": 512,
            "system": (
                "You are a contract extraction assistant. "
                "Return ONLY valid JSON with keys: party_names, effective_date, "
                "total_value, jurisdiction. If a field is missing, use null."
            ),
            "messages": [
                {
                    "role": "user",
                    # Truncate to avoid token limit surprises
                    "content": f"Extract structured data from this contract:\n\n{document_text[:8000]}",
                }
            ],
        },
    }


# Build requests from your document list
def prepare_batch(documents: list[dict]) -> list[dict]:
    requests = []
    for doc in documents:
        req = build_extraction_request(doc["id"], doc["text"])
        requests.append(req)
    return requests
```
If you’re doing lead scoring rather than extraction, the shape is identical — just swap the system prompt and expected JSON schema. For a deeper look at how to reliably get consistent JSON output from Claude, see our guide on structured output mastery for Claude — the same techniques apply inside batch requests.
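For example, a lead-scoring variant might look like the sketch below; the prompt and schema keys are illustrative, not a fixed contract:

```python
def build_scoring_request(lead_id: str, lead_profile: str) -> dict:
    """Same batch-request shape as extraction, with a scoring prompt swapped in."""
    return {
        "custom_id": f"lead-{lead_id}",  # Your CRM record ID
        "params": {
            "model": "claude-haiku-4-5",
            "max_tokens": 256,
            "system": (
                "You are a lead scoring assistant. Return ONLY valid JSON "
                "with keys: score (0-100), tier (hot/warm/cold), reasoning."
            ),
            "messages": [
                {"role": "user", "content": f"Score this lead:\n\n{lead_profile[:8000]}"}
            ],
        },
    }
```

Everything downstream (submission, polling, result parsing) stays identical.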
Step 3: Submit the Batch Job
The SDK handles serialization. You pass a list of Request objects (or raw dicts) and get back a batch object with an ID you need to persist immediately.
```python
def submit_batch(requests: list[dict]) -> list[str]:
    """
    Submit up to 100k requests per batch. Returns the batch IDs; store them in your DB.
    """
    # Split into chunks of 100k if needed
    CHUNK_SIZE = 100_000
    batch_ids = []
    for i in range(0, len(requests), CHUNK_SIZE):
        chunk = requests[i:i + CHUNK_SIZE]
        batch = client.messages.batches.create(requests=chunk)
        batch_ids.append(batch.id)
        print(f"Submitted chunk {i // CHUNK_SIZE + 1}: batch_id={batch.id}, requests={len(chunk)}")
        # Brief pause between submissions to avoid rate limiting
        time.sleep(1)
    return batch_ids


# Usage
documents = [{"id": str(i), "text": f"Contract text {i}..."} for i in range(10_000)]
requests = prepare_batch(documents)
batch_ids = submit_batch(requests)

# CRITICAL: persist batch_ids immediately
with open("batch_ids.json", "w") as f:
    json.dump(batch_ids, f)
```
Do not skip persisting the batch IDs. If your process crashes between submission and result retrieval, you lose the reference. Write them to a file, database, or even a Slack message — you need them to recover.
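With the IDs on disk, crash recovery is a few lines: reload them and re-enter the poll-and-process loop from Steps 4 and 5. A sketch, assuming the filename used above:

```python
import json
from pathlib import Path

def load_batch_ids(path: str = "batch_ids.json") -> list[str]:
    """Recover persisted batch IDs; returns [] if nothing was saved."""
    p = Path(path)
    if not p.exists():
        return []
    return json.loads(p.read_text())

def resume_pipeline(path: str = "batch_ids.json") -> None:
    """Re-enter the poll/process loop using only the saved IDs.
    wait_for_batch and process_results are defined in Steps 4 and 5."""
    for batch_id in load_batch_ids(path):
        wait_for_batch(batch_id)
        process_results(batch_id, f"results_{batch_id}.jsonl")
```

In production you would store the IDs in a database row alongside job metadata, but a JSON file is enough to make the pipeline restartable.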
Step 4: Poll for Completion
Batches typically complete in 1–4 hours for 10k requests, but the processing window is capped at 24 hours; requests still pending at that point are expired. Poll with exponential backoff — hammering the status endpoint every 5 seconds wastes quota and adds noise to your logs.
```python
def wait_for_batch(batch_id: str, initial_interval: float = 60):
    """
    Poll until processing_status reaches "ended", then return the final
    MessageBatch object. (Individual requests finish as succeeded, errored,
    canceled, or expired.) Uses exponential backoff up to 10 minutes between checks.
    """
    interval = initial_interval
    MAX_INTERVAL = 600  # 10 minutes
    while True:
        batch = client.messages.batches.retrieve(batch_id)
        status = batch.processing_status
        counts = batch.request_counts
        print(
            f"[{batch_id}] Status: {status} | "
            f"Processing: {counts.processing} | "
            f"Succeeded: {counts.succeeded} | "
            f"Errored: {counts.errored}"
        )
        if status == "ended":
            return batch
        # Exponential backoff with cap
        time.sleep(interval)
        interval = min(interval * 1.5, MAX_INTERVAL)
```
Step 5: Parse and Store Results
Results come as a streaming JSONL file. Each line is either a success with the full Claude response, or an error with a code and message. Process them line by line — don’t load the whole thing into memory for large batches.
```python
def process_results(batch_id: str, output_path: str = "results.jsonl"):
    """
    Stream results and write successes to output file.
    Returns list of failed custom_ids for retry.
    """
    failed_ids = []
    success_count = 0
    with open(output_path, "w") as out_file:
        for result in client.messages.batches.results(batch_id):
            if result.result.type == "succeeded":
                # Extract the text content from the response
                content = result.result.message.content[0].text
                # Parse the JSON Claude returned
                try:
                    parsed = json.loads(content)
                    out_file.write(json.dumps({
                        "id": result.custom_id,
                        "data": parsed
                    }) + "\n")
                    success_count += 1
                except json.JSONDecodeError:
                    # Claude returned malformed JSON; treat as failure
                    failed_ids.append(result.custom_id)
            elif result.result.type == "errored":
                error = result.result.error
                print(f"Error on {result.custom_id}: {error.type} - {error.message}")
                failed_ids.append(result.custom_id)
            else:
                # canceled / expired results also need a retry
                failed_ids.append(result.custom_id)
    print(f"Done: {success_count} succeeded, {len(failed_ids)} failed")
    return failed_ids
```
Step 6: Handle Partial Failures and Resubmit
Even well-formed batches have a small error rate — typically 0.1–0.5% from transient API errors, context window overflows on outlier documents, or rate limit hits within the batch. You need a retry path.
```python
def retry_failed(failed_ids: list[str], original_documents: dict) -> list[str]:
    """
    Resubmit only the failed requests.
    original_documents: dict mapping doc_id -> document text
    """
    if not failed_ids:
        print("No failures to retry.")
        return []
    print(f"Retrying {len(failed_ids)} failed requests...")
    retry_requests = []
    for custom_id in failed_ids:
        # Strip the "doc-" prefix we added in build_extraction_request
        doc_id = custom_id.removeprefix("doc-")
        if doc_id in original_documents:
            req = build_extraction_request(doc_id, original_documents[doc_id])
            retry_requests.append(req)
    retry_batch_ids = submit_batch(retry_requests)
    return retry_batch_ids


# Full pipeline
def run_pipeline(documents: list[dict]):
    doc_lookup = {str(doc["id"]): doc["text"] for doc in documents}
    # Submit
    requests = prepare_batch(documents)
    batch_ids = submit_batch(requests)
    all_failed = []
    for batch_id in batch_ids:
        wait_for_batch(batch_id)
        failed = process_results(batch_id, f"results_{batch_id}.jsonl")
        all_failed.extend(failed)
    # One retry pass
    if all_failed:
        retry_ids = retry_failed(all_failed, doc_lookup)
        for batch_id in retry_ids:
            wait_for_batch(batch_id)
            process_results(batch_id, f"results_retry_{batch_id}.jsonl")
```
If you’re building this into a larger cost-managed system, the batch API pairs well with strategies for managing LLM API costs at scale — particularly around model selection per document tier.
Cost and Latency Comparison: Batch vs Synchronous
Here’s what the numbers actually look like on a realistic 10,000-document extraction job (500 tokens in, 200 tokens out per document):
| Approach | Model | Cost | Latency |
|---|---|---|---|
| Synchronous (concurrent) | Haiku | ~$3.75 | ~45 min at rate limits |
| Batch API | Haiku | ~$1.88 | 1–4 hours |
| Synchronous (concurrent) | Sonnet | ~$46.25 | ~2 hours at rate limits |
| Batch API | Sonnet | ~$23.13 | 2–8 hours |
The synchronous approach also requires you to manage concurrency, rate limits, and retry logic yourself — which adds engineering overhead. Batch offloads all of that. For lead scoring at scale, our AI lead scoring automation guide shows how to plug batch results directly into CRM updates.
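To make that overhead concrete, here is the minimum scaffolding a synchronous approach needs just for concurrency control, before any retry or rate-limit handling is added. `call_claude` is a hypothetical async wrapper around `client.messages.create`:

```python
import asyncio
from typing import Awaitable, Callable, Iterable

async def run_bounded(fn: Callable[..., Awaitable], items: Iterable, max_concurrent: int = 10) -> list:
    """Map an async API call over items with a concurrency cap.
    With the Batch API, all of this bookkeeping disappears."""
    sem = asyncio.Semaphore(max_concurrent)

    async def guarded(item):
        async with sem:  # at most max_concurrent calls in flight
            return await fn(item)

    # gather preserves input order in its results
    return await asyncio.gather(*(guarded(i) for i in items))

# Usage (call_claude is an assumed async wrapper you would write):
# results = asyncio.run(run_bounded(call_claude, documents, max_concurrent=10))
```

And this still handles neither 429 backoff nor mid-run crashes; the batch endpoint absorbs both.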
Common Errors
Error 1: “invalid_request_error” on submission with large payloads
This usually means one or more requests in your batch exceeds the model’s context window. The entire batch is rejected, not just the offending request. Fix: validate token counts before submission using the SDK’s token-counting endpoint (client.messages.count_tokens()) or a rough characters/4 estimate, and truncate outliers.
```python
def safe_truncate(text: str, max_chars: int = 24000) -> str:
    """Rough guard: 24k chars ≈ 6k tokens, leaving room for system prompt + output."""
    return text[:max_chars] if len(text) > max_chars else text
```
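Building on that heuristic, a pre-submission pass can pull oversized requests out of the batch before they sink the whole submission. The 150k-token budget below is an assumption; check your model’s actual context window:

```python
def filter_oversized(requests: list[dict], max_input_tokens: int = 150_000) -> tuple[list[dict], list[str]]:
    """Split a request list into (submittable, rejected_custom_ids)
    using the rough 4-chars-per-token estimate."""
    ok, rejected = [], []
    for req in requests:
        # Count system prompt plus all string message contents
        chars = len(req["params"].get("system", ""))
        for msg in req["params"]["messages"]:
            if isinstance(msg["content"], str):
                chars += len(msg["content"])
        if chars / 4 > max_input_tokens:
            rejected.append(req["custom_id"])
        else:
            ok.append(req)
    return ok, rejected
```

Log the rejected IDs and handle those documents separately (chunking, summarizing, or manual review) rather than losing the whole batch.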
Error 2: Batch expires before you retrieve results
Batch results are available for 29 days after the job ends. But if you never poll and the job expires (24-hour processing window), you get nothing. Fix: always persist batch IDs and set a cron job or scheduled task to poll and retrieve results within the window. See our guide on scheduling AI workflows with cron jobs for a clean setup.
Error 3: JSON parse failures on “succeeded” results
Claude occasionally wraps JSON in markdown code fences even when the system prompt says not to, especially on edge-case documents. Fix: strip fence markers before parsing.
```python
import re

def clean_json_response(text: str) -> str:
    """Strip markdown code fences that Claude sometimes adds despite instructions."""
    text = re.sub(r'^```(?:json)?\n', '', text.strip())
    text = re.sub(r'\n```$', '', text)
    return text.strip()

# Use it before json.loads()
parsed = json.loads(clean_json_response(content))
```
This is covered in more depth in our article on getting consistent JSON from Claude without hallucinations.
When to Use Batch vs Synchronous
Use batch API when: you have 100+ documents, results aren’t needed in real time, you’re on a tight budget, or you’re running overnight enrichment jobs (CRM updates, document indexing, report generation).
Use synchronous API when: a user is waiting for the result, you need sub-second responses, or you’re doing interactive workflows where each step depends on the last.
The batch API is not a fit for real-time chat, streaming responses, or anything with a human in the loop. It’s purpose-built for the boring-but-important bulk processing that underpins most production data pipelines.
What to Build Next
The natural extension of this pipeline is a tiered routing system: use a fast, cheap classifier (a single synchronous Haiku call) to score each incoming document for complexity, then route complex documents to Sonnet batches and simple ones to Haiku batches. This gets you Sonnet quality where it matters and Haiku pricing where it doesn’t — practically cutting your batch costs in half again while improving output quality on the hard cases. Pair it with the structured output patterns covered here and you have a production-grade document processing service that can handle virtually any volume.
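A sketch of the routing split, assuming a classifier pass has already attached a `complexity` score to each document (the 0.7 threshold is arbitrary; tune it against your own quality data):

```python
def route_by_complexity(documents: list[dict], threshold: float = 0.7) -> tuple[list[dict], list[dict]]:
    """Partition pre-scored documents into Sonnet and Haiku batch queues."""
    sonnet_queue = [d for d in documents if d["complexity"] >= threshold]
    haiku_queue = [d for d in documents if d["complexity"] < threshold]
    return sonnet_queue, haiku_queue

# Each queue then goes through prepare_batch/submit_batch with the matching model.
```

The classifier call adds a small synchronous cost per document, so this only pays off when the Sonnet/Haiku price gap outweighs it, which it does at any meaningful volume.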
Frequently Asked Questions
How long does Claude’s batch API take to process 10,000 requests?
Typically 1–4 hours for 10,000 requests using Claude Haiku, though the processing window is capped at 24 hours. Larger batches or higher-demand periods can push toward the upper end. Design your pipeline to handle the full 24 hours and treat anything faster as a bonus.
What’s the maximum batch size for the Claude batch API?
A single batch can contain up to 100,000 requests. If you have more documents, split them across multiple batches. There’s no enforced limit on how many batches you can have active simultaneously, but in practice you’ll want to stagger submissions to keep polling manageable.
Does the Claude batch API support all models?
The Batch API supports Claude Haiku, Sonnet, and Opus (all current generations). Because each request in a batch carries its own model in its params, you can even mix models within a single batch. Haiku is the obvious choice for cost-sensitive bulk jobs; Sonnet is worth the premium for complex extraction or reasoning tasks.
How do I handle documents that fail in a batch without reprocessing everything?
When you stream results, filter for result.result.type == "errored" and collect the custom_id values. Since you assign those IDs yourself (typically your database primary key), you can look up just the failed documents and resubmit them as a new, smaller batch. This avoids re-billing for requests that already succeeded.
Can I cancel a batch job after submitting it?
Yes. Call client.messages.batches.cancel(batch_id). Any requests already processed will have results available; unprocessed requests are dropped. You’re only billed for the tokens that were actually processed before cancellation.
Editorial note: API pricing, model capabilities, and tool features change frequently — always verify current details on the vendor’s website before building in production. Code examples are tested at time of writing; pin your dependency versions to avoid breaking changes. Some links in this article may be affiliate links — we may earn a commission if you sign up, at no extra cost to you.

