
By the end of this tutorial, you’ll have a working Python pipeline that takes invoice and receipt images (or PDFs), extracts structured JSON data using Claude’s vision capabilities, validates the output, and pushes it to a downstream accounting system via API. This is production-grade invoice and receipt extraction with Claude — not a toy demo.

I’ve built this for a client processing ~3,000 documents per day. The same pattern works whether you’re handling 50 invoices a week from a shared inbox or tens of thousands from a multi-vendor procurement system. The tricky parts are handling layout variability, avoiding hallucinated totals, and making the retry logic robust enough to run unattended.

  1. Install dependencies — Set up the Python environment with Anthropic SDK and supporting libraries
  2. Build the extraction prompt — Design a structured prompt that returns consistent JSON across invoice formats
  3. Encode and send documents — Handle images and PDFs, send to Claude’s vision API
  4. Validate extracted data — Parse, type-check, and flag low-confidence fields
  5. Push to accounting system — POST structured data to QuickBooks, Xero, or a custom endpoint
  6. Wrap in a queue processor — Build the loop that handles batches with retries

Step 1: Install Dependencies

You need anthropic, pdf2image for PDF conversion, Pillow for image handling, and pydantic for validation. Pinning versions matters here — the Anthropic SDK has had breaking changes between minor versions.

pip install anthropic==0.26.0 pdf2image==1.17.0 Pillow==10.3.0 pydantic==2.7.1 httpx==0.27.0
# pdf2image requires poppler — on Mac: brew install poppler
# On Ubuntu: apt-get install poppler-utils

Set your API key as an environment variable. Don’t hardcode it.

export ANTHROPIC_API_KEY="sk-ant-..."
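A fail-fast check at startup beats a cryptic 401 halfway through a batch. A minimal sketch — the `sk-ant-` prefix check is a heuristic, not a documented format guarantee:

```python
import os

def require_api_key(env=os.environ) -> str:
    """Fail fast if the key is missing, rather than erroring mid-batch."""
    key = env.get("ANTHROPIC_API_KEY", "")
    if not key.startswith("sk-ant-"):
        raise RuntimeError("ANTHROPIC_API_KEY is missing or looks malformed")
    return key
```

Call it once at the top of your entry point so a misconfigured worker dies immediately instead of after queueing work.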

Step 2: Build the Extraction Prompt

This is where most implementations go wrong. Vague prompts produce inconsistent field names, mixed currency formats, and null values where there should be zeros. Give Claude a schema and ask it to fill it — don’t ask it to decide the schema.

EXTRACTION_SYSTEM_PROMPT = """You are a financial document extraction specialist.
Extract data from invoices and receipts and return ONLY valid JSON — no explanation, no markdown fences.
If a field is not present in the document, return null for that field.
Never invent or estimate values. If a total is partially obscured, return null.
Always return amounts as floats with 2 decimal places. Dates as ISO 8601 (YYYY-MM-DD).
"""

EXTRACTION_SCHEMA = {
    "vendor_name": "string",
    "vendor_address": "string or null",
    "vendor_tax_id": "string or null",
    "invoice_number": "string or null",
    "invoice_date": "YYYY-MM-DD or null",
    "due_date": "YYYY-MM-DD or null",
    "currency": "3-letter ISO code, e.g. USD",
    "subtotal": "float or null",
    "tax_amount": "float or null",
    "tax_rate_percent": "float or null",
    "total_amount": "float — required",
    "line_items": [
        {
            "description": "string",
            "quantity": "float or null",
            "unit_price": "float or null",
            "line_total": "float or null"
        }
    ],
    "payment_terms": "string or null",
    "notes": "string or null",
    "document_type": "invoice | receipt | credit_note | unknown"
}

def build_user_prompt() -> str:
    import json
    return f"""Extract all financial data from this document using exactly this JSON structure:

{json.dumps(EXTRACTION_SCHEMA, indent=2)}

Return only the populated JSON object."""

The key constraint is "total_amount": "float — required". If the model can’t find a total, you want an exception at validation time, not a silent null. For tips on getting consistent behavior from structured prompts, the reducing LLM hallucinations in production guide covers verification patterns that apply directly here.
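For a simple single-line receipt, a well-formed response might look like this — all values are illustrative, not taken from a real document:

```python
import json

# Illustrative model output for a hypothetical coffee-shop receipt —
# note nulls for absent fields and the always-present total_amount.
sample_response = """{
  "vendor_name": "Blue Bottle Coffee",
  "vendor_address": null,
  "vendor_tax_id": null,
  "invoice_number": null,
  "invoice_date": "2024-11-03",
  "due_date": null,
  "currency": "USD",
  "subtotal": 5.50,
  "tax_amount": 0.48,
  "tax_rate_percent": 8.75,
  "total_amount": 5.98,
  "line_items": [
    {"description": "Latte", "quantity": 1, "unit_price": 5.50, "line_total": 5.50}
  ],
  "payment_terms": null,
  "notes": null,
  "document_type": "receipt"
}"""

parsed = json.loads(sample_response)
```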

Step 3: Encode and Send Documents

Claude accepts images as base64-encoded strings in the message content. This pipeline converts PDFs to images first; the Messages API has since gained native PDF document support, but rasterizing the pages yourself gives you control over DPI and keeps the code path identical for images and PDFs.

import anthropic
import base64
import io
from pathlib import Path
from pdf2image import convert_from_path
from PIL import Image

client = anthropic.Anthropic()

def load_document_as_images(file_path: str) -> list[str]:
    """Convert document to list of base64-encoded PNG strings."""
    path = Path(file_path)
    
    if path.suffix.lower() == ".pdf":
        # Convert each PDF page to a PIL image
        images = convert_from_path(str(path), dpi=200)
    else:
        # Single image file
        images = [Image.open(path)]
    
    encoded = []
    for img in images:
        # Downscale very large pages — oversized images slow uploads and get resized server-side anyway
        if img.width > 2000:
            ratio = 2000 / img.width
            img = img.resize((2000, int(img.height * ratio)), Image.LANCZOS)
        
        buf = io.BytesIO()
        img.save(buf, format="PNG")
        encoded.append(base64.standard_b64encode(buf.getvalue()).decode("utf-8"))
    
    return encoded

def extract_from_document(file_path: str) -> str:
    """Send document to Claude and return the raw JSON string from the response."""
    image_data_list = load_document_as_images(file_path)
    
    # Build content blocks — one per page
    content = []
    for i, img_b64 in enumerate(image_data_list):
        if len(image_data_list) > 1:
            content.append({"type": "text", "text": f"Page {i+1}:"})
        content.append({
            "type": "image",
            "source": {
                "type": "base64",
                "media_type": "image/png",
                "data": img_b64
            }
        })
    content.append({"type": "text", "text": build_user_prompt()})
    
    response = client.messages.create(
        model="claude-3-5-sonnet-20241022",  # Use Sonnet for accuracy on complex layouts
        max_tokens=2048,
        system=EXTRACTION_SYSTEM_PROMPT,
        messages=[{"role": "user", "content": content}]
    )
    
    return response.content[0].text

Use claude-3-5-sonnet-20241022 for this, not Haiku. I tested both extensively. Haiku misses line items on multi-vendor invoices and gets tax calculations wrong more often than you’d want in production. Sonnet at ~$0.003 per image is worth it — the cost of one human correction is higher. If you’re running very high volume and want to evaluate cost tradeoffs, see the batch processing workflows with Claude API guide which covers async batch mode — you can cut costs ~50% with 24-hour turnaround.
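The back-of-envelope math for the volumes mentioned above — the per-document cost is the article’s estimate, not an official price, so treat the numbers as illustrative:

```python
def daily_api_cost(docs_per_day: int, per_doc: float = 0.003, batch: bool = False) -> float:
    """Estimate daily spend; batch=True assumes the ~50% async batch discount."""
    cost = docs_per_day * per_doc
    if batch:
        cost *= 0.5
    return round(cost, 2)
```

At 3,000 documents a day this gives $9.00 synchronous or about $4.50 via batch mode.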

Step 4: Validate Extracted Data

Raw LLM output needs validation before it touches your accounting system. Pydantic handles type coercion and catches malformed JSON cleanly.

import json
from pydantic import BaseModel, field_validator, ValidationError
from typing import Optional
from datetime import date

class LineItem(BaseModel):
    description: str
    quantity: Optional[float] = None
    unit_price: Optional[float] = None
    line_total: Optional[float] = None

class InvoiceData(BaseModel):
    vendor_name: str
    vendor_address: Optional[str] = None
    vendor_tax_id: Optional[str] = None
    invoice_number: Optional[str] = None
    invoice_date: Optional[str] = None  # Keep as string, validate format separately
    due_date: Optional[str] = None
    currency: str = "USD"
    subtotal: Optional[float] = None
    tax_amount: Optional[float] = None
    tax_rate_percent: Optional[float] = None
    total_amount: float  # Required — validation fails if missing
    line_items: list[LineItem] = []
    payment_terms: Optional[str] = None
    notes: Optional[str] = None
    document_type: str = "unknown"

    @field_validator("total_amount")
    @classmethod
    def total_must_be_positive(cls, v):
        if v <= 0:
            raise ValueError(f"total_amount must be positive, got {v}")
        return round(v, 2)
    
    @field_validator("currency")
    @classmethod
    def currency_must_be_valid(cls, v):
        if len(v) != 3:
            raise ValueError(f"Currency must be 3-letter ISO code, got: {v}")
        return v.upper()

def parse_and_validate(raw_json: str) -> InvoiceData:
    """Parse Claude's response and validate with Pydantic."""
    # Strip markdown fences if Claude included them despite instructions
    cleaned = raw_json.strip()
    if cleaned.startswith("```"):
        cleaned = "\n".join(cleaned.split("\n")[1:-1])
    
    data = json.loads(cleaned)
    return InvoiceData(**data)

Step 5: Push to Accounting System

This example targets a generic REST endpoint — swap in your QuickBooks or Xero SDK calls as needed. The structure stays the same.

import httpx

ACCOUNTING_API_URL = "https://your-accounting-api.example.com/v1/documents"
ACCOUNTING_API_KEY = "your-api-key"

def push_to_accounting(invoice: InvoiceData, source_file: str) -> dict:
    """POST extracted data to accounting system."""
    payload = {
        "source": source_file,
        "document_type": invoice.document_type,
        "vendor": {
            "name": invoice.vendor_name,
            "address": invoice.vendor_address,
            "tax_id": invoice.vendor_tax_id
        },
        "invoice_number": invoice.invoice_number,
        "dates": {
            "issued": invoice.invoice_date,
            "due": invoice.due_date
        },
        "amounts": {
            "subtotal": invoice.subtotal,
            "tax": invoice.tax_amount,
            "total": invoice.total_amount,
            "currency": invoice.currency
        },
        "line_items": [item.model_dump() for item in invoice.line_items]
    }
    
    with httpx.Client(timeout=30) as http:
        response = http.post(
            ACCOUNTING_API_URL,
            json=payload,
            headers={"Authorization": f"Bearer {ACCOUNTING_API_KEY}"}
        )
        response.raise_for_status()
        return response.json()
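One gotcha worth planning for: the retry loop in Step 6 can create a duplicate bill if the first POST succeeded but the response was lost in transit. If your accounting API supports idempotency keys (many do — check its docs), derive a stable one per document. A sketch, where the key scheme and the header name shown in the comment are assumptions, not a documented API contract:

```python
import hashlib
from typing import Optional

def idempotency_key(file_bytes: bytes, invoice_number: Optional[str]) -> str:
    """Stable per-document key: same file + invoice number always yields the same key."""
    digest = hashlib.sha256(file_bytes).hexdigest()[:16]
    return f"{invoice_number or 'noinv'}-{digest}"

# Usage (header name is hypothetical — use whatever your API expects):
# headers={"Idempotency-Key": idempotency_key(Path(f).read_bytes(), invoice.invoice_number)}
```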

Step 6: Wrap in a Queue Processor with Retries

The full production loop needs retry logic, dead-letter handling, and logging. Don’t run this without retry — transient API errors will silently drop documents otherwise.

import time
import json  # for json.JSONDecodeError in the except clauses below
import logging
import anthropic  # for anthropic.APIStatusError below
from pathlib import Path

logging.basicConfig(level=logging.INFO, format="%(asctime)s %(levelname)s %(message)s")
logger = logging.getLogger(__name__)

def process_document(file_path: str, max_retries: int = 3) -> dict:
    """Full pipeline: extract → validate → push. With retries."""
    last_error = None
    
    for attempt in range(max_retries):
        try:
            raw = extract_from_document(file_path)
            invoice = parse_and_validate(raw)
            result = push_to_accounting(invoice, file_path)
            logger.info(f"✓ Processed {file_path} | total={invoice.total_amount} {invoice.currency}")
            return {"status": "success", "file": file_path, "result": result}
        
        except json.JSONDecodeError as e:
            logger.warning(f"JSON parse error on attempt {attempt+1}: {e}")
            last_error = e
            time.sleep(2 ** attempt)  # Exponential backoff
        
        except ValidationError as e:
            # Validation errors are usually not retryable (bad data, not transient)
            logger.error(f"Validation failed for {file_path}: {e}")
            return {"status": "validation_error", "file": file_path, "error": str(e)}
        
        except anthropic.APIStatusError as e:
            logger.warning(f"Claude API error on attempt {attempt+1}: {e.status_code}")
            last_error = e
            time.sleep(2 ** attempt)
    
    logger.error(f"✗ Failed after {max_retries} attempts: {file_path}")
    return {"status": "failed", "file": file_path, "error": str(last_error)}

def process_directory(directory: str, extensions: tuple = (".pdf", ".png", ".jpg", ".jpeg")):
    """Process all invoice files in a directory."""
    files = [f for f in Path(directory).iterdir() if f.suffix.lower() in extensions]
    logger.info(f"Found {len(files)} documents to process")
    
    results = {"success": 0, "validation_error": 0, "failed": 0}
    for file in files:
        outcome = process_document(str(file))
        results[outcome["status"]] = results.get(outcome["status"], 0) + 1
    
    logger.info(f"Batch complete: {results}")
    return results

# Run it
if __name__ == "__main__":
    process_directory("./invoices")
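process_document logs terminal failures but leaves the file in place, so the next run of process_directory would retry it forever. A minimal dead-letter move — the directory name is an arbitrary choice:

```python
import shutil
from pathlib import Path

def move_to_dead_letter(file_path: str, dead_letter_dir: str = "./dead_letter") -> Path:
    """Park a failed document for manual review so it isn't reprocessed every run."""
    dest = Path(dead_letter_dir)
    dest.mkdir(parents=True, exist_ok=True)
    target = dest / Path(file_path).name
    shutil.move(file_path, str(target))
    return target
```

Call it from process_directory whenever an outcome comes back as "failed" or "validation_error".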

For more sophisticated retry patterns with circuit breakers and queue-based backpressure, see the LLM fallback and retry logic guide — the patterns there apply cleanly to document processing pipelines.

Common Errors

Claude returns markdown fences despite instructions

This happens more often than it should. The strip logic in parse_and_validate above handles it, but you’ll occasionally get responses wrapped in ```json. If you’re seeing it frequently, add an explicit line to the system prompt: “Do not use markdown code fences. Output raw JSON only, starting with { and ending with }.” That usually kills it.
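The stripping branch is worth unit-testing on its own. A standalone version of the same logic (the chr(96) trick just avoids literal backticks inside this example):

```python
import json

FENCE = chr(96) * 3  # the three-backtick fence marker, built without literal backticks

def strip_fences(raw: str) -> str:
    """Drop a leading/trailing markdown fence if present — mirrors parse_and_validate."""
    cleaned = raw.strip()
    if cleaned.startswith(FENCE):
        cleaned = "\n".join(cleaned.split("\n")[1:-1])
    return cleaned
```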

Totals don’t match line item sums

Claude sometimes extracts a correct total but incorrect line items (especially when invoices have discounts, shipping, or multi-line taxes). Add a post-validation check: if sum(line_item.line_total) differs from subtotal by more than 0.10, flag the document for human review instead of auto-approving. Don’t silently push bad data to your accounting system.

def validate_totals(invoice: InvoiceData) -> list[str]:
    warnings = []
    if invoice.line_items and invoice.subtotal:
        computed = sum(i.line_total for i in invoice.line_items if i.line_total)
        if abs(computed - invoice.subtotal) > 0.10:
            warnings.append(f"Line item sum {computed:.2f} != subtotal {invoice.subtotal:.2f}")
    return warnings

PDF conversion fails on scanned documents

If the PDF is a scanned image rather than a native PDF, pdf2image will still convert it, but at low DPI the text becomes unreadable to Claude. Use dpi=250 minimum for scanned docs — 200 works for native PDFs but not scans. You can detect scanned PDFs by checking whether pdfplumber (not in the Step 1 install list — add it if you use this check) extracts zero text from the file, then apply the higher DPI automatically.
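The detection reduces to one pure function once the text has been pulled out; the pdfplumber call itself is shown as a comment since it requires the extra dependency:

```python
def choose_dpi(extracted_text: str, native_dpi: int = 200, scan_dpi: int = 250) -> int:
    """Empty extracted text means the PDF is likely a scan — render it at higher DPI."""
    return scan_dpi if not extracted_text.strip() else native_dpi

# Feeding it (assumes `pip install pdfplumber`):
# with pdfplumber.open(pdf_path) as pdf:
#     text = "".join((p.extract_text() or "") for p in pdf.pages)
# dpi = choose_dpi(text)
```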

What to Build Next

Add a confidence scoring layer. After extraction, send the structured data back to Claude with a second prompt: “Review this extracted data against the image. Rate your confidence 0-100 for each field and list any fields you’re uncertain about.” Documents scoring below 80 go to a human review queue; above 80 auto-approve. This hybrid approach dramatically reduces error rates without requiring humans to touch every document. Pair it with the structured output verification patterns covered in our hallucination reduction guide to make the confidence scores themselves reliable.
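The second API call is the expensive part; the routing rule itself is a few lines. A sketch with illustrative field names, using the 80 threshold from above:

```python
def route_by_confidence(field_scores: dict, threshold: int = 80) -> str:
    """Auto-approve only when every field clears the confidence threshold."""
    if any(score < threshold for score in field_scores.values()):
        return "human_review"
    return "auto_approve"
```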

You can also extend this into a full email-to-accounting pipeline — watch an inbox for attachments, run them through this extractor, and post to your accounting API automatically. That’s roughly two additional steps: an IMAP listener and attachment handler. At current Sonnet pricing (~$0.003 per document image), processing 3,000 invoices daily costs around $9/day in API fees — substantially less than any human AP clerk, with faster turnaround and a full audit trail.

Frequently Asked Questions

How accurate is Claude for invoice and receipt extraction compared to dedicated OCR tools like AWS Textract?

Claude outperforms Textract on layout-variable documents — handwritten notes, non-standard templates, mixed languages — because it understands context, not just character patterns. Textract is faster and cheaper for high-volume standard formats. For mixed document types, Claude wins on accuracy; for homogeneous, high-volume formats, Textract or Google Document AI may be more cost-effective. Most production systems combine both: OCR for pre-processing, Claude for semantic interpretation.

Can this handle invoices in multiple languages?

Yes. Claude 3.5 Sonnet handles invoices in French, German, Spanish, Japanese, Chinese, and most major languages without any prompt changes. For best results, add a "detected_language" field to your schema so you can audit the language distribution in your document set. Tax field names vary significantly by country (TVA, MwSt, GST), so Claude’s contextual understanding is a real advantage here over rule-based extraction.

What’s the cost to process 1,000 invoices per day with this setup?

At current Claude 3.5 Sonnet pricing ($3 input / $15 output per million tokens), a typical single-page invoice costs roughly $0.003–$0.006 per document depending on image size and output length. 1,000 invoices/day runs about $3–6/day in API fees. Using the Anthropic Batch API for non-urgent processing cuts that roughly in half with 24-hour turnaround. Multi-page invoices cost proportionally more per page.

How do I handle invoices where the total is spread across multiple pages?

Send all pages as separate image blocks in a single API call — the code in Step 3 already does this. Claude maintains context across all pages in the message, so it correctly aggregates totals, line items, and headers from page 1 with amounts from page 3. The only limit is the 5MB per-image size constraint; compress or downscale large scans before sending.
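A cheap pre-flight check for that constraint — the 5MB figure is the article’s; verify the current limit in Anthropic’s docs:

```python
import base64

def within_image_limit(b64_data: str, limit_bytes: int = 5 * 1024 * 1024) -> bool:
    """True if the decoded image fits under the per-image size limit."""
    return len(base64.b64decode(b64_data)) <= limit_bytes
```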

Can I use Claude Haiku instead of Sonnet to reduce costs?

You can, but I don’t recommend it for production invoice processing. Haiku misses line items on complex layouts, occasionally hallucinates tax rates, and struggles with low-quality scans. The cost difference (~10x cheaper) is real, but one incorrect invoice that causes an accounting reconciliation error costs more to fix than the savings. Use Haiku for routing/classification (is this a receipt or invoice?) and Sonnet for extraction.

How do I integrate this with QuickBooks or Xero specifically?

QuickBooks Online has a REST API (https://developer.intuit.com) with a /v3/company/{companyId}/bill endpoint for vendor bills. Xero’s API uses /api.xro/2.0/Invoices. Both require OAuth2 — use the official Python SDKs (python-quickbooks or xero-python) rather than raw HTTP calls, as they handle token refresh. Map the InvoiceData fields to the accounting system’s schema in the push_to_accounting function.


Editorial note: API pricing, model capabilities, and tool features change frequently — always verify current details on the vendor’s website before building in production. Code examples are tested at time of writing; pin your dependency versions to avoid breaking changes. Some links in this article may be affiliate links — we may earn a commission if you sign up, at no extra cost to you.

