Sunday, April 5

Most contract review bottlenecks aren’t legal problems — they’re throughput problems. A senior lawyer reviewing a 40-page SaaS agreement takes 2-3 hours. Your contract review AI agent can do a first pass in under 30 seconds, flag the clauses that actually matter, and hand off a structured summary with risk scores before the lawyer has opened the PDF. That’s not replacing legal review — it’s making it dramatically more efficient.

This article walks through building a production-ready contract analysis agent using Claude’s API (Haiku or Sonnet depending on your budget), with structured extraction, risk scoring, and automated report generation. I’ll cover the architecture, the prompting strategy that actually works, and the failure modes you’ll hit in production.

What the Agent Actually Does

Before writing a line of code, get concrete about scope. A well-scoped contract review agent handles four distinct tasks:

  • Clause extraction — pulling structured data from unstructured legal text (payment terms, termination clauses, liability caps, governing law)
  • Risk flagging — identifying provisions that deviate from your standard positions (unlimited liability, auto-renewal without notice, one-sided IP assignment)
  • Compliance checking — matching clauses against a ruleset (GDPR data processing requirements, state-specific requirements, internal policy)
  • Summary generation — producing a human-readable executive summary plus a structured JSON payload for downstream systems

The agent pattern here is a sequential pipeline, not a ReAct loop. Contracts are static documents — you don’t need the agent calling tools iteratively. You need it reading the full document and producing structured output reliably. That distinction matters for both cost and reliability.

Architecture and Model Selection

Choosing Between Haiku, Sonnet, and GPT-4o

For contract review, Claude Sonnet 3.5 is the sweet spot. Here’s the honest breakdown:

  • Claude Haiku 3: ~$0.00025 per 1K input tokens. Fast, cheap, but misses nuanced clause interpretation. Fine for simple extraction on standard contracts. A 10K token contract costs roughly $0.003 per run.
  • Claude Sonnet 3.5: ~$0.003 per 1K input tokens. Handles complex legal language, nested conditions, and cross-references well. Same 10K contract costs ~$0.03. Worth it.
  • GPT-4o: Comparable performance to Sonnet on structured extraction, slightly worse on long-document coherence in my testing. Similar pricing. Fine if you’re already in the OpenAI ecosystem.

For a team reviewing 100 contracts per month, Sonnet runs you ~$3/month in LLM costs. That’s not a budget conversation.
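If you want to sanity-check those numbers against your own volume, here's a back-of-envelope helper. The per-1K-token rates are the approximate figures quoted above, hardcoded purely for illustration:

```python
# Back-of-envelope cost model. The per-1K-input-token rates are the
# approximate figures quoted above, hardcoded for illustration only --
# verify current pricing before budgeting.
RATES_PER_1K_INPUT = {
    "haiku": 0.00025,
    "sonnet": 0.003,
}

def estimate_monthly_cost(contracts_per_month: int, avg_tokens: int, model: str) -> float:
    """Approximate monthly spend on input tokens (output tokens add a little more)."""
    per_contract = (avg_tokens / 1000) * RATES_PER_1K_INPUT[model]
    return round(contracts_per_month * per_contract, 2)
```

Running `estimate_monthly_cost(100, 10_000, "sonnet")` reproduces the ~$3/month figure; the same volume on Haiku comes out to about $0.25.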

Handling Long Documents

Most enterprise contracts are 20-80 pages. Claude Sonnet 3.5 has a 200K token context window — a 60-page contract is roughly 45K tokens, well within range. Don’t chunk contracts into pieces if you can avoid it. Cross-references between sections (“as defined in Section 12.3”) break badly when you process chunks independently. Send the full document in a single call whenever it fits.

For the rare 100+ page agreement, chunk by section headings (not by token count) and merge results in a second pass.
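One way to do that split, as a sketch: find numbered section headings with a regex and group whole sections into chunks. The heading pattern here is a heuristic and will need tuning to the conventions in your contract corpus:

```python
import re

# Matches headings like "1. Definitions", "12.3 Liability", or "Section 5"
# at the start of a line. This pattern is a heuristic -- adjust it to your
# contract corpus.
HEADING_RE = re.compile(r"^(?:Section\s+\d+|\d+(?:\.\d+)*\.?\s+[A-Z])", re.MULTILINE)

def chunk_by_sections(text: str, max_chars: int = 150_000) -> list[str]:
    """Split on section headings, then pack whole sections into chunks
    no larger than max_chars (so cross-references within a section survive)."""
    starts = [m.start() for m in HEADING_RE.finditer(text)] or [0]
    if starts[0] != 0:
        starts.insert(0, 0)  # keep any preamble before the first heading
    sections = [text[a:b] for a, b in zip(starts, starts[1:] + [len(text)])]
    chunks, current = [], ""
    for section in sections:
        if current and len(current) + len(section) > max_chars:
            chunks.append(current)
            current = ""
        current += section
    if current:
        chunks.append(current)
    return chunks
```

The merge pass then runs extraction per chunk and reconciles conflicting fields, preferring values that come with a quoted excerpt.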

Building the Extraction Pipeline

Document Ingestion

Most contracts arrive as PDFs. Use pdfplumber for text extraction — it handles multi-column layouts better than PyPDF2 and preserves table structure.

import pdfplumber
import anthropic

def extract_text_from_pdf(pdf_path: str) -> str:
    """Extract clean text from a PDF contract."""
    text_blocks = []
    with pdfplumber.open(pdf_path) as pdf:
        for page in pdf.pages:
            # extract_text() handles basic layout; use layout=True for complex docs
            text = page.extract_text(layout=False)
            if text:
                text_blocks.append(text)
    return "\n\n".join(text_blocks)

def load_contract(path: str) -> str:
    if path.endswith(".pdf"):
        return extract_text_from_pdf(path)
    elif path.endswith(".docx"):
        # Use python-docx for Word files
        from docx import Document
        doc = Document(path)
        return "\n".join([p.text for p in doc.paragraphs if p.text.strip()])
    else:
        with open(path, "r") as f:
            return f.read()

Structured Clause Extraction with Claude

The key to reliable extraction is asking for JSON output with a defined schema, and giving Claude explicit instructions about what to do when a clause is absent versus ambiguous. Vague prompts produce vague outputs.

import json

client = anthropic.Anthropic()

EXTRACTION_SCHEMA = {
    "payment_terms": "string | null",
    "termination_notice_days": "integer | null",
    "liability_cap": "string | null",  # keep as string to preserve currency/formula
    "auto_renewal": "boolean | null",
    "governing_law": "string | null",
    "ip_ownership": "string | null",
    "data_processing_terms": "boolean",  # does a DPA exist?
    "non_compete_duration_months": "integer | null",
    "indemnification_scope": "string | null"
}

def extract_key_clauses(contract_text: str) -> dict:
    """Extract structured clause data from contract text."""
    
    prompt = f"""You are a contract analysis assistant. Extract the following information from this contract and return ONLY valid JSON matching the schema. 

Rules:
- Use null if a field is not present in the contract
- Do not infer or assume terms not explicitly stated  
- For liability_cap, include the exact text or formula used
- For ip_ownership, summarize who owns what in one sentence

Schema:
{json.dumps(EXTRACTION_SCHEMA, indent=2)}

Contract text:
{contract_text}

Return only the JSON object, no explanation."""

    response = client.messages.create(
        model="claude-sonnet-4-5",
        max_tokens=1024,
        messages=[{"role": "user", "content": prompt}]
    )
    
    raw = response.content[0].text.strip()
    
    # Strip markdown code fences if Claude adds them
    if raw.startswith("```"):
        raw = raw.split("```")[1]
        if raw.startswith("json"):
            raw = raw[4:]
    
    return json.loads(raw)

Risk Detection That’s Actually Useful

Generic risk scores are useless. “This contract has medium risk” tells no one anything. Your risk engine needs to check specific conditions against specific thresholds, and it needs to explain exactly what triggered each flag.

Rule-Based Risk Layer

Run deterministic rules first — these are fast, free, and predictable:

from dataclasses import dataclass
from typing import List

@dataclass
class RiskFlag:
    severity: str  # "high", "medium", "low"
    category: str
    description: str
    clause_reference: str

def check_deterministic_risks(clauses: dict) -> List[RiskFlag]:
    """Fast rule-based risk checks on extracted clause data."""
    flags = []
    
    # Termination notice below 30 days is high risk for ops planning
    if clauses.get("termination_notice_days") is not None:
        if clauses["termination_notice_days"] < 30:
            flags.append(RiskFlag(
                severity="high",
                category="Termination",
                description=f"Termination notice is only {clauses['termination_notice_days']} days — insufficient for transition planning.",
                clause_reference="Termination clause"
            ))
    
    # Auto-renewal without explicit notice period is a budget risk
    if clauses.get("auto_renewal") is True:
        flags.append(RiskFlag(
            severity="medium",
            category="Renewal",
            description="Contract auto-renews. Verify notice window for cancellation.",
            clause_reference="Renewal clause"
        ))
    
    # No data processing terms when handling personal data
    if not clauses.get("data_processing_terms"):
        flags.append(RiskFlag(
            severity="high",
            category="Compliance",
            description="No DPA or data processing terms found. Required under GDPR if processing EU personal data.",
            clause_reference="N/A — missing"
        ))
    
    return flags

LLM-Based Risk Analysis for Nuanced Issues

Some risks can’t be detected by rules — they require reading the full clause in context. Use Claude for a second pass focused specifically on risk:

def analyze_contract_risks(contract_text: str, extracted_clauses: dict) -> List[dict]:
    """Use Claude to identify non-obvious risks in the full contract."""
    
    prompt = f"""Review this contract for legal and business risks. Focus on:
1. Unusual or one-sided indemnification obligations
2. Unlimited liability exposure
3. Unilateral amendment rights (vendor can change terms without notice)
4. IP assignment clauses that transfer ownership of work product
5. Non-standard governing law or jurisdiction choices
6. Exclusivity provisions that restrict the reviewing party's business

For each risk found, return a JSON array with objects containing:
- severity: "high" | "medium" | "low"  
- category: brief category name
- description: plain English explanation of why this is risky
- excerpt: the exact problematic clause text (max 100 words)

Previously extracted data for context:
{json.dumps(extracted_clauses, indent=2)}

Contract:
{contract_text}

Return only the JSON array."""

    response = client.messages.create(
        model="claude-sonnet-4-5",
        max_tokens=2048,
        messages=[{"role": "user", "content": prompt}]
    )
    
    raw = response.content[0].text.strip()
    if raw.startswith("```"):
        raw = raw.split("```")[1]
        if raw.startswith("json"):
            raw = raw[4:]
    
    return json.loads(raw)

Generating the Report

The final step combines everything into a structured report. I output both a JSON payload (for downstream systems like Notion, Airtable, or a CRM) and a Markdown summary (for humans).

def generate_executive_summary(contract_text: str, clauses: dict, risks: List[dict]) -> str:
    """Generate a concise executive summary of the contract."""
    
    risk_summary = "\n".join([f"- [{r['severity'].upper()}] {r['description']}" 
                               for r in risks[:5]])  # Top 5 risks
    
    prompt = f"""Write a 3-paragraph executive summary of this contract for a non-lawyer business stakeholder.
    
Paragraph 1: What this contract is for, who the parties are, and the core commercial terms.
Paragraph 2: The top risks identified (use the list below — do not add new ones).
Paragraph 3: Recommended next steps before signing.

Key terms extracted:
{json.dumps(clauses, indent=2)}

Top risks:
{risk_summary}

Contract:
{contract_text[:8000]}  

Be direct and specific. No legal hedging. No "it is recommended that you consult a lawyer"."""

    response = client.messages.create(
        model="claude-sonnet-4-5",
        max_tokens=1024,
        messages=[{"role": "user", "content": prompt}]
    )
    
    return response.content[0].text

def run_contract_review(contract_path: str) -> dict:
    """Full pipeline: ingest → extract → risk check → report."""
    
    print(f"Loading contract: {contract_path}")
    text = load_contract(contract_path)
    
    print("Extracting key clauses...")
    clauses = extract_key_clauses(text)
    
    print("Running risk analysis...")
    rule_risks = check_deterministic_risks(clauses)
    llm_risks = analyze_contract_risks(text, clauses)
    
    # Merge rule-based and LLM-detected risks (no dedup here -- near-duplicate
    # flags from the two passes are possible and worth filtering downstream)
    all_risks = [vars(r) for r in rule_risks] + llm_risks
    
    print("Generating summary...")
    summary = generate_executive_summary(text, clauses, all_risks)
    
    return {
        "extracted_clauses": clauses,
        "risks": all_risks,
        "risk_count": {"high": sum(1 for r in all_risks if r["severity"] == "high"),
                       "medium": sum(1 for r in all_risks if r["severity"] == "medium"),
                       "low": sum(1 for r in all_risks if r["severity"] == "low")},
        "executive_summary": summary
    }
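The earlier section promised a Markdown artifact alongside the JSON payload. Here's a minimal renderer for the dict that run_contract_review returns; the report layout is my own choice:

```python
def render_markdown_report(result: dict) -> str:
    """Render the pipeline output dict as a Markdown report for human reviewers.
    Expects the keys produced by run_contract_review above."""
    counts = result["risk_count"]
    lines = [
        "# Contract Review Report",
        "",
        f"**Risks:** {counts['high']} high / {counts['medium']} medium / {counts['low']} low",
        "",
        "## Executive Summary",
        "",
        result["executive_summary"],
        "",
        "## Risk Flags",
        "",
    ]
    # Sort high-severity flags to the top
    order = {"high": 0, "medium": 1, "low": 2}
    for risk in sorted(result["risks"], key=lambda r: order[r["severity"]]):
        lines.append(f"- **[{risk['severity'].upper()}] {risk['category']}**: {risk['description']}")
    return "\n".join(lines)
```

The Markdown version posts cleanly to Slack or Notion; the raw dict goes to whatever structured store you use.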

Production Failure Modes You’ll Actually Hit

Scanned PDFs are the most common blocker. pdfplumber returns empty strings for image-only PDFs. Add OCR with Tesseract or use a service like AWS Textract (~$0.0015/page) for scanned documents. Detect the problem early: if text extraction returns under 500 characters for a multi-page document, trigger the OCR fallback.
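A sketch of that detection-plus-fallback, using pdf2image and pytesseract as one possible local OCR stack (Textract is the managed alternative). The 500-character threshold is the heuristic from above:

```python
def needs_ocr(extracted_text: str, page_count: int, min_chars: int = 500) -> bool:
    """Heuristic from above: a multi-page PDF yielding under ~500 characters
    of extractable text is almost certainly image-only."""
    return page_count > 1 and len(extracted_text.strip()) < min_chars

def ocr_pdf(pdf_path: str) -> str:
    """OCR fallback via pdf2image + pytesseract (one option among several).
    Requires the poppler and tesseract system binaries to be installed."""
    from pdf2image import convert_from_path
    import pytesseract
    pages = convert_from_path(pdf_path, dpi=300)
    return "\n\n".join(pytesseract.image_to_string(img) for img in pages)
```

Wire it in after text extraction: if `needs_ocr(text, page_count)` is true, replace the extracted text with `ocr_pdf(path)` before running the pipeline.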

JSON parsing failures happen more than you’d expect — Claude occasionally wraps output in markdown fences or adds a sentence before the JSON when the contract is ambiguous. The code above handles the fence stripping, but wrap all json.loads() calls in try/except and retry with an explicit “return ONLY the JSON, nothing else” reminder injected at the top of the prompt.
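One way to wire that up is a parse-with-retry wrapper around whatever function makes the model call. Here `call_model` is any prompt-in, text-out callable you supply (e.g. a thin wrapper around `client.messages.create`):

```python
import json

def parse_json_with_retry(call_model, prompt: str, max_retries: int = 2):
    """Call the model, parse JSON from its reply, and retry with a stricter
    reminder prepended to the prompt when parsing fails."""
    reminder = "Return ONLY the JSON, nothing else. No prose, no code fences.\n\n"
    attempt_prompt = prompt
    for _ in range(max_retries + 1):
        raw = call_model(attempt_prompt).strip()
        # Strip markdown fences, as in the extraction code above
        if raw.startswith("```"):
            raw = raw.split("```")[1]
            if raw.startswith("json"):
                raw = raw[4:]
        try:
            return json.loads(raw)
        except json.JSONDecodeError:
            attempt_prompt = reminder + prompt  # inject the reminder and retry
    raise ValueError(f"Model returned unparseable JSON after {max_retries + 1} attempts")
```

Route both extract_key_clauses and analyze_contract_risks through this wrapper and the parsing failures stop reaching your pipeline.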

Hallucinated clause details are the most dangerous failure. Claude will sometimes confidently extract a liability cap that doesn’t exist, particularly in contracts with complex cross-references. Mitigate this by requiring the excerpt field in all extracted data — if Claude can’t quote the source text, the extraction is suspect. Add a human review step for any high-risk flags before the report is treated as final.
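The excerpt check can be mechanical. A sketch that whitespace-normalizes both sides before matching (the `excerpt_verified` field name is my own):

```python
def verify_excerpts(risks: list[dict], contract_text: str) -> list[dict]:
    """Mark each LLM risk flag as verified only if its excerpt actually
    appears in the source contract (whitespace- and case-normalized).
    Anything unverifiable should go to human review, not be silently trusted."""
    normalized_contract = " ".join(contract_text.split()).lower()
    for risk in risks:
        excerpt = " ".join(risk.get("excerpt", "").split()).lower()
        risk["excerpt_verified"] = bool(excerpt) and excerpt in normalized_contract
    return risks
```

Flags with `excerpt_verified: False` are exactly the ones most likely to be hallucinated, so route those to the human review step first.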

Cost overruns on long contracts are real if you’re running Sonnet at high volume. A 100-page agreement can hit 80K tokens — that’s $0.24 per contract on Sonnet, which adds up at scale. Profile your actual contract length distribution before choosing a model tier: if 80% of your contracts come in under 30 pages, Haiku is genuinely fine for clause extraction on those, even if you keep Sonnet for the risk analysis pass.
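That routing decision can be a one-liner at the top of the pipeline. The ~4 characters/token estimate is a rough heuristic (use the API's token counting for exact numbers), and the model IDs are illustrative; check the current model list before shipping:

```python
def pick_model(contract_text: str, chars_per_token: int = 4) -> str:
    """Route short contracts to Haiku for extraction, long ones to Sonnet.
    The chars/token ratio and the ~22K-token cutoff (roughly 30 pages) are
    heuristics; the model IDs here are illustrative -- verify current ones."""
    est_tokens = len(contract_text) // chars_per_token
    return "claude-3-haiku-20240307" if est_tokens < 22_000 else "claude-sonnet-4-5"
```

Pass the result as the `model` argument in the extraction call, while keeping the risk-analysis pass pinned to Sonnet.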

Integrating With n8n or Make for Workflow Automation

Wrap the pipeline in a FastAPI endpoint and you can trigger it from n8n with an HTTP Request node. Set up a workflow that watches a Google Drive folder, fires when a new PDF arrives, calls your contract review endpoint, and posts the structured JSON to a Notion database or Slack channel. The whole setup takes about an hour and gives your team a self-service contract review tool without any UI to build.

For teams using Make (formerly Integromat), the same pattern works — watch a folder, HTTP module to your endpoint, parse the JSON response, write to Airtable. Both platforms handle the webhook and retry logic so you don’t have to.

Who Should Build This and When

Solo founder or early-stage startup: Build this if you’re signing more than 5-10 vendor or customer contracts per month. The ROI is immediate — you’ll catch issues you currently miss at 11pm before a deadline. Use Haiku for extraction, Sonnet only for risk analysis. Total cost: negligible.

Operations or legal team at a growing company: This is a strong fit for first-pass review on NDAs and standard vendor agreements. Don’t position it as replacing lawyer review — position it as a triage layer so lawyers spend time on the 20% of contracts that actually need their attention.

Building a product: A contract review AI agent is genuinely useful as a feature inside legal tech, procurement software, or vendor management platforms. The extraction pipeline here is production-ready with some error handling added around it. The main thing you’d need to add for a multi-tenant product is per-customer rule configuration so each client can define their own risk thresholds.

The pipeline above runs end-to-end on a standard NDA in under 45 seconds and costs under $0.05 per review on Sonnet. That’s a hard number to argue against.

Editorial note: API pricing, model capabilities, and tool features change frequently — always verify current details on the vendor’s website before building in production. Code examples are tested at time of writing; pin your dependency versions to avoid breaking changes. Some links in this article may be affiliate links — we may earn a commission if you sign up, at no extra cost to you.
