Sunday, April 5

By the end of this tutorial, you’ll have a working Claude agent email automation system that pulls emails via IMAP, classifies and triages them using Claude, stores persistent context in SQLite, and fires auto-replies for specific categories — all running as a webhook-triggered service you can deploy on any Python host.

This isn’t a toy demo. It’s the architecture I’d actually ship for a founder’s inbox or a support queue handling 200–500 emails per day. We’ll use Claude Haiku for classification (fast, cheap — roughly $0.0003 per email at current pricing) and Claude Sonnet for drafting replies that require nuance.

  1. Install dependencies — set up the Python environment with the Anthropic SDK and FastAPI (imaplib and sqlite3 ship with Python's standard library)
  2. Configure IMAP + Gmail OAuth — connect to your inbox without storing plaintext credentials
  3. Build the triage classifier — prompt Claude to return structured JSON categories
  4. Add persistent memory — track sender history in SQLite to personalize responses
  5. Wire up auto-reply logic — conditional reply drafting with Sonnet for high-priority threads
  6. Set up the webhook trigger — expose a FastAPI endpoint so n8n or Make can fire it on schedule

Step 1: Install Dependencies

You need Python 3.11+. The core stack is intentionally minimal — no LangChain, no agent frameworks. Just the Anthropic SDK, IMAP handling, and a lightweight web layer.

pip install anthropic fastapi uvicorn python-dotenv google-auth google-auth-oauthlib

Also create a .env file with:

ANTHROPIC_API_KEY=sk-ant-...
GMAIL_ADDRESS=you@yourdomain.com
GMAIL_OAUTH_TOKEN_PATH=./token.json
GMAIL_CLIENT_SECRET_PATH=./client_secret.json
DB_PATH=./email_agent.db
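These five values are easy to fat-finger, so it's worth failing fast on missing configuration before touching IMAP or the API. A minimal sketch (the names match the .env above):

```python
import os

REQUIRED_VARS = [
    "ANTHROPIC_API_KEY", "GMAIL_ADDRESS", "GMAIL_OAUTH_TOKEN_PATH",
    "GMAIL_CLIENT_SECRET_PATH", "DB_PATH",
]

def missing_config(env=os.environ) -> list:
    """Return the names of required settings that are unset or empty."""
    return [name for name in REQUIRED_VARS if not env.get(name)]
```

Call it right after load_dotenv() and raise if the returned list is non-empty.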

Step 2: Configure IMAP Access

Google shut off basic password authentication for Gmail IMAP in 2022, so access now goes through OAuth2 (app passwords on accounts with 2-Step Verification are the one exception). Use Google's OAuth2 flow to get a token once, then refresh it automatically.

import imaplib
import base64
from google.oauth2.credentials import Credentials
from google.auth.transport.requests import Request
from google_auth_oauthlib.flow import InstalledAppFlow
import os

SCOPES = ["https://mail.google.com/"]

def get_credentials(token_path: str, secret_path: str) -> Credentials:
    creds = None
    if os.path.exists(token_path):
        creds = Credentials.from_authorized_user_file(token_path, SCOPES)
    if not creds or not creds.valid:
        if creds and creds.expired and creds.refresh_token:
            creds.refresh(Request())
        else:
            flow = InstalledAppFlow.from_client_secrets_file(secret_path, SCOPES)
            creds = flow.run_local_server(port=0)
        with open(token_path, "w") as f:
            f.write(creds.to_json())
    return creds

def connect_imap(email_address: str, creds: Credentials) -> imaplib.IMAP4_SSL:
    mail = imaplib.IMAP4_SSL("imap.gmail.com")
    # Build the XOAUTH2 string. imaplib base64-encodes the authobject's
    # return value itself, so pass the raw bytes -- do not pre-encode them.
    auth_string = f"user={email_address}\x01auth=Bearer {creds.token}\x01\x01"
    mail.authenticate("XOAUTH2", lambda _: auth_string.encode())
    return mail

For non-Gmail IMAP (Fastmail, custom domain, etc.), you can use app passwords and skip OAuth entirely: just replace the authenticate call with mail.login(user, password). Don't try this with a plain Gmail account password, though; Google blocks basic auth, so the login will fail or get the account flagged. (Gmail app passwords still work on accounts with 2-Step Verification, but OAuth is the supported path.)
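As a concrete sketch of that app-password variant (the Fastmail hostname is the usual one, but check your provider's docs):

```python
import imaplib

def connect_imap_password(host: str, user: str, app_password: str) -> imaplib.IMAP4_SSL:
    """Plain LOGIN auth for providers that accept app passwords (e.g. Fastmail)."""
    mail = imaplib.IMAP4_SSL(host)  # e.g. "imap.fastmail.com"
    mail.login(user, app_password)
    return mail
```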

Step 3: Build the Triage Classifier

This is the core of your Claude agent email automation pipeline. We want structured output every time — no free-text responses that need parsing. Claude Haiku is the right model here: it handles classification tasks reliably and at a fraction of Sonnet’s cost.

import anthropic
import json
from dataclasses import dataclass

client = anthropic.Anthropic()

TRIAGE_SYSTEM = """You are an email triage agent. Classify each email and return ONLY valid JSON.

Categories:
- "urgent_action": Requires response within 2 hours (contract, payment, emergency)
- "reply_needed": Needs a response but not urgent
- "newsletter": Marketing, newsletters, subscriptions
- "notification": Automated system alerts, receipts
- "spam": Unwanted solicitation
- "internal": From known team members

Return format:
{"category": "...", "priority": 1-5, "summary": "one sentence", "suggested_reply": true/false}"""

@dataclass
class TriageResult:
    category: str
    priority: int
    summary: str
    suggested_reply: bool

def classify_email(subject: str, sender: str, body_snippet: str) -> TriageResult:
    prompt = f"From: {sender}\nSubject: {subject}\nBody preview: {body_snippet[:500]}"
    
    response = client.messages.create(
        model="claude-haiku-4-5",  # Fast + cheap for classification
        max_tokens=256,
        system=TRIAGE_SYSTEM,
        messages=[{"role": "user", "content": prompt}]
    )
    
    raw = response.content[0].text.strip()
    # Strip markdown code fences if Claude adds them despite instructions
    if raw.startswith("```"):
        raw = raw.split("```")[1]
        if raw.startswith("json"):
            raw = raw[4:]
    
    data = json.loads(raw)
    return TriageResult(**data)

I use max_tokens=256 deliberately — the classification JSON is small, and capping tokens prevents runaway charges if something goes weird. For getting consistent JSON from Claude without hallucinations, check out our guide on structured output mastery for Claude agents.
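If you'd rather keep the fence-stripping and validation in one testable helper, here's an optional refactor; it repeats the TriageResult dataclass and category list so the snippet stands alone, rejects categories outside the prompt's vocabulary, and ignores any extra keys instead of crashing on TriageResult(**data):

```python
import json
from dataclasses import dataclass

VALID_CATEGORIES = {"urgent_action", "reply_needed", "newsletter",
                    "notification", "spam", "internal"}

@dataclass
class TriageResult:
    category: str
    priority: int
    summary: str
    suggested_reply: bool

def parse_triage_json(raw: str) -> TriageResult:
    """Parse Claude's classification output, tolerating markdown fences."""
    raw = raw.strip()
    if raw.startswith("```"):
        raw = raw.split("```")[1]
        if raw.startswith("json"):
            raw = raw[4:]
    data = json.loads(raw)
    if data.get("category") not in VALID_CATEGORIES:
        raise ValueError(f"unexpected category: {data.get('category')!r}")
    # Keep only the fields we model, dropping anything extra Claude adds
    return TriageResult(**{k: data[k] for k in
                           ("category", "priority", "summary", "suggested_reply")})
```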

Step 4: Add Persistent Memory with SQLite

Stateless classification misses a critical signal: sender history. A first-time email from an unknown address is different from the same message from a customer who’s paid you $50k. SQLite is the right call here — no infrastructure overhead, zero cost, works fine up to millions of rows.

import sqlite3
from datetime import datetime

def init_db(db_path: str):
    conn = sqlite3.connect(db_path)
    conn.execute("""
        CREATE TABLE IF NOT EXISTS email_log (
            id INTEGER PRIMARY KEY AUTOINCREMENT,
            message_id TEXT UNIQUE,
            sender TEXT,
            subject TEXT,
            category TEXT,
            priority INTEGER,
            processed_at TEXT,
            reply_sent INTEGER DEFAULT 0
        )
    """)
    conn.execute("""
        CREATE TABLE IF NOT EXISTS sender_profile (
            email TEXT PRIMARY KEY,
            first_seen TEXT,
            email_count INTEGER DEFAULT 0,
            avg_priority REAL DEFAULT 3.0,
            last_category TEXT
        )
    """)
    conn.commit()
    return conn

def get_sender_context(conn: sqlite3.Connection, sender_email: str) -> dict:
    row = conn.execute(
        "SELECT * FROM sender_profile WHERE email = ?", (sender_email,)
    ).fetchone()
    if not row:
        return {"is_known": False, "email_count": 0, "avg_priority": 3.0}
    return {
        "is_known": True,
        "email_count": row[2],
        "avg_priority": row[3],
        "last_category": row[4]
    }

def update_sender_profile(conn: sqlite3.Connection, sender: str, result: TriageResult):
    existing = conn.execute(
        "SELECT email_count, avg_priority FROM sender_profile WHERE email = ?", (sender,)
    ).fetchone()
    
    if existing:
        new_count = existing[0] + 1
        new_avg = (existing[1] * existing[0] + result.priority) / new_count
        conn.execute(
            "UPDATE sender_profile SET email_count=?, avg_priority=?, last_category=? WHERE email=?",
            (new_count, new_avg, result.category, sender)
        )
    else:
        conn.execute(
            "INSERT INTO sender_profile VALUES (?,?,?,?,?)",
            (sender, datetime.utcnow().isoformat(), 1, result.priority, result.category)
        )
    conn.commit()
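The avg_priority update in update_sender_profile is the standard incremental mean; pulled out as a standalone function it's easier to see (and test):

```python
def incremental_avg(old_avg: float, old_count: int, new_value: float) -> float:
    """Mean after folding one more observation into a running average."""
    return (old_avg * old_count + new_value) / (old_count + 1)
```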

Step 5: Wire Up Auto-Reply Logic

Only fire Sonnet for emails flagged as suggested_reply: true with a priority of 3+. Everything else either gets a templated response or nothing. This keeps costs predictable — Sonnet at roughly $0.003 per 1K input tokens adds up fast if you run it on every email.

REPLY_SYSTEM = """You are a professional email assistant drafting replies on behalf of the inbox owner.
Be concise and helpful. Match the tone of the original email.
Do not fabricate commitments, pricing, or deadlines. Flag anything requiring human judgment.
Sign off as: "Best, [Name] (AI assistant — replies reviewed by the team)"."""

def draft_reply(subject: str, body: str, sender_context: dict) -> str:
    context_note = ""
    if sender_context["is_known"] and sender_context["email_count"] > 5:
        context_note = f"Note: This sender has emailed {sender_context['email_count']} times before."
    
    response = client.messages.create(
        model="claude-sonnet-4-5",  # Better reasoning for reply drafting
        max_tokens=512,
        system=REPLY_SYSTEM,
        messages=[{
            "role": "user",
            "content": f"{context_note}\n\nOriginal email:\nSubject: {subject}\n\n{body[:1500]}"
        }]
    )
    return response.content[0].text

def should_auto_reply(result: TriageResult, sender_context: dict) -> bool:
    # Never auto-reply to newsletters, notifications, or spam
    if result.category in ("newsletter", "notification", "spam"):
        return False
    # Always draft for urgent items regardless of sender history
    if result.category == "urgent_action":
        return True
    # For regular replies, require some sender history to reduce false positives
    return result.suggested_reply and sender_context.get("email_count", 0) > 0

If you’re building anything beyond a solo inbox — think support queue or sales triage — this is where you’d branch into a multi-agent pattern. Our article on building multi-agent teams with Claude covers how to hand off drafts to a review agent before sending.

Step 6: Set Up the Webhook Trigger

Wrap the whole pipeline in a FastAPI endpoint. You can call this from n8n on a schedule, from Make via HTTP module, or from a simple cron job. This is intentionally thin — the agent logic lives in the functions above, not in the route handler.

from fastapi import FastAPI, HTTPException
from pydantic import BaseModel
import os
from dotenv import load_dotenv

load_dotenv()
app = FastAPI()
conn = init_db(os.getenv("DB_PATH", "./email_agent.db"))

class TriageRequest(BaseModel):
    max_emails: int = 20  # Safety cap per run
    reply_dry_run: bool = True  # Set False only in production

@app.post("/triage")
async def run_triage(req: TriageRequest):
    creds = get_credentials(
        os.getenv("GMAIL_OAUTH_TOKEN_PATH"),
        os.getenv("GMAIL_CLIENT_SECRET_PATH")
    )
    mail = connect_imap(os.getenv("GMAIL_ADDRESS"), creds)
    mail.select("INBOX")
    
    import email, email.utils  # local import keeps this handler self-contained

    _, message_ids = mail.search(None, "UNSEEN")
    ids = message_ids[0].split()[-req.max_emails:]  # Cap at the N most recent

    results = []
    for mid in ids:
        _, data = mail.fetch(mid, "(RFC822)")
        msg = email.message_from_bytes(data[0][1])
        subject = msg.get("Subject", "")
        sender = email.utils.parseaddr(msg.get("From", ""))[1]
        # Use the first text/plain part, falling back to the top-level payload
        part = next((p for p in msg.walk() if p.get_content_type() == "text/plain"), msg)
        body = (part.get_payload(decode=True) or b"").decode(errors="replace")
        body_snippet = body[:500]

        sender_ctx = get_sender_context(conn, sender)
        triage = classify_email(subject, sender, body_snippet)
        update_sender_profile(conn, sender, triage)
        
        reply_drafted = None
        if should_auto_reply(triage, sender_ctx) and not req.reply_dry_run:
            reply_drafted = draft_reply(subject, body, sender_ctx)
            # smtp send logic goes here
        
        results.append({
            "subject": subject,
            "category": triage.category,
            "priority": triage.priority,
            "reply_drafted": reply_drafted is not None
        })
    
    mail.logout()
    return {"processed": len(results), "results": results}

For the webhook architecture connecting this to n8n or Make, see our deep dive on webhook triggers and event-driven Claude agents — it covers retry logic, auth headers, and idempotency patterns you’ll want before going to production.

Common Errors

IMAP authentication failing with “Invalid credentials”

Nine times out of ten this is a stale OAuth token or a scope mismatch. Delete token.json and rerun the auth flow. Make sure your Google Cloud project has “Gmail API” enabled AND the IMAP access toggle is on in Gmail settings (Settings → Forwarding and POP/IMAP → Enable IMAP). Two separate switches that both need to be on.

Claude returning malformed JSON despite the system prompt

Haiku occasionally wraps output in markdown fences when the input looks “code-like” to it. The strip logic in Step 3 handles this, but if you’re still seeing failures, switch to using tool_use with a defined schema instead of asking for raw JSON. Tool use is more reliable than “return JSON” prompts — it forces structured output at the API level.
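A sketch of that tool_use approach, matching the Step 3 categories. The tool name record_triage is arbitrary; forcing it via tool_choice makes Claude emit input conforming to the schema on every call:

```python
TRIAGE_TOOL = {
    "name": "record_triage",
    "description": "Record the triage decision for one email.",
    "input_schema": {
        "type": "object",
        "properties": {
            "category": {"type": "string",
                         "enum": ["urgent_action", "reply_needed", "newsletter",
                                  "notification", "spam", "internal"]},
            "priority": {"type": "integer", "minimum": 1, "maximum": 5},
            "summary": {"type": "string"},
            "suggested_reply": {"type": "boolean"},
        },
        "required": ["category", "priority", "summary", "suggested_reply"],
    },
}

def classify_with_tool(client, prompt: str) -> dict:
    """client is the anthropic.Anthropic instance from Step 3."""
    response = client.messages.create(
        model="claude-haiku-4-5",
        max_tokens=256,
        tools=[TRIAGE_TOOL],
        tool_choice={"type": "tool", "name": "record_triage"},
        messages=[{"role": "user", "content": prompt}],
    )
    return response.content[0].input  # the tool call's input, already a dict
```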

Processing the same email twice on consecutive runs

You’re likely not marking emails as read after processing, or you’re relying on IMAP sequence numbers, which shift whenever the mailbox changes (UIDs are the stable identifier). Fix: after fetching, call mail.store(mid, '+FLAGS', '\\Seen'), or switch to the UID variants (mail.uid('SEARCH', ...) and mail.uid('FETCH', ...)). Also log the Message-ID header (not the IMAP sequence number) in your SQLite table and check for duplicates before processing.
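Using the email_log table from Step 4, a Message-ID dedup check might look like this; run it before classify_email and skip the message when it returns True:

```python
import sqlite3

def already_processed(conn: sqlite3.Connection, message_id: str) -> bool:
    """True if this Message-ID header is already in email_log (Step 4 schema)."""
    row = conn.execute(
        "SELECT 1 FROM email_log WHERE message_id = ?", (message_id,)
    ).fetchone()
    return row is not None
```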

What to Build Next

Add a VIP sender list with escalation routing. Extend sender_profile with a vip_tier column (0/1/2). When a VIP sender emails and gets urgent_action, push a Slack notification via webhook instead of just drafting a reply. Combine this with the priority score to create a proper SLA — e.g., tier-2 VIPs always get a Slack ping within 5 minutes regardless of email category. This turns your triage agent into a full inbox-to-Slack escalation system without adding meaningful complexity. If you want to extend this to outbound prospecting too, our tutorial on building an AI lead generation email agent covers the outbound side of the same architecture.
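As a sketch of that routing decision (vip_tier is the hypothetical column described above, and the thresholds are illustrative):

```python
def escalation_channel(category: str, vip_tier: int) -> str:
    """Pick a notification channel: 'slack' for escalations, 'email' otherwise."""
    if vip_tier >= 2:
        return "slack"  # tier-2 VIPs always get a Slack ping
    if vip_tier == 1 and category == "urgent_action":
        return "slack"  # tier-1 VIPs escalate only on urgent mail
    return "email"      # everyone else follows normal triage
```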

Bottom Line: When to Use This Setup

Solo founders handling a mixed inbox of customer questions, investor emails, and newsletters: run this on a low-cost Railway instance with dry_run=True for a week, review the classifications, then enable live replies gradually, starting with low-stakes reply_needed threads (the should_auto_reply() logic already skips newsletters and notifications). Risk is low, time savings are immediate.

Small teams with a shared support inbox: the SQLite approach won’t scale past ~3 concurrent workers — migrate to Postgres (same schema, swap the driver) and add a processing lock on message_id before parallelizing. The Claude agent email automation logic stays identical.

Enterprise / high volume: don’t run IMAP polling — use Gmail Push Notifications via Pub/Sub instead. This webhook-based approach cuts latency from minutes to seconds and removes the polling overhead entirely. You’ll also want to look at LLM caching strategies once you’re past a few thousand emails per day — system prompt caching alone cuts Haiku classification costs by ~40%.
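Prompt caching is enabled per request by tagging the static system prompt with a cache_control block; a sketch of what the Step 3 call would look like (actual savings depend on your prompt length and request rate, so measure before relying on the ~40% figure):

```python
TRIAGE_SYSTEM = "You are an email triage agent..."  # the full Step 3 prompt

CACHED_TRIAGE_KWARGS = {
    "model": "claude-haiku-4-5",
    "max_tokens": 256,
    # A list-of-blocks system prompt lets you mark the static prefix cacheable
    "system": [{
        "type": "text",
        "text": TRIAGE_SYSTEM,
        "cache_control": {"type": "ephemeral"},
    }],
}
# Then: client.messages.create(**CACHED_TRIAGE_KWARGS, messages=[...])
```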

Frequently Asked Questions

How much does it cost to run a Claude email triage agent on 500 emails per day?

Using Claude Haiku for classification (avg ~300 tokens per email at roughly $0.0003 each), you’re looking at about $0.15/day, or $4.50/month, for the classification layer. If 20% of emails trigger a Sonnet draft reply (~800 tokens each), add another ~$0.24/day. Total budget for 500 emails/day: under $15/month at current pricing. Always verify current token rates at anthropic.com before budgeting.
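The Sonnet figure is just arithmetic; as a sanity-check helper with the FAQ's assumptions baked in as defaults (token counts and pricing are estimates, so verify current rates before budgeting):

```python
def daily_sonnet_cost(emails_per_day: int, reply_rate: float,
                      tokens_per_reply: int = 800,
                      price_per_1k_tokens: float = 0.003) -> float:
    """Rough daily Sonnet spend on drafted replies, in dollars."""
    drafts = emails_per_day * reply_rate
    return drafts * tokens_per_reply / 1000 * price_per_1k_tokens
```

daily_sonnet_cost(500, 0.2) reproduces the ~$0.24/day quoted above.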

Can I use this with Outlook or Microsoft 365 instead of Gmail?

Yes. Microsoft 365 supports IMAP with OAuth2 via the Microsoft Identity Platform. The IMAP connection and auth string format differ slightly — you’ll use imaplib.IMAP4_SSL("outlook.office365.com") and generate the XOAUTH2 token via the MSAL library instead of google-auth. The Claude triage logic is identical once you have the raw email content.

How do I prevent the agent from auto-replying to something sensitive or incorrect?

Keep reply_dry_run=True for at least the first two weeks and review the drafted replies in your database before enabling live sends. Add a hard blocklist of sender domains (e.g., legal firms, investors) that always require human approval regardless of category. The should_auto_reply() function is the right place to add these rules — keep the decision logic in code, not in the prompt.
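A sketch of that blocklist check (the domains are placeholders; call it at the top of should_auto_reply() and return False on a match):

```python
BLOCKED_DOMAINS = {"example-legal.com", "example-fund.com"}  # placeholder entries

def requires_human(sender_email: str, blocked=BLOCKED_DOMAINS) -> bool:
    """True when the sender's domain must always get human review."""
    domain = sender_email.rsplit("@", 1)[-1].lower()
    return domain in blocked
```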

What’s the difference between using this approach vs n8n’s native email + AI nodes?

n8n’s email triage workflow is faster to set up but gives you less control over the classification logic, sender memory, and reply conditions. This Python-based approach is better when you need custom business logic, persistent state, or cost optimization across model tiers. If you want n8n handling the orchestration with this agent as a backend service, that’s a solid hybrid — the FastAPI webhook endpoint is designed exactly for that.

How do I handle emails with attachments or long threads?

For attachments, extract text content only (PDFs via pdfplumber, DOCX via python-docx) and truncate to 2,000 tokens before sending to Claude. For long threads, send only the most recent 3 messages — Claude doesn’t need the full history for triage, and token costs scale linearly with thread length. Flag emails with attachments in the database so a human reviewer knows to check the originals.
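The 2,000-token cap can be approximated without a tokenizer; ~4 characters per token is a common rule of thumb for English text (an assumption, not an exact count):

```python
def truncate_to_tokens(text: str, max_tokens: int, chars_per_token: int = 4) -> str:
    """Crude token cap using an average characters-per-token ratio."""
    limit = max_tokens * chars_per_token
    return text if len(text) <= limit else text[:limit]
```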

Put this into practice

Browse our directory of Claude Code agents — ready-to-use agents for development, automation, and data workflows.

Browse Agents →

Editorial note: API pricing, model capabilities, and tool features change frequently — always verify current details on the vendor’s website before building in production. Code examples are tested at time of writing; pin your dependency versions to avoid breaking changes. Some links in this article may be affiliate links — we may earn a commission if you sign up, at no extra cost to you.

