Sunday, April 5

By the end of this tutorial, you’ll have a working AI lead scoring automation agent that pulls prospect signals from email, LinkedIn data, and your CRM, sends them through Claude for analysis, and writes a structured score and reasoning back to your CRM — no human review required in the happy path.

Most lead scoring implementations are either rule-based (fragile, requires constant tuning) or require expensive enterprise software. What you’re building here sits in the middle: a Python agent using Claude’s API that scores a lead in under 3 seconds for roughly $0.003 per prospect at current Claude Haiku pricing. It runs on any trigger — webhook, cron, or CRM automation.

  1. Install dependencies — Set up the Python environment with Anthropic, HubSpot, and supporting libraries
  2. Define the scoring schema — Build a Pydantic model for structured score output
  3. Write the Claude scoring prompt — Craft a system prompt that produces consistent, calibrated scores
  4. Build the signal collector — Pull prospect data from CRM fields and normalize it
  5. Run the scoring agent — Chain the collector → Claude → CRM write-back
  6. Add webhook trigger — Fire the agent on new lead creation in HubSpot

Step 1: Install Dependencies

You need six packages: anthropic for the API, hubspot-api-client for CRM read/write, pydantic for output validation, python-dotenv for loading secrets locally, and fastapi plus uvicorn if you’re adding the webhook trigger in Step 6.

pip install anthropic==0.25.0 hubspot-api-client==8.2.1 pydantic==2.6.4 fastapi==0.110.0 uvicorn==0.29.0 python-dotenv==1.0.1

Pin those versions. HubSpot’s client library has had breaking changes between minor releases, and the Anthropic SDK changed its response structure in 0.20.0. If you’re deploying this as a long-running service, check out our comparison of serverless platforms for Claude agents — Beam is my preference for stateless scoring workloads because of its per-second billing and cold start performance.

Step 2: Define the Scoring Schema

Getting consistent structured output from Claude is non-negotiable for CRM write-back. A score that sometimes comes back as {"score": 82} and sometimes as {"lead_score": "high"} will break your downstream automation. Use Pydantic to validate before you write anything to your CRM.

from pydantic import BaseModel, Field
from typing import Literal

class LeadScore(BaseModel):
    score: int = Field(ge=0, le=100, description="Composite fit score 0-100")
    tier: Literal["hot", "warm", "cold", "disqualified"]
    fit_score: int = Field(ge=0, le=100, description="ICP fit: company size, industry, role")
    intent_score: int = Field(ge=0, le=100, description="Buying intent signals")
    confidence: Literal["high", "medium", "low"]
    reasoning: str = Field(max_length=500, description="1-3 sentence explanation for sales rep")
    next_action: Literal["call_now", "nurture_sequence", "send_case_study", "disqualify"]
    missing_signals: list[str] = Field(default_factory=list, description="Data gaps that limited scoring")

The missing_signals field is one I added after the first production run. Claude will tell you when it doesn’t have enough data to score confidently — which is actually more useful than a confident wrong score. If you want to go deeper on reliable JSON output, the structured output guide for Claude covers prefill tricks and tool-use forcing that eliminate most hallucination edge cases.
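
A quick way to see the schema doing its job is to feed it boundary values (a standalone sketch using a trimmed-down version of the model above):

```python
from typing import Literal

from pydantic import BaseModel, Field, ValidationError

# Trimmed-down version of the LeadScore model, for illustration only
class MiniScore(BaseModel):
    score: int = Field(ge=0, le=100)
    tier: Literal["hot", "warm", "cold", "disqualified"]

ok = MiniScore(score=82, tier="hot")  # a valid payload passes

try:
    MiniScore(score=140, tier="hot")  # an out-of-range score never reaches the CRM
    rejected = False
except ValidationError:
    rejected = True
```

Validation failures raise before the CRM write, which is exactly the failure mode you want: noisy and early, not silent and downstream.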

Step 3: Write the Claude Scoring Prompt

The system prompt does the heavy lifting. The key insight here: don’t ask Claude to “score this lead.” Ask it to reason through ICP fit and intent separately, then derive the composite. This matches how your best sales reps actually think and produces scores that hold up to scrutiny.

SYSTEM_PROMPT = """You are a B2B lead qualification specialist with deep expertise in SaaS sales.
Your job is to score inbound leads against our Ideal Customer Profile (ICP).

## Our ICP
- Company size: 50-500 employees
- Industry: SaaS, fintech, or professional services
- Role: VP/Director/Head of Engineering, Product, or Operations
- Budget signals: Series A or later, or profitable SMB
- Pain signals: mentions of manual processes, scaling issues, or team growth

## Scoring Rules
1. Score fit (0-100) based purely on firmographic and role match
2. Score intent (0-100) based on behavioral signals, message content, urgency language
3. Composite score = (fit_score * 0.6) + (intent_score * 0.4)
4. If fit_score < 30, tier = "disqualified" regardless of intent
5. Composite >= 75: hot | 50-74: warm | 30-49: cold | <30: disqualified
6. Set confidence="low" if you have fewer than 3 reliable signals

## Output
Return ONLY valid JSON matching the LeadScore schema. No preamble, no markdown fences."""

Notice the explicit weighting (60/40 fit/intent) — without this, Claude will weight them roughly equally, which oversells high-intent leads who are completely outside your ICP. Adjust those weights for your actual sales motion.
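
To sanity-check how the weighting and the fit gate interact before wiring up the API, the same rules can be mirrored in plain Python (a local sketch, not part of the agent):

```python
def composite(fit_score: int, intent_score: int) -> int:
    # Rule 3: 60/40 weighting between fit and intent
    return round(fit_score * 0.6 + intent_score * 0.4)

def tier_for(fit_score: int, score: int) -> str:
    # Rule 4: low fit disqualifies regardless of intent
    if fit_score < 30:
        return "disqualified"
    # Rule 5: tier bands on the composite score
    if score >= 75:
        return "hot"
    if score >= 50:
        return "warm"
    if score >= 30:
        return "cold"
    return "disqualified"

# High intent can't rescue a lead far outside the ICP
assert tier_for(20, composite(20, 95)) == "disqualified"
# Strong fit with modest intent still lands as warm
assert tier_for(90, composite(90, 40)) == "warm"
```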

Step 4: Build the Signal Collector

This is where most tutorials skip the important part. Raw CRM data is messy — null fields, inconsistent formats, free-text notes. You need to normalize before passing to Claude.

import os
from hubspot import HubSpot
from hubspot.crm.contacts import ApiException

client = HubSpot(access_token=os.environ["HUBSPOT_ACCESS_TOKEN"])

def collect_prospect_signals(contact_id: str) -> dict:
    """Pull and normalize prospect data from HubSpot."""
    
    properties = [
        "firstname", "lastname", "email", "jobtitle", "company",
        "num_employees", "industry", "hs_lead_status", "lifecyclestage",
        "hs_email_last_email_name", "recent_conversion_event_name",
        "message", "linkedin_bio", "annualrevenue", "founded_year"
    ]
    
    try:
        contact = client.crm.contacts.basic_api.get_by_id(
            contact_id, 
            properties=properties
        )
    except ApiException as e:
        raise ValueError(f"Failed to fetch contact {contact_id}: {e}")
    
    props = contact.properties
    
    # Normalize — HubSpot returns everything as strings or None. Unset
    # properties come back as None values, which .get defaults don't catch,
    # so use `or` fallbacks throughout.
    return {
        "name": f"{props.get('firstname') or ''} {props.get('lastname') or ''}".strip(),
        "email": props.get("email") or "",
        "title": props.get("jobtitle") or "Unknown role",
        "company": props.get("company") or "Unknown company",
        "employee_count": int(props.get("num_employees") or 0),
        "industry": props.get("industry") or "",
        "inbound_message": props.get("message") or "",  # form submission text
        "linkedin_bio": props.get("linkedin_bio") or "",
        "last_email_campaign": props.get("hs_email_last_email_name") or "",
        "conversion_event": props.get("recent_conversion_event_name") or "",
        "annual_revenue": props.get("annualrevenue") or "",
    }
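
Because unset HubSpot properties come back as None rather than missing keys, it's worth exercising the normalization against a deliberately sparse record (a standalone sketch mirroring the pattern above):

```python
def normalize_sparse(props: dict) -> dict:
    # `or` fallbacks catch both missing keys and explicit None values
    return {
        "name": f"{props.get('firstname') or ''} {props.get('lastname') or ''}".strip(),
        "title": props.get("jobtitle") or "Unknown role",
        "employee_count": int(props.get("num_employees") or 0),
    }

# A record where most fields exist but are None — typical of fresh form fills
sparse = {"firstname": "Ada", "lastname": None, "jobtitle": None, "num_employees": None}
result = normalize_sparse(sparse)
# result == {"name": "Ada", "title": "Unknown role", "employee_count": 0}
```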

Step 5: Run the Scoring Agent

Now chain the pieces. The scoring function calls Claude, validates the response, and writes results back to HubSpot custom properties.

import anthropic
import json
from pydantic import ValidationError

anthropic_client = anthropic.Anthropic(api_key=os.environ["ANTHROPIC_API_KEY"])

def score_lead(contact_id: str) -> LeadScore:
    signals = collect_prospect_signals(contact_id)
    
    # Format signals as a readable block for Claude
    user_message = f"""Score this inbound lead:

Name: {signals['name']}
Title: {signals['title']}
Company: {signals['company']}
Industry: {signals['industry']}
Employee Count: {signals['employee_count']}
Annual Revenue: {signals['annual_revenue']}

Inbound Message:
{signals['inbound_message'] or 'No message provided'}

LinkedIn Bio:
{signals['linkedin_bio'] or 'Not available'}

Last Email Opened: {signals['last_email_campaign'] or 'None'}
Conversion Event: {signals['conversion_event'] or 'None'}
"""

    response = anthropic_client.messages.create(
        model="claude-haiku-4-5",  # ~$0.003 per scoring call at current pricing
        max_tokens=512,
        system=SYSTEM_PROMPT,
        messages=[{"role": "user", "content": user_message}]
    )
    
    raw_json = response.content[0].text
    
    try:
        score_data = json.loads(raw_json)
        lead_score = LeadScore(**score_data)
    except (json.JSONDecodeError, ValidationError) as e:
        # Log the raw response for debugging before raising
        print(f"Parsing failed for contact {contact_id}. Raw: {raw_json[:200]}")
        raise

    # Write score back to HubSpot custom properties
    write_score_to_crm(contact_id, lead_score)
    return lead_score


def write_score_to_crm(contact_id: str, score: LeadScore):
    """Write scoring results to HubSpot custom contact properties."""
    properties = {
        "ai_lead_score": str(score.score),
        "ai_lead_tier": score.tier,
        "ai_score_reasoning": score.reasoning,
        "ai_next_action": score.next_action,
        "ai_score_confidence": score.confidence,
        "ai_missing_signals": ", ".join(score.missing_signals),
    }
    
    client.crm.contacts.basic_api.update(
        contact_id=contact_id,
        simple_public_object_input={"properties": properties}
    )

You’ll need to create those custom properties in HubSpot first (Settings → Properties → Create property). Map them as single-line text, except ai_lead_score, which should be a number type.

Step 6: Add the Webhook Trigger

Running this manually isn’t the goal. Hook it to HubSpot’s contact creation webhook so scoring fires automatically on every new lead. This turns the script into a live AI lead scoring automation agent.

from fastapi import FastAPI, Request, HTTPException
import base64, hashlib, hmac

app = FastAPI()

HUBSPOT_CLIENT_SECRET = os.environ["HUBSPOT_CLIENT_SECRET"]

def verify_hubspot_signature(method: str, url: str, body: bytes, timestamp: str, signature: str) -> bool:
    """Verify the v3 signature: HMAC-SHA256 over method + URL + body + timestamp, base64-encoded."""
    source = method + url + body.decode("utf-8") + timestamp
    expected = base64.b64encode(
        hmac.new(HUBSPOT_CLIENT_SECRET.encode(), source.encode(), hashlib.sha256).digest()
    ).decode()
    return hmac.compare_digest(expected, signature)

@app.post("/webhook/lead-score")
async def handle_new_lead(request: Request):
    body = await request.body()
    sig = request.headers.get("X-HubSpot-Signature-v3", "")
    timestamp = request.headers.get("X-HubSpot-Request-Timestamp", "")
    
    if not verify_hubspot_signature("POST", str(request.url), body, timestamp, sig):
        raise HTTPException(status_code=401, detail="Invalid signature")
    
    events = await request.json()
    
    for event in events:
        if event.get("subscriptionType") == "contact.creation":
            contact_id = str(event["objectId"])
            try:
                score = score_lead(contact_id)
                print(f"Scored {contact_id}: {score.tier} ({score.score})")
            except Exception as e:
                print(f"Scoring failed for {contact_id}: {e}")
                # Don't raise — return 200 so HubSpot doesn't retry infinitely
    
    return {"status": "ok"}

Deploy this with uvicorn main:app --host 0.0.0.0 --port 8000 and register the endpoint in HubSpot under Settings → Integrations → Private Apps → Webhooks. For production deployments behind a load balancer, make the scoring async using a task queue so webhook responses return immediately.
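
The shape of that async pattern can be sketched with a stdlib queue in a single process (production would use Celery, RQ, or a hosted queue; `process` here is a stand-in for `score_lead`):

```python
import queue
import threading

scored: list[str] = []

def process(contact_id: str) -> None:
    # Stand-in for score_lead(); the real worker would call Claude + HubSpot here
    scored.append(contact_id)

score_queue: queue.Queue = queue.Queue()

def worker() -> None:
    while True:
        contact_id = score_queue.get()
        if contact_id is None:  # sentinel: shut the worker down
            score_queue.task_done()
            break
        try:
            process(contact_id)
        finally:
            score_queue.task_done()

threading.Thread(target=worker, daemon=True).start()

# The webhook handler would just enqueue the ID and return 200 immediately
score_queue.put("12345")
score_queue.put(None)  # sentinel so this demo terminates
score_queue.join()
```

The webhook handler enqueues and returns in microseconds; scoring latency no longer blocks HubSpot's delivery timeout.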

If you want to extend this to also send personalized outreach after scoring, the AI email lead generation agent tutorial covers building the follow-up sequence with the same Claude stack. The two agents compose cleanly — score first, then trigger email based on tier.

Common Errors

Claude returns valid JSON but Pydantic validation fails

This usually happens when score or fit_score come back as strings (“82”) rather than integers. Claude Haiku is more prone to this than Sonnet. Fix it by adding a JSON coercion step before validation:

score_data = json.loads(raw_json)
# Coerce numeric fields that might arrive as strings
for field in ["score", "fit_score", "intent_score"]:
    if field in score_data and isinstance(score_data[field], str):
        score_data[field] = int(score_data[field])
lead_score = LeadScore(**score_data)

HubSpot rate limiting (429 errors) on bulk backfills

The HubSpot API allows 100 requests per 10 seconds on free/starter plans. If you’re backfilling existing contacts, add a rate limiter. Don’t fire this in a tight loop against your full contact list without throttling. Use time.sleep(0.15) between calls and batch during off-hours. For large-scale batch processing patterns, the Claude batch processing guide shows how to handle 10K+ records without hitting rate limits on either side.
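
A minimal throttle for that backfill loop might look like this (a sketch; 0.15 seconds per call keeps you comfortably under 100 requests per 10 seconds):

```python
import time

def throttled(items, min_interval: float = 0.15):
    """Yield items no faster than one per min_interval seconds."""
    last = float("-inf")  # first item goes through immediately
    for item in items:
        wait = min_interval - (time.monotonic() - last)
        if wait > 0:
            time.sleep(wait)
        last = time.monotonic()
        yield item

# Usage: for contact_id in throttled(contact_ids): score_lead(contact_id)
```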

Scores are inconsistent across similar leads

Claude’s default temperature (1.0) introduces variance that’s fine for creative tasks but not for scoring. Pin it to 0.2 (or 0.0) to make outputs near-deterministic:

response = anthropic_client.messages.create(
    model="claude-haiku-4-5",
    max_tokens=512,
    temperature=0.2,  # Add this — cuts score variance significantly
    system=SYSTEM_PROMPT,
    messages=[{"role": "user", "content": user_message}]
)

If you want to understand why temperature matters here and when you’d want it higher, the temperature and top-p guide is worth reading before you tune further.

What to Build Next

Add a score decay mechanism. A lead scored as “hot” three months ago who never converted should automatically downgrade. Implement a nightly cron job that fetches contacts where ai_lead_tier = hot and createdate is older than 30 days with no activity, then re-runs scoring with an additional context field: days_since_scored: 45, activity_since_scoring: none. Claude will naturally factor that into a lower intent score. Pair this with HubSpot workflow automation to trigger re-scoring on deal stage changes and you have a fully closed-loop system that maintains score accuracy without manual intervention.
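
Building that extra context is a few lines (a sketch; `scored_at` would come from a custom timestamp property you'd also need to create in HubSpot):

```python
from datetime import datetime, timedelta, timezone

def decay_context(scored_at: datetime, had_activity: bool) -> str:
    """Extra lines appended to the user message on re-scoring runs."""
    days = (datetime.now(timezone.utc) - scored_at).days
    activity = "some" if had_activity else "none"
    return f"days_since_scored: {days}\nactivity_since_scoring: {activity}"

ctx = decay_context(datetime.now(timezone.utc) - timedelta(days=45), had_activity=False)
# ctx == "days_since_scored: 45\nactivity_since_scoring: none"
```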

For solo founders running lean, this whole stack costs under $10/month at 3,000 leads scored — Claude Haiku pricing makes it genuinely viable at that scale. For teams scoring 50K+ leads monthly, switch the model to Sonnet for the top 10% of borderline leads and keep Haiku for clear disqualifications. That hybrid routing approach is something we cover in the broader AI lead qualification sales assistant guide.

Frequently Asked Questions

How accurate is AI lead scoring compared to manual scoring by a sales rep?

In practice, Claude-based scoring matches experienced reps on ICP fit 85-90% of the time when the system prompt is well-calibrated against your actual customer data. Intent scoring is weaker — it can only work with signals you provide, so it misses things like tone of voice on a call. The real advantage isn’t accuracy vs. a human, it’s consistency at scale: Claude applies your ICP criteria identically to every lead, at 3am, without cherry-picking.

Can I use this with Salesforce instead of HubSpot?

Yes — replace the HubSpot client with the simple-salesforce library. The scoring logic and Claude integration are CRM-agnostic. The main difference is Salesforce uses SOQL for queries and has different object naming conventions. Swap collect_prospect_signals to use sf.Contact.get(contact_id) and update write_score_to_crm to use sf.Contact.update(contact_id, properties).

What happens if Claude returns an error or times out during scoring?

The webhook endpoint catches exceptions per-contact and logs them without re-raising, which returns a 200 to HubSpot and prevents infinite retries. For production, you should write a failed score to a dead-letter queue (Redis list or a simple database table) and run a retry job every 15 minutes. Build proper fallback logic using patterns from the Claude agent fallback and retry guide.
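
A dead-letter table doesn't need extra infrastructure — SQLite is enough to start (a sketch; the schema and the three-attempt cap are assumptions to adapt):

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # use a file path in production
conn.execute(
    "CREATE TABLE IF NOT EXISTS failed_scores "
    "(contact_id TEXT, error TEXT, attempts INTEGER DEFAULT 0)"
)

def record_failure(contact_id: str, error: str) -> None:
    # Called from the webhook's except block instead of just printing
    conn.execute(
        "INSERT INTO failed_scores (contact_id, error) VALUES (?, ?)",
        (contact_id, error),
    )
    conn.commit()

def pending_retries(limit: int = 50) -> list[str]:
    # The 15-minute retry job pulls from here, capped at 3 attempts
    rows = conn.execute(
        "SELECT contact_id FROM failed_scores WHERE attempts < 3 LIMIT ?",
        (limit,),
    ).fetchall()
    return [row[0] for row in rows]

record_failure("12345", "request timed out")
```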

How do I calibrate the scoring prompt for my specific ICP?

Start by manually scoring 50 recent leads — 20 that converted, 20 that didn’t, 10 you’re unsure about. Use those as test cases and iterate on your system prompt until Claude’s scores match your human scores on at least 80% of the sample. Pay most attention to the edge cases (the 10 unsure ones) — those reveal where your ICP definition is fuzzy, which is actually a sales process problem, not a prompt problem.
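
Tracking that 80% threshold is easiest with a tiny agreement function (a sketch; tier match is the simplest metric — you could also compare numeric scores within a tolerance):

```python
def agreement(human_tiers: list[str], model_tiers: list[str]) -> float:
    """Fraction of leads where Claude's tier matches the human-assigned tier."""
    matches = sum(h == m for h, m in zip(human_tiers, model_tiers))
    return matches / len(human_tiers)

rate = agreement(["hot", "warm", "cold", "hot"], ["hot", "warm", "warm", "hot"])
# rate == 0.75 — below the 0.8 bar, so keep iterating on the prompt
```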

What’s the cost per lead scored at scale?

With Claude Haiku and a typical prompt (~800 input tokens) plus 512 output tokens, you’re looking at roughly $0.002-$0.004 per lead at current pricing. At 10,000 leads/month that’s $20-40. Sonnet is 5-8x more expensive per call — use it selectively for borderline leads where confidence is low. Cache your system prompt using Anthropic’s prompt caching feature to cut costs another 30-50% on repeated runs.
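
The arithmetic behind those numbers (a sketch; the per-million-token prices are assumptions — verify against Anthropic's current pricing page):

```python
# Assumed Haiku-class prices per million tokens — check current pricing
INPUT_PER_MTOK = 1.00
OUTPUT_PER_MTOK = 5.00

def cost_per_lead(input_tokens: int, output_tokens: int) -> float:
    return (input_tokens / 1e6) * INPUT_PER_MTOK + (output_tokens / 1e6) * OUTPUT_PER_MTOK

per_lead = cost_per_lead(800, 512)  # ~800 prompt tokens, 512 output cap
monthly = per_lead * 10_000         # at 10K leads/month
```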

Can I run this without a webhook, just on a schedule?

Absolutely. Skip Step 6 and instead query HubSpot for contacts created in the last hour where ai_lead_score is empty, then loop through them. Run this as a cron job every 30 minutes. It’s slightly less real-time but much simpler to operate — no public endpoint to secure, no always-on server required. A basic Linux cron setup works fine for this pattern.
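
The cron entry for that pattern is a single line (a sketch; script and log paths are placeholders):

```shell
# m h dom mon dow  command — run the polling backfill every 30 minutes
*/30 * * * * /usr/bin/python3 /opt/lead-scorer/backfill.py >> /var/log/lead-scorer.log 2>&1
```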


Editorial note: API pricing, model capabilities, and tool features change frequently — always verify current details on the vendor’s website before building in production. Code examples are tested at time of writing; pin your dependency versions to avoid breaking changes. Some links in this article may be affiliate links — we may earn a commission if you sign up, at no extra cost to you.

