
By the end of this tutorial, you’ll have a working Claude web browsing agent that fetches live web content, extracts relevant information, and returns grounded summaries — with rate limiting, error handling, and anti-hallucination guardrails baked in. This isn’t a toy demo: the architecture here is production-ready and costs roughly $0.003–0.008 per full research run using Claude Haiku 3.5.

The core problem with LLMs doing “research” is that they hallucinate when they don’t know something. Giving Claude real-time web access solves that — but it introduces new failure modes: rate limits, paywalls, bot detection, and the model confidently summarizing a 404 page. This guide covers all of it.

  1. Install dependencies — Set up Python environment with httpx, BeautifulSoup, and the Anthropic SDK
  2. Build the web fetcher tool — A robust fetch function with timeouts, headers, and content extraction
  3. Define the tool schema for Claude — Register the fetch function as a callable tool
  4. Wire up the agent loop — Implement the tool-use loop that drives multi-step browsing
  5. Add rate limiting and retry logic — Handle 429s and transient failures without crashing
  6. Ground Claude’s output in fetched content — Prompt engineering to prevent hallucination

Step 1: Install Dependencies

You need four packages: anthropic for the API, httpx for async HTTP (faster and more reliable than requests for agent workloads), beautifulsoup4 for HTML parsing, and tenacity for retry logic.

pip install anthropic httpx beautifulsoup4 tenacity lxml

Pin your versions in production. The Anthropic SDK in particular ships breaking changes between minor versions. Pin to a specific release (for example, anthropic==0.40.0) — check PyPI for the current stable version before locking.
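An illustrative requirements.txt — the version numbers below are placeholders, so check PyPI for the current releases before pinning:

```text
anthropic==0.40.0
httpx==0.27.2
beautifulsoup4==4.12.3
tenacity==9.0.0
lxml==5.3.0
```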

Step 2: Build the Web Fetcher Tool

The fetcher needs to handle the full mess of real web content: JavaScript-heavy pages that return empty bodies, pages behind Cloudflare, redirects, and encoding issues. We’re using httpx synchronously here to keep the main agent loop simple — swap to asyncio if you’re running parallel fetches.

import httpx
from bs4 import BeautifulSoup
from tenacity import retry, stop_after_attempt, wait_exponential

HEADERS = {
    "User-Agent": "Mozilla/5.0 (compatible; ResearchBot/1.0)",
    "Accept": "text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8",
    "Accept-Language": "en-US,en;q=0.5",
}

@retry(
    stop=stop_after_attempt(3),
    wait=wait_exponential(multiplier=1, min=2, max=10)
)
def fetch_url(url: str, max_chars: int = 8000) -> dict:
    """
    Fetch a URL and return cleaned text content.
    Returns a dict with 'content', 'status_code', and 'error' keys.
    """
    try:
        with httpx.Client(timeout=15.0, follow_redirects=True) as client:
            response = client.get(url, headers=HEADERS)
            
            if response.status_code == 429:
                # Let tenacity handle the retry, but signal the rate limit
                raise httpx.HTTPStatusError(
                    "Rate limited", request=response.request, response=response
                )
            
            if response.status_code != 200:
                return {
                    "content": "",
                    "status_code": response.status_code,
                    "error": f"HTTP {response.status_code}"
                }
            
            # Parse HTML and extract readable text
            soup = BeautifulSoup(response.text, "lxml")
            
            # Remove noise: scripts, styles, nav, footer
            for tag in soup(["script", "style", "nav", "footer", "header", "aside"]):
                tag.decompose()
            
            # Get main content — prefer <main> or <article> if present
            main = soup.find("main") or soup.find("article") or soup.body
            text = main.get_text(separator="\n", strip=True) if main else ""
            
            # Truncate to stay within context limits
            return {
                "content": text[:max_chars],
                "status_code": response.status_code,
                "error": None
            }
    
    except httpx.TimeoutException:
        return {"content": "", "status_code": None, "error": "Request timed out"}
    except httpx.HTTPStatusError:
        # Re-raise so the tenacity decorator actually retries the rate-limited
        # request — a bare `except Exception` below would swallow it first
        raise
    except Exception as e:
        return {"content": "", "status_code": None, "error": str(e)}

The max_chars=8000 limit is deliberate. Feeding 50,000 characters of scraped HTML into Claude’s context wastes tokens and degrades output quality. Eight thousand characters gives enough signal for most pages without blowing your budget. If you’re running this at scale, the LLM caching strategies covered here can cut repeated fetch costs significantly when the same URLs appear across multiple agent runs.

Step 3: Define the Tool Schema for Claude

Claude’s tool use requires a JSON schema that describes what the tool does and what parameters it accepts. Keep descriptions precise — Claude uses them to decide when to call the tool, not just how.

TOOLS = [
    {
        "name": "fetch_webpage",
        "description": (
            "Fetches the text content of a webpage at the given URL. "
            "Use this when you need current, real-time information from a specific website. "
            "Do NOT invent or assume content — only use what is returned by this tool."
        ),
        "input_schema": {
            "type": "object",
            "properties": {
                "url": {
                    "type": "string",
                    "description": "The full URL to fetch, including https://"
                },
                "reason": {
                    "type": "string",
                    "description": "Why you are fetching this URL — what information you expect to find"
                }
            },
            "required": ["url", "reason"]
        }
    }
]

The reason field isn’t used by the fetcher — it’s a chain-of-thought forcing mechanism. When Claude has to articulate why it’s fetching a URL, it makes fewer pointless requests and catches itself before fetching irrelevant pages. Cheap prompt engineering that actually works.
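Claude occasionally emits malformed or schemeless URLs despite the schema. A cheap guard before executing the call — valid_fetch_input is an illustrative helper, not part of the SDK:

```python
from urllib.parse import urlparse

def valid_fetch_input(tool_input: dict) -> bool:
    """Sanity-check a fetch_webpage tool call before executing it."""
    parsed = urlparse(tool_input.get("url", ""))
    # Require an http(s) scheme and a hostname — rejects "example.com" and garbage
    return parsed.scheme in ("http", "https") and bool(parsed.netloc)
```

When this fails, return a tool result like "Invalid URL — include the https:// scheme" rather than letting httpx raise.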

Step 4: Wire Up the Agent Loop

This is the core loop: send a message, check if Claude wants to use a tool, execute it, feed results back, repeat until Claude produces a final answer.

import anthropic
import json

client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from env

SYSTEM_PROMPT = """You are a research assistant with web browsing capability.

When answering questions that require current information:
1. Use the fetch_webpage tool to retrieve real content
2. Base your answer ONLY on what the tool returns — never fabricate facts
3. If a page fails to load, try an alternative source
4. Cite the URLs you used in your final answer
5. If you cannot find reliable information after 3 fetch attempts, say so explicitly

Do not answer from memory for time-sensitive topics like prices, news, or statistics."""

def run_browsing_agent(user_query: str, max_tool_calls: int = 5) -> str:
    messages = [{"role": "user", "content": user_query}]
    tool_call_count = 0
    
    while tool_call_count < max_tool_calls:
        response = client.messages.create(
            model="claude-3-5-haiku-20241022",  # Haiku: fast and cheap for browsing loops
            max_tokens=2048,
            system=SYSTEM_PROMPT,
            tools=TOOLS,
            messages=messages
        )
        
        # If Claude is done, return the text response
        if response.stop_reason == "end_turn":
            return _extract_text(response.content)
        
        # Handle tool use
        if response.stop_reason == "tool_use":
            # Add Claude's response (which includes the tool call) to messages
            messages.append({"role": "assistant", "content": response.content})
            
            tool_results = []
            for block in response.content:
                if block.type == "tool_use":
                    tool_call_count += 1
                    print(f"[Tool call {tool_call_count}] Fetching: {block.input['url']}")
                    
                    try:
                        result = fetch_url(block.input["url"])
                    except Exception as exc:
                        # tenacity raises RetryError once all attempts are exhausted —
                        # surface it as a tool result instead of crashing the loop
                        result = {"content": "", "status_code": None, "error": str(exc)}
                    
                    # Build the tool result — always return something, even on failure
                    tool_results.append({
                        "type": "tool_result",
                        "tool_use_id": block.id,
                        "content": (
                            result["content"] if result["content"]
                            else f"Failed to fetch page. Error: {result['error']}"
                        )
                    })
            
            messages.append({"role": "user", "content": tool_results})
        else:
            # Unexpected stop reason
            break
    
    return "Max tool calls reached without a final answer."

def _extract_text(content_blocks) -> str:
    return " ".join(
        block.text for block in content_blocks
        if hasattr(block, "text")
    )

The max_tool_calls=5 guard is non-negotiable. Without it, a confused agent can spiral into fetching dozens of pages on an ambiguous query. Five calls is enough for most research tasks — competitive intelligence, current pricing, news summaries. If you’re building deeper research workflows, check out how multi-agent orchestration with subagents handles this kind of problem at scale.

Step 5: Add Rate Limiting and Retry Logic

The tenacity decorator on fetch_url handles transient failures, but you also need to manage Claude API rate limits — especially if you’re running this agent for multiple users concurrently.

import time
from collections import deque
from threading import Lock

class RateLimiter:
    """Simple token bucket — allows burst_size requests, refills at rate_per_second."""
    def __init__(self, rate_per_second: float = 2.0, burst_size: int = 5):
        self.rate = rate_per_second
        self.burst = burst_size
        self.tokens = burst_size
        self.last_refill = time.monotonic()
        self.lock = Lock()
    
    def acquire(self):
        with self.lock:
            now = time.monotonic()
            elapsed = now - self.last_refill
            self.tokens = min(self.burst, self.tokens + elapsed * self.rate)
            self.last_refill = now
            
            if self.tokens >= 1:
                self.tokens -= 1
                return
            
            # Wait until a token is available (sleeping under the lock
            # intentionally serializes concurrent callers)
            wait_time = (1 - self.tokens) / self.rate
            time.sleep(wait_time)
            # Advance last_refill past the sleep, otherwise the next acquire
            # double-counts this wait as refill time
            self.last_refill = time.monotonic()
            self.tokens = 0

# Instantiate once and share across agent runs
web_limiter = RateLimiter(rate_per_second=2.0, burst_size=3)

# Wrap fetch_url calls with the limiter
def fetch_url_limited(url: str, **kwargs) -> dict:
    web_limiter.acquire()
    return fetch_url(url, **kwargs)

At 2 requests/second, you’re comfortably within most sites’ informal limits and well under Cloudflare’s default thresholds. Aggressive scrapers get IP-blocked; polite agents don’t. For production deployments where you need more throughput, the fallback and retry logic patterns on this site are worth reading before you ship.
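To sanity-check the bucket's behavior, here is the limiter condensed into a standalone timing test (the class is restated so the snippet runs on its own):

```python
import time
from threading import Lock

class RateLimiter:
    """Token bucket: allows burst_size immediate requests, refills at rate_per_second."""
    def __init__(self, rate_per_second: float = 2.0, burst_size: int = 5):
        self.rate = rate_per_second
        self.burst = burst_size
        self.tokens = float(burst_size)
        self.last_refill = time.monotonic()
        self.lock = Lock()

    def acquire(self):
        with self.lock:
            now = time.monotonic()
            self.tokens = min(self.burst, self.tokens + (now - self.last_refill) * self.rate)
            self.last_refill = now
            if self.tokens >= 1:
                self.tokens -= 1
                return
            # Sleep until a full token has accrued, then account for the refill
            wait_time = (1 - self.tokens) / self.rate
            time.sleep(wait_time)
            self.last_refill = time.monotonic()
            self.tokens = 0.0

# Burst of 2, then 10 tokens/sec: calls 3 and 4 should each wait ~0.1s
limiter = RateLimiter(rate_per_second=10.0, burst_size=2)
start = time.monotonic()
for _ in range(4):
    limiter.acquire()
elapsed = time.monotonic() - start
```

With those parameters, the four acquires should take roughly 0.2 seconds total: the first two spend the burst, the last two each wait for a refill.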

Step 6: Ground Claude’s Output in Fetched Content

The system prompt already does heavy lifting here, but you need one more layer: a post-processing check that catches answers containing phrases like “as of my knowledge cutoff” or “I believe” when live data was available.

HALLUCINATION_FLAGS = [
    "as of my knowledge cutoff",
    "i believe this may have changed",
    "i'm not certain of the current",
    "based on my training data",
    "you should verify this",
]

def check_grounding(answer: str) -> tuple[str, bool]:
    """
    Returns the answer and a boolean indicating if hallucination flags were detected.
    """
    answer_lower = answer.lower()
    flagged = any(phrase in answer_lower for phrase in HALLUCINATION_FLAGS)
    
    if flagged:
        warning = (
            "\n\n⚠️ Note: This answer may contain unverified information. "
            "The agent flagged uncertainty — consider running the query again or checking sources manually."
        )
        return answer + warning, True
    
    return answer, False

# Updated usage
def run_and_check(query: str) -> str:
    raw_answer = run_browsing_agent(query)
    final_answer, was_flagged = check_grounding(raw_answer)
    
    if was_flagged:
        print("[Warning] Potential hallucination detected in output")
    
    return final_answer

This isn’t foolproof — a confident hallucination won’t trigger these flags. But it catches the cases where Claude hedges in its own language while still producing an answer, which is a common failure mode. For a deeper treatment of factual accuracy across models, the LLM factual accuracy benchmark we ran is worth reading before choosing your model for this use case.

Putting It All Together

if __name__ == "__main__":
    query = "What is the current price of Claude Opus 4 API tokens as of today?"
    
    print(f"Query: {query}\n")
    result = run_and_check(query)
    print(f"\nAnswer:\n{result}")

A typical run on a pricing query hits 2–3 fetch calls and consumes around 1,500–2,500 input tokens and 300–500 output tokens with Haiku. At current Haiku 3.5 pricing (~$0.80/M input, $4/M output), that’s roughly $0.003–0.006 per query. Claude Sonnet 3.5 gives better synthesis quality on complex pages at roughly 4x the cost — worth it for user-facing applications, overkill for batch research pipelines.
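The per-query estimate is simple arithmetic you can sanity-check — the rates below are the Haiku 3.5 prices quoted above, so verify current pricing before relying on them:

```python
HAIKU_INPUT_PER_TOKEN = 0.80 / 1_000_000   # $0.80 per million input tokens
HAIKU_OUTPUT_PER_TOKEN = 4.00 / 1_000_000  # $4.00 per million output tokens

def query_cost(input_tokens: int, output_tokens: int) -> float:
    """Dollar cost of a single agent run at the rates above."""
    return (input_tokens * HAIKU_INPUT_PER_TOKEN
            + output_tokens * HAIKU_OUTPUT_PER_TOKEN)

# A mid-range run of 2,000 input + 400 output tokens:
# 2000 * $0.0000008 + 400 * $0.000004 = $0.0016 + $0.0016 = $0.0032
```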

Common Errors

Error 1: Empty content despite 200 status

JavaScript-rendered pages return near-empty HTML to a plain HTTP client. Your parser gets a shell with no article text. Fix: Detect when extracted text is under 200 characters and return a specific error: "Page requires JavaScript rendering — content unavailable." Then either skip the URL or use a Playwright/Puppeteer-based fallback for that specific fetch. Don’t silently pass empty content to Claude — it’ll hallucinate page content.
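One way to implement that check — a small wrapper around the fetch result (MIN_CONTENT_CHARS and validate_content are illustrative names):

```python
MIN_CONTENT_CHARS = 200  # below this on a 200 response, assume JS rendering

def validate_content(result: dict) -> dict:
    """Replace suspiciously empty 200 responses with an explicit error."""
    if result["status_code"] == 200 and len(result["content"]) < MIN_CONTENT_CHARS:
        return {
            "content": "",
            "status_code": result["status_code"],
            "error": "Page requires JavaScript rendering — content unavailable",
        }
    return result
```

Call it on every fetch_url result before building the tool_result block, so Claude sees the explicit error instead of an empty page.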

Error 2: Tool result exceeding context window

If you fetch several long pages in one agent run, you can hit context limits before Claude produces a final answer. Fix: Track cumulative characters fed back as tool results and reduce max_chars dynamically. A simple heuristic: if you’ve already fed 20,000+ characters, drop subsequent page limits to 2,000. Claude can still extract signal from shorter excerpts.
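That heuristic fits in a few lines — track a running total of characters fed back and shrink the per-page limit once you cross a budget (function and parameter names are illustrative):

```python
def adaptive_max_chars(cumulative_chars: int,
                       base: int = 8000,
                       floor: int = 2000,
                       budget: int = 20000) -> int:
    """Char limit for the next fetch, given what has already been fed back."""
    return floor if cumulative_chars >= budget else base
```

In the agent loop, pass adaptive_max_chars(total) as max_chars to fetch_url and add len(result["content"]) to the total after each call.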

Error 3: Claude fetching the same URL repeatedly

On ambiguous queries, Claude sometimes loops — fetching the same homepage three times. Fix: Keep a fetched_urls set in the agent loop and return a cached result (or an “already fetched this URL” message) if the same URL appears twice. Add this inside the tool call handler block before calling fetch_url_limited.

fetched_urls = set()  # add this before the while loop

# Inside the tool_use handler — result_content becomes the tool_result "content" value:
if block.input["url"] in fetched_urls:
    result_content = "Already fetched this URL in this session. Use a different source."
else:
    fetched_urls.add(block.input["url"])
    result = fetch_url_limited(block.input["url"])
    result_content = result["content"] or f"Error: {result['error']}"

What to Build Next

The natural extension is adding a search step before fetching — instead of requiring Claude to already know URLs, give it a search_web tool that queries the Brave Search API or SerpAPI and returns a list of relevant URLs. Claude then decides which ones to fetch. This turns the agent from a “fetch if you know the URL” tool into a genuine research assistant that can answer questions about anything on the web.

Brave Search API is $3/1000 queries at the basic tier — cheap enough to include in every agent run. Combine this with the prompt chaining patterns described here for multi-step research tasks (search → fetch → synthesize → follow-up search based on gaps), and you have a system that handles genuinely open-ended research queries without hallucinating.
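A sketch of what that second tool's schema might look like, alongside the existing fetch_webpage definition — the name search_web and its fields are assumptions, and the actual call to Brave or SerpAPI is yours to implement:

```python
SEARCH_TOOL = {
    "name": "search_web",
    "description": (
        "Searches the web for the given query and returns a list of result URLs "
        "with titles and snippets. Use this to discover sources, then fetch the "
        "most promising ones with fetch_webpage."
    ),
    "input_schema": {
        "type": "object",
        "properties": {
            "query": {
                "type": "string",
                "description": "The search query — keep it short and specific",
            }
        },
        "required": ["query"],
    },
}

# Register it alongside the fetch tool: TOOLS.append(SEARCH_TOOL)
```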

Bottom line by reader type: If you’re a solo founder building a competitive intelligence or market research tool, this Claude web browsing agent architecture with Haiku is your cheapest path to reliable live data. If you’re deploying for a team with higher quality requirements, swap Haiku for Sonnet 3.5 and add the Playwright fallback for JS-heavy pages. Enterprise teams should front this with a proper caching layer — the same URL gets fetched repeatedly across users, and caching those fetch results alone can cut costs 40%+.

Frequently Asked Questions

How do I stop Claude from hallucinating when browsing the web?

The most effective approach is explicit system prompt instructions that prohibit using training knowledge for time-sensitive facts, combined with requiring Claude to cite fetched URLs in its answer. Adding a post-processing check for hedging language (“I believe”, “as of my knowledge cutoff”) catches cases where the model falls back on training data despite having tool access. Never pass empty fetch results to Claude — if a page fails to load, say so explicitly in the tool result.

Can I use this Claude web browsing agent with Claude Opus instead of Haiku?

Yes, just change the model parameter. Opus produces noticeably better synthesis on complex, multi-source research tasks but costs roughly 19x more than Haiku per token. For most browsing use cases — pricing lookups, news summaries, competitor research — Haiku’s quality is sufficient. Reserve Opus for tasks where output quality directly impacts revenue, like investor reports or detailed technical research.

What happens when sites block my web browsing agent?

Cloudflare and similar bot protection will return 403s or redirect to a challenge page. Your fetcher gets back an HTML page that’s essentially empty of useful content. Detect this by checking if the extracted text contains phrases like “Just a moment” or “Checking your browser” and return a specific error to Claude. For blocked domains, maintain a fallback list of alternative sources Claude can try (e.g., cached versions, archive.org, or alternative publishers covering the same topic).
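A minimal detector for those challenge pages — the marker list is illustrative and worth extending as you hit new variants:

```python
CHALLENGE_MARKERS = [
    "just a moment",
    "checking your browser",
    "enable javascript and cookies to continue",
]

def is_bot_challenge(page_text: str) -> bool:
    """True if the extracted text looks like a bot-protection interstitial."""
    lowered = page_text.lower()
    return any(marker in lowered for marker in CHALLENGE_MARKERS)
```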

How do I handle paywalled content in my Claude browsing agent?

Paywalled pages typically return a snippet plus a signup prompt — your fetcher gets 200–400 characters of real content followed by noise. Set a minimum content threshold (e.g., 500 characters) and treat anything below it as a soft failure. Return a message like “Paywall detected — only preview content available” so Claude knows to seek an alternative source rather than summarizing a subscription prompt.

Is it legal to scrape websites with a Claude agent?

This depends on the site’s Terms of Service and your jurisdiction. Most sites prohibit automated scraping in their ToS; actual legal enforcement varies widely. For commercial use, prefer APIs (news APIs, search APIs) over raw scraping. If you must scrape, respect robots.txt, use reasonable rate limits, and don’t reproduce full article content — summarization for internal use sits in a different legal position than republishing scraped text.

How do I add search capability so Claude can find URLs on its own?

Add a second tool — search_web — that calls the Brave Search API or SerpAPI and returns a list of URLs with titles and snippets. Claude decides which URLs to fetch based on the search results. Brave Search costs $3/1000 queries at the basic tier. Define the tool schema alongside fetch_webpage and the agent will naturally chain search → fetch → synthesize without additional prompting.


Editorial note: API pricing, model capabilities, and tool features change frequently — always verify current details on the vendor’s website before building in production. Code examples are tested at time of writing; pin your dependency versions to avoid breaking changes. Some links in this article may be affiliate links — we may earn a commission if you sign up, at no extra cost to you.

