Author: user

Building Claude agents that browse the web: tool use, follow-ups, and reliability patterns

March 23, 2026

By the end of this tutorial, you’ll have a working Claude agent that can search Google, fetch and parse web pages, handle pagination, deal with JavaScript-heavy sites, and recover gracefully when browsing fails. Web browsing with Claude agents is one of those capabilities that looks simple in demos and falls apart immediately in production; this guide covers the parts that actually break.

Install dependencies: set up httpx, BeautifulSoup, and the Anthropic SDK
Define browsing tools: register search and fetch tools with Claude’s tool-use API
Build the agentic loop: handle multi-turn tool calls until Claude has enough context…

How to cut Claude API costs in half: caching, batching, and model selection strategies

March 23, 2026

If you’re running Claude at any meaningful scale, you’ve probably opened your Anthropic billing dashboard and felt a small jolt of panic. A few agents, a document pipeline, a customer-facing feature — and suddenly you’re looking at hundreds of dollars a month with a clear upward trajectory. The good news: most teams are leaving 40–60% savings on the table because they haven’t applied the three core techniques that actually move the needle. This article is a precise walkthrough of how to reduce Claude API costs using prompt caching, batch processing, and intelligent model routing — with real numbers attached to…

AI infrastructure for solo founders: serverless vs self-hosted vs API trade-offs

March 23, 2026

Most solo founders making AI infrastructure decisions are choosing based on vibes and blog posts written by people who’ve never paid a production invoice. The result is predictable: either massively over-engineered self-hosted setups that consume weekends, or naive API integrations that hit $800/month before the product has ten users. AI infrastructure for solo founders is genuinely different from the enterprise calculus — you’re optimizing for iteration speed, cash survival, and the ability to pivot, not multi-region HA and SLA guarantees. This article gives you real cost projections and setup complexity at three different usage scales, for three approaches: managed APIs…

Structured output with Claude: JSON, XML, and regex-based validation patterns

March 23, 2026

By the end of this tutorial, you’ll have a working Python implementation that forces Claude to return valid, schema-compliant JSON every time, with retry logic, fallback handling, and a comparison of three approaches so you can pick the right one for your use case. Getting structured JSON output from Claude is one of those things that looks trivial until you’re three months into production and getting random parse failures at 2am. There are three real approaches in play: Anthropic’s native structured output mode (via tool use), manual JSON prompting with validation, and regex-based extraction as a last resort. Each has a…

Building a production RAG pipeline in 60 minutes: PDF to Claude agent knowledge base

March 23, 2026

By the end of this tutorial, you’ll have a working RAG pipeline for Claude agents that ingests PDFs, chunks them intelligently, embeds them into a vector store, retrieves relevant context at query time, and feeds that context into a Claude agent response. Every step includes code that actually runs — not pseudocode. This is the pipeline I’d use for a production customer support bot, internal documentation search, or any agent that needs to answer questions grounded in your own documents. It avoids LangChain abstractions where they add friction, uses sentence-transformers for embeddings (free, fast, good enough), and ChromaDB for local…

Temperature and top-p in production: when to randomize LLM outputs for agents

March 23, 2026

Most developers set temperature once and forget it. They pick 0.7 because it “feels balanced,” or they hammer everything to 0 because “determinism is safer.” Both approaches cost you in production: one leads to subtle bugs and boring outputs, the other to brittle agents that can’t generate variation when you actually need it. Getting temperature and top-p settings right for LLMs in production is one of those unsexy optimizations that quietly improves output quality across every task type you’re running. This article covers how these parameters actually work under the hood, the three most common misconceptions that will burn you, and exact…

RAG vs fine-tuning vs extended thinking: choosing the right knowledge strategy for Claude agents

March 23, 2026

Most teams building Claude agents waste weeks chasing the wrong solution. They reach for RAG when they need fine-tuning, or burn GPU budget fine-tuning when a simple retrieval setup would have been faster and cheaper. The choice between RAG and fine-tuning for Claude agents isn’t really a technical debate; it’s a requirements conversation that most developers skip. And now there’s a third option that didn’t exist 18 months ago: extended thinking, which lets Claude reason through complex problems before answering without any knowledge injection at all. I ran all three approaches against two realistic tasks: contract clause review and code vulnerability…

Extended thinking vs chain-of-thought: when Claude’s reasoning modes actually matter

March 23, 2026

If you’ve benchmarked Claude’s extended thinking against standard chain-of-thought prompting and found mixed results, you’re not alone. The Anthropic docs make extended thinking sound universally better. It’s not. Whether you should pay the latency and token cost for extended thinking or stick with a well-structured CoT prompt depends almost entirely on the task type and your tolerance for waiting 15–30 seconds for a response. I’ve benchmarked both approaches across four representative agent tasks: multi-step code debugging, mathematical reasoning, ambiguous classification, and structured data extraction. The results are more nuanced than “extended thinking wins at hard problems.” Let me…

Contract review automation: building a Claude agent that flags risks and extracts terms

March 23, 2026

By the end of this tutorial, you’ll have a working contract review Claude agent that ingests PDF or plain-text contracts, flags potentially risky clauses (indemnification, auto-renewal, limitation of liability, and more), extracts key terms into structured JSON, and produces a plain-English executive summary, all in under 30 seconds per document. Legal review is one of the highest-leverage places to deploy an LLM. A founder reviewing a vendor agreement, an ops team processing dozens of SaaS contracts, or a legal team triaging NDAs before sending to counsel all share the same bottleneck: reading the same clause patterns…

Building a web-browsing Claude agent: fetching pages, parsing content, navigating links autonomously

March 23, 2026

By the end of this tutorial, you’ll have a working web-browsing Claude agent that can fetch URLs, parse HTML content, follow links autonomously, and synthesize multi-page research, all without human input between steps. We’ll handle both static pages and JavaScript-rendered content, and I’ll show you the retry and parsing patterns that actually hold up in production.

Install dependencies: set up httpx, Playwright, BeautifulSoup4, and the Anthropic SDK
Define browsing tools: build fetch_page, extract_links, and search_page as Claude tool-use functions
Handle JavaScript rendering: add Playwright for SPAs and dynamic content
Build the agent loop: …