Author: user

user

Building Multimodal AI Features with Claude: Image Understanding in Agent Workflows

March 21, 2026

Most Claude agent tutorials stop at text in, text out. That’s fine for summarisation and Q&A, but the moment a real workflow hits you — a user uploads a screenshot of an error, a client sends a PDF invoice, or a monitoring system captures a UI screenshot — you need your agent to actually see things. Building claude multimodal image agents closes that gap, and it’s less work than you’d expect. This article walks through the concrete implementation: how to pass images to Claude, how to structure vision tasks inside agent workflows, and where things break in production. What Claude’s…

System Prompts That Work: The Anatomy of High-Performance Claude Agent Instructions

March 21, 2026

Most developers writing claude system prompts agents treat the system prompt like a sticky note — a few lines reminding the model what it is. Then they wonder why their agent hallucinates, goes off-brand, or collapses on edge cases. The system prompt isn’t a sticky note. It’s the contract between your intentions and the model’s behavior, and if it’s vague, the model fills the gaps however it wants. After shipping production agents for customer support, data extraction, code review, and internal tooling, I’ve developed a fairly strong opinion on what separates a system prompt that holds up from one that…

Prompt Injection Attacks on Claude Agents: Defense Strategies That Actually Work in Production

March 21, 2026

If you’re building Claude agents that process external content — emails, web pages, user-submitted documents, tool outputs — you already have a prompt injection problem. You might just not know it yet. Prompt injection defense for Claude isn’t a nice-to-have you add before launch; it’s architecture you need to design in from day one. This article covers the actual attack vectors I’ve seen in production, the defenses that hold up, and the ones that fail the moment a real attacker shows up. The short version: layered validation, structural separation of data from instructions, and output monitoring will block the overwhelming…

Modal vs Replicate vs Beam: Choosing the Right Serverless Platform for Claude Agent Workloads

March 21, 2026

If you’re running Claude agents in production, you’ve probably already hit the wall with traditional deployment. You need something that scales to zero between runs, handles GPU bursts without pre-provisioning, and doesn’t charge you for idle time while your orchestration layer waits on API responses. That’s where serverless AI deployment platforms like Modal, Replicate, and Beam come in — and picking the wrong one for your workload pattern will cost you either money, latency, or both. I’ve deployed Claude-backed agents on all three. Here’s what actually matters in production and where each platform quietly fails you. What “Serverless” Actually Means…

Building Stateful Claude Agents with Persistent Memory: A Production Architecture Without Extra Databases

March 21, 2026

Most tutorials on Claude agents treat each conversation like it never happened. You get a clean context window, the user asks something, Claude responds, done. That works fine for demos. In production, it’s the reason your agent feels dumb — it can’t remember that this user prefers metric units, already completed onboarding, or has asked the same question three times this week. Building stateful Claude agents memory into your architecture is what separates a useful product from a fancy autocomplete. The good news: you don’t need Redis, Pinecone, or a custom vector store to get meaningful persistence across sessions. You…

RAG vs Fine-Tuning for Production Agents: Cost Analysis and When to Use Each (Updated 2025)

March 21, 2026

Most teams asking about RAG vs fine-tuning are asking the wrong question. They’re treating it as a binary choice when the real decision is: what does your agent actually need to know, and how often does that change? I’ve shipped both approaches in production — RAG pipelines handling millions of queries and fine-tuned models running in enterprise tools — and the failure modes are completely different. Get this architectural decision wrong early and you’ll be rebuilding six months later. This is an updated breakdown for 2025, where the calculus has shifted meaningfully. Models are smarter, context windows are larger, and…

Claude Projects in Cowork: Organizing Multi-Agent Systems With File Context and Instructions

March 21, 2026

If you’ve spent any time building multi-agent systems with Claude, you’ve probably hit the same wall: context gets stale, instructions drift between sessions, and keeping a shared “brain” across agents requires duct tape and prayer. Claude Projects is Anthropic’s answer to that problem — a persistent workspace that holds files, custom instructions, and conversation history in one place. When you layer that on top of a structured development environment like Cowork, you get something genuinely useful for complex agent architectures. This isn’t a surface-level walkthrough of clicking buttons in the Claude UI. We’re going to cover how to structure real…

Model Context Protocol (MCP) Servers for Claude: Building Production-Grade Tool Integrations

March 21, 2026

If you’ve spent any time building Claude agents that call external tools, you’ve hit the same wall: every integration is a custom snowflake. Database connector wired one way, web scraper wired another, internal API doing something completely different. MCP servers for Claude — part of Anthropic’s Model Context Protocol — exist to solve exactly this problem. One standard protocol, consistent tooling, and an agent surface that actually scales. This article walks you through building production-grade MCP server integrations from actual implementation experience, not the getting-started docs. What MCP Actually Is (And What the Docs Understate) The Model Context Protocol is…

Building Domain-Specific Embedding Models in 24 Hours: HuggingFace’s Fast-Track Method

March 21, 2026

Generic embeddings are leaving performance on the table. If you’re building a RAG pipeline for legal contracts, medical records, or e-commerce product catalogs, the off-the-shelf text-embedding-ada-002 or all-MiniLM-L6-v2 models were trained on general web text — not your domain. The result is retrieval that feels almost right but keeps surfacing the wrong chunks at the worst moments. Domain-specific embeddings fix this, and HuggingFace’s ecosystem now makes it possible to build them in a single working day without a PhD in NLP or a $50k GPU bill. This isn’t a tutorial about fine-tuning BERT for 72 hours on eight A100s. It’s…

Multi-Token Prediction (MTP) for Faster LLM Inference: How Qwen 3.5 Changes Agent Speed

March 21, 2026

If you’ve been running LLM agents in production, you already know where the time goes: it’s not the prefill, it’s the decode phase. Every token generated sequentially, one at a time, is a fundamental bottleneck that no amount of hardware throwing solves cleanly. Multi-token prediction is the architectural change that directly attacks this problem — and with Qwen’s roadmap and MLX’s upcoming support, it’s about to move from research curiosity to something you can actually deploy. This article is about what MTP means practically for agent developers: how it works, what the real latency gains look like, how to measure…