By the end of this tutorial, you’ll have a fully working custom skill built with the Claude Agent SDK — defined, tested locally, and wired into a live agent that can call it. We’re building a database query skill: something realistic enough to show you where the sharp edges are, not a toy “hello world” example.
If you’ve read the architecture comparison between the Claude Agent SDK and plain Claude API, you already know the SDK adds structured tool dispatch, session handling, and a cleaner interface for skill registration. This tutorial picks up where that overview leaves off and gets into implementation.
- Install dependencies — set up the SDK, environment, and project structure
- Define the skill schema — write the JSON schema Claude uses to understand your skill
- Implement the skill handler — write the Python function that actually executes
- Register the skill with an agent — attach it to a Claude agent instance
- Test locally without API calls — validate schema and handler in isolation
- Run an end-to-end agent call — invoke Claude and confirm tool use fires correctly
- Deploy to production — environment config, error handling, and logging
Step 1: Install Dependencies
You need Python 3.10+, the Anthropic SDK (which includes the agent/tool scaffolding), and a lightweight SQLite setup for the database skill we’re building. No heavy ORM required.
pip install "anthropic>=0.25.0" python-dotenv sqlite-utils
Create your project structure:
mkdir claude-skill-demo && cd claude-skill-demo
mkdir -p skills data
touch main.py skills/db_query.py .env
In your .env:
ANTHROPIC_API_KEY=sk-ant-...
DB_PATH=./data/demo.db
Note on SDK versions: Anthropic moves fast. Pin anthropic>=0.25.0,<0.26.0 in production until you’ve validated the upgrade. Tool call interfaces have shifted between minor versions.
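One way to capture that pin is a requirements.txt, assuming a plain pip workflow (the lower bounds for the two helper packages are illustrative, not tested minimums):

```text
# requirements.txt
anthropic>=0.25.0,<0.26.0
python-dotenv>=1.0
sqlite-utils>=3.0
```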
Step 2: Define the Skill Schema
The schema is what Claude sees. It determines whether Claude knows when to call your skill and what arguments to pass. A vague description means Claude will either over-call or under-call your skill — both are painful to debug.
# skills/db_query.py

DB_QUERY_SCHEMA = {
    "name": "query_database",
    "description": (
        "Execute a read-only SQL query against the product database. "
        "Use this when the user asks for specific data about orders, "
        "customers, or inventory. Do NOT use for write operations."
    ),
    "input_schema": {
        "type": "object",
        "properties": {
            "sql": {
                "type": "string",
                "description": (
                    "A valid SQLite SELECT statement. "
                    "Only SELECT queries are permitted."
                )
            },
            "limit": {
                "type": "integer",
                "description": "Maximum rows to return. Defaults to 10.",
                "default": 10
            }
        },
        "required": ["sql"]
    }
}
Three things that matter here: the description field needs to tell Claude when to use the tool (not just what it does), required fields must be genuinely required, and you should explicitly block operations you don’t want — Claude respects “do NOT use for write operations” more reliably than you’d expect.
Step 3: Implement the Skill Handler
The handler receives the validated inputs from Claude’s tool call and returns a result. Keep it to a single responsibility — this is not where you add retry logic or caching.
# skills/db_query.py (continued)
import sqlite3
import json
import os

def handle_db_query(inputs: dict) -> str:
    """
    Execute a read-only SQL query and return results as a JSON string.
    The return value must be a string — Claude receives it as tool_result content.
    """
    sql = inputs["sql"].strip()
    limit = inputs.get("limit", 10)

    # Hard block on write operations — don't rely solely on the schema description
    forbidden = ["insert", "update", "delete", "drop", "alter", "create"]
    if any(sql.lower().startswith(kw) for kw in forbidden):
        return json.dumps({"error": "Write operations are not permitted."})

    # Enforce limit at the handler level too
    if "limit" not in sql.lower():
        sql = f"{sql} LIMIT {limit}"

    db_path = os.getenv("DB_PATH", "./data/demo.db")
    try:
        conn = sqlite3.connect(db_path)
        conn.row_factory = sqlite3.Row
        cursor = conn.cursor()
        cursor.execute(sql)
        rows = [dict(row) for row in cursor.fetchall()]
        conn.close()
        return json.dumps({"rows": rows, "count": len(rows)})
    except sqlite3.Error as e:
        return json.dumps({"error": str(e)})
Return type must be a string. The SDK wraps your return value in a tool_result block and sends it back to Claude. If you return a dict, it’ll either fail silently or raise a serialization error depending on SDK version.
Also notice the defense-in-depth: the schema says SELECT only, the handler also checks. This mirrors the patterns in our guide on reducing LLM hallucinations with structured output verification — never trust a single layer.
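You can tighten that second layer further with an allowlist instead of a blocklist. A sketch (`is_safe_select` is a name introduced here, not part of the tutorial's handler): accept only a single statement that begins with SELECT, or WITH for common table expressions, and refuse stacked statements outright.

```python
import re

def is_safe_select(sql: str) -> bool:
    """Allowlist check: accept a single SELECT (or WITH ... SELECT) statement.
    Stricter than the startswith blocklist in the handler, because anything
    that is not explicitly a read is rejected."""
    stripped = sql.strip().rstrip(";")
    if ";" in stripped:
        # More than one statement, e.g. "SELECT 1; DROP TABLE orders"
        return False
    return re.match(r"(?i)^\s*(select|with)\b", stripped) is not None
```

An allowlist fails closed: a write keyword you forgot to list can never slip through.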
Step 4: Register the Skill With an Agent
# main.py
import anthropic
from dotenv import load_dotenv
from skills.db_query import DB_QUERY_SCHEMA, handle_db_query

load_dotenv()
client = anthropic.Anthropic()

# Map tool names to handler functions
SKILL_REGISTRY = {
    "query_database": handle_db_query,
}

TOOLS = [DB_QUERY_SCHEMA]  # Add more skills here as you build them

SYSTEM_PROMPT = (
    "You are a data assistant for an e-commerce team. "
    "When users ask about orders, customers, or inventory, "
    "use the query_database tool to retrieve accurate data. "
    "Always confirm what data you found before interpreting it."
)
The registry pattern — a plain dict mapping tool names to callables — is intentionally simple. As your skill count grows, you can replace this with class-based registration or a plugin loader, but don’t over-engineer it at step zero.
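If you do outgrow the dict, a decorator is one lightweight next step. A sketch, with illustrative names (the `@skill` decorator and the `ping` tool are not SDK features; the SDK only ever sees the resulting schema list and the name-to-handler mapping):

```python
SKILL_REGISTRY: dict = {}
TOOLS: list = []

def skill(schema: dict):
    """Register a handler under the name declared in its schema,
    keeping schema and handler defined side by side."""
    def decorator(fn):
        SKILL_REGISTRY[schema["name"]] = fn
        TOOLS.append(schema)
        return fn
    return decorator

@skill({
    "name": "ping",
    "description": "Health-check tool (illustrative).",
    "input_schema": {"type": "object", "properties": {}},
})
def handle_ping(inputs: dict) -> str:
    return '{"status": "ok"}'
```

Each new skill file then registers itself on import, and the dispatch loop never changes.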
Step 5: Test Locally Without API Calls
Before burning API credits, validate your schema and handler in isolation. This catches the most common errors: wrong required field names, serialization issues, and SQL injection gaps.
# test_skill.py
import json
import jsonschema  # pip install jsonschema
from skills.db_query import DB_QUERY_SCHEMA, handle_db_query

def test_schema_validation():
    """Validate sample inputs against the tool's input schema."""
    # Simulate what Claude would send as tool_use input
    valid_input = {"sql": "SELECT * FROM orders WHERE status = 'pending'"}
    invalid_input = {"limit": 5}  # Missing required 'sql'

    schema = DB_QUERY_SCHEMA["input_schema"]
    try:
        jsonschema.validate(valid_input, schema)
        print("✓ Valid input passes schema")
    except jsonschema.ValidationError as e:
        print(f"✗ Valid input failed: {e.message}")

    try:
        jsonschema.validate(invalid_input, schema)
        print("✗ Invalid input should have failed but didn't")
    except jsonschema.ValidationError:
        print("✓ Invalid input correctly rejected")

def test_handler():
    """Test the handler directly without Claude."""
    result = handle_db_query({"sql": "SELECT name FROM sqlite_master WHERE type='table'"})
    parsed = json.loads(result)
    print(f"Handler result: {parsed}")

    # Test write block
    blocked = handle_db_query({"sql": "DROP TABLE orders"})
    parsed_block = json.loads(blocked)
    assert "error" in parsed_block, "Write operation should be blocked"
    print("✓ Write operation correctly blocked")

if __name__ == "__main__":
    test_schema_validation()
    test_handler()
Run this with python test_skill.py — no API key needed. You want all four checks passing before moving to the live call.
Step 6: Run an End-to-End Agent Call
Now wire it together. The key is the tool use loop: Claude returns a tool_use block, you call your handler, send back a tool_result, and Claude produces its final response.
# main.py (continued)
import json

def run_agent(user_message: str) -> str:
    messages = [{"role": "user", "content": user_message}]

    while True:
        response = client.messages.create(
            model="claude-3-5-haiku-20241022",  # ~$0.0008/1K input tokens
            max_tokens=1024,
            system=SYSTEM_PROMPT,
            tools=TOOLS,
            messages=messages
        )

        # Check stop reason
        if response.stop_reason == "end_turn":
            # Extract text from final response
            for block in response.content:
                if hasattr(block, "text"):
                    return block.text
            return ""

        if response.stop_reason == "tool_use":
            # Append assistant's tool_use message
            messages.append({"role": "assistant", "content": response.content})

            # Process each tool call
            tool_results = []
            for block in response.content:
                if block.type == "tool_use":
                    handler = SKILL_REGISTRY.get(block.name)
                    if handler:
                        result = handler(block.input)
                    else:
                        result = json.dumps({"error": f"Unknown tool: {block.name}"})
                    tool_results.append({
                        "type": "tool_result",
                        "tool_use_id": block.id,
                        "content": result
                    })

            # Append tool results and loop
            messages.append({"role": "user", "content": tool_results})
        else:
            # Unexpected stop reason — surface it
            raise RuntimeError(f"Unexpected stop_reason: {response.stop_reason}")

if __name__ == "__main__":
    answer = run_agent("How many pending orders do we have?")
    print(answer)
The while True loop handles multi-step tool chains — Claude can call tools multiple times in a single conversation turn, and this pattern handles that correctly. Using Haiku here costs roughly $0.0008 per 1K input tokens and $0.004 per 1K output tokens, which makes local testing affordable even if you run it 200 times.
Step 7: Deploy to Production
The local version gets you 80% of the way. Production needs three additions: proper error handling with retries, structured logging, and environment isolation.
# production wrapper — replace the raw API call in run_agent()
import logging
import time

import anthropic

logger = logging.getLogger(__name__)

def create_message_with_retry(client, **kwargs) -> object:
    """
    Simple retry wrapper for transient API errors.
    Covers rate limits and occasional 529s from Anthropic.
    """
    max_retries = 3
    backoff = 2.0

    for attempt in range(max_retries):
        try:
            return client.messages.create(**kwargs)
        except anthropic.RateLimitError:
            if attempt == max_retries - 1:
                raise
            wait = backoff ** attempt
            logger.warning(f"Rate limit hit, retrying in {wait}s (attempt {attempt + 1})")
            time.sleep(wait)
        except anthropic.APIStatusError as e:
            logger.error(f"API error {e.status_code}: {e.message}")
            raise
For a deeper look at production retry patterns, the error handling and fallback logic guide covers exponential backoff, circuit breakers, and graceful degradation in more detail than we can fit here.
Log every tool call with its inputs and outputs — you will need this for debugging. A simple approach:
logger.info(json.dumps({
    "event": "tool_call",
    "tool": block.name,
    "input": block.input,
    "result": result[:500],  # Truncate large results in logs
    "session_id": session_id
}))
Common Errors and How to Fix Them
Error 1: Claude never calls your tool
Most common cause: the description field is too vague or doesn’t connect to user intent. “Query the database” is bad. “Use this when the user asks for specific data about orders, customers, or inventory” is better. Also check you’re passing tools=TOOLS in the API call — I’ve forgotten this more than once. If the tool still doesn’t fire, add tool_choice={"type": "auto"} explicitly, or temporarily use {"type": "any"} to force tool use during debugging.
Error 2: ValidationError on tool_result content
Your handler returned something other than a string. Common culprits: returning a dict directly, returning None on a code path you missed, or returning a Python object that isn’t JSON-serializable. Add assert isinstance(result, str) at the end of every handler during development.
Error 3: Infinite tool loop
If your handler returns an error message and Claude tries again, you can get stuck. Fix it with a loop counter:
max_iterations = 10
iteration = 0

while True:
    iteration += 1
    if iteration > max_iterations:
        raise RuntimeError("Agent exceeded maximum tool call iterations")
    # ... rest of loop
This is especially important if you’re building multi-skill agents where Claude chains tool calls. The Claude tool use with Python guide has more on handling complex tool chains.
What to Build Next
The natural extension of this database query skill is adding a schema discovery skill alongside it — a second tool that lets Claude call list_tables and describe_table before writing a query. This dramatically improves query accuracy because Claude can check column names rather than hallucinating them. Register both tools in the same agent, and Claude will naturally use the discovery tools first when it’s uncertain about the schema structure. That pattern — a “planning” tool paired with an “execution” tool — generalizes to almost every domain you’d want to automate.
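A sketch of what the discovery side might look like, following the same schema-plus-handler shape as query_database (the tool name, the default in-memory database, and the `db_path` parameter are illustrative; in real code you'd read DB_PATH as the handler above does):

```python
import json
import sqlite3

LIST_TABLES_SCHEMA = {
    "name": "list_tables",
    "description": (
        "List all tables in the product database. Use this before "
        "writing a query when you are unsure what tables exist."
    ),
    "input_schema": {"type": "object", "properties": {}},
}

def handle_list_tables(inputs: dict, db_path: str = ":memory:") -> str:
    # Read table names from SQLite's catalog
    conn = sqlite3.connect(db_path)
    rows = conn.execute(
        "SELECT name FROM sqlite_master WHERE type='table'"
    ).fetchall()
    conn.close()
    return json.dumps({"tables": [r[0] for r in rows]})
```

A describe_table tool follows the same pattern with `PRAGMA table_info(...)` as the query.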
Frequently Asked Questions
What is the difference between a Claude skill and a tool in the Agent SDK?
They’re the same concept, just different naming conventions. Anthropic’s documentation uses “tools” in the API reference and “skills” in higher-level agent framing. In the SDK, you define a tool schema and a handler function — together those form what most people call a skill. The JSON schema format is identical regardless of which term you use.
How do I get Claude to always use a specific tool instead of answering from memory?
Set tool_choice={"type": "tool", "name": "your_tool_name"} in the API call to force a specific tool call, or {"type": "any"} to force Claude to use at least one tool. Use this during testing, but in production stick with "auto" unless you have a specific reason to override — forced tool calls can produce worse outputs when the tool isn’t actually relevant.
Can I run multiple skills in parallel in a single agent call?
Yes. Claude can return multiple tool_use blocks in a single response, and you can execute them concurrently before sending all results back in one tool_result batch. Use asyncio.gather() or a thread pool if your handlers do I/O. The loop in this tutorial already handles multiple tool_use blocks in each iteration.
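A minimal sketch of the thread-pool variant (`run_handlers_parallel` and the tuple shape are names introduced here; the fields mirror the id, name, and input on the SDK's tool_use blocks):

```python
import json
from concurrent.futures import ThreadPoolExecutor

def run_handlers_parallel(tool_calls, registry):
    """Execute several tool calls concurrently and return tool_result
    blocks in the same order. tool_calls is a list of
    (tool_use_id, name, inputs) tuples."""
    def run_one(call):
        tool_use_id, name, inputs = call
        handler = registry.get(name)
        result = handler(inputs) if handler else json.dumps(
            {"error": f"Unknown tool: {name}"}
        )
        return {
            "type": "tool_result",
            "tool_use_id": tool_use_id,
            "content": result,
        }

    # pool.map preserves input order, so results line up with tool_use ids
    with ThreadPoolExecutor(max_workers=4) as pool:
        return list(pool.map(run_one, tool_calls))
```

This only pays off when handlers block on I/O; for the SQLite skill in this tutorial, sequential execution is fine.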
How much does it cost to run a Claude agent with tool calls?
Each tool call round-trip adds tokens: Claude’s tool_use block plus your tool_result content both count toward input tokens on the next call. A typical agent call with one tool use on Haiku costs roughly $0.002–$0.005 total depending on result size. Multi-step chains with large database results can hit $0.01–0.03 per run. Always log token usage from response.usage during development so you’re not surprised in production.
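A back-of-envelope helper for that logging habit, using the Haiku rates quoted earlier in this article (verify current pricing before relying on these constants):

```python
# USD per 1K tokens -- the article's quoted Haiku rates, not live pricing
HAIKU_IN_PER_1K = 0.0008
HAIKU_OUT_PER_1K = 0.004

def call_cost(input_tokens: int, output_tokens: int) -> float:
    """Estimate one round-trip's cost from response.usage token counts."""
    return (input_tokens / 1000) * HAIKU_IN_PER_1K \
         + (output_tokens / 1000) * HAIKU_OUT_PER_1K
```

Feed it response.usage.input_tokens and response.usage.output_tokens after each call and sum across the loop.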
What’s the best model to use for skill-based agents in production?
Claude 3.5 Haiku for high-volume tool dispatch where the skill does the heavy lifting — it reliably triggers tools and is ~10x cheaper than Sonnet. Use Claude 3.5 Sonnet when the agent needs to reason about complex tool results or chain multiple tools with nuanced logic. Don’t use Opus for tool calling unless you’re also doing complex reasoning in the same call — the cost delta isn’t justified by tool call quality alone.
Put this into practice
Try the Connection Agent — ready to use, no setup required.
Editorial note: API pricing, model capabilities, and tool features change frequently — always verify current details on the vendor’s website before building in production. Code examples are tested at time of writing; pin your dependency versions to avoid breaking changes. Some links in this article may be affiliate links — we may earn a commission if you sign up, at no extra cost to you.

