By the end of this tutorial, you’ll have a working GitHub Actions workflow that sends every pull request diff to Claude, gets back structured feedback on bugs, security issues, and style violations, and posts that feedback directly as a PR comment. Automated code review with Claude fills the gap between static linters (which catch syntax problems) and human reviewers (who catch logic problems) — and it runs in under 30 seconds per PR.
ESLint won’t tell you that your database query will cause N+1 problems at scale. Bandit won’t notice that you’re logging a full request object that contains PII. A human reviewer might, but they’re busy and reviews get rushed. Claude reads the diff with full context and explains why something is a problem — not just that it is.
- Install dependencies — set up the Python review script and required packages
- Write the Claude review function — build the prompt and parse structured output
- Extract the PR diff — pull the diff from GitHub’s API in the action runner
- Post feedback as PR comments — write findings back to GitHub as inline or summary comments
- Wire up the GitHub Action — the full workflow YAML with secrets and triggers
Why static linters aren’t enough (and why Claude is the right fit)
Static tools are deterministic — they pattern-match against known bad code. That’s their strength and their ceiling. Claude reasons about intent. When you pass it a diff with surrounding context, it can spot that the new retry_count variable shadows an outer-scope variable, that an error handler is swallowing exceptions silently, or that a new API endpoint is missing authentication even though all the other routes have it.
I’ve tested both Claude 3.5 Sonnet and Claude 3 Haiku for code review tasks. Sonnet catches more subtle issues and writes clearer explanations — I’d use it for teams where review quality matters more than cost. Haiku runs at roughly $0.002–$0.004 per average PR diff and is fine for high-volume repos where you want a quick sanity check. For the architecture tradeoffs between models, this Claude vs GPT-4 code generation benchmark has detailed accuracy comparisons that are directly applicable here.
The other thing Claude does that static tools can’t: it explains feedback in plain English aimed at the person who wrote the code. That matters for junior developers who need to understand the reasoning, not just get a rule ID.
Step 1: Install dependencies
You need three things: the Anthropic Python SDK, httpx for GitHub API calls, and PyGithub for posting comments. Keep this lean — the action runner installs these on every run.
```text
# requirements.txt
anthropic==0.25.0
PyGithub==2.3.0
httpx==0.27.0
```
Pin the versions. The Anthropic SDK has had breaking changes between minor versions — if you don’t pin, a dependency update will silently break your review pipeline on a Friday afternoon.
Step 2: Write the Claude review function
This is the core of the system. The prompt structure matters a lot here — you want Claude to return structured JSON so you can parse findings programmatically and decide which ones become blocking comments vs. informational notes.
````python
import anthropic
import json

client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from env

SYSTEM_PROMPT = """You are a senior software engineer performing a pull request code review.
Analyze the provided diff carefully and return a JSON object with this exact structure:

{
  "summary": "2-3 sentence overall assessment",
  "severity": "blocking" | "warning" | "info",
  "findings": [
    {
      "type": "bug" | "security" | "performance" | "style" | "logic",
      "severity": "blocking" | "warning" | "info",
      "file": "path/to/file.py",
      "line_hint": "approximate line or range from the diff",
      "description": "what the issue is",
      "suggestion": "concrete fix or alternative approach"
    }
  ]
}

Rules:
- Only report actual problems, not style preferences unless they're consistency violations
- Mark as 'blocking' only if the code could cause data loss, security breach, or crash in production
- Be specific: reference actual variable names, function names, line numbers from the diff
- Keep descriptions under 100 words each
- If the diff looks clean, return an empty findings array — do not manufacture issues"""


def review_diff(diff: str, pr_title: str, pr_description: str) -> dict:
    """Send diff to Claude and return parsed review findings."""
    # Truncate very large diffs to stay within token limits.
    # Claude 3.5 Sonnet has a 200k context window, but large diffs get expensive fast.
    max_diff_chars = 40000
    if len(diff) > max_diff_chars:
        diff = diff[:max_diff_chars] + "\n\n[DIFF TRUNCATED - showing first 40k chars]"

    message = client.messages.create(
        model="claude-3-5-sonnet-20241022",
        max_tokens=2048,
        system=SYSTEM_PROMPT,
        messages=[
            {
                "role": "user",
                "content": f"""PR Title: {pr_title}
PR Description: {pr_description or 'No description provided'}

Diff to review:
```
{diff}
```

Return only valid JSON matching the schema above.""",
            }
        ],
    )

    response_text = message.content[0].text
    # Strip markdown code fences if Claude wraps the JSON
    if response_text.startswith("```"):
        response_text = response_text.split("```")[1]
        if response_text.startswith("json"):
            response_text = response_text[4:]
    return json.loads(response_text.strip())
````
The system prompt is doing a lot of work here. The strict JSON schema means you can validate the output before posting it. The explicit “only report actual problems” instruction prevents Claude from padding the review with nitpicks — this is important for developer trust. If the tool cries wolf on every PR, people start ignoring it. For more on getting consistent structured output from Claude, see this guide on reducing LLM hallucinations with structured outputs.
Step 3: Extract the PR diff
GitHub Actions gives you the PR number via github.event.pull_request.number. You pull the diff using the GitHub API with an Accept: application/vnd.github.v3.diff header.
```python
import httpx


def get_pr_diff(repo: str, pr_number: int, token: str) -> str:
    """Fetch the unified diff for a pull request."""
    headers = {
        "Authorization": f"Bearer {token}",
        "Accept": "application/vnd.github.v3.diff",
        "X-GitHub-Api-Version": "2022-11-28",
    }
    url = f"https://api.github.com/repos/{repo}/pulls/{pr_number}"
    response = httpx.get(url, headers=headers, timeout=30)
    response.raise_for_status()
    return response.text  # GitHub returns the raw diff when this Accept header is set


def get_pr_metadata(repo: str, pr_number: int, token: str) -> dict:
    """Get PR title and body for context."""
    headers = {
        "Authorization": f"Bearer {token}",
        "Accept": "application/vnd.github+json",
        "X-GitHub-Api-Version": "2022-11-28",
    }
    url = f"https://api.github.com/repos/{repo}/pulls/{pr_number}"
    response = httpx.get(url, headers=headers, timeout=30)
    response.raise_for_status()
    data = response.json()
    # "body" is always present in the response but can be None,
    # so data.get("body", "") would still return None here
    return {"title": data["title"], "body": data["body"] or ""}
```
Step 4: Post feedback as PR comments
Post the summary as a single PR comment rather than trying to do line-by-line inline comments. Inline comments require an exact commit SHA and position mapping against the diff, which is fragile and breaks when the diff format changes. A well-structured summary comment is more reliable and easier to read.
```python
from github import Github


def format_comment(review: dict) -> str:
    """Format the Claude review as a GitHub markdown comment."""
    severity_emoji = {
        "blocking": "🔴",
        "warning": "🟡",
        "info": "🔵",
    }
    type_labels = {
        "bug": "Bug",
        "security": "Security",
        "performance": "Performance",
        "style": "Style",
        "logic": "Logic",
    }

    lines = [
        "## 🤖 Claude Code Review",
        "",
        f"**Overall:** {review['summary']}",
        "",
    ]

    if not review.get("findings"):
        lines.append("✅ No significant issues found.")
        return "\n".join(lines)

    # Group findings by severity
    blocking = [f for f in review["findings"] if f["severity"] == "blocking"]
    warnings = [f for f in review["findings"] if f["severity"] == "warning"]
    info = [f for f in review["findings"] if f["severity"] == "info"]

    # Carry the severity key alongside the display label so the emoji
    # lookup works ("Warnings".lower() is not a key in severity_emoji)
    for group, severity, label in [
        (blocking, "blocking", "Blocking"),
        (warnings, "warning", "Warnings"),
        (info, "info", "Info"),
    ]:
        if not group:
            continue
        lines.append(f"### {severity_emoji[severity]} {label}")
        for finding in group:
            type_label = type_labels.get(finding["type"], finding["type"].title())
            lines.extend([
                f"**[{type_label}]** `{finding.get('file', 'unknown')}` {finding.get('line_hint', '')}",
                f"{finding['description']}",
                f"💡 {finding['suggestion']}",
                "",
            ])

    lines.append("---")
    lines.append("*Generated by Claude 3.5 Sonnet — review suggestions before acting*")
    return "\n".join(lines)


def post_review_comment(repo_name: str, pr_number: int, token: str, comment_body: str):
    """Post the formatted review as a PR comment."""
    g = Github(token)
    repo = g.get_repo(repo_name)
    pr = repo.get_pull(pr_number)

    # Delete previous bot comments to avoid spam on re-runs
    for comment in pr.get_issue_comments():
        if comment.user.login == "github-actions[bot]" and "Claude Code Review" in comment.body:
            comment.delete()

    pr.create_issue_comment(comment_body)
```
Step 5: Wire up the GitHub Action
Now the glue. Create .github/workflows/claude-review.yml in your repo. You need one repository secret, ANTHROPIC_API_KEY; the default GITHUB_TOKEN is available automatically.
```yaml
name: Claude Code Review

on:
  pull_request:
    types: [opened, synchronize, reopened]
    # Optionally restrict to specific paths
    # paths: ['src/**', 'api/**']

permissions:
  pull-requests: write  # Required to post comments
  contents: read

jobs:
  review:
    runs-on: ubuntu-latest
    # Skip draft PRs — uncomment if you don't want reviews on WIP
    # if: github.event.pull_request.draft == false
    steps:
      - uses: actions/checkout@v4

      - name: Set up Python
        uses: actions/setup-python@v5
        with:
          python-version: '3.11'

      - name: Install dependencies
        run: pip install -r .github/review/requirements.txt

      - name: Run Claude review
        env:
          ANTHROPIC_API_KEY: ${{ secrets.ANTHROPIC_API_KEY }}
          GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }}
          PR_NUMBER: ${{ github.event.pull_request.number }}
          REPO: ${{ github.repository }}
        run: python .github/review/review.py
```
And the entry point script that ties it all together:
```python
# .github/review/review.py
import os
import sys

from diff_fetcher import get_pr_diff, get_pr_metadata
from claude_reviewer import review_diff
from comment_poster import format_comment, post_review_comment


def main():
    token = os.environ["GITHUB_TOKEN"]
    repo = os.environ["REPO"]
    pr_number = int(os.environ["PR_NUMBER"])

    print(f"Reviewing PR #{pr_number} in {repo}")
    diff = get_pr_diff(repo, pr_number, token)
    if not diff.strip():
        print("Empty diff — nothing to review")
        sys.exit(0)

    metadata = get_pr_metadata(repo, pr_number, token)
    review = review_diff(diff, metadata["title"], metadata["body"])
    comment = format_comment(review)
    post_review_comment(repo, pr_number, token, comment)
    print("Review posted successfully")

    # Exit with an error code if blocking issues were found.
    # This lets you optionally make the check required in branch protection.
    if review.get("severity") == "blocking":
        sys.exit(1)  # Remove this if you don't want to block merges


if __name__ == "__main__":
    main()
```
Using Claude 3.5 Sonnet at current pricing, a typical PR with 200 lines changed costs roughly $0.008–$0.015. If you run this on every PR in an active team repo (say 20 PRs/day), you’re looking at ~$3–4/day. That’s cheap compared to the engineering time a missed bug costs. For high-volume repos, switch to Haiku and reserve Sonnet for PRs that touch security-sensitive files by checking the diff path before choosing the model.
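That path-based routing can be sketched with a small helper. The function names and path patterns below are illustrative assumptions, not a canonical list; adjust them to your repo.

```python
import re

# Example sensitive-path patterns, matching the directories this
# tutorial treats as security-relevant (src/auth/**, api/**)
SENSITIVE_PATTERNS = [r"^src/auth/", r"^api/"]


def changed_paths(diff: str) -> list[str]:
    """Extract the new-side paths from 'diff --git a/... b/...' headers."""
    return re.findall(r"^diff --git a/\S+ b/(\S+)", diff, flags=re.MULTILINE)


def pick_model(diff: str) -> str:
    """Route to Sonnet when the diff touches sensitive paths, Haiku otherwise."""
    for path in changed_paths(diff):
        if any(re.search(p, path) for p in SENSITIVE_PATTERNS):
            return "claude-3-5-sonnet-20241022"
    return "claude-3-haiku-20240307"
```

You would then pass the result as the `model` argument in `review_diff` from Step 2 instead of the hardcoded string.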
It’s also worth building in retry logic for API timeouts — the GitHub Actions runner allows jobs to run for up to 6 hours, but Claude API calls occasionally time out under load. The patterns in this article on LLM fallback and retry logic for production apply directly here.
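A minimal sketch of that retry wrapper, using plain exponential backoff (the name `with_retries` is hypothetical; a library like tenacity gives you jitter and per-exception policies if you outgrow this):

```python
import time


def with_retries(fn, attempts=3, base_delay=1.0):
    """Call fn(), retrying on any exception with exponential backoff.
    Delays are base_delay, 2*base_delay, 4*base_delay, and so on."""
    for attempt in range(attempts):
        try:
            return fn()
        except Exception:
            if attempt == attempts - 1:
                raise  # out of attempts: surface the error and fail the action
            time.sleep(base_delay * (2 ** attempt))


# Usage: wrap the Claude call from Step 2
# review = with_retries(lambda: review_diff(diff, title, body))
```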
Common errors
JSON parsing fails on Claude’s response
This happens when Claude wraps the JSON in a markdown code fence, which it does intermittently despite instructions not to. The fix is in the review function above — strip the fence before parsing. If you’re still getting failures, add a fallback: catch the json.JSONDecodeError and post Claude’s raw text as the comment instead of crashing the action.
```python
try:
    review = json.loads(response_text.strip())
except json.JSONDecodeError:
    # Fallback: post the raw response if JSON parsing fails
    return {"summary": response_text, "severity": "info", "findings": []}
```
GitHub token permission denied when posting comments
The most common cause: your repository settings require the workflow to have explicit write permissions and the default token doesn’t have them. Check Settings → Actions → General → Workflow permissions and set it to “Read and write permissions.” Alternatively, add permissions: pull-requests: write at the job level (already in the YAML above) — this is the safer approach since it’s explicit per-workflow.
Large diffs cause context window or cost issues
If someone opens a PR with 2,000 lines changed, you’ll either hit token limits or get a surprisingly large invoice. Two solutions: (1) the truncation logic in Step 2 handles the token limit issue — set max_diff_chars to what fits your budget. (2) For very large PRs, consider filtering the diff to only include files matching your sensitive path patterns (src/auth/**, api/**) and skip reviewing auto-generated files, lock files, and migrations unless they contain logic.
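A sketch of that filtering step: split the unified diff into per-file chunks on the `diff --git` headers and drop generated files. The skip patterns here are example assumptions; tune them for your repo.

```python
import re

# Example patterns for files that rarely need review
SKIP_PATTERNS = [r"\.lock$", r"package-lock\.json$", r"\.min\.js$"]


def filter_diff(diff: str, skip_patterns=SKIP_PATTERNS) -> str:
    """Split a unified diff into per-file chunks and drop any file whose
    path matches a skip pattern."""
    chunks = re.split(r"(?m)^(?=diff --git )", diff)
    kept = []
    for chunk in chunks:
        if not chunk.strip():
            continue
        match = re.match(r"diff --git a/\S+ b/(\S+)", chunk)
        path = match.group(1) if match else ""
        if any(re.search(p, path) for p in skip_patterns):
            continue
        kept.append(chunk)
    return "".join(kept)
```

Call this on the diff before the truncation logic in `review_diff`, so the character budget is spent on code that actually matters.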
What to build next
The natural extension is file-aware context injection: before sending the diff to Claude, pull in the full content of files that were modified (not just the diff lines) so Claude can see the surrounding code. This dramatically improves detection of things like broken interface implementations where the change looks fine in isolation but the existing code it’s supposed to satisfy isn’t visible in the diff. You’d use the GitHub Contents API to fetch each modified file, prepend it to the prompt under a “Full file context” section, and update the system prompt to instruct Claude to check the change against the full file. Keep an eye on token counts — this can get expensive quickly on large files.
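The fetch itself is a GET against the Contents API (`/repos/{owner}/{repo}/contents/{path}` with a `ref` query parameter, decoding the base64 `content` field in the response). The prompt-assembly half can be sketched as a pure helper; `build_context` is a hypothetical name, and the character budget is an assumption you should tune against your token costs:

```python
def build_context(files: dict[str, str], max_chars: int = 60000) -> str:
    """Assemble a 'Full file context' section from {path: content},
    skipping files once a character budget is spent so token costs
    stay bounded. Contents come from GitHub's Contents API."""
    sections = ["Full file context:\n"]
    used = 0
    for path, content in files.items():
        block = f"\n=== {path} ===\n{content}\n"
        if used + len(block) > max_chars:
            sections.append(f"\n[{path} omitted: context budget exhausted]\n")
            continue
        sections.append(block)
        used += len(block)
    return "".join(sections)
```

The resulting string would be prepended to the user message in `review_diff`, with a matching instruction added to the system prompt.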
You can also extend this with tool use to let Claude call out to specific checkers — running bandit on Python files and feeding the output back into the review context. The Claude tool use with Python guide covers the implementation pattern for that kind of hybrid approach.
Frequently Asked Questions
How much does automated code review with Claude cost per month?
With Claude 3.5 Sonnet, expect $0.008–$0.015 per PR for typical diffs (150–300 lines). A team of 5 developers opening ~60 PRs/month would cost roughly $1–2/month. Switching to Claude Haiku brings that under $0.50/month for the same volume, but you’ll lose some nuance in the feedback quality.
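To sanity-check your own volume, the arithmetic is simple. The rates below are Claude 3.5 Sonnet's published per-million-token prices at time of writing ($3 input, $15 output); verify current pricing before budgeting on this.

```python
def estimate_cost(input_tokens: int, output_tokens: int,
                  in_rate: float = 3.0, out_rate: float = 15.0) -> float:
    """Cost in USD given per-million-token rates."""
    return (input_tokens * in_rate + output_tokens * out_rate) / 1_000_000


# A typical 150-300 line diff plus the system prompt is very roughly
# 3,000 input tokens, and a structured review a few hundred output tokens:
# estimate_cost(3000, 400) lands at the upper end of the per-PR range above.
```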
Can I use this to block PR merges on blocking issues?
Yes — the script exits with code 1 when Claude returns a “blocking” severity finding. If you set the GitHub Action as a required status check in branch protection rules (Settings → Branches → Require status checks), PRs with blocking issues won’t be mergeable until they’re addressed or the check is manually bypassed. Be careful: this only works reliably if you also add retry logic so a Claude API timeout doesn’t accidentally block a PR forever.
How do I prevent Claude from flagging the same issues it already reviewed in a previous commit?
The simplest approach: delete and repost the review comment on every push (the code above already does this). For smarter deduplication, store the list of findings from the last review in a GitHub Actions cache keyed by the base commit SHA, and filter out any finding that appeared in the previous run. This is more complex to implement but prevents repeated noise in active PRs.
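The filtering step of that smarter approach might look like the helper below. Keying on a (file, type, description) tuple is a simplistic identity choice made here for illustration; Claude's description wording can drift between runs, so a fuzzier key (file plus type only, say) may suppress noise better.

```python
def dedupe_findings(current: list[dict], previous: list[dict]) -> list[dict]:
    """Drop findings that already appeared in the previous run's review,
    matched on an exact (file, type, description) tuple."""
    seen = {
        (f.get("file"), f.get("type"), f.get("description"))
        for f in previous
    }
    return [
        f for f in current
        if (f.get("file"), f.get("type"), f.get("description")) not in seen
    ]
```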
What’s the difference between this approach and using a tool like CodeRabbit or Greptile?
Hosted tools like CodeRabbit give you a polished UI, inline comments, and configurable rules without any setup — you’ll have reviews running in 10 minutes. The tradeoff is data residency (your code goes to their servers), less control over the prompt and model, and a monthly subscription ($19–$29/seat/month). The approach in this tutorial lets you customize exactly what gets reviewed, keep code on GitHub’s infrastructure only, and control costs directly against the Anthropic API.
How do I restrict the review to only certain file types or directories?
Two options: (1) Add path filters to the GitHub Action trigger (paths: ['src/**/*.py', 'api/**']) so the workflow only runs when relevant files change. (2) Filter the diff in Python before sending to Claude by splitting on diff --git and keeping only chunks for file paths matching your patterns. Option 2 is more flexible since the workflow still runs on every PR but only reviews the parts you care about.
Put this into practice
Try the Review Agent — ready to use, no setup required.
Editorial note: API pricing, model capabilities, and tool features change frequently — always verify current details on the vendor’s website before building in production. Code examples are tested at time of writing; pin your dependency versions to avoid breaking changes. Some links in this article may be affiliate links — we may earn a commission if you sign up, at no extra cost to you.

