AI Code Review via GitHub Actions: Setting Up @claude on Your Repos
I've had Claude reviewing PRs on my repositories for a while now. Not as the sole reviewer — as a first pass before human eyes. It catches things. Not everything, and not always the most important things, but enough that it earns its place in the workflow.
Here's how to set it up, and more importantly, how to make it actually useful.
The Basic Setup
You need a GitHub Action that triggers on pull request events and sends the diff to Claude for review. Here's a working workflow:
```yaml
name: AI Code Review

on:
  pull_request:
    types: [opened, synchronize]

permissions:
  contents: read
  pull-requests: write

jobs:
  review:
    runs-on: ubuntu-latest
    if: |
      !contains(github.event.pull_request.labels.*.name, 'skip-ai-review')
    steps:
      - uses: actions/checkout@v4
        with:
          fetch-depth: 0

      - name: Get PR diff
        id: diff
        run: |
          git diff origin/${{ github.event.pull_request.base.ref }}...HEAD > /tmp/pr_diff.txt

      - name: Claude Review
        uses: anthropics/claude-code-action@v1
        with:
          anthropic_api_key: ${{ secrets.ANTHROPIC_API_KEY }}
          model: claude-sonnet-4-20250514
          prompt: |
            Review this pull request diff. Focus on:
            1. Bugs or logic errors
            2. Security concerns
            3. Performance issues
            4. Code style inconsistencies

            Be concise. Only comment on things that matter.
            Don't suggest stylistic changes unless they affect readability significantly.
            If the code looks good, say so briefly.
```
That's the skeleton. The real work is in tuning the prompt and permissions.
Permission Scoping
The permissions block is critical. Claude needs:
- `contents: read` — to see the code
- `pull-requests: write` — to post review comments
That's it. Don't give it contents: write. Don't give it access to secrets or deployment. The principle of least privilege applies to AI just as much as it does to human users. More so, actually, because AI can be unpredictable in what it decides to do.
If you're using the @claude mention approach (where you tag Claude in a PR comment and it responds), you'll also need:
```yaml
on:
  issue_comment:
    types: [created]
```
With a condition that checks the comment body for the mention. This lets team members explicitly ask Claude to review specific things: "@claude is this SQL injection safe?" or "@claude what's the performance implication of this query?"
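One way to express that condition as a job-level guard — the job name here is illustrative, and the `github.event.issue.pull_request` check relies on the fact that issue comments made on PRs carry a `pull_request` key in the event payload:

```yaml
jobs:
  mention-review:
    # Only run for comments on PRs (not plain issues) that mention Claude.
    if: |
      github.event.issue.pull_request &&
      contains(github.event.comment.body, '@claude')
    runs-on: ubuntu-latest
```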
What It Actually Catches
After running this on dozens of PRs across multiple projects, here's an honest assessment:
Genuinely useful catches:
- Null reference issues. It's surprisingly good at spotting cases where you access a property that might not exist. "This method returns nullable, but you're chaining on it without checking."
- Missing error handling. If you make an API call or database query without try/catch in a context where you should, it'll flag it.
- SQL injection vectors. Any raw query construction gets flagged immediately. Even the subtle ones where you interpolate user input into a `whereRaw`.
- Type mismatches. Especially in PHP where you might pass an int where a string is expected. It reads the function signatures and checks.
- Missing validation rules. If a controller uses request input that isn't covered by the form request validation, it'll notice.
Sometimes useful:
- Performance suggestions. "This query inside a loop will cause N+1" — yes, good catch. "Consider caching this" — maybe, maybe not, depends on context you don't have.
- Test coverage gaps. "This branch isn't covered by the tests in this PR" — often correct, but sometimes the test is in a different file or the branch is covered by integration tests.
Rarely useful:
- Architecture feedback. "Consider using the repository pattern here" — no thank you, Claude, I have my own architecture opinions.
- Naming suggestions. These are almost always worse than what I had.
What It Misses
Business logic errors. Claude doesn't know that annual subscriptions should prorate differently than monthly ones. It can see the code runs without errors, but it can't tell you the calculation is wrong because it doesn't know the business rules.
Historical context. "We tried this approach six months ago and it caused issues because of X." That's institutional knowledge that no AI reviewer has.
Team dynamics. A human reviewer might say "this works but it's going to confuse the juniors" or "this is fine but it breaks the pattern we agreed on last week." Claude can't do that.
Subtle concurrency issues. Race conditions, deadlocks, timing bugs — these are hard enough for experienced humans to spot in a code review. AI catches the obvious ones and misses the subtle ones.
Tuning the Prompt
The default prompt matters more than you'd think. Here's what I've learned through iteration:
Tell it what NOT to do. Without constraints, Claude will comment on everything — formatting, naming, style, documentation. You'll get 30 comments on a 50-line PR and most will be noise. Explicitly say: "Don't comment on formatting. Don't suggest documentation changes. Only flag issues that could cause bugs, security problems, or significant maintenance burden."
Give it project context. If your project uses Actions instead of service classes, tell it. If you use DTOs for data transfer, tell it. Otherwise, it'll suggest patterns your codebase doesn't use.
```yaml
prompt: |
  You are reviewing code for a Laravel 12 application that follows these conventions:
  - Business logic lives in Action classes, not controllers
  - DTOs are used for data transfer to the frontend via Inertia
  - Tests are written in Pest, not PHPUnit
  - We use Pint for formatting, so don't comment on style

  Focus only on: bugs, security, performance, logic errors.
  If the code is solid, say "Looks good" and nothing else.
```
Set a threshold. I use a "would I reject this PR over it?" test. If the answer is no, Claude shouldn't comment. This dramatically reduces noise.
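That threshold can be stated directly in the prompt — the wording below is mine, not a fixed template:

```yaml
prompt: |
  Before posting any comment, ask: would a human reviewer block this PR
  over this issue? If the answer is no, don't post the comment.
```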
The Claudavel Templates
If you're using Claudavel (the Laravel + Claude integration package), it ships with workflow templates that handle a lot of this for you. The templates include:
- Pre-configured prompts tuned for Laravel codebases
- Automatic detection of framework patterns (routes, migrations, policies)
- Integration with Pint (skip style comments if Pint is configured)
- Smart filtering to ignore auto-generated files (IDE helpers, Wayfinder output)
You can pull the templates from the Claudavel repo and customise them for your project. They're a better starting point than writing from scratch.
The Skip Label
Notice the `skip-ai-review` label check in the workflow:
```yaml
if: |
  !contains(github.event.pull_request.labels.*.name, 'skip-ai-review')
```
This is essential. Some PRs don't need AI review — dependency updates, generated files, documentation-only changes. Having a label to skip the review saves API costs and reduces noise. I also skip it for PRs that are clearly draft or WIP.
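The draft skip can live in the same `if` expression — a sketch combining both checks, using the `draft` field GitHub exposes on the pull request payload:

```yaml
if: |
  !contains(github.event.pull_request.labels.*.name, 'skip-ai-review') &&
  github.event.pull_request.draft == false
```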
Cost and Performance
Running Claude Sonnet on PR diffs typically costs between $0.01 and $0.10 per review, depending on diff size. For a team doing 5-10 PRs a day, that's pocket change. The Action typically runs in 30-60 seconds, so it doesn't significantly delay your CI pipeline.
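As a rough sanity check on those numbers, here's the arithmetic. The per-token rates are assumptions based on published Sonnet pricing (USD 3 per million input tokens, USD 15 per million output) — substitute current prices before relying on this:

```python
def review_cost(input_tokens: int, output_tokens: int,
                in_rate: float = 3.00, out_rate: float = 15.00) -> float:
    """Estimate one review's cost in USD.

    Rates are USD per million tokens (assumed Sonnet pricing; check the
    current price list, as these change).
    """
    return (input_tokens / 1_000_000) * in_rate + (output_tokens / 1_000_000) * out_rate

# A mid-sized PR: ~3,000 tokens of diff plus prompt in, ~500 tokens of review out.
print(f"${review_cost(3_000, 500):.4f}")  # $0.0165 — within the $0.01-$0.10 range
```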
If cost matters (and it might if you're on a large team with many PRs), use Haiku for the initial pass and only escalate to Sonnet for PRs that touch sensitive areas — payments, authentication, data handling.
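One way to sketch that escalation inside the workflow — the path patterns are illustrative, and the model IDs are assumptions that may have newer versions by the time you read this:

```yaml
      - name: Pick model by touched paths
        id: model
        run: |
          # Escalate to Sonnet when the diff touches sensitive areas.
          if git diff --name-only origin/${{ github.event.pull_request.base.ref }}...HEAD \
              | grep -qE '(Payment|Auth|Billing)'; then
            echo "model=claude-sonnet-4-20250514" >> "$GITHUB_OUTPUT"
          else
            echo "model=claude-3-5-haiku-20241022" >> "$GITHUB_OUTPUT"
          fi

      - name: Claude Review
        uses: anthropics/claude-code-action@v1
        with:
          anthropic_api_key: ${{ secrets.ANTHROPIC_API_KEY }}
          model: ${{ steps.model.outputs.model }}
```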
My Honest Take
AI code review is worth setting up. It takes 15 minutes, costs almost nothing, and catches 2-3 real issues per week that I might have missed on a quick human review. That's enough to justify its existence.
But it's a supplement, not a replacement. The things I value most in human code review — "is this the right approach?", "does this align with where we're heading?", "will this cause problems in six months?" — those require understanding that AI doesn't have.
Set it up, tune the prompt, use it as a first pass. Then still review the code yourself. The best outcome is when Claude catches the mechanical issues so you can focus your human review time on the stuff that actually requires thought.
I write about Laravel, AI tooling, and the realities of building software. More at stuartmason.co.uk.