GitHub Webhooks to Activity Feed: Normalising Messy Event Data

Stuart Mason · 8 min read

The Progress site shows a developer activity feed — commits, pull requests, releases, that sort of thing. The obvious approach is GitHub webhooks: configure them, receive events, display them. Should take an afternoon.

It took a week. GitHub webhook payloads are a mess.

The Payload Problem

Every GitHub event type has a different payload structure. A push event looks nothing like a pull_request event, which looks nothing like a release event. Here's a taste:

// push event
{
    "ref": "refs/heads/main",
    "commits": [
        {
            "id": "abc123",
            "message": "Fix login bug",
            "author": { "name": "Stu", "email": "[email protected]" }
        }
    ],
    "repository": { "full_name": "stumason/progress" }
}

// pull_request event
{
    "action": "closed",
    "pull_request": {
        "title": "Add dark mode",
        "merged": true,
        "number": 42,
        "user": { "login": "stumason" }
    },
    "repository": { "full_name": "stumason/progress" }
}

// release event
{
    "action": "published",
    "release": {
        "tag_name": "v2.1.0",
        "name": "Version 2.1.0",
        "body": "### Changes\n- Added dark mode\n- Fixed login bug"
    },
    "repository": { "full_name": "stumason/progress" }
}

Three completely different structures. And that's just the well-formed ones. GitHub occasionally sends events with missing fields, null values where you expect objects, or payloads that don't match the documented schema at all.

Webhook Signature Validation

First things first — verify the webhook actually came from GitHub:

use Illuminate\Http\JsonResponse;
use Illuminate\Http\Request;

class GitHubWebhookController extends Controller
{
    public function handle(Request $request): JsonResponse
    {
        $this->validateSignature($request);

        $eventType = $request->header('X-GitHub-Event');
        $payload = $request->json()->all();

        ProcessGitHubEvent::dispatch($eventType, $payload);

        return response()->json(['status' => 'ok']);
    }

    private function validateSignature(Request $request): void
    {
        $secret = config('services.github.webhook_secret');
        $signature = $request->header('X-Hub-Signature-256');

        if (! $signature) {
            abort(401, 'Missing signature');
        }

        $expected = 'sha256=' . hash_hmac('sha256', $request->getContent(), $secret);

        if (! hash_equals($expected, $signature)) {
            abort(401, 'Invalid signature');
        }
    }
}

Always use hash_equals for constant-time comparison. Never use === for signature validation — timing attacks are real.

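Return a 2xx response immediately and process asynchronously. GitHub expects a reply within 10 seconds and records anything slower as a failed delivery. Failed deliveries are not retried automatically, so a missed event stays missed until it is redelivered by hand or through the API.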
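
The signature check is easy to exercise outside Laravel. Here is a minimal, framework-free sketch of the same comparison; the function name isValidGitHubSignature and the sample secret are made up for illustration and are not part of the Progress codebase:

```php
<?php

/**
 * Returns true when $signatureHeader matches the HMAC-SHA256 of the raw body.
 * The function name is hypothetical; the header format is "sha256=<hex digest>".
 */
function isValidGitHubSignature(string $rawBody, ?string $signatureHeader, string $secret): bool
{
    if ($signatureHeader === null || $signatureHeader === '') {
        return false;
    }

    $expected = 'sha256=' . hash_hmac('sha256', $rawBody, $secret);

    // hash_equals takes the same time wherever the two strings differ
    return hash_equals($expected, $signatureHeader);
}

$secret = 'example-secret';            // made-up value for the demo
$body = '{"ref":"refs/heads/main"}';   // must be the raw request body, not re-encoded JSON
$header = 'sha256=' . hash_hmac('sha256', $body, $secret);

var_dump(isValidGitHubSignature($body, $header, $secret));       // bool(true)
var_dump(isValidGitHubSignature($body . ' ', $header, $secret)); // bool(false), body was altered
var_dump(isValidGitHubSignature($body, null, $secret));          // bool(false), header missing
```

The detail that tends to bite is hashing the raw body. Decoding the JSON and re-encoding it can change whitespace or key order, and the digest will no longer match.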

Filtering Events

GitHub sends everything: issue comments, wiki edits, deployment statuses, check runs. For an activity feed, I only care about a subset:

class GitHubEventFilter
{
    private const ACCEPTED_EVENTS = [
        'push',
        'pull_request',
        'release',
        'create',        // Branch/tag creation
        'issues',        // Issue opened/closed
    ];

    private const PR_ACCEPTED_ACTIONS = [
        'opened',
        'closed',       // Includes merged
    ];

    private const ISSUE_ACCEPTED_ACTIONS = [
        'opened',
        'closed',
    ];

    public function shouldProcess(string $eventType, array $payload): bool
    {
        if (! in_array($eventType, self::ACCEPTED_EVENTS)) {
            return false;
        }

        // Filter PR events to only opened/merged/closed
        if ($eventType === 'pull_request') {
            return in_array($payload['action'] ?? '', self::PR_ACCEPTED_ACTIONS);
        }

        // Filter issue events
        if ($eventType === 'issues') {
            return in_array($payload['action'] ?? '', self::ISSUE_ACCEPTED_ACTIONS);
        }

        // Filter branch creation (ignore tag creation — releases cover that)
        if ($eventType === 'create') {
            return ($payload['ref_type'] ?? '') === 'branch';
        }

        return true;
    }
}

This drops about 70% of incoming events. Most GitHub activity is noise for a public-facing feed.

Normalisation

The normaliser converts each event type into a common NormalisedEvent DTO:

use Illuminate\Support\Carbon;

readonly class NormalisedEvent
{
    public function __construct(
        public string $type,
        public string $title,
        public ?string $description,
        public string $repository,
        public ?string $url,
        public Carbon $occurredAt,
        public string $deduplicationKey,
        public array $metadata = [],
    ) {}
}

Each event type gets its own normalisation method:

use Illuminate\Support\Carbon;
use Illuminate\Support\Str;

class NormaliseGitHubPayload
{
    public function execute(string $eventType, array $payload): ?NormalisedEvent
    {
        return match ($eventType) {
            'push' => $this->normalisePush($payload),
            'pull_request' => $this->normalisePullRequest($payload),
            'release' => $this->normaliseRelease($payload),
            'create' => $this->normaliseCreate($payload),
            'issues' => $this->normaliseIssue($payload),
            default => null,
        };
    }

    private function normalisePush(array $payload): ?NormalisedEvent
    {
        $commits = $payload['commits'] ?? [];

        if (empty($commits)) {
            return null; // Branch deletions and force pushes that add no commits
        }

        $branch = str_replace('refs/heads/', '', $payload['ref'] ?? '');
        $repo = $payload['repository']['full_name'] ?? 'unknown';
        $commitCount = count($commits);

        // Use the latest commit's message as the title
        $latestCommit = end($commits);

        return new NormalisedEvent(
            type: 'push',
            title: $latestCommit['message'] ?? 'Push to ' . $branch,
            description: $commitCount > 1
                ? "{$commitCount} commits to {$branch}"
                : null,
            repository: $repo,
            url: $payload['compare'] ?? null,
            occurredAt: Carbon::parse(
                $latestCommit['timestamp'] ?? now()
            ),
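            // Fall back to the push's head SHA ("after") if the commit has no id
            deduplicationKey: "push:{$repo}:" . ($latestCommit['id'] ?? $payload['after'] ?? ''),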
            metadata: [
                'branch' => $branch,
                'commit_count' => $commitCount,
                'commits' => collect($commits)->map(fn ($c) => [
                    'sha' => substr($c['id'] ?? '', 0, 7),
                    'message' => Str::limit($c['message'] ?? '', 100),
                ])->all(),
            ],
        );
    }

    private function normalisePullRequest(array $payload): ?NormalisedEvent
    {
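        $pr = $payload['pull_request'] ?? null;

        if (! is_array($pr) || ! isset($pr['number'])) {
            return null; // No PR number means no stable deduplication key
        }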
        $repo = $payload['repository']['full_name'] ?? 'unknown';
        $action = $payload['action'] ?? '';

        $isMerged = $action === 'closed' && ($pr['merged'] ?? false);
        $type = $isMerged ? 'pr_merged' : "pr_{$action}";

        return new NormalisedEvent(
            type: $type,
            title: $pr['title'] ?? 'Pull Request',
            description: $isMerged
                ? 'Merged into ' . ($pr['base']['ref'] ?? 'main')
                : null,
            repository: $repo,
            url: $pr['html_url'] ?? null,
            occurredAt: Carbon::parse(
                $pr['merged_at'] ?? $pr['updated_at'] ?? now()
            ),
            deduplicationKey: "pr:{$repo}:{$pr['number']}:{$action}",
            metadata: [
                'number' => $pr['number'] ?? null,
                'additions' => $pr['additions'] ?? 0,
                'deletions' => $pr['deletions'] ?? 0,
                'merged' => $isMerged,
            ],
        );
    }

    private function normaliseRelease(array $payload): ?NormalisedEvent
    {
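        $release = $payload['release'] ?? null;

        if (! is_array($release) || ! isset($release['tag_name'])) {
            return null; // No tag means no stable deduplication key
        }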
        $repo = $payload['repository']['full_name'] ?? 'unknown';

        return new NormalisedEvent(
            type: 'release',
            title: $release['name'] ?? $release['tag_name'] ?? 'New Release',
            description: Str::limit(strip_tags($release['body'] ?? ''), 200),
            repository: $repo,
            url: $release['html_url'] ?? null,
            occurredAt: Carbon::parse($release['published_at'] ?? now()),
            deduplicationKey: "release:{$repo}:{$release['tag_name']}",
            metadata: [
                'tag' => $release['tag_name'] ?? null,
                'prerelease' => $release['prerelease'] ?? false,
            ],
        );
    }

    private function normaliseCreate(array $payload): ?NormalisedEvent
    {
        $repo = $payload['repository']['full_name'] ?? 'unknown';
        $ref = $payload['ref'] ?? '';

        return new NormalisedEvent(
            type: 'branch_created',
            title: "Created branch {$ref}",
            description: null,
            repository: $repo,
            url: null,
            occurredAt: now(), // GitHub doesn't include a timestamp for create events
            deduplicationKey: "create:{$repo}:{$ref}",
            metadata: ['branch' => $ref],
        );
    }

    private function normaliseIssue(array $payload): ?NormalisedEvent
    {
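        $issue = $payload['issue'] ?? null;

        if (! is_array($issue) || ! isset($issue['number'])) {
            return null; // No issue number means no stable deduplication key
        }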
        $repo = $payload['repository']['full_name'] ?? 'unknown';
        $action = $payload['action'] ?? '';

        return new NormalisedEvent(
            type: "issue_{$action}",
            title: $issue['title'] ?? 'Issue',
            description: null,
            repository: $repo,
            url: $issue['html_url'] ?? null,
            occurredAt: Carbon::parse($issue['updated_at'] ?? now()),
            deduplicationKey: "issue:{$repo}:{$issue['number']}:{$action}",
            metadata: [
                'number' => $issue['number'] ?? null,
                'labels' => collect($issue['labels'] ?? [])->pluck('name')->all(),
            ],
        );
    }
}

Deduplication

The deduplicationKey is critical. The same event can arrive more than once, for example when a failed delivery is redelivered manually or by a script. The polling backup (more on that shortly) will also find events that already arrived via webhook.

class DeduplicateFeedItem
{
    public function isDuplicate(NormalisedEvent $event): bool
    {
        return FeedItem::where('deduplication_key', $event->deduplicationKey)->exists();
    }
}

Simple, but it catches duplicates from any source: webhook redeliveries, polling overlap, or the same event arriving from different repository webhook configurations.
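
One caveat: a check followed by a separate insert leaves a small window in which two workers can both see "not a duplicate" and both write a row. A unique index on the key closes that window, because the database rejects the second write. The sketch below shows the idea with an in-memory SQLite database standing in for the real table. The schema, the insertOnce helper, and the use of the pdo_sqlite extension are all assumptions for the demo:

```php
<?php

// In-memory SQLite stands in for the real feed_items table (schema is illustrative)
$pdo = new PDO('sqlite::memory:');
$pdo->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);
$pdo->exec('CREATE TABLE feed_items (
    id INTEGER PRIMARY KEY,
    deduplication_key TEXT NOT NULL UNIQUE,
    title TEXT NOT NULL
)');

/**
 * Inserts the item unless its key already exists.
 * Returns true when a row was written, false when it was a duplicate.
 */
function insertOnce(PDO $pdo, string $key, string $title): bool
{
    // INSERT OR IGNORE is SQLite syntax; other databases spell this differently
    $stmt = $pdo->prepare(
        'INSERT OR IGNORE INTO feed_items (deduplication_key, title) VALUES (?, ?)'
    );
    $stmt->execute([$key, $title]);

    return $stmt->rowCount() === 1;
}

$first = insertOnce($pdo, 'release:stumason/progress:v2.1.0', 'Version 2.1.0');
$second = insertOnce($pdo, 'release:stumason/progress:v2.1.0', 'Version 2.1.0');
$count = (int) $pdo->query('SELECT COUNT(*) FROM feed_items')->fetchColumn();

var_dump($first, $second, $count); // bool(true), bool(false), int(1)
```

In a Laravel migration the equivalent constraint would be $table->unique('deduplication_key'). The exists() check can stay as a cheap first pass, with the index as the backstop.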

The Hybrid Push/Pull Pattern

Webhooks are great when they work. But they get dropped. GitHub has outages. Your server has downtime. Network blips happen. If your feed relies solely on webhooks, you'll have gaps.

The solution: webhooks for real-time updates, polling as a backup:

// Webhook: immediate, processes events as they arrive
// (handled by the controller above)

// Polling: hourly backup, catches anything webhooks missed
Schedule::command('github:poll-events')->hourly();

class PollGitHubEvents extends Command
{
    protected $signature = 'github:poll-events';

    public function handle(
        GitHubService $github,
        NormaliseGitHubPayload $normaliser,
        DeduplicateFeedItem $deduplicator,
    ): int {
        $repositories = Repository::where('track_events', true)->get();

        foreach ($repositories as $repo) {
            $events = $github->getRecentEvents($repo->full_name);

            foreach ($events as $event) {
                $normalised = $normaliser->execute(
                    $event['type'],
                    $event['payload']
                );

                if (! $normalised) {
                    continue;
                }

                if ($deduplicator->isDuplicate($normalised)) {
                    continue;
                }

                FeedItem::create([
                    'source' => FeedSource::GitHub,
                    'event_type' => $normalised->type,
                    'title' => $normalised->title,
                    'description' => $normalised->description,
                    'repository' => $normalised->repository,
                    'url' => $normalised->url,
                    'occurred_at' => $normalised->occurredAt,
                    'deduplication_key' => $normalised->deduplicationKey,
                    'metadata' => $normalised->metadata,
                ]);
            }
        }

        return self::SUCCESS;
    }
}

The deduplication layer means it doesn't matter if an event arrives via both webhook and polling. The first one wins, the second is silently ignored.
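
One thing to check when wiring this up: GitHub's REST Events API names its types PushEvent, PullRequestEvent and so on, not the push and pull_request names that arrive in the X-GitHub-Event header. It also puts the repository name on the event itself (repo.name), not inside the payload. GitHubService isn't shown here and may already translate between the two. If it doesn't, a translation step along these lines would be needed before calling the normaliser. The function name webhookEventName is hypothetical, and the sample event is trimmed and made up:

```php
<?php

/**
 * Maps an Events API type (e.g. "PushEvent") to the webhook event name
 * the normaliser's match expression expects (e.g. "push").
 * Returns null for types the feed does not track.
 */
function webhookEventName(string $apiType): ?string
{
    return match ($apiType) {
        'PushEvent' => 'push',
        'PullRequestEvent' => 'pull_request',
        'ReleaseEvent' => 'release',
        'CreateEvent' => 'create',
        'IssuesEvent' => 'issues',
        default => null,
    };
}

// Trimmed, made-up event in the shape the Events API returns
$event = [
    'type' => 'PushEvent',
    'repo' => ['name' => 'stumason/progress'],
    'payload' => ['ref' => 'refs/heads/main'],
];

$eventType = webhookEventName($event['type']);

// Copy the repository name to where the webhook-shaped normaliser looks for it
$payload = $event['payload'];
$payload['repository']['full_name'] ??= $event['repo']['name'];
```

Other fields can differ between the two sources as well, so it is worth comparing a polled payload against a webhook payload for each event type and confirming that both produce the same deduplication key.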

The Garbage Problem

GitHub webhook payloads contain things you wouldn't expect:

  • push events with zero commits (branch deletions, or force pushes that don't add commits)
  • pull_request events where the PR object is null (deleted repos)
  • Timestamps in different formats across different event types
  • Repository objects that are missing the full_name key
  • Commit messages that are empty strings

Defensive coding is essential:

$latestCommit = end($commits);
$message = trim($latestCommit['message'] ?? '');

if (empty($message)) {
    $message = 'Push to ' . $branch;
}
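
The timestamp problem from the list above can be handled the same way. This is a sketch only, assuming the two shapes to cope with are ISO 8601 strings and integer Unix timestamps; parseGitHubTimestamp is a hypothetical helper, written against plain PHP date classes so it runs without Carbon:

```php
<?php

/**
 * Accepts an ISO 8601 string or a Unix timestamp and returns a UTC
 * DateTimeImmutable, or null when the value cannot be interpreted.
 */
function parseGitHubTimestamp(mixed $value): ?DateTimeImmutable
{
    $utc = new DateTimeZone('UTC');

    // Unix timestamps may arrive as integers or as all-digit strings
    if (is_int($value) || (is_string($value) && ctype_digit($value))) {
        return (new DateTimeImmutable('@' . $value))->setTimezone($utc);
    }

    if (! is_string($value) || trim($value) === '') {
        return null;
    }

    try {
        return (new DateTimeImmutable($value))->setTimezone($utc);
    } catch (Exception) {
        return null; // Unparseable string
    }
}

$fromString = parseGitHubTimestamp('2024-05-01T10:00:00+01:00');
$fromInteger = parseGitHubTimestamp(1714554000);

// Both values describe the same instant: 2024-05-01 09:00:00 UTC
var_dump($fromString == $fromInteger); // bool(true)
```

Returning null leaves the fallback decision to the caller, which can use now() as the normaliser does or drop the event entirely.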

I wrap every normalisation method in a try/catch. If a payload is too broken to normalise, log it and move on. Don't let one garbage event break the entire pipeline.
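
A rough sketch of that wrapper, written as a standalone function so it runs without the framework. The names normaliseSafely and prTitle are illustrative, and the $log callback stands in for whatever logger the application uses:

```php
<?php

/**
 * Runs a normalisation callback and converts any failure into null,
 * reporting it through $log so the payload can be inspected later.
 */
function normaliseSafely(callable $normalise, array $payload, callable $log): mixed
{
    try {
        return $normalise($payload);
    } catch (Throwable $e) {
        // Throwable also covers TypeError, which malformed payloads tend to trigger
        $log($e->getMessage(), $payload);

        return null;
    }
}

// Stand-in for a real normalisation method; it requires an array
function prTitle(array $pr): string
{
    return $pr['title'] ?? 'Pull Request';
}

$logged = [];
$log = function (string $message, array $payload) use (&$logged): void {
    $logged[] = $message;
};

// A payload where "pull_request" is null where an object is expected
$broken = ['action' => 'closed', 'pull_request' => null];
$valid = ['action' => 'closed', 'pull_request' => ['title' => 'Add dark mode']];

$normalise = fn (array $p): string => prTitle($p['pull_request']);

$fromBroken = normaliseSafely($normalise, $broken, $log); // null, failure is logged
$fromValid = normaliseSafely($normalise, $valid, $log);   // "Add dark mode"
```

Catching Throwable and not just Exception matters here, because passing null where an array is expected raises a TypeError, which is an Error and not an Exception.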

The Result

The feed shows a clean timeline of development activity. Each item has a type icon, a title, a repository name, and a relative timestamp. Click through to see the commit, PR, or release on GitHub.

The hybrid approach means the feed is typically updated within seconds (webhook) and never more than an hour behind (polling). The deduplication layer keeps it clean. The normalisation layer makes the frontend simple — it just renders a list of feed items with consistent structure, regardless of which GitHub event type they originated from.

It's not sexy engineering. It's plumbing. But it's plumbing that works reliably, and that's what matters for a production system.


I write about Laravel, AI tooling, and building software. More at stuartmason.co.uk.