Stu Mason
Stu Mason

Activity

Pull Request Merged

PR #31 merged: fix: cross-platform detection using title similarity matching

Problem

Cross-platform detection only matched via url_hash. Reddit stores reddit.com thread URLs while HN stores article URLs, producing zero HN-Reddit overlaps despite covering the same stories.

Solution

Hybrid URL + title similarity matching via a materialized view refreshed every 10 minutes.

How it works:

  1. pg_trgm extension enabled for trigram similarity matching
  2. normalized_title column added to raw_items (strips 'Show HN:' prefixes, punctuation, lowercases)
  3. Materialized view cross_platform_matches pre-computes matches using both url_hash AND title similarity (threshold: 0.4)
  4. Scheduled refresh every 10 minutes via app:refresh-cross-platform-matches
  5. API endpoint + dashboard actions updated to query the view

Files changed (10 files, +345/-61):

  • 2 migrations (pg_trgm + materialized view)
  • RawItem model (normalizeTitle)
  • BaseFetcher (populates normalized_title on store)
  • RefreshCrossPlatformMatchesCommand
  • Schedule registration
  • TrendsController crossPlatform()
  • Dashboard actions (GetBreakoutTracker, GetVelocity)
  • 11 unit tests for title normalization

Post-deploy:

php artisan migrate
php artisan app:refresh-cross-platform-matches
+361
additions
-61
deletions
10
files changed