Stu Mason
Stu Mason

Activity

Pull Request Opened

PR #138 opened: fix(predictions): resolve Reddit predictions against their own subreddit

Summary

`ResolvePrediction` was hardcoded to look up peak scores in `hn_top`/`hn_best`/`hn_new` regardless of where the prediction came from. All 1,024 Reddit predictions captured since 2026-05-15 were silently mis-graded — the resolver searched HN tables, found nothing, set `actual_peak_score = 0`, and tagged them `false_positive`. Meanwhile the matching Reddit data was sitting in `raw_items` under the same `url_hash` with the correct subreddit slug (e.g. one reddit_ChatGPT prediction has 1,642 raw_items rows with max_score 6,573).

Findings from 2026-05-30 prod deep dive

Source familyPredictionsReached "front page"Avg error
HN (best/top/new)839383 (46%)±105 score
Reddit (24 subreddits)1,0240 (0%)100% error

Hit-class distribution that was being polluted:

  • `false_positive`: 1,167 (62.6%) — mostly the broken Reddit set
  • `borderline`: 254 / `caught`: 234 / `precise`: 133 / `missed`: 2

Changes

  • `ResolvePrediction::handle` now resolves which source family the prediction belongs to via a new `familyFor()` helper. Reddit predictions look at `raw_items` filtered to their own subreddit `source_slug`; HN stays pan-feed across hn_top/hn_best/hn_new.

  • `ResolvePrediction::classify` takes an optional `int $threshold = 200` argument and scales the bucket boundaries proportionally: `low = t/2`, `really_big = t*1.5`. HN at `t=200` keeps the same 100 / 200 / 300 bands; Reddit at `t=500` (REDDIT_NOTABLE_THRESHOLD) gets 250 / 500 / 750. Existing tests pass unchanged because they use the default.

  • New `predictions:rebackfill` command re-runs the resolver on already-resolved predictions in place. Defaults to: ``` --source-pattern=reddit_% # SQL LIKE pattern matched against source_slug --broken-only # actual_peak_score = 0 --limit=0 # uncapped --dry-run # preview only ``` Safe to re-run; just overwrites graded fields with freshly computed values.

Deployment

Same dance as last time (`AUTO_MIGRATE=true` is fine — no migrations here):

  1. Merge + deploy
  2. Exec into container and run: ```bash php artisan predictions:rebackfill --dry-run --limit=5 # sanity check first php artisan predictions:rebackfill # full sweep ```
  3. Wait for next scheduled `predictions:classify-outcomes` (every 6h at :40) to catch up the `success_reason` / `failure_reason` fields, or run it manually

Test plan

  • All 8 `ResolvePredictionTest` cases pass (6 existing + 2 new)
  • Pint clean
  • Full suite 200 pass / 27 skip / 2 fail (pre-existing LinkedIn, unrelated)
  • `php artisan list` confirms `predictions:rebackfill` is registered
  • After backfill: hit-class breakdown for Reddit should show real precise/caught/borderline numbers instead of 100% false_positive
  • After 6h: failure_reason on Reddit predictions should populate with real reasons not driven by mis-resolution
+167
additions
-11
deletions
3
files changed