Stu Mason

PR #139 merged: feat(predictions): drop weak HN classes + add domain reputation gate

Summary

Tunes HN predictions based on the 2026-05-30 retrospective. Two heuristic classes were net-negative; one feature axis (domain) was captured but completely ignored.

What the data showed

| HN class | Total | Hit % | Verdict |
| --- | --- | --- | --- |
| `front_page_lock` (score ≥ 200) | 69 | 100% | Kept (tautologically perfect) |
| `rising_fast` (high conf, v ≥ 60) | 378 | 52.6% | Kept (core class) |
| `rising_fast` (medium, v 30-59) | 103 | 43.7% | Kept |
| `sleeper` (v 10-29, age 4-8h, score < 150) | 206 | 25.7% | Dropped: 75% miss rate, drag on overall hit rate |
| `cross_platform` (≥ 2 sources + v ≥ 10) | 10 | 10% | Dropped: tiny sample, but the signal is broken as defined |

Domain extremes that the heuristic ignores:

| Pattern | Hit rate |
| --- | --- |
| youtube.com | 0 / 8 → 0% |
| news.ycombinator.com | 1 / 7 → 14% |
| bbc.com | 2 / 9 → 22% |
| github.com | 11 / 45 → 24% |
| twitter.com | 13 / 17 → 77% |
| techcrunch.com | 8 / 10 → 80% |
| anthropic.com | 7 / 8 → 88% |
| science.org | 8 / 9 → 89% |

What changes

1. `forecastHn()` retires the two weak classes

`MODEL_VERSION` is bumped to `v2` so that v1 (existing) and v2 (post-deploy) predictions are distinguishable downstream. Reddit's forecaster is untouched: the same classes are healthy there.
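A minimal sketch of what the v2 classification might look like, written in Python for illustration rather than the PR's PHP. The thresholds come from the table in "What the data showed"; everything else is an assumption: that `v` means velocity, that `front_page_lock` takes precedence and carries high confidence, and the hypothetical `Forecast` type and `forecast_hn` name.

```python
from dataclasses import dataclass
from typing import Optional

MODEL_VERSION = "v2"  # bumped so v1 and v2 predictions can be told apart


@dataclass(frozen=True)
class Forecast:
    klass: str
    confidence: str
    model_version: str = MODEL_VERSION


def forecast_hn(score: int, velocity: float) -> Optional[Forecast]:
    """Classify an HN candidate; return None when no kept class matches.

    Rule order and the confidence assigned to front_page_lock are
    assumptions; only the numeric thresholds come from the PR text.
    """
    if score >= 200:
        return Forecast("front_page_lock", "high")
    if velocity >= 60:
        return Forecast("rising_fast", "high")
    if velocity >= 30:
        return Forecast("rising_fast", "medium")
    # v1 would have tried `sleeper` and `cross_platform` here.
    # v2 retires both, so weaker candidates produce no prediction.
    return None


if __name__ == "__main__":
    print(forecast_hn(score=120, velocity=45))  # rising_fast, medium
    print(forecast_hn(score=120, velocity=20))  # None (former sleeper range)
```

Stamping the version on each forecast is one way to keep v1 and v2 rows separable in later retrospectives.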

2. Domain reputation gate

| Layer | Purpose |
| --- | --- |
| `domain_reputations` table | Caches `sample_size` / `hit_count` / `hit_rate` per source domain |
| `RefreshDomainReputations` action + `predictions:refresh-domain-reputations` command | Recomputes from history; scheduled daily at 04:10 UTC |
| `DomainReputationProvider` | Lazy in-memory cache so a capture run does one SELECT, not one per candidate |
| `MakePredictions::applyDomainReputation()` | Final gate: `hit_rate ≤ 0.10` → veto, `≤ 0.30` → demote a notch, `≥ 0.70` → promote a notch, sample < 5 → ignore |
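The refresh and caching layers can be sketched roughly as follows. This is a Python illustration under stated assumptions, not the PR's PHP: the input shape (`(domain, was_hit)` pairs), the `Reputation` tuple, and the `load_all` callback standing in for the single SELECT are all hypothetical.

```python
from collections import defaultdict
from typing import Callable, Dict, Iterable, NamedTuple, Optional, Tuple


class Reputation(NamedTuple):
    sample_size: int
    hit_count: int
    hit_rate: float


def refresh_domain_reputations(
    history: Iterable[Tuple[str, bool]],
) -> Dict[str, Reputation]:
    """Recompute per-domain stats from resolved (domain, was_hit) pairs."""
    counts = defaultdict(lambda: [0, 0])  # domain -> [sample_size, hit_count]
    for domain, was_hit in history:
        counts[domain][0] += 1
        counts[domain][1] += int(was_hit)
    return {d: Reputation(n, h, h / n) for d, (n, h) in counts.items()}


class DomainReputationProvider:
    """Loads the whole table on first lookup, then serves from memory."""

    def __init__(self, load_all: Callable[[], Dict[str, Reputation]]):
        self._load_all = load_all  # stand-in for the one SELECT
        self._cache: Optional[Dict[str, Reputation]] = None

    def get(self, domain: str) -> Optional[Reputation]:
        if self._cache is None:
            self._cache = self._load_all()
        return self._cache.get(domain)


if __name__ == "__main__":
    history = [("youtube.com", False)] * 8 + [("anthropic.com", True)] * 7
    table = refresh_domain_reputations(history)
    provider = DomainReputationProvider(lambda: table)
    print(provider.get("youtube.com"))    # sample_size=8, hit_count=0
    print(provider.get("example.invalid"))  # None: unseen domain
```

Loading everything once trades a little memory for avoiding a query per candidate, which matches the stated purpose of the provider.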

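Net effect, going by the hit rates in the domain table above: youtube.com links stop generating predictions (0% is below the veto threshold); news.ycombinator.com, bbc.com, and github.com links get demoted a notch; twitter.com, techcrunch.com, anthropic.com, and science.org posts get an automatic confidence boost.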
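The gate's thresholds can be illustrated with a short Python sketch. The thresholds are the ones listed for `applyDomainReputation()`; the three-level confidence ladder, the clamping at its ends, and the function signature are assumptions, since the PR text does not spell them out.

```python
from typing import Optional

LEVELS = ["low", "medium", "high"]  # assumed confidence ladder
MIN_SAMPLE = 5                      # below this, reputation is ignored


def apply_domain_reputation(
    confidence: str, sample_size: int, hit_count: int
) -> Optional[str]:
    """Return the adjusted confidence, or None when the domain is vetoed."""
    if sample_size < MIN_SAMPLE:
        return confidence  # too little history to trust the rate
    hit_rate = hit_count / sample_size
    idx = LEVELS.index(confidence)
    if hit_rate <= 0.10:
        return None  # veto: no prediction is emitted
    if hit_rate <= 0.30:
        return LEVELS[max(idx - 1, 0)]  # demote, clamped at "low" (assumed)
    if hit_rate >= 0.70:
        return LEVELS[min(idx + 1, len(LEVELS) - 1)]  # promote, clamped
    return confidence


if __name__ == "__main__":
    # Counts taken from the domain table; starting confidence is illustrative.
    print(apply_domain_reputation("medium", 8, 0))  # youtube.com   -> None
    print(apply_domain_reputation("medium", 9, 2))  # bbc.com       -> low
    print(apply_domain_reputation("medium", 8, 7))  # anthropic.com -> high
    print(apply_domain_reputation("medium", 4, 0))  # sample < 5    -> medium
```

Checking the sample size first means a domain with four straight misses is left alone rather than vetoed on thin evidence.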

Estimated combined uplift

| Move | Estimated pp gain |
| --- | --- |
| Drop `sleeper` | +5 to +10 |
| Drop `cross_platform` | +0.5 |
| Domain reputation gate | +3 to +7 |
| Combined | +8 to +15 |

Worst case (no real lift from the reputation gate): we still drop the ~216 predictions from the two retired classes (206 `sleeper` + 10 `cross_platform`), cleaning the dataset.
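As a back-of-envelope check on the class-drop portion alone, the retro table can be re-aggregated with and without the two retired classes. This Python sketch assumes the table covers all HN predictions in the retro sample and that the future class mix resembles it; hit counts are reconstructed from the rounded percentages, so the result is approximate and is not the author's own estimate.

```python
# (total, hit rate) per class, copied from the retro table.
RETRO = {
    "front_page_lock": (69, 1.000),
    "rising_fast_high": (378, 0.526),
    "rising_fast_medium": (103, 0.437),
    "sleeper": (206, 0.257),
    "cross_platform": (10, 0.100),
}
DROPPED = {"sleeper", "cross_platform"}


def hit_rate(classes):
    """Pooled hit rate over the given class names."""
    total = sum(RETRO[c][0] for c in classes)
    # Reconstruct integer hit counts from the rounded percentages.
    hits = sum(round(RETRO[c][0] * RETRO[c][1]) for c in classes)
    return hits / total


all_classes = list(RETRO)
kept_classes = [c for c in RETRO if c not in DROPPED]

before = hit_rate(all_classes)   # 367 / 766
after = hit_rate(kept_classes)   # 313 / 550

print(f"{before:.1%} -> {after:.1%} (+{(after - before) * 100:.1f} pp)")
# prints: 47.9% -> 56.9% (+9.0 pp)
```

Under these assumptions, retiring the two classes alone would put the pooled HN hit rate in the mid-50s, before any effect from the reputation gate.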

Deployment

  1. Merge + deploy (`AUTO_MIGRATE=true` is fine — new table only)
  2. Exec into the container and bootstrap the reputation table from existing history:

     ```bash
     php artisan predictions:refresh-domain-reputations
     ```
  3. Capture command picks up the gate on the next `hourlyAt(7)` tick automatically.

Test plan

  • 10 new `MakePredictionsHnForecastTest` cases covering retired classes + reputation gate
  • Existing `MakePredictionsTest` + `MakePredictionsRedditTest` + `ResolvePredictionTest` all green
  • Full suite: 210 pass / 27 skip / 2 fail (pre-existing LinkedIn, unrelated)
  • Pint clean
  • After deploy + 1 week: re-run the failure-mode SQL from the retro — HN hit rate should be 55%+ if the changes land as expected

+398 additions, -26 deletions, 8 files changed