Stu Mason

PR #12 merged: feat: expand data sources for comprehensive content lake

Summary

  • Adds 11 new Reddit subreddits covering AI/ML, frontend frameworks, and security
  • Adds AI company blogs (Anthropic, Meta AI) and tooling blogs (LangChain, LlamaIndex, HuggingFace)
  • Adds engineering blogs from major tech companies (Stripe, Netflix, Uber, Spotify, Discord)
  • Adds three arXiv categories: Computer Vision (cs.CV), Computation and Language (cs.CL, i.e. NLP), and Cryptography (cs.CR)
  • Adds Papers With Code (PWC) API integration for research papers with implementations
  • Creates PapersWithCodeFetcher to handle the new PWC API source
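
The normalization step that a fetcher like PapersWithCodeFetcher would need can be sketched as follows. This is an illustrative Python sketch, not the PHP code in this PR: the ContentItem record, the normalize_pwc_papers function, and the payload field names (results, id, title, url_abs, url_pdf, published) are all assumptions about how a paginated paper listing might look, and the real fetcher may differ.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Any, Optional


@dataclass(frozen=True)
class ContentItem:
    """Hypothetical normalized record; not the model used in this PR."""
    source: str
    external_id: str
    title: str
    url: str
    published: Optional[str]


def normalize_pwc_papers(payload: dict[str, Any], source: str) -> list[ContentItem]:
    """Map a paper listing to ContentItem records.

    Assumes a payload shaped like {"results": [{...}, ...]} where each entry
    may carry "id", "title", "url_abs", "url_pdf", and "published".
    Entries missing an id, title, or URL are skipped, and repeated ids
    are dropped so one page never yields the same paper twice.
    """
    items: list[ContentItem] = []
    seen: set[str] = set()
    for paper in payload.get("results", []):
        paper_id = str(paper.get("id") or "")
        title = (paper.get("title") or "").strip()
        # Prefer the abstract page, fall back to the PDF link (assumed fields).
        url = paper.get("url_abs") or paper.get("url_pdf")
        if not paper_id or not title or not url or paper_id in seen:
            continue
        seen.add(paper_id)
        items.append(ContentItem(source, paper_id, title, url, paper.get("published")))
    return items
```

Calling normalize_pwc_papers(payload, "pwc_trending") on a decoded API response would return one record per usable paper. Keeping this step free of network calls means it can be tested against canned payloads.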

New Sources Added

Reddit AI/ML

  • LocalLLaMA, ollama, ClaudeAI, ChatGPT, Oobabooga, StableDiffusion

Reddit Web Dev

  • sveltejs, nextjs, tailwindcss

Reddit Security

  • hacking, ReverseEngineering

AI Blogs

  • Anthropic Blog, Meta AI Blog, LangChain Blog, LlamaIndex Blog, HuggingFace Blog

Engineering Blogs

  • Stripe Engineering, Netflix Tech Blog, Uber Engineering, Spotify Engineering, Discord Engineering

arXiv Categories

  • Computer Vision and Pattern Recognition (cs.CV), Computation and Language (cs.CL), Cryptography and Security (cs.CR)

Research

  • Papers With Code - Trending, Papers With Code - Latest

Test plan

  • Tests pass locally
  • Run php artisan db:seed --class=SourceSeeder to populate new sources
  • Run php artisan firehose:fetch --source=pwc_trending --dry-run to verify PWC fetcher

+739 additions, -0 deletions, 4 files changed