Stu Mason

Summary

Three independent fixes to make the public MCP server stop timing out under sustained use, plus the observability needed to keep it that way.

Caching — Search, Compare, Neighborhood, Briefing, Trends, Velocity all wrap in Cache::remember (15-30min for user-driven queries, 5min for time-sensitive briefings). Compare keys per-query so popular terms reuse the Search summary across compare calls.
Trigram indexes — pg_trgm GIN on `raw_items.title` and `raw_items.url` so bidirectional ILIKE (`%foo%`) stops doing a seq scan over 4.7M rows. Plus a standalone `fetched_at` B-tree for the GetVelocity cross-platform CTE which filters only by time.
Timeouts — `fastcgi_read_timeout` and PHP `max_execution_time` bumped from 60s → 120s so cold-cache calls have room while the indexes finish building.
Telemetry — `mcp_call_logs` table + `McpRequestLogger` middleware on both `/mcp/public` and `/mcp/devtrends` capturing duration, query count, cache hits, errors, status code per call.
Dashboard — `/mcp-stats` (auth-gated) rendering p50/p95/p99 per tool, cache hit rate, error rate, recent errors.

Why now

The public MCP is hitting 504s and small (~127-byte) JSON-RPC error envelopes under sustained load. Container logs showed nginx `upstream timed out` on POST `/mcp/public` and inconsistent response sizes (127 → 7,680 bytes). Diagnosis: every Search call fans out into 4 sequential ILIKE scans of the 4.7M-row `raw_items` table, and Compare multiplies that up to 5×. No caching on the 6 most-called tools. Indexes don't cover `%foo%` lookups.

Deployment — important

Auto-migrate is currently disabled on the prod app (`AUTO_MIGRATE=false` env var set on Coolify) because `CREATE INDEX CONCURRENTLY` on the trigram indexes takes 5-15 min each on 4.7M rows. After merge:

Deploy as normal (entrypoint will skip migrations).
Exec into the container and run `php artisan migrate --force` manually.
Flip `AUTO_MIGRATE` back to `true`.

Test plan

Pest suite green (4 new tests passing, 1 skipped on sqlite for percentile_cont)
Pint formatted
After deploy + manual migrate, verify `raw_items_title_trgm_idx` and `raw_items_url_trgm_idx` exist in `pg_indexes`
Hit `/mcp-stats` and verify cache hit rate climbs above 0% as repeat queries land
Run a few `search` calls and confirm first call DB-heavy, second call cache_hit=true in logs
Confirm no more 504s on `/mcp/public` under normal traffic

Activity

PR #130 opened: perf(mcp): cache + indexes + telemetry for the public MCP

Summary

Why now

Deployment — important

Test plan