
An LLM predicts the winner of Super Bowl LX
Mike Band · Feb 5 · 6 min read · Updated: Feb 20

How to get a real prediction from public data with better prompts

Before the Seattle Seahawks and New England Patriots take the field on Sunday, a lot of fans are going to do the same thing:
Open a large language model and type, “Who’s going to win the Super Bowl?”
I get it. It’s fast. It’s fun. It also tends to produce information that’s basically the sports equivalent of a fortune cookie. I’m doing a little more work than that, using only publicly available information that anyone can access, and using the model as a research assistant, not an oracle.
So before I explain the methodology, here’s the headline:
The model predicts the Seahawks beat the Patriots 23-20 in Super Bowl LX.
If you came here strictly for the score, that one was for you. If you’re interested in learning how to leverage LLMs as a sports research assistant, keep reading. I’ll walk you through the step-by-step process for extracting more value from outputs by obsessing over inputs.
A repeatable way to get professional answers from public data
The big game is Sunday, February 8 at Levi’s Stadium in Santa Clara. There’s a ton of information out there: lines, injuries, tendencies, advanced stats, quotes, historical context and more. The problem is not a lack of takes, but rather turning the chaos into something structured and useful.
My workflow is intentionally simple:
Generate one strong deep research prompt
Run a second prompt to summarize and compress the report
Nod your head and say “wow” as you realize how much time you just saved
That’s all it is… plain English with a splash of creativity and expertise. For this use case, it’s just enough structure to keep the model honest and the output repeatable.
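The two-step workflow above can be sketched as code. This is a minimal illustration, not an actual implementation: `call_model` is a hypothetical stand-in for whichever LLM API you use, and the prompt strings are placeholders for the real prompts described below.

```python
from typing import Callable

def predict_game(matchup: str,
                 research_prompt: str,
                 synthesis_prompt: str,
                 call_model: Callable[[str], str]) -> str:
    """Run the two-step workflow: deep research, then compression."""
    # Step 1: run the deep research prompt to get the long-form report.
    report = call_model(research_prompt.format(matchup=matchup))
    # Step 2: compress the report into a decision brief.
    return call_model(f"{synthesis_prompt}\n\n{report}")
```

Keeping the model call injectable makes it trivial to swap providers or stub the model out while you iterate on the prompts themselves.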
Step 1: The deep research prompt
This is the prompt I paste into an LLM with web browsing (GPT, Claude, or Gemini frontier models will all suffice), and it’s designed to do three things well: (1) pull high-quality public sources, (2) produce a structured report that does not pick a winner until the end, and (3) take into account every relevant analytical dimension of the matchup.
This is where subject-matter expertise becomes the differentiator. If you can’t imagine what actually decides a football game, you’ll end up with a report that reads like someone watched three highlights and skimmed a few AP articles.
When I built this, I iterated back and forth using voice-to-text and stream-of-consciousness typing until I had a checklist of what I wanted a real prognosticator to evaluate: coaching, quant indicators, trenches, psychology, matchups, context, and the boring stuff that actually matters like availability and game script.
That’s the whole engine. It’s not magic. It’s just structured enough that the model can’t drift into fan fiction without getting caught.
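In code form, that checklist might anchor the prompt like this. The dimension list and wording are illustrative, not my actual prompt:

```python
# Analytical dimensions a real prognosticator would evaluate
# (paraphrased from the checklist above; adjust to taste).
DIMENSIONS = [
    "coaching tendencies and play-calling",
    "quantitative indicators (EPA, DVOA, success rate)",
    "offensive and defensive line play",
    "psychological and situational context",
    "positional matchups",
    "injuries, availability, and likely game script",
]

def build_research_prompt(matchup: str) -> str:
    """Assemble a deep research prompt that defers the pick to the end."""
    checklist = "\n".join(f"- {d}" for d in DIMENSIONS)
    return (
        f"Research {matchup} using only public, current sources.\n"
        "Cover each dimension below, cite every claim, and do not "
        f"name a winner until the final section:\n{checklist}"
    )
```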
For this game, the deep research took 24 minutes, produced 18,043 words, and included 133 citations.
Step 2: The synthesis prompt
Once I get the deep research output, I immediately run a compression prompt. Deep research is great, but nobody wants to read a novel to get to what matters.
The synthesis prompt forces the model to keep only the highest-leverage indicators and turn the report into a decision brief, while preserving the final prediction and key takeaways.
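One way the compression step might be phrased, again with illustrative wording rather than my exact prompt:

```python
def build_synthesis_prompt(report: str) -> str:
    """Wrap the deep research report in a compression instruction."""
    return (
        "Compress the report below into a decision brief. Keep only the "
        "highest-leverage indicators, preserve the final score prediction "
        "and key takeaways, and drop everything else.\n\n"
        f"--- REPORT ---\n{report}"
    )
```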
Step 3: Read the output like a skeptic, not a fan
Here’s the part people skip. After the summary comes back, I do a fast sanity check:
Did it lean on outdated assumptions instead of current sources?
Does anything sound too confident without support?
Are injuries and availability treated as real variables, not footnotes?
Does the logic match the pick, or did it wander into narrative mode?
If something is off (for example, if the report incorrectly labels Patriots quarterback Drake Maye a rookie instead of a second-year player, a common LLM mistake given stale pretraining data), I rerun the first step with details on what it missed, and I force it back onto current sources. Without access to up-to-date information, outdated analysis (even more so than hallucination) becomes the bigger challenge in AI sports research.
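Part of that sanity check can be mechanized. This sketch flags two cheap signals, stale-season references and overconfident phrasing; the thresholds and phrase list are illustrative assumptions, not a production checker:

```python
import re

def sanity_flags(brief: str, current_season: int = 2025) -> list[str]:
    """Return a list of human-readable warnings about a model brief."""
    flags = []
    # Flag year references more than one season old (likely stale data).
    for year in set(re.findall(r"\b(20\d{2})\b", brief)):
        if int(year) < current_season - 1:
            flags.append(f"possibly outdated reference: {year}")
    # Flag language that sounds too confident without support.
    for phrase in ("guaranteed", "certainly", "without question"):
        if phrase in brief.lower():
            flags.append(f"overconfident phrasing: '{phrase}'")
    return flags
```

Anything flagged still needs a human read; the point is to triage, not to replace the skeptic.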
The Output: Who will win Super Bowl LX, according to the model
Note: The output below is copied directly from a ChatGPT 5.2 + Deep Research conversation that included the two prompts above in succession. All statistics and information were verified before publishing.
What will decide the game:
Seattle defense vs Drake Maye: Seattle’s defense was #1 in points allowed (17.2 PPG) and #2 in EPA/play (-0.17); if they pressure Maye with four and win on early downs, it will be difficult for the Patriots offense to sustain drives.
Turnovers: Seattle was 31st in giveaways (28) in the regular season; New England’s offense was efficient and careful (Maye: 31 TD, 8 INT). One short field likely swings it.
Protection on the edges: Seattle LT Charles Cross is playing through a foot issue; NE EDGE Harold Landry is questionable (knee). Whichever “questionable” shows up closer to 100% shapes the pocket.
Explosives vs containment: Seattle’s pass game (Darnold → JSN/Kupp/Shaheed) can hit chunk plays; Seattle’s defense allowed very few explosives all year. NE likely needs 1–2 explosives to avoid long-drive perfection.
Red zone and 4th down: NE was 18/25 (72%) on 4th downs in 2025. Expect Vrabel/McDaniels to press leverage decisions near midfield and in the low red zone.
Key player availability:
| Player | Status | “If limited” impact |
| --- | --- | --- |
| Drake Maye (NE QB) | Questionable (shoulder/illness) | Fewer deep/drive throws; faster quick game |
| Harold Landry (NE EDGE) | Questionable (knee; missed bye-week practices) | Pass rush downgrade; more blitz/pressure manufacturing |
| Charles Cross (SEA LT) | Playing through foot | Protection risk on blind side; more chips/TE help |
| Sam Darnold (SEA QB) | Oblique noted; expected to play | Minor; manage hits |
| Zach Charbonnet (SEA RB) | OUT (ACL) | Less short-yardage/TD depth behind Walker |
Key Matchups:
Seattle O vs NE D: New England’s defense graded poorly by advanced metrics (23rd DVOA), but can be situational. Seattle must stay ahead of schedule and avoid forced throws into disguised shells. Attack with quick winners (Kupp/JSN option routes) and selectively take shot plays when protection is clean.
NE O vs Seattle D: Seattle’s run defense was elite (1st in EPA/rush defense). NE’s best path is rhythm passing + tempo early (they led the NFL in first-quarter points) to blunt the rush, then opportunistic shots (Diggs/Henry).
Critical 1-on-1s: Diggs vs Witherspoon, JSN vs Gonzalez, Moses vs Lawrence, Cross vs NE edge
Bottom Line:
Predicted winner: Seattle 23, New England 20. A hard-fought game where Seattle’s defense makes a late stand. They’re the more complete team, and completeness tends to win championships.
Seattle wins if: they can generate pressure without blitzing, and they avoid a multi-turnover game.
New England wins if: Drake Maye stays clean, they win the turnover battle, and they convert high-leverage fourth downs and red-zone snaps.
Why this is worth doing
If you try this exercise on models from even a year ago, the gap is obvious. The breadth and depth of analysis, along with improved reliability, has changed what “research” can look like for anyone with an internet connection.
Two years ago, you could spot hallucinations in sports outputs everywhere. Now, models are increasingly optimized to pull from public sources, and that changes the game. There’s simply too much content across the internet to process manually. This is one of the few realistic ways to harness it without drowning.
Above all, with LLMs, the output will only be as strong as the input that drives it. In a short window, we’ve gone from unreliable answers to something that can look like near six-sigma research, at least when you structure the prompts, constrain the sources, and keep the model on a leash.
And that’s just with public data.
Imagine plugging this workflow into the kind of feeds we have at NFL Next Gen Stats.
Mike Band
NFL Next Gen Stats Research & Analytics
Lander Analytics Contributor
Subscribe to our Substack below for monthly emails with practical AI strategies for your organization: what to build, what to avoid, and how to make systems reliable in the real world.
Work with us: If you want help identifying the right first workflow, building a permissioned knowledge base, or training your team to ship responsibly, reach out at info@landeranalytics.com.
About the author: Mike Band is the Sr. Manager of Research & Analytics at NFL Next Gen Stats and AI Researcher at Lander Analytics.


