Five Things We’re Watching in AI This Summer

Lander Analytics Team
2 days ago

What Fable and Mythos revealed about frontier model adoption, why model routing is becoming practical, and more from the Lander team.

Every month in AI now feels like too much and not enough at the same time.

There are model releases, benchmark claims, executive interviews, open-source projects, security concerns, policy changes and a lot of noise in between. Some of it matters. Some of it feels important for a day and then disappears.

At Lander Analytics, we are more interested in trends than fads. Trends are sticky. They shape how teams build, buy, govern and use AI over time. Fads are louder, but they usually do not survive contact with real workflows.

Consider this column field notes from our team: what we’re watching, why it matters and what you should probably take from it.

The Fable 5 Story Became Bigger Than the Model

Anthropic’s release of Claude Fable 5 and Claude Mythos 5 looked, at first, like the obvious model story of the month. Published benchmarks and early customer feedback were strong, especially around long-horizon coding, analytics, spreadsheet work, scientific reasoning and cybersecurity. Fable 5 was positioned as the generally available version. Mythos 5 was the more restricted model, aimed at trusted access use cases.

Then the story quickly changed. Anthropic suspended access to both models after a U.S. government directive citing national security concerns. Anthropic said the government believed it had found a way to bypass Fable 5’s safeguards. Around the same time, Reuters said Mythos identified vulnerabilities in highly sensitive U.S. government computer systems during a Project Glasswing testing exercise. The exact details are still limited, and some of this will probably take a while to sort out.

For businesses, that is the part worth paying attention to. Picking a frontier model is not just a question of who has the best evals this week. Legal, security, compliance and procurement all end up in the room eventually, especially when sensitive data is involved. A model can be excellent and still be the wrong fit for a workflow if the retention policy, access rules or vendor risk profile do not work for the organization. The model’s capabilities matter, but so do the terms around it.

The Right Model is Becoming a Workflow Decision

The Fable 5 launch also raised a more practical question: when do you actually need the biggest model, especially when the pricing is tied to how much you use it?

That sounds obvious, but it changes the incentives. As usage-based pricing becomes the norm, the “let’s just use the best model for everything” approach starts to look a lot less harmless. A kitchen-sink prompt, a long agent loop or a workflow that keeps stuffing more context into every request can become a real cost center. At some point, the question becomes whether the better model changed the outcome enough to justify the spend.

Most AI tasks inside an organization probably do not need the strongest model available. Drafting a meeting recap, summarizing a clean document, classifying support tickets or writing a first-pass email should not require the same model as a complex code migration. A smaller model may be faster, cheaper, easier to govern and good enough for the work.

There will be tasks where extra reasoning is worth the cost. Our takeaway is to use smaller models for routine work, escalate when the task is complex and check whether the stronger model actually changed the outcome. That discipline will be crucial for maintaining solutions once they are in production.
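For teams that want to see what that takeaway might look like in practice, here is a minimal routing sketch in Python. Everything in it is an assumption for illustration: the model names, the task categories (borrowed from the examples above), the token threshold and the helper names are hypothetical, not a recommendation for any specific vendor or API.

```python
from dataclasses import dataclass

# Hypothetical model tiers; substitute whatever your organization has approved.
SMALL_MODEL = "small-model"
LARGE_MODEL = "large-model"

# Illustrative routine task categories, taken from the examples in the text.
ROUTINE_TASKS = {
    "meeting_recap",
    "document_summary",
    "ticket_classification",
    "first_pass_email",
}


@dataclass
class Task:
    kind: str                # e.g. "meeting_recap" or "code_migration"
    context_tokens: int = 0  # rough size of the prompt plus attached context


def choose_model(task: Task, escalate_above_tokens: int = 20_000) -> str:
    """Send routine, small-context work to the smaller model; escalate the rest.

    The 20,000-token threshold is an arbitrary placeholder, not a tuned value.
    """
    if task.kind in ROUTINE_TASKS and task.context_tokens <= escalate_above_tokens:
        return SMALL_MODEL
    return LARGE_MODEL


def changed_rate(reviewer_flags: list[bool]) -> float:
    """Share of spot-checked tasks where a reviewer judged that the stronger
    model materially changed the outcome. Returns 0.0 when nothing was checked."""
    if not reviewer_flags:
        return 0.0
    return sum(reviewer_flags) / len(reviewer_flags)
```

A call such as `choose_model(Task("meeting_recap", 1200))` stays on the smaller model, while an unlisted task kind or an oversized context escalates. If `changed_rate` stays low for a category of escalated work, that is a hint the category may not need the larger model.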

AI Is Changing the Work Before It Changes the Org Chart

The jobs conversation also became a little more grounded this month. Sam Altman said AI has not caused the near-term “jobs apocalypse” he once feared. That does not mean AI is having no effect on work. It just means the effects are showing up in the normal, uneven way work changes inside companies. Less dramatic than the headlines, but still real.

Most of the change seems to happen at the task level first. Drafting gets faster. Review gets more important. Some junior work gets reshaped. Managers have to get better at evaluating work that was partly produced by a model. The org chart may not look very different yet, but the way work moves through the team can change quite a bit.

For business leaders, the useful question is not whether a job title disappears next quarter. It is which parts of the work are becoming cheaper, which parts now require more review and which people have the judgment to use AI well. When generation gets cheaper, expertise does not stop mattering. In a lot of cases, it matters more.

The Benchmarks Are Starting to Look More Like Real Engineering

DeepSWE is one of the more useful benchmark developments we have been watching. It focuses on long-horizon software engineering tasks across real repositories, multiple languages and more realistic development environments.

That is a better test of the work we actually care about. A lot of coding benchmarks have been useful, but they can feel a little too clean compared with a real repo. Real engineering usually means finding the right part of the codebase, understanding the existing patterns, changing more than one file, not breaking old behavior and leaving the next person with something they can maintain.

For teams adopting AI coding tools, that changes what “good” should mean. The question should not stop at “did the code run?” It should also ask whether the agent understood the repo, respected the architecture, added useful tests and avoided making the system harder to own later. These tools are getting better. The merge button still needs a human being with judgment behind it.

Agent Governance Is Moving From Policy to Operations

Agent governance is starting to feel less theoretical. Gartner recently warned that a meaningful share of enterprises may demote or decommission autonomous agents because governance gaps show up only after production incidents. That sounds dramatic, but the underlying issue is familiar: teams move quickly on prototypes and slower on the operating model around them.

The risk changes pretty quickly depending on what the agent is allowed to touch. An assistant that summarizes internal documentation is one thing. An agent that can update records, send messages, call APIs or trigger business workflows is something else. Once an agent can take action, teams need to know what data it can see, what systems it can reach and how painful the mistake would be.

A useful way to start is to sort agents by what they are allowed to do: observe, recommend, act with approval or act on their own. Each level needs different controls around permissions, logging, monitoring, human review, rollback and incident response. Logging is not usually the first thing people want to talk about in an AI pilot. It becomes much more interesting the first time someone asks, “What exactly did the agent do?”
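One way to make that sorting concrete is to write it down as data. The Python sketch below is a rough illustration, not a standard: the four levels come from the paragraph above, but which controls attach to which level is our assumption for the example, and the names `Autonomy`, `REQUIRED_CONTROLS` and `missing_controls` are hypothetical.

```python
from enum import IntEnum


class Autonomy(IntEnum):
    """The four levels described above, ordered from least to most autonomous."""
    OBSERVE = 1
    RECOMMEND = 2
    ACT_WITH_APPROVAL = 3
    ACT_AUTONOMOUSLY = 4


# Illustrative mapping only; each organization should decide its own controls.
REQUIRED_CONTROLS = {
    Autonomy.OBSERVE: {"permissions", "logging"},
    Autonomy.RECOMMEND: {"permissions", "logging", "human_review"},
    Autonomy.ACT_WITH_APPROVAL: {
        "permissions", "logging", "monitoring", "human_review", "rollback",
    },
    Autonomy.ACT_AUTONOMOUSLY: {
        "permissions", "logging", "monitoring", "rollback", "incident_response",
    },
}


def missing_controls(level: Autonomy, in_place: set[str]) -> set[str]:
    """Return the controls required at this level that are not yet in place."""
    return REQUIRED_CONTROLS[level] - in_place
```

Running `missing_controls(Autonomy.ACT_WITH_APPROVAL, {"permissions", "logging"})` would flag monitoring, human review and rollback as gaps to close before the agent is allowed to act.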

Also From Lander Analytics This Month

Each week, our team publishes a blog post on whatever we have been working through: AI, infrastructure, machine learning, statistics and, sometimes, sports. Here’s a quick recap of what went live this month:

DuckDB’s Quack Protocol Solves the Problem I Kept Working Around (June 23): Gus Lipkin wrote about DuckDB’s new Quack protocol and why it may solve one of the practical issues he kept running into: letting multiple processes write to the same DuckDB database without falling back to Postgres or folders full of Parquet files.
DGX Spark Series (Part 4): Setting up the Hermes Agentic Assistant (June 16): Jared Lander walked through building a private, local agentic assistant on a DGX Spark using Hermes, Ollama, Docker and scoped access to tools like email, Google Drive, Calendar and GitHub. The bigger point was how teams can give an assistant useful capabilities without giving up control over data, permissions and auditability.
What Was More Improbable: The Knicks’ Game 4 Comeback or the Patriots’ 28-3 Rally? (June 11): Mike Band used win probability models to compare two absurd comebacks, with the usual reminder that model outputs are useful but not truth machines.
DGX Spark Series (Part 3): When the Wrong-Sized GPU Is the Right Call (June 9): Joe Marlo walked through serving Chronos-2 from an R-friendly forecasting pipeline, and why “oversized” local AI infrastructure can make sense when the box is treated as shared infrastructure.
Zero Trust’s Blind Spot: The Unmanaged Package Manager (June 1): Travis Knoche wrote about how R and Python package installs can create software supply-chain risk, and why governed mirrors, frozen snapshots and lockfiles belong in the Zero Trust conversation.

What We Are Still Thinking About

The AI conversation still swings between two easy stories. One says everything is accelerating so fast that every organization has to reinvent itself immediately. The other says the whole thing is hype and the slowdown is finally here. We do not think either frame is especially useful.

The useful story is somewhere in the middle. Models are still getting better, but the surrounding work is getting harder to skip. Costs need to be watched. Data-retention policies need to be understood. Benchmarks need to look more like the work people actually do. And once agents move into real workflows, governance and monitoring stop being abstract concerns.

That is probably where organizations should spend their attention. Keep experimenting, but keep the boring questions close by. Which model is optimal? What data can it see? Who reviews the output? What does it cost? How do we monitor it? And what happens when it is wrong?

We believe preparing for those questions is crucial to the stability of the solutions that last over the long run.

The Lander Analytics Team

Subscribe to our Substack, and sign up below for our monthly emails, to get practical AI strategies for your organization: what to build, what to avoid, and how to make systems reliable in the real world.

Work with us: If you want help identifying the right first workflow, building a permissioned knowledge base, or training your team to ship responsibly, reach out through Lander Analytics.