The Anatomy of an LLM Tool Call
- Joe Marlo
- Apr 28
- 3 min read
How a prompt engineering trick unlocked agentic AI

Tool calling is one of those concepts that sounds more complicated than it is. At its core, it started as a clever prompt engineering trick to solve a very specific problem: LLMs couldn't answer questions about current events because their knowledge stopped at their training cutoff.
Ask an early LLM "What's the weather in New York right now?" and it would hedge, telling you it only knew data as of its last training cutoff. Somebody had a simple but consequential idea: what if, instead of answering the question, the LLM could request a weather API call?
The Mechanics
The original implementation was surprisingly straightforward. You expanded the system prompt to tell the LLM: "Here are the tools you have access to. If you'd like to use one, respond in XML with the tool name and arguments. If you don't need a tool, just answer the question."
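In practice, that expanded system prompt looked something like the sketch below. The tag names and tool list are illustrative; every team rolled its own format.

```python
SYSTEM_PROMPT = """You are a helpful assistant with access to these tools:

- get_weather: returns the current weather. Arguments: location (string).

If you need a tool, respond ONLY with XML in this exact format:

<tool_call>
  <name>get_weather</name>
  <arguments><location>New York</location></arguments>
</tool_call>

If you don't need a tool, just answer the question."""
```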
A user asks "What's the weather in New York right now?" The LLM reads the question, recognizes it needs current data, and instead of returning a prose answer it returns structured XML requesting a call to a weather API with the argument location=New_York. Your application parses that XML and actually calls the weather API. Then you take the API results and append them to the conversation history, just like adding a new message in a chatbot thread. The LLM now sees the original question plus the fresh weather data, and it can compose a final answer grounded in real information.
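Here is roughly what that single round trip looks like in code. This is a sketch, not a provider API: the XML shape matches the hypothetical prompt above, and `get_weather` stands in for a real weather service.

```python
import xml.etree.ElementTree as ET

def get_weather(location: str) -> str:
    """Stand-in for a real weather API call."""
    return f"57°F and overcast in {location}"

messages = [{"role": "user", "content": "What's the weather in New York right now?"}]

# Hypothetical LLM response: it chose a tool instead of answering
llm_reply = """<tool_call>
  <name>get_weather</name>
  <arguments><location>New York</location></arguments>
</tool_call>"""

root = ET.fromstring(llm_reply)                 # parse the structured request
tool_name = root.findtext("name")               # -> "get_weather"
location = root.findtext("arguments/location")  # -> "New York"

if tool_name == "get_weather":
    result = get_weather(location)
    # Append the model's request and the tool result to the history,
    # just like adding new messages to a chatbot thread
    messages.append({"role": "assistant", "content": llm_reply})
    messages.append({"role": "user", "content": f"Tool result: {result}"})
# The next LLM call sees the original question plus fresh weather data
```

The key move is those two appends: tool results enter the conversation as ordinary messages, so the model needs no special machinery to read them.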
If the results aren't sufficient, the LLM can request another tool call instead of answering. That loop (user question, LLM requests tool, application calls tool, results go back to LLM, LLM responds or requests another tool) is still the fundamental pattern behind every agentic workflow today.
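Wrapped in a loop, the pattern is only a few lines. `call_llm` and `execute_tool` here are hypothetical placeholders for your provider call and your tool dispatcher, and the iteration cap is a common safety valve rather than part of the original trick.

```python
def agent_loop(question: str, max_turns: int = 5) -> str:
    """Keep calling tools until the model answers in prose."""
    messages = [{"role": "user", "content": question}]
    for _ in range(max_turns):
        reply = call_llm(messages)    # hypothetical: any chat completion call
        if "<tool_call>" not in reply:
            return reply              # prose answer: we're done
        messages.append({"role": "assistant", "content": reply})
        result = execute_tool(reply)  # hypothetical: parse the XML, run the tool
        messages.append({"role": "user", "content": f"Tool result: {result}"})
    return "Stopped: too many tool calls."
```

Nothing here hardcodes which tools run or in what order; the model's replies drive the control flow.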
It is also one of the three defining characteristics of agentic AI, alongside multiple LLM calls and nondeterministic workflows. That last one is worth unpacking. Traditional software is deterministic: do step A, then step B, then step C, with some if-then-else branching that a developer has mapped out in advance. In an agentic system, the LLM decides what to do next based on the results it has seen so far. The sequence of tool calls is not hardcoded. Two identical queries might produce different tool call sequences on different runs.
Four Phases of Evolution
This whole concept is only a few years old, but it has moved through distinct phases.
Phase 1: Manual XML prompting. You wrote the tool descriptions into your prompts yourself, parsed the XML responses, called the tools, and managed the whole loop – functional, but messy.
Phase 2: Frameworks. Libraries like LangChain abstracted away the XML formatting, response parsing, and tool execution. You defined your tools and the framework handled the plumbing.
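With a framework, the weather tool from earlier can shrink to a decorated function. A minimal sketch using LangChain's `@tool` decorator (the API has shifted across versions, and `fetch_weather` is a hypothetical wrapper):

```python
from langchain_core.tools import tool

@tool
def get_weather(location: str) -> str:
    """Get the current weather for a location."""
    return fetch_weather(location)  # hypothetical weather API wrapper

# The framework builds the tool description from the signature and docstring,
# formats it into the request, and parses the model's tool-call responses.
llm_with_tools = chat_model.bind_tools([get_weather])  # chat_model: any LangChain chat model
```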
Phase 3: Native API support. Starting with OpenAI in June 2023 and followed by Anthropic and others, the major providers built tool calling directly into their APIs. You pass a structured list of tools with your API call, and the model returns structured JSON (not XML) when it wants to invoke one. This wasn't just a developer experience improvement — these companies also specifically trained their models to be better at deciding when and how to call tools. On the research side, Meta's Toolformer paper from early 2023 had already demonstrated that LLMs could learn tool use in a self-supervised way, signaling that tool calling was becoming a core model capability, not just an application-layer trick.
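A sketch of the native style against OpenAI's Python SDK (the model name is illustrative):

```python
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

tools = [{
    "type": "function",
    "function": {
        "name": "get_weather",
        "description": "Get the current weather for a city.",
        "parameters": {
            "type": "object",
            "properties": {"location": {"type": "string"}},
            "required": ["location"],
        },
    },
}]

response = client.chat.completions.create(
    model="gpt-4o",  # illustrative model name
    messages=[{"role": "user", "content": "What's the weather in New York right now?"}],
    tools=tools,
)

call = response.choices[0].message.tool_calls[0]
print(call.function.name)       # "get_weather"
print(call.function.arguments)  # '{"location": "New York"}' -- JSON, not XML
```

When the model decides no tool is needed, the message comes back with ordinary prose content instead of tool_calls.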
Phase 4: User-facing tool selection. In Phases 1 through 3, developers wire up each tool manually: writing the function, describing it for the model, handling the execution. The emerging phase flips this. End users pick from a pre-built catalog of available tools without writing any code. Anthropic's Model Context Protocol (MCP) is standardizing how tools are described and connected, which makes that kind of plug-and-play catalog possible.
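To make plug-and-play concrete: an MCP client discovers a server's tools at runtime instead of having a developer wire them in. A sketch using the official `mcp` Python SDK (the server package name is a placeholder):

```python
import asyncio
from mcp import ClientSession, StdioServerParameters
from mcp.client.stdio import stdio_client

# Placeholder: any MCP server installed locally
server = StdioServerParameters(command="npx", args=["-y", "@example/weather-mcp-server"])

async def main() -> None:
    async with stdio_client(server) as (read, write):
        async with ClientSession(read, write) as session:
            await session.initialize()
            # Discover the server's tools at runtime -- no per-tool wiring
            tools = await session.list_tools()
            print([t.name for t in tools.tools])

asyncio.run(main())
```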
The Takeaway
The whole thing traces back to one good idea: ask the LLM to produce structured markup instead of an answer. That single trick opened up a world where LLMs could access current information, interact with APIs, update databases, and orchestrate multi-step workflows. It is a straightforward concept that enables a genuinely new category of software.
Joe Marlo
Director of Data Science
Lander Analytics
Subscribe to our Substack below for monthly emails with practical AI strategies for your organization: what to build, what to avoid, and how to make systems reliable in the real world.
Work with us: If you want help identifying the right first workflow, building a permissioned knowledge base, or training your team to ship responsibly, reach out at info@landeranalytics.com.
About the author: Joe Marlo is Director of Data Science at Lander Analytics, where he designs agentic workflows, statistical models, and interactive frontends that put rigorous analysis into production.


