
    Retrieval Is the New Intelligence

    Nicolas Corod · February 19, 2026 (updated March 31, 2026) · 6 min read

    The Smartest Person With the Wrong Toolbox

    You hire a contractor to fix a leaky faucet. She shows up with 500 tools in her truck. She knows how to use every single one. But she grabs a saw instead of a wrench. The faucet doesn't get fixed. Not because she lacks skill. Because she picked the wrong tool.

    This is what happens inside AI agents every day. An AI agent takes your request, picks a tool from its available set, and runs it. When it works, it feels like magic. When it doesn't, the failure is specific: it grabbed the wrong tool. Or it didn't know the right one existed.

    The industry has spent years making models smarter. Better at reasoning, writing code, understanding language. But a quieter problem has grown in the background. As agents gain access to more tools, finding the right one becomes harder. There's a name for this problem: retrieval.

    What Is Retrieval?

    Retrieval is the process of finding and selecting the right tool for a given task. When you ask an AI assistant to "make me a presentation," the agent needs to figure out which tool handles slide decks. When you say "summarize this document," it reaches for a different tool. When you say "clean this up," it has to interpret what you mean and then match that interpretation to a specific capability.

    The AI doesn't return ten options for you to choose from. It picks one tool and runs it. If it picks wrong, you get a bad result and you might not understand why.
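This commit-to-one behavior can be sketched in a few lines. The catalog, the tool names, and the token-overlap scoring below are toy assumptions for illustration; production systems typically score with embeddings, but the shape of the decision is the same: rank, take the top match, run it.

```python
# A toy tool catalog. Names and descriptions are hypothetical.
TOOLS = {
    "make_slides": "create a slide deck presentation",
    "summarize_doc": "summarize a document into key points",
    "clean_data": "remove duplicates and fix formatting in a dataset",
}

def retrieve(request: str) -> str:
    """Return the name of the single best-matching tool.

    Token overlap stands in for real semantic scoring. Note that the
    function returns exactly one tool: the agent commits to it, and the
    user never sees the ranked alternatives.
    """
    req_tokens = set(request.lower().split())

    def score(item):
        name, desc = item
        return len(req_tokens & set(desc.lower().split()))

    return max(TOOLS.items(), key=score)[0]
```

If the scorer ranks the wrong tool first, there is no second chance: the bad pick propagates straight into the result.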

    This problem is gaining serious attention in the research community. Researchers at Stanford and Harvard recently published a framework analyzing why agentic AI systems break down in practice, identifying tool retrieval as a core failure point. The ToolBench project, spotlighted at ICLR 2024, built a benchmark of over 16,000 real-world APIs and found that even advanced models struggle with retrieval accuracy as tool catalogs grow. More recently, MCP-Bench tested agents across 250 tools and confirmed that retrieving the right tool from vague instructions remains one of the hardest unsolved challenges in AI agent design.

    Why It Breaks Down

    The most dangerous retrieval failure is the one you never see. The agent doesn't pick the wrong tool. It picks no tool at all, because it doesn't know the right one exists. You ask it to "check this contract for risky clauses." It has a specialized legal review tool, but the retrieval system doesn't surface it. So the agent uses a generic text tool instead. You get a mediocre answer and assume the AI isn't capable. In reality, the perfect tool was sitting unused in its toolbox. This is the invisible failure, and it erodes trust faster than any visible error.

    Behind this, several forces make retrieval hard. Every tool comes with a description. Think of it as a label on a jar. The agent reads these labels to decide which tool to grab. But labels are written by humans. Humans are inconsistent. One tool says "generate DOCX files." Another says "create professional documents." A third says "write formatted reports." All three do similar things with different words. The agent has to figure out that your request for "a polished memo" matches any of them.
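The jar-label problem is easy to reproduce. In the sketch below, the three labels are taken from the paragraph above, while the synonym map is a hypothetical stand-in for what embedding-based semantic matching provides. Pure lexical overlap scores "a polished memo" at zero against all three tools; expanding tokens through synonyms is what rescues the match.

```python
# Three tools that do similar things, described in different words.
LABELS = {
    "docx_gen": "generate DOCX files",
    "pro_docs": "create professional documents",
    "report_writer": "write formatted reports",
}

# Hypothetical synonym table; in practice an embedding model plays this role.
SYNONYMS = {
    "memo": {"documents", "reports", "files"},
    "polished": {"professional", "formatted"},
}

def lexical_score(request: str, label: str) -> int:
    """Count exact word overlaps between request and label."""
    return len(set(request.lower().split()) & set(label.lower().split()))

def semantic_score(request: str, label: str) -> int:
    """Count overlaps after expanding each request word through synonyms."""
    label_tokens = set(label.lower().split())
    hits = 0
    for tok in request.lower().split():
        if ({tok} | SYNONYMS.get(tok, set())) & label_tokens:
            hits += 1
    return hits
```

The lexical scorer sees three unrelated tools; the semantic scorer sees three plausible candidates. That gap is exactly the translation work retrieval has to do.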

    Scale compounds the problem. An agent with five tools makes decisions quickly. An agent with 200 tools faces overlap, ambiguity, and a flooded context window. Every tool description takes up processing space. Some systems show the agent only a subset of tools. But what if the right tool wasn't in the subset?
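The subset risk is concrete enough to demonstrate. Below, a toy catalog is shortlisted to the top k descriptions before the model ever sees them; the catalog and the overlap-based ranker are illustrative assumptions. Phrase the request in the tool's own vocabulary and the right tool ranks first; phrase it naturally and the tool falls below the cutoff, so the agent never learns it exists.

```python
# Toy catalog; names and descriptions are hypothetical.
CATALOG = {
    "text_generic": "work with text",
    "doc_reader": "read documents and text files",
    "legal_review": "flag risky clauses in contracts",
    "spreadsheet": "edit spreadsheet cells",
    "pdf_export": "export pages to PDF",
}

def shortlist(request: str, k: int = 2) -> list[str]:
    """Keep only the k best-scoring tools to save context space.

    Anything below the cutoff is invisible to the agent, even if it is
    the perfect tool for the job.
    """
    req = set(request.lower().split())
    ranked = sorted(
        CATALOG,
        key=lambda name: len(req & set(CATALOG[name].split())),
        reverse=True,
    )
    return ranked[:k]
```

With k=2, "look over this agreement" produces a shortlist that omits legal_review entirely, while "flag risky clauses" puts it first. Same tool, same need, opposite outcomes.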

    Then there's the human side. People speak in ways that don't map neatly to tool descriptions. "Fix my data" could mean remove duplicates, correct formatting, fill gaps, or restructure the file. Each requires a different operation. The retrieval system bears the full weight of this translation gap between how people talk and how tools are labeled.

    The Multi-Tool Puzzle

    All of the above applies to tasks that need a single tool. Many real tasks need several, working together in sequence. You want to read a PDF, extract a table, clean the data, and export it to a spreadsheet. That's four tools, one after another.

    The agent needs to plan the full chain before it starts. It needs to retrieve tools it hasn't used yet, for steps it hasn't taken yet. The common failure is that the agent completes step one and then struggles to find the right tool for step two. Each handoff is a new retrieval problem. Errors compound across steps.
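A back-of-the-envelope calculation shows how fast errors compound. If each retrieval step independently succeeds with probability p, an n-step chain succeeds with p to the n. The 90% figure below is illustrative, not a measured number.

```python
def chain_success(p_step: float, n_steps: int) -> float:
    """Probability that every retrieval in an n-step chain succeeds,
    assuming each step succeeds independently with probability p_step."""
    return p_step ** n_steps

# 90% per step sounds reliable, but across the four-tool PDF-to-spreadsheet
# chain it drops to roughly 66%: one run in three fails somewhere.
```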

    Where Things Are Heading

    The industry is starting to take retrieval seriously. A few directions are forming.

    One approach is composable skills. Instead of treating each tool as standalone, systems allow tools to be combined like plugins. A "read PDF" skill connects to a "clean data" skill connects to an "export spreadsheet" skill. The agent doesn't retrieve each piece from scratch. It retrieves a composed workflow. Small, focused units that snap together based on what the task requires.
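The snap-together idea can be sketched as ordinary function composition. The three skills below are hypothetical stand-ins (the "PDF reader" just returns canned rows), but the structural point is real: the agent retrieves one composed workflow instead of retrieving each piece mid-task.

```python
def read_pdf(path: str) -> list[list[str]]:
    # Stand-in for a real PDF extractor: returns messy raw rows.
    return [["Q1", " 10 "], ["Q1", " 10 "], ["Q2", "20"]]

def clean_data(rows: list[list[str]]) -> list[list[str]]:
    # Strip whitespace and drop duplicate rows.
    seen, out = set(), []
    for row in rows:
        key = tuple(cell.strip() for cell in row)
        if key not in seen:
            seen.add(key)
            out.append(list(key))
    return out

def export_sheet(rows: list[list[str]]) -> str:
    # Stand-in for a spreadsheet writer: emit CSV text.
    return "\n".join(",".join(r) for r in rows)

def compose(*skills):
    """Snap small skills into one workflow: each output feeds the next input."""
    def workflow(x):
        for skill in skills:
            x = skill(x)
        return x
    return workflow

pipeline = compose(read_pdf, clean_data, export_sheet)
```

Retrieval now happens once, at the workflow level, instead of three more times at each handoff.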

    Another direction is better organization through what some call a living know-how graph. Instead of a flat list of 200 tools, skills and tools are mapped into a structured, evolving graph of relationships. The graph captures which tools relate to each other, which ones compose well, which ones cover similar ground, and how they've performed in past tasks. "Living" is the key word: when a new skill is added, the graph incorporates it. When an existing tool underperforms or becomes redundant, the graph restructures itself. It learns from usage patterns and adapts over time.
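A minimal sketch of the idea, assuming a deliberately simple schema: tools as nodes, weighted edges for "composes well with," and an update rule that strengthens edges on successful handoffs and weakens them on failures. This is a toy model of the concept, not any platform's actual data structure, and the 1.1/0.9 multipliers are arbitrary illustrative choices.

```python
from collections import defaultdict

class KnowHowGraph:
    """Toy 'living' graph: edge weights adapt as tools are used together."""

    def __init__(self):
        # tool -> {related_tool: weight}
        self.edges = defaultdict(dict)

    def relate(self, a: str, b: str, weight: float = 1.0) -> None:
        """Record that tool a composes with tool b."""
        self.edges[a][b] = weight

    def record_outcome(self, a: str, b: str, success: bool) -> None:
        """The 'living' part: reweight an edge based on how the handoff went."""
        w = self.edges[a].get(b, 1.0)
        self.edges[a][b] = w * (1.1 if success else 0.9)

    def next_tools(self, tool: str) -> list[str]:
        """Candidates for the next step, strongest relationship first."""
        return sorted(self.edges[tool], key=self.edges[tool].get, reverse=True)

# Usage: after one successful read_pdf -> extract_table handoff, that edge
# outranks the alternatives.
graph = KnowHowGraph()
graph.relate("read_pdf", "extract_table")
graph.relate("read_pdf", "summarize")
graph.record_outcome("read_pdf", "extract_table", success=True)
```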

    This is the infrastructure layer that most AI agent platforms are missing today. The model itself is the brain. The tools are the hands. Without a retrieval system that acts as an intelligent, evolving index, the brain keeps grabbing the wrong hands.

    A well-maintained know-how graph becomes the connective tissue between what the user needs and what the agent knows how to do. Platforms like Skilder are building this infrastructure: a system where skills are structured, composable, and governed through a living graph that agents query in real time. The goal is to make retrieval a managed, scalable layer rather than an afterthought.

    The next frontier in AI agents is not making them smarter. It is giving them the right infrastructure to find, compose, and deploy the right skills at the right time.

    Retrieval is the new intelligence.
