Cut AI token usage by 96%? Here’s how AWS Strands Agents does it.

In this episode of The New Stack Makers, AWS developer advocate Morgan Willis demonstrates Strands Agents, an open source agentic framework that has seen rapid adoption since its launch.

For this episode of The New Stack Makers, I sat down with AWS developer advocate Morgan Willis to talk about Strands Agents, the company’s open source agentic framework, which has seen over 14 million downloads since it launched just under a year ago. Willis brought a hands-on demo built around a simple accounting API to show what building with Strands looks like in practice.

The demo walks through three iterations of the same task: looking up the latest invoice for a customer. First, Willis mapped each API endpoint directly to an agent tool, the way most developers would by default. The agent needed five chained API calls and burned roughly 52,000 tokens. Then she swapped in intent-based tools, built around an outcome rather than a data operation. Answering the same query now took a single tool call and only about 2,000 tokens.
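Willis's demo code isn't reproduced in the episode, but the first iteration corresponds to wrapping each endpoint in its own tool. Here is a minimal sketch using the Strands Python SDK's `Agent` and `@tool` decorator; the accounting API base URL, endpoint paths and field names are placeholders, not the ones from the demo:

```python
import requests

from strands import Agent, tool

BASE_URL = "https://accounting.example.com"  # hypothetical accounting API


@tool
def get_customer_by_name(name: str) -> dict:
    """Return the customer record matching a company name."""
    return requests.get(f"{BASE_URL}/customers", params={"name": name}).json()


@tool
def list_invoices(customer_id: str) -> list:
    """Return all invoice summaries for a customer ID."""
    return requests.get(f"{BASE_URL}/customers/{customer_id}/invoices").json()


@tool
def get_invoice(invoice_id: str) -> dict:
    """Return the full detail of a single invoice."""
    return requests.get(f"{BASE_URL}/invoices/{invoice_id}").json()


# One tool per endpoint: the model has to plan and chain the calls itself,
# and every intermediate response is added back into its context window.
agent = Agent(tools=[get_customer_by_name, list_invoices, get_invoice])
agent("What is the latest invoice for Acme Corp?")
```

Because each tool mirrors a data operation, the model must discover the right call sequence on its own, and every intermediate payload lands in the context, which is where the 52,000 tokens go.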

“It’s calling multiple APIs, but rolling them up into one intent-based tool for the agent that it’s going to have a better time using — and understanding when exactly to use it. […]


“Your agent is going to have a better time reasoning around what tool to use and when, because these tools are more aligned to a task and less aligned to data,” Willis tells The New Stack. “The fewer tools that you expose to your agent, the less likely it is to call the wrong one.”
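The intent-based version collapses that chain into a single tool whose name and description match the user's goal rather than a database operation. A sketch along the same lines, again with hypothetical endpoints and field names:

```python
import requests

from strands import Agent, tool

BASE_URL = "https://accounting.example.com"  # hypothetical accounting API


@tool
def get_latest_invoice(customer_name: str) -> dict:
    """Return the most recent invoice for the named customer."""
    # The chaining moves out of the model and into ordinary code: look up
    # the customer, list their invoices, then fetch only the newest one.
    customer = requests.get(
        f"{BASE_URL}/customers", params={"name": customer_name}
    ).json()
    invoices = requests.get(
        f"{BASE_URL}/customers/{customer['id']}/invoices"
    ).json()
    latest = max(invoices, key=lambda inv: inv["issued_at"])  # assumed date field
    return requests.get(f"{BASE_URL}/invoices/{latest['id']}").json()


# The tool is aligned to the task, so the agent makes one call and only
# the final invoice enters its context.
agent = Agent(tools=[get_latest_invoice])
agent("What is the latest invoice for Acme Corp?")
```

The orchestration logic is the same; it has simply moved from the model's reasoning into deterministic code, which is why the token count drops so sharply.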

Tools + semantic search

The third iteration moved those tools to a remote MCP server via AWS AgentCore Gateway and enabled semantic search across the tool catalog, so the agent received only the tools relevant to each query rather than the full set of 16. That cut token usage roughly in half again compared with loading everything.
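Connecting a Strands agent to a remote MCP server looks roughly like the following. The `MCPClient` import path and `list_tools_sync` call follow the Strands documentation as I understand it, and the gateway URL and bearer token are placeholders; the semantic filtering itself is gateway-side configuration, not something this client code implements:

```python
from mcp.client.streamable_http import streamablehttp_client

from strands import Agent
from strands.tools.mcp import MCPClient

# Hypothetical gateway endpoint and auth token; real values are issued
# when the gateway is created.
GATEWAY_URL = "https://example-gateway.gateway.bedrock-agentcore.us-east-1.amazonaws.com/mcp"
ACCESS_TOKEN = "..."

gateway = MCPClient(
    lambda: streamablehttp_client(
        GATEWAY_URL, headers={"Authorization": f"Bearer {ACCESS_TOKEN}"}
    )
)

with gateway:
    # With semantic search enabled on the gateway, the agent is handed only
    # the tools relevant to the query rather than the full 16-tool catalog.
    tools = gateway.list_tools_sync()
    agent = Agent(tools=tools)
    agent("What is the latest invoice for Acme Corp?")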

Willis says the broader principle at work here is that narrowly scoped agents tend to outperform general-purpose ones. 

“I think agents that are more narrowly defined tend to perform better than general use case agents. If you’re looking for context efficiency, speed, and accuracy, I would also look at your agent design as well.” 

Having many agents, each doing a small number of things, lets you design tools precisely for each use case rather than building a more general agent that tries to do everything. As MCP servers proliferate and tool catalogs grow, the question of which tools an agent actually sees on a given run is going to matter as much as the tools themselves.
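Whether a managed gateway does the filtering or your own harness does, the underlying idea is to rank the catalog against the query and hand the agent only the top few entries. The sketch below uses naive token overlap as a stand-in for real embedding-based search, with made-up tool descriptions:

```python
def select_tools(query: str, catalog: list[dict], k: int = 3) -> list[dict]:
    """Rank tools by token overlap between the query and each description;
    a toy stand-in for embedding-based semantic search."""
    query_terms = set(query.lower().split())

    def score(entry: dict) -> int:
        return len(query_terms & set(entry["description"].lower().split()))

    return sorted(catalog, key=score, reverse=True)[:k]


# Made-up catalog entries standing in for a larger tool set.
catalog = [
    {"name": "get_latest_invoice",
     "description": "Return the most recent invoice for a named customer"},
    {"name": "record_payment",
     "description": "Record a payment against an open invoice"},
    {"name": "create_expense_report",
     "description": "Create an expense report for an employee"},
]

# Only the best-matching tool is exposed to the agent for this query.
print(select_tools("What is the latest invoice for Acme Corp?", catalog, k=1))
```

The selection step runs before the model sees anything, so a growing catalog adds to the index rather than to every prompt.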
