The company that made RAG mainstream is now betting against it

Multiple arrows pointing in conflicting directions, illustrating the fragmented retrieval loops that Pinecone's Nexus aims to replace.

Pinecone just declared the RAG era over.

Pinecone built the vector database category. It defined RAG as the standard pattern for grounding language models. Some 800,000 developers and 9,000 paying customers learned how to chunk, embed, and retrieve on Pinecone’s infrastructure. With Nexus, a knowledge engine for agents unveiled on Monday, Pinecone is telling those same developers that the pattern they learned is the bottleneck.

The category Pinecone defined is the one it just declared obsolete

If you read how Pinecone frames Nexus, it’s hard to miss the company’s characterization of retrieval-at-inference as “the ten blue links era of agentic retrieval.”

  • It says agents stuck in retrieve-read-retrieve loops complete only 50 to 60 percent of tasks.
  • It says 85 percent of an agent’s effort goes to fetching context.

Pinecone’s argument is that handing raw chunks to a frontier model and hoping the model figures it out is fragile, slow, and expensive.
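The shape of the loop being critiqued is easy to sketch. Everything below is an invented stand-in, not Pinecone's API or any real agent framework: a toy keyword scorer plays the role of vector search, and the loop shows how each round fetches raw chunks and then naively re-queries on what it just read, re-spending tokens on the same material.

```python
# Toy sketch of the retrieve-read-retrieve pattern. All names here
# (search, agent_loop, the corpus) are hypothetical illustrations.

def search(query, corpus, k=3):
    """Keyword scorer standing in for vector similarity search."""
    words = query.split()
    scored = sorted(corpus,
                    key=lambda doc: sum(w in doc for w in words),
                    reverse=True)
    return [doc for doc in scored[:k] if any(w in doc for w in words)]

def agent_loop(task, corpus, max_rounds=3):
    """Each round retrieves chunks, 'reads' them, and retrieves again."""
    context, query = [], task
    for _ in range(max_rounds):
        chunks = search(query, corpus)
        if not chunks:
            break
        context.extend(chunks)   # context (and token spend) keeps growing...
        query = chunks[0]        # ...as the naive follow-up re-queries on it
    return context

corpus = [
    "vector search indexes embeddings for similarity",
    "agents burn tokens re-reading retrieved chunks",
    "compiled artifacts carry citations and types",
]
ctx = agent_loop("agents re-reading retrieved chunks", corpus)
```

In this toy run the loop retrieves the same chunk three times, which is exactly the redundancy Pinecone is pointing at: the model pays for the same context on every round.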

That description is RAG by another name. It is the pattern Pinecone has championed for the past four years through tutorials, training content, and an entire developer relations program. The vector database is now the substrate, not the surface. The product layer has moved up one floor.

This kind of self-disclosure is rare. Most infrastructure vendors keep selling the old thing while the market notices it is fading. Pinecone is the vendor saying it first.

Knowledge compilation is the new RAG

The actual shift is moving reasoning upstream. Instead of an agent fetching twenty chunks at query time and burning tokens to figure out what they mean, Nexus precompiles source data into typed, cited, task-specific artifacts. Agents query the artifacts, not the corpus. KnowQL, also released on Monday, gives them a vocabulary to do it: six primitives (intent, filter, provenance, output shape, confidence, and latency budget) combined in a single declarative call that returns a cited, structured response.
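To make the contrast concrete, here is what a declarative call over precompiled artifacts could look like. The syntax is invented for illustration based only on the six primitives named in the announcement; `knowql_query`, its parameters, and the artifact schema are hypothetical, not the real KnowQL API.

```python
# Hypothetical sketch of a KnowQL-style declarative query. The resolver
# and artifact format below are invented; only the six primitive names
# come from the announcement.

def knowql_query(artifacts, *, intent, filters, provenance,
                 output_shape, confidence, latency_budget_ms):
    """Resolve one declarative call against precompiled, typed artifacts."""
    hits = [a for a in artifacts
            if a["topic"] == intent
            and all(a.get(k) == v for k, v in filters.items())
            and a["confidence"] >= confidence]
    limit = output_shape.get("max_results", 5)
    return [{"answer": a["fact"],
             "citation": a["source"] if provenance else None}
            for a in hits[:limit]]

# Artifacts are compiled ahead of time, each carrying its own citation.
artifacts = [
    {"topic": "pricing", "region": "eu", "confidence": 0.94,
     "fact": "EU plan is billed monthly", "source": "docs/pricing.md#eu"},
    {"topic": "pricing", "region": "us", "confidence": 0.71,
     "fact": "US plan is billed annually", "source": "docs/pricing.md#us"},
]

result = knowql_query(
    artifacts,
    intent="pricing",
    filters={"region": "eu"},
    provenance=True,
    output_shape={"max_results": 1},
    confidence=0.9,
    latency_budget_ms=50,  # accepted but unused in this toy resolver
)
```

One call, one cited answer, no loop: the reasoning about what the chunks mean happened at compile time, which is the structural point of the announcement.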

Pinecone claims this lifts task completion above 90 percent and cuts token spend by 90 percent. Take those numbers with a grain of salt until production teams confirm them. The structural claim does not need them to hold: compile once, read many times is the right shape for agent workloads, and Pinecone is not the only vendor pointing in that direction.

The bigger pattern is reasoning moving upstream everywhere

Anthropic shipped skills as compiled, reusable context bundles. Cursor rules do the same job at the editor layer. Claude Code subagents pre-package context and tools per task. LangChain’s Harrison Chase has been calling this “context engineering” for months. Pinecone is now doing it at the retrieval layer.

The pattern is not novel. The vendor making this announcement is.

There is honest pushback. KnowQL has to clear the standardization bar SQL cleared, and standards do not get declared into existence by a single vendor. Vector search will not vanish either. Plenty of agent workloads still need cheap similarity over raw text. What changes is where the value sits in the stack.

If the next twelve months go the way Pinecone is betting, vector search becomes the plumbing, knowledge compilation becomes the product, and “RAG pipeline” becomes a phrase developers use the way we now use “LAMP stack.” A respectful past tense.

The version of this read I am most likely to be wrong about is timing. Categories take longer to die than the vendor declaring their death usually allows. But the direction is clear, and the company that built the category is the one pointing the way out of it.

The vendor that taught you RAG is now telling you to stop doing it.
