Gemini API File Search is now multimodal

Gemini API File Search is now multimodal: build efficient, verifiable RAG

May 05, 2026

We’re introducing three major updates to the Gemini API File Search tool: multimodal support, custom metadata and page-level citations. These features help developers bring structure to unstructured data for efficient, verifiable RAG.

By Ivan Solovyev (Product Manager, Google DeepMind) & Kriti Dwivedi (Software Engineer)

Today, we are expanding the Gemini API’s File Search tool. You can now build retrieval-augmented generation (RAG) systems with multimodal data and custom metadata. We’re also introducing page citations to improve grounding and transparency.

Whether you are prototyping a weekend project or scaling a production application for thousands of users, your RAG systems can now natively process and better organize your text and visual data.

Give your apps a photographic memory

File Search now processes images and text together. Powered by the Gemini Embedding 2 model, the tool understands image data natively, giving your agents contextual awareness of visual content as well as text.

Think of a creative agency trying to dig up a specific visual asset. Instead of relying on keywords or filenames, your app can search an entire archive for an image matching a specific emotional tone or visual style described in a natural language brief.
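
To make the idea of finding an image from a written brief more concrete, here is a minimal sketch of retrieval in a shared embedding space. The three-dimensional vectors are hand-written toy values, not outputs of Gemini Embedding 2, and the asset names and the `rank_assets` helper are hypothetical. File Search performs the embedding and ranking for you; this only illustrates the principle.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


# Toy image "embeddings" (hypothetical values chosen by hand).
ASSETS = {
    "sunset_beach.png": [0.9, 0.1, 0.2],
    "office_meeting.png": [0.1, 0.9, 0.3],
    "rainy_street.png": [0.2, 0.3, 0.9],
}


def rank_assets(query_vec, assets=ASSETS):
    """Return asset names ordered from most to least similar to the query."""
    return sorted(assets, key=lambda name: cosine(query_vec, assets[name]), reverse=True)


# Stands in for the embedding of a brief such as "warm, calm, golden light".
brief_vec = [0.8, 0.2, 0.1]
ranking = rank_assets(brief_vec)
print(ranking[0])  # sunset_beach.png
```

Because the text brief and the images live in the same vector space, no filename or keyword has to match for the closest asset to surface.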

See how developers are already using it:

“K-Dense Web is an AI co-scientist that autonomously executes complex multi-step workflows across science, engineering, healthcare, and finance. We’re building a unified visual memory to enable researchers to search across mixed modalities, from Western blots and microscopy images to agent-generated plots, in one query. Early testing with File Search …

“The new multimodal capabilities in the Gemini API are genuinely impressive. For a product like ours that combs through a massive, diverse library of GIFs, semantic retrieval quality is pivotal, and with this update, we …

“At Code Fundi, we provide the context layer for autonomous engineering. We solve the ‘Context Window Bottleneck’ by distilling massive, noisy repositories into logic-dense, LLM-ready markdown. Using the gemini-embedding-2 model to index a massive public pool of architectural diagrams, ERDs, and sequence diagrams from top open-source projects, we provide agents with a ‘photographic memory’ of how the world …

Filter the noise with custom metadata

Dumping files into a database is easy. Finding the right one at scale is the real challenge. Custom metadata allows you to attach key-value labels to your unstructured data, such as department: Legal or status: Final.

By applying metadata filters at query time, your application can scope requests to the data slice required. This significantly reduces noise from irrelevant documents, increasing both the speed and accuracy of your RAG workflows.
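
As a rough illustration of what scoping by metadata means, the sketch below filters an in-memory list of documents by key-value labels before any retrieval would run. It does not call the Gemini API; the `Doc` class, the `filter_docs` helper and the file names are hypothetical, and File Search applies its own filters on the service side.

```python
from dataclasses import dataclass, field


@dataclass
class Doc:
    name: str
    metadata: dict = field(default_factory=dict)


def filter_docs(docs, **required):
    """Keep only documents whose metadata matches every required key-value pair."""
    return [
        d for d in docs
        if all(d.metadata.get(key) == value for key, value in required.items())
    ]


docs = [
    Doc("nda_v3.pdf", {"department": "Legal", "status": "Final"}),
    Doc("nda_v2.pdf", {"department": "Legal", "status": "Draft"}),
    Doc("q3_plan.pdf", {"department": "Marketing", "status": "Final"}),
]

# Scope the search to finalized Legal documents only.
scoped = filter_docs(docs, department="Legal", status="Final")
print([d.name for d in scoped])  # ['nda_v3.pdf']
```

Only the documents that survive the filter need to be searched, which is where the reduction in noise comes from.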

Show your work with page citations

When your application pulls an answer from a massive PDF, users need to verify exactly where that answer came from.

File Search now ties the model’s response directly to the original source. It captures the page number for every piece of indexed information. This level of granularity allows you to point users directly to the right spot, which helps build trust and makes your tool immediately useful for rigorous fact-checking.
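
One way an application might surface page-level citations to users is sketched below. The `Citation` record and the `format_citations` helper are hypothetical and do not reflect the API's actual response schema; the sketch assumes you have already extracted a document name and page number for each cited passage.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Citation:
    document: str
    page: int


def format_citations(answer, citations):
    """Append a numbered source list to an answer, dropping duplicate citations."""
    unique = []
    for citation in citations:
        if citation not in unique:  # preserve first-seen order
            unique.append(citation)
    lines = [answer, "", "Sources:"]
    lines += [f"[{i}] {c.document}, p. {c.page}" for i, c in enumerate(unique, start=1)]
    return "\n".join(lines)


rendered = format_citations(
    "The notice period is 30 days.",
    [Citation("contract.pdf", 12), Citation("contract.pdf", 12), Citation("policy.pdf", 3)],
)
print(rendered)
```

With a page number attached to each source, the interface can link straight to the relevant page instead of to the top of a long PDF.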

Get started with File Search

We want to make it as easy as possible to store and retrieve the data that makes your ideas work. The File Search tool handles the heavy infrastructure so you can focus on building the product.

Uploading files and searching across them is simple:
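
The following is a minimal sketch of that flow, assuming the `google-genai` Python SDK and the File Search calls as described in the developer guide for earlier releases. Treat the method and parameter names as assumptions to verify against the current documentation. The `wait_for` and `upload_and_ask` helpers and the default store name are hypothetical, and the newer multimodal, metadata and citation options are not shown.

```python
import time


def wait_for(operation, refresh, poll_seconds=5.0):
    """Poll a long-running operation until its `done` flag is set."""
    while not operation.done:
        time.sleep(poll_seconds)
        operation = refresh(operation)
    return operation


def upload_and_ask(path, question, model, store_display_name="my-file-search-store"):
    """Create a store, index one file, and ask a question grounded in it."""
    # Imported lazily so this module loads even without the SDK installed.
    from google import genai
    from google.genai import types

    client = genai.Client()  # assumes the API key is set in the environment

    # Assumed call: create a File Search store to hold the indexed file.
    store = client.file_search_stores.create(
        config={"display_name": store_display_name}
    )

    # Assumed call: upload and index the file; returns a long-running operation.
    operation = client.file_search_stores.upload_to_file_search_store(
        file=path,
        file_search_store_name=store.name,
    )
    wait_for(operation, client.operations.get)

    # Ask the model a question with the File Search tool attached.
    response = client.models.generate_content(
        model=model,
        contents=question,
        config=types.GenerateContentConfig(
            tools=[
                types.Tool(
                    file_search=types.FileSearch(
                        file_search_store_names=[store.name]
                    )
                )
            ]
        ),
    )
    return response.text
```

A call such as `upload_and_ask("report.pdf", "What changed in Q3?", model=...)` would then return an answer grounded in the uploaded file. The file name and question are placeholders, and the model argument should be a Gemini model that supports File Search.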

Explore more code snippets in our developer guide and Gemini API documentation to learn how to build with File Search.
