Gemini Flash model gets visual reasoning capability


Jan 27, 2026

Agentic Vision combines visual reasoning with code execution to ground answers in visual evidence, delivering a 5% to 10% quality boost across most vision benchmarks, Google said.

Google has added an Agentic Vision capability to its Gemini 3 Flash model, which the company said combines visual reasoning with code execution to ground answers in visual evidence. The capability fundamentally changes how AI models process images, according to Google.

Introduced January 27, Agentic Vision is available via the Gemini API in the Google AI Studio development tool and in Vertex AI, as well as in the Gemini app.

Agentic Vision in Gemini Flash converts image understanding from a static act into an agentic process, Google said. By combining visual reasoning and code execution, the model formulates plans to zoom in, inspect, and manipulate images step by step. Until now, multimodal models typically processed the world in a single, static glance. If they missed a small detail, such as a serial number or a distant sign, they were forced to guess, Google said. By contrast, Agentic Vision turns the task into an active investigation, introducing an agentic “think, act, observe” loop into image understanding tasks, the company said.
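The “think, act, observe” loop can be illustrated with a toy sketch. This is not Google's implementation and does not call the Gemini API: the "image" is a small grid of characters, and `plan_regions`, `crop`, `find_serial`, and `investigate` are hypothetical names chosen for illustration. It only shows the control flow described above, in which the program plans where to look, runs code to zoom in, and checks what the zoomed view reveals before answering.

```python
import re
from typing import List, Optional, Tuple

Image = List[str]                 # toy "image": rows of characters
Box = Tuple[int, int, int, int]   # (top, left, bottom, right), end-exclusive


def plan_regions(image: Image) -> List[Box]:
    """Think: decide where to look, here simply the four quadrants."""
    h, w = len(image), len(image[0])
    mh, mw = h // 2, w // 2
    return [(0, 0, mh, mw), (0, mw, mh, w), (mh, 0, h, mw), (mh, mw, h, w)]


def crop(image: Image, box: Box) -> Image:
    """Act: zoom in by cutting out the region inside the box."""
    top, left, bottom, right = box
    return [row[left:right] for row in image[top:bottom]]


def find_serial(view: Image) -> Optional[str]:
    """Observe: look for a run of four or more digits in the zoomed view."""
    for row in view:
        match = re.search(r"\d{4,}", row)
        if match:
            return match.group()
    return None


def investigate(image: Image) -> Tuple[Optional[str], int]:
    """Loop over the planned regions until one yields evidence or none remain."""
    steps = 0
    for box in plan_regions(image):
        steps += 1
        serial = find_serial(crop(image, box))
        if serial is not None:
            return serial, steps
    return None, steps


# Hypothetical input: a serial number tucked into the bottom-right corner.
IMAGE = [
    "........",
    "........",
    "........",
    "....7731",
]

serial, steps = investigate(IMAGE)
print(serial, steps)
```

The fixed quadrant plan is a simplification: a serial number that straddles two quadrants would be missed here, whereas the capability Google describes has the model choose its own regions.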

Agentic Vision allows a model to interact with its environment by annotating images. Instead of just describing what it sees, Gemini 3 Flash can execute code to draw directly on the canvas to ground its reasoning. Agentic Vision can also parse high-density tables and execute Python code to visualize findings. Future plans for Agentic Vision include adding more implicit code-driven behaviors, equipping Gemini models with more tools, and delivering the capability in more model sizes, extending it beyond Flash.
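As a rough illustration of what drawing on the canvas to ground reasoning might look like, the sketch below outlines a region of a toy character-grid canvas. The `draw_box` and `render` helpers are hypothetical and written only for this example. The article does not say which imaging code Gemini actually runs, so this should be read as an analogy and not as the model's method.

```python
from typing import List

Canvas = List[List[str]]  # toy canvas: a grid of single characters


def draw_box(canvas: Canvas, top: int, left: int, bottom: int, right: int,
             mark: str = "#") -> Canvas:
    """Return a copy of the canvas with a rectangle outline drawn on it.

    Coordinates are inclusive, and the input canvas is left unmodified.
    """
    out = [row[:] for row in canvas]
    for col in range(left, right + 1):
        out[top][col] = mark      # top edge
        out[bottom][col] = mark   # bottom edge
    for row in range(top, bottom + 1):
        out[row][left] = mark     # left edge
        out[row][right] = mark    # right edge
    return out


def render(canvas: Canvas) -> str:
    """Join the grid into printable text, one row per line."""
    return "\n".join("".join(row) for row in canvas)


# Hypothetical 4-row by 5-column blank canvas.
blank = [["."] * 5 for _ in range(4)]
annotated = draw_box(blank, top=0, left=0, bottom=2, right=3)
print(render(annotated))
```

Returning a new canvas instead of editing in place keeps the original image available, so an annotated view can be compared against the unmarked one.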


by Paul Krill

Editor at Large


Paul Krill is editor at large at InfoWorld. Paul has been covering computer technology as a news and feature reporter for more than 35 years, including 30 years at InfoWorld. He has specialized in coverage of software development tools and technologies since the 1990s, and he continues to lead InfoWorld’s news coverage of software development platforms including Java and .NET and programming languages including JavaScript, TypeScript, PHP, Python, Ruby, Rust, and Go. Long trusted as a reporter who prioritizes accuracy, integrity, and the best interests of readers, Paul is sought out by technology companies and industry organizations who want to reach InfoWorld’s audience of software developers and other information technology professionals. Paul has won a “Best Technology News Coverage” award from IDG.
