

Agentic Vision in Gemini 3 Flash treats image understanding as an active investigation rather than a static, single-pass process. The model formulates a plan to zoom in, inspect, and manipulate images step by step, combining visual reasoning with code execution to ground its answers in visual evidence.
Key features include the ability to zoom in and inspect fine-grained details, annotate images by drawing bounding boxes and labels, and perform visual math and plotting by parsing high-density tables and generating charts. Python code execution is one of the model's primary tools, enabling deterministic computation and visual manipulation that replace probabilistic guessing with verifiable execution.
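To make the zoom-and-annotate idea concrete, the sketch below crops a region of interest and draws a bounding box on a toy "image" represented as a 2D list of brightness values. This is purely illustrative: the function names and the list-based image are assumptions, and agent-generated code inside the sandbox would operate on real pixels via an imaging library rather than nested lists.

```python
# Illustrative only: a toy "image" as a 2D list of brightness values.
# zoom() and draw_box() are hypothetical stand-ins for the kinds of
# manipulations the model's generated Python might perform.

def zoom(image, top, left, height, width):
    """Crop a region of interest so fine detail fills the frame."""
    return [row[left:left + width] for row in image[top:top + height]]

def draw_box(image, top, left, height, width, value=9):
    """Annotate by overwriting the border pixels of a bounding box."""
    out = [row[:] for row in image]          # copy; keep the original intact
    for x in range(left, left + width):
        out[top][x] = value                  # top edge
        out[top + height - 1][x] = value     # bottom edge
    for y in range(top, top + height):
        out[y][left] = value                 # left edge
        out[y][left + width - 1] = value     # right edge
    return out

image = [[0] * 6 for _ in range(6)]
annotated = draw_box(image, 1, 1, 4, 4)      # mark a 4x4 detection
detail = zoom(annotated, 1, 1, 4, 4)         # zoom into the marked region
```

The point of the two-step pattern is that each transformation yields a new image the model can re-inspect, rather than committing to an answer from the original view alone.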
Agentic Vision operates through a Think, Act, Observe loop. First, the model analyzes the user query and the initial image to formulate a multi-step plan. It then generates and executes Python code to actively manipulate the image, cropping, rotating, annotating, or analyzing it. Finally, the transformed image is appended to the model's context, so the enriched view informs the final response.
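The loop above can be sketched in a few lines. Everything here is an assumption for illustration, not the actual Gemini internals: `run_tool` stands in for executing model-generated code, and the "plan" is precomputed rather than produced by a reasoning step.

```python
# Hedged sketch of a Think, Act, Observe loop over a toy 2D-list image.
# run_tool() and agentic_loop() are illustrative names, not real APIs.

def run_tool(action, image):
    """'Act' step: apply a named manipulation to the latest image."""
    if action == "crop_top_left":
        h, w = len(image) // 2, len(image[0]) // 2
        return [row[:w] for row in image[:h]]
    if action == "rotate_180":
        return [row[::-1] for row in image[::-1]]
    raise ValueError(f"unknown action: {action}")

def agentic_loop(image, plan):
    """Think (plan) -> Act (execute code) -> Observe (append result)."""
    context = [image]                 # observations available to the model
    for action in plan:               # 'Think': here a precomputed plan
        observed = run_tool(action, context[-1])   # 'Act'
        context.append(observed)      # 'Observe': result rejoins the context
    return context

image = [[1, 2, 3, 4], [5, 6, 7, 8],
         [9, 10, 11, 12], [13, 14, 15, 16]]
history = agentic_loop(image, ["crop_top_left", "rotate_180"])
# history[1] is the 2x2 top-left crop; history[2] is that crop rotated.
```

The key design point the sketch captures is that each tool output is appended to the context rather than replacing it, so the final answer can cite both the original image and every intermediate view.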
The capability delivers a consistent 5-10% quality boost across most vision benchmarks and enables use cases such as building plan validation, digit counting with visual verification, and data visualization from complex visual inputs. Developers can use it to improve accuracy in applications requiring detailed visual analysis and grounded reasoning.
Agentic Vision targets developers building AI applications that require sophisticated visual understanding. It is available through the Gemini API in Google AI Studio and Vertex AI, as well as in the Gemini app, and is particularly useful for high-resolution image analysis, compliance verification, data extraction from visual sources, and any scenario requiring grounded visual reasoning backed by code execution.