

Agentic Vision in Gemini 3 Flash treats image understanding as an active investigation rather than a static, single-pass process. The model formulates a plan to zoom in, inspect, and manipulate images step by step, combining visual reasoning with code execution to ground its answers in visual evidence.
Key features include the ability to zoom in and inspect fine-grained details, annotate images by drawing bounding boxes and labels, and perform visual math and plotting by parsing high-density tables and generating charts. Python code execution is one of the model's primary tools, enabling deterministic computation and visual manipulation that replace probabilistic guessing with verifiable execution.
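To make the zoom-and-annotate idea concrete, the sketch below crops a region of interest and draws a bounding box on a toy "image" represented as a 2D list of brightness values. This is purely illustrative: the function names and the list-based image are assumptions, and agent-generated code inside the sandbox would operate on real pixels via an imaging library rather than nested lists.

```python
# Illustrative only: a toy "image" as a 2D list of brightness values.
# zoom() and draw_box() are hypothetical stand-ins for the kinds of
# manipulations the model's generated Python might perform.

def zoom(image, top, left, height, width):
    """Crop a region of interest so fine detail fills the frame."""
    return [row[left:left + width] for row in image[top:top + height]]

def draw_box(image, top, left, height, width, value=9):
    """Annotate by overwriting the border pixels of a bounding box."""
    out = [row[:] for row in image]          # copy; keep the original intact
    for x in range(left, left + width):
        out[top][x] = value                  # top edge
        out[top + height - 1][x] = value     # bottom edge
    for y in range(top, top + height):
        out[y][left] = value                 # left edge
        out[y][left + width - 1] = value     # right edge
    return out

image = [[0] * 6 for _ in range(6)]
annotated = draw_box(image, 1, 1, 4, 4)      # mark a 4x4 detection
detail = zoom(annotated, 1, 1, 4, 4)         # zoom into the marked region
```

The point of the two-step pattern is that each transformation yields a new image the model can re-inspect, rather than committing to an answer from the original view alone.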
Agentic Vision operates through a Think, Act, Observe loop. First, the model analyzes the user query and the initial image to formulate a multi-step plan. It then generates and executes Python code to actively manipulate the image, cropping, rotating, annotating, or analyzing it. Finally, the transformed image is appended to the model's context, so the enriched view informs the final response.
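The loop above can be sketched in a few lines. Everything here is an assumption for illustration, not the actual Gemini internals: `run_tool` stands in for executing model-generated code, and the "plan" is precomputed rather than produced by a reasoning step.

```python
# Hedged sketch of a Think, Act, Observe loop over a toy 2D-list image.
# run_tool() and agentic_loop() are illustrative names, not real APIs.

def run_tool(action, image):
    """'Act' step: apply a named manipulation to the latest image."""
    if action == "crop_top_left":
        h, w = len(image) // 2, len(image[0]) // 2
        return [row[:w] for row in image[:h]]
    if action == "rotate_180":
        return [row[::-1] for row in image[::-1]]
    raise ValueError(f"unknown action: {action}")

def agentic_loop(image, plan):
    """Think (plan) -> Act (execute code) -> Observe (append result)."""
    context = [image]                 # observations available to the model
    for action in plan:               # 'Think': here a precomputed plan
        observed = run_tool(action, context[-1])   # 'Act'
        context.append(observed)      # 'Observe': result rejoins the context
    return context

image = [[1, 2, 3, 4], [5, 6, 7, 8],
         [9, 10, 11, 12], [13, 14, 15, 16]]
history = agentic_loop(image, ["crop_top_left", "rotate_180"])
# history[1] is the 2x2 top-left crop; history[2] is that crop rotated.
```

The key design point the sketch captures is that each tool output is appended to the context rather than replacing it, so the final answer can cite both the original image and every intermediate view.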
The capability delivers a consistent 5-10% quality boost across most vision benchmarks and enables use cases such as building plan validation, digit counting with visual verification, and data visualization from complex visual inputs. Developers can use it to improve accuracy in applications requiring detailed visual analysis and grounded reasoning.
Agentic Vision targets developers building AI applications that require sophisticated visual understanding. It is available through the Gemini API in Google AI Studio and Vertex AI, as well as in the Gemini app, and is particularly useful for high-resolution image analysis, compliance verification, data extraction from visual sources, and any scenario requiring grounded visual reasoning backed by code execution.