Gemini multimodal analyst — structured extraction for any visual input type
Play to Gemini's strongest capability — paste any visual (chart, screenshot, document, photo, or two images to compare) and get a type-specific structured analysis that extracts what actually matters, not just a description.
Gemini's multimodal analysis is its standout strength. Paste your visual and use the prompt matched to its TYPE for dramatically better results:

FOR CHARTS & DATA VISUALIZATIONS:
'Analyze this chart. Then:
1. Extract the underlying data into a markdown table (read exact values where visible, estimate where not).
2. Identify the main trend, the most significant outlier, and the story the chart is trying to tell.
3. Flag anything misleading: truncated axes, cherry-picked date ranges, missing baselines, scale manipulation, or unlabeled data.
4. Answer this: [YOUR QUESTION].
Ground every claim in specific numbers from the chart.'

FOR SCREENSHOTS (UI, app, website):
'Analyze this screenshot. Then:
1. Extract ALL visible text, organized by section/hierarchy.
2. Identify the page type and purpose (what is the user supposed to do here?).
3. Flag UX issues: unclear CTAs, confusing navigation, accessibility problems (contrast, text size), information overload, or buried actions.
4. Note the visual design pattern being used and whether it's effective.
5. Answer this: [YOUR QUESTION].
Be specific: reference elements by their position and label.'

FOR DOCUMENTS & SCANNED PAGES:
'Analyze this document. Then:
1. OCR: extract ALL text, preserving structure (headings, lists, tables, footnotes).
2. Classify the document type (contract, invoice, letter, form, report) and identify the key metadata (date, parties, reference numbers).
3. Summarize the 3 most important points or obligations.
4. Flag anything unusual, ambiguous, or potentially problematic.
5. Answer this: [YOUR QUESTION].
Preserve exact wording for any legal or financial terms.'

FOR PHOTOS & IMAGES (general):
'Analyze this image. Then:
1. Describe what it shows factually: subject, setting, composition, lighting conditions.
2. Estimate technical details: approximate time of day (from shadows/light), season, camera position, and focal length range.
3. Identify any text, brands, signage, or identifiable locations.
4. Note what's unusual, notable, or might be missed on a quick look.
5. Answer this: [YOUR QUESTION].
State your confidence level for any estimates.'

FOR COMPARISON (paste TWO images):
'I am sharing two images. Compare them systematically:
1. What is identical between them?
2. List every difference you can find: visual, textual, structural, positional. Be exhaustive.
3. Which differences are significant vs cosmetic?
4. If these are before/after or A/B variants, which is more effective and why?
Organize differences in a table: [Element | Image 1 | Image 2 | Significance].'

Tips:
- For complex charts, add 'also suggest a better chart type if this one obscures the data'.
- For screenshots, add 'suggest one specific improvement with the highest impact'.
- The comparison prompt is powerful for A/B test analysis, design review, or spot-the-difference QA.
- Always end with your specific question; Gemini's analysis is far better when it knows what you actually need.
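If you reuse these prompts often, the fill-in step can be scripted. The following is a minimal Python sketch, not part of the original prompt: `build_prompt`, `PLACEHOLDER`, and `ADD_ONS` are hypothetical names chosen for illustration, and the template strings in the usage lines are abbreviated stand-ins for the full prompts above. It only assembles the prompt text; pasting or sending the result to Gemini alongside the image is left to you.

```python
PLACEHOLDER = "[YOUR QUESTION]"

# Optional add-on sentences taken from the tips; the keys are hypothetical labels.
ADD_ONS = {
    "chart": "Also suggest a better chart type if this one obscures the data.",
    "screenshot": "Suggest one specific improvement with the highest impact.",
}


def build_prompt(template: str, question: str,
                 visual_type: str = "", add_tip: bool = False) -> str:
    """Fill a type-specific template with the user's question."""
    question = question.strip()
    if not question:
        # The tips stress always including a specific question.
        raise ValueError("a specific question is required")

    if PLACEHOLDER in template:
        if question[-1] in "?.!":
            # Avoid doubled punctuation such as "...in Q3?."
            template = template.replace(PLACEHOLDER + ".", PLACEHOLDER)
        prompt = template.replace(PLACEHOLDER, question)
    else:
        # The comparison prompt has no placeholder, so append the question.
        prompt = f"{template.rstrip()} Finally, answer this: {question}"

    if add_tip and visual_type in ADD_ONS:
        prompt = f"{prompt} {ADD_ONS[visual_type]}"
    return prompt


# Abbreviated stand-ins; paste the full prompt text in real use.
chart_template = (
    "Analyze this chart. Then: 4. Answer this: [YOUR QUESTION]. "
    "Ground every claim in specific numbers from the chart."
)
comparison_template = "I am sharing two images. Compare them systematically."

print(build_prompt(chart_template, "Did revenue fall in Q3?", "chart", add_tip=True))
print(build_prompt(comparison_template, "Which variant has the clearer CTA?"))
```

Keeping the templates as plain strings means the same helper works for all five visual types, including the comparison prompt, which has no placeholder of its own.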
- Source: promptfork seed
- License: CC-BY-4.0
- Published: 6/22/2026