Pick the right Ollama model and generate an install plus run script for your hardware

Produces a hardware-aware Ollama model recommendation for your task plus a ready-to-run install and run script with VRAM checks, instead of guessing a model name and hoping it fits.

Open in Studio

Prompt

You are a senior ML infra engineer who runs local models efficiently on consumer hardware.

Help me choose an Ollama model for my task and generate a runnable install and run script.

My setup:
- Task: [CODING / SUMMARIZATION / GENERAL CHAT / RAG EMBEDDINGS / something else — be specific]
- Hardware: [GPU model and VRAM, or 'CPU only' with RAM and cores]
- OS: [macOS (Apple Silicon / Intel) / Linux / Windows]
- Quality vs speed preference: [BEST QUALITY I CAN RUN / BALANCED / FASTEST]
- Already installed: [Ollama version / not yet]

Do the following:
1. Recommend 1-3 models that fit my hardware with honest VRAM/RAM math. For each: model name and tag, approx size on disk, approx RAM/VRAM needed at the chosen quantization, expected tokens/sec class (fast / ok / slow), and why it suits my task. If none fit comfortably, say so and name the minimum hardware.
2. Show the exact quantization tradeoff: why this tag and not a heavier or lighter one, in plain terms.
3. Generate a single install-and-run shell script that: checks Ollama is installed (and tells me how to install it if not), pulls the recommended model, runs a quick smoke-test prompt, and prints memory and latency observations. Make it safe to re-run.
4. Include the exact commands to set context length, temperature, and keep-alive for my task, with a one-line reason for each.
5. Flag failure modes specific to my hardware (e.g. CPU-only slowness, low-VRAM out-of-memory, Apple Silicon unified-memory behavior).

Rules:
- Be honest about limitations. Do not claim a model will be fast if the math says otherwise.
- Recommend tags that actually exist on the Ollama registry; if unsure, say 'verify the tag exists on ollama.com/library'.
- No fabricated benchmarks — label any speed estimate as approximate.

Output: the recommendation table, the quantization rationale, the full script, and the hardware failure-mode notes.

Success signal: the output is good only if the model fits my stated hardware with shown math, the script is re-runnable and installs Ollama if missing, and every speed or size claim is labeled approximate or sourced.

Use case

Use when you want to run a local model for a specific task (coding, summarizing, chat) but do not know which model fits your machine's memory.

When to use this

Before installing your first Ollama model, or when your current model is too slow or will not load. Not for cloud API workflows.

Follow-up prompts

Add a side-by-side speed and quality comparison script for the top two recommended models.
Generate a launch script with a custom system prompt and context-length flags.
Add a quantization-aware variant for the smallest machine that can still run the task.

#ollama#local-llm#self-hosting#cli#ai-setup

Source: promptfork seed
License: CC-BY-4.0
Published: 6/22/2026

Report

Pick the right Ollama model and generate an install plus run script for your hardware

Use case

When to use this

Follow-up prompts

Explore more

More prompts you might like

Wire a local RAG pipeline to Ollama with a doc loader and vector store

Design a privacy-first local chat setup with quantization guidance

RAG system prompt that refuses to hallucinate and cites sources

Pandas data-cleaning pipeline for a messy CSV

Scaffold a clean PyTorch training loop with eval and early stopping

Build a robust PyTorch Dataset and DataLoader with an augmentation pipeline