PromptFork

Pick the right Ollama model and generate an install-and-run script for your hardware

Produces a hardware-aware Ollama model recommendation for your task plus a ready-to-run install and run script with VRAM checks, instead of guessing a model name and hoping it fits.

Prompt
You are a senior ML infra engineer who runs local models efficiently on consumer hardware.

Help me choose an Ollama model for my task and generate a runnable install and run script.

My setup:
- Task: [CODING / SUMMARIZATION / GENERAL CHAT / RAG EMBEDDINGS / something else — be specific]
- Hardware: [GPU model and VRAM, or 'CPU only' with RAM and cores]
- OS: [macOS (Apple Silicon / Intel) / Linux / Windows]
- Quality vs speed preference: [BEST QUALITY I CAN RUN / BALANCED / FASTEST]
- Already installed: [Ollama version / not yet]

Do the following:
1. Recommend 1-3 models that fit my hardware with honest VRAM/RAM math. For each: model name and tag, approx size on disk, approx RAM/VRAM needed at the chosen quantization, expected tokens/sec class (fast / ok / slow), and why it suits my task. If none fit comfortably, say so and name the minimum hardware.
2. Show the exact quantization tradeoff: why this tag and not a heavier or lighter one, in plain terms.
3. Generate a single install-and-run shell script that: checks Ollama is installed (and tells me how to install it if not), pulls the recommended model, runs a quick smoke-test prompt, and prints memory and latency observations. Make it safe to re-run.
4. Include the exact commands to set context length, temperature, and keep-alive for my task, with a one-line reason for each.
5. Flag failure modes specific to my hardware (e.g. CPU-only slowness, low-VRAM out-of-memory, Apple Silicon unified-memory behavior).

Rules:
- Be honest about limitations. Do not claim a model will be fast if the math says otherwise.
- Recommend tags that actually exist on the Ollama registry; if unsure, say 'verify the tag exists on ollama.com/library'.
- No fabricated benchmarks — label any speed estimate as approximate.

Output: the recommendation table, the quantization rationale, the full script, and the hardware failure-mode notes.

Success signal: the output is good only if the model fits my stated hardware with shown math, the script is re-runnable and tells me how to install Ollama if it is missing, and every speed or size claim is labeled approximate or sourced.

Use case

Use when you want to run a local model for a specific task (coding, summarizing, chat) but do not know which model fits your machine's memory.
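
As a rough illustration of the kind of memory math the prompt asks for, the sketch below estimates whether a quantized model fits in a given amount of RAM or VRAM. The function names, the 1.5 GB overhead allowance, the 10% headroom, and the 4.5 bits-per-weight figure for a 4-bit-class quantization are all assumptions chosen for illustration, not values published by Ollama; real usage also depends on context length and the runtime.

```python
def estimate_model_memory_gb(params_billion, bits_per_weight, overhead_gb=1.5):
    """Approximate memory needed to load a quantized model.

    weights: parameters (in billions) * bits per weight / 8 gives decimal GB.
    overhead_gb: assumed flat allowance for KV cache and runtime buffers.
    """
    weights_gb = params_billion * bits_per_weight / 8
    return weights_gb + overhead_gb


def fits(params_billion, bits_per_weight, available_gb, headroom=0.9):
    """Return True if the estimate stays within an assumed 90% of available memory."""
    needed = estimate_model_memory_gb(params_billion, bits_per_weight)
    return needed <= available_gb * headroom


if __name__ == "__main__":
    # Hypothetical example: 7B and 13B models at ~4.5 bits/weight on an 8 GB GPU.
    for size in (7, 13):
        needed = estimate_model_memory_gb(size, 4.5)
        print(f"{size}B: ~{needed:.1f} GB needed, fits in 8 GB: {fits(size, 4.5, 8)}")
```

Treat the result as a first filter only: a model that passes this check can still run out of memory at long context lengths, which is why the prompt asks for the math to be shown and labeled approximate.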

When to use this

Before installing your first Ollama model, or when your current model is too slow or will not load. Not for cloud API workflows.
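
To put a number on "too slow", one option is to send a single non-streaming request to a locally running Ollama server and compute tokens per second from the timing fields in its response. The sketch below assumes Ollama is listening on its default local port and uses the `eval_count` and `eval_duration` fields (the latter in nanoseconds) from the `/api/generate` response; the helper names and the default values for context length, temperature, and keep-alive are illustrative choices, not recommendations.

```python
import json
import urllib.request

OLLAMA_URL = "http://localhost:11434/api/generate"  # default local endpoint


def build_request(model, prompt, num_ctx=4096, temperature=0.2, keep_alive="5m"):
    """Build a non-streaming generate request with per-call settings."""
    return {
        "model": model,
        "prompt": prompt,
        "stream": False,
        "options": {"num_ctx": num_ctx, "temperature": temperature},
        "keep_alive": keep_alive,  # how long the model stays loaded after the call
    }


def tokens_per_second(response):
    """Generation speed from Ollama's timing fields (eval_duration is in ns)."""
    duration_s = response["eval_duration"] / 1e9
    return response["eval_count"] / duration_s if duration_s > 0 else 0.0


def smoke_test(model, prompt="Reply with the single word: ok"):
    """Send one request to the local server and return the parsed JSON response."""
    body = json.dumps(build_request(model, prompt)).encode("utf-8")
    req = urllib.request.Request(
        OLLAMA_URL, data=body, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(req, timeout=300) as resp:
        return json.loads(resp.read())
```

Calling `tokens_per_second(smoke_test("your-model:tag"))` (with a placeholder tag replaced by one you have pulled) gives a single approximate measurement; the first call also includes model load time, so a second call is usually a fairer reading.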

Follow-up prompts

  • Add a side-by-side speed and quality comparison script for the top two recommended models.
  • Generate a launch script with a custom system prompt and context-length flags.
  • Add a quantization-aware variant for the smallest machine that can still run the task.
#ollama #local-llm #self-hosting #cli #ai-setup
Source: promptfork seed
License: CC-BY-4.0
Published: 6/22/2026