PromptFork

Wire a local RAG pipeline to Ollama with a doc loader and vector store

Produces a complete, local-first RAG pipeline with document loading, chunking, Ollama embeddings, a vector store, retrieval, and a grounded answer step with citations, requiring no cloud APIs.

Prompt
You are a senior engineer who builds local-first RAG systems that stay grounded.

Build a complete local RAG pipeline wired to Ollama. Context:
- Documents: [FILE TYPES — e.g. 'PDFs and Markdown in a ./docs folder', and approx count/size]
- Language: [Python / TypeScript]
- Vector store: [Chroma / Qdrant local / LanceDB / FAISS in-memory]
- Embedding model (Ollama): [nomic-embed-text / mxbai-embed-large / suggest one]
- Generation model (Ollama): [llama3.1 / qwen2.5 / suggest one for my hardware]
- Hardware: [GPU and VRAM / CPU only / Apple Silicon]

Build a pipeline with these stages, each its own function:
1. Load — ingest the document types from the path, extract text, and track source plus page or section for citation.
2. Chunk — split with a sensible strategy (recursive or semantic) and chunk size plus overlap chosen for the doc type; explain the choice.
3. Embed — call the Ollama embedding model locally; batch requests for efficiency; store vectors with metadata (source, page or section, chunk index).
4. Store — persist to the chosen vector store so re-embedding is not needed on every run.
5. Retrieve — take a query, embed it, return the top-k chunks with a similarity score; expose k and the score threshold as knobs.
6. Answer — build a prompt that uses ONLY the retrieved chunks, instruct the model to answer from them and to say when the context does not contain the answer, and require per-claim citations to source and chunk.
7. Guardrail — if retrieval returns nothing above threshold, the pipeline returns 'no relevant context found' instead of hallucinating.

Requirements:
- Everything runs locally — no OpenAI or Anthropic API calls.
- Show the exact Ollama model pulls needed and approximate disk/RAM cost.
- No silent errors; each stage logs what it did.

Output, in this exact order:
1. A design overview (stages, store, models, why).
2. The full runnable pipeline as one script with clear function boundaries.
3. A usage example: index a folder, then ask a question and print the grounded answer with citations.
4. A tuning checklist (chunk size, top-k, threshold, model choice) and how to tell retrieval quality is good.

Success signal: the output is good only if the pipeline runs fully local, answers are grounded in retrieved chunks with citations, and a no-match query returns an explicit 'no relevant context found' instead of a guess.

Use case

Use when you want to ask questions of your own documents privately with a local model, using retrieval and citations rather than stuffing everything into the prompt.
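
As a rough illustration of the retrieve, answer, and guardrail stages the prompt asks for, here is a minimal sketch in plain Python. Everything in it is an assumption made for illustration: the `Chunk` dataclass, the function names, the default `k` and `threshold`, and the `[source#chunk]` citation format are hypothetical, and the vectors and the `generate` callable stand in for whatever the Ollama embedding and generation models would provide. A real pipeline would query the chosen vector store rather than scan a list.

```python
import math
from dataclasses import dataclass

NO_CONTEXT = "no relevant context found"


@dataclass
class Chunk:
    """A stored chunk; field names are hypothetical, not from any library."""
    source: str
    index: int
    text: str
    vector: list[float]


def cosine(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve(query_vec, chunks, k=3, threshold=0.5):
    """Return up to k (score, chunk) pairs scoring at or above threshold, best first."""
    scored = [(cosine(query_vec, c.vector), c) for c in chunks]
    hits = [(s, c) for s, c in scored if s >= threshold]
    hits.sort(key=lambda pair: pair[0], reverse=True)
    return hits[:k]


def build_prompt(question, hits):
    """Build a prompt that restricts the model to the retrieved chunks."""
    context = "\n".join(f"[{c.source}#{c.index}] {c.text}" for _, c in hits)
    return (
        "Answer using ONLY the context below. "
        "Cite every claim as [source#chunk]. "
        f"If the context does not contain the answer, reply '{NO_CONTEXT}'.\n\n"
        f"Context:\n{context}\n\nQuestion: {question}"
    )


def answer(question, query_vec, chunks, generate, k=3, threshold=0.5):
    """Guardrail: skip the model entirely when nothing clears the threshold."""
    hits = retrieve(query_vec, chunks, k=k, threshold=threshold)
    if not hits:
        return NO_CONTEXT
    return generate(build_prompt(question, hits))
```

The guardrail sits before the generation call, so a no-match query never reaches the model. `k` and `threshold` are exposed as parameters because suitable values depend on the embedding model and the documents.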

When to use this

Use for private document Q&A where data must not leave the machine. Not suited to corpora of millions of documents or to tasks that need frontier-model reasoning.

Follow-up prompts

  • Add a re-ranking step between retrieval and answer to improve relevance.
  • Add hybrid search (keyword plus vector) and show how it changes recall.
  • Wrap the pipeline in a small CLI or FastAPI endpoint for repeated queries.
#ollama #rag #local-llm #vector-store #python
Source: promptfork seed
License: CC-BY-4.0
Published: 6/22/2026
