PromptFork

Reshape a messy human-made spreadsheet into tidy long-form data

Takes a report-style spreadsheet (merged headers, multi-row titles, wide months-as-columns) and reshapes it into a tidy long DataFrame with a documented step-by-step transform you can re-run on the next export.

Open in Studio
Prompt
You are a senior data analyst who turns human-readable spreadsheets into tidy, machine-readable data.

The spreadsheet is messy like this: [DESCRIBE THE STRUCTURE — e.g. 'rows 1-2 are a title and a blank, row 3 is headers, columns D-O are months Jan-Dec, some cells merged vertically by region, a totals row at the bottom'].
What one tidy row should represent: [e.g. 'one observation: region, metric, month, value'].
Python 3, pandas [2.x]. Provide the file path as [PATH].

Produce a reshape script that:
1. Reads the file defensively: skips title and header junk rows explicitly (not by guessing), treats the right row as the header, and drops totals or subtotal rows by rule.
2. Handles merged cells and repeated group labels: forward-fill region or segment labels so every row is self-describing. Never assume a value carries down silently.
3. Melts wide month-as-column blocks into long form with explicit id_vars and value_vars, producing a clean (entity, attribute, period, value) shape.
4. Splits composite headers if needed (e.g. 'Revenue - USD' into metric='Revenue' and unit='USD') using documented string rules.
5. Coerces values to the right dtype and flags non-numeric junk in a value column rather than NaN-ing it silently.
6. Outputs the tidy frame plus a transformation log: each numbered step, what it did, and how many rows entered or left. The log is the proof the reshape is reversible and re-runnable.

Rules:
- Do not hardcode row numbers without stating the assumption; prefer locating header or total rows by content (e.g. the row whose first cell equals 'Total').
- Every fill or split must be explicit and recorded in the transform log.
- Preserve the original column where ambiguous; do not overwrite source data.

Output the script in one fenced block, then the expected tidy schema (columns plus dtypes) and the transformation-log template.

Success signal: the output is good only if the result is one tidy row per observation, merged or group labels are forward-filled explicitly, and every row-count change is explained in the transform log.

Use case

Use when someone hands you a pivot-style or report spreadsheet built for humans and you need machine-queryable tidy rows for analysis or loading into a DB.

When to use this

When the file is wide with months in columns, has merged cells, multi-row headers, or repeated group labels. Not for files that are already one row per record.

Follow-up prompts

  • Wrap the reshape steps into a function and test it against last quarter's export.
  • Add a reconciliation check that row counts match before and after the melt.
  • Output the tidy frame to Parquet with a stable schema for downstream pipelines.
#pandas#data-cleaning#reshaping#tidy-data#python
Source
promptfork seed
License
CC-BY-4.0
Published
6/22/2026

More prompts you might like

Pandas data-cleaning pipeline for a messy CSV

Produce a reproducible Pandas cleaning pipeline: types, missing values, dedupe, outliers.

New

Audit a messy DataFrame against an expected schema with dtype coercion

Produces a reusable schema-validation and dtype-coercion script that flags every column that drifted from spec, coerces what it safely can, and quarantines what it cannot instead of producing silent NaNs.

#pandas#data-cleaning
New

RAG system prompt that refuses to hallucinate and cites sources

A retrieval-augmented system prompt that answers only from context and returns inline citations or 'I don't know'.

New

Scaffold a clean PyTorch training loop with eval and early stopping

Gives you a reproducible, well-structured PyTorch training script — config, model, dataloaders, train/eval loop, metrics, checkpointing, and early stopping — tuned to your task.

#pytorch#machine-learning
New

Pick the right Ollama model and generate an install plus run script for your hardware

Produces a hardware-aware Ollama model recommendation for your task plus a ready-to-run install and run script with VRAM checks, instead of guessing a model name and hoping it fits.

#ollama#local-llm
New

Wire a local RAG pipeline to Ollama with a doc loader and vector store

Produces a complete, local-first RAG pipeline with document loading, chunking, Ollama embeddings, a vector store, retrieval, and a grounded answer step with citations, requiring no cloud APIs.

#ollama#rag
New