Generate an opinionated Cursor ruleset for a Python data-engineering repo
Produces a complete Cursor rules file for a Python data-engineering codebase covering DAG conventions, testing, type hints, the SQL/Python boundary, and env handling, each rule justified so the agent respects your pipelines.
You are a senior data engineer who configures AI assistants to respect pipeline conventions. Generate a complete, opinionated Cursor rules file for a Python data-engineering repo.

Repo context:
- Orchestrator: [Airflow / Prefect / Dagster]
- dbt: [yes (dbt-core) / no]
- Compute: [pandas / Polars / PySpark / DuckDB]
- Warehouse: [Snowflake / BigQuery / Redshift / Postgres]
- Python version: [3.11 / 3.12], dependency manager: [uv / Poetry / pip-tools]
- Layout: [DESCRIBE, e.g. 'dags/, dbt/models/, jobs/, tests/']

Generate a single .cursorrules file specific to THIS repo. Cover at minimum:
1. Architecture map: DAGs vs jobs vs dbt models, what runs where, what triggers what.
2. DAG conventions: idempotency is mandatory, tasks are pure, no top-level side effects, deterministic task IDs, partitioning and backfill expectations.
3. Python rules: full type hints on all public functions, dataclasses or Pydantic for configs, no bare except, no print in jobs (use the orchestrator's logger).
4. SQL / dbt boundaries: transformation logic lives in dbt or SQL where practical; Python only for what SQL cannot do; never inline raw SQL that duplicates a dbt model.
5. Testing: every transformation needs at least a schema test and a unit or sample test; DAGs have a DAG-integrity test; name the test runner.
6. Data safety: never hardcode credentials, no SELECT *, explicit column lists, append-over-destructive-overwrite unless flagged, partition-column discipline.
7. Env and config: settings via env vars and a config layer, never magic strings, the local .env pattern.
8. PR rules: a change to a model or DAG requires updating its tests; note the review checklist.

Critical format rules:
- Every rule MUST be followed by a one-line 'Why:' rationale.
- Be concrete to THIS orchestrator and warehouse. Ban wrong patterns by name (e.g. no mutable global state in a DAG file).
- Keep it under roughly 120 lines. No filler.
Output the full .cursorrules file in a single fenced code block, then a 5-bullet summary of the conventions it enforces. Success signal: the output is good only if every rule has a rationale, idempotency and data-safety rules are explicit, and the SQL/Python boundary is clearly drawn for this stack.
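Part of the success signal above can be checked mechanically before you trust the output. The following is a minimal sketch, not a definitive validator: it assumes the generated file writes each rule as a line starting with `- ` and puts the rationale on the next non-blank line starting with `Why:`. That layout and the name `check_ruleset` are hypothetical; adjust them to whatever format the model actually emits.

```python
from __future__ import annotations


def check_ruleset(text: str, max_lines: int = 120) -> list[str]:
    """Return a list of problems found in a generated rules file.

    Assumed layout (hypothetical): a rule is a line starting with "- ",
    and its rationale is the next non-blank line starting with "Why:".
    """
    problems: list[str] = []
    lines = text.splitlines()
    if len(lines) > max_lines:
        problems.append(f"file has {len(lines)} lines, limit is {max_lines}")

    # Drop blank lines so a rule and its rationale are always neighbours.
    content = [ln.strip() for ln in lines if ln.strip()]
    for i, line in enumerate(content):
        if not line.startswith("- "):
            continue  # headings and rationale lines are not rules
        following = content[i + 1] if i + 1 < len(content) else ""
        if not following.startswith("Why:"):
            problems.append(f"rule without rationale: {line}")
    return problems
```

An empty list means the file passed both checks. Whether idempotency and data-safety rules are explicit, and whether the SQL/Python boundary is clearly drawn, still needs a human read.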
Use case
Use when you want Cursor to edit Airflow, Prefect, or Dagster DAGs and dbt models without breaking orchestration, partitioning, or test conventions.
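To make the idempotency and data-safety items concrete, here is one way a compliant load step might look. It is a sketch under stated assumptions: `sqlite3` stands in for the warehouse so the example runs anywhere, and the `orders` table, its columns, and the function names are hypothetical. A real warehouse would typically use its own MERGE or upsert statement instead of SQLite's `ON CONFLICT` clause.

```python
from __future__ import annotations

import sqlite3
from collections.abc import Iterable

# (ds, order_id, amount): ds is the partition column, e.g. "2024-01-01".
Row = tuple[str, int, float]


def create_orders_table(conn: sqlite3.Connection) -> None:
    """Create the hypothetical target table, keyed on partition plus business key."""
    with conn:
        conn.execute(
            "CREATE TABLE IF NOT EXISTS orders ("
            " ds TEXT NOT NULL,"
            " order_id INTEGER NOT NULL,"
            " amount REAL NOT NULL,"
            " PRIMARY KEY (ds, order_id))"
        )


def load_rows(conn: sqlite3.Connection, rows: Iterable[Row]) -> None:
    """Upsert rows so that rerunning the same batch leaves the table unchanged."""
    with conn:  # commits on success, rolls back if any statement fails
        conn.executemany(
            # Explicit column list, no SELECT *, no blanket overwrite.
            "INSERT INTO orders (ds, order_id, amount) VALUES (?, ?, ?) "
            "ON CONFLICT (ds, order_id) DO UPDATE SET amount = excluded.amount",
            rows,
        )


def count_rows(conn: sqlite3.Connection, ds: str) -> int:
    """Count rows in one partition."""
    row = conn.execute(
        "SELECT COUNT(order_id) FROM orders WHERE ds = ?", (ds,)
    ).fetchone()
    return int(row[0])
```

Calling `load_rows` twice with the same batch leaves the partition's row count unchanged, which is the behaviour a retry or backfill relies on.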
When to use this
For data-engineering repos with DAGs, dbt, and pandas or Spark jobs. Not for pure web-app or ML-research repos.
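For a sense of what the "Python rules" and "Env and config" items in the prompt push the agent toward, the sketch below shows a typed config layer that reads settings from environment variables. The variable names (`WAREHOUSE_DSN`, `TARGET_SCHEMA`, `BATCH_SIZE`), the `JobConfig` class, and `load_config` are all hypothetical, and the standard `logging` module stands in for whichever logger your orchestrator provides.

```python
from __future__ import annotations

import logging
import os
from collections.abc import Mapping
from dataclasses import dataclass

logger = logging.getLogger(__name__)  # stand-in for the orchestrator's logger


@dataclass(frozen=True)
class JobConfig:
    """Settings for one job; frozen so tasks cannot mutate shared config."""

    warehouse_dsn: str
    target_schema: str
    batch_size: int = 10_000


def load_config(env: Mapping[str, str] | None = None) -> JobConfig:
    """Build a JobConfig from env vars (or an injected mapping in tests)."""
    source = os.environ if env is None else env
    try:
        dsn = source["WAREHOUSE_DSN"]
        schema = source["TARGET_SCHEMA"]
    except KeyError as exc:  # a specific exception, never a bare except
        raise RuntimeError(f"missing required setting: {exc.args[0]}") from exc
    batch_size = int(source.get("BATCH_SIZE", "10000"))
    # Log the schema only; the DSN may embed credentials.
    logger.info("loaded config for schema %s", schema)
    return JobConfig(warehouse_dsn=dsn, target_schema=schema, batch_size=batch_size)
```

Passing a plain dict as `env` keeps tests independent of the real environment, and a local `.env` file can populate `os.environ` through whatever loader the repo already uses.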
Follow-up prompts
- Add a separate ruleset for dbt model conventions (naming, tests, materializations).
- Generate the matching AGENTS.md so non-Cursor agents share the same rules.
- Add a rule block covering secrets handling and the local .env pattern for this repo.
Source: promptfork seed
License: CC-BY-4.0
Published: 6/22/2026