PromptFork

Build a rate-limit and cost-tracking middleware layer for API calls

Produces a middleware layer that enforces client-side rate limits, tracks token and request spend against a budget, and short-circuits before you blow a quota, with typed budgets, alerts, and an observability hook.

Prompt
You are a senior backend engineer who manages cost and rate-limit risk for API usage.

Build a rate-limit and cost-tracking middleware layer that wraps API calls. Context:
- API: [OpenAI / Anthropic / generic — note the pricing basis: per-token / per-request]
- Language: [TypeScript / Python]
- Concurrency model: [single process / multi-process / distributed with Redis]
- Budget: [e.g. 'USD per day', 'requests per minute', 'tokens per hour']
- Needs streaming cost: [yes — count streamed tokens / no]

Build a middleware or guard layer with:
1. A client-side rate limiter (token bucket or sliding window) that throttles calls before they hit the network; for distributed mode, back it with Redis and document the locking or atomicity strategy. Expose the rate settings (for example capacity, refill rate, window size) as configuration.
2. A budget tracker that estimates cost per call: for per-token APIs, count prompt plus completion tokens (including streamed tokens) and multiply by the price table; for per-request APIs, count requests. Keep a running total per budget window.
3. Short-circuit logic: before a call, check the rate limiter AND the remaining budget; if either is exhausted, reject with a typed BudgetExceededError or RateLimitedError — do NOT fire the request.
4. A price table as data (model -> input/output price per unit) with the date and currency noted, and a clear warning that prices change and must be maintained.
5. An observability hook: every decision (allowed, throttled, budget-exceeded) emits a structured event (call id, model, tokens, est. cost, decision) so metrics and alerts can be wired in.
6. Graceful behavior when the API reports its own usage (a usage field in the response body or rate-limit headers): prefer the provider's reported usage over the estimate and reconcile the running total.

Requirements:
- Never let a call fire if it would exceed the budget; the guard is the source of truth.
- Streaming cost counting must not block the stream; count asynchronously or after completion.
- All cost figures are estimates labeled as such; do not present them as exact billing.

Output, in this exact order:
1. A design overview (limiter strategy, budget model, short-circuit order, observability).
2. The full middleware module with typed interfaces.
3. A usage example wrapping a client call and showing a budget-exceeded rejection.
4. The price-table data structure with a maintenance note.
5. A test checklist: throttle under burst, reject at budget, accurate token count on a stream, distributed-mode correctness.

Success signal: the output is good only if a call is rejected before firing when it would breach the rate limit or budget, streamed tokens are counted toward cost without blocking the stream, and every cost number is clearly labeled an estimate tied to a maintainable price table.
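
For orientation, here is a minimal single-process sketch of the short-circuit behavior the prompt asks for. It is an illustration, not the output the prompt will produce. It assumes a token bucket limiter, a worst-case cost estimate based on the maximum output tokens, and a fixed budget with no window reset. Every name in it (`Guard`, `TokenBucket`, `PRICE_TABLE`, `estimate_cost`, `example-model`) is hypothetical, and the prices are placeholders, not real provider prices. Reconciliation with provider-reported usage, streaming, thread safety, and distributed mode are left out.

```python
import time
from typing import Any, Callable, Dict, Optional


class RateLimitedError(Exception):
    """Raised when the client-side limiter has no capacity left."""


class BudgetExceededError(Exception):
    """Raised when the estimated cost would exceed the remaining budget."""


# Hypothetical price table (USD per 1,000 tokens). Placeholder values only:
# real prices change and must be maintained by hand.
PRICE_TABLE: Dict[str, Any] = {
    "as_of": "YYYY-MM-DD",
    "currency": "USD",
    "models": {"example-model": {"input_per_1k": 1.0, "output_per_1k": 2.0}},
}


def estimate_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """Return an ESTIMATED cost in USD, not an exact billing figure."""
    price = PRICE_TABLE["models"][model]
    return (input_tokens / 1000 * price["input_per_1k"]
            + output_tokens / 1000 * price["output_per_1k"])


class TokenBucket:
    def __init__(self, capacity: float, refill_per_sec: float,
                 clock: Callable[[], float] = time.monotonic):
        self.capacity = capacity
        self.refill_per_sec = refill_per_sec
        self.clock = clock
        self.tokens = capacity
        self.last = clock()

    def try_acquire(self, n: float = 1.0) -> bool:
        # Refill based on elapsed time, capped at capacity, then try to spend n.
        now = self.clock()
        self.tokens = min(self.capacity,
                          self.tokens + (now - self.last) * self.refill_per_sec)
        self.last = now
        if self.tokens >= n:
            self.tokens -= n
            return True
        return False


class Guard:
    def __init__(self, bucket: TokenBucket, budget_usd: float,
                 on_event: Optional[Callable[[dict], None]] = None):
        self.bucket = bucket
        self.budget_usd = budget_usd
        self.spent_usd = 0.0  # running total of estimated spend
        self.on_event = on_event or (lambda event: None)

    def call(self, call_id: str, model: str, input_tokens: int,
             max_output_tokens: int, send: Callable[[], Any]) -> Any:
        est = estimate_cost(model, input_tokens, max_output_tokens)
        event = {"call_id": call_id, "model": model,
                 "tokens": input_tokens + max_output_tokens, "est_cost_usd": est}
        # Check the budget first so a rejected call does not consume rate capacity.
        if self.spent_usd + est > self.budget_usd:
            self.on_event({**event, "decision": "budget-exceeded"})
            raise BudgetExceededError(f"estimated cost {est:.4f} USD exceeds remaining budget")
        if not self.bucket.try_acquire():
            self.on_event({**event, "decision": "throttled"})
            raise RateLimitedError("client-side rate limit reached")
        self.on_event({**event, "decision": "allowed"})
        result = send()  # the request fires only after both checks pass
        self.spent_usd += est
        return result
```

In this sketch a caller would wrap each request as `guard.call("id-1", "example-model", 1000, 1000, lambda: client_request())`, where `client_request` stands in for whatever function actually sends the request. Checking the budget before acquiring from the bucket is one possible ordering; the prompt only requires that both checks happen before the request fires.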

Use case

Use when you must stay under provider rate limits and a spending budget across many API calls, and want enforcement before the request fires rather than after the bill arrives.

When to use this

Use it in production callers with bursty or high-volume API traffic, or any time cost control matters. It is not worth the overhead for low-volume scripts.

Follow-up prompts

  • Add a per-tenant budget so multiple users share one provider key safely.
  • Generate an alerting hook (Slack/email/webhook) that fires at 80 percent of budget.
  • Add a fallback strategy that switches to a cheaper model when the budget is nearly exhausted.
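
As a rough idea of what the alerting follow-up might yield, the sketch below fires a notification once when estimated spend crosses a threshold fraction of the budget. It assumes the caller passes in the running estimated total after each call; `BudgetAlert`, `record`, `reset`, and the `notify` callback are hypothetical names, and the transport (Slack, email, webhook) is left to that callback.

```python
from typing import Callable


class BudgetAlert:
    """Fires `notify` once per budget window when spend crosses the threshold."""

    def __init__(self, budget_usd: float, notify: Callable[[str], None],
                 threshold: float = 0.8):
        self.budget_usd = budget_usd
        self.notify = notify
        self.threshold = threshold
        self.fired = False

    def record(self, spent_usd: float) -> None:
        # spent_usd is the running ESTIMATED total, not an exact billing figure.
        if not self.fired and spent_usd >= self.threshold * self.budget_usd:
            self.fired = True  # avoid repeating the alert on every later call
            self.notify(
                f"Estimated spend {spent_usd:.2f} USD has reached "
                f"{self.threshold:.0%} of the {self.budget_usd:.2f} USD budget"
            )

    def reset(self) -> None:
        # Call when a new budget window starts so the alert can fire again.
        self.fired = False
```

Firing once per window keeps the hook from flooding a channel once spend stays above the threshold; whether to also alert at 100 percent is a design choice this sketch leaves open.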
#api #rate-limiting #cost-tracking #middleware #observability
Source
promptfork seed
License
CC-BY-4.0
Published
6/22/2026
