API · Rate Limiting · Pricing · TypeScript · Redis · Stripe · SaaS Architecture

API Rate Limiting and Usage-Based Pricing: the complete guide

By SaaS Masters · March 5, 2026 · 6 min read

Why rate limiting is essential for your SaaS

Picture this: your SaaS is running smoothly, customers are using your API, and then everything crashes. One customer sends 50,000 requests per minute — accidentally or intentionally. Without rate limiting, your entire platform goes down for everyone.

Rate limiting isn't a nice-to-have. It's the foundation of a stable, scalable SaaS. And if you do it right, you can tie it directly into your pricing model.

The three layers of rate limiting

An effective rate limiting strategy works at three levels:

1. Global rate limiting (infrastructure)

This is your first line of defense. At the nginx or load balancer level, you cap the total number of requests per IP.

# nginx.conf — limit_req_zone belongs in the http block;
# a 10 MB zone keeps state for roughly 160,000 client IPs
limit_req_zone $binary_remote_addr zone=global:10m rate=100r/s;

server {
    location /api/ {
        # Allow short bursts of 50 extra requests; nodelay rejects
        # excess immediately instead of queueing, and we answer 429
        limit_req zone=global burst=50 nodelay;
        limit_req_status 429;
    }
}

2. Per-tenant rate limiting (application)

This is where it gets interesting. Each customer gets their own limit based on their plan.

// middleware/rateLimiter.ts
import { Redis } from 'ioredis';

const redis = new Redis(process.env.REDIS_URL);

interface RateLimitConfig {
  windowMs: number;
  maxRequests: number;
}

const PLAN_LIMITS: Record<string, RateLimitConfig> = {
  starter:    { windowMs: 60_000, maxRequests: 100 },
  growth:     { windowMs: 60_000, maxRequests: 1_000 },
  enterprise: { windowMs: 60_000, maxRequests: 10_000 },
};

export async function checkRateLimit(
  tenantId: string,
  plan: string
): Promise<{ allowed: boolean; remaining: number; resetAt: number }> {
  const config = PLAN_LIMITS[plan] || PLAN_LIMITS.starter;
  // Fixed-window counter: all requests in the same window share one key.
  const windowKey = Math.floor(Date.now() / config.windowMs);
  const key = `rl:${tenantId}:${windowKey}`;

  // INCR then PEXPIRE takes two round trips; for strict atomicity,
  // combine them in a short Lua script.
  const current = await redis.incr(key);
  if (current === 1) {
    await redis.pexpire(key, config.windowMs);
  }

  const remaining = Math.max(0, config.maxRequests - current);
  const resetAt = (windowKey + 1) * config.windowMs;

  return {
    allowed: current <= config.maxRequests,
    remaining,
    resetAt,
  };
}
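The window arithmetic is easier to trace without a Redis round trip. Here is a minimal in-memory sketch of the same fixed-window counter, with a Map standing in for Redis INCR (checkRateLimitLocal is a hypothetical name, not part of the middleware above):

```typescript
interface RateLimitConfig {
  windowMs: number;
  maxRequests: number;
}

// Stands in for Redis INCR: one counter per tenant + window key.
const counters = new Map<string, number>();

export function checkRateLimitLocal(
  tenantId: string,
  config: RateLimitConfig,
  now: number = Date.now()
): { allowed: boolean; remaining: number; resetAt: number } {
  // Same fixed-window key scheme as the Redis version above.
  const windowKey = Math.floor(now / config.windowMs);
  const key = `rl:${tenantId}:${windowKey}`;

  const current = (counters.get(key) ?? 0) + 1;
  counters.set(key, current);

  return {
    allowed: current <= config.maxRequests,
    remaining: Math.max(0, config.maxRequests - current),
    resetAt: (windowKey + 1) * config.windowMs,
  };
}
```

Because the key embeds the window number, a new window simply starts a fresh counter; old keys expire (in Redis) or become unreachable (here).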

3. Per-endpoint rate limiting

Some endpoints are more expensive than others. A GET /users is cheap, but a POST /reports/generate can cost minutes of CPU time.

// Weighted rate limiting per endpoint
const ENDPOINT_WEIGHTS: Record<string, number> = {
  'GET /api/users':           1,
  'GET /api/reports':         2,
  'POST /api/reports':       10,
  'POST /api/ai/generate':   25,
};

export function getRequestCost(method: string, path: string): number {
  return ENDPOINT_WEIGHTS[`${method} ${path}`] || 1;
}
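To make the per-tenant counter respect these weights, increment the window key by the request's cost instead of by 1 (INCRBY rather than INCR in Redis). A minimal in-memory sketch, with consumeWeighted as an assumed helper name:

```typescript
// Stands in for Redis INCRBY on the tenant's window key.
const used = new Map<string, number>();

const WEIGHTS: Record<string, number> = {
  'GET /api/users': 1,
  'POST /api/reports': 10,
};

// Consume the endpoint's cost from the tenant's window budget;
// returns false once the budget is exhausted.
export function consumeWeighted(
  tenantKey: string,
  method: string,
  path: string,
  maxRequests: number
): boolean {
  const cost = WEIGHTS[`${method} ${path}`] || 1;
  const current = (used.get(tenantKey) ?? 0) + cost;
  used.set(tenantKey, current);
  return current <= maxRequests;
}
```

A single expensive report generation now consumes as much budget as ten cheap reads, which keeps the limit proportional to actual load.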

From rate limiting to usage-based pricing

Here's where the real magic happens: since you're already counting every API call, you can use that data for your pricing model.

The hybrid model

Most successful SaaS companies use a hybrid model: a fixed base fee plus variable costs based on usage.

Component     | Starter (€49/mo) | Growth (€149/mo) | Enterprise (custom)
API calls     | 10,000 incl.     | 100,000 incl.    | 1,000,000 incl.
Extra calls   | €0.005/call      | €0.003/call      | €0.001/call
AI features   | -                | 1,000 credits    | Unlimited
Webhooks      | 5                | 25               | Unlimited
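The table's arithmetic can be sketched as a small invoice function. PLAN_BILLING and monthlyInvoice are hypothetical names; the numbers mirror the Starter and Growth rows above:

```typescript
interface PlanBilling {
  baseFee: number;      // fixed monthly fee in euros
  included: number;     // API calls included in the base fee
  perExtraCall: number; // euros per call beyond the included volume
}

const PLAN_BILLING: Record<string, PlanBilling> = {
  starter: { baseFee: 49, included: 10_000, perExtraCall: 0.005 },
  growth: { baseFee: 149, included: 100_000, perExtraCall: 0.003 },
};

// Monthly total in euros: base fee plus overage beyond the included calls.
export function monthlyInvoice(plan: string, apiCalls: number): number {
  const p = PLAN_BILLING[plan] ?? PLAN_BILLING.starter;
  const overage = Math.max(0, apiCalls - p.included);
  return p.baseFee + overage * p.perExtraCall;
}
```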

Implementing usage tracking

// services/usageTracker.ts
import { Redis } from 'ioredis';

const redis = new Redis(process.env.REDIS_URL);

export async function trackUsage(
  tenantId: string,
  metric: string,
  cost: number = 1
): Promise<void> {
  const monthKey = new Date().toISOString().slice(0, 7); // "2026-03"
  const key = `usage:${tenantId}:${monthKey}:${metric}`;

  await redis.incrbyfloat(key, cost);

  // Keep for 90 days
  await redis.expire(key, 90 * 24 * 60 * 60);
}

export async function getUsage(
  tenantId: string,
  metric: string,
  month?: string
): Promise<number> {
  const monthKey = month || new Date().toISOString().slice(0, 7);
  const key = `usage:${tenantId}:${monthKey}:${metric}`;

  const value = await redis.get(key);
  return parseFloat(value || '0');
}

Overage alerts: warn your customers

Nobody likes surprise bills. Send proactive notifications when customers approach their limits.

// services/usageAlerts.ts
import { Redis } from 'ioredis';
import { getUsage } from './usageTracker';

const redis = new Redis(process.env.REDIS_URL);

// Monthly included API calls per plan (matches the pricing table).
const MONTHLY_INCLUDED: Record<string, number> = {
  starter: 10_000,
  growth: 100_000,
  enterprise: 1_000_000,
};

const ALERT_THRESHOLDS = [0.75, 0.90, 1.0];

export async function checkUsageAlerts(
  tenantId: string,
  plan: string
): Promise<void> {
  const usage = await getUsage(tenantId, 'api_calls');
  const limit = MONTHLY_INCLUDED[plan] ?? MONTHLY_INCLUDED.starter;
  const ratio = usage / limit;

  for (const threshold of ALERT_THRESHOLDS) {
    if (ratio >= threshold) {
      // Deduplicate: each threshold alert fires at most once per period.
      const alertKey = `alert:${tenantId}:usage:${threshold}`;
      const alreadySent = await redis.get(alertKey);

      if (!alreadySent) {
        await sendUsageAlert(tenantId, { // your email/notification service
          percentage: Math.round(threshold * 100),
          current: usage,
          limit,
        });
        await redis.set(alertKey, '1', 'EX', 30 * 24 * 3600);
      }
    }
  }
}
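The threshold logic itself is easy to isolate as a pure function: given the previous and current usage, compute which alerts newly fire. A sketch, with crossedThresholds as an assumed helper:

```typescript
const ALERT_THRESHOLDS = [0.75, 0.90, 1.0];

// Thresholds crossed by moving from prevUsage to usage,
// returned as whole percentages (75, 90, 100).
export function crossedThresholds(
  prevUsage: number,
  usage: number,
  limit: number
): number[] {
  return ALERT_THRESHOLDS.filter(
    (t) => prevUsage / limit < t && usage / limit >= t
  ).map((t) => Math.round(t * 100));
}
```

Keeping this pure makes the dedup logic trivial to unit-test, independent of Redis or the notification service.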

The right response headers

Good APIs communicate rate limit status via headers. Here's the standard:

// middleware/rateLimitHeaders.ts
import type { Response } from 'express';

export function setRateLimitHeaders(
  res: Response,
  result: { remaining: number; resetAt: number; limit: number }
): void {
  res.setHeader('X-RateLimit-Limit', result.limit);
  res.setHeader('X-RateLimit-Remaining', result.remaining);
  res.setHeader('X-RateLimit-Reset', Math.ceil(result.resetAt / 1000));
  // Retry-After is only meaningful on 429 responses.
  res.setHeader('Retry-After', Math.ceil((result.resetAt - Date.now()) / 1000));
}

When a customer exceeds their limit, return a 429 Too Many Requests:

{
  "error": "rate_limit_exceeded",
  "message": "You've reached your API limit. Upgrade your plan or wait for the reset.",
  "upgrade_url": "https://app.example.com/billing/upgrade",
  "reset_at": "2026-03-05T12:00:00Z"
}
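A minimal sketch of building that error body from the limiter's resetAt timestamp (in milliseconds, as returned by the rate limit check); the upgrade URL is a placeholder and rateLimitErrorBody is a hypothetical helper:

```typescript
interface RateLimitErrorBody {
  error: string;
  message: string;
  upgrade_url: string;
  reset_at: string; // ISO 8601, so clients can parse it directly
}

export function rateLimitErrorBody(resetAtMs: number): RateLimitErrorBody {
  return {
    error: 'rate_limit_exceeded',
    message:
      "You've reached your API limit. Upgrade your plan or wait for the reset.",
    upgrade_url: 'https://app.example.com/billing/upgrade',
    reset_at: new Date(resetAtMs).toISOString(),
  };
}
```

Including the upgrade link in the error itself turns every 429 into a self-serve upsell prompt.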

Stripe integration for metered billing

If you're using Stripe (see our earlier article on Stripe integration), you can connect usage-based pricing directly:

// services/billing.ts
import Stripe from 'stripe';

const stripe = new Stripe(process.env.STRIPE_SECRET_KEY);

// Daily job: report accumulated usage to Stripe.
// Note: newer Stripe API versions meter usage via Billing Meter events;
// createUsageRecord applies to legacy metered prices.
export async function reportUsageToStripe(
  subscriptionItemId: string,
  quantity: number
): Promise<void> {
  await stripe.subscriptionItems.createUsageRecord(
    subscriptionItemId,
    {
      quantity,
      timestamp: Math.floor(Date.now() / 1000),
      action: 'set', // 'set' for absolute value, 'increment' for addition
    }
  );
}

Common mistakes

1. Setting limits too tight at launch

Start generous. Nothing is more frustrating for a new customer than immediately hitting a limit. You can always tighten later.

2. No grace period

Give customers who just exceed their limit a short grace period (e.g., 10% buffer). Hard cutoffs feel hostile.
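A 10% buffer can be a one-line change to the allow check. A sketch, with GRACE_FACTOR and isAllowedWithGrace as assumed names:

```typescript
// Allow 10% over the plan limit before hard-blocking.
const GRACE_FACTOR = 1.1;

export function isAllowedWithGrace(
  current: number,
  maxRequests: number
): boolean {
  return current <= Math.floor(maxRequests * GRACE_FACTOR);
}
```

Pair the grace zone with an alert (see the overage alerts above) so the buffer nudges customers toward an upgrade rather than silently absorbing overuse.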

3. Rate limiting without monitoring

If you don't know who's hitting limits, you're missing upsell opportunities and bugs. Log every 429 response.

// Log rate limit hits for analysis; fields come from the
// rate limiter middleware shown earlier
await analytics.track('rate_limit_hit', {
  tenantId,
  plan,
  endpoint,
  currentUsage: current,
  limit: config.maxRequests,
});

4. Forgetting to exclude internal services

Your own microservices shouldn't hit rate limits. Use a separate authentication layer for service-to-service communication.
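One minimal approach, assuming internal requests carry a shared-secret header (the header name and secret source are assumptions, not part of the article's stack):

```typescript
// True when the request comes from one of our own services and
// should bypass tenant rate limits entirely.
export function isInternalRequest(
  headers: Record<string, string | undefined>,
  internalSecret: string
): boolean {
  const token = headers['x-internal-token'];
  return token !== undefined && token === internalSecret;
}
```

In production you'd typically prefer mTLS or signed service tokens over a static shared secret, but the principle is the same: identify the caller before the limiter runs, and skip it for trusted peers.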

Conclusion: rate limiting as a growth strategy

Rate limiting is more than protection — it's a growth strategy. By tying it to usage-based pricing, you:

  • Lower the barrier to entry: customers only pay for what they use
  • Increase lifetime value: heavy users automatically upgrade
  • Protect your platform: no single customer can bring down your service
  • Gain insights: usage data tells you exactly where your product delivers value

Start with a simple implementation (Redis plus a fixed-window counter, as shown above), measure usage, and build your pricing model from there. Your customers — and your infrastructure — will thank you.