
Rate Limiting Strategies for Production APIs

Fixed window vs sliding window vs token bucket explained, with Redis implementations and advanced patterns.

Rate limiting is your API's immune system—it protects against abuse, ensures fair resource allocation, and prevents both accidental and malicious overload. Yet many APIs either skip rate limiting entirely or implement it poorly.

Why Rate Limiting Matters

Without rate limiting:

  • A single misconfigured client can crash your API
  • Attackers can brute-force authentication endpoints
  • Scrapers can steal your entire dataset
  • Resource costs spike unpredictably
  • Paying customers suffer from shared resource contention

With proper rate limiting:

  • Predictable costs and performance
  • Protection against brute force and abuse
  • Fair resource allocation among users
  • Early warning of integration problems
  • Business model enforcement (free vs paid tiers)

Rate Limiting Strategies

1. Fixed Window

Algorithm: Count requests per time window (e.g., 100 requests per hour)

// Simple fixed window implementation (in-memory; single process only)
const requestCounts = new Map();

function fixedWindowRateLimit(userId, limit = 100, windowMs = 3600000) {
  const now = Date.now();
  const windowStart = Math.floor(now / windowMs) * windowMs;
  const key = `${userId}:${windowStart}`;
  
  const count = requestCounts.get(key) || 0;
  
  if (count >= limit) {
    return {
      allowed: false,
      retryAfter: windowStart + windowMs - now
    };
  }
  
  requestCounts.set(key, count + 1);
  // NOTE: keys for past windows are never evicted here; sweep them
  // periodically (or use Redis with TTLs) to avoid unbounded growth.
  return { allowed: true, remaining: limit - count - 1 };
}

Pros: Simple to implement, low memory footprint

Cons: Burst problem at window boundaries (a client can send up to 2x the limit in a short span straddling the boundary)

Best for: Internal APIs, low-traffic endpoints
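The boundary burst called out in the cons above is easy to demonstrate. A minimal sketch, with the clock passed in explicitly (an adjustment to the implementation above) so both windows can be simulated deterministically:

```javascript
// Fixed window with an injected clock, to simulate the boundary burst.
const counts = new Map();

function fixedWindow(userId, now, limit = 100, windowMs = 3600000) {
  const windowStart = Math.floor(now / windowMs) * windowMs;
  const key = `${userId}:${windowStart}`;
  const count = counts.get(key) || 0;
  if (count >= limit) return { allowed: false };
  counts.set(key, count + 1);
  return { allowed: true };
}

const windowMs = 3600000;
let allowed = 0;

// 100 requests one second before the window boundary: all pass
for (let i = 0; i < 100; i++) {
  if (fixedWindow('u1', windowMs - 1000).allowed) allowed++;
}
// 100 more one second after the boundary: a fresh window, all pass again
for (let i = 0; i < 100; i++) {
  if (fixedWindow('u1', windowMs + 1000).allowed) allowed++;
}

console.log(allowed); // 200 requests accepted within a 2-second span
```

Two full quotas land within two seconds of real time, which is exactly what the sliding window below is designed to prevent.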

2. Sliding Window

Algorithm: Count requests in the last N seconds (rolling window)

// Sliding window with Redis
const Redis = require('ioredis');
const redis = new Redis();

async function slidingWindowRateLimit(userId, limit = 100, windowSec = 3600) {
  const now = Date.now();
  const windowStart = now - (windowSec * 1000);
  const key = `ratelimit:${userId}`;
  const member = `${now}-${Math.random()}`;
  
  // Run the window maintenance atomically so concurrent requests
  // can't slip past the limit between the count and the insert.
  const [, , [, count]] = await redis
    .multi()
    .zremrangebyscore(key, 0, windowStart) // drop entries older than the window
    .zadd(key, now, member)                // record this request
    .zcard(key)                            // size of the current window
    .expire(key, windowSec + 1)
    .exec();
  
  if (count > limit) {
    // Over the limit: roll back this request's entry
    await redis.zrem(key, member);
    const oldest = await redis.zrange(key, 0, 0, 'WITHSCORES');
    const retryAfter = parseInt(oldest[1], 10) + (windowSec * 1000) - now;
    return { allowed: false, retryAfter: Math.ceil(retryAfter / 1000) };
  }
  
  return { allowed: true, remaining: limit - count };
}

Pros: No burst problem, smooth rate limiting, accurate

Cons: More complex, higher memory usage

Best for: High-value endpoints, payment APIs, authentication

3. Token Bucket

Algorithm: Bucket fills with tokens at fixed rate, requests consume tokens

class TokenBucket {
  constructor(capacity, refillRate, refillInterval = 1000) {
    this.capacity = capacity;             // maximum burst size
    this.tokens = capacity;               // start full
    this.refillRate = refillRate;         // tokens added per interval
    this.refillInterval = refillInterval; // interval length in ms
    this.lastRefill = Date.now();
  }
  
  refill() {
    const now = Date.now();
    const timePassed = now - this.lastRefill;
    // Refill continuously in proportion to elapsed time, capped at capacity
    const tokensToAdd = (timePassed / this.refillInterval) * this.refillRate;
    
    this.tokens = Math.min(this.capacity, this.tokens + tokensToAdd);
    this.lastRefill = now;
  }
  
  consume(tokens = 1) {
    this.refill();
    
    if (this.tokens >= tokens) {
      this.tokens -= tokens;
      return { allowed: true, remaining: Math.floor(this.tokens) };
    }
    
    // Not enough tokens: report how long until the shortfall refills
    return {
      allowed: false,
      retryAfter: Math.ceil((tokens - this.tokens) / this.refillRate * this.refillInterval / 1000)
    };
  }
}

Pros: Handles bursts gracefully, industry standard (AWS, Stripe use this)

Cons: More complex implementation, requires state per user

Best for: Production APIs, handling bursty traffic patterns

4. Leaky Bucket

Algorithm: Requests fill bucket, bucket "leaks" at constant rate

Pros: Smooths out bursts completely, constant output rate

Cons: Can add latency (requests wait in queue)

Best for: Protecting downstream services, background jobs
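The post shows no code for this strategy, so here is a minimal in-memory sketch (class and method names are my own). This is the rejecting variant; a queueing variant would delay requests instead of dropping them:

```javascript
// Leaky bucket: arrivals fill the bucket, which drains at a constant rate.
// Requests that would overflow `capacity` are rejected.
class LeakyBucket {
  constructor(capacity, leakRatePerSec) {
    this.capacity = capacity;
    this.leakRate = leakRatePerSec;
    this.water = 0;              // current queue depth
    this.lastLeak = Date.now();
  }

  leak(now = Date.now()) {
    const elapsedSec = (now - this.lastLeak) / 1000;
    this.water = Math.max(0, this.water - elapsedSec * this.leakRate);
    this.lastLeak = now;
  }

  tryAdd(now = Date.now()) {
    this.leak(now);
    if (this.water + 1 > this.capacity) {
      return { allowed: false };
    }
    this.water += 1;
    return { allowed: true, queued: Math.ceil(this.water) };
  }
}

const bucket = new LeakyBucket(5, 1); // 5 slots, drains 1 request/sec
const t0 = Date.now();
let accepted = 0;
for (let i = 0; i < 10; i++) {
  if (bucket.tryAdd(t0).allowed) accepted++;
}
console.log(accepted); // 5 — the burst beyond capacity is rejected
```

Note the contrast with the token bucket: a full token bucket lets a burst through immediately, while the leaky bucket releases work downstream at a fixed pace no matter how fast it arrives.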

Rate Limit Headers (Standard)

Retry-After comes from RFC 9110; the RateLimit-* fields follow the IETF draft draft-ietf-httpapi-ratelimit-headers:

function addRateLimitHeaders(res, result) {
  // Standard headers
  res.set('RateLimit-Limit', result.limit);
  res.set('RateLimit-Remaining', result.remaining);
  res.set('RateLimit-Reset', result.reset);
  
  // Legacy X- headers for compatibility
  res.set('X-RateLimit-Limit', result.limit);
  res.set('X-RateLimit-Remaining', result.remaining);
  res.set('X-RateLimit-Reset', result.reset);
  
  if (!result.allowed) {
    res.set('Retry-After', result.retryAfter);
    res.status(429).json({
      error: 'Too Many Requests',
      message: `Rate limit exceeded. Retry after ${result.retryAfter} seconds.`,
      limit: result.limit,
      retryAfter: result.retryAfter
    });
  }
}

Multi-Tier Rate Limiting

Different limits for different user tiers:

const RATE_LIMITS = {
  free: { requests: 100, window: 3600 },      // 100/hour
  basic: { requests: 1000, window: 3600 },    // 1000/hour
  pro: { requests: 10000, window: 3600 },     // 10k/hour
  enterprise: { requests: 100000, window: 3600 } // 100k/hour
};

async function getRateLimit(userId) {
  const user = await db.findUser(userId);
  const tier = user.tier || 'free';
  return RATE_LIMITS[tier];
}

Endpoint-Specific Limits

Different endpoints need different limits:

const ENDPOINT_LIMITS = {
  'POST /auth/login': { limit: 5, window: 300 },      // 5 per 5 min
  'POST /auth/signup': { limit: 3, window: 3600 },    // 3 per hour
  'GET /api/search': { limit: 100, window: 60 },      // 100 per minute
  'POST /api/payments': { limit: 10, window: 60 },    // 10 per minute
  'GET /api/*': { limit: 1000, window: 3600 }         // Default
};
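Resolving a request against this table needs a lookup that prefers exact matches and falls back to wildcards. A sketch (the resolveLimit helper is an assumption, not from the post):

```javascript
const ENDPOINT_LIMITS = {
  'POST /auth/login': { limit: 5, window: 300 },
  'POST /auth/signup': { limit: 3, window: 3600 },
  'GET /api/search': { limit: 100, window: 60 },
  'POST /api/payments': { limit: 10, window: 60 },
  'GET /api/*': { limit: 1000, window: 3600 }
};

// Exact match first, then wildcard prefixes; the longest prefix wins.
function resolveLimit(method, path) {
  const exact = ENDPOINT_LIMITS[`${method} ${path}`];
  if (exact) return exact;

  const candidates = Object.keys(ENDPOINT_LIMITS)
    .filter(rule => rule.endsWith('*'))
    .filter(rule => {
      const [m, prefix] = rule.split(' ');
      return m === method && path.startsWith(prefix.slice(0, -1));
    })
    .sort((a, b) => b.length - a.length); // most specific first

  return candidates.length ? ENDPOINT_LIMITS[candidates[0]] : null;
}

console.log(resolveLimit('GET', '/api/search').limit); // 100 (exact rule)
console.log(resolveLimit('GET', '/api/users').limit);  // 1000 (wildcard default)
```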

Advanced Patterns

1. Cost-Based Rate Limiting

Different endpoints cost different amounts of "tokens":

const ENDPOINT_COSTS = {
  'GET /api/users/:id': 1,           // Cheap: single record
  'GET /api/users': 5,               // Moderate: list query
  'POST /api/reports/generate': 50,  // Expensive: computation
  'POST /api/bulk-import': 100       // Very expensive: bulk
};
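Cost-based limiting slots naturally into a token bucket: each request consumes its endpoint's cost instead of a single token. A sketch with a simplified per-user budget standing in for the bucket (consumeCost and its refill-free budget are assumptions for illustration):

```javascript
const ENDPOINT_COSTS = {
  'GET /api/users/:id': 1,
  'GET /api/users': 5,
  'POST /api/reports/generate': 50,
  'POST /api/bulk-import': 100
};

// Simplified per-user budget; a real deployment would refill over time.
const budgets = new Map();

function consumeCost(userId, endpoint, capacity = 100) {
  const cost = ENDPOINT_COSTS[endpoint] ?? 1; // unlisted endpoints cost 1
  const remaining = budgets.has(userId) ? budgets.get(userId) : capacity;
  if (remaining < cost) {
    return { allowed: false, remaining };
  }
  budgets.set(userId, remaining - cost);
  return { allowed: true, remaining: remaining - cost };
}

console.log(consumeCost('u1', 'POST /api/reports/generate')); // { allowed: true, remaining: 50 }
console.log(consumeCost('u1', 'POST /api/reports/generate')); // { allowed: true, remaining: 0 }
console.log(consumeCost('u1', 'GET /api/users/:id').allowed); // false — budget exhausted
```

The upshot: a user can make 100 cheap reads per budget, or two expensive report generations, without needing separate limits per endpoint.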

2. Dynamic Rate Limiting

Adjust limits based on system load:

function getDynamicLimit(baseLimit) {
  const load = getSystemLoad();
  
  if (load > 0.9) {
    return Math.floor(baseLimit * 0.5);  // 50% under extreme load
  } else if (load > 0.7) {
    return Math.floor(baseLimit * 0.75); // 75% under high load
  }
  
  return baseLimit; // 100% under normal load
}

3. Rate Limit Exemptions

Whitelist trusted IPs or users:

const WHITELISTED_IPS = [
  '10.0.0.0/8',      // Internal network
  '123.45.67.89'     // Monitoring service
];

const WHITELISTED_USERS = [];  // trusted user IDs, e.g. internal service accounts

function isWhitelisted(req) {
  if (WHITELISTED_IPS.some(range => ipInRange(req.ip, range))) {
    return true;
  }
  if (req.user && WHITELISTED_USERS.includes(req.user.id)) {
    return true;
  }
  return false;
}
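The ipInRange helper is left undefined above; for completeness, a minimal IPv4 CIDR matcher could look like this (a sketch, not a substitute for a vetted IP library, and IPv4-only):

```javascript
// Convert a dotted-quad IPv4 address to an unsigned 32-bit integer.
function ipToInt(ip) {
  return ip.split('.').reduce((acc, octet) => (acc << 8) | parseInt(octet, 10), 0) >>> 0;
}

// Match a plain IP exactly, or an a.b.c.d/n CIDR range by masked comparison.
function ipInRange(ip, range) {
  if (!range.includes('/')) return ip === range;
  const [base, bits] = range.split('/');
  const mask = bits === '0' ? 0 : (~0 << (32 - parseInt(bits, 10))) >>> 0;
  return (ipToInt(ip) & mask) === (ipToInt(base) & mask);
}

console.log(ipInRange('10.1.2.3', '10.0.0.0/8'));       // true
console.log(ipInRange('11.1.2.3', '10.0.0.0/8'));       // false
console.log(ipInRange('123.45.67.89', '123.45.67.89')); // true
```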

Monitoring & Alerts

Track rate limit violations:

const rateLimitCounter = new metrics.Counter({
  name: 'api_rate_limit_exceeded_total',
  help: 'Total number of rate limit violations',
  labelNames: ['endpoint', 'user_tier']
});

function logRateLimitViolation(req, user) {
  rateLimitCounter.inc({
    endpoint: `${req.method} ${req.path}`,
    user_tier: user?.tier || 'anonymous'
  });
}

Best Practices Checklist

  • Implement on ALL endpoints — Even "safe" GET requests can be abused
  • Use multiple layers — Global + per-user + per-endpoint
  • Return standard headers — RateLimit-* and Retry-After
  • Stricter limits for authentication — Prevent brute force attacks
  • Different limits for different tiers — Monetize via rate limits
  • Cost-based for expensive operations — Charge more tokens for heavy endpoints
  • Use distributed storage (Redis) — For multi-server deployments
  • Monitor and alert — Track violations and abuse patterns
  • Whitelist trusted sources — Skip internal services
  • Test thoroughly — Verify limits work as expected
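The "multiple layers" item can be sketched as a small composition helper (all names here are illustrative): each layer returns an allow/deny result, checks run cheapest-first, and the first rejection short-circuits.

```javascript
// Compose rate-limit layers: each returns { allowed, retryAfter? }.
function composeLimiters(...limiters) {
  return (req) => {
    for (const limiter of limiters) {
      const result = limiter(req);
      if (!result.allowed) return result;
    }
    return { allowed: true };
  };
}

// Toy layers for illustration; real ones would wrap the strategies above.
const globalLimit = (req) => ({ allowed: req.globalCount < 10000 });
const userLimit = (req) => ({ allowed: req.userCount < 100, retryAfter: 60 });
const endpointLimit = (req) =>
  ({ allowed: req.path !== '/auth/login' || req.userCount < 5 });

const checkRequest = composeLimiters(globalLimit, userLimit, endpointLimit);

console.log(checkRequest({ globalCount: 1, userCount: 1, path: '/api/users' }));
// { allowed: true }
console.log(checkRequest({ globalCount: 1, userCount: 7, path: '/auth/login' }));
// { allowed: false } — the endpoint layer rejects
```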

Conclusion

Rate limiting is not optional. Choose the strategy that fits your needs:

  • Token Bucket: Best for most APIs (industry standard)
  • Sliding Window: Precise control, no burst issues
  • Fixed Window: Simple, low traffic
  • Leaky Bucket: Smooth output, protect downstream

Start with token bucket unless you have specific requirements.

ThreeStack specializes in API security for fintech companies. Get a free security scan or learn about our audit services.
