March 26, 2026 · 8 min read

API Rate Limiting — Patterns and Implementation

Implement API rate limiting with token bucket, sliding window, and fixed window algorithms. Node.js and Redis examples, HTTP headers, and production patterns.

Tags: api · rate limiting · backend · security · redis

Without rate limiting, a single user (or bot) can hammer your API with thousands of requests per second and either crash your server or rack up enormous cloud bills. Rate limiting isn't optional for any public-facing API.

But "just add rate limiting" is vague. There are multiple algorithms, each with different trade-offs. The choice depends on whether you need hard limits, smooth throttling, or fairness across users.

The Three Algorithms You Need to Know

| Algorithm | How it works | Best for |
| --- | --- | --- |
| Fixed Window | Count requests in fixed time intervals (e.g., per minute) | Simple APIs, low-stakes limits |
| Sliding Window | Count requests in a rolling time window | Most APIs, good balance of accuracy and performance |
| Token Bucket | Tokens accumulate at a fixed rate; each request costs a token | Burst-tolerant APIs, CDNs, complex throttling |

Fixed Window

Count requests in fixed time intervals. If the limit is 100/minute, the window resets every 60 seconds.

Problem: A user can send 100 requests at 11:59:59 and 100 more at 12:00:00 — 200 requests in 2 seconds. The boundary between windows creates a burst vulnerability.
// In-memory fixed window (for single-server apps)
const rateLimits = new Map();

function fixedWindowLimiter(key, limit, windowMs) {
  const now = Date.now();
  const windowStart = Math.floor(now / windowMs) * windowMs;
  const windowKey = `${key}:${windowStart}`;

  const current = rateLimits.get(windowKey) || 0;

  if (current >= limit) {
    return { allowed: false, remaining: 0, resetAt: windowStart + windowMs };
  }

  rateLimits.set(windowKey, current + 1);

  // Clean up entries from previous windows
  for (const [k] of rateLimits) {
    if (!k.endsWith(`:${windowStart}`)) {
      rateLimits.delete(k);
    }
  }

  return {
    allowed: true,
    remaining: limit - current - 1,
    resetAt: windowStart + windowMs,
  };
}
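The boundary burst is easy to demonstrate. Here is a standalone toy simulation (a simulated clock and a stripped-down counter, not the function above) showing that a 100/minute fixed window admits 200 requests within 2 milliseconds at the window edge:

```javascript
// Toy fixed-window counter with a simulated clock (milliseconds).
const limit = 100;
const windowMs = 60_000;
const counts = new Map();

function allow(nowMs) {
  const windowStart = Math.floor(nowMs / windowMs) * windowMs;
  const current = counts.get(windowStart) || 0;
  if (current >= limit) return false;
  counts.set(windowStart, current + 1);
  return true;
}

let allowed = 0;
// 100 requests in the last millisecond of the first window...
for (let i = 0; i < 100; i++) if (allow(59_999)) allowed++;
// ...and 100 more one millisecond later, in the next window.
for (let i = 0; i < 100; i++) if (allow(60_000)) allowed++;
console.log(allowed); // 200 — double the nominal limit
```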

Sliding Window Log

Track the timestamp of every request. Count how many fall within the last N seconds. More accurate than fixed window, but uses more memory.

const requestLogs = new Map();

function slidingWindowLimiter(key, limit, windowMs) {
  const now = Date.now();
  const windowStart = now - windowMs;

  // Get or create the log for this key
  let log = requestLogs.get(key) || [];

  // Remove expired entries
  log = log.filter((timestamp) => timestamp > windowStart);

  if (log.length >= limit) {
    requestLogs.set(key, log);
    return {
      allowed: false,
      remaining: 0,
      retryAfter: Math.ceil((log[0] + windowMs - now) / 1000),
    };
  }

  log.push(now);
  requestLogs.set(key, log);

  return {
    allowed: true,
    remaining: limit - log.length,
  };
}

Token Bucket

The most flexible algorithm. Tokens accumulate at a steady rate (e.g., 10 per second). Each request consumes a token. The bucket has a maximum capacity. This naturally allows bursts (up to the bucket size) while enforcing a long-term average rate.

const buckets = new Map();

function tokenBucketLimiter(key, capacity, refillRate, refillIntervalMs) {
  const now = Date.now();
  let bucket = buckets.get(key);

  if (!bucket) {
    bucket = { tokens: capacity, lastRefill: now };
    buckets.set(key, bucket);
  }

  // Refill tokens based on elapsed time
  const elapsed = now - bucket.lastRefill;
  const tokensToAdd = Math.floor(elapsed / refillIntervalMs) * refillRate;

  if (tokensToAdd > 0) {
    bucket.tokens = Math.min(capacity, bucket.tokens + tokensToAdd);
    bucket.lastRefill = now;
  }

  if (bucket.tokens < 1) {
    return {
      allowed: false,
      remaining: 0,
      retryAfter: Math.ceil(refillIntervalMs / 1000),
    };
  }

  bucket.tokens -= 1;
  return {
    allowed: true,
    remaining: Math.floor(bucket.tokens),
  };
}
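A quick way to see the burst-then-throttle behavior is a stripped-down bucket with a simulated clock (a standalone sketch, not the tokenBucketLimiter above):

```javascript
// Minimal token bucket: capacity 3, refilling 1 token per simulated second.
function makeBucket(capacity, refillPerSec) {
  let tokens = capacity;
  let last = 0; // simulated clock, in ms

  return function take(nowMs) {
    // Continuous refill based on elapsed simulated time
    tokens = Math.min(capacity, tokens + ((nowMs - last) / 1000) * refillPerSec);
    last = nowMs;
    if (tokens < 1) return false;
    tokens -= 1;
    return true;
  };
}

const take = makeBucket(3, 1);
// A burst of 3 at t=0 drains the bucket; the 4th request is rejected.
console.log(take(0), take(0), take(0), take(0)); // true true true false
// Two simulated seconds later, 2 tokens have refilled.
console.log(take(2000), take(2000), take(2000)); // true true false
```

Note this sketch refills continuously, whereas the version above refills in discrete refillIntervalMs steps; both enforce the same long-term average rate.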

Production Implementation with Redis

In-memory rate limiting only works on a single server. The moment you scale to multiple instances, each server tracks counts independently. A user hitting two servers gets double the limit.

Redis solves this with atomic operations and automatic expiration:

const Redis = require("ioredis");
const redis = new Redis(process.env.REDIS_URL);

async function rateLimitRedis(key, limit, windowSeconds) {
  const redisKey = `ratelimit:${key}`;

  const multi = redis.multi();
  multi.incr(redisKey);
  multi.ttl(redisKey);

  const results = await multi.exec();
  const count = results[0][1];
  const ttl = results[1][1];

  // Set expiration on the first request in the window
  if (ttl === -1) {
    await redis.expire(redisKey, windowSeconds);
  }

  const remaining = Math.max(0, limit - count);
  const allowed = count <= limit;

  return {
    allowed,
    remaining,
    limit,
    resetAt: Date.now() + (ttl > 0 ? ttl * 1000 : windowSeconds * 1000),
  };
}

Sliding Window with Redis (More Accurate)

Using a sorted set for sliding window:

async function slidingWindowRedis(key, limit, windowMs) {
  const now = Date.now();
  const windowStart = now - windowMs;
  const redisKey = `ratelimit:sw:${key}`;

  const pipe = redis.pipeline();
  pipe.zremrangebyscore(redisKey, 0, windowStart); // Remove expired entries
  pipe.zadd(redisKey, now, `${now}:${Math.random()}`); // Add current request
  pipe.zcard(redisKey); // Count requests in window
  pipe.expire(redisKey, Math.ceil(windowMs / 1000)); // Set TTL

  const results = await pipe.exec();
  const count = results[2][1];

  return {
    allowed: count <= limit,
    remaining: Math.max(0, limit - count),
    limit,
  };
}

Express Middleware

Wrap the rate limiter into Express middleware:

function createRateLimiter({ limit = 100, windowSeconds = 60, keyFn } = {}) {
  return async (req, res, next) => {
    // Determine the rate limit key
    const key = keyFn
      ? keyFn(req)
      : req.user?.id || req.ip;

    const result = await rateLimitRedis(key, limit, windowSeconds);

    // Set standard rate limit headers
    res.set({
      "X-RateLimit-Limit": result.limit,
      "X-RateLimit-Remaining": result.remaining,
      "X-RateLimit-Reset": Math.ceil(result.resetAt / 1000),
    });

    if (!result.allowed) {
      res.set("Retry-After", String(windowSeconds));
      return res.status(429).json({
        error: "Too many requests",
        retryAfter: windowSeconds,
      });
    }

    next();
  };
}

// Usage
const apiLimiter = createRateLimiter({ limit: 100, windowSeconds: 60 });
const authLimiter = createRateLimiter({ limit: 5, windowSeconds: 300 }); // 5 per 5 min

app.use("/api/", apiLimiter);
app.use("/api/auth/login", authLimiter);

Different Limits for Different Endpoints

Not all endpoints are equal. A GET request reading public data is cheap. A POST request that sends emails or processes payments is expensive.

const RATE_LIMITS = {
  default:    { limit: 100, windowSeconds: 60 },
  auth:       { limit: 5,   windowSeconds: 300 },
  upload:     { limit: 10,  windowSeconds: 60 },
  search:     { limit: 30,  windowSeconds: 60 },
  webhook:    { limit: 1000, windowSeconds: 60 },
};

function rateLimitByRoute(routeKey) {
  const config = RATE_LIMITS[routeKey] || RATE_LIMITS.default;
  return createRateLimiter(config);
}

app.post("/api/auth/login", rateLimitByRoute("auth"), loginHandler);
app.post("/api/upload", rateLimitByRoute("upload"), uploadHandler);
app.get("/api/search", rateLimitByRoute("search"), searchHandler);

Rate Limit by API Key (for Public APIs)

If you're building a public API, rate limit by API key and offer different tiers:

const PLAN_LIMITS = {
  free:       { limit: 100,   windowSeconds: 3600 },  // 100/hour
  pro:        { limit: 1000,  windowSeconds: 3600 },   // 1,000/hour
  enterprise: { limit: 10000, windowSeconds: 3600 },   // 10,000/hour
};

async function apiKeyRateLimiter(req, res, next) {
  const apiKey = req.headers["x-api-key"];
  if (!apiKey) {
    return res.status(401).json({ error: "API key required" });
  }

  const keyData = await db.apiKey.findUnique({ where: { key: apiKey } });
  if (!keyData) {
    return res.status(401).json({ error: "Invalid API key" });
  }

  const planConfig = PLAN_LIMITS[keyData.plan] || PLAN_LIMITS.free;
  const result = await rateLimitRedis(
    `apikey:${apiKey}`,
    planConfig.limit,
    planConfig.windowSeconds
  );

  res.set({
    "X-RateLimit-Limit": planConfig.limit,
    "X-RateLimit-Remaining": result.remaining,
    "X-RateLimit-Reset": Math.ceil(result.resetAt / 1000),
  });

  if (!result.allowed) {
    return res.status(429).json({
      error: "Rate limit exceeded",
      upgrade_url: "https://myapi.com/pricing",
    });
  }

  req.apiKeyData = keyData;
  next();
}

HTTP Headers You Should Return

These headers are the de facto standard for communicating rate limit status:

| Header | Example value | Purpose |
| --- | --- | --- |
| X-RateLimit-Limit | 100 | Maximum requests allowed in window |
| X-RateLimit-Remaining | 47 | Requests remaining in current window |
| X-RateLimit-Reset | 1679097600 | Unix timestamp when the window resets |
| Retry-After | 60 | Seconds to wait before retrying (on 429 responses) |
Good API clients read these headers and self-throttle. Bad clients ignore them and get banned.
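A well-behaved client can turn these headers into a back-off delay before its next request. A minimal sketch (the throttleDelayMs helper and the plain-object header shape are illustrative, not a real library API):

```javascript
// Compute how many milliseconds to wait before the next request, given the
// rate limit headers of the last response (lowercase keys, as Node exposes them).
function throttleDelayMs(headers, nowMs = Date.now()) {
  const remaining = Number(headers["x-ratelimit-remaining"]);
  const resetAtMs = Number(headers["x-ratelimit-reset"]) * 1000; // Unix seconds -> ms

  // Budget left (or headers absent): no need to wait
  if (!Number.isFinite(remaining) || remaining > 0) return 0;

  // Out of budget: wait until the window resets
  return Math.max(0, resetAtMs - nowMs);
}

// 0 remaining, window resets 60 seconds from "now": wait 60,000 ms
console.log(
  throttleDelayMs(
    { "x-ratelimit-remaining": "0", "x-ratelimit-reset": "1700000060" },
    1700000000 * 1000
  )
); // 60000
```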

Rate Limiting Behind a Reverse Proxy

If your app is behind Nginx, Cloudflare, or a load balancer, req.ip returns the proxy's IP, not the client's. All users share one rate limit. Fix it:

// Trust the X-Forwarded-For header (only if you trust the proxy)
app.set("trust proxy", 1); // Trust first proxy

// Now req.ip returns the real client IP

Be careful: if you trust X-Forwarded-For without a trusted proxy, clients can spoof their IP to bypass rate limits. Only enable trust proxy when your app is actually behind a proxy you control.

Common Mistakes

  1. Rate limiting by IP only. Behind NAT (corporate networks, mobile carriers), thousands of users share one IP. Use user ID for authenticated requests, IP only as a fallback.
  2. No rate limiting on auth endpoints. Login and password reset endpoints are brute-force targets. These need strict limits: 5-10 attempts per 5 minutes.
  3. Returning a generic 500 instead of 429. Status code 429 (Too Many Requests) tells clients explicitly what happened. Include Retry-After so they know when to try again.
  4. In-memory rate limiting with multiple servers. Each server maintains its own count. Use Redis or a similar shared store.
  5. Not rate limiting internal services. Microservice A calling microservice B in a loop can create cascading failures. Apply rate limits to internal APIs too.
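The first mistake above suggests a layered key strategy. A hypothetical keyFn (the name and fallback order are illustrative) that prefers stable identifiers and falls back to IP last:

```javascript
// Pick the most specific stable identifier available for rate limiting.
function rateLimitKey(req) {
  if (req.user && req.user.id) return `user:${req.user.id}`; // authenticated traffic
  if (req.headers && req.headers["x-api-key"]) return `apikey:${req.headers["x-api-key"]}`;
  return `ip:${req.ip}`; // last resort: may be shared behind NAT
}

console.log(rateLimitKey({ user: { id: "42" }, ip: "203.0.113.9" })); // "user:42"
console.log(rateLimitKey({ headers: {}, ip: "203.0.113.9" })); // "ip:203.0.113.9"
```

This plugs directly into the createRateLimiter({ keyFn }) option shown earlier.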
Rate limiting is foundational for any API that handles real traffic. Explore more backend architecture patterns through hands-on exercises on CodeUp.