Rate Limiters Demystified

Bhuvnesh Sharma
Software Engineer

Rate limiting is one of those infrastructure primitives that every production API needs, yet most teams copy-paste from a library without understanding the trade-offs underneath. Get it wrong and you are either letting abusers hammer your service or blocking legitimate users at inconvenient moments.

This post tears apart two algorithms, the Sliding Window Counter and the Token Bucket, and builds each one from scratch using Node.js and Redis. Along the way we cover the edge cases that haunt real deployments: the boundary burst problem, clock drift in distributed systems, memory costs at scale, and the subtle fairness differences that determine which algorithm fits your use case.

If you have read the companion NestJS architecture deep-dive, you already know how to structure your service layer with Injectable providers and module boundaries. This post applies the same thinking to a cross-cutting concern: the rate-limiter lives in a dedicated module, exposes a clean service interface, and slots into middleware with zero coupling to business logic.

Why Redis? Both algorithms need atomic, shared state across multiple API servers. Redis gives us sub-millisecond latency, atomic Lua scripting, built-in key expiry, and a data model that maps perfectly onto both implementations.

Prerequisites

  • Node.js 20 LTS or later
  • Redis 7.x running locally (docker run -p 6379:6379 redis:7-alpine)
  • ioredis (npm install ioredis)
  • Basic TypeScript knowledge (decorators, generics, async/await)
  • Familiarity with HTTP 429 Too Many Requests and Retry-After headers

The Problem Space

Before choosing an algorithm, we need to be precise about what we are measuring and enforcing.

What Is a Rate Limit?

A rate limit is a policy of the form: “at most N requests per unit of time T for a given identity I.” The identity can be an API key, an IP address, a user ID, or a composite like user + endpoint.
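As a minimal sketch of how such an identity might be turned into a limiter key, the helper below prefers the most specific identifier available and optionally appends the endpoint. The `RateLimitIdentity` shape and the `buildRateLimitKey` name are illustrative assumptions, not part of any library.

```typescript
// Hypothetical identity shape: only the fallback IP is mandatory.
interface RateLimitIdentity {
  apiKey?: string;
  userId?: string;
  ip: string;
  endpoint?: string; // e.g. "POST /login"; omit for a global per-client limit
}

// Prefer the most specific stable identifier: API key, then user ID, then IP.
function buildRateLimitKey(id: RateLimitIdentity): string {
  const who = id.apiKey
    ? `key:${id.apiKey}`
    : id.userId
      ? `user:${id.userId}`
      : `ip:${id.ip}`;
  // Appending the endpoint yields a composite "user + endpoint" identity.
  return id.endpoint ? `${who}|${id.endpoint}` : who;
}

console.log(buildRateLimitKey({ ip: '203.0.113.7' }));
// ip:203.0.113.7
console.log(buildRateLimitKey({ userId: '42', ip: '203.0.113.7', endpoint: 'POST /login' }));
// user:42|POST /login
```

Keeping this logic in one pure function means the choice of identity can change without touching the limiter itself.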

The unit of time T is deceptively important. “100 req/min” sounds simple, but it can mean three very different things depending on the algorithm:

  • At most 100 in each 60-second calendar window (fixed window)
  • At most 100 in any rolling 60-second period ending right now (sliding window)
  • Tokens arrive at 1.67 per second, and you can accumulate up to 100 (token bucket)

Each interpretation yields a different algorithm with different burst characteristics, memory costs, and implementation complexity. We start with the fixed-window baseline to see why it fails, then build the other two properly and compare them in action.

The Fixed Window Baseline (and Why It Fails)

Before the good algorithms, let us understand the naive approach, Fixed Window, so we know exactly what problem it introduces.

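// Fixed Window, naive approach
import Redis from 'ioredis';

const redis = new Redis(); // assumes Redis on localhost:6379, as in the prerequisites

async function fixedWindowCheck(key: string, limit: number, windowMs: number): Promise<boolean> {
  // One counter per calendar window; the suffix is the window's sequence number
  const windowKey = `ratelimit:${key}:${Math.floor(Date.now() / windowMs)}`;
  const count = await redis.incr(windowKey);
  // First hit in this window: let the counter expire once the window has passed
  if (count === 1) await redis.pexpire(windowKey, windowMs);
  return count <= limit;
}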

This works until you think about the boundary. A user can make 100 requests at 00:59 and another 100 at 01:01, getting 200 requests in a 2-second span while technically respecting the limit. This is the “double-spend” boundary burst, and it is the primary motivation for every algorithm that follows.

Key insight: The fundamental failure of Fixed Window is that it counts in discrete epochs rather than across a continuous sliding time horizon. Both Sliding Window and Token Bucket solve this in different ways.
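The boundary burst can be reproduced without Redis. The sketch below is an in-memory stand-in for the fixed-window counter (the `simulateFixedWindow` name and the Map-based storage are assumptions made for illustration); it replays 100 requests at 00:59 and 100 more at 01:01 against a limit of 100 per minute.

```typescript
// In-memory stand-in for the Redis counters: one count per fixed-window epoch.
function simulateFixedWindow(timestampsMs: number[], limit: number, windowMs: number): number {
  const counts = new Map<number, number>();
  let allowed = 0;
  for (const t of timestampsMs) {
    const epoch = Math.floor(t / windowMs); // same bucketing as fixedWindowCheck
    const count = (counts.get(epoch) ?? 0) + 1;
    counts.set(epoch, count);
    if (count <= limit) allowed++;
  }
  return allowed;
}

// Times are in ms since the start of minute 0.
const burstTimes = [
  ...Array.from({ length: 100 }, () => 59_000), // 100 requests at 00:59
  ...Array.from({ length: 100 }, () => 61_000), // 100 requests at 01:01
];

const accepted = simulateFixedWindow(burstTimes, 100, 60_000);
console.log(accepted); // 200: every request passes although all fall within 2 seconds
```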

Sliding Window Counter

The Intuition

The Sliding Window Counter solves the boundary burst by blending two fixed windows: the current window and the previous one. Instead of a hard reset at the window boundary, the algorithm computes a weighted count:

// Conceptual formula
const weight = 1 - (elapsedInCurrentWindow / windowDurationMs);
const estimate = (previousCount * weight) + currentCount;

At the start of a new window the previous count contributes its full weight, gradually decaying to zero as the current window ages. This gives a smooth, continuous approximation of a true sliding window at the cost of only two integers per key in Redis.
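A few concrete numbers make the decay easier to see. The sketch below wraps the conceptual formula in a pure function (the `slidingWindowEstimate` name is an assumption) and evaluates it at three points in a 60-second window, with 80 requests counted in the previous window and 30 so far in the current one. Note that the weighting implicitly assumes the previous window's requests were spread evenly across it.

```typescript
// Weighted estimate of requests in the rolling window ending now.
function slidingWindowEstimate(
  previousCount: number,
  currentCount: number,
  elapsedInCurrentWindow: number, // ms since the current window started
  windowDurationMs: number,
): number {
  const weight = 1 - elapsedInCurrentWindow / windowDurationMs;
  return previousCount * weight + currentCount;
}

const windowMs = 60_000;
const atStart = slidingWindowEstimate(80, 30, 0, windowMs);             // weight 1.00
const atQuarter = slidingWindowEstimate(80, 30, 15_000, windowMs);      // weight 0.75
const atThreeQuarter = slidingWindowEstimate(80, 30, 45_000, windowMs); // weight 0.25

console.log(atStart, atQuarter, atThreeQuarter); // 110 90 50
```

With a limit of 100, the first case would be rejected and the other two accepted, even though the raw counters are identical in all three.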

Redis Data Model

Each rate-limit key maps to two Redis string keys: one for the current window counter and one for the previous. We use a Lua script to make the entire read-modify-write atomic. Without atomicity a race between two API servers could allow extra requests to slip through.

The Lua Script

-- sliding_window.lua
local key = KEYS[1]
local limit = tonumber(ARGV[1])
local windowMs = tonumber(ARGV[2])
local now = tonumber(ARGV[3]) -- unix ms from Node

local windowStart = math.floor(now / windowMs) * windowMs
local prevStart = windowStart - windowMs
-- Both keys are derived from KEYS[1]; on Redis Cluster, wrap the client
-- key in a {hash tag} so they land on the same slot.
local currKey = key .. ':' .. windowStart
local prevKey = key .. ':' .. prevStart

-- Read both counters in one call
local counts = redis.call('MGET', currKey, prevKey)
local curr = tonumber(counts[1]) or 0
local prev = tonumber(counts[2]) or 0

-- Weighted estimate
local elapsed = now - windowStart
local weight = 1 - (elapsed / windowMs)
local estimate = prev * weight + curr

if estimate >= limit then
  -- Denied: report the time until the current window ends (a conservative wait)
  return {0, math.ceil(windowStart + windowMs - now), 0}
end

redis.call('INCR', currKey)
redis.call('PEXPIRE', currKey, windowMs * 2)
-- Allowed: the third element is the whole number of requests still available
return {1, 0, math.max(0, math.floor(limit - estimate - 1))}

Node.js Implementation

// sliding-window.ts
import Redis from 'ioredis';
import fs from 'fs';
import path from 'path';

export interface RateLimitResult {
  allowed: boolean;
  remaining: number;
  retryAfterMs: number;
}

export class SlidingWindowRateLimiter {
  private sha: string | null = null;

  constructor(
    private redis: Redis,
    public limit: number,
    private windowMs: number,
    private clock: () => number = Date.now, // injectable for testing
  ) {}

  async load(): Promise<void> {
    const script = fs.readFileSync(
      path.join(__dirname, 'sliding_window.lua'), 'utf8'
    );
    this.sha = (await this.redis.script('LOAD', script)) as string;
  }

  async check(key: string, retried = false): Promise<RateLimitResult> {
    const now = this.clock();
    try {
      // Loading inside the try block means a Redis outage here also fails open
      if (!this.sha) await this.load();
      const [allowed, retryAfterMs, remaining] = await this.redis.evalsha(
        this.sha!, 1, `sw:${key}`,
        String(this.limit), String(this.windowMs), String(now),
      ) as [number, number, number];
      return { allowed: allowed === 1, remaining, retryAfterMs };
    } catch (err: any) {
      if (!retried && err.message?.includes('NOSCRIPT')) {
        this.sha = null;
        return this.check(key, true); // reload and retry once
      }
      // Fail open, Redis down should not take down the API
      return { allowed: true, remaining: this.limit, retryAfterMs: 0 };
    }
  }
}

Express Middleware

import { Request, Response, NextFunction } from 'express';
import { SlidingWindowRateLimiter } from './sliding-window';

export function slidingWindowMiddleware(
  limiter: SlidingWindowRateLimiter,
  // req.ip may be undefined in recent Express typings; fall back to a shared key
  keyFn: (req: Request) => string = (req) => req.ip ?? 'unknown',
) {
  return async (req: Request, res: Response, next: NextFunction) => {
    const result = await limiter.check(keyFn(req));
    res.setHeader('X-RateLimit-Limit', limiter.limit);
    res.setHeader('X-RateLimit-Remaining', result.remaining);
    if (!result.allowed) {
      // Retry-After is expressed in whole seconds
      res.setHeader('Retry-After', Math.ceil(result.retryAfterMs / 1000));
      res.status(429).json({ error: 'Too Many Requests', retryAfter: result.retryAfterMs });
      return;
    }
    next();
  };
}
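The header logic in the middleware is easy to unit-test if it is pulled out into a pure function. The following is a minimal sketch under that assumption; `rateLimitHeaders` is a hypothetical helper, and `RateLimitResult` is repeated here only so the block is self-contained.

```typescript
// Same shape as the RateLimitResult interface in sliding-window.ts.
interface RateLimitResult {
  allowed: boolean;
  remaining: number;
  retryAfterMs: number;
}

// Derive the response headers for a limiter decision.
function rateLimitHeaders(limit: number, result: RateLimitResult): Record<string, string> {
  const headers: Record<string, string> = {
    'X-RateLimit-Limit': String(limit),
    'X-RateLimit-Remaining': String(Math.max(0, result.remaining)),
  };
  if (!result.allowed) {
    // Retry-After takes whole seconds; round up (minimum 1) so clients never retry early.
    headers['Retry-After'] = String(Math.max(1, Math.ceil(result.retryAfterMs / 1000)));
  }
  return headers;
}

console.log(rateLimitHeaders(100, { allowed: false, remaining: 0, retryAfterMs: 1_500 }));
// { 'X-RateLimit-Limit': '100', 'X-RateLimit-Remaining': '0', 'Retry-After': '2' }
```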

Token Bucket

The Intuition

The Token Bucket algorithm models a physical bucket. The bucket has a maximum capacity (burst limit). Tokens are added at a fixed rate (refill rate). Each request consumes one or more tokens. If the bucket is empty, the request is rejected.

Unlike a window counter, which records how many requests arrived in a span of time, the token bucket records how much allowance is left: for each client it remembers exactly how many tokens remain and when the last refill happened. This makes it excellent for use cases that benefit from controlled bursting.
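Before moving the state into Redis, it can help to see the bucket as a plain in-memory object. The sketch below is a single-process illustration only (the `InMemoryTokenBucket` class and `tryConsume` method are assumed names); it uses the same lazy-refill idea, topping up the bucket from elapsed time whenever a request arrives instead of running a timer.

```typescript
class InMemoryTokenBucket {
  private tokens: number;
  private lastRefill: number;
  private readonly capacity: number;
  private readonly refillRate: number; // tokens per second
  private readonly clock: () => number;

  constructor(capacity: number, refillRate: number, clock: () => number = Date.now) {
    this.capacity = capacity;
    this.refillRate = refillRate;
    this.clock = clock;
    this.tokens = capacity; // a new client starts with a full bucket
    this.lastRefill = clock();
  }

  tryConsume(cost = 1): boolean {
    const now = this.clock();
    // Lazy refill: credit the tokens earned since the last call, capped at capacity.
    const elapsedSec = Math.max(0, now - this.lastRefill) / 1000;
    this.tokens = Math.min(this.capacity, this.tokens + elapsedSec * this.refillRate);
    this.lastRefill = now;
    if (this.tokens < cost) return false;
    this.tokens -= cost;
    return true;
  }
}

let t = 0; // fake time in ms
const bucket = new InMemoryTokenBucket(5, 1, () => t);

const firstSix = [1, 2, 3, 4, 5, 6].map(() => bucket.tryConsume());
console.log(firstSix); // [ true, true, true, true, true, false ]

t += 2_000; // two seconds later, two tokens have been earned back
const afterRefill = [1, 2, 3].map(() => bucket.tryConsume());
console.log(afterRefill); // [ true, true, false ]
```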

The Lua Script

-- token_bucket.lua
local key = KEYS[1]
local capacity = tonumber(ARGV[1]) -- max tokens (burst size)
local refillRate = tonumber(ARGV[2]) -- tokens per second
local now = tonumber(ARGV[3]) -- unix ms
local cost = tonumber(ARGV[4]) -- tokens this request costs

local data = redis.call('HMGET', key, 'tokens', 'lastRefill')
local tokens = tonumber(data[1]) or capacity -- new clients start with a full bucket
local lastRefill = tonumber(data[2]) or now

-- Lazy refill: compute tokens added since last check
local elapsed = math.max(0, now - lastRefill) / 1000 -- seconds
tokens = math.min(capacity, tokens + elapsed * refillRate)

-- Idle keys expire once the bucket would have refilled completely (plus 5 s slack)
local ttlMs = math.ceil(capacity / refillRate * 1000) + 5000

if tokens < cost then
  local deficit = cost - tokens
  local waitMs = math.ceil((deficit / refillRate) * 1000)
  -- HSET with several fields replaces the deprecated HMSET
  redis.call('HSET', key, 'tokens', tokens, 'lastRefill', now)
  redis.call('PEXPIRE', key, ttlMs)
  return {0, waitMs}
end

tokens = tokens - cost
redis.call('HSET', key, 'tokens', tokens, 'lastRefill', now)
redis.call('PEXPIRE', key, ttlMs)
return {1, 0}

Node.js Implementation

// token-bucket.ts
import Redis from 'ioredis';
import fs from 'fs';
import path from 'path';

export class TokenBucketRateLimiter {
  private sha: string | null = null;

  constructor(
    private redis: Redis,
    private capacity: number, // max tokens (burst size)
    private refillRate: number, // tokens per second
    private clock: () => number = Date.now,
  ) {}

  async load(): Promise<void> {
    const script = fs.readFileSync(
      path.join(__dirname, 'token_bucket.lua'), 'utf8'
    );
    this.sha = (await this.redis.script('LOAD', script)) as string;
  }

  async check(
    key: string,
    cost: number = 1,
    retried = false,
  ): Promise<{ allowed: boolean; retryAfterMs: number }> {
    const now = this.clock();
    try {
      // Loading inside the try block means a Redis outage here also fails open
      if (!this.sha) await this.load();
      const result = await this.redis.evalsha(
        this.sha!, 1, `tb:${key}`,
        String(this.capacity), String(this.refillRate),
        String(now), String(cost),
      ) as [number, number];
      return { allowed: result[0] === 1, retryAfterMs: result[1] };
    } catch (err: any) {
      // Script cache was flushed (for example by a Redis restart): reload and retry once
      if (!retried && err.message?.includes('NOSCRIPT')) {
        this.sha = null;
        return this.check(key, cost, true);
      }
      return { allowed: true, retryAfterMs: 0 }; // fail open
    }
  }
}

Variable-Cost Requests

One key advantage of Token Bucket over Sliding Window is the cost parameter. A bulk export endpoint might cost 10 tokens; a simple read costs 1. This lets you enforce resource-fair rate limits rather than request-count limits:

router.get('/export', async (req, res) => {
  const result = await tokenBucket.check(req.ip ?? 'unknown', 10); // costs 10 tokens
  if (!result.allowed) {
    res.setHeader('Retry-After', Math.ceil(result.retryAfterMs / 1000));
    return res.status(429).json({ retryAfter: result.retryAfterMs });
  }
  // ... expensive export logic
});
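Rather than hard-coding the cost in every handler, the costs could live in one table. The sketch below assumes a hypothetical `ROUTE_COSTS` map and `costFor` helper (the '/upload' route and its cost are invented for illustration); it also shows how the cost translates into the number of back-to-back calls a full bucket allows.

```typescript
// Hypothetical per-route token costs; anything not listed costs 1 token.
const ROUTE_COSTS: Record<string, number> = {
  'GET /export': 10,
  'POST /upload': 5,
};
const DEFAULT_COST = 1;

function costFor(method: string, path: string): number {
  return ROUTE_COSTS[`${method.toUpperCase()} ${path}`] ?? DEFAULT_COST;
}

// How many back-to-back calls of a given cost a full bucket can absorb.
function maxBurstCalls(capacity: number, cost: number): number {
  return Math.floor(capacity / cost);
}

console.log(costFor('get', '/export'));                       // 10
console.log(costFor('GET', '/users'));                        // 1
console.log(maxBurstCalls(100, costFor('GET', '/export')));   // 10
```

The value returned by `costFor` would then be passed as the second argument to `tokenBucket.check`.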

Clock Drift in Distributed Clusters

Token Bucket is sensitive to the timestamp passed in: if two API servers have divergent clocks, the computed elapsed time, and therefore the refill amount, will differ. Use Redis TIME to get the authoritative server timestamp rather than Node Date.now():

-- Use Redis server clock inside the Lua script
local serverTime = redis.call('TIME') -- {seconds, microseconds}, both returned as strings
local now = tonumber(serverTime[1]) * 1000 + math.floor(tonumber(serverTime[2]) / 1000)

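Trade-off: calling TIME inside the script does not add a network round-trip, because it runs on the Redis server, but it bypasses the injectable clock, so tests can no longer control time from Node. With NTP-synchronised servers the drift is typically a few milliseconds, which is negligible for windows of a second or longer; only add this in clusters with known NTP issues.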
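To get a feel for how much drift matters, the refill arithmetic can be replayed with skewed clocks. In the sketch below (the `refillAmount` helper, the 10 tokens/second rate, and the 300 ms skew are all assumptions chosen for illustration), server A wrote the last refill time and a second request arrives 200 ms later at servers whose clocks disagree.

```typescript
// Tokens credited by the lazy refill for a pair of timestamps (both in ms).
function refillAmount(lastRefillMs: number, nowMs: number, refillRate: number): number {
  return (Math.max(0, nowMs - lastRefillMs) * refillRate) / 1000;
}

const refillRate = 10;          // tokens per second
const lastRefill = 1_000_000;   // written by server A, using A's clock
const trueNow = lastRefill + 200; // 200 ms later in real time

const onServerA = refillAmount(lastRefill, trueNow, refillRate);           // accurate clock
const onFastServerB = refillAmount(lastRefill, trueNow + 300, refillRate); // clock 300 ms ahead
const onSlowServerC = refillAmount(lastRefill, trueNow - 300, refillRate); // clock 300 ms behind

console.log(onServerA, onFastServerB, onSlowServerC); // 2 5 0
```

The error is bounded by the skew multiplied by the refill rate, so it only becomes significant when the rate is high or the clocks are far apart.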


Side-by-Side Comparison

With both algorithms implemented, here is a direct comparison across the dimensions that matter in production:

| Property | Sliding Window | Token Bucket |
| --- | --- | --- |
| Burst handling | Smooth, weighted blend prevents double-spend | Explicit, bucket depth defines burst size |
| Memory per key | O(1), two integers in Redis | O(1), count + last-refill timestamp |
| Fairness | High, no burst timing advantage | Moderate, idle clients accumulate tokens |
| Clock sensitivity | Low, only window boundary matters | High, stale timestamp skews refill math |
| Implementation | MGET + INCR + PEXPIRE via Lua script | HMGET → compute → HMSET via Lua script |
| Best for | Public APIs, auth endpoints, webhook receivers | Streaming, uploads, variable-cost requests |
| Thundering herd | Possible at window boundary | Absent, gradual refill smooths load |
| Retry-After | Conservative, time until the current window ends | Exact, derived from token deficit + rate |

Fairness Deep Dive

Fairness in rate limiting means that no client can gain a systematic advantage by timing their requests cleverly. Sliding Window is inherently fairer because the weighted count calculation means there is no privileged moment to burst: a client who sends 100 requests in the first second of a window is no better off than one who spreads them evenly.

Token Bucket is less fair in this sense: a client who happens to be idle for a full refill cycle gets a full bucket and can immediately burst. For public APIs where clients are adversarial, Sliding Window is the safer choice. For internal service-to-service calls where clients are trusted and bursty behaviour is normal, Token Bucket is more appropriate.
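The idle-then-burst advantage can be quantified. A bucket with capacity C and refill rate r admits at most C + r·T unit-cost requests in any interval of length T, because a full bucket can be drained at once and then refills while the interval runs. The sketch below evaluates that bound; the `maxTokenBucketAdmits` name and the 120-token, 2-tokens-per-second policy are assumptions for illustration.

```typescript
// Upper bound on unit-cost requests a token bucket admits in one interval.
function maxTokenBucketAdmits(capacity: number, refillRatePerSec: number, intervalMs: number): number {
  return Math.floor(capacity + (refillRatePerSec * intervalMs) / 1000);
}

// Nominal policy: 120 requests per minute, i.e. capacity 120 and 2 tokens/second.
const perMinute = maxTokenBucketAdmits(120, 2, 60_000);
const perSecond = maxTokenBucketAdmits(120, 2, 1_000);

console.log(perMinute); // 240: twice the nominal per-minute rate after an idle period
console.log(perSecond); // 122: almost the whole bucket can land within one second
```

If that worst case is too generous for a given endpoint, lowering the capacity while keeping the refill rate tightens the burst without changing the long-run rate.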

When to Choose Which

Choose Sliding Window When:

  • You are protecting a public-facing API against abuse by untrusted clients
  • Consistent, predictable fairness matters more than burst tolerance
  • The endpoint is stateless and all requests have equal cost (login, search, webhook)
  • You want the simplest possible mental model for your limit policies
  • A simple, conservative Retry-After value (the time until the current window ends) is good enough for your clients

Choose Token Bucket When:

  • Clients legitimately need to burst (cold-start, batch job, media streaming)
  • Requests have variable computational cost and you want resource-proportional limits
  • You are rate-limiting internal service calls where clients are trusted
  • You need smooth, gradual refill rather than a hard window reset
  • The rate-limit policy is expressed as tokens per second, a natural fit for capacity planning

Rule of thumb: for public APIs with adversarial clients, use Sliding Window; for internal services or streaming workloads, use Token Bucket. When unsure, start with Sliding Window; it is easier to reason about under load.

NestJS Integration

If you are building in NestJS, the rate limiter fits cleanly into a dedicated module with an Injectable provider, exactly the same pattern as UsersModule with UsersService from the NestJS architecture guide.

The Module

// rate-limit.module.ts
import { Module, Global } from '@nestjs/common';
import { ConfigModule, ConfigService } from '@nestjs/config';
import Redis from 'ioredis';
import { SlidingWindowRateLimiter } from './sliding-window';

@Global()
@Module({
  imports: [ConfigModule],
  providers: [
    {
      provide: 'REDIS_CLIENT',
      inject: [ConfigService],
      useFactory: (config: ConfigService) => new Redis({
        host: config.get('REDIS_HOST', 'localhost'),
        // Environment values arrive as strings, so coerce numeric settings
        port: Number(config.get('REDIS_PORT', 6379)),
      }),
    },
    {
      provide: SlidingWindowRateLimiter,
      inject: ['REDIS_CLIENT', ConfigService],
      useFactory: (redis: Redis, config: ConfigService) =>
        new SlidingWindowRateLimiter(
          redis,
          Number(config.get('RATE_LIMIT_MAX', 100)),
          Number(config.get('RATE_LIMIT_WINDOW_MS', 60_000)),
        ),
    },
  ],
  exports: [SlidingWindowRateLimiter],
})
export class RateLimitModule {}

The Guard

// rate-limit.guard.ts
import {
  Injectable, CanActivate, ExecutionContext, HttpException, HttpStatus,
} from '@nestjs/common';
import { SlidingWindowRateLimiter } from './sliding-window';

@Injectable()
export class RateLimitGuard implements CanActivate {
  constructor(private readonly limiter: SlidingWindowRateLimiter) {}

  async canActivate(ctx: ExecutionContext): Promise<boolean> {
    const req = ctx.switchToHttp().getRequest();
    const res = ctx.switchToHttp().getResponse();
    const result = await this.limiter.check(req.ip);
    res.setHeader('X-RateLimit-Limit', this.limiter.limit);
    res.setHeader('X-RateLimit-Remaining', result.remaining);

    if (!result.allowed) {
      res.setHeader('Retry-After', Math.ceil(result.retryAfterMs / 1000));
      // Throw instead of writing the response here: returning false would make
      // Nest raise a 403 ForbiddenException after the body was already sent.
      throw new HttpException(
        {
          statusCode: HttpStatus.TOO_MANY_REQUESTS,
          message: 'Too Many Requests',
          retryAfter: result.retryAfterMs,
        },
        HttpStatus.TOO_MANY_REQUESTS,
      );
    }
    return true;
  }
}

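Apply it globally in AppModule or per-controller with @UseGuards(RateLimitGuard). Because the guard depends only on check() and limit, swapping Sliding Window for Token Bucket stays confined to the provider definition and the injected type, as long as the replacement exposes the same result shape (the TokenBucketRateLimiter above would need to add limit and remaining).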
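One way to keep that swap small is to have both limiters satisfy a shared contract. The sketch below is an assumption about how that might look, not code from the limiter classes: `RateLimiter`, `LimitDecision`, `CostAwareLimiter`, and `FixedCostLimiter` are hypothetical names, and the fake bucket exists only to show the wiring.

```typescript
// Shared contract (hypothetical): the minimum a guard or middleware needs.
interface LimitDecision {
  allowed: boolean;
  retryAfterMs: number;
  remaining?: number; // not every algorithm reports this
}

interface RateLimiter {
  check(key: string): Promise<LimitDecision>;
}

// Structural type matching a token-bucket-style check(key, cost) method.
interface CostAwareLimiter {
  check(key: string, cost?: number): Promise<LimitDecision>;
}

// Adapter: pins the cost so a cost-aware limiter satisfies RateLimiter.
class FixedCostLimiter implements RateLimiter {
  private readonly inner: CostAwareLimiter;
  private readonly cost: number;

  constructor(inner: CostAwareLimiter, cost = 1) {
    this.inner = inner;
    this.cost = cost;
  }

  check(key: string): Promise<LimitDecision> {
    return this.inner.check(key, this.cost);
  }
}

// Stand-in limiter used only for this demo; it records what it was asked.
const seen: string[] = [];
const fakeBucket: CostAwareLimiter = {
  check(key, cost) {
    seen.push(`${key}:${cost}`);
    return Promise.resolve({ allowed: true, retryAfterMs: 0 });
  },
};

const limiter: RateLimiter = new FixedCostLimiter(fakeBucket, 10);
void limiter.check('user:1');
console.log(seen); // [ 'user:1:10' ]
```

A guard typed against `RateLimiter` would then accept either algorithm, and the choice would be made once, in the module's provider.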

Testing Your Rate Limiter

Rate limiters are notoriously hard to test because their behaviour is time-dependent. The key insight is to inject the clock as a dependency rather than calling Date.now() directly inside the implementation; both classes already accept a clock parameter for exactly this reason.

// deterministic test with injectable clock (Jest-style; redisMock must support EVALSHA)
test('denies the 6th request, then frees capacity as the window slides', async () => {
  let fakeTime = 1_700_000_000_000; // 20 s into a 60-second window
  const limiter = new SlidingWindowRateLimiter(
    redisMock,
    5, // limit of 5
    60_000, // 60-second window
    () => fakeTime,
  );

  // Exhaust the limit
  for (let i = 0; i < 5; i++) {
    const r = await limiter.check('user:1');
    expect(r.allowed).toBe(true);
  }

  // Next request should be denied
  const denied = await limiter.check('user:1');
  expect(denied.allowed).toBe(false);

  // Advance into the next window: the previous count now weighs about 2/3,
  // so the estimate drops to roughly 3.3 and requests are accepted again
  fakeTime += 60_001;
  const allowed = await limiter.check('user:1');
  expect(allowed.allowed).toBe(true);
});
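A mutable `fakeTime` variable works, but a small clock object makes window arithmetic in tests more explicit. The following sketch assumes a hypothetical `FakeClock` helper; its `now` property has the same `() => number` signature as the clock parameter accepted by both limiter classes.

```typescript
class FakeClock {
  private current: number;

  constructor(startMs: number) {
    this.current = startMs;
  }

  // Arrow function so `clock.now` can be passed around without losing `this`.
  now = (): number => this.current;

  advance(ms: number): void {
    if (ms < 0) throw new RangeError('FakeClock only moves forward');
    this.current += ms;
  }

  // Jump to the first millisecond of the next fixed window.
  advanceToNextWindow(windowMs: number): void {
    this.advance(windowMs - (this.current % windowMs));
  }
}

const clock = new FakeClock(1_700_000_000_000);
console.log(clock.now() % 60_000); // 20000: the start time is 20 s into a window

clock.advanceToNextWindow(60_000);
console.log(clock.now() % 60_000); // 0: now exactly on a window boundary
```

Starting a test exactly on a boundary makes the weight of the previous window predictable, which keeps assertions about the weighted estimate simple.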

Conclusion

Rate limiting is not a solved problem you can pick up and drop in from a library without thought. The algorithm you choose shapes the user experience of your API in subtle but important ways: how clients experience bursts, how fair the limits feel, how much memory you consume at scale, and how easy the policy is to reason about under load.

To summarise what we have built and learned:

  • Sliding Window Counter approximates a true rolling window using two Redis integers and a weighted formula, giving smooth, fair, predictable behaviour at O(1) memory cost per key.
  • Token Bucket models explicit burst capacity and smooth refill, making it ideal for variable-cost requests and workloads where clients legitimately need to burst.
  • Both implementations use Lua scripts in Redis to guarantee atomicity: no race conditions, no extra round-trips.
  • Edge cases (Redis downtime, script eviction, clock drift) need explicit handling in production code, not treatment as afterthoughts.
  • The NestJS module pattern keeps the rate limiter behind a clean service interface, so swapping algorithms is a small, local change.

Start with Sliding Window for public APIs. Graduate to Token Bucket when your access patterns demand it. And always instrument your rate limiter: log every 429 response with the key, algorithm, and remaining count so you can see exactly where your limits are being hit and whether they are calibrated correctly.
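As a starting point for that instrumentation, a structured log entry might look like the sketch below. The field names, the `Algorithm` type, and the `buildRateLimitLog` helper are assumptions; adapt them to whatever logger and schema the service already uses.

```typescript
type Algorithm = 'sliding_window' | 'token_bucket';

interface RateLimitLogEntry {
  event: 'rate_limited';
  at: string; // ISO-8601 timestamp
  key: string;
  algorithm: Algorithm;
  remaining: number;
  retryAfterMs: number;
}

// Build one log entry per 429 response; `now` is injectable for tests.
function buildRateLimitLog(
  key: string,
  algorithm: Algorithm,
  remaining: number,
  retryAfterMs: number,
  now: Date = new Date(),
): RateLimitLogEntry {
  return { event: 'rate_limited', at: now.toISOString(), key, algorithm, remaining, retryAfterMs };
}

const entry = buildRateLimitLog('ip:203.0.113.7', 'sliding_window', 0, 12_000, new Date(0));
console.log(JSON.stringify(entry));
// {"event":"rate_limited","at":"1970-01-01T00:00:00.000Z","key":"ip:203.0.113.7","algorithm":"sliding_window","remaining":0,"retryAfterMs":12000}
```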

Bhuvnesh Sharma


Bhuvnesh is a proficient Full-Stack developer with 4+ years of expertise in the MERN stack. He excels in creating sustainable, scalable web applications and RESTful APIs with optimized code. Specializing in dynamic user interfaces and robust server-side applications, he is dedicated to staying current with the latest tech trends. His passion for innovation drives him to deliver high-quality solutions that exceed client expectations.
