Rate limiting is one of those infrastructure primitives that every production API needs, yet most teams copy-paste from a library without understanding the trade-offs underneath. Get it wrong and you are either letting abusers hammer your service or blocking legitimate users at inconvenient moments.
This post tears apart two algorithms, the Sliding Window Counter and the Token Bucket, and builds each one from scratch using Node.js and Redis. Along the way we cover the edge cases that haunt real deployments: the boundary burst problem, clock drift in distributed systems, memory costs at scale, and the subtle fairness differences that determine which algorithm fits your use case.
If you have read the companion NestJS architecture deep-dive, you already know how to structure your service layer with Injectable providers and module boundaries. This post applies the same thinking to a cross-cutting concern: the rate-limiter lives in a dedicated module, exposes a clean service interface, and slots into middleware with zero coupling to business logic.
Why Redis? Both algorithms need atomic, shared state across multiple API servers. Redis gives us sub-millisecond latency, atomic Lua scripting, built-in key expiry, and a data model that maps perfectly onto both implementations.
Prerequisites
- Node.js 20 LTS or later
- Redis 7.x running locally (docker run -p 6379:6379 redis:7-alpine)
- ioredis (npm install ioredis)
- Basic TypeScript knowledge (decorators, generics, async/await)
- Familiarity with HTTP 429 Too Many Requests and Retry-After headers
The Problem Space
Before choosing an algorithm, we need to be precise about what we are measuring and enforcing.
What Is a Rate Limit?
A rate limit is a policy of the form: “at most N requests per unit of time T for a given identity I.” The identity can be an API key, an IP address, a user ID, or a composite like user + endpoint.
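As a rough sketch of how a composite identity might be flattened into a single rate-limit key: the `Identity` shape and the `identityKey` helper below are illustrative names, not part of any library, and the order of preference (API key, then user ID, then IP) is an assumption you may want to change.

```typescript
// Hypothetical identity shape; any field may be missing for anonymous traffic.
interface Identity {
  apiKey?: string;
  userId?: string;
  ip?: string;
  endpoint?: string; // set this to limit per identity + endpoint
}

// Build a stable key: prefer the strongest identifier available,
// then optionally scope it to one endpoint.
function identityKey(id: Identity): string {
  const who =
    id.apiKey !== undefined ? `key:${id.apiKey}` :
    id.userId !== undefined ? `user:${id.userId}` :
    `ip:${id.ip ?? 'unknown'}`;
  return id.endpoint !== undefined ? `${who}:${id.endpoint}` : who;
}
```

Whatever scheme you pick, the key has to be identical on every API server, otherwise the shared Redis counters split into several independent limits.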
The unit of time T is deceptively important. “100 req/min” sounds simple, but it can mean three very different things depending on the algorithm:
- At most 100 in each fixed 60-second calendar window (fixed window)
- At most 100 in any rolling 60-second period ending right now (sliding window)
- Tokens arrive at about 1.67 per second (100 / 60), and you can accumulate up to 100 (token bucket)
Each interpretation yields a different algorithm with different burst characteristics, memory costs, and implementation complexity. Let us build both proper algorithms and see the differences in action.
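To make the third interpretation concrete, here is a small sketch that converts an "N requests per window" policy into token bucket parameters. The `toTokenBucketParams` helper and the `TokenBucketParams` type are hypothetical names used only for illustration, and setting capacity equal to N is one reasonable choice, not the only one.

```typescript
interface TokenBucketParams {
  capacity: number;   // maximum tokens the bucket can hold (burst size)
  refillRate: number; // tokens added per second
}

// Translate "limit requests per windowMs" into token bucket parameters.
function toTokenBucketParams(limit: number, windowMs: number): TokenBucketParams {
  if (limit <= 0 || windowMs <= 0) {
    throw new RangeError('limit and windowMs must be positive');
  }
  return { capacity: limit, refillRate: limit / (windowMs / 1000) };
}

const perMinute = toTokenBucketParams(100, 60_000);
console.log(perMinute.refillRate.toFixed(2)); // prints 1.67
```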
The Fixed Window Baseline (and Why It Fails)
Before the good algorithms, let us understand the naive approach, Fixed Window, so we know exactly what problem it introduces.
// Fixed Window, naive approach (assumes a connected ioredis client named `redis`)
async function fixedWindowCheck(key: string, limit: number, windowMs: number): Promise<boolean> {
  // Every request in the same calendar window shares one counter key
  const windowKey = `ratelimit:${key}:${Math.floor(Date.now() / windowMs)}`;
  const count = await redis.incr(windowKey);
  // First hit in this window: set an expiry so stale counters clean themselves up
  if (count === 1) await redis.pexpire(windowKey, windowMs);
  return count <= limit;
}
This works until you think about the boundary. A user can make 100 requests at 00:59 and another 100 at 01:01, getting 200 requests through in a 2-second span while technically respecting the limit. This is the “double-spend” boundary burst, and it is the primary motivation for every algorithm that follows.
Key insight: The fundamental failure of Fixed Window is that it counts in discrete epochs rather than across a continuous sliding time horizon. Both Sliding Window and Token Bucket solve this in different ways.
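The boundary burst is easy to reproduce without Redis. The sketch below swaps the Redis counter for an in-memory `Map` and passes timestamps in by hand; `makeFixedWindow` and `admitted` are illustrative helpers, not part of the implementation in this post.

```typescript
// In-memory stand-in for the Redis counters: one count per window index.
function makeFixedWindow(limit: number, windowMs: number): (nowMs: number) => boolean {
  const counts = new Map<number, number>();
  return (nowMs: number): boolean => {
    const windowIndex = Math.floor(nowMs / windowMs);
    const count = (counts.get(windowIndex) ?? 0) + 1;
    counts.set(windowIndex, count);
    return count <= limit;
  };
}

// Count how many of `n` requests sent at the same instant are admitted.
function admitted(check: (nowMs: number) => boolean, nowMs: number, n: number): number {
  let ok = 0;
  for (let i = 0; i < n; i++) {
    if (check(nowMs)) ok++;
  }
  return ok;
}

const check = makeFixedWindow(100, 60_000);
const beforeBoundary = admitted(check, 59_000, 150); // one second before the boundary
const afterBoundary = admitted(check, 61_000, 150);  // one second after it
console.log(beforeBoundary + afterBoundary); // prints 200
```

Both bursts stay within their own window's limit of 100, yet 200 requests are admitted within two seconds.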
Sliding Window Counter
The Intuition
The Sliding Window Counter solves the boundary burst by blending two fixed windows: the current window and the previous one. Instead of a hard reset at the window boundary, the algorithm computes a weighted count:
// Conceptual formula
const weight = 1 - (elapsedInCurrentWindow / windowDurationMs);
const estimate = (previousCount * weight) + currentCount;
At the start of a new window the previous count contributes its full weight, gradually decaying to zero as the current window ages. This gives a smooth, continuous approximation of a true sliding window at the cost of only two integers per key in Redis.
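A worked example may help here. The function below is only a restatement of the weighted formula in TypeScript; `slidingEstimate` is a hypothetical name and the numbers are made up for illustration.

```typescript
// Weighted estimate of requests in the rolling window ending "now".
function slidingEstimate(
  previousCount: number,
  currentCount: number,
  elapsedInCurrentWindow: number, // ms since the current window started
  windowDurationMs: number,
): number {
  const weight = 1 - elapsedInCurrentWindow / windowDurationMs;
  return previousCount * weight + currentCount;
}

// 15 s into a 60 s window, with 80 requests in the previous window and 30 so far:
// weight = 1 - 15/60 = 0.75, estimate = 80 * 0.75 + 30 = 90
console.log(slidingEstimate(80, 30, 15_000, 60_000)); // prints 90
```

With a limit of 100 this client still has room for roughly 10 more requests, and that room grows as the previous window's contribution decays.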
Redis Data Model
Each rate-limit key maps to two Redis string keys: one for the current window counter and one for the previous. We use a Lua script to make the entire read-modify-write atomic. Without atomicity a race between two API servers could allow extra requests to slip through.
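For orientation, the key naming can be sketched in TypeScript as follows. This assumes the counters are named by appending the window start timestamp to the base key, which is how the script in this post derives them; `windowKeys` itself is an illustrative helper, useful mainly when inspecting counters by hand.

```typescript
// Derive the current and previous window counter keys for a base key.
function windowKeys(
  key: string,
  windowMs: number,
  nowMs: number,
): { currKey: string; prevKey: string } {
  // Align "now" down to the start of its window
  const windowStart = Math.floor(nowMs / windowMs) * windowMs;
  return {
    currKey: `${key}:${windowStart}`,
    prevKey: `${key}:${windowStart - windowMs}`,
  };
}

const exampleKeys = windowKeys('sw:user:1', 60_000, 1_700_000_000_000);
console.log(exampleKeys.currKey); // prints sw:user:1:1699999980000
console.log(exampleKeys.prevKey); // prints sw:user:1:1699999920000
```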
The Lua Script
-- sliding_window.lua
-- Note: the two counter keys are derived inside the script instead of being
-- passed via KEYS. That works on a single Redis instance; Redis Cluster
-- requires every key a script touches to be declared in KEYS.
local key = KEYS[1]
local limit = tonumber(ARGV[1])
local windowMs = tonumber(ARGV[2])
local now = tonumber(ARGV[3]) -- unix ms from Node

local windowStart = math.floor(now / windowMs) * windowMs
local prevStart = windowStart - windowMs
local currKey = key .. ':' .. windowStart
local prevKey = key .. ':' .. prevStart

-- Read both counters atomically
local counts = redis.call('MGET', currKey, prevKey)
local curr = tonumber(counts[1]) or 0
local prev = tonumber(counts[2]) or 0

-- Weighted estimate
local elapsed = now - windowStart
local weight = 1 - (elapsed / windowMs)
local estimate = prev * weight + curr

-- Reply shape: {allowed, retryAfterMs, remaining}
if estimate >= limit then
  -- The retry hint is the time until the current window ends: an approximation, not an exact figure
  return {0, math.ceil(windowStart + windowMs - now), 0}
end

redis.call('INCR', currKey)
redis.call('PEXPIRE', currKey, windowMs * 2)
-- Budget left after counting this request, floored to a whole number
local remaining = math.max(0, math.floor(limit - estimate - 1))
return {1, 0, remaining}
Node.js Implementation
// sliding-window.ts
import Redis from 'ioredis';
import fs from 'fs';
import path from 'path';

export interface RateLimitResult {
  allowed: boolean;
  remaining: number;
  retryAfterMs: number;
}

export class SlidingWindowRateLimiter {
  private sha: string | null = null;

  constructor(
    private redis: Redis,
    public limit: number,
    private windowMs: number,
    private clock: () => number = Date.now, // injectable for testing
  ) {}

  async load(): Promise<void> {
    const script = fs.readFileSync(
      path.join(__dirname, 'sliding_window.lua'), 'utf8'
    );
    this.sha = (await this.redis.script('LOAD', script)) as string;
  }

  async check(key: string, retried = false): Promise<RateLimitResult> {
    const now = this.clock();
    try {
      // Loading inside the try block means a Redis outage here also fails open
      if (!this.sha) await this.load();
      const result = await this.redis.evalsha(
        this.sha!, 1, `sw:${key}`,
        String(this.limit), String(this.windowMs), String(now),
      ) as [number, number, number];
      return {
        allowed: result[0] === 1,
        retryAfterMs: result[1],
        remaining: result[2], // computed by the script from the weighted estimate
      };
    } catch (err: any) {
      // Script cache was flushed (e.g. Redis restart): reload and retry once
      if (!retried && err.message?.includes('NOSCRIPT')) {
        this.sha = null;
        return this.check(key, true);
      }
      // Fail open, Redis down should not take down the API
      return { allowed: true, remaining: this.limit, retryAfterMs: 0 };
    }
  }
}
Express Middleware
import { Request, Response, NextFunction } from 'express';
import { SlidingWindowRateLimiter } from './sliding-window';

export function slidingWindowMiddleware(
  limiter: SlidingWindowRateLimiter,
  // req.ip may be undefined in recent Express typings, so fall back to a shared key
  keyFn = (req: Request): string => req.ip ?? 'unknown',
) {
  return async (req: Request, res: Response, next: NextFunction) => {
    const result = await limiter.check(keyFn(req));
    res.setHeader('X-RateLimit-Limit', limiter.limit);
    res.setHeader('X-RateLimit-Remaining', result.remaining);
    if (!result.allowed) {
      // Retry-After is expressed in whole seconds; the JSON body keeps milliseconds
      res.setHeader('Retry-After', Math.ceil(result.retryAfterMs / 1000));
      res.status(429).json({ error: 'Too Many Requests', retryAfter: result.retryAfterMs });
      return;
    }
    next();
  };
}
Token Bucket
The Intuition
The Token Bucket algorithm models a physical bucket. The bucket has a maximum capacity (burst limit). Tokens are added at a fixed rate (refill rate). Each request consumes one or more tokens. If the bucket is empty, the request is rejected.
Unlike the sliding window, which counts requests against a moving time window, the token bucket tracks a balance per client: it remembers how many tokens remain and when the last refill happened. This makes it excellent for use cases that benefit from controlled bursting.
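Before moving the logic into Redis, the bucket can be sketched as a single-process class. This is a minimal in-memory illustration with an injectable clock; `InMemoryTokenBucket` and `tryConsume` are hypothetical names, and the class is not safe to share across servers.

```typescript
class InMemoryTokenBucket {
  private readonly capacity: number;
  private readonly refillRate: number; // tokens per second
  private readonly clock: () => number;
  private tokens: number;
  private lastRefillMs: number;

  constructor(capacity: number, refillRate: number, clock: () => number = Date.now) {
    this.capacity = capacity;
    this.refillRate = refillRate;
    this.clock = clock;
    this.tokens = capacity; // a new client starts with a full bucket
    this.lastRefillMs = clock();
  }

  // Returns true and deducts `cost` tokens if the bucket holds enough.
  tryConsume(cost: number = 1): boolean {
    const now = this.clock();
    // Lazy refill: credit the tokens earned since the last call, capped at capacity
    const elapsedSec = Math.max(0, now - this.lastRefillMs) / 1000;
    this.tokens = Math.min(this.capacity, this.tokens + elapsedSec * this.refillRate);
    this.lastRefillMs = now;
    if (this.tokens < cost) return false;
    this.tokens -= cost;
    return true;
  }
}
```

With a capacity of 3 and a refill rate of 1 token per second, three back-to-back calls succeed, the fourth fails, and two more succeed after a two-second pause.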
The Lua Script
-- token_bucket.lua
local key = KEYS[1]
local capacity = tonumber(ARGV[1]) -- max tokens (burst size)
local refillRate = tonumber(ARGV[2]) -- tokens per second
local now = tonumber(ARGV[3]) -- unix ms
local cost = tonumber(ARGV[4]) -- tokens this request costs

local data = redis.call('HMGET', key, 'tokens', 'lastRefill')
local tokens = tonumber(data[1]) or capacity -- new clients start with a full bucket
local lastRefill = tonumber(data[2]) or now

-- Lazy refill: compute tokens added since last check
local elapsed = math.max(0, now - lastRefill) / 1000 -- seconds
tokens = math.min(capacity, tokens + elapsed * refillRate)

-- Keep the key for one full refill cycle plus a 5-second margin
local ttlMs = math.ceil(capacity / refillRate * 1000) + 5000

if tokens < cost then
  local deficit = cost - tokens
  local waitMs = math.ceil((deficit / refillRate) * 1000)
  -- HSET replaces HMSET, which has been deprecated since Redis 4.0
  redis.call('HSET', key, 'tokens', tokens, 'lastRefill', now)
  redis.call('PEXPIRE', key, ttlMs)
  return {0, waitMs}
end

tokens = tokens - cost
redis.call('HSET', key, 'tokens', tokens, 'lastRefill', now)
redis.call('PEXPIRE', key, ttlMs)
return {1, 0}
Node.js Implementation
// token-bucket.ts
import Redis from 'ioredis';
import fs from 'fs';
import path from 'path';

export class TokenBucketRateLimiter {
  private sha: string | null = null;

  constructor(
    private redis: Redis,
    private capacity: number, // max tokens (burst size)
    private refillRate: number, // tokens per second
    private clock: () => number = Date.now,
  ) {}

  async load(): Promise<void> {
    const script = fs.readFileSync(
      path.join(__dirname, 'token_bucket.lua'), 'utf8'
    );
    this.sha = (await this.redis.script('LOAD', script)) as string;
  }

  async check(
    key: string,
    cost: number = 1,
    retried = false,
  ): Promise<{ allowed: boolean; retryAfterMs: number }> {
    const now = this.clock();
    try {
      // Loading inside the try block means a Redis outage here also fails open
      if (!this.sha) await this.load();
      const result = await this.redis.evalsha(
        this.sha!, 1, `tb:${key}`,
        String(this.capacity), String(this.refillRate),
        String(now), String(cost),
      ) as [number, number];
      return { allowed: result[0] === 1, retryAfterMs: result[1] };
    } catch (err: any) {
      // Without this branch a flushed script cache would fail open on every request
      if (!retried && err.message?.includes('NOSCRIPT')) {
        this.sha = null;
        return this.check(key, cost, true); // reload and retry once
      }
      return { allowed: true, retryAfterMs: 0 }; // fail open
    }
  }
}
Variable-Cost Requests
One key advantage of Token Bucket over Sliding Window is the cost parameter. A bulk export endpoint might cost 10 tokens; a simple read costs 1. This lets you enforce resource-fair rate limits rather than request-count limits:
// Assumes an Express `router` and a TokenBucketRateLimiter instance named `tokenBucket`
router.get('/export', async (req, res) => {
  const result = await tokenBucket.check(req.ip ?? 'unknown', 10); // costs 10 tokens
  if (!result.allowed) {
    return res.status(429).json({ retryAfter: result.retryAfterMs });
  }
  // ... expensive export logic
});
Clock Drift in Distributed Clusters
Token Bucket is sensitive to the timestamp passed in: if two API servers have divergent clocks, the computed elapsed time, and therefore the refill amount, will differ. Use Redis TIME to get the authoritative server timestamp rather than Node Date.now():
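-- Use Redis server clock inside the Lua script
local serverTime = redis.call('TIME') -- {seconds, microseconds}
local now = tonumber(serverTime[1]) * 1000 + math.floor(tonumber(serverTime[2]) / 1000)
Trade-off: TIME runs on the Redis server inside the script, so it adds no extra network round-trip. The cost is elsewhere: the script then ignores the timestamp passed from Node, so the injectable clock no longer controls time in tests. Redis 7 replicates the effects of a script, so calling TIME before a write is allowed; Redis versions before 5.0 required redis.replicate_commands() first. With NTP-synchronised servers, drift is usually small relative to the refill interval and its practical impact is negligible, so only add this in clusters with known NTP issues.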
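To get a feel for how much clock skew matters, the arithmetic can be sketched directly. `tokensFromSkew` is a hypothetical helper; it assumes the lazy refill formula used in this post (elapsed seconds times refill rate, capped at capacity, with negative elapsed time clamped to zero).

```typescript
// Extra tokens credited when a request is timestamped by a server whose clock
// runs `skewMs` ahead of the server that last wrote `lastRefill`.
function tokensFromSkew(skewMs: number, refillRate: number, capacity: number): number {
  // A clock that runs behind produces a negative elapsed time, which is clamped to 0
  const credited = (Math.max(0, skewMs) / 1000) * refillRate;
  return Math.min(capacity, credited);
}

// 250 ms of skew at 100 tokens per second hands out 25 unearned tokens
console.log(tokensFromSkew(250, 100, 1000)); // prints 25
// The same skew at 2 tokens per second is half a token
console.log(tokensFromSkew(250, 2, 100)); // prints 0.5
```

The higher the refill rate, the more a given amount of skew distorts the balance, which is why low-rate limits rarely need the server clock.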
Side-by-Side Comparison
With both algorithms implemented, here is a direct comparison across the dimensions that matter in production:
| Property | Sliding Window | Token Bucket |
|---|---|---|
| Burst handling | Smooth, weighted blend prevents double-spend | Explicit, bucket depth defines burst size |
| Memory per key | O(1), two integers in Redis | O(1), count + last-refill timestamp |
| Fairness | High, no burst timing advantage | Moderate, idle clients accumulate tokens |
| Clock sensitivity | Low, only window boundary matters | High, stale timestamp skews refill math |
| Implementation | MGET + INCR + PEXPIRE via Lua script | HMGET → compute → write back via Lua script |
| Best for | Public APIs, auth endpoints, webhook receivers | Streaming, uploads, variable-cost requests |
| Thundering herd | Possible at window boundary | Absent, gradual refill smooths load |
| Retry-After | Approximate, time until the current window ends | Exact, derived from token deficit + rate |
Fairness Deep Dive
Fairness in rate limiting means that no client can gain a systematic advantage by timing their requests cleverly. Sliding Window is inherently fairer because the weighted count means there is no privileged moment to burst: a client who sends 100 requests in the first second of a window is no better off than one who spreads them evenly.
Token Bucket is less fair in this sense: a client who happens to be idle for a full fill cycle gets a full bucket and can immediately burst. For public APIs where clients are adversarial, Sliding Window is the safer choice. For internal service-to-service calls where clients are trusted and bursty behaviour is normal, Token Bucket is more appropriate.
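The size of that idle-client advantage can be bounded with simple arithmetic. The sketch below assumes every request costs one token and the client starts with a full bucket; `maxTokenBucketBurst` is an illustrative name.

```typescript
// Upper bound on requests a token bucket admits in a span of `spanSeconds`,
// starting from a full bucket: the stored tokens plus everything refilled meanwhile.
function maxTokenBucketBurst(capacity: number, refillRate: number, spanSeconds: number): number {
  return Math.floor(capacity + refillRate * spanSeconds);
}

// Capacity 100, refilling at 2 tokens per second:
console.log(maxTokenBucketBurst(100, 2, 1));  // prints 102
console.log(maxTokenBucketBurst(100, 2, 30)); // prints 160
```

If that bound is higher than the downstream service can absorb, lower the capacity rather than the refill rate: capacity controls the burst, the refill rate controls the sustained throughput.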
When to Choose Which
Choose Sliding Window When:
- You are protecting a public-facing API against abuse by untrusted clients
- Consistent, predictable fairness matters more than burst tolerance
- The endpoint is stateless and all requests have equal cost (login, search, webhook)
- You want the simplest possible mental model for your limit policies
- Clients only need a coarse Retry-After hint (the script here reports the time until the current window ends, not an exact figure)
Choose Token Bucket When:
- Clients legitimately need to burst (cold-start, batch job, media streaming)
- Requests have variable computational cost and you want resource-proportional limits
- You are rate-limiting internal service calls where clients are trusted
- You need smooth, gradual refill rather than a hard window reset
- The rate-limit policy is expressed as tokens per second, a natural fit for capacity planning
Rule of thumb: for public APIs with adversarial clients, use Sliding Window. For internal services or streaming workloads, use Token Bucket. When unsure, start with Sliding Window; it is easier to reason about under load.
NestJS Integration
If you are building in NestJS, the rate limiter fits cleanly into a dedicated module with an Injectable provider, exactly the same pattern as UsersModule with UsersService from the NestJS architecture guide.
The Module
// rate-limit.module.ts
import { Module, Global } from '@nestjs/common';
import { ConfigModule, ConfigService } from '@nestjs/config';
import Redis from 'ioredis';
import { SlidingWindowRateLimiter } from './sliding-window';

@Global()
@Module({
  imports: [ConfigModule],
  providers: [
    {
      // Shared Redis connection, injectable by token
      provide: 'REDIS_CLIENT',
      inject: [ConfigService],
      useFactory: (config: ConfigService) => new Redis({
        host: config.get('REDIS_HOST', 'localhost'),
        port: config.get<number>('REDIS_PORT', 6379),
      }),
    },
    {
      // The limiter is built by a factory so limits come from configuration
      provide: SlidingWindowRateLimiter,
      inject: ['REDIS_CLIENT', ConfigService],
      useFactory: (redis: Redis, config: ConfigService) =>
        new SlidingWindowRateLimiter(
          redis,
          config.get<number>('RATE_LIMIT_MAX', 100),
          config.get<number>('RATE_LIMIT_WINDOW_MS', 60_000),
        ),
    },
  ],
  exports: [SlidingWindowRateLimiter],
})
export class RateLimitModule {}
The Guard
// rate-limit.guard.ts
import {
  Injectable, CanActivate, ExecutionContext, HttpException, HttpStatus,
} from '@nestjs/common';
import { SlidingWindowRateLimiter } from './sliding-window';

@Injectable()
export class RateLimitGuard implements CanActivate {
  constructor(private readonly limiter: SlidingWindowRateLimiter) {}

  async canActivate(ctx: ExecutionContext): Promise<boolean> {
    const req = ctx.switchToHttp().getRequest();
    const res = ctx.switchToHttp().getResponse();
    const result = await this.limiter.check(req.ip);
    res.setHeader('X-RateLimit-Limit', this.limiter.limit);
    res.setHeader('X-RateLimit-Remaining', result.remaining);
    if (!result.allowed) {
      res.setHeader('Retry-After', Math.ceil(result.retryAfterMs / 1000));
      // Returning false makes Nest respond with 403 Forbidden, so throw a 429 instead
      throw new HttpException(
        {
          statusCode: 429,
          message: 'Too Many Requests',
          retryAfter: result.retryAfterMs,
        },
        HttpStatus.TOO_MANY_REQUESTS,
      );
    }
    return true;
  }
}
Apply it globally in AppModule or per-controller with @UseGuards(RateLimitGuard). Swapping Sliding Window for Token Bucket is a small, local change as long as both limiters expose the same check() contract. As written, TokenBucketRateLimiter returns no remaining count and has no public limit, so it needs a thin adapter before it can stand in for SlidingWindowRateLimiter.
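One way to keep the guard independent of the algorithm is a small shared interface plus an adapter. Everything named below (`RateLimiter`, `LimiterDecision`, `TokenBucketLike`, `TokenBucketAdapter`) is hypothetical and shown only as a sketch; `remaining` is optional here because the token bucket check in this post does not report it.

```typescript
interface LimiterDecision {
  allowed: boolean;
  retryAfterMs: number;
  remaining?: number; // omitted when the limiter cannot report it
}

// The contract a guard or middleware would depend on.
interface RateLimiter {
  readonly limit: number;
  check(key: string): Promise<LimiterDecision>;
}

// Structural type matching the check() method of the token bucket class in this post.
interface TokenBucketLike {
  check(key: string, cost?: number): Promise<{ allowed: boolean; retryAfterMs: number }>;
}

class TokenBucketAdapter implements RateLimiter {
  readonly limit: number;
  private readonly bucket: TokenBucketLike;

  constructor(bucket: TokenBucketLike, capacity: number) {
    this.bucket = bucket;
    this.limit = capacity; // report the burst capacity as the advertised limit
  }

  async check(key: string): Promise<LimiterDecision> {
    const r = await this.bucket.check(key, 1);
    // A denied request means no usable tokens are left; otherwise the count is unknown
    return r.allowed
      ? { allowed: true, retryAfterMs: 0 }
      : { allowed: false, retryAfterMs: r.retryAfterMs, remaining: 0 };
  }
}
```

A guard typed against `RateLimiter` would then only set the X-RateLimit-Remaining header when `remaining` is defined.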
Testing Your Rate Limiter
Rate limiters are notoriously hard to test because their behaviour is time-dependent. The key insight is to inject the clock as a dependency rather than calling Date.now() directly inside the implementation. Both classes already accept a clock parameter for exactly this reason.
// deterministic test with injectable clock
// `redisMock` must be able to execute the Lua script, e.g. a disposable Redis instance
it('denies the sixth request and recovers in the next window', async () => {
  let fakeTime = 1_700_000_000_000;
  const limiter = new SlidingWindowRateLimiter(
    redisMock,
    5, // limit of 5
    60_000, // 60-second window
    () => fakeTime,
  );
  // Exhaust the limit
  for (let i = 0; i < 5; i++) {
    const r = await limiter.check('user:1');
    expect(r.allowed).toBe(true);
  }
  // Next request should be denied
  const denied = await limiter.check('user:1');
  expect(denied.allowed).toBe(false);
  // Advance into the next window. This is not a full reset: the previous window
  // still contributes 5 × weight (about 3.3 here), which is below the limit of 5.
  fakeTime += 60_001;
  const allowed = await limiter.check('user:1');
  expect(allowed.allowed).toBe(true);
});
Conclusion
Rate limiting is not a solved problem you can pick up and drop in from a library without thought. The algorithm you choose shapes the user experience of your API in subtle but important ways: how clients experience bursts, how fair the limits feel, how much memory you consume at scale, and how easy the policy is to reason about under load.
To summarise what we have built and learned:
- Sliding Window Counter approximates a true rolling window using two Redis integers and a weighted formula, giving smooth, fair, predictable behaviour at O(1) memory cost per key.
- Token Bucket models explicit burst capacity and smooth refill, making it ideal for variable-cost requests and workloads where clients legitimately need to burst.
- Both implementations use Lua scripts in Redis to guarantee atomicity: no race conditions, no extra round-trips.
- Edge cases (Redis downtime, script eviction, clock drift) need explicit handling in production code; they are not afterthoughts.
- The NestJS module pattern keeps swapping algorithms a small, local change; keep your rate limiter behind a clean service interface.
Start with Sliding Window for public APIs. Graduate to Token Bucket when your access patterns demand it. And always instrument your rate limiter: log every 429 response with the key, algorithm, and remaining count so you can see exactly where your limits are being hit and whether they are calibrated correctly.
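As a minimal sketch of that instrumentation, the helper below builds one structured log entry per rejected request. The field names and the `rateLimitLogEntry` function are assumptions for illustration; adapt them to whatever logger and schema you already use.

```typescript
type Algorithm = 'sliding-window' | 'token-bucket';

interface RateLimitLogEntry {
  type: 'rate_limited';
  key: string;
  algorithm: Algorithm;
  remaining: number;
  retryAfterMs: number;
  at: string; // ISO 8601 timestamp
}

// Build the entry; the caller decides where to send it.
function rateLimitLogEntry(
  key: string,
  algorithm: Algorithm,
  remaining: number,
  retryAfterMs: number,
  now: Date = new Date(),
): RateLimitLogEntry {
  return { type: 'rate_limited', key, algorithm, remaining, retryAfterMs, at: now.toISOString() };
}

// Example: emit one JSON line per 429 response
console.warn(JSON.stringify(rateLimitLogEntry('user:1', 'sliding-window', 0, 1500)));
```

Counting these entries per key and per endpoint shows quickly whether a limit is catching abuse or throttling ordinary users.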








