How to build a rate limiter
Rate limiter controls the large number of requests to prevent DDOS attacks which can result in the service outage. Commonly, it is configured as allowing n number of requests in a small duration (e.g. second). Multiple approaches can be adopted for the implementation of the rate limiters.
Thinking Process:
- Different algorithms for the rate limiting and their pros and cons
- Use cases of rate limiting
- Where to implement rate limiting in the application
Architecture
Let’s discuss various approaches for building the rate limiter:
Approach 1:
- Using a queue for each user (Leaky bucket)
For each user, we have a queue, in which new requests would be appended. If the queue size reaches its limits, the new requests would be started dropping. This approach helps in traffic reshaping.
Drawback: The new requests would be dropped if the queue is swamped with the old requests.
- Token bucket : Tokens are created at a particular time interval (e.g. 1 minute) and each incoming request would consume a token. The requests can be started dropping after this all the tokens are used.
How to implement a token bucket:
This counter could be implemented as a key in Redis. For each user, create a unique key using its user id in Redis and add an expiry of the key for a minute. Every time a request comes to serve, increment the value of the counter with the INCR operation. It would add an EXPIRY with EXPIRE operation to delete this key after few seconds. Both operations would be set by using MULTI and EXEC.
- Using a sliding window : It uses a weighted sum of the previous time buckets and current time bucket. This would be helpful in dealing with burst of traffic.
Sample implementation:
Value = 0.2 (last -2 ) + 0.7 (last -1) + last
We would fetch the tokens from the last 2 time buckets and add the remaining tokens with the weight.
If this value is greater than n, we would block the incoming request. The benefit of sliding window is that it can smoothen the burst of traffic.
Load shedding with rate limiter
Load shedding is the process of dropping low priority requests to make room for high priority requests. To reserve the computational resources for critical API calls like create/update, rate limiter can drop the low priority API calls e.g Get. This is very useful in the systems of very high request rate.
Where to implement rate limiter
Rate limiters can be implemented as a middleware in the server application. Middlewares can be considered as the low level plugin system, allowing you to implement hooks that get executed in the request/response process.
References