Rate Limiting Explained: Algorithms, Use Cases, Best Practices

Anastazija Spasojevic
Published: April 20, 2026

As workloads grow, rising traffic congests infrastructure and slows operations across organizations and industries. To prevent overload and maintain business continuity, organizations implement rate limiting.

In modern applications, rate limiting plays a key role in both security and reliability. By preventing excessive or abusive traffic, organizations reduce the risk of outages, mitigate automated attacks, and deliver a more stable experience for all users.

This article explains the features and benefits of rate limiting, as well as the most common uses of this method.


What Is Rate Limiting?

Rate limiting is a mechanism that controls the number of requests a client can make to a system within a defined time period. It is typically enforced at the application, API gateway, or network level, where incoming requests are measured against predefined thresholds. When the number of requests exceeds the allowed limit, the system delays, rejects, or throttles additional requests to prevent resource exhaustion and maintain system stability.

By regulating request flow, rate limiting helps systems remain responsive, reduces the likelihood of service degradation, and supports predictable scaling in distributed environments.

Aside from rate limiting, implementing an intrusion prevention system (IPS) helps limit malicious traffic through automated responses and real-time threat detection.

How Does Rate Limiting Work?


Rate limiting works by measuring how many requests a client sends over a set period and deciding whether each new request should be allowed, delayed, or blocked. Although the exact method varies by system, the process generally follows a predictable sequence that controls traffic without interrupting normal use (see the sketch after this list):

  1. A client sends a request to the system. The process begins when a user, application, or device attempts to access an API, website, or service. At this point, the system captures the request and prepares to evaluate whether to process it immediately.
  2. The system identifies the client. Before applying any limit, the system determines who is making the request. It may use an IP address, API key, user account, session ID, or another identifier so it can track requests for that specific source instead of treating all traffic the same.
  3. The request is matched to a rate limit rule. Once the client is identified, the system checks which rule applies. For example, one API endpoint may allow 100 requests per minute, while another may use a stricter limit. This step ensures the request is evaluated against the correct threshold for that service or user type.
  4. The system checks the current request count. Next, it looks at how many requests that client has already made within the relevant time window. Depending on the rate limiting method, the system may count requests in a fixed window, a rolling window, or by using tokens or quotas that refill over time.
  5. The system decides whether to allow the request. If the client is still within the allowed limit, the request moves forward. This keeps legitimate traffic flowing while making sure the client does not exceed the defined usage threshold.
  6. The system restricts excess requests. If the client has gone over the limit, the system takes action. It may reject the request with an error response, delay it, or temporarily slow the client down. This prevents excessive traffic from overwhelming the service or degrading performance for others.
  7. The limit resets or replenishes over time. After the defined time period passes, or as tokens are gradually restored, the client can send requests again within the policy. This final step keeps rate limiting dynamic, allowing normal access to resume while still controlling ongoing traffic patterns.
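To make this sequence concrete, here is a minimal single-process Python sketch. The endpoints, limits, and the handle_request helper are illustrative assumptions rather than any particular framework's API, and a production system would keep this state in a shared store instead of process memory.

```python
import time
from collections import defaultdict

# Hypothetical per-endpoint rules: (max requests, window length in seconds).
RULES = {
    "/api/search": (100, 60),
    "/login": (5, 60),
}
DEFAULT_RULE = (1000, 60)

# Per-(client, endpoint) state: (request count, window start time).
counters = defaultdict(lambda: (0, 0.0))

def handle_request(client_id: str, endpoint: str) -> bool:
    """Return True if the request is allowed, False if it is rejected."""
    limit, window = RULES.get(endpoint, DEFAULT_RULE)  # step 3: match a rule
    key = (client_id, endpoint)                        # step 2: identify the client
    count, start = counters[key]
    now = time.time()
    if now - start >= window:                          # step 7: the window has reset
        count, start = 0, now
    if count >= limit:                                 # step 6: restrict excess requests
        return False
    counters[key] = (count + 1, start)                 # steps 4-5: count and allow
    return True

# Example: the sixth login attempt within a minute is rejected.
print([handle_request("203.0.113.7", "/login") for _ in range(6)])
# [True, True, True, True, True, False]
```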

By following this sequence, rate limiting controls request traffic in real time, keeping systems stable, responsive, and fair for all users.

Rate Limiting vs. API Throttling

Let’s go through the differences between rate limiting and API throttling to better understand how each limits traffic:

| Aspect | Rate Limiting | API Throttling |
|---|---|---|
| Definition | Controls the number of requests a client can make within a fixed time window. | Dynamically regulates the request rate, often slowing down or queuing requests instead of outright blocking them. |
| Primary Goal | Enforce strict usage limits and prevent abuse or overload. | Smooth traffic spikes and maintain system stability under varying load. |
| Behavior When Limit Is Reached | Rejects additional requests, typically with an error (e.g., HTTP 429). | Delays or slows requests, allowing them to be processed gradually instead of rejecting them immediately. |
| Enforcement Style | Static and rule-based, with predefined thresholds. | Adaptive or policy-driven, often reacting to real-time system conditions. |
| User Impact | Can interrupt user activity if limits are exceeded. | Provides a more gradual degradation of service, reducing abrupt failures. |
| Use Case Focus | Protecting APIs from abuse, enforcing fair usage, and controlling quotas. | Managing traffic bursts, preventing system overload, and optimizing performance under stress. |
| Implementation Complexity | Generally simpler to implement and configure. | More complex, as it may involve queues, prioritization, or dynamic adjustments. |

Client-Side vs. Server-Side Rate Limiting

Now, let’s go through the differences between client-side and server-side rate limiting:

| Aspect | Client-Side Rate Limiting | Server-Side Rate Limiting |
|---|---|---|
| Definition | Limits requests at the client level before they are sent to the server. | Enforces limits at the server or API gateway after requests are received. |
| Primary Goal | Prevent excessive requests from leaving the client and reduce unnecessary load. | Protect backend systems from overload, abuse, or unfair resource consumption. |
| Control Location | Implemented within the application, SDK, or client logic. | Implemented on the server, load balancer, or API gateway. |
| Enforcement Reliability | Less reliable, as it depends on client behavior and can be bypassed or misconfigured. | Highly reliable, since enforcement is centralized and cannot be bypassed by clients. |
| Behavior When Limit Is Reached | The client delays or stops sending requests. | The server rejects, delays, or throttles incoming requests. |
| Visibility into Traffic | Limited to the client’s own request patterns. | Full visibility across all clients and traffic sources. |
| Security Role | Minimal, as malicious clients can ignore limits. | Strong, as it prevents abuse, DDoS attempts, and resource exhaustion. |
| Performance Impact | Reduces unnecessary network traffic and improves client efficiency. | Ensures consistent system performance and protects shared resources. |
| Implementation Complexity | Simpler to implement within controlled applications. | More complex, often requiring distributed tracking and scaling mechanisms. |
| Typical Use Cases | SDK-level controls, browser apps, or well-behaved integrations. | Public APIs, multi-tenant systems, and high-traffic services. |

Types of Rate Limits

Rate limits are implemented using different algorithms and strategies, each designed to control request flow in a specific way. The choice depends on how strictly you need traffic control, how evenly you need to distribute requests, and how the system handles bursts. Here are the most common types:

  • Fixed window rate limiting. Counts requests within a defined time window, such as per minute or per hour. Once the limit is reached, additional requests are blocked until the window resets.
  • Sliding window rate limiting. Evaluates requests over a continuously moving time frame. It provides a more accurate view of request rates and reduces sudden bursts.
  • Token bucket. The system adds tokens to a bucket at a steady rate, and each request consumes one token. If tokens are available, the request is allowed; if not, it is rejected or delayed.
  • Leaky bucket. Requests enter a queue (the bucket) for processing at a constant rate, regardless of how quickly they arrive. If the bucket fills up, excess requests are discarded.
  • Concurrent request limiting. Limits how many requests can be processed at the same time. New requests are blocked or queued until active requests are completed, helping control resource usage such as CPU or memory.
  • Quota-based limiting. Sets a maximum number of requests over a longer period, such as daily or monthly quotas. Once the quota is exhausted, access is restricted until the quota resets or is increased, making it useful for billing and subscription-based services.

Choosing the right type of rate limit ensures that traffic is controlled in a way that aligns with system behavior, balancing flexibility, fairness, and performance.

Common Rate Limiting Algorithms

Rate limiting algorithms define how systems measure and control incoming request traffic. Each algorithm applies a different approach to counting requests and enforcing limits, which affects how accurately it handles bursts, fairness, and overall system stability.

Fixed Window Counter

The fixed window counter groups requests into discrete time intervals, such as one minute or one hour, and counts how many requests occur within each interval. Once the defined threshold is reached, additional requests are rejected until the next window begins. This approach is simple and efficient but can allow traffic spikes at the boundary between windows, since a client can send a burst of requests at the end of one window and immediately continue at the start of the next.
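As a rough single-process illustration, a fixed window counter can be sketched as below; the class name, thresholds, and aligned-window choice are our own assumptions.

```python
import time

class FixedWindowLimiter:
    """Counts requests per aligned time window (e.g., per clock minute)."""

    def __init__(self, limit: int, window_seconds: int):
        self.limit = limit
        self.window = window_seconds
        self.current_window = -1
        self.count = 0

    def allow(self) -> bool:
        window = int(time.time() // self.window)  # which window is it now?
        if window != self.current_window:         # a new window: reset the counter
            self.current_window = window
            self.count = 0
        if self.count >= self.limit:
            return False                          # limit reached for this window
        self.count += 1
        return True

# Boundary caveat: a client can send `limit` requests just before a window
# boundary and `limit` more just after it, briefly doubling the rate.
limiter = FixedWindowLimiter(limit=100, window_seconds=60)
print(limiter.allow())  # True until the 101st request in the current minute
```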

Sliding Window Log

The sliding window log tracks the exact timestamp of each request and evaluates them against a continuously moving time window. For every new request, the system removes timestamps that fall outside the window and checks how many remain. This provides precise control over request rates and prevents boundary spikes, but it requires more memory and processing since each request must be stored and evaluated individually.
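A sliding window log can be sketched with a queue of timestamps, as below. The names are illustrative, and a real deployment would keep the log in a shared low-latency store rather than in one process.

```python
import time
from collections import deque

class SlidingWindowLog:
    """Stores one timestamp per request: exact, but memory grows with traffic."""

    def __init__(self, limit: int, window_seconds: float):
        self.limit = limit
        self.window = window_seconds
        self.log = deque()  # timestamps of requests still inside the window

    def allow(self) -> bool:
        now = time.monotonic()
        # Drop timestamps that have slid out of the moving window.
        while self.log and now - self.log[0] > self.window:
            self.log.popleft()
        if len(self.log) >= self.limit:
            return False          # window is full: reject
        self.log.append(now)      # record and allow this request
        return True

limiter = SlidingWindowLog(limit=100, window_seconds=60.0)
print(limiter.allow())  # True while fewer than 100 requests in the last minute
```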

Sliding Window Counter

The sliding window counter improves efficiency by combining aspects of fixed and sliding windows. Instead of storing every request, it keeps counts for the current and previous time windows and calculates a weighted average based on how much time has passed. This reduces memory usage while still smoothing out bursts, making it a practical compromise between accuracy and performance.
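The weighted-average idea can be sketched as follows, assuming aligned windows and a single process; the interpolation is the commonly used approximation, not an exact count.

```python
import time

class SlidingWindowCounter:
    """Approximates a sliding window from two fixed-window counts."""

    def __init__(self, limit: int, window_seconds: float):
        self.limit = limit
        self.window = window_seconds
        self.current_window = -1
        self.current_count = 0
        self.previous_count = 0

    def allow(self) -> bool:
        now = time.time()
        window = int(now // self.window)
        if window != self.current_window:
            # Roll over: "current" becomes "previous", or zero if more than
            # one full window passed without traffic.
            self.previous_count = (
                self.current_count if window == self.current_window + 1 else 0
            )
            self.current_count = 0
            self.current_window = window
        # Weight the previous window by how much of it still overlaps the
        # sliding window that ends right now.
        elapsed = (now % self.window) / self.window
        estimated = self.previous_count * (1 - elapsed) + self.current_count
        if estimated >= self.limit:
            return False
        self.current_count += 1
        return True
```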

Token Bucket

The token bucket algorithm allows requests as long as there are tokens available in a bucket that fills at a steady rate. Each request consumes one token, and when the bucket is empty, requests are either delayed or rejected. This method supports short bursts of traffic while maintaining a consistent long-term rate, making it well-suited for systems that need flexibility without losing control.
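A compact single-process token bucket might look like this; the rate and capacity are illustrative parameters.

```python
import time

class TokenBucket:
    """Refills tokens at a steady rate; a full bucket absorbs short bursts."""

    def __init__(self, rate_per_second: float, capacity: float):
        self.rate = rate_per_second
        self.capacity = capacity
        self.tokens = capacity            # start full so initial bursts pass
        self.last_refill = time.monotonic()

    def allow(self) -> bool:
        now = time.monotonic()
        # Credit tokens earned since the last check, capped at capacity.
        self.tokens = min(
            self.capacity, self.tokens + (now - self.last_refill) * self.rate
        )
        self.last_refill = now
        if self.tokens >= 1:
            self.tokens -= 1              # each request consumes one token
            return True
        return False                      # bucket empty: reject or delay

limiter = TokenBucket(rate_per_second=10, capacity=20)
print(limiter.allow())  # True: allows bursts of up to 20, then ~10/second
```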

Leaky Bucket

The leaky bucket algorithm processes incoming requests at a fixed rate, regardless of how quickly they arrive. Requests are added to a queue and handled in order, creating a steady and predictable output flow. If the queue becomes full, excess requests are dropped. This approach is effective for smoothing traffic and protecting backend systems, but it is less tolerant of bursts compared to token-based methods.
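A queue-based leaky bucket can be sketched with a bounded queue drained by a worker at a constant rate; the start_leaky_bucket helper below is illustrative, not a standard API.

```python
import queue
import threading
import time

def start_leaky_bucket(capacity: int, drain_per_second: float, handler):
    """Queue requests and process them at a constant rate in a worker thread."""
    bucket = queue.Queue(maxsize=capacity)

    def drain():
        while True:
            request = bucket.get()              # next queued request, in order
            handler(request)
            time.sleep(1.0 / drain_per_second)  # constant output rate

    threading.Thread(target=drain, daemon=True).start()

    def submit(request) -> bool:
        try:
            bucket.put_nowait(request)  # accepted: will be processed in order
            return True
        except queue.Full:
            return False                # bucket overflowed: request is dropped

    return submit

submit = start_leaky_bucket(capacity=50, drain_per_second=10, handler=print)
print(submit("request-1"))  # True if queued, False if the bucket is full
```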

Concurrency Limiting (Semaphore-Based)

Concurrency limiting controls how many requests are processed at the same time rather than over a time window. When the system reaches the maximum number of concurrent operations, new requests must wait until existing ones complete. This approach directly protects system resources such as CPU, memory, or database connections, and is often used alongside other rate limiting algorithms for more comprehensive control.
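In Python, a semaphore-based concurrency limit takes only a few lines; the slot count and the process callback are placeholders.

```python
import threading

MAX_CONCURRENT = 10  # illustrative: at most 10 requests in flight at once
slots = threading.BoundedSemaphore(MAX_CONCURRENT)

def handle_request(process) -> bool:
    # Non-blocking acquire: reject immediately when every slot is busy.
    if not slots.acquire(blocking=False):
        return False       # at capacity; the caller may queue or retry
    try:
        process()          # do the actual work while holding a slot
        return True
    finally:
        slots.release()    # free the slot even if process() raises

print(handle_request(lambda: None))  # True while fewer than 10 are in flight
```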

To improve rate limiting around your database, review the database schema associated with each user and analyze traffic patterns to understand how requests are generated and how limits are tracked and enforced.

Rate Limiting Benefits

Rate limiting provides a structured way to control traffic and protect systems from excessive or uneven demand. By regulating request handling, it improves both system stability and user experience across applications and APIs. Other benefits include:

  • Preventing system overload. Limits the number of incoming requests to ensure infrastructure resources such as CPU, memory, and bandwidth are not overwhelmed.
  • Ensuring fair usage. Prevents individual users or clients from consuming disproportionate resources, allowing all users to access the service reliably.
  • Improving performance stability. Reduces sudden traffic spikes that cause latency or downtime, helping maintain consistent response times.
  • Enhancing security. Mitigates automated attacks such as brute force attempts, credential stuffing, and API abuse by restricting request frequency.
  • Supporting scalability. Helps systems handle growth more predictably by controlling traffic patterns and reducing unexpected load surges.
  • Enabling predictable resource allocation. Allows organizations to plan capacity and infrastructure usage more effectively by enforcing defined request limits.
  • Facilitating monetization and quotas. Supports usage-based pricing models and subscription tiers by enforcing request limits tied to plans or quotas.
  • Reducing unnecessary traffic. Filters out excessive or redundant requests, improving overall efficiency and lowering operational costs.

Overall, rate limiting strengthens system reliability and security by ensuring resources are used efficiently and traffic remains predictable.

Rate Limiting Use Cases

Rate limiting is useful for a wide range of systems to control traffic, protect resources, and ensure consistent service delivery. Its use cases span security, performance management, and business logic, depending on how you define request limits.

API Protection and Abuse Prevention

Public APIs are often exposed to large volumes of traffic from different clients, making them a common target for abuse. Rate limiting helps prevent excessive or malicious requests by enforcing usage thresholds per API key, user, or IP address. This protects backend services from overload while ensuring fair access for legitimate users.

Mitigating Brute Force and Credential Stuffing Attacks

Authentication endpoints are particularly vulnerable to automated attacks that attempt to guess credentials. By limiting the number of login attempts within a given time frame, rate limiting reduces the effectiveness of these attacks. It slows down attackers and increases the likelihood of detection while preserving access for genuine users.

Traffic Spike and Burst Management

Applications can experience sudden spikes in traffic due to events such as product launches, promotions, or viral content. Rate limiting helps absorb these bursts by controlling how quickly the system processes requests. This prevents system overload and maintains stable performance even during unexpected demand surges.

Multi-Tenant Resource Fairness

In shared environments where multiple users or customers rely on the same infrastructure, rate limiting ensures that no single tenant consumes disproportionate resources. By enforcing per-user or per-account limits, systems maintain balanced performance and prevent one workload from degrading the experience of others.

Cost Control and Usage-Based Billing

For services that charge based on usage, such as APIs or cloud resources, rate limiting enforces quotas tied to pricing tiers. It keeps users within their allocated limits and signals when they need to upgrade for higher usage. This supports predictable billing and prevents unexpected cost spikes.

Protecting Backend Dependencies

Modern applications often rely on downstream services such as databases, third-party APIs, or microservices. Applying rate limiting at these integration points prevents them from being overwhelmed. By controlling request flow, it helps maintain overall system reliability and avoids cascading failures.

Web Scraping and Bot Management

Web applications may need to limit automated data extraction or bot traffic that strains resources or exposes sensitive information. Rate limiting restricts how frequently clients can access pages or endpoints, making it harder for scrapers to collect data at scale while still allowing normal user interaction.

Improving Overall System Resilience

Rate limiting acts as a safeguard during abnormal conditions, such as partial outages or degraded performance. By reducing incoming traffic to manageable levels, it gives systems time to recover and prevents complete failure. This contributes to more resilient and fault-tolerant architectures.

In case of a partial or complete system failure due to overwhelming traffic, refer to disaster recovery methods to restore normal operations as soon as possible.

How to Implement Rate Limiting

Efficient rate limiting requires more than setting a request cap. A good implementation must protect the system, remain fair to users, and scale without adding unnecessary overhead. The process usually starts with defining what needs protection and ends with continuous tuning based on real traffic patterns:

  1. Define the goal of the limit. Start by identifying what the rate limit should achieve. In some cases, the goal is to stop abuse on login or password reset endpoints. In others, it is to protect APIs, control tenant usage, or reduce pressure on backend services. Defining the purpose first helps determine the strictness of limits.
  2. Choose what to rate limit by. Next, decide how to identify the clients. Common identifiers include IP addresses, API keys, user accounts, session IDs, or tenant IDs. This step is important because the accuracy of the limit depends on whether the system can distinguish one client from another fairly and reliably.
  3. Set thresholds and time windows. Once you choose the identifier, define how many requests you allow and over what period. For example, a public API may allow 100 requests per minute, while a login endpoint may permit only a few attempts in the same timeframe. These thresholds should reflect the sensitivity of the endpoint, expected user behavior, and the capacity of the underlying infrastructure.
  4. Select the right algorithm. The next step is choosing the algorithm that best matches the traffic pattern. A fixed window is simple and efficient, while a sliding window offers smoother control. Token bucket and leaky bucket models are useful when the system must handle bursts without losing long-term control.
  5. Decide where enforcement should happen. You can enforce rate limits at the API gateway, load balancer, reverse proxy, application layer, or even on the client side. In most production environments, server-side enforcement is essential because clients cannot bypass it. Placing the logic as close as possible to incoming traffic also helps protect backend services before they become overloaded.
  6. Store and track request state efficiently. To enforce limits, the system needs a fast way to count requests and evaluate thresholds. This usually means storing counters or tokens in memory, a distributed cache, or another low-latency data store. In large or distributed environments, you must design this step carefully, so counters remain accurate without creating bottlenecks or synchronization issues.
  7. Define how the system responds to reaching limits. After the counting logic is in place, decide what happens when a client exceeds the limit. The system may reject requests, delay them, queue them, or temporarily slow the client down. A clear response strategy protects resources while minimizing disruption for legitimate users.
  8. Return clear feedback to clients. Efficient rate limiting should not leave clients guessing. Error messages and response headers should indicate that the limit has been reached and, when appropriate, when the client can retry (see the sketch after this list). This improves usability, helps developers integrate with the API correctly, and reduces repeated failed requests.
  9. Monitor, test, and adjust over time. Once you deploy rate limiting, you should monitor continuously. Real traffic often behaves differently from expectations, so thresholds may need adjustment to avoid blocking valid users or leaving gaps attackers can exploit. Ongoing testing and tuning ensure the rate limiting strategy stays effective as usage patterns and system demands evolve.
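As a rough end-to-end sketch of steps 5 through 8, the following assumes a Flask application backed by a local Redis instance via the redis-py client. The endpoint, key scheme, header choices, and limits are all illustrative, and the fixed-window keying is the simplest of the algorithms discussed above.

```python
import time

import redis                      # assumes the redis-py client is installed
from flask import Flask, jsonify, request

app = Flask(__name__)
r = redis.Redis()                 # assumes Redis on localhost:6379

LIMIT, WINDOW = 100, 60           # illustrative: 100 requests per minute

@app.route("/api/data")
def data():
    # Step 2: identify the client by API key, falling back to IP address.
    client = request.headers.get("X-API-Key") or request.remote_addr
    window = int(time.time() // WINDOW)
    key = f"ratelimit:{client}:{window}"   # one counter per client per window

    count = r.incr(key)                    # step 6: atomic shared counter
    if count == 1:
        r.expire(key, WINDOW * 2)          # old counters expire on their own

    if count > LIMIT:                      # step 7: reject excess requests
        retry_after = WINDOW - int(time.time()) % WINDOW
        resp = jsonify(error="rate limit exceeded")
        resp.status_code = 429
        resp.headers["Retry-After"] = str(retry_after)  # step 8: clear feedback
        return resp

    return jsonify(data="ok")
```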

When you implement these steps, rate limiting becomes a practical way to protect infrastructure, manage traffic efficiently, and maintain a consistent user experience.

Rate Limiting Challenges


While rate limiting is essential for protecting systems and ensuring fair usage, it also introduces trade-offs that affect usability, accuracy, and implementation complexity. You should carefully manage these challenges to avoid negatively impacting legitimate users or system behavior:

  • Risk of blocking legitimate users. Strict or poorly tuned limits prevent valid users from accessing a service, especially during peak usage or shared IP scenarios.
  • Difficulty in setting optimal limits. Choosing thresholds that balance protection and usability is challenging, as traffic patterns vary across users, regions, and time.
  • Handling distributed traffic. Users behind shared networks or proxies may appear as a single client, making it harder to apply fair limits without unintended restrictions.
  • Implementation complexity at scale. Accurately tracking requests across distributed systems requires synchronization, storage, and coordination, which increases system complexity.
  • Impact on user experience. Rejected or delayed requests interrupt workflows, leading to frustration if you do not communicate limits clearly.
  • Evasion by malicious actors. Attackers can bypass limits by rotating IP addresses, using botnets, or distributing requests across multiple sources.
  • Added latency and overhead. Tracking and evaluating request counts introduces additional processing, which slightly increases response times.
  • Inconsistent behavior across endpoints. Different limits for various APIs or services create confusion if not documented clearly or enforced consistently.

Despite these challenges, a well-designed rate limiting strategy can still provide strong protection and stable performance when you test and adjust limits with real usage patterns.

Control Traffic and Protect Your Systems

Rate limiting is a core mechanism for maintaining control, stability, and fairness in modern applications and APIs. By regulating request handling, it protects systems from overload, reduces the impact of malicious activity, and ensures consistent performance even under unpredictable traffic conditions. When implemented well, it becomes more than a safeguard: it supports scalability, improves user experience, and enables reliable service delivery across distributed environments.