Building a Simple Rate Limiter in Java
With use cases from an Android perspective

Rate Limiting
Real-world users are cruel, impatient, and sneaky. In high-concurrency systems, your server can get bombarded with bogus requests, so you may want to control that.
Some of the real-world use cases could be the following:
1. API quota management — As a provider, you might want to limit the rate at which API requests are made to your server, depending on the payment plan the user has opted for. This could be on the client or the service side.
2. Security — To protect against DDoS attacks.
3. Cost control — This is not necessarily for the service side or even the client side. If some component emits an event at a very high rate, rate limiting could help control it. For example, it could help control the telemetry sent from the client side.
Options we have while rate limiting
Depending on what type of request/event we are dealing with, the following can happen:
- We can drop the additional requests
- We can choose to make the requests wait until the system throttles them down to the predefined rate.
Common rate-limiting algorithms
- Token bucket algorithm
- Leaky bucket algorithm
- Fixed Window
We will not go into the internal details of these algorithms as it is out of scope for this article.
We will take the fixed window algorithm as our pivot. Let’s write down high-level requirements for the implementation.
Fixed window — Windows are split upfront, and each window has a counter. Each request increases the counter by one. Once the counter reaches the threshold, new requests are dropped until a new time window starts.
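As an illustration, the fixed-window idea can be sketched in a few lines. This is a hypothetical, simplified, single-threaded sketch; the class and method names are mine, not part of the implementation this article builds:

```java
// Simplified fixed-window sketch (not thread-safe): each second maps to a
// window id, and the counter resets whenever a request lands in a new window.
public class FixedWindowSketch {
    private final int limit;          // max requests allowed per window
    private long currentWindow = -1;  // window id = seconds since epoch
    private int counter = 0;

    public FixedWindowSketch(int limit) {
        this.limit = limit;
    }

    public boolean tryAcquire(long nowMillis) {
        long window = nowMillis / 1000;  // split time into 1-second windows
        if (window != currentWindow) {   // entered a new window
            currentWindow = window;
            counter = 0;                 // counter starts fresh
        }
        if (counter < limit) {
            counter++;                   // consume one slot
            return true;
        }
        return false;                    // threshold reached: drop the request
    }

    public static void main(String[] args) {
        FixedWindowSketch rl = new FixedWindowSketch(2);
        System.out.println(rl.tryAcquire(1000)); // true
        System.out.println(rl.tryAcquire(1100)); // true
        System.out.println(rl.tryAcquire(1200)); // false (window full)
        System.out.println(rl.tryAcquire(2000)); // true (new window)
    }
}
```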
Requirements
- Should accept the required rate (TPS, transactions per second).
- Should drop transactions once they exceed our defined rate.
- Should work in concurrent situations.
Advanced features (In upcoming articles)
- Should be able to smooth bursts of requests. For example, if we have defined the TPS as 5 and all five requests arrive at the same moment, it should be able to line them up at fixed intervals, i.e. execute each of them at an interval of 200ms. This requires an internal timing circuit.
- If we have a TPS of 5 and in one of the one-second slots we use only three tokens, then for the next second we should be capable of providing 5 + 2 = 7 tokens as a bonus, but at the rate of one token per 1/7 second (~142.86ms). The bonus should not carry forward to the next slot.
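The bonus-token arithmetic above is easy to check with a tiny sketch (the values come from the example; the variable names are mine):

```java
public class PacingMath {
    public static void main(String[] args) {
        int tps = 5;
        int used = 3;                                 // tokens used in the last slot
        int bonusTokens = tps + (tps - used);         // 5 + 2 = 7 tokens next slot
        long intervalNanos = 1_000_000_000L / bonusTokens; // pacing interval per token
        System.out.println(bonusTokens);              // 7
        System.out.println(intervalNanos + " ns");    // 142857142 ns, i.e. ~142.86 ms
    }
}
```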
Let’s first define the contract for our RateLimiter:
/**
 * Rate limiter helps in limiting the rate of execution of a piece of code. The rate is defined in terms of
 * TPS (transactions per second). A rate of 5 would suggest 5 transactions/second. A transaction could be a DB call,
 * an API call, or a simple function call.
 * <p>
 * Every {@link RateLimiter} implementation should implement either {@link RateLimiter#throttle(Code)} or {@link RateLimiter#enter()}.
 * They can also choose to implement both.
 * <p>
 * {@link Code} represents a piece of code that needs to be rate limited. It could be a function call; if the code to be
 * rate limited spreads across multiple functions, we need to use the enter() contract.
 */
public interface RateLimiter {

    /**
     * Rate limits the code passed inside as an argument.
     *
     * @param code representation of the piece of code that needs to be rate limited.
     * @return true if executed, false otherwise.
     */
    boolean throttle(Code code);

    /**
     * When the piece of code that needs to be rate limited cannot be represented as contiguous
     * code, enter() should be used before we start executing the code. This brings the code inside the rate
     * limiting boundaries.
     *
     * @return true if the code will execute and false if it is rate limited.
     */
    boolean enter();

    /**
     * Interface to represent a contiguous piece of code that needs to be rate limited.
     */
    interface Code {

        /**
         * Calling this function should execute the code that is delegated to this interface.
         */
        void invoke();
    }
}
Our RateLimiter has two sets of APIs: one is throttle(code), and the other is enter(). Both cater to the same functionality, but in these two ways:
- boolean throttle(code) — can be used to pass a block of code if we have contiguous code with us.
- boolean enter() — can be used in general before the API, DB, or any call we want to throttle. If the code following it is allowed to execute, it returns true, and false if it is rate limited. You can queue those requests or reject them.
You will never see the throttle(code) implementation in production because it is suboptimal. Please let me know why in the comments. Most rate limiters use APIs that look like enter().
Core Functionality
To build the core of our rate limiter, we need to make sure that within any one-second window we do not allow more than N transactions. How will we do that?
Consider the moment when we make the first transaction, t0. So, until (t0 + 1)s, we are allowed to make only N transactions. And how will we ensure that? At the time of the next transaction, we will check if current time ≤ (t0 + 1). If not, it means we have entered a different second, and we are allowed to make N new transactions. Let’s see a small code section that demonstrates that:
long now = System.nanoTime();
if (now <= mNextSecondBoundary) { // If we are within the time limit of the current second
    if (mCounter < N) { // If a token is available
        mLastExecutionNanos = now;
        mCounter++;       // Allocate a token
        invoke(code);     // Invoke the code passed to the throttle() method.
    }
}
So, how do we define mNextSecondBoundary? As discussed, that will be done when we make the first transaction: we will add one second to the moment the first transaction is made.
if (mLastExecutionNanos == 0L) {
    mCounter++; // Allocate the very first token here.
    mLastExecutionNanos = System.nanoTime();
    mNextSecondBoundary = mLastExecutionNanos + NANO_PER_SEC; // NANO_PER_SEC = 10^9
}
Now, what should we do if we execute the code and see we have entered a different second? We will enhance the previous code by resetting our last execution time and the number of tokens available, and then repeating the same process by calling throttle() again. Our method already knows how to handle a new second.
@Override
public boolean throttle(Code code) {
    if (mTPS <= 0) {
        // We do not want anything to pass.
        return false;
    }
    synchronized (mLock) {
        if (mLastExecutionNanos == 0L) {
            mCounter++;
            mLastExecutionNanos = System.nanoTime();
            mNextSecondBoundary = mLastExecutionNanos + NANO_PER_SEC;
            invoke(code);
            return true;
        } else {
            long now = System.nanoTime();
            if (now <= mNextSecondBoundary) {
                if (mCounter < mTPS) {
                    mLastExecutionNanos = now;
                    mCounter++;
                    invoke(code);
                    return true;
                } else {
                    return false;
                }
            } else {
                // Reset the counter as we are in a different second now.
                mCounter = 0;
                mLastExecutionNanos = 0L;
                mNextSecondBoundary = 0L;
                return throttle(code);
            }
        }
    }
}
In this implementation, we can pass the code block that needs to be throttled, but there is a problem with this code. It will work, but it will perform poorly. It is not recommended, but why? Please let me know in the comments.
Now it’s time to build the second API, enter(), using the same building blocks. We will use the same logic, but we are not going to execute the code block inside the method. Rather, the code will follow the call to enter() while the method does the state management. The implementation of the method is as follows:
@Override
public boolean enter() {
    if (mTPS <= 0) {
        return false;
    }
    synchronized (mBoundaryLock) {
        if (mLastExecutionNanos == 0L) {
            mLastExecutionNanos = System.nanoTime();
            mCounter++;
            mNextSecondBoundary = mLastExecutionNanos + NANO_PER_SEC;
            return true;
        } else {
            long now = System.nanoTime();
            if (now <= mNextSecondBoundary) {
                if (mCounter < mTPS) {
                    mLastExecutionNanos = now;
                    mCounter++;
                    return true;
                } else {
                    return false;
                }
            } else {
                // Reset the counter as we are in a different second now.
                mCounter = 0;
                mLastExecutionNanos = 0L;
                mNextSecondBoundary = 0L;
                return enter();
            }
        }
    }
}
Now, our simple rate limiter is ready for use. You can check out the complete code here.
Results
We will write driver code that creates six threads. Each thread counts from 0 to 99 with a delay of 50ms between iterations (that can be set to any number). We will put our rate limiter into action as follows:
public static void main(String[] args) {
    RateLimiter limiter = new SimpleTokenBucketRateLimiter(1);
    Thread[] group = new Thread[6];
    Runnable r = () -> {
        for (int i = 0; i < 100; i++) {
            try {
                Thread.sleep(50);
            } catch (InterruptedException e) {
                throw new RuntimeException(e);
            }
            if (limiter.enter()) {
                System.out.println("Values:- " + Thread.currentThread().getName() + ": " + i);
            }
        }
    };
    for (int i = 0; i < 6; i++) {
        group[i] = new Thread(r);
        group[i].start();
    }
}
Our APIs do not support smoothing the transactions or making them wait until the next token is assigned; instead, they drop the excess requests. But since enter() returns false when a request is rejected, we can queue up those requests if we really want to.
if (limiter.enter()) {
    System.out.println("Values:- " + Thread.currentThread().getName() + ": " + i);
} else {
    // Queue the work again.
}
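One hypothetical way to flesh out that else branch is to queue rejected work for a later retry. In the sketch below, the Limiter stand-in only permits the first two calls; it exists purely to make the example self-contained and is not the article's implementation:

```java
import java.util.ArrayDeque;
import java.util.Deque;

public class RequeueSketch {
    // Minimal stand-in for illustration: permits only the first two calls.
    interface Limiter {
        boolean enter();
    }

    public static void main(String[] args) {
        final int[] permits = {2};
        Limiter limiter = () -> permits[0]-- > 0;

        Deque<Runnable> pending = new ArrayDeque<>();
        for (int i = 0; i < 4; i++) {
            final int id = i;
            Runnable work = () -> System.out.println("ran " + id);
            if (limiter.enter()) {
                work.run();          // allowed: run immediately
            } else {
                pending.add(work);   // rejected: queue for a later retry
            }
        }
        System.out.println("queued: " + pending.size()); // the two rejected jobs
    }
}
```

A real caller would drain `pending` once the next window opens, e.g. from a scheduled task.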

When we set the TPS to 2, we see two values being printed per second in the output.
It works!
Uses From The Android Perspective
- Consider a case where you are writing code to capture the user’s signature. As they drag the pointer, you capture thousands of points. All of them might not be required for a smooth signature, so you take a sample using rate limiting.
- Some events are called at a high frequency. You can control that.
- We have MessageQueue’s idle listener. When we listen for it on the main thread, it’s called in a haphazard manner. Sometimes it is called several times in a second. If we want to build a heartbeat system that tells us when the main thread is idle, we can use this to receive events per second. If we do not receive an event for a second, we can assume the main thread is busy.
- For your framework/library’s API quota management, you can control the API calls depending on the payment plan the user has opted for.
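The signature-sampling idea in the first bullet can be sketched without any Android dependencies. This is a hypothetical time-based sampler; the 50ms interval and all names are illustrative, not part of the article's code:

```java
// Keeps at most one point per interval; points arriving too soon after the
// previously kept point are dropped.
public class PointSampler {
    private final long intervalMillis;
    // Start far in the past so the first point is always kept
    // (MIN_VALUE / 2 avoids overflow in the subtraction below).
    private long lastKeptAt = Long.MIN_VALUE / 2;

    public PointSampler(long intervalMillis) {
        this.intervalMillis = intervalMillis;
    }

    public boolean keep(long eventTimeMillis) {
        if (eventTimeMillis - lastKeptAt >= intervalMillis) {
            lastKeptAt = eventTimeMillis;
            return true;   // keep this point
        }
        return false;      // drop: too close to the previous kept point
    }

    public static void main(String[] args) {
        PointSampler sampler = new PointSampler(50);
        System.out.println(sampler.keep(0));   // true
        System.out.println(sampler.keep(10));  // false
        System.out.println(sampler.keep(60));  // true
    }
}
```

In a touch handler, you would call `keep(event.getEventTime())` and only record the point when it returns true.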
That’s all.
We will build a more complex rate limiter in the upcoming articles.