Implementing a Rate Limiter That Solves the Under-Utilisation of Resources in Java

Vishal Ratna · Published in Better Programming · Jan 24, 2023



In our previous articles, we learned how to build a simple rate limiter and a rate limiter that helps us evenly distribute the requests. You can check out the linked articles. I will try to make this article as complete as possible so that you are not dependent on the previous ones, but it is always good to have the historical context.

Requirements

The limitations of previous implementations became the requirements of this implementation.

  1. If our rate limiters were not used for some time, they could not compensate for the under-utilisation of resources.

This could happen in the following two ways:

a. If the rate limiter has a TPS (transactions per second) of 5, and only two permits were used in the previous second, we should be able to use those unused permits. The exact policy may depend on the implementation.

b. Suppose our rate limiter has a TPS of 1. One request was made, and then the limiter was idle for five seconds. If the next request comes at 6.3 seconds, our previous implementation made it wait for 0.7 seconds, but ideally, it should execute immediately.

2. We could not acquire multiple permits at a time; the limiters were built to acquire only one permit. This is a limitation. What if a client wants to make five network calls on five different threads through a thread pool? Not being able to acquire n permits at once could let other, unrelated callers grab permits in between, which might not be what we want.

Solution

To solve under-utilisation, we must use the tokens produced during the idle period. But what if the idle period is huge? It can lead to a flood of unused tokens, and our limiter would have no role to play, as the resource could be bombarded. So we introduce a concept called “max stored permits,” the maximum number of permits we can store. The client should provide this value.

Solving the acquisition of multiple permits is simple. There are two steps: introduce a new API in our contract and understand how the throttling works. The code below shows the new contract for our rate limiters, and we’ll talk about throttling in the next sections.

public interface RateLimiter {
    /**
     * Rate limits the code passed inside as an argument.
     *
     * @param code representation of the piece of code that needs to be rate limited.
     * @return true if executed, false otherwise.
     */
    boolean throttle(Code code);

    /**
     * When the piece of code that needs to be rate limited cannot be represented as a contiguous
     * block, acquire() should be called before we start executing the code. This brings the code inside the
     * rate-limiting boundaries.
     *
     * @return true if the code will execute and false if rate limited.
     */
    boolean acquire();

    /**
     * Acquires multiple permits at the same time. If an expensive task takes n permits, the subsequent calls
     * take the toll on the rate.
     *
     * @param permits Permits required.
     * @return true, if successful, false otherwise.
     */
    boolean acquire(int permits);

    /**
     * Interface to represent a contiguous piece of code that needs to be rate limited.
     */
    interface Code {
        /**
         * Calling this function should execute the code that is delegated to this interface.
         */
        void invoke();
    }
}
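
As a quick, hypothetical illustration of how a caller could use this contract (my example, not from the article): Code is a single-method interface, so a lambda works for throttle(), and acquire() can guard non-contiguous code.

// Assumes some implementation of the contract, e.g. the BurstySmoothRateLimiter built later in this article.
RateLimiter limiter = new BurstySmoothRateLimiter(2, 0);

// Wrap a contiguous piece of work.
boolean executed = limiter.throttle(() -> System.out.println("rate-limited work"));

// Or guard non-contiguous code manually.
if (limiter.acquire()) {
    System.out.println("this path is inside the rate-limiting boundary");
}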

Vocabulary

  • TPS — The number of transactions we can do in a second is called “transactions per second.”
  • Permit — Each transaction could happen through acquiring a permit.
  • Stored permits — Number of permits we have in store at any time. This is the sum of “fresh permits” for a second and those stored from the previous under-utilisation.
  • Max permits — We cannot store unlimited permits; this is the max cap on that.
  • STTD (SmoothTickTimeDuration) — This is crucial for timing. We divide one second by TPS to get STTD. Mathematically, this is the amount of time by which requests should be spaced out to maintain our desired rate. For example, a TPS of 4 gives an STTD of 250 ms.

Understanding Timing and Throttling

In simple situations where we acquire one permit, our STTD will maintain the gap between the requests as we add STTD to the current execution time. When the next request comes, we get the difference between the next available time and the current time. This will make the calling thread wait for that duration, which keeps our rate intact.

But consider a situation where we want to acquire five permits with a TPS of 1. How should we handle that? Should we grant or deny the request? A simpler implementation could just deny it, but this is a common scenario in real-world situations.

To solve this problem, we provide five permits. One would be from the current second (as TPS is 1), and four are taken from the future seconds. This will make the subsequent request block for four seconds. That way, we can maintain the desired rate. In short, we do not throttle the expensive request, but the fine is levied on the subsequent requests.
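
Here is a rough back-of-the-envelope sketch of that arithmetic (the class and variable names are mine, purely for illustration, and it assumes the current second's permit is available as a stored permit):

public class BurstArithmetic {
    public static void main(String[] args) {
        int tps = 1;
        double sttdNanos = 1_000_000_000.0 / tps;         // STTD = 1 second
        int permitsAsked = 5;
        int storedPermits = 1;                            // the current second's permit
        int freshPermits = permitsAsked - storedPermits;  // 4 permits borrowed from the future
        double nextCallerWaitSeconds = freshPermits * sttdNanos / 1_000_000_000.0;
        // The expensive request fires immediately; the next caller pays the fine.
        System.out.println("Next caller waits ~" + nextCallerWaitSeconds + " s"); // ~4.0 s
    }
}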

Implementation

Our implementation will become simpler if we break the problem into multiple sub-problems. Don’t worry; we will not do “dynamic programming” here. Bad joke! I know.

We need to ask for two things from the client: the TPS and the max permits that can be stored.

public BurstySmoothRateLimiter(int tps, int maxStoredPermits) {
    this.mMaxStoredPermits = maxStoredPermits;
    this.mRequiredSmoothTickDuration = (double) NANO_PER_SEC / tps;
}
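
The constructor refers to a few fields that the snippets use but never declare. Based on how they are used below, the state of the class presumably looks something like this (the exact declarations are my assumption):

public class BurstySmoothRateLimiter implements RateLimiter {
    private static final long NANO_PER_SEC = 1_000_000_000L;

    private final Object mLock = new Object();
    private final int mMaxStoredPermits;              // upper cap on the permits we can bank
    private final double mRequiredSmoothTickDuration; // STTD, in nanoseconds

    private int mStoredPermits = 0;                   // permits banked during idle periods
    private double mNextFreeAvailableTime = 0;        // nano-time of the next free execution slot

    // ... constructor and methods from the snippets in this article ...
}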

Problem 1: Given the situation 0----1----2----3----4---b-5, the previous execution happened at 2 and a request came at b. We waited until 5 to execute the request.

To solve this, we introduce a synchronisation step that moves mNextFreeAvailableTime to now. It also calculates how many permits we did not use during the idle duration.

// This method does the sync and returns the next available time of execution.
private double syncLocked(long now) {
    // If the current time is after the next free time, override it to now.
    if (now > mNextFreeAvailableTime) {
        // Sync potentially available permits.
        int freshPermits = (int) ((now - mNextFreeAvailableTime) / mRequiredSmoothTickDuration);
        mStoredPermits = Math.min(mStoredPermits + freshPermits, mMaxStoredPermits);
        mNextFreeAvailableTime = now;
    }
    return mNextFreeAvailableTime;
}
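
To see how this fixes Problem 1, here is an illustrative trace with made-up numbers, assuming a TPS of 1 (so STTD is 1 second):

// Illustrative trace of syncLocked() for Problem 1; the values are examples, not part of the limiter class.
public class SyncTrace {
    public static void main(String[] args) {
        double sttdNanos = 1_000_000_000.0;       // STTD for TPS = 1
        double nextFreeAvailableTime = 3.0e9;     // previous execution at 2 s, plus one STTD
        long now = 4_700_000_000L;                // request "b" arrives at 4.7 s

        if (now > nextFreeAvailableTime) {
            // The unused second between 3 s and 4.7 s becomes a banked permit (capped by max stored permits).
            int freshPermits = (int) ((now - nextFreeAvailableTime) / sttdNanos);
            nextFreeAvailableTime = now;          // the request at b executes immediately instead of waiting until 5
            System.out.println("Banked permits: " + freshPermits); // prints 1
        }
    }
}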

We have the next available time with us. We will execute the current request at this time.

Problem 2: How do we acquire the permits, for example, when the number of permits requested is higher than our TPS?

First, we will see if we can cater to the request using the stored permits in the limiter. If yes, we do that. If not, we acquire fresh permits and delay upcoming requests to maintain the rate.

int storePermitsUsed = Math.max(0, Math.min(permitsAskedFor, mStoredPermits));
// If we use all stored permits, then we have demanded >= storedPermits.
if (storePermitsUsed == mStoredPermits) {
    freshPermitsUsed = permitsAskedFor - storePermitsUsed;
}

// Wait if we cannot satisfy the request from stored permits.
// In case we can satisfy the request from stored permits, this wait will be 0.
double wait = freshPermitsUsed * mRequiredSmoothTickDuration;

Now, we add the wait to the next available time and deduct the used permits from the stored permits.

// Reduce the number of available permits.
mStoredPermits -= storePermitsUsed;
mNextFreeAvailableTime += wait; // Calculated in the previous snippet.

Here’s what the complete method looks like:

/**
 * Reserves 'n' permits for us and returns the earliest time these permits can be used.
 * If a caller asks for more permits than the max transactions per second, we do not stop it per se, but the
 * upcoming transactions get delayed for the required amount of time.
 *
 * @param permits permits asked for
 * @param now current time in nanos
 * @return Earliest time when 'n' permits could be used.
 */
private double reservePermitsLocked(int permits, long now) {
    int freshPermitsUsed = 0;
    double nextFreeAvailable = syncLocked(now);
    int storePermitsUsed = Math.max(0, Math.min(permits, mStoredPermits));
    // If we use all stored permits, then we have demanded >= storedPermits.
    if (storePermitsUsed == mStoredPermits) {
        freshPermitsUsed = permits - storePermitsUsed;
    }

    double wait = freshPermitsUsed * mRequiredSmoothTickDuration;
    // Reduce the number of available permits.
    mStoredPermits -= storePermitsUsed;
    mNextFreeAvailableTime += wait;
    // Fresh permits encounter throttling; stored permits fire in a burst to cover under-utilisation.
    return nextFreeAvailable;
}

Now that we have reserved the required permits and calculated both the execution time of the current request and the next available time, we can return the earliest available time to the acquire() call.

In acquire(), we check whether we need to wait before making the request or whether we can make it immediately. If there is a wait time, we let the thread sleep without interruptions.

@Override
public boolean acquire(int permits) {
    synchronized (mLock) {
        if (permits <= 0) {
            return false;
        }
        long now = System.nanoTime();
        double nextAvailableTimeNanos = reservePermitsLocked(permits, now);
        double wait = Math.max(nextAvailableTimeNanos - now, 0);
        sleepWithoutInterruptions((long) wait);
        return true;
    }
}
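
The article does not show sleepWithoutInterruptions(). A minimal sketch of such a helper, assuming it should sleep for the full duration and re-assert the interrupt flag afterwards, could look like this:

// Sleeps for the given number of nanoseconds even if interrupted, restoring the interrupt flag at the end.
private static void sleepWithoutInterruptions(long nanos) {
    if (nanos <= 0) {
        return;
    }
    boolean interrupted = false;
    long deadline = System.nanoTime() + nanos;
    long remaining = nanos;
    while (remaining > 0) {
        try {
            java.util.concurrent.TimeUnit.NANOSECONDS.sleep(remaining);
        } catch (InterruptedException e) {
            interrupted = true;
        }
        remaining = deadline - System.nanoTime();
    }
    if (interrupted) {
        Thread.currentThread().interrupt();
    }
}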

And that’s all. We have created a new implementation of our rate limiter capable of acquiring multiple tokens and taking care of under-utilisation up to a certain level of permits.

Usage

We can use it just like the rate limiter we implemented in the previous articles, without the capacity to store tokens. In this example, we'll make maxStoredPermits equal to 0.

public static void main(String[] args) {
    RateLimiter limiter = new BurstySmoothRateLimiter(3, 0);
    Thread[] group = new Thread[16];
    Runnable r = () -> {
        for (int i = 0; i < 50; i++) {
            if (limiter.acquire(1)) {
                System.out.println("Values:- " + Thread.currentThread().getName() + ": " + i);
            }
        }
    };

    for (int i = 0; i < 16; i++) {
        group[i] = new Thread(r);
        group[i].start();
    }
}

This should let each transaction happen with a gap of roughly 0.33 seconds (one-third of a second). Here's the result:

Now, we will try to acquire ten permits from a limiter with a TPS of 2. This should grant us ten permits but make the next request pay the extra cost by throttling it for the eight extra tokens borrowed from future seconds.

public static void main(String[] args) {
    RateLimiter limiter = new BurstySmoothRateLimiter(2, 0);
    Thread[] group = new Thread[16];
    Runnable r = () -> {
        for (int i = 0; i < 50; i++) {
            // Mimics an expensive operation that requests 10 permits at a time.
            if (limiter.acquire(10)) {
                System.out.println("Values:- " + Thread.currentThread().getName() + ": " + i);
            }
        }
    };

    for (int i = 0; i < 16; i++) {
        group[i] = new Thread(r);
        group[i].start();
    }
}

Here’s the expected results:

Asking for permits more than tps

Similarly, you can use the maxStoredPermits parameter to bank previously under-utilised permits, as in the sketch below.
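
For example, a limiter configured like this (illustrative values, my example) banks up to five unused permits while it is idle, so a burst after an idle stretch fires immediately instead of being smoothed out:

public static void main(String[] args) throws InterruptedException {
    // TPS = 2, and up to 5 unused permits can be banked during idle periods.
    RateLimiter limiter = new BurstySmoothRateLimiter(2, 5);

    limiter.acquire(1);      // executes immediately
    Thread.sleep(3000);      // limiter sits idle; unused permits get banked, capped at 5
    for (int i = 0; i < 5; i++) {
        limiter.acquire(1);  // these fire in a burst using the stored permits
    }
}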

To see the full implementation of this class, please check out the code here.

Thanks for reading. Stay tuned for more.

