When Scaling Is Not An Option: A Simple Asynchronous Pattern

We live in a world of APIs, and while they are practical, you may face situations where the number of requests increases rapidly and your underlying dependencies can’t keep up.
When this happens, you have to decide how you want to handle it. The first reaction would be to scale your infrastructure, horizontally or vertically, to increase the capacity you can offer to your clients. But that may not be possible or desirable.
If you find yourself in such a situation, one simple pattern you can apply is to change your API so that, instead of executing the request right away, it merely acknowledges that the request was received and provides a way to notify the client once the request is actually processed.
In this article, I will share some use cases where this pattern is a good fit, along with the trade-offs involved.
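To make the idea concrete, here is a minimal sketch of what such an endpoint could look like. It uses Python and FastAPI purely as an illustration; the framework choice, the /orders route, and every name in it are my own assumptions, not part of the original design. The endpoint immediately answers 202 Accepted with a tracking id and hands the actual work off to a background task.

```python
# Minimal sketch of the acknowledge-now, process-later pattern.
# FastAPI, the /orders route, and all names here are illustrative assumptions.
import uuid

from fastapi import BackgroundTasks, FastAPI

app = FastAPI()


def process_request(request_id: str, payload: dict) -> None:
    # Placeholder for the real work: call the persistence layer and the
    # third-party service, then notify the client (webhook, e-mail, polling).
    ...


@app.post("/orders", status_code=202)
def accept_order(payload: dict, background: BackgroundTasks) -> dict:
    request_id = str(uuid.uuid4())
    # Only acknowledge here; the heavy lifting runs after the response is sent.
    background.add_task(process_request, request_id, payload)
    return {"id": request_id, "status": "accepted", "status_url": f"/orders/{request_id}"}
```

In practice, a durable queue would usually replace the in-process background task, so that acknowledged requests are not lost if the service restarts.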
The Problem Revisited
Let’s imagine that we offer a service — public or not — via an API like the one illustrated in Figure 1.

In our example, we can highlight three direct dependencies (a simplified synchronous flow is sketched right after this list):
- The compute unit responsible for receiving the request and processing it
- The persistence layer used to retrieve and update any state
- The third-party service that is orchestrated to deliver the functionality
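Here is a minimal sketch of what that synchronous flow could look like, in the same Python/FastAPI style as the sketch in the introduction. Every helper and field name below is a hypothetical stand-in for the three dependencies listed above.

```python
# Minimal sketch of the synchronous flow in Figure 1; all names below are
# hypothetical stand-ins for the compute unit, persistence, and third party.
import time

from fastapi import FastAPI

app = FastAPI()


def load_state(customer_id: str) -> dict:
    # Stand-in for the persistence layer (read).
    return {"customer_id": customer_id}


def save_state(customer_id: str, result: dict) -> None:
    # Stand-in for the persistence layer (write).
    ...


def call_third_party(payload: dict, state: dict) -> dict:
    # Stand-in for the orchestrated third-party service (slow under load).
    time.sleep(1)
    return {"ok": True}


@app.post("/orders")
def create_order(payload: dict) -> dict:
    # The compute unit holds the connection open while it talks to the
    # persistence layer and the third-party service, so a burst of traffic
    # ties up capacity across all three dependencies at once.
    state = load_state(payload["customer_id"])
    result = call_third_party(payload, state)
    save_state(payload["customer_id"], result)
    return {"status": "done", "result": result}
```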
All is fine until you receive a burst of traffic and your clients are unable to use your service. Your first reaction could be to scale one or more of the dependencies until they can cope with the new reality.
While this is normally the route you would take, even with cloud-based solutions it is sometimes not the best approach, or even a viable option.
For example, imagine that, in order to sustain the new load, you would have to move your persistence to a tier with more memory and I/O capacity. That extra cost may be higher than you can absorb.