Better Programming

Architecting Distributed Systems: The Importance of Idempotence

Robert Konarskis
Published in Better Programming
6 min read · Dec 8, 2021

Idempotence (or idempotency, if you like) is a property of an operation, such as an HTTP endpoint or an RPC call, that lets you execute it multiple times and observe the same result as if it had been applied only once.

In other words, when integrating with another system such as a third-party payment provider, you want to be able to potentially request the same payment more than once, but you expect the provider to move the funds only once.

An obvious question would be: why would someone request the same payment more than once? Well, in an ideal scenario, typically in >99% of the cases, your system, the provider, the network, and everything else work fine and the operation successfully completes on the first attempt. If something goes wrong, however, and you are not sure of the result of the operation, you might need to retry, submitting the same payment request again. Such scenarios are explained in detail in my article about API Failures, which I highly recommend if you haven’t worked with integrations a whole lot since it helps to understand the rest of this article.

In order to fully appreciate the value of idempotent operations, we are going to compare the integration efforts with two imaginary payment providers: the first one being not idempotent, and the second one idempotent. Our requirements stay the same:

if any part of the system fails during the operation and we do not know if the operation succeeded or not, we should be able to retry the operation until it succeeds, and the operation must be executed exactly once.

To simulate this scenario, we will assume that our server crashed after sending the request, but before receiving the response from the third-party system. Upon the restart of our server, we want to make sure that only one payment is successfully executed.

Note: we must only retry if we don’t know whether the execution was successful, or if we have reason to believe that the fault was temporary. If the provider responded with a 400 Bad Request, retrying won’t help, and we must fail the operation.
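As a rough sketch, the retry decision from the note above could look like the function below. The name `should_retry` and the exact set of "temporary" status codes are assumptions made for illustration; which failures count as transient is a policy choice that depends on the provider.

```python
from typing import Optional

# Assumption: these statuses are treated as temporary faults. The article does
# not prescribe a list; adjust it to what your provider documents.
TRANSIENT_STATUSES = {408, 429, 500, 502, 503, 504}


def should_retry(status: Optional[int]) -> bool:
    """Return True when the outcome is unknown or the fault looks temporary."""
    if status is None:
        # No response at all: a crash, a timeout, or a dropped connection.
        return True
    return status in TRANSIENT_STATUSES
```

A `None` status stands for "we never saw a response", which is exactly the crash scenario described above.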

Integrating with a non-idempotent system

Let’s integrate with a non-idempotent system, and try to make it resilient to failures, meeting our initial requirements. Here is a non-idempotent API that we have to integrate with:

POST /payments
{
  "from": "sender@email.com",
  "to": "recipient@email.com",
  "amount": 120
}

Every time we make a POST request to the /payments endpoint, the payment provider attempts to transfer the specified amount of money from the sender to the recipient.

Since our server crashed while waiting for a response, we don’t know the result of our first attempt (the payment might have actually succeeded). We cannot just call the same endpoint again, because we would run the risk of making two payments instead of one. First, we need to figure out what happened to our previous attempt to execute the payment. This means that the provider needs to expose at least one more endpoint, listing past payments that have been executed, so that we can check whether we should proceed with the POST call:

GET /payments

On our side, we would now need to call this endpoint every time before calling the POST endpoint, not only for retries. This is because, without additional modifications to our system (e.g. persisting all attempts to make a payment before calling the third-party provider), we cannot tell whether we’re trying for the first time or retrying the same payment. This leads to up to a 2x increase in the API calls required to execute one payment, but it looks like it solves the problem of double execution!
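Here is a minimal sketch of that check-then-pay flow. `NonIdempotentProvider` is a hypothetical in-memory stand-in for the real API, and matching a past payment on (sender, recipient, amount) is an assumption made to keep the example short; a real integration would need a more reliable way to recognize its own earlier attempt.

```python
from dataclasses import dataclass, field


@dataclass
class NonIdempotentProvider:
    """In-memory stand-in for the provider: every POST moves money."""
    payments: list = field(default_factory=list)

    def post_payment(self, sender: str, recipient: str, amount: int) -> None:
        self.payments.append((sender, recipient, amount))   # POST /payments

    def get_payments(self) -> list:
        return list(self.payments)                          # GET /payments


def pay_once(provider: NonIdempotentProvider, sender: str,
             recipient: str, amount: int) -> int:
    """GET first, POST only if no matching payment exists.

    Returns the number of API calls that were needed.
    """
    calls = 1
    if (sender, recipient, amount) in provider.get_payments():
        return calls              # an earlier attempt already went through
    provider.post_payment(sender, recipient, amount)
    return calls + 1
```

The first attempt costs two calls, while a retry after a successful payment stops after the GET.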

…or does it?

The above seems to work fine on paper, with only one instance of our Backend trying to call the Provider at a time… But is that always the reality? What if the user had 2 browser tabs open and tried to check out the same shopping cart at about the same time from both of them? There is a high chance that the GET /payments request will result in the same response for both of our parallel processes, and both will proceed to try and execute the POST /payments call, potentially resulting in two payments instead of one. Not ideal.
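To make the two-tabs problem concrete, the sketch below forces the unlucky interleaving with a barrier: both "tabs" finish their GET before either one is allowed to POST. The in-memory list stands in for the provider, and all names are hypothetical.

```python
import threading


def run_two_tabs() -> int:
    """Run two concurrent checkouts and return how many payments were made."""
    payments = []                         # provider-side ledger (stand-in)
    ledger_lock = threading.Lock()        # protects the list, not the whole flow
    both_checked = threading.Barrier(2)   # forces the bad interleaving
    payment = ("sender@email.com", "recipient@email.com", 120)

    def checkout() -> None:
        with ledger_lock:
            already_paid = payment in payments    # GET /payments
        both_checked.wait()                       # both tabs have now checked
        if not already_paid:
            with ledger_lock:
                payments.append(payment)          # POST /payments

    tabs = [threading.Thread(target=checkout) for _ in range(2)]
    for tab in tabs:
        tab.start()
    for tab in tabs:
        tab.join()
    return len(payments)
```

Both checks see an empty list, so both threads go on to pay and the function reports two payments. In production the interleaving is a matter of timing rather than a barrier, but the outcome is the same whenever it occurs.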

There are multiple ways to work around this issue, but all of them rely on serializing naturally concurrent processes, such as API calls made from several browser windows at the same time. We could have a single thread processing all requests, use locking, persist all attempts in a database, or use some other magic to prevent parallel execution of the same payment.
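As one illustration of the locking option, the sketch below holds a lock across both the check and the payment, so the two steps behave as a single unit. It assumes a single backend process; with several instances, an in-process lock like this would not be enough, and a shared mechanism such as a database constraint or a distributed lock would be needed instead. All names are hypothetical.

```python
import threading


def run_two_tabs_with_lock() -> int:
    """Run two concurrent checkouts under one lock; return the payment count."""
    payments = []                        # provider-side ledger (stand-in)
    checkout_lock = threading.Lock()     # assumption: one backend process
    payment = ("sender@email.com", "recipient@email.com", 120)

    def checkout() -> None:
        # The lock covers the check AND the write, so no other checkout can
        # slip in between them.
        with checkout_lock:
            if payment not in payments:      # GET /payments
                payments.append(payment)     # POST /payments

    tabs = [threading.Thread(target=checkout) for _ in range(2)]
    for tab in tabs:
        tab.start()
    for tab in tabs:
        tab.join()
    return len(payments)
```

Whichever tab gets the lock first pays; the second one sees the payment and skips the POST.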

While it’s definitely possible, it is added complexity that needs to be carefully maintained and thoroughly tested. At this stage, we’ve met our requirements, but we needed to use an extra API call and some sort of synchronization mechanism to prevent parallel execution of the same payment. Also, our overall performance might have suffered from these additions.

Let’s see if this overhead can be avoided.

Integrating with an idempotent system

Here’s what our idempotent payment provider API can look like:

PUT /payments/{payment_id}
{
  "from": "sender@email.com",
  "to": "recipient@email.com",
  "amount": 120
}

As you can see, there are two notable differences:

  1. PUT instead of POST
  2. Additional payment_id parameter

The main difference between POST and PUT is that PUT is meant to be, you guessed it, idempotent, while POST makes no such promise and typically creates a new resource every time it is invoked. A more extensive explanation can be found here.

The additional payment_id parameter acts as our idempotence key and is used to uniquely identify a payment, allowing the provider to de-duplicate the requests in case we call the API more than once. This does create a necessity for us to generate this unique id though, which could be challenging in some cases, and this logic would depend on the use case. In our situation, we can assume that a combination of a user id and a shopping cart id uniquely identifies a payment request. If you’re integrating with an email provider, this could be a hash of the sender, list of recipients, and subject, as an option.
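One possible way to derive such a key from a user id and a shopping cart id is to hash them together, as sketched below. The function name and the choice of SHA-256 are assumptions; any deterministic scheme that maps the same payment to the same id would do.

```python
import hashlib


def make_payment_id(user_id: str, cart_id: str) -> str:
    """Derive a deterministic idempotence key from user id and cart id."""
    # Length-prefix each part so that ("ab", "c") and ("a", "bc") cannot
    # produce the same input to the hash.
    material = f"{len(user_id)}:{user_id}|{len(cart_id)}:{cart_id}"
    return hashlib.sha256(material.encode("utf-8")).hexdigest()
```

Because the key is computed from data we already have, a restarted server can recompute it without having persisted anything before the crash.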

Such an endpoint allows us to keep trying to execute the same operation several times until we get a successful response back, without having to worry about parallel execution or additional payment success checks. As a result, we can write less production code, less test code to cover the production code, and sleep better.
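The sketch below shows why the client side gets so simple. `IdempotentProvider` is a hypothetical in-memory stand-in that applies a payment at most once per `payment_id` and, to mimic our crash scenario, loses the response to the first call. A real provider would also have to make its duplicate check atomic, which this single-threaded fake does not need to worry about.

```python
class IdempotentProvider:
    """In-memory stand-in: a payment_id is applied at most once."""

    def __init__(self) -> None:
        self.payments = {}              # payment_id -> request body
        self.transfers = 0              # how many times money actually moved
        self.lose_next_response = True  # simulate one lost response

    def put_payment(self, payment_id: str, body: dict) -> str:
        # PUT /payments/{payment_id}
        if payment_id not in self.payments:
            self.payments[payment_id] = body
            self.transfers += 1
        if self.lose_next_response:
            self.lose_next_response = False
            raise ConnectionError("response lost")
        return "ok"


def pay(provider: IdempotentProvider, payment_id: str, body: dict,
        max_attempts: int = 5) -> str:
    """Retry the same PUT until a response arrives; no GET, no locking."""
    for _ in range(max_attempts):
        try:
            return provider.put_payment(payment_id, body)
        except ConnectionError:
            continue   # outcome unknown, so simply send the same request again
    raise RuntimeError("payment still unconfirmed after retries")
```

Even though the first request was applied and its response lost, the retry succeeds and the money moves only once.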

Bonus tip: when retrying such operations, it is recommended to add some randomization to the timeouts. I explain why in my article about random numbers in distributed systems. You might also want to wrap the retry logic in a client library to simplify the integrations, as explained in the piece on how client libraries can improve your system’s availability.
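As a sketch of what randomized retry delays can look like, the generator below combines exponential backoff with random jitter. The backoff part and the parameter values are my assumptions for this example; the tip above only asks for some randomization.

```python
import random
from typing import Iterator


def backoff_delays(attempts: int, base: float = 0.5,
                   cap: float = 30.0) -> Iterator[float]:
    """Yield one randomized delay in seconds for each retry attempt."""
    for attempt in range(attempts):
        # The ceiling doubles with every attempt but never exceeds the cap.
        ceiling = min(cap, base * 2 ** attempt)
        # Pick a random point below the ceiling so that many clients retrying
        # at once do not all hit the provider at the same moment.
        yield random.uniform(0, ceiling)
```

A retry loop would sleep for each yielded delay, for example with `time.sleep(delay)`, before sending the request again.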

Conclusion

I hope that this example shows how much easier it is to integrate with idempotent systems. As a developer, take a moment to appreciate the APIs that are a pleasure to work with, and should you be working on an API yourself, make it idempotent where possible.

Want to Connect With the Author? Check out konarskis.com.
