Lock-Free Stacks Are Even Cooler in Kotlin

Treiber stacks are the coolest data structure you’ve never heard of, and tail recursion makes them even better

Sam Cooper
Better Programming

--

Photo by iMattSmart on Unsplash

Concurrent data structures have a reputation for being fiendishly difficult to build and test.

Java’s ConcurrentHashMap occupies more than six thousand lines of source code, including hundreds of explanatory comments. And more than ten years after the release of Java 7, there are still unresolved issues with the exact behaviour of its concurrent queue data structure.

But what if I told you it doesn’t always have to be that way? In 1986, R. Kent Treiber described a concurrent stack implementation that’s so simple it almost seems too good to be true.

He was dealing with low-level assembly language, but the idea works great in many modern programming languages, too. It’s so straightforward that you can implement it in Kotlin in less than fifteen lines.

OK, maybe not totally straightforward. But if you want to understand more about how and why it works, keep reading, and I’ll build it up step by step so that each part of the code makes sense.

Concurrent data structures need to maintain their integrity when multiple threads are attempting to access or even modify them at the same time. For data structures that organise data across multiple variables or arrays, keeping everything in sync can be a daunting task.

Stacks don’t require much internal state at all, so they already bypass a lot of the complexity that makes concurrency a hard problem. Like a linked list, a simple stack can be made up of a series of nodes. Each node holds one value and a reference to the next node in the stack. All the user needs to do is hold a reference to the single root node at the head of the stack. With only one variable to manage, a linked stack is so simple that there’s nothing that can get out of sync.

Atomic Updates

As simple as it looks, though, this still isn’t safe for use with multiple threads.

The problem is that adding a new item to the stack still takes more than one step. First, we have to read the head variable to get the value that used to be at the top of the stack. Next, we create the new node object. Finally, we modify the head variable to point at our new node.

If multiple threads carry out those steps simultaneously, there’s no guarantee that the first thread will complete step three before the second thread executes step one. Here’s one example of how the two threads could interleave their operations. In this case, the result is that only one of the values is added to the stack, while the other one is silently lost.

Treiber’s lock-free stack uses an atomic compare-and-set operation to identify when it has ended up in this situation. The trick is that when setting the new value for the head variable, we also check its current value one more time. If the head of the stack differs from when we started, we know another thread has beaten us to the punch. In that case, we go back and try again.

We’ve upgraded our head variable to an AtomicReference, to gain access to atomic operations like compare-and-set. An atomic operation is executed as a single step. It’s guaranteed that the call to compareAndSet will succeed — and return true — if and only if the head variable is still pointing at our expected oldHead at the exact time when the modification takes place. If the current value doesn’t match what we told it we’re expecting, compareAndSet will just return false, and will not modify the value.

A characteristic of the Treiber stack is the infinite retry loop, which simply repeats until the compareAndSet operation succeeds.

At first glance, it can be alarming to see an infinite loop hanging out in a data structure. How can we be sure the loop won’t just run forever?

Well, the retry happens when compareAndSet returns false, which happens if some other thread has changed the head of the stack while we are preparing our modification. If that happens, it means the other thread succeeded with its modification to the stack. So, in every iteration of the loop, there are two possible outcomes. Either our attempt to modify the stack succeeds, or some other attempt to modify the stack succeeds, and ours fails. When a finite number of threads are waiting to modify the stack, they must all eventually succeed.

Technically, in a busy application constantly queuing up new modifications to the stack, we can’t guarantee that any individual thread will make progress. Although the stack as a whole can’t become deadlocked, it’s always possible that one thread could get super unlucky and be constantly beaten out by other threads when trying to add or remove a value. You’d need a lot of threads and some seriously bad luck for that to become a problem, though.

Heads or Tails?

Photo by Jason Leung on Unsplash

There are a few different ways to write the infinite retry loops in the push and pop methods. A simple while(true) loop makes it super obvious what’s going on. If you’re feeling adventurous, you can save a few lines by using a do-while loop, putting the compareAndSet check in the loop boundary condition itself.

In Kotlin, though, my favourite approach is to use recursion. Not only does it minimise the quantity of code, but it makes the algorithm easy to visualise. If the push operation succeeds, we’re done, but if it fails, we just call push again with the same value.

Recursion may be elegant, but it does come with a cost. Each new function call allocates a stack frame, consuming a little memory each time. The total size of the call stack is limited, so after a few hundred modification attempts, this recursive push operation is going to give up with a StackOverflowException. That’s not good!

One solution is to use tail recursion. When the last action in a function is a call to another function, the compiler can apply tail call optimisations to avoid creating extra stack frames. If the tail call — the last action in the function — is a recursive call back to the same function, tail call optimisation lets the compiler effectively rewrite the whole function as a simple loop.

Tail recursion isn’t optimised automatically in Kotlin, but we can opt in by adding the tailrec modifier to a function. With the modifier in place, we can write our lock-free stack mutation operations as recursive functions while having them execute just as efficiently as if they were loops.

With all those puzzle pieces in place, here’s the full Kotlin implementation one more time.

Treiber stacks are a beautiful anomaly in the otherwise hazardous landscape of concurrent data structures.

The idea only works because a stack can be managed by controlling a single piece of mutable state. There’s no such thing as a Treiber queue: you’d need multiple pieces of mutable state and extra code to keep them in sync because a queue can be modified from both ends. Maps and lists, with their complex random access operations, are completely out of reach.

But when all you need is a stack, Treiber’s implementation is a nice reminder that life doesn’t always have to be difficult.

--

--