Using Caches Without Losing Friends: A Short Story
After working at Zippin for the past four years, I have grown to love and hate caches equally. They have led to moments of great satisfaction in what we have built, and to heated discussions with my brilliant colleagues.
This article is a fictionalized account of a story I believe will resonate with many folks working at startups that have made it past the explore stage.
Note: If you are looking for a technical discussion on caching, this isn’t it. I would suggest reading Azure’s guidance on caching and/or FB’s article on why it is a hard problem.
Chapter 1: Coalition
Our story has three friends: Alice, Bob, and Charlie. After a night out, the three friends stumble upon a pop-up lemonade stand, and it hits them: they make the perfect team to turn this pop-up stand into the next big thing in town.
This is what they decide on:
- Alice will be building a mobile app for online ordering
- Bob will be building the web version
- Charlie is good with numbers. His friends call him Charlie the Calculator; he’s the quintessential quants guy that every story needs. He will be responsible for crunching numbers and making the lemonade price available based on his analysis.
To get the idea off the ground quickly, they go with the following:
- Charlie will write the prices to a cache: they go with redis and MySQL (a write-through approach).
- Alice and Bob read prices from redis to ensure minimal latency and good UX for the end users. They load the price every time a user visits the page to ensure the latest price.
const updateLemonadePriceSimple = (newPrice) => {
  cache.update(key, newPrice);
  mysql.insert(newPrice);
};
They had discussed the elephant in the room — cache invalidation — but that was about it. Stuff was working, and there were bigger things to do: Alice wanted to support her app in multiple languages, Bob wanted to learn UX design to improve his website, and Charlie wanted to improve the data modeling.
They did cover their bases by enhancing the code to handle failure scenarios.
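const updateLemonadePriceEnhanced = (newPrice) => {
  mysql.startTransaction();
  try {
    // Write to the database first, then to the cache.
    mysql.insert(newPrice);
    cache.update(key, newPrice);
    // Read the value back to confirm the cache took the write.
    assert(newPrice === cache.get(key));
    mysql.commit();
  } catch (err) {
    // Undo the database write; the cache is left as-is.
    mysql.rollback();
  }
};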
Chapter 2: Treason
One year later, and business is booming. They have expanded to multiple languages and countries; they don’t even remember that stuff they coded a year ago.
Alice calls up Charlie; she wants to support multiple flavors. They see a new addition to redis, redis JSON, and choose to go with it. Bob is out that day, and Alice isn’t patient. She wants to go full steam ahead.
Charlie and Alice decide on the new schema in redis and get it running. It works! Bob doesn’t notice; he trusts Charlie and has no reason not to do so. He has been using Charlie’s prices for the last year without any issues.
A month passes, and the three friends meet to review how their business is doing. And that is when it hits: Bob’s website has been showing old, wrong values for a month now!
*Awkward silence ensues* Bob feels betrayed!
Chapter 3: Introspection
After stages of grief, Bob calms down, and the three friends discuss the following:
- The silent failures were the worst part. Had Bob’s website failed outright, he would have noticed the error early.
- Alice and Charlie moving ahead can seem like bad judgement, but Bob was off that day. They should not be expected to delay because Bob couldn’t do a design review. Moving fast is what made them successful in the first place.
- When it comes to this particular interaction, Charlie is handling an unfair share by owning writes to two components; it increases the onus on him.
- A possible solution would be to enlist William the Worker, who will maintain an offline job to keep the prices across mysql and redis in check. But it is another person on the team, and even if they were to hire him, it means more components, more contracts, more urgh.
Instead, they think it is right to evolve to an API-first approach.
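Before the redesign, the first point in the list above (silent failures) can be made concrete. What follows is a minimal sketch, not the code the three friends wrote: it assumes the writer stores each price together with an updatedAt timestamp, and readFreshPrice, the key name, and the Map standing in for redis are all hypothetical. The idea is that a reader throws when a price is missing, malformed, or older than it is willing to tolerate, so a stale key shows up as an error rather than a wrong number on the page.

```javascript
// Hypothetical guard for a price reader; the story does not show this code.
// Assumes the writer stores { price, updatedAt } rather than a bare number.
const readFreshPrice = (cache, key, maxAgeMs, now = Date.now()) => {
  const entry = cache.get(key);
  if (entry === undefined || entry === null) {
    throw new Error(`no price cached under "${key}"`);
  }
  // A schema change on the writer side surfaces here instead of on the page.
  if (typeof entry.price !== "number" || typeof entry.updatedAt !== "number") {
    throw new Error(`unexpected price shape under "${key}"`);
  }
  const ageMs = now - entry.updatedAt;
  if (ageMs > maxAgeMs) {
    throw new Error(`price under "${key}" is ${ageMs} ms old`);
  }
  return entry.price;
};

// Usage with an in-memory Map standing in for redis.
const demoCache = new Map();
demoCache.set("lemonade:price", { price: 2.5, updatedAt: 1000 });
console.log(readFreshPrice(demoCache, "lemonade:price", 60000, 31000));
```

With a guard like this, a key that nobody writes to anymore would start throwing once it ages past maxAgeMs, which is the "fail outright" behavior Bob wished he had.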
Chapter 4: Evolution
Times have changed. The three friends have now decided to separate their personal and professional lives and get some contracts in place. Now when they walk past each other in the hallway, they have that awkward smile.
- They have moved to an API-first design, where they handle different parts.
- Charlie maintains the API to write to MySQL, and that’s all this API does.
- Charlie also maintains the API to read data, which Bob and Alice use. Bob can use /v1, and Alice switches over to /v2, which has the capability of price per type of lemonade.
- Bob and Alice now have complete independence from each other.
- Alice realizes the app users need more frequent updates, so she mentions a TTL of 30 seconds. Bob keeps it to a minute.
- They can share the cache to split costs. Even though there is repeated data, only machines read it, so it is not a problem. The ground truth is always Charlie’s API.
- TTL and cache invalidation are now up to the consumer: Let’s say Erica wants to launch an app for tablets. She can use the same API, but she can choose her TTL according to her needs.
- Charlie charges per API call, which has incentivized caching for Alice and Bob.
- They must expire keys as a function of their requirements and caching costs. As responsible business owners, it is up to them to implement effective invalidation strategies to keep cache costs low.
- Charlie can move to MySQL/postgreSQL/whatever if he feels like it, but he must keep the APIs running! Anytime the API fails, Bob and Alice charge him.
- We can now chuck out redis (let’s say it fails); Alice and Bob can now still serve prices, albeit with a higher latency.
- They might even realize that one of them doesn’t even need a cache; the users are okay with their prices being displayed in 1s instead of 1ms.
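The consumer side of this arrangement can be sketched roughly as follows. This is an illustration under stated assumptions, not the friends' actual code: makePriceReader, fetchPrice, the namespace prefix, and the in-memory stand-ins are all hypothetical, and the calls are synchronous only to keep the example short (a real client would call Charlie's API and redis asynchronously). It shows a consumer-chosen TTL, one key prefix per consumer in a shared cache, and a fallback to the API whenever the cache is unavailable.

```javascript
// Hypothetical consumer-side price reader; every name here is illustrative.
// fetchPrice stands in for a call to Charlie's read API (the ground truth).
// cache stands in for the shared redis and may throw if redis is down.
const makePriceReader = ({ fetchPrice, cache, ttlMs, namespace, now = Date.now }) => {
  return (flavor) => {
    // Each consumer keeps its own keys, so TTL choices stay independent.
    const key = `${namespace}:price:${flavor}`;
    try {
      const hit = cache.get(key);
      // Serve from cache only while the entry is younger than this consumer's TTL.
      if (hit !== undefined && now() - hit.storedAt < ttlMs) {
        return hit.price;
      }
    } catch (err) {
      // Cache unavailable: fall through to the API (slower, still correct).
    }
    const price = fetchPrice(flavor);
    try {
      cache.set(key, { price, storedAt: now() });
    } catch (err) {
      // Failing to cache is not fatal; the cache is a nice-to-have.
    }
    return price;
  };
};

// Usage with in-memory stand-ins and a controllable clock.
let apiCalls = 0;
const fakeApi = (flavor) => {
  apiCalls += 1;
  return flavor === "mint" ? 3.5 : 2.0;
};
const store = new Map();
const fakeCache = { get: (k) => store.get(k), set: (k, v) => store.set(k, v) };
let clock = 0;

// Alice picks 30 seconds; Bob would pass 60000 and his own namespace.
const alicePrice = makePriceReader({
  fetchPrice: fakeApi, cache: fakeCache, ttlMs: 30000, namespace: "alice", now: () => clock,
});
console.log(alicePrice("mint")); // first call goes to the API
clock = 10000;
console.log(alicePrice("mint")); // still within the TTL, served from cache
```

Because Charlie charges per API call, the TTL becomes a cost dial: a longer TTL means fewer billed calls and staler prices, and each consumer can turn it without asking anyone else.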
Eventually, Bob also supports various flavors and moves to /v2; and then they ̶l̶i̶v̶e̶ ̶h̶a̶p̶p̶i̶l̶y̶ ̶e̶v̶e̶r̶ ̶a̶f̶t̶e̶r̶ move on to the next problem.
But most importantly, they keep their friendship.
That was a tear-jerker. Here are some final thoughts:
- Context matters: Most problems are solved, but what matters is solving the problem for your organization. While the last solution did solve the problem, like every design, it is not perfect. It might be overkill depending on where you are in your journey. Contractual obligations like the one in the solution don’t make sense if your company isn’t even making any money or has no customers to care about out-of-sync lemonade prices.
- Cache is a privilege, not a right: Caches are expensive. Producers and consumers should code assuming the cache is a nice-to-have and can go away anytime.
In the initial days of a startup, this is not a big deal. But as your business takes off, cache costs increase, and you suddenly find yourself cleaning up dependencies on the cache to cut costs. You might even be caching data you don’t need to, just because that’s how things were done.
- Treat FAANG as a guide, not a direction: I love the blogs the biggies publish; it is a great service they provide to the rest of the community. However, don’t just go building it out. Contextualize it.
Yes, your friend, software engineer level Louis XVIII at a FAANG company, did tell you that they have a 1,000-layered caching stack, but as with the point on context, they are at the stage where they need to do so. You most probably don’t need to do so, not yet anyway.
- Avoid being blinded by shiny objects: In a world where we are running out of names to name DBs, databases like MySQL are easy to forget. In my FOMO to always be up to speed with all the shiny new objects, I took this awesome piece of technology for granted.
That’s because I can take MySQL for granted; it works and has worked for all these years. While I was exploring all kinds of DBs, Dave the dependable DB admin showed up to work every day, ensuring MySQL was up; I just didn’t value it.
- Most importantly, everyone (including non-engineers) understands MySQL. In a small organization, not needing to train people is a huge plus that should be recognized.
Most importantly, don’t lose your friends over such things. If you must lose them, choose battles that matter.
Come say hello!
If you liked this, you’d like saying hello to me and my colleagues at Zippin, where we are solving (among other problems) the problem of data consistency for IoT devices across the globe.