Why Every Software Developer Needs to Learn Chaos Engineering
Chaos engineering is the discipline of experimenting on a software system in production

As a technologist responsible for managing large-scale systems, you always need to think about the continuity, security, and manageability of the system.
Recently I have been spending a lot of time trying to understand how to put together a proper Disaster Recovery (DR) plan and Business Continuity Plan (BCP) for a large distributed system.
My quest to learn more about the topic introduced me to a branch of engineering that every system should adopt: Chaos Engineering!
I ended up reading a free O’Reilly book called Chaos Engineering: Building Confidence in System Behavior through Experiments. I had been aware of Netflix’s Chaos Monkey for many years, but I had always dismissed it on the grounds that I was not building another Netflix. Reading this book changed my mind.
The book starts with a very interesting definition of chaos engineering —
“Chaos Engineering is the discipline of experimenting on a distributed system to build confidence in the system’s capability to withstand turbulent conditions in production.”
— Principles of Chaos
Another very simple, yet meaningful definition from Gremlin is:
“Breaking things on purpose to build more resilient systems!”
I never thought chaos could be used to bring discipline.
In a distributed system, you will never be able to prevent every failure, but you can always be prepared for failures. As you experiment on your infrastructure, you get to know the weaknesses of the system and can plan better to address them.
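To make the idea of "experimenting" concrete, here is a minimal sketch of a chaos experiment in Python. It is a toy simulation, not a real tool: the Replica class, call_with_failover, and run_experiment are hypothetical names invented for this illustration, and a real experiment would target actual infrastructure rather than in-memory objects. The shape is what matters: measure a steady state, break something on purpose, and check whether the steady state still holds.

```python
import random


class Replica:
    """Toy stand-in for one instance of a service (hypothetical)."""

    def __init__(self, name):
        self.name = name
        self.healthy = True

    def handle(self, request):
        if not self.healthy:
            raise ConnectionError(f"{self.name} is down")
        return f"{self.name} served {request}"


def call_with_failover(replicas, request):
    """Try each replica in order and return the first successful response."""
    for replica in replicas:
        try:
            return replica.handle(request)
        except ConnectionError:
            continue
    raise ConnectionError("all replicas failed")


def success_rate(replicas, n_requests=100):
    """Fraction of requests that get a response (our steady-state metric)."""
    ok = 0
    for i in range(n_requests):
        try:
            call_with_failover(replicas, f"req-{i}")
            ok += 1
        except ConnectionError:
            pass
    return ok / n_requests


def run_experiment(replicas, rng, threshold=0.99):
    """Kill one random replica and check that the success rate stays above threshold."""
    baseline = success_rate(replicas)      # 1. measure the steady state
    victim = rng.choice(replicas)
    victim.healthy = False                 # 2. inject the failure on purpose
    try:
        during = success_rate(replicas)    # 3. measure again under failure
    finally:
        victim.healthy = True              # 4. always restore the system
    return {
        "victim": victim.name,
        "baseline": baseline,
        "during_failure": during,
        "hypothesis_held": during >= threshold,
    }


if __name__ == "__main__":
    replicas = [Replica("a"), Replica("b"), Replica("c")]
    print(run_experiment(replicas, random.Random(42)))
```

With three replicas the hypothesis holds, because failover hides the loss of any single instance. Run the same experiment against a list containing a single replica and it fails, which is exactly the kind of weakness such an experiment is meant to expose.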
By now, you must be wondering: how is chaos engineering different from testing? Let’s try to understand this better.
Testing vs. Chaos Engineering
Most of the time, a good testing plan talks about load testing, security testing, and…