A Thorough Introduction to Apache Kafka
A deep dive into a system that serves as the heart of many companies’ architecture

⏳ TL;DR?
This article is a 10-minute read.
I have condensed it to a fifth of the length here 👉
— ✅ A 2 Minute Introduction to Apache Kafka.
I suggest you read both, but start with the short one. That way, you will have an idea of the concepts when you read through this one.
It is part of my newsletter “2 Minute Streaming”, where I write once a week about Kafka in 2-minute reads to 1500+ subscribers.
Introduction
Kafka is a name you hear a lot nowadays. Many leading digital companies seem to use it. But what is it, actually?
Kafka was originally developed at LinkedIn, open-sourced in 2011, and has improved a lot since then. Nowadays, it’s a whole platform that lets you do three things at the same time: redundantly store absurd amounts of data, run a message bus with huge throughput (millions of messages per second), and apply real-time stream processing to the data that flows through it.
That is all well and good, but stripped down to its core, Kafka is a distributed, horizontally scalable, fault-tolerant commit log.
Those were some fancy words. Let’s go through them one by one and see what they mean. Afterwards, we’ll dive deep into how it works.
Distributed
A distributed system is one that is split across multiple running machines, all of which work together in a cluster to appear as one single node to the end user. Kafka is distributed in the sense that it stores, receives, and sends messages on different nodes (called brokers). I’ve written a thorough introduction to distributed systems as well.
The benefits of this approach are high scalability and fault tolerance.
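To make "many machines that appear as one single node" a bit more concrete, here is a minimal Python sketch. It is a toy model, not Kafka's API: the names Broker, Cluster, send, and read_all are hypothetical, and the round-robin placement is an assumption made purely for illustration, not how Kafka actually decides where a message goes.

```python
from dataclasses import dataclass, field


@dataclass
class Broker:
    """Hypothetical stand-in for one node; it only keeps messages in memory."""
    broker_id: int
    messages: list[str] = field(default_factory=list)


class Cluster:
    """Toy facade: callers talk to the cluster, never to an individual broker."""

    def __init__(self, broker_count: int) -> None:
        self.brokers = [Broker(i) for i in range(broker_count)]
        self._next = 0  # round-robin cursor (illustrative assumption only)

    def send(self, message: str) -> int:
        """Store the message on the next broker and return that broker's id."""
        broker = self.brokers[self._next % len(self.brokers)]
        broker.messages.append(message)
        self._next += 1
        return broker.broker_id

    def read_all(self) -> list[str]:
        """Collect messages from every broker, broker by broker."""
        return [m for b in self.brokers for m in b.messages]


cluster = Cluster(broker_count=3)
# The caller uses a single send() call; the spread across brokers is hidden.
placements = [cluster.send(f"event-{i}") for i in range(6)]
```

The caller only ever sees the cluster object, while the six messages end up spread evenly over the three brokers. That hidden spread is what lets a system like this grow by adding machines.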
Horizontally scalable
Let’s define the term vertical scalability first. Say, for instance, you have a traditional database server that’s…