What Is the CAP Theorem and Where Is It Used?

Consistency, Availability, Partition Tolerance

Ankit
Better Programming

--

Photo by Michael Dziedzic on Unsplash.

The CAP theorem, also known as Brewer’s theorem, states that it is impossible for any distributed database system to provide more than two of the following properties together:

  • Consistency
  • Availability
  • Partition tolerance
CAP theorem

With the advances in parallel processing and distributed systems, it is more common to expand horizontally or have more machines, and the CAP theorem is the backbone of such architecture. Let’s explore the characteristics of the CAP theorem in detail.

Consistency

A consistent system is one in which all nodes see the same data at the same time. In other words, if we perform read operations after multiple write operations, then a consistent system should return the same value for all the read operations and the most recent write operation.

Note that consistency, as defined in the CAP theorem, is quite different from the consistency guaranteed in ACID database transactions.

Availability

A highly available distributed system is one that remains operational 100% of the time. Every request made should be accepted and receive a (non-error) response. Note: It is not necessary for the response to contain the most recent write value (i.e. the system does not need to be consistent, but it should be available all the time).

Partition Tolerance

It states that a system should continue to run even if the connection between nodes delays or breaks. Note: This doesn’t mean nodes have gone down. Nodes are up but can’t communicate.

Let’s say that we two nodes (N1 and N2) and both are connected. Now assume that the network connecting both the nodes goes down (network gets partitioned). Both nodes N1 and N2 are up and running fine, but the updates happening at node N1 can no longer reach node N2 and vice versa.

Partition tolerance is more of a necessity than an option in modern distributed systems, hence we cannot avoid the “P” in CAP. So we have to choose either consistency or availability.

Available Combinations

  1. CP (consistency and partition tolerance): Data is consistent among all the nodes and the nodes maintain partition tolerance. The de-sync node will not accept any request. Some requests will be dropped instead of returning inconsistent (the most recent) information. Let’s say we have two nodes and one of the nodes (N2) stops servicing read/write requests upon detecting that the network connecting the system got partitioned. This means that system treats node N2 as no longer available and any request on this node will be rejected, as it will end up returning the old/stale copy of the data. Hence, this kind of system provides consistency and partition tolerance. Examples: Google Bigtable, Hbase, MongoDB, MemcacheDB, Redis.
  2. AP (availability and partition tolerance): The distributed system is highly available and the system is partition-tolerant, so every read request will not guarantee the most recent information but will be processed.
    Examples: Voldemort, SimpleDB, CouchDB.
  3. CA (Consistency and availability): This is not possible in any distributed architecture. It can only be found in some non-scalable monolithic architecture.

Further Reading

Google Docs and Gmail enforce consistency at the cost of availability. The results you will see across the devices will be the same/consistent.

Google searches, Twitter, and YouTube focus on availability and relaxed consistency — hence the results you see will depend on the state of the server that responds to your request, and different servers can have inconsistent states. This is why you sometimes see inconsistent results on YouTube or Twitter in terms of likes or views.

--

--