Back to top

CAP Theorem

Quick Video Glance

Develop intuition

When you go to a restaurant, there’s always a possibility that the food doesn’t taste good. The probability of which is lesser in a well-known restaurant owned by renowned chefs like Gordon Ramsay, in comparison to fast-food chains. This may come at the cost of waiting in long queues at Gordon Ramsay’s restaurant, which may not be the case with fast food joints, as shown in the image below.

Fig 0 : Types of Restaurants - CAP in real world

We can use the same analogy to develop an intuition for CAP Theorem, which is based on three essential aspects of a distributed system

  1. Consistency: Every read receives the most recent write
  2. Availability: Every request received by a functional node must result in a response
  3. Partition Tolerance: The distributed system should continue to operate despite communication breakages that separate the network into clusters which are unable to communicate with each other.

CAP Theorem states that it’s impossible for a distributed system to offer more than two out of the three aspects mentioned above. Similar to the fact that food can taste bad at any restaurant, there’s always a possibility of network clusters not able to communicate with each other in a distributed system. Based on how long you are ready to wait for your food, you can make the choice to go between Gordon Ramsay’s restaurant (where you may have to wait for a long time) and the fast-food joint (where you will get the food almost instantly). On the same lines, while designing distributed systems, one has to choose between consistency and availability based on the business requirements, more details will be covered in later sections.

Historical Background

CAP theorem was proposed in the field of theoretical computer science by famous computer scientist Dr. Eric Brewer at the 2000 Symposium on Principles of Distributed Computing (PODC). During the time, the size of data was growing immensely, and the existing ACID databases weren’t able to scale to satisfy that demand. This led to the growth of a new paradigm called BASE (basically available, soft-state, eventual consistency). Brewer analyzed the paradigm changes and its implications and presented his findings, marking the inception of CAP Theorem.

Current State

It’s a matter of the fact that no distributed system is safe from network failures, network partitioning has to be tolerated, and we have to make the trade-off between consistency and availability. The resulting system isn’t entirely consistent or available - but a combination which is reasonable for specific business needs. It implies that when a system chooses consistency over availability, the system will return an error or a time-out if particular information cannot be guaranteed to be up to date due to network partitioning. When choosing availability over consistency, the system will always process the query and try to return the most recent available version of the information, even if it cannot guarantee it is up to date. We have listed below some real-world applications of CAP Theorem to make it more intuitive for our readers.

Making Trade-off decisions - Flight Booking System

Imagine you need to build a distributed system which enables customers to book flight tickets. In Fig. 1 below, we have Alice and Bob who are in separate geographical locations(Alice is in New York and Bob in Sydney) and both of them want to book the last flight ticket using your distributed system. As they are on different geographical locations, their requests are directed to different nodes on your system(Alice’s request to the Miami node and Bob’s to the Beijing node). For ensuring consistency when Alice wants to book the ticket, then the Miami node needs to communicate with the Beijing node, and both nodes must agree on the serialization of the requests. It provides consistency; however, in the case of any network breakage, both the nodes won’t be able to book the flight ticket, sacrificing availability.

Fig 1 : Making Trade-off decisions - Flight Booking System

Brain Exercise

What will you choose in the Flight Booking System: Consistency Or Availability? Why?

We recommend you to think about the solution of the brain exercise before reading further.

We may want to prefer availability over consistency in a flight booking system to provide a better experience to our customers. To gain more availability, we might allow both the nodes to keep accepting flight reservations even if the communication line breaks. The worst possible outcome of this approach is that Alice and Bob both will end up making the flight reservation. However, such situations can be resolved using domain knowledge. It’s a pretty common occurrence that the flights are overbooked and then the flight companies address such cases by taking the appropriate measures (e.g., Refunds, moving to another flight, etc.).

Partition Recovery - E-commerce shopping cart

Often on e-commerce websites, sometimes couples share the same customer account. Alice and Bob are a couple who are in different geographical locations(Alice is in London, and Bob is in Boston), and they are adding items on an e-commerce website’s shared account’s shopping cart(shown in Fig 2 below). As they are in separate geographical locations, Alice’s cart exists on the London node, and Bob’s on the New York node of the website’s distributed database system. Let’s consider a scenario when a network failure occurs in the distributed system, and the two nodes(i.e., London and Boston) are no longer able to communicate with each other. It results in a situation where Alice’s cart doesn’t show the items added by Bob and vice versa(aka network partitions).

Fig 2 : Handling Network Partitions : E-commerce Carts

Brain Exercise

If you were building this e-commerce system, will you allow customers to keep adding items on the cart when the network failure occurs? How will you handle failure scenarios related to network partitions?

We recommend you to think about the solution of the brain exercise before reading further.

If you are designing an e-commerce application, you probably don’t want to restrict your customers from adding items to their cart as it will deteriorate the customer experience and the business may also end up losing money. There are several ways to recover from such network partition scenarios. One possible solution can be to trigger the checkout process only after ensuring that the partition has recovered and then merge the items from Alice’s and Bob’s cart in a single cart as shown in Fig _2 _. If you are curious to learn more techniques of partition recovery, then we recommend you to watch this talk by Dr. Eric Brewer.

Conclusion

Often, the situations are tied to the domain and requires domain knowledge to determine the resolution. For a news media, we can tolerate old pages for minutes; however, in case of financial trading instruments, we can’t rely on stale data. This decision needs the involvement of the development team and the domain experts. CAP Theorem is often used (sometimes abused) when talking about consistency in distributed systems. However, the significant trade-off we make is actually between consistency and latency. Consistency can be improved by involving more nodes in the interaction at the cost of adding more time engaged in the communication. It implies that we should think of availability as the maximum limit of latency that can be tolerated, once that limit is breached, the system is deemed as unavailable.

References

  1. https://www.infoq.com/cap-theorem/
  2. https://towardsdatascience.com/cap-theorem-and-distributed-database-management-systems-5c2be977950e
  3. https://www.amazon.com/NoSQL-Distilled-Emerging-Polyglot-Persistence/dp/0321826620
  4. https://fenix.tecnico.ulisboa.pt/downloadFile/1126518382178117/10.e-CAP-3.pdf