Back to top

Consistent Hashing

Quick Video Glance

Develop intuition

While playing cricket, it’s important that players don’t miss catching the ball on the field as it may cost them losing the game. Let’s assume that a team has a catching practice session where players are placed on a circle, and a machine throws balls on the circle’s boundary. The players have to follow the guideline that the one who is closest to the landing place of the ball(in a clockwise direction) is supposed to catch the ball. This should hold true even if someone goes for a drink break

Fig 0 : Consistent Hashing - A cricket analogy

Using this analogy, we can develop an intuition of consistent hashing by treating players as servers and the balls which the players are supposed to catch as web-requests. The guideline of catching the closest ball in a clockwise direction can be referred to as Consistent Hashing. Now, let’s try to understand the need for consistent hashing.

In traditional hash tables, objects are mapped using their identifiers to a specific value by a hash function, which is then used to store the object in any one of hash slots, as shown in Fig 1 below.

Fig 1: Hash Function generating hashes of multiple objects

Any change in the number of hash slots causes nearly all the keys to be remapped, which can be a costly operation. In Fig _2 _, we have shown how all the keys are remapped to the modified hash slots (i.e., after being changed from 7 to 6).

Fig 2: All the objects get remapped when the number of hash slots change

However, using consistent hashing whenever the hash table is resized only a small portion of the keys are remapped. Consistent hashing maps objects to the same cache machine, as far as possible. It means when a cache machine is added, it takes its share of objects from all the other cache machines, and when it is removed, its objects are shared among the remaining machines. The main idea behind the consistent hashing algorithm is to associate each cache with one or more hash value intervals where the interval boundaries are determined by calculating the hash of each cache identifier. If the cache is removed, its range is taken over by a cache with a next interval. All the remaining caches are unchanged.

Historical background

The term “consistent hashing” was introduced in an academic paper from 1997 as a way of distributing requests among a changing population of Web servers. The authors of this paper founded Akamai Technologies a year later, which gave birth to the content delivery network industry.

Consistent Hashing Explained

We use consistent hashing to distribute the objects across multiple nodes. The underlying hashing algorithm map the objects and the nodes in the same circular search range using a hash-function, as shown in the image below.

Fig 3: Circular search range for hashing keys and nodes

The nodes are mapped on this circular range by hashing the

nodeIds in the search range and then doing a mod over the search range. For instance, let’s say the nodeId is n1 and we are using a hash function h1(k) over the search range 0…n-1 , then the value on the circular search range on which the node n1 will be mapped to will be: h1(n1) % n . Let’s consider we have five nodes (n1 to n5), and we use the same logic to map them over a circular search range, and finally we map all the nodes as shown in Fig 4 below.

Fig 4: Nodes mapped over the circular search range

Now we need to distribute objects across these nodes using the same logic for mapping the nodes over the circular ring. Let’s assume we have five objects (o1…o5) which we distribute over the same ring in Fig 5 using the function: h1(o1) % n.

Fig 5: Nodes and objects mapped over the same circular search range

In Fig 5 , we can see that both the nodes and the objects are mapped over the same circular range in a way that object o5 is mapped to node n1, o1 to n2, o2 to n3, o3 to n4, o4 to n5 and o5 to n1 is a clockwise fashion. It seems from this example that the objects are uniformly distributed over the nodes. However, things get tricky when nodes go down (the significant advantage of consistent hashing) as the distribution of objects may get skewed.

Let’s assume that nodes n2 and n3 goes down. In such a scenario, objects o1, o2 , and o3 are re-mapped to n4, o4 to n5 and o5 to n1 , leading to a skewed distribution as shown in Fig 6 below.

Fig 6: Skewed distribution of objects when n2 and n3 dies

There are several ways to solve the problem of a skewed distribution of objects across nodes. One possible way is to use multiple hash functions to map nodes on the circular ring resulting in multiple instances of a node on the ring. Another approach can be to replicate the objects across nodes, as explained in the distributed caching implementation below.

Consistent Hashing Applications and Simulations

In this section, we have provided a brief overview of some well-known applications of consistent hashing along with their simulations.

Load Balancing

Load Balancers are responsible for distributing the user requests amongst application servers. In Fig 7, we have shown the role of load balancers to help our readers develop an intuition of it. We will cover this topic in detail in one of our later chapters.

Fig 7: Load Balancer distributing users requests amongst servers

Brain Exercise

Let’s assume you are designing a system which can distribute the requests amongst multiple servers. What logic will you apply to distribute the requests amongst the servers?

**We recommend you to think of a solution to the brain exercise before reading further.**

There are multiple ways of distributing the load amongst servers. Consistent hashing is a commonly used approach in load balancers using which web requests are distributed across servers. Consistent Hashing is applied by mapping the requestIds of web-requests and the serverIds of the application servers on the same circular search range. When a request comes to the load balancer it uses the unique identifier of the web-requests to find the location of the request on the edge of the circle; after that, the system walks around the ring until it encounters a server. In case a server becomes unavailable, then the requests which would have mapped to that server get redirected to the next server on the circle. When a server is added a similar process gets executed, and the requests which were initially directed to the server at the next higher angle will start getting directed to the new server. Sample code for simulating a load balancing system using consistent hashing can be found here

Distributed Caching

Distributed Cache is an extension of cache which spans multiple servers so that it can scale horizontally. In Fig 8, we have shown distributed cache which spans over multiple servers and communicate amongst each other to attain high availability and reliability.

Fig 8: Distributed servers used for caching

Brain Exercise

How will you distribute the objects across multiple servers? What will you do to attain high availability?

**We recommend you to think of a solution to the brain exercise before reading further.**

Often, we use consistent hashing to enable distributed caching where the caches are spread across multiple hosts. We create replicas of a cache on multiple servers so that the cache can be retrieved even if one of the servers fail. There are numerous ways of selecting the servers on which cache replicas can be stored. In Fig _9 _, we have shown one way of storing replicas by creating virtual nodes using multiple hash functions which map over the same circular range. In the image below, n1 is mapped using the hash function h0 and n1’ is mapped using_h1 _. Similarly, n2 is mapped using the hash function h0 and n1’ is mapped using h1 and so forth.

Fig 9: Creating virtual nodes for storing replicas

Another simple approach to create replicas of the cache can be to choose the next k (example 2) successors of the server (on which the request was initially directed) to do so. A sample code which simulates this approach can be found here . However, there can be more robust strategies for replicating caches; one such method can be the usage of finger table as in Chord protocol .

References

  1. https://en.wikipedia.org/wiki/Consistent_hashing
  2. http://www.tom-e-white.com/2007/11/consistent-hashing.html