Aman's AI Journal • Consistent Hashing

Overview

Horizontal scaling requires efficient and even distribution of requests and data across servers.
Consistent hashing is a widely used technique for achieving efficient distribution.
The problem of distributing requests and data evenly across servers is examined in detail.
The rehashing problem arises when attempting to balance the load across multiple cache servers using a hash method.
The hash method involves calculating the hash of a key and using modulo division to determine the server index.
An example scenario is presented, with 4 servers and 8 string keys along with their corresponding hashes.

The approach of using hash(key) % N for load balancing works well when the server pool size is fixed and data distribution is even.
Challenges arise when servers are added or removed from the pool, causing changes in the server pool size.
When a server goes offline (e.g., server 1 in this example), the server pool size decreases, leading to different server indexes for keys with the same hash value.
Applying the hash modulo operation (% 3 in this case) to the keys results in different server indexes, as shown in Table 5-2.

Certainly! Here are the summarized bullets with the headings included:

Consistent hashing is a special kind of hashing technique that minimizes key remappings during hash table resizing.
When consistent hashing is used, only k/n keys need to be remapped on average, where k is the number of keys and n is the number of slots.
Traditional hash tables typically require remapping nearly all keys when the number of array slots changes.

Consistent hashing utilizes a hash space and a hash ring to distribute keys across servers.
The hash function (e.g., SHA-1) produces hash values within a defined range known as the hash space.
The hash space ranges from 0 to 2^160 - 1 in the case of SHA-1, where each hash value corresponds to a point on the hash ring.
The hash values are uniformly distributed across the hash ring, ensuring even distribution of keys.

Figure 5-3: Hash space and hash ring

Figure 5-3 illustrates the hash space and the distribution of hash values on the hash ring, showing how keys are distributed among servers using consistent hashing.

To map servers onto the hash ring, the same hash function f is used, typically based on the server’s IP address or name.
The hash function generates a hash value for each server, which determines its position on the hash ring.
Figure 5-5 visually represents the mapping of four servers onto the hash ring, showcasing their respective positions in the distributed system.

When a server is removed, only a small fraction of keys require redistribution with consistent hashing. In Figure 5-9, when server 1 is removed, only key1 must be remapped to server 2.
The rest of the keys are unaffected.

Wrap up

Consistent hashing offers several benefits, including minimizing key redistribution when servers are added or removed, facilitating horizontal scaling, and mitigating hotspot key issues.
It is widely used in various real-world systems, such as Amazon’s Dynamo database, Apache Cassandra, Discord chat application, Akamai content delivery network, and Maglev network load balancer.