


Applications use caching to achieve high throughput and low latency: frequently accessed data is fetched from fast in-memory storage rather than from a slower traditional database. Caching is also used to store the results of complex computations, which can be time-consuming.

Caches generally store data in faster storage media (typically RAM, sometimes SSDs), which enables quicker data retrieval. Caches are used in a variety of technologies such as operating systems, DNS, CDNs, gaming, and so forth. Their usage is prevalent in popular web applications such as Google, Amazon, and Netflix as well. Some of the significant advantages of caching are listed below.

  • Reduction in latency: Serving a request from a fast cache avoids a network call to the slower database. Caches typically run from memory (or fast SSDs), whereas database solutions often run on commodity hardware backed by slower disks.

  • Lower load on the database: Most requests are served by the cache, reducing traffic to the application database. This also improves the performance of the database itself.

  • Preventing redundant operations: We can cache the results of operations that are executed redundantly for each request, such as calling other web services with the same input or performing complicated, time-consuming computations.

Concepts of Caching

Cache Hit vs Cache Miss

Fig 1: Data Flow in Caching

The sequence diagram above shows the role the cache plays when a user issues a GET request. The request is forwarded to the cache, which results in one of two scenarios.

  • Cache Hit: The requested element exists in the cache, and the cache returns the cached value.

  • Cache Miss: The element isn't present in the cache, so in most cases the cache loads the data from the underlying data store, populates itself, and returns the value.

In an ideal scenario, we want to maximize cache hits and minimize cache misses. However, if the dataset is too large, storing all of it in the cache can be prohibitively costly. This led to a multitude of cache-eviction policies, each catering to different caching needs.
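The hit/miss flow described above can be sketched with a small in-process cache. This is a minimal illustration, not a production implementation; the `ReadThroughCache` name and the backing-store function are illustrative assumptions.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;

// Minimal cache-aside read path: on a miss, load the value from the
// backing store, populate the cache, and return it.
class ReadThroughCache<K, V> {
    private final Map<K, V> cache = new HashMap<>();
    private final Function<K, V> backingStore; // stands in for the database
    int hits = 0, misses = 0;

    ReadThroughCache(Function<K, V> backingStore) {
        this.backingStore = backingStore;
    }

    V get(K key) {
        V value = cache.get(key);
        if (value != null) {            // cache hit: serve from memory
            hits++;
            return value;
        }
        misses++;                       // cache miss: load and populate
        value = backingStore.apply(key);
        cache.put(key, value);
        return value;
    }
}
```

A second lookup for the same key is then served entirely from the cache, without touching the backing store.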

Cache Eviction Policies

The use of caching solved some problems, but it came with its own caveats. Caches rely on costlier storage (RAM or fast SSDs), so it's important to store only the data that is required, to minimize the incurred cost. Eviction algorithms came into the picture as a way of deciding which elements to evict when the cache is full.

Choosing the right eviction policy is imperative and depends heavily on the application's needs: an incorrect choice leads to frequent cache misses, resulting in wasted network calls to the cache. One of the most commonly used eviction policies is the Least Recently Used (LRU) policy. A cache with the LRU eviction policy uses a list to track the order in which elements are added to or accessed in the cache. When an element is added or fetched, it is moved to the head of the list. If a new element is added to a full cache, the tail element of the list is deleted from the cache, creating space for the new element, which is then added to the head of the list, as shown in the image below.

Fig 2: LRU Eviction Policy in action
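One way to sketch this behavior in Java is with `LinkedHashMap` in access-order mode, which maintains exactly the list described above (most recently accessed at one end, least recently accessed at the other); the `LruCache` class below is an illustrative sketch, with an assumed capacity of 2 in the usage example.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// LRU cache built on LinkedHashMap's access-order mode: every get/put
// moves the entry to the most-recently-used end, and the eldest (least
// recently used) entry is evicted once the capacity is exceeded.
class LruCache<K, V> extends LinkedHashMap<K, V> {
    private final int capacity;

    LruCache(int capacity) {
        super(16, 0.75f, true); // accessOrder = true enables LRU ordering
        this.capacity = capacity;
    }

    @Override
    protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
        return size() > capacity; // evict the tail when over capacity
    }
}
```

For example, with capacity 2, after `put("a")`, `put("b")`, `get("a")`, a `put("c")` evicts "b", since "a" was touched more recently.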

Some of the other popular cache eviction policies are listed below.

  • First-In-First-Out (FIFO): Evicts cache entries in the order in which they were added (oldest first).
  • Last-In-First-Out (LIFO): Evicts cache entries in the reverse of the order in which they were added (newest first).
  • Least Frequently Used (LFU): Discards the cache entries that are used least frequently.

Cache Write Policies

A common issue with caching is inconsistency between the cache and the source of truth, which occurs when data is modified in the database but not invalidated in the cache. To resolve such data-inconsistency issues, cache-write policies were formulated, each catering to specific application needs.

Scenario – I: Inconsistency can be tolerated to some extent

We can use the write policies listed below in cases where inconsistency can be permitted to some extent (a.k.a. eventual consistency), such as customer reviews of a product.

  • Write-Through Cache: Writes are made first to the cache and then to the database, so there is a window (after the data is written to the cache but before it reaches the database) when the two data sources are inconsistent. The drawback of this approach is that the data must be written in two places for the write to be considered successful.
  • Write-Behind Cache: In this scheme, a write is considered successful once it is made to the cache alone. The writes are then forwarded from the cache to the database under specific conditions or after a particular time interval. Its advantages over the write-through cache are reduced latency (writes need not reach two data sources synchronously) and a lower load on the database. However, it carries the risk of data loss if the cache crashes before committing the updates to the database.
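The two policies above can be contrasted in a few lines of code. In this sketch the database is simulated with a plain map, and the write-behind flush is triggered manually; in a real system the flush would run on a timer or buffer threshold, and the class and field names here are illustrative assumptions.

```java
import java.util.HashMap;
import java.util.Map;

// Sketch of two cache-write policies against a simulated database.
class WritePolicies {
    final Map<String, String> cache = new HashMap<>();
    final Map<String, String> database = new HashMap<>();
    final Map<String, String> dirty = new HashMap<>(); // pending write-behind entries

    // Write-through: the write succeeds only after BOTH stores are updated.
    void writeThrough(String key, String value) {
        cache.put(key, value);
        database.put(key, value);
    }

    // Write-behind: only the cache is updated now; the database is
    // written later, when flush() runs.
    void writeBehind(String key, String value) {
        cache.put(key, value);
        dirty.put(key, value);
    }

    // Forward buffered writes to the database (a timer would call this).
    void flush() {
        database.putAll(dirty);
        dirty.clear();
    }
}
```

The write-behind risk mentioned above is visible here: any entry still in `dirty` when the cache process dies never reaches the database.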

Scenario – II: No compromise on consistency can be tolerated

There are use cases (e.g., financial transactions) where consistency is of utmost importance, and for them we may want to use write-around caching.

  • Write-Around Cache: In this policy, consistency is guaranteed by bypassing the cache during writes: data is written directly to the database, and the corresponding cache entry is invalidated. However, this comes at the cost of high latency for reads of recently written data, as such reads result in a cache miss and the data must be fetched from the slower back-end storage.
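Write-around can be sketched the same way; note that updating the database invalidates (rather than updates) the cached entry, so the next read is a deliberate miss. The class name is illustrative.

```java
import java.util.HashMap;
import java.util.Map;

// Write-around: writes bypass the cache entirely; any stale cached copy
// is invalidated so the next read misses and reloads from the database.
class WriteAroundCache {
    final Map<String, String> cache = new HashMap<>();
    final Map<String, String> database = new HashMap<>();

    void write(String key, String value) {
        database.put(key, value); // write goes straight to the source of truth
        cache.remove(key);        // invalidate, rather than update, the cached entry
    }

    String read(String key) {
        // cache miss falls through to the database and repopulates the cache
        return cache.computeIfAbsent(key, database::get);
    }
}
```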

Caching Solutions

Caches can exist at all levels of an architecture but are often found at the level nearest to the front end, where they can return data quickly without taxing downstream layers. The distinction lies in where we place the cache: closer to the application server (local caching) or closer to the database (distributed caching).

Fig 3: Placing the cache between server and database

Local Caching

Some applications need extremely low latency (e.g., the landing page of a big dot-com website), which requires caching the required information on individual hosts of the application to avoid network calls for fetching it. The general practice with such local caches is to instantiate the cache when the application boots up, then populate and maintain it in real time as the application serves requests, resulting in a system similar to the image below.

Fig 4: LocalCache interaction model

The limitation of local caching is that the data to be cached must fit in RAM. One framework used for local caching is Google Guava Cache. The hard reality of a local cache is that we don't have unlimited memory, so we need to decide when a cache entry is no longer worth keeping. Guava provides three basic types of eviction: size-based eviction (cache size, cache entry weights), time-based eviction (expireAfterAccess, expireAfterWrite), and reference-based eviction (weak references, soft references). In the following code snippet, we create a local cache with time-based and size-based eviction policies.

LoadingCache<Key, Value> cache = CacheBuilder.newBuilder()
       .maximumSize(5000)                        // size-based eviction
       .expireAfterAccess(20, TimeUnit.MINUTES)  // time-based eviction
       .build(
           new CacheLoader<Key, Value>() {
             public Value load(Key key) throws AnyException {
               return makeExpensiveNetworkCall(key);
             }
           });
In the code above, we set the maximum cache size to 5000 entries and the expiration time to 20 minutes after the entry was last read or written. At the end, we define the CacheLoader's load function, which is invoked when the desired entry is not found in the local cache.

Distributed Caching

In contrast with local caching, there are cases where the data to be cached can't be stored on a single host and needs to be spread over a cluster of servers. Additionally, we also want to prevent data loss due to instances disappearing or network faults. Some popular distributed caches used in the industry are Redis and Memcached. In this scenario, all the application servers fetch cached data from a single logical cache spread over a cluster, similar to the diagram below.

Fig 5: DistributedCache interaction model

The limitation of distributed caching is that we incur the latency of a network call. However, fast, managed, in-memory data stores still outperform slower disk-based databases. Under the hood, distributed caches use an efficient hashing algorithm (e.g., consistent hashing) to distribute entries across the cluster, and they use sharding and replication for fault tolerance.
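The key property of consistent hashing is that adding or removing a cache node only remaps the keys owned by that node, instead of reshuffling everything. A minimal sketch is below; CRC32 is used purely for brevity (real systems use stronger hash functions and many virtual nodes per server for an even spread), and the class name is an illustrative assumption.

```java
import java.nio.charset.StandardCharsets;
import java.util.SortedMap;
import java.util.TreeMap;
import java.util.zip.CRC32;

// Minimal consistent-hash ring: nodes are hashed onto a ring (TreeMap),
// and a key is owned by the first node clockwise from the key's hash.
class ConsistentHashRing {
    private final TreeMap<Long, String> ring = new TreeMap<>();

    private static long hash(String s) {
        CRC32 crc = new CRC32();
        crc.update(s.getBytes(StandardCharsets.UTF_8));
        return crc.getValue();
    }

    void addNode(String node)    { ring.put(hash(node), node); }
    void removeNode(String node) { ring.remove(hash(node)); }

    String nodeFor(String key) {
        // First node at or after the key's position, wrapping around the ring.
        SortedMap<Long, String> tail = ring.tailMap(hash(key));
        return tail.isEmpty() ? ring.firstEntry().getValue() : tail.get(tail.firstKey());
    }
}
```

Removing a node that doesn't own a given key leaves that key's placement unchanged, which is exactly what limits cache-miss storms during cluster resizing.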

Case Study: Content Delivery Network (CDN)

CDNs are the prime example of distributed caching in the wild. They are used to expedite the distribution of static and dynamic content (images, JavaScript, HTML, CSS, MPEG) to end users. Some widely used CDNs in the industry are Akamai and Amazon CloudFront. CDNs are made up of two essential components: edge servers and origin servers. When a user requests static content (as in Fig 6), the request gets routed to the edge location that can best serve it (typically the geographically nearest one, for low latency). The edge location looks up the content in its cache and returns it to the user in case of a cache hit. Otherwise, the request is forwarded to the origin server, which returns the actual content to the edge location; the edge location caches the response and delivers it back to the user. Almost all CDN providers offer the functionality to invalidate and refresh the cache at regular intervals or when manually triggered, which acts as a guardrail against stale data being served from the cache.

Fig 6: Data Flow in CDNs