
Fundamentals of system design Chapter 4: Caching

The components of system design


A cache is a short-term memory that stores recently accessed data from the original data store so that it can be retrieved faster, which reduces the number of slow I/O operations.

Real-world analogy

Let’s say you prepare dinner every day. You need ingredients for your recipe. When you cook, is it efficient to go to the store every time to buy the ingredients? Absolutely not. It’s time-consuming to go to the store every time you want to cook. It makes more sense to buy ingredients ahead of time and keep them in the refrigerator or pantry. This saves you time. In this case, your refrigerator or pantry acts like a cache.

Computer analogy

Computers work the same way. The CPU has a cache on the processor so that it doesn’t have to request data from RAM or disk every time. Likewise, accessing data in RAM is faster than accessing data on disk. At every level, a cache acts as a temporary local store for data.

Cache hit and miss

When data is requested from a system, a copy of the data is either in the cache or not. If it’s present, this is referred to as a cache hit; if it’s not, this is referred to as a cache miss. A cache is said to perform well when it has more hits than misses.
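To make hits and misses concrete, here is a minimal sketch of a cache that counts both. The names `CountingCache` and `slow_lookup` are made up for illustration, and a plain dictionary stands in for the cache memory.

```python
from typing import Callable, Dict


class CountingCache:
    """Tiny read-through cache that counts hits and misses."""

    def __init__(self, load: Callable[[str], str]) -> None:
        self._load = load                  # fetches from the original data store
        self._data: Dict[str, str] = {}
        self.hits = 0
        self.misses = 0

    def get(self, key: str) -> str:
        if key in self._data:              # cache hit: serve the stored copy
            self.hits += 1
            return self._data[key]
        self.misses += 1                   # cache miss: go to the data store
        value = self._load(key)
        self._data[key] = value            # keep a copy for next time
        return value

    def hit_ratio(self) -> float:
        total = self.hits + self.misses
        return self.hits / total if total else 0.0


def slow_lookup(key: str) -> str:
    # Hypothetical stand-in for a database query or remote call.
    return key.upper()


cache = CountingCache(slow_lookup)
for key in ["a", "b", "a", "a", "c", "b"]:
    cache.get(key)

print(cache.hits, cache.misses, cache.hit_ratio())  # 3 3 0.5
```

The first request for each key is a miss, and every repeat is a hit, so the hit ratio climbs as the same keys are requested again.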

When to use caching

You can’t store all of your system’s data in a cache, because cache hardware has limited capacity and costs more per unit of storage than a normal database. Search time can also increase when a cache holds a very large amount of data. A cache should therefore hold only the most relevant data, typically data that is read often, because most systems handle far more reads than writes. For instance, Twitter has an average of about 300k read requests per second and only about 6,000 writes per second. Caching tweets according to each user’s timeline greatly improves system performance and user experience. Caching is mostly useful in the following scenarios:

a) Storing the results of requests that are made many times, to minimize data retrieval operations, especially on data that is immutable or does not change often.

b) Storing the results of complex and taxing computations, which reduces system latency.
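Scenario b) can be sketched with Python's built-in `functools.lru_cache`. The function `expensive_score` is a made-up stand-in for a costly computation, and `call_count` exists only to show how often the real work runs.

```python
from functools import lru_cache

call_count = 0  # counts how often the function body actually runs


@lru_cache(maxsize=128)
def expensive_score(n: int) -> int:
    """Hypothetical stand-in for a complex, slow computation."""
    global call_count
    call_count += 1
    return sum(i * i for i in range(n))


first = expensive_score(10_000)    # computed, then stored in the cache
second = expensive_score(10_000)   # same argument: served from the cache

info = expensive_score.cache_info()
print(call_count, info.hits, info.misses)  # 1 1 1
```

This only works because the result depends on nothing but the argument. If the underlying data can change, the cached result needs an expiry or an explicit invalidation.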

Cache Consistency – Cache Writing Policy

When data is written, we must decide when the write reaches the cache and when it reaches the primary data store. The following are cache writing policy designs:

a) Write-behind cache design - This writes to the cache first and then to the primary database, either almost immediately or after a set amount of time. Changes in the cache that have not yet been written to the database are referred to as “dirty”. Write-behind caching is convenient when you expect a write-heavy workload. However, if the delay is long and the system goes down, you risk losing data that has been updated in the cache but not yet written to the database.

b) Write-through cache design - This updates both the cache and the database as part of the same write operation. If the cache layer fails, the update is not lost, since it has already been persisted in the database. Write-through caches are convenient when updates are infrequent. If you perform updates very often, the benefit of having a cache shrinks, since every write must still hit the database.
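The difference between the two policies can be sketched with two small classes. This is only an illustration: the class names are invented, plain dictionaries stand in for the cache and the database, and `flush` is called by hand where a real system would run it on a timer.

```python
from typing import Dict, Set


class WriteThroughCache:
    """Every write goes to the cache and the database together."""

    def __init__(self, database: Dict[str, str]) -> None:
        self.database = database
        self.cache: Dict[str, str] = {}

    def put(self, key: str, value: str) -> None:
        self.cache[key] = value
        self.database[key] = value     # persisted as part of the same write


class WriteBehindCache:
    """Writes land in the cache first and reach the database on flush."""

    def __init__(self, database: Dict[str, str]) -> None:
        self.database = database
        self.cache: Dict[str, str] = {}
        self.dirty: Set[str] = set()   # keys not yet written to the database

    def put(self, key: str, value: str) -> None:
        self.cache[key] = value
        self.dirty.add(key)            # fast: no database work yet

    def flush(self) -> None:
        for key in self.dirty:
            self.database[key] = self.cache[key]
        self.dirty.clear()


db_a: Dict[str, str] = {}
write_through = WriteThroughCache(db_a)
write_through.put("user:1", "Ada")
print(db_a)                            # {'user:1': 'Ada'}

db_b: Dict[str, str] = {}
write_behind = WriteBehindCache(db_b)
write_behind.put("user:1", "Ada")
before_flush = dict(db_b)
print(before_flush)                    # {} (a crash here would lose the update)
write_behind.flush()
print(db_b)                            # {'user:1': 'Ada'}
```

The empty dictionary printed before the flush is exactly the window in which write-behind can lose data.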

Cache Eviction Policies

A cache is transient storage, meaning its contents are short-lived. Its small capacity limits how much data it can hold, so we need to decide which data to keep and which to remove over time. A number of algorithms are used to manage cache memory:

a) Least Recently Used (LRU) - The cache discards the least recently used data. It can be implemented by recording a timestamp for the last access of each item. If data has not been used recently, we assume it is less likely to be requested than the rest, so removing it makes room for more recent data in the cache.

b) Least Frequently Used (LFU) - In this policy, we count the number of times each cache item is accessed. The item with the fewest accesses is discarded first, on the assumption that rarely used data is wasting space.

c) First In First Out (FIFO) - This policy works like a queue. The cache evicts the data that was added first, regardless of how many times it has been accessed since.

d) Last In First Out (LIFO) - This policy works like a stack. The cache evicts the data that was added most recently.

e) Random Selection - Here, the system randomly selects a data item and removes it from the cache to free up space when necessary.
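To see how the choice of policy changes what gets evicted, here is a minimal sketch of LRU and FIFO side by side. Both classes are illustrative only. They use Python's `collections.OrderedDict` to track order instead of the timestamps mentioned above, which is a different but common way to get the same LRU behaviour.

```python
from collections import OrderedDict
from typing import Any, Hashable, List, Optional


class LRUCache:
    """Evicts the entry that was used least recently."""

    def __init__(self, capacity: int) -> None:
        self.capacity = capacity
        self._items: "OrderedDict[Hashable, Any]" = OrderedDict()

    def get(self, key: Hashable) -> Optional[Any]:
        if key not in self._items:
            return None
        self._items.move_to_end(key)          # mark as most recently used
        return self._items[key]

    def put(self, key: Hashable, value: Any) -> None:
        self._items[key] = value
        self._items.move_to_end(key)          # a write also counts as a use
        if len(self._items) > self.capacity:
            self._items.popitem(last=False)   # drop the least recently used

    def keys(self) -> List[Hashable]:
        return list(self._items)


class FIFOCache:
    """Evicts the entry that was added first, ignoring later reads."""

    def __init__(self, capacity: int) -> None:
        self.capacity = capacity
        self._items: "OrderedDict[Hashable, Any]" = OrderedDict()

    def get(self, key: Hashable) -> Optional[Any]:
        return self._items.get(key)           # reads do not change the order

    def put(self, key: Hashable, value: Any) -> None:
        self._items[key] = value              # an existing key keeps its place
        if len(self._items) > self.capacity:
            self._items.popitem(last=False)   # drop the oldest insertion

    def keys(self) -> List[Hashable]:
        return list(self._items)


lru, fifo = LRUCache(2), FIFOCache(2)
for cache in (lru, fifo):
    cache.put("a", 1)
    cache.put("b", 2)
    cache.get("a")        # "a" is read again
    cache.put("c", 3)     # cache is full, so one entry must go

print(lru.keys())         # ['a', 'c']
print(fifo.keys())        # ['b', 'c']
```

The same sequence of operations leaves different data behind: LRU keeps "a" because it was just read, while FIFO evicts it because it was inserted first.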

Caching layers

a) DNS Cache - When you visit a URL, for example www.google.com, a request is made to a DNS server to map the domain name to an IP address. The resolved IP address gets stored in the user’s web browser or OS so that it can be retrieved quickly the next time it’s requested. This is much more efficient than resolving the IP address every time you visit a website.

b) Client Side Cache - Client-side caching is also known as browser caching. Once the client has requested data from the server, it may store that data in the browser. A client cache is efficient because it lets the browser access data without contacting the web server, which reduces latency. The client stores files after requesting them for the first time, along with an expiration time. Once a file expires, the client can send a re-validation request to the server. If the server has a newer version, re-validation fails and the server returns the full file. If the file is unchanged, the server returns a 304 (Not Modified), and the cache keeps the file and refreshes its expiration time. In that case the client and server exchange only headers, which is faster than downloading the file again.

c) Content Delivery Network (CDN) - The browser caches most of the static content that appears on a web page to lower load times. Browsers keep this data until its time to live expires or until the memory cache is full. On the server side, web caching techniques such as a CDN (which acts as a reverse proxy) can be used to improve the performance of a website. A Content Delivery Network works like a browser cache, caching content such as images, videos and static web pages on proxy servers that are located closer to the user than the origin servers. A CDN delivers content faster because its servers are closer to the user making the request. Think of a CDN as a chain of grocery stores. Instead of going to the farm where the food is grown, which could be miles away, a shopper goes to their local grocery store. The CDN caches the “grocery” content so pages are served more quickly. Cached content remains in the CDN as long as users continue requesting it. Examples of CDN services are Cloudflare, Amazon CloudFront and Azure CDN.

d) Application Server Cache - Server-side caching uses key/value stores to cache data in memory. Requested data can be served from memory instead of making requests to the database. An in-memory cache delivers sub-millisecond response times, enabling hundreds of thousands of requests per second. This is much faster than querying the database, so it reduces latency, increases throughput and eases the load on your relational database. Examples of key/value stores are Redis, Amazon’s DynamoDB and Apache Cassandra. Of these, Redis is the one most commonly used as an in-memory cache.

e) Database Cache - A database cache layer can be applied to both relational and non-relational databases. Most databases include some level of caching in their default configuration. Tuning these settings for specific usage patterns can greatly improve performance.
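The expiration and re-validation flow described for the client-side cache can be sketched as a small simulation. Nothing here talks to a real network: `Server` and `ClientCache` are made-up classes, and the `now` argument replaces the real clock so the example is repeatable. Using a content hash as a version tag (similar in spirit to an HTTP ETag) is an assumption of this sketch, since the text above only says that headers are exchanged.

```python
import hashlib
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple


def make_tag(body: str) -> str:
    # Assumed versioning scheme: a short hash of the file content.
    return hashlib.sha256(body.encode()).hexdigest()[:16]


class Server:
    def __init__(self, files: Dict[str, str]) -> None:
        self.files = files

    def fetch(self, path: str, known_tag: Optional[str] = None) -> Tuple[int, Optional[str], str]:
        body = self.files[path]
        tag = make_tag(body)
        if known_tag == tag:
            return 304, None, tag      # Not Modified: headers only, no body
        return 200, body, tag          # full file


@dataclass
class Entry:
    body: str
    tag: str
    expires_at: float


class ClientCache:
    def __init__(self, server: Server, max_age: float) -> None:
        self.server = server
        self.max_age = max_age
        self.entries: Dict[str, Entry] = {}
        self.log: List[str] = []       # records what happened on each request

    def get(self, path: str, now: float) -> str:
        entry = self.entries.get(path)
        if entry is not None and now < entry.expires_at:
            self.log.append("fresh")   # not expired: no server contact at all
            return entry.body
        known_tag = entry.tag if entry is not None else None
        status, body, tag = self.server.fetch(path, known_tag)
        if status == 304 and entry is not None:
            entry.expires_at = now + self.max_age   # keep file, refresh expiry
            self.log.append("revalidated")
            return entry.body
        assert body is not None
        self.entries[path] = Entry(body, tag, now + self.max_age)
        self.log.append("downloaded")
        return body


server = Server({"/app.css": "body { color: black; }"})
client = ClientCache(server, max_age=60)

client.get("/app.css", now=0)      # first request: full download
client.get("/app.css", now=30)     # still fresh
client.get("/app.css", now=90)     # expired but unchanged: 304
server.files["/app.css"] = "body { color: navy; }"
latest = client.get("/app.css", now=200)   # expired and changed: full download

print(client.log)   # ['downloaded', 'fresh', 'revalidated', 'downloaded']
```

Only the first and last requests transfer the file. The second never leaves the client, and the third costs one small round trip.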

Benefits of caching

a) Improved Application Performance - Memory is commonly cited as 50-200 times faster than disk (magnetic or SSD), and the gap can be far larger for random access, so reading from memory is extremely fast. The fast data access from a cache greatly improves system performance.

b) Reduce Latency - Latency is a measure of delay. Modern applications such as Amazon may experience a high amount of traffic on occasions like Black Friday and the Christmas season. Increased load on the databases results in higher latency when fetching data, which makes the overall application slow. For a retailer like Amazon, that can be very costly in lost sales. An in-memory cache helps avoid this, since it serves most reads without touching the database.

c) Increase Read Throughput - In addition to lowering latency, caching greatly increases throughput. Throughput refers to how much data can be processed within a specific period of time. A single cache instance can serve hundreds of thousands of requests per second. This greatly improves system performance and scalability during spikes.

d) Reduce Load on the Database - Directing reads to the cache reduces the load on the database and protects it from slowing down or crashing during spikes.

e) Reduce Database Cost - A single cache instance can provide hundreds of thousands of input/output operations per second. This can potentially replace the need for multiple database instances, driving the database cost down.
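A bit of arithmetic shows how the hit ratio drives both latency and database load. The numbers below (1 ms for a cache lookup, 20 ms for a database read, 10,000 requests per second) are made-up values for illustration, not measurements. The model assumes every request checks the cache first and only misses go on to the database.

```python
def average_latency_ms(hit_ratio: float, cache_ms: float, db_ms: float) -> float:
    # Every request pays the cache lookup; only misses also pay the database read.
    return cache_ms + (1.0 - hit_ratio) * db_ms


def database_reads_per_second(requests_per_second: float, hit_ratio: float) -> float:
    # Only cache misses reach the database.
    return requests_per_second * (1.0 - hit_ratio)


CACHE_MS = 1.0               # assumed cache lookup time
DB_MS = 20.0                 # assumed database read time
REQUESTS_PER_SECOND = 10_000

for ratio in (0.0, 0.5, 0.9, 0.99):
    latency = round(average_latency_ms(ratio, CACHE_MS, DB_MS), 2)
    db_reads = round(database_reads_per_second(REQUESTS_PER_SECOND, ratio))
    print(f"hit ratio {ratio:.2f}: ~{latency} ms average, ~{db_reads} database reads/s")
```

Under these assumptions, going from no hits to a 90% hit ratio cuts the average latency from 21 ms to about 3 ms and the database reads from 10,000 to about 1,000 per second.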


Caching is a key component in improving the performance of a system. Each time you deliver resources closer to the user, you optimize your app performance.

Thank You!

I’d love to keep in touch! Feel free to follow me on Twitter at @codewithfed. If you enjoyed this article please consider sponsoring me for more. Your feedback is welcome anytime. Thanks again!

