The definitive guide to the System Design Interview

System Design engineering as a discipline has flourished in the past two decades, and I’ve had the privilege of witnessing its explosion firsthand.

Designing large-scale, distributed systems was considered a specialized skill when I began my career. This was true when I first entered Big Tech in the mid-2000s, starting at Microsoft, where I helped design and develop Azure. By the time I moved on to Facebook, System Design Interviews had already been adopted as a way of evaluating candidates for more senior levels. (They still call it the “Pirate” interview round at Meta today, and more recently have even added a “Pirate X” loop for some candidates, focused more heavily on API & Product Design.)

An abstract System Design diagram

By the early 2010s, System Design Interview rounds had become standard practice across the industry (especially at top-tier companies, e.g. FAANG/MAANG). The more open-ended, conversational format of the SDI provides a nice complement to more technical coding interview rounds, enabling hiring managers to evaluate the candidate’s problem-solving, strategic thinking, and communication skills. For this reason, SDI performance is still consistently used by many companies as a key factor in determining a candidate’s starting level and salary.

So what exactly does a modern System Design Interview look like… and what can you do to successfully stand out to interviewers?

In this comprehensive guide, I will take you through every key System Design concept and interviewing tip you need to know to ace your next System Design Interview.

In the final section of this guide, I will even walk you step-by-step through an entire mock System Design Interview, using one of my all-time favorite SDI questions: Design TinyURL.

Before we jump in, here is the complete structure of the guide:

Who should read this System Design Interview guide?

A working understanding of System Design principles and best practices is expected from every software engineer in 2024. So, if you have a System Design Interview coming up, or are simply looking to sharpen your System Design knowledge, this guide is for you.

Reminder: System Design isn’t just for senior engineers or system architects. It’s relevant to a broad range of specializations, including:

Software Developers: System Design skills can be a huge benefit whether you’re into back-end, front-end, or full-stack development.
Engineering Managers: From tech leads to architects, applying System Design principles is key to developing enterprise-scale applications.
Technical Product Managers: Learn to confidently steer the design and development of scalable products.
System Design Students: Explore real-world case studies for how the world’s most influential companies architect for scale.

Most of all, this guide can be an indispensable resource in your System Design Interview prep.

Before you continue, you will need a basic understanding of operating systems, computer networks, and API design. You will also benefit from knowing the intricacies of monolithic/thick-client architecture, microservices architecture, and event-driven architecture.


Luckily, I will explain all of the most important concepts as we go (with plenty of real-world examples), so don’t be afraid to just dive in!

What is a System Design Interview?

A System Design Interview is an interview where you conceptualize and architect the structure of an application to fulfill both functional and non-functional requirements. This includes designing key components such as databases, APIs, and server infrastructure to ensure scalability, reliability, and performance (more on these later). It is a complete 180° turn from the usual technical interviews, which focus on writing code or using algorithms to solve problems.

Other interviews vs. a System Design Interview

So why do companies care so much about this interview? Companies like Google, Amazon, and Meta live and breathe scalable systems. Their success hinges on handling massive amounts of data and traffic, and they need engineers who can design the infrastructure to make it happen. System Design is the ultimate test of your ability to tackle the problems engineers working at those companies face every day. It’s not just about coding skills; it’s about architectural thinking, understanding trade-offs, and making tough decisions under pressure.

Let me tell you something else I learned the hard way: There is no perfect System Design.

Almost every choice involves trade-offs. The key is understanding them and making informed decisions, so don’t worry about perfection in System Design.

System Design Interviews are crucial in determining the level of expertise and problem-solving abilities of candidates. Unlike algorithm-focused coding interviews, there’s no right or wrong answer in an SDI. The emphasis is on your thought process and design choices rather than your ability to code.

Common System Design misconceptions

Many engineers I talk to feel intimidated by System Design Interviews, believing that only candidates with previous hands-on experience designing distributed systems can succeed.

This misconception often deters candidates, especially junior developers, who might lack confidence in their abilities. However, System Design knowledge is less about expertise in every domain and more about having a solid grasp of fundamentals and having expertise in some areas.

Even junior engineers can excel in these interviews if they understand core concepts and can demonstrate their aptitude for learning. I’ve seen junior candidates outperform those with double their experience, underscoring that System Design proficiency is attainable for everyone, regardless of their technical background.

The reality is that both junior and senior engineers must have a strong foundational understanding of System Design… although the scope of this required knowledge will be different for different levels. For juniors, the focus is on potential and basic principles. For seniors, it’s mostly about justifying their design choices, demonstrating their experience through knowledge depth, and diving deep into technical details.

As you progress through this guide, you’ll gain the knowledge and confidence to tackle even the most challenging System Design Interview questions. To put your skills to the test in a realistic setting, consider trying an AI Mock Interview tool to quickly get personalized feedback and refine your approach.

Here are some common misconceptions I have observed regarding System Design Interviews:

You need years of distributed systems knowledge: While a deep understanding is expected from senior candidates, juniors can excel with a solid grasp of fundamentals and the ability to apply core principles (such as the CAP theorem, load balancing, caching, partitioning, and scaling).
Only experience matters: Performance in System Design Interviews can often depend on how well you comprehend and articulate your design choices rather than just years of experience.
There’s a right answer: These interviews are more about assessing your thought process, creativity, and problem-solving approach rather than finding a single correct solution.
Technical knowledge only: Communication skills, the ability to justify your design decisions, and the ability to collaborate are also key aspects that interviewers assess.
I answered everything correctly; I’ll get the job: While accurate answers are crucial, System Design Interviews aren’t just about being “correct.” Interviewers are looking for candidates who can showcase the depth of their knowledge, provide insightful trade-offs, and demonstrate the ability to think through complex system implications. Even if your answers are technically correct, failing to elaborate or provide a nuanced perspective might not be enough to demonstrate your seniority or problem-solving skills.

Total knowledge vs. correct answer

A practical example of the illustration above is justifying the choice of SQL vs. NoSQL during an interview, as seen below:

Interviewer:

Why SQL over NoSQL for a URL-shortening system?

Candidate A:

SQL databases are generally easier to work with and offer strong consistency, which is important for ensuring every shortened URL maps to the correct original URL.

Candidate B:

While NoSQL databases offer flexibility and scalability, the relational nature of SQL databases, coupled with their strong consistency guarantees, makes them a more suitable choice for managing the mapping between shortened and original URLs in this use case.

Even without knowing the candidates’ experience levels, a seasoned interviewer could tell that Candidate A was junior-level while Candidate B was much more experienced. So yes, the way you handle questions should reflect the position you are applying for. In fact, this is one of the most important things that interviewers are looking for.

System Design Interviews at FAANG/MAANG

The System Design Interview has become a staple in the hiring process for technical roles at MAANG companies. This interview assesses your ability to design scalable, reliable, and efficient software systems.

Interview loop at MAANG

While the exact order can vary slightly between companies and roles, here’s a typical timeline of a FAANG interview loop and where the SDI usually falls:

Phone screen(s): This initial screening assesses your overall experience, skills, and cultural fit.

Technical screen(s) (Coding): These are coding interviews (one or more) to evaluate your problem-solving skills and algorithm knowledge.

System Design Interview: You’ll be asked to design a system from scratch, often with scaling and real-world constraints in mind.

As-Ap (Bar raiser): A senior interviewer provides an independent assessment and helps prevent bad hires. They can ask any type of question.

Behavioral interview(s): This focuses on your past experiences, teamwork, and how you handle challenges.

How does the interview go?

A typical System Design Interview consists of several phases:

The System Design Interview structure

By understanding the key concepts and following a structured approach, you can confidently tackle your next System Design Interview and showcase your ability to design the systems that power our modern tech landscape. Having been on both sides of the table for many SDIs, I can tell you that the first three phases are often a telltale sign of your chances of being hired.

Now I will walk you through everything you need to know to ace your next SDI. Let’s start with some key concepts, and then we’ll explore the System Design fundamentals that will empower you to design systems that can handle anything the real world throws at them.

Part 1: Key System Design Concepts

I have always believed that the most crucial questions in an SDI setting are always the “How…?” and “Why…?” questions.

As an interviewer, I often saw candidates who claimed their systems were “scalable,” “available,” “fault-tolerant,” etc., but they couldn’t substantiate their claims. In other words, they couldn’t justify the “How?” part.

Being able to articulate the “how” and “why” demonstrates:

You know not just what a well-designed system looks like but how those characteristics are achieved.
You can explain the trade-offs you considered and why your specific solutions best fit the requirements.
You can analyze potential problems and explain how your design would mitigate or resolve them.

To give yourself a chance to answer these questions comprehensively, you need to have a solid foundational knowledge of the principles and practices that underpin well-designed systems. Let’s start by exploring the key characteristics that distinguish these systems.

Core System Design principles you need to know

These are the characteristics that distinguish good systems from bad ones. A well-designed system should be scalable, reliable, available, performant, consistent, and secure. Yes, that all sounds fancy, and you may think all the world’s most impressive systems can have it all… but in reality, you will need to prioritize these according to your system’s most critical needs and then consider the trade-offs that come along with them.

The most important nonfunctional requirements for System Design can be remembered with an acronym: SPARCS.

Scalability
Performance
Availability
Reliability
Consistency
Security

SPARCS: Scalability, Performance, Availability, Reliability, Consistency, Security

Scalability

Scalability is the ability of a system to handle a growing amount of work. It’s like building a house that can easily be expanded as your family grows. In System Design, we talk about two main types of scalability:

Horizontal scalability involves adding more machines to your system to distribute the workload. It’s like adding more lanes to a highway to accommodate more traffic. Horizontal scaling is often preferred for its flexibility and cost-effectiveness.
Vertical scalability involves upgrading a single machine’s resources, such as adding more RAM or CPU cores. It’s like fitting your car with a more powerful engine. While vertical scaling is simpler to implement, it is ultimately capped by the maximum capacity of a single machine.
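To make the horizontal vs. vertical trade-off concrete, here is a minimal back-of-the-envelope sketch in Python. The peak load, per-server throughput, and 70% headroom figures are illustrative assumptions, not benchmarks. The point is that horizontal scaling turns capacity planning into a division problem, while vertical scaling only works as long as one machine can absorb the whole load.

```python
import math

def servers_needed(peak_rps: float, rps_per_server: float, headroom: float = 0.7) -> int:
    """Horizontal scaling: how many identical servers are needed to cover peak load.

    headroom is the fraction of each server's capacity we plan to use
    (0.7 leaves 30% spare for spikes); it is an illustrative assumption.
    """
    usable_rps = rps_per_server * headroom
    return math.ceil(peak_rps / usable_rps)

def fits_on_one_machine(peak_rps: float, max_single_machine_rps: float) -> bool:
    """Vertical scaling: viable only while one (bigger) machine can absorb the load."""
    return peak_rps <= max_single_machine_rps

# Hypothetical numbers: 50,000 requests/s at peak, 2,000 requests/s per commodity server.
print(servers_needed(50_000, 2_000))        # 50,000 / 1,400 usable -> 36 servers
print(fits_on_one_machine(50_000, 20_000))  # False: the single box tops out first
```

Adding capacity horizontally means changing one number (more servers); adding it vertically means buying a bigger machine, until no bigger machine exists.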

Here is a handy infographic on how to achieve scalability in System Design:

Scalability in System Design

Performance

Performance is about how quickly and efficiently your system can get things done. A fast system is a joy to use, and a slow system is seldom used. To achieve peak performance, I’ve found that you need to focus on:

Caching data in memory means you don’t have to dig for it in slower storage, which speeds things up considerably.
Load balancing distributes the workload evenly across servers. If one server struggles, the whole system can appear slow.
Optimizing algorithms is where your coding skills come into play. Choosing the right algorithm or data structure can be the difference between a sluggish system and a lightning-fast one.
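As a small illustration of how much the choice of data structure matters, the Python sketch below times the same membership check against a list (a linear scan) and a set (a hash lookup). The dataset size and repetition count are arbitrary assumptions, and absolute timings will vary by machine; only the gap between the two is the point.

```python
import timeit

# Illustrative dataset: 100,000 IDs.
ids_list = list(range(100_000))
ids_set = set(ids_list)
target = 99_999  # worst case for the list: the last element

def in_list() -> bool:
    return target in ids_list  # O(n): compares element by element

def in_set() -> bool:
    return target in ids_set   # O(1) on average: a single hash lookup

list_time = timeit.timeit(in_list, number=200)
set_time = timeit.timeit(in_set, number=200)
print(f"list: {list_time:.4f}s  set: {set_time:.6f}s")
```

Both functions return the same answer; only the cost of getting it differs, and that gap grows with the size of the data.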

Remember, designing a performant system is a testament to your optimization skills. It’s about finding those hidden efficiencies and making your system run like a well-oiled machine.

Availability

Availability in System Design measures how often your system is up and running, ready to serve its purpose. Nobody likes a system that’s constantly down for maintenance, right? I’ve found that high availability is a delicate dance between:

Monitoring is like having a health checkup for your system. You must monitor it constantly to catch potential issues before they escalate into full-blown outages.
Load balancing, just as with performance, is about distributing the workload evenly. No server should be overwhelmed, as that’s a recipe for disaster.
Failover mechanisms come into play when things go wrong (and they inevitably will). You need a way to seamlessly switch to a backup system.

We define availability in terms of the “nines of availability.” You can see them below:

Availability %	Downtime per year	Downtime per month (30 days)
90% (1 nine)	36.5 days	72 hours
99% (2 nines)	3.65 days	7.20 hours
99.9% (3 nines)	8.76 hours	43.2 minutes
99.99% (4 nines)	52.56 minutes	4.32 minutes
99.999% (5 nines)	5.26 minutes	25.9 seconds
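The figures in the table follow from simple arithmetic: downtime = (1 − availability) × period. Here is a minimal Python sketch that reproduces them, assuming a 365-day year and a 30-day month (the same convention as the table's month column).

```python
SECONDS_PER_YEAR = 365 * 24 * 3600   # non-leap year
SECONDS_PER_MONTH = 30 * 24 * 3600   # 30-day month, matching the table

def allowed_downtime_seconds(availability_pct: float) -> tuple[float, float]:
    """Return (downtime per year, downtime per month) in seconds for an availability target."""
    unavailable_fraction = 1 - availability_pct / 100
    return (unavailable_fraction * SECONDS_PER_YEAR,
            unavailable_fraction * SECONDS_PER_MONTH)

for pct in (99.9, 99.99, 99.999):
    year_s, month_s = allowed_downtime_seconds(pct)
    print(f"{pct}%: {year_s / 60:.2f} min/year, {month_s / 60:.2f} min/month")
```

Each extra nine cuts the downtime budget by a factor of ten, which is why the jump from three to five nines is so expensive in practice.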

In my experience, prioritizing availability is a win-win. It keeps your users happy and your system running smoothly, which is always a good look for an engineer.

Reliability

Reliability is the ability of a system to function correctly, even in the face of failures. It’s like having a backup generator that kicks in when the power goes out. In system design, we achieve reliability through:

Fault tolerance involves designing systems that can continue operating even when some components fail. This often involves redundancy, where multiple components perform the same function so that others can take over if one fails.
Redundancy is about duplicating critical components or data to ensure the system can still function if one fails. This can be done at various levels, from hardware redundancy (e.g., having multiple servers) to data redundancy (e.g., replicating data across multiple storage devices).

By building redundancy and fault tolerance into your system, you can minimize downtime and ensure that your system remains available even when unexpected events occur.
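Here is a minimal Python sketch of redundancy plus failover: a request is tried against each redundant replica in turn, and the first healthy one serves it. The replica functions, the `user:42` key, and the choice of `ConnectionError` as the failure signal are hypothetical stand-ins for real network calls.

```python
from typing import Callable, Sequence

class AllReplicasFailed(Exception):
    """Raised when every redundant replica failed to serve the request."""

def call_with_failover(replicas: Sequence[Callable[[str], str]], key: str) -> str:
    """Try each redundant replica in order; the first one that answers wins."""
    errors = []
    for replica in replicas:
        try:
            return replica(key)
        except ConnectionError as exc:  # treat connection errors as a failed component
            errors.append(exc)
    raise AllReplicasFailed(f"{len(errors)} replicas failed")

# Hypothetical replicas: the primary is down, the secondary holds a copy of the data.
def primary(key: str) -> str:
    raise ConnectionError("primary unreachable")

def secondary(key: str) -> str:
    return {"user:42": "Ada"}[key]

print(call_with_failover([primary, secondary], "user:42"))  # Ada
```

The caller never sees the primary's failure; it only gets an error if every copy is unavailable, which is exactly what redundancy is meant to make unlikely.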

Consistency

Consistency ensures that all parts of your system agree on the data state. This is particularly tricky in distributed systems, where data is spread across multiple machines. That’s where the CAP theorem comes in:

Consistency: All nodes see the same data at the same time.
Availability: Every request receives a response, even if some nodes fail.
Partition tolerance: The system continues to operate despite network failures between nodes, such as dropped messages, slow connections, or links that are down entirely.

Pro tip: Choose a consistency model that balances your application’s needs with the potential trade-offs in performance and complexity.

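The CAP theorem states that a distributed system can provide at most two of these three guarantees at the same time. Because network partitions can’t be ruled out in practice, the real choice during a partition is between consistency and availability. This is a harsh reality, but it forces you to make tough choices about what matters most to your system.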

The PACELC theorem takes this a step further: if there is a Partition (P), a system must choose between Availability (A) and Consistency (C); Else (E), even when the network is healthy, it must choose between Latency (L) and Consistency (C). This acknowledges that consistency has a cost in normal operation too, and lets you tailor your consistency guarantees to the unique challenges of your system.
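To see the CAP trade-off in action, here is a toy Python model of a value stored on two replicas. It is a hypothetical illustration, not a real replication protocol: writes go to replica `a` and are copied to replica `b` only while the network between them is up. During a partition, the "CP" mode rejects writes to stay consistent, while the "AP" mode accepts them and lets replica `b` serve stale data.

```python
from typing import Optional

class Replica:
    def __init__(self) -> None:
        self.value: Optional[str] = None

class ReplicatedRegister:
    """Toy two-replica store illustrating the CAP choice during a partition."""

    def __init__(self, mode: str) -> None:
        if mode not in ("CP", "AP"):
            raise ValueError("mode must be 'CP' or 'AP'")
        self.mode = mode
        self.a, self.b = Replica(), Replica()
        self.partitioned = False  # True when a and b cannot reach each other

    def write(self, value: str) -> bool:
        """Write through replica a. Returns True if the write was accepted."""
        if not self.partitioned:
            self.a.value = self.b.value = value  # synchronous replication
            return True
        if self.mode == "CP":
            return False  # reject: replicas stay consistent, but writes are unavailable
        self.a.value = value  # AP: accept locally; replica b is now stale
        return True

    def read_from_b(self) -> Optional[str]:
        return self.b.value

cp = ReplicatedRegister("CP")
cp.write("v1")
cp.partitioned = True
print(cp.write("v2"), cp.read_from_b())  # False v1

ap = ReplicatedRegister("AP")
ap.write("v1")
ap.partitioned = True
print(ap.write("v2"), ap.read_from_b())  # True v1  (a stale read)
```

Neither mode is "correct": a bank balance might favor CP, while a social feed can usually tolerate the stale reads of AP.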

The PACELC theorem

Security

Security is non-negotiable in today’s digital landscape. You need to know at least the core principles of security to ensure the system you design is safe from malicious actors. What good is a system with all the characteristics listed above if it isn’t secure?

A lapse in your judgment regarding security can not only risk the system you design but can also put your reputation as a candidate in peril. Here’s what I’ve learned about securing systems:

Defense in layers means having multiple layers of security, from firewalls to intrusion detection systems. It’s about making it as difficult as possible for attackers to penetrate your system.
Encryption is like putting your data in a secret code only authorized users can decipher. It’s a crucial step in protecting sensitive information from prying eyes.
Access controls allow only authorized users to enter, ensuring that only the right people can access the data.

Note: Defense in layers often includes encryption and access controls as some of the layers.
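Access controls are the easiest of these layers to show in code. Below is a minimal deny-by-default check in Python for a chat-style system: a user must be a member of the chat and hold a role that grants the requested action. The role names, permission strings, and membership rule are hypothetical; a real system would load its policies from dedicated storage and authenticate the user first.

```python
# Hypothetical role -> permissions mapping.
ROLE_PERMISSIONS: dict[str, set[str]] = {
    "viewer": {"chat:read"},
    "member": {"chat:read", "chat:write"},
    "admin": {"chat:read", "chat:write", "chat:delete"},
}

def is_allowed(user_roles: set[str], user_id: str, action: str, chat_members: set[str]) -> bool:
    """Deny by default: allow only if the user is in the chat AND some role grants the action."""
    if user_id not in chat_members:
        return False  # users may only access their own chats
    return any(action in ROLE_PERMISSIONS.get(role, set()) for role in user_roles)

members = {"alice", "bob"}
print(is_allowed({"member"}, "alice", "chat:write", members))  # True
print(is_allowed({"admin"}, "mallory", "chat:read", members))  # False: not a member
print(is_allowed({"viewer"}, "bob", "chat:delete", members))   # False: role lacks permission
```

Unknown roles and unlisted actions fall through to `False`, so a missing policy entry fails safe rather than open.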

See the illustration below for some of the most common trade-offs you have to consider:

These core principles act as a blueprint for designing modern systems. But can you build entire systems using these core concepts only? That’s where we will employ what I call the building blocks of modern System Design.

Building blocks of modern System Design

You have probably heard of databases and caching. These and others form the building blocks of useful systems in today’s tech landscape.

Building blocks are the most commonly and frequently used components in almost every System Design. Using these building blocks is a life hack for your SDI success.

Let’s go through some of these essential building blocks.

Load balancers

Load balancers distribute incoming traffic across multiple servers, ensuring no single server gets overwhelmed. It’s like a restaurant host directing customers to empty tables. I’ve seen firsthand how load balancers can dramatically improve a system’s responsiveness and reliability.

Pro tip: Discuss health checks in your load balancer configuration to automatically remove failing servers from the pool.

They usually sit between clients and servers, or between the different services of a system. Load balancers improve availability, security, and performance. Speaking of performance, let me tell you about caching.
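Here is a minimal sketch of a round-robin load balancer that honors health checks, in Python. Round robin is just one common strategy (others include least-connections and hashing), and the server names are hypothetical; in a real deployment, the health-check loop would call `mark_unhealthy` and `mark_healthy` based on periodic probes.

```python
import itertools

class RoundRobinBalancer:
    """Round-robin load balancer that skips servers marked unhealthy by health checks."""

    def __init__(self, servers: list[str]) -> None:
        self.servers = list(servers)
        self.healthy = set(servers)
        self._cycle = itertools.cycle(self.servers)

    def mark_unhealthy(self, server: str) -> None:
        self.healthy.discard(server)  # e.g., after consecutive failed health checks

    def mark_healthy(self, server: str) -> None:
        self.healthy.add(server)

    def next_server(self) -> str:
        # Look at most one full rotation ahead for a healthy server.
        for _ in range(len(self.servers)):
            server = next(self._cycle)
            if server in self.healthy:
                return server
        raise RuntimeError("no healthy servers available")

lb = RoundRobinBalancer(["app-1", "app-2", "app-3"])
lb.mark_unhealthy("app-2")
print([lb.next_server() for _ in range(4)])  # ['app-1', 'app-3', 'app-1', 'app-3']
```

Once `app-2` passes its health checks again, `mark_healthy` puts it back into the rotation without any other change.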

Caching

Caching stores frequently accessed data in a fast-access location, like a memory cache or a content delivery network (CDN). It’s like a chef keeping their favorite spices within arm’s reach, with no need to rummage through the whole pantry for a pinch of flavor. There are many ways to integrate caching into a System Design:

Distributed cache: A distributed cache is a separate service you add to a System Design. Its only task is to provide a cache to the different system components, which can store various types of data in it. You will often see this component in various architectures, but it is important to understand where and when to use it.
Server-side caching: While a CDN caches content close to users, server-side (in-memory) caching keeps frequently accessed data in memory on or near the application servers. In-memory stores like Redis and Memcached are incredibly fast and can significantly improve an application’s responsiveness.
Client-side caching: This is all about storing the most accessed resources on the client’s end, i.e., their browser, device, etc. In SDIs, it is rare to be asked about client-side caching, since the design focuses on the system itself rather than an individual user. Still, you should design systems that support client-side caching to reduce the load on the system infrastructure.
CDN: A content delivery network (CDN) sits between users and the application’s servers. Its edge servers are placed geographically close to users, so they can fetch frequently accessed resources from a nearby CDN server instead of the distant origin servers.

In the real world, systems combine multiple caching strategies to be successful.

Your specific caching strategy will depend on your application’s unique requirements, such as data size, access patterns, and budget.

Caching at different levels

However, caching isn’t a fix-all solution. There are trade-offs, like cache invalidation (ensuring cached data stays fresh) and potential cache misses (when requested data isn’t cached). Still, with careful planning and implementation, caching can be a powerful tool for improving your system’s performance and user experience.
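To ground the invalidation and cache-miss trade-offs, here is a minimal cache-aside sketch in Python: check the cache first, fall back to the slow store on a miss, and expire entries after a time-to-live (TTL). Cache-aside with TTL is one common pattern among several, and the `load_from_db` callback is a hypothetical stand-in for a real database query. The clock is injectable so the behavior can be demonstrated without waiting.

```python
import time
from typing import Callable, Dict, Tuple

class TTLCache:
    """Cache-aside helper: serve from memory when fresh, otherwise load and store."""

    def __init__(self, ttl_seconds: float, clock: Callable[[], float] = time.monotonic) -> None:
        self.ttl = ttl_seconds
        self.clock = clock
        self._data: Dict[str, Tuple[float, str]] = {}  # key -> (expires_at, value)
        self.hits = 0
        self.misses = 0

    def get(self, key: str, load_from_db: Callable[[str], str]) -> str:
        entry = self._data.get(key)
        if entry is not None and entry[0] > self.clock():
            self.hits += 1
            return entry[1]
        self.misses += 1  # absent or expired: a cache miss hits the slow store
        value = load_from_db(key)
        self._data[key] = (self.clock() + self.ttl, value)
        return value

    def invalidate(self, key: str) -> None:
        """Call on writes so readers don't see stale data until the TTL expires."""
        self._data.pop(key, None)

now = [0.0]  # fake clock for the demo
cache = TTLCache(ttl_seconds=60, clock=lambda: now[0])
cache.get("user:42", lambda k: "Ada")  # miss: loaded from the "database"
cache.get("user:42", lambda k: "Ada")  # hit: served from memory
now[0] = 61.0
cache.get("user:42", lambda k: "Ada")  # miss again: the entry expired
print(cache.hits, cache.misses)        # 1 2
```

A short TTL keeps data fresher at the cost of more misses; explicit invalidation on writes narrows the staleness window further.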

Databases

For starters, your system’s data resides in databases. Selecting the right database (SQL vs. NoSQL) can dramatically impact your application’s performance, scalability, and development efficiency. Understanding the nuances of each model and how they align with your specific data access patterns is essential for building a successful system.

Many candidates struggle with this question: Why an SQL database over a NoSQL database, or vice versa?

SQL Databases

SQL (Structured Query Language) databases, like MySQL and PostgreSQL, are based on a relational model, where data is organized into tables with predefined relationships. They’re excellent for structured data and complex queries, ensuring data integrity and consistency through ACID properties.

Pro Tip: Discuss database indexing to optimize query performance, especially for large datasets.
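Here is a small, self-contained way to see what an index buys you, using SQLite because it ships with Python's standard library. The `urls` table is a hypothetical URL-shortener schema. The sketch asks SQLite for its query plan before and after creating an index on `short_code`: without the index it must scan the whole table, and with it SQLite can jump straight to the matching row. The exact wording of the plan varies between SQLite versions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE urls (id INTEGER PRIMARY KEY, short_code TEXT, long_url TEXT)")
conn.executemany(
    "INSERT INTO urls (short_code, long_url) VALUES (?, ?)",
    [(f"c{i}", f"https://example.com/{i}") for i in range(10_000)],
)

QUERY = "EXPLAIN QUERY PLAN SELECT long_url FROM urls WHERE short_code = ?"

def plan() -> str:
    # The last column of each EXPLAIN QUERY PLAN row is a human-readable description.
    return " | ".join(row[-1] for row in conn.execute(QUERY, ("c42",)))

before = plan()  # full table scan
conn.execute("CREATE INDEX idx_urls_short_code ON urls (short_code)")
after = plan()   # lookup through the index
print(before)
print(after)
```

The trade-off to mention in an interview: indexes speed up reads on the indexed columns but add storage and slow down every write that must update them.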

NoSQL Databases

NoSQL (Not Only SQL) databases, like MongoDB, Cassandra, and DynamoDB, offer a more flexible schema and are designed for horizontal scalability. This makes them ideal for handling large volumes of unstructured or semi-structured data. However, many of them relax some ACID guarantees in exchange for speed and scalability.

Netflix handles over 1 billion hours of weekly video streaming, relying heavily on NoSQL databases for scalability and performance.

The choice between SQL and NoSQL usually comes down to your application’s specific needs. SQL might be the way to go if you need strict data consistency and complex relationships. NoSQL might be better if you prioritize flexibility, scalability, and handling diverse data types.

The database landscape is vast and ever-evolving, with new players and technologies emerging constantly. The key is understanding the fundamental differences between SQL and NoSQL and choosing the right database(s) that align with your system’s requirements.

But what if you want to store specialized data? Well, for that, I recommend the following:

Blob stores: For storing binary data (images, audio, videos, etc.).
Key-value stores: For storing and retrieving values by a unique key.
Time-series databases: For storing and retrieving large amounts of time-stamped data.
Graph databases: For storing and querying complex relationships between data entities.
Vector databases: These are all the rage in 2024. We use them for storing and querying high-dimensional vectors, i.e., mathematical representations (embeddings) of features or attributes. These databases are particularly useful for similarity search and retrieval in applications such as natural language processing, computer vision, and recommender systems.
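The core operation behind a vector database is similarity search. Here is a brute-force Python sketch using cosine similarity over toy three-dimensional "embeddings". The documents and vectors are made up for illustration; real embeddings have hundreds of dimensions, and production vector databases use approximate nearest-neighbor indexes rather than scoring every vector.

```python
import math

def cosine_similarity(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def top_k(query: list[float], items: dict[str, list[float]], k: int = 2) -> list[str]:
    """Brute-force nearest neighbors: score every stored vector, keep the k most similar."""
    ranked = sorted(items, key=lambda name: cosine_similarity(query, items[name]), reverse=True)
    return ranked[:k]

# Toy embeddings: the first dimension loosely means "about cats", the last "about taxes".
docs = {
    "cat care guide": [0.9, 0.1, 0.0],
    "kitten food tips": [0.8, 0.2, 0.1],
    "tax filing basics": [0.0, 0.1, 0.9],
}
print(top_k([1.0, 0.1, 0.0], docs))  # ['cat care guide', 'kitten food tips']
```

Note that "kitten food tips" ranks highly despite sharing no keywords with a query about cats, which is what semantic search relies on.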

SQL vs. NoSQL Databases

Beyond choosing the right database type, there are several other essential factors to consider when designing your system’s database:

Data partitioning: Distribute data across multiple locations or servers based on geographical region, access patterns, or data type. This can improve performance, scalability, and resilience. For instance, store data for users from the US on servers located in the US, while also ensuring data redundancy for availability in case of failures.
Data segregation: Separate different types of data, such as user data, application data, or log data, into distinct databases or collections. This can simplify data management, improve security, and optimize performance.
Data format: Understand the strengths and limitations of different data formats (e.g., relational tables, JSON documents, key-value pairs) and choose the format that best suits your data and access patterns. For example, use an SQL database for structured user data and a blob store for storing images in a blog.
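Here is a minimal Python sketch of the data-partitioning idea: route a user to their region first (geographic partitioning), then hash their ID to pick a shard within that region. The region names and shard lists are hypothetical. A stable hash (MD5 here, used for distribution rather than security) matters, because Python's built-in `hash()` of strings is randomized per process and would send the same user to different shards after a restart.

```python
import hashlib

# Hypothetical layout: each region has its own set of database shards.
SHARDS_BY_REGION = {
    "us": ["us-db-0", "us-db-1", "us-db-2"],
    "eu": ["eu-db-0", "eu-db-1"],
}

def shard_for(user_id: str, region: str) -> str:
    """Pick the region first (geographic partitioning), then hash the key to a shard."""
    shards = SHARDS_BY_REGION[region]
    digest = hashlib.md5(user_id.encode("utf-8")).digest()  # stable across processes
    return shards[int.from_bytes(digest[:8], "big") % len(shards)]

print(shard_for("alice", "us"))
print(shard_for("bob", "eu"))
```

One caveat worth raising in an interview: with simple modulo placement, changing the number of shards remaps most keys, so resharding needs a migration plan.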

Here are some broad applications of the different database types:

SQL Databases	NoSQL Databases	Vector Databases
Enterprise Resource Planning (ERP) systems. Examples: SAP, Oracle ERP. Use case: managing company operations such as accounting, human resources, and supply chain management.	Social media platforms. Example: Facebook. Use case: storing and retrieving large volumes of unstructured data such as blogs, articles, and multimedia.	Fraud detection. Examples: Pinecone, Milvus. Use case: identifying patterns and anomalies in transaction data represented as vectors to detect fraudulent activities.
Customer Relationship Management (CRM) systems. Examples: Salesforce, Microsoft Dynamics. Use case: tracking customer interactions, sales processes, and customer support.	Real-time big data analytics. Examples: Apache Cassandra, MongoDB. Use case: handling large-scale, distributed data for real-time analytics in applications like social media monitoring and IoT data processing.	Semantic search. Example: Qdrant. Use case: improving search results by understanding the semantic meaning of queries and documents rather than relying solely on keyword matching.

Task queues

In modern systems, many tasks don’t need to be executed immediately, or they do but with specific sequencing or prioritization requirements. These tasks range from sending emails and processing images to running complex data analysis. This is where task queues come in handy. Task queues, like Celery, Redis Queue, and AWS SQS (used as a task queue), provide a mechanism to manage and execute background tasks asynchronously.

A task queue in action

How task queues work

Task creation: A task, essentially a unit of work, is defined and added to the task queue.
Task scheduling: The task queue stores and schedules the task for execution based on priority, dependencies, or other criteria.
Task execution: Workers, separate processes or threads, continuously monitor the task queue. When a worker is available, it picks up a task from the queue and executes it in the background.
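The three steps above can be sketched in-process with Python's standard `queue` and `threading` modules. This is a toy model under stated assumptions: the task names are hypothetical, the "work" is just recording the task name, and tasks are enqueued before the worker starts so the output order is deterministic. Real task queues such as Celery or SQS persist tasks outside the application and run workers on separate machines.

```python
import itertools
import queue
import threading

tasks: queue.PriorityQueue = queue.PriorityQueue()  # scheduling: lower number runs first
counter = itertools.count()  # tie-breaker: FIFO within a priority, never compares names
completed: list = []
STOP = None                  # sentinel telling the worker to exit

def submit(priority: int, name) -> None:
    """Task creation: define a unit of work and add it to the queue."""
    tasks.put((priority, next(counter), name))

def worker() -> None:
    """Task execution: repeatedly take the highest-priority task and run it."""
    while True:
        _priority, _seq, name = tasks.get()
        try:
            if name is STOP:
                return
            completed.append(name)  # stand-in for real work (send email, resize image, ...)
        finally:
            tasks.task_done()

submit(2, "generate-weekly-report")
submit(0, "send-password-reset-email")
submit(1, "resize-profile-photo")
submit(99, STOP)  # lowest priority, so it runs after every real task

t = threading.Thread(target=worker)
t.start()
t.join()
print(completed)
# ['send-password-reset-email', 'resize-profile-photo', 'generate-weekly-report']
```

The request handler only pays the cost of `submit`; the slow work happens on the worker, which is what keeps the user-facing path responsive.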

Task queues are a valuable tool for improving modern systems’ performance, scalability, and reliability. They allow you to decouple time-consuming tasks from the main application flow, ensuring a smooth and responsive user experience.

By understanding the critical role of task queues in modern System Design for asynchronous processing, you can architect more efficient, scalable, and resilient systems.

All of these are part of the eight common building blocks I recommend starting with, which I have given the mnemonic name SLIC FAST. I could go into the details of more such building blocks, but you understand the goal now.

Distinguishing between commonly used and specialized components is crucial to designing systems effectively. This approach allows for better focus on the overall design and facilitates easier modifications when needed.

It can be easy to forget some essential system components under pressure, making your design fall short. To help prevent missing any important components, you can start with a foundation of the most common components in a system.

The table below contains important building blocks to design systems effectively.

	Component	Description	Use Case
S	Search System	Builds a searchable index of available data.	Enables efficient search and retrieval of information within a system.
L	Load Balancer	Distributes incoming network traffic across multiple servers to prevent overload and ensure high availability.	Improves scalability, performance, and reliability of web applications by preventing any single server from becoming a bottleneck.
I	Interaction with a CDN	Enables integration with a Content Delivery Network (CDN) to cache and deliver content from edge servers.	Improves website performance, reduces latency, and enhances the user experience by serving content from servers geographically closer to the user.
C	Cache	Temporary storage of frequently accessed data to reduce the need to fetch it from slower storage.	Improves system performance and reduces response times by storing frequently accessed data in memory for faster retrieval.
F	Front-end Servers	Handle client requests, serve static content, and interact with back-end servers for dynamic content.	Act as the first point of contact for users accessing a web application and are responsible for delivering the user interface and handling user interactions.
A	Analytics	Collects and analyzes data on user behavior, system performance, and other metrics.	Provides insights into how users interact with a system, identifies areas for improvement, and helps make data-driven decisions to optimize performance and user experience.
S	Storage	Stores data persistently for later retrieval and use.	Provides a reliable and scalable way to store and manage data in a variety of formats, including structured, unstructured, and semi-structured data.
T	Task Queue	Manages and schedules tasks that need to be executed asynchronously.	Enables efficient processing of background tasks, improves system responsiveness, and ensures that time-consuming tasks do not block the main application thread.

In Educative’s popular Grokking Modern System Design course, there are sixteen essential System Design building blocks (including the eight above), such as pub-sub, a typeahead system, and DNS, covered with real-world scenarios and designs. Mastering them is a solid first step towards acing your SDI.

Let’s see where these key concepts and building blocks fit into the phases of a real-world System Design Interview.

The phases of a System Design Interview

I still remember the feeling of walking into the interview room for my first System Design Interview — the nervousness, not knowing what questions could be asked, etc. But over time (especially after I started conducting interviews myself), I realized that cracking the SDI isn’t about memorizing obscure trivia; it’s about understanding the interviewer’s expectations and demonstrating your ability to tackle complex problems systematically.

Having personally led hundreds of System Design Interviews, I suppose I have a bit of an insider’s take on what interviewers really want to see from candidates. So let’s break down the interview phases and explore the key elements of the interview that can make or break your performance.

Requirements

The initial phase is your chance to shine as an active listener and critical thinker. Don’t just passively absorb the problem statement — feel free to bombard the interviewer with clarifying questions (within reason).

Remember, no clarifying question is too unusual to ask. A well-placed question can uncover hidden assumptions, reveal potential bottlenecks, and set the stage for a successful design.

Here are a few questions you should be considering at this phase:

What are the core functionalities of the system?
Who are the primary users, and how many of them access the system concurrently?
What are the expected traffic volumes?
Are there any constraints we need to consider?

The expected output of this phase is a list of all the functional and non-functional requirements, which means you have now scoped your design problem.

Example:

Functional requirements for a real-time chat system:

Send a message
Read status
Offline storage (when no network is present)
Group chats

Non-functional requirements:

Availability: The system should be highly available to the users
Scalability: The system should scale with an increasing number of users and chats
Security: A user should only be able to access their chats

Estimations

In this phase, you’ll be expected to estimate various system resource requirements, such as storage capacity, network bandwidth, or query latency. Don’t worry about being 100% accurate; the interviewer is more interested in your thought process and ability to make reasonable assumptions.

Here are a few tips for success:

Round numbers generously—it’s easier to work with orders of magnitude.
Use Fermi estimations—break down complex problems into smaller, more manageable chunks.
Don’t be afraid to ask for clarification or additional information.

The three critical things to estimate are incoming/outgoing bandwidth, storage, and the number of servers required to handle user requests. Remember, this is a rough estimate, so don’t spend too much time on it.

High-level design

This is where you start by identifying the key components of the system and their relationships. Sketch a simple diagram to illustrate the overall flow of data and control. You will need to talk the interviewer through your thinking process so that once you start drawing the design, the interviewer is on the same page.

In my experience, most candidates fail not because they lack technical skills but because they don’t know how to frame their thinking in a way that resonates with the interviewer.

Key considerations for a solid high-level design:

Correct component identification: Have you identified the essential building blocks fundamental to the system’s operation?
Specialized component integration: If needed, have you incorporated specialized components to handle specific requirements (e.g., an encoder for TinyURL or a transcoder for YouTube)?
Proper interconnections: Does the data and control flow between components follow a logical and efficient path, ensuring user requests are handled correctly?
Functional requirements fulfillment: Does the high-level design address the core functionalities that the system needs to deliver?

The output of this phase is a clear and concise conceptual design that outlines the major components, their interactions, and how they contribute to fulfilling the system’s primary functions. This design should be readily understandable by the interviewer and serve as a solid foundation for the more detailed design phases.

Note: At this stage, avoid delving into the specifics of implementation details like database technologies, sharding strategies, or specific algorithms. Those decisions are typically addressed in later, more detailed design phases.

Detailed design

In the detailed design phase (sometimes called the “deep dive”), you’ll examine the technical aspects of your system, elaborate on your initial choices, and solidify the architectural foundation.

This involves carefully considering your algorithms and explaining how you’ll organize and store information to optimize performance and memory usage. Additionally, you’ll discuss the specific techniques and frameworks you’ll leverage to implement the system’s functionality, justifying your choices in the context of the system’s requirements.

High-level design vs. detailed design

Furthermore, articulate how you’ll represent data and interactions within the system, emphasizing the models and protocols that will ensure consistency and interoperability. Detail the API endpoints enabling external systems and users to seamlessly interact with your design, clearly defining input/output formats, error codes, and authentication mechanisms.

Robust error handling and reliability are paramount, so explain how you’ll handle errors, exceptions, and failures to guarantee the system’s stability and resilience. This is also where you’ll extend your high-level design to meticulously address non-functional requirements, such as scalability, availability, maintainability, security, and cost-effectiveness.

Remember, the interviewer evaluates your technical depth and ability to think through edge cases.

Here are some tips to perform your best:

Be sure to justify your design choices and explain the tradeoffs involved.
Improve your design with a second iteration (if not now, then in the next phase: discussion).
Showcase your area of expertise to increase chances of up-leveling, such as diving deep into security if you are a cybersecurity expert.
Focus on achieving the non-functional requirements here; the functional requirements should already be covered by your high-level design.

This phase’s expected output is the system’s final design with all the key components and interactions.

Discussion

The last phase is your opportunity to demonstrate your ability to reflect critically on your design. Discuss potential bottlenecks, scalability challenges, and areas for improvement. Be honest about your limitations and show a willingness to learn and iterate.

Remember, interviewers aren’t just using the System Design Interview to see if you know the “right” answer. They are looking for you to showcase your problem-solving skills, technical expertise, and ability to communicate and collaborate effectively with your interviewer.

There are no right or wrong answers in a System Design Interview — only answers that are adequately or inadequately justified.

It’s possible that the interviewer changes constraints during the discussion. You may also get curveball questions here. In these moments, there will be no substitute for solid preparation.

To help you budget your time during your answer, here is a quick template I created for junior, mid-level, and senior engineers attempting SDIs:

Aspect	Junior Engineer	Mid-Level Engineer	Senior Engineer
Requirements	10 minutes	7 minutes	5 minutes
Estimations	5 minutes	5 minutes	5 minutes
High-level design	10 minutes	7 minutes	5 minutes
Detailed design	10 minutes	10-15 minutes	15-20 minutes
Discussion	10 minutes	10-15 minutes	15 minutes

All of this can be overwhelming for beginners. The pressure of the interview, the back-and-forth discussions, and the fear of missing crucial details can lead to nervousness and confusion. It’s easy to lose track of the different aspects of the design or get caught in unproductive tangents, wasting valuable time.

That’s where a structured approach like RESHADED comes in. By providing a clear roadmap for navigating the System Design Interview, RESHADED helps you stay focused and organized. Knowing the specific steps to follow and the key areas to cover ensures that you address all angles of the problem, manage your time effectively, and present a well-rounded solution to the interviewer.

Part 2: The RESHADED approach

In the high-stakes environment of a System Design Interview, you’ll encounter various challenges, from deciphering ambiguous requirements to designing scalable, reliable solutions under time pressure. While each design problem is unique, it will inevitably share a few common aspects that you must be prepared to address throughout the interview.

A structured approach is essential to navigate this complexity. It’s not enough to have a deep understanding of system design principles; you need a strategy for applying that knowledge effectively within the interview constraints.

RESHADED is that strategy. It’s a comprehensive framework that guides you through each phase of the system design interview, ensuring you cover all the critical aspects of the problem and present a well-rounded solution.

SDI phases mapped to RESHADED

RESHADED isn’t just a word—each letter represents a critical aspect of System Design to consider during your interview:

Requirements clearly define the problem, scope, and user needs.
Estimation helps with approximating the real-world numbers required to run the system.
Storage helps choose appropriate data storage mechanisms and structures.
High-level design outlines the 1000-foot system architecture.
APIs help design clear and concise interfaces for communication.
Detailed design helps dive deeper into specific components and their interactions.
Evaluation assesses how well your design meets the requirements and discusses trade-offs, bottlenecks, and improvements.
Distinctive component discusses one or more unique features, services, or aspects of the system in detail.

By following RESHADED, you’ll demonstrate a systematic approach and a deep understanding of modern System Design principles, which will give you a good chance of impressing even the toughest interviewers. Whether aiming for a senior software engineering role or seeking to elevate your career, mastering RESHADED is your key to success in the ever-evolving tech landscape.

Pro tip: Use RESHADED as a mental checklist during your interview to ensure you cover all essential aspects of System Design.

I will take you through each of the steps one by one.

1) Requirements

Understanding requirements is the cornerstone of successful System Design. Think of it as laying the foundation before building a house – get this wrong, and everything crumbles.

Functional requirements

These are the core features that define what your system does. For instance, matching riders with drivers is a fundamental functional requirement in a ride-sharing app (like in the classic SDI problem, Design Uber). Without these functional requirements met, your system is just as the name would imply: not functional.

You don’t have to be a magician to extract this information in an interview. It is all about asking the right clarification questions. Let’s look at some examples of what you could ask the interviewer.

“Can you elaborate on the core use cases the system should support?”
“Are there any specific edge cases we need to consider?”

Non-functional requirements

While not directly tied to features, these are equally vital, encompassing performance, scalability, security, and reliability. Remember the key concepts of System Design from before? This is where things like availability come into play.

They define how well your system will operate. In the ride-sharing example, ensuring fast response times during peak hours is a critical non-functional requirement (we would call that good performance). Let’s look at some examples of what you could ask the interviewer.

“Are there any requirements for data backup and disaster recovery?”
“How crucial is data security and privacy for this system?”

Asking insightful questions during the interview isn’t just encouraged—it’s expected. It demonstrates your ability to think critically and grasp the nuances of the problem at hand. Don’t hesitate to seek clarification on both functional and non-functional aspects.

Common pitfalls and pro tips

Skipping the basics: Don’t jump into solutions before thoroughly understanding the requirements. A clear foundation is essential.
Neglecting non-functional aspects: Focusing solely on features can lead to a system that’s brilliant on paper but fails in real-world scenarios.
One-size-fits-all mentality: There’s no universal template for requirements. Each system is unique, so tailor your approach accordingly.
Ask often: Sometimes interviewers withhold information, expecting you to ask them. It is essential to ask questions instead of making assumptions about the requirements.

Once you have a solid grasp of the requirements, it’s time to move on to the next phase: estimation. This is where you’ll start quantifying the scale and scope of your design, paving the way for a robust and efficient system.

2) Estimation

This step is crucial, as it directly impacts technology choices, performance optimization, and, ultimately, the feasibility of your design. The three critical things to estimate are incoming/outgoing bandwidth, storage, and the number of servers required to handle user requests. Remember, this is a rough estimate, so don’t spend too much time on it.

Back-of-the-envelope calculations

Estimation is not about precise numbers but about making informed, reasonable approximations. These back-of-the-envelope calculations (BOTEC) help you understand the order of magnitude of various resources.

Here are some of the questions an interviewer could ask to estimate the resources needed:

Estimating the number of servers: How many daily active users (DAU) do you expect to support?
Estimating the daily storage requirements: How many tweets are posted daily, and what is the percentage of tweets containing media?
Estimating network requirements: What is the maximum response time expected by the end user?

Making assumptions is not only okay—it’s often necessary. You won’t have access to all the data in a time-constrained interview setting. By stating your assumptions clearly and justifying them based on available information or industry standards, you demonstrate your ability to think critically and adapt to uncertainty.

Here are some sample types of questions that you would be expected to answer during an interview (even if the interviewer doesn’t ask them):

“How many servers would you need to handle 1 million daily active users?”
“What storage capacity would be sufficient for a social media platform with 10 million daily posts?”
“How much network bandwidth is required to achieve a sub-second response time?”

Let’s say in your SDI, you have to calculate Instagram’s storage requirements.

Assume the following:

The total number of users uploading a post per day is 1 million.
The size of an image uploaded to Instagram, on average, is 1 MB.
The size of a video uploaded on average is 20 MB.
The textual content size per post is 2 KB.

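Assuming half of the daily posts include an image and the other half include a video, the daily storage requirement is:

$$0.5\,\text{M} \times 1\,\text{MB} + 0.5\,\text{M} \times 20\,\text{MB} + 1\,\text{M} \times 2\,\text{KB} = 0.5\,\text{TB} + 10\,\text{TB} + 0.002\,\text{TB} = 10.502\,\text{TB}$$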

Estimated storage space required for Instagram in a day


You might also need to elaborate to the interviewer:

The above calculations cover the storage requirements for a single day only. For a content-heavy service like Instagram, one day’s worth is small compared with what accumulates over months and years. Also note that we haven’t yet accounted for any user or application data.
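A short script can double-check this arithmetic and extend it beyond a single day. This is a sketch using the assumptions listed above (1 million posts per day, split evenly between images and videos, decimal units); the 365-day extrapolation is an added illustration that ignores user growth, replication, and compression, and `daily_storage_bytes` is a hypothetical helper name.

```python
# Decimal (SI) units, as in the estimate above: 1 KB = 10^3 B, 1 MB = 10^6 B, 1 TB = 10^12 B.
KB, MB, TB = 10**3, 10**6, 10**12


def daily_storage_bytes(image_posts: int, video_posts: int,
                        image_size: int, video_size: int, text_size: int) -> int:
    """Bytes of new content per day; every post carries text plus one image or one video."""
    media = image_posts * image_size + video_posts * video_size
    text = (image_posts + video_posts) * text_size
    return media + text


daily = daily_storage_bytes(500_000, 500_000, 1 * MB, 20 * MB, 2 * KB)
print(f"Per day:  {daily / TB:.3f} TB")         # Per day:  10.502 TB
print(f"Per year: {daily * 365 / TB:,.0f} TB")  # Per year: 3,833 TB
```

Even at this modest upload rate, a year of content lands in the petabyte range, which is why the daily figure alone understates the storage problem.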

The purpose of BOTEC isn’t to accurately estimate the resources required but to develop an effective System Design. Remember these important numbers during resource estimations:

Important Latencies

Operation	Time (nanoseconds)
L1 cache reference	0.9
L2 cache reference	2.8
L3 cache reference	12.9
Main memory reference	100
Compress 1 KB with Snzip	3,000 (3 microseconds)
Read 1 MB sequentially from memory	9,000 (9 microseconds)
Read 1 MB sequentially from SSD	200,000 (200 microseconds)
Round trip within the same data center	500,000 (500 microseconds)
Read 1 MB sequentially from a ~1 GB/s SSD	1,000,000 (1 millisecond)
Disk seek	4,000,000 (4 milliseconds)
Read 1 MB sequentially from disk	2,000,000 (2 milliseconds)
Send packet SF → NYC	71,000,000 (71 milliseconds)
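These numbers are most useful when combined. As a sketch, the snippet below uses the sequential-read figures from the table to estimate how long it takes to scan 1,000 MB from each medium; the dictionary and function names are hypothetical.

```python
NS_PER_MS = 1_000_000

# Nanoseconds to read 1 MB sequentially, taken from the latency table above.
READ_1MB_NS = {"memory": 9_000, "ssd": 200_000, "disk": 2_000_000}


def sequential_read_ms(megabytes: int, medium: str) -> float:
    """Estimated time in milliseconds to read `megabytes` sequentially from `medium`."""
    return megabytes * READ_1MB_NS[medium] / NS_PER_MS


for medium in READ_1MB_NS:
    print(f"{medium}: {sequential_read_ms(1_000, medium):,.0f} ms")
# Prints 9 ms for memory, 200 ms for SSD, and 2,000 ms for disk.
```

The order-of-magnitude gaps are the point: they explain why hot data is kept in memory and why a design that scans disk on every request will struggle at scale.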

Most people underestimate the importance of estimations in a System Design Interview. That’s understandable, since it is a “design” interview. However, estimations can tip an interviewer’s Hire or No Hire decision. A capable software engineer in 2024 should know more than System Design theory; they should know how to apply it in the real world.

3) Storage schema

While not always mandatory, discussing your storage schema can be a strategic move in system design interviews, especially when dealing with complex data models or performance-critical scenarios.

Consider discussing your storage schema when:

Data is highly normalized: If your data involves intricate relationships and a high degree of normalization, outlining the schema can demonstrate your ability to handle complexity.
Diverse data formats: When different parts of the data are best kept in different kinds of stores (e.g., relational databases, NoSQL databases, key-value stores), showcasing your schema can highlight your versatility.
Performance concerns: If efficient data storage and retrieval are paramount, explaining your schema can illustrate your optimization strategies.

Let’s consider our ride-sharing app. We might utilize multiple storage solutions to optimize for different aspects:

Relational database (e.g., PostgreSQL):
- Users table: Stores user profiles (ID, name, contact info, payment details, ratings).
- Trips table: Records trip details (ID, user ID, driver ID, origin, destination, time, fare, status).
- Drivers table: Stores driver profiles (ID, name, contact info, vehicle info, ratings, availability status).
Geospatial database (e.g., MongoDB with geospatial indexing):
- Locations collection: Tracks real-time locations of drivers and riders for efficient matching and route calculation.
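As a sketch, the relational part of this schema could look like the DDL below. SQLite (via Python’s built-in `sqlite3` module) stands in for PostgreSQL so the example runs anywhere; the exact column names, types, and status values are assumptions, and the real-time locations collection is left out.

```python
import sqlite3

SCHEMA = """
CREATE TABLE users (
    id           INTEGER PRIMARY KEY,
    name         TEXT NOT NULL,
    contact_info TEXT,
    payment_ref  TEXT,  -- hypothetical: a payment-provider token rather than raw card data
    rating       REAL
);
CREATE TABLE drivers (
    id           INTEGER PRIMARY KEY,
    name         TEXT NOT NULL,
    contact_info TEXT,
    vehicle_info TEXT,
    rating       REAL,
    is_available INTEGER NOT NULL DEFAULT 0
);
CREATE TABLE trips (
    id           INTEGER PRIMARY KEY,
    user_id      INTEGER NOT NULL REFERENCES users(id),
    driver_id    INTEGER REFERENCES drivers(id),  -- NULL until a driver accepts
    origin       TEXT NOT NULL,
    destination  TEXT NOT NULL,
    requested_at TEXT NOT NULL,
    fare         REAL,
    status       TEXT NOT NULL
                 CHECK (status IN ('requested', 'accepted', 'completed', 'cancelled'))
);
"""

conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)
conn.execute("INSERT INTO users (id, name) VALUES (1, 'Ada')")
conn.execute(
    "INSERT INTO trips (user_id, origin, destination, requested_at, status) "
    "VALUES (1, 'A St', 'B Ave', '2024-01-01T09:00', 'requested')"
)
print(conn.execute("SELECT id, status FROM trips").fetchall())  # [(1, 'requested')]
```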

With a clear storage schema in place, it’s time to move on to the high-level design of the system. You’ll define the major components and how they interact, creating the blueprint for a robust and scalable ride-sharing platform.

4) High-level design

With requirements and estimations in hand, it’s time to move on to the high-level design—the architectural blueprint of your system. This stage is all about identifying the major components and their interactions, laying the groundwork for a scalable, efficient, and maintainable solution.

Key components and their interactions

At the heart of every high-level design is a breakdown of the system into its constituent parts. These components could be modules, services, APIs, databases, or any other logical building blocks. The key is to strike a balance between granularity and abstraction. Don’t get bogged down in implementation details at this stage. Use abstractions to focus on the overall functionality and communication patterns between components.

This is where you will use the eight SLIC FAST building blocks described earlier, plus the other eight from the Grokking Modern System Design Interview for Engineers and Managers course.

The best approach depends on the problem’s specific requirements and constraints. You’ll demonstrate your ability to think critically and make informed decisions by clearly articulating your design choices and the reasoning behind them.

5) APIs

With your high-level design in place, it’s time to focus on the interfaces that enable communication within your system—the APIs. These are the bridges that allow your components to exchange information seamlessly.

API architectural styles

Selecting the right architectural style is a pivotal decision, each with its strengths and weaknesses:

REST (REpresentational State Transfer) is the most popular choice for its simplicity, scalability, and broad industry adoption. REST APIs are stateless, resource-oriented, and use standard HTTP methods (GET, POST, PUT, DELETE).
gRPC is ideal for high-performance, low-latency communication. It uses Protocol Buffers for efficient serialization and offers bidirectional streaming capabilities.
GraphQL is perfect for complex data requirements where clients need flexibility in requesting specific data. GraphQL minimizes over-fetching and under-fetching of data.

Your choice will depend on various factors, such as the nature of your application and its performance requirements. You may also decide whether your system is monolithic or microservice-based.

Translating requirements into APIs

Remember, your APIs embody your system’s functionality. Carefully translate your functional requirements into corresponding API calls, ensuring each call has a clear purpose and well-defined parameters.

Consider these functional requirements for a ride-sharing app:

Requirement: Users should be able to request a ride
API call: POST /rides (with origin, destination, and user information)
Requirement: Drivers should be able to accept ride requests
API call: PUT /rides/{ride_id}/accept (with driver information)
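Here is a framework-free sketch of how these two calls might behave, with an in-memory dictionary in place of a database. The function names, the status values, and the HTTP status codes chosen (201 Created, 400, 404, 409 Conflict) are illustrative assumptions; a real service would sit behind a web framework and add authentication.

```python
import itertools

rides: dict = {}
_next_id = itertools.count(1)


def create_ride(body: dict) -> tuple:
    """POST /rides: a rider requests a ride."""
    missing = {"origin", "destination", "user_id"} - set(body)
    if missing:
        return 400, {"error": f"missing fields: {sorted(missing)}"}
    ride_id = next(_next_id)
    rides[ride_id] = {**body, "id": ride_id, "status": "requested", "driver_id": None}
    return 201, rides[ride_id]


def accept_ride(ride_id: int, body: dict) -> tuple:
    """PUT /rides/{ride_id}/accept: a driver accepts a pending request."""
    ride = rides.get(ride_id)
    if ride is None:
        return 404, {"error": "ride not found"}
    if ride["status"] != "requested":
        return 409, {"error": "ride is no longer open"}  # e.g., another driver got there first
    ride.update(status="accepted", driver_id=body["driver_id"])
    return 200, ride


status, ride = create_ride({"origin": "A St", "destination": "B Ave", "user_id": 7})
print(status, ride["status"])                         # 201 requested
print(accept_ride(ride["id"], {"driver_id": 42})[0])  # 200
print(accept_ride(ride["id"], {"driver_id": 43})[0])  # 409
```

Returning 409 for a second acceptance is one way to make the "only one driver per ride" rule explicit in the API contract.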

By thoughtfully designing your APIs, you’ll create a seamless experience for developers and end-users, ultimately contributing to your system’s success.

6) Detailed design

In the detailed design phase, you’ll transform the high-level blueprint into a working plan. This involves breaking down each major component into smaller, more manageable modules.

For example, a content platform’s recommendation engine might be divided into modules for content analysis, user profiling, and recommendation generation. You’ll then define the interfaces and data flows between these modules.

You’ll need to dive into the specifics of data storage and retrieval. If you’re building a social network, you might choose a graph database to model relationships between users and content, optimizing for fast traversal and analysis. If you’re designing a real-time bidding system for an ad platform, you might opt for an in-memory database like Redis to ensure lightning-fast responses.

The most critical thing interviewers look for in this phase is a design that demonstrably satisfies the functional and non-functional requirements from the first phase, with each choice justified against them. As your experience as a software engineer increases, you are expected to focus more on meeting non-functional requirements through your design. That may include techniques like sharding databases for better scalability and performance. Ideally, you should spend half or more of your time in this phase on non-functional requirements.
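As an illustration of sharding, here is a minimal hash-based shard router. It assumes a fixed number of shards and uses a stable hash (MD5 from `hashlib`) rather than Python’s built-in `hash()`, which is randomized per process for strings. Simple modulo placement remaps most keys when the shard count changes (going from 4 to 5 shards moves about 80% of keys); consistent hashing is the usual remedy and is not shown here. `shard_for` and `NUM_SHARDS` are hypothetical names.

```python
import hashlib

NUM_SHARDS = 4  # assumed fixed shard count


def shard_for(key: str, num_shards: int = NUM_SHARDS) -> int:
    """Map a key (e.g., a user ID) to a shard index, identically on every server."""
    digest = hashlib.md5(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_shards


for user_id in ["user:1", "user:2", "user:3"]:
    print(user_id, "-> shard", shard_for(user_id))
```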

Think from the user’s perspective. Which requirements would they need the most in order to have a positive experience? Is it availability, performance, or reliability?

For a system like Instagram, for example, performance and scalability are paramount. You’ll analyze potential bottlenecks, such as database queries or network latency, and design solutions to mitigate them. This might involve caching frequently accessed data, using load balancers to distribute traffic, or handling computationally intensive tasks asynchronously, such as processing (e.g., transcoding) a newly uploaded reel.

Instagram’s critical NFRs
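Asynchronous processing usually means the upload request only enqueues work, while background workers do the expensive part. The sketch below models that with `asyncio.Queue` and two workers; the `handle_upload` and `worker` names, the reel IDs, the "202 Accepted" response string, and the `asyncio.sleep` stand-in for transcoding are all hypothetical. In production, the in-process queue would be a durable task queue or message broker.

```python
import asyncio


async def handle_upload(queue: asyncio.Queue, reel_id: str) -> str:
    """Enqueue the heavy work and respond immediately; the client does not wait."""
    await queue.put(reel_id)
    return f"202 Accepted: {reel_id}"


async def worker(queue: asyncio.Queue, processed: list) -> None:
    while True:
        reel_id = await queue.get()
        await asyncio.sleep(0.01)  # stand-in for transcoding / thumbnail generation
        processed.append(reel_id)
        queue.task_done()


async def main() -> list:
    queue: asyncio.Queue = asyncio.Queue()
    processed: list = []
    workers = [asyncio.create_task(worker(queue, processed)) for _ in range(2)]
    for reel in ["reel-1", "reel-2", "reel-3"]:
        print(await handle_upload(queue, reel))
    await queue.join()  # wait until every queued reel has been processed
    for w in workers:
        w.cancel()
    await asyncio.gather(*workers, return_exceptions=True)
    return processed


processed = asyncio.run(main())
print(sorted(processed))
```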

Security and availability are also critical considerations. To protect user data and prevent unauthorized access, you must implement robust authentication and authorization mechanisms. Input validation, data sanitization, and encryption are also essential to ensure the integrity and confidentiality of sensitive information. For availability, you will need redundancy, CDNs, and caching, accepting the extra cost and complexity as a trade-off.

By thoroughly considering all aspects of your system – from component interactions to data storage, performance optimization to security measures – you’ll create a solid foundation for a reliable, efficient, and secure software solution.

7) Evaluation

In the evaluation phase, you must step back and view your design critically. Does your system truly meet the needs it was designed for? Are there any hidden vulnerabilities or performance issues that could arise under real-world conditions?

Here, you will need to discuss the trade-offs in your design, identify bottlenecks, and suggest improvements. Two common trade-offs are:

Performance vs. scalability: A design optimized for raw performance might not scale well under heavy load. Conversely, a highly scalable design might sacrifice some performance for flexibility.
Reliability vs. cost: Building a highly reliable system often involves redundancy and fault tolerance, which can increase costs. Finding the right balance is crucial.

Remember, the evaluation phase is not about finding fault with your design. It’s about demonstrating your ability to think critically, analyze complex systems, and make informed decisions that lead to the most robust and effective solution possible.

Here is a three-step approach to ace any SDI evaluation phase:

Start by revisiting the original requirements. Can you confidently say that your design fulfills each one? Be prepared to justify your choices with concrete examples and evidence. If there are any gaps, acknowledge them and discuss potential solutions.
Perform bottleneck analysis. Imagine your system under stress—what parts are most likely to buckle under pressure? Are there any single points of failure that could bring down the entire system? Identifying these weaknesses allows you to proactively address them before they become real problems.
As you analyze your design, you’ll inevitably encounter trade-offs. Perhaps you sacrificed some speed for greater scalability, or you opted for a simpler solution at the expense of some flexibility. Be transparent about these trade-offs, explaining the reasoning behind your decisions.
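Bottleneck and single-point-of-failure analysis often comes down to simple availability arithmetic. Assuming independent failures, components in series multiply their availabilities, while a group of $N$ redundant replicas is down only if all of them are down, giving $1 - (1 - A)^N$. The three-tier path and the 99.9% figures below are hypothetical.

```python
import math


def serial(*availabilities: float) -> float:
    """A request path that needs every component to be up."""
    return math.prod(availabilities)


def redundant(a: float, replicas: int) -> float:
    """Identical replicas; down only if every replica is down (independent failures assumed)."""
    return 1 - (1 - a) ** replicas


single = serial(0.999, 0.999, 0.999)                    # load balancer -> app server -> database
replicated = serial(0.999, 0.999, redundant(0.999, 2))  # the database gets one replica
print(f"{single:.4%} vs {replicated:.4%}")              # 99.7003% vs 99.8000%
```

Both sides of the reliability-vs-cost trade-off show up: the replica doubles the database footprint for about a 0.1-percentage-point gain, because the load balancer and app server are still single points of failure.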

8) Distinctive component

Finally, here’s the secret sauce to any pro-level System Design Interview answer.

Every System Design has unique components or features. That’s why we added a distinctive component section to the RESHADED guideline. Depending on the problem, you may need to add a unique feature to your system to meet the functional or non-functional requirements.

For example, if you’re designing a video streaming platform, you might introduce a machine learning-powered adaptive bitrate algorithm that dynamically adjusts video quality based on network conditions. This optional step is your chance to set yourself apart from other candidates, showcasing your ability to think outside the box and develop novel solutions.

Pro tip: If you have time, briefly mention a unique feature you would add to your design to improve it further.

To bring RESHADED to life, let’s explore a practical case study: designing a URL shortener like TinyURL. This seemingly simple service presents a fascinating array of design challenges and opportunities for innovation. By applying RESHADED, we can systematically dissect the requirements, estimate the scale, devise a storage schema, architect the system, define APIs, and delve into the detailed design. Along the way, we’ll uncover potential bottlenecks, evaluate trade-offs, and consider distinctive features that could set our URL shortener apart.


Part 3: The TinyURL case study

We will apply what we learned in the previous two sections in this case study, using Design TinyURL as the sample System Design problem. I will present it as a mock interview between an interviewer and a candidate.

Interviewer: I want you to design a system similar to TinyURL.

Candidate: Alright, TinyURL is a URL-shortening service that produces short aliases for long URLs, commonly called short links. Upon clicking, these short links direct to the original URLs.

TinyURL shortens over 1 million URLs every day.

Interviewer: Yes, that’s correct. How will you proceed with the design?

Candidate: Let’s start with listing down the requirements.

Your first response could be, “What is TinyURL?” This question not only clarifies the system you are designing but also gives you early insight into the interviewer’s mind, revealing what aspects of the system they value most.

Remember RESHADED? Your first instinct should be to start with the R in RESHADED, which is requirements.

Requirements

Before giving the interviewer a list of requirements, it is important to ask the interviewer about the scope of the system, as that can signal to you what requirements the interviewer deems necessary. In any case, to ace this System Design Interview, you must nail down what your TinyURL-like service needs to do and how it should perform under pressure.

Interviewer: Sounds good. What requirements are the most relevant for a system like TinyURL?

Candidate: I think these are the most important requirements:

Must-haves

Shortening: Rapidly convert lengthy URLs into concise, unique short links.
Redirection: Instantly redirect users from short links to the original, full-length URLs.
Expiry: Handle deletion of URLs after an expiry time.

Nice-to-haves

Customization: Let users create personalized short links (think vanity URLs).
Link management: Allow users to edit, delete, or set expiration dates for their short links.
Analytics: Track clicks and usage patterns for each short link (optional but valuable).

Interviewer: Okay, what about the non-functional requirements?

Candidate: Sure, those will include the following list:

Non-functional requirements

High availability: Redirection must keep working without downtime; an outage breaks every short link at once.
Scalability: The system should grow gracefully as traffic increases.
Readability: Short links should be easy to type and remember.
Low latency: Snappy redirects create a seamless user experience.
Unpredictability: No sequential short links – this enhances security.

Interviewer: These sound like the right requirements for a service like TinyURL. So, based on the requirements, what do you plan on doing after sorting them out?

After telling the interviewer the requirements you think are necessary, take a moment to scope out which requirements are the most essential and how many of them you can fit into the interview (as it usually only lasts 45 minutes). The interviewer may tell you what requirements you can leave out for now (if you are lucky). These requirements go far beyond a simple checklist, as they will shape the rest of your interview. Once you have sorted these out, you can move on to estimations.

Estimation

Candidate: Now that we are done with the requirements, I will do some estimations.

Interviewer: Alright, how will you proceed?

Candidate: Let me start by assuming the following:

Assumptions

We assume that the URL shortening-to-redirection request ratio is 1:100. In other words, links are visited far more often than they are created: the system handles about a hundred times more reads than writes.
There are 200 million new URL shortening requests per month.
- So, there are 76 URL shortening requests and 7.6K URL redirection requests per second.

See the complete calculations in Educative’s flagship course on System Design!

A URL shortening entry requires 500 bytes of database storage.
Each entry will have a maximum of five years of expiry time unless explicitly deleted.
There are 100 million daily active users (DAU).
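The per-second rates above follow from the monthly volume. A minimal sketch, assuming an average month of 365/12 days (about 2.63 million seconds) and 100 redirections per new short link, the read-heavy ratio implied by 76 versus 7.6K requests per second:

```python
SECONDS_PER_MONTH = 365 * 24 * 3600 / 12  # average month, about 2.63 million seconds
NEW_URLS_PER_MONTH = 200_000_000
REDIRECTS_PER_SHORTENING = 100             # read-heavy: 100 redirections per new short link

shortenings_per_s = NEW_URLS_PER_MONTH / SECONDS_PER_MONTH
redirects_per_s = shortenings_per_s * REDIRECTS_PER_SHORTENING
print(f"{shortenings_per_s:.0f} shortenings/s, {redirects_per_s / 1000:.1f}K redirections/s")
# 76 shortenings/s, 7.6K redirections/s
```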

Interviewer: Okay, so what will be your storage requirements?

Candidate: Here is what I am thinking for the storage:

Storage estimation

Since entries are saved for 5 years and there are 200 million entries per month, the total number of entries will be approximately 12 billion.

$$200\,\text{million/month} \times 12\,\text{months/year} \times 5\,\text{years} = 12\,\text{billion URL shortening requests}$$

Since each entry is 500 bytes, the total storage estimate would be 6 TB:

$$12\,\text{billion} \times 500\,\text{bytes} = 6\,\text{TB}$$

Interviewer: Okay, do you think a 5-year expiry time is feasible?

Candidate: That seems pretty standard compared to real-world products. For comparison, let’s also estimate for a single year.

$$200\ \text{million/month} \times 12\ \text{months/year} \times 1\ \text{year} = 2.4\ \text{billion URL shortening requests}$$

Since each entry is 500 Bytes, the total storage estimate would be 1.2 TB:

$$2.4\ \text{billion} \times 500\ \text{bytes} = 1.2\ \text{TB}$$
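Both storage estimates can be checked with a short script. This sketch only encodes the guide's own assumptions (200 million new entries per month, 500 bytes per entry) and uses decimal terabytes (10^12 bytes), as the text does; `storage_bytes` is a name we chose for illustration.

```python
# Storage estimate for the URL mapping table under the guide's assumptions.
NEW_ENTRIES_PER_MONTH = 200_000_000
BYTES_PER_ENTRY = 500


def storage_bytes(years: int) -> int:
    """Bytes needed to keep every entry created over `years` years."""
    entries = NEW_ENTRIES_PER_MONTH * 12 * years
    return entries * BYTES_PER_ENTRY


for years in (1, 5):
    terabytes = storage_bytes(years) / 10**12  # decimal TB, as in the text
    print(f"{years}-year expiry: {terabytes:.1f} TB")
# 1-year expiry: 1.2 TB
# 5-year expiry: 6.0 TB
```

This counts raw entries only; replication and index overhead would multiply the total.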

Interviewer: Looks good to me. Let’s move on to the network load of this application.

Candidate: Sure, here are my bandwidth estimations:

Bandwidth estimation

Shortening requests: The expected arrival rate will be 76 new URLs per second, so the total incoming data would be 304 Kbps:

$$76\ \text{requests/s} \times 500\ \text{bytes} \times 8\ \text{bits/byte} = 304\ \text{Kbps}$$

Redirection requests: Because the expected rate would be 7.6K URL redirections per second, the total outgoing data would be 30.4 Mbps:

$$7.6\text{K requests/s} \times 500\ \text{bytes} \times 8\ \text{bits/byte} = 30.4\ \text{Mbps}$$

The total bandwidth required by the URL shortening service:

$$304\ \text{Kbps} + 30.4\ \text{Mbps} \approx 30.7\ \text{Mbps}$$
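The bandwidth arithmetic follows one formula: requests per second times bytes per request times 8 bits per byte. This sketch assumes, as the text implicitly does, that each request carries roughly the same 500 bytes as a stored entry; `bandwidth_bps` is a hypothetical helper name.

```python
# Bandwidth = requests per second x bytes per request x 8 bits per byte.
BYTES_PER_REQUEST = 500  # assumption: same 500 bytes as a stored entry


def bandwidth_bps(requests_per_second: float) -> float:
    return requests_per_second * BYTES_PER_REQUEST * 8


incoming = bandwidth_bps(76)     # shortening requests
outgoing = bandwidth_bps(7_600)  # redirection requests
print(f"incoming: {incoming / 1e3:.0f} Kbps")               # 304 Kbps
print(f"outgoing: {outgoing / 1e6:.1f} Mbps")               # 30.4 Mbps
print(f"total:    {(incoming + outgoing) / 1e6:.1f} Mbps")  # 30.7 Mbps
```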

Servers estimation

For peak load, we use daily active users as a proxy for requests per second: we assume each of the 100 million DAU sends one request per second, which gives 100 million requests per second. We then use the following formula to calculate the number of servers:

$$\text{Servers needed at peak load} = \frac{\text{Number of requests/second}}{\text{RPS of a server}}$$

$$\text{Servers needed at peak load} = \frac{100\ \text{million}}{64{,}000} = 1562.5 \approx 1.6\text{K servers}$$

Note: The figure of 64,000 requests per second per server is not arbitrary; it comes from the calculations in the Grokking Modern System Design Interview for Engineers and Managers course.
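The server estimate is a single division, but rounding matters: you cannot provision half a machine. This sketch takes the 100 million peak requests per second and the 64,000 RPS per server from the text as given; the constant and function names are ours.

```python
import math

# Servers needed at peak = peak requests per second / requests one server handles.
PEAK_RPS = 100_000_000   # 100M DAU x 1 request/s each (peak-load assumption)
RPS_PER_SERVER = 64_000  # per-server capacity figure quoted from the course


def servers_needed(peak_rps: int, rps_per_server: int) -> int:
    # Round up: 1562.5 servers still means provisioning 1563 machines.
    return math.ceil(peak_rps / rps_per_server)


print(servers_needed(PEAK_RPS, RPS_PER_SERVER))  # 1563, i.e. roughly 1.6K servers
```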

Candidate: Here are all the relevant numbers based on our calculations.

Metric	Estimate
New URLs	76 / s
URL redirections	7.6K / s
Incoming data	304 Kbps
Outgoing data	30.4 Mbps
Storage for 5 years	6 TB
Servers (peak load)	~1.6K

Interviewer: Makes sense. How will you store this data?

Notice how simple the formulas are. The power of these estimates, which we call back-of-the-envelope calculations, comes from making intelligent assumptions.

Candidate: I think this problem is quite simple in terms of the storage schema, as we just need to store the original URLs along with their corresponding short URLs. For that, we can use a key-value store, which behaves like a persistent hash map.

Interviewer: Alright, now that the foundation is in place, let’s move on to the high-level design.

Remember the eight building blocks from SLIC FAST? We will utilize them. The first step in any high-level design is to determine which building blocks you need. You will also need to justify why you need each one of them.

High-level design

Candidate: For sure, here are the basic components I will use in my design:

Database(s) will be needed to store the mapping of long URLs and the corresponding short URLs.
Load balancers at various layers will ensure smooth request distribution among available servers.
Caches will store the most frequently requested short URL mappings.

Let me show you their interactions.

High-level design for TinyURL

Interviewer: Will the one database be connected to all the different services? What do you think about the scalability of that solution?

Candidate: Yes, as a starting point. We will also have caches at the servers, but since we are dealing with a central repository of URLs, a single database is the simplest solution. For scalability, we can shard the database and distribute it across multiple nodes.
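One common way to decide which shard owns a given short URL key is consistent hashing, which the guide's evaluation also mentions. The sketch below is an illustration, not a production router: the shard names are hypothetical, MD5 is used only as a stable, evenly spread hash, and 100 virtual nodes per shard is an assumed setting.

```python
from __future__ import annotations

import bisect
import hashlib
from typing import Iterable


def stable_hash(value: str) -> int:
    # MD5 is used for spread and cross-process stability, not security;
    # Python's built-in hash() is randomized per process for strings.
    return int.from_bytes(hashlib.md5(value.encode("utf-8")).digest()[:8], "big")


class ConsistentHashRing:
    def __init__(self, shards: Iterable[str] = (), vnodes: int = 100) -> None:
        self.vnodes = vnodes
        self._ring: list[tuple[int, str]] = []  # sorted (hash, shard) points
        for shard in shards:
            self.add_shard(shard)

    def add_shard(self, shard: str) -> None:
        for i in range(self.vnodes):
            bisect.insort(self._ring, (stable_hash(f"{shard}#{i}"), shard))

    def remove_shard(self, shard: str) -> None:
        self._ring = [(h, s) for h, s in self._ring if s != shard]

    def shard_for(self, url_key: str) -> str:
        if not self._ring:
            raise LookupError("no shards on the ring")
        # First ring point at or after the key's hash, wrapping around at the end.
        idx = bisect.bisect_left(self._ring, (stable_hash(url_key), ""))
        return self._ring[idx % len(self._ring)][1]


keys = [f"key{i}" for i in range(10_000)]
ring = ConsistentHashRing(["db-a", "db-b", "db-c"])
before = {k: ring.shard_for(k) for k in keys}
ring.add_shard("db-d")
moved = [k for k in keys if ring.shard_for(k) != before[k]]
print(f"{len(moved) / len(keys):.0%} of keys moved to the new shard")
```

The design choice shows up when a shard is added: with plain `hash(key) % N`, going from three shards to four remaps about three quarters of all keys, while on the ring only the keys claimed by the new shard's points move.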

Once you have this high-level design, depending on your seniority, you may need to justify the interactions between the components to the interviewer. Now, you can move on to defining the interfaces used between the services and the API design.

APIs

In my opinion, you should at least define the endpoints for the must-have functional requirements, unless you are in a dedicated API design interview.

Interviewer: Okay, can you define endpoints for these services to set the scope of their operations?

Candidate: Sure, we will need only three endpoints as a start.

Shortening a URL

shortenURL(original_url, custom_alias=None, expiry_date=None)

Here, we can define an API endpoint such as POST /shortener, with the parameters in the request body; the response contains the shortened URL.

Redirecting a short URL

redirectURL(url_key)

Here, we can define an API endpoint such as GET /redirect?url=URL_KEY; the service responds with an HTTP redirect (for example, 302 Found) whose Location header carries the full-length URL.

Deleting a short URL

deleteURL(url_key)

Much like shortening, we can define an API endpoint for deletion: DELETE /shortener/{shortURL}.
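To make the three operations concrete, here is a minimal in-memory sketch of their behavior. It is not the guide's implementation: the class and method names are hypothetical, a Python dict stands in for the key-value store, and keys come from a simple counter rather than the sequencer and base-58 encoder used in the full design. The comments show how each method would map onto the HTTP endpoints above.

```python
from __future__ import annotations

import time
from dataclasses import dataclass
from typing import Optional

# Hypothetical in-memory stand-in for the three endpoints:
#   POST   /shortener         -> shorten_url
#   GET    /redirect?url=KEY  -> redirect_url (answered with a redirect, or 404)
#   DELETE /shortener/{KEY}   -> delete_url

FIVE_YEARS_S = 5 * 365 * 24 * 3600  # default expiry, in seconds


@dataclass
class UrlEntry:
    original_url: str
    expires_at: float  # Unix timestamp


class ShortenerService:
    def __init__(self) -> None:
        self._store: dict[str, UrlEntry] = {}  # plays the key-value store
        self._next_id = 1

    def shorten_url(self, original_url: str, custom_alias: Optional[str] = None,
                    expiry_date: Optional[float] = None) -> str:
        if custom_alias is not None:
            if custom_alias in self._store:
                raise ValueError(f"alias already taken: {custom_alias}")
            key = custom_alias
        else:
            # Skip any counter value that a custom alias already occupies.
            while f"k{self._next_id}" in self._store:
                self._next_id += 1
            key = f"k{self._next_id}"
            self._next_id += 1
        expires_at = expiry_date if expiry_date is not None else time.time() + FIVE_YEARS_S
        self._store[key] = UrlEntry(original_url, expires_at)
        return key

    def redirect_url(self, url_key: str) -> Optional[str]:
        entry = self._store.get(url_key)
        if entry is not None and entry.expires_at <= time.time():
            del self._store[url_key]  # lazily purge an expired entry
            entry = None
        return entry.original_url if entry is not None else None

    def delete_url(self, url_key: str) -> bool:
        return self._store.pop(url_key, None) is not None


svc = ShortenerService()
key = svc.shorten_url("https://example.com/a/very/long/path")
print(key, "->", svc.redirect_url(key))  # k1 -> https://example.com/a/very/long/path
print(svc.delete_url(key), svc.redirect_url(key))  # True None
```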

This phase is usually optional in SDIs, as API design can sometimes be a separate interview. To learn more about API design, see the course Grokking the Product Architecture Interview.

Interviewer: Looks good to me. Let’s proceed. What will you do next?

Candidate: Let’s move toward a detailed design for the system.

Interviewer: Okay, how will you start?

Detailed design

In the detailed design phase, we dive into the specifics of our TinyURL system. This is where our earlier estimates become essential. Remember those back-of-the-envelope calculations? They help us determine server capacity and storage requirements and anticipate potential bottlenecks. I can’t stress this enough: always relate your design decisions to those estimates. This shows the interviewer that you aren’t just a theorist but can apply your knowledge in practice. For instance, if our calculations indicate a high request volume, that should shape our choice of database technology at the persistence layer.

Candidate: I will start by revisiting the building blocks; we will need a couple more for a complete design.

Rate limiters will be used to avoid system exploitation.
The sequencer will provide unique IDs to serve as a starting point for each short URL generation.
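The guide does not spell out how the sequencer hands out IDs. One possible approach, sketched here under our own assumptions, is for the sequencer to allocate non-overlapping blocks of IDs to each server, which then issues them in a random order; this is one way to obtain the "random ID selection" mentioned in the evaluation. The `Sequencer` and `IdBlock` classes are hypothetical and in-process; a real sequencer would be a replicated service.

```python
import secrets
from typing import List


class Sequencer:
    """Hypothetical sequencer that hands out non-overlapping ID blocks."""

    def __init__(self, block_size: int = 1000) -> None:
        self.block_size = block_size
        self._next_start = 1

    def allocate_block(self) -> "IdBlock":
        start = self._next_start
        self._next_start += self.block_size
        return IdBlock(start, self.block_size)


class IdBlock:
    """A server-side block of unique IDs, issued in random order."""

    def __init__(self, start: int, size: int) -> None:
        ids: List[int] = list(range(start, start + size))
        secrets.SystemRandom().shuffle(ids)  # OS-backed randomness for the order
        self._ids = ids

    def next_id(self) -> int:
        if not self._ids:
            raise RuntimeError("block exhausted; allocate a new one from the sequencer")
        return self._ids.pop()
```

Shuffling only hides the order within a block; the overall ID space can still be enumerated, which is why the evaluation points to salting as a stronger option.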

Don’t worry if you haven’t seen them before; you can find them in the comprehensive course on system design: Grokking Modern System Design Interview for Engineers & Managers.

Building blocks in the detailed design

Interviewer: How will you decide the limits to apply using the rate limiter?

Candidate: I am thinking of a simple approach based on our estimates. We sized peak-load server capacity assuming one request per user per second, so we will limit each user to one request per second to stay within capacity.
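A per-user limit of one request per second can be enforced with a token bucket. The guide does not name an algorithm, so the token bucket is our choice, and the burst size of one request is an assumption. This sketch keeps buckets in process memory; a distributed deployment would keep them in a shared store. The clock is injectable so the behavior can be demonstrated deterministically.

```python
import time
from typing import Callable, Dict, Tuple


class TokenBucketLimiter:
    """Per-user token bucket: refills `rate` tokens/s, stores at most `capacity`."""

    def __init__(self, rate: float = 1.0, capacity: float = 1.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.rate = rate
        self.capacity = capacity
        self.clock = clock
        # user_id -> (tokens remaining, time of last update)
        self._buckets: Dict[str, Tuple[float, float]] = {}

    def allow(self, user_id: str) -> bool:
        now = self.clock()
        tokens, last = self._buckets.get(user_id, (self.capacity, now))
        # Refill in proportion to elapsed time, capped at capacity.
        tokens = min(self.capacity, tokens + (now - last) * self.rate)
        if tokens >= 1.0:
            self._buckets[user_id] = (tokens - 1.0, now)
            return True
        self._buckets[user_id] = (tokens, now)
        return False


# Usage with a fake clock so the behavior is deterministic.
fake_now = [0.0]
limiter = TokenBucketLimiter(clock=lambda: fake_now[0])
print(limiter.allow("alice"), limiter.allow("alice"))  # True False
fake_now[0] = 1.0
print(limiter.allow("alice"))  # True: one token refilled after a second
```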

Interviewer: Okay, can you show me the complete design?

You might want to take your time here, as knowing everything you will need is like starting a recipe with all the required ingredients present. It’s much harder to add something later on once you have started the design.

Candidate: Sure. Here are all the components and their interactions.

Design for TinyURL

Interviewer: Looks good. How will the short URL generator work?

Candidate: We will take the unique ID from the sequencer and pass it through an encoder to produce the short URL, like I showed you in the beginning.

Interviewer: How do you feel about your design meeting the functional requirements?

Candidate: Referring back to our list, this system meets all the must-have requirements. We store the short-to-long URL mappings in the data store, and the web server handles redirection requests. It also deletes URLs on request or on expiry.

Interviewer: Let’s test our design and see if it meets our non-functional requirements and the real-world constraints that can make or break our system.

Evaluation

Candidate: Our goal was to build a URL shortener that is highly available, scalable, and secure while also maintaining fast response times and generating unpredictable short URLs.

Here’s how my design achieves that:

Availability is non-negotiable: To achieve this, we’ve incorporated several layers of redundancy, from database replication to load balancing. Regular backups can ensure minimal data loss, even in a worst-case scenario.
Scalability is key to future-proofing our system: Our horizontally scalable design, combined with consistent hashing for load balancing, allows us to seamlessly add more servers as our user base grows. Our data storage can easily scale alongside our processing capacity thanks to sharding capabilities.
Security is paramount: Short URLs are unpredictable to prevent unauthorized access or malicious activities. Our random ID selection mechanism adds a layer of obfuscation, making it difficult to guess the next short URL in sequence. While we’ve opted for a simpler approach, more advanced techniques like salting could further enhance security.
Performance is critical to user experience: Nobody wants to wait for a short URL to generate or a long URL to load. That’s why we’ve focused on minimizing latency at every step of the process, e.g., through distributed caching. While shortening a URL may take a few milliseconds, the impact on user experience is negligible.
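Since redirections outnumber shortenings by a wide margin, most reads should be served from cache. The guide does not specify an eviction policy, so this sketch assumes least-recently-used (LRU) eviction and a read-through lookup; a single-process `LRUCache` stands in for one node of the distributed cache, and a plain mapping stands in for the database. All names here are hypothetical.

```python
from __future__ import annotations

from collections import OrderedDict
from typing import Mapping, Optional


class LRUCache:
    """Bounded map of short key -> long URL that evicts the least recently used entry."""

    def __init__(self, capacity: int) -> None:
        if capacity <= 0:
            raise ValueError("capacity must be positive")
        self.capacity = capacity
        self._data: OrderedDict[str, str] = OrderedDict()

    def get(self, key: str) -> Optional[str]:
        if key not in self._data:
            return None
        self._data.move_to_end(key)  # mark as most recently used
        return self._data[key]

    def put(self, key: str, value: str) -> None:
        self._data[key] = value
        self._data.move_to_end(key)
        if len(self._data) > self.capacity:
            self._data.popitem(last=False)  # drop the least recently used entry


def resolve(url_key: str, cache: LRUCache, db: Mapping[str, str]) -> Optional[str]:
    """Read-through lookup: try the cache first, fall back to the database."""
    url = cache.get(url_key)
    if url is None:
        url = db.get(url_key)
        if url is not None:
            cache.put(url_key, url)
    return url
```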

Interviewer: That’s a detailed rundown. What tradeoffs did you consider for your system?

Trade-offs and bottlenecks

Candidate: No System Design is perfect, and ours is no exception. For instance, our emphasis on high availability and scalability comes at a cost: distributed systems and replication introduce complexity, which can make maintenance and troubleshooting more challenging. Given the nature of a URL-shortening service, we’ve also chosen to prioritize read performance over write performance. We consider this tradeoff acceptable, but it could become a bottleneck if write operations become unexpectedly frequent.

Interviewer: Interesting thought. What other bottlenecks can you identify in the system?

Candidate: Another potential bottleneck is our reliance on a single database, which remains a single point of failure. If it goes down, the entire service could be disrupted. We could mitigate this risk with database clustering or multi-region replication, but these options add complexity and cost.

Ultimately, the art of System Design lies in making informed decisions and balancing competing priorities. There’s no one-size-fits-all solution; the best approach will always depend on the problem’s requirements and constraints.

Interviewer: That was a productive conversation. I just want to revisit the encoder, as it’s the heart of the system. Can you elaborate on your thinking behind it?

Distinctive component

Candidate: Sure. The distinctive feature of our TinyURL system is its encoder, specifically a base-58 encoder. It might seem like a small detail, but it plays a significant role in our service’s usability and overall success. Base-58 encoding represents data with an alphabet that excludes easily confused characters, such as 0 (zero) and O (capital o), or I (capital i) and l (lowercase L). This improves the readability of our short URLs and reduces errors when users type them manually.
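Here is a minimal sketch of such an encoder, assuming its input is the unique integer ID from the sequencer. The alphabet is the widely used Bitcoin-style base-58 alphabet, which is 0-9, A-Z, and a-z with 0, O, I, and l removed. The function names are ours, not taken from the guide's implementation.

```python
# Bitcoin-style base-58 alphabet: digits and letters minus 0, O, I, and l.
BASE58_ALPHABET = "123456789ABCDEFGHJKLMNPQRSTUVWXYZabcdefghijkmnopqrstuvwxyz"


def base58_encode(n: int) -> str:
    """Encode a non-negative integer ID as a base-58 string."""
    if n < 0:
        raise ValueError("IDs must be non-negative")
    if n == 0:
        return BASE58_ALPHABET[0]
    chars = []
    while n > 0:
        n, rem = divmod(n, 58)
        chars.append(BASE58_ALPHABET[rem])
    return "".join(reversed(chars))  # most significant digit first


def base58_decode(s: str) -> int:
    """Inverse of base58_encode; raises ValueError on characters outside the alphabet."""
    n = 0
    for ch in s:
        n = n * 58 + BASE58_ALPHABET.index(ch)
    return n


print(base58_encode(58))              # 21
print(base58_encode(12_000_000_000))  # a 6-character key
```

Six base-58 characters cover 58^6 (about 38 billion) IDs, comfortably above the 12 billion entries estimated for a five-year expiry.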

Think of it like this: would you rather read a short URL like “lI0O1x” or one like “hR7kZs”? The first mixes look-alike characters (lowercase l, capital I, zero, and capital O); the second uses only base-58 characters and is far less prone to typos. That’s the power of base-58 encoding.

Interviewer: Why not a base-64 encoder, then? Wouldn’t that be more scalable?

Candidate: The answer lies in base-64’s character set, which includes “+” and “/”. These characters can cause problems in certain contexts, such as URLs or file paths. Base-64 does fit a few more IDs into each character, but six base-58 characters already cover about 38 billion IDs, comfortably above the 12 billion entries we estimated for five years. Using base-58, we avoid these issues and ensure our short URLs are compatible with various platforms and applications. It’s a small but important detail that contributes to the overall robustness of our system.
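Python's standard `base64` module shows the problem directly: standard base-64 output can contain "+" and "/" (and "=" padding). The URL-safe variant defined in RFC 4648 swaps in "-" and "_", but its alphabet still contains the look-alike characters that base-58 drops. The two input bytes below are chosen only because they produce the problematic characters.

```python
import base64

# Standard base-64: '+' and '/' (plus '=' padding) all have special meaning in URLs.
print(base64.b64encode(b"\xfb\xff"))          # b'+/8='
# URL-safe base-64 replaces them with '-' and '_' ...
print(base64.urlsafe_b64encode(b"\xfb\xff"))  # b'-_8='

# ... but its alphabet still includes look-alikes that base-58 omits.
URLSAFE_B64_ALPHABET = (
    "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789-_"
)
print(sorted(set("0OIl") & set(URLSAFE_B64_ALPHABET)))  # ['0', 'I', 'O', 'l']
```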

Interviewer: That concludes today’s interview. Thank you for your time.

A mid- to senior-level software engineer in an SDI is expected to discuss the pros and cons of the distinctive component. To learn more about the encoder in TinyURL, check out this lesson.

Conclusion

That’s it. You are now equipped with everything you need to be successful in your next System Design Interview. In this comprehensive guide, we’ve taken an essential tour of SDIs, covering everything from fundamental System Design concepts to advanced strategies.

We’ve explored the RESHADED approach, a powerful framework that can guide you through the design of any system, from a simple URL shortener to a complex distributed system. We’ve also delved into the key building blocks of modern systems, from databases and caches to load balancers and task queues. By understanding these building blocks and their interactions, you’ll be well-equipped to design scalable, reliable, and performant systems that meet the demands of the real world.

Remember, the key to acing your System Design Interview is not just memorizing facts or formulas (in fact, any experienced interviewer will flag any candidate who appears to be regurgitating a memorized solution). System Design success is more about understanding the underlying principles, thinking critically, and applying your knowledge to solve real-world problems. It’s about demonstrating your ability to design systems that handle anything the digital world throws at them. If you can apply everything in this guide in your next interview, I am confident that you will be in excellent shape.

Ready to test your System Design skills? Try an AI Mock Interview to evaluate your strengths and skill gaps.

Additional System Design resources

As we wrap up this guide, remember that the discipline of System Design is dynamic and ever-evolving. New technologies and challenges emerge constantly, but the core principles we’ve discussed here remain timeless. So, embrace the learning process, experiment, and never stop asking questions. By doing so, you’ll excel in your next System Design Interview and become a more effective and impactful software engineer.

If you want to deepen your knowledge and gain hands-on experience, consider checking out Educative’s resources. Every course is designed by industry pros, and the interactive platform makes it easy to actually get hands-on with the technologies you are learning about, with no setup or hassle.

I will link to some of my personal favorites below. Good luck with the prep — and happy learning!


System Design

System Design Interview Handbook

System Design Interview Guide

System Design Interview Questions for Senior Engineers

Exploring distributed file systems