System Design Fundamentals: Throughput vs. Latency
I wanted to introduce you to some foundational system design concepts that we work with every day. Two of the most critical concepts you'll encounter are throughput and latency. Understanding them, and the trade-offs between them, is fundamental to designing robust and efficient systems.
Let's break them down.
What is Latency?
Think of latency as a measure of delay. It's the time it takes for a single request to travel from its source to its destination and for the response to come back. In practice it's usually measured as round-trip time, which includes network travel time in both directions, the time the server takes to process the request, and any delays from other services it depends on.
Analogy: Imagine you're ordering a single pizza. Latency is the total time from the moment you place your order until that one pizza arrives at your door. It includes the time it takes for the pizzeria to receive the order, make the pizza, and for the delivery driver to travel to your house.
How it's measured: Latency is typically measured in milliseconds (ms) or seconds (s). Lower latency is almost always better, as it means a faster, more responsive system for the user.
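Here's a minimal sketch of what measuring latency for a single request might look like in Python. The URL and the 5-second timeout are placeholder assumptions, not part of any real system we run:

```python
import time
import urllib.request

# Measure the latency of one request: start a timer, make one round trip,
# and stop the timer once the full response has been read.
URL = "https://example.com/"  # placeholder endpoint, swap in a real one

start = time.perf_counter()
with urllib.request.urlopen(URL, timeout=5) as response:
    response.read()
elapsed_ms = (time.perf_counter() - start) * 1000

print(f"Latency for one request: {elapsed_ms:.1f} ms")
```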
What is Throughput?
Think of throughput as a measure of capacity or rate. It's the total number of operations or requests that a system can handle successfully within a specific time period.
Analogy: Let's go back to the pizza shop. Throughput isn't about how fast one pizza gets delivered, but how many pizzas the entire shop can produce and deliver in an hour. A shop with a bigger oven and more delivery drivers has higher throughput than a small one, even if a single delivery takes the same amount of time.
How it's measured: Throughput is measured in units like requests per second (RPS), transactions per second (TPS), or data transfer rates like megabits per second (Mbps). Higher throughput means the system can handle more load.
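As a rough illustration, here's a toy Python sketch that estimates throughput by counting how many simulated requests complete in a one-second window. The 5 ms handler is a made-up stand-in for real work:

```python
import time

def handle_request():
    # Stand-in for real work; assume each request takes about 5 ms to serve.
    time.sleep(0.005)

# Count how many requests complete within a one-second window.
window_s = 1.0
completed = 0
start = time.perf_counter()
while time.perf_counter() - start < window_s:
    handle_request()
    completed += 1

print(f"Throughput: {completed / window_s:.0f} requests/second")
```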
The Key Difference: Delay vs. Capacity
The simplest way to remember the difference is:
Latency: How fast is a single request? (A measure of time)
Throughput: How many requests can be handled in a given period of time? (A measure of rate)
A system can have low latency but also low throughput. For example, a single, highly optimized web server might respond to one request in just 50 ms (low latency), but it may become overwhelmed and fail if it receives more than 10 requests per second (low throughput).
Conversely, a system can have high throughput but also high latency. A data processing pipeline might be able to process a million records per hour (high throughput), but it might take 10 minutes for any single record to get through the entire system (high latency).
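To make the distinction concrete, here's a small Python simulation. The 50 ms handler and the worker counts are made-up numbers: every request still takes about 50 ms, so single-request latency doesn't change, but adding workers raises the number of requests completed per second:

```python
import time
from concurrent.futures import ThreadPoolExecutor

def handle_request():
    # Every request takes about 50 ms, so single-request latency never changes.
    time.sleep(0.05)

def measure_throughput(workers: int, total_requests: int = 40) -> float:
    start = time.perf_counter()
    with ThreadPoolExecutor(max_workers=workers) as pool:
        futures = [pool.submit(handle_request) for _ in range(total_requests)]
        for future in futures:
            future.result()  # wait for every request to finish
    elapsed = time.perf_counter() - start
    return total_requests / elapsed

print(f" 1 worker : {measure_throughput(1):5.0f} requests/second")
print(f"10 workers: {measure_throughput(10):5.0f} requests/second")
```

With one worker, latency and throughput are tied together (about 20 requests/second); with ten workers, each request is just as fast, but the system completes roughly ten times as many per second.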
Real-World Examples
1. Video Streaming (e.g., Netflix, YouTube):
Latency: When you click play on a video, how long does it take for the video to actually start playing? That initial buffering time is latency. A low latency here is crucial for a good user experience.
Throughput: How many users can stream videos simultaneously from the service without buffering issues? This is a measure of the platform's throughput. Netflix needs massive throughput to serve millions of customers worldwide at the same time.
2. E-commerce Website (e.g., Amazon):
Latency: When you add an item to your cart or click "Buy Now," how quickly does the page update to confirm your action? That's latency. A slow response can lead to frustrated users and abandoned carts.
Throughput: During a major sales event like Black Friday, how many orders can the system process per minute? This is a critical throughput concern. If the system's throughput is too low, the site will slow down or crash under the heavy load.
Why is This Important for System Design?
As architects, we are constantly making decisions that involve a trade-off between latency and throughput.
- User Experience: For user-facing systems, low latency is often the primary goal. Users perceive fast systems as being better.
- Scalability & Cost: High throughput is essential for systems that serve a large number of users or process large amounts of data. Achieving it often means adding more resources (servers, databases), which increases cost. Our job is to design a system that meets the required throughput, especially during peak loads, to support business growth and revenue without over-provisioning and wasting money.
- Trade-offs: Sometimes improving throughput increases latency, and vice versa. For example, batching several small requests into one larger request can increase throughput (the system handles more items overall), but it increases the latency for any individual item, because that item has to wait for the batch to fill. We have to decide which matters more for the specific use case; the toy model below shows how this plays out.
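As a rough illustration of the batching trade-off, here's a toy Python model. All of the cost numbers (flush overhead, per-item cost, arrival rate) are made-up assumptions: each flush pays a fixed overhead plus a small per-item cost, so bigger batches push more items per second through the write path, but an item can sit waiting longer before it is written.

```python
FLUSH_OVERHEAD_MS = 10.0   # fixed cost of one write, e.g. a network round trip
PER_ITEM_COST_MS = 0.1     # marginal cost of each extra item in a write
ARRIVAL_INTERVAL_MS = 1.0  # a new item arrives every millisecond

def model(batch_size: int) -> None:
    flush_ms = FLUSH_OVERHEAD_MS + PER_ITEM_COST_MS * batch_size
    # Throughput of the write path if flushes run back to back.
    max_throughput = batch_size / flush_ms * 1000
    # Worst case for one item: wait for the batch to fill, then for the flush.
    worst_latency_ms = (batch_size - 1) * ARRIVAL_INTERVAL_MS + flush_ms
    print(f"batch={batch_size:3d}: up to {max_throughput:6.0f} items/s, "
          f"worst-case item latency ~{worst_latency_ms:6.1f} ms")

for size in (1, 10, 100):
    model(size)
```

With a batch size of 1, every item is written almost immediately but the fixed overhead is paid on every write; with a batch size of 100, the overhead is amortized and throughput climbs, while the unlucky first item in each batch waits much longer.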
I hope this gives you a solid starting point. We'll be discussing these concepts in nearly every design meeting we have. Don't hesitate to ask questions as these ideas come up in our work.