"Load Balancer" Must-Knows for the System Design Interview

AnanthaRamu MP


Illustration courtesy: appviewx

A load balancer is a device or software that distributes incoming network traffic across multiple servers. Its primary goal is to optimize resource utilization, maximize throughput, minimize response time, and avoid overloading any single server. This ensures high availability and reliability for applications.


How Load Balancers Work

When a client sends a request, the load balancer intercepts it. Instead of sending the request directly to a specific server, the load balancer uses a defined algorithm to decide which backend server is best suited to handle the request. This decision is typically based on factors such as server health, current load, and pre-configured rules.


Benefits of Load Balancing

High Availability: If one server fails, the load balancer redirects traffic to the remaining healthy servers, preventing service interruptions.

Scalability: Applications can scale horizontally by adding more servers to the backend pool without significant downtime.

Performance: By distributing requests evenly, load balancers prevent individual servers from becoming bottlenecks, leading to faster response times.

Redundancy: Provides a layer of fault tolerance, as the failure of a single server does not bring down the entire application.

Security: Some load balancers offer features like DDoS protection and SSL offloading.


Load Balancing Strategies

Load balancing strategies determine how incoming requests are distributed among the backend servers. These strategies can be broadly categorized into static and dynamic methods.


Static Load Balancing Algorithms

Static algorithms do not consider the current state of the servers (e.g., CPU utilization, memory usage) when making distribution decisions. They rely on pre-configured rules.


1. Round Robin

Description: Requests are distributed to servers sequentially in a circular fashion. Each server gets a turn.

Use Case: Suitable when all backend servers have similar processing capabilities and handle similar workloads.

Pros: Simple to implement, ensures fair distribution over time.

Cons: Does not account for varying server capacities or current load, potentially leading to an overloaded server if one is slower or has more ongoing tasks.
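The rotation above can be sketched in a few lines of Python; the server addresses are illustrative:

```python
from itertools import cycle

servers = ["10.0.0.1", "10.0.0.2", "10.0.0.3"]  # hypothetical backend pool
rotation = cycle(servers)

def next_server():
    """Return the next backend in circular order."""
    return next(rotation)

# Five consecutive requests walk the pool and wrap around.
assignments = [next_server() for _ in range(5)]
```

After three requests the rotation wraps, so the fourth request lands on the first server again.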


2. Weighted Round Robin

Description: An enhancement of Round Robin where each server is assigned a "weight" based on its processing capacity or resources. Servers with higher weights receive more requests.

Use Case: Ideal when backend servers have different hardware specifications or capabilities.

Pros: Allows more efficient distribution based on server capacity.

Cons: Still static; doesn't dynamically adjust to real-time server load or health issues.
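A naive way to realize weights is to expand each server into the rotation in proportion to its weight; the weights below are illustrative. (Production implementations such as NGINX use a "smooth" weighted round robin that interleaves servers rather than sending bursts to the heaviest one.)

```python
def build_schedule(weights):
    """Expand each server into the rotation proportionally to its weight."""
    schedule = []
    for server, weight in weights.items():
        schedule.extend([server] * weight)
    return schedule

# A server with weight 3 appears three times per cycle, weight 1 once.
weights = {"big-box": 3, "small-box": 1}
schedule = build_schedule(weights)
```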


3. IP Hash

Description: The load balancer calculates a hash of the client's IP address. This hash value is then used to determine which server will handle the request. This ensures that a specific client consistently connects to the same server.

Use Case: Useful for applications that require session persistence (sticky sessions) without relying on application-level session management.

Pros: Provides session affinity, simplifying application design for stateful applications.

Cons: If a server fails, all sessions tied to that server are lost. Can lead to uneven distribution if client IP addresses are not uniformly distributed.
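A minimal sketch of the hashing step, assuming SHA-256 as the hash function and illustrative server names:

```python
import hashlib

servers = ["app-1", "app-2", "app-3"]

def pick_server(client_ip: str) -> str:
    """Hash the client IP onto the server list; the same IP
    always maps to the same index, giving session affinity."""
    digest = hashlib.sha256(client_ip.encode()).hexdigest()
    return servers[int(digest, 16) % len(servers)]
```

Note that the modulo mapping reshuffles almost all clients whenever the pool size changes; consistent hashing is the usual mitigation.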


Dynamic Load Balancing Algorithms

Dynamic algorithms take into account the current state of the backend servers, such as their load, response time, or health, to make smarter distribution decisions.


1. Least Connection

Description: Directs incoming requests to the server with the fewest active connections.

Use Case: Most effective when requests vary significantly in processing time.

Pros: Ensures that newly arriving requests are sent to less busy servers, leading to better overall performance.

Cons: Only considers the number of connections, not the complexity or duration of those connections. A server with few connections might still be heavily loaded if those connections are long-running or resource-intensive.
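The selection rule itself is a one-liner over the connection table; the counts below are hypothetical:

```python
def least_connection(active):
    """Pick the backend with the fewest active connections."""
    return min(active, key=active.get)

# Hypothetical snapshot of active connection counts.
active = {"app-1": 12, "app-2": 4, "app-3": 9}
```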


2. Weighted Least Connection

Description: Similar to Least Connection, but also considers the server's pre-assigned weight. Servers with higher weights are expected to handle more connections. The algorithm distributes requests to servers with the lowest ratio of active connections to their weight.

Use Case: When servers have varying capacities and the goal is to balance load based on active connections while accounting for those differences.

Pros: More intelligent than simple Least Connection for heterogeneous server environments.

Cons: Still doesn't fully account for the "work" each connection represents.
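The connections-to-weight ratio described above can be sketched as follows, with illustrative counts and weights:

```python
def weighted_least_connection(active, weights):
    """Pick the backend with the lowest ratio of
    active connections to its configured weight."""
    return min(active, key=lambda s: active[s] / weights[s])

# app-1 has more connections (6) but triple the capacity (weight 3),
# so its ratio 6/3 = 2.0 beats app-2's ratio 4/1 = 4.0.
active = {"app-1": 6, "app-2": 4}
weights = {"app-1": 3, "app-2": 1}
```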


3. Least Response Time

Description: Routes requests to the server that has the fastest response time, often measured by monitoring the time it takes for a server to respond to health checks or actual requests.

Use Case: Ideal for applications where low latency is critical.

Pros: Directs traffic to the most performant server, leading to optimal user experience.

Cons: Requires constant monitoring of server response times, which can add overhead. Response time can fluctuate, leading to rapid changes in server assignments.
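One way to tame the fluctuation noted above is to smooth measurements with an exponentially weighted moving average before choosing; this sketch and its `alpha` parameter are an assumption, not a standard implementation:

```python
class ResponseTimeBalancer:
    """Track a smoothed (EWMA) response time per backend
    and route to the fastest one."""

    def __init__(self, servers, alpha=0.3):
        self.alpha = alpha
        self.ewma = {s: None for s in servers}  # None = not yet measured

    def record(self, server, elapsed_ms):
        prev = self.ewma[server]
        self.ewma[server] = (elapsed_ms if prev is None
                             else self.alpha * elapsed_ms + (1 - self.alpha) * prev)

    def pick(self):
        measured = {s: t for s, t in self.ewma.items() if t is not None}
        if not measured:
            return next(iter(self.ewma))  # no data yet: any server
        return min(measured, key=measured.get)
```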


4. Least Bandwidth

Description: Directs incoming requests to the server that is currently serving the least amount of network traffic (measured in Mbps or Gbps).

Use Case: Applications that involve large data transfers or streaming where network throughput is a primary concern.

Pros: Optimizes network resource utilization.

Cons: Doesn't consider CPU or memory load, only network traffic.


5. Resource-Based (Adaptive)

Description: The most advanced strategy: agents installed on each backend server report real-time metrics (CPU utilization, memory usage, active connections, etc.) to the load balancer, which uses this comprehensive data to make the most informed decision about where to send the next request.

Use Case: Highly dynamic and complex environments where optimal resource utilization across all server metrics is crucial.

Pros: Provides the most intelligent and flexible load distribution, adapting to real-time server conditions.

Cons: Requires more complex setup and ongoing management, as it involves deploying and maintaining agents on each server.
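One simple way to combine agent-reported metrics is a weighted composite score; the metric names, weights, and values below are all illustrative assumptions:

```python
def composite_load(m, w_cpu=0.5, w_mem=0.3, w_conn=0.2):
    """Blend agent-reported metrics (each normalized to [0, 1])
    into a single load score; lower means less loaded."""
    return w_cpu * m["cpu"] + w_mem * m["mem"] + w_conn * m["conn"]

def pick_server(reports):
    return min(reports, key=lambda s: composite_load(reports[s]))

# Hypothetical agent reports: app-1 scores 0.68, app-2 scores 0.41.
reports = {
    "app-1": {"cpu": 0.80, "mem": 0.60, "conn": 0.50},
    "app-2": {"cpu": 0.30, "mem": 0.40, "conn": 0.70},
}
```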


Types of Load Balancers

Load balancers can also be categorized based on their deployment and functionality:


Hardware Load Balancers

  • Dedicated physical devices (e.g., F5 BIG-IP, Citrix NetScaler).
  • Offer high performance and specialized features.
  • More expensive and less flexible than software alternatives.

Software Load Balancers

  • Can run on standard servers (e.g., HAProxy, NGINX Plus).
  • More flexible, scalable, and cost-effective.
  • Can be deployed in virtual machines or containers.

Cloud-Based Load Balancers

  • Provided as a service by cloud providers (e.g., AWS Elastic Load Balancing, Google Cloud Load Balancing, Azure Load Balancer).
  • Fully managed, highly scalable, and integrated with other cloud services.
  • Abstracts away the underlying infrastructure.

Layer 4 vs. Layer 7 Load Balancing

Load balancers operate at different layers of the OSI model:


Layer 4 (Transport Layer) Load Balancing

Focus: Distributes traffic based on IP addresses and TCP/UDP ports.

Mechanism: Looks at packet headers (source/destination IP, port number) to route traffic. Does not inspect the actual content of the packets.

Pros: High performance, lower latency, simpler to implement.

Cons: Lacks application-level intelligence, cannot perform content-based routing or SSL offloading.

Examples: TCP load balancing, UDP load balancing.


Layer 7 (Application Layer) Load Balancing

Focus: Distributes traffic based on application-level information, such as HTTP headers, URLs, cookies, and even the content of the request.

Mechanism: Terminates the connection from the client, inspects the full request, and then establishes a new connection to the backend server.

Pros: Provides advanced features like SSL offloading, content-based routing, URL rewriting, sticky sessions (based on cookies), and web application firewall (WAF) integration.

Cons: More resource-intensive due to deeper packet inspection, higher latency.

Examples: HTTP/HTTPS load balancing.
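The content-based routing a Layer 7 balancer performs can be sketched as a small rule function; the pool names and rules here are hypothetical:

```python
def route(path, headers):
    """Hypothetical Layer-7 rules: inspect the URL path and
    request headers, then pick a backend pool."""
    if path.startswith("/api/"):
        return "api-pool"
    if headers.get("Accept", "").startswith("image/"):
        return "media-pool"
    return "web-pool"
```

A Layer 4 balancer could not make these decisions, since it never sees the URL or headers, only IPs and ports.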


In a system design interview, understanding these concepts in detail will allow you to discuss the trade-offs and appropriate use cases for different load balancing approaches in various architectural scenarios.

