Building a Production-Ready Load Balancer from Scratch in Go
- Arnab Mondal · 14 min read
Overview
- The Itch That Started It All
- What Is a Load Balancer, Really?
- The Architecture We're Building
- The Server Pool: Managing Multiple Backends
- Health Checks: The Heartbeat of Reliability
- The Retry Strategy: Don't Give Up Too Easily
- Graceful Shutdown: Don't Drop Requests
- What's Next?
The Itch That Started It All
It started with a question I couldn't shake: "What actually happens when thousands of requests hit my servers?"
Sure, I knew the textbook answer—load balancers distribute traffic across servers. But knowing something conceptually and understanding it deeply are two different things. I'd been deploying apps behind NGINX and AWS ALBs for years, treating them as black boxes. One day, that stopped being okay.
So I decided to build one from scratch. In Go. No frameworks, no shortcuts—just the standard library and a lot of curiosity.
What I discovered along the way wasn't just how load balancers work, but why every design decision matters when you're standing between users and your backend servers. This post is that journey—part case study, part tutorial. By the end, you'll understand the internals well enough to build your own.
What Is a Load Balancer, Really?
Before we write any code, let's get crystal clear on what we're building. A load balancer is essentially a traffic cop that sits between clients and your backend servers.
When a request arrives, the load balancer:
- Receives the incoming HTTP request
- Chooses which backend server should handle it (using an algorithm like Round Robin)
- Forwards the request to that backend
- Returns the backend's response to the client
Simple in concept. Deceptively tricky in execution.
The Architecture We're Building
Here's what we'll implement:
- Round Robin selection: Fair distribution across healthy backends
- Health checks: Both passive (detect failures from real requests) and active (periodic probes)
- Automatic failover: If a backend dies, skip it and try another
- Retry logic: Don't give up on the first hiccup
- Graceful shutdown: Finish in-flight requests before stopping
- Connection pooling: Reuse connections for efficiency
Let's dive in.
Setting Up the Foundation
Request-Scoped Tracking with Context
The first challenge: how do we track attempts and retries per request when multiple requests are being handled concurrently? This is where Go's context.Context shines.
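Here's a minimal sketch of that idea. The key types and helper names (attemptsKey, GetAttemptsFromContext, and so on) are illustrative choices, not the only way to spell it:

```go
// Empty struct types used as context keys: unique, zero-sized type markers.
type attemptsKey struct{}
type retriesKey struct{}

// GetAttemptsFromContext returns how many backends we've tried for this
// request; it defaults to 1 for the first attempt.
func GetAttemptsFromContext(r *http.Request) int {
	if attempts, ok := r.Context().Value(attemptsKey{}).(int); ok {
		return attempts
	}
	return 1
}

// GetRetryFromContext returns how many times the current backend has been
// retried for this request; it defaults to 0.
func GetRetryFromContext(r *http.Request) int {
	if retry, ok := r.Context().Value(retriesKey{}).(int); ok {
		return retry
	}
	return 0
}
```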
Why context instead of global variables? Each HTTP request runs in its own goroutine. If we used globals, requests would interfere with each other's counters. Context lets us carry request-specific data through the call chain safely.
Why empty struct types as keys? Using string keys like "attempts" could collide with other packages. Empty struct types are unique and take zero bytes—they're just type markers.
The Backend Struct
Each backend server needs to track its URL, health status, and have a reverse proxy ready to forward requests:
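A sketch of that struct, plus the two accessors that wrap the mutex (field names are illustrative):

```go
// Backend wraps one upstream server: its URL, a mutex-protected alive flag,
// and a ready-to-use reverse proxy.
type Backend struct {
	URL          *url.URL
	Alive        bool
	mux          sync.RWMutex
	ReverseProxy *httputil.ReverseProxy
}

// SetAlive updates the health flag under the write lock.
func (b *Backend) SetAlive(alive bool) {
	b.mux.Lock()
	b.Alive = alive
	b.mux.Unlock()
}

// IsAlive reads the health flag under the read lock, so many requests can
// check it concurrently.
func (b *Backend) IsAlive() bool {
	b.mux.RLock()
	defer b.mux.RUnlock()
	return b.Alive
}
```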
Notice the sync.RWMutex—this is critical. Multiple goroutines will read the alive status simultaneously (every request checks it), but only health checks write to it. RWMutex allows multiple readers OR one writer, which is more efficient than a regular mutex.
The Server Pool: Managing Multiple Backends
The ServerPool is where the magic happens. It holds all backends and implements Round Robin selection:
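Something along these lines (the exact fields are illustrative):

```go
// ServerPool holds every backend plus the counter that drives Round Robin.
type ServerPool struct {
	current  uint64 // Round Robin counter, advanced atomically
	backends []*Backend
	mux      sync.RWMutex // guards the backends slice itself
}

// AddBackend registers a new backend with the pool.
func (s *ServerPool) AddBackend(b *Backend) {
	s.mux.Lock()
	s.backends = append(s.backends, b)
	s.mux.Unlock()
}
```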
Round Robin Selection
Round Robin is elegantly simple: Backend 1, then Backend 2, then Backend 3, then back to Backend 1. Like taking turns fairly.
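The counter itself is one line (method name illustrative):

```go
// NextIndex atomically advances the counter and maps it onto a backend slot.
func (s *ServerPool) NextIndex() int {
	return int(atomic.AddUint64(&s.current, 1) % uint64(len(s.backends)))
}
```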
Why atomic operations? Multiple requests hit the load balancer concurrently. Without atomic.AddUint64, two goroutines could read the same value, both increment it, and both write back the same result—losing an increment. Atomic operations guarantee the increment happens without interruption.
Getting the Next Available Peer
Here's where it gets interesting. We don't just pick the next backend—we pick the next alive backend:
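Roughly like this:

```go
// GetNextPeer returns the next alive backend, scanning at most one full
// rotation of the pool before giving up.
func (s *ServerPool) GetNextPeer() *Backend {
	next := s.NextIndex()
	l := len(s.backends) + next // walk the whole ring, starting at `next`
	for i := next; i < l; i++ {
		idx := i % len(s.backends)
		if s.backends[idx].IsAlive() {
			if i != next {
				// Remember where we landed so the next request doesn't
				// re-check the dead backends we just skipped.
				atomic.StoreUint64(&s.current, uint64(idx))
			}
			return s.backends[idx]
		}
	}
	return nil // every backend is down
}
```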
The loop checks up to one full rotation of backends. If the first choice is dead, we try the next, and so on. If we had to skip some dead backends, we update current so the next request doesn't waste time checking those same dead backends first.
Health Checks: The Heartbeat of Reliability
A load balancer is only as good as its knowledge of backend health. We implement two types of health checking:
Passive Health Checks
These happen automatically when a real request fails. If the reverse proxy can't reach a backend, we mark it as down:
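One way to express the "mark it down" half is a small method on the pool; the reverse proxy's error handler (shown in the retry section below) calls it whenever a forwarded request fails:

```go
// MarkBackendStatus flips the alive flag for the backend with the given URL.
func (s *ServerPool) MarkBackendStatus(backendURL *url.URL, alive bool) {
	s.mux.RLock()
	defer s.mux.RUnlock()
	for _, b := range s.backends {
		if b.URL.String() == backendURL.String() {
			b.SetAlive(alive)
			break
		}
	}
}
```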
This is reactive—we discover failures from actual traffic.
Active Health Checks
We also proactively probe backends at regular intervals:
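A sketch of that loop; the interval and helper names are illustrative, and isBackendAlive is shown in the next snippet:

```go
// HealthCheck probes every backend once, concurrently.
func (s *ServerPool) HealthCheck() {
	// Copy the slice under the lock, then release the lock immediately.
	s.mux.RLock()
	backends := make([]*Backend, len(s.backends))
	copy(backends, s.backends)
	s.mux.RUnlock()

	var wg sync.WaitGroup
	for _, b := range backends {
		wg.Add(1)
		// Pass b as a parameter so each goroutine gets its own copy.
		go func(b *Backend) {
			defer wg.Done()
			alive := isBackendAlive(b.URL)
			b.SetAlive(alive)
			if !alive {
				log.Printf("backend %s is down", b.URL)
			}
		}(b)
	}
	wg.Wait()
}

// healthCheckLoop runs HealthCheck on a ticker until the context is cancelled.
func (s *ServerPool) healthCheckLoop(ctx context.Context, interval time.Duration) {
	t := time.NewTicker(interval)
	defer t.Stop()
	for {
		select {
		case <-t.C:
			s.HealthCheck()
		case <-ctx.Done():
			return
		}
	}
}
```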
Why copy the slice? Without copying, we'd hold the lock for the entire duration of all health checks (potentially seconds), blocking other goroutines from adding/removing backends. Copying gives us a safe snapshot to work with while releasing the lock immediately.
Why pass b as a parameter? This is a classic Go gotcha. If we wrote go func() { ... backend ... }(), all goroutines would capture the same loop variable and see its final value. Passing it as a parameter gives each goroutine its own copy. (Go 1.22 changed loop variables to be per-iteration, which fixes this particular trap, but passing the value explicitly is still the clearest way to make the intent obvious.)
The Health Check Client
For efficiency, we use a shared HTTP client with connection pooling:
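A sketch of that client and the probe helper (timeouts and pool sizes are illustrative, not tuned recommendations):

```go
// Shared client for health probes. The transport keeps idle connections
// around so repeated probes reuse them instead of dialing fresh TCP
// connections every interval.
var healthClient = &http.Client{
	Timeout: 5 * time.Second,
	Transport: &http.Transport{
		MaxIdleConns:        100,
		MaxIdleConnsPerHost: 2,
		IdleConnTimeout:     90 * time.Second,
	},
}

// isBackendAlive issues a lightweight GET; any response means the process
// is up, and only transport-level errors mark it down.
func isBackendAlive(u *url.URL) bool {
	resp, err := healthClient.Get(u.String())
	if err != nil {
		return false
	}
	defer resp.Body.Close()
	io.Copy(io.Discard, resp.Body) // drain so the connection can be reused
	return true
}
```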
Why connection pooling? Creating TCP connections is expensive. If we have 10 backends and check health every 10 seconds, that's 10 new connections every 10 seconds—wasteful. With pooling, we reuse existing connections.
The Load Balancer Handler
This is the core function that handles every incoming request:
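It looks roughly like this; I'm assuming a package-level serverPool and a maxAttempts constant here, which is one way to wire it up, not the only one:

```go
const maxAttempts = 3

var serverPool ServerPool // populated with backends in main

// lb is the handler every incoming request goes through.
func lb(w http.ResponseWriter, r *http.Request) {
	attempts := GetAttemptsFromContext(r)
	if attempts > maxAttempts {
		log.Printf("%s(%s): max attempts reached, terminating", r.RemoteAddr, r.URL.Path)
		http.Error(w, "Service not available", http.StatusServiceUnavailable)
		return
	}

	peer := serverPool.GetNextPeer()
	if peer != nil {
		peer.ReverseProxy.ServeHTTP(w, r)
		return
	}
	http.Error(w, "Service not available", http.StatusServiceUnavailable)
}
```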
Clean and simple. The complexity is hidden in the reverse proxy's error handler.
The Retry Strategy: Don't Give Up Too Easily
When a backend fails, we don't immediately give up. Our strategy:
- Retry the same backend a few times (maybe it's a temporary glitch)
- If retries exhausted, mark the backend as down
- Try a different backend (if we haven't exceeded maxAttempts), as sketched below
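Here's a sketch of how that strategy can live in the reverse proxy's ErrorHandler. It assumes this code sits where each backend's proxy is built, so proxy, the parsed backend URL u, and the shared serverPool are in scope; the retry count and delay are illustrative:

```go
proxy.ErrorHandler = func(w http.ResponseWriter, r *http.Request, e error) {
	log.Printf("[%s] %s", u.Host, e.Error())

	retries := GetRetryFromContext(r)
	if retries < 3 {
		// Retry the same backend after a short delay, unless the client
		// has already gone away.
		select {
		case <-time.After(10 * time.Millisecond):
			ctx := context.WithValue(r.Context(), retriesKey{}, retries+1)
			proxy.ServeHTTP(w, r.WithContext(ctx))
		case <-r.Context().Done():
			// Client cancelled; don't waste a retry.
		}
		return
	}

	// Retries exhausted: mark this backend as down (the passive health check)...
	serverPool.MarkBackendStatus(u, false)

	// ...then bump the attempt count and hand the request back to the main
	// handler so a different backend gets a shot.
	attempts := GetAttemptsFromContext(r)
	ctx := context.WithValue(r.Context(), attemptsKey{}, attempts+1)
	lb(w, r.WithContext(ctx))
}
```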
The select statement is elegant—it either waits for the retry delay OR detects if the client cancelled the request. No wasted retries on abandoned connections.
Setting Up Backends with Connection Tuning
Production-grade connections need proper timeouts:
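A sketch of the per-backend proxy setup with those timeouts in place (rawBackendURL and the specific durations are illustrative):

```go
u, err := url.Parse(rawBackendURL) // e.g. "http://localhost:8081"
if err != nil {
	log.Fatal(err)
}

proxy := httputil.NewSingleHostReverseProxy(u)
proxy.Transport = &http.Transport{
	// Dial timeout: give up on hosts that never complete the TCP handshake.
	DialContext: (&net.Dialer{
		Timeout:   5 * time.Second,
		KeepAlive: 30 * time.Second,
	}).DialContext,
	// Drop pooled connections that sit unused.
	IdleConnTimeout: 90 * time.Second,
	// Catch backends that accept the connection but never answer.
	ResponseHeaderTimeout: 10 * time.Second,
	MaxIdleConnsPerHost:   32,
}
```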
Each timeout prevents a different failure mode:
- DialTimeout: Don't wait forever for unresponsive hosts
- IdleConnTimeout: Clean up unused connections
- ResponseHeaderTimeout: Detect backends that accept connections but never respond
Graceful Shutdown: Don't Drop Requests
When someone hits Ctrl+C, we don't want to drop in-flight requests. Graceful shutdown:
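A sketch of the shutdown wiring, assuming lb from earlier is the top-level handler and the port is illustrative:

```go
srv := &http.Server{
	Addr:    ":8080",
	Handler: http.HandlerFunc(lb),
}

go func() {
	if err := srv.ListenAndServe(); err != nil && err != http.ErrServerClosed {
		log.Fatal(err)
	}
}()

// Wait for Ctrl+C or SIGTERM.
stop := make(chan os.Signal, 1)
signal.Notify(stop, os.Interrupt, syscall.SIGTERM)
<-stop

// Stop accepting new requests; give in-flight ones up to 30 seconds to finish.
ctx, cancel := context.WithTimeout(context.Background(), 30*time.Second)
defer cancel()
if err := srv.Shutdown(ctx); err != nil {
	log.Printf("forced shutdown: %v", err)
}
```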
srv.Shutdown() stops accepting new requests but lets existing ones finish—up to our 30-second timeout.
Running It: A Quick Demo
First, create a simple backend server to test with:
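A minimal backend along these lines works fine; the -port flag is just a convenience for this demo:

```go
// backend/main.go: a trivial server that identifies itself by port.
package main

import (
	"flag"
	"fmt"
	"log"
	"net/http"
)

func main() {
	port := flag.String("port", "8081", "port to listen on")
	flag.Parse()

	http.HandleFunc("/", func(w http.ResponseWriter, r *http.Request) {
		fmt.Fprintf(w, "Hello from backend on port %s\n", *port)
	})
	log.Printf("backend listening on :%s", *port)
	log.Fatal(http.ListenAndServe(":"+*port, nil))
}
```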
Now spin everything up: start a few backend instances on different ports, then run the load balancer in front of them.
Test the round-robin distribution by sending a handful of requests to the load balancer in quick succession.
You'll see responses from different backends in rotation.
Now kill one backend and watch the load balancer automatically route around it!
Key Takeaways
Building this load balancer taught me several things:
- Concurrency is everything: Go's goroutines and channels make concurrent health checks elegant. Without them, checking 10 backends would be painfully slow.
- Context is your friend: Request-scoped data like attempt counts should live in context, not globals. This keeps concurrent requests isolated.
- Defense in depth: We have both passive health checks (detect failures from real traffic) and active health checks (proactive probing). Either alone isn't enough.
- Timeouts everywhere: Every network operation needs a timeout. Without them, a single slow backend can exhaust all your connections.
- Graceful shutdown matters: In production, you don't want to drop requests mid-flight. The few extra lines of code are worth it.
What's Next?
This load balancer is functional but basic. In a production scenario, you might want to add:
- Weighted Round Robin: Give more traffic to beefier backends
- Least Connections: Route to the backend with fewest active connections
- Sticky Sessions: Keep a user on the same backend (useful for stateful apps)
- Rate Limiting: Prevent any single client from overwhelming your backends
- Metrics/Observability: Export Prometheus metrics for monitoring
- TLS Termination: Handle HTTPS at the load balancer
The foundation we've built makes all of these extensions possible. The hard part—concurrent request handling, health checking, graceful failover—is already done.
Building this was one of those projects that made systems concepts click in a way that reading never could. I hope it does the same for you. If you build something cool with this foundation—or find bugs I missed—I'd love to hear about it.
Want to discuss distributed systems or Go? Feel free to reach out at hi@codewarnab.in