Building a Production-Ready Load Balancer from Scratch in Go

Arnab Mondal · 14 min read


The Itch That Started It All

It started with a question I couldn't shake: "What actually happens when thousands of requests hit my servers?"

Sure, I knew the textbook answer—load balancers distribute traffic across servers. But knowing something conceptually and understanding it deeply are two different things. I'd been deploying apps behind NGINX and AWS ALBs for years, treating them as black boxes. One day, that stopped being okay.

So I decided to build one from scratch. In Go. No frameworks, no shortcuts—just the standard library and a lot of curiosity.

What I discovered along the way wasn't just how load balancers work, but why every design decision matters when you're standing between users and your backend servers. This post is that journey—part case study, part tutorial. By the end, you'll understand the internals well enough to build your own.

What Is a Load Balancer, Really?

Before we write any code, let's get crystal clear on what we're building. A load balancer is essentially a traffic cop that sits between clients and your backend servers:

[Diagram: clients on one side, Backend 1, Backend 2, and Backend 3 on the other, with the load balancer in the middle performing four steps: 1. Receives, 2. Chooses, 3. Forwards, 4. Returns.]

When a request arrives, the load balancer:

  1. Receives the incoming HTTP request
  2. Chooses which backend server should handle it (using an algorithm like Round Robin)
  3. Forwards the request to that backend
  4. Returns the backend's response to the client

Simple in concept. Deceptively tricky in execution.
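
Before we add any balancing, it's worth seeing how little code the forwarding half takes. Here's a minimal, self-contained sketch (not the code we'll build in this post, and the backend address is just a placeholder) that proxies every request to a single backend using the standard library's httputil.ReverseProxy:

go
// Minimal sketch: forward everything to one backend, no balancing yet.
// The address below is a placeholder for illustration only.
package main

import (
    "log"
    "net/http"
    "net/http/httputil"
    "net/url"
)

func main() {
    target, err := url.Parse("http://localhost:8081")
    if err != nil {
        log.Fatal(err)
    }

    // NewSingleHostReverseProxy covers steps 3 and 4: it forwards the
    // request to the target and streams the response back to the client.
    proxy := httputil.NewSingleHostReverseProxy(target)

    log.Fatal(http.ListenAndServe(":8080", proxy))
}

Everything else in this post is about doing steps 1 and 2 well: picking a healthy backend and recovering when one fails.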

The Architecture We're Building

Here's what we'll implement:

  • Round Robin selection: Fair distribution across healthy backends
  • Health checks: Both passive (detect failures from real requests) and active (periodic probes)
  • Automatic failover: If a backend dies, skip it and try another
  • Retry logic: Don't give up on the first hiccup
  • Graceful shutdown: Finish in-flight requests before stopping
  • Connection pooling: Reuse connections for efficiency

Let's dive in.

Setting Up the Foundation

Request-Scoped Tracking with Context

The first challenge: how do we track attempts and retries per request when multiple requests are being handled concurrently? This is where Go's context.Context shines.

go
// Create unique types for context keys (avoids key collisions)
type attemptKey struct{}
type retryKey struct{}

// Extract attempt count from request context
func getAttempts(r *http.Request) int {
    if v, ok := r.Context().Value(attemptKey{}).(int); ok {
        return v
    }
    return 1 // First attempt
}

// Extract retry count from request context
func getRetries(r *http.Request) int {
    if v, ok := r.Context().Value(retryKey{}).(int); ok {
        return v
    }
    return 0
}

Why context instead of global variables? Each HTTP request runs in its own goroutine. If we used globals, requests would interfere with each other's counters. Context lets us carry request-specific data through the call chain safely.

Why empty struct types as keys? Using string keys like "attempts" could collide with other packages. Empty struct types are unique and take zero bytes—they're just type markers.
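
For completeness, the write side of this pattern is a single context.WithValue call: we attach an updated count to the request's context so whatever handles the request next sees it. The retry handler later in the post uses exactly this pattern.

go
// Sketch: bump the attempt count for the next hop in the chain.
attempts := getAttempts(r)
ctx := context.WithValue(r.Context(), attemptKey{}, attempts+1)
r = r.WithContext(ctx) // handlers that receive r from here on see the new count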

The Backend Struct

Each backend server needs to track its URL and health status, and have a ReverseProxy ready to forward requests:

go
type Backend struct {
    URL          *url.URL               // Parsed URL of this backend
    alive        bool                   // Is this backend healthy?
    mu           sync.RWMutex           // Protects the alive field
    ReverseProxy *httputil.ReverseProxy // Does the actual forwarding
}

func (b *Backend) SetAlive(alive bool) {
    b.mu.Lock()
    defer b.mu.Unlock()
    b.alive = alive
}

func (b *Backend) IsAlive() bool {
    b.mu.RLock()
    defer b.mu.RUnlock()
    return b.alive
}

Notice the sync.RWMutex—this is critical. Multiple goroutines will read the alive status simultaneously (every request checks it), but only health checks write to it. An RWMutex allows multiple readers OR one writer, which is more efficient than a regular mutex.

The Server Pool: Managing Multiple Backends

The ServerPool is where the magic happens. It holds all backends and implements Round Robin selection:

go
type ServerPool struct {
    backends []*Backend
    current  uint64       // Atomic counter for round-robin
    mu       sync.RWMutex // Protects the backends slice
}

func (p *ServerPool) AddBackend(b *Backend) {
    p.mu.Lock()
    defer p.mu.Unlock()
    p.backends = append(p.backends, b)
}

Round Robin Selection

Round Robin is elegantly simple: Backend 1, then Backend 2, then Backend 3, then back to Backend 1. Like taking turns fairly.

go
func (p *ServerPool) NextIndex() int {
    p.mu.RLock()
    defer p.mu.RUnlock()

    if len(p.backends) == 0 {
        return 0
    }

    // Atomic increment + modulo wraps around
    return int(atomic.AddUint64(&p.current, 1) % uint64(len(p.backends)))
}

Why atomic? Multiple requests hit the load balancer concurrently. Without atomic.AddUint64, two goroutines could read the same value, both increment it, and both write back the same result—losing an increment. Atomic operations guarantee the increment happens without interruption.

Getting the Next Available Peer

Here's where it gets interesting. We don't just pick the next backend—we pick the next alive backend:

go
func (p *ServerPool) GetNextPeer() *Backend {
    p.mu.RLock()
    defer p.mu.RUnlock()

    if len(p.backends) == 0 {
        return nil
    }

    next := int(atomic.AddUint64(&p.current, 1) % uint64(len(p.backends)))
    limit := len(p.backends) + next // Full cycle limit

    for i := next; i < limit; i++ {
        idx := i % len(p.backends)

        if p.backends[idx].IsAlive() {
            // Update current if we skipped dead backends
            if i != next {
                atomic.StoreUint64(&p.current, uint64(idx))
            }
            return p.backends[idx]
        }
    }

    return nil // All backends are down
}

The loop checks up to one full rotation of backends. If the first choice is dead, we try the next, and so on. If we had to skip some dead backends, we update current so the next request doesn't waste time checking those same dead backends first.

Health Checks: The Heartbeat of Reliability

A load balancer is only as good as its knowledge of backend health. We implement two types of health checking:

Passive Health Checks

These happen automatically when a real request fails. If the reverse proxy can't reach a backend, we mark it as down:

go
serverPool.MarkBackendStatus(backendURL, false)

This is reactive—we discover failures from actual traffic.
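
MarkBackendStatus itself is just a lookup over the pool. Its implementation isn't shown in this post, but a plausible sketch, assuming backends are matched by their URL string, looks like this:

go
// Sketch: mark the backend with the given URL as up or down.
func (p *ServerPool) MarkBackendStatus(u *url.URL, alive bool) {
    p.mu.RLock()
    defer p.mu.RUnlock()

    for _, b := range p.backends {
        if b.URL.String() == u.String() {
            b.SetAlive(alive) // takes the backend's own lock, not the pool's
            return
        }
    }
}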

Active Health Checks

We also proactively probe backends at regular intervals:

go
func (p *ServerPool) HealthCheck(timeout time.Duration, path string) {
    // Copy backends slice to release lock quickly
    p.mu.RLock()
    backends := make([]*Backend, len(p.backends))
    copy(backends, p.backends)
    p.mu.RUnlock()

    var wg sync.WaitGroup

    for _, b := range backends {
        wg.Add(1)
        go func(backend *Backend) {
            defer wg.Done()

            alive := isBackendAlive(backend.URL, timeout, path)
            backend.SetAlive(alive)

            status := "up"
            if !alive {
                status = "down"
            }
            log.Printf("healthcheck: %s [%s]\n", backend.URL, status)
        }(b)
    }

    wg.Wait()
}

Why copy the slice? Without copying, we'd hold the lock for the entire duration of all health checks (potentially seconds), blocking other goroutines from adding/removing backends. Copying gives us a safe snapshot to work with while releasing the lock immediately.

Why pass b as a parameter? This is a classic Go gotcha. If we wrote go func() { ... backend ... }(), all goroutines would capture the same loop variable and see its final value. Passing it as a parameter gives each goroutine its own copy. (As of Go 1.22 the loop variable is scoped per iteration, so the capture is safe there, but passing it explicitly still makes the intent obvious.)

The Health Check Client

For efficiency, we use a shared HTTP client with connection pooling:

go
var healthCheckClient = &http.Client{
    Timeout: 2 * time.Second,
    Transport: &http.Transport{
        MaxIdleConns:        10,               // Total idle connections
        MaxIdleConnsPerHost: 2,                // Per-backend idle connections
        IdleConnTimeout:     30 * time.Second, // Close stale connections
    },
}

func isBackendAlive(base *url.URL, timeout time.Duration, path string) bool {
    u := *base // Copy the URL struct
    u.Path = path
    u.RawQuery = ""
    u.Fragment = ""

    ctx, cancel := context.WithTimeout(context.Background(), timeout)
    defer cancel()

    req, err := http.NewRequestWithContext(ctx, "GET", u.String(), nil)
    if err != nil {
        return false
    }

    resp, err := healthCheckClient.Do(req)
    if err != nil {
        return false
    }
    defer resp.Body.Close()

    // 2xx or 3xx = alive
    return resp.StatusCode >= 200 && resp.StatusCode < 400
}

Why connection pooling? Creating TCP connections is expensive. If we have 10 backends and check health every 10 seconds, that's 10 new connections every 10 seconds—wasteful. With pooling, we reuse existing connections.
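
One piece is still missing: something has to call HealthCheck on a schedule. A minimal runner might look like the sketch below. The function name, the interval, and the cancellable context are my own choices here; the idea is that the healthCheckCancel() call you'll see in the shutdown section cancels the ctx passed into a loop like this one.

go
// Sketch: run a health-check pass every interval until ctx is cancelled.
func runHealthChecks(ctx context.Context, pool *ServerPool, interval, timeout time.Duration, path string) {
    ticker := time.NewTicker(interval)
    defer ticker.Stop()

    for {
        select {
        case <-ticker.C:
            pool.HealthCheck(timeout, path)
        case <-ctx.Done():
            return // graceful shutdown cancelled the health-check context
        }
    }
}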

The Load Balancer Handler

This is the core function that handles every incoming request:

go
func (cfg lbConfig) lb(w http.ResponseWriter, r *http.Request) {
    attempts := getAttempts(r)

    // Give up if we've tried too many backends
    if attempts > cfg.maxAttempts {
        log.Printf("%s %s: max attempts reached (%d)\n",
            r.RemoteAddr, r.URL.Path, attempts)
        http.Error(w, "Service not available", http.StatusServiceUnavailable)
        return
    }

    // Get the next available backend
    peer := serverPool.GetNextPeer()
    if peer == nil {
        http.Error(w, "Service not available", http.StatusServiceUnavailable)
        return
    }

    // Forward the request
    peer.ReverseProxy.ServeHTTP(w, r)
}

Clean and simple. The complexity is hidden in the reverse proxy's error handler.
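
The cfg receiver is a small configuration struct. Its exact shape isn't shown here, but from this handler and the retry logic below it plausibly looks something like the following; the field names are inferred from the code and the defaults are illustrative:

go
// Sketch: configuration the lb handler and the retry logic rely on.
type lbConfig struct {
    maxAttempts int           // how many different backends to try per request
    maxRetries  int           // how many times to retry the same backend
    retryDelay  time.Duration // pause between retries of the same backend
}

var defaultConfig = lbConfig{
    maxAttempts: 3,
    maxRetries:  2,
    retryDelay:  100 * time.Millisecond,
}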

The Retry Strategy: Don't Give Up Too Easily

When a backend fails, we don't immediately give up. Our strategy:

  1. Retry the same backend a few times (maybe it's a temporary glitch)
  2. If retries exhausted, mark the backend as down
  3. Try a different backend (if we haven't exceeded maxAttempts)
go
proxy.ErrorHandler = func(w http.ResponseWriter, r *http.Request, err error) {
    log.Printf("[%s] proxy error: %v\n", backendURL.Host, err)

    // Client disconnected?
    if r.Context().Err() != nil {
        http.Error(w, "Request cancelled", http.StatusRequestTimeout)
        return
    }

    retries := getRetries(r)

    // Retry same backend?
    if retries < cfg.maxRetries {
        select {
        case <-time.After(cfg.retryDelay):
            ctx := context.WithValue(r.Context(), retryKey{}, retries+1)
            proxy.ServeHTTP(w, r.WithContext(ctx))
        case <-r.Context().Done():
            http.Error(w, "Request cancelled", http.StatusRequestTimeout)
        }
        return
    }

    // Mark this backend as down (passive health check)
    serverPool.MarkBackendStatus(backendURL, false)

    // Try a different backend
    attempts := getAttempts(r)
    ctx := context.WithValue(r.Context(), attemptKey{}, attempts+1)
    cfg.lb(w, r.WithContext(ctx))
}

The select statement is elegant—it either waits for the retry delay OR detects if the client cancelled the request. No wasted retries on abandoned connections.

Setting Up Backends with Connection Tuning

Production-grade connections need proper timeouts:

go
proxy.Transport = &http.Transport{
    DialContext: (&net.Dialer{
        Timeout:   30 * time.Second, // Connection establishment timeout
        KeepAlive: 30 * time.Second, // Keep connections alive
    }).DialContext,
    MaxIdleConns:          100,              // Total idle connections
    MaxIdleConnsPerHost:   10,               // Per-backend idle connections
    IdleConnTimeout:       90 * time.Second, // Close stale connections
    TLSHandshakeTimeout:   10 * time.Second, // TLS timeout
    ExpectContinueTimeout: 1 * time.Second,  // 100-continue timeout
    ResponseHeaderTimeout: 10 * time.Second, // Header read timeout
}

Each timeout prevents a different failure mode:

  • Dialer Timeout: Don't wait forever trying to connect to unresponsive hosts
  • IdleConnTimeout: Clean up unused connections
  • ResponseHeaderTimeout: Detect backends that accept connections but never respond
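
To tie these pieces together, here's roughly how each backend could be built at startup: parse its URL, create a reverse proxy with the tuned transport, attach the error handler from the retry section, and register it with the pool. This is a sketch under my own naming, not the post's exact wiring code:

go
// Sketch: parse a comma-separated backend list and register one Backend per URL.
func addBackends(backendList string, transport http.RoundTripper, pool *ServerPool) error {
    for _, raw := range strings.Split(backendList, ",") {
        backendURL, err := url.Parse(strings.TrimSpace(raw))
        if err != nil {
            return fmt.Errorf("invalid backend URL %q: %w", raw, err)
        }

        proxy := httputil.NewSingleHostReverseProxy(backendURL)
        proxy.Transport = transport
        // proxy.ErrorHandler = ... // attach the retry/failover closure shown earlier

        b := &Backend{URL: backendURL, ReverseProxy: proxy}
        b.SetAlive(true) // optimistic: assume healthy until a health check says otherwise
        pool.AddBackend(b)
    }
    return nil
}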

Graceful Shutdown: Don't Drop Requests

When someone hits Ctrl+C, we don't want to drop in-flight requests. Here's the graceful shutdown sequence:

go
// Channel to receive OS signals
quit := make(chan os.Signal, 1)
signal.Notify(quit, syscall.SIGINT, syscall.SIGTERM)

// Start server in background
go func() {
    log.Printf("load balancer listening on http://localhost:%d\n", port)
    if err := srv.ListenAndServe(); err != nil && err != http.ErrServerClosed {
        log.Fatalf("Server failed: %v", err)
    }
}()

// Wait for shutdown signal
<-quit
log.Println("shutting down load balancer...")

// Stop health checks
healthCheckCancel()

// Give requests 30 seconds to complete
shutdownCtx, shutdownCancel := context.WithTimeout(context.Background(), 30*time.Second)
defer shutdownCancel()

if err := srv.Shutdown(shutdownCtx); err != nil {
    log.Printf("Error during shutdown: %v\n", err)
} else {
    log.Println("load balancer stopped gracefully")
}

srv.Shutdown() stops accepting new requests but lets existing ones finish—up to our 30-second timeout.
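
For reference, the srv being shut down is an ordinary http.Server whose handler is the lb function from earlier. A minimal construction might look like this; the port comes from a flag and the extra timeouts are illustrative, not from the original:

go
// Sketch: the server whose Shutdown is called above.
srv := &http.Server{
    Addr:              fmt.Sprintf(":%d", port),
    Handler:           http.HandlerFunc(cfg.lb), // every request flows through the balancer
    ReadHeaderTimeout: 10 * time.Second,         // illustrative values
    IdleTimeout:       120 * time.Second,
}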

Running It: A Quick Demo

First, create a simple backend server to test with:

go
// backend/main.go
package main

import (
    "encoding/json"
    "flag"
    "fmt"
    "log"
    "net/http"
    "time"
)

type response struct {
    Name   string `json:"name"`
    Port   int    `json:"port"`
    Method string `json:"method"`
    Path   string `json:"path"`
    Time   string `json:"time"`
}

func main() {
    var port int
    var name string

    flag.IntVar(&port, "port", 8081, "Port to listen on")
    flag.StringVar(&name, "name", "backend-1", "Backend name")
    flag.Parse()

    mux := http.NewServeMux()

    // Health endpoint
    mux.HandleFunc("/health", func(w http.ResponseWriter, r *http.Request) {
        w.WriteHeader(http.StatusOK)
        w.Write([]byte("ok"))
    })

    // Main endpoint - shows which backend handled the request
    mux.HandleFunc("/", func(w http.ResponseWriter, r *http.Request) {
        w.Header().Set("Content-Type", "application/json")
        json.NewEncoder(w).Encode(response{
            Name:   name,
            Port:   port,
            Method: r.Method,
            Path:   r.URL.Path,
            Time:   time.Now().Format(time.RFC3339Nano),
        })
    })

    srv := &http.Server{
        Addr:    fmt.Sprintf(":%d", port),
        Handler: mux,
    }

    log.Printf("backend %q listening on http://localhost:%d\n", name, port)
    log.Fatal(srv.ListenAndServe())
}

Now spin everything up:

bash
# Terminal 1: Start backend 1
go run backend/main.go -port 8081 -name backend-1

# Terminal 2: Start backend 2
go run backend/main.go -port 8082 -name backend-2

# Terminal 3: Start backend 3
go run backend/main.go -port 8083 -name backend-3

# Terminal 4: Start the load balancer
go run lb/main.go -backends "http://localhost:8081,http://localhost:8082,http://localhost:8083"

Test the round-robin distribution:

bash
# Hit the load balancer multiple times
curl http://localhost:8080/
curl http://localhost:8080/
curl http://localhost:8080/

You'll see responses from different backends in rotation:

json
{"name":"backend-1","port":8081,"method":"GET","path":"/","time":"..."}
{"name":"backend-2","port":8082,"method":"GET","path":"/","time":"..."}
{"name":"backend-3","port":8083,"method":"GET","path":"/","time":"..."}

Now kill one backend and watch the load balancer automatically route around it!

Key Takeaways

Building this load balancer taught me several things:

  1. Concurrency is everything: Go's goroutines and channels make concurrent health checks elegant. Without them, checking 10 backends would be painfully slow.

  2. Context is your friend: Request-scoped data like attempt counts should live in context, not globals. This keeps concurrent requests isolated.

  3. Defense in depth: We have both passive health checks (detect failures from real traffic) and active health checks (proactive probing). Either alone isn't enough.

  4. Timeouts everywhere: Every network operation needs a timeout. Without them, a single slow backend can exhaust all your connections.

  5. Graceful shutdown matters: In production, you don't want to drop requests mid-flight. The few extra lines of code are worth it.

What's Next?

This load balancer is functional but basic. In a production scenario, you might want to add:

  • Weighted Round Robin: Give more traffic to beefier backends
  • Least Connections: Route to the backend with fewest active connections
  • Sticky Sessions: Keep a user on the same backend (useful for stateful apps)
  • Rate Limiting: Prevent any single client from overwhelming your backends
  • Metrics/Observability: Export Prometheus metrics for monitoring
  • TLS termination: Handle HTTPS at the load balancer

The foundation we've built makes all of these extensions possible. The hard part—concurrent request handling, health checking, graceful failover—is already done.


Building this was one of those projects that made systems concepts click in a way that reading never could. I hope it does the same for you. If you build something cool with this foundation—or find bugs I missed—I'd love to hear about it.

Want to discuss distributed systems or Go? Feel free to reach out at hi@codewarnab.in
