Load-balancing with K8S
Published on 2025-04-10
Client-side load-balancing in Kubernetes

💡 This article presumes you already have a basic understanding of gRPC.

Why is this a tricky question?

  • ingress-nginx lacks features and well-written docs, yet it is fast and widely used
  • Connect RPC's content type differs from gRPC's, so few materials can be found

Connect RPC

https://connectrpc.com/

Play with K8S

When we talk about Services in K8s, we talk about load balancing. There are two kinds of load balancing in general:

  • server-side load balancing
  • client-side load balancing

So how do they usually get implemented?

It should be noted that here we focus on client-side load-balancing specifically.

Server-side load balancing

  • Managed: the built-in Service types such as NodePort, ClusterIP and LoadBalancer.
  • Ingress Controllers: these act as Layer 7 server-side balancers. The request hits the Ingress pod, which looks at the HTTP path/host and proxies the request to a backend pod. Here is the explanation of how ingress-nginx handles endpoint updates: https://github.com/kubernetes/ingress-nginx/issues/9620

Client-side load balancing

  • DNS-Based (Headless Service):
  1. Create a Service with clusterIP: None.
  2. When the client queries my-service.namespace.svc, the K8s DNS server doesn't return one virtual IP. Instead, it looks at the EndpointSlice and returns all the Pod IPs as multiple A records.
  3. Client responsibility: the client receives this list and must implement its own logic (round robin, least request) to pick which IP to connect to (see the sketch after the note below).

Headless Services are just a way to "leak" the EndpointSlice data into the DNS system.
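To make that client responsibility concrete, here is a minimal Go sketch, assuming a headless Service reachable at my-service.demo.svc.cluster.local (a placeholder name): resolve all A records and round-robin over them. Real code would also have to re-resolve periodically, which is exactly where the DNS TTL problems discussed later come from.

// Minimal sketch of DNS-based client-side balancing against a headless
// Service. "my-service.demo.svc.cluster.local" is a placeholder.
package main

import (
	"fmt"
	"net"
	"sync/atomic"
)

var next uint64

func pickPodIP() (string, error) {
	// A headless Service returns every Pod IP as an A record.
	ips, err := net.LookupHost("my-service.demo.svc.cluster.local")
	if err != nil {
		return "", err
	}
	if len(ips) == 0 {
		return "", fmt.Errorf("no addresses resolved")
	}
	// Naive round-robin over the resolved Pod IPs.
	i := atomic.AddUint64(&next, 1)
	return ips[int(i)%len(ips)], nil
}

func main() {
	ip, err := pickPodIP()
	if err != nil {
		panic(err)
	}
	fmt.Println("dialing pod", ip)
}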

  • API-Based (Mesh & Libraries):

This is more powerful because it bypasses the limitations of DNS (like caching/TTL).

  1. An observer subscribes to EndpointSlice API updates via a long poll or watch.
  2. Example:
  • Istio/Linkerd: the control plane watches the EndpointSlice and pushes the IPs to the sidecar proxy (Envoy for Istio, linkerd2-proxy for Linkerd). The sidecar intercepts the traffic and routes it directly to a Pod IP.
  • Smart Libraries: the code calls the K8s API to get the EndpointSlice, stores the IPs in a local cache, and chooses an IP before sending the request (a rough sketch follows).
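Here is that "smart library" approach sketched with client-go, under assumptions: in-cluster config, and the namespace demo and Service my-service are placeholder names.

// Watch the EndpointSlices of a Service and keep track of ready Pod addresses.
// "demo" and "my-service" are placeholder names.
package main

import (
	"context"
	"fmt"

	discoveryv1 "k8s.io/api/discovery/v1"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/kubernetes"
	"k8s.io/client-go/rest"
)

func main() {
	cfg, err := rest.InClusterConfig()
	if err != nil {
		panic(err)
	}
	cs := kubernetes.NewForConfigOrDie(cfg)

	// Only watch the slices that belong to our Service.
	w, err := cs.DiscoveryV1().EndpointSlices("demo").Watch(context.Background(), metav1.ListOptions{
		LabelSelector: discoveryv1.LabelServiceName + "=my-service",
	})
	if err != nil {
		panic(err)
	}
	for ev := range w.ResultChan() {
		slice, ok := ev.Object.(*discoveryv1.EndpointSlice)
		if !ok {
			continue
		}
		var ready []string
		for _, ep := range slice.Endpoints {
			if ep.Conditions.Ready != nil && *ep.Conditions.Ready {
				ready = append(ready, ep.Addresses...)
			}
		}
		// A real library would update its local cache / connection pool here.
		fmt.Println(ev.Type, "ready addresses:", ready)
	}
}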

This is the way

TL;DR

Motivation


gRPC uses HTTP/2: once the connection is established, requests are multiplexed over it, and each RPC's round trip is carried by a stream. HTTP/2 will not dial a new L4 TCP connection per request, so requests over the same connection always go to the same pod.

Connect RPC can use both HTTP/1.1 and HTTP/2. In Go, the default HTTP/1.1 transport creates a new connection when concurrent requests race and no idle connection exists. The different connections end up "fake load balanced" across the pods in the EndpointSlice.

But if there is no racing, or you are using HTTP/2, load balancing becomes impossible when scaling happens: a new pod will not receive any requests.
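To see why HTTP/2 pins traffic to a single pod, here is a sketch of a typical h2c (HTTP/2 over cleartext) client in Go, roughly how a Connect or gRPC client ends up talking to a ClusterIP: the transport dials one TCP connection and multiplexes every request over it as a stream, so all requests land on whichever pod that connection happened to reach. The /healthz URL is a placeholder.

// h2c client sketch: one TCP dial, then all requests are multiplexed on
// that single connection as HTTP/2 streams.
package main

import (
	"context"
	"crypto/tls"
	"net"
	"net/http"

	"golang.org/x/net/http2"
)

func newH2CClient() *http.Client {
	return &http.Client{
		Transport: &http2.Transport{
			AllowHTTP: true, // speak HTTP/2 without TLS
			DialTLSContext: func(ctx context.Context, network, addr string, _ *tls.Config) (net.Conn, error) {
				// Plain TCP dial; it happens once and is reused for every request.
				return (&net.Dialer{}).DialContext(ctx, network, addr)
			},
		},
	}
}

func main() {
	client := newH2CClient()
	resp, err := client.Get("http://crpc-demo-api-headless:4000/healthz") // placeholder URL
	if err != nil {
		panic(err)
	}
	resp.Body.Close()
}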

Goal

Fast load balancing even while pods are scaling, unaffected by DNS refresh delay and ungraceful exits.

Test Setup

  • Concurrency-based test: 500 concurrent workers, 300 requests per worker; the request stream takes no time, the response stream takes 10 ms (a rough harness sketch follows this list).
  • Use ingress-nginx as the ingress.
  • 5 replicas at the beginning.
  • In the scaling experiment, scale from 5 pods to 7, then back from 7 to 5.
  • Use HTTP/2 connections by default.
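For reference, the load test is roughly shaped like the sketch below. This is an assumption-heavy illustration rather than the actual harness: the URL is the demo endpoint that appears in the logs later, and the sketch assumes the response body reports which pod served the request.

// Load-test sketch: 500 workers, 300 requests each, tally responses per pod.
// The URL and the "pod name in the response body" convention are assumptions.
package main

import (
	"bytes"
	"fmt"
	"io"
	"net/http"
	"sync"
)

func main() {
	const workers, perWorker = 500, 300
	var (
		mu     sync.Mutex
		counts = map[string]int{} // pod name -> responses served
		wg     sync.WaitGroup
	)
	for w := 0; w < workers; w++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			for i := 0; i < perWorker; i++ {
				resp, err := http.Post(
					"http://crpc-demo-api-headless:4000/demo.v1.DemoService/Ping",
					"application/json", bytes.NewReader([]byte("{}")))
				if err != nil {
					continue // count only successful responses
				}
				body, _ := io.ReadAll(resp.Body)
				resp.Body.Close()
				mu.Lock()
				counts[string(body)]++ // assumes the body carries the pod name
				mu.Unlock()
			}
		}()
	}
	wg.Wait()
	fmt.Println(counts)
}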

LB over Ingress

Load balancing is a piece of cake for Ingress. Or, to put it bluntly: it's so f**king easy for users to achieve load balancing with an ingress, because ingress decouples the L7 problem from the L4 one (and we do not consider L7 multi-host routing here).

So, in the client-server scenario, the client maintains its connection to nginx, and ingress-nginx maintains the connections to the upstream pods.

At the same time, ingress-nginx watches the EndpointSlice of the service and dynamically updates its connections to the upstream pods: https://github.com/kubernetes/ingress-nginx/issues/9620

Client streaming: Fine
Server streaming: Fine
Bidirectional streaming: Fine

Issues in Ingress-nginx

  1. Ungraceful exit (GOAWAY in HTTP/2, connection reset by peer in HTTP/1.1)

💡 Why ungraceful?
https://trac.nginx.org/nginx/ticket/2224

  • keepalive_requests:

Sets the maximum number of requests that can be served through one keep-alive connection. After the maximum number of requests are made, the connection is closed.

Can we disable this setting?

💡 NO

https://nginx.org/en/docs/http/ngx_http_core_module.html#keepalive_requests

  2. Lack of balancing strategies: ingress-nginx only offers round-robin and EWMA by default.

LB over Service

Solution: https://github.com/bufbuild/httplb

This project is currently in alpha. The API should be considered unstable and likely to change.

httplb is a package for L7 load balancing. It manages connections keyed by host:port, re-resolves DNS periodically, and updates its connection pool according to the latest resolution result. This settles the issue described in https://medium.com/jamf-engineering/how-three-lines-of-configuration-solved-our-grpc-scaling-issues-in-kubernetes-ca1ff13f7f06.
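A minimal usage sketch, assuming the headless service URL from the experiments below and a /healthz path on the backend (both placeholders); only NewClient, Do and Close are used here:

// Minimal httplb sketch: the client resolves the hostname, keeps connections
// to the resolved backends, and spreads requests across them.
package main

import (
	"fmt"
	"io"
	"net/http"

	"github.com/bufbuild/httplb"
)

func main() {
	client := httplb.NewClient() // defaults: DNS resolution + round-robin
	defer client.Close()

	req, err := http.NewRequest(http.MethodGet, "http://crpc-demo-api-headless:4000/healthz", nil)
	if err != nil {
		panic(err)
	}
	resp, err := client.Do(req)
	if err != nil {
		panic(err)
	}
	defer resp.Body.Close()
	body, _ := io.ReadAll(resp.Body)
	fmt.Println(resp.Status, string(body))
}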

However, in K8s, if a pod exits ungracefully, or exits gracefully but CoreDNS does not update in time, stale addresses appear in the DNS resolution result, which leads to significant efficiency problems in httplb.

Service pod Scaling

When a pod is killed (to simulate scaling), the httplb-wrapped client (round-robin over connections) triggers the following issue: whenever it encounters an unhealthy connection, the client keeps dialing until it times out.

All the httplb connections lag for 30 seconds; the default dialer timeout is exactly 30 seconds (see the log below):

unavailable: read tcp 10.3.193.73:35503->10.3.192.185:4000: read: connection reset by peer
unavailable: read tcp 10.3.193.73:35489->10.3.192.185:4000: read: connection reset by peer
unavailable: read tcp 10.3.193.73:35477->10.3.192.185:4000: read: connection reset by peer
...
unavailable: read tcp 10.3.193.73:35495->10.3.192.185:4000: read: connection reset by peer
unavailable: read tcp 10.3.193.73:35479->10.3.192.185:4000: read: connection reset by peer
unavailable: dial tcp 10.3.192.185:4000: connect: connection refused
...
unavailable: dial tcp 10.3.192.185:4000: connect: connection refused
deadline_exceeded: Post "http://crpc-demo-api-headless:4000/demo.v1.DemoService/Ping": dial tcp 10.3.192.185:4000: i/o timeout
deadline_exceeded: Post "http://crpc-demo-api-headless:4000/demo.v1.DemoService/Ping": dial tcp 10.3.192.185:4000: i/o timeout
deadline_exceeded: Post "http://crpc-demo-api-headless:4000/demo.v1.DemoService/Ping": dial tcp 10.3.192.185:4000: i/o timeout
...
deadline_exceeded: Post "http://crpc-demo-api-headless:4000/demo.v1.DemoService/Ping": dial tcp 10.3.192.185:4000: i/o timeout
deadline_exceeded: Post "http://crpc-demo-api-headless:4000/demo.v1.DemoService/Ping": dial tcp 10.3.192.185:4000: i/o timeout
deadline_exceeded: Post "http://crpc-demo-api-headless:4000/demo.v1.DemoService/Ping": dial tcp 10.3.192.185:4000: i/o timeout

Let's try to explain the above log:

  1. When the pod gets killed, the Go HTTP server process receives SIGKILL and the connection is lost → connection reset by peer
  2. The pod's network resources have not been recycled yet, and requests arrive at an ip:port nothing is listening on → connection refused
  3. The pod has been removed completely, but the iptables rules have not been updated yet, so TCP dials get no response at all → dialer context timeout
  4. The iptables rules are updated → no route to host

Normally, 20 concurrent workers issuing 150,000 requests in total take approximately 5 s.

But with one pod killed, it takes 36.7 s. Unfortunately, this cannot be overcome by simply reducing the DNS resolution interval and the dial timeout, since the cache TTL of CoreDNS is 30 s (https://github.com/kubernetes/kubernetes/issues/92559). So either you build your own optimized DNS server, or you set a smaller TTL (which can heavily hurt performance).

Kill Random Pod

Killing a random pod 1 second after the test starts, the result matches our inference:

[Task 1731653783095054917]
Trail number: 150000
Time elapsed: 36711.305 ms
Success: 149096, Fail: 904
Load Balance:
crpc-demo-api-86888895d7-vsxc2 - 16679
crpc-demo-api-86888895d7-d9sd4 - 30000
crpc-demo-api-86888895d7-f8zqw - 30000
crpc-demo-api-86888895d7-7kx2h - 12417
crpc-demo-api-86888895d7-n48ns - 30000
crpc-demo-api-86888895d7-hqrkh - 30000

Solutions

  1. Apply least-request: if a request hangs on a connection, just let it hang; the other requests will use the healthy connections. A simple yet brutal approach. It works for APIs that respond quickly; a request that outlives the dialer timeout will simply fail.
  2. Create a health checker that periodically checks whether each connection is still alive; once a connection is believed dead, just skip it. K8s encourages liveness probing by default. It may feel like we are recreating the "mesh" and "service discovery" in a naive way, but if you prefer Istio or Consul, that's fine.
// httplb health checker example
// (imports: net, time, github.com/bufbuild/httplb, github.com/bufbuild/httplb/health)

// A short dial timeout so a dead endpoint fails fast instead of hanging
// for the default 30 seconds.
dialer := net.Dialer{Timeout: 5 * time.Second}
client := httplb.NewClient(
	// Probe each endpoint's "healthz" path every 5s; a probe that does not
	// answer within 1s marks the endpoint unhealthy, and it gets skipped.
	httplb.WithHealthChecks(health.NewPollingChecker(
		health.PollingCheckerConfig{
			PollingInterval: 5 * time.Second,
			Timeout:         1 * time.Second,
		},
		health.NewSimpleProber("healthz"),
	)),
	httplb.WithDialer(dialer.DialContext),
)
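Since the httplb client exposes an http.Client-compatible Do method, it can be handed to a connect-go generated client as its HTTP client. A hedged sketch: demov1connect and demov1.PingRequest are assumed names for the generated packages, derived from the demo.v1.DemoService/Ping path in the logs, not real imports.

// Hypothetical wiring of the client above into a connect-go generated client.
// demov1connect / demov1 are assumed names for the generated packages.
svc := demov1connect.NewDemoServiceClient(
	client, // the *httplb.Client built above acts as the HTTP client
	"http://crpc-demo-api-headless:4000",
)
resp, err := svc.Ping(context.Background(), connect.NewRequest(&demov1.PingRequest{}))
if err != nil {
	log.Fatal(err)
}
log.Println(resp.Msg)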

Q&A

References

  1. https://kubernetes.github.io/ingress-nginx/how-it-works/#avoiding-reloads-on-endpoints-changes
  2. https://medium.com/@lapwingcloud/dont-load-balance-grpc-or-http2-using-kubernetes-service-ae71be026d7f
  3. https://www.reddit.com/r/kubernetes/comments/13a6p15/how_does_headless_service_route_traffic_from/