TUTORIAL 8 min read Jan 28, 2026

Debugging Kubernetes Networking with eBPF

A deep dive into using eBPF to trace packet drops and latency spikes in complex microservices environments.

Stefan Nikolić

Senior SRE

Kubernetes networking is notoriously difficult to debug. Between CNI plugins, kube-proxy iptables rules, service meshes, and network policies, a single packet might pass through half a dozen transformation layers before reaching its destination. Traditional debugging tools like tcpdump show you what's happening at one point in the stack — they can't tell you where a packet got dropped or why.

eBPF changes this equation completely. By attaching programs to kernel functions, you can observe every packet at every layer of the networking stack without modifying any application code, and with overhead low enough to leave the probes running in production.

Setting Up the Environment

We'll use bpftrace for ad-hoc investigations and a custom eBPF program compiled with libbpf for production monitoring. First, ensure your kernel supports BTF (BPF Type Format) — you need kernel 5.2+ with CONFIG_DEBUG_INFO_BTF enabled. On most modern distributions, this is already the case.
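Before writing any probes, it's worth confirming the toolchain works end to end. A BTF-enabled kernel exposes `/sys/kernel/btf/vmlinux`, and a trivial `BEGIN` probe verifies that bpftrace can load and run an eBPF program. A minimal smoke test (run with `bpftrace check.bt`, or inline via `bpftrace -e '...'`):

```bpftrace
// Smoke test: if this prints and exits cleanly, bpftrace can
// compile, load, and attach eBPF programs on this node.
BEGIN
{
    printf("bpftrace is working\n");
    exit();
}
```

If this fails, check that `/sys/kernel/btf/vmlinux` exists and that the pod or shell you're in has the required privileges (CAP_BPF/CAP_SYS_ADMIN, depending on kernel version).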

Install bpftrace on your debug node. We recommend keeping a dedicated debug DaemonSet that has the necessary privileges but is scaled to zero by default. When you need to investigate, scale it up on the affected node.

Tracing Packet Drops

The most common networking issue in Kubernetes is packet drops. They manifest as intermittent timeouts, connection resets, and mysterious latency spikes. The kernel has a tracepoint specifically for this: `skb:kfree_skb`. It fires every time a packet is intentionally dropped, and on kernels 5.17 and later it also carries a drop reason (`enum skb_drop_reason`) telling you why.

Attach a bpftrace probe to this tracepoint and you'll immediately see which kernel function is dropping packets and why. The most common culprits in Kubernetes environments are: conntrack table overflow (the kernel can't track more connections), network policy denials (Calico or Cilium dropping traffic that doesn't match policy), and TCP backlog overflow (the receiving application isn't accepting connections fast enough).
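A minimal sketch of such a probe, assuming the tracepoint's `location` argument (the kernel function that freed the skb) is available, as it is on all kernels that have the tracepoint; on 5.17+ you could additionally key the map by `args->reason`:

```bpftrace
// Count drops by the kernel function that freed the skb.
// ksym() resolves the raw address in args->location to a symbol name.
tracepoint:skb:kfree_skb
{
    @drops[ksym(args->location)] = count();
}

// Print and reset the counts every 5 seconds so you can watch
// drop sites change as you reproduce the problem.
interval:s:5
{
    print(@drops);
    clear(@drops);
}
```

Seeing drops attributed to conntrack, netfilter, or TCP backlog functions in this output points you directly at one of the three culprits above.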

Diagnosing Latency

For latency investigations, tracing the full packet lifecycle is essential. Attach probes to `tcp_v4_connect`, `tcp_rcv_state_process`, and `tcp_close` to measure connection establishment time. If SYN packets are being retransmitted (visible via `tcp_retransmit_skb`), you've found your latency source.
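A rough bpftrace sketch of this lifecycle tracing, under the assumption that the first time a connecting socket passes through `tcp_rcv_state_process` approximates handshake completion (the function also runs during teardown, so treat the histogram as an estimate, not an exact connect-latency measurement):

```bpftrace
// Record a timestamp when an outbound IPv4 connect starts.
kprobe:tcp_v4_connect
{
    @start[arg0] = nsecs;          // arg0: struct sock *
}

// First packet processed in a non-ESTABLISHED state for this
// socket -- roughly, the SYN-ACK arriving.
kprobe:tcp_rcv_state_process
/@start[arg0]/
{
    @connect_us = hist((nsecs - @start[arg0]) / 1000);
    delete(@start[arg0]);
}

// Retransmissions are the clearest signal of loss-induced latency.
kprobe:tcp_retransmit_skb
{
    @retransmits = count();
}

END
{
    clear(@start);                 // drop in-flight timestamps on exit
}
```

A bimodal `@connect_us` histogram, or a steadily climbing `@retransmits` counter, tells you whether you're looking at a slow path in the stack or at packet loss.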

In one production incident, we discovered that kube-proxy's iptables rules were causing a 50ms delay on every new connection. The iptables chain had grown to over 15,000 rules due to a large number of Services, and the linear search through these rules added measurable latency on every connection setup. Switching kube-proxy to IPVS mode, which replaces that linear traversal with hash-table lookups, cut connection setup time by roughly 40x.

Production Monitoring with eBPF

For ongoing monitoring, compile your eBPF programs and run them as part of your observability stack. Export metrics via the Prometheus exposition format: conntrack table utilization, packet drop rates by reason, TCP retransmission rates by service, and connection establishment latency distributions.
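The production path is a compiled libbpf program feeding a Prometheus-format metrics endpoint, but you can approximate two of these metrics ad hoc with bpftrace by printing counters on a scrape-like interval. A sketch (the 15-second interval is an arbitrary choice, not a recommendation):

```bpftrace
// Ad-hoc stand-in for two of the metrics above: packet drops
// by drop site, and TCP retransmit counts.
tracepoint:skb:kfree_skb  { @drop_site[ksym(args->location)] = count(); }
kprobe:tcp_retransmit_skb { @tcp_retransmits = count(); }

// Emit and reset on a fixed interval, mimicking a scrape cycle.
interval:s:15
{
    print(@drop_site);
    print(@tcp_retransmits);
    clear(@drop_site);
    clear(@tcp_retransmits);
}
```

This is useful for validating which metrics are worth exporting before investing in the compiled pipeline.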

These metrics will catch networking issues before they become user-visible. A spike in conntrack table utilization gives you hours of warning before connections start getting dropped. Rising retransmission rates indicate network congestion or misconfiguration that you can address proactively.