When we first deployed OpenTelemetry collectors across our production fleet, we were processing around 500K spans per second. Six months later, that number had grown 20x — and our original architecture was buckling under the load.
The bottleneck wasn't where we expected. Most teams point first to the collector's processing pipeline, but our profiling showed that the real culprit was the OTLP receiver's gRPC connection handling: each new connection allocated a disproportionate amount of memory, and under high concurrency, garbage-collection pauses triggered cascading timeouts.
The Architecture Shift
We moved from a single-tier collector topology to a three-tier fan-out architecture. Edge collectors run as DaemonSets on every node, performing minimal processing — just enough to tag spans with node metadata and forward them. Regional aggregators handle the heavy lifting: tail-based sampling, span enrichment, and metric derivation. Finally, gateway collectors manage the connection to our storage backends.
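To make the edge tier concrete, here is a minimal sketch of an edge collector's configuration in the collector's standard YAML. The aggregator endpoint, namespace, and the NODE_NAME environment variable are illustrative stand-ins, not our actual values; the point is that the edge pipeline does nothing beyond node tagging and batching.

```yaml
receivers:
  otlp:
    protocols:
      grpc:
        endpoint: 0.0.0.0:4317

processors:
  # Tag every span with the node it came from (NODE_NAME is assumed
  # to be injected into the DaemonSet pod via the downward API).
  resource:
    attributes:
      - key: k8s.node.name
        value: ${env:NODE_NAME}
        action: upsert
  # Small batches keep edge memory bounded.
  batch:
    timeout: 200ms

exporters:
  otlp:
    # Hypothetical regional aggregator service address.
    endpoint: regional-aggregator.observability.svc:4317

service:
  pipelines:
    traces:
      receivers: [otlp]
      processors: [resource, batch]
      exporters: [otlp]
```

All heavier processors live in the aggregator tier's config instead, which is what lets each tier scale on its own bottleneck.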
This separation of concerns reduced memory pressure on any single component by 85%. More importantly, it allowed us to scale each tier independently based on its specific bottleneck — CPU for aggregators, network I/O for gateways, and memory for edge collectors.
Tail-Based Sampling at Scale
The real cost savings came from implementing intelligent tail-based sampling. Instead of sampling at the head (which means you lose interesting traces), we collect all spans and make sampling decisions after the trace is complete. Error traces? Always kept. Traces with latency above P95? Always kept. Routine health checks? Sampled at 1%.
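The sampling rules above map fairly directly onto the tail_sampling processor from opentelemetry-collector-contrib, which runs in our aggregator tier. A sketch, with assumed values: policies are OR'ed together, the latency policy takes a static threshold_ms (so a P95-derived value has to be computed offline and baked in; 500 ms here is a placeholder), and the trailing probabilistic policy approximates the 1% catch-all for routine traffic. In practice you would scope that last policy to health-check routes, for example with the processor's and/string_attribute composite policies.

```yaml
processors:
  tail_sampling:
    # How long to buffer spans before making a per-trace decision.
    decision_wait: 10s
    policies:
      - name: errors-always
        type: status_code
        status_code:
          status_codes: [ERROR]
      - name: slow-traces-always
        type: latency
        latency:
          threshold_ms: 500   # placeholder for a P95-derived threshold
      - name: routine-one-percent
        type: probabilistic
        probabilistic:
          sampling_percentage: 1
```

Because decisions are deferred by decision_wait, the aggregator must hold every in-flight trace in memory, which is exactly why this work belongs in the regional tier rather than at the edge.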
This reduced our storage costs by 73% while actually improving our debugging capability. When something breaks, we have 100% of the relevant data.
Lessons Learned
First, don't over-process at the edge. The temptation to add more processors to your collector pipeline is strong, but every transformation adds latency and memory overhead. Push processing downstream where you have more headroom.
Second, invest in backpressure handling early. When downstream systems slow down, your collectors need to gracefully degrade — dropping low-priority telemetry rather than crashing entirely. We implemented a priority queue system that ensures error and high-latency spans are never dropped, even under extreme load.
Third, monitor your monitoring. It sounds recursive, but if your observability pipeline goes down silently, you're flying blind. We run a separate, lightweight monitoring stack that watches our main OTel infrastructure.
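One building block for this: the collector exposes its own health metrics (queue depth, dropped spans, refused connections), which a separate stack can scrape. A sketch of the relevant service section, assuming a default-style Prometheus endpoint; port and level are illustrative:

```yaml
service:
  telemetry:
    metrics:
      # Expose the collector's internal metrics for the external
      # watchdog stack to scrape (Prometheus format).
      level: detailed
    logs:
      level: info
```

The key property is that the watchdog stack shares no infrastructure with the main pipeline, so a pipeline outage cannot take down the thing that detects it.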