Every serverless platform has a dirty secret: cold starts. When a function hasn't been invoked recently, the platform needs to allocate a container, download your code, initialize the runtime, and run your initialization logic before it can serve the first request. This can add anywhere from 100ms to several seconds of latency.
For background jobs and async processing, cold starts rarely matter. But for synchronous API endpoints that users are waiting on, a 2-second cold start is the difference between a snappy experience and a frustrated customer.
Anatomy of a Cold Start
A cold start has four phases: container allocation (50-200ms, controlled by the platform), runtime initialization (50-500ms, depends on your language), dependency loading (100ms-2s, depends on your bundle size), and handler initialization (variable, depends on your code).
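The split between the last two phases is easy to see in a handler's own structure: module-level code runs once, during the cold start's handler-initialization phase, while the handler body runs on every invocation. A minimal sketch (the config values are hypothetical placeholders for real setup like SDK clients or connection pools):

```python
import time

# Module-level code runs once per container, during the cold
# start's handler-initialization phase.
_INIT_STARTED = time.perf_counter()

# Hypothetical expensive setup: clients, config, caches.
CONFIG = {"table": "orders"}

INIT_SECONDS = time.perf_counter() - _INIT_STARTED


def handler(event, context=None):
    # Only this body runs on warm invocations, so work hoisted to
    # module level is amortized across all requests the container serves.
    return {"init_seconds": INIT_SECONDS, "table": CONFIG["table"]}
```

Anything you can afford to do once (and reuse) belongs above the handler; anything request-specific belongs inside it.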
You can't control the first phase, but you can optimize the other three. Language choice matters enormously: Go and Rust cold start in under 100ms, Python in 200-500ms, Java in 1-5 seconds, and .NET in 500ms-2s. If cold start latency is critical, choose your runtime accordingly.
Minimizing Bundle Size
The fastest cold start is the one that doesn't happen. But when it does, smaller bundles mean faster downloads and initialization. For Node.js, use esbuild to tree-shake your dependencies and produce a single bundled file. We reduced our Lambda package from 45MB to 3MB and cut cold start time by 60%.
For Python, avoid importing large packages at the module level. Instead of a top-level `import pandas`, import it inside the handler so the cost is paid only by the invocations that actually need it. Lambda Layers can also help by keeping heavy dependencies out of your deployment package, though the layer contents still have to be loaded during a cold start.
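The lazy-import pattern looks like this (a minimal sketch; `json` stands in for a genuinely heavy dependency like pandas, and the `needs_report` flag is a hypothetical event field):

```python
def handler(event, context=None):
    # Fast path: no heavy dependency is touched, so cold starts
    # stay cheap for invocations that never hit the slow feature.
    if not event.get("needs_report"):
        return {"status": "ok"}

    # Slow path: pay the import cost only here. Python caches the
    # module after the first import, so later warm invocations on
    # this path don't pay it again.
    import json  # stand-in for a heavy package like pandas
    return {"status": "ok", "report": json.dumps(event["rows"])}
```

The trade-off is that the first request down the slow path absorbs the import latency instead of the cold start, which is often the right place for it.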
Keeping Functions Warm
The most reliable strategy is provisioned concurrency. You pay for a fixed number of pre-warmed instances that are always ready to serve requests. For our critical API endpoints, we maintain provisioned concurrency equal to our P50 concurrent invocations and let on-demand scaling handle spikes.
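The sizing rule above can be expressed directly: provision for the median observed concurrency and let on-demand scaling absorb everything beyond it. A sketch using boto3 (the function name and alias are hypothetical; the API call is shown commented out since it requires live AWS credentials):

```python
import statistics


def provisioned_level(concurrent_samples):
    # Provision for P50 observed concurrency, with a floor of one
    # instance so the function is never entirely cold.
    return max(1, round(statistics.median(concurrent_samples)))


# Applying it with boto3 (illustrative; "checkout-api" and the
# "live" alias are assumptions, not real resources):
#
# import boto3
# boto3.client("lambda").put_provisioned_concurrency_config(
#     FunctionName="checkout-api",
#     Qualifier="live",
#     ProvisionedConcurrentExecutions=provisioned_level(samples),
# )
```

Using the median rather than the mean keeps a traffic spike in the sample window from inflating the always-on (and always-billed) baseline.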
For less critical endpoints, a scheduled ping (an EventBridge rule, formerly CloudWatch Events, invoking the function every 5 minutes) keeps at least one instance warm. This doesn't help with burst traffic, but it eliminates cold starts for steady-state load.
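The handler side of the warming ping is a few lines: detect the scheduled payload and return before doing any real work. A sketch, where the `warmup` key is a convention between the schedule and the function, not a Lambda feature:

```python
def handler(event, context=None):
    # The scheduled rule sends {"warmup": true}; bail out early so
    # the ping costs almost nothing and skips downstream calls.
    if event.get("warmup"):
        return {"warmed": True}

    # Normal request handling continues below.
    return {"status": "handled", "path": event.get("path", "/")}
```

Configure the schedule's target input to include the sentinel key so warming pings never reach your business logic or downstream dependencies.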
The Larger Question
If cold starts are a significant problem for your workload, it's worth asking whether serverless is the right fit. A container running on Fargate or ECS has no cold starts and offers more predictable performance. The operational simplicity of serverless is valuable, but not if it comes at the cost of user experience.
We've moved our latency-sensitive endpoints to always-on containers while keeping batch processing and event handlers on Lambda. This hybrid approach gives us the best of both worlds: predictable latency for user-facing APIs and zero-ops scaling for background workloads.