Every serverless platform has a dirty secret: cold starts. When a function hasn't been invoked recently, the platform needs to allocate a container, download your code, initialize the runtime, and run your initialization logic before it can serve the first request. This can add anywhere from 100ms to several seconds of latency.
For background jobs and async processing, cold starts rarely matter. But for synchronous API endpoints that users are waiting on, a 2-second cold start is the difference between a snappy experience and a frustrated customer.
Anatomy of a Cold Start
A cold start has four phases: container allocation (50-200ms, controlled by the platform), runtime initialization (50-500ms, depends on your language), dependency loading (100ms-2s, depends on your bundle size), and handler initialization (variable, depends on your code).
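The split between the last two phases is easy to see in a handler's own structure: module-level code runs once, during the cold start's handler-initialization phase, while the handler body runs on every invocation. A minimal sketch (the config values are hypothetical placeholders for real setup like SDK clients or connection pools):

```python
import time

# Module-level code runs once per container, during the cold
# start's handler-initialization phase.
_INIT_STARTED = time.perf_counter()

# Hypothetical expensive setup: clients, config, caches.
CONFIG = {"table": "orders"}

INIT_SECONDS = time.perf_counter() - _INIT_STARTED


def handler(event, context=None):
    # Only this body runs on warm invocations, so work hoisted to
    # module level is amortized across all requests the container serves.
    return {"init_seconds": INIT_SECONDS, "table": CONFIG["table"]}
```

Anything you can afford to do once (and reuse) belongs above the handler; anything request-specific belongs inside it.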
You can't control the first phase, but you can optimize the other three. Language choice matters enormously: Go and Rust cold start in under 100ms, Python in 200-500ms, Java in 1-5 seconds, and .NET in 500ms-2s. If cold start latency is critical, choose your runtime accordingly.
Minimizing Bundle Size
The fastest cold start is the one that doesn't happen. But when it does, smaller bundles mean faster downloads and initialization. For Node.js, use esbuild to tree-shake your dependencies and produce a single bundled file. We reduced our Lambda package from 45MB to 3MB and cut cold start time by 60%.
For Python, avoid importing large packages at the module level. Instead of a top-level `import pandas`, import it inside the handler so the cost is paid only by the invocations that actually need it. Lambda Layers can also help by keeping heavy dependencies out of your deployment package, though the layer contents still have to be loaded during a cold start.
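The lazy-import pattern looks like this (a minimal sketch; `json` stands in for a genuinely heavy dependency like pandas, and the `needs_report` flag is a hypothetical event field):

```python
def handler(event, context=None):
    # Fast path: no heavy dependency is touched, so cold starts
    # stay cheap for invocations that never hit the slow feature.
    if not event.get("needs_report"):
        return {"status": "ok"}

    # Slow path: pay the import cost only here. Python caches the
    # module after the first import, so later warm invocations on
    # this path don't pay it again.
    import json  # stand-in for a heavy package like pandas
    return {"status": "ok", "report": json.dumps(event["rows"])}
```

The trade-off is that the first request down the slow path absorbs the import latency instead of the cold start, which is often the right place for it.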
Keeping Functions Warm
The most reliable strategy is provisioned concurrency. You pay for a fixed number of pre-warmed instances that are always ready to serve requests. For our critical API endpoints, we maintain provisioned concurrency equal to our P50 concurrent invocations and let on-demand scaling handle spikes.
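The sizing rule above can be expressed directly: provision for the median observed concurrency and let on-demand scaling absorb everything beyond it. A sketch using boto3 (the function name and alias are hypothetical; the API call is shown commented out since it requires live AWS credentials):

```python
import statistics


def provisioned_level(concurrent_samples):
    # Provision for P50 observed concurrency, with a floor of one
    # instance so the function is never entirely cold.
    return max(1, round(statistics.median(concurrent_samples)))


# Applying it with boto3 (illustrative; "checkout-api" and the
# "live" alias are assumptions, not real resources):
#
# import boto3
# boto3.client("lambda").put_provisioned_concurrency_config(
#     FunctionName="checkout-api",
#     Qualifier="live",
#     ProvisionedConcurrentExecutions=provisioned_level(samples),
# )
```

Using the median rather than the mean keeps a traffic spike in the sample window from inflating the always-on (and always-billed) baseline.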
For less critical endpoints, a scheduled ping (an EventBridge rule, formerly CloudWatch Events, invoking the function every 5 minutes) keeps at least one instance warm. This doesn't help with burst traffic, but it eliminates cold starts for steady-state load.
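The handler side of the warming ping is a few lines: detect the scheduled payload and return before doing any real work. A sketch, where the `warmup` key is a convention between the schedule and the function, not a Lambda feature:

```python
def handler(event, context=None):
    # The scheduled rule sends {"warmup": true}; bail out early so
    # the ping costs almost nothing and skips downstream calls.
    if event.get("warmup"):
        return {"warmed": True}

    # Normal request handling continues below.
    return {"status": "handled", "path": event.get("path", "/")}
```

Configure the schedule's target input to include the sentinel key so warming pings never reach your business logic or downstream dependencies.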
The Larger Question
If cold starts are a significant problem for your workload, it's worth asking whether serverless is the right fit. A container running on Fargate or ECS has no cold starts and offers more predictable performance. The operational simplicity of serverless is valuable, but not if it comes at the cost of user experience.
We've moved our latency-sensitive endpoints to always-on containers while keeping batch processing and event handlers on Lambda. This hybrid approach gives us the best of both worlds: predictable latency for user-facing APIs and zero-ops scaling for background workloads.