ENGINEERING · 6 min read · Feb 5, 2026

Event-Driven Architecture: How Kafka Powers PLATFORMA

30+ Kafka topics connect 8 microservices. Here's why we chose event-driven architecture and the patterns that make it work at scale.

Stefan Nikolić

Senior Backend Engineer

When we started building PLATFORMA, we had a choice: synchronous REST calls between services, or asynchronous event-driven communication. We chose events. Two years later, it's the best architectural decision we made.

Why Events

The core argument for events is decoupling. When a customer places an order, the order service doesn't need to know that it should trigger provisioning, billing, notifications, and audit logging. It publishes an `order.created` event, and every interested service reacts independently.

This matters enormously for reliability. If the notification service is down, orders still process. The notification will be sent when the service recovers and catches up on the Kafka topic. In a synchronous architecture, a failing notification service would block the entire order flow.
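The decoupling and fault isolation described above can be sketched with a minimal in-memory event bus. This is an illustrative stand-in for a Kafka topic, not PLATFORMA's actual code: the point is that each subscriber reacts independently, and one failing handler doesn't block the others.

```python
from collections import defaultdict

class EventBus:
    """Minimal in-memory stand-in for a Kafka topic: producers publish,
    subscribers react independently, and one failing handler does not
    block the rest of the flow."""

    def __init__(self):
        self._handlers = defaultdict(list)

    def subscribe(self, event_type, handler):
        self._handlers[event_type].append(handler)

    def publish(self, event_type, payload):
        results = {}
        for handler in self._handlers[event_type]:
            try:
                results[handler.__name__] = handler(payload)
            except Exception as exc:
                # With real Kafka, the failed consumer's offset would not
                # advance; the event is retried when the service recovers.
                results[handler.__name__] = f"failed: {exc}"
        return results
```

If the `notify` handler throws, `provision` and `bill` still run and the order completes, which is exactly the behavior a synchronous REST chain cannot give you.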

The Event Catalog

We have 30+ event types organized by domain. Order events: `order.created`, `order.confirmed`, `order.cancelled`, `order.completed`. Provisioning events: `provisioning.started`, `provisioning.progress`, `provisioning.completed`, `provisioning.failed`. Billing events: `invoice.generated`, `payment.received`, `payment.failed`.

[Diagram: services publish to and consume from the Kafka event bus. Stats: 30+ Kafka topics · 8 services · 0 data loss.]

Every event has a versioned schema. We use JSON Schema validation on both producers and consumers. Schema evolution follows strict rules: new fields can be added (with defaults), existing fields cannot be removed or renamed, and type changes require a new event version.
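The evolution rules above can be expressed as a compatibility check. This is a simplified sketch over a JSON-Schema-style `properties` map, not the validation PLATFORMA actually runs, but it encodes the same three rules: additions need defaults, no removals or renames, no type changes.

```python
def is_backward_compatible(old_schema: dict, new_schema: dict) -> bool:
    """Check evolution rules between two JSON-Schema-style dicts of the
    form {"properties": {field: {"type": ..., "default": ...}}}."""
    old_props = old_schema.get("properties", {})
    new_props = new_schema.get("properties", {})

    for name, spec in old_props.items():
        # Existing fields cannot be removed or renamed.
        if name not in new_props:
            return False
        # Type changes require a new event version, not an in-place edit.
        if new_props[name].get("type") != spec.get("type"):
            return False

    for name, spec in new_props.items():
        # New fields are allowed, but only with a default, so old
        # consumers can still deserialize new events.
        if name not in old_props and "default" not in spec:
            return False

    return True
```

Running a check like this in CI on every schema change turns "strict rules" from a convention into a gate.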

The Saga Pattern

Order processing is a distributed transaction that spans multiple services. The customer orders a VPS → provisioning creates the VM → billing generates the invoice → notification sends credentials. If provisioning fails, the order should be cancelled and the customer refunded.

We implement this using the saga pattern: each service publishes events about its success or failure, and a saga orchestrator tracks the overall state. If a step fails, compensating events are published to undo previous steps. The entire flow is visible in the event log — you can replay any order and see exactly what happened at each stage.
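A saga orchestrator for the VPS flow can be sketched as a small state machine. The step and compensation event names here are illustrative (e.g. `provisioning.teardown` and `invoice.voided` are hypothetical), but the shape matches the pattern: track completed steps, and on failure emit compensating events in reverse order.

```python
# Each forward step maps to a compensating event that undoes it.
# Names below are illustrative, not PLATFORMA's actual event catalog.
COMPENSATING = {
    "order.created": "order.cancelled",
    "provisioning.completed": "provisioning.teardown",  # hypothetical
    "invoice.generated": "invoice.voided",              # hypothetical
}

class OrderSaga:
    """Tracks one order's progress through the distributed transaction."""

    def __init__(self, order_id):
        self.order_id = order_id
        self.completed = []
        self.status = "in_progress"

    def on_success(self, step):
        self.completed.append(step)
        if step == "invoice.generated":
            self.status = "completed"

    def on_failure(self, failed_step):
        # Undo every step that already succeeded, newest first,
        # by publishing its compensating event.
        self.status = "compensating"
        return [COMPENSATING[s] for s in reversed(self.completed)]
```

Because the orchestrator is itself driven by events, its state can be rebuilt by replaying the topic, which is what makes the "replay any order" debugging workflow possible.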

Consumer Groups and Scaling

Each service runs as a Kafka consumer group. When we need to scale the provisioning service (because there's a queue of orders), we add more instances to the consumer group. Kafka automatically rebalances partitions across instances. Scaling is horizontal and seamless.

We partition order events by tenant ID. This guarantees that all events for a single tenant are processed in order by the same consumer instance. No race conditions, no out-of-order processing.
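The ordering guarantee comes from deterministic key-to-partition mapping. Kafka's default partitioner hashes the key bytes with murmur2; the sketch below uses SHA-256 instead, purely to illustrate the principle: same tenant ID, same partition, same consumer, strict per-tenant ordering.

```python
import hashlib

def partition_for(tenant_id: str, num_partitions: int) -> int:
    """Deterministically map a tenant ID to a partition. Kafka's default
    partitioner uses murmur2 on the key bytes; SHA-256 is used here only
    to keep the sketch dependency-free."""
    digest = hashlib.sha256(tenant_id.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_partitions
```

The corollary is worth noting: if you change `num_partitions`, keys remap, so pick a partition count with headroom up front.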

Lessons Learned

Event-driven architecture is not simple. Debugging is harder — you can't just follow a request through a call stack. You have to trace events across topics and services. We built extensive tooling: a Kafka event viewer in our admin platform, correlation IDs on every event, and distributed tracing with OpenTelemetry.
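The correlation-ID scheme mentioned above is worth showing concretely. This is a generic sketch of the common correlation/causation pattern, not PLATFORMA's exact envelope: a root event mints a fresh `correlation_id`, and every downstream event inherits it while recording which event caused it.

```python
import uuid

def with_correlation(payload, parent=None):
    """Attach tracing metadata to an event envelope. Root events get a
    fresh correlation_id; events caused by another event inherit it, so
    an entire saga can be stitched together in an event viewer."""
    event = dict(payload)
    event["event_id"] = str(uuid.uuid4())
    event["correlation_id"] = (
        parent["correlation_id"] if parent else str(uuid.uuid4())
    )
    # causation_id points at the immediate parent, giving you the edge
    # list of the event graph, not just its connected component.
    event["causation_id"] = parent["event_id"] if parent else None
    return event
```

Filtering a topic viewer by `correlation_id` then recovers the call-stack view that synchronous architectures get for free.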

But the benefits — reliability, scalability, and decoupling — are worth the complexity. We've had zero data loss in 18 months of production, and adding new services is as easy as subscribing to existing topics.