Dual-Writes and the Outbox Pattern
Your daily reconciliation job flags a problem: dozens of orders exist in your database, but no corresponding events ever reached Kafka. The email service never sent confirmations. The inventory service never reserved stock. After some digging, you find the culprit: Kafka had a brief hiccup yesterday and returned errors for a few publish attempts. The database writes succeeded, but without the outbox pattern, those events were simply lost.
This is a common failure mode in event-driven systems, and if you build them long enough, chances are you'll run into it at some point.
The Dual-Write Problem
Here’s what typically happens. Your order service needs to do two things:
- Save the order to the database
- Publish an “order-created” event to Kafka
The straightforward approach looks like this:
```go
func CreateOrder(ctx context.Context, order Order) error {
	if err := db.SaveOrder(ctx, order); err != nil {
		return err
	}
	if err := kafka.Publish(ctx, "order-created", order); err != nil {
		// Database is updated, but the event failed.
		// Now what?
		return err
	}
	return nil
}
```

Most of the time this works fine. But then your service crashes between those two operations. Or the network blips. Or Kafka is temporarily unavailable. Now you have an order in your database, but the event never went out.
You might think: “Just publish to Kafka first, then save to the database.” But that has the opposite problem—if the database write fails, you’ve just told the world about an order that doesn’t exist.
The root issue is that you’re trying to make two separate systems act as one. Your database and Kafka don’t know about each other. There’s no way to make them commit together.
A Simple Solution
The transactional outbox pattern takes a different approach: instead of publishing directly to Kafka, you write the event to your database.
- Save the order to the database
- In the same transaction, save the event to an “outbox” table
- A separate process reads from the outbox and publishes to Kafka
```mermaid
flowchart LR
    Service[Order Service] -->|"Single Transaction"| DB[(Database)]
    DB --> Orders[orders]
    DB --> Outbox[outbox]
    Publisher[Publisher] -->|"Poll"| Outbox
    Publisher -->|"Publish"| Kafka[Kafka]
```
The insight here is that your database already knows how to make multiple writes atomic. By writing both the order and the event in the same transaction, they either both succeed or both fail. No more inconsistency.
The Kafka publishing happens later, asynchronously, by a separate publisher process. If it fails, it just retries. The event is safely stored in your database until it gets published.
How It Works
1. The Outbox Table
You need a table to store events alongside your business data:
```sql
CREATE TABLE outbox (
    id             VARCHAR PRIMARY KEY,
    aggregate_type VARCHAR NOT NULL,   -- the entity type, e.g. "order", "user", "payment"
    aggregate_id   VARCHAR NOT NULL,   -- the entity's ID, e.g. "order-123"
    event_type     VARCHAR NOT NULL,   -- what happened, e.g. "order-created", "order-cancelled"
    payload        JSON NOT NULL,      -- the event data
    created_at     TIMESTAMP NOT NULL, -- when the event was created (for ordering)
    published_at   TIMESTAMP           -- NULL = not yet published, timestamp = published
);
```

2. Write Both in One Transaction
When creating an order, you write to both tables atomically:
```go
func (s *OrderService) CreateOrder(ctx context.Context, order Order) error {
	tx, err := s.db.BeginTransaction(ctx)
	if err != nil {
		return err
	}
	defer tx.Rollback(ctx) // no-op once Commit has succeeded

	if err := tx.InsertOrder(ctx, order); err != nil {
		return err
	}

	event := OutboxEvent{
		ID:            uuid.New().String(),
		AggregateType: "order",
		AggregateID:   order.ID,
		EventType:     "order-created",
		Payload:       marshalJSON(order),
		CreatedAt:     time.Now(),
	}
	if err := tx.InsertOutboxEvent(ctx, event); err != nil {
		return err
	}

	return tx.Commit(ctx)
}
```

3. Publish from the Outbox
A background worker polls for unpublished events and sends them to Kafka:
```go
func (p *OutboxPublisher) publishPending(ctx context.Context) error {
	// e.g. SELECT * FROM outbox WHERE published_at IS NULL
	//      ORDER BY created_at LIMIT 100
	events, err := p.db.GetUnpublishedEvents(ctx, 100)
	if err != nil {
		return err
	}
	for _, event := range events {
		if err := p.kafka.Publish(ctx, event); err != nil {
			log.Printf("Failed to publish %s: %v", event.ID, err)
			continue // stays unpublished; retried on the next poll
		}
		if err := p.db.MarkEventPublished(ctx, event.ID); err != nil {
			// Published but not marked: the event will be sent again
			// on the next poll, so consumers must tolerate duplicates.
			log.Printf("Failed to mark %s published: %v", event.ID, err)
		}
	}
	return nil
}
```

The polling adds a small delay (typically 1-5 seconds), but you get reliability in return.
A Few Things to Keep in Mind
This pattern guarantees at-least-once delivery. If the publisher crashes after sending to Kafka but before marking the event as published, it will send the event again. Your consumers need to handle duplicates—track processed event IDs, use upserts, or design idempotent operations.
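One way to handle those duplicates is to track processed event IDs on the consumer side. A minimal in-memory sketch (in production the seen-set would live in the consumer's own database, ideally updated in the same transaction as the side effect; the `Dedup` type here is hypothetical):

```go
package main

import "sync"

// Dedup tracks processed event IDs so redelivered events become no-ops.
type Dedup struct {
	mu   sync.Mutex
	seen map[string]bool
}

func NewDedup() *Dedup {
	return &Dedup{seen: make(map[string]bool)}
}

// Process runs handle once per event ID. A duplicate delivery
// returns false and does nothing.
func (d *Dedup) Process(eventID string, handle func()) bool {
	d.mu.Lock()
	defer d.mu.Unlock()
	if d.seen[eventID] {
		return false
	}
	handle()
	d.seen[eventID] = true
	return true
}
```

An in-memory set is lost on restart, which is exactly why real consumers persist it; the structure of the check stays the same.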
When not to use it: If losing an occasional event is acceptable (analytics, logging), publishing directly and logging failures might be enough. The outbox adds complexity—only use it when consistency actually matters.
The transactional outbox is a simple idea: if you can’t make two systems atomic, don’t try. Write everything to one system and let a separate process handle the rest. It’s not glamorous, but it works—and at 2am, that’s what counts.