
Why we built c28n: the 2am incident

Every data engineer has a story. Ours involved a silent timestamp format change in a Kafka consumer, 40GB of dropped events, and a very cold pizza.

There is always a moment when data you trusted betrays you. A validation rule that didn’t fire. A schema that drifted unnoticed. For us, it was a Tuesday.

The Setup

We had a Kafka consumer processing events from a third-party API. The schema was simple: user interactions with timestamps. We’d validated the format, written tests, deployed to production. Everything worked.

Until it didn’t.

What Went Wrong

Somewhere between version 2.1 and 2.3 of their API, they changed the timestamp field. Instead of ISO 8601 strings, they started sending Unix timestamps — sometimes. Their “smart” serializer would choose the format based on the client’s Accept header.

Our consumer? It assumed strings. Always strings. When it hit an integer timestamp, it didn’t fail gracefully. It logged an error, skipped the event, and kept consuming.
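The failure mode above — a field that is sometimes an ISO 8601 string and sometimes a Unix integer — can be guarded against with a parser that accepts both known formats but refuses anything else, so a new format change fails loudly instead of being skipped. A minimal sketch in Python (the function name and the `Z`-suffix handling are illustrative, not our production code):

```python
from datetime import datetime, timezone

def parse_event_timestamp(value):
    """Accept an ISO 8601 string or a Unix timestamp (int/float).

    Raises ValueError for any other shape, so a format change
    surfaces as a visible failure rather than a silently skipped event.
    """
    if isinstance(value, str):
        # fromisoformat handles "2024-05-01T12:00:00+00:00"; normalize a
        # trailing "Z", which older Python versions do not accept.
        return datetime.fromisoformat(value.replace("Z", "+00:00"))
    if isinstance(value, (int, float)) and not isinstance(value, bool):
        return datetime.fromtimestamp(value, tz=timezone.utc)
    raise ValueError(f"unrecognized timestamp type: {type(value).__name__}")
```

Both representations of the same instant parse to an equal `datetime`, and anything unexpected — a `None`, a dict, a new "smart" encoding — raises immediately instead of disappearing into an error log.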

By the time we noticed, we’d lost 40GB of events. The cold pizza was a separate tragedy.

The Lesson

Schema drift is inevitable. External APIs change. Internal services evolve. What you validated yesterday might not match what you receive tomorrow.

You can’t prevent drift. But you can detect it. You can surface it before it becomes data loss. You can make schema changes observable, debuggable, and survivable.
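One way to make drift observable — sketched here in plain Python as an illustration of the idea, not c28n’s actual API — is to compare each incoming event’s field types against a recorded baseline and report every mismatch, instead of deciding per-event whether to skip:

```python
def detect_drift(baseline: dict, event: dict) -> list:
    """Compare an event's field types against a baseline schema
    ({field_name: type_name}) and return human-readable drift findings."""
    findings = []
    for field, expected in baseline.items():
        if field not in event:
            findings.append(f"{field}: missing (expected {expected})")
            continue
        observed = type(event[field]).__name__
        if observed != expected:
            findings.append(f"{field}: type drifted {expected} -> {observed}")
    # Fields the baseline has never seen are drift too.
    for field in event.keys() - baseline.keys():
        findings.append(f"{field}: unexpected new field")
    return findings
```

Run against our incident’s shape, `detect_drift({"timestamp": "str"}, {"timestamp": 1714564800})` reports the `str -> int` drift on the first bad event — hours before 40GB has gone missing.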

That’s why we built c28n. Not to prevent this from happening — but to make sure we never lose 40GB of data to a silent schema mismatch again.