
Oops, I Wrote a Compiler (While Trying to Cut Logging Costs)

💡 Want to know how it really works?
Check out the full technical case study here →

Anyone here using Datadog?

It’s a wonderful tool that can quickly become expensive. My client approached me to reduce their logging costs by removing unnecessary sources; the company was feeding logs to Datadog from almost every source it had.

Costs were horrible. Consider 5 TB of logs. No, not monthly. Daily. That gives roughly 150 TB a month, or about 1.8 PB a year. Petabytes.

For those who like mathematical notation, it’s:

1.8 PB = 1.8 × 10^3 TB = 1.8 × 10^6 GB = 1.8 × 10^9 MB

So, we’d need roughly 1.25 billion floppy disks (remember those?) every year to store it. Aligned next to each other, they’d create a road about 3,000 km long — roughly the distance from Poland to Barcelona and back. That’s roughly 8,000 standard 256 GiB SSDs — every year. That’s around 600 million iPhone photos — one photo per second for 19 years straight. Or 383,000 DVD discs. Stack them on top of each other and you’d get a tower almost 460 meters high — about 1.5× the Eiffel Tower, and slightly higher than the Empire State Building. Not quite Burj Khalifa yet, but… give them two years. That’s also about 900,000 VHS tapes — or five fully packed TIR trucks.

After previous attempts at tuning indexing and reducing ingest volume had failed to bring meaningful savings, I was asked to step in and tackle the problem head-on. At first, the brief was simply to write something that would stop the bleeding.

Why not write something new

But I kept thinking — weren’t there better options? Surely someone had already solved this kind of problem.

I’ve written custom software before — for telecom data at much higher volumes, including proprietary diagnostic protocols from Nokia and Ericsson. But this case felt… familiar. Surely, others must have faced the same issue too.

Someone on the team had previously proposed using Vector. A PoC was even created — though it was later abandoned, mostly because it wasn’t seen as “user-friendly enough” by some of the non-technical stakeholders responsible for managing the Datadog tenant.

To be fair, it wasn’t really about the tool’s capabilities — more about the mental overhead required to configure and operate it effectively. Spoiler: it’s actually quite simple once you understand the model. Since it had already been evaluated, and its developer had recently been acquired by Datadog, we decided to give it another try. We developed a few test pipelines, altered something, fixed something, added this, removed that. But it was still not ready… Or… at least I didn’t like it.

The goal

Why wasn’t the tool used before? Because it wasn’t user- or developer-friendly. And someone would have to maintain it. Basically, every developer would need to learn yet another tool, its configuration, its quirks.

“Okay, okay, I’ll think about it. How to do it right,” I said.

So I got admin permissions to Datadog and looked at the settings, searching for something… else. I was about to tune something, and then it hit me: yes! The pipelines!

So I started digging into whether I could somehow map Datadog’s pipeline behavior onto a Vector configuration file. Nah, not really.

But how would I do it?

Inspired by the Datadog admin panel, I decided to create “my own configuration file”. It looked like this:


```yaml
- name: service-that-trashes-logs
  condition: .service == "service-0"
  action: drop
- name: service-that-is-useful
  condition: .service == "service-1"
  action: accept
```

That would be a clean, readable config file. Not much to learn, but offering a lot of flexibility. And a configuration change would simply be a merge request. How to do it? How to template it? How to make it scalable?

With the best templating system I knew at the time: Helm. Why not create Kubernetes deployments?

Okay, we had the tool, we had the “language”. Now, how to make this happen? We needed to somehow convert those rules into our system. I dug into the Vector documentation and found a transform called `route` (the router).
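
To make that concrete, here’s roughly what a `route` transform looks like in a Vector config. A minimal sketch, not our production setup: the source name, the sink name and the `DD_API_KEY` variable are illustrative assumptions.

```yaml
# Minimal sketch of a Vector config using the route transform.
# Component names and the API key variable are illustrative, not the real setup.
sources:
  datadog_agent:
    type: datadog_agent          # assumes logs arrive from the Datadog Agent
    address: "0.0.0.0:8080"

transforms:
  route:
    type: route                  # splits the stream into named routes
    inputs: ["datadog_agent"]
    route:
      service-that-trashes-logs: '.service == "service-0"'
      service-that-is-useful: '.service == "service-1"'

sinks:
  datadog:
    type: datadog_logs
    # only the "useful" route is forwarded; the "trash" route is simply never consumed
    inputs: ["route.service-that-is-useful"]
    default_api_key: "${DD_API_KEY}"
```

As far as I recall, events that match none of the conditions land on a separate `_unmatched` output, which becomes the natural place to hang a default policy.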

I started creating a Helm chart that did the following:

  • translation of our config file into Vector’s configuration format.

The configuration above created:

- a router transform
- routes named after the rules, e.g. `route.service-that-trashes-logs`
- a connection to the Datadog sink for every pipeline whose action was `accept`
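
The actual chart is beside the point here, but the idea can be sketched in a few lines. Assuming the rules from our config file land under `.Values.pipelines`, a template loops over them, emits one route per rule, and wires only the `accept` routes into the Datadog sink:

```yaml
# templates/vector-config.yaml (hypothetical sketch, not the real chart).
# Assumes the rules file is passed in as .Values.pipelines.
apiVersion: v1
kind: ConfigMap
metadata:
  name: vector-config
data:
  vector.yaml: |
    transforms:
      route:
        type: route
        inputs: ["datadog_agent"]
        route:
{{- range .Values.pipelines }}
          {{ .name }}: '{{ .condition }}'
{{- end }}
    sinks:
      datadog:
        type: datadog_logs
        inputs:
{{- range .Values.pipelines }}
{{- if eq .action "accept" }}
          - "route.{{ .name }}"
{{- end }}
{{- end }}
        default_api_key: "${DD_API_KEY}"
```

Rendering the chart turns the human-readable rules into a full Vector config: adding or dropping a service is a one-line change in the rules file, reviewed in a merge request.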

Wow. Did I just accidentally write a compiler in Helm?

And that’s how it started. A YAML file. A Helm chart. A weekend of tinkering.

So yeah, I guess technically I’ve written a compiler. Not one that emits machine code — but one that turns intentions into config. Close enough 😄

In the next part: How I made it scale, survive production, and not eat itself alive. 🚀