Architecture check: Offloading Logstash gsub regex to K8s pod sidecars to save CPU?

Ilya_Ploskovitov · April 7, 2026, 1:43pm

Hi everyone,

We’ve been dealing with a classic pipeline bottleneck: our Logstash nodes are burning massive amounts of CPU because we run dozens of mutate gsub filters to scrub PII and secrets (emails, Stripe tokens, API keys) before indexing into Elasticsearch.

Scaling Logstash just to handle regex processing is getting too expensive, and maintaining the regex patterns for new token formats is a nightmare.

I decided to shift this left and offload it to the edge. I wrote a lightweight Go sidecar for our K8s pods that intercepts the stdout log stream. Instead of relying purely on regex, it calculates Shannon entropy on the fly to detect random-looking API keys and replaces them with deterministic HMAC hashes (e.g., [HIDDEN:e9f1a2]). This happens before Filebeat or Fluent Bit even picks the logs up.

The Logstash pipeline is now almost empty, and the node CPU dropped drastically. I open-sourced the tool here if anyone wants to see the implementation: https://github.com/aragossa/pii-shield

Has anyone else moved away from central Logstash sanitization to an edge-sanitization architecture? Are there any hidden pitfalls with Elasticsearch indexing when doing deterministic hashing at the pod level?

Would appreciate any architecture feedback or roasting of the code!
