TL;DR: How can I detect field names that contain dots under one specific subfield and convert those dots to underscores, but ONLY in the fields under that subfield?
Detail:
We use Filebeat and Logstash to feed messages to an Elastic Stack, all pieces of which are running version 6.4.3. Our Kubernetes deployments are a mixture of home-grown apps and general community apps.
When we first started with Kube a couple of years ago, we standardized our Kube deployment templates on a label "app" whose value is a text string that differs per application. However, modern Kube app deployments are standardizing on labels like:
metadata.labels."app.kubernetes.io/foo"="bar"
The dynamic discovery of this pod and its labels by Filebeat results in the following field in the document:
kubernetes.pod.labels.app.kubernetes.io/foo=bar
This is a problem for us when the final Logstash tries to insert the JSON document into ES: ES sees "kubernetes.pod.labels.app.kubernetes.io/foo" and tries to create an object under "app", but we have already defined "app" as a string. To be clear, the data coming out of Logstash is, I think:
[kubernetes][pod][labels][app.kubernetes.io/foo] = "bar"
but ES treats the dots as field name separators, so it tries to create an object and index the value at:
[kubernetes][pod][labels][app][kubernetes][io/foo] = "bar"
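To make the conflict concrete, here is a small plain-Ruby sketch of the dot expansion as I understand it. It is not Elasticsearch code; `expand_dotted` and `mapping_conflict` are made-up helper names, and "my-app" is just an example label value.

```ruby
# Expand a dotted field name into nested hashes, mimicking how dots in a
# field name are read as object paths. (Hypothetical helper, illustration only.)
def expand_dotted(name, value)
  name.split(".").reverse.reduce(value) { |inner, key| { key => inner } }
end

# Return the dotted path at which two documents disagree about
# "plain value" versus "object", or nil if they are compatible.
def mapping_conflict(a, b, path = [])
  a.each do |key, a_val|
    next unless b.key?(key)
    b_val = b[key]
    here = path + [key]
    if a_val.is_a?(Hash) && b_val.is_a?(Hash)
      found = mapping_conflict(a_val, b_val, here)
      return found if found
    elsif a_val.is_a?(Hash) != b_val.is_a?(Hash)
      return here.join(".")
    end
  end
  nil
end

# Our older pods: "app" is a plain string.
old_doc = expand_dotted("kubernetes.pod.labels.app", "my-app")
# Newer community pods: the dotted label turns "app" into an object.
new_doc = expand_dotted("kubernetes.pod.labels.app.kubernetes.io/foo", "bar")

puts mapping_conflict(old_doc, new_doc)
```

The reported path is kubernetes.pod.labels.app, which is exactly the field that is a string in one document and an object in the other.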
In a different part of the company, a different team has an ELK stack and uses fluentd to get container logs. Something in their config is automatically "dedot"ing the label portion of the field, so what gets inserted into their ES cluster is:
kubernetes.pod.labels.app_kubernetes_io/foo=bar
This would be an ideal solution for us, but I have been struggling to find a way to detect that field (and all fields under kubernetes.pod.labels.*) with dots in its name and convert the dots to underscores before Logstash tries to insert the JSON document into ES.
I can't run dedot() on it, even when passing in the [kubernetes][pod][labels] prefix, because dedot doesn't let me specify the level at which to start de-dotting. It converts the whole path into a single large field, which is not what we want. It ends up looking like this:
kubernetes_pod_labels_app_kubernetes_io/foo
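The difference between the two behaviours can be sketched in plain Ruby. This only mimics the results described above; it is not how dedot or fluentd is actually implemented, and `flatten_all` and `dedot_keys_at` are hypothetical names.

```ruby
# Collapse the whole nested document into single underscore-joined keys
# (the undesired result shown above).
def flatten_all(hash, prefix = nil)
  hash.each_with_object({}) do |(key, value), out|
    name = [prefix, key.gsub(".", "_")].compact.join("_")
    if value.is_a?(Hash)
      out.merge!(flatten_all(value, name))
    else
      out[name] = value
    end
  end
end

# Rename only the keys of the hash found at `path`, leaving the nesting intact
# (the result the other team gets). Modifies `doc` in place.
def dedot_keys_at(doc, path)
  target = doc.dig(*path)
  return doc unless target.is_a?(Hash)
  target.keys.each do |key|
    target[key.gsub(".", "_")] = target.delete(key) if key.include?(".")
  end
  doc
end

doc = { "kubernetes" => { "pod" => { "labels" => { "app.kubernetes.io/foo" => "bar" } } } }

p flatten_all(doc)                               # one flat top-level key
p dedot_keys_at(doc, %w[kubernetes pod labels])  # nesting kept, label key renamed
```

The first call yields the single key kubernetes_pod_labels_app_kubernetes_io/foo, while the second keeps [kubernetes][pod][labels] as an object and only renames the label to app_kubernetes_io/foo.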
I strongly suspect that I will need to use some kind of inline ruby to do this, kind of like what is shown in: https://stackoverflow.com/a/37617214/611911
My thinking is to loop over [kubernetes][pod][labels] and, in the inline Ruby, check each label:
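labels = event.get("[kubernetes][pod][labels]")
if labels.is_a?(Hash)
  labels.keys.each do |label|
    # if label.include?(".") # Would this work as expected?
    if label =~ /\./ # Example from the Stack Overflow post
      # Rename the key inside the labels hash, not at the top level of the event
      labels[label.gsub(".", "_")] = labels.delete(label)
    end
  end
  event.set("[kubernetes][pod][labels]", labels)
end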
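To try the idea outside Logstash, here is a sketch in the shape of a ruby-filter script file (a `filter(event)` method that returns a list of events), assuming the installed ruby filter version supports script files. `FakeEvent` is a hypothetical stand-in I made up so the code runs in plain Ruby; it only imitates `get` and `set` with bracketed field references and is not the real Logstash event class.

```ruby
# Hypothetical stand-in for the Logstash event, for running this outside Logstash.
# Supports only get/set with "[a][b][c]"-style field references.
class FakeEvent
  def initialize(data)
    @data = data
  end

  def get(ref)
    keys(ref).reduce(@data) { |node, key| node.is_a?(Hash) ? node[key] : nil }
  end

  def set(ref, value)
    *parents, last = keys(ref)
    node = parents.reduce(@data) { |n, key| n[key] ||= {} }
    node[last] = value
  end

  private

  def keys(ref)
    ref.scan(/\[([^\]]+)\]/).flatten
  end
end

# Replace dots with underscores in the label names only, then write the
# rebuilt hash back to the same place in the event.
def filter(event)
  labels = event.get("[kubernetes][pod][labels]")
  if labels.is_a?(Hash)
    renamed = labels.each_with_object({}) do |(key, value), out|
      out[key.gsub(".", "_")] = value
    end
    event.set("[kubernetes][pod][labels]", renamed)
  end
  [event]
end

data = {
  "kubernetes" => {
    "pod" => { "labels" => { "app" => "my-app", "app.kubernetes.io/foo" => "bar" } }
  }
}
event = FakeEvent.new(data)
filter(event)
p event.get("[kubernetes][pod][labels]")
```

Building a new hash instead of renaming keys one at a time through field references avoids relying on how a reference with dots inside the brackets gets parsed. Events without [kubernetes][pod][labels] pass through unchanged, and the plain "app" label stays a string next to app_kubernetes_io/foo.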
Has anybody done this before, whether in Filebeat, in Logstash, or in Logstash with inline Ruby? I'd appreciate a hint (or more!) or an example.