Send k8s logs with Filebeat to separate indices

Hi :slight_smile: Currently I'm using a self-hosted elastic stack but evaluating elastic cloud.

I collect all the logs from multiple k8s clusters in a standard way with the filebeat kubernetes module. There are application and nginx logs. For now I have logstash as filebeat's output. There I match logs with grok, tag them, and send them to different indices, e.g. filebeat-nginx-*, filebeat-app-*, filebeat-other-*. I'd like to drop logstash and stop worrying about managing it, if there's a way to filter those logs the same way with filebeat itself or with ingest node pipelines.

I thought about using `indices` in the filebeat config along with some processors, but it's hard to do proper filtering on the contents of a message with when.contains. There's also when.regexp, but I'd like to ask whether anyone has had the same problem and solved it in an elegant way. Maybe my approach to indexing should be completely different? How do you handle logs from multiple k8s clusters, where every log goes to a common place (stdout, stderr)?
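
For context, this is roughly the `indices` setup I was experimenting with (the host is a placeholder; conditions on metadata fields like kubernetes.container.name work fine, it's matching on message contents that gets ugly):

    output.elasticsearch:
      hosts: ["https://my-deployment.es.example.com:9243"]  # placeholder
      # conditions are checked in order; the bare `index` below is the fallback
      indices:
        - index: "filebeat-nginx-%{+yyyy.MM.dd}"
          when.contains:
            kubernetes.container.name: "nginx"
        - index: "filebeat-app-%{+yyyy.MM.dd}"
          when.regexp:
            kubernetes.container.name: "^app-.*"
      index: "filebeat-other-%{+yyyy.MM.dd}"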

I'm running filebeat as a daemonset on kubernetes using the helm3 stable/filebeat chart. If you're fine with that setup, you can include k8s metadata in your index names pretty easily. First you'll need to enable the add_kubernetes_metadata processor (https://www.elastic.co/guide/en/beats/filebeat/6.8/add-kubernetes-metadata.html).

    processors:
    - add_kubernetes_metadata:
        in_cluster: true

From there, slap a label on your pods; I'll use an arbitrary example of a label called "index-suffix". Then, in the index specification in your filebeat config, you can have something like:

    ...
    index: "filebeat-%{[kubernetes.pod.labels.index-suffix]}-MoreIndexSuffixes"
    ...

I'd recommend you set a fallback for this value though, using the format string's default syntax. Consider instead:

    ...
    index: "filebeat-%{[kubernetes.pod.labels.index-suffix]:Default}-MoreIndexSuffixes"
    ...

This will use the "index-suffix" label for a given pod if it can find it, and the literal string "Default" if it can't.
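
Putting the pieces together, a minimal filebeat.yml fragment might look like this (the host is a placeholder; also note that once you override `index`, filebeat requires setup.template.name and setup.template.pattern to be set as well):

    processors:
      - add_kubernetes_metadata:
          in_cluster: true

    # required whenever the default index name is overridden
    setup.template.name: "filebeat"
    setup.template.pattern: "filebeat-*"

    output.elasticsearch:
      hosts: ["https://my-deployment.es.example.com:9243"]  # placeholder
      index: "filebeat-%{[kubernetes.pod.labels.index-suffix]:Default}-%{+yyyy.MM.dd}"

On the pod side, the only thing needed is the label itself, e.g. `metadata.labels.index-suffix: nginx`.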

Be warned: I've seen some weird interactions with pods that have just started up if you're using the `container` filebeat input, and presumably you'll have the same issue if you're directly watching a specific directory. The problem I ran into was that filebeat didn't always have access to a freshly started pod's metadata, but it did have immediate access to the log file, so the first few lines of each log would show up in the default index.

My fix was to use filebeat's kubernetes autodiscover provider (https://www.elastic.co/guide/en/beats/filebeat/current/configuration-autodiscover.html#_kubernetes) so we rely on k8s to tell us when and where to look for logs.
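
A minimal sketch of that autodiscover setup, assuming the standard container log path on the node:

    filebeat.autodiscover:
      providers:
        - type: kubernetes
          templates:
            - config:
                - type: container
                  # autodiscover only starts this input once the pod's
                  # metadata is known, so the first log lines aren't
                  # picked up before the labels are available
                  paths:
                    - /var/log/containers/*${data.kubernetes.container.id}.log

With this in place, the index format string with the label default works reliably even for freshly scheduled pods.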