We're currently running a setup that works well for indexing nginx sidecar logs using filebeat. It consists of one filebeat container per node, and in our app deployments we add annotations such as
annotations:
  co.elastic.logs.nginx/enabled: "true"
(we make sure to actually name the nginx sidecar "nginx").
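For context, here's a minimal sketch of where that annotation lives in one of our deployments' pod templates; everything except the annotation key is illustrative:

apiVersion: apps/v1
kind: Deployment
metadata:
  name: my-app                  # hypothetical name
spec:
  selector:
    matchLabels:
      app: my-app
  template:
    metadata:
      labels:
        app: my-app
      annotations:
        # hint consumed by filebeat's kubernetes autodiscover provider
        co.elastic.logs.nginx/enabled: "true"
    spec:
      containers:
        - name: app             # main container; no hint, so currently not collected
          image: my-app:latest  # hypothetical image
        - name: nginx           # sidecar; name must match the hint suffix above
          image: my-nginx:latest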
This is the filebeat config we're running:
filebeat.autodiscover:
  providers:
    - type: kubernetes
      hints.enabled: true
      hints.default_config.enabled: false
      include_annotations: ["logtype"]
      include_labels: ["app"]
      labels.dedot: true
      annotations.dedot: true

processors:
  - decode_json_fields:
      fields: ["message"]
      target: ""
      max_depth: 2
      overwrite_keys: false
  - add_kubernetes_metadata:
      in_cluster: true
Also worth mentioning: we have a custom JSON-formatted logger in our nginx image and don't use any of filebeat's built-in modules; we like to have full control ourselves. So far, all of this works really well.
The challenge: we'd like a similar setup for indexing application logs from the "main" container. These will also probably be JSON-formatted, but we'd like some flexibility here, so I'm trying to figure out what the best option is.
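For illustration, per-container hints can also carry JSON decoding settings, which might give us that flexibility without a separate config per app. The json.* annotation keys below follow the hints documentation but vary across filebeat versions, so treat this as a hedged sketch rather than something we've verified:

annotations:
  co.elastic.logs.app/enabled: "true"
  # json.* hints are version-dependent (e.g. keys_under_root was replaced in 8.x)
  co.elastic.logs.app/json.keys_under_root: "true"
  co.elastic.logs.app/json.add_error_key: "true"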
As far as I can see, the kubernetes filebeat autodiscover setup doesn't really support running multiple filebeat containers with different configs, because there's no way to direct a container's log file to a specific filebeat instance. Data from filebeat ends up in logstash, where each pipeline is published on an isolated port. We do this to avoid large, hard-to-understand pipeline code blocks, and it also allows us to decouple our pipelines later if needed.
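Concretely, the two filebeat instances would differ only in their logstash output; the hostname and ports here are hypothetical:

# "nginx" filebeat instance
output.logstash:
  hosts: ["logstash.logging.svc.cluster.local:5044"]   # nginx pipeline's port

# the "app logs" instance would point at the other pipeline instead:
# output.logstash:
#   hosts: ["logstash.logging.svc.cluster.local:5045"]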
So, in essence, we need two different filebeat instances pushing data to two completely separate logstash pipelines. I'm struggling to figure out how to do this without running a filebeat sidecar alongside each kubernetes pod, which would work but adds unnecessary overhead.
The only thing I can think of is to use the "processors" directive in such a way that the "nginx filebeat instance" drops all messages NOT from a container named "nginx", and the "app logs instance" drops all messages NOT from a container named "app". But this seems suboptimal, especially since we'd have to parse potentially large amounts of data twice, only to discard much of it.
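For reference, a sketch of what that drop-everything-else approach would look like in the nginx instance (the app instance would use "app" instead):

processors:
  - drop_event:
      when:
        not:
          equals:
            kubernetes.container.name: "nginx"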
Any hints appreciated.