Configuring multiple filebeat instances in Kubernetes

We're currently running a setup that works well for indexing nginx sidecar logs with filebeat. It consists of one filebeat container per node, and our app deployments carry annotations such as

        co.elastic.logs.nginx/enabled: "true"

(we make sure to actually name the nginx sidecar "nginx").
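For illustration, here is a hypothetical Deployment showing where such an annotation sits. Hints are read from pod annotations, so it goes under the pod template's metadata, not the Deployment's own metadata. Only the annotation and the container name `nginx` come from our setup. The Deployment name, labels and images are made up:

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: my-app                 # hypothetical
spec:
  selector:
    matchLabels:
      app: my-app
  template:
    metadata:
      labels:
        app: my-app            # picked up via include_labels: ["app"]
      annotations:
        # Per-container hint: enables log collection for the container named "nginx" only
        co.elastic.logs.nginx/enabled: "true"
    spec:
      containers:
        - name: app            # main container: no hint, so not collected while hints.default_config.enabled is false
          image: example/my-app:latest        # hypothetical image
        - name: nginx          # sidecar: the name must match the hint above
          image: example/nginx-json:latest    # hypothetical image
```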

This is the filebeat config we're running:

    filebeat.autodiscover:
      providers:
        - type: kubernetes
          hints.enabled: true
          hints.default_config.enabled: false
          include_annotations: ["logtype"]
          include_labels: ["app"]
          labels.dedot: true
          annotations.dedot: true

    processors:
      - decode_json_fields:
          fields: ["message"]
          target: ""
          max_depth: 2
          overwrite_keys: false
      - add_kubernetes_metadata:
          in_cluster: true

It's also worth mentioning that our nginx image uses a custom JSON-formatted logger rather than any of filebeat's built-in modules; we prefer to keep full control ourselves. So far, all of this works really well.

The challenge: we'd like a similar setup for indexing application logs from the "main" container. These will probably also be JSON-formatted, but we'd like some flexibility here, so I'm trying to figure out the best option.

As far as I can see, filebeat's Kubernetes autodiscover setup doesn't really support running multiple filebeat containers with different configs, because there's no way to direct a container's log file to a specific filebeat instance. Data from filebeat ends up in logstash, where each pipeline listens on its own isolated port. Again, we do this to avoid large, hard-to-understand pipeline code blocks, and it also allows us to decouple our pipelines later if needed.
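For context, this is roughly what the Logstash side looks like as a `pipelines.yml` sketch: one pipeline per log type, each with its own `beats` input port. The pipeline ids, config paths and ports are hypothetical:

```yaml
# pipelines.yml (Logstash): one isolated pipeline per log type
- pipeline.id: nginx-logs                                # hypothetical id
  path.config: "/usr/share/logstash/pipeline/nginx.conf" # hypothetical; contains input { beats { port => 5044 } }
- pipeline.id: app-logs                                  # hypothetical id
  path.config: "/usr/share/logstash/pipeline/app.conf"   # hypothetical; contains input { beats { port => 5045 } }
```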

So, in essence, we need two different filebeat instances pushing data to two completely separate logstash pipelines. I'm struggling to figure out how to do this without running a filebeat sidecar in each Kubernetes pod, which would work but adds unnecessary overhead.
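A minimal sketch of the filebeat side, assuming each instance runs as its own DaemonSet with its own config file. The file names, the `logstash` host and ports 5044/5045 are hypothetical. Both configs would share the autodiscover and processor settings above and differ mainly in `output.logstash`:

```yaml
# filebeat-nginx.yml (hypothetical name): ships to the nginx pipeline
output.logstash:
  hosts: ["logstash:5044"]   # hypothetical host and port
---
# filebeat-app.yml (hypothetical name): ships to the app-log pipeline
output.logstash:
  hosts: ["logstash:5045"]   # hypothetical host and port
```

If both DaemonSets persist the filebeat registry on the same hostPath, each would need its own directory, because a Beat locks its `path.data` directory and two instances cannot share it.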

The only thing I can think of is to use the "processors" directive so that the "nginx filebeat instance" drops all messages NOT from a container named "nginx", and the "app logs instance" drops all messages NOT from a container named "app". But this seems suboptimal, especially since we'd have to parse potentially large amounts of data twice, only to discard much of it.
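A rough sketch of that drop-based approach for the "nginx" instance, assuming `kubernetes.container.name` is already set on the event when `drop_event` runs (hence placing it after `add_kubernetes_metadata`). The "app" instance would be identical apart from matching `"app"` instead of `"nginx"`:

```yaml
processors:
  - add_kubernetes_metadata:
      in_cluster: true
  # Drop every event that did not come from a container named "nginx"
  - drop_event:
      when:
        not:
          equals:
            kubernetes.container.name: "nginx"
  # Only events that survived the filter above reach the JSON decoder
  - decode_json_fields:
      fields: ["message"]
      target: ""
      max_depth: 2
      overwrite_keys: false
```

Because processors run in the order listed, putting `drop_event` before `decode_json_fields` at least avoids decoding the discarded events. However, each instance would still harvest the log files of both containers.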

Any hints appreciated.