Filebeat used for k8s autodiscovery and file logs is sending duplicated events

Hi! I can see that "duplicated events" is a very common issue, but I cannot find a solution for the problem I have, which looks like a fairly generic one IMHO.

I am using Filebeat in a Kubernetes cluster (taking care of Kubernetes autodiscovery and file log extraction), so I have 8 instances created by the DaemonSet, that is, one per node.

It looks like the file log extraction is replicated 8 times, once per node. Is there any easy way of solving this?
Otherwise, I can foresee 3 solutions:

  • Deploy one instance that takes care of everything (if k8s autodiscovery can work from a single node across the whole cluster).
  • Keep these 8 instances + 1 specific instance for file log extraction.
  • Stop using the add_id processor and use the fingerprint processor instead (see the sketch after this list), but it would be a waste of energy to process and then drop 7 out of the 8 file reads.
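
For the record, the fingerprint approach from the last bullet would look roughly like this (a sketch; the field choice is my assumption). The hash is stored in @metadata._id, which the Elasticsearch output uses as the document _id, so the 7 duplicate copies overwrite the first one instead of piling up:

    processors:
      - fingerprint:
          fields: ["message", "log.file.path"]  # assumed identical across all 8 reads of the same line
          target_field: "@metadata._id"         # the Elasticsearch output uses this as the document _id
          method: sha256

But as said, this only deduplicates at index time: all 8 instances would still read, process, and ship every line.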

Hi @Chexpir!

What do you mean by file log extraction? Could you provide your configuration? Usually each DaemonSet pod autodiscovers pods/containers only on the node where it runs. I don't see any way to discover containers from a different node :thinking:.

Regards.

@ChrsMark sorry for not being specific enough. I mean I have 2 inputs: k8s autodiscover (which works perfectly) and file log extraction (which sends each event 8 times). Filebeat is deployed with the standard DaemonSet (so one instance per node), and the file logs are read once per node.

    filebeat.inputs:
      - type: log
        paths:
          - "/logs*/*/karaf/logs/tesb.log*"
        fields_under_root: true
        multiline.pattern: '^[0-9]{4}-[0-9]{2}-[0-9]{2}'
        multiline.negate: true
        multiline.match: after
      - type: log
        paths:
          - "/logs*/*/Interfaces/logs/*"
        fields_under_root: true
        multiline.pattern: '^[0-9]{4}-[0-9]{2}-[0-9]{2}'
        multiline.negate: true
        multiline.match: after
    filebeat.modules:
    - module: activemq
      audit:
        enabled: true
        var.paths: ["/logs*/*/activemq/logs/audit.log*"]
      log:
        enabled: true
        var.paths: ["/logs*/*/activemq/logs/activemq.log*"]
    - module: apache
      access:
        enabled: true
        var.paths: ["/logs*/*/*ui-*/logs/access.log*"]
      error:
        enabled: true
        var.paths: ["/logs*/*/*ui-*/logs/error.log*"]

    filebeat.autodiscover:
      providers:
        - type: kubernetes
          node: ${NODE_NAME}
          hints.enabled: true
          hints.default_config:
            type: container
            paths:
              - /var/log/containers/*${data.kubernetes.container.id}.log 

I was waiting for an answer to this, but I believe the best solution, if I cannot do autodiscovery from different nodes, is the following:
"keep these 8 instances for k8s autodiscover + 1 new specific Filebeat instance for file log extraction"

So the logs from which you see the duplicated events are being collected from the host? It seems that, yes, you need only one Filebeat instance to handle this cluster-wide input.
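
A minimal sketch of that single-instance setup, assuming your /logs* directories live on shared storage (which the 8-way duplication suggests) and using hypothetical names for the Deployment, ConfigMap, and PVC:

    apiVersion: apps/v1
    kind: Deployment
    metadata:
      name: filebeat-files              # hypothetical name
    spec:
      replicas: 1                       # exactly one reader, so each file is shipped once
      selector:
        matchLabels:
          app: filebeat-files
      template:
        metadata:
          labels:
            app: filebeat-files
        spec:
          containers:
            - name: filebeat
              image: docker.elastic.co/beats/filebeat:7.17.0  # match the version your DaemonSet runs
              args: ["-c", "/etc/filebeat.yml", "-e"]
              volumeMounts:
                - name: config
                  mountPath: /etc/filebeat.yml
                  subPath: filebeat.yml
                - name: shared-logs     # hypothetical claim holding the /logs* trees
                  mountPath: /logs      # adjust so the /logs*/... globs in your inputs resolve
                  readOnly: true
          volumes:
            - name: config
              configMap:
                name: filebeat-files-config   # holds only the log inputs and modules you posted
            - name: shared-logs
              persistentVolumeClaim:
                claimName: shared-logs

The DaemonSet would then keep only the autodiscover provider, and this Deployment only the filebeat.inputs and filebeat.modules sections, so nothing is read twice.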

Note that you can always define modules in autodiscover: https://www.elastic.co/guide/en/beats/filebeat/current/configuration-autodiscover.html#_kubernetes. For instance:

    filebeat.autodiscover:
      providers:
        - type: kubernetes
          templates:
            - condition:
                equals:
                  kubernetes.container.image: "redis"
              config:
                - module: redis
                  log:
                    input:
                      type: container
                      paths:
                        - /var/log/containers/*-${data.kubernetes.container.id}.log