Filebeat opens too many file handles

I have a production cluster running dockerized versions of the ELK applications. One VM hosts the entire stack except Filebeat. A Filebeat instance runs on each of the other VMs, where the other applications run.
Sometimes the orchestrator kills the Filebeat containers, and after investigating I found that it is because they run out of memory. All hosts run Linux.
Checking all the Filebeat instances, I noticed that 5 of the 8 have between 100 and 300 handles open, while the other 3 have over 1,500.
Monitoring data from before the instances were killed shows that their open handles and memory usage keep growing until they exhaust the container's memory limit (512 MB) and get killed.

The applications we harvest logs from use regular log rotation with about 2 MB per file, and they receive different amounts of traffic. However, when I check the server's filesystem while Filebeat has around 1,500 handles open, I only see 250 log files.
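One way to check what the extra handles point to is to compare Filebeat's own counters with the files on disk. A minimal sketch, assuming a Filebeat version that supports the `http.*` settings for its local stats endpoint (5066 is the default port; binding to `localhost` keeps it reachable only from inside the container):

```yaml
# filebeat.yml (top level): expose Filebeat's local stats endpoint.
http.enabled: true
http.host: localhost   # not reachable from outside the container
http.port: 5066        # default port for the Beats HTTP endpoint
```

With this in place, `curl -s localhost:5066/stats` run inside the container returns JSON in which `filebeat.harvester.open_files` and `filebeat.harvester.running` count the files Filebeat is currently reading. If `open_files` is well above the number of log files on disk, the harvesters are most likely holding handles to rotated files that have since been deleted. On the host, `ls -l /proc/<filebeat pid>/fd` marks such handles with `(deleted)`.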

The servers with the high handle counts have particularly high traffic, so I suspect that although the logs are rotated, Filebeat keeps the handles to the rotated files open so it can finish reading them. I need a way to confirm what is going on, so I know whether I need to increase the memory or whether I have a misconfiguration.
For example, in the Monitoring tab in Kibana, both Logstash and Elasticsearch seem to ingest logs fine, at under 5 ms per event and around 180 events/s, but I don't know how to tell where the bottleneck is, if there is one.
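If the extra handles are rotated files, the harvester close options are the settings that control when Filebeat releases them. A sketch of what could be added to the existing `- type: docker` entry in the autodiscover template, assuming the docker input accepts the same `close_*` and `harvester_limit` options as the `log` input (it is built on the `log` input in 6.x/7.x, but check the docs for your version). The values are illustrative, not tuned:

```yaml
# Hypothetical additions to the existing `- type: docker` entry in the
# autodiscover template, at the same indentation as the multiline.* keys.
close_inactive: 1m     # release a file 1 minute after its last line was read (default 5m)
close_removed: true    # release a file once it is deleted (already the default)
close_renamed: true    # release a file once rotation renames it (default false)
close_timeout: 15m     # hard limit on a harvester's lifetime (default 0 = disabled)
harvester_limit: 50    # max concurrent harvesters for this input (default 0 = unlimited)
```

The Filebeat docs warn that `close_renamed` and `close_timeout` can cause lines to be lost if Filebeat has not finished reading a file when it is rotated away, and note that `close_timeout` does not take effect while the output is stalled. As I understand it, autodiscover starts one input per matching container, so `harvester_limit` would apply per container rather than per Filebeat instance.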

Thanks in advance!
Here is my Filebeat config:

```yaml
filebeat.autodiscover:
  providers:
    - type: docker
      templates:
        - condition:
            equals:
              docker.container.labels.elkEnabled: "true"
          config:
            - type: docker
              containers.ids:
                - "${}"
              multiline.pattern: '^[0-9]{4}-[0-9]{2}-[0-9]{2}'
              multiline.negate: true
              multiline.match: after

processors:
  - add_docker_metadata: ~

queue.mem:
  events: 12800
  flush.min_events: 3200

output.logstash:
  enabled: true
  hosts: '${LOGSTASH_HOSTS}'
  bulk_max_size: 3200
  worker: 2

xpack.monitoring.enabled: true

logging.to_stderr: true
```
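On the memory side, the memory queue buffers up to `queue.mem.events` events (12,800 here) before publishing blocks, and with `multiline.negate: true` a single event can span many lines, such as a full stack trace. As a rough worst case, queue memory is about 12,800 × the average event size: at a hypothetical 10 KiB per event that is already 125 MiB of the 512 MB limit, before any other overhead. A sketch of a smaller queue, with illustrative values only:

```yaml
# Illustrative values: a smaller in-memory queue lowers the worst-case memory.
queue.mem:
  events: 6400             # max buffered events = 2 workers x 3200 bulk_max_size
  flush.min_events: 3200   # kept equal to output.logstash.bulk_max_size
  flush.timeout: 1s        # send a partial batch after 1s when traffic is low
```

Keeping `events` at `worker × bulk_max_size` should still let both Logstash workers receive full batches. The size of a single multiline event can also be capped on the docker input with `multiline.max_lines` (default 500) and `max_bytes` (default 10 MiB).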

I'm still having this issue, and I believe it may be causing the delay I see in logs reaching Elasticsearch. We usually have to wait a few minutes for logs to show up in Kibana, and it gets worse when traffic spikes. This does not happen in lower environments with the same config but less traffic:

I have continued investigating the Monitoring tab, and I believe neither Elasticsearch nor Logstash is the culprit.
Logstash emits the same number of events it receives, at the same rate, and the pipeline adds very little delay:


Elasticsearch shows low, steady indexing latency and resource usage:

So I suspect Filebeat is taking too long to read events from the log files on disk, or something similar. Here is the Filebeat monitoring overview:

And here is one of the instances with a high number of open handles:

If anyone can give me a pointer, I'd appreciate it, because I'm out of ideas.
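To tell whether Filebeat is slow to read or is waiting on Logstash, its internal counters can be logged more often and compared between a busy instance and a quiet one. A sketch using the `logging.metrics.*` settings (in 7.x they are enabled by default with a 30s period; 10s here is an arbitrary choice). With `logging.to_stderr: true`, the metrics lines end up in the container logs:

```yaml
# filebeat.yml (top level): log non-zero internal metrics every 10 seconds.
logging.metrics.enabled: true
logging.metrics.period: 10s
```

If `libbeat.pipeline.events.active` stays close to `queue.mem.events` while `libbeat.output.events.acked` grows slowly, events are waiting on the output (backpressure). As far as I understand the docs, harvesters blocked on a full queue cannot close their files, which would tie the growing handle count to the ingestion delay. If the queue is mostly empty while `filebeat.harvester.open_files` keeps climbing, the problem is more likely on the reading side.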
