Filebeat harvester open file count increasing when Logstash output to Elasticsearch fails

We have a log streaming pipeline set up as filebeat -> logstash -> Elasticsearch. Recently, our ES cluster started returning 429 errors for all traffic, and we observed that logs eventually stopped streaming from Filebeat to Logstash. Our understanding is that the ES errors caused back-pressure in Logstash, which eventually propagated to Filebeat and caused it to stall.

Further investigation of the Filebeat logs suggests that during the entire period ES was failing, the harvester open file count gradually increased to 48 (normally it is 4-5 when everything is working fine) and stayed at that high value. We also saw that for the entire duration (almost 2 days) the libbeat.pipeline.events.active count was fixed at 4117. Can someone please explain why the open file count increased to such a high value, and why libbeat.pipeline.events.active stayed fixed at 4117?

The Filebeat client runs on the same node as our application. The application uses a size-based file rotation policy: files rotate at 10 MB and we keep 10 rotated files. We generate a lot of logs, so files rotate very frequently.

Filebeat metrics during the error period:

Filebeat metrics when there was no error in ES:

Relevant input section in the Filebeat config:

Unfortunately, at the moment the way Filebeat handles back-pressure is not ideal. It keeps every file it cannot forward to the output open until the output becomes available again. This lets Filebeat avoid losing the contents of as many files as possible. However, it might lead to memory issues on the host by keeping too many files open.

If you run into these issues, you can give harvesters a limited lifetime by configuring close_timeout. For details, see: https://www.elastic.co/guide/en/beats/filebeat/current/filebeat-input-log.html#filebeat-input-log-close-timeout
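A minimal sketch of how that could look in a log input (the path and the 5m value below are placeholders, adjust them to your setup):

```yaml
filebeat.inputs:
  - type: log
    paths:
      - /var/log/myapp/*.log   # placeholder path
    # Stop each harvester after this duration, even if the output is blocked.
    # The read offset is kept in the registry, so the file is picked up again
    # on a later scan; in the meantime the open file handle is released.
    # Note: data can be lost if the file is rotated away before Filebeat
    # gets back to it.
    close_timeout: 5m
```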

Hi Noemi (@kvch),

Thanks a lot for your reply. Could you please explain up to what limit Filebeat will try to hold on to files it cannot forward? Is there a configuration parameter for that, or will it keep holding them until the host runs out of memory? Also, could you explain the behavior of the libbeat.pipeline.events.active metric? During the ES outage its value was fixed at 4117. Does it correspond to Filebeat's internal queue (https://www.elastic.co/guide/en/beats/filebeat/master/configuring-internal-queue.html)? Should we interpret this as 4117 being the maximum size of the queue, with all events beyond that dropped?
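For reference, this is the part of the linked doc I'm referring to, sketched with what I believe are the default values (not taken from our actual config):

```yaml
# Internal memory queue (values shown are, as far as I understand, the defaults)
queue.mem:
  events: 4096            # maximum number of events the queue can buffer
  flush.min_events: 2048  # publish a batch once this many events are queued
  flush.timeout: 1s       # or after this wait, whichever comes first
```

If events really defaults to 4096, would the 4117 we observed simply be the full queue plus a batch that had already been handed to the output?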

Thanks,
Arijit
