We have a log streaming pipeline set up as filebeat -> logstash -> Elasticsearch. Recently, our ES cluster started throwing 429 exceptions for all traffic, and we observed that logs eventually stopped streaming from filebeat to Logstash. Our understanding is that the ES errors caused back-pressure in Logstash, which was eventually propagated to filebeat, causing it to stall. Further investigation of the filebeat logs suggests that for the entire period ES was down, the filebeat harvester open file count gradually increased to 48 (normally it is 4-5 when everything is working fine) and stayed at that level. We also saw that for the entire duration (almost 2 days) the libbeat.pipeline.events.active count was fixed at 4117. Can someone please explain why the open file count increased to such a high value, and why the libbeat.pipeline.events.active count was fixed at 4117? The filebeat client runs on the same node as our application. The application uses a size-based file rotation policy: files rotate at 10MB and we keep 10 rotated files. We generate a lot of logs, so files rotate very frequently.
Unfortunately, the way Filebeat currently handles backpressure is not ideal. It keeps open every file whose contents it cannot forward to the output, until the output becomes available again. This lets Filebeat save the contents of as many files as possible. However, keeping too many files open might lead to memory issues on the host.
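For illustration only, here is a minimal sketch of input settings that can bound how many files are held open and for how long. It assumes the `log` input type; the path and both values are hypothetical placeholders, not recommendations, and they should be checked against the documentation for the Filebeat version in use.

```yaml
filebeat.inputs:
  - type: log
    # Hypothetical path; replace with the application's real log location.
    paths:
      - /var/log/myapp/*.log
    # Close each harvester after a fixed lifetime, even if the output is blocked.
    # Trade-off: if a file is rotated out of the path or deleted before the
    # output recovers, its unread lines can be lost.
    close_timeout: 5m
    # Cap the number of harvesters this input runs in parallel (0 means no limit).
    harvester_limit: 20
```

Both options trade completeness for resource usage: they release file handles sooner, at the cost of possibly missing data from files that are rotated away while the output is unavailable.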
Thanks a lot for your reply. Could you please explain up to what limit filebeat will currently try to save files it cannot forward? Is there a configuration parameter for that, or will it keep saving until memory runs out? Also, could you please explain the behavior of the libbeat.pipeline.events.active metric? During the ES outage its value was fixed at 4117 in filebeat. Does it correspond to filebeat's internal queue (https://www.elastic.co/guide/en/beats/filebeat/master/configuring-internal-queue.html)? Should we interpret this as 4117 being the maximum size of the queue, with all events beyond that being dropped?