Hello!
I had to migrate my Elasticsearch 7.0.1 (running in a Docker container) from SSD to HDD.
Since then, some Filebeat log inputs arrive very late.
I use Filebeat to ship Nginx logs with a log input over multiple files, and a custom ingest pipeline parses them. Data from access log files with a light write load appears in Elasticsearch very quickly, but data from busy log files (from the same Filebeat instance on the same server) arrives 4-6 hours late.
How does concurrency work in this case? I don't know which parameters to tune. Can this be solved simply by increasing the number of workers?
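For context, the settings I have in mind are the ones under the Elasticsearch output and the internal queue, something like this (values are illustrative, not my current config):

```yaml
# Illustrative Filebeat output/queue tuning knobs, not a recommendation.
output.elasticsearch:
  hosts: ["localhost:9200"]
  worker: 4            # concurrent bulk clients per configured host
  bulk_max_size: 1600  # max events per bulk request
queue.mem:
  events: 8192         # internal queue shared by all inputs
  flush.min_events: 1600
  flush.timeout: 5s
```

The queue is shared, so my understanding is that one slow output can hold back events from every input, not just the busy ones.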
First of all, note that Filebeat has an Nginx module; consider using it in the future.
Regarding your indexing delay, the first step is to determine whether Filebeat is slow reading the log files or Elasticsearch is getting overwhelmed and responding with 429 Too Many Requests.
Check the Filebeat logs for errors or messages related to Elasticsearch or the output pipeline.
Check the Elasticsearch logs for errors related to the write queue or EsRejectedExecutionException (see this blog post).
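Assuming a default local setup (adjust the host, credentials, and log location to your environment), the two checks above can be run roughly like this:

```shell
# 1) Look for rejected or retried bulk requests in the Filebeat log
#    (journald unit shown; adjust if Filebeat logs to a file instead)
journalctl -u filebeat | grep -iE "429|too many requests|error|retry"

# 2) Ask Elasticsearch how the write thread pool is doing;
#    a growing "rejected" counter indicates indexing pressure
curl -s "http://localhost:9200/_cat/thread_pool/write?v&h=node_name,active,queue,rejected"
```

If the `rejected` column keeps climbing, Elasticsearch is the bottleneck; if it stays at zero while the delay persists, look at Filebeat's own throughput first.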
@Luca_Belluccini
Thank you for your answer!
I know about the Nginx module, but we have too much customization in the log field order and in the pipeline. It's also much more convenient to push the pipeline into Elasticsearch from our Git server than from the Filebeat hosts.
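For anyone curious, loading the pipeline straight into Elasticsearch looks roughly like this (the pipeline id and file path here are made up for the example):

```shell
# Hypothetical pipeline id and JSON definition file from our Git repo
curl -X PUT "http://localhost:9200/_ingest/pipeline/nginx-access-custom" \
  -H "Content-Type: application/json" \
  --data-binary @pipelines/nginx-access.json
```

This way the pipeline definition is versioned in Git and deployed independently of the Filebeat hosts.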
There aren't any warning or error messages in the Filebeat journal, only INFO lines with metric counts and harvester start/close events.
I need more time to check all the details described in the blog post. For now I can only say that my log doesn't show any scary lines with `grep -i "reject\|error\|fail\|429\|warn\|wrn"`. And today the delay of logs from the busiest access log file ranged from 3 to 90 minutes.