Big delay in writing to Elasticsearch from some log inputs of Filebeat

I had to migrate my Elasticsearch 7.0.1 Docker container from SSD to HDD.
Since then, data from some Filebeat log inputs arrives very late.
I use Filebeat to send Nginx logs via a log input covering multiple files, with my custom ingest pipeline for parsing. Data from access log files with a light write load appears in Elasticsearch very quickly. But data from busy log files (from the same Filebeat on the same server) is delayed by 4-6 hours.
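For context, a setup like the one described might look roughly like this (paths and pipeline name are hypothetical, just to illustrate the shape of the config):

```yaml
filebeat.inputs:
  - type: log
    paths:
      - /var/log/nginx/*.access.log   # several files, some much busier than others

output.elasticsearch:
  hosts: ["localhost:9200"]
  pipeline: "nginx-custom"            # custom ingest pipeline in Elasticsearch
```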

How does the concurrency work in this case? I don't know which parameters I need to tune. Is it possible to solve this problem simply by increasing the number of workers?

Best regards,

Hello @r2r2

First of all, I want to point out that Filebeat has an Nginx module; consider using it in the future.

Regarding your indexing delay, we first need to determine whether Filebeat is slow at reading and parsing the log files, or whether Elasticsearch is overwhelmed and sending 429 Too Many Requests responses back to Filebeat.

  1. Check the Filebeat logs for errors or messages related to Elasticsearch or "output pipeline"
  2. Check the Elasticsearch logs for errors related to the write queue or EsRejectedExecutionException (see this blog post)

Thank you for your answer!
I know about the Nginx module, but we have too much customization in the log field order and in the pipeline. It is also much more convenient to push the pipeline into Elasticsearch from our git server than to manage it from Filebeat.

  1. There aren't any warning or error messages in the Filebeat journal, only INFO lines: metric counts and harvester start/close events.
  2. I need more time to check all the details described in the blog post. For now I can only say that the log file shows no alarming lines with grep -i "reject\|error\|fail\|429\|warn\|wrn". Today the delay of logs from the busiest access log file ranged from 3 to 90 minutes.

First, check whether Elasticsearch is queueing writes or not.
Look at the output of GET _cat/thread_pool/write?v; you should see no rejections and a queue near 0.
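A healthy result looks roughly like this (node name and numbers are illustrative; the columns shown are the defaults for the cat thread pool API):

```
GET _cat/thread_pool/write?v

node_name name  active queue rejected
node-1    write 2      0     0
```

A steadily growing queue or a non-zero rejected count would point at Elasticsearch as the bottleneck rather than Filebeat.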

If Elasticsearch is fine, you can start tweaking the Filebeat configuration.
By default, Filebeat's bulk sizes might be small.
Try with:

  bulk_max_size: 1000
  workers: 2

If you have multiple nodes in your cluster, provide the full list of hosts (excluding dedicated master nodes).
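Putting these suggestions together, the output section of filebeat.yml might look like this (host addresses are hypothetical):

```yaml
output.elasticsearch:
  # List all data nodes, not dedicated masters, so bulk
  # requests are spread across the cluster.
  hosts: ["es-data-1:9200", "es-data-2:9200"]
  workers: 2            # concurrent connections per host
  bulk_max_size: 1000   # events per bulk request
```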

Another good blog post about Beats is available here.

If possible, try to enable Beats Monitoring and Elasticsearch Monitoring to get more metrics on what is happening.


Elasticsearch showed neither rejections nor a growing queue.
The cause was on the Filebeat side. These parameters fixed it for me:

bulk_max_size: 1000
workers: 2

Thank you for your help!

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.