Filebeat low throughput and many old files under harvesting

Hi,

I am seeing a strange problem on the Filebeat side. I am using Filebeat version 7.1.

My setup is Filebeat -> Logstash -> ES. The Logstash servers are configured with a persisted queue of 20 GB.

Filebeat is reading from 2 folders, each containing 50+ files of about 2 GB each. Most of the files were created 24 hours ago, and no new log lines have been written to them for the last 10 hours or more, yet Filebeat is still harvesting them, as shown in the log excerpt below.

The impact is that overall throughput is less than half of what we expect: about half of the data has still not reached ES. There is no load on the Logstash side, no backpressure, and the Logstash queue is nearly empty.

I have the following questions:

  1. Why is the throughput so low?
  2. How can I see whether Filebeat is actually sending data, and where its read pointer is in each file?
  3. How can I confirm how much data is left to stream on the Filebeat side for each file?
  4. Can many large files cause a problem for Filebeat throughput?
  5. How can I increase Filebeat throughput?
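For questions 2 and 3, one approach is to compare each file's offset in Filebeat's registry against the file's current size on disk. The sketch below is a minimal, unofficial helper; it assumes a 7.x-style registry where `data.json` is a JSON array of file states with `source` and `offset` fields (the exact path and format depend on your version and install method, so verify against your own registry file first).

```python
import json
import os

def harvest_progress(registry_path):
    """Compare each harvested file's registry offset to its current size.

    Returns a list of (path, offset, size, bytes_remaining) tuples for
    every registry entry whose source file still exists on disk.
    """
    with open(registry_path) as f:
        # Assumption: 7.x data.json is a flat JSON array of file states.
        entries = json.load(f)
    report = []
    for entry in entries:
        path = entry.get("source")
        offset = entry.get("offset", 0)
        if path and os.path.exists(path):
            size = os.path.getsize(path)
            report.append((path, offset, size, size - offset))
    return report

# Example usage (registry path is an assumption for a default Linux install):
#   for path, offset, size, left in harvest_progress(
#           "/var/lib/filebeat/registry/filebeat/data.json"):
#       print(f"{path}: {offset}/{size} bytes read, {left} left")
```

If `bytes_remaining` is large and shrinking slowly for many files, Filebeat really is behind; if it is zero everywhere, the data has already been shipped and the problem is downstream.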

I have also shared my Filebeat config below.

2021-02-03T06:35:50.300Z INFO log/harvester.go:254 Harvester started for file: /home/centos/ext-drive/storm/apache-storm-2.1.0/logs/workers-artifacts/topology-datumizer-priming-9-1612258878/6700/23_worker.log
2021-02-03T06:35:50.301Z INFO log/harvester.go:254 Harvester started for file: /home/centos/ext-drive/storm/apache-storm-2.1.0/logs/workers-artifacts/topology-datumizer-priming-9-1612258878/6700/48_worker.log
2021-02-03T06:35:50.302Z INFO log/harvester.go:254 Harvester started for file: /home/centos/ext-drive/storm/apache-storm-2.1.0/logs/workers-artifacts/topology-datumizer-priming-9-1612258878/6701/41_worker.log
2021-02-03T06:35:50.302Z INFO log/harvester.go:254 Harvester started for file: /home/centos/ext-drive/storm/apache-storm-2.1.0/logs/workers-artifacts/topology-datumizer-priming-9-1612258878/6700/28_worker.log
2021-02-03T06:35:50.304Z INFO log/harvester.go:254 Harvester started for file: /home/centos/ext-drive/storm/apache-storm-2.1.0/logs/workers-artifacts/topology-datumizer-priming-9-1612258878/6701/20_worker.log
2021-02-03T06:35:50.305Z INFO log/harvester.go:254 Harvester started for file: /home/centos/ext-drive/storm/apache-storm-2.1.0/logs/workers-artifacts/topology-datumizer-priming-9-1612258878/6701/44_worker.log
2021-02-03T06:35:50.306Z INFO log/harvester.go:254 Harvester started for file: /home/centos/ext-drive/storm/apache-storm-2.1.0/logs/workers-artifacts/topology-datumizer-priming-9-1612258878/6701/47_worker.log
2021-02-03T06:35:50.307Z INFO log/harvester.go:254 Harvester started for file: /home/centos/ext-drive/storm/apache-storm-2.1.0/logs/workers-artifacts/topology-datumizer-priming-9-1612258878/6701/50_worker.log
2021-02-03T06:35:50.307Z INFO log/harvester.go:254 Harvester started for file: /home/centos/ext-drive/storm/apache-storm-2.1.0/logs/workers-artifacts/topology-datumizer-priming-9-1612258878/6701/13_worker.log
2021-02-03T06:35:50.307Z INFO log/harvester.go:254 Harvester started for file: /home/centos/ext-drive/storm/apache-storm-2.1.0/logs/workers-artifacts/topology-datumizer-priming-9-1612258878/6701/25_worker.log
2021-02-03T06:35:50.307Z INFO log/harvester.go:254 Harvester started for file: /home/centos/ext-drive/storm/apache-storm-2.1.0/logs/workers-artifacts/topology-datumizer-priming-9-1612258878/6701/31_worker.log
2021-02-03T06:35:50.307Z INFO log/harvester.go:254 Harvester started for file: /home/centos/ext-drive/storm/apache-storm-2.1.0/logs/workers-artifacts/topology-datumizer-priming-9-1612258878/6701/52_worker.log

.....
....
....

2021-02-03T06:35:50.449Z INFO log/harvester.go:254 Harvester started for file: /home/centos/ext-drive/storm/apache-storm-2.1.0/logs/workers-artifacts/topology-datumizer-priming-9-1612258878/6701/36_worker.log
2021-02-03T06:35:50.452Z INFO log/harvester.go:254 Harvester started for file: /home/centos/ext-drive/storm/apache-storm-2.1.0/logs/workers-artifacts/topology-datumizer-priming-9-1612258878/6701/37_worker.log
2021-02-03T06:35:50.455Z INFO log/harvester.go:254 Harvester started for file: /home/centos/ext-drive/storm/apache-storm-2.1.0/logs/workers-artifacts/topology-datumizer-priming-9-1612258878/6700/15_worker.log
2021-02-03T06:35:50.458Z INFO log/harvester.go:254 Harvester started for file: /home/centos/ext-drive/storm/apache-storm-2.1.0/logs/workers-artifacts/topology-datumizer-priming-9-1612258878/6700/24_worker.log
2021-02-03T06:35:50.465Z INFO log/harvester.go:254 Harvester started for file: /home/centos/ext-drive/storm/apache-storm-2.1.0/logs/workers-artifacts/topology-datumizer-priming-9-1612258878/6700/27_worker.log
2021-02-03T06:35:50.468Z INFO log/harvester.go:254 Harvester started for file: /home/centos/ext-drive/storm/apache-storm-2.1.0/logs/workers-artifacts/topology-datumizer-priming-9-1612258878/6700/34_worker.log
2021-02-03T06:35:50.468Z INFO log/harvester.go:254 Harvester started for file: /home/centos/ext-drive/storm/apache-storm-2.1.0/logs/workers-artifacts/topology-datumizer-priming-9-1612258878/6700/37_worker.log

ignore_older: 36h
close_inactive: 5m
close_removed: true
clean_removed: true

queue.mem:
  events: 8000
  flush.min_events: 2048
  flush.timeout: 0s

output.logstash:
  # The Logstash hosts
  hosts: ["10.178.128.26", "10.178.130.27"]
  loadbalance: true
  bulk_max_size: 4096
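For questions 4 and 5, the memory queue and output settings above may be the bottleneck if Logstash itself is idle. The fragment below is a hypothetical starting point, not a recommended final config; the values are assumptions to tune against your own measurements, and `worker`/`pipelining` are standard Logstash-output options in Filebeat 7.x:

```
queue.mem:
  events: 32768             # more in-flight events to keep both outputs busy
  flush.min_events: 4096
  flush.timeout: 1s

output.logstash:
  hosts: ["10.178.128.26", "10.178.130.27"]
  loadbalance: true
  worker: 4                 # parallel connections per Logstash host
  bulk_max_size: 4096
  pipelining: 2             # batches in flight per connection
```

Raise one value at a time and watch throughput; with `flush.timeout: 0s` and a small queue, Filebeat can end up sending many small, latency-bound batches.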

Please suggest.

Regards
Ajay

Can you see any errors in the Logstash logs? Filebeat usually slows down when the output cannot keep up with the batches it sends.
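You can also check Filebeat's own metrics to see whether events are actually being acknowledged by the output. Filebeat can expose a local stats endpoint; the fragment below is a sketch assuming a 7.x config, where port 5066 is the default:

```
# In filebeat.yml -- enable the local monitoring HTTP endpoint
http.enabled: true
http.host: localhost
http.port: 5066
```

Then query it periodically, e.g. `curl -s localhost:5066/stats`, and watch the `libbeat.output.events` counters: if acknowledged events grow slowly while the queue stays full, the output path is the bottleneck; if they barely grow at all, Filebeat itself is not reading fast enough.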

Hi,

There are no errors in the Logstash logs; everything looks fine except for the message below.

Starting pipeline {:pipeline_id=>"main", "pipeline.workers"=>4, "pipeline.batch.size"=>3000, "pipeline.batch.delay"=>10, "pipeline.max_inflight"=>12000, :thread=>"#<Thread:0x52877120 run>"}

[2021-02-04T13:26:24,991][WARN ][logstash.javapipeline ] CAUTION: Recommended inflight events max exceeded! Logstash will run with up to 12000 events in memory in your current configuration. If your message sizes are large this may cause instability with the default heap size. Please consider setting a non-standard heap size, changing the batch size (currently 3000), or changing the number of pipeline workers (currently 4) {:pipeline_id=>"main", :thread=>"#<Thread:0x52877120 run>"}

There is no backpressure from ES, such as 429 responses.

I am using the Logstash persisted queue, and the same settings and configs work perfectly on other clusters/pipelines, where the queue size grows toward 20 GB because Filebeat throughput is high.

But on this pipeline the persisted queue is always below 100 MB because Filebeat throughput is very low. We have a large ES cluster, so there is no backpressure from ES, and Logstash is always idle with no errors in its log files.

Regards
Ajay

@kvch can you or any other member please help debug this issue, or redirect me?