Losing messages during high traffic rate

sharon.c · July 26, 2016, 9:00pm

I am doing log analysis using Filebeat (1.2) -> Logstash (2.3) -> Elasticsearch (2.3).

I have 4 Filebeat instances, 4 Logstash instances (6 cores each), and a 2-node Elasticsearch cluster (8 cores, 64G RAM).

The relevant settings in filebeat.yml are:

    filebeat:
      prospectors:
        -
          paths:
            - /var/log/filebeat//.json
          encoding: utf-8
          input_type: log
          ignore_older: 10m
          scan_frequency: 1s
          exclude_lines: ["^$"]
      spool_size: 3072
      registry_file: .filebeat

    output:
      logstash:
        enabled: true
        hosts: ["logstash1:5044","logstash2:5044"]
        worker: 8
        loadbalance: true
        index: elkstats_record

Logstash's Elasticsearch output is configured like this:

    elasticsearch {
       hosts => ["node1", "node2"]
       index => "records_%{+YYYY.MM.dd}" # this pattern generates 1 index per day
       template_name => "template"
       document_id => "%{[@metadata][computed_id]}"  # set the document ID
       workers => 2
       flush_size => 3500
    }

On the Logstash hosts, the heap size environment variable is set to LS_HEAP_SIZE=2048M.

Normal traffic is 3000-4000 msg/sec. When traffic exceeds 6000-7000 msg/sec, Logstash repeatedly logs these messages:

CircuitBreaker::rescuing exceptions {:name=>"Beats input", :exception=>LogStash::Inputs::Beats::InsertingToQueueTakeTooLong, :level=>:warn}
Beats input: The circuit breaker has detected a slowdown or stall in the pipeline, the input is closing the current connection and rejecting new connection until the pipeline recover. {:exception=>LogStash::Inputs::BeatsSupport::CircuitBreaker::HalfOpenBreaker, :level=>:warn}

The system monitoring tool shows that a Logstash instance uses 50-80% of the CPU during high-traffic hours, but only 800MB of RAM.

The Elasticsearch side shows no obvious resource shortage: CPU usage is 15%-26% and memory usage is 26%.

The Elasticsearch error logs show org.apache.lucene.store.AlreadyClosedException; I have more details in this link

It looks like Logstash is not processing messages fast enough, so Filebeat takes a long time to insert into Logstash's queue. We then lose the messages that Filebeat could not insert into the Logstash queue.

Is there any way to tune Logstash or Filebeat (flush_size, workers) so that Logstash processes messages faster?
Also, how can I let Logstash use the full 2G of LS_HEAP_SIZE to cache unprocessed messages, instead of only 800MB?
Is there any other way to prevent Logstash from losing messages?

anhlqn · July 27, 2016, 6:33pm

How many Logstash filter workers does each LS instance have?

Have you tried increasing the pipeline batch size for LS with the -b switch? The default of -b 125 is pretty low. Try increasing it gradually to see if it helps. Mine is set to -b 1500 or -b 3000, depending on the throughput.

Try increasing this number to match the number of LS worker filters. It should be at least 6 since your LS instance has 6 CPU cores.

How many LS filters do you have? Too many filters hurt LS processing capacity.

sharon.c · July 27, 2016, 6:35pm

Thank you for the prompt suggestions. Setting the batch size was very effective: no messages have been lost since I set an appropriate batch size.

sharon.c · July 27, 2016, 7:08pm

Can you also share a link to an article about optimising LS processing?

anhlqn · July 27, 2016, 7:12pm

This may be helpful: https://www.elastic.co/guide/en/logstash/2.3/pipeline.html. If possible, set the Logstash output to /dev/null and test the filter workers and pipeline batch size first. Use the LS metrics plugin to see how many msgs LS can handle.
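A throughput test along those lines might look like the sketch below. It is not taken from the original poster's setup: the file name, the port, and the `-w 6 -b 1500` values in the comment are assumptions for illustration, and it assumes the metrics filter and null output plugins are installed. The null output stands in for "/dev/null", so Elasticsearch cannot be the bottleneck while the worker count and batch size are varied.

```
# throughput-test.conf (hypothetical file name)
# Assumed invocation, values to be varied between runs:
#   bin/logstash -f throughput-test.conf -w 6 -b 1500
#   -w = number of pipeline workers, -b = pipeline batch size

input {
  beats {
    port => 5044
  }
}

filter {
  # Place the filters under test here.

  # The metrics filter periodically emits a separate event, tagged "metric",
  # that carries counts and rates for the events it has seen.
  metrics {
    meter => "events"
    add_tag => "metric"
  }
}

output {
  if "metric" in [tags] {
    # Print only the rate summary (per-second rate over a 1-minute window).
    stdout {
      codec => line {
        format => "1m rate: %{[events][rate_1m]} events/sec"
      }
    }
  } else {
    # Discard regular events instead of indexing them.
    null {}
  }
}
```

Comparing the printed rate across runs with different -w and -b values gives a rough idea of how many msgs/sec the filters alone can sustain before the Elasticsearch output is added back.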
