How to slow down a large amount of data coming from Filebeat?

Sagar_Mandal · March 25, 2020, 10:09am

Hi Team,

How can I slow down a large amount of data streaming from Filebeat to Logstash so that it can be processed accurately in the filter section?

Thanks and Regards,
Sagar Mandal

A_B · March 25, 2020, 4:43pm

Hi @Sagar_Mandal,

That should be mostly automatic. Of course, it depends on how much data would have to be cached...

I can't find this in any official Elastic documentation, but as far as I remember, it is part of the lumberjack protocol that is used between Filebeat and Logstash.

From Send Your Data | Logz.io Docs

One of the facts that make Filebeat so efficient is the way it handles backpressure, so if Logstash is busy, Filebeat slows down its read rate and picks up the beat once the slowdown is over.
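To make the backpressure idea concrete, here is a toy simulation in Python. It is only a sketch of the general principle (a sender that stops reading once too many events are unacknowledged) and not the actual Beats/lumberjack protocol. The function name, the `window` parameter and the tick model are all made up for illustration.

```python
def simulate(total_events, window, consume_per_tick):
    """Toy model of backpressure, NOT the real Filebeat/Logstash protocol.

    Per tick the sender may only have `window` unacknowledged events in
    flight, and the receiver acknowledges at most `consume_per_tick` events.
    Returns (ticks needed to drain everything, peak events in flight).
    """
    if window <= 0 or consume_per_tick <= 0:
        raise ValueError("window and consume_per_tick must be positive")

    remaining = total_events  # events the sender has not read yet
    in_flight = 0             # events sent but not yet acknowledged
    max_in_flight = 0
    ticks = 0

    while remaining > 0 or in_flight > 0:
        ticks += 1
        # The sender reads only as much as the free window allows.
        sent = min(remaining, window - in_flight)
        remaining -= sent
        in_flight += sent
        max_in_flight = max(max_in_flight, in_flight)
        # The receiver acknowledges what it managed to process this tick.
        acked = min(in_flight, consume_per_tick)
        in_flight -= acked

    return ticks, max_in_flight


if __name__ == "__main__":
    print(simulate(10, window=4, consume_per_tick=2))  # slow receiver
    print(simulate(10, window=4, consume_per_tick=4))  # fast receiver
```

In this model a slower receiver only makes the transfer take more ticks. The number of events waiting in flight never exceeds the window, which is the behaviour the quote describes.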

But from personal experience, if the Logstash filter section is very processing-heavy, you can still get into trouble. I have killed my Logstash instances with sub-optimal filters, especially grok filters with poorly written patterns and no anchoring.
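As a rough illustration of what "anchoring" means, here is a small Python sketch. Grok uses a different regex engine than Python's `re`, and the log format and field names below are hypothetical, so treat this only as a demonstration of the general idea: an anchored pattern has to match the whole line from its start, while an unanchored one is free to match anywhere inside the line, which typically means more attempts on lines that do not match.

```python
import re

# Hypothetical access-log-like format, used only for this illustration:
# "<client ip> <verb> <path> <status>"
FIELDS = (
    r"(?P<client>\d{1,3}(?:\.\d{1,3}){3}) "
    r"(?P<verb>[A-Z]+) (?P<path>\S+) (?P<status>\d{3})"
)

UNANCHORED = re.compile(FIELDS)                # may match anywhere in the line
ANCHORED = re.compile(r"^" + FIELDS + r"$")    # must match the whole line


def parse(pattern, line):
    """Return the named fields as a dict, or None if the line does not match."""
    match = pattern.search(line)
    return match.groupdict() if match else None


if __name__ == "__main__":
    good = "10.0.0.1 GET /index.html 200"
    embedded = "debug: saw 10.0.0.1 GET /index.html 200 in payload"

    print(parse(ANCHORED, good))        # fields extracted
    print(parse(ANCHORED, embedded))    # None: rejected at the start of the line
    print(parse(UNANCHORED, embedded))  # fields extracted from the middle of the line
```

The equivalent in a grok pattern would be to start the expression with `^` (and end it with `$` where the whole line is expected to match).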

Sagar_Mandal · March 25, 2020, 5:24pm

Okay, so the thing is that about 100GB of data comes to Logstash from Filebeat every single day, and then it goes through a filter section where a lot of conditional processing is being done, so... yeah.

rcowart · March 25, 2020, 6:38pm

100GB per day is a lot of data. I would recommend something like:

filebeat --> kafka --> multiple logstash instances --> elasticsearch

Bursts of messages can then be queued in Kafka, and multiple Logstash instances can be used to scale the post-processing of that data.

Rob

How to install Elasticsearch & Kibana on Ubuntu - incl. hardware recommendations
What is the best storage technology for Elasticsearch?

A_B · March 26, 2020, 7:20am

We are processing about 300GB of logs (about 400M documents) across 4 Logstash instances with 12 CPU cores each. To be fair, the load is < 1 at the moment. I did spend a lot of time optimizing our Logstash filters.
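For a sense of scale, the figures above can be turned into averages with a few lines of Python. This is back-of-the-envelope arithmetic only. It assumes decimal gigabytes (10^9 bytes) and that the 300GB / 400M documents are daily totals, which the post does not state explicitly. The function names are made up for the example.

```python
SECONDS_PER_DAY = 24 * 60 * 60


def average_document_bytes(total_bytes, documents):
    """Average size of one document in bytes."""
    return total_bytes / documents


def documents_per_second(documents, instances=1, period_seconds=SECONDS_PER_DAY):
    """Average documents per second handled by each instance over the period."""
    return documents / period_seconds / instances


if __name__ == "__main__":
    total_bytes = 300 * 10**9   # 300GB, assuming decimal gigabytes
    documents = 400 * 10**6     # 400M documents
    instances = 4               # Logstash instances

    print(average_document_bytes(total_bytes, documents))
    print(documents_per_second(documents))             # whole cluster
    print(documents_per_second(documents, instances))  # per Logstash instance
```

These are averages over a full day; real traffic is usually bursty, so peak rates can be considerably higher.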

We are working on adding Kafka to the mix, not so much to deal with spikes but to be able to queue all messages during maintenance or if for some reason Logstash or Elasticsearch breaks completely.

rcowart · March 26, 2020, 9:37am

@A_B as you make the move to Kafka, a few things that will really boost throughput...

- Increase pipeline.batch.size from the default of 125 to at least 1024 (1280 was best in my environment).
- Increase pipeline.batch.delay from the default of 50 (milliseconds) to at least 500 (1000 was best in my environment).
- In the kafka input, set max_poll_records to the same value as pipeline.batch.size.
- Each thread defined by consumer_threads in the kafka input will be an instance of a consumer. So if you have 4 Logstash instances with 2 threads each, that is 8 consumer instances. Your Kafka topics must have at least 8 partitions for all consumer threads to ingest data. You will want more partitions than you currently need so you can easily scale in the future.
- The number of pipeline.workers should be at least equal to consumer_threads.
- The kafka output should set batch_size to at least 16384.

You may end up tweaking some of the buffer settings as well, but the above will give you a good starting point.
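The rules of thumb above can be written down as a small Python check. This is only a sketch: the setting names are taken from the list, but the helper functions, the dictionary layout and the example values are hypothetical and are not part of Logstash or Kafka.

```python
def kafka_consumer_count(logstash_instances, consumer_threads):
    """Each consumer thread on each Logstash instance is one Kafka consumer."""
    return logstash_instances * consumer_threads


def idle_consumers(partitions, logstash_instances, consumer_threads):
    """Consumers that get no partition because the topic has too few of them."""
    consumers = kafka_consumer_count(logstash_instances, consumer_threads)
    return max(0, consumers - partitions)


def check_settings(settings):
    """Compare a dict of settings against the rules of thumb in the post."""
    problems = []
    if settings["pipeline.batch.size"] < 1024:
        problems.append("pipeline.batch.size is below 1024")
    if settings["pipeline.batch.delay"] < 500:
        problems.append("pipeline.batch.delay is below 500")
    if settings["max_poll_records"] != settings["pipeline.batch.size"]:
        problems.append("max_poll_records differs from pipeline.batch.size")
    if settings["pipeline.workers"] < settings["consumer_threads"]:
        problems.append("pipeline.workers is lower than consumer_threads")
    return problems


if __name__ == "__main__":
    print(kafka_consumer_count(4, 2))   # 4 instances x 2 threads
    print(idle_consumers(6, 4, 2))      # topic with only 6 partitions

    # Hypothetical example values, not a recommendation for any environment.
    example = {
        "pipeline.batch.size": 1280,
        "pipeline.batch.delay": 1000,
        "pipeline.workers": 4,
        "consumer_threads": 2,
        "max_poll_records": 1280,
    }
    print(check_settings(example))
```

With 4 instances and 2 threads each, a topic with 6 partitions would leave 2 consumer threads without work, which is why the partition count should be at least the total number of consumer threads.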

Rob


