Hi Team,
How can we slow down a large amount of data streaming from Filebeat to Logstash so that it can be processed accurately in the filter section?
Thanks and Regards,
Sagar Mandal
Hi @Sagar_Mandal,
that should be mostly automatic. Of course, it depends on how much data would have to be cached...
I can't find this in any official Elastic documentation, but as far as I remember it is part of the Lumberjack protocol that is used between Filebeat and Logstash.
From Send Your Data | Logz.io Docs
One of the facts that make Filebeat so efficient is the way it handles backpressure: if Logstash is busy, Filebeat slows down its read rate and picks up the beat once the slowdown is over.
But from personal experience, if the Logstash filter section is very process-heavy, you can still get into trouble. I have killed my Logstash instances with sub-optimal filters, especially grok filters with poorly written patterns and no anchoring.
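For example, anchoring a grok pattern to the start and end of the line lets it fail fast on non-matching input instead of retrying the match at every position in the string. A minimal sketch, assuming a simple timestamp/level/message log format (the field names are illustrative, not from your pipeline):

    filter {
      grok {
        # Anchoring with ^ and $ makes non-matching lines fail immediately
        # instead of grok scanning the whole string for a partial match.
        match => { "message" => "^%{TIMESTAMP_ISO8601:timestamp} %{LOGLEVEL:level} %{GREEDYDATA:msg}$" }
      }
    }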
Okay, so the thing is that about 100GB of data comes in every single day to Logstash from Filebeat, and then it goes to a filter section where a lot of conditioning is being done, so... yeah.
100GB per day is a lot of data. I would recommend something like:
filebeat --> kafka --> multiple logstash instances --> elasticsearch
Bursts of messages can then be queued in Kafka, and multiple Logstash instances can be used to scale the post-processing of that data.
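A rough sketch of what the Filebeat and Logstash ends of that chain could look like (host names, topic name and consumer group here are placeholders, not something from this thread):

    # filebeat.yml -- ship to Kafka instead of directly to Logstash
    output.kafka:
      hosts: ["kafka1:9092", "kafka2:9092"]
      topic: "filebeat-logs"
      compression: gzip

    # Logstash pipeline -- every instance joins the same consumer group,
    # so Kafka spreads the topic's partitions across the instances.
    input {
      kafka {
        bootstrap_servers => "kafka1:9092,kafka2:9092"
        topics            => ["filebeat-logs"]
        group_id          => "logstash"
        codec             => "json"
      }
    }

Filebeat writes to Kafka as fast as it can, Kafka absorbs the bursts on disk, and you can add or remove Logstash consumers without touching the edge nodes.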
Rob
We are doing about 300GB of logs (about 400M documents) with 4 Logstash instances, 12 CPU cores each. To be fair, the load is < 1 at the moment. I did spend a lot of time optimizing our Logstash filters.
We are working on adding Kafka to the mix, not so much to deal with spikes but to be able to queue all messages during maintenance or if for some reason Logstash or Elasticsearch breaks completely.
@A_B as you make the move to Kafka, a few things that will really boost throughput...
- pipeline.batch.size from the default of 125 to at least 1024 (1280 was best in my environment)
- pipeline.batch.delay from the default of 50 to at least 500 (1000 was best in my environment)
- In the kafka input, set max_poll_records to the same value as pipeline.batch.size
- Each consumer_threads in the kafka input will be an instance of a consumer. So if you have 4 instances with 2 threads, that is 8 consumer instances. Your Kafka topics must have at least 8 partitions for all consumer threads to ingest data. You will want more partitions than your current needs so you can easily scale in the future.
- pipeline.workers should be at least equal to consumer_threads
- batch_size to at least 16384

You may end up tweaking some of the buffer settings as well, but the above will give you a good starting point.
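Put together, a hedged sketch of how those settings could look for one of four Logstash instances running 2 consumer threads each (the numbers come from the list above; hosts, topic and group names are placeholders):

    # logstash.yml (per instance)
    pipeline.workers: 2          # at least equal to consumer_threads
    pipeline.batch.size: 1024    # up from the default of 125
    pipeline.batch.delay: 500    # up from the default of 50

    # kafka input in the pipeline config
    input {
      kafka {
        bootstrap_servers => "kafka1:9092,kafka2:9092"
        topics            => ["filebeat-logs"]
        group_id          => "logstash"
        consumer_threads  => 2        # 4 instances x 2 threads = 8 consumers,
                                      # so the topic needs at least 8 partitions
        max_poll_records  => "1024"   # match pipeline.batch.size
      }
    }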
Rob