My current config is below. Right now it processes 119,445,555 records from Beats through Logstash into Elasticsearch in 3 minutes 40 seconds. What other config changes could improve this performance? The AccountFilter1 is a simple rename of fields, so I don't see many options there.
Unfortunately that is one of those questions that really does not work well in a forum like this because it requires a level of visibility into your system that it is not practical for you to provide.
Step one is to figure out whether Beats, Logstash or Elasticsearch is the bottleneck. Then, for whichever one it is, determine which resource constraint (CPU, memory, disk throughput) is limiting it.
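For example, the Logstash monitoring API exposes per-pipeline event counts and per-plugin timings, which can show whether the filters or the output are dominating. A minimal sketch, assuming the default API port of 9600 on localhost:

```
# Query Logstash's built-in monitoring API for pipeline stats
curl -s 'localhost:9600/_node/stats/pipelines?pretty'

# Events in/out and queue push duration appear under .pipelines.<name>.events;
# each plugin's duration_in_millis shows which filter or output takes the time.
```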
I would test with a smaller record set. Maybe 30 million records, to get faster turnaround time on the iterations.
I would test the scalability of each pipeline. That is, run only a single pipeline, change the number of worker threads, and see how the throughput responds. You would hope for it to be nearly linear, and it will only take a few minutes to check.
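Worker count can be set per pipeline in pipelines.yml (or globally via pipeline.workers in logstash.yml, or the -w command-line flag). A minimal sketch, where "accounts" and its config path are hypothetical:

```
# pipelines.yml - vary workers per pipeline while testing
- pipeline.id: accounts
  path.config: "/etc/logstash/conf.d/accounts.conf"
  pipeline.workers: 8      # try 2, 4, 8, ... and compare throughput
  pipeline.batch.size: 125 # the default; larger batches may also help
```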
Within Logstash there are almost no tuning options related to performance. There are a few in Elasticsearch if the problem turns out to be there.
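If Elasticsearch does turn out to be the bottleneck, one common adjustment for bulk loads is to relax the index refresh interval during the load and restore it afterwards. A sketch, where "myindex" is a placeholder for your target index:

```
# Reduce refresh overhead on the target index during the bulk load
curl -s -X PUT 'localhost:9200/myindex/_settings' \
  -H 'Content-Type: application/json' \
  -d '{"index": {"refresh_interval": "30s"}}'

# Restore the default (1s) after the load finishes
curl -s -X PUT 'localhost:9200/myindex/_settings' \
  -H 'Content-Type: application/json' \
  -d '{"index": {"refresh_interval": "1s"}}'
```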
Oh, and make sure any grok patterns and regexps are appropriately anchored.
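An unanchored pattern forces the regex engine to retry the match starting at every position in a non-matching line, so failures become expensive. A sketch with a made-up log format:

```
filter {
  grok {
    # Anchored with ^ so non-matching lines fail fast instead of
    # being re-scanned from every character position.
    match => { "message" => "^%{TIMESTAMP_ISO8601:timestamp} %{LOGLEVEL:level} %{GREEDYDATA:msg}" }
  }
}
```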