I'm looking for methods to load balance Logstash or Elasticsearch. The situation is that I have fairly large log files with thousands of lines each; while Logstash processes the log files exactly as expected, I want to know whether there's a way to reduce the time it takes to do so.
Is there a way to direct the input data stream to other Logstash instances on the same or another machine to filter/process it?
This is a really complex question which cannot really be answered in a forum like this. Step one is to identify the bottleneck in the ingestion process. Is it Elasticsearch or Logstash? Is the process CPU limited? IO limited? If it is Logstash, is it the input or the filters in the pipeline that are limiting ingestion?
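One place to start looking (a sketch; 9600 is the default monitoring API port) is the Logstash node stats API, which reports per-plugin event counts and timings so you can see where time is being spent:

```
# Query the Logstash monitoring API (enabled by default on port 9600)
# to see how long each input/filter/output plugin spends on events.
curl -XGET 'http://localhost:9600/_node/stats/pipelines?pretty'
```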
If the limit is the input then it might help to use multiple inputs, each processing a subset of the *.gz files, as in the sketch below.
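For example, a minimal sketch splitting the files across several file inputs by glob pattern (the paths are hypothetical placeholders; `mode => "read"` lets the file input read compressed files to completion):

```
input {
  # Each input reads a disjoint subset of the compressed logs.
  # Paths are hypothetical; adjust the globs to your own layout.
  file { path => "/var/log/app/[a-m]*.gz" mode => "read" }
  file { path => "/var/log/app/[n-z]*.gz" mode => "read" }
}
```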
It is certainly possible to configure Logstash to divide traffic between other Logstash instances. You could use something like the Logstash-to-Logstash communication options described in the Elastic documentation.
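For instance, a minimal sketch using an http output on the distributing instance and an http input on a downstream instance (the host name and port are placeholders, not values from this thread):

```
# Upstream instance: forward events to a downstream Logstash over HTTP.
output {
  http {
    url         => "http://downstream-host:8080"  # hypothetical host
    http_method => "post"
    format      => "json"
  }
}
```

```
# Downstream instance: accept events forwarded from upstream.
input {
  http { port => 8080 }
}
```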
There's been no bottleneck yet; I'm in the beginning stages of setting up Logstash to filter a huge *.gz file. I was just looking at methods to trim down the time taken for the input to be filtered.
It's currently not CPU or IO limited; Logstash is working better than expected in the preliminary tests.
Thanks for the document on connecting Logstash to Logstash.