Methods to load balance on Logstash from the input data stream to the filter section

I'm looking for methods to load balance on Logstash or Elasticsearch. I have decently large log files with thousands of lines, and while Logstash is processing them exactly as expected, I want to know if there's a way to reduce the time it takes.

Is there a way to direct the input data stream to other instances of Logstash on the same or another machine to filter/process it?

input {
    file {
        path => "/u0/sn/input_directory/*.gz"
        mode => "read"
        sincedb_path => "/tmp/main.db"
        file_completed_action => "log"
        file_completed_log_path => "/u0/sn/input_directory/output.txt"
    }
}

This is a really complex question which cannot really be answered in a forum like this. Step one is to identify the bottleneck in the ingestion process. Is it Elasticsearch or Logstash? Is the process CPU limited? IO limited? If it is Logstash, is it the input or the filters in the pipeline that are limiting ingestion?

If the limit is the input, then it might help to use multiple inputs, each processing a subset of *.gz.
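For example, something like this (a sketch only; the [a-m]/[n-z] split and the sincedb paths are placeholders, assuming your file names are spread across the alphabet). Each file input runs in its own thread, and each one needs its own sincedb_path so they do not overwrite each other's read state:

input {
    # First input handles files whose names start with a-m.
    file {
        path => "/u0/sn/input_directory/[a-m]*.gz"
        mode => "read"
        sincedb_path => "/tmp/part_am.db"
    }
    # Second input handles files whose names start with n-z.
    file {
        path => "/u0/sn/input_directory/[n-z]*.gz"
        mode => "read"
        sincedb_path => "/tmp/part_nz.db"
    }
}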

It is certainly possible to configure Logstash to divide traffic between other Logstash instances. You could use something like

filter { ruby { code => 'event.set("[@metadata][target]", rand(3))' } }
output {
    if [@metadata][target] == 0 {
        ...
    } else if [@metadata][target] == 1 {
        ...
    } else {
        ...
    }
}
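Note that rand(3) returns an integer (0, 1, or 2), so the conditionals compare against bare numbers rather than strings. Keeping the value under [@metadata] also means it is not part of the event at output time, so the routing field never reaches Elasticsearch.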

One way of connecting Logstash to Logstash is described here.
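As a rough sketch of the idea (not that document's exact method; the hostnames and port below are placeholders, not values from this thread), each branch of the conditional above could be a tcp output pointing at one downstream instance, with a matching tcp input on each receiver:

# Sender: one tcp output per downstream Logstash instance.
output {
    if [@metadata][target] == 0 {
        tcp { host => "logstash-a.example.com" port => 5044 codec => json_lines }
    } else if [@metadata][target] == 1 {
        tcp { host => "logstash-b.example.com" port => 5044 codec => json_lines }
    } else {
        tcp { host => "logstash-c.example.com" port => 5044 codec => json_lines }
    }
}

# Receiver (on each downstream instance): accept the stream and run the
# heavy filters there.
input {
    tcp { port => 5044 codec => json_lines }
}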


There's been no bottleneck yet; I'm in the beginning stages of setting up Logstash to filter a huge *.gz file. I was just looking at methods to trim down the time taken for the input to be filtered.

It's currently not CPU or IO limited; Logstash is working better than expected in the preliminary tests.

Thanks for the document on connecting Logstash to Logstash.
