This is a complex question that cannot be fully answered in a forum like this. Step one is to identify the bottleneck in the ingestion process. Is it Elasticsearch or Logstash? Is the process CPU bound? I/O bound? If it is Logstash, is it the input or the filters in the pipeline that are limiting ingestion?
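To see where time is spent inside Logstash itself, the node stats API is a good starting point; a minimal sketch, assuming the default monitoring API port of 9600. The per-plugin duration_in_millis values in the response show which filters and outputs are consuming the most time:

curl -s 'http://localhost:9600/_node/stats/pipelines?pretty'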
If the limit is the input then it might help to use multiple inputs, each processing a subset of *.gz, for example as sketched below.
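A minimal sketch of that idea, assuming the archives live under a hypothetical /data/logs directory; the file input's read mode can read gzipped files, and the glob patterns (assumptions, adjust to your file names) just split the archives into disjoint subsets so each input handles its own share:

input {
  # each file input reads a disjoint slice of the gzipped archives
  file { path => "/data/logs/[0-9a-h]*.gz" mode => "read" }
  file { path => "/data/logs/[i-q]*.gz" mode => "read" }
  file { path => "/data/logs/[r-z]*.gz" mode => "read" }
}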
It is certainly possible to configure Logstash to divide traffic between other Logstash instances. You could use something like

filter { ruby { code => 'event.set("[@metadata][target]", rand(3).to_s)' } }
output {
  if [@metadata][target] == "0" {
    ... # output plugin pointing at the first downstream instance
  } else if [@metadata][target] == "1" {
    ... # output plugin for the second instance
  } else {
    ... # output plugin for the third instance
  }
}
One way of connecting Logstash to Logstash is described in the Logstash-to-Logstash communication section of the Logstash documentation.
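As one possible way to wire this up (not necessarily the approach from the documentation; the hostnames and port are placeholders), the sending instance can fill each branch above with a tcp output using a json_lines codec, and each receiving instance runs a matching tcp input:

On the sender:

output {
  if [@metadata][target] == "0" {
    tcp { host => "logstash-0.example.com" port => 5000 codec => json_lines }
  } else if [@metadata][target] == "1" {
    tcp { host => "logstash-1.example.com" port => 5000 codec => json_lines }
  } else {
    tcp { host => "logstash-2.example.com" port => 5000 codec => json_lines }
  }
}

On each receiver:

input { tcp { port => 5000 codec => json_lines } }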