Distributing DB input across multiple Logstash instances

Very much a n00b here.

We're experimenting with an ELK install to index XML stored in an RDBMS. We're performing a fair amount of parsing on it to define certain fields of interest to be displayed in Kibana.

We expect that a single LS instance, even running with multiple workers, will struggle to keep up with the incoming data, so we would like to install multiple LS instances.

However, I'm unsure how the incoming data can be distributed among the instances, preferably with a single config file (rather than customising it for each instance). Each row of data currently has a sequential identifier, so I thought the input query for an instance could, for example, apply a modulo function to the id of newly arrived data to determine which rows that instance will process. But how could this be done with a single config file? I suppose I could somehow incorporate the host name or some other instance-specific variable into the algorithm, but is there a simpler way to distribute data across multiple instances vying for the same input source?

Having said this, I also considered that this may be the wrong way to go about parallelising the LS operations, as it would mean (say) three instances are competing for the DB. Another option would be the reverse of the last diagram in https://www.elastic.co/guide/en/logstash/current/deploying-and-scaling.html - that is, to have a single LS shipper instance route the data from the DB into multiple queues, one for each LS indexer instance.

Am I on the right track?


You'll have to use different queries for different LS instances. Perhaps you can set environment variables that you reference in the queries (should work as of LS 2.3, IIRC)?
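To make the environment-variable idea concrete, here is a rough sketch in Python (not Logstash config) of how one shared template could yield a different query on each host. The variable names LS_INSTANCE_INDEX and LS_INSTANCE_COUNT, the table `documents`, and the column `xml_payload` are all made up for illustration, and MOD() is assumed to exist in your SQL dialect (some databases use the % operator instead).

```python
import os


def build_query(env=os.environ):
    """Build the per-instance query from two (hypothetical) environment variables."""
    # LS_INSTANCE_INDEX: which slice this instance owns (0-based).
    # LS_INSTANCE_COUNT: how many instances share the table.
    index = int(env.get("LS_INSTANCE_INDEX", "0"))
    count = int(env.get("LS_INSTANCE_COUNT", "1"))
    if not 0 <= index < count:
        raise ValueError("instance index must be in the range 0..count-1")
    # Table and column names are placeholders, not taken from the original post.
    return (
        "SELECT id, xml_payload FROM documents "
        f"WHERE MOD(id, {count}) = {index} ORDER BY id"
    )


def owning_instance(row_id, count):
    """Return the index of the instance that would pick up this row."""
    return row_id % count


if __name__ == "__main__":
    # Simulate three hosts sharing the same template.
    for i in range(3):
        fake_env = {"LS_INSTANCE_INDEX": str(i), "LS_INSTANCE_COUNT": "3"}
        print(build_query(fake_env))
```

Every id maps to exactly one remainder, so the slices do not overlap and no row is left out, provided all instances agree on the count. Changing the number of instances means updating the count on every host at the same time.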

Having one LS instance to read the DB and feed a single queue that any number of LS instances can fetch from seems like a better idea.
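As a toy illustration of why the single-reader layout avoids the coordination problem, the Python sketch below uses an in-process queue and threads as stand-ins for a real message broker and separate LS indexer instances. The function name `run_pipeline` and the worker names are invented for the example; it only models the shape of the setup, not Logstash itself.

```python
import queue
import threading


def run_pipeline(rows, worker_count=3):
    """One 'shipper' feeds a shared queue; several 'indexers' drain it."""
    work = queue.Queue(maxsize=100)  # bounded, so a slow consumer slows the reader
    results = []
    results_lock = threading.Lock()

    def indexer(name):
        while True:
            row = work.get()
            if row is None:  # sentinel: no more work for this indexer
                break
            # The expensive XML parsing would happen here.
            with results_lock:
                results.append((name, row))

    threads = [
        threading.Thread(target=indexer, args=(f"indexer-{n}",))
        for n in range(worker_count)
    ]
    for t in threads:
        t.start()

    # The shipper is the only component that reads from the "database".
    for row in rows:
        work.put(row)
    # One sentinel per indexer so each of them shuts down.
    for _ in threads:
        work.put(None)
    for t in threads:
        t.join()
    return results


if __name__ == "__main__":
    processed = run_pipeline(range(10))
    print(sorted(row for _, row in processed))
```

Because each item is taken off the queue by exactly one consumer, no row is processed twice and none of the indexers needs to know how many others exist, so they can all run the same config.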