I need to query multiple database instances, and I wanted to know whether defining multiple jdbc input plugins in a single pipeline is technically the same thing as defining multiple pipelines with one jdbc input each.
When running Logstash with multiple jdbc input plugins, do they query the databases concurrently or sequentially?
And what about performance: which would be better, multiple pipelines with the same config (except for the JDBC URL), or multiple jdbc input plugins in a single pipeline?
If you have multiple JDBC inputs in a pipeline, or even multiple pipelines within the same JVM, and none of them have schedules defined, they will all execute as soon as the JVM starts. As for which is better, one pipeline or several, it really depends on what you need to do to the data. I have not noticed any performance difference between the two. If you can logically get away with a single pipeline, go for it; if it's easier to manage as multiple pipelines, then perhaps that's your direction.
If you have very heavy filters or perform a lot of lookups to enrich events (e.g. the elasticsearch filter, jdbc_streaming, etc.), I have found it's better to spread the work across multiple JVMs instead of piling everything onto a single one. I am not recommending that you spin up extra JVMs; this is just something I found via trial and error for my use case. Just get in there and experiment.
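To make the two layouts concrete, here is a rough sketch of each. The host names, database names, driver path, queries, file paths, and pipeline ids are all placeholders I made up for illustration; only the option names come from the jdbc input plugin and `pipelines.yml`. First, a single pipeline with two jdbc inputs, each with its own `schedule` so they do not both fire only once at startup:

```
# Single pipeline, two jdbc inputs (all connection details are hypothetical)
input {
  jdbc {
    jdbc_driver_library => "/path/to/postgresql.jar"
    jdbc_driver_class => "org.postgresql.Driver"
    jdbc_connection_string => "jdbc:postgresql://db1.example.internal:5432/app"
    jdbc_user => "logstash"
    statement => "SELECT * FROM events"
    schedule => "*/5 * * * *"   # cron-style: every 5 minutes
    tags => ["db1"]             # lets filters/outputs tell the sources apart
  }
  jdbc {
    jdbc_driver_library => "/path/to/postgresql.jar"
    jdbc_driver_class => "org.postgresql.Driver"
    jdbc_connection_string => "jdbc:postgresql://db2.example.internal:5432/app"
    jdbc_user => "logstash"
    statement => "SELECT * FROM events"
    schedule => "*/5 * * * *"
    tags => ["db2"]
  }
}
```

The multiple-pipeline alternative would put one jdbc input in each config file and list the files in `pipelines.yml`, still inside one Logstash JVM:

```
# pipelines.yml (pipeline ids and paths are hypothetical)
- pipeline.id: db1
  path.config: "/etc/logstash/conf.d/db1.conf"
- pipeline.id: db2
  path.config: "/etc/logstash/conf.d/db2.conf"
```

With the single-pipeline layout, all events share the same filters and outputs, so you would typically branch on the tags. With separate pipelines, each one gets its own filters, outputs, and queue settings, which can be easier to manage if the sources diverge later.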