It’s feasible to set up a Logstash cluster, either by running multiple services or by hosting the Logstash image in a cluster such as Kubernetes. However, with certain input plugins where you pull in/query data, you will run into issues with horizontal scaling: different nodes will try to ingest and process the same data, which may not be ideal.
Are the following the only ways to address these issues?
When using the clustering approach, use two different Logstash roles: a single node (hosting the input plugin) pulls in the data and load-balances the events to the other nodes for processing.
Move the data/log ingestion process outside of Logstash, e.g. to Beats or another forwarder.
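For reference, here is a rough sketch of what I have in mind for option 1, using the lumberjack output on the collector to forward events to a beats input on the processing nodes. All plugin names, hosts, and paths below are placeholders, not a working setup:

```conf
# Tier 1 — single "collector" node: the only instance that owns the
# pull-based input, so no other node queries the same data.
input {
  # placeholder: whatever pull/query input plugin is in use
  jdbc { ... }
}
output {
  lumberjack {
    hosts           => ["processor-1", "processor-2"]  # processing-tier nodes
    port            => 5044
    ssl_certificate => "/path/to/cert.pem"             # required by this output
  }
}
```

```conf
# Tier 2 — stateless "processor" nodes (scale these horizontally):
input {
  beats {
    port => 5044
  }
}
filter {
  # parsing / enrichment goes here
}
output {
  # forward to the destination (e.g. New Relic)
}
```

The idea is that the collector stays a single point of ingestion (and therefore a single point of failure), while the processing tier can scale out freely.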
Logstash doesn't cluster in the same sense as Elasticsearch. You can use things like load balancers on the input side, but it's going to be relatively simplistic.
And we usually talk about at-least-once delivery, which means you'll get each event at least once, but you may get duplicates. This is the same with Logstash as with Beats.
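One common way to make those duplicates harmless is to give each event a deterministic ID with the fingerprint filter, so the destination can deduplicate idempotently. This only works when the destination supports ID-based upserts (Elasticsearch's `document_id` does; whether your destination offers an equivalent is another question). A minimal sketch:

```conf
filter {
  fingerprint {
    # hash the event content into a stable ID; duplicates produce the same hash
    source => ["message"]
    target => "[@metadata][fingerprint]"
    method => "SHA256"
  }
}
output {
  elasticsearch {
    # re-delivered duplicates overwrite the same document instead of piling up
    document_id => "%{[@metadata][fingerprint]}"
  }
}
```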
I'm planning to ingest data/logs from Salesforce, GCP, other clouds, various databases, and a few more services, each with multiple environments (dev, test, load and prod). I want to make sure Logstash can scale under high data volume, and that if I run multiple Logstash services they don't ingest the same data, wasting processing resources and duplicating data on the destination side (New Relic).