I need some suggestions on what can be used to balance the data coming from thousands of applications to Logstash.
Architecture looks like this:
We have various sources whose logs need to be integrated with Elasticsearch. The incoming logs need to be distributed across 8 Logstash servers.
Currently we are using an F5 load balancer to distribute the incoming data from the sources to the Logstash servers, but we want to remove or completely replace the F5 load balancer.
Maybe, maybe not. If a domain name resolves to a list of IP addresses then the DNS server should return them in a different order for each request it serves. But there are multiple levels of caching of the responses, so in practice the order may be the same. It really depends on the software stack you are using to resolve addresses.
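One way to see what your own stack does is to resolve the name several times from a source host and compare the order of the answers. The snippet below is a minimal sketch using Python's standard `socket.getaddrinfo`; the helper names `resolve_order` and `first_address_counts` are made up for illustration, and `localhost` and port 5044 are only placeholders for whatever name and port your Logstash servers actually use.

```python
import socket
from collections import Counter


def resolve_order(host, port, attempts=5, resolver=socket.getaddrinfo):
    """Resolve `host` several times and record the address order of each answer.

    `resolver` is injectable (an illustrative choice) so the function can be
    exercised without real DNS lookups.
    """
    orders = []
    for _ in range(attempts):
        infos = resolver(host, port, type=socket.SOCK_STREAM)
        # Each entry is (family, type, proto, canonname, sockaddr);
        # sockaddr[0] is the IP address.
        orders.append(tuple(info[4][0] for info in infos))
    return orders


def first_address_counts(orders):
    """Count how often each address came first, since many clients only try the first one."""
    return Counter(order[0] for order in orders if order)


if __name__ == "__main__":
    # Placeholder name and port: replace with the name your sources resolve.
    orders = resolve_order("localhost", 5044)
    for order in orders:
        print(order)
    print(first_address_counts(orders))
```

If the same address comes first in every answer, clients that only use the first address will all land on one Logstash server, whatever the DNS server is doing behind the caches.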
If you remove the F5 you would need something similar to load balance the requests, maybe a server with HAProxy, but then for resilience you would need more than one server and a VIP address managed by Keepalived.
Or you could change the way you ingest your data: send everything to a Kafka cluster and configure your Logstash servers to read from that Kafka cluster.
I would not recommend using DNS to load balance because of caching and other issues; it is not really a load balancer.
I use HAProxy for this. It sends thousands of requests to five Logstash servers and does a good job, sending an almost equal number of records to each Logstash server.
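To illustrate why a round-robin balancer ends up with almost equal counts, here is a small simulation. It is only a simplified model of plain round-robin, not HAProxy's actual implementation, and the post above does not say which balancing algorithm is in use. The function name `distribute_round_robin` and the backend names are hypothetical.

```python
from collections import Counter
from itertools import cycle


def distribute_round_robin(backends, connection_count):
    """Assign connections to backends in strict rotation and count them per backend."""
    counts = Counter({backend: 0 for backend in backends})
    rotation = cycle(backends)
    for _ in range(connection_count):
        counts[next(rotation)] += 1
    return counts


if __name__ == "__main__":
    # Hypothetical backend names standing in for five Logstash servers.
    backends = [f"logstash-{i}" for i in range(1, 6)]
    for name, count in distribute_round_robin(backends, 10_003).items():
        print(name, count)
```

In this model the per-server counts never differ by more than one connection. In practice the number of records per server can still drift, because long-lived TCP connections from busy sources carry more events than connections from quiet ones.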