I'm running Filebeat as a DaemonSet in a separate Kubernetes cluster and sending logs to multiple Logstash StatefulSets, which are running in another Kubernetes cluster. Do I need to create a separate Logstash Service for every StatefulSet, expose all of those services, and list them in Filebeat with load balancing enabled? Or can I simply expose a single service for multiple Logstash instances and give that to Filebeat with load balancing disabled? Which solution do you suggest?
When using an external load balancer, keep in mind that Filebeat->Logstash communication uses persistent connections. With a load balancer, the balancer will mostly just take care of assigning different Filebeat instances to different Logstash instances when they connect. That is, this mode is very similar to Filebeat configured with multiple Logstash endpoints but with load balancing disabled. This is still a somewhat valid mode, assuming you have a large set of Filebeat instances.
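As a minimal sketch of this single-service mode (the hostname `logstash.example.internal` is a made-up placeholder, and port 5044 is only the conventional Beats port, not something stated in this thread):

```yaml
# filebeat.yml (fragment) - single exposed endpoint, no Filebeat-side balancing
output.logstash:
  # Hypothetical address of the one Service / external load balancer
  # that fronts all Logstash instances.
  hosts: ["logstash.example.internal:5044"]
  # Only one endpoint is listed, so Filebeat has nothing to balance across;
  # the external balancer picks a Logstash instance when the connection opens.
  loadbalance: false
```

Because the connection is persistent, each Filebeat instance stays on whichever Logstash instance it was handed at connect time. The Logstash output also documents a `ttl` setting for re-establishing connections periodically; whether it applies depends on your Filebeat version and pipelining settings, so check the docs before relying on it.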
With a single service and load balancer you can point Filebeat to the service and stop/restart Logstash instances while Filebeat continues running. Yet load balancing will be far from optimal.
Filebeat provides its own load-balancing support. But this requires you to either know every IP in advance or expose the instances under distinct service/hostnames, so that scaling is controlled through those names. Or even a mix of both: have N known service endpoints, with N being the maximum degree of load balancing done within Filebeat, plus an additional load balancer distributing TCP connections, such that you can add/remove/replace Logstash instances on the fly. Agreed, this is not optimal in a k8s environment. In the future I hope to add some kind of output discovery (such that Filebeat can dynamically add/remove Logstash endpoints if the k8s environment adds/removes them). But as of now there is no single best practice.
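A rough sketch of the Filebeat-side load-balancing variant, assuming three Logstash endpoints exposed under made-up hostnames (the names and the port are placeholders, not taken from any real setup):

```yaml
# filebeat.yml (fragment) - Filebeat balances across N known endpoints
output.logstash:
  # Hypothetical per-instance (or per-StatefulSet) service names.
  # Each entry could itself be a service with a load balancer behind it.
  hosts:
    - "logstash-a.example.internal:5044"
    - "logstash-b.example.internal:5044"
    - "logstash-c.example.internal:5044"
  # With loadbalance enabled, Filebeat distributes batches across all
  # listed hosts instead of sticking to a single one.
  loadbalance: true
```

The list is static, so adding or removing an endpoint still means editing the config and restarting Filebeat.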
Thank you. So you suggest using multiple Logstash services in the Filebeat configuration with load balancing enabled, instead of using a single Kubernetes service pointing to multiple Logstash instances with load balancing disabled?
I'd say it depends. With a single Logstash service + multiple Logstash instances behind a load balancer, you might not get very good load balancing. But on the other hand, setup + maintenance is simpler.
Some users don't even use load balancers or services, but rely on Filebeat's load-balancing support. Instead, they reconfigure + restart Filebeat whenever the number of active Logstash instances changes. For this latter use case we're thinking of adding some kind of auto-discovery, such that Filebeat will update itself upon Logstash container startup/shutdown.