Customize filebeat connections


We have been given a Logstash endpoint that points to 2 replicas running on a Kubernetes infrastructure. What we see is that we have no control over which Logstash pod we connect to, and many times our clients end up on the same pod, cutting throughput in half, since the sessions that the Filebeats open stay open until they have nothing left to send.

Does Filebeat have any configuration option to close and reopen its connection to Logstash every X seconds, every X events, or every X bytes?

I have also tried to increase the number of connections from each client by adding "worker: 2" to the Logstash output, but that does not seem to work either.

Our filebeat config:

filebeat.inputs:
- type: filestream
  enabled: true
  harvester_limit: 4000
  paths:
    - XXXXXXX*.csv
  prospector.scanner.exclude_files: ['\.gz$']
  prospector.scanner.check_interval: 60s
  close.reader.on_eof: true
  ignore_older: 48h
  clean_inactive: 49h

output.logstash:
  hosts: ["XXXXXXX:XXXX"]
  ssl.certificate_authorities: ["/etc/filebeat/ca.crt"]
  worker: 2

Greetings and thanks in advance

What are you putting in the hosts? The two replicas or an entry point in your k8s infra?

An entrypoint. It is a load balancer VIP that balances between all the workers.

This is the other thing I am testing: setting the FQDNs of the 2 pods as:

And this FQDN resolves to the load balancer VIP.

But I would also like to know if, at the Filebeat level, I am able to set the number of connections and limit how long these connections stay open.


I read about some issues in the past where it was necessary to point directly to the pods running Logstash, not to an entrypoint, to achieve better load balancing, but I could not find the post about it.

If I'm not wrong, you limit the number of connections with the "worker" setting; since you have "worker: 2", it will have two workers connecting to your Logstash endpoint.

To limit the time, you may need to use the "ttl" setting, since you are behind a load balancer.

Connections from beats to logstash are sticky, so if you have a load balancer in front of logstash, this can lead to uneven balancing.

You may try to use the "ttl" setting (and you will also need to disable pipelining, as explained in the documentation).
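A minimal sketch of such an output section, assuming a placeholder endpoint, could look like this:

```yaml
output.logstash:
  hosts: ["XXXXXXX:XXXX"]   # placeholder endpoint behind the load balancer
  worker: 2                 # number of concurrent connections
  ttl: 1m                   # close and reopen each connection every minute
  pipelining: 0             # pipelining must be disabled for ttl to take effect
```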

Many thanks,

As you said, I had to set "pipelining: 0", but also "loadbalance: true". The reference guide says loadbalance is true by default, but it seems that is not the case; I had to set "loadbalance: true" explicitly in the config for it to work. But with this config:

output.logstash:
  hosts: ["XXX.XXX.XXX.XXX:XXX"]
  ssl.certificate_authorities: ["/etc/filebeat/ca.crt"]
  worker: 2
  pipelining: 0
  loadbalance: true
  ttl: 1m

I get Filebeat to open 2 sessions and renew them every minute. With this config I get a better balance between the 2 pods.

I have yet to test whether I can specify the pod to connect to through its FQDN. If I manage to do so, I will share it here.


This is true if you have more than one host defined in the "hosts" setting; if you have just one, the load balancing is not done by Filebeat, but by your ingress.
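For example, with hypothetical FQDNs for the two pods, pointing Filebeat at both of them directly would let Filebeat do the balancing itself:

```yaml
output.logstash:
  # hypothetical pod FQDNs; with more than one host, Filebeat balances between them
  hosts: ["logstash-0.example.internal:5044", "logstash-1.example.internal:5044"]
  loadbalance: true   # distribute events across both hosts
  worker: 2           # connections per host
  ttl: 1m
  pipelining: 0       # required for ttl to take effect
```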

Just setting pipelining: 0 didn't work?

With one host, as I have, only "pipelining: 0" didn't work. Then, after adding "loadbalance: true", it started to work as expected.

Finally, I have not been able to connect to a specific pod; I understand this is because my service is not headless. But I managed to control the session balancing with this configuration in the k8s Service:
externalTrafficPolicy: Local

With this configuration you can control, from the load balancer, the sessions going to each worker/pod. Specifically, I have configured least_connections balancing so that all pods have the same number of sessions.
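As a sketch, assuming a hypothetical Service name and labels, the relevant part of the Kubernetes Service manifest would be:

```yaml
apiVersion: v1
kind: Service
metadata:
  name: logstash          # hypothetical Service name
spec:
  type: LoadBalancer
  externalTrafficPolicy: Local   # route traffic only to pods on the receiving node,
                                 # so the external load balancer controls distribution
  selector:
    app: logstash         # hypothetical pod label
  ports:
    - port: 5044
      targetPort: 5044
```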

Thanks for your help; with these configurations I can say that I have achieved a balanced flow.



This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.