Our ES clusters are deployed in k8s using the eck-operator and our application uses the Java client with sniffer enabled.
We are currently struggling with an annoying issue when the cluster is restarted.
When the pods are restarted, their IPs change. Because the client uses PoolingNHttpClientConnectionManager to manage its connection pool, the established connections to the old IPs are kept in the pool until they are leased to serve a new request. This means a dead connection is only detected and discarded when it is actually used.
The time it takes to detect the stale connection and retry the request on another node is too long for our SLAs, so we are looking for a way to fix this permanently.
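For context, this is roughly how the client is wired up (hostnames, ports, and the sniff interval are placeholders, not our actual values):

```java
import org.apache.http.HttpHost;
import org.elasticsearch.client.RestClient;
import org.elasticsearch.client.sniff.Sniffer;

public class EsClientSetup {
    public static void main(String[] args) {
        // Seed host is a placeholder; the sniffer replaces the node list
        // with the pod IPs it discovers from the cluster.
        RestClient restClient = RestClient.builder(
                new HttpHost("es-seed-host", 9200, "http"))
            .build();

        // After a rolling restart, the pool still holds connections to the
        // old pod IPs until each one is leased for a request and fails.
        Sniffer sniffer = Sniffer.builder(restClient)
            .setSniffIntervalMillis(60_000) // default is 5 minutes; illustrative
            .build();
    }
}
```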
One option we considered was disabling the sniffer, so that the client would always direct requests to the internal k8s HTTP Service, which has a static IP. However, we have some concerns about this approach, since all the load would then go through that single Service.
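The alternative would look roughly like this (the Service DNS name follows the ECK `<cluster>-es-http` convention, but the cluster and namespace names here are placeholders):

```java
import org.apache.http.HttpHost;
import org.elasticsearch.client.RestClient;

public class EsServiceClientSetup {
    public static void main(String[] args) {
        // No sniffer: every request goes through the Kubernetes Service,
        // whose ClusterIP (and DNS name) stays stable across pod restarts.
        // kube-proxy then distributes the connections across ready pods.
        RestClient restClient = RestClient.builder(
                new HttpHost("my-cluster-es-http.my-namespace.svc", 9200, "http"))
            .build();
    }
}
```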
Given that we have clusters with 70+ data nodes, would you recommend this approach? Is the Service designed to act as a load balancer at that scale?
Thanks in advance.