I need some pointers on an intermittent NoNodeAvailableException that I am facing.
We have a few VMs running web services and a 6-node ES cluster. These web services send their requests to the cluster.
Recently we have been observing intermittent NoNodeAvailableExceptions.
What's common among these errors is that they all come from just one server, and all of them fall within the same short window (a few seconds in a day).
The rest of the time the entire setup works fine.
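Because the failures come from a single server and last only a few seconds, one thing worth ruling out is the web service's JVM itself stalling at those moments (for example during a long garbage-collection pause or heavy CPU contention). A minimal sketch, assuming you can start an extra daemon thread inside the web service: it sleeps for a fixed interval and logs whenever it wakes up much later than expected. The class name PauseDetector, the 100 ms interval and the 1000 ms threshold are illustrative choices, not anything provided by Elasticsearch.

```java
import java.time.Instant;

/**
 * Hypothetical helper: detects stalls of the current JVM by measuring how late
 * a fixed-interval sleep wakes up. Large overshoots usually mean the process
 * was paused (e.g. by GC) or starved of CPU.
 */
public final class PauseDetector implements Runnable {
    private final long intervalMs;
    private final long thresholdMs;

    public PauseDetector(long intervalMs, long thresholdMs) {
        this.intervalMs = intervalMs;
        this.thresholdMs = thresholdMs;
    }

    /** Returns how many ms later than expected the wake-up happened (never negative). */
    public static long overshootMs(long startNanos, long endNanos, long intervalMs) {
        long elapsedMs = (endNanos - startNanos) / 1_000_000L;
        return Math.max(0L, elapsedMs - intervalMs);
    }

    @Override
    public void run() {
        while (!Thread.currentThread().isInterrupted()) {
            long start = System.nanoTime();
            try {
                Thread.sleep(intervalMs);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt(); // restore the flag and stop
                return;
            }
            long overshoot = overshootMs(start, System.nanoTime(), intervalMs);
            if (overshoot >= thresholdMs) {
                // Wall-clock timestamp so it can be lined up with the exception logs.
                System.err.println(Instant.now() + " JVM stall detected: woke up "
                        + overshoot + " ms late");
            }
        }
    }

    /** Starts the detector as a daemon thread so it never blocks JVM shutdown. */
    public static Thread start(long intervalMs, long thresholdMs) {
        Thread t = new Thread(new PauseDetector(intervalMs, thresholdMs), "pause-detector");
        t.setDaemon(true);
        t.start();
        return t;
    }

    public static void main(String[] args) throws InterruptedException {
        start(100, 1000);
        Thread.sleep(Long.MAX_VALUE); // standalone demo: keep the JVM alive
    }
}
```

If its log lines coincide with the NoNodeAvailableException timestamps, that suggests the client process was stalled, rather than the cluster being unreachable.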
nproc and nofile have been set to sufficiently high values in limits.conf:
* soft nproc 256000
* hard nproc 256000
* hard nofile 1048576
* soft nofile 1048576
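One caveat worth checking: limits.conf is applied by pam_limits when a login session is opened, so a service started by an init system such as systemd (which does not read limits.conf) may be running with different limits. A minimal sketch, assuming a Linux host, that reads /proc/self/limits from inside the JVM and prints the limits the running process actually has. The class name EffectiveLimits is made up for this example, and the check is only meaningful when run inside the web service (or launched exactly the same way).

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical helper: prints the resource limits of the current JVM process (Linux only). */
public final class EffectiveLimits {
    // Example row: "Max open files            1048576              1048576              files"
    // The name may contain single spaces; it is followed by soft limit, hard limit, optional units.
    private static final Pattern ROW =
            Pattern.compile("^(.+?)\\s{2,}(\\S+)\\s+(\\S+)(?:\\s+\\S+)?\\s*$");

    /** Parses /proc/<pid>/limits lines into name -> {soft, hard}; the header row is skipped. */
    public static Map<String, String[]> parse(List<String> lines) {
        Map<String, String[]> limits = new LinkedHashMap<>();
        for (String line : lines) {
            if (line.startsWith("Limit")) {
                continue; // column header
            }
            Matcher m = ROW.matcher(line);
            if (m.matches()) {
                limits.put(m.group(1).trim(), new String[] {m.group(2), m.group(3)});
            }
        }
        return limits;
    }

    public static void main(String[] args) throws IOException {
        Map<String, String[]> limits = parse(Files.readAllLines(Paths.get("/proc/self/limits")));
        for (String name : new String[] {"Max open files", "Max processes"}) {
            String[] v = limits.get(name);
            if (v != null) {
                System.out.println(name + ": soft=" + v[0] + " hard=" + v[1]);
            }
        }
    }
}
```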
So I don't think this is caused by a lack of available file descriptors. I am using the Elasticsearch transport client with sniff set to false.
How can I debug this?
Is it possible that, under high load, the server cannot open connections to ES?
How can this be confirmed?
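One way to test the load hypothesis, sketched here under the assumption that the nodes listen on the default transport port 9300: run a small probe on the affected server that opens a plain TCP connection to each ES node every second with a short timeout, and logs the connect time or the error. The host names es-node-1 to es-node-6, the 1 s interval and the 2 s timeout are placeholders, not values from the setup described above.

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.Socket;
import java.time.Instant;

/** Hypothetical standalone probe: periodically opens TCP connections to each ES node. */
public final class TransportProbe {
    /** Returns the connect time in ms, or -1 if the connection failed or timed out. */
    public static long probe(String host, int port, int timeoutMs) {
        long start = System.nanoTime();
        try (Socket socket = new Socket()) {
            socket.connect(new InetSocketAddress(host, port), timeoutMs);
            return (System.nanoTime() - start) / 1_000_000L;
        } catch (IOException e) {
            // Connection refused, timeouts, unresolved hosts and socket errors
            // such as "Too many open files" all end up here.
            System.err.println(Instant.now() + " " + host + ":" + port + " FAILED: " + e);
            return -1L;
        }
    }

    public static void main(String[] args) throws InterruptedException {
        // Placeholder host names; replace with the real ES node addresses.
        String[] hosts = {"es-node-1", "es-node-2", "es-node-3", "es-node-4", "es-node-5", "es-node-6"};
        int port = 9300;      // default transport port, assumed here
        int timeoutMs = 2000;
        while (true) {
            for (String host : hosts) {
                long ms = probe(host, port, timeoutMs);
                if (ms >= 0) {
                    System.out.println(Instant.now() + " " + host + ":" + port
                            + " connected in " + ms + " ms");
                }
            }
            Thread.sleep(1000);
        }
    }
}
```

If connect times spike or connections fail at the same moments the NoNodeAvailableException appears, that points at the network or the OS on that server; if the probe stays healthy while the web service fails, the client JVM itself becomes the more likely suspect.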
This may be an open-ended question, but any help is appreciated.