We are also noticing that this error occurs intermittently. All queries start taking much longer to respond, around 8-16 seconds, and then we get this error.
After the error, the response time drops back to milliseconds for most queries.
Another thing we noted is that CPU utilization on the ES nodes is very high (90%+), but only on some of them: for example, out of 5 nodes we see high utilization on 3.
We aren't getting any pointers on this. We don't know whether bumping up the memory is what we should do. Memory utilization, though, stays constant at 67-70%.
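To make the uneven CPU load easier to inspect, here is a minimal sketch of how the per-node numbers could be pulled and the busy nodes examined. It assumes the cluster is reachable without authentication at `http://localhost:9200` (a placeholder address), and the function names and the 90% threshold are only illustrative. It uses the `_cat/nodes` and `_nodes/<node>/hot_threads` APIs.

```python
import json
from urllib.parse import quote
from urllib.request import urlopen

# Assumption: placeholder address, no authentication or TLS.
ES_URL = "http://localhost:9200"


def fetch_node_stats(base_url=ES_URL):
    """Return per-node rows from the _cat/nodes API as a list of dicts."""
    url = f"{base_url}/_cat/nodes?format=json&h=name,cpu,heap.percent,load_1m"
    with urlopen(url, timeout=10) as resp:
        return json.load(resp)


def hot_nodes(rows, cpu_threshold=90):
    """Return names of nodes whose reported CPU percentage meets the threshold."""
    hot = []
    for row in rows:
        cpu = row.get("cpu")
        # _cat APIs return values as strings; skip rows without a numeric reading.
        if cpu is not None and str(cpu).isdigit() and int(cpu) >= cpu_threshold:
            hot.append(row["name"])
    return hot


def fetch_hot_threads(node_name, base_url=ES_URL):
    """Return the plain-text hot threads report for a single node."""
    url = f"{base_url}/_nodes/{quote(node_name)}/hot_threads"
    with urlopen(url, timeout=30) as resp:
        return resp.read().decode("utf-8")


def main():
    """Print the hot threads report for every node above the CPU threshold."""
    rows = fetch_node_stats()
    for name in hot_nodes(rows):
        print(f"=== {name} ===")
        print(fetch_hot_threads(name))
```

Calling `main()` while the slowdown is happening should show which threads (search, merge, GC, and so on) are busy on the 3 loaded nodes. The `heap.percent` column from `fetch_node_stats()` can be compared against the 67-70% memory figure above.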