I have an Elasticsearch cluster of 6 data nodes running on AWS m4.2xlarge instances, each with 8 cores and 32 GB of memory. Whenever load increases, the cluster is able to serve requests up to a certain number, given by the formula ((no. of cores) * 3 / 2) + 1, which is a lot less than its search thread pool queue limit. Beyond that point, all requests start going to the same node, get stuck in that node's queue, and the response time of every subsequent request increases by about 5x. My question is: why are the requests not being distributed uniformly across the other nodes? I have not used any load balancer, so it is the default Elasticsearch routing. I am also not able to see anything in the slow logs, as there is no specific type of query that is taking a long time.
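For reference, that formula is how the default size of the search thread pool is derived from the core count. A minimal sketch of the arithmetic for the nodes described above (the core count is taken from the m4.2xlarge spec in this thread):

```java
public class SearchPoolSize {
    public static void main(String[] args) {
        // Default search thread pool size: ((#cores * 3) / 2) + 1
        int cores = 8; // vCPUs on an m4.2xlarge data node
        int searchThreads = ((cores * 3) / 2) + 1;
        System.out.println(searchThreads); // prints 13
    }
}
```

So each data node here can run 13 concurrent searches before new requests start queuing.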
Please let me know what is going wrong here. Below are some stats for that node.
Are the clients distributing requests evenly across the nodes in the cluster? Is data distributed evenly across the cluster?
It seems the client is distributing requests based on some factor, because you can see from the thread pool graph above that some nodes still have free search threads that could service requests, yet all the requests end up in the queue of a single node. How can I find out on what basis the client is sending requests to the nodes?
Below is the distribution of some of my indices across the 6 data nodes. Most of them have 3 shards and 3 replicas each.
Which client are you using? How have you configured it?
I am using the transport client with the client.transport.sniff property set to false, and the transport addresses of the client nodes.
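One thing worth noting: with client.transport.sniff set to false, the TransportClient round-robins only over the addresses added explicitly, so if it is effectively reaching a single coordinating node, that node's queue will fill up first. A sketch of enabling sniffing so the client discovers and balances across all cluster nodes (host names and the cluster name are placeholders; the API shown is the 5.x/6.x PreBuiltTransportClient):

```java
import java.net.InetAddress;

import org.elasticsearch.client.transport.TransportClient;
import org.elasticsearch.common.settings.Settings;
import org.elasticsearch.common.transport.TransportAddress;
import org.elasticsearch.transport.client.PreBuiltTransportClient;

public class SniffingClient {
    public static void main(String[] args) throws Exception {
        Settings settings = Settings.builder()
                .put("cluster.name", "my-cluster")   // placeholder cluster name
                .put("client.transport.sniff", true) // discover the full node list
                .build();

        TransportClient client = new PreBuiltTransportClient(settings)
                // Seed addresses only; with sniffing on, the client
                // learns about the remaining nodes from the cluster state.
                .addTransportAddress(new TransportAddress(
                        InetAddress.getByName("node1.example.com"), 9300))
                .addTransportAddress(new TransportAddress(
                        InetAddress.getByName("node2.example.com"), 9300));

        // Requests are now round-robined across every sniffed node.
        client.close();
    }
}
```

Note that sniffing returns the nodes' publish addresses, so it only helps if the client can reach those addresses directly (which can be an issue behind NAT or a proxy).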
Are you using routing or preference, which may affect how requests are routed?
Does anyone have any suggestions?
This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.