We have been running a 3-node cluster on AWS that handled up to 6,000 search requests per second: 1 master node on a large instance and 2 data nodes on extra-large instances.
On each node, 50% of RAM was allocated to Elasticsearch and locked.
The data nodes sit behind an ELB load balancer, and all search requests go to this load balancer.
When we tried to add 2 more data nodes to support higher traffic (about 15,000 requests per second), the cluster became very unstable, and one of the nodes consistently disconnects when traffic reaches about 8,000-9,000 requests per second. The disconnected node causes overload on the other nodes, which makes the whole cluster unusable.
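To put the numbers above in per-node terms, here is a rough back-of-the-envelope sketch. It assumes the ELB spreads requests evenly across the data nodes and ignores the fact that each search also fans out to shards on other nodes, so the figures are only approximate. The helper name `per_node_rps` is made up for illustration.

```python
def per_node_rps(total_rps, data_nodes):
    """Requests per second each data node receives, assuming an even ELB split."""
    if data_nodes <= 0:
        raise ValueError("need at least one data node")
    return total_rps / data_nodes


# Traffic figures and node counts taken from the description above.
scenarios = [
    ("original cluster at peak", 6000, 2),
    ("expanded cluster, target traffic", 15000, 4),
    ("expanded cluster, traffic where a node drops", 9000, 4),
    ("same traffic with one node gone", 9000, 3),
]

for label, rps, nodes in scenarios:
    print(f"{label}: {per_node_rps(rps, nodes):.0f} req/s per data node")
```

If the even-split assumption is roughly right, the node drops at a lower per-node rate than the original two data nodes sustained, so the problem may not be raw per-node capacity alone.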
When a node goes down, we get various exceptions:
Caused by: org.elasticsearch.search.SearchContextMissingException: No search context found for id
org.elasticsearch.transport.SendRequestTransportException: [master][inet[/188.8.131.52:9300]][discovery/zen/leave] at org.elasticsearch.transport.TransportService.sendRequest(TransportService.java:199)
org.elasticsearch.common.util.concurrent.EsRejectedExecutionException: rejected execution of [org.elasticsearch.transport.netty.MessageChannelHandler$RequestHandler]
Can I set up the cluster in a more robust way?