We've been running a production Elasticsearch 1.0.0 cluster with 3 nodes in 3 AWS regions for about a year now, averaging roughly 5K requests per minute.
Last night, although there was no traffic spike, the Elasticsearch log started to fill up with the following exception:
org.elasticsearch.common.util.concurrent.EsRejectedExecutionException: rejected execution (queue capacity 1000)
Shortly after that, CPU usage on all 3 machines went up to 100% and they became inaccessible.
We restarted them and all is well for now, but we are trying to understand what happened and would greatly appreciate any insight we can get from the members of this forum.
Just to reiterate, there was no traffic spike.
(We are also waiting to hear whether there was a network error on Amazon's side, but that wouldn't fully explain the CPU spike.)
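In case it helps anyone narrow this down, below is a minimal sketch of how we plan to watch for rejections going forward. It assumes the node stats endpoint `/_nodes/stats/thread_pool` is reachable at `http://localhost:9200` (the host and port are placeholders, not our real setup) and that each pool reports `queue` and `rejected` counters. The function names `fetch_thread_pool_stats` and `rejected_pools` are our own, not part of any Elasticsearch client.

```python
import json
from urllib.request import urlopen


def fetch_thread_pool_stats(base_url="http://localhost:9200"):
    """Fetch per-node thread pool stats. base_url is an assumed placeholder."""
    with urlopen(base_url + "/_nodes/stats/thread_pool") as resp:
        return json.loads(resp.read().decode("utf-8"))


def rejected_pools(stats):
    """Return (node, pool, queue, rejected) tuples for pools that rejected work.

    Expects the parsed JSON of a node stats response; missing keys are
    treated as zero so a partial response does not raise.
    """
    rows = []
    for node_id, node in stats.get("nodes", {}).items():
        name = node.get("name", node_id)
        for pool, counters in node.get("thread_pool", {}).items():
            rejected = counters.get("rejected", 0)
            if rejected > 0:
                rows.append((name, pool, counters.get("queue", 0), rejected))
    # Largest rejection counts first, so the saturated pool stands out.
    return sorted(rows, key=lambda row: row[3], reverse=True)
```

Calling `rejected_pools(fetch_thread_pool_stats())` periodically and logging the result should at least tell us which pool (search, index, bulk, and so on) is filling its queue before the nodes become unresponsive.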