I have an Elasticsearch cluster (version 2.3.4) with a custom app pulling data from it and a Couchbase cluster pushing data to it (once an hour). The cluster runs on Amazon EC2 machines with identical specs and settings. After a few hours, one of the nodes starts working noticeably "harder" than the others: in the monitoring plugins (Kopf, ElasticHQ) I can see its load is constantly high, and every few days the number of "field evictions" rises.
While I understand this indicates a lack of memory (which leads to high IOPS and high CPU), I'd like to know why only one specific node shows these symptoms and why the load isn't spreading across the cluster. If I restart the cluster, another node starts showing the same symptoms a few days later, until the cluster is restarted again.
Thank you for the quick reply. I forgot to mention the Elasticsearch version we're using: 2.3.4 (added to the original post as well).
Yes, the app is allowed to access any/all of the Elasticsearch instances; the instances aren't limited to a specific role (master, data, or client).
After we restart the cluster (gradually, node by node), the load "bounces" to another instance and stays there until we restart the cluster again.
You have a skew in your design: 4 nodes but 5 shards. One node must hold 2 shards, and therefore bears roughly double the load.
Golden rule: always align the shard count with the number of data nodes, either in a 1:1 ratio (the easiest) or, in the case of many indices, in a 1:n ratio, so that each data node holds the same number of shards.
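For a 4-node cluster like yours, a sketch of creating an index with 4 primary shards (the index name and replica count here are illustrative, not from your setup):

```json
PUT /my_index
{
  "settings": {
    "number_of_shards": 4,
    "number_of_replicas": 1
  }
}
```

With 1 replica this gives 8 shard copies in total, which again divides evenly: each of the 4 data nodes ends up with exactly 2 shards (one primary, one replica), so no single node carries extra load. Note that in 2.x the shard count is fixed at index creation time, so an existing 5-shard index would need to be reindexed to change it.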
Besides that, I would strongly recommend setting up an odd number of master-eligible nodes to make the distributed system resilient to split-brain situations. See the minimum_master_nodes setting.
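As a sketch, assuming 3 master-eligible nodes, the setting in elasticsearch.yml (for 2.x zen discovery) would be:

```yaml
# elasticsearch.yml — assuming 3 master-eligible nodes in the cluster
# Rule of thumb: (number_of_master_eligible_nodes / 2) + 1
discovery.zen.minimum_master_nodes: 2
```

This way a master can only be elected when a majority of master-eligible nodes is reachable, so a network partition cannot produce two independent masters.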