I have 5 Elasticsearch nodes in one cluster (about 85,000,000 docs, 30 GB of data). After stopping and starting one server, I saw that the resync started at a very low speed and with very high load on every node in the cluster. Nodes randomly drop out of the cluster, and then the resync starts again. What can I do to fix this?
You have 9 495 shards on 5 nodes?
That is about 2000 shards per node.
Would you run 2000 database instances on a single physical machine?
That's a lot.
You did not give your version, BTW. If you have not upgraded, upgrade.
Reduce the number of shards. Maybe you have 5 shards per index and 1 replica? If you don't need that many, reduce the number.
You have "number_of_pending_tasks" : 818 here, so I think you have to wait until it recovers.
Also look at your logs. They will probably tell you what happened.
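To put the shard count in perspective, here is a rough back-of-the-envelope sketch in Python. The figures come from this thread; treating the 30 GB as the total on-disk size across all shard copies is an assumption, so the per-shard average is only indicative.

```python
# Rough arithmetic behind the "too many shards" remark.
# Figures are taken from the thread; treating 30 GB as the total
# on-disk size across all shard copies is an assumption.
total_shards = 9495
nodes = 5
data_gb = 30

shards_per_node = total_shards / nodes
avg_shard_mb = data_gb * 1024 / total_shards

print(f"{shards_per_node:.0f} shards per node")
print(f"{avg_shard_mb:.1f} MB per shard on average")
```

Each shard is a full Lucene index with its own overhead, so a few megabytes per shard means most of the cluster's effort goes into managing shards rather than serving data.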
FROM elasticsearch:latest
RUN if [ ! -d /usr/share/elasticsearch/plugins/hq ]; then /usr/share/elasticsearch/bin/plugin install royrusso/elasticsearch-HQ; fi
RUN if [ ! -d /usr/share/elasticsearch/plugins/kopf ]; then /usr/share/elasticsearch/bin/plugin install lmenezes/elasticsearch-kopf/2.0; fi
My replica count is 4. We need every index on every server.
Considering that you only have 30 GB of data, you have far too many indices and shards. With this amount of data, each daily index should have only a single primary shard, and I would most likely recommend switching to, for example, monthly indices in order to increase the average shard size and thereby reduce the number of indices and shards that need to be managed, along with the overhead associated with them. If you are on Elasticsearch 5.x, you can use the shrink index API to get down to 1 primary shard per index. Even though it is getting a bit old, this blog post also contains some good points.
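As a minimal sketch of what the shrink step could look like on 5.x, the Python below only builds the two REST requests (it does not send them). The index names and node name are hypothetical placeholders, and `shrink_requests` is an illustrative helper, not part of any client library. The settings keys follow the 5.x shrink index documentation as I understand it: the source index must be blocked for writes and have a copy of every shard on one node before it can be shrunk.

```python
import json


def shrink_requests(source, target, node_name):
    """Build (method, path, body) triples for shrinking one index to 1 primary shard.

    Hypothetical helper for illustration; it only constructs the requests.
    """
    prepare = {
        "settings": {
            # Relocate a copy of every shard to a single node.
            "index.routing.allocation.require._name": node_name,
            # Block writes on the source index while it is shrunk.
            "index.blocks.write": True,
        }
    }
    shrink = {
        "settings": {
            # Target index gets a single primary shard.
            "index.number_of_shards": 1,
        }
    }
    return [
        ("PUT", f"/{source}/_settings", prepare),
        ("POST", f"/{source}/_shrink/{target}", shrink),
    ]


# Hypothetical daily index and node names.
requests_to_send = shrink_requests("logs-2017.01.01", "logs-2017.01.01-shrunk", "node-1")
for method, path, body in requests_to_send:
    print(method, path)
    print(json.dumps(body, indent=2))
```

The second request should only be sent once the relocation triggered by the first has finished. Replica count and allocation filters on the new index are worth reviewing afterwards, since the sketch leaves them at whatever the cluster applies by default.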