We are experiencing some trouble with our cluster. When we come into the office on Monday, one or two of our nodes are gone, including the master.
I also get this message in the logs: org.elasticsearch.cluster.metadata.ProcessClusterEventTimeoutException: failed to process cluster event (put-mapping) within 30s
From what I read here in the forum, that could be because I have too many shards, which is highly possible when I look at my cluster health.
You have far too many shards for a cluster that size. You need to revise your sharding strategy and bring that down by at least an order of magnitude. Aim for an average shard size between a few GB and a few tens of GB.
You can check shard and index size through the _cat/indices and _cat/shards APIs. What type of data do you have in the cluster? What is your current sharding strategy? If you are using time-based indices, what is your retention period? Which version of Elasticsearch are you using?
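As a quick sketch (assuming the cluster is reachable on `localhost:9200`; adjust host and port to your setup), the two APIs can be queried like this:

```shell
# List all indices with their document counts and on-disk size
curl 'localhost:9200/_cat/indices?v'

# List every individual shard with its size and the node it lives on
curl 'localhost:9200/_cat/shards?v'
```

The `?v` flag adds a header row so the columns are labeled.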
Thank you.
OK, my biggest index is around 18GB, and some of my shards are around 1.5GB.
We are using it for Apache logfiles, some windows service logs and since a month or so the output of our docker containers.
We create a new index every day, and we have 9 indices.
We are running 5.0.1.
I am not quite sure what you meant by sharing strategy, but if it is the shard and replica config, here it is:
That was supposed to be sharding, not sharing. The biggest index seems OK, but it probably does not need 5 primary shards. Adjust the number of primary shards and do not use the default of 5 for very small indices. Also consider consolidating small indices and/or using weekly or even monthly indices instead of daily ones.
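As a sketch of how to change the default for newly created daily indices (the template name and the `logstash-*` pattern are assumptions; substitute your actual index naming scheme), an index template on 5.x can set the shard count up front:

```shell
# Create a template so every new matching daily index gets
# 1 primary shard instead of the default 5
# ("daily_logs" and "logstash-*" are placeholder names)
curl -XPUT 'localhost:9200/_template/daily_logs' -H 'Content-Type: application/json' -d '
{
  "template": "logstash-*",
  "settings": {
    "index.number_of_shards": 1,
    "index.number_of_replicas": 1
  }
}'
```

This only affects indices created after the template is in place; existing indices keep their current shard count.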
If I am not completely wrong, I can't change the shard count of an existing index to anything smaller without removing the index?
But first of all, thank you for your help. You have already helped me a lot.
As you are on Elasticsearch 5.x, the shrink index API can help you get from 5 shards to 1 per index. You may also be able to reduce the number of replicas you have configured in order to bring the shard count down. Beyond that, if you need to reduce the shard count further, you will need to reindex the data. That can take time, so change the settings for newly created indices right away so that you generate fewer new shards per day.
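A rough sketch of the shrink workflow (the index and node names here are placeholders): the source index must first be made read-only with a copy of every shard relocated to a single node, and only then can the shrink be triggered:

```shell
# 1. Block writes and require a copy of each shard on one node
#    ("logstash-2017.01.01" and "node-1" are placeholder names)
curl -XPUT 'localhost:9200/logstash-2017.01.01/_settings' -d '
{
  "settings": {
    "index.blocks.write": true,
    "index.routing.allocation.require._name": "node-1"
  }
}'

# 2. Shrink the 5-shard index into a new 1-shard index
curl -XPOST 'localhost:9200/logstash-2017.01.01/_shrink/logstash-2017.01.01-shrunk' -d '
{
  "settings": {
    "index.number_of_shards": 1,
    "index.number_of_replicas": 1
  }
}'
```

Once the new index is green, you can delete the original and, if needed, add an alias with the old name pointing at the shrunken index.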