We had an event this morning where a node reached its high watermark. Elasticsearch decided to move 21 of its shards.
We didn't modify cluster.routing.allocation.cluster_concurrent_rebalance, so it should limit the reallocation to two, right? Is this parameter not respected when reallocations are due to a high watermark?
We did have cluster.routing.allocation.node_concurrent_recoveries set to a somewhat high value (20), but our understanding is that this parameter only applies to recoveries, while the moves we saw were clearly labelled as reallocations...
cluster.routing.allocation.cluster_concurrent_rebalance only affects rebalance operations. Moving shards off a node because of a disk watermark is a different (and higher-priority) kind of movement, so it is not limited by this parameter.
It's almost always best to leave cluster.routing.allocation.node_concurrent_recoveries set to the default of 2. If you want faster recoveries and your system can handle it then increase indices.recovery.max_bytes_per_sec instead.
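As a rough sketch of that advice, the snippet below builds a request body for the cluster update settings API (PUT _cluster/settings). It assumes the override of 20 was applied as a persistent setting; if it was transient or set in elasticsearch.yml, it has to be reverted there instead. The helper name build_recovery_settings and the 100mb value are only illustrative, so pick a throughput your disks and network can actually sustain.

```python
import json


def build_recovery_settings(max_bytes_per_sec="100mb"):
    """Build a body for PUT _cluster/settings (hypothetical helper).

    Assumes the earlier override was a persistent setting.
    """
    return {
        "persistent": {
            # null removes the override, so the default applies again
            "cluster.routing.allocation.node_concurrent_recoveries": None,
            # illustrative value only; tune it to what the hardware can handle
            "indices.recovery.max_bytes_per_sec": max_bytes_per_sec,
        }
    }


body = build_recovery_settings()
print("PUT _cluster/settings")
print(json.dumps(body, indent=2))
```

The printed body can be pasted into Kibana Dev Tools or sent with any HTTP client.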