With the index setting auto_expand_replicas set to "0-all", primary shards are concentrated on specific nodes

Because of our service requirements, "auto_expand_replicas" is set to "0-all" so that every node holds a copy of each shard. However, the primary shards end up concentrated on a single node, which can negatively impact performance.

To address this, we currently use the following steps, and we want to know whether they carry any risks and whether there is a better approach:

Step 1: Set "auto_expand_replicas" to false and "number_of_replicas" to 0.
Step 2: Verify that primary shards are evenly distributed across all nodes.
Step 3: Set "auto_expand_replicas" back to "0-all".
Step 4: Finish.
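
For reference, here is a minimal sketch of the request bodies that Steps 1 and 3 would send to the update index settings API (PUT /<index>/_settings). The index name "my-index" and the helper replica_settings_body are hypothetical names used only for illustration, and the script only prints the requests rather than sending them.

```python
import json


def replica_settings_body(auto_expand, number_of_replicas=None):
    """Build a JSON body for PUT /<index>/_settings (illustrative helper)."""
    index_settings = {"auto_expand_replicas": auto_expand}
    if number_of_replicas is not None:
        index_settings["number_of_replicas"] = number_of_replicas
    return {"index": index_settings}


INDEX = "my-index"  # hypothetical index name

# Step 1: disable auto-expansion and drop all replicas.
step1 = replica_settings_body(False, 0)
# Step 3: re-enable auto-expansion so every node gets a copy again.
step3 = replica_settings_body("0-all")

for label, body in (("Step 1", step1), ("Step 3", step3)):
    print(f"{label}: PUT /{INDEX}/_settings")
    print(json.dumps(body, indent=2))
```

Step 2 sits between these two requests and is a manual check of where the primaries landed.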

Please let us know if there are any potential risks involved in these steps, and if there are any better alternatives.

I'm posting a link to a Stack Overflow question because I can't upload the picture here.

The steps you've outlined are generally safe, but there are a few potential risks and considerations:

  1. While "number_of_replicas" is set to 0, each shard exists as a single copy, so your data is at risk. If a node fails during this window, you could lose the shards it holds.

  2. Setting "number_of_replicas" to 0 and then setting "auto_expand_replicas" back to "0-all" will cause a lot of shard movement and replica recovery, which can put a significant load on your cluster and impact performance. This is especially true if your indices are large.

  3. Verifying that primary shards are evenly distributed across all nodes can be tricky. Elasticsearch tries to balance the shards across all nodes, but it's not always perfect. There's no guarantee that the shards will be evenly distributed after setting "auto_expand_replicas" back to "0-all".
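
One way to make the distribution check less manual is to count started primaries per node from the output of GET _cat/shards?format=json. The sketch below assumes rows with the "prirep", "state", and "node" columns of that API; the sample rows, node names, and the tolerance of one shard are made up for illustration.

```python
from collections import Counter


def primaries_per_node(cat_shards_rows):
    """Count STARTED primary shards per node from _cat/shards JSON rows."""
    counts = Counter()
    for row in cat_shards_rows:
        if row.get("prirep") == "p" and row.get("state") == "STARTED":
            counts[row["node"]] += 1
    return counts


def is_balanced(counts, all_nodes, tolerance=1):
    """True if the busiest and idlest nodes differ by at most `tolerance` primaries."""
    per_node = [counts.get(node, 0) for node in all_nodes]
    return max(per_node) - min(per_node) <= tolerance


# Hypothetical sample: three shards whose primaries all sit on node-1.
nodes = ["node-1", "node-2", "node-3"]
sample_rows = [
    {"index": "my-index", "shard": str(s), "prirep": kind, "state": "STARTED", "node": node}
    for s in range(3)
    for kind, node in (("p", "node-1"), ("r", "node-2"), ("r", "node-3"))
]

counts = primaries_per_node(sample_rows)
print(dict(counts))
print("balanced:", is_balanced(counts, nodes))
```

Running the same count before Step 1 and after Step 3 would show whether the procedure actually changed where the primaries sit.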

As for alternatives, you could consider using shard allocation filtering to control the allocation of the shards. This allows you to specify which nodes a shard can be allocated to, giving you more control over the distribution of your shards. However, this requires careful planning and understanding of your cluster's capacity.
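
As a rough illustration of what shard allocation filtering looks like, the sketch below builds an index settings body using the index.routing.allocation.include/exclude/require settings. The helper name and the node names are hypothetical. Note that, as far as I know, these filters apply to every copy of a shard, primary or replica alike, so they restrict which nodes may hold the index rather than choosing where primaries specifically go.

```python
def allocation_filter_body(kind, attribute, values):
    """Build a PUT /<index>/_settings body for index-level allocation filtering."""
    if kind not in ("include", "exclude", "require"):
        raise ValueError(f"unsupported filter kind: {kind}")
    return {f"index.routing.allocation.{kind}.{attribute}": ",".join(values)}


# Hypothetical example: allow the index only on nodes named node-1 or node-2.
body = allocation_filter_body("include", "_name", ["node-1", "node-2"])
print(body)
```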

I fully agree with the risks you mentioned.

Considering these risks, I think that rerouting the primary shards in the above situation may not be of great benefit. I wonder what you think.

In my opinion, there is a replica shard on every node anyway, so nothing breaks even if the node holding the primary shards is shut down. For that reason, I think it would be better not to rebalance them.

Nevertheless, I'm curious whether you think it is worth rebalancing when primary shards are crowded onto one node.

Thanks in advance for your reply.
