I am creating an index with 10 primary shards and 0 replicas, but ES keeps creating all the shards on the same data node.
I tried setting cluster.routing.allocation.balance.index to 0.75, but this seems to have no effect.
In fact, I noticed that after changing this setting, shards did start to relocate, but only shards of indexes that were already distributed among different data nodes (which was a surprise).
Why does ES insist on creating primary shards of the same index on the same data node? How can I make ES allocate the shards on different data nodes (even at the cost of failing the index creation)?
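For context, this is how I changed the setting (a sketch of the call; localhost:9200 is an assumption for the cluster endpoint):

```shell
# Dynamic cluster setting: raises the weight of per-index shard balance
# relative to per-node shard count when computing allocation weights.
# 0.75 is the value from my attempt; the default is 0.55.
curl -X PUT "localhost:9200/_cluster/settings" -H 'Content-Type: application/json' -d'
{
  "transient": {
    "cluster.routing.allocation.balance.index": 0.75
  }
}'
```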
I'll get back to you about that patch version.
All 10 primary shards are created immediately on the same data node.
I tried deleting the index and re-creating it; the shards are still created on a single data node, and oddly, always on the same one.
When I enumerate the shards with the cluster allocation explain API, all shards return the following rebalance_explanation:
"rebalance_explanation" : "cannot rebalance as no target node exists that can both allocate this shard and improve the cluster balance"
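For reference, a minimal sketch of the explain call per shard (localhost:9200 and the index name are assumptions; I repeated this for shards 0 through 9):

```shell
# Cluster allocation explain API: ask ES why a specific shard is
# allocated where it is, or why it cannot move.
# "my_index" is a placeholder for the actual index name.
curl -X GET "localhost:9200/_cluster/allocation/explain" -H 'Content-Type: application/json' -d'
{
  "index": "my_index",
  "shard": 0,
  "primary": true
}'
```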
From the data node log, in the five-minute window before the index was created, I see the following message repeating multiple times, for a different index:
[2018-07-10T10:30:00,877][DEBUG][o.e.a.b.TransportShardBulkAction] [iapp707-data] [mount_search-2018.07.10_0700][4] failed to execute bulk item (index) BulkShardRequest [[mount_search-2018.07.10_0700][4]]
containing [37] requests
I did not try that setting.
I tried changing cluster.routing.allocation.balance.index to 0.75, expecting the shards to start relocating, but that never happened.
As I wrote above, I am confused as to why my setting change did not have any effect (or at least the effect I expected), and what the difference is compared to your suggestion?
So if most of the indices are already distributed correctly, setting cluster.routing.allocation.balance.index to a higher value may not have the expected effect, because overall the weight difference will still not cross cluster.routing.allocation.balance.threshold?
And your suggested setting is more 'aggressive', meaning there is no threshold involved, it just makes sure the shards are distributed?
Also, is this setting only enforced at index creation time, or will it also relocate shards of existing indices?
It is a dynamic setting, so even though it would probably be good to set it through an index template, I believe it should take effect and trigger a rebalance even if applied at a later stage. I have not used it in a long time, however, so I am not entirely sure. The best way to find out is probably to try.
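Assuming the setting being discussed here is index.routing.allocation.total_shards_per_node (my guess based on the quoted documentation note; the earlier suggestion is not repeated in this thread), a sketch of applying it dynamically to an existing index could look like this:

```shell
# index.routing.allocation.total_shards_per_node is a dynamic index
# setting, so it can be applied to an existing index as well as set
# through an index template.
# A value of 1 means each data node may hold at most one shard of this
# index; with fewer nodes than shards, the extra shards stay unassigned.
# "my_index" and localhost:9200 are placeholders.
curl -X PUT "localhost:9200/my_index/_settings" -H 'Content-Type: application/json' -d'
{
  "index.routing.allocation.total_shards_per_node": 1
}'
```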
It worked - all shards are now distributed across different nodes.
Do you recommend applying this setting to all indices, or only 'as needed'? I am referring to the note at the bottom of the documentation page: "These settings impose a hard limit which can result in some shards not being allocated. Use with caution."
If you use this by default and lose a few data nodes, so that not all primary and replica shards can be allocated to distinct hosts, I assume the index would go to a yellow state (red if not all primaries could be allocated). You are in a better position to judge what impact this would have on your use case and how likely it is to happen.
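If that happens, the cluster health API would show it (a sketch; localhost:9200 assumed):

```shell
# yellow: all primaries assigned but some replicas unassigned;
# red: at least one primary shard is unassigned.
curl -X GET "localhost:9200/_cluster/health?pretty"
```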