ES allocates primary shards on the same data node

Hi,

I am creating an index with 10 primary shards and 0 replicas, however ES keeps creating all of the shards on the same data node.
I tried setting cluster.routing.allocation.balance.index to 0.75, but this seems to have no effect.
In fact, I noticed that after changing this setting, shards did start to relocate, but only shards of indices that were already distributed across different data nodes (which was a surprise).
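For reference, I applied the change roughly like this (a minimal sketch using the cluster settings API; the value is just what I tried):

PUT /_cluster/settings
{
  "transient": {
    "cluster.routing.allocation.balance.index": 0.75
  }
}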

I looked at this page: https://www.elastic.co/guide/en/elasticsearch/reference/current/allocation-total-shards.html and have been wondering (I haven't tried it yet): why does my configuration not have the effect I expect, while the suggested setting seems to tackle exactly the issue I am facing?

Why does ES insist on creating the primary shards of the same index on the same data node? How can I make ES allocate the shards on different data nodes (even at the cost of failing the index creation)?

thanks,
Ofer

What version of Elasticsearch are you using, and crucially, how many nodes are in the cluster?

Hi,

I am using ES 5.6, and my cluster has 22 data nodes.

Adding more info: there are 3 master nodes and 20 coordinating nodes.
There are around 10k shards overall, and the cluster is always in green state.

OK, so in total a 45-node cluster running Elasticsearch 5.6.x (which patch version - 5.6.10?):

  • 3 dedicated master nodes
  • 22 data nodes
  • 20 coordinating (i.e. not data, not master) nodes

And with a 10-primary-shard index, what distribution of the primary shards are you seeing?
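One way to check, assuming the cat shards API is available on your version, is something like:

GET /_cat/shards/<index name>?v&h=index,shard,prirep,state,node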

I'll get back to you about the patch version.
All 10 primary shards are created immediately on the same data node.
I tried deleting the index and re-creating it; the shards were still created on a single data node, and oddly always on the same one.

What does the log file show on that data node for when the index is created?

If you enumerate the shard IDs and execute the Cluster Allocation Explain API, what is returned for each shard?

GET /_cluster/allocation/explain?include_yes_decisions=true
{
  "index": "<index name>",
  "shard": 0,
  "primary": true
}

Hi,

version details:
"version" : {
"number" : "5.6.3",
"build_hash" : "1a2f265",
"build_date" : "2017-10-06T20:33:39.012Z",
"build_snapshot" : false,
"lucene_version" : "6.6.1"
}

When I enumerate the shards with the explain API, all shards return the following rebalance_explanation:
"rebalance_explanation" : "cannot rebalance as no target node exists that can both allocate this shard and improve the cluster balance"

From the data node's log, in the 5-minute window before the index was created, I see the following message repeating multiple times - for a different index:
[2018-07-10T10:30:00,877][DEBUG][o.e.a.b.TransportShardBulkAction] [iapp707-data] [mount_search-2018.07.10_0700][4] failed to execute bulk item (index) BulkShardRequest [[mount_search-2018.07.10_0700][4]]
containing [37] requests

Can you verify that all nodes are running exactly the same version by running GET /_cat/nodes?h=id,ip,v,m ?

Yes, all nodes are running 5.6.3:
curl --silent "my_ip/_cat/nodes?h=v" | sort | uniq
5.6.3

Have you tried using the total shards per node index setting?

I have not tried that setting.
I tried changing cluster.routing.allocation.balance.index to 0.75, expecting the shards to start relocating, but that never happened.
As I wrote above, I am confused as to why my setting change did not have the effect I expected, and what the difference is compared to your suggestion.

Cluster settings take all indices into account, while the one I linked to is per index.
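Roughly, applying it to an existing index could look something like this (a sketch; <index name> is a placeholder and 1 shard per node is just an example value):

PUT /<index name>/_settings
{
  "index.routing.allocation.total_shards_per_node": 1
}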

So if most of the indices are distributed correctly, setting cluster.routing.allocation.balance.index to a higher value may not have the expected effect, because overall the imbalance will still not cross cluster.routing.allocation.balance.threshold?
And the setting you suggested is more 'aggressive', meaning there is no threshold involved, it just makes sure the shards are distributed?
Also - is this setting only enforced at index creation time, or will it also relocate the shards of existing indices?

It is a dynamic setting, so even though it would probably be good to set it through an index template, I believe it should take effect and cause a rebalancing even if applied at a later stage. I have, however, not used it in a long time, so I am not entirely sure. The best way to find out is probably to try.
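Something along these lines, as a rough sketch (the template name and index pattern are placeholders, and I have not verified this on 5.6):

PUT /_template/<template name>
{
  "template": "<index pattern>*",
  "settings": {
    "index.routing.allocation.total_shards_per_node": 1
  }
}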

I will give it a try and update soon - thanks.

It worked - all shards are now distributed across different nodes.
Do you recommend applying this setting to all indices, or only 'as needed'? I am referring to the note at the bottom of the documentation page: "These settings impose a hard limit which can result in some shards not being allocated. Use with caution."

If you use this by default and lose a few data nodes, so that the primary and replica shards cannot all be allocated to distinct hosts, I assume the index would go to a yellow state (red if not all primaries could be allocated). You are in a better position to judge what impact this would have on your use case and how likely it is to happen.

Christian/forloop,

thanks for helping with this issue!

Ofer
