Replica allocation awareness

Hi there

Does cluster routing awareness route replicas based on the number of data nodes by default if a routing attribute is not set explicitly in elasticsearch.yml?

I only set an attribute to differentiate between hot and warm nodes for ILM, with no specific routing awareness setting based on an attribute, but I found that although my primary shards are indeed on the hot nodes, all the replicas are placed on the warm nodes (there is no discrepancy here: all replicas are on the warm nodes and not split across data nodes).

Can I, or should I rather, set another attribute to differentiate the hot nodes from the warm nodes so that the cluster balances according to that? Would the replicas then actually be allocated only to the more performant nodes, or would they still end up on the less performant nodes?

Thanks in advance :slight_smile:

As a rule, if your shards are not being allocated as you expect, the allocation explain API is the best way to find out why. Can you use this API to explain one of the problematic shards and share the output here if you need help understanding it?
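
For example (the index name and shard number here are placeholders; point it at one of the replicas that ended up somewhere unexpected):

GET _cluster/allocation/explain
{
  "index" : "index-000003",
  "shard" : 0,
  "primary" : false
}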

Hi David, it seems there might be some misconfiguration around the number of nodes, and I might need some explicit settings so that routing only targets the hot nodes.

I have 4 'hot' data nodes and 2 'warm' data nodes.

  "index" : "index-000003",
  "shard" : 0,
  "primary" : false,
  "current_state" : "started",
  "current_node" : {
    "id" : "J-k9P8KCTjyZ96S7dffvrw",
    "name" : "node-5",
    "transport_address" : "192.168.1.8:9300",
    "attributes" : {
      "ml.machine_memory" : "16510197760",
      "ml.max_open_jobs" : "20",
      "xpack.installed" : "true",
      "data" : "warm"
    },
    "weight_ranking" : 5
  },
  "can_remain_on_current_node" : "yes",
  "can_rebalance_cluster" : "yes",
  "can_rebalance_to_other_node" : "no",
  "rebalance_explanation" : "cannot rebalance as no target node exists that can both allocate this shard and improve the cluster balance",
  "node_allocation_decisions" : [
    {
      "node_id" : "huFdb7s0SBKjPqN_6xAnyg",
      "node_name" : "node-1",
      "transport_address" : "192.168.1.2:9300",
      "node_attributes" : {
        "ml.machine_memory" : "32144646144",
        "ml.max_open_jobs" : "20",
        "xpack.installed" : "true",
        "data" : "hot"
      },
      "node_decision" : "no",
      "weight_ranking" : 1,
      "deciders" : [
        {
          "decider" : "max_retry",
          "decision" : "YES",
          "explanation" : "shard has no previous failures"
        },
        {
          "decider" : "replica_after_primary_active",
          "decision" : "YES",
          "explanation" : "primary shard for this replica is already active"
        },
        {
          "decider" : "enable",
          "decision" : "YES",
          "explanation" : "all allocations are allowed"
        },
        {
          "decider" : "node_version",
          "decision" : "YES",
          "explanation" : "can allocate replica shard to a node with version [7.3.1] since this is equal-or-newer than the primary version [7.3.1]"
        },
        {
          "decider" : "snapshot_in_progress",
          "decision" : "YES",
          "explanation" : "the shard is not being snapshotted"
        },
        {
          "decider" : "restore_in_progress",
          "decision" : "YES",
          "explanation" : "ignored as shard is not being recovered from a snapshot"
        },
        {
          "decider" : "filter",
          "decision" : "YES",
          "explanation" : "node passes include/exclude/require filters"
        },
        {
          "decider" : "same_shard",
          "decision" : "NO",
          "explanation" : "the shard cannot be allocated to the same node on which a copy of the shard already exists [[us-application-logs-000021][0], node[huFdb7s0SBKjPqN_6xAnyg], [P], s[STARTED], a[id=k_2YYC6cTnKXkR3MYciw9A]]"
        },
        {
          "decider" : "disk_threshold",
          "decision" : "YES",
          "explanation" : "enough disk for shard on node, free: [773.9gb], shard size: [3.9gb], free after allocating shard: [769.9gb]"
        },
        {
          "decider" : "throttling",
          "decision" : "YES",
          "explanation" : "below shard recovery limit of outgoing: [0 < 2] incoming: [0 < 2]"
        },
        {
          "decider" : "shards_limit",
          "decision" : "YES",
          "explanation" : "total shard limits are disabled: [index: -1, cluster: -1] <= 0"
        },
        {
          "decider" : "awareness",
          "decision" : "NO",
          "explanation" : "there are too many copies of the shard allocated to nodes with attribute [data], there are [2] total configured shard copies for this shard id and [2] total attribute values, expected the allocated shard count per attribute [2] to be less than or equal to the upper bound of the required number of shards per attribute [1]"
        }
      ]
}

This shard is on a warm node:

  "current_node" : {
    "id" : "J-k9P8KCTjyZ96S7dffvrw",
    "name" : "node-5",
    "transport_address" : "192.168.1.8:9300",
    "attributes" : {
      "ml.machine_memory" : "16510197760",
      "ml.max_open_jobs" : "20",
      "xpack.installed" : "true",
      "data" : "warm"
    },
    "weight_ranking" : 5
  },

This is permitted by the filters that you have currently configured:

        {
          "decider" : "filter",
          "decision" : "YES",
          "explanation" : "node passes include/exclude/require filters"
        },

If this is surprising, and you expect this shard not to be on a warm node due to the filters you're using, then can you share the settings for this index?

GET index-000003/_settings

Hi David

Settings for the index:

{
  "index-000003" : {
    "settings" : {
      "index" : {
        "lifecycle" : {
          "name" : "ilm_policy",
          "rollover_alias" : "index"
        },
        "number_of_shards" : "4",
        "provided_name" : "index-000003",
        "creation_date" : "1567387868921",
        "priority" : "100",
        "number_of_replicas" : "1",
        "uuid" : "HfL2SuEMRpG84MBORBsbnQ",
        "version" : {
          "created" : "7030199"
        }
      }
    }
  }
}

Thanks again for the assist :slight_smile:

You have no allocation filters in place, so this shard can be allocated to any node. If you want it only to be on certain nodes, you need to configure some allocation filters. For instance, set "index.routing.allocation.require.data": "hot" to require nodes with the attribute "data": "hot".
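
For an existing index that would look something like this (the index name is just an example):

PUT index-000003/_settings
{
  "index.routing.allocation.require.data" : "hot"
}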

Hi David,

Can I add this to my index template settings? And if I add it to my current live indices, will the cluster start to move them automatically?

Yes and yes.
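
In a (legacy) index template the filter goes in the settings block, for example (the template name and index pattern are placeholders):

PUT _template/hot_template
{
  "index_patterns" : ["index-*"],
  "settings" : {
    "index.routing.allocation.require.data" : "hot"
  }
}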

I added the allocation setting and the primaries are split across the hot tier as expected, but the replicas are not being assigned. Do I need an explicit setting in the template to account for replica allocation?

See the decision for a replica shard below:

{
  "index" : "index-000004",
  "shard" : 0,
  "primary" : false,
  "current_state" : "unassigned",
  "unassigned_info" : {
    "reason" : "INDEX_CREATED",
    "at" : "2019-09-02T08:01:08.929Z",
    "last_allocation_status" : "no_attempt"
  },
  "can_allocate" : "no",
  "allocate_explanation" : "cannot allocate because allocation is not permitted to any of the nodes",

There is also this bit in the decision output from one of the primary shards:

    {
      "decider" : "awareness",
      "decision" : "NO",
      "explanation" : "there are too many copies of the shard allocated to nodes with attribute [data], there are [2] total configured shard copies for this shard id and [2] total attribute values, expected the allocated shard count per attribute [2] to be less than or equal to the upper bound of the required number of shards per attribute [1]"
    }

Template Setting:

"settings" : {
  "index" : {
    "lifecycle" : {
      "name" : "index",
      "rollover_alias" : "index"
    },
    "refresh_interval" : "1s",
    "number_of_shards" : "4",
    "number_of_replicas" : "1"
  },
  "routing": {
    "allocation": {
      "require": {
        "data": "hot"
      }
    }
  }
},

It looks like you have configured allocation awareness on the data node attribute, effectively asking Elasticsearch to spread the shards evenly across hot and warm nodes. This is in conflict with your request that all the shards be on hot nodes. I don't think you want allocation awareness.
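
Allocation awareness is normally switched on by a line like this in elasticsearch.yml (or the equivalent dynamic cluster setting), so that is the setting to look for and remove:

cluster.routing.allocation.awareness.attributes: data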

There was a setting in my elasticsearch.yml, but it was removed a while ago.

If the setting is still lingering somewhere, would that account for why the replica shards are not being split across the hot nodes?

Replica Explanation:
"allocate_explanation" : "cannot allocate because allocation is not permitted to any of the nodes",

Here are my current cluster settings; only the node attribute is set in elasticsearch.yml:

{
  "persistent" : {
    "xpack" : {
      "monitoring" : {
        "collection" : {
          "enabled" : "true"
        }
      }
    }
  },
  "transient" : {
    "indices" : {
      "lifecycle" : {
        "poll_interval" : "5m"
      }
    }
  }
}

Could you check for any mention of allocation awareness in GET /_nodes/settings too?
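
If the output is large you can narrow it down with filter_path, for instance:

GET /_nodes/settings?filter_path=nodes.*.name,nodes.*.settings.cluster.routing.allocation.awareness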

I found one hot node that still has the awareness setting. Can I remove it with an API call, or do I need a restart?

The setting is now commented out in the elasticsearch.yml:

  "settings" : {
    "pidfile" : "/var/run/elasticsearch/elasticsearch.pid",
    "cluster" : {
      "name" : "es-cluster",
      "routing" : {
        "allocation" : {
          "awareness" : {
            "attributes" : "data"
          }
        }
      },

The settings returned by GET /_nodes/settings are static (i.e. read from elasticsearch.yml at startup) so you will need to restart this node to pick up any changes you've made since the last time it was restarted.

Hi David

Thanks for your help! My cluster finished rebalancing and I can say my ingest rates are much better. Now I can focus my efforts on building and scaling the individual pipelines sending me documents at 25K/s :slight_smile:
