BUG: Elasticsearch ignoring node.roles after upgrade from 7.6.1

Hello everyone,

When upgrading from 7.6.1 to 7.10.2, I replaced the old node.master and node.data settings with the new node.roles setting. I changed 2 of the 5 data nodes to data_cold only. But if a node has already been used as a data node before, it does not respect this role: it starts and works as expected, but it still holds warm/hot shards, and you can still move warm/hot shards to it.
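For reference, the change on the two cold-only data nodes looked roughly like this (just a sketch; the real node names and the rest of the config are left out):

# Old 7.6.1 role settings on a data node
# node.master: false
# node.data: true

# Replaced in 7.10.2 with
node.roles: data_cold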

When I do a fresh install of 7.10.2 with the same config and cluster settings, it does use the data_cold role as expected.
I have typed out a simple log of the steps to recreate this problem if anyone wants to reproduce it.

Note: upgrading to the newest version of Elasticsearch also did not fix it.

Can you share your config, please?

cluster.name: test-cluster
node.name: elkm02
path.data: /var/lib/elasticsearch
path.logs: /var/log/elasticsearch
network.host: 192.168.1.5
discovery.seed_hosts: ["192.168.1.1", "192.168.1.2", "192.168.1.3", "192.168.1.5"]
cluster.initial_master_nodes: ["192.168.1.1", "192.168.1.5"]
node.roles: master, ingest

Of course. This is the config of the cluster I managed to recreate the problem in.
This is the second test master; the only changes between the nodes are the name and the roles.

Any luck finding something? I can post the steps I took to recreate the problem if that helps.

Could you share the output of GET _cat/nodes from the cluster exhibiting the problem?
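Something like this, with the v flag so the column headers are included as well:

GET _cat/nodes?v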

If after completing the upgrade you do a further rolling restart (i.e. restart all nodes, one-by-one) does the problem persist?

ip          heap.percent ram.percent cpu load_1m load_5m load_15m node.role master name
192.168.1.3           56          95   4    0.33    0.14     0.13 hsw       -      elkd02
192.168.1.4            8          95   5    0.16    0.03     0.05 c         -      elkd03
192.168.1.5           34          95   2    0.08    0.02     0.03 im        *      elkm02
192.168.1.2           15          95   1    0.62    0.86     0.87 hsw       -      elkd01
192.168.1.1           46          94   2    0.29    0.25     0.22 im        -      elkm01

Yes, even if I take the entire cluster down and bring it back up, the problem still persists.

OK, would you use the cluster allocation explain API to explain the allocation of one of the shards you think is allocated in the wrong place?

{
  "error" : {
    "root_cause" : [
      {
        "type" : "illegal_argument_exception",
        "reason" : "unable to find any unassigned shards to explain [ClusterAllocationExplainRequest[useAnyUnassignedShard=true,includeYesDecisions?=false]"
      }
    ],
    "type" : "illegal_argument_exception",
    "reason" : "unable to find any unassigned shards to explain [ClusterAllocationExplainRequest[useAnyUnassignedShard=true,includeYesDecisions?=false]"
  },
  "status" : 400
}

That is the weird thing. The server thinks it's all fine, even though there are hot/warm indices on the cold node.

You need to tell the API which shard to explain, otherwise it just picks a random unassigned one and fails if all shards are assigned.
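For example, something like this, substituting one of your own index names and shard numbers (the index name here is only a placeholder):

GET _cluster/allocation/explain
{
  "index": "my-index",
  "shard": 0,
  "primary": true
}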

Sorry, my bad.
Here it is:

{
  "index" : "my-index-000006",
  "shard" : 0,
  "primary" : true,
  "current_state" : "started",
  "current_node" : {
    "id" : "cFGH4_FKRoKPlEg8rPD6Mg",
    "name" : "elkd03",
    "transport_address" : "192.168.1.4:9300",
    "attributes" : {
      "xpack.installed" : "true",
      "transform.node" : "false"
    },
    "weight_ranking" : 1
  },
  "can_remain_on_current_node" : "yes",
  "can_rebalance_cluster" : "yes",
  "can_rebalance_to_other_node" : "no",
  "rebalance_explanation" : "cannot rebalance as no target node exists that can both allocate this shard and improve the cluster balance",
  "node_allocation_decisions" : [
    {
      "node_id" : "PEb7VU_1RKa8q3HN8J7LCA",
      "node_name" : "elkd02",
      "transport_address" : "192.168.1.3:9300",
      "node_attributes" : {
        "xpack.installed" : "true",
        "transform.node" : "false"
      },
      "node_decision" : "worse_balance",
      "weight_ranking" : 1
    },
    {
      "node_id" : "iNdDsFRiSquDN_V8iXwSPA",
      "node_name" : "elkd01",
      "transport_address" : "192.168.1.2:9300",
      "node_attributes" : {
        "xpack.installed" : "true",
        "transform.node" : "false"
      },
      "node_decision" : "worse_balance",
      "weight_ranking" : 1
    }
  ]
}

This is a newly created index with one shard, and that shard is located on the cold node.

OK, this shard can be allocated to all three data nodes. What does GET /my-index-000006/_settings return? What steps have you taken to exclude it from the cold node?

{
  "my-index-000006" : {
    "settings" : {
      "index" : {
        "creation_date" : "1622533255026",
        "number_of_shards" : "1",
        "number_of_replicas" : "0",
        "uuid" : "aTg9qJNSQxOEJwl-IVU1wA",
        "version" : {
          "created" : "7060199",
          "upgraded" : "7100299"
        },
        "provided_name" : "my-index-000006"
      }
    }
  }
}

Currently, nothing outside of the roles on this test environment. If I recreate the same setup with a fresh install, the shard cannot be moved to that cold node, since the node does not meet the requirement:

[NO(index has a preference for tiers [data_content] and node does not meet the required [data_content] tier)]

Could it be that Elasticsearch has an internal mechanism that prevents the cold/hot/warm roles from taking effect until there are enough managed indices?
I have been testing that theory and it seems to hold, but then why do the roles work immediately with a fresh install?

This index has no settings restricting its allocation to any particular tier, so it can be allocated anywhere. If you check GET /$INDEX/_settings on your fresh install, you will see that newer indices do have allocation settings applied. If you want to restrict the older indices to particular tiers, you'll need to apply those settings yourself.
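For example, something along these lines would pin the older index to the hot and warm tiers (the tier list here is only illustrative, adjust it to whatever fits your setup):

PUT /my-index-000006/_settings
{
  "index.routing.allocation.include._tier_preference": "data_hot,data_warm"
}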

Weird. Then I still can't explain the behavior on the production server: it ignored the roles completely until I added a new cold policy to move a bunch of older indices to cold. But we already had a working hot/warm/cold ILM policy in place at that time.
