How to completely reset forced shard allocation awareness?

dschneiter · April 29, 2021, 7:40pm

While teaching shard allocation awareness and forced shard allocation awareness (Elasticsearch Engineer II course), one of the students made a typo in a request to configure forced shard allocation awareness, which broke the Elasticsearch cluster. I could not clean things up properly; I only found a workaround to make the cluster operational again.

What happened?

Start up a 4-node cluster, in which

node1 & node2 are tagged with attribute my_rack with value rack1, and
node3 & node4 are tagged with attribute my_rack with value rack2,
all but node4 are master-eligible
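
The node tagging above can be checked with the cat nodeattrs API. A minimal sketch, assuming the attribute was set as node.attr.my_rack in each node's elasticsearch.yml (the node configuration itself is not shown in this post):

```
GET _cat/nodeattrs?v&h=node,attr,value
```

Each node should list my_rack with the value expected for its rack.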

Configure shard allocation awareness (still all fine):

PUT _cluster/settings
{
  "transient": {
    "cluster": {
      "routing": {
        "allocation.awareness.attributes": "my_rack"
      }
    }
  }
}

Shut down node3 & node4 and wait for the shards to be reallocated.
Now configure forced shard allocation awareness with the following command:

PUT _cluster/settings
{
  "persistent": {
    "cluster": {
      "routing": {
        "allocation.awareness.attributes": "my_rack",
        "allocation.awareness.force.my_rack_values": "rack1,rack2"
      }
    }
  }
}

Note the typo: the student accidentally typed an underscore (instead of a dot) to separate the attribute name my_rack from the values keyword. The request is accepted, but it renders the cluster non-functional: nodes seem to quit and are no longer able to join because the setting does not contain an expected ".".
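
For comparison, the request as it was presumably intended, with a dot between the attribute name and the values keyword. This is a sketch that reuses the attribute name and rack values from above:

```
PUT _cluster/settings
{
  "persistent": {
    "cluster": {
      "routing": {
        "allocation.awareness.attributes": "my_rack",
        "allocation.awareness.force.my_rack.values": "rack1,rack2"
      }
    }
  }
}
```

The only difference from the broken request is my_rack.values in place of my_rack_values.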

As a workaround I stopped all nodes, added a second attribute my_rack_values, and also put the shard allocation awareness configuration using this bogus attribute name into the yml file. This allowed all nodes to start up again and made the cluster operational, no longer constantly throwing exceptions that nodes cannot join and that there is no master node. Obviously, this leaves a wrong, unwanted attribute name in place.

After the successful startup I tried to change the shard allocation awareness configuration back to the proper attribute name, but it seems that the values for forced shard allocation awareness just get added, not replaced. Initially everything worked, but after shutting down the nodes and commenting out the bogus attribute name and shard allocation rules, the nodes again would not start up properly.

I tried to reset forced shard allocation awareness by setting the property to null, but I haven't managed to come up with a syntax that works. In the end I was not able to clean up / recover from this bogus statement:

"cluster.routing.allocation.awareness.force.my_rack_values": "rack1,rack2"

This happened with Elasticsearch 7.8.1. Any ideas how to get such a "typo" fixed? I was a bit surprised to see how fatal such a typo can be...
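
For reference, a persistent cluster setting is normally removed by assigning it null. A sketch of that request for the bogus key, assuming the cluster can still elect a master and accept settings updates, which may not hold in the broken state described above:

```
PUT _cluster/settings
{
  "persistent": {
    "cluster.routing.allocation.awareness.force.my_rack_values": null
  }
}
```

Whether the cluster accepts this request for an already-invalid key is not confirmed anywhere in this thread.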

DavidTurner · April 29, 2021, 8:22pm

That appears to be a nasty bug. Would you open an issue for it on GitHub? Please include the logs in the report, including the exception. The invalid setting update should be rejected.

You can technically recover by shutting the whole cluster down and running elasticsearch-node remove-settings cluster.routing.allocation.awareness.force.my_rack_values on every node, but we don't encourage that sort of thing nor do we think it's a good long-term solution. Best to report it as a bug.

DavidTurner · April 29, 2021, 8:36pm

FWIW this is rather simple to reproduce: you only need one node with basically any config, and you just have to do this one thing to destroy it:

PUT /_cluster/settings
{
  "persistent": {
    "cluster.routing.allocation.awareness.force.nonsense": ""
  }
}

dschneiter · April 29, 2021, 9:21pm

Thanks @DavidTurner

Before raising it as a bug I wanted to raise it here for clarification. Unfortunately I can't get hold of the logs any longer as the instance has already been wiped.

Elasticsearch issue: Wrong forced shard allocation setting crashing the cluster · Issue #72524 · elastic/elasticsearch · GitHub

Daniel
