Recovering from cluster.blocks.read_only=true in dynamic settings

I managed to get cluster.blocks.read_only=true into my persistent dynamic settings. Unfortunately, when the cluster is read-only it doesn't seem to allow changes to the dynamic settings, so I can't figure out how to get out of this situation.

$ cat /tmp/dynamic-settings.json
{
    "transient": {
        "cluster.blocks.read_only": "false"
    },
    "persistent": {
        "cluster.blocks.read_only": "false"
    }
}
$ curl -XPUT localhost:9200/_cluster/settings?pretty -d @- < /tmp/dynamic-settings.json
{
  "error" : "RemoteTransportException[[seldlx3178][inet[/10.128.247.145:9300]][cluster:admin/settings/update]]; nested: ClusterBlockException[blocked by: [FORBIDDEN/6/cluster read-only (api)];]; ",
  "status" : 403
}

I'm able to set read_only=false as a transient setting only (like above, but with the persistent setting omitted). I then end up with read_only=false as a transient setting, but read_only=true remains as a persistent setting and seems to take precedence.
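A sketch of what such a transient-only request could look like, assuming the same localhost:9200 endpoint as above. The function names transient_only_body and clear_transient_block are made up for illustration and are not part of Elasticsearch.

```shell
# Print a settings body that contains only the transient key, i.e. the
# body from the original post with the "persistent" section omitted.
# (Function name is hypothetical.)
transient_only_body() {
    cat <<'EOF'
{
    "transient": {
        "cluster.blocks.read_only": "false"
    }
}
EOF
}

# Send that body to the cluster settings endpoint used in the original post.
# Wrapped in a function so nothing is sent until it is called explicitly.
clear_transient_block() {
    transient_only_body | curl -XPUT 'localhost:9200/_cluster/settings?pretty' -d @-
}
```

Calling clear_transient_block sends the request; transient_only_body alone just prints the JSON so it can be inspected first.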

This is on a test cluster so it's okay to e.g. lose some state or do a full cluster restart but it would be nice to keep the data in the indexes. Help? I'm running Elasticsearch 1.7.1.

I think you'll need a full cluster restart to get around this one, unfortunately.

Huh. A full cluster restart actually did help, which I find surprising since the problematic cluster.blocks.read_only setting was set as a persistent setting.

But it seems it only worked when the request set just the persistent setting. When I set the persistent and transient settings in the same request (as in the original post above) after the restart, I still got the complaint about the cluster being read-only. When I gathered a shell transcript as evidence for a bug report a few minutes later, the following request worked:

$ curl -XPUT localhost:9200/_cluster/settings?pretty -d '
{
  "persistent" : {
    "cluster" : {
      "blocks" : {
        "read_only" : "false"
      }
    }
  }
}'
{
  "acknowledged" : true,
  "persistent" : {
    "cluster" : {
      "blocks" : {
        "read_only" : "false"
      }
    }
  },
  "transient" : { }
}
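
To double-check what is stored at each level after a change like this, the settings can be read back with a GET on the same endpoint. The following is a rough sketch, again assuming localhost:9200; both function names are hypothetical, and the check is a crude text match on the response format shown above rather than real JSON parsing.

```shell
# Read back the current transient and persistent cluster settings.
# Assumes the same localhost:9200 endpoint as the commands above.
show_cluster_settings() {
    curl -s -XGET 'localhost:9200/_cluster/settings?pretty'
}

# Succeed if the settings JSON on stdin still has read_only set to "true"
# at any level. Whitespace is stripped first so that pretty-printed and
# compact responses are treated the same. This is a plain text match,
# not a JSON parser.
has_read_only_block() {
    tr -d ' \n\t' | grep -q 'read_only":"true"'
}
```

For example, `show_cluster_settings | has_read_only_block && echo "still read-only"` would print the message only while one of the two levels still carries read_only=true.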

I'm glad I got out of this, but is there any legitimate reason why this setting should be allowed as a persistent setting? How is this setting supposed to work? Are you supposed to be able to disable the read-only state if it has been enabled as a transient setting, or is a full restart supposed to be the only way out? And given the potentially severe impact of such an operation, is that reasonable? The documentation,

Have the whole cluster read only (indices do not accept write operations), metadata is not allowed to be modified (create or delete indices).

is pretty terse and perhaps even misleading, since it's easy to read it as saying that only index creation and deletion are affected by the setting.

There are potential reasons, but it seems like very much an edge case to me.

I'd raise a GitHub issue and see what the dev team reckon. We strive to be as safe as possible with ES, so perhaps they can elaborate on why this exists, or otherwise remove it.

Issue created: