`cluster.routing.allocation.enable: none` set by ES?

Hi,

We had four of our nine nodes run into OOM issues the other day, and the Elasticsearch processes were subsequently killed. They were later restarted automatically (we have a curator systemd unit with Requires=elasticsearch.service) and failed with the following stack trace: gist

This is a bit odd, as 24 hours later they were started again without any of the above errors (our metrics don't show that the nodes were out of disk either). All nodes did, however, log the following, even though no operator disabled cluster shard allocation at the time:

Jun 11 00:00:31 <hostname>.ec2.internal rkt[27354]: [5734381.586362] elasticsearch[6]: [2018-06-11T00:00:31,477][INFO ][o.e.c.s.ClusterSettings  ] [rkt-26a7e9c0-3cf1-4758-810c-23470856e39e] updating [cluster.routing.allocation.enable] from [ALL] to [none]

Checking the cluster settings confirmed that it had been stored as a persistent setting as well:

$ curl --silent "http://127.0.0.1:9200/_cluster/settings" | jq .
{
  "persistent": {
    "cluster": {
      "routing": {
        "allocation": {
          "enable": "none"
        }
      }
    }
  },
  "transient": {}
}
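
For reference, this is roughly how the persistent setting could be cleared again so that the default (all) applies. It is only a sketch: it assumes the same local endpoint as the curl call above, the ES_URL variable and the reset_allocation helper are names made up for illustration, and it relies on the cluster update settings API treating a null value as "remove this setting".

```shell
# Assumed endpoint: the same local node queried above; override ES_URL if needed.
ES_URL="${ES_URL:-http://127.0.0.1:9200}"

# Assigning null to a persistent key removes it, so the built-in default applies again.
reset_body='{"persistent":{"cluster.routing.allocation.enable":null}}'

# Hypothetical helper: sends the reset request to the cluster settings API.
reset_allocation() {
  curl --silent -X PUT "$ES_URL/_cluster/settings" \
    -H 'Content-Type: application/json' \
    -d "$reset_body"
}

# Run manually once you are sure allocation should be enabled again:
# reset_allocation | jq .
```

The call is left commented out on purpose, since re-enabling allocation on a cluster that is still recovering will start moving shards straight away.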

Is there any scenario where Elasticsearch will automatically disable cluster shard allocation? If not, the other possible explanation is that an operator disabled it at some point in the past (though it has been several weeks since we last replaced nodes). Is shard allocation only disabled for shards that existed when the setting was changed, or does it also prevent new shards from being created?
