Replicas ignoring allocation rules

Hi,
I've set up a small testing cluster with a hot-warm architecture: one hot node and one warm node. The hot node is also the master and ingest node; the warm node is data-only and coordinating. I have an ILM policy which rotates indices from hot to warm after 24h.

However, I have a problem with replicas. When a new index is created every day (time-based data from Logstash), the replica of the new index seems to ignore the allocation rules. What's more, after ILM rotates the index, the primary ends up on the warm node while the replica stays on the hot node.

I've tried manually decreasing the number of replicas to 0. This works as expected and all shards are allocated to the proper nodes. When I increase the number of replicas back to one, the replica stays unassigned. This is what I would have expected in the first place when the criteria for replicas can't be met.

I looked at cluster.routing.allocation.allow_rebalance, which seemed to be preventing reallocation when the index is not / won't be healthy after a rebalance. I changed it from the default indices_all_active to always, but this did not solve the problem.
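
For reference, I applied the setting with a cluster-settings update along these lines (shown here as a persistent setting; a transient one works the same way):

PUT _cluster/settings
{
  "persistent": {
    "cluster.routing.allocation.allow_rebalance": "always"
  }
}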

Observed behavior:
Replicas on a new index seem to ignore allocation rules, even after rotation with ILM.

This sounds very surprising. Can you share more detailed information about the allocation rules that you've set up for the respective index and also share the allocation explain API output for the replicas that are allocated to the wrong node?

GET _cat/shards

syslog-2019.07.26-000009 0 p STARTED 551317 81mb 172.25.4.176 warm-logging-elk-3
syslog-2019.07.26-000009 0 r STARTED 551317 81.9mb 172.25.4.175 hot-logging-elk-2

GET syslog-2019.07.26*/_settings
"syslog-2019.07.26-000009" : {
    "settings" : {
      "index" : {
        "lifecycle" : {
          "name" : "logstash-policy",
          "rollover_alias" : "syslog",
          "indexing_complete" : "true"
        },
        "routing" : {
          "allocation" : {
            "require" : {
              "data" : "warm"
            }
          }
        },
        "refresh_interval" : "5s",
        "number_of_shards" : "1",
        "provided_name" : "<syslog-{now/d}-000009>",
        "creation_date" : "1564136984891",
        "priority" : "100",
        "number_of_replicas" : "1",
        "version" : {
          "created" : "7020099"
        }
      }
    }
  }
GET _nodes
{
  "_nodes": {
    "total": 2,
    "successful": 2,
    "failed": 0
  },
  "cluster_name": "elk",
  "nodes": {
    "4cYRknWaRSiff32Y-A29ng": {
      "name": "warm-logging-elk-3",
      "version": "7.2.0",
      "build_flavor": "default",
      "build_type": "deb",
      "build_hash": "508c38a",
      "roles": [
        "data"
      ],
      "attributes": {
        "ml.machine_memory": "21048377344",
        "ml.max_open_jobs": "20",
        "xpack.installed": "true",
        "data": "warm"
      }
    },
    "lCBklgj7RCat6JP1i3V3tg": {
      "name": "hot-logging-elk-2",
      "version": "7.2.0",
      "build_flavor": "default",
      "build_type": "deb",
      "build_hash": "508c38a",
      "roles": [
        "data",
        "ingest",
        "master"
      ],
      "attributes": {
        "ml.machine_memory": "15763595264",
        "xpack.installed": "true",
        "data": "hot",
        "ml.max_open_jobs": "20"
      }
    }
  }
}
  • all nodes have settings that contain
"settings.cluster" : {
  "name" : "elk",
  "routing" : {
    "allocation" : {
      "allow_rebalance" : "always"
    }
  }
}
GET /_cluster/allocation/explain
{
  "index" : "syslog-2019.07.26-000009",
  "shard" : 0,
  "primary" : false,
  "current_state" : "started",
  "current_node" : {
    "id" : "lCBWDgj7RCat6JP1i3V3tg",
    "name" : "hot-logging-elk",
    "transport_address" : "172.25.4.175:9300",
    "attributes" : {
      "ml.machine_memory" : "15763595264",
      "ml.max_open_jobs" : "20",
      "xpack.installed" : "true",
      "data" : "hot"
    }
  },
  "can_remain_on_current_node" : "no",
  "can_remain_decisions" : [
    {
      "decider" : "filter",
      "decision" : "NO",
      "explanation" : """node does not match index setting [index.routing.allocation.require] filters [data:"warm"]"""
    }
  ],
  "can_move_to_other_node" : "no",
  "move_explanation" : "cannot move shard to another node, even though it is not allowed to remain on its current node",
  "node_allocation_decisions" : [
    {
      "node_id" : "4cYRknWMRSiff32Y-A29ng",
      "node_name" : "warm-logging-elk-3",
      "transport_address" : "172.25.4.176:9300",
      "node_attributes" : {
        "ml.machine_memory" : "21048377344",
        "ml.max_open_jobs" : "20",
        "xpack.installed" : "true",
        "data" : "warm"
      },
      "node_decision" : "no",
      "weight_ranking" : 1,
      "deciders" : [
        {
          "decider" : "same_shard",
          "decision" : "NO",
          "explanation" : "the shard cannot be allocated to the same node on which a copy of the shard already exists [[syslog-2019.07.26-000009][0], node[4cYRknWMRSiff32Y-A29ng], [P], s[STARTED], a[id=2vTwS3Z1TGyXbjqfriKKNA]]"
        }
      ]
    }
  ]
}

Executing

PUT syslog*/_settings
{
    "index" : {
        "number_of_replicas" : 0
    }
}
PUT syslog*/_settings
{
    "index" : {
        "number_of_replicas" : 1
    }
}

forces the replica to become unassigned, resulting in

syslog-2019.07.26-000009 0 p STARTED      551317   81mb 172.25.4.176 warm-logging-elk-3
syslog-2019.07.26-000009 0 r UNASSIGNED  

Restarting the hot node seems to have the same effect.

Based on the output from allocation/explain, it seems that Elasticsearch decides to keep the replica there on purpose, even though cluster.routing.allocation.allow_rebalance: always should allow a rebalance that leaves the cluster in an unhealthy state.

Were there any allocation filter settings (index.routing.allocation.require.*) applied when the index was created? If not, then primary and replica shards can obviously be allocated to all nodes. This means that if the filter settings are only put in place afterwards, the system won't actively destroy the shard copy on the hot node but will try to move it elsewhere (and there's no place to put it, so there's nothing it can do). The solution here would be to put the index.routing.allocation.require.data: hot setting in place when the index is created (either directly on the index creation request, or in a template that applies to the newly created indices).
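
A minimal legacy template along those lines might look like this (the template name is a placeholder; the lifecycle name and rollover alias are taken from the index settings you posted):

PUT _template/syslog-template
{
  "index_patterns": ["syslog-*"],
  "settings": {
    "index.routing.allocation.require.data": "hot",
    "index.lifecycle.name": "logstash-policy",
    "index.lifecycle.rollover_alias": "syslog"
  }
}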


This seems to be what happens even if there were allocation filter settings in place when the index was created, if the number of replicas is too large. When the index is hot it allocates the primary to a hot node but leaves the replica unassigned; when you switch the index over to warm it allocates the unassigned replica to a warm node (NB not a primary relocation, just a regular replica allocation) and will not then destroy the primary on the hot node, since there's nowhere else to put it:

    public void testNumberOfReplicasOverridesAllocationRules() {
        internalCluster().startMasterOnlyNode();
        final String node1 = internalCluster().startDataOnlyNode();
        final String node2 = internalCluster().startDataOnlyNode();

        createIndex("test", Settings.builder()
            .put(IndexMetaData.SETTING_NUMBER_OF_SHARDS, 1)
            .put(IndexMetaData.SETTING_NUMBER_OF_REPLICAS, 1)
            .put(IndexMetaData.INDEX_ROUTING_REQUIRE_GROUP_PREFIX + "._name", node1)
            .build());

        ensureYellowAndNoInitializingShards("test");

        assertAcked(client().admin().indices().prepareUpdateSettings("test").setSettings(Settings.builder()
            .put(IndexMetaData.INDEX_ROUTING_REQUIRE_GROUP_PREFIX + "._name", node2)));

        ensureGreen("test");
    }

The solution, @retep007, is to set number_of_replicas: 0 since there is no way you can have replicas in this cluster. In a sense the number_of_replicas setting takes precedence over the allocation rules: Elasticsearch will prefer to keep a badly-allocated copy of a shard around for redundancy rather than destroy it as you seem to want. Rebalancing will also never destroy a copy of a shard - it'll only ever move the existing shards around.
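
This could be wired directly into the ILM policy via the allocate action, for example (a sketch showing only the warm phase; the min_age value is a placeholder for your 24h rotation):

PUT _ilm/policy/logstash-policy
{
  "policy": {
    "phases": {
      "warm": {
        "min_age": "24h",
        "actions": {
          "allocate": {
            "number_of_replicas": 0,
            "require": { "data": "warm" }
          }
        }
      }
    }
  }
}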

Thank you both @DavidTurner and @ywelsch. :slight_smile:
