Elasticsearch cluster shard allocation not even

Hi,
I have an Elasticsearch cluster with 3 nodes: two of them are Master + Data nodes and the third is a Data-only node.
I had a situation where the Data-only node had twice as many shards allocated to it as the two Master nodes. It looked like all the replica shards from the Master nodes were being allocated to the Data-only node. To balance things out I temporarily removed one of the Master nodes from the cluster. Once I did this, the shards levelled out between the remaining Master node and the Data node, and both were at 158 shards each.
At this point I reintroduced the second Master back into the cluster, hoping that the shards would spread evenly across the 3 nodes.
Currently it looks like the shards are being re-allocated across the cluster, but they seem to be moving only from the existing Master node to the reintroduced Master node, rather than from both the Data node and the Master node.
Any ideas why this is happening? I have 500GB on each node, but the data is not being spread evenly, as the Data-only node holds the majority of it.
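For reference, the per-node shard counts and disk usage I'm quoting can be seen with the cat APIs, e.g.:

GET _cat/allocation?v
GET _cat/shards?v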
These are the settings for the cluster:
{
  "persistent": {
    "cluster": {
      "routing": {
        "rebalance": {
          "enable": "all"
        },
        "allocation": {
          "allow_rebalance": "always",
          "enable": "all",
          "exclude": {
            "_ip": ""
          }
        }
      }
    }
  },
  "transient": {
    "cluster": {
      "routing": {
        "rebalance": {
          "enable": "all"
        },
        "allocation": {
          "allow_rebalance": "always",
          "include": {
            "_ip": ""
          },
          "exclude": {
            "_ip": ""
          },
          "enable": "all"
        }
      }
    }
  }
}

How can I get the load to be spread evenly, rather than the Data node holding most of the replica and primary shards?

Thanks

Your cluster settings contain a mix of persistent and transient settings, which can cause confusion. Additionally, many of these settings look to be set to their defaults. I'd recommend cleaning them up, and in future only using the persistent settings (the transient ones can evaporate sometimes, which only adds to the confusion):

PUT _cluster/settings
{
  "persistent": {
    "cluster": {
      "routing": {
        "rebalance": {
          "enable": null
        },
        "allocation": {
          "enable": null,
          "exclude": {
            "_ip": null
          }
        }
      }
    }
  },
  "transient": {
    "cluster": {
      "routing": {
        "rebalance": {
          "enable": null
        },
        "allocation": {
          "allow_rebalance": null,
          "include": {
            "_ip": null
          },
          "exclude": {
            "_ip": null
          },
          "enable": null
        }
      }
    }
  }
}

However, the settings you quoted do not look surprising. You can also set some of these allocation settings in the elasticsearch.yml config file - are there any in your config? In addition to the cluster-wide allocation settings there are index-level ones. Are any of those set on your indices?
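To check, you can list any index-level routing/allocation settings across all indices with something like:

GET _all/_settings/index.routing*

If that comes back empty for every index, then nothing at the index level is overriding the cluster-wide behaviour.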

There are no further cluster settings in the elasticsearch.yml file. I have some basic settings in there:
cluster.name: elasticsearch-cluster
network.host: ec2
node.data: true
node.master: true
cloud.node.auto_attributes: true
discovery.ec2.tag.Type: thetag
discovery.zen.hosts_provider: ec2
cluster.routing.allocation.awareness.attributes: aws_availability_zone
cloud.aws.region: eu-west-1
xpack.monitoring.exporters:
  id1:
    type: http
    host: ["myip"]

xpack.security.enabled: false

There are no index-level settings that I have set. It is probably worth mentioning that the cluster is being used by Graylog rather than the traditional ELK stack, so I'm not sure whether Graylog allocates shards to the nodes using its own methods.

For the time being I have turned the Data node into a Master + Data node and excluded it from the cluster routing, hoping the other nodes can "catch up", at which point I will put it back into the cluster.
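The exclusion I applied looks roughly like this (with the node's actual IP in place of the placeholder):

PUT _cluster/settings
{
  "transient": {
    "cluster.routing.allocation.exclude._ip": "x.x.x.x"
  }
}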

Are your two master nodes in the same aws_availability_zone?

The 2 Master nodes are in eu-west-1c and the other Data node is in eu-west-1b. I thought as long as they are in the same region it should work fine?

Ok, that explains it. You have allocation awareness enabled, which means Elasticsearch is trying to balance the shards of each index evenly across the zones. Since the two master nodes are in the same zone, they get roughly a quarter of the shards each. Move one to eu-west-1a, or disable allocation awareness, and you should see a more even spread of data.
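If you'd rather disable allocation awareness than move a node, a sketch of what that would involve: remove (or comment out) the awareness line from elasticsearch.yml on each node and restart them one at a time:

# elasticsearch.yml - remove or comment out this line on each node, then restart:
# cluster.routing.allocation.awareness.attributes: aws_availability_zone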

You should also set discovery.zen.minimum_master_nodes to 2 if you have two or three master-eligible nodes, or else you will at some point see a split brain. It probably makes sense to have all three nodes be master-eligible, because if you only have two of them then your cluster can't cope with the loss of either.
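For example, in 5.x that is a line in elasticsearch.yml on each master-eligible node (it can also be changed dynamically via the cluster settings API):

# elasticsearch.yml - with three master-eligible nodes the quorum is 2
discovery.zen.minimum_master_nodes: 2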

I've added a new Master + Data node to the cluster in eu-west-1; it is on version 5.6.15 and the other nodes are on 5.6.14. This is a minor version difference, so I'm guessing it should not be a problem?
I've excluded the other Master and the shards are being redistributed at the moment. Will see how it goes from here.
Thanks for the help.

We generally recommend having all the nodes on the same version, although the test suite does cover mixed-version clusters too. In this case the only change between 5.6.14 and 5.6.15 in Elasticsearch is very minor so you'll probably be fine.

