[ESv5.6] Cluster and shards problems

Hi all,

I have a cluster and shard problem on my ES cluster running v5.6.

Node01 :

# cat /etc/elasticsearch/elasticsearch.yml | grep -v "^#"
cluster.name: escluster
node.name: node-01
node.master: true
node.data: false
path.data: /data/elasticsearch
path.logs: /var/log/elasticsearch
network.host: ["127.0.0.1","10.0.0.1"]
http.port: 9200
discovery.zen.ping.multicast.enabled: false
discovery.zen.ping.unicast.hosts: ["10.0.0.1", "10.0.0.2", "10.0.0.3"]
discovery.zen.minimum_master_nodes: 2
action.auto_create_index: false

# cat /etc/sysconfig/elasticsearch | grep -v "^#"
LOG_DIR=/var/log/elasticsearch
ES_JAVA_OPTS="-Xms2g -Xmx2g"
ES_STARTUP_SLEEP_TIME=5

Node02 :

# cat /etc/elasticsearch/elasticsearch.yml | grep -v "^#"     
cluster.name: escluster
node.name: node-02
node.master: false
node.data: true
path.data: /data/elasticsearch
path.logs: /var/log/elasticsearch
network.host: ["127.0.0.1","10.0.0.2"]
http.port: 9200
discovery.zen.ping.multicast.enabled: false
discovery.zen.ping.unicast.hosts: ["10.0.0.1", "10.0.0.2", "10.0.0.3"]
discovery.zen.minimum_master_nodes: 2
action.auto_create_index: false

# cat /etc/sysconfig/elasticsearch | grep -v "^#"
LOG_DIR=/var/log/elasticsearch
ES_JAVA_OPTS="-Xms2g -Xmx2g"
ES_STARTUP_SLEEP_TIME=5

Node03 :

# cat /etc/elasticsearch/elasticsearch.yml | grep -v "^#"     
cluster.name: escluster
node.name: node-03
node.master: false
node.data: true
path.data: /data/elasticsearch
path.logs: /var/log/elasticsearch
network.host: ["127.0.0.1","10.0.0.3"]
http.port: 9200
discovery.zen.ping.multicast.enabled: false
discovery.zen.ping.unicast.hosts: ["10.0.0.1", "10.0.0.2", "10.0.0.3"]
discovery.zen.minimum_master_nodes: 2
action.auto_create_index: false

# cat /etc/sysconfig/elasticsearch | grep -v "^#"
LOG_DIR=/var/log/elasticsearch
ES_JAVA_OPTS="-Xms2g -Xmx2g"
ES_STARTUP_SLEEP_TIME=5

On each node, when I restart the service, it stays up for less than a minute and then goes into a failed state.

In the logs:

 unknown setting [discovery.zen.ping.multicast.enabled] please check that any required plugins are installed, or check the breaking changes documentation for removed settings
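Multicast discovery was removed from core Elasticsearch (it moved to a plugin in 2.0 and was later dropped entirely), so 5.x no longer recognizes this setting; commenting out or deleting the line fixes the startup failure. A minimal sketch of commenting it out with sed, demonstrated here on a temporary copy rather than the real /etc/elasticsearch/elasticsearch.yml:

```shell
# Demo on a temporary copy; on a real node, edit /etc/elasticsearch/elasticsearch.yml instead
CONF=$(mktemp)
printf 'cluster.name: escluster\ndiscovery.zen.ping.multicast.enabled: false\n' > "$CONF"

# Prefix the removed setting with '#' so Elasticsearch ignores it on startup
sed -i 's/^discovery\.zen\.ping\.multicast\.enabled/# &/' "$CONF"

cat "$CONF"
rm -f "$CONF"
```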

I commented out this line in the configuration file and retried: the service stays up. But if we check the logs:

[WARN ][o.e.d.z.ZenDiscovery     ] [node-01] not enough master nodes discovered during pinging (found [[Candidate{node={node-01}{-ZFUWniuRaeJOoHVnKw6fQ}{sgMbuNMIQHSgQDMuGflU6w}{10.0.0.1}{10.0.0.1:9300}, clusterStateVersion=-1}]], but needed [2]), pinging again

We changed the configuration to have 2 master-eligible nodes:

Node01 :

# cat /etc/elasticsearch/elasticsearch.yml | grep -v "^#"
cluster.name: escluster
node.name: node-01
node.master: true
node.data: false
path.data: /data/elasticsearch
path.logs: /var/log/elasticsearch
network.host: ["127.0.0.1","10.0.0.1"]
http.port: 9200
discovery.zen.ping.unicast.hosts: ["10.0.0.1", "10.0.0.2", "10.0.0.3"]
discovery.zen.minimum_master_nodes: 2
action.auto_create_index: false

Node02 :

# cat /etc/elasticsearch/elasticsearch.yml | grep -v "^#"     
cluster.name: escluster
node.name: node-02
node.master: true
node.data: true
path.data: /data/elasticsearch
path.logs: /var/log/elasticsearch
network.host: ["127.0.0.1","10.0.0.2"]
http.port: 9200
discovery.zen.ping.unicast.hosts: ["10.0.0.1", "10.0.0.2", "10.0.0.3"]
discovery.zen.minimum_master_nodes: 2
action.auto_create_index: false

Node03 :

# cat /etc/elasticsearch/elasticsearch.yml | grep -v "^#"     
cluster.name: escluster
node.name: node-03
node.master: false
node.data: true
path.data: /data/elasticsearch
path.logs: /var/log/elasticsearch
network.host: ["127.0.0.1","10.0.0.3"]
http.port: 9200
discovery.zen.ping.unicast.hosts: ["10.0.0.1", "10.0.0.2", "10.0.0.3"]
discovery.zen.minimum_master_nodes: 2
action.auto_create_index: false

We restarted the ES service on all 3 nodes: no errors and no warnings.
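At this point it is worth confirming that all three nodes really joined the cluster; a sketch using the _cat/nodes API (assuming the default localhost:9200 endpoint shown elsewhere in this thread):

```shell
# List the nodes the elected master currently knows about, with role and master marker
curl -XGET 'localhost:9200/_cat/nodes?v&h=name,ip,node.role,master'
```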

Now I check the index and shard status:

# curl -XGET 'localhost:9200/_cluster/health?pretty=true'
{
  "cluster_name" : "escluster",
  "status" : "red",
  "timed_out" : false,
  "number_of_nodes" : 3,
  "number_of_data_nodes" : 2,
  "active_primary_shards" : 8,
  "active_shards" : 16,
  "relocating_shards" : 0,
  "initializing_shards" : 0,
  "unassigned_shards" : 8,
  "delayed_unassigned_shards" : 0,
  "number_of_pending_tasks" : 0,
  "number_of_in_flight_fetch" : 0,
  "task_max_waiting_in_queue_millis" : 0,
  "active_shards_percent_as_number" : 66.66666666666666
}

The cluster status is red...
I check the cluster state:

# curl -XGET http://127.0.0.1:9200/_cluster/state?pretty
{
  "cluster_name" : "escluster",
  "version" : 16,
  "state_uuid" : "KeEWQGNbROabuOLFSN_xLw",
  "master_node" : "-ZFUWniuRaeJOoHVnKw6fQ",
  "blocks" : { },
  "nodes" : {
 [3 nodes in cluster]
  },
[...]

      "index_0" : {
        "shards" : {
          "3" : [
            {
              "state" : "UNASSIGNED",
              "primary" : true,
              "node" : null,
              "relocating_node" : null,
              "shard" : 3,
              "index" : "index_0",
              "recovery_source" : {
                "type" : "EXISTING_STORE"
              },
              "unassigned_info" : {
                "reason" : "CLUSTER_RECOVERED",
                "at" : "2017-12-19T09:34:24.810Z",
                "delayed" : false,
                "allocation_status" : "no_valid_shard_copy"
              }
            },
            {
              "state" : "UNASSIGNED",
              "primary" : false,
              "node" : null,
              "relocating_node" : null,
              "shard" : 3,
              "index" : "index_0",
              "recovery_source" : {
                "type" : "PEER"
              },
              "unassigned_info" : {
                "reason" : "CLUSTER_RECOVERED",
                "at" : "2017-12-19T09:34:24.810Z",
                "delayed" : false,
                "allocation_status" : "no_attempt"

(output truncated due to the post size limit)

We see several shards in UNASSIGNED state.
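Since 5.0, the cluster allocation explain API reports exactly why a shard stays unassigned; called with no body, it explains the first unassigned shard it finds (endpoint assumed local):

```shell
# Ask the cluster why a shard cannot be allocated
curl -XGET 'localhost:9200/_cluster/allocation/explain?pretty'
```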

Could someone help me resolve the red status and the unassigned shards?

Thanks !!

Hi @alias

It is recommended to have at least 3 master-eligible nodes. "discovery.zen.minimum_master_nodes" only needs to be set on machines that are master-eligible, so you can remove it from node03.
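For reference, the quorum formula from the Elasticsearch docs is (master_eligible_nodes / 2) + 1; a quick shell check makes it concrete:

```shell
# minimum_master_nodes quorum rule: floor(master_eligible / 2) + 1
master_eligible=3
echo $(( master_eligible / 2 + 1 ))
```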

Hi @Fram_Souza

I deleted this parameter on node03.

I still get the same errors: cluster red and UNASSIGNED shards.

Any idea ?

Thanks

@alias

Have you already run _cluster/reroute?
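For reference, a reroute with retry_failed (available in 5.x) asks the cluster to retry shards whose allocation previously failed, without moving anything by hand:

```shell
# Retry allocation of shards that exhausted their automatic retries
curl -XPOST 'localhost:9200/_cluster/reroute?retry_failed=true&pretty'
```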

You only have 2 data nodes. Did you set the number of replicas to 2? If so, you need 3 data nodes.

You should always define minimum_master_nodes on all nodes in a cluster, not just the master-eligible ones.

What does the output from _cat/shards look like?
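A sketch of that call, selecting a few columns that surface the unassignment reason (endpoint assumed local):

```shell
# One line per shard: index, shard number, primary/replica, state, and why it is unassigned
curl -XGET 'localhost:9200/_cat/shards?v&h=index,shard,prirep,state,unassigned.reason'
```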

After several attempts, all shards are started and none are unassigned. The cluster is green.

I deleted all data and indices, and re-added the minimum_master_nodes parameter.

Thank you!

@alias Please check the Elasticsearch version on all the nodes. I had a similar issue where shards were not getting allocated because I had built a cluster out of nodes running different ES versions.

@Abhilash_Bolla thanks, but that was one of the first things I checked, and all versions are OK (both ES and Lucene).

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.