[ESv5.6] Cluster and shard problems


#1

Hi all,

I have a cluster and shard allocation problem on my ES cluster running v5.6.

Node01:

# cat /etc/elasticsearch/elasticsearch.yml | grep -v "^#"
cluster.name: escluster
node.name: node-01
node.master: true
node.data: false
path.data: /data/elasticsearch
path.logs: /var/log/elasticsearch
network.host: ["127.0.0.1","10.0.0.1"]
http.port: 9200
discovery.zen.ping.multicast.enabled: false
discovery.zen.ping.unicast.hosts: ["10.0.0.1", "10.0.0.2", "10.0.0.3"]
discovery.zen.minimum_master_nodes: 2
action.auto_create_index: false

# cat /etc/sysconfig/elasticsearch | grep -v "^#"
LOG_DIR=/var/log/elasticsearch
ES_JAVA_OPTS="-Xms2g -Xmx2g"
ES_STARTUP_SLEEP_TIME=5

Node02:

# cat /etc/elasticsearch/elasticsearch.yml | grep -v "^#"     
cluster.name: escluster
node.name: node-02
node.master: false
node.data: true
path.data: /data/elasticsearch
path.logs: /var/log/elasticsearch
network.host: ["127.0.0.1","10.0.0.2"]
http.port: 9200
discovery.zen.ping.multicast.enabled: false
discovery.zen.ping.unicast.hosts: ["10.0.0.1", "10.0.0.2", "10.0.0.3"]
discovery.zen.minimum_master_nodes: 2
action.auto_create_index: false

# cat /etc/sysconfig/elasticsearch | grep -v "^#"
LOG_DIR=/var/log/elasticsearch
ES_JAVA_OPTS="-Xms2g -Xmx2g"
ES_STARTUP_SLEEP_TIME=5

Node03:

# cat /etc/elasticsearch/elasticsearch.yml | grep -v "^#"     
cluster.name: escluster
node.name: node-03
node.master: false
node.data: true
path.data: /data/elasticsearch
path.logs: /var/log/elasticsearch
network.host: ["127.0.0.1","10.0.0.3"]
http.port: 9200
discovery.zen.ping.multicast.enabled: false
discovery.zen.ping.unicast.hosts: ["10.0.0.1", "10.0.0.2", "10.0.0.3"]
discovery.zen.minimum_master_nodes: 2
action.auto_create_index: false

# cat /etc/sysconfig/elasticsearch | grep -v "^#"
LOG_DIR=/var/log/elasticsearch
ES_JAVA_OPTS="-Xms2g -Xmx2g"
ES_STARTUP_SLEEP_TIME=5

On each node, when I restart the service, it stays up for less than 1 minute and then switches to a failed state.

In the logs:

 unknown setting [discovery.zen.ping.multicast.enabled] please check that any required plugins are installed, or check the breaking changes documentation for removed settings

This setting no longer exists in 5.x, so I commented out that line in the configuration file and retried: the service stays up. Checking the logs:

[WARN ][o.e.d.z.ZenDiscovery     ] [node-01] not enough master nodes discovered during pinging (found [[Candidate{node={node-01}{-ZFUWniuRaeJOoHVnKw6fQ}{sgMbuNMIQHSgQDMuGflU6w}{10.0.0.1}{10.0.0.1:9300}, clusterStateVersion=-1}]], but needed [2]), pinging again

With only one master-eligible node, the quorum of 2 required by discovery.zen.minimum_master_nodes can never be reached, so we change the configuration to have 2 master-eligible nodes:

Node01:

# cat /etc/elasticsearch/elasticsearch.yml | grep -v "^#"
cluster.name: escluster
node.name: node-01
node.master: true
node.data: false
path.data: /data/elasticsearch
path.logs: /var/log/elasticsearch
network.host: ["127.0.0.1","10.0.0.1"]
http.port: 9200
discovery.zen.ping.unicast.hosts: ["10.0.0.1", "10.0.0.2", "10.0.0.3"]
discovery.zen.minimum_master_nodes: 2
action.auto_create_index: false

Node02:

# cat /etc/elasticsearch/elasticsearch.yml | grep -v "^#"     
cluster.name: escluster
node.name: node-02
node.master: true
node.data: true
path.data: /data/elasticsearch
path.logs: /var/log/elasticsearch
network.host: ["127.0.0.1","10.0.0.2"]
http.port: 9200
discovery.zen.ping.unicast.hosts: ["10.0.0.1", "10.0.0.2", "10.0.0.3"]
discovery.zen.minimum_master_nodes: 2
action.auto_create_index: false

Node03:

# cat /etc/elasticsearch/elasticsearch.yml | grep -v "^#"     
cluster.name: escluster
node.name: node-03
node.master: false
node.data: true
path.data: /data/elasticsearch
path.logs: /var/log/elasticsearch
network.host: ["127.0.0.1","10.0.0.3"]
http.port: 9200
discovery.zen.ping.unicast.hosts: ["10.0.0.1", "10.0.0.2", "10.0.0.3"]
discovery.zen.minimum_master_nodes: 2
action.auto_create_index: false

We restart the ES service on all 3 nodes: no errors and no warnings.

Now I check the index and shard status:

# curl -XGET 'localhost:9200/_cluster/health?pretty=true'
{
  "cluster_name" : "escluster",
  "status" : "red",
  "timed_out" : false,
  "number_of_nodes" : 3,
  "number_of_data_nodes" : 2,
  "active_primary_shards" : 8,
  "active_shards" : 16,
  "relocating_shards" : 0,
  "initializing_shards" : 0,
  "unassigned_shards" : 8,
  "delayed_unassigned_shards" : 0,
  "number_of_pending_tasks" : 0,
  "number_of_in_flight_fetch" : 0,
  "task_max_waiting_in_queue_millis" : 0,
  "active_shards_percent_as_number" : 66.66666666666666
}

The cluster status is red...
So I check the cluster state:

# curl -XGET http://127.0.0.1:9200/_cluster/state?pretty
{
  "cluster_name" : "escluster",
  "version" : 16,
  "state_uuid" : "KeEWQGNbROabuOLFSN_xLw",
  "master_node" : "-ZFUWniuRaeJOoHVnKw6fQ",
  "blocks" : { },
  "nodes" : {
 [3 nodes in cluster]
  },
[...]

      "index_0" : {
        "shards" : {
          "3" : [
            {
              "state" : "UNASSIGNED",
              "primary" : true,
              "node" : null,
              "relocating_node" : null,
              "shard" : 3,
              "index" : "index_0",
              "recovery_source" : {
                "type" : "EXISTING_STORE"
              },
              "unassigned_info" : {
                "reason" : "CLUSTER_RECOVERED",
                "at" : "2017-12-19T09:34:24.810Z",
                "delayed" : false,
                "allocation_status" : "no_valid_shard_copy"
              }
            },
            {
              "state" : "UNASSIGNED",
              "primary" : false,
              "node" : null,
              "relocating_node" : null,
              "shard" : 3,
              "index" : "index_0",
              "recovery_source" : {
                "type" : "PEER"
              },
              "unassigned_info" : {
                "reason" : "CLUSTER_RECOVERED",
                "at" : "2017-12-19T09:34:24.810Z",
                "delayed" : false,
                "allocation_status" : "no_attempt"

(output truncated to fit the post size limit)

Several shards are in the UNASSIGNED state.

Could someone help me resolve the red status and the unassigned shards?

Thanks!!


(Fram Souza) #2

Hi @alias

It is recommended that you have at least 3 master-eligible nodes. Also, "discovery.zen.minimum_master_nodes" only needs to be set on the machines that are master-eligible, so you can remove it from node03.
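
For reference, with 3 master-eligible nodes the usual quorum value is (3 / 2) + 1 = 2, so on each master-eligible node you would keep (just a sketch of the setting, not your full config):

discovery.zen.minimum_master_nodes: 2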


#3

Hi @Fram_Souza

I deleted this parameter on node03.

I still have the same errors: cluster red and UNASSIGNED shards.

Any idea?

Thanks


(Fram Souza) #4

@alias

Have you already run _cluster/reroute?
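
For example, something like this (a sketch; retry_failed asks the cluster to retry shards whose allocation previously failed):

# curl -XPOST 'localhost:9200/_cluster/reroute?retry_failed=true&pretty'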


(andy_zhou) #5

You only have 2 data nodes. Did you set the number of replicas to 2? If so, you need 3 data nodes.
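
If adding a third data node is not an option, another possibility is to lower the replica count instead (a sketch, reusing the index name from the state output above):

# curl -XPUT 'localhost:9200/index_0/_settings' -H 'Content-Type: application/json' -d '{ "index" : { "number_of_replicas" : 1 } }'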


(Mark Walkom) #6

You should always define minimum_master_nodes on all nodes in a cluster, not just the master-eligible ones.

What does the output from _cat/shards look like?
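
i.e. something like this, where ?v adds the column headers:

# curl -XGET 'localhost:9200/_cat/shards?v'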


#7

After several attempts, all shards are started and 0 are unassigned. The cluster is green.

I deleted all data and indices and re-added the minimum_master_nodes parameter.
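
For reference, the cleanup was plain index deletion, roughly like this for each index (index name taken from the state output above):

# curl -XDELETE 'localhost:9200/index_0?pretty'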

Thank you!


(Abhilash Bolla) #8

@alias Please check the Elasticsearch version on all the nodes. I had a similar issue where shards were not getting allocated because I had made a cluster out of nodes running different ES versions.
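
A quick way to compare versions across all nodes (a sketch):

# curl -XGET 'localhost:9200/_cat/nodes?v&h=name,version'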


#9

@Abhilash_Bolla thanks, but that was one of the first things I checked and all versions match (ES and Lucene).


(system) #10

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.