ES goes red when nodes restart

Hi

I am using ES version 6.4. I have 3 machines, each running one ES master node and one ES data node as separate processes: 6 nodes in total, 3 masters and 3 data nodes. The replication factor is set to 1.

When we restart all nodes together, ES goes into a red state. We have set the minimum master nodes and minimum data nodes required to 2. What else could lead to this issue?

metrics-2019.08.01   2 p UNASSIGNED CLUSTER_RECOVERED
metrics-2019.08.01   2 r UNASSIGNED CLUSTER_RECOVERED
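
For reference, shard listings like the above typically come from the cat shards API; a sketch limited to this index (the column selection is illustrative):

```
GET _cat/shards/metrics-2019.08.01?v&h=index,shard,prirep,state,unassigned.reason
```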


{
  "index": "metrics-2019.08.01",
  "shard": 2,
  "primary": true,
  "current_state": "unassigned",
  "unassigned_info": {
    "reason": "CLUSTER_RECOVERED",
    "at": "2019-08-01T19:22:50.394Z",
    "last_allocation_status": "no_valid_shard_copy"
  },
  "can_allocate": "no_valid_shard_copy",
  "allocate_explanation": "cannot allocate because a previous copy of the primary shard existed but can no longer be found on the nodes in the cluster",
  "node_allocation_decisions": [
    {
      "node_id": "I8hrSGVsQcO0c7DQTdmdgA",
      "node_name": "metrics-datastore-1",
      "transport_address": "192.168.25.79:9300",
      "node_attributes": {
        "ml.machine_memory": "33566429184",
        "ml.max_open_jobs": "20",
        "xpack.installed": "true",
        "ml.enabled": "true"
      },
      "node_decision": "no",
      "store": {
        "found": false
      }
    },
    {
      "node_id": "L-TlEqTJRjuQKJBMFsnSgw",
      "node_name": "metrics-datastore-0",
      "transport_address": "192.168.25.18:9300",
      "node_attributes": {
        "ml.machine_memory": "33566429184",
        "ml.max_open_jobs": "20",
        "xpack.installed": "true",
        "ml.enabled": "true"
      },
      "node_decision": "no",
      "store": {
        "found": false
      }
    },
    {
      "node_id": "zTKAccDPSZezu7iyYbVVww",
      "node_name": "metrics-datastore-2",
      "transport_address": "192.168.25.53:9300",
      "node_attributes": {
        "ml.machine_memory": "33566429184",
        "ml.max_open_jobs": "20",
        "xpack.installed": "true",
        "ml.enabled": "true"
      },
      "node_decision": "no",
      "store": {
        "found": false
      }
    }
  ]
}
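
For reference, output like the above comes from the cluster allocation explain API; a sketch for this shard (index, shard, and primary values taken from the post):

```
GET _cluster/allocation/explain
{
  "index": "metrics-2019.08.01",
  "shard": 2,
  "primary": true
}
```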

This response is indicating that the data for this shard is completely gone. Is every shard unassigned or is it just some of them? Can you share the output of GET _cluster/health? Also could you share your elasticsearch.yml files (use https://gist.github.com if they are too large to fit here).

Hi, only one index among the 4 or 5 indices goes red.

{
  "cluster_name" : "metrics-datastore",
  "status" : "red",
  "timed_out" : false,
  "number_of_nodes" : 6,
  "number_of_data_nodes" : 3,
  "active_primary_shards" : 14,
  "active_shards" : 28,
  "relocating_shards" : 0,
  "initializing_shards" : 0,
  "unassigned_shards" : 2,
  "delayed_unassigned_shards" : 0,
  "number_of_pending_tasks" : 0,
  "number_of_in_flight_fetch" : 0,
  "task_max_waiting_in_queue_millis" : 0,
  "active_shards_percent_as_number" : 93.33333333333333
}

ES config file

cluster.name: metrics-datastore
node.name: metrics-datastore-0
node.master: false
node.data: true
node.max_local_storage_nodes: 1
path.data: /data/elasticsearch/data,logs/elasticsearch/data 
path.logs: /logs/elasticsearch
path.repo: /cfs/data/harmony_backup/esbackup
bootstrap.memory_lock: true
http.port: 9200
discovery.zen.minimum_master_nodes: 2
discovery.zen.ping.unicast.hosts: metrics-master
discovery.zen.fd.ping_interval: 10s
discovery.zen.fd.ping_retries: 5
discovery.zen.fd.ping_timeout: 120s
gateway.recover_after_master_nodes: 2
gateway.recover_after_time: 5m
http.cors.enabled: false
indices.fielddata.cache.size: 10%
indices.memory.index_buffer_size: 30%
thread_pool.write.queue_size: 2000
network.bind_host: <IP>, _local_
network.publish_host: <IP>
logger.gateway: TRACE
gateway.expected_nodes: 6
gateway.expected_master_nodes: 3
gateway.expected_data_nodes: 3
gateway.recover_after_data_nodes: 2
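
For reference, the discovery.zen.minimum_master_nodes: 2 value above matches the usual quorum formula for 3 master-eligible nodes:

```
minimum_master_nodes = floor(master_eligible_nodes / 2) + 1
                     = floor(3 / 2) + 1
                     = 2
```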

The index settings are as below:

{
  "index_patterns": ["*"],
  "order": 1,
  "settings": {
    "index.number_of_replicas": 1,
    "index.number_of_shards": 3,
    "index.merge.scheduler.max_thread_count": 1,
    "index.refresh_interval": "30s",
    "index.translog.durability": "async",
    "index.translog.flush_threshold_size": "1g",
    "index.translog.sync_interval": "10s",
    "index.unassigned.node_left.delayed_timeout": "10m",
    "index.mapping.total_fields.limit": 3000
  }
}
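
A template body like the above is applied with the index templates API; a sketch assuming the template is named metrics_template (the actual name is not given in the post):

```
PUT _template/metrics_template
{
  "index_patterns": ["*"],
  "order": 1,
  "settings": {
    "index.number_of_replicas": 1,
    "index.number_of_shards": 3
  }
}
```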

Regarding node.max_local_storage_nodes: 1, this is the default for this setting. Are you setting it to something other than 1 on any of your nodes?

Regarding path.data, this tells Elasticsearch to split its data between /data/elasticsearch/data and $ES_HOME/logs/elasticsearch/data. This is a fairly unusual configuration. Why are you set up like this? It's normally better to just use a single data path. Is it possible that $ES_HOME/logs is being cleared out on a restart?

No, we don't set max_local_storage_nodes to any other value.

And about the paths: we have two disks, one mounted at /data and another at <es_home>/logs, and they don't get cleared.

I don't have any further ideas. The shard data is no longer anywhere that Elasticsearch can find it, so either it's looking in the wrong places or else something else has deleted it.

Have you specified different data paths for the master and data nodes running on the same host? If not, I assume the nodes may flip directories depending on which order they come up in.

That was my guess too, but that doesn't happen if node.max_local_storage_nodes: 1 is set on every node.


Hi

Thanks for helping me out. We found the issue and fixed it. It was related to path.data, where we had given two paths and one of them was getting corrupted because of a disk issue. We can close this topic as resolved.
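
For reference, once the corrupted disk is dealt with, a single data path per node avoids this class of failure, as suggested earlier in the thread; a minimal sketch (the exact directories are illustrative):

```
# one data directory per node, on a single healthy disk
path.data: /data/elasticsearch/data
path.logs: /logs/elasticsearch
```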
