[root@metrics-datastore-0 esutilities]# curl -XGET localhost:9200/_cat/shards?h=index,shard,prirep,state,unassigned.reason| grep UNASSIGNED
logs-2018.11.27.07 0 p UNASSIGNED CLUSTER_RECOVERED
Is there a way to fix it? And can you explain how it could have happened and how to prevent it?
Though the cluster is RED, it still accepts requests to the other indices.
What @cy_lir said, but the TL;DR is to run GET /_cluster/allocation/explain and share the output here if it's unclear. Please use the </> button to format output; it makes it much easier to read.
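Something like this, for example; without a body it explains the first unassigned shard it finds, or you can target a specific shard (the index and shard number below are copied from your output above). Note that I think this API only exists from 5.0 onwards.

curl -XGET 'localhost:9200/_cluster/allocation/explain?pretty'

curl -XGET 'localhost:9200/_cluster/allocation/explain?pretty' -d '{
  "index": "logs-2018.11.27.07",
  "shard": 0,
  "primary": true
}'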
Ah, ok, then it's harder. The high-level issue is that there is no available on-disk copy of shard 0 of index logs-2018.11.27.07. Off the top of my head this will either be because:
there is an on-disk copy, but it's corrupt
the node holding the on-disk copy is no longer in the cluster.
I don't know 2.4 very well, but I think the first of these will result in lots of log messages, but I'm not sure how to determine the second. The health output you quote mentions 2 nodes of which one is a data node. Is this right, or should there be another node?
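One quick way to check the second possibility is to list the nodes the master currently sees and compare that with what you expect; something like this should work on 2.4 (the column names are from memory, so adjust them if the API complains):

curl -XGET 'localhost:9200/_cat/nodes?v&h=name,ip,node.role,master'

If a node that used to hold data is missing from that list, that would explain the missing shard copy.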
Again, I tested a similar scenario with 6 nodes (3 master and 3 data). When all of them were restarted, two indices ended up RED.
[root@metrics-datastore-0 esutilities]# sh check_cluster.sh
{
"cluster_name" : "metrics-datastore",
"status" : "red",
"timed_out" : false,
"number_of_nodes" : 6,
"number_of_data_nodes" : 3,
"active_primary_shards" : 10,
"active_shards" : 30,
"relocating_shards" : 0,
"initializing_shards" : 0,
"unassigned_shards" : 6,
"delayed_unassigned_shards" : 0,
"number_of_pending_tasks" : 0,
"number_of_in_flight_fetch" : 0,
"task_max_waiting_in_queue_millis" : 0,
"active_shards_percent_as_number" : 83.33333333333334
}
[root@metrics-datastore-0 esutilities]# curl -XGET localhost:9200/_cat/shards?h=index,shard,prirep,state,unassigned.reason| grep UNASSIGNED
logs-2018.11.28.11 0 p UNASSIGNED CLUSTER_RECOVERED
logs-2018.11.28.11 0 r UNASSIGNED CLUSTER_RECOVERED
logs-2018.11.28.11 0 r UNASSIGNED CLUSTER_RECOVERED
metrics-2018.11.25 1 p UNASSIGNED CLUSTER_RECOVERED
metrics-2018.11.25 1 r UNASSIGNED CLUSTER_RECOVERED
metrics-2018.11.25 1 r UNASSIGNED CLUSTER_RECOVERED
Is there a way to recover from this without data loss? I know that if only replicas are in this state, rerouting will help, but here primary shards are also in CLUSTER_RECOVERED. Can we do something to recover?
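On 2.4 the reroute I mean looks roughly like this for a replica (the index and shard are from the output above, the node name is just an example):

curl -XPOST 'localhost:9200/_cluster/reroute' -d '{
  "commands": [
    { "allocate": { "index": "logs-2018.11.28.11", "shard": 0, "node": "metrics-datastore-1" } }
  ]
}'

But as far as I know, forcing a primary this way needs allow_primary, which creates an empty shard, i.e. data loss.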
You seem to have multiple clusters with the same name. It is possible that nodes might be joining the wrong cluster when started. Does this effect still occur if you only run one cluster at a time?
Can you reproduce this on a version that isn't past the end of its supported life, i.e. 5.6 or above?
I have only one cluster in my setup, but I am using version 2.4. Is this a known issue in this version? Upgrading to the latest version is a big task for us.
I am confused. This thread started out asking about a cluster called metrics-datastore with 1 master-eligible node and 1 data node, and then asked about a cluster with the same name with 3 master-eligible nodes and 3 data nodes. Are these the same cluster? If so, why the discrepancy in size?
Not really. I mean, if you do strange things to a cluster then yes this might lose data, but a properly managed cluster doesn't behave like this. As I said I am confused.
This seems odd. You're telling it to try and find at least 2 (really 3) master-eligible nodes but only giving it one address to try. Perhaps you are expecting this name to resolve to multiple addresses and then for Elasticsearch to try them all, but this isn't how it works. I would try giving it the addresses of all three master-eligible nodes, or using one of the discovery plugins to discover the master-eligible nodes dynamically.
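For example, something like this in elasticsearch.yml on every node (I'm guessing the hostnames of the other two master-eligible nodes here, so substitute your own):

discovery.zen.ping.unicast.hosts: ["metrics-master-0", "metrics-master-1", "metrics-master-2"]
discovery.zen.minimum_master_nodes: 2

With three master-eligible nodes, minimum_master_nodes should be 2 so that a majority is always required to elect a master.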
[root@metrics-master-0 esutilities]# curl -XGET localhost:9200/_cat/shards?h=index,shard,prirep,state,unassigned.reason| grep UNASSIGNED
metrics-2018.11-10 1 p UNASSIGNED CLUSTER_RECOVERED
metrics-2018.11-10 1 r UNASSIGNED CLUSTER_RECOVERED
metrics-2018.11-10 1 r UNASSIGNED CLUSTER_RECOVERED
logs-2018.11.30.11 2 p UNASSIGNED CLUSTER_RECOVERED
logs-2018.11.30.11 2 r UNASSIGNED CLUSTER_RECOVERED
logs-2018.11.30.11 2 r UNASSIGNED CLUSTER_RECOVERED
logs-2018.11.30.12 0 p UNASSIGNED CLUSTER_RECOVERED
logs-2018.11.30.12 0 r UNASSIGNED CLUSTER_RECOVERED
logs-2018.11.30.12 0 r UNASSIGNED CLUSTER_RECOVERED
metrics-2018.11.25 1 p UNASSIGNED CLUSTER_RECOVERED
metrics-2018.11.25 1 r UNASSIGNED CLUSTER_RECOVERED
metrics-2018.11.25 1 r UNASSIGNED CLUSTER_RECOVERED
Ok, I think adding logger.gateway: TRACE to the config file on every node will give a little bit more detail about what's going on.
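In other words, a single extra line in the node config, something like:

logger.gateway: TRACE

Keep in mind TRACE logging is verbose, so remove it again once we've captured the restart.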
If I understand right, your problem is that you have a green cluster, with all shards assigned, but when you restart it it has unassigned shards and reports red health. If so, I would like to see logs from all nodes, starting with a green cluster, shutting everything down and starting it all back up again.