My Elasticsearch cluster is not responding and its status has been RED for the last few days. It cannot allocate the unassigned shards. The issue started when I first restarted the primary instance, and I am also not able to delete the indices whose shards are unassigned. Can you please help?
I am using Elasticsearch 2.4.1 on CentOS 6.7, with a 130 GB disk and 16 GB of memory on each node.
Yes, I have 37K shards on 5 nodes. We are using OneOps to provision the environment and configuration.
Can you please point me to how to reduce them? I am OK with losing some or even all of the data. I haven't configured anything to create so many shards, and I want to keep the count to a minimum. I am using Filebeat to ship the logs to Elasticsearch regularly.
Change the number of shards to 1 per index, shorten your retention period, use the rollover API instead... There are many options. Maybe a combination of all of them?
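For the first suggestion, a minimal sketch of an index template, assuming the Elasticsearch 2.x template syntax and that new indices follow the graylog_* naming pattern seen below (adjust the pattern and replica count to your needs):

```shell
# Hypothetical template "one_shard": every NEW graylog_* index gets
# 1 primary shard and 0 replicas. Existing indices are not affected.
curl -XPUT 'http://localhost:9200/_template/one_shard' -d '{
  "template": "graylog_*",
  "settings": {
    "number_of_shards": 1,
    "number_of_replicas": 0
  }
}'
```

Note that a template only applies to indices created after it is installed; the existing 37K shards still need to be deleted or curtailed through retention.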
So you have only 8 GB of heap, right? That might not be enough for so many shards.
BTW, regarding your specific problem, do you see any errors in the logs?
I don't see any errors in the logs. When I restart the complete cluster, it starts allocating all 32K shards and then stops at some point without any errors.
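Two read-only diagnostics that may show what is happening, assuming the 2.4 cluster APIs: with tens of thousands of shards, the master node's pending-task queue is often the bottleneck, and these calls show whether cluster-state updates are backing up rather than failing outright.

```shell
# Overall status, shard counts, and number of pending tasks.
curl 'http://localhost:9200/_cluster/health?pretty'

# The queued cluster-state updates themselves; a long queue here would
# explain the 30s timeouts on index deletes below.
curl 'http://localhost:9200/_cluster/pending_tasks?pretty'
```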
When I tried to delete a few indices using the curl commands below, it didn't help.
$ curl -XDELETE 'http://localhost:9200/graylog_57529/'
{"error":{"root_cause":[{"type":"process_cluster_event_timeout_exception","reason":"failed to process cluster event (delete-index [graylog_57529]) within 30s"}],"type":"process_cluster_event_timeout_exception","reason":"failed to process cluster event (delete-index [graylog_57529]) within 30s"},"status":503}
$ curl -XDELETE 'localhost:9200/graylog_57286?pretty'
{
  "acknowledged" : false
}
[app@elastic-search-60586867-5-190299640 ~]$ curl -XDELETE 'localhost:9200/graylog_57287?pretty'
{
  "error" : {
    "root_cause" : [ {
      "type" : "process_cluster_event_timeout_exception",
      "reason" : "failed to process cluster event (delete-index [graylog_57287]) within 30s"
    } ],
    "type" : "process_cluster_event_timeout_exception",
    "reason" : "failed to process cluster event (delete-index [graylog_57287]) within 30s"
  },
  "status" : 503
}
And if I delete indices manually from the data directory and restart the cluster, it restores the deleted ones, I guess from the replicas. Can you please suggest how to delete the indices using curl, or manually, so that I can bring the cluster up and then work on ES management.
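A hedged workaround, assuming the Elasticsearch 2.4 delete-index API: a process_cluster_event_timeout_exception usually means the delete was queued but the default 30 s master timeout expired before the overloaded master processed it, so extending the timeout and deleting by wildcard in one request may let it go through (the graylog_* pattern matches the index names in the output above; wildcard deletes work unless action.destructive_requires_name is enabled).

```shell
# Extend the master-node timeout well past the default 30s, and remove
# many graylog_* indices in a single cluster-state update.
curl -XDELETE 'http://localhost:9200/graylog_*?master_timeout=10m'
```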