ES 2.4 failed to discover master after hard disks ran full

Hi,

I'm fairly new to ES and have to use version 2.4 as part of a Pelias setup.

A few days ago, I realized that ES had written logs totalling > 900 GB across the cluster, both to the log directory and to syslog. As a result, all disks (the root disk and the one the ES data directory is mounted on) ran completely full (I've installed netdata now, so I'll get warned next time). That was mostly due to Pelias (well, and ES's very verbose error logs). Anyway, I knew something would be messed up after I cleaned out the log directories.
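For context, this is roughly what I did to free up space (a sketch, not the exact commands; the paths are the defaults of the Ubuntu/deb package and may differ on other setups):

# confirm which mounts ran full
df -h
# size of the ES log directory (default deb package location)
sudo du -sh /var/log/elasticsearch
# drop rotated ES logs and empty the syslog
sudo sh -c 'rm /var/log/elasticsearch/*.log.*'
sudo truncate -s 0 /var/log/syslog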

Now my cluster responds to /_cluster/health?pretty (or /_cat/indices?v) with:

{
  "error" : {
    "root_cause" : [ {
      "type" : "master_not_discovered_exception",
      "reason" : null
    } ],
    "type" : "master_not_discovered_exception",
    "reason" : null
  },
  "status" : 503
}
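Would querying a single node with local=true (so the request doesn't go through the missing master) tell me anything useful here? Something like this, assuming the default host/port:

# per-node view of cluster health, bypassing the master
curl -s 'localhost:9200/_cluster/health?local=true&pretty'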

What does work is /_stats?pretty (the output is too long to post here). That's not super useful for my particular problem, but I did notice that the main index (distributed over 24 shards) holds only a fraction of the documents it should (23 million instead of > 500 million), which looks to me like a lot of shards went offline. Then again, I can't query any of the /_cat or /<index>/_settings APIs.
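For reference, this is roughly how I read the document count out of the /_stats output (jq is an extra tool I use for this, and the index name below is just a placeholder for my main index):

# total document count across all primaries
curl -s 'localhost:9200/_stats' | jq '._all.primaries.docs.count'
# count for one index (name is a placeholder)
curl -s 'localhost:9200/_stats' | jq '.indices["pelias"].primaries.docs.count'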

The API that queries ES in the background, i.e. the Pelias API, randomly either succeeds or throws this:

"[cluster_block_exception] blocked by: [SERVICE_UNAVAILABLE/1/state not recovered / initialized];"

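My next idea was to grep each node's (now much smaller) logs for discovery and master election messages, roughly like this (log path assumed from the package default):

# last few discovery-related log lines on this node
sudo sh -c "grep -iE 'master|discover|zen' /var/log/elasticsearch/*.log | tail -n 20"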
As I said, I'm not very experienced with ES and this definitely exceeds my knowledge. If possible, I'd like to recover the cluster (if it's not already too late) rather than destroy it entirely and set it up from scratch.

Can you please give me some hints on how to proceed or debug this?

Setup:

  • Ubuntu 16.04
  • ES v2.4
  • 4-node cluster with 32 GB RAM per node and a 16 GB ES heap
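In case the discovery settings matter here, this is what I believe my elasticsearch.yml contains on each node (from memory, so the hostnames and values below are placeholders; I can post the real file if needed):

# discovery-related settings (default config path for the deb package)
sudo grep -E '^discovery' /etc/elasticsearch/elasticsearch.yml
# discovery.zen.ping.unicast.hosts: ["node1", "node2", "node3", "node4"]
# discovery.zen.minimum_master_nodes: 3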

Many thanks
Nils
