ES 2.4 failed to discover master after hard disks ran full

Hi,

I'm fairly new to ES and have to use version 2.4 as part of a Pelias setup.

A few days ago, I realized that ES had written logs totalling > 900 GB across the cluster, both to the log directory and to syslog. As a result, all disks (the root disk and the one the ES data directory is mounted on) ran completely full (I've installed netdata now, so I'll get warned next time). That was mostly due to Pelias (well, and ES's very verbose error logs). Anyway, I knew something would be messed up after I cleaned out the log directories.
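For context, this is roughly what I did to free up space (a sketch, not the exact commands; the paths are the defaults of the Ubuntu/deb package and may differ on other setups):

# confirm which mounts ran full
df -h
# size of the ES log directory (default deb package location)
sudo du -sh /var/log/elasticsearch
# drop rotated ES logs and empty the syslog
sudo sh -c 'rm /var/log/elasticsearch/*.log.*'
sudo truncate -s 0 /var/log/syslog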

Now my cluster responds to /_cluster/health?pretty (or /_cat/indices?v) with:

{
  "error" : {
    "root_cause" : [ {
      "type" : "master_not_discovered_exception",
      "reason" : null
    } ],
    "type" : "master_not_discovered_exception",
    "reason" : null
  },
  "status" : 503
}
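Would querying a single node with local=true (so the request doesn't go through the missing master) tell me anything useful here? Something like this, assuming the default host/port:

# per-node view of cluster health, bypassing the master
curl -s 'localhost:9200/_cluster/health?local=true&pretty'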

What does work is /_stats?pretty (the output is too long to post here). That's not super useful for my particular problem, but I did notice that the main index (distributed over 24 shards) holds only a fraction of the documents it should (23 million instead of > 500 million), which looks to me like a lot of shards went offline. Then again, I can't query any of the /_cat or /<index>/_settings APIs.
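For reference, this is roughly how I read the document count out of the /_stats output (jq is an extra tool I use for this, and the index name below is just a placeholder for my main index):

# total document count across all primaries
curl -s 'localhost:9200/_stats' | jq '._all.primaries.docs.count'
# count for one index (name is a placeholder)
curl -s 'localhost:9200/_stats' | jq '.indices["pelias"].primaries.docs.count'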

The API that queries ES in the background, i.e. the Pelias API, randomly either succeeds or throws this:

"[cluster_block_exception] blocked by: [SERVICE_UNAVAILABLE/1/state not recovered / initialized];"

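My next idea was to grep each node's (now much smaller) logs for discovery and master election messages, roughly like this (log path assumed from the package default):

# last few discovery-related log lines on this node
sudo sh -c "grep -iE 'master|discover|zen' /var/log/elasticsearch/*.log | tail -n 20"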
As I said, I'm not very experienced with ES and this definitely exceeds my knowledge. If possible, I'd like to recover the cluster (if it's not already too late) rather than destroy it entirely and set it up from scratch.

Can you please give me some hints on how to proceed or debug this?

Setup:

  • Ubuntu 16.04
  • ES v2.4
  • 4-node cluster with 32 GB RAM per node and a 16 GB ES heap
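In case the discovery settings matter here, this is what I believe my elasticsearch.yml contains on each node (from memory, so the hostnames and values below are placeholders; I can post the real file if needed):

# discovery-related settings (default config path for the deb package)
sudo grep -E '^discovery' /etc/elasticsearch/elasticsearch.yml
# discovery.zen.ping.unicast.hosts: ["node1", "node2", "node3", "node4"]
# discovery.zen.minimum_master_nodes: 3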

Many thanks
Nils
