Hi,
I'm fairly new to Elasticsearch (ES) and have to use version 2.4 as part of a Pelias setup.
A few days ago, I realized that ES had written more than 900GB of logs across the entire cluster, both to the log directory and to syslog. As a result, all disks (root and the mounted ES data disk) ran completely full (I've installed netdata now, so I'll get warned next time). That was mostly due to Pelias (well, and ES's VERY verbose error logs). Anyway, I knew something would be messed up after I cleaned out the log directories.
Now, my cluster is responding to /_cluster/health?pretty queries (or /_cat/indices?v) with
{
"error" : {
"root_cause" : [ {
"type" : "master_not_discovered_exception",
"reason" : null
} ],
"type" : "master_not_discovered_exception",
"reason" : null
},
"status" : 503
}
What does work is /_stats?pretty (the output is too long for a post), but that's not very useful for my particular problem. However, I did notice that the main index (distributed over 24 shards) has only a fraction of the documents it should have (23 million instead of > 500 million), which looks to me like a lot of shards went offline. But then again, I can't query any /_cat or /<index>/_settings API.
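In case it helps with reproducing this, here is a minimal Python sketch of the checks above, meant to be run against each node separately to see whether any of them still reports a master. The node URLs (default HTTP port 9200) are placeholders rather than my real addresses, and `summarize` and `probe` are just illustrative helper names.

```python
import json
import urllib.error
import urllib.request

# Assumption: placeholder addresses on the default HTTP port 9200;
# replace with the real addresses of the four nodes.
NODES = ["http://localhost:9200"]

# The endpoints mentioned in this post.
ENDPOINTS = ["/_cluster/health?pretty", "/_cat/indices?v", "/_stats?pretty"]


def summarize(status, body):
    """Reduce an ES HTTP response to a one-line summary."""
    if status == 200:
        return "200 OK"
    try:
        data = json.loads(body)
    except ValueError:
        return "%d (non-JSON body)" % status
    error = data.get("error") if isinstance(data, dict) else None
    if isinstance(error, dict):
        # ES 2.x error bodies carry the exception name in error.type.
        return "%d %s" % (status, error.get("type", "unknown"))
    return "%d %s" % (status, error if error else "unknown")


def probe(node, endpoint, timeout=10):
    """Query one endpoint on one node and summarize the outcome."""
    url = node + endpoint
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return summarize(resp.status, resp.read().decode("utf-8"))
    except urllib.error.HTTPError as err:
        # Non-2xx responses (such as the 503 above) arrive here.
        return summarize(err.code, err.read().decode("utf-8", "replace"))
    except (urllib.error.URLError, OSError) as err:
        return "unreachable (%s)" % err


if __name__ == "__main__":
    for node in NODES:
        for endpoint in ENDPOINTS:
            print(node + endpoint, "->", probe(node, endpoint))
```

With the error body shown above, `summarize` would report `503 master_not_discovered_exception` for that endpoint.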
The Pelias API, which queries ES in the background, randomly either succeeds or throws this:
"[cluster_block_exception] blocked by: [SERVICE_UNAVAILABLE/1/state not recovered / initialized];"
As I said, I'm not very experienced with ES and this definitely exceeds my knowledge. If possible (and if it's not too late already), I want to recover the cluster rather than destroy it entirely and set it up from scratch.
Could you please give me some hints on how to proceed or debug?
Setup:
- Ubuntu 16.04
- ES v2.4
- 4 node cluster with 32GB RAM each and 16GB heap size for ES
Many thanks
Nils