(ES ver 19.3)
dearest ES wizards,
are there ways to check on a cluster's state/health/status other than
via curl against port 9200 or the web UI?
we have a cluster that needed to be restarted (due to split-brain masters),
and after startup we are unable to reach either port 9200 or the web UI.
(I am guessing the web UI fills data and the pretty hosts <-> index[shard#]
mappings from info gathered via 9200, hence why neither works?)
I would like to be able to see the progress of any recovery/initialization/etc.
that is going on, to see if this cluster is recoverable before declaring it a
loss (and having to wipe/rebuild).
I am able to see a couple of individual index states (ones I've found by
looking in the es-data dir for the indices), but anything node level or
cluster level fails, or just hangs.
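
A minimal sketch of this kind of check, in case it helps to reproduce: it requests a few endpoints over HTTP with a timeout and records which ones answer and which fail or hang. The index name `my_index`, the helper names, and the exact endpoint paths are assumptions for illustration and may differ on this ES version.

```python
import json
import urllib.request

# Assumed endpoints: two cluster-level, one index-level.
# "my_index" is a hypothetical index name.
PATHS = [
    "/_cluster/health",
    "/_cluster/state",
    "/my_index/_status",
]


def http_get(url, timeout):
    """Fetch a URL and return the decoded body; raises on error or timeout."""
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return resp.read().decode("utf-8")


def probe(base_url, paths, timeout=5.0, fetch=http_get):
    """Return {path: ("ok", parsed_json)} or {path: ("failed", reason)}."""
    results = {}
    for path in paths:
        try:
            body = fetch(base_url + path, timeout)
            results[path] = ("ok", json.loads(body))
        except (OSError, ValueError) as exc:
            # OSError covers connection errors and timeouts;
            # ValueError covers a body that is not valid JSON.
            results[path] = ("failed", repr(exc))
    return results
```

Calling `probe("http://localhost:9200", PATHS)` on each host would show whether the index-level request answers while the cluster-level ones time out, which is the pattern described above.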
after startup, I tailed each of the 10 hosts' logs and saw them assign a
master and run some recoveries... then queries started coming in. it
appears that some hosts have high load and are serving queries, while
others (perhaps with less hot shards, or maybe no recoverable indexes?) are
fairly idle and are spewing "Failed to execute fetch" with
"org.elasticsearch.search.SearchContextMissingException: No search context
found for id". I'm just guessing that a few indexes are missing shards or are
just not available?
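
To compare how noisy each host is, a rough sketch like the following could tally the Elasticsearch exception classes seen in a log. The function name and the regular expression are assumptions for illustration, and the log file location is left to the caller.

```python
import re
from collections import Counter

# Matches fully qualified exception class names under org.elasticsearch,
# e.g. org.elasticsearch.search.SearchContextMissingException
EXCEPTION_RE = re.compile(r"org\.elasticsearch\.[\w.$]+Exception")


def count_exceptions(lines):
    """Count occurrences of each exception class (short name) in log lines."""
    counts = Counter()
    for line in lines:
        for full_name in EXCEPTION_RE.findall(line):
            # Keep only the class name after the last dot.
            counts[full_name.rsplit(".", 1)[-1]] += 1
    return counts
```

Passing an open log file, as in `count_exceptions(open(path))` with `path` pointing at one host's log, gives a per-host tally that can be compared across the 10 hosts.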
we've been having a lot of issues recently with failed hosts, or hosts
dropping out [logs say timeouts] (presumably due to load or net issues).
I had to restart the cluster a few times to get 1 master to stick (had to
set some of the hotter nodes to node.master: false - otherwise the master
got too loaded and timed out, causing various cluster hosts to assign a new
master). Is it possible that our cluster has some corrupt state, is just too
overloaded, or that we've just got a bad configuration? HW all seems to check
out. beefy-ish 48G, RAID 10, 6-drive boxes with a 16G JVM (not sure why this
much).
thanks for reading!
any input / advice / suggestions greatly appreciated