I would like to set up a watcher that triggers an action (email for example) when one of the cluster nodes is down.
From the documentation I know that it's possible to monitor cluster status via /_cluster/health. But this is not precise enough for me:
A yellow status does not always mean that a node is down.
The number_of_nodes attribute does not help on its own: I do not want to hardcode an expected value in the watcher condition, because it would need to change every time new nodes are added.
Any ideas on how to achieve such a metric?
Does active_shards_percent_as_number directly reflect the number of active nodes?
Unfortunately, Elasticsearch has no way of knowing how many nodes you expect to have in the cluster. Put differently, it cannot tell the difference between a node leaving the cluster due to a failure and a node leaving the cluster because you are deliberately shrinking it. If you want to validate that the number of nodes in the cluster is correct, you have to write down the number of nodes you expect to be in the cluster and keep that number up to date as the cluster evolves.
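To illustrate the "write down the expected number" approach, here is a minimal Python sketch that compares number_of_nodes from /_cluster/health against a value you maintain yourself. It is not a Watcher definition, just an external check. The EXPECTED_NODES value, the base URL, and the function names are assumptions made up for this example.

```python
import json
from urllib.request import urlopen

# Assumption: this number is maintained by hand and updated whenever
# nodes are deliberately added or removed (hypothetical value).
EXPECTED_NODES = 3


def fetch_cluster_health(base_url="http://localhost:9200"):
    """Return the parsed JSON body of GET /_cluster/health.

    The base URL is an assumption; adjust it (and add authentication)
    to match your cluster.
    """
    with urlopen(f"{base_url}/_cluster/health", timeout=10) as resp:
        return json.load(resp)


def missing_nodes(health, expected):
    """Return how many nodes are missing relative to the expected count.

    Never negative: extra nodes are reported as 0 missing.
    """
    return max(expected - health["number_of_nodes"], 0)


def alert_message(health, expected):
    """Return an alert string if nodes are missing, otherwise None."""
    missing = missing_nodes(health, expected)
    if missing:
        return f"{missing} of {expected} expected node(s) missing from the cluster"
    return None
```

You could run alert_message(fetch_cluster_health(), EXPECTED_NODES) from a scheduled job and send an email whenever it returns a message. The trade-off is the one described above: the expected count lives outside Elasticsearch and has to be updated by you.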