Recommendations for health monitoring

jpotisch · March 23, 2015, 3:11pm

We currently monitor our app by having a monitoring tool (Pingdom) request
a health page from our app, which fetches and displays the Elasticsearch
cluster info, e.g.:

{
  "status": 200,
  "name": "whatever",
  "cluster_name": "whatever_dev",
  "version": {
    "number": "1.4.4",
    "build_hash": "c38f773fc81201d1abdfde1ca2746fab58efa912",
    "build_timestamp": "2015-02-19T13:05:36Z",
    "build_snapshot": false,
    "lucene_version": "4.10.3"
  },
  "tagline": "You Know, for Search"
}

If the monitoring process can't reach our app, or our app can't reach
Elasticsearch, we'll get an error and an alert. However, this doesn't tell
us anything about node and index health. I've made a page that calls
ClusterClient.health(level='indices'), but I want to confirm two things:

1. Is this sufficient for surfacing any issue with our Elasticsearch
   infrastructure?
2. Does this call block query requests or backups, consume a lot of
   resources, or otherwise have an impact that means we wouldn't want to
   call it every 60 seconds, 24x7?
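A minimal sketch of what such a page could check, assuming it has the parsed JSON dict from `_cluster/health?level=indices` (what `ClusterClient.health(level='indices')` returns). The function name, the `min_nodes` parameter, and the alert rules are hypothetical choices for illustration. Note that a single-node cluster with replicas configured always reports yellow, so "not green" may be too strict there.

```python
def cluster_health_problems(health, min_nodes=None):
    """List alert-worthy problems in a _cluster/health?level=indices response.

    `health` is the parsed JSON as a dict. An empty list means "no alert".
    The rules below are illustrative, not an official definition of healthy.
    """
    problems = []

    status = health.get("status")
    if status != "green":
        problems.append(f"cluster status is {status}")

    if health.get("timed_out"):
        problems.append("health request timed out")

    # A node silently dropping out is worth an alert even if shards recover.
    nodes = health.get("number_of_nodes")
    if min_nodes is not None and nodes is not None and nodes < min_nodes:
        problems.append(f"only {nodes} node(s), expected at least {min_nodes}")

    unassigned = health.get("unassigned_shards", 0)
    if unassigned:
        problems.append(f"{unassigned} unassigned shard(s)")

    # level="indices" adds a per-index breakdown under "indices".
    for name, index in sorted(health.get("indices", {}).items()):
        index_status = index.get("status")
        if index_status != "green":
            problems.append(f"index {name} is {index_status}")

    return problems
```

Passing the expected node count catches a lost node even after its shards have been reallocated and the cluster has returned to green.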

We don't need our monitoring page to give us a full diagnosis of all
conceivable issues; we just need it to trigger an alert that there is an
issue, so we know we have some work to do, while having minimal impact on
overall application performance.

Any recommendations on what we should monitor to achieve those two mandates
would be greatly appreciated.
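One low-effort way to turn checks like these into an alert is to have the health page answer 200 when nothing is wrong and 503 otherwise, assuming the monitoring tool (Pingdom here) treats a non-2xx response as a failed check. The sketch below is a plain WSGI app using only the standard library; `make_health_app` and the `check` callable are hypothetical names, and any web framework's equivalent would do.

```python
import json


def make_health_app(check):
    """Build a WSGI app that turns a health check into an HTTP status.

    `check` is any zero-argument callable returning a list of problem
    strings (empty list = healthy). An exception raised by `check`, e.g.
    because Elasticsearch is unreachable, is reported as a problem too.
    """
    def app(environ, start_response):
        try:
            problems = list(check())
        except Exception as exc:  # unreachable cluster, timeout, bad response
            problems = [f"health check failed: {exc!r}"]

        status = "200 OK" if not problems else "503 Service Unavailable"
        body = json.dumps({"healthy": not problems, "problems": problems}).encode("utf-8")
        start_response(status, [
            ("Content-Type", "application/json"),
            ("Content-Length", str(len(body))),
        ])
        return [body]

    return app
```

For a quick local test, `wsgiref.simple_server.make_server("", 8080, make_health_app(my_check)).serve_forever()` serves it, where `my_check` is whatever hypothetical function wraps your Elasticsearch calls.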

Thanks,

-joel

--
You received this message because you are subscribed to the Google Groups "elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email to elasticsearch+unsubscribe@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/elasticsearch/d9290f69-5150-4824-9ef4-6011b35ed959%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

mjdude5 · March 23, 2015, 4:00pm

You probably want to monitor each node as well: _nodes/stats has useful
disk/CPU/heap/GC stats. It also has information about thread usage and
completed tasks, which you can use to monitor search/index growth.

I don't fully know the answer to #2, but I assume the _nodes and _cluster
endpoints are served by management threads. We hit _nodes/stats and
_cluster/health every 5 minutes and haven't seen any issues. Depending on
your cluster size, I'm not sure I'd poll every 60 seconds; _nodes/stats can
take some time to gather if there are a lot of nodes.
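A sketch of checking the per-node heap and disk numbers, assuming the parsed JSON from `_nodes/stats` with the field paths `jvm.mem.heap_used_percent` and `fs.total.total_in_bytes` / `fs.total.available_in_bytes` (verify them against your Elasticsearch version). The function name and thresholds are illustrative assumptions, not recommendations. Thread-pool and completed-task counters are cumulative, so tracking growth needs the difference between two polls, which this sketch leaves out.

```python
def node_stats_problems(stats, max_heap_percent=85, min_disk_free_fraction=0.10):
    """Flag nodes in a _nodes/stats response whose heap or disk looks unhealthy.

    `stats` is the parsed JSON as a dict. Thresholds are hypothetical defaults.
    """
    problems = []
    for node_id, node in sorted(stats.get("nodes", {}).items()):
        name = node.get("name", node_id)

        # JVM heap pressure: sustained high usage tends to mean long GC pauses.
        heap = node.get("jvm", {}).get("mem", {}).get("heap_used_percent")
        if heap is not None and heap > max_heap_percent:
            problems.append(f"{name}: heap at {heap}%")

        # Disk: compare space available to the process against total space.
        fs_total = node.get("fs", {}).get("total", {})
        total = fs_total.get("total_in_bytes")
        available = fs_total.get("available_in_bytes")
        if total and available is not None and available / total < min_disk_free_fraction:
            problems.append(f"{name}: only {available / total:.0%} disk available")

    return problems
```

Requesting only the sections you need, e.g. `GET _nodes/stats/jvm,fs`, keeps the response smaller than fetching every metric.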
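If the health page is polled every 60 seconds but the `_nodes/stats` call should run less often, a small cache can decouple the two. A sketch, assuming a hypothetical `CachedCheck` wrapper around any zero-argument check function; the 300-second default mirrors the 5-minute interval mentioned above.

```python
import time


class CachedCheck:
    """Run an expensive check at most once per `max_age` seconds.

    The monitor can hit the health page as often as it likes; the wrapped
    call (e.g. one that queries _nodes/stats) only runs when the cached
    result is older than `max_age`. `clock` is injectable for testing.
    """

    def __init__(self, check, max_age=300.0, clock=time.monotonic):
        self._check = check
        self._max_age = max_age
        self._clock = clock
        self._result = None
        self._fetched_at = None

    def __call__(self):
        now = self._clock()
        if self._fetched_at is None or now - self._fetched_at >= self._max_age:
            # Refresh; if the check raises, the exception propagates and the
            # previous result is NOT updated, so the next call retries.
            self._result = self._check()
            self._fetched_at = now
        return self._result
```

This wrapper is not thread-safe; under a multi-threaded server, guarding `__call__` with a `threading.Lock` would prevent two concurrent refreshes.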

