This is a very good point; I have been thinking about this for years.
Node failures should be easy to monitor with OS services. Latency spikes
are a totally different matter.
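For the node-failure part, here is a minimal sketch. It does not use OS services; instead it polls the ES cluster health endpoint (`_cluster/health`) and checks the `status` and `number_of_nodes` fields. The address `http://localhost:9200`, the function names `health_alerts` and `fetch_health`, and the `expected_nodes` parameter are assumptions made for illustration.

```python
import json
from urllib.request import urlopen

# Assumed address of one ES node; adjust for your cluster.
ES_URL = "http://localhost:9200"


def health_alerts(health, expected_nodes):
    """Return alert strings for a parsed _cluster/health response.

    `expected_nodes` is a hypothetical parameter: the number of nodes
    the operator expects to see in the cluster.
    """
    alerts = []
    status = health.get("status")
    if status != "green":
        alerts.append("cluster status is %s" % status)
    nodes = health.get("number_of_nodes", 0)
    if nodes < expected_nodes:
        alerts.append("only %d of %d nodes present" % (nodes, expected_nodes))
    return alerts


def fetch_health(base_url=ES_URL, timeout=5.0):
    """Fetch and parse the cluster health document over HTTP."""
    with urlopen(base_url + "/_cluster/health", timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))
```

Called from cron or a similar scheduler as `health_alerts(fetch_health(), expected_nodes=3)`, any non-empty result could be mailed or pushed to whatever alerting channel is in use.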
It is a very, very hard job to measure anomalies in latency correctly. Just
consider the skew introduced by faulty measurement code, or by the hostile
environments JVMs run in (clocks, OSes, VMs, ...). If anomalies are detected
wrongly, alerts are either missed or false, and all of the effort leads
to annoyance or frustration.
Recently I read about Gil Tene's LatencyUtils
https://groups.google.com/forum/#!topic/mechanical-sympathy/oZSv5QnpAYs
which looks to me like a promising tool for capturing latency anomalies in
histograms.
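To illustrate why naive latency recording can hide a spike, here is a rough sketch of one well-known distortion, often called coordinated omission: while the measuring thread is stalled, the requests that would have been issued are never recorded. This is not the LatencyUtils API. The function names and the `expected_interval_ms` parameter are hypothetical, and the backfilling rule is a simplified assumption about how such a correction can work.

```python
import math


def record_corrected(samples, latency_ms, expected_interval_ms):
    """Record latency_ms, plus synthetic samples for the requests that
    would have been issued (and delayed) during a long stall."""
    samples.append(latency_ms)
    if expected_interval_ms <= 0:
        return
    # Each missed request would have waited one interval less than the
    # previous one, so backfill in steps of the expected interval.
    missing = latency_ms - expected_interval_ms
    while missing >= expected_interval_ms:
        samples.append(missing)
        missing -= expected_interval_ms


def percentile(samples, pct):
    """Nearest-rank percentile (integer pct) of a non-empty sample list."""
    ordered = sorted(samples)
    rank = max(1, math.ceil(len(ordered) * pct / 100))
    return ordered[rank - 1]
```

With 99 requests at 1 ms followed by a single 1000 ms stall, the uncorrected 99th percentile is still 1 ms, while the backfilled sample set makes the stall clearly visible in the upper percentiles.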
Some of this could probably be implemented as an ES plugin, but I
haven't tried LatencyUtils yet, and how it could be connected to ES metrics
is still an open question for me.
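As a starting point that needs no plugin, the node stats API (`_nodes/stats`) already exposes the cumulative counters `query_total` and `query_time_in_millis` under `indices.search` for each node. The sketch below assumes two snapshots of that section have been taken some seconds apart; the function name `avg_query_latency_ms` is hypothetical.

```python
def avg_query_latency_ms(before, after):
    """Average query latency between two snapshots of one node's
    indices.search stats section.

    Returns None when no queries ran in between, or when the counters
    went backwards (for example after a node restart).
    """
    queries = after["query_total"] - before["query_total"]
    millis = after["query_time_in_millis"] - before["query_time_in_millis"]
    if queries <= 0:
        return None
    return millis / queries
```

Note that this yields only an average per polling interval, so short spikes are smoothed away; that limitation is exactly why a histogram-based approach looks attractive.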
Jörg
On Thu, Mar 6, 2014 at 7:24 PM, T Vinod Gupta tvinod@readypulse.com wrote:
is there a plugin or api support for monitoring ES key metrics and
alerting the dev ops about situations when some node in a cluster fails or
there is a spike in latency due to whatever reason?
what are the best practices here and what do people usually do?