I am looking for ideas on how to send an automated UUID through the ELK stack and make sure it arrives in the Elasticsearch cluster.
Basically, some sort of Elasticsearch process generates the UUID and inserts a record for it, then another external process, possibly a Rundeck job, grabs that record and flips a switch saying it's active. That Rundeck job then logs into a server and uses the Linux 'logger' command to generate a syslog message containing that UUID.
Once the switch is flipped, Elasticsearch starts some sort of timer. If that timer expires before the UUID makes its way back through the ELK stack, an alert is generated.
Sort of like a bullet or echo ping being sent waiting for a response.
This tests multiple points of failure at one time.
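To make the idea concrete, here is a rough Python sketch of the round trip, not a finished tool. It simplifies the design above by letting the external job own the timer and poll Elasticsearch itself. The tag `elk-canary`, the `logstash-*` index pattern, the `message` field and the localhost URL are all assumptions for illustration.

```python
import json
import subprocess
import time
import urllib.request
import uuid


def send_canary(tag="elk-canary"):
    """Emit a syslog line carrying a fresh UUID via the Linux `logger` command."""
    canary_id = str(uuid.uuid4())
    subprocess.run(["logger", "-t", tag, canary_id], check=True)
    return canary_id


def count_hits(canary_id, es_url="http://localhost:9200", index="logstash-*"):
    """Count documents containing the UUID (assumes it ends up in `message`)."""
    body = json.dumps({"query": {"match_phrase": {"message": canary_id}}}).encode()
    req = urllib.request.Request(
        f"{es_url}/{index}/_search?size=0",
        data=body,
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req, timeout=10) as resp:
        total = json.load(resp)["hits"]["total"]
    # Older releases return an integer here, newer ones an object with "value".
    return total["value"] if isinstance(total, dict) else total


def wait_for_canary(canary_id, lookup=count_hits, timeout_s=60.0, poll_s=5.0,
                    clock=time.monotonic, sleep=time.sleep):
    """Poll until the UUID is searchable; return False if the timer expires first."""
    deadline = clock() + timeout_s
    while True:
        if lookup(canary_id) > 0:
            return True
        if clock() >= deadline:
            return False
        sleep(poll_s)
```

A Rundeck job could call `send_canary()` on the target server, then raise an alert whenever `wait_for_canary(...)` returns `False`.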
Suggestions? Does anyone know of something similar to this already out there?
I'm going to use Lovebeat to detect significant changes in the frequency of messages flowing through the system. You can use the heartbeat input plugin to generate heartbeats, or a cron job if you don't want to depend on Logstash itself, then use the statsd or graphite output to send the beats to Lovebeat. Such a setup will indirectly check whether ES is accepting the messages, since problems with the elasticsearch output will halt the whole pipeline.
This method can also be used to monitor the actual message sources, i.e. you can easily detect if Logstash on host X has been hosed and its messages no longer come through.
Thanks for the pointer to Lovebeat. I have been looking at it and it seems very promising for my use case. Have you used Lovebeat before? It seems like the only way to add Services is via the NewServices call from the web UI.
I haven't used Lovebeat for real work yet, but it's on my list for next week. I've just dabbled with it and written a couple of source code patches.
You don't have to pre-create services. You can just start posting heartbeats (via the HTTP, statsd, or Graphite protocols). See the examples towards the end of the readme file.
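For illustration, posting a beat over the statsd protocol from Python might look roughly like the sketch below. The `<service>.beat:1|c` packet format and UDP port 8127 are my recollection of the readme examples, so treat both as assumptions and check them against your Lovebeat version. The service name `logstash.host-x` is made up.

```python
import socket


def beat_packet(service):
    """Build a statsd counter packet for one heartbeat (format is an assumption)."""
    return f"{service}.beat:1|c".encode("ascii")


def send_beat(service, host="localhost", port=8127):
    """Send a single heartbeat over UDP; the port is assumed, not verified."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.sendto(beat_packet(service), (host, port))
```

Calling `send_beat("logstash.host-x")` from cron would be one way to keep a service alive without creating it first.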
Yes I just tried it and sure enough it was magic. Thanks again for the pointers.
Hm, not sure if I got it, but are you trying to test that ES is up and running? If so, I know our SPM can do that for you, and I assume all other modern monitoring tools can too. In SPM we call them Heartbeat Alerts: basically alerts that notify you if SPM doesn't hear from your ES cluster for some time. I assume you could do this even with Nagios.
Monitoring * Alerting * Anomaly Detection * Centralized Log Management
Solr & Elasticsearch Support * http://sematext.com/