Questions about Self Monitoring Systems blog post


(David Reagan) #1

I found https://www.elastic.co/blog/a-case-for-self-monitoring-systems very interesting. It's very close to what I really want to do here at work. The problem is that that setup relies on Watcher for sending alerts. There's no way we can afford a subscription, so Watcher is out of our reach. Are there any alternatives to Watcher I have not found?

The post does say this:

As an alternative to Watcher, integrating Elasticsearch with any existing monitoring system would be trivial - one could simply set up a check that alerts and escalates under similar conditions.

That sounds like we'd still need a separate monitoring system. Avoiding that need would be the whole point of the "Self Monitoring" setup, right?

Also, do you need to install Nagios on every node you are monitoring to make this work?

I'm posting this in the Beats forum because it's connected to nagioscheckbeat. Apologies if it should have been posted elsewhere.


(Jay Greenberg) #2

Hi @jerrac,

You do not need to install Nagios Core on end systems, only the plugins (checks) themselves. In some cases a check might be a single Perl script; in others it might be a Python script plus dependencies, etc. Most Linux distributions package the Nagios plugins in their repositories, so you can install many common plugins like this:

yum install nagios-plugins-all

or e.g., just the disk check plugin:

yum install nagios-plugins-disk

Even without Watcher, the systems are still "self monitoring", since they push their check results into a central repository instead of a central system reaching out to check each end system.
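To make the push model concrete, here is a minimal Python sketch of what a check runner on an end system does conceptually: run a Nagios plugin, map its exit code to a status, and build a document to ship to the central store. It is not nagioscheckbeat's code (that is written in Go); the field names (`name`, `args`, `status`, `status_code`, `message`) mirror the ones that show up later in this thread, and the plugin path is an assumption.

```python
import shlex
import subprocess
from datetime import datetime, timezone

# Standard Nagios plugin exit codes.
STATUS_NAMES = {0: "OK", 1: "WARNING", 2: "CRITICAL", 3: "UNKNOWN"}


def build_check_document(name, args, exit_code, output):
    """Turn one plugin run into a document suitable for pushing to a central store."""
    text = output.strip()
    first_line = text.splitlines()[0] if text else ""
    # Plugin output is "TEXT | PERFDATA"; keep only the human-readable part.
    message = first_line.split("|", 1)[0].strip()
    # This sketch treats any exit code outside 0-3 as UNKNOWN.
    code = exit_code if exit_code in STATUS_NAMES else 3
    return {
        "@timestamp": datetime.now(timezone.utc).isoformat(),
        "name": name,
        "args": args,
        "status_code": code,
        "status": STATUS_NAMES[code],
        "message": message,
    }


def run_check(name, command, args):
    """Run a plugin locally and return the resulting document."""
    proc = subprocess.run([command] + shlex.split(args), capture_output=True, text=True)
    return build_check_document(name, args, proc.returncode, proc.stdout)


if __name__ == "__main__":
    # Hypothetical plugin path; adjust for your distribution's nagios-plugins package.
    print(run_check("disk", "/usr/lib64/nagios/plugins/check_disk", "-w 20% -c 10% -p /"))
```

Shipping the resulting document (to Elasticsearch or Logstash) is left out here; that's the part a beat handles for you.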

Nagios Core does two things: monitoring and alerting. In our alternative, the systems would continue to "self-monitor", and Nagios would handle only the alerting piece.

The only outstanding task would be to develop a Nagios check, run from Nagios Core, that queries Elasticsearch (using the same queries that Watcher would use). The advantage is that Nagios (or whatever monitoring system you use) reaches out to Elasticsearch with a single check, as opposed to hundreds or thousands of checks against individual hosts.
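A minimal sketch of such a check, written in Python so Nagios Core could run it as an ordinary active check. It assumes Elasticsearch on localhost:9200, the `nagioscheckbeat*` index pattern, and the `status` and `beat.hostname` fields used elsewhere in this thread; the query mirrors the http_poller example later in the thread, minus its `_type` clause. Exit codes follow the standard Nagios plugin convention.

```python
import json
import sys
import urllib.request

# Standard Nagios plugin exit codes.
OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3

# Hypothetical endpoint and query: CRITICAL results pushed in the last 30 minutes.
ES_URL = "http://localhost:9200/nagioscheckbeat*/_search"
QUERY = {
    "query": {
        "bool": {
            "must": [
                {"range": {"@timestamp": {"gte": "now-30m"}}},
                {"term": {"status": "CRITICAL"}},
            ]
        }
    }
}


def evaluate(response):
    """Map an Elasticsearch search response to a Nagios (exit code, status line) pair."""
    total = response["hits"]["total"]
    if isinstance(total, dict):  # Elasticsearch 7+ returns {"value": n, "relation": ...}
        total = total["value"]
    if total == 0:
        return OK, "OK - no CRITICAL check results in the last 30 minutes"
    hosts = sorted(
        {h.get("_source", {}).get("beat", {}).get("hostname", "?") for h in response["hits"]["hits"]}
    )
    return CRITICAL, f"CRITICAL - {total} CRITICAL check result(s) from: {', '.join(hosts)}"


def main():
    request = urllib.request.Request(
        ES_URL,
        data=json.dumps(QUERY).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    try:
        with urllib.request.urlopen(request, timeout=30) as resp:
            code, line = evaluate(json.load(resp))
    except Exception as exc:  # unreachable cluster, bad JSON, unexpected response shape
        code, line = UNKNOWN, f"UNKNOWN - {exc}"
    print(line)  # Nagios uses the first line of output as the status text
    return code


if __name__ == "__main__":
    sys.exit(main())
```

Keeping `evaluate` separate from the HTTP call makes the alerting logic easy to test without a cluster.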

Let me know what you think...


(David Reagan) #3

So, all nodes would need the Nagios plugins installed. Got it.

If I still have to have Nagios Core running so that alerts work, then I have to ask myself why not just use Nagios to do all the monitoring as well? For me, the point of the self-monitoring approach is to limit the number of tools I have to support. If ELK/Beats can send the alerts as well as do the monitoring, that's less work for me. I wouldn't have to learn how to configure Nagios.

Anyway, I get that we'd need another service running to send alerts. I was just hoping someone had created an open source ES plugin similar to Watcher.

All that said, here are a few thoughts on how we could expand on the overall idea.

Build a Kibana "monitoring" plugin. It would let you define "alert searches" and how the alerts would be sent. (I assume Node.js has email/SMS/etc. modules that could handle that.) Ideally, there'd be a default dashboard that displays relevant info for the various alert searches. There'd also be a tool that lets people define the beat config for a Nagios check without knowing how to configure Beats; they'd then send the generated config to whoever configures the beats. And now I want to find time to learn Node.js and see if this is even feasible... :)


(David Reagan) #4

For future reference, Logstash output plugins might work as a Watcher replacement.

Using: https://www.elastic.co/guide/en/logstash/current/plugins-outputs-email.html

Wouldn't something like this be possible?

output {
  if [field] =~ /error|critical/ {
    email {
      # <options>, e.g. to, subject, body
    }
  }
}

Though the docs don't say whether the entire log message is sent, or just whatever you put in the config.

You could do something similar with https://www.elastic.co/guide/en/logstash/current/plugins-outputs-hipchat.html

And https://www.elastic.co/guide/en/logstash/current/plugins-outputs-nagios.html could help avoid having to write searches to trigger Nagios alerts.

https://www.elastic.co/guide/en/logstash/current/plugins-outputs-sns.html might help with SMS text messages, if that's needed.


(Jay Greenberg) #5

@jerrac I think it would be possible to whip something simple up in Logstash:

input {
  http_poller {
    urls => {
      nagioschecks => {
        method => post
        body => '{
          "query": {
            "filtered": {
              "query": {
                "bool": {
                  "must": [
                    { "term": { "_type": "nagioscheck" } },
                    { "range": { "@timestamp": { "gte": "now-30m" } } },
                    { "term": { "status": "CRITICAL" } }
                  ]
                }
              }
            }
          }
        }'
        url => "http://localhost:9200/nagioscheckbeat*/_search"
        headers => {
          Accept => "application/json"
        }
      }
    }
    request_timeout => 30
    interval => 10
    codec => "json"
  }
}

output {
  if [hits][hits] != [] {
    email {
      to => "you@gmail.com"
    }
  }
}
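For comparison, here is a small Python sketch of the same idea outside Logstash: turn the `hits.hits` array returned by the search into a single summary email, so it's explicit what gets sent. The field names (`beat.hostname`, `name`, `message`), the addresses, and the local SMTP relay are assumptions.

```python
import smtplib
from email.message import EmailMessage


def build_alert_email(hits, sender, recipient):
    """Summarise Elasticsearch hits (the hits.hits array) into one plain-text alert email."""
    msg = EmailMessage()
    msg["From"] = sender
    msg["To"] = recipient
    msg["Subject"] = f"[nagioscheckbeat] {len(hits)} CRITICAL check result(s)"
    lines = []
    for hit in hits:
        src = hit.get("_source", {})
        host = src.get("beat", {}).get("hostname", "?")
        lines.append(f"{host} {src.get('name', '?')}: {src.get('message', '')}")
    msg.set_content("\n".join(lines) + "\n")
    return msg


def send(msg, smtp_host="localhost"):
    # Assumes an SMTP relay listening on smtp_host, port 25.
    with smtplib.SMTP(smtp_host) as smtp:
        smtp.send_message(msg)
```

One email per poll, listing every matching result, avoids one message per hit, though it still repeats on every poll while the problem persists.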

Although it might be easier to pump the check results through Logstash inline, and then alert as you put it:

output {
  if [status] in ["WARNING", "CRITICAL"] {
    email {
      to => "you@gmail.com"
    }
  }
}

The disadvantage of this approach would be a barrage of emails whenever anything went wrong, so you would need some way to throttle or acknowledge them.
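Logstash's `throttle` filter is worth a look here, but the logic itself is small. Below is a minimal Python sketch of per-(host, service) throttling with acknowledgement; the cooldown length and the choice to clear state when an OK arrives are assumptions made for this sketch.

```python
import time


class AlertThrottle:
    """Suppress repeat alerts per (host, service) for a cooldown window; acked keys stay quiet."""

    def __init__(self, cooldown_seconds=3600, clock=time.monotonic):
        self.cooldown = cooldown_seconds
        self.clock = clock
        self.last_sent = {}  # (host, service) -> time of the last alert sent
        self.acked = set()   # (host, service) keys someone has acknowledged

    def should_alert(self, host, service, status):
        key = (host, service)
        if status == "OK":
            # Recovery: forget the key so the next problem alerts immediately.
            self.last_sent.pop(key, None)
            self.acked.discard(key)
            return False
        if key in self.acked:
            return False
        now = self.clock()
        last = self.last_sent.get(key)
        if last is not None and now - last < self.cooldown:
            return False
        self.last_sent[key] = now
        return True

    def acknowledge(self, host, service):
        self.acked.add((host, service))
```

Injecting `clock` keeps the class testable; in production the default monotonic clock is used.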


(Jerry Hoffmeister) #6

This looks pretty interesting to me too. Is there an example Logstash config I could start with if I just want to send the data from the example through Logstash instead of directly to ES? And is there a minimum Logstash version I would need? Would 1.5.6, for example, be sufficient, or do I need the current version?


(David Reagan) #7

Well, I had the wonderful idea this morning of building a Kibana app that would do a lot of the monitoring tasks. Run saved searches on a schedule, find all the servers configured with a beat, show a dashboard with server status, use saved searches as a form of remote probe, etc. Then I ran into this: https://github.com/elastic/kibana/issues/4704 So, no custom plugin development is possible for Kibana yet. Which confuses me since I thought that was exactly what Timelion was... (Well, it's possible if I want to frequently rewrite my plugin...)

@Jerry_Hoffmeister Could you clarify?

If I had to guess, you want to use nagioscheckbeat to send data to Elasticsearch via Logstash. Right? If so, you can do that easily with the beats input in Logstash 2.x; that's what I do with Filebeat and Topbeat. The lumberjack input might work for 1.5.6... You could also output from nagioscheckbeat to a file, then use logstash-forwarder to watch that file and send it to Logstash.


(Jerry Hoffmeister) #8

Thanks, yeah, that's what I wanted to do, but I ended up just upgrading everything to the latest versions and sending directly to Elasticsearch. I'm just kinda figuring things out at this point. Yes, I could use the beats input, which I'm currently using with Filebeat. I guess unless I want to modify the data, there's no advantage to going through Logstash?


(Monica Sarbu) #9

Yes, that's right. For now, if you use Filebeat, it makes sense to send the data through Logstash, since Filebeat doesn't do any parsing of your log lines. In the next major release, Ingest Node will be available and you will be able to send the data directly to Elasticsearch.


(David Reagan) #10

Are there any compiled binaries of nagioscheckbeat out there I could use for testing? I haven't quite got the hang of compiling Go projects yet. I'm going to work on that, but just downloading something would be easier while I'm testing.


(David Reagan) #11

Never mind... https://github.com/PhaedrusTheGreek/nagioscheckbeat/tree/master/build


(Jay Greenberg) #12

@jerrac / @Jerry_Hoffmeister,

I have played with this a little more and come up with the following mashup. The cool thing about this setup is that you end up with all your hosts and services in the Nagios dashboard, looking very tidy, with zero Nagios config. Additionally, all statuses are sent through, including OK, so services recover properly.

Configure Logstash to Output Nagios Beats to NSCA

The only caveat with this configuration (not really a huge deal) is that you have to duplicate the output for each status. I am working with the Logstash folks to determine why it isn't possible to take the integer value for the status code setting from the event.

This configuration takes each nagios check, and outputs it using the NSCA (Nagios Service Check Adapter).

input {
  beats {
    port => 5044
  }
}

output {
  if [status_code] == 0  {
    nagios_nsca {
      host => "localhost"
      nagios_service => "%{name}"
      nagios_status => 0
      nagios_host => "%{[beat][hostname]}"
      message_format => "%{name}: %{message}"
    }
  }
  if [status_code] == 1  {
    nagios_nsca {
      host => "localhost"
      nagios_service => "%{name}"
      nagios_status => 1
      nagios_host => "%{[beat][hostname]}"
      message_format => "%{name}: %{message}"
    }
  }
  if [status_code] == 2  {
    nagios_nsca {
      host => "localhost"
      nagios_service => "%{name}"
      nagios_status => 2
      nagios_host => "%{[beat][hostname]}"
      message_format => "%{name}: %{message}"
    }
  }
  if [status_code] == 3  {
    nagios_nsca {
      host => "localhost"
      nagios_service => "%{name}"
      nagios_status => 3
      nagios_host => "%{[beat][hostname]}"
      message_format => "%{name}: %{message}"
    }
  }
}

To make this work, you also have to install nsca-client on the Logstash server.

yum install nsca-client

Luckily for me, it put the binary in the default location of the send_nsca_bin output setting (https://www.elastic.co/guide/en/logstash/current/plugins-outputs-nagios_nsca.html#plugins-outputs-nagios_nsca-send_nsca_bin): send_nsca_bin => "/usr/sbin/send_nsca"
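For reference, `send_nsca` reads passive check results on stdin, one per line: host, service, return code and plugin output, tab-separated by default. The Python sketch below builds that line from a nagioscheckbeat event using the same field references as the Logstash config above; casting `status_code` to an integer is what the four duplicated `if` blocks accomplish there. The default `-H localhost` target is an assumption, and `send_nsca` uses its compiled-in config file unless you add `-c`.

```python
import subprocess

VALID_STATUS = {0, 1, 2, 3}  # OK, WARNING, CRITICAL, UNKNOWN


def nsca_line(event, delimiter="\t"):
    """Build one send_nsca input line from a nagioscheckbeat event."""
    status = int(event["status_code"])  # the status must be an integer 0-3
    if status not in VALID_STATUS:
        raise ValueError(f"invalid Nagios status code: {status}")
    host = event["beat"]["hostname"]             # nagios_host => "%{[beat][hostname]}"
    service = event["name"]                      # nagios_service => "%{name}"
    message = f"{event['name']}: {event['message']}"  # message_format => "%{name}: %{message}"
    return delimiter.join([host, service, str(status), message]) + "\n"


def send_passive_result(event, nagios_host="localhost", send_nsca_bin="/usr/sbin/send_nsca"):
    # -H names the host running the NSCA daemon; the result line goes in on stdin.
    subprocess.run([send_nsca_bin, "-H", nagios_host], input=nsca_line(event), text=True, check=True)
```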

Configure NSCA & Radar on the Nagios server

On my CentOS box, I only had to run:

yum install nsca

I used Radar to scan for new Hosts and Services and automatically update the Nagios configuration. It's a simple script you can download and run in cron.

Here are the steps I performed to integrate Radar:

  1. I had to add a Perl dependency with yum install perl-File-Pid
  2. I set the Radar script to run in cron every so often, followed by a service nagios reload
  3. I created the file /etc/nagios/objects/radar.cfg and installed the service and host templates as defined in the Radar docs. Note that the Radar templates reference a host group and a service group that don't exist, so just remove those lines if you don't need the groups. Reference that file in /etc/nagios/nagios.cfg (with a cfg_file= line) so that it loads.
  4. Modify /etc/nagios/nagios.cfg to enable the /etc/nagios/conf.d/ directory (with a cfg_dir= line), as it's not enabled by default.
  5. Modify the Radar script configuration to match your environment. In my CentOS environment, I did it like this:
my $NAGIOS_LOGFILE="/var/log/nagios/nagios.log";
my $CFG_DIRECTORY="/etc/nagios/conf.d/";
my $NAGIOS_CONFIG="/etc/nagios/objects/";
my $HOST_TEMPLATES="generic-radar-host";
my $SERVICE_TEMPLATES="generic-radar-service";
my $ICINGA_USER="nagios";
my $ICINGA_GROUP="nagios";
my $ENABLE_LOGGING=1;
my $LOGFILE_DIRECTORY="/var/log/nagios/";
my $PID_FILE_DIRECTORY="/var/run/";

I was surprised at how easy this was to set up. It found a nice range of hosts and services generated by my Beats test machines, and it looked great in Nagios! You can always edit the .cfg files created by Radar and remove things that no longer exist.
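Conceptually, the discovery step boils down to something like the Python sketch below. It is not Radar's code: it assumes discovery works by scanning nagios.log for Nagios's warnings about passive results for unknown hosts or services, and the exact log wording in the regex is an assumption to verify against your own /var/log/nagios/nagios.log. The template names come from the Radar configuration above; the `address` line is added for illustration.

```python
import re

# Assumed wording of Nagios's warning for passive results about unknown objects.
UNKNOWN_PASSIVE = re.compile(
    r"Passive check result was received for service '([^']+)' on host '([^']+)', "
    r"but the (?:host|service) could not be found"
)


def discover(log_lines):
    """Return {host: set(services)} for passive results Nagios did not recognise."""
    found = {}
    for line in log_lines:
        m = UNKNOWN_PASSIVE.search(line)
        if m:
            service, host = m.group(1), m.group(2)
            found.setdefault(host, set()).add(service)
    return found


def render_config(found, host_template="generic-radar-host", service_template="generic-radar-service"):
    """Render Nagios object definitions for discovered hosts and services."""
    blocks = []
    for host in sorted(found):
        blocks.append(
            f"define host {{\n    use        {host_template}\n"
            f"    host_name  {host}\n    address    {host}\n}}\n"
        )
        for service in sorted(found[host]):
            blocks.append(
                f"define service {{\n    use                  {service_template}\n"
                f"    host_name            {host}\n    service_description  {service}\n}}\n"
            )
    return "\n".join(blocks)
```

Writing the result into /etc/nagios/conf.d/ and reloading Nagios corresponds to steps 2 and 4 above.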

I haven't tried this yet, but it should also be possible to play with the host templates in order to automatically set up pings to discovered hosts as well.


(Jerry Hoffmeister) #13

Interesting... At this point, we're trying to use JUST ELK to monitor and eventually send alerts/tickets to ServiceNow. I liked the idea of being able to use any Nagios plugin to send data through Beats to Elasticsearch. We're NOT using Nagios at this point (not my decision), although I'm not sure we shouldn't be.


(Steffen Siering) #14

Why all these if-statements? Wouldn't this do the trick:

nagios_nsca {
  host => "localhost"
  nagios_service => "%{name}"
  nagios_status => "%{[status_code]}"
  nagios_host => "%{[beat][hostname]}"
  message_format => "%{name}: %{message}"
}

(Jay Greenberg) #15

@steffens,

I discussed this yesterday with @suyograo:

The Nagios NSCA output plugin does a check here that will fail if the passed variable is not an integer.

We think it's a bug in Logstash that an integer setting like this can't take its value from an event field.


(Steffen Siering) #16

TIL, thanks.


(Jerry Hoffmeister) #17

With nagioscheckbeat, can I add fields to a document?

Also, I'd like to modify one of the fields, and I'm assuming the best way is with Logstash? I'm using nagioscheckbeat with a MongoDB Nagios plugin (https://github.com/mzupan/nagios-plugin-mongodb), and one of the fields that comes through is "args", which looks like this, for example: "-H 10.0.0.202 -A connect -P 27018 -W 2 -C 4 -s -u username -p password". I'd like to get rid of the password. I could use mutate to remove the entire field, but I'd prefer to just redact the password. And the args field isn't consistent (parameters may be missing or different).


(Jerry Hoffmeister) #18

Figured out the second part, redacting the password. I added the following to my Logstash filter:

  mutate {
    gsub => [
      "args", "-p \S*", "-p redacted"
    ]
  }
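Since the pattern is a plain regular expression, it's easy to sanity-check outside Logstash before deploying. A quick Python equivalent of the gsub above (Python's `re` and Logstash's Ruby regexes agree on this simple pattern); it also confirms that `-P 27018` is left alone, because matching is case-sensitive.

```python
import re

# Same pattern as the gsub: "-p", a space, then the following run of non-space characters.
PASSWORD_ARG = re.compile(r"-p \S*")


def redact_args(args):
    """Replace the value of the -p option with 'redacted', leaving everything else alone."""
    return PASSWORD_ARG.sub("-p redacted", args)
```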

(Jay Greenberg) #19

@Jerry_Hoffmeister - You can statically add fields directly to each document using the fields directive, part of libbeat.

