Show the server that is down


#1

Hi,

I am sending the following data to Elasticsearch from multiple servers:
SysTime, HostName, MemFree, CPULoad

I am sending this data every 5 minutes. That is, every 5 minutes about 10 servers in my deployment send this data to Elasticsearch.

If data from any server has been missing for the last 5 minutes (or, say, the last 6 minutes, to avoid boundary-condition issues), then I want to show that server as "down".
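One way to express that rule is a "last seen" check: a server is down if its most recent report is older than the threshold. The sketch below is a minimal, hypothetical illustration (the function name `down_hosts` is made up), assuming each document is a dict with the HostName and SysTime fields listed above and that SysTime has already been parsed into a timezone-aware datetime. Only hosts present in `docs` are considered, so the documents should cover a window longer than the threshold (for example, the last 10 minutes).

```python
from datetime import datetime, timedelta, timezone


def down_hosts(docs, now, threshold=timedelta(minutes=6)):
    """Hypothetical helper: hosts whose newest SysTime is older than `threshold`.

    A 6-minute threshold leaves a 1-minute margin over the 5-minute reporting interval.
    """
    last_seen = {}
    for doc in docs:
        host, ts = doc["HostName"], doc["SysTime"]
        # Keep only the newest timestamp reported by each host.
        if host not in last_seen or ts > last_seen[host]:
            last_seen[host] = ts
    return sorted(host for host, ts in last_seen.items() if now - ts > threshold)


# Sample data (assumed): server1 reported 1 minute ago, server2 8 minutes ago.
now = datetime(2017, 12, 18, 0, 10, tzinfo=timezone.utc)
docs = [
    {"HostName": "server1", "SysTime": datetime(2017, 12, 18, 0, 9, tzinfo=timezone.utc)},
    {"HostName": "server2", "SysTime": datetime(2017, 12, 18, 0, 2, tzinfo=timezone.utc)},
]
print(down_hosts(docs, now))  # ['server2']
```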

I don't want to hard-code the list of servers to look for, since I may start monitoring more servers dynamically. Instead, I want to find the list of distinct servers seen within the last 10 minutes and the list of distinct servers seen within the last 5 minutes, and compare them. If a server is missing from the 5-minute list, I want to show it as "down".
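The two-list comparison is a set difference: hosts seen in the wide window minus hosts seen in the narrow window. Here is a minimal sketch under the same assumptions as before (documents are dicts with HostName and a timezone-aware SysTime); `hosts_between` and `missing_hosts` are hypothetical names. Pass `narrow=timedelta(minutes=6)` if you want the extra margin mentioned above.

```python
from datetime import datetime, timedelta, timezone


def hosts_between(docs, start, end):
    """Distinct HostName values with start <= SysTime < end."""
    return {d["HostName"] for d in docs if start <= d["SysTime"] < end}


def missing_hosts(docs, now, wide=timedelta(minutes=10), narrow=timedelta(minutes=5)):
    """Hosts seen in the last `wide` interval but not in the last `narrow` one."""
    seen_wide = hosts_between(docs, now - wide, now)
    seen_narrow = hosts_between(docs, now - narrow, now)
    return sorted(seen_wide - seen_narrow)


# Sample data (assumed): server1 reports twice, server2 only once, 8 minutes ago.
now = datetime(2017, 12, 18, 0, 10, tzinfo=timezone.utc)
docs = [
    {"HostName": "server1", "SysTime": datetime(2017, 12, 18, 0, 2, tzinfo=timezone.utc)},
    {"HostName": "server2", "SysTime": datetime(2017, 12, 18, 0, 2, tzinfo=timezone.utc)},
    {"HostName": "server1", "SysTime": datetime(2017, 12, 18, 0, 7, tzinfo=timezone.utc)},
]
print(missing_hosts(docs, now))  # ['server2']
```

Because the server list is derived from the data itself, a newly monitored server is picked up automatically as soon as it reports once.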

I would prefer to show the "down" status in Kibana (currently on 5.6, but I can move to 6 if needed; the same goes for Elasticsearch). If that is not possible in Kibana, I would like to get the output through an Elasticsearch query.

Is it possible to do so?

Thanks so much.


(Luiz Santos) #2

Hi @deuskars,

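You could use a terms aggregation for this. Consider this full example, which indexes three sample documents and then runs one search per 10-minute window. A server that appears in the first window's buckets but not in the second's (server2 here) is down: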

PUT monitoring/doc/1
{
  "server": "server1",
  "cpu": 20,
  "timestamp": "2017-12-18T00:00:00"
}

PUT monitoring/doc/2
{
  "server": "server2",
  "cpu": 30,
  "timestamp": "2017-12-18T00:00:00"
}

PUT monitoring/doc/3
{
  "server": "server1",
  "cpu": 15,
  "timestamp": "2017-12-18T00:10:00"
}

GET monitoring/_search
{
  "size": 0, 
  "query": {
    "bool": {
      "filter": {
        "range": {
          "timestamp": {
            "gte": "2017-12-18T00:00:00",
            "lt": "2017-12-18T00:10:00"
          }
        }
      }
    }
  },
  "aggs": {
    "servers": {
      "terms": {
        "field": "server.keyword",
        "size": 10
      }
    }
  }
}

GET monitoring/_search
{
  "size": 0, 
  "query": {
    "bool": {
      "filter": {
        "range": {
          "timestamp": {
            "gte": "2017-12-18T00:10:00",
            "lt": "2017-12-18T00:20:00"
          }
        }
      }
    }
  },
  "aggs": {
    "servers": {
      "terms": {
        "field": "server.keyword",
        "size": 10
      }
    }
  }
} 
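The comparison between the two searches can be scripted on the client side. The sketch below is hypothetical: it builds the same search body as above, but with Elasticsearch date math (`now-10m`, `now-5m`) for relative windows, and diffs the bucket keys of two responses. Sending the requests is left to whatever client you use; the responses here are hand-written and trimmed to the fields used, matching the sample documents above. The terms `size` of 100 is an assumption: a terms aggregation returns at most `size` buckets, so if you have more servers than that, some would be missing from both lists or wrongly flagged as down.

```python
import json


def servers_query(gte, lt, size=100):
    """Search body: distinct server.keyword values with gte <= timestamp < lt."""
    return {
        "size": 0,
        "query": {"bool": {"filter": {"range": {"timestamp": {"gte": gte, "lt": lt}}}}},
        "aggs": {"servers": {"terms": {"field": "server.keyword", "size": size}}},
    }


def bucket_keys(response):
    """Server names from the 'servers' terms aggregation of a search response."""
    return {b["key"] for b in response["aggregations"]["servers"]["buckets"]}


# Relative windows using date math: last 10 minutes vs. last 5 minutes.
earlier = servers_query("now-10m", "now")
recent = servers_query("now-5m", "now")
print(json.dumps(recent, indent=2))

# Hand-written responses for the two example searches above (trimmed).
resp_first = {"aggregations": {"servers": {"buckets": [
    {"key": "server1", "doc_count": 1}, {"key": "server2", "doc_count": 1}]}}}
resp_second = {"aggregations": {"servers": {"buckets": [
    {"key": "server1", "doc_count": 1}]}}}

down = sorted(bucket_keys(resp_first) - bucket_keys(resp_second))
print(down)  # ['server2']
```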

But why not use Heartbeat, which already does the hard work and comes with great dashboards?

Cheers,
LG

