Aggregations / Curl / # of events by device in the past hour

All, I'm looking for a simple curl command that can run every hour from crontab. Its purpose is to look at the total number of events generated in the past hour, broken down by the source device generating the events and the total number of events generated by that device.

We use ELK for network syslog monitoring. We have a few hundred devices pointed to a syslog-ng process, and what I'm hoping to do is simply check every hour how many events each device generated. Some will generate 0 events, some (firewalls) will generate a hundred or so... I just need to know how to curl that data out.

From there I'll have my script check the values against a known set of "acceptable ranges", and if they are out of range it will send an e-mail alert. This piece I can figure out... it's the curl against Elasticsearch that I could use some assistance on. Any help / guidance is greatly appreciated.
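For illustration, the range check described above might look roughly like this. It is only a sketch under assumptions: the function name check_ranges and both input file formats ("hostname min max" and "hostname count", one entry per line) are hypothetical, and sending the e-mail is left out. A terms aggregation only returns buckets for devices that logged something in the window, so devices missing from the counts are treated as having 0 events.

```shell
#!/bin/sh
# check_ranges RANGES_FILE COUNTS_FILE
#   RANGES_FILE: "hostname min max" per line (hypothetical format)
#   COUNTS_FILE: "hostname count" per line (hypothetical format)
# Prints one line per device whose count is outside its acceptable range.
# Devices absent from COUNTS_FILE are treated as having 0 events.
check_ranges() {
  awk '
    # First file: remember the acceptable range for each device.
    NR == FNR { min[$1] = $2 + 0; max[$1] = $3 + 0; next }
    # Second file: remember the observed count for each device.
    { count[$1] = $2 + 0 }
    END {
      for (host in min) {
        n = (host in count) ? count[host] : 0
        if (n < min[host] || n > max[host])
          printf "%s %d (expected %d-%d)\n", host, n, min[host], max[host]
      }
    }
  ' "$1" "$2" | sort
}
```

Any output from check_ranges means at least one device is out of range, so the calling script can use "is the output non-empty" as the trigger for the alert.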

thanks,

Lee

You'll want to do a terms aggregation. Simple example:

curl -XPOST hostname:9200/_search\?pretty -d '
{
  "size": 0,
  "aggs": {
    "any-name-for-this-agg": {
      "terms": {
        "field": "name-of-field-to-aggregate"
      }
    }
  }
}'

If you only want to check the last hour you'll need to add a restriction on the @timestamp field.

Thanks so much. I've been working with your framework and so far this works great:

curl -XPOST localhost:9200/_search\?pretty -d '
{
  "query": { "term": { "devtype.raw": "ASA Firewall" } },
  "aggs": {
    "date_interval": {
      "date_histogram": {
        "field": "syslog_server_time",
        "interval": "month"
      },
      "aggs": {
        "any-name-for-this-agg": {
          "terms": { "field": "hostname.raw" }
        }
      }
    }
  }
}'

But I think with the "two levels of agg", what I'm getting is all events per month for every month. I'd like to get just the last month's result, not every month prior. Any thoughts/suggestions?

thanks,

Lee

A date histogram is not what you're looking for. Add the date restriction to the query so that not all events are subject to the aggregation in the first place.

Thanks so much for your help on this. I think I got what I need; I still need to tweak it a bit, but what I now have is:

curl -XPOST localhost:9200/_search\?pretty -d '
{
  "size": 0,
  "aggs": {
    "last_X_min": {
      "filter": {
        "range": {
          "syslog_server_time": { "gte": "now-15m", "lte": "now" }
        }
      },
      "aggs": {
        "any-name-for-this-agg": {
          "terms": {
            "field": "hostname.raw",
            "size": 355
          }
        }
      }
    }
  }
}'

A few notes on the above:

"gte": "now-15m": changing 15m to 5m changes how far back to look...
"field": "hostname.raw": finding I have to use field.raw for this to work... not sure why...
"size": 355: this number has to be at least the total # of devices or you may miss one...

thanks again.

I'm not very good with the query DSL and I don't have time to look things up, but as I said, I think you should use the query section for weeding out old events. But if what you have above works, great.
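For reference, moving the time restriction into the query section could look roughly like the sketch below. It reuses the syslog_server_time and hostname.raw fields from the posts above; ES_URL, WINDOW, MAX_DEVICES, the aggregation name events_per_device and the function names are made up for illustration. The Content-Type header is only required on newer Elasticsearch releases and should be harmless on older ones.

```shell
#!/bin/sh
# Sketch: count events per device over a recent window, restricting the
# time range in the query section rather than in a filter aggregation.
# ES_URL, WINDOW and MAX_DEVICES are illustrative names, not from the thread.
ES_URL="${ES_URL:-localhost:9200}"
WINDOW="${WINDOW:-1h}"              # how far back to look, e.g. 15m or 1h
MAX_DEVICES="${MAX_DEVICES:-500}"   # should be at least the number of devices

# build_query WINDOW SIZE: print the JSON search body.
build_query() {
  cat <<EOF
{
  "size": 0,
  "query": {
    "range": {
      "syslog_server_time": { "gte": "now-$1", "lte": "now" }
    }
  },
  "aggs": {
    "events_per_device": {
      "terms": { "field": "hostname.raw", "size": $2 }
    }
  }
}
EOF
}

# run_query: send the search to Elasticsearch and print the response.
run_query() {
  curl -s -XPOST "$ES_URL/_search?pretty" \
    -H 'Content-Type: application/json' \
    -d "$(build_query "$WINDOW" "$MAX_DEVICES")"
}

# Only contact the cluster when called with "run" as the first argument.
if [ "${1:-}" = "run" ]; then
  run_query
fi
```

Called with run as its first argument (for example from an hourly crontab entry), this prints one bucket per device under aggregations.events_per_device.buckets, each with a key (the hostname) and a doc_count.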

finding I have to use field.raw for this to work... not sure why

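That's because the hostname field is analyzed, so it is split into tokens at index time and a terms aggregation on it would count those tokens rather than whole hostnames. The .raw subfield isn't analyzed, so each hostname stays a single term.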
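One way to confirm this on a given cluster is to look at the index mapping. The sketch below only fetches it; ES_URL, INDEX_PATTERN and the function name are illustrative, and logstash-* is an assumption about how the indices are named.

```shell
#!/bin/sh
# Sketch: fetch the mapping so the hostname field and its .raw subfield
# can be compared. ES_URL and INDEX_PATTERN are illustrative names, and
# "logstash-*" is an assumed index naming scheme.
ES_URL="${ES_URL:-localhost:9200}"
INDEX_PATTERN="${INDEX_PATTERN:-logstash-*}"

# mapping_url: print the URL of the mapping endpoint for the index pattern.
mapping_url() {
  printf '%s/%s/_mapping?pretty\n' "$ES_URL" "$INDEX_PATTERN"
}

# Only contact the cluster when called with "run" as the first argument.
if [ "${1:-}" = "run" ]; then
  curl -s -XGET "$(mapping_url)"
fi
```

In the output, the entry for hostname should contain a fields section with a raw subfield. On releases of the era discussed here that subfield would be expected to show "index": "not_analyzed"; newer releases typically use a keyword-type subfield instead.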