How to use Kibana to run an Elasticsearch query?


(Kennedy Kan) #1

I am using Elasticsearch 2.3.2 and Kibana 4.5. I found at https://www.elastic.co/guide/en/elasticsearch/reference/current/search-aggregations-pipeline-movavg-aggregation.html that the moving average aggregation can make predictions. I know it is possible to run the query in ES directly, but I am not sure how to do that with Kibana.
I have a logstash config that reads a CSV log file storing CPU usage every 15 seconds. Should I just include the following in the logstash output JSON for the related index as an output mapping?

{
    "the_movavg": {
        "moving_avg": {
            "buckets_path": "the_sum",
            "window": 30,
            "model": "holt_winters",
            "settings": {
                "type": "mult",
                "alpha": 0.5,
                "beta": 0.5,
                "gamma": 0.5,
                "period": 7,
                "pad": true
            }
        }
    }
}

Is it possible to show this as a graph in Kibana by having Kibana run the ES query?


How to build charts from Elasticsearch query
(Mark Walkom) #2

Check out https://www.elastic.co/blog/implementing-a-statistical-anomaly-detector-part-1, https://www.elastic.co/blog/implementing-a-statistical-anomaly-detector-part-2 and https://www.elastic.co/blog/implementing-a-statistical-anomaly-detector-part-3 by @polyfractal, it might give you some ideas.

But when it comes to getting the data in ES, you should be able to do that with an input query like you have. You may want to restructure it though, which is why those blog posts might give you some guidance.


(Kennedy Kan) #3

Thanks for the answer. I first tried running some queries in ES. However, I have run into some issues.

GET linux_cpu*/_search?search_type=count
{
  "aggs": {
    "my_date_histo": {
      "date_histogram": {
        "field": "@timestamp",
        "interval": "day"
      },
      "aggs": {
        "the_sum": {
          "avg": {
            "field": "CPU(%)"
          }
        },
        "the_movavg": {
          "moving_avg": {
            "buckets_path": "the_sum",
            "window": 90,
            "model": "holt_winters",
            "settings": {
              "type": "add",
              "alpha": 0.8,
              "beta": 0.2,
              "gamma": 0.7,
              "period": 30
            },
            "predict": 30
          }
        }
      }
    }
  }
}

This is my query, and I wish to predict 30 days of data. I have input data from 2014-12-31 to 2015-05-31, one CPU record every 15 seconds, and I use day as the interval in the ES query.
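For context on what the `holt_winters` model computes, here is a minimal, self-contained Python sketch of additive ("add") triple exponential smoothing. This is an illustration of the technique only, not the exact Elasticsearch implementation (ES seeds its components somewhat differently):

```python
def holt_winters_additive(values, alpha, beta, gamma, period, predict):
    """Additive Holt-Winters: level + trend + seasonal components."""
    # Seed the level from the first season, the trend from the average
    # change between the first and second season, and the seasonal
    # indices as deviations from the seeded level.
    level = sum(values[:period]) / period
    trend = (sum(values[period:2 * period]) - sum(values[:period])) / (period * period)
    seasonal = [v - level for v in values[:period]]

    for i in range(period, len(values)):
        prev_level = level
        s = seasonal[i % period]
        level = alpha * (values[i] - s) + (1 - alpha) * (level + trend)
        trend = beta * (level - prev_level) + (1 - beta) * trend
        seasonal[i % period] = gamma * (values[i] - level) + (1 - gamma) * s

    # Forecast `predict` steps past the end of the series.
    n = len(values)
    return [level + (m + 1) * trend + seasonal[(n + m) % period]
            for m in range(predict)]


# Sanity check: a flat series should yield flat predictions.
flat = holt_winters_additive([5.0] * 60, alpha=0.8, beta=0.2,
                             gamma=0.7, period=30, predict=30)
```

The key point for tuning: the model needs at least two full `period`s of data to seed itself, and predictions extrapolate the last level, trend, and seasonal state, so untuned parameters can easily produce wild forecasts.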

However, for the dates starting from 2015-06-01, it gives me the following:

  "key_as_string": "2015-06-01T00:00:00.000Z",
  "key": 1433116800000,
  "doc_count": 0,
  "the_sum": {
    "value": null
  }
},
{
  "key_as_string": "2015-06-02T00:00:00.000Z",
  "key": 1433203200000,
  "doc_count": 0,
  "the_sum": {
    "value": null
  }
},

Is that an error, or is that normal for ES?

Hi @danielmitterdorfer,

I forgot that I had already dug into the issue in this thread, so please kindly reply here if you have a solution, instead of replying in How to build charts from Elasticsearch query.

Great thanks.


How to use Kibana to run ES query?
(Kennedy Kan) #4

@warkolm, thanks for the reminder. As the post now seems more related to Kibana, I have moved it to the Kibana category.


(Mark Walkom) #5

No worries, better to move it than start a new one :slight_smile:


(Daniel Mitterdorfer) #6

Hi @Kennedy_Kan1,

I just ran a very similar query against our benchmark data:

GET rally-2016/_search?search_type=count
{
  "query": {
    "term": {
      "name": {
        "value": "indexing_throughput"
      }
    }
  },
  "aggs": {
    "my_date_histo": {
      "date_histogram": {
        "field": "@timestamp",
        "interval": "day"
      },
      "aggs": {
        "the_sum": {
          "avg": {
            "field": "value"
          }
        },
        "the_movavg": {
          "moving_avg": {
            "buckets_path": "the_sum",
            "window": 90,
            "model": "holt_winters",
            "settings": {
              "type": "add",
              "alpha": 0.8,
              "beta": 0.2,
              "gamma": 0.7,
              "period": 30
            },
            "predict": 30
          }
        }
      }
    }
  }
}

(see the Rally docs on the metrics record format). Although the predictions were bogus (I didn't tune any of the parameters), I did at least get predictions. Maybe @polyfractal has an idea what could be wrong.

Daniel


(Zachary Tong) #7

Hm, since the buckets have a doc_count of 0, the sum will be null (nothing to sum up). Can you gist up the full response somewhere? Are all buckets zero count? I'd verify that the field is correctly named and the rest of the buckets have data, then we can tackle the moving average problem next.


(Kennedy Kan) #8

@polyfractal @danielmitterdorfer
Thanks for your kind replies.
To give a better picture of my index, here are part of my source data and my logstash config, so you can see the fields.
Resources Data

"Datetime","Hostname","Location","Usage","YYMMDD","HHMM","CPU(%)",
"20150101-00:00:00","tested3a","DC6","EAP-Production(Production/EAI)","160101","0000","5.0",

Config File

csv {
    columns => ["Datetime","Hostname","Location","Usage","YYMMDD","HHMM","CPU(%)","NA"]
    separator => ","
    skip_empty_columns => true
    convert => {"CPU(%)" => "float"}
}
date {
    match => [ "Datetime", "yyyyMMdd-HH:mm:ss" ]
}
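As a side check on that config, the `yyyyMMdd-HH:mm:ss` pattern in the date filter corresponds to the following Python `strptime` format. Note that if a row's `Datetime` fails to parse, the logstash date filter tags the event with `_dateparsefailure` and leaves `@timestamp` at the ingestion time, which can plant documents at unexpected dates:

```python
from datetime import datetime

# The logstash pattern "yyyyMMdd-HH:mm:ss" maps to this strptime format.
ts = datetime.strptime("20150101-00:00:00", "%Y%m%d-%H:%M:%S")
print(ts.isoformat())  # 2015-01-01T00:00:00
```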

And here is a successful output. As said above, I have successfully retrieved the data from 2015-01-01 to 2015-05-31 (the last record in my sources ends at 2015-05-31). For the full result, see http://paste.ubuntu.com/18553374/

      "key_as_string": "2015-05-31T00:00:00.000Z",
      "key": 1433030400000,
      "doc_count": 63,
      "the_sum": {
        "value": 5
      },
      "the_movavg": {
        "value": 4.789703098153141
      }
    },

Does this give a better idea, or is any other data needed?

Thanks for any help.


(Zachary Tong) #9

So it looks like the problem is that you have CPU percentage data until 2015-05-31T00:00:00.000Z, but your index has timestamps that go all the way up until 2016-07-05T00:00:00.000Z. So all the buckets after 2015-05-31 have zero document counts.

After 2016-07-05, the moving average agg starts generating predictions. By default the agg "skips" null buckets, so it is using the values from the non-zero buckets... but anchored at the later date.

To fix this, you should probably use a range filter in the search query to restrict the date range, or figure out why only some of your docs have CPU percentage values.
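Putting that advice together, here is a sketch of the same aggregation restricted to the known data range (assuming the index, field names, and parameters used earlier in this thread; on ES 2.x a bool filter works here):

```
GET linux_cpu*/_search?search_type=count
{
  "query": {
    "bool": {
      "filter": {
        "range": {
          "@timestamp": {
            "gte": "2014-12-31T00:00:00.000Z",
            "lte": "2015-05-31T23:59:59.999Z"
          }
        }
      }
    }
  },
  "aggs": {
    "my_date_histo": {
      "date_histogram": {
        "field": "@timestamp",
        "interval": "day"
      },
      "aggs": {
        "the_sum": {
          "avg": { "field": "CPU(%)" }
        },
        "the_movavg": {
          "moving_avg": {
            "buckets_path": "the_sum",
            "window": 90,
            "model": "holt_winters",
            "settings": {
              "type": "add",
              "alpha": 0.8,
              "beta": 0.2,
              "gamma": 0.7,
              "period": 30
            },
            "predict": 30
          }
        }
      }
    }
  }
}
```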


(Kennedy Kan) #10

@polyfractal
Thanks for sharing. However, I actually don't have any records for 2016-07-05 in my sources. My last record stops at 2015-05-31.

"20150531-23:59:45","tested3a","DC6","EAP-Production(Production/EAI)","160101","0000","5.0",

Indeed, I only started the logstash conf file reading the past data on 2016-07-05. Could this be why I have a doc_count on that day?


(Zachary Tong) #11

If you look at the agg response, you'll see there is a single bucket at the end which has some documents:

{
          "key_as_string": "2016-07-05T00:00:00.000Z",
          "key": 1467676800000,
          "doc_count": 288,
          "the_sum": {
            "value": 5
          },
          "the_movavg": {
            "value": 4.789703098153141
          }
        },

Since this is a non-zero bucket, the histogram will automatically "fill" the intermediate range with zero-count buckets to make it a contiguous range. So logstash must have picked up some new data and added it to the index, which is causing this large span in your data.

You can see those docs directly by doing a query for anything with a later date:

GET linux_cpu*/_search
{
  "query": {
    "constant_score": {
      "filter": {
        "range": {
          "@timestamp": {
            "gte": "2015-05-31T00:00:00.000Z"
          }
        }
      }
    }
  }
}
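To make the gap filling concrete, here is a small Python sketch using the dates from this thread (the per-day counts are illustrative, except the stray 2016-07-05 bucket with its doc_count of 288 from the response above). It shows how one late bucket forces the histogram to emit a zero-count bucket for every day in between:

```python
from datetime import date, timedelta

# Daily buckets that actually contain documents: May 2015 (illustrative),
# plus one stray bucket on 2016-07-05 holding the re-ingested docs.
docs = {date(2015, 5, d): 1 for d in range(1, 32)}
docs[date(2016, 7, 5)] = 288

# A date_histogram emits a contiguous range of buckets between the first
# and last key, filling every gap with a doc_count of 0.
first, last = min(docs), max(docs)
buckets = []
day = first
while day <= last:
    buckets.append({"key": day.isoformat(), "doc_count": docs.get(day, 0)})
    day += timedelta(days=1)

empty = sum(1 for b in buckets if b["doc_count"] == 0)
print(len(buckets), empty)  # one stray doc stretches the range by hundreds of empty buckets
```

This is why the moving average only appears to "predict" at the far end: the agg is walking across hundreds of skipped null buckets before it reaches the lone non-empty bucket.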

(Kennedy Kan) #12

@polyfractal
Great thanks for your advice. I am able to make the prediction on the correct date now, after correcting some records in my sources. The output is awesome. Nonetheless, is it possible for this query result to be displayed as a chart in Kibana or Timelion? I have tried both, but neither seems to give a satisfactory output, as neither appears able to display the predicted values.


(Zachary Tong) #13

Yeah, I don't think Kibana or Timelion supports pipeline aggs yet, unfortunately. You could make it work in Timelion by writing a custom function (they are relatively small, self-contained modules), but I don't think it has that functionality out of the box.

I think it's on the roadmap though.


(Kennedy Kan) #14

@polyfractal
Let's leave it then. Many thanks anyway for the answers and guidance along the way.


(Mark Walkom) #15

Pipelines are being worked on in Kibana; I think the UX is proving challenging, though, which is slowing things down.


(Kennedy Kan) #16

@warkolm
Does that mean there is a way of inserting this query into Kibana?
Thanks.


(Mark Walkom) #17

Not yet.


(Kennedy Kan) #18

@warkolm
Sorry to get back to you late.
Though I could not work this out, is it possible to do it by inserting the query while editing the visualization objects in Kibana?

