How to aggregate data by date range


(peter) #1

Hi,

I'm a fresh to ElasticSearch, I created the environment and insert some sample date to test, my data is very simple (just 2 rows from the Apache log):

{
"message" => "199.201.64.129 - - [19/Aug/2015:07:06:25 +0800] "GET /v2/brands/12 HTTP/1.1" 200 1031 "-" "Mode/2.0.7 (iPhone Simulator; iOS 8.3; Scale/2.00)"",
"@version" => "1",
"@timestamp" => "2015-08-18T23:06:25.000Z",
"host" => "zhaoweiweideMacBook-Pro.local",
"path" => "/Users/zhaoweiwei/logs/test/testLog.log",
"clientip" => "199.201.64.129",
"ident" => "-",
"auth" => "-",
"timestamp" => "19/Aug/2015:07:06:25 +0800",
"verb" => "GET",
"request" => "/v2/brands/12",
"httpversion" => "1.1",
"response" => "200",
"bytes" => "1031",
"referrer" => ""-"",
"agent" => ""Mode/2.0.7 (iPhone Simulator; iOS 8.3; Scale/2.00)""
}
{
"message" => "211.144.202.170 - - [20/Aug/2015:07:06:25 +0800] "GET /v2/brands/12 HTTP/1.1" 200 1031 "-" "Mode/2.0.7 (iPhone Simulator; iOS 8.3; Scale/2.00)"",
"@version" => "1",
"@timestamp" => "2015-08-19T23:06:25.000Z",
"host" => "zhaoweiweideMacBook-Pro.local",
"path" => "/Users/zhaoweiwei/logs/test/testLog.log",
"clientip" => "211.144.202.170",
"ident" => "-",
"auth" => "-",
"timestamp" => "20/Aug/2015:07:06:25 +0800",
"verb" => "GET",
"request" => "/v2/brands/12",
"httpversion" => "1.1",
"response" => "200",
"bytes" => "1031",
"referrer" => ""-"",
"agent" => ""Mode/2.0.7 (iPhone Simulator; iOS 8.3; Scale/2.00)""
}

I want to get distinct client ip in a date range, So I execute the below query:

curl -XGET 'http://localhost:9200/logstash-*/_search?search_type=count' -d '
{
"aggregations": {
"distinct_value": {
"terms": {
"field": "clientip"
},
"aggregations": {
"dates_between": {
"range": {
"field": "timestamp",
"ranges": [
{ "format": "dd/MMM/yyyy:HH:mm:ss",
"gte": "20/08/2015:00:00:00",
"lte": "20/08/2015:00:00:00"}
]
}
}
}
}
}
}'

but it always throw exception:
"reason":"ClassCastException[org.elasticsearch.index.fielddata.plain.PagedBytesIndexFieldData cannot be cast to org.elasticsearch.index.fielddata.IndexNumericFieldData]",

How to implement the query? Thanks in advance!


(Colin Goodheart-Smithe) #2

This is because your "timestamp" field is mapped as a String field and not as a Date field. There are two solutions you could use to fix this (both require re-indexing your 2 records into a new index):

  1. Explicitly map the "timestamp" field as a date field when you create the index. You will need to specify the date format in the mapping as you are using a non-standard format.
  2. Use the date filter in your Logstash config to convert the field to a date in Logstash before it gets sent to Elasticsearch

The query you are using looks to be oke so if you implement one of the two solutions above you should get what you are after.

Hope that helps


(peter) #3

Thanks for Reply. I already added
date {
match => [ "timestamp" , "dd/MMM/yyyy:HH:mm:ss Z" ]
timezone => "Asia/Shanghai"
}
in the configuration file, I think it should convert 'timestamp' to date format just like your second proposal.

Currently, I use below query to get the result with no exception, but result looks wrong. First I run the below query:

curl -XGET 'http://localhost:9200/logstash-*/_search?search_type=count' -d '
{
"query": {
"filtered": {
"query": {
"match_all": {}
},
"filter": {
"range": {
"timestamp": {
"gte": "18/08/2015",
"lte": "19/08/2015",
"format": "dd/MMM/yyyy"
}
}
}
}
},
"aggs": {
"group_by_device_id": {
"terms": {
"field": "clientip"
}
}
}
}'

the result is '211.144.202.170', my timestamp is shanghai time, why the result looks like get UTC time value?

If it's UTC time, I changed the above 'range' part to:
"range": {
"timestamp": {
"gte": "19/08/2015:00:00:00",
"lte": "19/08/2015:23:59:59",
"format": "dd/MMM/yyyy:HH:mm:ss"
}
}

It should show the result, but there is nothing... Do you know what's the problem of my query?

Thanks a lot!


(peter) #4

Sorry, I got your point, after I added: 'target => "timestamp"' to parse the timestamp to date, it works fine. Thanks!


(system) #5