Question about date histogram


(peter) #1

Hi,

I want to get the distinct "clientip" count for a specific day (time zone +08:00), so I ran this query:
```json
{
  "query" : {
    "filtered" : {
      "query" : {
        "match_all" : { }
      },
      "filter" : {
        "and" : {
          "filters" : [ {
            "term" : {
              "response" : "200"
            }
          }, {
            "range" : {
              "timestamp" : {
                "from" : "2015-08-28T16:00:00.000Z",
                "to" : "2015-08-29T16:00:00.000Z",
                "include_lower" : true,
                "include_upper" : false
              }
            }
          }, {
            "prefix" : { "request" : "/v2/version" }
          } ]
        }
      }
    }
  },
  "aggregations" : {
    "distinct_client_ip_count" : {
      "cardinality" : {
        "field" : "clientip"
      }
    }
  }
}
```
It works well and the result is 143. But when I change the query to use a date histogram, the result is different (139). My date histogram query is:
```json
{
  "query" : {
    "filtered" : {
      "query" : {
        "match_all" : { }
      },
      "filter" : {
        "and" : {
          "filters" : [ {
            "term" : {
              "response" : "200"
            }
          }, {
            "range" : {
              "timestamp" : {
                "from" : "2015-08-28T16:00:00.000Z",
                "to" : "2015-08-29T16:00:00.000Z",
                "include_lower" : true,
                "include_upper" : false
              }
            }
          }, {
            "prefix" : { "request" : "/v2/version" }
          } ]
        }
      }
    }
  },
  "aggs" : {
    "over_time" : {
      "date_histogram" : {
        "field" : "timestamp",
        "interval" : "day",
        "time_zone" : "+08:00"
      },
      "aggregations" : {
        "distinct_client_ip_count" : {
          "cardinality" : {
            "field" : "clientip"
          }
        }
      }
    }
  }
}
```

What's wrong with my query? The only difference between the two queries is that I nested the cardinality agg as a sub-aggregation of the date histogram.
Thanks in advance!


(peter) #2

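I have resolved the issue. It was caused by the cardinality aggregation itself: in ES, cardinality returns an approximate distinct count, not an exact one, so the two queries can disagree slightly. You can set `precision_threshold` on the cardinality agg to trade memory for accuracy.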
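For example, the aggregation part of the date histogram query might look like this with an explicit threshold (the `query` part stays the same as above). This is a sketch: the value 1000 is an arbitrary illustration, not a recommended setting. Distinct counts below the threshold are expected to be close to exact, and values above 40000 behave the same as 40000.

```json
{
  "aggs" : {
    "over_time" : {
      "date_histogram" : {
        "field" : "timestamp",
        "interval" : "day",
        "time_zone" : "+08:00"
      },
      "aggregations" : {
        "distinct_client_ip_count" : {
          "cardinality" : {
            "field" : "clientip",
            "precision_threshold" : 1000
          }
        }
      }
    }
  }
}
```

A higher threshold uses more memory per bucket, so with many histogram buckets it is worth keeping the value only as high as the expected distinct count requires.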

