*Elasticsearch version*: 1.4.4
*OS version*: RHEL 7.2
*Description of the problem including expected versus actual behavior*:
Hi,
I use Elasticsearch to collect log messages. I am trying to get an overview of
all hosts and the number of log messages each host produced in the last hour.
To get the result I use the Elasticsearch Python client with this query:
{
  "aggs" : {
    "hosts" : {
      "filter" : {
        "range" : {
          "@timestamp" : { "gt" : "now-1h" }
        }
      },
      "aggs" : {
        "logs_per_host" : {
          "terms" : {
            "field" : "logsource",
            "size" : 5000
          }
        }
      }
    }
  },
  "size" : 0
}
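For reference, here is a minimal Python sketch of how a request body like the one above could be built and how the per-host counts could be read from the response. The helper names `build_hosts_query` and `counts_per_host`, the index pattern in the comment, and the sample response values are made up for illustration; the client call is only shown in a comment, on the assumption that the official client's `search(index=..., body=...)` is used.

```python
# Hypothetical helpers for illustration; they are not part of the
# elasticsearch client library.

def build_hosts_query(window="now-1h", size=5000):
    """Build the body: a filter aggregation wrapping a terms aggregation."""
    return {
        "size": 0,  # return no hits, only the aggregation results
        "aggs": {
            "hosts": {
                "filter": {"range": {"@timestamp": {"gt": window}}},
                "aggs": {
                    "logs_per_host": {
                        "terms": {"field": "logsource", "size": size}
                    }
                },
            }
        },
    }


def counts_per_host(response):
    """Map each terms bucket key (the host name) to its doc_count."""
    buckets = response["aggregations"]["hosts"]["logs_per_host"]["buckets"]
    return {bucket["key"]: bucket["doc_count"] for bucket in buckets}


# Assumption: with the official client the body would be sent roughly as
#   response = es.search(index="logstash-*", body=build_hosts_query())
# where the client object `es` and the index pattern are placeholders.

# Made-up response fragment, shaped like a filter + terms aggregation result.
sample_response = {
    "aggregations": {
        "hosts": {
            "doc_count": 12500,
            "logs_per_host": {
                "buckets": [
                    {"key": "host-a", "doc_count": 8000},
                    {"key": "host-b", "doc_count": 4500},
                ]
            },
        }
    }
}

print(counts_per_host(sample_response))
```

Keeping the body in a function makes it easy to change the time window or the bucket limit when comparing the numbers against other sources.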
The field "logsource" contains the unique hostname of each server.
The query runs fine and returns buckets with the doc_count for each host.
The problem is that the count for some hosts seems to be wrong. For one
such host the query counts ~8000 logs in the last hour, but when I check
the same host in Kibana the count is ~4500 logs. I also verified
the count for this host with this ES query:
{
  "aggs" : {
    "host" : {
      "filter" : { "term" : { "logsource" : hostname } },
      "aggs" : {
        "logs_per_hour" : {
          "date_histogram" : {
            "field" : "@timestamp",
            "interval" : "1h",
            "order" : { "_count" : "asc" }
          }
        }
      }
    }
  }
}
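A similar hedged sketch for the verification query above, with the hostname passed in as a parameter. The helper names `build_host_histogram_query` and `hourly_counts`, the host name `web01`, and the sample response values are hypothetical; the sketch assumes each date_histogram bucket carries `key_as_string` and `doc_count`.

```python
# Hypothetical helpers for illustration; they are not part of the
# elasticsearch client library.

def build_host_histogram_query(hostname, interval="1h"):
    """Build the body: a term filter on one host wrapping a date_histogram."""
    return {
        "aggs": {
            "host": {
                "filter": {"term": {"logsource": hostname}},
                "aggs": {
                    "logs_per_hour": {
                        "date_histogram": {
                            "field": "@timestamp",
                            "interval": interval,
                            "order": {"_count": "asc"},
                        }
                    }
                },
            }
        }
    }


def hourly_counts(response):
    """Return (bucket start, doc_count) pairs in the order they were returned."""
    buckets = response["aggregations"]["host"]["logs_per_hour"]["buckets"]
    return [(bucket["key_as_string"], bucket["doc_count"]) for bucket in buckets]


# Made-up response fragment, shaped like a date_histogram result.
sample_response = {
    "aggregations": {
        "host": {
            "doc_count": 35,
            "logs_per_hour": {
                "buckets": [
                    {"key_as_string": "2016-03-01T10:00:00.000Z", "doc_count": 10},
                    {"key_as_string": "2016-03-01T11:00:00.000Z", "doc_count": 25},
                ]
            },
        }
    }
}

print(hourly_counts(sample_response))
```

Note that date_histogram buckets are aligned to full clock hours, while "now-1h" is a rolling window, so the two queries are not expected to produce exactly the same number for the current hour.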
This shows me that the host has ~4000 logs per hour, so the result of the
first query seems to be wrong. I don't know if this is a bug or if the query
is wrong... Some counts from the first query seem to be okay, because the
values match Kibana and the second query.
clintongormley told me on GitHub:
Hi @xoxys
You're using a top-level filter in the first query which is applied AFTER aggs are calculated.
But I don't know what this means. Can someone explain this in a bit more detail?
Thanks