Hi, I am running a query to get the number of API calls in my logs.
When I use this query, the counts come back too low:
{
  "query": {
    "bool": {
      "must": [
        {
          "query_string": {
            "analyze_wildcard": true,
            "query": "*"
          }
        },
        {
          "range": {
            "@timestamp": {
              "gte": 1505102400000,
              "lte": 1505188799999,
              "format": "epoch_millis"
            }
          }
        }
      ],
      "must_not": []
    }
  },
  "size": 0,
  "_source": {
    "excludes": []
  },
  "aggs": {
    "2": {
      "terms": {
        "field": "api_call.keyword",
        "size": 150,
        "order": {
          "_count": "desc"
        }
      },
      "aggs": {
        "3": {
          "terms": {
            "field": "response.keyword",
            "size": 10,
            "order": {
              "_count": "desc"
            }
          }
        }
      }
    }
  }
}
As an example, when I run this, my top endpoint comes back as having 582,734 hits in a day.
When I search my logs for that endpoint, I get 597,076 hits.
My logs consistently show more hits than this query reports for a given endpoint; even the percentage difference between the two stays the same.
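For concreteness, the gap for the top endpoint works out as follows. This is only a small Python sketch of the arithmetic, using the two counts quoted above; the variable names are illustrative.

```python
# Counts for the top endpoint, taken from the figures quoted above.
agg_count = 582_734   # count reported by the terms aggregation
log_count = 597_076   # hits found when searching the logs directly

missing = log_count - agg_count
pct_missing = 100 * missing / log_count

print(f"missing: {missing} ({pct_missing:.2f}% of the log count)")
# missing: 14342 (2.40% of the log count)
```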
At this point you are probably thinking that there is a problem with how I am getting data into Elasticsearch, and that the missing documents are simply not there. However, when I go into Kibana and filter for an endpoint, the number I get is exactly the same as what I see in my logs. In addition, when I run the query, the total number of hits matches exactly the number of hits in my logs.
In the query's response,
"doc_count_error_upper_bound": 0,
"sum_other_doc_count": 0,
appears before each result, so the approximate-counts issue discussed in https://www.elastic.co/guide/en/elasticsearch/reference/5.4/search-aggregations-bucket-terms-aggregation.html#search-aggregations-bucket-terms-aggregation-approximate-counts does not seem to be the cause.
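One way to cross-check these numbers is to sum the bucket counts in the response and compare the sum with the total hit count. The sketch below is a minimal Python version of that check. It assumes the response has already been parsed into a dict, that the outer aggregation is named "2" as in the query above, and that the aggregated field holds a single value per document. The helper name `summarize_terms_response` and the sample response are hypothetical.

```python
def summarize_terms_response(response, agg_name="2"):
    """Compare hits.total with the documents a terms aggregation accounts for."""
    total = response["hits"]["total"]
    # Elasticsearch 7+ wraps the total in an object; 5.x returns a plain integer.
    if isinstance(total, dict):
        total = total["value"]

    agg = response["aggregations"][agg_name]
    in_buckets = sum(bucket["doc_count"] for bucket in agg["buckets"])
    accounted_for = in_buckets + agg.get("sum_other_doc_count", 0)

    return {
        "hits_total": total,
        "in_buckets": in_buckets,
        "accounted_for": accounted_for,
        "not_in_any_bucket": total - accounted_for,
    }


# Made-up response for illustration only; these are not real numbers from the logs.
example = {
    "hits": {"total": 1000},
    "aggregations": {
        "2": {
            "doc_count_error_upper_bound": 0,
            "sum_other_doc_count": 0,
            "buckets": [
                {"key": "/a", "doc_count": 600},
                {"key": "/b", "doc_count": 370},
            ],
        }
    },
}

summary = summarize_terms_response(example)
print(summary)
```

Under the single-value assumption, a nonzero `not_in_any_bucket` would mean that some documents matched the query but were not counted under any value of the aggregated field.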
Does anyone know why I am seeing this discrepancy?