Query hit count and query with aggregations don't match


(Karl Putland) #1

I'd like to run the aggregation over the entire index. What I'm really interested in is how heavily each field is used, but the query with the aggregation returns only a subset of the 2.7m records.

>>> r = es.search(index='cdr-2015.11.17', search_type='count', body={ 'query': {'match_all':{}}}, timeout=120, size=5000000)
>>> r
{u'hits': {u'hits': [], u'total': 2684630, u'max_score': 0.0}, u'_shards': {u'successful': 3, u'failed': 0, u'total': 3}, u'took': 35, u'timed_out': False}
>>> r = es.search(index='cdr-2015.11.17', search_type='count', body={ 'query': {'match_all':{}}, 'aggs': {
                        'missing_origination_egress_packets': {'missing': {'field': u'@fields.origination_egress_packets'}},
                        'missing_centrex_cfaDeactivation_facResult': {'missing': {'field': u'@fields.centrex_cfaDeactivation_facResult'}},
                        'missing_centrex_executiveAssistantOptOut_facResult': {'missing': {'field': u'@fields.centrex_executiveAssistantOptOut_facResult'}}} }, timeout=120, size=5000000)
>>> r['hits']['total']
14017
>>>

(Adrien Grand) #2

Can you check that there were no shard failures and that you did not hit the timeout?
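
Both of those conditions are visible in the response body itself, under the `timed_out` and `_shards` keys shown in the first post. A minimal sketch of the check, assuming `r` is the dict returned by `es.search` (the helper name `partial_result_reasons` is hypothetical, not part of the client):

```python
def partial_result_reasons(response):
    """Return reasons why a search response may cover only part of the index.

    Hypothetical helper: it only inspects the 'timed_out' and '_shards'
    keys that appear in the response printed earlier in this thread.
    """
    reasons = []
    # The search reports that it hit its timeout before finishing.
    if response.get('timed_out'):
        reasons.append('timed_out is True')
    shards = response.get('_shards', {})
    # One or more shards reported a failure.
    if shards.get('failed', 0) > 0:
        reasons.append('%d shard(s) failed' % shards['failed'])
    # Fewer shards answered than were targeted.
    if shards.get('successful', 0) < shards.get('total', 0):
        reasons.append('only %d of %d shards succeeded'
                       % (shards.get('successful', 0), shards.get('total', 0)))
    return reasons


# Response copied from the first (match_all only) query in this thread.
r = {u'hits': {u'hits': [], u'total': 2684630, u'max_score': 0.0},
     u'_shards': {u'successful': 3, u'failed': 0, u'total': 3},
     u'took': 35, u'timed_out': False}

print(partial_result_reasons(r))  # [] : nothing suggests a partial result
```

Running the same check on the response from the query with the aggregations would show whether its lower total comes with a timeout or a shard failure.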


(Karl Putland) #3

On Thu, Nov 19, 2015 at 8:02 AM, Adrien Grand noreply@discuss.elastic.co wrote:

The timeout is set in the call: timeout=120.

The query with the aggregation has a took of ~650.

The total changes from run to run for that query.

