Query hit count and query with aggregations don't match

I'd like to run the aggregations over the entire index. What I'm really interested in is how heavily each field is used, but the query with the aggregations reports a hit total for only a subset of the 2.7m records.

>>> r = es.search(index='cdr-2015.11.17', search_type='count', body={ 'query': {'match_all':{}}}, timeout=120, size=5000000)
>>> r
{u'hits': {u'hits': [], u'total': 2684630, u'max_score': 0.0}, u'_shards': {u'successful': 3, u'failed': 0, u'total': 3}, u'took': 35, u'timed_out': False}
>>> r = es.search(index='cdr-2015.11.17', search_type='count', body={ 'query': {'match_all':{}}, 'aggs': {
                        'missing_origination_egress_packets': {'missing': {'field': u'@fields.origination_egress_packets'}},
                        'missing_centrex_cfaDeactivation_facResult': {'missing': {'field': u'@fields.centrex_cfaDeactivation_facResult'}},
                        'missing_centrex_executiveAssistantOptOut_facResult': {'missing': {'field': u'@fields.centrex_executiveAssistantOptOut_facResult'}}} }, timeout=120, size=5000000)
>>> r['hits']['total']
14017
>>>

Can you check that there were no shard failures and that you did not hit the timeout?
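Both checks can be read straight off the response dict that `es.search` returns. The snippet below is a minimal sketch: `check_search_response` is a hypothetical helper name, and the sample dict is the count-only response printed earlier in this thread, not the aggregation response.

```python
def check_search_response(resp):
    """Summarise the fields that indicate a partial result."""
    shards = resp.get('_shards', {})
    return {
        # shards that reported an error
        'failed_shards': shards.get('failed', 0),
        # True only when every shard answered successfully
        'all_shards_ok': shards.get('successful') == shards.get('total'),
        # True when the search stopped early because of the search timeout
        'timed_out': resp.get('timed_out', False),
        'total_hits': resp['hits']['total'],
    }

# Response of the count-only query, copied from the session above.
count_resp = {
    'hits': {'hits': [], 'total': 2684630, 'max_score': 0.0},
    '_shards': {'successful': 3, 'failed': 0, 'total': 3},
    'took': 35,
    'timed_out': False,
}

report = check_search_response(count_resp)
print(report)
```

Running the same helper on the response of the aggregation query would show whether `timed_out` is set or any shard failed there.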

On Thu, Nov 19, 2015 at 8:02 AM, Adrien Grand noreply@discuss.elastic.co wrote:

timeout=120

The query with the aggregations has a took of ~650 (milliseconds).

The total for that query changes from run to run.
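These numbers fit one possible explanation, offered here as an assumption rather than a confirmed diagnosis: if the `timeout` argument is passed through as the server-side search timeout and a bare number is read as milliseconds, a query needing ~650 ms would be cut off at 120 ms and return only the hits accumulated so far, which would also make the total vary between runs. The sketch below only encodes that arithmetic; `exceeds_time_budget` is a hypothetical helper, not part of the client library.

```python
def exceeds_time_budget(took_ms, timeout_ms):
    """Return True when the reported time is larger than the time budget.

    Both values are assumed to be in milliseconds (an assumption about how
    the timeout value is interpreted, to be confirmed against the docs).
    """
    return took_ms > timeout_ms

TIMEOUT_MS = 120  # the timeout=120 from the calls above, read as ms

count_query_cut_off = exceeds_time_budget(35, TIMEOUT_MS)   # took of the count-only query
agg_query_cut_off = exceeds_time_budget(650, TIMEOUT_MS)    # approximate took of the aggregation query

print(count_query_cut_off, agg_query_cut_off)
```

If this reading is right, the `timed_out` field of the aggregation response should be true, which is the direct way to confirm it.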