Aggregation returning contradictory results

Our Elasticsearch cluster is returning contradictory results. We execute the following query, looking for tenants with fewer than 480 documents (which would indicate a problem on our side):

{
  "aggs": {
    "2": {
      "terms": {
        "field": "tenant",
        "order": {"_count": "desc"},
        "size": 20
      }
    }
  },
  "size": 0,
  "track_total_hits" : true,
  "query": {
    "bool": {
      "filter": [
        {"match_phrase": {"source": "JOB"}},
        {"match_phrase": {"profile": "producer"}},
        {"match_phrase": {"type": "SECTION"}},
        {"match_phrase": {"sectionIndex": "0"}},
        {
          "range": {
            "timestamp": {
              "gte": "2020-09-16T08:00:00.000Z",
              "lte": "2020-09-17T08:00:00.000Z",
              "format": "strict_date_optional_time"
            }
          }
        }
      ]
    }
  }
}

Which returns:

{
  "took" : 61,
  "timed_out" : false,
  "_shards" : {
    "total" : 36,
    "successful" : 36,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : {
      "value" : 324476,
      "relation" : "eq"
    },
    "max_score" : null,
    "hits" : [ ]
  },
  "aggregations" : {
    "2" : {
      "doc_count_error_upper_bound" : 609,
      "sum_other_doc_count" : 308229,
      "buckets" : [
        {
          "key" : "domain1.net",
          "doc_count" : 1440
        },
        {
          "key" : "domain2.pe",
          "doc_count" : 1440
        },
        {
          "key" : "domain3.com",
          "doc_count" : 1440
        },
        {
          "key" : "www.domain4.net",
          "doc_count" : 1440
        },
        {
          "key" : "www.domain5.com",
          "doc_count" : 1440
        },
        {
          "key" : "www.domain6.es",
          "doc_count" : 1440
        },
        {
          "key" : "domain7.com",
          "doc_count" : 960
        },
        {
          "key" : "m.domain8.ba",
          "doc_count" : 960
        },
        {
          "key" : "domain9.com",
          "doc_count" : 960
        },
        {
          "key" : "www.domain10.cl",
          "doc_count" : 960
        },
        {
          "key" : "www.domain11.cl",
          "doc_count" : 960
        },
        {
          "key" : "www.domain12.com",
          "doc_count" : 960
        },
        {
          "key" : "www.domain13.net",
          "doc_count" : 959
        },
        {
          "key" : "www.domain14.com",
          "doc_count" : 158
        },
        {
          "key" : "domain15.fr",
          "doc_count" : 156
        },
        {
          "key" : "domain16.com",
          "doc_count" : 121
        },
        {
          "key" : "domain17.com",
          "doc_count" : 117
        },
        {
          "key" : "pre.domain18.com",
          "doc_count" : 116
        },
        {
          "key" : "domain19.com.br",
          "doc_count" : 110
        },
        {
          "key" : "m.domain20.com.py",
          "doc_count" : 110
        }
      ]
    }
  }
}

So we pick one of these low-count tenants and add it to the filter:

 {"match_phrase": {"tenant": "domain15.fr"}},

And we get:

{
  "took" : 12,
  "timed_out" : false,
  "_shards" : {
    "total" : 36,
    "successful" : 36,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : {
      "value" : 480,
      "relation" : "eq"
    },
    "max_score" : null,
    "hits" : [ ]
  },
  "aggregations" : {
    "2" : {
      "doc_count_error_upper_bound" : 0,
      "sum_other_doc_count" : 0,
      "buckets" : [
        {
          "key" : "domain15.fr",
          "doc_count" : 480
        }
      ]
    }
  }
}

The two responses contradict each other: the aggregation reported 156 documents for domain15.fr, but filtering on that tenant alone returns 480. This happens with every domain we have tried, every time. It's reproducible and consistent.

Any idea?

Your first result has doc_count_error_upper_bound and sum_other_doc_count above zero, which means the bucket counts are approximate. With a terms aggregation, each shard returns only its own top terms; a term that doesn't make the top list on every shard gets its counts from the missing shards dropped, so it can be significantly under-counted. See "Document counts are approximate" in the terms aggregation documentation for more context.

Increasing the size parameter will improve accuracy, but can hurt performance. In this case, try setting the shard_size parameter, or consider using a composite aggregation instead.
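For example, a shard_size of 500 (an arbitrary value to tune; it only needs to comfortably exceed the number of distinct tenants per shard) asks each shard to return its top 500 terms while still reporting only 20 buckets:

{
  "aggs": {
    "2": {
      "terms": {
        "field": "tenant",
        "order": {"_count": "desc"},
        "size": 20,
        "shard_size": 500
      }
    }
  },
  "size": 0
}

Alternatively, a composite aggregation pages through all tenants with exact counts (paginate by passing the returned after_key back in an "after" parameter):

{
  "aggs": {
    "tenants": {
      "composite": {
        "size": 100,
        "sources": [
          {"tenant": {"terms": {"field": "tenant"}}}
        ]
      }
    }
  },
  "size": 0
}

Note that composite buckets come back in key order, not by doc_count, so you would filter for low counts client-side.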

Best

Clear, thanks! A bummer, though.