Why does aggregation contain ID's that don't exist in the query results?

To gather data on how many documents have been prepared by some employee in a given time period, I have written this query:

{
  "from": 0,
  "size": 0,
  "sort": [],
  "query": {
    "bool": {
      "must": [
        {
          "nested": {
            "path": "preparers",
            "query": {
              "terms": {
                "preparers.employee.id": [
                  -1097
                ]
              }
            }
          }
        },
        {
          "range": {
            "regDate": {
              "gte": "22.02.2023.",
              "format": "dd.MM.yyyy."
            }
          }
        },
        {
          "range": {
            "regDate": {
              "lte": "24.02.2023.",
              "format": "dd.MM.yyyy."
            }
          }
        }
      ]
    }
  },
  "aggs": {
    "body": {
      "nested": {
        "path": "preparers"
      },
      "aggs": {
        "body": {
          "terms": {
            "field": "preparers.employee.id",
            "size": 1000,
            "order": {
              "_count": "desc"
            },
            "min_doc_count": 0
          }
        }
      }
    }
  }
}

For the chosen employee the query returns correct data. However, it also returns aggregations for ID's that aren't even in the query results. A sample:

{
    "took": 6,
    "timed_out": false,
    "_shards": {
        "total": 3,
        "successful": 3,
        "skipped": 0,
        "failed": 0
    },
    "hits": {
        "total": 6,
        "max_score": 0,
        "hits": []
    },
    "aggregations": {
        "body": {
            "doc_count": 13,
            "body": {
                "doc_count_error_upper_bound": 0,
                "sum_other_doc_count": 0,
                "buckets": [
                    {
                        "key": -1097,
                        "doc_count": 6
                    },
                    {
                        "key": -1208,
                        "doc_count": 3
                    },
                    {
                        "key": -1210,
                        "doc_count": 0
                    },
                    {
                        "key": -1602,
                        "doc_count": 0
                    },
                    {
                        "key": -1233,
                        "doc_count": 0
                    },
                    {
                        "key": 204382,
                        "doc_count": 0
                    }
                ]
            }
        }
    }
}

The ID's with doc_count 0 don't exist in the results. Have I written this query wrong, or why are those non-existent ID's added in the aggregation?

Hi @A_K3

I suspected that document has inside "preparers" a list employee id with values -1097, -1208, -1210.

When you filter by the ID of employee x, the document is returned with all employee ids inside preparers and therefore in its aggregation the final result has several ids.

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.