Wrong doc_count on Aggregations

Request:

{
  "query": {
    "bool": {
      "should": [
        {
          "match_all": {}
        }
      ],
      "must": [
        {
          "nested": {
            "path": "invitations",
            "query": {
              "terms": {
                "invitations.locationName": [
                  "rue de Lemaire"
                ]
              }
            }
          }
        }
      ]
    }
  },
  "aggs": {
    "invitations.locationName": {
      "nested": {
        "path": "invitations"
      },
      "aggs": {
        "invitations.locationName_nested": {
          "terms": {
            "field": "invitations.locationName",
            "size": 0
          }
        }
      }
    }
  },
  "from": 0,
  "size": 10
}

Response:

"_shards": {
  "total": 5,
  "successful": 5,
  "failed": 0
},
"hits": {
  "total": 108,
  ...
},
"aggregations": {
  "invitations.locationName": {
    "doc_count": 369,
    "invitations.locationName_nested": {
      "doc_count_error_upper_bound": 0,
      "sum_other_doc_count": 0,
      "buckets": [
        {
          "key": "rue de Lemaire",
          "doc_count": 123
        },
        ...

=> The hit count doesn't match the aggregation:
hits.total = 108, but the bucket for the key "rue de Lemaire" has doc_count = 123.
The other keys of this aggregation show the same mismatch.
Why does this happen, and how can I fix it?
Thanks!

I don't have this issue with other aggs that are not nested.

Could it be because 369 and 123 are counts of nested documents, while 108 is the number of root-level docs?
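
To make the difference concrete, here is a made-up root document (purely illustrative, not taken from the index in question, and assuming invitations is mapped as nested with a locationName field as in the request above). It would count as 1 hit, but it holds 3 nested docs, 2 of which carry the key "rue de Lemaire":

```json
{
  "invitations": [
    { "locationName": "rue de Lemaire" },
    { "locationName": "rue de Lemaire" },
    { "locationName": "another street" }
  ]
}
```

With this single document, hits.total would go up by 1, the doc_count of the nested agg by 3, and the "rue de Lemaire" bucket by 2. A handful of such documents would be enough to explain a gap like 108 vs 123.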

Thank you,
if you're right, what should I add to the query to get the root doc count instead of the nested one?

I got it!
Source: http://stackoverflow.com/questions/27553916/how-to-return-the-count-of-unique-documents-by-using-elasticsearch-aggregation

Original:

"aggs": {
    "invitations.locationName_nested": {
        "terms": {
            "field": "invitations.locationName",
            "size": 0
        }
    }
}

Modified:

"aggs": {
    "invitations.locationName_nested": {
        "terms": {
            "field": "invitations.locationName",
            "size": 0
        },
        "aggs": {
            "top_reverse_nested": {
                "reverse_nested": {}
            }
        }
    }
}

And in the response I have:

"buckets": [
{
    "key": "rue de Lemaire",
    "doc_count": 123,
    "top_reverse_nested": {
        "doc_count": 108 <---- HERE 
    }
}

When you use the nested agg, you are counting nested docs, based on the properties of those nested docs.
If you want the counts to be on root-level docs instead, you can use the copy_to parameter in your mapping to copy the nested terms into a root-level field and then run your aggs on that field.
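
A rough sketch of what such a mapping might look like. Everything not in the original request is an assumption here: the type name "event", the root-level field name "invitationLocationNames", and the "string" / "not_analyzed" settings (guessed from the 1.x/2.x-style "size": 0 in the terms agg and from the bucket keys coming back as whole phrases). On newer versions the field type would be "keyword" and the type name level would go away.

```json
{
  "mappings": {
    "event": {
      "properties": {
        "invitationLocationNames": {
          "type": "string",
          "index": "not_analyzed"
        },
        "invitations": {
          "type": "nested",
          "properties": {
            "locationName": {
              "type": "string",
              "index": "not_analyzed",
              "copy_to": "invitationLocationNames"
            }
          }
        }
      }
    }
  }
}
```

Since copy_to is applied at index time, documents that are already indexed would need to be reindexed before the root-level field is populated.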

OK, so there are 2 solutions:

  • using the reverse_nested agg in the query (may have performance issues with large data sets?)
  • using "copy_to" in the mapping at index time and aggregating directly on the root level (may introduce some limitations for complex aggregation queries?).
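
For the second option, the aggregation would no longer need the nested wrapper. A minimal sketch, assuming a hypothetical root-level field named "invitationLocationNames" that the mapping fills via copy_to from invitations.locationName (that field name is made up for illustration and is not part of the original mapping):

```json
{
  "aggs": {
    "invitationLocationNames": {
      "terms": {
        "field": "invitationLocationNames",
        "size": 0
      }
    }
  }
}
```

Each bucket's doc_count should then be a root-level document count, because a root doc holds a given term only once in that field no matter how many of its invitations share it. The trade-off is that the root-level field no longer knows which nested invitation a value came from, so aggregations that combine several properties of the same invitation still need the nested / reverse_nested approach.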

Thanks for the help!

Good point re reverse_nested!