Inconsistent Aggs When Using Nested Docs

Elasticsearch version: 5.2.1

JVM version: 1.8.0_121

OS version: Debian 8

I'm using a single shard index (no replicas).

For some reason my query's aggs are giving me numbers that are either exactly correct, or greater than the query's doc count.

Since I'm using an index that uses multiple levels of nested docs, I have to use an "reverse_nested" agg to get the doc count when filtering by a nested doc value.

Any ideas why this would happen? (I can provide a copy of the full db if needed, this isn't prod/sensitive info).

#Query

{
  "size": 0,
  "_source": false,
  "query": {
    "bool": {
      "filter": [
        {
          "nested": {
            "path": "product",
            "query": {
              "bool": {
                "filter": [
                  {
                    "term": {
                      "product.status_id": 1
                    }
                  },
                  {
                    "nested": {
                      "path": "product.brand",
                      "query": {
                        "bool": {
                          "filter": [
                            {
                              "term": {
                                "product.brand.id": 23107 //<-- additional product.brand.id filter
                              }
                            }
                          ]
                        }
                      }
                    }
                  }
                ]
              }
            }
          }
        }
    ]
  }
},
"aggs": {
  "brands": {
    "global": {}, //<---- global scope
      "aggs": {
        "brands": {
          "filter": {
            "bool": {
              "filter": [
                {
                  "nested": {
                    "path": "product",
                    "query": {
                      "bool": {
                        "filter": [
                          {
                            "term": {
                              "product.status_id": 1 //<-- same filter as above minus product.brand.id
                            }
                          }
                        ]
                      }
                    }
                  }
                }
              ]
            }
          },
          "aggs": {
            "brands": {
              "nested": {
                "path": "product.brand"
              },
              "aggs": {
                "brands": {
                  "terms": {
                    "field": "product.brand.id",
                    "size": 5
                  },
                  "aggs": {
                    "brands": {
                      "reverse_nested": {} //<-- get root doc count
                    }
                  }
                }
              }
            }
          }
        }
      }
    }
  }
}

#Results:

{
  "took": 226,
  "timed_out": false,
  "_shards": {
    "total": 1,
    "successful": 1,
    "failed": 0
  },
  "hits": {
    "total": 2890, // <-- doc count
    "max_score": 0,
    "hits": []
  },
  "aggregations": {
    "brands": {
      "doc_count": 1354323,
      "brands": {
        "doc_count": 442506,
        "brands": {
          "doc_count": 500478,
          "brands": {
            "doc_count_error_upper_bound": 0,
            "sum_other_doc_count": 465820,
            "buckets": [
              {
                "key": 11317,
                "doc_count": 9877,
                "brands": {
                  "doc_count": 9877
                }
              },
              {
                "key": 28221,
                "doc_count": 6805,
                "brands": {
                  "doc_count": 6805
                }
              },
              {
                "key": 40330,
                "doc_count": 6243,
                "brands": {
                  "doc_count": 6243
                }
              },
              {
                "key": 23107, //<--- same product.brand.id as used in main query
                "doc_count": 6132,
                "brands": {
                  "doc_count": 2948 //<--- main query doc count is 2890
                }
              },
              {
                "key": 23083,
                "doc_count": 5601,
                "brands": {
                  "doc_count": 5599
                }
              }
            ]
          }
        }
      }
    }
  }
}

The difference is due to your query. First your query filter by product.status_id = 1 and then product.brand.id = 23107 . This result in 2890 docs returned. In your aggs you only specified filter product.status_id = 1.

Right, but I'm doing an agg on the product.brand.id which creates buckets with counts, then the corresponding bucket for product.brand.id = 23107 shows a count of 2948 even though I rejoin the agg to the doc root with 'reverse_nested'.

Can you try removing the product.brand agg?

From:

"aggs": {
            "brands": {
              "nested": {
                "path": "product.brand"
              },
              "aggs": {
                "brands": {
                  "terms": {
                    "field": "product.brand.id",
                    "size": 5
                  },

To

          "aggs": {
            "brands": {
              "terms": {
                "field": "product.brand.id",
                "size": 5
              },

That query wouldn't work since we're dealing with nested documents. The agg would return nothing.

I'm able to get my sample to work and it's nested doc. It would be easier if you can provide some sample data and your index mapping.


My example:

......
......

"aggs": {
  "all": {
    "global": {},
                  "aggs": {
        "price": {
          "filter": {
            "bool": {
              "filter": [
                {
                  "nested": {
                    "path": "user",
                    "query": {
                      "bool": {
                        "filter": [
                          {
                            "term": {
                              "user.price": 1 //<-- same filter as above minus cost.id
                            }
                          }
                        ]
                      }
                    }
                  }
                }
              ]
            }
          },
          "aggs": {
            "brands": {
              "nested": {
                "path": "cost"
              },
              
                  "aggs": {
                    "brands": {
                      "reverse_nested": {} //<-- get root doc count
                    }
              }
            }
          }
                  
        }}}}
Output:

{
  "took": 3,
  "timed_out": false,
  "_shards": {
    "total": 5,
    "successful": 5,
    "failed": 0
  },
  "hits": {
    "total": 3,
    "max_score": 0,
    "hits": [

.....
....

  "aggregations": {
    "all": {
      "doc_count": 6,
      "price": {
        "doc_count": 4,
        "brands": {
          "doc_count": 7,
          "brands": {
            "doc_count": 3
          }
        }
      }
    }
  }

An ElasticSearch contributor solved my issue here: https://github.com/elastic/elasticsearch/issues/23319

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.