Elasticsearch top_hits aggregation result and doc_count are different

Query

GET /someindex/_search
{
   "size": 0,
   "query": {
      "ids": {
         "types": [],
         "values": ["08a2","08a3","03a2","03a3","84a1"]
      }
   },
   "aggregations": {
      "498": {
         "terms": {
            "field": "holderInfo.raw",
            "size": 50
         },
         "aggregations": {
            "tops": {
               "top_hits": {
                  "_source": {
                     "includes": ["uid"]
                  }
               }
            }
         }
      }
   }
}

Result

{
   ...
   "hits": {
      "total": 5,
      "max_score": 0,
      "hits": []
   },
   "aggregations": {
      "498": {
         "doc_count_error_upper_bound": 0,
         "sum_other_doc_count": 0,
         "buckets": [
            {
               "key": "MATSUSHITA ELECTRIC INDUSTRIAL",
               "doc_count": 5,
               "tops": {
                  "hits": {
                     "total": 5,
                     "max_score": 1,
                     "hits": [
                        {
                           "_index": "someindex",
                           "_id": "03a3",
                           "_score": 1,
                           "_source": {
                              "uid": "03a3"
                           }
                        },
                        {
                           "_index": "someindex",
                           "_id": "08a2",
                           "_score": 1,
                           "_source": {
                              "uid": "08a2"
                           }
                        },
                        {
                           "_index": "someindex",
                           "_id": "84a1",
                           "_score": 1,
                           "_source": {
                              "uid": "84a1"
                           }
                        }
                     ]
                  }
               }
            }
         ]
      }
   }
}

"08a2", "08a3", "03a2", "03a3" and "84a1" each clearly have 'MATSUSHITA ELECTRIC INDUSTRIAL' in the holderInfo.raw field.

So doc_count is 5, as expected, but the top_hits result contains only "03a3", "08a2", and "84a1"; "08a3" and "03a2" are missing.

Query

GET /someindex/_search
{
   "size": 0,
   "query": {
      "ids": {
         "types": [],
         "values": ["08a2","08a3","03a2","03a3","84a1"]
      }
   },
   "aggregations": {
      "498": {
         "terms": {
            "script": {
               "inline": "doc['holderInfo.raw'].value"
            },
            "size": 50
         }
      }
   }
}

Result

{
   ...
   "hits": {
      "total": 5,
      "max_score": 0,
      "hits": []
   },
   "aggregations": {
      "498": {
         "doc_count_error_upper_bound": 0,
         "sum_other_doc_count": 0,
         "buckets": [
            {
               "key": "MATSUSHITA ELECTRIC INDUSTRIAL",
               "doc_count": 3
            }
         ]
      }
   }
}

In addition, when I aggregate with a script instead of the field, doc_count is 3, so two documents are missing there as well.

I'd like to know why some uids are missing.

I have to use Elasticsearch 2.2. I'd like to know whether this is a bug in that old version or a mistake on my side.

Thanks!

I'm not sure whether it is the same in 2.2, but the top_hits metric returns only the top three documents per bucket by default (its size parameter defaults to 3).
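If that default is what you are hitting, something like the following might be worth trying. It is only a sketch of your first query with "size" added inside top_hits; I have not run it against 2.2, and the value 5 is just chosen to match the five ids in the query.

```
GET /someindex/_search
{
   "size": 0,
   "query": {
      "ids": {
         "types": [],
         "values": ["08a2","08a3","03a2","03a3","84a1"]
      }
   },
   "aggregations": {
      "498": {
         "terms": {
            "field": "holderInfo.raw",
            "size": 50
         },
         "aggregations": {
            "tops": {
               "top_hits": {
                  "size": 5,
                  "_source": {
                     "includes": ["uid"]
                  }
               }
            }
         }
      }
   }
}
```

Note that this only concerns the top_hits output; it would not explain the doc_count of 3 in the script-based terms aggregation.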

This is very much past its EOL and definitely no longer supported. Unless someone spots an obvious problem in your syntax that solves this, there's not much chance of anyone being able to identify it as a bug.

