How to aggregate over or access top_hits results in elasticsearch?

Mojtaba_Tabatabaeipo · September 28, 2017, 2:41pm

Here is an example:

doc = {
    'model': 'v1',
    'dataset': 1,
    'accuracy': 0.2,
    'timestamp': "2017-09-28T14:15:08.817638",
}

doc = {
    'model': 'v1',
    'dataset': 2,
    'accuracy': 0.8,
    'timestamp': "2017-09-28T14:15:08.972043",
}
doc = {
    'model': 'v1',
    'dataset': 1,
    'accuracy': 0.5,
    'timestamp': "2017-09-28T14:15:29.007065",
}

For each dataset, I want to get the latest accuracy and then get the average of all the accuracies. So, in the above example, for dataset 1 I must only look at the document with "timestamp": "2017-09-28T14:15:29.007065". Hence the average of accuracies would be (0.8 + 0.5) / 2 = 0.65.

I can get the latest document for each datasets using aggregation using the following:

"aggs": {
     "group_by_dataset_id":{
      "terms": {
       "field": "dataset"
      },
      "aggs": {
        "most_recent": {
          "top_hits": {
            "sort":[
              {
                "timestamp":{
                 "order": "desc"
                }
              }
            ],
            "size": 1
          }
        }
      }
   }
 }

This gives me:

"aggregations": {
    "group_by_dataset_id": {
      "doc_count_error_upper_bound": 0,
      "sum_other_doc_count": 0,
      "buckets": [
        {
          "key": 1,
          "doc_count": 2,
          "most_recent": {
            "hits": {
              "total": 2,
              "max_score": null,
              "hits": [
                {
                  "_index": "model_metrics",
                  "_type": "type",
                  "_id": "AV7Iacbya3PBMkqcVi8g",
                  "_score": null,
                  "_source": {
                    "model": "v1",
                    "dataset": 1,
                    "accuracy": 0.5,
                    "timestamp": "2017-09-28T14:15:29.007065"
                  },
                  "sort": [
                    1506608129007
                  ]
                }
              ]
            }
          }
        },
        {
          "key": 2,
          "doc_count": 1,
          "most_recent": {
            "hits": {
              "total": 1,
              "max_score": null,
              "hits": [
                {
                  "_index": "model_metrics",
                  "_type": "type",
                  "_id": "AV7IaXiua3PBMkqcVi8f",
                  "_score": null,
                   "_source": {
                    "model": "v1",
                    "dataset": 2,
                    "accuracy": 0.8,
                    "timestamp": "2017-09-28T14:15:08.972043"
                  },
                  "sort": [
                    1506608108972
                  ]
                }
             ]
            }
          }
        }
      ]
    }
  }

How should I now aggregate over the accuracy field of the hits? Or how can I access any fields in the hits in each bucket? Do you know another way to get the result that I want maybe by using other aggregation types thatn top_hits?

system · October 26, 2017, 2:42pm

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.

Topic		Replies	Views
Get the latest document version and aggregate the results Elasticsearch	1	1425	July 5, 2017
Aggregation over top hits Elasticsearch	1	183	July 23, 2022
Top-hits on aggregations Elasticsearch	2	357	July 9, 2021
Sum results from top_hits aggregation Elasticsearch	1	616	May 2, 2018
Return the top k hits from each bucket after aggregation? Elasticsearch	3	480	February 27, 2021

How to aggregate over or access top_hits results in elasticsearch?

Related topics