How to aggregate over or access top_hits results in elasticsearch?

Here is an example:

doc = {
    'model': 'v1',
    'dataset': 1,
    'accuracy': 0.2,
    'timestamp': "2017-09-28T14:15:08.817638",
}

doc = {
    'model': 'v1',
    'dataset': 2,
    'accuracy': 0.8,
    'timestamp': "2017-09-28T14:15:08.972043",
}
doc = {
    'model': 'v1',
    'dataset': 1,
    'accuracy': 0.5,
    'timestamp': "2017-09-28T14:15:29.007065",
}

For each dataset, I want to get the latest accuracy and then get the average of all the accuracies. So, in the above example, for dataset 1 I must only look at the document with "timestamp": "2017-09-28T14:15:29.007065". Hence the average of accuracies would be (0.8 + 0.5) / 2 = 0.65.

I can get the latest document for each datasets using aggregation using the following:

"aggs": {
     "group_by_dataset_id":{
      "terms": {
       "field": "dataset"
      },
      "aggs": {
        "most_recent": {
          "top_hits": {
            "sort":[
              {
                "timestamp":{
                 "order": "desc"
                }
              }
            ],
            "size": 1
          }
        }
      }
   }
 }

This gives me:

"aggregations": {
    "group_by_dataset_id": {
      "doc_count_error_upper_bound": 0,
      "sum_other_doc_count": 0,
      "buckets": [
        {
          "key": 1,
          "doc_count": 2,
          "most_recent": {
            "hits": {
              "total": 2,
              "max_score": null,
              "hits": [
                {
                  "_index": "model_metrics",
                  "_type": "type",
                  "_id": "AV7Iacbya3PBMkqcVi8g",
                  "_score": null,
                  "_source": {
                    "model": "v1",
                    "dataset": 1,
                    "accuracy": 0.5,
                    "timestamp": "2017-09-28T14:15:29.007065"
                  },
                  "sort": [
                    1506608129007
                  ]
                }
              ]
            }
          }
        },
        {
          "key": 2,
          "doc_count": 1,
          "most_recent": {
            "hits": {
              "total": 1,
              "max_score": null,
              "hits": [
                {
                  "_index": "model_metrics",
                  "_type": "type",
                  "_id": "AV7IaXiua3PBMkqcVi8f",
                  "_score": null,
                   "_source": {
                    "model": "v1",
                    "dataset": 2,
                    "accuracy": 0.8,
                    "timestamp": "2017-09-28T14:15:08.972043"
                  },
                  "sort": [
                    1506608108972
                  ]
                }
             ]
            }
          }
        }
      ]
    }
  }

How should I now aggregate over the accuracy field of the hits? Or how can I access any fields in the hits in each bucket? Do you know another way to get the result that I want maybe by using other aggregation types thatn top_hits?

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.