Here is an example:
doc = {
'model': 'v1',
'dataset': 1,
'accuracy': 0.2,
'timestamp': "2017-09-28T14:15:08.817638",
}
doc = {
'model': 'v1',
'dataset': 2,
'accuracy': 0.8,
'timestamp': "2017-09-28T14:15:08.972043",
}
doc = {
'model': 'v1',
'dataset': 1,
'accuracy': 0.5,
'timestamp': "2017-09-28T14:15:29.007065",
}
For each dataset, I want to get the latest accuracy and then get the average of all the accuracies. So, in the above example, for dataset 1 I must only look at the document with "timestamp": "2017-09-28T14:15:29.007065". Hence the average of accuracies would be (0.8 + 0.5) / 2 = 0.65.
I can get the latest document for each datasets using aggregation using the following:
"aggs": {
"group_by_dataset_id":{
"terms": {
"field": "dataset"
},
"aggs": {
"most_recent": {
"top_hits": {
"sort":[
{
"timestamp":{
"order": "desc"
}
}
],
"size": 1
}
}
}
}
}
This gives me:
"aggregations": {
"group_by_dataset_id": {
"doc_count_error_upper_bound": 0,
"sum_other_doc_count": 0,
"buckets": [
{
"key": 1,
"doc_count": 2,
"most_recent": {
"hits": {
"total": 2,
"max_score": null,
"hits": [
{
"_index": "model_metrics",
"_type": "type",
"_id": "AV7Iacbya3PBMkqcVi8g",
"_score": null,
"_source": {
"model": "v1",
"dataset": 1,
"accuracy": 0.5,
"timestamp": "2017-09-28T14:15:29.007065"
},
"sort": [
1506608129007
]
}
]
}
}
},
{
"key": 2,
"doc_count": 1,
"most_recent": {
"hits": {
"total": 1,
"max_score": null,
"hits": [
{
"_index": "model_metrics",
"_type": "type",
"_id": "AV7IaXiua3PBMkqcVi8f",
"_score": null,
"_source": {
"model": "v1",
"dataset": 2,
"accuracy": 0.8,
"timestamp": "2017-09-28T14:15:08.972043"
},
"sort": [
1506608108972
]
}
]
}
}
}
]
}
}
How should I now aggregate over the accuracy field of the hits? Or how can I access any fields in the hits in each bucket? Do you know another way to get the result that I want maybe by using other aggregation types thatn top_hits?