My index contains many documents, each of which has several versions. For example:
{"doc_id": 13, "version": 1, "text": "bar"}
{"doc_id": 13, "version": 2, "text": "bar"}
{"doc_id": 13, "version": 3, "text": "bar"}
{"doc_id": 14, "version": 1, "text": "foo"}
{"doc_id": 14, "version": 2, "text": "bar"}
I want to get the latest version of each document and then run a terms aggregation over only those latest versions.
I tried using a top_hits aggregation to retrieve the latest versions:
{
  "size": 0,
  "aggs": {
    "doc_id_groups": {
      "terms": {
        "field": "doc_id",
        "size": "0"
      },
      "aggs": {
        "docs": {
          "top_hits": {
            "size": 1,
            "sort": {
              "version": {
                "order": "desc"
              }
            }
          },
          "aggs": {
            "text_agg": {
              "terms": { "field": "text" }
            }
          }
        }
      }
    }
  }
}
But I can't use the text_agg aggregation here, because top_hits doesn't support sub-aggregations.
I'm expecting this response: "buckets": [ { "key": "bar", "doc_count": 2 }]
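To make the semantics I'm after concrete, here is a minimal sketch in plain Python over the sample documents above. It is not Elasticsearch code and not a proposed solution: `latest_versions` and `terms_buckets` are hypothetical helper names, and the sketch assumes each `text` value is treated as a single unanalyzed term.

```python
from collections import Counter

# Sample documents copied from the question.
docs = [
    {"doc_id": 13, "version": 1, "text": "bar"},
    {"doc_id": 13, "version": 2, "text": "bar"},
    {"doc_id": 13, "version": 3, "text": "bar"},
    {"doc_id": 14, "version": 1, "text": "foo"},
    {"doc_id": 14, "version": 2, "text": "bar"},
]


def latest_versions(documents):
    """Keep only the highest-version document for each doc_id (hypothetical helper)."""
    latest = {}
    for doc in documents:
        current = latest.get(doc["doc_id"])
        if current is None or doc["version"] > current["version"]:
            latest[doc["doc_id"]] = doc
    return list(latest.values())


def terms_buckets(documents, field):
    """Count values of `field` and shape them like terms-aggregation buckets."""
    counts = Counter(doc[field] for doc in documents)
    return [{"key": key, "doc_count": count} for key, count in counts.most_common()]


# Step 1: reduce to the latest version per doc_id; step 2: aggregate on "text".
buckets = terms_buckets(latest_versions(docs), "text")
print(buckets)
```

Note that "foo" does not appear in the result: it only occurs in version 1 of document 14, which is superseded by version 2. That is exactly what a plain terms aggregation over all versions would get wrong.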
I guess retrieving the ids and then aggregating them on the client would be a very heavy operation.
Maybe scripting could help?
Update: I found a rather inflexible workaround. See here: http://stackoverflow.com/a/39788948/4769188
But I'm still looking for a better solution.