Hey everyone! I'm having a hard time with the following issue. Suppose I have the following three documents:
{
  "entity_id": "A",
  "stringField1": "SF1A",
  "stringField2": "SF2",
  "numericField": 10,
  "updated_at": "2020-10-18"
}
{
  "entity_id": "A",
  "stringField1": "SF1A",
  "stringField2": "SF2",
  "numericField": 1000,
  "updated_at": "2020-11-18"
}
{
  "entity_id": "B",
  "stringField1": "SF1B",
  "stringField2": "SF2",
  "numericField": 1000,
  "updated_at": "2020-11-18"
}
I'd like to run an aggregation over numericField values, for example for a specific value of stringField1/stringField2, but ONLY considering the "newest" document for a given entity_id.
I read about field collapsing here, and it worked when I wanted to do a term query.
If, for example, I wanted to aggregate for stringField1 == "SF1A" and stringField1 == "SF1B", I could do:
"aggs": {
  "first_agg": {
    "aggs": {
      "my_numericfield_agg": {
        "sum": {
          "field": "numericField"
        }
      }
    },
    "filter": {
      "term": {
        "stringField1": "SF1A"
      }
    }
  },
  "second_agg": {
    "aggs": {
      "my_numericfield_agg": {
        "sum": {
          "field": "numericField"
        }
      }
    },
    "filter": {
      "term": {
        "stringField1": "SF1B"
      }
    }
  }
}
But this would consider all (old and new) documents for the same entity_id.
Is there a way to run this aggregation while considering only the latest (by "updated_at") document per "entity_id"? The reduction to the latest document has to happen before the sum is computed.
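To make the expected result concrete, here is a minimal Python sketch of the semantics I'm after, computed client-side over the three sample documents. This is not an Elasticsearch query, and the helper names (latest_per_entity, sum_where) are made up for illustration only.

```python
# Sample documents from above (client-side illustration only).
DOCS = [
    {"entity_id": "A", "stringField1": "SF1A", "stringField2": "SF2",
     "numericField": 10, "updated_at": "2020-10-18"},
    {"entity_id": "A", "stringField1": "SF1A", "stringField2": "SF2",
     "numericField": 1000, "updated_at": "2020-11-18"},
    {"entity_id": "B", "stringField1": "SF1B", "stringField2": "SF2",
     "numericField": 1000, "updated_at": "2020-11-18"},
]


def latest_per_entity(docs):
    """Keep only the newest document (by updated_at) for each entity_id."""
    latest = {}
    for doc in docs:
        current = latest.get(doc["entity_id"])
        # ISO-8601 dates (YYYY-MM-DD) compare correctly as plain strings.
        if current is None or doc["updated_at"] > current["updated_at"]:
            latest[doc["entity_id"]] = doc
    return list(latest.values())


def sum_where(docs, field, value, numeric_field="numericField"):
    """Sum numeric_field over the documents whose field equals value."""
    return sum(d[numeric_field] for d in docs if d[field] == value)


# What the filter + sum aggregation gives today: old and new documents count.
naive_sf1a = sum_where(DOCS, "stringField1", "SF1A")

# What I would like: only the latest document per entity_id counts.
wanted_sf1a = sum_where(latest_per_entity(DOCS), "stringField1", "SF1A")
wanted_sf1b = sum_where(latest_per_entity(DOCS), "stringField1", "SF1B")

print(naive_sf1a, wanted_sf1a, wanted_sf1b)
```

For stringField1 == "SF1A" the current aggregation sums both documents of entity A (10 + 1000), whereas I only want the 1000 from the newest one.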
I've read about top hits here but haven't yet figured out how to apply it to my use case.
Thank you very much!