Hello Elasticsearch,
I'm having an issue finding the right combination of aggregations to achieve a query I'm looking for. Essentially what I need to do is take the most recent document by ID over the last 30 days and take a value from that most recent document - depending on what the value of that document is I want to map its value to a 1 or a 0, which I can then run a sum aggregation on all IDs for each day.
I have data that looks something like this:
{ "id": 1, "date": "2020-01-01T00:00:00Z", "status": 10}
{ "id": 1, "date": "2020-01-04T00:00:00Z", "status": 8}
{ "id": 1, "date": "2020-01-06T00:00:00Z", "status": 10}
{ "id": 2, "date": "2020-01-02T00:00:00Z", "status": 10}
{ "id": 2, "date": "2020-01-03T00:00:00Z", "status": 8}
so I want to make buckets for each day in whatever time range - in this case: [2020-01-01, 2020-01-02, 2020-01-03, 2020-01-04, 2020-01-05, 2020-01-06, 2020-01-07]. For each day, I need the most recent document (according to "date" field) in the past 30 days, for each individual ID.
Day: 2020-01-01
{ "id": 1, "date": "2020-01-01T00:00:00Z", "status": 10}
{ "id": 2, "date": "2020-01-02T00:00:00Z", "status": 10}
Day: 2020-01-02
{ "id": 1, "date": "2020-01-01T00:00:00Z", "status": 10}
{ "id": 2, "date": "2020-01-02T00:00:00Z", "status": 10}
Day: 2020-01-03
{ "id": 1, "date": "2020-01-01T00:00:00Z", "status": 10}
{ "id": 2, "date": "2020-01-03T00:00:00Z", "status": 8}
Day: 2020-01-04
{ "id": 1, "date": "2020-01-04T00:00:00Z", "status": 8}
{ "id": 2, "date": "2020-01-03T00:00:00Z", "status": 8}
Day: 2020-01-05
{ "id": 1, "date": "2020-01-04T00:00:00Z", "status": 8}
{ "id": 2, "date": "2020-01-03T00:00:00Z", "status": 8}
Day: 2020-01-06
{ "id": 1, "date": "2020-01-06T00:00:00Z", "status": 10}
{ "id": 2, "date": "2020-01-03T00:00:00Z", "status": 8}
Then for each document that matches in the bucket I need to do some predicate like status == 8 ? 1 : 0
which will reduce the documents into a value that I can then sum for each individual day.
Is there a way that I can do this? I've tried something things like using the Moving Function Aggregation to do that sort of "past 30 days most recent document" check, which works pretty well except that I get buckets of IDs > buckets of Days > 1 or 0 for each day, but then I can't figure out how to sum the values for each ID bucket together for each day (so that the resulting aggregation is a date_histogram, not an ID bucket containing a date histogram).
Please let me know if I can provide more info, thank you!