Query by max date

I have an index that is append only. Every change gets saved as a new document, together with a date property. In order to get the current active data I need to group all entities by their ID and get the one with the latest date. It should be possible to add a filter for that date. I have been playing around with aggregations but I got stuck with the scroll API.

Data example (note: id's are guids, dates are unix timestamps, using other values here to make it clearer)

   {"id": "<entityId_date>", "entityId", 1, "date": "2020-05-10", "value": { <JSON document> } }, 
   {"id": "<entityId_date>", "entityId", 1, "date": "2020-06-10", "value": { <JSON document> } }, 
   {"id": "<entityId_date>", "entityId", 2, "date": "2020-04-10", "value": { <JSON document> } }, 
   {"id": "<entityId_date>", "entityId", 2, "date": "2020-08-10", "value": { <JSON document> } },
   {"id": "<entityId_date>", "entityId", 3, "date": "2020-07-10", "value": { <JSON document> } }

Required results:

   {"id": "<entityId_date>", "entityId", 1, "date": "2020-06-10", "value": { <JSON document> } }, 
   {"id": "<entityId_date>", "entityId", 2, "date": "2020-08-10", "value": { <JSON document> } },
   {"id": "<entityId_date>", "entityId", 3, "date": "2020-07-10", "value": { <JSON document> } }

Tried the following query:

{
  "size": 0,
  "aggs": {
    "id_aggregate": {
      "terms": { 
          "field": "entityId"
       },
      "aggs": {
        "date_aggregate": {
          "top_hits": {
            "sort": [{
                "date.ticks": {
                    "order": "desc"
                }
            }],
            "size": 1
          }
        }
     }
    }
  }
}

The result I get back is correct, but I only get the first X results back. I can set the size of the id_aggregate, but there could be thousands of documents.
With a normal search query, I'd be able to use the Scroll API, but this is not possible with aggregations.

I have looked into using partitions, but that doesn't help me either, because sometimes there will be very few documents and sometimes there will be loads. (This is kind of a base query, all other queries we execute will have additional filters specified).

How can I build a query that fetches documents by their max date and is scrollable?

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.