Unable to retrieve the full data set when sorting on a multivalued date field

We have a data set of 40M+ records in one of our indices in Elasticsearch v6.8.18. Data is added/updated/removed once a day at a fixed time (no ad-hoc changes happen during the day).

The application that uses ES can sort the data set by a multivalued date field (aka timestamp), which is mapped like this:

{
    "properties": {
        "ourFancyDateField": {
            "type": "date",
            "format": "yyyy-MM-dd"
        }
    }
}

The sort part of the query that the application sends looks like this:

{
    "sort": [
        {
            "parent.child.ourFancyDateField": {
                "order": "desc",
                "mode": "max"
            }
        },
        // here goes document id sort (tie breaker)
        // ...
    ]
}

All values in this field are valid dates. Multiple records can have the same date in this array.

Now comes the interesting part.

We paginate over the data set (say, 100k matching records) using the "search after" approach. After paginating through around 70-80k records, we start getting empty hits.
The issue is still reproducible even when we restrict the query to a single date value.
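
For reference, here is a minimal Python sketch of the kind of "search after" loop described above. It is not our actual application code. The `search` callable is a stand-in for whatever executes the request against ES (for example a thin wrapper around a client's search call), and the names `build_page_body` and `iterate_hits` are made up for illustration. The sketch assumes each hit carries a `sort` array, which ES returns when the request contains a `sort` clause.

```python
from typing import Any, Callable, Dict, Iterator, List, Optional

# Hypothetical type: any function that takes a request body and
# returns the parsed search response as a dict.
SearchFn = Callable[[Dict[str, Any]], Dict[str, Any]]


def build_page_body(
    query: Dict[str, Any],
    sort: List[Any],
    size: int,
    search_after: Optional[List[Any]] = None,
) -> Dict[str, Any]:
    """Build the request body for one page of a search_after pagination."""
    body: Dict[str, Any] = {"query": query, "sort": sort, "size": size}
    if search_after is not None:
        # Sort values of the last hit of the previous page.
        body["search_after"] = search_after
    return body


def iterate_hits(
    search: SearchFn,
    query: Dict[str, Any],
    sort: List[Any],
    size: int = 1000,
) -> Iterator[Dict[str, Any]]:
    """Yield hits page by page until a page comes back empty."""
    search_after: Optional[List[Any]] = None
    while True:
        response = search(build_page_body(query, sort, size, search_after))
        hits = response["hits"]["hits"]
        if not hits:
            return  # an empty page ends the iteration
        for hit in hits:
            yield hit
        # Continue strictly after the last hit's sort values.
        search_after = hits[-1]["sort"]
```

With a loop like this, counting the yielded hits and comparing the count with `hits.total` from the first response is one way to confirm how many records go missing for a given query.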

The same issue happens with the regular scroll approach.

For some other queries, we can fetch all the data. Also, if we sort on any other field, we get everything back.

Even more interesting: after reindexing with exactly the same mapping, the issue goes away for one set of queries but appears for others.
The issue also does not depend on the number of replicas, since on our dev environment it is set to zero.

Moreover, I tried to reproduce the issue by taking a subset of the affected data and reindexing it into a smaller index, but there it is no longer reproducible.

I suppose it could be related to the fact that ES converts dates internally to timestamps and may be struggling to sort by them with "max" mode.
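
To make that guess concrete, here is a small Python sketch of what I understand the sort key to be. ES stores `date` values as milliseconds since the epoch in UTC, and with `"mode": "max"` the largest value of the array becomes the document's sort value. The function names are made up for illustration, and this only models the sort key; it does not by itself explain the empty hits.

```python
from datetime import datetime, timezone
from typing import Iterable


def to_epoch_millis(day: str) -> int:
    """Convert a yyyy-MM-dd string to milliseconds since the epoch (UTC)."""
    dt = datetime.strptime(day, "%Y-%m-%d").replace(tzinfo=timezone.utc)
    return int(dt.timestamp() * 1000)


def max_mode_sort_value(days: Iterable[str]) -> int:
    """Sort key for a multivalued date field under "mode": "max"."""
    return max(to_epoch_millis(day) for day in days)


# Two documents with different arrays but the same latest date get the
# same sort key, so their relative order depends on the tie breaker.
doc_a = ["2019-03-01", "2021-06-15"]
doc_b = ["2021-06-15"]
same_key = max_mode_sort_value(doc_a) == max_mode_sort_value(doc_b)
```

Since many of our records share the same date, a lot of documents end up with identical primary sort values, and the document id tie breaker has to do most of the ordering.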

Could you suggest anything?
