How does sorting work under the hood?

Elasticsearch Version: 7.10

Before I take a deeper look at a bug report that just arrived on my desk, I wanted to know the following.

When a sort criterion is applied, is it guaranteed that all documents are sorted exactly, or can the order be somewhat fuzzy due to performance optimizations? I did not find anything about fuzzy sorting in the documentation, but I also find it unlikely that Elasticsearch always reads every document of an index to do its sorting.

Why do I ask this question

We implemented a REST service that translates SQL-like queries into Elasticsearch queries. This REST service also offers the possibility to page through the results.

When we page through a result, we do not use "from"; instead, we increase "size" for each page and cut off the results that are not needed. I honestly don't know why we are doing it that way, but that should not be of concern for this question.
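A minimal Python sketch of this paging scheme, assuming the service slices the tail off each larger response. The thread does not show the real implementation, so the function name `page_from_growing_size` and the simulated responses are hypothetical:

```python
def page_from_growing_size(hits, page, page_size):
    """Return the hits for a 1-based page.

    `hits` is assumed to be the response to a query sent with
    size = page * page_size and no "from" parameter.
    """
    start = (page - 1) * page_size
    end = page * page_size
    return hits[start:end]


# Simulated, already sorted responses (hypothetical data).
response_size_2 = ["A", "B"]
response_size_4 = ["A", "B", "C", "D"]

page_one = page_from_growing_size(response_size_2, page=1, page_size=2)
page_two = page_from_growing_size(response_size_4, page=2, page_size=2)
```

This only yields consistent pages if each larger response begins with the same hits, in the same order, as the previous one.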

We got a bug report from a customer concerning this paging.

Here is what happens: let's say the customer uses our SQL-like syntax to sort by a text property and uses our paging approach to page through the result set. For simplicity, let the page size be 2. The first page contains "A" and "C", the second page contains "B" and "D". So it seems like each page is sorted, but the global sorting is somehow messed up.

That would mean that Elasticsearch returned "A", "C" on the first query with "size: 2" and "A", "B", "C", "D" on the second query with "size: 4". At the moment I cannot verify what was actually returned from Elasticsearch. Maybe we messed it up on our side.

Although I cannot verify what was actually returned by Elasticsearch, I can verify that the queries we sent are "correct", meaning the sort criterion is set and the size attribute is incremented on each successive query.

There is no fuzzy sorting in Elasticsearch. The sort is always precise, based on the values of the sort fields, and uses the internal shard and docId values for tie-breaking.
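As a rough illustration of why such a sort stays consistent when only "size" changes, here is a small Python model. It is not Elasticsearch's implementation: the `Hit` type, its fields, and the sample data are assumptions made for this sketch.

```python
from typing import NamedTuple


class Hit(NamedTuple):
    sort_value: str  # value of the sort field
    shard: int       # stand-in for the internal shard value
    doc_id: int      # stand-in for the internal docId


def top_hits(hits, size):
    """Order by sort value, break ties by (shard, doc_id), keep the first `size`."""
    ordered = sorted(hits, key=lambda h: (h.sort_value, h.shard, h.doc_id))
    return ordered[:size]


hits = [
    Hit("B", 1, 0),
    Hit("A", 0, 3),
    Hit("A", 0, 1),  # same sort value as the hit above, so the tie-breaker decides
    Hit("D", 1, 2),
    Hit("C", 0, 2),
]

top2 = top_hits(hits, 2)
top4 = top_hits(hits, 4)
```

Because the ordering is total, the result for a smaller size is always a prefix of the result for a larger size, as long as the underlying data does not change between queries.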

Thanks for the clarification.

In our case the issue was due to a wrong sorting criterion. We were missing a "sort.nested.filter" criterion.
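For readers hitting the same problem, here is a sketch of a request body with a nested sort filter, built as a Python dict. The thread does not show the actual mapping or query, so the nested path `attributes`, the fields `attributes.kind` and `attributes.value`, and the helper `nested_sort_clause` are all hypothetical.

```python
import json


def nested_sort_clause(path, field, nested_filter, order="asc"):
    """Build one sort entry that only considers nested objects matching the filter."""
    return {
        field: {
            "order": order,
            "nested": {"path": path, "filter": nested_filter},
        }
    }


# Hypothetical filter, repeated in both the query and the sort.
kind_filter = {"term": {"attributes.kind": "title"}}

body = {
    "size": 4,
    "query": {"nested": {"path": "attributes", "query": kind_filter}},
    "sort": [nested_sort_clause("attributes", "attributes.value", kind_filter)],
}

print(json.dumps(body, indent=2))
```

Without the filter in the sort clause, the sort value is taken from all nested objects under the path, not only from the ones the query matched, which can make a correctly sorted result look out of order.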
