Sorting by the opposite direction is very slow


(Jason Baumgartner) #1

I have an index where I used the new Lucene feature of indexing on a specific field. In my mapping, I have the following:

"sort.field": "created_utc",
"sort.order": "desc",

When I request documents using the same sort order, matches come back extremely fast. For example, I am searching Reddit submissions with an API I created:

https://beta.pushshift.io/reddit/submission/search/?sort=desc&size=1&pretty=true&metadata=true

If you look at the metadata->elasticsearch->took value, this search completes in milliseconds.

However, if I sort in the opposite direction, the search time is in seconds (usually over 10):

https://beta.pushshift.io/reddit/submission/search/?sort=asc&size=1&pretty=true&metadata=true

From what I know about binary search, searching on a sorted array is a O(log N) operation. However, sorting in the opposite direction should also be nearly as fast. Is this a bug with Lucene's implementation of their index?


(Jimferenczi) #2

From what I know about binary search, searching on a sorted array is a O(log N) operation. However, sorting in the opposite direction should also be nearly as fast. Is this a bug with Lucene's implementation of their index?

Sorting the result of a search is not a binary search. We collect all documents that match the query in the order of their appearance in the index and keep track of the N best ones with a priority queue. If the index sort matches the search sort we can early terminate the query when N documents are collected, otherwise all documents must be collected. The reverse index sort order is the worst case since it requires to change the top N every time we visit a new document. There was some discussions to optimize this case in Lucene though (https://issues.apache.org/jira/browse/LUCENE-7482) with some interesting ideas. The issue is quite old so it could be interesting to open a new one that focuses on reverse index sort.


(system) #3

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.