Doc values are sorted by docID rather than by value, so why can they be fast for sorting? I'd like to understand their internal structure in depth, please help!
Indeed, doc values are stored in docID order, not value order, so they are not especially fast if you need to sort a lot of documents. But you can:
- Provide a filter that limits the number of documents whose doc values you need to look up. In the article you mentioned, this filter is a term query that matches 10,000 documents (0.1% of the index). So even with an index of 10M documents, we only need to look up the doc values of the 10K documents that satisfy the filter, which makes the sort much faster.
- Sort your index by a field on which you expect to run many sort queries. Your docIDs are then reorganized to match the index sort criteria, so queries that sort on that field become very fast: the first matching documents in docID order are already the top hits, and the search can stop early.
- We have recently added a sort optimization for numeric fields: in some cases we internally use the points data structure instead of doc values to skip documents that cannot make it into the top hits, which makes sorting much faster in those cases.
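The first two points can be sketched in plain Java under simplifying assumptions: numeric doc values are modeled as a `long[]` indexed by docID (real doc values are compressed on disk and read through an iterator), the filter's result is a hypothetical array of matching docIDs in increasing order, and this sketch breaks ties by docID. `DocValuesSortSketch`, `topKFiltered` and `topKIndexSorted` are illustrative names, not Lucene or Elasticsearch APIs, and the points-based optimization from the third point is not modeled.

```java
import java.util.Arrays;
import java.util.Comparator;
import java.util.PriorityQueue;

public class DocValuesSortSketch {

    /**
     * Unsorted index: look up the doc value of every matching document and keep
     * the k smallest in a bounded max-heap. Cost is proportional to the number
     * of matching documents, not to the size of the index.
     *
     * @param values       simplified doc values column: values[docId] = field value
     * @param matchingDocs docIDs that pass the filter, in increasing order
     * @return docIDs of the k smallest values, ascending (ties broken by docID)
     */
    static int[] topKFiltered(long[] values, int[] matchingDocs, int k) {
        // Sort order: by value, then by docID to break ties deterministically.
        Comparator<Integer> byValue =
                Comparator.comparingLong((Integer doc) -> values[doc])
                          .thenComparingInt(doc -> doc);
        // Max-heap: the root is the least competitive of the current top k.
        PriorityQueue<Integer> heap = new PriorityQueue<>(byValue.reversed());
        for (int doc : matchingDocs) {
            heap.offer(doc);       // one doc value lookup per matching doc
            if (heap.size() > k) {
                heap.poll();       // evict the worst candidate
            }
        }
        Integer[] top = heap.toArray(new Integer[0]);
        Arrays.sort(top, byValue);
        return Arrays.stream(top).mapToInt(Integer::intValue).toArray();
    }

    /**
     * Index sorted by the same field (ascending): docID order already equals
     * value order, so the first k matching docs are the answer and the
     * remaining matches are never visited (early termination).
     */
    static int[] topKIndexSorted(int[] matchingDocs, int k) {
        return Arrays.copyOf(matchingDocs, Math.min(k, matchingDocs.length));
    }

    public static void main(String[] args) {
        // Unsorted index: docIDs were assigned in arrival order.
        long[] values = {42, 7, 19, 3, 88, 25, 7, 61, 12, 50};
        int[] matching = {0, 1, 3, 4, 6, 8, 9};
        System.out.println(Arrays.toString(topKFiltered(values, matching, 3))); // [3, 1, 6]

        // The same ten values, with the index sorted by the field.
        long[] sortedValues = {3, 7, 7, 12, 19, 25, 42, 50, 61, 88};
        int[] sortedMatching = {1, 2, 4, 6, 9};
        System.out.println(Arrays.toString(topKFiltered(sortedValues, sortedMatching, 3))); // [1, 2, 4]
        System.out.println(Arrays.toString(topKIndexSorted(sortedMatching, 3)));            // [1, 2, 4]
    }
}
```

In the unsorted case, every matching document costs one doc value lookup plus a heap operation, which is why a selective filter helps so much. In the index-sorted case, both methods agree, but `topKIndexSorted` never reads a value at all: docID order already is the sort order.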