Say I have two indexes, one with 100K documents and one with 1M documents.
Then I run a filter that returns 1K documents from each index and sorts them by a biginteger field. Which one will complete faster?
Meaning, is the sort speed based on the number of filtered documents or the total number of documents (assuming everything is in RAM)?
If you mean an infinite-precision field, that isn't something we support. Long is the best we've got.
The dominant factor is going to be the number of hits whose value for that field sits in a disk block the OS hasn't paged in. If all the blocks are already paged in, then the dominant factor is the number of hits.
Sorting (by a field or by score) works by making a min-heap sized to the number of hits you want returned and dumping all the hits into it, discarding hits that "fall out" or "don't fit". If you sort by _score
then the query has to compute a score for each hit, and that score is the sort key. If you are sorting by a field, the query skips scoring and the sort key is instead the document's value for that field (or those fields, if sorting by more than one field).
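To make the heap mechanics concrete, here's a minimal sketch in Java of top-k selection with a bounded min-heap. The `Hit` record and its `sortValue` field are hypothetical stand-ins for illustration, not Elasticsearch's actual classes:

```java
import java.util.Comparator;
import java.util.PriorityQueue;

public class TopKSketch {
    // Hypothetical hit: a doc id plus the value of the sort field.
    record Hit(int docId, long sortValue) {}

    // Keep the k best hits (largest sortValue) out of any number of candidates.
    static Hit[] topK(Iterable<Hit> hits, int k) {
        // Min-heap ordered by sortValue: the weakest retained hit sits on
        // top, so it's the one that "falls out" when a better hit arrives.
        PriorityQueue<Hit> heap =
            new PriorityQueue<>(k, Comparator.comparingLong(Hit::sortValue));
        for (Hit hit : hits) {
            if (heap.size() < k) {
                heap.add(hit);                 // heap not full yet
            } else if (hit.sortValue() > heap.peek().sortValue()) {
                heap.poll();                   // weakest hit falls out
                heap.add(hit);
            }
            // else: the hit "doesn't fit" and is discarded
        }
        // Drain the heap into descending order, best hit first.
        Hit[] result = new Hit[heap.size()];
        for (int i = result.length - 1; i >= 0; i--) {
            result[i] = heap.poll();
        }
        return result;
    }
}
```

Because the heap never grows past k, each hit costs at most O(log k) to process, so the total work scales with the number of hits that match the filter, not with the index size. That's why, paging aside, your 1K hits should sort in essentially the same time whether they come from the 100K-document index or the 1M-document one.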
Getting the value for a field should be fairly fast. There's a talk that went into some detail about it, but I can't find the video at this point, sadly. It is really one of my favorite talks.