For my thesis I'm currently investigating the speed (down to milliseconds) of Elasticsearch and some other NoSQL database systems.
My question, rather technical mind you, is: what is the internal behaviour when using the size parameter in a search query?
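For reference, the kind of request body I mean looks roughly like the following, written here as a Python dict. The field name "title" and the query text are made-up placeholders; only the size parameter matters for the question.

```python
import json

# Illustrative search request body. "title" and "nosql" are placeholders,
# not taken from any real index.
search_body = {
    "query": {"match": {"title": "nosql"}},
    "size": 10,  # ask for at most 10 hits, however many documents match
}

print(json.dumps(search_body, indent=2))
```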
I've noticed that, compared to other database systems, Elasticsearch's response time is very consistent relative to the total number of items found. Where other databases take longer to return data the more results are found, Elasticsearch's response time stays almost the same regardless of the total number of hits.
My hypothesis is that in Elasticsearch, when using the size parameter, the number of documents that are actually looked up and retrieved once the index search has finished is exactly the number set in size. In other database systems this is not the case: all documents that matched in the index are retrieved, and only the top X are eventually returned to the client.
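To make the hypothesis concrete, here is a toy Python sketch of the two strategies I have in mind. It is not Elasticsearch or Lucene code; DocStore, both function names, and the scores are invented for illustration, and it only counts how many documents each strategy has to load.

```python
import heapq


class DocStore:
    """Toy document store that counts how many documents are loaded."""

    def __init__(self, docs):
        self._docs = docs
        self.fetches = 0

    def fetch(self, doc_id):
        self.fetches += 1
        return self._docs[doc_id]


def fetch_all_then_truncate(store, scored_hits, size):
    """Load every matching document, then keep only the top `size`."""
    loaded = [(score, store.fetch(doc_id)) for doc_id, score in scored_hits]
    loaded.sort(key=lambda pair: pair[0], reverse=True)
    return [doc for _, doc in loaded[:size]]


def rank_then_fetch(store, scored_hits, size):
    """Rank (doc_id, score) pairs first, then load only the top `size`."""
    top = heapq.nlargest(size, scored_hits, key=lambda hit: hit[1])
    return [store.fetch(doc_id) for doc_id, _ in top]


# 1000 matching documents with made-up, unique scores.
docs = {i: f"document {i}" for i in range(1000)}
hits = [(i, (i * 37) % 1000) for i in range(1000)]

store_a = DocStore(docs)
result_a = fetch_all_then_truncate(store_a, hits, size=10)

store_b = DocStore(docs)
result_b = rank_then_fetch(store_b, hits, size=10)

print(store_a.fetches, store_b.fetches)  # prints: 1000 10
```

Both functions return the same ten documents, but the first loads all 1000 matches while the second loads only 10. If the hypothesis holds, Elasticsearch would behave like the second function and the other systems like the first.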
Other than spending hours looking through the source code, I have no way to figure out whether this hypothesis is correct. Or is this something that can be found in the Lucene documentation?
Thanks for taking the time to read this. Any responses are appreciated and will help me further my research.