This question always comes up during discussions, and there is no strong answer to it:
Is the "Fetch" phase really needed during search? As we understand it, the coordinating node executes the search in two phases: "Query", then "Fetch".
The documentation (quoted below) gives one line of reasoning, but why is only "just enough information" sent to the coordinating node rather than the whole documents? If the whole documents were sent, the coordinating node could merge the shard-level results and send the documents back to the client without performing a fetch operation. Is it for pagination, or is it to save some bandwidth?
The request is processed in two phases. In the first phase, the query is forwarded to all involved shards. Each shard executes the search and generates a sorted list of results, local to that shard. Each shard returns just enough information to the coordinating node to allow it to merge and re-sort the shard level results into a globally sorted set of results, of maximum length size.
Yes, it saves a good deal of effort to run this in two phases. If you run a search asking for the top 10 documents across 1000 shards in a single phase then the coordinating node might receive up to 10,000 documents, 99.9% of which it would then discard.
Put differently, that would not be just enough information; it would be far too much.
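To make those numbers concrete, here is a rough simulation in plain Python. It is not Elasticsearch code: the names `make_shard`, `query_phase` and `fetch_phase` and the shard layout are made up for illustration, and random scores stand in for real relevance scoring. It only shows that the query phase can ship lightweight (score, id) pairs, so full documents are loaded for the 10 global winners alone.

```python
import heapq
import random

NUM_SHARDS = 1000      # shards involved in the search (from the example above)
SIZE = 10              # number of hits the client asked for
DOCS_PER_SHARD = 50    # arbitrary, just enough to fill each local top 10


def make_shard(shard_id, rng):
    # Hypothetical shard: doc_id -> (score, source).
    # "source" stands in for the full, potentially large document body.
    return {
        f"{shard_id}-{i}": (rng.random(), {"body": "x" * 100})
        for i in range(DOCS_PER_SHARD)
    }


def query_phase(shard, size):
    # Each shard returns only (score, doc_id) for its local top hits,
    # with no document bodies attached.
    return heapq.nlargest(
        size, ((score, doc_id) for doc_id, (score, _) in shard.items())
    )


def fetch_phase(shards, hits):
    # Full documents are loaded only for the globally best hits.
    return [shards[int(doc_id.split("-")[0])][doc_id][1] for _, doc_id in hits]


rng = random.Random(0)
shards = [make_shard(s, rng) for s in range(NUM_SHARDS)]

# Query phase: up to SIZE lightweight candidates per shard.
candidates = [hit for shard in shards for hit in query_phase(shard, SIZE)]

# The coordinating node merges and keeps the global top SIZE.
top_hits = heapq.nlargest(SIZE, candidates)

# Fetch phase: only SIZE full documents cross the wire.
documents = fetch_phase(shards, top_hits)

discarded = len(candidates) - len(top_hits)
print(len(candidates), len(documents), f"{discarded / len(candidates):.1%}")
```

In a single-phase design, every one of those 10,000 candidates would have carried its full document body to the coordinating node, only to be thrown away for all but 10 of them.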
Hmm, the coordinating node can't just ask for "those 10 documents", as it needs to build the whole aggregation, so there doesn't seem to be any way to optimize this.
However, if the client doesn't send pagination parameters, can this fetch phase be disabled? (I know it doesn't seem practical, since in most cases the search needs only about 1% of all documents as results.)
I don't see it as a problem but as a convenient feature. Is there a way to disable this "fetch" phase when it is not needed, for example in the case of aggregations with size = 0?
I don't really understand, sorry. By default we return the top 10 docs, so we must fetch them, and it's much more efficient to do that in a second phase rather than during the query phase. If you don't want to return any docs then you can override this default behaviour and ask for no docs to be returned. Are you asking about the default behaviour or the override?
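For the override case, a minimal sketch of such a request body, built as a Python dict so it can be serialized and sent to the `_search` endpoint with any HTTP client. The aggregation name `by_category` and the field `category` are placeholders, not something from this thread; substitute a field from your own mapping.

```python
import json

# Hypothetical aggregation name and field; only "size": 0 is the point here.
request_body = {
    "size": 0,  # ask for no docs to be returned, so there are no hits to fetch
    "aggs": {
        "by_category": {"terms": {"field": "category"}},
    },
}

# JSON payload for the search request body.
print(json.dumps(request_body, indent=2))
```

With `size` set to 0 the response carries only the aggregation results and no hits.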