The Search phase has to A) visit every shard being searched, B) traverse
all documents that match your (potentially complex) query and C) calculate
a score for each matching document. Depending on your query, that is
potentially a lot of documents to evaluate.
In contrast, the Fetch phase is given a list of document addresses (shard
and docID). The list is usually small, say ten results. For each address,
the Fetch phase goes straight to the appropriate shard, loads the document
and serializes the source back to the coordinating node. This is much
faster than the Search phase.
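To make the difference in work concrete, here is a toy sketch in Python. It is not Elasticsearch code: the shard contents, the term-count "score" and the function names are all made up for illustration. It only shows that the Search phase touches every matching document on every shard, while the Fetch phase loads just the handful of addresses it is handed.

```python
import heapq

# Hypothetical toy index: three shards, each mapping doc_id -> source text.
SHARDS = [
    {0: "quick brown fox", 1: "lazy dog", 2: "quick quick fox"},
    {0: "quick dog", 1: "slow turtle"},
    {0: "fox and dog", 1: "quick fox jumps", 2: "quick"},
]


def search_phase(shards, term, size):
    """Visit every shard, score every matching doc, return the top addresses.

    The score is a plain term count, a stand-in for real relevance scoring.
    Returns (addresses, number_of_documents_scored).
    """
    hits = []
    scored = 0
    for shard_id, docs in enumerate(shards):
        for doc_id, text in docs.items():
            tf = text.split().count(term)
            if tf:
                scored += 1
                hits.append((tf, shard_id, doc_id))
    top = heapq.nlargest(size, hits)
    return [(shard_id, doc_id) for _, shard_id, doc_id in top], scored


def fetch_phase(shards, addresses):
    """Go straight to each (shard, doc_id) address and load its source."""
    return [shards[shard_id][doc_id] for shard_id, doc_id in addresses]


addresses, scored = search_phase(SHARDS, "quick", size=2)
sources = fetch_phase(SHARDS, addresses)
print(f"scored {scored} docs, fetched {len(sources)}")
```

In this toy index the Search phase scores five documents to return two addresses, and the gap grows with the number of matches while the fetch list stays at the requested size.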
As an aside, by default Elasticsearch will search all shards of an index.
You can use custom routing to direct the search to a particular shard, but
this is generally more advanced behavior and may not be needed.
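A rough sketch of the routing idea, in Python. The `routing` query-string parameter on a search request is real; everything else here is an assumption for illustration: the host, the index name `myindex`, the routing value `user42`, the shard count, and the use of CRC32 as a stand-in for whatever hash Elasticsearch applies internally. The point is only that a routing value maps deterministically to one shard, so a search carrying the same value can skip the others.

```python
import zlib
from urllib.parse import urlencode

NUM_PRIMARY_SHARDS = 5  # hypothetical index setting


def shard_for(routing_value, num_shards=NUM_PRIMARY_SHARDS):
    """Map a routing value to a shard number.

    CRC32 is a stand-in hash, NOT the function Elasticsearch uses; only the
    "hash modulo number of primary shards" shape is being illustrated.
    """
    return zlib.crc32(routing_value.encode("utf-8")) % num_shards


def search_url(host, index, routing_value=None):
    """Build a _search URL, adding the routing parameter when one is given."""
    url = f"{host}/{index}/_search"
    if routing_value is not None:
        url += "?" + urlencode({"routing": routing_value})
    return url


# Without routing, the search fans out to every shard of the index.
print(search_url("http://localhost:9200", "myindex"))
# With routing, only the shard that the value hashes to needs to be searched.
print(search_url("http://localhost:9200", "myindex", "user42"))
```

The routing value used at search time has to match the one used at index time, otherwise the search lands on a shard that does not hold the documents.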
-Zach
On Friday, September 20, 2013 4:22:52 PM UTC-4, Pierce Wetter wrote:
Noob Question:
So my query times in Elastic HQ are about 7x the fetch times.
Is that expected or is it possible we're not giving the shard hint
correctly so it has to ask all the shards?
Yeah, I was pretty sure that we were using custom routing, but the
resulting subset would still be a fair number of documents, so 7x isn't so
bad from that point of view.
Ah, I see. Yeah, I'll stand by the original "expected behavior": the
Search phase will probably be considerably slower than Fetch even if you
direct it to a single shard, depending on the size of the shard.
Now, if the search is taking too long in general, that's something to be
concerned about. But I wouldn't worry about search:fetch ratio =)
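If you want to watch the absolute numbers rather than the ratio, here is a small Python sketch that turns cumulative counters into per-operation averages. The field names follow the `search` section of the index stats output (`query_total`, `query_time_in_millis`, `fetch_total`, `fetch_time_in_millis`); the sample values are invented, and I'm assuming you have already pulled that section out of the stats response yourself.

```python
def phase_averages(search_stats):
    """Return (avg_query_ms, avg_fetch_ms) from cumulative search counters.

    max(..., 1) avoids dividing by zero on an index that has not been
    searched yet.
    """
    query_avg = search_stats["query_time_in_millis"] / max(search_stats["query_total"], 1)
    fetch_avg = search_stats["fetch_time_in_millis"] / max(search_stats["fetch_total"], 1)
    return query_avg, fetch_avg


# Invented sample counters, chosen to mirror the 7x ratio from the question.
sample = {
    "query_total": 1000,
    "query_time_in_millis": 7000,
    "fetch_total": 1000,
    "fetch_time_in_millis": 1000,
}

query_avg, fetch_avg = phase_averages(sample)
print(f"avg query: {query_avg:.1f} ms, avg fetch: {fetch_avg:.1f} ms")
```

A 7x ratio with a 7 ms average query is nothing to lose sleep over; the same ratio with a multi-second average query would be.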
-Zach
On Fri, Sep 20, 2013 at 5:49 PM, Pierce Wetter obastard@gmail.com wrote:
Yeah, I was pretty sure that we were using custom routing, but the
resulting subset would still be a fair number of documents, so 7x isn't so
bad from that point of view.