I am writing a perf tool to load test Elasticsearch. I have both indexing and
search working properly. I noticed that when I query for a term, I get the
response in pretty decent time, considering I have just 1 shard with 200M
docs in it. However, when I set size to 1000, it takes around 10 secs. In
both cases, whether I set "size" or not, I get the same hits.total
value, which means it is searching through the entire shard. Does that mean
the extra time, when size is set to 1000, comes from having to fetch the
source of each returned document? In other words, is the time to do just the
"search" always the same for that specific term in both cases?
Hi,
queries are always executed against all the relevant shards. Every shard
returns its own top "size" documents, so a reduce step is needed to re-sort
them and return only the overall top "size" documents. With a single shard
the reduce is not needed, which means that the additional time is spent
loading from disk the Lucene fields that need to be returned (_source by
default). From your search response I see 3 shards though, not just one.
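
One way to check this from a perf tool is to run the same term query several times, changing only "size" and whether _source is loaded, and compare the timings. The following is a minimal sketch, not a definitive benchmark. The host, index name, field, and term are hypothetical placeholders, and it assumes a version of Elasticsearch that accepts `"_source": false` in the request body (1.0 or later; older releases used an empty "fields" list for the same purpose).

```python
import json
import time
import urllib.request


def build_search_body(term_field, term_value, size, load_source=True):
    """Build a term query body with the given "size".

    With load_source=False the request asks Elasticsearch not to return
    _source, so the fetch phase has much less to read from disk.
    """
    body = {
        "query": {"term": {term_field: term_value}},
        "size": size,
    }
    if not load_source:
        # Assumption: the cluster accepts "_source": false (1.0 or later).
        body["_source"] = False
    return body


def timed_search(base_url, index, body):
    """POST body to <base_url>/<index>/_search.

    Returns (wall_clock_seconds, took_ms, hits_total), where took_ms is
    the server-side time reported in the response.
    """
    request = urllib.request.Request(
        f"{base_url}/{index}/_search",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    start = time.perf_counter()
    with urllib.request.urlopen(request) as response:
        result = json.load(response)
    elapsed = time.perf_counter() - start
    return elapsed, result["took"], result["hits"]["total"]


if __name__ == "__main__":
    # Hypothetical values; replace with your own cluster, index, and term.
    base_url, index = "http://localhost:9200", "perf-test"
    field, value = "user", "mo"

    variants = {
        "size=0 (query only)": build_search_body(field, value, 0),
        "size=1000, no _source": build_search_body(field, value, 1000, False),
        "size=1000, with _source": build_search_body(field, value, 1000),
    }
    for label, body in variants.items():
        elapsed, took_ms, total = timed_search(base_url, index, body)
        print(f"{label}: wall={elapsed:.3f}s took={took_ms}ms total={total}")
```

If hits.total is the same in all three runs but only the last one is slow, that would be consistent with the extra time going into loading _source rather than into the query itself. Repeating each variant a few times helps separate disk reads from file system cache effects.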
On Wednesday, November 20, 2013 1:46:54 AM UTC+1, Mo wrote: