I am seeing unexpected Elasticsearch behavior: as I increase the number of
replicas and servers, query time actually increases, while I would expect
it to decrease.
I have a cluster with 8 million complex documents (overall about 300
million nested documents). I run complex queries, where each query involves
computing dozens of terms facets (top 100 values) on non-analyzed fields
with hundreds of thousands of unique field values. I have a controlled
load-testing environment, where I fire a number of queries simultaneously
and measure the average query time. Since queries did not run very fast
(5-10 seconds per query on average), I started adding servers and replicas.
However, not only did this not improve query time, it made things worse, as
you can see from the attached graph.
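For reference, here is a sketch of the kind of request body described above, built in Python. The field names ("category", "tag") are placeholders, not the actual fields from my index; the shape is the pre-aggregations terms-facet syntax, with size 100 per facet:

```python
import json

# One terms facet per high-cardinality, non-analyzed field,
# each asking for the top 100 values (field names are hypothetical).
query = {
    "query": {"match_all": {}},
    "facets": {
        "category": {"terms": {"field": "category", "size": 100}},
        "tag": {"terms": {"field": "tag", "size": 100}},
    },
}

# Serialized body as it would be sent to the _search endpoint.
body = json.dumps(query)
```

A real request would have dozens of such facets rather than two, which is what makes each query so expensive.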
Maybe the OS cache or Elasticsearch's own caches play a role here. How
many queries did you run in each test to get the average?
Another factor might be load distribution. Do you get different
results if you add a non-data node as a load balancer and run the same
queries against it?
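A node like that can be configured in elasticsearch.yml roughly as follows (a sketch; adjust for your setup). It holds no data and is not master-eligible, so it only coordinates requests and merges results:

```yaml
# Dedicated coordinating / load-balancing node: no shards, not master-eligible
node.data: false
node.master: false
```

Pointing your load tests at this node takes the cost of gathering and reducing facet results off the data nodes, which can change the numbers you see.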
Bizarre indeed. Have a look at some other metrics, both ES and system ones,
like disk I/O, memory, CPU, etc. How do they change as you add more
servers? Did you check whether shards were evenly spread across the cluster
and that no shards were being moved around the cluster while you ran your
tests? If you don't have a tool that can show you that, Elasticsearch
Monitoring will help.
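You can also check this from the command line against any node (host and port are assumptions for your cluster):

```shell
# Cluster health: "relocating_shards" should be 0 and there should be
# no unassigned shards while the tests run.
curl -s 'http://localhost:9200/_cluster/health?pretty'

# Full cluster state, including which node holds each shard,
# to confirm shards are spread evenly.
curl -s 'http://localhost:9200/_cluster/state?pretty'
```

If shards were rebalancing during a test run, the averages would include relocation overhead and would not be comparable across runs.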
On Thursday, October 18, 2012 11:49:41 AM UTC-4, Zmicier wrote: