I have a 4-node cluster with 7.7M docs indexed, in 5 shards.
First I tried 1 replica (the default config) and we tested the
query speed. It worked fine.
Then we used the REST API to increase the number of replicas to 4, and
after a while the nodes reflected this change (the shard directories stored
locally on each node held a copy of all 5 shards).
The problem is that the query speed was exactly the same as with
the 1-replica config!
Any ideas why this happens? Shouldn't there be some improvement?
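
For reference, the replica change described above can be sketched as a settings update request. This is only a sketch: the host `http://localhost:9200` and the index name `myindex` are placeholders, not values from this thread, and the request is built but not sent.

```python
import json
import urllib.request


def build_replica_update(base_url, index, replicas):
    """Build (but do not send) a PUT to the index settings endpoint."""
    # number_of_replicas can be changed on a live index.
    body = json.dumps({"index": {"number_of_replicas": replicas}}).encode("utf-8")
    return urllib.request.Request(
        url=f"{base_url}/{index}/_settings",
        data=body,
        method="PUT",
        headers={"Content-Type": "application/json"},
    )


# Placeholder host and index name, used only for illustration.
req = build_replica_update("http://localhost:9200", "myindex", 4)
print(req.get_method(), req.full_url, req.data.decode("utf-8"))
```

Passing `req` to `urllib.request.urlopen` would actually apply the change on a reachable cluster.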
ES still has to talk to exactly the same number of shards for your
query. The fact that there are more of them to choose from doesn't
affect your query speed.
Where it will make a difference is when you reach the point that your
1-replica setup is too busy to cope with all of your queries. At that
stage, having more replicas to choose from will help you to scale.
I see... I agree the response time should be the same, but I
thought the throughput should increase.
I'll try to increase the load and compare the results then.
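
A minimal sketch of such a load comparison, assuming the query is wrapped in a callable. The names `measure_throughput` and `fake_query` are hypothetical, and `fake_query` only sleeps to stand in for a real search request; it does not talk to a cluster.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def measure_throughput(run_query, clients, queries_per_client):
    """Run queries from `clients` concurrent workers.

    Returns (total_queries, elapsed_seconds, queries_per_second).
    """
    def worker(_client_id):
        for _ in range(queries_per_client):
            run_query()
        return queries_per_client

    start = time.perf_counter()
    with ThreadPoolExecutor(max_workers=clients) as pool:
        total = sum(pool.map(worker, range(clients)))
    elapsed = time.perf_counter() - start
    return total, elapsed, total / elapsed


def fake_query():
    # Stand-in for a real search call; replace with an actual client request.
    time.sleep(0.001)


# Raise the client count step by step and watch where throughput flattens.
for clients in (1, 4, 8):
    total, elapsed, qps = measure_throughput(fake_query, clients, 20)
    print(f"clients={clients} total={total} elapsed={elapsed:.3f}s qps={qps:.0f}")
```

Running the same steps against the 1-replica and the 4-replica configuration, with everything else held constant, gives throughput numbers that can be compared directly.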
Can you give the details of the tests you're running? What are the
variables in your tests? What resource (CPU, Disk IO, etc.) seems to be the
bottleneck? How many client connections do you use? Do you increase the
number? Do the clients run on the same box with servers? Which language is
the client written in, could it be the bottleneck? What are the queries? Do
you change the queries or run the same ones? Are you only querying or
updating the indices at the same time as well?
It's not feasible to guess what the reason may be without fully
understanding the details of the tests.
Regards,
Berkay Mollamustafaoglu
mberkay on yahoo, google and skype
On a 4 node setup with 5 shards and 1 replica, increasing the number of
replicas will not change the search performance, since all the nodes are
already "maxed" in terms of searches being executed on them. You are just
"making" more shard copies, but you still have only 4 boxes.
If you have an index that lives on only some of the boxes, let's say an
index with 2 shards and 1 replica on a 10 box cluster (there might be other
indices), then increasing the number of replicas will help, since the index
will span more boxes in this case.
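
The arithmetic behind the two cases above can be sketched as follows. This is a simplified model, not how the allocator is implemented: the function name `shard_spread` is hypothetical, and it assumes copies are balanced evenly across nodes. It does rely on one real constraint, that two copies of the same shard are never placed on the same node.

```python
def shard_spread(shards, replicas, nodes):
    """Return (assigned_copies, nodes_holding_a_copy) for one index.

    Simplified model: assumes copies are spread evenly over the nodes.
    """
    # One primary plus `replicas` copies per shard, capped at the node
    # count because copies of the same shard never share a node.
    copies_per_shard = min(1 + replicas, nodes)
    assigned = shards * copies_per_shard
    nodes_used = min(nodes, assigned)
    return assigned, nodes_used


# 4 boxes, 5 shards: every box is already serving searches with 1 replica.
print(shard_spread(5, 1, 4))   # (10, 4)
print(shard_spread(5, 4, 4))   # (20, 4)

# 10 boxes, 2 shards: more replicas let the index reach more boxes.
print(shard_spread(2, 1, 10))  # (4, 4)
print(shard_spread(2, 4, 10))  # (10, 10)
```

In the first pair the number of boxes doing search work stays at 4 whatever the replica count, while in the second pair it grows from 4 to 10.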