Number of replicas and query speed


(bogdanionescu) #1

Hello

I have a 4-node cluster with 7.7M docs indexed in 5 shards.
First I tried 1 replica (the default config) and we tested the
query speed. It worked fine.
Then we used the REST API to increase the number of replicas to 4, and
after a while the nodes reflected this change (the shard directories
stored locally on each node all had a copy of all 5 shards).
The problem is that the query speed was exactly the same as with
the 1-replica config!
Any ideas why this happens? Shouldn't there be some improvement?
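
For reference, a replica change like the one described above goes through the update index settings endpoint (PUT /<index>/_settings). The following is only a minimal sketch: the node address http://localhost:9200 and the index name "myindex" are assumptions, not values from this thread, and the request is built but not sent.

```python
import json
import urllib.request


def build_replica_update(base_url: str, index: str, replicas: int) -> urllib.request.Request:
    """Build (but do not send) a PUT request that changes the replica count."""
    body = json.dumps({"index": {"number_of_replicas": replicas}}).encode("utf-8")
    return urllib.request.Request(
        url=f"{base_url}/{index}/_settings",
        data=body,
        method="PUT",
        headers={"Content-Type": "application/json"},
    )


# Hypothetical node address and index name, chosen for illustration only.
req = build_replica_update("http://localhost:9200", "myindex", 4)
print(req.get_method(), req.full_url)
print(req.data.decode("utf-8"))

# To apply the change against a live cluster, send the request:
# with urllib.request.urlopen(req) as resp:
#     print(resp.read().decode("utf-8"))
```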


(Clinton Gormley) #2

Hiya

ES still has to talk to exactly the same number of shards for your
query: one copy of each of the 5 shards. The fact that there are more
copies to choose from doesn't affect your query speed.

Where it will make a difference is when you reach the point that your
1-replica setup is too busy to cope with all of your queries. At that
stage, having more replicas to choose from will help you to scale.
clint


(bogdanionescu) #3

I see... I agree the response time should be the same, but I
thought the throughput should increase.
I'll try to increase the load and compare the results then.


(bogdanionescu) #4

I did some tests under heavier load and I still could not see any
improvement when using more replicas...


(Berkay Mollamustafaoglu-2) #5

Can you give the details of the tests you're running? What are the
variables in your tests? What resource (CPU, disk I/O, etc.) seems to be the
bottleneck? How many client connections do you use? Do you increase the
number? Do the clients run on the same box as the servers? Which language is
the client written in, and could it be the bottleneck? What are the queries? Do
you change the queries or run the same ones? Are you only querying, or are you
updating the indices at the same time as well?
It's not feasible to guess what the reason may be without fully
understanding the details of the tests.
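
As an illustration of the kind of test these questions point at, here is a minimal sketch of a harness that varies the number of concurrent clients and reports throughput and mean latency. Everything in it is an assumption for illustration: measure_throughput and fake_query are hypothetical names, and fake_query is a stand-in that sleeps instead of calling a real cluster.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def measure_throughput(run_query, num_clients: int, queries_per_client: int) -> dict:
    """Run queries from num_clients threads; report throughput and mean latency."""
    def client(_client_id):
        latencies = []
        for _ in range(queries_per_client):
            start = time.perf_counter()
            run_query()
            latencies.append(time.perf_counter() - start)
        return latencies

    wall_start = time.perf_counter()
    with ThreadPoolExecutor(max_workers=num_clients) as pool:
        per_client = list(pool.map(client, range(num_clients)))
    elapsed = time.perf_counter() - wall_start

    all_latencies = [lat for lats in per_client for lat in lats]
    return {
        "clients": num_clients,
        "queries": len(all_latencies),
        "queries_per_second": len(all_latencies) / elapsed,
        "mean_latency_s": sum(all_latencies) / len(all_latencies),
    }


def fake_query():
    # Stand-in for a real search request; replace with the actual client call.
    time.sleep(0.001)


for clients in (1, 4, 16):
    print(measure_throughput(fake_query, clients, queries_per_client=20))
```

Running the same sweep against each replica setting, from a machine separate from the cluster, would show whether throughput stops growing because of the servers or because of the client.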

Regards,
Berkay Mollamustafaoglu
mberkay on yahoo, google and skype


(Shay Banon) #6

On a 4-node setup with 5 shards and 1 replica, increasing the number of
replicas will not change the search performance, since all the nodes are
already "maxed" in terms of searches being executed on them. You are just
"making" more shard copies, but you still have only 4 boxes.

If you have an index whose shards sit on only some of the boxes, let's say an
index with 2 shards and 1 replica on a 10-box cluster (there might be other
indices), then increasing the number of replicas will help, since the index
will span more boxes in that case.
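
The arithmetic behind these two cases can be sketched as follows. This is a simplified model, not Elasticsearch's allocator: nodes_serving_index is a hypothetical helper, and it assumes shard copies are spread as evenly as possible, so the result is an upper bound on the number of boxes that can serve the index.

```python
def nodes_serving_index(num_nodes: int, num_shards: int, num_replicas: int) -> int:
    """Upper bound on how many nodes can hold at least one copy of the index."""
    # A node never holds two copies of the same shard, so at most
    # min(1 + replicas, nodes) copies of each shard can be assigned.
    assignable_copies_per_shard = min(1 + num_replicas, num_nodes)
    total_copies = num_shards * assignable_copies_per_shard
    return min(num_nodes, total_copies)


cases = [
    (4, 5, 1),   # the cluster from this thread, default replicas
    (4, 5, 4),   # the same cluster after raising replicas to 4
    (10, 2, 1),  # 2 shards, 1 replica on a 10-box cluster
    (10, 2, 4),  # the same index with 4 replicas
]
for nodes, shards, replicas in cases:
    used = nodes_serving_index(nodes, shards, replicas)
    print(f"nodes={nodes} shards={shards} replicas={replicas}: up to {used} nodes serve the index")
```

Under this model the 4-node cluster already uses all 4 boxes with 1 replica, while the 2-shard index on 10 boxes goes from at most 4 boxes to 10 when the replica count is raised to 4.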

