Temporary performance degradation when adding new node


(ppearcy) #1

Hey,
I was curious if there were any thoughts on a performance issue I have observed.

I am running two nodes mirrored (i.e., replicas=1 for all indexes) with 40
indexes, 75 shards, ~30GB of data, fs storage, no indexing occurring, and
64 threads running test queries. In a steady state, I get ~600 queries per
second.
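
For reference, the replica count here is just the per-index default from my node
config file; the key below is from memory, so treat it as a sketch rather than my
exact config:

# per-index defaults applied to every index created on this node (sketch)
index:
  number_of_replicas: 1   # one replica of each shard, i.e. fully mirrored across the two nodes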

I remove one node and, as expected, throughput drops to a reasonable 350
queries per second.

When I add the second node back, I initially see some performance
degradation. For a ten-second period throughput drops to 90 qps, then it
begins recovering, and after 20 seconds it is back up to the 600 qps range.

All the indexes in the work directory are up to date on the second
node and disk caches should be reasonably warmed, so my guess is that
the overhead is from building up the filter caches or similar on the
new node.

So, under a high load situation where it is necessary to add a new
node, things will get worse before they get better.

Perhaps a node warm-up mechanism could alleviate this situation?

I need to play around more with in-memory indexes to see if they have
the same issue, since there is a warm_cache setting that may be
appropriate for this situation.

Thanks,
Paul


(Shay Banon) #2

This basically happens because of the large number of indices and shards you
have on a single node. When you start a second node, the replica shards will
be allocated to it, and a recovery process will start. I suspect that the
recovery process takes its time, as does the creation of all those indices
and shards.

It is throttled, by the way (the recovery process), but the default number
of concurrent recoveries allowed on a node is based on the number of cores,
of which you have plenty. You can set
the routing.allocation.concurrent_recoveries setting to a lower value. Note,
though, that changing this setting only applies to a new master, so you will
need to bring both nodes down to change the setting.
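
In the config file it would look something like the following, either nested
as below or as the single dotted key; this is a sketch using the 0.10 setting
name above, with an illustrative value:

# limit how many shard recoveries run concurrently on a node (illustrative value)
routing:
  allocation:
    concurrent_recoveries: 2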

-shay.banon


(ppearcy) #3

Hey Shay,
Thanks for the details. I tried adding the following into my config
file and restarting the cluster:

routing:
  allocation:
    concurrent_recoveries: 1

I also tried this setting that I found mentioned elsewhere:

indices:
  recovery:
    throttler:
      concurrent_recoveries: 1

Are either of these correct? With the first setting, performance appeared
just as bad, and with the second, worse.

Also, keep in mind that the actual recovery should be pretty lightweight,
as I am using fs-based local storage and not indexing during my test
case, so the local working data is all good when the node comes
online.

I'm going to be playing around with in-memory indexes this afternoon
to see if the behavior is similar.

Thanks,
Paul


(Shay Banon) #4

If you are using the latest elasticsearch (0.10), then the setting is
routing.allocation.concurrent_recoveries. Which version are you using?


(ppearcy) #5

On 0.10.0.

I played around with that setting, trying 1 (which I actually think
gets set to 3 in the code) and 10. Both settings demonstrate the same
performance impact as the default, which should be 24 (the number of cores)
in my case.

Also, I played around with in-memory indexes in this situation. It seems
the same issue occurs, but I'm not 100% confident in my results, as I
started swapping due to low memory.

Thanks,
Paul


(ppearcy) #6

Hi Shay,
Here is an interesting observation... Once a node is warmed up, ES
is using ~9GB of memory. When I run the test to add the new node,
performance does not stabilize until the new node's memory usage hits
about the 9GB mark.

This implies to me that the degradation is occurring due to ES needing
to build up the filter cache and the data cache.

Also, the first node in the cluster looks fine in terms of system
metrics while this goes on. It is the newly added node that maxes out
the CPU and is the bottleneck.

Thanks,
Paul


(Shay Banon) #7

Yes, it might be that. If you use facets / sorting / filters, then the
cache will require some time to fill up.


(Shay Banon) #8

Especially with this many shards...


(ppearcy) #9

Yeah, doing heavy filtering and sorting based on date.

Every one of my queries uses a cached filter for part of it. The set of
distinct filters used is finite, probably ~2000 or so.

I can get my shard count down to ~45 without much effort, but going lower
than that will be non-trivial, because I am dynamically translating queries
from a previous domain to a new one and the old domain had 45 standalone
indexes.

Since the node immediately joins the cluster, I don't see any way that
I can warm it up in isolation. In our current search system, each machine
is isolated and none share any state, so it is easy to re-run the last
N queries to warm its internal caches.

My other alternative is to run multiple single-machine clusters, which
would allow isolated warm-up, but this config would remove a large
chunk of ES's usefulness and introduce data consistency issues.

From an ES perspective, I see a few possible solutions:

  1. Do a configurable internal warm-up of the node. It could either
    dispatch live queries to the node and disregard the responses, or store
    the last N queries to re-run.
  2. Figure out how to distribute the caches from the active nodes. This
    would be a really cool approach, but I don't know about the
    feasibility.

I am open to any other suggestions for things I can do on my side.

Thanks,
Paul


(Shay Banon) #10

Is the 10-second period of QPS degradation critical? Another solution could
be to have more machines to even out the load and reduce the effect of another
node joining the cluster.

Doing a pre-warming phase is not simple at all (though it certainly sounds
like it :wink: ). The reason is that I can add pre-warming at the shard level, but
you end up having 75 shards allocated on that node. Each shard is on its own;
they are not interconnected and have no dependencies on one another (as that is
very, very complicated to do in a distributed fashion). So the first shard can
warm up before it becomes available, but then the others warming up might cause
the queries directed to the already-started shard to slow down.

-shay.banon


(ppearcy) #11

Hey Shay,
Thanks for the further details. My final config will be a 3-node cluster,
and I need to get another piece of h/w to see how things behave for this
case with that config.

The throughput will probably still be good enough to live with the
performance hit of adding a node, but I am not sure. Worst case, I will
have to have rules on my side that we can't add a new node during
high-traffic periods, which sounds a little backwards.

When a shard is reallocating, is there any feasibility in copying over
the field/data cache state as well?

Thanks again for the help,
Paul


(Shay Banon) #12

No, the field data cache is unique to the JVM and not really serializable. I
can try and check if it's possible to minimize the time mentioned, but won't
the 10 seconds of slower QPS be mitigated by doubling the QPS once all is done?

-shay.banon


(ppearcy) #13

It all depends on the search requirements, SLAs, etc...

The tests I am running at the moment are synthetic in that they don't
simulate real-world traffic, and instead go for max throughput.
I should have the infrastructure in place over the next couple of
weeks to replay traffic through ES at real-world rates (or multiples
thereof), with the burst patterns associated with that, and will then know
for sure whether this will be a real issue for me.

If it is still an issue for me, it sounds like I may need to have an
extra server in the cluster to account for this case, but we will
see.

Thanks,
Paul


(ppearcy) #14

FYI, I figured out how to avoid this. Since shard recovery from the fs is so
fast when there have been no index updates, all shards come up nearly
instantly with empty caches.

To work around this contention, I have:

  1. as suggested, throttled concurrent_recoveries to 1 (a config sketch follows below)
  2. removed the work directory before node start, forcing a much slower recovery from the gateway
  3. cleared the system disk cache (ymmv on this one, but it seemed to help me)
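
For reference, here is a rough sketch of how this looks on my side. The
concurrent_recoveries key is the one Shay mentioned above; the work directory
path and the drop_caches command are assumptions about my particular Linux
setup, not anything ES-specific:

routing:
  allocation:
    concurrent_recoveries: 1

# Before starting the node (assumptions about my environment, adjust to taste):
#   rm -rf /path/to/es/work                     # hypothetical location of the node's work directory
#   sync && echo 3 > /proc/sys/vm/drop_caches   # drop the OS page cache (Linux, needs root)

With only one concurrent recovery, the new node's shards come back more
gradually, which seems to leave enough headroom for its caches to warm while
queries keep flowing to the existing node.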

When I do this, performance never drops below the single-node steady state.

This got me thinking about an alternative approach that may handle this and
many other cluster conditions.

If there were feedback into the shard router from response times, CPU load,
etc., it would be able to route traffic around shards with warming caches,
nodes with failing h/w, or out-of-memory conditions, and properly distribute
traffic across slower vs. faster h/w in the cluster.

Right now, in a smaller cluster (and probably a bigger one), one slow node
can drag down performance disproportionately.

Thanks again for all the help.

Best Regards,
Paul



(Shay Banon) #15

Agreed, smart traffic / operation shaping based on load (a tricky thing to
define) will certainly help here.

-shay.banon



(system) #16