Temporary performance degradation when adding new node

Hey,
Was curious if there were any thoughts on a performance issue I have
observed.

Running two nodes mirrored (i.e., replicas=1 for all indexes) w/ 40
indexes, 75 shards, ~30GB of data, fs storage, with no indexing
occurring and 64 threads running test queries. In a steady state, I
get ~600 queries per second.

I remove one node and, as expected, throughput drops to a reasonable
350 queries per second.

When I add the second node back, I initially see some performance
degradation. For a ten-second period throughput drops to 90 qps, then
begins recovering, and after 20 seconds it is back up to the 600 qps
range.

All the indexes in the work directory are up to date on the second
node and disk caches should be reasonably warmed, so my guess is that
the overhead is from building up the filter caches or similar on the
new node.

So, under a high load situation where it is necessary to add a new
node, things will get worse before they get better.

Perhaps a node warm-up mechanism could alleviate this situation?

I need to play around more with in-memory indexes to see if they have
the same issue, since there is a warm_cache setting that may be
appropriate for this situation.

Thanks,
Paul

This basically happens because of the large number of indices and shards you
have on a single node. When you start a second node, the replica shards will
be allocated to it, and a recovery process will start. I suspect that the
recovery process takes its time, as well as the creation of all those
indices and shards.

It is throttled, by the way (the recovery process), but the default number
of concurrent recoveries allowed on a node is based on the number of cores,
of which you have plenty. You can set
the routing.allocation.concurrent_recoveries setting to a lower
value. Note, though, that changing this setting only applies to new masters,
so you will need to bring both nodes down to change it.

-shay.banon

Hey Shay,
Thanks for the details. I tried adding the following into my config
file and restarting the cluster:

routing:
  allocation:
    concurrent_recoveries: 1

I also tried this setting that I found mentioned elsewhere:

indices:
  recovery:
    throttler:
      concurrent_recoveries: 1

Is either of these correct? With the first setting, performance
appeared just as bad, and with the second, worse.

Also, keep in mind that the actual recovery should be pretty lightweight,
as I am using fs-based local storage and not indexing during my test
case, so the local working data is all good when the node comes
online.

I'm going to be playing around with in-memory indexes this afternoon
to see if the behavior is similar.

Thanks,
Paul

If you are using the latest elasticsearch (0.10), then the setting is
routing.allocation.concurrent_recoveries. Which version are you using?

On 0.10.0.

I played around with that setting, trying 1 (which I actually think
gets bumped to 3 in the code) and 10. Both settings demonstrate the same
performance impact as the default, which should be 24 (cores) in my
case.

Also, I played around with in-memory indexes in this situation. It seems
the same issue occurs, but I'm not 100% confident in my results, as I
started swapping due to low memory.

Thanks,
Paul

Hi Shay,
Here is an interesting observation... Once a node is warmed up, ES
is using ~9GB of memory. When I run the test to add the new node,
performance does not stabilize until the new node's memory usage hits
about the 9GB mark.

This implies to me that the degradation is occurring due to ES needing
to build up the filter cache and the data cache.

Also, the first node in the cluster looks fine in terms of system
metrics while this goes on. It is the new node that was just added
that maxes out the CPU and is the bottleneck.

Thanks,
Paul

Yes, it might be that. If you use facets / sorting / filters, then the
cache will require some time to fill up.

Especially with this many shards...

Yeah, doing heavy filtering and sorting based on date.

Every one of my queries uses a cached filter for part of it. The
filters cached are finite, probably ~2000 or so different ones.

I can get my shard count down to ~45 without much effort, but going
lower than that will be non-trivial, because I am
dynamically translating queries from a previous domain to a new one
and the old domain had 45 standalone indexes.

Since the node immediately joins the cluster, I don't see any way that
I can warm it up in isolation. In our current search system, each machine
is isolated and none share any state, so it is easy to re-run the last
N queries to warm its internal caches.

My other alternative is to run multiple single-machine clusters, which
would allow isolated warm-up, but this config would remove a large
chunk of ES's usefulness and introduce data consistency issues.

From an ES perspective, I see a few possible solutions:

  1. Do a configurable internal warm-up of the node. It could either
     dispatch live queries to the node and disregard the response, or store
     the last N queries to re-run (a rough client-side sketch of this idea is below).
  2. Figure out how to distribute the caches from the active nodes. This
     would be a really cool approach, but I don't know about the
     feasibility.
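
For reference, something like this is what I mean by re-running the last N
queries from the client side: a small script that replays recorded query
bodies against the newly started node and throws away the responses. It is
only a rough sketch, and the node address, index name, and query log path
are placeholders for whatever the environment actually uses.

import urllib.request

# Placeholder values -- adjust for the environment.
NEW_NODE = "http://localhost:9200"   # HTTP address of the node to warm up
INDEX = "_all"                        # or a specific index to target
QUERY_LOG = "last_queries.jsonl"      # one recorded query body (JSON) per line

def replay_queries(node, index, query_log):
    """Replay recorded query bodies against the node, discarding responses."""
    with open(query_log) as f:
        for line in f:
            body = line.strip()
            if not body:
                continue
            req = urllib.request.Request(
                url="%s/%s/_search" % (node, index),
                data=body.encode("utf-8"),
                headers={"Content-Type": "application/json"},
            )
            # The only point is the side effect of filling the caches,
            # so the response is read and thrown away.
            urllib.request.urlopen(req).read()

if __name__ == "__main__":
    replay_queries(NEW_NODE, INDEX, QUERY_LOG)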

I am open to any other suggestions for things I can do on my side.

Thanks,
Paul

Is the 10-second period of QPS degradation critical? Another solution can
be to have more machines to even out the load and reduce the effect of another
node joining the cluster.

Doing a pre-warming phase is not simple at all (though it certainly sounds
like it ;) ). The reason is that I can add pre-warming on a shard level, but
you end up having 75 shards allocated on that node. Each shard is on its own;
they are not interconnected and have no dependencies on one another (as that
is very, very complicated to do in a distributed fashion). So, the first shard
can warm up before it becomes available, but then the others warming up might
cause the queries directed to the already-started shard to slow down.

-shay.banon

Hey Shay,
Thanks for the further details. My final config will be using a 3-node
cluster, and I need to get another piece of h/w to see how things
behave for this case with that config.

The throughput will probably still be good enough to live with the
performance hit of adding a node, but I am not sure. Worst case, I will
have to have rules on my side that we can't add a new node during
high-traffic periods, which sounds a little backwards.
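
At a minimum, I could gate on cluster health from the ops side before
putting a freshly added node into rotation. A rough sketch of what I have
in mind (assuming the cluster health API's wait_for_status parameter is
available in this version; the address and timeout are placeholders), with
the caveat that a green cluster still says nothing about warm caches:

import json
import urllib.request

CLUSTER = "http://localhost:9200"  # placeholder address of any node in the cluster

def wait_for_green(cluster, timeout="120s"):
    """Block until the cluster reports green status (all shards allocated)
    or the timeout expires, then return the health document."""
    url = "%s/_cluster/health?wait_for_status=green&timeout=%s" % (cluster, timeout)
    with urllib.request.urlopen(url) as resp:
        return json.loads(resp.read().decode("utf-8"))

if __name__ == "__main__":
    health = wait_for_green(CLUSTER)
    print("cluster status:", health.get("status"))
    # Only after this point would the new node go into rotation -- and even
    # then, its filter/field caches still need to warm up under real queries.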

When a shard is reallocating, is there any feasibility in copying over
the field/data cache state as well?

Thanks again for the help,
Paul

No, the field data cache is unique to the JVM and not really serializable. I
can try and check if it's possible to minimize the time mentioned, but won't
the 10 seconds of slower QPS be mitigated by doubling the QPS once all is done?

-shay.banon

It all depends on the search requirements, SLAs, etc...

The tests I am running at the moment are synthetic in that they don't
simulate real-world traffic, and instead go for max throughput.
I should have the infrastructure in place over the next couple of
weeks to replay traffic through ES at real-world rates (or multiples
thereof), with the burst patterns associated with that, and will know
for sure whether this will be a real issue for me.

If it is still an issue for me, it sounds like I may need to have an
extra server in the cluster to account for this case, but we will
see.

Thanks,
Paul

On Sep 1, 1:41 pm, Shay Banon shay.ba...@elasticsearch.com wrote:

No, the field data cache is unique to the JVM and not really serializable. I
can try and check if its possible to minimize the time mentioned, but won't
the 10 seconds slower QPS be mitigated by doubling the QPS once all is done?

-shay.banon

On Wed, Sep 1, 2010 at 10:38 PM, Paul ppea...@gmail.com wrote:

Hey Shay,
Thanks for the further details. My final config will be using a 3
node cluster and I need to get another piece of h/w to see how things
behave for this case with that config.

The throughput will probably still be good enough to live with the
performance hit of adding a node, but am not sure. Worst case, will
have to have rules on my side that we can't add a new node during high
traffic periods, which sounds a little backwards.

When a shard is reallocating is there any feasibility in copying over
the field/data cache state, as well?

Thanks again for the help,
Paul

On Sep 1, 1:27 pm, Shay Banon shay.ba...@elasticsearch.com wrote:

Is the 10 second period of QPS degradation is critical? Another solution
can
be to have more machines to even the load and reduce the effect of
another
node joining the cluster.

Doing a pre warming phase is not simple at all (though it certainly
sounds
like it :wink: ). The reason is that I can add pre warming on a shard level,
but
you end up having 75 shards allocate on that node. Each shard is on its
own,
they are not interconnected, or have dependencies one or the other (as it
is
very very complicate to do in distributed fashion). So, the first shard
can
warm before it is available, but then, the others warming up might cause
the
queries directed to the already started shard to slow down.

-shay.banon

On Wed, Sep 1, 2010 at 10:16 PM, Paul ppea...@gmail.com wrote:

Yeah, doing heavy filtering and sorting based on date.

Everyone of my queries is using a filter cache for part of it. The
filter caches used are finite, probably ~2000 or so different ones.

I can get my shard count down to ~45 without much effort, but lower
than that point will be non-trivial due to the fact that I am
dynamically translating queries from a previous domain to a new one
and the old domain had 45 standalone indexes.

Since the node immediately joins the cluster, I don't see anyway that
I can warm it up in isolation. Our current search system, each machine
is isolated and none share any state, so it is easy to re-run the last
N queries to warm its internal caches.

My other alternative is to run multiple single machine clusters, which
would allow isolated warm up, but this config would remove a large
chunk of ES's usefulness and have data consistency issues.

From an ES perspective, I see a few possible solutions:

  1. Do a configurable internal warm up of the node. Could either
    dispatch live queries to the node and disregard the response or store
    the last N queries to re-run
  2. Figure out how to distribute the caches from the active nodes. This
    would be a really cool approach, but I don't know about the
    feasibility.

I am open to any other suggestions that I can do on my side.

Thanks,
Paul

On Sep 1, 12:58 pm, Shay Banon shay.ba...@elasticsearch.com wrote:

Especially with this many shards... .

On Wed, Sep 1, 2010 at 9:58 PM, Shay Banon <
shay.ba...@elasticsearch.com
wrote:

Yes, this might be that. If you use facets / sorting / filters then
the
cache will require some time to fill up.

On Wed, Sep 1, 2010 at 9:54 PM, Paul ppea...@gmail.com wrote:

Hi Shay,
Here is an interesting observation... Once a node is warmed up,
ES
is using ~9GB of memory. When I run the test to add the new node
performance does not stabilize until the new nodes memory usage
hits
about the 9GB mark.

This implies to me that the degradation is occurring due to ES
needing
to build up the filter cache and the data cache.

Also, the first node in the cluster looks fine in terms of system
metrics while this goes on. It is the new node that was just added
that spikes out the CPU and is the bottleneck.

Thanks,
Paul

On Sep 1, 12:44 pm, Paul ppea...@gmail.com wrote:

On 0.10.0.

I played around with that setting, trying 1 (which I actually think
gets set to 3 in the code) and 10. Both settings demonstrate the same
performance impact as the default, which should be 24 (cores) in my
case.

Also, I played around with in-memory indexes in this situation. It seems
the same issue occurs, but I'm not 100% confident in my results, as I
started swapping due to low memory.

Thanks,
Paul

On Sep 1, 12:22 pm, Shay Banon shay.ba...@elasticsearch.com wrote:

If you are using the latest elasticsearch (0.10), then the setting is
routing.allocation.concurrent_recoveries. Which version are you using?

FYI, I figured out how to avoid this. Since the shard recovery from the
fs is so fast when there have been no index updates, all shards are up
nearly instantly with empty caches.

To work around this contention, I have:

  1. as suggested, throttled concurrent_recoveries to 1
  2. removed the work directory before node start, forcing a much slower
     recovery from the gateway
  3. cleared the system disk cache (ymmv on this one, but it seemed to
     help me)

When I do this, performance never drops below the single-node steady
state.
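
Step 1 is the elasticsearch.yml setting discussed earlier
(routing.allocation.concurrent_recoveries set to 1). Steps 2 and 3 are
just host-side housekeeping before starting the node process; a rough
sketch of what that could look like, assuming a Linux box, root
privileges for dropping the page cache, and a placeholder work-directory
path (the real path depends on your node's settings):

import shutil
import subprocess
from pathlib import Path

WORK_DIR = Path("/var/elasticsearch/work")  # placeholder path

def prepare_slow_recovery():
    # Step 2: remove the local working copy so the node has to recover
    # from the gateway instead of coming up instantly with cold caches.
    if WORK_DIR.exists():
        shutil.rmtree(WORK_DIR)

    # Step 3: flush dirty pages and drop the OS page cache (Linux only,
    # needs root). As noted above, your mileage may vary.
    subprocess.run(["sync"], check=True)
    with open("/proc/sys/vm/drop_caches", "w") as f:
        f.write("3\n")

if __name__ == "__main__":
    prepare_slow_recovery()
    # Start the elasticsearch node afterwards, e.g. via its service script.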

This got me thinking about an alternative approach that may handle this
and many other cluster conditions.

If there were feedback into the shard router from response times, CPU
load, etc., it would be able to route traffic around shards with warming
caches, nodes with failing h/w, and out-of-memory conditions, and
properly distribute traffic between slower and faster h/w in the
cluster.

Right now, in a smaller cluster (and probably a bigger one), one slow
node can drag down performance disproportionately.
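
This isn't anything ES does today as far as I can tell; just to make the
idea concrete, here is a toy sketch of what feedback-aware copy
selection could look like. It keeps an exponentially weighted moving
average of observed response time per node and prefers the replica on
the fastest node, so a freshly added (cold) or struggling node receives
less traffic until its observed latency recovers. All names are made up
for illustration.

import random
from collections import defaultdict

class LatencyAwareRouter:
    """Toy illustration of feedback-based shard routing (not how ES works)."""

    def __init__(self, alpha=0.2):
        self.alpha = alpha
        self.ewma_ms = defaultdict(lambda: None)  # node id -> smoothed latency

    def record(self, node, latency_ms):
        """Feed an observed response time back into the router."""
        prev = self.ewma_ms[node]
        if prev is None:
            self.ewma_ms[node] = latency_ms
        else:
            self.ewma_ms[node] = self.alpha * latency_ms + (1 - self.alpha) * prev

    def choose(self, candidate_nodes):
        """Pick which node's copy of a shard should serve the next query."""
        measured = [n for n in candidate_nodes if self.ewma_ms[n] is not None]
        unmeasured = [n for n in candidate_nodes if self.ewma_ms[n] is None]
        # Occasionally probe nodes with no data yet (so they can warm up),
        # otherwise pick the node with the lowest smoothed latency.
        if unmeasured and (not measured or random.random() < 0.1):
            return random.choice(unmeasured)
        return min(measured, key=lambda n: self.ewma_ms[n])

# Example: the cold node reports slow responses, so most traffic goes to
# the warm node until the cold node's latency comes back down.
router = LatencyAwareRouter()
router.record("node-1", 15.0)    # warm node
router.record("node-2", 120.0)   # freshly added node, still filling caches
print(router.choose(["node-1", "node-2"]))  # prints "node-1"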

Thanks again for all the help.

Best Regards,
Paul

On Sep 1, 2:13 pm, Paul ppea...@gmail.com wrote:

It all depends on the search requirements, SLAs, etc...

The tests I am running at the moment are synthetic in that they don't
simulate real-world traffic, and instead are going for max throughput.
I should have the infrastructure in place over the next couple of
weeks to replay traffic through ES at real-world rates (or multiples
thereof) w/ the burst patterns associated with that, and will know for
sure if this will be a real issue for me.

If it is still an issue for me, it sounds like I may need to have an
extra server in the cluster to account for this case, but we will
see.

Thanks,
Paul

On Sep 1, 1:41 pm, Shay Banon shay.ba...@elasticsearch.com wrote:

No, the field data cache is unique to the JVM and not really serializable.
I can try and check if it's possible to minimize the time mentioned, but
won't the 10 seconds of slower QPS be mitigated by doubling the QPS once
all is done?

-shay.banon

Agreed, smart traffic / operation shaping based on load (a tricky thing
to define) will certainly help here.

-shay.banon
