EC2 Performance problems, advice needed


(Ariel Amato) #1

Hi,
I have been trying to use ES for a search service for my site,
which is running on EC2 servers, and I am having a lot of performance
problems.
My data set is roughly 1.2M records of 30 fields each, mostly
Integer or Long numbers, a few short Strings and a Geo location
point.
Most of the search queries use a native custom score script.
I need to query the ES cluster with around 300 concurrent
requests, and I need a performance of roughly 40 requests per second.
I have a staging environment with around 2k records of the same
nature as the production data set, using the default sharding
configuration; testing with only one ES instance, it achieves over
30 requests per second with the above-mentioned number of concurrent
requests.
My production cluster consists of 2 c1.xlarge instances,
which are 8-core boxes with 7 GB of memory each, running Ubuntu
10.10 and ES 0.16.2.
Here is my elasticsearch.yml configuration file: https://gist.github.com/1066379
Right now I am testing a sharding configuration of 10 shards and 1
replica.
Nodes are discovered perfectly. I use the bulk API (through the
Java API) to index the data in bulks of around 1k documents, with no
indexing performance problems, and my client is connected to both ES
instances.
Before going live with the new service I decided to stress test
it, and after a lot of testing I discovered that I cannot come
close to my desired performance.
With this configuration, when I run the stress test with 100
concurrent clients making requests, after around 10 seconds the
system load on the Elasticsearch servers climbs above 10 and I
start getting timeouts (8 seconds) in the stress test.
During this time there are no exceptions in the ES log, which is at
INFO level.
I have also noticed an uneven distribution of load: most of
the time the master node has a system load above 10 while the other
node has a system load of 2-3.
I have also tested without the custom script, with the same
results.
For the search queries I am using a BoolQuery with a few range
terms and a couple of Integer terms; I am not using full text
search.

If you have endured my ranting so far, here are a couple of
questions:

Is my configuration ok? Am I missing something?
What would your recommendation be to achieve my desired
performance? More smaller servers, more extra large instances, more
sharding, less sharding?
Should I discard ElasticSearch and use a different solution?

Thank you in advance,
Best regards,

Ariel Amato


(jp.lorandi) #2

How much min and max ram for ES -- you only mention the image's RAM. Is there anything else running on the box?

Also, a high shard/replica ratio like yours increases indexing (esp. bulk) performance, but the inverse ratio boosts search speed, so increasing the number of replicas will speed up the system (there have to be more replicas than shards, though, for optimal results).

I would try smaller machines -- fewer cores, less RAM; 2 shards, 8 replicas, 8 nodes total. Shay even suggests a 1/10 ratio in the README on GitHub.

Hope that helps,
JP


-------- Original Message --------

Subject: EC2 Performance problems, advice needed

From: Ariel Amato <ariel.amato.gm@gmail.com>

Date: Tue, July 05, 2011 11:32 pm

To: users <users@elasticsearch.com>



(Clinton Gormley) #3

Hi Ariel

Most of the search queries use a native custom score script.

Is this really required? Could you gist some example docs and queries?

Here is my elasticsearch.yml configuration file: https://gist.github.com/1066379

I see you're using mlockall, which is good. Are you sure that it is
being applied though?

ulimit -l needs to be set to unlimited, and ubuntu by default doesn't
apply the limits.conf file with sudo.

Also, ES_MIN_MEM and ES_MAX_MEM should be set to the same value, and
when you start the node, it should reserve that amount of memory right
from the beginning. You can use https://github.com/lukas-vlcek/bigdesk
to see if it is doing that.

Also, don't give ALL your memory to ES. The kernel needs space for the
file cache as well.

Right now I am testing a sharding configuration of 10 shards and 1
replica.

For only 1.2m docs, why are you using 10 primary shards? Reducing the
number of shards could improve performance, and you can increase the
number of replicas to spread out the search load more.

For the search queries I am using a BoolQuery with a few range
terms and a couple of Integer terms, and I am not using the full text
search.

Unless you want those terms to be included in the relevance (_score)
calculation, you should use filters instead of queries - they can be
cached and don't have to go through the scoring process.
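For example, something along these lines in the Query DSL (the field
names here are invented for illustration; the non-scoring clauses move
into a filter that can be cached):

```json
{
  "query": {
    "filtered": {
      "query": { "match_all": {} },
      "filter": {
        "and": [
          { "numeric_range": { "age": { "from": 25, "to": 31 } } },
          { "term": { "status": 1 } }
        ]
      }
    }
  }
}
```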

Should I discard ElasticSearch and use a different solution?

Bah humbug! :slight_smile:

clint


(Ariel Amato) #4

Hi,
Thank you for your replies, here are the answers to your
questions:

@JP:

"How much min and max ram for ES -- you only mention the image's RAM. Is there anything else running on the box?"

5GB of ram assigned to ES, there's nothing else running on that box.
The box has 7GB total, of which 1.1GB are free.

@clint:

"Is this really required? Could you gist some example docs and queries? "
I'll upload some examples in a couple of hours.

"I see you're using mlockall, which is good. Are you sure that it is
being applied though?
ulimit -l needs to be set to unlimited, and ubuntu by default doesn't
apply the limits.conf file with sudo. "

I'm using 5GB of RAM assigned to ES, with the same value for
ES_MIN_MEM and ES_MAX_MEM.
ES is able to lock that memory at startup and it's not swapping
anything; here is the memory usage output:

             total       used       free     shared    buffers     cached
Mem:       7132040    5995712    1136328          0     188708      54844
-/+ buffers/cache:    5752160    1379880
Swap:            0          0          0

"For only 1.2m docs, why are you using 10 primary shards? Reducing the
number of shards could improve performance, and you can increase the
number of replicas to spread out the search load more. "

Following your suggestion I decreased the number of primary shards
and increased the number of replicas, but indexing time took a real
blow: indexing the full set of 1.2M users would now take around
20 hours with 6 shards/6 replicas. I've also tried 2 shards/4
replicas and 4 shards/4 replicas with similar indexing results.
With 4 shards/2 replicas, indexing time was reasonably fast, and I was
able to achieve ~30 requests per second with fewer timeouts.

"Unless you want those terms to be included in the relevance (_score)
calculation, you should use filters instead of queries - they can be
cached and don't have to go through the scoring process. "

I actually need these terms to be included in the score, as I need the
quality of the results to degrade gracefully when there are no exact
matches.
For example, when I'm making a geo search, I need to include results
that fit within the given distance range, but I also need to include
results outside the distance range with a lower score based on
their distance to the point. I know, you are going to say that what I'm
describing is more of a sort operation than a score operation, but I
also need the rest of the search terms to be included in the score.

Should I discard ElasticSearch and use a different solution?

Bah humbug! :slight_smile:

Sorry for that, I actually think that ES is great, really simple to
use and powerful, but I was wondering whether it's the right tool for
the job...

I will try another one of JP's suggestions, having a few smaller
servers, but I'm worried about indexing performance with that setup.

Thanks again for your replies, and thanks in advance for any other
help you can give me.

Regards,

Ariel

On Jul 6, 6:40 am, Clinton Gormley clin...@iannounce.co.uk wrote:



(Ariel Amato) #5

Hi again,
Now I am seeing that with the high replica/shard ratio you
suggested, 8 shards/8 replicas, it does not seem to index
anything.
The bulk call returns in about 1 minute, but the index
seems empty; in the elasticsearch log I see:

[2011-07-06 14:35:02,853][INFO ][cluster.metadata ]
[Cagliostro] [users] creating index, cause [auto(bulk api)], shards
[8]/[6], mappings []

But even after a while of indexing, there is still no
information available in ES. What am I doing wrong?

Regards,
Ariel

On Jul 6, 10:21 am, Ariel Amato ariel.amato...@gmail.com wrote:



(Shay Banon) #6

On EC2, I would suggest going with m1.xlarge, as more RAM means more file system cache. Also, having many shards on 2 nodes can slow down search, as each query needs to execute across all of them; maybe you can do with 2-4 shards (with 1 replica). Increasing the number of replicas will not help, since you only run on 2 nodes (and you can always increase it dynamically later if you add more nodes).

Also, in terms of queries, maybe some of your range queries, especially the numeric ones, can benefit from being used as filters instead of queries (so they can be cached).

How much memory are you allocating to ES?

Master nodes play no part in searching, so the fact that one node uses more resources does not mean it is because it's the master node.

Can you tell where the load is coming from? Is it CPU, IO?

On Wednesday, July 6, 2011 at 7:33 AM, jp.lorandi@cfyar.com wrote:


(Shay Banon) #7

How many nodes do you have in the cluster? More replicas than the nodes can hold (for example, 2 nodes and 6 replicas) will cause indexing to fail by default, as ES expects a quorum of replicas to be allocated (and on 2 nodes it will only assign the primary shard and one replica, with no place for the rest of the replicas).
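Concretely, on 2 nodes a setup that can be fully allocated would be created with a replica count the cluster can actually hold (a sketch of the index creation settings; the `users` index name is taken from the log message earlier in the thread):

```json
{
  "settings": {
    "index": {
      "number_of_shards": 4,
      "number_of_replicas": 1
    }
  }
}
```

Then, after adding nodes, `number_of_replicas` can be raised dynamically through the update index settings API; the shard count is fixed at creation time.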

This thread has somehow split, I gave more suggestions on the other one...

On Wednesday, July 6, 2011 at 5:43 PM, Ariel Amato wrote:



(Ariel Amato) #8

Hi Shay, I am allocating 5GB of RAM to ES on each server. When it's
not under load, heap usage is really low, ~350MB of the 5GB
allocated.
Under load, the number of threads increased to around 150 (when I
stopped the test).
The problem with the uneven load between the master and slave node
seems to be fixed if I specify a routing key when indexing.
The main cause of the load seems to be CPU usage; here is a "top"
output:

top - 16:24:07 up 17:45,  2 users,  load average: 31.69, 20.89, 11.02
Tasks: 158 total,   1 running, 157 sleeping,   0 stopped,   0 zombie
Cpu(s): 88.3%us,  3.3%sy,  0.0%ni,  2.7%id,  0.0%wa,  0.0%hi,  0.1%si,  5.6%st
Mem:   7132040k total,  7042560k used,    89480k free,   178336k buffers
Swap:        0k total,        0k used,        0k free,  1078180k cached

  PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND
28055 root      20   0 5689m 5.3g  16m S  789 77.7  21:45.30 java
  221 root      20   0     0    0    0 S    0  0.0   0:28.83 kjournald
28314 root      20   0 19276 1352  968 R    0  0.0   0:00.06 top
    1 root      20   0 23860 1364  656 S    0  0.0   0:00.83 init

Regarding the queries: for the range terms, I need the quality of
the results that do not match to degrade gracefully.
For example, when searching on an age field with a range of 25 to 31,
users in that range should get the highest score, but I also need to
include users aged 32 with a lower score, and aged 35 with an even
lower score. Please let me know if there is a way to do that with
filters.

I'm in the process of expanding the cluster to 4 m1.large instances,
with a 2 shards / 2 replicas configuration.
Please let me know what you think about that.

Thanks in advance
Regards,

Ariel

On Jul 6, 1:07 pm, Shay Banon shay.ba...@elasticsearch.com wrote:

On ec2, I would suggest going with m1.xlarge, as more RAM means more file system cache. Also, having more shards on 2 nodes can slow down search, as it needs to execute across all of them, maybe you can do with 2-4 shards (with 1 replica). Increasing the number of replicas will not help, since you only run on 2 nodes (and you can always increase it dynamically later if oyu add more nodes).

Also, in terms of queries, maybe some of your range queries, especially the numeric ones, can benefit from being used as filters instead of queries (so they can be cached).

How much memory are you allocating to ES?

Master nodes play no part when searching, so the fact that it uses more resources does not imply because its the master node.

Can you tell where the load is coming from? Is it CPU, IO?

On Wednesday, July 6, 2011 at 7:33 AM, jp.lora...@cfyar.com (mailto:jp.lora...@cfyar.com) wrote:

How much min and max ram for ES -- you only mention the image's RAM. Is there anything else running on the box?

Also a high shard/replica ratio like you have increases indexing (esp. bulk) performance, but the inverse ratio will boost search speed, so enlarging the amount of replicas will speed up the system (have to be more replicas than shards though for optimal results).

I would try with smaller machines -- less cores, less ram, 2 shards, 8 replicas, 8 nodes total. Shay suggests in the README on github a 1/10 ratio even.

Espero que ayude,
JP

-------- Original Message --------
Subject: EC2 Perfomance problems, advice needed
From: Ariel Amato <ariel.amato...@gmail.com (http://ariel.amato...@gmail.com)>
Date: Tue, July 05, 2011 11:32 pm
To: users <us...@elasticsearch.com (mailto:us...@elasticsearch.com)>

Hi,
I have been trying to use ES as a search service for my site,
which runs on EC2 servers, and I am having a lot of performance
problems.
My data set is roughly 1.2M records of 30 fields each, mostly
Integer or Long numbers, a few short Strings, and a Geo location
point.
Most of the search queries use a native custom score script.
I need to query the ES cluster with around 300 concurrent
requests, and I roughly need a throughput of 40 requests per second.
I have a staging environment with around 2k records of the same
nature as the production data set. With the default sharding
configuration, testing against a single ES instance, it achieves
over 30 requests per second at the above-mentioned level of
concurrency.
My production cluster consists of 2 c1.xlarge instances, which
are 8-core boxes with 7 GB of memory each, running Ubuntu 10.10
and ES 0.16.2.
Here is my elasticsearch.yml configuration file: https://gist.github.com/1066379
Right now I am testing a sharding configuration of 10 shards and 1
replica.
Nodes are discovered perfectly. I use the bulk API (through the
Java API) to index the data in batches of around 1k, with no
indexing performance problems, and my client is connected to both ES
instances.
Before going live with the new service I decided to stress test
it, and after a lot of testing I discovered that I cannot come
close to my desired performance.
With this configuration, when I run the stress test with 100
concurrent clients making requests, after around 10 seconds the
system load on the elasticsearch servers starts to climb above 10,
and I start getting timeouts (8 seconds) on the stress test.
During this time there are no exceptions in the ES log, which is at
INFO level.
I have also noticed an uneven distribution of the load: most of
the time the master node has a system load above 10 while the other
node has a system load of 2-3.
I have also tested without the custom script, with the same
results.
For the search queries I am using a BoolQuery with a few range
terms and a couple of Integer terms, and I am not using full text
search.
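For reference, the kind of stress test described above can be sketched as a minimal, hypothetical harness (the endpoint, concurrency, and request count below are placeholders, not the actual test):

```python
import time
import urllib.request
from concurrent.futures import ThreadPoolExecutor


def run_stress_test(url, concurrency=100, total_requests=1000, timeout=8.0):
    """Fire GET requests at `url` from `concurrency` worker threads.

    Returns (successful_requests_per_second, failure_count), where a
    failure is any timeout or connection error.
    """
    ok = 0
    failures = 0

    def one_request(_):
        try:
            # Per-request timeout mirrors the 8-second timeout in the test.
            with urllib.request.urlopen(url, timeout=timeout) as resp:
                resp.read()
            return True
        except OSError:  # covers URLError and socket timeouts
            return False

    start = time.monotonic()
    with ThreadPoolExecutor(max_workers=concurrency) as pool:
        for succeeded in pool.map(one_request, range(total_requests)):
            if succeeded:
                ok += 1
            else:
                failures += 1
    elapsed = time.monotonic() - start
    return ok / elapsed, failures
```

Pointed at an Elasticsearch HTTP search endpoint, this yields a rough successful-requests-per-second figure plus a timeout count; the 100-300 concurrent clients and the 8-second timeout from the description map onto `concurrency` and `timeout`.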

If you have endured my ranting so far, here are a couple of
questions:

Is my configuration ok? Am I missing something?
What would your recommendation be to achieve my desired
performance? More smaller servers, more extra large instances, more
sharding, less sharding?
Should I discard ElasticSearch and use a different solution?

Thank you in advance,
Best regards,

Ariel Amato


(Shay Banon) #9

Hey,

I would suggest using m1.xlarge, and allocate to ES ~5gb. When you run the test, can you take a node.stats sample, and gist it?

As for your queries and using filters: it feels like filters will really help here. Can you gist a sample query so we can work on what it will look like?

-shay.banon

On Wednesday, July 6, 2011 at 7:46 PM, Ariel Amato wrote:

Hi Shay, I am allocating 5gb of RAM to ES on each server. When it's
not under load, heap usage is really low, ~350mb of the 5gb
allocated.
Under load, the number of threads had increased to around 150 (when I
stopped the test).
The problem with the uneven load between the master and slave node
seems to be fixed if I specify a routing key when indexing.
The main cause of the load seems to be CPU usage; here is "top"
output:

top - 16:24:07 up 17:45, 2 users, load average: 31.69, 20.89, 11.02
Tasks: 158 total, 1 running, 157 sleeping, 0 stopped, 0 zombie
Cpu(s): 88.3%us, 3.3%sy, 0.0%ni, 2.7%id, 0.0%wa, 0.0%hi, 0.1%si, 5.6%st
Mem: 7132040k total, 7042560k used, 89480k free, 178336k buffers
Swap: 0k total, 0k used, 0k free, 1078180k cached

  PID USER PR NI  VIRT  RES SHR S %CPU %MEM    TIME+ COMMAND
28055 root 20  0 5689m 5.3g 16m S  789 77.7 21:45.30 java
  221 root 20  0     0    0   0 S    0  0.0  0:28.83 kjournald
28314 root 20  0 19276 1352 968 R    0  0.0  0:00.06 top
    1 root 20  0 23860 1364 656 S    0  0.0  0:00.83 init

Regarding the queries: for the range terms, I need the score to
degrade gracefully for results that do not match the range.
For example, when searching on an age field I specify a range of
25 to 31, so users in that range should get the highest score,
but I also need the results to include users aged 32 with a lower
score, and 35 with an even lower score. Please let me know if there is
a way to do that with filters.

I'm in the process of expanding the cluster to 4 m1.large instances,
with a 2 shards / 2 replicas configuration.
Please let me know what you think about that.

Thanks in advance
Regards,

Ariel

On Jul 6, 1:07 pm, Shay Banon <shay.ba...@elasticsearch.com> wrote:

On EC2, I would suggest going with m1.xlarge, as more RAM means more file system cache. Also, having many shards on 2 nodes can slow down search, since each search needs to execute across all of them; maybe you can do with 2-4 shards (with 1 replica). Increasing the number of replicas will not help, since you only run on 2 nodes (and you can always increase it dynamically later if you add more nodes).

Also, in terms of queries, maybe some of your range queries, especially the numeric ones, can benefit from being used as filters instead of queries (so they can be cached).
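As a sketch of moving the numeric ranges into filters, using the 0.16-era `filtered` query (the field names `age` and `country_id` are placeholders, not from the actual query):

```json
{
  "query": {
    "filtered": {
      "query": { "match_all": {} },
      "filter": {
        "and": [
          { "range": { "age": { "from": 25, "to": 31 } } },
          { "term": { "country_id": 42 } }
        ]
      }
    }
  }
}
```

Unlike scored range query clauses, these filter clauses do not contribute to scoring and can be cached and reused across requests.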

How much memory are you allocating to ES?

Master nodes play no part in searching, so the fact that one node uses more resources is not explained by its being the master node.

Can you tell where the load is coming from? Is it CPU, IO?

On Wednesday, July 6, 2011 at 7:33 AM, jp.lora...@cfyar.com wrote:

How much min and max RAM for ES -- you only mention the instance's RAM. Is there anything else running on the box?

Also, a high shard/replica ratio like yours increases indexing (esp. bulk) performance, but the inverse ratio will boost search speed, so increasing the number of replicas will speed up the system (there have to be more replicas than shards, though, for optimal results).

I would try smaller machines -- fewer cores, less RAM, 2 shards, 8 replicas, 8 nodes total. Shay suggests in the README on GitHub even a 1/10 ratio.

Hope that helps,
JP



(Ariel Amato) #10

Hey,

Here they are:
Stats when idle: https://gist.github.com/1067820
Stats under load: https://gist.github.com/1067822
Top: https://gist.github.com/1067818

Sample code for generating the query: https://gist.github.com/1067840

How many m1.xlarge servers do you suggest?

Regards,
Ariel



(Shay Banon) #11

Heya,

Let's start with the query: try replacing the range query you add as a should clause with a ConstantScore query wrapping a range filter (and set the boost on it). This works purely in memory and should speed things up.

The suggested filtering will use more memory (you should see the filter cache being used in the stats), but not by much, since age is easily cacheable.
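Sketched in the 0.16-era query DSL, the ConstantScore-wrapped range filters for the graded age scoring might look like this -- assuming an `age` field, with the tier boundaries and boost values purely illustrative:

```json
{
  "query": {
    "bool": {
      "should": [
        { "constant_score": { "filter": { "range": { "age": { "from": 25, "to": 31 } } }, "boost": 3.0 } },
        { "constant_score": { "filter": { "range": { "age": { "from": 32, "to": 33 } } }, "boost": 2.0 } },
        { "constant_score": { "filter": { "range": { "age": { "from": 34, "to": 35 } } }, "boost": 1.0 } }
      ]
    }
  }
}
```

Each tier matches via a cacheable filter and contributes only its fixed boost to the score, so documents in the inner range outrank those in the outer tiers.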

I suggest you start with 2 nodes and test; let's see what we get. Improving search performance is simple: as long as you don't have full utilization of your nodes (for example, 2 shards with 1 replica will max out at 4 nodes), you can dynamically increase the number of replicas.

-shay.banon



(Ariel Amato) #12

Hi again,

I created the cluster you suggested, but with no luck. Same results as
before: an extremely high load on one of the nodes, almost nothing on
the other, and a lot of timeouts.
After that I added a new node to the cluster, with a configuration
like this:

  • 3 shards / 1 replica, at first, and then
  • 3 shards / 3 replicas

Results were similar. Then I removed almost every term from the query
(leaving just 2 terms, which are simple integer-valued properties) and
ran the stress test again, with a big improvement in the number of
requests per second and just a few timeouts -- but there were still
timeouts.
Once again, though, system load skyrocketed; here are some captures
from the servers:

Master node:
top: https://gist.github.com/1068559
vmstat: https://gist.github.com/1068564

Slave node 1:
top: https://gist.github.com/1068566
vmstat: https://gist.github.com/1068571

Slave node 2:
top: https://gist.github.com/1068576

Total GC time (ParNew + CMS) on all nodes does not exceed 300ms.

We're using the Sun JVM (Java HotSpot(TM) 64-Bit Server VM, build
19.1-b02, Java version 1.6.0_24) on all nodes, and the OS is Ubuntu
10.10.

Would a thread dump of the nodes help?
Right now I'm using dynamic mapping, except for a geo location
point; would it help to create the mappings before indexing? At the
moment I'm not seeing any problems at indexing time.
I'm actually running out of questions to ask. Tomorrow I'll work a
little on rewriting the queries per your suggestion, but even
without the search query the system load is way too high.

Thank you in advance for any help you can give me on this.

Best regards
Ariel



(Shay Banon) #13

Hey,

What I would like to see are the queries that you execute, and possibly the script.

A note on shards and replicas: you can increase the number of replicas to a high value, but as long as you don't have enough nodes to allocate them, it does not matter. For example, if you have 3 shards, 1 replica, and 2 nodes, increasing the number of replicas would have no effect unless you bring more nodes into the picture.

Another option you can play with is the search thread pool. Can you try configuring it as a fixed-size thread pool? The configuration is:

threadpool.search.type: fixed
threadpool.search.size: 10

-shay.banon

On Thursday, July 7, 2011 at 2:27 AM, Ariel Amato wrote:

Hi again,

I created the cluster you suggested, but with no luck. Same results as
before, an extremely high load on one of the nodes and almost nothing
on the other and a lot of timeouts.
After that I added a new node to the cluster, with a configuration
like this:

  • 3 shards / 1 replica, at first, and then
  • 3 shards / 3 replicas

Results were similar. Then I removed almost every term from the query
(leaving just 2 terms, which are simple integer-valued properties) and
ran the stress test again, with a big improvement in requests per
second and only a few timeouts, but there were still timeouts.
But again, system load skyrocketed. Here are some captures from the
servers:

Master node:
top: https://gist.github.com/1068559
vmstat: https://gist.github.com/1068564

Slave node 1:
top: https://gist.github.com/1068566
vmstat: https://gist.github.com/1068571

Slave node 2:
top: https://gist.github.com/1068576

Total GC time (ParNew + CMS) on all nodes does not exceed 300ms.

We're using the Sun JVM (Java HotSpot(TM) 64-Bit Server VM, version
19.1-b02, Java version 1.6.0_24) on all nodes, and the OS is Ubuntu
10.10.

Would a thread dump of the nodes help?
Right now I'm using dynamic mapping, except for a geo location point;
would it help to create the mappings before indexing? At the moment
I'm not seeing any problems at indexing time.
I'm actually running out of questions to ask; tomorrow I'll work a
little on replicating the queries with your suggestion, but even
without the search query the system load is way too high.
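If pre-creating the mapping is worth trying, it would look roughly like the JSON body below (built here as a Python dict; the type name and field names are illustrative assumptions, not the real mapping):

```python
# Sketch of an explicit mapping: numeric fields plus a geo_point,
# to be PUT to the index before any documents are indexed.
# "user", "age", "user_id" and "location" are hypothetical names.
mapping = {
    "user": {
        "properties": {
            "age":      {"type": "integer"},
            "user_id":  {"type": "long"},
            "location": {"type": "geo_point"},
        }
    }
}
```

Dynamic mapping infers sensible types for most fields anyway; the main benefit of an explicit mapping is control over special types like geo_point, not search speed.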

Thank you in advance for any help you can give me on this.

Best regards
Ariel

On Jul 6, 4:33 pm, Shay Banon <shay.ba...@elasticsearch.com> wrote:

Heya,

Let's start with the query: try replacing the range query you add as a should clause with a ConstantScore query wrapping a range filter (and set the boost on it). This works purely in memory and should speed things up.
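As a sketch, the suggestion rendered as a JSON query body (the field name and boost value are illustrative, borrowed from the age example discussed later in the thread):

```python
# A bool query whose scoring range clause is a constant_score query
# wrapping a range *filter*: the filter result is cacheable, and the
# clause contributes a fixed boost instead of a computed score.
query = {
    "bool": {
        "should": [
            {
                "constant_score": {
                    "filter": {"range": {"age": {"from": 25, "to": 31}}},
                    "boost": 2.0,
                }
            }
        ]
    }
}
```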

The suggested filtering will use more memory (you should see the filter cache being used in the stats), but not by much, since age is easily cacheable.

I suggest you start with 2 nodes and test; let's see what we get. Improving search performance later is simple: if your nodes are not yet fully utilized (for example, 2 shards with 1 replica max out at 4 nodes), you can dynamically increase the number of replicas.

-shay.banon

On Wednesday, July 6, 2011 at 8:43 PM, Ariel Amato wrote:

Hey,

Here they are:
Stats when idle: https://gist.github.com/1067820
Stats under load: https://gist.github.com/1067822
Top: https://gist.github.com/1067818

Sample code for generating the query: https://gist.github.com/1067840

How many m1.xlarge servers do you suggest?

Regards,
Ariel

On Jul 6, 1:52 pm, Shay Banon <shay.ba...@elasticsearch.com> wrote:

Hey,

I would suggest using m1.xlarge and allocating ~5 GB to ES. When you run the test, can you take a node stats sample and gist it?

As for your queries and using filters, it feels like filters will really help here. Can you gist a sample query so we can work on how it should look?

-shay.banon

On Wednesday, July 6, 2011 at 7:46 PM, Ariel Amato wrote:

Hi Shay, I am allocating 5 GB of RAM to ES on each server. When it's
not under load, heap usage is really low, ~350 MB of the 5 GB
allocated.
Under load, the number of threads increased to around 150 (when I
stopped the test).
The problem with the uneven load between the master and slave nodes
seems to be fixed if I specify a routing key when indexing.
The main cause of the load seems to be CPU usage; here is a "top"
output:

top - 16:24:07 up 17:45, 2 users, load average: 31.69, 20.89, 11.02
Tasks: 158 total, 1 running, 157 sleeping, 0 stopped, 0 zombie
Cpu(s): 88.3%us, 3.3%sy, 0.0%ni, 2.7%id, 0.0%wa, 0.0%hi, 0.1%si, 5.6%st
Mem: 7132040k total, 7042560k used, 89480k free, 178336k buffers
Swap: 0k total, 0k used, 0k free, 1078180k cached

  PID USER PR NI  VIRT  RES  SHR S %CPU %MEM    TIME+ COMMAND
28055 root 20  0 5689m 5.3g  16m S  789 77.7 21:45.30 java
  221 root 20  0     0    0    0 S    0  0.0  0:28.83 kjournald
28314 root 20  0 19276 1352  968 R    0  0.0  0:00.06 top
    1 root 20  0 23860 1364  656 S    0  0.0  0:00.83 init

Regarding the queries: for the range terms, I need the score to
degrade gracefully for results that do not match exactly.
For example, when searching on an age field with a range of 25 to 31,
users in that range should get the highest score, but I also need to
include users aged 32 with a lower score, and 35 with an even lower
score. Please let me know if there is a way to do that with filters.
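One way this graceful degradation could be expressed with filters alone (rather than a score script) is a bool query stacking several constant_score range clauses with decreasing boosts. A sketch, with hypothetical band widths and boost values:

```python
def age_scoring_clauses(ideal_from: int = 25, ideal_to: int = 31) -> list:
    """Build should-clauses: the ideal age range gets the highest
    boost, nearby ranges progressively lower ones. The band widths
    and boost values here are illustrative assumptions."""
    bands = [
        ((ideal_from, ideal_to), 3.0),        # perfect match
        ((ideal_to + 1, ideal_to + 3), 2.0),  # a bit outside the range
        ((ideal_to + 4, ideal_to + 6), 1.0),  # further outside
    ]
    return [
        {"constant_score": {
            "filter": {"range": {"age": {"from": lo, "to": hi}}},
            "boost": boost,
        }}
        for (lo, hi), boost in bands
    ]

# Documents matching a more desirable band score higher; documents
# matching no band contribute nothing from these clauses.
query = {"bool": {"should": age_scoring_clauses()}}
```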

I'm in the process of expanding the cluster to 4 m1.large instances,
with a 2 shards / 2 replicas configuration.
Please let me know what you think about that.

Thanks in advance
Regards,

Ariel

On Jul 6, 1:07 pm, Shay Banon <shay.ba...@elasticsearch.com> wrote:

On EC2, I would suggest going with m1.xlarge, as more RAM means more file system cache. Also, having many shards on 2 nodes can slow down search, since a query needs to execute across all of them; maybe you can do with 2-4 shards (with 1 replica). Increasing the number of replicas will not help, since you only run on 2 nodes (and you can always increase it dynamically later if you add more nodes).

Also, in terms of queries, maybe some of your range queries, especially the numeric ones, can benefit from being used as filters instead of queries (so they can be cached).

How much memory are you allocating to ES?

Master nodes play no special role in searching, so the fact that it uses more resources is not because it is the master node.

Can you tell where the load is coming from? Is it CPU, IO?

On Wednesday, July 6, 2011 at 7:33 AM, jp.lora...@cfyar.com wrote:

How much min and max RAM for ES -- you only mention the instance's RAM. Is there anything else running on the box?

Also, a high shard/replica ratio like you have increases indexing (esp. bulk) performance, but the inverse ratio boosts search speed, so increasing the number of replicas will speed up searches (there have to be more replicas than shards, though, for optimal results).

I would try smaller machines -- fewer cores, less RAM: 2 shards, 8 replicas, 8 nodes total. Shay even suggests a 1/10 ratio in the README on GitHub.

Hope that helps,
JP

-------- Original Message --------
Subject: EC2 Perfomance problems, advice needed
From: Ariel Amato <ariel.amato...@gmail.com>
Date: Tue, July 05, 2011 11:32 pm
To: users <us...@elasticsearch.com>



(Ariel Amato) #14

Hi again,
Here's a sample query, just the age search like you
suggested: https://gist.github.com/1069984
Right now I'm running a 3-server cluster, which is why I tested with
3 replicas.
I've put in the thread pool configuration, but still the same
problems happen.

Thanks for your assistance, and any extra help would be appreciated!

Regards,
Ariel

On Jul 7, 11:32 am, Shay Banon shay.ba...@elasticsearch.com wrote:



(Ariel Amato) #15

Another thing that really worries me is the difference in system load
between the nodes: when I'm running the tests, it's always the master
node that has a spike in system load, even though the requests are
sent to all of the nodes.

Regards,
Ariel

On Jul 7, 2:07 pm, Ariel Amato ariel.amato...@gmail.com wrote:



(Ariel Amato) #16

Adding thread dumps for the master and slave nodes:

Master: https://gist.github.com/1070165
Slave1: https://gist.github.com/1070167
Slave2: https://gist.github.com/1070168

Regards,
Ariel

On Jul 7, 2:24 pm, Ariel Amato ariel.amato...@gmail.com wrote:

Another thing that really worries me is the difference in system load
between the nodes, when i'm running the tests, it's always the master
node, that has a spike on system load.
Even if the requests are sent to all of the nodes.

Regards,
Ariel

On Jul 7, 2:07 pm, Ariel Amato ariel.amato...@gmail.com wrote:

Hi again,
Here's a sample query just the age search like you
suggested:https://gist.github.com/1069984.
Right now i'm running a 3 server cluster, with that config
that's why I tested with 3 replicas.
I've put the configuration for the thread pool, but
still.... the same problems happen.

Thanks for your assistance, and any extra help would be appreciated!

Regards,
Ariel

On Jul 7, 11:32 am, Shay Banon shay.ba...@elasticsearch.com wrote:

Hey,

What I would like to see are those queries that you execute, and possibly the script.

A note on shards and replicas. You can increase teh number of replicas to a high value, but as long as you don't have enough nodes to allocate them, it does not matter. For example, if you have 3 shards and 1 replica, and 2 nodes, and then you increase the number of replicas, it would not have affect unless you bring more nodes into the picture.

Another option that you can try and play with is the search thread pool. Can you try and configure it to be a fixed sized thread pool? The configuration is:

threadpool.search.type: fixed
threadpool.search.size: 10

-shay.banon

On Thursday, July 7, 2011 at 2:27 AM, Ariel Amato wrote:

Hi again,

I created the cluster you suggested, but with no luck. Same results as
before, an extremely high load on one of the nodes and almost nothing
on the other and a lot of timeouts.
After that I added a new node to the cluster, with a configuration
like this:

  • 3 shards / 1 replica, at first, and then
  • 3 shards / 3 replicas

Results were similar, then I removed almost every term from the query
(leaving just 2 terms, which are simple integer value properties), and
ran the stress test again, with a big improvement on the amount of
requests per second, with just a few timeouts, but there were still
timeouts.
But again, system load skyrocketed, here are some captures from the
servers:

Master node:
top:https://gist.github.com/1068559
vmstat:https://gist.github.com/1068564

Slave node 1:
top:https://gist.github.com/1068566
vmstat;https://gist.github.com/1068571

Slave node 2:
top:https://gist.github.com/1068576

Total GC time (ParNew + CMS) on all nodes does not exceed 300ms.

We're using the Sun jvm, Java HotSpot(TM) 64-Bit Server VM, version
19.1-b02, Java version 1.6.0_24 on all nodes and the OS is Ubuntu
10.10.

Would a thread dump of the nodes help?
Right now i'm using the dynamic mapping, except for a geo location
point, would it help to create the mappings before indexing? At the
moment i'm not seeing any problems at indexing time.
I'm actually running out of questions to make, tomorrow i'll work a
little on replicating the queries with your suggestion, but even
without the search query the system load is way too high.

Thank you in advance for any help you can give me on this.

Best regards
Ariel

On Jul 6, 4:33 pm, Shay Banon <shay.ba...@elasticsearch.com (http://elasticsearch.com)> wrote:

Heya,

Lets start with the query, try and replace the range query you add as a should clause with a ConstantScore one wrapping a range filter (and set the boost on it). This will work only on memory and should speed things up.

The suggested filtering will use more memory (you should see in the stats the filter cache being used), but not by much, since age is easily cacheable.

I sugget you start with 2 nodes, and test, lets see what we get. Improving search perf is simple since once if you don't have full utilization of nodes (for example, 2 shard with 1 replica will max out on 4 nodes), you can dynamically increase the number of replicas.

-shay.banon

On Wednesday, July 6, 2011 at 8:43 PM, Ariel Amato wrote:

Hey,

Here they are:
Stats when idle:https://gist.github.com/1067820
Stats under load:https://gist.github.com/1067822
Top:https://gist.github.com/1067818

Sample code for generating the query:https://gist.github.com/1067840

What's the amount of m1.xlarge servers you suggest?

Regards,
Ariel

On Jul 6, 1:52 pm, Shay Banon <shay.ba...@elasticsearch.com (http://elasticsearch.com)> wrote:

Hey,

I would suggest using m1.xlarge, and allocate to ES ~5gb. When you run the test, can you take a node.stats sample, and gist it?

As for your queries and using filter, it feels like filters will really help here. Can you gist a sample query so we can work on how it will look like?

-shay.banon

On Wednesday, July 6, 2011 at 7:46 PM, Ariel Amato wrote:

Hi Shay, I am allocating 5gb of RAM to ES on each server. When it's not under load, heap usage is really low, ~350mb of the 5gb allocated.
Under load, the number of threads increased to around 150 (when I stopped the test).
The problem with the uneven load between the master and slave node seems to be fixed if I specify a routing key when indexing.
The main cause of the load seems to be CPU usage; here is a "top" output:

top - 16:24:07 up 17:45, 2 users, load average: 31.69, 20.89, 11.02
Tasks: 158 total, 1 running, 157 sleeping, 0 stopped, 0 zombie
Cpu(s): 88.3%us, 3.3%sy, 0.0%ni, 2.7%id, 0.0%wa, 0.0%hi, 0.1%si, 5.6%st
Mem: 7132040k total, 7042560k used, 89480k free, 178336k buffers
Swap: 0k total, 0k used, 0k free, 1078180k cached

  PID USER  PR NI  VIRT  RES SHR S %CPU %MEM   TIME+ COMMAND
28055 root  20  0 5689m 5.3g 16m S  789 77.7 21:45.30 java
  221 root  20  0     0    0   0 S    0  0.0  0:28.83 kjournald
28314 root  20  0 19276 1352 968 R    0  0.0  0:00.06 top
    1 root  20  0 23860 1364 656 S    0  0.0  0:00.83 init

Regarding the queries: for the range terms, I need to gracefully degrade the score of results that do not match. For example, when searching on an age field with a range of 25 to 31, users in that range should get the highest score, but I also need to include users aged 32 in the results with a lower score, and 35 with an even lower score. Please let me know if there is a way to do that with filters.
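One way to approximate that graceful degradation with filters, building on the constant_score suggestion earlier in the thread, is a bool of should clauses with decreasing boosts; a sketch, with the tiers and boost values chosen purely for illustration:

```json
{
  "bool": {
    "should": [
      { "constant_score": { "filter": { "range": { "age": { "from": 25, "to": 31 } } }, "boost": 3.0 } },
      { "constant_score": { "filter": { "range": { "age": { "from": 32, "to": 33 } } }, "boost": 2.0 } },
      { "constant_score": { "filter": { "range": { "age": { "from": 34, "to": 35 } } }, "boost": 1.0 } }
    ]
  }
}
```

Each tier's filter stays cacheable, while the stepped boosts produce the "further from the range, lower the score" effect without a custom score script.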

I'm in the process of expanding the cluster to 4 m1.large instances,
with a 2 shards / 2 replicas configuration.
Please let me know what you think about that.

Thanks in advance
Regards,

Ariel

On Jul 6, 1:07 pm, Shay Banon <shay.ba...@elasticsearch.com> wrote:

On EC2, I would suggest going with m1.xlarge, as more RAM means more file system cache. Also, having more shards on 2 nodes can slow down search, as each request needs to execute across all of them; maybe you can do with 2-4 shards (with 1 replica). Increasing the number of replicas will not help, since you only run on 2 nodes (and you can always increase it dynamically later if you add more nodes).

Also, in terms of queries, maybe some of your range queries, especially the numeric ones, can benefit from being used as filters instead of queries (so they can be cached).
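For the hard (non-scored) criteria, that means moving them from the query part into a filter; a minimal sketch in the query DSL of this era (field and values illustrative):

```json
{
  "filtered": {
    "query": { "match_all": {} },
    "filter": { "range": { "age": { "from": 25, "to": 31 } } }
  }
}
```

Filters skip scoring entirely and their bitsets can be cached and reused across requests.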

How much memory are you allocating to ES?

Master nodes play no part in searching, so the fact that one node uses more resources is not because it's the master node.

Can you tell where the load is coming from? Is it CPU, IO?

On Wednesday, July 6, 2011 at 7:33 AM, jp.lora...@cfyar.com wrote:

How much min and max RAM for ES -- you only mention the image's RAM. Is there anything else running on the box?

Also, a high shard/replica ratio like you have increases indexing (esp. bulk) performance, but the inverse ratio boosts search speed, so increasing the number of replicas will speed up the system (there have to be more replicas than shards, though, for optimal results).

I would try smaller machines -- fewer cores, less RAM: 2 shards, 8 replicas, 8 nodes total. Shay even suggests a 1/10 ratio in the README on GitHub.

Hope it helps,
JP

-------- Original Message --------
Subject: EC2 Perfomance problems, advice needed
From: Ariel Amato <ariel.amato...@gmail.com>
Date: Tue, July 05, 2011 11:32 pm
To: users <us...@elasticsearch.com>

Hi,
I have been trying to use ES

...

read more »


(Ariel Amato) #17

Hi again,
After reviewing the thread dumps I could see that only the master node's threads were busy, all with similar stack traces:

"elasticsearch[search]-pool-3-thread-10" daemon prio=10 tid=0x00007f4698300800 nid=0x4d7 runnable [0x00007f4695ddc000]
   java.lang.Thread.State: RUNNABLE
        at org.apache.lucene.search.BooleanScorer2.docID(BooleanScorer2.java:298)
        at org.apache.lucene.search.FilteredQuery$1$1.nextDoc(FilteredQuery.java:147)
        at org.apache.lucene.search.Scorer.score(Scorer.java:89)

and the threads of the other 2 nodes had this stack trace:

"elasticsearch[search]-pool-3-thread-2" daemon prio=10 tid=0x0000000041d33000 nid=0x47b waiting on condition [0x00007f0d689bf000]
   java.lang.Thread.State: WAITING (parking)
        at sun.misc.Unsafe.park(Native Method)
        - parking to wait for <0x00000006c61a6b08> (a org.elasticsearch.common.util.concurrent.jsr166y.LinkedTransferQueue)
        at java.util.concurrent.locks.LockSupport.park(LockSupport.java:158)
        at org.elasticsearch.common.util.concurrent.jsr166y.LinkedTransferQueue.awaitMatch(LinkedTransferQueue.java:689)

After that, I downgraded to ES 0.15.2, with better results.

Load across the nodes was similar and the requests per second increased to ~45.
Sadly, system load on all nodes stayed around 8 during the tests, which worries me.
Is there an explanation for this?

Best regards,

Ariel Amato

On Jul 7, 3:24 pm, Ariel Amato ariel.amato...@gmail.com wrote:

Adding thread dumps for the master and slave nodes:

Master: https://gist.github.com/1070165
Slave1: https://gist.github.com/1070167
Slave2: https://gist.github.com/1070168

Regards,
Ariel

On Jul 7, 2:24 pm, Ariel Amato ariel.amato...@gmail.com wrote:

Another thing that really worries me is the difference in system load between the nodes: when I'm running the tests, it's always the master node that has a spike in system load, even though the requests are sent to all of the nodes.

Regards,
Ariel

On Jul 7, 2:07 pm, Ariel Amato ariel.amato...@gmail.com wrote:

Hi again,
Here's a sample query with just the age search, like you suggested: https://gist.github.com/1069984.
Right now I'm running a 3-server cluster; that's why I tested with 3 replicas.
I've put in the thread pool configuration, but the same problems still happen.

Thanks for your assistance, and any extra help would be appreciated!

Regards,
Ariel

On Jul 7, 11:32 am, Shay Banon shay.ba...@elasticsearch.com wrote:

Hey,

What I would like to see are those queries that you execute, and possibly the script.

A note on shards and replicas: you can increase the number of replicas to a high value, but as long as you don't have enough nodes to allocate them, it does not matter. For example, if you have 3 shards, 1 replica, and 2 nodes, then increasing the number of replicas would have no effect unless you bring more nodes into the picture.

Another option you can play with is the search thread pool. Can you try configuring it as a fixed-size thread pool? The configuration is:

threadpool.search.type: fixed
threadpool.search.size: 10

-shay.banon

On Thursday, July 7, 2011 at 2:27 AM, Ariel Amato wrote:

Hi again,

I created the cluster you suggested, but with no luck. Same results as before: an extremely high load on one of the nodes, almost nothing on the other, and a lot of timeouts.
After that I added a new node to the cluster, with a configuration like this:

  • 3 shards / 1 replica, at first, and then
  • 3 shards / 3 replicas

Results were similar. Then I removed almost every term from the query (leaving just 2 terms, which are simple integer-valued properties) and ran the stress test again, with a big improvement in requests per second and just a few timeouts; but there were still timeouts.
But again, system load skyrocketed; here are some captures from the servers:

Master node:
top: https://gist.github.com/1068559
vmstat: https://gist.github.com/1068564

Slave node 1:
top: https://gist.github.com/1068566
vmstat: https://gist.github.com/1068571

Slave node 2:
top: https://gist.github.com/1068576

Total GC time (ParNew + CMS) on all nodes does not exceed 300ms.

We're using the Sun JVM (Java HotSpot(TM) 64-Bit Server VM, build 19.1-b02, Java 1.6.0_24) on all nodes, and the OS is Ubuntu 10.10.

Would a thread dump of the nodes help?
Right now I'm using dynamic mapping, except for a geo location point; would it help to create the mappings before indexing? At the moment I'm not seeing any problems at indexing time.
I'm actually running out of questions to ask. Tomorrow I'll work a little on replicating the queries with your suggestion, but even without the search query the system load is way too high.
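On the mapping question above: an explicit mapping can be supplied up front, at index-creation time, instead of relying on dynamic mapping. A sketch of such a body (the type name and field names are illustrative, not taken from the actual index):

```json
{
  "mappings": {
    "user": {
      "properties": {
        "location": { "type": "geo_point" },
        "age":      { "type": "integer" }
      }
    }
  }
}
```

Pre-declaring the mapping mainly guards against a first document coercing a field to the wrong type; it is unlikely to change search CPU usage by itself.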

Thank you in advance for any help you can give me on this.

Best regards
Ariel


...

read more »


(Shay Banon) #18

Thread dumps are point-in-time in nature, but I think I found a reason why there isn't a proper distribution of search requests across a shard and its replica (in a 1-replica scenario). I will push a fix, but for now, can you simply set

action.search.optimize_single_shard: false

in the config file and see?

On Friday, July 8, 2011 at 12:46 AM, Ariel Amato wrote:


...

read more »


(Ariel Amato) #19

Hi Shay,
I tested what you suggested, but I'm getting the same results...

Regards,
Ariel Amato

On Jul 7, 10:56 pm, Shay Banon shay.ba...@elasticsearch.com wrote:


...

read more »


(system) #20