Performance under concurrent queries


(Hugh Miles) #1

Hi,

I've been testing concurrent search queries using JMeter.

I have an index with two types: commenttab contains around 725k
documents, hierarchy contains just 2600 documents. I run faceted and
straight queries. The faceted queries run against commenttab and use
the terms_stats facet to return mean values of question scores for
different values of a key field. e.g.,

curl -XGET localhost:9200/summary/commenttab/_search -d @- <<'EOF'
{
  "size": 0,
  "query": {
    "matchAll": {}
  },
  "facets": {
    "week-sat": {
      "terms_stats": {
        "size": 0,
        "key_field": "week",
        "value_script": "(doc['question2'].doubleValue + doc['question3'].doubleValue)/2"
      }
    },
    "week-nps": {
      "terms_stats": {
        "size": 0,
        "key_field": "week",
        "value_field": "question1"
      }
    }
  }
}
EOF

The straight query just fetches a page of documents from commenttab,
but does include a sort. e.g.

curl -XGET localhost:9200/summary/commenttab/_search -d '
{
  "from": 0,
  "size": 25,
  "sort": [
    {
      "message_time": "desc"
    }
  ],
  "fields": [
    "message_time",
    "message_text",
    "store",
    "rdm",
    "rod",
    "question1",
    "question2",
    "question3",
    "cat1",
    "cat2"
  ],
  "query": {
    "matchAll": { }
  }
}'

I have two nodes. The index is split into 5 shards and there is 1
replica.

My problem is that when I have more than about 10 threads running in
JMeter, the response time for each of these queries shoots up from
less than 100ms to as much as 1.5 minutes or more (with 1000
threads).

The master node has 4 cores and 8GB. I set ES_MIN_MEM/ES_MAX_MEM to
4GB. The slave node has only 1 core and 1GB. I ran top on the two node
servers. The master is running with about 50% CPU assigned to ES. The
slave is running with 99%+.

For comparison, I ran an unsorted query against my commenttab and
hierarchy datasets. For 1000 threads, the response time on the small
hierarchy dataset stayed at less than 100ms; the response time on the
large commenttab dataset was 11s and the slave CPU went up to 99%.

curl -XGET localhost:9200/summary/commenttab/_search -d '
{
  "size": 20,
  "query": {
    "matchAll": {}
  }
}'

I'm planning to try the following but do you have any other pointers
as to what I should be looking at?

  • increasing the number of nodes by adding another 4 core server
  • increasing the number of replicas
  • running optimize on the index

Thanks for a great product!

Hugh


(Karussell) #2

Hmm, Elasticsearch does not (as far as I know) distinguish between
the machines, so it will load the slave just as heavily as the master.
My guess is that the smaller slave runs out of RAM (faceting and
sorting are not lightweight operations ;)) or that one CPU is simply
not enough for 5 shards.
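
A rough back-of-the-envelope sketch of this point. It assumes that each
search is answered by one copy of every shard and that the two nodes
answer an equal share of those shard-level queries; both are assumptions
for illustration, not something measured in this thread. The function
names are made up for the example.

```python
def shard_copies_per_node(primaries: int, replicas: int, nodes: int) -> float:
    """Average number of shard copies per node, assuming an even spread."""
    return primaries * (1 + replicas) / nodes


def shard_queries_per_core(primaries: int, node_share: float, cores: int) -> float:
    """Shard-level queries that one search puts on each core of a node.

    node_share is the ASSUMED fraction of shard copies answered by that node.
    """
    return primaries * node_share / cores


# 5 shards with 1 replica on 2 nodes, as described in the first post.
print(shard_copies_per_node(primaries=5, replicas=1, nodes=2))
print(shard_queries_per_core(primaries=5, node_share=0.5, cores=1))  # 1-core node
print(shard_queries_per_core(primaries=5, node_share=0.5, cores=4))  # 4-core node
```

Under these assumptions the single-core machine carries four times as
much shard-level work per core as the 4-core machine.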

Peter.

In reply to Hugh Miles, 25 Nov., 13:45.


(Eran Kutner-2) #3

Are you sure the delay came from ES and not from JMeter?
You can't just add threads and expect performance to improve. In fact, the
better the response time of your test subject, the fewer threads you need.
Add threads only up to the point where the JMeter machine's CPU starts to
become loaded, but before thread context switches start skewing your test
results. There isn't a right number I can give you, because it depends on
the combination of machines and applications you are testing, but we've
found that for a reasonably performing application anything beyond 80
threads per JMeter machine starts increasing latency on the JMeter side.
Adding more JMeter machines is the right way to scale the test without
increasing the number of threads on each machine too much.
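
One way to check for client-side overhead, sketched in Python rather than
JMeter: run the same harness against a stand-in target whose response
time is fixed, and watch whether the measured latency grows as threads
are added. Since the target never slows down, any growth comes from the
client. The names `mean_latency` and `fake_request` and the 10 ms figure
are assumptions made up for this example.

```python
import time
from concurrent.futures import ThreadPoolExecutor
from statistics import mean


def timed_call(request) -> float:
    """Return the wall-clock duration of one call to `request`, in seconds."""
    start = time.perf_counter()
    request()
    return time.perf_counter() - start


def mean_latency(request, threads: int, calls_per_thread: int = 5) -> float:
    """Call `request` from `threads` workers; return mean latency in seconds."""
    def worker(_):
        return [timed_call(request) for _ in range(calls_per_thread)]

    with ThreadPoolExecutor(max_workers=threads) as pool:
        batches = list(pool.map(worker, range(threads)))
    return mean(t for batch in batches for t in batch)


def fake_request() -> None:
    """Hypothetical stand-in for a server with a fixed 10 ms response time."""
    time.sleep(0.01)


if __name__ == "__main__":
    for n in (1, 10, 40, 80):
        print(n, "threads:", round(mean_latency(fake_request, n) * 1000, 1), "ms")
```

To test a real server, `fake_request` would be replaced by a function
that issues the actual HTTP request.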


(Shay Banon) #4

Both answers mentioned here are very important; I just want to second
them. The first problem is that the "slave" machine is considerably weaker
than the "master" machine. In elasticsearch there isn't really a concept of
a "slave" node; nodes are containers that have shards allocated on them.
So the "slave" machine is simply killing the performance. Second, whenever
you run a multi-threaded test, check the clients as well as ES.

In reply to Eran Kutner, Sun, Nov 27, 2011 at 9:04 AM.


(Hugh Miles) #5

Thanks All,

As far as JMeter goes, I'm pretty sure it's not significant. I ran
1000 threads getting a file from an Apache server and was seeing
response times of 43ms.

The weak slave did slow the whole cluster down. I ran 2 node and 3
node clusters with all the nodes at the same spec (4G heap and 4
cores). I tested with 10, 20 and 30 threads and got average response
times of approx 200, 300 and 800ms. The effect of adding the extra
node (with its own replica) was to reduce the average CPU load on each
node by about a third.

That suggests that I'm going to need a node for every 10 users (OK, 10
concurrent requests) which surprises me. How does that match with
other people's experience?
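
One way to read these numbers, as a sketch only: if each JMeter thread
sends its next request as soon as the previous one returns (no think
time, which is an assumption; the post does not say), then throughput is
roughly the number of threads divided by the mean response time.

```python
def throughput(threads: int, mean_response_s: float) -> float:
    """Requests per second for a closed loop with no think time (assumed)."""
    return threads / mean_response_s


# Thread count -> approximate mean response time in seconds, from the post.
measurements = {10: 0.2, 20: 0.3, 30: 0.8}

for threads, latency in measurements.items():
    print(f"{threads} threads: {throughput(threads, latency):.1f} req/s")
```

Under that assumption the figures work out to about 50, 67 and 38
requests per second, so "10 threads" here would mean 10 clients firing
requests back to back, which is a much heavier load than 10 interactive
users.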

I am currently looking at the makeup and content of my indexes. For my
query that aggregates over the whole data set with a terms_stats
facet, I'm building an index which just has the projection of the data
needed for that query. My initial index contained complete documents.
I am also experimenting with the number of shards; I'm building an
index with 10 shards and another with only 1 shard.

Thanks again for your help,

Hugh

In reply to Shay Banon, Nov 28, 9:16 am.


(Shay Banon) #6

It really depends on the search requests that you execute, how much data
you execute it on, against how many shards / replicas. One thing that you
can play with is the concurrent search operations (shard level) that can
execute on a node. Check the thread pool setting for search; it might make
sense to move to a fixed thread pool with 10-20 threads. See more here:
http://www.elasticsearch.org/guide/reference/modules/threadpool.html.
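
The idea behind a fixed pool can be illustrated outside Elasticsearch.
The sketch below is plain Python, not Elasticsearch configuration (the
actual setting names are on the page linked above): however many jobs
are submitted, at most `pool_size` run at once and the rest wait in the
queue instead of all competing for the CPU. The helper name
`peak_concurrency` and the 5 ms job are made up for the example.

```python
import threading
import time
from concurrent.futures import ThreadPoolExecutor


def peak_concurrency(pool_size: int, tasks: int) -> int:
    """Submit `tasks` jobs to a fixed pool; return the most that ran at once."""
    lock = threading.Lock()
    running = 0
    peak = 0

    def job() -> None:
        nonlocal running, peak
        with lock:
            running += 1
            peak = max(peak, running)
        time.sleep(0.005)  # stand-in for one shard-level search
        with lock:
            running -= 1

    # Leaving the `with` block waits for every submitted job to finish.
    with ThreadPoolExecutor(max_workers=pool_size) as pool:
        for _ in range(tasks):
            pool.submit(job)
    return peak


print(peak_concurrency(pool_size=10, tasks=200))
```

The printed peak never exceeds the pool size, no matter how large
`tasks` is.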

In reply to Hugh Miles, Mon, Nov 28, 2011 at 1:32 PM.


(Hugh Miles) #7

Thanks for the pointer.

I got much the best results using a blocking threadpool (with default
settings) for both the index and search threadpools. In my latest
test, I ran 1000 concurrent threads randomly split between page
queries repeating every 3s and terms_stats facet queries repeating
every 45s against a two node cluster (4-cores/4GB each). The average
response times were between 50ms and 250ms.
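
For a sense of the load this test generates, here is a rough estimate.
It assumes the 1000 threads are split evenly between the two query types
and that each thread issues one request per period; the post only says
"randomly split", so the 500/500 figure is an assumption. The number of
requests in flight follows from Little's Law (arrival rate times mean
response time).

```python
def offered_rate(users: int, period_s: float) -> float:
    """Requests per second when each user sends one request every period_s."""
    return users / period_s


page_users = facet_users = 500  # ASSUMED even split of the 1000 threads

rate = offered_rate(page_users, 3) + offered_rate(facet_users, 45)
print(f"offered load: {rate:.1f} req/s")

# Little's Law: mean requests in flight = arrival rate * mean response time.
for latency_s in (0.05, 0.25):
    in_flight = rate * latency_s
    print(f"at {latency_s * 1000:.0f} ms: about {in_flight:.1f} requests in flight")
```

Under these assumptions the cluster sees roughly 178 requests per second
with somewhere between 9 and 44 requests in flight, far fewer than the
1000 threads suggest.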

In reply to Shay Banon, Nov 28, 11:55 am.

