Performance under concurrent queries

Hugh_Miles · November 25, 2011, 12:45pm

Hi,

I've been testing concurrent search queries using JMeter.

I have an index with two types: commenttab contains around 725k
documents, hierarchy contains just 2600 documents. I run faceted and
straight queries. The faceted queries run against commenttab and use
the terms_stats facet to return mean values of question scores for
different values of a key field. e.g.,

curl -XGET localhost:9200/summary/commenttab/_search -d
{
"size": 0,
"query": {
"matchAll": {}
},
"facets": {
"week-sat": {
"terms_stats": {
"size": 0,
"key_field": "week",
"value_script":
"(doc['question2'].doubleValue + doc['question3'].doubleValue)/2"
}
},
"week-nps": {
"terms_stats": {
"size": 0,
"key_field": "week",
"value_field": "question1"
}
}
}
}

The straight query just fetches a page of documents from commenttab,
but does include a sort. e.g.

curl -XGET localhost:9200/summary/commenttab/_search -d'
{
"from": 0,
"size": 25,
"sort": [
{
"message_time": "desc"
}
],
"fields": [
"message_time",
"message_text",
"store",
"rdm",
"rod",
"question1",
"question2",
"question3",
"cat1",
"cat2"
],
"query": {
"matchAll": { }
}
}

I have two nodes. The index is split into 5 shards and there is 1
replica.

My problem is that when I have more than about 10 threads running in
JMeter, the response time for each of these queries shoots up from
less than 100ms to anything up to greater than 1.5 minutes (1000
threads).

The master node has 4 cores and 8GB. I set ES_MIN_MEM/ES_MAX_MEM to
4GB. The slave node has only 1 core and 1GB. I ran top on the two node
servers. The master is running with about 50% CPU assigned to ES. The
slave is running with 99%+.

For comparison, I ran an unsorted query against my commenttab and
hierarchy datasets. For 1000 threads, the response time on the small
hierarchy dataset stayed at less than 100ms; the response time on the
large commenttab dataset was 11s and the slave CPU went up to 99%.

curl -XGET localhost:9200/summary/commenttab/_search -d'
{
"size": 20,
"query": {
"matchAll": {}
}
}

I'm planning to try the following but do you have any other pointers
as to what I should be looking at?

increasing the number of nodes by adding another 4 core server
increasing the number of replicas
run optimize on the index

Thanks for a great product!

Hugh

Karussell1 · November 25, 2011, 11:48pm

hmmh, Elasticsearch does not make a difference (as far as I know) of
the machines. So it will utilize the slave too much and my guess is,
that the smaller slave will go out of RAM (facetting and sorting are
no lightweight operations ;)) or just one cpu is not enough for 5
shards.

Peter.

On 25 Nov., 13:45, Hugh Miles huge.mi...@googlemail.com wrote:

Hi,

I've been testing concurrent search queries using JMeter.

I have an index with two types: commenttab contains around 725k
documents, hierarchy contains just 2600 documents. I run faceted and
straight queries. The faceted queries run against commenttab and use
the terms_stats facet to return mean values of question scores for
different values of a key field. e.g.,

curl -XGET localhost:9200/summary/commenttab/_search -d
{
"size": 0,
"query": {
"matchAll": {}
},
"facets": {
"week-sat": {
"terms_stats": {
"size": 0,
"key_field": "week",
"value_script":
"(doc['question2'].doubleValue + doc['question3'].doubleValue)/2"
}
},
"week-nps": {
"terms_stats": {
"size": 0,
"key_field": "week",
"value_field": "question1"
}
}
}
}

The straight query just fetches a page of documents from commenttab,
but does include a sort. e.g.

curl -XGET localhost:9200/summary/commenttab/_search -d'
{
"from": 0,
"size": 25,
"sort": [
{
"message_time": "desc"
}
],
"fields": [
"message_time",
"message_text",
"store",
"rdm",
"rod",
"question1",
"question2",
"question3",
"cat1",
"cat2"
],
"query": {
"matchAll": { }
}
}

I have two nodes. The index is split into 5 shards and there is 1
replica.

My problem is that when I have more than about 10 threads running in
JMeter, the response time for each of these queries shoots up from
less than 100ms to anything up to greater than 1.5 minutes (1000
threads).

The master node has 4 cores and 8GB. I set ES_MIN_MEM/ES_MAX_MEM to
4GB. The slave node has only 1 core and 1GB. I ran top on the two node
servers. The master is running with about 50% CPU assigned to ES. The
slave is running with 99%+.

For comparison, I ran an unsorted query against my commenttab and
hierarchy datasets. For 1000 threads, the response time on the small
hierarchy dataset stayed at less than 100ms; the response time on the
large commenttab dataset was 11s and the slave CPU went up to 99%.

curl -XGET localhost:9200/summary/commenttab/_search -d'
{
"size": 20,
"query": {
"matchAll": {}
}

}

I'm planning to try the following but do you have any other pointers
as to what I should be looking at?

increasing the number of nodes by adding another 4 core server

increasing the number of replicas

run optimize on the index

Thanks for a great product!

Hugh

Eran_Kutner_2 · November 27, 2011, 7:04am

Are you sure the delay came from ES and not from Jmeter?
You can't just add threads and expect performance to improve, in fact, the
better the repose time of your test subject the less threads you need, you
need to add threads just to the point where your Jmeter machine CPU, starts
to become loaded but before thread context switches start skewing your test
results. There isn't a right number I can give you because it depends on
the combinations of machines and applications you are testing but we've
found that for a reasonably performing application anything beyond 80
threads for a Jmeter machine starts increasing latency on the Jmeter side.
Adding more Jmeter machines is the right way to scale the test without
increasing the number of threads on each machine too much.

kimchy · November 28, 2011, 9:16am

Both answers mentioned here are very important, just want to second it. The
first problem is that the "slave" machine is considerably weaker compared
to the "master" machine. In elasticsearch, there isn't really a concept of
a "slave" node, nodes are containers that have shards allocated on them.
So, the "slave" machine is simply killing the performance. Second, whenever
doing multi threaded test, check the clients as well as ES.

On Sun, Nov 27, 2011 at 9:04 AM, Eran Kutner eran@gigya-inc.com wrote:

Are you sure the delay came from ES and not from Jmeter?
You can't just add threads and expect performance to improve, in fact, the
better the repose time of your test subject the less threads you need, you
need to add threads just to the point where your Jmeter machine CPU, starts
to become loaded but before thread context switches start skewing your test
results. There isn't a right number I can give you because it depends on
the combinations of machines and applications you are testing but we've
found that for a reasonably performing application anything beyond 80
threads for a Jmeter machine starts increasing latency on the Jmeter side.
Adding more Jmeter machines is the right way to scale the test without
increasing the number of threads on each machine too much.

Hugh_Miles · November 28, 2011, 11:32am

Thanks All,

As far as JMeter goes, I'm pretty sure it's not significant. I ran
1000 threads getting a file from an Apache server and was seeing
response times of 43ms.

The weak slave did slow the whole cluster down. I ran 2 node and 3
node clusters with all the nodes at the same spec (4G heap and 4
cores). I tested with 10, 20 and 30 threads and got average response
times of approx 200, 300 and 800ms. The effect of adding the extra
node (with its own replica) was to reduce the average CPU load on each
node by about a third.

That suggests that I'm going to need a node for every 10 users (OK, 10
concurrent requests) which surprises me. How does that match with
other people's experience?

I am currently looking at the makeup and content of my indexes. For my
query that aggregates over the whole data set with a terms_stats
facet, I'm building an index which just has the projection of the data
needed for that query. My initial index contained complete documents.
I am also experimenting with the number of shard; I'm building and
index with 10 shards and another with only 1 shard.

Thanks again for your help,

Hugh

On Nov 28, 9:16 am, Shay Banon kim...@gmail.com wrote:

Both answers mentioned here are very important, just want to second it. The
first problem is that the "slave" machine is considerably weaker compared
to the "master" machine. In elasticsearch, there isn't really a concept of
a "slave" node, nodes are containers that have shards allocated on them.
So, the "slave" machine is simply killing the performance. Second, whenever
doing multi threaded test, check the clients as well as ES.

On Sun, Nov 27, 2011 at 9:04 AM, Eran Kutner e...@gigya-inc.com wrote:

Are you sure the delay came from ES and not from Jmeter?
You can't just add threads and expect performance to improve, in fact, the
better the repose time of your test subject the less threads you need, you
need to add threads just to the point where your Jmeter machine CPU, starts
to become loaded but before thread context switches start skewing your test
results. There isn't a right number I can give you because it depends on
the combinations of machines and applications you are testing but we've
found that for a reasonably performing application anything beyond 80
threads for a Jmeter machine starts increasing latency on the Jmeter side.
Adding more Jmeter machines is the right way to scale the test without
increasing the number of threads on each machine too much.

kimchy · November 28, 2011, 11:55am

It really depends on the search requests that you execute, how much data
you execute it on, against how many shards / replicas. One thing that you
can play with is the concurrent search operations (shard level) that can
execute on a node. Check the thread pool setting for search, it might make
sense to move to a fixed thread pool with 10-20 threads. See more here:
Elasticsearch Platform — Find real-time answers at scale | Elastic.

On Mon, Nov 28, 2011 at 1:32 PM, Hugh Miles huge.miles@googlemail.comwrote:

Thanks All,

As far as JMeter goes, I'm pretty sure it's not significant. I ran
1000 threads getting a file from an Apache server and was seeing
response times of 43ms.

The weak slave did slow the whole cluster down. I ran 2 node and 3
node clusters with all the nodes at the same spec (4G heap and 4
cores). I tested with 10, 20 and 30 threads and got average response
times of approx 200, 300 and 800ms. The effect of adding the extra
node (with its own replica) was to reduce the average CPU load on each
node by about a third.

That suggests that I'm going to need a node for every 10 users (OK, 10
concurrent requests) which surprises me. How does that match with
other people's experience?

I am currently looking at the makeup and content of my indexes. For my
query that aggregates over the whole data set with a terms_stats
facet, I'm building an index which just has the projection of the data
needed for that query. My initial index contained complete documents.
I am also experimenting with the number of shard; I'm building and
index with 10 shards and another with only 1 shard.

Thanks again for your help,

Hugh

On Nov 28, 9:16 am, Shay Banon kim...@gmail.com wrote:

Both answers mentioned here are very important, just want to second it.
The
first problem is that the "slave" machine is considerably weaker compared
to the "master" machine. In elasticsearch, there isn't really a concept
of
a "slave" node, nodes are containers that have shards allocated on them.
So, the "slave" machine is simply killing the performance. Second,
whenever
doing multi threaded test, check the clients as well as ES.

On Sun, Nov 27, 2011 at 9:04 AM, Eran Kutner e...@gigya-inc.com wrote:

Are you sure the delay came from ES and not from Jmeter?
You can't just add threads and expect performance to improve, in fact,
the
better the repose time of your test subject the less threads you need,
you
need to add threads just to the point where your Jmeter machine CPU,
starts
to become loaded but before thread context switches start skewing your
test
results. There isn't a right number I can give you because it depends
on
the combinations of machines and applications you are testing but we've
found that for a reasonably performing application anything beyond 80
threads for a Jmeter machine starts increasing latency on the Jmeter
side.
Adding more Jmeter machines is the right way to scale the test without
increasing the number of threads on each machine too much.

Hugh_Miles · November 30, 2011, 4:06pm

Thanks for the pointer.

I got much the best results using a blocking threadpool (with default
settings) for both the index and search threadpools. In my latest
test, I ran 1000 concurrent threads randomly split between page
queries repeating every 3s and terms_stats facet queries repeating
every 45s against a two node cluster (4-cores/4GB each). The average
response times were between 50ms and 250ms.

On Nov 28, 11:55 am, Shay Banon kim...@gmail.com wrote:

It really depends on the search requests that you execute, how much data
you execute it on, against how many shards / replicas. One thing that you
can play with is the concurrent search operations (shard level) that can
execute on a node. Check the thread pool setting for search, it might make
sense to move to a fixed thread pool with 10-20 threads. See more here:Elasticsearch Platform — Find real-time answers at scale | Elastic.

On Mon, Nov 28, 2011 at 1:32 PM, Hugh Miles huge.mi...@googlemail.comwrote:

Thanks All,

As far as JMeter goes, I'm pretty sure it's not significant. I ran
1000 threads getting a file from an Apache server and was seeing
response times of 43ms.

The weak slave did slow the whole cluster down. I ran 2 node and 3
node clusters with all the nodes at the same spec (4G heap and 4
cores). I tested with 10, 20 and 30 threads and got average response
times of approx 200, 300 and 800ms. The effect of adding the extra
node (with its own replica) was to reduce the average CPU load on each
node by about a third.

That suggests that I'm going to need a node for every 10 users (OK, 10
concurrent requests) which surprises me. How does that match with
other people's experience?

I am currently looking at the makeup and content of my indexes. For my
query that aggregates over the whole data set with a terms_stats
facet, I'm building an index which just has the projection of the data
needed for that query. My initial index contained complete documents.
I am also experimenting with the number of shard; I'm building and
index with 10 shards and another with only 1 shard.

Thanks again for your help,

Hugh

On Nov 28, 9:16 am, Shay Banon kim...@gmail.com wrote:

Both answers mentioned here are very important, just want to second it.
The
first problem is that the "slave" machine is considerably weaker compared
to the "master" machine. In elasticsearch, there isn't really a concept
of
a "slave" node, nodes are containers that have shards allocated on them.
So, the "slave" machine is simply killing the performance. Second,
whenever
doing multi threaded test, check the clients as well as ES.

On Sun, Nov 27, 2011 at 9:04 AM, Eran Kutner e...@gigya-inc.com wrote:

Are you sure the delay came from ES and not from Jmeter?
You can't just add threads and expect performance to improve, in fact,
the
better the repose time of your test subject the less threads you need,
you
need to add threads just to the point where your Jmeter machine CPU,
starts
to become loaded but before thread context switches start skewing your
test
results. There isn't a right number I can give you because it depends
on
the combinations of machines and applications you are testing but we've
found that for a reasonably performing application anything beyond 80
threads for a Jmeter machine starts increasing latency on the Jmeter
side.
Adding more Jmeter machines is the right way to scale the test without
increasing the number of threads on each machine too much.