Improving a slow running Match_All Query


(sairam-2) #1

Hello,

The queries that we run seem to be very CPU-intensive and cause the servers
to max out within a short amount of time. On debugging, it looks like
standard queries also take too long to respond.

We are currently running Elasticsearch 1.0.2 and have about 67.3G of data
in production. There are currently 5 shards running on 2 nodes (1 replica).
There is a total of 252 GB of RAM, with the heap size set to 109.9 GB.

https://lh5.googleusercontent.com/-4bLBVeQuIHY/U4YnR7KX__I/AAAAAAAAAAM/XA8B0kff_u8/s1600/Elasticsearch+Setup.png

The Indexing rate is high since we are migrating data:

From Bigdesk:
https://lh4.googleusercontent.com/-gwJ_8GUT5nc/U4ZplswvYYI/AAAAAAAAAA0/6UhKvCpCF6c/s1600/Indexing+Rate.png

The Refresh Activity report from ElasticHQ:

https://lh6.googleusercontent.com/-DHO2pXXflME/U4ZqLxawJKI/AAAAAAAAABA/7pqxXnKbLn0/s1600/ElasticHQ+-+Index+Activity.png

The Match All query takes a whopping 520-680ms to run:

{
  "query": {
    "match_all": {}
  }
}
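
For anyone reproducing the measurement, here is a minimal sketch of how the timings could be summarized, assuming the number being compared is the `took` field (server-side time in milliseconds) of the search response. The helper names and the sample values are hypothetical, and `size: 0` is an added assumption so that fetching hits does not dominate the measurement.

```python
import json


def match_all_body(size=0):
    """Build the match_all request body from the post.

    size=0 is an assumption (not in the original query): it skips
    fetching hits so only the query phase is timed.
    """
    return json.dumps({"query": {"match_all": {}}, "size": size})


def summarize_took(responses):
    """Return (min, max, mean) of the `took` field across search responses."""
    took = [r["took"] for r in responses]
    return min(took), max(took), sum(took) / len(took)


# Hypothetical responses, trimmed to the one field used here.
sample = [{"took": 520}, {"took": 680}, {"took": 600}]
print(match_all_body())
print(summarize_took(sample))
```

Comparing the summary across several consecutive runs shows whether the query gets faster once caches are warm or stays slow throughout.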

However, on a similar test environment (with 8G of data), the same query
takes about 80-120ms to execute, which feels more like the average.

https://lh5.googleusercontent.com/-LTtjBlbcHR0/U4YqUTlNt_I/AAAAAAAAAAc/P1h6okpdm4w/s1600/Elasticsearch+Test+Setup.png

What recommendations could improve this bottleneck? Will adding more nodes
help alleviate the issue, or will it make it worse?

Thanks,
Sairam

--
You received this message because you are subscribed to the Google Groups "elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email to elasticsearch+unsubscribe@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/elasticsearch/66c26a89-b3f8-4a8c-96dc-45babc1e012d%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.


(sairam-2) #2

Bump

On Wednesday, May 28, 2014 4:10:26 PM UTC-7, sai...@roblox.com wrote:



(Jörg Prante) #3

Does "match_all" always take that long, or does it get faster after the
first run?

Did you run an optimize with a maximum number of segments? What is your
segment count?
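
One way to answer the segment-count question is to tally segments per shard copy from the output of the indices segments API (`GET /<index>/_segments`). This is a rough sketch, assuming the response nests `indices` > index name > `shards` > shard id > list of shard copies, each with a `segments` map. The index name `myindex` and the sample data are hypothetical.

```python
def count_segments(segments_response):
    """Return {index: {shard_id: [segment count for each shard copy]}}."""
    counts = {}
    for index, index_data in segments_response.get("indices", {}).items():
        shards = {}
        for shard_id, copies in index_data.get("shards", {}).items():
            # Each entry in `copies` is one copy (primary or replica) of the shard.
            shards[shard_id] = [len(copy.get("segments", {})) for copy in copies]
        counts[index] = shards
    return counts


# Hypothetical, heavily trimmed _segments response.
sample = {"indices": {"myindex": {"shards": {
    "0": [{"segments": {"_0": {}, "_1": {}, "_2": {}}},
          {"segments": {"_0": {}, "_1": {}}}],
    "1": [{"segments": {"_0": {}}}],
}}}}
print(count_segments(sample))
```

A high count per shard while indexing is in progress is expected, since each refresh can produce a new segment until merges catch up.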

Jörg

On Fri, May 30, 2014 at 9:20 PM, sairam@roblox.com wrote:



(sairam-2) #4

Yes, match_all consistently takes that long. It hasn't improved after the
first few queries.

I did not run the optimize command since we were in the middle of indexing.
I can run it now with max_num_segments set to 1.
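
A minimal sketch of the optimize call described above, assuming the 1.x `_optimize` endpoint with its `max_num_segments` parameter. The host and the index name `myindex` are hypothetical, and the helper only builds the request, it does not send anything.

```python
from urllib.parse import urlencode


def optimize_request(host, index, max_num_segments=1):
    """Return (method, url) for an optimize call; nothing is sent."""
    query = urlencode({"max_num_segments": max_num_segments})
    return "POST", f"{host.rstrip('/')}/{index}/_optimize?{query}"


# Hypothetical host and index name.
print(optimize_request("http://localhost:9200", "myindex"))
```

Merging down to a single segment is I/O-heavy, so it is probably best left until the bulk migration has finished.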

On Friday, May 30, 2014 1:52:55 PM UTC-7, Jörg Prante wrote:



(system) #5