Scaling out Elasticsearch Cluster to improve slow Empty Search query (520ms)


(sairam-2) #1

We currently run our Elasticsearch (v1.0.2) cluster on 3 nodes with 5
shards and 1 replica. The total index size is about 70GB
(~140GB with replication).

The empty search (/_search) query takes 500-600 ms to respond. Will adding
more nodes help in this case? The servers have 252GB of RAM with a
110GB heap.

The index uses the following analyzers: standard, lowercase, stop,
porter_stem. Will these degrade query performance?

--
You received this message because you are subscribed to the Google Groups "elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email to elasticsearch+unsubscribe@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/elasticsearch/9941cfd1-d211-4706-aa45-6c545c66baff%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.


(Nik Everett) #2

I imagine that depends on lots of stuff. Are you doing
elasticsearch:9200/_search or elasticsearch:9200/index/_search? The
former can take quite a while if you have lots of indexes and lots of
shards. If you can get away with not doing it, I would. The latter will
only take a long time if you have tons of shards; it should otherwise be
pretty quick.
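For reference, the two endpoints can be timed directly from the shell. This is just a sketch, assuming a node listening on localhost:9200 and an index literally named "index" (a placeholder):

```shell
# Cluster-wide empty search: fans out to every shard of every index.
curl -s -o /dev/null -w 'total: %{time_total}s\n' 'http://localhost:9200/_search'

# Same search scoped to one index: touches only that index's shards.
curl -s -o /dev/null -w 'total: %{time_total}s\n' 'http://localhost:9200/index/_search'
```

Dropping `-o /dev/null` shows the JSON response, whose `took` field reports Elasticsearch's own timing in milliseconds, excluding network and client overhead.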



(sairam-2) #3

I am currently running only 1 index with 5 shards, so both of those
queries yield the same response time. My main question is whether
scaling out is an option given the current replication scheme.

https://lh5.googleusercontent.com/-bz8iQd0KUaA/U5dMSGLNNFI/AAAAAAAAABg/tGJl0HOj4xo/s1600/Elasticsearch+Cluster.png



(Nik Everett) #4

Short answer: yes.
Long answer: 500ms is a long time for the empty query. I see 2ms from
Elasticsearch and 23ms from `time` in development. In production I see maybe
54ms from Elasticsearch and 70ms from `time`, across far, far more shards and
more data. When I do the same query across thousands of shards and a
couple of TB of data I get ~250ms. Production is 16 servers with 96GB of
RAM and 30GB heaps.

The analyzers really aren't going to hurt performance.

I'd have a look at the servers themselves: what kind of load are they
under? What is your indexing rate? That kind of thing.

Also, 30GB is normally the sweet spot for heap size, making ~64GB the
sweet spot for total RAM. A 110GB heap is pretty high, and I'd expect
stop-the-world garbage collection pauses to take a while there.

Nik
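Load, heap usage, and indexing counters can all be read from the nodes stats API. A sketch, assuming a node on localhost:9200 (endpoint paths as in the 1.x API):

```shell
# OS load and JVM heap/GC figures per node.
curl -s 'http://localhost:9200/_nodes/stats/os,jvm?pretty'

# Indexing and search counters per node; sample this twice some seconds
# apart and diff the totals to derive a writes-per-second rate.
curl -s 'http://localhost:9200/_nodes/stats/indices?pretty'
```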



(sairam-2) #5

Thanks for the clarification. The servers aren't under any (read) load yet.
There is a constant update of data in the background, roughly 60 index
writes per second. The refresh interval is set to 60s. Can this be a
performance bottleneck?

We can add more nodes to bring it up to 10 nodes (5 shards with 1
replica), but I doubt that will reduce the empty search query to 50ms.
Are there any other profiling tools out there to debug the response time?
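Two built-in starting points, sketched below (both assume a node on localhost:9200, and the index name "index" is a placeholder): the hot threads API shows what each node is actually busy doing, and the search slow log records queries that exceed a configurable threshold.

```shell
# Dump the busiest threads on every node.
curl -s 'http://localhost:9200/_nodes/hot_threads'

# Turn on the search slow log for one index, warning on any
# query phase slower than 100ms.
curl -s -XPUT 'http://localhost:9200/index/_settings' -d '{
  "index.search.slowlog.threshold.query.warn": "100ms"
}'
```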

On Tuesday, June 10, 2014 11:30:03 AM UTC-7, Nikolas Everett wrote:

Short answer: yes.
Long answer: 500ms is a long time for the empty query. I see 2ms from
elasticsearch and 23ms from time in development. In production I see maybe
54ms from elasticsearch and 70 from time across far far more shards and
more data. When I do the same query across thousands of shards and a
couple of TB of data I get ~250ms. Production is 16 servers with 96GB of
ram and 30GB heaps.

The analyzers really aren't going to hurt performance.

I'd have a look at your servers themselves: what kind of load are they
under? What is your indexing rate? that kind of thing.

Also, 30GB is normally the sweet spot for heap sizes, making ~64GB of
total ram the sweet spot for total ram. 110GB heap is pretty high and I'd
expect for new generation (pause the world) garbage collection to take a
while there.

Nik

On Tue, Jun 10, 2014 at 2:20 PM, <sai...@roblox.com <javascript:>> wrote:

I am currently running only 1 index with 5 shards. So the both of those
queries yield the same response time. My main question is to understand if
scaling out is an Option given the current replication scheme.

https://lh5.googleusercontent.com/-bz8iQd0KUaA/U5dMSGLNNFI/AAAAAAAAABg/tGJl0HOj4xo/s1600/Elasticsearch+Cluster.png

On Tuesday, June 10, 2014 11:15:26 AM UTC-7, Nikolas Everett wrote:

I imagine that depends on lots of stuff. Are you doing
elasticsearch:9200/_search or elasticsearch:9200/index/_search ? The
former can take quite a while if you have lots index and lots of shards.
If you can get away with not doing it, I would. The latter will only take
a long time if you have tons of shards. It should otherwise be pretty
quick.

On Tue, Jun 10, 2014 at 2:10 PM, sai...@roblox.com wrote:

We currently run our Elasticsearch (v1.0.2) cluster on 3 Nodes
with 5 Shards and 1 Replication Scheme. The total index size is
about 70GB (~140GB with replication).

The Empty Search (/_search) query takes 500-600 ms to respond. Will
adding in more Nodes help in this case? The Servers are have 252gb of
RAM and 110gb for Heap.

The Index uses the following analyzers - standard, lowercase, stop,
porter_stem. Will this degrade Query performance?

--
You received this message because you are subscribed to the Google
Groups "elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send
an email to elasticsearc...@googlegroups.com.

To view this discussion on the web visit https://groups.google.com/d/
msgid/elasticsearch/9941cfd1-d211-4706-aa45-6c545c66baff%
40googlegroups.com
https://groups.google.com/d/msgid/elasticsearch/9941cfd1-d211-4706-aa45-6c545c66baff%40googlegroups.com?utm_medium=email&utm_source=footer
.
For more options, visit https://groups.google.com/d/optout.

--
You received this message because you are subscribed to the Google Groups
"elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an
email to elasticsearc...@googlegroups.com <javascript:>.
To view this discussion on the web visit
https://groups.google.com/d/msgid/elasticsearch/d5291841-e791-40b3-93e6-d8fbe9921ac5%40googlegroups.com
https://groups.google.com/d/msgid/elasticsearch/d5291841-e791-40b3-93e6-d8fbe9921ac5%40googlegroups.com?utm_medium=email&utm_source=footer
.

For more options, visit https://groups.google.com/d/optout.

--
You received this message because you are subscribed to the Google Groups "elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email to elasticsearch+unsubscribe@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/elasticsearch/ecb24d7b-404e-482c-b70e-9b90d33fd18d%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.


(Mark Walkom) #6

You will likely see an improvement by distributing it to one shard per
machine, but that's hard to quantify without actually doing it.

Also, you may be doing yourself a disservice with such a large heap size,
as Nik mentioned. Over 32GB, Java pointers are not compressed, and you do
lose a bit of performance due to this.

Regards,
Mark Walkom

Infrastructure Engineer
Campaign Monitor
email: markw@campaignmonitor.com
web: www.campaignmonitor.com
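For what it's worth, a sketch of capping the heap below the compressed-oops cutoff, assuming stock packaging that reads ES_HEAP_SIZE (the exact file path varies by distribution and is only an example):

```shell
# e.g. /etc/default/elasticsearch or /etc/sysconfig/elasticsearch
ES_HEAP_SIZE=30g
```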



(sairam-2) #7

The heap size is being reduced to 30GB to ensure that's not the bottleneck.

The servers currently run SAS drives. Though SSDs are usually preferred for
Elasticsearch, can this cause such disparities in performance? ElasticHQ
reports very high refresh, search-fetch, and search-query rates.



(Zaki Agha) #8

Hi Mark,
With Java 7, aren't pointers compressed by default?

From the JVM documentation on -XX:+UseCompressedOops:

  Compressed oops is supported and enabled by default in Java SE 6u23
  and later. In Java SE 7, use of compressed oops is the default for
  64-bit JVM processes when -Xmx isn't specified and for values of
  -Xmx less than 32 gigabytes.
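Whether a given heap size still gets compressed oops can be checked directly, rather than inferred from the docs. A sketch, assuming a HotSpot JVM on the path:

```shell
# Prints the effective UseCompressedOops value at each heap size;
# expect true at 30g and false at 40g on a stock 64-bit HotSpot JVM.
java -Xmx30g -XX:+PrintFlagsFinal -version | grep UseCompressedOops
java -Xmx40g -XX:+PrintFlagsFinal -version | grep UseCompressedOops
```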



(Mark Walkom) #9

SSDs help, though there is likely some other issue here, so they're probably
not worth looking at at this time.

Have you checked hot threads or the slow query log?
Can you provide more specs on your hardware? What Java version are you
running?

Regards,
Mark Walkom

Infrastructure Engineer
Campaign Monitor
email: markw@campaignmonitor.com
web: www.campaignmonitor.com

On 11 June 2014 08:41, sairam@roblox.com wrote:

The Heap Size is being reduced to 30GB to ensure that's not the bottleneck.

The servers currently run SAS Drives. Though SSDs are usually preferred
for Elasticsearch, can this cause such disparities in
performance? ElasticHQ reports very high Refresh Rates, Search-Fetch and
Search-Query rates.

On Tuesday, June 10, 2014 3:17:49 PM UTC-7, Mark Walkom wrote:

You will likely see an increase by distributing it to one shard per
machine, but that's hard to quantify without actually doing it.

Also, you may be doing yourself a disservice with such a large heap size
as Nik mentioned. Over 32GB, Java pointers are not compressed and you do
lose a bit of performance due to this.

Regards,
Mark Walkom

Infrastructure Engineer
Campaign Monitor
email: ma...@campaignmonitor.com
web: www.campaignmonitor.com

On 11 June 2014 07:20, sai...@roblox.com wrote:

Thanks for the clarification. The servers aren't under any (read) load
yet. There is constant update of data in the background - Roughly about 60
Index Writes per second. The refresh interval is set to 60s. Can this be a
performance bottleneck?

We can add in more nodes to bring it up to 10 Nodes - 5 Shards with 1
Replica. But I doubt if that will reduce the Empty Search Query to 50ms.
Are there any other profiling tools out there to debug the response time?

On Tuesday, June 10, 2014 11:30:03 AM UTC-7, Nikolas Everett wrote:

Short answer: yes.
Long answer: 500ms is a long time for the empty query. I see 2ms from
elasticsearch and 23ms from `time` in development. In production I see maybe
54ms from elasticsearch and 70ms from `time` across far, far more shards and
more data. When I do the same query across thousands of shards and a
couple of TB of data I get ~250ms. Production is 16 servers with 96GB of
RAM and 30GB heaps.

The analyzers really aren't going to hurt performance.

I'd have a look at your servers themselves: what kind of load are they
under? What is your indexing rate? that kind of thing.

Also, 30GB is normally the sweet spot for heap sizes, making ~64GB of
total ram the sweet spot for total ram. 110GB heap is pretty high and I'd
expect for new generation (pause the world) garbage collection to take a
while there.

Nik

On Tue, Jun 10, 2014 at 2:20 PM, sai...@roblox.com wrote:

I am currently running only 1 index with 5 shards, so both of
those queries yield the same response time. My main question is to
understand if scaling out is an option given the current replication scheme.

https://lh5.googleusercontent.com/-bz8iQd0KUaA/U5dMSGLNNFI/AAAAAAAAABg/tGJl0HOj4xo/s1600/Elasticsearch+Cluster.png

On Tuesday, June 10, 2014 11:15:26 AM UTC-7, Nikolas Everett wrote:

I imagine that depends on lots of stuff. Are you doing
elasticsearch:9200/_search or elasticsearch:9200/index/_search ?
The former can take quite a while if you have lots of indexes and lots of
shards. If you can get away with not doing it, I would. The latter will
only take a long time if you have tons of shards. It should otherwise be
pretty quick.

On Tue, Jun 10, 2014 at 2:10 PM, sai...@roblox.com wrote:

We currently run our Elasticsearch (v1.0.2) cluster on 3 Nodes
with 5 Shards and 1 Replication Scheme. The total index size is
about 70GB (~140GB with replication).

The Empty Search (/_search) query takes 500-600 ms to respond. Will
adding in more Nodes help in this case? The servers have 252GB
of RAM and a 110GB heap.

The Index uses the following analyzers - standard, lowercase, stop,
porter_stem. Will this degrade Query performance?

--
You received this message because you are subscribed to the Google
Groups "elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it,
send an email to elasticsearc...@googlegroups.com.
To view this discussion on the web visit
https://groups.google.com/d/msgid/elasticsearch/9941cfd1-d211-4706-aa45-6c545c66baff%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.




(sairam-2) #10

Machine Specs:
    Processor: Intel(R) Xeon(R) CPU E5-2630 0 @ 2.30GHz
    Number of CPU cores: 24
    Number of physical CPUs: 2
    Installed RAM: ~256 GB total (128 GB + 128 GB + 16 MB)
    Drives: Two 278GB SAS drives configured in RAID 0
OS:
    Arch: 64bit (x86_64)
    OS type: Linux
    Kernel: 2.6.32-431.5.1.el6.x86_64
    OS version: Red Hat Enterprise Linux Server release 6.5 (Santiago)
Java version: Java 1.7.0_51 (Java 7u51 x64 for Linux)

Since we don't have any read queries executed against ES, the hot threads
are under 1%. I have captured 3 different scenarios: a basic match_all,
a query with filters/sorts, and idle time.

  • Search Query (Basic Match_All with no filters/sorts)

28.2% (141.2ms out of 500ms) cpu usage by thread 'elasticsearch[Crime Master][search][T#46]'
 10/10 snapshots sharing following 10 elements
   sun.misc.Unsafe.park(Native Method)
   java.util.concurrent.locks.LockSupport.park(Unknown Source)
   org.elasticsearch.common.util.concurrent.jsr166y.LinkedTransferQueue.awaitMatch(LinkedTransferQueue.java:706)
   org.elasticsearch.common.util.concurrent.jsr166y.LinkedTransferQueue.xfer(LinkedTransferQueue.java:615)
   org.elasticsearch.common.util.concurrent.jsr166y.LinkedTransferQueue.take(LinkedTransferQueue.java:1109)
   org.elasticsearch.common.util.concurrent.SizeBlockingQueue.take(SizeBlockingQueue.java:162)
   java.util.concurrent.ThreadPoolExecutor.getTask(Unknown Source)
   java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source)
   java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
   java.lang.Thread.run(Unknown Source)

23.4% (116.9ms out of 500ms) cpu usage by thread 'elasticsearch[Crime Master][search][T#30]'
 10/10 snapshots sharing following 10 elements
   sun.misc.Unsafe.park(Native Method)
   java.util.concurrent.locks.LockSupport.park(Unknown Source)
   org.elasticsearch.common.util.concurrent.jsr166y.LinkedTransferQueue.awaitMatch(LinkedTransferQueue.java:706)
   org.elasticsearch.common.util.concurrent.jsr166y.LinkedTransferQueue.xfer(LinkedTransferQueue.java:615)
   org.elasticsearch.common.util.concurrent.jsr166y.LinkedTransferQueue.take(LinkedTransferQueue.java:1109)
   org.elasticsearch.common.util.concurrent.SizeBlockingQueue.take(SizeBlockingQueue.java:162)
   java.util.concurrent.ThreadPoolExecutor.getTask(Unknown Source)
   java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source)
   java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
   java.lang.Thread.run(Unknown Source)

1.2% (6ms out of 500ms) cpu usage by thread 'elasticsearch[Crime Master][management][T#3]'
 10/10 snapshots sharing following 9 elements
   sun.misc.Unsafe.park(Native Method)
   java.util.concurrent.locks.LockSupport.parkNanos(Unknown Source)
   org.elasticsearch.common.util.concurrent.jsr166y.LinkedTransferQueue.awaitMatch(LinkedTransferQueue.java:702)
   org.elasticsearch.common.util.concurrent.jsr166y.LinkedTransferQueue.xfer(LinkedTransferQueue.java:615)
   org.elasticsearch.common.util.concurrent.jsr166y.LinkedTransferQueue.poll(LinkedTransferQueue.java:1117)
   java.util.concurrent.ThreadPoolExecutor.getTask(Unknown Source)
   java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source)
   java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
   java.lang.Thread.run(Unknown Source)
  • Query with filters and Sorts

80.4% (402ms out of 500ms) cpu usage by thread 'elasticsearch[Crime Master][search][T#57]'
 4/10 snapshots sharing following 20 elements
   org.apache.lucene.search.FilteredDocIdSetIterator.nextDoc(FilteredDocIdSetIterator.java:60)
   org.apache.lucene.search.ConstantScoreQuery$ConstantScorer.nextDoc(ConstantScoreQuery.java:196)
   org.elasticsearch.common.lucene.search.function.FunctionScoreQuery$CustomBoostFactorScorer.nextDoc(FunctionScoreQuery.java:169)
   org.apache.lucene.search.Scorer.score(Scorer.java:64)
   org.apache.lucene.search.IndexSearcher.search(IndexSearcher.java:621)
   org.elasticsearch.search.internal.ContextIndexSearcher.search(ContextIndexSearcher.java:173)
   org.apache.lucene.search.IndexSearcher.search(IndexSearcher.java:491)
   org.apache.lucene.search.IndexSearcher.search(IndexSearcher.java:448)
   org.apache.lucene.search.IndexSearcher.search(IndexSearcher.java:281)
   org.apache.lucene.search.IndexSearcher.search(IndexSearcher.java:269)
   org.elasticsearch.search.query.QueryPhase.execute(QueryPhase.java:122)
   org.elasticsearch.search.SearchService.executeQueryPhase(SearchService.java:249)
   org.elasticsearch.search.action.SearchServiceTransportAction.sendExecuteQuery(SearchServiceTransportAction.java:202)
   org.elasticsearch.action.search.type.TransportSearchQueryThenFetchAction$AsyncAction.sendExecuteFirstPhase(TransportSearchQueryThenFetchAction.java:80)
   org.elasticsearch.action.search.type.TransportSearchTypeAction$BaseAsyncAction.performFirstPhase(TransportSearchTypeAction.java:216)
   org.elasticsearch.action.search.type.TransportSearchTypeAction$BaseAsyncAction.performFirstPhase(TransportSearchTypeAction.java:203)
   org.elasticsearch.action.search.type.TransportSearchTypeAction$BaseAsyncAction$2.run(TransportSearchTypeAction.java:186)
   java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source)
   java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
   java.lang.Thread.run(Unknown Source)

 6/10 snapshots sharing following 18 elements
   org.elasticsearch.common.lucene.search.FilteredCollector.collect(FilteredCollector.java:60)
   org.apache.lucene.search.Scorer.score(Scorer.java:65)
   org.apache.lucene.search.IndexSearcher.search(IndexSearcher.java:621)
   org.elasticsearch.search.internal.ContextIndexSearcher.search(ContextIndexSearcher.java:173)
   org.apache.lucene.search.IndexSearcher.search(IndexSearcher.java:491)
   org.apache.lucene.search.IndexSearcher.search(IndexSearcher.java:448)
   org.apache.lucene.search.IndexSearcher.search(IndexSearcher.java:281)
   org.apache.lucene.search.IndexSearcher.search(IndexSearcher.java:269)
   org.elasticsearch.search.query.QueryPhase.execute(QueryPhase.java:122)
   org.elasticsearch.search.SearchService.executeQueryPhase(SearchService.java:249)
   org.elasticsearch.search.action.SearchServiceTransportAction.sendExecuteQuery(SearchServiceTransportAction.java:202)
   org.elasticsearch.action.search.type.TransportSearchQueryThenFetchAction$AsyncAction.sendExecuteFirstPhase(TransportSearchQueryThenFetchAction.java:80)
   org.elasticsearch.action.search.type.TransportSearchTypeAction$BaseAsyncAction.performFirstPhase(TransportSearchTypeAction.java:216)
   org.elasticsearch.action.search.type.TransportSearchTypeAction$BaseAsyncAction.performFirstPhase(TransportSearchTypeAction.java:203)
   org.elasticsearch.action.search.type.TransportSearchTypeAction$BaseAsyncAction$2.run(TransportSearchTypeAction.java:186)
   java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source)
   java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
   java.lang.Thread.run(Unknown Source)

0.6% (2.7ms out of 500ms) cpu usage by thread 'elasticsearch[Crime Master][http_server_worker][T#45]{New I/O worker #143}'
 10/10 snapshots sharing following 15 elements
   sun.nio.ch.EPollArrayWrapper.epollWait(Native Method)
   sun.nio.ch.EPollArrayWrapper.poll(Unknown Source)
   sun.nio.ch.EPollSelectorImpl.doSelect(Unknown Source)
   sun.nio.ch.SelectorImpl.lockAndDoSelect(Unknown Source)
   sun.nio.ch.SelectorImpl.select(Unknown Source)
   org.elasticsearch.common.netty.channel.socket.nio.SelectorUtil.select(SelectorUtil.java:68)
   org.elasticsearch.common.netty.channel.socket.nio.AbstractNioSelector.select(AbstractNioSelector.java:415)
   org.elasticsearch.common.netty.channel.socket.nio.AbstractNioSelector.run(AbstractNioSelector.java:212)
   org.elasticsearch.common.netty.channel.socket.nio.AbstractNioWorker.run(AbstractNioWorker.java:89)
   org.elasticsearch.common.netty.channel.socket.nio.NioWorker.run(NioWorker.java:178)
   org.elasticsearch.common.netty.util.ThreadRenamingRunnable.run(ThreadRenamingRunnable.java:108)
   org.elasticsearch.common.netty.util.internal.DeadLockProofWorker$1.run(DeadLockProofWorker.java:42)
   java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source)
   java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
   java.lang.Thread.run(Unknown Source)

0.4% (1.8ms out of 500ms) cpu usage by thread 'elasticsearch[Crime Master][http_server_worker][T#41]{New I/O worker #139}'
 10/10 snapshots sharing following 15 elements
   sun.nio.ch.EPollArrayWrapper.epollWait(Native Method)
   sun.nio.ch.EPollArrayWrapper.poll(Unknown Source)
   sun.nio.ch.EPollSelectorImpl.doSelect(Unknown Source)
   sun.nio.ch.SelectorImpl.lockAndDoSelect(Unknown Source)
   sun.nio.ch.SelectorImpl.select(Unknown Source)
   org.elasticsearch.common.netty.channel.socket.nio.SelectorUtil.select(SelectorUtil.java:68)
   org.elasticsearch.common.netty.channel.socket.nio.AbstractNioSelector.select(AbstractNioSelector.java:415)
   org.elasticsearch.common.netty.channel.socket.nio.AbstractNioSelector.run(AbstractNioSelector.java:212)
   org.elasticsearch.common.netty.channel.socket.nio.AbstractNioWorker.run(AbstractNioWorker.java:89)
   org.elasticsearch.common.netty.channel.socket.nio.NioWorker.run(NioWorker.java:178)
   org.elasticsearch.common.netty.util.ThreadRenamingRunnable.run(ThreadRenamingRunnable.java:108)
   org.elasticsearch.common.netty.util.internal.DeadLockProofWorker$1.run(DeadLockProofWorker.java:42)
   java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source)
   java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
   java.lang.Thread.run(Unknown Source)
  • When there are no reads

99.6% (498ms out of 500ms) cpu usage by thread 'elasticsearch[Black Mamba][http_server_worker][T#9]{New I/O worker #107}'
10/10 snapshots sharing following 17 elements
java.nio.Bits.copyFromArray(Unknown Source)
java.nio.DirectByteBuffer.put(Unknown Source)
java.nio.DirectByteBuffer.put(Unknown Source)
sun.nio.ch.IOUtil.write(Unknown Source)
sun.nio.ch.SocketChannelImpl.write(Unknown Source)
org.elasticsearch.common.netty.channel.socket.nio.SocketSendBufferPool$UnpooledSendBuffer.transferTo(SocketSendBufferPool.java:203)
org.elasticsearch.common.netty.channel.socket.nio.AbstractNioWorker.write0(AbstractNioWorker.java:201)
org.elasticsearch.common.netty.channel.socket.nio.AbstractNioWorker.writeFromSelectorLoop(AbstractNioWorker.java:158)
org.elasticsearch.common.netty.channel.socket.nio.AbstractNioWorker.process(AbstractNioWorker.java:114)
org.elasticsearch.common.netty.channel.socket.nio.AbstractNioSelector.run(AbstractNioSelector.java:318)
org.elasticsearch.common.netty.channel.socket.nio.AbstractNioWorker.run(AbstractNioWorker.java:89)
org.elasticsearch.common.netty.channel.socket.nio.NioWorker.run(NioWorker.java:178)
org.elasticsearch.common.netty.util.ThreadRenamingRunnable.run(ThreadRenamingRunnable.java:108)
org.elasticsearch.common.netty.util.internal.DeadLockProofWorker$1.run(DeadLockProofWorker.java:42)
java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source)
java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
java.lang.Thread.run(Unknown Source)

0.5% (2.3ms out of 500ms) cpu usage by thread 'elasticsearch[Black Mamba][index][T#21]'
 10/10 snapshots sharing following 10 elements
   sun.misc.Unsafe.park(Native Method)
   java.util.concurrent.locks.LockSupport.park(Unknown Source)
   org.elasticsearch.common.util.concurrent.jsr166y.LinkedTransferQueue.awaitMatch(LinkedTransferQueue.java:706)
   org.elasticsearch.common.util.concurrent.jsr166y.LinkedTransferQueue.xfer(LinkedTransferQueue.java:615)
   org.elasticsearch.common.util.concurrent.jsr166y.LinkedTransferQueue.take(LinkedTransferQueue.java:1109)
   org.elasticsearch.common.util.concurrent.SizeBlockingQueue.take(SizeBlockingQueue.java:162)
   java.util.concurrent.ThreadPoolExecutor.getTask(Unknown Source)
   java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source)
   java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
   java.lang.Thread.run(Unknown Source)

0.5% (2.3ms out of 500ms) cpu usage by thread 'elasticsearch[Black Mamba][index][T#13]'
 10/10 snapshots sharing following 10 elements
   sun.misc.Unsafe.park(Native Method)
   java.util.concurrent.locks.LockSupport.park(Unknown Source)
   org.elasticsearch.common.util.concurrent.jsr166y.LinkedTransferQueue.awaitMatch(LinkedTransferQueue.java:706)
   org.elasticsearch.common.util.concurrent.jsr166y.LinkedTransferQueue.xfer(LinkedTransferQueue.java:615)
   org.elasticsearch.common.util.concurrent.jsr166y.LinkedTransferQueue.take(LinkedTransferQueue.java:1109)
   org.elasticsearch.common.util.concurrent.SizeBlockingQueue.take(SizeBlockingQueue.java:162)
   java.util.concurrent.ThreadPoolExecutor.getTask(Unknown Source)
   java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source)
   java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
   java.lang.Thread.run(Unknown Source)



(Mark Walkom) #11

Are you running OpenJDK or the Oracle JDK?

Also, what system monitor are you using? Does it show high IO latency? You
might be able to find something using iostat and friends if you aren't
using anything now.
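As a starting point, `iostat` from the sysstat package shows per-device latency and utilisation; persistently high `await` or `%util` on the RAID 0 device while queries run would point at the SAS disks:

```shell
# Extended per-device stats, refreshed every 5 seconds;
# watch the await (ms) and %util columns for the data volume
iostat -x 5
```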

Regards,
Mark Walkom

Infrastructure Engineer
Campaign Monitor
email: markw@campaignmonitor.com
web: www.campaignmonitor.com


   org.elasticsearch.common.netty.channel.socket.nio.AbstractNioWorker.run(AbstractNioWorker.java:89)

   org.elasticsearch.common.netty.channel.socket.nio.NioWorker.run(NioWorker.java:178)


   org.elasticsearch.common.netty.util.ThreadRenamingRunnable.run(ThreadRenamingRunnable.java:108)

   org.elasticsearch.common.netty.util.internal.DeadLockProofWorker$1.run(DeadLockProofWorker.java:42)

   java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source)


   java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)

   java.lang.Thread.run(Unknown Source)
  • When there are no reads

99.6% (498ms out of 500ms) cpu usage by thread 'elasticsearch[Black Mamba][http_server_worker][T#9]{New I/O worker #107}'
10/10 snapshots sharing following 17 elements
java.nio.Bits.copyFromArray(Unknown Source)
java.nio.DirectByteBuffer.put(Unknown Source)
java.nio.DirectByteBuffer.put(Unknown Source)
sun.nio.ch.IOUtil.write(Unknown Source)
sun.nio.ch.SocketChannelImpl.write(Unknown Source)
org.elasticsearch.common.netty.channel.socket.nio.SocketSendBufferPool$UnpooledSendBuffer.transferTo(SocketSendBufferPool.java:203)
org.elasticsearch.common.netty.channel.socket.nio.AbstractNioWorker.write0(AbstractNioWorker.java:201)
org.elasticsearch.common.netty.channel.socket.nio.AbstractNioWorker.writeFromSelectorLoop(AbstractNioWorker.java:158)
org.elasticsearch.common.netty.channel.socket.nio.AbstractNioWorker.process(AbstractNioWorker.java:114)
org.elasticsearch.common.netty.channel.socket.nio.AbstractNioSelector.run(AbstractNioSelector.java:318)
org.elasticsearch.common.netty.channel.socket.nio.AbstractNioWorker.run(AbstractNioWorker.java:89)
org.elasticsearch.common.netty.channel.socket.nio.NioWorker.run(NioWorker.java:178)
org.elasticsearch.common.netty.util.ThreadRenamingRunnable.run(ThreadRenamingRunnable.java:108)
org.elasticsearch.common.netty.util.internal.DeadLockProofWorker$1.run(DeadLockProofWorker.java:42)
java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source)
java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
java.lang.Thread.run(Unknown Source)

0.5% (2.3ms out of 500ms) cpu usage by thread 'elasticsearch[Black Mamba][index][T#21]'
 10/10 snapshots sharing following 10 elements
   sun.misc.Unsafe.park(Native Method)
   java.util.concurrent.locks.LockSupport.park(Unknown Source)
   org.elasticsearch.common.util.concurrent.jsr166y.LinkedTransferQueue.awaitMatch(LinkedTransferQueue.java:706)
   org.elasticsearch.common.util.concurrent.jsr166y.LinkedTransferQueue.xfer(LinkedTransferQueue.java:615)
   org.elasticsearch.common.util.concurrent.jsr166y.LinkedTransferQueue.take(LinkedTransferQueue.java:1109)
   org.elasticsearch.common.util.concurrent.SizeBlockingQueue.take(SizeBlockingQueue.java:162)
   java.util.concurrent.ThreadPoolExecutor.getTask(Unknown Source)
   java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source)
   java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
   java.lang.Thread.run(Unknown Source)

0.5% (2.3ms out of 500ms) cpu usage by thread 'elasticsearch[Black Mamba][index][T#13]'
 10/10 snapshots sharing following 10 elements
   sun.misc.Unsafe.park(Native Method)
   java.util.concurrent.locks.LockSupport.park(Unknown Source)
   org.elasticsearch.common.util.concurrent.jsr166y.LinkedTransferQueue.awaitMatch(LinkedTransferQueue.java:706)
   org.elasticsearch.common.util.concurrent.jsr166y.LinkedTransferQueue.xfer(LinkedTransferQueue.java:615)
   org.elasticsearch.common.util.concurrent.jsr166y.LinkedTransferQueue.take(LinkedTransferQueue.java:1109)
   org.elasticsearch.common.util.concurrent.SizeBlockingQueue.take(SizeBlockingQueue.java:162)
   java.util.concurrent.ThreadPoolExecutor.getTask(Unknown Source)
   java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source)
   java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
   java.lang.Thread.run(Unknown Source)
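
For anyone triaging dumps like the ones above: the quickest first pass is to look only at the per-thread usage headers and ignore threads that are merely parked or polling. A small sketch of that filter (the regex just matches the header format shown in this output; it is not an official API):

```python
import re

# Matches lines like:
#   99.6% (498ms out of 500ms) cpu usage by thread 'elasticsearch[...][...]'
HEADER = re.compile(
    r"^\s*(\d+(?:\.\d+)?)% \((.+?) out of (.+?)\) cpu usage by thread '(.+)'$"
)

def busy_threads(hot_threads_text, min_percent=10.0):
    """Return (percent, thread name) pairs above min_percent, busiest first."""
    hits = []
    for line in hot_threads_text.splitlines():
        m = HEADER.match(line)
        if m and float(m.group(1)) >= min_percent:
            hits.append((float(m.group(1)), m.group(4)))
    return sorted(hits, reverse=True)

sample = """\
   0.6% (2.7ms out of 500ms) cpu usage by thread 'elasticsearch[Crime Master][http_server_worker][T#45]{New I/O worker #143}'
99.6% (498ms out of 500ms) cpu usage by thread 'elasticsearch[Black Mamba][http_server_worker][T#9]{New I/O worker #107}'
"""
print(busy_threads(sample))
```

Against the dump above this flags only the Black Mamba http_server_worker at 99.6%, which, given its socket-write stack, is likely just the node streaming a response rather than doing search work.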

On Tuesday, June 10, 2014 3:51:03 PM UTC-7, Mark Walkom wrote:

SSDs help, though there is likely some other issue here so it's probably
not worth looking at, at this time.

Have you checked hot threads or the slow query log?
Can you provide more specs on your hardware? What java version are you
running?

Regards,
Mark Walkom

Infrastructure Engineer
Campaign Monitor
email: ma...@campaignmonitor.com
web: www.campaignmonitor.com
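
As a pointer on the slow query log Mark mentions: in Elasticsearch 1.x it is driven by per-index settings. The thresholds below are illustrative starting points, not recommendations:

```yaml
# Per-index slow search log thresholds (Elasticsearch 1.x index settings;
# the values shown are illustrative, tune them to your latency targets)
index.search.slowlog.threshold.query.warn: 500ms
index.search.slowlog.threshold.query.info: 200ms
index.search.slowlog.threshold.fetch.warn: 200ms
index.search.slowlog.threshold.fetch.info: 100ms
```

With the empty search taking 500-600 ms, a warn threshold of 500ms would capture exactly the queries under discussion.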


(sairam-2) #12

We run the Oracle JVM. The system monitor doesn't show any major I/O latency;
%iowait sits between 0.00 and 0.07 at most.
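
A %iowait figure like this can be cross-checked without any monitoring agent by diffing two samples of the `cpu` line in /proc/stat (iowait is the fifth counter, per proc(5)). A minimal sketch, with invented sample values:

```python
def iowait_percent(before, after):
    """%iowait between two samples of /proc/stat's 'cpu' line.

    Each sample is the list of jiffy counters after the 'cpu' tag:
    user, nice, system, idle, iowait, irq, softirq, ...
    """
    delta = [b - a for a, b in zip(before, after)]
    total = sum(delta)
    return 100.0 * delta[4] / total if total else 0.0

# Invented counters for illustration; on a real box, read the first
# line of /proc/stat twice, a few seconds apart.
before = [1000, 0, 200, 8000, 10, 0, 0]
after = [1100, 0, 220, 8850, 17, 0, 0]
print(round(iowait_percent(before, after), 2))
```

Numbers this low are consistent with the hot threads above, where the search threads are parked rather than blocked on I/O.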

On Tuesday, June 10, 2014 4:43:44 PM UTC-7, Mark Walkom wrote:

Are you running OpenJDK or Oracle?

Also, what system monitor are you using? Does it show high IO latency? You
might be able to find something using iostat and friends if you aren't
using anything now.

For more options, visit https://groups.google.com/d/optout.


(Antonio Augusto Santos) #13

Did you try setting up an empty index to see whether you get the same results?
Also, are you returning all the docs in your query, or just a subset of them?
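A minimal sketch of that experiment (the index name `test_empty` and the host `localhost:9200` are assumptions, not from the thread):

```shell
#!/bin/sh
# Hypothetical host and index name for the comparison test.
ES='http://localhost:9200'
INDEX='test_empty'

# 1. Create a fresh, empty index.
curl -s -XPUT "$ES/$INDEX"

# 2. Time the empty search scoped to that index alone.
time curl -s "$ES/$INDEX/_search"

# 3. Fetch a subset instead of the default 10 hits.
curl -s "$ES/$INDEX/_search?size=1&from=0"

# 4. Clean up.
curl -s -XDELETE "$ES/$INDEX"
```

If the empty index responds in a few milliseconds while the real one takes 500ms, the slowness is in the data or shard layout rather than in the cluster plumbing.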



(sairam-2) #14

The empty index takes 1 ms to return for the Empty Search
http://www.elasticsearch.org/guide/en/elasticsearch/guide/current/empty-search.html
query. I did not specify From and Size in my empty searches (they defaulted
to 0 and 10). Specifying an explicit Size and Fields doesn't make a
difference.
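For reference, the empty search with its defaults written out explicitly looks like this (a sketch; the index name in the commented curl line is a placeholder):

```shell
#!/bin/sh
# The empty search with its defaults made explicit:
# a match_all query, from=0, size=10.
BODY='{"query":{"match_all":{}},"from":0,"size":10}'
echo "$BODY"

# Against a live node this would be sent as, e.g.:
# curl -s 'http://localhost:9200/index/_search' -d "$BODY"
```

Since supplying these values explicitly changes nothing, the 500ms is not coming from an oversized default result set.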

On Wednesday, June 11, 2014 8:04:17 AM UTC-7, Antonio Augusto Santos wrote:

Did you try setting up am empty index and see if you get the same results?
Also, are you returning all the docs in your query, or are you just
getting a subset of it?

On Tuesday, June 10, 2014 9:36:23 PM UTC-3, sai...@roblox.com wrote:

We run Oracle version. It doesn't show any major IO Latency. The %IOWait
is at 0.00 to 0.07 at most.

On Tuesday, June 10, 2014 4:43:44 PM UTC-7, Mark Walkom wrote:

Are you running OpenJDK or Oracle?

Also, what system monitor are you using? Does it show high IO latency? You
might be able to find something using iostat and friends if you aren't
using anything now.

Regards,
Mark Walkom

Infrastructure Engineer
Campaign Monitor
email: ma...@campaignmonitor.com
web: www.campaignmonitor.com

On 11 June 2014 09:35, sai...@roblox.com wrote:

Machine Specs:
Processor: Intel(R) Xeon(R) CPU E5-2630 0 @ 2.30GHz
Number of CPU cores: 24
Number of Physical CPUs: 2
Installed RAM: [~256 GB Total] 128 GB 128 GB 16 MB
Drive: Two 278GB SAS Drive configured in
RAID 0
OS:
Arch: 64bit(x86_64)
OS Type: Linux
Kernel: 2.6.32-431.5.1.el6.x86_64
OS Version: Red Hat Enterprise Linux Server release 6.5
(Santiago)
Java Version: Java 1.7.0_51 (Java 7u51 x64 version for
Linux).

Since we don't have any read queries executed against ES, the Hot Threads
are under 1%. I have given 3 different scenarios with basic Match_All,
Query with Filters/Sorts and Idle time.

  • Search Query (Basic Match_All with no filters/sorts)

28.2% (141.2ms out of 500ms) cpu usage by thread 'elasticsearch[Crime Master][search][T#46]'

 10/10 snapshots sharing following 10 elements

   sun.misc.Unsafe.park(Native Method)

   java.util.concurrent.locks.LockSupport.park(Unknown Source)

   org.elasticsearch.common.util.concurrent.jsr166y.LinkedTransferQueue.awaitMatch(LinkedTransferQueue.java:706)

   org.elasticsearch.common.util.concurrent.jsr166y.LinkedTransferQueue.xfer(LinkedTransferQueue.java:615)

   org.elasticsearch.common.util.concurrent.jsr166y.LinkedTransferQueue.take(LinkedTransferQueue.java:1109)

   org.elasticsearch.common.util.concurrent.SizeBlockingQueue.take(SizeBlockingQueue.java:162)

   java.util.concurrent.ThreadPoolExecutor.getTask(Unknown Source)

   java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source)

   java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)

   java.lang.Thread.run(Unknown Source)

  23.4% (116.9ms out of 500ms) cpu usage by thread 'elasticsearch[Crime Master][search][T#30]'

 10/10 snapshots sharing following 10 elements

   sun.misc.Unsafe.park(Native Method)

   java.util.concurrent.locks.LockSupport.park(Unknown Source)

   org.elasticsearch.common.util.concurrent.jsr166y.LinkedTransferQueue.awaitMatch(LinkedTransferQueue.java:706)

   org.elasticsearch.common.util.concurrent.jsr166y.LinkedTransferQueue.xfer(LinkedTransferQueue.java:615)

   org.elasticsearch.common.util.concurrent.jsr166y.LinkedTransferQueue.take(LinkedTransferQueue.java:1109)

   org.elasticsearch.common.util.concurrent.SizeBlockingQueue.take(SizeBlockingQueue.java:162)

   java.util.concurrent.ThreadPoolExecutor.getTask(Unknown Source)

   java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source)

   java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)

   java.lang.Thread.run(Unknown Source)

   1.2% (6ms out of 500ms) cpu usage by thread 'elasticsearch[Crime Master][management][T#3]'

 10/10 snapshots sharing following 9 elements

   sun.misc.Unsafe.park(Native Method)

   java.util.concurrent.locks.LockSupport.parkNanos(Unknown Source)

   org.elasticsearch.common.util.concurrent.jsr166y.LinkedTransferQueue.awaitMatch(LinkedTransferQueue.java:702)

   org.elasticsearch.common.util.concurrent.jsr166y.LinkedTransferQueue.xfer(LinkedTransferQueue.java:615)

   org.elasticsearch.common.util.concurrent.jsr166y.LinkedTransferQueue.poll(LinkedTransferQueue.java:1117)

   java.util.concurrent.ThreadPoolExecutor.getTask(Unknown Source)

   java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source)

   java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)

   java.lang.Thread.run(Unknown Source)
  • Query with filters and Sorts

80.4% (402ms out of 500ms) cpu usage by thread 'elasticsearch[Crime Master][search][T#57]'

 4/10 snapshots sharing following 20 elements

   org.apache.lucene.search.FilteredDocIdSetIterator.nextDoc(FilteredDocIdSetIterator.java:60)

   org.apache.lucene.search.ConstantScoreQuery$ConstantScorer.nextDoc(ConstantScoreQuery.java:196)

   org.elasticsearch.common.lucene.search.function.FunctionScoreQuery$CustomBoostFactorScorer.nextDoc(FunctionScoreQuery.java:169)

   org.apache.lucene.search.Scorer.score(Scorer.java:64)

   org.apache.lucene.search.IndexSearcher.search(IndexSearcher.java:621)

   org.elasticsearch.search.internal.ContextIndexSearcher.search(ContextIndexSearcher.java:173)

   org.apache.lucene.search.IndexSearcher.search(IndexSearcher.java:491)

   org.apache.lucene.search.IndexSearcher.search(IndexSearcher.java:448)

   org.apache.lucene.search.IndexSearcher.search(IndexSearcher.java:281)

   org.apache.lucene.search.IndexSearcher.search(IndexSearcher.java:269)

   org.elasticsearch.search.query.QueryPhase.execute(QueryPhase.java:122)

   org.elasticsearch.search.SearchService.executeQueryPhase(SearchService.java:249)

   org.elasticsearch.search.action.SearchServiceTransportAction.sendExecuteQuery(SearchServiceTransportAction.java:202)

   org.elasticsearch.action.search.type.TransportSearchQueryThenFetchAction$AsyncAction.sendExecuteFirstPhase(TransportSearchQueryThenFetchAction.java:80)

   org.elasticsearch.action.search.type.TransportSearchTypeAction$BaseAsyncAction.performFirstPhase(TransportSearchTypeAction.java:216)

   org.elasticsearch.action.search.type.TransportSearchTypeAction$BaseAsyncAction.performFirstPhase(TransportSearchTypeAction.java:203)

   org.elasticsearch.action.search.type.TransportSearchTypeAction$BaseAsyncAction$2.run(TransportSearchTypeAction.java:186)

   java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source)

   java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)

   java.lang.Thread.run(Unknown Source)

 6/10 snapshots sharing following 18 elements

   org.elasticsearch.common.lucene.search.FilteredCollector.collect(FilteredCollector.java:60)

   org.apache.lucene.search.Scorer.score(Scorer.java:65)

   org.apache.lucene.search.IndexSearcher.search(IndexSearcher.java:621)

   org.elasticsearch.search.internal.ContextIndexSearcher.search(ContextIndexSearcher.java:173)

   org.apache.lucene.search.IndexSearcher.search(IndexSearcher.java:491)

   org.apache.lucene.search.IndexSearcher.search(IndexSearcher.java:448)

   org.apache.lucene.search.IndexSearcher.search(IndexSearcher.java:281)

   org.apache.lucene.search.IndexSearcher.search(IndexSearcher.java:269)

   org.elasticsearch.search.query.QueryPhase.execute(QueryPhase.java:122)

   org.elasticsearch.search.SearchService.executeQueryPhase(SearchService.java:249)

   org.elasticsearch.search.action.SearchServiceTransportAction.sendExecuteQuery(SearchServiceTransportAction.java:202)

   org.elasticsearch.action.search.type.TransportSearchQueryThenFetchAction$AsyncAction.sendExecuteFirstPhase(TransportSearchQueryThenFetchAction.java:80)

   org.elasticsearch.action.search.type.TransportSearchTypeAction$BaseAsyncAction.performFirstPhase(TransportSearchTypeAction.java:216)

   org.elasticsearch.action.search.type.TransportSearchTypeAction$BaseAsyncAction.performFirstPhase(TransportSearchTypeAction.java:203)

   org.elasticsearch.action.search.type.TransportSearchTypeAction$BaseAsyncAction$2.run(TransportSearchTypeAction.java:186)

   java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source)

   java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)

   java.lang.Thread.run(Unknown Source)

   0.6% (2.7ms out of 500ms) cpu usage by thread 'elasticsearch[Crime Master][http_server_worker][T#45]{New I/O worker #143}'

 10/10 snapshots sharing following 15 elements

   sun.nio.ch.EPollArrayWrapper.epollWait(Native Method)

   sun.nio.ch.EPollArrayWrapper.poll(Unknown Source)

   sun.nio.ch.EPollSelectorImpl.doSelect(Unknown Source)

   sun.nio.ch.SelectorImpl.lockAndDoSelect(Unknown Source)

   sun.nio.ch.SelectorImpl.select(Unknown Source)

   org.elasticsearch.common.netty.channel.socket.nio.SelectorUtil.select(SelectorUtil.java:68)

   org.elasticsearch.common.netty.channel.socket.nio.AbstractNioSelector.select(AbstractNioSelector.java:415)

   org.elasticsearch.common.netty.channel.socket.nio.AbstractNioSelector.run(AbstractNioSelector.java:212)

   org.elasticsearch.common.netty.channel.socket.nio.AbstractNioWorker.run(AbstractNioWorker.java:89)

   org.elasticsearch.common.netty.channel.socket.nio.NioWorker.run(NioWorker.java:178)

   org.elasticsearch.common.netty.util.ThreadRenamingRunnable.run(ThreadRenamingRunnable.java:108)

   org.elasticsearch.common.netty.util.internal.DeadLockProofWorker$1.run(DeadLockProofWorker.java:42)

   java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source)

   java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)

   java.lang.Thread.run(Unknown Source)

   0.4% (1.8ms out of 500ms) cpu usage by thread 'elasticsearch[Crime Master][http_server_worker][T#41]{New I/O worker #139}'

 10/10 snapshots sharing following 15 elements

   sun.nio.ch.EPollArrayWrapper.epollWait(Native Method)

   sun.nio.ch.EPollArrayWrapper.poll(Unknown Source)

   sun.nio.ch.EPollSelectorImpl.doSelect(Unknown Source)

   sun.nio.ch.SelectorImpl.lockAndDoSelect(Unknown Source)

   sun.nio.ch.SelectorImpl.select(Unknown Source)

   org.elasticsearch.common.netty.channel.socket.nio.SelectorUtil.select(SelectorUtil.java:68)

   org.elasticsearch.common.netty.channel.socket.nio.AbstractNioSelector.select(AbstractNioSelector.java:415)

   org.elasticsearch.common.netty.channel.socket.nio.AbstractNioSelector.run(AbstractNioSelector.java:212)

   org.elasticsearch.common.netty.channel.socket.nio.AbstractNioWorker.run(AbstractNioWorker.java:89)

   org.elasticsearch.common.netty.channel.socket.nio.NioWorker.run(NioWorker.java:178)

   org.elasticsearch.common.netty.util.ThreadRenamingRunnable.run(ThreadRenamingRunnable.java:108)

   org.elasticsearch.common.netty.util.internal.DeadLockProofWorker$1.run(DeadLockProofWorker.java:42)

   java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source)

   java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)

   java.lang.Thread.run(Unknown Source)
  • When there are no reads

99.6% (498ms out of 500ms) cpu usage by thread 'elasticsearch[Black Mamba][http_server_worker][T#9]{New I/O worker #107}'
10/10 snapshots sharing following 17 elements
java.nio.Bits.copyFromArray(Unknown Source)
java.nio.DirectByteBuffer.put(Unknown Source)
java.nio.DirectByteBuffer.put(Unknown Source)
sun.nio.ch.IOUtil.write(Unknown Source)
sun.nio.ch.SocketChannelImpl.write(Unknown Source)
org.elasticsearch.common.netty.channel.socket.nio.SocketSendBufferPool$UnpooledSendBuffer.transferTo(SocketSendBufferPool.java:203)
org.elasticsearch.common.netty.channel.socket.nio.AbstractNioWorker.write0(AbstractNioWorker.java:201)
org.elasticsearch.common.netty.channel.socket.nio.AbstractNioWorker.writeFromSelectorLoop(AbstractNioWorker.java:158)
org.elasticsearch.common.netty.channel.socket.nio.AbstractNioWorker.process(AbstractNioWorker.java:114)
org.elasticsearch.common.netty.channel.socket.nio.AbstractNioSelector.run(AbstractNioSelector.java:318)
org.elasticsearch.common.netty.channel.socket.nio.AbstractNioWorker.run(AbstractNioWorker.java:89)
org.elasticsearch.common.netty.channel.socket.nio.NioWorker.run(NioWorker.java:178)
org.elasticsearch.common.netty.util.ThreadRenamingRunnable.run(ThreadRenamingRunnable.java:108)
org.elasticsearch.common.netty.util.internal.DeadLockProofWorker$1.run(DeadLockProofWorker.java:42)
java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source)
java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
java.lang.Thread.run(Unknown Source)

0.5% (2.3ms out of 500ms) cpu usage by thread 'elasticsearch[Black Mamba][index][T#21]'
 10/10 snapshots sharing following 10 elements
   sun.misc.Unsafe.park(Native Method)
   java.util.concurrent.locks.LockSupport.park(Unknown Source)
   org.elasticsearch.common.util.concurrent.jsr166y.LinkedTransferQueue.awaitMatch(LinkedTransferQueue.java:706)
   org.elasticsearch.common.util.concurrent.jsr166y.LinkedTransferQueue.xfer(LinkedTransferQueue.java:615)
   org.elasticsearch.common.util.concurrent.jsr166y.LinkedTransferQueue.take(LinkedTransferQueue.java:1109)
   org.elasticsearch.common.util.concurrent.SizeBlockingQueue.take(SizeBlockingQueue.java:162)
   java.util.concurrent.ThreadPoolExecutor.getTask(Unknown Source)
   java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source)
   java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
   java.lang.Thread.run(Unknown Source)

0.5% (2.3ms out of 500ms) cpu usage by thread 'elasticsearch[Black Mamba][index][T#13]'
 10/10 snapshots sharing following 10 elements
   sun.misc.Unsafe.park(Native Method)
   java.util.concurrent.locks.LockSupport.park(Unknown Source)
   org.elasticsearch.common.util.concurrent.jsr166y.LinkedTransferQueue.awaitMatch(LinkedTransferQueue.java:706)
   org.elasticsearch.common.util.concurrent.jsr166y.LinkedTransferQueue.xfer(LinkedTransferQueue.java:615)
   org.elasticsearch.common.util.concurrent.jsr166y.LinkedTransferQueue.take(LinkedTransferQueue.java:1109)
   org.elasticsearch.common.util.concurrent.SizeBlockingQueue.take(SizeBlockingQueue.java:162)
   java.util.concurrent.ThreadPoolExecutor.getTask(Unknown Source)
   java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source)
   java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
   java.lang.Thread.run(Unknown Source)

On Tuesday, June 10, 2014 3:51:03 PM UTC-7, Mark Walkom wrote:

SSDs help, though there is likely some other issue here so it's probably
not worth looking at, at this time.

Have you checked hot threads or the slow query log?
Can you provide more specs on your hardware? What java version are you
running?

Regards,
Mark Walkom

Infrastructure Engineer
Campaign Monitor
email: ma...@campaignmonitor.com
web: www.campaignmonitor.com




(nc) #15

It may be a good idea to stop writes, wait for merges.current to go down to
zero, and then look at the response time.
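The merges.current check can be scripted against the indices stats API. A minimal sketch, assuming a node on localhost:9200 and the 1.x `_stats/merge` response shape (`_all.total.merges.current`) — the URL and polling loop are illustrative, not from the thread:

```python
import json
import urllib.request

# Assumption: a local node on the default port; the 1.x indices stats API
# exposes merge activity at GET /_stats/merge.
STATS_URL = "http://localhost:9200/_stats/merge"

def current_merges(stats):
    """Extract merges.current from an indices-stats response body
    (assumed 1.x shape: _all.total.merges.current)."""
    return stats["_all"]["total"]["merges"]["current"]

def fetch_current_merges(url=STATS_URL):
    """Fetch live stats from a running cluster (not exercised here)."""
    with urllib.request.urlopen(url) as resp:
        return current_merges(json.load(resp))

# Usage: poll until merges drain, then re-time the empty search, e.g.
#   while fetch_current_merges() > 0:
#       time.sleep(5)
```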

On Wednesday, June 11, 2014 12:36:04 PM UTC-5, sai...@roblox.com wrote:

The empty index takes 1 ms to return for the Empty Search
http://www.elasticsearch.org/guide/en/elasticsearch/guide/current/empty-search.html
query. I did not specify a From and Size in my empty searches (they
defaulted to 10). Specifying explicit Size, Fields doesn't make a
difference.



(sairam-2) #16

That didn't seem to help either. The query response time dropped from about
530ms to 480ms, and only for a few seconds.
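One low-tech way to profile the response time from the client side is to time the request repeatedly and look at the distribution; a sketch, where the callable is a stand-in for however the empty search is actually issued:

```python
import time
import statistics

def time_calls(fn, n=20):
    """Call fn() n times and return (min, median) wall-clock latency in ms.

    fn stands in for the empty search request, e.g.
    lambda: urllib.request.urlopen("http://localhost:9200/_search").read()
    (the URL is an assumption: a local node on the default port).
    """
    samples = []
    for _ in range(n):
        start = time.perf_counter()
        fn()
        samples.append((time.perf_counter() - start) * 1000.0)
    return min(samples), statistics.median(samples)
```

Comparing the minimum against the median helps separate steady-state query cost from intermittent pauses such as GC or merge activity.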

On Wednesday, June 11, 2014 6:43:23 PM UTC-7, NC wrote:

May be good idea to stop writes, wait for merges.current to go down to
zero and then look at response time.

On Wednesday, June 11, 2014 12:36:04 PM UTC-5, sai...@roblox.com wrote:

The empty index takes 1 ms to return for the Empty Search
http://www.elasticsearch.org/guide/en/elasticsearch/guide/current/empty-search.html
query. I did not specify a From and Size in my empty searches (they
defaulted to 10). Specifying explicit Size, Fields doesn't make a
difference.

On Wednesday, June 11, 2014 8:04:17 AM UTC-7, Antonio Augusto Santos wrote:

Did you try setting up an empty index and see if you get the same results?
Also, are you returning all the docs in your query, or are you just
getting a subset of it?

On Tuesday, June 10, 2014 9:36:23 PM UTC-3, sai...@roblox.com wrote:

We run the Oracle version. It doesn't show any major IO Latency. The %IOWait
is at 0.00 to 0.07 at most.

On Tuesday, June 10, 2014 4:43:44 PM UTC-7, Mark Walkom wrote:

Are you running OpenJDK or Oracle?

Also, what system monitor are you using? Does it show high IO latency? You
might be able to find something using iostat and friends if you aren't
using anything now.

Regards,
Mark Walkom

Infrastructure Engineer
Campaign Monitor
email: ma...@campaignmonitor.com
web: www.campaignmonitor.com

On 11 June 2014 09:35, sai...@roblox.com wrote:

Machine Specs:
Processor: Intel(R) Xeon(R) CPU E5-2630 0 @ 2.30GHz
Number of CPU cores: 24
Number of Physical CPUs: 2
Installed RAM: [~256 GB Total] 128 GB 128 GB 16 MB
Drive: Two 278GB SAS Drive configured in
RAID 0
OS:
Arch: 64bit(x86_64)
OS Type: Linux
Kernel: 2.6.32-431.5.1.el6.x86_64
OS Version: Red Hat Enterprise Linux Server release 6.5
(Santiago)
Java Version: Java 1.7.0_51 (Java 7u51 x64 version for
Linux).

Since we don't have any read queries executed against ES, the Hot Threads
are under 1%. I have given 3 different scenarios with basic Match_All,
Query with Filters/Sorts and Idle time.

  • Search Query (Basic Match_All with no filters/sorts)

28.2% (141.2ms out of 500ms) cpu usage by thread 'elasticsearch[Crime Master][search][T#46]'
 10/10 snapshots sharing following 10 elements
   sun.misc.Unsafe.park(Native Method)
   java.util.concurrent.locks.LockSupport.park(Unknown Source)
   org.elasticsearch.common.util.concurrent.jsr166y.LinkedTransferQueue.awaitMatch(LinkedTransferQueue.java:706)
   org.elasticsearch.common.util.concurrent.jsr166y.LinkedTransferQueue.xfer(LinkedTransferQueue.java:615)
   org.elasticsearch.common.util.concurrent.jsr166y.LinkedTransferQueue.take(LinkedTransferQueue.java:1109)
   org.elasticsearch.common.util.concurrent.SizeBlockingQueue.take(SizeBlockingQueue.java:162)
   java.util.concurrent.ThreadPoolExecutor.getTask(Unknown Source)
   java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source)
   java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
   java.lang.Thread.run(Unknown Source)

23.4% (116.9ms out of 500ms) cpu usage by thread 'elasticsearch[Crime Master][search][T#30]'
 10/10 snapshots sharing following 10 elements
   sun.misc.Unsafe.park(Native Method)
   java.util.concurrent.locks.LockSupport.park(Unknown Source)
   org.elasticsearch.common.util.concurrent.jsr166y.LinkedTransferQueue.awaitMatch(LinkedTransferQueue.java:706)
   org.elasticsearch.common.util.concurrent.jsr166y.LinkedTransferQueue.xfer(LinkedTransferQueue.java:615)
   org.elasticsearch.common.util.concurrent.jsr166y.LinkedTransferQueue.take(LinkedTransferQueue.java:1109)
   org.elasticsearch.common.util.concurrent.SizeBlockingQueue.take(SizeBlockingQueue.java:162)
   java.util.concurrent.ThreadPoolExecutor.getTask(Unknown Source)
   java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source)
   java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
   java.lang.Thread.run(Unknown Source)

1.2% (6ms out of 500ms) cpu usage by thread 'elasticsearch[Crime Master][management][T#3]'
 10/10 snapshots sharing following 9 elements
   sun.misc.Unsafe.park(Native Method)
   java.util.concurrent.locks.LockSupport.parkNanos(Unknown Source)
   org.elasticsearch.common.util.concurrent.jsr166y.LinkedTransferQueue.awaitMatch(LinkedTransferQueue.java:702)
   org.elasticsearch.common.util.concurrent.jsr166y.LinkedTransferQueue.xfer(LinkedTransferQueue.java:615)
   org.elasticsearch.common.util.concurrent.jsr166y.LinkedTransferQueue.poll(LinkedTransferQueue.java:1117)
   java.util.concurrent.ThreadPoolExecutor.getTask(Unknown Source)
   java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source)
   java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
   java.lang.Thread.run(Unknown Source)
  • Query with filters and Sorts

80.4% (402ms out of 500ms) cpu usage by thread 'elasticsearch[Crime Master][search][T#57]'
 4/10 snapshots sharing following 20 elements
   org.apache.lucene.search.FilteredDocIdSetIterator.nextDoc(FilteredDocIdSetIterator.java:60)
   org.apache.lucene.search.ConstantScoreQuery$ConstantScorer.nextDoc(ConstantScoreQuery.java:196)
   org.elasticsearch.common.lucene.search.function.FunctionScoreQuery$CustomBoostFactorScorer.nextDoc(FunctionScoreQuery.java:169)
   org.apache.lucene.search.Scorer.score(Scorer.java:64)
   org.apache.lucene.search.IndexSearcher.search(IndexSearcher.java:621)
   org.elasticsearch.search.internal.ContextIndexSearcher.search(ContextIndexSearcher.java:173)
   org.apache.lucene.search.IndexSearcher.search(IndexSearcher.java:491)
   org.apache.lucene.search.IndexSearcher.search(IndexSearcher.java:448)
   org.apache.lucene.search.IndexSearcher.search(IndexSearcher.java:281)
   org.apache.lucene.search.IndexSearcher.search(IndexSearcher.java:269)
   org.elasticsearch.search.query.QueryPhase.execute(QueryPhase.java:122)
   org.elasticsearch.search.SearchService.executeQueryPhase(SearchService.java:249)
   org.elasticsearch.search.action.SearchServiceTransportAction.sendExecuteQuery(SearchServiceTransportAction.java:202)
   org.elasticsearch.action.search.type.TransportSearchQueryThenFetchAction$AsyncAction.sendExecuteFirstPhase(TransportSearchQueryThenFetchAction.java:80)
   org.elasticsearch.action.search.type.TransportSearchTypeAction$BaseAsyncAction.performFirstPhase(TransportSearchTypeAction.java:216)
   org.elasticsearch.action.search.type.TransportSearchTypeAction$BaseAsyncAction.performFirstPhase(TransportSearchTypeAction.java:203)
   org.elasticsearch.action.search.type.TransportSearchTypeAction$BaseAsyncAction$2.run(TransportSearchTypeAction.java:186)
   java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source)
   java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
   java.lang.Thread.run(Unknown Source)
 6/10 snapshots sharing following 18 elements
   org.elasticsearch.common.lucene.search.FilteredCollector.collect(FilteredCollector.java:60)
   org.apache.lucene.search.Scorer.score(Scorer.java:65)
   org.apache.lucene.search.IndexSearcher.search(IndexSearcher.java:621)
   org.elasticsearch.search.internal.ContextIndexSearcher.search(ContextIndexSearcher.java:173)
   org.apache.lucene.search.IndexSearcher.search(IndexSearcher.java:491)
   org.apache.lucene.search.IndexSearcher.search(IndexSearcher.java:448)
   org.apache.lucene.search.IndexSearcher.search(IndexSearcher.java:281)
   org.apache.lucene.search.IndexSearcher.search(IndexSearcher.java:269)
   org.elasticsearch.search.query.QueryPhase.execute(QueryPhase.java:122)
   org.elasticsearch.search.SearchService.executeQueryPhase(SearchService.java:249)
   org.elasticsearch.search.action.SearchServiceTransportAction.sendExecuteQuery(SearchServiceTransportAction.java:202)
   org.elasticsearch.action.search.type.TransportSearchQueryThenFetchAction$AsyncAction.sendExecuteFirstPhase(TransportSearchQueryThenFetchAction.java:80)
   org.elasticsearch.action.search.type.TransportSearchTypeAction$BaseAsyncAction.performFirstPhase(TransportSearchTypeAction.java:216)
   org.elasticsearch.action.search.type.TransportSearchTypeAction$BaseAsyncAction.performFirstPhase(TransportSearchTypeAction.java:203)
   org.elasticsearch.action.search.type.TransportSearchTypeAction$BaseAsyncAction$2.run(TransportSearchTypeAction.java:186)
   java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source)
   java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
   java.lang.Thread.run(Unknown Source)

0.6% (2.7ms out of 500ms) cpu usage by thread 'elasticsearch[Crime Master][http_server_worker][T#45]{New I/O worker #143}'
 10/10 snapshots sharing following 15 elements
   sun.nio.ch.EPollArrayWrapper.epollWait(Native Method)
   sun.nio.ch.EPollArrayWrapper.poll(Unknown Source)
   sun.nio.ch.EPollSelectorImpl.doSelect(Unknown Source)
   sun.nio.ch.SelectorImpl.lockAndDoSelect(Unknown Source)
   sun.nio.ch.SelectorImpl.select(Unknown Source)
   org.elasticsearch.common.netty.channel.socket.nio.SelectorUtil.select(SelectorUtil.java:68)
   org.elasticsearch.common.netty.channel.socket.nio.AbstractNioSelector.select(AbstractNioSelector.java:415)
   org.elasticsearch.common.netty.channel.socket.nio.AbstractNioSelector.run(AbstractNioSelector.java:212)
   org.elasticsearch.common.netty.channel.socket.nio.AbstractNioWorker.run(AbstractNioWorker.java:89)
   org.elasticsearch.common.netty.channel.socket.nio.NioWorker.run(NioWorker.java:178)
   org.elasticsearch.common.netty.util.ThreadRenamingRunnable.run(ThreadRenamingRunnable.java:108)
   org.elasticsearch.common.netty.util.internal.DeadLockProofWorker$1.run(DeadLockProofWorker.java:42)
   java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source)
   java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
   java.lang.Thread.run(Unknown Source)

0.4% (1.8ms out of 500ms) cpu usage by thread 'elasticsearch[Crime Master][http_server_worker][T#41]{New I/O worker #139}'
 10/10 snapshots sharing following 15 elements
   sun.nio.ch.EPollArrayWrapper.epollWait(Native Method)
   sun.nio.ch.EPollArrayWrapper.poll(Unknown Source)
   sun.nio.ch.EPollSelectorImpl.doSelect(Unknown Source)
   sun.nio.ch.SelectorImpl.lockAndDoSelect(Unknown Source)
   sun.nio.ch.SelectorImpl.select(Unknown Source)
   org.elasticsearch.common.netty.channel.socket.nio.SelectorUtil.select(SelectorUtil.java:68)
   org.elasticsearch.common.netty.channel.socket.nio.AbstractNioSelector.select(AbstractNioSelector.java:415)
   org.elasticsearch.common.netty.channel.socket.nio.AbstractNioSelector.run(AbstractNioSelector.java:212)
   org.elasticsearch.common.netty.channel.socket.nio.AbstractNioWorker.run(AbstractNioWorker.java:89)
   org.elasticsearch.common.netty.channel.socket.nio.NioWorker.run(NioWorker.java:178)
   org.elasticsearch.common.netty.util.ThreadRenamingRunnable.run(ThreadRenamingRunnable.java:108)
   org.elasticsearch.common.netty.util.internal.DeadLockProofWorker$1.run(DeadLockProofWorker.java:42)
   java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source)
   java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
   java.lang.Thread.run(Unknown Source)
  • When there are no reads

99.6% (498ms out of 500ms) cpu usage by thread 'elasticsearch[Black Mamba][http_server_worker][T#9]{New I/O worker #107}'
10/10 snapshots sharing following 17 elements
java.nio.Bits.copyFromArray(Unknown Source)
java.nio.DirectByteBuffer.put(Unknown Source)
java.nio.DirectByteBuffer.put(Unknown Source)
sun.nio.ch.IOUtil.write(Unknown Source)
sun.nio.ch.SocketChannelImpl.write(Unknown Source)
org.elasticsearch.common.netty.channel.socket.nio.SocketSendBufferPool$UnpooledSendBuffer.transferTo(SocketSendBufferPool.java:203)
org.elasticsearch.common.netty.channel.socket.nio.AbstractNioWorker.write0(AbstractNioWorker.java:201)
org.elasticsearch.common.netty.channel.socket.nio.AbstractNioWorker.writeFromSelectorLoop(AbstractNioWorker.java:158)
org.elasticsearch.common.netty.channel.socket.nio.AbstractNioWorker.process(AbstractNioWorker.java:114)
org.elasticsearch.common.netty.channel.socket.nio.AbstractNioSelector.run(AbstractNioSelector.java:318)
org.elasticsearch.common.netty.channel.socket.nio.AbstractNioWorker.run(AbstractNioWorker.java:89)
org.elasticsearch.common.netty.channel.socket.nio.NioWorker.run(NioWorker.java:178)
org.elasticsearch.common.netty.util.ThreadRenamingRunnable.run(ThreadRenamingRunnable.java:108)
org.elasticsearch.common.netty.util.internal.DeadLockProofWorker$1.run(DeadLockProofWorker.java:42)
java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source)
java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
java.lang.Thread.run(Unknown Source)

0.5% (2.3ms out of 500ms) cpu usage by thread 'elasticsearch[Black Mamba][index][T#21]'
 10/10 snapshots sharing following 10 elements
   sun.misc.Unsafe.park(Native Method)
   java.util.concurrent.locks.LockSupport.park(Unknown Source)
   org.elasticsearch.common.util.concurrent.jsr166y.LinkedTransferQueue.awaitMatch(LinkedTransferQueue.java:706)
   org.elasticsearch.common.util.concurrent.jsr166y.LinkedTransferQueue.xfer(LinkedTransferQueue.java:615)
   org.elasticsearch.common.util.concurrent.jsr166y.LinkedTransferQueue.take(LinkedTransferQueue.java:1109)
   org.elasticsearch.common.util.concurrent.SizeBlockingQueue.take(SizeBlockingQueue.java:162)
   java.util.concurrent.ThreadPoolExecutor.getTask(Unknown Source)
   java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source)
   java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
   java.lang.Thread.run(Unknown Source)

0.5% (2.3ms out of 500ms) cpu usage by thread 'elasticsearch[Black Mamba][index][T#13]'
 10/10 snapshots sharing following 10 elements
   sun.misc.Unsafe.park(Native Method)
   java.util.concurrent.locks.LockSupport.park(Unknown Source)
   org.elasticsearch.common.util.concurrent.jsr166y.LinkedTransferQueue.awaitMatch(LinkedTransferQueue.java:706)
   org.elasticsearch.common.util.concurrent.jsr166y.LinkedTransferQueue.xfer(LinkedTransferQueue.java:615)
   org.elasticsearch.common.util.concurrent.jsr166y.LinkedTransferQueue.take(LinkedTransferQueue.java:1109)
   org.elasticsearch.common.util.concurrent.SizeBlockingQueue.take(SizeBlockingQueue.java:162)
   java.util.concurrent.ThreadPoolExecutor.getTask(Unknown Source)
   java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source)
   java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
   java.lang.Thread.run(Unknown Source)

On Tuesday, June 10, 2014 3:51:03 PM UTC-7, Mark Walkom wrote:

SSDs help, though there is likely some other issue here so it's probably
not worth looking at, at this time.

Have you checked hot threads or the slow query log?
Can you provide more specs on your hardware? What java version are you
running?

Regards,
Mark Walkom

Infrastructure Engineer
Campaign Monitor
email: ma...@campaignmonitor.com
web: www.campaignmonitor.com

On 11 June 2014 08:41, sai...@roblox.com wrote:

The Heap Size is being reduced to 30GB to ensure that's not the bottleneck.

The servers currently run SAS Drives. Though SSDs are usually preferred
for Elasticsearch, can this cause such disparities in
performance? ElasticHQ reports very high Refresh Rates, Search-Fetch and
Search-Query rates.

On Tuesday, June 10, 2014 3:17:49 PM UTC-7, Mark Walkom wrote:

You will likely see an increase by distributing it to one shard per
machine, but that's hard to quantify without actually doing it.

Also, you may be doing yourself a disservice with such a large heap size
as Nik mentioned. Over 32GB, Java pointers are not compressed and you do
lose a bit of performance due to this.
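(A sketch of capping the heap below that threshold; how the variable reaches the JVM depends on your init scripts, so treat the exact location as an assumption:)

```shell
# Hypothetical startup environment for ES 1.x: pin the heap at 30g,
# under the ~32GB compressed-oops cutoff mentioned above.
export ES_HEAP_SIZE=30g
```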

Regards,
Mark Walkom

Infrastructure Engineer
Campaign Monitor
email: ma...@campaignmonitor.com
web: www.campaignmonitor.com

On 11 June 2014 07:20, sai...@roblox.com wrote:

Thanks for the clarification. The servers aren't under any (read) load
yet. There is a constant stream of updates in the background, roughly 60
index writes per second. The refresh interval is set to 60s. Can this be a
performance bottleneck?
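(For reference, that setting can be checked or changed live via the index settings API; a sketch of the settings body, PUT to /<index>/_settings, with the index name left as a placeholder:)

```json
{
  "index": {
    "refresh_interval": "60s"
  }
}
```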

We can add more nodes to bring it up to 10 Nodes - 5 Shards with 1
Replica. But I doubt that will reduce the Empty Search Query to 50ms.
Are there any other profiling tools out there to debug the response time?
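(The shard arithmetic behind that layout: 5 primaries with 1 replica is 10 shard copies total, so 10 nodes would put exactly one copy per node. A quick sketch with the numbers from this thread:)

```python
# Shard layout arithmetic for the proposed scale-out.
primaries = 5
replicas = 1
nodes = 10

total_copies = primaries * (1 + replicas)   # primaries plus replica copies
copies_per_node = total_copies / nodes      # evenly balanced cluster

print(total_copies, copies_per_node)  # 10 1.0
```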

On Tuesday, June 10, 2014 11:30:03 AM UTC-7, Nikolas Everett wrote:

Short answer: yes.
Long answer: 500ms is a long time for the empty query. I see 2ms from
elasticsearch and 23ms from time in development. In production I see maybe
54ms from elasticsearch and 70ms from time, across far far more shards and
more data. When I do the same query across thousands of shards and a
couple of TB of data I get ~250ms. Production is 16 servers with 96GB of
ram and 30GB heaps.

The analyzers really aren't going to hurt performance.

I'd have a look at your servers themselves: what kind of load are they
under? What is your indexing rate? that kind of thing.

Also, 30GB is normally the sweet spot for heap sizes, making ~64GB of
total ram the sweet spot for total ram. 110GB heap is pretty high and I'd
expect for new generation (pause the world) garbage collection to take a
while there.
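(One low-effort way to check the GC theory: turn on GC logging with the standard HotSpot flags for Java 7 and look for long pauses. Passing them through ES_JAVA_OPTS is an assumption about your startup scripts:)

```shell
# Hypothetical: standard HotSpot GC-logging flags passed to the ES JVM,
# so pause times show up in the node's stdout/GC log.
export ES_JAVA_OPTS="-verbose:gc -XX:+PrintGCDetails -XX:+PrintGCTimeStamps"
```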

Nik

On Tue, Jun 10, 2014 at 2:20 PM, sai...@roblox.com wrote:

I am currently running only 1 index with 5 shards, so both of those
queries yield the same response time. My main question is whether
scaling out is an option given the current replication scheme.

https://lh5.googleusercontent.com/-bz8iQd0KUaA/U5dMSGLNNFI/AAAAAAAAABg/tGJl0HOj4xo/s1600/Elasticsearch+Cluster.png

On Tuesday, June 10, 2014 11:15:26 AM UTC-7, Nikolas Everett wrote:

I imagine that depends on lots of stuff. Are you doing
elasticsearch:9200/_search or elasticsearch:9200/index/_search ? The
former can take quite a while if you have lots index and lots of shards.
If you can get away with not doing it, I would. The latter will only take
a long time if you have tons of shards. It should otherwise be pretty
quick.

On Tue, Jun 10, 2014 at 2:10 PM, sai...@roblox.com wrote:

We currently run our Elasticsearch (v1.0.2) cluster on 3 Nodes with 5
Shards and 1 Replication
Scheme. The total index size is about 70GB
(~140GB with replication).

The Empty Search (/_search) query takes 500-600 ms to respond. Will adding
in more Nodes help in this case? The Servers have 252gb of RAM and
110gb for Heap.

The Index uses the following analyzers - standard, lowercase, stop,
porter_stem. Will this degrade Query performance?


--
You received this message because you are subscribed to the Google Groups "elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email to elasticsearch+unsubscribe@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/elasticsearch/2af501f2-f634-4551-87dd-57f6a0f57b06%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.


(system) #17