Advice on what to do about garbage


(Nik Everett) #1

tl;dr: ParNew is kicking in 25% of the time. I want to track down why.
Please send advice.

I've been investigating ramping up load on my Elasticsearch cluster on and
off for months and CPU utilization has always been too high for me.
Usually I ramp up the load, check hot_threads and jstack's thread dumps and
find some stack traces that are a smoking gun. Then I go fix something
somewhere and try again. Well this time around I don't see any smoking
guns. When I check hot_threads and jstack's thread dumps Elasticsearch
looks bored. One or two threads are genuinely RUNNABLE. But top/ganglia
say I'm using 40% of all 12 of my CPUs.

So I checked garbage collection. Under normal load ParNew kicks in every
~1.1 seconds and runs for .08 seconds. When I ramp the load up it kicks in
every ~.3 seconds and runs for the same .08 seconds.
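Doing the arithmetic on those numbers (pause length divided by interval) is where the 25% figure comes from:

```python
# Rough GC overhead implied by the pause/interval numbers above.
normal = 0.08 / 1.1   # 0.08s pause every ~1.1s under normal load
ramped = 0.08 / 0.3   # same 0.08s pause every ~0.3s when ramped up
print(f"normal: {normal:.0%}, ramped: {ramped:.0%}")
# → normal: 7%, ramped: 27%
```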

I could spend a while optimizing GC configuration, but the internet tells me
I'm not likely to crank that .08 seconds down much further. I figure the best
thing to do is stop making so much stuff. But I'm really unsure what tools
there are for finding out how much stuff I've been making. Can anyone give
me any tips on where to go from here?
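The only idea I've had so far is to scrape an allocation rate out of the GC log itself: each ParNew line records the young-gen occupancy before and after the collection, so the delta between consecutive collections approximates what was allocated in between. A rough sketch, assuming the stock HotSpot -XX:+PrintGCDetails line format (the parsing is mine, not from any tool):

```python
import re

# Matches e.g. "12.345: [GC [ParNew: 819200K->9000K(921600K), 0.08 secs]"
PARNEW = re.compile(r"(\d+(?:\.\d+)?): \[GC.*?\[ParNew: (\d+)K->(\d+)K")

def allocation_rate(lines):
    """Average young-gen allocation rate in KB/s, estimated from a GC log."""
    events = []
    for line in lines:
        m = PARNEW.search(line)
        if m:
            ts, before, after = float(m.group(1)), int(m.group(2)), int(m.group(3))
            events.append((ts, before, after))
    rates = []
    for (t0, _, after0), (t1, before1, _) in zip(events, events[1:]):
        if t1 > t0:
            # Occupancy grew from after0 to before1 between the two collections.
            rates.append((before1 - after0) / (t1 - t0))
    return sum(rates) / len(rates) if rates else 0.0
```

That at least tells me how fast I'm allocating, though not what is doing the allocating.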

I'm using default configuration except for 30GB heaps and garbage
collection logging. I have 12 CPUs per node.

Nik

--
You received this message because you are subscribed to the Google Groups "elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email to elasticsearch+unsubscribe@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/elasticsearch/CAPmjWd25%3DC%3DCwZEQnBkttenM%2BcsJbaoZ%3DB7K7J1jWOzhXgy-Mw%40mail.gmail.com.
For more options, visit https://groups.google.com/d/optout.


(Mark Walkom) #2

What ES and java versions? How much data (size and index count), how many
nodes, replication factor?

If you've done a lot of back-end work to optimise all your queries, you are
probably just reaching the limit of your cluster.

Regards,
Mark Walkom

Infrastructure Engineer
Campaign Monitor
email: markw@campaignmonitor.com
web: www.campaignmonitor.com



(Nik Everett) #3

I'm really after experience folks have with finding things that allocate
lots of memory. I'm in the process of replacing an application built on a
very old version of Lucene that currently handles the traffic no problem on
half the hardware. Elasticsearch can do better than it is. Either I'm
abusing it or it needs fixing.

Because you asked: ES 1.1.0, OpenJDK 1.7.0_25, 16 nodes with 12 CPUs each,
90GB of RAM on each node with 30GB heaps. Currently 1768 indexes split
into 6179 shards, all with 2 replicas (so three total copies). 170,518,530
total documents taking up ~3.7TB. Many of the documents are pretty big. The
troublesome index is 20 shards, 5,177,757 documents taking up 252GB.
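As a back-of-the-envelope on those numbers (assuming 6179 is the primary-shard count and the copies spread evenly across the nodes), that's a lot of Lucene indexes per box:

```python
# Rough shard density per node, given the cluster numbers above.
primaries, copies, nodes = 6179, 3, 16
print(round(primaries * copies / nodes))  # → 1159 shard copies per node
```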

My queries are all pretty beefy - they hit maybe ten fields, all with
different boosts, and have a ~8000-hit rescore window that applies some
script scoring, filter scoring, and frequently a phrase query version of the
original query. Normally I return 50 hits at a time, on which I highlight
6ish fields. I also run the phrase suggester on almost every search. That
all sounds pretty heavy, and it was at first, but we've been able to improve
performance on the suggester and highlighting to the point where the
servers look pretty bored. Except for GC.
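For concreteness, the request body is shaped roughly like the following. The field names, boosts, and script are placeholders, not our real values; only the structure (multi-field match, ~8000-hit rescore window, 50 hits, highlighting, phrase suggester) matches what I described:

```python
# Approximate shape of the ES search request described above.
# All field names, boosts, and the script body are made-up placeholders.
query_body = {
    "query": {
        "multi_match": {
            "query": "user input",
            "fields": ["title^3", "heading^2", "text"],  # really ~10 boosted fields
        }
    },
    "rescore": {
        "window_size": 8000,  # the ~8000-hit rescore window
        "query": {
            "rescore_query": {
                "function_score": {
                    "query": {"match_phrase": {"text": "user input"}},
                    "script_score": {"script": "placeholder_scoring_script"},
                }
            }
        },
    },
    "size": 50,  # 50 hits at a time
    "highlight": {"fields": {"title": {}, "text": {}}},  # 6ish fields really
    "suggest": {
        "did_you_mean": {"text": "user input", "phrase": {"field": "text"}}
    },
}
```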

I suppose I can tweak the load generation to skip the suggester and/or skip
the highlighting and see where that puts me, but at that point I'm really
just guessing.

Nik



(system) #4