High load average running on ES node

Arjit_Gupta · January 5, 2014, 4:49am

Hi,

I have a 4-node cluster with 32 GB of RAM and 8-core processors, with 5 indexes,
5 primary shards, and 2 replicas.
I am using Elasticsearch version 0.90.1.
I have a lot of reads, writes, and deletes. Under high load, the load average on
one of the nodes usually goes to 70-80, while on the others it is around 10.
I attached jconsole and am sharing the screenshots. I see a lot of GC cycles
happening.
I was going
through http://jprante.github.io/2012/11/28/Elasticsearch-Java-Virtual-Machine-settings-explained.html
In the section "Avoiding Stop-the-world phases", it says to adjust index.merge.policy.segments_per_tier.
I am using the default merge values as of now.

Jstack of the ES node under high load:

https://gist.github.com/arjitgupta/8264462

2014-01-04 16:03:56
Full thread dump Java HotSpot(TM) 64-Bit Server VM (20.1-b02 mixed mode):

"Attach Listener" daemon prio=10 tid=0x0000000042c17000 nid=0x5eab waiting on condition [0x0000000000000000]
   java.lang.Thread.State: RUNNABLE

"elasticsearch[sp-cms-hoodoo-search8][[fk_mp_product_category_node_ds5][0]: Lucene Merge Thread #17458]" daemon prio=10 tid=0x00007fdaa5f71000 nid=0x5eaa sleeping[0x00007fdabe77c000]
   java.lang.Thread.State: TIMED_WAITING (sleeping)
	at java.lang.Thread.sleep(Native Method)
	at java.lang.Thread.sleep(Thread.java:302)

(The dump is truncated here; the full file is in the gist.)
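
For readers who want a quick overview of a dump like the one excerpted above, here is a minimal Python sketch (not from the thread) that tallies threads by JVM state and counts Lucene merge threads. The function names and the SAMPLE string are illustrative assumptions; the only thing taken from the dump is its line format.

```python
import re
from collections import Counter

# Matches lines such as "   java.lang.Thread.State: TIMED_WAITING (sleeping)"
STATE_RE = re.compile(r"^\s*java\.lang\.Thread\.State:\s+(\w+)")


def count_thread_states(dump_text):
    """Return a Counter mapping each JVM thread state to its thread count."""
    states = Counter()
    for line in dump_text.splitlines():
        match = STATE_RE.match(line)
        if match:
            states[match.group(1)] += 1
    return states


def count_merge_threads(dump_text):
    """Count thread header lines whose name mentions 'Lucene Merge Thread'."""
    return sum(
        1
        for line in dump_text.splitlines()
        if line.startswith('"') and "Lucene Merge Thread" in line
    )


# Two threads copied from the excerpt above, used only as sample input.
SAMPLE = '''"Attach Listener" daemon prio=10 tid=0x0000000042c17000 nid=0x5eab waiting on condition [0x0000000000000000]
   java.lang.Thread.State: RUNNABLE

"elasticsearch[sp-cms-hoodoo-search8][[fk_mp_product_category_node_ds5][0]: Lucene Merge Thread #17458]" daemon prio=10 tid=0x00007fdaa5f71000 nid=0x5eaa sleeping[0x00007fdabe77c000]
   java.lang.Thread.State: TIMED_WAITING (sleeping)
    at java.lang.Thread.sleep(Native Method)
    at java.lang.Thread.sleep(Thread.java:302)
'''
```

Calling count_thread_states() on the text of a saved dump file shows how many threads are RUNNABLE versus waiting, and count_merge_threads() gives a rough idea of how much merge activity the node had at that moment.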

--
You received this message because you are subscribed to the Google Groups "elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email to elasticsearch+unsubscribe@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/elasticsearch/d5d2c8e5-5eab-4af9-bd06-951c8786ddaa%40googlegroups.com.
For more options, visit https://groups.google.com/groups/opt_out.

vaidik · January 5, 2014, 6:34am

I have seen similar problems. I am no Elasticsearch expert yet, but I'd
suggest trying the G1 garbage collector instead of the default CMS
garbage collector. From what I know, CMS was never made for such large JVM
heaps. G1 works better with large JVM heaps: it runs more frequently rather
than after a long interval, which keeps each GC run shorter and helps avoid
long stop-the-world pauses.

I am not sure if this will help, but you can give it a try.

Vaidik Kapoor
vaidikkapoor.info

Arjit_Gupta · January 5, 2014, 7:35am

I really don't know if G1 is production-ready on Java 6. Are you using it on
Java 6?

Java version on my servers:
java version "1.6.0_26"
Java(TM) SE Runtime Environment (build 1.6.0_26-b03)
Java HotSpot(TM) 64-Bit Server VM (build 20.1-b02, mixed mode)

Thanks,
Arjit

vaidik · January 5, 2014, 7:36am

No, I'm using it with Java 7. In any case, Java 7 is the recommended version.
Is it not possible for you to move to Java 7?

jprante · January 5, 2014, 5:25pm

A load like that is not much of a surprise for an 8-core node; I have also
observed loads of 80-100.

When induced by indexing, this high load can be reduced significantly by
using a high-performance disk I/O subsystem, such as SSDs. The
disks are the slowest part of the system and generate high I/O wait, which
drives up the load average.
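
To check whether I/O wait is really what is driving the load on a node, one option is to compare two samples of the aggregate "cpu" line in /proc/stat. The following is a minimal Python sketch, not from the thread, assuming a Linux host; the function names are made up for illustration.

```python
import time


def parse_cpu_line(stat_text):
    """Parse the aggregate 'cpu' line of /proc/stat into a list of tick counters."""
    for line in stat_text.splitlines():
        fields = line.split()
        if fields and fields[0] == "cpu":
            return [int(value) for value in fields[1:]]
    raise ValueError("no aggregate 'cpu' line found")


def iowait_fraction(before, after):
    """Fraction of CPU time spent in iowait between two samples.

    Field order in /proc/stat: user nice system idle iowait irq softirq steal.
    Only the first eight fields are summed, because the guest counters that
    follow are already included in user and nice.
    """
    deltas = [b - a for a, b in zip(before[:8], after[:8])]
    total = sum(deltas)
    if total <= 0:
        return 0.0
    return deltas[4] / total


def sample_iowait(interval=1.0, path="/proc/stat"):
    """Read /proc/stat twice, `interval` seconds apart (Linux only)."""
    with open(path) as handle:
        before = parse_cpu_line(handle.read())
    time.sleep(interval)
    with open(path) as handle:
        after = parse_cpu_line(handle.read())
    return iowait_fraction(before, after)
```

Running sample_iowait(5.0) on the busy node during indexing gives the share of CPU time spent waiting on disk over those five seconds. A consistently large share would point at the disks rather than at GC.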

GC generates high load too. This is mostly related to expensive queries
that use filters or caches. The overall performance of the JVM gets
very poor in that case.

You have several options:

- rewrite queries or reconfigure ES for efficient cache usage
- add nodes
- decrease the heap slightly to smooth the steep edge when stop-the-world
  GC kicks in (but whether your ES cluster can work with less heap depends
  on the workload)

G1 GC does not help against query/filter load and does not decrease CPU
load. In fact, it puts more CPU load on the machines, in exchange for
less stop-the-world time. G1 GC helps to push
the stop-the-world periods under a certain limit, so ES nodes do not
disconnect that easily. It has no steep edge when performing stop-the-world
GC phases.

Please note that G1 GC currently seems safe only with Java 7 or Java 8 and ES
versions that have replaced GNU trove4j with the HPPC library, that is, 0.90.9
or 1.0.0.Beta2.

Jörg

Arjit_Gupta · January 9, 2014, 5:16am

Hi Jörg,

Thanks a lot for your detailed reply.
Can you please explain how I can reconfigure ES for efficient cache
usage?

Thanks,
Arjit
