Minor temp performance hit when JVM heap size is reached

ppearcy · November 3, 2010, 5:20pm

Hey,
This isn't a huge deal, but I wanted to point it out to see if there
are any ways to avoid it. Current production load is pretty low, 40 or so
queries per second max. Performance is great and we know we can ramp
up the traffic to 25x this rate. Average response times are ~25ms and
max are ~200ms.

However, we have seen two performance hits, where averages jumped to a
second and maxes went up to 12 seconds. Both times, this directly
correlated with one of the machines in the cluster hitting its JVM
heap max for Elasticsearch. After the limit has been hit, each machine
has been fine.

Is there any way to avoid this hit? I'm guessing that the
GC kicks in and ends up blocking some processing.

Thanks,
Paul

kimchy · November 3, 2010, 5:37pm

Yes, it's basically the GC kicking in. One problem that might cause long GC
pauses is the memory being swapped to disk, which makes the GC really slow since it
needs to process the whole memory (thus loading the swapped memory back in). This
will be better in 0.13, as I added support (copied from Cassandra) to force
the OS not to swap out the JVM.

-shay.banon
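
To check whether swapping is actually in play on a node, one option on Linux is to look at the VmSwap field in /proc/<pid>/status for the Elasticsearch JVM process. Below is a minimal sketch, assuming a Linux kernel that reports VmSwap; the helper names (swapped_kb, read_swapped_kb) and the sample text are made up for illustration and are not part of Elasticsearch.

```python
import re
from pathlib import Path
from typing import Optional


def swapped_kb(status_text: str) -> Optional[int]:
    """Return the VmSwap value (in kB) found in /proc/<pid>/status text.

    Returns None when the field is missing, e.g. on kernels that do not
    report it.
    """
    match = re.search(r"^VmSwap:\s+(\d+)\s+kB", status_text, re.MULTILINE)
    return int(match.group(1)) if match else None


def read_swapped_kb(pid: int) -> Optional[int]:
    """Linux only: read the status file of the given process id."""
    return swapped_kb(Path(f"/proc/{pid}/status").read_text())


# Hypothetical status excerpt, only to show the parsing step.
sample = "Name:\tjava\nVmRSS:\t 2097152 kB\nVmSwap:\t  524288 kB\n"
print(swapped_kb(sample))  # prints 524288
```

A non-zero value for the JVM's pid around the time of a slow GC would be consistent with the explanation above, though it does not prove it on its own.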

tfreitas · November 3, 2010, 6:15pm

Hi

Maybe this link helps with JVM tuning:

Open Source and Enterprise Architecture: JVM Performance Tuning

Tony

ppearcy · November 3, 2010, 10:56pm

Awesome. Great link.

Like I said, not concerned, more curious. Will see how things look
after moving to 0.13.

Thanks,
Paul

kimchy · November 3, 2010, 11:13pm

Just one note regarding the no-swap feature in 0.13: it requires setting
the minimum and the maximum memory allocation for the JVM to the same value.
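
As a quick sanity check for that requirement, the sketch below compares the standard JVM flags -Xms (initial heap) and -Xmx (maximum heap) in a list of JVM arguments. It assumes the heap is set through those two flags and that the last occurrence of a flag wins; how a given Elasticsearch install passes them is not covered here, and the helper names (heap_flag_bytes, heap_is_fixed) are made up for illustration.

```python
import re
from typing import Optional, Sequence

# Size suffixes handled by this sketch (bytes when no suffix is given).
_UNITS = {"": 1, "k": 1024, "m": 1024 ** 2, "g": 1024 ** 3}


def heap_flag_bytes(args: Sequence[str], flag: str) -> Optional[int]:
    """Return the size in bytes of the given flag, or None if it is absent.

    If the flag appears more than once, the last occurrence is used
    (an assumption of this sketch).
    """
    pattern = re.compile(rf"^{re.escape(flag)}(\d+)([kKmMgG]?)$")
    value = None
    for arg in args:
        match = pattern.match(arg)
        if match:
            value = int(match.group(1)) * _UNITS[match.group(2).lower()]
    return value


def heap_is_fixed(args: Sequence[str]) -> bool:
    """True when both -Xms and -Xmx are present and resolve to the same size."""
    xms = heap_flag_bytes(args, "-Xms")
    xmx = heap_flag_bytes(args, "-Xmx")
    return xms is not None and xms == xmx


print(heap_is_fixed(["-Xms2g", "-Xmx2048m"]))  # prints True
print(heap_is_fixed(["-Xms256m", "-Xmx1g"]))   # prints False
```

Sizes are compared in bytes rather than as strings, so -Xms2g and -Xmx2048m count as equal.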

ppearcy · December 2, 2010, 10:25pm

Just wanted to say that 0.13.0 has been running for around a week now,
and we haven't hit this case again. Really awesome work. Many thanks
to Shay and the Cassandra team for figuring out how to solve this
case.
