Minor temporary performance hit when JVM heap size is reached

Hey,
This isn't a huge deal, but I wanted to point it out and see if there
are any ways to avoid it. Current production load is pretty low, 40 or so
queries per second max. Performance is great and we know we can ramp
the traffic up to 25x this rate. Average response times are ~25ms and
the max is ~200ms.

However, we have seen two performance hits where averages jumped to a
second and maxes went up to 12 seconds. Both times this occurred, it
directly correlated with one of the machines in the cluster hitting its JVM
heap max for elasticsearch. After the limit has been hit, each machine
has been fine.

Any ideas if there is any way to avoid this hit? I'm guessing that
GC kicks in and ends up blocking some processing.

Thanks,
Paul

Yes, it's basically the GC kicking in. One problem that can cause long GC
pauses is the heap being swapped to disk; that makes the GC really slow, since it
needs to walk the whole heap (and thus page the swapped memory back in). This
will be better in 0.13, as I added support (copied from Cassandra) to force
the OS not to swap out the JVM.

-shay.banon
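
For anyone curious what "force the OS not to swap out the JVM" looks like in
practice: the trick Cassandra uses is to call mlockall(2) through JNA so the
kernel pins the process's pages in RAM. The sketch below is only an
illustration of that idea, not the actual elasticsearch or Cassandra code; the
class name and error handling are made up for the example, and the MCL_*
constants are the Linux values.

    import com.sun.jna.Native;

    public class MemoryLock {
        // Linux values for the mlockall(2) flags.
        private static final int MCL_CURRENT = 1; // lock pages currently mapped
        private static final int MCL_FUTURE  = 2; // lock pages mapped later

        // Direct-mapped native declaration; bound to libc by Native.register below.
        private static native int mlockall(int flags);

        public static void tryLockMemory() {
            try {
                Native.register("c"); // bind this class's native methods to libc
                int rc = mlockall(MCL_CURRENT | MCL_FUTURE);
                if (rc != 0) {
                    // Usually means RLIMIT_MEMLOCK is too low or we lack privileges.
                    System.err.println("mlockall() returned " + rc
                            + "; JVM memory may still be swapped");
                }
            } catch (UnsatisfiedLinkError e) {
                // No libc / no mlockall on this platform (e.g. Windows): skip quietly.
                System.err.println("mlockall not available: " + e.getMessage());
            }
        }

        public static void main(String[] args) {
            tryLockMemory();
        }
    }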


Hi,

Maybe this link helps with JVM tuning: Open Source and Enterprise Architecture: JVM Performance Tuning

Tony


Awesome. Great link.

Like I said, not concerned, more curious. Will see how things look
after moving to 0.13.

Thanks,
Paul


Just one note regarding the 0.13 no-swap feature: it requires setting
the minimum and the maximum memory allocation for the JVM to the same value.
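
In other words, point -Xms and -Xmx at the same value before starting the node,
presumably so the whole heap is allocated (and can be locked) up front. A quick
sketch of what that might look like; the 2g figure is only an example, and the
ES_MIN_MEM/ES_MAX_MEM variable names are from the 0.x bin/elasticsearch scripts
as I remember them, so double-check your own startup script:

    # Sketch: fix the heap size so the whole thing can be locked in RAM.
    # Either set the JVM flags directly...
    JAVA_OPTS="-Xms2g -Xmx2g"   # min (-Xms) and max (-Xmx) heap identical

    # ...or, if your bin/elasticsearch script reads these (check yours):
    export ES_MIN_MEM=2g
    export ES_MAX_MEM=2g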


Just wanted to say that 0.13.0 has been running for around a week now,
and we haven't hit this case again. Really awesome work. Many thanks
to Shay and the Cassandra team for figuring out how to solve this.
