Hey,

This isn't a huge deal, but I wanted to point it out to see if there are any ways to avoid it. Current production load is pretty low, 40 or so queries per second at most. Performance is great, and we know we can ramp the traffic up to 25x this rate. Average response times are ~25ms and max is ~200ms.

However, we have seen two performance hits, where averages jumped to a second and maximums went up to 12 seconds. Both times this has occurred, it directly correlates with one of the machines in the cluster hitting its JVM heap max for Elasticsearch. After the limit has been hit, each machine has been fine.

Any ideas if there is any way to avoid this hit? I'm guessing that the GC kicks in and ends up blocking some processing.
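(A quick way to check that guess is to watch the JVM's garbage-collector counters and line any jumps up against the slow responses. Below is a minimal sketch using only the standard java.lang.management API; the class name is made up for the example, and it would need to run inside the node's JVM, or you could read the same MXBeans remotely over JMX.)

import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;

// Minimal sketch: periodically print cumulative GC counts and times so a
// latency spike can be matched against a jump in collection time.
public class GcWatcher {
    public static void main(String[] args) throws InterruptedException {
        while (true) {
            for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
                System.out.printf("%s: collections=%d, totalTimeMs=%d%n",
                        gc.getName(), gc.getCollectionCount(), gc.getCollectionTime());
            }
            Thread.sleep(5000); // sample every 5 seconds
        }
    }
}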
Yes, it's basically the GC kicking in. One problem that can cause a long GC is memory being swapped to disk, which makes the GC really slow since it needs to process the whole heap (thus paging the swapped memory back in). This will be better in 0.13, as I added support (copied from Cassandra) to force the OS not to swap out the JVM.
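(For reference, the Cassandra trick being borrowed here is to call the libc mlockall(2) function through JNA so the kernel keeps the process's pages resident instead of swapping them out. The sketch below only illustrates that technique and is not Elasticsearch's actual code; it assumes the JNA jar is on the classpath, the class and method names are made up for the example, and on Linux the call only succeeds if the 'max locked memory' ulimit is high enough.)

import com.sun.jna.Native;

// Illustrative sketch of locking the JVM's memory via mlockall(2), bound
// through JNA direct mapping -- the same general approach Cassandra uses.
public class MemoryLock {
    private static final int MCL_CURRENT = 1; // Linux: lock all currently mapped pages

    private static native int mlockall(int flags); // returns 0 on success, -1 on failure

    private static boolean jnaAvailable = false;

    static {
        try {
            Native.register("c"); // bind the declared native method against libc
            jnaAvailable = true;
        } catch (NoClassDefFoundError | UnsatisfiedLinkError e) {
            // JNA jar or libc binding not available; the JVM simply stays swappable.
        }
    }

    public static void tryLockMemory() {
        if (jnaAvailable && mlockall(MCL_CURRENT) != 0) {
            System.err.println("mlockall failed -- check the 'max locked memory' ulimit");
        }
    }
}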
Just wanted to say that 0.13.0 has been running for around a week now and we haven't hit this case again. Really awesome work. Many thanks to Shay and the Cassandra team for figuring out how to solve this case.