Long GC Pauses


(Vahid) #1

Hi all,

There are 50 indexes, each containing 3 primary shards and 1 replica. Some
threads run every 15 minutes to search and index a few documents (each
thread processes at most 10 docs).

After some days, ES gets into long GC pauses and eventually ends up with a
split brain problem.

From bigdesk we could see that more than 60% of the heap is unused.

The only thing we think could be a problem is the many manual refresh
calls, but I'm not sure.

4 GB is assigned to the ES process; Xms and Xmx are set to the same value.

Total system memory is 12 GB.

3 other Java processes use almost 4 GB.

ES version: 0.90.3

Java version: 1.7.0_25

ES JVM configuration:

JAVA_OPTS="$JAVA_OPTS -XX:+UseParNewGC"
JAVA_OPTS="$JAVA_OPTS -XX:+UseConcMarkSweepGC"
JAVA_OPTS="$JAVA_OPTS -XX:CMSInitiatingOccupancyFraction=75"
JAVA_OPTS="$JAVA_OPTS -XX:+UseCMSInitiatingOccupancyOnly"
JAVA_OPTS="$JAVA_OPTS -XX:+UseCondCardMark"
JAVA_OPTS="$JAVA_OPTS -XX:+UseTLAB"
JAVA_OPTS="$JAVA_OPTS -XX:+CMSClassUnloadingEnabled"
JAVA_OPTS="$JAVA_OPTS -XX:MaxGCPauseMillis=10000"

ES configuration:

index.cache.filter.max_size: 10
index.store.throttle.type: merge
index.compound_format: false
index.cache.field.expire: 1m
index.merge.policy.merge_factor: 30
index.cache.filter.expire: 1m
index.refresh_interval: -1
index.number_of_replicas: 1
index.version.created: 200599
index.store.throttle.max_bytes_per_sec: 5mb
index.number_of_shards: 3
index.translog.flush_threshold_period: 60s
index.merge.policy.use_compound_file: false
index.store.compress.stored: true
index.cache.field.type: resident
index.indices.memory.index_buffer_size: 20%

bootstrap.mlockall is not configured yet, but I don't think memory
swapping is a problem at the moment.

Can someone help?

Thanks in advance,

Vahid

--
You received this message because you are subscribed to the Google Groups "elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email to elasticsearch+unsubscribe@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/elasticsearch/51a19985-4f36-44af-b4dd-b2fb27556717%40googlegroups.com.
For more options, visit https://groups.google.com/groups/opt_out.


(Radu Gheorghe) #2

Hi Vahid,

I can't say what your problem is (maybe someone else has an insight; all
your settings look fine to me), but here are some leads:

  • it would be interesting to know whether switching to the G1 garbage
    collector would help
  • maybe upgrading your JVM would help, even though yours is pretty fresh
  • it would be interesting to see how your memory pools and garbage
    collection are doing over time. SPM for Elasticsearch
    (http://sematext.com/spm/elasticsearch-performance-monitoring/) can
    help you with that, and there's a free plan that should be good enough
    for diagnostics. With this information, you'll probably be able to tune
    your GC settings for shorter pauses (maybe share some graphs here and I'm
    sure someone will give you useful hints)

Best regards,
Radu

On Tue, Dec 3, 2013 at 5:52 PM, Vahid vhasani57@gmail.com wrote:


--
Performance Monitoring * Log Analytics * Search Analytics
Solr & Elasticsearch Support * http://sematext.com/



(Alexander Reelsen) #3

Hey,

If your total system memory is 12 GB, and you have 4 GB of heap plus 4 GB
for the other Java processes, there are only 4 GB left for the file system
cache. That is pretty easy to fill if you are doing quite a few searches on
that machine, which makes setting bootstrap.mlockall crucial. If a garbage
collection has to go to disk/swap in order to collect garbage, I am not
surprised it is very slow.
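The memory budget above is simple subtraction; the sketch below makes it explicit with the figures from the original post. The function name is illustrative, and it deliberately ignores off-heap JVM memory (thread stacks, permgen, direct buffers) and the OS itself, so the real file system cache will be smaller than this estimate.

```shell
#!/bin/sh
# Rough memory budget, all values in whole GB (integer shell arithmetic).
# Ignores off-heap JVM memory and the OS, so this is an upper bound.
fs_cache_left_gb() {
  total=$1 es_heap=$2 other_jvms=$3
  echo $((total - es_heap - other_jvms))
}

fs_cache_left_gb 12 4 4   # prints 4
```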

Are there any specific reasons you set all these additional JVM options?
There is no dynamic JVM language involved, so why run with
CMSClassUnloadingEnabled? Can you try with the defaults first, before
tuning, in order to eliminate those as a source of these problems? The same
goes for the pause time goal and the TLAB (thread-local allocation buffer)
setting. I don't see anything special in your setup that would make the
standard settings a bad choice.
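One way to run the experiment suggested above is a baseline that keeps only the four CMS-related lines from the original post and drops the extra tuning flags. This is a sketch for comparison runs, not a recommended production configuration. HotSpot already enables TLABs by default, so -XX:+UseTLAB only restates the default.

```shell
# Baseline for comparison: only the CMS flags from the original post.
# Dropped for this experiment: -XX:+UseCondCardMark, -XX:+UseTLAB,
# -XX:+CMSClassUnloadingEnabled, -XX:MaxGCPauseMillis=10000
JAVA_OPTS="$JAVA_OPTS -XX:+UseParNewGC"
JAVA_OPTS="$JAVA_OPTS -XX:+UseConcMarkSweepGC"
JAVA_OPTS="$JAVA_OPTS -XX:CMSInitiatingOccupancyFraction=75"
JAVA_OPTS="$JAVA_OPTS -XX:+UseCMSInitiatingOccupancyOnly"
```

Changing one thing at a time makes it easier to tell whether the dropped flags had any effect on pause length.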

Also, I would postpone upgrading your JVM, as newer versions have problems
with Lucene which are not fixed yet; I would stay with the current one. I
would not recommend using G1, but you are free to try: some people report
speedups, but at least as many report JVM crashes :)

Last, graphing the output of the nodes stats API might make sense here.
Take a special look at fielddata; maybe a part of the heap other than the
old gen pool is under pressure. See
http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/cluster-nodes-stats.html
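If installing a monitoring agent is not possible, plain curl can collect the nodes stats periodically so they can be graphed later. This is a sketch: the URL (/_nodes/stats with jvm=true) is an assumption based on the 0.90-era API, so check the nodes stats reference for the exact parameters in your version. poll_stats, FETCH and ES_STATS_URL are illustrative names, not part of Elasticsearch.

```shell
#!/bin/sh
# Save periodic nodes stats snapshots so heap and GC trends can be graphed later.
# ASSUMPTION: 0.90-era endpoint and "jvm" flag; verify against your version's docs.
ES_STATS_URL=${ES_STATS_URL:-"http://localhost:9200/_nodes/stats?jvm=true&pretty=true"}

# poll_stats URL INTERVAL_SECONDS COUNT OUT_DIR
# Writes COUNT snapshots to OUT_DIR as stats-<epoch>-<n>.json.
# FETCH (hypothetical variable, default "curl -s") is the command used to fetch.
poll_stats() {
  url=$1 interval=$2 count=$3 out_dir=$4
  fetch=${FETCH:-"curl -s"}
  mkdir -p "$out_dir" || return 1
  i=1
  while [ "$i" -le "$count" ]; do
    # $fetch is unquoted on purpose so "curl -s" splits into command + flag.
    $fetch "$url" > "$out_dir/stats-$(date +%s)-$i.json"
    if [ "$i" -lt "$count" ]; then
      sleep "$interval"
    fi
    i=$((i + 1))
  done
}

# Example: one snapshot per minute for 24 hours.
# poll_stats "$ES_STATS_URL" 60 1440 ./es-stats
```

Keeping raw JSON snapshots means the fields to graph (heap pools, GC counts, fielddata size) can be chosen afterwards.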

--Alex

On Tue, Dec 3, 2013 at 5:13 PM, Radu Gheorghe radu.gheorghe@sematext.com wrote:



(Vahid) #4

Hi,

Many thanks, Radu and Alex, for your replies.

At the moment I'm not allowed to install any application on the customer's
system, so using SPM is not an option for me.
I've taken a screenshot of one of the nodes; maybe it gives you more info.

Best regards,
Vahid
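Without installing anything, the GC log itself can be searched for long stop-the-world pauses. The sketch below assumes the JVM was started with -XX:+PrintGCApplicationStoppedTime (a standard HotSpot flag) and relies on its "Total time for which application threads were stopped: N seconds" lines, which cover every safepoint pause, not only GC. long_pauses is an illustrative helper name, and the log path in the usage line is a placeholder.

```shell
#!/bin/sh
# long_pauses GC_LOG THRESHOLD_SECONDS
# Prints every stopped-time line longer than THRESHOLD_SECONDS, then a summary.
long_pauses() {
  awk -v limit="$2" '
    /Total time for which application threads were stopped/ {
      # Extract the number right after "stopped: " (9 characters).
      if (match($0, /stopped: [0-9.]+/)) {
        secs = substr($0, RSTART + 9, RLENGTH - 9) + 0
        total++
        if (secs > limit + 0) { slow++; print }
      }
    }
    END { printf "%d of %d pauses longer than %s s\n", slow, total, limit }
  ' "$1"
}

# Example: list all pauses over one second.
# long_pauses /var/log/elasticsearch/gc.log 1
```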



(Vahid) #5

On this cluster (the one the graphs are from), bootstrap.mlockall=true is
configured, and the top command shows 0 swap used.


