Elasticsearch client nodes continuously getting into GC loops and failing to restart

I have two Elasticsearch client nodes (2.3.5), each with a 6GB heap on 8GB of RAM. Both client nodes frequently get into a continuous GC loop that never completes. The client nodes serve search requests from Grafana and Kibana.

The data nodes do not show this behavior. There are 8 data nodes, each with 16GB RAM and a 12GB heap.

The client node logs show this:

[2017-01-25 15:58:05,553][INFO ][node                     ] [blp04910017] started
[2017-01-25 16:00:03,520][WARN ][monitor.jvm              ] [blp04910017] [gc][old][105][2] duration [16.2s], collections [1]/[16.9s], total [16.2s]/[16.2s], memory [5.3gb]->[5.5gb]/[5.9gb], all_pools {[young] [5.3mb]->[25.4mb]/[266.2mb]}{[survivor] [33.2mb]->[0b]/[33.2mb]}{[old] [5.3gb]->[5.5gb]/[5.6gb]}
[2017-01-25 16:00:21,672][WARN ][monitor.jvm              ] [blp04910017] [gc][old][106][3] duration [17.7s], collections [1]/[18.1s], total [17.7s]/[34s], memory [5.5gb]->[5.8gb]/[5.9gb], all_pools {[young] [25.4mb]->[164.1mb]/[266.2mb]}{[survivor] [0b]->[0b]/[33.2mb]}{[old] [5.5gb]->[5.6gb]/[5.6gb]}
[2017-01-25 16:03:45,743][WARN ][monitor.jvm              ] [blp04910017] [gc][old][119][17] duration [15.7s], collections [1]/[15.7s], total [15.7s]/[3.9m], memory [5.9gb]->[5.9gb]/[5.9gb], all_pools {[young] [266.2mb]->[266.2mb]/[266.2mb]}{[survivor] [33.2mb]->[32.4mb]/[33.2mb]}{[old] [5.6gb]->[5.6gb]/[5.6gb]}
[2017-01-25 16:03:57,145][WARN ][monitor.jvm              ] [blp04910017] [gc][old][120][18] duration [11.3s], collections [1]/[11.4s], total [11.3s]/[4.1m], memory [5.9gb]->[5.9gb]/[5.9gb], all_pools {[young] [266.2mb]->[266.2mb]/[266.2mb]}{[survivor] [32.4mb]->[32.6mb]/[33.2mb]}{[old] [5.6gb]->[5.6gb]/[5.6gb]}
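The pattern I am reading from those lines is that each old-gen collection takes 11-17s and reclaims almost nothing (old stays at 5.6gb of 5.6gb). To eyeball this across the whole log I pull the durations out with sed (a sketch that matches the 2.x monitor.jvm format shown above; the sample line here is copied from the log):

```shell
# Extract the GC duration from a monitor.jvm log line (format as in the post).
# In practice you would pipe the whole log file through the same sed expression.
line='[2017-01-25 16:00:21,672][WARN ][monitor.jvm              ] [blp04910017] [gc][old][106][3] duration [17.7s], collections [1]/[18.1s], total [17.7s]/[34s]'
echo "$line" | sed -n 's/.*duration \[\([^]]*\)\].*/\1/p'
```

For the sample line this prints `17.7s`.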

I cannot restart the service either.

# service elasticsearch restart
Stopping elasticsearch:                                    [FAILED]
Starting elasticsearch: Java HotSpot(TM) 64-Bit Server VM warning: INFO: os::commit_memory(0x0000000654cc0000, 6093537280, 0) failed; error='Cannot allocate memory' (errno=12)
# There is insufficient memory for the Java Runtime Environment to continue.
# Native memory allocation (mmap) failed to map 6093537280 bytes for committing reserved memory.
# An error report file with more information is saved as:
# /tmp/hs_err_pid11988.log

The process does not respond to a plain kill; I have to use kill -4 to terminate it.
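Since the restart fails with errno=12 (the JVM cannot mmap ~6GB), I now check whether the OS can actually hand out that much memory before retrying. A sketch of the checks I run (standard Linux procfs paths, nothing Elasticsearch-specific):

```shell
# How much memory the kernel can still commit; the errno=12 failure above
# means a ~6 GB commit was refused. (MemAvailable needs kernel 3.14+;
# CommitLimit/Committed_AS exist on older kernels too.)
grep -E 'MemAvailable|CommitLimit|Committed_AS' /proc/meminfo

# Overcommit policy: 2 (strict) can fail a 6 GB mmap even when RAM
# looks nominally free, because the old hung process still holds its commit.
cat /proc/sys/vm/overcommit_memory
```

In my case the old process still holding its memory would explain why the new JVM cannot start until the old one is killed.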

The elasticsearch.yml config for the client nodes:

discovery.zen.minimum_master_nodes: 2
bootstrap.mlockall: 1
action.disable_delete_all_indices: True
indices.fielddata.cache.size: 40%
index.number_of_shards: 7
discovery.zen.fd.ping_retries: 5
search.default_search_timeout: 10s
discovery.zen.fd.ping_interval: 15s
cluster.name: es-garden
discovery.zen.ping.unicast.hosts: ['', '', '', '', '', '', '', '']
discovery.zen.ping.multicast.enabled: False
node.data: 0
discovery.zen.no_master_block: write
discovery.zen.fd.ping_timeout: 60s
index.number_of_replicas: 1
node.name: blp04910017
node.master: false

A screenshot from Kopf when this happens:

You can see that the heap of one of the client nodes, blp04910017, is full. (Click on the image to see the full view.)

If it helps, I see this in the log when the client nodes start:

[2017-01-25 16:14:35,551][INFO ][indices.breaker          ] [blp04910017] Updated breaker settings fielddata: [fielddata,type=MEMORY,limit=1278030643/1.1gb,overhead=1.03]
[2017-01-25 16:14:35,551][INFO ][indices.breaker          ] [blp04910017] Updated breaker settings parent: [parent,type=PARENT,limit=2130051072/1.9gb,overhead=1.0]
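One thing that struck me about those numbers (my own arithmetic, so treat it as a sanity check rather than a diagnosis): relative to the 6GB heap I configured, the logged limits come out well below the 2.x defaults of 60% of heap for the fielddata breaker and 70% for the parent breaker.

```shell
# Sanity check: the logged breaker limits as a share of a 6 GB heap.
# HEAP is my configured -Xmx; the other two numbers are copied from the
# "Updated breaker settings" log lines above.
HEAP=6442450944          # 6 GB in bytes
FIELDDATA=1278030643     # logged fielddata breaker limit (1.1gb)
PARENT=2130051072        # logged parent breaker limit (1.9gb)
echo "fielddata: $((FIELDDATA * 100 / HEAP))% of heap"
echo "parent:    $((PARENT * 100 / HEAP))% of heap"
```

That prints 19% and 33%, which makes me wonder whether the JVM is actually getting the full 6GB heap I think it is.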

I would appreciate any help from the community in solving this.
