Long GCs even after mlockall is set to true

Hi,

I am running a 3-node ES 2.3.2 cluster on Linux (2.6.32-400.37.1.el6uek.x86_64). Every few hours each node goes into long GC pauses lasting anywhere from 2 to 5 minutes; the node then gets disconnected, the cluster becomes unstable, and indexing and search requests fail.

I read that disabling swapping helps old-gen GC complete in milliseconds rather than seconds, so I set mlockall: true, and when I verify it via the _nodes API I can see it is set. Yet I still get the long GCs (both old and young). Can you give me some pointers?
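
For reference, this is roughly how I enabled and checked it. A minimal sketch of my setup; the host/port and the limits.conf entry are assumptions about my environment, not something from the logs below:

# elasticsearch.yml (ES 2.x setting name)
bootstrap.mlockall: true

# memlock limit raised so mlockall can actually succeed (assumed entry in /etc/security/limits.conf)
# elasticsearch - memlock unlimited

# verify on each node via the nodes info API
curl 'http://localhost:9200/_nodes?filter_path=**.mlockall&pretty'
# every node reports: "process" : { "mlockall" : true }

The logs from node2 around one of the incidents: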

[2017-08-24 17:10:31,748][WARN ][monitor.jvm              ] [node2] [gc][young][96270][321] duration [5.3s], collections [1]/[6.3s], total [5.3s]/[7.3m], memory [15.4gb]->[13.2gb]/[19.5gb], all_pools {[young] [3.3gb]->[68.4mb]/[3.4gb]}{[survivor] [440.9mb]->[440.9mb]/[440.9mb]}{[old] [11.6gb]->[12.7gb]/[15.6gb]}
[2017-08-24 17:10:33,667][INFO ][cluster.service          ] [node2] detected_master {node1}{mDHkISCtTNikmybnFKWAYg}{17x.xx.xx.221}{17x.xx.xx.221:9300}, added {{node1}{mDHkISCtTNikmybnFKWAYg}{17x.xx.xx.221}{17x.xx.xx.221:9300},}, reason: zen-disco-receive(from master [{node1}{mDHkISCtTNikmybnFKWAYg}{17x.xx.xx.221}{17x.xx.xx.221:9300}])
[2017-08-24 17:12:50,033][WARN ][monitor.jvm              ] [node2] [gc][young][96275][322] duration [1.8m], collections [1]/[2.2m], total [1.8m]/[9.2m], memory [16.3gb]->[5.2gb]/[19.5gb], all_pools {[young] [3.1gb]->[55.1mb]/[3.4gb]}{[survivor] [440.9mb]->[0b]/[440.9mb]}{[old] [12.7gb]->[5.2gb]/[15.6gb]}
[2017-08-24 17:12:50,033][WARN ][monitor.jvm              ] [node2] [gc][old][96275][13] duration [22.2s], collections [1]/[2.2m], total [22.2s]/[1.6m], memory [16.3gb]->[5.2gb]/[19.5gb], all_pools {[young] [3.1gb]->[55.1mb]/[3.4gb]}{[survivor] [440.9mb]->[0b]/[440.9mb]}{[old] [12.7gb]->[5.2gb]/[15.6gb]}
[2017-08-24 17:12:50,269][INFO ][discovery.zen            ] [node2] master_left [{node1}{mDHkISCtTNikmybnFKWAYg}{17x.xx.xx.221}{17x.xx.xx.221:9300}], reason [failed to ping, tried [3] times, each with  maximum [30s] timeout]
[2017-08-24 17:12:50,269][WARN ][discovery.zen            ] [node2] master left (reason = failed to ping, tried [3] times, each with  maximum [30s] timeout), current nodes: {{node2}{yetMu9irR4q25WCyI265lw}{17x.xx.xx.222}{node2/17x.xx.xx.222:9300},{node3}{i4Up59UlQzOqY4q-i4-ZAg}{17x.xx.xx.223}{17x.xx.xx.223:9300},}
[2017-08-24 17:12:50,270][INFO ][cluster.service          ] [node2] removed {{node1}{mDHkISCtTNikmybnFKWAYg}{17x.xx.xx.221}{17x.xx.xx.221:9300},}, reason: zen-disco-master_failed ({node1}{mDHkISCtTNikmybnFKWAYg}{17x.xx.xx.221}{17x.xx.xx.221:9300})
[2017-08-24 17:12:53,493][INFO ][cluster.service          ] [node2] detected_master {node1}{mDHkISCtTNikmybnFKWAYg}{17x.xx.xx.221}{17x.xx.xx.221:9300}, added {{node1}{mDHkISCtTNikmybnFKWAYg}{17x.xx.xx.221}{17x.xx.xx.221:9300},}, reason: zen-disco-receive(from master [{node1}{mDHkISCtTNikmybnFKWAYg}{17x.xx.xx.221}{17x.xx.xx.221:9300}])
[2017-08-24 17:12:55,917][WARN ][monitor.jvm              ] [node2] [gc][young][96279][323] duration [2.3s], collections [1]/[2.8s], total [2.3s]/[9.2m], memory [8gb]->[6.6gb]/[19.5gb], all_pools {[young] [2.8gb]->[54.3mb]/[3.4gb]}{[survivor] [0b]->[440.9mb]/[440.9mb]}{[old] [5.2gb]->[6.1gb]/[15.6gb]}
[2017-08-24 17:13:08,406][WARN ][monitor.jvm              ] [node2] [gc][young][96284][324] duration [8.2s], collections [1]/[8.4s], total [8.2s]/[9.3m], memory [10gb]->[8.8gb]/[19.5gb], all_pools {[young] [3.4gb]->[401.1mb]/[3.4gb]}{[survivor] [440.9mb]->[440.9mb]/[440.9mb]}{[old] [6.1gb]->[7.9gb]/[15.6gb]}
[2017-08-24 17:13:20,165][WARN ][monitor.jvm              ] [node2] [gc][young][96288][325] duration [8s], collections [1]/[8.5s], total [8s]/[9.5m], memory [10.8gb]->[10.8gb]/[19.5gb], all_pools {[young] [2.4gb]->[903.2mb]/[3.4gb]}{[survivor] [440.9mb]->[440.9mb]/[440.9mb]}{[old] [7.9gb]->[9.5gb]/[15.6gb]}
[2017-08-24 17:13:36,853][WARN ][monitor.jvm              ] [node2] [gc][young][96292][326] duration [13.6s], collections [1]/[13.6s], total [13.6s]/[9.7m], memory [13.4gb]->[12.8gb]/[19.5gb], all_pools {[young] [3.4gb]->[142.5mb]/[3.4gb]}{[survivor] [440.9mb]->[440.9mb]/[440.9mb]}{[old] [9.5gb]->[12.2gb]/[15.6gb]}
[2017-08-24 17:13:53,700][WARN ][transport                ] [node2] Received response for a request that has timed out, sent [57780ms] ago, timed out [16846ms] ago, action [internal:discovery/zen/fd/master_ping], node [{node1}{mDHkISCtTNikmybnFKWAYg}{17x.xx.xx.221}{17x.xx.xx.221:9300}], id [99933]
[2017-08-24 17:17:06,826][WARN ][monitor.jvm              ] [node2] [gc][old][96310][14] duration [3.2m], collections [1]/[3.2m], total [3.2m]/[4.9m], memory [16.1gb]->[10.9gb]/[19.5gb], all_pools {[young] [3.4gb]->[86.6mb]/[3.4gb]}{[survivor] [440.9mb]->[0b]/[440.9mb]}{[old] [12.2gb]->[10.8gb]/[15.6gb]}
[2017-08-24 17:17:06,846][INFO ][discovery.zen            ] [node2] master_left [{node1}{mDHkISCtTNikmybnFKWAYg}{17x.xx.xx.221}{17x.xx.xx.221:9300}], reason [failed to ping, tried [3] times, each with  maximum [30s] timeout]
[2017-08-24 17:19:39,078][WARN ][cluster.service          ] [node2] cluster state update task [zen-disco-receive(from master [{node1}{mDHkISCtTNikmybnFKWAYg}{17x.xx.xx.221}{17x.xx.xx.221:9300}])] took 6.7m above the warn threshold of 30s
[2017-08-24 17:19:39,080][WARN ][discovery.zen            ] [node2] master left (reason = failed to ping, tried [3] times, each with  maximum [30s] timeout), current nodes: {{node2}{yetMu9irR4q25WCyI265lw}{17x.xx.xx.222}{node2/17x.xx.xx.222:9300},{node3}{i4Up59UlQzOqY4q-i4-ZAg}{17x.xx.xx.223}{17x.xx.xx.223:9300},}
[2017-08-24 17:19:39,080][INFO ][cluster.service          ] [node2] removed {{node1}{mDHkISCtTNikmybnFKWAYg}{17x.xx.xx.221}{17x.xx.xx.221:9300},}, reason: zen-disco-master_failed ({node1}{mDHkISCtTNikmybnFKWAYg}{17x.xx.xx.221}{17x.xx.xx.221:9300})
[2017-08-24 17:19:39,086][INFO ][discovery.zen            ] [node2] master_left [null], reason [failed to perform initial connect [null]]
[2017-08-24 17:19:39,086][INFO ][discovery.zen            ] [node2] master_left [null], reason [failed to perform initial connect [null]]
[2017-08-24 17:19:39,089][INFO ][discovery.zen            ] [node2] master_left [null], reason [failed to perform initial connect [null]]
[2017-08-24 17:19:39,090][ERROR][discovery.zen            ] [node2] unexpected failure during [zen-disco-master_failed (null)]
java.lang.NullPointerException
	at org.elasticsearch.discovery.zen.ZenDiscovery$7.execute(ZenDiscovery.java:615)
	at org.elasticsearch.cluster.ClusterStateUpdateTask.execute(ClusterStateUpdateTask.java:45)
	at org.elasticsearch.cluster.service.InternalClusterService.runTasksForExecutor(InternalClusterService.java:468)
	at org.elasticsearch.cluster.service.InternalClusterService$UpdateTask.run(InternalClusterService.java:772)
	at org.elasticsearch.common.util.concurrent.PrioritizedEsThreadPoolExecutor$TieBreakingPrioritizedRunnable.runAndClean(PrioritizedEsThreadPoolExecutor.java:231)
	at org.elasticsearch.common.util.concurrent.PrioritizedEsThreadPoolExecutor$TieBreakingPrioritizedRunnable.run(PrioritizedEsThreadPoolExecutor.java:194)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
	at java.lang.Thread.run(Thread.java:745)
[2017-08-24 17:19:39,091][ERROR][discovery.zen            ] [node2] unexpected failure during [zen-disco-master_failed (null)]
java.lang.NullPointerException
	at org.elasticsearch.discovery.zen.ZenDiscovery$7.execute(ZenDiscovery.java:615)
	at org.elasticsearch.cluster.ClusterStateUpdateTask.execute(ClusterStateUpdateTask.java:45)
	at org.elasticsearch.cluster.service.InternalClusterService.runTasksForExecutor(InternalClusterService.java:468)
	at org.elasticsearch.cluster.service.InternalClusterService$UpdateTask.run(InternalClusterService.java:772)
	at org.elasticsearch.common.util.concurrent.PrioritizedEsThreadPoolExecutor$TieBreakingPrioritizedRunnable.runAndClean(PrioritizedEsThreadPoolExecutor.java:231)
	at org.elasticsearch.common.util.concurrent.PrioritizedEsThreadPoolExecutor$TieBreakingPrioritizedRunnable.run(PrioritizedEsThreadPoolExecutor.java:194)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
	at java.lang.Thread.run(Thread.java:745)
[2017-08-24 17:19:39,092][ERROR][discovery.zen            ] [node2] unexpected failure during [zen-disco-master_failed (null)]
java.lang.NullPointerException
	at org.elasticsearch.discovery.zen.ZenDiscovery$7.execute(ZenDiscovery.java:615)
	at org.elasticsearch.cluster.ClusterStateUpdateTask.execute(ClusterStateUpdateTask.java:45)
	at org.elasticsearch.cluster.service.InternalClusterService.runTasksForExecutor(InternalClusterService.java:468)
	at org.elasticsearch.cluster.service.InternalClusterService$UpdateTask.run(InternalClusterService.java:772)
	at org.elasticsearch.common.util.concurrent.PrioritizedEsThreadPoolExecutor$TieBreakingPrioritizedRunnable.runAndClean(PrioritizedEsThreadPoolExecutor.java:231)
	at org.elasticsearch.common.util.concurrent.PrioritizedEsThreadPoolExecutor$TieBreakingPrioritizedRunnable.run(PrioritizedEsThreadPoolExecutor.java:194)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
	at java.lang.Thread.run(Thread.java:745)
[2017-08-24 17:19:39,092][INFO ][discovery.zen            ] [node2] master_left [null], reason [failed to perform initial connect [null]]
[2017-08-24 17:19:39,094][ERROR][discovery.zen            ] [node2] unexpected failure during [zen-disco-master_failed (null)]
java.lang.NullPointerException
