I've been having a lot of problems with garbage collection (GC) on my cluster. It always reaches a point where the heap fills up, the GC reclaims no memory, and my nodes become unresponsive. I've tried many different things, to no avail. My current setup is 20 shards, 2 replicas, 16GB of RAM (8GB for heap), mlockall set to true, and swap disabled with "sudo swapoff -a" on all the machines. However, the nodes still become unresponsive after a while, because old-generation collections take roughly 25 to 35 seconds each and free nothing:
[2016-06-28 11:44:45,302][WARN ][monitor.jvm ] [ursa-es-data-node-18] [gc][old][50439][1134] duration [24.7s], collections [1]/[24.8s], total [24.7s]/[5.9h], memory [7.9gb]->[7.9gb]/[7.9gb], all_pools {[young] [399.4mb]->[399.4mb]/[399.4mb]}{[survivor] [42.5mb]->[38.1mb]/[49.8mb]}{[old] [7.5gb]->[7.5gb]/[7.5gb]}
[2016-06-28 11:45:19,942][WARN ][monitor.jvm ] [ursa-es-data-node-18] [gc][old][50440][1135] duration [34.5s], collections [1]/[34.6s], total [34.5s]/[5.9h], memory [7.9gb]->[7.9gb]/[7.9gb], all_pools {[young] [399.4mb]->[399.4mb]/[399.4mb]}{[survivor] [38.1mb]->[37mb]/[49.8mb]}{[old] [7.5gb]->[7.5gb]/[7.5gb]}
[2016-06-28 11:45:44,822][WARN ][monitor.jvm ] [ursa-es-data-node-18] [gc][old][50441][1136] duration [24.7s], collections [1]/[24.8s], total [24.7s]/[5.9h], memory [7.9gb]->[7.9gb]/[7.9gb], all_pools {[young] [399.4mb]->[399.4mb]/[399.4mb]}{[survivor] [37mb]->[36.3mb]/[49.8mb]}{[old] [7.5gb]->[7.5gb]/[7.5gb]}
[2016-06-28 11:46:18,494][WARN ][monitor.jvm ] [ursa-es-data-node-18] [gc][old][50442][1137] duration [33.5s], collections [1]/[33.6s], total [33.5s]/[5.9h], memory [7.9gb]->[7.9gb]/[7.9gb], all_pools {[young] [399.4mb]->[399.4mb]/[399.4mb]}{[survivor] [36.3mb]->[36.4mb]/[49.8mb]}{[old] [7.5gb]->[7.5gb]/[7.5gb]}
[2016-06-28 11:46:43,184][WARN ][monitor.jvm ] [ursa-es-data-node-18] [gc][old][50443][1138] duration [24.5s], collections [1]/[24.6s], total [24.5s]/[5.9h], memory [7.9gb]->[7.9gb]/[7.9gb], all_pools {[young] [399.4mb]->[399.4mb]/[399.4mb]}{[survivor] [36.4mb]->[35.3mb]/[49.8mb]}{[old] [7.5gb]->[7.5gb]/[7.5gb]}
[2016-06-28 11:47:16,802][WARN ][monitor.jvm ] [ursa-es-data-node-18] [gc][old][50444][1139] duration [33.5s], collections [1]/[33.6s], total [33.5s]/[5.9h], memory [7.9gb]->[7.9gb]/[7.9gb], all_pools {[young] [399.4mb]->[399.4mb]/[399.4mb]}{[survivor] [35.3mb]->[34.9mb]/[49.8mb]}{[old] [7.5gb]->[7.5gb]/[7.5gb]}
[2016-06-28 11:47:43,330][WARN ][monitor.jvm ] [ursa-es-data-node-18] [gc][old][50445][1140] duration [26.4s], collections [1]/[26.5s], total [26.4s]/[5.9h], memory [7.9gb]->[7.9gb]/[7.9gb], all_pools {[young] [399.4mb]->[399.4mb]/[399.4mb]}{[survivor] [34.9mb]->[32.3mb]/[49.8mb]}{[old] [7.5gb]->[7.5gb]/[7.5gb]}
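To put numbers on what these lines show, here is a minimal Python sketch that pulls the pause duration and the old-generation pool sizes out of a monitor.jvm line. It is only an illustration: the regular expression assumes the exact line layout pasted above, and the names `GC_LINE`, `to_bytes`, and `parse_old_gen` are my own, not part of any Elasticsearch tooling.

```python
import re

# Assumes the monitor.jvm layout shown in the log excerpt above.
GC_LINE = re.compile(
    r"\[gc\]\[(?P<gen>\w+)\]\[\d+\]\[\d+\] "
    r"duration \[(?P<dur>[\d.]+)(?P<dur_unit>ms|s|m|h)\].*?"
    r"\{\[old\] \[(?P<before>[\d.]+)(?P<before_unit>[kmg]?b)\]->"
    r"\[(?P<after>[\d.]+)(?P<after_unit>[kmg]?b)\]/"
    r"\[(?P<max>[\d.]+)(?P<max_unit>[kmg]?b)\]\}"
)

SIZE_UNITS = {"b": 1, "kb": 1024, "mb": 1024 ** 2, "gb": 1024 ** 3}
TIME_UNITS = {"ms": 0.001, "s": 1.0, "m": 60.0, "h": 3600.0}


def to_bytes(value, unit):
    """Convert a size such as ('7.5', 'gb') to bytes."""
    return float(value) * SIZE_UNITS[unit]


def parse_old_gen(line):
    """Return pause length and old-gen usage for one GC log line, or None."""
    m = GC_LINE.search(line)
    if m is None:
        return None
    before = to_bytes(m["before"], m["before_unit"])
    after = to_bytes(m["after"], m["after_unit"])
    maximum = to_bytes(m["max"], m["max_unit"])
    return {
        "generation": m["gen"],
        "duration_s": float(m["dur"]) * TIME_UNITS[m["dur_unit"]],
        "freed_bytes": before - after,       # memory reclaimed by this collection
        "old_fill_ratio": after / maximum,   # 1.0 means the old gen is still full
    }


# First line of the excerpt above, used as sample input.
SAMPLE = (
    "[2016-06-28 11:44:45,302][WARN ][monitor.jvm ] [ursa-es-data-node-18] "
    "[gc][old][50439][1134] duration [24.7s], collections [1]/[24.8s], "
    "total [24.7s]/[5.9h], memory [7.9gb]->[7.9gb]/[7.9gb], all_pools "
    "{[young] [399.4mb]->[399.4mb]/[399.4mb]}"
    "{[survivor] [42.5mb]->[38.1mb]/[49.8mb]}"
    "{[old] [7.5gb]->[7.5gb]/[7.5gb]}"
)

if __name__ == "__main__":
    print(parse_old_gen(SAMPLE))
```

Run against the lines above, every collection comes back with zero bytes freed and an old-generation fill ratio of 1.0, which suggests the heap is full of live (still referenced) objects rather than garbage the collector is failing to find.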
Does anyone have an idea of what I can do to try to debug this problem?