Okay, we moved all nodes to 5.2.2. We are still seeing nodes die from OOM, probably related to the higher query cache limit (we raised it from 2% to 6% because 5.2.2 fixed some memory leaks).
The query cache still seems to be leaking memory somewhere, so for now we moved the limit back to 2% and are hoping for an uneventful weekend. Log from one of the affected nodes leading up to the crash:
[2017-03-25T17:54:17,743][WARN ][o.e.m.j.JvmGcMonitorService] [es-big-14] [gc][8626] overhead, spent [7.9s] collecting in the last [8.2s]
[2017-03-25T17:54:26,548][INFO ][o.e.m.j.JvmGcMonitorService] [es-big-14] [gc][old][8627][755] duration [7.9s], collections [1]/[9.1s], total [7.9s]/[27.9m], memory [20.2gb]->[20.2gb]/[20.3gb], all_po$
[2017-03-25T17:54:26,548][WARN ][o.e.m.j.JvmGcMonitorService] [es-big-14] [gc][8627] overhead, spent [7.9s] collecting in the last [9.1s]
[2017-03-25T17:54:35,305][INFO ][o.e.m.j.JvmGcMonitorService] [es-big-14] [gc][old][8628][756] duration [8.1s], collections [1]/[8.7s], total [8.1s]/[28m], memory [20.2gb]->[20.3gb]/[20.3gb], all_pool$
[2017-03-25T17:54:35,305][WARN ][o.e.m.j.JvmGcMonitorService] [es-big-14] [gc][8628] overhead, spent [8.1s] collecting in the last [8.7s]
[2017-03-25T17:54:44,336][INFO ][o.e.m.j.JvmGcMonitorService] [es-big-14] [gc][old][8629][757] duration [8.4s], collections [1]/[9s], total [8.4s]/[28.2m], memory [20.3gb]->[20.3gb]/[20.3gb], all_pool$
[2017-03-25T17:54:44,357][WARN ][o.e.m.j.JvmGcMonitorService] [es-big-14] [gc][8629] overhead, spent [8.4s] collecting in the last [9s]
[2017-03-25T17:54:52,639][INFO ][o.e.m.j.JvmGcMonitorService] [es-big-14] [gc][old][8630][758] duration [8s], collections [1]/[8.2s], total [8s]/[28.3m], memory [20.3gb]->[20.3gb]/[20.3gb], all_pools $
[2017-03-25T17:54:52,639][WARN ][o.e.m.j.JvmGcMonitorService] [es-big-14] [gc][8630] overhead, spent [8s] collecting in the last [8.2s]
[2017-03-25T17:55:01,384][INFO ][o.e.m.j.JvmGcMonitorService] [es-big-14] [gc][old][8631][759] duration [8.5s], collections [1]/[8.7s], total [8.5s]/[28.4m], memory [20.3gb]->[20.3gb]/[20.3gb], all_po$
[2017-03-25T17:55:01,384][WARN ][o.e.m.j.JvmGcMonitorService] [es-big-14] [gc][8631] overhead, spent [8.5s] collecting in the last [8.7s]
[2017-03-25T17:55:09,525][INFO ][o.e.m.j.JvmGcMonitorService] [es-big-14] [gc][old][8632][760] duration [7.6s], collections [1]/[7.8s], total [7.6s]/[28.6m], memory [20.3gb]->[20.3gb]/[20.3gb], all_po$
[2017-03-25T17:55:09,525][WARN ][o.e.m.j.JvmGcMonitorService] [es-big-14] [gc][8632] overhead, spent [7.6s] collecting in the last [7.8s]
[2017-03-25T17:55:18,271][INFO ][o.e.m.j.JvmGcMonitorService] [es-big-14] [gc][old][8633][761] duration [8.8s], collections [1]/[9s], total [8.8s]/[28.7m], memory [20.3gb]->[20.3gb]/[20.3gb], all_pool$
[2017-03-25T17:55:18,271][WARN ][o.e.m.j.JvmGcMonitorService] [es-big-14] [gc][8633] overhead, spent [8.8s] collecting in the last [9s]
[2017-03-25T17:55:26,762][INFO ][o.e.m.j.JvmGcMonitorService] [es-big-14] [gc][old][8634][762] duration [8.2s], collections [1]/[8.1s], total [8.2s]/[28.8m], memory [20.3gb]->[20.3gb]/[20.3gb], all_po$
[2017-03-25T17:55:26,762][WARN ][o.e.m.j.JvmGcMonitorService] [es-big-14] [gc][8634] overhead, spent [8.2s] collecting in the last [8.1s]
[2017-03-25T17:55:35,641][INFO ][o.e.m.j.JvmGcMonitorService] [es-big-14] [gc][old][8635][763] duration [8.6s], collections [1]/[9.2s], total [8.6s]/[29m], memory [20.3gb]->[20.3gb]/[20.3gb], all_pool$
[2017-03-25T17:55:35,641][WARN ][o.e.m.j.JvmGcMonitorService] [es-big-14] [gc][8635] overhead, spent [8.6s] collecting in the last [9.2s]
[2017-03-25T17:55:43,958][INFO ][o.e.m.j.JvmGcMonitorService] [es-big-14] [gc][old][8636][764] duration [7.7s], collections [1]/[8s], total [7.7s]/[29.1m], memory [20.3gb]->[20.3gb]/[20.3gb], all_pool$
[2017-03-25T17:55:43,958][WARN ][o.e.m.j.JvmGcMonitorService] [es-big-14] [gc][8636] overhead, spent [7.7s] collecting in the last [8s]
[2017-03-25T17:55:52,936][INFO ][o.e.m.j.JvmGcMonitorService] [es-big-14] [gc][old][8637][765] duration [9.1s], collections [1]/[9.2s], total [9.1s]/[29.3m], memory [20.3gb]->[20.3gb]/[20.3gb], all_po$
[2017-03-25T17:55:52,936][WARN ][o.e.m.j.JvmGcMonitorService] [es-big-14] [gc][8637] overhead, spent [9.1s] collecting in the last [9.2s]
[2017-03-25T17:56:01,058][INFO ][o.e.m.j.JvmGcMonitorService] [es-big-14] [gc][old][8638][766] duration [8s], collections [1]/[8.1s], total [8s]/[29.4m], memory [20.3gb]->[20.3gb]/[20.3gb], all_pools $
[2017-03-25T17:56:01,058][WARN ][o.e.m.j.JvmGcMonitorService] [es-big-14] [gc][8638] overhead, spent [8s] collecting in the last [8.1s]
[2017-03-25T17:56:18,569][INFO ][o.e.m.j.JvmGcMonitorService] [es-big-14] [gc][old][8639][767] duration [9.2s], collections [1]/[9.2s], total [9.2s]/[29.6m], memory [20.3gb]->[20.3gb]/[20.3gb], all_po$
[2017-03-25T17:56:27,408][WARN ][o.e.m.j.JvmGcMonitorService] [es-big-14] [gc][8639] overhead, spent [9.2s] collecting in the last [9.2s]
[2017-03-25T17:58:17,349][INFO ][o.e.m.j.JvmGcMonitorService] [es-big-14] [gc][old][8640][777] duration [1.4m], collections [10]/[1.4m], total [1.4m]/[31m], memory [20.3gb]->[20.3gb]/[20.3gb], all_poo$
[2017-03-25T17:59:31,930][WARN ][o.e.m.j.JvmGcMonitorService] [es-big-14] [gc][8640] overhead, spent [1.4m] collecting in the last [1.4m]
[2017-03-25T18:06:58,819][ERROR][o.e.b.ElasticsearchUncaughtExceptionHandler] [es-big-14] fatal error in thread [elasticsearch[es-big-14][warmer][T#5]], exiting
java.lang.OutOfMemoryError: Java heap space
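
For anyone who wants to quantify this from their own logs: below is a rough sketch (not something we run in production) that pulls the "overhead, spent [X] collecting in the last [Y]" lines out of a log and reports the fraction of wall-clock time spent in GC. The function names (`to_seconds`, `gc_overhead`) are made up for this example, and the regex assumes the 5.x `JvmGcMonitorService` message format exactly as it appears in the log above.

```python
import re

# Matches the WARN lines emitted by JvmGcMonitorService, e.g.
#   ... [gc][8626] overhead, spent [7.9s] collecting in the last [8.2s]
# Assumption: the message format is the one shown in the log above.
OVERHEAD_RE = re.compile(
    r"\[gc\]\[(?P<seq>\d+)\] overhead, spent \[(?P<spent>[^\]]+)\] "
    r"collecting in the last \[(?P<window>[^\]]+)\]"
)

# Duration suffixes seen in these logs ("s", "m"), plus "ms" and "h" for safety.
UNIT_SECONDS = {"ms": 0.001, "s": 1.0, "m": 60.0, "h": 3600.0}


def to_seconds(value):
    """Convert a duration such as '7.9s' or '1.4m' to seconds."""
    match = re.fullmatch(r"([\d.]+)(ms|s|m|h)", value)
    if match is None:
        raise ValueError(f"unrecognised duration: {value!r}")
    return float(match.group(1)) * UNIT_SECONDS[match.group(2)]


def gc_overhead(lines):
    """Return (gc sequence number, spent / window) for each overhead line."""
    results = []
    for line in lines:
        match = OVERHEAD_RE.search(line)
        if match is None:
            continue  # not an overhead line (e.g. the INFO [gc][old] lines)
        spent = to_seconds(match.group("spent"))
        window = to_seconds(match.group("window"))
        results.append((int(match.group("seq")), spent / window))
    return results


if __name__ == "__main__":
    sample = [
        "[2017-03-25T17:54:17,743][WARN ][o.e.m.j.JvmGcMonitorService] "
        "[es-big-14] [gc][8626] overhead, spent [7.9s] collecting in the last [8.2s]",
        "[2017-03-25T17:59:31,930][WARN ][o.e.m.j.JvmGcMonitorService] "
        "[es-big-14] [gc][8640] overhead, spent [1.4m] collecting in the last [1.4m]",
    ]
    for seq, ratio in gc_overhead(sample):
        print(f"gc #{seq}: {ratio:.0%} of the interval spent collecting")
```

Run against the log above, every interval comes out at well over 80% of the time spent in GC, with the heap staying at 20.3gb out of 20.3gb after each old-generation collection, so the node was effectively doing nothing but collecting for several minutes before the OutOfMemoryError.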