ES 2.4 cluster goes RED after master node OOM

I have an ES 2.4 cluster with one data node and one master node, and I am pumping traffic into it for performance testing. Everything went fine for two days; on the third day the master starts misbehaving and no longer connects to the data node. When I restart the master node, the cluster becomes healthy again.

I have attached the logs for the three days:

https://drive.google.com/open?id=1QDv_NvQmteta1dMviudpiPsm_Kyn_XzR

Can anyone help me figure out why this OOM happens after three days?
We keep a constant number of indices at any given time (fewer than 10); one of them keeps growing, up to about 50 GB, while the others are lightweight.
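For reference, something like the script below could be left running against the master between failures to watch heap usage and index sizes (a minimal sketch; the host/port 192.168.13.7:9200 and the 60-second polling interval are assumptions and would need to be adjusted):

# Sketch: poll the master's JVM heap and index sizes over the HTTP API.
import json
import time
import urllib.request

ES = "http://192.168.13.7:9200"  # assumed master HTTP address; adjust as needed

def get(path):
    with urllib.request.urlopen(ES + path) as resp:
        return resp.read().decode("utf-8")

while True:
    stats = json.loads(get("/_nodes/stats/jvm"))
    for node in stats["nodes"].values():
        mem = node["jvm"]["mem"]
        print("%s heap %d%% (%d / %d bytes)" % (
            node["name"],
            mem["heap_used_percent"],
            mem["heap_used_in_bytes"],
            mem["heap_max_in_bytes"],
        ))
    # index sizes, to confirm that only the one index is large
    print(get("/_cat/indices?v&h=index,store.size&bytes=mb"))
    time.sleep(60)

Below is an excerpt from the master's log for the third day, leading up to the OOM: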

[2018-12-05 12:59:42,967][INFO ][cluster.metadata ] [metrics-master-0] [logs-2018.12.05.13] update_mapping [accesslog]
[2018-12-05 12:59:43,022][TRACE][gateway ] [metrics-master-0] [logs-2018.12.05.13] writing state, reason [version changed from [1] to [2]]
[2018-12-05 12:59:43,082][TRACE][gateway ] [metrics-master-0] [logs-2018.12.05.13] writing state, reason [version changed from [2] to [3]]
[2018-12-05 13:55:03,320][WARN ][monitor.jvm ] [metrics-master-0] [gc][young][84164][857] duration [1.9s], collections [1]/[2.8s], total [1.9s]/[1m], memory [547.2mb]->[273.3mb]/[1gb], all_pools {[young] [443.9mb]->[8.3mb]/[451.2mb]}{[survivor] [33.2mb]->[56.3mb]/[56.3mb]}{[old] [70mb]->[208.6mb]/[564mb]}
[2018-12-05 13:56:40,896][WARN ][monitor.jvm ] [metrics-master-0] [gc][young][84259][858] duration [2.7s], collections [1]/[3.4s], total [2.7s]/[1m], memory [696.5mb]->[583.3mb]/[1gb], all_pools {[young] [431.5mb]->[8.1mb]/[451.2mb]}{[survivor] [56.3mb]->[56.3mb]/[56.3mb]}{[old] [208.6mb]->[518.8mb]/[564mb]}
[2018-12-05 13:59:43,887][INFO ][cluster.metadata ] [metrics-master-0] [logs-2018.12.05.14] creating index, cause [auto(bulk api)], templates [cgn_l4_pe_log_template, cgn_l4_pc_template, slb_vport_l4_pc_template, slb_vport_l7_pr_template, fw_l4_pc_template], shards [1]/[0], mappings [cgn_l4_pc, slb_vport_l4_pc, fw_l4_pc, cgn_l4_pe_log, accesslog]
[2018-12-05 13:59:44,040][TRACE][gateway ] [metrics-master-0] [logs-2018.12.05.14] writing state, reason [freshly created]
[2018-12-05 13:59:44,132][INFO ][cluster.routing.allocation] [metrics-master-0] Cluster health status changed from [RED] to [GREEN] (reason: [shards started [[logs-2018.12.05.14][0]] ...]).
[2018-12-05 14:03:21,060][INFO ][cluster.metadata ] [metrics-master-0] [logs-2018.12.05.14] update_mapping [accesslog]
[2018-12-05 14:03:21,150][TRACE][gateway ] [metrics-master-0] [logs-2018.12.05.14] writing state, reason [version changed from [1] to [2]]
[2018-12-05 14:04:13,751][INFO ][monitor.jvm ] [metrics-master-0] [gc][young][84690][863] duration [843ms], collections [1]/[3.2s], total [843ms]/[1m], memory [810.7mb]->[634.5mb]/[1gb], all_pools {[young] [451.2mb]->[70.5mb]/[451.2mb]}{[survivor] [56.1mb]->[0b]/[56.3mb]}{[old] [303.4mb]->[563.9mb]/[564mb]}
[2018-12-05 14:12:26,008][INFO ][monitor.jvm ] [metrics-master-0] [gc][young][85150][870] duration [839ms], collections [1]/[3.5s], total [839ms]/[1.1m], memory [770.3mb]->[608.8mb]/[1gb], all_pools {[young] [451.2mb]->[44.8mb]/[451.2mb]}{[survivor] [46.3mb]->[0b]/[56.3mb]}{[old] [272.8mb]->[563.9mb]/[564mb]}
[2018-12-05 14:23:32,618][INFO ][monitor.jvm ] [metrics-master-0] [gc][young][85799][878] duration [808ms], collections [1]/[1.4s], total [808ms]/[1.1m], memory [623.9mb]->[487.4mb]/[1gb], all_pools {[young] [446.3mb]->[7.1kb]/[451.2mb]}{[survivor] [0b]->[56.3mb]/[56.3mb]}{[old] [177.6mb]->[431mb]/[564mb]}
[2018-12-05 14:26:20,063][INFO ][monitor.jvm ] [metrics-master-0] [gc][young][85964][881] duration [717ms], collections [1]/[1.3s], total [717ms]/[1.1m], memory [591.1mb]->[441mb]/[1gb], all_pools {[young] [436mb]->[3.3kb]/[451.2mb]}{[survivor] [30.7mb]->[56.3mb]/[56.3mb]}{[old] [124.2mb]->[384.6mb]/[564mb]}
[2018-12-05 14:48:25,337][WARN ][monitor.jvm ] [metrics-master-0] [gc][young][87246][897] duration [1.1s], collections [1]/[3.6s], total [1.1s]/[1.2m], memory [842.8mb]->[700.8mb]/[1gb], all_pools {[young] [451.2mb]->[136.8mb]/[451.2mb]}{[survivor] [53.6mb]->[0b]/[56.3mb]}{[old] [337.9mb]->[563.9mb]/[564mb]}
[2018-12-05 14:53:06,380][WARN ][monitor.jvm ] [metrics-master-0] [gc][young][87501][901] duration [1.2s], collections [1]/[4.1s], total [1.2s]/[1.2m], memory [804.3mb]->[659.4mb]/[1gb], all_pools {[young] [451.2mb]->[95.4mb]/[451.2mb]}{[survivor] [43.5mb]->[0b]/[56.3mb]}{[old] [309.5mb]->[563.9mb]/[564mb]}
[2018-12-05 15:00:04,043][INFO ][cluster.metadata ] [metrics-master-0] [logs-2018.12.05.15] creating index, cause [auto(bulk api)], templates [cgn_l4_pe_log_template, cgn_l4_pc_template, slb_vport_l4_pc_template, slb_vport_l7_pr_template, fw_l4_pc_template], shards [1]/[0], mappings [cgn_l4_pc, slb_vport_l4_pc, fw_l4_pc, cgn_l4_pe_log, accesslog]
[2018-12-05 15:00:04,292][TRACE][gateway ] [metrics-master-0] [logs-2018.12.05.15] writing state, reason [freshly created]
[2018-12-05 15:00:04,310][INFO ][cluster.routing.allocation] [metrics-master-0] Cluster health status changed from [RED] to [GREEN] (reason: [shards started [[logs-2018.12.05.15][0]] ...]).
[2018-12-05 15:03:36,363][INFO ][cluster.metadata ] [metrics-master-0] [logs-2018.12.05.15] update_mapping [accesslog]
[2018-12-05 15:03:36,425][TRACE][gateway ] [metrics-master-0] [logs-2018.12.05.15] writing state, reason [version changed from [1] to [2]]
[2018-12-05 15:05:48,571][WARN ][monitor.jvm ] [metrics-master-0] [gc][young][88208][910] duration [1.2s], collections [1]/[5s], total [1.2s]/[1.2m], memory [781.7mb]->[643.5mb]/[1gb], all_pools {[young] [451.2mb]->[79.5mb]/[451.2mb]}{[survivor] [51.1mb]->[0b]/[56.3mb]}{[old] [279.3mb]->[563.9mb]/[564mb]}
[2018-12-05 15:33:12,938][WARN ][monitor.jvm ] [metrics-master-0] [gc][young][89714][935] duration [1.5s], collections [1]/[4.2s], total [1.5s]/[1.2m], memory [712.2mb]->[505.4mb]/[1gb], all_pools {[young] [451.2mb]->[5.3mb]/[451.2mb]}{[survivor] [37.7mb]->[0b]/[56.3mb]}{[old] [223.3mb]->[500.1mb]/[564mb]}
[2018-12-05 15:43:50,929][WARN ][transport.netty ] [metrics-master-0] exception caught on transport layer [[id: 0x4d1b4257, /192.168.13.7:54848 => /192.168.13.12:9300]], closing connection
java.lang.OutOfMemoryError: Java heap space
    at java.util.Arrays.copyOf(Arrays.java:3332)
    at java.lang.AbstractStringBuilder.ensureCapacityInternal(AbstractStringBuilder.java:124)
    at java.lang.AbstractStringBuilder.append(AbstractStringBuilder.java:448)
    at java.lang.StringBuilder.append(StringBuilder.java:136)
    at org.elasticsearch.transport.netty.MessageChannelHandler.handleResponse(MessageChannelHandler.java:152)
    at org.elasticsearch.transport.netty.MessageChannelHandler.messageReceived(MessageChannelHandler.java:124)
    at org.jboss.netty.channel.SimpleChannelUpstreamHandler.handleUpstream(SimpleChannelUpstreamHandler.java:70)
    at org.jboss.netty.channel.DefaultChannelPipeline.sendUpstream(DefaultChannelPipeline.java:564)
    at org.jboss.netty.channel.DefaultChannelPipeline$DefaultChannelHandlerContext.sendUpstream(DefaultChannelPipeline.java:791)

[2018-12-05 15:44:59,262][WARN ][transport.netty ] [metrics-master-0] exception caught on transport layer [[id: 0x75df096f, /192.168.13.17:43068 => /192.168.13.7:9300]], closing connection
java.lang.OutOfMemoryError: Java heap space
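For what it's worth, the [monitor.jvm] lines above already show the old generation climbing to 563.9mb against a 564mb cap on the 1gb heap well before the OutOfMemoryError. A small sketch like the one below (it assumes the exact log format shown in this post) pulls the old-gen occupancy out of the log so the growth is easy to see:

# Sketch: extract post-GC old-gen occupancy from [monitor.jvm] log lines read on stdin.
import re
import sys

# matches e.g. {[old] [337.9mb]->[563.9mb]/[564mb]}
OLD = re.compile(r"\{\[old\] \[([\d.]+)(kb|mb|gb|b)\]->\[([\d.]+)(kb|mb|gb|b)\]/\[([\d.]+)(kb|mb|gb|b)\]\}")
TS = re.compile(r"^\[([^\]]+)\]")
UNIT = {"b": 1.0 / (1024 * 1024), "kb": 1.0 / 1024, "mb": 1.0, "gb": 1024.0}

def to_mb(value, unit):
    return float(value) * UNIT[unit]

for line in sys.stdin:
    m = OLD.search(line)
    if not m or "monitor.jvm" not in line:
        continue
    ts = TS.match(line).group(1)
    after = to_mb(m.group(3), m.group(4))   # old-gen size after the collection
    limit = to_mb(m.group(5), m.group(6))   # old-gen capacity
    print("%s old gen after GC: %6.1fmb / %6.1fmb (%.0f%%)"
          % (ts, after, limit, 100.0 * after / limit))

Fed the excerpt above, it shows the old pool repeatedly pegged at 563.9mb/564mb in the hours before the connection drops.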

Please do not ping people not already involved in the thread. This forum is manned by volunteers, so you need to be patient.
