ES goes RED with master node left


(Yogesh BG) #1

I have an ES 2.4 cluster with one data node and one master node. I am pumping traffic into it for perf testing. Everything went fine for 2 days; on the 3rd day the master went weird and no longer connects to the data node. When I restart the master node, the cluster becomes healthy again.

I have attached the logs for the 3 days:

https://drive.google.com/open?id=1QDv_NvQmteta1dMviudpiPsm_Kyn_XzR
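
For reference, this is roughly how I check cluster health and the master's heap while the test runs (both standard ES 2.x APIs; localhost:9200 is just how my master is bound locally):

    # overall status (green/yellow/red), node count, unassigned shards
    curl -s 'http://localhost:9200/_cluster/health?pretty'

    # per-node JVM stats, to see how close each node's heap is to its max
    curl -s 'http://localhost:9200/_nodes/stats/jvm?pretty'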


(Yogesh BG) #2

Can anyone help me figure out why this OOM can happen after 3 days?
We have a constant number of indices at any given point (<10). Of those, one index keeps growing until it reaches about 50 GB; the others are lightweight.
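
The master is still on the default 1gb heap (visible in the GC lines below). If this is simply heap exhaustion, would raising ES_HEAP_SIZE (the standard heap knob for ES 2.x) be the right fix? A minimal sketch, assuming the box has spare RAM:

    # assumption: the host has >= 8gb RAM; keep the heap at most ~50% of physical memory
    export ES_HEAP_SIZE=4g
    ./bin/elasticsearch -d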


(Yogesh BG) #3

[2018-12-05 12:59:42,967][INFO ][cluster.metadata ] [metrics-master-0] [logs-2018.12.05.13] update_mapping [accesslog]
[2018-12-05 12:59:43,022][TRACE][gateway ] [metrics-master-0] [logs-2018.12.05.13] writing state, reason [version changed from [1] to [2]]
[2018-12-05 12:59:43,082][TRACE][gateway ] [metrics-master-0] [logs-2018.12.05.13] writing state, reason [version changed from [2] to [3]]
[2018-12-05 13:55:03,320][WARN ][monitor.jvm ] [metrics-master-0] [gc][young][84164][857] duration [1.9s], collections [1]/[2.8s], total [1.9s]/[1m], memory [547.2mb]->[273.3mb]/[1gb], all_pools {[young] [443.9mb]->[8.3mb]/[451.2mb]}{[survivor] [33.2mb]->[56.3mb]/[56.3mb]}{[old] [70mb]->[208.6mb]/[564mb]}
[2018-12-05 13:56:40,896][WARN ][monitor.jvm ] [metrics-master-0] [gc][young][84259][858] duration [2.7s], collections [1]/[3.4s], total [2.7s]/[1m], memory [696.5mb]->[583.3mb]/[1gb], all_pools {[young] [431.5mb]->[8.1mb]/[451.2mb]}{[survivor] [56.3mb]->[56.3mb]/[56.3mb]}{[old] [208.6mb]->[518.8mb]/[564mb]}
[2018-12-05 13:59:43,887][INFO ][cluster.metadata ] [metrics-master-0] [logs-2018.12.05.14] creating index, cause [auto(bulk api)], templates [cgn_l4_pe_log_template, cgn_l4_pc_template, slb_vport_l4_pc_template, slb_vport_l7_pr_template, fw_l4_pc_template], shards [1]/[0], mappings [cgn_l4_pc, slb_vport_l4_pc, fw_l4_pc, cgn_l4_pe_log, accesslog]
[2018-12-05 13:59:44,040][TRACE][gateway ] [metrics-master-0] [logs-2018.12.05.14] writing state, reason [freshly created]
[2018-12-05 13:59:44,132][INFO ][cluster.routing.allocation] [metrics-master-0] Cluster health status changed from [RED] to [GREEN] (reason: [shards started [[logs-2018.12.05.14][0]] ...]).
[2018-12-05 14:03:21,060][INFO ][cluster.metadata ] [metrics-master-0] [logs-2018.12.05.14] update_mapping [accesslog]
[2018-12-05 14:03:21,150][TRACE][gateway ] [metrics-master-0] [logs-2018.12.05.14] writing state, reason [version changed from [1] to [2]]
[2018-12-05 14:04:13,751][INFO ][monitor.jvm ] [metrics-master-0] [gc][young][84690][863] duration [843ms], collections [1]/[3.2s], total [843ms]/[1m], memory [810.7mb]->[634.5mb]/[1gb], all_pools {[young] [451.2mb]->[70.5mb]/[451.2mb]}{[survivor] [56.1mb]->[0b]/[56.3mb]}{[old] [303.4mb]->[563.9mb]/[564mb]}
[2018-12-05 14:12:26,008][INFO ][monitor.jvm ] [metrics-master-0] [gc][young][85150][870] duration [839ms], collections [1]/[3.5s], total [839ms]/[1.1m], memory [770.3mb]->[608.8mb]/[1gb], all_pools {[young] [451.2mb]->[44.8mb]/[451.2mb]}{[survivor] [46.3mb]->[0b]/[56.3mb]}{[old] [272.8mb]->[563.9mb]/[564mb]}
[2018-12-05 14:23:32,618][INFO ][monitor.jvm ] [metrics-master-0] [gc][young][85799][878] duration [808ms], collections [1]/[1.4s], total [808ms]/[1.1m], memory [623.9mb]->[487.4mb]/[1gb], all_pools {[young] [446.3mb]->[7.1kb]/[451.2mb]}{[survivor] [0b]->[56.3mb]/[56.3mb]}{[old] [177.6mb]->[431mb]/[564mb]}
[2018-12-05 14:26:20,063][INFO ][monitor.jvm ] [metrics-master-0] [gc][young][85964][881] duration [717ms], collections [1]/[1.3s], total [717ms]/[1.1m], memory [591.1mb]->[441mb]/[1gb], all_pools {[young] [436mb]->[3.3kb]/[451.2mb]}{[survivor] [30.7mb]->[56.3mb]/[56.3mb]}{[old] [124.2mb]->[384.6mb]/[564mb]}
[2018-12-05 14:48:25,337][WARN ][monitor.jvm ] [metrics-master-0] [gc][young][87246][897] duration [1.1s], collections [1]/[3.6s], total [1.1s]/[1.2m], memory [842.8mb]->[700.8mb]/[1gb], all_pools {[young] [451.2mb]->[136.8mb]/[451.2mb]}{[survivor] [53.6mb]->[0b]/[56.3mb]}{[old] [337.9mb]->[563.9mb]/[564mb]}
[2018-12-05 14:53:06,380][WARN ][monitor.jvm ] [metrics-master-0] [gc][young][87501][901] duration [1.2s], collections [1]/[4.1s], total [1.2s]/[1.2m], memory [804.3mb]->[659.4mb]/[1gb], all_pools {[young] [451.2mb]->[95.4mb]/[451.2mb]}{[survivor] [43.5mb]->[0b]/[56.3mb]}{[old] [309.5mb]->[563.9mb]/[564mb]}
[2018-12-05 15:00:04,043][INFO ][cluster.metadata ] [metrics-master-0] [logs-2018.12.05.15] creating index, cause [auto(bulk api)], templates [cgn_l4_pe_log_template, cgn_l4_pc_template, slb_vport_l4_pc_template, slb_vport_l7_pr_template, fw_l4_pc_template], shards [1]/[0], mappings [cgn_l4_pc, slb_vport_l4_pc, fw_l4_pc, cgn_l4_pe_log, accesslog]
[2018-12-05 15:00:04,292][TRACE][gateway ] [metrics-master-0] [logs-2018.12.05.15] writing state, reason [freshly created]
[2018-12-05 15:00:04,310][INFO ][cluster.routing.allocation] [metrics-master-0] Cluster health status changed from [RED] to [GREEN] (reason: [shards started [[logs-2018.12.05.15][0]] ...]).
[2018-12-05 15:03:36,363][INFO ][cluster.metadata ] [metrics-master-0] [logs-2018.12.05.15] update_mapping [accesslog]
[2018-12-05 15:03:36,425][TRACE][gateway ] [metrics-master-0] [logs-2018.12.05.15] writing state, reason [version changed from [1] to [2]]
[2018-12-05 15:05:48,571][WARN ][monitor.jvm ] [metrics-master-0] [gc][young][88208][910] duration [1.2s], collections [1]/[5s], total [1.2s]/[1.2m], memory [781.7mb]->[643.5mb]/[1gb], all_pools {[young] [451.2mb]->[79.5mb]/[451.2mb]}{[survivor] [51.1mb]->[0b]/[56.3mb]}{[old] [279.3mb]->[563.9mb]/[564mb]}
[2018-12-05 15:33:12,938][WARN ][monitor.jvm ] [metrics-master-0] [gc][young][89714][935] duration [1.5s], collections [1]/[4.2s], total [1.5s]/[1.2m], memory [712.2mb]->[505.4mb]/[1gb], all_pools {[young] [451.2mb]->[5.3mb]/[451.2mb]}{[survivor] [37.7mb]->[0b]/[56.3mb]}{[old] [223.3mb]->[500.1mb]/[564mb]}
[2018-12-05 15:43:50,929][WARN ][transport.netty ] [metrics-master-0] exception caught on transport layer [[id: 0x4d1b4257, /192.168.13.7:54848 => /192.168.13.12:9300]], closing connection
java.lang.OutOfMemoryError: Java heap space
at java.util.Arrays.copyOf(Arrays.java:3332)
at java.lang.AbstractStringBuilder.ensureCapacityInternal(AbstractStringBuilder.java:124)
at java.lang.AbstractStringBuilder.append(AbstractStringBuilder.java:448)
at java.lang.StringBuilder.append(StringBuilder.java:136)
at org.elasticsearch.transport.netty.MessageChannelHandler.handleResponse(MessageChannelHandler.java:152)
at org.elasticsearch.transport.netty.MessageChannelHandler.messageReceived(MessageChannelHandler.java:124)
at org.jboss.netty.channel.SimpleChannelUpstreamHandler.handleUpstream(SimpleChannelUpstreamHandler.java:70)
at org.jboss.netty.channel.DefaultChannelPipeline.sendUpstream(DefaultChannelPipeline.java:564)
at org.jboss.netty.channel.DefaultChannelPipeline$DefaultChannelHandlerContext.sendUpstream(DefaultChannelPipeline.java:791)

[2018-12-05 15:44:59,262][WARN ][transport.netty ] [metrics-master-0] exception caught on transport layer [[id: 0x75df096f, /192.168.13.17:43068 => /192.168.13.7:9300]], closing connection
java.lang.OutOfMemoryError: Java heap space
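
The pattern above looks like the old generation filling to its 564mb cap over and over until an allocation finally fails. If it helps anyone confirm the diagnosis, something like this (standard _cat and cluster-state APIs; the host is an assumption) shows heap pressure and cluster-state growth as the hourly indices and mappings accumulate:

    # heap usage per node, refreshed on each call
    curl -s 'http://localhost:9200/_cat/nodes?v&h=name,heap.percent,heap.current,heap.max'

    # rough size of the cluster state in bytes; on a master this is a big chunk of heap
    curl -s 'http://localhost:9200/_cluster/state' | wc -c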


(Christian Dahlqvist) #5

Please do not ping people not already involved in the thread. This forum is manned by volunteers, so you need to be patient.