shjdwxy
(shjdwxy)
April 21, 2019, 2:39pm
1
hi,
ES version: 5.4.3
java version: 1.8.0_162
ES cluster: 3 dedicated master nodes, 7 hot data nodes, 14 cold data nodes, 31GB heap on each node
9,425 indices and 23,000 shards
The cluster was running normally, but one node suddenly logged a large number of disconnect warnings and then hit an OutOfMemoryError (OOM). Before the OOM, the CPU was exhausted. I am sure there is no network problem.
logs:
[2019-04-21T22:02:54,642][WARN ][o.e.a.b.TransportShardBulkAction] [jssz-billions-es-02-datanode_hot] [[billions-main.app-svr.app-feed-@2019.04.21-jssz01-0][0]] failed to perform indices:data/write/bulk[s] on replica [billions-main.app-svr.app-feed-@2019.04.21-jssz01-0][0], node[rrGzzIqxTGS8-7UcmBCWVQ], [R], s[STARTED], a[id=8cojHrnlQ6WQj54m16pLOg]
org.elasticsearch.transport.NodeDisconnectedException: [jssz-billions-es-01-datanode_hot][10.69.23.23:9300][indices:data/write/bulk[s][r]] disconnected
[2019-04-21T22:02:54,642][WARN ][o.e.a.b.TransportShardBulkAction] [jssz-billions-es-02-datanode_hot] [[billions-main.app-svr.app-feed-@2019.04.21-jssz01-0][0]] failed to perform indices:data/write/bulk[s] on replica [billions-main.app-svr.app-feed-@2019.04.21-jssz01-0][0], node[rrGzzIqxTGS8-7UcmBCWVQ], [R], s[STARTED], a[id=8cojHrnlQ6WQj54m16pLOg]
org.elasticsearch.transport.NodeDisconnectedException: [jssz-billions-es-01-datanode_hot][10.69.23.23:9300][indices:data/write/bulk[s][r]] disconnected
[2019-04-21T22:02:54,642][WARN ][o.e.a.b.TransportShardBulkAction] [jssz-billions-es-02-datanode_hot] [[billions-main.account.account-interface-@2019.04.21-jssz01-0][2]] failed to perform indices:data/write/bulk[s] on replica [billions-main.account.account-interface-@2019.04.21-jssz01-0][2], node[rrGzzIqxTGS8-7UcmBCWVQ], [R], s[STARTED], a[id=p86QZ3L5QqKr4RkrXmLU2w]
org.elasticsearch.transport.NodeDisconnectedException: [jssz-billions-es-01-datanode_hot][10.69.23.23:9300][indices:data/write/bulk[s][r]] disconnected
[2019-04-21T22:02:54,642][WARN ][o.e.a.b.TransportShardBulkAction] [jssz-billions-es-02-datanode_hot] [[billions-main.account.account-interface-@2019.04.21-jssz01-0][0]] failed to perform indices:data/write/bulk[s] on replica [billions-main.account.account-interface-@20
Has anyone encountered this before?
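When a log floods with warnings like the ones above, it can help to tally which remote node the disconnects refer to. Below is a minimal sketch, assuming the exception lines keep the `NodeDisconnectedException: [node name][address]` shape seen in this excerpt; the function name `count_disconnects` is made up for illustration and is not part of Elasticsearch.

```python
import re
from collections import Counter

# Matches the node name and transport address in lines such as:
# org.elasticsearch.transport.NodeDisconnectedException: [name][ip:port][action] disconnected
DISCONNECT = re.compile(r"NodeDisconnectedException: \[([^\]]+)\]\[([^\]]+)\]")


def count_disconnects(lines):
    """Count disconnect exceptions per (node name, address) pair."""
    counts = Counter()
    for line in lines:
        match = DISCONNECT.search(line)
        if match:
            counts[(match.group(1), match.group(2))] += 1
    return counts


# Usage (the log path is hypothetical):
# with open("elasticsearch.log", encoding="utf-8") as fh:
#     for (node, addr), n in count_disconnects(fh).most_common():
#         print(n, node, addr)
```

If every disconnect names the same node, as in the excerpt above, that node is the one to look at first.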
I think you already asked this, or something very like it, here:
hi,
ES version: 5.4.3
java version: 1.8.0_162
ES was running normally and GC was normal too, but one node suddenly hit an OOM with "fatal error on the network layer".
logs link:
Has anyone encountered this before? How can I debug this problem?
Thank you in advance!
As before:
By default Elasticsearch will write a heap dump when it encounters an OutOfMemoryError. The best thing to do is to open this heap dump (e.g. in MAT) and investigate what was using all the heap.
If you need help investigating your heap dump then you can ask more detailed questions here of course.
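To confirm that a node is actually set up to write that dump, and to find where the file would land, one option is to read the node's `jvm.options`. The following is only a sketch: it assumes the standard HotSpot flags `-XX:+HeapDumpOnOutOfMemoryError` and `-XX:HeapDumpPath=...` appear one per line, and the helper name `heap_dump_settings` and the file location are assumptions, not anything Elasticsearch provides.

```python
def heap_dump_settings(jvm_options_text):
    """Return (enabled, path) for heap-dump-on-OOM from jvm.options text.

    path is None when -XX:HeapDumpPath is not set; HotSpot then writes
    the dump into the working directory of the JVM process.
    """
    enabled = False
    path = None
    for raw in jvm_options_text.splitlines():
        line = raw.strip()
        # Skip blank lines and comments.
        if not line or line.startswith("#"):
            continue
        if line == "-XX:+HeapDumpOnOutOfMemoryError":
            enabled = True
        elif line == "-XX:-HeapDumpOnOutOfMemoryError":
            enabled = False
        elif line.startswith("-XX:HeapDumpPath="):
            path = line.split("=", 1)[1]
    return enabled, path


# Usage (the path is an assumption, adjust to your install):
# with open("/etc/elasticsearch/jvm.options", encoding="utf-8") as fh:
#     print(heap_dump_settings(fh.read()))
```

Flags passed another way (for example through `ES_JAVA_OPTS`) would not show up here, so treat a negative result as a hint rather than proof.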
shjdwxy
(shjdwxy)
April 22, 2019, 8:40am
3
The ES heap is 31GB, so I am wondering:
whether the heap can be dumped successfully when the OOM happens
even if the dump succeeds, whether 31GB is too big to analyse.
31GB is not really very large. I've never heard of that being a problem to dump, and MAT can normally open dumps of that size just fine.
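One practical thing that can stop a dump from being written is lack of free disk space in the dump directory. Here is a rough sketch of a pre-check, assuming the dump is at most about as large as the heap; the 20% safety margin and the function names are arbitrary choices for illustration.

```python
import shutil

GIB = 1024 ** 3


def has_room(free_bytes, heap_bytes, margin=1.2):
    """True if free space covers the heap size plus a safety margin."""
    return free_bytes >= heap_bytes * margin


def dump_dir_has_room(dump_dir, heap_gib=31, margin=1.2):
    """Check the filesystem holding dump_dir against a heap of heap_gib GiB."""
    free = shutil.disk_usage(dump_dir).free
    return has_room(free, heap_gib * GIB, margin)


# Usage (the directory is an assumption; use your HeapDumpPath or the
# working directory of the Elasticsearch process):
# print(dump_dir_has_room("/var/lib/elasticsearch"))
```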