Lots of disconnected logs and then OOM

shjdwxy · April 21, 2019, 2:39pm

Hi,
ES version: 5.4.3
Java version: 1.8.0_162
ES cluster: 3 dedicated master nodes, 7 hot data nodes, 14 cold data nodes, 31GB heap on each node
Indices and shards: 9,425 indices and 23,000 shards

The ES cluster was running normally, but one node suddenly logged a large number of disconnect messages and then ran out of memory (OOM). Before the OOM, the CPU was exhausted. I am sure there is no problem with the network.

logs:

https://gist.github.com/wangxiangyu/199d7171ef2f9f3d68ba27dfed94ba80

gistfile1.txt

[2019-04-21T22:02:54,642][WARN ][o.e.a.b.TransportShardBulkAction] [jssz-billions-es-02-datanode_hot] [[billions-main.app-svr.app-feed-@2019.04.21-jssz01-0][0]] failed to perform indices:data/write/bulk[s] on replica [billions-main.app-svr.app-feed-@2019.04.21-jssz01-0][0], node[rrGzzIqxTGS8-7UcmBCWVQ], [R], s[STARTED], a[id=8cojHrnlQ6WQj54m16pLOg]
org.elasticsearch.transport.NodeDisconnectedException: [jssz-billions-es-01-datanode_hot][10.69.23.23:9300][indices:data/write/bulk[s][r]] disconnected
[2019-04-21T22:02:54,642][WARN ][o.e.a.b.TransportShardBulkAction] [jssz-billions-es-02-datanode_hot] [[billions-main.app-svr.app-feed-@2019.04.21-jssz01-0][0]] failed to perform indices:data/write/bulk[s] on replica [billions-main.app-svr.app-feed-@2019.04.21-jssz01-0][0], node[rrGzzIqxTGS8-7UcmBCWVQ], [R], s[STARTED], a[id=8cojHrnlQ6WQj54m16pLOg]
org.elasticsearch.transport.NodeDisconnectedException: [jssz-billions-es-01-datanode_hot][10.69.23.23:9300][indices:data/write/bulk[s][r]] disconnected
[2019-04-21T22:02:54,642][WARN ][o.e.a.b.TransportShardBulkAction] [jssz-billions-es-02-datanode_hot] [[billions-main.account.account-interface-@2019.04.21-jssz01-0][2]] failed to perform indices:data/write/bulk[s] on replica [billions-main.account.account-interface-@2019.04.21-jssz01-0][2], node[rrGzzIqxTGS8-7UcmBCWVQ], [R], s[STARTED], a[id=p86QZ3L5QqKr4RkrXmLU2w]
org.elasticsearch.transport.NodeDisconnectedException: [jssz-billions-es-01-datanode_hot][10.69.23.23:9300][indices:data/write/bulk[s][r]] disconnected
[2019-04-21T22:02:54,642][WARN ][o.e.a.b.TransportShardBulkAction] [jssz-billions-es-02-datanode_hot] [[billions-main.account.account-interface-@2019.04.21-jssz01-0][0]] failed to perform indices:data/write/bulk[s] on replica [billions-main.account.account-interface-@20

This file has been truncated.
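
In the excerpt above, every NodeDisconnectedException names the same remote node. With a log this noisy, a quick tally per remote node can show whether the disconnects point at one node or at many. The following is only a rough sketch: the regular expression is inferred from the exception lines shown in the excerpt, and `count_disconnects` is a hypothetical helper name, not part of Elasticsearch.

```python
import re
from collections import Counter

# Pattern inferred from the exception lines in the excerpt:
#   ...NodeDisconnectedException: [<node name>][<host:port>][<action>] disconnected
DISCONNECT = re.compile(r"NodeDisconnectedException: \[([^\]]+)\]\[([^\]]+)\]")


def count_disconnects(lines):
    """Count NodeDisconnectedException lines per (node name, address)."""
    counts = Counter()
    for line in lines:
        match = DISCONNECT.search(line)
        if match:
            counts[(match.group(1), match.group(2))] += 1
    return counts


# Exception lines copied from the excerpt, plus one non-matching line.
sample = [
    "org.elasticsearch.transport.NodeDisconnectedException: "
    "[jssz-billions-es-01-datanode_hot][10.69.23.23:9300]"
    "[indices:data/write/bulk[s][r]] disconnected",
] * 3 + ["some unrelated log line"]

for (node, address), n in count_disconnects(sample).most_common():
    print(node, address, n)
```

To run it against a real log, pass an open file object instead of `sample`, since the function only needs an iterable of lines.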

Has anyone run into this before?

DavidTurner · April 21, 2019, 3:26pm

I think you already asked this, or something very like it, here:

As before:

If you need help investigating your heap dump then you can ask more detailed questions here of course.

shjdwxy · April 22, 2019, 8:40am

The ES heap is 31GB, so I am wondering:

- Can the heap be dumped successfully when the OOM happens?
- Even if the dump succeeds, 31GB seems too big to analyse.

DavidTurner · April 22, 2019, 9:17am

31GB is not really very large. I've never heard of that being a problem to dump, and MAT can normally open dumps of that size just fine.
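
On the first question, whether a dump gets written at all depends mainly on two things: the JVM being started with `-XX:+HeapDumpOnOutOfMemoryError`, and the target directory having enough free space. A rough pre-check might look like the sketch below. It assumes the options live in a `jvm.options`-style text file and that the dump can be about as large as the configured heap; the sample path and both helper names are hypothetical.

```python
import shutil


def heap_dump_flags(jvm_options_text):
    """Return (dump_on_oom_enabled, dump_path_or_None) from jvm.options text."""
    enabled, path = False, None
    for raw in jvm_options_text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and commented-out options
        if line == "-XX:+HeapDumpOnOutOfMemoryError":
            enabled = True
        elif line == "-XX:-HeapDumpOnOutOfMemoryError":
            enabled = False
        elif line.startswith("-XX:HeapDumpPath="):
            path = line.split("=", 1)[1]
    return enabled, path


def has_room_for_dump(directory, heap_bytes):
    """Assumption: the dump may need roughly as much disk space as the heap."""
    return shutil.disk_usage(directory).free >= heap_bytes


# Hypothetical jvm.options content for a 31GB heap.
example_options = """
-Xms31g
-Xmx31g
-XX:+HeapDumpOnOutOfMemoryError
-XX:HeapDumpPath=/data/heapdumps
"""

enabled, dump_path = heap_dump_flags(example_options)
print("dump on OOM enabled:", enabled, "| path:", dump_path)
print("room in current directory:", has_room_for_dump(".", 31 * 1024**3))
```

If no `-XX:HeapDumpPath` is set, the JVM writes the dump to its working directory, so that is the directory to check for free space.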

system · May 20, 2019, 9:17am

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.
