Hi,
We use ES for log management, and our ES cluster is built on a hot-cold architecture. Each physical host runs one hot node and one cold node. The hot node and cold node share CPU and memory but use different storage (SSD for the hot node, SATA for the cold node).
Recently, one cold node in one of our clusters has been logging a lot of warnings like these:
[2019-04-09T12:46:21,482][WARN ][o.e.c.s.ClusterService ] [jssz-billions-es-05-datanode_stale] cluster state update task [zen-disco-receive(from master [master {jssz-billions-es-01-masternode}{IfiUfj6nRKSRpQqSL2tmkQ}{D22n7s9YRwieNT67hyPbUg}{10.69.23.23}{10.69.23.23:9310} committed version [1458408]])] took [40.5s] above the warn threshold of 30s
[2019-04-09T12:55:45,538][WARN ][o.e.c.s.ClusterService ] [jssz-billions-es-05-datanode_stale] cluster state update task [zen-disco-receive(from master [master {jssz-billions-es-01-masternode}{IfiUfj6nRKSRpQqSL2tmkQ}{D22n7s9YRwieNT67hyPbUg}{10.69.23.23}{10.69.23.23:9310} committed version [1458422]])] took [43.9s] above the warn threshold of 30s
[2019-04-09T12:57:49,492][WARN ][o.e.c.s.ClusterService ] [jssz-billions-es-05-datanode_stale] cluster state update task [zen-disco-receive(from master [master {jssz-billions-es-01-masternode}{IfiUfj6nRKSRpQqSL2tmkQ}{D22n7s9YRwieNT67hyPbUg}{10.69.23.23}{10.69.23.23:9310} committed version [1458423]])] took [31.1s] above the warn threshold of 30s
[2019-04-09T12:59:22,582][WARN ][o.e.c.s.ClusterService ] [jssz-billions-es-05-datanode_stale] cluster state update task [zen-disco-receive(from master [master {jssz-billions-es-01-masternode}{IfiUfj6nRKSRpQqSL2tmkQ}{D22n7s9YRwieNT67hyPbUg}{10.69.23.23}{10.69.23.23:9310} committed version [1458425]])] took [55.1s] above the warn threshold of 30s
[2019-04-09T13:00:00,721][WARN ][o.e.c.s.ClusterService ] [jssz-billions-es-05-datanode_stale] cluster state update task [zen-disco-receive(from master [master {jssz-billions-es-01-masternode}{IfiUfj6nRKSRpQqSL2tmkQ}{D22n7s9YRwieNT67hyPbUg}{10.69.23.23}{10.69.23.23:9310} committed version [1458426]])] took [38.1s] above the warn threshold of 30s
[2019-04-09T13:35:41,830][WARN ][o.e.c.s.ClusterService ] [jssz-billions-es-05-datanode_stale] cluster state update task [zen-disco-receive(from master [master {jssz-billions-es-01-masternode}{IfiUfj6nRKSRpQqSL2tmkQ}{D22n7s9YRwieNT67hyPbUg}{10.69.23.23}{10.69.23.23:9310} committed version [1458624]])] took [32.4s] above the warn threshold of 30s
[2019-04-09T14:08:07,766][WARN ][o.e.c.s.ClusterService ] [jssz-billions-es-05-datanode_stale] cluster state update task [zen-disco-receive(from master [master {jssz-billions-es-01-masternode}{IfiUfj6nRKSRpQqSL2tmkQ}{D22n7s9YRwieNT67hyPbUg}{10.69.23.23}{10.69.23.23:9310} committed version [1458960]])] took [30.1s] above the warn threshold of 30s
All the other nodes are normal; only this node is affected. I have removed the index and search requests, but these log messages still appear.
How can I debug this problem? Any suggestions are welcome.
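In case it helps anyone look at the timing of these warnings, here is a minimal sketch that pulls the timestamp, node name, committed version and duration out of log lines in the format above. The names `SlowUpdate`, `WARN_RE` and `parse_slow_updates` are made up for this example, and the regex only assumes the exact line layout shown in this post; it is not an official log parser.

```python
import re
from datetime import datetime
from typing import Iterable, List, NamedTuple


class SlowUpdate(NamedTuple):
    """One 'cluster state update task ... took [Ns]' warning (hypothetical helper type)."""
    timestamp: datetime
    node: str
    version: int
    seconds: float


# Matches only the ClusterService warning layout pasted in this post.
WARN_RE = re.compile(
    r"^\[(?P<ts>[^\]]+)\]\[WARN\s*\]\[o\.e\.c\.s\.ClusterService\s*\]\s*"
    r"\[(?P<node>[^\]]+)\].*?committed version \[(?P<version>\d+)\]"
    r".*?took \[(?P<secs>[\d.]+)s\] above the warn threshold"
)


def parse_slow_updates(lines: Iterable[str]) -> List[SlowUpdate]:
    """Return one SlowUpdate per matching line; non-matching lines are skipped."""
    rows = []
    for line in lines:
        m = WARN_RE.search(line.strip())
        if m is None:
            continue
        rows.append(
            SlowUpdate(
                # Log timestamps look like 2019-04-09T12:46:21,482
                timestamp=datetime.strptime(m.group("ts"), "%Y-%m-%dT%H:%M:%S,%f"),
                node=m.group("node"),
                version=int(m.group("version")),
                seconds=float(m.group("secs")),
            )
        )
    return rows


# First warning from this post, used as sample input.
SAMPLE = (
    "[2019-04-09T12:46:21,482][WARN ][o.e.c.s.ClusterService ] "
    "[jssz-billions-es-05-datanode_stale] cluster state update task "
    "[zen-disco-receive(from master [master {jssz-billions-es-01-masternode}"
    "{IfiUfj6nRKSRpQqSL2tmkQ}{D22n7s9YRwieNT67hyPbUg}{10.69.23.23}"
    "{10.69.23.23:9310} committed version [1458408]])] took [40.5s] "
    "above the warn threshold of 30s"
)

if __name__ == "__main__":
    for row in parse_slow_updates([SAMPLE]):
        print(row.timestamp.isoformat(), row.node, row.version, row.seconds)
```

Feeding the whole node log through `parse_slow_updates` gives a list that can be sorted by `seconds` or grouped by hour, which might show whether the slow updates line up with some other activity on the host.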