Hi team,
We have multiple clusters that export monitoring data to a remote monitoring cluster.
All nodes are reporting timeouts when connecting to the monitoring cluster, and some of the nodes show high CPU usage (close to 100%). Memory usage is normal, and nothing has changed.
I would like to find the root cause. Could you help me? Thanks!
Node logs:
[2019-03-27T09:04:18,070][WARN ][o.e.x.m.e.h.NodeFailureListener] connection failed to node at [http://X.X.X.X:9200]
[2019-03-27T09:04:18,077][WARN ][o.e.x.m.e.h.HttpExportBulkResponseListener] bulk request failed unexpectedly
java.io.IOException: request retries exceeded max retry timeout [30000]
at org.elasticsearch.client.RestClient$1.retryIfPossible(RestClient.java:574) [elasticsearch-rest-client-6.4.0.jar:6.4.0]
at org.elasticsearch.client.RestClient$1.failed(RestClient.java:561) [elasticsearch-rest-client-6.4.0.jar:6.4.0]
at org.apache.http.concurrent.BasicFuture.failed(BasicFuture.java:134) [httpcore-4.4.5.jar:4.4.5]
at org.apache.http.impl.nio.client.AbstractClientExchangeHandler.failed(AbstractClientExchangeHandler.java:419) [httpasyncclient-4.1.2.jar:4.1.2]
at org.apache.http.nio.protocol.HttpAsyncRequestExecutor.timeout(HttpAsyncRequestExecutor.java:375) [httpcore-nio-4.4.5.jar:4.4.5]
at org.apache.http.impl.nio.client.InternalIODispatch.onTimeout(InternalIODispatch.java:92) [httpasyncclient-4.1.2.jar:4.1.2]
at org.apache.http.impl.nio.client.InternalIODispatch.onTimeout(InternalIODispatch.java:39) [httpasyncclient-4.1.2.jar:4.1.2]
at org.apache.http.impl.nio.reactor.AbstractIODispatch.timeout(AbstractIODispatch.java:175) [httpcore-nio-4.4.5.jar:4.4.5]
at org.apache.http.impl.nio.reactor.BaseIOReactor.sessionTimedOut(BaseIOReactor.java:263) [httpcore-nio-4.4.5.jar:4.4.5]
at org.apache.http.impl.nio.reactor.AbstractIOReactor.timeoutCheck(AbstractIOReactor.java:492) [httpcore-nio-4.4.5.jar:4.4.5]
at org.apache.http.impl.nio.reactor.BaseIOReactor.validate(BaseIOReactor.java:213) [httpcore-nio-4.4.5.jar:4.4.5]
at org.apache.http.impl.nio.reactor.AbstractIOReactor.execute(AbstractIOReactor.java:280) [httpcore-nio-4.4.5.jar:4.4.5]
at org.apache.http.impl.nio.reactor.BaseIOReactor.execute(BaseIOReactor.java:104) [httpcore-nio-4.4.5.jar:4.4.5]
at org.apache.http.impl.nio.reactor.AbstractMultiworkerIOReactor$Worker.run(AbstractMultiworkerIOReactor.java:588) [httpcore-nio-4.4.5.jar:4.4.5]
at java.lang.Thread.run(Thread.java:748) [?:1.8.0_131]
[2019-03-27T09:04:18,080][WARN ][o.e.x.m.MonitoringService] [node-5] monitoring execution failed
org.elasticsearch.xpack.monitoring.exporter.ExportException: Exception when closing export bulk
at org.elasticsearch.xpack.monitoring.exporter.ExportBulk$1$1.<init>(ExportBulk.java:95) ~[?:?]
at org.elasticsearch.xpack.monitoring.exporter.ExportBulk$1.onFailure(ExportBulk.java:93) ~[?:?]
at org.elasticsearch.xpack.monitoring.exporter.ExportBulk$Compound$1.onResponse(ExportBulk.java:206) ~[?:?]
at org.elasticsearch.xpack.monitoring.exporter.ExportBulk$Compound$1.onResponse(ExportBulk.java:200) ~[?:?]
at org.elasticsearch.xpack.core.common.IteratingActionListener.onResponse(IteratingActionListener.java:96) ~[?:?]
at org.elasticsearch.xpack.monitoring.exporter.ExportBulk$Compound.lambda$doFlush$0(ExportBulk.java:164) ~[?:?]
at org.elasticsearch.action.ActionListener$1.onFailure(ActionListener.java:68) [elasticsearch-6.4.0.jar:6.4.0]
at org.elasticsearch.xpack.monitoring.exporter.http.HttpExportBulk$1.onFailure(HttpExportBulk.java:114) [x-pack-monitoring-6.4.0.jar:6.4.0]
at org.elasticsearch.client.RestClient$FailureTrackingResponseListener.onDefinitiveFailure(RestClient.java:844) [elasticsearch-rest-client-6.4.0.jar:6.4.0]
at org.elasticsearch.client.RestClient$1.retryIfPossible(RestClient.java:575) [elasticsearch-rest-client-6.4.0.jar:6.4.0]
at org.elasticsearch.client.RestClient$1.failed(RestClient.java:561) [elasticsearch-rest-client-6.4.0.jar:6.4.0]
at org.apache.http.concurrent.BasicFuture.failed(BasicFuture.java:134) [httpcore-4.4.5.jar:4.4.5]
at org.apache.http.impl.nio.client.AbstractClientExchangeHandler.failed(AbstractClientExchangeHandler.java:419) [httpasyncclient-4.1.2.jar:4.1.2]
at org.apache.http.nio.protocol.HttpAsyncRequestExecutor.timeout(HttpAsyncRequestExecutor.java:375) [httpcore-nio-4.4.5.jar:4.4.5]
at org.apache.http.impl.nio.client.InternalIODispatch.onTimeout(InternalIODispatch.java:92) [httpasyncclient-4.1.2.jar:4.1.2]
at org.apache.http.impl.nio.client.InternalIODispatch.onTimeout(InternalIODispatch.java:39) [httpasyncclient-4.1.2.jar:4.1.2]
at org.apache.http.impl.nio.reactor.AbstractIODispatch.timeout(AbstractIODispatch.java:175) [httpcore-nio-4.4.5.jar:4.4.5]
at org.apache.http.impl.nio.reactor.BaseIOReactor.sessionTimedOut(BaseIOReactor.java:263) [httpcore-nio-4.4.5.jar:4.4.5]
at org.apache.http.impl.nio.reactor.AbstractIOReactor.timeoutCheck(AbstractIOReactor.java:492) [httpcore-nio-4.4.5.jar:4.4.5]
at org.apache.http.impl.nio.reactor.BaseIOReactor.validate(BaseIOReactor.java:213) [httpcore-nio-4.4.5.jar:4.4.5]
at org.apache.http.impl.nio.reactor.AbstractIOReactor.execute(AbstractIOReactor.java:280) [httpcore-nio-4.4.5.jar:4.4.5]
at org.apache.http.impl.nio.reactor.BaseIOReactor.execute(BaseIOReactor.java:104) [httpcore-nio-4.4.5.jar:4.4.5]
at org.apache.http.impl.nio.reactor.AbstractMultiworkerIOReactor$Worker.run(AbstractMultiworkerIOReactor.java:588) [httpcore-nio-4.4.5.jar:4.4.5]
at java.lang.Thread.run(Thread.java:748) [?:1.8.0_131]
Caused by: org.elasticsearch.xpack.monitoring.exporter.ExportException: failed to flush export bulks
at org.elasticsearch.xpack.monitoring.exporter.ExportBulk$Compound.lambda$doFlush$0(ExportBulk.java:156) ~[?:?]
... 18 more
Caused by: java.io.IOException: request retries exceeded max retry timeout [30000]
at org.elasticsearch.client.RestClient$1.retryIfPossible(RestClient.java:574) ~[?:?]
... 14 more
Monitoring cluster logs:
[2019-03-27T09:08:29,062][ERROR][o.e.x.m.c.n.NodeStatsCollector] [node-2] collector [node_stats] timed out when collecting data
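To compare how often each component is failing on the exporting nodes versus the monitoring cluster, a rough sketch like the one below can count WARN/ERROR entries per logger. This is not part of any Elastic tooling: the function name `summarize` and the regular expression are my own assumptions, based only on the `[timestamp][LEVEL][logger] message` layout seen in the logs above.

```python
import re
from collections import Counter

# Assumed layout, taken from the log lines above:
#   [timestamp][LEVEL ][logger] message
# Stack-trace lines ("at ...", "Caused by: ...") do not match and are skipped.
LOG_LINE = re.compile(
    r"^\[(?P<ts>[^\]]+)\]"
    r"\[(?P<level>\w+)\s*\]"
    r"\[(?P<logger>[^\]]+?)\s*\]"
    r"\s*(?P<msg>.*)$"
)


def summarize(lines):
    """Count WARN and ERROR log entries per (level, logger) pair."""
    counts = Counter()
    for line in lines:
        match = LOG_LINE.match(line.strip())
        if match and match.group("level") in ("WARN", "ERROR"):
            counts[(match.group("level"), match.group("logger"))] += 1
    return counts


# Small inline sample copied from the logs in this post.
sample = [
    "[2019-03-27T09:04:18,070][WARN ][o.e.x.m.e.h.NodeFailureListener] connection failed to node at [http://X.X.X.X:9200]",
    "java.io.IOException: request retries exceeded max retry timeout [30000]",
    "[2019-03-27T09:08:29,062][ERROR][o.e.x.m.c.n.NodeStatsCollector] [node-2] collector [node_stats] timed out when collecting data",
]

for (level, logger), count in sorted(summarize(sample).items()):
    print(level, logger, count)
```

Running it over the full log file of each node (for example by passing `open(path)` to `summarize`, where `path` is whatever your log location is) should show whether the export timeouts and the `node_stats` collector timeouts cluster around the same loggers.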