We have multiple clusters, each of which logs its monitoring data to its own dedicated monitoring cluster. Every cluster consistently stops logging monitoring data on several, if not all, of its nodes after an extended period of time (days to weeks). Restarting the affected nodes resolves the issue. From the affected nodes I can still query /?filter_path=version.number on the monitoring cluster with curl, and it returns the version info as expected (curl output first below). The logs from the nodes that stop reporting contain the errors shown after the curl output.
[es5-node] # curl -v http://monitoring.cluster:9200/?filter_path=version.number
* Trying ...
* Connected to monitoring.cluster () port 9200 (#0)
> GET /?filter_path=version.number HTTP/1.1
> Host: monitoring.cluster:9200
> User-Agent: curl/7.47.0
> Accept: */*
>
< HTTP/1.1 200 OK
< Date: Mon, 10 Jul 2017 16:27:56 GMT
< Content-Type: application/json; charset=UTF-8
< Content-Length: 47
< Connection: keep-alive
<
{
  "version" : {
    "number" : "5.2.0"
  }
}
[2017-07-10T17:09:22,594][INFO ][o.e.x.m.e.Exporters ] [master-ip] skipping exporter [es5-monitoring] as it is not ready yet
[2017-07-10T17:09:37,689][WARN ][o.e.x.m.e.h.NodeFailureListener] connection failed to node at [http://monitoring.cluster:9200]
[2017-07-10T17:09:37,689][ERROR][o.e.x.m.e.h.VersionHttpResource] failed to verify minimum version [5.0.0-beta1] on the [xpack.monitoring.exporters.es5-monitoring] monitoring cluster
java.net.NoRouteToHostException: No route to host
    at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method) ~[?:?]
    at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:717) ~[?:?]
    at org.apache.http.impl.nio.reactor.DefaultConnectingIOReactor.processEvent(DefaultConnectingIOReactor.java:171) ~[?:?]
    at org.apache.http.impl.nio.reactor.DefaultConnectingIOReactor.processEvents(DefaultConnectingIOReactor.java:145) ~[?:?]
    at org.apache.http.impl.nio.reactor.AbstractMultiworkerIOReactor.execute(AbstractMultiworkerIOReactor.java:348) ~[?:?]
    at org.apache.http.impl.nio.conn.PoolingNHttpClientConnectionManager.execute(PoolingNHttpClientConnectionManager.java:192) ~[?:?]
    at org.apache.http.impl.nio.client.CloseableHttpAsyncClientBase$1.run(CloseableHttpAsyncClientBase.java:64) ~[?:?]
    at java.lang.Thread.run(Thread.java:745) [?:1.8.0_111]
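
For anyone who wants to repeat the manual curl check from an affected node on a schedule, here is a minimal sketch in Python. It is only an illustration: the function names and the 5-second timeout are my own assumptions, and it uses Python's standard HTTP client rather than the Apache async client that appears in the stack trace, so a successful result here does not prove the exporter's own connection path is healthy.

```python
import json
import urllib.request

# URL taken from the curl command above.
MONITORING_URL = "http://monitoring.cluster:9200/?filter_path=version.number"


def parse_version_number(body: str) -> str:
    """Extract version.number from the JSON body returned by the root endpoint."""
    return json.loads(body)["version"]["number"]


def check_monitoring_cluster(url: str = MONITORING_URL, timeout: float = 5.0) -> str:
    """Hypothetical helper: return the reported version, or a failure description.

    The timeout value is an arbitrary assumption, not an Elasticsearch setting.
    """
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return parse_version_number(resp.read().decode("utf-8"))
    except OSError as exc:
        # URLError and socket timeouts are OSError subclasses, so this covers
        # failures such as "No route to host" or a refused connection.
        return f"connection failed: {exc!r}"


if __name__ == "__main__":
    print(check_monitoring_cluster())
```

Running this periodically (for example from cron) on a node that has stopped exporting, and comparing its result with the exporter errors logged at the same time, would show whether the host can reach the monitoring cluster while the exporter cannot.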