Consistently losing monitoring data for clusters

We have multiple clusters that log their monitoring data to their own monitoring clusters. After an extended period of time (days to weeks), every cluster consistently stops logging monitoring data on several, if not all, of its nodes. The nodes that stop logging report the errors shown in the log excerpt further down. Restarting an affected node resolves the issue.

From the affected nodes I can still query /?filter_path=version.number on the monitoring cluster, and it returns the version info as expected.

[es5-node] # curl -v http://monitoring.cluster:9200/?filter_path=version.number

*   Trying ...
* Connected to monitoring.cluster () port 9200 (#0)
> GET /?filter_path=version.number HTTP/1.1
> Host: monitoring.cluster:9200
> User-Agent: curl/7.47.0
> Accept: */*
>
< HTTP/1.1 200 OK
< Date: Mon, 10 Jul 2017 16:27:56 GMT
< Content-Type: application/json; charset=UTF-8
< Content-Length: 47
< Connection: keep-alive
<
{
  "version" : {
    "number" : "5.2.0"
  }
}

[2017-07-10T17:09:22,594][INFO ][o.e.x.m.e.Exporters      ] [master-ip] skipping exporter [es5-monitoring] as it is not ready yet
[2017-07-10T17:09:37,689][WARN ][o.e.x.m.e.h.NodeFailureListener] connection failed to node at [http://monitoring.cluster:9200]
[2017-07-10T17:09:37,689][ERROR][o.e.x.m.e.h.VersionHttpResource] failed to verify minimum version [5.0.0-beta1] on the [xpack.monitoring.exporters.es5-monitoring] monitoring cluster
java.net.NoRouteToHostException: No route to host
    at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method) ~[?:?]
    at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:717) ~[?:?]
    at org.apache.http.impl.nio.reactor.DefaultConnectingIOReactor.processEvent(DefaultConnectingIOReactor.java:171) ~[?:?]
    at org.apache.http.impl.nio.reactor.DefaultConnectingIOReactor.processEvents(DefaultConnectingIOReactor.java:145) ~[?:?]
    at org.apache.http.impl.nio.reactor.AbstractMultiworkerIOReactor.execute(AbstractMultiworkerIOReactor.java:348) ~[?:?]
    at org.apache.http.impl.nio.conn.PoolingNHttpClientConnectionManager.execute(PoolingNHttpClientConnectionManager.java:192) ~[?:?]
    at org.apache.http.impl.nio.client.CloseableHttpAsyncClientBase$1.run(CloseableHttpAsyncClientBase.java:64) ~[?:?]
    at java.lang.Thread.run(Thread.java:745) [?:1.8.0_111]

Hi @ryan.dyer

Is it possible that the IP address behind the DNS hostname is changing after Elasticsearch has started? If so, the JVM isn't going to notice because it caches DNS lookups. You can change this for Java itself by setting networkaddress.cache.ttl in the $JAVA_HOME/lib/security/java.security file.

https://www.elastic.co/guide/en/cloud/current/_dns_caching.html

This documentation is written for Elastic Cloud, but it applies to any instance of Elasticsearch.
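
To make the setting concrete, here is a minimal Java sketch of what networkaddress.cache.ttl controls. It is only an illustration for a standalone JVM, not something you would run inside an Elasticsearch node; for the node itself the equivalent is a line such as networkaddress.cache.ttl=60 in the java.security file. The class name DnsCacheTtlSketch, the helper names, and the value 60 are assumptions chosen for the example. As far as I know, a JVM running with a security manager (which Elasticsearch 5.x does) caches successful lookups forever unless this property is set, which would match the symptoms here.

```java
import java.net.InetAddress;
import java.net.UnknownHostException;
import java.security.Security;

// Hypothetical helper class, for illustration only.
public class DnsCacheTtlSketch {

    // Returns the configured TTL for successful lookups, or null if the property is unset.
    static String currentTtl() {
        return Security.getProperty("networkaddress.cache.ttl");
    }

    // Sets the TTL in seconds. The JVM reads this when its address cache policy
    // is initialized, so call it before the first hostname lookup.
    static void setTtlSeconds(int seconds) {
        Security.setProperty("networkaddress.cache.ttl", Integer.toString(seconds));
    }

    public static void main(String[] args) throws UnknownHostException {
        System.out.println("TTL before: " + currentTtl());

        // 60 seconds is an assumed example value, not a recommendation from this thread.
        setTtlSeconds(60);
        System.out.println("TTL after: " + currentTtl());

        // Pass the monitoring hostname as the first argument to see what it resolves to;
        // "localhost" is only a placeholder default.
        String host = args.length > 0 ? args[0] : "localhost";
        InetAddress address = InetAddress.getByName(host);
        System.out.println(host + " resolves to " + address.getHostAddress());
    }
}
```

Comparing the address printed for monitoring.cluster with the one the node connected to at startup is a quick way to check whether the IP has changed. Note that a change to the java.security file only takes effect after the JVM is restarted.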

Hope that helps,
Chris

