Data cluster failing to connect to monitoring cluster

We're running a 2-node cluster and a separate 1-node monitoring cluster. Both clusters are running elasticsearch v5.0.2. The monitoring node is running Kibana v5.0.2. Everything was working fine for a few days until the 2-node cluster suddenly stopped being able to connect to the monitoring cluster. I see the following exception:

[2016-12-14T23:35:47,626][WARN ][o.e.x.m.e.h.NodeFailureListener] connection failed to node at [http://DOMAIN-NAME-REPLACED:9200]
[2016-12-14T23:35:47,627][ERROR][o.e.x.m.e.h.VersionHttpResource] failed to verify minimum version [5.0.0-beta1] on the [xpack.monitoring.exporters.id1] monitoring cluster
[2016-12-14T23:35:47,630][INFO ][o.e.x.m.e.Exporters      ] [products-qa1.localdomain] skipping exporter [id1] as it isn't ready yet
[2016-12-14T23:35:47,630][ERROR][o.e.x.m.AgentService     ] [products-qa1.localdomain] exception when exporting documents
org.elasticsearch.xpack.monitoring.exporter.ExportException: exporters are either not ready or faulty

I'm able to curl the monitoring node from both nodes in the main cluster. I checked the logs for both elasticsearch and kibana on the monitoring node but didn't see any exceptions. Restarting elasticsearch and kibana didn't help either. After restarting, I'm occasionally seeing this exception in the elasticsearch logs of the monitoring node:

[2016-12-14T22:57:59,224][DEBUG][o.e.a.s.TransportSearchAction] [monitor2.localdomain] [.kibana][0], node[mZQDRKCBSdCEB7xNaF6EmA], [P], s[STARTED], a[id=JU_3TnGcQ16HwDSs6uY4lg]: Failed to execute [SearchRequest{searchType=QUERY_THEN_FETCH, indices=[], indicesOptions=IndicesOptions[id=38, ignore_unavailable=false, allow_no_indices=true, expand_wildcards_open=true, expand_wildcards_closed=false, allow_alisases_to_multiple_indices=true, forbid_closed_indices=true], types=[cluster_stats], routing='null', preference='null', requestCache=null, scroll=null, source={
  "size" : 1,
  "query" : {
    "bool" : {
      "filter" : [
        {
          "term" : {
            "cluster_uuid" : {
              "value" : "w-v6k0pbQpyKWLuaJUbQUw",
              "boost" : 1.0
            }
          }
        },
        {
          "range" : {
            "timestamp" : {
              "from" : null,
              "to" : 1481756279098,
              "include_lower" : true,
              "include_upper" : true,
              "format" : "epoch_millis",
              "boost" : 1.0
            }
          }
        }
      ],
      "disable_coord" : false,
      "adjust_pure_negative" : true,
      "boost" : 1.0
    }
  },
  "sort" : [
    {
      "timestamp" : {
        "order" : "desc"
      }
    }
  ],
  "ext" : { }
}}] lastShard [true]
org.elasticsearch.transport.RemoteTransportException: [monitor2.localdomain][10.0.1.136:9300][indices:data/read/search[phase/query]]
Caused by: org.elasticsearch.index.query.QueryShardException: No mapping found for [timestamp] in order to sort on

The only custom JVM option passed to the monitoring node is -Djava.net.preferIPv4Stack=true, though I don't think this is the issue because everything was working fine for quite a while. Any idea what may have happened? Let me know if more information is required. I didn't include the full stack trace due to character limitations.

Unfortunately, there could be a lot of reasons why the data cluster can't connect to the monitoring cluster.

"Restarting elasticsearch and kibana didn't help either"

Based on the first error text you pasted, the problem seems to be that one ES cluster (your data cluster) is unable to connect to another (your monitoring cluster), so Kibana isn't a factor here.

A few questions:

  • Does the monitoring cluster have a dedicated instance of Kibana, and are you able to make an Index Pattern for .monitoring-es-* and see data in Discover?

"I'm able to curl the monitoring node from both nodes in the main cluster"

  • What type of request did you send to the monitoring node? Was it something that actually tried to read the monitoring indices such as:
curl 'http://DOMAIN-NAME-REPLACED:9200/.monitoring-data-2/cluster_info/_search?_source=cluster_uuid&pretty'
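To run the same check from each data node in a repeatable way, something like the sketch below might help. It only wraps the curl request above; the function names monitoring_search_url and check_monitoring are made up for illustration, and the index and type (.monitoring-data-2, cluster_info) are taken from that command, so adjust them if your monitoring indices are named differently.

```shell
#!/bin/sh
# Hypothetical helper: build the search URL for the monitoring data index.
# $1 is the base URL of the monitoring node, e.g. http://DOMAIN-NAME-REPLACED:9200
monitoring_search_url() {
  # ${1%/} strips a single trailing slash so the path is not doubled up.
  printf '%s/.monitoring-data-2/cluster_info/_search?_source=cluster_uuid&pretty\n' "${1%/}"
}

# Hypothetical helper: send the search request and print only the HTTP status code.
# A 200 means the request actually reached the monitoring indices; a curl error
# or a non-200 code points at connectivity or at the indices themselves.
check_monitoring() {
  curl -sS -o /dev/null -w '%{http_code}\n' --max-time 10 \
    "$(monitoring_search_url "$1")"
}

# Example (run on each node of the data cluster):
#   check_monitoring 'http://DOMAIN-NAME-REPLACED:9200'
```

Running this on both data nodes shows whether a request that reads the monitoring indices succeeds, rather than just a request to the root endpoint.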
