One ES node not showing in Kibana Monitoring

I recently upgraded 3 (of 9) nodes to new hardware. Since then, one of the nodes (es02) has not been visible on the Kibana "Monitoring" page. ALL nodes are visible via the Elasticsearch Head plugin.

Version 6.8.3

Based on an earlier report from another user (apparently abandoned), I will attach the results requested by one of the ES developers in that ticket.

In the results below, the nodes are es01, es02, es03, es05, es06, es07, es08, es09, and hd3. The node that is not displaying is es02; the other new machines, es01 and es03, are configured the same way and show up fine. Note that es02 otherwise appears to be working normally.

  1. GET _cat/indices?v

    File: indices.txt

  2. GET _cat/nodes?v

    File: nodes.json

  3. GET _cluster/settings?include_defaults&filter_path=*.xpack.monitoring.*

    File: cluster.json

  4. POST .monitoring-es-*/_search

    File: monitoring.json

  5. Could you post all xpack.monitoring.* settings from the elasticsearch.yml of one of your "good" nodes? Similarly, could you post all xpack.monitoring.* settings from the elasticsearch.yml of one of your "bad" nodes?

    Good node:

    [root@es01:/etc/elasticsearch]# grep xpack elasticsearch.yml
    xpack.monitoring.collection.enabled: true

    Bad node:

    [root@es02:/etc/elasticsearch]# grep xpack elasticsearch.yml
    #xpack.monitoring.enabled: true
    xpack.monitoring.collection.enabled: true
    #xpack.monitoring.elasticsearch.collection.enabled: true
    (The commented-out lines were tried based on suggestions found elsewhere.)
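
    (Aside: we set this in elasticsearch.yml, but as I understand it, in 6.x xpack.monitoring.collection.enabled can also be set dynamically through the cluster settings API; a sketch, in case that route behaves differently:)

    PUT _cluster/settings
    {
      "persistent": {
        "xpack.monitoring.collection.enabled": true
      }
    }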

Please advise how/where to send the result files; I thought I could attach them here.

Hi @thealy

Could you please query the cluster where the monitoring data is stored with the following?

GET /.monitoring-es*/_search?filter_path=hits.hits._source.node_stats.node_id

You should get a list of node IDs. Could you please compare that list to what's returned by GET /_nodes?filter_path=nodes.*.name and see whether the node that's not appearing (es02) is present?
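
If the lists are long, a quick way to diff them from a shell might look like this (just a sketch; it assumes curl and jq are available and that the cluster answers on localhost:9200 without auth; adjust for your setup):

# collect node IDs that have reported node_stats to the monitoring indices
curl -s 'localhost:9200/.monitoring-es*/_search?size=100&filter_path=hits.hits._source.node_stats.node_id' \
  | jq -r '.hits.hits[]._source.node_stats.node_id' | sort -u > monitored_ids
# collect the IDs of the nodes currently in the cluster
curl -s 'localhost:9200/_nodes?filter_path=nodes.*.name' \
  | jq -r '.nodes | keys[]' | sort > live_ids
diff monitored_ids live_ids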

Hi Mike-

Thanks for your help. Here are the results:

GET /.monitoring-es*/_search?filter_path=hits.hits._source.node_stats.node_id
{
  "hits" : {
    "hits" : [
      {
        "_source" : {
          "node_stats" : {
            "node_id" : "3PgfzMugQhihpx1eiHEfOQ"
          }
        }
      }
    ]
  }
}

GET /_nodes?filter_path=nodes.*.name

{
  "nodes" : {
    "pelqvr1NSCWkRCQ0B0uonw" : {
      "name" : "es02"
    },
    "L_4bDstyTlO4ojV577QkvA" : {
      "name" : "es07"
    },
    "3PgfzMugQhihpx1eiHEfOQ" : {
      "name" : "hd3"
    },
    "U30RaA3LQFqTKCLvvBEYCA" : {
      "name" : "es09"
    },
    "ny_bTxplRwq1YUsnNDswPQ" : {
      "name" : "es08"
    },
    "wbeNKPt7T_WmYDzkeW0TMA" : {
      "name" : "es03"
    },
    "MuCs6HxkT0W9IqKSz--Bpw" : {
      "name" : "es05"
    },
    "QYoIq7mlQ9W4DpyymdfR2w" : {
      "name" : "es06"
    },
    "IG8pEj3WRi-Ia_seu2HVvg" : {
      "name" : "es01"
    }
  }
}

Any thoughts, Mike?

@thealy Are you certain that GET /.monitoring-es*/_search?filter_path=hits.hits._source.node_stats.node_id was run from the monitoring cluster? The other nodes should be shown there, so this is a bit surprising.

I'm afraid I don't understand what you are asking. We have only one cluster, and the command was run there. The "missing" node (es02) is in the list.

{
  "nodes" : {
    "wbeNKPt7T_WmYDzkeW0TMA" : {
      "name" : "es03"
    },
    "QYoIq7mlQ9W4DpyymdfR2w" : {
      "name" : "es06"
    },
    "pelqvr1NSCWkRCQ0B0uonw" : {
      "name" : "es02"
    },
    "IG8pEj3WRi-Ia_seu2HVvg" : {
      "name" : "es01"
    },
    "L_4bDstyTlO4ojV577QkvA" : {
      "name" : "es07"
    },
    "ny_bTxplRwq1YUsnNDswPQ" : {
      "name" : "es08"
    },
    "MuCs6HxkT0W9IqKSz--Bpw" : {
      "name" : "es05"
    },
    "3PgfzMugQhihpx1eiHEfOQ" : {
      "name" : "hd3"
    },
    "U30RaA3LQFqTKCLvvBEYCA" : {
      "name" : "es09"
    }
  }
}

Let's try a slightly different query. (The first search only returns the ten most recent matching documents by default, so an aggregation over node IDs gives a more complete picture.)

GET .monitoring-es-*/_search?filter_path=aggregations.node_ids
{
  "size": 0,
  "query": {
    "term": {
      "type": {
        "value": "node_stats"
      }
    }
  }, 
  "aggs": {
    "node_ids": {
      "terms": {
        "field": "node_stats.node_id",
        "size": 50
      }
    }
  }
}

Do you see es02 showing up there? Let me know, and don't worry - we'll figure this out!

Yes, the key for es02 is in the results.

Okay, great!

Now let's make sure it's reporting at the proper time intervals:

POST .monitoring-es-*/_search
{
  "size": 0,
  "query": {
    "bool": {
      "filter": [
        {
          "term": {
            "type": "node_stats"
          }
        },
        {
          "range": {
            "timestamp": {
              "gte": "now-1h"
            }
          }
        }
      ]
    }
  },
  "aggs": {
    "nodes": {
      "terms": {
        "field": "node_stats.node_id",
        "size": 100
      },
      "aggs": {
        "timestamp": {
          "terms": {
            "field": "timestamp",
            "size": 100
          }
        }
      }
    }
  }
}

Let's make sure it's there again before moving to the next step. Please paste the entire output.

Paste 4,104 lines?

Feel free to use a gist or something. I want to make sure we don't miss anything, I think.
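
(If the full response is unwieldy, appending filter_path to the request trims it to just the aggregation buckets; same request body as above:)

POST .monitoring-es-*/_search?filter_path=aggregations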

Any additional thoughts?

@thealy I don't think we ever saw the gist get posted. Did you link it here?

Let's try this one: https://gist.github.com/thealy/52bdad6fc825e4f392798b5e654851cd

Sorry for the delay here. Looking at this gist, it does appear that @chrisronline is on the right track: I see 8 of the 9 machines in this list, and es02 is indeed missing. That suggests that although the data may be indexed, it may carry an incorrect timestamp. Might it be the case that es02 is somehow an outlier with the timestamps it is setting?
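
One way to check that directly might be a max aggregation on timestamp per node, which shows the most recent document each node has written (an untested sketch along the lines of the earlier queries):

POST .monitoring-es-*/_search?filter_path=aggregations
{
  "size": 0,
  "query": {
    "term": {
      "type": "node_stats"
    }
  },
  "aggs": {
    "nodes": {
      "terms": {
        "field": "node_stats.node_id",
        "size": 100
      },
      "aggs": {
        "latest": {
          "max": {
            "field": "timestamp"
          }
        }
      }
    }
  }
}

A node whose latest timestamp sits well in the past (or the future) relative to the others would stand out immediately.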

Spot on! Restarted NTP and up it came. Sorry it turned out to be something so lame, and I appreciate the help getting it back.
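
(For anyone who lands here later: the exact commands depend on the distro, but the fix amounted to something like:)

systemctl restart ntpd    # or chronyd, depending on the distro
ntpq -p                   # confirm a peer is selected ('*') and offsets are sane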

That's great news! Glad things are working for you now. Thanks, @thealy

@Mike_Place and @chrisronline,

We are also facing a similar problem. Our cluster comprises 3 master, 3 coordinating, 4 hot, and 4 warm nodes on version 6.7.1. We upgraded one master and one warm node to 6.8.3, but those nodes are not visible in the Kibana monitoring console.

  1. Output of

    GET .monitoring-es-*/_search?filter_path=aggregations.node_ids
    {
      "size": 0,
      "query": {
        "term": {
          "type": {
            "value": "node_stats"
          }
        }
      },
      "aggs": {
        "node_ids": {
          "terms": {
            "field": "node_stats.node_id",
            "size": 50
          }
        }
      }
    }

is

{
  "aggregations" : {
    "node_ids" : {
      "doc_count_error_upper_bound" : 0,
      "sum_other_doc_count" : 0,
      "buckets" : [
        {
          "key" : "XZFg8VpuRS2WIse9KBIvMA",
          "doc_count" : 51900
        },
        {
          "key" : "ebvjKCrhQvK8581Uvr8Gew",
          "doc_count" : 51900
        },
        {
          "key" : "xDtX5E1JRnu2BNv8zHe8bQ",
          "doc_count" : 51900
        },
        {
          "key" : "yvOSip5mQRG3uThwezXGBg",
          "doc_count" : 51899
        },
        {
          "key" : "GdXTW8qmR2aDXclXrQAgRw",
          "doc_count" : 51897
        },
        {
          "key" : "Qfpu2zrJTHaP_ngUNLZ7Jw",
          "doc_count" : 51897
        },
        {
          "key" : "A8Qe1_q3QUqYPsgw3jcz5g",
          "doc_count" : 51896
        },
        {
          "key" : "2OazDzlySLWXeTlNlkQ-cA",
          "doc_count" : 51895
        },
        {
          "key" : "XYIU3i3cRw-C7jAdDRj17g",
          "doc_count" : 51895
        },
        {
          "key" : "gdoeR2EzTvCuE4RDeeb_aA",
          "doc_count" : 51895
        },
        {
          "key" : "djCtwvYuSgWcgQ2DnbvitQ",
          "doc_count" : 51893
        },
        {
          "key" : "JNSXpU_1SP-nYn5lxWxZXQ",
          "doc_count" : 51890
        },
        {
          "key" : "a5_1IeuxSji6QYlo3SHQ6Q",
          "doc_count" : 44169
        },
        {
          "key" : "m_7bmIwbRoi-j80tDms2lw",
          "doc_count" : 19446
        }
      ]
    }
  }
}
  2. Output of

    POST .monitoring-es-*/_search
    {
      "size": 0,
      "query": {
        "bool": {
          "filter": [
            {
              "term": {
                "type": "node_stats"
              }
            },
            {
              "range": {
                "timestamp": {
                  "gte": "now-1h"
                }
              }
            }
          ]
        }
      },
      "aggs": {
        "nodes": {
          "terms": {
            "field": "node_stats.node_id",
            "size": 100
          },
          "aggs": {
            "timestamp": {
              "terms": {
                "field": "timestamp",
                "size": 100
              }
            }
          }
        }
      }
    }

is

Please suggest.

We upgrade one master and one warm node to 6.8.3

What are the node ids for these particular nodes? Do you see them in either output?
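
If it helps, the cat nodes API can print the full node ID next to each node name, which makes it easy to match the aggregation keys above to specific machines:

GET /_cat/nodes?v&h=id,name&full_id=true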