Elasticsearch monitoring stopped abruptly

We have an ELK cluster in Kubernetes with monitoring enabled for Elasticsearch. It was working fine, but today it stopped working.
Please find the monitoring chart of Kibana below.

The monitoring index for today (.monitoring-es-7-2020.05.29) exists, but it shows little activity. Yesterday's index received about 1.2k docs per minute, but today's receives only about 40 docs per minute and no chart is shown.

Is there a way to check the monitoring status?
Where should I be looking for issues?

Hi @tamilsweet,

Are there any errors in the Kibana or Elasticsearch server logs?

Can you also run this query against the monitoring cluster and return the results?

POST .monitoring-es-*/_search
{
  "size": 0,
  "aggs": {
    "types": {
      "terms": {
        "field": "type",
        "size": 10
      },
      "aggs": {
        "index": {
          "terms": {
            "field": "_index",
            "size": 10
          },
          "aggs": {
            "times": {
              "max": {
                "field": "timestamp"
              }
            }
          }
        }
      }
    }
  }
}
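Once the query returns, the nested buckets can be flattened to show, per document type and index, how many documents exist and when the most recent one arrived. Below is a minimal Python sketch, assuming the JSON response has already been loaded into a dict. The helper name `latest_by_type` and the sample values are illustrative only and are not taken from this thread.

```python
def latest_by_type(response):
    """Flatten the types -> index -> times aggregation into a list of rows."""
    rows = []
    for type_bucket in response["aggregations"]["types"]["buckets"]:
        for index_bucket in type_bucket["index"]["buckets"]:
            times = index_bucket["times"]
            rows.append({
                "type": type_bucket["key"],
                "index": index_bucket["key"],
                "docs": index_bucket["doc_count"],
                # A max aggregation on a date field reports epoch millis in
                # "value" and usually a formatted "value_as_string" too.
                "latest": times.get("value_as_string", times.get("value")),
            })
    return rows


# Made-up sample shaped like the aggregation response (illustration only).
sample = {
    "aggregations": {
        "types": {
            "buckets": [
                {
                    "key": "node_stats",
                    "doc_count": 300,
                    "index": {
                        "buckets": [
                            {
                                "key": ".monitoring-es-7-2020.05.29",
                                "doc_count": 300,
                                "times": {
                                    "value": 1590729600000.0,
                                    "value_as_string": "2020-05-29T05:20:00.000Z",
                                },
                            }
                        ]
                    },
                }
            ]
        }
    }
}

for row in latest_by_type(sample):
    print(row)
```

A type whose latest timestamp is far behind the others, or that is missing for the current day's index, would point at which collector stopped reporting.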

@chrisronline Thanks for your response.
I did not see any errors in Kibana or Elasticsearch server logs.
The monitoring recovered by itself after some time.

I had to create a gist because of the post character limit: https://gist.github.com/tamilsweet/b4dccda685d63045b6cfa7f874be34de

You can see that the counts on the 29th are lower than on other days.
The data loss lasted from 2020-05-28T22:45:00.000Z to 2020-05-29T05:20:00.000Z.

I want to understand what caused this break in stats, but I also understand that with the data we have, it's nearly impossible to figure out the reason.
I really appreciate your help.
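To pin down the boundaries of a gap like this one, the monitoring documents can be counted per minute with a `date_histogram` aggregation on the same `timestamp` field used in the query earlier in this thread. The Python sketch below only builds the request body and scans a response for quiet minutes; it does not contact a cluster. The function names, the slightly widened time window, and the threshold of 100 docs per minute are assumptions chosen for illustration, and `fixed_interval` assumes Elasticsearch 7.2 or later.

```python
import json


def per_minute_query(start, end):
    """Build a search body that counts documents per minute in [start, end]."""
    return {
        "size": 0,
        "query": {"range": {"timestamp": {"gte": start, "lte": end}}},
        "aggs": {
            "per_minute": {
                "date_histogram": {
                    "field": "timestamp",
                    "fixed_interval": "1m",  # assumes 7.2+
                    "min_doc_count": 0,
                }
            }
        },
    }


def quiet_minutes(response, threshold):
    """Return the bucket timestamps whose doc_count is below threshold."""
    buckets = response["aggregations"]["per_minute"]["buckets"]
    return [b["key_as_string"] for b in buckets if b["doc_count"] < threshold]


# Window widened a little around the reported outage (assumption).
body = per_minute_query("2020-05-28T22:00:00.000Z", "2020-05-29T06:00:00.000Z")
print(json.dumps(body, indent=2))
```

The printed body can be sent as `POST .monitoring-es-*/_search`, and the response passed to `quiet_minutes(response, 100)` to list the minutes that fell well below the usual rate.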

Yeah, that's interesting. I'm glad it cleared up, but I wonder if there was any backpressure causing indexing issues during the outage. However, if the Elasticsearch logs don't reveal anything, I'm not sure what steps we can take to figure it out. Please keep an eye on it and let us know if it happens again.
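One place where backpressure may leave a trace is the `rejected` counter of the `write` thread pool, which is reported per node by `GET _nodes/stats/thread_pool`. Below is a minimal Python sketch, assuming that response has been loaded into a dict; the helper name `write_rejections` and the sample node are hypothetical. The counters are cumulative since each node started, so a nonzero value shows that rejections happened at some point, not when, and a restart resets them.

```python
def write_rejections(nodes_stats):
    """Map each node name to its cumulative write thread pool rejections."""
    result = {}
    for node_id, node in nodes_stats["nodes"].items():
        write_pool = node.get("thread_pool", {}).get("write", {})
        # Fall back to the node id if the name is not present.
        result[node.get("name", node_id)] = write_pool.get("rejected", 0)
    return result


# Made-up sample shaped like a nodes stats response (illustration only).
sample_stats = {
    "nodes": {
        "abc123": {
            "name": "es-data-0",
            "thread_pool": {"write": {"queue": 0, "rejected": 17}},
        }
    }
}

print(write_rejections(sample_stats))
```

The check is most useful against the cluster that stores the `.monitoring-es-*` indices, since that is where the monitoring documents are written.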

Sure, thanks @chrisronline
