Remote cluster monitoring creating alerts with percentage values

jugonzalezv · April 7, 2021, 1:55pm

Good day, everyone. Lately I've been trying to create alarms on remotely connected clusters, using the following command on the clusters I connect:

The connection part has been successful so far, but I'm posting it to give better context.

The problem I've been running into is that when I try to find CPU and disk percentage values for the clusters I'm monitoring, using either .monitoring* or monitoring-es-*, the values are not there.

However, if I go to Stack Monitoring, I can see that those values are arriving.

My question is: where can I find these values? Am I searching for them in the wrong place?

I have tried looking for os.cpu.percentages, but the only thing close to a percentage that I can find is the used JVM heap memory.

chrisronline · April 7, 2021, 4:22pm

Hi @jugonzalezv ,

I can help!

What does this mean? You mentioned you were able to see the values in Stack Monitoring, but where are you unable to see them?

jugonzalezv · April 7, 2021, 5:55pm

Hi!

I made an index pattern, .monitoring*, that covers all the metrics sent by the connected clusters; however, inside this index I'm unable to find the fields that contain the percentages.

Thanks for answering!

chrisronline · April 7, 2021, 6:07pm

Where are you not seeing them? In Discover? In the Metrics UI? Or are you querying the index in Dev Tools?

jugonzalezv · April 7, 2021, 6:42pm

I'm unable to find them when I do a cat mapping or search in Discover, but as I said before, the percentage values do appear in Stack Monitoring.

Thanks again for answering!

chrisronline · April 7, 2021, 7:12pm

Some fields are stored within .monitoring-* indices but do not exist in the mapping as they are not used for any filtering or aggregations. They just exist in the source document. I don't think Discover will show these either, as they are not mapped.

If you run

POST .monitoring-es-*/_search
{
  "size": 1,
  "query": {
    "term": {
      "type": {
        "value": "cluster_stats"
      }
    }
  }
}

in Dev Tools, you can see exactly what's in the document(s).
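Because these fields exist only in `_source` and not in the mapping, one way to read them outside Kibana is to fetch hits with a search like the one above and walk the JSON yourself. The following is a minimal Python sketch, assuming the search response has already been parsed into a dict (however you choose to send the request). The helper name `get_source_field` and the sample hit are hypothetical, and the sample only mimics the nesting of a monitoring document.

```python
from typing import Any, Mapping


def get_source_field(hit: Mapping[str, Any], dotted_path: str, default: Any = None) -> Any:
    """Walk a dotted path (e.g. "node_stats.process.cpu.percent") through a
    search hit's _source. This works for fields that are stored in _source
    even when they are not present in the index mapping."""
    value: Any = hit.get("_source", {})
    for key in dotted_path.split("."):
        # Stop as soon as the path leaves the nested objects or a key is missing.
        if not isinstance(value, Mapping) or key not in value:
            return default
        value = value[key]
    return value


# Hypothetical hit, shaped like one entry of response["hits"]["hits"].
hit = {
    "_source": {
        "type": "node_stats",
        "node_stats": {"process": {"cpu": {"percent": 3}}},
    }
}
print(get_source_field(hit, "node_stats.process.cpu.percent"))  # 3
print(get_source_field(hit, "node_stats.os.cgroup"))  # None
```

Returning a default instead of raising keeps the helper usable across documents of different `type` values, which do not all carry the same sections.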

jugonzalezv · April 7, 2021, 7:59pm

Thanks!

I tried the command you gave me, and it solved one of my two problems.

But now, looking at CPU, the only value I'm getting is under process, not under os.

chrisronline · April 8, 2021, 1:56pm

I don't think we report overall OS CPU, as these metrics are for Elastic Stack products only.

If you are looking for OS-level metrics, you should consider using Beats to collect this information. Unfortunately, it will be in a different format than the monitoring data (and therefore will not show up in the Stack Monitoring UI), but you can ingest both and create a dashboard showing any information you'd like.

jugonzalezv · April 8, 2021, 5:58pm

Sorry! I didn't mean to say that; I got confused.

I managed to find disk and memory percentages with the command you gave me, but the CPU usage value has been giving me headaches lately, as I'm unable to find it even with your instructions.

thank you so much with the support you have given me so far, I've managed to learn new things!

chrisronline · April 8, 2021, 6:30pm

That is sourced either from node_stats.process.cpu.percent, within type: node_stats documents in the .monitoring-es-* indices, or from cgroup data, in which case it is calculated from the values within node_stats.os.cgroup.

jugonzalezv · April 8, 2021, 7:19pm

I have tried node_stats.process.cpu.percent, but it always shows 0, so I thought the UI was showing a different value; and in cgroup it only shows usage in nanoseconds.

jugonzalezv · April 9, 2021, 4:35am

Another question has arisen: I'm trying to create alarms, but because these values are not mapped, I can't aggregate them and compare them to a threshold. Is there any way for me to create alarms?

chrisronline · April 9, 2021, 1:51pm

Are you running your ES nodes within containers? If so, this is expected, and you can roughly calculate CPU usage within containers as usage / (periods * quota). In the UI you linked above, we calculate rates of change, because these values are counters that keep increasing as long as the node is online, so make sure you use the derivative aggregation to get the same result as the UI.
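A rough Python sketch of that calculation, assuming you take two consecutive node_stats documents for the same node and read `node_stats.os.cgroup.cpuacct.usage_nanos`, `node_stats.os.cgroup.cpu.stat.number_of_elapsed_periods` and `node_stats.os.cgroup.cpu.cfs_quota_micros` from them (paths based on the `os.cgroup` section of the Elasticsearch node stats; check them against your own documents). Subtracting the earlier sample from the later one plays the role of the derivative. Since usage is reported in nanoseconds and the quota in microseconds, the quota is multiplied by 1000 before dividing. The function name is hypothetical, and the result is an approximation that may not match the Stack Monitoring UI exactly.

```python
from typing import Optional


def cgroup_cpu_percent(
    prev_usage_nanos: int,
    curr_usage_nanos: int,
    prev_periods: int,
    curr_periods: int,
    quota_micros: int,
) -> Optional[float]:
    """Approximate container CPU usage (%) between two samples.

    usage_nanos and number_of_elapsed_periods are counters, so only the
    difference between two samples is meaningful (the same idea as a
    derivative aggregation over consecutive buckets).
    """
    if quota_micros <= 0:
        # -1 means no CFS quota is set, so there is no limit to divide by.
        return None
    usage_delta = curr_usage_nanos - prev_usage_nanos
    periods_delta = curr_periods - prev_periods
    if periods_delta <= 0 or usage_delta < 0:
        # No elapsed periods, or a counter reset (for example a node restart).
        return None
    # Convert the quota from microseconds to nanoseconds to match usage.
    return usage_delta / (periods_delta * quota_micros * 1000) * 100


# Hypothetical samples: 100 elapsed periods, a quota of 100 000 µs per period,
# and 5 s (5e9 ns) of CPU time consumed between the two samples.
print(cgroup_cpu_percent(0, 5_000_000_000, 0, 100, 100_000))  # 50.0
```

Returning `None` rather than 0 for the edge cases keeps "no data" distinguishable from "idle" when the value is later compared against a threshold.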

jugonzalezv · April 21, 2021, 8:46pm

Sorry I haven't responded lately; it has been a busy couple of days. I now understand the UI results; the confusion was on my side. As I asked earlier, I'm trying to create alarms, but since the fs field is not aggregatable, is it possible to use it as a threshold to monitor my disk usage?
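One possible approach, not confirmed in this thread, is to compute the percentage yourself from the byte counts in each node_stats document and compare it to a threshold in whatever process runs your check. The following is a minimal Python sketch, assuming the document's `_source` contains `node_stats.fs.total.total_in_bytes` and `node_stats.fs.total.available_in_bytes` (verify these against your own documents). The function names and the sample document are hypothetical.

```python
from typing import Any, Mapping


def disk_used_percent(source: Mapping[str, Any]) -> float:
    """Used disk space as a percentage, computed from a node_stats _source.
    Assumes the node_stats.fs.total.{total_in_bytes,available_in_bytes} fields."""
    fs_total = source["node_stats"]["fs"]["total"]
    total = fs_total["total_in_bytes"]
    available = fs_total["available_in_bytes"]
    return (1 - available / total) * 100


def over_threshold(source: Mapping[str, Any], threshold_percent: float) -> bool:
    """True when disk usage meets or exceeds the alert threshold."""
    return disk_used_percent(source) >= threshold_percent


# Hypothetical document: 1000 bytes total, 150 bytes available.
doc = {"node_stats": {"fs": {"total": {"total_in_bytes": 1000, "available_in_bytes": 150}}}}
print(round(disk_used_percent(doc), 1))  # 85.0
print(over_threshold(doc, 80))  # True
```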

thanks again for the support you have given me thus far!

system · May 19, 2021, 8:47pm

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.

Related topics

Topic | Category and tags | Replies | Views | Activity
Kibana missing cpu stats | Elasticsearch, elastic-stack-monitoring | 5 | 292 | October 27, 2021
Stats os.cpu.percent is -1 | Elasticsearch | 2 | 664 | October 7, 2019
Guidance setting up Kibana alarms on other cluster's metrics | Kibana, elastic-stack-monitoring, elastic-stack-alerting | 4 | 599 | March 18, 2021
Stack-monitoring | Kibana, elastic-stack-monitoring | 5 | 383 | April 21, 2022
Kibana watcher sending alert cpu utilisation more than 90% | Elasticsearch, elastic-stack-monitoring, elastic-stack-alerting | 3 | 802 | June 2, 2022