Remote cluster monitoring: creating alerts with percentage values

Good day everyone. Lately I've been trying to create alarms on remotely connected clusters, using the following command on the clusters I connect:

The connection part has been successful so far, but I'm posting it to give better context.

The problem I've been running into is that I can't find CPU and disk percentage values for the clusters I'm monitoring through the stack, using either .monitoring* or monitoring-es-*.

The values are not there, but if I go to Stack Monitoring I can see that said values are arriving.

my question is: Where am I able to find these values? Am I searching for these values in the wrong place?

I have tried looking for os.cpu.percentages in the stack, but the only thing close to a percentage I am able to find is the used JVM heap memory.


Hi @jugonzalezv ,

I can help!

What does this mean? You mentioned you were able to see values in Stack Monitoring, but where are you unable to see these values?


Hi!

I made an index pattern, .monitoring*, that receives all the metrics sent by the connected clusters. However, inside this index I'm unable to find the fields that contain the percentages.

thanks for answering!

Where are you not seeing them? In Discover? Or in the Metrics UI? Or are you querying against the index in Dev Tools?

I'm unable to find them whenever I do a cat mapping or search in Discover, but as I said before, the percent values do appear in Stack Monitoring.

thanks again for answering!

Some fields are stored within .monitoring-* indices but do not exist in the mapping as they are not used for any filtering or aggregations. They just exist in the source document. I don't think Discover will show these either, as they are not mapped.

If you do a

POST .monitoring-es-*/_search
{
  "size": 1,
  "query": {
    "term": {
      "type": {
        "value": "cluster_stats"
      }
    }
  }
}

in Dev Tools, you can see exactly what's in the document(s)
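
Because unmapped fields only live in the source document, one way to locate the percentage values is to walk a returned hit and list every dotted path whose name mentions "percent". The following is a minimal Python sketch, assuming the hit's _source has already been loaded as a dict. The helper name find_percent_fields and the sample document are hypothetical illustrations, not actual monitoring output.

```python
from typing import Any, Dict, Iterator, Tuple


def find_percent_fields(doc: Dict[str, Any], prefix: str = "") -> Iterator[Tuple[str, Any]]:
    """Yield (dotted_path, value) for every leaf whose key contains 'percent'.

    Only nested objects (dicts) are traversed; arrays are not descended into.
    """
    for key, value in doc.items():
        path = f"{prefix}.{key}" if prefix else key
        if isinstance(value, dict):
            # Recurse into nested objects, carrying the dotted path along.
            yield from find_percent_fields(value, path)
        elif "percent" in key:
            yield path, value


# Hypothetical, trimmed-down _source; real documents contain many more fields.
sample_source = {
    "type": "node_stats",
    "node_stats": {
        "process": {"cpu": {"percent": 3}},
        "jvm": {"mem": {"heap_used_percent": 41}},
        "fs": {"total": {"total_in_bytes": 1000, "available_in_bytes": 250}},
    },
}

percent_fields = dict(find_percent_fields(sample_source))
print(percent_fields)
```

Running the same helper over a real hit from the search above should show which percentage fields are present in the document even though they are missing from the mapping.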


Thanks!

I tried the command you gave me, and it solved one of the two problems I have.

But now, looking at CPU, the only value I'm getting is inside process and not inside os.

I don't think we report on overall OS cpu, as these metrics are for Elastic stack products only.

If you are looking for OS-level metrics, you should consider using Beats to get this information. Unfortunately, it will be in a different format than the monitoring data (and therefore will not show up in the Stack Monitoring UI), but you can ingest both and create a dashboard showing any information you'd like.

Sorry! I didn't mean to say that, I got confused.

I managed to find disk and memory percentages with the command you gave me, but this CPU usage value has been giving me headaches lately, as I'm unable to find it even with the instructions you gave me.

Thank you so much for the support you have given me so far, I've managed to learn new things!

That is sourced from either node_stats.process.cpu.percent within type: node_stats for .monitoring-es-* indices, or it could be using cgroup data which is calculated by looking at data within node_stats.os.cgroup
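
To look at only those fields, the earlier search can be narrowed to type node_stats with source filtering. The following is a rough Python sketch that just builds the request body, assuming it would then be sent to .monitoring-es-*/_search with whatever client is in use. The function name node_stats_cpu_query is hypothetical.

```python
import json
from typing import Any, Dict


def node_stats_cpu_query(size: int = 1) -> Dict[str, Any]:
    """Build a search body that returns only the CPU-related parts of node_stats docs."""
    return {
        "size": size,
        # Source filtering: keep just the fields mentioned in the reply above.
        "_source": [
            "node_stats.process.cpu.percent",
            "node_stats.os.cgroup",
        ],
        # Same term filter as the cluster_stats example, but for node_stats.
        "query": {"term": {"type": {"value": "node_stats"}}},
    }


query_body = node_stats_cpu_query()
print(json.dumps(query_body, indent=2))
```

The printed JSON can also be pasted into Dev Tools as the body of a POST .monitoring-es-*/_search request.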


I have tried node_stats.process.cpu.percent, but it always shows 0, so I thought the UI was showing a different value, and in cgroup it just shows usage in nanos.

Another question has arisen: I'm trying to create alarms, but because said values are not aggregations I can't compare them to a threshold. Is there any way for me to create alarms?

Are you running your ES nodes within containers? If so, this is expected, and you can roughly calculate CPU usage within containers as usage / (periods * quota). In the UI you linked above, we calculate rates of change, because these values are counters that always increase as long as the node is online, so make sure you use the derivative aggregation to get the same result as the UI.
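
The usage / (periods * quota) calculation can be sketched in Python from two consecutive samples of the counters. This is only an approximation under stated assumptions: that usage is reported in nanoseconds, that the quota is reported in microseconds (so it is multiplied by 1000 to match), and that the values come from fields such as node_stats.os.cgroup.cpuacct.usage_nanos, node_stats.os.cgroup.cpu.stat.number_of_elapsed_periods and node_stats.os.cgroup.cpu.cfs_quota_micros. The class and function names are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class CgroupSample:
    usage_nanos: int      # cumulative CPU time used, in nanoseconds (counter)
    elapsed_periods: int  # cumulative number of elapsed scheduling periods (counter)


def cgroup_cpu_percent(prev: CgroupSample, curr: CgroupSample, quota_micros: int) -> Optional[float]:
    """Approximate CPU usage as a percentage of the quota between two samples.

    Returns None when no rate can be computed, for example when a counter
    did not advance or when no quota is configured (non-positive quota).
    """
    usage_delta = curr.usage_nanos - prev.usage_nanos
    periods_delta = curr.elapsed_periods - prev.elapsed_periods
    if usage_delta <= 0 or periods_delta <= 0 or quota_micros <= 0:
        return None
    quota_nanos = quota_micros * 1000  # assumed unit conversion: microseconds to nanoseconds
    return 100.0 * usage_delta / (periods_delta * quota_nanos)


# Made-up sample values, purely for illustration.
previous = CgroupSample(usage_nanos=1_000_000_000, elapsed_periods=100)
current = CgroupSample(usage_nanos=1_500_000_000, elapsed_periods=200)
cpu_percent = cgroup_cpu_percent(previous, current, quota_micros=100_000)
print(cpu_percent)
```

Taking the difference between consecutive samples plays the same role here as the derivative aggregation does in a query: it turns ever-increasing counters into a rate.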


Sorry I haven't responded lately, it has been a busy couple of days. I understand the UI results now; that was a confusion on my part. As I asked earlier, I'm trying to create alarms, but since the fs field is not an aggregation, is it possible to compare it against a threshold to monitor my disk usage?

thanks again for the support you have given me thus far!
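
The thread does not answer the disk question, but one possible approach is to derive a percentage from the raw byte counts and compare that to a threshold outside of the mapping. The following is a minimal Python sketch, assuming the total and available byte counts have been read from the fs section of a monitoring document (for example fields like total_in_bytes and available_in_bytes, which is an assumption about the document layout). The function names are hypothetical.

```python
def disk_used_percent(total_bytes: int, available_bytes: int) -> float:
    """Return used disk space as a percentage of total capacity."""
    if total_bytes <= 0:
        raise ValueError("total_bytes must be positive")
    return 100.0 * (total_bytes - available_bytes) / total_bytes


def breaches_threshold(total_bytes: int, available_bytes: int, threshold_percent: float) -> bool:
    """True when disk usage is at or above the alert threshold."""
    return disk_used_percent(total_bytes, available_bytes) >= threshold_percent


# Made-up byte counts, purely for illustration.
used = disk_used_percent(total_bytes=1000, available_bytes=250)
alert = breaches_threshold(total_bytes=1000, available_bytes=250, threshold_percent=80.0)
print(used, alert)
```

The same arithmetic could be expressed inside whatever alerting mechanism is in use; this sketch only shows the calculation itself.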
