System.process.cpu.total.pct Output Clarifications

Hi -
I tried to get the total CPU percentage of a specific process using the Ruby client, with AVG and MAX aggregations.

For the average aggregation I got 0.0834, which is also displayed on the Kibana process dashboard as 8.33 (multiplied by 100).

When I tried the MAX aggregation for the same process, I got 9.434.

I'd like to know how the final value is displayed. Does Kibana multiply each aggregated output by 100, or does it depend on the decimals in the output? Please let me know. Thanks!

Hi @prakash1243,

Whenever I look at aggregations in Kibana, I like to take a step back and think about what they're doing in Elasticsearch. Given your post's title, I'm guessing you have a pretty complete document, possibly from Metricbeat.

I imagine you have your data in a document, something like this (simplified, made up document):

{
  "@timestamp": "2017-04-19T15:43:00.123Z",
  "host": "xyz",
  "ip": "1.2.3.4",
  "process": "myprocess",
  "ram_bytes": 123456,
  "cpu_pct": 0.0123
}

The CPU being 0.0123 represents 1.23% utilization. If I had a bunch of these documents, then it would make sense to aggregate them to figure out what is happening for the last 15 minutes and only that process, myprocess:

GET /my-index-*/_search
{
  "size": 0,
  "query": {
    "bool": {
      "filter": [
        {
          "match": {
            "process": "myprocess"
          }
        },
        {
          "range": {
            "@timestamp": {
              "gte": "now-15m"
            }
          }
        }
      ]
    }
  },
  "aggs": {
    "average_cpu": {
      "avg": {
        "field": "cpu_pct"
      }
    },
    "max_cpu": {
      "max": {
        "field": "cpu_pct"
      }
    }
  }
}

(In reality, if you were doing this directly in Elasticsearch, it would be more efficient to use a single stats aggregation rather than two separate ones.)
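As a rough sketch of that idea, the two aggregations could be collapsed into one stats aggregation, which returns count, min, max, avg and sum together. The example below only builds the request body as a plain Python dict. The helper name `build_stats_request`, the aggregation name `cpu_stats` and the default window are made up for illustration, and the field names come from the simplified document above.

```python
import json


def build_stats_request(process, field="cpu_pct", window="now-15m"):
    """Build a search body that filters to one process and a recent time
    window, then computes count/min/max/avg/sum with one stats aggregation."""
    return {
        "size": 0,  # only the aggregation is needed, not the hits
        "query": {
            "bool": {
                "filter": [
                    {"match": {"process": process}},
                    {"range": {"@timestamp": {"gte": window}}},
                ]
            }
        },
        "aggs": {
            # hypothetical aggregation name; stats replaces avg + max
            "cpu_stats": {"stats": {"field": field}}
        },
    }


body = build_stats_request("myprocess")
print(json.dumps(body, indent=2))
```

With a body like this, the average and the maximum would come back side by side under `aggregations.cpu_stats.avg` and `aggregations.cpu_stats.max`, still as raw values.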

At this point, you'd get back two aggregated metrics: the average and the maximum of the CPU. Both of those values would be the raw calculation over the matching documents. Neither the average nor the maximum would be pre-multiplied by 100, because Elasticsearch does not know that this field is actually a percentage.

On the other hand, Kibana can be configured to format numbers specially, for example with the Percentage format, in which case Kibana knows to multiply the number by 100. The deciding factor is the format you tell Kibana to use: it will never automatically format a number without some way to know how. You can control this at the visualization level and within the index pattern.
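To make the multiplication concrete, here is a small sketch of what a percentage formatter does with the raw value. It is not Kibana's actual code, just the same arithmetic in Python, and `as_percent` is a made-up helper.

```python
def as_percent(ratio, decimals=2):
    """Format a raw ratio (0.0123) as a percentage string (1.23%).

    The stored value is untouched; only the display string changes.
    """
    # The '%' format spec multiplies by 100 and appends a percent sign.
    return f"{ratio:.{decimals}%}"


raw_cpu = 0.0123  # value as stored in the simplified document above
print(raw_cpu)              # raw number, as Elasticsearch returns it
print(as_percent(raw_cpu))  # 1.23%
```

The point is that the same raw number can be shown either way; which one you see depends only on the format configured for the field.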

Perhaps more interestingly, you can see exactly what search/aggregation Kibana sends to Elasticsearch by going under the visualization and clicking on the "Request" and "Response" buttons. You can copy/paste the request and look at it in Console to see how it really works, or just look at the response there.

Hope that helps,
Chris

@pickypg That really helps.

Here is the query I had executed:

GET /metricbeat-2017.04.20/_search
{
  "size": 0,
  "query": {
    "bool": {
      "filter": [
        {
          "match": {
            "system.process.name": "metricbeat.exe"
          }
        },
        {
          "range": {
            "@timestamp": {
              "gte": "1492639200000",
              "lte": "1492725599999"
            }
          }
        }
      ]
    }
  },
  "aggs": {
    "average_cpu": {
      "avg": {
        "field": "system.process.cpu.total.pct"
      }
    },
    "max_cpu": {
      "max": {
        "field": "system.process.cpu.total.pct"
      }
    }
  }
}

And here is the output I received:

"aggregations": {
    "average_cpu": {
      "value": 0.02864462686567158
    },
    "max_cpu": {
      "value": 0.776
    }
  }
}

The output is the raw data from Elasticsearch. I've just noticed that the field is formatted as a percentage in the index pattern in Kibana, so Kibana does multiply it when showing it on the dashlets.

As I understand it, metricbeat peaked at 0.77% during the whole run, with an average of 0.028? We are analyzing the CPU usage of the beats over different time intervals.

Please let me know. Thanks !

As I understand it, metricbeat peaked at 0.77% during the whole run, with an average of 0.028? We are analyzing the CPU usage of the beats over different time intervals.

77.6%, but otherwise correct: some document in that time range with that process had a value of 0.776 (77.6%). You could even use the max_cpu value to find the document in question.

Keep in mind that floating point numbers are always a little sketchy in terms of precision, so I would never try to search for a floating point number's exact value. This one, ironically, would probably be fine, but if the stated average had been the max, I would definitely never look for that exact number. Instead, I would search for a range:

GET /my-index-*/_search
{
  "query": {
    "bool": {
      "filter": [
        {
          "range": {
            "system.process.cpu.total.pct": {
              "gte": 0.775
            }
          }
        },
        {
          "match": {
            "system.process.name": "metricbeat.exe"
          }
        },
        {
          "range": {
            "@timestamp": {
              "gte": "1492639200000",
              "lte": "1492725599999"
            }
          }
        }
      ]
    }
  }
}

That should give you back the document with the maximum value so you can see its structure. If you could, please post it here because I think it may further help with the understanding.
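Here is a rough Python sketch of why a range is safer than an exact match on a float. The helper `range_around` and its tolerance are assumptions for illustration; the field name is the one from the query above.

```python
import math

# Classic example: the sum is not exactly the literal you would type in.
total = 0.1 + 0.2
print(total == 0.3)              # False
print(math.isclose(total, 0.3))  # True


def range_around(field, value, tolerance=0.0005):
    """Build a range filter that brackets `value` instead of matching it
    exactly. The tolerance is a hypothetical choice: half of the last
    reported decimal place for a value such as 0.776."""
    return {
        "range": {
            field: {
                "gte": value - tolerance,
                "lte": value + tolerance,
            }
        }
    }


cpu_filter = range_around("system.process.cpu.total.pct", 0.776)
print(cpu_filter)
```

A two-sided range like this is a little tighter than the open-ended `gte` in the query above, which is fine when you are hunting for the maximum anyway.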

Hope that helps,
Chris

Hi Chris @pickypg

I had deleted the index with the above data, but I scheduled the beats again and captured the output and the documents as well. Here are the details:

Max and Average values:

 "aggregations": {
    "average_cpu": {
      "value": 0.00013767176562094898
    },
    "max_cpu": {
      "value": 0.094
    }
  }

I then ran your query above with 0.094 as the filter, and here are the documents I received:

{
  "took": 1,
  "timed_out": false,
  "_shards": {
    "total": 5,
    "successful": 5,
    "failed": 0
  },
  "hits": {
    "total": 3,
    "max_score": 0,
    "hits": [
      {
        "_index": "metricbeat-6.0.0-alpha1-2017.04.20",
        "_type": "doc",
        "_id": "AVuKJ0tAdFW3h5XRRlha",
        "_score": 0,
        "_source": {
          "@timestamp": "2017-04-20T07:52:00.923Z",
          "beat": {
            "hostname": "HPPP-01-DL20-Secondary",
            "name": "HPPP-01-DL20-Secondary",
            "version": "6.0.0-alpha1"
          },
          "metricset": {
            "module": "system",
            "name": "process",
            "rtt": 3999
          },
          "system": {
            "process": {
              "cmdline": "\"C:\\Program Files\\metricbeat\\\\metricbeat.exe\" -c \"C:\\Program Files\\metricbeat\\\\metricbeat.yml\" -path.home \"C:\\Program Files\\metricbeat\" -path.data \"C:\\\\ProgramData\\\\metricbeat\"",
              "cpu": {
                "start_time": "2017-04-20T06:52:38.615Z",
                "total": {
                  "pct": 0.0943
                }
              },
              "memory": {
                "rss": {
                  "bytes": 23928832,
                  "pct": 0.0007
                },
                "share": 0,
                "size": 14684160
              },
              "name": "metricbeat.exe",
              "pgid": 0,
              "pid": 4536,
              "ppid": 568,
              "state": "running",
              "username": "NT AUTHORITY\\SYSTEM"
            }
          },
          "type": "metricsets"
        }
      },
      {
        "_index": "metricbeat-6.0.0-alpha1-2017.04.20",
        "_type": "doc",
        "_id": "AVuKSXVUdFW3h5XRRqoG",
        "_score": 0,
        "_source": {
          "@timestamp": "2017-04-20T08:29:19.921Z",
          "beat": {
            "hostname": "HPPP-01-DL20-Secondary",
            "name": "HPPP-01-DL20-Secondary",
            "version": "6.0.0-alpha1"
          },
          "metricset": {
            "module": "system",
            "name": "process",
            "rtt": 3926
          },
          "system": {
            "process": {
              "cmdline": "\"C:\\Program Files\\metricbeat\\\\metricbeat.exe\" -c \"C:\\Program Files\\metricbeat\\\\metricbeat.yml\" -path.home \"C:\\Program Files\\metricbeat\" -path.data \"C:\\\\ProgramData\\\\metricbeat\"",
              "cpu": {
                "start_time": "2017-04-20T06:52:38.615Z",
                "total": {
                  "pct": 0.094
                }
              },
              "memory": {
                "rss": {
                  "bytes": 24174592,
                  "pct": 0.0007
                },
                "share": 0,
                "size": 14684160
              },
              "name": "metricbeat.exe",
              "pgid": 0,
              "pid": 4536,
              "ppid": 568,
              "state": "running",
              "username": "NT AUTHORITY\\SYSTEM"
            }
          },
          "type": "metricsets"
        }
      },
      {
        "_index": "metricbeat-6.0.0-alpha1-2017.04.20",
        "_type": "doc",
        "_id": "AVuKffLtdFW3h5XRRycS",
        "_score": 0,
        "_source": {
          "@timestamp": "2017-04-20T09:26:39.939Z",
          "beat": {
            "hostname": "HPPP-01-DL20-Secondary",
            "name": "HPPP-01-DL20-Secondary",
            "version": "6.0.0-alpha1"
          },
          "metricset": {
            "module": "system",
            "name": "process",
            "rtt": 3899
          },
          "system": {
            "process": {
              "cmdline": "\"C:\\Program Files\\metricbeat\\\\metricbeat.exe\" -c \"C:\\Program Files\\metricbeat\\\\metricbeat.yml\" -path.home \"C:\\Program Files\\metricbeat\" -path.data \"C:\\\\ProgramData\\\\metricbeat\"",
              "cpu": {
                "start_time": "2017-04-20T06:52:38.615Z",
                "total": {
                  "pct": 0.0942
                }
              },
              "memory": {
                "rss": {
                  "bytes": 24285184,
                  "pct": 0.0007
                },
                "share": 0,
                "size": 14684160
              },
              "name": "metricbeat.exe",
              "pgid": 0,
              "pid": 4536,
              "ppid": 568,
              "state": "running",
              "username": "NT AUTHORITY\\SYSTEM"
            }
          },
          "type": "metricsets"
        }
      }
    ]
  }
}

Excellent, thanks.

So a few things are going on here that I wanted to show from real documents:

Kibana would be limiting the request to some time span, which explains why it was 0.094 and not one of the other values.

Elasticsearch uses the query to limit the hits to whatever is relevant. From there, the aggregations only handle the documents that are within that view, or search context. So, when you ask for the max, it's a matter of only looking at these few documents and plucking out the raw value, which happened to be 0.094 when you ran the search + aggregation.

The average follows the exact same flow, but it has to do more because it adds up all of the values then divides by the count.
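The same flow can be sketched in a few lines of Python, using the three `pct` values and timestamps from the hits posted above (milliseconds dropped). The `aggregate` helper and the one-hour window are hypothetical; the sketch only mimics the filter-then-aggregate order, not how Elasticsearch does it internally.

```python
from datetime import datetime

# (timestamp, pct) pairs taken from the three hits posted above
docs = [
    (datetime(2017, 4, 20, 7, 52, 0), 0.0943),
    (datetime(2017, 4, 20, 8, 29, 19), 0.094),
    (datetime(2017, 4, 20, 9, 26, 39), 0.0942),
]


def aggregate(docs, start, end):
    """Mimic query-then-aggregate: keep the values inside [start, end],
    then compute max and avg over only those values."""
    values = [pct for ts, pct in docs if start <= ts <= end]
    if not values:
        return None  # nothing in the search context, nothing to aggregate
    return {"max": max(values), "avg": sum(values) / len(values)}


# Hypothetical one-hour window that contains only the middle document
narrow = aggregate(docs, datetime(2017, 4, 20, 8, 0), datetime(2017, 4, 20, 9, 0))
# Whole day, so all three documents are in scope
wide = aggregate(docs, datetime(2017, 4, 20), datetime(2017, 4, 21))

print(narrow)
print(wide)
```

With the narrow window only the 0.094 document is in scope, so that is the max; widening the window lets the 0.0943 document in and the max changes with it.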

Hope that helps,
Chris

Okies, Got you. Thanks !
