How to use select bucket aggregation

wifi · September 28, 2020, 12:26pm

I want to get a list of processes that utilize more than 1% CPU. So I created a nested bucket aggregation query which ends in an avg aggregation:

GET metricbeat-*/_search
{
  "aggs": {
    "host": {
      "terms": {
        "field": "agent.hostname"
      },
      "aggs": {
        "user": {
          "terms": {
            "field": "user.name"
          },
          "aggs": {
            "process": {
              "terms": {
                "field": "process.name"
              },
              "aggs": {
                "pid": {
                  "terms": {
                    "field": "process.pid"
                  },
                  "aggs": {
                    "cpu": {
                      "avg": {
                        "field": "system.process.cpu.total.pct"
                      }
                    },
                    "cpu_bucket_selector": {
                      "bucket_selector": {
                        "buckets_path": {
                          "avg_cpu": "cpu"
                        },
                        "script": "params.avg_cpu > 0.01"
                      }
                    }
                  }
                }
              }
            }
          }
        }
      }
    }
  },
  "size": 0,
  "stored_fields": [
    "*"
  ],
  "script_fields": {},
  "docvalue_fields": [
    {
      "field": "@timestamp",
      "format": "date_time"
    }
  ],
  "query": {
    "bool": {
      "must": [],
      "filter": [
        {
          "match_all": {}
        },
        {
          "range": {
            "@timestamp": {
              "gte": "now-5m/s"
            }
          }
        }
      ],
      "should": [],
      "must_not": []
    }
  }
}

Values below 1% are filtered out, but this also leaves parent buckets whose "pid" bucket list is empty in the results. I tried to define the bucket selector at the top level, next to the first parent aggregation, like this:

GET metricbeat-*/_search
{
  "aggs": {
    "host": {
      "terms": {
        "field": "agent.hostname"
      },
      "aggs": {
        "user": {
          "terms": {
            "field": "user.name"
          },
          "aggs": {
            "process": {
              "terms": {
                "field": "process.name"
              },
              "aggs": {
                "pid": {
                  "terms": {
                    "field": "process.pid"
                  },
                  "aggs": {
                    "cpu": {
                      "avg": {
                        "field": "system.process.cpu.total.pct"
                      }
                    }
                  }
                }
              }
            }
          }
        }
      }
    },
    "cpu_bucket_selector": {
      "bucket_selector": {
        "buckets_path": {
          "avg_cpu": "host>user>process>pid>cpu"
        },
        "script": "params.avg_cpu > 0.01"
      }
    }
  },
  "size": 0,
  "stored_fields": [
    "*"
  ],
  "script_fields": {},
  "docvalue_fields": [
    {
      "field": "@timestamp",
      "format": "date_time"
    }
  ],
  "query": {
    "bool": {
      "must": [],
      "filter": [
        {
          "match_all": {}
        },
        {
          "range": {
            "@timestamp": {
              "gte": "now-5m/s"
            }
          }
        }
      ],
      "should": [],
      "must_not": []
    }
  }
}

But this gives me an error:

{
  "error" : {
    "root_cause" : [
      {
        "type" : "action_request_validation_exception",
        "reason" : "Validation Failed: 1: bucket_selector aggregation [cpu_bucket_selector] must be declared inside of another aggregation;"
      }
    ],
    "type" : "action_request_validation_exception",
    "reason" : "Validation Failed: 1: bucket_selector aggregation [cpu_bucket_selector] must be declared inside of another aggregation;"
  },
  "status" : 400
}

What's the issue, and am I even on the right path?

spinscale · September 29, 2020, 9:59am

Just to make sure I understand the request: you are searching for any data where the pct is greater than 0.01? If so, why not put this condition into the search request instead of using an aggregation for it? That would be much more efficient, as far fewer documents would be matched and need to be processed. You could still run an aggregation on the process name/pid on top of that data.
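A minimal sketch of that idea, assuming the same index pattern and field names as in the question, sent as the body of GET metricbeat-*/_search. Note that it filters individual samples above 0.01 rather than the average over the time window, so it answers a slightly different question than the bucket selector does:

```json
{
  "size": 0,
  "query": {
    "bool": {
      "filter": [
        { "range": { "@timestamp": { "gte": "now-5m/s" } } },
        { "range": { "system.process.cpu.total.pct": { "gt": 0.01 } } }
      ]
    }
  },
  "aggs": {
    "process": {
      "terms": { "field": "process.name" },
      "aggs": {
        "pid": { "terms": { "field": "process.pid" } }
      }
    }
  }
}
```

Because non-matching documents never reach the aggregation, no empty buckets are created in the first place.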

Hope that helps.

wifi · September 29, 2020, 2:53pm

The value 0.01 is an example and is meant to be variable. The query is: "give me all processes that used more than x percent of CPU during the last 5 minutes".
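One way to keep the threshold variable is to pass it as a script parameter instead of hard-coding it. The fragment below is a sketch of the bucket selector from the first query rewritten that way; the parameter name threshold is made up for illustration, while avg_cpu still comes from buckets_path:

```json
{
  "bucket_selector": {
    "buckets_path": { "avg_cpu": "cpu" },
    "script": {
      "source": "params.avg_cpu > params.threshold",
      "params": { "threshold": 0.01 }
    }
  }
}
```

Only the value under params needs to change between requests, and the script source stays the same.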

Anyway, efficiency is not my problem. I want to filter out whole branches depending on the value of the avg CPU aggregation. The bucket selector in the first example works fine, except that it sometimes returns branches without leaves. So I tried to move the bucket selector further up in the aggregation tree, but I can't figure out the right parameters.

spinscale · September 30, 2020, 8:57am

I misread your requirement; now it seems clearer, thanks for explaining.

Is it possible that some buckets don't have any documents, and thus you end up with empty buckets? You could try the min_doc_count parameter on some of the aggs, but that is just a guess without seeing your data.

Pipeline aggs cannot be placed at the root of aggs. This is what the exception message is trying to tell you when it says the bucket_selector must be declared inside of another aggregation.
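A possible way to prune the branches that end up without leaves is to add one more bucket_selector at each parent level and point it at the special _bucket_count path of the child aggregation. The sketch below, again a body for GET metricbeat-*/_search, assumes the fields from the question; the drop_empty_* names are made up for illustration. It relies on the inner selectors being applied before the outer ones, which is worth verifying on the Elasticsearch version in use:

```json
{
  "size": 0,
  "query": {
    "range": { "@timestamp": { "gte": "now-5m/s" } }
  },
  "aggs": {
    "host": {
      "terms": { "field": "agent.hostname" },
      "aggs": {
        "user": {
          "terms": { "field": "user.name" },
          "aggs": {
            "process": {
              "terms": { "field": "process.name" },
              "aggs": {
                "pid": {
                  "terms": { "field": "process.pid" },
                  "aggs": {
                    "cpu": {
                      "avg": { "field": "system.process.cpu.total.pct" }
                    },
                    "cpu_bucket_selector": {
                      "bucket_selector": {
                        "buckets_path": { "avg_cpu": "cpu" },
                        "script": "params.avg_cpu > 0.01"
                      }
                    }
                  }
                },
                "drop_empty_process": {
                  "bucket_selector": {
                    "buckets_path": { "pid_count": "pid._bucket_count" },
                    "script": "params.pid_count > 0"
                  }
                }
              }
            },
            "drop_empty_user": {
              "bucket_selector": {
                "buckets_path": { "process_count": "process._bucket_count" },
                "script": "params.process_count > 0"
              }
            }
          }
        },
        "drop_empty_host": {
          "bucket_selector": {
            "buckets_path": { "user_count": "user._bucket_count" },
            "script": "params.user_count > 0"
          }
        }
      }
    }
  }
}
```

Each selector sits inside a parent aggregation, next to the child aggregation whose buckets it counts, so nothing is declared at the root and the validation error from the second query should not occur.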

wifi · October 2, 2020, 2:43pm

The buckets always have data, right up until I filter them out with the bucket selector.

system · October 30, 2020, 2:44pm

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.
