Filter Sum aggregation results

CSkobbs · June 16, 2017, 9:45am

Hi,

I've been stuck on a problem for a while: returning multiple values filtered from a sum aggregation.

I want to detect whether the sum of my cs_bytes exceeds a particular threshold.
• This condition works:
"condition" : { "script" : { "inline": "for(int i = 0; i<10;i++){if (ctx.payload.aggregations.2.buckets[i].1.value > 100000000)return true}" } }
where 2 is a terms bucket aggregation (on username) and 1 is my sum aggregation.

My for loop doesn't meet my expectations: it walks through the buckets, and as soon as one bucket matches, the whole condition is true.
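
The loop above hard-codes ten iterations, so it depends on how many buckets come back. A minimal sketch of the same true/false check that visits however many buckets exist, assuming Painless as the script language and the aggregation names 2 and 1 used in this thread; `params.threshold` is only an illustrative parameter name:

```json
{
  "condition": {
    "script": {
      "lang": "painless",
      "inline": "/* sketch: params.threshold is an illustrative name */ return ctx.payload.aggregations['2'].buckets.stream().anyMatch(b -> b['1'].value > params.threshold)",
      "params": {
        "threshold": 100000000
      }
    }
  }
}
```

Like the loop, this only answers yes or no; it does not say which buckets matched. Newer Elasticsearch versions name the `inline` key `source`.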

From there, I want to extract the matching values (in my case, the values of i for which the condition is met) and use them in my email alert.

I tried several things to fix this issue:
• Filtering my sum aggregation before the condition so that only the matching results come back. Unfortunately, a sum aggregation doesn't allow sub-aggregations.
• Filtering with a range aggregation on the field cs_bytes, but this returns the wrong data.

The best solution would be to apply a range aggregation to the result of my sum aggregation...

To sum up: I would like to find a proper way to extract the indices where my condition is true.
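
Filtering buckets by the value of a metric is the use case of the `bucket_selector` pipeline aggregation. The following is a sketch only, assuming an Elasticsearch version that supports pipeline aggregations and Painless (5.x). The aggregation name `over_threshold` and the variable name `total` are made up for illustration, while `2`, `1`, `cs_username` and `cs_bytes` follow the naming used in this thread:

```json
{
  "size": 0,
  "aggs": {
    "2": {
      "terms": {
        "field": "cs_username",
        "size": 10,
        "order": { "1": "desc" }
      },
      "aggs": {
        "1": {
          "sum": { "field": "cs_bytes" }
        },
        "over_threshold": {
          "bucket_selector": {
            "buckets_path": { "total": "1" },
            "script": "/* 'total' is an illustrative variable name */ params.total > 100000000"
          }
        }
      }
    }
  }
}
```

With a selector like this, only the username buckets whose sum exceeds the threshold should come back, so a watch condition could simply check whether the bucket list is non-empty.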

-Chris

spinscale · June 16, 2017, 9:59am

Hey,

Without the data structure that your query returns, it is very hard to follow the problem. A sample output and the aggregations you are running would be really helpful.

I also do not understand your attempts to fix the issue, again because I have no idea what your data actually looks like on a per-document basis. Can you please provide more information before we dive into solutions?

If you run an aggregation across several indices, there is no way to find out which indices the aggregated data comes from. If you need that information, you have to run several search requests across those indices, which you could do inside a watch with a chain input, for example.

Thanks!

--Alex

CSkobbs · June 16, 2017, 12:28pm

Thanks for your quick answer,

Here are some parts of my aggregation and the (filtered) results:

GET *******/_search?filter_path=aggregations.2.buckets.key,aggregations.2.buckets.3.buckets.key,aggregations.2.buckets.3.buckets.1.value
{
  "query": {
    "bool": {
      "must": [
        {
          "query_string": {
            "analyze_wildcard": true,
            "query": "*"
          }
        },
        {
          "range": {
            "@timestamp": {
              "gte": "now-4h",
              "lte": "now"
            }
          }
        }
      ],
      "must_not": []
    }
  },
  "size": 0,
  "_source": {
    "excludes": []
  },
  "aggs": {
    "2": {
      "terms": {
        "field": "cs_username",
        "size": 3,
        "order": {
          "1": "desc"
        }
      },
      "aggs": {
        "1": {
          "sum": {
            "field": "cs_bytes"
          }
        },
        "3": {
          "terms": {
            "field": "cs_host",
            "size": 3,
            "order": {
              "1": "desc"
            }
          },
          "aggs": {
            "1": {
              "sum": {
                "field": "cs_bytes"
              }
            }
          }
        }
      }
    }
  }
}

and the results:

{
  "aggregations": {
    "2": {
      "buckets": [
        {
          "3": {
            "buckets": [
              {
                "1": {
                  "value": 1213761876
                },
                "key": "site0.com"
              },
              {
                "1": {
                  "value": 161186
                },
                "key": "site0.1.com"
              },
              {
                "1": {
                  "value": 87301
                },
                "key": "site0.2.com"
              }
            ]
          },
          "key": "username0"
        },
        {
          "3": {
            "buckets": [
              {
                "1": {
                  "value": 789571585
                },
                "key": "site1.com"
              },
              {
                "1": {
                  "value": 320507
                },
                "key": "site1.1.com"
              },
              {
                "1": {
                  "value": 121567
                },
                "key": "site1.2.com"
              }
            ]
          },
          "key": "username1"
        },
        {
          "3": {
            "buckets": [
              {
                "1": {
                  "value": 735793791
                },
                "key": "site2.com"
              },
              {
                "1": {
                  "value": 2941322
                },
                "key": "site2.1.com"
              },
              {
                "1": {
                  "value": 749758
                },
                "key": "site2.2.com"
              }
            ]
          },
          "key": "username2"
        }
      ]
    }
  }
}

I'm currently interested in extracting the indices of my buckets where the value > threshold.
In this case: get the indices of "site0.com", "site1.com", "site2.com" and use them in my email alert.

You mentioned a chain input.
Is that meant to split my big data array into multiple unique objects (which I can then filter by range)?

If so, can you explain how to do that?

Chris

spinscale · June 16, 2017, 1:29pm

Hey,

No, the chain input is not what you are expecting. It only allows you to gather data from different data sources or queries.

Again, my question: what are you referring to with indices? An index is a collection of documents, whereas the aggregation result is a count of documents that fall into a certain bucket. Let's please make sure we are using Elasticsearch jargon, otherwise it is really hard to follow.

If you want to extract an index, this is not possible with this response. If you want to extract just the values site0.com, site1.com and site2.com, you can do that via scripting by using a transform in your action.
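
A transform inside the email action could look roughly like the following sketch. It assumes the response structure posted earlier in the thread (aggregations 2, 3 and 1), Painless as the script language, and an email account already configured for Watcher. The action name `send_email`, the recipient address, the `hosts` key and `params.threshold` are placeholders, not taken from the original watch:

```json
{
  "actions": {
    "send_email": {
      "transform": {
        "script": {
          "lang": "painless",
          "inline": "/* sketch: 'hosts' and params.threshold are illustrative names */ def hosts = []; for (def user : ctx.payload.aggregations['2'].buckets) { for (def host : user['3'].buckets) { if (host['1'].value > params.threshold) { hosts.add(['user': user.key, 'host': host.key, 'bytes': host['1'].value]); } } } return ['hosts': hosts];",
          "params": {
            "threshold": 100000000
          }
        }
      },
      "email": {
        "to": "ops@example.com",
        "subject": "cs_bytes threshold exceeded",
        "body": "{{#ctx.payload.hosts}}{{user}} / {{host}}: {{bytes}} bytes\n{{/ctx.payload.hosts}}"
      }
    }
  }
}
```

Applied to the sample response above with a threshold of 100000000, the nested loop would keep site0.com, site1.com and site2.com, since they are the only host buckets whose sum exceeds that value.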

You can check out the blog post "Watching the watches: Writing, debugging and testing watches", which shows how to use an aggregation output together with a watch (that you do not need to store) in order to speed up turnaround times when writing your watches.

--Alex

CSkobbs · June 16, 2017, 2:04pm

Hi,

This is my previous condition:

"condition" : {
    "script" : {
       "inline": "for(int i = 0; i<10;i++){if (ctx.payload.aggregations.2.buckets[i].1.value > 100000000)...}"
    }
}

I want to catch the values of my counter i where the condition is true.

Do you know a way to do so?

Chris

spinscale · June 28, 2017, 7:03am

Hey,

Keep in mind that the condition is only supposed to return true or false. You can use Java lambdas or for loops to filter out the buckets, e.g. (just a snippet, not a complete example):

ctx.payload.aggregations.2.buckets.stream().filter(a -> a.3.buckets.size() > 0).map(a -> a.key).collect(Collectors.toList())

This filters by a certain bucket size and collects the usernames of those found. Filtering by bucket size is of course not too useful, but it should give you an example.
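
Adapted to the threshold from this thread, the same idea could be sketched as a watch-level transform that keeps only the username buckets whose sum exceeds the limit. This assumes Painless; `users` and `params.threshold` are illustrative names, and bracket access (`b['1']`) is used here as an alternative to the dotted form for the numeric aggregation names:

```json
{
  "transform": {
    "script": {
      "lang": "painless",
      "inline": "/* sketch: 'users' and params.threshold are illustrative names */ return ['users': ctx.payload.aggregations['2'].buckets.stream().filter(b -> b['1'].value > params.threshold).map(b -> b.key).collect(Collectors.toList())]",
      "params": {
        "threshold": 100000000
      }
    }
  }
}
```

The condition itself still returns only true or false; the list built by the transform would then be available to the actions as ctx.payload.users.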

--Alex

system · July 26, 2017, 7:03am

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.
