Aggregation on metric (sum) aggregation


(Refael Dakar) #1

I'm trying to filter out results of an aggregation.

This is my query:

GET _search
{
  "size": 0,
  "query": { 
    "match": 
    {
      "name": "tick_event"
    }
  },
  "aggs": {
    "all_users": {
      "terms": {
        "field": "uuid",
        "size": 0
      },
      "aggs": {
        "total_calls": {
          "sum": {
            "field":"extra.call",
            "script": "_value / 100"
          },
          "aggs": {
            "above_9999" : {
              "filter": {
                "range": {
                  "_value": {
                    "gte": 9999
                  }
                }
              }
            }
          }
        }
      }
    }
  }
}

I'm getting this error:

...
nested: AggregationInitializationException[Aggregator [total_calls] of type [sum] cannot accept sub-aggregations]
...

So I understand that it's impossible to do a sub-aggregation on a sum aggregation but how can I get what I want?


(Colin Goodheart-Smithe) #2

If you want to get the sum of your extra.call field for only the documents matching your filter, try the following:

GET _search
{
  "size": 0,
  "query": {
    "match": {
      "name": "tick_event"
    }
  },
  "aggs": {
    "all_users": {
      "terms": {
        "field": "uuid",
        "size": 0
      },
      "aggs": {
        "above_9999": {
          "filter": {
            "range": {
              "_value": {
                "gte": 9999
              }
            }
          }
        },
        "aggs": {
          "total_calls": {
            "sum": {
              "field": "extra.call",
              "script": "_value / 100"
            }
          }
        }
      }
    }
  }
}

(Refael Dakar) #3

Thanks @colings86!

Are you sure this is what I need to do?
Let me explain: What I want to get is only the documents where total_calls are > 9999. I mean, i want to filter the results of the "sum" aggregation so it'll only show me those who are above 9999.

What you sent doesn't work. I'm getting:

Parse Failure [Could not find aggregator type [total_calls] in [aggs]]

I think what you meant is that I'll have the 'total_calls' aggs inside the 'above_9999'. When I ran it like that it just gave me the list of uuids (as expected) but with sum values of zero.


(Colin Goodheart-Smithe) #4

Oh sorry, I misunderstood what you were trying to do (it looks like I also have a syntax error in the JSON I sent but thats another matter :wink: )

You can't currently perform filtering on the aggregation results themselves. The filter aggregator is used to filter the documents which are passed into the aggregator. In 2.0 there is a new feature to allow computations to be performed on aggregation results (https://github.com/elastic/elasticsearch/issues/9876). This does not yet include being able to filter the results of an aggregation but it is something that I am thinking about.

For now you would need to do this kind of filtering on the client side once you have the aggregation results.


(Colin Goodheart-Smithe) #5

Additionally, running a terms aggregation with size: 0 on a high cardinality field such as uuid is going to be very resource intensive. I am wondering what you are trying to achieve since 'uuid' is unique to each document and this will result in a bucket for every document in your index? If you just need to find all the documents where extra.call is greater than 99.99 then you could do this using a simple query, something like the following:

    GET _search
    {
      "size": 10,
      "query": {
        "bool": {
          "must": [
            {
              "match": {
                "name": "tick_event"
              }
            },
            {
              "range": {
                "extra.call": {
                  "gte": 99.99
                }
              }
            }
          ]
        }
      }
    }

(Refael Dakar) #6

Well that's a bummer :disappointed: I really hoped to be able to do this kind of thing with ES. It looks like an extremely needed feature... I hope you'll do something about it soon. My data scientist can't do queries he needs without it... we'll have to code everything out and do a lot of post processing.

Regarding your second post: I think you misunderstood what I need again. I don't want extra.call>99.99, I want to first sum up all extra.call for every user and get those where the sum is > 9999.

Something like:
SELECT * FROM (SELECT uuid, sum(extra.call) as calls FROM xxx WHERE name='tick_event' GROUP_BY uuid) WHERE calls>9999


(Colin Goodheart-Smithe) #7

Ok, sorry for the misunderstanding, I had read uuid as the elasticsearch _uid which is unique for every document in the index.

I agree that it would be a great feature to have and hopefully we will be able to land it in Elasticsearch soon. We needed first to be able to perform computations on the results of aggregations, which was merged in recently.

The bit that makes me un-easy about your current aggregation is the size: 0 on your terms aggregation. This is not going to scale well to lots of users as a bucket is being created for every user in your system, and these buckets need to be generated on each node and then shipped back to the coordinating node where they will be merged and then streamed over the network to your client application. This will be memory and network intensive if the number of users is high (maybe your app has a low volume of users and so is ok?). You may want to look into a talk from Elastic{ON} by @Mark_Harwood on 'Entity Centric indexes' which is a method of indexing to help with answering these kinds of questions:

https://www.elastic.co/elasticon/2015/sf/building-entity-centric-indexes

Here is an earlier video of the same idea from the London Elasticsearch User Group:

https://www.elastic.co/videos/entity-centric-indexing-london-meetup-sep-2014


(Refael Dakar) #8

Thanks.

My main problem right now is not with the size but with the ability to actually do the queries I need.


(system) #9