Filtering on aggregated values


#1

Hi all, I'm new to Elasticsearch, so I apologise if my question is incorrect or too simple.
I have a type called "individual" and a type called "income"; an individual has many incomes.
The income type contains the fields date and sum.

I need to build a query that answers the following question:
How many individuals have a total income (total income = the sum of all sum field values for that individual) in the range {min_range_value} - {max_range_value} during the period {start_date} - {end_date}?

Should I map these types as parent/child, or do I need nested?
Which approach should I use to implement the request (filtering or scripting)?
Many thanks for any ideas or keywords to search for!

{
  "query": {
    "bool": {
      "filter": [
        {
          "script": {
            "script": {
              "source": "def s = 0; for (p in params['_source'].incomes) { s = s + p.sum; } return s > params.min_income_sum;",
              "lang": "painless",
              "params": {
                "min_income_sum": 5000
              }
            }
          }
        }
      ]
    }
  }
}
I tried the query above, but it returns nothing.


(Zachary Tong) #2

How many users do you have? If you have a reasonably small number of users, you could do:

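# Pseudocode, not literal request syntax; the aggregation names are placeholders.
query:
  range:
    date:                      # the date field of each income document
      gte: start_date
      lte: end_date
aggregations:
  by_user:
    terms:
      field: user_id
    aggregations:
      income_sums:
        sum:
          field: sum           # the income amount field
      over_threshold:
        bucket_selector:
          buckets_path:
            total: income_sums
          script: params.total > 5000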

That is, the query finds all documents within the time range you care about. A terms aggregation then partitions those documents so that each user gets their own bucket. For each bucket you calculate the sum of the incomes, and a bucket_selector pipeline aggregation filters out any bucket that doesn't meet the threshold.

Then you can count the number of remaining buckets (or use a stats_bucket to count them).
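
To make the steps above concrete, here is a minimal sketch in Python that only builds the request body as a dictionary; it does not talk to a cluster. It assumes each income is its own document carrying a user_id keyword field alongside date and sum. The aggregation names (by_user, income_sum, in_range, matching_users), the helper name build_income_count_query, the max_users default and the sample dates are all hypothetical. It also uses both ends of the income range rather than just a lower bound.

```python
import json


def build_income_count_query(start_date, end_date, min_sum, max_sum, max_users=10000):
    """Build a search body that keeps only users whose summed income is in range.

    Assumed (hypothetical) mapping: one document per income with the fields
    user_id (keyword), date and sum.
    """
    return {
        "size": 0,  # only the aggregation results are needed, not the hits
        "query": {
            "range": {"date": {"gte": start_date, "lte": end_date}}
        },
        "aggs": {
            "by_user": {
                # terms returns only 10 buckets by default, so raise the size
                "terms": {"field": "user_id", "size": max_users},
                "aggs": {
                    "income_sum": {"sum": {"field": "sum"}},
                    "in_range": {
                        "bucket_selector": {
                            "buckets_path": {"total": "income_sum"},
                            "script": {
                                "source": (
                                    "params.total >= params.min_sum"
                                    " && params.total <= params.max_sum"
                                ),
                                "params": {"min_sum": min_sum, "max_sum": max_sum},
                            },
                        }
                    },
                },
            },
            # sibling pipeline aggregation; its "count" is the number of buckets
            "matching_users": {
                "stats_bucket": {"buckets_path": "by_user>income_sum"}
            },
        },
    }


# Sample values are placeholders, not taken from the thread.
body = build_income_count_query("2017-01-01", "2017-12-31", 5000, 20000)
print(json.dumps(body, indent=2))
```

The printed JSON can be sent as the body of a _search request. Because the terms aggregation caps the number of buckets, this only gives a correct count when the number of users stays below max_users, which is why the approach suits a reasonably small user base.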


#3

Thanks a lot!