How to use arithmetic operation inside bucket aggregation on a very large data containing millions document in elasticsearch?

vaibhav_goel · July 21, 2017, 9:51am

Using script within bucket aggregation on a very large set of data takes very long time to return result and many a times Elasticsearch health goes down due to this.

ES has around 10 millions documents and it haven't given result while performing arithmetic operation on 2 fields and make ES down, on using 1 field and 1 integer it return result.. My query :

GET urza/user/_search
{
"size": 0,
"aggs" : {
"prices" : {
"histogram" : {
"script" :
{
"inline":"doc['total_revenue'].value * doc['total_event'].value ",
"lang": "painless"
},
"interval" : 100
}
}
}
}

colings86 · July 21, 2017, 12:05pm

What are the range of values for total_revenue and total_event (i.e. what are the minimum and maximum values of those two fields)?

vaibhav_goel · July 21, 2017, 12:19pm

total revenue can be from 0 to 1000000 and total event can be from 0 to 1000

colings86 · July 21, 2017, 12:24pm

Ok so the range of values which can be output from your script is 1,000,000 * 1,000 = 1,000,000,000. The interval for your histogram is 100 so you are requesting 10,000,000 buckets from the histogram. This will take alot amount of memory so the reason you are not getting a response and the cluster health is affected is likely to be because this request is causing an OutOfMemoryError. You can confirm this by looking at the Elasticsearch server log when you see the happen.

I would suggest you choose a larger interval which will create a smaller number of buckets or, if you can, increase the amount of heap memory you give your Elasticsearch server.

vaibhav_goel · July 21, 2017, 12:40pm

is this the best way to do arithmetic operation or their is any other better way to perform arithmetic operation and is this query can perform good for even more documents???

colings86 · July 21, 2017, 12:47pm

it is not the number of documents that is causing the issue here. The issue is down to you asking for 10,000,000 buckets of information. You would have the same issue if instead of multiplying the two fields in a script you had a single field which already contained the result of total_revenue * total_event for each document.

You need to either give Elasticsearch more heap memory so it can store the state for the number of buckets you are requesting or change the request to ask for less buckets in the histogram (for example by increasing the interval) until the server can cope with the request.

If we take a step back and look at this from a higher level, what are you trying to present to the user by making this request? what question are you trying to answer from the data?

vaibhav_goel · July 22, 2017, 5:04am

Thanks a lot.. This solve my issue...i will either give it more heap space or increase the interval size.

system · August 19, 2017, 5:04am

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.

Topic		Replies	Views
Too many aggregation buckets Elasticsearch	3	3073	July 5, 2017
Histogram aggregation Elasticsearch Elasticsearch	1	337	November 15, 2019
Elasticsearch aggregation on million or more data Elasticsearch	6	3285	February 7, 2022
Performing sum after terms aggregation for 1.5m buckets Elasticsearch	1	336	July 30, 2020
Can you do date histogram with specified number of buckets? Elasticsearch	2	449	July 6, 2017

How to use arithmetic operation inside bucket aggregation on a very large data containing millions document in elasticsearch?

Related topics