Optimization for heavy Elasticsearch terms aggregation

I am using Elasticsearch 2.4, and upgrading is not possible at the moment because the mappings would have to be changed. So, straight to the question.
I am trying to compute several metrics for around 50 groups.
I use the terms aggregation twice, along with nested aggregations, and I am querying an index of around 2 TB.
My query looks like this:

{
  "size": 0,
  "query": {
    "bool": {
      "must": [
        {
          "nested": {
            "path": "datefield",
            "filter": {
              "bool": {
                "must": [
                  {
                    "range": {
                      "datefield.fromDate": {
                        "from": null,
                        "to": "2018-12-31"
                      }
                    }
                  },
                  {
                    "range": {
                      "datefield.toDate": {
                        "from": "2018-12-31",
                        "to": null
                      }
                    }
                  }
                ]
              }
            }
          }
        }
      ]
    }
  },
  "aggregations": {
    "Group1": {
      "filter": {
        "nested": {
          "path": "field1",
          "query": {
            "bool": {
              "must": [
                {
                  "term": {
                    "field1.id": "Something"
                  }
                },
                {
                  "terms": {
                    "field1.type": [
                      "valueA",
                      "valueB",
                      "valueC"
                    ]
                  }
                },
                {
                  "range": {
                    "field1.fromDate": {
                      "include_lower": true,
                      "include_upper": true,
                      "from": null,
                      "to": "2018-12-31"
                    }
                  }
                },
                {
                  "range": {
                    "field1.toDate": {
                      "include_lower": true,
                      "include_upper": true,
                      "from": "2018-12-31",
                      "to": null
                    }
                  }
                }
              ]
            }
          }
        }
      },
      "aggregations": {
        "people": {
          "terms": {
            "field": "field2",
            "size": 0
          },
          "aggregations": {
            "amount": {
              "nested": {
                "path": "field3"
              },
              "aggregations": {
                "total_paid": {
                  "filter": {
                    "bool": {
                      "must": [
                        {
                          "range": {
                            "field3.month": {
                              "include_lower": true,
                              "include_upper": true,
                              "from": "2015-01-01",
                              "to": "2018-12-31"
                            }
                          }
                        },
                        {
                          "range": {
                            "field3.differentmonth": {
                              "include_lower": true,
                              "include_upper": true,
                              "from": "2015-01-01",
                              "to": "2018-12-31"
                            }
                          }
                        },
                        {
                          "term": {
                            "field3.field1[flatField]": "value1"
                          }
                        }
                      ]
                    }
                  },
                  "aggregations": {
                    "sum_amt": {
                      "sum": {
                        "field": "field3.value2"
                      }
                    }
                  }
                }
              }
            },
            "Score": {
              "nested": {
                "path": "field4"
              },
              "aggs": {
                "DateFilter": {
                  "filter": {
                    "bool": {
                      "must": [
                        {
                          "term": {
                            "field4.date": "2019-04-30"
                          }
                        }
                      ]
                    }
                  },
                  "aggs": {
                    "ScoreValue": {
                      "terms": {
                        "field": "field4.value3",
                        "size": 0
                      }
                    }
                  }
                }
              }
            }
          }
        },
        "Age": {
          "avg": {
            "field": "age"
          }
        },
        "Gender": {
          "terms": {
            "field": "gender",
            "size": 0
          }
        }
      }
    }
  }
}

What I am trying to do here is group by a metric and, based on that metric, find the people who belong to each group. For each person I then calculate the sum of their amounts (a person may have multiple amounts; I need the sum of all amounts within the date period), and I also calculate each person's score.
This sample query shows how Group1 is calculated. The other groups, about 50 in total, are calculated the same way.
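
To make the "same shape, repeated per group" structure concrete, here is a minimal Python sketch of how such a request body could be assembled from per-group definitions. The helper names (`build_group_agg`, `build_request`), the second group, and its filter value are hypothetical and only for illustration; the sub-aggregations are a trimmed copy of the ones in the query above. It only builds the JSON body and does not change how Elasticsearch executes it.

```python
import copy
import json


def build_group_agg(nested_path, must_clauses, sub_aggs):
    """Return one filter aggregation shaped like Group1 in the query above."""
    return {
        "filter": {
            "nested": {
                "path": nested_path,
                "query": {"bool": {"must": must_clauses}},
            }
        },
        # Deep copy so each group owns its own sub-aggregation dict.
        "aggregations": copy.deepcopy(sub_aggs),
    }


def build_request(group_defs, sub_aggs, query=None):
    """Assemble a size-0 request with one filter aggregation per group."""
    body = {"size": 0, "aggregations": {}}
    if query is not None:
        body["query"] = query
    for name, (nested_path, must_clauses) in group_defs.items():
        body["aggregations"][name] = build_group_agg(
            nested_path, must_clauses, sub_aggs
        )
    return body


# Trimmed version of the sub-aggregations used under Group1.
SUB_AGGS = {
    "people": {"terms": {"field": "field2", "size": 0}},
    "Age": {"avg": {"field": "age"}},
    "Gender": {"terms": {"field": "gender", "size": 0}},
}

# Hypothetical group definitions; only Group1's filter comes from the post.
GROUP_DEFS = {
    "Group1": ("field1", [{"term": {"field1.id": "Something"}}]),
    "Group2": ("field1", [{"term": {"field1.id": "SomethingElse"}}]),
}

body = build_request(GROUP_DEFS, SUB_AGGS)
request_json = json.dumps(body, indent=2)
```

Keeping the group definitions as data also makes it straightforward to send the groups in smaller batches if a single request with all of them turns out to be too heavy.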
The query takes a long time to execute in Elasticsearch. Is there a different approach that would reduce the execution time? The response is also very large, around 20-30 MB.
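
For reference, a small Python sketch of how the per-person totals could be read out of such a response. It assumes the usual aggregation response layout (a filter or nested aggregation returns `doc_count` plus its sub-aggregations, a terms aggregation returns `buckets`, and a sum returns `value`). The function name `per_person_totals` is hypothetical, and the sample response is hand-written for illustration, not real output.

```python
def per_person_totals(response, group_name):
    """Return (person_key, total_paid) pairs for one group of the response."""
    group = response["aggregations"][group_name]
    rows = []
    for bucket in group["people"]["buckets"]:
        # Path mirrors the query: people -> amount -> total_paid -> sum_amt
        total = bucket["amount"]["total_paid"]["sum_amt"]["value"]
        rows.append((bucket["key"], total))
    return rows


# Hand-written sample in the assumed response shape (not real data).
SAMPLE_RESPONSE = {
    "aggregations": {
        "Group1": {
            "doc_count": 2,
            "people": {
                "buckets": [
                    {
                        "key": "person-1",
                        "doc_count": 1,
                        "amount": {
                            "doc_count": 3,
                            "total_paid": {
                                "doc_count": 2,
                                "sum_amt": {"value": 150.0},
                            },
                        },
                    },
                    {
                        "key": "person-2",
                        "doc_count": 1,
                        "amount": {
                            "doc_count": 1,
                            "total_paid": {
                                "doc_count": 0,
                                "sum_amt": {"value": 0.0},
                            },
                        },
                    },
                ]
            },
        }
    }
}

rows = per_person_totals(SAMPLE_RESPONSE, "Group1")
```

With `"size": 0` on the people terms aggregation, there is one such bucket per distinct `field2` value in every group, which is consistent with the response growing to tens of megabytes.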
Any suggestions would be much appreciated.
Thanks.

Hi @Prasiddha,

Just my opinion: if you can, upgrade to a more recent version. There have been many performance improvements and new features since version 2.

You can see a graph of the performance improvements in this post (it is in Korean, but the picture is self-explanatory):
http://kimjmin.net/2019/04/2019-04-elastic-stack-7-release/
