For the last two days, my team has been working on an issue with querying data from Elasticsearch (ES). Our goal is to get data from ES aggregated by one field, with two accumulated values per group, and to filter on one of them.
Translated to SQL, we need something like this:
SELECT MAX(FIELD1) AS F1, MAX(FIELD2) AS F2 FROM ES GROUP BY FIELD3 HAVING F1 = 'SOME_TEXT'
Please note that F1 is a text field.
The only solution we have found so far is:
{
  "size": 0,
  "aggs": {
    "flowId": {
      "terms": {
        "field": "flowId.keyword"
      },
      "aggs": {
        "scenario": { "terms": { "field": "scnName.keyword" } },
        "max_time": { "max": { "field": "inFlowTimeNsec" } },
        "sales_bucket_filter": {
          "bucket_selector": {
            "buckets_path": {
              "totalSales": "scenario"
            },
            "script": "params.totalSales != null && params.totalSales > 0"
          }
        }
      }
    }
  }
}
The error we encounter is:
{
  "error": {
    "root_cause": [],
    "type": "search_phase_execution_exception",
    "reason": "",
    "phase": "fetch",
    "grouped": true,
    "failed_shards": [],
    "caused_by": {
      "type": "aggregation_execution_exception",
      "reason": "buckets_path must reference either a number value or a single value numeric metric aggregation, got: org.elasticsearch.search.aggregations.bucket.terms.StringTerms"
    }
  },
  "status": 503
}
As far as I understand, this issue has already been raised: https://github.com/elastic/elasticsearch/issues/23874
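For comparison, bucket_selector does work in our setup when buckets_path points at a single-value numeric metric rather than at the string terms aggregation. A minimal sketch (the name "time_filter" and the threshold value are hypothetical, just for illustration) that keeps only buckets whose max_time exceeds some limit:

```json
{
  "size": 0,
  "aggs": {
    "flowId": {
      "terms": { "field": "flowId.keyword" },
      "aggs": {
        "max_time": { "max": { "field": "inFlowTimeNsec" } },
        "time_filter": {
          "bucket_selector": {
            "buckets_path": { "maxTime": "max_time" },
            "script": "params.maxTime > 100000000000"
          }
        }
      }
    }
  }
}
```

So the failure above is specifically about referencing the "scenario" terms aggregation, whose buckets are strings, from buckets_path.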
The output of the query above, without the bucket_selector part, looks like this:
{
  "took": 52,
  "timed_out": false,
  "_shards": {
    "total": 480,
    "successful": 480,
    "skipped": 0,
    "failed": 0
  },
  "hits": {
    "total": 15657901,
    "max_score": 0,
    "hits": []
  },
  "aggregations": {
    "flowId": {
      "doc_count_error_upper_bound": 4104,
      "sum_other_doc_count": 9829317,
      "buckets": [
        {
          "key": "0_66718_31120bfd_39ae_4258_81e8_08abd89a81bf",
          "doc_count": 107816,
          "scenario": {
            "doc_count_error_upper_bound": 0,
            "sum_other_doc_count": 0,
            "buckets": [
              {
                "key": "GetPop",
                "doc_count": 12
              }
            ]
          },
          "max_time": {
            "value": 121244876800
          }
        },
        {
          "key": "0_67116_31120bfd_39ae_4258_81e8_08abd89a81bf",
          "doc_count": 107752,
          "scenario": {
            "doc_count_error_upper_bound": 0,
            "sum_other_doc_count": 0,
            "buckets": [
              {
                "key": "GetPop",
                "doc_count": 12
              }
            ]
          },
          "max_time": {
            "value": 120955101184
          }
        },
        …
}
The question is: is there any other way to achieve what we need?
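One direction we are considering is a sketch that assumes the HAVING condition can be moved to the query phase: since our condition compares against a constant string, filter the documents by scnName.keyword up front and keep only the numeric aggregation:

```json
{
  "size": 0,
  "query": {
    "term": { "scnName.keyword": "SOME_TEXT" }
  },
  "aggs": {
    "flowId": {
      "terms": { "field": "flowId.keyword" },
      "aggs": {
        "max_time": { "max": { "field": "inFlowTimeNsec" } }
      }
    }
  }
}
```

This is not strictly equivalent to HAVING, though: it filters documents before grouping, whereas HAVING MAX(FIELD1) = 'SOME_TEXT' filters groups after aggregation, so it only fits if pre-filtering is acceptable for our data.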
Thanks a lot,
EG