ElasticSearch equivalent query for SQL group by multiple columns


(NANDAKUMAR SUBRAMANIAM) #1

I have a few million documents with name and version (both of type keyword) as properties in each. What is the equivalent Elastic query for group by name, version?

I have tried the following query:
{
"size":0,
"query": {
"bool": {
"filter": {
"range": {
"time": {
"gte": "2017-01-28",
"lte": "2017-02-28"
}
}
}
}
},
"aggs": {
"group_by_name": {
"terms": {
"field": "name"
},
"aggs": {
"group_by_version": {
"terms": {
"field": "version"
}
}
}
}
}

However the results are not same as doing Group by name, version. The results are grouped by name and within each group, they are grouped by version. How do I modify the above query to group by name, version tuple and return results in descending order?

{
"took": 1424,
"timed_out": false,
"_shards": {
"total": 5,
"successful": 5,
"failed": 0
},
"hits": {
"total": 115,
"max_score": 0,
"hits": []
},
"aggregations": {
"group_by_name": {
"doc_count_error_upper_bound": 2,
"sum_other_doc_count": 115,
"buckets": [
{
"key": "product1",
"doc_count": 50,
"group_by_version": {
"doc_count_error_upper_bound": 0,
"sum_other_doc_count": 50,
"buckets": [
{
"key": "1.0",
"doc_count": 40
},
{
"key": "2.0",
"doc_count": 10
},
]
}
},
{
"key": "product3",
"doc_count": 35,
"group_by_version": {
"doc_count_error_upper_bound": 4,
"sum_other_doc_count": 35,
"buckets": [
{
"key": "8.0",
"doc_count": 20
},
{
"key": "9.0",
"doc_count": 15
}
]
}
},
{
"key": "product2",
"doc_count": 30,
"group_by_version": {
"doc_count_error_upper_bound": 0,
"sum_other_doc_count": 30,
"buckets": [
{
"key": "4.0",
"doc_count": 25
},
{
"key": "5.0",
"doc_count": 5
}
]
}
}
]
}
}
}

What i want is:

name, version count
product1 1.0 40
product2 4.0 25
product3 8.0 20
product3 9.0 15
product1 2.0 10
product2 5.0 5


(David Pilato) #2

Ideally you should create a composite field at index time and then run a terms agg on it.

If you can't reindex then use a script to combine 2 fields but that will be slow.


(NANDAKUMAR SUBRAMANIAM) #3

Thanks for the reply. Due to space considerations (we have 20 Billion records), i am really not considering composite field solution.


(David Pilato) #4

You are going to pay a huge price at search time then with a script.


(NANDAKUMAR SUBRAMANIAM) #5

Hi, You are correct. While ElasticSearch solves most of our problems, it falls short on this one. To get maximum benefits out of ES, we'd probably change our problem (no group by multiple columns)


(system) #6

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.