Hi
I need some help understanding how an SQL-style "having" clause is used in aggregations.
I'm using ES 2.4.1 and I have some documents indexed like these:
{"author":"first writter", "book": "first book"},
{"author":"first writter","book": "second book"},
{"author":"second writter","book": "first book"}
My test dataset has more than 2 million indexed documents structured as above; I hope these three examples are enough to understand the structure. Note that every author has at least 1 book, but not necessarily more than 1.
I want to retrieve two stats:
- Top 5 Authors with more than 1 book
- Bottom 5 Authors with more than 1 book
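To make the expected result concrete, here is a small Python sketch of the two stats computed over the three sample documents. The function name `author_stats` and the in-memory list are only illustrative assumptions; they are not part of my Elasticsearch setup.

```python
from collections import defaultdict

def author_stats(docs, min_books=2, size=5):
    """Return (top, bottom) lists of (author, distinct_book_count)."""
    books = defaultdict(set)
    for doc in docs:
        books[doc["author"]].add(doc["book"])
    # Keep only authors with more than one distinct book (the "having" part).
    counts = [(a, len(b)) for a, b in books.items() if len(b) >= min_books]
    # Ties are broken by author name so the output is deterministic.
    top = sorted(counts, key=lambda x: (-x[1], x[0]))[:size]
    bottom = sorted(counts, key=lambda x: (x[1], x[0]))[:size]
    return top, bottom

docs = [
    {"author": "first writter", "book": "first book"},
    {"author": "first writter", "book": "second book"},
    {"author": "second writter", "book": "first book"},
]
top, bottom = author_stats(docs)
print(top)     # only "first writter" has more than 1 book
print(bottom)
```

With only three documents both lists contain the same single author; on the real dataset they should differ.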
Since I only need authors with more than 1 book, my first approach (as I come from an SQL background) was to use a having clause in the aggregation, as follows:
"aggregations" : {
  "author" : {
    "terms" : {
      "field" : "author",
      "size" : 5,
      "order" : {
        "requests" : "desc"
      }
    },
    "aggregations" : {
      "requests" : {
        "cardinality" : {
          "field" : "book"
        }
      },
      "having" : {
        "bucket_selector" : {
          "buckets_path" : {
            "total" : "requests"
          },
          "script" : "total > 1"
        }
      }
    }
  }
}
With this approach I succeeded in retrieving the top indicator, but the bottom indicator (changing the order to asc) doesn't return any results.
I have changed my having clause from
"script": "total > 1"
to
"script": "total > 0"
With that change, the bottom indicator returns 5 authors with only 1 book, which is not the behaviour I expect.
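Both results would be consistent with the terms aggregation first keeping only `size` buckets in the requested order, and the bucket_selector then filtering just those 5 buckets. That execution order is my assumption, not something I have confirmed. A Python sketch of the assumed behaviour next to what I expected, using made-up counts and hypothetical function names:

```python
def truncate_then_filter(counts, size, ascending, min_total):
    """Assumed behaviour: sort, keep `size` buckets, then apply the selector."""
    ordered = sorted(counts.items(), key=lambda kv: kv[1], reverse=not ascending)
    kept = ordered[:size]
    return [(author, n) for author, n in kept if n > min_total]

def filter_then_truncate(counts, size, ascending, min_total):
    """What I expected: apply the selector first, then keep `size` buckets."""
    selected = [(author, n) for author, n in counts.items() if n > min_total]
    ordered = sorted(selected, key=lambda kv: kv[1], reverse=not ascending)
    return ordered[:size]

# Made-up data: 6 authors with 1 book, 3 authors with more than 1.
counts = {"a%d" % i: 1 for i in range(6)}
counts.update({"b1": 2, "b2": 3, "b3": 7})

print(truncate_then_filter(counts, 5, True, 1))   # bottom with total > 1
print(truncate_then_filter(counts, 5, True, 0))   # bottom with total > 0
print(filter_then_truncate(counts, 5, True, 1))   # the bottom list I expected
```

In this sketch the first call returns an empty list and the second returns five single-book authors, which matches what I see from Elasticsearch.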
Am I missing something? Is there a better approach to retrieve this information?
Thanks in advance