Doubt about having clause behaviour


#1

Hi

I need some help to understand how having is used in aggregations.

I'm using ES 2.4.1 and I have some documents indexed like these:

{"author":"first writter", "book": "first book"},
{"author":"first writter","book": "second book"},
{"author":"second writter","book": "first book"}

So, my test dataset has more than 2 millions indexed documents as described above: I hope this three examples are enough to understand the structure. Note that all authors have at least 1 book, but it's not mandatory to have more than 1.

I want to retrieve two stats:

  • Top 5 Authors with more than 1 book
  • Bottom 5 Authors with more than 1 book

As I need to retrieve only authors with more than 1 book, my first aproach (as I come from an SQL environment) has been to use having clause in aggregation as following:

"aggregations" : {
	"author" : {
  		"terms" : {
    		"field" : "author",
    		"size" : 5,
    		"order" : {
      			"requests": "desc"
    		}
  		},
        "aggregations" : {
        	"requests" : {
            	"cardinality" : {
                	"field" : "book"
                }
            },
            "having" : {
            	"bucket_selector": {
                           "buckets_path": {
                               "total": "requests"
                    },
                    "script": "total > 1"
                }
            }
        }
	}
 }

With this approach I have succeeded in retrieving top indicator, but bottom indicator (changing order to desc) doesn't return any result.

I have changed my having clause from
"script": "total > 1"
to
"script": "total > 0"

And then, bottom indicator returns 5 authors with only 1 book, but this is not the expected behaviour.

Am I missing something? Is there any other better approach to retrieve this information?

Thanks in advance


(Alexander Reelsen) #2

Hey,

for the first approach, you just need a single terms aggregation with the min_doc_count field set to 1. See the terms aggs docs

I am not sure what a bottom 5 author is in this context, as you are referring to a requests field that was not mentioned before, but maybe this helps already.

--Alex


#3

Hi

First of all I would like to thank you Alexander for your reply, but it doesn't help me to clarifying how to use having in elasticsearch.

As I explained, I come from an SQL environment, and I would retrieve bottom 5 Authors with more than 1 book with an SQL query like this one:

select count(books), author from authors group by author having count(books) > 2 order by count(books) asc limit 5

Is there a way to reproduce this behaviour in elasticsearch? Is there any other option to retrieve this kind of aggregation?

Thanks again and sorry for my late reply


#4

Hi

Does elasticsearch provide described behaviour?

Should I take another aproach in order to retrieve this bottom aggregation?

Thanks again


(system) #5

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.