Doubt about having clause behaviour

Hi

I need some help to understand how having is used in aggregations.

I'm using ES 2.4.1 and I have some documents indexed like these:

{"author":"first writter", "book": "first book"},
{"author":"first writter","book": "second book"},
{"author":"second writter","book": "first book"}

So, my test dataset has more than 2 millions indexed documents as described above: I hope this three examples are enough to understand the structure. Note that all authors have at least 1 book, but it's not mandatory to have more than 1.

I want to retrieve two stats:

  • Top 5 Authors with more than 1 book
  • Bottom 5 Authors with more than 1 book

As I need to retrieve only authors with more than 1 book, my first aproach (as I come from an SQL environment) has been to use having clause in aggregation as following:

"aggregations" : {
	"author" : {
  		"terms" : {
    		"field" : "author",
    		"size" : 5,
    		"order" : {
      			"requests": "desc"
    		}
  		},
        "aggregations" : {
        	"requests" : {
            	"cardinality" : {
                	"field" : "book"
                }
            },
            "having" : {
            	"bucket_selector": {
                           "buckets_path": {
                               "total": "requests"
                    },
                    "script": "total > 1"
                }
            }
        }
	}
 }

With this approach I have succeeded in retrieving top indicator, but bottom indicator (changing order to desc) doesn't return any result.

I have changed my having clause from
"script": "total > 1"
to
"script": "total > 0"

And then, bottom indicator returns 5 authors with only 1 book, but this is not the expected behaviour.

Am I missing something? Is there any other better approach to retrieve this information?

Thanks in advance

Hey,

for the first approach, you just need a single terms aggregation with the min_doc_count field set to 1. See the terms aggs docs

I am not sure what a bottom 5 author is in this context, as you are referring to a requests field that was not mentioned before, but maybe this helps already.

--Alex

Hi

First of all I would like to thank you Alexander for your reply, but it doesn't help me to clarifying how to use having in elasticsearch.

As I explained, I come from an SQL environment, and I would retrieve bottom 5 Authors with more than 1 book with an SQL query like this one:

select count(books), author from authors group by author having count(books) > 2 order by count(books) asc limit 5

Is there a way to reproduce this behaviour in elasticsearch? Is there any other option to retrieve this kind of aggregation?

Thanks again and sorry for my late reply

Hi

Does elasticsearch provide described behaviour?

Should I take another aproach in order to retrieve this bottom aggregation?

Thanks again

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.