Doubt about having clause behaviour

templair79 · November 6, 2016, 10:03am

Hi

I need some help to understand how having is used in aggregations.

I'm using ES 2.4.1 and I have some documents indexed like these:

{"author":"first writter", "book": "first book"},
{"author":"first writter","book": "second book"},
{"author":"second writter","book": "first book"}

So, my test dataset has more than 2 millions indexed documents as described above: I hope this three examples are enough to understand the structure. Note that all authors have at least 1 book, but it's not mandatory to have more than 1.

I want to retrieve two stats:

Top 5 Authors with more than 1 book
Bottom 5 Authors with more than 1 book

As I need to retrieve only authors with more than 1 book, my first aproach (as I come from an SQL environment) has been to use having clause in aggregation as following:

"aggregations" : {
	"author" : {
  		"terms" : {
    		"field" : "author",
    		"size" : 5,
    		"order" : {
      			"requests": "desc"
    		}
  		},
        "aggregations" : {
        	"requests" : {
            	"cardinality" : {
                	"field" : "book"
                }
            },
            "having" : {
            	"bucket_selector": {
                           "buckets_path": {
                               "total": "requests"
                    },
                    "script": "total > 1"
                }
            }
        }
	}
 }

With this approach I have succeeded in retrieving top indicator, but bottom indicator (changing order to desc) doesn't return any result.

I have changed my having clause from
"script": "total > 1"
to
"script": "total > 0"

And then, bottom indicator returns 5 authors with only 1 book, but this is not the expected behaviour.

Am I missing something? Is there any other better approach to retrieve this information?

Thanks in advance

spinscale · November 7, 2016, 8:41am

Hey,

for the first approach, you just need a single terms aggregation with the min_doc_count field set to 1. See the terms aggs docs

I am not sure what a bottom 5 author is in this context, as you are referring to a requests field that was not mentioned before, but maybe this helps already.

--Alex

templair79 · November 17, 2016, 6:00pm

Hi

First of all I would like to thank you Alexander for your reply, but it doesn't help me to clarifying how to use having in elasticsearch.

As I explained, I come from an SQL environment, and I would retrieve bottom 5 Authors with more than 1 book with an SQL query like this one:

select count(books), author from authors group by author having count(books) > 2 order by count(books) asc limit 5

Is there a way to reproduce this behaviour in elasticsearch? Is there any other option to retrieve this kind of aggregation?

Thanks again and sorry for my late reply

templair79 · December 1, 2016, 10:14pm

Hi

Does elasticsearch provide described behaviour?

Should I take another aproach in order to retrieve this bottom aggregation?

Thanks again

system · December 29, 2016, 10:14pm

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.

Topic		Replies	Views
ElasticSearch Having-Clause? Elasticsearch	2	2889	July 6, 2017
Having clause equivalent Elasticsearch	4	7315	July 5, 2017
Filtering by a aggregation (like SQL having clause) Elasticsearch	4	6036	November 5, 2018
Queries with nested fields and "having" clauses Elasticsearch	4	24	September 19, 2024
SQL-like Having on aggregated data Elasticsearch	1	327	April 16, 2019

Doubt about having clause behaviour

Related topics