ElasticSearch Having-Clause?


(tkeane) #1

Hi guys,
I’m using elasticsearch 1.0.0RC2 and wondering if there is an
equivalent to SQL’s “having-clause” for the aggregation framework there.
Below is an example query and a link to a ticket that describes the issue
well. The part of the query that's highlighted doesn't work, and is there
purely to give an idea of what I'm after. This query (omitting the
highlighted portion) gives impression counts for every
placement-referer-device-date combo. This is fine but the output is HUGE! I
was wondering if there was a way (like a having clause or filter) to reduce
the amount of results based off some logic (in this case, impressions
counts greater than 50). Thanks all!

  • Trev

curl -XPOST "XXXXXXXX/YYYYYYYY/_search?pretty=true" -d '

{

"size":0,

"query": {

"filtered": {

"query": {

"range": {

"date_time": {

"from": "ZZZZZZZ",

"to": "QQQQQQQQ",

"include_lower": true,

"include_upper": true

}

}

}

}

},

"aggs": {

"placement": {

"terms": {

"field": "placement"

},

"aggs": {

"device": {

"terms": {

"field": "device"

},

"aggs": {

"referer": {

"terms": {

"field": "referer"

},

"aggs": {

"totals": {

"date_histogram": {

"field": "date_time",

"interval": "day"

},

"aggs": {

"impression": {

"sum": {

"field": "impression"

}

,"having" : { "from" : 50 }

}

}

}

}

}

}

}

}

}

}

}

'

--
You received this message because you are subscribed to the Google Groups "elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email to elasticsearch+unsubscribe@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/elasticsearch/c194a5e3-4f5b-4660-bf38-f1a2fad2cf55%40googlegroups.com.
For more options, visit https://groups.google.com/groups/opt_out.


(Zennet Wheatcroft) #2

As the open issue you quote suggests, ES currently has no support for an
equivalent to to SQL’s HAVING clause.
Here's another reference which supports that:
https://groups.google.com/forum/#!msg/elasticsearch/UsrCG2Abj-A/IDO9DX_PoQwJ

What I did as a workaround is get all the results in an intermediate layer
and then loop through them to leaving out the ones not meeting my boolean
criteria (COUNT(*) = x). But that is not really a solution to your problem
of too many results. I had 200,000 results which worked fine but if I had
200M that would not work so well. And it won't work for any of the
aggregation functions (sum, min, max, avg) other than count as far as I can
tell.

Have you considered the '"min_doc_count": 50' feature?
http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/search-aggregations-bucket-terms-aggregation.html#_minimum_document_count
I used it to filter out all the groups that have less than x documents and
then manually removed the groups with more than x.
In your case it looks like you want to filter on something like HAVING
SUM(impression) > 49 and I don't think there is even a workaround for that
because the functions, and even script filters, are applied to documents,
not to the aggregations. At least as far as I can tell. It would be great
if someone showed me otherwise.

Zennet

On Wednesday, February 5, 2014 7:50:01 AM UTC-8, tke...@rundsp.com wrote:

Hi guys,
I’m using elasticsearch 1.0.0RC2 and wondering if there is an
equivalent to SQL’s “having-clause” for the aggregation framework there.
Below is an example query and a link to a ticket that describes the issue
well. The part of the query that's highlighted doesn't work, and is there
purely to give an idea of what I'm after. This query (omitting the
highlighted portion) gives impression counts for every
placement-referer-device-date combo. This is fine but the output is HUGE! I
was wondering if there was a way (like a having clause or filter) to reduce
the amount of results based off some logic (in this case, impressions
counts greater than 50). Thanks all!

  • Trev

https://github.com/elasticsearch/elasticsearch/issues/4404

curl -XPOST "XXXXXXXX/YYYYYYYY/_search?pretty=true" -d '

{

"size":0,

"query": {

"filtered": {

"query": {

"range": {

"date_time": {

"from": "ZZZZZZZ",

"to": "QQQQQQQQ",

"include_lower": true,

"include_upper": true

}

}

}

}

},

"aggs": {

"placement": {

"terms": {

"field": "placement"

},

"aggs": {

"device": {

"terms": {

"field": "device"

},

"aggs": {

"referer": {

"terms": {

"field": "referer"

},

"aggs": {

"totals": {

"date_histogram": {

"field": "date_time",

"interval": "day"

},

"aggs": {

"impression": {

"sum": {

"field": "impression"

}

,"having" : { "from" : 50 }

}

}

}

}

}

}

}

}

}

}

}

'

--
You received this message because you are subscribed to the Google Groups "elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email to elasticsearch+unsubscribe@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/elasticsearch/99cd80a9-8f5d-4378-b4a5-09b8421f8c4e%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.


(system) #3