Aggregations sort on doc_count of a filter


(Nils) #1

Hi,

Aggregations ROCK!

But....
Imagine, we have a document with a number (n) in the range of -10 to 10.
And a field with a couple of terms.

Now I want to create a view with two columns, the first for the number of
documents having a positive value, the second for documents having a
negative value. This is easily done with an aggregation like this:

{
"size": 0,
"aggs": {
"label": {
"terms": {
"field": "somefield"
},
"aggs": {
"positive": {
"filter": {
"range": {
"n": { "gt": 0 }
}
}
},
"negative": {
"filter": {
"range": {
"n": { "lt": 0 }
}
}
}
}
}
}
}

The counts of the positive and negative documents can be found in
'positive.doc_count' and the 'negative.doc_count'. Everything is fine.

Now you want to sort you aggregation on the label with the most positive
documents, so we add the following to the 'terms':
...
{
"terms": {
"field": "somefield",
"order": { "positive": "desc" }
},
...
}
...

We get an error back saying:
SearchPhaseExecutionException[Failed to execute phase [query], all shards
failed; shardFailures {[RwghWCxzQ-S9SyjTAX119A][XXXXXXXXXXXXX][8]:
AggregationExecutionException[terms aggregation [label] is configured to
order by sub-aggregation [positive] which is is not a metrics aggregation.
Terms aggregation order can only refer to metrics aggregations]}

This error tells me that I can't order on the doc_count of a filter because
filter is not a metrics aggregation. It would be really help full to be
able to sort on filtered aggregations as well. I would even go as far as
sorting on sub aggregations of filters! That way you could sum the value of
the positive documents and sort on the sum as well (although I would not
know what that number is supposed to represent now).

This might not be trivial to implement, but I think it is worth looking in
to.

At least I found a way to get around this problem in the short term by
summing the value of a script which does the heavy lifting in checking if
the document should be counted, but we have some configuration files now
where we have 100+ columns representing counts of documents with according
filters which you cannot really port automatically to a value_script of a
sum.

If people are interesting in the workaround this is the query:
{
"size": 0,
"aggs": {
"label": {
"terms": {
"field": "somefield",
"order": { "positive": "desc" }
},
"aggs": {
"positive": {
"sum": {
"field": "n",
"script": "_value>0?1:0"
}
},
"negative": {
"sum": {
"field": "n",
"script": "_value<0?1:0"
}
}
}
}
}
}

Could we get the sorting on filtered aggregations as a feature request? Or
is this simply impossible to achieve (with decent performance) in the
aggregations framework?

-- Nils

On a side note, the above queries are not tested on elasticsearch so could
contain errors when copy/paste into an elasticsearch request.

--
You received this message because you are subscribed to the Google Groups "elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email to elasticsearch+unsubscribe@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/elasticsearch/971d14e9-e9e7-45f3-bc54-5c5c2c3603fe%40googlegroups.com.
For more options, visit https://groups.google.com/groups/opt_out.


(system) #2