Hi there,
I'm a little new to the nuances of building aggregations, but essentially I
am trying to construct an aggregation that results in an "in" bucket and
an "out" bucket with respect to some predicate (filter?) that I want to
apply.
I can easily get the "in" bucket by using a filter aggregation, but if I
also want to see the inverse of this filter, I would rather not have to
repeat the filter to identify the "out" set.
Is there an easy way to do this?
Thanks,
Ross
--
You received this message because you are subscribed to the Google Groups "elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email to elasticsearch+unsubscribe@googlegroups.com .
To view this discussion on the web visit https://groups.google.com/d/msgid/elasticsearch/b2a8e295-6db1-49ce-af2d-d78638f9cf48%40googlegroups.com .
For more options, visit https://groups.google.com/d/optout .
jpountz
(Adrien Grand)
February 26, 2015, 11:21pm
2
Hi Ross,
There is no way to do this today without repeating the filter and nesting the
copy inside a not filter. We are considering adding information about missing
and other buckets to our terms aggregation; you can read the discussion in
this issue:
opened 05:18PM - 03 Mar 14 UTC
closed 08:16AM - 23 Jul 15 UTC
>feature
:Analytics/Aggregations
NEED: In many (if not the majority of) cases, when presenting users with business analytics… , the user wants to see numbers for the complete data set. No matter how you aggregate, the result should present the same data with the same number of documents. The inability to handle "missing" values excludes those documents from the analysis, making the analyzed data set incomplete and the grand totals dependent on which field(s) the aggregation is done on. It is impossible to explain to users why the lower-level totals do not add up to the upper-level ones!
WORKAROUND: Currently, field-based bucket aggregations (terms, range, etc.) have no way to aggregate missing values. The only way is to use a missing aggregation on the same level and the same field as the terms aggregation itself. That is easy enough for one-level aggregations, but with a 2-3 level aggregation the number of "missing" aggregations (and the complete lower-level aggregations that must be repeated inside them) mushrooms very quickly, to the point that the query is huge, convoluted, and not debuggable. It may affect performance as well. Also, the fetched data needs to be heavily post-processed to extract the multi-level aggregation buckets from under the various "missing" elements and put them inline with the regular aggregation values. Below is a simple query that does a 2-level aggregation with just one sum metric.
PROPOSAL: I would suggest that any aggregation operating on a field should have a missing option. If the missing config is specified, the aggregation should accumulate missing values under that value and honor any nested aggregations within. It should never assume a value like 0 or _missing, since that may clash with actual keys. If it is not specified, the aggregation should skip missing values as it does now.
This approach is entirely compatible with the existing logic and gives developers complete control over whether to aggregate missing values and under what key. When it is not needed (and not specified) there is no performance overhead. When it is needed it will work faster, as we would not need to run the missing aggregation and the aggregations under it separately (the same goes for an "other" aggregation).
To be honest, I would love to see the same handling for "other": documents that have not been included in the aggregation due to the aggregation size constraints. Again the same rationale: the ability to slice the complete data set regardless of aggregation structure. It is just as needed as "missing" and just as troublesome to calculate, but
I could understand if you did not add it, as it may not be compatible with your algorithms. But PLEASE PLEASE add "missing" handling at least.
```
{
  "total": {
    "sum": {
      "field": "money.totals.obligationTotal"
    }
  },
  "missing": {
    "missing": {
      "field": "division"
    },
    "aggs": {
      "total": {
        "sum": {
          "field": "money.totals.obligationTotal"
        }
      },
      "missing": {
        "missing": {
          "field": "fy"
        }
      },
      "group": {
        "terms": {
          "field": "fy",
          "order": { "_term": "asc" }
        },
        "aggs": {
          "total": {
            "sum": {
              "field": "money.totals.obligationTotal"
            }
          }
        }
      }
    }
  },
  "group": {
    "terms": {
      "field": "division",
      "order": { "_term": "asc" },
      "size": 100
    },
    "aggs": {
      "total": {
        "sum": {
          "field": "money.totals.obligationTotal"
        }
      },
      "missing": {
        "missing": {
          "field": "fy"
        },
        "aggs": {
          "total": {
            "sum": {
              "field": "money.totals.obligationTotal"
            }
          }
        }
      },
      "group": {
        "terms": {
          "field": "fy",
          "order": { "_term": "asc" }
        },
        "aggs": {
          "total": {
            "sum": {
              "field": "money.totals.obligationTotal"
            }
          }
        }
      }
    }
  }
}
```
cc @uboness, @jpountz
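For the original question, the workaround from the reply (repeat the filter and wrap the copy in a not filter) could look roughly like the sketch below. It builds the request body in Python so the predicate is written only once on the client side, even though it is still sent twice. The helper name `in_out_aggs`, the bucket names, and the `status` field are made up for illustration, and the body assumes the filter syntax of the 1.x query DSL.

```python
import json


def in_out_aggs(predicate):
    """Build sibling "in" and "out" filter aggregations from one filter.

    The "out" bucket reuses the same predicate wrapped in a not filter,
    so the filter is defined once in client code.
    """
    return {
        "in": {"filter": predicate},
        "out": {"filter": {"not": predicate}},
    }


# Hypothetical predicate: the field name and value are placeholders.
predicate = {"term": {"status": "active"}}

# size 0 asks for aggregation results only, with no search hits.
body = {"size": 0, "aggs": in_out_aggs(predicate)}

print(json.dumps(body, indent=2))
```

The printed JSON can be sent as the body of a search request. Sub-aggregations, if needed, would have to be added under both buckets in the same way.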
--
Adrien Grand