Hi
I know some work has begun on this, but I thought I would post my
findings here. Our goal is the same as Zohar's: we want to generate
statistics on a numeric field, but have those statistics broken down
by the value of one or more other fields in the document.
We have come up with a (suboptimal) solution for this by making two
requests to elastic.
The first has a single terms facet on the field we want the statistics
broken down by. This gives us all the values of that field. The
second request is then built from the first: for each term we got
back, we add a statistical facet with a filter on that term.
So if the terms facet returned 1000 terms, we would then make a second
request with 1000 statistical facets, each with a different filter.
This works, but as I'm sure you can imagine, performance suffers
horribly!
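To make the cost of this workaround concrete, here is a rough Python sketch of how the two request bodies might be built. The JSON shape (a terms facet with a `size`, and statistical facets narrowed by `facet_filter`) is assumed from the 0.x-era facets syntax and should be checked against the docs; the field names `category`/`price` and the helpers `terms_request`/`stats_request` are hypothetical. Nothing is sent over the network.

```python
import json


def terms_request(group_field, size=1000):
    # First request: a single terms facet on the field we group by.
    # "size" is raised so that (hopefully) every distinct value is returned.
    return {
        "query": {"match_all": {}},
        "facets": {
            "groups": {"terms": {"field": group_field, "size": size}},
        },
    }


def stats_request(group_field, value_field, terms):
    # Second request: one statistical facet per term, each narrowed by a
    # facet_filter on that term. N terms means N facets in a single request.
    facets = {}
    for i, term in enumerate(terms):
        facets[f"stats_{i}"] = {
            "statistical": {"field": value_field},
            "facet_filter": {"term": {group_field: term}},
        }
    return {"query": {"match_all": {}}, "facets": facets}


if __name__ == "__main__":
    # Hypothetical terms, as if parsed from the first response.
    body = stats_request("category", "price", ["books", "music", "toys"])
    print(json.dumps(body, indent=2))
```

Facet `stats_i` corresponds to `terms[i]`. With 1000 terms the second body carries 1000 facets, each with its own filter to evaluate, which is roughly where the performance goes.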
Regards
Neil
On Dec 28 2010, 6:35 pm, Shay Banon shay.ba...@elasticsearch.com
wrote:
What you are after makes sense. The main challenge with facets is that
they get really interesting once you start to combine them (as is the
case in this thread, with terms and stats). The problem is that the
facet implementations are highly optimized, for the simple reason that
they might end up running over hundreds of millions of docs.
Implementing all the combinations in a generic fashion is certainly
possible, but it will incur a performance overhead (in computation,
and even more in serialization over the network).
One of the things lined up for 0.15 is some refactoring of facets to
make them more pluggable. Once that's out of the way, people can write
their own facet implementations.
Of course, there should also be a good out-of-the-box set of facets
that comes with ES. My current line of thought is that there will
simply be a lot of facet types, all heavily optimized: there will be a
terms_stats, a date_histogram, and others. I don't mind implementing
all of those and having them as part of ES. Hopefully the community
will help with it (or, at the very least, help come up with good names
for them), so you will get a really rich and heavily optimized set of
facets.
-shay.banon
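The idea behind a combined terms + stats facet (the terms_stats mentioned above) can be sketched in plain Python over in-memory documents. This only illustrates what such a facet computes; it is not Elasticsearch code, and the `Stats` class, the `terms_stats` function and the `category`/`price` fields are hypothetical. The point is that every document is visited once and updates only its own group, instead of one filter being evaluated per term as in the two-request workaround.

```python
import math
from dataclasses import dataclass


@dataclass
class Stats:
    # Running statistics for one group: count, total, min and max.
    count: int = 0
    total: float = 0.0
    min_value: float = math.inf
    max_value: float = -math.inf

    def add(self, value):
        self.count += 1
        self.total += value
        self.min_value = min(self.min_value, value)
        self.max_value = max(self.max_value, value)

    @property
    def mean(self):
        return self.total / self.count if self.count else float("nan")


def terms_stats(docs, key_field, value_field):
    # Single pass over the documents: each document updates only the
    # statistics of the group named by its key_field value.
    groups = {}
    for doc in docs:
        key = doc.get(key_field)
        value = doc.get(value_field)
        if key is None or value is None:
            continue  # skip documents missing either field
        groups.setdefault(key, Stats()).add(value)
    return groups


if __name__ == "__main__":
    docs = [
        {"category": "books", "price": 10},
        {"category": "books", "price": 20},
        {"category": "toys", "price": 5},
    ]
    for key, stats in terms_stats(docs, "category", "price").items():
        print(key, stats.count, stats.total, stats.min_value, stats.max_value, stats.mean)
```

Keeping one accumulator per key means the work grows with the number of documents, not with documents times terms.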
On Tue, Dec 28, 2010 at 2:05 PM, harelba hare...@gmail.com wrote:
Hi,
I've been looking for a way to perform aggregations similar to the
ones discussed in this thread, grouping the data according to an
arbitrary set of fields (or, better yet, an expression).
The ScriptHistogramFacet seemed like a good choice, since it allows
the key to be a "key_script" and skips the "bucketing" stage. I
thought that this would let me achieve this kind of aggregation, but
then I saw that ScriptHistogramFacetCollector.doCollect() relies on
the value returned from key_script being of type Number, even when
interval==0. I know that currently you're using LongLong maps, but if
they accepted other types as well (at least strings), that would be
really great.
Am I getting it wrong? Is there a good way to do this? Your help would
be much appreciated.
Thanks,
RL
btw, it would be really cool if the data collected by the
StatisticalFacet were integrated into the HistogramFacet (and its
scripted brother). The StatisticalFacet is great, but oftentimes the
statistical data is needed per some kind of "group", and not only
under some kind of filter over the whole data set.
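For the interval==0 case described above, a histogram without bucketing is just a count per key. A minimal Python sketch, assuming in-memory documents and a hypothetical key function, shows that the grouping itself does not need numeric keys once the counts live in a general hash map. It is an illustration only, not how ScriptHistogramFacetCollector works, and `script_key_counts` and `country_and_device` are made-up names.

```python
from collections import Counter


def script_key_counts(docs, key_script):
    # key_script stands in for a "key_script": any function of the document.
    # Counts are kept in a Counter (a dict), so keys may be strings or tuples
    # instead of the longs a long->long map requires.
    counts = Counter()
    for doc in docs:
        key = key_script(doc)
        if key is not None:
            counts[key] += 1
    return counts


def country_and_device(doc):
    # Hypothetical expression combining two fields into one string key.
    return f"{doc['country']}/{doc['device']}"


if __name__ == "__main__":
    docs = [
        {"country": "IL", "device": "mobile"},
        {"country": "IL", "device": "mobile"},
        {"country": "US", "device": "desktop"},
    ]
    print(script_key_counts(docs, country_and_device))
```

The trade-off is memory and speed: a primitive long->long map avoids boxing and hashing objects, which is presumably part of why the existing collector is restricted to numbers.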