Proper use of aggregation?


(Tinou Bao) #1

My instinct says this is not the proper use of aggregation, but I want to
check with people who have actually used it. We want to bucket on a very
high-cardinality field and return ALL buckets (no size limit). For example,
imagine documents representing people and their parents:

person - parent

john - cindy
james - cindy
tony - mark
tim - doug

I want to bucket by parent, so it'll be

cindy

  • john
  • james

mark

  • tony

doug

  • tim
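
To make the desired output concrete, here is a minimal in-memory sketch of the same grouping in Python. It only illustrates the target shape, not how Elasticsearch computes it; the `rows` list and the `group_by_parent` helper are hypothetical names built from the sample data above.

```python
from collections import defaultdict

# Sample documents from the post, as (person, parent) pairs.
rows = [
    ("john", "cindy"),
    ("james", "cindy"),
    ("tony", "mark"),
    ("tim", "doug"),
]


def group_by_parent(pairs):
    """Return {parent: [person, ...]}, keeping parents in first-seen order."""
    groups = defaultdict(list)
    for person, parent in pairs:
        groups[parent].append(person)
    return dict(groups)


grouped = group_by_parent(rows)
for parent, people in grouped.items():
    print(parent, people)
```

Every distinct parent becomes one group, which is why the number of buckets grows with the cardinality of the parent field.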

This is a high-cardinality field, which already concerns me. I want all
buckets (setting size to zero), so if I have 10,000 documents with 5,000
distinct parents, I want all 5,000 parent buckets. Essentially I'm trying to
display by parent (group by parent). Moreover, I want to sort the buckets by
the parent's age (imagine each document has the parent's age in it), or
maybe by the average age of the people (children) in each bucket. With
aggregations this seems possible:

bucket by parent, sort by average age of person, bucket by person (to get
all people for a parent bucket), set size to zero.
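
As a rough sketch, the request described above could look like the following, built here as a Python dict. The field names `parent`, `person`, and `age`, the aggregation names, and the `build_group_by_parent_request` helper are all assumptions for illustration. Using `"size": 0` on a terms aggregation to mean "all buckets" follows the description in this post; whether that value is accepted depends on the Elasticsearch version.

```python
import json


def build_group_by_parent_request(parent_field="parent",
                                  person_field="person",
                                  age_field="age"):
    """Build a search body: bucket by parent, order by average child age."""
    return {
        # Top-level size 0: return no search hits, only aggregation results.
        "size": 0,
        "aggs": {
            "by_parent": {
                "terms": {
                    "field": parent_field,
                    # Assumption from the post: 0 means "no bucket limit".
                    "size": 0,
                    # Order parent buckets by the sub-aggregation below.
                    "order": {"avg_child_age": "asc"},
                },
                "aggs": {
                    # Average age of the people in each parent bucket.
                    "avg_child_age": {"avg": {"field": age_field}},
                    # Nested buckets listing every person under the parent.
                    "people": {
                        "terms": {"field": person_field, "size": 0},
                    },
                },
            }
        },
    }


request_body = build_group_by_parent_request()
payload = json.dumps(request_body, indent=2)
print(payload)
```

The resulting JSON would be sent as the body of a search request. Note that both terms aggregations are unbounded here, so the response size scales with the number of distinct parents and people.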

But it feels very wrong to me, both because of the potential performance
issues around unlimited, high-cardinality buckets and the sorting of those
buckets, and because aggregation/bucketing wasn't designed for this.

Any input/feedback would be appreciated.
-T



(system) #2