What is the best way to find the number of authors who have written between
2 and 3 books? In this case it would be 2, John and Joe.
I know I can do a terms aggregation on author, set size to be very, very
large, and then on the client side traverse the thousands of authors and
count how many have between 2 and 3 books. Is there a more efficient way
to do this? The cardinality aggregation is almost what I want, if only I
could specify a min and max term count.
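For reference, the client-side approach described above could look roughly like the sketch below. The request body uses the standard terms aggregation shape, but the field name `author`, the aggregation name `books_per_author`, the size of 10000, and the per-author counts in the sample response are all assumptions made up for illustration; the thread does not show the actual data or mapping.

```python
def count_authors_in_range(response, agg_name, min_books, max_books):
    """Count terms buckets whose doc_count lies in [min_books, max_books]."""
    buckets = response["aggregations"][agg_name]["buckets"]
    return sum(1 for b in buckets if min_books <= b["doc_count"] <= max_books)


# Search request body: a terms aggregation on an assumed "author" field.
# "size": 0 at the top level suppresses the hits; only buckets come back.
request_body = {
    "size": 0,
    "aggs": {"books_per_author": {"terms": {"field": "author", "size": 10000}}},
}

# Hypothetical response fragment, shaped like a terms aggregation result.
sample_response = {
    "aggregations": {
        "books_per_author": {
            "buckets": [
                {"key": "Jack", "doc_count": 4},
                {"key": "Joe", "doc_count": 3},
                {"key": "John", "doc_count": 2},
            ]
        }
    }
}

matching = count_authors_in_range(sample_response, "books_per_author", 2, 3)
print(matching)  # John and Joe fall in the 2..3 range
```

The cost is the one the question already points out: every author bucket has to be shipped to the client before it can be counted.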
This is a Map/Reduce operation. You'll be better off maintaining a
ref-count document, IMO, than trying to hack the aggregations framework to
support this.
Another reason for doing it that way is that in a distributed environment
some aggregations can't be computed to an exact value - the Terms bucketing
is one example. So if you need exact values, I'd go for a model that does it.
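To illustrate why terms bucketing can be inexact across shards, here is a deliberately simplified model, not Elasticsearch's actual implementation: each shard reports only its top `shard_size` terms and the coordinating node sums what it receives. The shard contents and the helper names `merged_top_terms` and `authors_in_range` are hypothetical.

```python
from collections import Counter


def merged_top_terms(shards, shard_size):
    """Sum the counts of each shard's top `shard_size` terms only."""
    merged = Counter()
    for shard in shards:
        for term, count in Counter(shard).most_common(shard_size):
            merged[term] += count
    return merged


def authors_in_range(counts, min_books, max_books):
    """Number of authors whose count lies in [min_books, max_books]."""
    return sum(1 for n in counts.values() if min_books <= n <= max_books)


# Hypothetical data: one list entry per book, holding its author.
shards = [
    ["John", "John", "Joe"],
    ["Jack", "Jack", "Jack", "Joe"],
    ["Jack", "Jack", "Joe"],
]

# Exact counts, as if every document lived on a single shard.
exact = Counter(author for shard in shards for author in shard)
# Approximate counts when each shard reports only its single top term.
approx = merged_top_terms(shards, shard_size=1)

exact_in_range = authors_in_range(exact, 2, 3)
approx_in_range = authors_in_range(approx, 2, 3)
print(exact_in_range, approx_in_range)
```

In this toy data Joe has three books in total but is never the top author on any single shard, so he drops out of the merged result entirely.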
I'm OK with the count returned being an estimate. Say, in this simple
example, if it returned 1 for just Joe, or 3 for John, Joe, and Jack, that
would be OK too. I am also OK with restructuring my data in any way to
get this number more efficiently.
You mentioned creating a reference count document. How would that look? One
doc per unique author, with a count of the total number of books he wrote,
so that I can then do a range aggregation on that number? What if I wanted to
find "the number of authors who have written between 2 and 3 books that have
a title containing E, F, H, or I" (still 2 in this case, John and Joe)?
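One possible reading of the "1 doc per unique author" idea is sketched below. It is only an illustration: the field names `author` and `book_count`, the aggregation name `authors_in_range`, and the book list are assumptions, and nothing here is confirmed by the thread as the intended design. The request body uses the range aggregation, whose `to` bound is exclusive, so an inclusive 2..3 range becomes `from: 2, to: 4`.

```python
from collections import Counter


def build_author_docs(books):
    """Collapse book documents into one ref-count document per author."""
    counts = Counter(book["author"] for book in books)
    return [{"author": a, "book_count": n} for a, n in sorted(counts.items())]


def range_agg_body(min_books, max_books):
    """Range aggregation over book_count; `to` is exclusive, hence +1."""
    return {
        "size": 0,
        "aggs": {
            "authors_in_range": {
                "range": {
                    "field": "book_count",
                    "ranges": [{"from": min_books, "to": max_books + 1}],
                }
            }
        },
    }


# Hypothetical book documents; titles and authors are made up.
books = [
    {"title": "A", "author": "John"}, {"title": "B", "author": "John"},
    {"title": "C", "author": "Joe"}, {"title": "D", "author": "Joe"},
    {"title": "E", "author": "Joe"}, {"title": "F", "author": "Jack"},
    {"title": "G", "author": "Jack"}, {"title": "H", "author": "Jack"},
    {"title": "I", "author": "Jack"},
]

author_docs = build_author_docs(books)
body = range_agg_body(2, 3)

# Local check of what the range bucket would contain for these documents.
in_range = [d["author"] for d in author_docs if 2 <= d["book_count"] <= 3]
print(len(in_range), in_range)
```

A precomputed count like this reflects all of an author's books, so on its own it would not answer the title-filtered variant of the question; that case would need the counts to be derived from the filtered set of books.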
On Thursday, June 19, 2014 6:43:41 PM UTC-4, Itamar Syn-Hershko wrote:
This is a Map/Reduce operation. You'll be better off maintaining a
ref-count document, IMO, than trying to hack the aggregations framework to
support this.
Another reason for doing it that way is that in a distributed environment
some aggregations can't be computed to an exact value - the Terms bucketing
is one example. So if you need exact values, I'd go for a model that does it.