How to find the number of authors who have written between 2-3 books?


(Mike) #1

Assume each document is a book:
{ title: "A", author: "Mike" }
{ title: "B", author: "Mike" }
{ title: "C", author: "Mike" }
{ title: "D", author: "Mike" }

{ title: "E", author: "John" }
{ title: "F", author: "John" }
{ title: "G", author: "John" }

{ title: "H", author: "Joe" }
{ title: "I", author: "Joe" }

{ title: "J", author: "Jack" }

What is the best way to find the number of authors who have written between
2 and 3 books? In this case it would be 2: John and Joe.

I know I can run a terms aggregation on author with a very large size, and
then on the client side walk through the thousands of authors and count how
many have between 2 and 3 books. Is there a more efficient way to do this?
The cardinality aggregation is almost what I want, if only I could specify
a minimum and maximum term count.
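
For reference, the client-side approach described above could look roughly like this. It is only a sketch: the `books` list stands in for the indexed documents, and `terms_agg_body` shows the kind of request body one might send (the aggregation name `by_author` and the size value are made up for illustration). The bucket list is simulated in plain Python so the snippet runs without a cluster.

```python
from collections import Counter

# Sample documents from the question; in practice these live in the index.
books = [
    {"title": "A", "author": "Mike"}, {"title": "B", "author": "Mike"},
    {"title": "C", "author": "Mike"}, {"title": "D", "author": "Mike"},
    {"title": "E", "author": "John"}, {"title": "F", "author": "John"},
    {"title": "G", "author": "John"},
    {"title": "H", "author": "Joe"}, {"title": "I", "author": "Joe"},
    {"title": "J", "author": "Jack"},
]

# Request body for a terms aggregation on author. The name "by_author" and
# the size value are illustrative assumptions, not taken from a real mapping.
terms_agg_body = {
    "size": 0,
    "aggs": {"by_author": {"terms": {"field": "author", "size": 100000}}},
}


def simulate_terms_buckets(docs):
    """Mimic the bucket list of a terms aggregation: one key/doc_count pair per author."""
    counts = Counter(doc["author"] for doc in docs)
    return [{"key": key, "doc_count": count} for key, count in counts.items()]


def count_authors_in_range(buckets, low, high):
    """Count buckets whose doc_count lies in [low, high], inclusive."""
    return sum(1 for bucket in buckets if low <= bucket["doc_count"] <= high)


buckets = simulate_terms_buckets(books)
result = count_authors_in_range(buckets, 2, 3)
print(result)
```

The cost that motivates the question is visible here: every author bucket has to be shipped to the client before the range check can run.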

--
You received this message because you are subscribed to the Google Groups "elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email to elasticsearch+unsubscribe@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/elasticsearch/22fc4e6d-bcac-426c-a343-ff1d36fc25de%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.


(Itamar Syn-Hershko) #2

This is a Map/Reduce operation. You'll be better off maintaining a
ref-count document, IMO, than trying to hack the aggregations framework to
support this.

Another reason for doing it that way is that in a distributed environment
some aggregations can't be computed to an exact value; the terms bucketing
is one example. So if you need exact values, I'd go for a model that
provides them.
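
A reference-count model could be sketched as follows: one document per author holding a running `book_count`, updated whenever a book is indexed, so the question becomes a plain range count. The dict-based `author_counts` store and the `index_book` helper are hypothetical stand-ins for whatever index and update mechanism is actually used.

```python
# Hypothetical stand-in for an index of per-author count documents:
# author name -> {"author": ..., "book_count": ...}
author_counts = {}


def index_book(book):
    """Record a book and bump its author's reference count."""
    doc = author_counts.setdefault(
        book["author"], {"author": book["author"], "book_count": 0}
    )
    doc["book_count"] += 1


def authors_with_count_between(low, high):
    """Return the sorted names of authors whose book_count lies in [low, high]."""
    return sorted(
        doc["author"]
        for doc in author_counts.values()
        if low <= doc["book_count"] <= high
    )


# Sample data from the original question, as (title, author) pairs.
sample = [
    ("A", "Mike"), ("B", "Mike"), ("C", "Mike"), ("D", "Mike"),
    ("E", "John"), ("F", "John"), ("G", "John"),
    ("H", "Joe"), ("I", "Joe"),
    ("J", "Jack"),
]
for title, author in sample:
    index_book({"title": title, "author": author})

matching = authors_with_count_between(2, 3)
print(len(matching), matching)
```

Against a real index, the last step would presumably become a range filter on the stored count, for example `{"range": {"book_count": {"gte": 2, "lte": 3}}}`, with the number of hits as the answer. The trade-off is that the counts have to be kept in sync on every write and delete.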

--

Itamar Syn-Hershko
http://code972.com | @synhershko https://twitter.com/synhershko
Freelance Developer & Consultant
Author of RavenDB in Action http://manning.com/synhershko/



(Mike) #3

I'm ok with the count returned being some estimate. Say in this simple
example if it returned 1 for just Joe, or 3 for John, Joe, and Jack that
would be ok too. I am also ok with restructuring my data in any way to
more efficiently get this number.

You mentioned creating a reference count document. How would that look? One
doc per unique author, with a count of the total number of books they wrote,
so that I can then do a range aggregation on that number? What if I wanted
to find "the number of authors who have written between 2 and 3 books that
have a title containing E, F, H, or I" (still 2 in this case, John and Joe)?
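
One thing worth noting about the title-filtered variant: a single stored total per author cannot answer it, because the total does not record which titles contributed. A sketch of the logic that would have to run instead, in plain Python over the sample data, is to filter the books first and then count per author. The helper name `authors_in_range` is invented here, and the sketch assumes the question means "2 to 3 *matching* books per author".

```python
from collections import Counter

# Sample documents from the original question.
books = [
    {"title": "A", "author": "Mike"}, {"title": "B", "author": "Mike"},
    {"title": "C", "author": "Mike"}, {"title": "D", "author": "Mike"},
    {"title": "E", "author": "John"}, {"title": "F", "author": "John"},
    {"title": "G", "author": "John"},
    {"title": "H", "author": "Joe"}, {"title": "I", "author": "Joe"},
    {"title": "J", "author": "Jack"},
]


def authors_in_range(docs, wanted_titles, low, high):
    """Filter books by title, then return authors with low..high matching books.

    Exact title membership stands in for a full-text match here, since the
    sample titles are single letters.
    """
    filtered = [doc for doc in docs if doc["title"] in wanted_titles]
    per_author = Counter(doc["author"] for doc in filtered)
    return sorted(a for a, n in per_author.items() if low <= n <= high)


matching = authors_in_range(books, {"E", "F", "H", "I"}, 2, 3)
print(len(matching), matching)
```

Because the per-author count depends on the filter, it has to be computed at query time rather than read from a precomputed field.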



(Clinton Gormley) #4

Alternatively, if you model this with parent-child, then you can use
min_children/max_children, which will be available in the next release:

http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/query-dsl-has-child-filter.html#_min_max_children_2

clint
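
A sketch of what that might look like, assuming authors are indexed as parent documents and books as their children with type `book` (the type name and the `match_all` inner query are assumptions for illustration; check the linked documentation for the exact syntax in your version). `has_child_body` builds the request body, and `simulate_has_child` only mimics the semantics locally so the example runs without a cluster.

```python
def has_child_body(child_type, min_children, max_children):
    """Build a has_child request body bounded by min_children/max_children."""
    return {
        "query": {
            "has_child": {
                "type": child_type,  # assumed child type name
                "min_children": min_children,
                "max_children": max_children,
                "query": {"match_all": {}},
            }
        }
    }


# Local stand-in for the parent-child relation: author -> titles of their books.
children_by_parent = {
    "Mike": ["A", "B", "C", "D"],
    "John": ["E", "F", "G"],
    "Joe": ["H", "I"],
    "Jack": ["J"],
}


def simulate_has_child(children, min_children, max_children):
    """Return parents whose number of children lies in [min_children, max_children]."""
    return sorted(
        parent
        for parent, kids in children.items()
        if min_children <= len(kids) <= max_children
    )


body = has_child_body("book", 2, 3)
matching = simulate_has_child(children_by_parent, 2, 3)
print(len(matching), matching)
```

With this model, the number of authors would be the total hit count of the search rather than something computed on the client. Replacing the inner `match_all` with a title query would presumably cover the title-filtered variant as well.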


