Get number of unique values in a field

Hi,
I was trying out the cardinality aggregation. I believe it gives an
approximate count of the number of unique users.

Below is what I am using.
{
  "aggs" : {
    "user_count" : {
      "cardinality" : {
        "field" : "userid"
      }
    }
  }
}
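For reference, here is a minimal Python sketch that builds the same request body. It assumes you serialize it to JSON yourself and send it to an index's `_search` endpoint with any HTTP client. It adds two things that are not in the query above: `"size": 0`, so no hits are returned, and the optional `precision_threshold` setting mentioned in the reply. The helper names are hypothetical. The count itself comes back under `aggregations.user_count.value` in the response.

```python
from typing import Any, Dict, Optional


def cardinality_request(field: str, precision_threshold: Optional[int] = None) -> Dict[str, Any]:
    """Build a search body that approximately counts distinct values of `field`."""
    cardinality: Dict[str, Any] = {"field": field}
    if precision_threshold is not None:
        # Distinct counts below this value are expected to be close to exact;
        # higher values use more memory.
        cardinality["precision_threshold"] = precision_threshold
    return {
        "size": 0,  # no hits needed, only the aggregation result
        "aggs": {"user_count": {"cardinality": cardinality}},
    }


def read_user_count(response: Dict[str, Any]) -> int:
    """Pull the approximate distinct count out of a parsed search response."""
    return response["aggregations"]["user_count"]["value"]
```

`json.dumps(cardinality_request("userid", precision_threshold=1000))` produces the body to send.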

Can someone confirm a few things for me?

  1. How accurate is the result?
  2. Is this the only way, or are there other options as well?
  3. This feature is marked experimental in the docs. What is the roadmap
    for it, if any?

Thanks,
Saurabh

--
You received this message because you are subscribed to the Google Groups "elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email to elasticsearch+unsubscribe@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/elasticsearch/35040cd2-49ed-4363-857e-f66892d64faf%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

Hi,

On Mon, Nov 17, 2014 at 7:55 AM, Saurabh Minni <saurabh.minni@gmail.com> wrote:

  1. How accurate is the result?

The accuracy is quite good in general. We tried to give some examples in
the documentation showing that, even with rather low values of the
precision threshold, the error is often very low. The paper about
HyperLogLog++ (the algorithm beneath the cardinality aggregation) gives
more information about the error margin you can expect (see figure 8
in particular).

http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/search-aggregations-metrics-cardinality-aggregation.html#_counts_are_approximate
http://stefanheule.com/papers/edbt2013-hyperloglog.pdf
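To make the accuracy discussion concrete, here is a simplified Python sketch of the general idea: count exactly (a set of 64-bit hashes) while the number of distinct values is small, then switch to HyperLogLog registers with the linear-counting correction for small ranges. This is an illustration under stated assumptions, not Elasticsearch's implementation: it hashes with BLAKE2b instead of the MurmurHash3 hash Elasticsearch uses, it is plain HyperLogLog rather than HyperLogLog++ (no sparse encoding, no empirical bias correction), and the class name and the `threshold` switch-over value are hypothetical. With `p` index bits there are m = 2^p registers, and the relative standard error of plain HyperLogLog is roughly 1.04/sqrt(m), about 0.8% for p = 14.

```python
import hashlib
import math
from typing import List, Optional, Set


def hash64(value: str) -> int:
    """Deterministic 64-bit hash (BLAKE2b), standing in for MurmurHash3."""
    return int.from_bytes(hashlib.blake2b(value.encode("utf-8"), digest_size=8).digest(), "big")


class HybridDistinctCounter:
    """Hypothetical counter: exact hash set while small, plain HyperLogLog afterwards."""

    def __init__(self, p: int = 14, threshold: int = 3000) -> None:
        if not 7 <= p <= 18:
            raise ValueError("p must be between 7 and 18 in this sketch")
        self.p = p                                   # index bits
        self.m = 1 << p                              # number of registers
        self.threshold = threshold                   # hypothetical switch-over point
        self.hashes: Optional[Set[int]] = set()      # exact mode
        self.registers: Optional[List[int]] = None   # HyperLogLog mode

    def add(self, value: str) -> None:
        h = hash64(value)
        if self.hashes is not None:
            self.hashes.add(h)
            if len(self.hashes) > self.threshold:
                # Too many distinct values: fold the exact hashes into registers.
                self.registers = [0] * self.m
                for old in self.hashes:
                    self._update_register(old)
                self.hashes = None
        else:
            self._update_register(h)

    def _update_register(self, h: int) -> None:
        width = 64 - self.p
        index = h >> width                       # top p bits choose the register
        rest = h & ((1 << width) - 1)            # remaining bits
        rank = width - rest.bit_length() + 1     # leading zeros in `rest`, plus one
        if rank > self.registers[index]:
            self.registers[index] = rank

    def estimate(self) -> float:
        if self.hashes is not None:
            return float(len(self.hashes))       # exact, barring 64-bit hash collisions
        m = self.m
        alpha = 0.7213 / (1 + 1.079 / m)         # HyperLogLog bias constant for m >= 128
        raw = alpha * m * m / sum(2.0 ** -r for r in self.registers)
        zeros = self.registers.count(0)
        if raw <= 2.5 * m and zeros > 0:
            return m * math.log(m / zeros)       # linear counting for small ranges
        return raw
```

Raising `p` trades memory (2^p registers) for accuracy, in the same spirit as raising `precision_threshold` in the aggregation.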

  2. Is this the only way, or are there other options as well?

Not really. If you know the cardinality is going to be low (< 1000), you
could use a terms aggregation with a size of 0 (which tells Elasticsearch
to return all terms) and count the number of terms returned. Although this
would give you the exact number of distinct terms, it would not scale to
high cardinalities, and the cardinality aggregation has optimizations
that make it nearly exact when cardinalities are low anyway.
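The terms-based alternative described above can be sketched the same way in Python. The aggregation name `distinct_users` and the helper names are hypothetical. The `size: 0` convention is the Elasticsearch 1.x behaviour described in this reply; later major versions no longer accept it, so there you would likely need an explicit `size` at least as large as the expected number of distinct values (worth checking against the docs for your version).

```python
from typing import Any, Dict


def terms_request(field: str, size: int = 0) -> Dict[str, Any]:
    """Terms aggregation body; in Elasticsearch 1.x a size of 0 meant 'return all terms'."""
    return {
        "size": 0,  # skip the hits themselves
        "aggs": {"distinct_users": {"terms": {"field": field, "size": size}}},
    }


def exact_distinct_count(response: Dict[str, Any], agg_name: str = "distinct_users") -> int:
    """Exact distinct count: the terms aggregation returns one bucket per distinct value."""
    return len(response["aggregations"][agg_name]["buckets"])
```

Every distinct value costs one bucket in the response, which is why this approach does not scale to high cardinalities.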

  3. This feature is marked experimental in the docs. What is the roadmap
    for it, if any?

There are no concrete plans at the moment. When we added this aggregation
in Elasticsearch 1.1, it was quite new compared with the functionality
Elasticsearch already exposed, so we marked it experimental to keep the
freedom to modify it based on feedback. The experimental flag will very
likely be removed in the next major version.

--
Adrien Grand

Hi Adrien,
Thanks for the quick reply; this answers everything for me.

-Saurabh

On Mon, Nov 17, 2014 at 4:26 PM, Adrien Grand <adrien.grand@elasticsearch.com> wrote:
