Hi,
I was trying out the Cardinality Aggregation. I believe it gives an
approximate count of the number of unique users.
Below is what I am using.
{
  "aggs" : {
    "user_count" : {
      "cardinality" : {
        "field" : "userid"
      }
    }
  }
}
Can someone confirm a few things for me?
What is the accuracy of the result?
The accuracy is quite good in general. We tried to give some examples in
the documentation to show that even with rather low values of the
precision threshold, the error is often very low. The paper about
HyperLogLog++ (the algorithm behind the cardinality aggregation) gives
more information about the error margin that you can expect (see figure 8
in particular).
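For reference, the precision threshold mentioned above is set with the `precision_threshold` option on the aggregation itself; below is a sketch of how that could look for the query in this thread (the value 1000 is just an illustrative example, not a recommendation):

```json
{
  "aggs" : {
    "user_count" : {
      "cardinality" : {
        "field" : "userid",
        "precision_threshold" : 1000
      }
    }
  }
}
```

Counts below the threshold are expected to be close to exact; above it, counts become increasingly approximate, at the cost of more memory for higher thresholds.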
Is this the only way, or are there other options to do this as well?
Not really. If you know the cardinality is going to be low (< 1000), you
could use a terms aggregation with a size of 0 (which tells Elasticsearch
to return all terms) and count the number of terms returned. Although this
would give you the exact number of terms, it would not scale to high
cardinalities, and the cardinality aggregation has optimizations
that make it almost exact when cardinalities are low anyway.
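A sketch of that alternative, assuming the same `userid` field as above (the aggregation name `unique_users` is just a placeholder); you would then count the buckets returned in the response on the client side:

```json
{
  "aggs" : {
    "unique_users" : {
      "terms" : {
        "field" : "userid",
        "size" : 0
      }
    }
  }
}
```

Note that this returns every distinct term as a bucket, so the response size grows with the cardinality, which is why it only suits low-cardinality fields.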
The docs say this feature is experimental; what is the future roadmap
for it, if any?
There are no concrete plans at the moment. When we added this aggregation
in Elasticsearch 1.1, it was quite new among the functionalities that
Elasticsearch exposes, so we marked it experimental in order to
have the freedom to modify it based on feedback. The experimental flag will
very likely be removed in the next major version.