Cardinality not giving expected results


(Tim Uckun) #1

I have roughly 50 million records in ES. The data was generated
artificially and is mostly duplicates. I am executing the following query:

{
    size: 0,
    aggregations: {
        by_month: {
            date_histogram: {
                field:    "time_stamp",
                interval: "1M",
                format:   "yyyy-MM-dd HH:mm"
            },
            aggregations: {
                by_node_mac: {
                    terms: {
                        field: "node_mac"
                    },
                    aggregations: {
                        cardinality: {field: "device_mac"}
                    }
                }
            }
        }
    }
}

I expect the cardinality of these to be in the hundreds, but I am getting
hundreds of thousands and even millions as a count.

It looks like it's not counting the cardinality but actually counting the
number of records.
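
To make the distinction concrete, here is a minimal sketch on made-up data (the MAC values and record count are hypothetical, not taken from the index described above). It contrasts the number of records with the number of distinct device_mac values, which is what a cardinality aggregation is meant to estimate. Note that Elasticsearch computes cardinality approximately, so this exact count only illustrates the expected order of magnitude.

```python
# Hypothetical sample: many records, but only a few distinct device_mac values.
records = [
    {"node_mac": "node-a", "device_mac": f"device-{i % 3}"}
    for i in range(1000)
]

# What a plain document count reports for the bucket.
doc_count = len(records)

# What a distinct count (cardinality) should report for the same bucket.
distinct_devices = len({r["device_mac"] for r in records})

print(doc_count, distinct_devices)
```

If the value returned for each bucket tracks the first number rather than the second, the result behaves like a document count and not like a distinct count.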

--
You received this message because you are subscribed to the Google Groups "elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email to elasticsearch+unsubscribe@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/elasticsearch/68eaee35-1069-4b25-99d4-7719df7d13f6%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.


(Adrien Grand) #2

Thank you for this report; this does look like a bug. Do you have the
same issue if you use the cardinality aggregation as a top-level
aggregation? It is executed a bit differently in that case, so I suspect
the bug might only occur when it is used as a sub-aggregation.
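
A minimal sketch of what such a top-level test could look like, assuming the same device_mac field as in the original query. The aggregation name distinct_devices and the helper function are hypothetical, chosen only for illustration. The sketch just builds the search request body; how it is sent to the cluster is left out.

```python
import json


def top_level_cardinality_body(field="device_mac", agg_name="distinct_devices"):
    """Build a search body with cardinality as the only, top-level aggregation.

    agg_name is an arbitrary label (hypothetical here); each aggregation in the
    request is keyed by a name, with the aggregation type nested beneath it.
    """
    return {
        "size": 0,  # no hits needed, only the aggregation result
        "aggregations": {
            agg_name: {
                "cardinality": {"field": field}
            }
        },
    }


body = top_level_cardinality_body()
print(json.dumps(body, indent=2))
```

Comparing the value returned for this request with the per-bucket values from the nested query would show whether the problem is specific to the sub-aggregation case.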

In reply to Tim Uckun's post of Tue, Apr 15, 2014 at 4:39 AM (#1 above).

--
Adrien Grand


