Sbuqueries and distinct counts

Tim_Uckun · April 9, 2014, 11:59pm

I want to do something like this.

select date_trunc('month', time_stamp), sum(distinct_count) from (
select date_trunc('week', time_stamp) as time_stamp, count(distinct
field_name) as distinct_count
from blah
group by date_trun('week', time_stamp)
)
group by date_trunc('month', time_stamp)

So basically I want to break up the data into weekly chunks and count the
distinct appearances of a value and then sum those up on a per monthly
basis.

In preparation for that I tried to do the subquery for that which looks
like this

aggregations: {
by_month: {
date_histogram: {
field: "time_stamp",
interval: "1M",
format: "yyyy-MM-dd HH:mm"
},
aggregations: {
by_node_mac: {
terms: {
field: "node_mac"
},
aggregations: {
cardinality: {field: 'device_mac'}
}

}

but I seem to be getting the wrong answers. I am using fake data which
should give me very low numbers for the cardinality but it actually seems
to be counting the number of rows not the number of distinct items. The
numbers are outrageously high.

I tried a precision threshold of 1000 and 100 but it seems to make no
difference.

--
You received this message because you are subscribed to the Google Groups "elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email to elasticsearch+unsubscribe@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/elasticsearch/4cafbefc-bf00-49aa-9c7c-2240c4f1fd55%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

Tim_Uckun · April 10, 2014, 10:45pm

I hate to bump myself but does anybody have any input on this at all?

On Thursday, April 10, 2014 11:59:43 AM UTC+12, Tim Uckun wrote:

I want to do something like this.

select date_trunc('month', time_stamp), sum(distinct_count) from (
select date_trunc('week', time_stamp) as time_stamp, count(distinct
field_name) as distinct_count
from blah
group by date_trun('week', time_stamp)
)
group by date_trunc('month', time_stamp)

So basically I want to break up the data into weekly chunks and count the
distinct appearances of a value and then sum those up on a per monthly
basis.

In preparation for that I tried to do the subquery for that which looks
like this

aggregations: {
by_month: {
date_histogram: {
field: "time_stamp",
interval: "1M",
format: "yyyy-MM-dd HH:mm"
},
aggregations: {
by_node_mac: {
terms: {
field: "node_mac"
},
aggregations: {
cardinality: {field: 'device_mac'}
}
                          }
                      }
                  }
              }
}

but I seem to be getting the wrong answers. I am using fake data which
should give me very low numbers for the cardinality but it actually seems
to be counting the number of rows not the number of distinct items. The
numbers are outrageously high.

I tried a precision threshold of 1000 and 100 but it seems to make no
difference.

--
You received this message because you are subscribed to the Google Groups "elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email to elasticsearch+unsubscribe@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/elasticsearch/016b94f6-4b62-4009-83ec-ccef3bfb17f6%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

spinscale · April 21, 2014, 11:42am

Hey,

you may want to provide a gist with some sample data, so people can
follow.. Also you might need to use the value_count aggregation somewhere,
to make sure you are only using the counts, but this really depends on your
data model and the data being indexed, so full gist might help a lot.

--Alex

On Fri, Apr 11, 2014 at 12:45 AM, Tim Uckun timuckun@gmail.com wrote:

I hate to bump myself but does anybody have any input on this at all?

On Thursday, April 10, 2014 11:59:43 AM UTC+12, Tim Uckun wrote:
I want to do something like this.

select date_trunc('month', time_stamp), sum(distinct_count) from (
select date_trunc('week', time_stamp) as time_stamp, count(distinct
field_name) as distinct_count
from blah
group by date_trun('week', time_stamp)
)
group by date_trunc('month', time_stamp)

So basically I want to break up the data into weekly chunks and count the
distinct appearances of a value and then sum those up on a per monthly
basis.

In preparation for that I tried to do the subquery for that which looks
like this

aggregations: {
by_month: {
date_histogram: {
field: "time_stamp",
interval: "1M",
format: "yyyy-MM-dd HH:mm"
},
aggregations: {
by_node_mac: {
terms: {
field: "node_mac"
},
aggregations: {
cardinality: {field: 'device_mac'}
}
                          }
                      }
                  }
              }
}

but I seem to be getting the wrong answers. I am using fake data which
should give me very low numbers for the cardinality but it actually seems
to be counting the number of rows not the number of distinct items. The
numbers are outrageously high.

I tried a precision threshold of 1000 and 100 but it seems to make no
difference.

--
You received this message because you are subscribed to the Google Groups
"elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an
email to elasticsearch+unsubscribe@googlegroups.com.
To view this discussion on the web visit
https://groups.google.com/d/msgid/elasticsearch/016b94f6-4b62-4009-83ec-ccef3bfb17f6%40googlegroups.com https://groups.google.com/d/msgid/elasticsearch/016b94f6-4b62-4009-83ec-ccef3bfb17f6%40googlegroups.com?utm_medium=email&utm_source=footer
.
For more options, visit https://groups.google.com/d/optout.

--
You received this message because you are subscribed to the Google Groups "elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email to elasticsearch+unsubscribe@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/elasticsearch/CAGCwEM8JwkQHOsxUwt6ddH2W-f1nNcttUvGWq0AXZiFmVt9bSQ%40mail.gmail.com.
For more options, visit https://groups.google.com/d/optout.

Topic		Replies	Views
Count distinct by date Elasticsearch	3	833	October 28, 2019
Is it possible to get Count of distinct field value with groupby in ElasticSearch 1.2.1? Elasticsearch	1	378	July 6, 2017
Count distinct value by date Elasticsearch	6	2444	July 6, 2017
Aggregating the distinct result to a grouping Elasticsearch	3	981	July 5, 2017
Simple aggregation counting distinct values that has turned out to be difficult Elasticsearch	1	202	March 15, 2023

Sbuqueries and distinct counts

Related topics