Count distinct value by date

I use ElasticSearch for statistical purposes and have recently switched from MySQL to ElasticSearch. My table looks as follows:

datetime | unique_identifier | some more fields...

2013-05-01 | abc | ...
2013-05-01 | cde | ...
2013-05-01 | abc | ...
2013-05-01 | abc | ...
2013-05-01 | cde | ...
2013-05-01 | cde | ...
2013-05-01 | abc | ...
2013-05-01 | xyz | ...
2013-05-01 | abc | ...
2013-05-02 | abc | ...
2013-05-02 | cde | ...
2013-05-02 | abc | ...
2013-05-02 | abc | ...
2013-05-02 | cde | ...
2013-05-03 | cde | ...
2013-05-03 | abc | ...
2013-05-03 | xyz | ...
2013-05-03 | abc | ...
2013-05-04 | abc | ...

Now I would like to have listed who many different unique_identifier per day are in the table. Taking the above table as an example, the result would look like this:

2013-05-01 | 3
2013-05-02 | 2
2013-05-03 | 3
2013-05-04 | 1

For this I have always used the following MySQL query:
SELECT DATE (datetime), count (distinct unique_identifier)
FROM tablenname
GROUP BY DATE(datetime);

Unfortunately I could not find the right companion piece to it in ElasticSearch.
Can someone give me a hint?

Thank you very much
Sebastian

Hello,

You must use date histogram facet :
http://www.elasticsearch.org/guide/reference/api/search/facets/date-histogram-facet/

{
"query" : {
"match_all" : {}
},
"facets" : {
"histo1" : {
"date_histogram" : {
"field" : "datetime",
"interval" : "day"
}
}
}
}

You can set other interval, by hour, month, week, etc.

Le lundi 10 juin 2013 15:47:36 UTC+2, shammes a écrit :

I use ElasticSearch for statistical purposes and have recently switched
from
MySQL to ElasticSearch. My table looks as follows:

datetime | unique_identifier | some more fields...

2013-05-01 | abc | ...
2013-05-01 | cde | ...
2013-05-01 | abc | ...
2013-05-01 | abc | ...
2013-05-01 | cde | ...
2013-05-01 | cde | ...
2013-05-01 | abc | ...
2013-05-01 | xyz | ...
2013-05-01 | abc | ...
2013-05-02 | abc | ...
2013-05-02 | cde | ...
2013-05-02 | abc | ...
2013-05-02 | abc | ...
2013-05-02 | cde | ...
2013-05-03 | cde | ...
2013-05-03 | abc | ...
2013-05-03 | xyz | ...
2013-05-03 | abc | ...
2013-05-04 | abc | ...

Now I would like to have listed who many different unique_identifier per
day
are in the table. Taking the above table as an example, the result would
look like this:

2013-05-01 | 3
2013-05-02 | 2
2013-05-03 | 3
2013-05-04 | 1

For this I have always used the following MySQL query:
SELECT DATE (datetime), count (distinct unique_identifier)
FROM tablenname
GROUP BY DATE(datetime);

Unfortunately I could not find the right companion piece to it in
ElasticSearch.
Can someone give me a hint?

Thank you very much
Sebastian

--
View this message in context:
http://elasticsearch-users.115913.n3.nabble.com/Count-distinct-value-by-date-tp4036320.html
Sent from the ElasticSearch Users mailing list archive at Nabble.com.

--
You received this message because you are subscribed to the Google Groups "elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email to elasticsearch+unsubscribe@googlegroups.com.
For more options, visit https://groups.google.com/groups/opt_out.

Hi,

thanks for your reply.
I know the date_histogram-facet, but this only counts (for example per day) the number of entries or when you set the "value_field" the numeric value of this field.

But i need a distinct count-value. In my example i need the total, how many different "unique_identifier" per day exists.

Your code snippet would have following result:
2013-05-01 | 9
2013-05-02 | 5
2013-05-03 | 4
2013-05-04 | 1

The result i am looking for should look like:
2013-05-01 | 3
2013-05-02 | 2
2013-05-03 | 3
2013-05-04 | 1

Do you have any other hint for me?

Oh OK sorry, I think I have the same problem.
I'm interesting by the reply.

Le mardi 11 juin 2013 16:03:59 UTC+2, shammes a écrit :

Hi,

thanks for your reply.
I know the date_histogram-facet, but this only counts (for example per
day)
the number of entries or when you set the "value_field" the numeric value
of
this field.

But i need a distinct count-value. In my example i need the total, how
many
different "unique_identifier" per day exists.

Your code snippet would have following result:
2013-05-01 | 9
2013-05-02 | 5
2013-05-03 | 4
2013-05-04 | 1

The result i am looking for should look like:
2013-05-01 | 3
2013-05-02 | 2
2013-05-03 | 3
2013-05-04 | 1

Do you have any other hint for me?

--
View this message in context:
http://elasticsearch-users.115913.n3.nabble.com/Count-distinct-value-by-date-tp4036320p4036361.html
Sent from the ElasticSearch Users mailing list archive at Nabble.com.

--
You received this message because you are subscribed to the Google Groups "elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email to elasticsearch+unsubscribe@googlegroups.com.
For more options, visit https://groups.google.com/groups/opt_out.

Hi Shammes,

In 1.0 there might be some changes to the facet system that allows to nest
facets. I think


might be something to look in to, however, the plugin is not compatible
with 0.90, so it might be difficult to get it to work.

For now I don't see how to do this, but maybe Boaz can explain it better?

Jaap

On Tuesday, June 11, 2013 5:38:38 PM UTC+2, Rémy Turpin wrote:

Oh OK sorry, I think I have the same problem.
I'm interesting by the reply.

Le mardi 11 juin 2013 16:03:59 UTC+2, shammes a écrit :

Hi,

thanks for your reply.
I know the date_histogram-facet, but this only counts (for example per
day)
the number of entries or when you set the "value_field" the numeric value
of
this field.

But i need a distinct count-value. In my example i need the total, how
many
different "unique_identifier" per day exists.

Your code snippet would have following result:
2013-05-01 | 9
2013-05-02 | 5
2013-05-03 | 4
2013-05-04 | 1

The result i am looking for should look like:
2013-05-01 | 3
2013-05-02 | 2
2013-05-03 | 3
2013-05-04 | 1

Do you have any other hint for me?

--
View this message in context:
http://elasticsearch-users.115913.n3.nabble.com/Count-distinct-value-by-date-tp4036320p4036361.html
Sent from the ElasticSearch Users mailing list archive at Nabble.com.

--
You received this message because you are subscribed to the Google Groups "elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email to elasticsearch+unsubscribe@googlegroups.com.
For more options, visit https://groups.google.com/groups/opt_out.

Hi Shammes, Remy, Jaap,

You could indeed use the faceted-date-histogram with an inlined term facet
(with size=0 or 1) to get that information. The faceted dated histogram
allows you to first group posting by date and then apply an arbitrary facet
to every group.

As Jaap already pointed out, the plugin is not compatible with version
0.90.0 and up of Elasticsearch. Version 0.90.0 came with a complete
re-write of the internal in memory data structures that drive the faceting
engine. That rewrite delivered a tremendous amount of memory savings so I
highly recommend using it. Sadly, it also rendered my plugin to be
incompatible and I simply didn't have the time to re-write things. If
enough people need it, I might find some time to do it and make a 0.90.X
compatible version (dropping other features like the hashed terms facet,
which is simply incompatible). Of course, pull requests are welcome :slight_smile:

About the 1.0 version of ES - we are currently working on a new powerful
faceting engine that will allow to do this and much more by allow to nest
facets. It will take a couple of month, though, before it's ready.

Cheers,
Boaz

On Tuesday, June 11, 2013 11:01:35 PM UTC+2, Jaap Taal wrote:

Hi Shammes,

In 1.0 there might be some changes to the facet system that allows to nest
facets. I think
https://github.com/bleskes/elasticfacets#faceted-date-histogram
might be something to look in to, however, the plugin is not compatible
with 0.90, so it might be difficult to get it to work.

For now I don't see how to do this, but maybe Boaz can explain it better?

Jaap

On Tuesday, June 11, 2013 5:38:38 PM UTC+2, Rémy Turpin wrote:

Oh OK sorry, I think I have the same problem.
I'm interesting by the reply.

Le mardi 11 juin 2013 16:03:59 UTC+2, shammes a écrit :

Hi,

thanks for your reply.
I know the date_histogram-facet, but this only counts (for example per
day)
the number of entries or when you set the "value_field" the numeric
value of
this field.

But i need a distinct count-value. In my example i need the total, how
many
different "unique_identifier" per day exists.

Your code snippet would have following result:
2013-05-01 | 9
2013-05-02 | 5
2013-05-03 | 4
2013-05-04 | 1

The result i am looking for should look like:
2013-05-01 | 3
2013-05-02 | 2
2013-05-03 | 3
2013-05-04 | 1

Do you have any other hint for me?

--
View this message in context:
http://elasticsearch-users.115913.n3.nabble.com/Count-distinct-value-by-date-tp4036320p4036361.html
Sent from the ElasticSearch Users mailing list archive at Nabble.com.

--
You received this message because you are subscribed to the Google Groups "elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email to elasticsearch+unsubscribe@googlegroups.com.
For more options, visit https://groups.google.com/groups/opt_out.