Hi all,
What would be the best way to query Elasticsearch in order to implement a date histogram representing the total number of unique visitors metric?
Considering the following data:
PUT /events
{
"mappings" : {
"_doc" : {
"properties" : {
"userId" : { "type" : "keyword" },
"eventDate" : { "type" : "date" }
}
}
}
}
POST /events/_bulk
{ "index" : { "_index" : "events", "_type" : "_doc", "_id" : "1" } }
{"userId": "1","eventDate": "2019-03-04T13:40:18.514Z"}
{ "index" : { "_index" : "events", "_type" : "_doc", "_id" : "2" } }
{"userId": "2","eventDate": "2019-03-04T13:46:18.514Z"}
{ "index" : { "_index" : "events", "_type" : "_doc", "_id" : "3" } }
{"userId": "3","eventDate": "2019-03-04T13:50:18.514Z"}
{ "index" : { "_index" : "events", "_type" : "_doc", "_id" : "4" } }
{"userId": "1","eventDate": "2019-03-05T13:46:18.514Z"}
{ "index" : { "_index" : "events", "_type" : "_doc", "_id" : "5" } }
{"userId": "4","eventDate": "2019-03-05T13:46:18.514Z"}
{ "index" : { "_index" : "events", "_type" : "_doc", "_id" : "6" } }
Now, if I query the cardinality of the userId field I get the 4 distinct visitors.
POST /events/_search
{
"size": 0,
"aggs": {
"visitors": {
"cardinality": {
"field": "userId"
}
}
}
}
However, distributing the documents over a date histogram, I get a total sum of 5 because there's a repeated userId in both buckets.
POST /events/_search
{
"size": 0,
"aggs": {
"visits_over_time": {
"date_histogram": {
"field": "eventDate",
"interval": "1d"
},
"aggs": {
"visitors": {
"cardinality": {
"field": "userId"
}
}
}
}
}
}
Is there a way to filter out those repeated values? What would be the best way to accomplish this?
Thanks a lot in advance!