Elasticsearch - Cardinality of Visitors group by day, But visitors be taken uniquely in the queried date range


(Dhineshkumar) #1

We are trying to get the cardinality of visitors based on a time range, group by day. But the point to note is, the visitor should be taken unique in the complete queried date range
and then it should be grouped by day.

For example..

Our document samples,

   {visitorid:1,time:November 3rd 2016}
   {visitorid:2,time:November 3rd 2016}
   {visitorid:1,time:November 4th 2016}
   {visitorid:3,time:November 4th 2016}

Our result count (for the date range Nov 3 to Nov 4) should be like,

   Nov 3 - 2 visitors (visitorid 1,2)<br>
   Nov 4 - 1 visitors (visitorid 3, since visitorid 1 has already taken for Nov 3)

We have tried the following aggregation,

"aggs": {
    "articles_over_time": {
      "date_histogram": {
        "field": "time",
        "interval": "day"
      },
      "aggs": {
        "distinct_visitors": {
          "cardinality": {
            "field": "visitorid"
          }
        }
      }
    }
  }

But it provides, the result as,

    Nov 3 - 2 visitors
    Nov 4 - 2 visitors

Since it does the group by first and then does the cardinality.

Please share us some solutions or alternative ideas to approach this.


(Mark Harwood) #2

Is it safe to assume the time range you are querying is long eg a year or a month not the 2 days in your example?
I'm guessing this measure is your way of tracking rate of new customer acquisition?
If so that might be better handled using an entity centric index on customer with a "first visit" property (last visit would be interesting to record too for customer retention analysis)


(Dhineshkumar) #3

Thanks for the reply, Harwood.

My usecase is not exactly like a new customer acquisition tracking.
Mine will be like representing a customer visit tracking within a date range.

Lets take this scenario, a customer have visited an e-commerce site in the following days, November 2nd,5th,8th.
I have to track the first visit of the customer in a given date range.
For example,if a date range given is Nov 1st to Nov 10th, then I have to consider his visit day as on Nov 2nd but not the 5th or 8th. His contribution count will be added only for Nov 2nd and not for 5th or 8th.
If the date range given is Nov 5th to Nov 10th, then I have to take his visit day as Nov 5th and not the 8th. His contribution count will be added only for Nov 5th and not for 8th.

So my requirement is to provide the contribution count of visitors in a day basis, for a given date range.

Hope, the above explains my real usecase better.


(system) #4