This is a general question. I find that not all metrics can be computed from a single index in Elasticsearch. Case in point: suppose we have a dataset like the following
(ts = login timestamp, a long integer)
(email = email of the user who logged in)
email_1, ts1
email_2, ts2
email_3, ts3
email_1, ts2
email_4, ts5
email_2, ts6
email_1, ts7
and so on....
A metric like the daily count of unique NEW users is easy using a Kibana transform: group by email, aggregate min(ts), and build a daily histogram on the resulting index (a sketch of what I mean is below).
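Roughly, the transform looks like this. It is a minimal sketch using Python's requests library against a local cluster, assuming Elasticsearch 7.2+ where the _transform API exists; the index names logins and user_first_login, the transform id, and the host URL are placeholders:

```python
import requests

ES = "http://localhost:9200"  # assumed local cluster

# Pivot the raw login events down to one document per user holding their first login timestamp.
transform_body = {
    "source": {"index": "logins"},            # assumed raw login index
    "dest": {"index": "user_first_login"},    # assumed destination index
    "pivot": {
        "group_by": {"email": {"terms": {"field": "email"}}},
        "aggregations": {"first_ts": {"min": {"field": "ts"}}},
    },
}

requests.put(f"{ES}/_transform/first_login_per_user", json=transform_body).raise_for_status()
requests.post(f"{ES}/_transform/first_login_per_user/_start").raise_for_status()
```

The daily new-user count is then just a histogram over first_ts in user_first_login (if ts is a plain long, you may need to map first_ts as a date in the destination index, or use a numeric histogram with a one-day interval).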
However, a metric like the daily count of unique RETURNING users is much tougher. I have not been able to compute it from the index holding the dataset above. So what I do is crunch the data on the backend to figure out the daily returning users, build another dataset from that, and bulk-load it into a new index in Elasticsearch. The backend crunch is roughly the sketch below.
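A minimal sketch of that crunch, assuming pandas and a DataFrame df with one row per login and columns email and ts (epoch milliseconds); it counts, for each day, the distinct users whose first-ever login was on an earlier day:

```python
import pandas as pd

# df: one row per login, columns "email" and "ts" (epoch milliseconds) -- assumed layout
df["day"] = pd.to_datetime(df["ts"], unit="ms").dt.floor("D")

# Each user's first login day, broadcast back onto every row for that user.
first_day = df.groupby("email")["day"].transform("min")

# Keep only logins that happened on a later day than the user's first login.
returning = df[df["day"] > first_day]

# Distinct returning users per day -- this is the dataset that gets bulk-loaded.
daily_return_users = (
    returning.groupby("day")["email"].nunique().rename("return_users").reset_index()
)
```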
Another tricky metric is the average time between repeated logins for each user. From the dataset above it works out to (see the sketch after this example):
email_1, ((ts2 - ts1) + (ts7 - ts2))/2
email_2, (ts6 - ts2) / 1
email_3, N/A (has not logged in again yet)
email_4, N/A (has not logged in again yet)
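A minimal pandas sketch of that calculation, under the same assumed DataFrame layout as above (columns email and ts); the per-user mean comes out NaN when there is no repeat login yet, matching the N/A rows:

```python
import pandas as pd

# df: one row per login, columns "email" and "ts" (epoch milliseconds) -- assumed layout
df = df.sort_values(["email", "ts"])

# Gap (in ms) between each login and the same user's previous login; NaN for the first login.
gaps = df.groupby("email")["ts"].diff()

# Average gap per user; users with a single login stay NaN ("N/A" above).
avg_gap_ms = gaps.groupby(df["email"]).mean()
```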
For now my flow is the following:
Step 1: Use Apache Drill to wrangle the data into a CSV.
Step 2: Use a Python script to turn the CSV into JSON documents.
Step 3: Bulk-load the JSON documents into a new index in Elasticsearch (a sketch of Steps 2 and 3 is below).
Step 4: Use this new index to build the visualizations in Kibana.
A cron job (or some other scheduler) periodically re-runs Steps 1, 2 and 3.
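Steps 2 and 3 combined look roughly like the following. A minimal sketch using the official elasticsearch Python client and its bulk helper; the CSV path, column names and index name match the returning-users dataset above and are assumptions:

```python
import csv
from elasticsearch import Elasticsearch
from elasticsearch.helpers import bulk

es = Elasticsearch("http://localhost:9200")  # assumed local cluster

def actions(csv_path, index_name):
    # One bulk action per CSV row; column names match the crunched dataset above (assumed).
    with open(csv_path, newline="") as f:
        for row in csv.DictReader(f):
            yield {
                "_index": index_name,
                "_source": {"day": row["day"], "return_users": int(row["return_users"])},
            }

bulk(es, actions("daily_return_users.csv", "daily_return_users"))
```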
So my question is: how do you all implement complex metrics such as these?
Thanks