I need to create a visualisation with 4 lines
(I want to see, over time, how many customers are using how many devices)
count(unique customer) having count(unique deviceid) = 2
count(unique customer) having count(unique deviceid) = 3
count(unique customer) having count(unique deviceid) = 4
count(unique customer) having count(unique deviceid) > 4
In the index I have 3 fields:
@timestamp - date
customer - keyword
deviceid - keyword
This is a log from an application; one customer can have several documents with the same deviceid.
This sort of behavioural analysis will be hard to do using a distributed index on a lot of data. It requires a lot of joins based on a customer key, and distributed joins are expensive in any system.
You'll likely need to build an "entity centric" index (each doc = one customer).
The new transform API can help collapse the device IDs for each customer doc using a cardinality aggregation. Once you've built the customer index, you can use a date histogram on it in Kibana and plot the lines using custom ranges for the 4 device ownership ranges you listed.
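To make the transform idea a bit more concrete, here is a minimal sketch of the kind of request body a pivot transform takes, built as a plain Python dict. The index names `app-logs` and `customers`, the output field name `device_count`, and the helper `build_customer_transform` are made up for illustration; only the `customer` and `deviceid` fields come from the question. Treat it as a starting point to check against the transform documentation for your version, not a tested configuration.

```python
import json


def build_customer_transform(source_index: str, dest_index: str) -> dict:
    """Build a pivot transform body: one output doc per customer.

    Index names are hypothetical; field names follow the question.
    """
    return {
        "source": {"index": source_index},
        "pivot": {
            # One bucket (and so one destination doc) per customer.
            "group_by": {
                "customer": {"terms": {"field": "customer"}},
            },
            # Approximate count of distinct device IDs for that customer.
            "aggregations": {
                "device_count": {"cardinality": {"field": "deviceid"}},
            },
        },
        "dest": {"index": dest_index},
    }


if __name__ == "__main__":
    body = build_customer_transform("app-logs", "customers")
    print(json.dumps(body, indent=2))
```

Note that the cardinality aggregation is approximate by design, although for counts as small as a handful of devices per customer the error should be negligible.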
However, the challenge here is the date info. Presumably you want a customer to be able to appear once in the line for February with 2 devices and again in March with 3 devices? That would mean the docs are not customer documents but customer-as-at documents, which summarise the person's device count on a particular date. Building that sort of index will likely require a custom script.
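As a rough illustration of what such a custom script might compute, here is a small in-memory Python sketch. It assumes "as at" means "distinct devices seen within each calendar month", which is only one possible reading, and it assumes the events have already been pulled out of the index as `(timestamp, customer, deviceid)` tuples with ISO-8601 timestamps. The function names are hypothetical, and a real script would need to page through the index rather than hold everything in memory.

```python
from collections import defaultdict
from datetime import datetime


def bucket_label(device_count: int):
    """Map a device count to one of the four lines from the question."""
    if device_count > 4:
        return ">4"
    if device_count >= 2:
        return str(device_count)
    return None  # customers with a single device are not plotted


def device_buckets_per_month(events):
    """Count customers per device-count bucket for each calendar month.

    `events` is an iterable of (iso_timestamp, customer, deviceid) tuples.
    """
    # (month, customer) -> set of distinct device IDs seen in that month
    devices = defaultdict(set)
    for timestamp, customer, deviceid in events:
        month = datetime.fromisoformat(timestamp).strftime("%Y-%m")
        devices[(month, customer)].add(deviceid)

    # month -> bucket label -> number of customers
    counts = defaultdict(lambda: defaultdict(int))
    for (month, _customer), device_ids in devices.items():
        label = bucket_label(len(device_ids))
        if label is not None:
            counts[month][label] += 1
    return {month: dict(buckets) for month, buckets in counts.items()}


if __name__ == "__main__":
    sample = [
        ("2020-02-03T10:00:00", "alice", "d1"),
        ("2020-02-04T11:00:00", "alice", "d1"),  # repeated device, counted once
        ("2020-02-05T12:00:00", "alice", "d2"),
        ("2020-02-06T09:00:00", "bob", "d9"),
        ("2020-03-01T08:00:00", "alice", "d1"),
        ("2020-03-02T08:00:00", "alice", "d2"),
        ("2020-03-03T08:00:00", "alice", "d3"),
    ]
    print(device_buckets_per_month(sample))
```

In the sample data, alice lands in the 2-device line for February and the 3-device line for March, which is the customer-as-at behaviour described above. Each (month, customer) pair corresponds to one document in the summary index.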
Great, I did not know about the transform API. I am looking at the documentation and
this seems to be useful in my case.
I will give it a try. I hope I will be able to create code to transform the data for further analysis.