Compare aggregation result in Elastic Search to find repetitive users

sa35t · July 14, 2016, 5:32am

Hi,

Basically I want to do something like this http://stackoverflow.com/questions/36711667/comparing-data-in-kibana

Currently I can find unique users within a date range, but how do I compare them with my whole corpus to find how many of them are new and how many are repetitive? This is my current query to find unique users within a time range:

{
  "from": 0,
  "size": 0,
  "query": {
    "filtered": {
      "query": {
        "query_string": {
          "analyze_wildcard": true,
          "query": "*"
        }
      },
      "filter": {
        "bool": {
          "must": [
            {
              "range": {
                "date_time": {
                  "lte": 1468348199000,
                  "format": "epoch_millis",
                  "gte": 1468261800000
                }
              }
            }
          ],
          "must_not": []
        }
      }
    }
  },
  "aggs": {
    "cardinality_device_id": {
      "terms": {
        "field": "device_id"
      }
    }
  },
  "fields": [
    "*",
    "_source"
  ]
}

Any help will be appreciated. Thanks.

johtani · August 9, 2016, 10:13am

Hi,

I think it is hard to calculate this in Elasticsearch alone.
Run the terms aggregation twice: once over the whole corpus and once over the single day.
Then compare the two results in a programming language you are familiar with.
That is the easy way to do this.
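
A minimal sketch of that comparison in Python, assuming both responses come from the same terms aggregation on `device_id` (named `cardinality_device_id` as in the query above) and that the aggregation `size` was raised enough to return every device (the terms aggregation returns only the top 10 buckets by default). The function names are hypothetical, and "repetitive" is taken here to mean "has documents outside the day", which is an assumption about the requirement.

```python
def doc_counts(response, agg_name):
    """Map each terms-aggregation bucket key to its doc_count."""
    buckets = response["aggregations"][agg_name]["buckets"]
    return {b["key"]: b["doc_count"] for b in buckets}


def split_new_and_repeat(total_resp, day_resp, agg_name="cardinality_device_id"):
    """Split the devices seen in the day into new and repetitive ones.

    A device counts as repetitive when the whole corpus holds more
    documents for it than the day alone (assumed definition).
    """
    total = doc_counts(total_resp, agg_name)
    day = doc_counts(day_resp, agg_name)
    new, repeat = set(), set()
    for device_id, day_count in day.items():
        # Fall back to the day count if the device is missing from the total.
        if total.get(device_id, day_count) > day_count:
            repeat.add(device_id)
        else:
            new.add(device_id)
    return new, repeat
```

Both aggregation results have to fit in memory on the client for this to work, so it suits small to medium numbers of distinct devices.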

sa35t · August 14, 2016, 8:39am

What if the data set is too large?

Christian_Dahlqvist · August 14, 2016, 9:27am

One way to perform this type of user-centric analysis is to create a separate entity-centric index. This allows you to spread out the computation and prepare the data over time rather than do it all at query time, which can be expensive and complicated. If designed correctly, it should also be possible to use this entity-centric index directly in Kibana, and as it will contain summarised and aggregated information it will generally perform and scale quite well.
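
As a rough illustration of what "entity-centric" could mean here, the sketch below folds raw events into one summary document per device. The `device_id` and `date_time` fields come from the query in the question; `first_seen`, `last_seen`, `event_count` and the function name `fold_events` are hypothetical names chosen for this example, and writing the resulting documents back to Elasticsearch is left out.

```python
def fold_events(events, profiles=None):
    """Fold raw events into one summary document per device_id.

    Pass the profiles from a previous batch to update them incrementally,
    so the work is spread over time instead of done at query time.
    """
    profiles = {} if profiles is None else profiles
    for event in events:
        device_id = event["device_id"]
        ts = event["date_time"]  # epoch milliseconds, as in the original query
        profile = profiles.get(device_id)
        if profile is None:
            profiles[device_id] = {
                "device_id": device_id,
                "first_seen": ts,
                "last_seen": ts,
                "event_count": 1,
            }
        else:
            profile["first_seen"] = min(profile["first_seen"], ts)
            profile["last_seen"] = max(profile["last_seen"], ts)
            profile["event_count"] += 1
    return profiles
```

Each value in the returned dictionary could then be indexed with the device ID as the document ID, so that later batches overwrite the same document.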

taras · August 14, 2016, 7:43pm

It depends on the scale of your problem. Let's say you're talking RTB scale; then entity-centric indexes and some batch processing are your main options.

For processing billions of signals we have the following rough breakdown:

Trail Collection (Audit log) with raw data and minimal indexed fields
CurrentProfile - a sliding time window index with verbose aggregation of pretty much everything we may care about
DeviceIdMapping - ID to ID mapping. Gives you cheap existence check among other things.
Profile - That is your longer lived

If you have your profile object index, then you can query and aggregate by creation timestamp, last action timestamp, or whatever makes sense to the app.
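
For example, a query against such a profile index might look roughly like the sketch below, which builds the request body as a Python dictionary. It assumes hypothetical `first_seen` and `last_seen` fields (creation and last action timestamps in epoch milliseconds) and an Elasticsearch version that supports `bool` with `filter`; it is not taken from our production setup.

```python
def new_vs_returning_query(gte_ms, lte_ms):
    """Build a request body that counts new and returning profiles.

    Profiles whose last action falls inside the window are split by
    whether they were created inside the window (new) or before it
    (returning), using a filters aggregation.
    """
    def time_range(field, **bounds):
        return {"range": {field: dict(bounds, format="epoch_millis")}}

    return {
        "size": 0,  # only the aggregation counts are needed
        "query": {
            "bool": {
                "filter": [time_range("last_seen", gte=gte_ms, lte=lte_ms)]
            }
        },
        "aggs": {
            "user_type": {
                "filters": {
                    "filters": {
                        "new": time_range("first_seen", gte=gte_ms),
                        "returning": time_range("first_seen", lt=gte_ms),
                    }
                }
            }
        },
    }


# Window taken from the query in the original question.
body = new_vs_returning_query(1468261800000, 1468348199000)
```

Filtering on the last action timestamp only works for the most recent window; for older windows a profile that was active later would be missed, so a per-day activity field would be needed instead.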
