User retention in ES


(Sergey) #1

Hi!
I have documents with a user_id field. When a new user arrives, I get a log entry with "New user_id created".
Is it possible in ES to calculate how many new users of my store visit it again 7 (or 28) days later?


(Mark Walkom) #2

It should be possible with an aggregation. Can you provide a sample document?


(Sergey) #3

@warkolm
New user (this document arrives only once per user, at registration):

"_source": {
           "message": "Authorization request, created new account: login: '678886946f9a80a24f407ade11d21ee8'; user_id: 1113124; name: ''; provider: GooglePlay; build_type: 'atc';",
           "@version": "1",
           "@timestamp": "2015-09-21T15:33:59.420Z",
           "host": "132.91.54.125",
           "server": "PRODUCTION",
           "HostName": "RD000D3AB11C64",
           "thread": "45",
           "level": "DEBUG",
           "user_id": 1113124,
           "event": "create_user",
           "build_type": "atc",
           "provider": "GooglePlay"
        },
        "sort": [
           1442849639420
        ]
     },

After that I get other documents with a user_id field, for example:

"message": "Cash updated: account: 1115836; name: 'han han'; operation: UnlockArmor; delta_cash0: 0; delta_cash1: 0; delta_cash2: 0; delta_cash3: 0; delta_cash4: 0; cash0: 3000; cash1: 1000; cash2: 3; cash3: 5; cash4: 0;",
           "@version": "1",
           "@timestamp": "2015-09-23T00:01:24.886Z",
           "host": "138.91.54.148",
           "server": "PRODUCTION",
           "HostName": "RD000D3AB11C64",
           "thread": "34",
           "level": "DEBUG",
           "user_name": "han han",
           "user_id": 1115836,
           "operation": "UnlockArmor",
           "credits": 3000,
           "parts": 1000,
           "iron_runes": 3,
           "life_runes": 5,
           "boosters": 0,
           "delta_credits": 0,
           "delta_parts": 0,
           "delta_iron_runes": 0,
           "delta_life_runes": 0,
           "delta_boosters": 0,
           "event": "update_cash"

All I want is to count the new users per period of time (day, week, etc.) and to know how many of those users come back to my store within a certain period of time (day, week, etc.).
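To make the requirement concrete, here is a minimal sketch in plain Python over events already loaded into memory. It is not an Elasticsearch query. It assumes that "returned" means the user has at least one non-create event `days` or more days after the create_user event, which is only one possible reading of the question. The function names `parse_ts` and `retention` are hypothetical.

```python
from collections import defaultdict
from datetime import datetime, timedelta


def parse_ts(ts):
    # Timestamps in the sample documents look like "2015-09-21T15:33:59.420Z".
    return datetime.strptime(ts, "%Y-%m-%dT%H:%M:%S.%fZ")


def retention(events, days=7):
    """Return {signup_date: (new_users, returned_users)}.

    Assumed definition: a user has returned if any event other than
    create_user occurs `days` or more days after their signup.
    """
    created = {}                   # user_id -> signup datetime
    activity = defaultdict(list)   # user_id -> datetimes of later events
    for e in events:
        ts = parse_ts(e["@timestamp"])
        if e["event"] == "create_user":
            created[e["user_id"]] = ts
        else:
            activity[e["user_id"]].append(ts)

    cohorts = defaultdict(lambda: [0, 0])
    for uid, signup in created.items():
        day = signup.date()
        cohorts[day][0] += 1       # one more new user in this daily cohort
        if any(t - signup >= timedelta(days=days) for t in activity[uid]):
            cohorts[day][1] += 1   # this user came back late enough
    return {d: tuple(c) for d, c in cohorts.items()}
```

Note that the second loop needs every event of a user side by side with that user's signup event. That per-user join is the expensive part when the events live as separate time-series documents.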


(Christian Dahlqvist) #4

This may be a use case that would be well served by creating an entity-centric user index, in which a single document (or hierarchy of documents) contains the information about each user, e.g. creation date and interaction history. Such an index better supports the type of queries you wish to run.
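As a rough illustration of the idea, the sketch below folds raw log events into one summary document per user. It is plain Python and only shows the shape of the transformation, not how the documents would be written to Elasticsearch. The fields `created`, `last_seen`, `event_count` and `retained_days` are hypothetical names chosen for this example; only `user_id`, `event` and `@timestamp` come from the sample documents.

```python
from datetime import datetime

FMT = "%Y-%m-%dT%H:%M:%S.%fZ"  # format of @timestamp in the sample documents


def build_user_docs(events):
    """Fold log events into one summary document per user_id."""
    users = {}
    # ISO-8601 UTC strings in a fixed-width format sort chronologically.
    for e in sorted(events, key=lambda e: e["@timestamp"]):
        doc = users.setdefault(e["user_id"], {
            "user_id": e["user_id"],
            "created": None,     # hypothetical: signup timestamp
            "last_seen": None,   # hypothetical: timestamp of latest event
            "event_count": 0,    # hypothetical: number of events seen
        })
        if e["event"] == "create_user":
            doc["created"] = e["@timestamp"]
        doc["last_seen"] = e["@timestamp"]
        doc["event_count"] += 1

    for doc in users.values():
        if doc["created"] is not None:
            delta = (datetime.strptime(doc["last_seen"], FMT)
                     - datetime.strptime(doc["created"], FMT))
            # hypothetical: whole days between signup and latest activity
            doc["retained_days"] = delta.days
    return users
```

With documents like these, "did this user come back after N days" becomes a condition on a single document instead of a join across many log entries.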


(Sergey) #5

@Christian_Dahlqvist
So without an entity-centric index I can't do this?


(Christian Dahlqvist) #6

Although I was not able to think of any easy way to do it based on my understanding of your requirements, I would not rule out that there are other solutions.


(Nick Li) #7

Hi, have you solved this problem, and if so, how?
My team has run into the same situation.


(Sergey) #8

No, I haven't!
But if you are interested in analytics, I would recommend the R language with RStudio. They have an elastic package (for working with raw data).


(Nick Li) #9

Do you mean that RStudio can solve this problem?


(Raghvendra Singh) #10

Hey, have you guys figured out how to do retention in Elasticsearch? I am also stuck on the same problem.


(Mark Harwood) #11

This isn't an Elasticsearch problem, it's essentially physics.

If you are trying to do analysis of user behaviours it's easiest to do on a user-centric store where all related data is consolidated in one place. The further apart you spread the related data the more costly life becomes.
Separating related user data on the same disk (as often happens when data is received in time-series) requires lots of disk accesses and/or RAM to link these events together as part of querying.
Separating related data by distributing it across multiple machines incurs the costs of streaming data over slow networks to link information.

On systems with large volumes of users, each generating many log events, these physical linking costs become unbearable at query time. We can't make disks and networks faster, nor can we make RAM cheaper. We have to use techniques like entity-centric indexing to keep the query costs contained.
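For illustration only, here is how a retention question might look once such an index exists. The sketch builds a search request body as a Python dict. It assumes one document per user with a date field `created` and a numeric field `retained_days` (days between signup and the latest activity); both field names are hypothetical and `retention_query` is a made-up helper. It uses `calendar_interval`, which is the parameter name on recent Elasticsearch versions; older releases used `interval` instead.

```python
def retention_query(min_days=7, interval="day"):
    """Build a search body for an assumed entity-centric user index."""
    return {
        "size": 0,  # only the aggregation results are needed, not the hits
        "aggs": {
            # One bucket per signup period; doc_count is the number of new users.
            "cohorts": {
                "date_histogram": {
                    "field": "created",            # hypothetical field
                    "calendar_interval": interval,
                },
                "aggs": {
                    # Users in the bucket still active min_days after signup.
                    "returned": {
                        "filter": {
                            "range": {
                                "retained_days": {"gte": min_days}  # hypothetical field
                            }
                        }
                    }
                },
            }
        },
    }
```

Each bucket in the response would then carry the cohort size and the count of returned users, so no cross-document linking is left for query time. The linking cost has been paid once, when the user documents were built.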


(Raghvendra Singh) #12

Is there a tool that could be useful?


(Mark Harwood) #13

The scripts and data from the talk on building entity-centric indexes are here