User retention in ES

Hi!
I have documents with a user_id field. When a new user registers, I get a log entry with the message "New user_id created".
Is it possible in ES to calculate how many new users of my store visit it again 7 (or 28) days after registering?

It should be possible with an aggregation. Can you provide a sample document?

@warkolm
New user (this document arrives only once per user, at registration):

"_source": {
           "message": "Authorization request, created new account: login: '678886946f9a80a24f407ade11d21ee8'; user_id: 1113124; name: ''; provider: GooglePlay; build_type: 'atc';",
           "@version": "1",
           "@timestamp": "2015-09-21T15:33:59.420Z",
           "host": "132.91.54.125",
           "server": "PRODUCTION",
           "HostName": "RD000D3AB11C64",
           "thread": "45",
           "level": "DEBUG",
           "user_id": 1113124,
           "event": "create_user",
           "build_type": "atc",
           "provider": "GooglePlay"
        },
        "sort": [
           1442849639420
        ]
     },

And after that I get other documents with user_id field, for example:

"message": "Cash updated: account: 1115836; name: 'han han'; operation: UnlockArmor; delta_cash0: 0; delta_cash1: 0; delta_cash2: 0; delta_cash3: 0; delta_cash4: 0; cash0: 3000; cash1: 1000; cash2: 3; cash3: 5; cash4: 0;",
           "@version": "1",
           "@timestamp": "2015-09-23T00:01:24.886Z",
           "host": "138.91.54.148",
           "server": "PRODUCTION",
           "HostName": "RD000D3AB11C64",
           "thread": "34",
           "level": "DEBUG",
           "user_name": "han han",
           "user_id": 1115836,
           "operation": "UnlockArmor",
           "credits": 3000,
           "parts": 1000,
           "iron_runes": 3,
           "life_runes": 5,
           "boosters": 0,
           "delta_credits": 0,
           "delta_parts": 0,
           "delta_iron_runes": 0,
           "delta_life_runes": 0,
           "delta_boosters": 0,
           "event": "update_cash"

All I want is to count the new users per period of time (day, week, etc.) and to know how many of those users come back to my store within a given later period (day, week, etc.).

This may be a use case that could be well served by creating an entity-centric user index, in which a single document or hierarchy of documents contains information about the user, e.g. creation date and interaction history, and which better supports the type of queries you wish to run.
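To make the idea concrete, here is a minimal sketch of folding the raw log events into one summary document per user. It only uses the field names visible in the sample documents above (user_id, event, @timestamp); the function name build_user_docs and the summary fields created and active_days are made up for illustration, and indexing the result back into Elasticsearch is left out.

```python
from datetime import datetime

def build_user_docs(events):
    """Fold raw log events into one summary dict per user_id (hypothetical layout)."""
    users = {}
    for ev in events:
        # Timestamps in the samples look like "2015-09-21T15:33:59.420Z".
        day = datetime.strptime(ev["@timestamp"], "%Y-%m-%dT%H:%M:%S.%fZ").date()
        doc = users.setdefault(
            ev["user_id"],
            {"user_id": ev["user_id"], "created": None, "active_days": set()},
        )
        # Only the registration event tells us when the account was created.
        if ev["event"] == "create_user":
            doc["created"] = day
        doc["active_days"].add(day)
    # Turn the sets into sorted lists so the documents are stable and serializable.
    for doc in users.values():
        doc["active_days"] = sorted(doc["active_days"])
    return users

# Two events for one user, trimmed down to the fields the sketch needs.
events = [
    {"user_id": 1113124, "event": "create_user", "@timestamp": "2015-09-21T15:33:59.420Z"},
    {"user_id": 1113124, "event": "update_cash", "@timestamp": "2015-09-23T00:01:24.886Z"},
]
user_docs = build_user_docs(events)
```

Each resulting document carries the creation date and the days the user was active, which is the information a retention query needs in one place.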

@Christian_Dahlqvist
So without an entity-centric index I can't do this?

Although I was not able to think of any easy way to do it based on my understanding of your requirements, I would not rule out that there are other solutions.

Hi, have you solved this problem, and if so, how?
My team has run into the same situation.

No, I haven't!
But if you are interested in analytics, I would recommend the R language with RStudio. There is an elastic package for it (for working with raw data).

Do you mean that RStudio can solve this problem?

Hey, have you guys figured out how to do retention in Elasticsearch? I am also stuck on the same problem.

This isn't an elasticsearch problem, it's essentially physics.

If you are trying to do analysis of user behaviours it's easiest to do on a user-centric store where all related data is consolidated in one place. The further apart you spread the related data the more costly life becomes.
Separating related user data on the same disk (as often happens when data is received in time-series) requires lots of disk accesses and/or RAM to link these events together as part of querying.
Separating related data by distributing it across multiple machines incurs the costs of streaming data over slow networks to link information.

On systems with large volumes of users, each generating many log events, these physical linking costs become unbearable at query time. We can't make disks and networks faster, nor can we make RAM cheaper. We have to use techniques like entity-centric indexing to keep the query costs contained.
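As a rough illustration of why the consolidated form is cheap to query, here is a sketch of a retention count over per-user documents. The document layout (created, active_days) and the function name retention are assumptions for this example, not an Elasticsearch API, and "returned" is taken to mean "was active at least min_days after creation", which is one possible reading of the original question.

```python
from datetime import date, timedelta

def retention(user_docs, cohort_start, cohort_end, min_days):
    """Return (new_users, returned) for users created in [cohort_start, cohort_end)."""
    cohort = [u for u in user_docs if cohort_start <= u["created"] < cohort_end]
    # A user counts as returned if any active day is at least min_days after creation.
    returned = [
        u for u in cohort
        if any(d - u["created"] >= timedelta(days=min_days) for d in u["active_days"])
    ]
    return len(cohort), len(returned)

# Made-up per-user documents, one per user, in the assumed layout.
user_docs = [
    {"user_id": 1, "created": date(2015, 9, 21),
     "active_days": [date(2015, 9, 21), date(2015, 9, 29)]},
    {"user_id": 2, "created": date(2015, 9, 21),
     "active_days": [date(2015, 9, 21), date(2015, 9, 23)]},
    {"user_id": 3, "created": date(2015, 10, 5),
     "active_days": [date(2015, 10, 5), date(2015, 10, 20)]},
]

# Cohort: users created in the week starting 2015-09-21, 7-day retention.
new_users, returned = retention(user_docs, date(2015, 9, 21), date(2015, 9, 28), 7)
```

Every user is decided by looking at a single document, so no events have to be joined across disks or machines at query time.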

Is there a tool that could be useful?

The scripts and data from the talk on building entity-centric indexes are here