Complicated aggregation query in period interval

We want to save historical information on a specific entity type that our system creates and modifies. This data will be used to run analytics and show charts of statistics over time.
For simplicity let’s say it’s a User object which has various attributes (title, name, birthdate, role, and weight).

The historical data is stored in Elasticsearch: every change to the user object creates a new document containing the user's attribute values plus EffectiveStartDate and EffectiveEndDate, which delimit the period during which those values were correct for this user. This means that for each user object, at any point in time, there should be exactly one document with an empty EffectiveEndDate.

The analytics and statistics we compute are at a resolution of days, and we are looking for a way to run a script at query time that duplicates each document once per day in the interval [EffectiveStartDate, EffectiveEndDate].
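To make the per-day duplication concrete, here is a minimal Python sketch of the expansion done client-side, outside Elasticsearch. The function name `days_covered` and the `today` parameter (used as the cut-off for a document whose EffectiveEndDate is empty) are assumptions made for illustration; they are not part of our system or of any Elasticsearch API.

```python
from datetime import date, datetime, timedelta


def days_covered(start, end, today):
    """Return every calendar day touched by the interval [start, end].

    `start` and `end` are datetimes; `end` may be None for the currently
    valid document, in which case the interval runs up to `today`
    (a hypothetical cut-off chosen by the caller).
    """
    last = end.date() if end is not None else today
    day = start.date()
    days = []
    while day <= last:
        days.append(day)
        day += timedelta(days=1)
    return days


if __name__ == "__main__":
    closed = days_covered(
        datetime(2019, 10, 10, 11), datetime(2019, 10, 12, 11), date(2019, 10, 16)
    )
    print([d.isoformat() for d in closed])
```

With this sketch, a document effective from 10 October 11:00 to 12 October 11:00 is counted on three days (10, 11 and 12 October), and an open-ended document is counted on every day up to the chosen cut-off.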

Question:
Suppose the user's weight attribute can change multiple times a day. How can we run a query that only takes into account the last weight the user had on each day, so that we can build an aggregation such as:
“For a time period (week/month/year), query and aggregate (average, max, min) the users' weights at the end of each day”?

Hi @sevatal,

If I understand correctly, you have a user who posts his weight several times a day, and a new doc is created on each update:

{
    "user": "Tom",
    "date_time": "2019-10-10 10:00:00",
    "weight": 5,
    .....
}

{
    "user": "Tom",
    "date_time": "2019-10-10 11:00:00",
    "weight": 10,
    .....
}

So in this case you want to retrieve the value 10, as it's the last value for your user Tom on this day (2019-10-10)?

To get the last value, you can sort by date_time in descending order so the latest value comes first, then limit the result to 1 so you get only the last value. After that you can run whatever aggregations you need.
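As a rough illustration of the "sort descending, keep one" idea, here is a small in-memory Python sketch that keeps the latest document per user and day. It is not an Elasticsearch query: the function name `last_weight_per_user_day` is hypothetical, and it assumes the first 10 characters of `date_time` identify the day, so that comparing the full strings within one day gives chronological order. Inside Elasticsearch, a comparable effect is often obtained with a `top_hits` sub-aggregation of size 1 sorted by date descending, though I have not tested that against your mapping.

```python
def last_weight_per_user_day(docs):
    """Keep, for every (user, day) pair, the weight of the latest document."""
    latest = {}
    for doc in docs:
        # Assumption: the day is the first 10 characters of the timestamp.
        key = (doc["user"], doc["date_time"][:10])
        if key not in latest or doc["date_time"] > latest[key]["date_time"]:
            latest[key] = doc
    return {key: doc["weight"] for key, doc in latest.items()}


if __name__ == "__main__":
    sample = [
        {"user": "Tom", "date_time": "2019-10-10 10:00:00", "weight": 5},
        {"user": "Tom", "date_time": "2019-10-10 11:00:00", "weight": 10},
    ]
    print(last_weight_per_user_day(sample))
```

For the two sample documents this keeps only the 11:00 entry, so Tom's weight for 2019-10-10 is 10.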

There are also range field types (for example date_range) that can help with interval searches if you need a value for a week, for working days, etc.:
https://www.elastic.co/guide/en/elasticsearch/reference/7.4/range.html

Not exactly. Please see the example below:

{
"user": "Tom",
"start_date_time": "2019-10-10 10:00:00",
"end_date_time": "2019-10-10 11:00:00",
"weight": 5,
.....
}

{
"user": "Tom",
"start_date_time": "2019-10-10 11:00:00",
"end_date_time": "2019-12-10 11:00:00",
"weight": 10,
.....
}

{
"user": "Tom",
"start_date_time": "2019-12-10 11:00:00",
"end_date_time": "2019-12-10 12:00:00",
"weight": 15,
.....
}

{
"user": "Tom",
"start_date_time": "2019-14-10 11:00:00",
"end_date_time": null,
"weight": 20,
.....
}

{
"user": "Seva",
"start_date_time": "2019-10-10 08:00:00",
"end_date_time": "2019-13-10 09:00:00",
"weight": 5,
.....
}

{
"user": "Seva",
"start_date_time": "2019-13-10 09:00:00",
"end_date_time": "2019-14-10 11:00:00",
"weight": 10,
.....
}

{
"user": "Seva",
"start_date_time": "2019-15-10 11:00:00",
"end_date_time": null,
"weight": 15,
.....
}

The question is:
For the period ("2019-10-10", "2019-16-10"), I want to know how many users had each "weight" value on each day (taking the value at the end of the day).

Expected result:
Date        Weight  Unique Count
2019-10-10  5       1
            10      1
            15      0
            20      0
2019-11-10  5       1
            10      1
            15      0
            20      0
2019-12-10  5       1
            10      0
            15      1
            20      0
2019-13-10  5       0
            10      1
            15      1
            20      0
2019-14-10  5       0
            10      1
            15      0
            20      1
2019-15-10  5       0
            10      0
            15      1
            20      1
2019-16-10  5       0
            10      0
            15      1
            20      1
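To pin down the semantics behind these numbers, here is a small Python sketch that reproduces the counts from the sample documents, computed in memory rather than in Elasticsearch. It rests on two assumptions: the timestamps are year-day-month (a value like "2019-14-10" cannot be month 14), and a user's end-of-day weight is the weight of the latest document that started on or before that day. end_date_time is deliberately ignored, because the sample has gaps (Tom has no document covering 2019-13-10) while the expected result still counts him at weight 15 on that day. The names `DOCS` and `weight_counts_per_day` are hypothetical.

```python
from collections import Counter
from datetime import date, datetime, time, timedelta

FMT = "%Y-%d-%m %H:%M:%S"  # assumption: year-day-month, as in the samples

# (user, start_date_time, weight) copied from the sample documents
DOCS = [
    ("Tom", "2019-10-10 10:00:00", 5),
    ("Tom", "2019-10-10 11:00:00", 10),
    ("Tom", "2019-12-10 11:00:00", 15),
    ("Tom", "2019-14-10 11:00:00", 20),
    ("Seva", "2019-10-10 08:00:00", 5),
    ("Seva", "2019-13-10 09:00:00", 10),
    ("Seva", "2019-15-10 11:00:00", 15),
]


def weight_counts_per_day(docs, first_day, last_day):
    """Count users per weight, using each user's last known weight per day."""
    parsed = sorted((datetime.strptime(s, FMT), user, w) for user, s, w in docs)
    result = {}
    day = first_day
    while day <= last_day:
        end_of_day = datetime.combine(day, time.max)
        latest = {}
        for start, user, weight in parsed:
            if start <= end_of_day:
                latest[user] = weight  # later starts overwrite earlier ones
        result[day] = Counter(latest.values())
        day += timedelta(days=1)
    return result


if __name__ == "__main__":
    counts = weight_counts_per_day(DOCS, date(2019, 10, 10), date(2019, 10, 16))
    for day, counter in counts.items():
        for weight in (5, 10, 15, 20):
            print(day.strftime("%Y-%d-%m"), weight, counter[weight])
```

Under those assumptions the printed rows match the expected result, which suggests the aggregation I am after is "last document started on or before the end of the day, per user", followed by a count per weight.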
