We want to save historical information on a specific entity-type that our system creates and modify. This data will be used to run analytics and show charts about statistics over time.
For simplicity let’s say it’s a User object which has various attributes (title, name, birthdate, role, and weight).
The historical data is stored in ElasticSearch so that every change on the user object opens to create a new document with user attributes values and in addition EffectiveStartDate and EffectiveEndDate that represent the duration of time for which the values of the attribute were correct for this user. This means that for each user object at any point in time there should only be one document with the an empty EffectiveEndDate.
The analytics and statistics we do are at a resolution of days, and we are looking a way to run a script while querying to duplicate the documents as the number of days in the interval [EffectiveStartDate, EffectiveEndDate]
Suppose the weight attribute of the user can change multiple times a day, how can we run the following query such that it will only take into account the last weight the user had in each day so we can create an aggregation such as
“For the time period (week/month/year) query and aggregate(average, max, min) users-weight at the end of each day”