Moving average with Top Hit values of a given id

Hi everyone,

I've got a bunch of data with which I want to create a moving average visualization over time.
I'd like the time span for the calculation of the moving average to be the last 2 days, and the time span of my line chart to be 1 or 2 months.
The trick is that I receive data that can appear several times. For instance, this is what I can receive:

[
  {"changeId":"18864", "timestamp": "2018-10-19 10:00:00", "total": 5 },
  {"changeId":"18865", "timestamp": "2018-10-19 10:00:00", "total": 10 },
  {"changeId":"18866", "timestamp": "2018-10-19 10:00:00", "total": 15 },
  {"changeId":"18864", "timestamp": "2018-10-20 10:00:00", "total": 5 },
  {"changeId":"18865", "timestamp": "2018-10-20 10:00:00", "total": 10 },
  {"changeId":"18866", "timestamp": "2018-10-20 10:00:00", "total": 15 },
  {"changeId":"18867", "timestamp": "2018-10-20 10:00:00", "total": 20 },
  {"changeId":"18865", "timestamp": "2018-10-21 10:00:00", "total": 10 },
  {"changeId":"18866", "timestamp": "2018-10-21 10:00:00", "total": 15 },
  {"changeId":"18867", "timestamp": "2018-10-21 10:00:00", "total": 20 },
  {"changeId":"18868", "timestamp": "2018-10-21 10:00:00", "total": 25 },
  {"changeId":"18865", "timestamp": "2018-10-22 10:00:00", "total": 10 },
  {"changeId":"18866", "timestamp": "2018-10-22 10:00:00", "total": 15 },
  {"changeId":"18867", "timestamp": "2018-10-22 10:00:00", "total": 20 },
  {"changeId":"18868", "timestamp": "2018-10-22 10:00:00", "total": 25 },
  {"changeId":"18869", "timestamp": "2018-10-22 10:00:00", "total": 30 },
  {"changeId":"18869", "timestamp": "2018-10-23 10:00:00", "total": 35 }
]

So in this case, I want the total field where changeId = 18864 to be counted only once (the most recent value). Is it even possible ? I've tried many things but I can't find any solutions...

I want the total field where changeId = 18864 to be counted only once

Does that mean you want the total for changeId 18864 from 2018-10-19 to be ignored? With Elasticsearch you can create nested aggregations, and create metrics based on the values within each level of the aggregation, so to get the latest value for each changeId I would split on the "terms" of the changeId field and use a "top hits" metric to get the most recent total value from each bucket (one bucket is created for each term in the changeId field.

This basically makes the timestamp in your test data useless though and nearly every measurement is for the same day, which means there wouldn't be anything to average over the timeline.

Based on the data it seems you might just want to do a standard chart that splits lines based on the changeId

Does that mean you want the total for changeId 18864 from 2018-10-19 to be ignored?

It depends ! I want it to be ignored when I calculate the moving average of the 2018-10-20 because I only want the latest. But for the calculation of the moving average of the 2018-10-19, it has to be taken into account because this moving average should be the average of all the distinct changeId lines over the last 2 days (so from N-1 to N date).

Based on the data it seems you might just want to do a standard chart that splits lines based on the changeId

I don't want to split chart lines because I have way too much different changeId, this is only a small sample data...

So to be more precise, for the day 2018-10-20, I want an average calculated from the 2018-10-19 to 2018-10-20, taking into account only the most recent total value for a given changeId so in this case indeed, all the total values from the 2018-10-19 would be ignored because they are present in the 2018-10-20 data. So the average in this period should be: (5+10+15+20)/4
But if there was a changeId not present in the 2018-10-20 data but present in the 2018-10-19 data, we would take it into account in the average calculation.

I don't know if I'm clear enough ?

Just to up this post

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.