Given this sample data structure:
"data": {
"values": [
{ "A": "Room1", "B": "Team1", "C": "Chef", "X": 30},
{ "A": "Room2", "B": "Team2", "C": "Waitress", "X": 24},
...]
}
Documents are coming in every 10 seconds with increasing values of X for each combination of the fields A, B, and C.
My goal is to create a bar chart (with Vega) that shows the increase of field X, split by field C, over the selected time range of the dashboard.
The value field X is a gauge, so its value is continuously growing.
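For reference, the intended chart could be sketched in Vega-Lite with inline placeholder data (one bar per value of C; the field names `c` and `increase` and the sample numbers are purely illustrative — in Kibana the data block would instead hold the Elasticsearch query):

```json
{
  "$schema": "https://vega.github.io/schema/vega-lite/v5.json",
  "data": {
    "values": [
      { "c": "Chef", "increase": 12 },
      { "c": "Waitress", "increase": 7 }
    ]
  },
  "mark": "bar",
  "encoding": {
    "x": { "field": "c", "type": "nominal", "title": "C" },
    "y": { "field": "increase", "type": "quantitative", "title": "Increase of X" }
  }
}
```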
My first solution was to create a terms aggregation to split by C, then a date histogram sub-aggregation with a 10 second interval, then a sum sub-aggregation to get the combined sum of X over all combinations of A and B inside every interval, and then a derivative pipeline aggregation to get the increase of X, split by C, for every interval of the date histogram. After that I would use a sum_bucket aggregation over the derivative to get the increase of X over the whole time range for every C.
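That pipeline might look roughly like this in Elasticsearch's query DSL (a sketch of the approach described above; the timestamp field name `@timestamp` and the aggregation names are assumptions):

```json
{
  "size": 0,
  "aggs": {
    "by_c": {
      "terms": { "field": "C" },
      "aggs": {
        "over_time": {
          "date_histogram": { "field": "@timestamp", "fixed_interval": "10s" },
          "aggs": {
            "sum_x": { "sum": { "field": "X" } },
            "increase": { "derivative": { "buckets_path": "sum_x" } }
          }
        },
        "total_increase": {
          "sum_bucket": { "buckets_path": "over_time>increase" }
        }
      }
    }
  }
}
```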
Not only is this a very wasteful and slow calculation, because I aggregate far more data than I need, it also leads to randomly occurring bugs when a 10 second interval boundary happens to split the incoming data in half. Then some combinations of A, B, or C have no data in one interval, leading to extremely large negative values (because the derivative calculates the difference from the last X down to zero).
So my question is: Is there a better way to aggregate the data to reach my goal?
I thought I would only need the MIN and MAX of every combination of A, B, C and could then calculate the difference. So I tried a composite bucket aggregation over the fields A, B, C and added min and max sub-aggregations. That worked, but then I couldn't split the resulting data by field C.
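The composite attempt could be sketched like this (an assumption of what was tried, with a bucket_script added to compute the max-minus-min difference per combination; names and the page size are illustrative):

```json
{
  "size": 0,
  "aggs": {
    "combos": {
      "composite": {
        "size": 1000,
        "sources": [
          { "a": { "terms": { "field": "A" } } },
          { "b": { "terms": { "field": "B" } } },
          { "c": { "terms": { "field": "C" } } }
        ]
      },
      "aggs": {
        "min_x": { "min": { "field": "X" } },
        "max_x": { "max": { "field": "X" } },
        "increase": {
          "bucket_script": {
            "buckets_path": { "min": "min_x", "max": "max_x" },
            "script": "params.max - params.min"
          }
        }
      }
    }
  }
}
```

This yields the per-combination increase, but the buckets are keyed by the full (A, B, C) tuple, which is exactly the "can't split by C" problem described above.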
Thank you in advance.