Each document in my index corresponds to a specific sensor. I cannot change the data layout at all. I am interested in the value field for three of these sensors: X, Y, and Z at a given time (within a couple of seconds). I am able to successfully fetch all of the documents needed with a lucerne query: sensor:X AND sensor:Y AND sensor:Z
. The problem specifically is that the representation of a given point in time is supposed to be the sum of the values of the three sensors at that time. The sensors are supposed to take measurements every 20 seconds (4 times/minute) but it doesn't always work out that way due to dropped data packets, etc. However, when the measurements are collected, the timestamps on the documents are within 1-4 seconds of one another (always in 3s as X,Y,Z).
It's easier to explain this with a diagram of the data. The rightmost column is the document. The left is the timestamp. Middle is the value. The red boxes represent what needs to be summed together.
This is clunky but somewhat doable on short time scales. However, for timescales on the order of ~6 months, the queries will take too long. So I am trying to sum over larger periods of time (e.g., interval of 1m, 5m, 10m) and take the average of the sums over the interval. The problem is that over the course of an "actual" @timestamp
minute, the number of documents that get returned for the query can vary be 9, 10, 11, or 12. What I really want to do is sort of summing the value fields of the 3 documents
So, at the end of the day, what I am really interested is a corresponding data set that looks like:
11:29:11, RESULT = 35.6
11:29:39, RESULT=35.4
11:30:03, RESULT=35.4
etc.
But in an Elastic-friendly/scalable way that the server could handle.