Aggregation of multiple arrays into single one

Manuel · April 21, 2016, 7:53am

Hi,

I want to use Elasticsearch as a timeseries database where I store an array of 850 sensor values every hour. The database will store data for several years. Now I want to query all the documents within a given time range and downsample them using a max aggregation. My question is: How do I efficiently compute the maximum value for each array index?

For example, I have the arrays of five documents and I want to aggregate them into an array of the same size containing the maximum value at each index:

Array 1: [1, 0, 2, ..... ]
Array 2: [3, 2, 2, ..... ]
Array 3: [1, 1, 1, ..... ]
Array 4: [5, 2, 0, ..... ]
Array 5: [4, 0, 3, ..... ]

Max Array: [5, 2, 3, ..... ]
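
To pin down the expected result, here is a minimal sketch in plain Python, outside Elasticsearch. It uses only the first three values of each array from the example, because the remaining values are elided above, and `elementwise_max` is a hypothetical helper name, not an Elasticsearch function.

```python
def elementwise_max(arrays):
    """Return a list holding, for each index, the maximum across all arrays."""
    # zip(*arrays) groups the values that share an index: (1, 3, 1, 5, 4), ...
    return [max(column) for column in zip(*arrays)]


# First three values of each example array (the remaining values are elided).
arrays = [
    [1, 0, 2],
    [3, 2, 2],
    [1, 1, 1],
    [5, 2, 0],
    [4, 0, 3],
]

max_array = elementwise_max(arrays)
print(max_array)  # [5, 2, 3]
```

This only illustrates the target output; it says nothing about how fast the equivalent computation would be inside an aggregation.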

I currently use a scripted metric aggregation where I iterate over each of the 850 values for each document, which results in quite low performance. Can this be achieved in a more efficient way?

Mark_Harwood · April 21, 2016, 8:30am

I'm not 100% sure about the document mappings or the aggregations you're trying to achieve here, but if you want to sum only the maximum values found in each Elasticsearch document, where each doc holds an array, then at present you will have to use a script. The need for a script would be removed if you held a "max" value on each doc, which would be trivial to compute at write time.

Alternatively, you could create a "rolled-up" index using some of the techniques I describe here in building entity-centric indexing: https://www.youtube.com/watch?v=yBf7oeJKH2Y (includes some scripts you can download)
It amounts to the same thing - maintaining a thinned version of events.
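
One possible reading of "a thinned version of events", sketched in plain Python rather than taken from the linked scripts: collapse the hourly documents into one pre-aggregated array per coarser time bucket before indexing them into the rolled-up index. The field names `timestamp` and `values`, the daily bucket size, and the function name `roll_up_daily_max` are all assumptions made for illustration.

```python
from collections import defaultdict
from datetime import datetime


def roll_up_daily_max(hourly_docs):
    """Collapse hourly documents into one element-wise max array per day."""
    buckets = defaultdict(list)
    for doc in hourly_docs:
        # Group the arrays by calendar day (assumed bucket size).
        buckets[doc["timestamp"].date()].append(doc["values"])
    # For each day, keep the maximum seen at every array index.
    return {
        day: [max(column) for column in zip(*arrays)]
        for day, arrays in buckets.items()
    }


# Hypothetical hourly documents, truncated to three values each.
hourly_docs = [
    {"timestamp": datetime(2016, 4, 20, 22), "values": [1, 0, 2]},
    {"timestamp": datetime(2016, 4, 20, 23), "values": [3, 2, 2]},
    {"timestamp": datetime(2016, 4, 21, 0), "values": [5, 2, 0]},
]

daily = roll_up_daily_max(hourly_docs)
for day, values in sorted(daily.items()):
    print(day, values)
```

Queries over long time ranges would then read far fewer documents, at the cost of maintaining the second index.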

Cheers
Mark

Manuel · April 21, 2016, 8:54am

Thanks for your fast reply. I don't want to compute the maximum of a single array, but the maximum for each index across several arrays.

In the example above I have five input arrays, where the numbers 1, 3, 1, 5 and 4 are the values of each array at index 0. Therefore the resulting array should store 5 at index 0.

colings86 · April 21, 2016, 11:05am

If the elements at each index of the array each represent a different metric, I would store these metrics as separate fields so that you can query them separately.

So your first document would be:

{
  "metric_0": 1,
  "metric_1": 0,
  "metric_2": 2,
  ...
  "metric_849": 1
}

Obviously you could name the metrics properly according to what they represent instead of using the form metric_<INDEX>.
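
As a rough sketch of the reshaping step at write time, the array could be turned into separate fields like this before indexing. The helper name `array_to_metric_fields` is hypothetical, and the example uses only the first three values of the first array.

```python
def array_to_metric_fields(values, prefix="metric_"):
    """Map each array position to its own field: metric_0, metric_1, ..."""
    return {f"{prefix}{index}": value for index, value in enumerate(values)}


# First three values of the first example array (the rest is elided).
doc = array_to_metric_fields([1, 0, 2])
print(doc)  # {'metric_0': 1, 'metric_1': 0, 'metric_2': 2}
```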

Manuel · April 21, 2016, 12:55pm

And performing 850 individual aggregations would be faster?

colings86 · April 21, 2016, 1:15pm

You would have to test to find out if it would be faster than your script solution.

What information are you actually presenting to the user? Surely a user will have trouble processing 850 figures if you are displaying them all at the same time? Are you doing some post-processing of these results before you display something to the user? There may be other ways of achieving what you are after. For example, if you are presenting these results in pages, you would have the opportunity to query only one page (say 20) of the metrics at a time.
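
A minimal sketch of what such a request body might look like, built in Python, assuming the separate-field layout with `metric_<INDEX>` names. The time field name `timestamp`, the example dates, and the function name `build_max_aggs_body` are assumptions; the body uses only a `range` query and one `max` aggregation per field. The same function covers both the full set of 850 metrics and a single page of 20.

```python
import json


def build_max_aggs_body(start_time, end_time, first_metric=0,
                        metric_count=850, time_field="timestamp"):
    """Build a search body with one max aggregation per metric field."""
    aggs = {
        f"max_metric_{i}": {"max": {"field": f"metric_{i}"}}
        for i in range(first_metric, first_metric + metric_count)
    }
    return {
        "size": 0,  # only the aggregation results are needed, not the hits
        "query": {"range": {time_field: {"gte": start_time, "lt": end_time}}},
        "aggs": aggs,
    }


# All 850 metrics over a hypothetical time range.
full = build_max_aggs_body("2016-04-01T00:00:00Z", "2016-04-21T00:00:00Z")

# Second page of 20 metrics (metric_20 .. metric_39) over the same range.
page = build_max_aggs_body("2016-04-01T00:00:00Z", "2016-04-21T00:00:00Z",
                           first_metric=20, metric_count=20)

print(len(full["aggs"]), len(page["aggs"]))  # 850 20
print(json.dumps(page["aggs"]["max_metric_20"]))  # {"max": {"field": "metric_20"}}
```

Whether this is faster than a script would still have to be measured against real data.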

Manuel · April 21, 2016, 1:36pm

I just tried it out and it seems there is no difference. The data is presented to the user all at once as an image. One column in that image would correspond to one aggregated document.
