If you wanted to do the analytics offline, you could use the scroll API to stream all the data out of Elasticsearch and do your calculations on the client side.
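For example, a minimal sketch using the Python client's `helpers.scan` (which pages through the scroll API for you); the index name (`ticks`) and the field names are assumptions on my part:

```python
from elasticsearch import Elasticsearch, helpers

es = Elasticsearch("http://localhost:9200")  # adjust to your cluster

# Stream every matching document out via the scroll API, keeping only
# the fields needed for the offline gap analysis. Index and field
# names here are illustrative.
timestamps_by_pair = {}
for hit in helpers.scan(
    es,
    index="ticks",
    query={"query": {"match_all": {}}, "_source": ["ccPairs", "timestamp"]},
    size=1000,
):
    src = hit["_source"]
    timestamps_by_pair.setdefault(src["ccPairs"], []).append(src["timestamp"])

# ...sort each list and compute the gaps client-side...
```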
The problem with your current approach is that you may need to stream a lot of data from the shards to the coordinating node, because the number of timestamps per bucket could be large. You could mitigate this in two ways:
- When indexing documents, use `routing` to route documents with the same `ccPairs` value to the same shard. This way you are guaranteed that all timestamps for a given term bucket are on the same shard. You will still need to do complex processing, though, so you will likely still need the `scripted_metric` aggregation (see the sketches after this list).
- Have a secondary index where each document represents a `ccPairs` value and records whether there is a gap and at which timestamps. This involves a job that runs periodically, collects new data from the primary index, and merges it into the relevant documents in the secondary index. At query time you can then run normal aggregations to obtain the information you need, provided you structure the secondary-index documents to expose this data.
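For the routing option, the indexing call could look roughly like this; the index name and the sample document are made up, and I am assuming the 8.x Python client (which takes a `document=` argument):

```python
from elasticsearch import Elasticsearch

es = Elasticsearch("http://localhost:9200")

doc = {"ccPairs": "EUR/USD", "timestamp": "2024-01-01T00:00:00Z"}

# Use the ccPairs value as the routing key so that every document for a
# given pair is stored on the same shard. Queries that pass the same
# routing value can also be limited to that single shard.
es.index(index="ticks", routing=doc["ccPairs"], document=doc)
```

For the secondary-index option, the periodic job could, in outline, scan recently indexed documents, group their timestamps per `ccPairs` value, and upsert one summary document per pair. Again a rough sketch under the same assumptions (index names, field names, and the one-hour window are all hypothetical, and the actual merge logic is only indicated in comments):

```python
from elasticsearch import Elasticsearch, helpers

es = Elasticsearch("http://localhost:9200")

# Collect timestamps per ccPairs value from documents indexed since the
# last run (the watermark bookkeeping is omitted for brevity).
per_pair = {}
for hit in helpers.scan(
    es,
    index="ticks",
    query={"query": {"range": {"timestamp": {"gte": "now-1h"}}}},
):
    src = hit["_source"]
    per_pair.setdefault(src["ccPairs"], []).append(src["timestamp"])

# Upsert one summary document per ccPairs value into the secondary index.
for pair, stamps in per_pair.items():
    # ...merge `stamps` with what is already stored for this pair and
    # recompute the gap information before writing it back...
    es.index(index="tick-gaps", id=pair, document={
        "ccPairs": pair,
        "gapTimestamps": [],  # produced by the merge step sketched above
    })
```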
If your data volumes are small enough that your current approach (after you add a reduce script) works well, then you can stick with it, but it is worth keeping the options above in mind in case your data volume grows and performance starts to suffer.
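For completeness, here is roughly what the `scripted_metric` shape with a reduce script added could look like. The field names, the epoch-millis conversion, and the 60-second gap threshold are assumptions on my part, and `ccPairs` is assumed to be a `keyword` field:

```python
from elasticsearch import Elasticsearch

es = Elasticsearch("http://localhost:9200")

# Per-ccPairs gap detection: map/combine gather timestamps on each shard;
# the reduce script is the only place where results from all shards are
# visible at once, so the sorting and gap detection belong there.
body = {
    "size": 0,
    "aggs": {
        "by_pair": {
            "terms": {"field": "ccPairs"},
            "aggs": {
                "gaps": {
                    "scripted_metric": {
                        "params": {"max_gap_ms": 60000},
                        "init_script": "state.ts = []",
                        "map_script": (
                            "state.ts.add(doc['timestamp'].value"
                            ".toInstant().toEpochMilli())"
                        ),
                        "combine_script": "return state.ts",
                        "reduce_script": """
                            def all = [];
                            for (s in states) { if (s != null) all.addAll(s) }
                            Collections.sort(all);
                            def gaps = [];
                            for (int i = 1; i < all.size(); i++) {
                                if (all.get(i) - all.get(i - 1) > params.max_gap_ms) {
                                    gaps.add(all.get(i - 1));
                                }
                            }
                            return gaps;
                        """,
                    }
                }
            }
        }
    },
}

# body= works with the 7.x client; newer clients accept the same keys
# as keyword arguments instead.
resp = es.search(index="ticks", body=body)
```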