I tried to select data with a scripted_metric aggregation and sort it by a timestamp field.
The result seems to be divided into sorted arrays in the aggregation bucket.
However, the arrays themselves are not sorted relative to one another (see the screenshot).
Also, I wonder what your use case for the scripted_metric aggregation is here? Often there are ways of achieving what you need without it, by combining the script feature with another aggregation instead.
The data is a collection of timestamps.
I want to know if there is a gap between two consecutive timestamps (the index is split across more than one shard, so two consecutive timestamps can land on different shards).
Is it possible to gather all the data and sort it afterwards?
If you wanted to do the analytics offline, you could use the scroll API to stream all the data out of Elasticsearch and do the calculations on the client.
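As a sketch of that: the Python client's `scan` helper wraps the scroll API, so you could pull everything out, sort once on the client, and compare consecutive timestamps. The index name, field name, and gap threshold below are assumptions, not something from your setup:

```python
from elasticsearch import Elasticsearch, helpers

es = Elasticsearch("http://localhost:9200")  # assumed local cluster

MAX_GAP_MS = 60_000  # assumed threshold: anything over a minute counts as a gap

# Stream every document out via the scroll API (index and field names assumed).
timestamps = [
    hit["_source"]["timestamp"]  # assumed to hold epoch milliseconds
    for hit in helpers.scan(es, index="my-index", query={"query": {"match_all": {}}})
]

# Sort once on the client, then compare each pair of consecutive timestamps.
timestamps.sort()
gaps = [
    (prev, curr)
    for prev, curr in zip(timestamps, timestamps[1:])
    if curr - prev > MAX_GAP_MS
]
print(gaps)
```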
The problem with your approach here is that you are potentially going to need to stream a lot of data from the shards to the coordinating node because the number of timestamps could be large. You could mitigate this in two ways:
1. When indexing documents, use routing to route documents with the same ccPairs value to the same shard. This way you are guaranteed that all timestamps for a term bucket are on the same shard (see the routing sketch after this list). You will still need to do complex processing, though, so you will likely still need the scripted_metric aggregation.
2. Have a secondary index where each document represents a ccPairs value and records whether there is a gap and at which timestamps. This involves a job that runs periodically, collects new data from the primary index, and merges it into the relevant documents in the secondary index (see the second sketch after this list). At query time you can then run normal aggregations to obtain the information you need, provided you structure the documents in the secondary index appropriately.
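For the first option, routing is just a parameter on the index request, so all that changes is how you write the documents. Client setup, index name, and document shape are assumptions here:

```python
from elasticsearch import Elasticsearch

es = Elasticsearch("http://localhost:9200")  # assumed local cluster

doc = {"ccPairs": "EURUSD", "timestamp": 1700000000000}  # assumed document shape

# Route on the ccPairs value so every document sharing it lands on one shard.
es.index(index="my-index", routing=doc["ccPairs"], document=doc)
```

Passing the same routing value at search time then restricts the query to that single shard.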
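The second option could start as simple as a periodic job that rebuilds one summary document per ccPairs value. For simplicity this sketch recomputes each summary from scratch rather than merging new data incrementally, and all index and field names are assumptions:

```python
from elasticsearch import Elasticsearch, helpers

es = Elasticsearch("http://localhost:9200")  # assumed local cluster

MAX_GAP_MS = 60_000  # assumed gap threshold


def refresh_summary(cc_pairs: str) -> None:
    """Rebuild the gap summary document for one ccPairs value."""
    # Pull all timestamps for this ccPairs value from the (assumed) primary index.
    timestamps = sorted(
        hit["_source"]["timestamp"]
        for hit in helpers.scan(
            es,
            index="my-index",
            query={"query": {"term": {"ccPairs": cc_pairs}}},
        )
    )
    gaps = [
        {"from": prev, "to": curr}
        for prev, curr in zip(timestamps, timestamps[1:])
        if curr - prev > MAX_GAP_MS
    ]
    # Upsert one summary document per ccPairs value into the secondary index,
    # so ordinary term queries can answer "which pairs have gaps?" at query time.
    es.index(
        index="my-index-gaps",
        id=cc_pairs,
        document={"ccPairs": cc_pairs, "has_gap": bool(gaps), "gaps": gaps},
    )
```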
If your data volumes are small enough that your current approach (after you add a reduce script, sketched below) works well, then you can continue with it, but it is worth keeping the above in mind if your data volume increases and performance starts to suffer.
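For completeness, a reduce script along these lines should merge the per-shard arrays, sort them once, and report the gaps. This is only a sketch: the field name `timestamp`, the aggregation name, and the `max_gap_ms` parameter are assumptions, and the date accessor in the map script may differ between Elasticsearch versions:

```python
from elasticsearch import Elasticsearch

es = Elasticsearch("http://localhost:9200")  # assumed local cluster

resp = es.search(
    index="my-index",  # assumed index name
    size=0,
    aggregations={
        "gaps": {
            "scripted_metric": {
                "params": {"max_gap_ms": 60000},  # assumed threshold
                "init_script": "state.ts = []",
                # Collect epoch-millis timestamps on each shard (assumed date field).
                "map_script": (
                    "state.ts.add(doc['timestamp'].value.toInstant().toEpochMilli())"
                ),
                "combine_script": "return state.ts",
                # Merge the per-shard arrays, sort once, then scan for gaps.
                "reduce_script": """
                    def merged = [];
                    for (s in states) { merged.addAll(s) }
                    Collections.sort(merged);
                    def gaps = [];
                    for (int i = 1; i < merged.size(); i++) {
                        if (merged.get(i) - merged.get(i - 1) > params.max_gap_ms) {
                            gaps.add(['from': merged.get(i - 1), 'to': merged.get(i)]);
                        }
                    }
                    return gaps;
                """,
            }
        }
    },
)
print(resp["aggregations"]["gaps"]["value"])
```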