I am trying to compute metrics based only on the latest record per key using ES|QL in 9.x version?
I have a time-series index where each entity (e.g. trainnumber) emits multiple events over time and I want to:
Consider only the latest record per trainnumber (by @timestamp)
Then compute counts based on the latest station value
In the couple of years I've been active on the forum, this type of Q has come up pretty often, and it's always been a bit awkward to answer. The TOP function, which will be enhanced soon with an OutputField, has close to the functionality you need.
What's seems to be missing, and will still be missing, is support for an outputField of keyword type. If your train number was actually a number then I think TOP that might be part of 9.3 will do the job.
EDIT: maybe actually better if the station were a number. With latest SNAPSHOT (*) I see:
FROM trains | STATS station=TOP(@timestamp, 1,"DESC",stationnumber) BY trainnumber | STATS train_count=COUNT(station) BY station | SORT train_count ASC
seems to do the job, given this data loaded into trains index
That all said, these types of common questions could be wayyy simpler to answer in ES|QL if the TOP function could be extended further with keyword fields as OutputField.
Apache, Apache Lucene, Apache Hadoop, Hadoop, HDFS and the yellow elephant
logo are trademarks of the
Apache Software Foundation
in the United States and/or other countries.