Are you already tired of hearing Christmas songs? One of the classics is "Driving Home for Christmas". The lyrics do not say much: Chris Rea basically describes a long, boring car journey home. In this post we follow the song and spice it up with new bits and pieces about transforms, which provide an easy way to summarize data.
And it's been so long
The aim of a search engine like Elasticsearch is to provide relevant results quickly. For your family, it is relevant to know that you are OK while you are on your way home. The important information could be your current, latest status. Early this year we introduced a new function to the transform API that provides fast access to what's latest in a stream. What counts as latest is up to you. It could be the latest state of a server or of a car:
{
  "id": "313",
  "model": "bantam",
  "location": {
    "lon": 47.99,
    "lat": 11.67
  },
  "speed": 120,
  "engine_temperature": 70,
  "state_of_charge": 94,
  "timestamp": "2021-12-23T09:28:48+00:00"
}
To define latest we need two properties. First, we need one or more unique keys; in this case we can use the id. Second, we need a sort field; here that is the timestamp, which tells us when the event took place. You can find a walkthrough in this blog post.
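As a rough sketch of how the two properties fit together, the snippet below writes a possible request body for creating such a transform as a Python dict and then emulates the "newest event per key" selection in plain Python. The index names `cars` and `cars_latest`, the extra events, and the helper `latest_per_key` are assumptions made up for this illustration; the helper only mimics the idea and is not how Elasticsearch implements it.

```python
import json
from datetime import datetime

# Hypothetical request body for creating a latest transform.
# The index names "cars" and "cars_latest" are invented for this example.
latest_transform = {
    "source": {"index": "cars"},
    "dest": {"index": "cars_latest"},
    "latest": {
        "unique_key": ["id"],  # one result document per car
        "sort": "timestamp",   # the newest event wins
    },
}


def latest_per_key(events, unique_key="id", sort_field="timestamp"):
    """Keep, for every key, the event with the greatest sort value."""
    newest = {}
    for event in events:
        key = event[unique_key]
        when = datetime.fromisoformat(event[sort_field])
        if key not in newest or when > datetime.fromisoformat(newest[key][sort_field]):
            newest[key] = event
    return newest


# Made-up events: car 313 reports twice, car 114 once.
events = [
    {"id": "313", "speed": 120, "timestamp": "2021-12-23T09:28:48+00:00"},
    {"id": "313", "speed": 80, "timestamp": "2021-12-23T09:31:02+00:00"},
    {"id": "114", "speed": 95, "timestamp": "2021-12-23T09:30:10+00:00"},
]

state = latest_per_key(events)
print(json.dumps(latest_transform, indent=2))
print(state["313"]["speed"])
```

Only the second report of car 313 survives, because it carries the later timestamp.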
But soon there'll be a freeway yeah
On a freeway you can drive faster. Elasticsearch provides index structures to make queries faster, and transforms make access even more efficient by using checkpoints. Checkpoints are used in the continuous mode of transforms to remember which data has already been processed. You can think of them as exits on the freeway: if someone asks where you are, you say "I just passed this exit" instead of "I am somewhere between start and end". You can learn more about checkpoints here.
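To make the freeway-exit picture a bit more concrete, here is a deliberately simplified model of a checkpoint in plain Python. It assumes that a checkpoint is just "the newest timestamp processed so far"; real transforms keep more state than that, and the function `run_checkpoint` and the sample events are hypothetical.

```python
from datetime import datetime


def run_checkpoint(events, last_checkpoint):
    """Return the events newer than last_checkpoint plus the new checkpoint."""
    fresh = [
        e for e in events
        if datetime.fromisoformat(e["timestamp"]) > last_checkpoint
    ]
    if not fresh:
        # Nothing new arrived, so we are still at the same exit.
        return [], last_checkpoint
    new_checkpoint = max(datetime.fromisoformat(e["timestamp"]) for e in fresh)
    return fresh, new_checkpoint


events = [
    {"id": "313", "timestamp": "2021-12-23T09:28:48+00:00"},
    {"id": "313", "timestamp": "2021-12-23T09:31:02+00:00"},
]

# Start of the journey: nothing has been processed yet.
start = datetime.fromisoformat("1970-01-01T00:00:00+00:00")
first_batch, checkpoint = run_checkpoint(events, start)

# A new event arrives; the second run only has to look at that one.
events.append({"id": "114", "timestamp": "2021-12-23T09:35:00+00:00"})
second_batch, checkpoint = run_checkpoint(events, checkpoint)

print(len(first_batch), len(second_batch))
```

The second run skips everything before the remembered exit, which is the whole point of the checkpoint.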
I take a look at the driver next to me
Many of us will be on the road; traffic will be jammed. A transform writes results into a new destination index. Such an index has the benefit of allowing us to do second-level analysis. If we have all the latest information of every car on the road, we can calculate the average speed of all cars and we can query for cars that need to recharge soon.
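A small sketch of such a second-level analysis, run here on a handful of made-up documents instead of a real destination index. The recharge threshold of 20 percent and the sample cars are assumptions for illustration. The two dicts show what the matching search request bodies could look like, using the standard `avg` aggregation and `range` query.

```python
from statistics import mean

# Made-up contents of the destination index: one latest document per car.
latest_docs = [
    {"id": "313", "speed": 120, "state_of_charge": 94},
    {"id": "114", "speed": 90, "state_of_charge": 15},
    {"id": "207", "speed": 60, "state_of_charge": 18},
]

RECHARGE_THRESHOLD = 20  # percent, an assumed cut-off

# Possible search bodies for the destination index.
avg_speed_request = {
    "size": 0,
    "aggs": {"avg_speed": {"avg": {"field": "speed"}}},
}
recharge_request = {
    "query": {"range": {"state_of_charge": {"lt": RECHARGE_THRESHOLD}}},
}

# The same two questions answered locally on the sample documents.
average_speed = mean(doc["speed"] for doc in latest_docs)
need_recharge = [
    doc["id"] for doc in latest_docs
    if doc["state_of_charge"] < RECHARGE_THRESHOLD
]

print(average_speed, need_recharge)
```

Because the destination index holds exactly one document per car, both questions become cheap queries over a small index.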
Oh, I got red lights all around
These are hopefully just stop lights; we definitely don’t need engine problems. For more complex use cases, transforms that use a pivot function can pre-aggregate data. Over time we have added more and more supported aggregations. One of the recent additions is top_metrics. It can provide similar functionality to latest in a pivot. Assume we want to know the engine temperature at top speed:
"engine_temp_at_top_speed": {
  "top_metrics": {
    "metrics": { "field": "engine_temperature" },
    "sort": { "speed": "desc" }
  }
}
But top_metrics works for more than metric-type fields. In fact, only the sort field must be a numeric or date type. Here is an example that gets the most recently provided email address:
"email": {
  "top_metrics": {
    "metrics": { "field": "email" },
    "sort": { "timestamp": "desc" }
  }
}
You can find out more about top_metrics in our docs. Note that at the moment transforms do not support the size parameter for retrieving more than one top entry.
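To illustrate what the two aggregations above compute, the following plain-Python sketch emulates top_metrics with a single top entry: sort by one field, then read another field from the winning document. The helper `top_metric` and the sample documents are hypothetical, and the emulation ignores details such as tie-breaking.

```python
def top_metric(docs, metric_field, sort_field, order="desc"):
    """Return metric_field from the document that sorts first on sort_field."""
    candidates = [d for d in docs if sort_field in d]
    if not candidates:
        return None
    pick = max if order == "desc" else min
    winner = pick(candidates, key=lambda d: d[sort_field])
    return winner.get(metric_field)


# Made-up events for one car. The ISO 8601 timestamps share the same
# offset, so comparing them as text gives chronological order.
docs = [
    {"speed": 120, "engine_temperature": 70, "email": "old@example.com",
     "timestamp": "2021-12-23T09:28:48+00:00"},
    {"speed": 135, "engine_temperature": 88, "email": "old@example.com",
     "timestamp": "2021-12-23T09:40:00+00:00"},
    {"speed": 90, "engine_temperature": 75, "email": "new@example.com",
     "timestamp": "2021-12-23T09:55:00+00:00"},
]

engine_temp_at_top_speed = top_metric(docs, "engine_temperature", "speed")
email = top_metric(docs, "email", "timestamp")

print(engine_temp_at_top_speed, email)
```

The first call sorts on a numeric field and the second on a date, while the returned value can be a number or a string.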
With a thousand memories
Data can grow larger and larger, and at some point you might want to get rid of outdated entries in your transform destination index, for example because you have a new car. For this you can define a retention_policy:
"retention_policy": {
  "time": {
    "field": "timestamp",
    "max_age": "30d"
  }
}
This works for both latest and pivot transforms.
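The effect of a 30-day policy can be pictured with a short plain-Python emulation. The function `apply_retention`, the fixed "now", and the sample documents are assumptions for this sketch; in Elasticsearch the cleanup is done by the transform itself.

```python
from datetime import datetime, timedelta, timezone


def apply_retention(docs, field, max_age, now):
    """Keep only documents whose `field` is not older than max_age."""
    cutoff = now - max_age
    return [d for d in docs if datetime.fromisoformat(d[field]) >= cutoff]


# Made-up destination documents: the old car stopped reporting in November.
docs = [
    {"id": "313", "timestamp": "2021-12-23T09:28:48+00:00"},
    {"id": "007", "timestamp": "2021-11-01T08:00:00+00:00"},
]

now = datetime(2021, 12, 24, tzinfo=timezone.utc)  # fixed for reproducibility
kept = apply_retention(docs, "timestamp", timedelta(days=30), now)

print([d["id"] for d in kept])
```

The document of the old car is more than 30 days old at that point, so only car 313 remains.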
Driving home for Christmas
We wish everyone safe travels, whether by car or by any other means of transportation.