Dec 11th, 2021: [en] On a road trip with Transform

Are you already tired of hearing Christmas songs? One of the classics is "Driving home for Christmas". The lyrics don't have much content: Chris Rea basically tells of a long, boring journey driving home in his car. In this post we follow the song and spice it up with new bits and pieces about transforms. Transforms provide an easy way to summarize data.

And it's been so long

The aim of a search engine like Elasticsearch is to provide relevant results quickly. For your family, it is relevant to know that you are OK while you are on your way home. The important information could be your current, latest status. Early this year we introduced a new function to the transform API that provides fast access to what's latest in a stream. What's latest is up to you; it could be the latest state of a server or a car:

{
    "id": "313",
    "model": "bantam",
    "location" : {
        "lon" : 47.99,
        "lat" : 11.67
    },
    "speed": 120,
    "engine_temperature": 70,
    "state_of_charge": 94,
    "timestamp": "2021-12-23T09:28:48+00:00"
}

To define what latest means, we need two properties. First, we need one or more unique keys; in this case we can use the id. Second, we need a sort field; here that is the timestamp, which tells us when the event took place. You can find a walkthrough in this blog post.
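
To make the two properties concrete, here is a minimal Python sketch. It builds a request body in the shape the transform API uses for latest (unique_key and sort), then mimics the result on a few in-memory events. The index names car-events and cars-latest, the sample events and the helper latest_per_key are made up for illustration, and no cluster is contacted.

```python
import json

# Hypothetical index names; "latest", "unique_key" and "sort" are the
# transform settings described above.
latest_transform = {
    "source": {"index": "car-events"},
    "dest": {"index": "cars-latest"},
    "latest": {"unique_key": ["id"], "sort": "timestamp"},
}


def latest_per_key(events, unique_key, sort_field):
    """Local illustration only: keep the newest event per unique key."""
    newest = {}
    for event in events:
        key = tuple(event[k] for k in unique_key)
        # ISO 8601 strings with the same offset compare correctly as text.
        if key not in newest or event[sort_field] > newest[key][sort_field]:
            newest[key] = event
    return list(newest.values())


events = [
    {"id": "313", "speed": 120, "timestamp": "2021-12-23T09:28:48+00:00"},
    {"id": "313", "speed": 80, "timestamp": "2021-12-23T09:58:48+00:00"},
    {"id": "176", "speed": 100, "timestamp": "2021-12-23T09:30:00+00:00"},
]

print(json.dumps(latest_transform, indent=2))
print(latest_per_key(events, ["id"], "timestamp"))
```

Each car ends up with exactly one document, the one with the most recent timestamp.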

But soon there'll be a freeway yeah

On a freeway you can drive faster. Elasticsearch provides index structures to make queries faster, and transforms make access even more efficient by using checkpoints. Checkpoints are used in the continuous mode of transforms to remember which data has already been processed. You can think of them as exits on the freeway: if you get asked where you are, you say "I just passed a certain exit" instead of saying you are somewhere between start and end. You can learn more about checkpoints here.
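
The freeway-exit idea can be sketched in a few lines of Python. The sync and frequency settings are the ones a continuous transform uses, with illustrative values. The function changed_keys_since is a deliberately simplified mental model with made-up data, not a description of how transforms work internally.

```python
from datetime import datetime

# Settings that switch a transform to continuous mode; values are examples.
continuous_settings = {
    "sync": {"time": {"field": "timestamp", "delay": "60s"}},
    "frequency": "1m",
}


def changed_keys_since(events, last_checkpoint, new_checkpoint):
    """Toy model: ids with events in (last_checkpoint, new_checkpoint]."""
    return sorted({
        e["id"]
        for e in events
        if last_checkpoint < datetime.fromisoformat(e["timestamp"]) <= new_checkpoint
    })


events = [
    {"id": "313", "timestamp": "2021-12-23T09:28:48+00:00"},
    {"id": "176", "timestamp": "2021-12-23T09:45:00+00:00"},
    {"id": "313", "timestamp": "2021-12-23T09:58:48+00:00"},
]

# Three "exits" on the freeway.
exit_1 = datetime.fromisoformat("2021-12-23T09:00:00+00:00")
exit_2 = datetime.fromisoformat("2021-12-23T09:30:00+00:00")
exit_3 = datetime.fromisoformat("2021-12-23T10:00:00+00:00")

first_run = changed_keys_since(events, exit_1, exit_2)
second_run = changed_keys_since(events, exit_2, exit_3)
print(first_run, second_run)
```

Each run only has to look at what arrived since the previous exit, instead of rescanning the whole trip.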

I take a look at the driver next to me

Many of us will be on the road; traffic will be jammed. A transform writes results into a new destination index. Such an index has the benefit of allowing us to do second-level analysis. If we have all the latest information of every car on the road, we can calculate the average speed of all cars and we can query for cars that need to recharge soon.
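
As a sketch of such second-level analysis, the two search bodies below use a standard avg aggregation and a range query against the destination index. The threshold of 20 for state_of_charge and the sample documents are assumptions for illustration; the last lines compute the same answers locally so the example runs without a cluster.

```python
from statistics import mean

# Search bodies one might send to the destination index.
avg_speed_search = {
    "size": 0,
    "aggs": {"avg_speed": {"avg": {"field": "speed"}}},
}
needs_recharge_search = {
    # 20 is an assumed threshold for "needs to recharge soon".
    "query": {"range": {"state_of_charge": {"lt": 20}}},
}

# Made-up contents of the destination index: one latest document per car.
latest_cars = [
    {"id": "313", "speed": 120, "state_of_charge": 94},
    {"id": "176", "speed": 80, "state_of_charge": 15},
    {"id": "42", "speed": 100, "state_of_charge": 55},
]

# Local equivalents of the two searches.
average_speed = mean(car["speed"] for car in latest_cars)
needs_recharge = [car["id"] for car in latest_cars if car["state_of_charge"] < 20]
print(average_speed, needs_recharge)
```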

Oh, I got red lights all around

These are hopefully just stop lights; we definitely don’t need engine problems. For more complex use cases, transforms that use a pivot function can pre-aggregate data. Over time we have added more and more supported aggregations. One of the recent additions is top_metrics. It can provide functionality similar to latest in a pivot. Assume we want to know the engine temperature at top speed:

"engine_temp_at_top_speed": {
    "top_metrics": {
        "metrics": {"field": "engine_temperature"},
        "sort": {"speed": "desc"}
    }
}

But top_metrics is not limited to numeric metric fields. In fact, only the sort field must be a numeric or date type. Here is an example that gets the most recently provided email address:

"email": {
    "top_metrics": {
        "metrics": { "field": "email" },
        "sort": { "timestamp": "desc" }
    }
}

You can find out more about top_metrics in our docs. Note that at the moment transforms do not support the size parameter for retrieving more than one top entry.
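
To show where such an aggregation sits, here is a hedged Python sketch of a complete pivot body that groups by car id and uses the first top_metrics example. The index names car-events and cars-summary, the grouping choice and the sample readings are assumptions. The helper top_metric only imitates the aggregation locally for a single group.

```python
# Hypothetical index names and grouping; the aggregation is the one shown above.
pivot_transform = {
    "source": {"index": "car-events"},
    "dest": {"index": "cars-summary"},
    "pivot": {
        "group_by": {"id": {"terms": {"field": "id"}}},
        "aggregations": {
            "engine_temp_at_top_speed": {
                "top_metrics": {
                    "metrics": {"field": "engine_temperature"},
                    "sort": {"speed": "desc"},
                }
            }
        },
    },
}


def top_metric(docs, metric_field, sort_field):
    """Local illustration: metric value of the document with the highest sort value."""
    return max(docs, key=lambda d: d[sort_field])[metric_field]


# Made-up readings for one car.
readings = [
    {"id": "313", "speed": 120, "engine_temperature": 70},
    {"id": "313", "speed": 150, "engine_temperature": 85},
    {"id": "313", "speed": 90, "engine_temperature": 65},
]

engine_temp_at_top_speed = top_metric(readings, "engine_temperature", "speed")
print(engine_temp_at_top_speed)
```

The helper returns a single value, which matches the one-entry limitation mentioned above.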

With a thousand memories

Data can grow larger and larger, and at some point you might want to get rid of outdated entries in your transform destination index. For example, you might have a new car. For this you can define a retention_policy:

"retention_policy": {
    "time": {
        "field": "timestamp",
        "max_age": "30d"
    }
}

This works for both latest and pivot transforms.
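
The effect of the policy can be illustrated with a small Python sketch. The helper apply_retention, the fixed "now" and the sample documents are made up; the sketch only models the idea that entries whose timestamp is older than max_age are dropped from the destination index.

```python
from datetime import datetime, timedelta, timezone


def apply_retention(docs, field, max_age, now):
    """Local illustration: keep documents whose `field` is within max_age of now."""
    cutoff = now - max_age
    return [d for d in docs if datetime.fromisoformat(d[field]) >= cutoff]


# Fixed reference time so the example is reproducible.
now = datetime(2021, 12, 24, tzinfo=timezone.utc)

docs = [
    {"id": "313", "timestamp": "2021-12-23T09:28:48+00:00"},
    {"id": "old-car", "timestamp": "2021-10-01T08:00:00+00:00"},
]

# timedelta(days=30) stands in for "max_age": "30d".
kept = apply_retention(docs, "timestamp", timedelta(days=30), now)
print([d["id"] for d in kept])
```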

Driving home for Christmas

We wish everyone safe travels, whether that is by car or by any other means of transportation.
