Data view in Kibana with the latest timestamp version of a datastream

Hello,

I have a data stream that is updated often. For some graphs I use the full data stream, for example histograms on the @timestamp field. For other cases I would like graphs that cover only the latest version (latest timestamp) of the records.

What approach do you recommend? How would you build a graph that always selects only the records with the latest @timestamp?

Best regards.

This looks like a case for a transform that creates a new index holding the latest data per common entity field. Check the documentation for latest transforms.
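As a rough illustration of what "latest per entity" means, here is a minimal in-memory sketch in Python. It is not how Elasticsearch implements transforms; the function name `latest_per_key` and the sample documents are hypothetical. It keeps, for each value of a unique key, the document with the highest sort field.

```python
def latest_per_key(docs, unique_key, sort_field):
    """Keep one document per unique_key value: the one with the largest sort_field."""
    latest = {}
    for doc in docs:
        key = doc[unique_key]
        # Replace the stored doc only if this one is newer for the same key.
        if key not in latest or doc[sort_field] > latest[key][sort_field]:
            latest[key] = doc
    return list(latest.values())


# Hypothetical sample data; ISO-8601 strings in the same format sort chronologically.
docs = [
    {"doc_hash": "a", "@timestamp": "2023-10-31T12:02:57Z"},
    {"doc_hash": "a", "@timestamp": "2023-10-31T13:02:31Z"},
    {"doc_hash": "b", "@timestamp": "2023-10-31T12:02:57Z"},
]

result = latest_per_key(docs, "doc_hash", "@timestamp")
for doc in sorted(result, key=lambda d: d["doc_hash"]):
    print(doc["doc_hash"], doc["@timestamp"])
```

Note that key "b" keeps its older timestamp because it has no newer document. Under this reading, the output holds the latest document per key, which is not necessarily the same as only the documents carrying the globally latest timestamp.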

Thanks, I was considering that option and I'll go for it.

Hello,

I'm coming back to this.

As far as I can see, I cannot use transforms with a data stream as the source, correct?

Regards.

You can. I'm not sure what you are trying to do, but there is no such limitation.

Transforms can use a Data View or a saved search as the source of the data. You just need a Data View for your data stream, and then you use that Data View as the source of your transform.

Thanks, I think using the Data View as the source for the transform is the solution. I'm just creating a latest-type transform over the data stream.

Regards.

If you create the transform using the Kibana UI you need a Data View or saved search.

If you create it using the API, you can specify the data stream directly.
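A minimal sketch of what such an API request body could look like, built in Python so it can be inspected before sending. The helper `build_latest_transform` is hypothetical, and `datastream_name`, `index_name`, `doc_hash.keyword` and `transform_name` are placeholders; the body would be sent with `PUT _transform/<transform_id>`.

```python
import json


def build_latest_transform(source, dest, unique_key, sort_field,
                           frequency="1h", max_age="1h"):
    """Build a request body for a latest-type transform (hypothetical helper)."""
    return {
        # The data stream name goes directly into source.index.
        "source": {"index": [source]},
        "dest": {"index": dest},
        "frequency": frequency,
        "latest": {"unique_key": [unique_key], "sort": sort_field},
        # Documents older than max_age (by sort_field) are removed from dest.
        "retention_policy": {"time": {"field": sort_field, "max_age": max_age}},
    }


body = build_latest_transform(
    source="datastream_name",       # placeholder data stream name
    dest="index_name",              # placeholder destination index
    unique_key="doc_hash.keyword",  # placeholder entity field
    sort_field="@timestamp",
)

print("PUT _transform/transform_name")
print(json.dumps(body, indent=2))
```

The printed output can be pasted into the Kibana Dev Tools console. No Data View is involved on this path.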

That's useful to know.

On the other hand, once I create the transform and it runs OK, I can see that the destination index contains documents not only with the latest @timestamp from the data stream but also with some earlier timestamps.

When creating the transform I set max age to 1h. The data stream holds one set of documents per hourly timestamp, and in the destination index I can see timestamps from the previous few days (one per hour).

Is there a way to restrict the destination index to only the docs with the latest @timestamp, i.e. the latest hour?

To work around that, I can use the following filter:

"query": {
  "range": {
    "@timestamp": {
      "gt": "now"
    }
  }
},

With that I get just the latest timestamp, which I think I can use. But I'm not sure why, if I set gt to now-1h (one hour before now), I get the two latest timestamps with:

GET index_name/_search
{
  "query": {
    "range": {
      "@timestamp": {
        "gt": "now-1h"
      }
    }
  },
  "size": 0,
  "aggs": {
    "langs": {
      "terms": { "field": "@timestamp", "size": 500 }
    }
  }
}

"buckets": [
  {
    "key": 1698753777000,
    "key_as_string": "2023-10-31T12:02:57.000Z",
    "doc_count": 47444
  },
  {
    "key": 1698757351000,
    "key_as_string": "2023-10-31T13:02:31.000Z",
    "doc_count": 47367
  }
]

since the current time is 13:43.
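One possible explanation, which the thread does not confirm: `now` in date math is evaluated in UTC, while 13:43 may be local time. The sketch below assumes a UTC+1 local clock (so `now` is 12:43 UTC) and checks which of the two bucket keys pass each filter. The helper `matching` is hypothetical.

```python
from datetime import datetime, timedelta, timezone

# Bucket keys from the aggregation output (epoch milliseconds, UTC).
bucket_keys_ms = [1698753777000, 1698757351000]
buckets = [datetime.fromtimestamp(k / 1000, tz=timezone.utc) for k in bucket_keys_ms]


def matching(buckets, now_utc, offset):
    """Return the timestamps strictly greater than now_utc - offset (like "gt")."""
    lower = now_utc - offset
    return [b for b in buckets if b > lower]


# Assumption: 13:43 is local time at UTC+1, so "now" is 12:43 UTC.
now_assumed = datetime(2023, 10, 31, 12, 43, tzinfo=timezone.utc)
gt_now = matching(buckets, now_assumed, timedelta(0))
gt_now_minus_1h = matching(buckets, now_assumed, timedelta(hours=1))

# For contrast: what the same filters would match if 13:43 were already UTC.
now_if_utc = datetime(2023, 10, 31, 13, 43, tzinfo=timezone.utc)
gt_now_if_utc = matching(buckets, now_if_utc, timedelta(0))

print(len(gt_now), len(gt_now_minus_1h), len(gt_now_if_utc))
```

Under the UTC+1 assumption, `gt: now` matches only the 13:02:31Z set and `gt: now-1h` matches both sets, which agrees with the results reported. If 13:43 were UTC, `gt: now` would match nothing, so the stored timestamps appear to run ahead of the cluster's UTC clock.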

You need to share the JSON of the transform you are using; you can get it in Kibana, in the Transforms section.

Also, you need to share what you are seeing in the destination index and what you expect to see.

This is the transform JSON:

{
  "id": "transform_name",
  "authorization": {
    "roles": [
      "superuser"
    ]
  },
  "version": "8.7.0",
  "create_time": 1698754072583,
  "source": {
    "index": [
      "datastream_name"
    ],
    "query": {
      "match_all": {}
    }
  },
  "dest": {
    "index": "index_name"
  },
  "frequency": "1h",
  "latest": {
    "unique_key": [
      "doc_hash.keyword"
    ],
    "sort": "@timestamp"
  },
  "settings": {},
  "retention_policy": {
    "time": {
      "field": "@timestamp",
      "max_age": "1h"
    }
  }
}

The data stream holds one set of documents per hourly timestamp for 12 days, so 288 unique timestamps in total.

I expect to see only the set of documents with the latest timestamp.

What the destination index actually contains is the sets of docs for the last 3 days, so 72 unique timestamps in total (72 sets of documents, one per timestamp).

I checked again after some time and can now see just the documents with the latest timestamp; the transform may simply need some time to reindex/process.
