Looking to get only the latest records from the index

Mohamed_Naufal · June 2, 2022, 8:18am

Hi Community,

I'm currently working on Elasticsearch and looking for an option to retrieve only the latest records from the index. Let me break down this query,

We're using AWS Data Stream to load the logs and then, it is sent to Firehose, then to Elasticsearch,
We receive like lakhs of records in a span of few couple of minutes.
Elasticsearch takes 30 secs gap to retrieve a set of records from the Firehose.
So, using Dev Tools, I do the aggregation on the data that I get from the Firehose. Then the curl is shared to the Developer, and then the formatted data is stored in s3 for later use.

The problem is, when I run the query from Dev tool, initially I'd get few hundreds or thousands of records. The next time (after 30 secs) the data gets accumulated, and Total hits increase so on. So, each time I run the script in Dev Tools, I get the old data + new data which i don't want to.
I'm looking for a method, where, when i run the query, I should not get the old records, but only the new one, so I can do aggregation on latest records. Is there any possibilities for this?

Thanks in Advance.

DineshNaik · June 2, 2022, 11:22am

Welcome to the community @Mohamed_Naufal

You can have a date field and use that in your query to look for specific range.

warkolm · June 2, 2022, 11:14pm

Elasticsearch doesn't track state like that for you.
You'd want to look at using a date range query, where you keep track of the last time you ran it.

Mohamed_Naufal · June 3, 2022, 6:04am

Hi Warkolm,

Thanks for your response, But would I be able to dynamically change the date range? And, How can I track data availability to make sure there is no data loss. It'd be much helpful if you can share the sample template.

Mohamed_Naufal · June 3, 2022, 6:05am

Hi Dinesh,

Thanks for your response, If I use the Datefield, would I be able to change it dynamically for each 30 secs? And how guarantee is that there is no data loss on the range that I choose. Please help me if you have any solutions on it.

DineshNaik · June 3, 2022, 11:52am

You can use the now param which represents the current time , that way you always look for latest data , say last one day or you can change it to hours or minutes as per your need .

For ex.

{
  "query": {
    "range": {
      "timestamp": {
        "gte": "now-1d/d",
        "lt": "now/d"
      }
    }
  }
}

Please refer Range query | Elasticsearch Guide [8.2] | Elastic for more details.

Mohamed_Naufal · June 3, 2022, 12:11pm

Hey Dinesh,

This seems to be a satisfying idea, But I feel there would be minimal amount of data loss! Or what's your thoughts in it?

stephenb · June 3, 2022, 2:02pm

@Mohamed_Naufal

Can you help us understand What do you mean "minimal amount of data loss!"?

Are you concerned about data being lost / drop during the ingestion process?

Data is not "Lost" during the query process...

system · July 1, 2022, 2:02pm

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.

Topic		Replies	Views
Aggregating and searching data from only the latest records of an identifier Elasticsearch	2	407	October 28, 2019
Not getting Updated data from Elastic search DB Elasticsearch	4	879	February 3, 2017
How to get only deltas in elasticsearch input Logstash	2	306	October 6, 2021
How can I search for the latest data entered in the indexes? Elasticsearch	18	776	January 24, 2024
ElasticSearch - Latest record per day per ID from elastic Elasticsearch	2	579	March 2, 2021

Looking to get only the latest records from the index

Related topics