Indices generated several hours later than real time

Anh_Nguyen · January 16, 2023, 2:11am

I have a problem with how my data stream's indices are generated. Specifically, the indices are created a few hours to a day later than real time. I want each index to carry the correct date of its generation, but I could not find a way to do this, nor the reason for the delay.
For example, I found that index .ds-logs-cef.log-arcsight.asa-2023.01.15-000020 started its rollover action at 2023-01-16 01:16:10, and I do not have an index for today. This is the /_ilm/explain output for this index:

I have an index template for the data stream of the index above. It does not have aliases, but I have a rollover_alias for this data stream:

And this is my ILM policy:

warkolm · January 17, 2023, 12:23am

Welcome to our community!
Please don't post pictures of text, logs or code. They are difficult to read, impossible to search and replicate (if it's code), and some people may not even be able to see them.

ILM does not guarantee that the rollover will happen every day at 00:00; there are potential delays due to the timing of the ILM process.

Anh_Nguyen · January 17, 2023, 3:15am

I will fix the post, thank you.
Is there any way to force the timing of the ILM process, or another solution for this case (indices generated at the right time)? I worry that my data will be lost.

stephenb · January 17, 2023, 3:30am

This looks like exactly what I would expect.

The index is only 7.74 hours old
The ILM Policy indicates 1 Day Rollover or 50GB
So the index is waiting to be 1 day old or 50GB before it rolls over
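The check described above can be sketched roughly as follows. This is only an illustration of the "1 day or 50GB" logic, not ILM's actual code; the function name `should_rollover` and the 3 GB example size are made up for the sketch (the thread does not state the index's size).

```python
from datetime import timedelta

GB = 1024 ** 3  # Elasticsearch byte-size units are 1024-based

def should_rollover(age: timedelta, size_bytes: int,
                    max_age: timedelta = timedelta(days=1),
                    max_size_bytes: int = 50 * GB) -> bool:
    """Return True as soon as ANY condition is met (the conditions are OR-ed)."""
    return age >= max_age or size_bytes >= max_size_bytes

# The index in this thread was 7.74 hours old; the size here is hypothetical.
print(should_rollover(timedelta(hours=7.74), 3 * GB))
```

With an age of 7.74 hours and a size under 50GB, neither condition holds, so the index keeps waiting.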

Perhaps I am missing something....

Anh_Nguyen · January 17, 2023, 3:37am

Yes, but the date in this index's name is 15.1.2023, which does not match the current date, 16.1.2023. I would like the index to have the same date as the current date. I have no idea how to solve this.

stephenb · January 17, 2023, 3:38am

Are you taking into consideration timezone?

What are you using to ingest the logs?

By the way, you are never actually guaranteed that a log timestamp will always fall on the same date as the date stamp of the index...

For example, what if a host is offline for a bit and comes back online? The latest logs will get ingested into the current index, but perhaps they're older.. from the day before.

So I'm not quite sure what you're actually trying to solve.

Anh_Nguyen · January 18, 2023, 1:32am

I use elastic agent and CEF to ingest logs.
The log timestamp is correct for my country's time zone. But I do not understand what the "index's timestamp" is...
And this is my real topology, so I never turn it off, and I am sure it has never been down so far.

Thank you for your patience with my question (●'◡'●). Is this clearer?

stephenb · January 18, 2023, 2:34am

The index date is probably when you first ran setup or ingested the first event.

The index was created at:

Sunday, January 15, 2023 6:16:05.792 PM

I got that by converting the creation timestamp (epoch milliseconds): 1673806565792
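For anyone who wants to repeat the conversion, here is a small sketch. The epoch value is the one quoted above; the UTC+7 offset is an assumption used only to show how the same instant can fall on a different calendar day locally (it would be consistent with the 2023-01-16 01:16:10 time in the first post, but the thread does not state the time zone). The helper name `ms_to_utc` is made up.

```python
from datetime import datetime, timedelta, timezone

def ms_to_utc(ms: int) -> datetime:
    """Convert epoch milliseconds to an aware UTC datetime without float rounding."""
    return (datetime.fromtimestamp(ms // 1000, tz=timezone.utc)
            + timedelta(milliseconds=ms % 1000))

created_utc = ms_to_utc(1673806565792)
print(created_utc.isoformat())   # 2023-01-15T18:16:05.792000+00:00

# Assumed local offset (UTC+7), purely for illustration.
local = created_utc.astimezone(timezone(timedelta(hours=7)))
print(local.isoformat())         # 2023-01-16T01:16:05.792000+07:00
```

So the index was created on 2023-01-15 in UTC, even though a clock seven hours ahead of UTC already showed 2023-01-16 at that moment.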

So that is the creation date, and the index will roll over when it is 1 day old, and then every day from there. Rollover happens when the index is one day old; it does not line up with the clock, which is what you want.

If you really want it to line up, you can manually roll over the data stream at the time you want. Then it will line up, but since this is a data stream, you're supposed to interact with the data via the data stream, not the backing index.

Trying to do this is fine, but it adds little value and misses the concept of a data stream, which is supposed to abstract you from the underlying indices.
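A manual rollover targets the data stream, not the backing index, via `POST /<data-stream>/_rollover`. The sketch below only builds that request and does not send it. The base URL `http://localhost:9200` is a placeholder, authentication is left out, and the helper names are made up; the regular expression assumes the usual `.ds-<data-stream>-<yyyy.MM.dd>-<generation>` backing index naming.

```python
import re
from urllib.request import Request

# Assumed backing index pattern: .ds-<data-stream>-<yyyy.MM.dd>-<generation>
BACKING_INDEX = re.compile(
    r"^\.ds-(?P<stream>.+)-(?P<date>\d{4}\.\d{2}\.\d{2})-(?P<gen>\d+)$"
)

def data_stream_of(backing_index: str) -> str:
    """Extract the data stream name from a backing index name."""
    match = BACKING_INDEX.match(backing_index)
    if match is None:
        raise ValueError(f"not a data stream backing index: {backing_index!r}")
    return match.group("stream")

def rollover_request(base_url: str, data_stream: str) -> Request:
    """Build (but do not send) a manual rollover request for a data stream."""
    return Request(f"{base_url}/{data_stream}/_rollover", method="POST")

stream = data_stream_of(".ds-logs-cef.log-arcsight.asa-2023.01.15-000020")
req = rollover_request("http://localhost:9200", stream)  # placeholder URL
print(req.get_method(), req.full_url)
```

Sending it (for example with `urllib.request.urlopen`) would need the cluster's real address and credentials.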

Anh_Nguyen · January 18, 2023, 2:43am

In my ILM policy, I set the rollover action so that the index can roll over at any time, but no later than 1 day after its creation time. In the phase definition:

By the way, I want the indices to line up with the clock (the exact date of generation). Were my settings incorrect?

stephenb · January 18, 2023, 4:29am

Data streams do not work that way; there is no guarantee they will roll over exactly at midnight, not in any way I know of. Even if you manually roll over the data stream at exactly midnight UTC, the next rollover will be approximately 24 hours later, not exact, as ILM is a background task and 1d means ~24 hours from rollover, not exactly midnight each day.
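To illustrate why the rollover lands "approximately" 24 hours later, here is a simplified model. It assumes ILM only evaluates the rollover condition on a periodic tick (the `indices.lifecycle.poll_interval` setting defaults to 10 minutes) and ignores any execution delay. The tick alignment (`poll_anchor`) is hypothetical, since the thread does not show when the cluster's ticks fall.

```python
from datetime import datetime, timedelta, timezone

def first_poll_at_or_after(target: datetime, poll_anchor: datetime,
                           poll_interval: timedelta) -> datetime:
    """Earliest tick (poll_anchor + k * poll_interval) that is >= target."""
    if target <= poll_anchor:
        return poll_anchor
    elapsed = target - poll_anchor
    ticks = -((-elapsed) // poll_interval)  # ceiling division on timedeltas
    return poll_anchor + ticks * poll_interval

created = datetime(2023, 1, 15, 18, 16, 5, 792000, tzinfo=timezone.utc)
eligible = created + timedelta(days=1)                            # max_age of 1d reached
poll_anchor = datetime(2023, 1, 15, 18, 10, tzinfo=timezone.utc)  # hypothetical tick
rolled = first_poll_at_or_after(eligible, poll_anchor, timedelta(minutes=10))

print(eligible.isoformat())  # 2023-01-16T18:16:05.792000+00:00
print(rolled.isoformat())    # 2023-01-16T18:20:00+00:00
```

In this model the new index is created a few minutes after the old one turns one day old, and the next 1d window is counted from that new creation time, so the boundary never snaps back to midnight on its own.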

Good luck, poke around, perhaps you can find a way. But data streams and ILM are not meant to work the way you want.

You can look up legacy daily indices; that is more what you want. You would probably need to use Logstash or something.

In my view you'll do all that and there'll be very little actual benefit. And even then, at the edges it may not be perfect. Not sure why you're trying to make it perfect.

Good luck, poke around. Perhaps you'll find a way.

Anh_Nguyen · January 18, 2023, 4:37am

I'll try that. Thank you. Have a good day (❁´◡`❁)

Christian_Dahlqvist · January 18, 2023, 6:15am

Before rollover and data streams were available, the standard way to create time-based indices was to create indices with the date in the name. All data having a timestamp belonging to that date would go into that index. This potentially gives you the data separation you are looking for, although the scheme is generally based on UTC time and not the local time zone.
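As a rough sketch of that date-in-the-name scheme: the target index is derived from each event's own timestamp, converted to UTC. The `logs` prefix, the function name and the UTC+7 example offset are placeholders, not taken from any real configuration.

```python
from datetime import datetime, timedelta, timezone

def daily_index_name(prefix: str, event_time: datetime) -> str:
    """Name of the daily index an event belongs to, based on its UTC date."""
    if event_time.tzinfo is None:
        raise ValueError("event_time must be timezone-aware")
    return f"{prefix}-{event_time.astimezone(timezone.utc):%Y.%m.%d}"

# An event stamped 01:16:10 on 16 January in a UTC+7 zone (assumed offset)
# still belongs to the 15 January index, because the date is taken in UTC.
event = datetime(2023, 1, 16, 1, 16, 10, tzinfo=timezone(timedelta(hours=7)))
print(daily_index_name("logs", event))  # logs-2023.01.15
```

Note that even with this scheme, an event shortly after local midnight can land in the previous day's index when the local zone is ahead of UTC.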

The problem with this approach is that it easily results in very uneven sized shards, which can be a performance problem. It was also possible for data arriving very late to be indexed into indices that had already been moved to the warm or cold phase, which could also cause performance problems.

Rollover and data streams index into a single index using a write alias, which solves the second problem mentioned. Rollover also allows you to switch to a new index based on age and/or shard size. This means that you can generate multiple indices during a very busy day and have an index cover multiple days worth of data when the data volumes are low without changing configuration. This will give you shards of more even size, which is generally good and desirable for this type of data.

As retention is managed at the index level, this means that you may keep data around in the cluster a bit longer if the index to be deleted covers multiple days, but you can still guarantee that data covering a certain retention period is available, so this is usually not an issue.

I would recommend that you think carefully about why you feel you need the level of control you describe, as you will be trading away the benefits I described and potentially making your cluster less performant and more difficult to manage.

Anh_Nguyen · January 18, 2023, 6:44am

Thank you (≧∇≦)ﾉ. I appreciate this.

system · February 15, 2023, 6:44am

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.
