I have a problem with my data stream's index generation. The backing indices are created a day or a few hours later than the date in their names suggests. I want each index to carry the correct date, but I could not find a way to achieve that, nor the reason for the delay.
For example, I found that the index .ds-logs-cef.log-arcsight.asa-2023.01.15-000020 started its rollover action at 2023-01-16 01:16:10, and I do not yet have an index for today. This is the /_ilm/explain output for that index:
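(The output was originally posted as a screenshot. For reference, the same information can be fetched as text with a request like the following, using the backing index name from above:)

```
GET .ds-logs-cef.log-arcsight.asa-2023.01.15-000020/_ilm/explain
```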
Welcome to our community!
Please don't post pictures of text, logs or code. They are difficult to read, impossible to search and replicate (if it's code), and some people may not even be able to see them.
ILM does not guarantee that the rollover will happen every day at 00:00, there are potential delays due to the timing of the ILM process.
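ILM checks rollover conditions on a schedule controlled by the `indices.lifecycle.poll_interval` cluster setting (10 minutes by default), so a rollover can fire up to one poll interval after a condition is met. A sketch of how to inspect the setting, and tighten it if you really need to (a shorter interval adds cluster overhead, so use with care):

```
GET _cluster/settings?include_defaults=true&filter_path=*.indices.lifecycle.poll_interval

PUT _cluster/settings
{
  "persistent": {
    "indices.lifecycle.poll_interval": "1m"
  }
}
```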
I use Elastic Agent and the CEF integration to ingest logs.
The log timestamp is correct for my country's time zone, but I do not understand what the "index's timestamp" refers to...
And this is my production topology, so I never turn it off, and I am sure it has never been down until now.
Thank you for your patience with my question (●'◡'●). Is this clearer?
The index date is probably when you first ran setup or ingested the first event.
The index was created at:
Sunday, January 15, 2023 6:16:05.792 PM
I got that by converting the epoch timestamp 1673806565792.
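For anyone following along, that conversion can be done in a few lines of Python (the value is epoch milliseconds, as stored in the index's `creation_date` setting):

```python
from datetime import datetime, timedelta, timezone

# Epoch milliseconds from the index's creation_date setting
creation_ms = 1673806565792

# Add the milliseconds to the Unix epoch to get an exact UTC datetime
created = datetime(1970, 1, 1, tzinfo=timezone.utc) + timedelta(milliseconds=creation_ms)
print(created.isoformat())  # 2023-01-15T18:16:05.792000+00:00
```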
So that is the creation date, and the index will roll over when it is one day old, and then roughly every day from there. Rollover happens when the index is one day old, not when the clock hits midnight, which is what you want.
If you really want it to line up, you can manually roll over the data stream at the time you want... Then it will line up, but since this is a data stream, you're supposed to interact with the data via the data stream, not the backing indices.
Trying to do this is fine, but it adds little value and misses the point of a data stream, which is supposed to abstract you from the underlying indices.
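For example, a manual rollover is requested against the data stream name, not the backing index (the name here is inferred from the backing index above; adjust to your actual data stream). You could run this from a scheduled job at midnight:

```
POST logs-cef.log-arcsight.asa/_rollover
```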
Data streams do not work that way; there is no guarantee they will roll over exactly at midnight, not in any way I know of. Even if you manually roll over the data stream at exactly midnight UTC, the next rollover will happen approximately, not exactly, 24 hours later, as ILM is a background task and `1d` means ~24 hours from the last rollover, not exactly midnight each day.
Good luck... poke around, perhaps you can find a way... but with data streams and ILM it is not meant to work the way you want.
You can look up legacy daily indices; that is closer to what you want. You would probably need to use Logstash or something similar.
In my view, you'll do all that and there will be very little actual benefit... and even then, at the edges it may not be perfect. I'm not sure why you're trying to make it perfect.
Before rollover and data streams were available, the standard way to create time-based indices was to create indices with the date in the name. All data with a timestamp belonging to that date would go into that index. This potentially gives you the data separation you are looking for, although the scheme is generally based on UTC time, not the local timezone.
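As a sketch, this is the classic Logstash pattern for daily indices (the index prefix here is illustrative, and Logstash renders the date in UTC by default):

```
output {
  elasticsearch {
    hosts => ["https://localhost:9200"]
    index => "cef-%{+YYYY.MM.dd}"
  }
}
```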
The problem with this approach is that it easily results in very uneven sized shards, which can be a performance problem. It was also possible for data arriving very late to be indexed into indices that had already been moved to the warm or cold phase, which could also cause performance problems.
Rollover and data streams index into a single index using a write alias, which solves the second problem mentioned. Rollover also allows you to switch to a new index based on age and/or shard size. This means that you can generate multiple indices during a very busy day and have an index cover multiple days worth of data when the data volumes are low without changing configuration. This will give you shards of more even size, which is generally good and desirable for this type of data.
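A minimal sketch of a hot-phase rollover combining age and size conditions (the policy name and thresholds are illustrative; rollover fires when either condition is met):

```
PUT _ilm/policy/daily-logs-policy
{
  "policy": {
    "phases": {
      "hot": {
        "actions": {
          "rollover": {
            "max_age": "1d",
            "max_primary_shard_size": "50gb"
          }
        }
      }
    }
  }
}
```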
As retention is managed at the index level this means that you may keep data around in the cluster a bit longer if the index to be deleted covers multiple days but you can still guarantee that data covering a certain retention period is available, so this is usually not an issue.
I would recommend thinking carefully about why you feel you need the level of control you describe, as you would be trading away the benefits I described and potentially making your cluster less performant and more difficult to manage.