Pros and cons of daily versus monthly indices?

Hello

Following the advice given here (thank you), in a few months I'll be able to reduce the total number of shards on my single-node Elastic Stack from approximately 5000 to 160. This should generally improve performance and avoid overloading the single node.

I still have some worries, though :slight_smile: What are the pros and cons of changing from daily to monthly indices?

Thank you

Before anyone asks why I don't merge them now...

The client asks for a retention period of 180 days. If I merge them, the new index gets a creation date of today, not months ago, which breaks the ILM policy that deletes an index x days after its creation.
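To make the concern concrete, here is a minimal sketch of the date arithmetic, assuming a rule that deletes an index a fixed number of days after its age reference date. The two dates are hypothetical and only illustrate the gap; nothing here talks to Elasticsearch.

```python
from datetime import date, timedelta

RETENTION_DAYS = 180  # retention period stated above


def deletion_date(age_reference: date, retention_days: int = RETENTION_DAYS) -> date:
    """Date on which a delete-after-N-days rule would fire, counted from the reference date."""
    return age_reference + timedelta(days=retention_days)


# Hypothetical dates, for illustration only.
original_creation = date(2021, 6, 1)   # when the old daily index was created
reindex_day = date(2021, 9, 15)        # when the data is copied into a new index

print("Deleted if age counts from the original creation:", deletion_date(original_creation))
print("Deleted if age counts from the reindex day:      ", deletion_date(reindex_day))
print("Extra days the data would be kept:",
      (deletion_date(reindex_day) - deletion_date(original_creation)).days)
```

The extra days equal the gap between the original creation date and the reindex day, which is exactly the drift the retention requirement cannot tolerate.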

You could reindex the old data into indices named data-2021-06, data-2021-07, data-2021-08, data-2021-09...

That would work well, no?
I see the problem with ILM though. I think you can use index.lifecycle.origination_date (see "Index lifecycle management settings in Elasticsearch", Elasticsearch Guide [7.14]).

Read also "Manage existing indices", Elasticsearch Guide [7.14].
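Since index.lifecycle.origination_date is specified as a Unix epoch value in milliseconds, a small helper can derive it from a monthly index name. This is only a sketch: the function name is made up for illustration, the data-YYYY-MM naming is taken from the suggestion above, and using the first day of the month in UTC is an assumption, not a recommendation from the docs.

```python
from datetime import datetime, timezone


def origination_millis(index_name: str) -> int:
    """Epoch milliseconds (UTC) for the first day of the month in a name like 'data-2021-06'."""
    # Take the last two dash-separated parts as year and month.
    year, month = index_name.rsplit("-", 2)[-2:]
    first_day = datetime(int(year), int(month), 1, tzinfo=timezone.utc)
    return int(first_day.timestamp() * 1000)


for name in ["data-2021-06", "data-2021-07", "data-2021-08", "data-2021-09"]:
    print(name, origination_millis(name))
```

One thing to weigh with monthly indices: if the age is counted from the first day of the month, documents written at the end of that month are deleted roughly a month earlier than 180 days after they were written. If every document must survive the full retention period, a later reference date (for example the last day of the month) may be the safer assumption.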

The thing is, this needs to be supported in the Merge API, right?

When I do the merge and set index.lifecycle.origination_date to the epoch date, it will create the NEW index with that date/time, correct?

What is the "Merge API"?

I don't believe it's related. Force merge just merges segments within a shard of a given index. It does not merge indices, if that is what you are looking for.

Ah, my mistake. I meant the reindex API.

At a quick glance, it doesn't seem to support the index.lifecycle.origination_date parameter.

(BTW, we are getting off-topic here)

The reindex API reads data from one or many indices using the Scroll API and then loads it into a destination index using the Bulk API.
The destination index is like any normal index, so you can create it first and set the index.lifecycle.origination_date setting on it, IMO.
Then call the reindex API.
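The two steps above can be sketched as the request bodies you would send, built here as plain Python dictionaries so nothing is actually sent to a cluster. The policy name, the source index pattern, and the helper function names are hypothetical; the endpoints (PUT on the index name, POST to _reindex) and the setting names are the standard ones, but this is untested against a real 7.14 node.

```python
import json


def create_index_request(dest_index: str, policy_name: str, origination_ms: int) -> dict:
    """Request that creates the destination index with ILM settings applied up front."""
    return {
        "method": "PUT",
        "path": f"/{dest_index}",
        "body": {
            "settings": {
                "index.lifecycle.name": policy_name,
                # Age reference for ILM, in epoch milliseconds.
                "index.lifecycle.origination_date": origination_ms,
            }
        },
    }


def reindex_request(source_pattern: str, dest_index: str) -> dict:
    """Request that copies documents from the old indices into the destination index."""
    return {
        "method": "POST",
        "path": "/_reindex",
        "body": {
            "source": {"index": source_pattern},
            "dest": {"index": dest_index},
        },
    }


# Hypothetical names: 'retention-180d' policy and 'logs-2021.06.*' daily indices.
steps = [
    create_index_request("data-2021-06", "retention-180d", 1622505600000),
    reindex_request("logs-2021.06.*", "data-2021-06"),
]
for step in steps:
    print(step["method"], step["path"])
    print(json.dumps(step["body"], indent=2))
```

Note that origination_date goes on the destination index as an index setting, so the existing ILM policy can probably be reused as-is rather than copied.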

OK, so if I understand correctly.....

I should copy an ILM policy I use BUT set an index.lifecycle.origination_date.

Create an empty index using that new ILM policy.

Copy from my old index (with the original ILM policy) to the new index (with the copied ILM policy). And you are saying that the index creation date on that new one should be the one I established?

It sounds interesting... I'm not exactly sure it works like that.

It would be great if you could try to reproduce this scenario in a test environment and make sure it works as you'd expect.
And update this thread so the community will benefit from your experience :slight_smile:

As you probably well know, production IS the test scenario :wink: but sadly, it is for an external client so I cannot do it.

It does look good on paper (documentation) and should work. Plus, building a PowerShell script for it shouldn't be THAT difficult.
