Around 1500 indexes to snapshot and restore in another cluster. How would you do it?

Dear community members,

I recently encountered a challenge involving the migration of an entire cluster to a different version of Elasticsearch, specifically one that includes archive indexes.

A few days ago, I posted a query seeking assistance on this matter and received valuable feedback. Unfortunately, due to licensing constraints on the customer's side (restoring 6.x snapshots directly as archive indexes would mean going from Platinum to Enterprise), we are unable to perform a direct snapshot and restore from Elasticsearch 6.x to 8.12. Consequently, I have to run the migration in two steps: first from version 6 to 7, and then from 7 to 8.

The production environment consists of approximately 1500 indexes, each ranging in size from 2 to 4GB, with data organized by day and month.

I am seeking advice on best practices for handling this scenario. How do seasoned professionals approach such operations? Is scripting commonly employed to, for instance, snapshot an entire month at a time and then restore it?
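For instance, I imagine a monthly cycle along these lines (purely a sketch; the repository name, hosts, and index pattern are hypothetical, and both clusters would need access to the same snapshot repository):

```bash
# Snapshot one month's worth of daily indices on the source cluster...
curl -s -X PUT "http://es6-host:9200/_snapshot/migration_repo/appname-2023.01?wait_for_completion=true" \
  -H 'Content-Type: application/json' \
  -d '{ "indices": "appname-2023.01.*" }'

# ...then restore that month on the destination cluster from the shared repository.
curl -s -X POST "http://es7-host:9200/_snapshot/migration_repo/appname-2023.01/_restore" \
  -H 'Content-Type: application/json' \
  -d '{ "indices": "appname-2023.01.*" }'
```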

I am keen to hear your insights on managing this data migration with a considerable number of indexes. Your expertise and experiences will be highly valuable in guiding me through this process.

Thank you in advance for your assistance!!!

One option is to use reindex from remote: you basically read from the 6.x cluster and write to the 8.x cluster.
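A minimal sketch of what that could look like, assuming hypothetical host names (es6-host, es8-host) and index names:

```bash
# Run this against the NEW (8.x) cluster; it pulls documents from the old one.
# Prerequisite: elasticsearch.yml on the 8.x nodes must allow the remote, e.g.
#   reindex.remote.whitelist: "es6-host:9200"
curl -s -X POST "https://es8-host:9200/_reindex?wait_for_completion=false" \
  -H 'Content-Type: application/json' -d'
{
  "source": {
    "remote": { "host": "http://es6-host:9200" },
    "index": "appname-2023.01.01"
  },
  "dest": { "index": "appname-2023.01.01" }
}'
# wait_for_completion=false returns a task id you can poll with GET _tasks/<id>
```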


You could also use Logstash to do the same thing, with the Logstash Elasticsearch input and output plugins. We use Logstash a lot for these types of operations.
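A rough sketch of such a pipeline (hypothetical hosts and index pattern, security settings omitted); docinfo carries the original index name and document id through, so each document lands in an index with the same name:

```bash
cat > es6-to-es8.conf <<'EOF'
input {
  elasticsearch {
    hosts          => ["http://es6-host:9200"]
    index          => "appname-*"
    docinfo        => true
    docinfo_target => "[@metadata][doc]"
  }
}
output {
  elasticsearch {
    hosts       => ["https://es8-host:9200"]
    index       => "%{[@metadata][doc][_index]}"   # keep the original index name
    document_id => "%{[@metadata][doc][_id]}"      # keep the original document id
  }
}
EOF
bin/logstash -f es6-to-es8.conf
```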


I'm investigating this direction.

I've already run a few tests, but I'm having issues with network communication. Nothing related to Elastic itself; some ports need to be opened between the Elastic 6 and 8 clusters.

I'm a bit concerned about network throughput and read/write speeds... since we have around 1500 indexes of 2 to 4 GB each...

In terms of operations, the reindex command wouldn't be run one index at a time, right?

What do you all use? Some kind of scripting? :S

1500 indexes of this size would give you something close to 6 TB.

Do your indexes follow a naming pattern? You could reduce the number of indices in your destination cluster.

Say, reindex the daily indices from January into a single monthly index.

For example, if you have indices named like this:

  • appname-2023.01.01
  • appname-2023.01.02
  • ...
  • appname-2023.01.31

You could set your destination index to be just appname-2023.01; in the source of the reindex request you just use appname-2023.01.*, and every daily index will be reindexed into the same destination index.
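As a sketch (hypothetical hosts; the remote block only applies if you combine this with reindex from remote):

```bash
# Merge all of January's daily indices into a single monthly index.
curl -s -X POST "https://es8-host:9200/_reindex?wait_for_completion=false" \
  -H 'Content-Type: application/json' -d'
{
  "source": {
    "remote": { "host": "http://es6-host:9200" },
    "index": "appname-2023.01.*"
  },
  "dest": { "index": "appname-2023.01" }
}'
```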


Yeap, they are like that.

I don't know what impact reducing the number of indexes (so to speak) would have on the development team and the business.

I have no idea if I really need to do it 1-for-1 or if I could merge them like you suggested.

But yes, they seem to be split by day, since they have names like name_2024022001, and so on.

But if a * works for a 1-for-1 situation... that would be awesome for me.

Hi there,

One other thing that you may wish to look at in the future is using ILM (index lifecycle management).

You seem to have lots of small indexes; with ILM, indices roll over at an efficient shard size, and it also allows you to set retention periods in Kibana for index housekeeping.

We migrated all of our indexes to ILM-managed indices a few months back. We still have daily indexes too, and as these share the same index pattern, both can be queried at the same time.

We are just letting the daily indexes drop off based on their retention.

I don't think that works. In a reindex you can use a wildcard in the source index, meaning every index that matches the pattern will be reindexed into the same destination, but if you want to reindex and keep the same names, you will need to do it one by one. See the sketch below.
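A sketch of that loop, assuming hypothetical cluster URLs, no authentication, and that the 8.x cluster already whitelists the old one for reindex from remote:

```bash
#!/usr/bin/env bash
SRC="http://es6-host:9200"    # old 6.x cluster (hypothetical)
DEST="https://es8-host:9200"  # new 8.x cluster (hypothetical)

# List every matching index on the old cluster, one name per line.
for idx in $(curl -s "$SRC/_cat/indices/appname-*?h=index"); do
  echo "Reindexing $idx ..."
  # wait_for_completion=true runs them one at a time instead of
  # queueing ~1500 concurrent reindex tasks.
  curl -s -X POST "$DEST/_reindex?wait_for_completion=true" \
    -H 'Content-Type: application/json' -d"
  {
    \"source\": { \"remote\": { \"host\": \"$SRC\" }, \"index\": \"$idx\" },
    \"dest\":   { \"index\": \"$idx\" }
  }"
  echo
done
```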

It depends on how the indexes are being queried and written.

Also, having multiple small daily indices is not optimal. The recommendation is to keep shard sizes between roughly 40 and 50 GB, which is hard to achieve with daily indices. The best approach is to change the indexing strategy to use an ILM-managed index that rolls over when it reaches a specified size or age.

But this will probably require changes on your side, to how you are indexing and reading the data.
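For what it's worth, a sketch of such a policy (hypothetical policy name and retention; the 50 GB threshold matches the recommendation above):

```bash
curl -s -X PUT "https://es8-host:9200/_ilm/policy/appname-policy" \
  -H 'Content-Type: application/json' -d'
{
  "policy": {
    "phases": {
      "hot": {
        "actions": {
          "rollover": { "max_primary_shard_size": "50gb", "max_age": "30d" }
        }
      },
      "delete": {
        "min_age": "90d",
        "actions": { "delete": {} }
      }
    }
  }
}'
```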
