Snapshot, Hot/Warm Architecture and Upgrade

Hi there,

I have a few questions here:

  • First, if I have an index of 4.5 TB, what is the best way to back up that much data?
  • Second, my existing cluster has 25 data nodes in total. If I want to apply a hot/warm architecture, what is the best practice for it?
  • Third, my cluster is currently running version 7.17. If I want to upgrade my cluster, which contains 25 data nodes, 3 master nodes, 2 coordinating nodes, and 1 monitoring site, what is the best practice to upgrade it?

Your answer will mean a lot.
Thanks

The only supported way to back up Elasticsearch is through the snapshot API.
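
As a rough sketch (the repository name, filesystem path, snapshot name, and index pattern below are placeholders, and an fs repository requires the path to be listed under path.repo on every node), registering a repository and taking a snapshot looks like this:

```
PUT _snapshot/my_fs_repo
{
  "type": "fs",
  "settings": {
    "location": "/mnt/es_backups"
  }
}

PUT _snapshot/my_fs_repo/snapshot_1?wait_for_completion=false
{
  "indices": "my-big-index-*",
  "include_global_state": false
}
```

For 4.5 TB you would typically point the repository at a shared filesystem, S3, or another cloud repository type rather than local disk.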

Is this a single index or a set of time-based indices? How many primary and replica shards do you have?

A hot warm architecture generally assumes you have time-based indices of some sort and a defined retention period. Is that the case for your use case?
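
If it is, the retention and the hot-to-warm transition are usually driven by an ILM policy. A hedged sketch (the policy name, rollover thresholds, and phase timings are placeholders, and rollover assumes you write through an alias or data stream):

```
PUT _ilm/policy/my_hot_warm_policy
{
  "policy": {
    "phases": {
      "hot": {
        "actions": {
          "rollover": {
            "max_primary_shard_size": "50gb",
            "max_age": "1d"
          }
        }
      },
      "warm": {
        "min_age": "2d",
        "actions": {
          "forcemerge": {
            "max_num_segments": 1
          }
        }
      },
      "delete": {
        "min_age": "30d",
        "actions": {
          "delete": {}
        }
      }
    }
  }
}
```

In 7.17 the move from the hot to the warm data tier happens automatically when an index enters the warm phase, as long as your nodes have the corresponding data tier roles.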

I want to know:

  • What does backup & restore performance depend on, and
  • What are the parameters to speed up the process?

That is a single index, and it has 25 primary shards plus replicas. The store.size is 4.5 TB and the pri.store.size is 2.2 TB.

Umm, no. What I want to know is: given my current cluster condition, where all data nodes are in the hot tier by default (CMIIW) because I don't explicitly set the tier for each data node, how do I make the transition to implement this hot/warm architecture?

It depends on a lot of factors, e.g. infrastructure and available resources, so I would recommend you set up a repository and test.
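
If you want knobs to experiment with while testing, the per-repository throttle settings are the usual starting point. A sketch, with placeholder repository name, path, and values (re-issuing the PUT updates an existing repository's settings):

```
PUT _snapshot/my_fs_repo
{
  "type": "fs",
  "settings": {
    "location": "/mnt/es_backups",
    "max_snapshot_bytes_per_sec": "200mb",
    "max_restore_bytes_per_sec": "200mb"
  }
}
```

Note that these limits apply per node, so effective throughput also depends on how many nodes hold shards of the index.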

Sounds like you have very large shards, which can cause performance problems.
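
You can check the per-shard sizes with the cat shards API, e.g. (the index name is a placeholder):

```
GET _cat/shards/my-big-index?v&h=index,shard,prirep,store&s=store:desc
```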

If you have a single large index, a hot/warm architecture does not make any sense. What are you hoping to achieve? What is the problem you are looking to solve?

Yeah, pretty large. Each shard has a size of 93-94 GB.

Actually, it's not the only one. There are a few indices with a size of 4.2-4.5 TB, and what I want to achieve is to extend their retention.

Regarding extending retention, do you have any other suggestions for that?

Currently, all those big indices have a retention of 15 days.

It sounds like you are not using time-based indices, is that correct? If so, do you delete data through delete-by-query?

Is your data immutable or are you performing updates?

No, we have been using time-based indices from the start until now. That big index really is just the data that comes in over one day. Here is the capture:

I can see a few deleted documents in the indices. Does this mean that you are performing updates/deletes, or might this be a side effect of you specifying your own document IDs?

Yeah, but what is the correlation between deleted documents and backing up all those indices?

This is not related to backup, but switching to a hot/warm architecture.

For backup speed you will need to test.
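
On the mechanics of the hot/warm switch itself, and only as a sketch for when your indexing pattern supports it: the nodes that should hold older data get an explicit warm role in elasticsearch.yml (e.g. node.roles: [ data_warm ]), and an existing index can be moved to the warm tier by updating its tier preference (the index name is a placeholder):

```
PUT my-big-index-000001/_settings
{
  "index.routing.allocation.include._tier_preference": "data_warm,data_hot"
}
```

With ILM in place this happens automatically in the warm phase, so you normally would not set it by hand.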

OK, I can confirm that it is a side effect of specifying my own document IDs.
