I need to back up data and dump it from the production environment to stage.
The size of one index is about 20G.
I've tried using elasticsearch-dump but it is too slow.
I also tried using a snapshot, but I had difficulties renaming the index.
I know a snapshot can easily restore data under the same index name to another cluster, but I'm not sure whether it can also restore data to a new index on the same cluster.
Does anyone have a better solution, or advice on how to do this?
You can try reindex from remote, or, if the clusters cannot communicate with each other, snapshot and restore in combination with rename_pattern and rename_replacement will work as well.
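Roughly, a reindex-from-remote call looks like the sketch below. The host, credentials, and index names are just placeholders, and the destination (stage) cluster needs the production host whitelisted via `reindex.remote.whitelist` in its elasticsearch.yml:

```
POST _reindex
{
  "source": {
    "remote": {
      "host": "http://prod-host.example.com:9200",
      "username": "elastic",
      "password": "changeme"
    },
    "index": "posts"
  },
  "dest": {
    "index": "posts"
  }
}
```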
To my understanding, reindex is more like a migration, moving data from index A to index B, so only one copy of the data exists. But what I want to do is copy the data: I'd have two indices, A and B, with the same data. Am I understanding you correctly?
Besides that, I have a few questions about using snapshots.
Let's say I have a repo called posts and a snapshot called post_20191218. The original index is posts and I want to restore the data to an index called posts_2019. So the rename_pattern is 'posts' and the rename_replacement is '$1_2019'?
I figured it out from the error message returned by the API, lol. The answer is `post` to `post_2019`. That's it.
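For anyone who hits the same thing, the restore call looks roughly like this, using the repo, snapshot, and index names from my example above (note that rename_pattern needs a capture group for $1 to refer to):

```
POST _snapshot/posts/post_20191218/_restore
{
  "indices": "posts",
  "rename_pattern": "(posts)",
  "rename_replacement": "$1_2019"
}
```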
My second question: if I want to restore data from production to stage using a snapshot, do I have to transfer the snapshot from production to stage? Or is there a remote API that lets me access the snapshot on the remote machine and restore it locally?
Thank you so much. As a newcomer to Elasticsearch, I find your reply really helpful!
Reindex will also leave you with two indices, but there is one fundamental difference: reindex indexes each document individually again, doing all the analysis before storing the data. This is usually used when you change the mapping.
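So copying index A to index B on the same cluster is just a plain reindex; a call like the following (index names are only examples) leaves both the source and the new index in place:

```
POST _reindex
{
  "source": { "index": "posts" },
  "dest": { "index": "posts_2019" }
}
```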
Restoring an index basically just copies the original data from the repository and then loads the index from that data, involving much less CPU as there is no reindexing work.
Usually you snapshot your data to a remote repository, which you then also make accessible to the other cluster. Another possibility is to copy your data somewhere it is available via HTTP and use a read-only repository, which cannot be used for snapshotting, only for restoring.
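As a sketch, a read-only URL repository on the stage cluster can be registered like this; the repository name and URL below are placeholders, and the URL has to be allowed via `repositories.url.allowed_urls` in elasticsearch.yml:

```
PUT _snapshot/prod_backups
{
  "type": "url",
  "settings": {
    "url": "https://backup-host.example.com/es-snapshots/"
  }
}
```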