I have an ELK stack deployed in Docker containers on a private network, and I have to export some of the indices to a device (such as a laptop) and import them into another Elasticsearch instance on another private network.
These two networks can't connect to each other, so I have to save the indices as files or something similar.
The process I found on the internet is this (link). The steps to bring indices from instance A to instance B are:
Create a snapshot with the Create Snapshot API, using the same name on A and B. When creating the snapshot on A, specify which indices to export (for example, specify "index_2022.07.29" with the body {"indices":"index_2022.07.29"}).
Copy all contents of the folder that the snapshot repository is registered with (A instance).
Remove all contents of the folder that the snapshot repository is registered with (B instance), and paste all contents from A into this folder.
Restart Elasticsearch on B.
Restore on B with the Restore Snapshot API, specifying "index_2022.07.29" with the body {"indices":"index_2022.07.29"}.
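For reference, the two API calls in the steps above could be sketched in Python roughly like this. This is only an illustration: the repository name "my_backup", the snapshot name "snapshot_1" and the base URLs are made-up placeholders, the wait_for_completion=true parameter is my addition so the call returns only after the snapshot is finished, and the send helper assumes a cluster without authentication or TLS.

```python
import json
import urllib.request


def create_snapshot_request(base_url, repo, snapshot, indices):
    """Build the Create Snapshot call for instance A (repo/snapshot names are placeholders)."""
    url = f"{base_url}/_snapshot/{repo}/{snapshot}?wait_for_completion=true"
    return "PUT", url, {"indices": indices}


def restore_snapshot_request(base_url, repo, snapshot, indices):
    """Build the Restore Snapshot call for instance B."""
    url = f"{base_url}/_snapshot/{repo}/{snapshot}/_restore"
    return "POST", url, {"indices": indices}


def send(method, url, body=None):
    """Send one request; assumes no authentication and plain HTTP."""
    data = json.dumps(body).encode("utf-8") if body is not None else None
    request = urllib.request.Request(
        url, data=data, method=method,
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read().decode("utf-8"))
```

Usage would be something like send(*create_snapshot_request("http://localhost:9200", "my_backup", "snapshot_1", "index_2022.07.29")) on A, and the matching restore call on B after the files have been copied.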
I don't know if there's an officially recommended process for this export-import requirement. I am worried that the steps I found are risky or unstable.
And if I want to run the export-import process several times, for example exporting index_2022.07.27, index_2022.07.28, and index_2022.07.29 from A as separate folders (or files) and restoring each index on B one by one, what should I do? (For example, should I delete the snapshot with the Delete Snapshot API at the end of the import process?)
That's almost the right process. You're pretty much taking a repository backup and then restoring it at the new location. However, it's important that you don't modify the contents of a repository while it's registered with Elasticsearch (see these docs for details).
To follow the official repository backup process, you should unregister the repository on A before step 3, and you should not register the repository on B until after step 4.
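As a rough sketch of the A side of that advice: unregister the repository first, and only then copy its folder. The example below packs the folder into a single archive for transport. The function names, the repository name and the paths are placeholders, and using a tar.gz archive is just one convenient way to carry the files, not something the docs require.

```python
import tarfile


def unregister_repository_request(base_url, repo):
    """Build the call that unregisters a repository (snapshot files stay on disk)."""
    return "DELETE", f"{base_url}/_snapshot/{repo}", None


def archive_repository(repo_dir, archive_path):
    """Pack the whole repository folder into one .tar.gz file.

    Call this only after the repository has been unregistered on A,
    so Elasticsearch is not writing to the folder during the copy.
    Keep archive_path outside repo_dir.
    """
    with tarfile.open(archive_path, "w:gz") as tar:
        tar.add(str(repo_dir), arcname=".")
    return archive_path
```

On B the archive would be unpacked into the (still unregistered) repository folder, and the repository registered only after that.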
@DavidTurner Thanks for your help! I tried it and the index was successfully imported on the B instance!
I have a more advanced use case that I need help with. The process is now:
1. Register the snapshot repository on the A instance.
2. Create a snapshot on the A instance, specifying one index.
3. Unregister the snapshot repository on the A instance.
4. Copy all contents of the folder that the snapshot repository was registered with (A instance).
5. Paste all contents from the A instance into the repository folder of the B instance (the first time, this folder is empty).
6. Register the snapshot repository on the B instance.
7. Restart Elasticsearch on B. (Is this necessary?)
8. Restore on B, specifying the index that was specified when creating the snapshot on the A instance.
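The register step on either instance could look roughly like the sketch below, together with a call to list the snapshots the repository contains, which is a way to check on B that the copied files are visible before restoring. This assumes a shared file system ("fs") repository; the repository name and location are placeholders, and the location has to be under a path listed in path.repo in elasticsearch.yml.

```python
def register_fs_repository_request(base_url, repo, location):
    """Build the call that registers a shared file system repository."""
    body = {"type": "fs", "settings": {"location": location}}
    return "PUT", f"{base_url}/_snapshot/{repo}", body


def list_snapshots_request(base_url, repo):
    """Build the call that lists all snapshots in a registered repository."""
    return "GET", f"{base_url}/_snapshot/{repo}/_all", None
```

If the snapshot created on A shows up in the listing on B, the restore call can then be pointed at that snapshot name.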
Like I said in my question, this process needs to be run several times.
If I do steps 1 to 8 to export and import "index_1", what else should I do if I want to export and import "index_2" as well?
At the A instance, I think I only need to register again and create another snapshot, just like steps 1 to 4. But at the B instance, should I paste all contents from A directly, or delete all previous contents first and then paste the new contents?
@DavidTurner It seems like it's safe to delete or replace all contents after the repository is unregistered.
If I replace the existing contents with the contents generated by a snapshot from any Elasticsearch instance and then register the repository, the new snapshot is loaded and can be restored.
Therefore, to import another snapshot from A, B should unregister its repository, replace all contents with the contents from A, register the repository again, and then restore.
Using the step numbers from my previous reply, importing two indices one by one would be:
1 => 2 => 3 => 4 => 5 => 6 => 8 => unregister B repo => 1 => 2 => 3 => 4 => remove all contents of repository folder of B => 5 => 6 => 8
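The same sequence can be written out for any number of indices. The sketch below only produces the ordered list of actions as text; it does not talk to Elasticsearch. The function name and the "snap_<index>" snapshot naming are made up for illustration (snapshot names have to be unique within a repository, so each round on A needs its own name if the same folder is reused).

```python
def plan_transfers(indices):
    """Return the ordered actions for exporting indices from A to B one by one."""
    steps = []
    for position, index in enumerate(indices):
        if position > 0:
            # B still has the previous round's repository registered.
            steps.append("B: unregister repository")
        steps += [
            "A: register repository",                    # step 1
            f"A: create snapshot snap_{index} of {index}",  # step 2
            "A: unregister repository",                  # step 3
            "A: copy repository folder",                 # step 4
        ]
        if position > 0:
            steps.append("B: remove all contents of the repository folder")
        steps += [
            "B: paste contents from A",                  # step 5
            "B: register repository",                    # step 6
            f"B: restore {index}",                       # step 8
        ]
    return steps
```

Calling plan_transfers(["index_1", "index_2"]) gives the 16 actions listed in the line above, in the same order.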
Do your Elasticsearch deployments have internet access?
If they both have internet access, it would be easier to keep your snapshots in a cloud repository and share it between them; in the second deployment the repository would be read-only.
This has an extra cost, but if you need to do this frequently, maybe the amount of work that it will save you can justify this extra cost.
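Registering the shared repository as read-only on the second deployment comes down to setting "readonly": true in the repository settings. A minimal sketch, assuming an S3 repository purely as an example of a cloud repository; the repository name and bucket name are placeholders, and any other repository type and settings can be passed in the same way.

```python
def readonly_repository_request(base_url, repo, repo_type, settings):
    """Build the call that registers a repository in read-only mode."""
    body = {"type": repo_type, "settings": {**settings, "readonly": True}}
    return "PUT", f"{base_url}/_snapshot/{repo}", body


# Example: the second deployment registers the shared bucket read-only.
method, url, body = readonly_repository_request(
    "http://localhost:9200", "shared_repo", "s3", {"bucket": "my-snapshots"}
)
```

The first deployment would register the same repository without the readonly setting, so that only one cluster ever writes to it.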