I have a running Elasticsearch node with an index. I'm constantly inserting documents into the index in a process that I can't stop. The whole thing is running on a remote server.
I need to copy the index to my local machine in order to experiment with the data. Just the data: I don't need any interaction with the original process after making the copy. In other words, I need an instant copy that doesn't touch the remote server's process in any way, since it is a critical indexing process.
Is it possible to copy an instantaneous snapshot of the index to my local machine?
Thank you for your answers. A couple of questions about them.
Regarding snapshot and restore, let me put it this way. I have an external server running an Elasticsearch process; call it external_process. I'm constantly ingesting new data into external_process's index (let's say 100 documents per minute). Does snapshot and restore allow me to take a copy of external_process's index to my local machine without interrupting the ingestion of data? Sorry if you already explained this, but keeping the process running is very important for me.
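In other words, could I run something like this on the remote node while documents keep arriving, and then restore the snapshot on my local node? (Repository, snapshot, and index names here are just placeholders; I haven't registered a repository yet.)

```
# Taken on the remote cluster while indexing continues:
PUT _snapshot/my_repo/snapshot_1?wait_for_completion=true

# Later, on my local machine, after registering the same repository there:
POST _snapshot/my_repo/snapshot_1/_restore
{
  "indices": "my-index"
}
```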
By 'not be disruptive', do you mean that the ingestion of data will not be interrupted?
Thanks. I'm starting to understand the snapshot procedure. As far as I understand, to create a snapshot I first need to create and register a snapshot repository. Doing this (following the procedure defined here), I ran into a problem: Elasticsearch returns an error indicating that my location does not match any location in path.repo (in elasticsearch.yml).
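For reference, this is roughly the registration call I'm making (repository name and path are just examples):

```
PUT _snapshot/my_backup
{
  "type": "fs",
  "settings": {
    "location": "/mnt/backups/my_backup"
  }
}
```

This is the call that triggers the error about path.repo.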
As a test, I created an Elasticsearch process on another host with path.repo configured, and I managed to register the snapshot repository correctly. However, I didn't configure path.repo on my remote Elasticsearch before starting it. Since I can't restart this remote Elasticsearch, I wonder whether it is possible to configure the snapshot repository without stopping the process. So far I haven't found a way.
It is not possible: to add a shared file-system repository you need to restart the node, and path.repo on both nodes needs to point to the same shared file system.
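Concretely, every node in the cluster needs something like this in elasticsearch.yml before it starts, pointing at the same shared mount (NFS or similar; the path here is an example):

```
path.repo: ["/mnt/shared/elastic_backups"]
```

Only then does registering an fs-type repository with a location under that path succeed.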
Not exactly. But you can do a reconfiguration like this without downtime if you introduce another two nodes into your cluster. Given how important it is to you that this indexing process does not stop, you really do need multiple nodes. See Designing for resilience for more information about setting up an HA cluster.
I think you can set up a repository on S3 or similar without needing to restart, though.
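Registering an S3 repository is just an API call, so it doesn't touch path.repo, provided the s3 repository type is available on the node (it's bundled in recent versions; on older ones the repository-s3 plugin has to be installed, which itself requires a restart). A minimal sketch, with a placeholder bucket name:

```
PUT _snapshot/my_s3_repo
{
  "type": "s3",
  "settings": {
    "bucket": "my-snapshot-bucket"
  }
}
```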