Make a copy of a running Elastic node


I have a running Elasticsearch node with an index. I'm constantly inserting documents into that index from a process that I can't stop. Everything runs on a remote server.

I need to copy the index to my local machine so I can experiment with the data. I only need the data; I don't need any interaction with the original process after making the copy. The idea is to make an instant copy without touching the remote server process in any way, since it is a critical indexing process.

Is it possible to copy an instantaneous snapshot of the index to my local machine?

Best regards.

Yes, that's what snapshot and restore is all about.
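For reference, the basic file-system snapshot-and-restore flow comes down to a handful of REST calls. The sketch below only builds the (method, path, body) triples instead of sending them; the repository name, both locations, the snapshot name and the index name are hypothetical placeholders. It assumes the repository files written on the remote side are later copied (or mounted) somewhere the local cluster can read.

```python
import json


def snapshot_restore_requests(repo, remote_location, local_location,
                              snapshot, index):
    """Build (method, path, body) triples for a basic fs snapshot/restore.

    All names are hypothetical placeholders; nothing is sent over the network.
    """
    # 1. Remote cluster: register a shared file-system repository.
    #    remote_location must be under path.repo in the remote elasticsearch.yml.
    register_remote = ("PUT", f"/_snapshot/{repo}",
                       {"type": "fs", "settings": {"location": remote_location}})
    # 2. Remote cluster: snapshot only the index we care about. Indexing keeps
    #    running; each shard is captured roughly as of when its snapshot started.
    create = ("PUT", f"/_snapshot/{repo}/{snapshot}?wait_for_completion=true",
              {"indices": index, "include_global_state": False})
    # 3. Local cluster: register the copied repository files as read-only.
    #    local_location must be under path.repo in the local elasticsearch.yml.
    register_local = ("PUT", f"/_snapshot/{repo}",
                      {"type": "fs",
                       "settings": {"location": local_location, "readonly": True}})
    # 4. Local cluster: restore the index from the snapshot.
    restore = ("POST", f"/_snapshot/{repo}/{snapshot}/_restore",
               {"indices": index})
    return [register_remote, create, register_local, restore]


if __name__ == "__main__":
    for method, path, body in snapshot_restore_requests(
            "my_backup", "/mnt/backups/es", "/srv/es-repo", "snap-1", "my-index"):
        print(method, path)
        print(json.dumps(body, indent=2))
```

Registering the local copy with "readonly": true keeps the local cluster from writing into a repository it only needs to restore from.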

That's not a reasonable constraint: you will have to touch it to get the data out. But taking a snapshot should not be disruptive.

Thank you for your answers. A couple of questions about them.

Regarding snapshot and restore, let me put it this way. I have an external server running an Elasticsearch process; call it external_process. I'm constantly ingesting new data into external_process's index (say, 100 documents per minute). Does snapshot and restore let me copy external_process's index to my local machine without interrupting the ingestion of data? Sorry if you already explained this, but keeping the process running is very important to me :sweat_smile:

By 'not be disruptive' do you mean that the ingestion of data will not be interrupted?


Yes, that's right.

Thanks. I'm starting to understand how snapshots work. As far as I understand, to create a snapshot I first need to create and register a snapshot repository. When I do this (following the procedure described here), Elasticsearch returns an error saying that my location does not match any location in path.repo (in elasticsearch.yml).

As a test, I started an Elasticsearch process on another host with path.repo configured, and I was able to register the snapshot repository correctly. However, I didn't set path.repo on my remote Elasticsearch before starting it. Since I can't restart that remote instance, I wonder whether it is possible to configure the snapshot repository without stopping the process. So far I haven't found a way.
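To illustrate why the registration fails: path.repo is a static setting in elasticsearch.yml that is read at node startup, and the location of an fs repository has to fall inside one of its entries. The function below is a hypothetical, simplified sketch of that containment check for absolute POSIX paths only; it is not Elasticsearch's actual code, which also handles relative locations and other cases.

```python
from pathlib import PurePosixPath
from typing import Sequence


def location_allowed(location: str, path_repo: Sequence[str]) -> bool:
    """Hypothetical simplified check: is `location` inside any path.repo root?

    Absolute POSIX paths only; '..' segments and symlinks are not resolved.
    """
    loc = PurePosixPath(location)
    for root in path_repo:
        root_path = PurePosixPath(root)
        # Allowed if it is the root itself or any directory beneath it.
        if loc == root_path or root_path in loc.parents:
            return True
    return False


if __name__ == "__main__":
    # path.repo as it might appear in elasticsearch.yml: ["/mnt/backups"]
    print(location_allowed("/mnt/backups/es", ["/mnt/backups"]))
    print(location_allowed("/tmp/es", ["/mnt/backups"]))
    # A node started without path.repo has no allowed roots at all.
    print(location_allowed("/mnt/backups/es", []))
```

The last case is the situation described above: with path.repo empty, no fs location can pass, and because the setting is static it cannot be added to the running node.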

Any insights will be very appreciated.

Is it possible to use the _reindex API?
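For reference, reindex from remote runs on the local cluster and pulls documents over HTTP from the remote one. As far as I know, the remote host has to be listed in reindex.remote.whitelist in the local node's elasticsearch.yml (so any restart would be on the local side), the remote cluster must be reachable from the local machine, and it will see extra search load while the copy runs. The sketch below only builds the body for POST _reindex; the host, index names and credentials are hypothetical.

```python
import json


def reindex_from_remote_body(remote_host, source_index, dest_index,
                             username=None, password=None, size=None):
    """Build a hypothetical body for POST _reindex, sent to the local cluster.

    The local node pulls documents from `remote_host`; the remote host is
    assumed to be allowed by reindex.remote.whitelist on the local node.
    """
    remote = {"host": remote_host}
    if username is not None:
        # Basic-auth credentials for the remote cluster, if it needs them.
        remote["username"] = username
        remote["password"] = password
    source = {"remote": remote, "index": source_index}
    if size is not None:
        # Documents per batch; smaller batches ease the remote-read buffer.
        source["size"] = size
    return {"source": source, "dest": {"index": dest_index}}


if __name__ == "__main__":
    body = reindex_from_remote_body("http://remote-host:9200", "my-index",
                                    "my-index-copy", size=500)
    print(json.dumps(body, indent=2))
```

Reindex does not copy mappings or index settings, so creating the destination index with the same mappings first is usually worthwhile.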

It is not possible. To add a shared file-system repository you need to restart the node, and the path.repo of both nodes needs to point to the same shared file system.

Not exactly. But you can make a configuration change like this without downtime if you add two more nodes to your cluster. Given how important it is to you that this indexing process does not stop, you really do need multiple nodes. See Designing for resilience for more information about setting up an HA cluster.

I think you can set up a repository on S3 or similar without needing to restart, though.

I think it depends on the version. In 8.x the repository plugins are built in, so there is no need to install them.

In 7.x the repository plugins need to be installed, which requires a restart.
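On 8.x, registering an S3 repository looks roughly like the request built below; on 7.x the repository-s3 plugin has to be installed first. The repository name, bucket and base path are hypothetical. Credentials go in the keystore (for example via bin/elasticsearch-keystore add s3.client.default.access_key, and likewise for secret_key), and as I understand the docs these S3 client settings are reloadable with POST _nodes/reload_secure_settings, so no restart should be needed. The local cluster could then register the same bucket with readonly set and restore from it.

```python
import json


def s3_repository_request(repo, bucket, base_path=None, readonly=False,
                          client="default"):
    """Build a hypothetical (method, path, body) to register an S3 repository.

    Assumes credentials for `client` are already in the keystore as
    s3.client.<client>.access_key / s3.client.<client>.secret_key.
    """
    settings = {"bucket": bucket, "client": client}
    if base_path:
        settings["base_path"] = base_path  # key prefix inside the bucket
    if readonly:
        settings["readonly"] = True  # e.g. on a cluster that only restores
    return ("PUT", f"/_snapshot/{repo}", {"type": "s3", "settings": settings})


if __name__ == "__main__":
    method, path, body = s3_repository_request("my_s3_repo", "my-bucket",
                                               base_path="es-snapshots")
    print(method, path)
    print(json.dumps(body, indent=2))
```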
