I have a running Elasticsearch node with an index. I'm constantly inserting documents into the index in a process that I can't stop. The whole thing is running on a remote server.
I need to copy the index to my local machine in order to experiment with the data. Just the data: I don't need any interaction with the original process after making the copy. In other words, I need an instant copy that doesn't touch the remote server's process in any way, since it is a critical indexing process.
Is it possible to copy an instantaneous snapshot of the index to my local machine?
Thank you for your answers. A couple of questions about them.
Regarding snapshot and restore, let me put it this way. I have an external server running an Elasticsearch process; call it external_process. I'm constantly ingesting new data into external_process's index (let's say 100 documents per minute). Does snapshot and restore allow me to take a copy of external_process's index to my local machine without interrupting the ingestion of data? Sorry if you already explained this, but keeping the process running is very important for me.
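In other words, could I run something like this on the remote node while documents keep arriving, and then restore the snapshot on my local node? (Repository, snapshot, and index names here are just placeholders; I haven't registered a repository yet.)

```
# Taken on the remote cluster while indexing continues:
PUT _snapshot/my_repo/snapshot_1?wait_for_completion=true

# Later, on my local machine, after registering the same repository there:
POST _snapshot/my_repo/snapshot_1/_restore
{
  "indices": "my-index"
}
```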
By 'not be disruptive', do you mean that the ingestion of data will not be interrupted?
Thanks. I'm starting to understand the snapshot procedure. As far as I understand, to create a snapshot I first need to create and register a snapshot repository. Doing this (following the procedure defined here), I ran into a problem: Elasticsearch returns an error indicating that my location does not match any location in path.repo (in elasticsearch.yml).
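For reference, this is roughly the registration call I'm making (repository name and path are just examples):

```
PUT _snapshot/my_backup
{
  "type": "fs",
  "settings": {
    "location": "/mnt/backups/my_backup"
  }
}
```

This is the call that triggers the error about path.repo.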
As a test, I created an Elasticsearch process on another host with path.repo configured, and I managed to register the snapshot repository correctly. However, I didn't configure path.repo on my remote Elasticsearch before starting it. Since I can't restart this remote Elasticsearch, I wonder whether it is possible to configure the snapshot repository without stopping the process. So far I haven't found a way.
It is not possible: to add a shared file-system repository you need to restart the node, and path.repo on both nodes needs to point to the same shared file system.
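Concretely, every node in the cluster needs something like this in elasticsearch.yml before it starts, pointing at the same shared mount (NFS or similar; the path here is an example):

```
path.repo: ["/mnt/shared/elastic_backups"]
```

Only then does registering an fs-type repository with a location under that path succeed.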
Not exactly. But you can do a reconfiguration like this without downtime if you introduce another two nodes into your cluster. Given how important it is to you that this indexing process does not stop, you really do need multiple nodes. See Designing for resilience for more information about setting up an HA cluster.
I think you can set up a repository on S3 or similar without needing to restart, though.
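Registering an S3 repository is just an API call, so it doesn't touch path.repo, provided the s3 repository type is available on the node (it's bundled in recent versions; on older ones the repository-s3 plugin has to be installed, which itself requires a restart). A minimal sketch, with a placeholder bucket name:

```
PUT _snapshot/my_s3_repo
{
  "type": "s3",
  "settings": {
    "bucket": "my-snapshot-bucket"
  }
}
```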