I have a simple (and possibly stupid) question, since I couldn't find a direct answer (except here).
We have a cluster of 8 servers, all dockerized, with the data volume mounted on the host (/var/lib/elasticsearch), and we want to set up snapshots. A couple of quick questions:
Do we take a snapshot from each node in the cluster?
If the answer to 1 is yes, do we create a separate snapshot (with a different name) for each node?
E.g.
Following are the nodes: esnode1-5, esmaster1-3
Following is the query we would run for a snapshot from each node in the cluster, where <nodename_date> could be data1_12162016, data2_12162016, and so on.
If the answer to question 1 is no, please let me know how snapshots are taken for a dockerized cluster, along with an example if possible. Let me know if you need more information, and thank you in advance.
A snapshot is a cluster-level action, i.e. it involves every node that holds data for the snapshot, and each of those nodes is responsible for putting its data into the repo.
Thanks. I take your reply as "you need to back up from each node in the cluster".
When we restore the snapshot, how would each node know what to restore from the repository? For example, if an index has 5 shards, each shard on a different data node, how would each data node know what to restore from the snapshot?
Creating a snapshot is, as Mark points out, a cluster-level operation, and the generated snapshots are written to shared storage that all nodes need to have access to, e.g. a network-mounted file system, S3, or HDFS.
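As a sketch of that setup: registering an S3 repository is a single API call made against any one node, and the registration is cluster-wide. The repository name, bucket, and region below are placeholders, and this assumes the S3 repository plugin is installed on every node and the cluster is reachable on localhost:9200.

```shell
# Register an S3-backed snapshot repository (run once, against any node).
# Assumes the S3 repository plugin is installed on all nodes; the
# repo name "my_s3_repo" and bucket "my-es-snapshots" are placeholders.
curl -XPUT 'http://localhost:9200/_snapshot/my_s3_repo' \
  -H 'Content-Type: application/json' -d '{
    "type": "s3",
    "settings": {
      "bucket": "my-es-snapshots",
      "region": "us-east-1"
    }
  }'
```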
I added the AWS plugin and was able to register the repository. However, when I try to take a snapshot from each of the nodes in the cluster, I get a concurrent snapshot exception.
This means either that you have to use a different snapshot name from each node in the cluster (rather than the same name), or that you don't have to fire the snapshot API from all the nodes at all. I don't know which one is valid and would like to avoid trial-and-error testing. A verbose explanation with an example would be appreciated. We are trying to go live in production. Thank you.
I think all nodes have access to the storage, because the snapshot was successful. This would mean that you only need to trigger the snapshot from one of the nodes in the cluster, and it will talk to the other nodes, gather the data, and push it to S3. Let me know if that's not the case. Thanks so much!
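As a sketch of that single-trigger approach, assuming a repository named my_s3_repo is already registered (the snapshot name below is a placeholder), one snapshot request sent to any one node covers the whole cluster:

```shell
# Create one cluster-wide snapshot by calling a single node; the master
# coordinates all data nodes, so do NOT repeat this on every node.
# Assumes a repository named "my_s3_repo" is already registered.
curl -XPUT 'http://localhost:9200/_snapshot/my_s3_repo/snapshot_12162016?wait_for_completion=true'

# Progress and per-shard state can be checked with:
curl -XGET 'http://localhost:9200/_snapshot/my_s3_repo/snapshot_12162016/_status'
```

Firing this same request from several nodes at once would explain the concurrent snapshot exception, since only one snapshot can run in a cluster at a time.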
I've been monitoring the logs and observed the following, which seems to come from the AWS connection. I think these entries are probably due to some internal protocol used to connect to AWS, but I still wanted to run it by you guys to check whether that's expected.
I am not sure why it is connecting to AWS so many times that it has to close the connection over and over. We did not request any snapshots during this time.
It seems to log every minute, based on the pattern (6:04:11, 6:05:11, etc.).
[2016-12-20T06:04:11,002][DEBUG][o.a.h.i.c.PoolingClientConnectionManager] Closing connections idle longer than 60 SECONDS
[2016-12-20T06:05:11,002][DEBUG][o.a.h.i.c.PoolingClientConnectionManager] Closing connections idle longer than 60 SECONDS
[2016-12-20T06:06:11,002][DEBUG][o.a.h.i.c.PoolingClientConnectionManager] Closing connections idle longer than 60 SECONDS
[2016-12-20T06:07:11,002][DEBUG][o.a.h.i.c.PoolingClientConnectionManager] Closing connections idle longer than 60 SECONDS
[2016-12-20T06:08:11,003][DEBUG][o.a.h.i.c.PoolingClientConnectionManager] Closing connections idle longer than 60 SECONDS
[2016-12-20T06:09:11,003][DEBUG][o.a.h.i.c.PoolingClientConnectionManager] Closing connections idle longer than 60 SECONDS
[2016-12-20T06:10:11,003][DEBUG][o.a.h.i.c.PoolingClientConnectionManager] Closing connections idle longer than 60 SECONDS
[2016-12-20T06:11:11,004][DEBUG][o.a.h.i.c.PoolingClientConnectionManager] Closing connections idle longer than 60 SECONDS
[2016-12-20T06:12:11,004][DEBUG][o.a.h.i.c.PoolingClientConnectionManager] Closing connections idle longer than 60 SECONDS
[2016-12-20T06:13:11,004][DEBUG][o.a.h.i.c.PoolingClientConnectionManager] Closing connections idle longer than 60 SECONDS
[2016-12-20T06:14:11,005][DEBUG][o.a.h.i.c.PoolingClientConnectionManager] Closing connections idle longer than 60 SECONDS
[2016-12-20T06:15:11,005][DEBUG][o.a.h.i.c.PoolingClientConnectionManager] Closing connections idle longer than 60 SECONDS