How to take snapshots in cluster


(Animageofmine) #1

I have a simple (and perhaps stupid) question, since I couldn't find a direct answer (except here).

We have a cluster of 8 servers, all dockerized with the data volume mounted on the host (/var/lib/elasticsearch), and we want to set up snapshots. A couple of quick questions:

  1. Do we take snapshot from each node in the cluster?
  2. If answer to 1 is yes, do we create a separate snapshot (different name) for each node?

E.g.
Following are the nodes: esnode1-5, esmaster1-3

Following is the snapshot request we would run from each node in the cluster, where <nodename_date> could be data1_12162016, data2_12162016, and so on.

curl -XPUT 'http://localhost:9200/_snapshot/elasticsearch/<nodename_date>?wait_for_completion=true' -d '{
    "ignore_unavailable": "true",
    "include_global_state": false
}'

When we restore, we would run the following request:

curl -XPOST 'localhost:9200/_snapshot/elasticsearch/<nodename_date>/_restore' -d '{
    "ignore_unavailable": "true",
    "include_global_state": false
}'

If the answer to question 1 is no, please let me know how snapshots are taken for a dockerized cluster, along with an example if possible. Let me know if you need more information, and thank you in advance.


(Mark Walkom) #2

A snapshot is a cluster-level action, i.e. it happens on every node that holds data for the snapshot, and each of those nodes is responsible for putting its data into the repo.


(Animageofmine) #3

Thanks. I take your reply as "you need to back up from each node in the cluster".

When we restore the snapshot, how would each node know what to restore from the repository? For example, if an index has 5 shards, each shard on a different data node, how would each data node know what to restore from the snapshot?


(Mark Walkom) #4

Yes, all nodes.

A restore does the entire index, or the entire snapshot. You can't do it per shard.
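For reference, a restore can be scoped to whole indices (though not individual shards) via the `indices` parameter of the restore API. A minimal sketch, assuming a repository named `elasticsearch`, a snapshot named `snapshot_12162016`, and a hypothetical index name:

```shell
# Restore only a selected index from a snapshot.
# Whole-snapshot and whole-index restores are supported; per-shard is not.
# "logs-2016.12.16" is a hypothetical index name for illustration.
curl -XPOST 'http://localhost:9200/_snapshot/elasticsearch/snapshot_12162016/_restore' -d '{
    "indices": "logs-2016.12.16",
    "ignore_unavailable": "true",
    "include_global_state": false
}'
```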


(Animageofmine) #5

Sounds good, I will set this up and give it a shot, thanks for your help!

Just a quick clarification: Should I use the same snapshot name (e.g. snapshot_12162016) from all nodes or separate snapshot name from each node?


(Mark Walkom) #6

You cannot have a per node snapshot.


(Animageofmine) #7

Perfect, thank you for the clarification!


(Christian Dahlqvist) #8

Creating a snapshot is, as Mark points out, a cluster-level operation, and the generated snapshots are written to shared storage that all nodes need access to, e.g. a network-mounted file system, S3, or HDFS.
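To illustrate, here is a minimal sketch of registering a shared-filesystem repository once, from any node. The repository name and mount path are hypothetical; for the `fs` repository type, the location must also be whitelisted via `path.repo` in elasticsearch.yml on every node:

```shell
# Register a shared-filesystem snapshot repository (run once, against any node).
# Assumes /mnt/es_backups is a network mount visible to all nodes and is
# listed under path.repo in each node's elasticsearch.yml.
curl -XPUT 'http://localhost:9200/_snapshot/elasticsearch' -d '{
    "type": "fs",
    "settings": {
        "location": "/mnt/es_backups",
        "compress": true
    }
}'
```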


(Animageofmine) #9

@Christian_Dahlqvist @warkolm

I added the AWS plugin and was able to register the repository. However, when I try to take a snapshot from each of the nodes in the cluster, I get a concurrent snapshot exception.

{
  "error": {
    "root_cause": [
      {
        "type": "concurrent_snapshot_execution_exception",
        "reason": "[essnapshots:12192016]a snapshot is already running"
      }
    ],
    "type": "concurrent_snapshot_execution_exception",
    "reason": "[essnapshots:12192016]a snapshot is already running"
  },
  "status": 503
}  

Following is my query that runs from each node in the cluster:

curl -XPUT 'http://localhost:9200/_snapshot/essnapshots/12192016?wait_for_completion=false' -d '{    
    "ignore_unavailable": "true",
    "include_global_state": false
}'

This suggests one of two things: either you can't use the same snapshot name from every node (i.e. each node needs a different name), or you don't need to fire the snapshot API from all the nodes at all. I don't know which one is valid and would like to avoid trial-and-error testing. A verbose explanation with an example would be appreciated. We are trying to go live in production. Thank you.


(Christian Dahlqvist) #10

Snapshots are not created per node but for the whole cluster, which is why all nodes need access to the storage.


(Animageofmine) #11

I think all nodes have access to the storage, because the snapshot was successful. This would mean that you only need to trigger the snapshot from one of the nodes in the cluster, and it will talk with the other nodes, gather the data, and push it to S3. Let me know if that's not the case. Thanks so much!


(Christian Dahlqvist) #12

It does not matter which node you trigger the snapshot from, so that is correct.
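Putting the thread's conclusion together: trigger the snapshot once, against any single node, then poll its status. A sketch reusing the repository and snapshot names from the post above:

```shell
# Trigger ONE cluster-wide snapshot from any node (not once per node;
# a second concurrent request is what raises
# concurrent_snapshot_execution_exception).
curl -XPUT 'http://localhost:9200/_snapshot/essnapshots/12192016?wait_for_completion=false' -d '{
    "ignore_unavailable": "true",
    "include_global_state": false
}'

# Poll progress from any node; "state" moves from IN_PROGRESS to SUCCESS.
curl -XGET 'http://localhost:9200/_snapshot/essnapshots/12192016'

# List all snapshots in the repository.
curl -XGET 'http://localhost:9200/_snapshot/essnapshots/_all'
```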


(Animageofmine) #13

sounds good. Thank you so much!


(Animageofmine) #14

@Christian_Dahlqvist @warkolm

I've been monitoring the logs and observed the following, which seems to come from the AWS connection. I think the log entries are probably due to some internal protocol for connecting to AWS, but I still wanted to run them by you to check whether this is expected.

I am not sure why it is connecting to AWS so many times that it has to close the connection over and over. We did not request any snapshots during this time.

Seems to log every minute based on the pattern (6:04:11, 6:05:11, etc...)

[2016-12-20T06:04:11,002][DEBUG][o.a.h.i.c.PoolingClientConnectionManager] Closing connections idle longer than 60 SECONDS
[2016-12-20T06:05:11,002][DEBUG][o.a.h.i.c.PoolingClientConnectionManager] Closing connections idle longer than 60 SECONDS
[2016-12-20T06:06:11,002][DEBUG][o.a.h.i.c.PoolingClientConnectionManager] Closing connections idle longer than 60 SECONDS
[... same message repeated every minute through 06:15:11 ...]

(system) #15

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.