Snapshot process timing out: how to split the snapshot process up?

I have designed an AWS Lambda function to snapshot all indices on our smaller cluster.
We have 16 data nodes, 3 master nodes and 2 query nodes, with quite a lot of data to snapshot.

The function worked, however Lambda functions time out after 15 minutes, so I couldn't snapshot all of the data. I managed to get only 3 TB. The function also needed to be throttled as it maxed out the 10 Gb network interface.

Does anyone have any suggestions on how I could break up the snapshotting process into chunks?
If I snapshot indices separately, can I combine all indices into one snapshot later for the restore process?

I'm unsure what the best approach for this is and whether Elasticsearch has any features to solve this.
We are running Elasticsearch 6.3.0.

Hi @Bkelly and welcome!

The snapshot process does not time out - it continues in the background until it's finished, and you can monitor its progress. Do you mean that your lambda is timing out? It doesn't sound like a great idea to keep a lambda running for the whole duration of a snapshot.

Hi @DavidTurner
Oh sorry, yeah David, I mean the Lambda function times out after 15 minutes. Have you any suggestions on how to tackle the snapshot process for a large number of indices? The aim is to snapshot our clusters and restore in DR rather than snapshotting the EBS volumes for each instance.

Normally people start the snapshot via the API and then every few minutes call the status API to see if it's completed or not. No need to spend money on a Lambda for that.
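As a rough illustration of that start-then-poll pattern, here is a minimal Python sketch using only the standard library. The function names, the base URL, the repository name and the five-minute polling interval are all assumptions made for the example; it also assumes an unauthenticated HTTP endpoint, which your cluster may not have. It starts the snapshot with `wait_for_completion=false` so the call returns straight away, then reads the `state` field from the snapshot info API until it is no longer `IN_PROGRESS`.

```python
import json
import time
import urllib.request


def build_start_request(base_url, repo, snapshot, indices=None):
    """Build the PUT request that starts a snapshot and returns immediately."""
    url = f"{base_url}/_snapshot/{repo}/{snapshot}?wait_for_completion=false"
    body = {}
    if indices:
        # Optional: restrict the snapshot to a comma-separated list of indices.
        body["indices"] = ",".join(indices)
    return urllib.request.Request(
        url,
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="PUT",
    )


def snapshot_state(info):
    """Extract the state from a GET /_snapshot/<repo>/<snapshot> response."""
    return info["snapshots"][0]["state"]


def wait_for_snapshot(base_url, repo, snapshot, poll_seconds=300):
    """Poll the snapshot info API until the snapshot leaves IN_PROGRESS."""
    url = f"{base_url}/_snapshot/{repo}/{snapshot}"
    while True:
        with urllib.request.urlopen(url) as resp:
            state = snapshot_state(json.load(resp))
        if state != "IN_PROGRESS":
            return state  # e.g. SUCCESS, PARTIAL or FAILED
        time.sleep(poll_seconds)
```

With a hypothetical repository called `my_backup`, usage would look something like `urllib.request.urlopen(build_start_request("http://localhost:9200", "my_backup", "snapshot_1"))` followed by `wait_for_snapshot("http://localhost:9200", "my_backup", "snapshot_1")`. The two steps do not have to run in the same process: one short-lived job can start the snapshot and a separate scheduled check can look at the state later.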

Alternatively you can let Elasticsearch do this for you.

snapshotting the EBS for each instance

NB the docs warn you that this doesn't work:

WARNING: It is not possible to back up an Elasticsearch cluster simply by taking a copy of the data directories of all of its nodes.
