Hardware snapshot best practice advice

I've read the Elasticsearch guide to using the built-in software snapshots, but I can't find any guidance on hardware snapshots in the docs or with a little googling.

What I want to do is quiesce Elasticsearch so that the index is in a stable state, equivalent to what you would see if Elasticsearch were shut down, trigger a hardware snapshot of the index, then resume disk activity on the server. Is there any way to do that in Elasticsearch, other than the obvious route of shutting down the node?

Note that I am assuming that the storage I am snapshotting has replicas of all of the shards in the index. This is something I can manage.

For comparison, see fsyncLock() in MongoDB ( http://docs.mongodb.org/manual/reference/method/db.fsyncLock/ ), LOCK TABLES in MySQL ( https://dev.mysql.com/doc/refman/5.0/en/lock-tables.html ), etc.

This isn't something that Elasticsearch supports, sorry.

Hi Mark,

I wonder if there is any news on hardware snapshot strategies?

If not a proper quiesce, can we issue a command to ES to write translog checkpoints, so that we know the data on disk is consistent at least up to a certain timestamp? That would be good enough for taking snapshots of time-based data, for example.

Can anyone else share their experience, if any, with hardware snapshots and ES (AWS EBS snapshots, GCP Persistent Disk snapshots, etc.)?

Nothing that I know of.
I'd raise a feature request if I were you :slight_smile:

I see. From what I read in the Translog section, if I have time-based, immutable data, I should be fine with just taking hardware snapshots IF I'm able to re-ingest incoming data from a point just before the backup happened. Am I right?

I believe so. This isn't something we test, so you should certainly test it
before you rely on it. Since 2.0, every write that has been acknowledged in
an HTTP response has been fsynced on all live nodes.

Restarts may be slow if there is a lot of stuff in the translog to be
replayed on startup. A flush ought to help with that.

Yeah, I was planning to do flush + synced flush before taking snapshots.
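For anyone wanting to script that step, here is a minimal sketch of the flush followed by a synced flush, using only the Python standard library. It assumes an unauthenticated node at `http://localhost:9200` (my placeholder, adjust as needed); the function names are hypothetical helpers of mine, not part of any Elasticsearch client. Note that `POST /_flush/synced` only exists on older versions (it was deprecated in 7.6 and removed in 8.0), and that neither call stops new writes, so this is not a quiesce: it only shortens translog replay on restore.

```python
import json
import urllib.request

ES_URL = "http://localhost:9200"  # assumption: local node, no auth or TLS


def flush_requests(base_url, index=None):
    """Build the two POST requests: a plain flush, then a synced flush."""
    root = base_url.rstrip("/")
    prefix = f"{root}/{index}" if index else root
    return [
        urllib.request.Request(f"{prefix}/_flush", method="POST"),
        urllib.request.Request(f"{prefix}/_flush/synced", method="POST"),
    ]


def flush_before_snapshot(base_url=ES_URL, index=None,
                          opener=urllib.request.urlopen):
    """Send both flushes in order and return the decoded JSON bodies.

    `opener` is injectable so the function can be exercised without a
    live cluster; by default it is urllib's urlopen, which raises
    HTTPError on non-2xx responses.
    """
    results = []
    for req in flush_requests(base_url, index):
        with opener(req) as resp:
            results.append(json.loads(resp.read().decode("utf-8")))
    return results
```

Calling `flush_before_snapshot(index="my-index")` just before triggering the hardware snapshot would flush only that (hypothetical) index; omit `index` to flush everything. Check the `_shards` section of each response for failures before going ahead with the snapshot.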