Findings related to local gateway backup

I read most of the discussions about ES backup strategies using the local gateway, which many people, including Shay, have recommended as the better approach. To better understand what "disabling flush" means, I ran some tests on a 3-node ES cluster on my laptop (6 shards with 2 replicas per shard, so each node holds exactly the same data). Below are some of my findings, which I wanted to share. If you have any questions or doubts, please post.

  1. The default local gateway stores the data (indices, entities) in the $ES_HOME/data directory. We can choose to store the data anywhere we want (e.g. on an EBS volume) instead of the default location.

  2. ES first writes data to the translog (a binary file) and then flushes (writes) it to storage automatically. If we set something like translog.disable_flush = "true", ES no longer flushes automatically. Even with automatic flushing disabled, ES operations still work (e.g. I verified that creating and searching indices/entities still work).

  3. When automatic flushing is disabled, we can still flush manually using the API.

  4. When automatic flushing is disabled and we have uncommitted translogs, restarting the nodes appears to flush the translogs automatically. So ES responds correctly to search queries for those previously uncommitted indices/entities.

  5. As long as the translog for a shard exists on at least one replica/node, ES responds correctly to search operations. But if the translog for a shard is deleted from all nodes and we restart the cluster, we lose that shard's uncommitted data.
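
For anyone who wants to repeat points 2 and 3, the two calls could be sketched roughly as below. This is only a sketch under assumptions: a node listening on localhost:9200, a hypothetical index name ("myindex"), and the setting being spelled index.translog.disable_flush in the index settings API (the spelling may differ between ES versions, so please check yours). The helper names are made up for illustration. The functions only build the HTTP requests; send() actually talks to the cluster.

```python
import json
import urllib.request

ES_URL = "http://localhost:9200"  # assumption: a local node on the default HTTP port


def disable_flush_request(index, disabled=True):
    """Build a PUT <index>/_settings request that toggles automatic flushing."""
    # Assumption: the setting key is index.translog.disable_flush on this ES version.
    body = json.dumps({"index": {"translog.disable_flush": disabled}}).encode("utf-8")
    return urllib.request.Request(
        f"{ES_URL}/{index}/_settings",
        data=body,
        method="PUT",
        headers={"Content-Type": "application/json"},
    )


def manual_flush_request(index):
    """Build a POST <index>/_flush request (the manual flush from point 3)."""
    return urllib.request.Request(f"{ES_URL}/{index}/_flush", data=b"", method="POST")


def send(request):
    """Send a prepared request to the cluster and return the decoded JSON reply."""
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read().decode("utf-8"))


# Show what would be sent, without contacting any cluster.
# "myindex" is a hypothetical index name.
for req in (disable_flush_request("myindex"), manual_flush_request("myindex")):
    print(req.get_method(), req.full_url)
```

Against a running cluster, send(disable_flush_request("myindex")) followed by send(manual_flush_request("myindex")) would correspond to disabling automatic flushing and then flushing by hand.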

Shay also mentioned in this post (!topic/elasticsearch/3U6NTMVoOZI) that it's OK to copy the data dir without disabling flush. In that case, the last successful Lucene commit state will be used for recovery. To me, it's not clear where exactly the last successful Lucene commit state is recorded, or what metadata is recorded as part of a successful commit.
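
To make the two copy variants concrete, here is a rough sketch of a data dir backup, with and without pausing automatic flushing. It is not an official procedure, just an illustration under assumptions: the function names are hypothetical, and set_flush_disabled stands for whatever callable you use to toggle the flush setting on your cluster (it receives True to disable and False to re-enable).

```python
import shutil
from contextlib import contextmanager
from pathlib import Path


@contextmanager
def flush_paused(set_flush_disabled):
    """Disable automatic flushing for the duration of the block, then re-enable it."""
    set_flush_disabled(True)
    try:
        yield
    finally:
        # Always re-enable flushing, even if the copy fails.
        set_flush_disabled(False)


def backup_data_dir(data_dir, backup_dir, set_flush_disabled=None):
    """Copy data_dir to backup_dir (which must not exist yet).

    If set_flush_disabled is given, automatic flushing is paused while copying.
    """
    src, dst = Path(data_dir), Path(backup_dir)
    if set_flush_disabled is None:
        # Plain copy: per the post referenced above, recovery would then
        # start from the last successful Lucene commit.
        shutil.copytree(src, dst)
    else:
        with flush_paused(set_flush_disabled):
            shutil.copytree(src, dst)
    return dst
```

The try/finally is there so that a failed copy does not leave automatic flushing switched off.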

To me, it sounds like using a high-IOPS EBS volume as the local gateway is probably the best approach to ES backup: it ensures data persistence, makes recovery easy (attach the volume to a new ES instance), and avoids having to deal with disabling flush.