Hi Paul,
Thank you so much. I was doing rsync on the ES data directory as a
temporary backup solution. Could you please guide me in setting up the FS
Gateway correctly? I tried something and I could not restore the data.
- Could you please show me all the settings you made, and in which
config files?
The gateway setting is very simple - the gateway is a shared FS:
gateway:
    type: fs
    fs:
        location: /mnt/esgateway/
/mnt/esgateway is an NFS share served from a physical host separate from all
the ES nodes, and mounted on every ES node. All ES nodes have this same mount
point.
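For illustration, mounting that share on each node would look something like
this (the server name and export path are placeholders, not our actual setup;
you'd normally put the equivalent entry in /etc/fstab so it survives a reboot):

    # hypothetical NFS server and export - substitute your own
    mount -t nfs nfshost:/export/esgateway /mnt/esgateway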
- Is incremental index possible on the FS Gateway snapshot?
No, with a bit of yes, but it's probably not what you think it is. At the end
of the day the segments are just files, and the incremental 'sync' difference
is relatively small: the smaller segments generally merge together, more often
than not leaving the bigger segments unchanged until a larger merge happens.
So the rsync tends to get good 'savings' in terms of not needing to sync too
many large files - right up until a very large merge or an Optimize is done,
and then it's sort of all brand new files again.
So it's mostly no, it's always a 'full' sync, but there are a lot of savings
there because most of the large files haven't changed. Maybe you're asking a
different question though.
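As a sketch of what such an rsync-based replication of the gateway could look
like (the destination host and path here are placeholders, not our actual
commands):

    # hypothetical: copy the shared gateway to a DR host; rsync only
    # re-transfers segment files that changed since the last run
    rsync -av --delete /mnt/esgateway/ drhost:/exports/esgateway/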
- Am I correct in understanding that the FS Gateway snapshot will
have both the state and also the data?
Yes, the gateway includes the cluster state (metadata) as well as the indices
directories.
- How do we restore the data back into the cluster? Is it a case of
restoring the snapshot and then restarting the cluster?
While there's no 'restore' tool (as yet), all we do in our DR is:
- Have the DR cluster shut down
- Wipe clean the local data directory on each node
- Ensure the DR cluster has a configuration with the FS Gateway pointed to
an NFS share holding the replicated copy of the gateway
- Start up the cluster
All the nodes then recover their state from the gateway. (We have multiple
DCs using this DR data centre as a failover location, so we deliberately purge
any local node state to ensure we get a clean recovery for whichever DC is
coming into the DR location.) A rough sketch of these steps is below.
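Purely as a sketch of that procedure - the service name and data path are
assumptions about a typical install, not literal commands from our runbook:

    # run on each DR node (hypothetical paths/service name)
    service elasticsearch stop             # 1. make sure the node is down
    rm -rf /var/lib/elasticsearch/data/*   # 2. wipe the local data directory

    # 3. check elasticsearch.yml points the FS gateway at the NFS share
    #    holding the replicated copy of the gateway, e.g.:
    #        gateway:
    #            type: fs
    #            fs:
    #                location: /mnt/esgateway/

    service elasticsearch start            # 4. start up; the node recovers
                                           #    its shards from the gateway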
- Are you using it in production? If yes, do you know the time it would
take to restore a cluster of 900GB of data in total?
Yes we do. 900 GB is a good size for sure; ours are only in the
up-to-100-GB range. You'll have to do some of your own testing on that -
how long it takes will be hardware/environment specific (disk RAID setup,
network bandwidth, number of nodes available for parallel recovery, etc.). At
the end of the day it comes down to how fast you can transfer the shard
contents to the relevant nodes.
I'm guessing here actually (Shay or others could confirm?), but I believe
the master 'delegates' to each node which specific shards to recover from the
shared gateway, so the central location gets hit by all nodes during recovery,
and that host is probably the limiting resource factor (disk and network
bandwidth on that node).
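As a very rough back-of-envelope figure (assuming the gateway host's network
link is the bottleneck): 900 GB is about 7,200 gigabits, so even a fully
saturated 1 Gbit/s link needs roughly 7,200 seconds, i.e. around two hours as
an absolute floor; real recovery will be slower once disk seeks, protocol
overhead and shard copy concurrency come into play.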
Paul Smith