Hi,
I just pushed support for a `local` gateway:
http://github.com/elasticsearch/elasticsearch/issues/issue/343. The idea of
the local gateway is to allow a full cluster recovery to be performed from
the local information each node stores, without requiring a shared storage
gateway (like `fs` for a shared file system, `s3`, or `hdfs`).
The shared gateway approach works really well when local node data is
considered transient but a full cluster recovery is still required. Local
data is transient, for example, when you decide to store the index in memory.
Another benefit of a shared storage gateway is that it allows for easy
backup of the index. If a backup process is required (on top of the high
availability aspect of elasticsearch), it makes sense to have it integrated
into how elasticsearch works, so the recovery process takes the backups
into account.
Still, there are many cases where reusing the local data stored on each
node when performing a full recovery makes sense. Very large indices, for
example, that are well served by N (>1) replicas per shard might find the
shared storage an overhead. Note that the shared storage model is not
something other NoSQL solutions provide; most people are OK with relying on
local node storage and an increased number of replicas. The good news is
that elasticsearch now supports this model as well.
As a side note, the shared storage model is something that takes people
some time to understand (I can't count the number of times I have heard
something like: "ahh, it requires shared storage, it's not really
distributed then..."). It's a given when talking about "in memory" data
grids: when the whole cluster is brought down, the data is lost, so there
has to be a way to recover it. With elasticsearch, this was my first
architecture decision for long-term persistence, but local gateway support
has always been on my roadmap.
Of course, this only makes sense when using file-based index storage (the
transaction log moved to being file based in 0.8).
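To make the file-based requirement concrete, here is a rough elasticsearch.yml sketch pairing the local gateway with a file-based index store. It assumes the `index.store.type` setting and its `niofs` value from the store module documentation; store type names have varied between versions, so treat them as an assumption and check the docs for the version you run.

```yaml
# Sketch only: index.store.type and the "niofs" value are assumptions
# taken from the store module docs; verify them for your version.
gateway:
  type: local
index:
  store:
    type: niofs   # file-based store; a "memory" store would not survive a full restart
```

With an in-memory store the local data is gone after a full shutdown, so there would be nothing on each node for the local gateway to recover from.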
Enabling it is quite simple:
gateway:
  type: "local"
Usually, you will also want to configure the "gateway.recover_after_nodes"
setting so that more nodes are in play when performing a full cluster
restart, which helps ensure the correct cluster state is elected.
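Combining both settings, a config might look like the following sketch. The value `3` is only a placeholder; pick a threshold that fits the size of your own cluster.

```yaml
# elasticsearch.yml: local gateway plus a recovery threshold.
# recover_after_nodes: 3 is an illustrative value, not a recommendation.
gateway:
  type: local
  recover_after_nodes: 3
```

Nesting both keys under `gateway:` is equivalent to writing `gateway.type` and `gateway.recover_after_nodes` as flat keys.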
Lastly, I am considering making this the default setting in the
elasticsearch.yml that ships with elasticsearch. What do you think?
-shay.banon