I am interested in deploying ES, but I have a couple of questions
about the Gateway module that I couldn't answer with the current
documentation. First, a bit of background: the site has a persistent
store in a traditional RDBMS, and the idea is to also send the
information to ES for searching.
Since all the information needed for batch recovery is already in a DB,
and given the reasonable stability of the data center, I am considering
not replicating the information yet again in a Gateway (DB + ES
Lucene + ES Gateway). For example, if the site goes totally down and
then comes back, I can live with some inconsistent indices (meaning
some values are in one Lucene replica but not in another one, not a
corrupted index of course) but mostly everything would be ok (if you
hit one server you'll see a couple of results more than the next one,
but nothing serious in my case). What would be a problem is recovery
taking more than a couple of minutes in the name of consistency, or
the whole cluster blocking while it waits for some shard/index on a
server that is still missing for whatever reason. Later, without the
pressure of live site downtime, I would run a full batch reindex,
using aliases so it won't disrupt traffic.
The point of this is instead of paying an everyday price of
replication I'd pay a bit of inconsistency plus a batch full indexing
just in the case of a full cluster failure.
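To make the alias part concrete, this is roughly the swap I have in mind
once the batch reindex has finished. It is only a sketch: the alias and
index names ("site", "site_v1", "site_v2") and the helper function are
made up for illustration, and I am assuming the remove and add actions
can be sent together in one request to the _aliases endpoint so that
searches never see a gap.

```python
import json


def build_alias_swap(alias, old_index, new_index):
    """Build the request body for POST /_aliases.

    Hypothetical helper: it moves `alias` from `old_index` to
    `new_index` by sending both actions in a single request.
    """
    return {
        "actions": [
            {"remove": {"index": old_index, "alias": alias}},
            {"add": {"index": new_index, "alias": alias}},
        ]
    }


if __name__ == "__main__":
    # Index and alias names here are placeholders, not my real setup.
    body = build_alias_swap("site", "site_v1", "site_v2")
    print(json.dumps(body, indent=2))
```

The site would always query the alias, so the freshly built index takes
over only when the swap request is sent.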
So, the question is: if I disable the Gateway and a full cluster
failure happens, is it possible for ES to come back with its local index
information (and probably the cluster meta-data too) without reloading
everything from snapshots+translog? If so, which configuration
options do I need to enable?
This is getting too long, I know, sorry for that.
Thanks a lot for all the hard work,