I am interested in deploying ES, but I have a couple of questions
about the Gateway module that I couldn't answer from the current
documentation. First, a bit of background: the site has a persistent
store in a traditional RDBMS, and the idea is to also send the
information to ES for searching.
Since all the information needed for batch recovery is already in a DB,
and given the reasonable stability of the data center, I am considering
not replicating the information yet again in a Gateway (DB + ES
Lucene + ES Gateway). For example, if the site goes totally down and
then comes back, I can live with some inconsistent indices (meaning
some values are in one Lucene replica but not in another, not a
corrupted index of course) as long as mostly everything is OK (if you
hit one server you'll see a couple more results than on the next one,
but nothing serious in my case). What would be a problem in my case is
if, in the name of consistency, the recovery takes more than a couple
of minutes, or the whole cluster gets blocked waiting for some shard or
index on a server that is still missing for whatever reason. Later,
without the pressure of live-site downtime, I would run a full batch
reindex, using aliases so it won't disrupt traffic.
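To make the alias idea concrete, here is a rough sketch in Python of the swap I have in mind. It only builds the request body; the alias and index names are made up for illustration, and I'm assuming the `_aliases` endpoint with `remove` and `add` actions sent in one request.

```python
import json


def build_alias_swap(alias, old_index, new_index):
    """Build the body for a single _aliases request that repoints `alias`.

    Both actions travel in one request, so searches against the alias
    should never see a moment where it points at no index.
    """
    return {
        "actions": [
            {"remove": {"index": old_index, "alias": alias}},
            {"add": {"index": new_index, "alias": alias}},
        ]
    }


if __name__ == "__main__":
    # Hypothetical names: "site" is the alias the application queries,
    # "site_v1" is the possibly inconsistent index, "site_v2" the fresh rebuild.
    body = build_alias_swap("site", "site_v1", "site_v2")
    # This JSON would be POSTed to the cluster's _aliases endpoint.
    print(json.dumps(body, indent=2))
```

The application would always search through the alias, so the rebuilt index can be filled in the background and switched in once the batch run finishes.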
The point is that, instead of paying an everyday price for
replication, I'd pay with a bit of inconsistency plus a full batch
reindex, and only in the case of a full cluster failure.
So the question is: if I disable the Gateway and a full cluster failure
happens, is it possible for ES to come back with its local index
information (and probably the cluster metadata too) without reloading
everything from snapshots + translog? If yes, which configuration
options enable this?
Yes, you are right, I've seen it and gave it a try. I saw that it
generates the Lucene index files and a translog dir, but I didn't see a
snapshot anywhere. I added some documents, did a refresh, tried the
snapshot API, and nothing.
I read the documentation but couldn't find an answer to this. I assume
it doesn't need a separate snapshot, right? It just uses Lucene and a
translog.