Hi Eugene,
Thanks for your comments - I'll do my best to explain where I am coming
from, and to address some of the issues you have raised.
Firstly, where I'm coming from: the data I'm holding and searching against
needs to be 100% backed up because it needs to be audited in the future.
For that reason the data is held in an old-fashioned multi-master
replicated relational DB.
In terms of the issues you raised:
- But how this is different from any other DB?
i) With relational DBs, replaying the transaction logs to recover any data
that hasn't been backed up is part of the standard strategy. I've heard of
people doing this with ES, but it is not well documented anywhere.
Additionally the transaction logs, to my limited understanding, are kept in
the same area as the index files and can suffer the same corruption. I think
there may be some monitoring in version 1.0 to stop ES writing to disk
before the files become corrupted, which would help. But the first point
stands: there is no clear transaction log replay strategy outlined for
Elasticsearch.
ii) Multi-master replication - no doubt it's possible to arrange JMS queues
or Hazelcast/Coherence grids to do this, but a built-in solution would be
useful.
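(For what it's worth, the snapshot/restore API that is supposed to arrive in
1.0 might cover some of the backup side of this. A minimal sketch, assuming
a shared filesystem path like /mnt/backups that every node can reach - the
repository name and path here are just my own placeholders:

```shell
# Register a filesystem snapshot repository (the location must be
# visible to every node in the cluster)
curl -XPUT 'http://localhost:9200/_snapshot/my_backup' -d '{
  "type": "fs",
  "settings": { "location": "/mnt/backups/my_backup" }
}'

# Take a snapshot of all indices and block until it completes
curl -XPUT 'http://localhost:9200/_snapshot/my_backup/snapshot_1?wait_for_completion=true'

# Restoring later would look like:
# curl -XPOST 'http://localhost:9200/_snapshot/my_backup/snapshot_1/_restore'
```

That still isn't transaction log replay, of course - it only gives you
point-in-time copies.)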
- Examples of data loss: upgrading Elasticsearch versions, I've ended up
losing all my data - no doubt through my own fault. Maybe I'd have been
more careful, and read the upgrade instructions more closely, if I'd known
that my data was not backed up in the relational database. But it is
definitely something that plays on my mind: "If I screw up or misunderstand
this upgrade process, then that's it - my data is gone."
So, I would probably add the following, although I could be wrong, because
I have not read every blog relating to ES upgrades:
- But how this is different from any other DB?
iii) There is no clear, consistent, well-documented process for upgrading
Elasticsearch versions, particularly when the underlying Lucene version
changes.
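To make point iii) concrete: the closest thing I've pieced together to a
safe node-by-node upgrade is something like the sketch below. This is my
own guess assembled from mailing list posts, not an official procedure -
which is exactly the problem I'm describing:

```shell
# 1. Disable shard reallocation so the cluster doesn't start
#    rebalancing shards while a node is down for the upgrade
curl -XPUT 'http://localhost:9200/_cluster/settings' -d '{
  "transient": { "cluster.routing.allocation.disable_allocation": true }
}'

# 2. Stop one node, upgrade the Elasticsearch package on it,
#    restart it, and wait for it to rejoin the cluster

# 3. Re-enable allocation and wait for green before touching the next node
curl -XPUT 'http://localhost:9200/_cluster/settings' -d '{
  "transient": { "cluster.routing.allocation.disable_allocation": false }
}'
```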
David.
On Tuesday, 14 January 2014 20:13:22 UTC, Eugene Strokin wrote:
You are correct. But how this is different from any other DB?
I guess the question is more like: if I'm running ES under normal
conditions, could the index get corrupted?
If it's a hardware issue and you have replication switched on, then you
wouldn't be affected much. Your system will continue functioning, but its
state will become yellow. You'd need to replace the node, and that's it.
Some people have claimed that they experienced sudden index corruption with
data loss. I myself have never seen anything like this. Even though I have
done a few stupid things, and had some near-heart-attack moments, the data
wasn't lost in the end - and again, I had nothing to blame but myself.
Regarding stability, I can say that ES has not given us any problems. I
have performed the following successfully on a production environment with
zero downtime:
- adding nodes and replication
- transitioning data to another data center
- adding more clients
Etc...
I'd really like to hear from people who have experienced data loss. If
someone would provide details, it would help us understand what went wrong
and what we should avoid doing.
But besides claims that such cases exist, I haven't heard anything else.
Eugene
--
You received this message because you are subscribed to the Google Groups "elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email to elasticsearch+unsubscribe@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/elasticsearch/5cff97f3-9541-4cba-a3c2-be0d8ad4440d%40googlegroups.com.
For more options, visit https://groups.google.com/groups/opt_out.