In the world of traditional databases, the transaction log is often kept on
separate devices / disk drives. This is reasonable because if a database
file gets corrupted, the transaction log must be in a safe place to
ensure full recovery. The idea is that database files contain saved
transactions plus a volatile part which may get corrupted and can be
repaired by replaying the transaction log. The concept is based on the ACID
paradigm.
From what I understand, ES translogs are for communicating with Lucene.
Each Lucene operation is preceded by a write to the translog. By doing
this, requests for Lucene operations are persisted and cannot be
accidentally dropped. Lucene itself does not know about transactional
operations and manages an internal cache in RAM. From time to time, the
Lucene segments are written to disk in append-only fashion (sync
operation). Lucene passes these operations to the JVM, and the JVM passes
them to the operating system.
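
To make the ordering described above concrete, here is a toy sketch in
Python. It is not ES or Lucene code: the class TinyEngine, the function
replay, and the one-JSON-record-per-line log format are all made up for
illustration, and the real engine may order or batch things differently.
It only shows the general write-ahead idea: persist the request first,
then apply it to the volatile in-memory state, so that the state can be
rebuilt from the log.

```python
import json
import os


class TinyEngine:
    """Toy stand-in for a translog plus an in-RAM buffer (hypothetical, not ES code)."""

    def __init__(self, log_path):
        self.log_path = log_path
        self.buffer = {}  # plays the role of the volatile in-memory state
        self.log = open(log_path, "a", encoding="utf-8")

    def index(self, doc_id, source):
        # 1. Persist the request to the log and force it to disk ...
        record = {"op": "index", "id": doc_id, "source": source}
        self.log.write(json.dumps(record) + "\n")
        self.log.flush()
        os.fsync(self.log.fileno())
        # 2. ... and only then apply it to the in-memory buffer.
        self.buffer[doc_id] = source

    def close(self):
        self.log.close()


def replay(log_path):
    """Rebuild the in-memory state from the log alone, in write order."""
    state = {}
    with open(log_path, encoding="utf-8") as fh:
        for line in fh:
            record = json.loads(line)
            if record["op"] == "index":
                state[record["id"]] = record["source"]
    return state
```

If the process dies after step 1 but before step 2, the request is still
in the log, and calling replay on that file yields the same state the
buffer would have held.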
I'm not sure if ES translogs can be used to repair Lucene indexes by
replaying them, as in the traditional database world. In low disk space
situations, my experience is that this may succeed, but it does not
always succeed. Often the ES translog is still intact and only the
Lucene files ran out of disk space, but it can also happen that both ES
and Lucene failed while trying to flush data to disk, and ES can recover
only after the translog file has been removed (at least in older ES
versions).
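
As a rough illustration of why a replay can fail after a full disk, here
is a small Python sketch. The record format (a CRC32 in hex, a space, then
JSON) and the names encode_record and replay_valid_prefix are my own
assumptions for the example, not the actual ES translog format. The point
is only that a record cut off in the middle of a write cannot be trusted,
so a replay can at best recover the operations before it.

```python
import json
import zlib


def encode_record(op):
    """Serialize one operation as '<crc32 hex> <json>' on a single line."""
    payload = json.dumps(op, sort_keys=True)
    crc = zlib.crc32(payload.encode("utf-8"))
    return f"{crc:08x} {payload}\n"


def replay_valid_prefix(lines):
    """Return (operations before the first damaged record, log_was_clean)."""
    ops = []
    for line in lines:
        checksum, sep, payload = line.rstrip("\n").partition(" ")
        try:
            ok = sep == " " and int(checksum, 16) == zlib.crc32(payload.encode("utf-8"))
        except ValueError:
            ok = False  # the checksum field itself is not valid hex
        if not ok:
            return ops, False  # stop here: the rest of the log cannot be trusted
        ops.append(json.loads(payload))
    return ops, True
```

With a log whose last record was truncated, replay_valid_prefix returns
the intact operations and False, which is the "may succeed, but not
always" situation in miniature: whatever was in the damaged tail is gone
unless another copy exists elsewhere.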
The gateway concept is more similar to transactional processing. At node
startup, the gateway orchestrates the shards of an index and invokes
recovery with help from replica shards in order to recreate a valid index.
My conclusion is: it would not hurt to have a configuration for maintaining
ES translogs on separate disk devices, but I doubt it is worth the effort.
From a performance point of view, under high disk I/O contention, separate
disks might help to increase disk IOPS, but with SSDs this has become a
non-issue. ES translogs alone do not ensure the recoverability of a whole
index. The ES approach to recovering an index successfully is different
from that of traditional databases: it relies on replica shards on
remote nodes rather than on a separately stored log.
Jörg
On Wed, Dec 11, 2013 at 8:34 AM, David Pilato david@pilato.fr wrote:
- Is it a good practice to store transaction log and index data in
separate storage, just to be safe? If so, what property controls this
behavior?
Not sure you can, or whether it's a best practice. I have never heard about
it, so I think you should not worry about it.
What do others think?