Details of how transaction log is managed during indexing

Amit_Soni · December 11, 2013, 3:35am

Hi - I am curious to know the specific detail (sequence of steps involved)
when an indexing request is sent to elastic search and how transaction logs
are managed? It would be great if anyone can point me to a right article
which covers this stuff in good level of details.

For instance, a few questions that I am hoping to get answered are:

In the default (quorum) mode, is it the master node which determines
which of the shards have to be part of the quorum? Or is this figured out
by the elastic search node client (which is aware of cluster state and sits
in the client application code)
Once the shards are identified where write would happen, is the
request returned with OK status soon after the stuff is written to
transaction log?
Is transaction log maintained at shard level (each shard has its own
transaction log entry)
Is it a good practice to store transaction log and index data in
separate storage, just to be safe? If so, what property controls this
behavior?

-Amit.

--
You received this message because you are subscribed to the Google Groups "elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email to elasticsearch+unsubscribe@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/elasticsearch/CAAOGaQ%2B088fDmR72V7TORD%2BLw5mmD5bMcASMWuN-gfw_x5msgg%40mail.gmail.com.
For more options, visit https://groups.google.com/groups/opt_out.

dadoonet · December 11, 2013, 7:34am

Hey Amit!

Some answers inlined.

--
David
Twitter : @dadoonet / @elasticsearchfr / @scrutmydocs

Le 11 déc. 2013 à 04:35, Amit Soni amitsoni29@gmail.com a écrit :

Hi - I am curious to know the specific detail (sequence of steps involved) when an indexing request is sent to Elasticsearch and how transaction logs are managed? It would be great if anyone can point me to a right article which covers this stuff in good level of details.

For instance, a few questions that I am hoping to get answered are:

In the default (quorum) mode, is it the master node which determines which of the shards have to be part of the quorum? Or is this figured out by the Elasticsearch node client (which is aware of cluster state and sits in the client application code)
When an index request happen, elasticsearch send it to all shards (primary first and then all replicas) whatever the replication mode is.
Elasticsearch won't fail if a minimum of shards (quorum by default) answers OK.
Once the shards are identified where write would happen, is the request returned with OK status soon after the stuff is written to transaction log?
Yes. The primary shard can fail just after sending IndexResponse OK and elasticsearch needs to promote a replica as a primary with a synchronized transaction log.
Is transaction log maintained at shard level (each shard has its own transaction log entry)
Yes.
Is it a good practice to store transaction log and index data in separate storage, just to be safe? If so, what property controls this behavior?
Not sure you can and if it's a best practice. I have never heard about it so I think you should not worry about it.

What others think?

--
You received this message because you are subscribed to the Google Groups "elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email to elasticsearch+unsubscribe@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/elasticsearch/41C9CD54-2880-48EC-8D53-BDD652E198EF%40pilato.fr.
For more options, visit https://groups.google.com/groups/opt_out.

jprante · December 11, 2013, 9:58am

In the world of traditional database, the transaction log is often kept on
separate devices / disk drives. This is reasonable because if a database
file gets corrupted, the transaction log must be kept in a safe place to
ensure full recovery. The idea is that database files contain saved
transactions and a volatile part which may get corrupted and can be
repaired by replaying the transaction log. The concept is based on the ACID
paradigm.

From what I understand, ES translogs are for communicating with Lucene.
Each Lucene operation is preceded by a write to the translog. By doing
this, request for Lucene operations are persisted and can not be
accidentally dropped. Lucene also does not know about transactional
operations and manages an internal cache in RAM. From time to time, the
Lucene segments are written to disk in append-only fashion (sync
operation). Lucene passes this operations to the JVM and the JVM passes it
to the operating system.

I'm not sure if ES translogs can be used to repair Lucene indexes by
replaying it, similar to the traditional database world. In case of low
disk situations, I have the experience it may succeed, but it does not
always succeed. The situation is often that ES translog is still intact and
Lucene files ran into out of disk space, but it can also occur that both ES
and Lucene tried to flush data to disk in failure, and ES can recover only
after the translog file got removed (at least in older ES versions).

The gateway concept is more similar to transactional processing. At node
startup, the gateway orchestrates the shards of an index and invokes
recovery with help from replica shards in order to recreate a valid index.

My conclusion is: it would not hurt to have a configuration for maintaining
ES translogs on separate disk devices, but I doubt it is worth the effort.
From a performance point of view, under high disk I/O contention, separate
disks might help to enhance disk IOPS, but with SSD, this became a
non-issue. ES translogs alone do not ensure the recoverability of a whole
index. The ES design for separation to recover an index successfully is
different from traditional databases. That is where replica shards on
remote nodes come into play.

Jörg

On Wed, Dec 11, 2013 at 8:34 AM, David Pilato david@pilato.fr wrote:

Is it a good practice to store transaction log and index data in
separate storage, just to be safe? If so, what property controls this
behavior?

Not sure you can and if it's a best practice. I have never heard about it
so I think you should not worry about it.

What others think?

--
You received this message because you are subscribed to the Google Groups "elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email to elasticsearch+unsubscribe@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/elasticsearch/CAKdsXoHMTvTUNnzLNQ_JqWBoHCJgfsyES7ozKb7FVHr5ggHCmQ%40mail.gmail.com.
For more options, visit https://groups.google.com/groups/opt_out.

Jason_Wee · December 11, 2013, 1:32pm

Very informative answers. I may have similar question to this, when a node
that hold the primary shard goes down, what is the interval of the replica
shard come online or is there any way to know it?

Thank you.

Jason

On Wed, Dec 11, 2013 at 5:58 PM, joergprante@gmail.com <
joergprante@gmail.com> wrote:

In the world of traditional database, the transaction log is often kept on
separate devices / disk drives. This is reasonable because if a database
file gets corrupted, the transaction log must be kept in a safe place to
ensure full recovery. The idea is that database files contain saved
transactions and a volatile part which may get corrupted and can be
repaired by replaying the transaction log. The concept is based on the ACID
paradigm.

From what I understand, ES translogs are for communicating with Lucene.
Each Lucene operation is preceded by a write to the translog. By doing
this, request for Lucene operations are persisted and can not be
accidentally dropped. Lucene also does not know about transactional
operations and manages an internal cache in RAM. From time to time, the
Lucene segments are written to disk in append-only fashion (sync
operation). Lucene passes this operations to the JVM and the JVM passes it
to the operating system.

I'm not sure if ES translogs can be used to repair Lucene indexes by
replaying it, similar to the traditional database world. In case of low
disk situations, I have the experience it may succeed, but it does not
always succeed. The situation is often that ES translog is still intact and
Lucene files ran into out of disk space, but it can also occur that both ES
and Lucene tried to flush data to disk in failure, and ES can recover only
after the translog file got removed (at least in older ES versions).

The gateway concept is more similar to transactional processing. At node
startup, the gateway orchestrates the shards of an index and invokes
recovery with help from replica shards in order to recreate a valid index.

My conclusion is: it would not hurt to have a configuration for
maintaining ES translogs on separate disk devices, but I doubt it is worth
the effort. From a performance point of view, under high disk I/O
contention, separate disks might help to enhance disk IOPS, but with SSD,
this became a non-issue. ES translogs alone do not ensure the
recoverability of a whole index. The ES design for separation to recover an
index successfully is different from traditional databases. That is where
replica shards on remote nodes come into play.

Jörg

On Wed, Dec 11, 2013 at 8:34 AM, David Pilato david@pilato.fr wrote:

Is it a good practice to store transaction log and index data in
separate storage, just to be safe? If so, what property controls this
behavior?

Not sure you can and if it's a best practice. I have never heard about it
so I think you should not worry about it.

What others think?

--
You received this message because you are subscribed to the Google Groups
"elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an
email to elasticsearch+unsubscribe@googlegroups.com.
To view this discussion on the web visit
https://groups.google.com/d/msgid/elasticsearch/CAKdsXoHMTvTUNnzLNQ_JqWBoHCJgfsyES7ozKb7FVHr5ggHCmQ%40mail.gmail.com
.

For more options, visit https://groups.google.com/groups/opt_out.

--
You received this message because you are subscribed to the Google Groups "elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email to elasticsearch+unsubscribe@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/elasticsearch/CAHO4itwrDsiH2vLA-mZHc0%2BpCCbFPQcDMQS84qDsfRz-n2vfuw%40mail.gmail.com.
For more options, visit https://groups.google.com/groups/opt_out.

nik9000 · December 11, 2013, 2:02pm

On Wed, Dec 11, 2013 at 4:58 AM, joergprante@gmail.com <
joergprante@gmail.com> wrote:

In the world of traditional database, the transaction log is often kept on
separate devices / disk drives. This is reasonable because if a database
file gets corrupted, the transaction log must be kept in a safe place to
ensure full recovery. The idea is that database files contain saved
transactions and a volatile part which may get corrupted and can be
repaired by replaying the transaction log. The concept is based on the ACID
paradigm.

Normally you actually ship them off system if you really want recovery.
I'll describe what I know about PostgreSQL:

Once the data files are successfully synchronized (think fsync) then
the transaction log containing those operations can usually be recycled.
If you want the ability to recover from some disaster (think meteor)
what you do is bundle those logs when it is time to recycle them and ship
them off site (think gzip and scp).
You can replay those logs on top of a restored backup to get a
reasonably recent copy of the database. It is also possible to replay only
a portion of the logs to restore a point in time.
If there is a problem before the data files are synced (think power
failure) then the transaction logs can be replayed on top of the data files
during server start up. The transaction logs are synced by definition of
what a transaction is (unless you've relaxed it with settings).
In the world of raid you aren't likely to lose the whole
disk, but corrupting a file is totally possible if you write to it and lose
power at the same time. If this happens while writing to the transaction
log then the transaction wasn't committed. If this happens while writing
to the data files the transaction log will recover the write on startup.

Normally moving transaction logs to another disk is a performance thing -
but only in very specific disk layouts. The thing is that the transaction
log is always written forwards while the data is written randomly and being
read as well. Too few disks and it isn't worth doing because you'll waste
2 whole disks on the transaction log. Too many and you've probably already
bought a batter backed raid card or SAN or something which smooths out the
always forward writes to the point where you won't see any benefit for the
trouble of partitioning. The battery backed raid card or SAN should be
able to recover from a power failure by replying its write ahead cache on
startup, thus keeping anything from getting corrupted.

My conclusion is: it would not hurt to have a configuration for maintaining

ES translogs on separate disk devices, but I doubt it is worth the effort.
From a performance point of view, under high disk I/O contention, separate
disks might help to enhance disk IOPS, but with SSD, this became a
non-issue. ES translogs alone do not ensure the recoverability of a whole
index. The ES design for separation to recover an index successfully is
different from traditional databases. That is where replica shards on
remote nodes come into play.

From a recoverability standpoint I'd rely on restoring actions from a
replica in the case of single node failure and replaying the actions from
another system of record for whole cluster failures.

Nik

--
You received this message because you are subscribed to the Google Groups "elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email to elasticsearch+unsubscribe@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/elasticsearch/CAPmjWd1STO_CN0d6EPOC-CxPd8jCBnrVTOfhNuP%3D%2BffsfjAFfg%40mail.gmail.com.
For more options, visit https://groups.google.com/groups/opt_out.

jprante · December 11, 2013, 2:06pm

The switchover from primary to replica shards is seamless for each node,
there is no interval, the local cluster state routing table holds the info
for all known shards of the cluster.

If a node goes down and a shard of that node is accessed, this raises an
error, and the failed access is rerouted to another node by the help of the
local cluster state routing table.

Jörg

--
You received this message because you are subscribed to the Google Groups "elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email to elasticsearch+unsubscribe@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/elasticsearch/CAKdsXoG9tXT9QgA%3D1gfZNvOr3HF%3DaZtv-zzt0yVYBwEe7OaaUQ%40mail.gmail.com.
For more options, visit https://groups.google.com/groups/opt_out.

ananth · December 30, 2013, 11:29am

Words from shay.banon regarding transaction logs in ElasticSearch.

kindly refer
: http://elasticsearch-users.115913.n3.nabble.com/Permanent-and-selective-transaction-log-td886751.html

The aim of the transaction log in elasticsearch is to basically make sure
that once you index data into elasticsearch it will be there, without the
need to perform a "commit" on lucene each time (which will kill
performance). This means that when a recovery happens, either from gateway
or from another node, the actual index files are recovered, and then the
transaction log is replayed. When an actual flush (in elasticsearch lingo)
happens, a commit on Lucene is performed and a new transaction log is
created.

--
You received this message because you are subscribed to the Google Groups "elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email to elasticsearch+unsubscribe@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/elasticsearch/d610b5c6-9913-498f-820c-f8e45eefa5b7%40googlegroups.com.
For more options, visit https://groups.google.com/groups/opt_out.

Topic		Replies	Views
What is a transaction during the indexing request to E.S ? What is the transaction during the search query request to E.S? Elasticsearch	2	311	July 6, 2017
A request life in elasticsearch Elasticsearch	2	1932	July 5, 2017
Persistent transaction log Elasticsearch	3	725	July 6, 2017
Simple persistence question Elasticsearch	2	273	July 6, 2017
Shard allocation with Multi Index in a cluster Elasticsearch	3	331	July 6, 2017

Details of how transaction log is managed during indexing

Related Topics