Async index.translog.durability

Alex_Davidovich · July 18, 2017, 1:36pm

Why is it not recommended to change the index.translog.durability setting to "async"? I mean, apart from losing data in more cases.
In addition, can it increase the chance of translog corruption in any way? Is it uncommon to use this setting?

Alex_Davidovich · July 20, 2017, 5:52am

Anybody?

polyfractal · July 20, 2017, 8:33pm

This is basically the reason

By making writes to the translog async, ES can no longer guarantee that the document was actually persisted when it returns "OK" to the client. It undermines some of the fundamental consistency guarantees of the system.

It doesn't increase the chance of corruption per se, but it does increase the chance of losing documents due to transient errors, power outages, disk hiccups, etc.
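To make the "OK but not persisted" gap concrete, here is a toy model (hypothetical, not Elasticsearch code) of which acknowledged operations survive a crash. It assumes the default "request" durability fsyncs before acknowledging each operation, while "async" acknowledges immediately and fsyncs on a fixed interval. The function name and its parameters are made up for illustration.

```python
# Toy model, NOT Elasticsearch code: which acknowledged operations are lost
# if the node crashes? Names and parameters here are hypothetical.

def lost_on_crash(op_times, crash_time, durability, sync_interval=5.0):
    """Count acknowledged operations that were never fsynced before a crash.

    op_times:   arrival times (seconds) of index operations.
    durability: "request" = fsync before acknowledging each operation;
                "async"   = acknowledge immediately, fsync every sync_interval.
    """
    # Only operations that arrived before the crash were acknowledged.
    acked = [t for t in op_times if t < crash_time]
    if durability == "request":
        # Every acknowledged operation was already on disk when "OK" was sent.
        return 0
    if durability == "async":
        # Time of the last background fsync at or before the crash.
        last_sync = (crash_time // sync_interval) * sync_interval
        # Operations acknowledged since then existed only in memory.
        return sum(1 for t in acked if t >= last_sync)
    raise ValueError(f"unknown durability: {durability}")


ops = [i * 0.5 for i in range(20)]  # one operation every 0.5 s, t = 0.0 .. 9.5
print(lost_on_crash(ops, 7.2, "request"))  # 0
print(lost_on_crash(ops, 7.2, "async"))    # 5 (t = 5.0, 5.5, 6.0, 6.5, 7.0)
```

In both modes the client got "OK" for all 15 operations before the crash; only the async mode can silently drop some of them. The model ignores replicas, which in a real cluster may still hold those operations.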

For that reason, and because the default (synchronous) mode doesn't add much overhead, it is uncommon to change. I would not recommend changing it.

Alex_Davidovich · July 21, 2017, 5:57am

I mean, can this setting increase the chance that ES fails to recover after a power outage? Maybe because of partial writes to the translog? Or could it cause a problem so big that neither deleting the translog nor running the translog tool would help?
We want to know whether ES keeps the translog consistent at all times for recovery.
In addition, assuming that losing the last interval of data is not a problem, would you still not recommend it?
Thanks

polyfractal · July 21, 2017, 1:33pm

I don't believe it has any effect on corruption of the translog itself.

The async parameter simply tells Elasticsearch to buffer operations in memory and fsync on a regular interval (index.translog.sync_interval, default 5s), rather than fsyncing after each request.
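For reference, a sketch of an index-creation request body that sets both translog settings. It only builds and prints the JSON; "my-index" is a placeholder name, and the choice to set both values at index creation (rather than updating a live index) is a deliberate assumption here, because whether sync_interval can be changed on an open index may depend on the Elasticsearch version.

```python
import json

# Body for creating an index with async translog durability.
# "my-index" is a placeholder; send this as: PUT /my-index
create_index_body = {
    "settings": {
        "index": {
            "translog": {
                "durability": "async",   # default is "request"
                "sync_interval": "5s",   # background fsync period (default 5s)
            }
        }
    }
}

body = json.dumps(create_index_body, indent=2)
print("PUT /my-index")
print(body)
```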

Once an fsync is called, we rely on the OS and hardware to properly persist the data to disk. Of course, it's possible something goes wrong in this process, so we have checksums to verify the integrity of the translog on restarts. But the chance of spurious corruption is unrelated to the async parameter.
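On the partial-write question, the general idea behind checksummed logs can be sketched as follows. The record layout below (length prefix, payload, CRC32) is hypothetical and is NOT Elasticsearch's actual translog format; it only illustrates how a reader can tell a torn or corrupted tail apart from valid records, regardless of when the last fsync happened.

```python
import struct
import zlib

# Hypothetical framing, NOT the real translog format:
# [4-byte big-endian payload length][payload][4-byte big-endian CRC32 of payload]

def encode_record(payload: bytes) -> bytes:
    return struct.pack(">I", len(payload)) + payload + struct.pack(">I", zlib.crc32(payload))


def read_records(data: bytes):
    """Return (valid_payloads, clean); clean is False if reading stopped at a
    truncated or checksum-mismatched record."""
    records = []
    pos = 0
    while pos < len(data):
        if pos + 4 > len(data):
            return records, False  # length header itself is cut short
        (length,) = struct.unpack_from(">I", data, pos)
        end = pos + 4 + length + 4
        if end > len(data):
            return records, False  # torn write: record is incomplete
        payload = data[pos + 4 : pos + 4 + length]
        (stored_crc,) = struct.unpack_from(">I", data, pos + 4 + length)
        if zlib.crc32(payload) != stored_crc:
            return records, False  # bytes on disk do not match the checksum
        records.append(payload)
        pos = end
    return records, True


log = encode_record(b"doc1") + encode_record(b"doc2")
torn = log + encode_record(b"doc3")[:5]  # simulate power loss mid-write
print(read_records(log))   # ([b'doc1', b'doc2'], True)
print(read_records(torn))  # ([b'doc1', b'doc2'], False)
```

The checksum detects damage; it does not prevent it, which matches the point that corruption risk comes from the OS and hardware rather than from the async setting.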

We do everything we can on our side, in software, to make sure the translog is consistent. But hiccups happen due to hardware or kernel problems, so it isn't bulletproof (but that's also why we have replicas in the cluster, etc).

If you're OK with losing the last interval of data on power loss and want a slightly higher indexing rate, you could consider enabling async. Personally, I don't think it's worth the hassle: if you are using bulk requests, the fsync is only called at the end of each bulk request, so async doesn't gain you much.
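The bulk argument is easy to check with rough arithmetic. The sketch below assumes a single shard, one fsync per write request under "request" durability, and one background fsync per sync interval under "async"; the function names and the 100,000-documents-in-60-seconds workload are illustrative assumptions, not measurements.

```python
import math

def fsyncs_request(num_docs: int, docs_per_request: int) -> int:
    # "request" durability (simplified, single shard): one fsync per write
    # request, so a bulk request is fsynced once after all its operations.
    return math.ceil(num_docs / docs_per_request)


def fsyncs_async(duration_s: float, sync_interval_s: float = 5.0) -> int:
    # "async" durability: roughly one background fsync per interval,
    # independent of how many documents arrive.
    return math.ceil(duration_s / sync_interval_s)


# Hypothetical workload: 100,000 documents indexed over 60 seconds.
print(fsyncs_request(100_000, 1))      # 100000 (single-document requests)
print(fsyncs_request(100_000, 1_000))  # 100 (bulk requests of 1,000 docs)
print(fsyncs_async(60))                # 12
```

Going from single-document requests to bulks removes almost all of the fsyncs; switching the bulk workload to async only trims the remaining handful, while giving up the durability guarantee.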

You can read more in the "Making Changes Persistent" chapter of Elasticsearch: The Definitive Guide [2.x] (search for "How safe is translog").

Alex_Davidovich · July 21, 2017, 2:37pm

Thank you very much

system · August 18, 2017, 2:51pm

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.
