Confused about 'index.translog.durability'

Hey guys, I'm learning about the index.translog.durability parameter and got confused.
As I understand it, setting this parameter to async means Elasticsearch uses buffered I/O: data is cached in the page cache and fsynced to disk every index.translog.sync_interval, while request means direct I/O?
Is that right?

@KellyK313

We do not really change whether we use buffered I/O or direct I/O. When using async durability, some data will typically still reside in Elasticsearch until sync time. And the data that has already been flushed to the channel is explicitly disregarded when recovering, since no "checkpoint" (a small atomic file update) has been made for it.

So if Elasticsearch or the underlying host crashes with async durability on, all changes since the last periodic sync will be lost.
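
For reference, here is a minimal Python sketch of the settings body being discussed. The helper name `translog_settings` and the index name `my-index` are hypothetical; `index.translog.durability` and `index.translog.sync_interval` are the two settings named in this thread. The resulting JSON would be sent with `PUT /my-index/_settings`.

```python
import json


def translog_settings(durability="request", sync_interval=None):
    """Build a body for PUT /<index>/_settings (hypothetical helper)."""
    if durability not in ("request", "async"):
        raise ValueError("durability must be 'request' or 'async'")
    body = {"index.translog.durability": durability}
    if sync_interval is not None:
        # Only relevant to the periodic sync, e.g. "20s" as tested later in the thread.
        body["index.translog.sync_interval"] = sync_interval
    return body


# Body for async durability with a 20 second sync interval.
print(json.dumps(translog_settings("async", "20s"), indent=2))
```

Calling `translog_settings()` with no arguments gives the default, request durability.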

So, translog data written within index.translog.sync_interval resides in Elasticsearch and uses JVM heap?
Some tuning articles say that using async and increasing index.translog.sync_interval will speed up indexing. Is that true?

It is buffered in heap and flushed to the OS buffers when the on-heap buffer is full, so it will not consume a lot of heap.
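
The behaviour described so far can be sketched as a toy model in Python. This is only an illustration of the explanation in this thread, not Elasticsearch's actual implementation: the class name, the buffer capacity counted in operations, and the three lists are all assumptions made for the example.

```python
class ToyTranslog:
    """Toy model only; not Elasticsearch's real translog."""

    def __init__(self, heap_buffer_ops=4):
        self.heap_buffer_ops = heap_buffer_ops  # assumed capacity, in operations
        self.heap = []      # operations still in the on-heap buffer
        self.os_cache = []  # written to the file channel, not yet fsynced
        self.durable = []   # covered by the last fsync and checkpoint

    def add(self, op, durability="request"):
        self.heap.append(op)
        if durability == "request":
            self.sync()  # fsync before the request is acknowledged
        elif len(self.heap) >= self.heap_buffer_ops:
            self._flush_heap()  # buffer full: hand off to the OS, no fsync

    def _flush_heap(self):
        self.os_cache.extend(self.heap)
        self.heap.clear()

    def sync(self):
        """Per-request or periodic sync: flush, fsync, write checkpoint."""
        self._flush_heap()
        self.durable.extend(self.os_cache)
        self.os_cache.clear()

    def recover_after_crash(self):
        """Only operations covered by the last checkpoint are replayed."""
        return list(self.durable)


log = ToyTranslog(heap_buffer_ops=2)
for op in ["a", "b", "c"]:
    log.add(op, durability="async")
# "a" and "b" reached the OS cache, "c" is still on heap, nothing is durable.
print(log.recover_after_crash())  # []
log.sync()  # the periodic sync
print(log.recover_after_crash())  # ['a', 'b', 'c']
```

In this model a crash before the periodic sync loses all three operations, including the two that were already written to the channel, which matches the point about the missing checkpoint.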

It is possible that async durability will speed up indexing compared to request durability. How much depends on how much time is spent fsync'ing and also on how many parallel indexing requests are sent.

If most of the time is spent analysing and indexing fields, the performance benefit will be small.
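
As a back-of-the-envelope illustration of this point, an Amdahl's-law style bound can be computed in Python. This is a rough sketch, not a measurement: it assumes fsync time sits entirely on the critical path and would vanish completely with async durability, so it only gives an upper limit. The function name `max_speedup` is hypothetical.

```python
def max_speedup(fsync_fraction):
    """Upper bound on indexing speedup if all fsync wait time disappeared.

    fsync_fraction is the (assumed) share of indexing time spent waiting
    on fsync, between 0.0 and 1.0 (exclusive).
    """
    if not 0.0 <= fsync_fraction < 1.0:
        raise ValueError("fsync_fraction must be in [0.0, 1.0)")
    return 1.0 / (1.0 - fsync_fraction)


for fraction in (0.05, 0.2, 0.5):
    print(f"{fraction:.0%} of time in fsync: at most {max_speedup(fraction):.2f}x")
```

If only 5% of the time goes to fsync, the best possible gain is about 1.05x, which would be hard to distinguish from noise in a benchmark.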

Looking at GET _nodes/hot_threads while indexing runs can give you some hints on whether sync is a big part of it, but running a controlled benchmark with both setups would give you a more precise answer.
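
A minimal Python sketch for pulling the hot threads report while indexing runs. It assumes an unsecured node at `http://localhost:9200`; the function names are hypothetical, and counting lines that mention "translog" is only a crude heuristic for spotting translog activity in the stack traces, not an official metric.

```python
import urllib.request


def fetch_hot_threads(base_url="http://localhost:9200", threads=10):
    """Call GET _nodes/hot_threads and return the plain-text report."""
    url = f"{base_url}/_nodes/hot_threads?threads={threads}"
    with urllib.request.urlopen(url, timeout=10) as resp:
        return resp.read().decode("utf-8")


def count_matching_lines(report, needles=("translog",)):
    """Count report lines mentioning any needle (case-insensitive heuristic)."""
    lowered = [n.lower() for n in needles]
    return sum(
        1
        for line in report.splitlines()
        if any(n in line.lower() for n in lowered)
    )


# Usage while a bulk indexing run is in progress (needs a reachable cluster):
# report = fetch_hot_threads()
# print(count_matching_lines(report), "lines mention the translog")
```

Sampling this a few times during a run gives a hint only; comparing throughput between the two durability settings in a controlled benchmark remains the more precise answer.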

Many thanks for the kind reply.
Q1: Since it is buffered in heap, is there a parameter to control what percentage of the heap it may use?
Q2: Is the reason async can speed up indexing that it reduces the number of fsyncs, and therefore the pressure on disk I/O, leaving more I/O for merges and refreshes?

@HenningAndersen
Hey, I have run some tests, and the results show that request or async does not affect the indexing speed.
Elasticsearch version: 7.7.0
Client: High Level REST Client
Bulk size: 10M
Concurrent thread count: 10
(1) Default translog parameter (request)
Image
(2) Use async
Image 2
(3) Use async and sync_interval: 20s
Image 3

With these three different settings, and also when I tried 30 concurrent threads, the indexing speed is really the same.
Does this mean that when the disk is fast enough (SSD), we don't need to change this parameter to get higher indexing throughput?

If the scenario setup here contains primary elements from your production scenario, it looks like async durability offers no benefits to that specific scenario.

Is the bulk size of 10M (I assume that means 10 MB) typical? I think the large bulk size and the relatively low concurrency here are part of the reason why you do not see any effect of this. But if this mimics what you expect in production, it is perfectly fine.

It is hard to guess at bottlenecks without a deeper investigation. I would recommend double checking that the network between your client/simulation and Elasticsearch is not saturated and that the client machine simulating the workload is not overloaded. It also makes sense to check that you do see high CPU usage on the Elasticsearch nodes while running this. If you have not read it already, the blog post about benchmarking Elasticsearch is definitely worth a read.

I also want to mention that there are other ways to tune indexing, like optimizing mappings for indexing speed.
