All,
We are running an Elasticsearch indexing load (48K documents per minute on Elasticsearch 6.5) for 24 hours. There are 32 active primary shards for each 8-hour period (no replica shards). Here is what we see:
- The load runs well at the beginning.
- About 6 hours later, there is heavy disk I/O, which delays _bulk requests significantly. Then many documents fail to be written to Elasticsearch.
We believe the high disk I/O, caused by segment merges, occupied most of the available I/O time. We changed the following settings to reduce merge activity (applied roughly as in the example after this list):
- index.merge.policy.floor_segment: 8mb (default is 2mb)
- index.merge.policy.segments_per_tier: 15 (default is 10)
- index.merge.policy.max_merged_segment: 1gb (default is 5gb)
- index.merge.scheduler.max_thread_count: 1 (default is 3)
- index.refresh_interval: 120s
- index.translog.durability: async
- index.translog.sync_interval: 120s
- 4 data paths per data node (configured via path.data).
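For reference, here is roughly how we set these when creating each index (the index name below is just a placeholder; the 4 data paths are configured separately via path.data in elasticsearch.yml, not per index):

PUT load-test-000001
{
  "settings": {
    "index.number_of_shards": 32,
    "index.number_of_replicas": 0,
    "index.refresh_interval": "120s",
    "index.translog.durability": "async",
    "index.translog.sync_interval": "120s",
    "index.merge.policy.floor_segment": "8mb",
    "index.merge.policy.segments_per_tier": 15,
    "index.merge.policy.max_merged_segment": "1gb",
    "index.merge.scheduler.max_thread_count": 1
  }
}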
After that, taking one index as an example, we can see the "disk amplification" (write amplification) is about 1.96, i.e. the total bytes written by merges (39,838,526,820) divided by the store size (20,333,529,464):
"store" : {
"size_in_bytes" : 20333529464
},
"merges" : {
"current" : 0,
"current_docs" : 0,
"current_size_in_bytes" : 0,
"total" : 149,
"total_time_in_millis" : 9815076,
"total_docs" : 17613481,
"total_size_in_bytes" : 39838526820,
"total_stopped_time_in_millis" : 0,
"total_throttled_time_in_millis" : 6424989,
"total_auto_throttle_in_bytes" : 20971520
}
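(These numbers come from the index stats API; assuming the placeholder index name from the example above, the request would be something like

GET load-test-000001/_stats/store,merge

and the amplification figure is merges.total_size_in_bytes / store.size_in_bytes.)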
My question is:
Is there anything else we can do to optimize segment merging and save disk I/O for _bulk requests, beyond the settings listed above?
Thanks much,
Jill