Optimize Elasticsearch segment merging


We are running an Elasticsearch 6.5 indexing load (48K documents per minute) for 24 hours. There are 32 active primary shards for each 8-hour period (no replica shards). Here is what we see:

  1. The load runs well at the beginning.
  2. About 6 hours later, there is heavy disk I/O, which causes long delays for _bulk requests. Many documents then fail to be written to Elasticsearch.

We believe the high disk I/O is caused by segment merging, which takes up most of the available disk time. We made the following changes to reduce merging:

  1. index.merge.policy.floor_segment: 8mb (default is 2mb)
  2. index.merge.policy.segments_per_tier: 15 (default is 10)
  3. index.merge.policy.max_merged_segment: 1gb (default is 5gb)
  4. index.merge.scheduler.max_thread_count: 1 (default is 3)
  5. index.refresh_interval: 120s
  6. index.translog.durability: async
  7. index.translog.sync_interval: 120s
  8. 4 data paths for each data node.
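
For anyone who wants to reproduce this, here is a rough sketch of how index-level settings 1 to 7 above could be written as a create-index request body. The index name is a made-up placeholder, and whether each setting can also be changed on a live index in 6.5 is not something this sketch checks. Item 8 is a node-level setting and is not included.

```python
import json

# Hypothetical index name, used only for illustration.
INDEX_NAME = "my-index"

# Index-level settings 1-7 from the list above, nested the way a
# create-index request body expects them.
create_index_body = {
    "settings": {
        "index": {
            "merge": {
                "policy": {
                    "floor_segment": "8mb",
                    "segments_per_tier": 15,
                    "max_merged_segment": "1gb",
                },
                "scheduler": {"max_thread_count": 1},
            },
            "refresh_interval": "120s",
            "translog": {"durability": "async", "sync_interval": "120s"},
        }
    }
}

# Print the request so it can be pasted into a REST client.
print(f"PUT /{INDEX_NAME}")
print(json.dumps(create_index_body, indent=2))
```

The script only prints the request; sending it is left to whatever client is already in use.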

After that, taking one index as an example, the "disk amplification" (merges total_size_in_bytes divided by store size_in_bytes) is about 1.96.

  "store" : {
    "size_in_bytes" : 20333529464
  },
  "merges" : {
    "current" : 0,
    "current_docs" : 0,
    "current_size_in_bytes" : 0,
    "total" : 149,
    "total_time_in_millis" : 9815076,
    "total_docs" : 17613481,
    "total_size_in_bytes" : 39838526820,
    "total_stopped_time_in_millis" : 0,
    "total_throttled_time_in_millis" : 6424989,
    "total_auto_throttle_in_bytes" : 20971520
  }

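As a quick check on the stats above, the sketch below recomputes the amplification ratio from the excerpt and also looks at how much of the merge time was throttled. The throttled share assumes that total_throttled_time_in_millis is counted within total_time_in_millis, which is an interpretation rather than something the excerpt confirms.

```python
# Figures copied from the index stats excerpt above.
store_size_bytes = 20333529464
merged_bytes = 39838526820
merge_count = 149
merge_time_ms = 9815076
throttled_time_ms = 6424989

# Bytes rewritten by merges per byte currently stored on disk.
write_amplification = merged_bytes / store_size_bytes

# Share of merge time spent throttled (assumption: throttled time is
# a subset of total merge time).
throttled_fraction = throttled_time_ms / merge_time_ms

# Average amount of data rewritten per merge, in MiB.
avg_merge_mib = merged_bytes / merge_count / (1024 * 1024)

print(f"bytes merged per byte stored: {write_amplification:.2f}")
print(f"share of merge time throttled: {throttled_fraction:.0%}")
print(f"average merge size: {avg_merge_mib:.0f} MiB")
```

If that interpretation holds, merges spent well over half of their time throttled.
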
My question is:
Is there anything else we can do, besides what I listed, to optimize segment merging and free up disk I/O for _bulk requests?

Thanks much,

It looks like you are using slow storage, as there is a lot of throttling. Indexing is an I/O-intensive process, and I am not sure how much you will gain from tuning merging. I would recommend watching this video and trying to upgrade to faster storage.

Thanks very much for the info. Actually, the disk hardware is out of our control for now. Let's assume the storage cannot be changed. Under that condition, is there anything we can do to optimize segment merging?

It seems you have already done most of what I would suggest. I am not sure you would see any major gains from further tuning.

Thanks anyway. Since we are using cloud resources, the hardware is out of our hands.
