I don't understand why force-merging a shard with about 8GB of segments (total) down to a single ~3GB segment should take about 6 minutes, while all the machine's indicators (IO/CPU/memory/GC) stay very low (SSD, 32 cores, 30GB JVM heap, 64GB RAM, swap disabled).
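For reference, this is roughly how I watch the merge while it runs (Dev Tools syntax, nothing custom):

```
# active / queued force-merge threads per node
GET _cat/thread_pool/force_merge?v&h=node_name,name,size,active,queue

# what the busiest threads are doing while the merge runs
GET _nodes/hot_threads?threads=5
```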
I have tried the following configurations (the concrete settings are sketched after the list):
- set thread_pool.force_merge.size to 4. I understand that a merge runs on one thread per shard, but there is still only one merge thread (seen through remote debugging) after I put two shards on one ES node, and the merge time is twice that of a single shard. What did I miss?
- set the data directory to /dev/shm, but the merge time doesn't seem to have changed much.
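Concretely, these are the node-level settings I changed (static settings in elasticsearch.yml, applied with a restart; the /dev/shm path below is just an example path, not my real one):

```yaml
# elasticsearch.yml on each node
thread_pool.force_merge.size: 4   # bump the force_merge pool from its default single thread

# second experiment only: put the data path on tmpfs to rule out disk IO
path.data: /dev/shm/es-data       # example path
```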
BTW, here is my scenario (ES 7.7.1; 3 ES nodes on 3 separate machines; one index with 6 shards and 0 replicas, so each node holds two shards). The steps are below, with the corresponding calls sketched after the list:
- create the index and set the refresh interval to -1
- bulk index into ES
- stop writing, call refresh, and set the refresh interval to 60s
- force merge down to 1 segment by calling _forcemerge?max_num_segments=1
- wait until every shard has 1 segment, then prepare the alias to take traffic
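For completeness, here is roughly the call sequence (Dev Tools syntax; `my-index` and `my-alias` are placeholders, and the real bulk payload is of course much larger):

```
PUT my-index
{
  "settings": {
    "number_of_shards": 6,
    "number_of_replicas": 0,
    "refresh_interval": "-1"
  }
}

POST _bulk
{ "index": { "_index": "my-index" } }
{ "field1": "..." }

POST my-index/_refresh

PUT my-index/_settings
{ "index": { "refresh_interval": "60s" } }

POST my-index/_forcemerge?max_num_segments=1

# poll until every shard reports a single segment
GET _cat/segments/my-index?v&h=shard,segment,docs.count,size

POST _aliases
{ "actions": [ { "add": { "index": "my-index", "alias": "my-alias" } } ] }
```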
After merging down to 1 segment per shard, the whole index is about 14GB with about 2 million docs.
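This is how I read those final numbers (same placeholder index name):

```
GET _cat/indices/my-index?v&h=index,pri,docs.count,store.size
GET _cat/shards/my-index?v&h=shard,prirep,docs,store,node
```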