Force_merge Thread_pool

Thek10patil · August 16, 2020, 12:30am

Hi Everyone,

I need to run force_merge frequently with max_num_segments = 1, and it takes a lot of time. I want to make force_merge faster. What options do I have?

I read that the maximum number of threads available for force_merge is 1. My machine has 8 threads, so I changed that to 4 in the elasticsearch.yml config file. I can see the change reflected in the node settings.

However, when I check during force_merge execution, it shows 1 active thread even though the force_merge thread pool size is now 4.
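
One way to check this is the `_cat/thread_pool` API, which reports `active`, `queue`, and `size` per node for a named pool. Below is a minimal sketch for reading such a response, assuming it was requested with `format=json` and the columns shown in `CAT_PATH`. The sample row and the helper name `summarize_force_merge_pool` are made up for illustration and are not taken from this cluster.

```python
import json

# Path for the cat thread pool API, limited to the force_merge pool.
CAT_PATH = "/_cat/thread_pool/force_merge?format=json&h=node_name,name,active,queue,size"


def summarize_force_merge_pool(cat_json: str) -> dict:
    """Map node name -> (active, queue, size) from a _cat/thread_pool JSON body."""
    rows = json.loads(cat_json)
    summary = {}
    for row in rows:
        # Values may arrive as strings in cat JSON output, so convert them.
        summary[row["node_name"]] = (
            int(row["active"]),
            int(row["queue"]),
            int(row["size"]),
        )
    return summary


# Hypothetical response: pool size 4, but only one thread busy.
SAMPLE = (
    '[{"node_name": "node-1", "name": "force_merge", '
    '"active": "1", "queue": "0", "size": "4"}]'
)

if __name__ == "__main__":
    print(summarize_force_merge_pool(SAMPLE))
```

A result like `size` 4 with `active` 1 would match what is described here: the pool was resized, but only one merge task is running.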

Christian_Dahlqvist · August 16, 2020, 5:52am

How many indices and shards are you forcemerging? Why do you need to run it so often? What is the size of the cluster?

Thek10patil · August 17, 2020, 5:38pm

I am planning to use the AWS UltraWarm feature. For that, I want to run force merge first on all the available indices.

Currently force merge uses only one thread. I have added more threads through the yml file, but while force merge is running it shows only one active thread.

DavidTurner · August 17, 2020, 7:20pm

You will likely need to approach AWS support for help with this question. AWS Elasticsearch is different from the official Elasticsearch that we support here.

Thek10patil · August 17, 2020, 8:12pm

Hey David,

I know that feature is from AWS. But I am looking to make Force_Merge faster.

For now, force_merge to max_num_segments = 1 with default settings takes 25 minutes for a 50GB shard that has around 45 segments. I want to make it faster.

What will be a good way to do this?

Thanks in advance.

DavidTurner · August 17, 2020, 8:20pm

Can you reproduce the issue outside of AWS Elasticsearch? The only people who have access to the AWS Elasticsearch code are AWS themselves, and it's pretty tricky to debug this kind of thing without access to the code that you're actually running.

Thek10patil · August 17, 2020, 9:11pm

I am not sure if I can reproduce it, but for now I am looking for parameters that will help make force_merge faster.

Parameters such as

threads assigned for force_merge through the THREAD_POOL node setting
index.merge.scheduler.max_thread_count
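
For reference, here is a rough sketch of the two parameters. It assumes the node setting meant above is `thread_pool.force_merge.size` (the thread does not spell out the key), and it uses the documented default for `index.merge.scheduler.max_thread_count`, which is `max(1, min(4, processors / 2))`. The function name is hypothetical, and nothing here shows that either setting speeds up a single force merge.

```python
def default_merge_max_thread_count(allocated_processors: int) -> int:
    """Documented default for index.merge.scheduler.max_thread_count."""
    if allocated_processors < 1:
        raise ValueError("allocated_processors must be at least 1")
    return max(1, min(4, allocated_processors // 2))


# Assumed elasticsearch.yml key for the change described earlier in the thread.
NODE_SETTING = {"thread_pool.force_merge.size": 4}

if __name__ == "__main__":
    # On an 8-thread machine the merge scheduler default is already 4.
    print(default_merge_max_thread_count(8))
    print(NODE_SETTING)
```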

Christian_Dahlqvist · August 18, 2020, 5:23am

If I recall correctly, each shard is forcemerged in a single thread. Increasing the thread pool size allows multiple shards to be forcemerged in parallel but would, as far as I know, not speed up a single forcemerge. I am not aware of any setting that would speed it up. The time it takes is, however, generally proportional to the shard size, so you could try to reduce the shard size.
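
To illustrate the point, here is a back-of-the-envelope model, not a measurement. It assumes every shard on a node takes the same time to merge, each merge occupies one thread from the pool, and there is no disk or CPU contention between parallel merges. The function name and the shard counts are made up; the 25 minutes per shard is the figure reported earlier in the thread.

```python
import math


def estimated_force_merge_minutes(
    shards_on_node: int, minutes_per_shard: float, pool_size: int
) -> float:
    """Estimate wall-clock time when each shard merge uses one pool thread."""
    if pool_size < 1:
        raise ValueError("pool_size must be at least 1")
    if shards_on_node <= 0:
        return 0.0
    # Shards are processed in "waves" of at most pool_size at a time.
    waves = math.ceil(shards_on_node / pool_size)
    return waves * minutes_per_shard


if __name__ == "__main__":
    # A single shard gains nothing from a larger pool.
    print(estimated_force_merge_minutes(1, 25, 1))
    print(estimated_force_merge_minutes(1, 25, 4))
    # Several shards on the same node can overlap.
    print(estimated_force_merge_minutes(8, 25, 1))
    print(estimated_force_merge_minutes(8, 25, 4))
```

Under these assumptions a larger pool only reduces the total time across many shards. The time for any one shard stays the same.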

Thek10patil · September 3, 2020, 10:40pm

Yes, you are correct. Increasing the thread pool size allows force merge to run on multiple shards in parallel.

