Force merge optimisation as a background process

Hi,
I need to run the force merge process in the background to control the number of segments.
Also, how can I steer/control force merging? In my case I have an index that is constantly being updated with data, so when this index ends up split across many segments I observe low search performance.
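For context, what I mean is something along these lines, a minimal sketch with the official Python client (host and index name are placeholders):

```python
# Minimal sketch: merge an index down to a fixed number of segments per shard.
# Host and index name are placeholders for illustration only.
from elasticsearch import Elasticsearch

es = Elasticsearch("http://localhost:9200")

# Merge down to at most 1 segment per shard. The call blocks while the
# merge runs, but the merge itself continues on the data nodes even if
# the client connection is lost.
es.indices.forcemerge(index="my-index", max_num_segments=1)
```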

If the index is still being updated or indexed to, why would you forcemerge?

Forcemerging can be I/O intensive. What type of storage are you using? Local SSDs?

Hi, we're using SSD disks (P30 tier) on Azure.
A force merge down to fewer segments gives much better results in our search performance tests.

It also lets us get rid of deleted nested documents.
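If reclaiming space from deleted documents is the main goal, force merge also has an `only_expunge_deletes` mode that rewrites only segments with a high enough share of deleted documents instead of merging everything down; a minimal sketch (index name is a placeholder):

```python
# Minimal sketch: expunge deleted documents without forcing a full merge.
# Only segments with a high enough ratio of deleted docs are rewritten.
from elasticsearch import Elasticsearch

es = Elasticsearch("http://localhost:9200")

# Note: only_expunge_deletes cannot be combined with max_num_segments.
es.indices.forcemerge(index="my-index", only_expunge_deletes=True)
```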

Indexing and updating continuously create new segments, so I do not see much point in force merging an index that is actively being indexed into. If you modified data only periodically it might make sense, though.
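You can see this for yourself by listing the segments before and after a force merge while indexing continues; a sketch of how to inspect them (index name is a placeholder):

```python
# Sketch: list the segments of an index. While documents keep being
# indexed or updated, new segments will reappear after a force merge.
from elasticsearch import Elasticsearch

es = Elasticsearch("http://localhost:9200")

segments = es.cat.segments(index="my-index", format="json")
print(f"total segment copies: {len(segments)}")
for seg in segments:
    print(seg["shard"], seg["prirep"], seg["segment"], seg["docs.count"], seg["size"])
```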

Are you indexing/updating through bulk requests? What is your refresh interval set to?
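For reference, the refresh interval can be checked and changed with an index settings update; a sketch using the 7.x Python client (index name is a placeholder):

```python
# Sketch: inspect and raise index.refresh_interval to reduce how often
# new searchable segments are created during heavy bulk indexing.
from elasticsearch import Elasticsearch

es = Elasticsearch("http://localhost:9200")

# Current value (if unset, the default of 1s applies)
print(es.indices.get_settings(index="my-index", name="index.refresh_interval"))

# Raise it, e.g. to 60 seconds
es.indices.put_settings(
    index="my-index",
    body={"index": {"refresh_interval": "60s"}},
)
```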


Yes, through bulk requests; the refresh interval was set to 60s.

How many inserts/updates are you performing per second? How many documents are there in the index?

Which version of Elasticsearch are you using?

How many segments did you forcemerge into? Did this test include concurrent indexing and updates?

The indexing rate is on average ~250 docs/s, and the index contains about 32 million documents.
We're using Elasticsearch 7.17.

BTW, I can't find the topic, but as far as I remember you claimed that searching with 1 primary shard and 2 replicas (on 3 data nodes) should give much better results than 3 primary shards with 1 replica. You were referring to making use of the cache on the replica shards. But when I checked it in .monitoring-es*, the parameter "index_stats.total.query_cache.hit_count" was reported only for the nodes holding primary shards.

The ideal number of primary and replica shards typically depends on whether you are optimising for latency or for the number of concurrent queries the cluster can support.

If a single primary shard gives acceptable query latencies, scaling out replicas will allow more nodes to handle queries. If a single primary shard is too large to support acceptable query latencies, you may need more, smaller primary shards so each query can be parallelised across them. This, however, often leads to fewer concurrent queries being possible on the same hardware.
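Note that the number of primary shards is fixed when the index is created, but the replica count can be changed at any time if you want to experiment with the two layouts; a sketch (index name and value are illustrative):

```python
# Sketch: change the replica count of an existing index. With 1 primary
# and 2 replicas on 3 data nodes, every node holds a full copy of the data.
from elasticsearch import Elasticsearch

es = Elasticsearch("http://localhost:9200")

es.indices.put_settings(
    index="my-index",
    body={"index": {"number_of_replicas": 2}},
)
```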

The index, as mentioned, is 28 GB in size. From the monitoring I can see that the other nodes holding replicas are also involved in the query (I infer this from the memory and CPU usage), but I do not see them using the cache. Does the ES mechanism only allow the cache to be used on the primary shard?
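One way to check this directly would be to read the per-shard query cache statistics from the stats API instead of the monitoring index; a sketch (index name is a placeholder, and the response layout is as I understand the shard-level stats format):

```python
# Sketch: compare query cache hit counts on primary vs replica shard copies.
from elasticsearch import Elasticsearch

es = Elasticsearch("http://localhost:9200")

stats = es.indices.stats(index="my-index", metric="query_cache", level="shards")
for shard_id, copies in stats["indices"]["my-index"]["shards"].items():
    for copy in copies:
        role = "primary" if copy["routing"]["primary"] else "replica"
        hits = copy["query_cache"]["hit_count"]
        print(f"shard {shard_id} ({role}): query_cache.hit_count={hits}")
```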
