Force merge optimisation as a background process

Hi,
I need to run the force merge process in the background to control the number of segments.
Also, how can I steer/control force merging? In my case I have an index that is constantly being updated with data, so when this index ends up split across many segments I observe low search performance.
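For context, what I mean is something along these lines, a minimal sketch with the official Python client (host and index name are placeholders):

```python
# Minimal sketch: merge an index down to a fixed number of segments per shard.
# Host and index name are placeholders for illustration only.
from elasticsearch import Elasticsearch

es = Elasticsearch("http://localhost:9200")

# Merge down to at most 1 segment per shard. The call blocks while the
# merge runs, but the merge itself continues on the data nodes even if
# the client connection is lost.
es.indices.forcemerge(index="my-index", max_num_segments=1)
```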

If the index is still being updated or indexed to, why would you forcemerge?

Forcemerging can be I/O intensive. What type of storage are you using? Local SSDs?

Hi, we're using SSD disks (P30 tier) on Azure.
A force merge down to fewer segments gives much better results in our search performance tests.

It also lets us get rid of deleted nested documents.
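If reclaiming space from deleted documents is the main goal, force merge also has an `only_expunge_deletes` mode that rewrites only segments with a high enough share of deleted documents instead of merging everything down; a minimal sketch (index name is a placeholder):

```python
# Minimal sketch: expunge deleted documents without forcing a full merge.
# Only segments with a high enough ratio of deleted docs are rewritten.
from elasticsearch import Elasticsearch

es = Elasticsearch("http://localhost:9200")

# Note: only_expunge_deletes cannot be combined with max_num_segments.
es.indices.forcemerge(index="my-index", only_expunge_deletes=True)
```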

Indexing and updating continuously create new segments, so I do not see much point in force merging an index that is actively being indexed into. If you modified data only periodically it might make sense, though.
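You can see this for yourself by listing the segments before and after a force merge while indexing continues; a sketch of how to inspect them (index name is a placeholder):

```python
# Sketch: list the segments of an index. While documents keep being
# indexed or updated, new segments will reappear after a force merge.
from elasticsearch import Elasticsearch

es = Elasticsearch("http://localhost:9200")

segments = es.cat.segments(index="my-index", format="json")
print(f"total segment copies: {len(segments)}")
for seg in segments:
    print(seg["shard"], seg["prirep"], seg["segment"], seg["docs.count"], seg["size"])
```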

Are you indexing/updating through bulk requests? What is your refresh interval set to?
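For reference, the refresh interval can be checked and changed with an index settings update; a sketch using the 7.x Python client (index name is a placeholder):

```python
# Sketch: inspect and raise index.refresh_interval to reduce how often
# new searchable segments are created during heavy bulk indexing.
from elasticsearch import Elasticsearch

es = Elasticsearch("http://localhost:9200")

# Current value (if unset, the default of 1s applies)
print(es.indices.get_settings(index="my-index", name="index.refresh_interval"))

# Raise it, e.g. to 60 seconds
es.indices.put_settings(
    index="my-index",
    body={"index": {"refresh_interval": "60s"}},
)
```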


Yes, through bulk requests; the refresh interval was set to 60s.

How many inserts/updates are you performing per second? How many documents are there in the index?

Which version of Elasticsearch are you using?

How many segments did you forcemerge into? Did this test include concurrent indexing and updates?

The indexing rate is on average ~250 docs/s, and the index contains about 32 million documents.
We're using Elasticsearch 7.17.

BTW, I can't find the topic, but as far as I remember you claimed that searching with 1 primary shard and 2 replicas (on 3 data nodes) should give much better results than 3 primary shards with 1 replica. You were referring to making use of the cache on the replica shards. But when I checked it in .monitoring-es*, the parameter "index_stats.total.query_cache.hit_count" was reported only for the nodes holding primary shards.

The ideal number of primary and replica shards typically depends on whether you are optimising for latency or for the number of concurrent queries the cluster can support.

If a single primary shard gives acceptable query latencies, scaling out replicas will allow more nodes to handle queries. If a single primary shard is too large to support acceptable query latencies, you may need more, smaller primary shards so each query can be parallelised across them. This, however, often leads to fewer concurrent queries being possible on the same hardware.
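Note that the number of primary shards is fixed when the index is created, but the replica count can be changed at any time if you want to experiment with the two layouts; a sketch (index name and value are illustrative):

```python
# Sketch: change the replica count of an existing index. With 1 primary
# and 2 replicas on 3 data nodes, every node holds a full copy of the data.
from elasticsearch import Elasticsearch

es = Elasticsearch("http://localhost:9200")

es.indices.put_settings(
    index="my-index",
    body={"index": {"number_of_replicas": 2}},
)
```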

The index, as mentioned, is 28 GB in size. From the monitoring I can see that the other nodes holding replicas are also involved in the query (I infer this from the memory and CPU usage), but I do not see them using the cache. Does the ES mechanism only allow the cache to be used on the primary shard?
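One way to check this directly would be to read the per-shard query cache statistics from the stats API instead of the monitoring index; a sketch (index name is a placeholder, and the response layout is as I understand the shard-level stats format):

```python
# Sketch: compare query cache hit counts on primary vs replica shard copies.
from elasticsearch import Elasticsearch

es = Elasticsearch("http://localhost:9200")

stats = es.indices.stats(index="my-index", metric="query_cache", level="shards")
for shard_id, copies in stats["indices"]["my-index"]["shards"].items():
    for copy in copies:
        role = "primary" if copy["routing"]["primary"] else "replica"
        hits = copy["query_cache"]["hit_count"]
        print(f"shard {shard_id} ({role}): query_cache.hit_count={hits}")
```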
