Elasticsearch 5.6.5 - How many segments should I configure for force merge?

Hi,

I have more than 1 billion documents in one index with 1 shard and 2 replicas. My use case is to update Elasticsearch documents frequently, so I have more than 650 million deleted documents in the index, which degrades search performance.

We have 3 master nodes, 3 coordinating nodes, 3 data nodes, and 2 Elasticsearch load balancers.

Number of cores in master nodes: 4 per node
Number of cores in data nodes: 8 per node
Number of cores in coordinating nodes: 4 per node
Number of cores in ES load balancers: 2 per node

Datastore LUN, provisioned storage: 414 GB

Heap memory for data nodes: 30 GB
Heap memory for master nodes: 4 GB
Heap memory for coordinating nodes: 4 GB

Swap memory for data nodes: 4 GB
Swap memory for master nodes: 4 GB
Swap memory for coordinating nodes: 4 GB

pri.store.size - 187.9gb
store.size - 563.9gb

Segments in the primary shard:
"num_committed_segments": 59,
"num_search_segments": 62,
Segments in the replica shard:
"num_committed_segments": 61,
"num_search_segments": 63,

  • Is it recommended to run a force merge on this index? If yes, how long will it take? Since there is no way to monitor a force merge, I need to know the maximum time it will take.
  • Do I need to stop writes to the index during the force merge?
  • What is the maximum number of segments I can specify for the force merge? Too many segments in a shard will also reduce performance.
  • Will search be impacted during the force merge? Do I need to stop search as well?
  • Do I need to increase the shard/replica count to improve search performance?

Thanks in advance!

Since you frequently update documents, I advise you to configure
index.translog.retention.age
instead of running a force merge.
It will improve performance and decrease the number of open file descriptors.
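A minimal sketch of the setting suggested above, assuming the index is named `my_index` (note that, as pointed out in the next reply, the translog retention settings only exist from Elasticsearch 6.0 onward):

```shell
# Hypothetical example: tighten translog retention on an existing index.
# NOTE: index.translog.retention.* is only available in ES 6.0+.
# The index name "my_index" and the values shown are assumptions.
curl -X PUT "localhost:9200/my_index/_settings" -H 'Content-Type: application/json' -d'
{
  "index.translog.retention.age": "12h",
  "index.translog.retention.size": "512mb"
}'
```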

@Denis_Lamanov, Thanks for the quick response.

index.translog.retention.age only works from ES 6.0 onward, but my ES version is 5.6.

Will the translog settings remove the deleted doc count and the extra segments?

Hi @Sathish_kumar_Marimu,

Another solution (that works with all versions) is to set up a cron job (during a low-CPU time period) that runs _reindex to copy your data into a fresh index, then switches an alias.
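A sketch of that reindex-then-switch-alias approach; the index names `my_index_v1`/`my_index_v2` and the alias `my_index` are assumptions, not from the original thread:

```shell
# 1. Reindex into a fresh index. Deleted documents are not copied,
#    so the new index contains only live docs in freshly written segments.
curl -X POST "localhost:9200/_reindex" -H 'Content-Type: application/json' -d'
{
  "source": { "index": "my_index_v1" },
  "dest":   { "index": "my_index_v2" }
}'

# 2. Switch the alias to the new index in a single atomic operation,
#    so searches never see an empty or half-populated index.
curl -X POST "localhost:9200/_aliases" -H 'Content-Type: application/json' -d'
{
  "actions": [
    { "remove": { "index": "my_index_v1", "alias": "my_index" } },
    { "add":    { "index": "my_index_v2", "alias": "my_index" } }
  ]
}'
```

Clients should read and write through the alias `my_index`, never the versioned index names, so the swap is invisible to them.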

@gabriel_tessier, I thought the same, but re-indexing 1 billion documents will take a while.

And I was worried about writes to the source index.

What will happen to documents inserted/updated during the re-index?

@Sathish_kumar_Marimu, if you don't have a quiet period and are always writing/updating, what about using time-based indices?

I had one index with a lot of updates, and I restructured the data, keeping the static data in one index and the frequently updated values in a daily index... but I had to make a lot of changes in my code.
What about your data? What is its structure? Do you think you can split it?

@gabriel_tessier

Even if we split the updated values into a daily index, I would still get deleted doc counts. The same document can be updated multiple times in a day.

I may run into the same problem again.

Will running a force merge on the same index daily at midnight cause any issues?

@Sathish_kumar_Marimu
For me, time-based means, for example: you have some data that is updated often for one week, but after that week the data is no longer updated, although you still access it (read).
Depending on your data, you may have a longer period during which you need to update it often.

Do you update all your data randomly, or is there some way to split it: often updated, rarely updated, never updated, etc.?

About your question: I don't think running a force merge will cause issues on your index; it may just use a lot of CPU and disk writes. But this needs confirmation from an Elastic expert, which I'm not. :grin: So far I have never run a force merge on a daily basis, at most once or twice a year.

It is better to split indices into time-based ones. A small index will merge quickly.
A force merge on a big index can cause high disk I/O and may temporarily consume a lot of extra disk space.
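For reference, a hedged sketch of the two common force-merge variants discussed above; the index name `my_index` is an assumption:

```shell
# Rewrite only segments that contain a high proportion of deleted docs.
# Usually cheaper than merging down to a fixed segment count, and a
# reasonable fit when the goal is reclaiming space from deletions.
curl -X POST "localhost:9200/my_index/_forcemerge?only_expunge_deletes=true"

# Alternatively, merge each shard down to a single segment.
# Heavy on disk I/O and temporary disk space for a large index,
# so best reserved for indices that will no longer be written to.
curl -X POST "localhost:9200/my_index/_forcemerge?max_num_segments=1"
```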


This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.