I have more than 1 billion documents in one index with 1 shard and 2 replicas. My use case involves updating documents frequently, so I have more than 650 million deleted documents in the index, which degrades search performance.
We have 3 master nodes, 3 coordinating nodes, 3 data nodes, and 2 Elasticsearch load balancers.
Number of cores per master node: 4
Number of cores per data node: 8
Number of cores per coordinating node: 4
Number of cores per ES load balancer: 2
Datastore LUN provisioned storage: 414 GB
Heap memory for data nodes: 30 GB
Heap memory for master nodes: 4 GB
Heap memory for coordinating nodes: 4 GB
Swap memory for data nodes: 4 GB
Swap memory for master nodes: 4 GB
Swap memory for coordinating nodes: 4 GB
pri.store.size: 187.9gb
store.size: 563.9gb
Segments in the primary shard:
"num_committed_segments": 59,
"num_search_segments": 62,
Segments in the replica shard:
"num_committed_segments": 61,
"num_search_segments": 63,
Is it recommended to run a force merge on this index? If yes, how long will it take? Since there is no way to monitor a force merge, I need to know the maximum time it will take.
Do I need to stop writes to the index during the force merge?
What value should I specify for max_num_segments during the force merge? Having more segments in a shard will also reduce performance.
Will searches be impacted during the force merge? Do I need to stop searches as well?
Do I need to increase the shard/replica count to improve search performance?
Since you frequently update documents, I'd advise configuring
index.translog.retention.age
instead of running a force merge.
It will improve performance and reduce the number of open file descriptors.
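A minimal sketch of applying that setting dynamically (my-index and the 12h value are placeholders; note that the translog retention settings are deprecated since 7.4 and removed in 8.0, where soft deletes supersede them):

```
PUT /my-index/_settings
{
  "index.translog.retention.age": "12h"
}
```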
Another solution (that works with all versions) is to set up a cron job (during a low-CPU period) that runs _reindex to copy your data into a fresh index and then switches an alias.
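A rough sketch of that pattern, assuming your application searches through an alias (my-alias, my-index-v1, and my-index-v2 are placeholder names). Since _reindex writes fresh segments, the new index starts with zero deleted documents:

```
POST _reindex?wait_for_completion=false
{
  "source": { "index": "my-index-v1" },
  "dest": { "index": "my-index-v2" }
}

POST _aliases
{
  "actions": [
    { "remove": { "index": "my-index-v1", "alias": "my-alias" } },
    { "add": { "index": "my-index-v2", "alias": "my-alias" } }
  ]
}
```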
@Sathish_kumar_Marimu if you don't have a quiet period and are always writing/updating, what about using time-based indices?
I had one index with a lot of updates and documents. I restructured the data, keeping the static data in one index and the frequently updated values in a daily index... but I had to make a lot of changes in my code.
What about your data? What is its structure? Do you think you can split it? A sketch of the daily-index setup follows below.
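Everything here is a made-up example, assuming a 6.x/7.x cluster: a legacy index template so each day's index gets the same settings, plus an alias for searching across all of them, while the application writes to a name like updates-2019.07.15:

```
PUT _template/updates
{
  "index_patterns": ["updates-*"],
  "settings": { "number_of_shards": 1 },
  "aliases": { "updates-search": {} }
}
```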
Even if we split the updated values into a daily index, I would still end up with deleted doc counts, since the same document can be updated multiple times in a day.
I may run into the same problem again.
Will running a force merge on the same index daily at midnight cause any issues?
@Sathish_kumar_Marimu
For me, time-based means, for example: you have some data that is updated often during one week, but after that week the data is no longer updated, though you still access it (reads).
Depending on your data, you may have a longer period during which you need to update it often.
Do you update all your data randomly, or is there some way to split it: often updated, less often updated, never updated, etc.?
About your question: I don't think that running a force merge will cause issues on your index; it may just use a lot of CPU and disk writes. But that needs confirmation from an Elastic expert, which I'm not. So far I have never run a force merge on a daily basis, at most once or twice a year.
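If you do try it, here is a minimal sketch (my-index is a placeholder). The force merge call itself blocks, but you can follow its progress from another session via the tasks API or by watching the segment counts drop:

```
POST /my-index/_forcemerge?max_num_segments=1

GET _tasks?detailed=true&actions=*forcemerge*

GET _cat/segments/my-index?v
```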
It is better to split into time-based indices. A small index will merge quickly.
A force merge on a big index can cause high disk I/O and may temporarily take up a lot of disk space.
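Before running one, it is worth checking free disk per node, since merging rewrites segments and needs headroom while old and new segments coexist; for example:

```
GET _cat/allocation?v&h=node,shards,disk.indices,disk.used,disk.avail,disk.percent
```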