No, you will have more segments, since segments will have a smaller maximum size.
Yes, but this is an approximation. The size of a merged segment is estimated as the sum of the sizes of the segments being merged, but a merge can sometimes increase or decrease the efficiency of compression, so the actual size might be a bit different.
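As a rough illustration of that estimate, here is a minimal Python sketch. `estimate_merged_size` is a hypothetical helper and not a Lucene or Elasticsearch API, and the segment sizes are made-up values.

```python
MB = 1024 * 1024


def estimate_merged_size(segment_sizes_bytes):
    """Approximate the merged segment size as the plain sum of the inputs.

    The real size after the merge may differ a little, because merging
    can change how well the data compresses.
    """
    return sum(segment_sizes_bytes)


# Hypothetical sizes of three segments selected for one merge.
segments = [5 * MB, 5 * MB, 5 * MB]
estimate = estimate_merged_size(segments)
print(estimate // MB)  # 15
```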
Thanks @jpountz,
The situation is that my system keeps receiving messages from lots of devices. The ultimate goal is to keep indexing sustained and performant; querying ES is not the first priority. So what I'm trying to figure out is how to reduce the overall merge count, and with it CPU and disk I/O.
(Messages do not arrive in batches; they are sent to the system constantly, almost every few seconds, but the total number of messages in a given time range can be predicted and calculated.)
Adjusting the merge policy parameters seems to need some experience, so please give more input. Thank you very much.
PS: The system runs an optimize on each index at midnight.
Hi @jpountz, thanks for your link. I had read it before.
I turned on the slow merge log, and there are many log entries showing that my ES node did many merges and that a lot of them took a long time. Merging is a costly process, right? Slow merges mean much more CPU consumption.
So I guess reducing both merge time and merge count may help my indexing performance.
An update on this: after two months of observation, the problem is actually I/O, because I have limited resources for ES. So reducing the merge count and the duration of each merge is the only way to sustain the system.
I want to make sure of one thing; please correct me if I am wrong.
Is max_merged_segment equal to max_merge_at_once * segments_per_tier when ES calculates the merge budget?
For example, if I set max_merge_at_once=5 and segments_per_tier=5 and most of my segments are 5MB, does that mean each merge will combine 5 * 5MB at once, and since that does not exceed max_merged_segment (5gb), the merge will go ahead?
If I lower max_merged_segment to 256mb, what will actually happen in this flow?
Forgive my poor understanding of the per-tier concept.
(What does "tier" actually mean?)
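To make the arithmetic in this question concrete, here is a small Python sketch of the size check only. It assumes the merged size is estimated as the sum of the input segments and treats max_merged_segment as a separately configured cap. `fits_under_cap` is a hypothetical helper, the 60MB segments are made-up values, and the real merge policy applies further rules that are not modeled here.

```python
MB = 1024 ** 2
GB = 1024 ** 3


def fits_under_cap(segment_sizes, max_merged_segment):
    """Return True if the summed size estimate stays within the cap."""
    return sum(segment_sizes) <= max_merged_segment


# Five 5MB segments merged at once, as in the question: 25MB in total.
small = [5 * MB] * 5
# Hypothetical larger segments for comparison: 300MB in total.
large = [60 * MB] * 5

print(fits_under_cap(small, 5 * GB))    # True: 25MB is far below 5gb
print(fits_under_cap(small, 256 * MB))  # True: 25MB is still below 256mb
print(fits_under_cap(large, 256 * MB))  # False: 300MB exceeds 256mb
```

Under these assumptions, lowering the cap to 256mb does not affect merges of 5MB segments; it only starts to matter once the segments being combined add up to more than the cap.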
My system is indexing-intensive and queried infrequently; even when it is queried, the main purpose is aggregation.
I use a time-based index strategy: every night, the system rotates the index and runs an optimize on the previous day's index.