What is the result of lowering index.merge.policy.max_merged_segment?


#1

Hi,
I've read this doc [https://www.elastic.co/guide/en/elasticsearch/reference/1.7/index-modules-merge.html] many times, but I'm not sure what the actual result is if I set a lower value for index.merge.policy.max_merged_segment.

Assume I bulk index 3 million docs in 30 minutes.
If I set a value lower than the default 5GB, let's say 2.5GB:

  1. Will I get better indexing performance?
  2. Will I reduce the number of merges that occur?
  3. Will I get fewer segment files?
  4. Does this mean that, in the end, the largest segment file will be approximately 2.5GB?

Please share your experience and thoughts, many thanks.


(Adrien Grand) #2

  1. No.
  2. Maybe slightly less.
  3. No, you will have more segments, since segments will have a smaller maximum size.
  4. Yes, but this is an approximation. The size of a merged segment is estimated as the sum of the sizes of the segments being merged, but a merge can sometimes increase or decrease the efficiency of compression, so the actual size might be a bit different.
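
The size estimate described in the last answer can be illustrated with a small sketch. This is only a simplified model of "estimated size = sum of the input segments, compared against the cap", not Lucene's actual merge selection code; the function names and the example segment sizes are hypothetical.

```python
# Simplified illustration only; function names and sizes are hypothetical.
MB = 1024 ** 2
GB = 1024 ** 3


def estimated_merged_size(segment_sizes):
    """Estimate the merged segment size as the plain sum of the inputs (bytes)."""
    return sum(segment_sizes)


def fits_under_cap(segment_sizes, max_merged_segment):
    """True if the estimated merged size does not exceed the configured cap."""
    return estimated_merged_size(segment_sizes) <= max_merged_segment


# Three candidate segments totalling 2400 MB.
candidate = [900 * MB, 800 * MB, 700 * MB]

print(fits_under_cap(candidate, int(2.5 * GB)))  # True: 2400 MB <= 2560 MB
print(fits_under_cap(candidate, 2 * GB))         # False: 2400 MB > 2048 MB
```

Because the check uses an estimate, the segment that a merge actually writes can end up somewhat above or below the configured cap.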


#3

Thanks @jpountz,
My system keeps receiving messages from lots of devices, and the ultimate goal is indexing that is both sustained and performant. Querying ES is not the first priority. So what I'm trying to figure out is how to reduce the overall merge count and reduce CPU and disk I/O.
(Messages are not batch input; they are sent to the system constantly, almost every few seconds, but the total number of messages in a specified time range can be predicted and calculated.)

Adjusting the merge policy parameters takes some experience, so please give more input. Thank you very much.

PS: The system runs an optimize on each index at midnight.


(Adrien Grand) #4

Is merging really the issue? Just in case you missed it, this post tries to give guidelines around optimizing for indexing speed: https://www.elastic.co/blog/performance-considerations-elasticsearch-indexing.


#5

Hi @jpountz, thanks for your link. I had read it before.

I turned on the slow merge log, and there are many log entries showing that my Elasticsearch node did many merges and that a lot of them took a long time. Merging is a costly process, right? Slow merges mean much more CPU consumption.

So I guess reducing both merge time and merge count may help my indexing performance.

Slow merge log ref:
http://jontai.me/blog/2012/07/configuring-elasticsearch-to-log-merges/


(Adrien Grand) #6

Hmm, I am not very familiar with this log, so I am wondering whether it might take throttling into account.

If you want to reduce merging, I think the best approach would be to raise the refresh interval and increase the indexing memory buffer.
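
As a rough sketch of the first suggestion, the snippet below builds the JSON body for an update to the index-level `index.refresh_interval` setting, which would be sent with `PUT /<index>/_settings`. The value `30s` and the helper name are assumptions chosen for illustration, not recommendations. The indexing buffer is a node-level setting (`indices.memory.index_buffer_size` in `elasticsearch.yml`), so it is only mentioned in a comment.

```python
import json


def refresh_interval_settings(interval):
    """Build the body for an index settings update (hypothetical helper)."""
    return {"index": {"refresh_interval": interval}}


# "30s" is an illustrative value; pick one that matches how fresh
# search results need to be.
body = json.dumps(refresh_interval_settings("30s"))
print(body)  # {"index": {"refresh_interval": "30s"}}

# The indexing buffer is configured per node in elasticsearch.yml, e.g.
#   indices.memory.index_buffer_size: 20%
# (20% is again only an illustrative value.)
```

A longer refresh interval and a larger buffer both tend to produce fewer, larger initial segments, which leaves less work for the merge policy.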


#7

Hi @jpountz

An update on this: after two months of observation, the problem is actually I/O, because I have limited resources for ES. So reducing the merge count and the duration of each merge is the only way to sustain the system.
I want to make sure of one thing; please correct me if I am wrong.

Is max_merged_segment = max_merge_at_once * segments_per_tier when ES does its merge budget calculation?
For example, if I set max_merge_at_once=5 and segments_per_tier=5 and most of my segments are 5MB, does that mean each merge will combine 5 * 5MB at once and, since this does not exceed max_merged_segment (5gb), the merge will go ahead?
If I lower max_merged_segment to 256mb, what will actually happen in this flow?

Forgive my poor understanding of the per-tier concept.
(What does a tier actually mean?)

My system is indexing-intensive, with infrequent queries; even when it is queried, the main purpose is aggregation.

I use a time-based index strategy: every night the system rotates the index and runs an optimize on the previous day's index.


(Adrien Grand) #8

The defaults should actually be good at making merges efficient; I think you should keep them.

A tier is a group of segments that have approximately the same size.

