Bulk Indexing - Tier Merge Policy

phobos182 · December 15, 2011, 1:21am

Quick question. Is it best to increase the number of segments per tier for the Tiered Merge Policy to reduce the time spent merging during bulk indexing? My reasoning is that once the bulk indexing is complete, an optimize can be issued and segments_per_tier can be dynamically changed back to something like 5. I want to minimize the number of segments to improve query speed once the index has been bulk indexed.

What other settings are people using to get bulk indexing done quickly, and then changing afterwards to something that produces fewer segments for quicker searching?

Karussell1 · December 15, 2011, 8:18am

You can read about bulk indexing tuning here

And yes, merge.policy.merge_factor can be dynamically updated to a
higher value (>20) to improve indexing speed. Afterwards, run an optimize
(but you'll have to check whether the overall execution time is really
shorter).

Peter.

phobos182 · December 15, 2011, 4:27pm

Merge factor does not apply to the Tiered Merge Policy. That policy has a high watermark for the number of segments and merges down to stay under it. My question is how to index fast and then optimize after the fact for a one-time bulk index. How are people accomplishing this themselves? If I have 500 million documents to index, I don't want 2 segments total while indexing; I want something like 20-30. But once it's done, searching is so slow that I want to get down to 5-10 segments to increase speed.

kimchy · December 16, 2011, 3:42pm

Yes, increasing the segments should help for bulk indexing. Also, another
one that many people miss is to simply start with no replicas, and increase
the replica count once indexing is done.
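A rough sketch of the sequence discussed in this thread, for illustration only. It just builds the REST calls and sends nothing. The helper name plan_bulk_load, the index name, and the default values (30 segments per tier while loading, 5 afterwards, 1 replica) are assumptions made for the example, not recommendations. The _optimize endpoint with max_num_segments is assumed to be available on the Elasticsearch version in use; check the documentation for your release.

```python
import json


def plan_bulk_load(index, bulk_segments_per_tier=30, search_segments=5, replicas=1):
    """Return the ordered REST calls as (phase, method, path, body) tuples.

    Hypothetical helper: nothing is sent over the network, the caller
    decides how to issue the requests.
    """
    settings_path = f"/{index}/_settings"
    return [
        # Before loading: no replicas, and allow more segments per tier
        # so less time is spent merging during the bulk load.
        ("before", "PUT", settings_path, {
            "index.number_of_replicas": 0,
            "index.merge.policy.segments_per_tier": bulk_segments_per_tier,
        }),
        # After loading: tighten the merge policy again.
        ("after", "PUT", settings_path, {
            "index.merge.policy.segments_per_tier": search_segments,
        }),
        # Merge down to the segment count wanted for searching.
        ("after", "POST",
         f"/{index}/_optimize?max_num_segments={search_segments}", None),
        # Finally add replicas back.
        ("after", "PUT", settings_path, {"index.number_of_replicas": replicas}),
    ]


if __name__ == "__main__":
    for phase, method, path, body in plan_bulk_load("my_index"):
        print(phase, method, path, json.dumps(body) if body is not None else "")
```

Replicas are raised last here on the assumption that it is cheaper to copy the already-optimized segments than to merge on every copy. As noted earlier in the thread, it is worth measuring whether the total time actually goes down.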
