Bulk Indexing - Tier Merge Policy


(phobos182) #1

Quick question: is it best to increase the number of segments per tier for the Tiered Merge Policy to reduce the time spent merging during bulk indexing? My reasoning is that once the bulk indexing has completed, an optimize can be issued and segments_per_tier can be dynamically changed back to something like 5. I want to minimize the number of segments to increase query speed once an index has been bulk indexed.

What other settings are people using to get bulk indexing done quickly, before switching to settings that produce fewer segments for quicker searching?


(Karussell) #2

You can read about bulk indexing tuning here:

http://www.elasticsearch.org/guide/reference/api/admin-indices-update-settings.html
http://www.elasticsearch.org/guide/reference/index-modules/merge.html

And yes, merge.policy.merge_factor can be dynamically updated to a
higher value (>20) to improve indexing speed. Afterwards, run an optimize
(but you'll have to check whether the overall execution time is really
shorter).
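
As a rough illustration of the dynamic update described above, the sketch below only builds the request for the update-settings API and does not send it. The index name "myindex", the value 30, and the fully qualified key spelling "index.merge.policy.merge_factor" are assumptions for the example; check them against the merge documentation linked above for your version.

```python
import json


def merge_factor_update(index, merge_factor):
    """Return (method, path, body) for a dynamic merge_factor update.

    Assumption: the setting is addressed by its full key,
    index.merge.policy.merge_factor, via PUT /<index>/_settings.
    """
    path = "/%s/_settings" % index
    body = json.dumps({"index.merge.policy.merge_factor": merge_factor})
    return "PUT", path, body


# Hypothetical index name and value (>20, as suggested above).
method, path, body = merge_factor_update("myindex", 30)
print(method, path, body)
```

Whether the higher value pays off still has to be measured, since the later optimize has more merging work to do.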

Peter.




(phobos182) #3

Merge factor does not work with the Tiered Merge Policy. That policy has a high watermark for segments and tries to merge down to get under the watermark. My question is how to index fast and then optimize after the fact for a one-time bulk index. How are people accomplishing this themselves? If I have 500 million documents to index, I don't want 2 segments total while indexing; I want something like 20-30. But once it's done, searching is so slow that I want to get down to 5-10 segments to increase speed.
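
One way to picture the workflow being asked about is the sketch below: allow more segments per tier before the bulk load, then optimize down and restore a lower value afterwards. It is a minimal sketch under assumptions, not a confirmed recipe: the index name "docs", the node address, and the concrete numbers (30, 10, 5, taken from the ranges mentioned in this thread) are illustrative, and the `_optimize` endpoint is the one from Elasticsearch releases of that era (later versions use a different endpoint name).

```python
import json
import urllib.request

BASE_URL = "http://localhost:9200"  # assumption: local node, default port


def settings_request(index, segments_per_tier):
    """Build a dynamic update of the tiered policy's segments_per_tier."""
    body = json.dumps(
        {"index.merge.policy.segments_per_tier": segments_per_tier}
    )
    return ("PUT", "/%s/_settings" % index, body)


def optimize_request(index, max_num_segments):
    """Build an optimize call that merges down to max_num_segments."""
    path = "/%s/_optimize?max_num_segments=%d" % (index, max_num_segments)
    return ("POST", path, None)


def send(request):
    """Send one (method, path, body) tuple. Not called in this sketch."""
    method, path, body = request
    data = body.encode("utf-8") if body is not None else None
    req = urllib.request.Request(
        BASE_URL + path,
        data=data,
        method=method,
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return resp.read().decode("utf-8")


# Before the bulk load: tolerate more segments so less time goes to merging.
before_bulk = settings_request("docs", 30)

# After the bulk load: merge down once, then restore a lower setting.
after_bulk = [optimize_request("docs", 10), settings_request("docs", 5)]

for request in [before_bulk] + after_bulk:
    print(request)
```

The requests are only built and printed here; passing each tuple to `send` would issue them against a running node.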


(Shay Banon) #4

Yes, increasing the number of segments should help for bulk indexing. Another
option that many people miss is to simply start with no replicas and increase
the replica count once indexing is done.
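
The replica suggestion above could look roughly like the following sketch, which builds (but does not send) a create-index request with zero replicas and a later settings update that raises the count. The index name "docs" and the final replica count of 1 are assumptions chosen for illustration.

```python
import json


def create_without_replicas(index):
    """Build a create-index request that starts with zero replicas."""
    body = json.dumps({"settings": {"index": {"number_of_replicas": 0}}})
    return ("PUT", "/%s" % index, body)


def set_replicas(index, replicas):
    """Build a dynamic settings update that changes the replica count."""
    body = json.dumps({"index": {"number_of_replicas": replicas}})
    return ("PUT", "/%s/_settings" % index, body)


# Hypothetical plan: create with no replicas, bulk index, then add replicas.
create_req = create_without_replicas("docs")
restore_req = set_replicas("docs", 1)
print(create_req)
print(restore_req)
```

With no replicas during the load, each document is indexed only once on the primary shard; the replicas are populated after the count is raised.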



