Quick question: is it best to increase the number of segments for the tiered merge policy to decrease the time spent merging during a bulk indexing run? My reasoning is that after the bulk indexing has completed, an optimize can be issued and segments_per_tier dynamically changed back to something like 5. I want to minimize the number of segments to decrease query time once an index has been bulk indexed.
What are some other settings people are using to get bulk indexing done quickly, and then change afterwards to something that leaves fewer segments for faster searching?
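Something like this is what I have in mind (a rough sketch only, assuming an index named my_index on localhost:9200 and that the merge-policy settings are dynamically updatable; the index name is just for illustration):

    # Before the bulk load: allow more segments per tier so less time is spent merging
    curl -XPUT 'localhost:9200/my_index/_settings' -d '{
      "index.merge.policy.segments_per_tier": 30
    }'

    # ... run the bulk indexing ...

    # Afterwards: tighten the tier again and merge down for search speed
    curl -XPUT 'localhost:9200/my_index/_settings' -d '{
      "index.merge.policy.segments_per_tier": 5
    }'
    curl -XPOST 'localhost:9200/my_index/_optimize'

(I believe segments_per_tier should not be set below max_merge_at_once, so going down to 5 may require lowering max_merge_at_once as well.)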
And yes, merge.policy.merge_factor can be dynamically updated to a higher value (> 20) to improve indexing speed. Afterwards, run an optimize (but you'll have to check whether the overall execution time is actually smaller).
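For example, something along these lines (a sketch only, assuming an index named my_index; merge_factor belongs to the log-based merge policies):

    # Raise the merge factor so merges happen less often during the bulk load
    curl -XPUT 'localhost:9200/my_index/_settings' -d '{
      "index.merge.policy.merge_factor": 30
    }'

    # When the load is finished, merge everything down
    curl -XPOST 'localhost:9200/my_index/_optimize'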
Merge factor does not work with the tiered merge policy. It has a high watermark for the segment count and tries to merge down to get under that watermark. My question is how to index fast, but then optimize after the fact for a one-time bulk index. How are people accomplishing this themselves? If I have 500 million documents to index, I don't want 2 segments total; I want something like 20-30. But after it's done, searching is so slow that I want to get down to 5-10 segments to increase speed.
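Something along these lines is what I'm after (a sketch, assuming an index named my_index; max_num_segments tells optimize how far to merge down):

    # Live with 20-30 segments while the bulk load runs, then afterwards:
    curl -XPOST 'localhost:9200/my_index/_optimize?max_num_segments=5'

    # Check how many segments are left
    curl -XGET 'localhost:9200/my_index/_segments'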
Yes, increasing segments_per_tier should help for bulk indexing. Also, one that many people miss is to simply start with no replicas, and increase the replica count once indexing is done.
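For example (a sketch, assuming an index named my_index):

    # Before the bulk load: no replicas, so each document is indexed only once
    curl -XPUT 'localhost:9200/my_index/_settings' -d '{
      "index.number_of_replicas": 0
    }'

    # ... bulk index ...

    # Afterwards: bring replicas back; they are built by copying segment files
    # rather than re-indexing every document
    curl -XPUT 'localhost:9200/my_index/_settings' -d '{
      "index.number_of_replicas": 1
    }'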