I am doing bulk indexing and, as an optimization, I set the refresh_interval of the indices to -1. I am going to set it back to the default after I am done. In one of the ES documents (perhaps an old one) I read that I have to force merge afterwards. Based on what I have read about refresh_interval, I don't see a need for it. Should I?
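For reference, here is a minimal sketch of the toggle described above, using the index settings endpoint (`PUT <index>/_settings`). The node address, the lack of authentication, and the index name `my-logs-index` are assumptions for illustration only. The code only builds the requests; sending them is left as a commented-out line.

```python
import json
import urllib.request

ES_URL = "http://localhost:9200"  # assumption: local node, no auth


def refresh_interval_request(index, interval):
    """Build a PUT <index>/_settings request that sets index.refresh_interval.

    interval="-1" disables periodic refresh; interval=None is sent as JSON
    null, which resets the setting to its default.
    """
    body = json.dumps({"index": {"refresh_interval": interval}}).encode("utf-8")
    return urllib.request.Request(
        url=f"{ES_URL}/{index}/_settings",
        data=body,
        method="PUT",
        headers={"Content-Type": "application/json"},
    )


# "my-logs-index" is a hypothetical index name.
disable = refresh_interval_request("my-logs-index", "-1")  # before the bulk load
restore = refresh_interval_request("my-logs-index", None)  # after the bulk load

# To actually send a request against a running cluster:
# urllib.request.urlopen(disable)
```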
You don't have to do it at all. You can, but you should only do it if there will be no further writes to that index; otherwise merges will happen again anyway over time.
Mine is a time-based index, and after the last write there aren't any more writes. I didn't force merge; I only refreshed the index. In search I can see what is possibly the last line I inserted (at least the last line in my log feed). I say 'possibly' because the search returns other hits too, but the last one did show up. The index is large (about 1.6 TB), so I don't want to force merge if I can get away without it.
So are you saying I have to force merge to see the results of the insertion, or only that it helps? (The use of 'should' in your answer is confusing.)
I understand force merge makes searches faster/better on read only indices but that optimization is for another day.
A refresh is perfectly fine to see the latest changes. You might want to run a flush so that everything is persisted to disk, which makes recovery faster in case it is needed.
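As a rough sketch of those two steps, the refresh and flush APIs are plain `POST` calls on the index (`POST <index>/_refresh` and `POST <index>/_flush`). The node address and the index name `my-logs-index` below are assumptions, and the helper only constructs the requests rather than sending them.

```python
import urllib.request

ES_URL = "http://localhost:9200"  # assumption: local node, no auth


def index_action_request(index, action):
    """Build a body-less POST request for <index>/_refresh or <index>/_flush."""
    if action not in ("_refresh", "_flush"):
        raise ValueError(f"unsupported action: {action}")
    return urllib.request.Request(f"{ES_URL}/{index}/{action}", method="POST")


# "my-logs-index" is a hypothetical index name.
refresh = index_action_request("my-logs-index", "_refresh")  # make new docs searchable
flush = index_action_request("my-logs-index", "_flush")      # persist to disk

# To actually send them, in this order, against a running cluster:
# urllib.request.urlopen(refresh)
# urllib.request.urlopen(flush)
```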