I know that a refresh will create new Lucene segments from newly indexed
documents, but does the refresh directly signal the Lucene merge scheduler
in any way? If there are no documents being indexed, does a positive
refresh interval (> 0ms) have any effect on merging?
We are facing a scenario where too many segments on a relatively small index
are causing search performance issues. Too much time is spent advancing
between segments, and only an optimize (er, force merge) alleviates the
issue.
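
For anyone wanting to watch the segment count before and after a force merge, here is a minimal Python sketch. It assumes a node reachable at http://localhost:9200 and a hypothetical index name `my-index`; neither comes from this thread. It uses the `_cat/segments` and `_forcemerge` REST endpoints (older releases expose the same operation as `_optimize`), and the helper names are made up for illustration.

```python
import json
import urllib.request
from collections import Counter

ES_URL = "http://localhost:9200"  # assumption: local node, no auth
INDEX = "my-index"                # hypothetical index name


def segments_url(base, index):
    """URL that lists one row per Lucene segment, as JSON."""
    return f"{base}/_cat/segments/{index}?format=json"


def forcemerge_url(base, index, max_num_segments=1):
    """URL that asks for a force merge down to max_num_segments per shard."""
    return f"{base}/{index}/_forcemerge?max_num_segments={max_num_segments}"


def count_segments_per_shard(rows):
    """Count segments per (shard, primary/replica) copy from _cat/segments rows."""
    return Counter((row["shard"], row["prirep"]) for row in rows)


def call(method, url):
    """Send a request without a body and decode the JSON reply, if any."""
    req = urllib.request.Request(url, method=method)
    with urllib.request.urlopen(req) as resp:
        body = resp.read()
    return json.loads(body) if body else None


# Example usage against a live cluster (left commented out on purpose):
# before = count_segments_per_shard(call("GET", segments_url(ES_URL, INDEX)))
# call("POST", forcemerge_url(ES_URL, INDEX, max_num_segments=1))
# after = count_segments_per_shard(call("GET", segments_url(ES_URL, INDEX)))
# print(before, after)
```

Comparing the per-shard counts over a few indexing cycles should show whether background merging is reducing segments at all, or whether only the explicit force merge does.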
Thanks Mark. Your responses are as expected. A co-worker is convinced we
need to set the refresh interval to something other than -1, even if we run
an explicit refresh after a bulk indexing batch.
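
To make the setup concrete, this is roughly the request sequence being described: refresh disabled with -1, a bulk batch, then one explicit refresh. It is only a sketch that builds the requests without sending them. The index name, document shape, and helper names are hypothetical, and older Elasticsearch versions also expect a `_type` in the bulk action line.

```python
import json


def refresh_settings_request(index, interval):
    """PUT /<index>/_settings with the given refresh_interval ("-1" disables it)."""
    body = {"index": {"refresh_interval": interval}}
    return ("PUT", f"/{index}/_settings", json.dumps(body))


def explicit_refresh_request(index):
    """POST /<index>/_refresh, making newly indexed documents searchable."""
    return ("POST", f"/{index}/_refresh", None)


def bulk_body(index, docs):
    """Build the newline-delimited body for POST /_bulk from (id, source) pairs."""
    lines = []
    for doc_id, source in docs:
        lines.append(json.dumps({"index": {"_index": index, "_id": doc_id}}))
        lines.append(json.dumps(source))
    # The bulk API requires a trailing newline after the last line.
    return "\n".join(lines) + "\n"


def batch_plan(index, docs):
    """Ordered requests for one batch: disable refresh, bulk index, refresh once."""
    return [
        refresh_settings_request(index, "-1"),
        ("POST", "/_bulk", bulk_body(index, docs)),
        explicit_refresh_request(index),
    ]


# Example: print the plan for a two-document batch on a hypothetical index.
for method, path, body in batch_plan("my-index", [("1", {"f": "a"}), ("2", {"f": "b"})]):
    print(method, path)
```

Each tuple is (method, path, body), so the plan can be handed to whatever HTTP client is already in use.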
Our queries are adversely affected by the number of segments, which we
never see decreasing. We index a usually small batch of documents every
twenty minutes, which may include updates (deletes). Our current attempt is
to increase the number of max segments so that no single segment is over
1 GB, in the hope that the merge scheduler can do its thing.