It might be more convenient to handle this at the application level. You
can add a numeric field called "availability" and create a filtered alias
for the index, then run all searches through that alias. The filter in the
alias excludes every record whose availability value is higher than a
certain threshold. Each new batch should be indexed with an availability
value higher than the current threshold, so the records from the new batch
will not appear in the search results even if the index refreshes. When
you want to make a new batch of records available, you just recreate the
alias with a new filter that has a higher threshold.
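
Here is a rough sketch of that idea in Python. It only builds the JSON bodies and makes no HTTP calls. The index name "documents", the alias name "searchable" and the helper names are made up for illustration. The alias body follows the "actions"/"add" shape of the _aliases endpoint with a range filter; I am assuming that re-adding an alias under the same name replaces its filter. If that is not the case on your version, put a "remove" action before the "add" in the same request.

```python
import json

INDEX = "documents"     # hypothetical index name
ALIAS = "searchable"    # hypothetical alias that all searches go through
FIELD = "availability"  # the numeric field described above


def tag_batch(docs, batch_level):
    """Return copies of the docs stamped with the batch's availability level."""
    return [dict(doc, **{FIELD: batch_level}) for doc in docs]


def alias_update_body(threshold, index=INDEX, alias=ALIAS):
    """Build a body for POST /_aliases that (re)defines the filtered alias.

    Only documents with availability <= threshold pass the filter.
    """
    return {
        "actions": [
            {
                "add": {
                    "index": index,
                    "alias": alias,
                    "filter": {"range": {FIELD: {"lte": threshold}}},
                }
            }
        ]
    }


def is_visible(doc, threshold):
    """Local mirror of the alias filter, handy for reasoning about it."""
    return doc[FIELD] <= threshold


if __name__ == "__main__":
    current_threshold = 1
    # Index the new batch one level above the current threshold.
    new_batch = tag_batch([{"title": "a"}, {"title": "b"}], current_threshold + 1)
    print(json.dumps(new_batch))
    # Later, on explicit request, raise the threshold to publish the batch.
    print(json.dumps(alias_update_body(current_threshold + 1)))
```

Raising the threshold is a single alias update, so the whole batch becomes visible at once no matter how many times the index refreshed while it was being loaded.
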
On Wednesday, April 11, 2012 9:28:25 AM UTC-4, Hillel Taub-Tabib wrote:
I'm doing bulk indexing, and I need my documents to become searchable
only on explicit request. I was hoping to achieve this by disabling
auto-refresh. I changed "index.refresh_interval" to -1, and indeed,
when indexing small batches of documents, the documents do not become
searchable. However, when I'm indexing large batches (~40000 docs),
the index appears to be refreshing itself several times during the
process (I assume due to memory issues).
Is there a way to bulletproof the process and make sure that
documents do not become searchable until explicitly requested?