Need search results to appear only after indexing is complete

2-node ES cluster (non-prod lab)
ES version 6.1
RAM = 24 GB/node
CPU cores = 6/node

I am working on a use case where I need to bulk index a large number of records (around 2 million, close to 900 MB in total), and the results should show up in search only after the entire indexing run is done.

I disabled refresh_interval, assuming I would then need to refresh manually once the full bulk upload was complete.
But during indexing, I noticed search results starting to show up when only about half the records had been indexed (~450 MB).
I also noticed the cluster releasing memory and the segment count increasing at around this time.
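
For reference, here is a minimal sketch of the setup described above: turn off the periodic refresh, run the bulk upload, then refresh once by hand. It assumes that "disabled" means setting refresh_interval to "-1". The URL, the index name records_bulk, and the run_bulk_upload callback are hypothetical placeholders. Note that this only stops the scheduled refresh, and as reported above it did not by itself keep documents out of search in this lab.

```python
import json
import urllib.request

ES_URL = "http://localhost:9200"  # assumption: a lab node reachable locally
INDEX = "records_bulk"            # hypothetical index name


def refresh_settings_body(interval):
    """Body for PUT /<index>/_settings; "-1" turns the periodic refresh off."""
    return {"index": {"refresh_interval": interval}}


def es_request(method, path, body=None):
    """Send one JSON request to the cluster and return the decoded response."""
    data = json.dumps(body).encode("utf-8") if body is not None else None
    req = urllib.request.Request(
        ES_URL + path,
        data=data,
        method=method,
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read().decode("utf-8"))


def bulk_load_with_manual_refresh(run_bulk_upload):
    """Disable periodic refresh, run the caller's upload, then refresh once."""
    es_request("PUT", f"/{INDEX}/_settings", refresh_settings_body("-1"))
    run_bulk_upload()  # caller-supplied function that sends the _bulk requests
    # Only reached if the upload succeeded: restore the 1s default and refresh.
    es_request("PUT", f"/{INDEX}/_settings", refresh_settings_body("1s"))
    es_request("POST", f"/{INDEX}/_refresh")
```

Nothing runs until bulk_load_with_manual_refresh is called with a function that performs the actual upload.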

Researching this behavior further: could ES’s default flush process have caused this?
I would like to know more about the optimal values of the config parameters below for this cluster, their implications for cluster performance, and the maximum thresholds I can set and use.

I also read that disk I/O and the file system cache may be related factors to consider.
How would these affect bulk indexing and search results for my use case?

Any help on this would be really appreciated.
Thanks!
D

Maybe you could search through an alias that you only link to the index once indexing has finished?

It has to be dynamically accessed by the application.
I can do this manually in the lab, but how would I go about doing it in a production cluster?
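
One possible way to script the alias suggestion instead of doing it by hand is to have the loader call the _aliases API as its last step, so the application always searches the alias and never the index directly. The sketch below is only an illustration under assumptions: the alias name records, the index names records_v1 and records_v2, and the URL are all hypothetical. A single _aliases request applies its actions atomically, so searches switch from the old index to the new one in one step.

```python
import json
import urllib.request

ES_URL = "http://localhost:9200"  # assumption: cluster address


def alias_swap_actions(alias, new_index, old_index=None):
    """Build the POST /_aliases body that points `alias` at `new_index`.

    If `old_index` is given, the alias is removed from it in the same request.
    """
    actions = []
    if old_index is not None:
        actions.append({"remove": {"index": old_index, "alias": alias}})
    actions.append({"add": {"index": new_index, "alias": alias}})
    return {"actions": actions}


def publish_index(alias, new_index, old_index=None):
    """Call after the bulk load has finished and the new index is refreshed."""
    body = json.dumps(alias_swap_actions(alias, new_index, old_index))
    req = urllib.request.Request(
        ES_URL + "/_aliases",
        data=body.encode("utf-8"),
        method="POST",
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read().decode("utf-8"))
```

For example, the loader would index everything into records_v2 and then call publish_index("records", "records_v2", "records_v1"), while the application keeps querying records throughout.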
