2-node ES cluster (non-prod lab)
ES version: 6.1
RAM: 24 GB per node
CPU cores: 6 per node
I am working on a use case where I need to bulk index a large number of records (around 2 million, close to 900 MB in total), and the results should show up in search only after the entire indexing is done.
I disabled the refresh_interval, assuming I would need to refresh manually once the full bulk upload is complete.
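For reference, the load sequence I had in mind (disable refresh, bulk load in batches, then restore the interval and refresh manually) can be sketched roughly as follows. This is only a minimal Python sketch that builds the requests instead of sending them; the index name `my-index`, the mapping type `doc`, the batch size and restoring `"1s"` (the default interval) are my own hypothetical placeholders, not settings from my actual setup.

```python
import json
from typing import Any, Dict, Iterable, Iterator, List, Optional, Tuple

# Hypothetical placeholders: index name and the single mapping type ES 6.x expects.
INDEX = "my-index"
DOC_TYPE = "doc"

# "-1" disables the periodic refresh; "1s" is the default interval to restore.
DISABLE_REFRESH = {"index": {"refresh_interval": "-1"}}
RESTORE_REFRESH = {"index": {"refresh_interval": "1s"}}

# (HTTP method, path, body) for each request; body is None when not needed.
Request = Tuple[str, str, Optional[Any]]


def bulk_body(docs: Iterable[Tuple[str, Dict[str, Any]]]) -> str:
    """Build an NDJSON _bulk payload: an action line followed by a source line per doc."""
    lines: List[str] = []
    for doc_id, source in docs:
        action = {"index": {"_index": INDEX, "_type": DOC_TYPE, "_id": doc_id}}
        lines.append(json.dumps(action))
        lines.append(json.dumps(source))
    # The bulk API requires the payload to end with a newline.
    return "\n".join(lines) + "\n"


def chunks(items: List[Any], size: int) -> Iterator[List[Any]]:
    """Split the full record list into fixed-size batches."""
    for start in range(0, len(items), size):
        yield items[start:start + size]


def load_plan(docs: List[Tuple[str, Dict[str, Any]]], batch_size: int = 5000) -> List[Request]:
    """Order of requests: disable refresh, bulk batches, restore interval, manual refresh."""
    plan: List[Request] = [("PUT", f"/{INDEX}/_settings", DISABLE_REFRESH)]
    for batch in chunks(docs, batch_size):
        plan.append(("POST", "/_bulk", bulk_body(batch)))
    plan.append(("PUT", f"/{INDEX}/_settings", RESTORE_REFRESH))
    plan.append(("POST", f"/{INDEX}/_refresh", None))
    return plan


if __name__ == "__main__":
    sample = [(str(i), {"n": i}) for i in range(3)]
    for method, path, _body in load_plan(sample, batch_size=2):
        print(method, path)
```

Actually sending these (e.g. with an HTTP client or the official Python client) is left out of the sketch on purpose.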
But during the indexing process, I noticed that search results started showing up when only half of the records had been indexed (~450 MB).
I also noticed that the cluster released memory and that the segment count increased at around the same time.
After researching this behavior further, I wonder: could Elasticsearch's default flush process have caused this?
I would like to know more about the optimal values of the config parameters below for ES clusters, and their implications for cluster performance, so that I can find the maximum threshold I can set and use for this cluster.
I have also read that disk I/O and the file system cache may be related parameters to consider.
How would these affect bulk indexing and search results for my use case?
Any help on this would be really appreciated.
Thanks!
D