We are syncing data from Couchbase into Elasticsearch via the CBES connector and see a delay of up to ~550ms before CBES inserts the data into ES.
Afterwards it takes up to 2 seconds for data to become searchable in ES.
Do I understand correctly that there is an unavoidable delay of up to 1 second from the refresh_interval index setting? Can refresh_interval be set to less than 1 second?
The goal is to decrease latency between data being inserted in Couchbase and appearing for search in Elasticsearch.
I've applied a 200ms refresh_interval to an index in our test environment. Are we going to see any adverse effects on big indexes? How low can it realistically go?
Do you know of anyone running a refresh_interval lower than 1s in production? I'll start lowering it in 100-200ms increments and see if the load is acceptable.
Can you suggest which parameters I should monitor after decreasing this setting, to check for any adverse effects?
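For reference, the 200ms setting mentioned above is applied through the update index settings API (PUT <index>/_settings). The following is a minimal Python sketch that only builds the request; the cluster address http://localhost:9200 and the index name my-index are placeholders, not values from this thread.

```python
import json
from urllib import request


def build_refresh_interval_request(base_url: str, index: str, interval: str) -> request.Request:
    """Build (but do not send) a PUT <index>/_settings request."""
    # refresh_interval takes a time value such as "1s" or "200ms".
    body = json.dumps({"index": {"refresh_interval": interval}}).encode("utf-8")
    return request.Request(
        url=f"{base_url.rstrip('/')}/{index}/_settings",
        data=body,
        method="PUT",
        headers={"Content-Type": "application/json"},
    )


# Placeholder cluster address and index name, for illustration only.
req = build_refresh_interval_request("http://localhost:9200", "my-index", "200ms")
print(req.get_method(), req.full_url)
print(req.data.decode("utf-8"))
```

To actually apply the setting, the request object can be passed to urllib.request.urlopen against a reachable cluster. Lowering the value in small steps, as planned above, only means changing the interval argument.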
In addition to altering the refresh interval, you can also set a parameter per request to refresh once the request has completed. This, however, generally has a huge impact on indexing throughput, as it results in a lot of additional processing, disk I/O, and a lot of very small segments that need to be merged. I assume this would also negatively affect search performance. I would expect setting a very short refresh interval to have similar effects.
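The per-request parameter described here is the refresh query parameter on write requests. As a rough sketch (the address, index name, document ID and document body are placeholders), an indexing request with that parameter could be built like this. refresh=true forces a refresh right after the write, while refresh=wait_for holds the response until the next scheduled refresh has made the document visible, without forcing an extra refresh.

```python
import json
from urllib import parse, request

ALLOWED_REFRESH = ("true", "false", "wait_for")


def build_index_request(base_url: str, index: str, doc_id: str, doc: dict,
                        refresh: str = "false") -> request.Request:
    """Build (but do not send) a PUT <index>/_doc/<id>?refresh=... request."""
    if refresh not in ALLOWED_REFRESH:
        raise ValueError(f"refresh must be one of {ALLOWED_REFRESH}")
    query = parse.urlencode({"refresh": refresh})
    url = f"{base_url.rstrip('/')}/{index}/_doc/{parse.quote(doc_id, safe='')}?{query}"
    return request.Request(
        url=url,
        data=json.dumps(doc).encode("utf-8"),
        method="PUT",
        headers={"Content-Type": "application/json"},
    )


# Placeholder values, for illustration only.
index_req = build_index_request(
    "http://localhost:9200", "my-index", "user-1", {"name": "example"}, refresh="wait_for"
)
print(index_req.get_method(), index_req.full_url)
```

Whether either option is affordable depends on the write rate; with refresh=true, every write pays for its own refresh, which is where the throughput cost described above comes from.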
And is trial the only way to assess the negative performance impact? If I have some resources to spare, should it work reliably, with performance degradation as the only adverse effect?
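One way to make such a trial measurable is to compare two snapshots of the index stats API (GET <index>/_stats/refresh,merge,segments) taken before and after a test period. The sketch below is an assumption about how one might do that, not an established procedure. It reads the refresh, merges and segments counters from the _all.primaries section of the response, and the numbers in the two sample snapshots are made up for illustration, not measurements.

```python
def refresh_merge_deltas(before: dict, after: dict) -> dict:
    """Compare two index stats responses and return counter differences."""
    b = before["_all"]["primaries"]
    a = after["_all"]["primaries"]
    return {
        "refreshes": a["refresh"]["total"] - b["refresh"]["total"],
        "refresh_time_ms": a["refresh"]["total_time_in_millis"] - b["refresh"]["total_time_in_millis"],
        "merges": a["merges"]["total"] - b["merges"]["total"],
        "merge_time_ms": a["merges"]["total_time_in_millis"] - b["merges"]["total_time_in_millis"],
        "segment_count_change": a["segments"]["count"] - b["segments"]["count"],
    }


# Made-up snapshots, trimmed to the fields used above.
snapshot_before = {"_all": {"primaries": {
    "refresh": {"total": 100, "total_time_in_millis": 500},
    "merges": {"total": 10, "total_time_in_millis": 2000},
    "segments": {"count": 12},
}}}
snapshot_after = {"_all": {"primaries": {
    "refresh": {"total": 400, "total_time_in_millis": 1700},
    "merges": {"total": 25, "total_time_in_millis": 6500},
    "segments": {"count": 30},
}}}

deltas = refresh_merge_deltas(snapshot_before, snapshot_after)
print(deltas)
```

Comparing these deltas at each refresh_interval step, alongside indexing and search latency, gives a rough picture of how much extra refresh and merge work each reduction causes.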