I've been trying to figure out how to change the refresh setting from Refresh.True to Refresh.False, but so far, I haven't found a proper way to do it.
Could you please advise if there's a way to modify the refresh setting after the BulkIngester has been created?
Alternatively, if modifying the refresh setting directly is not possible, I'm curious if it's feasible to have multiple BulkIngester instances with different refresh settings concurrently.
Any insights or suggestions would be greatly appreciated. Thank you!
Thank you for your question. The current approach I'm taking involves using the new Java library (v8.2). Initially, this approach seemed unfamiliar to me as well. However, after conducting some research on the matter, I've found that the recommended approach is to utilize refresh on search by setting refresh=true in your GET API calls.
However, it's important to note that this method doesn't guarantee the absolutely latest data in every scenario, correct? If this is the case, I'm curious about how to ensure that the data found in the index is indeed the latest available.
@dadoonet Ah, I couldn't locate any information regarding the efficacy of utilizing refresh on search by setting refresh=true in my GET API calls to ensure that the latest indexed data is immediately visible. Could you point me to where this is documented?
@dadoonet Thank you very much. However, my understanding is that having refresh=true in every GET API call triggers a refresh of indices with each call. Wouldn't this potentially impact performance negatively?
Refreshing frequently, either ahead of a query or through every bulk request, adds a lot of overhead and can have a huge negative impact on indexing throughput. Performing a refresh is an expensive operation, which is why Elasticsearch by default does not do it for every operation but rather at a configurable interval. If you look at the guidelines for tuning indexing throughput, you can see that one of the main recommendations is to disable or lengthen the refresh interval. Making it more frequent than the default value naturally has the opposite effect.
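For reference, the refresh interval mentioned above is the index setting `index.refresh_interval`, which can be changed through the update index settings API (`PUT /<index>/_settings`). Below is a minimal sketch that only builds such a request with the JDK's `java.net.http` classes and does not send it. The host `http://localhost:9200`, the index name `my-index`, the value `30s`, and the class and method names are all assumptions for illustration, not something taken from this thread.

```java
import java.net.URI;
import java.net.http.HttpRequest;

class RefreshIntervalRequest {

    // Builds the JSON body for the update index settings API.
    // "30s" lengthens the interval; "-1" disables periodic refreshes.
    public static String settingsBody(String interval) {
        return "{\"index\":{\"refresh_interval\":\"" + interval + "\"}}";
    }

    // Builds (but does not send) a PUT <baseUrl>/<index>/_settings request.
    // baseUrl and index are placeholders chosen by the caller.
    public static HttpRequest build(String baseUrl, String index, String interval) {
        return HttpRequest.newBuilder()
                .uri(URI.create(baseUrl + "/" + index + "/_settings"))
                .header("Content-Type", "application/json")
                .PUT(HttpRequest.BodyPublishers.ofString(settingsBody(interval)))
                .build();
    }

    public static void main(String[] args) {
        // Hypothetical host and index name, for illustration only.
        HttpRequest request = build("http://localhost:9200", "my-index", "30s");
        System.out.println(request.method() + " " + request.uri());
        System.out.println(settingsBody("30s"));
    }
}
```

Sending the request would additionally need an `HttpClient` plus whatever authentication and TLS settings your cluster requires, which are left out here.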
How often do you query the data? How are you querying the data? If you are querying frequently, you will end up refreshing the index frequently, so it is not necessarily much better than refreshing after each bulk request, and it will affect indexing performance. If you only query occasionally, it is likely a lot better to refresh ahead of the query rather than for each bulk request, although it may still affect indexing performance.
Frequent refreshes, irrespective of how they are triggered, have a negative impact on indexing performance.
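To make the trade-off concrete, here is a toy model that simply counts how many explicit refreshes each strategy would trigger for a given sequence of bulk requests and queries. It is only a back-of-the-envelope sketch: the class, enum, and method names are made up for illustration, and it ignores the periodic background refresh as well as the actual cost of each refresh.

```java
import java.util.List;

class RefreshCountModel {

    // A simplified workload event: either a bulk request or a query.
    public enum Event { BULK, QUERY }

    // Strategy A: refresh as part of every bulk request.
    public static long refreshPerBulk(List<Event> events) {
        return events.stream().filter(e -> e == Event.BULK).count();
    }

    // Strategy B: refresh ahead of every query.
    public static long refreshPerQuery(List<Event> events) {
        return events.stream().filter(e -> e == Event.QUERY).count();
    }

    public static void main(String[] args) {
        // Rare writes, frequent queries.
        List<Event> queryHeavy = List.of(
                Event.BULK, Event.QUERY, Event.QUERY, Event.QUERY, Event.QUERY);
        // Frequent writes, occasional queries.
        List<Event> writeHeavy = List.of(
                Event.BULK, Event.BULK, Event.BULK, Event.BULK, Event.QUERY);

        System.out.println("query-heavy: per-bulk=" + refreshPerBulk(queryHeavy)
                + ", per-query=" + refreshPerQuery(queryHeavy));
        System.out.println("write-heavy: per-bulk=" + refreshPerBulk(writeHeavy)
                + ", per-query=" + refreshPerQuery(writeHeavy));
    }
}
```

In this model, whichever event type is rarer determines the cheaper strategy, which matches the point above: with occasional queries, refreshing ahead of the query triggers fewer refreshes, while with frequent queries it does not.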
@Christian_Dahlqvist The data is queried frequently, which is why I decided to refresh the index during the bulk step. Creating, updating, or deleting documents in the index is not a common occurrence.