Question about default refresh_interval and upsert

Hello everyone,

First of all, I am pretty new to ElasticSearch; please forgive me if the answers to my questions are obvious.

We have switched from ElasticSearch 6.8 to ElasticSearch 7.x, and I was very interested in the new behavior of refresh_interval: "If this setting is not explicitly set, shards that haven’t seen search traffic for at least index.search.idle.after seconds will not receive background refreshes until they receive a search request".
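To make the quoted rule concrete, here is a toy model of it in Python. It is only a sketch of the sentence from the documentation, not Elasticsearch's actual implementation; the function name and parameters are hypothetical, and it assumes that an explicitly set refresh_interval is a positive interval (not -1, which disables refreshes).

```python
# Documented default of index.search.idle.after, in seconds.
DEFAULT_IDLE_AFTER_SECONDS = 30


def gets_background_refresh(refresh_interval_explicit,
                            seconds_since_last_search,
                            idle_after=DEFAULT_IDLE_AFTER_SECONDS):
    """Toy model of the quoted rule (hypothetical helper, not an ES API).

    Returns True if a shard would still receive periodic background
    refreshes according to the documentation sentence quoted above.
    """
    # An explicitly configured refresh_interval opts out of search-idle
    # handling (assumption: the explicit value is a positive interval).
    if refresh_interval_explicit:
        return True
    # Otherwise the shard goes search idle once it has seen no search
    # traffic for at least `idle_after` seconds.
    return seconds_since_last_search < idle_after


print(gets_background_refresh(False, 10))    # searched recently
print(gets_background_refresh(False, 3600))  # search idle
print(gets_background_refresh(True, 3600))   # explicit refresh_interval
```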

In our case, we do bulk upserts at a pretty high rate, but generally not while the indexes are being searched. I think I understand the behavior pretty well, but I was wondering whether the fact that we are using upsert (we force the id of the document and want to make sure we update any existing document with the same id) instead of always creating a new document makes a difference: is looking up a document with the same id considered a search request on the index?
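For reference, this is roughly what a bulk upsert by id looks like, sketched in Python. It only builds the NDJSON body for the _bulk API using `update` actions with `doc_as_upsert`; the helper name, the index name "my-index", and the sample document are made up for illustration, and actually sending the body is left out.

```python
import json


def build_bulk_upsert_body(index, docs):
    """Build an NDJSON _bulk body that upserts each document by its _id.

    `docs` maps a document id to its fields. Hypothetical helper for
    illustration only.
    """
    lines = []
    for doc_id, fields in docs.items():
        # Action line: update the document with this exact _id.
        lines.append(json.dumps({"update": {"_index": index, "_id": doc_id}}))
        # Payload line: merge `fields` into the existing document, or
        # create the document from `fields` if it does not exist yet.
        lines.append(json.dumps({"doc": fields, "doc_as_upsert": True}))
    # The _bulk API requires the body to end with a newline.
    return "\n".join(lines) + "\n"


body = build_bulk_upsert_body("my-index", {"42": {"status": "active"}})
print(body)
```

The resulting string would be sent as the body of a POST to _bulk with the Content-Type application/x-ndjson.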

I also have a side question: I have not been able to find out how to log all the refresh requests done by ElasticSearch (for debugging purposes). Is this possible?

Thanks for your time

Welcome to our community! :smiley:

No worries. Do note that it's Elasticsearch, not a camel case S :slight_smile:

Yep, it's a search unless you are doing a GET on the _id.

As in API requests to _refresh?

Thanks for your answer

Sorry, I was not clear. I don't do a search but I do an update of the document by specifying the document _id. This, I guess, causes a search of the document internally using the _id; it is this search I was referring to. Anyway, I suppose you have answered my question since this search is likely based on the _id.

I was thinking more about logging the refresh actions triggered internally by Elasticsearch (with the right spelling this time :slightly_smiling_face:) after a document is inserted or updated. I know this will cause a large amount of logging, but it is to make sure I clearly understand the underlying processing.

If you are doing upserts I believe refreshes would be triggered, at least when updating a document that has not already been refreshed.

Yes, I am sure the refresh will be triggered, but I am wondering when. Does the upsert command cause the index to exit the "search idle" state (if it was already in this state, of course), or will the refresh be triggered only after the search idle timeout (or when a search request is made through another call)?

A refresh would trigger any time you are upserting a document that exists in the transaction log and has not yet been refreshed. This is why frequent updates of single documents are very expensive in Elasticsearch, as they trigger frequent, small, expensive refreshes. I am not sure if it would trigger refreshes under other conditions, but this means that the frequency of refreshes will depend on your update patterns.

This has been changed back and forth, so it can be confusing, but in the latest ES versions updates do NOT trigger refreshes. A more detailed history can be read in this GitHub PR.

About logging refreshes, I am wondering if looking at refresh count in index stats could be helpful.
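As a rough sketch of that idea in Python: take two snapshots of the index stats (for example from GET /my-index/_stats/refresh) and compare the refresh totals. The helper names are hypothetical, and the two dictionaries below are hand-written, trimmed samples shaped like the stats response, not real output from a cluster.

```python
def refresh_total(stats, index=None):
    """Read the cumulative refresh count from an index stats response.

    Uses the "total" section (primaries plus replicas); swap in
    "primaries" to count primary shards only.
    """
    section = stats["indices"][index] if index else stats["_all"]
    return section["total"]["refresh"]["total"]


def refreshes_between(before, after, index=None):
    """Number of refreshes that happened between two stats snapshots."""
    return refresh_total(after, index) - refresh_total(before, index)


# Hand-written sample snapshots (illustrative values only).
before = {
    "_all": {"total": {"refresh": {"total": 120}}},
    "indices": {"my-index": {"total": {"refresh": {"total": 120}}}},
}
after = {
    "_all": {"total": {"refresh": {"total": 135}}},
    "indices": {"my-index": {"total": {"refresh": {"total": 135}}}},
}

print(refreshes_between(before, after, "my-index"))  # 15
```

Taking a snapshot before and after a batch of upserts, with no searches in between, would show whether the upserts themselves caused any refreshes.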

