Is it possible to know if the index is in a “refreshed” state?


(Bert Vanpeteghem) #1

I need to do an operation that requires all pending transactions to be in the index.

One option would be a manual call to _refresh, but I'm wondering whether there is a way to know if the index is in a "refreshed" state.

That way I can call _refresh only when necessary.


(Colin Goodheart-Smithe) #2

If you call refresh and there is no new data since the previous refresh, it will be a no-op, so calling refresh on an already up-to-date index doesn't really have an overhead.

That said, you should be careful how often you call refresh, as it does carry a significant overhead when there is data to refresh. For example, you should not call refresh after every index or search operation.
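To make the cost argument concrete, here is a deliberately simplified toy model. The class `ToyIndex` is hypothetical and is not how Elasticsearch is implemented internally: indexed documents sit in a pending buffer until a refresh makes them searchable, and a refresh with nothing pending does no work. Counting the refreshes that actually had work to do shows why one refresh after a batch is cheaper than one per document.

```python
class ToyIndex:
    """Hypothetical, simplified model of near-real-time search (not ES internals)."""

    def __init__(self):
        self.pending = []          # indexed but not yet visible to search
        self.searchable = []       # visible to search
        self.costly_refreshes = 0  # refreshes that actually had work to do

    def index(self, doc):
        self.pending.append(doc)

    def refresh(self):
        """Make pending docs searchable; return False if it was a no-op."""
        if not self.pending:
            return False  # nothing new since the last refresh: cheap no-op
        self.searchable.extend(self.pending)
        self.pending.clear()
        self.costly_refreshes += 1
        return True

    def search(self, doc):
        return doc in self.searchable


# Refreshing after every document: one costly refresh per write.
per_doc = ToyIndex()
for i in range(100):
    per_doc.index(i)
    per_doc.refresh()

# Indexing a batch, then refreshing once: a single costly refresh.
batched = ToyIndex()
for i in range(100):
    batched.index(i)
batched.refresh()
second = batched.refresh()  # no new data, so this is a no-op

print(per_doc.costly_refreshes, batched.costly_refreshes, second)  # 100 1 False
```

The model only captures the "no-op when nothing changed" behaviour Colin describes; the real cost of a refresh depends on how much data is pending and on the cluster.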


(Bert Vanpeteghem) #3

As an extra question: after the refresh request returns, can we assume the data is consistent?
So:

  • bulk
  • refresh
  • query: will this query see the refreshed data, or is it possible that the refresh is still running in Elasticsearch?

Assuming we do the above steps synchronously...
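The synchronous sequence above maps onto three REST calls. The sketch below only builds the requests and sends nothing. It uses the `_bulk` NDJSON format (an action line followed by a source line, each newline-terminated, with a trailing newline at the end), then `POST /<index>/_refresh`, then a search. The index name `my-index` and the helper `plan_bulk_refresh_query` are hypothetical placeholders, and the action lines assume a recent Elasticsearch version without mapping types.

```python
import json


def plan_bulk_refresh_query(index, docs, query):
    """Build (method, path, body) tuples for bulk -> refresh -> search.

    Hypothetical helper for illustration; it is not a client library API.
    `docs` maps document id -> source dict, `query` is a query DSL dict.
    """
    lines = []
    for doc_id, source in docs.items():
        lines.append(json.dumps({"index": {"_id": doc_id}}))  # action line
        lines.append(json.dumps(source))                      # document source
    bulk_body = "\n".join(lines) + "\n"  # _bulk requires a trailing newline
    return [
        ("POST", f"/{index}/_bulk", bulk_body),
        ("POST", f"/{index}/_refresh", None),
        ("GET", f"/{index}/_search", json.dumps({"query": query})),
    ]


for method, path, body in plan_bulk_refresh_query(
    "my-index", {"1": {"title": "hello"}}, {"ids": {"values": ["1"]}}
):
    print(method, path)
```

With a real client, each request should be sent only after the previous response has arrived; issuing them concurrently gives no ordering guarantee.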


(Colin Goodheart-Smithe) #4

It is possible that the refresh will still be running on Elasticsearch when you run the query. See https://github.com/elastic/elasticsearch/issues/1063 for more information on this. It looks like @nik9000 is currently thinking about solutions to this issue.


(Nik Everett) #5

Yeah, I've got 30% of a branch that adds a block_until_refresh option you can slap on index and bulk requests. There will be limits though, like only 10000 on the node at a time or something. It's more for the interactive use case.

It won't support "wait until everything is quiet" or "wait until everything I started in the last x minutes is done", but it's kind of like what you are asking for.

The trouble is that it's heavily niced (low-priority) work, and I've got a few things that keep preempting it.


(Bert Vanpeteghem) #6

Hi, thanks for your answer.

So, just to be sure that I understand correctly: does this mean that internally, refresh just starts doing the operation and returns a response before the operation is finished?

If this is just about saving one item (ignore the bulk part of the original question), would it be an option to block on our end (the client) until we see that the document is available (e.g. an ID query in a loop until it returns the doc)? Or are there better options?

I'm not very familiar with the Elasticsearch codebase (or Java, for that matter :-)). Could you point me to the classes where I can see this (and stop nagging here about it)?


(Nik Everett) #7

Sorry, I'm on mobile right now, so I can't point to classes very effectively.
You can totally do an ID query in a loop until it shows up.
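A minimal sketch of that client-side polling, assuming a caller-supplied `fetch()` that runs the ID lookup and returns the document or `None`. The helper name `wait_until_visible` and the timeout and backoff values are hypothetical. The lookup should go through search (for example an `ids` query) rather than the document GET API, because GET is real-time by default and can return a document before a refresh has made it searchable. The timeout keeps a test from hanging forever if the document never appears.

```python
import time


def wait_until_visible(fetch, timeout=5.0, initial_delay=0.05, max_delay=1.0):
    """Call fetch() until it returns a non-None result or `timeout` seconds pass.

    `fetch` is a hypothetical caller-supplied function wrapping an ID query.
    The delay between attempts doubles each time, capped at `max_delay`.
    """
    deadline = time.monotonic() + timeout
    delay = initial_delay
    while True:
        result = fetch()
        if result is not None:
            return result
        remaining = deadline - time.monotonic()
        if remaining <= 0:
            raise TimeoutError(f"document not visible after {timeout}s")
        time.sleep(min(delay, remaining))
        delay = min(delay * 2, max_delay)


# Usage with a fake fetch that "finds" the document on the third attempt.
attempts = []


def fake_fetch():
    attempts.append(1)
    return {"_id": "1"} if len(attempts) >= 3 else None


doc = wait_until_visible(fake_fetch, initial_delay=0.01)
print(doc, len(attempts))  # {'_id': '1'} 3
```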

The branch I mentioned is more about not issuing the refresh call at all, because that is a heavy-ish operation and not recommended for something you do all the time. I'm 99% sure the refresh call is blocking, because it's used in tests all the time.


(Bert Vanpeteghem) #8

Hi, thanks again for the answer.

Is there any way you could say 100% sure? :-) We currently have certain tests that fail intermittently, because queries for data we expect to be in the index return no results. We call (and wait for) refresh after putting the data into Elasticsearch with _bulk.
After that, our tests sometimes fail because they don't find the data.

Data not being searchable after calling refresh is one possible cause, and we would like to rule that out.

I'm not that familiar with Java or the Elasticsearch codebase, but while debugging the refresh request I did see that certain parts of the refresh code are put on a separate thread, and I'm not sure the main thread waits until they are finished:

TransportBroadcastOperationAction.performOperation is called (where the operation is put on a thread)


(Nik Everett) #9

If you call refresh and wait for the response, the data should, 100%, be in there. That is how Elasticsearch's own tests work.

You're right that the refresh request is put on a thread. It's fairly normal in Elasticsearch for requests to get pushed onto a thread, but it's also normal for the reply not to be sent until all of that work has finished. I haven't read the source for the refresh action, but I think if it didn't 100% ensure that the data was there, it would cause test failures on our side.
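The pattern described here, fanning work out onto threads but replying only once every piece has finished, can be sketched with Python's `concurrent.futures`. This illustrates the general technique only. It is not the code of `TransportBroadcastOperationAction`, and the shard names and the `refresh_shard` and `broadcast_refresh` functions are hypothetical.

```python
import threading
import time
from concurrent.futures import ThreadPoolExecutor, wait

refreshed = set()
lock = threading.Lock()


def refresh_shard(shard):
    """Hypothetical per-shard work; sleeps to simulate doing a refresh."""
    time.sleep(0.01)
    with lock:
        refreshed.add(shard)
    return shard


def broadcast_refresh(shards):
    """Run refresh_shard on every shard in parallel; reply only when all finish."""
    with ThreadPoolExecutor(max_workers=4) as pool:
        futures = [pool.submit(refresh_shard, s) for s in shards]
        wait(futures)  # block until every per-shard operation is done
        results = [f.result() for f in futures]  # re-raises any shard failure
    return {"shards_ok": len(results)}


response = broadcast_refresh(["shard-0", "shard-1", "shard-2"])
print(response, sorted(refreshed))  # {'shards_ok': 3} ['shard-0', 'shard-1', 'shard-2']
```

Seeing work handed to a thread pool therefore does not by itself mean the caller gets its response early; what matters is whether the response is built after the per-thread results are collected.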

The upside of this is that I think you can rule refresh out as the cause of your test failures. Or rather, you can rule out refresh not working. It's possible that you aren't waiting for the refresh to finish, or for the index request to finish before calling refresh. I know I've made those mistakes in the past.
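Both mistakes are easy to make with an asynchronous client. The toy sketch below uses a hypothetical `ToyAsyncIndex`, with `asyncio.sleep` standing in for request latency; it is not a real Elasticsearch client. It shows that refreshing while the bulk request is still in flight leaves the new document unsearchable, while awaiting each step in order finds it.

```python
import asyncio


class ToyAsyncIndex:
    """Hypothetical stand-in for an index behind an async client."""

    def __init__(self):
        self.pending = []     # written but not yet searchable
        self.searchable = []  # visible to search after a refresh

    async def bulk(self, docs):
        await asyncio.sleep(0.01)  # simulated request latency
        self.pending.extend(docs)

    async def refresh(self):
        await asyncio.sleep(0)  # yield to the event loop, as a real request would
        self.searchable.extend(self.pending)
        self.pending.clear()

    async def search(self, doc):
        return doc in self.searchable


async def awaited_in_order(index):
    await index.bulk(["a"])          # wait for the write to finish...
    await index.refresh()            # ...then for the refresh to finish...
    return await index.search("a")   # ...then query


async def refresh_too_early(index):
    write = asyncio.create_task(index.bulk(["b"]))  # started, not awaited
    await index.refresh()  # runs while the bulk request is still in flight
    found = await index.search("b")
    await write            # the write only completes afterwards
    return found


print(asyncio.run(awaited_in_order(ToyAsyncIndex())))   # True
print(asyncio.run(refresh_too_early(ToyAsyncIndex())))  # False
```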

