Yeah - I've got 30% of a branch to get a block_until_refresh option you can
slap on index and bulk requests. There will be limits though, like only
10000 on the node at a time or something. It's more for the interactive use
It won't support "wait until everything is quiet" or "wait until everything
I started in the last x minutes is done", but it's kind of like what you are
The trouble is that it's heavily niced work. I've got a few things that keep
So, just to be sure that I understand correctly: does this mean that internally, refresh just starts doing the operation and returns a response before the operation is finished?
If this is just about saving 1 item (ignore the bulk part of the original question), would it be an option to block on our end (the client) until we see that the document is available (e.g. an ID query in a loop until it returns the doc)? Or would there be better options?
I'm not very familiar with the Elasticsearch codebase (or Java for that matter :-)), so could you point me to the classes where I can see this (and then I'll stop nagging here about it)?
Sorry, I'm on mobile right now so I can't point to classes too effectively.
You can totally ID query in a loop until it shows up.
The branch I mentioned is more about not issuing the refresh call because
that is a heavy-ish operation, not recommended for something you do all the
time. I'm 99% sure the refresh call is blocking because it's used in tests
all the time.
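For reference, the client-side "ID query in a loop" idea could look roughly like the sketch below. This is only an illustration: `wait_for_doc` and `fetch_doc` are hypothetical names, and `fetch_doc` stands in for whatever query your client actually issues (it is assumed to return `None` while the document is not yet visible). The timeout and interval values are arbitrary.

```python
import time


def wait_for_doc(fetch_doc, doc_id, timeout=10.0, interval=0.2):
    """Poll fetch_doc(doc_id) until it returns a document or the timeout expires.

    fetch_doc is a caller-supplied (hypothetical) function that returns the
    document if it is visible, or None if it is not visible yet.
    """
    deadline = time.monotonic() + timeout
    while True:
        doc = fetch_doc(doc_id)
        if doc is not None:
            return doc
        if time.monotonic() >= deadline:
            raise TimeoutError(
                f"document {doc_id!r} not visible after {timeout}s"
            )
        # Back off briefly so the loop does not hammer the cluster.
        time.sleep(interval)
```

Having a hard timeout means a test fails with a clear error instead of hanging if the document never shows up.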
Is there a way you could say you're 100% sure? We currently have certain tests that fail periodically because data we expect to be in the index does not come back in the results. We call (and wait for) refresh after putting the data into Elasticsearch with _bulk.
After that, sometimes, our tests fail because they don't find the data.
The data in the index not being ready for search after calling refresh might be a possible cause, and we would like to rule that out...
I'm not that familiar with Java or the Elasticsearch codebase, but after some debugging in the refresh request, I did see that certain parts of the refresh code are put on a thread, and I'm not sure the main thread waits until they are finished:
TransportBroadcastOperationAction.performOperation is called (where the operation is put on a thread)
If you call refresh and wait for the response, the data should, 100%, be in there. That is how Elasticsearch's tests work.
You're right that the refresh request is put on a thread. It's fairly normal in Elasticsearch for requests to get pushed onto a thread, but it's also normal for the reply not to be sent until all the requests have finished. I haven't read the source for the refresh action, but I think if it didn't 100% ensure that the data was there it'd cause test failures on our side.
The upside of this is that I think you can rule refresh out from your test failures. Or, rather, you can rule out refresh not working. It's possible that you aren't waiting for refresh to finish, or for the index request to finish, before calling refresh. I know I've made those mistakes in the past.
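To make the ordering concrete, here is a small self-contained sketch of "wait for the bulk request, then wait for refresh, then search". It does not talk to Elasticsearch at all: `FakeIndex` is a made-up in-memory stand-in whose documents only become searchable after `refresh()`, and `index_then_search` is a hypothetical helper. The point is just where the waits go when requests run on other threads.

```python
import threading
import time
from concurrent.futures import ThreadPoolExecutor


class FakeIndex:
    """In-memory stand-in (not Elasticsearch): docs are searchable only after refresh()."""

    def __init__(self):
        self._lock = threading.Lock()
        self._pending = {}
        self._searchable = {}

    def bulk(self, docs):
        time.sleep(0.05)  # simulate request latency
        with self._lock:
            self._pending.update(docs)

    def refresh(self):
        with self._lock:
            self._searchable.update(self._pending)
            self._pending.clear()

    def search_ids(self, ids):
        with self._lock:
            return [i for i in ids if i in self._searchable]


def index_then_search(index, docs):
    with ThreadPoolExecutor(max_workers=2) as pool:
        bulk_future = pool.submit(index.bulk, docs)
        # Wait for the bulk request to finish BEFORE asking for a refresh;
        # a refresh issued earlier would not see these documents.
        bulk_future.result()
        refresh_future = pool.submit(index.refresh)
        # Wait for the refresh to finish BEFORE searching.
        refresh_future.result()
    return index.search_ids(list(docs))
```

If either `.result()` call is dropped, the search can race the earlier step and come back empty some of the time, which is the kind of periodic failure described above.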