Make sure a document is indexed before searching for it


(Tomer Praizler) #1

Hey,

I have a use case in which I index a document and immediately try to search for it.
The thing is, because Elasticsearch is eventually consistent, sometimes the document is indexed before the query gets executed, and sometimes it isn't.

Is there a way to make sure the document is indexed before I query it?


(David Pilato) #2

You can refresh the index. But don't do that for every doc. It's good when unit testing.


(Tomer Praizler) #3

My requirement in that specific flow is that every document that is indexed (it happens 100 times a day at most) can be queried immediately.
Refreshing the index doesn't seem like a good solution for me.

Can't I tell the index request to return only when the document is fully indexed and copied to all replicas?
Another thing I can do is block and query every, let's say, 200 milliseconds, and continue only when the document exists. What do you think?
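
The polling idea could be sketched roughly as follows. This is only a minimal illustration using the Scala standard library: `PollUntil`, `waitUntilVisible`, and the `exists` callback are hypothetical names, and `exists` stands in for whatever search or get call checks that the document is visible. The 200 millisecond interval comes from the post; the 5 second timeout is an assumption.

```scala
import scala.annotation.tailrec
import scala.concurrent.duration._

object PollUntil {
  /** Calls `exists` every `interval` until it returns true or `timeout` elapses.
    * Returns true if the check succeeded in time, false otherwise.
    * `exists` is a hypothetical placeholder for the real "is the document searchable?" query. */
  def waitUntilVisible(exists: () => Boolean,
                       interval: FiniteDuration = 200.millis,
                       timeout: FiniteDuration = 5.seconds): Boolean = {
    val deadline = timeout.fromNow

    @tailrec
    def loop(): Boolean =
      if (exists()) true                    // document is visible, stop polling
      else if (deadline.isOverdue()) false  // give up once the timeout has passed
      else {
        Thread.sleep(interval.toMillis)     // block the caller between attempts
        loop()
      }

    loop()
  }
}
```

A caller would pass a function that runs the real query and reports whether it found the document. The timeout keeps the caller from blocking forever if the document never shows up.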


(David Pilato) #4

If you do that only 100 times per day, refresh is the way to go IMO.


(Nik Everett) #5

I've been working on and off on a "block until refresh" feature that would block the index call until the next refresh, but I don't know when it'll be ready. It's not my top priority, but it's something I do want. For now, especially given it's only 100 times a day, just add the refresh option to the index request. If you want to poll Elasticsearch for your changes using a non-realtime get, that's an option, but I think it is overly complex just to avoid a performance hit 100 times a day.


(Tomer Praizler) #6

Thanks! That was really helpful!

So I wrote a test that indexes 10 documents and then immediately queries them, but I still don't get all of them. Does that make sense?

Here is the code (Using elastic4s):

  def insertDocs(myId: Int, amount: Int): Unit = {
    val futures: List[Future[Employee]] = List()
    for (i <- 1 to amount) {
      futures :+ client.execute {
        index into indexName / documentType source MyDoc(myId) refresh true
      }
    }
    Await.result(Future.sequence(futures), 1 seconds)
  }

Then I query them:

client.execute(search in indexName / documentType term("myId", myId))

Am I missing something here?


(Nik Everett) #7

I'm not sure! I'm not used to Scala any more, so I can't check the syntax and I might be getting lost, but it should work. If you can show me it not working, reproduced with a minimal sequence of curl commands, then I'd be able to tell you what is up. I hope you can't though, because that'd mean the tests that I run every day are missing something huge!

One thing that recently tripped me up is that when you put ?refresh=true on an index request, Elasticsearch refreshes only the shard that indexed that document. It's not the same as using the _refresh API on the whole index. It makes sense and seems like a really obvious optimization now that I've seen it, but I just didn't realize it worked like that until about a week ago.
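
On the Scala side, one possible explanation, assuming the snippet is exactly as posted: `List` is immutable, so `futures :+ ...` returns a new list and leaves `futures` empty. `Future.sequence` on an empty list completes right away, so the search may run before the index requests have finished. The sketch below shows the difference using only the standard library. `CollectFutures` and `fakeIndex` are hypothetical; `fakeIndex` stands in for the `client.execute { ... }` call and is not part of elastic4s.

```scala
import scala.concurrent.{Await, Future}
import scala.concurrent.ExecutionContext.Implicits.global
import scala.concurrent.duration._

object CollectFutures {
  // Hypothetical stand-in for client.execute { index into ... refresh true }
  def fakeIndex(i: Int): Future[Int] = Future { Thread.sleep(20); i }

  // Mirrors the posted loop: `:+` builds a new list that is thrown away,
  // so `futures` is still empty when it is awaited.
  def discarded(amount: Int): List[Int] = {
    val futures: List[Future[Int]] = List()
    for (i <- 1 to amount) {
      futures :+ fakeIndex(i)
    }
    Await.result(Future.sequence(futures), 1.second)
  }

  // Builds the list from the range, so awaiting it waits for every request.
  def collected(amount: Int): List[Int] = {
    val futures: List[Future[Int]] = (1 to amount).toList.map(fakeIndex)
    Await.result(Future.sequence(futures), 1.second)
  }
}
```

If that is what is happening, building the list with `map` (or reassigning a `var`) before calling `Future.sequence` should make the test wait for all ten index requests before it searches.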

