Total number of hits returned is wrong

Hi, I am running into the following scenario and am hoping that someone can shed some light on it.

I have an integration test designed as follows:

  1. Insert 8 documents into an Elasticsearch index.
  • Each document has a unique string (a UUID) in its title field.
  • Because this process is async, after each insertion we wait until we can find the document we just inserted in Elasticsearch.
  2. Search for the unique string present in the title field of each document, limiting the results to 3 with an offset of 0.

Problem:

Most of the time this returns a set of 3 documents and a total hit count of 8.
But on occasion it reports that the number of hits matching the query is 7, which is incorrect.

If you then wait a short while, the total eventually becomes 8 again. How can this happen?
Especially considering that I validate that each of the inserted documents is retrievable.

My setup is 3 Elasticsearch nodes, 5 shards, 1 replica. Elasticsearch 1.5.1.

Locally, I cannot reproduce this problem in any way, shape, or form.

It is possible for a record to be retrievable from a primary shard but not yet available from its replica. So if your presence check happens to hit the primary but the final search is served by the replica, you might not get all the records back. Moreover, if you rely on automatic refresh, it can occur at different times on the primary and the replica, so it is normal for the two to be slightly out of sync for a short period of time. You may need to redesign your test to use an explicit refresh, or at least to use the same custom preference throughout the entire test, so that all your searches go to the same subset of shards and see a consistent view of the world.
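
For illustration, here is a rough sketch of how both suggestions might look against the REST API, using only the Python standard library. The base URL, the index name, the preference string and the `match` query on `title` are all assumptions made for the example, not details taken from your test; substitute whatever your test actually uses. It calls `POST /{index}/_refresh` before searching and passes the same `preference` value on every search.

```python
import json
import urllib.request
from urllib.parse import quote, urlencode

# Assumed values for illustration only; adjust to your cluster and test.
BASE_URL = "http://localhost:9200"
INDEX = "test-index"
PREFERENCE = "it-run-1"  # any custom string that does not start with "_"


def refresh_url(base, index):
    """URL of the refresh API for one index."""
    return f"{base}/{quote(index)}/_refresh"


def search_url(base, index, preference):
    """Search URL that routes every request with the same custom preference."""
    return f"{base}/{quote(index)}/_search?" + urlencode({"preference": preference})


def search_body(title, size=3, offset=0):
    """Search body; the match query on 'title' is a placeholder for your real query."""
    return {"query": {"match": {"title": title}}, "from": offset, "size": size}


def post_json(url, body=None):
    """POST an optional JSON body and decode the JSON response."""
    data = json.dumps(body).encode("utf-8") if body is not None else None
    request = urllib.request.Request(
        url, data=data, method="POST",
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)


def total_hits(title, base=BASE_URL, index=INDEX, preference=PREFERENCE):
    """Refresh explicitly, then search with a fixed preference and return hits.total."""
    post_json(refresh_url(base, index))
    result = post_json(search_url(base, index, preference), search_body(title))
    return result["hits"]["total"]
```

If the presence check after each insertion is itself a search, sending it through `search_url` with the same preference string should keep it on the same shard copies as the final search. Alternatively, if the explicit refresh is in place, the per-insert waiting may no longer be needed at all.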