Search response accuracy

Warning: this is probably not a good use case for Elasticsearch, but...

I have an index that will contain millions of documents. We have built a real-time API that allows customers to search and return fewer than 1,000 documents, and a second API that calls a batch search API: customers submit a search asynchronously, an app searches the index, and when the search completes it streams the results back to the consumer. Our index and apps are deployed in AWS, and the consumers are on-prem. Consumers of the batch search may expect up to 2 million documents to be returned.

I've tested the batch search API and it is accurate at ~100K documents. However, I have also searched using criteria that should have returned 1.6 million documents, and the response was missing a few hundred of them.

  1. How accurate should I expect the Elasticsearch response to be?
  2. Is there any method I can use to increase the accuracy of the response?

Thanks in advance

Hi, David. Welcome to the forums!

A few questions:

1.) Which version of Elasticsearch are you using?
2.) How are you retrieving documents? Are you using the Elasticsearch scroll API?

In Elasticsearch, the scroll API is the intended method for retrieving all documents matching a particular search. From the documentation:

While a search request returns a single “page” of results, the scroll API can be used to retrieve large numbers of results (or even all results) from a single search request, in much the same way as you would use a cursor on a traditional database.
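
In case it helps, here is a rough sketch of what a scroll loop can look like against the REST endpoints directly. I don't know which client or version you are on, so treat it as an illustration rather than a drop-in fix. The names `make_sender` and `scroll_all`, the `send` callable, the index name, and the localhost URL are all made up for the example. Only the endpoints (`_search?scroll=...`, `_search/scroll`) and the `_doc` sort come from the scroll documentation.

```python
import json
import urllib.request


def make_sender(base_url):
    """Hypothetical helper: returns a callable that sends JSON to one node."""
    def send(method, path, body):
        req = urllib.request.Request(
            base_url + path,
            data=json.dumps(body).encode("utf-8"),
            headers={"Content-Type": "application/json"},
            method=method,
        )
        with urllib.request.urlopen(req) as resp:
            return json.loads(resp.read().decode("utf-8"))
    return send


def scroll_all(send, index, query, page_size=1000, keep_alive="1m"):
    """Yield every hit matching `query`, one scroll page at a time."""
    # The first request opens the scroll context; sorting by _doc is the
    # order the scroll docs recommend when ranking does not matter.
    page = send("POST", f"/{index}/_search?scroll={keep_alive}",
                {"size": page_size, "query": query, "sort": ["_doc"]})
    scroll_id = page.get("_scroll_id")
    try:
        # An empty page of hits means the scroll is exhausted.
        while page["hits"]["hits"]:
            yield from page["hits"]["hits"]
            # Always pass the most recent scroll id to the next request.
            page = send("POST", "/_search/scroll",
                        {"scroll": keep_alive, "scroll_id": scroll_id})
            scroll_id = page.get("_scroll_id", scroll_id)
    finally:
        # Release the scroll context instead of waiting for it to expire.
        if scroll_id:
            send("DELETE", "/_search/scroll", {"scroll_id": scroll_id})


# Example usage (assumes a node at this address and an index named "docs"):
# send = make_sender("http://localhost:9200")
# count = sum(1 for _ in scroll_all(send, "docs", {"match_all": {}}))
```

Counting the hits you actually receive and comparing that number with the total reported in the first response is one way to check whether documents are going missing on the Elasticsearch side or somewhere later in your streaming pipeline.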