Our project needs to extract all the data from an Elasticsearch index and load it into a relational database. The index is large, around 100 million records, and other processes write to it continuously.
Because of the size of the index and the concurrent writes, we are using search_after with a batch size of 100,00 together with a point in time (PIT).
The index has an ID field, and we sort on it in every search_after request. The PIT keep-alive is 10 minutes, and we use the index name in the query.
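To make the request shape concrete, here is a rough sketch of the body of one page request, built as a plain Java map. This is not our production code: the class name PitSearchBody, the ascending sort order, and the example PIT id are assumptions for illustration. The field name ID and the 10-minute keep-alive come from the description above.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class PitSearchBody {
    // Builds the body of one search_after page as a plain map.
    // lastSort is null for the first page, otherwise the sort values of the last hit.
    static Map<String, Object> build(String pitId, int size, List<Object> lastSort) {
        Map<String, Object> pit = new LinkedHashMap<>();
        pit.put("id", pitId);
        pit.put("keep_alive", "10m"); // keep-alive from the description above

        Map<String, Object> sortById = new LinkedHashMap<>();
        sortById.put("ID", "asc"); // assumption: ascending order on the ID field

        List<Object> sort = new ArrayList<>();
        sort.add(sortById);

        Map<String, Object> body = new LinkedHashMap<>();
        body.put("size", size);
        body.put("pit", pit);
        body.put("sort", sort);
        if (lastSort != null) {
            body.put("search_after", lastSort); // omitted on the first page
        }
        return body;
    }

    public static void main(String[] args) {
        // "example-pit-id" and the sort value 12345 are placeholders.
        System.out.println(build("example-pit-id", 1000, null));
        System.out.println(build("example-pit-id", 1000, Arrays.<Object>asList(12345L)));
    }
}
```

The first call has no search_after key; every later call carries the sort values taken from the last hit of the previous response.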
We send this request to the Elasticsearch cluster in an infinite loop. After each response is returned, we update search_after with the sort values from the response.
On each response we also check whether the number of hits is zero; if it is, we break out of the loop.
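The loop described above can be sketched roughly as follows. This is a simplified stand-in, not our actual code: it uses no Elasticsearch client at all, fetchPage and inMemoryIndex are hypothetical names, and the "index" is just the numbers 1..total held in memory. It only shows the control flow: fetch a page, stop on zero hits, otherwise carry the last sort value forward.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.BiFunction;

class SearchAfterLoopSketch {
    // fetchPage(searchAfter, size) is a hypothetical stand-in for one search request.
    // It returns the sort values of the hits, in ascending order, that come after searchAfter.
    static long drain(BiFunction<Long, Integer, List<Long>> fetchPage, int batchSize) {
        long processed = 0;
        Long searchAfter = null; // no search_after on the first request
        while (true) {
            List<Long> hits = fetchPage.apply(searchAfter, batchSize);
            if (hits.isEmpty()) {
                break; // zero hits: leave the loop
            }
            processed += hits.size();
            searchAfter = hits.get(hits.size() - 1); // sort value of the last hit
        }
        return processed;
    }

    // In-memory "index" holding the ids 1..total, used only for this illustration.
    static BiFunction<Long, Integer, List<Long>> inMemoryIndex(long total) {
        return (after, size) -> {
            long start = (after == null) ? 1 : after + 1;
            List<Long> page = new ArrayList<>();
            for (long id = start; id <= total && page.size() < size; id++) {
                page.add(id);
            }
            return page;
        };
    }

    public static void main(String[] args) {
        System.out.println(drain(inMemoryIndex(25), 10)); // prints 25
    }
}
```

With a stable data source like this one, the processed count always equals the total, which is the behaviour we expected from the real index.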
The application that does this is written in Java and uses the Elastic client, version 7.17.X.
The logic looks correct to us, and there are no errors in the logs during execution.
However, after completion, the number of processed records is much lower than the number of records in the index. For example, we ran the process 10 times; each time the number of processed records was different and lower than the number of records in the index: sometimes 30%, sometimes 70%, and sometimes as little as 10%.
I am puzzled and cannot find the issue. The same code works fine in lower environments, where the number of processed records equals the number of records in the index, but in higher environments we hit this issue every time.