Inconsistent behaviour of search_after when used along with Point in time Id for large data sets

Pravin_Mourya · August 31, 2023, 3:59pm

Hello,

We have a requirement in our project to extract all the data from an Elasticsearch index and dump it into a relational DB. The volume of data in the index is quite high, around 100 million records. There are also processes that continuously write to this index.

Given the nature of the index, we are using search_after with a batch size of 100,00 along with a point in time (PIT).

We have an ID column in the index and we sort on it when making the search_after request. The keep-alive time is 10 minutes, and we are using the index name in the query.
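To make the request shape concrete, here is a minimal sketch of the search body I would expect for this setup, not our actual code. The class name, method name, and the PIT id value are hypothetical, the sort field is assumed to be numeric, and the JSON is built by hand only to keep the example dependency-free. As far as I understand the Elasticsearch documentation, a request that carries a `pit` id is sent to `/_search` without an index name in the path, because the PIT already pins the target index.

```java
import java.util.List;
import java.util.StringJoiner;

// Hypothetical helper: builds the JSON body for one PIT + search_after page.
final class PitSearchBodySketch {

    // Keep-alive taken from the post (10 minutes).
    static final String KEEP_ALIVE = "10m";

    /**
     * @param pitId          id returned when the point in time was opened
     * @param batchSize      number of hits requested per page
     * @param sortField      field to sort on (assumed numeric here)
     * @param lastSortValues sort values of the last hit of the previous page,
     *                       copied as returned; null for the first page
     */
    static String searchBody(String pitId, int batchSize, String sortField,
                             List<Long> lastSortValues) {
        StringBuilder body = new StringBuilder();
        body.append("{\"size\":").append(batchSize)
            .append(",\"pit\":{\"id\":\"").append(pitId)
            .append("\",\"keep_alive\":\"").append(KEEP_ALIVE).append("\"}")
            .append(",\"sort\":[{\"").append(sortField).append("\":\"asc\"}]");
        if (lastSortValues != null) {
            // Pass back every sort value of the last hit, in order.
            StringJoiner values = new StringJoiner(",", "[", "]");
            for (Long value : lastSortValues) {
                values.add(String.valueOf(value));
            }
            body.append(",\"search_after\":").append(values);
        }
        return body.append("}").toString();
    }

    public static void main(String[] args) {
        // First page: no search_after yet.
        System.out.println(searchBody("hypothetical-pit-id", 1000, "ID", null));
        // Later page: cursor copied from the previous page's last hit.
        System.out.println(searchBody("hypothetical-pit-id", 1000, "ID", List.of(1000L, 42L)));
    }
}
```

The batch size of 1000 in `main` is only a placeholder for the example.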

We send this request to the Elasticsearch cluster in an infinite loop. After each response is returned, we update search_after with the sort values from the last hit in the response.

When the response is returned, we also check whether the number of hits is zero; if it is, we break out of the loop.
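The loop described above can be sketched roughly as follows. This is a simplified, self-contained model rather than our production code: `PageFetcher` and `inMemoryIndex` are hypothetical stand-ins for the real search call, and the sort key is assumed to be a single numeric ID that is unique per record.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.BiFunction;

final class SearchAfterLoopSketch {

    /** Hypothetical stand-in for one search call: (cursor or null, batch size) -> sorted IDs. */
    interface PageFetcher extends BiFunction<Long, Integer, List<Long>> {}

    /** Pages through all results and returns how many records were processed. */
    static long drain(PageFetcher fetcher, int batchSize) {
        Long searchAfter = null;          // no cursor on the first request
        long processed = 0;
        while (true) {
            List<Long> hits = fetcher.apply(searchAfter, batchSize);
            if (hits.isEmpty()) {
                break;                    // zero hits: nothing left to read
            }
            processed += hits.size();     // the real job would write the batch to the DB here
            searchAfter = hits.get(hits.size() - 1); // cursor = sort value of the last hit
        }
        return processed;
    }

    /** In-memory fake of a sorted index, used only to exercise the loop. */
    static PageFetcher inMemoryIndex(List<Long> sortedIds) {
        return (after, size) -> {
            List<Long> page = new ArrayList<>();
            for (Long id : sortedIds) {
                if (page.size() == size) {
                    break;
                }
                if (after == null || id > after) {
                    page.add(id);         // only IDs strictly greater than the cursor
                }
            }
            return page;
        };
    }

    public static void main(String[] args) {
        List<Long> ids = new ArrayList<>();
        for (long i = 1; i <= 25; i++) {
            ids.add(i);
        }
        System.out.println(drain(inMemoryIndex(ids), 10)); // prints 25
    }
}
```

Note that the sketch depends on the uniqueness assumption: if two records share a sort value across a page boundary, a single-value cursor like this one skips the second record, so `drain(inMemoryIndex(List.of(1L, 2L, 2L, 3L)), 2)` returns 3 rather than 4.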

The application doing this is written in Java, and we are using the Elastic client version 7.17.X.

The logic seems to be okay, and there are no errors in the logs during execution.

However, after completion, the number of records processed is much lower than the count of records in the index. For example, we ran the process 10 times, and each time the number of processed records was different and lower than the number of records in the index: sometimes it is 30%, sometimes 70%, and sometimes even 10%.

So I am puzzled and cannot find the issue. This code also works fine in lower environments, where the number of processed records equals the number of records in the index. In higher environments, however, we face this issue every time.

Kindly advise.

system · September 28, 2023, 4:00pm

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.
