Fetch data from the middle of a large result set using searchAfter (jump to a specific page)

Hi all,
I have a large data set of around 25 million records.
I am using searchAfter with PointInTime to walk through the data.
My question: is there a way to skip past the 10,000-record limit set by

index.max_result_window

and start picking up records from, for example, 100,000 up to 105,000?

Right now I send multiple requests to Elasticsearch until I reach the desired point, but this is inefficient and takes a lot of time.

Here is how I did it:
I calculated how many pages are needed for the pagination.
The user then sends a request with a page number, e.g. 3. Only when I reach the desired page do I set source to true.
This is the best I have managed to do to improve performance and reduce the response size for the pages that are not required.
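To make the page arithmetic concrete, here is a minimal sketch. `PagingMath`, `TotalPages` and `PageIndexOf` are hypothetical names for illustration (the original `Pagination.GetTotalPages` is not shown in the post, so this is only an assumption about what it computes), and the numbers come from the question: 25 million records, 10,000 per request, target offset 100,000.

```csharp
using System;

public static class PagingMath
{
    // Ceiling division: number of pages needed to cover totalCount records.
    public static int TotalPages(long totalCount, int pageSize)
    {
        if (pageSize <= 0) throw new ArgumentOutOfRangeException(nameof(pageSize));
        return (int)((totalCount + pageSize - 1) / pageSize);
    }

    // Zero-based index of the page that contains the record at a zero-based offset.
    // This is also the number of source-less requests needed before the real fetch.
    public static int PageIndexOf(long offset, int pageSize)
    {
        if (pageSize <= 0) throw new ArgumentOutOfRangeException(nameof(pageSize));
        return (int)(offset / pageSize);
    }
}

public static class Program
{
    public static void Main()
    {
        Console.WriteLine(PagingMath.TotalPages(25_000_000, 10_000)); // prints 2500
        Console.WriteLine(PagingMath.PageIndexOf(100_000, 10_000));   // prints 10
    }
}
```

So with a request size of 10,000, reaching record 100,000 means 10 requests that only return sort values, followed by one request that returns documents.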

int numberOfPages = Pagination.GetTotalPages(totalCount, _size);

var pitResponse = await _esClient.OpenPointInTimeAsync(content._index, p => p.KeepAlive("2m"));

if (pitResponse.IsValid)
{
    // Sort values of the last hit on the previous page; null on the first request.
    IEnumerable<object> lastHit = null;

    for (int round = 0; round < numberOfPages; round++)
    {
        // Load _source only for the requested page (requiredPage is compared zero-based here).
        bool fetchSource = round == requiredPage;
        var response = await _esClient.SearchAsync<ProductionDataItem>(s => s
            .Index(content._index)
            .Size(10000)
            .Source(fetchSource)
            .Query(query)
            .PointInTime(pitResponse.Id)
            .Sort(srt =>
            {
                if (content.Sort == 1) { srt.Ascending(sortBy); }
                else { srt.Descending(sortBy); }
                return srt;
            })
            .SearchAfter(lastHit)
        );

        if (fetchSource)
        {
            itemsList.AddRange(response.Documents);
            break;
        }

        // Stop if the results ran out before the requested page was reached.
        if (response.Hits.Count == 0) { break; }
        lastHit = response.Hits.Last().Sorts;
    }

    // Close the PIT only if it was opened successfully.
    await _esClient.ClosePointInTimeAsync(p => p.Id(pitResponse.Id));
}

Welcome to our community! :smiley:

I'm not super familiar with PIT, but can you show us the request that you are making?

Thanks for the reply. I have updated the post.


Any feedback?

I think the best way to do it is the way I did it:

keep paging forward with searchAfter and a point in time, and only load the documents once the desired page is reached, by passing false to .Source(bool) for every page before it.
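The control flow of this approach can be checked without a cluster. The sketch below is an in-memory stand-in, not NEST code: `CursorPaging`, `SearchAfter` and `FetchPage` are hypothetical names, and a sorted list of unique integer keys plays the role of the index and its sort values. It assumes the sort key is unique, which is what a tiebreaker gives you on a real index.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class CursorPaging
{
    // Stand-in for one search_after request: returns up to `size` keys
    // strictly greater than `after` (null means "start from the beginning").
    public static List<int> SearchAfter(IReadOnlyList<int> sortedKeys, int? after, int size)
    {
        return sortedKeys.Where(k => after == null || k > after.Value).Take(size).ToList();
    }

    // Walks forward page by page and returns only the requested zero-based page.
    // Returns an empty list if the data ends before that page.
    public static List<int> FetchPage(IReadOnlyList<int> sortedKeys, int pageIndex, int size)
    {
        int? cursor = null;
        for (int round = 0; ; round++)
        {
            List<int> hits = SearchAfter(sortedKeys, cursor, size);
            if (round == pageIndex || hits.Count == 0) return hits;
            cursor = hits[hits.Count - 1]; // last sort value becomes the next cursor
        }
    }
}

public static class Program
{
    public static void Main()
    {
        var keys = Enumerable.Range(0, 35).ToList();
        var page = CursorPaging.FetchPage(keys, 2, 10);
        Console.WriteLine(string.Join(",", page)); // prints 20,21,22,23,24,25,26,27,28,29
    }
}
```

The skipped rounds here only carry the cursor forward, which mirrors what disabling source does for the intermediate requests: the walk is still sequential, but each intermediate response stays small.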
