Elasticsearch Pagination: Scroll API

Hi,

I noticed that the following note has been added to the documentation for scroll pagination:

We no longer recommend using the scroll API for deep pagination. If you need to preserve the index state while paging through more than 10,000 hits, use the search_after parameter with a point in time (PIT).

Could someone help me understand this better? We intend to use scroll pagination for one of our use cases, where the results have to be fetched in a paginated fashion and consumed by another application.

  1. Why is this not recommended for paging through more than 10,000 hits? What is the impact?

That's exactly the goal of pit + search_after.
This is much better than the scroll API, as there are some optimizations behind the scenes.
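
For readers who want to see the shape of pit + search_after paging, here is a minimal sketch in Python. It only builds the request bodies and walks the pages. The `search` callable is a hypothetical stand-in for whatever sends the body to `POST /_search` and returns the parsed JSON response, and `pit_id` is assumed to come from an earlier `POST /<index>/_pit?keep_alive=1m` call. Sorting on `_shard_doc` is used here as the tiebreaker that PIT searches provide, so no unique field from your own mapping is needed.

```python
from typing import Any, Callable, Dict, Iterator, List, Optional

# Hypothetical transport: takes a search body, returns the parsed JSON response.
SearchFn = Callable[[Dict[str, Any]], Dict[str, Any]]


def iter_hits_with_pit(
    search: SearchFn,
    pit_id: str,
    page_size: int = 1000,
    keep_alive: str = "1m",
) -> Iterator[Dict[str, Any]]:
    """Yield every hit, paging with search_after inside one point in time."""
    search_after: Optional[List[Any]] = None
    while True:
        body: Dict[str, Any] = {
            "size": page_size,
            # No index in the request path: the PIT already pins the indices.
            "pit": {"id": pit_id, "keep_alive": keep_alive},
            "sort": [{"_shard_doc": "asc"}],
        }
        if search_after is not None:
            body["search_after"] = search_after

        response = search(body)
        hits = response["hits"]["hits"]
        if not hits:
            return  # an empty page means everything has been read

        yield from hits

        # The sort values of the last hit are the cursor for the next page.
        search_after = hits[-1]["sort"]
        # The PIT id may change between requests, so use the most recent one.
        pit_id = response.get("pit_id", pit_id)
```

Each request extends the PIT by `keep_alive`, so the value only needs to cover the time between two pages, not the whole extraction. Once the loop finishes, the PIT should be closed with `DELETE /_pit`.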

Why is this not recommended for paging through more than 10,000 hits?

Actually, I think I read the sentence differently than you did. We probably meant:

If you need to run data extraction for more than 10,000 hits, don't use from + size but search_after + pit.

Where previously it was:

If you need to run data extraction for more than 10,000 hits, don't use from + size but scroll.

My 2 cents.

Thanks @dadoonet. From your message, I understand that pit + search_after is more optimized than scroll.
But in our use case, we are leaning towards scroll, mostly because:

  1. Identifying a unique sortable attribute may be a bit difficult. With the scroll API, we do not need any such sortable field.
  2. The producing service (which fetches the data from ES in paginated batches) and the consuming service are not going to run all the time. These agents run only when required.
  3. We can probably keep the hits below 10,000 most of the time with proper search criteria.

Would you still advise using pit + search_after instead of scroll? If we do use scroll, what kind of impact can we expect?

Yeah. Sort by _doc. That's the most efficient way.
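
If you do stay with scroll, sorting by `_doc` looks roughly like the sketch below. This is an illustration under assumptions, not a reference implementation: `start_scroll` and `next_page` are hypothetical callables standing in for `POST /<index>/_search?scroll=<keep_alive>` and `POST /_search/scroll`, and the `match_all` query is only a placeholder for your real search criteria.

```python
from typing import Any, Callable, Dict, Iterator

# Hypothetical transports, both returning the parsed JSON response:
#   start_scroll(body, keep_alive) -> POST /<index>/_search?scroll=<keep_alive>
#   next_page(body)                -> POST /_search/scroll
StartFn = Callable[[Dict[str, Any], str], Dict[str, Any]]
NextFn = Callable[[Dict[str, Any]], Dict[str, Any]]


def iter_hits_with_scroll(
    start_scroll: StartFn,
    next_page: NextFn,
    page_size: int = 1000,
    keep_alive: str = "1m",
) -> Iterator[Dict[str, Any]]:
    """Yield every hit of a scroll search that is sorted by _doc."""
    body = {
        "size": page_size,
        "sort": ["_doc"],  # index order, no sortable field of your own needed
        "query": {"match_all": {}},  # placeholder for the real criteria
    }
    response = start_scroll(body, keep_alive)
    while response["hits"]["hits"]:
        yield from response["hits"]["hits"]
        # Always pass the most recent scroll id; it can change between pages.
        response = next_page(
            {"scroll": keep_alive, "scroll_id": response["_scroll_id"]}
        )
```

A scroll context holds resources on the cluster until it expires, so it is worth clearing it with `DELETE /_search/scroll` as soon as the last page has been read.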

That's even better. You can fetch all the hits in a single query, which means that you don't need to hold the pit.
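
A small sketch of the single-query case, assuming the index keeps the default `index.max_result_window` of 10,000. The helper name `single_request_body` is made up for illustration; it only builds the body you would send to `POST /<index>/_search`.

```python
from typing import Any, Dict

# Default value of the index.max_result_window setting.
DEFAULT_MAX_RESULT_WINDOW = 10_000


def single_request_body(
    query: Dict[str, Any], size: int = DEFAULT_MAX_RESULT_WINDOW
) -> Dict[str, Any]:
    """Build a search body that returns up to `size` hits in one response."""
    if size > DEFAULT_MAX_RESULT_WINDOW:
        # Beyond this window a plain from + size request is rejected, so
        # fall back to search_after + pit (or scroll) instead.
        raise ValueError(
            f"size {size} exceeds the default max_result_window "
            f"of {DEFAULT_MAX_RESULT_WINDOW}"
        )
    return {"from": 0, "size": size, "query": query}
```

If the criteria match more than `size` documents, the response is simply cut off at `size` hits, so it is worth comparing `hits.total` with the number of hits returned before relying on this.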

Thanks David. This helps.

Could you please also advise me on the following:

  1. What is the maximum keep-alive duration for
    a. the scroll API
    b. a PIT used with search_after

Thanks,
Sundar.