I noticed that the following note has been added to the documentation for scroll pagination:
We no longer recommend using the scroll API for deep pagination. If you need to preserve the index state while paging through more than 10,000 hits, use the search_after parameter with a point in time (PIT).
Could someone help me understand this better? We intend to use scroll pagination for one of our use cases, where the results have to be fetched in a paginated fashion and consumed by another application.
Why is scroll not recommended for paging through more than 10,000 hits? What is the impact of using it anyway?
Thanks @dadoonet. From your message, I understand that PIT + search_after is more optimized than scroll.
But in our use case, we are leaning towards scroll, mostly because:
Identifying a unique sortable attribute may be a bit difficult. With the scroll API, we do not need any such sortable field.
The producing service (which fetches the data from ES in paginated batches) and the consuming service are not going to run all the time. These agents run only when required.
We can probably keep the number of hits below 10,000 most of the time with proper search criteria.
Would you still advise using PIT + search_after instead of scroll? And if we do go with scroll, what kind of impact can we expect?
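
For reference, here is a minimal sketch in Python of the PIT + search_after loop mentioned in the note, as I currently understand it. It only builds the request bodies and walks the pages. The names `build_page_request`, `iter_hits` and the `search` callable are hypothetical: `search` stands in for whatever client call sends the body to `POST /_search`, and the PIT id is assumed to have been opened beforehand with `POST /<index>/_pit?keep_alive=1m`. On the sortable-field concern, my understanding is that recent Elasticsearch versions allow sorting on the built-in `_shard_doc` tiebreaker when a PIT is used, so the sketch assumes that instead of a field of our own.

```python
from typing import Any, Callable, Dict, Iterator, List, Optional

# Hypothetical type: any function that sends a search body and returns the JSON response.
SearchFn = Callable[[Dict[str, Any]], Dict[str, Any]]


def build_page_request(
    pit_id: str,
    page_size: int,
    search_after: Optional[List[Any]] = None,
    keep_alive: str = "1m",
) -> Dict[str, Any]:
    """Build the body for one page of a PIT + search_after search."""
    body: Dict[str, Any] = {
        "size": page_size,
        "query": {"match_all": {}},  # placeholder query, replace with real criteria
        # No index name is needed in the request path: the PIT identifies it.
        "pit": {"id": pit_id, "keep_alive": keep_alive},
        # Assumption: _shard_doc is available as a sort key (PIT searches only).
        "sort": [{"_shard_doc": "asc"}],
    }
    if search_after is not None:
        body["search_after"] = search_after
    return body


def iter_hits(search: SearchFn, pit_id: str, page_size: int = 1000) -> Iterator[Dict[str, Any]]:
    """Yield every hit, one page at a time, until an empty page comes back."""
    search_after: Optional[List[Any]] = None
    while True:
        response = search(build_page_request(pit_id, page_size, search_after))
        hits = response["hits"]["hits"]
        if not hits:
            return
        yield from hits
        # The sort values of the last hit are the cursor for the next page.
        search_after = hits[-1]["sort"]
        # The response may carry an updated PIT id; use the most recent one.
        pit_id = response.get("pit_id", pit_id)
```

Since the agents run only when required, each run would open a PIT, drain `iter_hits`, and then close the PIT with `DELETE /_pit` so that no search context is left open between runs.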