If you want to read an entire index, the documentation says you should use search_after instead of scroll once you go beyond 10,000 documents.
We have now tested this and, unfortunately, noticed a slowdown by a factor of 10. We sort on the _id meta field. If we sort on a numeric technical ID instead, the speed is the same as with scrolling.
Unfortunately, our use case has no numeric technical ID field.
Why is sorting on a text field so much slower? Or are we still using something incorrectly?
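The thread contains no code, so here is a minimal Python sketch of the search_after loop being described (without a point in time). `search` is a hypothetical stand-in for whatever client call sends a body to `_search`, and `tech_id` is a hypothetical numeric field. The sort clause is a parameter, so the same loop can be run once with `[{"_id": "asc"}]` and once with a numeric field to reproduce the comparison.

```python
from typing import Any, Callable, Dict, Iterator, List, Optional

# Hypothetical stand-in for a client call: takes a _search request body and
# returns a response shaped like Elasticsearch's (hits.hits[*].sort, ...).
SearchFn = Callable[[Dict[str, Any]], Dict[str, Any]]


def iter_search_after(search: SearchFn, sort: List[Dict[str, str]],
                      page_size: int = 1000) -> Iterator[Dict[str, Any]]:
    """Yield every hit, paging with search_after on the given sort clause."""
    last_sort: Optional[List[Any]] = None
    while True:
        body: Dict[str, Any] = {
            "size": page_size,
            "query": {"match_all": {}},
            "sort": sort,  # e.g. [{"_id": "asc"}] or [{"tech_id": "asc"}]
        }
        if last_sort is not None:
            # Resume strictly after the last hit of the previous page.
            body["search_after"] = last_sort
        hits = search(body)["hits"]["hits"]
        if not hits:
            return
        yield from hits
        # Every hit carries its sort values; the last one is the cursor.
        last_sort = hits[-1]["sort"]


def make_fake_search(docs: List[Dict[str, Any]]) -> SearchFn:
    """In-memory stand-in for a cluster that always sorts on tech_id."""
    ordered = sorted(docs, key=lambda d: d["tech_id"])

    def search(body: Dict[str, Any]) -> Dict[str, Any]:
        after = body.get("search_after")
        rows = [d for d in ordered if after is None or d["tech_id"] > after[0]]
        page = rows[:body["size"]]
        return {"hits": {"hits": [{"_source": d, "sort": [d["tech_id"]]}
                                  for d in page]}}

    return search
```

With the in-memory stand-in, `iter_search_after(make_fake_search(docs), [{"tech_id": "asc"}], page_size=10)` walks all documents in pages of 10. Against a real cluster, only `search` changes; `page_size` is also the knob to experiment with when tuning.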
What does your query look like exactly? What is the mapping?
Do you mean this?
We no longer recommend using the scroll API for deep pagination. If you need to preserve the index state while paging through more than 10,000 hits, use the search_after parameter with a point in time (PIT).
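A minimal sketch of one page request for search_after inside a point in time, assuming the PIT was already opened (for example with `POST /<index>/_pit?keep_alive=1m`) and that result order does not matter for a full export. `pit_search_body` and `next_cursor` are hypothetical helpers; the body keys (`pit`, `keep_alive`, `search_after`) and the `_shard_doc` tiebreaker follow the Elasticsearch search API, assuming a version whose PIT searches support `_shard_doc`. Sorting only on `_shard_doc` sidesteps the `_id` sort that was reported above as slow; whether it matches scroll speed has not been measured here.

```python
from typing import Any, Dict, List, Optional, Tuple


def pit_search_body(pit_id: str, page_size: int = 1000,
                    search_after: Optional[List[Any]] = None,
                    keep_alive: str = "1m") -> Dict[str, Any]:
    """Build one page request for search_after paging inside a PIT.

    The body goes to POST /_search without an index name, because the
    PIT already determines which index snapshot is searched.
    """
    body: Dict[str, Any] = {
        "size": page_size,
        "query": {"match_all": {}},
        # Each request extends the PIT's lifetime by keep_alive.
        "pit": {"id": pit_id, "keep_alive": keep_alive},
        # _shard_doc is the tiebreaker PIT searches use; sorting on it alone
        # gives a stable order without needing a business key.
        "sort": [{"_shard_doc": "asc"}],
    }
    if search_after is not None:
        body["search_after"] = search_after
    return body


def next_cursor(response: Dict[str, Any],
                pit_id: str) -> Optional[Tuple[str, List[Any]]]:
    """Return (pit_id, search_after) for the next page, or None when done."""
    hits = response["hits"]["hits"]
    if not hits:
        return None
    # The PIT id can change between responses; always carry the newest one.
    return response.get("pit_id", pit_id), hits[-1]["sort"]
```

When the loop ends, the PIT should be closed (`DELETE /_pit` with the latest id) rather than left to expire.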
Deep pagination does not mean extracting the whole result set.
If your goal is to extract the whole result set, you should IMHO use the _scroll API.
This might not solve the problem here though.
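For the full-export case, a minimal scroll sketch. `send(method, path, body)` is a hypothetical transport standing in for any HTTP client; the endpoints (`/<index>/_search?scroll=1m`, `/_search/scroll`, `DELETE /_search/scroll`) and the `_doc` sort follow the Elasticsearch scroll documentation. `make_fake_send` is an in-memory stand-in so the loop can be exercised without a cluster.

```python
from typing import Any, Callable, Dict, Iterator, List, Tuple

# Hypothetical transport: send(method, path, body) -> parsed JSON response.
SendFn = Callable[[str, str, Dict[str, Any]], Dict[str, Any]]


def iter_scroll(send: SendFn, index: str, page_size: int = 1000,
                keep_alive: str = "1m") -> Iterator[Dict[str, Any]]:
    """Yield every hit of `index` using the scroll API."""
    # The first request opens the scroll context; _doc is the cheapest sort.
    resp = send("POST", f"/{index}/_search?scroll={keep_alive}",
                {"size": page_size, "query": {"match_all": {}},
                 "sort": ["_doc"]})
    scroll_id = resp.get("_scroll_id")
    try:
        while True:
            hits = resp["hits"]["hits"]
            if not hits:
                return
            yield from hits
            resp = send("POST", "/_search/scroll",
                        {"scroll": keep_alive, "scroll_id": scroll_id})
            scroll_id = resp.get("_scroll_id", scroll_id)
    finally:
        # Free the server-side context instead of waiting for it to expire.
        if scroll_id is not None:
            send("DELETE", "/_search/scroll", {"scroll_id": scroll_id})


def make_fake_send(docs: List[Dict[str, Any]]) -> Tuple[SendFn, Dict[str, Any]]:
    """In-memory stand-in for a cluster; records cleared scroll ids."""
    state: Dict[str, Any] = {"pos": 0, "size": 0, "cleared": []}

    def send(method: str, path: str, body: Dict[str, Any]) -> Dict[str, Any]:
        if method == "DELETE":
            state["cleared"].append(body["scroll_id"])
            return {"succeeded": True}
        if path != "/_search/scroll":
            # Opening request: reset the cursor and remember the page size.
            state["pos"], state["size"] = 0, body["size"]
        page = docs[state["pos"]:state["pos"] + state["size"]]
        state["pos"] += len(page)
        return {"_scroll_id": "scroll-1",
                "hits": {"hits": [{"_source": d} for d in page]}}

    return send, state
```

The `finally` clause clears the scroll context even if the consumer stops iterating early, once the generator is closed.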
That's a huge difference indeed.
I'd maybe try decreasing the size for search_after. But for your use case, I'd use the scroll API anyway, as your goal is to extract everything.
Pinging @jimczi as he might know what is happening.