I am trying to query my index, which has more than 500k records, but I am only able to extract 10k records at a time. I understand this has to do with application performance, but how can I do a bulk extract?
With more than 50k records, I don't think the "size" and "search_after" options are the correct way to go.
The ES version we are currently using is 7.10.2.
I tried to use the PIT ID concept for this, but I am not sure whether this version of ES supports PIT IDs, because when I tried the PIT concept in Dev Tools, it didn't work.
Please advise how I can do a bulk get, or any pagination approach, for more than 500k records.
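For context, the 10k ceiling comes from the `index.max_result_window` setting (default 10,000): a plain from/size request past that point is rejected. A sketch against a hypothetical index named `my-index`:

```
GET /my-index/_search
{
  "from": 10000,
  "size": 100,
  "query": { "match_all": {} }
}
```

This returns a "Result window is too large" error; raising `index.max_result_window` is possible but not advisable for 500k records, which is why a pagination mechanism is needed instead.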
@leandrojmp thanks for the reply, but as mentioned in the documentation, scroll is not recommended for deep pagination beyond 10k records. In my case, I have more than 500k records which I need to extract.
I am not using any client; rather, I am making direct HTTP REST calls to extract data from ES, but as mentioned earlier, I am not able to get more than 10k at a time.
Below is my sample request.
The documentation points out that you should use search_after together with a PIT. As you are using the OSS version, where this is not available, the scroll API is still the recommended option for deep pagination.
My recommendation, however, would be to switch to the default distribution and upgrade to the latest 7.17 release.
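For reference, the scroll flow over plain HTTP looks roughly like this (`my-index` is a placeholder; the `_scroll_id` value comes back in each response):

```
POST /my-index/_search?scroll=1m
{
  "size": 10000,
  "query": { "match_all": {} }
}

POST /_search/scroll
{
  "scroll": "1m",
  "scroll_id": "<_scroll_id from the previous response>"
}
```

Repeat the second call, passing the `_scroll_id` from each response, until `hits.hits` comes back empty; you can then `DELETE /_search/scroll` with the ID to free the search context early.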