How to pull a large amount of data (all documents in an index) using the Elasticsearch Python client?

safarial.fatemeh · April 20, 2022, 2:57am

I'm using elasticsearch.helpers.scan with a match_all query to pull down over 1M documents from Elasticsearch.
The process is very slow (it takes over 2 hours).
Is there a better way to pull down all the documents from Elasticsearch?

casterQ · April 20, 2022, 3:05am

PIT or Scroll

safarial.fatemeh · April 20, 2022, 4:01am

Thanks @casterQ.
Can you elaborate? I'm using scan, which I think is a wrapper around Scroll, isn't it?
Could you also provide some examples of PIT and Scroll?

casterQ · April 20, 2022, 6:22am

Snapshot (the data can only be used in Elasticsearch)
CCR (requires a Platinum license)
PIT or scroll (these are the ways to pull data out of ES; on later versions PIT is the recommended one)

Here is the PIT doc:
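
For illustration, a PIT-based export with the Python client might look roughly like the following. This is only a minimal sketch: it assumes an 8.x elasticsearch-py client (the 7.x client passes some of these arguments differently) and a cluster recent enough to support the `_shard_doc` sort. The function name `iter_all_docs`, the page size, and the keep-alive value are made up for the example.

```python
from typing import Any, Dict, Iterator, List, Optional


def iter_all_docs(es: Any, index: str, page_size: int = 1000,
                  keep_alive: str = "2m") -> Iterator[Dict[str, Any]]:
    """Yield every hit in `index` using a point in time plus search_after.

    `es` is assumed to be an elasticsearch.Elasticsearch (8.x) instance.
    """
    # Open a point in time so every page sees the same view of the index.
    pit_id = es.open_point_in_time(index=index, keep_alive=keep_alive)["id"]
    search_after: Optional[List[Any]] = None
    try:
        while True:
            params: Dict[str, Any] = {
                "size": page_size,
                "query": {"match_all": {}},
                "pit": {"id": pit_id, "keep_alive": keep_alive},
                # _shard_doc is a cheap tiebreaker sort available with PIT.
                "sort": [{"_shard_doc": "asc"}],
                # Skipping the total hit count saves work on every page.
                "track_total_hits": False,
            }
            if search_after is not None:
                params["search_after"] = search_after
            resp = es.search(**params)
            # The PIT id can change between requests; always use the latest.
            pit_id = resp["pit_id"]
            hits = resp["hits"]["hits"]
            if not hits:
                break
            yield from hits
            # Resume the next page after the last hit's sort values.
            search_after = hits[-1]["sort"]
    finally:
        # Release the point in time even if the caller stops early.
        es.close_point_in_time(id=pit_id)
```

With a real client you would create `es = Elasticsearch("http://localhost:9200")` (the URL is an assumption) and loop with `for hit in iter_all_docs(es, "my-index"): ...`. Because it is a generator, documents can be written out as they arrive instead of being held in memory.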

Scroll can also be sped up using slices:
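
As a rough idea of what a sliced scroll could look like from Python, here is a sketch that drains each slice in its own thread. It again assumes an 8.x elasticsearch-py client passed in as `es`; `scroll_slice`, `scroll_all`, the slice count, and the page size are hypothetical names and values chosen for the example, not anything taken from the docs.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Any, Dict, List


def scroll_slice(es: Any, index: str, slice_id: int, max_slices: int,
                 page_size: int = 1000, scroll: str = "2m") -> List[Dict[str, Any]]:
    """Drain one slice of a sliced scroll and return its hits."""
    resp = es.search(
        index=index,
        scroll=scroll,
        size=page_size,
        query={"match_all": {}},
        # Each worker asks for a disjoint slice of the same scroll.
        slice={"id": slice_id, "max": max_slices},
        # _doc order is the cheapest when ordering does not matter.
        sort=["_doc"],
    )
    hits: List[Dict[str, Any]] = []
    scroll_id = resp["_scroll_id"]
    try:
        while resp["hits"]["hits"]:
            hits.extend(resp["hits"]["hits"])
            resp = es.scroll(scroll_id=scroll_id, scroll=scroll)
            scroll_id = resp["_scroll_id"]
    finally:
        # Free the search context on the cluster once the slice is done.
        es.clear_scroll(scroll_id=scroll_id)
    return hits


def scroll_all(es: Any, index: str, max_slices: int = 4,
               **kwargs: Any) -> List[Dict[str, Any]]:
    """Run one scroll per slice in parallel threads and merge the hits."""
    with ThreadPoolExecutor(max_workers=max_slices) as pool:
        futures = [
            pool.submit(scroll_slice, es, index, i, max_slices, **kwargs)
            for i in range(max_slices)
        ]
        return [hit for f in futures for hit in f.result()]
```

The slice count needs to be at least 2, and a value around the number of shards is a common starting point. For over 1M documents you would probably write each slice to a file instead of collecting everything in one list as this sketch does.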
