Scroll API taking a long time and using a lot of memory

(Raj Kiran) #1


I have a single machine with the following configuration:
RAM: 32 GB (12 GB allocated to ES)
I have indexed 17 million rows x 3000 columns (200 GB) of data with 5 shards and no replicas.
I want to retrieve a subset of the data, 1 million rows x 3000 columns (10 GB), and store it in a CSV file.
I have tried various approaches, and it takes 9 hours to complete.
I came across the scroll API, but it uses a lot of memory and my process eventually slows down.
I am using the following query:

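# es: Elasticsearch client instance (name assumed; the start of the call was cut off)
result_dict = es.search(index="genes", doc_type="test", scroll='1m', size=5000, body={
    "query": {
        "terms": {
            "Cadd_GeneName.keyword": arr
        }
    }
})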
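
One pattern that may keep memory flat during an export like this is to write each hit to the CSV as it arrives rather than collecting all scroll pages in a list first. The following is only a minimal sketch under stated assumptions: the helper name `write_hits_to_csv`, the `columns` list, and the output path are hypothetical, and the commented usage assumes the official `elasticsearch` Python client and its `helpers.scan` generator, which pages through scroll results lazily.

```python
import csv


def write_hits_to_csv(hits, path, fieldnames):
    """Stream hits to a CSV file one row at a time.

    `hits` is any iterable of Elasticsearch-style hit dicts (each with a
    "_source" mapping). Only one hit is held in memory at a time.
    Returns the number of rows written.
    """
    count = 0
    with open(path, "w", newline="") as f:
        # Ignore fields not listed in fieldnames; leave missing fields empty.
        writer = csv.DictWriter(
            f, fieldnames=fieldnames, extrasaction="ignore", restval=""
        )
        writer.writeheader()
        for hit in hits:
            writer.writerow(hit["_source"])
            count += 1
    return count


# Hypothetical usage against a live cluster (not executed here):
#
# from elasticsearch import Elasticsearch, helpers
# es = Elasticsearch()
# query = {"query": {"terms": {"Cadd_GeneName.keyword": arr}}}
# hits = helpers.scan(es, index="genes", query=query, scroll="5m", size=1000)
# write_hits_to_csv(hits, "subset.csv", columns)
```

Because `helpers.scan` is a generator, the writer above pulls one batch at a time from the cluster. With 3000 columns per document, a smaller `size` than 5000 may also be worth trying, since each scroll page has to fit in memory on both the node and the client.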

Any help would be greatly appreciated.

