Hi,
I have a single machine with the following configuration:
RAM: 32 GB (12 GB allocated to the ES heap)
Storage: HDD (spinning disk)
I have indexed 17 million rows x 3,000 columns (about 200 GB) with 5 shards and no replicas.
I want to retrieve a subset of roughly 1 million rows x 3,000 columns (about 10 GB) and write it to a CSV file.
I have tried various approaches, and the export takes about 9 hours to complete.
I came across the scroll API, but it uses a lot of memory and my process eventually slows down.
I am using the following query:
from elasticsearch import Elasticsearch

es = Elasticsearch()

# Open a scroll context and fetch the first batch of 5,000 hits
result_dict = es.search(
    index="genes", doc_type="test", scroll='1m', size=5000,
    body={
        "query": {
            "terms": {"Cadd_GeneName.keyword": arr}  # arr = list of gene names to export
        },
        "sort": "_doc"
    })
Any help would be greatly appreciated.
Thanks,
-Raj