Memory error when using scan in Python to retrieve millions of documents

liron_gofberg · July 17, 2020, 8:54am

I’m using scan in Python to retrieve my data.
It works when I ask for thousands of documents, but when I ask for millions of records I get a memory-related error after a while.

What could be happening?

warkolm · July 17, 2020, 9:03am

Please share the full error.
It'd also help if you shared the code that you are using.

liron_gofberg · July 17, 2020, 6:42pm

Thanks for answering.

scan uses scroll,
and scroll raises the error.

I don’t have the code right now, but it doesn’t matter.
I would like you to think about what could cause a memory error.
The error happens when I try to fetch several million records,
but if I ask for 500,000 everything is OK and there is no error.

I read about the need to set the heap size.

If I query millions of records from Elasticsearch with scroll,
do I need to set the heap size of the master node only? Do I need to configure the data nodes too?
Do I need to do anything else?

The master node needs to have available space to assemble the results from all the shards,
so if I ask for 20 GB of results, does the master node need to have 20 GB available (in heap)?

I would like to know how much memory scroll needs to work properly with millions of records.

Please
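
For reference, a minimal sketch of consuming scan results without keeping them all in memory. This is not the code from this thread (none was posted), and it assumes the error is a Python-side MemoryError, which has not been confirmed. `helpers.scan` in the Python client is a generator that yields one hit at a time, so client memory only grows with the number of documents if the hits are collected, for example with `list(...)`. The names `stream_hits_to_file` and `fake_scan` are hypothetical helpers made up for this illustration; `fake_scan` stands in for a real cluster so the sketch runs on its own.

```python
import json
from typing import Any, Dict, Iterable, Iterator


def stream_hits_to_file(hits: Iterable[Dict[str, Any]], path: str) -> int:
    """Write each hit's _source as one JSON line and return the number written.

    Only the current hit is held in memory; nothing is accumulated in a list.
    """
    count = 0
    with open(path, "w", encoding="utf-8") as out:
        for hit in hits:
            out.write(json.dumps(hit["_source"]) + "\n")
            count += 1
    return count


def fake_scan(total: int) -> Iterator[Dict[str, Any]]:
    """Hypothetical stand-in for elasticsearch.helpers.scan (no cluster needed)."""
    for i in range(total):
        yield {"_id": str(i), "_source": {"n": i}}


if __name__ == "__main__":
    # With a real cluster the iterable would come from the client instead, e.g.
    # (address, index name and query are assumptions for illustration):
    #   from elasticsearch import Elasticsearch, helpers
    #   es = Elasticsearch("http://localhost:9200")
    #   hits = helpers.scan(es, index="my-index",
    #                       query={"query": {"match_all": {}}}, size=1000)
    written = stream_hits_to_file(fake_scan(10_000), "hits.jsonl")
    print(f"wrote {written} documents")
```

The `size` argument of `helpers.scan` sets how many hits each scroll request returns, so each round trip carries one batch rather than the whole result set. Whether that resolves the error here depends on where the memory error is actually raised, which the full error message would show.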

system · August 14, 2020, 6:42pm

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.
