Memory error when using scan in Python to retrieve millions of documents

I'm using scan in Python to retrieve my data.
It works when I ask for thousands of documents, but when I ask for millions of records I get a memory-related error after a while.

What could be happening?

Please share the full error.
It'd also help if you shared the code that you are using.

Thanks for answering

scan uses scroll,
and scroll raises the error.

I don't have the code at hand right now, but I don't think it matters.
I would like you to think about what could be going wrong, given that it is a memory error.
The error happens when I try to fetch several million records,
but if I ask for 500,000, everything is fine and there is no error.

I read that the heap size may need to be set.

If I use scroll to query millions of records from Elasticsearch,
do I need to set the heap size of the master node only, or do I need to configure the data nodes too?
Do I need to do anything else?

The master node needs to have space available to assemble the results from all the shards,
so if I ask for 20 GB of results, does the master node need to have 20 GB available (in heap size)?

I would like to know how much memory scroll needs to work properly with millions of records.

Please
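
Since the full error and the code were not shared, this is only a guess: if the error is a Python MemoryError, one possible cause is the client collecting every hit in a list (for example list(scan(...))) instead of consuming the generator one hit at a time. Below is a minimal sketch of the streaming approach, assuming the elasticsearch-py helpers.scan generator. The names write_hits_as_ndjson and scan_index, the host, the index name and the output file are all hypothetical and not taken from this thread.

```python
import json
from typing import Any, Dict, Iterable


def write_hits_as_ndjson(hits: Iterable[Dict[str, Any]], path: str) -> int:
    """Write each hit's _source as one JSON line and return the number written."""
    count = 0
    with open(path, "w", encoding="utf-8") as out:
        for hit in hits:  # one hit at a time; nothing is accumulated in memory
            out.write(json.dumps(hit["_source"]) + "\n")
            count += 1
    return count


def scan_index(hosts, index, batch_size=1000):
    """Return a lazy generator over all hits (assumes elasticsearch-py is installed)."""
    # Imported here so the writer above can be used without the client library.
    from elasticsearch import Elasticsearch, helpers

    client = Elasticsearch(hosts)
    return helpers.scan(
        client,
        index=index,
        query={"query": {"match_all": {}}},
        size=batch_size,  # hits fetched per scroll request
        scroll="5m",      # how long each scroll context is kept alive
    )


# Usage (hypothetical host, index and file names):
# total = write_hits_as_ndjson(
#     scan_index("http://localhost:9200", "my-index"), "dump.ndjson"
# )
```

With this pattern the client only holds roughly one scroll batch at a time, so its memory use should not grow with the total number of documents. If the error turns out to come from Elasticsearch itself rather than from Python, this sketch does not address it, and the full error message would still be needed.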
