Scroll & heap size / other memory configuration

liron_gofberg · July 17, 2020, 11:03pm

Hi,
I want to fetch 50 million records into a data frame with scan (scroll...).
What heap size do I need for that procedure?
How can I know what the right size is for me?
Do I need to configure it only on the master node?
On all the nodes? On the client side?

Is there anything else I need to configure so this procedure runs without any problems?

please..

thanks,
Liron

warkolm · July 21, 2020, 12:20am

What problems are you having?
How big is each scroll batch you are requesting?

liron_gofberg · July 21, 2020, 5:27am

The batch size is 1000,
and I’ve tried to scroll 11 million records.
Smaller queries work fine.

I have 3 nodes.
The heap size on each node was 1.9GB;
I changed it to 8GB.

I query with a routing value, so I query from one shard only.
Do I need to increase the heap size? What else besides heap size can cause memory issues?

Thanks!!
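For reference, the setup described above (batch size 1000, one routing value, results collected into a data frame) could be sketched roughly as follows. This is only an illustration, not the poster's actual code: the names `iter_chunks`, `scan_to_frames`, and `chunk_size` are hypothetical, the `match_all` query is a placeholder, and it assumes the `scan` helper from the official Python client (`elasticsearch.helpers.scan`) plus pandas on the client side. It yields one data frame per chunk, so the client does not have to hold every hit as a Python dict at once.

```python
from itertools import islice


def iter_chunks(hits, chunk_size=100_000):
    """Yield lists of `_source` dicts, at most `chunk_size` per list."""
    it = iter(hits)
    while True:
        chunk = [hit["_source"] for hit in islice(it, chunk_size)]
        if not chunk:
            return
        yield chunk


def scan_to_frames(es, index, routing, query=None, chunk_size=100_000):
    """Yield one pandas DataFrame per chunk (hypothetical helper)."""
    # Imported lazily so iter_chunks above has no third-party dependencies.
    import pandas as pd
    from elasticsearch.helpers import scan

    hits = scan(
        es,
        index=index,
        query=query or {"query": {"match_all": {}}},  # placeholder query
        size=1000,        # batch size per scroll request, as in the post
        scroll="5m",      # how long each scroll context is kept alive
        routing=routing,  # forwarded to the underlying search call
    )
    for chunk in iter_chunks(hits, chunk_size):
        yield pd.DataFrame.from_records(chunk)
```

Each yielded frame can be processed or written to disk before the next one is built; concatenating all of them at the end brings back the full client-side memory cost.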

warkolm · July 21, 2020, 5:32am

How big is the total amount of data?

liron_gofberg · July 21, 2020, 6:46am

The total data is now 20 million records (2 shards).
With the routing value I query 11M records out of the 11M records in one of the two shards.

warkolm · July 21, 2020, 11:45pm

How big is it in GB?

liron_gofberg · July 22, 2020, 5:57am

1GB - 5M documents
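As a rough back-of-the-envelope check, and assuming the figure above means about 1 GB on disk per 5 million documents (the post does not say so explicitly), the sizes of the scans mentioned in this thread can be estimated like this. The function name `estimated_gb` is made up for the illustration, and the result is only the size on disk; the same documents held as Python dicts or in a data frame on the client will generally take more memory.

```python
GB = 10**9

# Assumption: "1GB - 5M documents" means roughly 1 GB per 5 million documents.
bytes_per_doc = GB / 5_000_000  # about 200 bytes per document


def estimated_gb(n_docs):
    """Estimated on-disk size in GB for n_docs documents."""
    return n_docs * bytes_per_doc / GB


for n in (11_000_000, 50_000_000):
    print(f"{n:,} documents ~ {estimated_gb(n):.1f} GB")
```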

