Hi,
I want to fetch 50 million records into a data frame with scan(scroll...).
What heap size do I need for that procedure?
How can I determine the right size for my case?
Do I need to configure it only on the master node?
On all the nodes? On the client side?
Is there anything else I need to configure so that this procedure runs without problems?