I get "EsHadoopNoNodesLeftException: Connection error (check network and/or proxy settings)- all nodes failed" when I run df = spark.sql(select * from a table) and then df.saveToEs(indexName+"/docs").
I have 200 ORC files averaging 145 MB each, so about 29 GB of serialized, compressed ORC raw data in total.
I see 200 tasks in the Spark UI while the code above runs, and the last task fails with the exception above.
From this ticket (Similar Issue) I infer that I need to reduce the bulk size.
How do I determine the bulk size to use for dataframe.saveToEs() at runtime? Is there a formula based on the number of executors, cores, memory, etc.?
How do I reduce the bulk size?
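For context, as far as I can tell from the elasticsearch-hadoop configuration docs, the bulk size is controlled by es.batch.size.bytes and es.batch.size.entries, and it applies per task, so the total load on the cluster is roughly that size multiplied by the number of tasks running at once. Below is a rough sketch of what I am thinking of trying. The helper names (es_write_options, save_to_es) are my own, and the values are guesses, not recommendations.

```python
# Hypothetical helper: build elasticsearch-hadoop write options with a
# smaller bulk size and more patient retries. The values are guesses.
def es_write_options(batch_bytes="512kb", batch_entries=500,
                     retry_count=6, retry_wait="30s"):
    return {
        # Size (in bytes) of each bulk request, per task
        "es.batch.size.bytes": batch_bytes,
        # Number of documents in each bulk request, per task
        "es.batch.size.entries": str(batch_entries),
        # How many times a rejected bulk request is retried
        "es.batch.write.retry.count": str(retry_count),
        # How long to wait between those retries
        "es.batch.write.retry.wait": retry_wait,
    }


# Hypothetical helper: write a Spark DataFrame to Elasticsearch through the
# DataFrame writer API, passing the options above. `resource` is the
# "index/type" string, e.g. indexName + "/docs".
def save_to_es(df, resource, options):
    (df.write
       .format("org.elasticsearch.spark.sql")
       .options(**options)
       .mode("append")
       .save(resource))
```

I would then call save_to_es(df, indexName + "/docs", es_write_options()). What I still do not know is how to pick these numbers for my cluster instead of guessing.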