EsHadoopNoNodesLeftException: all nodes failed on Spark saveToEs

Hi,

I get "EsHadoopNoNodesLeftException: Connection error (check network and/or proxy settings)- all nodes failed" when I run df = spark.sql("select * from a table") and then df.saveToEs(indexName+"/docs").

I have 200 ORC files averaging 145 MB each, so about 29 GB of serialized and compressed ORC raw data in total.

I see 200 tasks in the Spark UI while the above code runs.

The last task fails with the above exception.

From this ticket (Similar Issue), I infer that I need to reduce the bulk size.

My questions:

How do I determine the bulk size that dataframe.saveToEs() uses at runtime? Is there a formula based on the number of executors, cores, memory, etc.?

How do I reduce the bulk size?

Thanks

You can configure the bulk size using the properties detailed at https://www.elastic.co/guide/en/elasticsearch/hadoop/current/configuration.html#configuration-serialization, but it's also important to understand why the tasks are failing. I'd suggest looking through the task logs for error messages that explain why the node connections are failing.
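As a rough illustration of those properties, here is a minimal Python sketch. The helper names (es_write_options, peak_bulk_bytes) and the example values (512kb, 500 entries, 10 executors with 4 cores each) are hypothetical and only there for illustration; the property keys are the batch settings from the configuration page linked above. The connector applies the batch size per task, so a crude upper bound on what hits the cluster at once is the batch size multiplied by the number of tasks running concurrently. Treat that as an estimate, not an official formula.

```python
def es_write_options(batch_bytes="1mb", batch_entries=1000,
                     retry_count=3, retry_wait="10s"):
    """Build connector write options (hypothetical helper).

    The defaults here mirror the documented connector defaults; lower
    batch_bytes / batch_entries to send smaller bulk requests.
    """
    return {
        "es.batch.size.bytes": batch_bytes,           # flush once this many bytes are buffered
        "es.batch.size.entries": str(batch_entries),  # ...or once this many documents are buffered
        "es.batch.write.retry.count": str(retry_count),
        "es.batch.write.retry.wait": retry_wait,
    }


def peak_bulk_bytes(concurrent_tasks, batch_bytes):
    """Rough upper bound on bulk bytes in flight across the whole job.

    Assumes every concurrently running task flushes a full batch at the
    same moment, which is the worst case.
    """
    return concurrent_tasks * batch_bytes


# Example values only: halve both batch limits relative to the defaults.
opts = es_write_options(batch_bytes="512kb", batch_entries=500)

# Assumed cluster shape: 10 executors x 4 cores, one task per core.
concurrent_tasks = 10 * 4
estimate = peak_bulk_bytes(concurrent_tasks, 512 * 1024)
print(f"up to ~{estimate / (1024 * 1024):.0f} MiB of bulk data in flight")

# With PySpark and the elasticsearch-hadoop jar on the classpath, the
# options would be passed to the writer roughly like this:
# df.write.format("org.elasticsearch.spark.sql") \
#     .options(**opts).mode("append").save(index_name + "/docs")
```

If lowering the batch size does not help, reducing the number of concurrent writers (for example with df.coalesce(n) before the write) shrinks the same product from the other side.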