EsHadoopNoNodesLeftException: all nodes failed on Spark saveToEs

Hi,

I get "EsHadoopNoNodesLeftException: Connection error (check network and/or proxy settings) - all nodes failed" when I run a query like `df = spark.sql("select * from <table>")` and then call `df.saveToEs(indexName + "/docs")`.

I have 200 ORC files averaging ~145 MB each, i.e. ~29 GB of serialized and compressed ORC raw data.

I see 200 tasks in the Spark UI while this code runs.

The last task fails with the above exception.

From a similar issue reported earlier, I infer that I need to reduce the bulk size.

My questions:

How do I determine the bulk size for `dataframe.saveToEs()` at runtime? Is there a formula based on the number of executors, cores, memory, etc.?

How do I reduce the bulk size?

Thanks

You can configure the bulk size using the properties detailed at https://www.elastic.co/guide/en/elasticsearch/hadoop/current/configuration.html#configuration-serialization, but it's also important to fully understand why the tasks are failing. I'd suggest looking through the task logs for error messages indicating why the node connections might be failing.
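On the "is there a formula" question: each Spark task that writes to Elasticsearch sends its own bulk requests, bounded by `es.batch.size.bytes` (default 1mb) and `es.batch.size.entries` (default 1000), so the peak concurrent bulk load scales roughly with the number of simultaneously running tasks (executors × cores per executor) times the per-task batch size. A minimal sketch of that arithmetic, with illustrative (not recommended) executor counts and settings:

```python
# Rough sizing sketch for es-hadoop bulk writes.
# Assumption: each concurrently running Spark task keeps roughly one
# bulk request in flight, so peak bulk load on the Elasticsearch
# cluster is about (concurrent tasks) x (bulk size per task).

# es-hadoop properties that bound the per-task bulk size (values here
# are the documented defaults, shown for illustration):
es_write_conf = {
    "es.batch.size.bytes": "1mb",       # max bytes per bulk request, per task
    "es.batch.size.entries": "1000",    # max docs per bulk request, per task
    "es.batch.write.retry.count": "3",  # bulk retries before the task fails
}

def peak_bulk_bytes(num_executors: int, cores_per_executor: int,
                    batch_size_bytes: int) -> int:
    """Upper bound on bulk data in flight across the whole job."""
    concurrent_tasks = num_executors * cores_per_executor
    return concurrent_tasks * batch_size_bytes

# e.g. 10 executors x 4 cores at the default 1 MB per bulk request
print(peak_bulk_bytes(10, 4, 1 * 1024 * 1024))  # 41943040 bytes (~40 MB)
```

To reduce the bulk size, lower `es.batch.size.bytes` / `es.batch.size.entries` in the connector configuration (e.g. via the options passed when writing the DataFrame, or the conf map accepted by `saveToEs`); reducing the number of concurrent tasks lowers the aggregate load by the same logic.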
