Unable to index documents after around 20 million, Spark job fails because of NoNodesLeftException

Ravi_Ranjan · January 15, 2018, 10:40am

Hi,
I am indexing about 30 million records into Elasticsearch using PySpark. The job indexed roughly 20 million records and then failed with the following error:

18/01/15 10:34:42 WARN TaskSetManager: Lost task 100.0 in stage 2.0 (TID 102, localhost, executor driver): org.apache.spark.SparkException: Task failed while writing rows
	at org.apache.spark.internal.io.SparkHadoopMapReduceWriter$.org$apache$spark$internal$io$SparkHadoopMapReduceWriter$$executeTask(SparkHadoopMapReduceWriter.scala:178)
	at org.apache.spark.internal.io.SparkHadoopMapReduceWriter$$anonfun$3.apply(SparkHadoopMapReduceWriter.scala:89)
	at org.apache.spark.internal.io.SparkHadoopMapReduceWriter$$anonfun$3.apply(SparkHadoopMapReduceWriter.scala:88)
	at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:87)
	at org.apache.spark.scheduler.Task.run(Task.scala:108)
	at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:335)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
	at java.lang.Thread.run(Thread.java:748)
Caused by: org.elasticsearch.hadoop.rest.EsHadoopNoNodesLeftException: Connection error (check network and/or proxy settings)- all nodes failed; tried [[10.240.0.86:9200]] 
	at org.elasticsearch.hadoop.rest.NetworkClient.execute(NetworkClient.java:150)
	at org.elasticsearch.hadoop.rest.RestClient.execute(RestClient.java:461)
	at org.elasticsearch.hadoop.rest.RestClient.execute(RestClient.java:445)
	at org.elasticsearch.hadoop.rest.RestClient.bulk(RestClient.java:186)
	at org.elasticsearch.hadoop.rest.RestRepository.tryFlush(RestRepository.java:220)
	at org.elasticsearch.hadoop.rest.RestRepository.flush(RestRepository.java:242)
	at org.elasticsearch.hadoop.rest.RestRepository.doWriteToIndex(RestRepository.java:182)
	at org.elasticsearch.hadoop.rest.RestRepository.writeToIndex(RestRepository.java:159)
	at org.elasticsearch.hadoop.mr.EsOutputFormat$EsRecordWriter.write(EsOutputFormat.java:151)
	at org.apache.spark.internal.io.SparkHadoopMapReduceWriter$$anonfun$4.apply(SparkHadoopMapReduceWriter.scala:148)
	at org.apache.spark.internal.io.SparkHadoopMapReduceWriter$$anonfun$4.apply(SparkHadoopMapReduceWriter.scala:144)
	at org.apache.spark.util.Utils$.tryWithSafeFinallyAndFailureCallbacks(Utils.scala:1375)
	at org.apache.spark.internal.io.SparkHadoopMapReduceWriter$.org$apache$spark$internal$io$SparkHadoopMapReduceWriter$$executeTask(SparkHadoopMapReduceWriter.scala:159)
	... 8 more

How can I get rid of this error? I tried increasing the heap size and RAM, and I also tried setting the host to the Google server's external IP, its internal IP, 0.0.0.0, and localhost. None of these has solved the problem.

james.baiera · January 19, 2018, 6:05pm

If you increase the logging level on your job, it should display the underlying communication failure for the failed request. I also notice that only a single node address is being used. That isn't a problem in a cloud environment or a single-node deployment, but if you are running a multi-node deployment and have access to all nodes in the cluster, one of your settings may be keeping the connector from discovering more nodes.
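
As a rough illustration of the settings involved, here is a minimal sketch of how the connector configuration might be assembled on the PySpark side. The helper name `build_es_write_conf`, the second node address, and the `records/doc` resource are hypothetical; the `es.*` keys are standard elasticsearch-hadoop settings, and the commented-out write call assumes the job uses `saveAsNewAPIHadoopFile` with `EsOutputFormat`, as the stack trace suggests.

```python
def build_es_write_conf(nodes, resource, port=9200, wan_only=False):
    """Build a dict of elasticsearch-hadoop settings (all values are strings).

    nodes    -- list of Elasticsearch host addresses to seed the connector with
    resource -- target index/type, e.g. "records/doc" (hypothetical name)
    wan_only -- set True for cloud/restricted networks where only the
                declared addresses are reachable; discovery is then disabled
    """
    if not nodes:
        raise ValueError("at least one Elasticsearch node address is required")
    return {
        "es.nodes": ",".join(nodes),
        "es.port": str(port),
        "es.resource": resource,
        # In WAN-only mode the connector talks only to the declared nodes.
        "es.nodes.wan.only": "true" if wan_only else "false",
        # Otherwise let the connector discover the remaining cluster nodes.
        "es.nodes.discovery": "false" if wan_only else "true",
    }


# 10.240.0.86 comes from the error message; 10.240.0.87 is a made-up second node.
conf = build_es_write_conf(["10.240.0.86", "10.240.0.87"], "records/doc")

# Assumed usage inside the Spark job (requires a SparkContext `sc` and an RDD
# `rdd` of (key, dict) pairs, so it is left commented out here):
#
# sc.setLogLevel("DEBUG")  # raise log verbosity to expose the underlying failure
# rdd.saveAsNewAPIHadoopFile(
#     path="-",
#     outputFormatClass="org.elasticsearch.hadoop.mr.EsOutputFormat",
#     keyClass="org.apache.hadoop.io.NullWritable",
#     valueClass="org.elasticsearch.hadoop.mr.LinkedMapWritable",
#     conf=conf,
# )
```

Listing more than one address in `es.nodes` gives the connector somewhere else to go when one node stops responding. Whether that resolves this particular failure depends on what the more verbose logs show.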
