Network related issue retry from spark to ES

Shreyas_K_C · January 12, 2017, 7:25am

Hello All.

I am using the Spark native library to connect to Elasticsearch. For non-network issues we have the batch retry config, which I read about in the post: [SPARK] es.batch.write.retry.count negative value is ignored

I am deliberately giving an invalid ES IP, and Spark errors out with the trace below. Since we are running in multi-cluster mode, catching the exception is not feasible (I have tried), as it is thrown inside its own executor. Is there a config to set a network-related retry count? FYI, I am using Elasticsearch 2.3.2.

Any inputs related to this would be very helpful.

Exception stack trace:

org.apache.spark.SparkException: Job aborted due to stage failure: Task 2 in stage 1.0 failed 1 times, most recent failure: Lost task 2.0 in stage 1.0 (TID 3, localhost): org.elasticsearch.hadoop.EsHadoopIllegalArgumentException: Cannot detect ES version - typically this happens if the network/Elasticsearch cluster is not accessible or when targeting a WAN/Cloud instance without the proper setting 'es.nodes.wan.only'
at org.elasticsearch.hadoop.rest.InitializationUtils.discoverEsVersion(InitializationUtils.java:190)
at org.elasticsearch.hadoop.rest.RestService.createWriter(RestService.java:379)
at org.elasticsearch.spark.rdd.EsRDDWriter.write(EsRDDWriter.scala:40)
at org.elasticsearch.spark.rdd.EsSpark$$anonfun$saveToEs$1.apply(EsSpark.scala:67)
at org.elasticsearch.spark.rdd.EsSpark$$anonfun$saveToEs$1.apply(EsSpark.scala:67)
at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:66)
at org.apache.spark.scheduler.Task.run(Task.scala:89)
at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:227)
at java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
at java.lang.Thread.run(Unknown Source)
Caused by: org.elasticsearch.hadoop.rest.EsHadoopNoNodesLeftException: Connection error (check network and/or proxy settings)- all nodes failed; tried [[127.0.0.11:9200]]
at org.elasticsearch.hadoop.rest.NetworkClient.execute(NetworkClient.java:142)
at org.elasticsearch.hadoop.rest.RestClient.execute(RestClient.java:434)
at org.elasticsearch.hadoop.rest.RestClient.execute(RestClient.java:414)
at org.elasticsearch.hadoop.rest.RestClient.execute(RestClient.java:418)
at org.elasticsearch.hadoop.rest.RestClient.get(RestClient.java:122)
at org.elasticsearch.hadoop.rest.RestClient.esVersion(RestClient.java:564)
at org.elasticsearch.hadoop.rest.InitializationUtils.discoverEsVersion(InitializationUtils.java:178)
... 10 more

james.baiera · January 16, 2017, 10:32pm

The es.batch.write.retry.count setting is only respected when the response from a server is either an HTTP 429 or an HTTP 503 response code, which usually means that the server is too busy and has chosen to reject the request in order to exert backpressure on the writers. All other HTTP failure responses are treated as if they will not succeed no matter how many times they are retried.
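To make the rule above concrete, here is a minimal sketch of the decision it describes. This is not es-hadoop's actual code: the class name `BulkRetrySketch`, the method `shouldRetry`, and its parameters are hypothetical and exist only for illustration.

```java
// Hypothetical helper, NOT es-hadoop's implementation: it only mirrors the
// rule described above for es.batch.write.retry.count.
final class BulkRetrySketch {

    // A response is retried only when the server signals that it is too busy
    // (HTTP 429 or HTTP 503) and the configured retry budget is not used up.
    static boolean shouldRetry(int httpStatus, int attemptsSoFar, int retryCount) {
        boolean serverBusy = httpStatus == 429 || httpStatus == 503;
        return serverBusy && attemptsSoFar < retryCount;
    }

    public static void main(String[] args) {
        System.out.println(shouldRetry(429, 0, 3)); // true: busy server, retries left
        System.out.println(shouldRetry(503, 3, 3)); // false: retry budget exhausted
        System.out.println(shouldRetry(400, 0, 3)); // false: not a backpressure code
    }
}
```

Note that a connection failure like the one in the trace above never yields an HTTP status code at all, so under this rule the batch retry count would not come into play.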

The retry policy that is configured in the HTTP client library can be found here on GitHub, if you want to see exactly how that configuration setting is used.
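For reference, the settings involved can be collected as in the sketch below. It assumes that the connection-level retry policy is driven by the es-hadoop settings documented as `es.http.retries` and `es.http.timeout`; that is my reading of the documentation, not something confirmed in this thread, so please check it against your es-hadoop version. The class name, the method name, and all values are illustrative only.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Illustrative only: gathers the retry-related es-hadoop settings in one place.
final class EsRetrySettingsSketch {

    static Map<String, String> settings(String esNodes) {
        Map<String, String> conf = new LinkedHashMap<>();
        conf.put("es.nodes", esNodes);
        // Batch-level retries: only used for HTTP 429 / 503 responses.
        conf.put("es.batch.write.retry.count", "3");
        conf.put("es.batch.write.retry.wait", "10s");
        // Assumption: these two govern retries and timeouts at the HTTP
        // connection level, which is where an unreachable node would fail.
        conf.put("es.http.retries", "3");
        conf.put("es.http.timeout", "1m");
        return conf;
    }

    public static void main(String[] args) {
        // Print each key/value pair, one per line.
        settings("127.0.0.11:9200")
                .forEach((key, value) -> System.out.println(key + "=" + value));
    }
}
```

Each entry could then be passed to `SparkConf.set(key, value)` before the job is created. Whether raising the connection-level retries helps when the address itself is wrong is doubtful, since every attempt would hit the same unreachable node.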

system · February 13, 2017, 10:32pm

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.
