Spark.es.nodes config setting in Spark not getting picked up

David_Kincaid · June 28, 2016, 6:27pm

I'm using Spark 1.6 with pyspark and TargetHolding/pyspark-elastic 0.4.2, and I can't figure out how to tell it which ES node to use. The docs for pyspark-elastic say to set --conf spark.es.nodes=<nodes>, but for some reason it's still using localhost. Here is my command line for launching pyspark:

/usr/bin/pyspark --packages TargetHolding/pyspark-elastic:0.4.2 --conf spark.es.nodes="172.22.6.41" --conf spark.es.nodes.discovery=false --master yarn-client

Here is the error I'm getting in the logs. As you can see, it's trying to connect to localhost:9200:

16/06/28 18:19:53 WARN TaskSetManager: Lost task 77.0 in stage 3.0 (TID 85, ip-172-22-2-229.vet2pet.idexxi.com): org.elasticsearch.hadoop.rest.EsHadoopNoNodesLeftException: Connection error (check network and/or proxy settings)- all nodes failed; tried [[localhost:9200]] 
	at org.elasticsearch.hadoop.rest.NetworkClient.execute(NetworkClient.java:142)
	at org.elasticsearch.hadoop.rest.RestClient.execute(RestClient.java:383)
	at org.elasticsearch.hadoop.rest.RestClient.execute(RestClient.java:363)
	at org.elasticsearch.hadoop.rest.RestClient.execute(RestClient.java:367)
	at org.elasticsearch.hadoop.rest.RestClient.get(RestClient.java:121)
	at org.elasticsearch.hadoop.rest.RestClient.esVersion(RestClient.java:513)
	at org.elasticsearch.hadoop.rest.InitializationUtils.discoverEsVersion(InitializationUtils.java:177)
	at org.elasticsearch.hadoop.rest.RestService.createWriter(RestService.java:378)
	at org.elasticsearch.spark.rdd.EsRDDWriter.write(EsRDDWriter.scala:40)
	at org.elasticsearch.spark.rdd.EsSpark$$anonfun$saveToEs$1.apply(EsSpark.scala:67)
	at org.elasticsearch.spark.rdd.EsSpark$$anonfun$saveToEs$1.apply(EsSpark.scala:67)
	at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:66)
	at org.apache.spark.scheduler.Task.run(Task.scala:89)
	at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:214)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
	at java.lang.Thread.run(Thread.java:745)

james.baiera · June 30, 2016, 7:23pm

You mentioned this is for TargetHolding/pyspark-elastic, which isn't really affiliated with es-hadoop proper. It also seems that you've already opened an issue with that project (just linking it for completeness). I noticed that it's a fairly young project, but good luck tracking down a resolution!
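
While that issue is open, one thing that might be worth trying is to pass the es.* settings directly in the per-job configuration instead of relying on the spark.es.* values from --conf. The sketch below uses es-hadoop's own EsOutputFormat through PySpark's saveAsNewAPIHadoopFile rather than pyspark-elastic. It assumes the elasticsearch-hadoop jar is on the classpath and that the RDD holds (key, dict) pairs; the helper names and the "myindex/mytype" resource are hypothetical, and this is untested against the setup above.

```python
def es_write_conf(nodes, resource, discovery=False):
    """Build the per-job settings dict handed to es-hadoop's EsOutputFormat."""
    return {
        "es.nodes": nodes,  # e.g. the address used in the command line above
        "es.nodes.discovery": "true" if discovery else "false",
        "es.resource": resource,  # hypothetical target, "index/type"
    }


def save_to_es(rdd, conf):
    """Write an RDD of (key, dict) pairs using es-hadoop's OutputFormat.

    The settings travel with the job configuration, so they do not depend
    on spark.es.* properties being picked up from SparkConf.
    """
    rdd.saveAsNewAPIHadoopFile(
        path="-",  # required by the PySpark API, not used by EsOutputFormat
        outputFormatClass="org.elasticsearch.hadoop.mr.EsOutputFormat",
        keyClass="org.apache.hadoop.io.NullWritable",
        valueClass="org.elasticsearch.hadoop.mr.LinkedMapWritable",
        conf=conf,
    )


# Settings matching the original command line; the resource name is made up.
conf = es_write_conf("172.22.6.41", "myindex/mytype")
```

In a pyspark shell this would be called as save_to_es(rdd, conf). If the write then reaches 172.22.6.41 instead of localhost, that would suggest the problem lies in how the spark.es.* properties reach the connector rather than in the network.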
