spark.es.nodes config setting in Spark not getting picked up


(David Kincaid) #1

I'm using Spark 1.6 with pyspark and TargetHolding/pyspark-elastic 0.4.2, and I can't figure out how to tell it which ES node to use. The docs for pyspark-elastic say to set --conf spark.es.nodes=<nodes>, but for some reason it's still using localhost. Here is my command line for launching pyspark:

/usr/bin/pyspark --packages TargetHolding/pyspark-elastic:0.4.2 --conf spark.es.nodes="172.22.6.41" --conf spark.es.nodes.discovery=false --master yarn-client

Here is the error I'm getting in the logs. As you can see, it's trying to connect to localhost:9200:

16/06/28 18:19:53 WARN TaskSetManager: Lost task 77.0 in stage 3.0 (TID 85, ip-172-22-2-229.vet2pet.idexxi.com): org.elasticsearch.hadoop.rest.EsHadoopNoNodesLeftException: Connection error (check network and/or proxy settings)- all nodes failed; tried [[localhost:9200]] 
	at org.elasticsearch.hadoop.rest.NetworkClient.execute(NetworkClient.java:142)
	at org.elasticsearch.hadoop.rest.RestClient.execute(RestClient.java:383)
	at org.elasticsearch.hadoop.rest.RestClient.execute(RestClient.java:363)
	at org.elasticsearch.hadoop.rest.RestClient.execute(RestClient.java:367)
	at org.elasticsearch.hadoop.rest.RestClient.get(RestClient.java:121)
	at org.elasticsearch.hadoop.rest.RestClient.esVersion(RestClient.java:513)
	at org.elasticsearch.hadoop.rest.InitializationUtils.discoverEsVersion(InitializationUtils.java:177)
	at org.elasticsearch.hadoop.rest.RestService.createWriter(RestService.java:378)
	at org.elasticsearch.spark.rdd.EsRDDWriter.write(EsRDDWriter.scala:40)
	at org.elasticsearch.spark.rdd.EsSpark$$anonfun$saveToEs$1.apply(EsSpark.scala:67)
	at org.elasticsearch.spark.rdd.EsSpark$$anonfun$saveToEs$1.apply(EsSpark.scala:67)
	at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:66)
	at org.apache.spark.scheduler.Task.run(Task.scala:89)
	at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:214)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
	at java.lang.Thread.run(Thread.java:745)

(James Baiera) #2

You mentioned this is for TargetHolding/pyspark-elastic, which isn't really affiliated with es-hadoop proper. It also seems that you've already opened an issue with the project (just linking it for completeness). I noticed that it's a fairly young project, but good luck with tracking down a resolution!
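
For anyone who wants to rule out the command line itself, here is a minimal sketch of how the spark.-prefixed settings map to the es.* names that es-hadoop reads. The helper name es_settings is hypothetical, and the pairs are hard-coded to mirror the launch command above; in a real session they would come from SparkConf.getAll(). This only checks what the driver was given, it does not show why the executors fall back to localhost.

```python
def es_settings(spark_conf_pairs, prefix="spark."):
    """Return the es.* settings found in Spark conf pairs, with the prefix stripped.

    Hypothetical helper for illustration; not part of pyspark-elastic or es-hadoop.
    """
    settings = {}
    for key, value in spark_conf_pairs:
        if key.startswith(prefix + "es."):
            settings[key[len(prefix):]] = value
    return settings


# (key, value) pairs shaped like the output of SparkConf.getAll(),
# hard-coded here to match the command line in the question.
pairs = [
    ("spark.master", "yarn-client"),
    ("spark.es.nodes", "172.22.6.41"),
    ("spark.es.nodes.discovery", "false"),
]

es_conf = es_settings(pairs)
print(es_conf)
```

If the settings do show up on the driver, one possible workaround is to pass a dict like es_conf (plus es.resource) explicitly as the conf argument of saveAsNewAPIHadoopFile with org.elasticsearch.hadoop.mr.EsOutputFormat, which is the PySpark route es-hadoop itself documents. Whether pyspark-elastic's own methods accept the same settings is an open question for that project.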

