I have tried several different ways and I am not able to get a Spark 2.0 cluster to interact with an Amazon Elasticsearch cluster using ES-Hadoop (latest version, 5.1.2). Please check the settings and see if you can spot anything wrong in the configuration. I am able to telnet to the ES endpoint on port 80 and can also create a new ES index from the EMR master node.
I keep getting the error:
Caused by: org.elasticsearch.hadoop.rest.EsHadoopNoNodesLeftException: Connection error (check network and/or proxy settings) - all nodes failed; tried [[127.0.0.1:9200]]
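For what it's worth, the endpoint also answers plain HTTP from the master node; a minimal check of that sort, runnable from the spark-shell (same endpoint placeholder and port as in the configuration below), would be roughly:

import scala.io.Source
// Hit the cluster health endpoint over HTTP port 80 (same hostname as in the Spark config below).
val health = Source.fromURL("http://-piqsbtyzn5dsvrmrzucprcmuiu.us-east-1.es.amazonaws.com:80/_cluster/health").mkString
println(health)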
spark-shell --packages org.elasticsearch:elasticsearch-spark-20_2.11:5.1.2

import org.apache.spark.SparkConf
import org.apache.spark.SparkContext
import org.apache.spark.SparkContext._
import org.elasticsearch.spark._

// Point ES-Hadoop at the Amazon ES endpoint over HTTP port 80, in WAN-only
// mode since only the domain endpoint is reachable, not the individual data nodes.
val conf = new SparkConf().setAppName("myESHadoop").setMaster("local[*]")
conf.set("es.index.auto.create", "true")
conf.set("es.nodes", "-piqsbtyzn5dsvrmrzucprcmuiu.us-east-1.es.amazonaws.com")
conf.set("es.port", "80")
conf.set("es.nodes.wan.only", "true")

val sc = new SparkContext(conf)

// Write two small test documents to the spark/docs index.
val numbers = Map("one" -> 1, "two" -> 2, "three" -> 3)
val airports = Map("arrival" -> "Otopeni", "SFO" -> "San Fran")
sc.makeRDD(Seq(numbers, airports)).saveToEs("spark/docs")
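For completeness, ES-Hadoop also accepts these settings as a per-call map on saveToEs instead of on the SparkConf; a sketch of that variant, reusing sc and the same test data as above, would be:

// Same settings passed per call rather than via SparkConf (values as above).
val esCfg = Map(
  "es.nodes" -> "-piqsbtyzn5dsvrmrzucprcmuiu.us-east-1.es.amazonaws.com",
  "es.port" -> "80",
  "es.nodes.wan.only" -> "true",
  "es.index.auto.create" -> "true"
)
sc.makeRDD(Seq(numbers, airports)).saveToEs("spark/docs", esCfg)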
When I instead pass the configuration on the spark-shell command line as follows, the job just hangs and I see no error at all.
spark-shell --packages org.elasticsearch:elasticsearch-spark-20_2.11:5.1.2 --conf spark.es.nodes=-piqsbtyzn5dsvrmrzucprcmuiu.us-east-1.es.amazonaws.com spark.es.port=80 spark.es.index.auto.create= true spark.es.nodes.discovery=false spark.es.nodes.wan.only=true
I have a feeling the configuration is the underlying issue rather than networking; the ES endpoint is wide open.
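In case it helps narrow things down, the properties can be read back from the running context to confirm whether they were applied at all; a quick check of that sort would be:

// Verify the ES settings actually made it into the SparkContext.
// (The "spark.es." prefixed keys are what gets stored when they are passed via --conf.)
println(sc.getConf.getOption("es.nodes"))
println(sc.getConf.getOption("es.nodes.wan.only"))
println(sc.getConf.getOption("spark.es.nodes"))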