Unable to integrate Spark on EMR with Amazon Elasticsearch

I have tried several different ways, but I cannot get a Spark 2.0 cluster to interact with an Amazon Elasticsearch cluster using ES-Hadoop (the recent version, 5.1.2). Please check the settings below and see if you can spot anything wrong in the configuration. From the EMR master node I can telnet to the ES endpoint on port 80, and I can also create a new ES index.
I keep getting the error:

Caused by: org.elasticsearch.hadoop.rest.EsHadoopNoNodesLeftException: Connection error (check network and/or proxy settings)- all nodes failed; tried [[]]
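Since the message points at network settings, the telnet test from the EMR master can also be repeated from inside the Spark JVM. This is a minimal sketch; canConnect is a hypothetical helper, not part of ES-Hadoop, and it only checks that a TCP connection opens, not that Elasticsearch answers on it.

```scala
import java.net.{InetSocketAddress, Socket}
import scala.util.Try

// Hypothetical helper: JVM-side equivalent of the telnet check.
// Returns true if a TCP connection to host:port succeeds within timeoutMs.
def canConnect(host: String, port: Int, timeoutMs: Int = 3000): Boolean =
  Try {
    val socket = new Socket()
    // Always close the socket, whether or not connect succeeded.
    try socket.connect(new InetSocketAddress(host, port), timeoutMs)
    finally socket.close()
  }.isSuccess
```

In spark-shell, canConnect("<es-endpoint>", 80) should return true if the driver can reach the endpoint (<es-endpoint> is a placeholder). Wrapping the same call in an RDD, for example sc.parallelize(1 to 4).map(_ => canConnect("<es-endpoint>", 80)).collect(), is one way to try the check from executors as well, assuming the REPL-defined function serializes with the closure.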

spark-shell --packages org.elasticsearch:elasticsearch-spark-20_2.11:5.1.2

import org.apache.spark.SparkConf
import org.apache.spark.SparkContext
import org.apache.spark.SparkContext._
import org.elasticsearch.spark._

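// Note: spark-shell already provides a SparkContext named sc; constructing a second
// one in the same JVM normally fails unless sc is stopped first.
val conf = new SparkConf().setAppName("myESHadoop").setMaster("local[*]")
conf.set("es.nodes", "-piqsbtyzn5dsvrmrzucprcmuiu.us-east-1.es.amazonaws.com")  // ES endpoint
conf.set("es.port", "80")                  // HTTP port of the endpoint
conf.set("es.nodes.wan.only", "true")      // connect only through es.nodes, no discovery
conf.set("es.index.auto.create", "true")   // create the target index if it is missing
val sc = new SparkContext(conf)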

val numbers = Map("one" -> 1, "two" -> 2, "three" -> 3)
val airports = Map("arrival" -> "Otopeni", "SFO" -> "San Fran")

sc.makeRDD(Seq(numbers, airports)).saveToEs("spark/docs")

When I set the configuration on the command line instead, as below, the job just hangs and no error is reported at all.

spark-shell --packages org.elasticsearch:elasticsearch-spark-20_2.11:5.1.2 \
  --conf spark.es.nodes=-piqsbtyzn5dsvrmrzucprcmuiu.us-east-1.es.amazonaws.com \
  --conf spark.es.port=80 \
  --conf spark.es.index.auto.create=true \
  --conf spark.es.nodes.discovery=false \
  --conf spark.es.nodes.wan.only=true

I suspect the configuration, not networking, is the underlying issue: the ES endpoint is wide open.
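One way to test the configuration theory is to list which ES settings actually reached the SparkConf. spark-submit ignores --conf keys that do not start with "spark.", which is why command-line ES settings are written as spark.es.*; ES-Hadoop accepts that prefix for its es.* options. The helpers below are hypothetical names, and the list of required keys is simply taken from the settings used in this question.

```scala
// Hypothetical helper: keep only ES-Hadoop settings from SparkConf entries,
// mapping spark.es.* keys to the es.* names the connector uses.
def esSettings(conf: Seq[(String, String)]): Map[String, String] =
  conf.collect {
    case (key, value) if key.startsWith("spark.es.") => key.stripPrefix("spark.") -> value
    case (key, value) if key.startsWith("es.")       => key -> value
  }.toMap

// Hypothetical helper: report which of the connection settings used
// in the question are absent from the given entries.
def missingEsSettings(conf: Seq[(String, String)]): Seq[String] = {
  val found = esSettings(conf)
  Seq("es.nodes", "es.port", "es.nodes.wan.only").filterNot(found.contains)
}
```

In spark-shell these could be fed sc.getConf.getAll.toSeq; an empty result from esSettings, or a non-empty one from missingEsSettings, would mean the settings never made it into the context.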

@RuchikaAWS Do you see any difference if you pass those configuration entries directly to the saveToEs call?
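A minimal sketch of passing the settings per call, assuming ES-Hadoop's saveToEs(resource, cfg) overload, which takes a Map of settings that apply to that write only. esWriteConfig is a hypothetical helper; the settings mirror the ones used in the question.

```scala
// Hypothetical helper: settings map for the second argument of saveToEs,
// so the connection settings travel with the write call itself.
def esWriteConfig(host: String, port: Int = 80): Map[String, String] = Map(
  "es.nodes" -> host,
  "es.port" -> port.toString,
  // WAN-only mode: use only the declared endpoint, no node discovery.
  "es.nodes.wan.only" -> "true",
  "es.index.auto.create" -> "true"
)
```

In spark-shell, after import org.elasticsearch.spark._, the write from the question would become sc.makeRDD(Seq(numbers, airports)).saveToEs("spark/docs", esWriteConfig("<es-endpoint>")), reusing the shell's own sc. Here <es-endpoint> is a placeholder for the full domain endpoint; a hostname cannot start with a hyphen, so the host shown in the question appears to be redacted or truncated.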
