Unable to integrate Spark on EMR with Amazon Elasticsearch

(Ruchika Abbi) #1

I have tried several different ways and I am not able to get a Spark 2.0 cluster on EMR to interact with an Amazon Elasticsearch cluster using ES-Hadoop (recent version 5.1.2). Please check the settings below and see if you can spot anything wrong in the configuration. I am able to telnet to the ES endpoint on port 80 and also to create a new ES index from the EMR master node.
I keep getting the error:

Caused by: org.elasticsearch.hadoop.rest.EsHadoopNoNodesLeftException: Connection
error (check network and/or proxy settings) - all nodes failed; tried [[]]

spark-shell --packages org.elasticsearch:elasticsearch-spark-20_2.11:5.1.2

import org.apache.spark.SparkConf
import org.apache.spark.SparkContext
import org.apache.spark.SparkContext._
import org.elasticsearch.spark._

val conf = new SparkConf().setAppName("myESHadoop").setMaster("local[*]")
conf.set("es.index.auto.create", "true")
conf.set("es.nodes", "-piqsbtyzn5dsvrmrzucprcmuiu.us-east-1.es.amazonaws.com")
conf.set("es.port", "80")
conf.set("es.nodes.wan.only", "true")
val sc = new SparkContext(conf)

val numbers = Map("one" -> 1, "two" -> 2, "three" -> 3)
val airports = Map("arrival" -> "Otopeni", "SFO" -> "San Fran")

sc.makeRDD(Seq(numbers, airports)).saveToEs("spark/docs")
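One thing worth noting: spark-shell already creates a SparkContext named `sc`, so calling `new SparkContext(conf)` in the shell typically fails (or is ignored) because only one context may be running. A minimal sketch of the workaround, stopping the shell's context before creating a new one with the es.* settings (the endpoint below is a placeholder, not the real domain):

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.elasticsearch.spark._

// spark-shell has already created a SparkContext; stop it before
// constructing a replacement that carries the es.* settings.
sc.stop()

val conf = new SparkConf()
  .setAppName("myESHadoop")
  .set("es.nodes", "<your-domain>.us-east-1.es.amazonaws.com") // placeholder endpoint
  .set("es.port", "80")
  .set("es.nodes.wan.only", "true")
  .set("es.index.auto.create", "true")

val sc2 = new SparkContext(conf)

val numbers = Map("one" -> 1, "two" -> 2, "three" -> 3)
sc2.makeRDD(Seq(numbers)).saveToEs("spark/docs")
```

This is only a sketch; it assumes the ES-Hadoop jar is on the shell's classpath via `--packages` as in the original command.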

When I use the following instead to set the configuration, the job just hangs and I see no error at all.

spark-shell --packages org.elasticsearch:elasticsearch-spark-20_2.11:5.1.2 --conf spark.es.nodes=-piqsbtyzn5dsvrmrzucprcmuiu.us-east-1.es.amazonaws.com --conf spark.es.port=80 --conf spark.es.index.auto.create=true --conf spark.es.nodes.discovery=false --conf spark.es.nodes.wan.only=true

I have a feeling the configuration is the underlying issue and not networking; the ES endpoint is wide open.

(James Baiera) #2

@RuchikaAWS Do you see any differences when you pass those configuration entries to the saveToEs call?
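For reference, elasticsearch-spark exposes an overload of `saveToEs` that takes a per-write configuration map, so the settings can be supplied at the call site instead of on the SparkConf. A minimal sketch of that approach (the endpoint is a placeholder):

```scala
import org.elasticsearch.spark._

// Per-write configuration passed directly to saveToEs; the endpoint
// below is a placeholder for the actual Amazon ES domain endpoint.
val cfg = Map(
  "es.nodes"             -> "<your-domain>.us-east-1.es.amazonaws.com",
  "es.port"              -> "80",
  "es.nodes.wan.only"    -> "true",
  "es.index.auto.create" -> "true"
)

val numbers = Map("one" -> 1, "two" -> 2, "three" -> 3)
sc.makeRDD(Seq(numbers)).saveToEs("spark/docs", cfg)
```

Options passed this way override the corresponding `es.*` entries on the SparkConf for that write only, which makes it a useful way to isolate whether the SparkConf settings are actually reaching the connector.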

(system) #3

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.