Correct settings for "es.nodes.wan.only"

(Ankit Singh) #1


I am trying to run a Spark job to load data from EMR into an ES cluster hosted by [ cluster ID "665e60" ].

The following is the code snippet I am using to test this.

import org.apache.spark.SparkConf
import org.apache.spark.SparkContext
import org.apache.spark.SparkContext._
import org.elasticsearch.spark._
val conf = new SparkConf()
conf.set("","false")
val sc = new SparkContext(conf)
// print(conf.toDebugString)
val numbers = Map("one" -> 1, "two" -> 2, "three" -> 3)
val airports = Map("arrival" -> "Otopeni", "SFO" -> "San Fran")
sc.makeRDD(Seq(numbers, airports)).saveToEs("spark/docs")

This results in the following error:
> org.elasticsearch.hadoop.EsHadoopIllegalArgumentException: Cannot detect ES version - typically this happens if the network/Elasticsearch cluster is not accessible or when targeting a WAN/Cloud instance without the proper setting 'es.nodes.wan.only'
> at
> at
> at org.elasticsearch.spark.rdd.EsRDDWriter.write(EsRDDWriter.scala:40)
> at org.elasticsearch.spark.rdd.EsSpark$$anonfun$doSaveToEs$1.apply(EsSpark.scala:84)
> at org.elasticsearch.spark.rdd.EsSpark$$anonfun$doSaveToEs$1.apply(EsSpark.scala:84)
> at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:66)
> at
> at org.apache.spark.executor.Executor$
> at java.util.concurrent.ThreadPoolExecutor.runWorker(
> at java.util.concurrent.ThreadPoolExecutor$
> at

Can someone please suggest which config I am missing here?

(James Baiera) #2

Your Spark driver is able to connect to the ES Cloud instance, but it seems that the Executors do not have the capability to establish a connection. This is normally the case when running in cloud environments, as most cloud environments keep all of their executors/task runners inside of a separate secured network. You will need to make sure that the executors/task runners are able to connect to the provided ES node by configuring the network settings of your deployment.
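One way to confirm that diagnosis is to probe the ES endpoint from both the driver and the executors. The following is only a minimal sketch: the object name `EsReachabilityCheck`, the helper `canConnect`, and the command-line arguments are hypothetical and not part of ES-Hadoop. It only tests whether a TCP connection can be opened, so it says nothing about authentication or access policies, and 16 tasks are not guaranteed to land on every executor.

```scala
import java.net.{InetAddress, InetSocketAddress, Socket}
import scala.util.Try
import org.apache.spark.{SparkConf, SparkContext}

// Hypothetical helper, not part of ES-Hadoop: checks plain TCP reachability only.
object EsReachabilityCheck {

  // Returns true if a TCP connection to host:port opens within timeoutMs.
  def canConnect(host: String, port: Int, timeoutMs: Int = 5000): Boolean =
    Try {
      val socket = new Socket()
      try socket.connect(new InetSocketAddress(host, port), timeoutMs)
      finally socket.close()
    }.isSuccess

  def main(args: Array[String]): Unit = {
    // Assumption: the ES host and port are passed as the first two arguments.
    val esHost = args(0)
    val esPort = args(1).toInt

    val sc = new SparkContext(new SparkConf().setAppName("es-reachability-check"))

    // Probe from the driver first.
    println(s"driver -> $esHost:$esPort reachable: ${canConnect(esHost, esPort)}")

    // Probe from the executors: each task reports the host it ran on and the result.
    val results = sc.parallelize(1 to 16, 16)
      .map(_ => (InetAddress.getLocalHost.getHostName, canConnect(esHost, esPort)))
      .distinct()
      .collect()

    results.foreach { case (host, ok) =>
      println(s"executor on $host -> $esHost:$esPort reachable: $ok")
    }

    sc.stop()
  }
}
```

If the driver line reports `true` while the executor lines report `false`, the problem is in the network path between the worker nodes and the ES endpoint rather than in the connector settings.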

(Ankit Singh) #3

Thanks James,
I was able to figure this out.

(Animageofmine) #4

@darthapple Can you share what the issue was and how you fixed it?


@darthapple what was the resolution for this?
@animageofmine were you able to resolve this? I am facing the same issue. Please help.

(Joby Johny) #6

I am facing this issue now. Could you please let me know the details of the fix?

(Mayank Vijay) #8

I just want to view data stored in Elasticsearch as a Hive table. The Hive query I run is:
CREATE EXTERNAL TABLE testHiveELKTable (account int, quantity int) STORED BY 'org.elasticsearch.hadoop.hive.EsStorageHandler' TBLPROPERTIES('es.resource' = 'index/type');

But I get:
Failed: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.DDLTask. org.elasticsearch.hadoop.EsHadoopIllegalArgumentException: Cannot detect ES version - typically this happens if the network/Elasticsearch cluster is not accessible or when targeting a WAN/Cloud instance without the proper setting 'es.nodes.wan.only'

I even tried setting es.nodes.wan.only = true, but the error still shows.

(donghe90) #9

This issue was a headache for me as well. I worked around it by adding the executors' IP addresses to the AWS Elasticsearch access policy. Hope this helps.
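Once the executors are allowed through, the connector still has to be pointed at the endpoint. A minimal sketch of the Spark-side settings for a cloud-hosted cluster follows. The setting keys `es.nodes`, `es.port`, `es.nodes.wan.only` and `es.net.ssl` are ES-Hadoop configuration options; the host name, the port 443, the use of HTTPS, and the object name `EsWanOnlyExample` are assumptions for illustration and have to be replaced with the values of the actual deployment.

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.elasticsearch.spark._

// Hypothetical example object; host, port and SSL values are placeholders.
object EsWanOnlyExample {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf()
      .setAppName("es-wan-only-example")
      // Assumption: placeholder endpoint, replace with the real one.
      .set("es.nodes", "my-es-endpoint.example.com")
      // Assumption: the endpoint listens on 443 over HTTPS (the ES-Hadoop default port is 9200).
      .set("es.port", "443")
      .set("es.net.ssl", "true")
      // Disable node discovery and connect only through the declared es.nodes.
      .set("es.nodes.wan.only", "true")

    val sc = new SparkContext(conf)

    // Same test documents as in the original post.
    val numbers = Map("one" -> 1, "two" -> 2, "three" -> 3)
    val airports = Map("arrival" -> "Otopeni", "SFO" -> "San Fran")
    sc.makeRDD(Seq(numbers, airports)).saveToEs("spark/docs")

    sc.stop()
  }
}
```

With `es.nodes.wan.only` set to `true`, the connector sends all requests through the declared `es.nodes` address instead of the node addresses the cluster advertises, which are often private addresses that executors outside that network cannot reach.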

(Mohammed Sheik) #10


We are running Spark locally with a standalone master, so if the Spark driver is able to connect, what other reasons could there be?

(Nara Rao) #11

Hi @darthapple,

Could you please post the solution? I am experiencing the same issue. I have hosted both EMR and ES on AWS. Thanks!