Correct settings for "es.nodes.wan.only"

Hello,

I am trying to run a spark job to load data from emr to ES cluster hosted by elastic.co. [ cluster ID "665e60" ]

Following is the code snippet I am using to test this.

import java.io.PrintStream
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.SparkContext._
import org.elasticsearch.spark._

val conf = new SparkConf()
conf.set("spark.es.nodes", "ssssss.us-east-1.aws.found.io")
conf.set("spark.es.port", "9243")
conf.set("spark.es.nodes.discovery", "true")
conf.set("spark.es.nodes.client.only", "false")
conf.set("spark.es.nodes.wan.only", "false")
conf.set("spark.es.net.http.auth.user", "sssss")
conf.set("spark.es.net.http.auth.pass", "lololol")

val sc = new SparkContext(conf)
// print(conf.toDebugString)

val numbers = Map("one" -> 1, "two" -> 2, "three" -> 3)
val airports = Map("arrival" -> "Otopeni", "SFO" -> "San Fran")
sc.makeRDD(Seq(numbers, airports)).saveToEs("spark/docs")

Which results in the following error.
> org.elasticsearch.hadoop.EsHadoopIllegalArgumentException: Cannot detect ES version - typically this happens if the network/Elasticsearch cluster is not accessible or when targeting a WAN/Cloud instance without the proper setting 'es.nodes.wan.only'
> at org.elasticsearch.hadoop.rest.InitializationUtils.discoverEsVersion(InitializationUtils.java:196)
> at org.elasticsearch.hadoop.rest.RestService.createWriter(RestService.java:379)
> at org.elasticsearch.spark.rdd.EsRDDWriter.write(EsRDDWriter.scala:40)
> at org.elasticsearch.spark.rdd.EsSpark$$anonfun$doSaveToEs$1.apply(EsSpark.scala:84)
> at org.elasticsearch.spark.rdd.EsSpark$$anonfun$doSaveToEs$1.apply(EsSpark.scala:84)
> at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:66)
> at org.apache.spark.scheduler.Task.run(Task.scala:89)
> at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:227)
> at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
> at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
> at java.lang.Thread.run(Thread.java:745)

Can someone please suggest what config I am missing here?




Your Spark driver is able to connect to the ES Cloud instance, but it seems that the Executors do not have the capability to establish a connection. This is normally the case when running in cloud environments, as most cloud environments keep all of their executors/task runners inside of a separate secured network. You will need to make sure that the executors/task runners are able to connect to the provided ES node by configuring the network settings of your deployment.
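When the executors cannot reach the cluster's data nodes directly (which is the norm for Elastic Cloud and similar hosted setups), the usual approach is to disable node discovery and enable WAN-only mode so the connector talks only to the single configured endpoint. A minimal sketch, reusing the placeholder host and credentials from the original post (this is Spark configuration and requires a live cluster, not a definitive drop-in fix):

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.elasticsearch.spark._

val conf = new SparkConf()
conf.set("spark.es.nodes", "ssssss.us-east-1.aws.found.io")
conf.set("spark.es.port", "9243")
// Talk only to the configured endpoint; the individual data nodes
// are not reachable from outside the hosting provider's network.
conf.set("spark.es.nodes.wan.only", "true")
conf.set("spark.es.nodes.discovery", "false")
// Elastic Cloud endpoints on port 9243 are served over HTTPS.
conf.set("spark.es.net.ssl", "true")
conf.set("spark.es.net.http.auth.user", "sssss")
conf.set("spark.es.net.http.auth.pass", "lololol")

val sc = new SparkContext(conf)
sc.makeRDD(Seq(Map("one" -> 1))).saveToEs("spark/docs")
```

Note that `es.nodes.wan.only = true` and `es.nodes.discovery = true` are contradictory; with WAN-only mode enabled, discovery and client-node filtering are ignored.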


Thanks James,
I was able to figure this out.

@darthapple Can you share what the issue was and how you fixed it?

@darthapple what was the resolution for this?
@animageofmine were you able to resolve this? I am facing the same issue. Please help.

I am facing this issue now. Could you please let me know the details of the fix?

Hello,
I just want to view data stored in Elasticsearch in the form of a Hive table. The Hive query I run is:
CREATE EXTERNAL TABLE testHiveELKTable (account int, quantity int) STORED BY 'org.elasticsearch.hadoop.hive.EsStorageHandler' TBLPROPERTIES('es.resource' = 'index/type');

But i get,
FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.DDLTask. org.elasticsearch.hadoop.EsHadoopIllegalArgumentException: Cannot detect ES version - typically this happens if the network/Elasticsearch cluster is not accessible or when targeting a WAN/Cloud instance without the proper setting 'es.nodes.wan.only'

I even tried setting es.nodes.wan.only = true, but the error still shows.
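For Hive, the connector settings go into TBLPROPERTIES alongside es.resource, and the storage handler class must be spelled exactly org.elasticsearch.hadoop.hive.EsStorageHandler. A sketch of the DDL, with a placeholder endpoint standing in for the real cluster address:

```sql
CREATE EXTERNAL TABLE testHiveELKTable (account INT, quantity INT)
STORED BY 'org.elasticsearch.hadoop.hive.EsStorageHandler'
TBLPROPERTIES(
  'es.resource'       = 'index/type',
  'es.nodes'          = 'your-es-endpoint',  -- placeholder, replace with your host
  'es.port'           = '9200',
  'es.nodes.wan.only' = 'true'
);
```

If the error persists with these settings, the HiveServer2 host itself may be unable to reach the endpoint; testing with curl from that machine is a quick way to rule out a pure network problem.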

This issue was a headache for me as well. I bypassed it by adding the executors' IP addresses to the AWS Elasticsearch access policy. Hope this helps.

James

We are running Spark locally as a standalone master. If the Spark driver is able to connect, what else could be the reason?

Hi @darthapple,

Could you please post the solution, as I am experiencing the same issue? I have hosted both EMR and ES on AWS. Thanks!

For me, even when I had es.nodes.wan.only = true, it still complained with org.elasticsearch.hadoop.EsHadoopIllegalArgumentException: Cannot detect ES version.

I use ES provided by the AWS Elasticsearch service. In order to overcome the problem, I also had to add the following config property: es.net.ssl = true.
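Putting both settings together for the AWS Elasticsearch service, which only exposes a single HTTPS endpoint, the write-side configuration might look like this sketch (the domain endpoint is a placeholder):

```scala
import org.apache.spark.SparkConf

val conf = new SparkConf()
conf.set("spark.es.nodes", "my-domain.us-east-1.es.amazonaws.com") // placeholder endpoint
conf.set("spark.es.port", "443")              // AWS ES serves over HTTPS
conf.set("spark.es.net.ssl", "true")
conf.set("spark.es.nodes.wan.only", "true")   // only the domain endpoint is reachable
```

Even with this in place, the access policy on the domain still has to allow the executors' IP addresses, as noted above.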

Just use the elasticsearch-hadoop connector version that matches your Elasticsearch cluster version, and it should work.

Not being able to connect to Elasticsearch is a common problem and we appreciate when people post solutions that got them out of that problem. That said, I am going to close this thread since it is quite old and is often resurrected while a new thread may have been better to post questions and solutions in. Thank you all for your participation!